Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-07 Thread Max Vlasov
On Sat, Feb 4, 2012 at 3:51 PM, Alexey Pechnikov wrote:

> It's very important but there are some questions about
> http://www.sqlite.org/src/info/72b01a982a
> Some times ago DRH wrote that checksum calculation don't slow down
> SQLite significantly.
> But can be this realized in current SQLite 3.x branch? When checksum
> can help to restore
> damaged pages/database?
>


Very interesting discussion )

As I long time fun of vfs system of sqlite I thought about implementing
this as a vfs extension. At first it seemed that changing the format is a
necessity. Although there's also another approach. This vfs can implement
the checksum data as a separate file ({dbname}.chk) and work with existing
format, so passing all read/write operation unchanged and changing only
records of the chk file. There are two different scenarios
- When the software that modified the db opens the db correctly (using this
vfs)
- When some other tool (not aware of this vfs) modified the contents of the
db. In this case the vfs can track File change counter (the one at offset
24) and compare the one saves in chk file and the current one. The mismatch
can produce a warning, i.e as a recommendation to rebuild checksum db
(although don't know whether the vfs can report some existing error). The
same effect can be if the chk file is missing.

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-07 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 07/02/12 12:14, Nico Williams wrote:
> You need to be able to store the checksums where the page pointers are
> stored.

The page containing the page pointers is itself checksummed.  While there
are still some extremely rare cases that isn't sufficient for, it is
certainly enough, way better than what we already have and not as disruptive.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk8xiAcACgkQmOOfHg372QR26wCdG2pjahqw6NAW5XB7oeuBRg3U
/sYAn16crIpmamTSK/WVetJh4VomvcmI
=943A
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-07 Thread Nico Williams
On Tue, Feb 7, 2012 at 11:12 AM, Michael Stephenson
 wrote:
> It's almost trivial to add per-page checksums at the page level.  Here are 
> basic steps:

This is not enough, though it's a lot better than nothing.  You need
to be able to store the checksums where the page pointers are stored.
This requires extensive changes to the btree layer in SQLite3.  And it
requires COW-ish updates too (which WAL basically provides, but WAL is
an option).

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-07 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 07/02/12 09:12, Michael Stephenson wrote:
> It's almost trivial to add per-page checksums at the page level.

You omitted the steps of extensive testing and documentation :-)

> Given the above, would per-page checksums be better served up as an
> extension, rather than written into the core database code?

I'd want it in the core because it would get the other details right.  For
example the write version can be bumped which would ensure SQLite versions
without this functionality would not write to the database.

I'd also want a pragma added to control turning it on/off which likely
would also involve a vacuum to rewrite the pages.

Finally I'd also want the full integrity check to actually read and check
every page.

Or in other words, a more complete implementation is possible when written
into the core.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk8xX3kACgkQmOOfHg372QRRSgCdEQmyYXVruKCc20Fs+eQfra4y
riYAn0J666Fca3JmckTn6v4Jxn6ACmwa
=SZXl
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-07 Thread Michael Stephenson
It's almost trivial to add per-page checksums at the page level.  Here are 
basic steps:

 

1)  Define SQLITE_HAS_CODEC when building.  This will cause 5 unresolved 
externals (functions declared but not defined/implemented).  You need to 
provide implementations of these 5 functions:

a.   sqlite3_activate_see:  can provide an empty implementation.  It is 
required for the SEE security extension, not needed here.

b.  sqlite3_key:  can provide an empty implementation.  This is used for 
encryption extensions to set the encryption key, not needed here

c.   sqlite3_rekey:  can provide an empty implementation.  This is used for 
changing the encryption key on a database, not needed here.

d.  sqlite3CodecAttach:  This is the only function of these 5 that needs 
implementing.  Two things are required:

   i.  Reserve 
space at the end of each database page for storing the checksum for that page.  
(via a call to sqlite3BtreeSetPageSize).  

 ii.  Wire our 
“codec” into the Pager via a call to sqlite3PagerSetCodec and passing a pointer 
to our “codec” function.

e.  sqlite3CodecGetKey:  can provide an empty implementation.  SQLite will 
call this to get the encryption key, not needed here.

2)  Implement a “codec function” that the Pager will call each time a page 
is read from disk or written to disk.  Perhaps we could call this function 
XCodec.  A pointer to the function is passed to SQLite when 
sqlite3PagerSetCodec is called.  The XCodec function would verify the checksum 
if the page was being read and write the checksum if the page is being written. 
 What to do if a problem was found should probably be implementation specific.

 

That’s about it, minus some detail.

 

Given the above, would per-page checksums be better served up as an extension, 
rather than written into the core database code?  Note that none of the above 
steps require changes to the SQLite amalgamation; rather the “codec” code can 
be compiled in as an independent unit and linked in.

 

Using the Pager hooks would mean that this would conflict with either SEE or a 
home-grown encryption extension, but my assumption would be that that extension 
would have its own functionality for per-page checksums or HMACs.

 

-Original Message-
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] 
On Behalf Of Nico Williams
Sent: Monday, February 06, 2012 4:49 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] About new ticket "Per page/region checksums"

 

On Mon, Feb 6, 2012 at 2:27 PM, Roger Binns < <mailto:rog...@rogerbinns.com> 
rog...@rogerbinns.com> wrote:

> On 06/02/12 11:35, Nico Williams wrote:

>> Indeed, but if you'd do integrity protection at the application layer 

>> then surely you'd have have a backup/restore strategy to deal with 

>> lower-layer corruption.

> 

> Only if you know about that corruption! It is perfectly possible for 

> stray zeroes to be written into the btrees in such a way that they are 

> still considered valid, but you have effectively amputated a row.

 

No, you can do something about that: you could store a hash of the XOR of the 
hashes of all the rows (per-table).  To check integrity simply get all the 
rows, compute the running XOR of hashes, then when you're done hash the result 
and then check that against the stored value for the whole DB.

 

(You could also rebuild indexes after checking DB integrity.)

 

>> However, if you're going to have SQLite3 do it

> 

> Note that SQLite already has a way of reserving space on a per page basis.

>  It is currently used by the encryption extension.  It is possible to 

> store a checksum there instead which would still be readable but not 

> writeable by older versions of SQLite.

 

Interesting.

 

> Since the btree pages would also have checksums things should work in 

> most scenarios.  Any solution would be better than the current no 

> solution, especially as checksums let you discover corruption early.

 

Yes.

 

>> Well, one can hope.  (I don't keep up with btrfs; perhaps it will 

>> mature and become common on Android devices.)

> 

> Approximately zero percent of SQLite databases will be on btrfs for 

> the foreseeable future, even if you and I start using btrfs.  Android 

> 3 and above do support encrypting the filesystem which effectively 

> gives you a lot of the checksumming, although filesystem 

> implementations tend to care far more about metadata than file contents.

 

Encryption is not enough.  You really need block pointers to carry the block 
checksum/hash/MAC/integrity tag.

 

> Windows 8 server will be coming with a filesystem that does integrity 

> checking.  However again approximately z

Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-07 Thread Alexey Pechnikov
Roger, it's good to know the reasons of SQLITE_CORRUPT signal. And
very useful is possibility to mark all corrupted database pages as
unused in repair process and so make the database workable again.
And this repair may be processed only by SQLite internals because is
needed ignore corrupted pages and all related pages too. As example
ZFS can't resolve dependencies in SQLite files between tables and
indicies pages.

P.S. RAID solutions is not the "silver bullet" and we can get two or
more versions of the _differently_ corrupted database files on RAID
disks after crash of RAID. And there are a lot of embedded/mobile
devices without additional protection by RAID and without of regular
backups.

P.P.S. Is it possible to dump SQLite database with corrupted schema? I
think no. But we can copy some first database pages from any work copy
of the database like to
dd if=backup.db of=corrupted.db bs=4096 count=100
and repair after database by using per pages checksums.

-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-06 Thread Simon Slavin

On 6 Feb 2012, at 11:31pm, Roger Binns wrote:

> [nicely rebuffed my earlier post]

Okay, you convinced me.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-06 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 06/02/12 14:38, Simon Slavin wrote:
> File systems (FAT, NTFS, HTFS) already have block checksums.

Huh?  FAT has a checksum on the super block but not on datablocks.  NTFS
does not have block checksums.  That is why they are introducing ReFS.
What evidence do you have for your claims - I couldn't find any?

Some file formats do have checksums (eg zip and exe) but that isn't relevant.

There are some checksums but not particularly strong between drive
platters, firmware, drive controllers, and the host.  Quite simply if
existing systems had working checksums then the only times people would
get SQLITE_CORRUPT would be SQLite bugs.

> Problem is, what would you do if a checksum was wrong ?  Signal it in a
> newly-invented result code ?

Return SQLITE_CORRUPT -  a code that has existed forever.

> All apps would have to have some logic to cope with the situation.

You mean like they already have to do?  In fact something that is even
mentioned in the FAQ:

  http://www.sqlite.org/faq.html#q21

> I don't think it's worth doing this in a thin/light/fast system like
> SQLite.

Then don't use it.  Just because you use systems with perfect data
integrity doesn't mean the rest of us do.

> And the number of bugs that cause corruption that would be spotted this
> way seems to be low.

http://search.gmane.org/search.php?group=gmane.comp.db.sqlite.general=SQLITE_CORRUPT

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk8wYsQACgkQmOOfHg372QRLtgCgxdtewXE/45VSF0pX80wGApm/
s5kAmwawvPSLPAjtRpCv5vUrS3/DUmJX
=xcIk
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-06 Thread Nico Williams
On Mon, Feb 6, 2012 at 4:38 PM, Simon Slavin  wrote:
> On 6 Feb 2012, at 9:49pm, Nico Williams wrote:
>> Encryption is not enough.  You really need block pointers to carry the
>> block checksum/hash/MAC/integrity tag.
>
> File systems (FAT, NTFS, HTFS) already have block checksums.  So do network 
> file systems (AFP, SMB, NFS).  Adding more to SQLite, whether at a row or 
> page level, would catch only cases where the communication was correct at the 
> storage level but wrong at the semantic level -- which /should/ mean it can 
> catch /only/ bugs in SQLite.

I don't know of any filesystems other than ZFS, btrfs, and the new one
from Microsoft, that store checksums of blocks *with the pointers*.

> Problem is, what would you do if a checksum was wrong ?  Signal it in a 
> newly-invented result code ?  Then what ?  All apps would have to have some 
> logic to cope with the situation.  What should a simple app do about it ?  
> How many more lines of code would that add to apps which use SQLite specially 
> because of its low overheads ?

If you have redundancy you can use the checksum to find the best copy
of a block, or, in the case of a RAID-5-like storage strategy, to
reconstruct the correct version of the block.

If you can't do any of that, then you can fail.  This is painful for
the user, but the alternative is to return garbage without anyone
knowing it.

> I don't think it's worth doing this in a thin/light/fast system like SQLite.  
> Not only does it all weight to the API, it also adds weight to any app which 
> uses it.  And the number of bugs that cause corruption that would be spotted 
> this way seems to be low.  Instead perhaps it would be better to look into 
> logic which does simple monitoring at the structure level, e.g. checking that 
> if a command should add three rows, it really does add three rows.  (Although 
> some sort of analysis of the corruption bugs previously found in SQLite 
> should be undertaken first.)  These can be enabled/disabled with a 
> compilation flag, whereas a checksum can't because you always need to 
> calculate it.

I would generally prefer to let the filesystem do this.  However,
Roger has a point that it will be years before a significant number of
SQLite3 databases are hosted on filesystems that provide strong
end-to-end integrity protection, and that is a good argument for doing
the protection in the DB.  (Well, it's also a strong argument for
asking the OS vendors just when they're going to get off their butts
and ship reliable filesystems.  But that's another story, and anyways,
it's likely all in the works at this time for all major OS vendors.)

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-06 Thread Simon Slavin

On 6 Feb 2012, at 9:49pm, Nico Williams wrote:

> On Mon, Feb 6, 2012 at 2:27 PM, Roger Binns  wrote:
>> Approximately zero percent of SQLite databases will be on btrfs for the
>> foreseeable future, even if you and I start using btrfs.  Android 3 and
>> above do support encrypting the filesystem which effectively gives you a
>> lot of the checksumming, although filesystem implementations tend to care
>> far more about metadata than file contents.
> 
> Encryption is not enough.  You really need block pointers to carry the
> block checksum/hash/MAC/integrity tag.

File systems (FAT, NTFS, HTFS) already have block checksums.  So do network 
file systems (AFP, SMB, NFS).  Adding more to SQLite, whether at a row or page 
level, would catch only cases where the communication was correct at the 
storage level but wrong at the semantic level -- which /should/ mean it can 
catch /only/ bugs in SQLite.

Problem is, what would you do if a checksum was wrong ?  Signal it in a 
newly-invented result code ?  Then what ?  All apps would have to have some 
logic to cope with the situation.  What should a simple app do about it ?  How 
many more lines of code would that add to apps which use SQLite specially 
because of its low overheads ?

I don't think it's worth doing this in a thin/light/fast system like SQLite.  
Not only does it all weight to the API, it also adds weight to any app which 
uses it.  And the number of bugs that cause corruption that would be spotted 
this way seems to be low.  Instead perhaps it would be better to look into 
logic which does simple monitoring at the structure level, e.g. checking that 
if a command should add three rows, it really does add three rows.  (Although 
some sort of analysis of the corruption bugs previously found in SQLite should 
be undertaken first.)  These can be enabled/disabled with a compilation flag, 
whereas a checksum can't because you always need to calculate it.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-06 Thread Nico Williams
On Mon, Feb 6, 2012 at 2:27 PM, Roger Binns  wrote:
> On 06/02/12 11:35, Nico Williams wrote:
>> Indeed, but if you'd do integrity protection at the application layer
>> then surely you'd have have a backup/restore strategy to deal with
>> lower-layer corruption.
>
> Only if you know about that corruption! It is perfectly possible for stray
> zeroes to be written into the btrees in such a way that they are still
> considered valid, but you have effectively amputated a row.

No, you can do something about that: you could store a hash of the XOR
of the hashes of all the rows (per-table).  To check integrity simply
get all the rows, compute the running XOR of hashes, then when you're
done hash the result and then check that against the stored value for
the whole DB.

(You could also rebuild indexes after checking DB integrity.)

>> However, if you're going to have SQLite3 do it
>
> Note that SQLite already has a way of reserving space on a per page basis.
>  It is currently used by the encryption extension.  It is possible to
> store a checksum there instead which would still be readable but not
> writeable by older versions of SQLite.

Interesting.

> Since the btree pages would also have checksums things should work in most
> scenarios.  Any solution would be better than the current no solution,
> especially as checksums let you discover corruption early.

Yes.

>> Well, one can hope.  (I don't keep up with btrfs; perhaps it will
>> mature and become common on Android devices.)
>
> Approximately zero percent of SQLite databases will be on btrfs for the
> foreseeable future, even if you and I start using btrfs.  Android 3 and
> above do support encrypting the filesystem which effectively gives you a
> lot of the checksumming, although filesystem implementations tend to care
> far more about metadata than file contents.

Encryption is not enough.  You really need block pointers to carry the
block checksum/hash/MAC/integrity tag.

> Windows 8 server will be coming with a filesystem that does integrity
> checking.  However again approximately zero percent of SQLite databases
> will be stored on that filesystem for the foreseeable future.

Right.

> SQLite having its own checksumming will solve the integrity issues today,
> for hundreds of millions of users, especially when developers consider
> data very important.  It will also solve some of the corruption issue
> finger pointing - if a page has a correct checksum but SQLite considers
> the contents corrupt then the most likely explanation is a bug in SQLite,
> while an invalid checksum points to the contents being different than what
> SQLite wrote hence points fingers at the operating system and storage.

Agreed.  For this, again, I'd prefer page-level hashing, with hash
values stored with page pointers.  This way the overhead of integrity
checking would be paid once per-I/O.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-06 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 06/02/12 11:35, Nico Williams wrote:
> Indeed, but if you'd do integrity protection at the application layer 
> then surely you'd have have a backup/restore strategy to deal with 
> lower-layer corruption.

Only if you know about that corruption! It is perfectly possible for stray
zeroes to be written into the btrees in such a way that they are still
considered valid, but you have effectively amputated a row.

> Yes, this tends to argue that if you want row-level checksums

My point is that they are fairly useless since they only work if btrees
and indices work perfectly.  Anything that could affect rows could also
affect the other pieces of information SQLite is storing.

> However, if you're going to have SQLite3 do it

Note that SQLite already has a way of reserving space on a per page basis.
 It is currently used by the encryption extension.  It is possible to
store a checksum there instead which would still be readable but not
writeable by older versions of SQLite.

Since the btree pages would also have checksums things should work in most
scenarios.  Any solution would be better than the current no solution,
especially as checksums let you discover corruption early.

> Well, one can hope.  (I don't keep up with btrfs; perhaps it will 
> mature and become common on Android devices.)

Approximately zero percent of SQLite databases will be on btrfs for the
foreseeable future, even if you and I start using btrfs.  Android 3 and
above do support encrypting the filesystem which effectively gives you a
lot of the checksumming, although filesystem implementations tend to care
far more about metadata than file contents.

Windows 8 server will be coming with a filesystem that does integrity
checking.  However again approximately zero percent of SQLite databases
will be stored on that filesystem for the foreseeable future.


http://blogs.msdn.com/b/b8/archive/2012/01/16/building-the-next-generation-file-system-for-windows-refs.aspx

SQLite having its own checksumming will solve the integrity issues today,
for hundreds of millions of users, especially when developers consider
data very important.  It will also solve some of the corruption issue
finger pointing - if a page has a correct checksum but SQLite considers
the contents corrupt then the most likely explanation is a bug in SQLite,
while an invalid checksum points to the contents being different than what
SQLite wrote hence points fingers at the operating system and storage.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk8wN6oACgkQmOOfHg372QTDnACgmHiKBZ43TbYlVGPRxN5a9UF5
uGkAoIbjVsjuA7IuATyC/0M2IDDE6IJN
=o4lD
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-06 Thread Nico Williams
On Mon, Feb 6, 2012 at 1:20 PM, Roger Binns  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On 06/02/12 09:11, Nico Williams wrote:
>> A checksum per-row is certainly a valuable thing at the layer above the
>> RDBMS since it allows for integrity checking above the RBDMS, and in an
>> RBDMS-independent manner.
>
> It doesn't actually help that much since it is only over a portion of the
> database content.  While it will catch an individual row being corrupted
> it will not catch the btree that points to the table rows being corrupted.

Indeed, but if you'd do integrity protection at the application layer
then surely you'd have have a backup/restore strategy to deal with
lower-layer corruption.

>  If that container is "tweaked" then it could still be sufficiently valid
> but end up omitting the row so you will get wrong query results.  The same
> story applies to indices where a judicious tweak will not corrupt it, but
> will result in the wrong rows being selected.

Yes, this tends to argue that if you want row-level checksums you also
want to have SQLite3 do it as in Andy's solution.  However, if you're
going to have SQLite3 do it I'd rather it followed the ZFS pattern of
checksumming entire pages and storing the checksum with the page
pointers.

> It is nice that some filesystems are adding integrity protection (and in
> some cases recovery), but approximately zero percent of the systems out
> there running SQLite do not have the databases stored in such a configured
> filesystem today.

Well, one can hope.  (I don't keep up with btrfs; perhaps it will
mature and become common on Android devices.)

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-06 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 06/02/12 09:11, Nico Williams wrote:
> A checksum per-row is certainly a valuable thing at the layer above the
> RDBMS since it allows for integrity checking above the RBDMS, and in an
> RBDMS-independent manner.

It doesn't actually help that much since it is only over a portion of the
database content.  While it will catch an individual row being corrupted
it will not catch the btree that points to the table rows being corrupted.
 If that container is "tweaked" then it could still be sufficiently valid
but end up omitting the row so you will get wrong query results.  The same
story applies to indices where a judicious tweak will not corrupt it, but
will result in the wrong rows being selected.

Being able to integrity check the rows that are returned doesn't help if
they are the wrong rows!

It is nice that some filesystems are adding integrity protection (and in
some cases recovery), but approximately zero percent of the systems out
there running SQLite do not have the databases stored in such a configured
filesystem today.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk8wJ/YACgkQmOOfHg372QQ9AACgp1xLCbo9tEIqyF+Ar7ZqaKxi
xjoAnjx6AEN0lP6ZP9QwVSKC642hpfao
=wC3P
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-06 Thread Nico Williams
A checksum per-row is certainly a valuable thing at the layer above
the RDBMS since it allows for integrity checking above the RBDMS, and
in an RBDMS-independent manner.  Of course, this approach is easiest
to graft into SQLite3 simply because you'd be adding a hidden column
and the machinery for maintaining and checking it fits into a handful
of VM opcodes, thus barely changing the on-disk format, but...

... for an integrity protection scheme within the RDBMS I would argue
that it's better to have per-page checksums.  The argument is based on
the ZFS argument for checksumming entire blocks (and storing the
checksum with the block pointers): that you then pay for the cost of
checksumming only once per-I/O (whereas with a per-row solution you
either pay the checksum penalty every time you read that row, even
from cache, or you must do some extra bookkeeping to ensure that you
checksum no more than once per-{I/O, row}).  I suppose this might not
work out as well for an RDBMS as for ZFS since ZFS is able to get
offsetting benefits, such as reduced I/O times by compressing blocks.

Which brings me to a different point, which is that if you can host a
SQLite3 DB on a filesystem like ZFS then you don't have to concern
yourself quite as much with doing data integrity protection within
SQLite3.  What's the best layer for integrity protection in your
application, then?  So far we have three possible layers: app-layer,
SQLite3, filesystem.  If you trust your storage to provide integrity
protection you might add that as a layer.  I'm not sure that there's a
one-size-fits-all answer here, and I find your own solution to be
rather clever (since easy to graft in and keep up to date).

One common issue in all cases is: how to report integrity check
failures, and what else to do about them.  This is another area where
ZFS gets benefits you're not likely to get from higher-layers: since
ZFS integrates volume management into the filesystem it can use
integrity protection checksums to help correct errors when there is
redundancy (e.g., mirroring) in the volume.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] About new ticket "Per page/region checksums"

2012-02-06 Thread Andy Gibbs
On 4th February 2012 12:51pm, Alexey Pechnikov wrote:

> It's very important but there are some questions about
> http://www.sqlite.org/src/info/72b01a982a
> Some times ago DRH wrote that checksum calculation don't slow down
> SQLite significantly.
> But can be this realized in current SQLite 3.x branch? When checksum
> can help to restore
> damaged pages/database? Does powersave overwrite feature conflicts
> with idea of per pages
> checksums?

Obviously I can't speak for DRH and co., but I can speak from my own 
experience, and maybe it will be beneficial to the debate.

I implemented a checksumming feature into sqlite back in 2008/9 and have 
maintained it ever since and now have it in my branch of 3.7.10.  This 
checksum is an 8-bit crc and is calculated *per row* on any table for which 
it is enabled.  My implementation adds two bytes overhead to each row of 
checksummed tables in the database.  I found that when used on a 400mhz 
PowerPC embedded platform there was a very small but measurable performance 
regression when dealing with hundreds of thousands of rows of data compared 
to a database file not performing the crc calculation.  On a desktop PC this 
performance regression was not measurable (i.e. insignificant).

I won't go into the details of exactly how it has been implemented, unless 
someone is particularly interested, but to say this:

Since I observed a slight performance regression on slow hardware I made it 
possible to turn the feature on and off on a per-table basis.  This meant a 
small addition to the SQL syntax for CREATE TABLE.  I wanted the checksum 
calculation to only occur when the row was created and only when checking 
for checksum errors in the database so that in general use performance 
wouldn't be affected.  I also wished the checksum to be "invisible" to the 
user in normal operation, e.g. when performing a SELECT * FROM table, or 
when using an INSERT or UPDATE statement.

In the implementation it became necessary to upgrade the file format to "5" 
to support all these features, and to remain compatible with old database 
files.  In addition to PRAGMA legacy_file_format, I also have PRAGMA 
compatible_file_format, so that the user can choose between 0, 4 and 5 as 
the format for the database file.  Only in format "5" can the checksum be 
used.

The user cannot find out the checksum of a row, but instead I implemented an 
additional SQL function which checks the checksum of a particular row and 
returns '0' or '1' depending on whether the row had become corrupted.  A 
simple statement like this:

SELECT rowid FROM table WHERE !valid_row_checksum(rowid);

is enough to find corrupted rows.

If the checksum feature were to be added to Sqlite as standard, as proposed 
by Roger Binns, I would strongly suggest that it should be possible to 
fine-grain the corruption detection to a single row of data.  To an actual 
column within the row would be nice, but I think too much of an overhead to 
be truly worth while.  My implementation also didn't create any checksums on 
data other than row data.  For me that wasn't an issue, but I could see that 
in other applications it may be a nice feature.

I have a complete patch for my implementation with documentation and 
test-suite additions.  If anyone is interested, even the Sqlite 
developers(!), then drop me an email and I'll post it along.  I have no 
restriction on me stopping this patch being released to the public domain.

Andy




___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] About new ticket "Per page/region checksums"

2012-02-04 Thread Alexey Pechnikov
It's very important but there are some questions about
http://www.sqlite.org/src/info/72b01a982a
Some times ago DRH wrote that checksum calculation don't slow down
SQLite significantly.
But can be this realized in current SQLite 3.x branch? When checksum
can help to restore
damaged pages/database? Does powersave overwrite feature conflicts
with idea of per pages
checksums?

P.S. The article "Berkeley DB Recoverability" provides some info about
WAL+checksum mode:
http://help.bdbxml.net/html/ed539869-eebd-478f-97de-7e5377e87f66.htm

-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users