Re: [sqlite] About new ticket "Per page/region checksums"
On Sat, Feb 4, 2012 at 3:51 PM, Alexey Pechnikov wrote:
> It's very important but there are some questions about
> http://www.sqlite.org/src/info/72b01a982a
> Some time ago DRH wrote that checksum calculation doesn't slow down
> SQLite significantly. But can this be realized in the current SQLite 3.x
> branch? When can a checksum help to restore damaged pages/databases?
> Very interesting discussion )

As a long-time fan of SQLite's VFS system, I thought about implementing this as a VFS extension. At first it seemed that changing the file format was a necessity, but there is another approach: the VFS can keep the checksum data in a separate file ({dbname}.chk) and work with the existing format, passing all read/write operations through unchanged and updating only the records of the .chk file.

There are two different scenarios:

- The software that modifies the db opens it correctly (using this VFS).
- Some other tool (not aware of this VFS) modifies the contents of the db. In this case the VFS can track the file change counter (the one at offset 24) and compare the one saved in the .chk file with the current one. A mismatch can produce a warning, i.e. a recommendation to rebuild the checksum file (although I don't know whether the VFS can report such an error through existing means). A missing .chk file has the same effect.

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
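Max's sidecar idea can be sketched at the application layer: the file change counter is a 4-byte big-endian integer at offset 24 of the SQLite database header, and a hypothetical {dbname}.chk file can record it so that a later mismatch (or a missing sidecar) flags the checksums as stale. The function names here are illustrative, not part of any real VFS:

```python
import os
import struct

def read_change_counter(db_path):
    """Read the 4-byte big-endian file change counter at offset 24
    of the SQLite database header."""
    with open(db_path, "rb") as f:
        f.seek(24)
        return struct.unpack(">I", f.read(4))[0]

def write_chk(db_path):
    """Record the current change counter in a sidecar {dbname}.chk file
    (a real implementation would store per-page checksums here too)."""
    with open(db_path + ".chk", "wb") as f:
        f.write(struct.pack(">I", read_change_counter(db_path)))

def chk_is_stale(db_path):
    """Compare the counter recorded in the sidecar with the live one;
    a mismatch or a missing sidecar means the checksums need rebuilding."""
    chk_path = db_path + ".chk"
    if not os.path.exists(chk_path):
        return True
    with open(chk_path, "rb") as f:
        recorded = struct.unpack(">I", f.read(4))[0]
    return recorded != read_change_counter(db_path)
```

In rollback-journal mode the change counter is incremented on every transaction that modifies the database, which is what makes this detection work for tools that bypass the VFS.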
Re: [sqlite] About new ticket "Per page/region checksums"
On 07/02/12 12:14, Nico Williams wrote:
> You need to be able to store the checksums where the page pointers are
> stored.

The page containing the page pointers is itself checksummed. While there are still some extremely rare cases that this isn't sufficient for, it is certainly good enough, far better than what we have now, and not as disruptive.

Roger
Re: [sqlite] About new ticket "Per page/region checksums"
On Tue, Feb 7, 2012 at 11:12 AM, Michael Stephenson wrote:
> It's almost trivial to add per-page checksums at the page level. Here are
> the basic steps:

This is not enough, though it's a lot better than nothing. You need to be able to store the checksums where the page pointers are stored. This requires extensive changes to the btree layer in SQLite3. And it requires COW-ish updates too (which WAL basically provides, but WAL is optional).

Nico
--
Re: [sqlite] About new ticket "Per page/region checksums"
On 07/02/12 09:12, Michael Stephenson wrote:
> It's almost trivial to add per-page checksums at the page level.

You omitted the steps of extensive testing and documentation :-)

> Given the above, would per-page checksums be better served up as an
> extension, rather than written into the core database code?

I'd want it in the core because it would get the other details right. For example, the write version could be bumped, which would ensure that SQLite versions without this functionality do not write to the database. I'd also want a pragma added to turn it on and off, which would likely also involve a vacuum to rewrite the pages. Finally, I'd want the full integrity check to actually read and check every page. In other words, a more complete implementation is possible when it is written into the core.

Roger
Re: [sqlite] About new ticket "Per page/region checksums"
It's almost trivial to add per-page checksums at the page level. Here are the basic steps:

1) Define SQLITE_HAS_CODEC when building. This will cause 5 unresolved externals (functions declared but not defined/implemented). You need to provide implementations of these 5 functions:

   a. sqlite3_activate_see: can be an empty implementation. It is required for the SEE security extension, not needed here.
   b. sqlite3_key: can be an empty implementation. This is used by encryption extensions to set the encryption key, not needed here.
   c. sqlite3_rekey: can be an empty implementation. This is used for changing the encryption key on a database, not needed here.
   d. sqlite3CodecAttach: this is the only one of the 5 that needs a real implementation. Two things are required:
      i. Reserve space at the end of each database page for storing that page's checksum (via a call to sqlite3BtreeSetPageSize).
      ii. Wire our "codec" into the Pager via a call to sqlite3PagerSetCodec, passing a pointer to our "codec" function.
   e. sqlite3CodecGetKey: can be an empty implementation. SQLite calls this to get the encryption key, not needed here.

2) Implement a "codec function" that the Pager will call each time a page is read from or written to disk; call it XCodec. A pointer to this function is passed to SQLite when sqlite3PagerSetCodec is called. XCodec would verify the checksum when a page is read and write the checksum when a page is written. What to do when a problem is found should probably be implementation specific.

That's about it, minus some detail. Given the above, would per-page checksums be better served up as an extension, rather than written into the core database code? Note that none of the above steps require changes to the SQLite amalgamation; the "codec" code can be compiled as an independent unit and linked in.
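The hooks above are internal C interfaces, but the arithmetic an XCodec-style function would perform can be sketched externally: read the page size from the 2-byte big-endian field at header offset 16, checksum each page, and compare against previously saved values. SHA-1 here is a stand-in for whatever checksum a real codec would use, and the function names are illustrative:

```python
import hashlib
import struct

def page_checksums(db_bytes):
    """Split a SQLite database image into pages (the page size is the
    2-byte big-endian value at header offset 16; the stored value 1
    means 65536) and checksum each page."""
    page_size = struct.unpack(">H", db_bytes[16:18])[0]
    if page_size == 1:
        page_size = 65536
    return [hashlib.sha1(db_bytes[off:off + page_size]).hexdigest()
            for off in range(0, len(db_bytes), page_size)]

def find_corrupt_pages(db_bytes, saved_sums):
    """Return the 1-based numbers of pages whose checksum no longer
    matches the previously saved value."""
    return [i + 1 for i, s in enumerate(page_checksums(db_bytes))
            if s != saved_sums[i]]
```

A real codec stores each page's checksum in the reserved bytes at the end of that page rather than in a separate list, but the per-page verify step is the same.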
Using the Pager hooks would mean that this would conflict with either SEE or a home-grown encryption extension, but my assumption would be that such an extension would have its own functionality for per-page checksums or HMACs.

-Michael
Re: [sqlite] About new ticket "Per page/region checksums"
Roger, it's good to know the reasons behind the SQLITE_CORRUPT signal. Very useful would be the possibility of marking all corrupted database pages as unused in a repair process and so making the database workable again. Such a repair can only be done by SQLite internals, because corrupted pages and all related pages need to be ignored. For example, ZFS can't resolve the dependencies inside SQLite files between table and index pages.

P.S. RAID solutions are not a "silver bullet": after a RAID crash we can end up with two or more _differently_ corrupted versions of the database file on the RAID disks. And there are a lot of embedded/mobile devices without the additional protection of RAID and without regular backups.

P.P.S. Is it possible to dump a SQLite database with a corrupted schema? I think not. But we could copy the first database pages from any working copy of the database, like

dd if=backup.db of=corrupted.db bs=4096 count=100

and afterwards repair the database using per-page checksums.

--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
Re: [sqlite] About new ticket "Per page/region checksums"
On 6 Feb 2012, at 11:31pm, Roger Binns wrote:
> [nicely rebuffed my earlier post]

Okay, you convinced me.

Simon.
Re: [sqlite] About new ticket "Per page/region checksums"
On 06/02/12 14:38, Simon Slavin wrote:
> File systems (FAT, NTFS, HTFS) already have block checksums.

Huh? FAT has a checksum on the super block but not on data blocks. NTFS does not have block checksums; that is why Microsoft is introducing ReFS. What evidence do you have for your claims? I couldn't find any. Some file formats do have checksums (eg zip and exe) but that isn't relevant. There are some checksums, though not particularly strong ones, between drive platters, firmware, drive controllers, and the host. Quite simply, if existing systems had working checksums then the only time people would get SQLITE_CORRUPT would be from SQLite bugs.

> Problem is, what would you do if a checksum was wrong ? Signal it in a
> newly-invented result code ?

Return SQLITE_CORRUPT - a code that has existed forever.

> All apps would have to have some logic to cope with the situation.

You mean like they already have to? In fact it is even mentioned in the FAQ:

http://www.sqlite.org/faq.html#q21

> I don't think it's worth doing this in a thin/light/fast system like
> SQLite.

Then don't use it. Just because you use systems with perfect data integrity doesn't mean the rest of us do.

> And the number of bugs that cause corruption that would be spotted this
> way seems to be low.

http://search.gmane.org/search.php?group=gmane.comp.db.sqlite.general=SQLITE_CORRUPT

Roger
Re: [sqlite] About new ticket "Per page/region checksums"
On Mon, Feb 6, 2012 at 4:38 PM, Simon Slavin wrote:
> On 6 Feb 2012, at 9:49pm, Nico Williams wrote:
>> Encryption is not enough. You really need block pointers to carry the
>> block checksum/hash/MAC/integrity tag.
>
> File systems (FAT, NTFS, HTFS) already have block checksums. So do
> network file systems (AFP, SMB, NFS). Adding more to SQLite, whether at
> a row or page level, would catch only cases where the communication was
> correct at the storage level but wrong at the semantic level -- which
> /should/ mean it can catch /only/ bugs in SQLite.

I don't know of any filesystems other than ZFS, btrfs, and the new one from Microsoft that store checksums of blocks *with the pointers*.

> Problem is, what would you do if a checksum was wrong ? Signal it in a
> newly-invented result code ? Then what ? All apps would have to have
> some logic to cope with the situation.

If you have redundancy you can use the checksum to find the best copy of a block, or, in the case of a RAID-5-like storage strategy, to reconstruct the correct version of the block. If you can't do any of that, then you can fail. This is painful for the user, but the alternative is to return garbage without anyone knowing it.

> I don't think it's worth doing this in a thin/light/fast system like
> SQLite. [...] These can be enabled/disabled with a compilation flag,
> whereas a checksum can't because you always need to calculate it.

I would generally prefer to let the filesystem do this. However, Roger has a point that it will be years before a significant number of SQLite3 databases are hosted on filesystems that provide strong end-to-end integrity protection, and that is a good argument for doing the protection in the DB. (Well, it's also a strong argument for asking the OS vendors just when they're going to get off their butts and ship reliable filesystems. But that's another story, and anyway it's likely all in the works at this time for all major OS vendors.)

Nico
--
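Nico's recovery point (use the checksum stored with the pointer to pick the surviving mirror) can be illustrated with a toy helper; the function name and the choice of SHA-256 are mine, not from any filesystem:

```python
import hashlib

def best_copy(copies, stored_digest):
    """Given mirrored copies of a block and the checksum stored with its
    pointer, return the first copy that verifies, or None if every
    mirror is corrupt (the only case where the read must fail)."""
    for blk in copies:
        if hashlib.sha256(blk).hexdigest() == stored_digest:
            return blk
    return None
```

The key property is that the checksum travels with the pointer, not with the block itself: a stale or corrupted mirror cannot vouch for its own contents.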
Re: [sqlite] About new ticket "Per page/region checksums"
On 6 Feb 2012, at 9:49pm, Nico Williams wrote:
> On Mon, Feb 6, 2012 at 2:27 PM, Roger Binns wrote:
>> Approximately zero percent of SQLite databases will be on btrfs for the
>> foreseeable future, even if you and I start using btrfs. Android 3 and
>> above do support encrypting the filesystem, which effectively gives you
>> a lot of the checksumming, although filesystem implementations tend to
>> care far more about metadata than file contents.
>
> Encryption is not enough. You really need block pointers to carry the
> block checksum/hash/MAC/integrity tag.

File systems (FAT, NTFS, HTFS) already have block checksums. So do network file systems (AFP, SMB, NFS). Adding more to SQLite, whether at a row or page level, would catch only cases where the communication was correct at the storage level but wrong at the semantic level -- which /should/ mean it can catch /only/ bugs in SQLite.

Problem is, what would you do if a checksum was wrong? Signal it in a newly-invented result code? Then what? All apps would have to have some logic to cope with the situation. What should a simple app do about it? How many more lines of code would that add to apps which use SQLite specially because of its low overheads?

I don't think it's worth doing this in a thin/light/fast system like SQLite. Not only does it add weight to the API, it also adds weight to any app which uses it. And the number of corruption bugs that would be spotted this way seems to be low. Instead perhaps it would be better to look into logic which does simple monitoring at the structure level, e.g. checking that if a command should add three rows, it really does add three rows. (Although some sort of analysis of the corruption bugs previously found in SQLite should be undertaken first.) These checks could be enabled/disabled with a compilation flag, whereas a checksum can't be, because you always need to calculate it.

Simon.
Re: [sqlite] About new ticket "Per page/region checksums"
On Mon, Feb 6, 2012 at 2:27 PM, Roger Binns wrote:
> On 06/02/12 11:35, Nico Williams wrote:
>> Indeed, but if you'd do integrity protection at the application layer
>> then surely you'd have a backup/restore strategy to deal with
>> lower-layer corruption.
>
> Only if you know about that corruption! It is perfectly possible for
> stray zeroes to be written into the btrees in such a way that they are
> still considered valid, but you have effectively amputated a row.

No, you can do something about that: you could store a hash of the XOR of the hashes of all the rows (per-table). To check integrity, simply read all the rows, compute the running XOR of their hashes, then hash the result and check that against the stored value for the table. (You could also rebuild indexes after checking DB integrity.)

> Note that SQLite already has a way of reserving space on a per-page
> basis. It is currently used by the encryption extension. It is possible
> to store a checksum there instead, which would still be readable but not
> writeable by older versions of SQLite.

Interesting.

> Since the btree pages would also have checksums, things should work in
> most scenarios. Any solution would be better than the current no
> solution, especially as checksums let you discover corruption early.

Yes.

> Approximately zero percent of SQLite databases will be on btrfs for the
> foreseeable future, even if you and I start using btrfs. Android 3 and
> above do support encrypting the filesystem, which effectively gives you
> a lot of the checksumming, although filesystem implementations tend to
> care far more about metadata than file contents.

Encryption is not enough. You really need block pointers to carry the block checksum/hash/MAC/integrity tag.

> Windows 8 Server will be coming with a filesystem that does integrity
> checking. However, again, approximately zero percent of SQLite databases
> will be stored on that filesystem for the foreseeable future.

Right.

> SQLite having its own checksumming will solve the integrity issues
> today, for hundreds of millions of users [...] It will also solve some
> of the corruption finger pointing.

Agreed. For this, again, I'd prefer page-level hashing, with hash values stored with the page pointers. That way the overhead of integrity checking is paid once per I/O.

Nico
--
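The XOR-of-row-hashes scheme Nico sketches can be written as a short helper. XOR makes the accumulator independent of scan order, which is also what would let it be maintained incrementally (XOR out the old row hash, XOR in the new one). This is an illustrative sketch, not code from the thread:

```python
import hashlib
import sqlite3

def table_digest(con, table):
    """Hash every row, XOR the row hashes together (XOR is commutative,
    so scan order does not matter), then hash the accumulated XOR to get
    a single per-table integrity value."""
    acc = 0
    # table name is assumed trusted here; real code should not
    # interpolate untrusted identifiers into SQL
    for row in con.execute("SELECT * FROM %s" % table):
        h = hashlib.sha256(repr(row).encode()).digest()
        acc ^= int.from_bytes(h, "big")
    return hashlib.sha256(acc.to_bytes(32, "big")).hexdigest()
```

Comparing the computed digest against a stored one detects a silently amputated or altered row even when every btree page still parses as valid.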
Re: [sqlite] About new ticket "Per page/region checksums"
On 06/02/12 11:35, Nico Williams wrote:
> Indeed, but if you'd do integrity protection at the application layer
> then surely you'd have a backup/restore strategy to deal with
> lower-layer corruption.

Only if you know about that corruption! It is perfectly possible for stray zeroes to be written into the btrees in such a way that they are still considered valid, but you have effectively amputated a row.

> Yes, this tends to argue that if you want row-level checksums

My point is that they are fairly useless, since they only work if btrees and indices work perfectly. Anything that could affect rows could also affect the other pieces of information SQLite is storing.

> However, if you're going to have SQLite3 do it

Note that SQLite already has a way of reserving space on a per-page basis. It is currently used by the encryption extension. It is possible to store a checksum there instead, which would still be readable but not writeable by older versions of SQLite. Since the btree pages would also have checksums, things should work in most scenarios. Any solution would be better than the current no solution, especially as checksums let you discover corruption early.

> Well, one can hope. (I don't keep up with btrfs; perhaps it will
> mature and become common on Android devices.)

Approximately zero percent of SQLite databases will be on btrfs for the foreseeable future, even if you and I start using btrfs. Android 3 and above do support encrypting the filesystem, which effectively gives you a lot of the checksumming, although filesystem implementations tend to care far more about metadata than file contents.

Windows 8 Server will be coming with a filesystem that does integrity checking:

http://blogs.msdn.com/b/b8/archive/2012/01/16/building-the-next-generation-file-system-for-windows-refs.aspx

However, again, approximately zero percent of SQLite databases will be stored on that filesystem for the foreseeable future.

SQLite having its own checksumming would solve the integrity issues today, for hundreds of millions of users, especially when developers consider the data very important. It would also resolve some of the corruption finger pointing: if a page has a correct checksum but SQLite considers the contents corrupt, then the most likely explanation is a bug in SQLite, while an invalid checksum points to the contents being different from what SQLite wrote, and hence points fingers at the operating system and storage.

Roger
Re: [sqlite] About new ticket "Per page/region checksums"
On Mon, Feb 6, 2012 at 1:20 PM, Roger Binns wrote:
> On 06/02/12 09:11, Nico Williams wrote:
>> A checksum per row is certainly a valuable thing at the layer above the
>> RDBMS, since it allows for integrity checking above the RDBMS, and in
>> an RDBMS-independent manner.
>
> It doesn't actually help that much, since it covers only a portion of
> the database content. While it will catch an individual row being
> corrupted, it will not catch corruption of the btree that points to the
> table rows.

Indeed, but if you'd do integrity protection at the application layer then surely you'd have a backup/restore strategy to deal with lower-layer corruption.

> If that container is "tweaked" then it could still be sufficiently valid
> but end up omitting the row, so you will get wrong query results. The
> same story applies to indices, where a judicious tweak will not corrupt
> the index but will result in the wrong rows being selected.

Yes, this tends to argue that if you want row-level checksums you also want to have SQLite3 do it, as in Andy's solution. However, if you're going to have SQLite3 do it, I'd rather it followed the ZFS pattern of checksumming entire pages and storing the checksum with the page pointers.

> It is nice that some filesystems are adding integrity protection (and in
> some cases recovery), but approximately zero percent of the systems out
> there running SQLite have their databases stored on such a configured
> filesystem today.

Well, one can hope. (I don't keep up with btrfs; perhaps it will mature and become common on Android devices.)

Nico
--
Re: [sqlite] About new ticket "Per page/region checksums"
On 06/02/12 09:11, Nico Williams wrote:
> A checksum per row is certainly a valuable thing at the layer above the
> RDBMS, since it allows for integrity checking above the RDBMS, and in an
> RDBMS-independent manner.

It doesn't actually help that much, since it covers only a portion of the database content. While it will catch an individual row being corrupted, it will not catch corruption of the btree that points to the table rows. If that container is "tweaked" then it could still be sufficiently valid but end up omitting the row, so you will get wrong query results. The same story applies to indices, where a judicious tweak will not corrupt the index but will result in the wrong rows being selected. Being able to integrity-check the rows that are returned doesn't help if they are the wrong rows!

It is nice that some filesystems are adding integrity protection (and in some cases recovery), but approximately zero percent of the systems out there running SQLite have their databases stored on such a configured filesystem today.

Roger
Re: [sqlite] About new ticket "Per page/region checksums"
A checksum per row is certainly a valuable thing at the layer above the RDBMS, since it allows for integrity checking above the RDBMS, and in an RDBMS-independent manner. Of course, this approach is easiest to graft onto SQLite3, because you'd be adding a hidden column, and the machinery for maintaining and checking it fits into a handful of VM opcodes, thus barely changing the on-disk format, but...

...for an integrity protection scheme within the RDBMS I would argue that it's better to have per-page checksums. The argument is based on the ZFS argument for checksumming entire blocks (and storing the checksum with the block pointers): you then pay the cost of checksumming only once per I/O (whereas with a per-row solution you either pay the checksum penalty every time you read a row, even from cache, or you must do extra bookkeeping to ensure that you checksum no more than once per {I/O, row}). I suppose this might not work out as well for an RDBMS as for ZFS, since ZFS is able to get offsetting benefits, such as reduced I/O times from compressing blocks.

Which brings me to a different point: if you can host a SQLite3 DB on a filesystem like ZFS, then you don't have to concern yourself quite as much with doing data integrity protection within SQLite3. What's the best layer for integrity protection in your application, then? So far we have three possible layers: app layer, SQLite3, filesystem. If you trust your storage to provide integrity protection you might add that as a fourth. I'm not sure that there's a one-size-fits-all answer here, and I find your own solution rather clever (since it is easy to graft in and keep up to date).

One common issue in all cases is how to report integrity check failures, and what else to do about them. This is another area where ZFS gets benefits you're not likely to get from higher layers: since ZFS integrates volume management into the filesystem, it can use integrity checksums to help correct errors when there is redundancy (e.g., mirroring) in the volume.

Nico
--
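The once-per-I/O cost argument can be made concrete with a toy pager: the checksum stored alongside each page "pointer" is verified only when a page is faulted in from storage, and cache hits skip the check entirely. Class and field names here are invented for illustration:

```python
import hashlib

class CheckedPager:
    """Toy pager: verification happens once per I/O, not once per access,
    because a page already in the cache was verified when it was read."""
    def __init__(self, disk, digests):
        self.disk = disk          # page number -> bytes ("storage")
        self.digests = digests    # page number -> sha256 hex stored with the pointer
        self.cache = {}
        self.verifications = 0    # counts checksum computations

    def read(self, pgno):
        if pgno in self.cache:
            return self.cache[pgno]          # cache hit: no checksum work
        data = self.disk[pgno]               # fault the page in
        self.verifications += 1
        if hashlib.sha256(data).hexdigest() != self.digests[pgno]:
            raise IOError("checksum mismatch on page %d" % pgno)
        self.cache[pgno] = data
        return data
```

A per-row scheme, by contrast, either re-verifies on every row read (even from cache) or needs the extra bookkeeping Nico mentions to avoid doing so.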
Re: [sqlite] About new ticket "Per page/region checksums"
On 4th February 2012 12:51pm, Alexey Pechnikov wrote:
> It's very important but there are some questions about
> http://www.sqlite.org/src/info/72b01a982a
> Some time ago DRH wrote that checksum calculation doesn't slow down
> SQLite significantly. But can this be realized in the current SQLite
> 3.x branch? When can a checksum help to restore damaged pages/databases?

Obviously I can't speak for DRH and co., but I can speak from my own experience, and maybe it will be beneficial to the debate.

I implemented a checksumming feature in SQLite back in 2008/9 and have maintained it ever since; I now have it in my branch of 3.7.10. The checksum is an 8-bit CRC and is calculated *per row* on any table for which it is enabled. My implementation adds two bytes of overhead to each row of checksummed tables in the database.

I found that, when used on a 400 MHz PowerPC embedded platform, there was a very small but measurable performance regression when dealing with hundreds of thousands of rows of data, compared to a database file not performing the CRC calculation. On a desktop PC this performance regression was not measurable (i.e. insignificant).

I won't go into the details of exactly how it has been implemented, unless someone is particularly interested, but will say this: since I observed a slight performance regression on slow hardware, I made it possible to turn the feature on and off on a per-table basis. This meant a small addition to the SQL syntax for CREATE TABLE. I wanted the checksum calculation to occur only when the row was created and when explicitly checking for checksum errors, so that performance in general use wouldn't be affected. I also wished the checksum to be "invisible" to the user in normal operation, e.g. when performing a SELECT * FROM table, or when using an INSERT or UPDATE statement.

In the implementation it became necessary to upgrade the file format to "5" to support all these features while remaining compatible with old database files. In addition to PRAGMA legacy_file_format, I also have PRAGMA compatible_file_format, so the user can choose between 0, 4 and 5 as the format for the database file. Only in format "5" can the checksum be used.

The user cannot read the checksum of a row directly; instead I implemented an additional SQL function which checks the checksum of a particular row and returns 0 or 1 depending on whether the row has become corrupted. A simple statement like this:

SELECT rowid FROM table WHERE !valid_row_checksum(rowid);

is enough to find corrupted rows.

If the checksum feature were to be added to SQLite as standard, as proposed by Roger Binns, I would strongly suggest that it be possible to fine-grain the corruption detection to a single row of data. Detection down to an actual column within the row would be nice, but I think too much of an overhead to be truly worthwhile. My implementation also doesn't checksum any data other than row data. For me that wasn't an issue, but I can see that in other applications it may be a nice feature.

I have a complete patch for my implementation, with documentation and test-suite additions. If anyone is interested, even the SQLite developers(!), then drop me an email and I'll post it along. There is no restriction stopping this patch from being released into the public domain.

Andy
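Andy's scheme modifies the file format, but its behaviour can be approximated at the application layer with an ordinary extra column and a user-defined SQL function standing in for his valid_row_checksum (note that stock SQLite spells the negation NOT rather than `!`). Everything here is a sketch of the idea, not his patch; the 8-bit truncation mirrors his 8-bit CRC:

```python
import sqlite3
import zlib

def row_crc(*values):
    """CRC over a row's visible columns, truncated to 8 bits to mirror
    Andy's per-row CRC (a real scheme would hide this column)."""
    return zlib.crc32(repr(values).encode()) & 0xFF

def open_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t(a, b, crc)")
    # stand-in for Andy's valid_row_checksum() SQL function
    con.create_function("valid_row_checksum", 3,
                        lambda a, b, crc: int(row_crc(a, b) == crc))
    return con

def insert(con, a, b):
    """Compute the CRC at row-creation time, as in Andy's design."""
    con.execute("INSERT INTO t VALUES (?, ?, ?)", (a, b, row_crc(a, b)))

def corrupted_rows(con):
    return [r[0] for r in con.execute(
        "SELECT rowid FROM t WHERE NOT valid_row_checksum(a, b, crc)")]
```

Unlike the real patch, the crc column here is visible to SELECT * and must be maintained by the application on UPDATE, which is exactly the bookkeeping a hidden-column implementation removes.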
[sqlite] About new ticket "Per page/region checksums"
It's very important, but there are some questions about http://www.sqlite.org/src/info/72b01a982a

Some time ago DRH wrote that checksum calculation doesn't slow down SQLite significantly. But can this be realized in the current SQLite 3.x branch? When can a checksum help to restore damaged pages/databases? Does the powersafe overwrite feature conflict with the idea of per-page checksums?

P.S. The article "Berkeley DB Recoverability" provides some info about the WAL+checksum mode: http://help.bdbxml.net/html/ed539869-eebd-478f-97de-7e5377e87f66.htm

--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/