Re: [sqlite] Corrupted database: On tree page 76852 cell 303: Rowid 18741471 out of order
On Wed, 5 Dec 2018 05:21:30 + Simon Slavin wrote: > On 5 Dec 2018, at 5:16am, Ryan Schmidt wrote: > > > https://kb.vmware.com/s/article/1008542 > > > > "VMware ESX acknowledges a write or read to a guest operating > > system only after that write or read is acknowledged by the > > hardware controller to ESX. Applications running inside virtual > > machines on ESX are afforded the same crash consistency guarantees > > as applications running on physical machines or physical disk > > controllers." > > Interesting. That paragraph is a well-written piece of text > explaining the opposite of what I thought. Maybe things have changed > in the past decade. VMware may well be doing the best it can on unreliable hardware. I believe it's common knowledge that consumer-grade hard drives lie when acknowledging writes: the acknowlegement is sent when the data are received into the device's write buffer, not after being written to disk. It's good for benchmarks. No one benchmarks data corruptions. 'Twas ever thus: If you want a reliable database, use a reliable disk. --jkl ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Corrupted database: On tree page 76852 cell 303: Rowid 18741471 out of order
On 5 Dec 2018, at 12:22pm, Ryan Schmidt wrote: > Since I don't know what else may have gone wrong with the contents of the > registry by this point, it seems safer to erase the MacPorts installation and > start fresh. This will take a bit longer as every port has to be re-fetched > and re-installed but at least I'll have confidence in the integrity of the > registry. I agree with your diagnosis. Your software has used the corrupted database and made incorrect changes to it because it trusted corrupt information. Unfortunate. Since you do have the ability to build a new dataset from scratch, I think that's the best way. I still have no better idea of the cause of the corruption. I'd imagine you're still considering what you found in "howtocorrupt". Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Corrupted database: On tree page 76852 cell 303: Rowid 18741471 out of order
On Dec 4, 2018, at 22:42, Simon Slavin wrote: > On 5 Dec 2018, at 3:20am, Ryan Schmidt wrote: > >> $ sqlite3 /opt/local/var/macports/registry/registry.db >> SQLite version 3.25.2 2018-09-25 19:08:10 >> Enter ".help" for usage hints. >> sqlite> .load /tmp/macports.sqlext >> sqlite> pragma integrity_check; >> *** in database main *** >> On tree page 76852 cell 303: Rowid 18741471 out of order >> On tree page 76852 cell 301: Rowid 18741430 out of order >> On tree page 76852 cell 299: Rowid 18741387 out of order >> On tree page 76852 cell 296: Rowid 18741324 out of order > > It is possible that the rows reported as missing are actually deleted rows, > and that tree page 76852 just hasn't been moved from "part of TABLE ports" to > "freed pages". > > Try using ".dump" to dump that database to a text file. Can you read it and > tell whether anything is missing ? You can either read the SQL commands by > eye, or use command-line-tool to ".read" the .sql file to create another > database, and use database tools to interrogate that one. > > Does your database have any relations, either explicitly declared using > FOREIGN KEY or implicit in how your software handles the data ? Presumably > every row in TABLE file should be part of a row in TABLE port, or something > like that. Can you use your understanding of the relation to prove that > certain rows are missing from certain tables ? Does it give you any idea how > much data is missing ? I was able to .dump the data from the corrupt database and .read it into a new database (1.4GB, slightly smaller than the original 1.5GB database, which could be plausible if the original had not been vacuumed?). pragma integrity_check then found no problems in the new database. The tables do have some very rudimentary FOREIGN KEY relations. The "ports" table has an "id" INTEGER primary key, and the "files" table and a couple others have an "id" column that references it. Verifying the correctness of the data just by looking at it seemed infeasible, so, feeling adventurous, I put the new database in place and let MacPorts build a few things, but problems soon became apparent. The "files" table which records the association of files with a port has over 31,000 files associated with a port two or three times each -- duplicate entries. MacPorts didn't expect this condition to exist and didn't react well to it. This table does not have any unique constraints. We probably should have had a unique constraint over the combination of id and path to prevent this condition from existing. After the corruption first occurred, several automated builds went by, activating and deactivating various ports, which would have affected the registry, before I noticed the problem and stopped the automated builds. It's possible that because MacPorts was not able to access the registry properly, it got the wrong idea about what to do, and created those duplicate entries. Since I don't know what else may have gone wrong with the contents of the registry by this point, it seems safer to erase the MacPorts installation and start fresh. This will take a bit longer as every port has to be re-fetched and re-installed but at least I'll have confidence in the integrity of the registry. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Corrupted database: On tree page 76852 cell 303: Rowid 18741471 out of order
On 2018/12/05 5:20 AM, Ryan Schmidt wrote: Hello, I have a corrupt SQLite database about which I'd appreciate your advice. The data is not critical but I'd like to fix it if it's possible and not too time-consuming. Even just knowing why the problem occurred or how to prevent it in the future would be helpful. If there's something the application should be doing differently in its use of the SQLite library to ensure the database doesn't get corrupted that would be good to know. The application that is using the SQLite library (MacPorts) is experiencing this error:// Hi Ryan, the question has two parts: 1 - Can it be fixed? Very probably - by simply dumping it to text and re-importing again. It may need a scan by eye to make sure there are no serious broken text, but usually the Index errors as per your error dump will not be too complicated and easily/automatically corrected. See the .dump command in the sqlite CLI documentation: https://sqlite.org/cli.html#converting_an_entire_database_to_an_ascii_text_file 2 - How did it happen? Hard to say, almost certainly a write that somehow didn't make it to the physical layer. Here is a list of why that can happen (which you may have already encountered in your research): https://sqlite.org/howtocorrupt.html Good luck! Ryan ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Corrupted database: On tree page 76852 cell 303: Rowid 18741471 out of order
On 5 Dec 2018, at 5:16am, Ryan Schmidt wrote: > https://kb.vmware.com/s/article/1008542 > > "VMware ESX acknowledges a write or read to a guest operating system only > after that write or read is acknowledged by the hardware controller to ESX. > Applications running inside virtual machines on ESX are afforded the same > crash consistency guarantees as applications running on physical machines or > physical disk controllers." Interesting. That paragraph is a well-written piece of text explaining the opposite of what I thought. Maybe things have changed in the past decade. I suppose the lie "Your changes have been made on physical hardware, you can now proceed." may be somewhere else in your system (e.g. the hardware controller). Or maybe I'm barking up the wrong tree. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Corrupted database: On tree page 76852 cell 303: Rowid 18741471 out of order
On Dec 4, 2018, at 22:42, Simon Slavin wrote: > On 5 Dec 2018, at 3:20am, Ryan Schmidt wrote: > >> $ sqlite3 /opt/local/var/macports/registry/registry.db >> SQLite version 3.25.2 2018-09-25 19:08:10 >> Enter ".help" for usage hints. >> sqlite> .load /tmp/macports.sqlext >> sqlite> pragma integrity_check; >> *** in database main *** >> On tree page 76852 cell 303: Rowid 18741471 out of order >> On tree page 76852 cell 301: Rowid 18741430 out of order >> On tree page 76852 cell 299: Rowid 18741387 out of order >> On tree page 76852 cell 296: Rowid 18741324 out of order > > It is possible that the rows reported as missing are actually deleted rows, > and that tree page 76852 just hasn't been moved from "part of TABLE ports" to > "freed pages". > > Try using ".dump" to dump that database to a text file. Can you read it and > tell whether anything is missing ? You can either read the SQL commands by > eye, or use command-line-tool to ".read" the .sql file to create another > database, and use database tools to interrogate that one. > > Does your database have any relations, either explicitly declared using > FOREIGN KEY or implicit in how your software handles the data ? Presumably > every row in TABLE file should be part of a row in TABLE port, or something > like that. Can you use your understanding of the relation to prove that > certain rows are missing from certain tables ? Does it give you any idea how > much data is missing ? Thanks very much for taking the time to respond. I'll look into what you suggested. >> the VMware virtual machine the builder runs on froze and had to be manually >> powered off. Upon restarting the VM, registry operations began failing with >> the above error. > > I'm 95% certain that your VMware software was caching changes made to the > database file rather than flushing them to disk when the software told it to. > Under default configuration they all do this, and this does not mean that > VMWare is worse than any competing product. > > In other words, the cause of corruption was that you were using a virtual > machine and not real hardware, and that the virtual machine violates ACID for > speed. It's possible that WMware has settings which correct this problem. > If it does, using them will, of course, slow it down. That occurred to me as well. But from what I can tell initially, writes should not be cached: https://kb.vmware.com/s/article/1008542 "VMware ESX acknowledges a write or read to a guest operating system only after that write or read is acknowledged by the hardware controller to ESX. Applications running inside virtual machines on ESX are afforded the same crash consistency guarantees as applications running on physical machines or physical disk controllers." ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Corrupted database: On tree page 76852 cell 303: Rowid 18741471 out of order
On 5 Dec 2018, at 3:20am, Ryan Schmidt wrote: > $ sqlite3 /opt/local/var/macports/registry/registry.db > SQLite version 3.25.2 2018-09-25 19:08:10 > Enter ".help" for usage hints. > sqlite> .load /tmp/macports.sqlext > sqlite> pragma integrity_check; > *** in database main *** > On tree page 76852 cell 303: Rowid 18741471 out of order > On tree page 76852 cell 301: Rowid 18741430 out of order > On tree page 76852 cell 299: Rowid 18741387 out of order > On tree page 76852 cell 296: Rowid 18741324 out of order It is possible that the rows reported as missing are actually deleted rows, and that tree page 76852 just hasn't been moved from "part of TABLE ports" to "freed pages". Try using ".dump" to dump that database to a text file. Can you read it and tell whether anything is missing ? You can either read the SQL commands by eye, or use command-line-tool to ".read" the .sql file to create another database, and use database tools to interrogate that one. Does your database have any relations, either explicitly declared using FOREIGN KEY or implicit in how your software handles the data ? Presumably every row in TABLE file should be part of a row in TABLE port, or something like that. Can you use your understanding of the relation to prove that certain rows are missing from certain tables ? Does it give you any idea how much data is missing ? > the VMware virtual machine the builder runs on froze and had to be manually > powered off. Upon restarting the VM, registry operations began failing with > the above error. I'm 95% certain that your VMware software was caching changes made to the database file rather than flushing them to disk when the software told it to. Under default configuration they all do this, and this does not mean that VMWare is worse than any competing product. In other words, the cause of corruption was that you were using a virtual machine and not real hardware, and that the virtual machine violates ACID for speed. It's possible that WMware has settings which correct this problem. If it does, using them will, of course, slow it down. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users