Re: [sqlite] Corrupted database: On tree page 76852 cell 303: Rowid 18741471 out of order

2018-12-05 Thread James K. Lowden
On Wed, 5 Dec 2018 05:21:30 +
Simon Slavin  wrote:

> On 5 Dec 2018, at 5:16am, Ryan Schmidt  wrote:
> 
> > https://kb.vmware.com/s/article/1008542
> > 
> > "VMware ESX acknowledges a write or read to a guest operating
> > system only after that write or read is acknowledged by the
> > hardware controller to ESX. Applications running inside virtual
> > machines on ESX are afforded the same crash consistency guarantees
> > as applications running on physical machines or physical disk
> > controllers."
> 
> Interesting.  That paragraph is a well-written piece of text
> explaining the opposite of what I thought.  Maybe things have changed
> in the past decade.

VMware may well be doing the best it can on unreliable hardware.  I
believe it's common knowledge that consumer-grade hard drives lie when
acknowledging writes: the acknowlegement is sent when the data are
received into the device's write buffer, not after being written to
disk.  It's good for benchmarks.  No one benchmarks data corruptions.  

'Twas ever thus: If you want a reliable database, use a reliable disk.  

--jkl
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corrupted database: On tree page 76852 cell 303: Rowid 18741471 out of order

2018-12-05 Thread Simon Slavin
On 5 Dec 2018, at 12:22pm, Ryan Schmidt  wrote:

> Since I don't know what else may have gone wrong with the contents of the 
> registry by this point, it seems safer to erase the MacPorts installation and 
> start fresh. This will take a bit longer as every port has to be re-fetched 
> and re-installed but at least I'll have confidence in the integrity of the 
> registry.

I agree with your diagnosis.  Your software has used the corrupted database and 
made incorrect changes to it because it trusted corrupt information.  
Unfortunate.  Since you do have the ability to build a new dataset from 
scratch, I think that's the best way.

I still have no better idea of the cause of the corruption.  I'd imagine you're 
still considering what you found in "howtocorrupt".

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corrupted database: On tree page 76852 cell 303: Rowid 18741471 out of order

2018-12-05 Thread Ryan Schmidt


On Dec 4, 2018, at 22:42, Simon Slavin wrote:

> On 5 Dec 2018, at 3:20am, Ryan Schmidt wrote:
> 
>> $ sqlite3 /opt/local/var/macports/registry/registry.db
>> SQLite version 3.25.2 2018-09-25 19:08:10
>> Enter ".help" for usage hints.
>> sqlite> .load /tmp/macports.sqlext
>> sqlite> pragma integrity_check;
>> *** in database main ***
>> On tree page 76852 cell 303: Rowid 18741471 out of order
>> On tree page 76852 cell 301: Rowid 18741430 out of order
>> On tree page 76852 cell 299: Rowid 18741387 out of order
>> On tree page 76852 cell 296: Rowid 18741324 out of order
> 
> It is possible that the rows reported as missing are actually deleted rows, 
> and that tree page 76852 just hasn't been moved from "part of TABLE ports" to 
> "freed pages".
> 
> Try using ".dump" to dump that database to a text file.  Can you read it and 
> tell whether anything is missing ?  You can either read the SQL commands by 
> eye, or use command-line-tool to ".read" the .sql file to create another 
> database, and use database tools to interrogate that one.
> 
> Does your database have any relations, either explicitly declared using 
> FOREIGN KEY or implicit in how your software handles the data ?  Presumably 
> every row in TABLE file should be part of a row in TABLE port, or something 
> like that.  Can you use your understanding of the relation to prove that 
> certain rows are missing from certain tables ?  Does it give you any idea how 
> much data is missing ?

I was able to .dump the data from the corrupt database and .read it into a new 
database (1.4GB, slightly smaller than the original 1.5GB database, which could 
be plausible if the original had not been vacuumed?). pragma integrity_check 
then found no problems in the new database.

The tables do have some very rudimentary FOREIGN KEY relations. The "ports" 
table has an "id" INTEGER primary key, and the "files" table and a couple 
others have an "id" column that references it.

Verifying the correctness of the data just by looking at it seemed infeasible, 
so, feeling adventurous, I put the new database in place and let MacPorts build 
a few things, but problems soon became apparent. The "files" table which 
records the association of files with a port has over 31,000 files associated 
with a port two or three times each -- duplicate entries. MacPorts didn't 
expect this condition to exist and didn't react well to it. This table does not 
have any unique constraints. We probably should have had a unique constraint 
over the combination of id and path to prevent this condition from existing. 
After the corruption first occurred, several automated builds went by, 
activating and deactivating various ports, which would have affected the 
registry, before I noticed the problem and stopped the automated builds. It's 
possible that because MacPorts was not able to access the registry properly, it 
got the wrong idea about what to do, and created those duplicate entries.

Since I don't know what else may have gone wrong with the contents of the 
registry by this point, it seems safer to erase the MacPorts installation and 
start fresh. This will take a bit longer as every port has to be re-fetched and 
re-installed but at least I'll have confidence in the integrity of the registry.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corrupted database: On tree page 76852 cell 303: Rowid 18741471 out of order

2018-12-04 Thread R Smith

On 2018/12/05 5:20 AM, Ryan Schmidt wrote:

Hello,

I have a corrupt SQLite database about which I'd appreciate your advice. The 
data is not critical but I'd like to fix it if it's possible and not too 
time-consuming. Even just knowing why the problem occurred or how to prevent it 
in the future would be helpful. If there's something the application should be 
doing differently in its use of the SQLite library to ensure the database 
doesn't get corrupted that would be good to know.

The application that is using the SQLite library (MacPorts) is experiencing 
this error://


Hi Ryan, the question has two parts:
1 - Can it be fixed?
Very probably - by simply dumping it to text and re-importing again. It 
may need a scan by eye to make sure there are no serious broken text, 
but usually the Index errors as per your error dump will not be too 
complicated and easily/automatically corrected.

See the .dump command in the sqlite CLI documentation:
https://sqlite.org/cli.html#converting_an_entire_database_to_an_ascii_text_file


2 - How did it happen?
Hard to say, almost certainly a write that somehow didn't make it to the 
physical layer.
Here is a list of why that can happen (which you may have already 
encountered in your research):

https://sqlite.org/howtocorrupt.html


Good luck!
Ryan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corrupted database: On tree page 76852 cell 303: Rowid 18741471 out of order

2018-12-04 Thread Simon Slavin
On 5 Dec 2018, at 5:16am, Ryan Schmidt  wrote:

> https://kb.vmware.com/s/article/1008542
> 
> "VMware ESX acknowledges a write or read to a guest operating system only 
> after that write or read is acknowledged by the hardware controller to ESX. 
> Applications running inside virtual machines on ESX are afforded the same 
> crash consistency guarantees as applications running on physical machines or 
> physical disk controllers."

Interesting.  That paragraph is a well-written piece of text explaining the 
opposite of what I thought.  Maybe things have changed in the past decade.

I suppose the lie "Your changes have been made on physical hardware, you can 
now proceed." may be somewhere else in your system (e.g. the hardware 
controller).  Or maybe I'm barking up the wrong tree.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corrupted database: On tree page 76852 cell 303: Rowid 18741471 out of order

2018-12-04 Thread Ryan Schmidt


On Dec 4, 2018, at 22:42, Simon Slavin wrote:

> On 5 Dec 2018, at 3:20am, Ryan Schmidt wrote:
> 
>> $ sqlite3 /opt/local/var/macports/registry/registry.db
>> SQLite version 3.25.2 2018-09-25 19:08:10
>> Enter ".help" for usage hints.
>> sqlite> .load /tmp/macports.sqlext
>> sqlite> pragma integrity_check;
>> *** in database main ***
>> On tree page 76852 cell 303: Rowid 18741471 out of order
>> On tree page 76852 cell 301: Rowid 18741430 out of order
>> On tree page 76852 cell 299: Rowid 18741387 out of order
>> On tree page 76852 cell 296: Rowid 18741324 out of order
> 
> It is possible that the rows reported as missing are actually deleted rows, 
> and that tree page 76852 just hasn't been moved from "part of TABLE ports" to 
> "freed pages".
> 
> Try using ".dump" to dump that database to a text file.  Can you read it and 
> tell whether anything is missing ?  You can either read the SQL commands by 
> eye, or use command-line-tool to ".read" the .sql file to create another 
> database, and use database tools to interrogate that one.
> 
> Does your database have any relations, either explicitly declared using 
> FOREIGN KEY or implicit in how your software handles the data ?  Presumably 
> every row in TABLE file should be part of a row in TABLE port, or something 
> like that.  Can you use your understanding of the relation to prove that 
> certain rows are missing from certain tables ?  Does it give you any idea how 
> much data is missing ?

Thanks very much for taking the time to respond. I'll look into what you 
suggested.


>> the VMware virtual machine the builder runs on froze and had to be manually 
>> powered off. Upon restarting the VM, registry operations began failing with 
>> the above error.
> 
> I'm 95% certain that your VMware software was caching changes made to the 
> database file rather than flushing them to disk when the software told it to. 
>  Under default configuration they all do this, and this does not mean that 
> VMWare is worse than any competing product.
> 
> In other words, the cause of corruption was that you were using a virtual 
> machine and not real hardware, and that the virtual machine violates ACID for 
> speed.  It's possible that WMware has settings which correct this problem.  
> If it does, using them will, of course, slow it down.

That occurred to me as well. But from what I can tell initially, writes should 
not be cached:

https://kb.vmware.com/s/article/1008542

"VMware ESX acknowledges a write or read to a guest operating system only after 
that write or read is acknowledged by the hardware controller to ESX. 
Applications running inside virtual machines on ESX are afforded the same crash 
consistency guarantees as applications running on physical machines or physical 
disk controllers."


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corrupted database: On tree page 76852 cell 303: Rowid 18741471 out of order

2018-12-04 Thread Simon Slavin
On 5 Dec 2018, at 3:20am, Ryan Schmidt  wrote:

> $ sqlite3 /opt/local/var/macports/registry/registry.db
> SQLite version 3.25.2 2018-09-25 19:08:10
> Enter ".help" for usage hints.
> sqlite> .load /tmp/macports.sqlext
> sqlite> pragma integrity_check;
> *** in database main ***
> On tree page 76852 cell 303: Rowid 18741471 out of order
> On tree page 76852 cell 301: Rowid 18741430 out of order
> On tree page 76852 cell 299: Rowid 18741387 out of order
> On tree page 76852 cell 296: Rowid 18741324 out of order

It is possible that the rows reported as missing are actually deleted rows, and 
that tree page 76852 just hasn't been moved from "part of TABLE ports" to 
"freed pages".

Try using ".dump" to dump that database to a text file.  Can you read it and 
tell whether anything is missing ?  You can either read the SQL commands by 
eye, or use command-line-tool to ".read" the .sql file to create another 
database, and use database tools to interrogate that one.

Does your database have any relations, either explicitly declared using FOREIGN 
KEY or implicit in how your software handles the data ?  Presumably every row 
in TABLE file should be part of a row in TABLE port, or something like that.  
Can you use your understanding of the relation to prove that certain rows are 
missing from certain tables ?  Does it give you any idea how much data is 
missing ?

> the VMware virtual machine the builder runs on froze and had to be manually 
> powered off. Upon restarting the VM, registry operations began failing with 
> the above error.

I'm 95% certain that your VMware software was caching changes made to the 
database file rather than flushing them to disk when the software told it to.  
Under default configuration they all do this, and this does not mean that 
VMWare is worse than any competing product.

In other words, the cause of corruption was that you were using a virtual 
machine and not real hardware, and that the virtual machine violates ACID for 
speed.  It's possible that WMware has settings which correct this problem.  If 
it does, using them will, of course, slow it down.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users