On Tue, Feb 7, 2012 at 6:31 PM, David Barrett <dbarr...@expensify.com> wrote:

> On 02/07/2012 03:00 PM, Richard Hipp wrote:
>
>> On Tue, Feb 7, 2012 at 5:19 PM, David Barrett <dbarr...@expensify.com>
>> wrote:
>>
>>> 2) However, we get erratic behavior when using the sqlite3 command-line
>>> tool to just do a basic select on the database: sometimes it works,
>>> sometimes it returns "Error: database disk image malformed".  Sometimes
>>> we just run the same command many times until it works.
>>>
>>
>> As the very first thing you do in the command-line tool, enter this
>> command:
>>
>>      .log stdout
>>
>> That will cause additional error diagnostics to appear on standard output.
>> Then do your commands that provoke the malformed error, and let us know
>> what you see as output.
>>
>
> Great idea.  Here's the output:
>
> SQLite version 3.7.2
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite> .log stdout
> sqlite> select count(*) from **redacted**;
> (11) database corruption at line 45894 of [42537b6056]
>

This tells me that the error is occurring at
http://www.sqlite.org/src/artifact/5047fb303cdf6?ln=1362 which occurs right
as SQLite is first starting to decode a page that it has loaded from the
disk.  The error indicates that the shell really is seeing a malformed
database file.

Can you tell me more about your "custom distributed transaction layer"?
Might that have something to do with this?  Are you using a custom VFS?
Are you bypassing the built-in locking mechanisms of SQLite and doing some
kind of custom locking?  Are you running this on a network filesystem?
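
One part of this can be checked quickly: whether the database is in WAL
mode, and whether its -wal and -shm files are visible from the machine where
the command-line tool runs.  The following is only a rough sketch, assuming
Python's standard sqlite3 module is available there; describe_database is a
hypothetical helper name, not part of SQLite.

```python
import os
import sqlite3


def describe_database(path):
    """Report the journal mode and whether the WAL side files are visible."""
    # Look for the side files first: opening a connection can create them,
    # and closing the last connection can remove them.
    wal_present = os.path.exists(path + "-wal")
    shm_present = os.path.exists(path + "-shm")

    conn = sqlite3.connect(path)
    try:
        mode = conn.execute("PRAGMA journal_mode;").fetchone()[0]
    finally:
        conn.close()

    return {
        "journal_mode": mode,
        "wal_file_present": wal_present,
        "shm_file_present": shm_present,
    }
```

If this reports "wal" but the side files are missing while the servers are
running, the tool is probably not looking at the same files as the servers.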

I don't have much to go on here, but my instinct is to look for a broken
locking implementation that allows the servers to change the database out
from under the command-line tool.  I'm guessing that perhaps the
command-line tool does not compile-in the "custom distributed transaction
layer" and hence the command-line tool is not properly setting the locks
that tell the servers "I'm reading this, so don't change it out from under
me" and so the busy servers do end up changing the data out from under the
command-line tool.  Or, perhaps you are running the command-line tool on a
different machine where it is not able to access the WAL's database-shm
file in shared memory.  So the command-line tool reads one page of the
database, which indicates that the content it is seeking is found on some
other page X.  But by the time the command-line tool has loaded page X, the
server has already shifted the content to someplace else.  The page that
the command-line tool loaded is no longer formatted as the command-line
tool expects it to be, causing exactly the error shown above.
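
To make that sequence concrete, here is a toy model in Python.  It is only an
illustration under loud assumptions: ToyPagedFile, move_row and read_row are
made-up names, the "pages" are plain tuples rather than SQLite's real page
format, and the reader count stands in for the real file locks.

```python
class ToyPagedFile:
    """Toy stand-in for a paged database file (not SQLite's real format)."""

    def __init__(self):
        # Page 1 points at the page that currently holds the row.
        self.pages = {1: ("index", 2), 2: ("leaf", "row data"), 3: ("free", None)}
        self.readers = 0  # readers currently holding the shared lock

    def move_row(self):
        """Writer: relocate the row from page 2 to page 3 unless a reader holds the lock."""
        if self.readers > 0:
            return False  # busy: the writer has to wait
        self.pages[3] = self.pages[2]
        self.pages[2] = ("free", None)
        self.pages[1] = ("index", 3)
        return True


def read_row(db, take_lock):
    """Reader: follow page 1 to page X while a writer tries to run in between."""
    if take_lock:
        db.readers += 1
    try:
        _, target = db.pages[1]           # page 1 says the row is on page X
        db.move_row()                     # the busy server attempts a write now
        kind, payload = db.pages[target]  # load page X
        if kind != "leaf":
            return "Error: database disk image is malformed"
        return payload
    finally:
        if take_lock:
            db.readers -= 1


print(read_row(ToyPagedFile(), take_lock=True))
print(read_row(ToyPagedFile(), take_lock=False))
```

With the lock taken, the writer is held off and the reader gets its row.
Without it, the reader loads a page that no longer has the expected format,
even though the file itself is consistent before and after the write.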



> (11) database corruption at line 45932 of [42537b6056]
> (11) statement aborts at 16: [select count(*) from **redacted**;] database
> disk image is malformed
>
> Error: database disk image is malformed
> sqlite>
>
> It happens very erratically, and each time we've run "PRAGMA
> integrity_check;" after seeing the problem (which requires several hours of
> downtime for that server, so I didn't do it for the above query), it comes
> up clean every single time.
>
> Thanks for your help!
>
> -david
>
> PS: I apologize for redacting the query -- let me know if that would be
> particularly helpful, otherwise I'd like to keep it private.
>
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
