On Tue, Feb 7, 2012 at 6:31 PM, David Barrett <dbarr...@expensify.com> wrote:

> On 02/07/2012 03:00 PM, Richard Hipp wrote:
>
>> On Tue, Feb 7, 2012 at 5:19 PM, David Barrett <dbarr...@expensify.com>
>> wrote:
>>
>>> 2) However, we get erratic behavior when using the sqlite3 command-line
>>> tool to just do a basic select on the database: sometimes it works,
>>> sometimes it returns "Error: database disk image malformed". Sometimes
>>> we just run the same command many times until it works.
>>
>> As the very first thing you do in the command-line tool, enter this
>> command:
>>
>> .log stdout
>>
>> That will cause additional error diagnostics to appear on standard output.
>> Then do your commands that provoke the malformed error, and let us know
>> what you see as output.
>
> Great idea. Here's the output:
>
> SQLite version 3.7.2
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite> .log stdout
> sqlite> select count(*) from **redacted**;
> (11) database corruption at line 45894 of [42537b6056]

This tells me that the error is occurring at

http://www.sqlite.org/src/artifact/5047fb303cdf6?ln=1362

which occurs right as SQLite is first starting to decode a page that it
has loaded from the disk. The error indicates that the shell really is
seeing a malformed database file.

Can you tell me more about your "custom distributed transaction layer"?
Might that have something to do with this? Are you using a custom VFS?
Are you bypassing the built-in locking mechanisms of SQLite and doing some
kind of custom locking? Are you running this on a network filesystem?

I don't have much to go on here, but my instinct is to look for a broken
locking implementation that allows the servers to change the database out
from under the command-line tool.

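As an illustration of the built-in locking being asked about here, the
following is a minimal sketch using Python's standard sqlite3 module. It
assumes the default rollback-journal mode (in WAL mode readers do not
block writers, so the behavior differs), and the table name `t`, the file
name `demo.db`, and both function names are hypothetical. It is not taken
from the setup discussed in this thread; it only shows that, with locking
intact, a writer cannot commit while another connection holds an open
read transaction.

```python
import os
import sqlite3
import tempfile


def writer_blocked_by_reader(db_path):
    """Return True if a write fails with 'database is locked' while a
    second connection holds an open read transaction."""
    # isolation_level=None: autocommit, so BEGIN/COMMIT are issued explicitly.
    # timeout=0: a busy lock fails immediately instead of being retried.
    reader = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    writer = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        writer.execute("CREATE TABLE IF NOT EXISTS t(x)")
        writer.execute("INSERT INTO t VALUES (1)")

        # The reader starts a transaction and reads; it now holds a SHARED
        # lock on the database file until it commits.
        reader.execute("BEGIN")
        reader.execute("SELECT count(*) FROM t").fetchone()

        try:
            # Committing this write needs an EXCLUSIVE lock, which the
            # reader's SHARED lock prevents.
            writer.execute("INSERT INTO t VALUES (2)")
            blocked = False
        except sqlite3.OperationalError as exc:
            blocked = "locked" in str(exc)

        reader.execute("COMMIT")                    # release the SHARED lock
        writer.execute("INSERT INTO t VALUES (3)")  # the write now goes through
        return blocked
    finally:
        reader.close()
        writer.close()


def demo():
    """Run the check against a throwaway database file."""
    with tempfile.TemporaryDirectory() as tmp:
        return writer_blocked_by_reader(os.path.join(tmp, "demo.db"))
```

If a check along these lines reported that the writer was not blocked on
the filesystem in question, that would point toward locks not being
honored there, which is the kind of failure suspected above.
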
I'm guessing that perhaps the command-line tool does not compile-in the
"custom distributed transaction layer" and hence the command-line tool is
not properly setting the locks that tell the servers "I'm reading this, so
don't change it out from under me", and so the busy servers do end up
changing the data out from under the command-line tool. Or, perhaps you
are running the command-line tool on a different machine where it is not
able to access the WAL's database-shm file in shared memory.

So the command-line tool reads one page of the database, which indicates
that the content it is seeking is found on some other page X. But by the
time the command-line tool has loaded page X, the server has already
shifted the content to someplace else. The page that the command-line
tool loaded is no longer formatted as the command-line tool expects it to
be, causing exactly the error shown above.

> (11) database corruption at line 45932 of [42537b6056]
> (11) statement aborts at 16: [select count(*) from **redacted**;] database
> disk image is malformed
>
> Error: database disk image is malformed
> sqlite>
>
> It happens very erratically, and each time we've run "PRAGMA
> integrity_check;" after seeing the problem (which requires several hours of
> downtime for that server, so I didn't do it for the above query), it comes
> up clean every single time.
>
> Thanks for your help!
>
> -david
>
> PS: I apologize for redacting the query -- let me know if that would be
> particularly helpful, otherwise I'd like to keep it private.
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

--
D. Richard Hipp
d...@sqlite.org