We've been trying to figure out a database corruption issue for a week, and 
think we've got it figured out. It involves the "zombie database handle" state 
induced by sqlite3_close_v2:

> If sqlite3_close_v2() is called on a database connection that still has 
> outstanding prepared statements, BLOB handles, and/orsqlite3_backup objects 
> then it returns SQLITE_OK and the deallocation of resources is deferred until 
> all prepared statements, BLOB handles, and sqlite3_backup objects are also 
> destroyed.

In general this is useful behavior for us, as using Java bindings to our API 
can result in statement handles hanging around longer than they ought to until 
the JVM runs finalizers. With sqlite3_close_v2, closing the database doesn't 
fail in these circumstances. (Which is, I believe, why this function was added 
to SQLite in the first place.)

The situation where things go wrong is:
1. Database handle A is opened
2. A compiled statement is created and run. There's a Java object abstracting 
this sqlite3_stmt.
3. The database is closed with sqlite3_close_v2. The statement is still alive, 
so the database handle secretly remains open in "zombie mode".
4. We delete the database's parent directory and all files in it.
5. We create a new directory (with the same path as the old one) and call 
sqlite3_open_v2 to create a new empty database with handle B.
6. The JVM gets around to running finalizers, causing the sqlite3_stmt to be 
freed, which makes the zombie handle closable.
7. The internal function sqlite3LeaveMutexAndCloseZombie() closes the file 
handle … and also deletes the -wal and -shm files.

Now we're in trouble: the -wal and -shm files that just got deleted happen to 
belong to the *newly created* database, not the old one, so SQLite just nuked a 
live WAL. At this point the next SQLite call tends to return error 522, "disk 
I/O error", and the new db is unreadable.

This scenario isn't covered in "How To Corrupt An SQLite Database File", 
although what sqlite3LeaveMutexAndCloseZombie() does is similar to sections 1.3 
or 2.4.

This sequence of events may look weird at this low level, but at a higher level 
they're not uncommon, especially in our own unit tests. The test suite is using 
our API to create a database (which we bundle in a directory), run a test on 
it, close it, then delete it. After that, the _next_ unit test does exactly the 
same thing, just with a different set of test functions. So the same database 
path ends up getting opened, closed, deleted, reopened over and over. (This 
pattern happens in real use too, not just in testing.)

At the moment I'm not sure what to do about this. The two options seem to be
(a) Stop using sqlite3_close_v2. This means we have to implement our own 
mechanism to forcibly close all open statements when requested to close a 
database, and clear the statement handles in the objects holding them.
(b) Avoid ever reusing the same path. This means we'd have to give the database 
file inside the wrapper directory a different name every time — instead of just 
"db.sqlite3" it'd include a UUID or timestamp or something. That of course 
complicates opening a database, necessitating a directory traversal to discover 
the file in the directory.

Anyone else run into this? Anyone agree it should be added to "How To Corrupt"? 
I don't know if it can be fixed inside SQLite...

—Jens
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to