On Wed, Oct 10, 2012 at 7:52 AM, Clemens Ladisch <clem...@ladisch.de> wrote:

> (CC'd sqlite-users ML)
> Theodore Ts'o wrote:
> >
> > What bothers me is no one is asking the question, *why* is Android
> > (and more specifically SQLite and the applications which call SQLite)
> > using fsync's so often?  These aren't transaction processing systems,
> > after all.
>

SQLite (and every other transactional storage system) needs a write-barrier
operation in order to prevent database corruption on an power-loss or hard
reset.  By "write-barrier" I mean some method of ensuring that all write
operations (on a particular pair of files, collectively) that occur before
the write-barrier must persist to flash prior to any write operations that
occur after the write-barrier.  In other words, no write operations are
allowed to be re-ordered across the write-barrier.  Without a write-barrier
of some kind, it is not possible to ensure the integrity of a transaction
across a power-loss.

The only write-barrier operation available to us on unix is fsync().  In
the default rollback-journal modes of SQLite, a write-barrier is required
for every SQL-level transaction.  This means that with SQLite in rollback
mode, lots of fsync() operations are occurring.

SQLite also support a write-ahead log (WAL) mode also exists that works
fine on android, and WAL mode only requires a write-barrier on a
checkpoint.  Checkpoints (normally) occur far less often than transactions,
and hence far fewer fsyncs() are required.  However WAL mode requires a
shared-memory segment accessible to all processes.  The shared memory is
used to coordinate access to the WAL file.  On unix, this shared-memory is
obtained using mmap() of a temporary file that is created in the same
directory as the original database.  Unprivileged processes running in
sandboxes without write access to the directory containing the database
file cannot create this temporary file used to implement shared memory, and
thus cannot use WAL mode.  And due to security concerns, a lot of processes
on phones tend to run in unprivileged sandboxes, meaning that they have
difficulty with WAL mode.  There are ways to work around this limitation of
sandboxes, and iOS does make use of those work-arounds.  But Android never
has tried to do so.

The shared-memory temporary files do not have to be in the same directory
as the database.  You can recompile SQLite with the
SQLITE_SHM_DIRECTORY=/dev/shm compile-time option to cause all
shared-memory temporary files to be put in some common place (like
/dev/shm) which is accessible to all processes.  This works fine for many
processes, but fails utterly for processes that try to open SQLite
databases following a chroot().  So it is not the default mode of
operation.  Are there any chroot() processes on Android that use SQLite?
If not, then the SQLITE_SHM_DIRECTORY compile-time option might be a good
idea there.

We would really, really love to have some kind of write-barrier that is
lighter than fsync().  If there is some method other than fsync() for
forcing a write-barrier on Linux that we don't know about, please enlighten
us.

We would also love to have guidance on alternative techniques for obtaining
memory shared across multiple processes that does not involve mmap() of
temporary files.

-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to