On Sun, 03 May 2015 13:09:43 -0700
Scott Doctor <scott at scottdoctor.com> wrote:

> I wonder why the network developers have not appropriately addressed
> this issue. Looks like I need to research this problem more before
> implementing. 

From one perspective -- a common use case -- they *have* addressed this
issue.  But first let's be clear on what's going on and why network
filesystems are absolutely, positively unreliable for SQLite.  

Consider a 1-byte file containing the letter 'A'.  Two processes open
it read-write, and read the file.  The first time read(2) is called, the
'A' is read from the server's file into the client NFS (or whatever)
driver. Thereafter, on each machine, each iteration of 

        lseek(fd, 0, SEEK_SET);
        read(fd, buf, 1);

will read from the filebuffer cache, not from the server, at least for
a while.  

Now the first machine writes 'B' to the file.  On that machine, the
above iteration returns 'B' because that machine's filebuffer cache was
updated when the 'B' was sent to the server.  The first machine's cache
is coherent with the file contents on the server.  

The second machine is *not* notified of the update.  Reads continue to
be satisfied from cache, and the cache continues to contain 'A'.  The
cache is inconsistent with the file contents on the server.  

That is the way most remote filesystems are designed and implemented
and documented.  Cf. http://www.ietf.org/rfc/rfc1813.txt:

   4.11 Caching policies

   The NFS version 3 protocol does not define a policy for
   caching on the client or server. In particular, there is no
   support for strict cache consistency between a client and
   server, nor between different clients. See [Kazar] for a
   discussion of the issues of cache synchronization and
   mechanisms in several distributed file systems.

If you can find documentation for cache semantics for CIFS, I'd be
interested to see it.  

This is why I keep reminding the list that problems with remote
filesystems aren't due to "bugs".  They work as designed and as
documented.  They just don't work as expected, when "expected"
means, "just like local filesystems (only slower)".  

A moment's thought reveals why they work this way.  Network-wide
client-cache coherency has O(n) complexity, where n is the number of
clients.  It's fraught with timing and performance issues.  

We haven't even mentioned SQLite per se.  I don't know where SQLite
tracks table-update status.  I do know that some of the filebuffer
cache data are in SQLite variables.  Some invariants about
thread-of-control consistency of those variables do not hold when the
underlying file is changed by an "external" process, which is what a
SQLite client running on another machine is.  The situation invites
textbook write-after-read cache-coherency problems.  

The problems that SQLite has with network filesystems aren't the least
bit mysterious if you scratch the surface.  RFC 1813 is in the public
domain, and references several papers that are freely available.  One
evening will convince you.  

You might ask, if the problems are hard but solvable, why not solve
them anyway, and perhaps make the semantics configurable on a
correctness-performance tradeoff?  I think the answer is that most
applications don't require it!  Most remote-file applications read the
entire file into memory, modify the data in memory, and write the file
back out minutes or hours later when the user saves it.  For them,
last-writer-wins is OK.  For applications that need to coordinate
commingled writes to a shared file, the dominant solution is a
server-based DBMS such as Postgres.  

--jkl
