On Tue, 2005-08-02 at 17:36 +0200, djm wrote:
> Hello,
> 
> Thanks very much for the detailed and helpful response. I'll certainly
> have a look at the book you recommended. Thanks for the tip.
> 
> Sunday, July 31, 2005, 4:47:11 PM, you wrote:
> 
> > As a result, any piece of software that relies on these semantics is
> > bound to be surprised.
> 
> But in this case the piece of software is SQLite, and I'm not familiar
> enough with its locking and other internals to know exactly how safe
> it is to do what. As far as I can tell it's just reading and writing
> SQL commands that are of primary interest, and it's up to SQLite to
> make this as safe as it can, and to state under which conditions it
> can't. The information on the website was pretty vague. It shouldn't
> really be a prerequisite to be intimately familiar with various
> operating systems and/or filesystems in order to use SQLite or another
> database on a network effectively (not that such knowledge wouldn't
> be very desirable, and I will read the book you recommended).

I disagree with that: see below.

> > As a classical example: AFS deals with this problem by copying the file
> > locally (completely!) before allowing the client to read it. Once the
> > last client closes it, any changes are pushed up blindly. An AFS enabled
> > SQLite is bound to produce corrupt databases if there is more than one
> > writer (!)
> 
> What do you mean by "pushed up blindly"?

I mean that AFS will actually upload the whole file, overwriting
whatever else is there.

> > NFS caches blocks together before sending them. This violates the
> > single-byte atomicity of UNIX's own filesystem. As a result, no amount
> > of journalling will help- but at least rename() is still atomic. So with
> > NFS, one should write the new database - completely from scratch, and
> > rename it over the original. Clients should NEVER update a shared file
> > on NFS. Ever.
> 
> How is that different from the above? Isn't the local copy above
> copied over the original on the server?

No. The cache isn't write through in the case of NFS (with -osync), and
the lock rpc causes the read-cache to be invalidated anyway.

> > .. you really shouldn't expect any writers to function on SQLite in
> > a reliable manner...
> 
> Fair enough. I had presumed this already. However, is it always 100%
> safe to do simultaneous reads (with all SQLite versions (3 and 2) and
> on all OSes it runs on)? And does what one reasonably presumes is a
> read (non-write) in terms of SQL commands (something that shouldn't
> change the data in the database) always translate to the file being
> opened in read-only mode by the OS?

No, none of those things are guaranteed. If there's even a single writer
it can be unsafe (consider AFS uploading a corrupt journal that clients
then notice, download, and use to trash their local copies).

If the file is never written to, then why is it on the network at all
(other than for distribution)?


> >> > If there's never any writers, why bother keeping it on the network?
> >> 
> >> I do have reasons, most of which are based on particular customer
> >> requirements.
> 
> > Customers never say "the database has to be stored on the network so
> > it's slower."
> 
> > They might put something else there- and that's my question: what is the
> > exact requirement that makes you want to keep it on the network _IF_
> > there aren't any writers?
> 
> My customer wants the app installed cleanly on the network (server) and
> all clients to be oblivious to any updates to the app or database
> (presuming they're not running the app while it's being updated). The
> app should also work for the other customers who just run it locally.

Okay, that's what's important. That update procedure is completely
unsafe UNLESS you can guarantee that the SQLite database will be
overwritten atomically. Most operating systems don't have such an
operation- the closest thing being rename().
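As a sketch of that write-then-rename pattern (illustrative Python, not
anything SQLite itself ships; the function name and paths are made up):

```python
import os
import tempfile

def replace_database(db_path, new_bytes):
    """Write the new database image to a temp file in the same
    directory, then rename it over the original. On POSIX filesystems
    the rename is atomic, so readers see either the old file or the
    new one - never a half-written mix."""
    dir_name = os.path.dirname(os.path.abspath(db_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_bytes)
            f.flush()
            os.fsync(f.fileno())   # make sure the bytes hit the disk first
        os.replace(tmp_path, db_path)  # the atomic step
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The temp file must live in the same directory as the target, because
rename() is only atomic within a single filesystem.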

If you're willing to do that, why not download the database
periodically? Why not serve the database via HTTP, or otherwise check
its mtime, and download it anew if the size/mtime doesn't match?

This'll be a lot safer AND it provides a mechanism by which corrupt
local databases can be transparently replaced.

> > And don't just say "I do have reasons" when you're asking a question.
> > It's really rude:
> 
> > http://www.catb.org/~esr/faqs/smart-questions.html#id3002514
> 
> Sorry if I sounded that way, but I didn't mean to be rude. I just
> thought that explaining the motivation behind my question would be
> wasting bandwidth, since the question was valid (and in my opinion
> interesting) whatever the motivation. However, I didn't ask "how to
> use x to do y". I asked "can I do x" and you asked "why don't I do y
> instead". And I said that I'd still like to know, even just out of
> curiosity, whether it's safe to do x ;-)

You said "I need to access a database on the network (x), [how do I] do
this safely (y)".

The answer is "it's almost never safe" but nobody wants a short answer-
they want to know why- with justification and everything :)

No, the motivation wasn't merely about wasting bandwidth. It simply is
not possible to modify a collection of bits in a distributed fashion
_transparently_, and as a result networked filesystems are, quite
simply, never safe.

Nevertheless, networked filesystems are all high-bandwidth devices.


> I think my original question still stands (one person answered it
> directly, but qualified it by saying that he's not really in a
> position to do so definitively). But in answer to your (good)
> suggestion to copy the database locally if it has changed, and then
> use the local copy... this would also require querying the network
> copy for a version first, which could also occur simultaneously from
> several clients... which brings me back to my question of whether
> it's always safe to read simultaneously (or more precisely, whether
> everything SQLite does when I query for this version is safe).

You have to query the network copy for a version first _anyway_. The
difference is that the OS handles the messy details if you "keep it on
the network".

Fortunately it doesn't have to be messy:

* Use an HTTP server and use the If-Modified-Since flags.

* Check the last modified timestamp/size (inode?) of the file using the
stat() system call.

* Spawn/embed rsync to transfer portions of the file efficiently.
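For the HTTP option, a conditional GET might look like this (the URL
and file names are placeholders, and this assumes the server honours
If-Modified-Since):

```python
import os
import email.utils
import urllib.error
import urllib.request

def fetch_if_newer(url, local_path):
    """Conditionally download the database: send If-Modified-Since
    with the local file's mtime; a 304 reply means our copy is
    current, so we skip the transfer entirely."""
    req = urllib.request.Request(url)
    if os.path.exists(local_path):
        mtime = os.path.getmtime(local_path)
        req.add_header("If-Modified-Since",
                       email.utils.formatdate(mtime, usegmt=True))
    try:
        with urllib.request.urlopen(req) as resp:
            data = resp.read()
    except urllib.error.HTTPError as e:
        if e.code == 304:          # Not Modified - keep the local copy
            return False
        raise
    tmp = local_path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, local_path)    # atomic swap of the local cache
    return True
```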

Other ideas:

* Store on the network a log of SQL commands instead of an SQLite
database. Run the SQL commands that haven't been seen yet (perhaps by
restarting from scratch, or seeking to a last known position).
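That last idea - replaying a shared SQL log against a local database -
could be sketched like this (the log format, file names, and the use of
a byte-position state file are all invented for illustration):

```python
import os
import sqlite3

def replay_log(log_path, db_path, state_path):
    """Apply any SQL statements appended to the shared log since the
    last run. The byte position reached on the previous run is kept in
    a small state file, so we seek past statements already executed."""
    pos = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            pos = int(f.read())
    with open(log_path, "rb") as log:
        log.seek(pos)
        new_sql = log.read().decode("utf-8")
        pos = log.tell()
    if new_sql.strip():
        con = sqlite3.connect(db_path)
        with con:
            con.executescript(new_sql)  # run only the unseen statements
        con.close()
    with open(state_path, "w") as f:
        f.write(str(pos))
```

Note this only works if the log is strictly append-only and clients
never execute the same region twice, which is why the position is
saved only after the statements commit.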

