Re: Re[6]: [sqlite] Accessing Database on Network

2005-08-03 Thread Mrs. Brisby
On Wed, 2005-08-03 at 08:46 -0500, Henry Miller wrote:
> >In my opinion system time stamps etc are not a reliable means of
> >comparing 2 files. Many things can change the timestamp of a file,
> >without changing the contents, and one (server) os/filesystem can
> >report a different file size to another (local) for the same file
> >(contents). As I said already, I think having a version number
> >embedded in the databse itself is much more relible.
> 
> You should be running NTP (network time protocol) on all computers.
> This will keep all your system times to within milliseconds.   Unix
> systems keep the last modified times tamp separately.  Microsoft
> Windows sets (resets?  I can never remember) the archive bit, which
> could be abused to tell you when a file is modified - at the cost of
> breaking backups so I can't recommend it.

NTP isn't relevant. Set the mtime to whatever you saw on the server
using wstat() or utime() or what have you. Don't bother trying "to get
close".




Re: Re[6]: [sqlite] Accessing Database on Network

2005-08-03 Thread Henry Miller

On 8/3/2005 at 10:34 djm wrote:


>Wednesday, August 3, 2005, 4:41:24 AM, you wrote:
>
>> No, none of those things are guaranteed. If there's even a single
>> writer it can be unsafe (consider AFS putting a corrupt journal up
>> that clients notice download, and thrash their local copy).
>
>But Im saying there -wont- be a single writer (except the rare update
>process on the server, during which all other clients wont be
>reading), and the question is it it always safe then?

No.   It might be on your particular systems, but we cannot make a
general case.   It is possible that it will always be safe on (for
example) Windows 2000, while not safe on NT or XP!   It depends on how
the OS handles caching.   They are supposed to check on each file
access to see if the file changed, but networks are slow, so they might
not.Thus you could start reading data that is in the cache, which
is out of date.

You may get different results from AFS, NFS, Coda, Samba, or Windows
networking.  Not to mention different versions of each protocol, and
implementation.  (Windows file sharing uses the same protocol as Samba,
but that doesn't mean they handle caching the same!)

>> Okay. that's what's important. That update procedure is completely
>> unsafe UNLESS you can guarantee that the sqlite database will be
>> overwritten atomically. Most operating systems don't have such an
>> operation- the closest thing being rename().
>
>And this is presumably only important if another client is reading the
>file while its being updated. If no client is reading the file on the
>server during update (and none are writing anyways), then the os and
>the filesystem should be irrelevant, right?

No, because we don't know what caching the OS is doing.  It is possible
for the remote OS to not look at the server at all if it still
remembers the part of the file you are asking for after the write!

Even writing your own file system may not work if the OS is caching
things above the file system.   

That said, most OSes get this right these days.  So if you correctly
shut down all your readers, then to a write, then start them back up
again, you might consider the problems rare enough to ignore.   In fact
they may never happen in practice, but in theory they could. 

What will work is to replace all the places where sqlite reads the file
to instead talk to some server on the network you write.   This is the
same as writing a network file system, but you control ALL the layers,
so you know what caching is going on.   This is a lot of work to get
right, and generally a poor use of time, but it is an option.

>In my opinion system time stamps etc are not a reliable means of
>comparing 2 files. Many things can change the timestamp of a file,
>without changing the contents, and one (server) os/filesystem can
>report a different file size to another (local) for the same file
>(contents). As I said already, I think having a version number
>embedded in the databse itself is much more relible.

You should be running NTP (network time protocol) on all computers.
This will keep all your system times to within milliseconds.   Unix
systems keep the last modified times tamp separately.  Microsoft
Windows sets (resets?  I can never remember) the archive bit, which
could be abused to tell you when a file is modified - at the cost of
breaking backups so I can't recommend it.

>> you said "I need to access a database on the network (x), [how do I]
>> do this safely (y)".
>
> I didnt. Youre rephrasing my post, to suit your means, and I think
> the point is degenerating in the process. 

Not really.   We are just reading between the lines.   Your questions
look suspiciously like his rephrasing.   Don't be offended if we are
answering the question we think you should have asked, not the one you
did.

The correct answer to your question you should ask is to use something
like postgresql/Oracle/(I can think of half a dozen other choices).
Sqlite was not designed for what you want to do.  Now there may be good
reasons you cannot use a different database.  That is a whole different
issue.  We cannot make decisions for you, but we can point out what
looks like a bad decision from the parts of the problem we know (we
know that we don't know all your issues).   

Bottom line: what you want to do will probably work without problem.
However if you want a strong statement you need to have your lawyers
contact your OS vender(s) and get a contract in writing that they will
guarantee that this will work.   Salesmen will lie, and the
documentation may not account for the latest changes, but when you have
a contract in writing you can at least get relief if things do not
work.  Asking management for this forces them to place a value on data
if nothing else, which may help drive the decision on what to do.  




Re: Re[6]: [sqlite] Accessing Database on Network

2005-08-03 Thread Mrs. Brisby
On Wed, 2005-08-03 at 10:34 +0200, djm wrote:
> Hello,
> 
> Wednesday, August 3, 2005, 4:41:24 AM, you wrote:
> 
> > No, none of those things are guaranteed. If there's even a single
> > writer it can be unsafe (consider AFS putting a corrupt journal up
> > that clients notice download, and thrash their local copy).
> 
> But Im saying there -wont- be a single writer (except the rare update
> process on the server, during which all other clients wont be
> reading), and the question is it it always safe then?

The answer is "sometimes not". Sometimes, it's even "often not".

It really depends on your situation. It depends on what networked
filesystem you're using, and what operating system you're using.

In the case of UNIX and AFS- if you have all clients turned off - or you
do it during AFS replication, then yes.


> > Okay. that's what's important. That update procedure is completely
> > unsafe UNLESS you can guarantee that the sqlite database will be
> > overwritten atomically. Most operating systems don't have such an
> > operation- the closest thing being rename().
> 
> And this is presumably only important if another client is reading the
> file while its being updated. If no client is reading the file on the
> server during update (and none are writing anyways), then the os and
> the filesystem should be irrelevant, right?

It depends on what you mean by "no client is reading."
If you really mean the machines are turned off [and have no cache to
upload], then maybe.

If you mean that AND you're doing the update "on the server" (instead of
by using another node/client/workstation) then the answer is "probably".


> > If you're willing to do that, why not download the database
> > periodically? Why not store the database via HTTP or otherwise check
> > the mtime of it, and download it anew if the size/mtime doesn't
> > match?
> 
> > This'll be a lot safer AND provides a mechanism by which corrupt
> > databases can otherwise be transparently updated.
> 
> Downloading a copy of the database is indeed is indeed a good
> suggestion, but it doesnt change my original question.
> 
> In my opinion system time stamps etc are not a reliable means of
> comparing 2 files. Many things can change the timestamp of a file,
> without changing the contents, and one (server) os/filesystem can
> report a different file size to another (local) for the same file
> (contents). As I said already, I think having a version number
> embedded in the databse itself is much more relible.

Wrong. The sqlite database is binary. Any size check is going to be in
8-bit bytes and will always be the same- except for platforms that
SQLite doesn't run on- any comparison with a system that doesn't have
8-bit bytes will have to be aware of that fact.

Changing the timestamp COULD be enough to have clients interrogate it-
for example, perform some rsync operations on the file, or check in a
prearranged place for a generation number.

Either store the version/generation number in a separate file or hack
the database to store it in the header. IIRC there are a few unused
bytes there.

Your OS might have other freshness checks (unix, for example, can use
inodes if you always rename()), but even if they don't, there are other
mechanisms:

* Encoding the generation number/version into the filename
* Update some EA (attribute)
* Update the file's mtime to some value THAT IT WASN'T before (by
recording all used mtimes)

You still haven't said what platform you're developing for.

> So the question is still is it always safe if 2 clients are
> simultaneously reading this version info from the database (or for
> that matter, but its not sqlite specific, if 2 clients are
> accessing/running the executable file) ?

The answer is still one of "usually", "probably" or "maybe".

By using the methods I suggest the answer is "yes".

> > you said "I need to access a database on the network (x), [how do I]
> > do this safely (y)".
> 
>  I didnt. Youre rephrasing my post, to suit your means, and I think
>  the point is degenerating in the process. And as I said already I
>  wasnt trying to be rude, and Im sorry if it seemed so. But if you
>  insist on interpreting it as such I suppose I cant stop you.

I'm showing you how I read your original post. Is this critically
different than what you're saying? If so, how?



Re[6]: [sqlite] Accessing Database on Network

2005-08-03 Thread djm
Hello,

Wednesday, August 3, 2005, 4:41:24 AM, you wrote:

> No, none of those things are guaranteed. If there's even a single
> writer it can be unsafe (consider AFS putting a corrupt journal up
> that clients notice download, and thrash their local copy).

But Im saying there -wont- be a single writer (except the rare update
process on the server, during which all other clients wont be
reading), and the question is it it always safe then?

> Okay. that's what's important. That update procedure is completely
> unsafe UNLESS you can guarantee that the sqlite database will be
> overwritten atomically. Most operating systems don't have such an
> operation- the closest thing being rename().

And this is presumably only important if another client is reading the
file while its being updated. If no client is reading the file on the
server during update (and none are writing anyways), then the os and
the filesystem should be irrelevant, right?

> If you're willing to do that, why not download the database
> periodically? Why not store the database via HTTP or otherwise check
> the mtime of it, and download it anew if the size/mtime doesn't
> match?

> This'll be a lot safer AND provides a mechanism by which corrupt
> databases can otherwise be transparently updated.

Downloading a copy of the database is indeed is indeed a good
suggestion, but it doesnt change my original question.

In my opinion system time stamps etc are not a reliable means of
comparing 2 files. Many things can change the timestamp of a file,
without changing the contents, and one (server) os/filesystem can
report a different file size to another (local) for the same file
(contents). As I said already, I think having a version number
embedded in the databse itself is much more relible.

So the question is still is it always safe if 2 clients are
simultaneously reading this version info from the database (or for
that matter, but its not sqlite specific, if 2 clients are
accessing/running the executable file) ?


> you said "I need to access a database on the network (x), [how do I]
> do this safely (y)".

 I didnt. Youre rephrasing my post, to suit your means, and I think
 the point is degenerating in the process. And as I said already I
 wasnt trying to be rude, and Im sorry if it seemed so. But if you
 insist on interpreting it as such I suppose I cant stop you.