On Fri, 3 Sep 2004, Ara.T.Howard wrote:

if you are unfamiliar with nfs sillynames, they occur when a file that is open
on one client is removed or renamed on another.


i am seeing a lot of these appear in an NFS directory i'm using to store a
sqlite database accessed by many clients. the access protocol is a
meta-transaction wrapped around an actual sqlite transaction using an
additional empty lockfile (db.lock). this is to ensure single writer multiple
reader semantics for the entire network, e.g.



lock_type = read # or perhaps write

 acquire fcntl lock of lock_type on db.lock

   open db.lock

     start_transaction(lock_type)

       db.execute(sql)

     end_transaction

   close db.lock

 release fcntl lock
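
for readers who want something concrete, the protocol above might look roughly
like the following in python. this is only a sketch under assumptions: it uses
python's bundled sqlite3 module (sqlite 3, not the 2.8 branch discussed here),
the helper name meta_transaction is made up for illustration, and it opens
db.lock before locking it because fcntl locks need an open descriptor.

```python
import fcntl
import os
import sqlite3
from contextlib import contextmanager


@contextmanager
def meta_transaction(db_path, lock_path, write=False):
    """Hold an fcntl lock on lock_path around one sqlite transaction.

    Hypothetical helper: shared lock for readers, exclusive lock for writers.
    """
    # O_RDWR so the same descriptor can take either a shared or exclusive lock
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # blocks until the lock is granted
        fcntl.lockf(fd, fcntl.LOCK_EX if write else fcntl.LOCK_SH)
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on an exception
                yield conn
        finally:
            conn.close()
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)
```

a writer would use `with meta_transaction(db, lock, write=True) as conn:` and a
reader would leave `write` at its default.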


NONE of the applications has exited for about 30 days. ALL of the applications
are acquiring write locks on the db.lock so only one process is accessing the
db. NONE of the uses removes, renames, etc. the file - only reads and writes
are done via the sqlite api. the semantics are definitely single writer (and
potentially many reader); if this were not so the application would crash
horribly almost instantly. the api i'm using throws an exception if it gets
SQLITE_BUSY and i have no busy handler so i am positive that the locking
works, and that no readers ever attempt to write (upgrade lock) and vice
versa.
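
the SQLITE_BUSY argument can be illustrated with a small sketch. assumptions:
this uses python's sqlite3 module (sqlite 3, not 2.8), where timeout=0 stands
in for "no busy handler", and the function name is invented for the example.
with no outer lock in place, a second writer is refused straight away rather
than waiting:

```python
import os
import sqlite3
import tempfile


def second_writer_is_refused(db_path):
    """Return the error text a second writer gets while a first holds the write lock."""
    # isolation_level=None: BEGIN/ROLLBACK are issued explicitly below
    # timeout=0: no waiting, so a busy database raises immediately
    a = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    b = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        a.execute("CREATE TABLE IF NOT EXISTS t (x)")
        a.execute("BEGIN IMMEDIATE")      # a takes the write lock
        try:
            b.execute("BEGIN IMMEDIATE")  # b asks for the same lock
        except sqlite3.OperationalError as e:
            return str(e)
        finally:
            a.execute("ROLLBACK")
        b.execute("ROLLBACK")             # only reached if b was not refused
        return None
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(second_writer_is_refused(os.path.join(d, "db")))
```

an application that never sees this exception in 30 days is therefore good
evidence that the outer lock is serializing writers.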


i'm simply mystified as to what's creating the sillynames. they all appear to
be the product of the sqlite api: opening them up in vi shows them to be a
binary file that's obviously part of the database since i can recognize many
strings from the database in them.


this worries me - it seems to imply that sqlite, at some point, does a rename or
remove when some remote client has an open file handle. could this be because
ALL of my operations (even reads) are inside a transaction?


more info...

the sillynamed files are only ever a minute or so old and disappear almost as
quickly (this would be after the last client called a close). the application
is working great but i'd like to understand this as it concerns me somewhat.
the sqlite lib version is the latest 2.8 branch. the nfs server/client impl
are the latest patched redhat enterprise versions.
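
one way to pin down how long the sillynames live is to poll the directory. the
sketch below assumes the linux nfs client convention of naming them `.nfs`
followed by a hex string; the function name silly_files is made up for
illustration.

```python
import os
import time


def silly_files(directory, now=None):
    """Return {name: age_in_seconds} for .nfs* entries in directory."""
    now = time.time() if now is None else now
    ages = {}
    for name in os.listdir(directory):
        if not name.startswith(".nfs"):
            continue
        try:
            st = os.stat(os.path.join(directory, name))
        except OSError:
            continue  # vanished (or went stale) between listdir and stat
        ages[name] = now - st.st_mtime
    return ages
```

calling this every second or so and logging the result alongside each
client's transaction times would show which transactions the sillynames
coincide with.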


kind regards.

a followup on this post:

i was running some straces on sqlite while it was using transactions and think i
may have found the source of the problem. we see this in the strace output:

  ...
  ...
  close(5)                                = 0
  unlink("/dmsp/moby-1-1/ahoward/shared/silly/db-journal") = 0
  ...
  ...

i think what is happening is

  ...
  ...
  close(5)                                = 0

here a remote client opens db-journal

  unlink("/dmsp/moby-1-1/ahoward/shared/silly/db-journal") = 0

and the sillyname is created, only to disappear when the remote client closes
the db-journal.
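
the unlink-while-open behaviour that sillynames emulate can be reproduced on a
local filesystem. this is only a sketch of the local posix semantics, with an
invented function name; it does not reproduce the nfs rename itself, and it
does not settle which client triggers it.

```python
import os
import tempfile


def read_after_unlink(directory):
    """Unlink a file while another descriptor is open.

    Returns (data_read_after_unlink, name_still_listed).
    """
    path = os.path.join(directory, "db-journal")
    with open(path, "wb") as writer:
        writer.write(b"journal bytes")
    reader = open(path, "rb")  # stands in for the other open handle
    try:
        os.unlink(path)        # the unlink seen in the strace output
        data = reader.read()   # the open descriptor keeps working
        still_listed = "db-journal" in os.listdir(directory)
        return data, still_listed
    finally:
        reader.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(read_after_unlink(d))
```

on a local disk the name disappears from the listing while the data stays
readable; an nfs client has no such hidden state on the server, so it keeps
the data reachable by renaming the file to a `.nfs*` name until the last close.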

  ...
  ...

the locking seems like it should prevent this - but the files themselves are
definitely subsets of the original - so i'm thinking there is some race
condition here.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it;
| and a weed grows, even though we do not love it. | --Dogen
===============================================================================
