On Dec 22, 2007, at 9:38 AM, Chris Bainbridge wrote:

I have a number of processes running on hosts with a common NFS /home.
I was using a file on this shared NFS as a ZODB database. I had
thought that this wouldn't be a problem, since it would be impossible
for any process to open the zodb file while another process has it
locked, but the zodb file kept getting corrupted anyway.

NFS locking is notoriously fragile or broken. I would never store a database on an NFS file system.

I removed the
transactions, in effect using the zodb as read only, I then got this
error:

ERROR:root:Traceback (most recent call last):
 File "go.py", line 116, in ?
   db.close()
 File "/exports/home/s9734229/phd/src/db.py", line 47, in close
   conn.db().close()
 File "/exports/home/s9734229/lib/python/ZODB3-3.7.2-py2.4-linux-x86_64.egg/ZODB/DB.py", line 444, in close
   self._storage.close()
 File "/exports/home/s9734229/lib/python/ZODB3-3.7.2-py2.4-linux-x86_64.egg/ZODB/FileStorage/FileStorage.py", line 400, in close
   self._lock_file.close()
 File "/exports/home/s9734229/lib/python/ZODB3-3.7.2-py2.4-linux-x86_64.egg/ZODB/lock_file.py", line 74, in close
   os.unlink(self._path)
OSError: [Errno 2] No such file or directory: '/exports/home/s9734229/gozeo/datastore.fs.lock'

Looking at the code, it does:

   def close(self):
       if self._fp is not None:
           unlock_file(self._fp)
           self._fp.close()
           os.unlink(self._path)
           self._fp = None

So the lock is released before being unlinked. Shouldn't this be the
other way around? As far as I can see, releasing the lock allows a
second process to acquire the lock, start using the zodb, then the
first process will unlink the lock, allowing a third process to
acquire it and also open the zodb, resulting in parallel writing and
corruption.


You are right that there is a race condition in the locking code. This is fixed in ZODB 3.8. I really should have backported this to 3.7. :( The new code doesn't remove the file at all. (I don't remember the details, but there was a race condition to which the file-removal contributed.)
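
To illustrate the idea of never removing the lock file, here is a minimal sketch. It is not the actual ZODB 3.8 code: the names SketchLockFile and LockError are hypothetical, and it assumes a POSIX system where fcntl.flock is available and behaves reliably (which may not hold on NFS). Because the file is left on disk, every process locks the same file, and there is no window in which a late unlink can detach a lock that another process has just acquired.

```python
import fcntl
import os


class LockError(Exception):
    """Raised when another holder already has the lock (hypothetical name)."""


class SketchLockFile:
    """Illustrative lock file that is never unlinked (POSIX only)."""

    def __init__(self, path):
        self._path = path
        # Append mode creates the file if needed and never truncates it,
        # so an existing lock file is simply reused.
        fp = open(path, "a+")
        try:
            # Non-blocking exclusive lock; raises OSError if already held.
            fcntl.flock(fp.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            fp.close()
            raise LockError("could not lock %s" % path)
        self._fp = fp

    def close(self):
        if self._fp is not None:
            fcntl.flock(self._fp.fileno(), fcntl.LOCK_UN)
            self._fp.close()
            self._fp = None
            # Deliberately no os.unlink(self._path): the file stays on
            # disk so that all processes keep locking the same file.
```

With this shape, a second attempt to lock the same path fails with LockError while the first holder is open, and succeeds again after the first holder calls close(), without the file ever disappearing in between.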

It might be good to report this as a bug against 3.7. If someone backports the fix, I'd be happy to make a new release. (I was planning to make a bug-fix release of 3.7 anyway to include another recent bug fix.)

Jim

--
Jim Fulton
Zope Corporation


_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev
