Bugs item #1599254, was opened at 2006-11-19 11:03
Message generated for change (Comment added) made by akuchling
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1599254&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David Watson (baikie)
Assigned to: A.M. Kuchling (akuchling)
Summary: mailbox: other programs' messages can vanish without trace

Initial Comment:
The mailbox classes based on _singlefileMailbox (mbox, MMDF, Babyl) implement the flush() method by writing the new mailbox contents into a temporary file, which is then renamed over the original. Unfortunately, if another program tries to deliver messages while mailbox.py is working, and uses only fcntl() locking, it will have the old file open and be blocked waiting for the lock to become available. Once mailbox.py has replaced the old file and closed it, making the lock available, the other program will write its messages into the now-deleted "old" file, consigning them to oblivion.

I've caused Postfix on Linux to lose mail this way (although I did have to turn off its use of dot-locking to do so).

A possible fix is attached. Instead of new_file being renamed, its contents are copied back to the original file. If file.truncate() is available, the mailbox is then truncated to size. Otherwise, if truncation is required, it's truncated to zero length beforehand by reopening self._path with mode wb+. In the latter case, there's a check to see if the mailbox was replaced while we weren't looking, but there's still a race condition. Any alternative ideas?

Incidentally, this fixes a problem whereby Postfix wouldn't deliver to the replacement file because it had the execute bit set.

----------------------------------------------------------------------

>Comment By: A.M. Kuchling (akuchling)
Date: 2006-12-15 09:06

Message:
Logged In: YES
user_id=11375
Originator: NO

I'm testing the fix using two Python processes running mailbox.py, and my test case fails even with your patch. This is due to another bug, present even in the patched version.

mbox has a dictionary attribute, _toc, mapping message keys to positions in the file. flush() writes out all the messages in self._toc and constructs a new _toc with the new file offsets. It doesn't re-read the file to see if new messages were added by another process.

One fix that seems to work: instead of doing 'self._toc = new_toc' after flush() has done its work, do self._toc = None. The ToC will be regenerated the next time _lookup() is called, causing a re-read of all the contents of the mbox. Inefficient, but I see no way around the necessity for doing this.

It's not clear to me that my suggested fix is enough, though. Process #1 opens a mailbox, reads the ToC, and then does something else for 5 minutes. In the meantime, process #2 adds a message to the mbox. Process #1 then adds a message to the mbox and writes it out; it never notices process #2's change.

Maybe the _toc has to be regenerated every time you call lock(), because at that point you know there will be no further updates to the mbox by any other process. Any unlocked usage of _toc should also really be regenerating _toc every time, because you never know if another process has added a message... but that would be really inefficient.

----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2006-12-15 08:17

Message:
Logged In: YES
user_id=11375
Originator: NO

The attached patch adds a test case to test_mailbox.py that demonstrates the problem. No modifications to mailbox.py are needed to show data loss. Now looking at the patch...

File Added: mailbox-test.patch

----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2006-12-12 16:04

Message:
Logged In: YES
user_id=11375
Originator: NO

I agree with David's analysis; this is in fact a bug. I'll try to look at the patch.

----------------------------------------------------------------------

Comment By: David Watson (baikie)
Date: 2006-11-19 15:44

Message:
Logged In: YES
user_id=1504904
Originator: YES

This is a bug. The point is that the code is subverting the protection of its own fcntl locking. I should have pointed out that Postfix was still using fcntl locking, and that should have been sufficient. (In fact, it was due to its use of fcntl locking that it chose precisely the wrong moment to deliver mail.) Dot-locking does protect against this, but not every program uses it, which is precisely the reason that the code implements fcntl locking in the first place.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-11-19 15:02

Message:
Logged In: YES
user_id=21627
Originator: NO

Mailbox locking was invented precisely to support this kind of operation. Why do you complain that things break if you deliberately turn off the mechanism preventing breakage? I fail to see a bug here.

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
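The data loss described in the initial comment comes down to POSIX rename semantics: a program that already holds the mailbox open keeps writing to the old inode, which rename() has unlinked from the directory. The sketch below is a minimal, single-process illustration assuming a POSIX system. The helpers rewrite_by_rename(), rewrite_in_place() and lost_delivery() are hypothetical and are not mailbox.py's code or the attached patch; rewrite_in_place() only approximates the copy-back idea, skipping the patch's temporary file and its wb+ fallback. fcntl locking is left out because only the order of operations matters: open, rewrite, then append.

```python
import os
import tempfile

# Hypothetical helpers for illustration only; not mailbox.py's implementation.

def rewrite_by_rename(path, new_contents):
    """Write new contents to a temporary file, then rename it over path
    (the flush() strategy the report describes)."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(new_contents)
    os.replace(tmp_path, path)  # path now names a brand-new inode

def rewrite_in_place(path, new_contents):
    """Overwrite the original file and truncate it, keeping the same inode
    (a simplified take on the copy-back approach)."""
    with open(path, "r+b") as f:
        f.seek(0)
        f.write(new_contents)
        f.truncate()  # drop leftover bytes if the old contents were longer

def lost_delivery(rewrite):
    """Simulate a delivery agent that opened the mailbox before the rewrite
    and appends afterwards. Returns True if its message is missing."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "mbox")
        with open(path, "wb") as f:
            f.write(b"From a\nold message\n")
        # The agent opens the mailbox first; in the real scenario it would
        # now block on its fcntl lock while mailbox.py rewrites the file.
        with open(path, "ab") as agent:
            rewrite(path, b"From a\nrewritten message\n")
            agent.write(b"From b\nnew delivery\n")
        with open(path, "rb") as f:
            return b"new delivery" not in f.read()

print(lost_delivery(rewrite_by_rename))  # True: appended to the unlinked inode
print(lost_delivery(rewrite_in_place))   # False: same inode, delivery survives
```

Because the agent's file descriptor refers to an inode rather than a name, only a rewrite that keeps the original inode lets its later append land in the file that other readers will see.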