Re: [ZODB-Dev] Persistent ZEO Cache corruption?
I have seen a similar problems bringing up a system which has been brought down with the usual zeoctl stop zopectl start mantra. The zope/zeo connection is via a network connection. Next time it happens I'll file a bug report. On Thu, 12 Jan 2006, Sidnei da Silva wrote: > On Thu, Jan 12, 2006 at 10:17:54AM -0500, Tim Peters wrote: > | [Sidnei da Silva] > | >> Every now and then I face a corruption of the persistent zeo cache, but > | >> this is the first time I get this variant. > | > | What other variants do you see? > > Can't remember right now, it was quite some time ago and involved > making changes to one zeo client while the other one was down using > 'zopectl debug'. Seen it about 6 times in different environments, so > should be reproduceable. > > | >> The cause is very likely to be a forced shutdown of the box this zope > | >> instance was running on, but I thought it would be nice to report the > | >> issue. > | > | Yes it is! Thank you. It would be better to open a bug report ;-). > > Sure will. > > | >> Here's the traceback:: > | >> > | >> File "/home/sidnei/src/zope/28five/lib/python/ZEO/ClientStorage.py", line > | 314, in __init__ > | >> self._cache.open() > | >> File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 112, in > | open > | >>self.fc.scan(self.install) File > | >> "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 835, in scan > | >>install(self.f, ent) File > | >> "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 121, in > | install > | >> o = Object.fromFile(f, ent.key, skip_data=True) > | >> File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 630, in > | fromFile > | >> raise ValueError("corrupted record, oid") > | >> ValueError: corrupted record, oid > | >> > | >> I have a copy of the zeo cache file if anyone is interested. > | > | Attaching a compressed copy to the bug report would be best (if it's too big > | for that, or it's proprietary, let me know how to get it and I'll put it on > | an internal ZC machine). Can't tell in advance whether that will reveal > | something useful, though (see below). > > Don't think there might be anything sensitive in there, maybe my blog > password in the worst case *wink*. Here's the files (zeo1-1.zec is > probably the one you're after): > > http://awkly.org/files/zeo-cache.tar.bz2 > > | > It seems as though persistent caches haven't been a very sucessful > | > feature. Perhaps we should abandon them. > | > | They do seem to be implicated in more than their share of problems, both > | before and after MVCC. > | > | The post-MVCC ZEO persistent cache _intends_ to call flush() after each file > | change. If it's missing one of those, and depending on what "forced > | shutdown" means exactly, that could be a systematic cause for corruption. > | It doesn't call fsync() unless it's explicitly closed cleanly, but it's > | unclear what good fsync() actually does across platforms when flush() is > | called routinely and the power stays on. > > Oh, I really meant to say "accidental shutdown", though I wasn't around > when the box restarted it looks like it was a power failure. > > -- ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Persistent ZEO Cache corruption?
On Thu, Jan 12, 2006 at 10:17:54AM -0500, Tim Peters wrote: | [Sidnei da Silva] | >> Every now and then I face a corruption of the persistent zeo cache, but | >> this is the first time I get this variant. | | What other variants do you see? Can't remember right now, it was quite some time ago and involved making changes to one zeo client while the other one was down using 'zopectl debug'. Seen it about 6 times in different environments, so should be reproduceable. | >> The cause is very likely to be a forced shutdown of the box this zope | >> instance was running on, but I thought it would be nice to report the | >> issue. | | Yes it is! Thank you. It would be better to open a bug report ;-). Sure will. | >> Here's the traceback:: | >> | >> File "/home/sidnei/src/zope/28five/lib/python/ZEO/ClientStorage.py", line | 314, in __init__ | >> self._cache.open() | >> File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 112, in | open | >>self.fc.scan(self.install) File | >> "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 835, in scan | >>install(self.f, ent) File | >> "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 121, in | install | >> o = Object.fromFile(f, ent.key, skip_data=True) | >> File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 630, in | fromFile | >> raise ValueError("corrupted record, oid") | >> ValueError: corrupted record, oid | >> | >> I have a copy of the zeo cache file if anyone is interested. | | Attaching a compressed copy to the bug report would be best (if it's too big | for that, or it's proprietary, let me know how to get it and I'll put it on | an internal ZC machine). Can't tell in advance whether that will reveal | something useful, though (see below). Don't think there might be anything sensitive in there, maybe my blog password in the worst case *wink*. Here's the files (zeo1-1.zec is probably the one you're after): http://awkly.org/files/zeo-cache.tar.bz2 | > It seems as though persistent caches haven't been a very sucessful | > feature. Perhaps we should abandon them. | | They do seem to be implicated in more than their share of problems, both | before and after MVCC. | | The post-MVCC ZEO persistent cache _intends_ to call flush() after each file | change. If it's missing one of those, and depending on what "forced | shutdown" means exactly, that could be a systematic cause for corruption. | It doesn't call fsync() unless it's explicitly closed cleanly, but it's | unclear what good fsync() actually does across platforms when flush() is | called routinely and the power stays on. Oh, I really meant to say "accidental shutdown", though I wasn't around when the box restarted it looks like it was a power failure. -- Sidnei da Silva Enfold Systems, LLC. http://enfoldsystems.com ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
RE: [ZODB-Dev] Persistent ZEO Cache corruption?
[Tim Peters] > ... > It doesn't call fsync() unless it's explicitly closed cleanly, but it's > unclear what good fsync() actually does across platforms when flush() is > called routinely and the power stays on. LOL. Immediately upon sending that msg, there was a power outage here, which left behind corrupted files in at least 3 (non-ZODB) applications I had running at the time. One of them offered to recover from its last auto-backup, and the other two went insane. It would certainly be nice if persistent ZEO cache files got out of the "went insane" category :-(. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
RE: [ZODB-Dev] Persistent ZEO Cache corruption?
[Sidnei da Silva] >> Every now and then I face a corruption of the persistent zeo cache, but >> this is the first time I get this variant. What other variants do you see? >> The cause is very likely to be a forced shutdown of the box this zope >> instance was running on, but I thought it would be nice to report the >> issue. Yes it is! Thank you. It would be better to open a bug report ;-). >> Here's the traceback:: >> >> File "/home/sidnei/src/zope/28five/lib/python/ZEO/ClientStorage.py", line 314, in __init__ >> self._cache.open() >> File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 112, in open >>self.fc.scan(self.install) File >> "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 835, in scan >>install(self.f, ent) File >> "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 121, in install >> o = Object.fromFile(f, ent.key, skip_data=True) >> File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 630, in fromFile >> raise ValueError("corrupted record, oid") >> ValueError: corrupted record, oid >> >> I have a copy of the zeo cache file if anyone is interested. Attaching a compressed copy to the bug report would be best (if it's too big for that, or it's proprietary, let me know how to get it and I'll put it on an internal ZC machine). Can't tell in advance whether that will reveal something useful, though (see below). >> What is bad about this problem is that it prevented Zope from starting >> and there is no obvious clue that removing the persistent zeo cache >> would cure it, though that's what anyone that has a clue about what he's >> doing would do *wink*. [Jim Fulton] > It sounds like there should be logic in that code to abandon the cache if > a problem is found, much as we abandon file-storage index files if > anything seems suspicious. That's an excellent idea, and should be doable with finite effort. > It seems as though persistent caches haven't been a very sucessful > feature. Perhaps we should abandon them. They do seem to be implicated in more than their share of problems, both before and after MVCC. The post-MVCC ZEO persistent cache _intends_ to call flush() after each file change. If it's missing one of those, and depending on what "forced shutdown" means exactly, that could be a systematic cause for corruption. It doesn't call fsync() unless it's explicitly closed cleanly, but it's unclear what good fsync() actually does across platforms when flush() is called routinely and the power stays on. Those were intended to be reliability improvements over the pre-MVCC file cache (which never called flush() or fsync()). "kill -9" can do damage regardless, though. Alas, if the cause is one of those, it's doubtful that analyzing the corrupted file could prove it. It's generally true that our file formats aren't designed to detect corruption (e.g., we don't include checksums of any sort), so it's not clear how well we can detect corruption either. The only post-MVCC ZEO file cache gimmick in this direction is storing 8 redundant bytes per record (the oid for the record is stored near the start of the record, and again at the end). That's a lot better than nothing, and a mismatch in these redundant oids is precisely what caused Sidnei's traceback. The only other kinds of corruption routinely detected are weak checks: we hit EOF when trying to read a record's version, or when trying to read a record's pickle. There is one other strategy, but it only kicks in if you're not running Python with -O (but should probably be run unconditionally at startup, and changed to stop using "assert" statements): if __debug__: self._verify_filemap() That checks that the info read from the cache file is mutually consistent in various ways; Sidnei's run didn't get that far. ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Persistent ZEO Cache corruption?
Sidnei da Silva wrote: Every now and then I face a corruption of the persistent zeo cache, but this is the first time I get this variant. The cause is very likely to be a forced shutdown of the box this zope instance was running on, but I thought it would be nice to report the issue. Here's the traceback:: File "/home/sidnei/src/zope/28five/lib/python/ZEO/ClientStorage.py", line 314, in __init__ self._cache.open() File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 112, in openself.fc.scan(self.install) File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 835, in scaninstall(self.f, ent) File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 121, in install o = Object.fromFile(f, ent.key, skip_data=True) File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 630, in fromFile raise ValueError("corrupted record, oid") ValueError: corrupted record, oid I have a copy of the zeo cache file if anyone is interested. What is bad about this problem is that it prevented Zope from starting and there is no obvious clue that removing the persistent zeo cache would cure it, though that's what anyone that has a clue about what he's doing would do *wink*. It sounds like there should be logic in that code to abandon the cache if a problem is found, much as we abandon file-storage index files if anything seems suspicious. It seems as though persistent caches haven't been a very sucessful feature. Perhaps we should abandon them. Jim -- Jim Fulton mailto:[EMAIL PROTECTED] Python Powered! CTO (540) 361-1714http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev
[ZODB-Dev] Persistent ZEO Cache corruption?
Every now and then I face a corruption of the persistent zeo cache, but this is the first time I get this variant. The cause is very likely to be a forced shutdown of the box this zope instance was running on, but I thought it would be nice to report the issue. Here's the traceback:: File "/home/sidnei/src/zope/28five/lib/python/ZEO/ClientStorage.py", line 314, in __init__ self._cache.open() File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 112, in openself.fc.scan(self.install) File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 835, in scaninstall(self.f, ent) File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 121, in install o = Object.fromFile(f, ent.key, skip_data=True) File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 630, in fromFile raise ValueError("corrupted record, oid") ValueError: corrupted record, oid I have a copy of the zeo cache file if anyone is interested. What is bad about this problem is that it prevented Zope from starting and there is no obvious clue that removing the persistent zeo cache would cure it, though that's what anyone that has a clue about what he's doing would do *wink*. -- Sidnei da Silva Enfold Systems, LLC. http://enfoldsystems.com ___ For more information about ZODB, see the ZODB Wiki: http://www.zope.org/Wikis/ZODB/ ZODB-Dev mailing list - ZODB-Dev@zope.org http://mail.zope.org/mailman/listinfo/zodb-dev