Re: [ZODB-Dev] What's best to do when there is a failure in the second phase of 2-phase commit on a storage server
On Sep 30, 2008, at 1:38 PM, Dieter Maurer wrote:

> Jim Fulton wrote at 2008-9-19 13:45 -0400:
>> ...
>> 2. We (ZC) are moving to 64-bit OSs. I've resisted this for a while
>> due to the extra memory overhead of 64-bit pointers in Python
>> programs, but I've finally (too late) come around to realizing that
>> the benefit far outweighs the cost. (In this case, the process was
>> around 900MB in size.
>
> That is very strange.
> On our Linux systems (Debian etch), the processes can use 2.7 to 2.9 GB
> of memory before the OS refuses to allocate more.

Yeah. Strange.

>> It was probably trying to malloc a few hundred
>> MB. The malloc failed despite the fact that there was more than 2GB
>> of available process address space and system memory.)
>>
>> 3. I plan to add code to FileStorage's _finish that will, if there's
>> an error:
>>
>>   a. Log a critical message.
>>
>>   b. Try to roll back the disk commit.

I decided not to do this. Too complicated.

>>   c. Close the file storage, causing subsequent reads and writes to
>>   fail.
>
> Raise an easily recognizable exception.

I raise the original exception.

> In our error handling we look out for some nasty exceptions and enforce
> a restart in such cases. The exception above might be such a nasty
> exception.

The critical log entry should be easy enough to spot.

...

>> I considered some other ideas:
>>
>> - Try to get FileStorage to repair its metadata. This is certainly
>>   theoretically doable. For example, it could re-build its in-memory
>>   index. At this point, that's the only thing in question. OTOH,
>>   updating it is the only thing left to fail at this point. If
>>   updating it fails, it seems likely that rebuilding it will fail as
>>   well.
>>
>> - Have a storage server restart when a tpc_finish call fails. This
>>   would work fine for FileStorage, but might be the wrong thing to do
>>   for another storage. The server can't know.
>
> Why do you think that a failing "tpc_finish" is less critical
> for some other kind of storage?

It's not a question of criticality. It's a question of whether a restart
will fix the problem. I happen to know that a file storage would be in a
reasonable state after a restart. I don't know this to be the case for
some other storage.

Jim

--
Jim Fulton
Zope Corporation

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev
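The policy Jim settles on above (log a critical message, close the storage so subsequent reads and writes fail, and re-raise the original exception) can be sketched roughly as follows. This is an illustration of the idea, not actual FileStorage code; `SketchStorage`, `StorageClosedError`, and `_apply_finish` are hypothetical names:

```python
import logging

log = logging.getLogger("sketch.storage")


class StorageClosedError(Exception):
    """Raised on access to a storage closed after a failed second phase."""


class SketchStorage:
    """Illustrative stand-in for a FileStorage-like object (not real ZODB)."""

    def __init__(self):
        self.closed = False

    def _apply_finish(self, tid):
        # Placeholder for the real second-phase work: flushing the commit
        # record to disk and updating the in-memory index.
        pass

    def close(self):
        self.closed = True

    def load(self, oid):
        if self.closed:
            raise StorageClosedError("storage closed after tpc_finish failure")
        return b"..."  # placeholder record data

    def tpc_finish(self, tid):
        try:
            self._apply_finish(tid)
        except Exception:
            # a. Log a critical message so operators notice.
            log.critical("tpc_finish failed; closing storage", exc_info=True)
            # c. Close the storage: the on-disk data and in-memory index
            # may now disagree, so refuse all further reads and writes.
            self.close()
            # Re-raise the original exception (Jim's choice above),
            # rather than wrapping it.
            raise
```

The point of closing rather than limping on is that after a partial second phase the storage's in-memory state can no longer be trusted; failing loudly forces the human (or supervising process) intervention Jim mentions.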
Re: [ZODB-Dev] 3.8.1b8 released and would like to release 3.8.1 soon
Jim Fulton wrote:
> I'd appreciate it if people would try it out soon.

Besides the RelStorage site where we were running into problems, as
mentioned in the other thread, we have also been running the ZODB 3.8
branch (beta8 plus the first two beta fixes) in production for about a
week now on a smaller site with blob storage. So far it looks stable.

The Data.fs in question is about 4 gigabytes (packed) with about 3.5
million objects in it, plus about 8 gigabytes of blobs. The site is used
heavily and has a certain amount of conflict errors happening. It uses a
non-persistent ZEO client cache of one gigabyte in addition to a 15
object count cache size.

Hanno
Re: [ZODB-Dev] 3.8.1b8 released and would like to release 3.8.1 soon
Dieter Maurer wrote:
> Wichert Akkerman wrote at 2008-9-24 09:44 +0200:
>> Jim Fulton wrote:
>>> I'd appreciate it if people would try it out soon.
>>
>> I can say that the combination of 3.8.1b8 and Dieter's
>> zodb-cache-size-bytes patch does not seem to work. With
>> zodb-cache-size-bytes set to 1 gigabyte on an instance with a single
>> thread and using RelStorage, Zope capped its memory usage at 200 MB.
>
> I can see two potential reasons (besides a bug in my implementation):
>
> * You have not used a very large object count.
>
>   The tighter restriction (count or size) limits what can be in the
>   cache. With a small object count, this will be tighter than the
>   byte size restriction.

The object count is 65. Without the cache-size-bytes setting this
produces a memory load of about one to one and a half gigabytes.

> * Size is only estimated -- not exact.
>
>   The pickle size is used as a size approximation.
>
>   I would be surprised, however, if the pickle size were five times
>   larger than the real size.

IIRC, turned into a packed Data.fs, the whole content is about 25
gigabytes of typical Plone content. I think a possible interaction with
RelStorage (which we asked Shane to look into) or Jim's mentioned
cPersistence.h change is a far more likely cause of this.

Hanno
Re: [ZODB-Dev] 3.8.1b8 released and would like to release 3.8.1 soon
Jim Fulton wrote:
> On Sep 24, 2008, at 8:06 AM, Wichert Akkerman wrote:
>> Wichert Akkerman wrote:
>>> I can say that the combination of 3.8.1b8 and Dieter's
>>> zodb-cache-size-bytes patch does not seem to work. With
>>> zodb-cache-size-bytes set to 1 gigabyte on an instance with a single
>>> thread and using RelStorage, Zope capped its memory usage at 200 MB.
>>
>> After having a few more Zope instances break down, it seems the
>> behaviour is slightly different: even with the zodb-cache-size-bytes
>> patch included, but without setting that option in zope.conf, Zope
>> instances will at some point stop responding without any hint as to
>> why in any of the logs. We do not seem to see this in another site,
>> so this might be due to an interaction with RelStorage.
>
> See:
>
> http://svn.zope.org/ZODB/trunk?view=rev&rev=91565
>
> Maybe the problem that this addresses is what's affecting you.

This is indeed a very likely contender. As we use this inside Zope 2, I
doubt the Zope C extensions were rebuilt using the cPersistence.h file
from the ZODB egg we included.

Hanno
Re: [ZODB-Dev] 3.8.1b8 released and would like to release 3.8.1 soon
On Sep 24, 2008, at 8:06 AM, Wichert Akkerman wrote:

> Wichert Akkerman wrote:
>> Jim Fulton wrote:
>>> I'd appreciate it if people would try it out soon.
>>
>> I can say that the combination of 3.8.1b8 and Dieter's
>> zodb-cache-size-bytes patch does not seem to work. With
>> zodb-cache-size-bytes set to 1 gigabyte on an instance with a single
>> thread and using RelStorage Zope capped its memory usage at 200mb.
>
> After having a few more zope instances break down it seems the
> behaviour is slightly different: even with the zodb-cache-size-bytes
> patch included but without setting that option in zope.conf Zope
> instances will at some point stop responding without any hint as to
> why in any of the logs. We do not seem to see this in another site,
> so this might be due to an interaction with RelStorage.

See:

http://svn.zope.org/ZODB/trunk?view=rev&rev=91565

Maybe the problem that this addresses is what's affecting you.

Jim

--
Jim Fulton
Zope Corporation
Re: [ZODB-Dev] 3.8.1b8 released and would like to release 3.8.1 soon
Wichert Akkerman wrote at 2008-9-24 09:44 +0200:
> Jim Fulton wrote:
>> I'd appreciate it if people would try it out soon.
>
> I can say that the combination of 3.8.1b8 and Dieter's
> zodb-cache-size-bytes patch does not seem to work. With
> zodb-cache-size-bytes set to 1 gigabyte on an instance with a single
> thread and using RelStorage, Zope capped its memory usage at 200 MB.

I can see two potential reasons (besides a bug in my implementation):

* You have not used a very large object count.

  The tighter restriction (count or size) limits what can be in the
  cache. With a small object count, this will be tighter than the byte
  size restriction.

* Size is only estimated -- not exact.

  The pickle size is used as a size approximation.

  I would be surprised, however, if the pickle size were five times
  larger than the real size.

--
Dieter
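Dieter's first point, that the tighter of the two limits governs what stays cached, can be illustrated with a toy LRU cache bounded by both an object count and a byte budget. This is a sketch of the principle only, not the actual ZODB pickle cache; `ToyCache` and its parameters are hypothetical:

```python
from collections import OrderedDict


class ToyCache:
    """LRU cache bounded by both an object count and a byte budget.

    Whichever limit is hit first forces eviction, mirroring how a small
    cache-size (object count) can dominate a huge cache-size-bytes.
    """

    def __init__(self, max_objects, max_bytes):
        self.max_objects = max_objects
        self.max_bytes = max_bytes
        self._data = OrderedDict()  # oid -> estimated size in bytes
        self._bytes = 0

    def add(self, oid, size):
        # Re-adding an oid refreshes its position and size estimate.
        if oid in self._data:
            self._bytes -= self._data.pop(oid)
        self._data[oid] = size
        self._bytes += size
        # Evict least-recently-used entries until BOTH limits hold.
        while len(self._data) > self.max_objects or self._bytes > self.max_bytes:
            _, evicted_size = self._data.popitem(last=False)
            self._bytes -= evicted_size

    def __len__(self):
        return len(self._data)
```

With `max_objects=65` and a one-gigabyte byte budget, loading a hundred 10 KB objects leaves only 65 in the cache: the count limit, not the byte limit, is binding, which is exactly the situation Wichert's 200 MB ceiling suggests.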
Re: [ZODB-Dev] What's best to do when there is a failure in the second phase of 2-phase commit on a storage server
Jim Fulton wrote at 2008-9-19 13:45 -0400:
> ...
> 2. We (ZC) are moving to 64-bit OSs. I've resisted this for a while
> due to the extra memory overhead of 64-bit pointers in Python
> programs, but I've finally (too late) come around to realizing that
> the benefit far outweighs the cost. (In this case, the process was
> around 900MB in size.

That is very strange. On our Linux systems (Debian etch), the processes
can use 2.7 to 2.9 GB of memory before the OS refuses to allocate more.

> It was probably trying to malloc a few hundred
> MB. The malloc failed despite the fact that there was more than 2GB
> of available process address space and system memory.)
>
> 3. I plan to add code to FileStorage's _finish that will, if there's
> an error:
>
>   a. Log a critical message.
>
>   b. Try to roll back the disk commit.
>
>   c. Close the file storage, causing subsequent reads and writes to
>   fail.

Raise an easily recognizable exception. In our error handling we look
out for some nasty exceptions and enforce a restart in such cases. The
exception above might be such a nasty exception.

If possible, the exception should provide full information about the
original exception (in the way of the nested exceptions of Java,
emulated by Tim in some places in the ZODB code).

> 4. I plan to fix the client storage bug.
>
> I can see 3c being controversial. :) In particular, it means that your
> application will be effectively down without human intervention.

That's why I would prefer an easily recognizable exception -- in order
to restart automatically.

> I considered some other ideas:
>
> - Try to get FileStorage to repair its metadata. This is certainly
>   theoretically doable. For example, it could re-build its in-memory
>   index. At this point, that's the only thing in question. OTOH,
>   updating it is the only thing left to fail at this point. If
>   updating it fails, it seems likely that rebuilding it will fail as
>   well.
>
> - Have a storage server restart when a tpc_finish call fails. This
>   would work fine for FileStorage, but might be the wrong thing to do
>   for another storage. The server can't know.

Why do you think that a failing "tpc_finish" is less critical
for some other kind of storage?

--
Dieter
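Dieter's suggestion of a recognizable exception that still carries full information about the original failure (the Java-style nesting he mentions) maps naturally onto Python's exception chaining. The sketch below uses the Python 3 `raise ... from` form, which did not exist when this thread was written; `TPCFinishError` is a hypothetical name, not an exception ZODB actually defines:

```python
class TPCFinishError(Exception):
    """Hypothetical marker: the second phase of two-phase commit failed.

    A supervising process can match on this one exception type and
    trigger an automatic restart, while the chained __cause__ preserves
    the original failure for the logs.
    """


def finish_second_phase(apply_commit, tid):
    """Run the second-phase commit callable, wrapping any failure."""
    try:
        apply_commit(tid)
    except Exception as exc:
        # Wrap rather than swallow: error handlers get a single nasty,
        # recognizable type, and exc survives as __cause__.
        raise TPCFinishError("tpc_finish failed for tid %r" % (tid,)) from exc
```

This is the alternative to Jim's "re-raise the original exception" choice: handlers no longer need a list of every possible underlying error, at the cost of one extra wrapper type.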