[ZODB-Dev] Re: Connection pool makes no sense
> A little bit of history... We have Zope as an application server for a
> heavily loaded tech process. We have high load peaks several times a day,
> and my question is: how can we avoid unused connections remaining in
> memory after the peak has passed? Before ZODB 3.4.1 the connection pool
> had a fixed size of pool_size, and that caused Zope to block during load
> peaks. ZODB 3.4.2, which is shipped with Zope 2.8.5, has a connection
> pool that does not limit the number of open connections but tries to
> reduce the pool to pool_size, and this behavior is broken IMO. Follow my
> idea... After a load peak I have many (thousands of) connections that
> have cached various objects, including RDB connections.

Huh, are you sure? That would mean you have thousands of threads. Or hundreds of ZEO clients. Or hundreds of ZODB mountpoints. By itself Zope never uses more than one connection per thread, and the number of threads is usually small.

If you see many RDB connections, then it's an RDB problem and not a ZODB problem: something is not releasing RDB connections quickly enough, or is leaking them.

Florent
--
Florent Guillaume, Nuxeo (Paris, France)
Director of R&D
+33 1 40 33 71 59   http://nuxeo.com   [EMAIL PROTECTED]

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Re: Connection pool makes no sense
On Thu, 29/12/2005 at 11:30 +0100, Florent Guillaume wrote:
> > A little bit of history... We have Zope as an application server for a
> > heavily loaded tech process. [...] After a load peak I have many
> > (thousands of) connections that have cached various objects, including
> > RDB connections.

Hundreds... my mistake.

> Huh, are you sure? That would mean you have thousands of threads. Or
> hundreds of ZEO clients. Or hundreds of ZODB mountpoints. By itself Zope
> never uses more than one connection per thread, and the number of
> threads is usually small.
>
> If you see many RDB connections, then it's an RDB problem and not a ZODB
> problem: something is not releasing RDB connections quickly enough, or
> is leaking them.

I don't agree. Can you answer the question? Does self.all.remove(c) mean that we WANT to destroy the connection instance? If not, then where in the ZODB source code can I see connection destruction -- clearing the cache and calling the _v_database_connection.close() method? You've just caught me on "thousands" but gave no comment on the deletion of connection instances... and that is the clue to this topic.

--== *** ==--
Deputy Director, Information Technology Department
Igor Vladislavovich Yudytsky
[ZODB-Dev] Re: Connection pool makes no sense
> > A little bit of history... We have Zope as an application server for a
> > heavily loaded tech process. [...] After a load peak I have many
> > (thousands of) connections that have cached various objects, including
> > RDB connections.
>
> Hundreds... my mistake.
>
> > Huh, are you sure? That would mean you have thousands of threads. Or
> > hundreds of ZEO clients. Or hundreds of ZODB mountpoints. By itself
> > Zope never uses more than one connection per thread, and the number of
> > threads is usually small.
> >
> > If you see many RDB connections, then it's an RDB problem and not a
> > ZODB problem: something is not releasing RDB connections quickly
> > enough, or is leaking them.
>
> I don't agree. Can you answer the question? Does self.all.remove(c) mean
> that we WANT to destroy the connection instance?

The self.all.remove(c) in _ConnectionPool attempts to destroy the connection. If something else has a reference to it once it's closed, then that's a bug, and it shouldn't. The pool should only keep a weak reference to it at most.

> If not, then where in the ZODB source code can I see connection
> destruction -- clearing the cache and calling the
> _v_database_connection.close() method?

Sorry, I don't know what a _v_database_connection is; it's not in ZODB or transaction code. If it's RDB code I can't help you.

> You've just caught me on "thousands" but gave no comment on the deletion
> of connection instances... and that is the clue to this topic.

Even hundreds of ZODB connections is absurd. Again, with 4 threads you should never get more than 4 FileStorage connections plus 4 TemporaryStorage connections.

Florent
--
Florent Guillaume, Nuxeo (Paris, France)
CTO, Director of R&D
+33 1 40 33 71 59   http://nuxeo.com   [EMAIL PROTECTED]
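Florent's point -- that once a connection leaves the pool, the pool should hold "a weak reference to it at most" -- rests on basic weakref behavior. A minimal sketch using only the standard library (`Conn` is a hypothetical stand-in, not ZODB's Connection class):

```python
import weakref

class Conn(object):
    """Hypothetical stand-in for a pooled connection (not ZODB's class)."""
    pass

c = Conn()
r = weakref.ref(c)   # what the pool keeps: does not keep c alive by itself
assert r() is c      # while a strong reference exists, the weakref resolves
del c                # drop the only strong reference
assert r() is None   # the object is gone; the weakref is now dead
```

With no reference cycles involved, CPython's refcounting reclaims the object the moment the last strong reference disappears; cycles change that picture, as the later messages in this thread discuss.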
[ZODB-Dev] Re: Connection pool makes no sense
On Thu, 29/12/2005 at 12:27 +0100, Florent Guillaume wrote:
> > > If you see many RDB connections, then it's an RDB problem and not a
> > > ZODB problem: something is not releasing RDB connections quickly
> > > enough, or is leaking them.
> >
> > I don't agree. Can you answer the question? Does self.all.remove(c)
> > mean that we WANT to destroy the connection instance?
>
> The self.all.remove(c) in _ConnectionPool attempts to destroy the
> connection. If something else has a reference to it once it's closed,
> then that's a bug, and it shouldn't. The pool should only keep a weak
> reference to it at most.

But that's nonsense! If a weakref exists, then some other object holds a ref to the obj! And a WeakValueDictionary is cleaned up automatically when the last strong ref disappears. Destroying obj with this logic is absurd:

    def _reduce_size(self, strictly_less=False):
        target = self.pool_size - bool(strictly_less)
        while len(self.available) > target:
            c = self.available.pop(0)  # <== we hold a ref to the connection
            self.all.remove(c)         #     here, before calling remove

    def remove(self, obj):
        del self.data[id(obj)]
        # <== there is no use in deleting obj by deleting the weakref...
        #     we are just deleting the weakref from the WeakValueDictionary!

Try this:

1. Add this method to the Connection class definition:

    def __del__(self):
        print 'Destruction...'

2. Then do this:

    >>> import sys
    >>> sys.path.append('/opt/Zope/lib/python')
    >>> from ZODB import Connection
    >>> c = Connection.Connection()
    >>> del(c)
    >>> c = Connection.Connection()
    >>> del(c._cache)
    >>> del(c)
    Destruction...

See? You can NOT delete the first object, because _cache keeps a reference to it... and the connection remains forever!!! Its cache holds RDB connection objects, and they are not closed. The connection becomes inaccessible and unobtainable through the connection pool. That's what I wanted to say. It's definitely a BUG.

> > If not, then where in the ZODB source code can I see connection
> > destruction -- clearing the cache and calling the
> > _v_database_connection.close() method?
>
> Sorry, I don't know what a _v_database_connection is; it's not in ZODB
> or transaction code. If it's RDB code I can't help you.

Don't bother... it's an RDB DA handle.

> > You've just caught me on "thousands" but gave no comment on the
> > deletion of connection instances... and that is the clue to this
> > topic.
>
> Even hundreds of ZODB connections is absurd. Again, with 4 threads you
> should never get more than 4 FileStorage connections plus 4
> TemporaryStorage connections.

Okay... we moved from Zope 2.7.4, which blocked under high site activity with a small number of threads and a small pool_size, so we had to increase those numbers. Anyway, in the default configuration of 4 threads and a pool_size of 7 we can watch lots of lost connections, and we now know it's a bug... so we keep a big pool_size to avoid connection deletion (losing).

--== *** ==--
Deputy Director, Information Technology Department
Igor Vladislavovich Yudytsky
RE: [ZODB-Dev] Connection pool makes no sense
[??? ? ?] wrote:
> Hi. A little bit of history... We have Zope as an application server for
> a heavily loaded tech process. [...] After a load peak I have many
> (thousands of) connections that have cached various objects, including
> RDB connections. Those connections are NEVER used after that. Why...
> well, because the connection pool doesn't work correctly in the
> _reduce_size method:
>
>     # Throw away the oldest available connections until we're under our
>     # target size (strictly_less=False) or no more than that
>     # (strictly_less=True, the default).
>     def _reduce_size(self, strictly_less=False):
>         target = self.pool_size - bool(strictly_less)
>         while len(self.available) > target:
>             c = self.available.pop(0)
>             self.all.remove(c)
>
> Does this mean that we want to delete the connection object from memory?

No, it means that _ConnectionPool (the class to which the method _reduce_size() belongs) no longer wishes to remember anything about connection `c`. Nothing in _ConnectionPool _prevents_ `c` from going away then, but references in application code can keep `c` alive for an arbitrarily long time after this.

> If yes, then why do we use the remove method of the weak set object?

_ConnectionPool no longer has any reason to remember anything about `c`, so it would be wasteful for it to continue burning RAM keeping `c` in its weak set. If ill-behaved application code is keeping `c` alive, _ConnectionPool.all could otherwise grow without bound.

> It's nonsense.

It's defensive coding, protecting _ConnectionPool from some bad effects of ill-behaved application code.
>     # Same as a Set, remove obj from the collection, and raise
>     # KeyError if obj is not in the collection.
>     def remove(self, obj):
>         del self.data[id(obj)]
>
> This just removes the weakref from the WeakValueDictionary, not the
> object...

Any time you do del dict[key] on a dictionary `dict`, `key` is removed from `dict`, and so is the associated value `dict[key]` (which is a weakref to `obj` in the snippet above). The only _strong_ reference _ConnectionPool had to `c` was in its .available queue, and that's gone too.

> So if we are willing to destroy obj, we are on the wrong way here...

Sorry, it looks fine to me.

> Ok. Let's look at pop, push and repush... We pop a connection from
> self.available, and we push and repush a used connection to
> self.available, so there is no other way to obtain a connection -- only
> from self.available. Good. But... look at what the _reduce_size method
> does. It pops the first connection and tries to remove it when
> self.available is too large, so the connection is no longer in the
> self.available list and nobody can obtain it any more. Good... but only
> if the connection object is actually deleted from memory. But it's
> still there!!! And its cache too, with open RDB connections that will
> never serve anyone.

Then it's most likely that application code is retaining one or more strong references to it. ZODB can't stop application code from doing that.

> I don't know if there is some other way to return a connection to the
> pool.

A Connection `c` is returned to the pool as a result of explicitly calling c.close().

> I do not know ZODB as a whole, but according to the logic I can see in
> these pieces of code, and what I see every day after load peaks, I
> believe the connection object SHOULD be deleted from memory along with
> its cached objects.

ZODB has never done that -- close() has always returned a Connection to a pool for potential reuse, with its cache intact.
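Tim's point here -- that `del self.data[id(obj)]` discards only the pool's weakref entry, while the object's fate is decided by whatever strong references remain -- can be checked directly. A minimal sketch, with a plain dict of weakrefs standing in for the weak set's .data mapping (names are illustrative, not ZODB's):

```python
import weakref

class Conn(object):
    """Hypothetical stand-in for a pooled connection."""
    pass

c = Conn()
data = {id(c): weakref.ref(c)}   # analogue of the weak set's .data mapping

del data[id(c)]                  # what remove(obj) does: drop the weakref entry
assert isinstance(c, Conn)       # c itself is untouched; we still hold it

r = weakref.ref(c)
del c                            # only now does the last strong ref go away
assert r() is None               # ...and the object is actually reclaimed
```

Deleting the bookkeeping entry and destroying the object are independent events, which is exactly the distinction the two posters are talking past each other about.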
Most people call that an important optimization, because most people (meaning Zope ;-)) have only a handful of Connections open at a time, and get real value out of reusing cache content.

> And this can definitely not be done by deleting a weakref.

The primary purpose of deleting the weakref is to prevent unbounded memory growth of the .all set in the face of ill-behaved application code. It also speeds Python's gc to destroy weakrefs that are no longer needed (otherwise Python's gc has to spend time analyzing them).

> Or it seems to me like a memory leak.

If application code keeps strong references to closed Connection objects, then yes, they'll certainly leak.

> I think better logic would be to have an idle period along with
> pool_size. We should remove the oldest connection from the pool if it
> has not been used for the idle period. Then we can have a small
> pool_size and a small connection pool that can grow with site load and
> shrink when load is low.
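The idle-period proposal quoted above can be sketched as a small pool class. This is a hypothetical illustration of the suggested policy, not ZODB's actual _ConnectionPool; the `now` parameter is threaded through only to make the timing testable.

```python
import time
import collections

class IdlePool(object):
    """Sketch of the proposed policy: keep at most pool_size idle
    connections, and drop any connection left unused longer than
    idle_seconds so it can be reclaimed."""

    def __init__(self, factory, pool_size=4, idle_seconds=300):
        self.factory = factory
        self.pool_size = pool_size
        self.idle_seconds = idle_seconds
        self.available = collections.deque()  # (last_used, conn), oldest first

    def pop(self, now=None):
        now = time.time() if now is None else now
        self._evict_idle(now)
        if self.available:
            return self.available.popleft()[1]
        return self.factory()                 # grow under load

    def repush(self, conn, now=None):
        now = time.time() if now is None else now
        self.available.append((now, conn))
        self._evict_idle(now)
        while len(self.available) > self.pool_size:
            self.available.popleft()          # drop oldest beyond pool_size

    def _evict_idle(self, now):
        while self.available and now - self.available[0][0] > self.idle_seconds:
            self.available.popleft()          # unused too long: let it go
```

As in ZODB, "dropping" a connection here only discards the pool's strong reference; whether the object actually dies still depends on application code.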
RE: [ZODB-Dev] Re: Connection pool makes no sense
[Florent Guillaume]
> ...
> The self.all.remove(c) in _ConnectionPool attempts to destroy the
> connection.

Nope, it's simply getting rid of a weak reference that no longer serves a purpose, to avoid unbounded growth of the .all set in case of ill-behaved application code, and to speed Python's cyclic gc a little. Removing the Connection from .available removed _ConnectionPool's only strong reference to the Connection.

> If something else has a reference to it once it's closed, then that's a
> bug, and it shouldn't.

Yup!

> ...
> Even hundreds of ZODB connections is absurd.

I'd settle for calling it uncommon and unexpected.

> Again, with 4 threads you should never get more than 4 FileStorage
> connections plus 4 TemporaryStorage connections.

Bears repeating ;-)
RE: [ZODB-Dev] Re: Connection pool makes no sense
[Florent Guillaume]
> The self.all.remove(c) in _ConnectionPool attempts to destroy the
> connection. If something else has a reference to it once it's closed,
> then that's a bug, and it shouldn't. The pool should only keep a weak
> reference to it at most.

[EMAIL PROTECTED]
> But that's nonsense!

Please try to remain calm here. It's not nonsense, but if you're screaming too loudly you won't be able to hear :-)

> If a weakref exists, then some other object holds a ref to the obj!

Or there are no strong references to `obj`, but `obj` is part of cyclic garbage and so _continues to exist_ until a round of Python's cyclic garbage collection runs.

> And a WeakValueDictionary is cleaned up automatically when the last
> strong ref disappears.

That's a necessary precondition, but isn't necessarily sufficient. When the last strong reference to a value in a WeakValueDictionary goes away, if that value is part of cyclic garbage then the WeakValueDictionary does not change until Python's cyclic gc runs.

> Destroying obj with this logic is absurd:

I covered that before, so won't repeat it. You misunderstood the intent of this code.

> ...
>     del self.data[id(obj)]
>
> There is no use in deleting obj by deleting the weakref... we are just
> deleting the weakref from the WeakValueDictionary!

Yes, it's just deleting the weakref -- and that's all it's trying to do, and there are good reasons to delete the weakref here (but they are not the reasons you thought were at work).

> Try this:
>
> 1. Add this method to the Connection class definition:
>
>     def __del__(self):
>         print 'Destruction...'
>
> then do this:

You're _really_ going to confuse yourself now ;-) Because Connections are always involved in reference cycles, adding a __del__ method to Connection guarantees that Python's garbage collection will _never_ reclaim a Connection (at least not until you explicitly break the reference cycles).
>     >>> import sys
>     >>> sys.path.append('/opt/Zope/lib/python')
>     >>> from ZODB import Connection
>     >>> c = Connection.Connection()
>     >>> del(c)
>     >>> c = Connection.Connection()
>     >>> del(c._cache)

You're breaking a reference cycle by hand here, so that it becomes _possible_ for gc to clean up the Connection. But the only reason that was necessary is because you added a __del__ method to begin with.

>     >>> del(c)
>     Destruction...
>
> See? You can NOT delete the first object, because _cache keeps a
> reference to it... and the connection remains forever!!!

That's because you added a __del__ method; it's not how Connection normally works. I'll give other code below illustrating this.

> Its cache holds RDB connection objects, and they are not closed. The
> connection becomes inaccessible and unobtainable through the connection
> pool.

In your code above, `c` was never in a connection pool. You're supposed to get a Connection by calling DB.open(), not by instantiating Connection() yourself (and I sure hope you're not instantiating Connection() directly in your app!).

> That's what I wanted to say. It's definitely a BUG.

Sorry, there's no evidence of a ZODB bug here yet. Consider this code instead. It opens 10 Connections in the intended way (via DB.open()), and creates a weakref with a callback to each so that we can tell when they're reclaimed. It then closes all the Connections, and destroys all its strong references to them:

    import weakref
    import gc
    import ZODB
    import ZODB.FileStorage

    class Wrap:
        def __init__(self, i):
            self.i = i
        def __call__(self, *args):
            print "Connection #%d went away." % self.i

    N = 10
    st = ZODB.FileStorage.FileStorage('blah.fs')
    db = ZODB.DB(st)
    cns = [db.open() for i in xrange(N)]
    wrs = [weakref.ref(cn, Wrap(i)) for i, cn in enumerate(cns)]
    print "closing connections"
    for cn in cns:
        cn.close()
    print "del'ing cns"
    del cns  # destroy all our strong references
    print "invoking gc"
    gc.collect()
    print "done"

This is the output:

    closing connections
    del'ing cns
    invoking gc
    Connection #0 went away.
    Connection #1 went away.
    Connection #2 went away.
    done

Note that nothing happens before Python's cyclic gc runs. That's because Connections are in reference cycles, and refcounting cannot reclaim objects in trash cycles. Because I used weakref callbacks instead of __del__ methods, cyclic gc _can_ reclaim Connections in trash cycles.

When the 10 Connections got closed, internally _ConnectionPool added them, one at a time, to its .available queue. When #7 was closed, the pool grew to 8 objects, so it forgot everything it knew about the first Connection (#0) in its queue. Nothing happens then, though, because nothing _can_ happen before cyclic gc runs. When #8 was closed, #1 got removed from .available, and when #9 was closed, #2 got removed from .available. When gc.collect() runs, those 3 Connections (#0, #1, and #2) are all reclaimed. The other 7 Connections (#3-#9) are still alive, sitting in the .available queue waiting to be reused.
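The mechanics Tim's example relies on -- refcounting cannot reclaim objects in reference cycles, but cyclic gc can, and weakref callbacks fire when it does -- can be reproduced without ZODB. A minimal sketch in plain Python (one caveat: Tim's "__del__ blocks gc forever" statement is Python 2 behavior; since Python 3.4 and PEP 442, a __del__ method no longer prevents collection of cycles, but the cycle mechanics shown here are unchanged):

```python
import gc
import weakref

class Node(object):
    pass

collected = []

a = Node()
b = Node()
a.partner = b        # a <-> b form a reference cycle,
b.partner = a        # much like a Connection and its cache
r = weakref.ref(a, lambda ref: collected.append("a went away"))

del a, b             # drop all strong references; the cycle keeps both alive
gc.collect()         # cyclic gc finds the trash cycle and fires the callback
assert r() is None
assert collected == ["a went away"]
```

This is the same pattern as Tim's Wrap callbacks: nothing fires until gc.collect() (or a spontaneous gc run) actually breaks the cycle.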
RE: [ZODB-Dev] Re: Connection pool makes no sense
Oops! I sent this to zope-dev instead of zodb-dev by mistake.

[EMAIL PROTECTED]
> > I don't agree. Can you answer the question? Does self.all.remove(c)
> > mean that we WANT to destroy the connection instance?

[Tim Peters]
> It means that _ConnectionPool no longer has a reason to remember
> anything about that Connection. Application code can continue keeping it
> alive forever, though.

[Denis Markov]
> But what about an RDB-Connection that stays in the cache forever?

Sorry, I don't know anything about how your app uses RDB connections. ZODB isn't creating them on its own ;-)

> On the next peak load we get some new ZODB-Connections with
> RDB-Connections. After repush() the old ZODB-Connections will be killed
> (if over pool_size)

I don't like the word "killed" here, because it seems highly misleading. ZODB doesn't destroy any Connections or any caches. ZODB destroys all its strong references to old Connections, and that's all. Nothing can be done to _force_ Connections to go away forever. It's ZODB's job here to make sure it isn't forcing Connections (beyond the pool_size limit) to stay alive, and it's doing that job. It can't kill Connections.

> but the RDB-Connection stays in the cache forever. And so on.

There's one cache per Connection. If and when a Connection goes away, its cache goes away too. So when you say something "stays in the cache forever", I don't know what you mean -- you apparently have many (hundreds? thousands?) of Connections, in which case you also have many (hundreds or thousands) of caches. I don't know how an RDB-Connection gets into even one of those caches to begin with.

> as a result we get many RDB-Connections that will never be used but hang
> our RDB

At this point I have to hope that someone else here understands what you're doing. If not, you may have better luck on the zope-db list (which is devoted to using other databases with Zope):

    http://mail.zope.org/mailman/listinfo/zope-db
RE: [ZODB-Dev] ZEO and setting instance variables in __getstate__
[Syver Enstad]
> I have recently upgraded from ZODB 3.2 to 3.5.1. After doing this I
> notice that ZEO throws exceptions on committing a transaction for
> certain types of Persistent classes.
> ...

I was able to create a small self-contained test case from this description, and opened a Collector issue containing it:

    http://www.zope.org/Collectors/Zope3-dev/526

As it says, I don't know whether it should work, but offhand I don't see why not. As is, creating new state inside __getstate__ is confusing the heck out of ZEO's MVCC cache for some reason.
RE: [ZODB-Dev] Re: Connection pool makes no sense
Tim Peters wrote at 2005-12-29 11:28 -0500:
> ...
> Or there are no strong references to `obj`, but `obj` is part of cyclic
> garbage and so _continues to exist_ until a round of Python's cyclic
> garbage collection runs.

And this is *VERY* likely, as any persistent object in the cache has a (strong, I believe) reference to the connection, which in turn references any of these objects indirectly via the cache.

In my view, closed connections not put back into the pool should be explicitly cleaned up, e.g. their cache cleared or at least minimized. If for some reason the garbage collector does not release the cache/cache-content cycles, then the number of connections would grow unboundedly, which is much worse than unbounded growth of the "all" attribute. Pitcher seems to observe such a situation (where for some unknown reason the garbage collector does not collect the connection).

--
Dieter
RE: [ZODB-Dev] Re: Connection pool makes no sense
[Tim]
> ...
> Or there are no strong references to `obj`, but `obj` is part of cyclic
> garbage and so _continues to exist_ until a round of Python's cyclic
> garbage collection runs.

[Dieter Maurer]
> And this is *VERY* likely, as any persistent object in the cache has a
> (strong, I believe) reference to the connection, which in turn
> references any of these objects indirectly via the cache.

I'm not sure I follow: it's not just very likely that Connections end up in cycles, it's certain that they do. The small test code I posted later should make that abundantly clear. They end up in cycles even if they're never used: call DB.open(), and the Connection it returns is already in a cycle (at least because a Connection and its cache each hold a strong reference to the other).

> In my view, closed connections not put back into the pool

That never happens: when an open Connection is closed, it always goes back into the pool. If that would cause the configured pool_size to be exceeded, then other, older closed Connections are removed from the pool to make room. It's an abuse of the system for apps even to get into that state: that's why ZODB logs warnings if pool_size is ever exceeded, and logs at critical level if it's exceeded a lot. Connections should be viewed as a limited resource.

> should be explicitly cleaned up, e.g. their cache cleared or at least
> minimized.

The code that removes older Connections from the pool doesn't do that now; it could, but there's no apparent reason to complicate it that I can see.

> If for some reason the garbage collector does not release the
> cache/cache-content cycles, then the number of connections would grow
> unboundedly, which is much worse than unbounded growth of the "all"
> attribute.

There's a big difference, though: application code alone _could_ provoke unbounded growth of .all without the current defensive coding -- that doesn't require hypothesizing Python gc bugs for which there's no evidence. If an application is seeing unbounded growth in the number of Connections, it's a Python gc bug, a ZODB bug, or an application bug. While cyclic gc may still seem novel to Zope2 users, it's been in Python for over five years, and bug reports against it have been very rare -- most apps stopped worrying about cycles years ago, and Zope3 has cycles just about everywhere you look. ZODB isn't a pioneer here.

I ran stress tests against ZODB a year or so ago (when the new connection management code was implemented) that created millions of Connections, and saw no leaks then, regardless of whether they were or weren't explicitly closed. That isn't part of the test suite because it tied up a machine for a day ;-), but nothing material has changed since then that I know of. It's possible a new leak got introduced, but I'd need more evidence of that before spending time on it; the small test code I posted before showed that at least that much still works as designed, and it hit all the major paths through the connection management code.

> Pitcher seems to observe such a situation (where for some unknown reason
> the garbage collector does not collect the connection).

I don't believe we have any real idea what they're doing, beyond that something somewhere is sticking around longer than they would like.
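Tim's argument for the defensive removal from .all -- that without it, application code leaking closed connections would make the pool's bookkeeping grow without bound -- can be sketched by comparing a bookkeeping dict that forgets evicted connections with one that doesn't. All names here are illustrative analogues, not ZODB's actual structures:

```python
import weakref

class Conn(object):
    """Hypothetical stand-in for a pooled connection."""
    pass

leaked = []      # ill-behaved app code keeping closed connections alive

naive_all = {}   # never removes entries: grows with every leaked conn
pruned_all = {}  # forgets a conn once the pool stops tracking it

POOL_SIZE = 7
available = []

for _ in range(100):
    c = Conn()
    naive_all[id(c)] = weakref.ref(c)
    pruned_all[id(c)] = weakref.ref(c)
    leaked.append(c)             # app bug: strong ref outlives close()
    available.append(c)
    while len(available) > POOL_SIZE:
        old = available.pop(0)
        del pruned_all[id(old)]  # the defensive self.all.remove(c)

# The leaked objects stay alive either way -- ZODB can't prevent that --
# but only the pruned bookkeeping stays bounded:
assert len(pruned_all) == POOL_SIZE
assert len(naive_all) == 100
```

The pruning changes nothing about which connections survive; it only keeps the pool's own memory use proportional to pool_size rather than to the application's leak.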
[ZODB-Dev] Query Regrading ZODB FileStorage(.fs file)
Hello,

I have recently jumped into Python and ZODB. I am quite familiar with the syntax and everything, but there is one issue. When we create a .fs file, say Data.fs, and save some objects in it, if we open this file in Notepad or another editor it shows the data about the objects -- everything: their names, addresses, whatever information an object has. You can search for a particular property in that file. What should be done to hide that data? Please, if you can, reply back to my email if you have any solution.

Thanks,
Monica
Re: [ZODB-Dev] Query Regrading ZODB FileStorage(.fs file)
On 12/29/05, Monica chopra [EMAIL PROTECTED] wrote:
> I have recently jumped into Python and ZODB. I am quite familiar with
> the syntax and everything, but there is one issue. When we create a .fs
> file, say Data.fs, and save some objects in it, if we open this file in
> Notepad or another editor it shows the data about the objects --
> everything: their names, addresses, whatever information an object has.
> You can search for a particular property in that file. What should be
> done to hide that data?

You need to be more careful in formulating your question. What data are you attempting to hide? Who are you attempting to hide it from? And so on.

Neither ZODB nor FileStorage was designed with a thought towards encrypting the persistent representation of the data.

Jeremy
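Jeremy's point is easy to verify without ZODB at all: FileStorage records are pickles, and a pickle stores attribute names and string values essentially in the clear. A minimal sketch with the plain pickle module (the record contents are made up for illustration; a persistent object's state is, in essence, just such a mapping of attribute names to values):

```python
import pickle

# Stand-in for one persistent object's state, as it would be pickled
record = {"owner": "monica", "balance": 1000}
raw = pickle.dumps(record, protocol=2)

# Attribute names and string values are visible in the raw bytes, which is
# exactly why they can be found by searching a Data.fs in a text editor:
assert b"owner" in raw
assert b"monica" in raw
```

Anything that must not be readable on disk has to be encrypted by the application before it is stored; neither pickle nor FileStorage does that for you.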