r5201 removes the "dict" reference from InstanceState upon disposal, which in the usual case is detected by a weakref handler. It makes one particular test, which loads many thousands of rows with lots of garbage collection, run 1.8% faster: from 12.20 sec to 11.99 sec. I don't think there are any cycles in Query. The cycle between Connection and Transaction is self-managing when the transaction is rolled back or committed, and similarly the cycle between SessionTransaction and Session is cleaned up when you call Session.close().
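To illustrate the general pattern, here is a minimal sketch in plain Python, assuming CPython's reference counting. It is not SQLAlchemy's code: `CycleBreakingState`, `NaiveState`, `Mapped` and `state_freed_without_gc()` are hypothetical names, and it assumes for illustration that the instance's `__dict__` holds a reference back to its state object, so state and dict form a cycle. A weakref callback on the instance drops the state's `dict` reference when the instance is disposed, so refcounting alone can free everything with gc disabled.

```python
import gc
import weakref


class CycleBreakingState:
    """Hypothetical per-instance bookkeeping object (not SQLAlchemy's class).

    It holds the instance's __dict__, and that __dict__ holds it back,
    so the two form a reference cycle.
    """

    def __init__(self, obj):
        self.dict = obj.__dict__
        obj.__dict__["_state"] = self
        # The callback closes over a *weak* reference to this state object;
        # closing over `self` directly would create a new cycle
        # (state -> weakref -> callback -> state).
        self_ref = weakref.ref(self)

        def on_instance_disposed(_dead_ref):
            state = self_ref()
            if state is not None:
                state.dict = None  # break the state <-> dict cycle

        self.obj = weakref.ref(obj, on_instance_disposed)


class NaiveState:
    """Hypothetical: same cycle, but nothing ever breaks it."""

    def __init__(self, obj):
        self.dict = obj.__dict__
        obj.__dict__["_state"] = self


class Mapped:
    """Hypothetical stand-in for a mapped instance."""


def state_freed_without_gc(state_cls):
    """Return True if refcounting alone frees the state once the instance goes away."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        obj = Mapped()
        obj.name = "row 1"
        state_ref = weakref.ref(state_cls(obj))
        del obj
        freed = state_ref() is None
        gc.collect()  # clean up whatever leaked
        return freed
    finally:
        if was_enabled:
            gc.enable()


print(state_freed_without_gc(CycleBreakingState))  # True
print(state_freed_without_gc(NaiveState))          # False
```

Registering a bound method such as `self._cleanup` as the callback would look simpler, but the weakref would then hold the state strongly and reintroduce a cycle.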
On Oct 26, 12:49 pm, Michael Bayer <[EMAIL PROTECTED]> wrote:
> On Oct 26, 2008, at 10:42 AM, Henk wrote:
>
> > Hi,
> >
> > In what way is SQLAlchemy dependent on Python's cyclic gc? Is there any effort to make it work even if gc is off?
> >
> > I tried to turn gc off in a server-side program I am developing, but this resulted in a lot of 'leaked' cycles/memory as soon as SQLAlchemy is used to do some queries...
> >
> > I traced one cycle down to the relation between Session and SessionTransaction. They both hold a reference to each other, and this cycle is not broken in the session's close method, nor is one of them a weakref...
> >
> > Turning on gc leak detection also shows a lot of IdentityManagedState objects being involved in cycles (i.e. not collected by refcount). The number of these objects just keeps growing with each query. Closing the Session has no influence on this...
> >
> > Depending on cyclic gc is detrimental to the performance of server-side apps serving many simultaneous clients. The cyclic gc can halt the whole Python process for seconds at a time (if there is a large number of object instances in the process). All clients will thus experience quite some lag when the cyclic gc kicks in.
> >
> > Unfortunately, with SQLAlchemy, tuning the gc to run less often is also not good, because of the very large numbers of instances being created. In my experience a simple query for 2000 rows already creates about 10 MB worth of SQLAlchemy objects involved in cycles that need to be gc'd. So putting off the gc would mean memory would quickly be exhausted...
> >
> > (I am using SQLAlchemy trunk revision 5200)
>
> In general we don't worry about cyclic GC unless an issue has been demonstrated.
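Henk's Session/SessionTransaction observation can be reproduced in a few lines. This is a hypothetical sketch, assuming CPython: `FakeSession`, `FakeSessionTransaction` and `leaked_objects()` are illustrative names, not SQLAlchemy APIs. With automatic collection disabled, a two-object cycle survives `del` and only turns up when `gc.collect()` runs with `gc.DEBUG_SAVEALL`, whereas a `close()` that drops both references lets refcounting free the pair immediately.

```python
import gc


class FakeSession:
    """Hypothetical stand-in for a session (not sqlalchemy.orm.Session)."""

    def __init__(self, break_cycle_on_close):
        self.break_cycle_on_close = break_cycle_on_close
        self.transaction = FakeSessionTransaction(self)

    def close(self):
        if self.break_cycle_on_close:
            # Drop both halves of the session <-> transaction cycle.
            self.transaction.session = None
            self.transaction = None


class FakeSessionTransaction:
    """Hypothetical stand-in for a session transaction."""

    def __init__(self, session):
        self.session = session  # back-reference completes the cycle


def leaked_objects(break_cycle_on_close):
    """Count fake session objects that refcounting alone could not free."""
    was_enabled = gc.isenabled()
    gc.disable()                    # no automatic cyclic collection
    gc.collect()                    # start from a clean slate
    gc.set_debug(gc.DEBUG_SAVEALL)  # keep unreachable objects in gc.garbage
    try:
        session = FakeSession(break_cycle_on_close)
        session.close()
        del session
        gc.collect()                # explicit collections still run while disabled
        return sum(
            isinstance(o, (FakeSession, FakeSessionTransaction))
            for o in gc.garbage
        )
    finally:
        gc.set_debug(0)
        gc.garbage.clear()
        if was_enabled:
            gc.enable()


print(leaked_objects(break_cycle_on_close=False))  # 2
print(leaked_objects(break_cycle_on_close=True))   # 0
```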
> There are some areas where we are careful not to create cycles, in cases where weakly referenced structures need to fall out of scope automatically or where the timing of gc.collect() tends to be troublesome. Specifically, within the area of fetched rows and fetched ORM objects which you mention, we are careful not to introduce any cyclic references within the rows themselves or on your mapped instances, so that a zero refcount will in fact garbage collect your instances without a cyclic run, even though the associated IdentityManagedState may have cycles as you mention. The connection pool is also very cycle-aware.
>
> The key issue here is whether you're requesting a completely "pure" non-cyclical application, or whether we're just talking about cycles within objects that are created on a large scale. In the latter case I would think we're probably only talking about IdentityManagedState. RowProxy, its "SQL expression" analogue, doesn't have any cycles. I wouldn't think SessionTransaction puts that much of a burden on cyclic gc, since you typically have only one or maybe two of those per request, assuming you are using autocommit=False with a request-scoped session. There's also probably a cycle between Connection and Transaction, which is the lower-level analogue of Session->SessionTransaction. There are also at least a few cycles within the Query object, and I don't think there are any within ClauseElement structures, but I am not 100% sure.
>
> In the case that you are seeking to remove all cycles from SQLAlchemy as a whole, that doesn't strike me as particularly practical. There are many areas where traversal among internal structures in both directions is required, especially within Table and mapper structures. To achieve this without cycles, weakrefs would have to be used.
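A minimal sketch of the weakref approach to two-way traversal, assuming CPython. `TableNode`, `ColumnNode`, `StrongColumnNode` and the helper functions are hypothetical, not SQLAlchemy's `Table`/`Column`. The child keeps only a weak reference to its parent, so dropping the parent frees the whole structure without gc, but every `column.table` access becomes a property call plus a weakref dereference. The timing helper gives only a rough, machine-dependent comparison.

```python
import gc
import timeit
import weakref


class TableNode:
    """Hypothetical parent: holds strong references to its children."""

    def __init__(self, name):
        self.name = name
        self.columns = []

    def add_column(self, col_name):
        col = ColumnNode(col_name, self)
        self.columns.append(col)
        return col


class ColumnNode:
    """Hypothetical child: refers back to its parent only weakly."""

    def __init__(self, name, table):
        self.name = name
        self._table_ref = weakref.ref(table)  # weak back-reference, so no cycle

    @property
    def table(self):
        # Each access is a property call plus a weakref call, not a plain lookup.
        return self._table_ref()


class StrongColumnNode:
    """Hypothetical child with a plain attribute; if the parent also held it, this would be a cycle."""

    def __init__(self, name, table):
        self.name = name
        self.table = table


def freed_by_refcount():
    """Build a parent/child pair with gc disabled and check the parent is freed on del."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        table = TableNode("users")
        col = table.add_column("id")
        assert col.table is table  # traversal works in both directions
        table_ref = weakref.ref(table)
        del table, col
        return table_ref() is None
    finally:
        if was_enabled:
            gc.enable()


def access_times(number=200_000):
    """Time `col.table` via a plain attribute versus the weakref-backed property."""
    table = TableNode("users")
    strong_col = StrongColumnNode("id", table)
    weak_col = ColumnNode("id", table)
    direct = timeit.timeit(lambda: strong_col.table, number=number)
    via_weakref = timeit.timeit(lambda: weak_col.table, number=number)
    return direct, via_weakref


print(freed_by_refcount())  # True
direct, via_weakref = access_times()
print(f"plain attribute: {direct:.4f}s, weakref property: {via_weakref:.4f}s")
```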
> Weakrefs introduce a significant performance penalty of their own on every access, since they turn a very fast attribute access into a function call, and function calls are very expensive in Python. It's not clear to me that the performance saved by disabling cyclic gc would be greater than the cost introduced by widespread usage of weakrefs, and at the very least it would impose a significant performance penalty on the vast majority of users who leave Python's GC settings unchanged. It would also be a great burden on future development and testing to ensure that no new code ever introduces cycles, and to maintain all those strong/weak reference pairs without unexpected reference loss. SQLA also relies upon DBAPI implementations which may use cycles, and we may someday have other dependencies which require cycles.
>
> So I think removing cycles from row-corresponding objects like IdentityManagedState is worth it, only marginally so for objects like Query, Session, SessionTransaction, Connection and Transaction, and not worth it at all for application-scoped things like mappers and tables. For IMS we'd also need to add some tests; we have a "memory cleanup" testing methodology that is somewhat lacking, since I've found that peeking into gc.get_objects() and similar, looking for specific patterns, is more difficult than it might seem.
>
> Beyond the case of IdentityManagedState, I wouldn't be convinced that gc has a significant impact without seeing some benchmarks (I'm actually skeptical of the IMS case too, but removing cycles there shouldn't be too hard). In Python I've often seen a great disparity between the theoretical and the actual with regard to performance, and it's not worth going down any avenue without benchmarks and profile results to start with. I wonder if sites like reddit and YouTube run with gc turned off.
> reddit.com uses Mako templates, and Mako definitely has circular references (and definitely requires them) within its Context object, so I doubt they've given attention to this; they just do what everyone else does and scale horizontally, something that is ultimately needed whether you're running a pure-C application server or Python with its GIL and interpreter overhead.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---
