[sqlalchemy] Re: sqlalchemy dependency on cyclic gc?

Michael Bayer Sun, 26 Oct 2008 13:15:09 -0700

r5201 removes the "dict" reference from InstanceState upon disposal,
which in the usual case is detected by a weakref handler.  With that
change, one particular test which loads many thousands of rows with
lots of garbage collection runs 1.8% faster, from 12.20 sec to 11.99
sec.  I don't think there are any cycles in Query.  The cycle between
Connection and Transaction is self-managing when the transaction is
rolled back or committed, and similarly the cycle between
SessionTransaction and Session is cleaned up when you call
Session.close().
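
A minimal sketch of the idea behind that change, not the actual r5201
code.  It assumes the cycle has the shape state -> dict -> state (the
instance's attribute dictionary holds the state, and the state holds
the dictionary).  Instance, State, the "_state" key, break_cycle and
freed_without_gc are all hypothetical names used for illustration.

```python
import gc
import weakref


class Instance:
    """Stand-in for a mapped object (hypothetical, not a SQLAlchemy class)."""


class State:
    """Hypothetical analogue of InstanceState."""

    def __init__(self, obj, break_cycle=True):
        self.break_cycle = break_cycle
        # Strong reference to the instance's attribute dictionary.
        self.dict = obj.__dict__
        # Assumed shape of the cycle: the dictionary points back at the state.
        obj.__dict__["_state"] = self
        # Weak reference to the instance; _cleanup fires when it is disposed.
        self.obj = weakref.ref(obj, self._cleanup)

    def _cleanup(self, ref):
        if self.break_cycle:
            # Dropping the dict reference breaks state -> dict -> state.
            del self.dict


def freed_without_gc(break_cycle):
    """Return True if the state is reclaimed by reference counting alone."""
    gc.disable()
    try:
        obj = Instance()
        probe = weakref.ref(State(obj, break_cycle))
        del obj  # last strong reference to the instance goes away
        return probe() is None
    finally:
        gc.enable()
        gc.collect()  # tidy up whatever the cycle kept alive


if __name__ == "__main__":
    print("with cleanup:   ", freed_without_gc(True))
    print("without cleanup:", freed_without_gc(False))
```

On CPython the weakref callback runs as soon as the instance's
refcount reaches zero, so the sketch needs no cyclic collection when
break_cycle is left on.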



On Oct 26, 12:49 pm, Michael Bayer <[EMAIL PROTECTED]> wrote:
> On Oct 26, 2008, at 10:42 AM, Henk wrote:
>
> > Hi,
>
> > In what way is sqlalchemy dependent on Python's cyclic gc?
> > Is there any effort to make it work even if gc is off?
>
> > I tried to turn gc off in a server side program I am developing, but
> > this resulted in a lot of 'leaked' cycles/memory as soon as
> > sqlalchemy is used to do some queries...
>
> > I traced one cycle down to the relation between Session and
> > SessionTransaction. They both hold a reference to each other, and
> > this cycle is not broken in the session's close method, nor is one
> > of them a weakref...
>
> > Turning on gc leak detection also shows a lot of IdentityManagedState
> > objects being involved in cycles (i.e. not collected by refcount). The
> > number of these objects just keeps growing with each query. Closing
> > the Session has no influence on this...
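
A minimal sketch of this kind of leak detection, using only the
standard gc module.  Node, make_cycle, make_chain and
count_cyclic_garbage are hypothetical names; the two-object cycle only
imitates the Session <-> SessionTransaction shape described above and
is not SQLAlchemy code.

```python
import gc


class Node:
    """Hypothetical object that may take part in a reference cycle."""

    def __init__(self):
        self.partner = None


def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # a <-> b: refcounts never reach zero


def make_chain():
    a, b = Node(), Node()
    a.partner = b  # one-way reference: refcounting is enough


def count_cyclic_garbage(make_garbage, cls):
    """Count cls instances that only the cycle collector could reclaim."""
    gc.collect()  # start from a clean slate
    gc.disable()
    old_flags = gc.get_debug()
    # DEBUG_SAVEALL stores unreachable objects in gc.garbage instead of freeing them.
    gc.set_debug(gc.DEBUG_SAVEALL)
    try:
        make_garbage()
        gc.collect()
        return sum(isinstance(o, cls) for o in gc.garbage)
    finally:
        gc.set_debug(old_flags)
        gc.garbage.clear()
        gc.collect()  # actually free the saved cycles
        gc.enable()


if __name__ == "__main__":
    print("cycle:", count_cyclic_garbage(make_cycle, Node))
    print("chain:", count_cyclic_garbage(make_chain, Node))
```

A count that keeps growing across repeated runs of the same workload
points at objects that reference counting alone never frees.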
>
> > Depending on cyclic gc is detrimental to the performance of server
> > side apps serving many simultaneous clients. The cyclic gc will
> > halt the whole python process for up to seconds (if there is a large
> > number of object instances in the process). All clients will thus
> > experience quite some lag when the cyclic gc kicks in.
>
> > Unfortunately, with sqlalchemy, tuning the gc to not run so often is
> > also not good, because of the very large numbers of instances being
> > created. A simple query for 2000 rows in my experience already creates
> > about 10 MB worth of sqlalchemy objects involved in cycles that need to
> > be gc'd. So putting off the gc would mean memory would quickly be
> > exhausted...
>
> > (I am using sqlalchemy trunk revision 5200)
>
> In general we don't worry about cyclic GC unless an issue has been  
> demonstrated.  There are some areas where we are careful not to create  
> cycles, in cases where weakly referenced structures need to fall out  
> of scope automatically or where the timing of gc.collect() tends to be  
> troublesome.  Specifically within the area of fetched rows and
> fetched ORM objects which you mention, we are careful to not introduce  
> any cyclic references within the rows themselves or on your mapped  
> instances, so that zero refcount will in fact garbage collect your  
> instances without a cyclic run, even though the associated  
> IdentityManagedState may have cycles as you are mentioning.   The  
> connection pool is also very cycle-aware.
>
> The key issue here is if you're requesting a completely "pure" non-
> cyclical application, or if we're just talking about cycles within  
> objects that are created on a large scale.   In the latter case I  
> would think we're probably only talking about IdentityManagedState.  
> RowProxy, its "SQL expression" analogue, doesn't have any cycles.   I  
> wouldn't think SessionTransaction puts that much of a burden on cyclic  
> gc since you typically have only one or maybe two of those per  
> request, assuming you are using autocommit=False with a request-scoped  
> session.  There's also probably a cycle between Connection and
> Transaction which is the lower level analogue of
> session->sessiontransaction.  There are also at least a few cycles
> within the Query object, and I don't think there are any within
> ClauseElement structures but I am not 100% sure.
>
> In the case that you are seeking to remove all cycles from SQLAlchemy  
> as a whole, that doesn't strike me as particularly practical.  There  
> are many areas where traversal among internal structures in both  
> directions is required, especially within Table and mapper  
> structures.  To achieve this without cycles, weakrefs would have to be  
> used.  Weakrefs introduce a significant performance penalty of their
> own on every access, since they turn very fast attribute access into a
> function call, and function calls are very expensive in Python.  It's
> not clear to me that the time saved by disabling cyclic gc would be
> greater than the overhead introduced by widespread usage of weakrefs,
> and at the least that usage would impose a significant performance
> penalty on the vast majority of users who leave Python's GC settings
> unchanged.  It would also be a great burden on future development and
> testing to ensure that all new code added in all cases does not
> introduce cycles, and also to maintain all those strong/weak cycles
> without unexpected reference loss.  SQLA also relies upon DBAPI
> implementations which may use cycles, and we'd also someday have other
> dependencies which might require cycles.
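
A rough way to measure the access cost described above, as a sketch
only.  Table, StrongColumn, WeakColumn and compare are hypothetical
names, not SQLAlchemy classes, and the numbers depend entirely on the
interpreter and machine, so none are quoted here.

```python
import timeit
import weakref


class Table:
    """Hypothetical parent that keeps strong references to its children."""

    def __init__(self):
        self.columns = []


class StrongColumn:
    def __init__(self, table):
        self.table = table  # strong back-reference: table <-> column cycle
        table.columns.append(self)


class WeakColumn:
    def __init__(self, table):
        self._table = weakref.ref(table)  # weak back-reference: no cycle
        table.columns.append(self)

    @property
    def table(self):
        # Every access goes through a property and a weakref call.
        return self._table()


def compare(number=200_000):
    """Time `number` parent lookups through each kind of back-reference."""
    table = Table()
    strong, weak = StrongColumn(table), WeakColumn(table)
    assert strong.table is table and weak.table is table
    return {
        "strong": timeit.timeit(lambda: strong.table, number=number),
        "weak": timeit.timeit(lambda: weak.table, number=number),
    }


if __name__ == "__main__":
    for kind, seconds in compare().items():
        print(f"{kind:>6}: {seconds:.4f} s")
```

Comparing the two timings against the measured cost of cyclic
collection in a real workload is what would settle the trade-off.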
>
> So I think removing cycles from row-corresponding objects like  
> IdentityManagedState is worth it, only marginally so for objects like  
> Query, Session, SessionTrans, Connection and Transaction, and not  
> worth it at all for application-scoped things like mappers and  
> tables.  For IMS we'd also need to add some tests; we have a "memory
> cleanup" testing methodology that is somewhat lacking, since I've found
> that peeking into gc.get_objects() and similar, looking for specific
> patterns, is more difficult than it might seem.
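
One possible shape for such a test, sketched under assumptions and not
taken from SQLAlchemy's test suite: count the live instances of one
class via gc.get_objects() before and after a workload that runs with
the cycle collector disabled.  Row, live_count, leaked_instances and
the two workloads are hypothetical names.

```python
import gc


class Row:
    """Hypothetical row-level object."""

    def __init__(self):
        self.owner = None


def live_count(cls):
    """Count instances of exactly cls currently tracked by the collector."""
    return sum(1 for o in gc.get_objects() if type(o) is cls)


def leaked_instances(workload, cls, rounds=3):
    """Return how many cls instances survive `rounds` runs with gc disabled."""
    gc.collect()  # drop garbage left over from earlier code
    gc.disable()
    try:
        before = live_count(cls)
        for _ in range(rounds):
            workload()
        return live_count(cls) - before
    finally:
        gc.enable()
        gc.collect()


def cyclic_workload():
    row = Row()
    row.owner = row  # self-cycle: refcounting cannot free it


def clean_workload():
    Row()  # no cycle: freed as soon as the reference is dropped


if __name__ == "__main__":
    print("cyclic:", leaked_instances(cyclic_workload, Row))
    print("clean: ", leaked_instances(clean_workload, Row))
```

Matching on a single exact type keeps the check simple; looking for
broader patterns across gc.get_objects() is where it gets harder.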
>
> Beyond the case of IdentityManagedState, I wouldn't be convinced that
> gc has a significant impact without seeing some benchmarks (I'm
> actually skeptical of the IMS case too, but removing cycles there
> shouldn't be too hard).  In Python I've often seen a great disparity
> between the theoretical and the actual wrt performance, and it's not
> worth going down any avenue without benchmarks and profile results to
> start.  I wonder if sites like reddit and YouTube run with gc turned
> off.  reddit.com uses Mako templates, which definitely have circular
> references (and definitely require them) within their Context object,
> so I doubt they've given attention to this - they just do what
> everyone else does and scale horizontally, something that is
> ultimately needed whether you're running a pure-C application server
> or Python with its GIL and interpreter overhead.