On Sep 28, 2010, at 6:32 PM, kindly wrote:

> I have been doing some profiling on a batch job I have been running.
> 
> I control all my own cascading, so I set the cascade flag on each
> relation to "none".  Even so mapper.cascade_iterator does quite a lot
> of work.
> 
> I did the crudest test by just placing a return at the top of
> cascade_iterator.  It speeds up my job by 10-20%.  I imagine this
> would be more if my relation tree was more complicated.
> 
> Do you think this is worth having a mapper option for no cascades?
> Or detecting there are not any and therefore not pre-emptively
> recursing the relation tree?

I'd need to see a specific example for detail on this. If all of your 
relationship() constructs are configured with no cascade, you'd basically see 
calls to mapper.cascade_iterator() that bundle up all the relationships into a 
list and then call "cascade_iterator" on each of those, and they should all 
exit immediately.

There is certainly a case to be made that the check for "cascades" could be 
done inside of mapper.cascade_iterator(), thereby avoiding the call to 
RelationshipProperty. There is a further optimization: where 
mapper.cascade_iterator() calls upon self._props.itervalues(), it could 
instead use a cached collection of all RelationshipProperty objects which 
include the cascade. In the case of no cascades, that collection would be 
empty, and mapper.cascade_iterator() could be reduced to one call, one 
boolean pull and then it returns. The catch of "StopIteration" is certainly 
a possible bottleneck on this, and we'd have to revisit Ants' refactoring here 
which originally allowed it to work without recursion.

If we were to optimize this, that's how we would do it. A top-level mapper 
option would be way too specific to the internals and too esoteric for most 
users.
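
To make the cached-collection idea concrete, here is a minimal sketch. It uses 
toy stand-in classes (ToyMapper, ToyRelationship, and the _cascading_props 
helper are all hypothetical names, not SQLAlchemy's actual internals), it is 
written for Python 3, and it only walks one level of relationships rather than 
the whole tree:

```python
# Toy stand-ins only; these are NOT SQLAlchemy's real classes or code paths.
class ToyRelationship:
    def __init__(self, key, cascade=frozenset()):
        self.key = key
        self.cascade = frozenset(cascade)

    def cascade_iterator(self, type_, obj):
        # Exit immediately when this relationship does not cascade type_.
        if type_ not in self.cascade:
            return
        for child in getattr(obj, self.key, ()):
            yield child


class ToyMapper:
    def __init__(self, props):
        self._props = {p.key: p for p in props}
        self._cascading_cache = {}  # cascade type -> tuple of relationships

    def _cascading_props(self, type_):
        # Build the filtered collection once per cascade type, then reuse it.
        try:
            return self._cascading_cache[type_]
        except KeyError:
            props = tuple(
                p for p in self._props.values() if type_ in p.cascade
            )
            self._cascading_cache[type_] = props
            return props

    def cascade_iterator(self, type_, obj):
        props = self._cascading_props(type_)
        if not props:
            # No cascades configured: one lookup, one boolean test, done.
            return iter(())
        return self._iterate(type_, obj, props)

    def _iterate(self, type_, obj, props):
        # One level only; the real traversal would continue into children.
        for prop in props:
            for child in prop.cascade_iterator(type_, obj):
                yield child
```

With every relationship set to no cascade, cascade_iterator() here never 
touches the individual relationship objects at all, which is the saving 
described above.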

What we need here though is some code to run profiling on. It's hard to 
see how the cascade_iterator() call, with several no-op calls to 
RelationshipProperty.cascade_iterator(), is taking up 20% of a batch operation, 
since when cascade is called it's usually inside of some other operation like a 
merge() or flush() that is overall much more expensive.
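
As a starting point for gathering that data, something along these lines 
could work. It is only a sketch using the standard library's cProfile and 
pstats modules; profile_report and its name_filter parameter are hypothetical 
names, and the callable you pass in stands for whatever runs your batch job:

```python
import cProfile
import io
import pstats


def profile_report(func, name_filter="cascade_iterator"):
    """Run func() under cProfile and return the stats lines whose
    function name matches name_filter, sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # A string restriction is treated as a regular expression that is
    # matched against the "file:line(function)" label of each entry.
    stats.sort_stats("cumulative").print_stats(name_filter)
    return out.getvalue()
```

Calling print(profile_report(run_batch_job)), where run_batch_job is your own 
entry point, shows the cumulative time of the matching calls next to the total 
run time, so the share can be compared against the enclosing merge() or 
flush().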

