Re: [sqlalchemy] Re: no cascades on mapper?

Michael Bayer Wed, 29 Sep 2010 13:46:48 -0700

On Sep 29, 2010, at 3:57 PM, kindly wrote:

> I wish I could replicate this easily.   However, I have a layer of
> abstraction on top of sqlalchemy to make this harder than it should
> be. I am actually building a user level data loader.
> 
> The profiler seems to say that get_history is called when
> "cascade_iterator" is run on the properties.  Here is a picture of the
> output.  http://i.imgur.com/mNZoM.png.
> 
> The other exceptional thing about my mappers is that there is an
> attribute extension on each non relation mapped field.  I don't know
> if that would make a difference but I know it affects get_history.
> They also all have a version_id field.
> 
> I think that this issue may be the amount of times that it is called,
> not the speed of the method itself.  It appears to be called once per
> "add".  As I manage the cascades I probably do more "add"s than are
> needed.
> 
> I have used your patch and it does not make any difference; if
> anything, it makes things very slightly worse.
> 
> However, I have run the test a few more times and the effect is more
> modest than in my first tests.  Putting the return at
> the top only seems to shave about 6% off on average.  This however
> includes all the other things my application is doing (i.e. it
> validates each object etc.) which take up about half the time. So it
> could mean sqlalchemy losing 12% because of it.
> 
> I hope you can glean something out of this.
> 
> If I get time I will try and get a profilable case.  It may take me a
> while.  Maybe setting up the mappers like I have and pathologically
> "add"ing will do the trick.


If you could send a raw hotshot profile file, at least we could look through it.
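
For anyone following along on a newer interpreter: hotshot only exists on Python 2, so the sketch below swaps in cProfile, a different stdlib profiler, to show one way a raw profile file could be captured and then checked for how often a function is called. The names run_batch_job and dump_profile and the toy cascade_iterator are hypothetical stand-ins, not part of SQLAlchemy or of the loader discussed in this thread.

```python
import cProfile
import os
import pstats
import tempfile


def cascade_iterator(obj):
    # Hypothetical stand-in for the per-"add" work being measured.
    return iter(())


def run_batch_job(n=1000):
    # Hypothetical stand-in for the real loader: one call per "add".
    for i in range(n):
        list(cascade_iterator(i))


def dump_profile(func, path):
    """Run func under cProfile and write the raw stats file to path."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    profiler.dump_stats(path)
    return path


def call_counts(path, name):
    """Total number of calls recorded for functions named `name`."""
    stats = pstats.Stats(path)
    # stats.stats maps (file, line, funcname) -> (cc, nc, tt, ct, callers)
    return sum(
        nc
        for (_, _, funcname), (cc, nc, tt, ct, callers) in stats.stats.items()
        if funcname == name
    )


profile_path = os.path.join(tempfile.mkdtemp(), "batch.prof")
dump_profile(run_batch_job, profile_path)
print(call_counts(profile_path, "cascade_iterator"))
```

The resulting file can be attached to a message as-is, and the call count helps separate "called too often" from "slow per call".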

> On Sep 29, 12:20 am, Michael Bayer <[email protected]> wrote:
>> On Sep 28, 2010, at 6:32 PM, kindly wrote:
>> 
>>> I have been doing some profiling on a batch job I have been running.
>> 
>>> I control all my own cascading, so I set the cascade flag on each
>>> relation to "none".  Even so mapper.cascade_iterator does quite a lot
>>> of work.
>> 
>>> I did the crudest test by just placing a return at the top of
>>> cascade_iterator.  It speeds up my job by 10-20%.  I imagine this
>>> would be more if my relation tree was more complicated.
>> 
>>> Do you think this is worth having a mapper option for no cascades?
>>> Or detecting there are not any and therefore not pre-emptively
>>> recursing the relation tree?
>> 
>> I'd need to see a specific example for detail on this. If all of your
>> relationship()s are configured with no cascade, you'd basically see calls to
>> mapper.cascade_iterator() that bundle up all the relationships into a list
>> and then call "cascade_iterator" on each of those, all of which should exit
>> immediately.
>> 
>> There is certainly a case to be made that the check for "cascades" could be
>> done inside of mapper.cascade_iterator(), thereby avoiding the call to
>> RelationshipProperty. A further optimization: where
>> mapper.cascade_iterator() calls upon self._props.itervalues(), that itself
>> could be changed to return a cached collection of all
>> RelationshipProperty objects which include the cascade - so in the case of no
>> cascades, that collection would be empty, and mapper.cascade_iterator()
>> could be reduced to one call, one boolean pull and then it returns. The
>> catch of "StopIteration" is certainly a possible bottleneck on this, and
>> we'd have to revisit Ants' refactoring here which originally allowed it to
>> work without recursion.
>> 
>> If we were to optimize this, that's how we would do it. A top level mapper
>> option would be way too specific to the internals and too esoteric for most users.
>> 
>> What we need here though is some code to run profiling on. It's hard to
>> see how the cascade_iterator() call with several no-op calls to
>> RelationshipProperty.cascade_iterator() could be taking up 20% of a batch
>> operation - since when cascade is called it's usually inside of some other
>> operation like a merge() or flush() that is overall much more expensive.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en.
> 

