On Sep 29, 2010, at 3:57 PM, kindly wrote:

> I wish I could replicate this easily. However, I have a layer of
> abstraction on top of sqlalchemy that makes this harder than it should
> be. I am actually building a user level data loader.
>
> The profiler seems to say that get_history is called after
> "cascade_iterator" is run on the properties. Here is a picture of the
> output. http://i.imgur.com/mNZoM.png.
>
> The other exceptional thing about my mappers is that there is an
> attribute extension on each non relation mapped field. I don't know
> if that would make a difference but I know it affects get_history.
> They also all have a version_id field.
>
> I think that this issue may be the number of times that it is called,
> not the speed of the method itself. It appears to be called once per
> "add". As I manage the cascades I probably do more "add"s than are
> needed.
>
> I have used your patch and it does not make any difference, if
> anything it makes it very slightly worse.
>
> However, I have run the test a few more times and the effect is more
> modest than my first tests. Putting the return at
> the top only seems to shave about 6% off on average. This however
> includes all the other things my application is doing (i.e. it
> validates each object etc.) which take up about half the time. So it
> could mean sqlalchemy losing 12% because of it.
>
> I hope you can glean something out of this.
>
> If I get time I will try and get a profilable case. It may take me a
> while. Maybe setting up the mappers like I have and pathologically
> "add"ing will do the trick.
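For illustration only, a profilable case along the lines described above might be sketched as follows. This is not kindly's application and does not use SQLAlchemy: FakeSession, run_batch, profile_to_file and add_call_count are all hypothetical names, and the standard library's cProfile is swapped in for hotshot. The sketch only shows how a raw profile file can be dumped and how the per-"add" call count can be read back from it.

```python
import cProfile
import os
import pstats
import tempfile


class FakeSession(object):
    """Hypothetical stand-in for a Session; it only records add() calls."""

    def __init__(self):
        self.added = []

    def add(self, obj):
        # A real Session.add() would run cascades here; this stub does not.
        self.added.append(obj)


def run_batch(session, n):
    # Pathologically add each object twice, i.e. more often than needed.
    objects = [object() for _ in range(n)]
    for obj in objects:
        session.add(obj)
        session.add(obj)  # redundant add


def profile_to_file(path, n=1000):
    # Profile the batch and dump the raw stats to `path` for sharing.
    session = FakeSession()
    profiler = cProfile.Profile()
    profiler.runcall(run_batch, session, n)
    profiler.dump_stats(path)
    return session


def add_call_count(path):
    # Load the raw stats file and return the total call count for add().
    stats = pstats.Stats(path)
    for (filename, lineno, name), (cc, nc, tt, ct, callers) in stats.stats.items():
        if name == "add":
            return nc
    return 0


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "batch.prof")
    profile_to_file(path, n=1000)
    print("add() calls recorded:", add_call_count(path))
```

With a real mapper setup, FakeSession would be replaced by an actual Session, and the dumped file is what could be passed along for inspection.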
If you could send a raw hotshot profile file, at least we could look through it.

> On Sep 29, 12:20 am, Michael Bayer <[email protected]> wrote:
>> On Sep 28, 2010, at 6:32 PM, kindly wrote:
>>
>>> I have been doing some profiling on a batch job I have been running.
>>
>>> I control all my own cascading, so I set the cascade flag on each
>>> relation to "none". Even so mapper.cascade_iterator does quite a lot
>>> of work.
>>
>>> I did the crudest test by just placing a return at the top of
>>> cascade_iterator. It speeds up my job by 10-20%. I imagine this
>>> would be more if my relation tree was more complicated.
>>
>>> Do you think this is worth having a mapper option for no cascades?
>>> Or detecting there are not any and therefore not pre-emptively
>>> recursing the relation tree?
>>
>> I'd need to see a specific example for detail on this. If all of your
>> relationships() are configured with no cascade, you'd basically see calls to
>> mapper.cascade_iterator() that bundle up all the relationships into a list,
>> it then calls "cascade_iterator" on all of those, and they should all exit
>> immediately.
>>
>> There is certainly a case to be made that the check for "cascades" could be
>> done inside of mapper.cascade_iterator(), thereby avoiding the call to
>> RelationshipProperty. There is further an optimization such that when
>> mapper.cascade_iterator() calls upon self._props.itervalues(), that itself
>> could be changed to return a cached collection of all
>> RelationshipProperties which include the cascade - so in the case of no
>> cascades, that collection would be empty, and mapper.cascade_iterator()
>> could be reduced to one call, one boolean pull and then it returns. The
>> catch of "StopIteration" is certainly a possible bottleneck on this, and
>> we'd have to revisit Ants' refactoring here which originally allowed it to
>> work without recursion.
>>
>> If we were to optimize this, that's how we would do it.
>> A top level mapper option would be way too specific to the internals and
>> esoteric to most users.
>>
>> What we need here though is some code to run profiling on. It's hard to
>> understand that the cascade_iterator() call with several no-op calls to
>> RelationshipProperty.cascade_iterator() is taking up 20% of a batch
>> operation - since when cascade is called it's usually inside of some other
>> operation like a merge() or flush() that is overall much more expensive.
>
> --
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/sqlalchemy?hl=en.
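The cached-collection idea in the quoted message can be sketched roughly as below. This is a toy model under stated assumptions, not SQLAlchemy's actual internals: ToyMapper, RelationshipProp, _get_cascading_props and the dict-based "state" are hypothetical names made up for illustration. It only shows the shape of the optimization: compute the list of cascading properties once, and return immediately when that list is empty.

```python
class RelationshipProp(object):
    """Hypothetical stand-in for a relationship property."""

    def __init__(self, key, cascade=False):
        self.key = key
        self.cascade = cascade

    def cascade_iterator(self, state):
        # Yield the related objects stored under this key; `state` is a
        # plain dict here, an assumption made for this sketch.
        for child in state.get(self.key, ()):
            yield child


class ToyMapper(object):
    """Hypothetical mapper that caches which properties cascade."""

    def __init__(self, props):
        self._props = dict((p.key, p) for p in props)
        self._cascading_props = None  # computed lazily, then reused

    def _get_cascading_props(self):
        # Build the filtered list once instead of walking every property
        # on every cascade_iterator() call.
        if self._cascading_props is None:
            self._cascading_props = [
                p for p in self._props.values() if p.cascade
            ]
        return self._cascading_props

    def cascade_iterator(self, state):
        props = self._get_cascading_props()
        if not props:
            # No cascades configured: one lookup, one boolean test, done.
            return iter(())
        return self._iterate(props, state)

    def _iterate(self, props, state):
        for prop in props:
            for child in prop.cascade_iterator(state):
                yield child
```

With no cascading properties, list(ToyMapper([RelationshipProp("a")]).cascade_iterator({"a": [1]})) is empty and no per-property call is made at all.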
