Re: [sqlalchemy] The cost of defer()

Michael Bayer Sat, 13 Jul 2013 13:47:03 -0700

anyway, with some profiling of loading 1000 rows with six deferred cols, the 
function count with the defer was over 50K and without the defer around 37K; 
the patch is now committed and with the defer it's at 32K, so a bit less than 
that of loading the data, as it should be (more would be better sure, but this 
is a decent improvement).  The change to get that much wasn't that big a deal 
so this is in 0.8.



On Jul 11, 2013, at 2:04 PM, Michael Bayer <[email protected]> wrote:

> please try out this patch:
> 
> http://www.sqlalchemy.org/trac/attachment/ticket/2778/2778.patch
> 
> 
> which refactors this particular system to not require the production of a new 
> object per instance, which is the slowest part of this, and also inlines the 
> work of assembling the callable.  This should give you 50% or more method 
> call improvement.  if this is enough, this might be OK for 0.8.
> 
> 
> 
> 
> 
> On Jul 11, 2013, at 1:40 PM, Michael Bayer <[email protected]> wrote:
> 
>> well what kind of data are we talking about?  defer()'s use case was for 
>> binary large objects and such, fields that are many K/Megs in size.  if 
>> you're deferring a bunch of ints, then yes it's not optimized very well for 
>> that.
>> 
>> Half of the overhead could be easily fixed here, creating those 
>> LoadDeferredColumns objects could be offloaded to a later point.    The 
>> other half, setting up that callable, I'd have to spend some time reviewing 
>> the use cases here.  The difference between an attribute that is "deferred" 
>> vs. one that is "expired" is that if you access some other expired 
>> attribute, the "deferred" attribute will still not load - because the use 
>> case is, you really don't want this BLOB column to load unless you touch it 
>> specifically.   So to get that instruction into the state, "don't load these 
>> keys even on an unexpire", uses some kind of method call on every state.    
>> InstanceState._set_callable could be inlined more here to do less work, 
>> instructions up to the loader process just to populate a key in a dictionary 
>> maybe, though these reorganizations can destabilize the code.   it's not 
>> something I'd be comfortable doing in 0.8, the ticket I created 
>> (http://www.sqlalchemy.org/trac/ticket/2778) has any potential work here for 
>> 0.9.
>> 
>> The other way to go here is to provide a query option that explicitly 
>> delivers the attribute as "expired" as opposed to "deferred", looking at how 
>> that works right now I can give you the recipe below, but it still involves 
>> a function call per column so that the InstanceState knows the attribute is 
>> "expired".
>> 
>> 
>> from sqlalchemy.orm.strategies import DeferredOption, DeferredColumnLoader
>> 
>> 
>> class DontLoadColumnOption(DeferredOption):
>>     def get_strategy_class(self):
>>         return NoColumnLoader
>> 
>> 
>> class NoColumnLoader(DeferredColumnLoader):
>>     def create_row_processor(self, context, path, mapper, row, adapter):
>>         if not self.is_class_level:
>>             def set_deferred_for_local_state(state, dict_, row):
>>                 state.callables[self.key] = state
>>             return set_deferred_for_local_state, None, None
>>         else:
>>             return super(NoColumnLoader, self).create_row_processor(
>>                                     context, path, mapper, row, adapter)
>> 
>> if __name__ == '__main__':
>>     from sqlalchemy import *
>>     from sqlalchemy.orm import *
>>     from sqlalchemy.ext.declarative import declarative_base
>> 
>>     Base = declarative_base()
>> 
>>     class A(Base):
>>         __tablename__ = 'a'
>> 
>>         id = Column(Integer, primary_key=True)
>> 
>>         x = Column(Integer)
>>         y = Column(Integer)
>>         z = Column(Integer)
>>         q = Column(Integer)
>> 
>>     e = create_engine("sqlite://", echo=True)
>>     Base.metadata.create_all(e)
>>     s = Session(e)
>>     s.add_all([
>>         A(x="x%d" % i, y="y%d" % i, z="z%d" % i, q="q%d" % i)
>>         for i in xrange(1000)
>>     ])
>>     s.commit()
>>     s.close()
>> 
>>     loaded = s.query(A).options(DontLoadColumnOption("y"),
>>                     DontLoadColumnOption("z")).order_by(A.id).all()
>> 
>>     for a in loaded:
>>         assert 'y' not in a.__dict__
>>         assert 'z' not in a.__dict__
>>         assert 'x' in a.__dict__
>>         assert 'q' in a.__dict__
>> 
>>         assert a.z == "z%d" % (a.id - 1), a.z
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Jul 11, 2013, at 10:23 AM, "Gombas, Gabor" 
>> <[email protected]> wrote:
>> 
>>> 
>>> I did need the objects, not just the raw data, otherwise I’d had to 
>>> duplicate a bunch of existing code which expected full-blown objects to 
>>> operate on. Modifying the mapper is not really an option unless the 
>>> majority of the users have the same requirements, otherwise I end up having 
>>> to add a huge amount of undefer() calls everywhere else (and the query 
>>> storm caused by missing an undefer() would be much more painful). Maybe the 
>>> documentation of defer() could mention that the use of the option can in 
>>> fact reduce performance, because it’s not intuitive that loading unneeded 
>>> data is cheaper than not loading it.
>>>  
>>> From: [email protected] [mailto:[email protected]] On 
>>> Behalf Of Michael Bayer
>>> Sent: 11 July 2013 15:34
>>> To: [email protected]
>>> Subject: Re: [sqlalchemy] The cost of defer()
>>>  
>>> the path would be to figure out if the logic of a per-query defer option 
>>> can somehow be linked to the attribute when it hits its normal refresh 
>>> logic - if those attribute were set up as "deferred" at the mapper config 
>>> level (where deferred is usually used), you wouldn't see this overhead 
>>> since the deferred loader would be the default callable.    But if you only 
>>> need a few attributes why not just go the other way and query for those 
>>> attributes directly?   that would save you way more overhead than even if 
>>> we removed all overhead from defer(), since the most expensive thing is all 
>>> the mapper identity map logic that takes place when full entities are 
>>> loaded.
>>>  
>>>  
>>> On Jul 11, 2013, at 4:02 AM, "Gombas, Gabor" 
>>> <[email protected]> wrote:
>>> 
>>> 
>>>  
>>> Hi,
>>>  
>>> I wrote a query joining a couple of tables, returning over a hundred 
>>> thousand rows. Since I only needed to access a couple of the attributes of 
>>> the returned objects for this specific use case, I thought to use a dozen 
>>> or so Query.options(defer(…)) calls to avoid loading the unneeded columns. 
>>> But to my surprise, the query became much slower. Profiling attributed 
>>> almost all the extra time to set_deferred_for_local_state() and 
>>> LoadDeferredColumns.__init__():
>>>  
>>>    827855   15.668    0.000   27.068    0.000 
>>> …/SQLAlchemy-0.8.1-py2.6-linux-x86_64.egg/sqlalchemy/orm/strategies.py:184(set_deferred_for_local_state)
>>>    827855   10.524    0.000   10.524    0.000 
>>> …/SQLAlchemy-0.8.1-py2.6-linux-x86_64.egg/sqlalchemy/orm/strategies.py:259(__init__)
>>>  
>>> If defer() is so expensive, then it is not very useful. Would it be 
>>> possible to make it cheaper?
>>>  
>>> Gabor
>>>  
>>>  
>>> 
>>> NOTICE: Morgan Stanley is not acting as a municipal advisor and the 
>>> opinions or views contained herein are not intended to be, and do not 
>>> constitute, advice within the meaning of Section 975 of the Dodd-Frank Wall 
>>> Street Reform and Consumer Protection Act. If you have received this 
>>> communication in error, please destroy all electronic and paper copies and 
>>> notify the sender immediately. Mistransmission is not intended to waive 
>>> confidentiality or privilege. Morgan Stanley reserves the right, to the 
>>> extent permitted under applicable law, to monitor electronic 
>>> communications. This message is subject to terms available at the following 
>>> link: http://www.morganstanley.com/disclaimers If you cannot access these 
>>> links, please notify us by reply message and we will send the contents to 
>>> you. By messaging with Morgan Stanley you consent to the foregoing.
>>>  
>>>  
>>>  
>>>  
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "sqlalchemy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/sqlalchemy.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>  
>>>  
>>>  
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "sqlalchemy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/sqlalchemy.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>  
>>>  
>>> 
>>> 
>>> 
>>> NOTICE: Morgan Stanley is not acting as a municipal advisor and the 
>>> opinions or views contained herein are not intended to be, and do not 
>>> constitute, advice within the meaning of Section 975 of the Dodd-Frank Wall 
>>> Street Reform and Consumer Protection Act. If you have received this 
>>> communication in error, please destroy all electronic and paper copies and 
>>> notify the sender immediately. Mistransmission is not intended to waive 
>>> confidentiality or privilege. Morgan Stanley reserves the right, to the 
>>> extent permitted under applicable law, to monitor electronic 
>>> communications. This message is subject to terms available at the following 
>>> link:http://www.morganstanley.com/disclaimers If you cannot access these 
>>> links, please notify us by reply message and we will send the contents to 
>>> you. By messaging with Morgan Stanley you consent to the foregoing.
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "sqlalchemy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/sqlalchemy.
>>> For more options, visit https://groups.google.com/groups/opt_out.    
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "sqlalchemy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/sqlalchemy.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>  
>>  
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] The cost of defer()

Reply via email to