Re: [sqlalchemy] The cost of defer()

Michael Bayer Thu, 11 Jul 2013 10:42:02 -0700

well what kind of data are we talking about?  defer()'s use case was for binary 
large objects and such, fields that are many K/Megs in size.  if you're 
deferring a bunch of ints, then yes it's not optimized very well for that.


Half of the overhead could be easily fixed here, creating those 
LoadDeferredColumns objects could be offloaded to a later point.    The other 
half, setting up that callable, I'd have to spend some time reviewing the use 
cases here.  The difference between an attribute that is "deferred" vs. one 
that is "expired" is that if you access some other expired attribute, the 
"deferred" attribute will still not load - because the use case is, you really 
don't want this BLOB column to load unless you touch it specifically.   So to 
get that instruction into the state, "don't load these keys even on an 
unexpire", uses some kind of method call on every state.    
InstanceState._set_callable could be inlined more here to do less work, 
instructions up to the loader process just to populate a key in a dictionary 
maybe, though these reorganizations can destabilize the code.   it's not 
something I'd be comfortable doing in 0.8, the ticket I created 
(http://www.sqlalchemy.org/trac/ticket/2778) has any potential work here for 
0.9.

The other way to go here is to provide a query option that explicitly delivers 
the attribute as "expired" as opposed to "deferred", looking at how that works 
right now I can give you the recipe below, but it still involves a function 
call per column so that the InstanceState knows the attribute is "expired".


from sqlalchemy.orm.strategies import DeferredOption, DeferredColumnLoader


class DontLoadColumnOption(DeferredOption):
    def get_strategy_class(self):
        return NoColumnLoader


class NoColumnLoader(DeferredColumnLoader):
    def create_row_processor(self, context, path, mapper, row, adapter):
        if not self.is_class_level:
            def set_deferred_for_local_state(state, dict_, row):
                state.callables[self.key] = state
            return set_deferred_for_local_state, None, None
        else:
            return super(NoColumnLoader, self).create_row_processor(
                                    context, path, mapper, row, adapter)

if __name__ == '__main__':
    from sqlalchemy import *
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class A(Base):
        __tablename__ = 'a'

        id = Column(Integer, primary_key=True)

        x = Column(Integer)
        y = Column(Integer)
        z = Column(Integer)
        q = Column(Integer)

    e = create_engine("sqlite://", echo=True)
    Base.metadata.create_all(e)
    s = Session(e)
    s.add_all([
        A(x="x%d" % i, y="y%d" % i, z="z%d" % i, q="q%d" % i)
        for i in xrange(1000)
    ])
    s.commit()
    s.close()

    loaded = s.query(A).options(DontLoadColumnOption("y"),
                    DontLoadColumnOption("z")).order_by(A.id).all()

    for a in loaded:
        assert 'y' not in a.__dict__
        assert 'z' not in a.__dict__
        assert 'x' in a.__dict__
        assert 'q' in a.__dict__

        assert a.z == "z%d" % (a.id - 1), a.z













On Jul 11, 2013, at 10:23 AM, "Gombas, Gabor" <[email protected]> 
wrote:

> 
> I did need the objects, not just the raw data, otherwise I’d had to duplicate 
> a bunch of existing code which expected full-blown objects to operate on. 
> Modifying the mapper is not really an option unless the majority of the users 
> have the same requirements, otherwise I end up having to add a huge amount of 
> undefer() calls everywhere else (and the query storm caused by missing an 
> undefer() would be much more painful). Maybe the documentation of defer() 
> could mention that the use of the option can in fact reduce performance, 
> because it’s not intuitive that loading unneeded data is cheaper than not 
> loading it.
>  
> From: [email protected] [mailto:[email protected]] On 
> Behalf Of Michael Bayer
> Sent: 11 July 2013 15:34
> To: [email protected]
> Subject: Re: [sqlalchemy] The cost of defer()
>  
> the path would be to figure out if the logic of a per-query defer option can 
> somehow be linked to the attribute when it hits its normal refresh logic - if 
> those attribute were set up as "deferred" at the mapper config level (where 
> deferred is usually used), you wouldn't see this overhead since the deferred 
> loader would be the default callable.    But if you only need a few 
> attributes why not just go the other way and query for those attributes 
> directly?   that would save you way more overhead than even if we removed all 
> overhead from defer(), since the most expensive thing is all the mapper 
> identity map logic that takes place when full entities are loaded.
>  
>  
> On Jul 11, 2013, at 4:02 AM, "Gombas, Gabor" <[email protected]> 
> wrote:
> 
> 
>  
> Hi,
>  
> I wrote a query joining a couple of tables, returning over a hundred thousand 
> rows. Since I only needed to access a couple of the attributes of the 
> returned objects for this specific use case, I thought to use a dozen or so 
> Query.options(defer(…)) calls to avoid loading the unneeded columns. But to 
> my surprise, the query became much slower. Profiling attributed almost all 
> the extra time to set_deferred_for_local_state() and 
> LoadDeferredColumns.__init__():
>  
>    827855   15.668    0.000   27.068    0.000 
> …/SQLAlchemy-0.8.1-py2.6-linux-x86_64.egg/sqlalchemy/orm/strategies.py:184(set_deferred_for_local_state)
>    827855   10.524    0.000   10.524    0.000 
> …/SQLAlchemy-0.8.1-py2.6-linux-x86_64.egg/sqlalchemy/orm/strategies.py:259(__init__)
>  
> If defer() is so expensive, then it is not very useful. Would it be possible 
> to make it cheaper?
>  
> Gabor
>  
>  
> 
> 
> NOTICE: Morgan Stanley is not acting as a municipal advisor and the opinions 
> or views contained herein are not intended to be, and do not constitute, 
> advice within the meaning of Section 975 of the Dodd-Frank Wall Street Reform 
> and Consumer Protection Act. If you have received this communication in 
> error, please destroy all electronic and paper copies and notify the sender 
> immediately. Mistransmission is not intended to waive confidentiality or 
> privilege. Morgan Stanley reserves the right, to the extent permitted under 
> applicable law, to monitor electronic communications. This message is subject 
> to terms available at the following link: 
> http://www.morganstanley.com/disclaimers If you cannot access these links, 
> please notify us by reply message and we will send the contents to you. By 
> messaging with Morgan Stanley you consent to the foregoing.
>  
>  
>  
>  
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  
>  
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  
> 
> 
> 
> NOTICE: Morgan Stanley is not acting as a municipal advisor and the opinions 
> or views contained herein are not intended to be, and do not constitute, 
> advice within the meaning of Section 975 of the Dodd-Frank Wall Street Reform 
> and Consumer Protection Act. If you have received this communication in 
> error, please destroy all electronic and paper copies and notify the sender 
> immediately. Mistransmission is not intended to waive confidentiality or 
> privilege. Morgan Stanley reserves the right, to the extent permitted under 
> applicable law, to monitor electronic communications. This message is subject 
> to terms available at the following 
> link:http://www.morganstanley.com/disclaimers If you cannot access these 
> links, please notify us by reply message and we will send the contents to 
> you. By messaging with Morgan Stanley you consent to the foregoing.
> 
> 
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/groups/opt_out.    

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] The cost of defer()

Reply via email to