Re: [sqlalchemy] subqueryload for a ColumnProperty?

Michael Bayer Sun, 11 Sep 2011 08:53:44 -0700

On Sep 11, 2011, at 7:05 AM, Sumeet Agarwal wrote:

> It results in a huge derived table and a table scan over the order table. Out 
> of curiosity, I ran the same statement in PostgresSQL (9.0.4) and noticed it 
> works much better:


Yeah one thing I do when I use MySQL, is that I assume at the gate that joins 
are out of the question.   MySQL is not a system that is reasonably capable of 
basic relational tasks.   If I'm using MySQL, it's usually because I'm 
integrating with some existing PHP application or some manager forced some bad 
decision on me.

I know this is not helping and you want this:

> 
> mysql> explain SELECT customer.id AS customer_id, anon_1.customer_id AS 
> anon_1_customer_id, anon_1.count AS anon_1_count  FROM customer LEFT OUTER 
> JOIN (SELECT `order`.customer_id AS customer_id, count(`order`.id) AS count  
> FROM `order` WHERE customer_id = 46 GROUP BY `order`.custome r_id) AS anon_1 
> ON customer.id = anon_1.customer_id  WHERE customer.id = 46 
So to get exactly the behavior of, copy the "WHERE" clause into the joined load 
or subq load, that's not directly possible with the JoinedLoader or 
SubqueryLoader.    It's also the kind of thing that's not very easy to do 
generically, if your lead query were much more complex than a simple WHERE 
clause for example.

By far the easiest way is to just write it out as two queries and assign the 
value.   You don't even need the whole relationship() thing, just emit a second 
query for the counts, then set the attribute.   That's doityourself.py 
attached, where I purposely did it with no joins whatsoever since if you have 
the customer_id here, you don't need the JOIN at all - MySQL hates joins so 
this approach will definitely be the fastest (plus no SQLAlchemy attribute 
overhead).    You can make it a little more generic by creating a function that 
you give the WHERE clause to, it then applies the WHERE to both the customer 
and order tables in some way.  

If you still wanted it with the JOIN to the parent,  and optionally deal with 
the relationship() to OrderCount again, that approach is actually a 
specialization of the "subqueryload" recipe at 
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/DisjointEagerLoading .    The 
recipe then uses q.from_self(Child).join(Parent.children) to create the join 
from the parent to child.   But here the level of hardcoding is already so 
great I don't see what it buys you except a slower MySQL query.

If you were doing joined loading, you can keep using OrderCount and use 
contains_eager() to write it out explicitly:

subq = s.query(OrderCount).filter(OrderCount.customer_id==1).subquery()
for c in s.query(Customer).\
            filter(Customer.id==1).\
            outerjoin(subq, Customer._num_orders).\
            options(contains_eager("_num_orders", alias=subq)):
    print c.id, c.num_orders


The query there still has more nesting than MySQL likes.

Next approach, hardcode a bind in there, very brittle but would produce this 
result:

order_counts = select([Order.customer_id, 
func.count(Order.id).label('count')]).\
        where(Order.customer_id==bindparam("cust_id")).\
        group_by(Order.customer_id).alias()

mapper(
    OrderCount, 
    order_counts,
    primary_key=order_counts.c.customer_id
)

the "joinedload" scenario would work like this:

s.query(Customer).filter(Customer.id==48).options(joinedload("_num_orders")).params(cust_id=48)

"subqueryload()", not as easy - we need to use the technique at 
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/GlobalFilter (which also works 
with the joinedload version):

from sqlalchemy.orm.interfaces import MapperOption

class SetABindOption(MapperOption):
    propagate_to_loaders = True

    def __init__(self, **kw):
        self.kw = kw

    def process_query_conditionally(self, query):
        """process query during a lazyload"""
        query._params = query._params.union(self.kw)

    def process_query(self, query):
        """process query during a primary user query"""

        self.process_query_conditionally(query)

s.query(Customer).filter(Customer.id==1).options(subqueryload("_num_orders"), 
SetABindOption(cust_id=1))

But I'd stick with, just do an extra SELECT statement unless more genericism is 
needed.  Since it appears this case is so special purpose already and MySQL is 
going to chug to a halt if you fall back on other methods.

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base= declarative_base()

class Customer(Base):
    __tablename__ = 'customer'
    id = Column(Integer, primary_key=True)
    orders = relationship("Order")

class Order(Base):
    __tablename__ = 'order'
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey('customer.id'))

e = create_engine('sqlite://', echo=True)

Base.metadata.create_all(e)

s = Session(e)
s.add_all([
    Customer(orders=[Order(), Order(), Order()]),
    Customer(orders=[Order(), Order()]),
    Customer(orders=[]),
])
s.commit()

customers = s.query(Customer).filter(Customer.id==1)

from itertools import groupby
order_counts = dict((k, list(v)[0]) for k, v in groupby(
    s.query(Order.customer_id, func.count(Order.id).label('count')).
        filter(Order.customer_id==1).
        order_by(Order.customer_id),
    lambda x:x.customer_id
))

for c in customers:
    if c.id in order_counts:
        c.num_orders = order_counts[c.id]
    else:
        c.num_orders = 0

for c in customers:
    print c.id, c.num_orders

Re: [sqlalchemy] subqueryload for a ColumnProperty?

Reply via email to