On Sep 11, 2011, at 7:05 AM, Sumeet Agarwal wrote:
> It results in a huge derived table and a table scan over the order table. Out
> of curiosity, I ran the same statement in PostgresSQL (9.0.4) and noticed it
> works much better:
Yeah one thing I do when I use MySQL, is that I assume at the gate that joins
are out of the question. MySQL is not a system that is reasonably capable of
basic relational tasks. If I'm using MySQL, it's usually because I'm
integrating with some existing PHP application or some manager forced some bad
decision on me.
I know this is not helping and you want this:
>
> mysql> explain SELECT customer.id AS customer_id, anon_1.customer_id AS
> anon_1_customer_id, anon_1.count AS anon_1_count FROM customer LEFT OUTER
> JOIN (SELECT `order`.customer_id AS customer_id, count(`order`.id) AS count
> FROM `order` WHERE customer_id = 46 GROUP BY `order`.custome r_id) AS anon_1
> ON customer.id = anon_1.customer_id WHERE customer.id = 46
So to get exactly the behavior of, copy the "WHERE" clause into the joined load
or subq load, that's not directly possible with the JoinedLoader or
SubqueryLoader. It's also the kind of thing that's not very easy to do
generically, if your lead query were much more complex than a simple WHERE
clause for example.
By far the easiest way is to just write it out as two queries and assign the
value. You don't even need the whole relationship() thing, just emit a second
query for the counts, then set the attribute. That's doityourself.py
attached, where I purposely did it with no joins whatsoever since if you have
the customer_id here, you don't need the JOIN at all - MySQL hates joins so
this approach will definitely be the fastest (plus no SQLAlchemy attribute
overhead). You can make it a little more generic by creating a function that
you give the WHERE clause to, it then applies the WHERE to both the customer
and order tables in some way.
If you still wanted it with the JOIN to the parent, and optionally deal with
the relationship() to OrderCount again, that approach is actually a
specialization of the "subqueryload" recipe at
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/DisjointEagerLoading . The
recipe then uses q.from_self(Child).join(Parent.children) to create the join
from the parent to child. But here the level of hardcoding is already so
great I don't see what it buys you except a slower MySQL query.
If you were doing joined loading, you can keep using OrderCount and use
contains_eager() to write it out explicitly:
subq = s.query(OrderCount).filter(OrderCount.customer_id==1).subquery()
for c in s.query(Customer).\
filter(Customer.id==1).\
outerjoin(subq, Customer._num_orders).\
options(contains_eager("_num_orders", alias=subq)):
print c.id, c.num_orders
The query there still has more nesting than MySQL likes.
Next approach, hardcode a bind in there, very brittle but would produce this
result:
order_counts = select([Order.customer_id,
func.count(Order.id).label('count')]).\
where(Order.customer_id==bindparam("cust_id")).\
group_by(Order.customer_id).alias()
mapper(
OrderCount,
order_counts,
primary_key=order_counts.c.customer_id
)
the "joinedload" scenario would work like this:
s.query(Customer).filter(Customer.id==48).options(joinedload("_num_orders")).params(cust_id=48)
"subqueryload()", not as easy - we need to use the technique at
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/GlobalFilter (which also works
with the joinedload version):
from sqlalchemy.orm.interfaces import MapperOption
class SetABindOption(MapperOption):
propagate_to_loaders = True
def __init__(self, **kw):
self.kw = kw
def process_query_conditionally(self, query):
"""process query during a lazyload"""
query._params = query._params.union(self.kw)
def process_query(self, query):
"""process query during a primary user query"""
self.process_query_conditionally(query)
s.query(Customer).filter(Customer.id==1).options(subqueryload("_num_orders"),
SetABindOption(cust_id=1))
But I'd stick with, just do an extra SELECT statement unless more genericism is
needed. Since it appears this case is so special purpose already and MySQL is
going to chug to a halt if you fall back on other methods.
--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en.
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
Base= declarative_base()
class Customer(Base):
__tablename__ = 'customer'
id = Column(Integer, primary_key=True)
orders = relationship("Order")
class Order(Base):
__tablename__ = 'order'
id = Column(Integer, primary_key=True)
customer_id = Column(Integer, ForeignKey('customer.id'))
e = create_engine('sqlite://', echo=True)
Base.metadata.create_all(e)
s = Session(e)
s.add_all([
Customer(orders=[Order(), Order(), Order()]),
Customer(orders=[Order(), Order()]),
Customer(orders=[]),
])
s.commit()
customers = s.query(Customer).filter(Customer.id==1)
from itertools import groupby
order_counts = dict((k, list(v)[0]) for k, v in groupby(
s.query(Order.customer_id, func.count(Order.id).label('count')).
filter(Order.customer_id==1).
order_by(Order.customer_id),
lambda x:x.customer_id
))
for c in customers:
if c.id in order_counts:
c.num_orders = order_counts[c.id]
else:
c.num_orders = 0
for c in customers:
print c.id, c.num_orders