On Sep 23, 2010, at 8:18 PM, Anthony Tran wrote:
> Hi everyone,
>
> I'm trying to do a query where I joinedload with a polymorphic child table.
> The child table is using a joined table inheritance scheme. I'm using
> with_polymorphic = '*' to accomplish the eager loading of polymorphic types.
> The query that SQLAlchemy generates results in a sub select that returns all
> the child rows. Here's the setup:
>
>
> Here's the resulting SQL query:
>
> SELECT parent_table.id AS parent_table_id,
> anon_1.child_base_table_id AS anon_1_child_base_table_id,
> anon_1.child_base_table_parent_id AS anon_1_child_base_table_parent_id,
> anon_1.child_base_table_child_type AS
> anon_1_child_base_table_child_type,
> anon_1.child1_table_child_id AS anon_1_child1_table_child_id,
> anon_1.child1_table_child1_data AS anon_1_child1_table_child1_data,
> anon_1.child2_table_child_id AS anon_1_child2_table_child_id,
> anon_1.child2_table_child2_data AS anon_1_child2_table_child2_data
> FROM parent_table
> LEFT OUTER JOIN (SELECT
> child_base_table.id AS child_base_table_id,
> child_base_table.parent_id AS
> child_base_table_parent_id,
> child_base_table.child_type AS
> child_base_table_child_type,
> child1_table.child_id AS
> child1_table_child_id,
> child1_table.child1_data AS
> child1_table_child1_data,
> child2_table.child_id AS
> child2_table_child_id,
> child2_table.child2_data AS
> child2_table_child2_data
> FROM child_base_table
> LEFT OUTER JOIN child1_table
> ON child_base_table.id = child1_table.child_id
> LEFT OUTER JOIN child2_table
> ON child_base_table.id =
> child2_table.child_id)
> AS
> anon_1
> ON parent_table.id = anon_1.child_base_table_parent_id
> WHERE parent_table.id = 999999
>
> Note that the inner select has no WHERE clause which results in selecting all
> of the child rows. Without the with_polymorphic option, SQLAlchemy doesn't
> produce the inner select. Using subquery_load causes the same problem. Is
> there a way to prevent this from happening and eager load the polymorphic
> tables?
So, interesting terminology you used above - "a sub select that returns all the
child rows". The SELECT that is the right hand target of the LEFT OUTER JOIN
doesn't actually "return" any rows to your application - the outer one does.
The subquery only represents "the set of all child rows".
In relational theory, there is no difference between these two statements :
select * from parent join child on parent.id = child.parent_id
select * from parent join (select * from child) as c1 on
parent.id=c1.parent_id
SQLAlchemy's normal behavior makes great usage of this natural effect. The
"set of all rows in 'child'" is the same as the "set of all rows in the set of
all rows in 'child'". A good SQL optimizer, like that of Postgresql or
Oracle, knows this, and in fact with PG my experience is that it's extremely
difficult to get it to come up with a different execution plan by moving
subqueries out into joins and such - it almost always figures out the common
relational structure underneath (however, if you're on PG, I can't say for sure
without trying how well the above query would optimize). SQLAlchemy's eager
loader treats all "join targets" as the same thing - only the kind of
selectable which comprises the "join target" changes.
Now, that's the theory part. The practical part is, if you're using MySQL, a
query like the above is going to hit you hard and mercilessly, as their
optimizer is the worst I've ever seen.
So my advice here depends strongly on which backend you're using. To be
blunt, if you're using MySQL, I would not be using joined table inheritance for
anything critical, at all, period. MySQL completely sucks at joins and it
always will. SQLAlchemy is heavily oriented around relational theory,
whereas MySQL is the database where they didn't think foreign keys were really
that important until ten years into the project.
If you're using Postgresql, I would be checking what EXPLAIN tells you - it may
or may not make any difference if a WHERE clause were stuck inside the
subquery. I will say about PG that it works much better on inner joins than
outer ones, though that's hard to use when you're trying to scan out on
polymorphic tables like the above.
So lets say you're on MySQL or PG, you can't change your schema, you've
observed that yes, the planner really is performing more poorly because the
subquery isn't pre-limited (I really think thats only MySQL's quirk, though),
what do you do ? Well then you have to construct the JOIN by hand and use
contains_eager():
from sqlalchemy.orm import contains_eager
subq = session.query(Child).filter(Child.parent_id==99999).subquery()
query(Parent).outerjoin((subq,
Parent.children)).filter(Parent.id=99999).options(contains_eager(Parent.children,
alias=subq)).one()
so the above is an example of more explicit usage when the default
relational-theory approach is not practical. I haven't tested the snippet
above but its built on patterns that are very well tested so hopefully should
work out of the gate.
>
> Thanks!
> Anthony
>
> --
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/sqlalchemy?hl=en.
--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en.