On 7/28/15 7:41 AM, David Allouche wrote:
On 28 Jul 2015, at 11:31, Jacob Magnusson <[email protected]> wrote:

I'm trying to avoid "implicit" joins (i.e. more than one entry in the FROM 
clause) in my code and I'm wondering if you guys have a good idea on how to best achieve 
this. I want to raise an exception if a query changes to have more than one entry in the 
FROM clause. An alternative would be to have it happen only before the query is executed. 
I'm guessing you would have a good idea on where to start Michael?
Just chiming in to say I think this would indeed be useful.

In my software, I have a few places where queries are built all over the place: column 
selection here, filters there, and I did at least once have a bug caused by such an 
"implicit" join.

Such bugs tend to be very dangerous, because if the executed query has a 
DISTINCT clause, they can be invisible, and if the test dataset is small enough 
they can go undetected. When they land in production with real dataset they 
cause combinatorial explosions.

The ORM often has bugs where extra FROM objects are added by accident, but usually these are issues that are broken on my end to fix in the ORM. The general concept of "implicit joins" is quite useful, hence they are in the tutorial. A lazy-load on a "secondary" relationship uses them as well.

However, the SQL expression system is of course fully open to inspection, so while I can't advise on when you'd want this check and when you wouldn't (seems like the hard part), I'd at least say a way to intercept SQL is to use a before_execute() event. But what is difficult is that it's not easy to tell if a SELECT really has an implicit join in it or not. Here's a simple way:

from sqlalchemy.sql import Select

@event.listens_for(e, "before_execute")
def catch_joins(conn, clauseelement, multiparams, params):
    if isinstance(clauseelement, Select) and \
        len(clauseelement.froms) > 1:
raise Exception("select has too many froms: %s" % clauseelement)

That should work for simple cases. It won't work as is if the SELECT with the implicit froms is buried inside, you'd need to traverse for that. That becomes difficult because a SELECT might have multiple froms, but they're correlated. You'd need to check that somehow as well (or not use correlated SELECTs). That would be:

from sqlalchemy.sql import visitors, Selectable

@event.listens_for(e, "before_execute")
def catch_joins(conn, clauseelement, multiparams, params):
    if isinstance(clauseelement, Selectable):
        def visit_select(select):
            if len(select.froms) > 1:
                raise Exception("select has too many froms: %s" % select)
        visitors.traverse(clauseelement, {}, {"select": visit_select})


as far as how to limit when this takes place, I'd probably put clues into conn.execution_options(). You can check these like:

@event.listens_for(e, "before_execute")
def catch_joins(conn, clauseelement, multiparams, params):
    if conn._execution_options.get('no_implicit_joins', False):
        if isinstance(clauseelement, Selectable):
            def visit_select(select):
                if len(select.froms) > 1:
raise Exception("select has too many froms: %s" % select)
            visitors.traverse(clauseelement, {}, {"select": visit_select})

and set them on a Query:

s.query(A, B).execution_options(no_implicit_joins=True).all()









Regards.


--
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Reply via email to