[sqlalchemy] empty IN clauses

Shivaram Lingamneni Thu, 30 Jun 2016 05:09:07 -0700

We're working with a codebase that uses SQLAlchemy 1.0.12 to talk to MySQL 
5.5. Periodically, we have issues with empty IN clauses. Specifically, we 
have queries that look like `MyTable.my_column.in_(my_sequence)`. When 
`my_sequence` is empty, SQLAlchemy compiles this to `my_table.my_column <> 
my_table.my_column`. On MySQL 5.5, this clause will be optimized out to 
false, as long `my_column` was declared NOT NULL. However, if `my_column` 
can take on NULL values, it cannot be optimized out and it will cause a 
table scan.


SQLAlchemy's rationale for compiling in this way is described here:

http://article.gmane.org/gmane.comp.python.sqlalchemy.user/6749

specifically, if empty-IN were always compiled to false and empty-NOT-IN 
were always compiled to true, `~MyTable.my_column.in_(empty_sequence)` 
would return rows where `my_column IS NULL`, which would be 
counterintuitive. However, we're willing to accept this tradeoff in order 
to achieve the goals of (1) protecting our servers from unintentional table 
scans (2) not having to explicitly guard against empty-IN across our 
codebase, as we would if we made SQLAlchemy's warning into an exception.

My question is: what's the best way to achieve this? I've considered:

1. Using the TypeEngine.Comparator API to change the way `in_` is compiled 
(this was suggested on freenode).

http://docs.sqlalchemy.org/en/rel_1_0/core/custom_types.html#redefining-and-creating-new-operators

This is the cleanest approach in terms of working with the established 
APIs. The difficulty I'm seeing is that it seems necessary to subclass all 
of the standard SQLAlchemy types (e.g., Integer, String, Float), then 
ensure that the subclasses are used in all model definitions. This seems 
error-prone, i.e., if we miss a type, we'll get table scans anyway.

2. Swapping out the default implementation of the `in_` operator by 
modifying `sqlalchemy.sql.default_comparator.operator_lookup`. I had 
trouble getting this to work, which is probably just as well: it's a 
monkeypatch and it requires working with non-public APIs.

3. Adding an in-house patch that compiles empty IN to 0. Something like 
this:

https://gist.github.com/slingamn/8fc40ed59287267a89ad102ece75881e

This seems relatively clean, but we'd like to stay on the stock upstream 
package if possible.

So I'm interested in advice about these approaches, or suggestions for new 
ones. Has anyone else dealt with this issue recently? Thanks very much for 
your time.

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

[sqlalchemy] empty IN clauses

Reply via email to