Re: [sqlalchemy] empty IN clauses

Mike Bayer Thu, 30 Jun 2016 07:18:03 -0700


On 06/30/2016 08:08 AM, Shivaram Lingamneni wrote:

We're working with a codebase that uses SQLAlchemy 1.0.12 to talk to
MySQL 5.5. Periodically, we have issues with empty IN clauses.
Specifically, we have queries that look like
`MyTable.my_column.in_(my_sequence)`. When `my_sequence` is empty,
SQLAlchemy compiles this to `my_table.my_column <> my_table.my_column`.
On MySQL 5.5, this clause will be optimized out to false, as long
`my_column` was declared NOT NULL. However, if `my_column` can take on
NULL values, it cannot be optimized out and it will cause a table scan.

SQLAlchemy's rationale for compiling in this way is described here:

http://article.gmane.org/gmane.comp.python.sqlalchemy.user/6749

specifically, if empty-IN were always compiled to false and empty-NOT-IN
were always compiled to true, `~MyTable.my_column.in_(empty_sequence)`
would return rows where `my_column IS NULL`, which would be
counterintuitive.

oddly enough a resurgence in this issue has revealed some disagreementon that.

is NULL "not in" the empty set? I asked on twitter and the answersleaned towards "the answer is NULL" but someone did find out thatdatabases do have an opinion (postgresql for example):


test=# SELECT NULL IN (select 1 where false);
 ?column?
----------
 f
(1 row)

So it seems clear that probably for 1.2 we will need to add acomprehensive layer of being able to change what "IN" does, the choicesbeing "raise", "simple-false", "column-false", and probably some others,and for starters it would be a dialect-level flag. I'd love to changethe default to "simple-false" too due to the above revelation thatdatabases can be coerced into admitting their opinion on this.



However, we're willing to accept this tradeoff in

order to achieve the goals of (1) protecting our servers from
unintentional table scans (2) not having to explicitly guard against
empty-IN across our codebase, as we would if we made SQLAlchemy's
warning into an exception.


probably quick and dirty is best for now:

from sqlalchemy import create_engine
from sqlalchemy import String, Table, MetaData, Column, select

from sqlalchemy import event

import re

e = create_engine("sqlite://", echo=True)

t = Table('t', MetaData(), Column('x', String(50)))
t.create(e)


@event.listens_for(e, "before_cursor_execute", retval=True)
def fix_slow_falses(
        conn, cursor, statement, parameters, context, executemany):
    statement = re.sub(r'(.+) != \1', '1 != 1', statement)
    return statement, parameters


e.execute(select([t]).where(t.c.x.in_(['a', 'b', 'c'])))
e.execute(select([t]).where(t.c.x != 5))
e.execute(select([t]).where(t.c.x.in_([])))


My question is: what's the best way to achieve this? I've considered:

1. Using the TypeEngine.Comparator API to change the way `in_` is
compiled (this was suggested on freenode).

http://docs.sqlalchemy.org/en/rel_1_0/core/custom_types.html#redefining-and-creating-new-operators

This is the cleanest approach in terms of working with the established
APIs. The difficulty I'm seeing is that it seems necessary to subclass
all of the standard SQLAlchemy types (e.g., Integer, String, Float),
then ensure that the subclasses are used in all model definitions. This
seems error-prone, i.e., if we miss a type, we'll get table scans anyway.

2. Swapping out the default implementation of the `in_` operator by
modifying `sqlalchemy.sql.default_comparator.operator_lookup`. I had
trouble getting this to work, which is probably just as well: it's a
monkeypatch and it requires working with non-public APIs.

3. Adding an in-house patch that compiles empty IN to 0. Something like
this:

https://gist.github.com/slingamn/8fc40ed59287267a89ad102ece75881e

This seems relatively clean, but we'd like to stay on the stock upstream
package if possible.

So I'm interested in advice about these approaches, or suggestions for
new ones. Has anyone else dealt with this issue recently? Thanks very
much for your time.

--
You received this message because you are subscribed to the Google
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected]
<mailto:[email protected]>.
To post to this group, send email to [email protected]
<mailto:[email protected]>.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] empty IN clauses

Reply via email to