When I pass binary data to a multi-column in_ clause, I seem to be geting 
inconsistent results and I need some help!  I did some testing with MySQL, 
Postgres, and Vertica (connecting via 
https://pypi.python.org/pypi/vertica-sqlalchemy/0.1).  It appears MySQL 
works correctly but both Postgres and Vertica (which is almost compatible 
with Posgres) I am having trouble.

I authored a simple test model in a schema named 'hashtest'.  The (silly) 
idea is to store binary hash values in the hash_val column and tag the type 
(i.e. md5) in hash_type -- I added a region column just so I would have 2 
non-string columns to play with:

    Base = declarative_base()
    class HashTest(Base):
        __tablename__ = 'hash_test'
        __table_args__ = {'schema': 'hashtest'}

        hash_val = Column(Binary, primary_key=True)
        hash_type = Column(String, primary_key=True)
        region = Column(String)


Insert a single row:
    hashtest=> INSERT INTO hashtest.hash_test (hash_val, hash_type, region) 
VALUES (E'\\xf2666f453b364db65cfdd19756d7e0ad', 'md5', 'US');


For these tests I am using a binary string to store the md5 (rather than 16 
characters 0-F, using 16 raw bytes); the binascii module can do this 
transformation for us:

    *encoded_hash* = '\xf2foE;6M\xb6\\\xfd\xd1\x97V\xd7\xe0\xad'
    
self.assertEquals(binascii.unhexlify('f2666f453b364db65cfdd19756d7e0ad'), 
*encoded_hash*)


We can pass this to a single-column IN clause no problem, and get our data 
back out (the unittest module provides assertEqual):

    # in_ clause with 1 BINARY
    filter_cols = tuple_(HashTest.hash_val)
    filter_vals = ((*encoded_hash*,),)

    q = session.query(HashTest)
    q = q.filter(filter_cols.in_((filter_vals)))
    rows = q.all()
    self.assertEqual(len(rows), 1)
    self.assertEqual(rows[0].hash_val, *encoded_hash*)

We can also build a 2-column in_ clause, with both of the string hash_type 
and region columns, and things also work as expected.  However, when I pass 
in a String and Binary value to a single in_ clause on hash_val and 
hash_type, things stop working:

    # in_ clause with 1 STRING, 1 BINARY
    filter_cols = tuple_(HashTest.hash_val, HashTest.hash_type)
    filter_vals = ((*encoded_hash*, 'md5'),)

    q = session.query(HashTest)
    q = q.filter(filter_cols.in_((filter_vals)))
    rows = q.all()
    self.assertEqual(len(rows), 1)
    self.assertEqual(rows[0].hash_val, *encoded_hash*)


MySQL happily responds with the desired results, but Postgres (and Vertica) 
seem to choke up:

======================================================================
ERROR: testPostgres (md5test.BinaryTupleInTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "md5test.py", line 70, in testPostgres
    self._lookupHash(session)
  File "md5test.py", line 49, in _lookupHash
    rows = q.all()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", 
line 2286, in all
    return list(self)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", 
line 2398, in __iter__
    return self._execute_and_instances(context)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", 
line 2413, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", 
line 717, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py", 
line 317, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", 
line 814, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", 
line 927, in _execute_context
    context)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", 
line 1076, in _handle_dbapi_exception
    exc_info
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", 
line 185, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", 
line 920, in _execute_context
    context)
  File 
"/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 
426, in do_execute
    cursor.execute(statement, parameters)
ProgrammingError: (ProgrammingError) operator does not exist: character 
varying = bytea
LINE 3: ...hash_test.hash_val, hashtest.hash_test.hash_type) IN (('\xf2...
                                                             ^
HINT:  No operator matches the given name and argument type(s). You might 
need to add explicit type casts.
 'SELECT hashtest.hash_test.hash_val AS hashtest_hash_test_hash_val, 
hashtest.hash_test.hash_type AS hashtest_hash_test_hash_type, 
hashtest.hash_test.region AS hashtest_hash_test_region \nFROM 
hashtest.hash_test \nWHERE (hashtest.hash_test.hash_val, 
hashtest.hash_test.hash_type) IN ((%(param_1)s, %(param_2)s))' {'param_1': 
<psycopg2._psycopg.Binary object at 0x2d9d850>, 'param_2': 
<psycopg2._psycopg.Binary object at 0x2c18940>}


It's complaining about an improper encoding for the BINARY column.  I 
chased this down for a long time, looking into how sqlalchemy (and python) 
might be serializing the hash_val binary string before sending it over the 
wire.  However, I think something else might be going on instead.

I added a print statement in do_execute in sqlalchemy/engine/default.py:

    def do_execute(self, cursor, statement, parameters, context=None):
print "do_execute parameters:", parameters
cursor.execute(statement, parameters)


And when I run the 2-column in_ query, I see some unexpected results:

MySQL:
    do_execute parameters: ('\xf2foE;6M\xb6\\\xfd\xd1\x97V\xd7\xe0\xad', 
'md5')
Postgres: 
    do_execute parameters: {'param_1': <psycopg2._psycopg.Binary object at 
0x2d98800>, 'param_2': <psycopg2._psycopg.Binary object at 0x2d987b0>}


It looks to me like the postgres engine is passing both hash_val (which is 
a Binary column) and hash_type (which is a String column) as Binary to the 
database!  So my guess is that, while sqlalchemy is encoding the hash_val 
just fine, it is sending the hash_type also as binary, which could be 
causing problems?

Interestingly, if I swap the order of columns, putting the String type 
first and Binary type second, the do_execute parameters seem to both be 
strings:

    # in_ clause with 1 BINARY, 1 STRING
    filter_cols = tuple_(HashTest.hash_type, HashTest.hash_val)
    filter_vals = (('md5', encoded_hash),)
    q = session.query(HashTest)
    q = q.filter(filter_cols.in_((filter_vals)))
    rows = q.all()
    self.assertEqual(len(rows), 1)
    self.assertEqual(rows[0].hash_val, encoded_hash)

MySQL:
    do_execute parameters: ('md5', 
'\xf2foE;6M\xb6\\\xfd\xd1\x97V\xd7\xe0\xad')
Postgres:
    do_execute parameters: {'param_1': 'md5', 'param_2': 
'\xf2foE;6M\xb6\\\xfd\xd1\x97V\xd7\xe0\xad'}


However, when I run that query I get a different encoding error, likely due 
to the fact that both of these values are now being sent as a String type:

======================================================================
ERROR: testPostgres (md5test.BinaryTupleInTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "md5test.py", line 74, in testPostgres
    self._lookupHash(session)
  File "md5test.py", line 62, in _lookupHash
    rows = q.all()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", 
line 2286, in all
    return list(self)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", 
line 2398, in __iter__
    return self._execute_and_instances(context)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", 
line 2413, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", 
line 717, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py", 
line 317, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", 
line 814, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", 
line 927, in _execute_context
    context)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", 
line 1076, in _handle_dbapi_exception
    exc_info
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", 
line 185, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", 
line 920, in _execute_context
    context)
  File 
"/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 
426, in do_execute
    cursor.execute(statement, parameters)
DataError: (DataError) invalid byte sequence for encoding "UTF8": 0xf2666f45
 'SELECT hashtest.hash_test.hash_val AS hashtest_hash_test_hash_val, 
hashtest.hash_test.hash_type AS hashtest_hash_test_hash_type, 
hashtest.hash_test.region AS hashtest_hash_test_region \nFROM 
hashtest.hash_test \nWHERE (hashtest.hash_test.hash_type, 
hashtest.hash_test.hash_val) IN ((%(param_1)s, %(param_2)s))' {'param_1': 
'md5', 'param_2': '\xf2foE;6M\xb6\\\xfd\xd1\x97V\xd7\xe0\xad'}


Is there a better way for me to pass binary data to an IN clause with 
SQLAlchemy?  For convenience, I put my full test case up on github (DDL and 
example INSERT added as a comment):
https://gist.github.com/robcrowell/d40f8de223ef108c1923


Thanks for the help!

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to