When I pass binary data to a multi-column in_ clause, I seem to be geting
inconsistent results and I need some help! I did some testing with MySQL,
Postgres, and Vertica (connecting via
https://pypi.python.org/pypi/vertica-sqlalchemy/0.1). It appears MySQL
works correctly but both Postgres and Vertica (which is almost compatible
with Posgres) I am having trouble.
I authored a simple test model in a schema named 'hashtest'. The (silly)
idea is to store binary hash values in the hash_val column and tag the type
(i.e. md5) in hash_type -- I added a region column just so I would have 2
non-string columns to play with:
Base = declarative_base()
class HashTest(Base):
__tablename__ = 'hash_test'
__table_args__ = {'schema': 'hashtest'}
hash_val = Column(Binary, primary_key=True)
hash_type = Column(String, primary_key=True)
region = Column(String)
Insert a single row:
hashtest=> INSERT INTO hashtest.hash_test (hash_val, hash_type, region)
VALUES (E'\\xf2666f453b364db65cfdd19756d7e0ad', 'md5', 'US');
For these tests I am using a binary string to store the md5 (rather than 16
characters 0-F, using 16 raw bytes); the binascii module can do this
transformation for us:
*encoded_hash* = '\xf2foE;6M\xb6\\\xfd\xd1\x97V\xd7\xe0\xad'
self.assertEquals(binascii.unhexlify('f2666f453b364db65cfdd19756d7e0ad'),
*encoded_hash*)
We can pass this to a single-column IN clause no problem, and get our data
back out (the unittest module provides assertEqual):
# in_ clause with 1 BINARY
filter_cols = tuple_(HashTest.hash_val)
filter_vals = ((*encoded_hash*,),)
q = session.query(HashTest)
q = q.filter(filter_cols.in_((filter_vals)))
rows = q.all()
self.assertEqual(len(rows), 1)
self.assertEqual(rows[0].hash_val, *encoded_hash*)
We can also build a 2-column in_ clause, with both of the string hash_type
and region columns, and things also work as expected. However, when I pass
in a String and Binary value to a single in_ clause on hash_val and
hash_type, things stop working:
# in_ clause with 1 STRING, 1 BINARY
filter_cols = tuple_(HashTest.hash_val, HashTest.hash_type)
filter_vals = ((*encoded_hash*, 'md5'),)
q = session.query(HashTest)
q = q.filter(filter_cols.in_((filter_vals)))
rows = q.all()
self.assertEqual(len(rows), 1)
self.assertEqual(rows[0].hash_val, *encoded_hash*)
MySQL happily responds with the desired results, but Postgres (and Vertica)
seem to choke up:
======================================================================
ERROR: testPostgres (md5test.BinaryTupleInTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "md5test.py", line 70, in testPostgres
self._lookupHash(session)
File "md5test.py", line 49, in _lookupHash
rows = q.all()
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py",
line 2286, in all
return list(self)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py",
line 2398, in __iter__
return self._execute_and_instances(context)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py",
line 2413, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py",
line 717, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py",
line 317, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py",
line 814, in _execute_clauseelement
compiled_sql, distilled_params
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py",
line 927, in _execute_context
context)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py",
line 1076, in _handle_dbapi_exception
exc_info
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py",
line 185, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py",
line 920, in _execute_context
context)
File
"/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line
426, in do_execute
cursor.execute(statement, parameters)
ProgrammingError: (ProgrammingError) operator does not exist: character
varying = bytea
LINE 3: ...hash_test.hash_val, hashtest.hash_test.hash_type) IN (('\xf2...
^
HINT: No operator matches the given name and argument type(s). You might
need to add explicit type casts.
'SELECT hashtest.hash_test.hash_val AS hashtest_hash_test_hash_val,
hashtest.hash_test.hash_type AS hashtest_hash_test_hash_type,
hashtest.hash_test.region AS hashtest_hash_test_region \nFROM
hashtest.hash_test \nWHERE (hashtest.hash_test.hash_val,
hashtest.hash_test.hash_type) IN ((%(param_1)s, %(param_2)s))' {'param_1':
<psycopg2._psycopg.Binary object at 0x2d9d850>, 'param_2':
<psycopg2._psycopg.Binary object at 0x2c18940>}
It's complaining about an improper encoding for the BINARY column. I
chased this down for a long time, looking into how sqlalchemy (and python)
might be serializing the hash_val binary string before sending it over the
wire. However, I think something else might be going on instead.
I added a print statement in do_execute in sqlalchemy/engine/default.py:
def do_execute(self, cursor, statement, parameters, context=None):
print "do_execute parameters:", parameters
cursor.execute(statement, parameters)
And when I run the 2-column in_ query, I see some unexpected results:
MySQL:
do_execute parameters: ('\xf2foE;6M\xb6\\\xfd\xd1\x97V\xd7\xe0\xad',
'md5')
Postgres:
do_execute parameters: {'param_1': <psycopg2._psycopg.Binary object at
0x2d98800>, 'param_2': <psycopg2._psycopg.Binary object at 0x2d987b0>}
It looks to me like the postgres engine is passing both hash_val (which is
a Binary column) and hash_type (which is a String column) as Binary to the
database! So my guess is that, while sqlalchemy is encoding the hash_val
just fine, it is sending the hash_type also as binary, which could be
causing problems?
Interestingly, if I swap the order of columns, putting the String type
first and Binary type second, the do_execute parameters seem to both be
strings:
# in_ clause with 1 BINARY, 1 STRING
filter_cols = tuple_(HashTest.hash_type, HashTest.hash_val)
filter_vals = (('md5', encoded_hash),)
q = session.query(HashTest)
q = q.filter(filter_cols.in_((filter_vals)))
rows = q.all()
self.assertEqual(len(rows), 1)
self.assertEqual(rows[0].hash_val, encoded_hash)
MySQL:
do_execute parameters: ('md5',
'\xf2foE;6M\xb6\\\xfd\xd1\x97V\xd7\xe0\xad')
Postgres:
do_execute parameters: {'param_1': 'md5', 'param_2':
'\xf2foE;6M\xb6\\\xfd\xd1\x97V\xd7\xe0\xad'}
However, when I run that query I get a different encoding error, likely due
to the fact that both of these values are now being sent as a String type:
======================================================================
ERROR: testPostgres (md5test.BinaryTupleInTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "md5test.py", line 74, in testPostgres
self._lookupHash(session)
File "md5test.py", line 62, in _lookupHash
rows = q.all()
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py",
line 2286, in all
return list(self)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py",
line 2398, in __iter__
return self._execute_and_instances(context)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/query.py",
line 2413, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py",
line 717, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py",
line 317, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py",
line 814, in _execute_clauseelement
compiled_sql, distilled_params
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py",
line 927, in _execute_context
context)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py",
line 1076, in _handle_dbapi_exception
exc_info
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py",
line 185, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py",
line 920, in _execute_context
context)
File
"/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line
426, in do_execute
cursor.execute(statement, parameters)
DataError: (DataError) invalid byte sequence for encoding "UTF8": 0xf2666f45
'SELECT hashtest.hash_test.hash_val AS hashtest_hash_test_hash_val,
hashtest.hash_test.hash_type AS hashtest_hash_test_hash_type,
hashtest.hash_test.region AS hashtest_hash_test_region \nFROM
hashtest.hash_test \nWHERE (hashtest.hash_test.hash_type,
hashtest.hash_test.hash_val) IN ((%(param_1)s, %(param_2)s))' {'param_1':
'md5', 'param_2': '\xf2foE;6M\xb6\\\xfd\xd1\x97V\xd7\xe0\xad'}
Is there a better way for me to pass binary data to an IN clause with
SQLAlchemy? For convenience, I put my full test case up on github (DDL and
example INSERT added as a comment):
https://gist.github.com/robcrowell/d40f8de223ef108c1923
Thanks for the help!
--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.