On Feb 6, 2014, at 6:59 PM, Erich Blume <[email protected]> wrote:
> Hmm, this one has me stumped. As best I can tell after poking at it using the
> column_reflect event, a custom dialect, etc. - the issue here is that in
> pysqlite.py we (in my Python 3.3 install) are selecting `sqlite3.dbapi2` as
> the dbapi interface, but we aren't telling sqlite3 anything about how to
> treat unicode errors. From what I am reading (but it seems inconsistent,
> maybe?) sqlite3 automatically decodes all database retrieved values from
> their bytes for text fields, returning unicode strings. Except... that
> doesn't always seem to be true. I hex-edited a db file to change the utf-8
> string "hello" to "hell" + 0x92 and sqlite3 switched from returning "hello"
> to b"hell\x92", or something like that - I've been poking at this for so long
> I've lost track of that transcript.
>
> One can override sqlite3's text factory, apparently, with (for instance)
> `sqlite3.text_factory = lambda x: x.decode('utf-8', errors='ignore')`. Maybe
> the key is to try and find a way to trigger that from sqlalchemy? I tried and
> failed, maybe someone else can point me back to the path?
>
> Just to re-summarize the problem: In python 3, I'm getting errors trying to
> read a row from a sqlite database that has a TEXT column with an invalid
> utf-8 sequence (specifically, the singleton bye '0x92'). I'd love to just
> have sqlalchemy move along and ignore the byte, but I'm not clear how to do
> that.
Pysqlite (e.g. sqlite3 module) returns TEXT as Python unicode out of the gate.
That exception message is being raised by sqlite3 itself, SQLAlchemy is just a
pass-through, as the string type knows on sqlite that the value is already
unicode.
you might need to CAST() the value as BINARY perhaps, not sure. You’d first
want to get a plain sqlite3 script to do what you want. Setting a
“text_factory” at the module level of sqlite3 is certainly easy enough but that
seems way too broad. Ideally you’d want to be able to get the value on a
per-column basis.
signature.asc
Description: Message signed with OpenPGP using GPGMail
