On Tue, Feb 4, 2014 at 10:15 AM, Erich Blume <[email protected]> wrote:
> I am working on a binding to a SQLite database that I do not control the
> creation of, with the aid of reflection. I'm running into what I believe
> are very basic UTF-8 decoding errors. For instance, a TEXT cell has the byte
> '0x92' in it and is causing an OperationalError. Presumably, this is because
> 0x92 (by itself) is not a valid encoding for any Unicode code point. I would
> prefer that the decoding from UTF-8 be forced, perhaps by dropping the
> bad byte. How can I do this?
>
> The database has a table with a column called 'description', which is of
> type TEXT. The "PRAGMA encoding" is left at 'UTF-8', thank goodness. One of
> the rows, however, contains within its otherwise ASCII byte contents the
> singleton byte '0x92'. Based on the context of the sentence, it seems that
> this was intended to be a single quotation mark; some googling
> suggests 'RIGHT SINGLE QUOTATION MARK' in Unicode, which is '0xE2 0x80
> 0x99' in UTF-8. I gather that MSSQL (which was the original source of the
> data in this database) sometimes uses Microsoft's infernal web encodings,
> and that is probably the source of this byte.
>
> The issue is this: I really need to read this data! It would be *ideal* to
> have the aid of something like Python's 'replace' decoding handler, but
> failing that, just eliding the byte would do fine in a pinch.
>
> When fetching this row in Python 3.3 with SQLAlchemy 0.9.1 my session looks
> vaguely like this (with the text and stack trace truncated for brevity).
>
>   File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/sqlalchemy/engine/result.py", line 760, in <listcomp>
>     return [process_row(metadata, row, processors, keymap)
> sqlalchemy.exc.OperationalError: (OperationalError) Could not decode to UTF-8 column 'description' with text <...>
>
> Is there some way to accomplish this?
>

The String-related column types have a "unicode_error" parameter, which
sounds like it might be what you want:

http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#sqlalchemy.types.String.params.unicode_error

Note the warnings around it, though: per the docs it requires
convert_unicode='force', and it adds overhead to row fetching.

Hope that helps,

Simon
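
For reference, here is a minimal sketch of a lenient decode at the driver level. It uses only the standard-library sqlite3 module and its text_factory hook, not SQLAlchemy's unicode_error parameter, so treat it as a related workaround rather than the approach suggested above. The table name "items", the sample text, and the helper lenient_text are made up for illustration. The cp1252 fallback rests on the guess in the quoted message that the stray byte came from a Windows encoding; in Windows-1252, 0x92 is RIGHT SINGLE QUOTATION MARK.

```python
import sqlite3


def lenient_text(raw: bytes) -> str:
    """Decode a TEXT cell as UTF-8, falling back to Windows-1252 (hypothetical helper)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the bad bytes are Windows-1252. errors="replace"
        # covers the few byte values that cp1252 leaves undefined.
        return raw.decode("cp1252", errors="replace")


conn = sqlite3.connect(":memory:")
# sqlite3 calls text_factory with the raw bytes of every TEXT value it reads.
conn.text_factory = lenient_text

conn.execute("CREATE TABLE items (description TEXT)")
# CAST(blob AS TEXT) stores the bytes unvalidated, which reproduces a TEXT
# cell holding a lone 0x92 byte.
conn.execute(
    "INSERT INTO items VALUES (CAST(? AS TEXT))",
    (b"It\x92s broken",),
)

row = conn.execute("SELECT description FROM items").fetchone()
description = row[0]  # decoded via the cp1252 fallback instead of raising
```

To get Python's 'replace' handler instead, the factory could be `lambda raw: raw.decode("utf-8", errors="replace")`. With SQLAlchemy, the same attribute can presumably be set on the raw DBAPI connection from a "connect" event listener, though that is untested against 0.9.1.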
