On Apr 24, 2012, at 8:23 PM, JT wrote:

> I have a database with some fields that are encoded in random fashion.
> Apparently the previous maintainer was not aware of the existence of
> content encodings, and wrote raw data from the user directly into the
> db, so there is a mix of utf8, windows-1251, ascii, gb2312, etc, etc,
> etc. As a result, attempting to define these fields as String or
> Unicode causes decoding errors on a large number of the sites. There
> is no way to convert them automatically, so I'm trying to create a
> user-driven method to 'correct' the data.

so far, this entirely sounds like something you'd want to address at the type 
level, using a TypeDecorator.

> 
> My solution is to create a class for the object, and have it mapped in
> the standard way to the table, with the expected datatypes assigned to
> the columns. For example, here are two models:
> 
> Now, in order to handle the situations where the object fails to
> initialize because of encoding errors, i've attempted to define a
> non_primary mapper which maps to the same table, but with different
> column types. Right now, that looks like this (but isn't working):

That seems very awkward.   A non_primary mapper is only a trick to get a 
particular class to load from a certain kind of alternate statement, but the 
persistence of that class stays with it's originating table.   

The exception you have here:

> sqlalchemy.exc.OperationalError: (OperationalError) Could not decode
> to UTF-8 column 'anon_1_test_content' with text 'That�s' u'SELECT
> anon_1.test_id AS anon_1_test_id, anon_1.test_content AS
> anon_1_test_content, subtest_1.id AS subtest_1_id, subtest_1.test_id
> AS subtest_1_test_id, subtest_1.content AS subtest_1_content \nFROM
> (SELECT test.id AS test_id, test.content AS test_content, test.content
> AS test_content \nFROM test \nWHERE test.id = ?\n LIMIT ? OFFSET ?) AS
> anon_1 LEFT OUTER JOIN subtest AS subtest_1 ON anon_1.test_id =
> subtest_1.test_id' (2, 1, 0)

is emitted by SQLite.  So the ultimate problem here is probably not related to 
the mapping, the sqlite3 driver is unable to decode this data as returned from 
the DB.  sqlite3 has very little ability to deal with encodings, it really 
wants you to pass in Python Unicode objects.   So you probably want to at least 
test on a DB that is more than willing to accept any kind of encoding, this is 
usually MySQL.

Usually if you really want the same class mapped in two entirely different ways 
including persistence, you make two classes with the same base, such as that 
described at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/EntityName.

But this is a typing issue and you can get at exactly whatever string behavior 
you want with a TypeDecorator built against String.     You'd be looking to 
write a "decode" method in process_result_value(), docs start at 
http://docs.sqlalchemy.org/en/rel_0_7/core/types.html#augmenting-existing-types 
, which does whatever you want for the data.

If this were my case, I'd also be looking to write a script that will rewrite 
all the data in the database to be in a consistent encoding. 


-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to