On Apr 24, 2012, at 8:23 PM, JT wrote: > I have a database with some fields that are encoded in random fashion. > Apparently the previous maintainer was not aware of the existence of > content encodings, and wrote raw data from the user directly into the > db, so there is a mix of utf8, windows-1251, ascii, gb2312, etc, etc, > etc. As a result, attempting to define these fields as String or > Unicode causes decoding errors on a large number of the sites. There > is no way to convert them automatically, so I'm trying to create a > user-driven method to 'correct' the data.
so far, this entirely sounds like something you'd want to address at the type level, using a TypeDecorator. > > My solution is to create a class for the object, and have it mapped in > the standard way to the table, with the expected datatypes assigned to > the columns. For example, here are two models: > > Now, in order to handle the situations where the object fails to > initialize because of encoding errors, i've attempted to define a > non_primary mapper which maps to the same table, but with different > column types. Right now, that looks like this (but isn't working): That seems very awkward. A non_primary mapper is only a trick to get a particular class to load from a certain kind of alternate statement, but the persistence of that class stays with it's originating table. The exception you have here: > sqlalchemy.exc.OperationalError: (OperationalError) Could not decode > to UTF-8 column 'anon_1_test_content' with text 'That�s' u'SELECT > anon_1.test_id AS anon_1_test_id, anon_1.test_content AS > anon_1_test_content, subtest_1.id AS subtest_1_id, subtest_1.test_id > AS subtest_1_test_id, subtest_1.content AS subtest_1_content \nFROM > (SELECT test.id AS test_id, test.content AS test_content, test.content > AS test_content \nFROM test \nWHERE test.id = ?\n LIMIT ? OFFSET ?) AS > anon_1 LEFT OUTER JOIN subtest AS subtest_1 ON anon_1.test_id = > subtest_1.test_id' (2, 1, 0) is emitted by SQLite. So the ultimate problem here is probably not related to the mapping, the sqlite3 driver is unable to decode this data as returned from the DB. sqlite3 has very little ability to deal with encodings, it really wants you to pass in Python Unicode objects. So you probably want to at least test on a DB that is more than willing to accept any kind of encoding, this is usually MySQL. Usually if you really want the same class mapped in two entirely different ways including persistence, you make two classes with the same base, such as that described at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/EntityName. But this is a typing issue and you can get at exactly whatever string behavior you want with a TypeDecorator built against String. You'd be looking to write a "decode" method in process_result_value(), docs start at http://docs.sqlalchemy.org/en/rel_0_7/core/types.html#augmenting-existing-types , which does whatever you want for the data. If this were my case, I'd also be looking to write a script that will rewrite all the data in the database to be in a consistent encoding. -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
