I have a database with some fields that are encoded in random fashion.
Apparently the previous maintainer was not aware of the existence of
content encodings, and wrote raw data from the user directly into the
db, so there is a mix of utf8, windows-1251, ascii, gb2312, etc, etc,
etc. As a result, attempting to define these fields as String or
Unicode causes decoding errors on a large number of the sites. There
is no way to convert them automatically, so I'm trying to create a
user-driven method to 'correct' the data.

My solution is to create a class for the object, and have it mapped in
the standard way to the table, with the expected datatypes assigned to
the columns. For example, here are two models:

from sqlalchemy import (create_engine, Column, Integer, ForeignKey,
    UnicodeText, LargeBinary, MetaData, Table)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, backref, sessionmaker, mapper
import sqlite3

engine = create_engine('sqlite:///test.db', echo=True)
Base = declarative_base(bind=engine)
Session = sessionmaker(bind=engine)

class Test(Base):
    __tablename__ = 'test'
    id = Column(Integer, primary_key=True)
    content = Column(UnicodeText)

class SubTest(Base):
    __tablename__ = 'subtest'
    id = Column(Integer, primary_key=True)
    test_id = Column(Integer, ForeignKey(Test.id))
    content = Column(UnicodeText)

    test = relationship(Test, lazy=True,
        backref=backref('subtests', lazy=False))

Now, in order to handle the situations where the object fails to
initialize because of encoding errors, i've attempted to define a
non_primary mapper which maps to the same table, but with different
column types. Right now, that looks like this (but isn't working):

table = Table('test', Base.metadata,
    Column('content', LargeBinary),
    extend_existing=True,
    )
binary_mapper = mapper(Test, table, non_primary=True)

and the main part of the script looks like this:

if __name__ == '__main__':
    Base.metadata.create_all()
    session = Session()

    conn = sqlite3.connect('test.db')
    conn.text_factory = str
    c = conn.cursor()

    # load in the test data
    c.execute('delete from test')
    query = 'insert into test (content) values(?)'
    values = [
        ('That\'s',), #ascii
        ('That\x92s',), #windows-1251
        ('That\xe2\x80\x99s',), #utf8
        ]
    c.executemany(query, values)
    conn.commit()

    # verify that inserts were successful
    c.execute('select * from test')
    print c.fetchall()

    test_ids = [x[0] for x in session.query(Test.id).all()]
    for test_id in test_ids:
        try:
            test = session.query(Test).get(test_id)
        except:
            test = session.query(binary_mapper, Test) \
                .filter_by(id=test_id) \
                .first()
        print test

This raises this error when run:

sqlalchemy.exc.OperationalError: (OperationalError) Could not decode
to UTF-8 column 'anon_1_test_content' with text 'That�s' u'SELECT
anon_1.test_id AS anon_1_test_id, anon_1.test_content AS
anon_1_test_content, subtest_1.id AS subtest_1_id, subtest_1.test_id
AS subtest_1_test_id, subtest_1.content AS subtest_1_content \nFROM
(SELECT test.id AS test_id, test.content AS test_content, test.content
AS test_content \nFROM test \nWHERE test.id = ?\n LIMIT ? OFFSET ?) AS
anon_1 LEFT OUTER JOIN subtest AS subtest_1 ON anon_1.test_id =
subtest_1.test_id' (2, 1, 0)

I suspect this is because my attempt to create the second mapper
resulted in there now being two references to the same 'content'
column, one of which is the original declaration which is still trying
to decode the content. I was hoping the secondary mapper would replace
the original column with the new one, but that wasn't the case.

Is there a way I can map the Test class to two variants of the same
Table? Or does someone know a better way to accomplish what I'm trying
to do here?

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to