On Thu, Oct 30, 2008 at 11:36 AM, James Henstridge <[EMAIL PROTECTED]> wrote:
> [CC'ing the mailing list, since you dropped it in your reply]
>
> On Wed, Oct 29, 2008 at 6:37 PM, kevin gill <[EMAIL PROTECTED]> wrote:
>>> PostgreSQL should reencode input/output between the database encoding
>>> and client encoding for text/character fields.
>>>
>>> http://www.postgresql.org/docs/8.3/static/multibyte.html
>>>
>>> Storm sets the client encoding to UTF-8, which should work with any
>>> database encoding (of course, some unicode strings passed to the
>>> database may give errors if they can't be represented, but that is
>>> what you'd expect). Is this not happening for you?
>>
>> This is an old database which is connected to a Zope 2 site. The database
>> is SQL_ASCII, and the Zope 2 system binds to it using latin-1 (PsycopgDA
>> etc.). The result is that there is data in the database encoded in latin-1,
>> but PostgreSQL has no rules for handling it.
>
> That does sound like a problem. I don't suppose you'd have the
> opportunity to dump and restore your database with a correct encoding?
> The page I referenced above strongly recommends against use of that
> encoding.
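[The mismatch Kevin describes can be reproduced in plain Python. This is an illustrative sketch, not anything from the thread: a SQL_ASCII database stores raw bytes with no encoding metadata, so latin-1 data decodes fine for a client that knows the encoding, but fails for a client (such as Storm with client_encoding UTF-8) that assumes UTF-8.]

```python
# Bytes as the legacy Zope 2 app would have stored them (latin-1).
raw = "café".encode("latin-1")      # b'caf\xe9'

# A client that knows the real encoding can round-trip the data.
assert raw.decode("latin-1") == "café"

# A client assuming UTF-8 cannot: 0xe9 is not a valid UTF-8 sequence.
try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
assert not decoded_ok
```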
This is a valid setup, although a non-optimal one. I think in Kevin's case he might save himself future pain if the database can be rebuilt to explicitly specify LATIN1 as the encoding, assuming he doesn't plan to migrate to a UTF-8 database in the future. I think other people do use this setup, though, when they need to store data in multiple encodings in the database.

The absolute worst case would be subsets of rows storing data in different encodings, e.g. a table that stores text in the original input encoding (along with enough information that it can be decoded again!) rather than normalizing it to a common encoding. A slightly better case is a table that stores data in different columns in different encodings. Next is a database where data in different tables is stored in different encodings. Finally, there is a database where all data is stored in a particular encoding, but the client needs to know the encoding so it can decode it (Kevin's latin-1 database).

If people think these legacy systems are worth supporting, I would hope it can be done while adding minimal complexity. Perhaps an EncodedText column type, to use instead of Unicode, that takes the DB encoding as a required parameter? I think this is preferable as it is explicit, supports more scenarios and database backends, and allows systems to gradually migrate to a UTF-8-only DB.

(I don't think the first scenario I listed is supportable by an ORM directly without great complexity, as the ORM would need to be taught how to deduce the encoding of columns, and how to store a valid row when writing.)

--
Stuart Bishop <[EMAIL PROTECTED]>
http://www.stuartbishop.net/

--
storm mailing list
[email protected]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/storm
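[The EncodedText idea proposed above could look roughly like the following sketch. This is not Storm API — the `EncodedText` name, the `encoding` parameter, and the backing-attribute convention are all assumptions for illustration. A real implementation would hook into Storm's property/variable machinery; a plain Python descriptor is used here just to show the encode-on-store / decode-on-load behavior.]

```python
class EncodedText:
    """Hypothetical column type: the database holds raw bytes in a
    known legacy encoding; the application always sees unicode."""

    def __init__(self, encoding):
        # The encoding is a required, explicit parameter, as proposed.
        self.encoding = encoding

    def __set_name__(self, owner, name):
        # Store the raw bytes on a private attribute of the instance.
        self.attr = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        raw = getattr(obj, self.attr, None)
        # Decode from the declared DB encoding on the way out.
        return None if raw is None else raw.decode(self.encoding)

    def __set__(self, obj, value):
        # Encode to the declared DB encoding on the way in.
        raw = None if value is None else value.encode(self.encoding)
        setattr(obj, self.attr, raw)


class LegacyRow:
    # Kevin's case: a latin-1 column in a SQL_ASCII database.
    title = EncodedText("latin-1")


row = LegacyRow()
row.title = "café"
assert row._title == b"caf\xe9"   # what would be sent to the database
assert row.title == "café"        # what the application sees
```

Migrating to a UTF-8-only database would then be a matter of re-encoding the stored bytes and changing the declared encoding in one place.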
