Hey, thanks. I tried adding the encoding parameter with the "latin1" value, but it messed everything up and all of the content was displayed incorrectly.
I decided to try to convert my whole DB to UTF-8, but I found that I'm not sure how SA converts the gibberish in my DB into Hebrew. After trying a lot of different encodings, I built a program that tells me which conversion is applied to my Hebrew strings, so I can revert them back to Hebrew and then insert them as UTF-8. Apparently I need to use iconv to convert my SQL dump file from utf8 to cp1252, and then I can just import the SQL file as a UTF-8 file. I'll try to convert everything in the next few days and will let you know.

Anyhow, the program, called "Memir", is released here - http://github.com/bjesus/memir . It's a PyGTK application that helps you test different encodings quickly and trace conversions.

Thank you,
Yo'av.

2009/10/13 Michael Bayer <[email protected]>

>
> On Oct 12, 2009, at 7:22 PM, Yo'av Moshe wrote:
>
> Hey,
> Yes, I'm using a MySQL 5.
>
> I understand that the problem is probably happening because of some data I
> have in my DB, but it seems odd to me since everything I have in this DB
> was created using SA. Can't it read the data it has written?
>
> My mysql connection is specified with "charset=latin1&unicode=0". My
> website is shown right, and if I set it to charset=utf8 like the wiki says
> everything is garbled. The charset is because that is my mysql's tables'
> encoding.
>
> Maybe if I had used utf8 when I created the tables it would be working now,
> but it's too late and I just don't understand how come everything works
> except for this search query, and how come SA created data it cannot read,
> and why the hell it works the second time ... :(
>
>
> so if your MySQL DB is all in latin1, then you'd have to use that character
> set across the board, including the "encoding" parameter sent to
> create_engine() - it defaults to utf-8, which is why you see that in your
> error message.
>
> to dig deeper you'd have to really understand exactly what is present in
> your tables.
> This would involve pulling out the row as a raw string and
> just trying to decode it with different encodings to see what you have.
>
> I'm not sure that "latin1" encoding can handle hebrew characters either
> (maybe it can, I've never used "latin1" extensively), that's something you
> might want to research as well.
>
>
> Yo'av
>
> 2009/10/11 Michael Bayer <[email protected]>
>
>>
>> On Oct 11, 2009, at 2:29 PM, Yo'av Moshe wrote:
>>
>> No, the error is a UnicodeDecodeError (http://paste2.org/p/457059).
>> I can't just "try" a different DB, switch to SQLite, etc. As I've said, my
>> website is in production and I have a lot of users using it.
>>
>>
>> the purpose of "trying" a different database is to narrow down the cause
>> of the issue, not that you would switch the platform in use for production.
>>
>> One thing you should be aware of is that your program is failing due to
>> the data coming back in your result set, not the data being bound to your
>> SQL query. You likely have mis-encoded data present in your table which is
>> matched by the criterion you're sending it. When the data is fetched, it
>> cannot be decoded via utf-8.
>>
>> Also you haven't as yet told us what database you're using, but I'm
>> guessing MySQL, in which case you should ensure that you are using the
>> correct client encoding as well as the correct encoding in your schema.
>> These are MySQL settings, not SQLAlchemy. client encoding can be specified
>> with create_engine() (
>> http://www.sqlalchemy.org/trac/wiki/DatabaseNotes#MySQL) or within
>> my.cnf.
>>
>>
>> Also, the problem is something that started lately, probably because of
>> some content that a user has uploaded, so a new DB will work for sure, even
>> if it's the same kind. But, I need it to work with my DB, or at least
>> understand what caused it so I can make sure it never happens again.
>>
>> I'll check my DBAPI, although I'm pretty sure it's the latest one that is
>> shipped with CentOS5.
>>
>> Thank you,
>> Yo'av
>>
>> 2009/10/10 Michael Bayer <[email protected]>
>>
>>>
>>> On Oct 10, 2009, at 3:43 AM, Yo'av Moshe wrote:
>>>
>>> Any ideas?
>>> I still don't understand why the query is failing even when I'm using a
>>> unicode object.
>>>
>>>
>>> what's the error? "EOF in multi-line statement"? that's not a
>>> SQLAlchemy error message. what happens when you try SQLA 0.5.6 (perhaps
>>> there was some quirk regarding encoding that was fixed)? a different /
>>> latest version of your DBAPI (perhaps your DBAPI is misunderstanding a
>>> character as a newline)? try SQLite with the same statement? (what
>>> database are you using?)
>>>
>>>
>>> Yo'av
>>>
>>> 2009/10/8 Yo'av Moshe <[email protected]>
>>>
>>>> Thanks, I didn't know about that awful IPython bug...
>>>>
>>>> I checked, and apparently my website is already doing the SA query with
>>>> a unicode object and not with a string one, so I think that it's not the
>>>> u'' thing (it's true that I forgot it in my console testing, though).
>>>> What you showed about IPython explains why it didn't give me any result
>>>> when running in IPython with the unicode object - since it wasn't really a
>>>> unicode object.
>>>>
>>>> So again - I *am* querying SA with a unicode object, and still, it fails
>>>> the first time and works the second time.
>>>>
>>>> Yo'av.
>>>>
>>>> 2009/10/7 Wolodja Wentland <[email protected]>
>>>>
>>>>> On Wed, Oct 07, 2009 at 07:55 -0700, Yo'av Moshe wrote:
>>>>> > See what I mean here (it's me running the same query twice in
>>>>> > IPython): http://paste2.org/p/457059
>>>>> >
>>>>> > What can cause this behavior?! I can't think of anything! I guess that
>>>>> > one of my users has uploaded some article with some invalid utf8 code,
>>>>> > but should that kill the query? and how come it doesn't kill the
>>>>> > second one? and what can I do to avoid it?
>>>>>
>>>>> In addition to the bug Mike pointed out to you I want to introduce you
>>>>> to my favourite bug this year:
>>>>>
>>>>> https://bugs.launchpad.net/ipython/+bug/339642
>>>>>
>>>>> If you run into unicode issues with IPython it is wise to check the
>>>>> 'python' behaviour before developing code against this bug.
>>>>>
>>>>> kind regards
>>>>>
>>>>> Wolodja Wentland
>>>>>
>>>>
>>>> --
>>>> Yo'av Moshe
>>>
>>> --
>>> Yo'av Moshe
>>
>> --
>> Yo'av Moshe
>
> --
> Yo'av Moshe

--
Yo'av Moshe

You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en
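A rough sketch of the two steps discussed in this thread: decoding a raw value with several candidate encodings to see what is actually stored, and reversing a "UTF-8 bytes read as latin1" mix-up, which is the kind of conversion the iconv utf8-to-cp1252 step is meant to undo. This is Python 3 with only the standard library. The function names try_decodings and undo_double_encoding, the candidate encoding list, and the sample word are made up for illustration and are not taken from Memir or SQLAlchemy. It uses Python's latin-1 codec rather than cp1252 because Python's cp1252 codec leaves a few bytes (such as 0x9D) unassigned, so treat it as an approximation of what MySQL or iconv would do with real data.

```python
# Illustrative only: names and the candidate list are assumptions,
# not part of Memir, SQLAlchemy, or MySQL.

CANDIDATES = ("utf-8", "cp1252", "latin-1", "cp1255", "iso-8859-8")


def try_decodings(raw, encodings=CANDIDATES):
    """Decode raw bytes with each candidate encoding.

    Returns a dict mapping encoding name to the decoded text,
    or to None when the bytes are not valid in that encoding.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


def undo_double_encoding(garbled, wrong_codec="latin-1"):
    """Reverse text that was UTF-8 on the wire but decoded as wrong_codec.

    Re-encoding with the wrong codec recovers the original bytes,
    which can then be decoded as UTF-8.
    """
    return garbled.encode(wrong_codec).decode("utf-8")


# Hebrew "shalom", written with escapes so the source stays ASCII.
original = "\u05e9\u05dc\u05d5\u05dd"
stored = original.encode("utf-8")      # bytes as a UTF-8 client would send them
garbled = stored.decode("latin-1")     # what a latin1 reading of them looks like

if __name__ == "__main__":
    for name, text in try_decodings(stored).items():
        # ascii() keeps the output printable on any terminal encoding
        print(name, ascii(text))
    print(undo_double_encoding(garbled) == original)
```

If the utf-8 entry comes back readable while the latin-1 entry is gibberish, the column most likely holds UTF-8 bytes in a latin1-declared table, and the reversal function shows the direction the dump has to be converted in.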
