Hey, thanks.
I tried adding the encoding parameter with the "latin1" value, but it messed
everything up and all of the content was displayed incorrectly.

I decided to try to convert my whole DB to UTF-8, but I realized that I'm
not sure how SA turns the gibberish in my DB into Hebrew. After trying a lot
of different encodings, I built a program that tells me which conversion was
applied to my Hebrew strings, so I can revert them to Hebrew and then insert
them as UTF-8. Apparently I need to use iconv to convert my SQL dump file
from utf8 to cp1252, and then I can just import the SQL file as a UTF-8
file.
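
To illustrate why that iconv step should work, here is a minimal Python
sketch of the round trip as I understand it. It assumes the Hebrew text was
written as UTF-8 bytes into latin1 columns (MySQL's latin1 is essentially
cp1252) and that the dump then re-encoded those characters as UTF-8. The
sample string is made up for the example.

```python
# Sample Hebrew word (the word for "Hebrew"), written with escapes so the
# snippet does not depend on the encoding of this message.
original = u"\u05e2\u05d1\u05e8\u05d9\u05ea"

# 1. The application sends UTF-8 bytes to the database.
utf8_bytes = original.encode("utf-8")

# 2. Assumption: the latin1 connection interprets each byte as a cp1252
#    character, which is the gibberish visible in the tables.
gibberish = utf8_bytes.decode("cp1252")

# 3. Assumption: the dump writes that gibberish out as UTF-8, so every
#    original byte is now encoded a second time.
dump_bytes = gibberish.encode("utf-8")

# 4. The repair, equivalent in spirit to "iconv -f utf8 -t cp1252":
#    read the dump as UTF-8, write it back as cp1252.
repaired_bytes = dump_bytes.decode("utf-8").encode("cp1252")

# The result is the original UTF-8 data again.
recovered = repaired_bytes.decode("utf-8")
assert recovered == original
```

One caveat: Python's cp1252 codec leaves a few byte values (0x90 among them)
unmapped, so some Hebrew letters, alef for example, raise an error in this
sketch and would need separate handling.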

I'll try to convert everything in the next few days and will let you know.
Anyhow, the program, called "Memir", is available here:
http://github.com/bjesus/memir . It's a PyGTK application that helps you
test different encodings quickly and trace conversions.
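
The general idea of tracing a conversion can be sketched roughly like this.
This is not Memir's actual code, just a hypothetical illustration: the
function names, the candidate list and the sample string are all assumptions
made up for the example.

```python
from itertools import permutations

# Hypothetical list of encodings to try; extend as needed.
CANDIDATES = ("utf-8", "cp1252", "latin-1", "cp1255", "iso-8859-8")


def is_hebrew(text):
    """True if text has at least one non-space character and every
    non-space character is a Hebrew letter (alef through tav)."""
    letters = [ch for ch in text if not ch.isspace()]
    return bool(letters) and all(u"\u05d0" <= ch <= u"\u05ea" for ch in letters)


def find_conversions(garbled, candidates=CANDIDATES):
    """Return the (encode_with, decode_with) pairs that turn the garbled
    text back into Hebrew."""
    hits = []
    for enc, dec in permutations(candidates, 2):
        try:
            repaired = garbled.encode(enc).decode(dec)
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this pair cannot represent the data at all
        if is_hebrew(repaired):
            hits.append((enc, dec))
    return hits


# Made-up sample: a Hebrew word whose UTF-8 bytes were misread as cp1252.
garbled = u"\u05e2\u05d1\u05e8\u05d9\u05ea".encode("utf-8").decode("cp1252")
hits = find_conversions(garbled)
```

Each pair in hits names a conversion that undoes the damage for that sample,
which is the information needed to pick the right iconv arguments.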

Thank you,
Yo'av.

2009/10/13 Michael Bayer <[email protected]>

>
> On Oct 12, 2009, at 7:22 PM, Yo'av Moshe wrote:
>
> Hey,
> Yes, I'm using a MySQL 5.
>
> I understand that the problem is probably caused by some data I have in my
> DB, but it seems odd to me since everything in this DB was created using
> SA. Can't it read the data it wrote?
>
> My mysql connection is specified with "charset=latin1&unicode=0". My
> website is displayed correctly, and if I set it to charset=utf8 like the
> wiki says, everything is garbled. I use latin1 because that is the encoding
> of my MySQL tables.
>
> Maybe if I had used utf8 when I created the tables it would be working
> now, but it's too late, and I just don't understand how come everything
> works except for this search query, how come SA created data it cannot
> read, and why the hell it works the second time ... :(
>
>
> so if your MySQL DB is all in latin1, then you'd have to use that character
> set across the board, including the "encoding" parameter sent to
> create_engine() - it defaults to utf-8, which is why you see that in your
> error message.
>
> to dig deeper you'd have to really understand exactly what is present in
> your tables.   This would involve pulling out the row as a raw string and
> just trying to decode it with different encodings to see what you have.
>
> I'm not sure that "latin1" encoding can handle hebrew characters either
> (maybe it can, I've never used "latin1" extensively), that's something you
> might want to research as well.
>
>
>
>
>
>
>
> Yo'av
>
> 2009/10/11 Michael Bayer <[email protected]>
>
>>
>> On Oct 11, 2009, at 2:29 PM, Yo'av Moshe wrote:
>>
>> No, the error is a UnicodeDecodeError (http://paste2.org/p/457059).
>> I can't just "try" a different DB, switch to SQLite, etc. As I've said, my
>> website is in production and I have a lot of users using it.
>>
>>
>> the purpose of "trying" a different database is to narrow down the cause
>> of the issue, not that you would switch the platform in use for production.
>>
>> One thing you should be aware of is that your program is failing due to
>> the data coming back in your result set, not the data being bound to your
>> SQL query.   You likely have mis-encoded data present in your table which is
>> matched by the criterion you're sending it.   When the data is fetched, it
>> cannot be decoded via utf-8.
>>
>> Also, you haven't yet told us what database you're using, but I'm
>> guessing MySQL, in which case you should ensure that you are using the
>> correct client encoding as well as the correct encoding in your schema.
>> These are MySQL settings, not SQLAlchemy.  client encoding can be specified
>> with create_engine() (
>> http://www.sqlalchemy.org/trac/wiki/DatabaseNotes#MySQL)  or within
>> my.cnf.
>>
>>
>>
>>
>> Also, the problem is something that started lately, probably because of
>> some content that a user has uploaded, so a new DB will work for sure, even
>> if it's the same kind. But I need it to work with my DB, or at least
>> understand what caused it so I can make sure it never happens again.
>>
>> I'll check my DBAPI, although I'm pretty sure it's the latest one that is
>> shipped with CentOS5.
>>
>> Thank you,
>> Yo'av
>>
>> 2009/10/10 Michael Bayer <[email protected]>
>>
>>>
>>> On Oct 10, 2009, at 3:43 AM, Yo'av Moshe wrote:
>>>
>>> Any ideas?
>>> I still don't understand why the query is failing even when I'm using a
>>> unicode object.
>>>
>>>
>>> what's the error?  "EOF in multi-line statement"?  that's not a
>>> SQLAlchemy error message.   what happens when you try SQLA 0.5.6 (perhaps
>>> there was some quirk regarding encoding that was fixed)?  a different /
>>> latest version of your DBAPI (perhaps your DBAPI is misunderstanding a
>>> character as a newline)?  try SQLite with the same statement?  (what
>>> database are you using?)
>>>
>>>
>>>
>>>
>>> Yo'av
>>>
>>> 2009/10/8 Yo'av Moshe <[email protected]>
>>>
>>>> Thanks, I didn't know about that awful IPython bug...
>>>>
>>>> I checked, and apparently my website is already doing the SA query with
>>>> a unicode object and not with a string one, so I think that it's not the
>>>> u'' thing (it's true that I forgot it in my console testing, though).
>>>> What you showed about IPython explains why it didn't give me any result
>>>> when running in IPython with the unicode object - since it wasn't really a
>>>> unicode object.
>>>>
>>>> So again - I *am* querying SA with a unicode object, and still, it fails
>>>> the first time and works the second time.
>>>>
>>>> Yo'av.
>>>>
>>>> 2009/10/7 Wolodja Wentland <[email protected]>
>>>>
>>>>> On Wed, Oct 07, 2009 at 07:55 -0700, Yo'av Moshe wrote:
>>>>> > See what I mean here (it's me running the same query twice in
>>>>> > IPython): http://paste2.org/p/457059
>>>>> >
>>>>> > What can cause this behavior?! I can't think of anything! I guess
>>>>> > that one of my users has uploaded some article with some invalid
>>>>> > utf8 code, but should that kill the query? and how come it doesn't
>>>>> > kill the second one? and what can I do to avoid it?
>>>>>
>>>>> In addition to the bug Mike pointed out to you I want to introduce you
>>>>> to my favourite bug this year:
>>>>>
>>>>> https://bugs.launchpad.net/ipython/+bug/339642
>>>>>
>>>>> If you run into unicode issues with IPython it is wise to check the
>>>>> plain 'python' behaviour before developing code against this bug.
>>>>>
>>>>> kind regards
>>>>>
>>>>>    Wolodja Wentland
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Yo'av Moshe
>>>>
>>>
>>>
>>> --
>>> Yo'av Moshe
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Yo'av Moshe
>>
>>
>>
>>
>>
>>
>
> --
> Yo'av Moshe
>
>
>
>
> >
>

-- 
Yo'av Moshe
