I had issues in the past with (for example) umlauted chars being
entered into my data that then couldn't be rendered by Myghty with the
default ascii codec; changing Python's sitecustomize.py to use:
sys.setdefaultencoding("utf-8")
fixed this -- until now.
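For anyone hitting the same wall, here's a minimal reproduction of the failure that the sitecustomize hack papers over (Python 3 syntax for illustration; the byte-level behaviour matches the implicit str/unicode coercion in Python 2.4):

```python
raw = b'\xc3\x96sterreich'        # UTF-8 bytes for "Österreich"

try:
    raw.decode('ascii')           # the default ascii codec chokes on 0xC3
except UnicodeDecodeError as exc:
    print('ascii fails:', exc)

print(raw.decode('utf-8'))        # with utf-8 it decodes cleanly
```

Raising the default encoding to utf-8 hides the implicit decode, but only as long as every byte string in the pipeline really is UTF-8.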
My user uploaded a CSV file using a Myghty web form, processed by
Pylons, and stored through SQLAlchemy into MySQL. When Pylons and
Myghty try to render one of these names it breaks:
File '/usr/local/zerowait/er/controllers/vendor.py', line 38 in show
return render_response("vendor_show.myt")
...
Error: Error(UnicodeDecodeError): 'utf8' codec can't decode bytes in position
15-16: invalid data at
/usr/local/lib/python2.4/site-packages/Myghty-1.1-py2.4.egg/myghty/requestbuffer.py
line 367
So it is using the utf8 codec as configured -- why can't it decode the data?
The troublesome word is "Österreich", with an umlaut-O:
mysql> select * from client where vendor_id=1 and name like "%sterreich%";
+-----------+-----------+---------------------------+
| client_id | vendor_id | name                      |
+-----------+-----------+---------------------------+
|        88 |         1 | Company Ã-sterreich GmbH. |
|       122 |         1 | Company Österreich GmbH.  |
+-----------+-----------+---------------------------+
I don't know how this renders in your reader, but on the deployed
system, Emacs SQL mode shows the umlaut-O for id=88 as:
\303\226sterreich
while id=122 renders as:
\326sterreich
Octal 326 is 0xD6 or decimal 214.
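To answer my own byte question as far as I understand it: 0xD6 is "Ö" in Latin-1/cp1252, but it is not a complete UTF-8 sequence on its own -- as a UTF-8 lead byte it demands a continuation byte, and the following 's' (0x73) isn't one. That would explain why requestbuffer.py blows up on id=122 while id=88 decodes fine. A sketch:

```python
row_88  = b'\xc3\x96sterreich'    # \303\226...: valid two-byte UTF-8 "Ö"
row_122 = b'\xd6sterreich'        # \326...: Latin-1 "Ö", invalid UTF-8

assert row_88.decode('utf-8') == '\u00d6sterreich'     # "Österreich"

try:
    row_122.decode('utf-8')       # 0xD6 announces a 2-byte sequence,
except UnicodeDecodeError:        # but 0x73 ('s') is not a continuation byte
    print('id=122 is not UTF-8')

assert row_122.decode('latin-1') == '\u00d6sterreich'  # but it *is* Latin-1
```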
If I change the name text of id=122, Myghty renders both rows, with
the id=88 variant showing an umlaut-O in Firefox.
Does this mean the data in the DB for id=122 is bogus, i.e. not UTF-8?
Is 0xD6 a valid UTF-8 character? Possibly some bogus Windows (cp1252)
chars? Any suggestions how I can prevent bogus chars from getting into
the DB?
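One idea I'm considering is validating uploads at the boundary: try UTF-8 first, fall back to the common Windows/Latin encodings, and always store the re-encoded UTF-8 result. A hypothetical helper (ensure_text is my own name, not a Pylons or SQLAlchemy API, and the fallback order is only a heuristic):

```python
def ensure_text(raw, fallbacks=('utf-8', 'cp1252', 'latin-1')):
    """Decode uploaded bytes, preferring UTF-8 with Windows/Latin
    fallbacks. Hypothetical helper, not part of any framework."""
    if isinstance(raw, str):                  # already decoded text
        return raw
    for enc in fallbacks:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw.decode('utf-8', 'replace')     # last resort: mark bad bytes

# Both DB variants normalise to the same text, which could then be
# stored as UTF-8 everywhere:
assert ensure_text(b'\xc3\x96sterreich') == ensure_text(b'\xd6sterreich')
```

Ambiguous inputs exist (some byte sequences are valid in several encodings), so this can only be a best-effort guard, not a proof of correctness.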
The SQLAlchemy docs indicate some engine settings, defaults as:
convert_unicode=False - if set to True, all String/character based
types will convert Unicode values to raw byte values going into the
database, and all raw byte values to Python Unicode coming out in
result sets. This is an engine-wide method to provide unicode
conversion across the board. For unicode conversion on a
column-by-column level, use the Unicode column type instead,
described in The Types System.
encoding='utf-8' - the encoding to use for all Unicode translations,
both by engine-wide unicode conversion as well as the Unicode type
object.
Is it possible SQLAlchemy and/or MySQL are both doing some kind of
conversion, resulting in doubly-converted chars?
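For what it's worth, double conversion has a recognisable signature: if correct UTF-8 bytes are mistaken for Latin-1 and encoded to UTF-8 a second time, each non-ASCII byte balloons into two. My rows don't seem to show that pattern -- id=88 (\303\226) looks like single-encoded UTF-8 and id=122 (\326) looks like it was never UTF-8-encoded at all. A sketch of what a genuine double conversion would produce:

```python
correct = '\u00d6sterreich'.encode('utf-8')    # b'\xc3\x96sterreich'

# Mistake the UTF-8 bytes for Latin-1, then encode "again":
double = correct.decode('latin-1').encode('utf-8')
print(double)   # b'\xc3\x83\xc2\x96sterreich' -- four bytes where two belong
```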
Thanks for your help.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"pylons-discuss" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/pylons-discuss?hl=en
-~----------~----~----~----~------~----~------~--~---