On 14/11/2011 21:43, Tony Pelletier wrote:
> Good Afternoon,
> I'm writing a program that is essentially connecting to MS SQL Server
> and dumping all the contents of the tables to separate csv's. I'm
> almost complete, but now I'm running into a Unicode issue and I'm not
> sure how to resolve it.
> I have a ridiculous amount of tables but I managed to figure out it was
> my Contact and a contact named Robert Bock. Here's what I caught.
> (127, None, u'Robert', None, u'B\xf6ck', 'uCompany Name', None, 1, 0,
> 327, 0)
> The u'B\xf6ck' is actually Böck. Notice the ö
> My problem is I'm not really sure how to handle it and whether or not
> it's failing on the query or the insert to the csv. The Exception is:
> 'ascii' codec can't encode character u'\xf6' in position 1: ordinal not
> in range(128)

Thanks for producing a thinned-down example. If I may take this at
face value, I assume you're doing something like this:
<code>
import csv
#
# Obviously from database, but for testing...
#
data = [
    (127, None, u'Robert', None, u'B\xf6ck', 'uCompany Name', None, 1, 0,
     327, 0),
]
with open ("temp.csv", "wb") as f:
    writer = csv.writer (f)
    writer.writerows (data)
</code>
which gives the error you describe.
In short, the csv module in Python 2.x (not sure about 3.x) is
unicode-unaware. You're passing it a unicode object and it's got no way
of knowing what codec to use to encode it. So it doesn't try to guess:
it just uses the default (ascii) and fails.
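The same failure can be reproduced without the csv module at all, which may help confirm that the problem is on the write side rather than in the query. The following is only a minimal sketch, written in Python 3 syntax so it can be run on a current interpreter (an assumption on my part; the program in this thread is Python 2, where the ascii encode happens implicitly). The variable names are purely illustrative.

```python
# Python 3 syntax; in Python 2 the ascii encode happens implicitly.
name = "B\xf6ck"  # Böck, the value from the failing row

error = None
try:
    # ascii only covers code points 0-127, so the o-umlaut cannot be encoded
    name.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# Naming a codec that can represent the character succeeds
utf8_bytes = name.encode("utf8")
print(repr(utf8_bytes))
```

The exception reports position 1, i.e. the second character of the string, which matches the error in the original message.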
And this is where it gets just a little bit messy. Depending on how much
control you have over your data and how important the unicodeiness of it
is, you need to encode things explicitly before they get to the csv module.
One (brute force) option is this:
<code snippet>
def encoded (iterable_of_stuff):
    return tuple (
        (i.encode ("utf8") if isinstance (i, unicode) else i)
        for i in iterable_of_stuff
    )
#
# ... other code
#
writer.writerows ([encoded (row) for row in data])
</code snippet>
This will encode anything unicode as utf8 and leave everything else
untouched. It will slow down your csv generation, but that might well
not matter (especially if you're basically IO-bound).
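On the 3.x question: as far as I'm aware, the csv module in Python 3 works with text strings directly, so the encoding is chosen once when the file is opened rather than per value. Below is a rough sketch of what the equivalent might look like there. It assumes Python 3 and utf-8 output, and `dump_rows` is a hypothetical helper name, not something from the original program.

```python
import csv

# Same test row as above; in Python 3 every str is already unicode
data = [
    (127, None, "Robert", None, "B\xf6ck", "uCompany Name", None, 1, 0, 327, 0),
]

def dump_rows(path, rows):
    # Text mode with an explicit encoding; newline="" leaves line
    # endings to the csv module, as its documentation recommends
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)

dump_rows("temp.csv", data)
```

Note that None values are written as empty fields, so a NULL column and an empty string are indistinguishable in the output.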
TJG
_______________________________________________
Tutor maillist - Tutor@python.org