Thanks to everyone who replied!

I'll take a further look into the encoding of the file because I'm
interested in that for other reasons. In the output I saw, u"\xe1" (and a
few others I found after sending my note) were prevalent around the splits.

For the moment, though, I've solved my immediate difficulty by splitting
twice. I really only need the space-delimited fields that appear after a tab
in each line, and the characters causing problems always come before that
tab. So I split on the tab first, and then an ordinary split() of the part
after it gets me the fields I need.
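
In case it helps anyone searching the archive later, here is a minimal
sketch of the two-step split. It is not my actual script: the function name
fields_after_tab and the sample line are made up for illustration, and I'm
assuming the fields of interest follow the first tab on the line.

```python
def fields_after_tab(line):
    """Return the whitespace-delimited fields that follow the first tab.

    Hypothetical helper for illustration; assumes the wanted fields come
    after the first tab and that anything before the tab can be ignored.
    """
    # First split: cut the line once, at the first tab.
    parts = line.split("\t", 1)
    if len(parts) < 2:
        return []  # no tab on this line, so there is nothing to extract
    # Second split: an ordinary whitespace split of the part after the tab.
    return parts[1].split()


# Made-up sample line: an awkward character before the tab, fields after it.
sample = "caf\xe1 label\t12 34 56\n"
print(fields_after_tab(sample))  # ['12', '34', '56']
```

Because the text before the tab is never passed to the whitespace split, it
doesn't matter what characters it contains.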


-- 
Jeremy


_______________________________________________
Pythonmac-SIG maillist  -  Pythonmac-SIG@python.org
http://mail.python.org/mailman/listinfo/pythonmac-sig