Thanks to everyone who replied! I'll look further into the file's encoding, since I'm interested in that for other reasons. In the output I saw, u"\xe1" (and a few other characters I found after sending my note) kept turning up around the splits.
For the moment, though, I've solved my immediate problem by splitting twice. I really only need the space-delimited fields that appear after a tab in each line, and the characters causing problems always come before that tab. So I split on the tab first, and then an ordinary split of the remainder gives me the fields I need.

-- Jeremy
_______________________________________________
Pythonmac-SIG maillist - Pythonmac-SIG@python.org
http://mail.python.org/mailman/listinfo/pythonmac-sig
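
P.S. For the archives, the two-step split described above looks roughly like this. It is only a sketch: the function name and the sample line are made up for illustration, and it assumes the fields of interest follow the first tab on each line.

```python
def fields_after_tab(line):
    """Return the whitespace-delimited fields that follow the first tab.

    Hypothetical helper: assumes everything before the first tab
    (including any troublesome non-ASCII characters) can be discarded.
    """
    # First split: cut the line once, at the first tab.
    parts = line.split("\t", 1)
    if len(parts) < 2:
        # No tab on this line, so there are no fields to return.
        return []
    # Second split: an ordinary whitespace split of what follows the tab.
    return parts[1].split()


# Made-up sample line with a non-ASCII character before the tab.
sample = u"caf\xe1 label\t12 34 56\n"
print(fields_after_tab(sample))
```

Since the problem characters only ever appear before the tab, they never reach the second split.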