On Dec 21, 2:51 am, Peter Otten <[EMAIL PROTECTED]> wrote:
> Mark T wrote:
> > "Gabriel Genellina" <[EMAIL PROTECTED]> wrote in message
> >> If you got that from a file, I bet you read it using the wrong
> >> encoding. Try opening the file using codecs.open("filename", "rb",
> >> encoding="utf-16-be") instead of plain open.
> > There is an odd number of bytes in each string. Each begins and ends
> > with \x00, so it doesn't look like utf-16-be.
>
> I think Gabriel is right. The OP probably butchered the original structure
> with
>
> open(filename).read().split("\n")
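Peter's theory is easy to check. Below is a minimal sketch, written for Python 3, using a made-up two-record sample (an assumption; we haven't seen the OP's data) encoded as UTF-16-BE with no byte-order mark. Splitting the raw bytes on a one-byte "\n" cuts each two-byte newline b"\x00\n" in half, which yields exactly the odd-length, \x00-bracketed strings Mark describes. Decoding first, here with io.open, which does the same decoding job as the codecs.open call Gabriel suggested, gives clean lines. The helper names naive_split and decoded_split are hypothetical.

```python
import io
import os
import tempfile

# Hypothetical sample: two comma-separated records saved as UTF-16-BE with
# no byte-order mark. The OP's real file layout is unknown; this is a guess.
SAMPLE_TEXT = u"ab,cd\nef,gh\n"


def naive_split(raw):
    """Split raw bytes on a single-byte newline, which is effectively what
    open(filename).read().split("\\n") does to such a file."""
    return raw.split(b"\n")


def decoded_split(path):
    """Decode the file as UTF-16-BE first, then split it into lines."""
    with io.open(path, "r", encoding="utf-16-be", newline="") as f:
        return f.read().split(u"\n")


raw = SAMPLE_TEXT.encode("utf-16-be")
broken = naive_split(raw)
# The high byte (b"\x00") of each two-byte newline stays on the preceding
# piece, so every non-empty piece has an odd length and begins and ends
# with b"\x00".
for piece in broken:
    if piece:
        print(repr(piece), len(piece))

# Write the sample to a temporary file and read it back properly decoded.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(raw)
print(decoded_split(path))
```

The trailing empty string in the decoded result just reflects the final newline; str.splitlines() would drop it.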
Or he's read the file "normally" and then done line = lineZAP, where ZAP is one of [:-1], .rstrip(), .rstrip("\n"), etc.

However, that accounts only for the \x00 at the right-hand end of each line. It looks like each line has also been chainsawed with .split(",") or whatever the original field separator was.

If Gabriel's instructions don't "work" for the OP, the OP should show us an unambiguous representation of the first few bytes of the original file instead of leaving it to guesswork:

    print(repr(open("the_file", "rb").read()[:200]))
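For what it's worth, here is a rough Python 3 sketch of both points, again on a made-up UTF-16-BE record (an assumption about the OP's data). Splitting the bytes on a one-byte "," leaves \x00 at both ends of every field, not just the last one. The guess_utf16 function is a hypothetical heuristic of my own, not a library feature: it looks at the kind of bytes the repr above would show, checking for a byte-order mark and otherwise for where the zero bytes sit.

```python
import codecs

# Hypothetical record: one comma-separated UTF-16-BE line whose one-byte
# "\n" has already been chopped off with line[:-1].
line = u"ab,cd,ef\n".encode("utf-16-be")[:-1]

# Splitting on a one-byte "," leaves the comma's high byte (b"\x00") on
# the field to its left; the field to its right still starts with the
# high byte of its own first character.
fields = line.split(b",")
for field in fields:
    print(repr(field))


def guess_utf16(head):
    """Hypothetical heuristic for the first bytes of a file (for example
    open("the_file", "rb").read()[:200]). Returns "utf-16" if a byte-order
    mark is present, "utf-16-be" or "utf-16-le" if the zero bytes sit
    mostly at even or odd offsets, otherwise None."""
    if head.startswith(codecs.BOM_UTF16_BE) or head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"  # this codec consumes the BOM and picks the byte order
    even_zeros = head[0::2].count(b"\x00")
    odd_zeros = head[1::2].count(b"\x00")
    if even_zeros > 2 * odd_zeros:  # the factor of 2 is an arbitrary cutoff
        return "utf-16-be"
    if odd_zeros > 2 * even_zeros:
        return "utf-16-le"
    return None


print(guess_utf16(u"ab,cd\n".encode("utf-16-be")))
print(guess_utf16(u"ab,cd\n".encode("utf-16-le")))
print(guess_utf16(u"ab,cd\n".encode("utf-16")))
print(guess_utf16(b"ab,cd\n"))
```

The heuristic only works for mostly-ASCII text, where each character's high byte is zero; it is no substitute for actually seeing the bytes.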