On Dec 21, 8:13 am, Steven D'Aprano <[EMAIL PROTECTED] cybersource.com.au> wrote:
> [Fixing top-posting.]
>
> On Thu, 20 Dec 2007 12:41:44 -0800, Wojciech Gryc wrote:
> > On Dec 20, 3:30 pm, John Machin <[EMAIL PROTECTED]> wrote:
> [snip]
> >> > However, when I use Python's various methods -- readline(),
> >> > readlines(), or xreadlines() and loop through the lines of the file,
> >> > the line program exits at 16,000 lines. No error output or anything
> >> > -- it seems the end of the loop was reached, and the code was
> >> > executed successfully.
> ...
> >> One possibility: you are running this on Windows and the file contains
> >> Ctrl-Z aka chr(26) aka '\x1a'.
>
> > Hi,
>
> > Python 2.5, on Windows XP. Actually, I think you may be right about \x1a
> > -- there's a few lines that definitely have some strange character
> > sequences, so this would make sense... Would you happen to know how I
> > can actually fix this (e.g. replace the character)? Since Python doesn't
> > see the rest of the file, I don't even know how to get to it to fix the
> > problem... Due to the nature of the data I'm working with, manual
> > editing is also not an option.
>
> > Thanks,
> > Wojciech
>
> Open the file in binary mode:
>
> open(filename, 'rb')
>
> and Windows should do no special handling of Ctrl-Z characters.
>
> --
> Steven

I don't know whether it's a bug or a feature or just a dark corner, but
using mode='rU' does no special handling of Ctrl-Z either.

>>> x = 'foo\r\n\x1abar\r\n'
>>> f = open('udcray.txt', 'wb')
>>> f.write(x)
>>> f.close()
>>> open('udcray.txt', 'r').readlines()
['foo\n']
>>> open('udcray.txt', 'rU').readlines()
['foo\n', '\x1abar\n']
>>> for line in open('udcray.txt', 'rU'):
...     print repr(line)
...
'foo\n'
'\x1abar\n'
>>>

Using 'rU' should make the OP's task of finding the strange character
sequences a bit easier -- he won't have to read a block at a time and
worry about the guff straddling a block boundary.
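
For the OP's follow-up question (how to actually get at the character and replace it), here is a minimal sketch along the lines of the binary-mode suggestion quoted above. The names find_ctrl_z and strip_ctrl_z, and the file names in the demo, are made up for illustration. It is written for a current Python rather than 2.5, and it assumes the stray bytes can simply be dropped or swapped for something else, which may not suit the OP's data.

```python
import os
import tempfile

CTRL_Z = b'\x1a'  # chr(26), treated as end-of-file by Windows text mode


def find_ctrl_z(path):
    """Return (line_number, offset_in_line) for every Ctrl-Z byte in the file."""
    hits = []
    # Binary mode: no newline translation and no Ctrl-Z handling.
    with open(path, 'rb') as f:
        for lineno, line in enumerate(f, 1):
            start = 0
            while True:
                pos = line.find(CTRL_Z, start)
                if pos == -1:
                    break
                hits.append((lineno, pos))
                start = pos + 1
    return hits


def strip_ctrl_z(src, dst, replacement=b''):
    """Copy src to dst line by line, replacing each Ctrl-Z byte."""
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        for line in fin:
            fout.write(line.replace(CTRL_Z, replacement))


if __name__ == '__main__':
    # Hypothetical demo using the same sample data as the session above.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, 'udcray.txt')
    dst = os.path.join(workdir, 'udcray_clean.txt')
    with open(src, 'wb') as f:
        f.write(b'foo\r\n\x1abar\r\n')
    print(find_ctrl_z(src))
    strip_ctrl_z(src, dst)
    with open(dst, 'rb') as f:
        print(repr(f.read()))
```

Iterating over the file a line at a time in binary mode avoids the block-boundary worry just as 'rU' does, since a single byte can never straddle two lines. The cleaned copy keeps its original '\r\n' line endings, so it can afterwards be read in ordinary text mode.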