New submission from Diego Argueta <diego.argu...@gmail.com>: It appears that calling readline() on a codecs.EncodedFile stream breaks seeking and causes subsequent attempts to iterate over the lines or call readline() to backtrack and return already consumed lines.
A minimal example: ``` from __future__ import print_function import codecs import io def run(stream): offset = stream.tell() try: stream.seek(0) header_row = stream.readline() finally: stream.seek(offset) print('Got header: %r' % header_row) if stream.tell() == 0: print('Skipping the header: %r' % stream.readline()) for index, line in enumerate(stream, start=2): print('Line %d: %r' % (index, line)) b = io.BytesIO(u'a,b\r\n"asdf","jkl;"\r\n'.encode('utf-16-le')) s = codecs.EncodedFile(b, 'utf-8', 'utf-16-le') run(s) ``` Output: ``` Got header: 'a,b\r\n' Skipping the header: '"asdf","jkl;"\r\n' <-- this is line 2 Line 2: 'a,b\r\n' <-- this is line 1 Line 3: '"asdf","jkl;"\r\n' <-- now we're back to line 2 ``` As you can see, the line being skipped is actually the second line, and when we try reading from the stream again, the iterator starts from the beginning of the file. Even weirder, adding a second call to readline() to skip the second line shows it's going **backwards**: ``` Got header: 'a,b\r\n' Skipping the header: '"asdf","jkl;"\r\n' <-- this is actually line 2 Skipping the second line: 'a,b\r\n' <-- this is line 1 Line 2: '"asdf","jkl;"\r\n' <-- this is now correct ``` The expected output shows that we got a header, skipped it, and then read one data line. ``` Got header: 'a,b' Skipping the header: 'a,b\r\n' Line 2: '"asdf","jkl;"\r\n' ``` I'm sure this is related to the implementation of readline() because if we change this: ``` header_row = stream.readline() ``` to this: ``` header_row = stream.read().splitlines()[0] ``` then we get the expected output. If on the other hand we comment out the seek() in the finally clause, we also get the expected output (minus the "skipping the header") code. ---------- components: IO, Library (Lib) messages: 315768 nosy: da priority: normal severity: normal status: open title: readline() + seek() on io.EncodedFile breaks next readline() type: behavior versions: Python 2.7, Python 3.6 _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33361> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com