New submission from Diego Argueta <diego.argu...@gmail.com>:

It appears that calling readline() on a codecs.EncodedFile stream breaks 
seeking and causes subsequent attempts to iterate over the lines or call 
readline() to backtrack and return already consumed lines.

A minimal example:

```
from __future__ import print_function

import codecs
import io


def run(stream):
    offset = stream.tell()
    try:
        stream.seek(0)
        header_row = stream.readline()
    finally:
        stream.seek(offset)

    print('Got header: %r' % header_row)

    if stream.tell() == 0:
        print('Skipping the header: %r' % stream.readline())

    for index, line in enumerate(stream, start=2):
        print('Line %d: %r' % (index, line))


b = io.BytesIO(u'a,b\r\n"asdf","jkl;"\r\n'.encode('utf-16-le'))
s = codecs.EncodedFile(b, 'utf-8', 'utf-16-le')

run(s)
```

Output:

```
Got header: 'a,b\r\n'
Skipping the header: '"asdf","jkl;"\r\n'    <-- this is line 2
Line 2: 'a,b\r\n'                           <-- this is line 1
Line 3: '"asdf","jkl;"\r\n'                 <-- now we're back to line 2
```

As you can see, the line being skipped is actually the second line, and when we 
try reading from the stream again, the iterator starts from the beginning of 
the file.

Even weirder, adding a second call to readline() to skip the second line shows 
it's going **backwards**:

```
Got header: 'a,b\r\n'
Skipping the header: '"asdf","jkl;"\r\n'    <-- this is actually line 2
Skipping the second line: 'a,b\r\n'         <-- this is line 1
Line 2: '"asdf","jkl;"\r\n'                 <-- this is now correct
```

The expected output shows that we got a header, skipped it, and then read one 
data line.

```
Got header: 'a,b\r\n'
Skipping the header: 'a,b\r\n'
Line 2: '"asdf","jkl;"\r\n'
```
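
The same expectation can be written as a check that returns values instead of printing them. This is only a sketch and not part of the original reproduction: `make_stream`, `peek_then_read` and `EXPECTED` are hypothetical names, and it assumes Python 3, where the stream yields UTF-8 `bytes` rather than the `str` values shown in the output above. On an affected interpreter the comparison is expected to report `False`.

```python
import codecs
import io

LINES = [u'a,b\r\n', u'"asdf","jkl;"\r\n']
# Under Python 3 the recoded stream returns UTF-8 bytes.
EXPECTED = [line.encode('utf-8') for line in LINES]


def make_stream():
    # Same construction as in the minimal example: UTF-16-LE on disk,
    # UTF-8 handed back to the caller.
    raw = io.BytesIO(u''.join(LINES).encode('utf-16-le'))
    return codecs.EncodedFile(raw, 'utf-8', 'utf-16-le')


def peek_then_read(stream):
    """Peek at the header with readline(), rewind, then read every line."""
    offset = stream.tell()
    try:
        stream.seek(0)
        header = stream.readline()
    finally:
        stream.seek(offset)
    return header, list(stream)


header, lines = peek_then_read(make_stream())
print('header: %r' % header)
print('matches expectation: %r' % (lines == EXPECTED))
```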

I'm sure this is related to the implementation of readline() because if we 
change this:

```
header_row = stream.readline()
```

to this:

```
header_row = stream.read().splitlines()[0]
```

then we get the expected output. If, on the other hand, we comment out the 
seek() in the finally clause, we also get the expected output (minus the 
"Skipping the header" line).
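
For reference, the read()-based variant described above could be wrapped up roughly as follows. This is a sketch of a workaround, not a fix: `peek_header` is a hypothetical helper name, the code assumes Python 3 (so values are `bytes`), and read() pulls the whole stream into memory, which may be unacceptable for large files.

```python
import codecs
import io


def peek_header(stream):
    """Return the first line of `stream`, then restore its position.

    Uses read() rather than readline(), as in the workaround above.
    The returned line has its line ending stripped by splitlines().
    """
    offset = stream.tell()
    try:
        stream.seek(0)
        lines = stream.read().splitlines()
    finally:
        stream.seek(offset)
    return lines[0] if lines else None


raw = io.BytesIO(u'a,b\r\n"asdf","jkl;"\r\n'.encode('utf-16-le'))
stream = codecs.EncodedFile(raw, 'utf-8', 'utf-16-le')

header = peek_header(stream)
remaining = list(stream)  # iteration starts from the restored offset

print('Got header: %r' % header)
for index, line in enumerate(remaining, start=1):
    print('Line %d: %r' % (index, line))
```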

----------
components: IO, Library (Lib)
messages: 315768
nosy: da
priority: normal
severity: normal
status: open
title: readline() + seek() on codecs.EncodedFile breaks next readline()
type: behavior
versions: Python 2.7, Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33361>
_______________________________________