[issue23455] file iterator deemed broken; can resume after StopIteration

2015-07-21 Thread Ethan Furman

Changes by Ethan Furman et...@stoneleaf.us:


--
nosy:  -ethan.furman

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23455
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23455] file iterator deemed broken; can resume after StopIteration

2015-03-01 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +pitrou
type:  - behavior

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23455
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23455] file iterator deemed broken; can resume after StopIteration

2015-02-12 Thread Andrew Dalke

New submission from Andrew Dalke:

The file iterator is deemed broken. As I don't think it should be made 
non-broken, I suggest the documentation should be changed to point out when 
file iteration is broken. I also think the term 'broken' is a label with 
needlessly harsh connotations and should be softened.

The iterator documentation uses the term 'broken' like this (quoting here from 
https://docs.python.org/3.4/library/stdtypes.html):

  Once an iterator’s __next__() method raises StopIteration,
  it must continue to do so on subsequent calls. Implementations
  that do not obey this property are deemed broken.

(Older versions comment This constraint was added in Python 2.3; in Python 
2.2, various iterators are broken according to this rule.)

An IOBase is supposed to support the iterator protocol (says 
https://docs.python.org/3.4/library/io.html#io.IOBase ). However, it does not, 
nor does the documentation say that it's broken in the face of a changing file 
(eg, when another process appends to a log file).

  % ./python.exe 
  Python 3.5.0a1+ (default:4883f9046b10, Feb 11 2015, 04:30:46) 
  [GCC 4.8.4] on darwin
  Type help, copyright, credits or license for more information.
   f = open(empty)
   next(f)
  Traceback (most recent call last):
File stdin, line 1, in module
  StopIteration
  
   ^Z
  Suspended
  % echo Hello!  empty
  % fg
  ./python.exe

   next(f)
  'Hello!\n'

This is apparently well-known behavior, as I've come across several references 
to it on various Python-related lists, including this one from Miles in 2008: 
https://mail.python.org/pipermail/python-list/2008-September/491920.html .

  Strictly speaking, file objects are broken iterators:

Fredrik Lundh in the same thread ( 
https://mail.python.org/pipermail/python-list/2008-September/521090.html ) says:

  it's a design guideline, not an absolute rule

The 7+ years of 'broken' behavior in Python suggests that /F is correct. But 
while 'broken' could be considered a meaningless label, it carries with it some 
rather negative connotations. It sounds like developers are supposed to make 
every effort to avoid broken code, when that's not something Python itself 
does. It also means that my code can be called broken solely because it 
assumed Python file iterators are non-broken. I am not happy when people say my 
code is broken.

It is entirely reasonable that a seek(0) would reset the state and cause 
next(it) to not continue to raise a StopIteration exception. However, errors 
can arise when using Python file objects, as an iterator, to parse a log file 
or any other files which are appended to by another process.

Here's an example of code that can break. It extracts the first and last 
elements of an iterator; more specifically, the first and last lines of a file. 
If there are no lines it returns None for both values; and if there's only one 
line then it returns the same line as both values.

  def get_first_and_last_elements(it):
first = last = next(it, None)
for last in it:
pass
return first, last

This code expects a non-broken iterator. If passed a file, and the file were 1) 
initially empty when the next() was called, and 2) appended to by the time 
Python reaches the for loop, then it's possible for first value to be None 
while last is a string.

This is unexpected, undocumented, and may lead to subtle errors.

There are work-arounds, like ensuring that the StopIteration only occurs once:

  def get_first_and_last_elements(it):
first = last = next(it, None)
if last is not None:
for last in it:
pass
return first, last

but much existing code expects non-broken iterators, such as the Python example 
implementation at 
https://docs.python.org/2/library/itertools.html#itertools.dropwhile . (I have 
a reproducible failure using it, a fork(), and a file iterator with a sleep() 
if that would prove useful.)

Another option is to have a wrapper around file object iterators to keep 
raising StopIteration, like:

   def safe_iter(it):
   yield from it

   # -or-  (line for line in file_iter)

but people need to know to do this with file iterators or other potentially 
broken iterators. The current documentation does not say when file iterators 
are broken, and I don't know which other iterators are also broken.

I realize this is a tricky issue.

I don't think it's possible now to change the file's StopIteration behavior. I 
expect that there is code which depends on the current brokenness, the ability 
to seek() and re-iterate is useful, and the idea that next() returns text if 
and only if readline() is not empty is useful and well-entrenched. Pypy has the 
same behavior as CPython so any change will take some time to propagate to the 
other implementations.

Instead, I'm fine with a documentation change in io.html . It currently says:

  IOBase (and its subclasses) support the iterator protocol,
  meaning that an IOBase object can be iterated over yielding
 

[issue23455] file iterator deemed broken; can resume after StopIteration

2015-02-12 Thread Ethan Furman

Changes by Ethan Furman et...@stoneleaf.us:


--
nosy: +ethan.furman

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23455
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com