[issue22084] Mutating while iterating

2014-07-26 Thread Aaron Brady

New submission from Aaron Brady:

Hi, I asked about the inconsistency of the RuntimeError being raised when 
mutating a container while iterating over it here [1], set and dict iteration 
on Aug 16, 2012.

[1] http://www.gossamer-threads.com/lists/python/python/1004659

I posted a patch on the ML but never submitted it.  People's reaction seemed 
ambivalent.  Now I have an idea for a different implementation.  I'd like to 
take another shot at it.

It's one of the worst silent errors, since there's an error in the *iterator* 
when we call a *set* method.  We're going to add something to make it safer, at 
least in the sense of getting a clear failure, if the programmer does something 
that's always been ill-advised.

We have a number of options for the implementation.  We still have the option 
to introduce IterationError, possibly a subclass of RuntimeError.  These 
options are still applicable:

1) Collection of iterators
. Invalidate all open iterators on mutating
. a) Linked list
.. i) Uncounted references
.. ii) Counted references
.. iii) Weak references
. b) Weak set
2) Version index / timestamp / memo
. Iterators check whether the container has been mutated
since they were created
. a) No overflow - Python longs
.. i) Reset index if no iterators left
. b) Overflow - C ints / shorts (silent error)
3) Iterator count
. Raise exception on mutation, not iteration

The new option is:

2d) Use a dedicated empty *object* for a timestamp or memo.  A new memo is 
created on every mutation.  Before advancing, the iterator checks whether the 
current memo is a different object than it was when it was created.

Costs: The existing silent error is fairly rare.  The container gains a pointer 
to its current memo.  The iterator loses the cached length but gains a pointer 
to a memo.  The memos are blank objects: a Py ssize t and a pointer with 
certain flags at time of writing.  Speed is the same: comparing the lengths is 
replaced with comparing the memos.

Some caveats: The memory manager is used to obtain perpetually unique IDs.  A 
unique algorithm could be used instead of the memory manager, though the memo 
needs to contain a reference count more or less regardless.  There can at most 
be one memo per iterator.  The approach is outlined in pseudocode here [2]. 
Implementation could be optimized slightly by only creating new memos if 
iterators have been opened, shown here [3].

[2] 
http://home.comcast.net/~castironpi-misc/irc-0168%20mutating%20while%20iterating%20markup.html

[3] 
http://home.comcast.net/~castironpi-misc/irc-0168%20mutating%20while%20iterating%202%20markup.html

--
components: Library (Lib)
messages: 224071
nosy: castironpi
priority: normal
severity: normal
status: open
title: Mutating while iterating
type: behavior
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22084
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22084] Mutating while iterating

2014-07-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

It would be better discuss such ideas on python-ideas mailing list 
(http://mail.python.org/mailman/listinfo/python-ideas).

Option 3 breaks existing code such as

for k, v in d.items():
if pred(k, v):
d[k] = newvalue
break

Option 1 is memory inefficient. It requires a list of iterators in every dict 
(well, in almost every dict). And it doesn't look more time efficient than 
option 2.

Implementation of option 2 was provided and rejected in issue19332.

--
nosy: +rhettinger, serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22084
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22084] Mutating while iterating

2014-07-26 Thread Nick Coghlan

Nick Coghlan added the comment:

Raymond's answer at http://bugs.python.org/issue19332#msg202287 still stands: 
the checks for mutation while iterating are there primarily to *protect the 
interpreter itself*, rather than to help detect bugs where the user code misses 
some items because it mutated the dict during iteration. The fact it helps 
detect user errors is a helpful side effect of the interpreter protecting 
itself, rather than the *purpose* of the change.

Mutations that don't change the mapping size don't do the interpreter any harm, 
even if they're not what the user intended:

 d = dict(a=1, b=2, c=3)
 for k, v in d.items():
... del d[k]
... d[v] = k
... 
 d
{1: 'a', 2: 'b', 3: 'c'}
 for k, v in d.items():
... del d[k]
... d[v] = k
... 
 
 d
{'a': 1, 'b': 2, 'c': 3}

There are *lots* of ways to write buggy code that are significantly less 
obscure than this, and we don't change the way core data types work to prevent 
them.

For example, list iterators don't care if the underlying list is mutated at 
all, as again, such mutation poses no threat to the interpreter:

 seq = [x for x in range(10)]
 for i, x in enumerate(seq):
... print(i, x)
... del seq[i]
...
0 0
1 2
2 4
3 6
4 8

This kind of quirky behaviour is why be careful when mutating containers 
you're iterating over is sound advice. It *isn't* a reason to change the 
behaviour of the language to make currently legal (albeit odd) code a runtime 
error.

--
nosy: +ncoghlan
resolution:  - wont fix
stage:  - resolved
status: open - closed
versions:  -Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22084
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com