[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-09-24 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Added the patch for PR 1568 against the current master just for history.

--
Added file: https://bugs.python.org/file47167/groupby-invalid.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-09-24 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you for your review and explanations Raymond. This makes sense to me.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-09-24 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:


New changeset c247caf33f6e6000d828db4762d1cb12edf3cd57 by Serhiy Storchaka in 
branch 'master':
bpo-30346: An iterator produced by the itertools.groupby() iterator (#1569)
https://github.com/python/cpython/commit/c247caf33f6e6000d828db4762d1cb12edf3cd57


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-09-24 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Go ahead with PR 1569 to exhaust the inner iterator when the outer iterator 
advances.

Rationale:  The OP expected the inner iterator to be exhausted.  That is also 
what the PyPy team originally implemented and it is what I would have expected.

Also, there are some loose analogies elsewhere in the language.  Raising 
StopIteration is what next(somefileobject) does when there is a seek-to-end 
between next() calls.  A list iterator raises StopIteration when there is a del 
somelist[:] between next() calls.  When zip(it1, it2) terminates on the 
shortest input, it leaves the other iterator alive, allowing it to run and 
terminally normally with StopIteration. And though I don't have a use case for 
it, I would expect that next(inner_iterator, default_value) would return a 
value from the input stream or the default value but would not fail with a 
RuntimeError (this would apply to islice() as well).

--
assignee: rhettinger -> serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-09-21 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Ping. Can you make your decision Raymond?

This issue is a stopper for issue30347 which fixes a crash. I don't want to 
merge the fix for issue30347 until this issue be solved, because both proposed 
PRs solve issue30347 too, and I don't want to make unneeded changes or move 
code forth and back.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-29 Thread Raymond Hettinger

Raymond Hettinger added the comment:

I've looked at the patches but am still thinking about what I prefer before 
committing to a re-design.  Please give it a little time.  There is zero 
urgency here (nothing is broken, no user complaints, etc).   I have a bunch of 
other stuff on my plate for a little while and this isn't the top priority.  I 
had asked to hold off on PRs until a decision is made and not rush into coding 
(which is usually the easy part).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-28 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Raymond, could you please look at these patches? Both are small. What do you 
prefer?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-13 Thread Tim Peters

Tim Peters added the comment:

Users certainly have been fooled by this, although "unpacking" is a red 
herring.  I've been burned by it, and I've seen StackOverflow puzzles related 
to the same thing.  I think this is the heart of it:  given a finite iterable 
I, it usually makes no semantic difference whether a program does:

for thing in I:

or

for thing in list(I):

But when `I` is obtained from `groupby()`, `thing` contains an iterator that 
shares state with `I` itself, so they're not at all the same.  It's surprising 
just because it's so uncommon.  "unpacking" is one special case of this.

I'd like to see an attempt to invoke a sub-iterator raise an exception if the 
primary iterator has moved beyond it.  The behavior in that case now is 
undefined, utterly accidental, and useful only for surprising people ;-)  
Nobody _intends_ to do this, unless they have no idea how `groupby()` is meant 
to be used.

--
nosy: +tim.peters

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-12 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

PR 1568 and PR 1569 are two subvariants of #1. PR 1568 makes an inner iterator 
invalid and raising RuntimeError when used. PR 1569 makes an inner iterator 
exhausted.

--
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-12 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
pull_requests: +1664

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-12 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
pull_requests: +1663

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-12 Thread Raymond Hettinger

Raymond Hettinger added the comment:

> If I was to submit this as a PR, ...

Please don't jump to writing code yet.  We need to decide what should be done 
first.  

I'm not really excited about making a change at all.  No users seem to have 
ever been affected by this.  The CPython code matches what it is documented to 
do (by the pure python code in the documentation).  In a way, it is just an 
obscure implementation artifact.

If we do make a small change, I would like it to be 3.7 only.  No need to risk 
breaking existing code for something that technically isn't even a bug (because 
it matches the documentation).  As the title suggests, it is more of an oddity 
than anything else.

--
versions: +Python 3.7 -Python 3.3, Python 3.5

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-12 Thread Matthew Gilson

Matthew Gilson added the comment:

Tracking which group the grouper _should_ be on using an incrementing integer 
seems to work pretty well.

In additional to the tests in `test_itertools.py`, I've gotten the following 
tests to pass:

class TestGroupBy(unittest.TestCase):
def test_unpacking(self):
iterable = 'AB'
(_, a), (_, b) = groupby(iterable)
self.assertEqual(list(a), [])
self.assertEqual(list(b), [])

def test_weird_iterating(self):
g = groupby('AB')
_, a = next(g)
_, b = next(g)
_, aa = next(g)
self.assertEqual(list(a), [])
self.assertEqual(list(b), [])
self.assertEqual(list(aa), list('A'))

If I was to submit this as a PR,

1. where would I want to add these tests?
2. should I update the documentation for the "equivalent" python version to 
match exactly?

--
keywords: +patch
Added file: http://bugs.python.org/file46860/groupby-fix.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-12 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I agree and prefer #1. This will save users from incorrect using and us from 
reports about odd behavior.

The implementation is simple. The groupby object generates an unique identifier 
when creates a new grouper. This may be a sequential integer, just object(), or 
the grouper object itself (but the latter case creates a reference loop). The 
grouper object saves the parent's identifier when created and compare the owned 
reference with the current parent's identifier before every use (in __next__ 
and __reduce__). If they don't match, the grouper object is not valid.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-12 Thread Raymond Hettinger

Raymond Hettinger added the comment:

I have no interest in #2.   Itertools are all about *not* storing data in 
memory unless that is an intrinsic requirement for the operation (tee() and 
cycle() for example).

Also, I would like any changes to be minimal and have the lowest possible risk 
of breaking code.  The groupby() tool is about 13 years old and has served its 
users well.  These little corner cases we're working on now don't seem to 
correspond to how users actually use groupby or how it was intended to be used 
(either eat the sub-iterator right away before advancing the outer iterator or 
ignore it entirely).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-12 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a Python implementation of the idea #2:

from itertools import tee
def groupby(iterable, key=None):
if key is None:
key = lambda x: x
def grouper(currvalue, it, tgtkey):
yield currvalue
for currvalue in it:
if key(currvalue) != tgtkey:
break
yield currvalue
it = iter(iterable)
tgtkey = init = object()
while True:
try:
currvalue = next(it)
except StopIteration:
return
currkey = key(currvalue)
if tgtkey is init or currkey != tgtkey:
tgtkey = currkey
it, it2 = tee(it)
yield (currkey, grouper(currvalue, it2, tgtkey))

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-12 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Does it make sense using a grouper after advancing the groupby iterator? 
Currently it is possible by accident:

>>> from itertools import groupby, count
>>> it = groupby(count(), lambda i: (i//10)%2)
>>> _, even = next(it)
>>> _, odd = next(it)
>>> print(list(even))
[]
>>> print(list(odd))
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> print(list(even))
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
>>> print(list(odd))
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
>>> print(list(even))
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]

But I think that it would be more consistent to implement one of following 
variant:

1. Invalidate groupers after creating a new grouper. There should be only one 
valid grouper related to the groupby object. Using of invalid grouper should 
raise either StopIteration or RuntimeError.

>>> it = groupby(count(), lambda i: (i//10)%2)
>>> _, even = next(it)
>>> _, odd = next(it)
>>> print(list(even))
[]
>>> print(list(odd))
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> print(list(even))
[]
>>> print(list(odd))
[]

or

>>> it = groupby(count(), lambda i: (i//10)%2)
>>> _, even = next(it)
>>> _, odd = next(it)
>>> print(list(even))
Traceback (most recent call last):
  File "", line 1, in 
RuntimeError
>>> print(list(odd))
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> print(list(even))
Traceback (most recent call last):
  File "", line 1, in 
RuntimeError
>>> print(list(odd))
[]

2. Duplicate the source iterator using tee() when create a new grouper. All 
groupers can be used independently from the groupby object and other groupers.

>>> it = groupby(range(100), lambda i: (i//10)%2)
>>> _, even = next(it)
>>> _, odd = next(it)
>>> _ = list(it)  # exhaust the source iterator
>>> print(list(even))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> print(list(odd))
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> print(list(even))
[]
>>> print(list(odd))
[]

I think resolving this issue will also fix issue30347.

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-12 Thread Matthew Gilson

Matthew Gilson added the comment:

Oops.  You don't need to pass `self.currvalue` to `_grouper`. I didn't end up 
using it in there...

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-12 Thread Matthew Gilson

Matthew Gilson added the comment:

I think that works to solve the problem that I pointed out.  In my stack 
overflow question (http://stackoverflow.com/a/43926058/748858) it has been 
pointed out that there are other opportunities for weirdness here.

Specifically, if if I skip processing 2 groups and then I process a third group 
whose key is the same as the first:


inputs = [(x > 5, x) for x in range(10)]
inputs += [(False, 10), (True, 11)]

g = groupby(inputs2 + [(True, 11)], key=itemgetter(0))
_, a = next(g)
_, b = next(g)
_, c = next(g)

print(list(a))
print(list(b))

Both `a` and `b` should probably be empty at this point, but they aren't.  

What if you kept track of the last iterable group and just consumed it at 
whenever `next` is called?  I think then you also need to keep track of whether 
or not the input iterable has been completely consumed, but that's not too bad 
either:

_sentinel = object()

class groupby:
# [k for k, g in groupby('BBBCCDAABBB')] --> A B C D A B
# [list(g) for k, g in groupby('BBBCCD')] -->  BBB CC D
def __init__(self, iterable, key=None):
if key is None:
key = lambda x: x
self.keyfunc = key
self.it = iter(iterable)
self.last_group = self.currkey = self.currvalue = _sentinel
self.empty = False

def __iter__(self):
return self

def __next__(self):
if self.last_group is not _sentinel:
for _ in self.last_group:
pass
if self.empty:
raise StopIteration

if self.currvalue is _sentinel:
try:
self.currvalue = next(self.it)
except StopIteration:
self.empty = True
raise
self.currkey = self.keyfunc(self.currvalue)
self.last_group = self._grouper(self.currkey, self.currvalue)
return (self.currkey, self.last_group)

def _grouper(self, tgtkey, currvalue):
while self.currkey == tgtkey:
yield self.currvalue
try:
self.currvalue = next(self.it)
except StopIteration:
self.empty = True
return
self.currkey = self.keyfunc(self.currvalue)

I haven't tested this to make sure it passes the test suite -- I also don't 
know if this would have major performance implications or anything.  If it did 
have severe performance implications, then it probably isn't worthwhile...

--
nosy: +mgilson

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-11 Thread Raymond Hettinger

Raymond Hettinger added the comment:

I suppose that when the input iterator is exhausted, we could poison the 
currkey to make the subordinate iterator ends as well:

def __next__(self):
while self.currkey == self.tgtkey:
try:
self.currvalue = next(self.it)
except StopIteration:
self.currkey = object()
raise StopIteration from None
self.currkey = self.keyfunc(self.currvalue)
self.tgtkey = self.currkey
return (self.currkey, self._grouper(self.tgtkey))

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-11 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
priority: normal -> low

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-11 Thread Raymond Hettinger

Raymond Hettinger added the comment:

FYI, the CPython behavior matches the pure python implementation show in the 
docs:

Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "copyright", "credits" or "license()" for more information.
>>> class groupby:
# [k for k, g in groupby('BBBCCDAABBB')] --> A B C D A B
# [list(g) for k, g in groupby('BBBCCD')] -->  BBB CC D
def __init__(self, iterable, key=None):
if key is None:
key = lambda x: x
self.keyfunc = key
self.it = iter(iterable)
self.tgtkey = self.currkey = self.currvalue = object()
def __iter__(self):
return self
def __next__(self):
while self.currkey == self.tgtkey:
self.currvalue = next(self.it)# Exit on StopIteration
self.currkey = self.keyfunc(self.currvalue)
self.tgtkey = self.currkey
return (self.currkey, self._grouper(self.tgtkey))
def _grouper(self, tgtkey):
while self.currkey == tgtkey:
yield self.currvalue
try:
self.currvalue = next(self.it)
except StopIteration:
return
self.currkey = self.keyfunc(self.currvalue)

>>> from operator import itemgetter
>>> inputs = ((x > 5, x) for x in range(10))
>>> (_, a), (_, b) = groupby(inputs, key=itemgetter(0))
>>> print(list(a))
[]
>>> print(list(b))
[(True, 9)]

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-11 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
assignee:  -> rhettinger
nosy: +rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-11 Thread Matt Gilson

New submission from Matt Gilson:

There is some odd behavior when unpacking the groups from an itertools.groupby 
result.  For example:

from itertools import groupby
from operator import itemgetter

inputs = ((x > 5, x) for x in range(10))
(_, a), (_, b) = groupby(inputs, key=itemgetter(0))
print(list(a))
print(list(b))

On CPython, this results in:

[]
[(True, 9)]

I would expect it to print out 2 empty lists since the second group would have 
to be consumed to make sure that there isn't another value yielded from groupby 
(If there was another value to yield, we'd get a ValueError since we couldn't 
unpack the correct number of items).

This is the behavior that PyPy had prior to re-implementing to be consistent 
with CPython in 
https://bitbucket.org/pypy/pypy/commits/6093ff1a44e6b17f09db83aa80aea562a738c286

--
messages: 293511
nosy: Matt Gilson
priority: normal
severity: normal
status: open
title: Odd behavior when unpacking `itertools.groupby`
type: behavior
versions: Python 3.3, Python 3.5

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com