[issue32099] Use range in itertools roundrobin recipe

2017-11-23 Thread Raymond Hettinger

Raymond Hettinger  added the comment:


New changeset 0858495a50e19defde786a4ec050ec182e920f46 by Raymond Hettinger in 
branch 'master':
bpo-32099 Add deque variant of roundrobin() recipe (#4497)
https://github.com/python/cpython/commit/0858495a50e19defde786a4ec050ec182e920f46


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-23 Thread Raymond Hettinger

Change by Raymond Hettinger :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-21 Thread Raymond Hettinger

Change by Raymond Hettinger :


--
pull_requests: +4434

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-21 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

But why use less efficient code? Is my code worse?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-21 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

Really, I don't give a whit about the micro-benchmarks -- that completely 
misses the point.  The recipes are primarily intended to have educational value 
on ways to use itertools, deques, and whatnot.

For itertools, I'm satisfied with new variable name and the additional comment. 
 For collections, there is an open question about whether adding an extra 
example would make users better off.   Beyond that, I have little interest is 
exploring all the little variants or wasting further time on this (that is what 
ASPN, StackOverflow, and blog posts are for).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-21 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Today I have published a similar recipe on Python-Ideas. It uses popleft/append 
instead of __getitem__/rotate.

def roundrobin(*iterables):
"roundrobin('ABC', 'D', 'EF') --> A D E B F C"
nexts = deque(iter(it).__next__ for it in iterables)
popleft = nexts.popleft
append = nexts.append
while nexts:
next = popleft()
try:
yield next()
except StopIteration:
pass
else:
append(next)

It is faster (10-25%) in all microbenchmarks that I did (Steven's benchmarks 
for small number of iterables and my examples for large number of iterables).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-21 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

While we're on the topic, I had some thought of also adding a similar recipe to 
https://docs.python.org/3/library/collections.html#deque-recipes .  The 
alternative recipe is slower is common cases but doesn't degrade when there are 
a huge number of iterables:  

def roundrobin(*iterables):
"roundrobin('ABC', 'D', 'EF') --> A D E B F C"
nexts = deque(iter(it).__next__ for it in iterables)
while nexts:
try:
while True:
yield nexts[0]()
nexts.rotate(-1)
except StopIteration:
nexts.popleft()

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-21 Thread Roundup Robot

Change by Roundup Robot :


--
pull_requests: +4425

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Actually I tried to test if this implementation can cause a crash due to using 
deeply nested iterators (like in issue14010). For example in 
next(roundrobin(*([[]]*N + [[1]]))). But if there is such problem here, 
roundrobin() becomes unusable due to quadratic time for smaller N (tens of 
thousands).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

I agree that using binary tree would be excessively here. I just wondering if 
there is a simple way to get rid of the quadratic complexity. In any case this 
does not matter much until you work with many hundreds or thousands of 
iterables.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Raymond Hettinger

Change by Raymond Hettinger :


--
Removed message: https://bugs.python.org/msg306629

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

Serhiy, I think you're focusing on an irrelevant edge case and reading too much 
into the recipe.  You could create an underlying binary tree with O(n) 
iteration and O(log n) deletion but then that completely misses the point of 
the itertools recipes and would likely be pointless in the real-world and 
likely perform worse than the current recipe in common cases unless you wrote a 
C extension for it.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

roundrobin() has quadratic computational complexity. For example 
list(roundrobin(*([[1]]*N))). Is there a way to make it with linear complexity?

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Raymond Hettinger

Change by Raymond Hettinger :


--
keywords: +patch
pull_requests: +4424
stage: needs patch -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Tim Peters

Tim Peters  added the comment:

I agree the current recipe strikes a very nice balance among competing 
interests, and is educational on several counts.

s/pending/numactive/

# Remove the iterator we just exhausted from the cycle.
numactive -= 1
nexts = cycle(islice(nexts, numactive))

--
nosy: +tim.peters

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

IMO, George Sakkis's original version is better than the TJR's proposed 
revision which has three separate adjustments by one and adds to extraneous 
functions which aren't central to the example.  Dubslow's alternative at least 
avoids the three offsets by one; however, it has the disadvantage that meaning 
of the loop induction variable is somewhat indirect and creates its own source 
of confusion.

I agree that some algorithmic comment should be added, probably just for the 
last line.

Let's not add any of the listed alternative names to the comments.  Each one is 
either  confusing or wrong.  The word "merge" already has an established, 
different meaning as in "sort/merge".  For example,  heapq.merge('ABC', 'D', 
'EF') gives ['A', 'B', 'C', 'D', 'E', 'F'].  The word "alternate" tends to mean 
"toggle back-and-forth" in common usage.  The term "zip_flat" has no meaning to 
me, it has no hits on Google, and the closest match is a recipe on 
StackOverflow that does something different.  And "interleave" is not 
suggestive of looping back over the iterables until each is exhausted.  Also, 
we may yet use interleave() to mean something else.

In contrast, "round robin" has a well established meaning that matches what 
this recipe does. Until now, not a single reader has ever claimed to not know 
what it means.  https://en.wikipedia.org/wiki/Round-robin_scheduling

FWIW, the recipe has several benefit.  1) Give a way to implement round robin 
iteration without re-inventing the wheel, 2) Demonstrate ways to use cycle() 
and islice().  3) Demonstrate useful optimization technique of factoring the 
try/except out of the for-loop, 4) Demonstrate the useful optimization 
technique of calling bound methods, and 5) Be concise (I've left longer or more 
complex recipes for the ASPN cookbook or StackOverflow).

Ideally, I would prefer to not add two extra builtin lookups (the recipe is 
sometime used on short inputs which would be affected by the extra overhead).  
Also, I prefer the visual weight to be on the central message of tight inner 
loops that exploit itertools rather than having the visual weight shift to the 
for-loop which is the least important part.

Can you a suggest a concise single line comment that would make the last line 
clearer about what it is doing?   Also, I'm open to changing the name of the 
"pending" variable but think "current_len" and "reduced_len" are not 
improvements.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Raymond Hettinger

Change by Raymond Hettinger :


--
assignee: docs@python -> rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Dubslow

Dubslow  added the comment:

Er, in my first message, make that "(yield from tup for tup in 
zip_longest(*iters, usefill=False))"

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Dubslow

Dubslow  added the comment:

Perhaps the loop variable could be renamed to "len_minus_1" or some such 
something which is more semantic to the algorithm.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Dubslow

Dubslow  added the comment:

Regarding the current bug more specifically, I think a few comments would go a 
long way to helping readers understand the code.

And also, I do think the (1, +1, -1) is less readable, simply because it 
doesn't follow the most common usage patterns of range, where your first 
version using (0,0,0) (with implicit zeroes of course) is cleaner. It's much 
more apparent how long the loop is at first glance ("loop once per iter" vs 
"loop... from what to what? oh, once per iter"). Perhaps not using reversed() 
but instead setting step=-1 would be a better case to use the off-by-1 variant.

Such a version with both proposed changes (which are independent and can be 
considered or accepted separately) looks like this:


def roundrobin(*iterables):
"roundrobin('ABC', 'D', 'EF') --> A D E B F C"
# This recipe is also called other names like "alternate", "interleave", or
# "merge". "zip_flat" would also be an accurate name.
# Algorithm: cycle around the __next__ methods of each iterable. when an
# iter runs out, remove its __next__ from the cycle.
nexts = cycle(iter(it).__next__ for it in iterables)
for reduced_len in reversed(range(len(iterables))):
try:
for next in nexts:
yield next()
except StopIteration:
nexts = cycle(islice(nexts, reduced_len))
# This skips one round of the cycle, starting the next round
# without the current iter


The last comment is probably the least valuable, though it does point out the 
slight quirk of how the last line works. I suppose this is the case for having 
the loop variable be named the actual length, but I think most Python users are 
accustomed to the last element of a list having index length-1. At any rate, I 
think the readability gain in the for statement is more than the readability 
loss in the islice().

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Dubslow

Dubslow  added the comment:

Note that this follows a bit of discussion on python-ideas, in two threads:

https://mail.python.org/pipermail/python-ideas/2017-November/047920.html

https://mail.python.org/pipermail/python-ideas/2017-November/047989.html

I agree the zip_longest-derived solution is both easier to read/understand and 
also not really accurate, especially in extreme cases.

But, and see the second thread, I think it might just be better to add this 
funcitonality itself to itertools, under the name zip_flat -- per the second 
thread, there's a lot of confusion around the topic beyond just the suitability 
of the current recipe (including such things as lack of a clear name, 
legibility, efficiency, and brevity -- a fair number of people are looking for 
one or two liners, even if that doesn't really exist). (One alternative to 
zip_flat would be to add a second keyword argument to zip_longest that 
*doesn't* use a fillvalue, producing variable-length tuples when the supplied 
iters run out. Then the recipe and the entire conversation/topic could be 
replaced by "yield from zip_longest(*iters, usefill=False)".)

Despite that opinion, this suggested change is better than nothing.

--
nosy: +Dubslow

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Terry J. Reedy

New submission from Terry J. Reedy :

The itertools roundrobin recipe has an outer loop executed a preset number of 
times.  It is currently implemented with two assignments and a while loop.
https://docs.python.org/3/library/itertools.html#itertools-recipes
These can be replaced with a for loop using a reversed range.

def roundrobin(*iterables):
"roundrobin('ABC', 'D', 'EF') --> A D E B F C"
nexts = cycle(iter(it).__next__ for it in iterables)
for current_len in reversed(range(1, len(iterables)+1)):
try:
for next in nexts:
yield next()
except StopIteration:
nexts = cycle(islice(nexts, current_len - 1))

I think this is easier to understand.  So do some other participants in the 
current python-ideas thread 'Rewriting the "roundrobin" recipe in the itertools 
documentation'.

I changed 'pending' to 'current_len' because, to me, 'pending' should be the 
set of iter_nexts and not their number.

I originally avoided the '-1' in the islice call by decrementing both range 
arguments by 1 and calling the loop variable 'reduced_len'.  But having the 
loop variable be the size of the nexts iterable in the next outer iteration 
seemed confusing and not worth the trivial efficiency gain.

An independent change would be to replace 'next' with 'iter' on the basis that 
reusing the builtin name is not good and because 'nexts' is awkward to 
pronounce.

I will produce a PR if any version is preferred to the current one.
---

The OP proposed, and some participants like, an accept-reject algorithm based 
on zip_longest.

def roundrobin(*iters):
"roundrobin('ABC', 'D', 'EF') --> A D E B F C"
# Perhaps "flat_zip_nofill" is a better name, or something similar
sentinel = object()
for tup in it.zip_longest(*iters, fillvalue=sentinel):
yield from (x for x in tup if x is not sentinel)

I dislike creating tuples we don't want, with values we don't want, with an 
arbitrarily small acceptance ratio.  I also note that zip_longest is properly 
used in grouper, whereas roundrobin is the only recipe using cycle.

--
assignee: docs@python
components: Documentation
messages: 306605
nosy: docs@python, rhettinger, terry.reedy
priority: normal
severity: normal
stage: needs patch
status: open
title: Use range in itertools roundrobin recipe
type: enhancement
versions: Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com