On Sat, Mar 16, 2013 at 2:32 PM, Oscar Benjamin
<oscar.j.benja...@gmail.com> wrote:
> On 16 March 2013 21:14, Abhishek Pratap <abhishek....@gmail.com> wrote:
>> Hey Guys
>>
>> I am trying to use itertools.izip_longest to read a large file in
>> chunks based on the examples I was able to find on the web. However I
>> am not able to understand the behaviour of the following python code.
>> (contrived form of example)
>>
>> for x in itertools.izip_longest(*[iter([1,2,3])]*2):
>>     print x
>>
>>
>> ###output:
>> (1, 2)
>> (3, None)
>>
>>
>> It gives me the right answer but I am not sure how it is doing it. I
>> also referred to the itertools doc but could not comprehend much. In
>> essence I am trying to understand the intricacies of the following
>> documentation from the itertools package.
>>
>> "The left-to-right evaluation order of the iterables is guaranteed.
>> This makes possible an idiom for clustering a data series into
>> n-length groups using izip(*[iter(s)]*n)."
>>
>> How is *n able to group the data and the meaning of '*' in the
>> beginning just after izip.
>
> The '*n' part is to multiply the list so that it repeats. This works
> for most sequence types in Python:
>
> >>> a = [1,2,3]
> >>> a * 2
> [1, 2, 3, 1, 2, 3]
>
> In this particular case we multiply a list containing only one item,
> the iterator over s. This means that the new list contains the same
> element twice:
> >>> it = iter(a)
> >>> [it]
> [<listiterator object at 0x166c990>]
> >>> [it] * 2
> [<listiterator object at 0x166c990>, <listiterator object at 0x166c990>]
>
> So if every element of the list is the same iterator, then we can call
> next() on any of them to get the same values in the same order:
> >>> d = [it]*2
> >>> d
> [<listiterator object at 0x166c990>, <listiterator object at 0x166c990>]
> >>> next(d[1])
> 1
> >>> next(d[0])
> 2
> >>> next(d[0])
> 3
> >>> next(d[0])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration
> >>> next(d[1])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration
>
> The * just after izip is for argument unpacking. This allows you to
> call a function with arguments unpacked from a list:
>
> >>> def f(x, y):
> ...     print('x is %s' % x)
> ...     print('y is %s' % y)
> ...
> >>> f(1, 2)
> x is 1
> y is 2
> >>> args = [1,2]
> >>> f(args)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: f() takes exactly 2 arguments (1 given)
> >>> f(*args)
> x is 1
> y is 2
>
> So the original expression, izip(*[iter(s)]*2), is another way of writing
>
>     it = iter(s)
>     izip(it, it)
>
> And izip(*[iter(s)]*10) is equivalent to
>
>     izip(it, it, it, it, it, it, it, it, it, it)
>
> Obviously writing it out like this will get a bit unwieldy if we want
> to do izip(*[iter(s)]*100) so the preferred method is
> izip(*[iter(s)]*n) which also allows us to choose what value to give
> for n without changing anything else in the code.
>
>
> Oscar
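
For reference, the idiom explained above can be wrapped in a small helper. This is only a minimal sketch, written for Python 3, where izip_longest is spelled itertools.zip_longest. The helper name grouper and the use of io.StringIO as a stand-in for a real file are illustrative assumptions, not something taken from the thread.

```python
import io
from itertools import zip_longest  # Python 3 name for Python 2's izip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect items into tuples of length n, padding the last tuple.

    Hypothetical helper name, used here only for illustration.
    """
    it = iter(iterable)   # a single iterator over the data
    args = [it] * n       # n references to the SAME iterator object
    # zip_longest pulls one item from each argument in left-to-right order,
    # so each output tuple consumes n consecutive items from `it`.
    return zip_longest(*args, fillvalue=fillvalue)


# The contrived example from the question:
for chunk in grouper([1, 2, 3], 2):
    print(chunk)

# A file object is an iterator over its lines, so the same helper yields
# groups of lines. io.StringIO stands in for an open file (an assumption).
fake_file = io.StringIO("a\nb\nc\n")
for lines in grouper(fake_file, 2, fillvalue=""):
    print(lines)
```

The first loop prints (1, 2) and then (3, None), matching the output in the question. Because the file is consumed lazily through one shared iterator, only n lines are held in memory per group.
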
Thanks a bunch Oscar. This is why I love this community. It is
absolutely clear now. It is funny I am getting the solution over the
mailing list while I am at pycon :)

best,
-Abhi
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor