On 16 March 2013 21:14, Abhishek Pratap <abhishek....@gmail.com> wrote:
> Hey Guys
>
> I am trying to use itertools.izip_longest to read a large file in
> chunks based on the examples I was able to find on the web. However I
> am not able to understand the behaviour of the following python code.
> (contrived form of example)
>
> for x in itertools.izip_longest(*[iter([1,2,3])]*2):
>     print x
>
>
> ###output:
> (1, 2)
> (3, None)
>
>
> It gives me the right answer but I am not sure how it is doing it. I
> also referred to the itertools doc but could not comprehend much. In
> essence I am trying to understand the intricacies of the following
> documentation from the itertools package.
>
> "The left-to-right evaluation order of the iterables is guaranteed.
> This makes possible an idiom for clustering a data series into
> n-length groups using izip(*[iter(s)]*n)."
>
> How is *n able to group the data, and what is the meaning of the '*'
> at the beginning, just after izip?

The '*n' part multiplies the list so that its contents repeat. This
works for most sequence types in Python:

>>> a = [1,2,3]
>>> a * 2
[1, 2, 3, 1, 2, 3]

In this particular case we multiply a list containing only one item,
the iterator over s. This means that the new list contains the same
element twice (note the identical addresses):

>>> it = iter(a)
>>> [it]
[<listiterator object at 0x166c990>]
>>> [it] * 2
[<listiterator object at 0x166c990>, <listiterator object at 0x166c990>]

Since every element of the list is the same iterator, calling next()
on any of them advances that one shared iterator. The values come out
in order no matter which element we use, and once the iterator is
exhausted it is exhausted for all of them:

>>> d = [it]*2
>>> d
[<listiterator object at 0x166c990>, <listiterator object at 0x166c990>]
>>> next(d[1])
1
>>> next(d[0])
2
>>> next(d[0])
3
>>> next(d[0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> next(d[1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

The * just after izip is for argument unpacking. This allows you to
call a function with arguments unpacked from a list:

>>> def f(x, y):
...     print('x is %s' % x)
...     print('y is %s' % y)
...
>>> f(1, 2)
x is 1
y is 2
>>> args = [1,2]
>>> f(args)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() takes exactly 2 arguments (1 given)
>>> f(*args)
x is 1
y is 2

So the original expression, izip(*[iter(s)]*2), is another way of
writing

it = iter(s)
izip(it, it)

Because izip takes one value from each argument in turn, left to
right, and both arguments are the same iterator, each output tuple
holds two consecutive items of s. That is what does the grouping.

And izip(*[iter(s)]*10) is equivalent to

izip(it, it, it, it, it, it, it, it, it, it)

Obviously writing it out like this will get a bit unwieldy if we want
to do izip(*[iter(s)]*100), so the preferred method is
izip(*[iter(s)]*n), which also allows us to choose what value to give
for n without changing anything else in the code.

Oscar
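
For anyone who wants to try this on Python 3, here is a minimal sketch of the same idiom wrapped in a helper. It assumes Python 3, where izip_longest is spelled itertools.zip_longest. The helper name grouper is my own choice (the itertools documentation has a similar recipe), and the io.StringIO object is only a stand-in for a real file opened with open().

```python
import io
from itertools import zip_longest  # Python 3 name; Python 2 calls it izip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect items into tuples of length n, padding the last tuple."""
    it = iter(iterable)   # one iterator...
    args = [it] * n       # ...referenced n times (not n independent copies)
    # Unpacking passes the same iterator n times, so each output tuple
    # consumes n consecutive items from it.
    return zip_longest(*args, fillvalue=fillvalue)


# The contrived example from the question.
chunks = list(grouper([1, 2, 3], 2))
print(chunks)  # [(1, 2), (3, None)]

# Reading "a file" two lines at a time; StringIO stands in for open(...).
fake_file = io.StringIO("a\nb\nc\nd\ne\n")
line_chunks = list(grouper(fake_file, 2, fillvalue=""))
print(line_chunks)  # [('a\n', 'b\n'), ('c\n', 'd\n'), ('e\n', '')]
```

Because a file object is already an iterator over its lines, only n lines are pulled in per chunk, which is the point when the file is large. Choosing a fillvalue such as "" instead of None can make the padded last chunk easier to handle.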