Paul Rubin wrote:
> This can probably be cleaned up some:
>
> from itertools import islice
> from collections import deque
>
> def ngram(n, seq):
>     it = iter(seq)
>     d = deque(islice(it, n))
>     if len(d) != n:
>         return
>     for s in it:
>         yield tuple(d)
>         d.popleft()
>         d.append(s)
>     if len(d) == n:
>         yield tuple(d)
>
> def test():
>     xs = range(20)
>     for a in ngram(5, xs):
>         print a
>
> test()

I started with

def ngrams2(items, n):
    items = iter(items)
    d = deque(islice(items, n-1), maxlen=n)
    for item in items:
        d.append(item)
        yield tuple(d)

and then tried a few dirty tricks, but nothing except omitting tuple(d)
brought performance near Steven's version.
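
To make the tuple(d) trade-off concrete, here is a minimal sketch in Python 3
syntax. ngrams_shared is a hypothetical name for the variant that omits
tuple(d); it is not from the thread. Skipping the copy means every yielded
window is the same mutable deque, so it is only safe when each window is
consumed before the next one is requested.

```python
from collections import deque
from itertools import islice

def ngrams2(items, n):
    items = iter(items)
    # Pre-fill with n-1 items; maxlen=n drops the oldest item on append.
    d = deque(islice(items, n - 1), maxlen=n)
    for item in items:
        d.append(item)
        yield tuple(d)  # independent snapshot of the current window

def ngrams_shared(items, n):
    # Hypothetical variant: same loop, but yields the deque itself.
    items = iter(items)
    d = deque(islice(items, n - 1), maxlen=n)
    for item in items:
        d.append(item)
        yield d  # no copy: the caller sees later mutations

print(list(ngrams2(range(5), 3)))
# [(0, 1, 2), (1, 2, 3), (2, 3, 4)]

windows = list(ngrams_shared(range(5), 3))
# All three entries are the same object, now holding the last window.
print(all(w is windows[0] for w in windows))
# True
```

Iterating with a plain for loop and using each window immediately works with
either variant; collecting the windows in a list only works with the copying
one.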

Just for fun, here's the obligatory one-liner:

from itertools import islice, tee

def ngrams1(items, n):
    return zip(*(islice(it, i, None) for i, it in enumerate(tee(items, n))))

Be aware that the islice() overhead is significant (I wonder if the islice()
implementation could be tweaked to reduce that).
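
For anyone who wants to measure this themselves, a rough timing sketch in
Python 3 syntax follows. The input size, the window width and the repeat count
are arbitrary assumptions, consume() is a helper introduced only for this
example, and the numbers will vary with machine and Python version.

```python
from collections import deque
from itertools import islice, tee
from timeit import timeit

def ngrams1(items, n):
    # n independent iterators over the input, the i-th advanced by i items.
    return zip(*(islice(it, i, None) for i, it in enumerate(tee(items, n))))

def ngrams2(items, n):
    items = iter(items)
    d = deque(islice(items, n - 1), maxlen=n)
    for item in items:
        d.append(item)
        yield tuple(d)

def consume(iterable):
    # Helper for this example: exhaust an iterator without storing results.
    deque(iterable, maxlen=0)

data = list(range(10000))  # arbitrary test input

# Sanity check: both versions produce the same windows.
assert list(ngrams1(data[:50], 5)) == list(ngrams2(data[:50], 5))

for func in (ngrams1, ngrams2):
    seconds = timeit(lambda: consume(func(data, 5)), number=5)
    print(func.__name__, round(seconds, 4))
```

In Python 3 zip() is lazy, which is why both functions are driven through
consume() rather than timed on the call alone.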