Re: N-grams

Peter Otten Thu, 10 Nov 2016 01:03:56 -0800

Paul Rubin wrote:

> This can probably be cleaned up some:
> 
>     from itertools import islice
>     from collections import deque
> 
>     def ngram(n, seq):
>         it = iter(seq)
>         d = deque(islice(it, n))
>         if len(d) != n:
>             return
>         for s in it:
>             yield tuple(d)
>             d.popleft()
>             d.append(s)
>         if len(d) == n:
>             yield tuple(d)
> 
>     def test():
>         xs = range(20)
>         for a in ngram(5, xs):
>             print(a)
> 
>     test()


I started with

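from collections import deque
from itertools import islice

def ngrams2(items, n):
    items = iter(items)
    # Prime the window with n-1 items; maxlen=n makes append() drop the oldest.
    d = deque(islice(items, n-1), maxlen=n)
    for item in items:
        d.append(item)
        yield tuple(d)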

and then tried a few dirty tricks, but nothing except omitting tuple(d) 
(i.e. yielding the deque itself) brought performance near Steven's version.
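Omitting tuple(d) saves building a new object per window, but then every value the generator yields is the same deque, which the next append() mutates. That is only safe if each window is fully used before the next one is requested. A minimal sketch of the hazard (`ngrams_shared` is a hypothetical name for the tuple-less variant of ngrams2):

```python
from collections import deque
from itertools import islice


def ngrams_shared(items, n):
    """Like ngrams2, but yields the deque itself instead of a tuple copy."""
    items = iter(items)
    d = deque(islice(items, n - 1), maxlen=n)
    for item in items:
        d.append(item)
        yield d  # same object every time; the next append() mutates it


# Fine: each window is read before the generator advances.
sums = [sum(w) for w in ngrams_shared(range(6), 3)]
print(sums)  # [3, 6, 9, 12]

# Also fine: the caller copies each window before advancing.
copies = [tuple(w) for w in ngrams_shared(range(6), 3)]
print(copies)  # [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]

# Broken: collecting the windows keeps four references to one deque.
windows = list(ngrams_shared(range(6), 3))
print([list(w) for w in windows])  # [[3, 4, 5], [3, 4, 5], [3, 4, 5], [3, 4, 5]]
```

In other words the speedup just moves the tuple() call to the caller whenever windows have to outlive one loop iteration.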

Just for fun, here's the obligatory one-liner:

from itertools import islice, tee

def ngrams1(items, n):
    return zip(*(islice(it, i, None) for i, it in enumerate(tee(items, n))))
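Unrolled, the one-liner makes n independent iterators with tee(), starts the i-th one i items in, and lets zip() stop as soon as the most-advanced iterator runs out. A sketch of the same logic spelled out step by step (`ngrams1_unrolled` is just an illustrative name; under Python 3 zip() returns a lazy iterator, under Python 2 it builds a list):

```python
from itertools import islice, tee


def ngrams1_unrolled(items, n):
    """Same logic as the one-liner, one step per line."""
    copies = tee(items, n)  # n independent iterators over items
    # The i-th copy skips its first i items, so position k across all
    # copies reads items[k], items[k + 1], ..., items[k + n - 1].
    shifted = [islice(it, i, None) for i, it in enumerate(copies)]
    # zip() stops when the shortest (most shifted) iterator is exhausted.
    return zip(*shifted)


print(list(ngrams1_unrolled("abcde", 3)))
# [('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e')]
```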

Be aware that the islice() overhead is significant (I wonder if the islice() 
implementation could be tweaked to reduce that).
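One way to sidestep the per-item islice() layer (not something measured in this thread) is to advance each tee() copy eagerly with next() and then zip the plain tee iterators, similar in spirit to the pairwise() recipe in the itertools documentation. A sketch, with `ngrams_next` as a hypothetical name; whether it actually beats the versions above should be checked with timeit on the target interpreter:

```python
from itertools import tee
from timeit import timeit


def ngrams_next(items, n):
    """Sliding windows via tee(); each copy is pre-advanced with next()."""
    copies = tee(items, n)
    for i, it in enumerate(copies):
        # Skip i items up front; the default None avoids StopIteration
        # when the input is shorter than n.
        for _ in range(i):
            next(it, None)
    # zip() now reads directly from the tee objects, with no islice wrapper.
    return zip(*copies)


print(list(ngrams_next(range(6), 3)))
# [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]

# Rough timing harness; the numbers depend on the machine and Python version.
print(timeit(lambda: sum(1 for _ in ngrams_next(range(10000), 5)), number=10))
```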

-- 
https://mail.python.org/mailman/listinfo/python-list
