On 11 June 2017 at 13:35, David Mertz <me...@gnosis.cx> wrote:
> You are right.  I made a thinko.
>
> List construction from an iterator is O(N) just as is `sum(1 for _ in it)`.
> Both of them need to march through every element.  But as a constant
> multiplier, just constructing the list should be faster than needing an
> addition (Python append is O(1) because of smart dynamic memory
> pre-allocation).
>
> So the "just read the iterator" is about 2-3 times faster than
> read-then-accumulate).  Of course, it the run-lengths are LARGE, we can
> start worrying about the extra memory allocation needed as a tradeoff.  Your
> sum uses constant memory.

This would be another argument in favour of providing an
itertools.counted_len function, as that would be able to avoid all the
overheads currently associated with the  "sum(1 for __ in iterable)"
counting strategy.

Without that, the best you can do in pure Python is to use
__length_hint__ to choose your preferred counting strategy at runtime
based on the specific input.

Something like:

    from operator import length_hint

    # 10k 64-bit pointers ~= 640k max temporary list
   _USE_COUNTED_SUM = 10_001

    def counted_len(iterable):
        # For sized containers, just return their length
        try:
            return len(iterable)
        except TypeError:
            pass
        # For probably-large inputs & those with no length hint, count them
        hint = length_hint(iterable, default=_USE_COUNTED_SUM)
        if hint >= _USE_COUNTED_SUM:
            return sum(1 for __ in iter(iterable))
        # Otherwise, make a temporary list and report its length
        # as the specifics of the CPython implementation make this
        # faster than using the generator expression
        return len(list(iterable))

Cheers,
Nick.

P.S. itertools._grouper objects don't currently provide a length hint,
and it would be tricky to add one, since it would need to be based on
the number of remaining items in the original sequence, which would it
turn depend on *that* defining either __len__ or __length_hint__.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to