On Sun, Mar 5, 2017 at 6:45 PM, Steven D'Aprano <st...@pearwood.info> wrote:

> Here is a radical thought... why don't lists track their common type
> themselves? There's only a few methods which can add items:
>

I had exactly the same thought.  Lists would need to grow a new attribute,
of course. I'm not sure how that would affect the object layout and word
boundaries.  But there might be free space for another attribute slot.

The real question is whether doing this is a win.  On each append/mutation
operation we would need to do a comparison to the __type_hint__ (assuming
Steven's spelling of the attribute).  That's not free.  Balancing that,
however, when we actually *did* a sort, it would be O(1) to tell if it was
homogeneous (and also the actual type if yes) rather than O(N).

This question isn't really subject to microbenchmarks though.  If we added
__type_hint__ as None/type-object and added those comparisons to it on
.insert()/.append()/etc, then we would be slower by some increment while
all we were doing was adding things.  There could only be a win when the
list is sorted (but a big win for that case).

In real world code, how often are lists sorted? Also I'm not really
confident that Elliot's estimates of 95% of lists being homogeneous holds,
but the speedup he proposes would seem to win even if that percentage is a
lot lower than 95%.  If only 10% of lists in real world code ever get
`my_list.sort()` called on them, Steven's idea is probably not good.  If
50% of lists do, it probably is.  But then, it depends just how *often*
lists that get sorted are re-sorted too.

Yours, David...

P.S. I think that given that we can .append() then delete items to make a
heterogenous list become homogeneous again, Elliot's idea is somewhat
orthogonal to Steven's.  That is, even if we had .__type_hint__ on lists,
it might be a "false None" and it could still be worth doing Elliot's
linear scan anyway.  On the other hand, the None-ness on non-empty lists
might be a good enough predictor of heterogeneity in real world code that
the linear scan would almost always be a waste.  I do not know without
benchmarks against real codebases.

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to