[Python-ideas] Re: Access (ordered) dict by index; insert slice

Christopher Barker Sat, 11 Jul 2020 12:46:14 -0700

I had a nice note almost written yesterday, but now there's been a bunch
more discussion, so I'm going to try to hit a few points that have been
recently made.

TL;DR: I personally think it would be a nice feature to add indexing to the
dict views. But to be fair, the only real use case I've seen is
random.choice(), so it's really not very compelling. And it seems the
consensus is coming down on the side of no.

However, I find I disagree with many of the "no" arguments, other than:
"it's too much churn for little gain", so I'm going to make my points,
thinking that maybe that trade-off will be rethought. So if you're firmly on
the side of no, I guess there's no point in reading this, but I do hope it
will be considered.

On Fri, Jul 10, 2020 at 12:45 PM David Mertz <me...@gnosis.cx> wrote:
> The strongest argument I've seen is: `list(d.items())` adds six
> characters.

That's a misrepresentation: the reason to prefer not to use the list call
is twofold:

1) Matching our mental model / usability: if I want the nth item (or a
random item) from a dict, I want to ask for that -- I don't want to make a
list, just to index it and throw it away. The list(d.items()) idiom is the
right one if I actually need a list, but not when all I want is one item.

2) Performance: making an entire list just to get one item out is a
potentially expensive operation. Again, for the limited use cases, probably
not a big deal: I'm having a really hard time imagining an application where
that would be a bottleneck, but it is *a* reason, if not a compelling one.
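
For concreteness, here is a rough sketch of the two ways to get the nth item
today, using only the standard library. The helper name nth_item is made up
for illustration; it is not an existing or proposed API. Both approaches are
O(n), but only the first one allocates a list.

```python
from itertools import islice

def nth_item(view, n):
    """Return the n-th element of a dict view by stepping through it.

    Hypothetical helper for illustration only: still O(n), but no list
    copy of the view is made.
    """
    if n < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    try:
        return next(islice(view, n, None))
    except StopIteration:
        raise IndexError("dict view index out of range") from None

d = {"a": 1, "b": 2, "c": 3}

via_list = list(d.items())[1]      # builds a full list, then discards it
via_step = nth_item(d.items(), 1)  # steps through the view instead
```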

> Moreover, even apart from the work of maintaining the feature itself, the
> attractive nuisance of getting O(N) behavior rather than O(1) seems like a
> strong anti-feature.

Yes, this is the only anti-feature I've seen described in this thread. But
it's only an anti-feature for the use case of making multiple indexing
operations from the same dict view, without changes to the dict. It's a
feature if you need to make only one (or very few) indexing operations from
the same non-mutated dict. After all, that's exactly why we have the dict
views in the first place: you don't want to have to make an unnecessary
copy if you don't need to. That clearly applies to iteration and membership:
why not to the "getting one item out" case?
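
As a small reminder of what the views already give us (the variable names
here are just for illustration): a view is a live window onto the dict, so
length and membership work without copying anything.

```python
d = {"a": 1, "b": 2}
keys = d.keys()          # a view: no copy of the keys is made

d["c"] = 3               # mutate the dict after creating the view

has_c = "c" in keys      # membership test sees the new key
n_keys = len(keys)       # length reflects the current dict
in_order = list(keys)    # an explicit copy, made only when asked for
```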

But of course, it is indeed an attractive nuisance in some cases, which is
different than the other view use cases: they are the same or more
efficient than the old "make a list" approach, whereas this would be more
efficient in some cases, and less in others -- so users would need to
evaluate the trade offs, and many wouldn't even know they should think
about that. Overall though, I think that folks would still need to make a
list if they wanted to do any other MutableSequence operations (or be
clearly working with a copy), so I don't think there's all that much danger
of this feature being accidentally used.

On Fri, Jul 10, 2020, 1:07 PM Stestagg <stest...@gmail.com> wrote:
>
>> I don't mind the shooting down, as long as the arguments make sense :D.
>>
>
I agree here for sure: I've no problem with folks having a different
opinion about the value of the trade offs, but I think the trade offs have
been misrepresented -- hence this post ... (no, I don't think anyone's
misrepresenting anything on purpose -- this is about the technical issues)

> It seems like we're both in agreement that the cost of implementing &
>> maintaining the change is non-zero.
>>
>
Another note there: one of the big costs is implementation and
documentation. But this is Open Source: we can all decide that a feature is
a good idea, but it'll never get done unless someone(s) actually decides it
is worth it, to them, to write the code and docs. If no one does, then it's
not going to happen. So that part of the cost is self limiting. Granted,
once written, it needs to be maintained, but that is a lesser cost, at
least in this case, where it's not a whole new object or anything.

> I don't believe that this feature would steepen the language learning
>> curve however, but actually help to shallow it slightly (Explained more
>> below)
>>
>
I agree here. Granted, it's again only the one use case, but when my
newbie students have to figure out how to get a random key from a dict,
there is no question that:

random.choice(the_dict.keys())

is a little easier than:

random.choice(list(the_dict.keys()))

and a lot easier than (untested):

idx = random.randrange(len(the_dict))  # randint() would include len(the_dict)
it = iter(the_dict.keys())
for _ in range(idx + 1):  # step to the key at position idx
    choice = next(it)

getting an arbitrary one is a bit easier:

choice = next(iter(the_dict.keys()))

In practice, I use this as a teaching opportunity -- but the fact that it
IS a teaching opportunity kind of makes my point.

Granted, if this feature were there, there'd be the need to teach folks
about why they want to avoid the attractive nuisance discussed above -- so
I'll call it a net-zero.

> >>> import numpy as np
> >>> mapping_table = np.array(BIG_LOOKUP_DICT.items())

One note on numpy: the numpy array() function is very much designed for
Sequences: partly due to history, but also for convenience and performance
-- it needs to know what the size and data type of the array it is going to
create is before it creates it.

And honestly, I'm not sure that array() would work with the dict views
anyway if we added indexing -- we'd have to look at the logic inside array().

And numpy has fromiter() for working with iterators.

In short: "it would work with numpy" is NOT a reason to add this feature :-)

> And I expect that even if dict.items() was indexable, numpy would
> still have to copy the items. I don't know how numpy works in detail,
> but I doubt that it will be able to use a view of a hash table internals
> as a fast array without copying.

Of course not -- but it makes a copy of the items in a list too -- so the
extra copy for the list is still there.
(numpy works with homogeneous lower-level data types -- the actual bytes of
the C datatype -- so it is always copying the values when it makes an
array out of Python types, except for the numpy object dtype, but that's a
special case.)

> What making dict_* types a Sequence will do is make this code (as
> written) behave:

For my part, I'm not asking for the dict views to be full blown Sequences
-- I think that *would* be an attractive nuisance. I'm thinking only adding
indexing.

> still think of concrete sequences and indexing as fundamental, while
> Python 3 has moved in the direction of making the iterator protocol and
> iterators as fundamental.
>

That is indeed a change in Python over the years, but I don't think it was
a practicality-driven change: in short: don't make copies you don't need to
make. So I don't think we should use "Iterators are fundamental to Python"
as a reason to NOT add Sequence-like behavior.

> You have a hammer (indexing), so you want views to be nails so you can
> hammer them. But views are screws, and need a screwdriver (iter and
> next).
>

But there are, in carpentry, many places where you can use either a screw
or a nail, and some of us have even been known to hammer a screw in, even
if we had a screwdriver handy, and knew what the heck we were doing. That
is the argument here: when the screw can be well used, in a particular
case, by hitting it with a hammer, then why not let me do that? To take
the analogy way too far: don't take the hammer out of my toolbox just
because there are some screwdrivers in there.

> The existing dictionary memory layout doesn't support direct indexing
> (without stepping), so this functionality is not being added as a
> requirement.

But it does make it much more efficient if the stepping is done inside the
dict object by code that knows its internal structure. Both because it can
be in C, and can be done without any additional references or copying. Yes,
it's all O(n) but a very different constant.

> The fact that they can be indexed in reasonable time is not part of the
> design, just an accident of implementation, and being an accident, it
> could change in the future.
>

It *could*, but I can't imagine how you could have an efficient
order-preserving data structure that could not be indexed reasonably -- in
particular, more efficiently than making a full list copy first. And even
so -- fine: performance characteristics are not guaranteed anyway.

> If random.choice should support non-sequence ordered container,
> just propose it to random.choice.

That would indeed solve the usability issue, and so may be a good idea.

The problem here is that there is no way for random.choice to efficiently
work with generic Mappings. This whole discussion started because now that
dicts preserve order, there is both a logical reason, and a practical
implementation for indexing. But if that is not exposed, then neither
random.choice() nor any other function can take advantage of it.

Which would lead to adding a random_choice protocol -- but THAT sure seems
like overkill.
(OK, you could have the builtin random.choice check for an actual dict, and
then use custom code to make a random selection, but that would really be a
micro-optimization!)
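
To make that parenthetical concrete, here is a sketch of what such a
special-casing wrapper might look like in pure Python. The name choice_from
is hypothetical and is not part of the random module. It still steps through
the container in O(n), but it never builds a list, and it only needs len()
and iteration from non-sequences.

```python
import random
from collections.abc import Sequence
from itertools import islice

def choice_from(obj):
    """Pick a random element from a sequence, dict, or dict view.

    Hypothetical helper, not part of the random module. Real sequences
    are handed to random.choice(); anything else that is sized and
    iterable is stepped through without making a list copy.
    """
    if isinstance(obj, Sequence):
        return random.choice(obj)
    n = len(obj)
    if n == 0:
        raise IndexError("cannot choose from an empty container")
    # Skip a random number of elements, then take the next one.
    return next(islice(obj, random.randrange(n), None))

the_dict = {"spam": 1, "eggs": 2, "ham": 3}

random_key = choice_from(the_dict)           # iterating a dict yields keys
random_item = choice_from(the_dict.items())  # (key, value) pair from the view
```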

> but they can't be Sequences, since they are already Sets. They would
> have to be a hybrid of the two, and that, I feel, comes with more
> baggage than just being one or the other.

I think this is where I fundamentally disagree, as far as language design
and Python philosophy is concerned. I've been using Python for 20+ years
(terrifying!) and I have always really liked the duck typing concept. In
fact, even one better: it doesn't have to look, walk, and quack like a duck
to be a duck -- if I only need it to quack, I don't care how it looks and
walks.

Since those pre-2.0 days, Python has grown a lot more "structure" to its
typing, notably ABCs and now facilities for static type checking. So far,
those *enable* more formal typing, but don't *require* it. But as more
folks start to use them, I'm going to have to start writing more strictly
typed code if I want to use other libraries -- I'm hoping it won't come to
that, but we'll see.

To bring this back to the case at hand:

I haven't looked at the code, but I'm pretty sure that random.choice() does
not check for the Sequence ABC: it simply tries to get the length, and then
index the object to get a random item. If that works, then it works -- This
is proven by passing it a dict with integer indexes in the right range:

In [28]: d
Out[28]: {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
In [29]: random.choice(d)
Out[29]: 9

I LIKE this -- so the argument that dict views shouldn't support indexing
because they are a Set and can't be a proper Sequence is exactly backwards
from how I think Python should work:

If a feature is useful, and doesn't conflict with another feature, then we
can add it.
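
Here is a sketch of that duck typing in action, assuming (as I believe is
the case in current CPython) that random.choice() only needs len() and
integer indexing. IndexableView is a made-up wrapper for illustration, not a
proposed implementation; it only quacks, it is not a Sequence and does not
try to be one.

```python
import random
from itertools import islice

class IndexableView:
    """Hypothetical wrapper adding integer indexing to a dict view.

    It provides only __len__ and __getitem__, and indexing is O(n)
    because it steps through the underlying view.
    """

    def __init__(self, view):
        self._view = view

    def __len__(self):
        return len(self._view)

    def __getitem__(self, index):
        n = len(self._view)
        if index < 0:
            index += n  # support negative indexes like a list does
        if not 0 <= index < n:
            raise IndexError("view index out of range")
        return next(islice(self._view, index, None))

scores = {"ann": 3, "bob": 5, "cat": 8}
keys = IndexableView(scores.keys())

first = keys[0]
last = keys[-1]
picked = random.choice(keys)  # works: it can get a length and index it
```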

In the end though, while I think there is very little reason NOT to add
indexing to dict views, unless someone comes up with a good use case beyond
random.choice(), it may not be worth the churn.

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FTD5PXPLJYMCJPXNW4C4NEFMG37GMP4S/
Code of Conduct: http://python.org/psf/codeofconduct/
