Re: [Python-Dev] Proposal: dict.with_values(iterable)

Steve Dower Mon, 22 Apr 2019 23:05:01 -0700

On 22Apr2019 2143, Inada Naoki wrote:

On Tue, Apr 23, 2019 at 11:30 AM Steve Dower <steve.do...@python.org> wrote:


Or possibly just "dict(existing_dict).update(new_items)".


Do you mean .update accepts values tuple?
I can't think it's


Not sure what you were going to go on to say here, but why not?

If it's a key-sharing dict, then all the keys are strings. We know that when we go to do the update, so we can intern all the strings (going to do that anyway) and then it's a quick check if it already exists. If it's a regular dict, then we calculate hashes as normal. Updating the value is just a decref, incref and assignment.

If not all these conditions are met, we convert to a regular dict. The proposed function was going to raise an error in this case, so all we've done is make it transparent. The biggest downside is now you don't get a warning that your preferred optimization isn't actually working when you pass in new_items with different keys from what were in existing_dict.
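The two paragraphs above can be modelled in pure Python to make the behaviour concrete. This is a toy sketch only: it does not reflect CPython's real C layout, it skips the string-interning step, and the class names SharedKeys and SplitDict are hypothetical. Keys live in a table shared between instances, values live per instance, updating an existing key just replaces a slot, and an unknown key transparently converts the instance to a regular dict.

```python
class SharedKeys:
    """Hypothetical shared key table: maps each key to a value-slot index."""

    def __init__(self, keys):
        self.index = {key: slot for slot, key in enumerate(keys)}


class SplitDict:
    """Toy model of a key-sharing dict (not CPython's actual implementation)."""

    def __init__(self, shared, values):
        self.shared = shared          # key table shared with other instances
        self.values = list(values)    # per-instance values, one per slot
        self.combined = None          # regular dict used after a fallback

    def update(self, items):
        for key, value in items:
            if self.combined is None:
                slot = self.shared.index.get(key)
                if slot is not None:
                    # Existing key: replace the value, no rehashing needed.
                    self.values[slot] = value
                    continue
                # Unknown key: silently convert to a regular dict.
                self.combined = dict(zip(self.shared.index, self.values))
            self.combined[key] = value

    def is_shared(self):
        return self.combined is None

    def to_dict(self):
        if self.combined is not None:
            return dict(self.combined)
        return dict(zip(self.shared.index, self.values))


shared = SharedKeys(["x", "y"])
p = SplitDict(shared, [1, 2])
q = SplitDict(shared, [5, 6])

p.update([("x", 10)])   # only existing keys: p keeps sharing the key table
p_after_first = (p.is_shared(), p.to_dict())

p.update([("z", 3)])    # new key: p falls back to a regular dict
p_after_second = (p.is_shared(), p.to_dict())
```

The fallback is silent, which mirrors the downside mentioned above: nothing warns the caller that the shared-key layout was lost.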

Note that .update() would probably require a dict or key/value tuples here, but if you have the keys in a tuple already then zip() is going to be good enough for setting it (in fact, zip(existing_dict, new_values) should be fine, and we can internally special-case that scenario, too). I'd assumed the benefit was in memory usage after construction, rather than speed-to-construct, since everyone keeps talking about "key-sharing dictionaries" and not "arrays" ;)

(Randomizing side note: is this scenario enough to make a case for a built-in data frame type?)

My primary concern is still to avoid making CPython performance
characteristics part of the Python language definition. That only makes
it harder for alternate implementations.


Note that this proposal is not only for key sharing dict:

* We can avoid rebuilding hash table again and again.
* We can avoid checking duplicated keys again and again.

These characteristics are not only for Python, but for all mapping
implementations using hash table.

I believe all of these are met by making d2=dict(d1) construct a dict d2 that shares keys with d1 by default. Can you show how they are not?


* when you only d2.update existing keys, there is no need to rebuild the table

* a duplicated key overwrites multiple times; what else are you going to do? This is already the easiest, fastest, uses the least memory and is most consistent with every other form of setting dict items. Why complicate things by checking them? Let the caller do it.
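Both points describe how plain dicts already behave at the language level, which can be checked directly. The sketch shows observable semantics only; whether CPython rebuilds or shares the underlying hash table is an implementation detail not visible here.

```python
d1 = {"a": 1, "b": 2}
d2 = dict(d1)

# Updating only existing keys: the key set and insertion order are unchanged.
d2.update([("b", 20), ("a", 10)])

# A duplicated key is simply overwritten; the last value wins, no error.
d2.update([("a", 100), ("a", 200)])

# The same rule applies when constructing a dict from pairs.
from_pairs = dict([("k", 1), ("k", 2)])
```

Here d2 ends up as {"a": 200, "b": 20} with keys still in the order "a", "b", d1 is untouched, and from_pairs is {"k": 2}.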


Cheers,
Steve
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev