On 22Apr2019 2143, Inada Naoki wrote:
On Tue, Apr 23, 2019 at 11:30 AM Steve Dower <steve.do...@python.org> wrote:

Or possibly just "dict(existing_dict).update(new_items)".


Do you mean .update accepts a tuple of values?
I can't think it's

Not sure what you were going to go on to say here, but why not?

If it's a key-sharing dict, then all the keys are strings. We know that when we go to do the update, so we can intern all the strings (going to do that anyway) and then it's a quick check if it already exists. If it's a regular dict, then we calculate hashes as normal. Updating the value is just a decref, incref and assignment.

If not all these conditions are met, we convert to a regular dict. The proposed function was going to raise an error in this case, so all we've done is make it transparent. The biggest downside is now you don't get a warning that your preferred optimization isn't actually working when you pass in new_items with different keys from what were in existing_dict.
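For comparison, here is a rough pure-Python sketch of the two behaviours. The name update_existing_only is hypothetical and only stands in for the strict function being discussed. It is not an existing API, and it says nothing about how key sharing would be handled internally.

```python
def update_existing_only(target, new_items):
    """Hypothetical strict update: refuse keys not already in target."""
    new_items = dict(new_items)
    unknown = [key for key in new_items if key not in target]
    if unknown:
        # The strict variant fails loudly instead of silently adding keys.
        raise KeyError(f"keys not present in target: {unknown!r}")
    target.update(new_items)


strict = {"x": 1, "y": 2}
update_existing_only(strict, {"x": 10})    # fine: "x" already exists

transparent = {"x": 1, "y": 2}
transparent.update({"x": 10, "z": 3})      # plain update just adds "z"
```

With the plain update the caller gets no signal that "z" was new, which is the downside mentioned above.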

Note that .update() would probably require a dict or key/value tuples here - but if you have the keys in a tuple already then zip() is going to be good enough for setting it (in fact, zip(existing_dict, new_values) should be fine, and we can internally special-case that scenario, too). I'd assumed the benefit was in memory usage after construction, rather than speed-to-construct, since everyone keeps talking about "key-sharing dictionaries" and not "arrays" ;)
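A minimal sketch of the zip() form using only today's dict API. The sample keys and values are made up for illustration. Since dict.update() returns None, the copy has to be bound to a name before updating it.

```python
existing_dict = {"name": "a", "size": 1, "color": "red"}
new_values = ("b", 2, "blue")  # same order as the keys of existing_dict

# Copy first, then update in place; update() itself returns None.
d2 = dict(existing_dict)
d2.update(zip(existing_dict, new_values))
```

Iterating a dict yields its keys in insertion order, so zip() pairs each existing key with the value at the same position.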

(Randomizing side note: is this scenario enough to make a case for a built-in data frame type?)

My primary concern is still to avoid making CPython performance
characteristics part of the Python language definition. That only makes
it harder for alternate implementations.

Note that this proposal is not only for key sharing dict:

* We can avoid rebuilding hash table again and again.
* We can avoid checking duplicated keys again and again.

These characteristics are not only for Python, but for all mapping
implementations using hash table.

I believe all of these are met by making d2=dict(d1) construct a dict d2 that shares keys with d1 by default. Can you show how they are not?

* when d2.update only touches existing keys, there is no need to rebuild the table
* a duplicated key overwrites multiple times - what else are you going to do? This is already easiest, fastest, uses the least memory and is most consistent with every other form of setting dict items. Why complicate things by checking them? Let the caller do it.
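A small sketch of the duplicate-key behaviour as it exists in current Python (the values are arbitrary): the last value wins, and the key keeps its original position.

```python
d1 = {"a": 1, "b": 2}
d2 = dict(d1)

# "a" appears twice in the update; each occurrence simply overwrites.
d2.update([("a", 10), ("a", 20)])
```

After this, d2["a"] is 20 and d1 is unchanged, the same as assigning d2["a"] twice in a row.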

Cheers,
Steve
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev