On 2019-12-03 8:34 a.m., Steven D'Aprano wrote:
On Tue, Dec 03, 2019 at 01:54:44AM -0800, Andrew Barnert via Python-ideas wrote:
> On Dec 2, 2019, at 16:27, Soni L. <fakedme...@gmail.com> wrote:
> > > > Even use-cases where you have different objects whose differences are ignored for __eq__ and __hash__ and you want to grab the one from the set ignoring their differences would benefit from this. > > A more concrete use case might help make the argument better.

Is interning concrete enough?

The main reason I spelled "interning" as "interning(?)" is that, uh, as far as I can tell we kinda lack weak sets, and they're pretty important for interning. I could be wrong tho.

Other than that, I'd definitely prefer sets over dicts for interning. I also believe sets are better represented as key-key mappings, not key-None nor key-True, as such I've taken to treating sets as equivalent to key-key mappings for the purposes of my library, but this is a bit of a pain point due to the lack of indexing. Both lists and dicts have indexing, so there's no issue treating them as mappings, but sets *don't* have indexing, so a current wart in my DSL is that you can index lists and dicts but not sets, and yet you can iterate and filter all 3. I could (and maybe I should) add a special case for sets, but idk.

The Python interpreter interns at least two kinds of objects: ints and
strings, or rather, *some* ints and strings. Back in Python 1.5, there
was a built-in for interning strings:

     # Yes I still have a 1.5 interpreter :-)
     >>> a = intern("hello world")
     >>> b = intern("hello world")
     >>> a is b

so perhaps people might like to track down the discussion for and
against removing intern.

We can get the same effect with a dict, but at the cost of using two
pointers per interned object (one as the key, one as the value):

     cache = {}
     def intern(obj):
         return cache.setdefault(obj, obj)

You could cut that to one pointer by using a set, at the expense of
making retrieval slower and more memory-hungry:

     # untested
     cache = set()
     def intern(obj):
         if obj in cache:
             return cache - (cache - {obj})
         return obj

The interpreter interns only a subset of ints and strings because to
intern more would just waste memory for no use. But that's because the
interpreter has to consider arbitrary programs. If I knew that my
program was generating billions of copies of the same subset of values,
I might be able to save memory (and time?) by interning them.

This is terribly speculative of course, but with no easy way to
experiment, speculating is all I can do.

Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
Message archived at 
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to