[Python-ideas] Re: Sets for easy interning(?)

Steven D'Aprano Tue, 03 Dec 2019 20:48:33 -0800

On Tue, Dec 03, 2019 at 10:26:35AM -0800, Andrew Barnert wrote:

> If you’re using interning for functionality, to distinguish two equal 
> strings that came from different inputs or processes, your code is 
> probably broken.


That's not how interning works. The purpose of interning is to *remove* 
the distinction between values that come from different inputs, to 
guarantee that they are the same object. Not to distinguish them!


> Python is allowed to merge distinct equal values of 
> builtin immutable types whenever it wants to.

True. And so am *I*, the coder, but I have to do it myself: `intern()` 
is no longer a builtin (in Python 3 it lives on as `sys.intern()`), and 
it has only ever worked on strings, not ints or floats or fractions or 
tuples of same.


> And different 
> interpreters, and even different CPython versions, may do that in 
> different cases. That means any code that relies on the result of `is` 
> on two equal immutable values is wrong.

No. That means any code that relies on the *interpreter* interning 
values in a particular way is wrong. If the code itself does its own 
interning, then it controls what gets interned and when, using whatever 
strategy makes sense for its own use.
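
As a minimal sketch of code doing its own interning (the `intern_value` 
helper and the `_interned` table are hypothetical names made up for this 
illustration, not anything in the std lib), a dict from each value to its 
canonical instance is enough:

```python
from fractions import Fraction

# Hypothetical interning table: maps (type, value) to the one canonical
# object for that value. Keying on the type as well keeps 1, 1.0 and True
# apart, since they compare equal and hash the same.
_interned = {}

def intern_value(value):
    """Return the canonical object equal to `value`, storing it on first sight."""
    return _interned.setdefault((type(value), value), value)

a = intern_value(Fraction(1, 3))
b = intern_value(Fraction(2, 6))  # equal value, separately constructed
assert a == b
assert a is b                     # same object once both went through the table
```

Here the code, not the interpreter, decides what gets interned, so relying 
on `is` for values that went through `intern_value` is safe. This sketch 
never evicts anything, so the table grows for as long as new values arrive.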

Why would you want to? Well, we already have at least one std lib 
memoisation decorator, `functools.lru_cache`, and that's a kind of 
interning, so the idea is clearly not that preposterous.

Whether it would be useful in practice is, as I already acknowledged, 
rather speculative.
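
To make the `lru_cache` comparison concrete, here is a rough sketch (the 
`cached_fraction` function is a hypothetical example, not a proposal). 
Repeated calls with the same arguments hand back the same object, but the 
cache is keyed on the arguments rather than on the resulting value:

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_fraction(numerator, denominator):
    # Hypothetical factory; the cache key is the argument tuple.
    return Fraction(numerator, denominator)

x = cached_fraction(1, 3)
y = cached_fraction(1, 3)
assert x is y          # same arguments: the cached object is returned

z = cached_fraction(2, 6)
assert z == x          # equal value...
assert z is not x      # ...but a different key, so a different object
```

So memoisation only merges values that were requested in the same way, 
whereas interning merges any values that compare equal.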


> You could try to optimize your code by interning a bunch of your 
> strings and then using `a is b or a == b` instead of just `a == b`, 
> but this will almost always make it slower, not faster.

I don't believe that assertion without evidence:

1. A lot of collections define element equality using an identity test 
first as an optimization (even if that means that they do the wrong 
thing when NANs are involved). So that's prima facie evidence that using 
`is` will be faster.

2. That also includes strings. Being able to do an `is` comparison is a 
major speed-up for large strings:

$ ./python -m timeit -s "s = 'abcde'*1000000" -s "t = s" "s == t"
1000000 loops, best of 5: 313 nsec per loop

$ ./python -m timeit -s "s = 'abcde'*1000000" -s "t = s[0] + s[1:]" "s == t"
20 loops, best of 5: 15.6 msec per loop

3. `is` is a pointer comparison handled by the interpreter as a single 
opcode; `==` is an operator which has to look up the object's class, 
look up its `__eq__` method, and call it. The overhead is much higher.
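
The identity shortcut from point 1 is easy to observe with a NaN, since a 
NaN is not equal to itself. This is a small illustration of the behaviour 
in the built-in list type, not a benchmark:

```python
nan = float("nan")

assert nan != nan                  # a NaN never compares equal, even to itself
assert nan in [nan]                # found anyway: identity is checked before ==
assert [nan] == [nan]              # list equality takes the same shortcut
assert float("nan") not in [nan]   # a different NaN object is not found
```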


But in any case, the purpose of interning is not to encourage the coder 
to use `is`. Generally it is to save the time required to construct new 
instances (if possible), or at least save the memory required to hold 
lots of equal immutable instances.



> > The Python interpreter interns at least two kinds of objects: ints and 
> > strings, or rather, *some* ints and strings.
> 
> This is of course the CPython interpreter; different interpreters will be 
> different.

Yes, you are correct, mea culpa.


Anyway, I think I've said enough about interning. Without a good way to 
experiment, it's hard to say whether the idea would go anywhere or not, 
or whether it offers anything that lru_cache doesn't offer.


-- 
Steven
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/77WKM3X3VOWG7DMM7EYQVPS4FIMFX4OO/
Code of Conduct: http://python.org/psf/codeofconduct/
