On Tue, Dec 03, 2019 at 10:26:35AM -0800, Andrew Barnert wrote:

> If you’re using interning for functionality, to distinguish two equal
> strings that came from different inputs or processes, your code is
> probably broken.

That's not how interning works. The purpose of interning is to *remove*
the distinction between values that come from different inputs, to
guarantee that they are the same object. Not to distinguish them!

> Python is allowed to merge distinct equal values of
> builtin immutable types whenever it wants to.

True. And so am *I*, the coder, but I have to do it myself, the language
no longer has a built-in intern() function to help (and even when it
did, it only worked on strings, not ints or floats or fractions or
tuples of same).

> And different
> interpreters, and even different CPython versions, may do that in
> different cases. That means any code that relies on the result of is
> on two equal immutable values is wrong.

No. That means any code that relies on the *interpreter* interning
values in a particular way is wrong. If the code itself does its own
interning, then it controls what gets interned and when, using whatever
strategy makes sense for its own use.

Why would you want to? Well, we already have at least one std lib
memoisation decorator, `functools.lru_cache`, and that's sort of a kind
of interning, so the idea is clearly not that preposterous. Whether it
would be useful in practice is, as I already acknowledged, rather
speculative.

> You could try to optimize your code by interning a bunch of your
> strings and then using `a is b or a == b` instead of just `a == b`,
> but this will almost always make it slower, not faster.

I don't believe that assertion without evidence:

1. A lot of collections define element equality using an identity test
   first as an optimization (even if that means that they do the wrong
   thing when NANs are involved). So that's prima facie evidence that
   using `is` will be faster.

2. That also includes strings.
   Being able to do an `is` comparison is a major speed-up for large
   strings:

       $ ./python -m timeit -s "s = 'abcde'*1000000" -s "t = s" "s == t"
       1000000 loops, best of 5: 313 nsec per loop
       $ ./python -m timeit -s "s = 'abcde'*1000000" -s "t = s[0] + s[1:]" "s == t"
       20 loops, best of 5: 15.6 msec per loop

3. `is` is a pointer comparison handled by the interpreter as a single
   opcode; `==` is an operator which has to look up the object's class,
   look up its `__eq__` method, and call it. The overhead is much
   higher.

but in any case, the purpose of interning is not to encourage the coder
to use `is`. Generally it is to save the time required to construct new
instances (if possible), or at least save the memory required to hold
lots of equal immutable instances.

> > The Python interpreter interns at least two kinds of objects: ints and
> > strings, or rather, *some* ints and strings.
>
> This is of course the CPython interpreter; different interpreters will be
> different.

Yes, you are correct, mea culpa.

Anyway, I think I've said enough about interning. Without a good way to
experiment, it's hard to say whether the idea would go anywhere or not,
or whether it offers anything that lru_cache doesn't offer.

-- 
Steven
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/77WKM3X3VOWG7DMM7EYQVPS4FIMFX4OO/
Code of Conduct: http://python.org/psf/codeofconduct/