> On Dec 3, 2019, at 03:41, Steven D'Aprano <st...@pearwood.info> wrote:
>
> On Tue, Dec 03, 2019 at 01:54:44AM -0800, Andrew Barnert via Python-ideas wrote:
>>>>> On Dec 2, 2019, at 16:27, Soni L. <fakedme...@gmail.com> wrote:
>>>> Even use-cases where you have different objects whose differences are
>>>> ignored for __eq__ and __hash__ and you want to grab the one from the set
>>>> ignoring their differences would benefit from this.
>> A more concrete use case might help make the argument better.
>
> Is interning concrete enough?

No. A concrete use for interning would be, but interning itself isn’t.

If you’re using interning for functionality, to distinguish two equal strings that came from different inputs or processes, your code is probably broken. Python is allowed to merge distinct equal values of builtin immutable types whenever it wants to. And different interpreters, and even different CPython versions, may do that in different cases. That means any code that relies on the result of `is` on two equal immutable values is wrong.

If you don’t care about portability or future compatibility, you could always work out the rules for one interpreter, version, and build. But they’re pretty complicated. IIRC, the current rules for a default build of CPython are something like this:

* Two equal string literals in the same scope are identical.
* Two string expressions in the same scope with equal values that the optimizer is able to turn into constants are identical.
* There’s some rule for interactive literals that I don’t remember, so even though two top-level interactive statements are compiled and evaluated as separate scopes they can still share constant string values.
* Two empty strings are identical if they’re created by any builtin, but it’s possible to create distinct ones with the C API.
* Some single-character strings are treated the same as the empty string; the exact set is a compile-time option but defaults to all printable ASCII characters or all ASCII characters or something like that.
* Copying a string with [:] or even copy.deepcopy gives you the same string.

And there are similar but not identical rules for bytes and int, while bools and None are stricter (even C extensions can’t give you a distinct but equal None value), and float and tuple are looser (inf is a singleton like "", but every float('inf') returns a new value anyway). And I can’t remember how tuple scope merging changed when tuples deeper than 1 were allowed to become constants.
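
To make the point about `is` concrete, here is a minimal sketch. The helper name make_greeting is made up for illustration, and the comment about what CPython currently does is an observation about one implementation, not a guarantee:

```python
def make_greeting(name):
    # Hypothetical helper: the result is built at run time, so the
    # compiler cannot fold it into a shared constant.
    return "hello, " + name


a = make_greeting("world")
b = make_greeting("world")

# Value comparison is the only thing the language promises here.
equal = (a == b)

# Identity of two equal strings is unspecified. Current CPython builds
# typically create two separate objects in this case, but another
# interpreter or version is free to merge them.
identical = (a is b)

print("a == b:", equal)
print("a is b:", identical, "(implementation detail, do not rely on it)")
```

Whatever the second line prints on your interpreter, code that branches on it is relying on an accident of the build.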

So, what can you actually safely do with interning?

You could try to optimize your code by interning a bunch of your strings and then using `a is b or a == b` instead of just `a == b`, but this will almost always make it slower, not faster.

What about optimizing for memory instead of speed? Interning a string would waste, say, 24 bytes, but if you have 1000 copies of that same N-byte string, N+24 is a lot better than N*1000. But what kind of application are you building that stores vast numbers of duplicates of strings and isn’t storing them in a set or dict or database or custom b-tree or trie or whatever? And once you do that, it doesn’t matter whether the boxed Python values are interned, only whether the values inside that data structure are collapsed (and in all those cases, they either are or trivially could be).

Maybe you can come up with some application that does need to store a billion copies of only a thousand strings, and needs to store them in a list (or a billion separate locals, I guess…). If so, then you’ve got a concrete use case.

> The Python interpreter interns at least two kinds of objects: ints and
> strings, or rather, *some* ints and strings.

This is of course the CPython interpreter; different interpreters will be different.

> Back in Python 1.5, there
> was a built-in for interning strings:
>
>     # Yes I still have a 1.5 interpreter :-)
>     >>> a = intern("hello world")
>     >>> b = intern("hello world")
>     >>> a is b
>     1

And (at least in Pythonista, which currently embeds CPython 3.6.1, but I’m not sure its REPL behavior is always identical to the stock one):

    >>> a = 'hello'
    >>> b = 'hello'
    >>> a is b
    True

By the way, intern was still a builtin through 2.7, but in that list of “we can’t deprecate these but please never use them” functions at the end of builtins, so you didn’t actually need 1.5 to test it. But I understand; you can never be too sure that the 2.0 license won’t turn out to be as unusable as the 1.6 license, so you need something to fall back on.
:)

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/GI5YSUB5NSXQDZNML7EGPVT7RA5BTSDY/
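
For the memory argument above (many copies of a few strings kept in a plain list), here is a rough sketch, assuming Python 3, where the old intern builtin is available as sys.intern. The function name load_tags and the sample data are invented for illustration:

```python
import sys


def load_tags(raw_tags):
    """Return the same values, with equal strings sharing one object."""
    # sys.intern returns the canonical copy of each string, so the
    # resulting list holds many references to a few objects.
    return [sys.intern(tag) for tag in raw_tags]


# Simulated input: 1000 strings built at run time, only 3 distinct values.
raw = ["".join(["sta", "tus"]) + str(i % 3) for i in range(1000)]
tags = load_tags(raw)

# Count distinct objects (not distinct values) held by each list.
distinct_before = len({id(s) for s in raw})
distinct_after = len({id(s) for s in tags})

print("objects before interning:", distinct_before)
print("objects after interning: ", distinct_after)
```

The values in tags still compare equal to those in raw; only the number of underlying objects changes. Whether that saving matters is exactly the question of whether you really need the list rather than a dict or set.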