On Fri, 12 Feb 2016 04:02 pm, Chris Angelico wrote:

> On Fri, Feb 12, 2016 at 3:45 PM, Paulo da Silva
> <p_s_d_a_s_i_l_v_a...@netcabo.pt> wrote:
>>> Correct. Two equal strings, passed to sys.intern(), will come back as
>>> identical strings, which means they use the same memory. You can have
>>> a million references to the same string and it takes up no additional
>>> memory.
>> I have been playing with this and found that it is not always true!
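The claim quoted above can be checked with a short sketch (assuming CPython; the variable names are illustrative only). The two strings are built at runtime with str.join so the compiler cannot merge them into one shared constant. They compare equal but are separate objects until both pass through sys.intern(), which hands back one shared object for as long as a reference to it is held.

```python
import sys

# Build two equal strings at runtime so the compiler cannot fold them
# into a single shared constant object.
a = "".join(["hello", " ", "world"])
b = "".join(["hello", " ", "world"])

print(a == b)   # True: equal values
print(a is b)   # False: two distinct objects in memory

# Interning both yields the same object. `ia` keeps the interned
# string alive, so the lookup for `b` finds that same entry.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True: one shared object
```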
It is true, but only for the lifetime of the string. Once the string is
garbage collected, it is removed from the cache as well. If you then add
the string again, you may not get the same id.

py> import sys
py> mystr = "hello world"
py> str2 = sys.intern(mystr)
py> str3 = "hello world"
py> mystr is str2  # same string object, as str2 is interned
True
py> mystr is str3  # not the same string object
False

But if we delete all references to the string object, its entry is also
removed from the intern cache, and we may not get the same id:

py> del str2, str3
py> id(mystr)  # remember this ID number
3079482600
py> del mystr
py> id(sys.intern("hello world"))  # a new entry in the cache
3079227624

This is the behaviour you want: if a string is completely deleted, you
don't want it remaining in the intern cache taking up memory.

> I'm not 100% sure of what's going on here, but my suspicion is that a
> string that isn't being used is allowed to be flushed from the
> dictionary. If you retain a reference to the string (not to its id,
> but to the string itself), you shouldn't see that change. By doing the
> dict yourself, you guarantee that ALL the strings will be retained,
> which can never be _less_ memory than interning them all, and can
> easily be _more_.

Yep. Back in the early days, interned strings were immortal and lasted
forever. That wasted memory, and is no longer the case.

-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list