New submission from Larry Hastings:

A visual inspection of PyUnicode_InternInPlace() suggests there might be a rare 
edge-case bug lurking there.

Specifically, the bug has to do with "interned mortal" strings.  Interned 
mortal strings are stored in the static "interned" dictionary, with their key 
and value set to the same string.  (If "x" is interned, then in the interned 
dictionary "x" is set to "x".)  Under normal circumstances, these two 
references would keep the string alive forever.  However, in order for the 
string to be *mortal*, the unicode object then DECREFs the string twice so the 
references in the "interned" dict no longer "count".  When the refcnt reaches 
0, and the string is being destroyed, unicode_dealloc() temporarily resurrects 
the object, bumping its reference count up to 3.  It then removes the string 
from the interned dict and destroys it as normal.

The actual intern substitution happens in a function called 
PyUnicode_InternInPlace().  This function looks the string up in the "interned" 
dictionary, and if the string is there it substitutes in the interned version 
from the dictionary.  There's a curious comment in that function:

    /* It might be that the GetItem call fails even
       though the key is present in the dictionary,
       namely when this happens during a stack overflow. */
    Py_ALLOW_RECURSION
    t = PyDict_GetItem(interned, s);
    Py_END_ALLOW_RECURSION

If t is NULL, then PyUnicode_InternInPlace() goes on to set t in the interned 
dictionary, reduces the refcnt by 2, etc etc.

The problem: if t is in the dictionary, but PyDict_GetItem() fails (during a 
stack overflow?), then the subsequent PyDict_SetItem() will *overwrite* the 
existing interned string.  Which means the "interned" dict will drop its two 
references, which means two REFCNTs.  If there were only 1 or 2 extant 
references to this string, it means the string will be immediately dealloc'd, 
*even though the string was still in use*.

I suppose this is already such a speculative condition that it's probably not 
worrying over.  If PyDict_GetItem fails due to stack overflow, perhaps the 
Python process is doomed to fail soon anyway.  Perhaps PyDict_SetItem has no 
chance of working.  But it *seems* like this code is intended to function 
correctly even when PyDict_GetItem fails due to stack overflow, and I suspect 
it won't.

----------
components: Interpreter Core
messages: 278421
nosy: larry
priority: low
severity: normal
stage: test needed
status: open
title: Possible PyUnicode_InternInPlace() edge-case bug
type: behavior
versions: Python 3.5, Python 3.6, Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28406>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to