[issue44222] Improving _removeHandlerRef for a very long _handlerList

2021-05-25 Thread Yonatan Goldschmidt


Change by Yonatan Goldschmidt:


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue44222] Improving _removeHandlerRef for a very long _handlerList

2021-05-24 Thread Yonatan Goldschmidt


Yonatan Goldschmidt added the comment:

That system has many (>10k) objects of a certain type, each with a separate 
Logger object plus Handlers. That was done, among other reasons, to allow easy, 
file-based consumption of the logs relating to a particular object.

I agree that it's not common at all (other Python projects I've worked on don't 
exhibit this behavior), but since this change is harmless I thought we might as 
well make it.

--




[issue44222] Improving _removeHandlerRef for a very long _handlerList

2021-05-23 Thread Yonatan Goldschmidt


Change by Yonatan Goldschmidt:


--
keywords: +patch
pull_requests: +24918
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/26325




[issue44222] Improving _removeHandlerRef for a very long _handlerList

2021-05-23 Thread Yonatan Goldschmidt


New submission from Yonatan Goldschmidt:

We have an application which creates 10,000s of logging.Logger & 
logging.Handler objects. I encountered a major slowdown during GCs which happen 
to collect dead Handler objects, calling logging._removeHandlerRef() along the 
way.
That function looks like:

def _removeHandlerRef(wr):
    acquire, release, handlers = _acquireLock, _releaseLock, _handlerList
    if acquire and release and handlers:
        acquire()
        try:
            if wr in handlers:
                handlers.remove(wr)
        finally:
            release()

and is called as the callback of a weakref attached to each Handler in 
_addHandlerRef().
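
For context, _addHandlerRef() registers each handler roughly like this 
(paraphrased from Lib/logging/__init__.py of the same era; the weakref's 
callback is _removeHandlerRef):

def _addHandlerRef(handler):
    """Add a handler to the internal cleanup list using a weak reference."""
    _acquireLock()
    try:
        _handlerList.append(weakref.ref(handler, _removeHandlerRef))
    finally:
        _releaseLock()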

The slowdown occurs in the "in" operator and in the remove() call: both are 
linear scans, which is expensive on very long lists. My suggestion is that, 
without modifying the observed behavior of this function, it can be changed to 
perform only one scan of the list, by simply calling remove() and ignoring the 
ValueError if it gets raised.
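
A sketch of the suggested change (the actual patch is in PR 26325, linked 
elsewhere in this thread):

def _removeHandlerRef(wr):
    acquire, release, handlers = _acquireLock, _releaseLock, _handlerList
    if acquire and release and handlers:
        acquire()
        try:
            handlers.remove(wr)
        except ValueError:
            # Already removed: same observable outcome as the old
            # "if wr in handlers" guard, but with a single list scan.
            pass
        finally:
            release()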

Attached is a small script which demonstrates the issue. It creates N1 Handler 
objects, then deletes the strong references to the last N2 of them and triggers 
a GC, measuring the time it takes. Additionally, we can measure its entire 
execution: all remaining Handler objects are cleaned up on exit, eventually 
calling _removeHandlerRef() for each one, which drastically slows down the 
interpreter shutdown.
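
A hypothetical reconstruction of the attached demo.py (the real script is at 
https://bugs.python.org/file50060/demo.py and may differ; in particular, the 
per-handler reference cycle is my assumption, made so that the dead handlers 
die in the collector rather than via plain reference counting):

import gc
import logging
import sys
import time

def main():
    n1, n2 = int(sys.argv[1]), int(sys.argv[2])

    handlers = []
    for _ in range(n1):
        h = logging.Handler()   # registers a weakref in logging._handlerList
        h.cycle = h             # assumed: tie each handler into a cycle
        handlers.append(h)
    print(f"len(logging._handlerList)={len(logging._handlerList)}")

    # Drop the strong references to the last N2 handlers and time the GC run
    # that ends up calling _removeHandlerRef() for each of them.
    del handlers[-n2:]
    start = time.monotonic()
    gc.collect()
    print(f"len(logging._handlerList)={len(logging._handlerList)}")
    print(f"gc.collect() took {time.monotonic() - start}s")

if __name__ == "__main__":
    main()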

Running the demo with N1=30k and N2=2k:

root@8aafe9d1308b:/# python -V
Python 3.9.2
root@8aafe9d1308b:/# time python demo.py 30000 2000
len(logging._handlerList)=30001
len(logging._handlerList)=28001
gc.collect() took 1.552838139992673s

real    0m12.626s
user    0m12.618s
sys     0m0.008s

then applying the patch...

root@8aafe9d1308b:/# (cd /usr/local/lib/python3.9/ && patch -p2 < /patch )
patching file logging/__init__.py
Hunk #1 succeeded at 826 (offset -19 lines).

and trying again:

root@8aafe9d1308b:/# time python demo.py 30000 2000
len(logging._handlerList)=30001
len(logging._handlerList)=28001
gc.collect() took 0.8691686679958366s

real    0m7.134s
user    0m7.130s
sys     0m0.004s
root@8aafe9d1308b:/#

Almost a 2x speedup.

I also tried removing the acquire/release locks (now that the remove operation 
is atomic...). I'm not sure it maintains the semantics, and it didn't have much 
of a positive effect on the run-time anyway, so I didn't continue looking into 
it.

--
components: Library (Lib)
files: demo.py
messages: 394229
nosy: Yonatan Goldschmidt
priority: normal
severity: normal
status: open
title: Improving _removeHandlerRef for a very long _handlerList
type: performance
versions: Python 3.11
Added file: https://bugs.python.org/file50060/demo.py




[issue31356] Add context manager to temporarily disable GC

2021-05-01 Thread Yonatan Goldschmidt


Change by Yonatan Goldschmidt:


--
nosy: +Yonatan Goldschmidt




[issue36656] Add race-free os.link and os.symlink wrapper / helper

2021-05-01 Thread Yonatan Goldschmidt


Change by Yonatan Goldschmidt:


--
nosy: +Yonatan Goldschmidt




[issue29657] os.symlink: FileExistsError shows wrong message

2021-05-01 Thread Yonatan Goldschmidt


Yonatan Goldschmidt added the comment:

Just reached this issue independently (spent a few minutes debugging an error 
message like "FileExistsError: [Errno 17] File exists: 'a' -> 'b'", where 'a' 
didn't exist...)

I agree with Rich on this - for me, the source of confusion was that the way 
Python uses the arrow notation is just swapped from how ls(1) uses it.
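
A minimal illustration of the swap (my own example, not from this issue), 
assuming a directory where 'b' exists and 'a' does not:

import os

open("b", "w").close()   # create 'b'; 'a' does not exist
os.symlink("a", "b")
# Raises: FileExistsError: [Errno 17] File exists: 'a' -> 'b'
# It is 'b' that already exists, yet the message reads as if 'a' does;
# ls -l would list a real link the other way around, as "b -> a".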

Stage is "patch review" but I couldn't find any PR for it; if we're good with 
Larry's solution then I'm happy to post a PR implementing it.

------
nosy: +Yonatan Goldschmidt




[issue41545] gc API requiring matching number of gc.disable - gc.enable calls

2021-01-21 Thread Yonatan Goldschmidt


Yonatan Goldschmidt added the comment:

Dennis, you're right. I guess I missed it when I previously searched for 
matching issues. https://bugs.python.org/issue31356 indeed solves my problem. 
Closing this as a duplicate.

--
resolution:  -> duplicate
stage:  -> resolved
status: open -> closed




[issue42287] Use Py_NewRef & Py_XNewRef where applicable

2020-11-16 Thread Yonatan Goldschmidt


Yonatan Goldschmidt added the comment:

> I don't see how Py_NewRef() or Py_XNewRef() is less error prone. In my 
> experience, any change is more likely to introduce new bugs than leaving the 
> code unchanged.

Agree that inserting changes opens a door to introducing bugs.

However, the "end state" of having Py_NewRef() is desired, I think. It is more 
concise. It is less error prone because where you use it, you literally can't 
miss the "increment refcount" part when stealing a reference from another 
source. Py_INCREF has to come before/after stealing a ref, leaving room for 
error, IMHO.

> In general, we don't accept changes which are only coding style changes.

Didn't know that. Well if that's the case, then obviously there is no reason to 
work specifically on this conversion.

--




[issue42287] Use Py_NewRef & Py_XNewRef where applicable

2020-11-07 Thread Yonatan Goldschmidt


New submission from Yonatan Goldschmidt:

Following https://bugs.python.org/issue42262, I think it'd be nice to convert 
existing C code to use the more concise Py_NewRef() and Py_XNewRef() where 
possible. It increases readability and is less bug prone.

Opening this new issue to track the conversion.

--
components: Interpreter Core
messages: 380531
nosy: Yonatan Goldschmidt, vstinner
priority: normal
severity: normal
status: open
title: Use Py_NewRef & Py_XNewRef where applicable
type: enhancement
versions: Python 3.10




[issue42266] LOAD_ATTR cache does not fully replicate PyObject_GetAttr behavior

2020-11-04 Thread Yonatan Goldschmidt


Change by Yonatan Goldschmidt:


--
nosy: +Yonatan Goldschmidt




[issue42173] Drop Solaris support

2020-10-28 Thread Yonatan Goldschmidt


Change by Yonatan Goldschmidt:


--
nosy: +Yonatan Goldschmidt




[issue42143] Corruptions in func_dealloc() with partially-created function object

2020-10-24 Thread Yonatan Goldschmidt


Change by Yonatan Goldschmidt:


--
keywords: +patch
pull_requests: +21871
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/22953




[issue42143] Corruptions in func_dealloc() with partially-created function object

2020-10-24 Thread Yonatan Goldschmidt


New submission from Yonatan Goldschmidt:

While reading funcobject.c, I noticed that PyFunction_NewWithQualName() may 
exit early if it fails to retrieve __name__ from the globals dict.
It will destroy the partially-created PyFunction object. However, this happens 
before ->func_qualname is ever initialized. Since the object is allocated with 
PyObject_GC_New() (which does *not* provide zero-initialized memory), we end up 
calling Py_CLEAR() on uninitialized memory.

I wrote a simple snippet to produce this bug, expecting to see a crash due to 
the Py_DECREF() call on garbage memory:

from types import FunctionType
import random

class BadEq:
    def __hash__(self):
        print("hash called")
        return hash("__name__")
    def __eq__(self, other):
        print("eq called")
        raise Exception()

# run until we hit a 0x61616161.. pointer for
# PyFunctionObject->func_qualname, and crash
while True:
    s = int(random.random() * 1000) * "a"
    try:
        FunctionType(BadEq.__hash__.__code__, {BadEq(): 1})
    except Exception:
        pass

However, I got *another* crash immediately: func_dealloc() calls 
_PyObject_GC_UNTRACK() which crashes if _PyObject_GC_TRACK() was not called 
beforehand. When running with "--with-pydebug", you get this nice assert:

Objects/funcobject.c:595: _PyObject_GC_UNTRACK: Assertion "(((PyGC_Head *)(op)-1)->_gc_next != 0)" failed: object not tracked by the garbage collector

I replaced "_PyObject_GC_UNTRACK(op);" in func_dealloc() with:

if (PyObject_GC_IsTracked(op)) {
    _PyObject_GC_UNTRACK(op);
}

And ran my snippet again; this time it crashed after some loops with the error 
I was expecting to receive (here I print _py_tmp, the temporary variable inside 
Py_CLEAR()):

Program received signal SIGSEGV, Segmentation fault.
func_clear (op=op@entry=0x7f9c8a25faf0) at Objects/funcobject.c:587
587 Py_CLEAR(op->func_qualname);
(gdb) p _py_tmp
$1 = (PyObject *) 0x6161616161616161

This issue has existed since https://github.com/python/cpython/pull/2, which 
fixed tons of call sites to use PyDict_GetItemWithError(); I guess this 
specific flow was not tested.

As a fix, I think we should run the part that can result in errors *before* 
creating the PyFunction, so we can no longer get an error while we have a 
partial object. This fixes both of the problems I've described. Thus we can 
move the dict lookup part to the top of
PyFunction_NewWithQualName().

I see no reason not to make such a change in the function's flow. If we'd 
rather keep the flow as-is, we can make sure to initialize ->func_qualname to 
NULL, and wrap _PyObject_GC_UNTRACK() with _PyObject_GC_IS_TRACKED() in 
func_dealloc(). I like this solution less because it adds an otherwise 
unnecessary "if" statement in func_dealloc().

I'll post a PR in GitHub soon.

--
components: Interpreter Core
messages: 379546
nosy: Yonatan Goldschmidt
priority: normal
severity: normal
status: open
title: Corruptions in func_dealloc() with partially-created function object
type: crash
versions: Python 3.10




[issue42127] functools.cached_property possibly disables key-sharing instance dictionaries

2020-10-23 Thread Yonatan Goldschmidt


New submission from Yonatan Goldschmidt:

Key-sharing dictionaries, defined by https://www.python.org/dev/peps/pep-0412/, 
require that any resizing of the shared dictionary keys happen before a second 
instance of the class is created.

cached_property inserts its resolved result into the instance dict after it is 
called. This is likely to happen *after* a second instance has been created, 
and it is also likely to cause a resize of the dict, as demonstrated by this 
snippet:

from functools import cached_property
import sys

def dict_size(o):
    return sys.getsizeof(o.__dict__)

class X:
    def __init__(self):
        self.a = 1
        self.b = 2
        self.c = 3
        self.d = 4
        self.e = 5

    @cached_property
    def f(self):
        return id(self)

x1 = X()
x2 = X()

print(dict_size(x1))
print(dict_size(x2))

x1.f

print(dict_size(x1))
print(dict_size(x2))

x3 = X()
print(dict_size(x3))

Essentially it means that types using cached_property are less likely to enjoy 
the benefits of shared keys. It may also incur a certain performance hit, 
because a resize + unshare will happen every time.

A simple way I've thought of to let cached_property play more nicely with 
shared keys is to first create a single object of the class and set the 
cached_property attribute to some value (so the key is added to the shared 
dict). In the snippet above, if you add "x0 = X(); x0.f = None" before creating 
x1 and x2, you'll see that resolving the cached_property no longer unshares the 
dicts.
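
Spelled out as code (a sketch reusing X and dict_size from the snippet above; 
x0 is an illustrative name):

x0 = X()
x0.f = None   # cached_property is a non-data descriptor, so plain attribute
              # assignment adds 'f' to the keys while only one dict uses them
del x0

x1 = X()
x2 = X()
x1.f          # 'f' is already in the shared keys: no resize, no unsharing

print(dict_size(x1))  # x1's and x2's dicts remain shared
print(dict_size(x2))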

But I wonder if there's a way to do so without requiring user code changes.

--
components: Library (Lib)
messages: 379439
nosy: Yonatan Goldschmidt
priority: normal
severity: normal
status: open
title: functools.cached_property possibly disables key-sharing instance 
dictionaries
type: performance
versions: Python 3.10




[issue41545] gc API requiring matching number of gc.disable - gc.enable calls

2020-08-17 Thread Yonatan Goldschmidt


Yonatan Goldschmidt added the comment:

Hmm... I didn't think of overlapping lock periods. You are right, Dennis, a 
counter must be managed.

--




[issue41545] gc API requiring matching number of gc.disable - gc.enable calls

2020-08-17 Thread Yonatan Goldschmidt


Yonatan Goldschmidt added the comment:

> This is exactly the motivation for context managers, no? I attached no_gc.py, 
> which works when nested and should additionally be thread-safe.

My solution was roughly the same (also a context manager, but a bit simplified 
because I didn't need threading support so I didn't bother with locking).

> There is also gc.isenabled(), so couldn't you check that before disabling and 
> remember whether you needed to disable or not?

Yes, but it must be protected like Dennis suggested; otherwise it can't be used 
in a race-free way. For example, this snippet is susceptible to a thread switch 
between the `isenabled()` and `disable()` calls (so another thread could 
meanwhile disable GC, and we would retain a stale `was_enabled` result):

was_enabled = gc.isenabled()
gc.disable()
...
if was_enabled:
    gc.enable()

My points in this issue are:

1. I think that such a safer API should be available in the standard library (I 
don't want to find myself repeating this across different projects). I think 
that wherever you find yourself using `gc.disable()` you should actually be 
using a safer API that takes into account multithreading & previous invocations 
of `gc.disable()`; a sketch of what I have in mind follows below.
2. A tiny change in `gc.enable()` (as I suggested in msg375376) can make it 
substantially easier for the Python side to protect itself, because it will be 
race-free without any locks.
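
A sketch of such a helper (my illustration, not the attached no_gc.py; the 
NoGC name is made up): a re-entrant, thread-safe context manager that 
re-enables GC only when the outermost user exits:

import gc
import threading

class NoGC:
    def __init__(self):
        self._lock = threading.Lock()
        self._depth = 0

    def __enter__(self):
        with self._lock:
            if self._depth == 0:
                gc.disable()
            self._depth += 1

    def __exit__(self, *exc_info):
        with self._lock:
            self._depth -= 1
            if self._depth == 0:
                gc.enable()
        return False  # don't swallow exceptions

no_gc = NoGC()

def foo():
    with no_gc:
        ...  # GC stays disabled here

def bar():
    with no_gc:
        foo()  # nested use: GC is re-enabled only when bar()'s block exits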

--




[issue41545] gc API requiring matching number of gc.disable - gc.enable calls

2020-08-14 Thread Yonatan Goldschmidt


Yonatan Goldschmidt added the comment:

> If this race condition is a bug in gc, then we should fix that.

> If it is a bug in your code, surely you should fix that rather than disable 
> gc.

It's neither: it is something related to the combination of some 3rd party 
libraries I'm using (if you're interested, I have described it here: 
https://github.com/paramiko/paramiko/issues/1634).

> Either way, I don't think we should complicate the gc iterface by adding a 
> second way to enable and disable the cyclic gc.

I see what you're saying. I wasn't thinking of this idea as complicating the 
interface; I had in mind existing interfaces which have these 2 sets of 
functions (for example, Linux's local_irq_enable/disable and 
local_irq_save/restore).

Another approach, modifying the existing API only in a compatible way, would be 
as follows:

--- a/Modules/gcmodule.c
+++ b/Modules/gcmodule.c
@@ -1489,9 +1489,10 @@ static PyObject *
 gc_disable_impl(PyObject *module)
 /*[clinic end generated code: output=97d1030f7aa9d279 input=8c2e5a14e800d83b]*/
 {
     GCState *gcstate = get_gc_state();
+    int was_enabled = gcstate->enabled;
     gcstate->enabled = 0;
-    Py_RETURN_NONE;
+    return was_enabled ? (Py_INCREF(Py_True), Py_True)
+                       : (Py_INCREF(Py_False), Py_False);
 }

Then I can write code this way:

def foo():
    disabled = gc.disable()

    if disabled:
        gc.enable()

Taking it further, `gc.enable()` could be changed to `gc.enable(disabled=True)` 
so I can just call it as `gc.enable(disabled)`:

def foo():
    disabled = gc.disable()

    gc.enable(disabled)

--




[issue41545] gc API requiring matching number of gc.disable - gc.enable calls

2020-08-13 Thread Yonatan Goldschmidt


New submission from Yonatan Goldschmidt:

I have a construct in my code where I wrap a function's logic with 
`gc.disable()` to prevent GC from triggering some race condition.

Today this construct got a bit more complicated, and another function had to 
use this "trick". Now there are 2 flows:

def foo():
    gc.disable()

    gc.enable()

def bar():
    gc.disable()
    ...
    # bar calls foo, which also messes with the gc
    foo()
        gc.disable()
        ...
        gc.enable()
    ...
    gc.enable()
    ...

I'd expected the GC to be truly enabled only on the second call to `enable()`, 
but apparently it's enabled already on the first call :( Both `gc.enable()` and 
`gc.disable()` just write `1`/`0` to `gcstate->enabled`, respectively.

For the meantime I wrote a simple wrapper class to do this counting (finally 
calling `enable()` only when necessary).

Another option is for the code to check whether GC is enabled before disabling 
it, and to remember whether it really disabled it before enabling it again.

But I think those manual workarounds are more error prone, and it's easy to 
forget to use them at all sites (compared to using the "right" API from the 
standard module), so I was considering whether we can add an API that 
encapsulates this counting: an enable-disable pair that makes sure GC is only 
enabled back when the number of "enable" calls matches the number of "disable" 
calls.

--
components: Interpreter Core
messages: 375355
nosy: Yonatan Goldschmidt
priority: normal
severity: normal
status: open
title: gc API requiring matching number of gc.disable - gc.enable calls
type: enhancement
versions: Python 3.10




[issue10399] AST Optimization: inlining of function calls

2020-08-11 Thread Yonatan Goldschmidt


Change by Yonatan Goldschmidt:


--
nosy: +Yonatan Goldschmidt




[issue41427] howto/descriptor.rst unnecessarily mentions method.__class__

2020-07-28 Thread Yonatan Goldschmidt


Change by Yonatan Goldschmidt:


--
keywords: +patch
pull_requests: +20810
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/21665




[issue41427] howto/descriptor.rst unnecessarily mentions method.__class__

2020-07-28 Thread Yonatan Goldschmidt


New submission from Yonatan Goldschmidt:

In Doc/howto/descriptor.rst:

# Internally, the bound method stores the underlying function,
# the bound instance, and the class of the bound instance.
>>> d.f.__func__
<function D.f at 0x...>
>>> d.f.__self__
<__main__.D object at 0x1012e1f98>
>>> d.f.__class__
<class 'method'>

The bound method (PyMethodObject) does not store "the class of the bound 
instance" - it only stores the "function" and "self".
d.f.__class__ is the "method" type itself, not the class of d.f's instance (D, 
from d = D()).

I think this mention should be removed from the documentation?

--
assignee: docs@python
components: Documentation
messages: 374526
nosy: Yonatan Goldschmidt, docs@python
priority: normal
severity: normal
status: open
title: howto/descriptor.rst unnecessarily mentions method.__class__
type: enhancement
versions: Python 3.10




[issue35823] Use vfork() in subprocess on Linux

2020-06-26 Thread Yonatan Goldschmidt


Yonatan Goldschmidt added the comment:

With issue35537 merged, do we have any intention to get on with this? From what 
I can tell, it provides roughly the same performance benefit - but works also 
with close_fds=True, which is the default of Popen, so IMO it's a worthy 
addition.

I tested a rebase on current master: test_subprocess passes on my Linux box, 
and there are no more -Wclobbered warnings after Alexey's latest change to the 
do_fork_exec() helper.

--




[issue35823] Use vfork() in subprocess on Linux

2020-06-24 Thread Yonatan Goldschmidt


Change by Yonatan Goldschmidt:


--
nosy: +Yonatan Goldschmidt




[issue40222] "Zero cost" exception handling

2020-06-02 Thread Yonatan Goldschmidt


Change by Yonatan Goldschmidt:


--
nosy: +Yonatan Goldschmidt




[issue4264] Patch: optimize code to use LIST_APPEND instead of calling list.append

2020-05-17 Thread Yonatan Goldschmidt


Change by Yonatan Goldschmidt:


--
nosy: +Yonatan Goldschmidt




[issue39382] abstract_issubclass() doesn't take bases tuple item ref

2020-02-16 Thread Yonatan Goldschmidt


Change by Yonatan Goldschmidt:


--
pull_requests: +17906
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/18530




[issue39382] abstract_issubclass() doesn't take bases tuple item ref

2020-01-18 Thread Yonatan Goldschmidt


New submission from Yonatan Goldschmidt:

I encountered a crash using rpyc. Since rpyc is pure-Python, I guessed the 
problem is with Python itself.

Originally tested on v3.7; the rest of this issue is based on v3.9.0a2 
(compiled in an Ubuntu 18.04 docker).

I narrowed down the rpyc-based snippet to this:

# server side
class X: pass
x_instance = X()

from rpyc.core import SlaveService
from rpyc.utils.classic import DEFAULT_SERVER_PORT
from rpyc.utils.server import ThreadedServer
t = ThreadedServer(SlaveService, port=DEFAULT_SERVER_PORT, reuse_addr=True)
t.start()

# client side
import rpyc
conn = rpyc.classic.connect("localhost")
x = conn.modules.__main__.x_instance
y = x.__class__
issubclass(y, int)

Client side gets a SIGSEGV in `_PyObject_LookupAttr`, dereferencing an invalid 
`tp` pointer read from a poisoned `v` object.

After some reference count debugging, I found that for the rpyc `y` object (in 
the client code), accessing `__bases__` returns a tuple with refcount=1, and it 
has a single item whose refcount is 1 as well.

abstract_issubclass() calls abstract_get_bases() to get this refcount=1 tuple, 
and in the fastpath for single inheritance (tuple size = 1) it loads the single 
item from the tuple (`derived = PyTuple_GET_ITEM(bases, 0)`) and then 
decrements the refcount on the tuple, effectively deallocating the tuple and 
the `derived` object (whose only reference was from the tuple).

I tried to mimic the Python magic rpyc does to get the same crash without rpyc, 
and reached the following snippet (which doesn't exhibit the problem):

class Meta(type):
    def __getattribute__(self, attr):
        if attr == "__bases__":
            class Base: pass
            return (Base, )
        return type.__getattribute__(self, attr)

class X(metaclass=Meta):
    pass

issubclass(X().__class__, int)

In this case, the refcount is decremented from 4 to 3 as abstract_issubclass() 
gets rid of the tuple (instead of from 1 to 0 as happens in the rpyc case). I 
don't know how rpyc does it.

Attached is a patch that solves the problem (takes a ref of the tuple item 
before releasing the ref of the tuple itself).
I'm not sure this change is worth the cost because, well, I don't fully 
understand the severity of it since I couldn't reproduce it without using rpyc. 
I assume dynamically-created, unreferenced `__bases__` tuples as I have here 
are not so common.

Anyway, if you do decide it's worth it, I'd be happy to improve the patch (it's 
quite messy the way this function is structured) and post it to GitHub :)

Yonatan

--
components: Interpreter Core
files: abstract_issubclass_refcount_fix.diff
keywords: patch
messages: 360247
nosy: Yonatan Goldschmidt
priority: normal
severity: normal
status: open
title: abstract_issubclass() doesn't take bases tuple item ref
type: crash
versions: Python 3.9
Added file: 
https://bugs.python.org/file48851/abstract_issubclass_refcount_fix.diff
