[issue38392] Ensure that objects entering the GC are valid

2019-10-10 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset a5447735c334a041ee2ffdeb5c7e13d7d4502ea2 by Victor Stinner in 
branch 'master':
bpo-38392: Only declare visit_validate() if Py_DEBUG is defined (GH-16689)
https://github.com/python/cpython/commit/a5447735c334a041ee2ffdeb5c7e13d7d4502ea2


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38392] Ensure that objects entering the GC are valid

2019-10-10 Thread STINNER Victor


Change by STINNER Victor :


--
pull_requests: +16275
pull_request: https://github.com/python/cpython/pull/16689

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38392] Ensure that objects entering the GC are valid

2019-10-08 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

I am sure this will break third-party projects. But I thing that we should not 
revert this change, as it completely matches the documentation. It is good that 
it was applied so early at development cycle.

Stefan, could you please test Cython and lxml with this?

--
nosy: +scoder

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38392] Ensure that objects entering the GC are valid

2019-10-07 Thread STINNER Victor


STINNER Victor  added the comment:

Ok, I pushed a change. Let's see how third party projects like it :-) We still 
have time to revert it if it causes too many damage.

--
resolution:  -> fixed
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38392] Ensure that objects entering the GC are valid

2019-10-07 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset 1b1845569539db5c1a6948a5d32daea381f1e35f by Victor Stinner in 
branch 'master':
bpo-38392: PyObject_GC_Track() validates object in debug mode (GH-16615)
https://github.com/python/cpython/commit/1b1845569539db5c1a6948a5d32daea381f1e35f


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38392] Ensure that objects entering the GC are valid

2019-10-07 Thread STINNER Victor


STINNER Victor  added the comment:

> It could be possible to do this in backward compatible way. 
> PyObject_GC_Track() could add the object to the list of new objects (all 
> objects are already linked in bi-linked list, so it would need to just move 
> the object to the specified list), and PyObject_GC_NewVar() could check all 
> new objects and clear the list of new objects (just move the head).

That's different than what I'm proposing here. Checking "later" is more on the 
bpo-36389 side: my bpo-36389 PR modified PyObject_GC_NewVar() to check that 
young object are valid.

Moreover, my bpo-36389 PR required threshold: checking all young objects at 
each PyObject_GC_NewVar() simply killed performance which made the whole 
approach unusable in practice. Because of the threshold, my implementation 
failed to detect the hamt.c bug bpo-33803: the partially initialized was 
already fully initialized when my "object debugger" ran. PR 16615 detects 
immediately the hamt.c bug bpo-33803.

I designed bpo-36389 to attempt to detect when an object is corrupted after its 
creation, corrupted by code running "later".

We had a customer with a bug in OpenStack Nova. Nova crashed in visit_decref() 
but I failed to debug the issue. It motivated me to modify the debug ABI in 
Python 3.8, so later it will become possible to switch from release to debug 
mode to attempt to debug a program using additional checks added by the debug 
build.

Here my intent is to ensure that objects entering the GC are valid to help to 
keep the GC list consistent and valid. But it doesn't prevent an object from 
being corrupted after its creation. We should try to find a different approach 
for that (maybe revisit bpo-36389).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38392] Ensure that objects entering the GC are valid

2019-10-07 Thread STINNER Victor


STINNER Victor  added the comment:

> I'm pretty sure you meant nascheme, not nnorwitz.

Oops, right :-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38392] Ensure that objects entering the GC are valid

2019-10-07 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

>From documentation:

.. c:function:: void PyObject_GC_Track(PyObject *op)

   Adds the object *op* to the set of container objects tracked by the
   collector.  The collector can run at unexpected times so objects must be
   valid while being tracked.  This should be called once all the fields
   followed by the :c:member:`~PyTypeObject.tp_traverse` handler become valid, 
usually near the
   end of the constructor.

So there was a bug in Modules/pyexpat.c.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38392] Ensure that objects entering the GC are valid

2019-10-07 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

It could be possible to do this in backward compatible way. PyObject_GC_Track() 
could add the object to the list of new objects (all objects are already linked 
in bi-linked list, so it would need to just move the object to the specified 
list), and PyObject_GC_NewVar() could check all new objects and clear the list 
of new objects (just move the head).

I am not sure it is worth to do such complication.

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38392] Ensure that objects entering the GC are valid

2019-10-07 Thread Thomas Wouters


Thomas Wouters  added the comment:

I'm pretty sure you meant nascheme, not nnorwitz.

--
nosy: +nascheme, twouters -nnorwitz

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38392] Ensure that objects entering the GC are valid

2019-10-07 Thread STINNER Victor


STINNER Victor  added the comment:

Pablo, Tim, Neal: what do you think of this idea?

--
nosy: +nnorwitz, pablogsal, tim.peters
stage: patch review -> 

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38392] Ensure that objects entering the GC are valid

2019-10-07 Thread STINNER Victor


Change by STINNER Victor :


--
keywords: +patch
pull_requests: +16204
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/16615

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38392] Ensure that objects entering the GC are valid

2019-10-07 Thread STINNER Victor


New submission from STINNER Victor :

A bug in Python/hamt.c was only discovered 4 months after the code was added to 
Python:
* https://mail.python.org/pipermail/python-dev/2018-June/153857.html
* https://bugs.python.org/issue33803

The problem was that an object was tracked by the GC, whereas calling its 
traverse method crashed Python... depending when the traverse method is called 
exactly.

Pseudo-code:

  PyObject_GC_Track();
  ...
  obj2 = PyObject_GC_NewVar(...);
  ...
  

The problem is that PyObject_GC_NewVar() can trigger a GC collection which uses 
the partially initialized object, and then we get a cryptic crash in the GC 
(usually seen in visit_decref()).

I propose to make PyObject_GC_Track() stricter: validate that the object seems 
to be "valid" or "fully initialized". From the GC point of view, it means that 
calling tp_traverse must not crash.

Attached PR is a concrete implementation.

This issue is a follow-up of my previously failed attempt bpo-36389 "Add 
gc.enable_object_debugger(): detect corrupted Python objects in the GC". While 
bpo-36389 attempts to discoverd corrupted objects once a GC collection is 
triggered, this issue is restricted to objects entering the GC.

First problem: my PR breaks Modules/pyexpat.c whereas the code is not strictly 
a bug: in Python 3.8, it's ok to put a partially initialized object into the GC 
if no GC collection can be triggered before the object is fully initialized. In 
Modules/pyexpat.c case, the code looks safe from that point of view.

It means that such change is a backward incompatible change. IMHO it's a good 
step forward, since it is likely to discovered hidden bugs.

--
components: Interpreter Core
messages: 354074
nosy: vstinner
priority: normal
severity: normal
status: open
title: Ensure that objects entering the GC are valid
versions: Python 3.9

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com