[issue40255] Fixing Copy on Writes from reference counting

Neil Schemenauer Thu, 16 Apr 2020 13:52:14 -0700


Neil Schemenauer <nas-pyt...@arctrix.com> added the comment:


A few thoughts

First, the PR is well done.  The changes are pretty minimal and are well 
localized.  Thanks Eddie for writing clean code and responding to review 
requests.

My feeling is that, in current state, this still should not get merged or at 
least not get enabled by default.  The main consideration is what is the cost 
of this new feature vs what is the benefit.

If the immortalize heap function is not called by default, all Python users pay 
a performance hit for this feature.  Even after recent PR improvements, if you 
don't immortalize the heap, Python is slower. 

Only a small fraction of Python users will benefit from this feature.  I think 
the "pre-fork, CoW exploiting" application architecture is more common than 
some people would realize (it is a successful way to achieve multi-processing 
with CPython, making good use of multiple CPU cores).  However, I'm pretty sure 
it is used by a very tiny fraction of all Python programs.

If we do immortalize by default (e.g. on Python startup), the semantics of the 
refcnt field are subtly changed.  I suspect some programs will be broken as a 
result of this change.  A clue about what might go wrong comes from the unicode 
object code.  E.g. the resize_inplace() function in unicodeobject.c.  It is 
possible that a non-trivial amount of other 3rd party code will make 
assumptions about refcnt that will be broken by this change.  That code could 
be fixed but it is a matter of cost vs benefit.

If it is disabled by default with a build flag we have an ABI compatibility 
problem.  I.e. incref/decref are inlined functions/macros.  Also, as mentioned 
above, these kinds of build options tend to make maintenance and testing harder.

The frozen GC generation did not have these kinds of issues.  If people don't 
use it, there is basically no cost.  I think it was a good idea to merge the 
frozen generation feature.  There is a small amount of ongoing maintenance cost 
but that's acceptable.

Regarding Mark's comment about immutability vs immutable object headers, I 
would frame the goal of the PR differently.  Like the GC frozen generation, 
this immutable instance change is about providing a way to opt-out of Python's 
default garbage collection mechanism.  The frozen generation was to opt-out of 
the mark-and-sweep style cyclic collection.  This PR is about opting-out of the 
default Python reference counting.  

Put that way, could we consider this PR as an incremental step in a process of 
making it possible to have a non-reference count based GC for CPython?  E.g. 
perhaps we could eventually have a mixture of mark-and-sweep and reference 
counted GC, in order to support the handle (HPy) API.  If we want to move 
towards that, we want CPython code to stop making assumptions about the refcnt 
field, like the unicodeobject.c file currently does.

I think pitching the PR in that way makes it more compelling.  Merge it not 
because it helps a tiny class of Python applications.  Instead, merge it 
because it helps find unwarranted assumptions about how the Python GC works.  
Once we get bad assumptions cleaned up in 3rd party code, we have more of 
chance of pursuing an alternative GC strategy for CPython.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40255>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue40255] Fixing Copy on Writes from reference counting

Reply via email to