On Apr 9, 2020, at 15:13, Wes Turner <wes.tur...@gmail.com> wrote:
>
> - > And then take a look at how @ApacheArrow
> "supports zero-copy reads for lightning-fast data access without
> serialization overhead."
> - .@blazingsql … #cuDF … @ApacheArrow
> https://docs.blazingdb.com/docs/blazingsql
This isn’t relevant here at all. How objects get constructed and manage their
internal storage is completely orthogonal to how Python manages object
lifetimes.
> … New #DataFrame Interface and when that makes a copy for 2x+ memory use
> - "A dataframe protocol for the PyData ecosystem"
>
> https://discuss.ossdata.org/t/a-dataframe-protocol-for-the-pydata-ecosystem/267
Same here.
> Presumably, nothing about magic del statements would affect C extensions,
> Cython, zero-copy reads, or data that's copied to the GPU for faster
> processing; but I don't understand this or how weakrefs and c-extensions
> share memory that could be unlinked by a del.
And same for some of this—but not all.
C extensions can do the same kind of frame hacking, etc., as Python code, so
they will have the same problems already raised in this thread. But I don’t
think they add anything new. (There are special rules allowing you to
cheat with objects that haven’t been shared with Python code yet, which sounds
like it would make things more complicated—until you realize that objects that
haven’t been shared with Python code obviously can’t be affected by when Python
code releases references.)
But weakrefs would be affected, and that might be a problem with the proposal
that I don’t think anyone else has noticed before you.
Consider this toy example:
    import weakref
    from concurrent.futures import ThreadPoolExecutor

    spam = make_giant_spam()
    weakspam = weakref.ref(spam)
    with ThreadPoolExecutor() as e:
        for _ in range(1000):
            e.submit(dostuff, weakspam)
Today, the spam variable lives until the end of the scope, which doesn’t happen
until the with statement ends, which doesn’t happen until all 1000 tasks
complete. So, the object in that variable is still alive for all of the tasks.
With Guido’s proposed change, the spam variable is deleted after the last
statement that uses it, which is before the with statement is even entered.
Assuming it’s the only (non-weak) reference to the object, which is probably
true, it will get destroyed, releasing all the memory (or other expensive
resources) used by that giant spam object. That’s the whole point of the
proposal, after all. But that means weakspam is now a dead weakref. So all
those dostuff tasks are now doing stuff with a dead weakref. Presumably dostuff
is designed to handle that safely, so you won’t crash or anything—but it can’t
do the actual stuff you wanted it to do with that spam object.
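You can reproduce the effect today by deleting the last strong reference by hand, which is exactly what the proposal would do implicitly. A minimal sketch (Giant is a stand-in for any expensive object; this relies on CPython’s immediate refcount-based collection):

```python
import weakref

class Giant:
    """Stands in for an object holding expensive memory or resources."""
    pass

spam = Giant()
weakspam = weakref.ref(spam)
assert weakspam() is spam   # referent is alive while a strong ref exists

del spam                    # simulate the proposed early release
assert weakspam() is None   # the weakref is now dead
```

Every dostuff task in the toy example above would see the equivalent of that final None.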
And, while this is obviously a toy example, perfectly reasonable real code will
do similar things. It’s pretty common to use weakrefs for cases where 99% of
the time the object is there but occasionally it’s dead (e.g., during graceful
shutdown), and changing that 99% to 0% or 1% will make the entire process
useless. It’s also common to use weakrefs for cases where 80% of the time the
object is there but 20% of the time it’s been ejected from some cache and has
to be regenerated; changing that 80% to 1% will mean the process still
functions, but the cache is no longer doing anything, so it functions a lot
slower. And so on.
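The cache case can be sketched with weakref.WeakValueDictionary (the Expensive class and get function here are hypothetical, just for illustration). Under CPython’s refcounting, the entry vanishes as soon as the last strong reference does, and the proposed early release would make that the common case rather than the rare one:

```python
import weakref

class Expensive:
    """Stands in for an object that is costly to regenerate."""
    def __init__(self, key):
        self.key = key

cache = weakref.WeakValueDictionary()

def get(key):
    obj = cache.get(key)
    if obj is None:            # cache miss: regenerate and re-cache
        obj = Expensive(key)
        cache[key] = obj
    return obj

a = get("x")                   # strong ref keeps the cache entry alive
assert cache.get("x") is a

del a                          # last strong ref gone: entry evaporates
assert cache.get("x") is None  # every later get("x") is a miss
```

If the compiler silently drops strong references as early as possible, callers hit the regeneration path almost every time, which is the "cache no longer doing anything" failure mode described above.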
So, unless you could introduce some compiler magic to detect weakref.ref and
weakref.WeakValueDictionary.__setitem__ and so on (which might not be feasible, especially
since it’s often buried inside some wrapper code), this proposal might well
break many, maybe even most, good uses of weakrefs.
> Would be interested to see the real performance impact of this potential
> optimization:
> - 10%:
> https://instagram-engineering.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172
Skimming this, it looks like this one is not just orthogonal to Guido’s
proposal, it’s almost directly counter to it. Their goal is to have relatively
short-lived killable children that defer refcount twiddling and destruction as
much as possible so that fork-inherited objects don’t have to be copied and
temporary objects don’t have to be cleaned up, they can just be abandoned.
Guido’s goal is to get things decref’d and therefore hopefully destroyed as
early as possible.
Anyway, their optimization is definitely useful for a special class of programs
that meet some requirements that sound unusual until you realize a lot of web
servers/middlewares are designed around nearly the same requirements. People
have done similar (in fact, even more radical, akin to building CPython and all
of your extensions with refcounting completely disabled) in C and other
languages, and there’s no reason (if you’re really careful) it couldn’t work in
Python. But it’s certainly not the behavior you’d want from a general-purpose
Python implementation.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/OTQHA3IEAVGIBXVM3L5WZHEJ67HVDOT7/
Code of Conduct: http://python.org/psf/codeofconduct/