On Apr 9, 2020, at 15:13, Wes Turner <wes.tur...@gmail.com> wrote:
>
> - > And then take a look at how @ApacheArrow
> "supports zero-copy reads for lightning-fast data access without
> serialization overhead."
> - .@blazingsql … #cuDF … @ApacheArrow
> https://docs.blazingdb.com/docs/blazingsql
This isn’t relevant here at all. How objects get constructed and manage their
internal storage is completely orthogonal to how Python manages object
lifetimes.
> … New #DataFrame Interface and when that makes a copy for 2x+ memory use
> - "A dataframe protocol for the PyData ecosystem"
>
> https://discuss.ossdata.org/t/a-dataframe-protocol-for-the-pydata-ecosystem/267
Same here.
> Presumably, nothing about magic del statements would affect C extensions,
> Cython, zero-copy reads, or data that's copied to the GPU for faster
> processing; but I don't understand this or how weakrefs and c-extensions
> share memory that could be unlinked by a del.
And same for some of this—but not all.
C extensions can do the same kind of frame hacking, etc., as Python code, so
they will have the same problems already raised in this thread. But I don’t
think they add anything new. (There are special rules allowing you to
cheat with objects that haven’t been shared with Python code yet, which sounds
like it would make things more complicated—until you realize that objects that
haven’t been shared with Python code obviously can’t be affected by when Python
code releases references.)
But weakrefs would be affected, and that might be a problem with the proposal
that I don’t think anyone else has noticed before you.
Consider this toy example:
    import weakref
    from concurrent.futures import ThreadPoolExecutor

    spam = make_giant_spam()
    weakspam = weakref.ref(spam)
    with ThreadPoolExecutor() as e:
        for _ in range(1000):
            e.submit(dostuff, weakspam)
Today, the spam variable lives until the end of the scope, which doesn’t happen
until the with statement ends, which doesn’t happen until all 1000 tasks
complete. So, the object in that variable is still alive for all of the tasks.
With Guido’s proposed change, the spam variable is deleted after the last
statement that uses it, which is before the with statement is even entered.
Assuming it’s the only (non-weak) reference to the object, which is probably
true, it will get destroyed, releasing all the memory (or other expensive
resources) used by that giant spam object. That’s the whole point of the
proposal, after all. But that means weakspam is now a dead weakref. So all
those dostuff tasks are now doing stuff with a dead weakref. Presumably dostuff
is designed to handle that safely, so you won’t crash or anything—but it can’t
do the actual stuff you wanted it to do with that spam object.
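You can reproduce the effect today by deleting the last strong reference by hand, which is exactly what the proposal would do implicitly. A minimal sketch (Giant is a stand-in for any expensive object; this relies on CPython’s immediate refcount-based collection):

```python
import weakref

class Giant:
    """Stands in for an object holding expensive memory or resources."""
    pass

spam = Giant()
weakspam = weakref.ref(spam)
assert weakspam() is spam   # referent is alive while a strong ref exists

del spam                    # simulate the proposed early release
assert weakspam() is None   # the weakref is now dead
```

Every dostuff task in the toy example above would see the equivalent of that final None.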
And, while this is obviously a toy example, perfectly reasonable real code will
do similar things. It’s pretty common to use weakrefs for cases where 99% of
the time the object is there but occasionally it’s dead (e.g., during graceful
shutdown), and changing that 99% to 0% or 1% will make the entire process
useless. It’s also common to use weakrefs for cases where 80% of the time the
object is there but 20% of the time it’s been ejected from some cache and has
to be regenerated; changing that 80% to 1% will mean the process still
functions, but the cache is no longer doing anything, so it functions a lot
slower. And so on.
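The cache case can be sketched with weakref.WeakValueDictionary (the Expensive class and get function here are hypothetical, just for illustration). Under CPython’s refcounting, the entry vanishes as soon as the last strong reference does, and the proposed early release would make that the common case rather than the rare one:

```python
import weakref

class Expensive:
    """Stands in for an object that is costly to regenerate."""
    def __init__(self, key):
        self.key = key

cache = weakref.WeakValueDictionary()

def get(key):
    obj = cache.get(key)
    if obj is None:            # cache miss: regenerate and re-cache
        obj = Expensive(key)
        cache[key] = obj
    return obj

a = get("x")                   # strong ref keeps the cache entry alive
assert cache.get("x") is a

del a                          # last strong ref gone: entry evaporates
assert cache.get("x") is None  # every later get("x") is a miss
```

If the compiler silently drops strong references as early as possible, callers hit the regeneration path almost every time, which is the "cache no longer doing anything" failure mode described above.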
So, unless you could introduce some compiler magic to detect weakref.ref and
weakref.WeakValueDictionary.__setitem__ and so on (which might not be feasible, especially
since it’s often buried inside some wrapper code), this proposal might well
break many, maybe even most, good uses of weakrefs.
> Would be interested to see the real performance impact of this potential
> optimization:
> - 10%:
> https://instagram-engineering.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172
Skimming this, it looks like this one is not just orthogonal to Guido’s
proposal, it’s almost directly counter to it. Their goal is to have relatively
short-lived killable children that defer refcount twiddling and destruction as
much as possible so that fork-inherited objects don’t have to be copied and
temporary objects don’t have to be cleaned up, they can just be abandoned.
Guido’s goal is to get things decref’d and therefore hopefully destroyed as
early as possible.
Anyway, their optimization is definitely useful for a special class of programs
that meet some requirements that sound unusual until you realize a lot of web
servers/middlewares are designed around nearly the same requirements. People
have done similar (in fact, even more radical, akin to building CPython and all
of your extensions with refcounting completely disabled) in C and other
languages, and there’s no reason (if you’re really careful) it couldn’t work in
Python. But it’s certainly not the behavior you’d want from a general-purpose
Python implementation.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/OTQHA3IEAVGIBXVM3L5WZHEJ67HVDOT7/
Code of Conduct: http://python.org/psf/codeofconduct/