On Wed, May 6, 2020 at 5:41 AM Victor Stinner <vstin...@python.org> wrote:
>
> Hi Nathaniel,
>
> On Wed, 6 May 2020 at 04:00, Nathaniel Smith <n...@pobox.com> wrote:
> > As far as I understand it, the subinterpreter folks have given up on
> > optimized passing of objects, and are only hoping to do optimized
> > (zero-copy) passing of raw memory buffers.
>
> I think that you misunderstood PEP 554. It's a bare-minimum API,
> and the idea is to *extend* it later to have an efficient
> implementation of "shared objects".

No, I get this part :-)

> IMO it should be easy to share *data* (object "content") between
> subinterpreters, but each interpreter should have its own PyObject
> which exposes the data at the Python level. See the PyObject as a
> proxy to the data.

So when you say "shared object" you mean that you're sharing a raw
memory buffer, and then you're writing a Python object that stores its
data inside that memory buffer instead of inside its __dict__:

import struct

# MY_ATTR_FORMAT and MY_ATTR_OFFSET are placeholder struct format/offset
# constants describing where my_attr lives inside the shared buffer.

class MySharedObject:
    def __init__(self, shared_memview, shared_lock):
        self._shared_memview = shared_memview
        self._shared_lock = shared_lock

    @property
    def my_attr(self):
        with self._shared_lock:
            return struct.unpack_from(
                MY_ATTR_FORMAT, self._shared_memview, MY_ATTR_OFFSET)[0]

    @my_attr.setter
    def my_attr(self, new_value):
        with self._shared_lock:
            struct.pack_into(
                MY_ATTR_FORMAT, self._shared_memview, MY_ATTR_OFFSET,
                new_value)

This is an interesting idea, but I think when most people say "sharing
objects between subinterpreters", they mean being able to pass some
pre-existing object between subinterpreters cheaply, whereas this
approach requires defining custom objects with custom locking. So we
should probably use different terms for the two to avoid confusion :-).

It's also true that my post you're responding to didn't consider this
approach: I was focusing on copying objects, not sharing objects on an
ongoing basis. You can't implement this kind of "shared object" on top
of a pipe/socket, because those create two independent copies of the
data.

But... if this is what you want, you can do the exact same thing with
subprocesses too. OSes provide inter-process shared memory and
inter-process locks. 'MySharedObject' above would work exactly the
same. So I think the conclusion still holds: there aren't any plans to
make IPC between subinterpreters meaningfully faster than IPC between
subprocesses.
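Here's a rough sketch of what I mean (untested; it assumes Python
3.8's multiprocessing.shared_memory, plus the MySharedObject class and
the MY_ATTR_* placeholder constants from above):

from multiprocessing import Lock, Process
from multiprocessing.shared_memory import SharedMemory

def worker(shm_name, lock):
    # Attach to the same OS-level buffer from a different process.
    shm = SharedMemory(name=shm_name)
    obj = MySharedObject(shm.buf, lock)
    obj.my_attr = 42  # visible to the parent process
    shm.close()

if __name__ == "__main__":
    shm = SharedMemory(create=True, size=64)
    lock = Lock()  # an OS-level lock that works across processes
    p = Process(target=worker, args=(shm.name, lock))
    p.start()
    p.join()
    obj = MySharedObject(shm.buf, lock)
    print(obj.my_attr)  # -> 42
    shm.close()
    shm.unlink()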

> I don't think that we have to reinvent the wheel. threading,
> multiprocessing and asyncio have already designed such APIs. We
> should design similar APIs and even simply reuse code.

Or, we could simply *use* the code instead of using subinterpreters
:-). (Or write new and better code, I feel like there's a lot of room
for a modern 'multiprocessing' competitor.) The question I'm trying to
figure out is what advantage subinterpreters give us over these proven
technologies, and I'm still not seeing it.

> My hope is that "synchronization" (in general, locks in specific) will
> be more efficient in the same process, than synchronization between
> multiple processes.

Hmm, I would be surprised by that – the locks in modern OSes are
highly optimized, and designed to work across processes. For example,
on Linux, futexes work across processes. Have you done any benchmarks?
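For reference, a crude micro-benchmark might look something like this
(uncontended acquire/release in a single process, so it only measures
per-operation overhead, not behavior under contention):

import timeit
import threading
import multiprocessing

def bench(lock, n=1_000_000):
    # Cost of n uncontended acquire/release pairs.
    def acquire_release():
        lock.acquire()
        lock.release()
    return timeit.timeit(acquire_release, number=n)

print("threading.Lock:       %.3f s" % bench(threading.Lock()))
print("multiprocessing.Lock: %.3f s" % bench(multiprocessing.Lock()))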

Also btw, note that if you want to use async within your
subinterpreters, then that rules out a lot of tools like regular
locks, because they can't be integrated into an event loop. If your
subinterpreters are using async, then you pretty much *have* to use
full-fledged sockets or equivalent for synchronization.
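Here's a toy sketch of the kind of thing I mean (the send() is just a
stand-in for another interpreter or process signalling us):

import asyncio
import socket

async def main():
    loop = asyncio.get_running_loop()
    r, w = socket.socketpair()
    r.setblocking(False)
    # A blocking lock.acquire() here would freeze every task on this
    # loop; waiting on a socket yields control back to the loop instead.
    w.send(b"\x00")
    await loop.sock_recv(r, 1)
    r.close()
    w.close()

asyncio.run(main())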

> I would be interested to have a generic implementation of "remote
> object": an empty proxy object which forwards all operations to a
> different interpreter. It would likely be inefficient, but it may be
> convenient for a start. If a method returns an object, a new proxy
> should be created. Simple scalar types like int and short strings may
> be serialized (copied).

How would this be different than
https://docs.python.org/3/library/multiprocessing.html#proxy-objects ?
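For concreteness, that existing machinery already does the "forward
all operations to somewhere else" part, just with another process on
the other end:

from multiprocessing import Manager

with Manager() as manager:
    d = manager.dict()  # a proxy; the real dict lives in the manager process
    d["x"] = 1          # each operation is forwarded over a connection
    print(d["x"])       # round-trips to the manager process and back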

How would you handle input arguments -- would those get proxied as well?

Also, does this mean the other subinterpreter has to be running an
event loop to process these incoming requests? Or is the idea that the
other subinterpreter would process these inside a traditional Python
thread, so users are exposed to all the classic shared-everything
locking issues?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org