date:20200611

[Python-Dev] Re: My take on multiple interpreters (Was: Should we be making so many changes in pursuit of PEP 554?)

2020-06-11 Thread Jim J. Jewett

In fairness, if the process is really exiting, the OS should clear that out.  
Even if it is embedded, the embedding process could just release (or zero out) 
the entire memory allocation.  I personally like plugging those leaks, but it 
does feel like putting purity over practicality.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YOSDQDIXDKKG76XPBKPE4DZVTBEIDBJQ/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Should we be making so many changes in pursuit of PEP 554?

2020-06-11 Thread Jim J. Jewett

I don't think that sharing data only by copying is the final plan.  Proxied 
objects seem like a fairly obvious extension.

I am also a bit suspicious of that great timing; perhaps latency is also 
important for startup?
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QTNMF22BFGKUQELM6XICSQ5PCHVUZIRJ/

[Python-Dev] Re: My take on multiple interpreters (Was: Should we be making so many changes in pursuit of PEP 554?)

2020-06-11 Thread Riccardo Ghetta

Hello Mark,
and thanks for your suggestions. However, I'm afraid I haven't explained
our use of Python well enough.

On 11/06/2020 12:59, Mark Shannon wrote:

> If you need to share objects across threads, then there will be
> contention, regardless of how many interpreters there are, or which
> processes they are in.

As a rule, we don't use that many Python objects. Most of the time a
script calls C++ functions, operating on C++ data.

Perhaps a small snippet will explain this better:

hcpi='INFLEUR'
n_months=3
base_infl=hs_base(hcpi, n_months, 0)
im=hs_fs(hcpi,'sia','m',n_months,0)
ip=hs_fs(hcpi,'sia','m',n_months-1,0)
ir=im+(hs_range()[1].day-1)/month_days(hs_range()[1])*(ip-im)
return ir/base_infl # double

This is part of an inflation estimate used in pricing an
inflation-linked bond.
hcpi and n_months are really parameters of the script, and the hs_
functions are all implemented in C++.
Some are very small and fast, like hs_range; others are much more complex
and slow (hs_fs), so we wrap them with
Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS.
As you can see, Python is used here more to direct C++ than to manipulate
objects. At GUI level things work a bit differently, but here we just
tried to avoid building and destroying a lot of ephemeral Python objects
(unneeded anyway, because all subsequent processing is done by C++).
This Python script is only a part of a larger processing job done in
parallel by several threads, each operating on distinct instruments.
Evaluating an instrument could involve zero, one, or several of those
scripts.
During evaluation an instrument is bound to a single thread, so from
Python's point of view the threads share nothing.
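
For readers without access to the C++ helpers, the arithmetic in the
snippet above can be sketched in plain Python. This is only an
illustration under stated assumptions: im and ip are taken to be two
consecutive monthly index levels, the date passed in stands in for
hs_range()[1], and month_days is a hypothetical pure-Python replacement
for the C++ helper of the same name. The numbers are made up.

```python
import calendar
from datetime import date


def month_days(d: date) -> int:
    """Days in the month containing d (stand-in for the C++ helper)."""
    return calendar.monthrange(d.year, d.month)[1]


def interpolated_ratio(im: float, ip: float, base_infl: float, end: date) -> float:
    """Linearly interpolate between im and ip by day of month, then
    divide by the base level, mirroring the last two lines of the script."""
    weight = (end.day - 1) / month_days(end)
    ir = im + weight * (ip - im)
    return ir / base_infl


# Made-up index levels; 16 June sits halfway through a 30-day month.
ratio = interpolated_ratio(im=104.0, ip=105.0, base_infl=100.0, end=date(2020, 6, 16))
first_of_month = interpolated_ratio(104.0, 105.0, 100.0, date(2020, 6, 1))
```

On the first day of a month the weight is zero, so the result is simply
im divided by the base level.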

> If the additional resource consumption is irrelevant, what's the
> objection to spinning up new processes?

The additional resource consumption of a new Python interpreter is
irrelevant, but the process as a whole needs a lot of extra data, making
a new process rather costly.
Plus there are issues of licensing, synchronization and load balancing
that are much easier to resolve (for our system, at least) with threads
than processes.
Still, we /do/ use multiple processes, but those tend to be across
administrative boundaries, or for very specific issues.

Ciao,
Riccardo
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/MJGGZ5HOBC5KTMQ5CPFI4NX6YYTD34F3/

[Python-Dev] Re: My take on multiple interpreters (Was: Should we be making so many changes in pursuit of PEP 554?)

2020-06-11 Thread Mark Shannon

Hi Riccardo,

On 10/06/2020 5:51 pm, Riccardo Ghetta wrote:

> Hi,
> as a user, the "lua use case" is exactly what I need at work.
> I realize that for Python this is a niche case, and most users don't
> need any of this, but I hope it will be useful to understand why having
> multiple independent interpreters in a single process can be an
> essential feature.
> The company I work for develops and sells a big C++ financial system with
> Python embedded, providing critical flexibility to our customers.
> Python is used as a scripting language, with most cases having C++
> calling a Python script that itself calls other C++ functions.
> Most of the time those scripts are in I/O-bound workloads, or ones where
> the time spent in Python is negligible.
> But some workloads are really CPU-bound and those tend to become
> GIL-bound, even with massive use of C++ helpers; some to the point that
> GIL contention makes up over 80% of running time, instead of 1-5%.
> And every time our customers upgrade their server, they buy machines
> with more cores and the contention problem worsens.

Different interpreters need to operate in their own isolated address
space, or there will be horrible race conditions.

Regardless of whether that separation is done in software or hardware,
it has to be done.

Whenever data contained in a Python object is passed to C/C++ code,
there are two ways to do it. Either pass the whole object, or a
reference to the underlying data.
By passing the underlying data, you can release the GIL, and your
problem is solved, or at least alleviated.
If you can't do that, and must pass the object, then all accesses to
that object must be protected by a per-interpreter lock.
That's because interpreters need to operate serially, or you'll get
horrible race conditions.
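
In Python terms, one way to hand over "a reference to the underlying
data" rather than the whole object is the buffer protocol. The sketch
below is an illustration only; treating this as the mechanism meant
above is an assumption, and the array contents are made up. A
memoryview exposes the array's memory without copying it, which is the
kind of raw pointer-plus-length that C or C++ code can work on after
the GIL has been released.

```python
from array import array

# A Python object that owns a contiguous block of C doubles.
prices = array("d", [101.5, 102.25, 103.0])

# The memoryview references the same memory; nothing is copied.
view = memoryview(prices)
raw_bytes = view.cast("B")  # same memory, seen as unsigned bytes

# Writing through the view changes the original array,
# which shows that both names refer to one buffer.
view[0] = 99.0
```

A C extension would obtain the same buffer through the C-level buffer
API and could then drop the GIL while it reads or writes the data.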

If you need to share objects across threads, then there will be
contention, regardless of how many interpreters there are, or which
processes they are in.

> Obviously, our use case calls for per-thread separate interpreters:
> server processes run continuously and already consume gigabytes of RAM,
> so startup time or increased memory consumption are not issues. Shared
> state also is not needed; actually, we try to avoid it as much as possible.
>
> In the end, removing process-global state is extremely interesting for us.

If the additional resource consumption is irrelevant, what's the
objection to spinning up new processes?

Cheers,
Mark.

P.S.
Do try passing the underlying data, not the whole object, and dropping
the GIL when calling back into C++. It can be effective.
CPython already drops the GIL for some computational workloads
implemented in C, like compression.
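
The compression case can be tried from pure Python. The following is a
minimal sketch, not a benchmark: the chunk sizes and worker count are
arbitrary, and compress_all is a hypothetical helper name. zlib releases
the GIL while it compresses a buffer, so the worker threads are able to
overlap on a multi-core machine; how much that helps depends on the
workload and is not measured here.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_all(chunks, workers=4):
    """Compress each chunk on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, chunks))


# Eight independent 1 MB buffers; the threads share no Python objects.
chunks = [bytes([i]) * 1_000_000 for i in range(8)]
compressed = compress_all(chunks)
roundtrip = [zlib.decompress(c) for c in compressed]
```

Each thread receives plain bytes rather than a richer Python object,
which matches the advice to pass the underlying data.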

___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/6KYRUABTLNYNGNRBS5KRKPHKLKS2AI7U/
