subject:"\[issue35813\] shared memory construct to avoid need for serialization between processes"

[issue35813] shared memory construct to avoid need for serialization between processes

2019-12-09 Thread Xavier de Gaye



Change by Xavier de Gaye :


--
nosy:  -xdegaye

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-05-08 Thread Pam McA'Nulty



Pam McA'Nulty  added the comment:

Reviewing docs and am writing a queue implementation for usability testing.  I 
think ShareableList needs to support `close()` and `unlink()` directly.  The 
`.shm` attribute should be considered an _implementation_ detail, and not be 
exposed.

--
nosy: +Pam.McANulty

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-03-18 Thread Xavier de Gaye



Xavier de Gaye  added the comment:

After changeset e895de3e7f3cc2f7213b87621cfe9812ea4343f0, test_all fails on 
platforms that lack the _posixshmem extension module (Android for example):

==
FAIL: test_all (test.test___all__.AllTest) (module='multiprocessing.managers')
--
Traceback (most recent call last):
File "/data/local/tmp/python/lib/python3.8/test/test___all__.py", line 34, in 
check_all
  exec("from %s import *" % modname, names)
AttributeError: module 'multiprocessing.managers' has no attribute 
'SharedMemoryManager'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/data/local/tmp/python/lib/python3.8/test/test___all__.py", line 37, in 
check_all
  self.fail("__all__ failure in {}: {}: {}".format(
AssertionError: __all__ failure in multiprocessing.managers: AttributeError: 
module 'multiprocessing.managers' has no attribute 'SharedMemoryManager'


The following patch fixes the problem:

diff --git a/Lib/multiprocessing/managers.py b/Lib/multiprocessing/managers.py
index 7973012b98..3fdd60fff7 100644
--- a/Lib/multiprocessing/managers.py
+++ b/Lib/multiprocessing/managers.py
@@ -8,8 +8,7 @@
# Licensed to PSF under a Contributor Agreement.
#

-__all__ = [ 'BaseManager', 'SyncManager', 'BaseProxy', 'Token',
-'SharedMemoryManager' ]
+__all__ = [ 'BaseManager', 'SyncManager', 'BaseProxy', 'Token' ]

#
# Imports
@@ -33,6 +32,7 @@ from . import get_context
try:
   from . import shared_memory
   HAS_SHMEM = True
+__all__.append('SharedMemoryManager')
except ImportError:
   HAS_SHMEM = False

--
nosy: +xdegaye

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-23 Thread Davin Potts



Davin Potts  added the comment:

@Giampaolo:  The docstring in the shared_memory module currently marks the API 
as experimental.  (You read my mind...)

I will start a new PR where we can work on the 
better-integration-into-the-larger-multiprocessing-docs and add comments there.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-23 Thread Davin Potts



Davin Potts  added the comment:


New changeset e895de3e7f3cc2f7213b87621cfe9812ea4343f0 by Davin Potts in branch 
'master':
bpo-35813: Tests and docs for shared_memory (#11816)
https://github.com/python/cpython/commit/e895de3e7f3cc2f7213b87621cfe9812ea4343f0


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-23 Thread Giampaolo Rodola'



Giampaolo Rodola'  added the comment:

IMHO given the size of the change and how quickly this evolved I would probably 
feel safer to mark the API as experimental (no pun intended, this is great 
work), at least in alpha2. I believe we now have a good explanation in the docs 
but it still needs to be integrated better into the larger, existing 
multiprocessing docs (I will try to add some comments on the PR tomorrow).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-23 Thread Raymond Hettinger



Raymond Hettinger  added the comment:

Any objections to getting this patch applied today so that it will be in the 
second alpha?

Of course, further adjustments can still be made afterwards if needed.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-23 Thread Giampaolo Rodola'


Giampaolo Rodola'  added the comment:

> Side note: I’m the author of the sysv_ipc package you found, as well as the 
> posix_ipc package.

Sorry, I didn't realize it was you.

> Not true.  A manager started by one process can be connected to by another 
> process that is not a child.  This is covered in the docs here:  
> https://docs.python.org/3/library/multiprocessing.html#using-a-remote-manager 

Ah nice! OK then. With this + Philip's explanation on why shmat()/shmdt() are 
not needed I guess I ran out of API-related complaints now. =)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-23 Thread Davin Potts



Davin Potts  added the comment:

> FWIW I bumped into this lib: http://semanchuk.com/philip/sysv_ipc/

The author of that lib, Philip Semanchuk, is one of the people participating in 
this effort -- he has posted above in msg334934 here on b.p.o. and has helped 
review the PR in GH-11816.

He is also the author of the posix_ipc package which was the original basis for 
our POSIX Shared Memory implementation here.

The decision to base our Unix platform support upon POSIX and not SystemV 
libraries came after considerable research and there are important differences 
between the two.  To oversimplify:  POSIX Shared Memory support has now been 
available for some time on Linux, *BSD, MacOS, and others and is something of a 
successor to the SystemV.


> That assumes a single app/process which spawns a child (the "worker").

Not true.  A manager started by one process can be connected to by another 
process that is not a child.  This is covered in the docs here:  
https://docs.python.org/3/library/multiprocessing.html#using-a-remote-manager  
That child can then request that shared memory blocks it creates be remotely 
tracked and managed by that remote process's manager.  While I would not expect 
this to be a common use case, this is a feature of BaseManager that we inherit 
into SharedMemoryManager.

The SyncManager.Lock can be used as part of this as well.  Thus, two unrelated 
apps/processes *can* coordinate their management of shared memory blocks 
through the SharedMemoryManager.


> That would translate into a new Semaphore(name=None, create=False)
> class which (possibly?) would also provide better performances
> compared to SyncManager.Semaphore

Right!  You might have noticed that Philip has such a semaphore construct in 
his posix_ipc lib.

I opted to not attempt to add this feature as part of this effort to both (1) 
keep focused on the core needs to work with shared memory, and (2) to take more 
time in the future to work out how to get cross-platform support for the 
semaphore right (as you point out, there are complications to work through).


> Extra 1: apparently there are also POSIX msgget(), msgrcv() and
> msgsnd() syscalls which could be used to implement a System-V message
> Queue similar to SyncManager.Queue later on.

Right!  This is also something Philip has in his posix_ipc lib.  This should be 
part of the roadmap for what we do next with SharedMemory.  This one may be 
complicated by the fact that not all platforms that implement POSIX Shared 
Memory chose to also implement these functions in the same way.  We will need 
time to work out what we can or can not reasonably do here.


> Extra 2: given the 2 distinct use-cases I wonder if the low-level
> component (shared_memory.py) really belongs to multiprocessing module

Given what I wrote above about how multiprocessing.managers does enable these 
use cases and the existing "distributed shared memory" support in 
multiprocessing, I think it logically belongs in multiprocessing.  I suggest 
that "shm_open" and "shm_unlink" are our low-level tools, which appropriately 
are in _posixshmem, but SharedMemory and the rest are high-level tools; 
SharedMemoryManager will not be able to cover all life-cycle management use 
cases thus SharedMemory will be needed by many and in contrast, "shm_open" and 
"shm_unlink" will be needed only by those wishing to do something wacky.  
(Note: I am not trying to make "wacky" sound like a bad thing because wacky can 
be very cool sometimes.)


Philip's ears should now be burning, I mentioned him so many times in this 
post.  Ah!  He beat me to it while I was writing this.  Awesome!

We would not be where we are with SharedMemory without his efforts over many 
years with his posix_ipc lib.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-23 Thread Philip Semanchuk


Philip Semanchuk  added the comment:

> On Feb 23, 2019, at 10:40 AM, Giampaolo Rodola'  
> wrote:
> 
> 
> Giampaolo Rodola'  added the comment:
> 
>> We are consciously choosing to not support an atomic "create or attach".  
>> This significantly simplifies the API and avoids the valid concerns raised 
>> around user confusion relating to that behavior (including the use of 
>> different specified 'size' values in a race) but does not preclude our 
>> potentially introducing this as a feature in the future.
> 
> I understand that because of *size* we cannot solve the race condition issue 
> unless the user uses some sort of synchronization mechanism. FWIW I bumped 
> into this lib:
> http://semanchuk.com/philip/sysv_ipc/
> ...which provides two separate APIs to "create" and "attach":
> 
 SharedMemory("name", IPC_CREX)
 attach("name")
> 
> At this point I'm agnostic about the API, which is probably just a matter of 
> personal taste (e.g. one may prefer a separate SharedMemory.attach() 
> classmethod or a *mode* argument accepting "x" and "a"). I see that that lib 
> use shmat() on attach and shmdt() on detach. I'm not sure if that makes a 
> difference, just mentioning it because your implementation doesn't do that on 
> close() and perhaps it should.

attach() and detach() are particular to SysV IPC which is different from the 
POSIX IPC that’s being used here. There’s no need for attach() and detach() 
with POSIX shared memory. 

POSIX IPC is generally simpler than SysV IPC, in part because it was developed 
after SysV IPC so the developers had the benefit of experience with the older 
API.

Side note: I’m the author of the sysv_ipc package you found, as well as the 
posix_ipc package.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-23 Thread Giampaolo Rodola'



Giampaolo Rodola'  added the comment:

> We are consciously choosing to not support an atomic "create or attach".  
> This significantly simplifies the API and avoids the valid concerns raised 
> around user confusion relating to that behavior (including the use of 
> different specified 'size' values in a race) but does not preclude our 
> potentially introducing this as a feature in the future.

I understand that because of *size* we cannot solve the race condition issue 
unless the user uses some sort of synchronization mechanism. FWIW I bumped into 
this lib:
http://semanchuk.com/philip/sysv_ipc/
...which provides two separate APIs to "create" and "attach":

>>> SharedMemory("name", IPC_CREX)
>>> attach("name")

At this point I'm agnostic about the API, which is probably just a matter of 
personal taste (e.g. one may prefer a separate SharedMemory.attach() 
classmethod or a *mode* argument accepting "x" and "a"). I see that that lib 
use shmat() on attach and shmdt() on detach. I'm not sure if that makes a 
difference, just mentioning it because your implementation doesn't do that on 
close() and perhaps it should.


> Combined with a SyncManager.Lock, users can already achieve an atomic "create 
> or attach" using this simpler API.

That assumes a single app/process which spawns a child (the "worker"). In that 
case SyncManager.Lock/others is indeed compatible with SharedMemory and can be 
used to implement non-racy "create or attach" and also to synchronize memory 
access on read and write. But AFAICT there are two uses cases for this 
functionality, and there currently is no mechanism to do any that if you have 
two unrelated apps/processes relying on a common shared memory *name* and 
*size*. I'm taking a peek at "man smh_overview" which says:

<>

That would translate into a new Semaphore(name=None, create=False) class which 
(possibly?) would also provide better performances compared to 
SyncManager.Semaphore. Not sure if we can do the same on Windows though. 

Extra 1: apparently there are also POSIX msgget(), msgrcv() and msgsnd() 
syscalls which could be used to implement a System-V message Queue similar to 
SyncManager.Queue later on. 

Extra 2: given the 2 distinct use-cases I wonder if the low-level component 
(shared_memory.py) really belongs to multiprocessing module. Perhaps this 
should be provided as a separate "sharedmemory" module with 
multiprocessing.managers.SharedMemoryMemory being the high-level interface.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-22 Thread Davin Potts



Davin Potts  added the comment:

> Code looks much better now. I'm still not convinced 
> "SharedMemory(name=None, create=False, size=0)" is the best API.
> How are you supposed to "create or attach" atomically?

We are consciously choosing to not support an atomic "create or attach".  This 
significantly simplifies the API and avoids the valid concerns raised around 
user confusion relating to that behavior (including the use of different 
specified 'size' values in a race) but does not preclude our potentially 
introducing this as a feature in the future.

This simpler API still supports a "try: create; except: attach" which is not 
atomic but effectively covers the primary use cases for "create or attach".  
Combined with a SyncManager.Lock, users can already achieve an atomic "create 
or attach" using this simpler API.


> Also, could you address my comment about size?
> https://bugs.python.org/issue35813#msg335731
>> Let me rephrase: are we forced to specify a value (aka call
>> ftruncate()) on create ? If we are as I think, could size have a
>> reasonable default value instead of 0? Basically I'm wondering if we
>> can relieve the common user from thinking about what size to use,
>> mostly because it's sort of a low level detail. Could it perhaps
>> default to mmap.PAGESIZE?

Apologies for not responding to your question already, Giampaolo.

For the same reasons that (in C) malloc does not provide a default size, I do 
not think we should attempt to provide a default here.  Not all platforms 
allocate shared memory blocks in chunks of mmap.PAGESIZE, thus on some 
platforms we would unnecessarily over-allocate no matter what default size we 
might choose.  I do not think we should expect users to know what mmap.PAGESIZE 
is on their system.  I think it is important that if a user requests a new 
allocation of memory, that they first consider how much memory will be needed.  
When attaching to an existing shared memory block, its size is already defined.

I think this even fits with CPython's over-allocation strategies behind things 
like list, where an empty list triggers no malloc at all.  We will not allocate 
memory until the user tells us how much to allocate.


> Also, there is no way to delete/unwrap memory without using an
> existing SharedMemory instance, which is something we may not have
> on startup. Perhaps we should have a "shared_memory.unlink(name)"
> function similar to os.unlink() which simply calls C shm_unlink().

It is not really possible to offer this on non-POSIX platforms so I think we 
should not attempt to offer a public "shared_memory.unlink(name)".  It is 
possible to invoke "shm_unlink" with the name of a shared memory block (for 
those who really need it) on platforms with POSIX Shared Memory support via:
shared_memory._posixshmem.shm_unlink('name')

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-22 Thread Giampaolo Rodola'



Giampaolo Rodola'  added the comment:

Also, there is no way to delete/unwrap memory without using an existing 
SharedMemory instance, which is something we may not have on startup. Perhaps 
we should have a "shared_memory.unlink(name)" function similar to os.unlink() 
which simply calls C shm_unlink().

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-22 Thread Giampaolo Rodola'



Giampaolo Rodola'  added the comment:

Code looks much better now. I'm still not convinced "SharedMemory(name=None, 
create=False, size=0)" is the best API. How are you supposed to "create or 
attach" atomically? You can do that with O_EXCL but as it stands this is not 
togglable via the API.

Also, could you address my comment about size?
https://bugs.python.org/issue35813#msg335731

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-20 Thread Davin Potts



Davin Potts  added the comment:

The simpler API is now implemented in GH-11816 as discussed previously.  
Notably:
> * We go with this simpler API:  SharedMemory(name=None, create=False, size=0)
> * 'size' is ignored when create=False
> * create=True acts like O_CREX and create=False only attaches to existing 
> shared memory blocks

As part of this change, the PosixSharedMemory and WindowsNamedSharedMemory 
classes are no more; they have been consolidated into the SharedMemory class 
with a single, simpler, consistent-across-platforms API.

On the SharedMemory class, 'size' is now stored by the __init__ and does not 
use fstat() as part of its property.

Also, SharedMemoryManager (and its close friends) has been relocated to the 
multiprocessing.managers submodule, matching the organization @Giampaolo 
outlined previously:
multiprocessing.managers.SharedMemoryManager
multiprocessing.managers._SharedMemoryTracker
multiprocessing.managers.SharedMemoryServer  (not documented)
multiprocessing.shared_memory.SharedMemory
multiprocessing.shared_memory.SharedList 
multiprocessing.shared_memory.WindowsNamedSharedMemory  (REMOVED)
multiprocessing.shared_memory.PosixSharedMemory  (REMOVED)

I believe this addresses all of the significant discussion topics in a way that 
brings together all of the excellent points being made.  Apologies if I have 
missed something -- I did not think so but I will go back through all of the 
discussions tomorrow to double-check.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-17 Thread Davin Potts



Davin Potts  added the comment:

> I think we need the "create with exclusive behavior" option, even
> though we don't know how to implement it on Windows right now.

A fix to avoid the potential race condition on Windows is now part of GH-11816.


> To support 1 & 2, we could just have 'create'.  When true, it would
> act like O_CREX.  When false, you would get an error if the name
> doesn't already exist.

I am good with this and now it can be supported.


> a 3rd case where you have "co-equal" processes and any one of them
> could create and the others would attach.

There are some practical use cases motivating this.  Rather than debate the 
merits of those use cases, given the concern raised, perhaps we should forego 
supporting this 3rd case for now.


> Regarding 'size', I think it is a bit weird how it currently works.
> Maybe 'size' should only be valid if you are creating a new shared
> memory object.

This would avoid potential confusion in the details of how attempts to resize 
do/don't work on different platforms.  I would prefer to not need to explain 
that on MacOS, requesting a smaller size is disallowed.  This defers such 
issues until considering a "resize()" method as you suggest.

I like this.


> Should 'size' be a property that always does fstat() to find the
> size of the underlying file?

The potential exists for non-Python code to attach to these same shared memory 
blocks and alter their size via ftruncate() (only on certain Unix platforms).

We could choose to not support such "external" changes and let size be a fixed 
value from the time of instantiation.  But I would like to believe we can be 
more effective and safely use fstat() behind our reporting of 'size'.


> It seems unclear to me how you should avoid cluttering /var/run/shm
> with shared memory objects that people forget to cleanup.

This is the primary purpose of the SharedMemoryManager.  Admittedly, we will 
not convince everyone to use it when they should, just like we are not able to 
convince everyone to use NamedTemporaryFile for their temp files.


To update the proposed change to the API:

* We go with this simpler API:  SharedMemory(name=None, create=False, size=0)
* 'size' is ignored when create=False
* create=True acts like O_CREX and create=False only attaches to existing 
shared memory blocks

Remaining question:  do PosixSharedMemory and WindowsNamedSharedMemory mirror 
this simplified API or do we expose the added functionality each offers, 
permitting informed users to use things like 'mode' when they know it is 
enforced on a particular platform?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-17 Thread Neil Schemenauer



Neil Schemenauer  added the comment:

Some thoughts on this API.

I think we need the "create with exclusive behavior" option, even though we 
don't know how to implement it on Windows right now.  To me, there are two 
cases when calling SharedMemory:

1) You want to create a new shared memory object.  In this case, if your chosen 
name conflicts with an existing name, you should get an error.  If you don't, 
you could be sharing data between unrelated processes.  Chaos will ensue.

2) You want to attach to an existing shared memory object.  You have a name, 
passed to you by some means.  In this case, you don't want to create a new 
object if it doesn't exist.  You should get an error.

I thought there might be a 3rd case where you have "co-equal" processes and any 
one of them could create and the others would attach.  That dosen't seem safe 
though, if you can't ensure a unique name is used.  You might attach to some 
unrelated shared memory object due to a name conflict.  So, I think the API 
should not support this case.

To support 1 & 2, we could just have 'create'.  When true, it would act like 
O_CREX.  When false, you would get an error if the name doesn't already exist.

Regarding 'size', I think it is a bit weird how it currently works.  Maybe 
'size' should only be valid if you are creating a new shared memory object.  If 
you are attaching to an existing one, size would be found from the existing 
object (if that's possible on all platforms).  SharedMemory could grow a 
"resize()" method that lets you enlarge the underlying memory object.  I don't 
know if that's useful in practice so maybe better to leave it out and keep the 
API simple.

Should 'size' be a property that always does fstat() to find the size of the 
underlying file?  Or, should it be the size passed in (for create=True) or the 
size of the file when the mmap is created.  I'm not sure but maybe it is better 
to not always do the fstat().

The 'read_only' keyword doesn't make sense if you care creating a new object 
and need to call ftruncate() on it.  I think the OS will not allow that.  Maybe 
'read_only' should not be allowed if you are creating.  Or, you should have a 
method that takes a read-write object and makes it read-only.

On Linux at least, shm_open() is just a thin layer over open().  So the shared 
memory file is really just a regular file that is stored in /var/run/shm.  If 
you realize that, it seems SharedMemory() is much like 
tempfile.NamedTemporaryFile().  E.g. if you create a NamedTemporaryFile under 
/var/run/shm it will behave like the file descriptor created by SharedMemory(). 
 SharedMemory() objects have the difference that they don't clean themselves 
up.  Also, they don't set the close-on-fork flag on the file descriptor.  Maybe 
they should?  It seems unclear to me how you should avoid cluttering 
/var/run/shm with shared memory objects that people forget to cleanup.  I guess 
the plan is that people need to call unlink() at the right time.  That seems 
error prone though.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-16 Thread Giampaolo Rodola'



Giampaolo Rodola'  added the comment:

> It looks like it does skip calling ftruncate() if size is 0. From 
> posixshmem.c:

Let me rephrase: are we forced to specify a value (aka call ftruncate()) on 
create ? If we are as I think, could size have a reasonable default value 
instead of 0? Basically I'm wondering if we can relieve the common user from 
thinking about what size to use, mostly because it's sort of a low level 
detail. Could it perhaps default to mmap.PAGESIZE?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-16 Thread Davin Potts



Davin Potts  added the comment:

@giampaolo:

> Also, what happens if you alter the size of an existing object with a smaller 
> value? Is the memory region overwritten?

Attaching to an existing shared memory block with a size=N which is smaller 
than its allocated size (say it was created with size=M and N Can't you just avoid calling ftruncate() if size is not passed (None)?

It looks like it does skip calling ftruncate() if size is 0.  From posixshmem.c:
if (size) {
DPRINTF("calling ftruncate, fd = %d, size = %ld\n", self->fd, size);
if (-1 == ftruncate(self->fd, (off_t)size)) {


>> I think this misses the ...
> It appears this is already covered:

Sorry for any confusion; I was interpreting your proposed parameter name, 
attach_if_exists, in the following way:
* attach_if_exists=True:   If exists, attach to it otherwise create one
* attach_if_exists=False:  Create a new one but do not attach to an existing 
with the same name
I did not see a way to indicate a desire to *only* attach without creation.  I 
need a way to test to see if a shared memory block already exists or not 
without risk of creating one.  At least this is how I was interpreting "attach 
if exists".


> Don't you also want to "create if it doesn't exist, else attach" as a single, 
> atomic operation?

Yes, I do!  This was part of my description for the parameter named "create" in 
msg335660:
When set to True, a new shared memory block will be created unless
one already exists with the supplied unique name, in which case that block
will be attached to and used.


> I'm not sure if there are or should be sync primitives to "wait for another 
> memory to join me" etc.

In the case of shared memory, I do not think so.  I think such signaling 
between processes, when needed, can be accomplished by our existing signaling 
mechanisms (like, via the Proxy Objects for Event or Semaphore).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-16 Thread Giampaolo Rodola'



Giampaolo Rodola'  added the comment:

As for *flags* argument "man shm_open" mentions 5 possible flags:

- O_RDONLY, OR_RDWR: these can be served by existing *read_only* arg
- O_CREAT, O_EXCL: dictates whether to raise error if file exists; could be 
served via *attach_if_exists* arg, if added, or internally if not added
- O_TRUNC: related to *size* argument (it seems it should be set when size=0)

As such I think "flags" should not be exposed. As for Windows, *read_only* is 
currently exposed but ignored, and no other flags are contemplated. It seems 
you can implement *read_only* by passing FILE_MAP_READ or FILE_MAP_WRITE to 
OpenFileMappingW() here:
https://github.com/python/cpython/blob/1e5341eb171d7d2b90020cfcc0a64021326d/Lib/multiprocessing/shared_memory.py#L98
Not sure how to reliably check if a file mapping already exists, but possibly 
POSIX and Windows should behave the same in this regard (and some test cases to 
exercise all the possible combinations would be good to have).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-16 Thread Giampaolo Rodola'



Giampaolo Rodola'  added the comment:

> The unlink() method is available on the SharedMemory class.  No manager is 
> required. This is also captured in the docs.

I missed that, sorry.


>> 3) it seems "size" kwarg cannot be zero (current default)
> From the docs: When attaching to an existing shared memory block, set to 0

Mmm... this makes the API a bit weird because it's a arg which is optional only 
for existing memory object, not new ones. I see that internally *size* is 
passed to ftruncate(). Can't you just avoid calling ftruncate() if size is not 
passed (None)? Also, what happens if you alter the size of an existing object 
with a smaller value? Is the memory region overwritten?

a = shared_memory.SharedMemory(None, size=10)
b = shared_memory.SharedMemory(a.name, size=4)


>> Maybe something like this instead?
>>SharedMemory(name=None, attach_if_exists=False, size=0)
> I think this misses the use case when wanting to ensure we only attach to an 
> existing shared memory block and if it does not exist, we should raise an 
> exception because we can not continue.

It appears this is already covered:

>>> SharedMemory(name="non-existent")
Traceback (most recent call last):
...
_posixshmem.ExistentialError: No shared memory exists with the specified 
name


> I believe the two dominant use cases to address are:
> 1) I want to create a shared memory block (either with or without a 
> pre-conceived name).
> 2) I want to attach to an existing shared memory block by its unique name.

Don't you also want to "create if it doesn't exist, else attach" as a single, 
atomic operation? It seems to me the only way to achieve that would be this, 
but it's racy:

try:
m = shared_memory.SharedMemory("foo", size=10)  # create
except ExistentialError:
m = shared_memory.SharedMemory("foo")  # attach (race condition)

Note that here I'm entering into an unknown territory, because I'm not sure if 
there are or should be sync primitives to "wait for another memory to join me" 
etc. Just mentioning it for the sake of mental lucubration. =)


>  If we move SharedMemoryManager to live in multiprocessing.managers, we need 
> to decide how to handle (and communicate to the user appropriately) its 
> potential absence.

How about:

# managers.py
try:
from .shared_memory import SharedMemory, SharedList 
except ImportError:
pass
else:
class SharedMemoryManager: ...
class SharedMemoryServer: ...
del SharedMemory, SharedList


>> 4) I have some reservations about SharedMemory's "flags" and "mode" args.
> It sounds like you are agreeing with what I advocated in msg335660 (up 
> above). Great!

If you mean drop those 2 args entirely then it probably there's no point in 
exposing WindowsNamedSharedMemory and PosixSharedMemory . If you mean to keep 
them then it seems "read_only=True" would conflict with "flags=O_RDWR". The 
fact that the Windows counterpart uses different constants makes me think this 
bitwise stuff should probably be handled internally, and exposed via friendly 
named-args like "read_only" / "attach_if_exists" or similar (but it's still not 
clear what those named args should be exactly).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-16 Thread Davin Potts



Davin Potts  added the comment:

@giampaolo:

> 1) As for SharedMemoryManager, I believe it should live in 
> multiprocessing.managers, not shared_memory.py.

I am happy to see it live in multiprocessing.managers so long as we can provide 
a clean way of handling what happens on a platform where we can not support 
shared memory blocks.  (We have implementations for PosixSharedMemory and 
NamedSharedMemory which together cover Windows, Linux, MacOS, the *BSDs, and 
possibly others but that does not cover everything.)

@Neil has already raised this question of what do we want the behavior to be on 
these unsupported platforms on import?  If everything dependent upon shared 
memory blocks remains inside shared_memory.py, then we could raise a 
ModuleNotFoundError or ImportError or similar when attempting to `import 
shared_memory`.  If we move SharedMemoryManager to live in 
multiprocessing.managers, we need to decide how to handle (and communicate to 
the user appropriately) its potential absence.  So far, I am unable to find a 
good example of another module where they have chosen to split up such code 
rather than keeping it all bottled up inside a single module, but perhaps I 
have missed something?


> 2) Same for SharedMemoryServer (which is a subclass of 
> multiprocessing.managers.Server).

Same thing as above.  If we decide how to handle the unsupported platforms on 
import, we can re-organize appropriately.


> 3) ShareableList name is kinda inconsistent with other classes (they all have 
> a "Shared" prefix). I'd call it SharedList instead.

Oooh, interesting.  I am happy to see a name change here.  To share how I came 
up with its current name:  I had thought to deliberately break the naming 
pattern here to make it stand out.  The others, SharedMemory, 
SharedMemoryManager, and SharedMemoryServer, are all focused on the shared 
memory block itself which is something of a more primitive concept (like 
accessing SharedMemory.buf as a memoryview) compared to working with something 
like a list (a less primitive, more widely familiar concept).  Likewise, I 
thought a dict backed by shared memory might be called a ShareableDict and 
other things like a NumPy array backed by shared memory might be called a 
ShareableNDArray or similar.  I was hoping to find a different pattern for the 
names of these objects-backed-by-shared-memory-blocks, but I am uncertain I 
found the best name.


> 4) I have some reservations about SharedMemory's "flags" and "mode" args.

It sounds like you are agreeing with what I advocated in msg335660 (up above).  
Great!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-16 Thread Davin Potts



Davin Potts  added the comment:

@giampaolo:

> Maybe something like this instead?
>  SharedMemory(name=None, attach_if_exists=False, size=0)

I think this misses the use case when wanting to ensure we only attach to an 
existing shared memory block and if it does not exist, we should raise an 
exception because we can not continue.  (If the shared memory block should 
already be there but it is not, this means something bad happened earlier and 
we might not know how to recover here.)

I believe the two dominant use cases to address are:
1) I want to create a shared memory block (either with or without a 
pre-conceived name).
2) I want to attach to an existing shared memory block by its unique name.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-16 Thread Davin Potts



Davin Potts  added the comment:

@giampaolo:

> 1) it seems SharedMemory.close() does not destroy the memory region
> (I'm able to re-attach to it via name). If I'm not mistaken only
> the manager can do that.

Correct, close() does not and should not destroy the memory region because 
other processes may still be using it.  Only a call to unlink() triggers the 
destruction of the memory region and so unlink() should only be called once 
across all the processes with access to that shared memory block.

The unlink() method is available on the SharedMemory class.  No manager is 
required.  This is also captured in the docs.


> 2) I suggest to turn SharedMemory.buf in a read-onl property

Good idea!  I will make this change today, updating GH-11816.


> 3) it seems "size" kwarg cannot be zero (current default)

>From the docs:
When attaching to an existing shared memory block, set to 0 (which is the 
default).

This permits attaching to an existing shared memory block by name without 
needing to also already know its size.


> 4) I wonder if we should have multiprocessing.active_memory_children() or 
> something

I also think this would be helpful but...

> I'm not sure if active_memory_children() can return meaningful results with a 
> brand new process (I suppose it can't).

You are right.  As an aside, I think it interesting that in the implementation 
of "System V Shared Memory", its specification called for something like a 
system-wide registry where all still-allocated shared memory blocks were 
listed.  Despite the substantial influence System V Shared Memory had on the 
more modern implementations of "POSIX Shared Memory" and Windows' "Named Shared 
Memory", neither chose to make it part of their specification.

By encouraging the use of SharedMemoryManager to track and ensure cleanup, we 
are providing a reliable and cross-platform supportable best practice.  If 
something more low-level is needed by a user, they can choose to manage cleanup 
themselves.  This seems to parallel how we might encourage, "when opening a 
file, always use a with statement", yet users can still choose to call open() 
and later close() when they wish.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-16 Thread Giampaolo Rodola'



Giampaolo Rodola'  added the comment:

Hopefully my last iteration: =)

1) As for SharedMemoryManager, I believe it should live in 
multiprocessing.managers, not shared_memory.py. It's a subclass of Manager and 
behaves like a manager (start(), shutdown(), get_server(), etc.), so IMO that's 
its natural place. 

2) Same for SharedMemoryServer (which is a subclass of 
multiprocessing.managers.Server). SharedMemoryTracker appears to be just a 
support class for it, so IMO it should either be private or not documented.

3) ShareableList name is kinda inconsistent with other classes (they all have a 
"Shared" prefix). I'd call it SharedList instead.

4) Ideally the (public) APIs I have in mind are:

multiprocessing.managers.SharedMemoryManager
multiprocessing.managers._SharedMemoryTracker
multiprocessing.managers.SharedMemoryServer  (not documented)
multiprocessing.shared_memory.SharedMemory
multiprocessing.shared_memory.SharedList 
multiprocessing.shared_memory.WindowsNamedSharedMemory  (maybe)
multiprocessing.shared_memory.PosixSharedMemory  (maybe)

AFAIU there are 2 distinct use-cases at play: 
- two separate apps agreeing on a specific fixed memory *name* to attach to, 
which will use SharedMemory/List directly
- one single app using a manager and a worker, which will use 
SharedMemoryManager, and get SharedMemory/List directly from it (see 
https://github.com/python/cpython/pull/11816/files#r257458137)
IMO the 2 different namespaces would reflect and enforce this separation of use 
cases. 

4) I have some reservations about SharedMemory's "flags" and "mode" args.
* flags (O_CREAT, O_EXCL, O_CREX, O_TRUNC, O_RDONLY): it seems it may conflict 
with "read_only" arg. I wonder if we could achieve the same goal with more 
high-level named args instead (e.g. "create" / "attach_if_exists"). If in 
doubt, I would recommend to simply drop it for now. 
* mode: same considerations as above. Doc says "Its specification is not 
enforceable on all platforms" which makes me think it's probably better to drop 
it (also not sure why it defaults to 384).

Hope this helps.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-16 Thread Giampaolo Rodola'



Giampaolo Rodola'  added the comment:

> Because shared memory blocks are not "owned" by a single process...
> [...]
> I propose a simpler, consistent-across-platforms API like:
> SharedMemory(name=None, create=False, size=0)

Maybe something like this instead?

  SharedMemory(name=None, attach_if_exists=False, size=0)

The use case I'm thinking about is 2 distinct processes/apps which agree on a 
common fixed name.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-16 Thread Giampaolo Rodola'



Giampaolo Rodola'  added the comment:

> Because shared memory blocks are not "owned" by a single process, they are 
> not destroyed/freed when a process exits. 

1) it seems SharedMemory.close() does not destroy the memory region (I'm able 
to re-attach to it via name). If I'm not mistaken only the manager can do that. 
As such it probably makes sense to have an unlink() or destroy() methods. Maybe 
SharedMemory should support the with statement. 


2) it seems SharedMemory is only supposed to handle `memoryview`s. I suggest to 
turn SharedMemory.buf in a read-onl property:

>>> sm = shared_memory.SharedMemory(None, size=23)
>>> sm.buf = [1,2,3]
>>> sm.close()
  File "/home/giampaolo/svn/cpython-shm/Lib/multiprocessing/shared_memory.py", 
line 166, in close
self.buf.release()
AttributeError: 'list' object has no attribute 'release'


3) it seems "size" kwarg cannot be zero (current default). Suggestion: either 
choose default != 0 or make it a mandatory arg:

>>> shared_memory.SharedMemory(name=None)
ValueError: cannot mmap an empty file
>>> shared_memory.SharedMemory(name=None, size=0)
ValueError: cannot mmap an empty file
>>> shared_memory.SharedMemory(name=None, size=1)
PosixSharedMemory('psm_32161_20838', size=1)
>>> 


4) I wonder if we should have multiprocessing.active_memory_children() or 
something, similar to multiprocessing.active_children() so that one can do:

@atexit.register
def cleanup():
for x in multiprocessing.active_memory_children():
x.close()  # or unlink/destroy

I guess that would be useful for people not using a manager. Also, I'm not sure 
if active_memory_children() can return meaningful results with a brand new 
process (I suppose it can't).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-15 Thread Davin Potts



Davin Potts  added the comment:

Regarding the API of the SharedMemory class, its flags, mode, and read_only 
parameters are not universally enforced or simply not implemented on all 
platforms that offer POSIX Shared Memory or Windows Named Shared Memory.  A 
simplified API for the SharedMemory class that behaves consistently across all 
platforms would avoid confusion for users.  For users who have specific need of 
flags/mode/read_only controls on a platform that they know does indeed respect 
that control, they should still have a mechanism to leverage those controls.

I propose a simpler, consistent-across-platforms API like:
SharedMemory(name=None, create=False, size=0)

*name* and *size* retain their purpose though the former now defaults to 
None.
*create* is set to False to indicate no new shared memory block is to be
created because we only wish to attach to an already existing shared memory
block.  When set to True, a new shared memory block will be created unless
one already exists with the supplied unique name, in which case that block
will be attached to and used.

Example of attaching to an already existing shared memory block:
SharedMemory(name='uniquename')
Example of creating a new shared memory block where any new name will do:
SharedMemory(create=True, size=128)
Example of creating/attaching a shared memory block with a specific name:
SharedMemory(name='specialsnowflake', create=True, size=4096)


Even with its simplified API, SharedMemory will continue to be powered by 
PosixSharedMemory on systems where "POSIX Shared Memory" is implemented and 
powered by NamedSharedMemory on Windows systems.  The API for PosixSharedMemory 
will remain essentially unchanged from its current form:
PosixSharedMemory(name=None, flags=None, mode=0o600, size=0, 
read_only=False)
The API for NamedSharedMemory will be updated to no longer attempt to mirror 
its POSIX counterpart:
NamedSharedMemory(name=None, create=False, size=0, read_only=False)

To be clear:  the inconsistencies motivating this proposed API change is *not* 
only arising from differences between Windows and POSIX-supporting systems.  
For example, among systems implementing POSIX shared memory, the mode flag 
(which promises control over whether user/group/others can read/write to a 
shared memory block) is often but not always ignored; it differs from one OS to 
the next.


Alternatives/variations to this proposed API change:

* Leave the current APIs alone where all 3 classes have identical APIs.  
Feedback in discussions and from those experimenting with the code suggests 
this is creating confusion.

* Change all 3 classes to have the matching APIs again.  This unnecessarily 
thwarts the ability of users to exploit functionality that they know to be 
there on specific target platforms that they care about.

* Do not expose flags/mode/read_only as part of the input paramemters to 
PosixSharedMemory/NamedSharedMemory but do expose them as class attributes 
instead.  This arguably makes things unnecessarily complicated.  This is not a 
simple topic but its complexity can be treated in a more straightforward way.

* Use a parameter name other than 'create' (e.g. 'attach_only') in the newly 
proposed API.

* Make all input parameters keyword-only for greater flexibility in the API in 
the future.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-15 Thread Davin Potts



Davin Potts  added the comment:

These questions (originally asked in comments on GH-11816) seemed more 
appropriate to discuss here:
Why should the user want to use `SharedMemory` directly?
Why not just go through the manager?  Also, perhaps a
naive question: don't you _always_ need a `start()`ed
manager in order for the processes to communicate?
Doesn't `SharedMemoryServer` has to be involved?

I think it helps to discuss the last question first.  A SharedMemoryManager is 
*not* needed for two processes to share information across a shared memory 
block, nor is a SharedMemoryServer required.  The docs have examples 
demonstrating this but here is another meant to showcase exactly this:

Start up a Python shell and do the following:
>>> from multiprocessing import shared_memory
>>> shm = shared_memory.SharedMemory(name=None, size=10)
>>> shm.buf[:5] = b'Feb15'
>>> shm.name  # Note this name and use it in the next steps
'psm_26792_26631'

Start up a second Python shell in a new window and do the following:
>>> from multiprocessing import shared_memory
>>> also_shm = shared_memory.SharedMemory(name='psm_26792_26631')  # 
Use that same name
>>> bytes(also_shm.buf[:5])
b'Feb15'

If also_shm.buf is further modified in the second shell, those
changes will be visible on shm.buf in the first shell.  The same
is true of the reverse.

The key point is that there is no sending of messages between the processes at 
all.  In stark contrast, SyncManager offers and supports objects held in 
"distributed shared memory" where messages must be sent from one process to 
another to access or manipulate data; those objects held in "distributed shared 
memory" *must* have a SyncManager+Server to enable their use.  That is not 
needed at all for SharedMemory because access to and manipulation of the data 
is performed directly without the cost-delay of messaging.

This begs a new question, "so what is the SharedMemoryManager used for then?"  
The docs answer:
To assist with the life-cycle management of shared memory
especially across distinct processes, a BaseManager subclass,
SharedMemoryManager, is also provided.
Because shared memory blocks are not "owned" by a single process, they are not 
destroyed/freed when a process exits.  A SharedMemoryManager is used to ensure 
the free-ing of a shared memory block when it is no longer needed.

New SharedMemory instances may be created via a SharedMemoryManager (in which 
case their birth-to-death life-cycle is being managed) or they may be created 
directly as seen in the above example.  Returning to the first question, "Why 
should the user want to use `SharedMemory` directly?", there are more use cases 
than these two:
1. In process 1, a shared memory block is created by calling 
SharedMemoryManager.SharedMemory().  In process 2, we need to attach to that 
existing shared memory block and can do so by referring to its name.  This is 
accomplished as in the above example by simply calling 
SharedMemory(name='uniquename').  We do not want to attach to it via a second 
SharedMemoryManager because only one manager should oversee the life-cycle of a 
single shared memory block.
2. Sometimes direct management of the life-cycle of a shared memory block is 
desirable.  For example, on systems supporting POSIX shared memory, it is a 
feature that shared memory blocks outlive processes.  Some services choose to 
speed a service restart by preserving state data in shared memory, saving the 
newly restarted service from rebuilding it.  The SharedMemoryManager provides 
one life-cycle strategy but can not cover all scenarios so the option to 
directly manage it is important.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-12 Thread Antoine Pitrou



Antoine Pitrou  added the comment:

> SharedMemoryManager does not subclass SyncManager but it did previously.

Ah, right.  Then it's ok to change the tests IMO.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-12 Thread Davin Potts



Davin Potts  added the comment:

@Antoine:  SharedMemoryManager does not subclass SyncManager but it did 
previously.  This is the source of the confusion.  SharedMemoryManager 
subclasses BaseManager which does not provide Value, Array, list, dict, etc.

Agreed that the manager facility does not appear to see that much use in 
existing code.

When working with shared memory, I expect SharedMemoryManager to be much more 
popular.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-12 Thread Antoine Pitrou



Antoine Pitrou  added the comment:

Davin:

> This is my fault because I altered SharedMemoryManager to no longer support 
> functionality from SyncManager that I thought could be confusing to include.  
> I am just now discovering this and am not immediately sure if simply removing 
> the SharedMemoryManager-relevant lines from your patch is the right solution 
> but I wanted to mention this thought right away.

If SharedMemoryManager subclasses SyncManager then I *think* it should obey the 
SyncManager contract.  Regardless of the shared memory facility, the user may 
also want to "shared" regular proxies between processes.

(to be honest, I don't think the multiprocessing Manager facility is used a lot 
currently...)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-11 Thread Davin Potts



Davin Potts  added the comment:

@giampaolo.rodola: It definitely helps.


Conceptually, SyncManager provides "distributed shared memory" where lists, 
dicts, etc. are held in memory by one process but may be accessed remotely from 
another via a Proxy Object.  Mutating a dict from one process requires sending 
a message to some other process to request the change be made.

In contrast, SharedMemoryManager provides non-distributed shared memory where a 
special region of memory is held by the OS kernel (not a process) and made 
directly addressable to many processes simultaneously.  Modifying any data in 
this special region of memory requires zero process-to-process communication; 
any of the processes may modify the data directly.

In a speed contest, the SharedMemoryManager wins in every use case -- and it is 
not a close race.  There are other advantages and disadvantages to each, but 
speed is the key differentiator.


Thinking ahead to the future of SharedMemoryManager, there is the potential for 
a POSIX shared memory based semaphore.  The performance of this semaphore 
across processes should drastically outperform SyncManager's semaphore.  It 
might be something we will want to support in the future.  SharedMemoryManager 
needs a synchronization mechanism now (in support of common use cases) to 
coordinate across processes, which is why I initially thought 
SharedMemoryManager should expose the Lock, Semaphore, Event, Barrier, etc. 
powered by distributed shared memory.  I am no longer sure this is the right 
choice for three reasons:
(1) it unnecessarily complicates and confuses the separation of what is powered 
by fast SystemV-style shared memory and what is powered by slow distributed 
shared memory,
(2) it would be a very simple example in the docs to show how to add our 
existing Lock or Semaphore to SharedMemoryManager via register(),
(3) if we one day implement POSIX shared memory semaphores (and equivalent 
where POSIX is not supported), we will have the burden of an existing 
lock/semaphore creation methods and apis with behavioral differences.


I propose that it would be clearer but no less usable if we drop these 
registered object types (created via calls to register()) from 
SharedMemoryManager.  It is one line of code for a user to add "Lock" to 
SharedMemoryManager, which I think we can demonstrate well with a simple 
example.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-11 Thread Raymond Hettinger



Raymond Hettinger  added the comment:

FWIW, I read the draft docs and found them to be at the right level throughout. 
I definitely wouldn't want anything more terse.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-11 Thread Giampaolo Rodola'



Giampaolo Rodola'  added the comment:

I submitted an initial review / comments in the PR. I think this is good for a 
first iteration in order to understand what APIs to expose publicly (also, keep 
in mind I may not have a full picture of how this is intended to be used 
precisely).

As for the doc, I find it a bit too verbose. If the namespace change is 
accepted consider doing the following:

* document `SharedMemoryManager` right after `SyncManager`. That would let you 
reuse the description for `Semaphore`, `Barrier`, `Lock`, etc.
* within `SharedMemoryManager` doc link/point to a different section of the doc 
where you explain the whole thing in more details, and also document the 
remaining APIs 

Just an idea. Writing good doc is not easy and it requires actually putting 
hands on it. 

Hope this helps.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-11 Thread Davin Potts



Davin Potts  added the comment:

@terry.reedy and @ronaldoussoren: I have asked Van again to provide comments 
here clarifying the topics of (1) copyright notices and (2) requiring the 
BSD-licensed-work's author to sign a contributor agreement.

Specifically regarding the appearance of __copyright__, I added my agreement to 
your comments on GH-11816 on this.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-10 Thread Davin Potts



Davin Potts  added the comment:

@giampaolo.rodola: Your patch from 3 days ago in issue35917 included additional 
tests around the SharedMemoryManager which are now causing test failures in my 
new PR.  This is my fault because I altered SharedMemoryManager to no longer 
support functionality from SyncManager that I thought could be confusing to 
include.  I am just now discovering this and am not immediately sure if simply 
removing the SharedMemoryManager-relevant lines from your patch is the right 
solution but I wanted to mention this thought right away.

Thank you for discovering that SyncManager was being overlooked in the tests 
and the nice patch in issue35917.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-10 Thread Davin Potts



Davin Potts  added the comment:

Docs and tests are now available in a new PR.  I have stayed focused on getting 
these docs and tests to everyone without delay but that means I have not yet 
had an opportunity to respond to the helpful comments, thoughtful questions, 
and threads that have popped up in the last few days.  I will follow up with 
all comments as quickly as possible starting in the morning.

There are two topics in particular that I hope will trigger a wider discussion: 
 the api around the SharedMemory class and the inclusion-worthiness of the 
shareable_wrap function.

Regarding the api of SharedMemory, the docs explain that not all of the current 
input parameters are supportable/enforceable across platforms.  I believe we 
want an api that is relevant across all platforms but at the same time we do 
not want to unnecessarily suppress/hide functionality that would be useful on 
some platforms -- there needs to be a balance between these motivations but 
where do we strike that balance?

Regarding the inclusion-worthiness of the shareable_wrap function, I 
deliberately did not include it in the docs but its docstring in the code 
explains its purpose.  If included, it would drastically simplify working with 
NumPy arrays; please see the code example in the docs demonstrating the use of 
NumPy arrays without the aid of the shareable_wrap function.  I have received 
feedback from others using this function also worth discussing.


Thank you to everyone who has already looked at the code and shared helpful 
thoughts -- please have a look at the tests and docs.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-10 Thread Davin Potts



Change by Davin Potts :


--
pull_requests: +11834

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-10 Thread Davin Potts



Change by Davin Potts :


--
pull_requests: +11834, 11835, 11836

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-10 Thread Davin Potts



Change by Davin Potts :


--
pull_requests: +11834, 11835

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-07 Thread Antoine Pitrou



Antoine Pitrou  added the comment:

(also, OSError will automatically convert to more specific subclasses for some 
errnos, see PEP 3151:

>>> raise OSError(errno.ENOENT, "foobar")   
>>> 
>>>
Traceback (most recent call last):
  File "", line 1, in 
raise OSError(errno.ENOENT, "foobar")
FileNotFoundError: [Errno 2] foobar

)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-07 Thread Neil Schemenauer



Neil Schemenauer  added the comment:

I didn't finish reviewing completely yet but here are some comments.

I think the random filename generation can be pulled out of
_posixshmem.  Make it require that a filename is passed into it.
That moves a fair bit of complexity out of C and into Python.  In
the Python module, I suggest that you could use secrets.token_hex()
to build the filename.  Put the loop that retries because of name
collisions in the Python module as well.

If you like, I can try to make a patch that does the above.

When looking at at how the Python code would handle a name collision,
I see this code:


+switch (errno) {
+case EACCES:
+PyErr_Format(pPermissionsException,
+"No permission to %s this segment",
+(flags & O_TRUNC) ? "truncate" : "access"
+);
+break;
+
+case EEXIST:
+PyErr_SetString(pExistentialException,
+"Shared memory with the specified name already 
exists");
+break;
+
+case ENOENT:
+PyErr_SetString(pExistentialException,
+"No shared memory exists with the specified 
name");
+break;
+
+case EINVAL:
+PyErr_SetString(PyExc_ValueError, "Invalid parameter(s)");
+break;
+
+case EMFILE:
+PyErr_SetString(PyExc_OSError,
+ "This process already has the maximum number 
of files open");
+break;
+
+case ENFILE:
+PyErr_SetString(PyExc_OSError,
+ "The system limit on the total number of open 
files has been reached");
+break;
+
+case ENAMETOOLONG:
+PyErr_SetString(PyExc_ValueError,
+ "The name is too long");
+break;
+
+default:
+PyErr_SetFromErrno(PyExc_OSError);
+break;
+}


I think it might be cleaner just to make it:

PyErr_SetFromErrno(PyExc_OSError);

Then, if you need a customized exception or message, you can
re-raise inside the Python code.  To me, it seems simpler and more
direct to just preserve the errno and always raise OSError.  Changing things
to ValueError means that callers need to look at the message text to 
differentiate between some errno values.

Is it the case that _posixshmem started life as a module that would
be used directly and not something hidden by another layer of
abstraction?  If so, having these customized exceptions and having
it do the filename generation itself makes sense.  However, it is an
internal implementation detail of shared_memory.py, I think it
should be simplified to do only what is needed.  E.g. a thin layer
of the system calls.

--
nosy: +nascheme

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-06 Thread Giampaolo Rodola'



Giampaolo Rodola'  added the comment:

Also, for completeness (since discussion is getting split), please see my 
proposal to move SharedMemoryManager and SharedMemoryServer into 
multiprocessing.managers namespace and rename shared_memory.py to 
_shared_memory.py:
https://mail.python.org/pipermail/python-dev/2019-February/156235.html
Also, it appears ShareableList should be register()ed against 
SharedMemoryManager.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-06 Thread Giampaolo Rodola'



Giampaolo Rodola'  added the comment:

Unit-tests at https://bugs.python.org/issue35917.

--
nosy: +giampaolo.rodola

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-06 Thread Terry J. Reedy



Terry J. Reedy  added the comment:

I would prefer that we be consistent.  In any case, I think you should be added 
to Misc/ACKS in the PR.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-06 Thread Philip Semanchuk



Philip Semanchuk  added the comment:

Hi all, I'm the author of `posix_ipc` on which some of this code is based. I'd 
be happy to sign a contributor agreement in order to erase any concerns on that 
front.

--
nosy: +osvenskan

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-05 Thread Stefan Krah



Change by Stefan Krah :


--
nosy: +skrah

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-04 Thread pmpp



Change by pmpp :


--
nosy: +pmpp

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-04 Thread Eric Snow



Eric Snow  added the comment:

FTR, I've had a number of extensive conversations with Davin (over the last 
year and a half) about this feature.  His choices seemed reasonable (caveat: 
I'm not an expert on multiprocessing) and the feature seemed desirable.  I did 
not review his PR.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-04 Thread Antoine Pitrou



Antoine Pitrou  added the comment:

Also, I will want to review this as well.  Since Alpha 2 is scheduled for  
2019-02-24, and this is a sizable piece of work, I would prefer if a full PR 
was available, say, before 2019-02-15.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-04 Thread Ronald Oussoren



Ronald Oussoren  added the comment:

Please also address my review comments.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-04 Thread Davin Potts



Davin Potts  added the comment:

@lukasz.langa: Missing tests and documentation will be in by alpha2.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-04 Thread Łukasz Langa


Łukasz Langa  added the comment:

@Davin, in what time can you fill in the missing tests and documentation?  If 
this is something you finish do before alpha2, I'm inclined to leave the change 
in.

As it stands, I missed the controversy yesterday as I was busy making my first 
release.  So the merge *got released* in alpha1.  I would prefer to fix the 
missing pieces forward instead of reverting and re-submitting which will only 
thrash blame and history at this point.

FTR, I do agree with Antoine, Ronald and others that in the future such big 
changes should be as close to their ready state at merge time.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-03 Thread Ronald Oussoren



Ronald Oussoren  added the comment:

David, 

I don't agree with merging at this point.

The lack of documentation makes it hard to ask for more feedback and broader 
testing, because people won't know how to use the library and provide 
constructive feedback (that's why my review comments on GitHub are mostly 
nitpicking).

Furthermore, distributing this as a package on PyPI is much better way to 
solicit further testing and feedback, because then it can be used right now 
instead of having to wait for a release. Releasing on PyPI is feasible for this 
new functionality because it is a standalone library.

It would be better to revert this changeset until it is much closer to finished 
and there has been time for better code review.

BTW. My comment about copyright on the pull request was about the __copyright__ 
attribute of the C extension, that is rarely used in the stdlib and is IMHO 
something to avoid.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-03 Thread Davin Potts



Davin Potts  added the comment:

This work is the result of ~1.5 years of development effort, much of it 
accomplished at the last two core dev sprints.  The code behind it has been 
stable since September 2018 and tested as an independently installable package 
by multiple people.

I was encouraged by Lukasz, Yury, and others to check in this code early, not 
waiting for tests and docs, in order to both solicit more feedback and provide 
for broader testing.  I understand that doing such a thing is not at all a 
novelty.  Thankfully it is doing that -- I hope that feedback remains 
constructive and supportive.

There are some tests to be found in a branch (enh-tests-shmem) of 
github.com/applio/cpython which I think should become more comprehensive before 
inclusion.  Temporarily deferring and not including them as part of the first 
alpha should reduce the complexity of that release.

Regarding the BSD license on the C code being adopted, my conversations with 
Brett and subsequently Van have not raised concerns, far from it -- there is a 
process which is being followed to the letter.  If there are other reasons to 
object to the thoughtful adoption of code licensed like this one, that deserves 
a decoupled and larger discussion first.

--
nosy: +brett.cannon

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-03 Thread Terry J. Reedy



Terry J. Reedy  added the comment:

I think there is also a license problem.  posixshmem.c contains "Copyright 2012 
Philip Semanchuk, 2018-2019 Davin Potts"

Ronald commented "The only other files with a copyright attribute are parser, 
optparse and platform. I'd prefer to avoid adding new copyright attributes to 
stdlib modules."

(turtle.py has a problematic copyright notice in the docstring.)

I think we definitely should not add new copyright notices.

Copyright notices are redundant: all contributors retain copyright in their 
contributions.

Copyright notices are deceptive: 1. since they are rare, they imply that there 
is something special about a particular module and the listed authors; 2. even 
if a module were special, the notice becomes obsolete as soon as anyone else 
contributes to the module.

Copyright notices are not needed: contributors authorize PSF to distribute the 
collective work under a rather liberal license.  If anyone want to make a use 
of Python code not covered by that license, and wants to bypass PSF, they would 
have to look at git log and and git blame to find the relevant contributors.

In this case, part of the work is attributed to Philip Semanchuk as a current 
copyright owner.  According to https://bugs.python.org/user2567, he has not 
signed the contributor agreement, so his work should not have been merged until 
he has.  Even if he had, he would have to specifically agree to his work being 
contributed.  Sorry to be a grinch, but aside from anything else, I think this 
should be reverted until the legal question is clarified.

Some of this might need discussion on pydev.

--
nosy: +terry.reedy

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-03 Thread Ned Deily



Change by Ned Deily :


--
nosy:  -ned.deily

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-03 Thread Antoine Pitrou



Antoine Pitrou  added the comment:

And to make things clear, I'm not saying anything about the functionality.  
Easier shared memory + multiprocessing is definitely an interesting endeavour 
(and a non-trivial one).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-03 Thread Ronald Oussoren



Ronald Oussoren  added the comment:

FWIW I agree with reverting this pull request, the work is clearly not finished 
yet.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-03 Thread Antoine Pitrou



Antoine Pitrou  added the comment:

Yes, I think this was merged too soon.  Davin, the convention is that we only 
merge completed work, not some work-in-progress.

You already did this once in https://bugs.python.org/issue28053 .  You still 
haven't completed that piece of work.

I'm going to ask for this to be reverted.

--
nosy: +pitrou

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-03 Thread Ronald Oussoren



Ronald Oussoren  added the comment:

Was this merged too soon? This is new functionality without any docs and tests. 

I've also left review comments on the pull request.

--
nosy: +ronaldoussoren

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-01 Thread Davin Potts



Davin Potts  added the comment:


New changeset e5ef45b8f519a9be9965590e1a0a587ff584c180 by Davin Potts in branch 
'master':
bpo-35813: Added shared_memory submodule of multiprocessing. (#11664)
https://github.com/python/cpython/commit/e5ef45b8f519a9be9965590e1a0a587ff584c180


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-01-23 Thread Davin Potts



Change by Davin Potts :


--
keywords: +patch, patch
pull_requests: +11470, 11471
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-01-23 Thread Davin Potts



Change by Davin Potts :


--
keywords: +patch, patch, patch
pull_requests: +11470, 11471, 11472
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-01-23 Thread Davin Potts



Change by Davin Potts :


--
keywords: +patch
pull_requests: +11470
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue35813] shared memory construct to avoid need for serialization between processes

2019-01-23 Thread Davin Potts



New submission from Davin Potts :

A facility for using shared memory would permit direct, zero-copy access to 
data across distinct processes (especially when created via multiprocessing) 
without the need for serialization, thus eliminating the primary performance 
bottleneck in the most common use cases for multiprocessing.

Currently, multiprocessing communicates data from one process to another by 
first serializing it (by default via pickle) on the sender's end then 
de-serializing it on the receiver's end.  Because distinct processes possess 
their own process memory space, no data in memory is common across processes 
and thus any information to be shared must be communicated over a 
socket/pipe/other mechanism.  Serialization via tools like pickle is convenient 
especially when supporting processes on physically distinct hardware with 
potentially different architectures (which multiprocessing does also support).  
Such serialization is wasteful and potentially unnecessary when multiple 
multiprocessing.Process instances are running on the same machine.  The cost of 
this serialization is believed to be a non-trivial drag on performance when 
using multiprocessing on multi-core and/or SMP machines.

While not a new concept (System V Shared Memory has been around for quite some 
time), the proliferation of support for shared memory segments on modern 
operating systems (Windows, Linux, *BSDs, and more) provides a means for 
exposing a consistent interface and api to a shared memory construct usable 
across platforms despite technical differences in the underlying implementation 
details of POSIX shared memory versus Native Shared Memory (Windows).

For further reading/reference:  Tools such as the posix_ipc module have 
provided fairly mature apis around POSIX shared memory and seen use in other 
projects.  The "shared-array", "shared_ndarray", and "sharedmem-numpy" packages 
all have interesting implementations for exposing NumPy arrays via shared 
memory segments.  PostgreSQL has a consistent internal API for offering shared 
memory across Windows/Unix platforms based on System V, enabling use on 
NetBSD/OpenBSD before those platforms supported POSIX shared memory.

At least initially, objects which support the buffer protocol can be most 
readily shared across processes via shared memory.  From a design standpoint, 
the use of a Manager instance is likely recommended to enforce access rules in 
different processes via proxy objects as well as cleanup of shared memory 
segments once an object is no longer referenced.  The documentation around 
multiprocessing's existing sharedctypes submodule (which uses a single  memory 
segment through the heap submodule with its own memory management 
implementation to "malloc" space for allowed ctypes and then "free" that space 
when no longer used, recycling it for use again from the shared memory segment) 
will need to be updated to avoid confusion over concepts.

Ultimately, the primary motivation is to provide a path for better parallel 
execution performance by eliminating the need to transmit data between distinct 
processes on a single system (not for use in distributed memory architectures). 
 Secondary use cases have been suggested including a means for sharing data 
across concurrent Python interactive shells, potential use with 
subinterpreters, and other traditional uses for shared memory since the first 
introduction of System V Shared Memory onwards.

--
assignee: davin
components: Library (Lib)
messages: 334278
nosy: davin, eric.snow, lukasz.langa, ned.deily, rhettinger, yselivanov
priority: normal
severity: normal
status: open
title: shared memory construct to avoid need for serialization between processes
type: enhancement
versions: Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

67 matches

Mail list logo