[issue40440] allow array.array construction from memoryview w/o copy
Davin Potts added the comment:

Being able to create an array.array without making a copy of a memoryview's contents does sound valuable. We do not always want to modify the size of the array, as evidenced by array.array's existing behavior of suppressing its size-changing manipulations (like append) while a buffer is exported. So I think it is okay not to require that a copy be made when constructing an array.array in this way.

Serhiy's example is a good one for demonstrating how different parts of an array.array can be treated as having different types as far as getting and setting items.

I have met a number of hardware groups, mostly in larger companies, that use array.array to expose raw data read directly from devices. They wastefully make copies of their often large array.array objects, each with a distinct type code, so that they can make use of array.array's index(), count(), and other methods, which are not available on a memoryview.

Within the core of Python (that is, including the standard library but excluding 3rd party packages), we have a healthy number of objects that expose a buffer via the Buffer Protocol, but they lack the symmetry of going the other way: creation from an existing buffer. My sense is it would be welcome for something like array.array, which is designed to work with low-level data types, to support creation from an existing buffer without the need for a copy -- this is the explicit purpose of the Buffer Protocol, after all. array.array currently supports only export, not creation, which makes it feel inconsistent.

-- nosy: +davin

___ Python tracker <https://bugs.python.org/issue40440> ___

___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
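The asymmetry described in the comment can be seen with a few lines of stdlib-only code -- a sketch, not from the original report: export via the buffer protocol is zero-copy, but construction back into an array.array copies, and the memoryview itself lacks array.array's conveniences.

```python
from array import array

a = array('i', range(8))
m = memoryview(a)            # zero-copy export via the buffer protocol

# Constructing a new array from the memoryview copies the data:
b = array('i', m)
b[0] = 99
assert a[0] == 0             # the original is untouched -- b is a copy

# And a memoryview lacks array.array conveniences like index()/count():
assert hasattr(a, 'index') and not hasattr(m, 'index')
```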
[issue39584] multiprocessing.shared_memory: MacOS crashes by running attached Python code
Davin Potts added the comment:

My sense is that it would be nice if we can catch this before ftruncate does something nasty. Where else is ftruncate used in CPython where this could similarly trigger a problem? How is it handled there (or not)?

--

___ Python tracker <https://bugs.python.org/issue39584> ___
[issue33082] multiprocessing docs bury very important 'callback=' args
Davin Potts added the comment:

I appreciate the functionality offered by the callbacks and have found good uses for them, as Chad clearly does/has. That said, the thought of expanding the documentation on the callbacks had not come up for me. Reading through the proposed changes to the prose explanations, the choice of words has changed but not significantly, and virtually no new concepts are being explained. I agree with Julien that the docs should stay as they are.

Chad: Thank you for advocating for things you think more people need to know about, even if we do not update the docs this time.

-- resolution: -> rejected stage: patch review -> resolved status: open -> closed

___ Python tracker <https://bugs.python.org/issue33082> ___
[issue35727] sys.exit() in a multiprocessing.Process does not align with Python behavior
Davin Potts added the comment:

I believe the mentality behind multiprocessing.Process triggering an exit code of 1 when sys.exit() is invoked inside its process is to indicate a non-standard exit out of its execution. There may yet be other side effects that could be triggered by having a sys.exit(0) translate into an exit code of 0 from the Process's process -- and we might not notice them with the current tests.

Was there a particular use case that motivates this suggested change?

--

___ Python tracker <https://bugs.python.org/issue35727> ___
[issue22393] multiprocessing.Pool shouldn't hang forever if a worker process dies unexpectedly
Change by Davin Potts : -- pull_requests: +15722 pull_request: https://github.com/python/cpython/pull/16103 ___ Python tracker <https://bugs.python.org/issue22393> ___
[issue37652] Multiprocessing shared_memory ValueError on race with ShareableList
Davin Potts added the comment:

Apologies, one of the quotes in my previous response should have been attributed to @mental.

I think @pierreglaser phrased it very nicely:

> shared_memory is a low level python module. Precautions should be made when
> handling concurrently the shared_memory objects using synchronization
> primitives for example. I'm not sure this should be done internally in the
> SharedMemory class -- especially, we don't want to slow down concurrent READ
> access.

Per the further suggestion:

> +1 For a documentation addition.

I can take a crack at adding something more along the lines of this discussion, but I would very much welcome suggestions (@bjs, @mental, @pierreglaser)...

--

___ Python tracker <https://bugs.python.org/issue37652> ___
[issue37652] Multiprocessing shared_memory ValueError on race with ShareableList
Davin Potts added the comment:

Short responses to questions/comments from @bjs, followed by hopefully helpful further comments:

> Are you supposed to ever use a raw SharedMemory buffer directly?

Yes.

> What atomicity guarantees are there for ShareableList operations and
> read/write to the SharedMemory buffer?

None.

> I've had a fix spinning for about a day now, it introduced a
> `multiprocessing.Lock` and it was simply wrapped around any struct packing
> and unpacking calls.

That sounds like a nice general-purpose fix for situations where it is impossible to plan ahead to know when one or more processes will need to modify the ShareableList/SharedMemory.buf. When it is possible to design code that ensures -- because of the execution flow through the code -- that no two processes/threads will attempt to modify and access/modify the same location in memory at the same time, locks become unnecessary. Locks are great tools, but they generally result in slower-executing code.

> What are the use cases for SharedMemory and ShareableList?

Speed. If we don't care about speed, we can use distributed shared memory through the SyncManager -- this keeps one copy of a dict/list/whatever in the process memory space of a single process, and all other processes may modify or access it through two-sided communication with that "owner" process. If we do care about speed, we use SharedMemory and ShareableList and other things created on top of SharedMemory -- this effectively gives us fast, communal memory access where we avoid the cost of communication except when we truly need to synchronize (where multiprocessing.Lock can help).

Reduced memory footprint. If I have a "very large" amount of data consuming a significant percentage of the available memory on my system, I can make it available to multiple processes without duplication. This provides processes with access to that data as fast as if each were accessing data in its own process memory space.

It is one thing to imagine using this in parallel-executing code, but this can be just as useful in the Python interactive shell. One such scenario: after starting a time-consuming, non-parallel calculation in one Python shell, it is possible to open a new Python shell in another window and attach to the data through shared memory to continue work while the calculation runs in the first window.

--

___ Python tracker <https://bugs.python.org/issue37652> ___
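A minimal single-process sketch of the attach-by-name workflow described above -- the data written here is invented for illustration, and the "second process" is simulated by a second attachment in the same process:

```python
from multiprocessing import shared_memory

# Create a block; another process (or another interpreter window) can
# attach to it later using only its name.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b'hello'

# Simulate the second process: attach by name -- no copy is made.
other = shared_memory.SharedMemory(name=shm.name)
seen = bytes(other.buf[:5])

other.close()
shm.close()
shm.unlink()   # destroy the segment once no process needs it
```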
[issue37637] multiprocessing numpy.ndarray not transmitted properly
Davin Potts added the comment: Marking as closed after providing an example of how to send NumPy arrays as bytes with the send_bytes() function. -- resolution: -> not a bug stage: -> resolved status: -> closed ___ Python tracker <https://bugs.python.org/issue37637> ___
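The resolution refers to sending arrays as raw bytes with send_bytes(). A sketch of that approach, using array.array as a stand-in for a NumPy ndarray since NumPy is not in the stdlib (for NumPy one would send arr.tobytes() and rebuild with np.frombuffer on the other side):

```python
from array import array
from multiprocessing import Pipe

# array.array stands in for a NumPy ndarray here.
a = array('d', [1.0, 2.0, 3.0])

reader, writer = Pipe(duplex=False)
writer.send_bytes(a.tobytes())   # send the raw buffer, not a pickled object

b = array('d')
b.frombytes(reader.recv_bytes())  # rebuild from the raw bytes
assert b == a
```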
[issue38119] resource tracker destroys shared memory segments when other processes should still have valid access
Change by Davin Potts : -- keywords: +patch pull_requests: +15618 stage: -> patch review pull_request: https://github.com/python/cpython/pull/15989 ___ Python tracker <https://bugs.python.org/issue38119> ___
[issue37754] Consistency of Unix's shared_memory implementation with windows
Davin Potts added the comment: I have created issue38119 to track a fix to the inappropriate use of resource tracker with shared memory segments, but this does not replace or supersede what is discussed here. -- ___ Python tracker <https://bugs.python.org/issue37754> ___
[issue38119] resource tracker destroys shared memory segments when other processes should still have valid access
New submission from Davin Potts :

The resource tracker currently destroys (via _posixshmem.shm_unlink) shared memory segments on posix systems when any independently created Python process with a handle on a shared memory segment exits (gracefully or otherwise). This breaks the expected cross-platform behavior that a shared memory segment persists at least as long as any running process has a handle on that segment.

As described with an example scenario in issue37754, say three processes P1, P2 and P3 are trying to communicate using shared memory:

--> P1 creates the shared memory block and waits for P2 and P3 to access it.
--> P2 starts and attaches this shared memory segment, writes some data to it and exits.
--> Now, on Unix, shm_unlink is called as soon as P2 exits. (This is by action of the resource tracker.)
--> Now, P3 starts and tries to attach the shared memory segment.
--> P3 will not be able to attach the shared memory segment on Unix, because shm_unlink has been called on that segment.
--> Whereas P3 will be able to attach to the shared memory segment on Windows.

Another key scenario we expect to work but which currently does not:

1. A multiprocessing.managers.SharedMemoryManager is instantiated and started in process A.
2. A shared memory segment is created using that manager in process A.
3. A serialized representation of that shared memory segment is deserialized in process B.
4. Process B does work with the shared memory segment that is also still visible to process A.
5. Process B exits cleanly.
6. Process A reads data from the shared memory segment after process B is gone. (This currently fails.)

The SharedMemoryManager provides a flexible means for ensuring cleanup of shared memory segments. The current resource tracker attempts to treat shared memory segments as equivalent to semaphore references, which is too narrow an interpretation. As such, the resource tracker should not be attempting to enforce cleanup of shared memory segments, because doing so breaks expected behavior and significantly limits functionality.

-- assignee: davin components: Library (Lib) messages: 351960 nosy: davin, pablogsal, pitrou, vinay0410, vstinner priority: normal severity: normal status: open title: resource tracker destroys shared memory segments when other processes should still have valid access type: behavior versions: Python 3.8, Python 3.9

___ Python tracker <https://bugs.python.org/issue38119> ___
[issue35267] reproducible deadlock with multiprocessing.Pool
Davin Potts added the comment: I second what @vstinner already said in the comments for PR11143, that this should not merely be documented. -- nosy: +davin ___ Python tracker <https://bugs.python.org/issue35267> ___
[issue38084] multiprocessing cannot recover from crashed worker
Davin Potts added the comment: Agreed with @ppperry that this is a duplicate of issue22393. The proposed patch in issue22393 is, for the moment, out of sync with more recent changes. That patch's approach would result in the loss of all partial results from a Pool.map, but it may be faster to update and review. -- ___ Python tracker <https://bugs.python.org/issue38084> ___
[issue38084] multiprocessing cannot recover from crashed worker
Davin Potts added the comment:

Thanks to Pablo's good work with implementing the use of multiprocessing's Process.sentinel, the logic for handling PoolWorkers that die has been centralized into Pool._maintain_pool(). If _maintain_pool() can also identify which job died with the dead PoolWorker, then it should be possible to put a corresponding message on the outqueue to indicate an exception occurred but the pool can otherwise continue its work.

The question of whether Pool.map() should expose a timeout parameter deserves a separate discussion and should not be considered a path forward on this issue, as it would require that users always specify and somehow know beforehand how long it should take for results to be returned from workers. Exposing the timeout control may have other practical benefits elsewhere, but not here.

--

___ Python tracker <https://bugs.python.org/issue38084> ___
[issue38084] multiprocessing cannot recover from crashed worker
Davin Potts added the comment:

Sharing for the sake of documenting a few things going on in this particular example:

* When a PoolWorker process exits in this way (os._exit(anything)), the PoolWorker never gets the chance to send a signal of failure (normally sent via the outqueue) to the MainProcess.
* In the current logic of the MainProcess, Pool._maintain_pool() detects the termination of that PoolWorker process and starts a new PoolWorker process to replace it, maintaining the desired size of the Pool.
* The infinite hang observed in this example comes from the original p.map() call performing an unlimited-timeout wait for a result to appear on the outqueue -- hence an infinite wait. This wait is performed in MapResult.get(), which does expose a timeout parameter, though it is not possible to control it through Pool.map().

It is not at all a correct, general solution, but exposing control over this timeout and setting it to 1.0 seconds permits Steve's repro code snippet to run to completion (no infinite hang; it raises a multiprocessing.context.TimeoutError).

--

___ Python tracker <https://bugs.python.org/issue38084> ___
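A sketch of the timeout behavior the comment describes, using the public map_async() route -- whose AsyncResult.get() already exposes a timeout -- rather than the internal MapResult change discussed above (the worker function and timeout value are invented):

```python
from multiprocessing import Pool, TimeoutError

def square(x):
    return x * x

def run(timeout=5.0):
    with Pool(2) as pool:
        # map_async() returns an AsyncResult; its get() accepts a timeout,
        # unlike Pool.map(), which waits forever on a lost worker.
        result = pool.map_async(square, range(5))
        try:
            return result.get(timeout=timeout)
        except TimeoutError:
            return None   # workers did not respond in time

if __name__ == '__main__':
    print(run())
```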
[issue38018] Increase Code Coverage for multiprocessing.shared_memory
Davin Potts added the comment: Initial review of the test failure suggests a likely flaw in the mechanism used by the resource tracker. I will continue investigating more tomorrow. -- ___ Python tracker <https://bugs.python.org/issue38018> ___
[issue38018] Increase Code Coverage for multiprocessing.shared_memory
Davin Potts added the comment: New changeset d14e39c8d9a9b525c7dcd83b2a260e2707fa85c1 by Davin Potts (Vinay Sharma) in branch 'master': bpo-38018: Increase code coverage for multiprocessing.shared_memory (GH-15662) https://github.com/python/cpython/commit/d14e39c8d9a9b525c7dcd83b2a260e2707fa85c1 -- ___ Python tracker <https://bugs.python.org/issue38018> ___
[issue37185] use os.memfd_create in multiprocessing.shared_memory?
Davin Potts added the comment:

Unless I am missing something, memfd_create appears to be specific to the Linux kernel still, so we would need to replicate its behavior on all of the other unix systems.

To your point, but quoting from the docs, "separate invocations of memfd_create with the same name will not return descriptors for the same region of memory". If it is possible to use the anonymous shared memory created via memfd_create in another process (which is arguably the primary motivation / use case for multiprocessing.shared_memory), we would need to replicate the unique way of referencing a shared memory segment when trying to attach to it from other processes.

To permit resource management of a shared memory segment (in the sense of ensuring the shared memory segment is always unlinked at the end), the multiprocessing.managers.SharedMemoryManager exists. Because destroying a shared memory segment at exit is not always desirable, the SharedMemoryManager provides additional control over when it is appropriate to unlink a shared memory segment.

--

___ Python tracker <https://bugs.python.org/issue37185> ___
[issue37754] Consistency of Unix's shared_memory implementation with windows
Davin Potts added the comment:

A shared semaphore approach for the resource tracker sounds appealing as a way to make the behavior on Windows and posix systems more consistent. However this might get implemented, we should not artificially prevent users from having some option to let a segment persist beyond the last Python process's exit.

I like the point @eryksun makes that we could instead consider using NtMakePermanentObject on Windows to permit more posix-like behavior, but I do not think we want to head down a path of using undocumented NT APIs.

In the current code, the resource tracker inappropriately triggers _posixshmem.shm_unlink; we need to fix this in the immediate short term (before 3.8 is released), as it breaks the expected behavior @vinay0410 describes.

--

___ Python tracker <https://bugs.python.org/issue37754> ___
[issue37754] alter size of segment using multiprocessing.shared_memory
Davin Potts added the comment:

Attempts to alter the size of a shared memory segment are met with a variety of different, nuanced behaviors on the systems we want to support. I agree that it would be valuable to be able to effectively realloc a shared memory segment, which thankfully the user can do with the current implementation, although they become responsible for adjusting for platform-specific behaviors.

The design of the API in multiprocessing.shared_memory strives to be as feature-rich as possible while providing consistent behavior across platforms that can be reasonably supported; it also leaves the door open (so to speak) for users to exploit additional platform-specific capabilities of shared memory segments.

Knowing beforehand whether to create a segment or attach to an existing one is an important feature for a variety of use cases. I believe this is discussed at some length in issue35813. If what is discussed there does not help (it did get kind of long sometimes), please say so and we can talk through it more.

--

___ Python tracker <https://bugs.python.org/issue37754> ___
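A small sketch of the create-versus-attach distinction mentioned above. The missing-segment name is invented, and the final size check illustrates the kind of platform-specific behavior the comment alludes to (many platforms round a segment's size up to a full memory page):

```python
from multiprocessing import shared_memory

# create=True allocates a new segment; the default create=False attaches
# to an existing one by name, raising FileNotFoundError if none exists.
shm = shared_memory.SharedMemory(create=True, size=32)

missing = False
try:
    shared_memory.SharedMemory(name='psm_no_such_segment')  # invented name
except FileNotFoundError:
    missing = True

same = shared_memory.SharedMemory(name=shm.name)  # attach, no copy
rounded = same.size   # may be larger than 32: page-size rounding

same.close()
shm.close()
shm.unlink()
```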
[issue33725] Python crashes on macOS after fork with no exec
Davin Potts added the comment:

Victor raises an important question: should the *default* start behavior be made consistent across platforms? Assuming we change it on MacOS, the default start behavior on Windows and MacOS will be spawn, but the default start behavior on Linux and FreeBSD (among others) will be fork.

Reasons to consider such a breaking change:

* This inconsistency in default start behavior across platforms (Windows versus not) has historically been a significant source of confusion for many, many users.
* These days, the majority of users are not already familiar with the rule "fork-before-creating-threads" and so are surprised and confused when they fork a process that has already spun up multiple threads and bad things happen.
* We are changing the default on one platform (MacOS), which should prompt us to consider how our defaults are set elsewhere.

Reasons to reject such a breaking change:

* Though changing the default does not break everyone's code everywhere, it will require changes to any code that depends upon the default start method AND depends upon data/functions/stuff from the parent also being present in the forked child process.

-- nosy: +pablogsal

___ Python tracker <https://bugs.python.org/issue33725> ___
[issue33725] Python crashes on macOS after fork with no exec
Davin Potts added the comment:

I believe we must change the default behavior on MacOS to use spawn instead of fork. Encouraging people to use fork by default on MacOS is encouraging them to create something that effectively will not work. Keeping fork as the default behavior when we have already turned off all of the tests of fork behavior on MacOS also makes no sense.

Existing Python code that depends upon the default behavior (fork) on MacOS has already been broken -- if we make this change, we are arguably not breaking anyone's working code.

Users can and will still be able to specify the start mechanism on MacOS, including fork. This empowers users to continue to handle even the most esoteric use cases without loss of functionality from multiprocessing. Though admittedly, without an ability to test the behavior of fork, this will need to be marked as deprecated.

I will supply a patch making this change and updating the docs shortly after PyCon.

--

___ Python tracker <https://bugs.python.org/issue33725> ___
[issue36364] errors in multiprocessing.shared_memory examples
Davin Potts added the comment: Very much agreed, they're moving over to the main docs. -- ___ Python tracker <https://bugs.python.org/issue36364> ___
[issue33725] Python crashes on macOS after fork with no exec
Davin Potts added the comment:

As best as I can see, there is no magic bullet to help mitigate this. At a minimum, I am convinced we need to update the documentation to describe this behavior on MacOS and recommend alternatives.

I continue to give serious thought to the idea of changing the default start method on MacOS from fork to spawn. This would be a breaking change, though one could argue MacOS has already undergone a breaking change. Is such a change warranted? The alternative (which does not seem all that appealing) is that we start encouraging everyone to first consider the start method before attempting to use multiprocessing, even for their first time. Providing sensible defaults is to be preferred, but changing the default to reflect a non-trivial change in the underlying platform is still not to be taken lightly.

--

___ Python tracker <https://bugs.python.org/issue33725> ___
[issue36102] TestSharedMemory fails on AMD64 FreeBSD CURRENT Shared 3.x
Davin Potts added the comment: Closing. Thank you Giampaolo for jumping in so quickly to review! Thank you Victor for catching this on the buildbot. Though what is this talk of "_if_ the color changes"? ;) -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue36102> ___
[issue36102] TestSharedMemory fails on AMD64 FreeBSD CURRENT Shared 3.x
Davin Potts added the comment: New changeset aadef2b41600cb6a4f845cdc4cea001c916d8745 by Davin Potts in branch 'master': bpo-36102: Prepend slash to all POSIX shared memory block names (#12036) https://github.com/python/cpython/commit/aadef2b41600cb6a4f845cdc4cea001c916d8745 -- ___ Python tracker <https://bugs.python.org/issue36102> ___
[issue36102] TestSharedMemory fails on AMD64 FreeBSD CURRENT Shared 3.x
Change by Davin Potts : -- stage: -> patch review ___ Python tracker <https://bugs.python.org/issue36102> ___
[issue36102] TestSharedMemory fails on AMD64 FreeBSD CURRENT Shared 3.x
Davin Potts added the comment:

I have locally tested GH-12036 on all 5 of the aforementioned OSes and all are made happy by the patch.

Victor: If we want to go ahead and apply this patch right away to hopefully make the FreeBSD buildbot go green, the nature of this change is sufficiently small that it will still be easy to change during the alpha period.

-- stage: patch review ->

___ Python tracker <https://bugs.python.org/issue36102> ___
[issue36102] TestSharedMemory fails on AMD64 FreeBSD CURRENT Shared 3.x
Change by Davin Potts : -- keywords: +patch pull_requests: +12065 stage: -> patch review ___ Python tracker <https://bugs.python.org/issue36102> ___
[issue36102] TestSharedMemory fails on AMD64 FreeBSD CURRENT Shared 3.x
Davin Potts added the comment:

In local testing, I found the following systems to impose the leading slash as a requirement for simply creating a shared memory block:

* NetBSD 8.0
* FreeBSD 12.x
* TrueOS 18.12 (the OS formerly known as PC-BSD)

I found the following systems to have no required leading slash, and all tests currently pass without modification:

* OpenBSD 6.4
* DragonflyBSD 5.4

--

___ Python tracker <https://bugs.python.org/issue36102> ___
[issue36102] TestSharedMemory fails on AMD64 FreeBSD CURRENT Shared 3.x
Davin Potts added the comment:

Though apparently undocumented on FreeBSD, its implementation of shm_open differs from others in the following way: all names for shared memory blocks *must* begin with a slash. This requirement does not exist on OpenBSD. In its man page for shm_open, FreeBSD does at least communicate the following non-standard, additional requirement:

> Two processes opening the same path are guaranteed to access the same shared memory object if and only if path begins with a slash (`/') character.

Given that this requirement is not universal, and because a leading slash controls other behaviors on platforms like Windows, it would be confusing to make a leading slash a universal requirement. Likewise, requiring users on FreeBSD to be aware of this nuance would be contrary to the goals of the SharedMemory class.

I will prepare a patch to detect the need for a leading slash and prepend it onto the requested shared memory block name. I have verified that this solves the problem on FreeBSD and that all tests then pass. I will test NetBSD and DragonflyBSD to see if they also impose FreeBSD's undocumented requirement.

--

___ Python tracker <https://bugs.python.org/issue36102> ___
[issue36099] Clarify the difference between mu and xbar in the statistics documentation
Davin Potts added the comment:

Without necessarily defining what each means, perhaps it is sufficient to change this clause in the docs:

    it should be the mean of data

For pvariance() it could read as:

    it should be the *population* mean of data

And for variance() it could read as:

    it should be the *sample* mean of data

-- nosy: +davin

___ Python tracker <https://bugs.python.org/issue36099> ___
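For context, the affected parameters can be exercised like this -- a sketch with invented data, showing that the precomputed center passed as mu/xbar must match the mean of the data for the result to agree with the no-argument form:

```python
import statistics

data = [2.5, 3.25, 5.5, 11.25, 11.75]

# pvariance() treats data as the whole population; the precomputed
# center "mu" must be the population mean:
mu = statistics.mean(data)
assert statistics.pvariance(data, mu) == statistics.pvariance(data)

# variance() treats data as a sample; "xbar" is the sample mean --
# the same mean() call numerically, but conceptually distinct:
xbar = statistics.mean(data)
assert statistics.variance(data, xbar) == statistics.variance(data)
```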
[issue36018] Add a Normal Distribution class to the statistics module
Davin Potts added the comment:

Steven: Your point about population versus sample makes sense, and your point that altering their names would be a breaking change is especially important. I think that pretty well puts an end to my suggestion of alternative names and says the current pattern should be kept with NormalDist. I particularly like the idea of using the TI Nspire and Casio Classpad to guide or help confirm what symbols might be recognizable to secondary students or 1st year university students.

Raymond: As an idea for examples demonstrating the code, what about an example where a plot of pdf is created, possibly for comparison with cdf? This would require something like matplotlib but would help to visually communicate the concepts of pdf, perhaps with different sigma values?

--

___ Python tracker <https://bugs.python.org/issue36018> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: @Giampaolo: The docstring in the shared_memory module currently marks the API as experimental. (You read my mind...) I will start a new PR where we can work on the better-integration-into-the-larger-multiprocessing-docs and add comments there. -- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: New changeset e895de3e7f3cc2f7213b87621cfe9812ea4343f0 by Davin Potts in branch 'master': bpo-35813: Tests and docs for shared_memory (#11816) https://github.com/python/cpython/commit/e895de3e7f3cc2f7213b87621cfe9812ea4343f0 -- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue36018] Add a Normal Distribution class to the statistics module
Davin Potts added the comment:

There is an inconsistency worth paying attention to in the choice of names for the input parameters. Currently in the statistics module, pvariance() accepts a parameter named "mu" and pstdev() and variance() each accept a parameter named "xbar". The docs describe both "mu" and "xbar" as "it should be the mean of data". I suggest it is worth rationalizing the names used within the statistics module for consistency before reusing "mu" or "xbar" or anything else in NormalDist.

Using the names of mathematical symbols that are commonly used to represent a concept is potentially confusing because those symbols are not always *universally* used. For example, students are often introduced to new concepts in introductory mathematics texts where concepts such as "mean" appear in formulas and equations not as "mu" but as "xbar" or simply "m" or other simple (and hopefully "friendly") names/symbols. As a mathematician, if I am told a variable is named "mu", I still feel the need to ask what it represents. Sure, I can try guessing based upon context, but I will usually have more than one guess I could make.

Rather than continue down a path of using various mathematical-symbols-written-out-in-English-spelling, one alternative would be to use less ambiguous, more informative variable names such as "mean". It might be worth considering a change to the parameter names "mu" and "sigma" in NormalDist to names like "mean" and "stddev", respectively. Or perhaps "mean" and "standard_deviation". Or perhaps "mean" and "variance" would be easier still (recognizing that variance can readily be computed from standard deviation in this particular context). In terms of consistency with other packages that users are likely to also use, scipy.stats functions/objects commonly refer to these concepts as "mean" and "var".

I like the idea of making NormalDist readily approachable for students as well as those more familiar with these concepts. The offerings in scipy.stats are excellent, but they are not always the most approachable things for new students of statistics.

--

___ Python tracker <https://bugs.python.org/issue36018> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment:

> FWIW I bumped into this lib: http://semanchuk.com/philip/sysv_ipc/

The author of that lib, Philip Semanchuk, is one of the people participating in this effort -- he has posted above in msg334934 here on b.p.o. and has helped review the PR in GH-11816. He is also the author of the posix_ipc package which was the original basis for our POSIX Shared Memory implementation here. The decision to base our Unix platform support upon POSIX and not SystemV libraries came after considerable research and there are important differences between the two. To oversimplify: POSIX Shared Memory support has now been available for some time on Linux, *BSD, MacOS, and others, and is something of a successor to System V shared memory.

> That assumes a single app/process which spawns a child (the "worker").

Not true. A manager started by one process can be connected to by another process that is not a child. This is covered in the docs here: https://docs.python.org/3/library/multiprocessing.html#using-a-remote-manager That child can then request that shared memory blocks it creates be remotely tracked and managed by that remote process's manager. While I would not expect this to be a common use case, this is a feature of BaseManager that we inherit into SharedMemoryManager. The SyncManager.Lock can be used as part of this as well. Thus, two unrelated apps/processes *can* coordinate their management of shared memory blocks through the SharedMemoryManager.

> That would translate into a new Semaphore(name=None, create=False)
> class which (possibly?) would also provide better performances
> compared to SyncManager.Semaphore

Right! You might have noticed that Philip has such a semaphore construct in his posix_ipc lib.
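The remote-manager coordination described above can be sketched as follows; this assumes Python 3.8+, and the authkey value and use of port 0 (for an OS-assigned port) are illustrative choices, not from the discussion:

```python
from multiprocessing.managers import SharedMemoryManager

# Process "A": serve a manager that unrelated processes can reach by address.
server = SharedMemoryManager(address=('127.0.0.1', 0), authkey=b'notsecret')
server.start()

# Process "B" (not necessarily a child of "A") needs only address + authkey:
client = SharedMemoryManager(address=server.address, authkey=b'notsecret')
client.connect()

# A block created through the client is tracked by the remote server,
# so the server's shutdown will clean it up.
sl = client.ShareableList([10, 20, 30])
values = list(sl)

sl.shm.close()
server.shutdown()
```

Both ends live in one script here for brevity; in practice "B" would be a separate program handed the address and authkey out of band.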
I opted to not attempt to add such a semaphore as part of this effort to both (1) keep focused on the core needs to work with shared memory, and (2) take more time in the future to work out how to get cross-platform support for the semaphore right (as you point out, there are complications to work through).

> Extra 1: apparently there are also POSIX msgget(), msgrcv() and
> msgsnd() syscalls which could be used to implement a System-V message
> Queue similar to SyncManager.Queue later on.

Right! This is also something Philip has in his posix_ipc lib. This should be part of the roadmap for what we do next with SharedMemory. This one may be complicated by the fact that not all platforms that implement POSIX Shared Memory chose to also implement these functions in the same way. We will need time to work out what we can or can not reasonably do here.

> Extra 2: given the 2 distinct use-cases I wonder if the low-level
> component (shared_memory.py) really belongs to multiprocessing module

Given what I wrote above about how multiprocessing.managers does enable these use cases and the existing "distributed shared memory" support in multiprocessing, I think it logically belongs in multiprocessing. I suggest that "shm_open" and "shm_unlink" are our low-level tools, which appropriately are in _posixshmem, but SharedMemory and the rest are high-level tools; SharedMemoryManager will not be able to cover all life-cycle management use cases, thus SharedMemory will be needed by many; in contrast, "shm_open" and "shm_unlink" will be needed only by those wishing to do something wacky. (Note: I am not trying to make "wacky" sound like a bad thing because wacky can be very cool sometimes.) Philip's ears should now be burning, I mentioned him so many times in this post. Ah! He beat me to it while I was writing this. Awesome! We would not be where we are with SharedMemory without his efforts over many years with his posix_ipc lib.
-- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment:

> Code looks much better now. I'm still not convinced
> "SharedMemory(name=None, create=False, size=0)" is the best API.
> How are you supposed to "create or attach" atomically?

We are consciously choosing to not support an atomic "create or attach". This significantly simplifies the API and avoids the valid concerns raised around user confusion relating to that behavior (including the use of different specified 'size' values in a race) but does not preclude our potentially introducing this as a feature in the future. This simpler API still supports a "try: create; except: attach" which is not atomic but effectively covers the primary use cases for "create or attach". Combined with a SyncManager.Lock, users can already achieve an atomic "create or attach" using this simpler API.

> Also, could you address my comment about size?
> https://bugs.python.org/issue35813#msg335731
>> Let me rephrase: are we forced to specify a value (aka call
>> ftruncate()) on create ? If we are as I think, could size have a
>> reasonable default value instead of 0? Basically I'm wondering if we
>> can relieve the common user from thinking about what size to use,
>> mostly because it's sort of a low level detail. Could it perhaps
>> default to mmap.PAGESIZE?

Apologies for not responding to your question already, Giampaolo. For the same reasons that (in C) malloc does not provide a default size, I do not think we should attempt to provide a default here. Not all platforms allocate shared memory blocks in chunks of mmap.PAGESIZE, thus on some platforms we would unnecessarily over-allocate no matter what default size we might choose. I do not think we should expect users to know what mmap.PAGESIZE is on their system. I think it is important that if a user requests a new allocation of memory, that they first consider how much memory will be needed. When attaching to an existing shared memory block, its size is already defined.
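The non-atomic "try: create; except: attach" pattern mentioned above can be sketched like this (Python 3.8+; the block name is made up for illustration):

```python
from multiprocessing import shared_memory

def create_or_attach(name, size):
    """Create the named block, or attach if it already exists (not atomic)."""
    try:
        return shared_memory.SharedMemory(name=name, create=True, size=size)
    except FileExistsError:
        return shared_memory.SharedMemory(name=name)

first = create_or_attach('psm_demo_block', 64)   # creates the block
second = create_or_attach('psm_demo_block', 64)  # attaches to the same block

first.buf[:2] = b'hi'
seen = bytes(second.buf[:2])  # both handles see the same memory

second.close()
first.close()
first.unlink()
```

Because the create and attach steps are separate system calls, another process could win the race in between; as noted above, a SyncManager.Lock around this function is one way to make it effectively atomic.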
I think this even fits with CPython's over-allocation strategies behind things like list, where an empty list triggers no malloc at all. We will not allocate memory until the user tells us how much to allocate.

> Also, there is no way to delete/unwrap memory without using an
> existing SharedMemory instance, which is something we may not have
> on startup. Perhaps we should have a "shared_memory.unlink(name)"
> function similar to os.unlink() which simply calls C shm_unlink().

It is not really possible to offer this on non-POSIX platforms so I think we should not attempt to offer a public "shared_memory.unlink(name)". It is possible to invoke "shm_unlink" with the name of a shared memory block (for those who really need it) on platforms with POSIX Shared Memory support via:

    shared_memory._posixshmem.shm_unlink('name')

-- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: The simpler API is now implemented in GH-11816 as discussed previously. Notably:

> * We go with this simpler API: SharedMemory(name=None, create=False, size=0)
> * 'size' is ignored when create=False
> * create=True acts like O_CREX and create=False only attaches to existing
>   shared memory blocks

As part of this change, the PosixSharedMemory and WindowsNamedSharedMemory classes are no more; they have been consolidated into the SharedMemory class with a single, simpler, consistent-across-platforms API. On the SharedMemory class, 'size' is now stored by the __init__ and does not use fstat() as part of its property. Also, SharedMemoryManager (and its close friends) has been relocated to the multiprocessing.managers submodule, matching the organization @Giampaolo outlined previously:

    multiprocessing.managers.SharedMemoryManager
    multiprocessing.managers._SharedMemoryTracker
    multiprocessing.managers.SharedMemoryServer (not documented)
    multiprocessing.shared_memory.SharedMemory
    multiprocessing.shared_memory.SharedList
    multiprocessing.shared_memory.WindowsNamedSharedMemory (REMOVED)
    multiprocessing.shared_memory.PosixSharedMemory (REMOVED)

I believe this addresses all of the significant discussion topics in a way that brings together all of the excellent points being made. Apologies if I have missed something -- I did not think so but I will go back through all of the discussions tomorrow to double-check.

-- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment:

> I think we need the "create with exclusive behavior" option, even
> though we don't know how to implement it on Windows right now.

A fix to avoid the potential race condition on Windows is now part of GH-11816.

> To support 1 & 2, we could just have 'create'. When true, it would
> act like O_CREX. When false, you would get an error if the name
> doesn't already exist.

I am good with this and now it can be supported.

> a 3rd case where you have "co-equal" processes and any one of them
> could create and the others would attach.

There are some practical use cases motivating this. Rather than debate the merits of those use cases, given the concern raised, perhaps we should forego supporting this 3rd case for now.

> Regarding 'size', I think it is a bit weird how it currently works.
> Maybe 'size' should only be valid if you are creating a new shared
> memory object.

This would avoid potential confusion in the details of how attempts to resize do/don't work on different platforms. I would prefer to not need to explain that on MacOS, requesting a smaller size is disallowed. This defers such issues until considering a "resize()" method as you suggest. I like this.

> Should 'size' be a property that always does fstat() to find the
> size of the underlying file?

The potential exists for non-Python code to attach to these same shared memory blocks and alter their size via ftruncate() (only on certain Unix platforms). We could choose to not support such "external" changes and let size be a fixed value from the time of instantiation. But I would like to believe we can be more effective and safely use fstat() behind our reporting of 'size'.

> It seems unclear to me how you should avoid cluttering /var/run/shm
> with shared memory objects that people forget to cleanup.

This is the primary purpose of the SharedMemoryManager. Admittedly, we will not convince everyone to use it when they should, just like we are not able to convince everyone to use NamedTemporaryFile for their temp files.

To update the proposed change to the API:

* We go with this simpler API: SharedMemory(name=None, create=False, size=0)
* 'size' is ignored when create=False
* create=True acts like O_CREX and create=False only attaches to existing
  shared memory blocks

Remaining question: do PosixSharedMemory and WindowsNamedSharedMemory mirror this simplified API or do we expose the added functionality each offers, permitting informed users to use things like 'mode' when they know it is enforced on a particular platform?

-- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: @giampaolo:

> Also, what happens if you alter the size of an existing object with a smaller
> value? Is the memory region overwritten?

Attaching to an existing shared memory block with a size=N which is smaller than its allocated size (say it was created with size=M and N < M) ...

> Can't you just avoid calling ftruncate() if size is not passed (None)?

It looks like it does skip calling ftruncate() if size is 0. From posixshmem.c:

    if (size) {
        DPRINTF("calling ftruncate, fd = %d, size = %ld\n", self->fd, size);
        if (-1 == ftruncate(self->fd, (off_t)size)) {

>> I think this misses the ...
> It appears this is already covered:

Sorry for any confusion; I was interpreting your proposed parameter name, attach_if_exists, in the following way:

* attach_if_exists=True: If exists, attach to it otherwise create one
* attach_if_exists=False: Create a new one but do not attach to an existing one with the same name

I did not see a way to indicate a desire to *only* attach without creation. I need a way to test to see if a shared memory block already exists or not without risk of creating one. At least this is how I was interpreting "attach if exists".

> Don't you also want to "create if it doesn't exist, else attach" as a single,
> atomic operation?

Yes, I do! This was part of my description for the parameter named "create" in msg335660: When set to True, a new shared memory block will be created unless one already exists with the supplied unique name, in which case that block will be attached to and used.

> I'm not sure if there are or should be sync primitives to "wait for another
> memory to join me" etc.

In the case of shared memory, I do not think so. I think such signaling between processes, when needed, can be accomplished by our existing signaling mechanisms (like, via the Proxy Objects for Event or Semaphore).
-- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: @giampaolo:

> 1) As for SharedMemoryManager, I believe it should live in
> multiprocessing.managers, not shared_memory.py.

I am happy to see it live in multiprocessing.managers so long as we can provide a clean way of handling what happens on a platform where we can not support shared memory blocks. (We have implementations for PosixSharedMemory and NamedSharedMemory which together cover Windows, Linux, MacOS, the *BSDs, and possibly others but that does not cover everything.) @Neil has already raised this question of what do we want the behavior to be on these unsupported platforms on import? If everything dependent upon shared memory blocks remains inside shared_memory.py, then we could raise a ModuleNotFoundError or ImportError or similar when attempting to `import shared_memory`. If we move SharedMemoryManager to live in multiprocessing.managers, we need to decide how to handle (and communicate to the user appropriately) its potential absence. So far, I am unable to find a good example of another module where they have chosen to split up such code rather than keeping it all bottled up inside a single module, but perhaps I have missed something?

> 2) Same for SharedMemoryServer (which is a subclass of
> multiprocessing.managers.Server).

Same thing as above. If we decide how to handle the unsupported platforms on import, we can re-organize appropriately.

> 3) ShareableList name is kinda inconsistent with other classes (they all have
> a "Shared" prefix). I'd call it SharedList instead.

Oooh, interesting. I am happy to see a name change here. To share how I came up with its current name: I had thought to deliberately break the naming pattern here to make it stand out.
The others, SharedMemory, SharedMemoryManager, and SharedMemoryServer, are all focused on the shared memory block itself which is something of a more primitive concept (like accessing SharedMemory.buf as a memoryview) compared to working with something like a list (a less primitive, more widely familiar concept). Likewise, I thought a dict backed by shared memory might be called a ShareableDict and other things like a NumPy array backed by shared memory might be called a ShareableNDArray or similar. I was hoping to find a different pattern for the names of these objects-backed-by-shared-memory-blocks, but I am uncertain I found the best name.

> 4) I have some reservations about SharedMemory's "flags" and "mode" args.

It sounds like you are agreeing with what I advocated in msg335660 (up above). Great!

-- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: @giampaolo:

> Maybe something like this instead?
> SharedMemory(name=None, attach_if_exists=False, size=0)

I think this misses the use case when wanting to ensure we only attach to an existing shared memory block and if it does not exist, we should raise an exception because we can not continue. (If the shared memory block should already be there but it is not, this means something bad happened earlier and we might not know how to recover here.) I believe the two dominant use cases to address are:

1) I want to create a shared memory block (either with or without a pre-conceived name).
2) I want to attach to an existing shared memory block by its unique name.

-- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: @giampaolo:

> 1) it seems SharedMemory.close() does not destroy the memory region
> (I'm able to re-attach to it via name). If I'm not mistaken only
> the manager can do that.

Correct, close() does not and should not destroy the memory region because other processes may still be using it. Only a call to unlink() triggers the destruction of the memory region and so unlink() should only be called once across all the processes with access to that shared memory block. The unlink() method is available on the SharedMemory class. No manager is required. This is also captured in the docs.

> 2) I suggest to turn SharedMemory.buf in a read-only property

Good idea! I will make this change today, updating GH-11816.

> 3) it seems "size" kwarg cannot be zero (current default)

From the docs: When attaching to an existing shared memory block, set to 0 (which is the default). This permits attaching to an existing shared memory block by name without needing to also already know its size.

> 4) I wonder if we should have multiprocessing.active_memory_children() or
> something

I also think this would be helpful but...

> I'm not sure if active_memory_children() can return meaningful results with a
> brand new process (I suppose it can't).

You are right. As an aside, I think it interesting that in the implementation of "System V Shared Memory", its specification called for something like a system-wide registry where all still-allocated shared memory blocks were listed. Despite the substantial influence System V Shared Memory had on the more modern implementations of "POSIX Shared Memory" and Windows' "Named Shared Memory", neither chose to make it part of their specification. By encouraging the use of SharedMemoryManager to track and ensure cleanup, we are providing a reliable and cross-platform supportable best practice. If something more low-level is needed by a user, they can choose to manage cleanup themselves.
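The close()/unlink() distinction from point 1 and the size=0 attach from point 3 can be seen together in a short sketch (Python 3.8+; a throwaway block):

```python
from multiprocessing import shared_memory

shm = shared_memory.SharedMemory(create=True, size=16)
name = shm.name
shm.buf[:4] = b'data'
shm.close()    # detaches this handle only; the block itself survives

# Re-attach by name alone: size defaults to 0, so the existing block's
# size does not need to be known in advance.
again = shared_memory.SharedMemory(name=name)
payload = bytes(again.buf[:4])

again.close()
again.unlink()  # called once, by the last user: destroys the block
```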
This seems to parallel how we might encourage, "when opening a file, always use a with statement", yet users can still choose to call open() and later close() when they wish. -- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: Regarding the API of the SharedMemory class, its flags, mode, and read_only parameters are not universally enforced or simply not implemented on all platforms that offer POSIX Shared Memory or Windows Named Shared Memory. A simplified API for the SharedMemory class that behaves consistently across all platforms would avoid confusion for users. For users who have specific need of flags/mode/read_only controls on a platform that they know does indeed respect that control, they should still have a mechanism to leverage those controls. I propose a simpler, consistent-across-platforms API like:

    SharedMemory(name=None, create=False, size=0)

*name* and *size* retain their purpose though the former now defaults to None. *create* is set to False to indicate no new shared memory block is to be created because we only wish to attach to an already existing shared memory block. When set to True, a new shared memory block will be created unless one already exists with the supplied unique name, in which case that block will be attached to and used.

Example of attaching to an already existing shared memory block:

    SharedMemory(name='uniquename')

Example of creating a new shared memory block where any new name will do:

    SharedMemory(create=True, size=128)

Example of creating/attaching a shared memory block with a specific name:

    SharedMemory(name='specialsnowflake', create=True, size=4096)

Even with its simplified API, SharedMemory will continue to be powered by PosixSharedMemory on systems where "POSIX Shared Memory" is implemented and powered by NamedSharedMemory on Windows systems.
The API for PosixSharedMemory will remain essentially unchanged from its current form:

    PosixSharedMemory(name=None, flags=None, mode=0o600, size=0, read_only=False)

The API for NamedSharedMemory will be updated to no longer attempt to mirror its POSIX counterpart:

    NamedSharedMemory(name=None, create=False, size=0, read_only=False)

To be clear: the inconsistencies motivating this proposed API change are *not* only arising from differences between Windows and POSIX-supporting systems. For example, among systems implementing POSIX shared memory, the mode flag (which promises control over whether user/group/others can read/write to a shared memory block) is often but not always ignored; it differs from one OS to the next.

Alternatives/variations to this proposed API change:

* Leave the current APIs alone where all 3 classes have identical APIs. Feedback in discussions and from those experimenting with the code suggests this is creating confusion.
* Change all 3 classes to have matching APIs again. This unnecessarily thwarts the ability of users to exploit functionality that they know to be there on specific target platforms that they care about.
* Do not expose flags/mode/read_only as part of the input parameters to PosixSharedMemory/NamedSharedMemory but do expose them as class attributes instead. This arguably makes things unnecessarily complicated. This is not a simple topic but its complexity can be treated in a more straightforward way.
* Use a parameter name other than 'create' (e.g. 'attach_only') in the newly proposed API.
* Make all input parameters keyword-only for greater flexibility in the API in the future.

-- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: These questions (originally asked in comments on GH-11816) seemed more appropriate to discuss here:

> Why should the user want to use `SharedMemory` directly? Why not just go
> through the manager? Also, perhaps a naive question: don't you _always_ need
> a `start()`ed manager in order for the processes to communicate? Doesn't
> `SharedMemoryServer` has to be involved?

I think it helps to discuss the last question first. A SharedMemoryManager is *not* needed for two processes to share information across a shared memory block, nor is a SharedMemoryServer required. The docs have examples demonstrating this but here is another meant to showcase exactly this.

Start up a Python shell and do the following:

    >>> from multiprocessing import shared_memory
    >>> shm = shared_memory.SharedMemory(name=None, size=10)
    >>> shm.buf[:5] = b'Feb15'
    >>> shm.name  # Note this name and use it in the next steps
    'psm_26792_26631'

Start up a second Python shell in a new window and do the following:

    >>> from multiprocessing import shared_memory
    >>> also_shm = shared_memory.SharedMemory(name='psm_26792_26631')  # Use that same name
    >>> bytes(also_shm.buf[:5])
    b'Feb15'

If also_shm.buf is further modified in the second shell, those changes will be visible on shm.buf in the first shell. The same is true of the reverse. The key point is that there is no sending of messages between the processes at all. In stark contrast, SyncManager offers and supports objects held in "distributed shared memory" where messages must be sent from one process to another to access or manipulate data; those objects held in "distributed shared memory" *must* have a SyncManager+Server to enable their use. That is not needed at all for SharedMemory because access to and manipulation of the data is performed directly without the cost-delay of messaging. This begs a new question, "so what is the SharedMemoryManager used for then?"
The docs answer: To assist with the life-cycle management of shared memory especially across distinct processes, a BaseManager subclass, SharedMemoryManager, is also provided. Because shared memory blocks are not "owned" by a single process, they are not destroyed/freed when a process exits. A SharedMemoryManager is used to ensure the free-ing of a shared memory block when it is no longer needed. New SharedMemory instances may be created via a SharedMemoryManager (in which case their birth-to-death life-cycle is being managed) or they may be created directly as seen in the above example.

Returning to the first question, "Why should the user want to use `SharedMemory` directly?", there are more use cases than just these two:

1. In process 1, a shared memory block is created by calling SharedMemoryManager.SharedMemory(). In process 2, we need to attach to that existing shared memory block and can do so by referring to its name. This is accomplished as in the above example by simply calling SharedMemory(name='uniquename'). We do not want to attach to it via a second SharedMemoryManager because only one manager should oversee the life-cycle of a single shared memory block.

2. Sometimes direct management of the life-cycle of a shared memory block is desirable. For example, on systems supporting POSIX shared memory, it is a feature that shared memory blocks outlive processes. Some services choose to speed a service restart by preserving state data in shared memory, saving the newly restarted service from rebuilding it. The SharedMemoryManager provides one life-cycle strategy but can not cover all scenarios so the option to directly manage it is important.

-- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: @Antoine: SharedMemoryManager does not subclass SyncManager but it did previously. This is the source of the confusion. SharedMemoryManager subclasses BaseManager which does not provide Value, Array, list, dict, etc. Agreed that the manager facility does not appear to see that much use in existing code. When working with shared memory, I expect SharedMemoryManager to be much more popular. -- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: @giampaolo.rodola: It definitely helps. Conceptually, SyncManager provides "distributed shared memory" where lists, dicts, etc. are held in memory by one process but may be accessed remotely from another via a Proxy Object. Mutating a dict from one process requires sending a message to some other process to request the change be made. In contrast, SharedMemoryManager provides non-distributed shared memory where a special region of memory is held by the OS kernel (not a process) and made directly addressable to many processes simultaneously. Modifying any data in this special region of memory requires zero process-to-process communication; any of the processes may modify the data directly. In a speed contest, the SharedMemoryManager wins in every use case -- and it is not a close race. There are other advantages and disadvantages to each, but speed is the key differentiator. Thinking ahead to the future of SharedMemoryManager, there is the potential for a POSIX shared memory based semaphore. The performance of this semaphore across processes should drastically outperform SyncManager's semaphore. It might be something we will want to support in the future. SharedMemoryManager needs a synchronization mechanism now (in support of common use cases) to coordinate across processes, which is why I initially thought SharedMemoryManager should expose the Lock, Semaphore, Event, Barrier, etc. powered by distributed shared memory. 
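Registering one of the existing synchronization types on SharedMemoryManager is indeed a single register() call; a hedged sketch, mirroring how SyncManager registers its own Lock (AcquirerProxy is the proxy type SyncManager uses for acquire/release objects):

```python
import threading
from multiprocessing.managers import AcquirerProxy, SharedMemoryManager

# The one line a user needs: expose a distributed-shared-memory Lock.
SharedMemoryManager.register('Lock', threading.Lock, AcquirerProxy)

with SharedMemoryManager() as smm:
    lock = smm.Lock()              # a proxy; acquire/release are messages
    got = lock.acquire(timeout=5)  # True once acquired
    lock.release()
```

Note the trade-off discussed in this thread: such a lock lives in the manager's server process, so every acquire/release pays the messaging cost that direct shared memory access avoids.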
I am no longer sure this is the right choice for three reasons:

(1) it unnecessarily complicates and confuses the separation of what is powered by fast SystemV-style shared memory and what is powered by slow distributed shared memory,
(2) it would be a very simple example in the docs to show how to add our existing Lock or Semaphore to SharedMemoryManager via register(), and
(3) if we one day implement POSIX shared memory semaphores (and equivalent where POSIX is not supported), we will have the burden of existing lock/semaphore creation methods and APIs with behavioral differences.

I propose that it would be clearer but no less usable if we drop these registered object types (created via calls to register()) from SharedMemoryManager. It is one line of code for a user to add "Lock" to SharedMemoryManager, which I think we can demonstrate well with a simple example.

-- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: @terry.reedy and @ronaldoussoren: I have asked Van again to provide comments here clarifying the topics of (1) copyright notices and (2) requiring the BSD-licensed-work's author to sign a contributor agreement. Specifically regarding the appearance of __copyright__, I added my agreement to your comments on GH-11816 on this. -- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: @giampaolo.rodola: Your patch from 3 days ago in issue35917 included additional tests around the SharedMemoryManager which are now causing test failures in my new PR. This is my fault because I altered SharedMemoryManager to no longer support functionality from SyncManager that I thought could be confusing to include. I am just now discovering this and am not immediately sure if simply removing the SharedMemoryManager-relevant lines from your patch is the right solution but I wanted to mention this thought right away. Thank you for discovering that SyncManager was being overlooked in the tests and the nice patch in issue35917. -- ___ Python tracker <https://bugs.python.org/issue35813> ___
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: Docs and tests are now available in a new PR. I have stayed focused on getting these docs and tests to everyone without delay but that means I have not yet had an opportunity to respond to the helpful comments, thoughtful questions, and threads that have popped up in the last few days. I will follow up on all comments as quickly as possible starting in the morning. There are two topics in particular that I hope will trigger a wider discussion: the api around the SharedMemory class and the inclusion-worthiness of the shareable_wrap function. Regarding the api of SharedMemory, the docs explain that not all of the current input parameters are supportable/enforceable across platforms. I believe we want an api that is relevant across all platforms but at the same time we do not want to unnecessarily suppress/hide functionality that would be useful on some platforms -- there needs to be a balance between these motivations but where do we strike that balance? Regarding the inclusion-worthiness of the shareable_wrap function, I deliberately did not include it in the docs but its docstring in the code explains its purpose. If included, it would drastically simplify working with NumPy arrays; for contrast, please see the code example in the docs demonstrating the use of NumPy arrays without the aid of the shareable_wrap function. I have received feedback from others using this function that is also worth discussing. Thank you to everyone who has already looked at the code and shared helpful thoughts -- please have a look at the tests and docs. -- ___ Python tracker <https://bugs.python.org/issue35813> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35813] shared memory construct to avoid need for serialization between processes
Change by Davin Potts : -- pull_requests: +11834 ___ Python tracker <https://bugs.python.org/issue35813> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35903] Build of posixshmem.c should probe for required OS functions
Davin Potts added the comment: Agreed that the logic for building that code needs exactly this sort of change. Thanks for the patch! It looks like your patch does not happily detect the dependencies on MacOS for some reason, but all appears well on Windows & Linux. I will have a closer look in the morning on a MacOS system. -- ___ Python tracker <https://bugs.python.org/issue35903> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: @lukasz.langa: Missing tests and documentation will be in by alpha2. -- ___ Python tracker <https://bugs.python.org/issue35813> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: This work is the result of ~1.5 years of development effort, much of it accomplished at the last two core dev sprints. The code behind it has been stable since September 2018 and tested as an independently installable package by multiple people. I was encouraged by Lukasz, Yury, and others to check in this code early, not waiting for tests and docs, in order both to solicit more feedback and to provide for broader testing; doing such a thing is not at all a novelty. Thankfully it is accomplishing exactly that -- I hope the feedback remains constructive and supportive. There are some tests to be found in a branch (enh-tests-shmem) of github.com/applio/cpython which I think should become more comprehensive before inclusion. Temporarily deferring them, not including them as part of the first alpha, should reduce the complexity of that release. Regarding the BSD license on the C code being adopted, my conversations with Brett and subsequently Van have not raised concerns, far from it -- there is a process which is being followed to the letter. If there are other reasons to object to the thoughtful adoption of code licensed like this one, that deserves a decoupled and larger discussion first. -- nosy: +brett.cannon ___ Python tracker <https://bugs.python.org/issue35813> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35813] shared memory construct to avoid need for serialization between processes
Davin Potts added the comment: New changeset e5ef45b8f519a9be9965590e1a0a587ff584c180 by Davin Potts in branch 'master': bpo-35813: Added shared_memory submodule of multiprocessing. (#11664) https://github.com/python/cpython/commit/e5ef45b8f519a9be9965590e1a0a587ff584c180 -- ___ Python tracker <https://bugs.python.org/issue35813> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35813] shared memory construct to avoid need for serialization between processes
Change by Davin Potts : -- keywords: +patch pull_requests: +11470 stage: -> patch review ___ Python tracker <https://bugs.python.org/issue35813> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35813] shared memory construct to avoid need for serialization between processes
New submission from Davin Potts : A facility for using shared memory would permit direct, zero-copy access to data across distinct processes (especially when created via multiprocessing) without the need for serialization, thus eliminating the primary performance bottleneck in the most common use cases for multiprocessing. Currently, multiprocessing communicates data from one process to another by first serializing it (by default via pickle) on the sender's end then de-serializing it on the receiver's end. Because distinct processes possess their own process memory space, no data in memory is common across processes and thus any information to be shared must be communicated over a socket/pipe/other mechanism. Serialization via tools like pickle is convenient especially when supporting processes on physically distinct hardware with potentially different architectures (which multiprocessing does also support). Such serialization is wasteful and potentially unnecessary when multiple multiprocessing.Process instances are running on the same machine. The cost of this serialization is believed to be a non-trivial drag on performance when using multiprocessing on multi-core and/or SMP machines. While not a new concept (System V Shared Memory has been around for quite some time), the proliferation of support for shared memory segments on modern operating systems (Windows, Linux, *BSDs, and more) provides a means for exposing a consistent interface and api to a shared memory construct usable across platforms despite technical differences in the underlying implementation details of POSIX shared memory versus Native Shared Memory (Windows). For further reading/reference: Tools such as the posix_ipc module have provided fairly mature apis around POSIX shared memory and seen use in other projects. The "shared-array", "shared_ndarray", and "sharedmem-numpy" packages all have interesting implementations for exposing NumPy arrays via shared memory segments. 
PostgreSQL has a consistent internal API for offering shared memory across Windows/Unix platforms based on System V, enabling use on NetBSD/OpenBSD before those platforms supported POSIX shared memory. At least initially, objects which support the buffer protocol can be most readily shared across processes via shared memory. From a design standpoint, the use of a Manager instance is likely recommended to enforce access rules in different processes via proxy objects as well as cleanup of shared memory segments once an object is no longer referenced. The documentation around multiprocessing's existing sharedctypes submodule (which uses a single memory segment through the heap submodule with its own memory management implementation to "malloc" space for allowed ctypes and then "free" that space when no longer used, recycling it for use again from the shared memory segment) will need to be updated to avoid confusion over concepts. Ultimately, the primary motivation is to provide a path for better parallel execution performance by eliminating the need to transmit data between distinct processes on a single system (not for use in distributed memory architectures). Secondary use cases have been suggested including a means for sharing data across concurrent Python interactive shells, potential use with subinterpreters, and other traditional uses for shared memory since the first introduction of System V Shared Memory onwards. -- assignee: davin components: Library (Lib) messages: 334278 nosy: davin, eric.snow, lukasz.langa, ned.deily, rhettinger, yselivanov priority: normal severity: normal status: open title: shared memory construct to avoid need for serialization between processes type: enhancement versions: Python 3.8 ___ Python tracker <https://bugs.python.org/issue35813> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
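For reference, the construct proposed here eventually landed in Python 3.8 as multiprocessing.shared_memory; its basic zero-copy flow looks roughly like the sketch below. In real use the second handle would be opened by name from a different process; attaching from the same process keeps the example self-contained:

```python
from multiprocessing import shared_memory

def demo():
    # Create a new shared memory block; a unique name is auto-generated.
    shm = shared_memory.SharedMemory(create=True, size=16)
    shm.buf[:5] = b'hello'

    # Attach to the same block via its name -- in practice this happens in
    # a second process, with no serialization or copying of the payload.
    attached = shared_memory.SharedMemory(name=shm.name)
    data = bytes(attached.buf[:5])

    attached.close()   # each handle closes its own view
    shm.close()
    shm.unlink()       # free the block once no process needs it
    return data

if __name__ == '__main__':
    print(demo())  # b'hello'
```

The create/attach split plus an explicit unlink() is the same lifecycle shape that posix_ipc exposes, which is part of why that package is cited above as prior art.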
[issue33725] Python crashes on macOS after fork with no exec
Davin Potts added the comment: @ned.deily: Apologies, I misread what you wrote -- I would like to see the random segfaults that you were seeing on Mojave if you can still point me to a few. -- ___ Python tracker <https://bugs.python.org/issue33725> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue33725] Python crashes on macOS after fork with no exec
Davin Potts added the comment: Do we really need to disable the running of test_multiprocessing_fork entirely on MacOS? My understanding so far is that not *all* of the system libraries on the mac are spinning up threads and so we should expect that there are situations where fork alone may be permissible, but of course we don't yet know what those are. Pragmatically speaking, I have not yet seen a report of test_multiprocessing_fork tests triggering this problem but I would like to see/hear that when it is observed (that's my pitch for leaving the tests enabled). -- ___ Python tracker <https://bugs.python.org/issue33725> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35219] macOS 10.14 Mojave crashes in multiprocessing
Davin Potts added the comment: Resolution is marked dupe but status is still open. Are we closing this one or is there a more specific remedy for this situation (as opposed to what issue33725 discusses) that would be helpful to document? -- nosy: +davin ___ Python tracker <https://bugs.python.org/issue35219> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue33725] Python crashes on macOS after fork with no exec
Davin Potts added the comment: Given the original post mentioned 2.7.15, I wonder if it is feasible to fork near the beginning of execution, then maintain and pass around a multiprocessing.Pool to be used when needed instead of dynamically forking? Working with legacy code is almost always more interesting than you want it to be. -- ___ Python tracker <https://bugs.python.org/issue33725> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue33725] Python crashes on macOS after fork with no exec
Davin Potts added the comment: Barry's effort as well as comments in other links seem to all suggest that OBJC_DISABLE_INITIALIZE_FORK_SAFETY is not comprehensive in its ability to make other threads "safe" before forking. "Objective-C classes defined by the OS frameworks remain fork-unsafe" (from @kapilt's first link) suggests we furthermore remain at risk using certain MacOS system libraries prior to any call to fork. "To guarantee that forking is safe, the application must not be running any threads at the point of fork" (from @kapilt's second link) is an old truth that we continue to fight with even when we know very well that it's the truth. For newly developed code, we have the alternative to employ spawn instead of fork to avoid these problems in Python, C, Ruby, etc. For existing legacy code that employed fork and now surprises us by failing-fast on MacOS 10.13 and 10.14, it seems we are forced to face a technical debt incurred back when the choice was first made to spin up threads and afterwards to use fork. If we didn't already have an "obvious" (zen of Python) way to avoid such problems with spawn versus fork, I would feel this was something to solve in Python. As to helping the poor unfortunate souls who must fight the good fight with legacy code, I am not sure what to do to help though I would like to be able to help. -- ___ Python tracker <https://bugs.python.org/issue33725> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35242] multiprocessing.Queue in an inconsistent state and a traceback silently suppressed if put an unpickable object and process's target function is finished
Change by Davin Potts : -- nosy: +davin ___ Python tracker <https://bugs.python.org/issue35242> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue33196] multiprocessing: serialization must ensure that contexts are compatible (the same)
Change by Davin Potts : -- nosy: +davin ___ Python tracker <https://bugs.python.org/issue33196> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue31308] forkserver process isn't re-launched if it died
Davin Potts added the comment: I have two concerns with this: 1) The implicit restart of the forkserver process seems in conflict with the zen of making things explicit. 2) This would seem to make forkserver's behavior inconsistent with the behavior of things like the Manager which similarly creates its own process for managing resources but does not automatically restart that process if it should die or become unreachable. In the case of the Manager, I don't think we'd want it to automagically restart anything in these situations so it's not a simple matter of enhancing the Manager to adopt similar behavior. I do appreciate the use cases that would be addressed by having a convenient way to detect that a forkserver has died and then restart it. If the forkserver dies, I doubt we really want it to try to restart a potentially infinite number of times. Maybe a better path would be if we had a way to explicitly request that the Process trigger a restart of the forkserver, if necessary, but this setting/request defaults to False? -- ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue31308> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20854] multiprocessing.managers.Server: problem with returning proxy of registered object
Davin Potts added the comment: It appears that the multiple workarounds proposed by the OP (@allista) address the original request and that there is no bug or unintended behavior arising from multiprocessing itself. Combined with the lack of activity in this discussion, I'm inclined to believe that the workarounds have satisfied the OP and this issue should be closed. -- nosy: +davin status: open -> pending type: -> behavior ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue20854> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue30339] test_multiprocessing_main_handling: "RuntimeError: Timed out waiting for results" on x86 Windows7 3.x
Davin Potts added the comment: Patch on issue30317 also addresses this issue in a more flexible way. -- dependencies: +test_timeout() of test_multiprocessing_spawn.WithManagerTestBarrier fails randomly on x86 Windows7 3.x buildbot nosy: +davin ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30339> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue30317] test_timeout() of test_multiprocessing_spawn.WithManagerTestBarrier fails randomly on x86 Windows7 3.x buildbot
Davin Potts added the comment: To better accommodate very slow buildbots, a parameter is added in PR-1722 to scale up the timeout durations where they are necessary on a per-machine basis. Relevant tests have a timeout set to some default number of seconds times a multiplier value. The multiplier value can be controlled by the environment variable 'CONF_TIMEOUT_MULTIPLIER' which defaults to a multiplier of 1.0 if not set. On buildbots, this environment variable can be set by defining a parameter by that name in the buildbot configuration file for a machine. Otherwise, this environment variable can be set in the usual way before running tests on non-buildbot machines. -- nosy: +davin, zach.ware stage: -> patch review ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30317> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue30317] test_timeout() of test_multiprocessing_spawn.WithManagerTestBarrier fails randomly on x86 Windows7 3.x buildbot
Changes by Davin Potts <pyt...@discontinuity.net>: -- pull_requests: +1810 ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30317> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue28053] parameterize what serialization is used in multiprocessing
Davin Potts added the comment: Docs need updating still. -- versions: +Python 3.7 ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue28053> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue26434] multiprocessing cannot spawn grandchild from a Windows service
Davin Potts added the comment: Patch committed in 2.7 branch. Thanks for your help, Marc. -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue26434> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue26434] multiprocessing cannot spawn grandchild from a Windows service
Davin Potts added the comment: New changeset c47c315812b1fa9acb16510a7aa3b37d113def48 by Davin Potts (Marc Schlaich) in branch '2.7': bpo-26434: Fix multiprocessing grandchilds in a Windows service (GH-1167) https://github.com/python/cpython/commit/c47c315812b1fa9acb16510a7aa3b37d113def48 -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue26434> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue30379] multiprocessing Array create for ctypes.c_char, TypeError unless 1 char string arg used
Davin Potts added the comment: Perhaps I should've used ctypes.c_uint8 in that example/question instead. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30379> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue30379] multiprocessing Array create for ctypes.c_char, TypeError unless 1 char string arg used
Davin Potts added the comment: Maybe I missed your point but why would you not want to do this instead?

>>> mp.Array(ctypes.c_int8, arr)

-- nosy: +davin ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30379> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue30018] multiprocessing.Pool garbles call stack for __new__
Davin Potts added the comment:
> I am unfortunately not at liberty to share the code I'm working on.
I very much understand and am very thankful you took the time to create a simple example that you could share. Honestly, that's the reason I felt inspired to stop what I was doing to look at this now rather than later.
> I suppose I should just work around it by checking right away if the input
> to my constructor has already been constructed!
There are probably a number of different ways to address it but your suggestion of adding a check to see if this is the first time that object has been constructed sounds like it might be an easy win. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30018> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue30018] multiprocessing.Pool garbles call stack for __new__
Davin Potts added the comment: Expanding my above example to show how multiprocessing relates:

>>> import multiprocessing
>>> import os
>>> class Floof(object):
...     def __new__(cls):
...         print("New via pid=%d" % os.getpid())
...         return object.__new__(cls)
...
>>> os.getpid()  # parent pid
46560
>>> pool = multiprocessing.Pool(1)
>>> getter = pool.apply_async(Floof, (), {})  # output seen from child AND parent
New via pid=46583
New via pid=46560
>>> getter.get()  # everything seems to be working as intended
<__main__.Floof object at 0x10866f250>

FWIW, near the end of my prior message: s/it didn't merely/it merely/ -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30018> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue30018] multiprocessing.Pool garbles call stack for __new__
Davin Potts added the comment: It looks like the first 'Called Foo.__new__' is being reported by the child (pool of 1) process and the second 'Called Foo.__new__' is being reported by the parent process. In multiprocessing, because objects are by default serialized using pickle, this may be caused by the unpickling of the Foo object by the parent process, which is something you would not experience when using ThreadPool because it does not have the same need for serialization. Example showing invocation of __new__ as part of unpickling:

>>> class Foo(object):
...     def __new__(cls):
...         print("New")
...         return object.__new__(cls)
...
>>> import pickle
>>> f = Foo()
New
>>> pf = pickle.dumps(f, protocol=2)
>>> pickle.loads(pf)  # unpickling triggers __new__
New
<__main__.Foo object at 0x1084a06d0>

Having discovered this phenomenon, is this causing a problem for you somewhere in code? (Your example code on github was helpful, thank you, but it merely demonstrated the behavior and didn't show where this was causing you pain.) -- nosy: +davin ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue30018> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue29828] Allow registering after-fork initializers in multiprocessing
Davin Potts added the comment: Having a read through issue16500 and issue6721, I worry that this could again become bogged down with similar concerns. With the specific example of NumPy, I am not sure I would want its random number generator to be reseeded with each forked process. There are many situations where I very much need to preserve the original seed and/or current PRNG state. I do not yet see a clear, motivating use case even after reading those two older issues. I worry that if it were added it would (almost?) never get used either because the need is rare or because developers will more often think of how this can be solved in their own target functions when they first start up. The suggestion of a top-level function and Context method make good sense to me as a place to offer such a thing but is there a clearer use case? -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29828> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
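For what it's worth, the top-level registration point discussed above later landed as os.register_at_fork (Python 3.7, POSIX only), and it sidesteps the reseeding concern by being strictly opt-in: nothing is reseeded unless the user registers a hook. A sketch of the opt-in behavior, observable with a raw fork:

```python
import os
import random

def reseed():
    # Explicit opt-in: only code that asks for fresh entropy in forked
    # children gets it; the parent's PRNG state is left untouched.
    random.seed()

if hasattr(os, 'register_at_fork'):  # POSIX, Python 3.7+
    os.register_at_fork(after_in_child=reseed)

def demo():
    if not hasattr(os, 'fork'):
        return True                  # no fork on this platform; nothing to show
    random.seed(12345)               # parent pins its PRNG state pre-fork
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                     # child: the hook has already re-seeded
        os.write(w, repr(random.random()).encode())
        os._exit(0)
    os.waitpid(pid, 0)
    child_draw = float(os.read(r, 64))
    parent_draw = random.random()    # still derived from seed 12345
    # Without the registered hook, both draws would be identical because the
    # child inherits the parent's PRNG state at the moment of fork.
    return child_draw != parent_draw

if __name__ == '__main__':
    print(demo())
```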
[issue29795] Clarify how to share multiprocessing primitives
Changes by Davin Potts <pyt...@discontinuity.net>: -- resolution: -> works for me stage: needs patch -> resolved status: open -> closed ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29795> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17560] problem using multiprocessing with really big objects?
Davin Potts added the comment: @artxyz: The current release of 2.7 is 2.7.13 -- if you are still using 2.7.5 you might consider updating to the latest release. As pointed out in the text of the issue, the multiprocessing pickler has been made pluggable in 3.3 and it's been made more conveniently so in 3.6. The issue reported here arises from the constraints of working with large objects and pickle, hence the enhanced ability to take control of the multiprocessing pickler in 3.x applies. I'll assign this issue to myself as a reminder to create a blog post around this example and potentially include it as a motivating need for controlling the multiprocessing pickler in the documentation. -- assignee: -> davin nosy: +davin ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue17560> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
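The pluggable pickler referred to above can be exercised through multiprocessing.reduction.ForkingPickler, which accepts per-type reduce functions. A sketch, where ChunkedBlob and reduce_chunked_blob are hypothetical names standing in for a "really big object" and a custom serialization strategy:

```python
import pickle
from multiprocessing.reduction import ForkingPickler

class ChunkedBlob:
    # Hypothetical stand-in for a large object whose default pickling
    # is too costly; a real reducer might chunk or memory-map the payload.
    def __init__(self, data):
        self.data = data

def reduce_chunked_blob(obj):
    # Custom reduction: return (callable, args) used to rebuild the object
    # on the receiving end.
    return (ChunkedBlob, (obj.data,))

# Register the reducer only for multiprocessing's transfers; plain
# pickle.dumps() elsewhere is unaffected.
ForkingPickler.register(ChunkedBlob, reduce_chunked_blob)

def demo():
    blob = ChunkedBlob(b'x' * 10)
    wire = ForkingPickler.dumps(blob)   # uses the registered reducer
    clone = pickle.loads(wire)
    return clone.data == blob.data

if __name__ == '__main__':
    print(demo())
```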
[issue29795] Clarify how to share multiprocessing primitives
Davin Potts added the comment: On Windows, because that OS does not support fork, multiprocessing uses spawn to create new processes by default. Note that in Python 3, multiprocessing provides the user with a choice of how to create new processes (i.e. fork, spawn, forkserver). When fork is used, the 'q = Queue()' in this example would be executed once by the parent process before the fork takes place, the resulting child process continues execution from the same point as the parent when it triggered the fork, and thus both parent and child processes would see the same multiprocessing.Queue. When spawn is used, a new process is spawned and the whole of this example script would be executed again from scratch by the child process, resulting in the child (spawned) process creating a new Queue object of its own with no sense of connection to the parent. Would you be up for proposing replacement text to improve the documentation? Getting the documentation just right so that everyone understands it is worth spending time on. -- nosy: +davin stage: -> needs patch type: behavior -> enhancement ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29795> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
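The fork/spawn difference described above can be made concrete with a script that behaves the same under either start method: the Queue is passed to the child explicitly rather than relied upon as shared module state, and the __main__ guard is what keeps the re-execution under spawn from re-running the parent's work. A sketch (the helper names are for illustration only):

```python
import multiprocessing as mp

def worker(q):
    # Runs in the child; the Queue arrives as an argument, which works
    # under fork and spawn alike.
    q.put('hello from child')

def demo(method=None):
    # Under fork the child inherits the parent's memory at the fork point;
    # under spawn this whole file is re-imported by the child, which is why
    # module-level work belongs under the __main__ guard below.
    if method is None:
        method = 'fork' if 'fork' in mp.get_all_start_methods() else 'spawn'
    ctx = mp.get_context(method)
    q = ctx.Queue()                  # created once, by the parent
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    msg = q.get()
    p.join()
    return msg

if __name__ == '__main__':
    print(demo())
```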
[issue29701] Add close method to queue.Queue
Davin Potts added the comment: The example of AMQP is perhaps a stronger argument for why multiprocessing.Queue.close should (or does) exist, not as much a reason for queue.Queue. The strongest point, I think, is the argument that existing patterns are lacking. In the multiprocessing module, the pattern of placing None into a queue.Queue to communicate between threads is also used but with a slightly different use case: a queue may have multiple None's added to it so that the queue's contents may be fully consumed and at the end the consumers understand to not look for more work when they each get a None. It might be restated as "do your work, then close". If close were introduced to queue.Queue as proposed, it would not eliminate the need for this pattern. Thankfully inside multiprocessing the number of threads is known (for example, a thread to manage each process created by multiprocessing) making code possible such as: `inqueue.queue.extend([None] * size)`. In the more general case, the point that `size` is not always known is a valid one. In this same vein, other parts of multiprocessing could potentially make use of queue.Queue.close but at least in multiprocessing's specific case I'm not sure I see a compelling simplification to warrant the change. Though multiprocessing doesn't provide one, I think it would be helpful to see concrete use cases where there would be a clear benefit. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29701> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
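The one-None-per-consumer pattern described above can be sketched as follows. The sentinels are delivered with put() here rather than the queue.extend() trick quoted from multiprocessing, so that consumers blocked in q.get() are woken normally:

```python
import queue
import threading

def consumer(q, results):
    while True:
        item = q.get()
        if item is None:        # the agreed-upon "queue is closed" sentinel
            break
        results.append(item * 2)

def demo(n_consumers=2):
    q = queue.Queue()
    results = []
    threads = [threading.Thread(target=consumer, args=(q, results))
               for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for item in range(5):
        q.put(item)
    # "Do your work, then close": one sentinel per consumer guarantees each
    # thread drains remaining work before seeing its None.
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    return sorted(results)

if __name__ == '__main__':
    print(demo())  # [0, 2, 4, 6, 8]
```

A hypothetical queue.Queue.close() would replace the sentinel puts but, as noted above, not the drain-then-stop logic in the consumers.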
[issue29454] Shutting down consumer on a remote queue
Davin Potts added the comment: My understanding of other message queueing systems is that many are motivated by speed to the point that they will permit messages to be "lost" due to specific scenarios that would be overly costly to defend against. Other message queueing systems adopt a philosophy that no message should ever be lost but as a compromise to speed do not promise that a message will be immediately recovered when caught in one of these problematic scenarios, only that it will eventually be recovered and processed fully. It appears that the philosophy adopted or really the solution requirements lead to different best practices. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29454> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue29454] Shutting down consumer on a remote queue
Changes by Davin Potts <pyt...@discontinuity.net>: -- stage: -> needs patch type: behavior -> enhancement ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29454> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue29454] Shutting down consumer on a remote queue
Davin Potts added the comment: My understanding is that example uses a queue.Queue() to demonstrate how to create a custom, remote service from scratch. The implementation in this simple example lacks the sophistication of multiprocessing.Queue() for handling situations such as the one raised by the OP. The example was not attempting to demonstrate a comprehensive replacement for multiprocessing.Queue(); rather, it was attempting to demonstrate the mechanism for creating and consuming a callable service hosted by a remote manager. The documentation currently does not introduce this example well nor describe the above motivation.

As to why this simplistic implementation of a distributed queue appears to lose an item when the client is killed, it works in the following way:

1. Let's say a server is started to hold a queue.Queue() which is populated with 1 item.
2. A client requests an item from the server.
3. The server receives the request and performs a blocking q.get() (where q is the queue.Queue() object held by the server).
4. When the q.get() releases and returns an item, q has had one item removed, leaving a queue size of 0 in our scenario, and then that item is sent from the server to the client.
5. A client requests another item from the server.
6. The server receives the request and performs a blocking q.get() on the queue. Because there's nothing left to grab from the queue, the server blocks and waits for something to magically appear in the queue. We'll have a "producer" put something into the queue in a moment, but for the time being the server is stuck waiting on the q.get() and likewise the client is waiting on a response from the server.
7. That client dies an unexpected, horrible death because someone accidentally hits it with a Ctrl-C.
8. A "producer" comes along and puts a new item into the server's queue.
9. The server's blocking q.get() call releases, q has had one item removed (leaving a queue size of 0 again), and then that item is sent from the server to the client -- only the client is dead and the transmission fails.
10. A "producer" comes along and puts another new item into the server's queue.
11. The someone who accidentally, horribly killed the client now frantically restarts the client; the client requests an item from the server and the server responds with a new item. However, this is the item introduced in step 10 and not the item from step 8. Hence the item from step 8 appears lost.

Note that in our simplistic example from the docs, there is no functionality to repopulate the queue object when communication of the item fails to complete. In general, multiprocessing has no idea what a manager will contain and has no insight into what to do when a connection to a client is severed. Augmenting the example in the docs to cover situations like this would significantly complicate the example, and there are many other situations to consider on the way to building a comprehensive solution -- instead, a person should choose multiprocessing.Queue() unless they have something particular in mind.

I think the example should be better introduced (the intro is terse) to explain its purpose and warn that it does not offer a comprehensive replacement for multiprocessing.Queue(). It does not need to go into all of the above explanation. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29454> ___
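The loss in steps 8 through 11 can be simulated without any networking; the sketch below (with made-up item names, and a `serve_one` helper that only mimics the server's get-then-transmit ordering) shows why the removed item cannot be recovered:

```python
import queue

def serve_one(q):
    """Mimic the server side of the docs example: the item is removed
    from the queue first, and only afterwards transmitted to the client."""
    item = q.get()  # removal happens here (steps 4 and 9 above)
    return item     # transmission would happen after removal

q = queue.Queue()
q.put("item-from-step-8")   # the "producer" of step 8
q.put("item-from-step-10")  # the "producer" of step 10

# Step 9: the server pops an item, but the client is already dead,
# so this value is never delivered -- it is effectively lost.
lost = serve_one(q)

# Step 11: the restarted client receives the *next* item instead.
recovered = serve_one(q)
print(lost, recovered)  # item-from-step-8 item-from-step-10
```

Once q.get() returns, the queue has no record of the item; any repopulate-on-failure logic would have to be layered on top by the service author.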
[issue29575] doc 17.2.1: basic Pool example is too basic
Davin Potts added the comment: When passing judgement on what is "too basic", the initial example should be so basic as to be immediately digestible by as many people as possible.

Some background: All too many examples mislead newcomers into believing that the number of processes should (a) match the number of processor cores, or (b) match the number of inputs to be processed. This example currently attempts to dispel both notions. In practice, and this depends upon what specific code is to be performed in parallel, it is not uncommon to find that slightly over-scheduling the number of processes versus the number of available cores can achieve superior throughput and performance. In other cases, slightly under-scheduling may provide a win. To help subtly encourage the newcomer, this example uses 5 processes as opposed to something which might be mistaken for a common number of cores available on current multi-core processors. Likewise, the number of distinct inputs to be processed deliberately does not match the number of processes nor a multiple of the number of processes. This hopefully encourages the newcomer to not feel obligated to only accept inputs of a particular size or multiple. Granted, optimizing for performance motivates tuning such things, but this is the first example / first glance at what functionality is available.

Considering the suggested change:

* range(20) will likely produce more output than can be comfortably accommodated and easily read in the available browser window where most will see this
* the addition of execution time measurement is an interesting choice here given how computationally trivial the f(x) function is, which is perhaps what motivated the introduction of a time.sleep(1) inside that function; a ThreadPool would be more appropriate for a sleepy function such as this

Ultimately these changes complicate the example while potentially undermining its value. An interesting improvement to this example might be to introduce a computationally taxing function which more clearly demonstrates the benefit of using a process Pool while still achieving the ideal of being immediately digestible and understood by the largest reading audience. Some of the topics/variations in the proposed change might be better introduced and addressed later in the documentation rather than unnecessarily complicating the first example. -- resolution: -> works for me stage: -> resolved status: open -> closed type: -> enhancement ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29575> ___
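For reference, a minimal example in the spirit of the one under discussion (5 workers and an input count that is deliberately neither the worker count nor a multiple of it) might look like this; the exact function and inputs here are illustrative:

```python
from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    # 5 processes: deliberately not a typical core count.
    with Pool(5) as p:
        # 4 inputs: deliberately neither 5 nor a multiple of 5.
        print(p.map(f, [1, 2, 3, 4]))  # [1, 4, 9, 16]
```

Pool.map preserves input order in its results, so the output is deterministic even though the work is distributed across processes.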
[issue19675] Pool dies with excessive workers, but does not cleanup
Davin Potts added the comment: For triggering the exception, supplying a Process target that deliberately fails sounds right. As for tests for the various start methods (fork/forkserver/spawn), if you are looking at the 3.x branches you'll find this has been consolidated so that one test could conceivably be written to handle multiple variants (see Lib/test/_test_multiprocessing.py). -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue19675> ___
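A sketch of such a deliberately failing target (the names here are illustrative, not taken from the patch under discussion):

```python
import multiprocessing

def failing_target():
    # Deliberately raise so the error-handling path is exercised.
    raise RuntimeError("deliberate failure for testing")

if __name__ == '__main__':
    p = multiprocessing.Process(target=failing_target)
    p.start()
    p.join()
    # An unhandled exception in the child is reported as exitcode 1.
    print(p.exitcode)  # 1
```

Checking p.exitcode after join() lets a test assert that the child actually died from the exception rather than exiting cleanly.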
[issue19675] Pool dies with excessive workers, but does not cleanup
Davin Potts added the comment: @Winterflower: Thank you for encouraging @dsoprea to create the new PR and working to convert the previous patch. @dsoprea: Thank you for taking the time to create the PR, especially after this has been sitting unloved for so long. Though the new workflow using PRs is still in a bit of a state of flux, my understanding is that we will want to have one PR per feature branch (i.e. one for each of 2.7, 3.6, 3.7) that we want to target. Now that we seem to have spawned two parallel discussion tracks (one here and one in the PR https://github.com/python/cpython/pull/57), I'm not sure how best to resolve that, but for the time being I'll offer code-related comments here as they're much more likely to be preserved (and thus discoverable) for posterity: we do need some sort of tests around this to complete the patch -- something that would exercise both the non-exception and exception paths (and thus would detect that intended call to util.debug()). -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue19675> ___
[issue19675] Pool dies with excessive workers, but does not cleanup
Changes by Davin Potts <pyt...@discontinuity.net>: -- versions: +Python 2.7, Python 3.7 ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue19675> ___
[issue9914] trace/profile conflict with the use of sys.modules[__name__]
Davin Potts added the comment: Though this issue is specifically concerned with runpy APIs and their impact especially in running unittest test scripts, it's worth commenting here for people who need a workaround in the short term: code such as that shared in http://stackoverflow.com/q/41892297/1878788 can be made to run happily by creating a second script which imports the first and simply runs the test(s) from there. In the specific case of the 'forkiter.py' from http://stackoverflow.com/q/41892297/1878788, one would create a 'run_my_tests.py' with the contents:

    from forkiter import main

    if __name__ == "__main__":
        exit(main())

Now this invocation of cProfile runs happily because pickle is able to see the module where all the needed classes/functions were defined:

    python3.6 -m cProfile -o forkiter.prof ./run_my_tests.py

-- nosy: +davin ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue9914> ___
[issue29284] Include thread_name_prefix in the concurrent.futures.ThreadPoolExecutor example 17.4.2.1
Changes by Davin Potts <pyt...@discontinuity.net>: -- nosy: +davin ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29284> ___
[issue29345] More lost updates with multiprocessing.Value and .Array
Davin Potts added the comment: I'm having difficulty watching your video attachment. Would it be possible to instead describe, preferably with example code that others can similarly try to reproduce the behavior, what you're experiencing? Please keep in mind what the documentation repeatedly advises about the need for capturing your process-creating multiprocessing calls inside an "if __name__ == '__main__'" clause, especially on Windows platforms. -- nosy: +davin ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29345> ___
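For anyone trying to build the kind of minimal, reproducible example requested above, a self-contained script along these lines (with the __main__ guard in place) is the shape to aim for; this sketch is illustrative and uses get_lock(), without which the non-atomic += on a multiprocessing.Value can indeed lose updates:

```python
from multiprocessing import Process, Value

def increment(counter):
    # counter.value += 1 is a read-modify-write and is not atomic;
    # holding the Value's built-in lock prevents lost updates.
    for _ in range(1000):
        with counter.get_lock():
            counter.value += 1

if __name__ == '__main__':
    counter = Value('i', 0)
    procs = [Process(target=increment, args=(counter,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # 4000 -- no lost updates
```

Removing the get_lock() context manager is the usual way such a script demonstrates the lost-update behavior: the final count then typically falls short of 4000.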
[issue20804] Sentinels identity lost when pickled (unittest.mock)
Davin Potts added the comment: Serhiy: The above discussion seemed to converge on the perspective that object identity should not survive pickling and that the point of a sentinel is object identity. While your proposed patch may mechanically work, I believe it is in conflict with the outcome of the thoughtful discussion above. -- nosy: +davin ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue20804> ___
[issue29229] incompatible: unittest.mock.sentinel and multiprocessing.Pool.map()
Davin Potts added the comment: I think this should be regarded as a duplicate of issue20804 though discussion in issue14577 is also related/relevant. -- superseder: -> Sentinels identity lost when pickled (unittest.mock) ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29229> ___
[issue29229] incompatible: unittest.mock.sentinel and multiprocessing.Pool.map()
Davin Potts added the comment: This arises from the behavior of pickle (which is used by default in multiprocessing to serialize objects sent to / received from other processes in data exchanges), as seen with Python 3.6 (the mock import is shown here for completeness):

    >>> import pickle
    >>> from unittest import mock
    >>> x = pickle.dumps(mock.sentinel.foo)
    >>> x
    b'\x80\x03cunittest.mock\n_SentinelObject\nq\x00)\x81q\x01}q\x02X\x04\x00\x00\x00nameq\x03X\x03\x00\x00\x00fooq\x04sb.'
    >>> pickle.loads(x)
    sentinel.foo
    >>> pickle.loads(x) == mock.sentinel.foo
    False

-- nosy: +davin ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29229> ___