[Python-ideas] Re: How to prevent shared memory from being corrupted ?

Wes Turner Sun, 02 Aug 2020 09:04:13 -0700

It's best to avoid those synchronization barriers if possible.

If you have all of the data in SHM (RAM) on one node, and you need to
notify processes or wait for other workers to become available to perform a
task that requires that data, you need a method for IPC: a queue, channel
subscriptions, a source/sink, or polling more often than strictly necessary,
which is more resilient against dropped messages. (But you only need to
scale to one node.)
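
A minimal sketch of the queue option, assuming Python 3.8+ and the standard
library's multiprocessing.shared_memory. The payload stays in a shared
segment and only the segment name and length travel over the queue. The
function names (publish, consume_one) are hypothetical, chosen for
illustration only.

```python
import multiprocessing as mp
from multiprocessing import shared_memory


def publish(payload: bytes, tasks) -> shared_memory.SharedMemory:
    """Copy a non-empty payload into a new segment and announce it."""
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    # Only the small (name, length) message crosses the queue, not the data.
    tasks.put((shm.name, len(payload)))
    return shm  # the caller owns the segment: close() and unlink() it


def consume_one(tasks, results) -> None:
    """Block until a segment is announced, attach to it, report a checksum."""
    name, length = tasks.get()
    shm = shared_memory.SharedMemory(name=name)
    try:
        results.put(sum(shm.buf[:length]))
    finally:
        shm.close()  # detach only; the producer is the one that unlinks


def main() -> None:
    tasks, results = mp.Queue(), mp.Queue()
    # Create the segment before starting the worker so that both processes
    # share the parent's resource tracker.
    shm = publish(bytes(range(10)), tasks)
    worker = mp.Process(target=consume_one, args=(tasks, results))
    worker.start()
    try:
        print(results.get(timeout=10))
    finally:
        worker.join()
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    main()
```

Because the worker blocks on tasks.get(), no polling is needed here; the
trade-off is that a lost or never-sent message leaves the worker waiting.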


There needs to be a shared structure that tracks allocations, right? What
key does it need to look entries up by?

[
[obj_id_or_shm_pointer, [subscribers]]
]
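
As a rough sketch, the structure above could be a mapping keyed by object id
(or segment name) whose values are sets of subscriber ids. This is a plain
in-process class; sharing it between processes would additionally need a
single owning process or a lock, which is not shown. The class and method
names are hypothetical.

```python
from typing import Dict, List, Set


class AllocationRegistry:
    """Maps an object id (or shared-memory segment name) to its subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Set[str]] = {}

    def subscribe(self, obj_id: str, worker_id: str) -> None:
        self._subscribers.setdefault(obj_id, set()).add(worker_id)

    def unsubscribe(self, obj_id: str, worker_id: str) -> None:
        self._subscribers.get(obj_id, set()).discard(worker_id)

    def subscribers(self, obj_id: str) -> List[str]:
        # Lookup by object id; unknown ids simply have no subscribers.
        return sorted(self._subscribers.get(obj_id, ()))

    def release(self, obj_id: str) -> List[str]:
        """Forget an allocation and return who should be told it is gone."""
        return sorted(self._subscribers.pop(obj_id, ()))


registry = AllocationRegistry()
registry.subscribe("segment-a", "worker-1")
registry.subscribe("segment-a", "worker-2")
print(registry.subscribers("segment-a"))
```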

Does the existing memory pool solve for that?

And there also needs to be an instruction pipeline: a queue/channel/source
of messages for every worker, or only some workers, to process.
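
One way to sketch that pipeline, assuming one multiprocessing.Queue per
worker so that a message can go to every worker or only to a chosen subset.
The Dispatcher class, the worker ids, and the message fields are all
hypothetical.

```python
import multiprocessing as mp
from typing import Any, Dict, Iterable, Optional


class Dispatcher:
    """Holds one queue per worker; sends to all workers or to a subset."""

    def __init__(self, worker_ids: Iterable[str]) -> None:
        self.queues: Dict[str, Any] = {wid: mp.Queue() for wid in worker_ids}

    def send(self, message: Any, to: Optional[Iterable[str]] = None) -> None:
        # to=None means broadcast; otherwise only the named workers get it.
        targets = list(self.queues) if to is None else list(to)
        for wid in targets:
            self.queues[wid].put(message)


dispatcher = Dispatcher(["w1", "w2", "w3"])
# Broadcast: every worker is told about the new segment.
dispatcher.send({"op": "load", "obj_id": "segment-a"})
# Targeted: only w2 is asked to do the work.
dispatcher.send({"op": "compute", "obj_id": "segment-a"}, to=["w2"])
```

Each worker process would be handed its own queue (for example as a Process
argument) and would block on get() until an instruction arrives.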

...

https://distributed.dask.org/en/latest/journey.html

https://distributed.dask.org/en/latest/work-stealing.html

"Accelerate intra-node IPC with shared memory"
https://github.com/dask/dask/issues/6267


On Sun, Aug 2, 2020, 3:21 AM Vinay Sharma <vinay04sha...@icloud.com> wrote:

> I understand that I won’t need locks with immutable objects at some level,
> but I don’t understand how they can be used to synchronise shared memory
> segments.
>
> For every change in an immutable object, a copy is created which will have
> a different address. Now, for processes to use this updated object they
> will have to remap a new address in their address space for them to see any
> changes, and this remap will have to occur whenever a change takes place,
> which is obviously not feasible.
>
> So, changes in the shared memory segment should be done in the shared
> memory segment itself, therefore shared memory segments should be mutable.
>
> On 02-Aug-2020, at 5:11 AM, Wes Turner <wes.tur...@gmail.com> wrote:
>
>
> https://docs.dask.org/en/latest/shared.html#known-limitations :
>
> > Known Limitations
> > The shared memory scheduler has some notable limitations:
> >
> > - It works on a single machine
> > - The threaded scheduler is limited by the GIL on Python code, so if
> >   your operations are pure python functions, you should not expect a
> >   multi-core speedup
> > - The multiprocessing scheduler must serialize functions between
> >   workers, which can fail
> > - The multiprocessing scheduler must serialize data between workers and
> >   the central process, which can be expensive
> > - The multiprocessing scheduler cannot transfer data directly between
> >   worker processes; all data routes through the master process.
>
> ...
> https://distributed.dask.org/en/latest/memory.html#difference-with-dask-compute
>
> (... https://github.com/dask/dask-labextension )
>
> On Sat, Aug 1, 2020 at 7:34 PM Wes Turner <wes.tur...@gmail.com> wrote:
>
>> PyArrow Plasma object ids: "sealing" makes an object immutable (see also pyrsistent)
>>
>> https://arrow.apache.org/docs/python/plasma.html#object-ids
>> https://arrow.apache.org/docs/python/plasma.html#creating-an-object-buffer
>>
>> > Objects are created in Plasma in two stages. First, they are created,
>> > which allocates a buffer for the object. At this point, the client can
>> > write to the buffer and construct the object within the allocated buffer.
>> >
>> > To create an object for Plasma, you need to create an object ID, as
>> > well as give the object’s maximum size in bytes.
>> > ```python
>> > # Create an object buffer.
>> > object_id = plasma.ObjectID(20 * b"a")
>> > object_size = 1000
>> > buffer = memoryview(client.create(object_id, object_size))
>> >
>> > # Write to the buffer.
>> > for i in range(1000):
>> >   buffer[i] = i % 128
>> > ```
>> >
>> > When the client is done, the client seals the buffer, making the object
>> > immutable, and making it available to other Plasma clients.
>> >
>> > ```python
>> > # Seal the object. This makes the object immutable and available to other clients.
>> > client.seal(object_id)
>> > ```
>>
>> https://pypi.org/project/pyrsistent/ also supports immutable structures
>>
>> On Sat, Aug 1, 2020 at 4:44 PM Eric V. Smith <e...@trueblade.com> wrote:
>>
>>> On 8/1/2020 1:25 PM, Marco Sulla wrote:
>>> > You don't need locks with immutable objects. Since they're immutable,
>>> > any operation that would normally mutate the object generates another
>>> > immutable object instead. The most common example is str: the sum of two
>>> > strings in Python (and in many other languages) produces a new string.
>>>
>>> While they're immutable at the Python level, strings (and all other
>>> objects) are mutated at the C level, due to reference count updates. You
>>> need to consider this if you're sharing objects without locking or other
>>> synchronization.
>>>
>>> Eric
>>>
>>> _______________________________________________
>>> Python-ideas mailing list -- python-ideas@python.org
>>> To unsubscribe send an email to python-ideas-le...@python.org
>>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>>> Message archived at
>>> https://mail.python.org/archives/list/python-ideas@python.org/message/FEJEHFKBK7TMH6KIYJBPLBYBDU4IA4EB/
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/S6HLASS4SJ6KGEI3JFY4TMUBSOGBRHBR/
Code of Conduct: http://python.org/psf/codeofconduct/
