Re: [python-tulip] multiprocessing issue with python 3.6

2016-12-28 Thread Martin Richard
Hi Denis,

We are talking about forking without exec-ing right after, so using
subprocess coroutines is mostly fine.
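For instance, something along these lines stays safe (a minimal sketch,
not from this thread, assuming a Unix `echo` binary is available):

    import asyncio

    async def run_cmd():
        # fork+exec: the child replaces its image right away, so it
        # never runs the parent's scheduled callbacks or touches its
        # selector.
        proc = await asyncio.create_subprocess_exec(
            'echo', 'hello', stdout=asyncio.subprocess.PIPE)
        out, _ = await proc.communicate()
        print(out.decode().strip())

    asyncio.get_event_loop().run_until_complete(run_cmd())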

Forking without exec-ing, on the other hand, is dangerous because you may:
1/ run scheduled code (callbacks, tasks, etc.) twice,
2/ interfere with the parent loop from the child by mistake.

1/ You can't really know whether the loop has other tasks or pending
callbacks scheduled to run when you fork: if both the parent and the
child run the same loop, some tasks will run in both processes. This is
a problem because some side effects may be applied twice: both the
parent and the child will write the same buffer to a socket, or the
child might steal data that should have been consumed by the parent.
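Here is a toy sketch of that duplication (Unix-only, hypothetical code,
not from my project):

    import asyncio
    import os

    loop = asyncio.get_event_loop()
    # Scheduled before the fork, so both processes inherit it.
    loop.call_soon(lambda: print('side effect in pid', os.getpid()))
    loop.call_soon(loop.stop)

    os.fork()
    # Parent and child each resume the same inherited loop: the side
    # effect above fires twice, once per process.
    loop.run_forever()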

2/ At least on Linux, asyncio uses epoll, which is a structure owned by
the kernel and identified by an fd. When forking, the child inherits
this fd. This means that the list of events watched by the loop (for
instance "a read is ready on socket X") is registered in the kernel and
shared by both processes.

If one of the processes opens a socket and watches an event on it, the
other process will receive the notification... but it doesn't know
anything about this new file, and may try to read from the wrong fd (or
from a non-existent one).
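You can observe this shared kernel state with the selectors module
directly (Linux-only sketch, assuming the default selector is
epoll-backed):

    import os
    import selectors
    import socket

    sel = selectors.DefaultSelector()   # epoll-backed on Linux
    a, b = socket.socketpair()
    sel.register(a, selectors.EVENT_READ)

    pid = os.fork()
    if pid == 0:
        # Child: this epoll_ctl(DEL) mutates the kernel structure
        # that the parent's selector also points at.
        sel.unregister(a)
        os._exit(0)

    os.waitpid(pid, 0)
    b.send(b'x')
    # The parent still believes `a` is registered, yet select() times
    # out instead of reporting the pending read.
    print('events seen by parent:', sel.select(timeout=1))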

Anyway, even if the loop is not running when the fork is performed, there
is still a problem which requires monkey-patching.

If you choose to close the parent's loop in the child right after the
fork, you will only prevent the 1st problem: when the loop is closed and
disposed, the cleanup will unregister all watched events, which will
affect the parent loop (2nd problem). You *must* monkey-patch the loop's
selector.
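The idea looks roughly like this (a hypothetical sketch: _selector is a
private attribute of asyncio's selector event loops, which is exactly
why this is brittle):

    import asyncio
    import selectors

    def detach_inherited_loop():
        # To be called in the child, right after fork().
        old_loop = asyncio.get_event_loop()
        # Swap in a fresh selector so close() tears *it* down instead
        # of issuing epoll_ctl(DEL) calls against the structure the
        # parent still relies on.
        old_loop._selector = selectors.DefaultSelector()
        old_loop.close()
        # Give the child its own, clean loop.
        new_loop = asyncio.new_event_loop()
        asyncio.set_event_loop(new_loop)
        return new_loop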

I've got a project which does that, and it's quite brittle as I've got
to take care of the global state of the loop when forking. I am
considering replacing this fragile implementation with one that starts a
fresh Python process. The downside of that strategy is that spawning a
process will take more time (initialization is quite slow in Python) and
I will need an RPC to send data from the parent.
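With multiprocessing, that would look roughly like this (a sketch
assuming the 'spawn' start method; the Pipe plays the role of the RPC):

    import asyncio
    from multiprocessing import get_context

    def worker(conn):
        # Fresh interpreter: no inherited loop, selector, or callbacks.
        conn.send(conn.recv() * 2)

    async def main(loop):
        ctx = get_context('spawn')            # clean process, slower start
        parent_conn, child_conn = ctx.Pipe()  # the "RPC" channel
        proc = ctx.Process(target=worker, args=(child_conn,))
        proc.start()
        parent_conn.send(21)
        # recv() blocks, so push it onto the default executor.
        print(await loop.run_in_executor(None, parent_conn.recv))  # 42
        proc.join()

    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        loop.run_until_complete(main(loop))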

Maybe there are other problems I'm not aware of, but as I said, I fork a
process with a running loop in something used in prod, and it works
fine, so in practice it's hard but doable.

2016-12-28 10:58 GMT+01:00 Denis Costa :

> Hi Yury,
>
> On Friday, December 9, 2016 at 11:43:49 PM UTC+1, Yury Selivanov wrote:
>>
>> I find forking from within a coroutine or a callback function to be quite
>> dangerous. It’s usually better to pre-fork or to use the approach I
>> describe above (with any kind of asynchronous IO framework, not just
>> asyncio).
>>
>
> Could you elaborate more why this is quite dangerous?
>
>
> Thanx
>
> Denis Costa
>



-- 
Martin  Richard
www.martiusweb.net


Re: [python-tulip] multiprocessing issue with python 3.6

2016-12-28 Thread Denis Costa
Hi Yury,

On Friday, December 9, 2016 at 11:43:49 PM UTC+1, Yury Selivanov wrote:
>
> I find forking from within a coroutine or a callback function to be quite 
> dangerous. It’s usually better to pre-fork or to use the approach I 
> describe above (with any kind of asynchronous IO framework, not just 
> asyncio). 
>

Could you elaborate more why this is quite dangerous?


Thanx

Denis Costa


Re: [python-tulip] multiprocessing issue with python 3.6

2016-12-09 Thread Luca Sbardella
>
>
> >
> >   Ideally, you want to stop the loop, spawn a process, resume the loop.
> >
> > that does not sound what I should be doing, but I'll test it
>
> I find forking from within a coroutine or a callback function to be quite
> dangerous. It’s usually better to pre-fork or to use the approach I
> describe above (with any kind of asynchronous IO framework, not just
> asyncio).
>
> I think I'll stick with my initial hack for now, simpler

Thanks for help,


-- 
http://lucasbardella.com


Re: [python-tulip] multiprocessing issue with python 3.6

2016-12-09 Thread Yury Selivanov

> > Is that what I'm supposed to do? Or is there a better way?
> 
> A better way is to never fork or spawn multiprocessing.Process from a running
> coroutine.
> 
> right, so if the forking is not in a coroutine it may work?!?!

It should, because the running loop is set only when the loop is running :)

>  
>   Ideally, you want to stop the loop, spawn a process, resume the loop.
> 
> that does not sound what I should be doing, but I'll test it

I find forking from within a coroutine or a callback function to be quite 
dangerous. It’s usually better to pre-fork or to use the approach I describe 
above (with any kind of asynchronous IO framework, not just asyncio).

Yury

Re: [python-tulip] multiprocessing issue with python 3.6

2016-12-09 Thread Luca Sbardella
On 9 December 2016 at 21:25, Yury Selivanov  wrote:

>
> > On Dec 9, 2016, at 5:57 AM, Luca Sbardella 
> wrote:
> >
> > Hi,
> >
> > I'm trying to run pulsar in multiprocessing mode (using the
> multiprocessing module to create processes rather than asyncio subprocess).
> > However, in python 3.6 I have a small problem.
> > When the new process starts, it creates the event loop and starts it but
> I get
> >
> > raise RuntimeError('This event loop is already running')
> >
> > The loop is not running, but the _running_loop global in the
> asyncio.events module has been inherited from the master process and
> therefore the get_event_loop function is somehow broken.
> >
> > I resolved the issue via setting the running loop to None when the
> Process run method is called:
> >
> > def run(self):
> >     try:
> >         from asyncio.events import _set_running_loop
> >         _set_running_loop(None)
> >     except ImportError:
> >         pass
> >     ...
> >
> > Is that what I'm supposed to do? Or is there a better way?
>
> A better way is to never fork or spawn multiprocessing.Process from a
> running coroutine.


right, so if the forking is not in a coroutine it may work?!?!


>   Ideally, you want to stop the loop, spawn a process, resume the loop.
>

that does not sound what I should be doing, but I'll test it

Thanks


> Yury
>
>


-- 
http://lucasbardella.com


Re: [python-tulip] multiprocessing issue with python 3.6

2016-12-09 Thread Yury Selivanov

> On Dec 9, 2016, at 5:57 AM, Luca Sbardella  wrote:
> 
> Hi,
> 
> I'm trying to run pulsar in multiprocessing mode (using the multiprocessing 
> module to create processes rather than asyncio subprocess).
> However, in python 3.6 I have a small problem.
> When the new process starts, it creates the event loop and starts it but I get
> 
> raise RuntimeError('This event loop is already running')
> 
> The loop is not running, but the _running_loop global in the asyncio.events 
> module has been inherited from the master process and therefore the 
> get_event_loop function is somehow broken.
> 
> I resolved the issue via setting the running loop to None when the Process 
> run method is called:
> 
> def run(self):
>     try:
>         from asyncio.events import _set_running_loop
>         _set_running_loop(None)
>     except ImportError:
>         pass
>     ...
> 
> Is that what I'm supposed to do? Or is there a better way?

A better way is to never fork or spawn multiprocessing.Process from a running
coroutine.  Ideally, you want to stop the loop, spawn a process, resume the 
loop.
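In code, that pattern might look like this (a toy sketch, not pulsar's
actual code; the fork happens only between two run_forever() calls):

    import asyncio
    from multiprocessing import Process

    def child_main():
        print('forked while the parent loop was stopped')

    def request_spawn(loop):
        # A callback only *requests* the spawn by stopping the loop,
        # so the fork itself happens outside any running callback or
        # coroutine.
        loop.stop()

    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        loop.call_soon(request_spawn, loop)
        loop.run_forever()                  # returns once the loop stops

        proc = Process(target=child_main)   # fork with the loop stopped
        proc.start()
        proc.join()

        loop.call_soon(loop.stop)           # resume, then shut down
        loop.run_forever()
        loop.close()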

Yury