Apologies for digging up an old thread.
I notice that in Python 3.4, asyncio spawns a new thread in this situation:

print("Threading, before:", threading.active_count())
self._server = yield from self._loop.create_server(
    lambda: self, *self._address)
print("Threading, after:", threading.active_count())
That is called in a coroutine running in self._loop. I am confused by this
behaviour. Is there a way to get the accept loop for the server
running in the same thread?
What I want is to ensure that any coroutine touching this object is being
run in the same thread, including all connections and their handlers.
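
For reference, here is a self-contained version of what I'm seeing (the Echo
protocol and the address are just placeholders for the demo):

import asyncio
import threading

class Echo(asyncio.Protocol):                 # placeholder protocol for the demo
    def connection_made(self, transport):
        self.transport = transport
    def data_received(self, data):
        self.transport.write(data)

@asyncio.coroutine
def start(loop):
    print("Threading, before:", threading.active_count())
    server = yield from loop.create_server(Echo, '127.0.0.1', 8888)
    # an extra thread shows up here -- the executor used for getaddrinfo, perhaps?
    print("Threading, after:", threading.active_count())
    server.close()

loop = asyncio.get_event_loop()
loop.run_until_complete(start(loop))
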
On Saturday, 23 November 2013 02:16:24 UTC+11, Guido van Rossum wrote:
>
> Given the reality of the Python beta 1 release (happening this weekend) I
> think we are forced to put this off until a future version of Tulip.
>
> I'm also not at all sure of the advantage of using an event loop in a
> thread or subprocess to handle exactly one connection -- it would seem that
> you're better off using synchronous reads and writes in this case (famous
> web servers notwithstanding).
>
> And if you plan to make calls to e.g. a blocking ORM, using an event loop
> is even less defensible -- since while you're waiting for a blocking ORM
> call the event loop doesn't run at all.
>
> Basically, it seems to me you're fooling yourself with this model into
> "political correctness": claiming you are using an event loop while not actually
> benefiting from it or abiding by its rules.
>
> The way Tulip envisions you make ORM calls (and anything else that does
> blocking I/O) is using run_in_executor().
>
>
> On Fri, Nov 22, 2013 at 3:00 AM, Giampaolo Rodola' <[email protected]> wrote:
>
>> The basic idea is to make it possible to run multiple IO loops, each
>> one in a separate thread/process, and the exact moment when that should
>> happen is when a new connection occurs (either on accept() or connect()).
>>
>> The use case this tries to address is for when you are forced to use a
>> blocking component (say an ORM or a blocking network lib) within the async
>> loop.
>> The overall performance will be worse than with the standard IO
>> loop (spawning on each connect(), plus the now-unnecessary multiplexing
>> in each handler: 'yield from loop.sock_recv()' instead of 'sock.recv()'),
>> but at least you will be able to handle other concurrent connections.
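>>
>> To make the trade-off concrete, a handler staying in the main IO loop versus
>> one moved to its own thread would look roughly like this (sketch only):
>>
>> import asyncio
>>
>> @asyncio.coroutine
>> def handle_async(loop, sock):
>>     # stays in the main IO loop: must never block, hence the multiplexing
>>     data = yield from loop.sock_recv(sock, 1024)
>>     yield from loop.sock_sendall(sock, data)
>>
>> def handle_threaded(sock):
>>     # runs in its own thread/process: free to block
>>     data = sock.recv(1024)
>>     sock.sendall(data)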
>>
>> What you describe in 1), 2) and 3) looks more or less like what I'm talking
>> about, so I think we're on the same track.
>> What's missing is a step #4 in which the spawned workers/connections are
>> nicely shut down, but I think you understand what I mean.
>> Judging from your description of steps 1), 2) and 3) it seems that in
>> principle Tulip already provides the necessary hooks to do this.
>> What might be worth investigating is whether to provide a high-level
>> API which does the work for you and changes the default concurrency
>> model.
>> I'm not familiar with Tulip's API but I would expect it to be something
>> as easy as:
>>
>> >>> loop = asyncio.get_event_loop()
>> >>> start_server(loop, args.host, args.port, spawn_threads=True)
>>
>> ...at least, this is what I achieved in pyftpdlib, but instead of passing
>> a flag to a function I use a different "acceptor" class.
>>
>> > Using a separate thread or process per connection seems to
>> > go against the whole idea of Tulip.
>>
>> I understand your reluctance.
>> What I'm talking about here is a mixed concurrency model where the
>> "acceptor" is async and each "handler" keeps using 'select()', but does
>> so in another process/thread.
>> I believe this kind of model is not new, BTW.
>> I remember some famous web server using it, but I can't
>> recall which one (lighttpd or NGINX, perhaps).
>>
>> > (b) Using "yield from loop.sock_accept()" in a loop may not be the
>> fastest way
>> > to accept connections for a high-performance server
>>
>> At that level there shouldn't be any performance degradation as long as
>> the while loop is awakened by select()/epoll()/whatever() as usual.
>>
>> --- Giampaolo
>> https://code.google.com/p/pyftpdlib/
>> https://code.google.com/p/psutil/
>> https://code.google.com/p/pysendfile/
>>
>>
>> On Fri, Nov 22, 2013 at 1:23 AM, Guido van Rossum <[email protected]> wrote:
>>
>>> Hi Giampaolo,
>>>
>>> I'm not sure I understand your idea. :-(
>>>
>>> Using a separate thread or process per connection seems to go against
>>> the whole idea of Tulip. Tulip does support multiple threads, each with
>>> their own I/O loop, but there is no way to hand off a connection (either
>>> incoming or outgoing) to a different thread.
>>>
>>> Maybe you can do the following:
>>>
>>> (1) Manually create and bind a socket using the socket module (maybe use
>>> EventLoop.getaddrinfo() to get a numeric IP/IPv6 address to bind it to) and
>>> set it in non-blocking mode.
>>>
>>> (2) Run some kind of loop that repeatedly calls EventLoop.sock_accept()
>>> on that socket (this returns a Future, so you have to do this in a
>>> coroutine using yield from), and whenever that returns a new connection,
>>> pass the socket to another function to be run in another thread. (You may
>>> even use EventLoop.run_in_executor() to start the thread using a thread
>>> pool -- and by passing in a multiprocessing executor you may even be able
>>> to run it in a subprocess.)
>>>
>>> (3) That other function now owns the socket. It can start an event loop
>>> (set_event_loop(new_event_loop())) and then use create_connection(),
>>> passing in the connection socket. This feels weird, but it should work.
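>>>
>>> Pulling (1), (2) and (3) together, the whole thing might look roughly like
>>> this (untested sketch; MyProtocol and the address are just placeholders):
>>>
>>> import asyncio
>>> import socket
>>>
>>> class MyProtocol(asyncio.Protocol):        # placeholder protocol
>>>     def connection_made(self, transport):
>>>         self.transport = transport
>>>     def data_received(self, data):
>>>         self.transport.write(data)
>>>     def connection_lost(self, exc):
>>>         # stop this thread's loop once the client goes away
>>>         asyncio.get_event_loop().stop()
>>>
>>> def handle_in_thread(conn):
>>>     # step (3): this function, run in a worker thread, now owns the socket
>>>     loop = asyncio.new_event_loop()
>>>     asyncio.set_event_loop(loop)
>>>     # weird but should work: wrap a server-side socket with create_connection()
>>>     transport, protocol = loop.run_until_complete(
>>>         loop.create_connection(MyProtocol, sock=conn))
>>>     loop.run_forever()
>>>     loop.close()
>>>
>>> @asyncio.coroutine
>>> def accept_loop(main_loop, address):
>>>     # step (1): manually create and bind the listening socket, non-blocking
>>>     sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>>     sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
>>>     sock.bind(address)
>>>     sock.listen(100)
>>>     sock.setblocking(False)
>>>     while True:
>>>         # step (2): accept, then hand the connection off to the thread pool
>>>         conn, peer = yield from main_loop.sock_accept(sock)
>>>         main_loop.run_in_executor(None, handle_in_thread, conn)
>>>
>>> main_loop = asyncio.get_event_loop()
>>> main_loop.run_until_complete(accept_loop(main_loop, ('127.0.0.1', 8888)))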
>>>
>>> I haven't tried this, and there are a few smelly parts to it:
>>>
>>> (a) You're not benefiting from any of the logic in create_server(),
>>> which includes a nice way to stop serving and (separately) to wait for all
>>> the connections to be done (all through the Server object, which you can
>>> only get by calling create_server()).
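>>>
>>> i.e. you lose the ability to write something like this (reusing the
>>> MyProtocol placeholder from the sketch above):
>>>
>>> @asyncio.coroutine
>>> def serve_for_a_while(loop):
>>>     server = yield from loop.create_server(MyProtocol, '127.0.0.1', 8888)
>>>     yield from asyncio.sleep(60)        # ... serve for a while ...
>>>     server.close()                      # stop accepting new connections
>>>     yield from server.wait_closed()     # wait for the server to shut down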
>>>
>>> (b) Using "yield from loop.sock_accept()" in a loop may not be the
>>> fastest way to accept connections for a high-performance server (I believe
>>> Glyph hammered on this point a while ago).
>>>
>>> (c) Using create_connection() to get a transport+protocol for a
>>> server-side socket is definitely weird. For SSL there may be a problem --
>>> the SSL transport receives a flag indicating whether it is being used
>>> server-side or client-side, and you'd have to study the code to make sure
>>> it's safe. Grepping for server_side I think it's only really used to decide
>>> whether to pass the server_hostname argument to wrap_socket(), so I think
>>> you can bypass that with an explicit argument of server_hostname=''. We're
>>> definitely talking implementation accident here -- you'd have to experiment
>>> and see how this works out, and then who knows how it will work using a
>>> proactor event loop.
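>>>
>>> Concretely, in the handle_in_thread sketch above the call would become
>>> something like this (untested; ssl_context stands for whatever server-side
>>> SSLContext you already have):
>>>
>>> transport, protocol = loop.run_until_complete(
>>>     loop.create_connection(MyProtocol, sock=conn,
>>>                            ssl=ssl_context, server_hostname=''))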
>>>
>>> But I really don't know if any of this is actually related to what
>>> you're asking about, so let's see what you say first...
>>>
>>>
>>>
>>> On Thu, Nov 21, 2013 at 3:39 PM, Giampaolo Rodola' <[email protected]> wrote:
>>>
>>>> Hello, and sorry in advance if this has already been discussed, but I
>>>> haven't kept up with Tulip development for a while now due to lack of time.
>>>>
>>>> One of the features I appreciate the most in asyncore, Tornado and
>>>> pyftpdlib event loops is the fact that the IO loop class can optionally
>>>> accept an existing IO loop instance.
>>>>
>>>> Thanks to that capability in pyftpdlib I managed to do the following:
>>>>
>>>> 1 - I can replace the "main" async dispatcher class with one which will
>>>> be used only to accept new connections
>>>>
>>>> 2 - every time a new connection comes in, it will be dispatched to a
>>>> separate thread/process which internally will run its own IO loop
>>>>
>>>> 3 - when the server is shut down, the main dispatcher (1) will take care
>>>> of "freeing" / disconnecting the pending workers
>>>>
>>>> With this strategy every connection handler will be free to block
>>>> without hanging the whole FTP server, which is particularly handy when
>>>> the user code queries a DB, the file system is too slow, etc.
>>>> Here are a couple of references:
>>>>
>>>> https://code.google.com/p/pyftpdlib/wiki/Tutorial?#4.6_-_Changing_the_concurrency_model
>>>>
>>>> https://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/servers.py#280
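>>>>
>>>> For the record, switching the concurrency model there boils down to
>>>> something like this (simplified; the user name, password and address are
>>>> just example values):
>>>>
>>>> from pyftpdlib.authorizers import DummyAuthorizer
>>>> from pyftpdlib.handlers import FTPHandler
>>>> from pyftpdlib.servers import ThreadedFTPServer   # or MultiprocessFTPServer
>>>>
>>>> authorizer = DummyAuthorizer()
>>>> authorizer.add_user('user', '12345', '.', perm='elradfmw')
>>>>
>>>> handler = FTPHandler
>>>> handler.authorizer = authorizer
>>>>
>>>> # the "acceptor" stays async; every new connection is handled in its own
>>>> # thread, which internally runs its own IO loop
>>>> server = ThreadedFTPServer(('127.0.0.1', 2121), handler)
>>>> server.serve_forever()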
>>>>
>>>> I just took a look at Tulip's code and noticed that
>>>> BaseSelectorEventLoop allows a selector instance to be passed to the
>>>> constructor.
>>>> I may be misinterpreting the code, but I don't see the same paradigm
>>>> replicated in other parts of the code (namely the transports and the
>>>> scheduler).
>>>>
>>>> So here comes my question: has this use case been considered or
>>>> explored?
>>>>
>>>>
>>>> --- Giampaolo
>>>> https://code.google.com/p/pyftpdlib/
>>>> https://code.google.com/p/psutil/
>>>> https://code.google.com/p/pysendfile/
>>>>
>>>
>>>
>>>
>>> --
>>> --Guido van Rossum (python.org/~guido)
>>>
>>
>>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>