Apologies for digging up an old thread.
I notice that in Python 3.4, asyncio spawns a new thread in this situation:

print("Threading, before:", threading.active_count())
self._server = yield from self._loop.create_server(
    lambda: self, *self._address)
print("Threading, after:", threading.active_count())
That is called in a coroutine running in self._loop. I am confused by this
behaviour. Is there a way to get the accept loop for the server
running in the same thread?
What I want is to ensure that any coroutine touching this object is being
run in the same thread, including all connections and their handlers.
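
For reference, here is a self-contained version of what I'm seeing (the Echo
protocol and the address are just placeholders for the demo):

import asyncio
import threading

class Echo(asyncio.Protocol):                 # placeholder protocol for the demo
    def connection_made(self, transport):
        self.transport = transport
    def data_received(self, data):
        self.transport.write(data)

@asyncio.coroutine
def start(loop):
    print("Threading, before:", threading.active_count())
    server = yield from loop.create_server(Echo, '127.0.0.1', 8888)
    # an extra thread shows up here -- the executor used for getaddrinfo, perhaps?
    print("Threading, after:", threading.active_count())
    server.close()

loop = asyncio.get_event_loop()
loop.run_until_complete(start(loop))
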
On Saturday, 23 November 2013 02:16:24 UTC+11, Guido van Rossum wrote:
>
> Given the reality of the Python beta 1 release (happening this weekend) I
> think we are forced to put this off until a future version of Tulip.
>
> I'm also not at all sure of the advantage of using an event loop in a
> thread or subprocess to handle exactly one connection -- it would seem that
> you're better off using synchronous reads and writes in this case (famous
> web servers notwithstanding).
>
> And if you plan to make calls to e.g. a blocking ORM, using an event loop
> is even less defensible -- since while you're waiting for a blocking ORM
> call the event loop doesn't run at all.
>
> Basically, it seems to me you're fooling yourself with this model into
> "political correctness": claiming you are using an event loop while not actually
> benefiting from it or abiding by its rules.
>
> The way Tulip envisions you make ORM calls (and anything else that does
> blocking I/O) is using run_in_executor().
>
>
> On Fri, Nov 22, 2013 at 3:00 AM, Giampaolo Rodola' <[email protected]> wrote:
>
>> The basic idea is to make it possible to run multiple IO loops, each
>> one in a separate thread/process, and the exact moment when that should
>> happen is when a new connection occurs (either on accept() or connect()).
>>
>> The use case this tries to address is for when you are forced to use a
>> blocking component (say an ORM or a blocking network lib) within the async
>> loop.
>> The overall performance will be worse than with the standard IO
>> loop (spawning on each connect(), plus the now-unnecessary multiplexing
>> in each handler: 'yield from loop.sock_recv()' instead of 'sock.recv()'),
>> but at least you will be able to handle other concurrent connections.
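>>
>> To make the trade-off concrete, a handler staying in the main IO loop versus
>> one moved to its own thread would look roughly like this (sketch only):
>>
>> import asyncio
>>
>> @asyncio.coroutine
>> def handle_async(loop, sock):
>>     # stays in the main IO loop: must never block, hence the multiplexing
>>     data = yield from loop.sock_recv(sock, 1024)
>>     yield from loop.sock_sendall(sock, data)
>>
>> def handle_threaded(sock):
>>     # runs in its own thread/process: free to block
>>     data = sock.recv(1024)
>>     sock.sendall(data)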
>>
>> What you describe in 1), 2) and 3) looks more or less like what I'm talking
>> about, so I think we're on the same track.
>> What's missing is a step #4 in which the spawned workers/connections are
>> nicely shut down, but I think you understand what I mean.
>> Judging from your description of steps 1), 2) and 3) it seems that in
>> principle Tulip already provides the necessary hooks to do this.
>> What might be worth investigating is whether to provide a high-level
>> API which does the work for you and changes the default concurrency
>> model.
>> I'm not familiar with Tulip's API but I would expect it to be something
>> as easy as:
>>
>> >>> loop = asyncio.get_event_loop()
>> >>> start_server(loop, args.host, args.port, spawn_threads=True)
>>
>> ...at least, this is what I achieved in pyftpdlib, but instead of passing
>> a flag to a function I use a different "acceptor" class.
>>
>> > Using a separate thread or process per connection seems to
>> > go against the whole idea of Tulip.
>>
>> I understand your reluctance.
>> What I'm talking about here is a mixed concurrency model where the
>> "acceptor" is async and each "handler" keeps using 'select()', but does
>> so in another process/thread.
>> I believe this kind of model is not new, BTW.
>> I remember some famous web server using it, but I can't
>> recall which one (lighttpd or NGINX, perhaps).
>>
>> > (b) Using "yield from loop.sock_accept()" in a loop may not be the
>> fastest way
>> > to accept connections for a high-performance server
>>
>> At that level there shouldn't be any performance degradation as long as
>> the while loop is awakened by select()/epoll()/whatever() as usual.
>>
>> --- Giampaolo
>> https://code.google.com/p/pyftpdlib/
>> https://code.google.com/p/psutil/
>> https://code.google.com/p/pysendfile/
>>
>>
>> On Fri, Nov 22, 2013 at 1:23 AM, Guido van Rossum <[email protected]> wrote:
>>
>>> Hi Giampaolo,
>>>
>>> I'm not sure I understand your idea. :-(
>>>
>>> Using a separate thread or process per connection seems to go against
>>> the whole idea of Tulip. Tulip does support multiple threads, each with
>>> their own I/O loop, but there is no way to hand off a connection (either
>>> incoming or outgoing) to a different thread.
>>>
>>> Maybe you can do the following:
>>>
>>> (1) Manually create and bind a socket using the socket module (maybe use
>>> EventLoop.getaddrinfo() to get a numeric IP/IPv6 address to bind it to) and
>>> set it in non-blocking mode.
>>>
>>> (2) Run some kind of loop that repeatedly calls EventLoop.sock_accept()
>>> on that socket (this returns a Future, so you have to do this in a
>>> coroutine using yield from), and whenever that returns a new connection,
>>> pass the socket to another function to be run in another thread. (You may
>>> even use EventLoop.run_in_executor() to start the thread using a thread
>>> pool -- and by passing in a multiprocessing executor you may even be able
>>> to run it in a subprocess.)
>>>
>>> (3) That other function now owns the socket. It can start an event loop
>>> (set_event_loop(new_event_loop())) and then use create_connection(),
>>> passing in the connection socket. This feels weird, but it should work.
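>>>
>>> Pulling (1), (2) and (3) together, the whole thing might look roughly like
>>> this (untested sketch; MyProtocol and the address are just placeholders):
>>>
>>> import asyncio
>>> import socket
>>>
>>> class MyProtocol(asyncio.Protocol):        # placeholder protocol
>>>     def connection_made(self, transport):
>>>         self.transport = transport
>>>     def data_received(self, data):
>>>         self.transport.write(data)
>>>     def connection_lost(self, exc):
>>>         # stop this thread's loop once the client goes away
>>>         asyncio.get_event_loop().stop()
>>>
>>> def handle_in_thread(conn):
>>>     # step (3): this function, run in a worker thread, now owns the socket
>>>     loop = asyncio.new_event_loop()
>>>     asyncio.set_event_loop(loop)
>>>     # weird but should work: wrap a server-side socket with create_connection()
>>>     transport, protocol = loop.run_until_complete(
>>>         loop.create_connection(MyProtocol, sock=conn))
>>>     loop.run_forever()
>>>     loop.close()
>>>
>>> @asyncio.coroutine
>>> def accept_loop(main_loop, address):
>>>     # step (1): manually create and bind the listening socket, non-blocking
>>>     sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>>     sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
>>>     sock.bind(address)
>>>     sock.listen(100)
>>>     sock.setblocking(False)
>>>     while True:
>>>         # step (2): accept, then hand the connection off to the thread pool
>>>         conn, peer = yield from main_loop.sock_accept(sock)
>>>         main_loop.run_in_executor(None, handle_in_thread, conn)
>>>
>>> main_loop = asyncio.get_event_loop()
>>> main_loop.run_until_complete(accept_loop(main_loop, ('127.0.0.1', 8888)))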
>>>
>>> I haven't tried this, and there are a few smelly parts to it:
>>>
>>> (a) You're not benefiting from any of the logic in create_server(),
>>> which includes a nice way to stop serving and (separately) to wait for all
>>> the connections to be done (all through the Server object, which you can
>>> only get by calling create_server()).
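>>>
>>> i.e. you lose the ability to write something like this (reusing the
>>> MyProtocol placeholder from the sketch above):
>>>
>>> @asyncio.coroutine
>>> def serve_for_a_while(loop):
>>>     server = yield from loop.create_server(MyProtocol, '127.0.0.1', 8888)
>>>     yield from asyncio.sleep(60)        # ... serve for a while ...
>>>     server.close()                      # stop accepting new connections
>>>     yield from server.wait_closed()     # wait for the server to shut down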
>>>
>>> (b) Using "yield from loop.sock_accept()" in a loop may not be the
>>> fastest way to accept connections for a high-performance server (I believe
>>> Glyph hammered on this point a while ago).
>>>
>>> (c) Using create_connection() to get a transport+protocol for a
>>> server-side socket is definitely weird. For SSL there may be a problem --
>>> the SSL transport receives a flag indicating whether it is being used
>>> server-side or client-side, and you'd have to study the code to make sure
>>> it's safe. Grepping for server_side I think it's only really used to decide
>>> whether to pass the server_hostname argument to wrap_socket(), so I think
>>> you can bypass that with an explicit argument of server_hostname=''. We're
>>> definitely talking implementation accident here -- you'd have to experiment
>>> and see how this works out, and then who knows how it will work using a
>>> proactor event loop.
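>>>
>>> Concretely, in the handle_in_thread sketch above the call would become
>>> something like this (untested; ssl_context stands for whatever server-side
>>> SSLContext you already have):
>>>
>>> transport, protocol = loop.run_until_complete(
>>>     loop.create_connection(MyProtocol, sock=conn,
>>>                            ssl=ssl_context, server_hostname=''))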
>>>
>>> But I really don't know if any of this is actually related to what
>>> you're asking about, so let's see what you say first...
>>>
>>>
>>>
>>> On Thu, Nov 21, 2013 at 3:39 PM, Giampaolo Rodola' <[email protected]> wrote:
>>>
>>>> Hello, and sorry in advance if this has already been discussed, but I
>>>> haven't kept up with Tulip development for a while now due to lack of time.
>>>>
>>>> One of the features I appreciate the most in asyncore, Tornado and
>>>> pyftpdlib event loops is the fact that the IO loop class can optionally
>>>> accept an existing IO loop instance.
>>>>
>>>> Thanks to that capability in pyftpdlib I managed to do the following:
>>>>
>>>> 1 - I can replace the "main" async dispatcher class with one which will
>>>> be used only to accept new connections
>>>>
>>>> 2 - every time a new connection comes in, it will be dispatched to a
>>>> separate thread/process which internally will run its own IO loop
>>>>
>>>> 3 - when the server is shut down, the main dispatcher (1) will take care
>>>> of "freeing" / disconnecting the pending workers
>>>>
>>>> With this strategy every connection handler will be free to block
>>>> without hanging the whole FTP server, which is particularly handy when
>>>> the user code queries a DB, the file system is too slow, etc.
>>>> Here are a couple of references:
>>>>
>>>> https://code.google.com/p/pyftpdlib/wiki/Tutorial?#4.6_-_Changing_the_concurrency_model
>>>>
>>>> https://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/servers.py#280
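>>>>
>>>> For the record, switching the concurrency model there boils down to
>>>> something like this (simplified; the user name, password and address are
>>>> just example values):
>>>>
>>>> from pyftpdlib.authorizers import DummyAuthorizer
>>>> from pyftpdlib.handlers import FTPHandler
>>>> from pyftpdlib.servers import ThreadedFTPServer   # or MultiprocessFTPServer
>>>>
>>>> authorizer = DummyAuthorizer()
>>>> authorizer.add_user('user', '12345', '.', perm='elradfmw')
>>>>
>>>> handler = FTPHandler
>>>> handler.authorizer = authorizer
>>>>
>>>> # the "acceptor" stays async; every new connection is handled in its own
>>>> # thread, which internally runs its own IO loop
>>>> server = ThreadedFTPServer(('127.0.0.1', 2121), handler)
>>>> server.serve_forever()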
>>>>
>>>> I just took a look at Tulip's code and noticed that
>>>> BaseSelectorEventLoop allows a selector instance to be passed to the
>>>> constructor.
>>>> I may be misinterpreting the code, but I don't see the same paradigm
>>>> replicated in other parts of the code (namely the transports and the
>>>> scheduler).
>>>>
>>>> So here comes my question: has this use case been considered or
>>>> explored?
>>>>
>>>>
>>>> --- Giampaolo
>>>> https://code.google.com/p/pyftpdlib/
>>>> https://code.google.com/p/psutil/
>>>> https://code.google.com/p/pysendfile/
>>>>
>>>
>>>
>>>
>>> --
>>> --Guido van Rossum (python.org/~guido)
>>>
>>
>>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>