Re: [python-tulip] Process + Threads + asyncio... has sense?
On 4/25/16, cr0hn wrote:
> I uploaded my PoC code as a Gist, if anyone would like to see the code or
> send any improvement:
> https://gist.github.com/cr0hn/e88dfb1fe8ed0fbddf49185f419db4d8
> Regards,

Thanks for the work.

>> 2) You can't use any blocking call anywhere in an async server.
>> If you do, your WHOLE server is dead in the water until that
>> blocking call returns. Do you think that my design is faulty?
>> Then look at the SSH/TLS implementation of asyncio itself.
>> During the handshake, you are at the mercy of the openssh library.
>> Thus, it is impossible to build a medium- to high-load TLS server.
>> To do that safely and appropriately you need an asyncio
>> implementation of openssh!

It's openssl, not ssh... Sorry.
Re: [python-tulip] Process + Threads + asyncio... has sense?
Thanks for your responses. I uploaded my PoC code as a Gist, if anyone would like to see the code or send any improvement: https://gist.github.com/cr0hn/e88dfb1fe8ed0fbddf49185f419db4d8

Regards,

On Wednesday, April 20, 2016 at 1:00:08 AM (UTC+2), Imran Geriskovan wrote:
> > 1. With threads you need more locks, and the more locks you have: a) the
> > lower the performance, and b) the greater the risk of introducing
> > deadlocks;
> > So please keep in mind that things are not as black and white as "which is
> > faster". There are other things to consider.
>
> While handling mutually exclusive multithreaded I/O,
> you don't need any lock. Aside from generalist advice,
> the reasons for thinking of going back to threads are:
>
> 1) Awaits are viral. Async programming is kind of all-or-nothing.
> You need all I/O libraries to be async.
>
> 2) You can't use any blocking call anywhere in an async server.
> If you do, your WHOLE server is dead in the water until that
> blocking call returns. Do you think that my design is faulty?
> Then look at the SSH/TLS implementation of asyncio itself.
> During the handshake, you are at the mercy of the openssh library.
> Thus, it is impossible to build a medium- to high-load TLS server.
> To do that safely and appropriately you need an asyncio
> implementation of openssh!
>
> 3) I appreciate the core idea of asyncio. However, it is not cheap.
> It hardly justifies the whole new thing, when you can just
> drop the "await"s, run it as multithreaded, and preserve compatibility
> with all the old libraries. If you did not buy into the inverted
> async patterns, you even preserve your chances of migrating
> to any other classical language.
>
> 4) The major downside of the thread approach is memory consumption.
> That is 8 MB per thread on Linux. Other than this, OS threads are cheap
> on Linux. (Windows is another story.) If your use case can afford
> it, why not use it.
>
> Returning to the original subject of this message thread:
> as cr...@cr0hn.com proposed, certain combinations of processes,
> threads and coroutines definitely make sense.
>
> Regards,
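A note on the 8 MB figure in point 4: on Linux that is the default *virtual* stack reservation per thread (see `ulimit -s`), not resident memory, and it can be shrunk for new threads. A minimal sketch (not from the original thread) using the standard library's `threading.stack_size()`:

```python
import threading

# The 8 MB per thread is the default virtual stack reservation on Linux
# (see `ulimit -s`); physical pages are only committed as the stack grows.
# threading.stack_size() lowers the reservation for subsequently created
# threads, which matters when you spawn thousands of them.
threading.stack_size(256 * 1024)  # request 256 KiB stacks (minimum is 32 KiB)

results = []

def worker(n):
    results.append(n * n)  # list.append is thread-safe under the GIL

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

threading.stack_size(0)  # 0 = revert to the platform default
print(sorted(results))   # -> [0, 1, 4, 9]
```

Whether a smaller reservation is safe depends on your deepest call chain; deep recursion on a tiny stack will crash the thread.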
Re: [python-tulip] Process + Threads + asyncio... has sense?
> This is a very simple example, but it illustrates some of the problems with
> threading vs coroutines:
> 1. With threads you need more locks, and the more locks you have: a) the
> lower the performance, and b) the greater the risk of introducing
> deadlocks;
> So please keep in mind that things are not as black and white as "which is
> faster". There are other things to consider.

While handling mutually exclusive multithreaded I/O, you don't need any lock. Aside from generalist advice, the reasons for thinking of going back to threads are:

1) Awaits are viral. Async programming is kind of all-or-nothing. You need all I/O libraries to be async.

2) You can't use any blocking call anywhere in an async server. If you do, your WHOLE server is dead in the water until that blocking call returns. Do you think that my design is faulty? Then look at the SSH/TLS implementation of asyncio itself. During the handshake, you are at the mercy of the openssh library. Thus, it is impossible to build a medium- to high-load TLS server. To do that safely and appropriately you need an asyncio implementation of openssh!

3) I appreciate the core idea of asyncio. However, it is not cheap. It hardly justifies the whole new thing, when you can just drop the "await"s, run it as multithreaded, and preserve compatibility with all the old libraries. If you did not buy into the inverted async patterns, you even preserve your chances of migrating to any other classical language.

4) The major downside of the thread approach is memory consumption. That is 8 MB per thread on Linux. Other than this, OS threads are cheap on Linux. (Windows is another story.) If your use case can afford it, why not use it.

Returning to the original subject of this message thread: as cr...@cr0hn.com proposed, certain combinations of processes, threads and coroutines definitely make sense.

Regards,
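Point 2 above can be made measurable. In the sketch below (illustrative only, not from the thread), a heartbeat coroutine records a tick every 50 ms; a blocking call on the loop shows up as a large gap between ticks, while the same call pushed off the loop with `run_in_executor()` leaves the heartbeat intact. `time.sleep()` stands in for a blocking TLS handshake:

```python
import asyncio
import time

async def heartbeat(ticks):
    # Records a timestamp every 50 ms while other tasks run.
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)

async def blocks_the_loop():
    time.sleep(0.3)  # blocking call: every coroutine on the loop stalls

async def plays_nicely():
    loop = asyncio.get_running_loop()
    # Same blocking call, but on a worker thread: the loop stays live.
    await loop.run_in_executor(None, time.sleep, 0.3)

async def worst_gap(worker):
    ticks = []
    await asyncio.gather(heartbeat(ticks), worker())
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))

bad_gap = asyncio.run(worst_gap(blocks_the_loop))
good_gap = asyncio.run(worst_gap(plays_nicely))
print(f"gap with blocking call: {bad_gap:.2f}s; with executor: {good_gap:.2f}s")
```

The blocking variant shows a gap of at least the full 0.3 s sleep; the executor variant stays near the 50 ms schedule.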
Re: [python-tulip] Process + Threads + asyncio... has sense?
Sorry, I should have been more explicit: with Python (both CPython and PyPy), the least-overhead / best-performance (throughput) approach to network servers is: use a multi-process architecture with shared listening ports (Linux SO_REUSEPORT), with each process running an event loop (asyncio/Twisted). I don't recommend using OS threads (of course) ;)

On 19.04.2016 at 23:51, Gustavo Carneiro wrote:

On 19 April 2016 at 22:02, Imran Geriskovan wrote:
>> A) Python threads are not real threads. It multiplexes "Python Threads"
>> on a single OS thread. (Guido, can you correct me if I'm wrong,
>> and can you provide some info on multiplexing/context switching of
>> "Python Threads"?)
>
> Sorry, you are wrong. Python threads map 1:1 to OS threads. They are as
> real as threads come (the GIL notwithstanding).

Ok then. Just to confirm for CPython:
- Among these OS threads, only one thread can run at a time due to the GIL.

A thread releases the GIL (thus allowing any other thread to begin execution) when waiting for blocking I/O. (http://www.dabeaz.com/python/GIL.pdf) This is similar to what we do in asyncio with awaits. Thus, multi-threaded I/O is the next best thing if we do not use asyncio.

Then the question is still this: which one is cheaper, thread overheads or asyncio overheads?

IMHO, that is the wrong question to ask; that doesn't matter that much. What matters most is: which one is safer. Threads appear deceptively simple... that is, up to the point where you trigger a deadlock and your whole application just freezes as a result. Because threads need lots and lots of locks everywhere. Asyncio code may also need some locks, but only a fraction, because for a lot of things you can get away with not doing any locking.
For example, imagine a simple statistics class, like this:

    class MeanStat:
        def __init__(self):
            self.num_values = 0
            self.sum_values = 0

        def add_sample(self, value):
            self.num_values += 1
            self.sum_values += value

        @property
        def mean(self):
            return self.sum_values / self.num_values if self.num_values > 0 else 0

The code above can be used as-is in asyncio applications. You can call MeanStat.add_sample() from multiple asyncio tasks at the same time without any locking, and you know the MeanStat.mean property will always return a correct value.

However, if you try to do this in a threaded application without any locking, you will get incorrect results (and, annoyingly, you may not get incorrect results in development, but only in production!), because one thread may be calling MeanStat.mean while the sum/num_values expression ends up being evaluated in the middle of another thread adding a sample:

    def add_sample(self, value):
        self.num_values += 1
        # <-- switches to another thread here: num_values was updated,
        #     but sum_values was not!
        self.sum_values += value

The correct way to fix that code with threading is to add locks:

    class MeanStat:
        def __init__(self):
            self.lock = threading.Lock()
            self.num_values = 0
            self.sum_values = 0

        def add_sample(self, value):
            with self.lock:
                self.num_values += 1
                self.sum_values += value

        @property
        def mean(self):
            with self.lock:
                return self.sum_values / self.num_values if self.num_values > 0 else 0

This is a very simple example, but it illustrates some of the problems with threading vs coroutines:

1. With threads you need more locks, and the more locks you have: a) the lower the performance, and b) the greater the risk of introducing deadlocks;

2. If you *forget* that you need locks in some place (remember that most code is not as simple as this example), you get race conditions: code that *seems* to work fine in development, but behaves strangely in production: strange values being computed, crashes, deadlocks.
So please keep in mind that things are not as black and white as "which is faster". There are other things to consider.

--
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert
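The SO_REUSEPORT pattern Tobias describes above can be sketched minimally. This is illustrative only (Linux 3.9+): two sockets in one process stand in for two worker processes, each of which would run its own event loop in a real deployment, with the kernel load-balancing incoming connections between the listeners:

```python
import socket

def reuseport_listener(port):
    # Each worker creates its own listening socket on the SAME port.
    # SO_REUSEPORT must be set on every socket before bind().
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(128)
    return s

first = reuseport_listener(0)        # port 0: let the kernel pick a free port
port_a = first.getsockname()[1]
second = reuseport_listener(port_a)  # second listener on the *same* port
port_b = second.getsockname()[1]
print(f"both listening on port {port_a}")
first.close()
second.close()
```

asyncio exposes the same knob directly: `loop.create_server(..., reuse_port=True)` sets SO_REUSEPORT on the listening socket for you.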
Re: [python-tulip] Process + Threads + asyncio... has sense?
On 19 April 2016 at 22:02, Imran Geriskovan wrote:
> >> A) Python threads are not real threads. It multiplexes "Python Threads"
> >> on a single OS thread. (Guido, can you correct me if I'm wrong,
> >> and can you provide some info on multiplexing/context switching of
> >> "Python Threads"?)
> >
> > Sorry, you are wrong. Python threads map 1:1 to OS threads. They are as
> > real as threads come (the GIL notwithstanding).
>
> Ok then. Just to confirm for CPython:
> - Among these OS threads, only one thread can run at a time due to the GIL.
>
> A thread releases the GIL (thus allowing any other thread to begin execution)
> when waiting for blocking I/O. (http://www.dabeaz.com/python/GIL.pdf)
> This is similar to what we do in asyncio with awaits.
>
> Thus, multi-threaded I/O is the next best thing if we do not use asyncio.
>
> Then the question is still this: which one is cheaper,
> thread overheads or asyncio overheads?

IMHO, that is the wrong question to ask; that doesn't matter that much. What matters most is: which one is safer. Threads appear deceptively simple... that is, up to the point where you trigger a deadlock and your whole application just freezes as a result. Because threads need lots and lots of locks everywhere. Asyncio code may also need some locks, but only a fraction, because for a lot of things you can get away with not doing any locking.

For example, imagine a simple statistics class, like this:

    class MeanStat:
        def __init__(self):
            self.num_values = 0
            self.sum_values = 0

        def add_sample(self, value):
            self.num_values += 1
            self.sum_values += value

        @property
        def mean(self):
            return self.sum_values / self.num_values if self.num_values > 0 else 0

The code above can be used as-is in asyncio applications. You can call MeanStat.add_sample() from multiple asyncio tasks at the same time without any locking, and you know the MeanStat.mean property will always return a correct value.
However, if you try to do this in a threaded application without any locking, you will get incorrect results (and, annoyingly, you may not get incorrect results in development, but only in production!), because one thread may be calling MeanStat.mean while the sum/num_values expression ends up being evaluated in the middle of another thread adding a sample:

    def add_sample(self, value):
        self.num_values += 1
        # <-- switches to another thread here: num_values was updated,
        #     but sum_values was not!
        self.sum_values += value

The correct way to fix that code with threading is to add locks:

    class MeanStat:
        def __init__(self):
            self.lock = threading.Lock()
            self.num_values = 0
            self.sum_values = 0

        def add_sample(self, value):
            with self.lock:
                self.num_values += 1
                self.sum_values += value

        @property
        def mean(self):
            with self.lock:
                return self.sum_values / self.num_values if self.num_values > 0 else 0

This is a very simple example, but it illustrates some of the problems with threading vs coroutines:

1. With threads you need more locks, and the more locks you have: a) the lower the performance, and b) the greater the risk of introducing deadlocks;

2. If you *forget* that you need locks in some place (remember that most code is not as simple as this example), you get race conditions: code that *seems* to work fine in development, but behaves strangely in production: strange values being computed, crashes, deadlocks.

So please keep in mind that things are not as black and white as "which is faster". There are other things to consider.

--
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert
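The lock-free claim above can be exercised directly: here is MeanStat driven by many concurrent asyncio tasks with no lock at all (a runnable sketch, not part of the original mail). Because tasks only switch at `await` points, the two increments in `add_sample()` can never be interleaved:

```python
import asyncio

class MeanStat:
    def __init__(self):
        self.num_values = 0
        self.sum_values = 0

    def add_sample(self, value):
        # No lock needed under asyncio: no `await` between these two
        # statements, so no other task can run in between.
        self.num_values += 1
        self.sum_values += value

    @property
    def mean(self):
        return self.sum_values / self.num_values if self.num_values > 0 else 0

async def producer(stat, value, n):
    for _ in range(n):
        stat.add_sample(value)
        await asyncio.sleep(0)  # yield to the other tasks between samples

async def main():
    stat = MeanStat()
    # 10 concurrent tasks, each adding 100 samples of its own value 0..9.
    await asyncio.gather(*(producer(stat, v, 100) for v in range(10)))
    return stat

stat = asyncio.run(main())
print(stat.num_values, stat.mean)  # -> 1000 4.5
```

The invariant holds no matter how the scheduler interleaves the tasks; the threaded version of the same experiment needs the lock shown above.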
Re: [python-tulip] Process + Threads + asyncio... has sense?
On 19.04.2016 at 23:02, Imran Geriskovan wrote:
>>> A) Python threads are not real threads. It multiplexes "Python Threads"
>>> on a single OS thread. (Guido, can you correct me if I'm wrong,
>>> and can you provide some info on multiplexing/context switching of
>>> "Python Threads"?)
>>
>> Sorry, you are wrong. Python threads map 1:1 to OS threads. They are as
>> real as threads come (the GIL notwithstanding).
>
> Ok then. Just to confirm for CPython:
> - Among these OS threads, only one thread can run at a time due to the GIL.
>
> A thread releases the GIL (thus allowing any other thread to begin execution)
> when waiting for blocking I/O. (http://www.dabeaz.com/python/GIL.pdf)
> This is similar to what we do in asyncio with awaits.
>
> Thus, multi-threaded I/O is the next best thing if we do not use asyncio.
>
> Then the question is still this: which one is cheaper,
> thread overheads or asyncio overheads?

The overhead of cooperative multitasking is smaller, but for maximum performance you need to combine it with preemptive multitasking, because to saturate modern hardware you need high I/O concurrency. (I am leaving out stuff like Linux AIO in this discussion.)
Re: [python-tulip] Process + Threads + asyncio... has sense?
Thank you for your responses. The scenario (I forgot it in my first post): I'm trying to improve I/O accesses (disk/network...).

So, if a Python thread maps 1:1 with an OS thread, and the main problem (as I understood it) is the cost of context switching between threads/coroutines... this raises a new question for me: if I only run a process with 1 thread (the default state), will the GIL switch context after the thread's ticks are spent? Or does it just run straight through until the program ends?

Thinking about that, I suppose that if the situation is 1 process <-> 1 thread, with no context switching, then obviously the best approach for high-performance network I/O is to create coroutines rather than threads, right? Am I wrong?

On April 19, 2016 at 0:54:28, Guido van Rossum (gu...@python.org) wrote:

> On Mon, Apr 18, 2016 at 1:26 PM, Imran Geriskovan wrote:
>> A) Python threads are not real threads. It multiplexes "Python Threads"
>> on a single OS thread. (Guido, can you correct me if I'm wrong,
>> and can you provide some info on multiplexing/context switching of
>> "Python Threads"?)
>
> Sorry, you are wrong. Python threads map 1:1 to OS threads. They are as
> real as threads come (the GIL notwithstanding).
>
> --
> --Guido van Rossum (python.org/~guido)

---
Daniel García (cr0hn)
Security researcher and ethical hacker
Personal site: http://cr0hn.com
Linkedin: https://www.linkedin.com/in/garciagarciadaniel
Company: http://abirtone.com
Twitter: @ggdaniel
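On the single-thread question: with one thread there is nothing to hand the GIL to, so the code simply runs; the periodic GIL release only matters once a second thread exists. Also, since Python 3.2 the old bytecode "ticks" were replaced by a time-based switch interval, which is inspectable and tunable (a small sketch, not from the thread):

```python
import sys

# With one thread, the GIL is never contended and there is no switching.
# With 2+ CPU-bound threads, CPython asks the running thread to release
# the GIL every "switch interval" (time-based since Python 3.2, replacing
# the old sys.setcheckinterval() bytecode tick counting). Blocking I/O
# releases the GIL regardless of this setting.
default = sys.getswitchinterval()
print(f"GIL switch interval: {default} s")  # 0.005 s by default in CPython

sys.setswitchinterval(0.001)  # switch more often: lower latency, more overhead
print(f"now: {sys.getswitchinterval()} s")
sys.setswitchinterval(default)  # restore the original value
```

So for a pure 1-process / 1-thread program, GIL switching costs are effectively zero, and the coroutine-vs-thread question only arises once you need concurrency.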
Re: [python-tulip] Process + Threads + asyncio... has sense?
On Mon, Apr 18, 2016 at 1:26 PM, Imran Geriskovan <imran.gerisko...@gmail.com> wrote:

> A) Python threads are not real threads. It multiplexes "Python Threads"
> on a single OS thread. (Guido, can you correct me if I'm wrong,
> and can you provide some info on multiplexing/context switching of
> "Python Threads"?)

Sorry, you are wrong. Python threads map 1:1 to OS threads. They are as real as threads come (the GIL notwithstanding).

--
--Guido van Rossum (python.org/~guido)
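The 1:1 mapping is directly observable: each Python thread reports its own kernel thread id. A sketch (note `threading.get_native_id()` requires Python 3.8+, well after this 2016 thread; a barrier keeps all workers alive at once so the OS cannot reuse an id):

```python
import threading

native_ids = []
barrier = threading.Barrier(5)  # 4 worker threads + the main thread

def record_id():
    # get_native_id() returns the kernel-assigned id of the OS thread
    # this Python thread is running on.
    native_ids.append(threading.get_native_id())
    barrier.wait()  # hold all workers alive simultaneously

threads = [threading.Thread(target=record_id) for _ in range(4)]
for t in threads:
    t.start()
barrier.wait()
for t in threads:
    t.join()

native_ids.append(threading.get_native_id())  # the main thread, too
print(len(set(native_ids)))  # -> 5 distinct kernel thread ids
```

Five Python threads, five distinct OS thread ids: Python threads are real OS threads, merely serialized by the GIL.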
Re: [python-tulip] Process + Threads + asyncio... has sense?
>>> I don't think you need the threads.
>>> 1. If your tasks are I/O bound, coroutines are a safer way to do things,
>>> and probably even have better performance;
>>
>> Thread vs coroutine context switching is an interesting topic.
>> Do you have any data for comparison?
>
> My 2cts:
> OS native (= non-green) threads are an OS-scheduler-driven, preemptive
> multitasking approach, necessarily with context switching overhead that
> is higher than that of a cooperative multitasking approach like the
> asyncio event loop.
> Note: that is Twisted, not asyncio, but the latter should behave the
> same qualitatively.
> /Tobias

Linux OS threads come with an 8 MB stack per thread, plus the switching costs you mentioned.

A) Python threads are not real threads. It multiplexes "Python Threads" on a single OS thread. (Guido, can you correct me if I'm wrong, and can you provide some info on multiplexing/context switching of "Python Threads"?)

B) Whereas asyncio multiplexes coroutines on a "Python Thread"?

The question is "Which one is more effective?". The answer is of course dependent on the use case. However, as a heavy user of coroutines, I am beginning to think of going back to "Python Threads". Anyway, that's a personal choice. Now let's clarify the advantages and disadvantages between A and B.

Regards,
Imran
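One crude way to generate the comparison data Imran asks for is an Event ping-pong: bounce control between two contexts N times, once with OS threads (kernel-assisted wakeups) and once with coroutines (all in userspace). A sketch; the absolute numbers are machine-dependent and this measures only switch overhead, not real I/O:

```python
import asyncio
import threading
import time

N = 2000  # round trips, i.e. 2*N context switches per variant

def thread_pingpong():
    a, b = threading.Event(), threading.Event()
    def pinger():
        for _ in range(N):
            a.set(); b.wait(); b.clear()
    def ponger():
        for _ in range(N):
            a.wait(); a.clear(); b.set()
    t1 = threading.Thread(target=pinger)
    t2 = threading.Thread(target=ponger)
    start = time.perf_counter()
    t1.start(); t2.start()
    t1.join(); t2.join()
    return time.perf_counter() - start

async def _coro_pingpong():
    # Same alternation, but both sides are coroutines on one event loop.
    a, b = asyncio.Event(), asyncio.Event()
    async def pinger():
        for _ in range(N):
            a.set(); await b.wait(); b.clear()
    async def ponger():
        for _ in range(N):
            await a.wait(); a.clear(); b.set()
    start = time.perf_counter()
    await asyncio.gather(pinger(), ponger())
    return time.perf_counter() - start

thread_time = thread_pingpong()
coro_time = asyncio.run(_coro_pingpong())
print(f"threads: {thread_time:.4f}s  coroutines: {coro_time:.4f}s")
```

On typical Linux machines the coroutine version comes out markedly cheaper per switch, but measure on your own hardware before drawing conclusions.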
Re: [python-tulip] Process + Threads + asyncio... has sense?
On 18.04.2016 at 21:33, Imran Geriskovan wrote:
> On 4/18/16, Gustavo Carneiro wrote:
>> I don't think you need the threads.
>> 1. If your tasks are I/O bound, coroutines are a safer way to do things,
>> and probably even have better performance;
>
> Thread vs coroutine context switching is an interesting topic.
> Do you have any data for comparison?
>
> Regards,
> Imran

My 2cts:

OS native (= non-green) threads are an OS-scheduler-driven, preemptive multitasking approach, necessarily with context switching overhead that is higher than that of a cooperative multitasking approach like the asyncio event loop. E.g. context switching with threads involves saving and restoring the whole CPU core register set. OS native threads also involve bouncing back and forth between kernel- and userspace.

Practical evidence: name one high-performance network server that uses threads (and only threads), and not some event-loop thing ;)

You want N threads/processes, where N is related to the number of cores and/or the effective I/O concurrency, _and_ each thread/process runs an event-loop thing. And because of the GIL, you want processes, not threads, on (C)Python.

The effective I/O concurrency depends on the number of I/O queues your hardware supports (the NICs or the storage devices). The I/O queues should also have affinity to the (nearest) CPU core on an SMP system.

For networking, I once did some experiments on how far Python can go. Here is Python (PyPy) doing 630k HTTP requests/sec (12.6 GB/sec) using 40 cores: https://github.com/crossbario/crossbarexamples/tree/master/benchmark/web

Note: that is Twisted, not asyncio, but the latter should behave the same qualitatively.

Cheers,
/Tobias
Re: [python-tulip] Process + Threads + asyncio... has sense?
On 4/18/16, Gustavo Carneiro wrote:
> I don't think you need the threads.
> 1. If your tasks are I/O bound, coroutines are a safer way to do things,
> and probably even have better performance;

Thread vs coroutine context switching is an interesting topic. Do you have any data for comparison?

Regards,
Imran
Re: [python-tulip] Process + Threads + asyncio... has sense?
I don't think you need the threads.

1. If your tasks are I/O bound, coroutines are a safer way to do things, and probably even have better performance;

2. If your tasks are CPU bound, only multiple processes will help; multiple (Python) threads do not help at all. Only in the special case where the CPU work is mostly done via a C library[*] do threads help.

I would recommend using multiple threads only if interacting with 3rd-party code that is I/O bound but is not written with an asynchronous API, such as the requests library, selenium, etc. But in this case, probably using asyncio.Loop.run_in_executor() is a simpler solution.

[*] and a C API wrapped in such a way that it does a lot of work with few Python calls, plus it releases the GIL, so don't go thinking that a simple scalar math function call can take advantage of multithreading.

On 18 April 2016 at 19:33, cr0hn wrote:
> Hi all,
>
> It's the first time I write to this list. Sorry if it's not the best place
> for this question.
>
> After I read the asyncio documentation, PEPs, Guido/Jesse/David Beazley
> articles/talks, etc., I developed a PoC library that mixes Processes +
> Threads + asyncio Tasks, following a scheme like this diagram:
>
>     main -> Process 1 -> Thread 1.1 -> Task 1.1.1
>                                     -> Task 1.1.2
>                                     -> Task 1.1.3
>                       -> Thread 1.2 -> Task 1.2.1
>                                     -> Task 1.2.2
>                                     -> Task 1.2.3
>          -> Process 2 -> Thread 2.1 -> Task 2.1.1
>                                     -> Task 2.1.2
>                                     -> Task 2.1.3
>                       -> Thread 2.2 -> Task 2.2.1
>                                     -> Task 2.2.2
>                                     -> Task 2.2.3
>
> In my local tests, this approach appears to improve (and simplify) the
> concurrency/parallelism for some tasks but, before releasing the library
> on GitHub, I don't know if my approach is wrong and I would appreciate
> your opinion.
>
> Thank you very much for your time.
>
> Regards!

--
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert
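The run_in_executor() suggestion above can be sketched as follows. This is illustrative only: `blocking_fetch` and its `time.sleep` stand in for a blocking third-party call such as `requests.get`, and the hostnames are made up. The blocking calls run concurrently on a thread pool while the event loop stays free for other coroutines:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_fetch(url):
    # Pretend this is a blocking HTTP request made by a synchronous
    # third-party library (requests, selenium, ...).
    time.sleep(0.1)
    return f"response from {url}"

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Four blocking calls run concurrently on the pool, not on the
        # loop; run_in_executor wraps each in an awaitable future.
        results = await asyncio.gather(*(
            loop.run_in_executor(pool, blocking_fetch, f"http://host/{i}")
            for i in range(4)
        ))
    return results

results = asyncio.run(main())
print(results)
```

Passing `None` as the executor uses the loop's default thread pool, and Python 3.9+ offers the shorthand `asyncio.to_thread()` for the same pattern.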