Re: [python-tulip] Process + Threads + asyncio... has sense?

Tobias Oberstein Tue, 19 Apr 2016 15:14:25 -0700

Sorry, I should have been more explicit:

With Python (both CPython and PyPy), the least overhead / best performance (throughput) approach to network servers is:

Use a multi-process architecture with shared listening ports (Linux SO_REUSEPORT), with each process running an event loop (asyncio/Twisted).
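
A minimal sketch of that layout, assuming Python 3.7+ and a platform that exposes SO_REUSEPORT (Linux 3.9 or later). The names make_reuseport_socket, handle_echo, worker and main, the echo handler, and port 8888 are all hypothetical, chosen only for illustration:

```python
import asyncio
import multiprocessing
import socket


def make_reuseport_socket(host, port):
    """Create a TCP socket that may share its port with other processes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEPORT must be set before bind() on every socket sharing the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    return sock


async def handle_echo(reader, writer):
    # Illustrative handler: echo one chunk back and close the connection.
    data = await reader.read(1024)
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def serve(sock):
    # start_server() accepts an already-bound socket via the sock argument.
    server = await asyncio.start_server(handle_echo, sock=sock)
    async with server:
        await server.serve_forever()


def worker(host, port):
    # Each process binds its own socket and runs its own event loop.
    asyncio.run(serve(make_reuseport_socket(host, port)))


def main(host="127.0.0.1", port=8888, num_workers=4):
    procs = [
        multiprocessing.Process(target=worker, args=(host, port))
        for _ in range(num_workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Calling main() starts the workers. The kernel then distributes incoming connections across the processes, so no Python-level coordination between them is needed.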


I don't recommend using OS threads (of course) ;)

On 19.04.2016 at 23:51, Gustavo Carneiro wrote:

On 19 April 2016 at 22:02, Imran Geriskovan <[email protected]> wrote:

    >> A) Python threads are not real threads. It multiplexes "Python Threads"
    >> on a single OS thread. (Guido, can you correct me if I'm wrong,
    >> and can you provide some info on multiplexing/context switching of
    >> "Python Threads"?)

    > Sorry, you are wrong. Python threads map 1:1 to OS threads. They are as
    > real as threads come (the GIL notwithstanding).

    Ok then. Just to confirm for cpython:
    - Among these OS threads, only one thread can run at a time due to GIL.

    A thread releases the GIL (thus allowing another thread to begin
    execution) when waiting for blocking I/O.
    (http://www.dabeaz.com/python/GIL.pdf)
    This is similar to what we do in asyncio with awaits.

    Thus, multi-threaded I/O is the next best thing if we do not use
    asyncio.

    Then the question is still this: Which one is cheaper?
    Thread overheads or asyncio overheads.


IMHO, that is the wrong question to ask; that doesn't matter that much.
What matters most is, which one is safer.  Threads appear deceptively
simple... that is up to the point where you trigger a deadlock and your
whole application just freezes as a result.  Because threads need lots
and lots of locks everywhere.  Asyncio code also may need some locks,
but only a fraction, because for a lot of things you can get away with
not doing any locking.  For example, imagine a simple statistics class,
like this:

class MeanStat:
     def __init__(self):
         self.num_values = 0
         self.sum_values = 0

     def add_sample(self, value):
         self.num_values += 1
         self.sum_values += value

     @property
     def mean(self):
         return self.sum_values / self.num_values if self.num_values > 0 else 0


The code above can be used as is in asyncio applications.  You can call
add_sample() on a shared MeanStat instance from multiple asyncio tasks
without any locking and you know the mean property will always return a
correct value, because tasks only switch at await points and
add_sample() contains none.
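
As a rough illustration of that claim (a sketch only; the producer and run_demo names and the sample values are made up for this example), several tasks can feed one MeanStat while yielding to each other between samples:

```python
import asyncio


class MeanStat:
    def __init__(self):
        self.num_values = 0
        self.sum_values = 0

    def add_sample(self, value):
        # No await in here, so no other task can run between these two lines.
        self.num_values += 1
        self.sum_values += value

    @property
    def mean(self):
        return self.sum_values / self.num_values if self.num_values > 0 else 0


async def producer(stat, values):
    for v in values:
        stat.add_sample(v)
        # Yield to the event loop so the tasks really do interleave.
        await asyncio.sleep(0)


async def run_demo(num_tasks=10):
    stat = MeanStat()
    # Every task adds the samples 1..100 to the same shared object.
    await asyncio.gather(
        *(producer(stat, range(1, 101)) for _ in range(num_tasks))
    )
    return stat


if __name__ == "__main__":
    stat = asyncio.run(run_demo())
    print(stat.num_values, stat.mean)
```

The tasks interleave only at the await asyncio.sleep(0) line, never inside add_sample(), so the two counters always change together.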

However, if you try to do this with a threaded application, if you don't
use any locking you will get incorrect results (and what is annoying is
that you may not get incorrect results in development, but only in
production!), because a thread may be reading MeanStat.mean and the
sum_values/num_values expression may end up being calculated in the
middle of another thread adding a sample:

     def add_sample(self, value):
         self.num_values += 1
         # <<<<< switches to another thread here: num_values was
         # updated, but sum_values was not!
         self.sum_values += value

The correct way to fix that code with threading is to add locks:

import threading

class MeanStat:
     def __init__(self):
         self.lock = threading.Lock()
         self.num_values = 0
         self.sum_values = 0

     def add_sample(self, value):
         # Update both counters atomically with respect to other threads.
         with self.lock:
             self.num_values += 1
             self.sum_values += value

     @property
     def mean(self):
         # Read both counters under the same lock for a consistent pair.
         with self.lock:
             return self.sum_values / self.num_values if self.num_values > 0 else 0
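
For comparison, here is a small sketch of the locked version being exercised from several OS threads. The class is renamed LockedMeanStat here, and feed, run_threads and the thread and sample counts are hypothetical, chosen only for illustration:

```python
import threading


class LockedMeanStat:
    def __init__(self):
        self.lock = threading.Lock()
        self.num_values = 0
        self.sum_values = 0

    def add_sample(self, value):
        # Both counters change together while the lock is held.
        with self.lock:
            self.num_values += 1
            self.sum_values += value

    @property
    def mean(self):
        with self.lock:
            return self.sum_values / self.num_values if self.num_values > 0 else 0


def feed(stat, values):
    for v in values:
        stat.add_sample(v)


def run_threads(num_threads=8, samples_per_thread=10_000):
    stat = LockedMeanStat()
    threads = [
        threading.Thread(target=feed, args=(stat, range(samples_per_thread)))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stat


if __name__ == "__main__":
    stat = run_threads()
    print(stat.num_values, stat.mean)
```

Every call now pays for acquiring and releasing the lock, which is the performance cost referred to in point 1 below.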

This is a very simple example, but it illustrates some of the problems
with threading vs coroutines:

    1. With threads you need more locks, and the more locks you have: a)
the lower the performance, and b) the greater the risk of introducing
deadlocks;

    2. If you /forget/ that you need locks in some place (remember that
most code is not as simple as this example), you get race conditions:
code that /seems/ to work fine in development, but behaves strangely in
production: strange values being computed, crashes, deadlocks.

So please keep in mind that things are not as black and white as "which
is faster".  There are other things to consider.

--
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert
