Re: [python-tulip] Process + Threads + asyncio... has sense?

2016-04-25 Thread Imran Geriskovan
On 4/25/16, cr0hn cr0hn  wrote:
> I uploaded my PoC code as a Gist, in case anyone would like to see the
> code or send any improvements:
> https://gist.github.com/cr0hn/e88dfb1fe8ed0fbddf49185f419db4d8
> Regards,

Thanks for the work.

>> 2) You can't use any blocking call anywhere in an async server.
>> If you do, your ENTIRE server is dead in the water until that
>> blocking call returns. Do you think that my design is faulty?
>> Then look at the SSH/TLS implementation of asyncio itself.
>> During the handshake, you are at the mercy of the openssh library.
>> Thus, it is impossible to build a medium- to high-load TLS server.
>> To do that safely and appropriately you need an asyncio
>> implementation of openssh!

It's openssl, not openssh... Sorry.


Re: [python-tulip] Process + Threads + asyncio... has sense?

2016-04-25 Thread cr0hn cr0hn
Thanks for your responses.

I uploaded my PoC code as a Gist, in case anyone would like to see the
code or send any improvements:

https://gist.github.com/cr0hn/e88dfb1fe8ed0fbddf49185f419db4d8 

Regards,

On Wednesday, April 20, 2016 at 1:00:08 AM UTC+2, Imran Geriskovan wrote:
>
> > 1. With threads you need more locks, and the more locks you have: a) the
> > lower the performance, and b) the greater the risk of introducing
> > deadlocks;
> > So please keep in mind that things are not as black and white as "which
> > is faster". There are other things to consider.
>
> While handling mutually exclusive multithreaded I/O,
> you don't need any locks. Aside from generalist advice,
> the reasons for thinking of going back to threads are:
>
> 1) Awaits are viral. Async programming is all-or-nothing:
> you need all your I/O libraries to be async.
>
> 2) You can't use any blocking call anywhere in an async server.
> If you do, your ENTIRE server is dead in the water until that
> blocking call returns. Do you think that my design is faulty?
> Then look at the SSH/TLS implementation of asyncio itself.
> During the handshake, you are at the mercy of the openssh library.
> Thus, it is impossible to build a medium- to high-load TLS server.
> To do that safely and appropriately you need an asyncio
> implementation of openssh!
>
> 3) I appreciate the core idea of asyncio. However, it is not cheap.
> It hardly justifies the whole new machinery, when you could simply
> drop the "await"s, run the code as multithreaded, and preserve
> compatibility with all the old libraries. And if you haven't bought
> into the inverted async patterns, you still preserve your chances of
> migrating to any other classical language.
>
> 4) The major downside of the thread approach is memory consumption:
> 8MB of stack per thread on Linux. Other than that, OS threads are
> cheap on Linux. (Windows is another story.) If your use case can
> afford it, why not use it?
>
> Returning to the original subject of this message thread:
> as cr...@cr0hn.com proposed, certain combinations of processes,
> threads and coroutines definitely make sense.
>
> Regards, 
>


Re: [python-tulip] Process + Threads + asyncio... has sense?

2016-04-19 Thread Imran Geriskovan
> This is a very simple example, but it illustrates some of the problems with
> threading vs coroutines:
> 1. With threads you need more locks, and the more locks you have: a) the
> lower the performance, and b) the greater the risk of introducing
> deadlocks;
> So please keep in mind that things are not as black and white as "which is
> faster".  There are other things to consider.


While handling mutually exclusive multithreaded I/O,
you don't need any locks. Aside from generalist advice,
the reasons for thinking of going back to threads are:

1) Awaits are viral. Async programming is all-or-nothing:
you need all your I/O libraries to be async.

2) You can't use any blocking call anywhere in an async server.
If you do, your ENTIRE server is dead in the water until that
blocking call returns (see the sketch after these points). Do you
think that my design is faulty? Then look at the SSH/TLS
implementation of asyncio itself. During the handshake, you are at
the mercy of the openssh library. Thus, it is impossible to build a
medium- to high-load TLS server. To do that safely and appropriately
you need an asyncio implementation of openssh!

3) I appreciate the core idea of asyncio. However, it is not cheap.
It hardly justifies the whole new machinery, when you could simply
drop the "await"s, run the code as multithreaded, and preserve
compatibility with all the old libraries. And if you haven't bought
into the inverted async patterns, you still preserve your chances of
migrating to any other classical language.

4) The major downside of the thread approach is memory consumption:
8MB of stack per thread on Linux. Other than that, OS threads are
cheap on Linux. (Windows is another story.) If your use case can
afford it, why not use it?
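
As a concrete illustration of point 2, a minimal sketch (my own toy
example, not from any real server) of how a single blocking call freezes
every coroutine on the loop:

import asyncio
import time

async def ticker():
    # Should print roughly twice per second...
    for i in range(6):
        print('tick', i)
        await asyncio.sleep(0.5)

async def bad_handler():
    # ...but this blocking sleep stalls the whole event loop:
    # no ticks are printed during these 2 seconds.
    time.sleep(2)
    print('handler done')

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(ticker(), bad_handler()))
loop.close()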

Returning to the original subject of this message thread:
as cr...@cr0hn.com proposed, certain combinations of processes,
threads and coroutines definitely make sense.

Regards,


Re: [python-tulip] Process + Threads + asyncio... has sense?

2016-04-19 Thread Tobias Oberstein

Sorry, I should have been more explicit:

With Python (both CPython and PyPy), the least overhead / best 
performance (throughput) approach to network servers is:


Use a multi-process architecture with shared listening ports (Linux 
SO_REUSEPORT), with each process running an event loop (asyncio/Twisted).


I don't recommend using OS threads (of course) ;)
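
For concreteness, a minimal sketch of that architecture (my illustration
only; assumes Linux >= 3.9, where SO_REUSEPORT is available, and a toy
echo handler):

import asyncio
import multiprocessing
import socket

def worker():
    # Each worker process binds its own listening socket with
    # SO_REUSEPORT, so the kernel load-balances incoming connections
    # across the processes.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(('127.0.0.1', 8080))

    async def handle(reader, writer):
        data = await reader.read(1024)  # trivial echo handler
        writer.write(data)
        await writer.drain()
        writer.close()

    # One event loop per process.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(asyncio.start_server(handle, sock=sock))
    loop.run_forever()

if __name__ == '__main__':
    procs = [multiprocessing.Process(target=worker) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()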

On 19.04.2016 at 23:51, Gustavo Carneiro wrote:

On 19 April 2016 at 22:02, Imran Geriskovan wrote:

>> A) Python threads are not real threads. It multiplexes "Python Threads"
>> on a single OS thread. (Guido, can you correct me if I'm wrong,
>> and can you provide some info on multiplexing/context switching of
>> "Python Threads"?)

> Sorry, you are wrong. Python threads map 1:1 to OS threads. They are as
> real as threads come (the GIL notwithstanding).

Ok then. Just to confirm, for CPython:
- Among these OS threads, only one thread can run at a time, due to the GIL.

A thread releases the GIL (thus allowing any other thread to begin execution)
when waiting for blocking I/O. (http://www.dabeaz.com/python/GIL.pdf)
This is similar to what we do in asyncio with awaits.

Thus, multi-threaded I/O is the next best thing if we do not use
asyncio.

Then the question is still this: Which one is cheaper?
Thread overheads or asyncio overheads.


IMHO, that is the wrong question to ask; that doesn't matter that much.
What matters most is, which one is safer.  Threads appear deceptively
simple... that is up to the point where you trigger a deadlock and your
whole application just freezes as a result.  Because threads need lots
and lots of locks everywhere.  Asyncio code also may need some locks,
but only a fraction, because for a lot of things you can get away with
not doing any locking.  For example, imagine a simple statistics class,
like this:

class MeanStat:
    def __init__(self):
        self.num_values = 0
        self.sum_values = 0

    def add_sample(self, value):
        self.num_values += 1
        self.sum_values += value

    @property
    def mean(self):
        return self.sum_values / self.num_values if self.num_values > 0 else 0


The code above can be used as is in asyncio applications.  You can call
MeanStat.add_sample() from multiple asyncio tasks at the same time
without any locking and you know the MeanStat.mean property will always
return a correct value.

However, if you try to do this in a threaded application without any
locking, you will get incorrect results (and, annoyingly, you may not get
the incorrect results in development, but only in production!), because a
thread may be calling MeanStat.mean() and the sum_values/num_values
expression may end up being calculated in the middle of another thread
adding a sample:

    def add_sample(self, value):
        self.num_values += 1
        # <-- switches to another thread here: num_values was
        #     updated, but sum_values was not!
        self.sum_values += value

The correct way to fix that code with threading is to add locks:

import threading

class MeanStat:
    def __init__(self):
        self.lock = threading.Lock()
        self.num_values = 0
        self.sum_values = 0

    def add_sample(self, value):
        with self.lock:
            self.num_values += 1
            self.sum_values += value

    @property
    def mean(self):
        with self.lock:
            return (self.sum_values / self.num_values
                    if self.num_values > 0 else 0)

This is a very simple example, but it illustrates some of the problems
with threading vs coroutines:

1. With threads you need more locks, and the more locks you have: a)
the lower the performance, and b) the greater the risk of introducing
deadlocks;

2. If you /forget/ that you need locks in some place (remember that
most code is not as simple as this example), you get race conditions:
code that /seems/ to work fine in development, but behaves strangely in
production: strange values being computed, crashes, deadlocks.

So please keep in mind that things are not as black and white as "which
is faster".  There are other things to consider.

--
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert




Re: [python-tulip] Process + Threads + asyncio... has sense?

2016-04-19 Thread Gustavo Carneiro
On 19 April 2016 at 22:02, Imran Geriskovan 
wrote:

> >> A) Python threads are not real threads. It multiplexes "Python Threads"
> >> on a single OS thread. (Guido, can you correct me if I'm wrong,
> >> and can you provide some info on multiplexing/context switching of
> >> "Python Threads"?)
>
> > Sorry, you are wrong. Python threads map 1:1 to OS threads. They are as
> > real as threads come (the GIL notwithstanding).
>
> Ok then. Just to confirm, for CPython:
> - Among these OS threads, only one thread can run at a time, due to the GIL.
>
> A thread releases the GIL (thus allowing any other thread to begin
> execution) when waiting for blocking I/O. (http://www.dabeaz.com/python/GIL.pdf)
> This is similar to what we do in asyncio with awaits.
>
> Thus, multi-threaded I/O is the next best thing if we do not use asyncio.
>
> Then the question is still this: Which one is cheaper?
> Thread overheads or asyncio overheads.
>

IMHO, that is the wrong question to ask; that doesn't matter that much.
What matters most is, which one is safer.  Threads appear deceptively
simple... that is up to the point where you trigger a deadlock and your
whole application just freezes as a result.  Because threads need lots and
lots of locks everywhere.  Asyncio code also may need some locks, but only
a fraction, because for a lot of things you can get away with not doing any
locking.  For example, imagine a simple statistics class, like this:

class MeanStat:
    def __init__(self):
        self.num_values = 0
        self.sum_values = 0

    def add_sample(self, value):
        self.num_values += 1
        self.sum_values += value

    @property
    def mean(self):
        return self.sum_values / self.num_values if self.num_values > 0 else 0


The code above can be used as is in asyncio applications.  You can call
MeanStat.add_sample() from multiple asyncio tasks at the same time without
any locking and you know the MeanStat.mean property will always return a
correct value.

However, if you try to do this in a threaded application without any
locking, you will get incorrect results (and, annoyingly, you may not get
the incorrect results in development, but only in production!), because a
thread may be calling MeanStat.mean() and the sum_values/num_values
expression may end up being calculated in the middle of another thread
adding a sample:

    def add_sample(self, value):
        self.num_values += 1
        # <-- switches to another thread here: num_values was
        #     updated, but sum_values was not!
        self.sum_values += value

The correct way to fix that code with threading is to add locks:

import threading

class MeanStat:
    def __init__(self):
        self.lock = threading.Lock()
        self.num_values = 0
        self.sum_values = 0

    def add_sample(self, value):
        with self.lock:
            self.num_values += 1
            self.sum_values += value

    @property
    def mean(self):
        with self.lock:
            return (self.sum_values / self.num_values
                    if self.num_values > 0 else 0)
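
For illustration, a (hypothetical) driver that makes the difference
visible: it hammers add_sample() from several threads. With the unlocked
version some increments may get lost; with the lock the totals always
add up:

import threading

def hammer(stat, n=100000):
    for _ in range(n):
        stat.add_sample(1)

stat = MeanStat()  # the locked version above
workers = [threading.Thread(target=hammer, args=(stat,)) for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(stat.num_values, stat.sum_values)  # always 400000 400000 with the lock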

This is a very simple example, but it illustrates some of the problems with
threading vs coroutines:

   1. With threads you need more locks, and the more locks you have: a) the
lower the performance, and b) the greater the risk of introducing deadlocks;

   2. If you /forget/ that you need locks in some place (remember that most
code is not as simple as this example), you get race conditions: code that
/seems/ to work fine in development, but behaves strangely in production:
strange values being computed, crashes, deadlocks.

So please keep in mind that things are not as black and white as "which is
faster".  There are other things to consider.

-- 
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert


Re: [python-tulip] Process + Threads + asyncio... has sense?

2016-04-19 Thread Tobias Oberstein

On 19.04.2016 at 23:02, Imran Geriskovan wrote:

A) Python threads are not real threads. It multiplexes "Python Threads"
on a single OS thread. (Guido, can you correct me if I'm wrong,
and can you provide some info on multiplexing/context switching of
"Python Threads"?)



Sorry, you are wrong. Python threads map 1:1 to OS threads. They are as
real as threads come (the GIL notwithstanding).


Ok then. Just to confirm, for CPython:
- Among these OS threads, only one thread can run at a time, due to the GIL.

A thread releases the GIL (thus allowing any other thread to begin execution)
when waiting for blocking I/O. (http://www.dabeaz.com/python/GIL.pdf)
This is similar to what we do in asyncio with awaits.

Thus, multi-threaded I/O is the next best thing if we do not use asyncio.

Then the question is still this: Which one is cheaper?
Thread overheads or asyncio overheads.



The overhead of cooperative multitasking is smaller, but for maximum
performance you need to combine it with preemptive multitasking, because
saturating modern hardware requires high I/O concurrency.


(I am leaving out stuff like Linux AIO in this discussion)
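
As a side note, the GIL-release behaviour quoted above is easy to observe
with a toy example (mine): four threads sleeping in a blocking call finish
in about 1 second total, not 4, because each releases the GIL while it
waits:

import threading
import time

def blocking_io():
    # time.sleep() releases the GIL, much like a blocking socket read.
    time.sleep(1)

start = time.time()
threads = [threading.Thread(target=blocking_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('elapsed: %.2fs' % (time.time() - start))  # ~1s, not ~4s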




Re: [python-tulip] Process + Threads + asyncio... has sense?

2016-04-18 Thread cr0hn
Thank you for your responses.

The scenario (which I forgot to mention in my first post): I'm trying to
improve I/O accesses (disk/network...).

So, if a Python thread maps 1:1 to an OS thread, and the main problem (as I
understood it) is the cost of context switching between threads/coroutines...
this raises a new question for me:

If I run a process with only one thread (the default state), will the GIL
still switch context after the thread's ticks are spent? Or does it behave
like a plain run until the program ends?

Thinking about that, I suppose that if the situation is 1 process <-> 1
thread, without context switches, then obviously the best approach for
high-performance network I/O is to create coroutines, not threads, right?

Or am I wrong?


On April 19, 2016 at 0:54:28, Guido van Rossum (gu...@python.org) wrote:

On Mon, Apr 18, 2016 at 1:26 PM, Imran Geriskovan  
wrote:
A) Python threads are not real threads. It multiplexes "Python Threads"
on a single OS thread. (Guido, can you correct me if I'm wrong,
and can you provide some info on multiplexing/context switching of
"Python Threads"?)

Sorry, you are wrong. Python threads map 1:1 to OS threads. They are as real as 
threads come (the GIL notwithstanding).

--
--Guido van Rossum (python.org/~guido)
---
Daniel García (cr0hn)
Security researcher and ethical hacker

Personal site: http://cr0hn.com
Linkedin: https://www.linkedin.com/in/garciagarciadaniel 
Company: http://abirtone.com 
Twitter: @ggdaniel 



Re: [python-tulip] Process + Threads + asyncio... has sense?

2016-04-18 Thread Guido van Rossum
On Mon, Apr 18, 2016 at 1:26 PM, Imran Geriskovan <
imran.gerisko...@gmail.com> wrote:

> A) Python threads are not real threads. It multiplexes "Python Threads"
> on a single OS thread. (Guido, can you correct me if I'm wrong,
> and can you provide some info on multiplexing/context switching of
> "Python Threads"?)
>

Sorry, you are wrong. Python threads map 1:1 to OS threads. They are as
real as threads come (the GIL notwithstanding).

-- 
--Guido van Rossum (python.org/~guido)


Re: [python-tulip] Process + Threads + asyncio... has sense?

2016-04-18 Thread Imran Geriskovan
>>> I don't think you need the threads.
>>> 1. If your tasks are I/O bound, coroutines are a safer way to do things,
>>> and probably even have better performance;
>>
>> Thread vs Coroutine context switching is an interesting topic.
>> Do you have any data for comparison?

> My 2cts:
> OS native (= non-green) threads are an OS scheduler driven, preemptive
> multitasking approach, necessarily with context switching overhead that
> is higher than a cooperative multitasking approach like asyncio event loop.
> Note: that is Twisted, not asyncio, but the latter should behave the
> same qualitatively.
> /Tobias

Linux OS threads come with an 8MB stack per thread, plus the switching
costs you mentioned.

A) Python threads are not real threads. It multiplexes "Python Threads"
on a single OS thread. (Guido, can you correct me if I'm wrong,
and can you provide some info on multiplexing/context switching of
"Python Threads"?)

B) Whereas asyncio multiplexes coroutines on a single "Python Thread"?

The question is "Which one is more effective?". The answer is
of course dependent on the use case.

However, as a heavy user of coroutines, I am beginning to think about going
back to "Python Threads"... Anyway, that's a personal choice.

Now let's clarify the advantages and disadvantages of A and B.

Regards,
Imran


Re: [python-tulip] Process + Threads + asyncio... has sense?

2016-04-18 Thread Tobias Oberstein

On 18.04.2016 at 21:33, Imran Geriskovan wrote:

On 4/18/16, Gustavo Carneiro  wrote:

I don't think you need the threads.
1. If your tasks are I/O bound, coroutines are a safer way to do things,
and probably even have better performance;


Thread vs Coroutine context switching is an interesting topic.
Do you have any data for comparison?


My 2cts:

OS native (= non-green) threads are an OS scheduler driven, preemptive 
multitasking approach, necessarily with context switching overhead that 
is higher than a cooperative multitasking approach like asyncio event loop.


E.g. context switching with threads involves saving and restoring the
whole CPU core register set. OS native threads also involve bouncing
back and forth between kernel space and user space.


Practical evidence: name one high-performance network server that is
using threads (and only threads), and not some event loop thing ;)


You want N threads/processes, where N is related to the number of cores
and/or the effective I/O concurrency, _and_ each thread/process running
an event loop. And because of the GIL, you want processes, not threads,
on (C)Python.


The effective I/O concurrency depends on the number of I/O queues your
hardware supports (on the NICs or the storage devices). The I/O queues
should also have affinity to the (nearest) CPU core on an SMP system.


For networking, I once did some experiments on how far Python can go. Here
is Python (PyPy) doing 630k HTTP requests/sec (12.6 GB/sec) using 40 cores:


https://github.com/crossbario/crossbarexamples/tree/master/benchmark/web

Note: that is Twisted, not asyncio, but the latter should behave the 
same qualitatively.


Cheers,
/Tobias



Regards,
Imran





Re: [python-tulip] Process + Threads + asyncio... has sense?

2016-04-18 Thread Imran Geriskovan
On 4/18/16, Gustavo Carneiro  wrote:
> I don't think you need the threads.
> 1. If your tasks are I/O bound, coroutines are a safer way to do things,
> and probably even have better performance;

Thread vs Coroutine context switching is an interesting topic.
Do you have any data for comparison?

Regards,
Imran


Re: [python-tulip] Process + Threads + asyncio... has sense?

2016-04-18 Thread Gustavo Carneiro
I don't think you need the threads.

1. If your tasks are I/O bound, coroutines are a safer way to do things,
and probably even have better performance;

2. If your tasks are CPU bound, only multiple processes will help; multiple
(Python) threads do not help at all.  Only in the special case where the
CPU work is mostly done via a C library[*] do threads help.

I would recommend using multiple threads only if interacting with 3rd party
code that is I/O bound but is not written with an asynchronous API, such as
the requests library, selenium, etc.  But in this case, probably using
asyncio.Loop.run_in_executor() is a simpler solution.

[*] and a C API wrapped in such a way that it does a lot of work with few
Python calls, plus it releases the GIL, so don't go thinking that a simple
scalar math function call can take advantage of multithreading.
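
To illustrate the run_in_executor() suggestion, a minimal sketch (using
urllib.request as a stand-in for any blocking client library):

import asyncio
import urllib.request

def fetch(url):
    # Blocking I/O: calling this directly inside a coroutine would
    # stall the whole event loop.
    with urllib.request.urlopen(url) as resp:
        return resp.read()

async def main(loop):
    urls = ['http://example.com/'] * 4
    # run_in_executor(None, ...) uses the loop's default thread pool,
    # so the four blocking fetches overlap instead of serializing.
    bodies = await asyncio.gather(
        *(loop.run_in_executor(None, fetch, url) for url in urls))
    print([len(body) for body in bodies])

loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))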


On 18 April 2016 at 19:33, cr0hn cr0hn  wrote:

> Hi all,
>
> It's the first time I've written to this list. Sorry if it's not the best
> place
> for this question.
>
> After reading the asyncio documentation, PEPs, and Guido/Jesse/David Beazley
> articles/talks, etc., I developed a PoC library that mixes Processes +
> Threads + Asyncio Tasks, following a scheme like this diagram:
>
> main -> Process 1 -> Thread 1.1 -> Task 1.1.1
>                                 -> Task 1.1.2
>                                 -> Task 1.1.3
>                   -> Thread 1.2 -> Task 1.2.1
>                                 -> Task 1.2.2
>                                 -> Task 1.2.3
>
>      -> Process 2 -> Thread 2.1 -> Task 2.1.1
>                                 -> Task 2.1.2
>                                 -> Task 2.1.3
>                   -> Thread 2.2 -> Task 2.2.1
>                                 -> Task 2.2.2
>                                 -> Task 2.2.3
>
> In my local tests, this approach appears to improve (and simplify)
> concurrency/parallelism for some tasks but, before releasing the library
> on GitHub, I don't know if my approach is wrong and I would appreciate
> your opinion.
>
> Thank you very much for your time.
>
> Regards!
>
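
For reference, a bare-bones sketch of the process -> thread -> task
layering described above (my own reconstruction, not the actual PoC code
from the gist):

import asyncio
import multiprocessing
import threading

async def task(pid, tid, n):
    await asyncio.sleep(0.1)  # placeholder for real I/O work
    print('process %d thread %d task %d' % (pid, tid, n))

def thread_worker(pid, tid):
    # Each thread owns a private event loop running several tasks.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(
        asyncio.gather(*(task(pid, tid, n) for n in range(3))))
    loop.close()

def process_worker(pid):
    threads = [threading.Thread(target=thread_worker, args=(pid, tid))
               for tid in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == '__main__':
    procs = [multiprocessing.Process(target=process_worker, args=(pid,))
             for pid in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()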



-- 
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert