Re: [python-tulip] Performance of asynchronous data processing.

Glyph Fri, 13 Jun 2014 00:27:07 -0700

On Jun 12, 2014, at 7:37 AM, Jonathan Slenders <[email protected]> wrote:


> I'm very interested if anyone has faced the same problem.

For what it's worth, we faced the same performance issue with twisted.web2, 
which is one of the reasons that twisted.web2 eventually got canned and we went 
back to incrementally maintaining twisted.web.  There was an abstraction called 
"streams" which created a new Deferred for every read operation, and it was 
just painfully slow because of all the allocation and garbage collecting of 
billions of little Deferred objects.

Really what you want is an API like fetch_into(collector) where "collector" is 
an object with row_received and query_complete methods.  Then you don't need a 
Future for every single row (or batch of rows); you just get a method called 
for each row.
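
To make that concrete, here is a minimal sketch of the shape I have in mind. 
Only fetch_into, row_received and query_complete come from the description 
above; ListCollector and FakeCursor are hypothetical names made up for 
illustration, and the cursor just replays an in-memory list rather than 
talking to a real database driver.

```python
class ListCollector:
    """Hypothetical collector: rows arrive as plain method calls."""

    def __init__(self):
        self.rows = []
        self.complete = False

    def row_received(self, row):
        # Called once per row; no Future or Deferred is allocated here.
        self.rows.append(row)

    def query_complete(self):
        # Called exactly once, after the last row has been delivered.
        self.complete = True


class FakeCursor:
    """Stand-in for a driver cursor (an assumption, not a real API)."""

    def __init__(self, rows):
        self._rows = rows

    def fetch_into(self, collector):
        # A real driver would presumably make these calls as row data is
        # parsed off the wire; here we simply replay a list.
        for row in self._rows:
            collector.row_received(row)
        collector.query_complete()


collector = ListCollector()
FakeCursor([(1, "a"), (2, "b")]).fetch_into(collector)
```

If the caller still wants something to wait on, query_complete could resolve 
a single Future for the whole query, which is one allocation per query 
instead of one per row.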

Of course this makes it somewhat difficult to write a nice syntactic for-loop 
in a coroutine over the result set, but it is an open question how to resolve 
that :-).

This is somewhat similar to the transport/protocol separation, just at the 
application layer.

There's an ongoing branch (although it might be more accurate to call it a 
"research project") in Twisted exploring how to fix this in a more general way 
than creating a new interface for every new form of variable-length data.  If 
it ever works out, I'll be sure to share the technique with the asyncio 
community.

-glyph
