[Fwd: Re: Why POE is slower then Python - Twisted ... quite a lot?]

2009-03-26 Thread Martijn van Beers
forwarding for apocalypse
--- Begin Message ---

Hello,

   I have done some analysis on this, and used wireshark to track down 
the differences like Rocco suggested. Looking at the dumps, I 
immediately noticed 2 things, one being more important.


   1. The python code doesn't retrieve the first article in the group, 
it starts at the last-100 ( or whatever you set the articleCount to in 
the script ) This little fix solved it and now we're fetching the same 
articles as the perl one.


   Change line 17 to: self.articlesToFetch = range(first, 
first+self.factory.articleCount)


   2. ( the important difference ) Looking at the nntp protocol itself, 
I see that the behavior of both code is very different. The perl one 
does it "the right way" by using the server-side article pointer and 
issues the "next" command to move it to the next article id. The python 
one naively increments the article id and tries to fetch it.


   By incurring an extra roundtrip to the server, the perl code becomes 
2x slower. THIS is the culprit, and any other differences/factors are 
insignificant when you realize this. Also, since the python code just 
increments the article id, sometimes the server returns "423 no such 
article number in group" to the script. This means sometimes the article 
isn't downloaded - a slight decrease in the network activity :)


   Attached is the patch to the perl code to make it behave in the same 
way as the python code. IMO it's not the "right" way but at least this 
provides us with a fair comparison, ha! This is the output of both 
(fixed) code, with 100 articles. Let me know if you have any other 
questions :)


a...@satellite:~/Desktop/nntptest$ time perl nntp_fixed.pl
...
real0m19.627s
user0m0.588s
sys0m0.048s

a...@satellite:~/Desktop/nntptest$ time python nntp_fixed.py 
news.microsoft.com microsoft.public.access access.mbox

...
real0m19.589s
user0m0.420s
sys0m0.060s

howard chen wrote:

I have 2 simple scripts which connect to NNTP server to fetch 100
articles (no writing to local disk) in order to test the performance
of two frameworks.

I have found Twisted is faster by at least 2 times which make me surprised.

I know it is not fair to compare two frameworks by this simple test.
But I think 2 times is quite much.

Therefore, I upload the code online to see if any experts can find any problem?


Thanks.

http://howachen.googlepages.com/nntp.py
http://howachen.googlepages.com/nntp.pl


  

time python nntp.py news.microsoft.com microsoft.public.access access.mbox
  


18 sec

  

time perl nntp.pl
  


37 sec


Any comments?
  
--- nntp.pl	2009-03-26 14:18:11.0 +0100
+++ nntp_fixed.pl	2009-03-26 14:46:17.0 +0100
@@ -7,6 +7,7 @@
$|=1;
 
my $count = 0;
+   my $article_id;
my $nntp = POE::Component::Client::NNTP->spawn ( 'NNTP-Client', { NNTPServer => 'news.microsoft.com' } );
 
POE::Session->create(
@@ -17,7 +18,7 @@
 nntp_200  => '_connected',
 nntp_201  => '_connected',
 },
-'main' => [ qw(_start nntp_211 nntp_220 nntp_223 nntp_registered)
+'main' => [ qw(_start nntp_211 nntp_220 nntp_223 nntp_registered nntp_423)
 ],
 ],
);
@@ -47,8 +48,10 @@
 
sub nntp_211 {
 my ($kernel,$heap,$text) = @_[KERNEL,HEAP,ARG0];
+	my @data = split( ' ', $text );
+	$article_id = $data[1];
 
-$kernel->post( 'NNTP-Client' => 'article' );
+$kernel->post( 'NNTP-Client' => 'article' => $article_id );
 undef;
}
 
@@ -59,7 +62,17 @@
 $count++;
 die if $count >= 100;
 
-$kernel->post( 'NNTP-Client' => 'next' );
+$kernel->post( 'NNTP-Client' => 'article' => ++$article_id );
+undef;
+   }
+
+   sub nntp_423 {
+	print "Failed to retrieve $article_id\n";
+
+	$count++;
+die if $count >= 100;
+
+$_[KERNEL]->post( 'NNTP-Client' => 'article' => ++$article_id );
 undef;
}
 
--- End Message ---


Re: Why POE is slower then Python - Twisted ... quite a lot?

2009-03-25 Thread Rocco Caputo

It is too soon to conclude that POE is the cause.

The difference in CPU time between your Python and Perl tests is  
0.0002 seconds per article (user+sys).  It's not a significant  
difference at this scale.  It could be caused by any number of low- 
level variations between POE and Twisted, or Perl and Python.


The wall times are a better place to look for inefficiencies.  They  
are three orders of magnitude larger than the total CPU times.


The large difference between wall and CPU times indicates a non-CPU  
bottleneck.  In your test case, it's probably the network.


What could account for network differences?

You are comparing two different NNTP client libraries.  They may  
interact differently with the server, which is a likely place to  
look.  To investigate this, I would scale the tests down to a small  
number of articles (2 < count <= 10).  I would dump the resulting  
network traffic for each test.  I would use the dumps to compare  
interaction patterns between the two libraries.  I would also look at  
timings: the times between command & response, the times between one  
command and the next, the length of time to receive an article, and so  
on.  tcpdump and tshark (or wireshark) are very good for this sort of  
thing.


If network interaction is comparable, we should revisit the points I  
outlined in my last message.  They don't seem to have made much of an  
impression, but they may still be relevant.


Thank you.

--
Rocco Caputo - [email protected]

On Mar 25, 2009, at 13:18, howard chen wrote:


But anyway, now I re-run the test, instead of downloading 100 posts,
now I download 1K articles.


python

real3m6.007s
user0m0.088s
sys 0m0.016s


perl

real6m13.730s
user0m0.268s
sys 0m0.024s


Re: Why POE is slower then Python - Twisted ... quite a lot?

2009-03-25 Thread howard chen
Hi,


On Wed, Mar 25, 2009 at 11:14 PM, Chris Prather  wrote:
>
> That's a big supposition to make:
>


Please understand that my original email is comparing 37 vs 18 seconds
difference (i.e. 200% speedup), so I don't really care about the
startup/shutdown time since the difference is such obvious.

But anyway, now I re-run the test, instead of downloading 100 posts,
now I download 1K articles.


>> python
real3m6.007s
user0m0.088s
sys 0m0.016s

>> perl
real6m13.730s
user0m0.268s
sys 0m0.024s


Re: Why POE is slower then Python - Twisted ... quite a lot?

2009-03-25 Thread Chris Prather
On Wed, Mar 25, 2009 at 11:03 AM, howard chen  wrote:
> Hello,
>
> On Wed, Mar 25, 2009 at 4:10 AM, Rocco Caputo  wrote:
>>
>> time(1) includes Perl & Python startup/shutdown time, which may be
>> significant.
>>
>
> Suppose the difference is negligible.

That's a big supposition to make:

[11:12:35] g...@~/ $time python -c'import twisted'

real0m1.487s
user0m0.282s
sys 0m0.103s
[11:12:58] g...@~/ $time perl -MPOE -e1

real0m0.092s
user0m0.076s
sys 0m0.014s



Note that I am not a Python person so I don't know if these two
commands really are equivalent.

-Chris


Re: Why POE is slower then Python - Twisted ... quite a lot?

2009-03-25 Thread howard chen
Hello,

On Wed, Mar 25, 2009 at 4:10 AM, Rocco Caputo  wrote:
>
> time(1) includes Perl & Python startup/shutdown time, which may be
> significant.
>

Suppose the difference is negligible.


> Your times seem to be wall clock based, which can vary based on system load,
> network conditions, and so on.  Show the CPU times instead.
>
> It appears that you've only run one iteration.  Benchmarks should be run
> enough times to account for statistical variance.
>

No, I run the test for several times, on an idle server (Intel Quad
Core, SMP, 4GB RAM. using both latest POE & Twisted). The difference
is roughly 200% differences, so I was surprised.


Thank you.


Re: Why POE is slower then Python - Twisted ... quite a lot?

2009-03-24 Thread Rocco Caputo
I have a few issues with the way the benchmarks are run, and then some  
actual useful comments. :)


time(1) includes Perl & Python startup/shutdown time, which may be  
significant.


Your times seem to be wall clock based, which can vary based on system  
load, network conditions, and so on.  Show the CPU times instead.


It appears that you've only run one iteration.  Benchmarks should be  
run enough times to account for statistical variance.


Ok, that said:

A deep and accurate answer requires a deep knowledge of all  
technologies involved.  I'm not familiar with Python and Twisted, so I  
can't make a good comparison by myself.


A previous reply on this thread suggested that CORE::select() was  
taking the most time, which strongly implies that POE spends more time  
waiting for network activity than Twisted does.


If that's the case, it may be that POE may be checking for I/O faster  
than Twisted, and therefore more often, which could result in more and  
smaller reads from the socket.  The overhead per read will be higher,  
although the reads will be more timely.


Someone with dual Perl/Python and POE/Twisted experience would be best  
to dig into this to find the actual cause.  We can then act on that,  
rather than on conjecture.


Since the cause is likely to be within POE's implementation, we have  
room for improvement.  For example, it's been suggested on  
irc.perl.org #poe to make POE's I/O events latency tunable.  An  
interactive application might choose better responsiveness, while a  
bulk transfer application (such as yours?) could be tuned for higher  
throughput.  It's an interesting idea, but I don't have resources to  
pursue it.  If someone else does, I'll be happy to provide advice and  
guidance, here, in private e-mail, or on IRC.


--
Rocco Caputo - [email protected]



On Mar 23, 2009, at 07:18, howard chen wrote:


I have 2 simple scripts which connect to NNTP server to fetch 100
articles (no writing to local disk) in order to test the performance
of two frameworks.

I have found Twisted is faster by at least 2 times which make me  
surprised.


I know it is not fair to compare two frameworks by this simple test.
But I think 2 times is quite much.

Therefore, I upload the code online to see if any experts can find  
any problem?



Thanks.

http://howachen.googlepages.com/nntp.py
http://howachen.googlepages.com/nntp.pl


time python nntp.py news.microsoft.com microsoft.public.access  
access.mbox


18 sec


time perl nntp.pl


37 sec


Any comments?




Re: Why POE is slower then Python - Twisted ... quite a lot?

2009-03-24 Thread Martijn van Beers
On Tue, 2009-03-24 at 10:05 +0900, Daisuke Maki wrote:
> I'm going to take a wild guess here: POE::Component::Pluggable.

Nope, that isn't it. I would have found that surprising anyway, since
there aren't any plugins being used, so it's just a few checks if there
are any plugins. (I checked by disabling the plugin handling)

I thought maybe it had something to do with twisted running a more
efficient loop by default, so I tried with some alternate loops (EV,
POE::Loop::XS::EPoll), but that doesn't make much of a difference either

Then I thought maybe it's the fact that Client::NNTP uses Filter::Line,
so there's an event dispatched for each line of message instead of just
one per message. While hacking up Client::NNTP to use Filter::Stream
instead reduced the number of _invoke_state calls from almost 6000 to
slightly over a thousand, that still only shaves off a few seconds of
runtime.

Devel::NYTProf says most of the time is being spent in CORE::select in
loop_do_timeslice for the select loop or in EV::loop() for the EV loop
(and not much difference between the two). I'm not an expert on POE
internals, nor familiar with Twisted, so I can't say if it's just
because Twisted does less than POE, or that there's something that could
be optimised.


Martijn



Re: Why POE is slower then Python - Twisted ... quite a lot?

2009-03-23 Thread Daisuke Maki

I'm going to take a wild guess here: POE::Component::Pluggable.

In looking at twister.news.nntp.py, it's a straigh nntp implementation, 
with a bunch of hook points.


PoCo::Client::NNTP is built on top of POE::Component::Pluggable, and so 
does all dispatching via a context-less layer (i.e., PoCo::Pluggable 
doesn't know anything about NNTP or what have you), so dispatching works 
 in a somewhat inefficient way.


I suppose one could make it faster by inlining the dispatch code in 
PoCo::Plubble or something like that, though.


--d

howard chen wrote:

I have 2 simple scripts which connect to NNTP server to fetch 100
articles (no writing to local disk) in order to test the performance
of two frameworks.

I have found Twisted is faster by at least 2 times which make me surprised.

I know it is not fair to compare two frameworks by this simple test.
But I think 2 times is quite much.

Therefore, I upload the code online to see if any experts can find any problem?


Thanks.

http://howachen.googlepages.com/nntp.py
http://howachen.googlepages.com/nntp.pl



time python nntp.py news.microsoft.com microsoft.public.access access.mbox


18 sec


time perl nntp.pl


37 sec


Any comments?