forwarding for apocalypse
--- Begin Message ---
Hello,
I did some analysis on this and, as Rocco suggested, used Wireshark to
track down the differences. Looking at the dumps, I immediately noticed
two things, one more important than the other.
1. The Python code doesn't start at the first article in the group;
it starts at last-100 (or whatever you set articleCount to in the
script). This little fix solved it, and now we're fetching the same
articles as the Perl one.
Change line 17 to: self.articlesToFetch = range(first,
first+self.factory.articleCount)
2. (the important difference) Looking at the NNTP protocol itself,
I see that the two scripts behave very differently. The Perl one
does it "the right way": it uses the server-side article pointer and
issues the "next" command to move it to the next article id. The Python
one naively increments the article id and tries to fetch it.
By incurring an extra round trip to the server for every article, the
Perl code becomes 2x slower. THIS is the culprit, and any other
differences/factors are insignificant once you realize this. Also, since
the Python code just increments the article id, the server sometimes
returns "423 no such article number in group" to the script. This means
that article isn't downloaded at all - a slight decrease in network
activity :)
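To make the round-trip argument concrete, here is a minimal Python sketch that counts round trips for the two strategies against a toy in-memory group. It is not the POE or Twisted code from the scripts: FakeGroup, fetch_with_pointer and fetch_by_increment are hypothetical names made up for this illustration, and each method call simply stands for one command/response exchange. Only the status codes (220, 223, 421, 423) are the usual NNTP ones.

```python
class FakeGroup:
    """Toy stand-in for an NNTP group (hypothetical, no network involved)."""

    def __init__(self, numbers):
        self.numbers = sorted(numbers)  # article numbers that exist
        self.pointer = 0                # server-side current-article pointer
        self.round_trips = 0            # one per command sent

    def article(self, number=None):
        """ARTICLE: fetch by number, or the current article if number is None."""
        self.round_trips += 1
        if number is None:
            return 220, self.numbers[self.pointer]
        if number in self.numbers:
            return 220, number
        return 423, number              # no such article number in group

    def next(self):
        """NEXT: advance the server-side pointer to the next existing article."""
        self.round_trips += 1
        if self.pointer + 1 >= len(self.numbers):
            return 421, None            # no next article
        self.pointer += 1
        return 223, self.numbers[self.pointer]


def fetch_with_pointer(group, count):
    """Original Perl behaviour: ARTICLE, then NEXT + ARTICLE for each further one."""
    fetched = []
    code, number = group.article()
    while code == 220:
        fetched.append(number)
        if len(fetched) >= count:
            break
        code, _ = group.next()
        if code != 223:
            break
        code, number = group.article()
    return fetched


def fetch_by_increment(group, count):
    """Python (and patched Perl) behaviour: ARTICLE n, ARTICLE n+1, ..."""
    fetched = []
    number = group.numbers[0]
    for _ in range(count):              # a 423 still counts as one attempt
        code, _ = group.article(number)
        if code == 220:
            fetched.append(number)
        number += 1
    return fetched


if __name__ == "__main__":
    existing = [1, 2, 3, 5, 6, 7]       # article 4 is missing
    for strategy in (fetch_with_pointer, fetch_by_increment):
        group = FakeGroup(existing)
        articles = strategy(group, 5)
        print(strategy.__name__, articles, group.round_trips)
```

In this toy model the pointer-based loop needs close to twice as many round trips for the same number of attempts, while the incrementing loop skips the missing article after a 423 instead of moving on to the next one that exists.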
Attached is a patch to the Perl code that makes it behave the same
way as the Python code. IMO it's not the "right" way, but at least this
gives us a fair comparison, ha! Here is the output of both (fixed)
scripts, with 100 articles. Let me know if you have any other
questions :)
a...@satellite:~/Desktop/nntptest$ time perl nntp_fixed.pl
...
real 0m19.627s
user 0m0.588s
sys 0m0.048s
a...@satellite:~/Desktop/nntptest$ time python nntp_fixed.py
news.microsoft.com microsoft.public.access access.mbox
...
real 0m19.589s
user 0m0.420s
sys 0m0.060s
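As a quick sanity check on the numbers above, this small Python sketch works out how much of each run's wall-clock time was actually CPU time (user + sys). The figures are copied from the two `time` outputs; parse_time and cpu_share are hypothetical helper names introduced only for this illustration.

```python
def parse_time(value):
    """Convert a `time`-style value such as '0m19.627s' to seconds."""
    minutes, seconds = value.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Figures copied from the two runs shown above.
runs = {
    "perl nntp_fixed.pl": {"real": "0m19.627s", "user": "0m0.588s", "sys": "0m0.048s"},
    "python nntp_fixed.py": {"real": "0m19.589s", "user": "0m0.420s", "sys": "0m0.060s"},
}


def cpu_share(run):
    """Fraction of wall-clock time spent on CPU (user + sys)."""
    cpu = parse_time(run["user"]) + parse_time(run["sys"])
    return cpu / parse_time(run["real"])


if __name__ == "__main__":
    for name, run in runs.items():
        print(f"{name}: {cpu_share(run):.1%} of wall-clock time on CPU")
```

Both runs spend only a few percent of their time on CPU, so nearly all of the roughly 19.6 seconds is spent waiting on the network. That fits the round-trip explanation rather than any difference in framework overhead.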
howard chen wrote:
I have 2 simple scripts which connect to an NNTP server to fetch 100
articles (no writing to local disk) in order to test the performance
of two frameworks.
I found that Twisted is at least 2 times faster, which surprised me.
I know it is not fair to compare two frameworks by this simple test.
But I think a factor of 2 is quite a lot.
Therefore, I have uploaded the code to see if any experts can find a problem.
Thanks.
http://howachen.googlepages.com/nntp.py
http://howachen.googlepages.com/nntp.pl
time python nntp.py news.microsoft.com microsoft.public.access access.mbox
18 sec
time perl nntp.pl
37 sec
Any comments?
--- nntp.pl 2009-03-26 14:18:11.000000000 +0100
+++ nntp_fixed.pl 2009-03-26 14:46:17.000000000 +0100
@@ -7,6 +7,7 @@
$|=1;
my $count = 0;
+ my $article_id;
my $nntp = POE::Component::Client::NNTP->spawn ( 'NNTP-Client', { NNTPServer => 'news.microsoft.com' } );
POE::Session->create(
@@ -17,7 +18,7 @@
nntp_200 => '_connected',
nntp_201 => '_connected',
},
- 'main' => [ qw(_start nntp_211 nntp_220 nntp_223 nntp_registered)
+ 'main' => [ qw(_start nntp_211 nntp_220 nntp_223 nntp_registered nntp_423)
],
],
);
@@ -47,8 +48,10 @@
sub nntp_211 {
my ($kernel,$heap,$text) = @_[KERNEL,HEAP,ARG0];
+ my @data = split( ' ', $text );
+ $article_id = $data[1];
- $kernel->post( 'NNTP-Client' => 'article' );
+ $kernel->post( 'NNTP-Client' => 'article' => $article_id );
undef;
}
@@ -59,7 +62,17 @@
$count++;
die if $count >= 100;
- $kernel->post( 'NNTP-Client' => 'next' );
+ $kernel->post( 'NNTP-Client' => 'article' => ++$article_id );
+ undef;
+ }
+
+ sub nntp_423 {
+ print "Failed to retrieve $article_id\n";
+
+ $count++;
+ die if $count >= 100;
+
+ $_[KERNEL]->post( 'NNTP-Client' => 'article' => ++$article_id );
undef;
}
--- End Message ---