forwarding for apocalypse
--- Begin Message ---
Hello,

I have done some analysis on this and used Wireshark to track down the differences, as Rocco suggested. Looking at the dumps, I immediately noticed two things, one more important than the other.

1. The Python code doesn't start from the first article in the group; it starts at last - 100 (or whatever articleCount is set to in the script). This small fix makes it fetch the same articles as the Perl code:

Change line 17 to: self.articlesToFetch = range(first, first+self.factory.articleCount)
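For reference, the server answers a GROUP command with a line of the form `211 count first last group` (RFC 3977). The sketch below is a standalone illustration and not the code from nntp.py. The names `GroupInfo`, `parse_group_response` and `articles_to_fetch` are hypothetical. The cap at `last` is an extra assumption that the one-line fix above does not make.

```python
from typing import NamedTuple


class GroupInfo(NamedTuple):
    count: int  # estimated number of articles in the group
    first: int  # lowest article number
    last: int   # highest article number
    name: str


def parse_group_response(line: str) -> GroupInfo:
    """Parse an NNTP '211 count first last group' reply to GROUP."""
    code, count, first, last, name = line.split()[:5]
    if code != "211":
        raise ValueError(f"unexpected GROUP reply: {line!r}")
    return GroupInfo(int(count), int(first), int(last), name)


def articles_to_fetch(info: GroupInfo, article_count: int) -> range:
    # Start at the first article, as in the fix; the min() cap is an added
    # assumption so we never ask for numbers past 'last' in a small group.
    return range(info.first, min(info.first + article_count, info.last + 1))


# Hypothetical reply values, for illustration only.
info = parse_group_response("211 5000 1000 5999 microsoft.public.access")
ids = articles_to_fetch(info, 100)
```

With the made-up reply above, `ids` covers article numbers 1000 through 1099.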

2. (The important difference.) Looking at the NNTP traffic itself, the two scripts behave very differently. The Perl one does it "the right way": it uses the server-side current-article pointer, issuing the NEXT command to advance it to the next article and then ARTICLE to fetch it. The Python one naively increments the article number and requests that number directly.

Because it incurs an extra round trip to the server for every article, the Perl code is about 2x slower. THIS is the culprit; any other differences are insignificant next to it. Also, since the Python code just increments the article number, the server sometimes replies "423 no such article number in group". That article is then not downloaded at all, which means slightly less network activity :)
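To make the round-trip argument concrete, here is a small Python simulation. It does no networking, and `FakeServer`, `fetch_with_next` and `fetch_by_number` are hypothetical names. It only counts commands sent, assuming each command costs one round trip.

- The NEXT-based loop needs NEXT plus ARTICLE for every article after the first.
- The increment-based loop needs one ARTICLE per number, but it silently skips gaps that answer 423.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeServer:
    """Hypothetical in-memory NNTP group; it only counts commands."""
    articles: list[int]  # sorted article numbers that exist in the group
    round_trips: int = 0
    pointer: int = 0     # index of the server-side current article

    def article(self, number: Optional[int] = None):
        self.round_trips += 1
        if number is None:  # ARTICLE with no argument: fetch current article
            return 220, self.articles[self.pointer]
        if number in self.articles:
            # ARTICLE <n> also makes n the current article (RFC 3977).
            self.pointer = self.articles.index(number)
            return 220, number
        return 423, None    # no such article number in group

    def next(self):
        self.round_trips += 1
        if self.pointer + 1 >= len(self.articles):
            return 421, None  # no next article in this group
        self.pointer += 1
        return 223, self.articles[self.pointer]


def fetch_with_next(server: FakeServer, n: int) -> list[int]:
    """Original Perl style: ARTICLE, then NEXT + ARTICLE for each further one."""
    fetched = []
    code, number = server.article()  # current article right after GROUP
    if code == 220:
        fetched.append(number)
    while len(fetched) < n:
        code, _ = server.next()
        if code != 223:
            break
        code, number = server.article()
        fetched.append(number)
    return fetched


def fetch_by_number(server: FakeServer, first: int, n: int) -> list[int]:
    """Python style (and patched Perl): one ARTICLE <id> per consecutive id.
    A 423 still uses up one of the n attempts, like $count++ in nntp_423."""
    fetched = []
    for article_id in range(first, first + n):
        code, number = server.article(article_id)
        if code == 220:
            fetched.append(number)
    return fetched


group = [1, 2, 3, 5, 6, 8, 9, 10, 11, 12]  # made-up numbers with gaps at 4 and 7
s1, s2 = FakeServer(list(group)), FakeServer(list(group))
via_next = fetch_with_next(s1, 5)
via_number = fetch_by_number(s2, 1, 5)
```

In this toy run, the NEXT loop sends 2n - 1 commands for n articles (9 for 5). The increment loop sends exactly n (5) but gets only 4 articles because of the gap at 4. When latency dominates, that ratio roughly matches the 2x gap reported in this thread.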

Attached is a patch that makes the Perl code behave the same way as the Python code. IMO it's not the "right" way, but at least it gives us a fair comparison, ha! Here is the output of both fixed scripts with 100 articles. Let me know if you have any other questions :)

a...@satellite:~/Desktop/nntptest$ time perl nntp_fixed.pl
...
real    0m19.627s
user    0m0.588s
sys    0m0.048s

a...@satellite:~/Desktop/nntptest$ time python nntp_fixed.py news.microsoft.com microsoft.public.access access.mbox
...
real    0m19.589s
user    0m0.420s
sys    0m0.060s

howard chen wrote:
I have 2 simple scripts which connect to an NNTP server to fetch 100
articles (without writing to local disk) in order to test the performance
of two frameworks (Twisted and POE).

I found that Twisted is at least 2 times faster, which surprised me.

I know it is not fair to compare two frameworks with this simple test,
but I think 2 times is quite a lot.

So I have uploaded the code to see if any experts can find a problem.


Thanks.

http://howachen.googlepages.com/nntp.py
http://howachen.googlepages.com/nntp.pl


time python nntp.py news.microsoft.com microsoft.public.access access.mbox

18 sec

time perl nntp.pl

37 sec


Any comments?
--- nntp.pl	2009-03-26 14:18:11.000000000 +0100
+++ nntp_fixed.pl	2009-03-26 14:46:17.000000000 +0100
@@ -7,6 +7,7 @@
    $|=1;
 
    my $count = 0;
+   my $article_id;
    my $nntp = POE::Component::Client::NNTP->spawn ( 'NNTP-Client', { NNTPServer => 'news.microsoft.com' } );
 
    POE::Session->create(
@@ -17,7 +18,7 @@
                             nntp_200          => '_connected',
                             nntp_201          => '_connected',
                 },
-                'main' => [ qw(_start nntp_211 nntp_220 nntp_223 nntp_registered)
+                'main' => [ qw(_start nntp_211 nntp_220 nntp_223 nntp_registered nntp_423)
                 ],
         ],
    );
@@ -47,8 +48,10 @@
 
    sub nntp_211 {
         my ($kernel,$heap,$text) = @_[KERNEL,HEAP,ARG0];
+	my @data = split( ' ', $text );
+	$article_id = $data[1];
 
-        $kernel->post( 'NNTP-Client' => 'article' );
+        $kernel->post( 'NNTP-Client' => 'article' => $article_id );
         undef;
    }
 
@@ -59,7 +62,17 @@
         $count++;
         die if $count >= 100;
 
-        $kernel->post( 'NNTP-Client' => 'next' );
+        $kernel->post( 'NNTP-Client' => 'article' => ++$article_id );
+        undef;
+   }
+
+   sub nntp_423 {
+	print "Failed to retrieve $article_id\n";
+
+	$count++;
+        die if $count >= 100;
+
+        $_[KERNEL]->post( 'NNTP-Client' => 'article' => ++$article_id );
         undef;
    }
 

--- End Message ---