Duh, 1 Fetcher2 instance, 2 maps!
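With one Fetcher2 instance but two map tasks, each map task reports its own counters, so two independent series end up interleaved in the same log. One way to make that obvious is to prefix each status string with an identifier for the task that produced it. The following is a minimal sketch of that idea under stated assumptions: `TaggedStatus`, `TaskStats` and the `map-0`/`map-1` IDs are hypothetical (not Nutch or Hadoop names), and the format only loosely mirrors Fetcher2's status line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch only: TaggedStatus, TaskStats and the task IDs are hypothetical,
// not Nutch or Hadoop classes.
public class TaggedStatus {

    // Per-map-task counters: each task advances its own numbers.
    static final class TaskStats {
        final String taskId;
        int pages;
        int errors;

        TaskStats(String taskId) {
            this.taskId = taskId;
        }

        // Builds the status string with the task ID in front, so lines from
        // different map tasks sharing one log can be separated.
        String status(int threads) {
            return String.format(Locale.ROOT, "[%s] %d threads, %d pages, %d errors",
                    taskId, threads, pages, errors);
        }
    }

    public static void main(String[] args) {
        TaskStats map0 = new TaskStats("map-0");
        TaskStats map1 = new TaskStats("map-1");
        // Starting values taken from the first two lines of the quoted log.
        map0.pages = 285;
        map0.errors = 47;
        map1.pages = 160;
        map1.errors = 30;

        List<String> log = new ArrayList<>();
        // Both tasks write to one log, so their reports interleave.
        for (int tick = 0; tick < 3; tick++) {
            log.add(map0.status(100));
            log.add(map1.status(100));
            map0.pages += 1;
            map1.pages += 1;
        }
        log.forEach(System.out::println);
    }
}
```

With the tag in place, grepping for `[map-0]` or `[map-1]` yields one steadily increasing series per task instead of two series that look like competing Fetcher2 instances.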

----- Original Message ----
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: Nutch User List <[email protected]>
Sent: Wednesday, April 9, 2008 5:49:03 PM
Subject: Weirdness: 2 Fetcher2 instances?

Hi (Andrzej in particular :)),

The (String) status info sent to the Reporter (Hadoop) is nice to see in logs.
One could modify Fetcher2 to simply log that string.  This reveals something
weird; check this:

2008-04-09 17:15:01,764 INFO  fetcher.Fetcher2 - 100 threads, 285 pages, 47 errors, 3.0 pages/s, 828 kb/s,
2008-04-09 17:15:02,732 INFO  fetcher.Fetcher2 - 100 threads, 160 pages, 30 errors, 1.7 pages/s, 408 kb/s,
2008-04-09 17:15:02,765 INFO  fetcher.Fetcher2 - 100 threads, 285 pages, 47 errors, 3.0 pages/s, 819 kb/s,
2008-04-09 17:15:03,734 INFO  fetcher.Fetcher2 - 100 threads, 161 pages, 30 errors, 1.7 pages/s, 406 kb/s,
2008-04-09 17:15:03,767 INFO  fetcher.Fetcher2 - 100 threads, 286 pages, 48 errors, 3.0 pages/s, 811 kb/s,
2008-04-09 17:15:04,736 INFO  fetcher.Fetcher2 - 100 threads, 162 pages, 30 errors, 1.7 pages/s, 403 kb/s,
2008-04-09 17:15:04,769 INFO  fetcher.Fetcher2 - 100 threads, 288 pages, 48 errors, 3.0 pages/s, 808 kb/s,


Notice anything weird above?  As if the stats are from 2 different Fetcher2 
instances, each increasing independently:
1) 285 pages, 286 pages, 288 pages...
2) 160 pages, 161 pages, 162 pages...

Is this expected?  Looks suspicious to me - as if there are 2 Fetcher2 
instances running (and there aren't).  Shouldn't we see just one series of 
numbers?

I don't see how one could end up with 2 Fetcher2 instances without calling 
bin/nutch fetch2 .... twice (not the case here and the same behaviour is 
observed with multiple separate Fetcher2 runs).

Any ideas?  Am I missing something?

Thanks,
Otis 

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch