Hi (Andrzej in particular :)),

The (String) status info sent to the Reporter (Hadoop) is nice to see in logs. One could modify Fetcher2 to simply log that string. This reveals something weird, check this:
2008-04-09 17:15:01,764 INFO fetcher.Fetcher2 - 100 threads, 285 pages, 47 errors, 3.0 pages/s, 828 kb/s,
2008-04-09 17:15:02,732 INFO fetcher.Fetcher2 - 100 threads, 160 pages, 30 errors, 1.7 pages/s, 408 kb/s,
2008-04-09 17:15:02,765 INFO fetcher.Fetcher2 - 100 threads, 285 pages, 47 errors, 3.0 pages/s, 819 kb/s,
2008-04-09 17:15:03,734 INFO fetcher.Fetcher2 - 100 threads, 161 pages, 30 errors, 1.7 pages/s, 406 kb/s,
2008-04-09 17:15:03,767 INFO fetcher.Fetcher2 - 100 threads, 286 pages, 48 errors, 3.0 pages/s, 811 kb/s,
2008-04-09 17:15:04,736 INFO fetcher.Fetcher2 - 100 threads, 162 pages, 30 errors, 1.7 pages/s, 403 kb/s,
2008-04-09 17:15:04,769 INFO fetcher.Fetcher2 - 100 threads, 288 pages, 48 errors, 3.0 pages/s, 808 kb/s,

Notice anything weird above? As if the stats are from 2 different Fetcher2 instances, each increasing independently:

1) 285 pages, 286 pages, 288 pages...
2) 160 pages, 161 pages, 162 pages...

Is this expected? Looks suspicious to me - as if there are 2 Fetcher2 instances running (and there aren't). Shouldn't we see just one series of numbers? I don't see how one could end up with 2 Fetcher2 instances without calling bin/nutch fetch2 .... twice (not the case here and the same behaviour is observed with multiple separate Fetcher2 runs).

Any ideas? Am I missing something?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
