Hi John,

Thanks a lot for your answer. The fetch job turned out to be my mistake, but I still can't understand the situation with the parse job (and with the index job as well). We successfully fetched 1882487 documents, but parse reports only 258629 successes + 562 failures, and the MapReduce job took only 713961 input records. I suspect it could be some problem with Hadoop itself, but I have no idea what to check or how. Could you please give a bit more information if you have any assumptions?
BR Sergey Bolshakov

>Hi Ai,
>
>On Wed, May 20, 2015 at 1:03 AM, < [email protected] > wrote:
>
>>
>> I hope someone can give me advice: i run nutch over last version of
>> cloudera, i have 4 servers. I tried to crawl start pages and all links
>> from it (with same domain). I uploaded about 5 mln domains and see the next
>>
>
>[SNIP]
>
>>
>> nutch fetch 1432017717-23908 - fine, but already we got 4881050 instead
>> of 4881110
>>
>> Map-Reduce Framework
>>   Map input records=4881050
>>   Map output records=4881050
>>
>
>Please have a look at the Fetch custom Hadoop counters that a number of us
>dev's have been adding over previous development cycles
>
>  FetcherStatus
>    ACCESS_DENIED=4846
>    EXCEPTION=1944714
>    GONE=1293
>    HitByTimeLimit-QueueFeeder=0
>    MOVED=314310
>    NOTFOUND=30906
>    NOTMODIFIED=1
>    SUCCESS=1882487
>    TEMP_MOVED=150753
>
>>
>> ---------------------------
>>
>> nutch parse 1432017717-23908
>>
>> Map-Reduce Framework
>>   Map input records=713961
>>   Map output records=702082
>>
>> We took only 713961 records, why? I can't uderstand
>>
>
>Again please see the custom Counters
>
>  ParserStatus
>    failed=562
>    success=258629
>
>Thanks for pasting the entire LOG. It really helps when we have TRACE level
>logging for this type of debugging.
>Thanks
>Lewis
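For what it's worth, the FetcherStatus counters pasted above can be tallied to see where the documents went — a minimal sketch of that arithmetic, assuming (this is my assumption, not something the log states) that only SUCCESS fetches produce content the parse job could consume:

```python
# Counter names and values are copied verbatim from the FetcherStatus
# block in the log excerpt above.
fetcher_status = {
    "ACCESS_DENIED": 4846,
    "EXCEPTION": 1944714,
    "GONE": 1293,
    "HitByTimeLimit-QueueFeeder": 0,
    "MOVED": 314310,
    "NOTFOUND": 30906,
    "NOTMODIFIED": 1,
    "SUCCESS": 1882487,
    "TEMP_MOVED": 150753,
}

total_outcomes = sum(fetcher_status.values())
non_success = total_outcomes - fetcher_status["SUCCESS"]

print(f"total fetch outcomes          : {total_outcomes}")
print(f"successful fetches (SUCCESS)  : {fetcher_status['SUCCESS']}")
print(f"failed / redirected / skipped : {non_success}")

# The parse job's reported Map input (713961) is far below SUCCESS
# (1882487); that gap is exactly the discrepancy being asked about here,
# and the numbers alone don't explain it.
parse_map_input = 713961
print(f"SUCCESS minus parse map input : {fetcher_status['SUCCESS'] - parse_map_input}")
```

So the large EXCEPTION count explains the drop from 4881050 fetch inputs to 1882487 successes, but not the further drop to 713961 parse inputs.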

