The output format is <hostCount,hostName>.
The FETCHED and NOT_FETCHED entries (CrawlDatum statuses) may appear at any
position in the text file, because the output is not sorted.
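
If it helps, here is a minimal Python sketch for separating the two total
rows from the per-host rows and sorting the hosts yourself. It assumes each
line is a count followed by whitespace and a name, as in the sample output
quoted below, and that FETCHED and NOT_FETCHED are the only special rows.
The function name parse_domainstats is made up for this example and is not
part of Nutch.

```python
# Sketch: split domainstats text output into status totals and per-host counts.
# Assumption: each line is "<count><whitespace><name>", as in the sample output.
STATUS_KEYS = {"FETCHED", "NOT_FETCHED"}  # assumed names of the total rows


def parse_domainstats(lines):
    """Return (totals, hosts) dicts mapping name -> count."""
    totals, hosts = {}, {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        count, name = int(parts[0]), parts[1]
        # Status rows go to totals, everything else is treated as a host.
        (totals if name in STATUS_KEYS else hosts)[name] = count
    return totals, hosts


# Sample data copied from the original message.
sample = """\
469       ttt.in.ua
12         aaa.com.ua
210661  FETCHED
427773  NOT_FETCHED
4238     aaaa.ru
1          all4vvvv.com.ua
17844   amtist.ru
4092     aptrrr.ru
""".splitlines()

totals, hosts = parse_domainstats(sample)
print(totals)
# Print hosts from largest to smallest count.
for name, count in sorted(hosts.items(), key=lambda kv: kv[1], reverse=True):
    print(count, name)
```

To run it on the real output, pass the lines of the text file written under
the output directory instead of the sample list.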

On Mon, Aug 20, 2012 at 7:56 PM, Markus Jelsma
<[email protected]> wrote:

> Those counts are the sum of the fetched pages for that host. 210661 are
> fetched in total and 427773 are unfetched.
>
>
> -----Original message-----
> > From:Alexei Korolev <[email protected]>
> > Sent: Mon 20-Aug-2012 13:38
> > To: [email protected]
> > Subject: what's mean this values?
> >
> > Hello,
> >
> > I tried to google it, but without luck. I ran this command:
> >
> > nutch domainstats crawl/crawldb/current temp host
> >
> > and then got the following output:
> >
> > 469       ttt.in.ua
> > 12         aaa.com.ua
> > 210661  FETCHED
> > 427773  NOT_FETCHED
> > 4238     aaaa.ru
> > 1          all4vvvv.com.ua
> > 17844   amtist.ru
> > 4092     aptrrr.ru
> >
> > Could anybody explain to me what these values mean? And why do I have
> > FETCHED and NOT_FETCHED in the middle of this list?
> >
> > Thanks.
> >
> > --
> > Alexei A. Korolev
> >
>



-- 
Don't Grow Old, Grow Up... :-)
