[Nutch-general] Re: Crawl status

TDLN Wed, 05 Apr 2006 02:34:05 -0700

> How can I get the status of the crawl process?

In general this should be apparent from the crawl log.


- number of fetched pages >> is printed to the logs periodically
(along with the number of pages/sec, etc.)
- number of indexed pages >> if you use the crawl tool, indexing is
done only after all pages have been fetched, so while fetching is
still going on, the answer is 0
- current depth of crawl >> equals the number of directories in the
segments folder.
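The depth check in the last point can be scripted. The Python sketch below counts subdirectories in a segments folder. It assumes that each generate/fetch round of the Crawl tool writes exactly one segment directory and that nothing has been merged or deleted since. The path and the function name `crawl_depth` are hypothetical, so point it at wherever your crawl directory actually lives.

```python
import os
import tempfile


def crawl_depth(segments_dir: str) -> int:
    """Estimate the current crawl depth by counting segment directories.

    Assumption: one subdirectory per generate/fetch round under segments_dir
    (hypothetical location, e.g. "crawl/segments"); plain files are ignored.
    """
    if not os.path.isdir(segments_dir):
        return 0  # no segments folder yet, so no round has started
    return sum(
        1
        for name in os.listdir(segments_dir)
        if os.path.isdir(os.path.join(segments_dir, name))
    )


if __name__ == "__main__":
    # Demo on a throwaway tree standing in for a real crawl directory;
    # the timestamp-like names are made up for illustration.
    with tempfile.TemporaryDirectory() as root:
        segments = os.path.join(root, "segments")
        for ts in ("20060401120000", "20060402083000", "20060403101500"):
            os.makedirs(os.path.join(segments, ts))
        print(crawl_depth(segments))  # 3
```

Counting only directories keeps stray files in the folder from inflating the depth.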

Of course it depends on the number of domains you're crawling, the
speed of your internet connection and the hardware used, but in my
experience one week for depth 10 with the Crawl tool is not unusual.
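As rough arithmetic, one week for depth 10 averages 7 × 24 / 10 = 16.8 hours per round. The Python sketch below turns that into a naive remaining-time estimate by linear extrapolation from the rounds completed so far. The function names are hypothetical. The constant-time-per-round assumption is ours, not a Nutch guarantee: later rounds often fetch more URLs than early ones, and indexing runs after the last round, so treat the result as optimistic.

```python
def average_hours_per_round(total_days: float, depth: int) -> float:
    """Average wall-clock hours per generate/fetch round."""
    return total_days * 24 / depth


def naive_remaining_hours(elapsed_hours: float, rounds_done: int,
                          target_depth: int) -> float:
    """Linear extrapolation: assume each remaining round takes as long as
    the average completed round (an assumption, likely optimistic)."""
    if rounds_done <= 0:
        raise ValueError("need at least one completed round to extrapolate")
    per_round = elapsed_hours / rounds_done
    return per_round * (target_depth - rounds_done)


if __name__ == "__main__":
    print(average_hours_per_round(7, 10))    # 16.8
    print(naive_remaining_hours(48, 3, 10))  # 112.0
```

For example, after 48 hours and 3 finished rounds of a depth-10 crawl, the naive estimate is another 112 hours of fetching.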

HTH,   Thomas


> number of fetched/indexed pages, current depth of
> crawl, percentage of tasks realised...and any other useful information).
>
> "bin/nutch readdb -stats" gives me some hints, but I have some
> difficulty interpreting them and they do not change very often
>
> Thanks for this great list,
>
> Fabrice
>


_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
