> How can I have the status of the crawl process?

In general, this should be apparent from the crawl log.
- number of fetched pages >> printed to the log at certain intervals (along with the number of pages/sec, etc.)
- number of indexed pages >> if you use the Crawl tool, indexing is done only after all pages have been fetched, so while fetching is still going on the answer is 0
- current depth of crawl >> equals the number of directories in the segments folder

How long a crawl takes depends of course on the number of domains you're crawling, the speed of your internet connection and the hardware used, but in my own experience one week for depth 10 with the Crawl tool is not unusual.

HTH,
Thomas

> number of fetched/indexed pages, current depth of
> crawl, percentage of tasks realised...and any other useful information).
>
> "bin/nutch readdb -stats" gives me some tips but I have some
> difficulties to interpret them and they do not change very often
>
> Thanks for this great list,
>
> Fabrice
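
For what it's worth, the depth check from the reply (counting the directories in the segments folder) could be scripted roughly as below. This is only a sketch: the function name `count_segments` and the example path `crawl/segments` are assumptions for illustration, so point it at wherever your own crawl directory keeps its segments.

```python
import os


def count_segments(segments_dir):
    """Count the subdirectories directly under segments_dir.

    Per the reply above, this number corresponds to the current
    depth of a crawl run with the Crawl tool. Plain files are ignored.
    """
    return sum(1 for entry in os.scandir(segments_dir) if entry.is_dir())


# Hypothetical usage; "crawl/segments" is an assumed location:
#   print(count_segments("crawl/segments"))
```

Running it periodically while the crawl is going gives a rough progress indicator, since the count only goes up when a new round starts.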
