On 21.12.2005 at 01:35, Teruhiko Kurosaka wrote:
> Is there a way to configure nutch to show the
> pages that have broken links?
Well, you could hack Nutch to do this, but it makes little sense;
there are other tools better suited for that.
> In the default setting, the crawl log lists the URL
> that is being fetched and fails, but it does not
> tell me which page contains that broken link.
> By the way, what does
> "org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry
> later"
> mean?
The page was not fetched, e.g. because you fetch only one host with
many threads but have only one thread per host configured, so the
waiting threads exceed the allowed number of delays and give up with
a retry-later status.
> I increased the http.max.delays parameter to 100 but I still see
> this.
> How large does the value need to be for a medium-sized intranet web
> site for a small company? Is the error saying that I should rerun
> the crawler, or is it simply informing me that it will try again in
> the same session?
If you fetch only one host, it is a good idea to set
fetcher.threads.per.host and fetcher.threads.fetch to identical values.
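A minimal sketch of the relevant overrides in conf/nutch-site.xml,
assuming a single-host intranet crawl; the property names come from
nutch-default.xml, but the values below are only an example and
should be tuned to your site:

  <!-- total number of fetcher threads -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>10</value>
  </property>

  <!-- allow all of those threads to hit the same host -->
  <property>
    <name>fetcher.threads.per.host</name>
    <value>10</value>
  </property>

  <!-- how often a thread may be delayed before giving up with RetryLater -->
  <property>
    <name>http.max.delays</name>
    <value>100</value>
  </property>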
HTH
Stefan