From: src/java/net/nutch/fetcher/Fetcher.java
Any suggestions on where to look for the logged stack trace that goes
with the message below? I must be missing something small here (perhaps
lack of coffee). "LOG.info" by default displays to stdout. Where
does/can "LOG.log" write to?
  private void logError(String url, FetchListEntry fle, Throwable t) {
    LOG.info("fetch of " + url + " failed with: " + t);
    LOG.log(Level.FINE, "stack", t);            // stack trace
    synchronized (Fetcher.this) {               // record failure
      errors++;
    }
  }
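For what it's worth, here is a minimal sketch of how level filtering works in java.util.logging, assuming LOG is a plain java.util.logging.Logger (the Level.FINE call suggests so, but I have not confirmed how Nutch configures it). A FINE record has to pass both the logger's level and the handler's level, and both default to INFO, so the "stack" record is normally dropped rather than written somewhere else. The class name, logger names, and the capture() helper are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.net.ConnectException;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

public class FineLoggingSketch {

    // Logs one INFO record and one FINE record (carrying a Throwable),
    // then returns whatever the handler actually wrote.
    public static String capture(Level loggerLevel, Level handlerLevel) {
        // Hypothetical logger name; a separate logger per level combination.
        Logger log = Logger.getLogger("sketch." + loggerLevel + "." + handlerLevel);
        log.setUseParentHandlers(false);          // keep output away from the root console handler

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        StreamHandler handler = new StreamHandler(out, new SimpleFormatter());
        handler.setLevel(handlerLevel);           // second filter: the handler
        log.addHandler(handler);
        log.setLevel(loggerLevel);                // first filter: the logger

        Throwable t = new ConnectException("Invalid argument");
        log.info("fetch failed with: " + t);      // same pattern as logError above
        log.log(Level.FINE, "stack", t);          // only written if both levels allow FINE

        handler.flush();
        log.removeHandler(handler);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("INFO/INFO has stack: "
            + capture(Level.INFO, Level.INFO).contains("stack"));
        System.out.println("FINE/INFO has stack: "
            + capture(Level.FINE, Level.INFO).contains("stack"));
        System.out.println("FINE/FINE has stack: "
            + capture(Level.FINE, Level.FINE).contains("stack"));
    }
}
```

Only the last combination should include the "stack" record and its trace. In a real setup the same two settings would normally go in a logging.properties file (for example `.level = FINE` and `java.util.logging.ConsoleHandler.level = FINE`) passed via `-Djava.util.logging.config.file`; whether the bin/nutch script forwards that property is something I have not checked.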
When following the whole-web crawling strategy outlined in the tutorial,
I am getting the error below. I'd say roughly 50% of the output from
the fetch is this error. Has anyone else seen this? A few thousand URLs
were loaded via nutch inject. I could understand getting a few errors,
but when I check the failing URLs by hand, they respond fine.
I also checked the URL list file, and there are no extraneous characters.
Error: (example.com is not the real URL)
050719 221355 fetch of http://example.com/ failed with:
net.nutch.protocol.http.HttpException: java.net.ConnectException:
Invalid argument
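One way to narrow this down would be to try a bare TCP connect from the same machine and JVM, outside Nutch, and see whether the same ConnectException shows up. The sketch below does only that; the class name, the probe() helper, port 80, and the 5 second timeout are my own assumptions, not anything taken from the fetcher.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectProbe {

    // Attempts a plain TCP connect; returns "ok" or the exception text.
    public static String probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return "ok";
        } catch (IOException e) {
            // ConnectException, UnknownHostException and timeouts all land here.
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Pass the failing hosts as arguments; example.com is just a placeholder.
        String[] hosts = args.length > 0 ? args : new String[] {"example.com"};
        for (String host : hosts) {
            System.out.println(host + ": " + probe(host, 80, 5000));
        }
    }
}
```

If this also reports "Invalid argument" for hosts that respond fine from a browser, the problem is probably in the JVM or network setup rather than in the URL list.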
The Script:
#!/bin/bash
rm -rf db
rm -rf segments
mkdir db
mkdir segments
bin/nutch admin db -create
bin/nutch inject db -urlfile urls
bin/nutch generate db segments
s=`ls -d segments/2* | tail -1`
echo Segment is $s
bin/nutch fetch $s    # <-- ERROR ERROR ERROR
bin/nutch updatedb db $s
bin/nutch analyze db 5
bin/nutch index $s