Nutch Fetch - HttpException : Connect Exception : Invalid Argument

Jon Shoberg Tue, 19 Jul 2005 19:24:13 -0700

When following the whole-web crawling strategy outlined in the tutorial, the following error is occurring. I'd say probably 50% of the output from the fetch is this error. Has anyone else seen this? There are a few thousand URLs loaded via nutch inject. I can understand possibly getting a few errors, but when I hand-check the URLs for which this happens, they respond fine.


I checked the URL file list and there are no extraneous characters.
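
For anyone wanting to repeat that check mechanically, here is a minimal sketch. It assumes the URL list is a plain text file with one URL per line (the default name "urls" matches the file passed to nutch inject in the script below). The function name check_urls is made up for illustration. It prints any line containing control characters (such as a carriage return from a DOS-format file) or leading/trailing whitespace, prefixed with its line number.

```shell
#!/bin/bash
# check_urls is a hypothetical helper, not part of Nutch.
# It lists lines of a URL file that contain control characters
# (carriage returns, tabs, ...) or leading/trailing whitespace.
check_urls() {
    local file="${1:-urls}"   # assumed default: the file given to "nutch inject"
    # LC_ALL=C makes the character classes operate on raw bytes.
    # grep exits non-zero when no suspicious line is found.
    LC_ALL=C grep -n -E '[[:cntrl:]]|^[[:space:]]|[[:space:]]$' "$file"
}
```

Calling check_urls urls prints nothing and returns a non-zero status if the file is clean.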


Error: (example.com is not the real URL)

050719 221355 fetch of http://example.com/ failed with: net.nutch.protocol.http.HttpException: java.net.ConnectException: Invalid argument
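
To put a number on how often this happens, the failure lines can be tallied by exception message. This is only a sketch: it assumes the fetch output has been saved to a file (the name fetch.log is hypothetical) and that failures appear in the "failed with:" form shown above. The function name summarize_failures is also made up for illustration.

```shell
#!/bin/bash
# summarize_failures is a hypothetical helper, not part of Nutch.
# It counts fetch failures per distinct exception message.
summarize_failures() {
    local log="${1:-fetch.log}"   # assumed: saved output of "bin/nutch fetch"
    grep 'failed with:' "$log" |
        sed 's/.*failed with:[[:space:]]*//' |   # keep only the exception text
        sort | uniq -c | sort -rn                # most frequent message first
}
```

Running summarize_failures fetch.log prints one line per distinct message, each preceded by its count.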


The Script:

#!/bin/bash
rm -rf db
rm -rf segments
mkdir db
mkdir segments
bin/nutch admin db -create
bin/nutch inject db -urlfile urls
bin/nutch generate db segments
s=`ls -d segments/2* | tail -1`
echo Segment is $s
bin/nutch fetch $s   # <-- the errors occur at this step
bin/nutch updatedb db $s
bin/nutch analyze db 5
bin/nutch index $s
