One of the exceptions that I see a lot in my log files is an "Invalid
uri" exception like this:
2012-02-13 15:05:50,217 ERROR org.apache.nutch.protocol.httpclient.Http:
java.lang.IllegalArgumentException: Invalid uri
'http://www.prolitegear.com/site/xdpy/ssg/Shelters/Shelter
Accessories.html': escaped absolute path not valid
2012-02-13 15:05:50,217 ERROR org.apache.nutch.protocol.httpclient.Http:
java.lang.IllegalArgumentException: Invalid uri
'http://www.prolitegear.com/site/xdpy/ssg/Shelters/Shelter
Accessories.html': escaped absolute path not valid
2012-02-13 15:05:50,226 ERROR org.apache.nutch.protocol.httpclient.Http:
java.lang.IllegalArgumentException: Invalid uri
'http://www.prolitegear.com/activity/Adventure Racing/index.html':
escaped absolute path not valid
(There is a space between "Shelter" and "Accessories".) At first I
thought this was caused by the space in the link, but these addresses
from the same site go through with no problem:
2012-02-13 15:05:50,114 INFO org.apache.nutch.fetcher.Fetcher: fetching
http://www.prolitegear.com/site/xdpy/ssg/Shelters/Shelter Accessories.html
2012-02-13 15:05:50,105 INFO org.apache.nutch.fetcher.Fetcher: fetching
http://www.prolitegear.com/site/xdpy/ssg/Accessories/Sun Protection.html
2012-02-13 15:05:50,149 INFO org.apache.nutch.fetcher.Fetcher: fetching
http://www.prolitegear.com/site/xdpy/ssg/Bargains & Closeouts/Sleeping
Bags: 0° to 20° F.html
2012-02-13 15:05:50,100 INFO org.apache.nutch.fetcher.Fetcher: fetching
http://www.prolitegear.com/site/xdpy/ssg/Climbing Gear/Protection.html
Does anybody have any idea what might be wrong here? (I am using
protocol-httpclient, and all the links are actually valid: they work if
you copy and paste them into a browser.)
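
One way to test the space theory outside Nutch is to percent-encode the
path and see whether the escaped form is accepted. The following is only
a minimal sketch, assuming the unescaped space is what the client
rejects; that is not confirmed by the logs above. The class name
UriEscapeCheck and the escape helper are hypothetical names used for
illustration. Only java.net.URI comes from the JDK.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriEscapeCheck {

    // Hypothetical helper: builds a URI from its parts. The multi-argument
    // java.net.URI constructor percent-encodes characters that are illegal
    // in a path, such as a space.
    static String escape(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI for path: " + path, e);
        }
    }

    public static void main(String[] args) {
        // Path taken from the third log entry above.
        String escaped = escape("http", "www.prolitegear.com",
                "/activity/Adventure Racing/index.html");
        // prints http://www.prolitegear.com/activity/Adventure%20Racing/index.html
        System.out.println(escaped);
    }
}
```

If the escaped URL fetches cleanly while the raw one fails, the space is
the likely cause. If both fail, the problem is somewhere else.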
--
Kaveh Minooie
www.plutoz.com