Hi,

Can anyone help me with the following problem? My crawl.log contains lots of messages like those below, but if I test the URLs in my browser they work fine. Is there a regular expression I need to update somewhere? For example, one of the URLs below has a space in it, so I was thinking I might need to change or add a line in crawl-urlfilter.txt.


fetch of http://planetbp.bp.com/general/aptrix/bani.nsf/Content/XXXXPS%5FMB%5F090605%5CXXXXps%5FManagement+Briefing%5F090605 failed with: org.apache.nutch.protocol.http.HttpError: HTTP Error: 400

fetch of http://planetbp.bp.com/general/aptrix/aptrix.nsf/Content/BP websites failed with: org.apache.nutch.protocol.http.HttpError: HTTP Error: 400


fetch of http://planetbp.bp.com/general/aptrix/aptcsops.nsf/Content/GoHi+Services+Home%5CSocial failed with: org.apache.nutch.protocol.http.HttpError: HTTP Error: 400


fetch of http://planetbp.bp.com/general/aptrix/aptppl.nsf/Content/Training+Home%5CBusiness+Tools%5CPatrol+Medical failed with: org.apache.nutch.protocol.http.HttpError: HTTP Error: 500
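
To narrow this down, here is a rough Python sketch that flags which of the failing URLs contain a raw space or an encoded backslash (%5C). It is only a guess that these characters are what the server rejects; a browser normally sends a space as %20, which might explain why the same URLs work there. The helper names `suspicious_parts` and `escape_spaces` are my own and are not part of Nutch.

```python
import re

# URLs copied from the crawl.log entries above.
urls = [
    "http://planetbp.bp.com/general/aptrix/bani.nsf/Content/XXXXPS%5FMB%5F090605%5CXXXXps%5FManagement+Briefing%5F090605",
    "http://planetbp.bp.com/general/aptrix/aptrix.nsf/Content/BP websites",
    "http://planetbp.bp.com/general/aptrix/aptcsops.nsf/Content/GoHi+Services+Home%5CSocial",
    "http://planetbp.bp.com/general/aptrix/aptppl.nsf/Content/Training+Home%5CBusiness+Tools%5CPatrol+Medical",
]


def suspicious_parts(url):
    """Return reasons why this URL might be rejected (assumed causes only)."""
    reasons = []
    if " " in url:
        reasons.append("raw space")
    if re.search(r"%5C", url, re.IGNORECASE):
        reasons.append("encoded backslash (%5C)")
    return reasons


def escape_spaces(url):
    """Replace raw spaces with %20, as a browser would before sending."""
    return url.replace(" ", "%20")


if __name__ == "__main__":
    for url in urls:
        print(suspicious_parts(url), escape_spaces(url))
```

Every failing URL above is flagged by one of the two checks, so those characters seem like a reasonable place to start looking.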

