Have a look at the HTTP protocol plugin.
HTH
Stefan
On 18.06.2005 at 08:32, J S wrote:
Hi,
I'm crawling the Intranet at work which runs on a Lotus Domino
server. When you go to some URLs on the Intranet, Domino returns a
code 400, then appends ?OpenDocument on the end of the URL, and the
GET on this comes back with a code 200.
The problem is obviously with Domino, but I don't think it's
something I can fix easily (as it's not in my department). Nutch
correctly thinks the URL doesn't work so misses it out. However, I
wondered if I could tailor the code in Nutch to say (in pseudo code):
if (rc == 400) {
    try {
        URL = URL + "?OpenDocument";
        getURL(URL);
    }
}
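A minimal Java sketch of the pseudo code above. This is not Nutch code: the class name DominoRetry, the method resolve and the statusOf callback are hypothetical names used only for illustration, and the callback stands in for whatever actually performs the GET and reports the status code. The check for an existing query string is an added assumption, so that "?OpenDocument" is never appended twice.

```java
import java.util.function.ToIntFunction;

final class DominoRetry {
    // Suffix Domino appends before the second, successful GET.
    static final String SUFFIX = "?OpenDocument";

    /**
     * Returns the URL that answered with 200: the original URL, or the
     * original URL plus "?OpenDocument" when the first attempt gave 400
     * and the retry gave 200. Returns null if neither attempt succeeded.
     *
     * statusOf is a hypothetical callback mapping a URL to its HTTP status.
     */
    static String resolve(String url, ToIntFunction<String> statusOf) {
        int code = statusOf.applyAsInt(url);
        if (code == 200) {
            return url;
        }
        // Assumption: only retry URLs that carry no query string yet.
        if (code == 400 && !url.contains("?")) {
            String retry = url + SUFFIX;
            if (statusOf.applyAsInt(retry) == 200) {
                return retry;
            }
        }
        return null;
    }
}
```

Passing the status lookup in as a callback keeps the retry rule separate from the HTTP client, so the same rule could be tried out with a fake server response before touching the crawler itself.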
Could anyone point me to the relevant Java file which I would need
to update to achieve this?
Many thanks,
JS.
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net