Have a look at the HTTP protocol plugin.
HTH
Stefan
On 18.06.2005 at 08:32, J S wrote:

Hi,

I'm crawling the intranet at work, which runs on a Lotus Domino server. For some URLs on the intranet, Domino returns a 400 status code, then appends ?OpenDocument to the end of the URL, and the GET on that URL comes back with a 200.

The problem is obviously with Domino, but I don't think it's something I can fix easily (as it's not in my department). Nutch correctly treats the URL as failed and skips it. However, I wondered if I could tailor the code in Nutch to say (in pseudo code):

if (rc == 400) {
    // retry once with Domino's suffix appended
    url = url + "?OpenDocument";
    getURL(url);
}

Could anyone point me to the relevant java file which I would need to update to achieve this?

Many thanks,

JS.
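
For illustration, the retry idea from the pseudo code above could be sketched in plain Java roughly as follows. This is a standalone sketch, not Nutch code: the class OpenDocumentRetry and the methods resolve and httpStatus are hypothetical names, and the equivalent logic would have to be placed wherever the HTTP protocol plugin evaluates the response code. It assumes a URL that already carries a query string should not get the suffix appended.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Optional;
import java.util.function.ToIntFunction;

// Hypothetical helper, not part of Nutch.
class OpenDocumentRetry {
    static final String SUFFIX = "?OpenDocument";

    /**
     * Returns the URL that answered with 200, retrying once with
     * "?OpenDocument" appended when the first attempt returns 400.
     * The status lookup is passed in so it can be replaced by a stub.
     */
    static Optional<String> resolve(String url, ToIntFunction<String> fetchStatus) {
        int status = fetchStatus.applyAsInt(url);
        if (status == 200) {
            return Optional.of(url);
        }
        // Assumption: only retry when the URL has no query string yet.
        if (status == 400 && !url.contains("?")) {
            String retryUrl = url + SUFFIX;
            if (fetchStatus.applyAsInt(retryUrl) == 200) {
                return Optional.of(retryUrl);
            }
        }
        return Optional.empty();
    }

    /** Issues a plain GET and returns the status code, or -1 on failure. */
    static int httpStatus(String url) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException | IllegalArgumentException e) {
            return -1;
        }
    }
}
```

Against a live server this would be called as OpenDocumentRetry.resolve(url, OpenDocumentRetry::httpStatus). Passing the status lookup as a parameter keeps the retry rule separate from the network code, so it can be checked without a Domino server.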





---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net

