Hi,

I'm crawling the intranet at work, which runs on a Lotus Domino server. For some URLs on the intranet, Domino first returns a 400, then appends ?OpenDocument to the end of the URL, and the GET on that URL comes back with a 200.

The problem clearly lies with Domino, but I don't think it's something I can fix easily (it's not in my department). Nutch correctly concludes that the URL doesn't work and skips it. However, I wondered whether I could tailor the Nutch code to do something like this (in pseudocode):

if (rc == 400) {
    // retry once with the Domino suffix appended
    url = url + "?OpenDocument";
    getURL(url);
}
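
In plain Java, outside of Nutch, I imagine the retry would look roughly like the sketch below. It uses only java.net.HttpURLConnection; the class DominoRetry and its methods are names I have made up for illustration and are not part of Nutch, and the status lookup is passed in as a function so the logic can be tried without a live server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.function.ToIntFunction;

// Hypothetical standalone sketch; none of these names exist in Nutch.
final class DominoRetry {

    // Append ?OpenDocument unless the URL already carries a query string.
    static String withOpenDocument(String url) {
        return url.contains("?") ? url : url + "?OpenDocument";
    }

    // Issue a GET and return the HTTP status code.
    static int httpStatus(String url) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Return the URL worth fetching: the original one, or the
    // ?OpenDocument form if the original gave 400 and the retry gave 200.
    static String resolve(String url, ToIntFunction<String> status) {
        if (status.applyAsInt(url) != 400) {
            return url;
        }
        String retry = withOpenDocument(url);
        if (!retry.equals(url) && status.applyAsInt(retry) == 200) {
            return retry;
        }
        return url;
    }
}
```

Against the real server this would be called as DominoRetry.resolve(someUrl, DominoRetry::httpStatus). It only retries once, and only on a 400, so other failures are still treated as failures.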

Could anyone point me to the relevant Java file that I would need to update to achieve this?

Many thanks,

JS.

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers