Hi,
I'm crawling the intranet at work, which runs on a Lotus Domino server. For
some URLs on the intranet, Domino returns a code 400, then appends
?OpenDocument to the end of the URL, and the GET on that URL comes back with
a code 200.
The problem is obviously with Domino, but I don't think I can fix it easily,
since the server isn't managed by my department. Nutch correctly treats the
URL as broken and skips it. However, I wondered if I could modify the Nutch
code to do the following (in pseudocode):
if (rc == 400) {
    // Domino answers 400 for some URLs; retry once with ?OpenDocument appended
    url = url + "?OpenDocument";
    getURL(url);
}
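A rough standalone sketch of the same idea, using plain java.net.HttpURLConnection rather than any Nutch class, might look like the following. DominoRetry and its methods are hypothetical names, and the sketch assumes that a URL which already has a query string should be left alone rather than given a second '?'.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/**
 * Standalone sketch (not Nutch code): fetch a URL and, if the Domino server
 * answers 400, retry once with "?OpenDocument" appended.
 */
public class DominoRetry {

    /** Status code Domino returns before the ?OpenDocument variant works. */
    static final int BAD_REQUEST = 400;

    /**
     * Appends "?OpenDocument" to a URL. Assumption: a URL that already has a
     * query string is returned unchanged instead of getting a second '?'.
     */
    static String withOpenDocument(String url) {
        if (url.contains("?")) {
            return url;
        }
        return url + "?OpenDocument";
    }

    /** True when the first response should trigger the ?OpenDocument retry. */
    static boolean shouldRetry(int status, String url) {
        return status == BAD_REQUEST && !url.contains("?");
    }

    /** Issues a GET and returns only the HTTP status code. */
    static int getStatus(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Fetches url; on a 400, retries once with ?OpenDocument appended. */
    static int fetchWithDominoRetry(String url) throws IOException {
        int status = getStatus(url);
        if (shouldRetry(status, url)) {
            status = getStatus(withOpenDocument(url));
        }
        return status;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DominoRetry <url> [<url> ...]
        for (String url : args) {
            System.out.println(url + " -> " + fetchWithDominoRetry(url));
        }
    }
}
```

Retrying only once, and only when the URL has no query string yet, keeps a server that keeps answering 400 from sending the fetcher into a loop.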
Could anyone point me to the relevant Java file I would need to update to
achieve this?
Many thanks,
JS.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers