I'm new to Nutch, and I'm trying to crawl an intranet site.  It has
some Word documents on it with spaces in their filenames, and these
seem to be causing problems.  An excerpt from my log:
041230 114316 fetching http://10.23.1.206/documents/STT/Components/Technical/CommonServices/Scope Manager User Guide.doc
041230 114316 fetched 365 bytes from http://10.23.1.206/documents/STT/Components/Technical/CommonServices/Scope Manager User Guide.doc
041230 114316 fetch of http://10.23.1.206/documents/STT/Components/Technical/CommonServices/Scope Manager User Guide.doc failed with: net.nutch.protocol.http.HttpError: HTTP Error: 400


This happens consistently with documents that have spaces in the URL.
Is there a simple setting that I'm overlooking?
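
My guess (unconfirmed) is that the raw spaces end up in the HTTP request line, which the server rejects as malformed, hence the 400.  For comparison, here is a minimal sketch of the percent-encoded form I'd expect to work.  It uses only the JDK's java.net.URI multi-argument constructor, which quotes illegal characters such as spaces; it is not a Nutch API, and the class and method names are just for illustration:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EncodeSpaces {
    // Build an http URL from a host and a raw path; the multi-argument
    // URI constructor percent-encodes illegal characters (space -> %20).
    static String encode(String host, String rawPath) {
        try {
            return new URI("http", host, rawPath, null, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("10.23.1.206",
            "/documents/STT/Components/Technical/CommonServices/Scope Manager User Guide.doc"));
    }
}
```

Requesting the %20 form directly (e.g. in a browser) would show whether encoding alone fixes the 400.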

Thanks,
Mike Monette


_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers