I'm new to Nutch, and I'm trying to crawl an intranet site. It has some Word documents on it whose filenames contain spaces, and these seem to be causing problems. An excerpt from my log:

041230 114316 fetching http://10.23.1.206/documents/STT/Components/Technical/CommonServices/Scope Manager User Guide.doc
041230 114316 fetched 365 bytes from http://10.23.1.206/documents/STT/Components/Technical/CommonServices/Scope Manager User Guide.doc
041230 114316 fetch of http://10.23.1.206/documents/STT/Components/Technical/CommonServices/Scope Manager User Guide.doc failed with: net.nutch.protocol.http.HttpError: HTTP Error: 400
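
My guess is that the space is being sent unencoded in the request line, which would explain the 400. To show what I would expect the fetcher to request instead, here is a minimal sketch using only the standard java.net.URI class. It is not Nutch code and not a Nutch setting; the class name UrlSpaceEncoder and the helper encodePath are hypothetical names I made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of Nutch.
final class UrlSpaceEncoder {

    // Builds a URL from its parts. The multi-argument URI constructor
    // percent-encodes characters that are illegal in a path, such as spaces.
    static String encodePath(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode path: " + path, e);
        }
    }

    public static void main(String[] args) {
        String encoded = encodePath(
                "http",
                "10.23.1.206",
                "/documents/STT/Components/Technical/CommonServices/Scope Manager User Guide.doc");
        // Each space in the path should come out as %20.
        System.out.println(encoded);
    }
}
```

If I paste the encoded form (with %20 in place of each space) into a browser, that is the request I would expect to succeed, so I assume the fetcher needs to do something equivalent.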

This happens consistently with documents which have spaces in the URL. Is there a simple setting that I'm overlooking?

Thanks,
Mike Monette

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
