I had a similar problem. My solution was to modify the HttpResponse class in
org.apache.nutch.protocol.httpclient.
In the constructor I changed the first lines like this:
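// Prepare GET method for HTTP request
this.url = url;
// Note: this is java.net.URI (add imports for it and java.net.URISyntaxException if missing)
URI uri = null;
// MODIFIED: rebuild the URL as a URI so that toASCIIString() can
// percent-encode any non-ASCII characters (e.g. CJK file names)
try {
  // getAuthority() rather than getHost(), so an explicit port is not dropped
  uri = new URI(url.getProtocol(), url.getAuthority(), url.getPath(),
      url.getQuery(), null);
} catch (URISyntaxException e) {
  // do whatever you want, but don't fall through: uri is still null here
  throw new IllegalArgumentException("Cannot encode URL: " + url, e);
}
GetMethod get = new GetMethod(uri.toASCIIString());
// Continue with the original code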
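
If you want to check what the conversion does before patching Nutch, here is a small standalone sketch of the same idea. The class name UrlEncodingSketch, the helper toAsciiUrl and the example.com URL are made up for illustration and are not part of Nutch. It also passes url.getAuthority() instead of url.getHost(), on the assumption that you want to keep an explicit port.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UrlEncodingSketch {

    // Hypothetical helper (not a Nutch API): rebuilds the URL as a java.net.URI
    // from its components and returns an ASCII-only string.
    static String toAsciiUrl(URL url) throws URISyntaxException {
        // The multi-argument constructor quotes illegal characters;
        // toASCIIString() then percent-encodes non-ASCII characters as UTF-8.
        URI uri = new URI(url.getProtocol(), url.getAuthority(), url.getPath(),
                url.getQuery(), null);
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws Exception {
        // File name is two CJK characters, written as Unicode escapes
        URL url = new URL("http://example.com/docs/\u4e2d\u6587.html?page=1");
        // prints http://example.com/docs/%E4%B8%AD%E6%96%87.html?page=1
        System.out.println(toAsciiUrl(url));
    }
}
```

The point of going through the component-wise constructor is that it takes unencoded parts, so a path that is already percent-encoded would get its % signs encoded a second time. That is fine for raw file names like the ones in this thread, but worth keeping in mind.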