I had a similar problem. My solution was to modify the HttpResponse class in
org.apache.nutch.protocol.httpclient.

In the constructor, I changed the first lines like this:

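    // Prepare GET method for HTTP request
    this.url = url;

    // MODIFIED: rebuild the URL as a java.net.URI so that toASCIIString()
    // can percent-encode non-ASCII characters in the path and query.
    // Note: getHost() drops any port; url.getAuthority() would keep it.
    URI uri = null;
    try {
      uri = new URI(url.getProtocol(), url.getHost(), url.getPath(),
          url.getQuery(), null);
    } catch (Exception e) {
      // do whatever you want (uri stays null here, so the next line
      // fails unless the error is handled)
    }

    GetMethod get = new GetMethod(uri.toASCIIString());

    // Continue with the original code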
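
To try the encoding step outside Nutch, here is a minimal standalone sketch. The class name AsciiUrlSketch and the method toAsciiUrl are made up for illustration and are not part of Nutch. Unlike the snippet above, it passes url.getAuthority() instead of url.getHost() so a port is kept, and it rethrows the failure as an IOException instead of swallowing it; both are my assumptions about what you would want, not what the original patch does.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper for illustration only; not part of Nutch.
class AsciiUrlSketch {

    // Rebuilds the URL as a java.net.URI from its parts and returns the
    // ASCII-only form, with non-ASCII characters percent-encoded as UTF-8.
    static String toAsciiUrl(URL url) throws IOException {
        try {
            // Assumption: getAuthority() (host plus optional port) is used
            // here, where the snippet above uses getHost().
            URI uri = new URI(url.getProtocol(), url.getAuthority(),
                    url.getPath(), url.getQuery(), null);
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            // Surface the problem instead of continuing with a null URI.
            throw new IOException("Cannot encode URL: " + url, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // \u6587\u4EF6 is a two-character CJK file name.
        URL url = new URL("http://example.com/docs/\u6587\u4EF6.txt");
        System.out.println(toAsciiUrl(url));
    }
}
```

One caveat: the multi-argument URI constructors always quote the '%' character, so a URL whose path is already percent-encoded would be encoded a second time. This approach fits URLs that still contain the raw characters.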

--
View this message in context: 
http://lucene.472066.n3.nabble.com/local-file-system-crawl-unable-to-fetch-file-name-containing-CJK-letter-tp4003999p4004059.html
Sent from the Nutch - User mailing list archive at Nabble.com.
