Hi Ye,

If you could contribute this to the community as a patch, it would be
greatly appreciated. If you need any help with this, please ping us on
dev@nutch and we will be more than happy to help you out.

Thank you in advance,

Lewis

On Thu, Aug 30, 2012 at 2:14 PM, Ye T Thet <[email protected]> wrote:
> Hi Folks,
>
> I solved the issue. I am sharing it here in case others have a similar
> unsolved issue.
>
> It is due to a bug in the protocol-file plugin, in FileResponse.java: file
> names containing UTF-8 characters are not properly encoded. I changed some
> code in the constructor and in one private method called list2html. The
> change is a combination of the discussions on the following two JIRAs:
>
> https://issues.apache.org/jira/browse/NUTCH-824
> https://issues.apache.org/jira/browse/NUTCH-968
>
> It is important to change the code in both the constructor and the private
> method.
>
> Cheers,
>
> Ye
>
>
> On Wed, Aug 29, 2012 at 10:52 PM, hugo.ma <[email protected]> wrote:
>
>> I had a similar problem. My solution was to modify the HttpResponse class
>> in org.apache.nutch.protocol.httpclient.
>>
>> In the constructor I changed the first lines like this:
>>
>> // Prepare GET method for HTTP request
>> this.url = url;
>> URI uri = null;
>> // MODIFIED
>> try {
>>   uri = new URI(url.getProtocol(), url.getHost(), url.getPath(),
>>       url.getQuery(), null);
>> } catch (Exception e) {
>>   // do whatever you want
>> }
>>
>> GetMethod get = new GetMethod(uri.toASCIIString());
>>
>> // Continue with the original code
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/local-file-system-crawl-unable-to-fetch-file-name-containing-CJK-letter-tp4003999p4004059.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>

--
Lewis
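
For anyone reading this thread later, the encoding step in hugo.ma's quoted snippet can be tried on its own. The following is only a minimal sketch, assuming the `URI` in that snippet is `java.net.URI`; the class name `UriEncodingSketch`, the method `toAsciiUrl`, and the `example.com` URL are hypothetical and not part of Nutch. It is not the FileResponse.java patch discussed in NUTCH-824 or NUTCH-968.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper class for illustration; not part of Nutch.
class UriEncodingSketch {

    // Rebuilds the URL through java.net.URI. The multi-argument constructor
    // percent-quotes illegal characters such as spaces, and toASCIIString()
    // then escapes the remaining non-ASCII characters as UTF-8 octets.
    static String toAsciiUrl(String spec) {
        try {
            URL url = new URL(spec);
            URI uri = new URI(url.getProtocol(), url.getHost(), url.getPath(),
                    url.getQuery(), null);
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            // Fail loudly instead of continuing with a null URI.
            throw new IllegalArgumentException("Cannot encode URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        // "\u6587\u6863" is a two-character CJK directory name (made-up example).
        System.out.println(toAsciiUrl("http://example.com/\u6587\u6863/a b.txt"));
    }
}
```

Two caveats with this approach: the multi-argument `URI` constructors always quote `%`, so a URL that is already percent-encoded would be escaped a second time, and `url.getHost()` drops any port or user info present in the original URL.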

