Hi Lewis,

I would be happy to do that. Let me dig up some docs on Nutch Dev. I am completely new to open source projects.
Catch you folks in dev@nutch.

Cheers,

Ye

On Fri, Aug 31, 2012 at 2:27 AM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi Ye,
>
> If you could contribute this to the community as a patch it would be
> greatly appreciated.
>
> If you need any help with this then please ping us on dev@nutch and we
> will be more than happy to help you out.
>
> Thank you in advance
>
> Lewis
>
> On Thu, Aug 30, 2012 at 2:14 PM, Ye T Thet <[email protected]> wrote:
> > Hi Folks,
> >
> > I solved the issue. I am sharing it here in case others have a similar
> > unsolved issue.
> >
> > It is due to a bug in the protocol-file plugin, in FileResponse.java. The
> > file name is not properly encoded for UTF-8 file names. I changed some
> > code in the constructor and one private method called list2html. The
> > change is a combination of the discussions on the following two JIRAs.
> >
> > https://issues.apache.org/jira/browse/NUTCH-824
> > https://issues.apache.org/jira/browse/NUTCH-968
> >
> > It is important to change the code both in the constructor and in the
> > private method.
> >
> > Cheers,
> >
> > Ye
> >
> >
> > On Wed, Aug 29, 2012 at 10:52 PM, hugo.ma <[email protected]> wrote:
> >
> >> I had a similar problem. My solution was to modify the HttpResponse
> >> class in org.apache.nutch.protocol.httpclient.
> >>
> >> In the constructor I changed the first lines like this:
> >>
> >> // Prepare GET method for HTTP request
> >> this.url = url;
> >> URI uri = null;
> >> // MODIFIED
> >>
> >> try {
> >>   uri = new URI(url.getProtocol(), url.getHost(), url.getPath(),
> >>       url.getQuery(), null);
> >> } catch (Exception e) {
> >>   // do whatever you want
> >> }
> >>
> >> GetMethod get = new GetMethod(uri.toASCIIString());
> >>
> >> // Continue with the original code
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/local-file-system-crawl-unable-to-fetch-file-name-containing-CJK-letter-tp4003999p4004059.html
> >> Sent from the Nutch - User mailing list archive at Nabble.com.
> >>
> >
>
> --
> Lewis
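
For anyone finding this thread later, here is a minimal standalone sketch of the encoding trick hugo.ma describes above: build a java.net.URI from the URL's components and call toASCIIString(), which percent-encodes non-ASCII characters as UTF-8. The class name UrlEncodingSketch, the helper toAsciiUrl and the example.com URL are made up for illustration and are not part of Nutch. It also uses url.getAuthority() rather than url.getHost(), on the assumption that you want to keep any port number. Note that the multi-argument URI constructors always quote a literal '%', so a path that is already percent-encoded would be encoded twice.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlEncodingSketch {

    // Hypothetical helper, not Nutch code: rebuilds the URL as a URI and
    // returns an ASCII-only form with non-ASCII characters percent-encoded
    // as UTF-8 bytes.
    public static String toAsciiUrl(URL url) throws URISyntaxException {
        // The multi-argument constructor quotes illegal characters such as
        // spaces; getAuthority() (an assumption here) keeps the port.
        URI uri = new URI(url.getProtocol(), url.getAuthority(),
                url.getPath(), url.getQuery(), null);
        // toASCIIString() encodes the remaining non-US-ASCII characters.
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws Exception {
        // File name containing two CJK characters, written as Unicode escapes.
        URL url = new URL("http://example.com/docs/\u4e2d\u6587.txt");
        System.out.println(toAsciiUrl(url));
    }
}
```

The fix in FileResponse.java that Ye mentions is a separate change discussed in NUTCH-824 and NUTCH-968; this sketch only illustrates the HTTP-side workaround.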

