Hi Lewis,

I would be happy to do that. Let me dig up some docs on Nutch development. I am
completely new to open source projects.

Catch you folks in dev@nutch.

Cheers,

Ye

On Fri, Aug 31, 2012 at 2:27 AM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi Ye,
>
> If you could contribute this to the community as a patch it would be
> greatly appreciated.
>
> If you need any help with this then please ping us on dev@nutch and we
> will be more than happy to help you out.
>
> Thank you in advance
>
> Lewis
>
> On Thu, Aug 30, 2012 at 2:14 PM, Ye T Thet <[email protected]> wrote:
> > Hi Folks,
> >
> > I solved the issue. I am sharing it here in case others have a similar
> > unsolved issue.
> >
> > It is due to a bug in FileResponse.java in the protocol-file plugin: file
> > names containing UTF-8 characters are not properly encoded. I changed some
> > code in the constructor and in one private method called list2html. The
> > change combines the discussions in the following two JIRAs.
> >
> > https://issues.apache.org/jira/browse/NUTCH-824
> > https://issues.apache.org/jira/browse/NUTCH-968
> >
> > It is important to change the code in both the constructor and the private
> > method.
> >
> > Cheers,
> >
> > Ye
> >
> >
> > On Wed, Aug 29, 2012 at 10:52 PM, hugo.ma <[email protected]> wrote:
> >
> >> I had a similar problem. My solution was to modify the HttpResponse class
> >> in org.apache.nutch.protocol.httpclient.
> >>
> >> In the constructor I changed the first lines like this:
> >>
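> >>    // Prepare GET method for HTTP request
> >>    this.url = url;
> >>    URI uri = null;
> >>
> >>    // MODIFIED: build the URI from its components so that non-ASCII
> >>    // characters in the path and query can be percent-encoded below.
> >>    // Note: this five-argument constructor takes an authority as its
> >>    // second argument, so passing getHost() drops any explicit port.
> >>    try {
> >>      uri = new URI(url.getProtocol(), url.getHost(), url.getPath(),
> >>          url.getQuery(), null);
> >>    } catch (Exception e) {
> >>      // do whatever you want, but do not continue with uri == null
> >>    }
> >>
> >>    GetMethod get = new GetMethod(uri.toASCIIString());
> >>
> >>    // Continue with the original code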
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/local-file-system-crawl-unable-to-fetch-file-name-containing-CJK-letter-tp4003999p4004059.html
> >> Sent from the Nutch - User mailing list archive at Nabble.com.
> >>
>
>
>
> --
> Lewis
>
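
For anyone following the thread, the two ideas discussed above (percent-encoding
a URL that contains CJK characters, and decoding such a path back into a file
name) can be sketched in standalone form. This is only a minimal illustration,
not the actual Nutch patch: the class name CjkUrlSketch and the methods
toAsciiUrl and decodePath are hypothetical, and using URLDecoder for the
file-name side is an assumption about the kind of change discussed in the JIRAs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLDecoder;

// Hypothetical helper class, for illustration only (not part of Nutch).
public class CjkUrlSketch {

  // Rebuild the URL as a URI from its components, then let
  // toASCIIString() percent-encode non-ASCII characters as UTF-8.
  // getAuthority() is used (rather than getHost()) so a port is kept.
  static String toAsciiUrl(URL url) throws URISyntaxException {
    URI uri = new URI(url.getProtocol(), url.getAuthority(),
        url.getPath(), url.getQuery(), null);
    return uri.toASCIIString();
  }

  // Decode a percent-encoded path back to its UTF-8 form.
  // Caveat: URLDecoder also turns '+' into a space, which may not be
  // what you want for file names that contain a literal '+'.
  static String decodePath(String encodedPath)
      throws UnsupportedEncodingException {
    return URLDecoder.decode(encodedPath, "UTF-8");
  }

  public static void main(String[] args) throws Exception {
    // "\u4e2d\u6587" is a two-character CJK file name.
    URL url = new URL("http://example.com/docs/\u4e2d\u6587.txt");
    String ascii = toAsciiUrl(url);
    System.out.println(ascii);
    System.out.println(decodePath(new URL(ascii).getPath()));
  }
}
```

The point of the round trip is that whichever side encodes and whichever side
decodes must agree on UTF-8; a mismatch there is one way file names with CJK
letters can fail to resolve.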
