Hi Ye,

If you could contribute this to the community as a patch it would be
greatly appreciated.

If you need any help with this, please ping us on dev@nutch and we
will be more than happy to help you out.

Thank you in advance

Lewis

On Thu, Aug 30, 2012 at 2:14 PM, Ye T Thet <[email protected]> wrote:
> Hi Folks,
>
> I solved the issue. I am sharing the fix here in case others run into a
> similar problem.
>
> It is due to a bug in the protocol-file plugin, in FileResponse.java: file
> names containing non-ASCII (UTF-8) characters are not properly encoded. I
> changed some code in the constructor and in a private method called
> list2html. The change is a combination of the discussions on the following
> two JIRAs.
>
> https://issues.apache.org/jira/browse/NUTCH-824
> https://issues.apache.org/jira/browse/NUTCH-968
>
> It is important to change the code in both the constructor and the private
> method.
>
> Cheers,
>
> Ye
>
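
For anyone following along, here is a rough sketch of the general idea
described above: decode the percent-encoded URL path before touching the file
system, and percent-encode file names when writing links into a directory
listing. This is not the actual FileResponse.java change from either JIRA. The
class and method names (FileNameCodec, decodePath, encodeName) are made up for
illustration, and only java.net.URLDecoder and java.net.URLEncoder from the
JDK are used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class, not part of Nutch.
class FileNameCodec {

  // Turn a percent-encoded URL path into the name used on disk.
  // Caveat: URLDecoder also turns a literal '+' into a space.
  static String decodePath(String urlPath) {
    try {
      return URLDecoder.decode(urlPath, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available on a conforming JVM.
      throw new IllegalStateException(e);
    }
  }

  // Percent-encode a single file name (not a whole path) for use in a link.
  // URLEncoder writes spaces as '+', so they are rewritten to "%20".
  static String encodeName(String fileName) {
    try {
      return URLEncoder.encode(fileName, "UTF-8").replace("+", "%20");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

The two helpers are meant to be used as a pair: whatever encodeName writes
into a listing should come back unchanged through decodePath when the link is
fetched later.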
>
> On Wed, Aug 29, 2012 at 10:52 PM, hugo.ma <[email protected]> wrote:
>
>> I had a similar problem. My solution was to modify the HttpResponse class
>> in org.apache.nutch.protocol.httpclient.
>>
>> In the constructor I changed the first lines like this:
>>
>>   // Prepare GET method for HTTP request
>>   this.url = url;
>>   URI uri = null;
>>   // MODIFIED: build the URI from its parts so that non-ASCII characters
>>   // are percent-encoded by toASCIIString()
>>   try {
>>     uri = new URI(url.getProtocol(), url.getHost(), url.getPath(),
>>                   url.getQuery(), null);
>>   } catch (Exception e) {
>>     // do whatever you want (note: uri is still null at this point)
>>   }
>>
>>   GetMethod get = new GetMethod(uri.toASCIIString());
>>
>>   // Continue with the original code
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/local-file-system-crawl-unable-to-fetch-file-name-containing-CJK-letter-tp4003999p4004059.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
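
A self-contained version of the approach quoted above, in case it helps when
putting a patch together. It is only a sketch: AsciiUrl and toAscii are
made-up names, and it passes url.getAuthority() rather than url.getHost() so
that a port number, if present, is kept. One caveat: the multi-argument
java.net.URI constructors always quote '%', so a URL that is already
percent-encoded would end up encoded twice.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper class, not part of Nutch.
class AsciiUrl {

  // Rebuild the URL from its components and return an ASCII-only form.
  static String toAscii(String rawUrl) {
    try {
      URL url = new URL(rawUrl);
      // The multi-argument constructor quotes illegal characters such as
      // spaces; toASCIIString() then percent-encodes non-ASCII characters
      // as UTF-8.
      URI uri = new URI(url.getProtocol(), url.getAuthority(),
                        url.getPath(), url.getQuery(), null);
      return uri.toASCIIString();
    } catch (MalformedURLException | URISyntaxException e) {
      throw new IllegalArgumentException("Cannot encode URL: " + rawUrl, e);
    }
  }
}
```

Throwing instead of swallowing the exception avoids carrying on with a null
URI, which would only fail later with a NullPointerException.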



-- 
Lewis
