Thanks for the hint. Will try the LANG setting later...

But why can't Nutch identify the encoding when everything is set to
UTF-8 in all the headers and meta data of the page?




Miguel Costa wrote:
> 
> Nutch uses the default LANG set on your machine if it cannot identify the
> document encoding. 
> I can only resolve this by updating the /etc/sysconfig/i18n file for the
> default LANG on all machines of the hadoop cluster. 
> export LANG=... doesn't work either.
> 
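
To make the fallback described above concrete, here is a minimal Java sketch.
It is not Nutch's actual code: the class EncodingFallbackSketch and its
decode() helper are hypothetical names made up for illustration. It assumes
the parser decodes with the detected charset when there is one and otherwise
uses a fallback. On older JVMs the default charset typically follows the
locale (LANG), which would explain why the machine setting matters. Newer
JVMs (Java 18 and later) default to UTF-8 regardless of locale.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingFallbackSketch {

    // Hypothetical helper, not a Nutch API: use the detected charset if
    // detection succeeded, otherwise decode with the supplied fallback.
    static String decode(byte[] content, Charset detected, Charset fallback) {
        Charset effective = (detected != null) ? detected : fallback;
        return new String(content, effective);
    }

    public static void main(String[] args) {
        // "Cafe" with an accented e (U+00E9), encoded as UTF-8 on the wire.
        byte[] utf8Bytes = "Caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Detection succeeded: the text round-trips unchanged.
        String ok = decode(utf8Bytes, StandardCharsets.UTF_8,
                Charset.defaultCharset());

        // Detection failed and the fallback is a Latin-1 charset: the two
        // UTF-8 bytes of the accented e become two separate characters.
        String garbled = decode(utf8Bytes, null, StandardCharsets.ISO_8859_1);

        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("decoded with detected charset, length "
                + ok.length() + ": " + ok);
        System.out.println("decoded with Latin-1 fallback, length "
                + garbled.length() + ": " + garbled);
    }
}
```

If the sketch matches what happens in the crawl, special characters in the
search results would show up as two-character sequences whenever detection
fails on a machine whose default charset is not UTF-8.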

-- 
View this message in context: 
http://www.nabble.com/Problems-with-encoding-%28UTF-8%29%2C-display-of-search-results-with-special-characters-tp16954447p16974586.html
Sent from the Nutch - User mailing list archive at Nabble.com.
