Thanks for the hint. I'll try the LANG setting later. But why is Nutch unable to identify the encoding when everything is set to UTF-8 in all the headers and meta data of the page?
Miguel Costa wrote:
>
> Nutch uses the default LANG set on your machine if it cannot identify the
> document encoding.
> I can only resolve this by updating the /etc/sysconfig/i18n file for the
> default LANG on all machines of the Hadoop cluster.
> export LANG=... doesn't work either.
>
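To illustrate the kind of fallback described in the quoted message, here is a minimal sketch in Python. It is not Nutch code: the function name `decode_with_fallback` and the choice of ISO-8859-1 as the fallback charset are assumptions made purely for illustration. It only shows why UTF-8 bytes come out garbled when a decoder fails to detect the encoding and falls back to a non-UTF-8 default.

```python
# Illustration only: this is NOT Nutch code. The helper name and the
# ISO-8859-1 fallback are assumptions chosen for the example.

text = "Zürich café"
raw = text.encode("utf-8")  # the bytes a UTF-8 page would deliver


def decode_with_fallback(data, detected=None, fallback="iso-8859-1"):
    """Decode with the detected charset, or with the fallback if none was detected."""
    return data.decode(detected or fallback)


garbled = decode_with_fallback(raw)           # detection failed, fallback used
correct = decode_with_fallback(raw, "utf-8")  # detection succeeded

print(garbled)  # special characters appear as two-character sequences
print(correct)  # original text
```

If search results show pairs like "Ã¼" where "ü" is expected, that pattern usually means UTF-8 bytes were decoded with a single-byte charset somewhere along the way.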
