I'm trying to crawl pages whose charset is UTF-16 with Nutch and send the
index to Solr, where it can be searched. I followed the instructions at
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ and
successfully crawled and searched test content in UTF-8. However, when I
crawl the UTF-16 content, it is sent to Solr as Japanese characters. The
pages encoded as UTF-16 contain only English text, no special characters.
Is there any way to force Nutch to crawl the page as UTF-8 and ignore the
UTF-16 setting?
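
To illustrate the symptom, here is a minimal Python sketch of one way plain English text can turn into CJK-looking characters. It assumes, without my having confirmed it, that the bytes actually fetched are single-byte (ASCII/UTF-8) while being decoded as UTF-16. The function name `misdecode_as_utf16` and the sample string are hypothetical and have nothing to do with Nutch's own code.

```python
def misdecode_as_utf16(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode the bytes as big-endian UTF-16.

    Assumption for illustration only: the encoded byte count is even,
    since UTF-16 consumes two bytes per code unit.
    """
    raw = text.encode("utf-8")
    return raw.decode("utf-16-be")


sample = "hello world!"  # 12 ASCII bytes, so 6 UTF-16 code units
garbled = misdecode_as_utf16(sample)

# Each pair of ASCII letters becomes a single code point; for lowercase
# English text these land in the CJK Unified Ideographs block.
print(garbled)
print([hex(ord(c)) for c in garbled])

# Reversing the wrong step recovers the original text.
recovered = garbled.encode("utf-16-be").decode("utf-8")
print(recovered)
```

If this is what is happening, the stored text is not lost: re-encoding it as UTF-16 and decoding as UTF-8 gives back the English content, which would support treating the pages as UTF-8.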

Thanks.
-- 
View this message in context: 
http://www.nabble.com/Nutch-crawler-charset-issues-utf-16-tp25981513p25981513.html
Sent from the Nutch - User mailing list archive at Nabble.com.
