Hi all, I am trying to dump the crawled content with the segment reader (bin/nutch -dump). The output text contains two encodings: UTF-8 and a multi-byte character encoding. When I open the dumped page, the multi-byte text is broken; even after I convert it to the correct encoding, the text still displays incorrectly. How can I fix the text?
Thank you.
