[ https://issues.apache.org/jira/browse/NUTCH-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved NUTCH-625.
----------------------------------------

    Resolution: Won't Fix

as per Dogacan's comments

> Non-ascii character broken in dumped content for mixed encoding (utf-8 and multi-byte)
> --------------------------------------------------------------------------------------
>
>                 Key: NUTCH-625
>                 URL: https://issues.apache.org/jira/browse/NUTCH-625
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Vinci
>            Priority: Minor
>
> If the crawl db contains both utf-8 non-ascii character and non-utf-8
> non-ascii character(i.e. multi-byte character), the dumped contents by
> readseg utility will have garbled character appear in all of the non-utf8
> non-ascii text, and those texts are unable to repair by encoding reload.
> At the same time, the utf-8 text is normal, only the non-utf8 text broken.
> Any possible solution available for repairing the broken text?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira