Wrong charset conversion in some RTF documents.
-----------------------------------------------

                 Key: TIKA-422
                 URL: https://issues.apache.org/jira/browse/TIKA-422
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.7
            Reporter: Piotr B.

The RTF parser uses javax.swing.text.rtf, which handles character sets poorly. It does not support the '\ansicpg' keyword (quoting the RTF file format specification: "This keyword represents the default ANSI code page used to perform the Unicode to ANSI conversion when writing RTF text").

Unfortunately, Windows WordPad saves non-ASCII characters encoded in the code page declared by \ansicpg rather than as the Unicode characters that javax.swing.text.rtf supports, so those characters are converted with the wrong charset.
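
To make the expected behavior concrete, here is a minimal sketch of decoding with the declared code page. It is not Tika or Swing code: the class AnsiCpgSketch and the methods declaredCharset and decodeHexEscapes are hypothetical names used only for illustration. It assumes that code page N maps to the Java charset "windows-N" (true for the 125x pages only), that windows-1252 is a reasonable fallback when \ansicpg is absent, and that non-ASCII bytes appear as \'xx hex escapes. Control words, groups and everything else in RTF are ignored.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Tika or javax.swing.text.rtf.
class AnsiCpgSketch {

    // Matches the \ansicpgN control word, capturing the code page number.
    private static final Pattern ANSICPG = Pattern.compile("\\\\ansicpg(\\d+)");

    // Returns the charset declared by \ansicpg.
    // Assumption: code page N maps to "windows-N"; fallback is windows-1252.
    public static Charset declaredCharset(String rtf) {
        Matcher m = ANSICPG.matcher(rtf);
        String name = m.find() ? "windows-" + m.group(1) : "windows-1252";
        return Charset.forName(name); // throws if the JVM lacks this charset
    }

    // Replaces runs of \'xx hex escapes with the characters they encode in cs.
    // All other characters are copied through unchanged.
    public static String decodeHexEscapes(String text, Charset cs) {
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("\\'", i) && i + 4 <= text.length()) {
                // Two hex digits follow; malformed digits raise NumberFormatException.
                pending.write(Integer.parseInt(text.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                flush(pending, out, cs);
                out.append(text.charAt(i));
                i++;
            }
        }
        flush(pending, out, cs);
        return out.toString();
    }

    // Decodes the collected bytes as one unit so multi-byte sequences stay intact.
    private static void flush(ByteArrayOutputStream pending, StringBuilder out, Charset cs) {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), cs));
            pending.reset();
        }
    }

    public static void main(String[] args) {
        // A WordPad-style fragment: Polish text stored as code page 1250 bytes.
        String header = "{\\rtf1\\ansi\\ansicpg1250\\deff0";
        Charset cs = declaredCharset(header);
        System.out.println(cs.name());
        System.out.println(decodeHexEscapes("za\\'bf\\'f3\\'b3\\'e6", cs));
    }
}
```

Decoding the same bytes as windows-1252 would produce different characters for 0xBF, 0xB3 and 0xE6, which is the kind of wrong conversion this issue describes.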