Wrong charset conversion in some RTF documents.
-----------------------------------------------

                 Key: TIKA-422
                 URL: https://issues.apache.org/jira/browse/TIKA-422
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.7
            Reporter: Piotr B.


The RTF parser uses javax.swing.text.rtf, whose RTF support is incomplete.

It doesn't support the '\ansicpg' keyword (quoting the RTF file format
specification: "This keyword represents the default ANSI code page used to
perform the Unicode to ANSI conversion when writing RTF text").

Unfortunately, Windows WordPad saves non-ASCII characters in the code page
declared by \ansicpg instead of as the Unicode characters that
javax.swing.text.rtf supports.
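
To illustrate what honoring \ansicpg would involve, here is a minimal Java
sketch. It is not Tika's or Swing's code, and the class and method names are
hypothetical. It assumes the code page number maps to a Java charset named
"windows-<N>" (true for 1250 to 1258, not for every code page), falls back to
ISO-8859-1 as an arbitrary default, and handles only \'hh hex escapes, not the
rest of RTF.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes \'hh escapes using the code page from \ansicpg.
class AnsiCpgSketch {
    // Matches the header keyword, e.g. \ansicpg1250
    private static final Pattern ANSICPG = Pattern.compile("\\\\ansicpg(\\d+)");
    // Matches a run of consecutive hex escapes, e.g. \'bf\'f3
    private static final Pattern HEX_RUN = Pattern.compile("(?:\\\\'[0-9a-fA-F]{2})+");

    // Assumption: code page N corresponds to the Java charset "windows-N".
    static Charset detectCharset(String rtf) {
        Matcher m = ANSICPG.matcher(rtf);
        if (m.find()) {
            String name = "windows-" + m.group(1);
            if (Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        }
        // Arbitrary fallback chosen for this sketch.
        return StandardCharsets.ISO_8859_1;
    }

    // Replaces each run of \'hh escapes with the text it encodes.
    static String decodeHexEscapes(String rtf) {
        Charset charset = detectCharset(rtf);
        Matcher m = HEX_RUN.matcher(rtf);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(rtf, last, m.start());
            String run = m.group();
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            // Each escape is four characters: backslash, quote, two hex digits.
            for (int i = 0; i < run.length(); i += 4) {
                bytes.write(Integer.parseInt(run.substring(i + 2, i + 4), 16));
            }
            out.append(new String(bytes.toByteArray(), charset));
            last = m.end();
        }
        out.append(rtf, last, rtf.length());
        return out.toString();
    }
}
```

For example, with \ansicpg1250 in the header, the escapes \'bf\'f3\'b3\'e6
decode to the Polish letters "żółć". Decoding a whole run of escapes at once,
rather than one byte at a time, keeps the approach usable for multi-byte code
pages as well.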

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
