RFR JDK-8244324: RTFEditorKit does not display some of Japanese characters correctly

Prasanta Sadhukhan Thu, 14 May 2020 03:51:26 -0700

Hi All,

Please review a fix for an issue whereby RTFEditorKit, when used to read Japanese characters, reads some garbage characters.

The default character set used for the RTF document is set to "ansi" in our RTFReader.java. The share/classes/javax/swing/text/rtf/charsets/ansi.txt code table has undefined values, i.e., 91-98 and A0 are "0". According to javax/swing/text/rtf/RTFParser.java, if the ch is 0, handleText() is not called.
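
The effect of such "0" entries can be sketched in isolation. The class below is a hypothetical, simplified model and not the RTFParser code: the names TranslationTableSketch, identityTable, tableWithHoles and translate are made up for illustration, and the table is assumed to be an identity mapping everywhere except the entries reported as "0" above.

```java
class TranslationTableSketch {
    // Hypothetical Latin-1 style table: every byte value maps to the same code point.
    static char[] identityTable() {
        char[] table = new char[256];
        for (int i = 0; i < 256; i++) {
            table[i] = (char) i;
        }
        return table;
    }

    // Same table, but with 0x91-0x98 and 0xA0 set to 0, mimicking the
    // undefined entries described for ansi.txt (all other entries simplified).
    static char[] tableWithHoles() {
        char[] table = identityTable();
        for (int i = 0x91; i <= 0x98; i++) {
            table[i] = 0;
        }
        table[0xA0] = 0;
        return table;
    }

    // Translate bytes through the table, skipping any byte that maps to 0,
    // which models "if the ch is 0, handleText() is not called".
    static String translate(byte[] input, char[] table) {
        StringBuilder out = new StringBuilder();
        for (byte b : input) {
            char ch = table[b & 0xFF];
            if (ch != 0) {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] input = {(byte) 0x91, 0x41};
        // The 0x91 byte is silently dropped with the table that has holes.
        System.out.println(translate(input, tableWithHoles()).length()); // prints 1
        // Both bytes survive with the table that has no undefined entries.
        System.out.println(translate(input, identityTable()).length());  // prints 2
    }
}
```

In this model a byte in the 91-98 range disappears from the output, so any multi-byte sequence that contains it can no longer be reassembled correctly downstream.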


As per http://www.biblioscape.com/rtf15_spec.htm#Heading8,

    RTF file includes the following Character set in its header:

    <character set>
        (\ansi | \mac | \pc | \pca)? \ansicpgN?

    Where,

    \ansicpgN  This keyword represents the default ANSI code page used to perform the *Unicode to ANSI conversion* when writing RTF text. N represents the code page in decimal. This is typically set to the default ANSI code page of the run-time environment (for example, \ansicpg1252 for U.S. Windows). The reader can use the same ANSI code page to convert ANSI text back to Unicode. This keyword should be emitted in the RTF header section right after the \ansi, \mac, \pc or \pca keyword.

The spec then lists the possible values of N in a table.

We can make use of ansicpgN (which allows ANSI text to be converted back to Unicode) and have it refer to the latin1TranslationTable [RTFParser inherits it from AbstractFilter], which does not include undefined areas, instead of ansi's translationTable, which has undefined areas as seen above.
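
For illustration only, detecting the keyword in a header can be sketched as below. This is a standalone sketch and not how RTFReader handles keywords; the class name AnsiCodePageSketch and the method findAnsiCodePage are hypothetical, and the sample header strings are made up.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnsiCodePageSketch {
    // Matches the \ansicpgN keyword and captures the decimal code page N.
    private static final Pattern ANSICPG = Pattern.compile("\\\\ansicpg(\\d{1,5})");

    // Returns the code page declared in the header, or empty if the keyword is absent.
    static OptionalInt findAnsiCodePage(String rtfHeader) {
        Matcher m = ANSICPG.matcher(rtfHeader);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Made-up header fragments, one with the keyword and one without.
        System.out.println(findAnsiCodePage("{\\rtf1\\ansi\\ansicpg932\\deff0"));
        System.out.println(findAnsiCodePage("{\\rtf1\\ansi\\deff0"));
    }
}
```

A reader that sees the keyword could then switch to a table without undefined entries, which is the idea behind the proposed change.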


Bug: https://bugs.openjdk.java.net/browse/JDK-8244324

webrev: http://cr.openjdk.java.net/~psadhukhan/8244324/webrev.0/

Note: I am not able to create a testcase for this, as it involves reading from an RTF file, which is probably copyrighted, and inserting Japanese characters as a string (instead of an RTF file) was not working.
