Hi,
I am having a problem when attempting to output a string containing Unicode
characters. If the character fits in a single byte (e.g., the Registered
Trademark symbol, U+00AE), it is output correctly. However, if the character
needs two bytes (e.g., the Trademark symbol (TM), U+2122), the string is
generated as UTF-16BE as expected, but when the output file is displayed, the
FE and FF BOM bytes and the 21 and 22 bytes are each drawn as separate
single-byte characters.
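To make the byte values above concrete, here is a minimal standalone sketch in plain Java with no PDFBox involved. The class and method names (UnicodeBytesSketch, encodeHex, fitsInLatin1) are made up for illustration, and using ISO-8859-1 as the "single byte" encoding is an assumption on my part; it only shows how the two characters encode, not what PDFBox does internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not part of PDFBox).
final class UnicodeBytesSketch
{
    // Encodes the text with the given charset and returns the bytes as
    // space-separated uppercase hex pairs.
    static String encodeHex(String text, Charset charset)
    {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(charset))
        {
            if (hex.length() > 0)
            {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    // True if every character of the text can be encoded in ISO-8859-1,
    // i.e. one byte per character (assumed stand-in for the single-byte case).
    static boolean fitsInLatin1(String text)
    {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args)
    {
        // U+00AE fits in one byte.
        System.out.println(encodeHex("\u00AE", StandardCharsets.ISO_8859_1));
        // U+2122 does not; Java's UTF-16 charset writes a big-endian BOM first.
        System.out.println(encodeHex("\u2122", StandardCharsets.UTF_16));
        System.out.println(fitsInLatin1("\u00AE") + " " + fitsInLatin1("\u2122"));
    }
}
```

The second line of output contains the same four bytes (FE FF 21 22) that appear to be drawn individually in the generated PDF.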
Is there something that I need to initialize to properly handle the UTF-16
characters (the most likely solution)? Is it a bug in PDFBox? Is it a quirk
in Reader X (least likely since I have seen the TM character being displayed
correctly in other documents)?
Any help and pointers on how to deal with this problem will be greatly
appreciated.
I am using PDFBox 1.8.2 and Adobe Reader X (Version 10.1.8) and here is a
simple program to demonstrate the problem:
package example;

import java.io.*;
import org.apache.pdfbox.exceptions.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.edit.*;
import org.apache.pdfbox.pdmodel.font.*;

public class PDFUnicodeExample
{
    public static void main(String[] args)
    {
        PDDocument document = null;
        try
        {
            document = new PDDocument();
            PDPage page = new PDPage();
            document.addPage(page);
            PDPageContentStream cs = new PDPageContentStream(document, page);
            PDFont font = PDType1Font.HELVETICA;

            // Single-byte case: U+00AE (Registered Trademark) is drawn correctly.
            cs.beginText();
            cs.setFont(font, 16.0f);
            cs.moveTextPositionByAmount(100, 700);
            cs.drawString("Reg TM \u00AE ");
            cs.endText();

            // Two-byte case: U+2122 (Trademark) shows up as FE FF 21 22.
            cs.beginText();
            cs.setFont(font, 16.0f);
            cs.moveTextPositionByAmount(100, 680);
            cs.drawString("TM \u2122 ");
            cs.endText();

            cs.close();
            document.save("Unicode Example.pdf");
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        catch (COSVisitorException e)
        {
            e.printStackTrace();
        }
        finally
        {
            // Release the document's resources whether or not saving succeeded.
            if (document != null)
            {
                try
                {
                    document.close();
                }
                catch (IOException e)
                {
                    e.printStackTrace();
                }
            }
        }
    }
}
Best regards,
--Glenn Karcher