Hello all,
We are currently working on a project that transforms XML into PDF - a bit
like FOP, but with specific requirements for a client. One of these
requirements is the ability to create PDF files containing Asian characters,
such as Chinese. This has turned out to be somewhat harder than expected.
The following is a simple example of what we want to achieve:
    PDTrueTypeFont font = PDTrueTypeFont.loadTTF(pdDocument,
        new File(System.getProperty("user.home") + "/temp/fonts/ARIALUNI.TTF"));
    PDPageContentStream contentStream =
        new PDPageContentStream(pdDocument, pdPage, false, false);
    try {
        contentStream.beginText();
        contentStream.setFont(font, 12);
        contentStream.moveTextPositionByAmount(100, 480);
        String s = "Here are some chinese characters: 官话";
        contentStream.drawString(s);
        contentStream.endText();
    } finally {
        contentStream.close();
    }
However, this does not work, for a number of reasons. First of all, the
font in the resulting PDF is written with WinAnsiEncoding, which is not
suitable for Chinese characters. This is somewhat fixable (though not
entirely):
    // Identity-H would probably be more accurate, not sure?
    font.setEncoding(new StandardEncoding());
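To show the encoding problem independently of PDFBox, here is a minimal sketch that uses only the JDK. It assumes that the JDK's windows-1252 charset is a close enough stand-in for PDF's WinAnsiEncoding; the class and method names (WinAnsiCheck, fitsWinAnsi) are made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class WinAnsiCheck {

    // Assumption: windows-1252 approximates PDF's WinAnsiEncoding.
    // Returns true only if every character of the text has a
    // single-byte code in that charset.
    static boolean fitsWinAnsi(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // Plain Latin text from the example string.
        System.out.println(fitsWinAnsi("Here are some chinese characters: "));
        // The two Chinese characters from the example (U+5B98, U+8BDD),
        // written as escapes so the source file encoding does not matter.
        System.out.println(fitsWinAnsi("\u5B98\u8BDD"));
    }
}
```

The Latin part of the string can be encoded, while the two Chinese characters cannot, so no single-byte encoding of this kind can carry the full string.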
Even with the encoding changed, it still does not work, because the PDF file
lacks a CIDSystemInfo stream that contains mapping information between UCS
and glyphs (from what I understand). It also seems to generate a character
width table only for the first 256 characters (I may be wrong there; it
wasn't too clear from the code what was going on :-).
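To make the width table concern concrete, here is a small sketch, again using only the JDK. It assumes a width table with one entry per single-byte character code (0 to 255), which is my reading of what gets generated; WidthTableCheck, TABLE_SIZE and outsideTable are hypothetical names, not PDFBox API.

```java
import java.util.List;
import java.util.stream.Collectors;

public class WidthTableCheck {

    // Assumption: the width table has one entry per code 0..255.
    static final int TABLE_SIZE = 256;

    // Returns the Unicode code points of the text that have no slot
    // in a table of TABLE_SIZE entries indexed by code point.
    static List<Integer> outsideTable(String text) {
        return text.codePoints()
                   .filter(cp -> cp >= TABLE_SIZE)
                   .boxed()
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 'A' fits in the table; U+5B98 and U+8BDD do not.
        for (int cp : outsideTable("A\u5B98\u8BDD")) {
            System.out.println(String.format("U+%04X has no width entry", cp));
        }
    }
}
```

Both Chinese characters from the example fall far outside such a table, so even with a suitable encoding their widths would be missing.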
I'm somewhat stuck at this point, because of the complexity of the problem.
How do I get PDFBox to generate the CIDSystemInfo stream? And if that is not
supported yet, what would be the best way to add it?
We are prepared to invest some time in this, so if any of the developers
have time to nudge us in the right direction it would be greatly
appreciated.
Kind regards,
Michael Berg