Hi,

In PDFBox 3.0, an IllegalStateException is thrown when I try to output
a character that is encoded as a surrogate pair.
Judging from the exception message, it seems that a single Kanji
character is being processed as two separate chars.
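
To illustrate what I mean by "one character, two chars", here is a small
sketch in plain Java (no PDFBox involved). The class name SurrogatePairDemo
and its two helper methods are made up for this illustration. It only shows
how Java itself represents the string used in the test code; it does not
reproduce what PDFBox does internally.

```java
public class SurrogatePairDemo {

    // Number of UTF-16 code units (Java chars) in the string.
    public static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points, i.e. characters as a reader sees them.
    public static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // The same string as in the test code: one Kanji, U+29E3D.
        String message = "\uD867\uDE3D";

        System.out.println("chars:       " + charCount(message));      // 2
        System.out.println("code points: " + codePointCount(message)); // 1
        System.out.printf("code point:  U+%X%n", message.codePointAt(0)); // U+29E3D

        // Each half of the pair is not a valid character on its own.
        System.out.println(Character.isHighSurrogate(message.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(message.charAt(1)));  // true
    }
}
```

If code looks up glyphs one char at a time instead of one code point at a
time, it would see two lone surrogates here, which might explain the "?" in
the exception message. That is only my guess.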

Is this a bug?
Is there a workaround I can apply in my own code?


**** Conditions ****
JDK: 21
PDFBox: 3.0.0 / 3.0.1 / 3.0.2
Font: Noto Sans Japanese (https://fonts.google.com/noto/specimen/Noto+Sans+JP)
Font and glyph preview: https://fonts.google.com/noto/specimen/Noto+Sans+JP?preview.text=%F0%A9%B8%BD

**** Test code ****
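package org.example;

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

public class App {
  public static void main(String[] args) throws IOException {

    final String fontPath = "NotoSansJP-Regular.ttf";
    final String out = "output.pdf";

    // Atka Mackerel in Japanese kanji (U+29E3D, encoded as a surrogate pair).
    final String message = "\uD867\uDE3D";

    try (PDDocument doc = new PDDocument()) {
      PDPage page = new PDPage();
      doc.addPage(page);
      // Load the TrueType font as an embedded Type 0 (composite) font.
      PDFont font = PDType0Font.load(doc, new File(fontPath));

      try (PDPageContentStream contents = new PDPageContentStream(doc, page)) {
        contents.beginText();
        contents.setFont(font, 64);
        contents.newLineAtOffset(100, 700);
        // The IllegalStateException is thrown from this call.
        contents.showText(message);
        contents.endText();
      }

      doc.save(out);
      System.out.println(out + " created!");
    }
  }
}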


**** StackTrace ****
Exception in thread "main" java.lang.IllegalStateException: could not find the glyphId for the character: ?
    at org.apache.pdfbox.pdmodel.PDAbstractContentStream.applyGSUBRules(PDAbstractContentStream.java:1651)
    at org.apache.pdfbox.pdmodel.PDAbstractContentStream.encodeForGsub(PDAbstractContentStream.java:1632)
    at org.apache.pdfbox.pdmodel.PDAbstractContentStream.showTextInternal(PDAbstractContentStream.java:302)
    at org.apache.pdfbox.pdmodel.PDAbstractContentStream.showText(PDAbstractContentStream.java:266)
    at org.apache.pdfbox.pdmodel.PDPageContentStream.showText(PDPageContentStream.java:37)
    at org.example.App.main(App.java:30)



My English isn't very good, so feel free to ask me if anything is unclear.

--
Toshiaki Ito
Mail: evolut...@1024kb.cx

