Hi,

I have an additional suggestion.
> throw new IllegalStateException(
>     "could not find the glyphId for the character: " + codePoint);

Before the fix, this message contained the character that caused the error. After the fix it contains the numeric code point value instead, which makes the cause harder to identify. I therefore changed the code to convert the code point back to the actual character and output that.

I also wrote a test (intended for TestFontEmbedding.java). LiberationSans-Regular.ttf does not contain Japanese characters, so the tests check that the exception is thrown and that it carries the expected message. The two test characters exercise different branches:

"あ" -> Character.isBmpCodePoint() == true
"𩸽" -> Character.isBmpCodePoint() == false, Character.isValidCodePoint() == true

**** update code PDAbstractContentStream.java applyGSUBRules ****

int glyphId = cmapLookup.getGlyphId(codePoint);
if (glyphId <= 0) {
    String source;
    if (Character.isBmpCodePoint(codePoint)) {
        // BMP code point: fits in a single char
        source = String.valueOf((char) codePoint);
    } else if (Character.isValidCodePoint(codePoint)) {
        // supplementary code point: becomes a surrogate pair
        source = new String(new int[]{codePoint}, 0, 1);
    } else {
        // not a Unicode code point at all
        source = "?";
    }
    throw new IllegalStateException(
            "could not find the glyphId for the character: " + source);
}
originalGlyphIds.add(glyphId);

**** Unit Test ****

@Test
void testSurrogatePairCharacterExceptionIsBmpCodePoint() throws IOException {
    final String message = "あ";
    try (PDDocument doc = new PDDocument()) {
        PDPage page = new PDPage();
        doc.addPage(page);
        PDFont font = PDType0Font.load(doc, this.getClass().getResourceAsStream(
                "/org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"));

        try (PDPageContentStream contents = new PDPageContentStream(doc, page)) {
            contents.beginText();
            contents.setFont(font, 64);
            contents.newLineAtOffset(100, 700);
            contents.showText(message);
            contents.endText();
        }
        // showText() must throw because the font has no glyph for the character
        fail();
    } catch (IllegalStateException e) {
        assertEquals("could not find the glyphId for the character: あ", e.getMessage());
    } catch (Exception e) {
        fail();
    }
}

@Test
void testSurrogatePairCharacterExceptionIsValidCodePoint() throws IOException {
    final String message = "𩸽";
    try (PDDocument doc = new PDDocument()) {
        PDPage page = new PDPage();
        doc.addPage(page);
        PDFont font = PDType0Font.load(doc, this.getClass().getResourceAsStream(
                "/org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"));

        try (PDPageContentStream contents = new PDPageContentStream(doc, page)) {
            contents.beginText();
            contents.setFont(font, 64);
            contents.newLineAtOffset(100, 700);
            contents.showText(message);
            contents.endText();
        }
        // showText() must throw because the font has no glyph for the character
        fail();
    } catch (IllegalStateException e) {
        assertEquals("could not find the glyphId for the character: 𩸽", e.getMessage());
    } catch (Exception e) {
        fail();
    }
}

On Sun, May 5, 2024 at 18:00, Toshiaki Ito <evolut...@1024kb.cx> wrote:
>
> Hi, Tilman.
>
> I used the snapshot "3.0.3-20240505.072852-59" and got the expected results!
> I also tried a few other Kanji characters besides "𩸽" and none of
> them had any problems!
>
> I am glad I could contribute :)
>
> On Sun, May 5, 2024 at 16:32, Tilman Hausherr <thaush...@t-online.de> wrote:
> >
> > Hello Toshiaki,
> >
> > It's been committed and available as a snapshot:
> > https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.3-SNAPSHOT/
> >
> > I've also added a test for the 2.0 version to avoid we break this in the
> > future.
> >
> > Thanks again
> > Tilman
> >
> > On 04.05.2024 22:06, Toshiaki Ito wrote:
> > > Hi, Tilman.
> > >
> > > Thank you for checking and correcting the attached code.
> > > I look forward to waiting for it to be committed!
> > >
> > > On Sun, May 5, 2024 at 2:05, Tilman Hausherr <thaush...@t-online.de> wrote:
> > >> Hello,
> > >>
> > >> I can confirm that your proposed change works, it also passes the
> > >> "private" tests that aren't in the repository. Thank you so much in
> > >> solving this! I'll commit these soon (probably tomorrow) and will report
> > >> it here. Another (smaller) good news is that one of the fonts we use for
> > >> tests (ipafont) has the glyph, I have prepared a small test also based
> > >> on your code.
> > >>
> > >> Tilman
> > >>
> > >> On 04.05.2024 16:39, Tilman Hausherr wrote:
> > >>> On 04.05.2024 15:21, Toshiaki Ito wrote:
> > >>>> By the way, with pdbox 2.0.31, the same code produces the expected
> > >>>> output.
> > >>> Ouch, I can confirm that. I have created a new ticket:
> > >>>
> > >>> https://issues.apache.org/jira/browse/PDFBOX-5812
> > >>>
> > >>> Tilman
> > >>>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail:users-unsubscr...@pdfbox.apache.org
> > >> For additional commands, e-mail:users-h...@pdfbox.apache.org
> > >>
> > >
> >
>
> --
> Toshiaki Ito
> Mail: evolut...@1024kb.cx

--
Toshiaki Ito
Mail: evolut...@1024kb.cx

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org
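
A side note on the code point to text conversion in the proposed change: the BMP and supplementary branches could probably be merged, since Character.toChars returns a single char for a BMP code point and a surrogate pair for a supplementary one. Below is a minimal standalone sketch of that idea. The class name CodePointText and the method describe are hypothetical names used only for illustration; they are not part of PDFBox, and this is not necessarily what was committed.

```java
// Hypothetical helper, for illustration only (not PDFBox code).
final class CodePointText {

    private CodePointText() {
    }

    /**
     * Returns the text for a code point, or "?" when the value is not a
     * valid Unicode code point (negative or above U+10FFFF).
     */
    static String describe(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return "?";
        }
        // One char for BMP code points, two chars (a surrogate pair) otherwise.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // U+3042 is in the BMP; U+29E3D is a supplementary code point.
        System.out.println(describe(0x3042));
        System.out.println(describe(0x29E3D));
        // -1 is not a code point, so the fallback text is used.
        System.out.println(describe(-1));
    }
}
```

With this shape the exception message would be built as "could not find the glyphId for the character: " + CodePointText.describe(codePoint), and the two tests above would still expect the same messages.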