Hi,

Here are some additional suggestions.

> throw new IllegalStateException(
> "could not find the glyphId for the character: " + codePoint);

Before the fix, this message showed the character that caused the error.
After the fix, however, it shows the numeric code point instead, which
makes the cause harder to understand.
I therefore changed the code to convert the code point back into the
actual character and output that.
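To illustrate why the conversion needs care, here is a small standalone sketch (the class and method names are hypothetical, not PDFBox API). It compares a plain `(char)` cast with the `new String(int[], int, int)` constructor used in the patch, for the supplementary code point U+29E3D ("𩸽"). The cast silently keeps only the low 16 bits.

```java
// Hypothetical demo class, not part of PDFBox.
public class CodePointCastDemo
{
    // What (char) does: keeps only the low 16 bits of the code point.
    static String viaCharCast(int codePoint)
    {
        return String.valueOf((char) codePoint);
    }

    // Encodes the code point as one char, or as a surrogate pair if needed.
    static String viaIntArray(int codePoint)
    {
        return new String(new int[] {codePoint}, 0, 1);
    }

    public static void main(String[] args)
    {
        int hokke = 0x29E3D; // U+29E3D, outside the BMP

        // 0x29E3D truncated to 16 bits is 0x9E3D, an unrelated character.
        String truncated = viaCharCast(hokke);
        // The int[] constructor produces the correct 2-char surrogate pair.
        String correct = viaIntArray(hokke);

        System.out.printf("cast:       U+%04X%n", truncated.codePointAt(0));
        System.out.printf("int[] ctor: U+%X (length %d)%n",
                correct.codePointAt(0), correct.length());
    }
}
```

For BMP characters such as "あ" (U+3042) both approaches give the same result, which is why the cast is only used in the `isBmpCodePoint` branch.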

I also wrote two tests (intended to be added to TestFontEmbedding.java).
Because LiberationSans-Regular.ttf does not contain Japanese characters,
they check that an exception is thrown and that it carries the expected message.


"あ" (U+3042) -> Character.isBmpCodePoint() == true
"𩸽" (U+29E3D) -> Character.isBmpCodePoint() == false, Character.isValidCodePoint() == true

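The classification above can be checked with plain JDK calls. A minimal sketch, assuming the test strings are written as Unicode escapes ("\u3042" for "あ", "\uD867\uDE3D" for "𩸽") so the file compiles regardless of source encoding; `branch` is a hypothetical helper that mirrors the if/else in the patch.

```java
// Hypothetical helper class, not part of PDFBox.
public class CodePointClassifier
{
    // Returns which branch of the patched if/else a code point would take.
    static String branch(int codePoint)
    {
        if (Character.isBmpCodePoint(codePoint))
        {
            return "BMP";
        }
        else if (Character.isValidCodePoint(codePoint))
        {
            return "supplementary";
        }
        return "invalid";
    }

    public static void main(String[] args)
    {
        String[] samples = {"\u3042", "\uD867\uDE3D"}; // "あ" and "𩸽"
        for (String s : samples)
        {
            // codePointAt combines a surrogate pair into one code point
            int cp = s.codePointAt(0);
            System.out.printf("U+%04X -> %s, UTF-16 length %d%n",
                    cp, branch(cp), s.length());
        }
        System.out.println(branch(-1)); // prints "invalid"
    }
}
```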

**** Updated code: PDAbstractContentStream.java, applyGSUBRules() ****

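            int glyphId = cmapLookup.getGlyphId(codePoint);
            if (glyphId <= 0)
            {
                // Convert the code point back into a readable string for the message
                String source;
                if (Character.isBmpCodePoint(codePoint))
                {
                    // BMP character (e.g. "あ"): fits in a single char
                    source = String.valueOf((char) codePoint);
                }
                else if (Character.isValidCodePoint(codePoint))
                {
                    // supplementary character (e.g. "𩸽"): encoded as a surrogate pair
                    source = new String(new int[]{codePoint}, 0, 1);
                }
                else
                {
                    // not a valid Unicode code point
                    source = "?";
                }
                throw new IllegalStateException(
                        "could not find the glyphId for the character: " + source);
            }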
            originalGlyphIds.add(glyphId);
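As a possible simplification (only a suggestion, not what was proposed or committed): `Character.toChars` already returns one char for BMP code points and a surrogate pair for supplementary ones, so the first two branches could be merged. The class and method names below are hypothetical. `Character.toString(int)` would be shorter, but it requires Java 11, while PDFBox 3.0 still targets Java 8.

```java
// Hypothetical helper class, not part of PDFBox.
public class GlyphErrorMessage
{
    static String displayString(int codePoint)
    {
        // toChars throws IllegalArgumentException for invalid code points,
        // so validate first and fall back to "?" like the patch does.
        return Character.isValidCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : "?";
    }

    static String message(int codePoint)
    {
        return "could not find the glyphId for the character: " + displayString(codePoint);
    }

    public static void main(String[] args)
    {
        System.out.println(message(0x3042));  // "あ"
        System.out.println(message(0x29E3D)); // "𩸽"
        System.out.println(message(-1));      // "?"
    }
}
```

The resulting message text is the same as with the three-branch version, so the tests below should pass with either.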


**** Unit Tests ****

    @Test
    void testSurrogatePairCharacterExceptionIsBmpCodePoint() throws IOException
    {
        final String message = "あ";

        try (PDDocument doc = new PDDocument())
        {
            PDPage page = new PDPage();
            doc.addPage(page);
            PDFont font = PDType0Font.load(doc, this.getClass().getResourceAsStream(
                    "/org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"));

            try (PDPageContentStream contents = new PDPageContentStream(doc, page))
            {
                contents.beginText();
                contents.setFont(font, 64);
                contents.newLineAtOffset(100, 700);
                // LiberationSans has no glyph for "あ", so this should throw
                contents.showText(message);
                contents.endText();
            }

            fail("IllegalStateException expected");
        }
        catch (IllegalStateException e)
        {
            assertEquals("could not find the glyphId for the character: あ", e.getMessage());
        }
        catch (Exception e)
        {
            fail("unexpected exception", e);
        }
    }

    @Test
    void testSurrogatePairCharacterExceptionIsValidCodePoint() throws IOException
    {
        final String message = "𩸽";

        try (PDDocument doc = new PDDocument())
        {
            PDPage page = new PDPage();
            doc.addPage(page);
            PDFont font = PDType0Font.load(doc, this.getClass().getResourceAsStream(
                    "/org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"));

            try (PDPageContentStream contents = new PDPageContentStream(doc, page))
            {
                contents.beginText();
                contents.setFont(font, 64);
                contents.newLineAtOffset(100, 700);
                // LiberationSans has no glyph for "𩸽" (a surrogate pair), so this should throw
                contents.showText(message);
                contents.endText();
            }

            fail("IllegalStateException expected");
        }
        catch (IllegalStateException e)
        {
            assertEquals("could not find the glyphId for the character: 𩸽", e.getMessage());
        }
        catch (Exception e)
        {
            fail("unexpected exception", e);
        }
    }

On Sun, May 5, 2024 at 18:00, Toshiaki Ito <evolut...@1024kb.cx> wrote:
>
> Hi, Tilman.
>
> I used the snapshot "3.0.3-20240505.072852-59" and got the expected results!
> I also tried a few other Kanji characters besides "𩸽" and none of
> them had any problems!
>
> I am glad I could contribute :)
>
> On Sun, May 5, 2024 at 16:32, Tilman Hausherr <thaush...@t-online.de> wrote:
> >
> > Hello Toshiaki,
> >
> > It's been committed and available as a snapshot:
> > https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.3-SNAPSHOT/
> >
> > I've also added a test for the 2.0 version so that we don't break this in
> > the future.
> >
> > Thanks again
> > Tilman
> >
> > On 04.05.2024 22:06, Toshiaki Ito wrote:
> > > Hi, Tilman.
> > >
> > > Thank you for checking and correcting the attached code.
> > > I look forward to seeing it committed!
> > >
> > > On Sun, May 5, 2024 at 2:05, Tilman Hausherr <thaush...@t-online.de> wrote:
> > >> Hello,
> > >>
> > >> I can confirm that your proposed change works; it also passes the
> > >> "private" tests that aren't in the repository. Thank you so much for
> > >> solving this! I'll commit these soon (probably tomorrow) and will report
> > >> it here. Another (smaller) piece of good news is that one of the fonts we
> > >> use for tests (ipafont) has the glyph, so I have also prepared a small
> > >> test based on your code.
> > >>
> > >> Tilman
> > >>
> > >> On 04.05.2024 16:39, Tilman Hausherr wrote:
> > >>> On 04.05.2024 15:21, Toshiaki Ito wrote:
> > >>>> By the way, with PDFBox 2.0.31, the same code produces the expected
> > >>>> output.
> > >>> Ouch, I can confirm that. I have created a new ticket:
> > >>>
> > >>> https://issues.apache.org/jira/browse/PDFBOX-5812
> > >>>
> > >>> Tilman
> > >>>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> > >> For additional commands, e-mail: users-h...@pdfbox.apache.org
> > >>
> > >
>
>
>
> --
> Toshiaki Ito
> Mail: evolut...@1024kb.cx



-- 
Toshiaki Ito
Mail: evolut...@1024kb.cx
