Hello,

Since PDFBox 3.0.2, we get an OutOfMemoryError when PDPageContentStream.showText(String) is used with a TrueType font (PDType0Font) to draw text containing at least one space character.
Here is an example of code causing the error:

    // Considering doc is a valid PDDocument and ttcUrl points to a TrueType collection (.ttc) containing "Inter-Regular".
    File fontFile = new File(ttcUrl.getFile());
    TrueTypeCollection ttc = new TrueTypeCollection(fontFile);
    var regularFont = PDType0Font.load(doc, ttc.getFontByName("Inter-Regular"), true);
    // Considering cs is a valid PDPageContentStream object.
    cs.beginText();
    cs.newLineAtOffset(10, 500);
    cs.setFont(regularFont, 12);
    cs.showText("This an example of sentence containing spaces"); // <--- The error is thrown here.
    cs.endText();

Here is the stack trace:

    java.lang.OutOfMemoryError: Java heap space
        at java.base/java.lang.StringLatin1.newString(StringLatin1.java:752)
        at java.base/java.lang.String.substring(String.java:2839)
        at java.base/java.lang.String.subSequence(String.java:2872)
        at java.base/java.util.regex.Matcher.getSubSequence(Matcher.java:1819)
        at java.base/java.util.regex.Matcher.group(Matcher.java:691)
        at java.base/java.util.regex.Matcher.group(Matcher.java:645)
        at org.apache.fontbox.ttf.gsub.CompoundCharacterTokenizer.tokenize(CompoundCharacterTokenizer.java:108)
        at org.apache.pdfbox.pdmodel.PDAbstractContentStream.encodeForGsub(PDAbstractContentStream.java:1621)
        at org.apache.pdfbox.pdmodel.PDAbstractContentStream.showTextInternal(PDAbstractContentStream.java:303)
        at org.apache.pdfbox.pdmodel.PDAbstractContentStream.showText(PDAbstractContentStream.java:267)
        at org.apache.pdfbox.pdmodel.PDPageContentStream.showText(PDPageContentStream.java:37)
        [...]
After investigating, the problem appears to come from the changes introduced by the fix for https://issues.apache.org/jira/browse/PDFBOX-5808 in the method org.apache.fontbox.ttf.gsub.CompoundCharacterTokenizer.tokenize(String), especially these lines: https://github.com/apache/pdfbox/blob/257ae934e676ab8177797f8b265213df3ffbd54c/fontbox/src/main/java/org/apache/fontbox/ttf/gsub/CompoundCharacterTokenizer.java#L113-L118

PDAbstractContentStream.encodeForGsub() instantiates CompoundCharacterTokenizer with the pattern "\s". After each single-space match, the check `lastIndexOfPrevMatch < text.length() && text.charAt(lastIndexOfPrevMatch) != '_'` is true (the character following the space is not '_'), so lastIndexOfPrevMatch is moved back onto the space that was just matched. The next search finds that same space again, so the loop never terminates and keeps creating new tokens until the heap is exhausted.

A possible solution could be to not decrease lastIndexOfPrevMatch when the CompoundCharacterTokenizer has been initialized via the constructor CompoundCharacterTokenizer(Pattern pattern).

Thanks,
Maxime Wiewiora
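To make the loop easier to see, here is a simplified, hypothetical model of the find/resume logic described above. It is not the PDFBox source: the class TokenizerLoopModel, the backOff flag (standing in for "step back by one after a match") and the maxTokens cap are assumptions for illustration only. The cap exists so the demo returns null instead of exhausting the heap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of a regex tokenizer loop; NOT the actual PDFBox code.
public class TokenizerLoopModel {

    // Splits text into the pieces between matches plus the matches themselves.
    // backOff: if true, move the resume index back by one after a match whenever
    //          the next character is not '_' (the behaviour suspected above).
    // maxTokens: safety cap; returns null if exceeded (i.e. the loop did not end).
    static List<String> tokenize(String text, Pattern pattern, boolean backOff, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        int lastIndexOfPrevMatch = 0;
        // find(int) resets the matcher and searches from the given index.
        while (m.find(lastIndexOfPrevMatch)) {
            if (tokens.size() > maxTokens) {
                return null; // the same match keeps being found: infinite loop
            }
            String prevToken = text.substring(lastIndexOfPrevMatch, m.start());
            if (!prevToken.isEmpty()) {
                tokens.add(prevToken);
            }
            tokens.add(m.group());
            lastIndexOfPrevMatch = m.end();
            if (backOff && lastIndexOfPrevMatch < text.length()
                    && text.charAt(lastIndexOfPrevMatch) != '_') {
                // For a one-character match like "\s", end - 1 == start,
                // so the next find() starts on the space just matched.
                lastIndexOfPrevMatch--;
            }
        }
        // Add whatever follows the last match.
        if (lastIndexOfPrevMatch < text.length()) {
            tokens.add(text.substring(lastIndexOfPrevMatch));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Pattern whitespace = Pattern.compile("\\s");
        String text = "This an example";
        System.out.println(tokenize(text, whitespace, true, 1000));  // prints "null"
        System.out.println(tokenize(text, whitespace, false, 1000)); // prints "[This,  , an,  , example]"
    }
}
```

With backOff enabled, the model re-matches the first space forever, which mirrors the unbounded growth seen in the stack trace; without it, the text is split normally. This only shows why skipping the decrement for a plain "\s" pattern would end the loop; whether the back-off is still needed for the GSUB glyph-id patterns is a separate question.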