Hello,

We parse a lot of PDFs in a server application (to generate thumbnails with PageDrawer). Occasionally something goes wrong and we get a lot of NPEs here:

java.lang.NullPointerException: null
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getPath(PDTrueTypeFont.java:470)
    at org.apache.pdfbox.rendering.TTFGlyph2D.getPathForGID(TTFGlyph2D.java:144)
    at org.apache.pdfbox.rendering.TTFGlyph2D.getPathForCharacterCode(TTFGlyph2D.java:93)
    at org.apache.pdfbox.rendering.PageDrawer.drawGlyph2D(PageDrawer.java:495)
    at org.apache.pdfbox.rendering.PageDrawer.showFontGlyph(PageDrawer.java:476)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showGlyph(PDFStreamEngine.java:787)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showGlyph(PDFStreamEngine.java:805)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:743)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:606)
    at org.apache.pdfbox.contentstream.operator.text.ShowText.process(ShowText.java:56)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:514)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:492)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:277)

Line PDTrueTypeFont.java:470 is:

    GlyphData glyph = ttf.getGlyph().getGlyph(gid);

The NPE occurs because ttf.getGlyph() returns null: the font has no 'glyf' table?!

This happens with some specific fonts, but I did not check whether they are always the same ones. The problem may arise at server startup, although I am not sure. When it arises, we restart the server, but this is very annoying.

I suspected a corrupted font file, but this is not the case: I tried to load that font file manually with pdfbox and it works fine. The problem may have appeared with the jdk8 -> jdk11 migration or the pdfbox-2.0.22 -> 2.0.24 migration.

We took a heap dump of the JVM in order to inspect the FontCache content. Here is an extract. First, a correct value, for font Arial_Bold.ttf:

Class Name | Shallow Heap | Retained Heap
--------------------------------------------------------------------------------
[61] java.util.concurrent.ConcurrentHashMap$Node @ 0x712b73c50 | 32 | 144
|- <class> class java.util.concurrent.ConcurrentHashMap$Node @ 0x700262200 System Class | 0 | 0
|- key org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo @ 0x70e3e1778 | 56 | 160
| |- <class> class org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo @ 0x70e3da518 | 8 | 280
| |- format org.apache.pdfbox.pdmodel.font.FontFormat @ 0x70e3dacf0 | 24 | 24
| |- parent org.apache.pdfbox.pdmodel.font.FileSystemFontProvider @ 0x70e3dbf48 | 24 | 7 712
| |- postScriptName java.lang.String @ 0x70e3e17b0 Arial-BoldMT | 24 | 56
| |- panose org.apache.pdfbox.pdmodel.font.PDPanoseClassification @ 0x70e3e17e8 | 16 | 48
| |- file java.io.File @ 0x70e3e1818 | 32 | 32
| | |- <class> class java.io.File @ 0x70032e130 System Class | 56 | 14 496
| | |- status java.io.File$PathStatus @ 0x7002669c8 | 24 | 24
| | |- path java.lang.String @ 0x70e3e1838 /usr/share/fonts/truetype/msttcorefonts/Arial_Bold.ttf | 24 | 96
| | '- Total: 3 entries | |
| '- Total: 6 entries | |
|- next java.util.concurrent.ConcurrentHashMap$Node @ 0x712b73c70 | 32 | 72
|- val java.lang.ref.SoftReference @ 0x7198c9628 | 40 | 40
| |- <class> class java.lang.ref.SoftReference @ 0x70010e340 System Class | 8 | 8
| |- queue java.lang.ref.ReferenceQueue$Null @ 0x70026cb48 | 32 | 48
| |- referent org.apache.fontbox.ttf.TrueTypeFont @ 0x7196732a0 | 40 | 485 816
| | |- <class> class org.apache.fontbox.ttf.TrueTypeFont @ 0x70edd2df0 | 8 | 400
| | |- tables java.util.HashMap @ 0x7196732c8 | 48 | 360 496
| | | |- <class> class java.util.HashMap @ 0x7003279e8 System Class | 40 | 144
| | | |- table java.util.HashMap$Node[32] @ 0x7196732f8 | 144 | 360 448
| | | '- Total: 2 entries | |
| | |- postScriptNames java.util.HashMap @ 0x719827048 | 48 | 69 008
| | |- data org.apache.fontbox.ttf.RAFDataStream @ 0x719c0a6d8 | 24 | 24
| | |- enabledGsubFeatures java.util.ArrayList @ 0x719c0a770 | 24 | 24
--------------------------------------------------------------------------------

The "tables" entry above (java.util.HashMap @ 0x7196732c8) is the TrueTypeFont.tables field.
It has the following fields:

Type  | Name       | Value
------------------------------------------------------------
float | loadFactor | 0.75
int   | threshold  | 24
int   | modCount   | 23
int   | size       | 23
ref   | entrySet   | null
ref   | table      | java.util.HashMap$Node[32] @ 0x7196732f8
ref   | values     | null
ref   | keySet     | null
------------------------------------------------------------

Indeed, I checked that there are 23 tables in font Arial_Bold.ttf.

But here is another entry in the font cache, the one that triggers the bug:

Class Name | Shallow Heap | Retained Heap
--------------------------------------------------------------------------------
key org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo @ 0x70e3e19b8 | 56 | 168
|- <class> class org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo @ 0x70e3da518 | 8 | 280
|- format org.apache.pdfbox.pdmodel.font.FontFormat @ 0x70e3dacf0 | 24 | 24
|- parent org.apache.pdfbox.pdmodel.font.FileSystemFontProvider @ 0x70e3dbf48 | 24 | 7 712
|- postScriptName java.lang.String @ 0x70e3e19f0 Arial-BoldItalicMT | 24 | 64
|- panose org.apache.pdfbox.pdmodel.font.PDPanoseClassification @ 0x70e3e1a30 | 16 | 48
|- file java.io.File @ 0x70e3e1a60 | 32 | 32
| |- <class> class java.io.File @ 0x70032e130 System Class | 56 | 14 496
| |- status java.io.File$PathStatus @ 0x7002669c8 | 24 | 24
| |- path java.lang.String @ 0x70e3e1a80 /usr/share/fonts/truetype/msttcorefonts/Arial_Bold_Italic.ttf | 24 | 104
| '- Total: 3 entries | |
'- Total: 6 entries | |
val java.lang.ref.SoftReference @ 0x71a904010 | 40 | 40
|- <class> class java.lang.ref.SoftReference @ 0x70010e340 System Class | 8 | 8
|- queue java.lang.ref.ReferenceQueue$Null @ 0x70026cb48 | 32 | 48
|- referent org.apache.fontbox.ttf.TrueTypeFont @ 0x71a904038 | 40 | 392
| |- <class> class org.apache.fontbox.ttf.TrueTypeFont @ 0x70edd2df0 | 8 | 400
| |- tables java.util.HashMap @ 0x71a904060 | 48 | 304
| | |- <class> class java.util.HashMap @ 0x7003279e8 System Class | 40 | 144
| | |- table java.util.HashMap$Node[16] @ 0x71a904090 | 80 | 256
| | | |- <class> class java.util.HashMap$Node[] @ 0x70025d338 | 0 | 0
| | | |- [0] java.util.HashMap$Node @ 0x71a9040e0 | 32 | 176
| | | | |- <class> class java.util.HashMap$Node @ 0x70025d2c8 System Class | 8 | 32
| | | | |- key java.lang.String @ 0x71a904100 \u0000\u0000\u0000\u0000 | 24 | 48
| | | | | |- <class> class java.lang.String @ 0x7002a3c00 System Class, JNI Global | 24 | 704
| | | | | |- value byte[4] @ 0x71a904118 .... | 24 | 24
| | | | | | '- <class> class byte[] @ 0x7002a42a8 | 0 | 0
| | | | | '- Total: 2 entries | |
| | | | |- value org.apache.fontbox.ttf.TTFTable @ 0x71a904130 | 48 | 96
| | | | | |- <class> class org.apache.fontbox.ttf.TTFTable @ 0x70edd4450 | 0 | 0
| | | | | |- font org.apache.fontbox.ttf.TrueTypeFont @ 0x71a904038 | 40 | 392
| | | | | |- tag java.lang.String @ 0x71a904160 \u0000\u0000\u0000\u0000 | 24 | 48
| | | | | '- Total: 3 entries | |
| | | | '- Total: 3 entries | |
| | | '- Total: 2 entries | |
| | '- Total: 2 entries | |
| |- data org.apache.fontbox.ttf.RAFDataStream @ 0x71a904190 | 24 | 24
| |- enabledGsubFeatures java.util.ArrayList @ 0x71a9041a8 | 24 | 24
--------------------------------------------------------------------------------

Here the "tables" map (java.util.HashMap @ 0x71a904060) contains only 1 table. It has the following fields:

Type  | Name       | Value
------------------------------------------------------------
float | loadFactor | 0.75
int   | threshold  | 12
int   | modCount   | 1
int   | size       | 1
ref   | entrySet   | null
ref   | table      | java.util.HashMap$Node[16] @ 0x71a904090
ref   | values     | null
ref   | keySet     | null
------------------------------------------------------------

modCount=1, so I guess put() was called only once?
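
One caveat on that guess: java.util.HashMap increments modCount only on structural modifications (adding or removing a mapping). Replacing the value of a key that is already present does not count. So size=1 and modCount=1 would also fit several put() calls that all used the same key, with the last value winning. The sketch below only illustrates that HashMap behaviour. The class SameKeyOverwrite and its Rec type are hypothetical names, not PDFBox code, and nothing here proves that this is what happened in our JVM.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of HashMap behaviour; this is not PDFBox code.
class SameKeyOverwrite {
    /** Stand-in for a table record; only the fields needed for the demo. */
    static final class Rec {
        final long offset;
        final long length;

        Rec(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Registers records in a map keyed by tag, one put() per record. */
    static Map<String, Rec> register(String[] tags, long[] offsets, long[] lengths) {
        Map<String, Rec> tables = new HashMap<>();
        for (int i = 0; i < tags.length; i++) {
            // If the key is already present, put() only replaces the value.
            // No mapping is added, so this is not a structural modification.
            tables.put(tags[i], new Rec(offsets[i], lengths[i]));
        }
        return tables;
    }

    public static void main(String[] args) {
        String nul = "\0\0\0\0"; // a tag made of four NUL characters
        // Offsets and lengths of 'name', 'post' and 'prep' from the hex dump
        // in this report, all registered under the same (assumed) bad tag.
        Map<String, Rec> tables = register(
                new String[] {nul, nul, nul},
                new long[] {560L, 206692L, 22104L},
                new long[] {7932L, 7448L, 3124L});
        Rec survivor = tables.get(nul);
        System.out.println("size=" + tables.size()
                + " offset=" + survivor.offset
                + " length=" + survivor.length);
    }
}
```

If something like this happened, the single surviving entry would carry the values of whichever record was registered last, while the map would still look as if put() had run once.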

If we look at that single TTFTable, whose strange tag consists of 4 NUL characters (the "value" entry above, org.apache.fontbox.ttf.TTFTable @ 0x71a904130):

Type    | Name        | Value
----------------------------------------------------------------------
ref     | font        | org.apache.fontbox.ttf.TrueTypeFont @ 0x71a904038
boolean | initialized | false
long    | length      | 3124
long    | offset      | 22104
long    | checkSum    | 3617481689
ref     | tag         | \u0000\u0000\u0000\u0000
----------------------------------------------------------------------

It appears that 3124 is the length of the last table in file Arial_Bold_Italic.ttf, and likewise 22104 is its offset. This is a binary dump of the beginning of file Arial_Bold_Italic.ttf:

00000000  00 01 00 00 00 14 01 00 00 04 00 40 44 53 49 47  |...........@DSIG|
00000010  f0 30 30 da 00 03 58 00 00 00 15 b4 4c 54 53 48  |.00...X.....LTSH|
00000020  f1 da 51 07 00 00 41 04 00 00 03 c0 4f 53 2f 32  |..Q...A.....OS/2|
00000030  9d c2 94 0f 00 00 01 c8 00 00 00 56 50 43 4c 54  |...........VPCLT|
00000040  73 c0 41 82 00 03 57 c8 00 00 00 36 56 44 4d 58  |s.A...W....6VDMX|
00000050  55 34 6f 96 00 00 44 c4 00 00 11 94 63 6d 61 70  |U4o...D.....cmap|
00000060  e3 18 5f 9b 00 00 21 2c 00 00 10 e2 63 76 74 20  |.._...!,....cvt |
00000070  4d fc 64 93 00 00 67 f8 00 00 06 9c 66 70 67 6d  |M.d...g.....fpgm|
00000080  57 78 09 53 00 00 62 8c 00 00 05 6b 67 61 73 70  |Wx.S..b....kgasp|
00000090  00 12 00 09 00 00 02 20 00 00 00 10 67 6c 79 66  |....... ....glyf|
000000a0  8e 29 b0 1f 00 00 d7 8c 00 02 4f d8 68 64 6d 78  |.)........O.hdmx|
000000b0  3c d8 97 63 00 00 7d 84 00 00 5a 08 68 65 61 64  |<..c..}...Z.head|
000000c0  c4 87 8e 35 00 00 01 4c 00 00 00 36 68 68 65 61  |...5...L...6hhea|
000000d0  0c 52 08 bb 00 00 01 84 00 00 00 24 68 6d 74 78  |.R.........$hmtx|
000000e0  b7 fa fd f4 00 00 6e 94 00 00 0e f0 6b 65 72 6e  |......n.....kern|
000000f0  b4 ed b4 bc 00 03 44 7c 00 00 13 4a 6c 6f 63 61  |......D|...Jloca|
00000100  04 db c8 d8 00 00 32 10 00 00 0e f4 6d 61 78 70  |......2.....maxp|
00000110  09 32 0d be 00 00 01 a8 00 00 00 20 6e 61 6d 65  |.2......... name|
00000120  6c 8f 82 f6 00 00 02 30 00 00 1e fc 70 6f 73 74  |l......0....post|
00000130  c4 ff 0a 6f 00 03 27 64 00 00 1d 18 70 72 65 70  |...o..'d....prep|
00000140  d7 9e 63 d9 00 00 56 58 00 00 0c 34 00 01 00 00  |..c...VX...4....|

0x14 = 20 tables; the last one is 'prep'. prep has offset 0x00005658 = 22104 and size 0x00000c34 = 3124.

So here I am: from inspecting the code, I cannot see how this is possible. I suspect a race condition during font loading.

Any ideas?

Regards,
MM

- pdfbox-2.0.24
- openjdk version "11.0.13" 2021-10-19
  OpenJDK Runtime Environment Temurin-11.0.13+8 (build 11.0.13+8)
  OpenJDK 64-Bit Server VM Temurin-11.0.13+8 (build 11.0.13+8, mixed mode)
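
PS: to double-check the hand decoding of the hex dump, here is a minimal standalone sketch that parses the table directory from those same bytes using only the JDK. The class name TableDirectoryDump and its Entry type are made up for this post, and this is not how PDFBox reads fonts. It assumes the standard sfnt layout: a 12-byte header with the table count at byte 4, followed by 16-byte records (tag, checksum, offset, length), all big-endian.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for this report; not part of PDFBox or FontBox.
class TableDirectoryDump {
    /** One 16-byte table record. */
    static final class Entry {
        final String tag;
        final long checkSum;
        final long offset;
        final long length;

        Entry(String tag, long checkSum, long offset, long length) {
            this.tag = tag;
            this.checkSum = checkSum;
            this.offset = offset;
            this.length = length;
        }
    }

    // First 0x150 bytes of Arial_Bold_Italic.ttf, copied from the dump above.
    static final String HEX =
          "00 01 00 00 00 14 01 00 00 04 00 40 44 53 49 47 "
        + "f0 30 30 da 00 03 58 00 00 00 15 b4 4c 54 53 48 "
        + "f1 da 51 07 00 00 41 04 00 00 03 c0 4f 53 2f 32 "
        + "9d c2 94 0f 00 00 01 c8 00 00 00 56 50 43 4c 54 "
        + "73 c0 41 82 00 03 57 c8 00 00 00 36 56 44 4d 58 "
        + "55 34 6f 96 00 00 44 c4 00 00 11 94 63 6d 61 70 "
        + "e3 18 5f 9b 00 00 21 2c 00 00 10 e2 63 76 74 20 "
        + "4d fc 64 93 00 00 67 f8 00 00 06 9c 66 70 67 6d "
        + "57 78 09 53 00 00 62 8c 00 00 05 6b 67 61 73 70 "
        + "00 12 00 09 00 00 02 20 00 00 00 10 67 6c 79 66 "
        + "8e 29 b0 1f 00 00 d7 8c 00 02 4f d8 68 64 6d 78 "
        + "3c d8 97 63 00 00 7d 84 00 00 5a 08 68 65 61 64 "
        + "c4 87 8e 35 00 00 01 4c 00 00 00 36 68 68 65 61 "
        + "0c 52 08 bb 00 00 01 84 00 00 00 24 68 6d 74 78 "
        + "b7 fa fd f4 00 00 6e 94 00 00 0e f0 6b 65 72 6e "
        + "b4 ed b4 bc 00 03 44 7c 00 00 13 4a 6c 6f 63 61 "
        + "04 db c8 d8 00 00 32 10 00 00 0e f4 6d 61 78 70 "
        + "09 32 0d be 00 00 01 a8 00 00 00 20 6e 61 6d 65 "
        + "6c 8f 82 f6 00 00 02 30 00 00 1e fc 70 6f 73 74 "
        + "c4 ff 0a 6f 00 03 27 64 00 00 1d 18 70 72 65 70 "
        + "d7 9e 63 d9 00 00 56 58 00 00 0c 34 00 01 00 00";

    /** Converts space-separated hex byte pairs into a byte array. */
    static byte[] fromHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    /** Reads the table count from the header, then one Entry per record. */
    static List<Entry> parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        buf.position(4);                        // skip the 4-byte sfnt version
        int numTables = buf.getShort() & 0xFFFF;
        buf.position(12);                       // skip searchRange, entrySelector, rangeShift
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < numTables; i++) {
            byte[] tag = new byte[4];
            buf.get(tag);
            long checkSum = buf.getInt() & 0xFFFFFFFFL; // treat as unsigned 32-bit
            long offset = buf.getInt() & 0xFFFFFFFFL;
            long length = buf.getInt() & 0xFFFFFFFFL;
            entries.add(new Entry(new String(tag, StandardCharsets.ISO_8859_1),
                    checkSum, offset, length));
        }
        return entries;
    }

    public static void main(String[] args) {
        for (Entry e : parse(fromHex(HEX))) {
            System.out.printf("%-4s checkSum=%d offset=%d length=%d%n",
                    e.tag, e.checkSum, e.offset, e.length);
        }
    }
}
```

Decoded this way, the final record is 'prep' with offset 22104 and length 3124, and its checksum 0xd79e63d9 is 3617481689 in decimal. So all three numeric fields of the broken TTFTable in the heap dump match 'prep'; only the tag differs.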