Hello,
We parse a lot of PDFs in a server application (to generate thumbnails with 
PageDrawer). Occasionally something goes wrong and we then get a lot of NPEs 
here:
java.lang.NullPointerException: null
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getPath(PDTrueTypeFont.java:470)
    at org.apache.pdfbox.rendering.TTFGlyph2D.getPathForGID(TTFGlyph2D.java:144)
    at org.apache.pdfbox.rendering.TTFGlyph2D.getPathForCharacterCode(TTFGlyph2D.java:93)
    at org.apache.pdfbox.rendering.PageDrawer.drawGlyph2D(PageDrawer.java:495)
    at org.apache.pdfbox.rendering.PageDrawer.showFontGlyph(PageDrawer.java:476)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showGlyph(PDFStreamEngine.java:787)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showGlyph(PDFStreamEngine.java:805)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:743)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:606)
    at org.apache.pdfbox.contentstream.operator.text.ShowText.process(ShowText.java:56)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:514)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:492)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:277)

Line 470 of PDTrueTypeFont.java is:
    GlyphData glyph = ttf.getGlyph().getGlyph(gid);

This is because ttf.getGlyph() returns null: the font has no 'glyf' table?!
This happens with some specific fonts, but I have not checked whether they are 
always the same ones.
The problem may arise at server startup, although I am not sure.
When it happens we restart the server, which is very annoying.

I suspected a corrupted font file, but that is not the case: I loaded that font 
file manually with PDFBox and it works fine. The problem may have appeared with 
the JDK 8 -> JDK 11 or the PDFBox 2.0.22 -> 2.0.24 migration.


We took a heap dump of the JVM in order to inspect the FontCache contents. Here 
is an extract.
First, a correct entry for font Arial_Bold.ttf:

Shallow Heap | Retained Heap | Class Name
-----------------------------------------------------------------------------------------------------------------------------------
          32 |           144 | [61] java.util.concurrent.ConcurrentHashMap$Node @ 0x712b73c50
           0 |             0 | |- <class> class java.util.concurrent.ConcurrentHashMap$Node @ 0x700262200 System Class
          56 |           160 | |- key org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo @ 0x70e3e1778
           8 |           280 | |  |- <class> class org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo @ 0x70e3da518
          24 |            24 | |  |- format org.apache.pdfbox.pdmodel.font.FontFormat @ 0x70e3dacf0
          24 |         7 712 | |  |- parent org.apache.pdfbox.pdmodel.font.FileSystemFontProvider @ 0x70e3dbf48
          24 |            56 | |  |- postScriptName java.lang.String @ 0x70e3e17b0  Arial-BoldMT
          16 |            48 | |  |- panose org.apache.pdfbox.pdmodel.font.PDPanoseClassification @ 0x70e3e17e8
          32 |            32 | |  |- file java.io.File @ 0x70e3e1818
          56 |        14 496 | |  |  |- <class> class java.io.File @ 0x70032e130 System Class
          24 |            24 | |  |  |- status java.io.File$PathStatus @ 0x7002669c8
          24 |            96 | |  |  |- path java.lang.String @ 0x70e3e1838  /usr/share/fonts/truetype/msttcorefonts/Arial_Bold.ttf
             |               | |  |  '- Total: 3 entries
             |               | |  '- Total: 6 entries
          32 |            72 | |- next java.util.concurrent.ConcurrentHashMap$Node @ 0x712b73c70
          40 |            40 | |- val java.lang.ref.SoftReference @ 0x7198c9628
           8 |             8 | |  |- <class> class java.lang.ref.SoftReference @ 0x70010e340 System Class
          32 |            48 | |  |- queue java.lang.ref.ReferenceQueue$Null @ 0x70026cb48
          40 |       485 816 | |  |- referent org.apache.fontbox.ttf.TrueTypeFont @ 0x7196732a0
           8 |           400 | |  |  |- <class> class org.apache.fontbox.ttf.TrueTypeFont @ 0x70edd2df0
          48 |       360 496 | |  |  |- tables java.util.HashMap @ 0x7196732c8
          40 |           144 | |  |  |  |- <class> class java.util.HashMap @ 0x7003279e8 System Class
         144 |       360 448 | |  |  |  |- table java.util.HashMap$Node[32] @ 0x7196732f8
             |               | |  |  |  '- Total: 2 entries
          48 |        69 008 | |  |  |- postScriptNames java.util.HashMap @ 0x719827048
          24 |            24 | |  |  |- data org.apache.fontbox.ttf.RAFDataStream @ 0x719c0a6d8
          24 |            24 | |  |  |- enabledGsubFeatures java.util.ArrayList @ 0x719c0a770
-----------------------------------------------------------------------------------------------------------------------------------

The 'tables' entry above is the TrueTypeFont.tables field. It has the following 
fields:

Type  | Name       | Value
---------------------------------------------------------
float | loadFactor | 0.75
int   | threshold  | 24
int   | modCount   | 23
int   | size       | 23
ref   | entrySet   | null
ref   | table      | java.util.HashMap$Node[32] @ 0x7196732f8
ref   | values     | null
ref   | keySet     | null
---------------------------------------------------------

Indeed, I checked that font Arial_Bold.ttf contains 23 tables.

But here is another entry in the font cache, the one that triggers the bug:
Shallow Heap | Retained Heap | Class Name
---------------------------------------------------------------------------------------------------------------------------------------
          56 |           168 | key org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo @ 0x70e3e19b8
           8 |           280 | |- <class> class org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo @ 0x70e3da518
          24 |            24 | |- format org.apache.pdfbox.pdmodel.font.FontFormat @ 0x70e3dacf0
          24 |         7 712 | |- parent org.apache.pdfbox.pdmodel.font.FileSystemFontProvider @ 0x70e3dbf48
          24 |            64 | |- postScriptName java.lang.String @ 0x70e3e19f0  Arial-BoldItalicMT
          16 |            48 | |- panose org.apache.pdfbox.pdmodel.font.PDPanoseClassification @ 0x70e3e1a30
          32 |            32 | |- file java.io.File @ 0x70e3e1a60
          56 |        14 496 | |  |- <class> class java.io.File @ 0x70032e130 System Class
          24 |            24 | |  |- status java.io.File$PathStatus @ 0x7002669c8
          24 |           104 | |  |- path java.lang.String @ 0x70e3e1a80  /usr/share/fonts/truetype/msttcorefonts/Arial_Bold_Italic.ttf
             |               | |  '- Total: 3 entries
             |               | '- Total: 6 entries
          40 |            40 | val java.lang.ref.SoftReference @ 0x71a904010
           8 |             8 | |- <class> class java.lang.ref.SoftReference @ 0x70010e340 System Class
          32 |            48 | |- queue java.lang.ref.ReferenceQueue$Null @ 0x70026cb48
          40 |           392 | |- referent org.apache.fontbox.ttf.TrueTypeFont @ 0x71a904038
           8 |           400 | |  |- <class> class org.apache.fontbox.ttf.TrueTypeFont @ 0x70edd2df0
          48 |           304 | |  |- tables java.util.HashMap @ 0x71a904060
          40 |           144 | |  |  |- <class> class java.util.HashMap @ 0x7003279e8 System Class
          80 |           256 | |  |  |- table java.util.HashMap$Node[16] @ 0x71a904090
           0 |             0 | |  |  |  |- <class> class java.util.HashMap$Node[] @ 0x70025d338
          32 |           176 | |  |  |  |- [0] java.util.HashMap$Node @ 0x71a9040e0
           8 |            32 | |  |  |  |  |- <class> class java.util.HashMap$Node @ 0x70025d2c8 System Class
          24 |            48 | |  |  |  |  |- key java.lang.String @ 0x71a904100  \u0000\u0000\u0000\u0000
          24 |           704 | |  |  |  |  |  |- <class> class java.lang.String @ 0x7002a3c00 System Class, JNI Global
          24 |            24 | |  |  |  |  |  |- value byte[4] @ 0x71a904118  ....
           0 |             0 | |  |  |  |  |  |  '- <class> class byte[] @ 0x7002a42a8
             |               | |  |  |  |  |  '- Total: 2 entries
          48 |            96 | |  |  |  |  |- value org.apache.fontbox.ttf.TTFTable @ 0x71a904130
           0 |             0 | |  |  |  |  |  |- <class> class org.apache.fontbox.ttf.TTFTable @ 0x70edd4450
          40 |           392 | |  |  |  |  |  |- font org.apache.fontbox.ttf.TrueTypeFont @ 0x71a904038
          24 |            48 | |  |  |  |  |  |- tag java.lang.String @ 0x71a904160  \u0000\u0000\u0000\u0000
             |               | |  |  |  |  |  '- Total: 3 entries
             |               | |  |  |  |  '- Total: 3 entries
             |               | |  |  |  '- Total: 2 entries
             |               | |  |  '- Total: 2 entries
          24 |            24 | |  |- data org.apache.fontbox.ttf.RAFDataStream @ 0x71a904190
          24 |            24 | |  |- enabledGsubFeatures java.util.ArrayList @ 0x71a9041a8
---------------------------------------------------------------------------------------------------------------------------------------

This 'tables' map contains only 1 table. Its fields:

Type  | Name       | Value
---------------------------------------------------------
float | loadFactor | 0.75
int   | threshold  | 12
int   | modCount   | 1
int   | size       | 1
ref   | entrySet   | null
ref   | table      | java.util.HashMap$Node[16] @ 0x71a904090
ref   | values     | null
ref   | keySet     | null
---------------------------------------------------------

modCount = 1, so I guess put() was called only once?
If we look at that single TTFTable, whose tag is a strange name made of 4 binary 
zero characters (the 'value' entry above):

Type    | Name        | Value
---------------------------------------------------------------------
ref     | font        | org.apache.fontbox.ttf.TrueTypeFont @ 0x71a904038
boolean | initialized | false
long    | length      | 3124
long    | offset      | 22104
long    | checkSum    | 3617481689
ref     | tag         | \u0000\u0000\u0000\u0000
---------------------------------------------------------------------
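
Another possible reading of size = 1 and modCount = 1, which is only a 
hypothesis: if all 20 directory entries had been read with the same all-zero 
tag, every put() after the first would simply replace the value stored under 
that key. java.util.HashMap counts only structural modifications (adding or 
removing a mapping, rehashing) in modCount, not replacing the value of an 
existing key, so the map would still end up with size = 1 and modCount = 1, 
holding whichever table was read last. A minimal sketch of that behaviour with 
plain java.util.HashMap; SameTagDemo and TableEntry are hypothetical names 
(TableEntry stands in for fontbox's TTFTable), and it says nothing about why 
the tags would be zero in the first place.

```java
import java.util.HashMap;
import java.util.Map;

public class SameTagDemo {

    /** Hypothetical stand-in for fontbox's TTFTable, keeping only fields seen in the dump. */
    static final class TableEntry {
        final String tag;
        final long offset;
        final long length;

        TableEntry(String tag, long offset, long length) {
            this.tag = tag;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Puts every entry into a HashMap keyed by its tag, like the dumped 'tables' map. */
    static Map<String, TableEntry> buildTableMap(TableEntry[] entries) {
        Map<String, TableEntry> tables = new HashMap<>();
        for (TableEntry e : entries) {
            // for a key that is already present, put() only replaces the value;
            // HashMap does not count that as a structural modification
            tables.put(e.tag, e);
        }
        return tables;
    }

    public static void main(String[] args) {
        String zeroTag = "\0\0\0\0";
        TableEntry[] entries = new TableEntry[20];
        for (int i = 0; i < 19; i++) {
            // placeholder offsets/lengths for the first 19 records
            entries[i] = new TableEntry(zeroTag, i, 1);
        }
        // last record: offset 22104 and length 3124, as in the TTFTable above
        entries[19] = new TableEntry(zeroTag, 22104L, 3124L);

        Map<String, TableEntry> tables = buildTableMap(entries);
        TableEntry survivor = tables.get(zeroTag);
        // prints: size=1 offset=22104 length=3124
        System.out.println("size=" + tables.size()
                + " offset=" + survivor.offset + " length=" + survivor.length);
    }
}
```

Both readings (a single put(), or many put() calls on the same key) are 
consistent with modCount = 1; only the second one explains why the surviving 
entry would be the last table of the directory.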

It appears that 3124 is the length of the last table in file 
Arial_Bold_Italic.ttf, and 22104 is its offset. Here is a binary dump of the 
beginning of Arial_Bold_Italic.ttf:

00000000  00 01 00 00 00 14 01 00  00 04 00 40 44 53 49 47  |...........@DSIG|
00000010  f0 30 30 da 00 03 58 00  00 00 15 b4 4c 54 53 48  |.00...X.....LTSH|
00000020  f1 da 51 07 00 00 41 04  00 00 03 c0 4f 53 2f 32  |..Q...A.....OS/2|
00000030  9d c2 94 0f 00 00 01 c8  00 00 00 56 50 43 4c 54  |...........VPCLT|
00000040  73 c0 41 82 00 03 57 c8  00 00 00 36 56 44 4d 58  |s.A...W....6VDMX|
00000050  55 34 6f 96 00 00 44 c4  00 00 11 94 63 6d 61 70  |U4o...D.....cmap|
00000060  e3 18 5f 9b 00 00 21 2c  00 00 10 e2 63 76 74 20  |.._...!,....cvt |
00000070  4d fc 64 93 00 00 67 f8  00 00 06 9c 66 70 67 6d  |M.d...g.....fpgm|
00000080  57 78 09 53 00 00 62 8c  00 00 05 6b 67 61 73 70  |Wx.S..b....kgasp|
00000090  00 12 00 09 00 00 02 20  00 00 00 10 67 6c 79 66  |....... ....glyf|
000000a0  8e 29 b0 1f 00 00 d7 8c  00 02 4f d8 68 64 6d 78  |.)........O.hdmx|
000000b0  3c d8 97 63 00 00 7d 84  00 00 5a 08 68 65 61 64  |<..c..}...Z.head|
000000c0  c4 87 8e 35 00 00 01 4c  00 00 00 36 68 68 65 61  |...5...L...6hhea|
000000d0  0c 52 08 bb 00 00 01 84  00 00 00 24 68 6d 74 78  |.R.........$hmtx|
000000e0  b7 fa fd f4 00 00 6e 94  00 00 0e f0 6b 65 72 6e  |......n.....kern|
000000f0  b4 ed b4 bc 00 03 44 7c  00 00 13 4a 6c 6f 63 61  |......D|...Jloca|
00000100  04 db c8 d8 00 00 32 10  00 00 0e f4 6d 61 78 70  |......2.....maxp|
00000110  09 32 0d be 00 00 01 a8  00 00 00 20 6e 61 6d 65  |.2......... name|
00000120  6c 8f 82 f6 00 00 02 30  00 00 1e fc 70 6f 73 74  |l......0....post|
00000130  c4 ff 0a 6f 00 03 27 64  00 00 1d 18 70 72 65 70  |...o..'d....prep|
00000140  d7 9e 63 d9 00 00 56 58  00 00 0c 34 00 01 00 00  |..c...VX...4....|

0x14 = 20 tables; the last one is 'prep'. prep has offset 0x00005658 = 22104 
and size 0x00000c34 = 3124, and its checksum 0xd79e63d9 = 3617481689 matches 
the checkSum value above as well.
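
To cross-check what ends up in the cache against the file on disk, the table 
directory can also be listed without PDFBox. A minimal sketch using only the 
JDK, assuming a single-font .ttf file (not a .ttc collection); 
TableDirectoryDump, readDirectory and isValidTag are made-up names. It follows 
the layout visible in the dump above: a 12-byte header (sfntVersion, numTables, 
searchRange, entrySelector, rangeShift) followed by one 16-byte record per 
table (tag, checkSum, offset, length), all big-endian. It also flags tags that 
are not four printable ASCII characters (0x20 to 0x7E), which is what the 
OpenType specification requires of a tag and what an all-zero tag violates.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TableDirectoryDump {

    /** One 16-byte table record; unsigned 32-bit fields are widened to long. */
    static final class TableRecord {
        final String tag;
        final long checkSum;
        final long offset;
        final long length;

        TableRecord(String tag, long checkSum, long offset, long length) {
            this.tag = tag;
            this.checkSum = checkSum;
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            StringBuilder shown = new StringBuilder();
            for (char c : tag.toCharArray()) {
                // escape non-printable bytes (such as 0x00) so they stay visible
                shown.append(isPrintable(c) ? String.valueOf(c) : String.format("\\u%04x", (int) c));
            }
            return shown + " checkSum=" + checkSum + " offset=" + offset + " length=" + length;
        }
    }

    static boolean isPrintable(char c) {
        return c >= 0x20 && c <= 0x7E;
    }

    /** OpenType requires all four tag bytes to be printable ASCII (0x20..0x7E). */
    static boolean isValidTag(String tag) {
        if (tag.length() != 4) {
            return false;
        }
        for (char c : tag.toCharArray()) {
            if (!isPrintable(c)) {
                return false;
            }
        }
        return true;
    }

    /** Reads the header and the table records; DataInputStream is big-endian, like TrueType. */
    static List<TableRecord> readDirectory(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        data.readInt();                              // sfntVersion (0x00010000 in the dump above)
        int numTables = data.readUnsignedShort();    // 0x0014 = 20 in the dump above
        data.readFully(new byte[6]);                 // searchRange, entrySelector, rangeShift (unused)
        List<TableRecord> records = new ArrayList<>();
        for (int i = 0; i < numTables; i++) {
            byte[] tagBytes = new byte[4];
            data.readFully(tagBytes);
            // ISO-8859-1 maps each byte to exactly one char, so zero bytes are kept as-is
            String tag = new String(tagBytes, StandardCharsets.ISO_8859_1);
            long checkSum = Integer.toUnsignedLong(data.readInt());
            long offset = Integer.toUnsignedLong(data.readInt());
            long length = Integer.toUnsignedLong(data.readInt());
            records.add(new TableRecord(tag, checkSum, offset, length));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a single-font .ttf file
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            for (TableRecord r : readDirectory(in)) {
                System.out.println((isValidTag(r.tag) ? "" : "INVALID TAG: ") + r);
            }
        }
    }
}
```

On JDK 11 it can be run directly from source, e.g. 
java TableDirectoryDump.java /usr/share/fonts/truetype/msttcorefonts/Arial_Bold_Italic.ttf 
and, given the hex dump above, the listing for that file should end with prep 
at offset 22104 with length 3124.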
So here I am: after inspecting the code, I cannot see how this is possible. I 
suspect a race condition during font loading.
Any idea?
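
If the race-condition hypothesis is right, it might be reproducible outside 
production by running the font-loading path from many threads at the same 
instant and comparing what each thread gets back. A generic sketch of such a 
harness using only java.util.concurrent; ConcurrentLoadHarness and 
runConcurrently are hypothetical names, and the task in main is a placeholder 
that would have to be replaced by whatever application code triggers the font 
lookup (for example rendering a page that uses Arial Bold Italic). Such a 
harness only makes overlapping calls more likely; a clean run does not prove 
that no race exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentLoadHarness {

    /**
     * Runs the same task on the given number of threads, releasing them together
     * through a latch so that the calls overlap as much as possible.
     */
    static <T> List<T> runConcurrently(int threads, Callable<T> task)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(() -> {
                    start.await();       // wait until every worker has been submitted
                    return task.call();
                }));
            }
            start.countDown();           // release all workers at once
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                results.add(f.get());    // a worker exception surfaces here as ExecutionException
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder task (hypothetical): replace with the application code that loads
        // or renders with the suspect font, returning something comparable across threads.
        Callable<String> task = () -> "ok";
        List<String> results = runConcurrently(8, task);
        long distinct = results.stream().distinct().count();
        // prints: 8 results, 1 distinct
        System.out.println(results.size() + " results, " + distinct + " distinct");
    }
}
```

With a real task, more than one distinct result (or an exception from one of 
the workers) would point towards a concurrency problem.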
Regards,
  MM

- pdfbox-2.0.24
- openjdk version "11.0.13" 2021-10-19
  OpenJDK Runtime Environment Temurin-11.0.13+8 (build 11.0.13+8)
  OpenJDK 64-Bit Server VM Temurin-11.0.13+8 (build 11.0.13+8, mixed mode)
