Hi,

Yes, I suspect this is a parallel access problem, probably the parallel
initialization of a standard 14 font. This has caused occasional trouble
for years (despite our attempts to fix it), which is why it was reworked
in 3.0.

Please try this workaround:

    PDType1Font.COURIER.getPath("a");

and do this for all Courier, Helvetica and Times fonts before the actual
work starts.
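
The idea behind the workaround is to trigger font initialization once, on a single thread, before any worker thread is started. A minimal sketch of that ordering follows. The class `FontWarmUp`, the interface `Step` and the pool size are hypothetical names chosen for illustration; the PDFBox calls appear only as comments so the block compiles without PDFBox on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class FontWarmUp {
    /** One initialization step that may throw, e.g. a font lookup. */
    interface Step {
        void run() throws Exception;
    }

    /** Runs every step in order on the calling thread and returns how many completed. */
    static int runAll(List<Step> steps) {
        int done = 0;
        for (Step step : steps) {
            try {
                step.run();
            } catch (Exception e) {
                throw new IllegalStateException("warm-up step " + done + " failed", e);
            }
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        List<Step> steps = new ArrayList<>();
        // Assumption: with PDFBox 2.0.x on the classpath the steps would be, e.g.:
        //   steps.add(() -> PDType1Font.COURIER.getPath("a"));
        //   steps.add(() -> PDType1Font.HELVETICA.getPath("a"));
        //   steps.add(() -> PDType1Font.TIMES_ROMAN.getPath("a"));
        // and likewise for the bold / oblique / italic variants.
        steps.add(() -> System.out.println("placeholder step"));

        // Single-threaded warm-up happens first ...
        int done = runAll(steps);
        System.out.println("warmed up " + done + " step(s)");

        // ... and only then is the worker pool created.
        ExecutorService workers = Executors.newFixedThreadPool(4);
        // submit rendering jobs here
        workers.shutdown();
    }
}
```

The only point of the sketch is the ordering: `runAll` returns before the executor exists, so no rendering thread can observe a half-initialized font.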

Alternatively, try a change in the source code. In PDType1Font.java,
change this

    FontMapping<FontBoxFont> mapping = FontMappers.instance()
                                                  .getFontBoxFont(getBaseFont(),
                                                                  getFontDescriptor());
    genericFont = mapping.getFont();

to this

    FontMapping<FontBoxFont> mapping;
    synchronized (this)
    {
        mapping = FontMappers.instance()
                             .getFontBoxFont(getBaseFont(),
                                             getFontDescriptor());
    }
    genericFont = mapping.getFont();
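
The effect of wrapping a lookup in a `synchronized` block can be illustrated with plain Java. This is only a sketch of the locking pattern, not PDFBox code: `GuardedLookup`, `unsafeLookup` and the counters are hypothetical, and the sleep merely widens the window in which threads could overlap.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class GuardedLookup {
    private final Object lock = new Object();
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger maxInside = new AtomicInteger();

    /** Stand-in for a lookup that must not run concurrently. */
    private String unsafeLookup(String name) {
        int now = inside.incrementAndGet();
        maxInside.accumulateAndGet(now, Math::max); // record peak concurrency
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        inside.decrementAndGet();
        return "font:" + name;
    }

    /** Only one thread at a time may execute the guarded lookup. */
    String lookup(String name) {
        synchronized (lock) {
            return unsafeLookup(name);
        }
    }

    int maxConcurrentLookups() {
        return maxInside.get();
    }

    /** Starts threadCount threads at once and returns the peak concurrency seen inside the lookup. */
    static int demo(int threadCount) {
        GuardedLookup guarded = new GuardedLookup();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                guarded.lookup("Courier");
            });
            threads[i].start();
        }
        start.countDown();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return guarded.maxConcurrentLookups();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent lookups: " + demo(8));
    }
}
```

The lock only helps if every thread synchronizes on the same object. In the sketch that is the shared `lock` field.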
Tilman

On 17.12.2021 at 17:17, Maison Mo wrote:
Hello,

We parse a lot of PDFs in a server application (to generate thumbnails
with PageDrawer). Occasionally something goes wrong and we get a lot of
NPEs here:

java.lang.NullPointerException: null
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getPath(PDTrueTypeFont.java:470)
    at org.apache.pdfbox.rendering.TTFGlyph2D.getPathForGID(TTFGlyph2D.java:144)
    at org.apache.pdfbox.rendering.TTFGlyph2D.getPathForCharacterCode(TTFGlyph2D.java:93)
    at org.apache.pdfbox.rendering.PageDrawer.drawGlyph2D(PageDrawer.java:495)
    at org.apache.pdfbox.rendering.PageDrawer.showFontGlyph(PageDrawer.java:476)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showGlyph(PDFStreamEngine.java:787)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showGlyph(PDFStreamEngine.java:805)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:743)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:606)
    at org.apache.pdfbox.contentstream.operator.text.ShowText.process(ShowText.java:56)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:514)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:492)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:277)

PDTrueTypeFont.java:470 is:

    GlyphData glyph = ttf.getGlyph().getGlyph(gid);

The NPE occurs because ttf.getGlyph() returns null: the font has no 'glyf' table?!
This happens with some specific fonts, but I did not check whether they are
always the same ones.
The problem may arise at server startup, although I am not sure.
When it arises, we restart the server, but this is very annoying.
I suspected a corrupted font file, but that is not the case: I loaded the
font file manually with pdfbox and it works fine. The problem may have appeared
with the jdk8 -> jdk11 or the pdfbox-2.0.22 -> 2.0.24 migration.

We took a heap dump of the JVM to inspect the FontCache content. Here is an
extract.

First, a correct value, for font Arial_Bold.ttf:

Class Name | Shallow Heap | Retained Heap
-----------------------------------------------------------------------------------------------------------------------------------
[61] java.util.concurrent.ConcurrentHashMap$Node @ 0x712b73c50 | 32 | 144
|- <class> class java.util.concurrent.ConcurrentHashMap$Node @ 0x700262200 System Class | 0 | 0
|- key org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo @ 0x70e3e1778 | 56 | 160
| |- <class> class org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo @ 0x70e3da518 | 8 | 280
| |- format org.apache.pdfbox.pdmodel.font.FontFormat @ 0x70e3dacf0 | 24 | 24
| |- parent org.apache.pdfbox.pdmodel.font.FileSystemFontProvider @ 0x70e3dbf48 | 24 | 7 712
| |- postScriptName java.lang.String @ 0x70e3e17b0 Arial-BoldMT | 24 | 56
| |- panose org.apache.pdfbox.pdmodel.font.PDPanoseClassification @ 0x70e3e17e8 | 16 | 48
| |- file java.io.File @ 0x70e3e1818 | 32 | 32
| | |- <class> class java.io.File @ 0x70032e130 System Class | 56 | 14 496
| | |- status java.io.File$PathStatus @ 0x7002669c8 | 24 | 24
| | |- path java.lang.String @ 0x70e3e1838 /usr/share/fonts/truetype/msttcorefonts/Arial_Bold.ttf | 24 | 96
| | '- Total: 3 entries | |
| '- Total: 6 entries | |
|- next java.util.concurrent.ConcurrentHashMap$Node @ 0x712b73c70 | 32 | 72
|- val java.lang.ref.SoftReference @ 0x7198c9628 | 40 | 40
| |- <class> class java.lang.ref.SoftReference @ 0x70010e340 System Class | 8 | 8
| |- queue java.lang.ref.ReferenceQueue$Null @ 0x70026cb48 | 32 | 48
| |- referent org.apache.fontbox.ttf.TrueTypeFont @ 0x7196732a0 | 40 | 485 816
| | |- <class> class org.apache.fontbox.ttf.TrueTypeFont @ 0x70edd2df0 | 8 | 400
| | |- tables java.util.HashMap @ 0x7196732c8 | 48 | 360 496
| | | |- <class> class java.util.HashMap @ 0x7003279e8 System Class | 40 | 144
| | | |- table java.util.HashMap$Node[32] @ 0x7196732f8 | 144 | 360 448
| | | '- Total: 2 entries | |
| | |- postScriptNames java.util.HashMap @ 0x719827048 | 48 | 69 008
| | |- data org.apache.fontbox.ttf.RAFDataStream @ 0x719c0a6d8 | 24 | 24
| | |- enabledGsubFeatures java.util.ArrayList @ 0x719c0a770 | 24 | 24
-----------------------------------------------------------------------------------------------------------------------------------

The 'tables' entry in the dump above is the TrueTypeFont.tables field. It has
the following fields:

Type |Name      |Value
---------------------------------------------------------
float|loadFactor|0.75
int  |threshold |24
int  |modCount  |23
int  |size      |23
ref  |entrySet  |null
ref  |table     |java.util.HashMap$Node[32] @ 0x7196732f8
ref  |values    |null
ref  |keySet    |null
---------------------------------------------------------

Indeed, I checked that there are 23 tables in font Arial_Bold.ttf.

But here is another entry in the font cache, the one that triggers the bug:

Class Name | Shallow Heap | Retained Heap
---------------------------------------------------------------------------------------------------------------------------------------
key org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo @ 0x70e3e19b8 | 56 | 168
|- <class> class org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo @ 0x70e3da518 | 8 | 280
|- format org.apache.pdfbox.pdmodel.font.FontFormat @ 0x70e3dacf0 | 24 | 24
|- parent org.apache.pdfbox.pdmodel.font.FileSystemFontProvider @ 0x70e3dbf48 | 24 | 7 712
|- postScriptName java.lang.String @ 0x70e3e19f0 Arial-BoldItalicMT | 24 | 64
|- panose org.apache.pdfbox.pdmodel.font.PDPanoseClassification @ 0x70e3e1a30 | 16 | 48
|- file java.io.File @ 0x70e3e1a60 | 32 | 32
| |- <class> class java.io.File @ 0x70032e130 System Class | 56 | 14 496
| |- status java.io.File$PathStatus @ 0x7002669c8 | 24 | 24
| |- path java.lang.String @ 0x70e3e1a80 /usr/share/fonts/truetype/msttcorefonts/Arial_Bold_Italic.ttf | 24 | 104
| '- Total: 3 entries | |
'- Total: 6 entries | |
val java.lang.ref.SoftReference @ 0x71a904010 | 40 | 40
|- <class> class java.lang.ref.SoftReference @ 0x70010e340 System Class | 8 | 8
|- queue java.lang.ref.ReferenceQueue$Null @ 0x70026cb48 | 32 | 48
|- referent org.apache.fontbox.ttf.TrueTypeFont @ 0x71a904038 | 40 | 392
| |- <class> class org.apache.fontbox.ttf.TrueTypeFont @ 0x70edd2df0 | 8 | 400
| |- tables java.util.HashMap @ 0x71a904060 | 48 | 304
| | |- <class> class java.util.HashMap @ 0x7003279e8 System Class | 40 | 144
| | |- table java.util.HashMap$Node[16] @ 0x71a904090 | 80 | 256
| | | |- <class> class java.util.HashMap$Node[] @ 0x70025d338 | 0 | 0
| | | |- [0] java.util.HashMap$Node @ 0x71a9040e0 | 32 | 176
| | | | |- <class> class java.util.HashMap$Node @ 0x70025d2c8 System Class | 8 | 32
| | | | |- key java.lang.String @ 0x71a904100 \u0000\u0000\u0000\u0000 | 24 | 48
| | | | | |- <class> class java.lang.String @ 0x7002a3c00 System Class, JNI Global | 24 | 704
| | | | | |- value byte[4] @ 0x71a904118 .... | 24 | 24
| | | | | | '- <class> class byte[] @ 0x7002a42a8 | 0 | 0
| | | | | '- Total: 2 entries | |
| | | | |- value org.apache.fontbox.ttf.TTFTable @ 0x71a904130 | 48 | 96
| | | | | |- <class> class org.apache.fontbox.ttf.TTFTable @ 0x70edd4450 | 0 | 0
| | | | | |- font org.apache.fontbox.ttf.TrueTypeFont @ 0x71a904038 | 40 | 392
| | | | | |- tag java.lang.String @ 0x71a904160 \u0000\u0000\u0000\u0000 | 24 | 48
| | | | | '- Total: 3 entries | |
| | | | '- Total: 3 entries | |
| | | '- Total: 2 entries | |
| | '- Total: 2 entries | |
| |- data org.apache.fontbox.ttf.RAFDataStream @ 0x71a904190 | 24 | 24
| |- enabledGsubFeatures java.util.ArrayList @ 0x71a9041a8 | 24 | 24
---------------------------------------------------------------------------------------------------------------------------------------

Here the 'tables' map contains only one entry. It has the following fields:

Type |Name      |Value
---------------------------------------------------------
float|loadFactor|0.75
int  |threshold |12
int  |modCount  |1
int  |size      |1
ref  |entrySet  |null
ref  |table     |java.util.HashMap$Node[16] @ 0x71a904090
ref  |values    |null
ref  |keySet    |null
---------------------------------------------------------
modCount=1, so I guess put() was called only once?

If we look at that single TTFTable, whose strange tag consists of four NUL
characters (the 'value' entry in the dump above):

Type   |Name       |Value
---------------------------------------------------------------------
ref    |font       |org.apache.fontbox.ttf.TrueTypeFont @ 0x71a904038
boolean|initialized|false
long   |length     |3124
long   |offset     |22104
long   |checkSum   |3617481689
ref    |tag        |\u0000\u0000\u0000\u0000
---------------------------------------------------------------------

It appears that 3124 is the length of the last table in file
Arial_Bold_Italic.ttf, and 22104 is its offset. This is a binary dump of the
beginning of file Arial_Bold_Italic.ttf:

00000000 00 01 00 00 00 14 01 00 00 04 00 40 44 53 49 47 |...........@DSIG|
00000010 f0 30 30 da 00 03 58 00 00 00 15 b4 4c 54 53 48 |.00...X.....LTSH|
00000020 f1 da 51 07 00 00 41 04 00 00 03 c0 4f 53 2f 32 |..Q...A.....OS/2|
00000030 9d c2 94 0f 00 00 01 c8 00 00 00 56 50 43 4c 54 |...........VPCLT|
00000040 73 c0 41 82 00 03 57 c8 00 00 00 36 56 44 4d 58 |s.A...W....6VDMX|
00000050 55 34 6f 96 00 00 44 c4 00 00 11 94 63 6d 61 70 |U4o...D.....cmap|
00000060 e3 18 5f 9b 00 00 21 2c 00 00 10 e2 63 76 74 20 |.._...!,....cvt |
00000070 4d fc 64 93 00 00 67 f8 00 00 06 9c 66 70 67 6d |M.d...g.....fpgm|
00000080 57 78 09 53 00 00 62 8c 00 00 05 6b 67 61 73 70 |Wx.S..b....kgasp|
00000090 00 12 00 09 00 00 02 20 00 00 00 10 67 6c 79 66 |....... ....glyf|
000000a0 8e 29 b0 1f 00 00 d7 8c 00 02 4f d8 68 64 6d 78 |.)........O.hdmx|
000000b0 3c d8 97 63 00 00 7d 84 00 00 5a 08 68 65 61 64 |<..c..}...Z.head|
000000c0 c4 87 8e 35 00 00 01 4c 00 00 00 36 68 68 65 61 |...5...L...6hhea|
000000d0 0c 52 08 bb 00 00 01 84 00 00 00 24 68 6d 74 78 |.R.........$hmtx|
000000e0 b7 fa fd f4 00 00 6e 94 00 00 0e f0 6b 65 72 6e |......n.....kern|
000000f0 b4 ed b4 bc 00 03 44 7c 00 00 13 4a 6c 6f 63 61 |......D|...Jloca|
00000100 04 db c8 d8 00 00 32 10 00 00 0e f4 6d 61 78 70 |......2.....maxp|
00000110 09 32 0d be 00 00 01 a8 00 00 00 20 6e 61 6d 65 |.2......... name|
00000120 6c 8f 82 f6 00 00 02 30 00 00 1e fc 70 6f 73 74 |l......0....post|
00000130 c4 ff 0a 6f 00 03 27 64 00 00 1d 18 70 72 65 70 |...o..'d....prep|
00000140 d7 9e 63 d9 00 00 56 58 00 00 0c 34 00 01 00 00 |..c...VX...4....|

0x14 = 20 tables; the last one is 'prep'. prep has offset 0x00005658 = 22104
and size 0x00000c34 = 3124.
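
For reference, the table directory in the hex dump can be decoded with a few lines of plain Java. This is a minimal sketch, not the fontbox parser: `TtfDirectory` and `Entry` are hypothetical names, and the sample is cut down to a 12-byte header plus the single 'prep' record copied from the dump (the real file declares 0x14 tables). The searchRange, entrySelector and rangeShift bytes in the sample are placeholders, since the sketch skips them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class TtfDirectory {
    /** One 16-byte table record: tag, checksum, offset, length. */
    static final class Entry {
        final String tag;
        final long checkSum;
        final long offset;
        final long length;

        Entry(String tag, long checkSum, long offset, long length) {
            this.tag = tag;
            this.checkSum = checkSum;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Reads the sfnt header and all table records from the start of a font file. */
    static List<Entry> parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);   // ByteBuffer is big-endian by default
        buf.getInt();                             // sfnt version, 0x00010000 in the dump
        int numTables = buf.getShort() & 0xFFFF;  // unsigned 16-bit table count
        buf.position(12);                         // skip searchRange, entrySelector, rangeShift
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < numTables; i++) {
            byte[] tag = new byte[4];
            buf.get(tag);
            long checkSum = buf.getInt() & 0xFFFFFFFFL; // unsigned 32-bit values
            long offset = buf.getInt() & 0xFFFFFFFFL;
            long length = buf.getInt() & 0xFFFFFFFFL;
            entries.add(new Entry(new String(tag, StandardCharsets.ISO_8859_1),
                                  checkSum, offset, length));
        }
        return entries;
    }

    /** Cut-down sample: header with numTables = 1, then the 'prep' record from the dump. */
    static byte[] sample() {
        int[] bytes = {
            0x00, 0x01, 0x00, 0x00,              // sfnt version
            0x00, 0x01,                          // numTables (0x14 in the real file)
            0x00, 0x10, 0x00, 0x00, 0x00, 0x00,  // placeholder search fields (ignored)
            0x70, 0x72, 0x65, 0x70,              // tag "prep"
            0xd7, 0x9e, 0x63, 0xd9,              // checksum
            0x00, 0x00, 0x56, 0x58,              // offset
            0x00, 0x00, 0x0c, 0x34               // length
        };
        byte[] out = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = (byte) bytes[i];
        }
        return out;
    }

    public static void main(String[] args) {
        for (Entry e : parse(sample())) {
            System.out.println(e.tag + " checkSum=" + e.checkSum
                    + " offset=" + e.offset + " length=" + e.length);
        }
    }
}
```

Decoding the record this way also shows that the checksum bytes d7 9e 63 d9 equal 3617481689, the checkSum value of the nameless TTFTable in the heap dump.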

So here I am: from inspecting the code, I cannot see how this is possible.
I suspect a race condition during font loading.
Any ideas?
Regards,
MM
- pdfbox-2.0.24
- openjdk version "11.0.13" 2021-10-19
OpenJDK Runtime Environment Temurin-11.0.13+8 (build 11.0.13+8)
OpenJDK 64-Bit Server VM Temurin-11.0.13+8 (build 11.0.13+8, mixed mode)