Is there any way to prevent pdftops from subsetting fonts? I want to be able to convert the ps back to a PDF and still be able to extract text with pdftotext.
I have a large single-page PDF. When I select and copy text in Atril or Okular, or run pdftotext on it, the text is found. pdffonts shows about 40 fonts, all similar to these:

    name                                 type              encoding         emb sub uni object ID
    ------------------------------------ ----------------- ---------------- --- --- --- ---------
    HelveticaNeueLTStd-Roman--Identity-H CID Type 0C       Identity-H       yes no  yes    214  0
    HelveticaNeueLTStd-BdIt--Identity-H  CID Type 0C       Identity-H       yes no  yes    236  0
    ...
    HelveticaLTStd-Bold--Identity-H      CID Type 0C       Identity-H       yes no  yes     70  0
    Berkeley-Bold--Identity-H            CID Type 0C       Identity-H       yes no  yes     60  0

pdfinfo shows:

    ModDate:        Fri Jun 26 21:27:37 2020 WEST
    Tagged:         no
    UserProperties: no
    Suspects:       no
    Form:           none
    JavaScript:     no
    Pages:          1
    Encrypted:      no
    Page size:      702 x 1296 pts
    Page rot:       0
    File size:      13501736 bytes
    Optimized:      no
    PDF version:    1.6

When I run the PDF through pdftops, it subsets the fonts. When I then convert the ps back into a PDF with ghostscript's ps2pdf, the text displays, but copying it or running pdftotext does not work.

The end of the generated ps is:

    %%+ font BHQHNF+MinionPro-Regular
    %%+ font BHQHNG+Berkeley-Book
    %%+ font BHQHNH+HelveticaLTStd-Bold
    %%+ font BHQHNI+Berkeley-Bold
    %%EOF

so it looks like pdftops is subsetting the fonts. "grep Berkeley-Bold", for example, shows:

    %%BeginResource: font BHQHNI+Berkeley-Bold
    /CIDFontName /BHQHNI+Berkeley-Bold def
    /F60_0 /BHQHNI+Berkeley-Bold 0 pdfMakeFont16L3
    %%+ font BHQHNI+Berkeley-Bold

"grep -A 1 ' Tc$' x.ps | grep '(' | head" also appears to show that the fonts have been subsetted:

    (\000\025\000\014)
    (\000\015\000\024)
    (\000\001\000*)
    (\000\002\000\003\000\012)
    (\000\006\000\015)
    (\000\014\000\017\000\005\000\007)
    (\000\033\000\031)
    (\000\013\000"\000"\000\026\000\022)
    (\000\012\000\004)
    (\000\024\000\023\000\017\000\001)

In testing, I also noticed that some pdftops options, like -level3, generate ps files that crash ghostscript, but for now I think that is a ghostscript issue.
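A minimal Python sketch of the checks described above, assuming the DSC comment layout and the pdffonts column layout shown in this message. The helper names (font_resources, subset_tagged, decode_ps_string, two_byte_codes, fonts_missing_tounicode) are hypothetical and not part of poppler. It lists font resource names in the generated ps and flags those carrying a six-letter subset tag such as BHQHNI+, decodes string literals like (\000\025\000\014) into 16-bit codes, and reports fonts whose pdffonts "uni" column (explicit ToUnicode map) is "no".

```python
import re

# A subset tag is six uppercase letters followed by '+', e.g. "BHQHNI+Berkeley-Bold".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")

# DSC comments naming font resources, as in the excerpts above:
#   %%BeginResource: font BHQHNI+Berkeley-Bold
#   %%+ font BHQHNF+MinionPro-Regular
FONT_COMMENT = re.compile(r"^%%(?:\+|BeginResource:)\s*font\s+(\S+)")

# Single-character escapes defined for PostScript string literals.
ESCAPES = {"n": 10, "r": 13, "t": 9, "b": 8, "f": 12, "\\": 92, "(": 40, ")": 41}


def font_resources(ps_text):
    """Return font names from DSC font-resource comments, in file order."""
    names = []
    for line in ps_text.splitlines():
        m = FONT_COMMENT.match(line)
        if m:
            names.append(m.group(1))
    return names


def subset_tagged(names):
    """Return the distinct names that carry a six-letter subset tag, sorted."""
    return sorted({n for n in names if SUBSET_TAG.match(n)})


def decode_ps_string(body):
    """Decode the body of a PostScript (...) string literal into bytes.

    Assumes `body` is a str read with encoding="latin-1", so each character
    is one byte. Handles \\ddd octal escapes (1-3 digits) and the
    single-character escapes; an unknown escape keeps the character and
    drops the backslash. Backslash-newline continuations are not handled.
    """
    out = bytearray()
    i = 0
    while i < len(body):
        c = body[i]
        if c != "\\" or i + 1 == len(body):
            out.append(ord(c))
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in "01234567":
            j = i + 1
            while j < len(body) and j < i + 4 and body[j] in "01234567":
                j += 1
            out.append(int(body[i + 1:j], 8) & 0xFF)  # high-order overflow ignored
            i = j
        else:
            out.append(ESCAPES.get(nxt, ord(nxt)))
            i += 2
    return bytes(out)


def two_byte_codes(data):
    """Split bytes into big-endian 16-bit codes (assumes 2-byte codes, as with Identity-H)."""
    return [int.from_bytes(data[k:k + 2], "big") for k in range(0, len(data) - 1, 2)]


def fonts_missing_tounicode(pdffonts_output):
    """Return font names whose 'uni' column is 'no' in pdffonts output.

    Columns are read from the right (emb sub uni obj gen) because the
    type column can contain spaces, e.g. 'CID Type 0C'.
    """
    missing = []
    for line in pdffonts_output.splitlines()[2:]:  # skip header and dashes
        fields = line.split()
        if len(fields) >= 6 and fields[-3] == "no":
            missing.append(fields[0])
    return missing
```

For example, two_byte_codes(decode_ps_string(r"\000\025\000\014")) gives [21, 12]. Running subset_tagged(font_resources(...)) on x.ps (read with encoding="latin-1") would list the BHQHN*-tagged names, and passing the pdffonts output for the ps2pdf result to fonts_missing_tounicode would show which fonts pdffonts reports without a ToUnicode map.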
The ghostscript bug report, https://bugs.ghostscript.com/show_bug.cgi?id=702526, has a copy of the PDF.

I can post this as a poppler bug report, but I wanted to check first whether I missed a pdftops option, or whether there is an internal flag that I could expose as a pdftops option.

William
_______________________________________________ poppler mailing list poppler@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/poppler