Hi,
I have created an application that uses PdfBox to search through text looking
for specific words within pdf files. When a match is found, I replace the word
with a masked version. All of the pdfs contain custom fonts that are embedded.
My problem is that some of the fonts are subsetted and are missing the
characters needed for the masked version.
For Type 1 fonts, I have been able to load the pfb files and replace the font
with an un-subsetted version. I was just wondering what my options would be for
pdfs that use type 3 fonts.
I was thinking of one of the following:
1) Create a type 1 font that looks like the type 3 font and load that. ISSUE:
Not sure if this is possible or how easy this would be.
2a) Load a type 3 font from a file. ISSUE: I don't see anything other than the
Type3StreamParser, but it works off of a COSStream. In addition, I don't have
the original fonts, so I would need to extract the original font from a pdf
and save it to a file. Not sure if that is possible.
2b) Load the dictionary of a template pdf that contains the unsubsetted type 3
font, get the font object from the PDResources map and add it to the pdf I am
modifying. ISSUE: I tried this for the type 1 fonts originally and found that
it only worked for the first pdf. I'm assuming the font object would need to
be cloned or copied so that it does not become invalid.
4) Modify the existing type 3 font and add the extra characters right in the
pdf. ISSUE: Again, not sure if something like this is possible.
Any help would be greatly appreciated.