Hi,
I have created an application that uses PdfBox to search the text of pdf files
for specific words. When a match is found, I replace the word
with a masked version. All of the pdfs contain custom fonts that are embedded. 
My problem is that some of the fonts are subsetted and are missing the 
characters of the masked version.
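
To make the problem concrete, here is a minimal sketch in plain Java (no PdfBox calls) of the check involved: given the characters a subsetted font is known to contain, report which characters of the masked word are missing, and pick a mask character the font does contain. The class and method names are hypothetical, and the set of available characters is only an assumption here, approximated from text that is already drawn with that font.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of PdfBox: compares a masked word against
// the characters a subsetted font is assumed to contain.
class MaskGlyphCheck {

    // Collects the distinct characters of a string, e.g. text already drawn
    // with the subsetted font (used as an approximation of its glyph set).
    static Set<Character> charsOf(String text) {
        Set<Character> chars = new LinkedHashSet<>();
        for (char c : text.toCharArray()) {
            chars.add(c);
        }
        return chars;
    }

    // Returns the characters of the masked word that are not available.
    static Set<Character> missingGlyphs(String masked, Set<Character> available) {
        Set<Character> missing = charsOf(masked);
        missing.removeAll(available);
        return missing;
    }

    // Returns the first candidate mask character that is available,
    // or null if none of the candidates is.
    static Character pickMaskChar(String candidates, Set<Character> available) {
        for (char c : candidates.toCharArray()) {
            if (available.contains(c)) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<Character> available = charsOf("Account number: 12345 x");
        System.out.println(missingGlyphs("*****", available));
        System.out.println(pickMaskChar("*#xX-", available));
    }
}
```

Choosing a mask character that already exists in the subset would sidestep the font replacement entirely, but it only helps when an acceptable mask character happens to be present.
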
For Type 1 fonts, I have been able to load the pfb files and replace the font 
with an un-subsetted version. I was just wondering what my options would be for 
pdfs that use Type 3 fonts.
I was thinking of one of the following:
1) Create a Type 1 font that looks like the Type 3 font and load that.
   ISSUE: Not sure if this is possible or how easy it would be.
2a) Load a Type 3 font from a file.
   ISSUE: I don't see anything other than the Type3StreamParser, but it works
   off of a COSStream. In addition, I don't have the original fonts, so I would
   need to extract the original font from a pdf and save it to a file. Not sure
   if that is possible.
2b) Load the dictionary of a template pdf that contains the unsubsetted Type 3
   font, get the font object from the PDResources map and add it to the pdf I
   am modifying.
   ISSUE: I tried this for the Type 1 fonts originally and found that it only
   worked for the first pdf. I'm assuming the font object would need to be
   cloned or copied so that it does not become invalid.
3) Modify the existing Type 3 font and add the extra characters right in the
   pdf.
   ISSUE: Again, not sure if something like this is possible.
Any help would be greatly appreciated.
