Re: Detecting CID fonts

jorgeeflorez Thu, 06 May 2021 11:53:22 -0700

>
> If the problem goes away by just opening and saving the PDF, then why
> modify it?


I cannot share the PDF file with that library's support team. So I was
wondering whether I could share the PDF without its images; technically I
would not be sharing the file, only the part that is causing them problems.
But it probably won't work...
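
A rough sketch of the "PDF without images" idea, assuming PDFBox 2.0.x. It filters content stream tokens in the same spirit as the RemoveAllText example, but the class name `RemoveAllImages` and the method `removeImages` are hypothetical and not part of PDFBox. It only touches page content streams and page resources, so images inside form XObjects, patterns or annotation appearances would remain.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDStream;

// Hypothetical helper, not a PDFBox class.
class RemoveAllImages {

    static void removeImages(PDDocument doc) throws IOException {
        for (PDPage page : doc.getPages()) {
            PDResources res = page.getResources();

            // Collect the names of all image XObjects in the page resources.
            List<COSName> imageNames = new ArrayList<>();
            if (res != null) {
                for (COSName name : res.getXObjectNames()) {
                    if (res.isImageXObject(name)) {
                        imageNames.add(name);
                    }
                }
            }

            // Copy the page tokens, skipping the operators that paint images.
            PDFStreamParser parser = new PDFStreamParser(page);
            List<Object> kept = new ArrayList<>();
            for (Object token = parser.parseNextToken(); token != null;
                    token = parser.parseNextToken()) {
                if (token instanceof Operator) {
                    String op = ((Operator) token).getName();
                    if ("BI".equals(op)) {
                        continue; // inline image: parameters and data travel with this token
                    }
                    if ("Do".equals(op) && !kept.isEmpty()
                            && imageNames.contains(kept.get(kept.size() - 1))) {
                        kept.remove(kept.size() - 1); // drop the name operand as well
                        continue;
                    }
                }
                kept.add(token);
            }

            // Replace the page content with the filtered tokens.
            PDStream contents = new PDStream(doc);
            try (OutputStream out = contents.createOutputStream(COSName.FLATE_DECODE)) {
                new ContentStreamWriter(out).writeTokens(kept);
            }
            page.setContents(contents);

            // Unlink the image streams so they are no longer reachable from the page.
            if (res != null) {
                COSBase xobjects = res.getCOSObject().getDictionaryObject(COSName.XOBJECT);
                if (xobjects instanceof COSDictionary) {
                    for (COSName name : imageNames) {
                        ((COSDictionary) xobjects).removeItem(name);
                    }
                }
            }
        }
    }
}
```

Calling `RemoveAllImages.removeImages(doc)` on a loaded document and then saving it under a new file name should give a copy whose pages no longer paint or reference their images. Before sharing such a copy, it would still be worth checking that nothing sensitive is left in other parts of the file.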

> I've never heard of fonts being a problem... rather patterns, big images
> or very complex vector graphics.
>
An internal call to Arrays.copyOf (doing God knows what) takes all
available memory. Strange indeed.
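
As a guess only at why `Arrays.copyOf` would be the call that runs out of memory: growable buffers such as `ArrayList` and `ByteArrayOutputStream` use it to reallocate their backing array, so it is often simply the place where an ever-growing buffer finally fails. The class below is a hypothetical stand-in for such a buffer and has nothing to do with the other library's actual code.

```java
import java.util.Arrays;

// Hypothetical growable byte buffer, similar in spirit to ByteArrayOutputStream.
class GrowableBuffer {
    private byte[] data;
    private int size;
    private int reallocations;

    GrowableBuffer(int initialCapacity) {
        data = new byte[Math.max(1, initialCapacity)];
    }

    void append(byte b) {
        if (size == data.length) {
            // The old and the new array are both alive during the copy, so a
            // doubling step briefly needs about three times the old capacity.
            data = Arrays.copyOf(data, data.length * 2);
            reallocations++;
        }
        data[size++] = b;
    }

    int size() {
        return size;
    }

    int capacity() {
        return data.length;
    }

    int reallocations() {
        return reallocations;
    }
}
```

If something in the file makes a parser keep appending to such a buffer (a wrong length value or a loop that never ends, for instance), the stack trace would end in `Arrays.copyOf` regardless of which feature of the PDF triggered it.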

On Thu, 6 May 2021 at 12:34, Tilman Hausherr (<[email protected]>)
wrote:

> Maybe the PDF is somehow broken and PDFBox repairs it. If the problem
> goes away by just opening and saving the PDF, then why modify it?
>
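
The "open and save" round trip mentioned above could look like the following sketch, assuming PDFBox 2.0.x. The class `ResavePdf` and its method `resave` are hypothetical names. Whether anything actually gets repaired depends on the file; the sketch only loads the document and writes it out again.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;

// Hypothetical helper, not a PDFBox class.
class ResavePdf {

    // Loads the PDF and writes it back out unchanged. Use a different
    // output file than the input file.
    static void resave(File in, File out) throws IOException {
        try (PDDocument doc = PDDocument.load(in)) {
            doc.save(out);
        }
    }
}
```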
> I've never heard of fonts being a problem... rather patterns, big images
> or very complex vector graphics.
>
> Tilman
>
> On 06.05.2021 at 14:16, jorgeeflorez wrote:
> > Hi Tilman,
> > thank you for your reply.
> >
> >> It's more complicated because form XObjects, patterns, annotations,
> >> soft masks (and maybe more) can also have fonts. I also doubt that you
> >> can detect CJK fonts this way.
> >>
> > I see... I have a nasty PDF file that causes an OutOfMemoryError when used
> > by another library, and I reached the conclusion that it is (somehow)
> > because of the text and the fonts it uses...
> >
> > I saw the RemoveAllText example and maybe it is what I need. I modified it
> > so that, instead of removing text, it does nothing, and the new PDF file
> > seems to have the "corruption" removed...
> >
> > One last question: how could I modify the RemoveAllText example to remove
> > all images from the PDF file?
> >
> > Thanks.
> >
> > Jorge
> >
> >
> >
> > On Thu, 6 May 2021 at 1:07, Tilman Hausherr (<[email protected]>)
> > wrote:
> >
> >> On 05.05.2021 at 18:39, jorgeeflorez wrote:
> >>> Hi,
> >>> I would like to know what would be the best way to detect whether a PDF
> >>> file has CID fonts. As far as I understand, these fonts are used in Asian
> >>> texts (Japanese, Chinese, Korean, etc.). I have the following code:
> >>>
> >>>           PDDocument doc = PDDocument.load(myFile);
> >>>           for (int i = 0; i < doc.getNumberOfPages(); ++i)
> >>>           {
> >>>               PDPage page = doc.getPage(i);
> >>>               PDResources res = page.getResources();
> >>>               for (COSName fontName : res.getFontNames())
> >>>               {
> >>>                   PDFont font = res.getFont(fontName);
> >>>                   COSName subType = font.getCOSObject().getCOSName(COSName.SUBTYPE);
> >>>                   System.out.println("CID? " + COSName.TYPE0.equals(subType));
> >>>                   System.out.println("font instanceof PDType0Font? " + (font instanceof PDType0Font));
> >>>               }
> >>>           }
> >>> Would this be the right way to do it?
> >>
> >> It's more complicated because form XObjects, patterns, annotations,
> >> soft masks (and maybe more) can also have fonts. I also doubt that you
> >> can detect CJK fonts this way.
> >>
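
A sketch of a wider search along the lines described above, assuming PDFBox 2.0.x. It follows page resources, form XObjects, tiling patterns and the normal appearance streams of annotations, and reports whether any Type 0 (composite, CID-keyed) font is found. The class `Type0FontFinder` is a hypothetical name. Soft masks and the resources of Type 3 fonts are left out, and a positive result does not by itself show that the text is Chinese, Japanese or Korean.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.pattern.PDAbstractPattern;
import org.apache.pdfbox.pdmodel.graphics.pattern.PDTilingPattern;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAppearanceStream;

// Hypothetical helper, not a PDFBox class.
class Type0FontFinder {

    // Resource dictionaries already scanned, compared by identity, so that
    // shared or self-referencing resources are visited only once.
    private final Set<COSDictionary> visited =
            Collections.newSetFromMap(new IdentityHashMap<COSDictionary, Boolean>());

    static boolean hasType0Font(PDDocument doc) throws IOException {
        Type0FontFinder finder = new Type0FontFinder();
        for (PDPage page : doc.getPages()) {
            if (finder.scan(page.getResources())) {
                return true;
            }
            for (PDAnnotation annotation : page.getAnnotations()) {
                PDAppearanceStream appearance = annotation.getNormalAppearanceStream();
                if (appearance != null && finder.scan(appearance.getResources())) {
                    return true;
                }
            }
        }
        return false;
    }

    private boolean scan(PDResources res) throws IOException {
        if (res == null || !visited.add(res.getCOSObject())) {
            return false;
        }
        for (COSName name : res.getFontNames()) {
            PDFont font = res.getFont(name);
            if (font instanceof PDType0Font) {
                return true;
            }
        }
        // Form XObjects carry their own resources, so recurse into them.
        for (COSName name : res.getXObjectNames()) {
            PDXObject xobject = res.getXObject(name);
            if (xobject instanceof PDFormXObject
                    && scan(((PDFormXObject) xobject).getResources())) {
                return true;
            }
        }
        // Tiling patterns have a content stream and resources of their own.
        for (COSName name : res.getPatternNames()) {
            PDAbstractPattern pattern = res.getPattern(name);
            if (pattern instanceof PDTilingPattern
                    && scan(((PDTilingPattern) pattern).getResources())) {
                return true;
            }
        }
        return false;
    }
}
```

With a document opened through `PDDocument.load(myFile)`, a call to `Type0FontFinder.hasType0Font(doc)` would replace the per-page loop from the original question.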
> >> Re removing the text, see the RemoveAllText example in the source code
> >> download. IIRC this one only processes the page content stream.
> >>
> >> Tilman
> >>
> >>
> >>> I need to detect this and try to create a PDF file from the original, but
> >>> without the text.
> >>>
> >>> Any indication is appreciated.
> >>>
> >>> Regards,
> >>>
> >>> Jorge
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >>
