Re: Detect Invisible Text (placed by tools which make searchable PDF)

Luca Loiodice Fri, 03 May 2019 08:24:23 -0700

Excellent, looks promising, thanks a lot for your help!

A related question (still in the area of low-quality extracted text):
would it also be possible to detect which characters are drawn with a
font that has no Unicode mapping? I generally know how to detect that a
PDF contains, for example, a Type 3 font with no Unicode mapping, but
sometimes that font is used for only a small portion of the characters
on the page, and I would like to handle those characters specially.
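
To make the idea concrete, here is a rough sketch of the per-character
bookkeeping I have in mind. It is a toy content-stream scanner in plain
Java, not PDFBox code: it only understands the Tf, Tr, Tj, q and Q
operators and simple (...) strings without escapes, and the class and
method names (TextStateScanner, ShownText, scan) are made up for
illustration. The font name "T3" in the example stands for a font that is
assumed to lack a Unicode mapping. In PDFBox I would expect the same
information to come from the font and graphics state available while each
glyph is processed, but I have not verified the details.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy scanner, NOT the PDFBox API: it records the font name and text
// rendering mode in effect each time a string is shown with Tj.
final class TextStateScanner {

    /** A shown string plus the text state that was active when it was drawn. */
    static final class ShownText {
        final String text;
        final String fontName;   // resource name set by Tf (e.g. "F1"), or null
        final int renderingMode; // operand of the last Tr; the default is 0

        ShownText(String text, String fontName, int renderingMode) {
            this.text = text;
            this.fontName = fontName;
            this.renderingMode = renderingMode;
        }

        /** Mode 3 means neither fill nor stroke, i.e. invisible text. */
        boolean isInvisible() { return renderingMode == 3; }
    }

    /** Font and rendering mode as saved by q and restored by Q. */
    private static final class State {
        final String font;
        final int mode;
        State(String font, int mode) { this.font = font; this.mode = mode; }
    }

    static List<ShownText> scan(String content) {
        List<ShownText> out = new ArrayList<>();
        Deque<State> saved = new ArrayDeque<>();
        List<String> operands = new ArrayList<>();
        String font = null;
        int mode = 0;
        for (String tok : tokenize(content)) {
            if (isOperand(tok)) { operands.add(tok); continue; }
            String last = operands.isEmpty() ? null : operands.get(operands.size() - 1);
            switch (tok) {
                case "q": saved.push(new State(font, mode)); break;
                case "Q":
                    if (!saved.isEmpty()) { State s = saved.pop(); font = s.font; mode = s.mode; }
                    break;
                case "Tf": // operands: /FontName size
                    if (operands.size() >= 2 && operands.get(operands.size() - 2).startsWith("/")) {
                        font = operands.get(operands.size() - 2).substring(1);
                    }
                    break;
                case "Tr": // operand: integer rendering mode
                    if (last != null) mode = Integer.parseInt(last);
                    break;
                case "Tj": // operand: literal string
                    if (last != null && last.startsWith("(")) {
                        int end = last.endsWith(")") ? last.length() - 1 : last.length();
                        out.add(new ShownText(last.substring(1, end), font, mode));
                    }
                    break;
                default: break; // any other operator is ignored
            }
            operands.clear(); // every operator consumes its operands
        }
        return out;
    }

    private static boolean isOperand(String tok) {
        char c = tok.charAt(0);
        return c == '/' || c == '(' || c == '-' || c == '.' || Character.isDigit(c);
    }

    // Splits on whitespace, keeping "(...)" strings together (no escapes, no nesting).
    private static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (c == '(') {
                int close = s.indexOf(')', i);
                i = (close < 0) ? s.length() : close + 1;
            } else {
                i++;
                while (i < s.length() && !Character.isWhitespace(s.charAt(i)) && s.charAt(i) != '(') i++;
            }
            tokens.add(s.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical: "T3" is a font assumed to have no Unicode mapping.
        Set<String> unmapped = new HashSet<>(Arrays.asList("T3"));
        String stream = "BT /F1 12 Tf 3 Tr (hidden ocr) Tj ET "
                + "q BT /T3 10 Tf 0 Tr (glyphs) Tj ET Q "
                + "BT (still hidden) Tj ET";
        for (ShownText t : scan(stream)) {
            System.out.println(t.text + " invisible=" + t.isInvisible()
                    + " unmappedFont=" + unmapped.contains(t.fontName));
        }
    }
}
```

The point of keeping the font name next to each shown string is that the
decision can then be made per string rather than per page: text drawn
with a font from the "unmapped" set can be routed to special handling,
while the rest of the page is extracted normally.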


Thanks again





On Fri, May 3, 2019 at 10:07 AM Tilman Hausherr <[email protected]>
wrote:

> These answers may help:
>
> https://stackoverflow.com/questions/50044892/pdfbox-invisible-text-from-pdftextstripper-not-clip-path-or-color-issue
>
> https://stackoverflow.com/questions/50487520/pdfbox-2-0-invisible-text-from-pdftextstripper
>
> Tilman
>
> On 03.05.2019 at 17:02, Luca Loiodice wrote:
> > Hello,
> >
> > I would need to remove (often low-quality) invisible text placed on
> > images by tools which use OCR to make searchable PDFs.
> >
> > We use PDFBox ourselves to make searchable PDFs, and we call
> > setRenderingMode(RenderingMode.NEITHER) when we place the text to
> > make it invisible. We also use PDFBox's text stripper to remove text from
> > PDFs.
> >
> > What I am not sure about is whether there is a way for the text stripper
> > to identify the characters that have been placed as invisible, so that
> > only those are removed in some cases.
> >
> > Thanks for your help,
> > Luca
> >
