Re: Detect Invisible Text (placed by tools which make searchable PDF)

Tilman Hausherr Fri, 03 May 2019 08:29:55 -0700

On 03.05.2019 at 17:23, Luca Loiodice wrote:

Excellent, looks promising, thanks a lot for your help!


A related question (still in the area of low-quality extracted text):
would it also be possible to detect which characters are drawn with a
font that has no Unicode mappings? I generally know how to detect, for
example, that a PDF has a Type 3 font with no Unicode mapping, but
sometimes that font is only used for a small portion of the characters
on the page, and I want to handle those characters specially.

Thanks again

You could check whether getUnicode() is null or empty; that would be the
easiest. Or get the font, call getCOSObject() and check whether a
ToUnicode item exists. (However, sometimes there is a missing ToUnicode
but getUnicode() returns something anyway... "it's complicated")


Tilman
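
The two checks suggested in this reply could be combined roughly as in the sketch below. It only shows the decision logic in plain Java; the class, enum and method names are made up for illustration. The assumption is that, inside a PDFTextStripper subclass, `unicode` would come from TextPosition.getUnicode() and `fontHasToUnicode` from something like text.getFont().getCOSObject().containsKey(COSName.TO_UNICODE).

```java
// Hypothetical helper: classifies one glyph from the two signals
// mentioned in the reply. Not part of PDFBox.
class UnicodeMappingCheck {

    enum Mapping {
        MAPPED,                    // text present and font has a ToUnicode entry
        MAPPED_WITHOUT_TOUNICODE,  // text present although ToUnicode is missing
        UNMAPPED                   // no usable text for this glyph
    }

    // unicode: assumed to be the result of TextPosition.getUnicode()
    // fontHasToUnicode: assumed to be whether the font dictionary
    //                   contains a ToUnicode entry
    static Mapping classify(String unicode, boolean fontHasToUnicode) {
        if (unicode == null || unicode.isEmpty()) {
            return Mapping.UNMAPPED;
        }
        // The "it's complicated" case: getUnicode() may still return
        // something even when ToUnicode is missing, so keep it separate.
        return fontHasToUnicode ? Mapping.MAPPED : Mapping.MAPPED_WITHOUT_TOUNICODE;
    }
}
```

Keeping the middle case separate lets the caller decide whether text returned without a ToUnicode entry should be trusted or handled specially.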





On Fri, May 3, 2019 at 10:07 AM Tilman Hausherr <[email protected]>
wrote:

These answers may help:

https://stackoverflow.com/questions/50044892/pdfbox-invisible-text-from-pdftextstripper-not-clip-path-or-color-issue

https://stackoverflow.com/questions/50487520/pdfbox-2-0-invisible-text-from-pdftextstripper

Tilman

On 03.05.2019 at 17:02, Luca Loiodice wrote:

Hello,

I need to remove the (often low-quality) invisible text placed on images
by tools which use OCR to make searchable PDFs.

We use pdfbox ourselves to make searchable PDFs... and we use
setRenderingMode(RenderingMode.NEITHER); when we place the text to
make it invisible. We also use pdfbox's text stripper to remove text
from PDFs.

What I am not sure about is whether there is a way for the text stripper
to identify the characters that have been placed as invisible, and to
remove only those in some cases.

Thanks for your help,
Luca
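
A rough sketch of the filtering asked about here, in plain Java. It assumes each extracted glyph can be paired with the text rendering mode that was active when it was drawn; in PDFBox 2.x one possible route is an overridden PDFTextStripper.processTextPosition() that reads getGraphicsState().getTextState().getRenderingMode(). The Glyph class and method names are hypothetical. The value 3 is the PDF text rendering mode for "neither fill nor stroke", which is what RenderingMode.NEITHER stands for.

```java
import java.util.List;

// Hypothetical helper: drops glyphs that were drawn invisibly.
class InvisibleTextFilter {

    // PDF text rendering mode 3: neither fill nor stroke (invisible).
    static final int MODE_NEITHER = 3;

    // Minimal stand-in for one extracted character plus its rendering mode.
    static final class Glyph {
        final String unicode;
        final int renderingMode;

        Glyph(String unicode, int renderingMode) {
            this.unicode = unicode;
            this.renderingMode = renderingMode;
        }
    }

    static boolean isInvisible(int renderingMode) {
        return renderingMode == MODE_NEITHER;
    }

    // Concatenates only the glyphs that were actually painted.
    static String visibleText(List<Glyph> glyphs) {
        StringBuilder sb = new StringBuilder();
        for (Glyph g : glyphs) {
            if (!isInvisible(g.renderingMode)) {
                sb.append(g.unicode);
            }
        }
        return sb.toString();
    }
}
```

This only covers text hidden through the rendering mode; text hidden by clipping, by color, or by being covered with an image would need other checks.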


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


