Re: Token Coordinates at Image

Tim Allison Mon, 25 Nov 2019 07:03:06 -0800

Hi Furkan,

  First, are you processing PDFs or actual image files?  If PDFs, be
careful about simply drawing black boxes over the page: there may still be
a record of the underlying text in the file, and while a user might not be
able to see the sensitive information, it may remain available to
inquiring minds.

  If PDFs, are these PDFs image-only, or is there underlying electronic
text?  If image-only, you could use the hOCR output from Tesseract, which
reports word coordinates in an HTML output file.
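
  As a rough sketch of what reading that output could look like (not
something Tika does for you today): running `tesseract page.png page hocr`
should write page.hocr, in which each word is a span of class ocrx_word
whose title attribute carries "bbox x0 y0 x1 y1".  The class and function
names below (HocrWordParser, find_token_boxes) and the SAMPLE markup are
made up for illustration, and the code assumes word spans do not contain
nested spans.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, (x0, y0, x1, y1)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open word span.
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumption: word spans hold no nested spans, so the next
        # closing span tag ends the current word.
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def find_token_boxes(hocr, token):
    """Return the pixel bounding boxes of words equal to token (case-insensitive)."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return [box for text, box in parser.words if text.lower() == token.lower()]


# Hand-written hOCR fragment for illustration, not real Tesseract output.
SAMPLE = """<div class='ocr_page' title='bbox 0 0 800 600'>
<span class='ocr_line' title='bbox 10 10 400 40'>
<span class='ocrx_word' title='bbox 10 10 90 40; x_wconf 96'>Account</span>
<span class='ocrx_word' title='bbox 100 10 220 40; x_wconf 91'>12345678</span>
</span></div>"""

if __name__ == "__main__":
    print(find_token_boxes(SAMPLE, "12345678"))
```

  The boxes are in image pixels, so they could be handed to whatever
drawing library you use to fill the rectangles.  Matching here is per
word; a token that OCR splits across several words would need the
neighbouring boxes merged.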

  Now, if there is underlying text, we aren't currently extracting text
positions from PDFs...although we could.

@Eric Pugh <[email protected]>, recommendations?

  Cheers,

                      Tim

On Mon, Nov 25, 2019 at 7:39 AM Furkan KAMACI <[email protected]>
wrote:

> Hi All,
>
> I want to black out particular text in an image (similar to what is
> described here:
> https://helpx.adobe.com/acrobat/using/removing-sensitive-content-pdfs.html
> )
>
> I know that I can find tokens in an image via Tika. However, I need the
> coordinates of a found token in the image to automatically black out
> specific text.
>
> How can I achieve this?
>
> Kind Regards,
> Furkan KAMACI
>
