Hi Furkan,

First, are you processing PDFs or actual image files? If PDFs, be careful about simply blacking out regions of the page: the underlying text may still be stored in the file, so while a user might not be able to see the sensitive information, it may still be available to inquiring minds who extract the text.
If PDFs, are they image-only, or is there underlying electronic text? If image-only, you could use the hOCR output from Tesseract, which reports coordinates in an HTML output file. Now, if there is underlying text, we aren't currently extracting text positions from PDFs... although we could.

@Eric Pugh <[email protected]>, recommendations?

Cheers,

Tim

On Mon, Nov 25, 2019 at 7:39 AM Furkan KAMACI <[email protected]> wrote:

> Hi All,
>
> I want to black out some particular texts at image (similar to described
> at here:
> https://helpx.adobe.com/acrobat/using/removing-sensitive-content-pdfs.html
> )
>
> I know that I can find tokens at image via Tika. However, I need the
> coordinates of a found token at image to automatically black out specific
> texts.
>
> How can I achieve this?
>
> Kind Regards,
> Furkan KAMACI
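
P.S. For the image-only case, here is a rough sketch of pulling word coordinates out of hOCR, in case it helps. It assumes you have already produced an hOCR file (for example with `tesseract input.png out hocr`) and that words appear as elements with class `ocrx_word` whose `title` attribute carries `bbox x0 y0 x1 y1` in image pixels. The names `HocrWordParser`, `find_boxes`, and the `SAMPLE` snippet are made up for illustration; this is not something Tika provides.

```python
import re
from html.parser import HTMLParser

# hOCR stores the box in the title attribute, e.g. "bbox 36 92 96 116; x_wconf 95"
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, (x0, y0, x1, y1)) for every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word currently being read, if any
        self._text = []     # text fragments of the current word
        self._depth = 0     # nesting depth inside the current word (<strong>, <em>)

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            self._depth += 1
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(g) for g in match.groups())
                self._text = []
                self._depth = 0

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        self.words.append(("".join(self._text).strip(), self._bbox))
        self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def find_boxes(hocr_html, targets):
    """Return (text, bbox) for each OCR word that matches a target, ignoring case."""
    parser = HocrWordParser()
    parser.feed(hocr_html)
    parser.close()
    wanted = {t.lower() for t in targets}
    return [(text, bbox) for text, bbox in parser.words if text.lower() in wanted]


# Hand-written hOCR fragment for illustration only (not real OCR output).
SAMPLE = """<div class='ocr_page' title='bbox 0 0 640 480'>
<span class='ocr_line' title='bbox 36 90 300 120'>
<span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Account</span>
<span class='ocrx_word' id='word_1_2' title='bbox 110 92 200 116; x_wconf 91'><strong>12345</strong></span>
</span></div>"""

if __name__ == "__main__":
    for text, bbox in find_boxes(SAMPLE, ["12345"]):
        print(text, bbox)
```

The boxes use the image's pixel coordinates with the origin at the top left, so they can be handed to whatever drawing library you use to fill the rectangles. This only matches single tokens exactly; multi-word phrases or OCR misreads would need extra handling.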
