On Thursday, December 8, 2016 at 9:39:20 PM UTC-8, José Luis Mendoza Azanza wrote:
> I am integrating Tesseract into an application, but I have some questions
> before I continue with the process.
> I think every application should have security filters and considerations
> in order to avoid malicious and bad input data, so my questions are:
> 1. Does Tesseract have special code to handle bad or malicious input
Bad data for Tesseract means an invalid image of some kind. It uses the
Leptonica library, which does a number of sanity checks on images. Beyond
that, Tesseract does not do anything special.
In its current form I would not consider it safe to allow a potential
attacker to submit a chosen image to Tesseract. I would assume that remote
code execution vulnerabilities exist. Consider using ImageMagick or Pillow
to sanitize the image before Tesseract gets to see it.
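For what it's worth, the sanitizing step could look roughly like the sketch
below with Pillow. The size limits, the PNG/JPEG allow-list, and the function
names are my own assumptions, not anything Tesseract provides. Decoding and
re-encoding only narrows what reaches Tesseract and Leptonica; it moves the
parsing risk into Pillow and is not a guarantee.

```python
import io

# Hypothetical limits; tune them for your application.
MAX_BYTES = 20 * 1024 * 1024
MAX_PIXELS = 50_000_000

# Magic numbers for the only formats this sketch accepts.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
}


def sniff_format(data: bytes):
    """Return "PNG" or "JPEG" based on the leading bytes, else None."""
    for signature, name in _SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def sanitize_image(data: bytes, max_bytes=MAX_BYTES, max_pixels=MAX_PIXELS) -> bytes:
    """Decode an untrusted upload and return it re-encoded as a plain PNG."""
    if len(data) > max_bytes:
        raise ValueError("image file too large")
    expected = sniff_format(data)
    if expected is None:
        raise ValueError("unsupported or unrecognized image format")

    # Imported here so the cheap checks above work without Pillow installed.
    from PIL import Image

    try:
        with Image.open(io.BytesIO(data)) as img:
            if img.format != expected:
                raise ValueError("header does not match decoded format")
            width, height = img.size
            if width * height > max_pixels:
                raise ValueError("image dimensions too large")
            # convert() forces a full decode and keeps only the pixel data.
            clean = img.convert("RGB")
    except ValueError:
        raise
    except Exception as exc:
        # Pillow can raise several error types on corrupt input.
        raise ValueError(f"could not decode image: {exc}") from exc

    out = io.BytesIO()
    clean.save(out, format="PNG")
    return out.getvalue()
```

The returned bytes can be written to a temporary file and that file handed to
Tesseract instead of the original upload. Running the OCR step in a separate,
unprivileged process is still a good idea on top of this.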
> 1. Or just have a few validations to tell the user the correct input
What's there is pretty basic for the command line input, and the API has
> 1. Releases are performed after doing some security reviews and
To my knowledge, no, there's never been a formal security review. There's
a lot of ugly legacy C++ and C and questionable practices in the code,
> 1. Or just functional testing?
The CI scripts only check that Tesseract compiles on some supported
platforms. There's a test suite that checks OCR quality in a statistical
sense, but not correctness or valid output per se.
> I will appreciate your answers.
> Thanks a lot!