On Thursday, December 8, 2016 at 9:39:20 PM UTC-8, José Luis Mendoza Azanza wrote:
> I am integrating Tesseract into an application, but I have some questions
> before I continue with the process.
> I think every application should have security filters and considerations
> in order to avoid malicious and bad input data, so my questions are:
>    1. Does Tesseract have special code to handle bad or malicious input 
>    data? 
Bad data for tesseract means an invalid image of some kind. It uses the
leptonica library, which does a number of sanity checks on images. Tesseract
itself does not do anything special beyond that.

In its current form I would not consider it safe to allow a potential 
attacker to submit a chosen image to tesseract. I would assume that remote 
code execution vulnerabilities exist. Consider using ImageMagick or Pillow to
sanitize the image before tesseract gets to see it.
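
As a rough illustration of the sanitizing step suggested above, here is a minimal sketch using Pillow. The function name `sanitize_image`, the pixel limit, and the format allow-list are assumptions made up for this example, not anything tesseract or leptonica provides. Note that this only moves the parsing of untrusted bytes into Pillow, so it reduces the risk rather than removing it.

```python
import io

from PIL import Image

# Hypothetical limits chosen for illustration; tune them for your application.
MAX_PIXELS = 50_000_000
ALLOWED_FORMATS = {"PNG", "JPEG", "TIFF", "BMP"}


def sanitize_image(raw: bytes) -> bytes:
    """Decode untrusted bytes with Pillow and re-encode them as a fresh PNG."""
    # First pass: identify the format and size, then run Pillow's integrity check.
    probe = Image.open(io.BytesIO(raw))
    if probe.format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported image format: {probe.format}")
    width, height = probe.size
    if width * height > MAX_PIXELS:
        raise ValueError("image has too many pixels")
    probe.verify()

    # verify() leaves the image object unusable, so decode from a fresh stream.
    img = Image.open(io.BytesIO(raw))
    clean = img.convert("RGB")  # forces a full decode of the pixel data

    # Re-encode so that only pixel data, not the original container, is passed on.
    out = io.BytesIO()
    clean.save(out, format="PNG")
    return out.getvalue()
```

The returned bytes can then be written to a temporary file and handed to tesseract. Running tesseract itself in a restricted process would still be a sensible extra precaution.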

>    1. Or does it just have a few validations to tell the user what the
>    correct input data is?
What's there is pretty basic for the command line input, and the API has
even less.

>    1. Are releases performed after doing some security reviews and
>    testing?
To my knowledge, no, there's never been a formal security review. There's
a lot of ugly legacy C++ and C and questionable practices in the code.

>    1. Or just functional testing?
The CI scripts only check that Tesseract compiles on some supported
platforms. There's a test suite that checks OCR quality in a statistical
sense, but not correctness or valid output per se.
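
To make "OCR quality in a statistical sense" concrete, here is a small sketch of one common aggregate metric, the character error rate over a set of (OCR output, ground truth) pairs. This is not taken from Tesseract's test suite; the function names are hypothetical and the metric is only an assumption about the kind of check meant.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(pairs):
    """Total edits divided by total ground-truth characters over all pairs."""
    total_edits = sum(edit_distance(ocr, truth) for ocr, truth in pairs)
    total_chars = sum(len(truth) for _, truth in pairs)
    return total_edits / total_chars if total_chars else 0.0


# Hypothetical sample data: one substitution across ten ground-truth characters.
samples = [("he1lo", "hello"), ("world", "world")]
print(character_error_rate(samples))
```

A check like this passes as long as the aggregate rate stays under some threshold, which is why it says little about whether any individual output is correct or well-formed.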

> I will appreciate your answers.
> Thanks a lot!
