[tesseract-ocr] Re: Tesseract security considerations

James R Barlow Fri, 13 Jan 2017 09:09:51 -0800

On Thursday, December 8, 2016 at 9:39:20 PM UTC-8, José Luis Mendoza Azanza 
wrote:
>
> I am integrating Tesseract into an application, but I have some questions 
> before I keep going with the process.
>
> I think every application should have security filters and considerations 
> in order to avoid malicious and bad input data, so my questions are:
>
>    1. Does Tesseract have special code to handle bad or malicious input 
>    data? 
>

Bad data for tesseract means an invalid image of some kind. It uses the 
leptonica library, which does a number of sanity checks on images. Tesseract 
itself does not do anything special beyond that.


In its current form I would not consider it safe to allow a potential 
attacker to submit a chosen image to tesseract. I would assume that remote 
code execution vulnerabilities exist. Consider using ImageMagick or Pillow to 
sanitize the image before tesseract gets to see it.
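
As a minimal sketch of the Pillow route (not a vetted security control): 
decode the untrusted bytes with Pillow, reject anything outside a short 
format whitelist or above a pixel budget, and re-encode the pixels as a fresh 
PNG so that tesseract only reads a file Pillow wrote. The function name 
sanitize_image, the whitelist, and the size limit are assumptions chosen for 
illustration. Note that this moves the parsing risk onto Pillow rather than 
removing it, and it makes no attempt to strip metadata.

```python
import io

from PIL import Image

# Assumed limits for illustration; tune them for your application.
MAX_PIXELS = 50_000_000
ALLOWED_FORMATS = {"PNG", "JPEG", "TIFF"}


def sanitize_image(raw: bytes) -> bytes:
    """Decode untrusted image bytes with Pillow and return a fresh PNG."""
    # First pass: identify the format, check the dimensions, and let
    # verify() look for obvious corruption before any full decode.
    with Image.open(io.BytesIO(raw)) as probe:
        if probe.format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported image format: {probe.format}")
        width, height = probe.size
        if width * height > MAX_PIXELS:
            raise ValueError("image exceeds the pixel budget")
        probe.verify()

    # verify() leaves the image object unusable, so reopen for the real decode.
    with Image.open(io.BytesIO(raw)) as img:
        img.load()  # force the full decode here, inside Pillow
        clean = img.convert("RGB")

    # Re-encode the decoded pixels; tesseract never sees the original bytes.
    out = io.BytesIO()
    clean.save(out, format="PNG")
    return out.getvalue()
```

Unparseable input makes Image.open raise an OSError subclass, so the caller 
can treat both OSError and ValueError as "reject this upload". Only the 
bytes returned by sanitize_image would then be written to disk and handed to 
tesseract.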
 

>
>    1. Or just have a few validations to tell the user the correct input 
>    data?
>    

What's there is pretty basic for the command line input, and the API has 
even less.

>
>    1. Releases are performed after doing some security reviews and 
>    testing?
>

To my knowledge, no, there has never been a formal security review. There's 
a lot of ugly legacy C++ and C and questionable practices in the code, 
honestly.

>
>    1. Or just functional testing?
>    

The CI scripts only check that Tesseract compiles on some supported 
platforms. There's a test suite that checks OCR quality in a statistical 
sense, but not correctness or valid output per se.
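
To illustrate what "quality in a statistical sense" can mean, here is a 
hypothetical sketch, not the project's actual test suite. It scores each OCR 
result against a reference transcript with difflib's similarity ratio (an 
assumed stand-in for a real character-accuracy metric) and passes when the 
average clears a threshold. The names similarity, mean_similarity, and 
suite_passes are made up for this example.

```python
from difflib import SequenceMatcher
from typing import Iterable, Tuple


def similarity(expected: str, actual: str) -> float:
    """Return a 0..1 similarity ratio between a reference and OCR output."""
    if not expected and not actual:
        return 1.0
    return SequenceMatcher(None, expected, actual, autojunk=False).ratio()


def mean_similarity(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average the per-sample similarity over (expected, actual) pairs."""
    scores = [similarity(expected, actual) for expected, actual in pairs]
    if not scores:
        raise ValueError("no samples to score")
    return sum(scores) / len(scores)


def suite_passes(pairs: Iterable[Tuple[str, str]], threshold: float) -> bool:
    """Pass when the average similarity meets the threshold."""
    return mean_similarity(pairs) >= threshold
```

A check like this can pass while individual outputs are still wrong: one 
misread character in one sample barely moves the average. It also says 
nothing about how the program behaves on malformed or hostile input.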

>
> I will appreciate your answers.
> Thanks a lot!
>
