[tesseract-ocr] Extracting text from animated GIFs

Daniel Bishop Fri, 19 Feb 2016 01:33:55 -0800

Hello everyone!

I'm just getting started with Tesseract and am impressed by how well it 
handles tasks like scanned black-and-white text! I'm less thrilled with how 
it does on my current project, which is extracting the text from animated 
GIFs such as reaction GIFs and memes.

After reading the FAQ and the ImproveQuality article, and after some further 
experimenting, it seems the image resolution (DPI) is not the problem. Most 
of the trouble appears to come from the variety of background colors behind 
the text and/or the font(s) commonly used for memes.
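
To make the background-color issue concrete, here is a minimal sketch of one common preprocessing idea, global (Otsu) binarization, which picks a single gray-level cutoff separating bright text from a darker, busier background. It is only an illustration under assumptions: the frame has already been decoded to a flat list of 0-255 grayscale values by some other tool, the text is brighter than most of the background, and the helper names `otsu_threshold` and `binarize` are hypothetical, not part of Tesseract or Leptonica.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximizes between-class variance.

    `pixels` is a flat list of integer grayscale values; decoding the GIF
    frame into such a list is assumed to happen elsewhere.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of gray levels of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # nothing in the dark class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # everything is in the dark class; no split left
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels, threshold):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [255 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    # Toy "frame": mostly dark background with a band of bright text pixels.
    frame = [20] * 50 + [30] * 50 + [220] * 30 + [240] * 20
    t = otsu_threshold(frame)
    print("threshold:", t)
    print("white pixels:", binarize(frame, t).count(255))
```

A single global threshold like this tends to struggle when the background changes a lot across the frame, which is likely the case for many memes, so it should be treated as a starting point rather than a fix.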

Does anyone have any experience with this, or have any helpful advice for 
this specific task? Attached is a sample of the kind of thing I want to 
process.

Thank you for your time.

(Incidentally, even though the new versions of Leptonica and Tesseract both 
say they support GIFs, I get the following error when I pass a GIF in:

Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Error in pixReadMemGif: function not present
Error in pixReadMem: gif: no pix returned
Error during processing.

)

