Good work extracting the text, but the result is not yet good enough for
Tesseract. Try blurring your result image until the characters become less
blocky. That way you probably won't need training.
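
As a rough illustration of what the blur does to blocky characters, here is a
minimal sketch in plain Python. It is not code from this thread: the function
name box_blur and the tiny test image are made up for the example, and a
simple box blur stands in for whatever blur you actually use. In practice you
would more likely let ImageMagick do this with its -blur option.

```python
def box_blur(pixels, radius=1):
    """Return a blurred copy of a grayscale image given as rows of 0-255 values."""
    height, width = len(pixels), len(pixels[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            # Average every pixel in the window, clipped at the image border.
            window = [
                pixels[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(sum(window) // len(window))
        blurred.append(row)
    return blurred


# A blocky diagonal edge: black (0) text pixels on a white (255) background.
blocky = [
    [255, 255, 255, 255],
    [255, 255, 0, 0],
    [255, 0, 0, 0],
    [0, 0, 0, 0],
]
smooth = box_blur(blocky)
for row in smooth:
    print(row)
```

Pixels along the stair-step edge end up as intermediate grays instead of hard
black or white, which is the "less blocky" effect. How much blur is enough is
something to find by trial on the real images.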

A completely different approach is to use fixed pattern matching. Look for my
earlier post about pulling text out of game screenshots. You'll need to write
the matching code yourself, though.
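
To give an idea of what fixed pattern matching means here, below is a minimal
sketch in plain Python. It is not the code from the post mentioned above; the
names match_score and find_glyphs, the 3x5 glyph shapes, and the "#"/"."
encoding of a thresholded image are all assumptions made for the example. The
idea is that the on-screen font never changes, so each letter can be stored
once as a small bitmap and compared against every position in the image.

```python
def match_score(image, template, top, left):
    """Fraction of template pixels that agree with the image at (top, left)."""
    hits = 0
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            total += 1
            if image[top + r][left + c] == value:
                hits += 1
    return hits / total


def find_glyphs(image, templates, min_score=1.0):
    """Slide each template over the image; return (left, label) pairs, left to right."""
    found = []
    for label, template in templates.items():
        t_h, t_w = len(template), len(template[0])
        for top in range(len(image) - t_h + 1):
            for left in range(len(image[0]) - t_w + 1):
                if match_score(image, template, top, left) >= min_score:
                    found.append((left, label))
    return sorted(found)


# Hypothetical 3x5 glyphs; "#" is a text pixel, "." is background.
templates = {
    "C": ["###", "#..", "#..", "#..", "###"],
    "B": ["##.", "#.#", "##.", "#.#", "##."],
    "S": ["###", "#..", "###", "..#", "###"],
}

# A thresholded image containing the three glyphs with one-pixel gaps.
image = [
    "###.##..###",
    "#...#.#.#..",
    "#...##..###",
    "#...#.#...#",
    "###.##..###",
]

matches = find_glyphs(image, templates)
text = "".join(label for _, label in matches)
print(text)  # CBS
```

With only a handful of network names to recognize, the template set stays
small. On real frames an exact match is probably too strict, so min_score
would need to be lowered and tuned against noise.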

The last thing I'd try is training. The wiki is your friend.

-Dmitri
On Sep 15, 2015 10:36 AM, "Keith Reilly" <[email protected]> wrote:

> Okay, so my project is that I want to extract the text embedded in video.
> After experimenting with ImageMagick I was able to isolate the text and put
> it on a white background. I thought that would be the hard part. But every
> command-line OCR program I try is pretty bad at converting what I have. In
> the sample image, f2.png, you can see what I'm working with. It is just the
> network name and date I need. I used this ImageMagick command:
>
> convert f1.png f2.png f3.png f4.png f5.png f6.png f7.png
> -evaluate-sequence Min -threshold 60% -negate output.png
>
> I thought that was a pretty good result: a clean image with decent text.
> Tesseract gets about 50% right. My question is this: can I train Tesseract
> without the full alphabet? Since these are all labeled by network, and
> Vanderbilt only records a few, I'll have FOX, ABC, CBS, NBC, and CNN. Not
> too many letters to train with. Also, could anyone point out instructions
> on getting the training tools installed on Mac OS X? MacPorts doesn't have
> the training part. I did install v3 from source, but the training programs
> won't compile. Any help is appreciated.
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/52275c37-543e-4b85-ab44-6c51f890ca6b%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/52275c37-543e-4b85-ab44-6c51f890ca6b%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
