Have you tried the existing English traineddata?

I get good recognition with your 'prepared_image'.

If that is the kind of image you need to OCR, you could run it with psm 6
(assume a single uniform block of text) and then split the result into
individual letters.
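
A rough Python sketch of that psm 6 approach. It assumes a recent Tesseract (4.x or later), which spells the option `--psm` (the 3.x releases use `-psm`) and accepts `stdout` as the output base. The helper names are made up for illustration.

```python
from __future__ import annotations

import subprocess


def build_block_command(image_path: str, lang: str = "eng") -> list[str]:
    # psm 6: treat the whole image as one uniform block of text.
    # "stdout" as the output base prints the text instead of writing a file.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", "6"]


def split_letters(ocr_text: str) -> list[str]:
    # Drop spaces, newlines and any trailing form feed; keep each remaining
    # character as a separate item.
    return [ch for ch in ocr_text if not ch.isspace()]


def ocr_letters(image_path: str, lang: str = "eng") -> list[str]:
    # Needs the tesseract binary on PATH and the requested traineddata installed.
    result = subprocess.run(
        build_block_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return split_letters(result.stdout)
```

For example, `split_letters("AB C\nDE\n")` gives `['A', 'B', 'C', 'D', 'E']`, so `ocr_letters("prepared_image.png")` would return one list item per recognized character.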

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Fri, Nov 14, 2014 at 7:12 PM, Simon Støvring <[email protected]>
wrote:

> Hello,
>
> I am trying to recognize single characters written with the Gotham Bold
> font. I have trained Tesseract by following Michael Jay Lissner's guide
> "Adding New Fonts to Tesseract 3 OCR Engine"
> <http://michaeljaylissner.com/posts/2012/02/11/adding-new-fonts-to-tesseract-3-ocr-engine/>.
> I trained it using a newspaper article, removing all characters that I am
> not interested in and making sure all characters are upper case, since I
> am not going to match lower case characters.
>
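A minimal sketch of the text-filtering step described above, assuming the training text is plain text that is later printed or rendered in Gotham Bold. The function name and the A-Z character set are illustrative only; whitespace is kept so words and lines stay separated.

```python
from __future__ import annotations

import string

# Hypothetical wanted set: upper-case Latin letters only.
WANTED = set(string.ascii_uppercase)


def prepare_training_text(raw: str, wanted: set[str] = WANTED) -> str:
    """Upper-case the text and drop every character outside `wanted`.

    Spaces and newlines are preserved so the rendered page keeps its layout.
    """
    out = []
    for ch in raw.upper():
        if ch in wanted or ch in " \n":
            out.append(ch)
    return "".join(out)
```

For instance, `prepare_training_text("Hello, World 42!\n")` returns `"HELLO WORLD \n"`: the comma, digits and exclamation mark are removed.
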
> I run Tesseract with my custom language and with the page segmentation
> mode (psm) set to 10, which treats the image as a single character.
>
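A sketch of running psm 10 over a folder of sliced character images, assuming Tesseract 4.x or later (`--psm`; 3.x uses `-psm`) and a custom traineddata with the hypothetical name `gotham`. An empty result is reported as `None`, which corresponds to the "not matched to anything" cases described below.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_char_command(image_path: str, lang: str) -> list[str]:
    # psm 10: treat the image as a single character.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", "10"]


def interpret(output: str) -> str | None:
    # Strip whitespace (including a trailing form feed); an empty string
    # means Tesseract reported no character for this image.
    text = output.strip()
    return text or None


def recognize_slices(folder: str, lang: str = "gotham") -> dict[str, str | None]:
    # "gotham" is a hypothetical name for the custom traineddata.
    results: dict[str, str | None] = {}
    for path in sorted(Path(folder).glob("*.png")):
        proc = subprocess.run(
            build_char_command(str(path), lang),
            capture_output=True,
            text=True,
            check=True,
        )
        results[path.name] = interpret(proc.stdout)
    return results
```

Collecting results per file name makes it easy to compare against the expected letters and spot which slices come back wrong or empty.
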
> While most of the matches are fine, I am getting a lot of incorrect
> matches. For example, the image of the letter "B" below is matched as an
> "X". I cannot figure out why.
>
>
> <https://lh4.googleusercontent.com/-AOLPnD7nXJY/VGYC58I-roI/AAAAAAAAASQ/kTJq9eSNMy4/s1600/0-4.png>
>
> The "B" below, which looks the same as the one above but is in fact a
> different image, is not matched to anything. Tesseract does not know what
> is in the image.
>
>
> <https://lh4.googleusercontent.com/-b0kMaAzcN-Y/VGYFI6NOzjI/AAAAAAAAASk/c9EfpR8CjWI/s1600/1-7.png.png>
>
>
> The "C" below is not matched to anything either. Tesseract cannot figure
> out what is in the image.
>
>
> <https://lh5.googleusercontent.com/-ZKl8jE2Orto/VGYEs2xzGlI/AAAAAAAAASc/2xTXomhIkWI/s1600/0-8.png>
> The same goes for the "U" below.
>
>
> <https://lh5.googleusercontent.com/-fciIyBe9bDw/VGYFRh3YBNI/AAAAAAAAASs/29WZQUHqPmE/s1600/1-8.png>
> And it thinks the "E" below is a "K".
>
>
> <https://lh4.googleusercontent.com/-ZZFkr77drgM/VGYFcDydDXI/AAAAAAAAAS0/RQ1UO8U3rOY/s1600/1-9.png>
>
> The above errors are just examples. There are others, but I think these
> four examples illustrate the quirks I'm currently dealing with.
>
> I manually slice the image below into images of single characters like the
> ones above. Maybe a completely different approach is better?
>
>
> <https://lh4.googleusercontent.com/-TfwZnXosqB0/VGYFjLppJ9I/AAAAAAAAAS8/Oun76IHLwks/s1600/prepared_image.png>
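
Instead of slicing by hand, a single line of characters can often be split automatically by vertical projection: count the dark pixels in each column and cut wherever a column is empty. This sketch assumes an already binarized image given as rows of 0/1 values (1 = ink) and characters that do not touch or overlap horizontally; the helper names are hypothetical and plain lists stand in for an image library.

```python
from __future__ import annotations


def column_ink(image: list[list[int]]) -> list[int]:
    # Number of ink pixels (value 1) in each column.
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def split_columns(image: list[list[int]]) -> list[tuple[int, int]]:
    # Return (start, end) column ranges, end exclusive, one per run of
    # consecutive inked columns, i.e. one per character.
    ink = column_ink(image)
    spans: list[tuple[int, int]] = []
    start = None
    for x, count in enumerate(ink):
        if count > 0 and start is None:
            start = x                     # a character begins here
        elif count == 0 and start is not None:
            spans.append((start, x))      # the character ended at x - 1
            start = None
    if start is not None:
        spans.append((start, len(ink)))   # last character touches the right edge
    return spans


def crop(image: list[list[int]], span: tuple[int, int]) -> list[list[int]]:
    # Cut out the columns of one character, keeping the full height.
    x0, x1 = span
    return [row[x0:x1] for row in image]
```

Each cropped piece can then be saved and passed to Tesseract with psm 10. Characters whose strokes touch (common with bold fonts) would come out as one span and need extra handling.
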
> Does anyone know how I can improve the recognition of single characters?
> I'd like the above examples to match correctly, but in general the results
> are just not good enough and I'd like to know if there's any way I can
> improve them. Should I train differently? Should I pass other
> configurations, or should I process the images before trying to recognize
> the characters?
>
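On the preprocessing question, two common steps are binarizing the image and adding a white margin around each sliced character; Tesseract's image-quality guidance notes that text with no border around it can be recognized poorly. This sketch assumes 8-bit greyscale input as nested lists; the fixed threshold of 128 and the 10-pixel border are illustrative values, and an adaptive method such as Otsu's may binarize better.

```python
from __future__ import annotations


def binarize(gray: list[list[int]], threshold: int = 128) -> list[list[int]]:
    # Pixels darker than the threshold become black (0), the rest white (255).
    return [[0 if px < threshold else 255 for px in row] for row in gray]


def pad(image: list[list[int]], border: int = 10, fill: int = 255) -> list[list[int]]:
    # Surround the image with `border` pixels of `fill` (white) on every side.
    width = len(image[0]) + 2 * border
    top = [[fill] * width for _ in range(border)]
    middle = [[fill] * border + list(row) + [fill] * border for row in image]
    bottom = [[fill] * width for _ in range(border)]
    return top + middle + bottom
```

Applied to a tightly cropped slice, `pad(binarize(slice_pixels))` gives a clean black-on-white character with some space around it before it is passed to psm 10.
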
> Best regards,
> Simon B. Støvring
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/e905020c-f0b2-47b6-b09c-e01efa96dcc1%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/e905020c-f0b2-47b6-b09c-e01efa96dcc1%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
