[tesseract-ocr] Font Limit = 64 fonts in traineddata, really ??

2014-07-08 Thread Albrecht Hilker
The manual "Training Tesseract 3" says: Tesseract needs to know about different shapes of the same character by having different fonts separated explicitly. This used to be limited to 32 fonts, but the limit has been raised to 64. It is set by the constant MAX_NUM_CONFIGS defined in
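As a rough illustration of the limit quoted above, the following sketch counts the distinct font names in a font_properties file (one font per line, name first) and compares the count with 64. The helper names are hypothetical, and the value 64 is taken from the manual text quoted in the post, not read from the Tesseract sources, so treat this as a pre-training sanity check under those assumptions.

```python
# Value quoted from the training manual in the post above; the real
# constant lives in the Tesseract C++ sources and may differ by version.
MAX_NUM_CONFIGS = 64


def count_fonts(font_properties_text):
    """Count distinct font names in font_properties-style text.

    Assumes the Tesseract 3 training layout: one font per line, with the
    font name as the first whitespace-separated field.
    """
    names = set()
    for line in font_properties_text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            names.add(fields[0])
    return len(names)


def exceeds_font_limit(font_properties_text, limit=MAX_NUM_CONFIGS):
    """Return True if the file lists more fonts than the assumed limit."""
    return count_fonts(font_properties_text) > limit
```

Reading the file with `open(path).read()` and passing the text to `exceeds_font_limit` would flag a training set that lists more fonts than the quoted limit.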

[tesseract-ocr] need help removing garbage characters from my OCR

2014-07-08 Thread Alex Ryan
I'm trying to make a Words with Friends cheat for a university project. I'm obviously trying to OCR the tiles from a screenshot of the app. I have tesseract 3.03 set up and running fine, but I'm not getting usable output. I've tried various training methods but so far haven't hit upon the

[tesseract-ocr] Re: Font Limit = 64 fonts in traineddata, really ??

2014-07-08 Thread Paul
If you have a look at intproto.h, you'll see there is a similar limitation, but it's much more complicated. Unfortunately I don't have an overview of what is possible yet, but I'm working on it. :) Just use normproto.h as a reference. On Tuesday, 8 July 2014 at 02:55:37 UTC+2, Albrecht

[tesseract-ocr] Re: need help removing garbage characters from my OCR

2014-07-08 Thread Paul
You will probably need a better binarization technique. See [1], [2]. [1]: https://groups.google.com/d/topic/tesseract-ocr/y-Yjxr1tRTQ/discussion [2]: https://groups.google.com/d/topic/tesseract-ocr/neyvXo2TAn0/discussion On Tuesday, 8 July 2014 at 07:31:39 UTC+2, Alex Ryan wrote: I'm trying
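To make "binarization" concrete, here is a minimal sketch of Otsu's global thresholding in plain Python. This is an illustrative choice, not necessarily the technique discussed in the linked threads. It assumes the image is already available as a flat list of 8-bit grayscale values; the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) maximizing between-class variance.

    Pixels with value <= t form one class, the rest form the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    weight_low = 0      # number of pixels with value <= t
    sum_low = 0         # sum of those pixel values
    best_t, best_var = 0, -1.0

    for t in range(256):
        weight_low += hist[t]
        sum_low += t * hist[t]
        if weight_low == 0:
            continue            # no pixels in the low class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break               # every pixel is already in the low class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        between_var = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 0 (at or below threshold) or 255 (above)."""
    return [0 if p <= threshold else 255 for p in pixels]
```

A single global threshold works best when tile letters and background have clearly separated brightness; screenshots with gradients or coloured bonus squares may need a locally adaptive method instead.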

Re: [tesseract-ocr] Re: Font Limit = 64 fonts in traineddata, really ??

2014-07-08 Thread Shree Devi Kumar
As far as I understand, the font limitation applies up to tesseract 3.02. Major changes to training are currently in the works in SVN for 3.03 (not fully released yet, hence you see a large number of fonts for the English traineddata but not for others). The other languages' traineddata may be

Re: [tesseract-ocr] need help removing garbage characters from my OCR

2014-07-08 Thread Nick White
Hi Alex, If you're up for some programming, you could recognise the squares yourself, and pass each one separately to tesseract with the PSM_SINGLE_CHAR segmentation type. That should help if Tesseract is not segmenting each whole square separately. If the board is always the same size, you
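The per-square approach suggested above can be sketched as follows, assuming a square board whose position and pixel size in the screenshot are known, and a 15 x 15 grid (an assumption about the game board, adjustable via a parameter). The helper names are hypothetical. The second function only builds a command line for the tesseract 3.x CLI, where page segmentation mode 10 treats the image as a single character; it does not crop or run anything itself.

```python
def tile_boxes(left, top, board_px, tiles_per_side=15):
    """Return (x0, y0, x1, y1) crop boxes for each tile, row by row.

    left, top: pixel position of the board's top-left corner.
    board_px:  width (and height) of the square board in pixels.
    Integer arithmetic spreads rounding so the boxes tile the board exactly.
    """
    n = tiles_per_side
    boxes = []
    for row in range(n):
        y0 = top + (row * board_px) // n
        y1 = top + ((row + 1) * board_px) // n
        for col in range(n):
            x0 = left + (col * board_px) // n
            x1 = left + ((col + 1) * board_px) // n
            boxes.append((x0, y0, x1, y1))
    return boxes


def single_char_command(tile_image_path, output_base):
    """Build a tesseract 3.x command that recognises one character.

    "-psm 10" selects single-character segmentation in 3.x; later
    releases spell the option "--psm".
    """
    return ["tesseract", tile_image_path, output_base, "-psm", "10"]
```

Each box can be handed to whatever image library crops the screenshot, and each cropped tile saved to a file and passed to the command, for example via `subprocess.run(single_char_command("tile_0.png", "tile_0"))`.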

Re: [tesseract-ocr] Re: Font Limit = 64 fonts in traineddata, really ??

2014-07-08 Thread Albrecht Hilker
> As far as I understand, the font limitation applies up to tesseract 3.02. Major changes to training are currently in the works in SVN for 3.03
The files I am talking about are downloaded from https://code.google.com/p/tesseract-ocr/downloads/list They are all declared as version 3.02. For

Re: [tesseract-ocr] Re: Font Limit = 64 fonts in traineddata, really ??

2014-07-08 Thread shree
My information IS dated - I haven't followed the recent changes. Please see this thread, almost a year old, which discussed the upcoming changes to training: https://groups.google.com/forum/#!searchin/tesseract-dev/fonts/tesseract-dev/4lxGjCGLBSw/CH1cZsovPjIJ On Wednesday, July 9,