The manual "Training Tesseract 3" says:
Tesseract needs to know about different shapes of the same character by
having different fonts separated explicitly.
This used to be limited to 32 fonts, but the limit has been raised to 64.
It is set by the constant MAX_NUM_CONFIGS defined in
I'm trying to make a Words with Friends cheat for a university project. I'm
obviously trying to OCR the tiles from a screenshot of the app. I have
Tesseract 3.03 set up and running fine, but I'm not getting usable output.
I've tried various training methods but so far haven't hit upon the
If you have a look at intproto.h, you'll see there is a similar limitation,
but it's much more complicated. Unfortunately I don't have an overview of
what is possible yet, but I'm working on it. :) Just use normproto.h as a
reference.
On Tuesday, 8 July 2014 02:55:37 UTC+2, Albrecht
You will probably need a better binarization technique. See [1], [2].
[1]: https://groups.google.com/d/topic/tesseract-ocr/y-Yjxr1tRTQ/discussion
[2]: https://groups.google.com/d/topic/tesseract-ocr/neyvXo2TAn0/discussion
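For illustration only, here is a minimal sketch of one common global binarization technique, Otsu's method. This is an assumption on my part: the linked threads may well recommend something different (adaptive thresholding, for example). The function names are made up for this example, and the input is assumed to be a flat list of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the threshold (0-255) that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]              # pixels at or below t (background)
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue                 # no background pixels yet
        w_fg = total - w_bg          # pixels above t (foreground)
        if w_fg == 0:
            break                    # everything is background from here on
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels):
    """Map every pixel to 0 or 255 using the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]
```

The binarized image would then be saved and handed to Tesseract in place of the raw screenshot. Whether this helps depends on the image; a single global threshold can fail on uneven backgrounds.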
On Tuesday, 8 July 2014 07:31:39 UTC+2, Alex Ryan wrote:
I'm trying
As far as I understand, the font limitation applies up to tesseract 3.02.
Major changes to training are currently in the works in SVN for 3.03 (not
fully released yet - hence you see a large number of fonts for English
traineddata but not for others). The other languages traineddata may be
Hi Alex,
If you're up for some programming, you could recognise the squares
yourself, and pass each one separately to tesseract with the
PSM_SINGLE_CHAR segmentation type. That should help if Tesseract is
not segmenting each whole square separately.
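A rough sketch of that approach, under some loudly stated assumptions: the board is a square grid of 15 x 15 cells at a known pixel position, the helper names below are hypothetical, and cropping each box into its own image file is left to whatever image library you prefer. Page segmentation mode 10 is the command-line equivalent of PSM_SINGLE_CHAR. The 3.x command line spells the flag `-psm`, while newer releases use `--psm`, so it is a parameter here. Using `stdout` as the output base is also assumed to work in your build.

```python
import subprocess


def tile_boxes(left, top, board_size, cells=15):
    """Return (x0, y0, x1, y1) pixel boxes for every cell, row by row.

    Assumes a square board of `cells` x `cells` squares whose top-left
    corner is at (left, top) and whose side is `board_size` pixels.
    """
    boxes = []
    for row in range(cells):
        for col in range(cells):
            x0 = left + col * board_size // cells
            y0 = top + row * board_size // cells
            x1 = left + (col + 1) * board_size // cells
            y1 = top + (row + 1) * board_size // cells
            boxes.append((x0, y0, x1, y1))
    return boxes


def single_char_command(tile_path, psm_flag="-psm"):
    """Build a tesseract command line that treats the image as one character."""
    return ["tesseract", tile_path, "stdout", psm_flag, "10"]


def ocr_tile(tile_path, psm_flag="-psm"):
    """Run tesseract on one cropped tile and return the recognised text."""
    result = subprocess.run(
        single_char_command(tile_path, psm_flag),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()
```

Each box from `tile_boxes` would be cropped out of the screenshot, saved, and passed to `ocr_tile`. Empty squares will still produce some output, so they need to be filtered out separately.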
If the board is always the same size, you
As far as I understand, the font limitation applies up to tesseract 3.02.
Major changes to training are currently in the works in SVN for 3.03
The files I am talking about are downloaded from
https://code.google.com/p/tesseract-ocr/downloads/list
They are all declared as version 3.02.
For
My information IS dated - I haven't followed the recent changes. Please see
this thread, almost a year old, which discussed the upcoming changes to
training
https://groups.google.com/forum/#!searchin/tesseract-dev/fonts/tesseract-dev/4lxGjCGLBSw/CH1cZsovPjIJ
On Wednesday, July 9,