Hi,

I'm trying to detect page numbers in a book. Unlike ordinary page
numbers, the ones below 10 are written with a leading zero: 01,
02, ... 09.

Detection of these is unreliable, whereas for all numbers *above* 09
it is much more stable.

I'm using English as the language and limiting the character set to 0-9.

So my question is: is it possible that the English training data
contains numbers without leading zeros, and that detection is
therefore better for them? If so, is there a way to supply only a
different word list, without having to provide the full set of
training images, box files and so on?

Also, I found a bug. I'm using baseapi with a custom tessdata
directory (inside an OS X app bundle). This doesn't work, however,
because mainblk.cpp relies on hard-coded paths or an environment
variable to determine datadir, which is a global.

Setting the environment variable in code is my workaround, but of
course that's not a clean solution.
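
For reference, the workaround looks roughly like the sketch below. It
assumes the environment variable in question is TESSDATA_PREFIX and that
it should name the directory containing tessdata, with a trailing slash;
both may differ between versions. The helper name set_tessdata_prefix is
only illustrative and not part of Tesseract.

```cpp
#include <cstdlib>   // setenv (POSIX, not standard C++), getenv
#include <string>

// Hypothetical helper: sets TESSDATA_PREFIX for the current process so
// that the engine can find its data when initialised afterwards.
// Returns true if the variable was set successfully.
bool set_tessdata_prefix(const std::string& dir) {
    std::string prefix = dir;
    // Assumption: the prefix is expected to end with a path separator.
    if (!prefix.empty() && prefix.back() != '/') {
        prefix += '/';
    }
    // Third argument 1 means: overwrite any existing value.
    return setenv("TESSDATA_PREFIX", prefix.c_str(), 1) == 0;
}
```

This has to be called before the API is initialised, with the bundle's
resource directory as the argument. It changes process-wide state, which
is exactly why it is only a workaround.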

Thanks for any suggestions,

Diez

