Hi, I'm trying to detect page numbers in a book. Unlike normal page numbers, the ones below 10 are written with a leading zero: 01, 02, ..., 09.
For all numbers *above* 09, detection is much more stable. I'm using English as the language and limit the character set to 0-9.

So my question is: is it possible that the English training set only contains numbers without leading zeros, and that detection is better for those as a result? If so, is there a way to supply just a different word set, without having to provide the full image/box training material?

Also, I found a bug. I'm using baseapi with a custom tessdata directory (inside an OS X app bundle). This doesn't work, however, because mainblk.cpp relies on hard-coded paths or an environment variable to determine datadir, which is a global. Setting the environment variable in code is my workaround, but of course that's not really cool.

Thanks for any suggestions,
Diez
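
P.S. In case it helps, the workaround looks roughly like this. It's only a minimal sketch: the helper name set_tessdata_prefix is made up for this post, the path you pass in is whatever your bundle resolves to, and I'm assuming the variable being read is TESSDATA_PREFIX. It has to be called before the API is initialised.

```cpp
#include <stdlib.h>  // setenv (POSIX, available on OS X)
#include <string>

// Hypothetical helper, not part of Tesseract: sets the environment
// variable that the data-directory lookup falls back to.
// Assumption: the variable name is TESSDATA_PREFIX.
// Returns true if the variable was set successfully.
inline bool set_tessdata_prefix(const std::string& dir) {
    // The third argument (1) overwrites any value already present.
    return setenv("TESSDATA_PREFIX", dir.c_str(), 1) == 0;
}
```

Since this changes process-wide state, it only papers over the problem; being able to hand the directory to the API directly would be the proper fix.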

