On 30 July 2010 22:48, Sven Pedersen wrote:
> In a conversation between Philip Pemberton and Jimmy on the 27th, it
> seems that the user wordlist may not work for Tesseract 3. You may
> need to call the file 'eng.' or $LANG. and put it in the traindata
> folder. It sounds like Jimmy is eventually planning to improve the
> situation. In the meantime you may have to train tesseract yourself
> with your corpus (and font) to improve results, or do image
> manipulations (resize/adjust) to improve the input at runtime.

That's true, but the results would have been more or less the same anyway.

Anyway, going by some of the material Google have published, there will be a post-editing facility in Tesseract in the future, where the dictionaries and something very much like DangAmbigs will be used in more or less the way people expected them to be used. It might actually be in the codebase already (hey, it's quite large, and I don't have a huge amount of spare time), but so far I've only found the training code, and that isn't quite set up to be used yet.
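The kind of post-editing described here (a dictionary lookup combined with a table of "dangerous" character ambiguities such as "rn" vs "m") can be illustrated with a toy Python sketch. It is not Tesseract's code, algorithm, or DangAmbigs file format: the ambiguity pairs, the word list, and the function names `candidates` and `post_edit` are hypothetical, and only one substitution per word is tried.

```python
# Toy sketch of dictionary-driven OCR post-correction using an ambiguity
# table. The pairs below are illustrative assumptions, NOT the contents
# or format of Tesseract's DangAmbigs file.
AMBIGS = [
    ("rn", "m"),
    ("m", "rn"),
    ("cl", "d"),
    ("0", "o"),
    ("1", "l"),
]


def candidates(word, ambigs):
    """Yield each string obtained by replacing ONE occurrence of an
    ambiguous source sequence with its target sequence."""
    for src, dst in ambigs:
        start = word.find(src)
        while start != -1:
            yield word[:start] + dst + word[start + len(src):]
            start = word.find(src, start + 1)


def post_edit(word, dictionary, ambigs=AMBIGS):
    """Keep dictionary words as-is; otherwise return the first
    single-substitution variant found in the dictionary, or the
    original word if no variant matches."""
    if word in dictionary:
        return word
    for cand in candidates(word, ambigs):
        if cand in dictionary:
            return cand
    return word


if __name__ == "__main__":
    words = {"modern", "hello", "cold"}
    for raw in ["rnodern", "he1lo", "co1d", "xyz"]:
        print(raw, "->", post_edit(raw, words))
```

Taking the first dictionary hit keeps the sketch short; a real post-editor would also have to rank competing candidates (for example by word frequency or recogniser confidence) and consider more than one substitution per word.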

