Training Text Best Practices

Joe Carter Thu, 29 Nov 2012 04:53:35 -0800

Hello,

I'm trying to train Tesseract to recognize a script with over 200 letters.


Is it OK to train Tesseract with gibberish text? Or does the training
method rely on a realistic distribution of characters, i.e. actual writing?
I'd like to train it with a random distribution of characters where each
character appears 10-20 times, depending on how common it is.
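
A minimal sketch of the kind of gibberish generator described above, in case it helps make the question concrete. It only illustrates the idea of "each character appears a fixed number of times, in random order"; it says nothing about whether Tesseract trains well on such text. The function name, the word-length range, the line width, and the sample counts are all made up for illustration.

```python
import random

def make_training_text(char_counts, word_len_range=(3, 8), line_width=60, seed=0):
    """Return gibberish text in which each character ch appears exactly
    char_counts[ch] times. All parameters are illustrative assumptions."""
    rng = random.Random(seed)

    # One entry per required occurrence, then shuffle so the order is random.
    pool = [ch for ch, n in char_counts.items() for _ in range(n)]
    rng.shuffle(pool)

    # Chop the shuffled pool into pseudo-words of random length.
    words = []
    i = 0
    while i < len(pool):
        n = rng.randint(*word_len_range)
        words.append("".join(pool[i:i + n]))
        i += n

    # Wrap the words into lines no longer than line_width.
    lines, current = [], ""
    for w in words:
        if current and len(current) + 1 + len(w) > line_width:
            lines.append(current)
            current = w
        else:
            current = f"{current} {w}" if current else w
    if current:
        lines.append(current)
    return "\n".join(lines)

# Hypothetical counts: "common" characters 20 times, "rare" ones 10 times.
counts = {ch: 20 for ch in "aeiou"}
counts.update({ch: 10 for ch in "bcdfg"})
text = make_training_text(counts)
print(text)
```

For the real script, the keys of `counts` would be the 200+ letters, with the count for each chosen between 10 and 20 according to how common the letter is.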

When it comes to punctuation, does the same apply? I know the training
guide says to make sure that the punctuation is not grouped together, but
do the examples of punctuation have to be plausible? For example,
do parentheses have to be properly matched? e.g. *The (quick brown] fox
jumps over the lazy dog.*

Thanks.

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
