Hello, I'm trying to train Tesseract to recognize a script with over 200 letters. Because there are so many letters, I'd like to come up with a training text that is easy to generate and still optimal for training.
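Here is a minimal Python sketch of the kind of easy-to-generate text I have in mind. Each character is repeated 10 to 20 times according to a commonness score, the pool is shuffled, and the result is cut into pseudo-words and lines. The weights, the function names (`repetitions`, `gibberish_text`), the word lengths and the line width are hypothetical placeholders, not part of Tesseract's own tooling. The Latin letters stand in for the real script. The sketch does not try to keep punctuation apart.

```python
import random


def repetitions(commonness: float, low: int = 10, high: int = 20) -> int:
    """Map a commonness score in [0, 1] to a repeat count in [low, high]."""
    # Clamp out-of-range scores so every character lands in [low, high].
    commonness = min(max(commonness, 0.0), 1.0)
    return low + round(commonness * (high - low))


def gibberish_text(weights: dict, seed: int = 0, min_word: int = 2,
                   max_word: int = 6, words_per_line: int = 8) -> str:
    """Build shuffled 'gibberish' text where each character appears
    repetitions(weight) times. All parameters are hypothetical defaults."""
    rng = random.Random(seed)  # fixed seed -> reproducible text
    # Expand each character into its repeat count, then shuffle the pool.
    pool = [ch for ch, w in weights.items() for _ in range(repetitions(w))]
    rng.shuffle(pool)
    # Cut the shuffled pool into pseudo-words of random length
    # (the last word may be shorter than min_word).
    words = []
    i = 0
    while i < len(pool):
        n = rng.randint(min_word, max_word)
        words.append("".join(pool[i:i + n]))
        i += n
    # Wrap the pseudo-words into lines with a fixed number of words each.
    lines = [" ".join(words[j:j + words_per_line])
             for j in range(0, len(words), words_per_line)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical commonness scores; a real run would list all 200+ letters.
    weights = {"a": 1.0, "b": 0.5, "c": 0.0, "(": 0.2, "}": 0.2}
    print(gibberish_text(weights, seed=42))
```

Because the counts are fixed up front and only the order is random, every character gets exactly its intended number of samples, rather than a count that merely averages out over a long text.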
I'd like to train it on text with a random distribution of characters, where each character appears 10 to 20 times depending on how common it is. Is it OK to train Tesseract with gibberish text, or does the training method rely on a realistic distribution of characters, i.e. actual writing?

Does the same apply to punctuation? I know the training guide says to make sure that punctuation is not grouped together, but do the examples of punctuation have to be plausible? For example, do parentheses have to be properly matched, or would something like this be acceptable?

*The quick (brown fox}jump over the lazy dog.*

Thanks

