I'm working on OCRing a book that has intermixed English and Greek. The accuracy is pretty poor so far, and I want to try fine-tuning Tesseract for the Greek font used in this book. It seems to think δ looks like S, apparently because the δ in this font has a curly top, and it mistakes λ for d. I've prepared about a page of text as training data, comprising about 20 lines. Is this too little to be useful? How much sample text would normally be used for this purpose? I'm finding the data pretty time-consuming to prepare: the one page took me about an hour.

