[tesseract-ocr] amount of data needed for fine-tuning on a particular font?

Ben Crowell Tue, 11 May 2021 16:51:48 -0700

I'm working on OCRing a book that has intermixed English and Greek. The 
accuracy is pretty poor so far, and I want to try fine-tuning tesseract for 
the Greek font used in this book. It seems to think δ looks like S because 
it has a curly top, and it mistakes λ for d. I've prepared about a page of 
text as training data, comprising about 20 lines. Is this too little to 
be useful? How much sample text would be typical for this purpose? I'm 
finding it's pretty time-consuming to prepare the 
data. It took me about an hour to do the one page.
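One way to tell whether a small training set is enough is to compare tesseract's output for some sample lines against the hand-typed ground truth, both before and after fine-tuning. Below is a minimal sketch in Python using only the standard library. It computes the character error rate and counts one-for-one substitutions such as δ→S and λ→d. The function names and the sample strings are hypothetical illustrations and are not part of tesseract. The sketch assumes each ground-truth line has already been paired with the OCR output for that same line.

```python
import unicodedata
from collections import Counter
from difflib import SequenceMatcher


def _nfc(s: str) -> str:
    # Normalize so precomposed and combining-accent Greek compare equal.
    return unicodedata.normalize("NFC", s)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def char_error_rate(truth: str, ocr: str) -> float:
    """Edit distance divided by the length of the ground-truth line."""
    truth, ocr = _nfc(truth), _nfc(ocr)
    return levenshtein(truth, ocr) / max(len(truth), 1)


def char_confusions(truth: str, ocr: str) -> Counter:
    """Tally (expected, recognized) pairs from one-for-one substitutions."""
    truth, ocr = _nfc(truth), _nfc(ocr)
    confusions = Counter()
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only count replaced runs of equal length, so each expected
        # character can be paired with the one OCR produced in its place.
        if tag == "replace" and i2 - i1 == j2 - j1:
            for expected, got in zip(truth[i1:i2], ocr[j1:j2]):
                confusions[(expected, got)] += 1
    return confusions


if __name__ == "__main__":
    truth = "δυναμις λογος"  # hand-typed ground truth (made-up example)
    ocr = "Sυναμις dογος"    # hypothetical tesseract output for that line
    print(f"CER: {char_error_rate(truth, ocr):.3f}")
    for (expected, got), n in char_confusions(truth, ocr).most_common():
        print(f"{expected!r} read as {got!r}: {n}")
```

If you run this over a few held-out lines that were not used for training, you can see whether the δ/λ confusions drop after fine-tuning. That tells you more than the raw size of the training set does.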


-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/ba155f57-c0ca-4d93-9f69-74b1a54f1639n%40googlegroups.com.
