[tesseract-ocr] Is it right that training can only help with different font but not page layout

Jingjing Lin Fri, 14 Jun 2019 08:52:01 -0700

It seems that for training we only have to supply a training_text file, and 
the training then renders that text in different fonts. Tesseract creates 
the images itself during training, so we don't have to give Tesseract our 
own images. Does this mean retraining will only help with fonts but 
not with page layout? In other words, is there no way to affect how 
Tesseract does the segmentation? (I understand that you can use --psm.) 
I'm just wondering whether training will help get better results for a 
special layout, like a tabular image, with ordinary fonts.
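
For reference, the --psm option mentioned above is set per run on the command line and is independent of any training. Below is a minimal Python sketch of building such a call. The helper names build_tesseract_cmd and run_ocr and the file name table.png are made up for illustration, and which mode (for example 4, 6, or 11) works best on a table is an assumption to test, not something established in this thread.

```python
import subprocess


def build_tesseract_cmd(image_path, psm=6, lang="eng"):
    """Build a tesseract CLI call that prints the recognized text to stdout.

    psm is the page segmentation mode (0-13); it changes how the page is
    split into blocks and lines, not the character model itself.
    """
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    # "stdout" as the output base makes tesseract write text to stdout.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def run_ocr(image_path, psm=6, lang="eng"):
    """Run tesseract (must be installed and on PATH) and return its text."""
    result = subprocess.run(
        build_tesseract_cmd(image_path, psm=psm, lang=lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file name; compare a few modes on the same image.
    for mode in (4, 6, 11):
        print(" ".join(build_tesseract_cmd("table.png", psm=mode)))
```

Running the same image through several modes and comparing the output is a quick way to see how much of a layout problem can be handled by segmentation settings alone.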


On the other hand, it seems we can also create our own .box files and do the 
training with them. I guess I had the above idea just because I was drawing 
conclusions from fine-tuning with a few characters.
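
As an illustration of what a hand-made .box file contains, here is a small Python sketch that parses the classic per-character layout, one entry per line in the form "symbol left bottom right top page", with coordinates measured from the bottom-left corner of the image. The Box type and parse_box_line are hypothetical helpers, and the sample line is invented. Box files used for LSTM training may use a different per-line variant, so treat the layout here as an assumption.

```python
from typing import NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line):
    """Parse one box-file line: 'symbol left bottom right top page'."""
    # Split from the right so the symbol itself is kept intact.
    parts = line.rstrip("\n").rsplit(" ", 5)
    if len(parts) != 6:
        raise ValueError(f"not a box line: {line!r}")
    symbol, *numbers = parts
    return Box(symbol, *(int(n) for n in numbers))


if __name__ == "__main__":
    # Invented sample line, not taken from a real training set.
    box = parse_box_line("T 36 92 53 115 0")
    print(box.symbol, box.right - box.left, box.top - box.bottom)
```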

