Hi People,
I am trying to retrain Tesseract on the dataset at
https://www.primaresearch.org/repository/index/IMPACT_Digitisation
As I read in the documentation, the input for retraining Tesseract is a
line (an image of a line of text with its accompanying ground truth). Is it
possible to train on paragraphs instead, since the dataset only provides
ground truth paragraph-wise?
Would that help improve accuracy? Do you know of any tools to
detect the lines in a paragraph?
I suspect that if I use OpenCV image processing to split a paragraph into
text lines, it would fail for some images. Please suggest a better solution
if possible. Thanks in advance.
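
For context, the simple line splitting I have in mind is roughly a
horizontal projection profile: count the ink pixels in each row and cut
wherever a row is empty. Below is a minimal sketch in plain Python, assuming
the paragraph image has already been binarized into rows of 0/1 pixels
(1 = ink). The function name split_lines and the min_ink threshold are my
own placeholders, not part of Tesseract or OpenCV.

```python
def split_lines(rows, min_ink=1):
    """Return (start, end) row ranges, end exclusive, one per text line.

    rows:    binarized image as a list of rows, each a list of 0/1 pixels
             (1 = ink). This input format is an assumption for the sketch.
    min_ink: hypothetical threshold; a row counts as text when it contains
             at least this many ink pixels.
    """
    lines = []
    start = None  # first row of the text line currently being scanned
    for y, row in enumerate(rows):
        has_text = sum(row) >= min_ink
        if has_text and start is None:
            start = y                      # entering a text line
        elif not has_text and start is not None:
            lines.append((start, y))       # leaving a text line
            start = None
    if start is not None:
        lines.append((start, len(rows)))   # text line touches the bottom edge
    return lines


# Tiny synthetic "paragraph": two text lines separated by one blank row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
print(split_lines(page))  # [(1, 3), (4, 5)]
```

This only works when there is a clean blank gap between lines. With skewed
scans, touching lines, or noise, the gaps disappear, which is why I expect it
to fail on some images.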
Regards,
Krishna Prasad A S