[tesseract-ocr] Paragraph wise training

Krishna Prasad Mon, 15 Jul 2019 22:10:00 -0700

Hi all,
     I am trying to retrain Tesseract with the dataset at 
https://www.primaresearch.org/repository/index/IMPACT_Digitisation


 As I read in the documentation, the input for retraining Tesseract is a 
single line (an image of a line of text with its accompanying ground truth). 
Is it possible to train on paragraphs instead, since the dataset contains 
ground truth only at the paragraph level? 

Would that help increase accuracy? Does anyone know of tools to 
detect the lines within a paragraph? 

I think that if I use OpenCV image processing to split a paragraph into text 
lines, it would fail for some images. Please suggest a better solution 
if possible. Thanks in advance.
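
To make the line-splitting idea concrete, here is a minimal sketch of one 
common approach, a horizontal projection profile. It is only an illustration 
under assumptions: the paragraph image is already binarized (1 = ink, 
0 = background), the text lines are roughly horizontal, and the name 
split_lines and its min_ink parameter are hypothetical, not part of OpenCV or 
Tesseract. Skewed, curved, or touching lines would break it, which is the 
kind of failure I am worried about.

```python
from typing import List, Tuple


def split_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges of text lines; bottom is exclusive.

    `binary` is a 2D grid where 1 marks an ink pixel and 0 marks background.
    `min_ink` is a hypothetical threshold: rows with fewer ink pixels than
    this are treated as the gap between two lines.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in binary]

    lines: List[Tuple[int, int]] = []
    start = None  # first row of the line currently being scanned, if any
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                  # entering a text line
        elif ink < min_ink and start is not None:
            lines.append((start, y))   # leaving a text line
            start = None
    if start is not None:              # line runs to the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy "paragraph": two text lines separated by one blank row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(split_lines(page))
```

Each returned range could then be cropped out as one line image, but the 
paragraph ground truth would still have to be matched to those lines.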

Regards,
Krishna Prasad A S

