Here is my sample based on Tesseract 4.0: <https://lh3.googleusercontent.com/-DsGMChP0lbM/WZ4f0bWb1-I/AAAAAAAAC48/KsOj6R26NtU_XfcN6vwymQRyaJALtV90gCLcBGAs/s1600/Shikshak2072BS_Mangsir.pdf-13.png>
On Wednesday, August 23, 2017 at 13:33:28 UTC+9, Nirajan Pant wrote:
> I am working on a GUI for Tesseract OCR 4.0.0 (Nepali language). When I
> started analyzing the recognition results, I found some missing words or
> sentences. To find the reason, I drew the boxes detected by Tesseract
> (taken from the hOCR recognition result). The detection is shown here:
>
> <https://lh3.googleusercontent.com/-fHOpPPkhnNA/WZ0EYWs61PI/AAAAAAAAEIE/-hNTXifXurIijRu12yJyNnSa-JEhjtvYACLcBGAs/s1600/tesseract_layout_analysis_error.png>
>
> This is part of a document with a paragraph detection error. The red line is
> the boundary of the detected paragraph (second column of the original image
> given below).
>
> The original image is:
>
> <https://lh3.googleusercontent.com/-5cmTOXk9NN0/WZ0E-a8Wt7I/AAAAAAAAEIM/xok4rU6HiAITT5FhdLdWwsP1EU6iO8wxwCLcBGAs/s1600/Shikshak2072BS_Mangsir.pdf-13.png>
>
> Help me to deal with this issue.
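For anyone who wants to reproduce the box check described in the quoted message, here is a minimal sketch of pulling paragraph boxes out of hOCR text. It assumes the usual hOCR layout, where paragraphs are elements with class `ocr_par` and a `title` attribute containing `bbox x0 y0 x1 y1`. The `SAMPLE` markup, the class name `ParagraphBoxParser`, and the function `paragraph_boxes` are made up for illustration; they are not taken from the original poster's GUI.

```python
import re
from html.parser import HTMLParser

# hOCR stores coordinates in the title attribute, e.g. "bbox 36 92 580 310".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class ParagraphBoxParser(HTMLParser):
    """Collects (x0, y0, x1, y1) for every element with class ocr_par."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "ocr_par" not in classes:
            return  # skip blocks, lines, and words
        match = BBOX_RE.search(attr.get("title") or "")
        if match:
            self.boxes.append(tuple(int(v) for v in match.groups()))


def paragraph_boxes(hocr_text):
    """Return paragraph bounding boxes in document order."""
    parser = ParagraphBoxParser()
    parser.feed(hocr_text)
    return parser.boxes


# Hypothetical two-column fragment, invented for this example.
SAMPLE = """
<div class='ocr_carea' id='block_1_1' title="bbox 36 92 1150 400">
 <p class='ocr_par' id='par_1_1' lang='nep' title="bbox 36 92 580 310">
  <span class='ocr_line' id='line_1_1' title="bbox 36 92 580 120; baseline 0 -5"></span>
 </p>
 <p class='ocr_par' id='par_1_2' lang='nep' title="bbox 600 92 1150 400"></p>
</div>
"""

if __name__ == "__main__":
    for box in paragraph_boxes(SAMPLE):
        print(box)
```

The returned tuples can be passed to any drawing routine to outline each paragraph on the page image. A paragraph box that spans two columns or cuts through a text line points to a layout analysis problem rather than a recognition problem.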

