Are you using the C++ Tesseract API?
In my case I'm using PSM = 11, 4, 5:
api->SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);   // PSM 6 (PSM 5 is PSM_SINGLE_BLOCK_VERT_TEXT)
api->SetPageSegMode(tesseract::PSM_SPARSE_TEXT);    // PSM 11
api->SetPageSegMode(tesseract::PSM_SINGLE_COLUMN);  // PSM 4

I think PSM 4 gives good results for the words of sentences, and PSM 11 gives
good OCR results.
I don't know how it works, but if you have a problem with missing words or
sentences, you should try changing the default PSM value.
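
Since the PSM numbers and the enum names are easy to mix up, here is a minimal lookup sketch that maps a numeric PSM to its enum name. The table is a local copy of the values of tesseract::PageSegMode as I understand them from Tesseract 4's publictypes.h, so please check it against your installed header. psmName and kPsmNames are hypothetical helpers for illustration and are not part of the Tesseract API.

```cpp
#include <string>

// Local mirror of tesseract::PageSegMode, indexed by numeric value.
// Assumption: matches publictypes.h in Tesseract 4; verify against your header.
static const char* const kPsmNames[] = {
    "PSM_OSD_ONLY",                // 0
    "PSM_AUTO_OSD",                // 1
    "PSM_AUTO_ONLY",               // 2
    "PSM_AUTO",                    // 3
    "PSM_SINGLE_COLUMN",           // 4
    "PSM_SINGLE_BLOCK_VERT_TEXT",  // 5
    "PSM_SINGLE_BLOCK",            // 6
    "PSM_SINGLE_LINE",             // 7
    "PSM_SINGLE_WORD",             // 8
    "PSM_CIRCLE_WORD",             // 9
    "PSM_SINGLE_CHAR",             // 10
    "PSM_SPARSE_TEXT",             // 11
    "PSM_SPARSE_TEXT_OSD",         // 12
    "PSM_RAW_LINE",                // 13
};

// Hypothetical helper: returns the enum name for a numeric PSM,
// or "PSM_UNKNOWN" when the number is outside the table.
std::string psmName(int psm) {
    const int count = static_cast<int>(sizeof(kPsmNames) / sizeof(kPsmNames[0]));
    if (psm < 0 || psm >= count) return "PSM_UNKNOWN";
    return kPsmNames[psm];
}
```

With this, psmName(4) gives PSM_SINGLE_COLUMN and psmName(11) gives PSM_SPARSE_TEXT. When trying several modes in a row on the same image, the number can be passed to the real API as api->SetPageSegMode(static_cast<tesseract::PageSegMode>(n)) before each recognition run.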
On Wednesday, August 23, 2017 at 13:33:28 UTC+9, Nirajan Pant wrote:
>
> I am working on a GUI for Tesseract OCR 4.0.0 (Nepali language). When I
> started analyzing the recognition results, I found some missing words or
> sentences. To find the reason behind this, I drew the boxes detected by
> Tesseract (using the hOCR recognition result). The detection is shown here:
>
>
> <https://lh3.googleusercontent.com/-fHOpPPkhnNA/WZ0EYWs61PI/AAAAAAAAEIE/-hNTXifXurIijRu12yJyNnSa-JEhjtvYACLcBGAs/s1600/tesseract_layout_analysis_error.png>
> This is a part of the document with a paragraph detection error. The red
> line is the boundary of the detected paragraph (second column of the
> original image given below).
>
> The original image is:
>
>
> <https://lh3.googleusercontent.com/-5cmTOXk9NN0/WZ0E-a8Wt7I/AAAAAAAAEIM/xok4rU6HiAITT5FhdLdWwsP1EU6iO8wxwCLcBGAs/s1600/Shikshak2072BS_Mangsir.pdf-13.png>
>
> Please help me deal with this issue.
>
>
