Skipping words is a known Tesseract issue, and amitdo has proposed a patch for it. Look in the Tesseract issues.
You can see if it helps in your case.

--
Excuse the brevity, msg sent from phone.

On 23-Aug-2017 9:16 PM, "Nirajan Pant" <[email protected]> wrote:

> Yeah! I have tried both gImageReader and VietOCR as GUI front ends for
> Tesseract for Nepali. Both GUIs skip words in their results.
>
> On Wednesday, 23 August 2017 17:30:32 UTC+5:45, shree wrote:
>>
>> You could try doing your own layout analysis instead of relying on
>> Tesseract's auto mode?
>>
>> Have you tried gImageReader and VietOCR as GUI front ends for Tesseract
>> for Nepali?
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Wed, Aug 23, 2017 at 10:03 AM, Nirajan Pant <[email protected]> wrote:
>>
>>> I am working on a GUI for Tesseract OCR 4.0.0 (Nepali language). When I
>>> started analysing the recognition results, I found some missing words
>>> and sentences. To find the reason, I drew the boxes that Tesseract
>>> detected (taken from the hOCR recognition output). The detection is
>>> shown here:
>>>
>>> <https://lh3.googleusercontent.com/-fHOpPPkhnNA/WZ0EYWs61PI/AAAAAAAAEIE/-hNTXifXurIijRu12yJyNnSa-JEhjtvYACLcBGAs/s1600/tesseract_layout_analysis_error.png>
>>>
>>> This is part of a document with a paragraph detection error. The red
>>> line is the boundary of the detected paragraph (second column of the
>>> original image given below).
>>>
>>> The original image is:
>>>
>>> <https://lh3.googleusercontent.com/-5cmTOXk9NN0/WZ0E-a8Wt7I/AAAAAAAAEIM/xok4rU6HiAITT5FhdLdWwsP1EU6iO8wxwCLcBGAs/s1600/Shikshak2072BS_Mangsir.pdf-13.png>
>>>
>>> Please help me deal with this issue.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduUMa4bEHTiT2%3DZdopcu0yac0B-mp5s5yj6CedURErox8A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
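As a rough illustration of the box check described in the thread (reading the regions Tesseract detected from its hOCR output), here is a minimal Python sketch. It is not the poster's code. It assumes the hOCR file was produced with something like `tesseract page.png out -l nep hocr`, and the names `HocrBoxes` and `hocr_boxes` are made up for this example. It only lists the bounding boxes per element class; drawing them onto the page image is left out.

```python
import re
from html.parser import HTMLParser

# hOCR stores geometry in the title attribute, e.g. "bbox 10 20 400 120; ..."
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrBoxes(HTMLParser):
    """Collect (class, (x0, y0, x1, y1)) for selected hOCR element classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        match = BBOX_RE.search(attrs.get("title") or "")
        if match is None:
            return
        bbox = tuple(int(v) for v in match.groups())
        for cls in classes:
            if cls in self.wanted:
                self.boxes.append((cls, bbox))


def hocr_boxes(hocr_text, wanted=("ocr_par", "ocr_line", "ocrx_word")):
    """Return the bounding boxes of paragraphs, lines and words in document order."""
    parser = HocrBoxes(wanted)
    parser.feed(hocr_text)
    return parser.boxes


# Tiny hand-written hOCR fragment (illustrative, not real Tesseract output).
sample = (
    "<p class='ocr_par' title='bbox 10 20 400 120'>"
    "<span class='ocr_line' title='bbox 10 20 400 60; baseline 0 -5'>"
    "<span class='ocrx_word' title='bbox 10 20 90 60; x_wconf 93'>शिक्षक</span>"
    "</span></p>"
)
for cls, bbox in hocr_boxes(sample):
    print(cls, bbox)
```

Comparing the `ocr_par` boxes against the visible columns shows where the automatic page segmentation went wrong. One way to follow the "own layout analysis" suggestion would then be to crop each column yourself and run Tesseract on every crop with `--psm 6` (treat the image as a single uniform block of text). Whether that recovers the skipped words on this page has not been tested here.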

