Skipping words is an issue in Tesseract itself. Amit do has proposed a patch
for it. Look in the Tesseract issues.

You can see if it helps in your case.

-- Excuse the brevity, msg sent from phone.

On 23-Aug-2017 9:16 PM, "Nirajan Pant" <[email protected]> wrote:

> Yeah! I have tried both gimagereader and vietocr as GUI interfaces for
> tesseract for Nepali. The results from both GUIs skip words.
>
> On Wednesday, 23 August 2017 17:30:32 UTC+5:45, shree wrote:
>>
>> You could try doing your own layout analysis instead of relying on
>> tesseract's auto mode?
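
To make "your own layout analysis" concrete, here is a minimal sketch that finds text columns from a vertical projection profile. This is only one simple approach and is not something proposed elsewhere in this thread. It assumes the page has already been binarized into rows of 0/1 pixels; `find_columns`, `tesseract_command`, `PAGE`, and the file names are hypothetical names used for illustration.

```python
from typing import List, Tuple


def find_columns(binary: List[List[int]], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (x_start, x_end) spans of text columns; x_end is exclusive.

    binary[y][x] is 1 for an ink pixel and 0 for background. A run of at
    least min_gap empty pixel columns is treated as a gutter.
    """
    if not binary:
        return []
    width = len(binary[0])
    # Vertical projection profile: amount of ink in each pixel column.
    profile = [sum(row[x] for row in binary) for x in range(width)]

    spans = []
    start = None  # x where the current text span began, if any
    gap = 0       # length of the current run of empty pixel columns
    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The span ended where the run of empty columns began.
                spans.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        spans.append((start, width - gap))
    return spans


def tesseract_command(image_path: str, out_base: str,
                      lang: str = "nep", psm: int = 6) -> List[str]:
    """Build a CLI call for one cropped column (paths are placeholders)."""
    return ["tesseract", image_path, out_base, "-l", lang, "--psm", str(psm)]


# Tiny made-up page: two ink regions separated by a three-pixel gutter.
PAGE = [[1, 1, 1, 0, 0, 0, 1, 1, 0, 1]] * 4
print(find_columns(PAGE))
print(tesseract_command("col0.png", "col0"))
```

Each span could then be cropped out with an image library and recognized on its own, for example with `--psm 6`, which tells Tesseract to assume a single uniform block of text instead of running its automatic page segmentation.
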
>>
>> Have you tried gimagereader and vietocr as GUI interfaces for tesseract
>> for Nepali?
>>
>>
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Wed, Aug 23, 2017 at 10:03 AM, Nirajan Pant <[email protected]> wrote:
>>
>>> I am working on a GUI for tesseract OCR 4.0.0 (Nepali language). When I
>>> started analyzing the recognition results, I found some missing words or
>>> sentences. To find the reason behind this, I drew the boxes detected by
>>> tesseract (taken from the hOCR recognition result). The detection is
>>> shown here:
>>>
>>>
>>> <https://lh3.googleusercontent.com/-fHOpPPkhnNA/WZ0EYWs61PI/AAAAAAAAEIE/-hNTXifXurIijRu12yJyNnSa-JEhjtvYACLcBGAs/s1600/tesseract_layout_analysis_error.png>
>>> This is part of a document with a paragraph detection error. The red line
>>> is the boundary of the detected paragraph (second column of the original
>>> image given below).
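
For anyone wanting to reproduce this kind of check, a rough sketch of pulling paragraph and word boxes out of hOCR text using only the Python standard library follows. It assumes the usual hOCR convention of a `bbox x0 y0 x1 y1` entry in each element's `title` attribute; `HocrBoxes`, `extract_boxes`, and the `SAMPLE` markup are made up for illustration and are not real Tesseract output.

```python
from html.parser import HTMLParser


class HocrBoxes(HTMLParser):
    """Collect (class, (x0, y0, x1, y1)) for selected hOCR element classes."""

    def __init__(self, wanted=("ocr_par", "ocrx_word")):
        super().__init__()
        self.wanted = set(wanted)
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        title = attrs.get("title") or ""
        for cls in classes:
            if cls not in self.wanted:
                continue
            # title holds ';'-separated properties, e.g. "bbox 1 2 3 4; x_wconf 90"
            for prop in title.split(";"):
                parts = prop.split()
                if len(parts) == 5 and parts[0] == "bbox":
                    self.boxes.append((cls, tuple(int(p) for p in parts[1:])))


def extract_boxes(hocr_text):
    parser = HocrBoxes()
    parser.feed(hocr_text)
    return parser.boxes


# Hand-written sample in hOCR style, not actual recognition output.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 600 800'>
 <p class='ocr_par' title='bbox 10 20 290 120'>
  <span class='ocrx_word' title='bbox 10 20 80 50; x_wconf 91'>a</span>
  <span class='ocrx_word' title='bbox 90 20 160 50; x_wconf 88'>b</span>
 </p>
</div>
"""

for cls, box in extract_boxes(SAMPLE):
    print(cls, box)
```

The resulting rectangles can be drawn over the page image with any imaging library to see where paragraph boundaries cut through a column.
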
>>>
>>> The original image is:
>>>
>>>
>>> <https://lh3.googleusercontent.com/-5cmTOXk9NN0/WZ0E-a8Wt7I/AAAAAAAAEIM/xok4rU6HiAITT5FhdLdWwsP1EU6iO8wxwCLcBGAs/s1600/Shikshak2072BS_Mangsir.pdf-13.png>
>>>
>>> Please help me deal with this issue.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/ae0aa097-93ba-4424-baf5-b4ed93ca574a%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
