Re: [tesseract-ocr] Tesseract 4.0.0 fails to extract some words from the attached form

2019-02-28 Thread Zdenko Podobny
Tesseract's trouble with OCR of tables is a known problem - search the archive and the issue tracker. Zdenko On Fri, Mar 1, 2019 at 5:13, sachin chavan wrote: > I'm also facing the same issue > > On Sat, Feb 23, 2019 at 2:09 AM Russia Aiyappa > wrote: > >> Tesseract misses the extraction of some words like

Re: [tesseract-ocr] Tesseract 4.0.0 fails to extract some words from the attached form

2019-02-28 Thread sachin chavan
I'm also facing the same issue. On Sat, Feb 23, 2019 at 2:09 AM Russia Aiyappa wrote: > Tesseract misses the extraction of some words like "Monthly" and "Total" > (under section V) in the attached form. Upon using the PRImA tools I found > that "Monthly" was omitted as it wasn't segmented correc

Re: [tesseract-ocr] Re: tesseract 4 box files format

2019-02-28 Thread shree
> https://github.com/tesseract-ocr/tesseract/pull/2231 implements the Wordstr box file option. These box files are for each textline and can be easily edited for non-RTL languages. Example usage to create box files for English-language images p001.png to p015.png: for i in $(seq -f "