Re: [tesseract-ocr] Best export method

2020-03-20 Thread Dayton
I have output to hOCR and TSV, but I still get all the text without hard returns or any separation between paragraphs. Is there an hOCR tool that can export to Microsoft Word? The original document is in PDF format. It's actually an official document. First, I ran ImageMagick and got a

[tesseract-ocr] Re: What is the difference between script *.traineddata and normal *.traineddata models

2020-03-20 Thread Essam Zaky
Thanks @Shreeshrii. So the following commands recognize Arabic/English text:
tesseract AE.jpg AE1 -l ara+eng
tesseract AE.jpg AE2 -l script/Arabic
On Thursday, March 19, 2020 at 6:42:19 PM UTC+2, Essam Zaky wrote: > > Hi Dears > > What is the difference between script *.traineddata and normal >

Re: [tesseract-ocr] Re: What is the difference between script *.traineddata and normal *.traineddata models

2020-03-20 Thread Shree Devi Kumar
Yes, and the results of the two commands could be different. On Fri, Mar 20, 2020, 17:43 Essam Zaky wrote: > Thanks @Shreeshrii > > So the following commands recognize Arabic/English text > tesseract AE.jpg AE1 -l ara+eng > tesseract AE.jpg AE2 -l script/Arabic > > > > On Thursday, March 19,
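One way to see how the two invocations differ on a given image is to run both and diff the text they produce. The following is only a rough sketch, not something posted in the thread: the helper names `build_command`, `run_ocr` and `diff_outputs` are made up for illustration, and it assumes the `tesseract` binary is on the PATH, that both `ara`/`eng` and `script/Arabic` traineddata files are installed, and that Tesseract writes its default plain-text output to `<output_base>.txt`.

```python
import difflib
import subprocess
from pathlib import Path


def build_command(image, output_base, languages):
    """Build the same command line as in the thread, e.g. -l ara+eng."""
    return ["tesseract", str(image), str(output_base), "-l", languages]


def run_ocr(image, output_base, languages):
    """Run tesseract and return the recognized text.

    Assumes the default text output, written to <output_base>.txt.
    """
    subprocess.run(build_command(image, output_base, languages), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


def diff_outputs(text_a, text_b, name_a="a", name_b="b"):
    """Return a unified diff (list of lines) between two OCR results."""
    return list(
        difflib.unified_diff(
            text_a.splitlines(),
            text_b.splitlines(),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
    )
```

With the file names from the thread, calling `run_ocr("AE.jpg", "AE1", "ara+eng")` and `run_ocr("AE.jpg", "AE2", "script/Arabic")` and passing both results to `diff_outputs` would list the lines on which the two models disagree. An empty list means the outputs were identical for that image.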

Re: [tesseract-ocr] Best export method

2020-03-20 Thread Shree Devi Kumar
Take a look at gimagereader, which uses tesseract. It has the options you are looking for. On Fri, Mar 20, 2020, 17:55 Dayton wrote: > I have output to hocr and tsv but I still get the all text without hard > return or any separation between paragraphs. > > Is there an HOCR tool which allows
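As a possible alternative to a GUI tool, paragraph breaks can often be recovered from the hOCR file itself, since Tesseract's hOCR output normally wraps each paragraph in an element with class `ocr_par`, each line in `ocr_line`, and each word in `ocrx_word`. The sketch below is an assumption-laden illustration, not part of the thread: the class `HocrParagraphs` and the function `hocr_to_text` are hypothetical names, only the Python standard library is used, and exporting to Word is left out (the resulting text would still have to be pasted or converted separately).

```python
from html.parser import HTMLParser

# hOCR classes treated as "a line of text" (assumed set; ocr_line is the usual one).
LINE_CLASSES = {"ocr_line", "ocr_header", "ocr_caption", "ocr_textfloat"}


class HocrParagraphs(HTMLParser):
    """Collect words grouped as paragraphs -> lines -> words."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # list of paragraphs; each is a list of lines
        self._spans = []      # stack of open <span> kinds
        self._word = []       # text pieces of the word being read

    def _current_line(self):
        # Create missing containers so stray words are not lost.
        if not self.paragraphs:
            self.paragraphs.append([])
        if not self.paragraphs[-1]:
            self.paragraphs[-1].append([])
        return self.paragraphs[-1][-1]

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "ocr_par" in classes:
            self.paragraphs.append([])
        if tag == "span":
            if "ocrx_word" in classes:
                kind = "word"
                self._word = []
            elif LINE_CLASSES.intersection(classes):
                kind = "line"
                if not self.paragraphs:
                    self.paragraphs.append([])
                self.paragraphs[-1].append([])
            else:
                kind = "other"
            self._spans.append(kind)

    def handle_data(self, data):
        if "word" in self._spans:
            self._word.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._spans:
            if self._spans.pop() == "word":
                text = "".join(self._word).strip()
                if text:
                    self._current_line().append(text)


def hocr_to_text(hocr, keep_line_breaks=False):
    """Return plain text with a blank line between hOCR paragraphs."""
    parser = HocrParagraphs()
    parser.feed(hocr)
    parser.close()
    line_sep = "\n" if keep_line_breaks else " "
    blocks = []
    for paragraph in parser.paragraphs:
        lines = [" ".join(words) for words in paragraph if words]
        if lines:
            blocks.append(line_sep.join(lines))
    return "\n\n".join(blocks)
```

For example, reading a file produced with `tesseract page.png out hocr` (file names here are placeholders) and passing its contents to `hocr_to_text` should give text in which each recognized paragraph is separated by an empty line. How well this matches the real paragraphs depends on Tesseract's layout analysis of the page.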

[tesseract-ocr] How to perform layout analysis without ocr

2020-03-20 Thread Alex
Dear all, I need to detect the regions of a page quickly, without knowing the text inside them. I want to use tesseract from the command line. How can I do this? Which config values and parameters are useful for layout analysis? The task should be fast, for this reason I want to

Re: [tesseract-ocr] Best export method

2020-03-20 Thread Dayton
Thanks shree. I'll have a look at gimagereader. Looks promising. On Friday, March 20, 2020 at 13:27:22 (UTC+1), shree wrote: > Take a look at gimagereader, which uses tesseract . It has the options you > are looking for. > > On Fri, Mar 20, 2020, 17:55 Dayton > > wrote: > >> I