Re: [tesseract-ocr] Re: Tesseract error while combine_lang_model

2020-04-14 Thread Piyush Chandra
hin.des0.txt These are the files I used. For the box file, I used the following command: tesseract hin.des0.PNG hin.des0 -l hin lstmbox On Wednesday, 15 April 2020 06:52:48 UTC+5:30, shree wrote: > > How are you creating the

Re: [tesseract-ocr] Re: textline finding fail

2020-04-14 Thread Shree Devi Kumar
I have also noticed the same for Javanese and Balinese scripts. On Tue, Apr 14, 2020, 09:42 Pndaza wrote: > Textline finding fails when base consonants and their upper vowel or asat > are separate. > When base consonants and their upper vowel or asat are joined, it is OK. > > On Tuesday, 14 April 2020

Re: [tesseract-ocr] Re: Tesseract error while combine_lang_model

2020-04-14 Thread Shree Devi Kumar
How are you creating the box files? On Wed, Apr 15, 2020, 01:52 Piyush Chandra wrote: > For other files, when I try on Linux, I get this: > > unicharset_extractor --norm_mode 2 hin.desk0.box hin.desk1.box > Extracting unicharset from box file hin.desk0.box > Invalid start of grapheme

Re: [tesseract-ocr] Re: Tesseract error while combine_lang_model

2020-04-14 Thread Piyush Chandra
For other files, when I try on Linux, I get this: unicharset_extractor --norm_mode 2 hin.desk0.box hin.desk1.box Extracting unicharset from box file hin.desk0.box Invalid start of grapheme sequence:H=0x94d Normalization failed for string '्' Invalid start of grapheme sequence:M=0x93e
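The code points in these messages are U+094D (DEVANAGARI SIGN VIRAMA) and U+093E (DEVANAGARI VOWEL SIGN AA). Both are combining marks, which cannot begin a grapheme cluster on their own. One plausible cause, which this thread does not confirm, is a box-file line whose symbol is a bare matra or virama instead of part of a full cluster. The C sketch below flags such lines so they can be inspected. The function names are hypothetical, and the mark ranges are the Devanagari Mn/Mc code points from the Unicode character table.

```c
#include <stdio.h>

/* Decode the first UTF-8 code point of s; returns -1 for empty or
   malformed input. Minimal decoder: it does not reject overlong forms. */
static long first_codepoint(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    if (p[0] == 0)
        return -1;
    if (p[0] < 0x80)
        return p[0];
    if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80)
        return ((long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
    if ((p[0] & 0xF0) == 0xE0 && (p[1] & 0xC0) == 0x80 && (p[2] & 0xC0) == 0x80)
        return ((long)(p[0] & 0x0F) << 12) | ((long)(p[1] & 0x3F) << 6) |
               (p[2] & 0x3F);
    if ((p[0] & 0xF8) == 0xF0 && (p[1] & 0xC0) == 0x80 &&
        (p[2] & 0xC0) == 0x80 && (p[3] & 0xC0) == 0x80)
        return ((long)(p[0] & 0x07) << 18) | ((long)(p[1] & 0x3F) << 12) |
               ((long)(p[2] & 0x3F) << 6) | (p[3] & 0x3F);
    return -1;
}

/* Devanagari combining marks: signs, nukta, dependent vowel signs
   (matras), virama and stress signs. */
static int is_devanagari_mark(long cp)
{
    return (cp >= 0x0900 && cp <= 0x0903) ||
           (cp >= 0x093A && cp <= 0x093C) ||
           (cp >= 0x093E && cp <= 0x094F) ||
           (cp >= 0x0951 && cp <= 0x0957) ||
           (cp >= 0x0962 && cp <= 0x0963);
}

/* Nonzero if a box-file line starts with a Devanagari combining mark,
   i.e. its symbol field cannot start a grapheme cluster. */
int box_line_starts_with_mark(const char *line)
{
    return is_devanagari_mark(first_codepoint(line));
}

/* Print every offending line of a box file with its 1-based line number.
   Returns the number of such lines, or -1 if the file cannot be opened. */
int report_bad_box_lines(const char *path)
{
    char buf[1024];
    int lineno = 0, bad = 0;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(buf, sizeof buf, f) != NULL) {
        lineno++;
        if (box_line_starts_with_mark(buf)) {
            printf("%s:%d: symbol starts with a combining mark: %s",
                   path, lineno, buf);
            bad++;
        }
    }
    fclose(f);
    return bad;
}
```

For example, calling report_bad_box_lines("hin.desk0.box") before running unicharset_extractor would list the lines to look at. Whether such a line should be merged into the preceding consonant or regenerated depends on how the box file was produced.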

Re: [tesseract-ocr] Re: Tesseract error while combine_lang_model

2020-04-14 Thread Piyush Chandra
Hi Shree, When I use the unicharset_extractor command, I get these errors: unicharset_extractor --norm_mode 2 --output_unicharset min.unicharset hin.exp1.box Extracting unicharset from box file hin.exp1.box Invalid start of grapheme sequence:M=0x93e Normalization failed for string 'ा' Invalid

Re: [tesseract-ocr] 2 min on 1 page TIFF using Fast trained data

2020-04-14 Thread Zdenko Podobny
Without AVX support, Tesseract 4/5 will be slower, so try to focus on this. Using more than one language will slow OCR down too... Zdenko On Tue, 14 Apr 2020 at 5:56, Ravil R wrote: > Oh, you gave so much info, thanks! > My test exe file shows this version information: > tesseract 5.0.0 >
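Because the speed depends on the CPU that actually runs the OCR, it can help to check that machine directly. The sketch below is an illustration, not part of Tesseract. It relies on the GCC/Clang builtin __builtin_cpu_supports, which is only available on x86 targets, and it reports "no" on other compilers or architectures. The function names are hypothetical.

```c
#include <stdio.h>

/* Returns 1 if the running CPU (and OS) supports AVX, 0 otherwise.
   Falls back to 0 on non-x86 targets or compilers without the builtin. */
int cpu_has_avx(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return 0;
#endif
}

/* Same check for AVX2 (the builtin needs a string literal argument,
   so each feature gets its own function). */
int cpu_has_avx2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Print a short report for the current machine. */
void print_simd_report(void)
{
    printf("AVX:  %s\n", cpu_has_avx() ? "yes" : "no");
    printf("AVX2: %s\n", cpu_has_avx2() ? "yes" : "no");
}
```

If both lines print "no" on the machine doing the OCR, slow LSTM recognition is expected regardless of which traineddata is used.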

[tesseract-ocr] Tesseract functions in C

2020-04-14 Thread Pooja Kamra
Hi, I am using Tesseract in my C project. I also went through the link below: https://tesseract-ocr.github.io/tessdoc/APIExample.html In this sample code, what does TessBaseAPI mean: TessBaseAPI *handle; In the Tesseract code it is a class. What does it mean in C code, and how do I need to declare it? Please
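As far as I understand the C API (capi.h), TessBaseAPI is exposed to C as an opaque type. C code never sees the C++ class layout. It only holds a pointer obtained from TessBaseAPICreate(), passes that pointer to the TessBaseAPI* functions, and releases it with TessBaseAPIDelete(). So you declare it exactly as in the example: TessBaseAPI *handle;. The self-contained sketch below reproduces that pattern with a made-up Counter type so it compiles without Tesseract installed. Counter, CounterCreate, CounterAdd, CounterValue and CounterDelete are hypothetical names for illustration only.

```c
#include <stdlib.h>

/* ---- "header" part: all that client C code sees ---- */
typedef struct Counter Counter;   /* incomplete (opaque) type */
Counter *CounterCreate(void);
void     CounterAdd(Counter *c, int n);
int      CounterValue(const Counter *c);
void     CounterDelete(Counter *c);

/* ---- "library" part: normally compiled separately, hidden from clients ---- */
struct Counter {
    int value;
};

/* Allocate and initialise a handle; returns NULL on allocation failure. */
Counter *CounterCreate(void)
{
    Counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = 0;
    return c;
}

/* Operations take the handle as their first argument. */
void CounterAdd(Counter *c, int n) { c->value += n; }
int CounterValue(const Counter *c) { return c->value; }

/* Release the handle; it must not be used afterwards. */
void CounterDelete(Counter *c) { free(c); }
```

With Tesseract the shape is the same: include the C API header instead of defining the struct yourself, create the handle with TessBaseAPICreate(), and finish with TessBaseAPIEnd(handle) and TessBaseAPIDelete(handle), as the linked example does.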