Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-06-22 Thread Владимир Калачихин
I have returned to this task. On Thursday, 4 June 2020 19:13:58 UTC+3, Piyush Chandra wrote: > > This is what is missing: --net_spec. Check the line below that I > mentioned before. > > lstmtraining --traineddata ./out/own/own.traineddata --model_output > ./output/own --net_spec

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-06-04 Thread Piyush Chandra
This is what is missing: --net_spec. Check the line below that I mentioned before. lstmtraining --traineddata ./out/own/own.traineddata --model_output ./output/own --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c110]" --train_listfile ./eng_ltsm/eng.training_files.txt
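For readability, the command in the message above can be laid out with one option per line. This is only a sketch: the paths and the net spec are copied verbatim from the message, the function name train_from_scratch is made up for illustration, and it assumes lstmtraining is on your PATH.

```shell
# Sketch only: wraps the lstmtraining call quoted in the message above.
# train_from_scratch is a hypothetical name; the paths and the net spec are
# copied from the thread and must be adapted to your own directory layout.
NET_SPEC='[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c110]'

train_from_scratch() {
    lstmtraining \
        --traineddata ./out/own/own.traineddata \
        --model_output ./output/own \
        --net_spec "$NET_SPEC" \
        --train_listfile ./eng_ltsm/eng.training_files.txt
}
```

The net spec contains spaces and square brackets, so it has to reach lstmtraining as a single quoted argument; keeping it in a variable and expanding it as "$NET_SPEC" is one way to make sure of that.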

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-06-02 Thread Владимир Калачихин
On Monday, 1 June 2020 19:36:07 UTC+3, shree wrote: This is for the Latin script, not the Latin language. > wget the file from > https://github.com/tesseract-ocr/langdata_lstm/blob/master/Latin.unicharset > > OK, I did that, and some of the next steps. At the step ### Train: > lstmtraining .

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-06-02 Thread Владимир Калачихин
On Monday, 1 June 2020 19:37:25 UTC+3, shree wrote: > > You may find this repo useful > > https://github.com/UYousafzai/easy_train_tesseract > > You don't understand. I don't want to train new fonts for an existing language. I want a new language. -- You received this

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-06-01 Thread Shree Devi Kumar
You may find this repo useful: https://github.com/UYousafzai/easy_train_tesseract On Mon, Jun 1, 2020 at 10:05 PM Shree Devi Kumar wrote: > >Failed to load script unicharset from:./langdata/Latin.unicharset" > > This is for the Latin script, not the Latin language. > wget the file from >

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-06-01 Thread Shree Devi Kumar
>Failed to load script unicharset from:./langdata/Latin.unicharset" This is for the Latin script, not the Latin language. wget the file from https://github.com/tesseract-ocr/langdata_lstm/blob/master/Latin.unicharset On Mon, Jun 1, 2020 at 8:16 PM Владимир Калачихин wrote: > Hi! > On Monday, 1 June

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-06-01 Thread Владимир Калачихин
Hi! On Monday, 1 June 2020 11:23:39 UTC+3, shree wrote: > > > ### create tif and box using fonts and training text > text2image --fonts_dir=/home/ubuntu/.fonts > --outputbase=/mylang.myfont.exp0 --max_pages=0 --font=myfont > --text=../langdata/mylang/mylang.training_text >

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-06-01 Thread Shree Devi Kumar
So, modify the info given by Piyush Chandra earlier in this thread. The paths need to be based on where you have the files. ### create tif and box using fonts and training text text2image --fonts_dir=/home/ubuntu/.fonts --outputbase=/mylang.myfont.exp0 --max_pages=0 --font=myfont

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-05-31 Thread Владимир Калачихин
On Sunday, 31 May 2020 19:16:55 UTC+3, shree wrote: > > Use tesstrain.sh or tesstrain.py > > On Sun, May 31, 2020 at 6:45 PM Владимир Калачихин > wrote: > >> OK, I want to train from training text and fonts. >> What method should I use? >> > I thought you knew that you can't

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-05-31 Thread Shree Devi Kumar
Use tesstrain.sh or tesstrain.py On Sun, May 31, 2020 at 6:45 PM Владимир Калачихин wrote: > OK, I want to train from training text and fonts. > What method should I use?

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-05-31 Thread Владимир Калачихин
OK, I want to train from training text and fonts. What method should I use?

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-05-31 Thread Shree Devi Kumar
What I mentioned was for the case where you have images and their groundtruth. gt.txt is the groundtruth: the expected correct output for that image. If you want to train from training text and fonts, then the method is different. On Sun, May 31, 2020, 18:32 Владимир Калачихин wrote: > Hi! > > I
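The image/groundtruth pairing described above (myfile1.png next to myfile1.gt.txt) can be checked before training. A minimal sketch, assuming PNG images and that naming convention; the function name missing_gt is made up for illustration and is not part of Tesseract.

```shell
# Hypothetical helper: print every .png in a directory that has no
# matching .gt.txt file next to it (myfile1.png -> myfile1.gt.txt).
missing_gt() {
    dir=$1
    for img in "$dir"/*.png; do
        # When no .png exists the glob stays unexpanded; skip that case.
        [ -e "$img" ] || continue
        gt="${img%.png}.gt.txt"
        [ -f "$gt" ] || printf '%s\n' "$img"
    done
}
```

Running missing_gt on the folder holding your line images lists the images that still need a groundtruth transcription; no output means every image has one.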

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-05-31 Thread Владимир Калачихин
Hi! I still don't understand. On Friday, 29 May 2020 15:02:22 UTC+3, shree wrote: > Input Files > > myfile1.png > myfile1.gt.txt > > Is "myfile1.png" the picture with the training text? What is "myfile1.gt.txt"?

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-05-29 Thread Shree Devi Kumar
On Thu, May 28, 2020 at 9:55 PM Владимир Калачихин wrote: > > I don't quite understand you. > Could you give us an example of using tesseract to create wordstrbox, and > using combine_lang_model with groundtruth text? > For starting from images and their groundtruth, it would be similar to the

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-05-28 Thread Владимир Калачихин
I don't quite understand you. Could you give us an example of using tesseract to create wordstrbox, and using combine_lang_model with groundtruth text? On Thursday, 28 May 2020 18:21:31 UTC+3, shree wrote: > > lstmbox creates character-level box files. > > wordstrbox creates line

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-05-28 Thread Shree Devi Kumar
lstmbox creates character-level box files. wordstrbox creates line-level box files. If using wordstrbox, please use the groundtruth text for creating the unicharset instead of the box files. On Thu, May 28, 2020, 20:49 Владимир Калачихин wrote: > > On Thursday, 28 May 2020 16:36:14 UTC+3
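The two box-file variants mentioned above differ only in the config name passed as the last argument to tesseract. A rough sketch, assuming tesseract is on PATH; the wrapper name make_box and its argument order are made up for illustration.

```shell
# Hypothetical wrapper: make_box IMAGE OUTPUT_BASE [lstmbox|wordstrbox]
# lstmbox    -> character-level box file
# wordstrbox -> line-level box file
make_box() {
    image=$1
    outbase=$2
    config=${3:-lstmbox}   # default to character-level boxes
    case $config in
        lstmbox|wordstrbox) ;;
        *) echo "unknown box config: $config" >&2; return 2 ;;
    esac
    tesseract "$image" "$outbase" "$config"
}
```

For example, make_box page.tif page wordstrbox would run tesseract page.tif page wordstrbox. As noted elsewhere in the thread, box files generated this way come from recognition and need to be corrected by hand before training.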

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-05-28 Thread Владимир Калачихин
On Thursday, 28 May 2020 16:36:14 UTC+3, shree wrote: > Alternatively, you can use the wordstrbox config file. > > What is the "wordstrbox config file"?

Re: [tesseract-ocr] Re: Creating trainneddata from box files

2020-05-28 Thread Shree Devi Kumar
>Create box files: tesseract /path/to/image.tif path/and/nameof/boxfile/imgae lstmbox Alternatively, you can use the wordstrbox config file. In both cases, if you are generating box files from images, the box files need to be corrected before proceeding to training. On Thu, May 28, 2020 at 5:51

[tesseract-ocr] Re: Creating trainneddata from box files

2020-05-28 Thread Владимир Калачихин
Hi! Another question: On Thursday, 28 May 2020 8:04:03 UTC+3, Piyush Chandra wrote: > > > Create box files: tesseract /path/to/image.tif > path/and/nameof/boxfile/imgae lstmbox > > > Does tesseract recognize the image at this step? What if it does so badly? Can I specify what text is

[tesseract-ocr] Re: Creating trainneddata from box files

2020-05-28 Thread Владимир Калачихин
On Thursday, 28 May 2020 14:46:10 UTC+3, Piyush Chandra wrote: > > Read about --net_spec here: > https://tesseract-ocr.github.io/tessdoc/VGSLSpecs > > Yes, but why use a custom net configuration for a common task? And which net configuration is well suited to training on math symbols?

[tesseract-ocr] Re: Creating trainneddata from box files

2020-05-28 Thread Piyush Chandra
Are "--words...", "--numbers..." and "--puncs" required? => No, they are optional. Read about --net_spec here: https://tesseract-ocr.github.io/tessdoc/VGSLSpecs On Thursday, 28 May 2020 15:12:04 UTC+5:30, Владимир Калачихин wrote: > > Hi! > > On Thursday, 28 May 2020 8:04:03 UTC+3

[tesseract-ocr] Re: Creating trainneddata from box files

2020-05-28 Thread Владимир Калачихин
Hi! On Thursday, 28 May 2020 8:04:03 UTC+3, Piyush Chandra wrote: > > I hope the information below helps :) > > Please, some questions: Are "--words...", "--numbers..." and "--puncs" required? Why do I need "--net_spec..."?

[tesseract-ocr] Re: Creating trainneddata from box files

2020-05-27 Thread Piyush Chandra
Hi, I hope the information below helps :) Creating the trained data file own.traineddata: Create box files: tesseract /path/to/image.tif path/and/nameof/boxfile/imgae lstmbox Create unicharset file: unicharset_extractor --norm_mode 1 --output_unicharset ./output/folder/own.unicharset