I returned to this job.
On Thursday, 4 June 2020 19:13:58 UTC+3, Piyush Chandra wrote:
>
> This is what is missing : --net_spec . Check the line below that I
> mentioned before.
>
> lstmtraining --traineddata ./out/own/own.traineddata --model_output
> ./output/own --net_spec
This is what is missing : --net_spec . Check the line below that I
mentioned before.
lstmtraining --traineddata ./out/own/own.traineddata --model_output
./output/own --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256
O1c110]" --train_listfile ./eng_ltsm/eng.training_files.txt
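The command above is easy to get wrong when it is retyped across several lines, so here is a minimal sketch that assembles it as an argument list in Python. The four flags and their values are copied from the message; the helper name `lstmtraining_command` is made up for illustration, and nothing is executed here.

```python
import shlex


def lstmtraining_command(traineddata, model_output, net_spec, train_listfile):
    """Build the lstmtraining argument list (hypothetical helper, not part of Tesseract)."""
    return [
        "lstmtraining",
        "--traineddata", traineddata,
        "--model_output", model_output,
        "--net_spec", net_spec,
        "--train_listfile", train_listfile,
    ]


# Values copied from the message above; adjust the paths to your own layout.
NET_SPEC = "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c110]"
cmd = lstmtraining_command(
    "./out/own/own.traineddata",
    "./output/own",
    NET_SPEC,
    "./eng_ltsm/eng.training_files.txt",
)

# shlex.join quotes the net spec so the brackets and spaces survive the shell.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would run it, assuming `lstmtraining` is on the PATH and the files exist.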
On Monday, 1 June 2020 19:36:07 UTC+3, shree wrote:
This is for Latin script not Latin language.
> wget the file from
> https://github.com/tesseract-ocr/langdata_lstm/blob/master/Latin.unicharset
>
>
Ok, I did it, and some next steps.
On step
### Train:
> lstmtraining .
On Monday, 1 June 2020 19:37:25 UTC+3, shree wrote:
>
> You may find this repo useful
>
> https://github.com/UYousafzai/easy_train_tesseract
>
You don't understand. I don't want to train new fonts for an existing
language. I want a new language.
You may find this repo useful
https://github.com/UYousafzai/easy_train_tesseract
On Mon, Jun 1, 2020 at 10:05 PM Shree Devi Kumar
wrote:
> >Failed to load script unicharset from:./langdata/Latin.unicharset"
>
> This is for Latin script not Latin language.
> wget the file from
>
>Failed to load script unicharset from:./langdata/Latin.unicharset"
This is for Latin script not Latin language.
wget the file from
https://github.com/tesseract-ocr/langdata_lstm/blob/master/Latin.unicharset
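One caveat when following this advice: a GitHub `/blob/` link serves an HTML page, so `wget` on it will most likely save HTML rather than the unicharset itself. A minimal sketch of rewriting such a link to its raw-content form follows. The helper name is hypothetical, and whether the branch is still called `master` is an assumption carried over from the link above.

```python
from urllib.parse import urlsplit


def github_blob_to_raw(url):
    """Rewrite a github.com .../blob/<branch>/<path> link to raw.githubusercontent.com."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: owner / repo / "blob" / branch / path...
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _blob, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])


blob_url = "https://github.com/tesseract-ocr/langdata_lstm/blob/master/Latin.unicharset"
raw_url = github_blob_to_raw(blob_url)
print(raw_url)
```

The printed URL is the one to hand to `wget`, saving the result as `./langdata/Latin.unicharset` to match the path in the error message.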
On Mon, Jun 1, 2020 at 8:16 PM Владимир Калачихин
wrote:
> Hi!
> Monday, 1 June
Hi!
On Monday, 1 June 2020 11:23:39 UTC+3, shree wrote:
>
>
> ### create tif and box using fonts and training text
> text2image --fonts_dir=/home/ubuntu/.fonts
> --outputbase=/mylang.myfont.exp0 --max_pages=0 --font=myfont
> --text=../langdata/mylang/mylang.training_text
>
So, modify the info given by Piyush Chandra earlier in this thread. The
paths need to be based on where you have the files.
### create tif and box using fonts and training text
text2image --fonts_dir=/home/ubuntu/.fonts
--outputbase=/mylang.myfont.exp0 --max_pages=0 --font=myfont
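Since the paths have to be adapted anyway, a small sketch of building the text2image call from a few variables may help. The flags are the ones quoted in this thread (including `--text` from the fuller quote); the helper names and the `./train` output directory are assumptions for illustration. Note that the quoted `--outputbase=/mylang.myfont.exp0` starts with a slash and so points at the filesystem root, which is probably not what you want.

```python
def outputbase(output_dir, lang, font, exp=0):
    """Compose <dir>/<lang>.<font>.exp<N>, the naming pattern used in the thread."""
    return f"{output_dir.rstrip('/')}/{lang}.{font}.exp{exp}"


def text2image_command(fonts_dir, font, training_text, base, max_pages=0):
    """Build the text2image argument list (hypothetical helper)."""
    return [
        "text2image",
        f"--fonts_dir={fonts_dir}",
        f"--outputbase={base}",
        f"--max_pages={max_pages}",
        f"--font={font}",
        f"--text={training_text}",
    ]


base = outputbase("./train", "mylang", "myfont")  # "./train" is an assumed directory
t2i_cmd = text2image_command(
    "/home/ubuntu/.fonts",
    "myfont",
    "../langdata/mylang/mylang.training_text",
    base,
)
print(" ".join(t2i_cmd))
```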
On Sunday, 31 May 2020 19:16:55 UTC+3, shree wrote:
>
> Use tesstrain.sh or tesstrain.py
>
> On Sun, May 31, 2020 at 6:45 PM Владимир Калачихин wrote:
>
>> Ok, I want to train from training text and fonts.
>> What method should I use?
>>
>
I thought You knew that you can't
Use tesstrain.sh or tesstrain.py
On Sun, May 31, 2020 at 6:45 PM Владимир Калачихин
wrote:
> Ok, I want to train from training text and fonts.
> What method should I use?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe
Ok, I want to train from training text and fonts.
What method should I use?
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to
What I mentioned was for the case where you have images and their
groundtruth. gt.txt is the groundtruth: the expected correct output from that
image.
If you want to train from training text and fonts, then the method is
different.
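For the images-plus-groundtruth case, every image needs a matching `.gt.txt` file next to it. Below is a minimal sketch that pairs them up and reports images without groundtruth. The `myfile1.png` / `myfile1.gt.txt` naming comes from this thread; the function name and the temporary demo directory are illustrative assumptions.

```python
import tempfile
from pathlib import Path


def find_training_pairs(directory, image_suffix=".png"):
    """Return (pairs, missing): images with a matching .gt.txt, and images without one."""
    pairs, missing = [], []
    for image in sorted(Path(directory).glob(f"*{image_suffix}")):
        gt = image.with_name(image.stem + ".gt.txt")
        if gt.is_file():
            pairs.append((image, gt))
        else:
            missing.append(image)
    return pairs, missing


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "myfile1.png").touch()
    (root / "myfile1.gt.txt").write_text("text shown in myfile1.png\n", encoding="utf-8")
    (root / "myfile2.png").touch()  # deliberately has no groundtruth

    pairs, missing = find_training_pairs(root)
    pair_names = [(img.name, gt.name) for img, gt in pairs]
    missing_names = [img.name for img in missing]

print(pair_names)
print(missing_names)
```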
On Sun, May 31, 2020, 18:32 Владимир Калачихин
wrote:
> Hi !
>
> I
Hi !
I still don't understand.
On Friday, 29 May 2020 15:02:22 UTC+3, shree wrote:
> Input Files
>
> myfile1.png
> myfile1.gt.txt
>
>
Is "myfile1.png" the picture with the training text?
What is "myfile1.gt.txt"?
On Thu, May 28, 2020 at 9:55 PM Владимир Калачихин
wrote:
>
> I don't quite understand you.
> Could you give us an example of using tesseract to create wordstrbox files,
> and of using combine_lang_model with groundtruth text?
>
For starting from images and their groundtruth, it would be similar to the
I don't quite understand you.
Could you give us an example of using tesseract to create wordstrbox files,
and of using combine_lang_model with groundtruth text?
On Thursday, 28 May 2020 18:21:31 UTC+3, shree wrote:
>
> lstmbox creates character level box files.
>
> Wordstrbox creates line
lstmbox creates character level box files.
Wordstrbox creates line level box files.
If using wordstrbox, please use the groundtruth text for creating
unicharset instead of the box files.
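Before building the unicharset from groundtruth text, it can be useful to see which characters that text actually contains, especially for unusual symbols. The sketch below is only a sanity check in plain Python, not a replacement for `unicharset_extractor`; the function name and the sample lines are made up for illustration.

```python
from collections import Counter


def character_inventory(lines):
    """Count every non-whitespace character across the groundtruth lines."""
    counts = Counter()
    for line in lines:
        counts.update(ch for ch in line.rstrip("\n") if not ch.isspace())
    return counts


# Hypothetical groundtruth lines; in practice, read them from your .gt.txt files.
sample_lines = ["x + y = 2", "2x = y"]
inventory = character_inventory(sample_lines)

for char, count in sorted(inventory.items()):
    print(repr(char), count)
```

Characters that appear only once or twice are the ones most likely to be under-trained, so this list also hints at where more groundtruth is needed.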
On Thu, May 28, 2020, 20:49 Владимир Калачихин
wrote:
>
> Thursday, 28 May 2020 16:36:14 UTC+3
On Thursday, 28 May 2020 16:36:14 UTC+3, shree wrote:
> Alternately you can use wordstrbox config file.
>
What is the "wordstrbox config file"?
>Create box files: tesseract /path/to/image.tif
path/and/nameof/boxfile/image lstmbox
Alternately you can use wordstrbox config file.
In both cases, if you are generating box files from images, the box files
need to be corrected before proceeding for training.
On Thu, May 28, 2020 at 5:51
Hi!
Another question:
On Thursday, 28 May 2020 8:04:03 UTC+3, Piyush Chandra wrote:
>
>
> Create box files: tesseract /path/to/image.tif
> path/and/nameof/boxfile/image lstmbox
>
>
>
At this step, does tesseract recognize the image? What if it does this badly?
Can I specify what text is
On Thursday, 28 May 2020 14:46:10 UTC+3, Piyush Chandra wrote:
>
> Read about --net_spec here:
> https://tesseract-ocr.github.io/tessdoc/VGSLSpecs
>
Yes, but why use a custom net configuration for a common task?
And which net configuration is well suited for training on math symbols?
Are "--words...", "--numbers..." and "--puncs" required? => No, they are
optional.
Read about --net_spec here:
https://tesseract-ocr.github.io/tessdoc/VGSLSpecs
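As a rough orientation before reading that page: a VGSL spec is a bracketed, space-separated list whose first token describes the input as batch, height, width, depth, and whose last token is the output layer with its number of classes. The sketch below only splits the spec quoted in this thread into those parts. The function name is hypothetical, and it is a reading aid rather than a full VGSL parser.

```python
import re


def parse_net_spec(spec):
    """Split a VGSL spec string into input shape, layer tokens, and output class count."""
    body = spec.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("net spec must be enclosed in [ ]")
    tokens = body[1:-1].split()
    batch, height, width, depth = (int(v) for v in tokens[0].split(","))
    layers = tokens[1:]
    # Output token looks like O1c110: 'O', dimensionality, type letter, class count.
    match = re.fullmatch(r"O\d[lsc](\d+)", layers[-1])
    return {
        "input": {"batch": batch, "height": height, "width": width, "depth": depth},
        "layers": layers,
        "num_classes": int(match.group(1)) if match else None,
    }


parsed = parse_net_spec("[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c110]")
print(parsed["input"])        # height 36, width 0 (variable), 1 channel
print(parsed["layers"])
print(parsed["num_classes"])
```

The class count in the output token is the part most tied to your own data, since it relates to the size of the unicharset you built.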
On Thursday, 28 May 2020 15:12:04 UTC+5:30, Владимир Калачихин wrote:
>
> Hi!
>
> Thursday, 28 May 2020 8:04:03 UTC+3
Hi!
On Thursday, 28 May 2020 8:04:03 UTC+3, Piyush Chandra wrote:
>
> Hope the information below helps :)
>
>
Please, some questions:
Are "--words...", "--numbers..." and "--puncs" required?
Why do I need "--net_spec..."?
Hi,
Hope the information below helps :)
Creating trained data file own.traineddata :
Create box files: tesseract /path/to/image.tif
path/and/nameof/boxfile/image lstmbox
Create unicharset file: unicharset_extractor --norm_mode 1
--output_unicharset ./output/folder/own.unicharset