[tesseract-ocr] Re: Tesstrain.sh not generating TrainedData

[email protected] Wed, 28 Apr 2021 22:20:22 -0700

Resolved. There was a mismatch between the traineddata being used: the installed
Tesseract was version 4.1.0, but I was passing paths from a downloaded Tesseract
4.0 tree. That mismatch was causing the issue.
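For anyone hitting the same problem, here is a minimal bash sketch that compares the installed version with the version embedded in a data path. It assumes two things: the first line of `tesseract --version` has the form `tesseract 4.1.0`, and the downloaded source was unpacked into a directory named `tesseract-X.Y.Z`, as in the paths quoted below. The helper names (`installed_version`, `path_version`, `check_match`) are hypothetical and are not part of tesstrain.sh.

```bash
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not part of tesstrain.sh.

# Print the version from the first line of `tesseract --version`,
# assuming that line looks like "tesseract 4.1.0".
installed_version() {
  tesseract --version 2>&1 | awk 'NR == 1 { print $2; exit }'
}

# Print the X.Y.Z embedded in a path such as .../tesseract-4.0.0/tessdata,
# or nothing if the path has no "tesseract-X.Y.Z" component (an assumption
# about how the source archive was unpacked).
path_version() {
  if [[ "$1" =~ tesseract-([0-9]+\.[0-9]+\.[0-9]+) ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  fi
}

# Compare an installed version string with the version found in a data path.
# Prints "ok" (status 0) or a mismatch message (status 1).
check_match() {
  local installed="$1" dir="$2" from_path
  from_path="$(path_version "$dir")"
  if [ -n "$from_path" ] && [ "$from_path" != "$installed" ]; then
    echo "mismatch: installed $installed, but $dir is from $from_path"
    return 1
  fi
  echo "ok"
}
```

Usage: `check_match "$(installed_version)" /home/administrator/Downloads/tesseract-4.0.0/tessdata`. With Tesseract 4.1.0 installed, as in this thread, it reports a mismatch. Paths without a `tesseract-X.Y.Z` component are not checked, so the sketch only catches the specific mix-up described here.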


On Saturday, April 24, 2021 at 8:25:26 PM UTC+5:30 [email protected] 
wrote:

> Hi,
>
> I am running the following command to create trained data:
> tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>   --fontlist "FreeMono" --noextract_font_properties \
>   --langdata_dir /home/administrator/Downloads/tesseract-4.0.0/langdata \
>   --my_boxtiff_dir /home/administrator/pooja/testImages/ \
>   --tessdata_dir /home/administrator/Downloads/tesseract-4.0.0/tessdata \
>   --output_dir /home/administrator/images/output_folder_1/
>
> After this, it prints:
> === Starting training for language 'eng'
> [Sat Apr 24 20:15:19 IST 2021] /usr/local/bin/text2image 
> --fonts_dir=/usr/share/fonts --font=FreeMono 
> --outputbase=/tmp/font_tmp.e9Fi4vFUQQ/sample_text.txt 
> --text=/tmp/font_tmp.e9Fi4vFUQQ/sample_text.txt 
> --fontconfig_tmpdir=/tmp/font_tmp.e9Fi4vFUQQ
> Rendered page 0 to file /tmp/font_tmp.e9Fi4vFUQQ/sample_text.txt.tif
>
> === Phase I: Generating training images ===
> Rendering using FreeMono
> [Sat Apr 24 20:15:23 IST 2021] /usr/local/bin/text2image 
> --fontconfig_tmpdir=/tmp/font_tmp.e9Fi4vFUQQ --fonts_dir=/usr/share/fonts 
> --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 
> --exposure=0 --outputbase=/tmp/eng-2021-04-24.Y8g/eng.FreeMono.exp0 
> --max_pages=0 --font=FreeMono 
> --text=/home/administrator/Downloads/tesseract-4.0.0/langdata/eng/eng.training_text
> Rendered page 0 to file /tmp/eng-2021-04-24.Y8g/eng.FreeMono.exp0.tif
> Rendered page 1 to file /tmp/eng-2021-04-24.Y8g/eng.FreeMono.exp0.tif
>
> === Phase UP: Generating unicharset and unichar properties files ===
> [Sat Apr 24 20:15:25 IST 2021] /usr/local/bin/unicharset_extractor 
> --output_unicharset /tmp/eng-2021-04-24.Y8g/eng.unicharset --norm_mode 1
> Usage: /usr/local/bin/unicharset_extractor [--output_unicharset filename] 
> [--norm_mode mode] box_or_text_file [...]
> Where mode means:
>  1=combine graphemes (use for Latin and other simple scripts)
>  2=split graphemes (use for Indic/Khmer/Myanmar)
>  3=pure unicode (use for Arabic/Hebrew/Thai/Tibetan)
>
> As per the specification, it should end with:
> Created starter traineddata for LSTM training of language 'eng' 
> Run 'lstmtraining' command to continue LSTM training for language 'eng'
>
> Please help.
>
> Regards,
> Pooja
>
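In the quoted log above, unicharset_extractor was invoked with no box_or_text_file argument, which is why it printed its usage text instead of writing a unicharset. A minimal bash sketch of a guard that makes this kind of failure explicit is shown here. It assumes the rendered .box files sit in a single training directory next to the .tif files, which is how text2image normally writes its output. The wrapper name `run_unicharset_extractor` is hypothetical and does not come from tesstrain.sh; it only uses the `--output_unicharset` and `--norm_mode` flags shown in the usage message.

```bash
#!/usr/bin/env bash
# Sketch only: run_unicharset_extractor is a hypothetical wrapper, not part of
# tesstrain.sh. It passes the flags shown in unicharset_extractor's usage text.
run_unicharset_extractor() {
  local train_dir="$1" lang="$2" norm_mode="${3:-1}"
  local boxes=()
  local f

  # Collect the .box files from the rendering phase. An unmatched glob stays
  # literal, so keep only paths that actually exist.
  for f in "$train_dir"/*.box; do
    if [ -e "$f" ]; then
      boxes+=("$f")
    fi
  done

  # With no input file, unicharset_extractor only prints its usage text,
  # so stop here with a clearer message instead.
  if [ "${#boxes[@]}" -eq 0 ]; then
    echo "no .box files in $train_dir; check the output of Phase I" >&2
    return 1
  fi

  unicharset_extractor --output_unicharset "$train_dir/$lang.unicharset" \
    --norm_mode "$norm_mode" "${boxes[@]}"
}
```

For example, `run_unicharset_extractor /tmp/eng-2021-04-24.Y8g eng 1` would check the temporary directory from the log, if it still exists. Failing fast here keeps a missing-input problem from surfacing later as a confusing error in a different phase.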

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/f701551e-045d-4ddd-85b3-9da204e9aceen%40googlegroups.com.
