I have created an example traineddata for xsa and will upload it later today. You can then modify it with a larger training text and run training.
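In the meantime, the log quoted below stops in several places, and it may help to pull those messages out in one pass. What follows is only a rough sketch: the patterns are copied from the messages in that log, while the hints are my own reading of them (assumptions, not official Tesseract diagnostics), and `find_failures` is a made-up helper name.

```python
import re

# Patterns are taken from the messages in the quoted log.
# The hints are assumptions about the likely cause, not Tesseract output.
KNOWN_FAILURES = [
    (re.compile(r"Failed to read data from: (\S+\.box)"),
     "box file missing or empty; check that the font can render the training text"),
    (re.compile(r"Failed to load script unicharset from:\s*(\S+)"),
     "script unicharset not found in the langdata directory"),
    (re.compile(r"read_params_file: Can't open (\S+)"),
     "config file not found under TESSDATA_PREFIX"),
    (re.compile(r"ERROR: (\S+\.lstmf) does not exist"),
     "no lstmf file produced; probably a consequence of the earlier failures"),
]


def find_failures(log_text):
    """Return (path, hint) pairs for every known failure message in log_text.

    Results are grouped by pattern, in the order of KNOWN_FAILURES,
    not in the order the messages appear in the log.
    """
    found = []
    for pattern, hint in KNOWN_FAILURES:
        for match in pattern.finditer(log_text):
            found.append((match.group(1), hint))
    return found
```

If the output of the training run is saved to a file (say `training.log`, a hypothetical name), then `find_failures(open("training.log").read())` returns the list to work through. For the log below I would expect it to point at the `.box` file, `./langdata/Latin.unicharset`, `lstm.train` and the `.lstmf` file.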

On Sat, Mar 7, 2020, 02:58 aby tesh <[email protected]> wrote:

> I think it is most likely right-to-left. It has passed that error now,
> using eng since I only have the traineddata for it. The other issue I am
> encountering is:
>
> === Starting training for language 'eng'
> [Sat 07 Mar 2020 12:26:06 AM EAT] /usr/bin/text2image
> --fonts_dir=./sabaean_fonts/ --ptsize 12 --font=Sabaean
> --outputbase=/tmp/fc-cache/sample_text.txt
> --text=/tmp/fc-cache/sample_text.txt --fontconfig_tmpdir=/tmp/fc-cache
> Fontconfig warning: "/tmp/fc-cache/fonts.conf", line 4: Use of ambiguous
> path in <dir> element. please add prefix="cwd" if current behavior is
> desired.
> Stripped 1 unrenderable words
> Rendered page 0 to file /tmp/fc-cache/sample_text.txt.tif
>
> === Phase I: Generating training images ===
> Rendering using Sabaean
> [Sat 07 Mar 2020 12:26:08 AM EAT] /usr/bin/text2image
> --fontconfig_tmpdir=/tmp/fc-cache --fonts_dir=./sabaean_fonts/
> --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0
> --exposure=0 --outputbase=/tmp/eng-2020-03-07.lif/eng.Sabaean.exp0
> --max_pages=0 --font=Sabaean --ptsize 12
> --text=./tesslang/eng/eng.training_text
> Fontconfig warning: "/tmp/fc-cache/fonts.conf", line 4: Use of ambiguous
> path in <dir> element. please add prefix="cwd" if current behavior is
> desired.
> Stripped 2 unrenderable words
> Rendered page 0 to file /tmp/eng-2020-03-07.lif/eng.Sabaean.exp0.tif
>
> === Phase UP: Generating unicharset and unichar properties files ===
> [Sat 07 Mar 2020 12:26:08 AM EAT] /usr/bin/unicharset_extractor
> --output_unicharset /tmp/eng-2020-03-07.lif/eng.unicharset --norm_mode 1
> /tmp/eng-2020-03-07.lif/eng.Sabaean.exp0.box
> Failed to read data from: /tmp/eng-2020-03-07.lif/eng.Sabaean.exp0.box
> Wrote unicharset file /tmp/eng-2020-03-07.lif/eng.unicharset
> [Sat 07 Mar 2020 12:26:08 AM EAT] /usr/bin/set_unicharset_properties -U
> /tmp/eng-2020-03-07.lif/eng.unicharset -O
> /tmp/eng-2020-03-07.lif/eng.unicharset -X
> /tmp/eng-2020-03-07.lif/eng.xheights --script_dir=./langdata
> Loaded unicharset of size 3 from file
> /tmp/eng-2020-03-07.lif/eng.unicharset
> Setting unichar properties
> Setting script properties
> Failed to load script unicharset from:./langdata/Latin.unicharset
> Writing unicharset to file /tmp/eng-2020-03-07.lif/eng.unicharset
>
> === Phase E: Generating lstmf files ===
> Using TESSDATA_PREFIX=./tessdata/
> [Sat 07 Mar 2020 12:26:08 AM EAT] /usr/bin/tesseract
> /tmp/eng-2020-03-07.lif/eng.Sabaean.exp0.tif
> /tmp/eng-2020-03-07.lif/eng.Sabaean.exp0 --psm 6 lstm.train
> read_params_file: Can't open lstm.train
> Tesseract Open Source OCR Engine v4.1.1 with Leptonica
> Page 1
> ERROR: /tmp/eng-2020-03-07.lif/eng.Sabaean.exp0.lstmf does not exist or is
> not readable
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/ee9d5e16-328e-480d-ab2c-4ca4de708381%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/ee9d5e16-328e-480d-ab2c-4ca4de708381%40googlegroups.com?utm_medium=email&utm_source=footer>
> .

