Please see https://github.com/Shreeshrii/tesstrain-xsa
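As a rough aid for reading the log quoted below, here is a minimal sketch that groups the failing lines by the training phase they occur in. The marker strings were picked by hand from that log, and `failures_by_phase` and `SAMPLE` are hypothetical names for this illustration, not part of tesstrain.

```python
# Substrings that flag a failed step; taken by hand from the quoted log,
# so this list is an assumption and certainly not exhaustive.
FAILURE_MARKERS = ("Failed to", "Can't open", "ERROR:")


def failures_by_phase(log_text):
    """Return (phase, line) pairs for every log line containing a marker."""
    phase = "(before first phase)"
    found = []
    for raw in log_text.splitlines():
        # Drop e-mail quote prefixes such as ">> " before inspecting the line.
        text = raw.lstrip("> ").strip()
        if text.startswith("==="):
            # Phase banners look like "=== Phase E: Generating lstmf files ===".
            phase = text.strip("= ").strip()
            continue
        if any(marker in text for marker in FAILURE_MARKERS):
            found.append((phase, text))
    return found


# A few lines copied from the quoted log.
SAMPLE = """\
>> === Phase UP: Generating unicharset and unichar properties files ===
>> Failed to read data from: /tmp/eng-2020-03-07.lif/eng.Sabaean.exp0.box
>> === Phase E: Generating lstmf files ===
>> read_params_file: Can't open lstm.train
>> Page 1
"""

if __name__ == "__main__":
    for phase, line in failures_by_phase(SAMPLE):
        print(f"[{phase}] {line}")
```

On the full log this points at the first failure, the unreadable `.box` file in Phase UP. The later errors may simply follow from that one.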

On Sat, Mar 7, 2020 at 6:54 PM Shree Devi Kumar <[email protected]> wrote:

> I have created an example traineddata for xsa. I will upload later today.
> You can then modify with a larger training text and run training.
>
> On Sat, Mar 7, 2020, 02:58 aby tesh <[email protected]> wrote:
>
>> I think it is, most likely, Right To Left, it has passed that error now
>> using eng since i only have the traindata for it, the other issue i am
>> encountering is
>>
>> === Starting training for language 'eng'
>> [Sat 07 Mar 2020 12:26:06 AM EAT] /usr/bin/text2image
>> --fonts_dir=./sabaean_fonts/ --ptsize 12 --font=Sabaean
>> --outputbase=/tmp/fc-cache/sample_text.txt
>> --text=/tmp/fc-cache/sample_text.txt --fontconfig_tmpdir=/tmp/fc-cache
>> Fontconfig warning: "/tmp/fc-cache/fonts.conf", line 4: Use of ambiguous
>> path in <dir> element. please add prefix="cwd" if current behavior is
>> desired.
>> Stripped 1 unrenderable words
>> Rendered page 0 to file /tmp/fc-cache/sample_text.txt.tif
>>
>> === Phase I: Generating training images ===
>> Rendering using Sabaean
>> [Sat 07 Mar 2020 12:26:08 AM EAT] /usr/bin/text2image
>> --fontconfig_tmpdir=/tmp/fc-cache --fonts_dir=./sabaean_fonts/
>> --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0
>> --exposure=0 --outputbase=/tmp/eng-2020-03-07.lif/eng.Sabaean.exp0
>> --max_pages=0 --font=Sabaean --ptsize 12
>> --text=./tesslang/eng/eng.training_text
>> Fontconfig warning: "/tmp/fc-cache/fonts.conf", line 4: Use of ambiguous
>> path in <dir> element. please add prefix="cwd" if current behavior is
>> desired.
>> Stripped 2 unrenderable words
>> Rendered page 0 to file /tmp/eng-2020-03-07.lif/eng.Sabaean.exp0.tif
>>
>> === Phase UP: Generating unicharset and unichar properties files ===
>> [Sat 07 Mar 2020 12:26:08 AM EAT] /usr/bin/unicharset_extractor
>> --output_unicharset /tmp/eng-2020-03-07.lif/eng.unicharset --norm_mode 1
>> /tmp/eng-2020-03-07.lif/eng.Sabaean.exp0.box
>> Failed to read data from: /tmp/eng-2020-03-07.lif/eng.Sabaean.exp0.box
>> Wrote unicharset file /tmp/eng-2020-03-07.lif/eng.unicharset
>> [Sat 07 Mar 2020 12:26:08 AM EAT] /usr/bin/set_unicharset_properties -U
>> /tmp/eng-2020-03-07.lif/eng.unicharset -O
>> /tmp/eng-2020-03-07.lif/eng.unicharset -X
>> /tmp/eng-2020-03-07.lif/eng.xheights --script_dir=./langdata
>> Loaded unicharset of size 3 from file
>> /tmp/eng-2020-03-07.lif/eng.unicharset
>> Setting unichar properties
>> Setting script properties
>> Failed to load script unicharset from:./langdata/Latin.unicharset
>> Writing unicharset to file /tmp/eng-2020-03-07.lif/eng.unicharset
>>
>> === Phase E: Generating lstmf files ===
>> Using TESSDATA_PREFIX=./tessdata/
>> [Sat 07 Mar 2020 12:26:08 AM EAT] /usr/bin/tesseract
>> /tmp/eng-2020-03-07.lif/eng.Sabaean.exp0.tif
>> /tmp/eng-2020-03-07.lif/eng.Sabaean.exp0 --psm 6 lstm.train
>> read_params_file: Can't open lstm.train
>> Tesseract Open Source OCR Engine v4.1.1 with Leptonica
>> Page 1
>> ERROR: /tmp/eng-2020-03-07.lif/eng.Sabaean.exp0.lstmf does not exist or
>> is not readable
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/ee9d5e16-328e-480d-ab2c-4ca4de708381%40googlegroups.com
>> .
>> > -- ____________________________________________________________ भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduXzjuDjdTtKPCrDK85v%2Bi12nUCBxy9X_W44%2Bi32ZT_hdQ%40mail.gmail.com.

