Please read tesstrain_utils.sh if you want the details. You do not add dictionary files to tesstrain yourself: they are built from your sources in langdata. The unicharset is also built from your training_text in langdata.
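To see what tesstrain.sh has to work with for a language, a small check along these lines may help. It is only a sketch: the helper name langdata_sources is hypothetical, and it assumes the usual langdata layout of <langdata_dir>/<lang>/<lang>.<suffix> with the suffixes training_text, wordlist, punc and numbers. Check tesstrain_utils.sh for the exact files your version reads.

```shell
#!/bin/sh
# Sketch only: report which langdata inputs exist for one language.
# ASSUMPTION: files live at <langdata_dir>/<lang>/<lang>.<suffix>.
# langdata_sources is a hypothetical helper, not part of tesstrain.

langdata_sources() {
    lang=$1   # language code, e.g. eng
    dir=$2    # path to the langdata checkout, e.g. ../langdata
    # training_text feeds the unicharset; the other three are dictionary sources.
    for suffix in training_text wordlist punc numbers; do
        f="$dir/$lang/$lang.$suffix"
        if [ -f "$f" ]; then
            echo "found   $f"
        else
            echo "missing $f"
        fi
    done
}
```

Calling `langdata_sources eng ../langdata` prints one line per expected file. Anything reported missing is a source that the dictionary or unicharset step would not have available under this assumed layout.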

On 24-Sep-2017 7:05 PM, "Dan9er" <[email protected]> wrote:

> That answer doesn't help me.
>
> How can I add dictionary files to tesstrain?
>
> On Saturday, September 23, 2017 at 12:05:37 PM UTC-4, shree wrote:
>>
>> You cannot use a random unicharset, it needs to be the same one used for
>> training the model.
>>
>> For multiple exposures, use the following method
>>
>> training/tesstrain.sh \
>>   --fonts_dir /mnt/c/Windows/Fonts \
>>   --lang eng \
>>   --noextract_font_properties --linedata_only \
>>   --exposures "-1, 0, 1" \
>>   --langdata_dir ../langdata \
>>   --tessdata_dir ../tessdata \
>>   --fontlist \
>>   "Arial" \
>>   "Tahoma" \
>>   "Times New Roman," \
>>   "Sanskrit 2003," \
>>   "FreeSerif Italic" \
>>   "Times New Roman, Italic" \
>>   --output_dir ../tesstutorial/eng
>>
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Sat, Sep 23, 2017 at 8:46 PM, Dan9er <[email protected]> wrote:
>>
>>> I'm making a unicharset file so I can compile DAWG dictionary files so
>>> I can use it with tesstrain.sh. I want to use multiple exposures (-1,
>>> 0,1) for the tiff/box pairs. How should name them to separate the
>>> different exposures?
>>>
>>> Can I do this?:
>>>
>>> lang.Arial.exp0
>>> lang.Arial.exp1
>>> lang.Arial.exp2
>>>
>>> Or will changing the file numbers screw things up? As an alternative,
>>> can I do this?:
>>>
>>> lang.Arial0.exp0
>>> lang.Arial1.exp0
>>> lang.Arial2.exp0

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduWGPqKCNiywjaTTn%2B1ZZF4XjGE-wRCohDoeYF2gafngRw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

