Added that and it worked perfectly. I'm finally done.
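For anyone who finds this thread later: both of the "does not exist or is not readable" errors quoted below were caused by files missing from ./langdata (first common.punc, then font_properties). A small pre-flight check along these lines might save a long run that only fails near the end. This is a rough sketch; the function name check_langdata is made up for illustration, and the file list covers only the two files named in this thread, not everything tesstrain.sh may need.

```shell
#!/bin/sh
# Sketch: report langdata files that are missing before starting a long
# tesstrain.sh run. ASSUMPTION: the list below holds only the two files
# mentioned in this thread; it is not the complete set of required files.

check_langdata() {
  # $1 = path to the langdata directory (e.g. ./langdata)
  langdata_dir="$1"
  missing=0
  for name in common.punc font_properties; do
    if [ ! -r "$langdata_dir/$name" ]; then
      echo "missing or unreadable: $langdata_dir/$name"
      missing=1
    fi
  done
  # Exit status 0 when every listed file is readable, 1 otherwise.
  return "$missing"
}
```

Something like `check_langdata ./langdata && training/tesstrain.sh ...` would then start training only when both files are present.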
On Saturday, September 16, 2017 at 7:41:39 PM UTC-4, Dan9er wrote:
>
> I ditched my 500+ font fontlist for one with just 3. It runs much faster now, and I got to Phase M before I got a "./langdata/font_properties does not exist or is not readable" error.
>
> On Sunday, September 10, 2017 at 2:08:10 PM UTC-4, Dan9er wrote:
>>
>> Did that, and it actually started training! It almost got to the end of my font list before...
>> ERROR: /tmp/tmp.YLL4mGn66F/npn/npn.Aileron_Heavy.exp0.tr does not exist or is not readable
>>
>> Also, there were some ERRORs, but there weren't any FATALITYies (lol), so I think I'm good.
>>
>> On Sunday, September 10, 2017 at 12:13:25 PM UTC-4, shree wrote:
>>>
>>> 1. Fontconfig error: line 1: no element found
>>> 2. Fontconfig error: Cannot load default config file
>>> 3. Could not find font named NanumMyeongjo Semi-Bold.
>>> 4. Pango suggested font NanumMyeongjo Bold.
>>> 5. Please correct --font arg.
>>> 6. ERROR: /tmp/tmp.tiLxemomPr/npn/npn.NanumMyeongjo_Semi-Bold.exp0.box does not exist or is not readable
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Sun, Sep 10, 2017 at 9:37 PM, Dan9er <[email protected]> wrote:
>>>
>>>> I added the common.punc file. But now I'm getting the box error again:
>>>> https://pastebin.com/BsNL3KJv
>>>>
>>>> On Saturday, September 9, 2017 at 3:22:48 PM UTC-4, shree wrote:
>>>>>
>>>>> https://github.com/tesseract-ocr/langdata/blob/master/common.punc
>>>>>
>>>>> You should read the Readme.md in the langdata repo for info on the files required for training.
>>>>>
>>>>> On 10-Sep-2017 12:39 AM, "Dan9er" <[email protected]> wrote:
>>>>>
>>>>> Ok, I made a shell script that runs tesstrain.sh with all 562 compatible fonts. But now I'm getting an error saying ./langdata/common.punc does not exist...
>>>>> https://pastebin.com/8aaMjH6k
>>>>>
>>>>> On Saturday, September 9, 2017 at 12:51:45 PM UTC-4, shree wrote:
>>>>>
>>>>>> Your command needs to be along the following lines:
>>>>>>
>>>>>> training/tesstrain.sh \
>>>>>> --fonts_dir /home/shree/.fonts \
>>>>>> --tessdata_dir ./tessdata \
>>>>>> --training_text ../langdata/ben/ben.training_text \
>>>>>> --langdata_dir ../langdata \
>>>>>> --lang ben \
>>>>>> --linedata_only \
>>>>>> --noextract_font_properties \
>>>>>> --exposures "0" \
>>>>>> --fontlist "e-Grantamil" \
>>>>>> "e-Grantha OT" \
>>>>>> --output_dir ~/tesstutorial/ben
>>>>>>
>>>>>> See the fontlist argument: it takes the quoted names of the fonts. You can put one on each line, ending each line with \
>>>>>>
>>>>>> ShreeDevi
>>>>>> ____________________________________________________________
>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>>
>>>>>> On Sat, Sep 9, 2017 at 10:12 PM, Dan9er <[email protected]> wrote:
>>>>>>
>>>>>>> Nope. 😢
>>>>>>> https://pastebin.com/BskUsSm7
>>>>>>>
>>>>>>> On Saturday, September 9, 2017 at 11:57:18 AM UTC-4, Dan9er wrote:
>>>>>>>>
>>>>>>>> I think I now know how to do it.
>>>>>>>>
>>>>>>>> I have to run training/text2image --find_fonts and then set the tesstrain --fontlist flag to the file that is generated.
>>>>>>>>
>>>>>>>> On Thursday, September 7, 2017, at 2:19:09 PM UTC-4, Dan9er wrote:
>>>>>>>>>
>>>>>>>>> I'm trying to train tesseract using tesstrain and I'm getting this error: https://pastebin.com/xJj3w9jZ
>>>>>>>>>
>>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>> To post to this group, send email to [email protected].
>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/ee1d68eb-a92c-4a30-905f-ac52128bccb6%40googlegroups.com.
>>>>>>> For more options, visit https://groups.google.com/d/optout.
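A note on the --fontlist point from shree's reply above: the flag takes quoted font names, not the path of a file. If the names are kept in a file anyway, one per line, a wrapper along these lines could expand them into separate arguments. This is only a sketch. The function name fontlist_args is hypothetical, and it appends --fontlist after all other flags, which assumes tesstrain.sh accepts the flag in last position.

```shell
#!/bin/sh
# Sketch: read font names (one per line) from a file and pass them as
# separate --fontlist arguments, so that names containing spaces such as
# "e-Grantha OT" stay single arguments. fontlist_args is a made-up name.

fontlist_args() {
  # Usage: fontlist_args FILE COMMAND [ARGS...]
  # Runs: COMMAND ARGS... --fontlist NAME1 NAME2 ...
  font_file="$1"
  shift
  set -- "$@" --fontlist
  while IFS= read -r font || [ -n "$font" ]; do
    # Skip blank lines so they do not turn into empty arguments.
    [ -n "$font" ] || continue
    set -- "$@" "$font"
  done < "$font_file"
  "$@"
}

# Dry run: print each resulting argument in brackets instead of training.
demo_file=$(mktemp)
printf '%s\n' 'e-Grantamil' 'e-Grantha OT' > "$demo_file"
fontlist_args "$demo_file" printf '[%s]\n'
rm -f "$demo_file"
# prints:
# [--fontlist]
# [e-Grantamil]
# [e-Grantha OT]
```

For a real run, the printf part would be replaced by the tesstrain.sh command and its other flags, for example `fontlist_args fonts.txt training/tesstrain.sh --lang ben ...`, where fonts.txt is a placeholder file name.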

