The problem is still there. I looked at those links, but the error persists.

On Tue, Jul 3, 2018 at 12:54 AM, Shree Devi Kumar <[email protected]> wrote:
> also see https://github.com/tesseract-ocr/tesseract/issues/549
>
> On Mon, Jul 2, 2018 at 7:45 PM Shree Devi Kumar <[email protected]> wrote:
>
>> You can use find_fonts with your training_text to locate the fonts to use.
>>
>> Modify the following command to match your directory setup and try it:
>>
>> echo "###### FIND FONTS ######"
>> # Find fonts which can render your training_text. Run `fc-cache -vf` to refresh cache.
>> # You can change the minimum coverage % as needed.
>> # This process can take a while if you have a number of installed fonts.
>> # Review the generated fontlist and modify, if needed.
>> # 2000 fonts found. Use a smaller set
>>
>> nice text2image --find_fonts \
>> --fonts_dir $fonts_dir \
>> --text $langdata_dir/$Lang/$Lang.training_text \
>> --min_coverage 0.999 \
>> --render_per_font=false \
>> --outputbase $langdata_dir/$Lang/$Lang \
>> |& grep raw \
>> | sed -e 's/ :.*/@ \\/g' \
>> | sed -e "s/^/ '/" \
>> | sed -e "s/@/'/g" > $langdata_dir/$Lang/$Lang.fontslist.txt
>>
>> On Mon, Jul 2, 2018 at 12:06 PM ran go <[email protected]> wrote:
>>
>>> In my opinion the error depends on the font: for some fonts there is no error, but for other fonts there is.
>>>
>>> On Mon, Jul 2, 2018 at 9:15 AM, john <[email protected]> wrote:
>>>
>>>> I use tesseract 4.0.0-beta.1, downloaded from this link (UB Mannheim):
>>>> <https://github.com/UB-Mannheim/tesseract/tree/v4.0.0-beta.1.20180414>
>>>>
>>>> On Saturday, June 30, 2018 at 7:13:30 PM UTC+4:30, shree wrote:
>>>>>
>>>>> Also check that there is no tab or other unprintable character in your training text.
>>>>>
>>>>> Which version of tesseract are you using? Show the output of
>>>>>
>>>>> tesseract -v
>>>>>
>>>>> On Sat, Jun 30, 2018 at 8:04 PM Shree Devi Kumar <[email protected]> wrote:
>>>>>
>>>>>> Then there must be a mismatch between the unicharset you are using and the training text, e.g. check whether the copyright symbol is in your unicharset.
>>>>>>
>>>>>> On Sat, Jun 30, 2018 at 4:48 PM john <[email protected]> wrote:
>>>>>>
>>>>>>> I saw that link. This error occurred many times. How can I prevent it?
>>>>>>>
>>>>>>> On Saturday, June 30, 2018 at 3:17:26 PM UTC+4:30, shree wrote:
>>>>>>>>
>>>>>>>> see https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#error-messages-from-training
>>>>>>>>
>>>>>>>> On Sat, Jun 30, 2018 at 3:23 PM john <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Encoding of string failed! Failure bytes:
>>>>>>>>> ffffffc2 ffffffa9 20
>>>>>>>>> ffffffd8 ffffffa8 ffffffd8 ffffffa7 ffffffd8 ffffffae ffffffd8 ffffffaa ffffffd9 ffffff86 ffffffd8 ffffffa7 20
>>>>>>>>> ffffffd9 ffffff84 ffffffd8 ffffffa7 ffffffd8 ffffffa4 ffffffd8 ffffffb3 20
>>>>>>>>> ffffffdb ffffff8c ffffffd9 ffffff86 ffffffd8 ffffffa7 ffffffd8 ffffffb1 ffffffdb ffffff8c ffffffd8 ffffffa7 20
>>>>>>>>> ffffffd8 ffffffa7 ffffffd8 ffffffa8 20
>>>>>>>>> ffffffd8 ffffffaa ffffffd8 ffffffa8 ffffffd8 ffffffab ffffffd9 ffffff87 20
>>>>>>>>> ffffffd8 ffffffaf ffffffd8 ffffffa7 ffffffd9 ffffff81 ffffffd8 ffffffaa ffffffd8 ffffffb3 ffffffd8 ffffffa7 20
>>>>>>>>> ffffffd9 ffffff86 ffffffdb ffffff8c ffffffd9 ffffff86 ffffffda ffffff86 ffffffd9 ffffff85 ffffffd9 ffffff87 20
>>>>>>>>> ffffffd9 ffffff82 ffffffd9 ffffff84 ffffffd8 ffffffb7 ffffffd9 ffffff85
>>>>>>>>> Can't encode transcription: '۱۹ 2006© باختنا لاؤس یناریا اب تبثه دافتسا نینچمه قلطم' in language ''
>>>>>>>>> ^C
>>>>>>>>>
>>>>>>>>> When I fine-tune the network for the fas language, I see the error above.
>>>>>>>>> What is wrong with the training?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/11d5277e-2ef1-4ae9-8cb3-3f38290c1dfc%40googlegroups.com.
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>
>>>>>>>> --
>>>>>>>> ____________________________________________________________
>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAH8gkc_graaYuB7uv1L4o7C9pxMikzdSy2j7gbwAJdXgO76ZzQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
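
For anyone hitting the same message: the "Failure bytes" dump in the quoted error can be turned back into text to see which characters are involved, and the training text can be scanned for tabs or other unprintable characters as suggested in the thread. The following is only a minimal Python sketch and not part of the Tesseract tooling. The function names and the `known` set are hypothetical; `known` merely stands in for the characters listed in your unicharset. It assumes each token in the dump is a sign-extended byte of UTF-8 text.

```python
import unicodedata


def decode_failure_bytes(dump):
    """Turn tokens like 'ffffffc2 ffffffa9 20' back into UTF-8 text."""
    # Each token is assumed to be a sign-extended byte, so keep the low 8 bits.
    raw = bytes(int(token, 16) & 0xFF for token in dump.split())
    return raw.decode("utf-8", errors="replace")


def suspicious_chars(text, known):
    """Return (char, name) pairs for characters that are missing from
    `known` or that are control/format characters such as a tab."""
    found = []
    for ch in dict.fromkeys(text):  # unique characters, original order
        if ch == " ":
            continue
        category = unicodedata.category(ch)
        if ch not in known or category in ("Cc", "Cf"):
            # Control characters have no Unicode name, so fall back to U+XXXX.
            found.append((ch, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return found


# First few bytes of the dump quoted in this thread.
sample = "ffffffc2 ffffffa9 20 ffffffd8 ffffffa8 ffffffd8 ffffffa7"
text = decode_failure_bytes(sample)

# Hypothetical stand-in for the characters present in your unicharset.
known = {"\u0628", "\u0627"}

for ch, name in suspicious_chars(text, known):
    print(f"U+{ord(ch):04X} {name}")
```

The leading bytes c2 a9 decode to the copyright sign, which matches the earlier hint to check whether that symbol is in the unicharset. To check a real setup, you would build `known` from your own unicharset file and pass the contents of your training text as `text`.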

