Hi @shree, I'm sorry for what happened. Could you please tell me how to create a model like this one? It is accurate, but not quite good enough, and I want to create one like it from scratch. Please answer.
On Wednesday, January 8, 2020 at 5:32:48 PM UTC+3:30, shree wrote:
>
> You can test with the attached traineddata file for Kurdish.
>
> On Wed, Jan 8, 2020 at 7:08 PM Ayub Rauf <[email protected]> wrote:
>
>> "Training from scratch will take a long time - days/weeks"! Is that
>> also the case if I want to train for only one font?
>> I want to train Kurdish written in Arabic script, but the Arabic-script
>> traineddata contains a lot of characters that don't exist in Kurdish.
>> Can you tell me a shortcut for that "long time - days/weeks"? I want to
>> make the best possible traineddata for it.
>> Thanks again
>>
>> On Wednesday, January 8, 2020 at 4:07:42 PM UTC+3:30, shree wrote:
>>>
>>> If you want to train using text, then you also need to specify a set of
>>> fonts, e.g.
>>>
>>>     ~/tesseract/src/training/tesstrain.sh \
>>>       --fonts_dir ~/.fonts \
>>>       --lang ara \
>>>       --linedata_only \
>>>       --noextract_font_properties \
>>>       --langdata_dir ~/langdata \
>>>       --tessdata_dir ~/tessdata \
>>>       --fontlist "Amiri" \
>>>                  "Amiri Bold Italic" \
>>>                  "Amiri Bold" \
>>>                  "Amiri Italic" \
>>>       --training_text ./ara.training_text \
>>>       --workspace_dir ~/tmp/ \
>>>       --save_box_tiff \
>>>       --output_dir ~/tesstutorial/araeval
>>>
>>> This will create a set of lstmf files and a list of them, and those can
>>> be used for lstmtraining.
>>>
>>> If you don't want to use existing traineddata, then follow the
>>> instructions to train from scratch:
>>>
>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#training-from-scratch
>>>
>>> Training from scratch will take a long time - days/weeks.
>>>
>>> On Wed, Jan 8, 2020 at 4:09 PM Ayub Rauf <[email protected]> wrote:
>>>
>>>> Thanks, it helped and I could create a multi-page tif. But as far as I
>>>> know, Tesseract 4 accepts a single-line tif with its ground-truth text
>>>> and doesn't need a box file, am I right? I mean that I only need the
>>>> lstmf file, not the box file. Is that right?
>>>>
>>>> Anyway, I'll find a splitter and get the data ready. Do you have any
>>>> solution that can split and rename the files automatically, for both
>>>> the multi-page tif and the multi-line text?
>>>> And will those two paired files, I mean the tif and the ground-truth
>>>> text, be enough to start creating my language model? Because when I
>>>> try training, it says:
>>>>
>>>>     Tesseract couldn't load any languages!
>>>>     Could not initialize tesseract.
>>>>
>>>> When I searched for how to make a .traineddata file I found tesstrain.sh
>>>> <https://github.com/tesseract-ocr/tesseract/blob/master/src/training/tesstrain.sh>
>>>> but I don't know how to run it or work with it. So please, if you can,
>>>> help me make a new traineddata, because I don't want to use an existing
>>>> traineddata!
>>>> Thanks
>>>>
>>>> On Wednesday, January 8, 2020 at 8:35:56 AM UTC+3:30, shree wrote:
>>>>>
>>>>> Read your text file line by line and run text2image to create the
>>>>> box/tif pair, similar to the following:
>>>>>
>>>>>     text2image --fonts_dir="$unicodefontdir" --text="${linetext}" \
>>>>>       --strip_unrenderable_words --xsize=2500 --ysize=300 --leading=32 \
>>>>>       --margin=12 --exposure=0 --font="$fontname" \
>>>>>       --outputbase="${fontname// /_}.exp0"
>>>>>
>>>>> Then run tesseract to create the lstmf files, similar to the following:
>>>>>
>>>>>     tesseract "${fontname// /_}.exp0".tif "${fontname// /_}.exp0" \
>>>>>       -l "$lang" --psm 13 --dpi 300 lstm.train
>>>>>
>>>>> On Wed, Jan 8, 2020 at 1:24 AM Ayub Rauf <[email protected]> wrote:
>>>>>
>>>>>> Hi, could someone please help me with how to create single-line tif
>>>>>> files from text and use them for training my model?
>>>>>> Thanks
>
> --
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/3bd7ff1f-6c3b-464d-8735-05633993db6e%40googlegroups.com.
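
For anyone following the line-by-line advice quoted in this thread (read the text file line by line, run text2image, then run tesseract with lstm.train), here is a minimal sketch of how the loop might look. The function names `run` and `make_line_data`, the `DRY_RUN` switch, and the output file layout are my own assumptions for illustration; only the text2image and tesseract flags come from the commands quoted in the thread. As far as I know, text2image treats `--text` as a file name, so the sketch writes each line to its own small text file first. It also assumes tesseract can load a traineddata for the language passed with `-l`.

```shell
#!/bin/sh
# Sketch only: one image and one .lstmf per line of a training text.
# Function names and the output layout are made up for illustration;
# the text2image/tesseract flags are the ones quoted in this thread.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@" < /dev/null   # keep the tools from consuming the loop's stdin
  fi
}

# make_line_data TEXTFILE FONTNAME LANG FONTS_DIR OUTDIR
# The ( ... ) body runs in a subshell, so its variables stay local.
make_line_data() (
  textfile=$1 fontname=$2 lang=$3 fontsdir=$4 outdir=$5
  fontbase=$(printf '%s' "$fontname" | tr ' ' '_')   # "Amiri Bold" -> Amiri_Bold
  mkdir -p "$outdir" || exit 1
  n=0
  while IFS= read -r linetext || [ -n "$linetext" ]; do
    [ -z "$linetext" ] && continue                   # skip blank lines
    base="${outdir}/${fontbase}.line${n}.exp0"       # unique output base per line
    linefile="${outdir}/${fontbase}.line${n}.txt"
    printf '%s\n' "$linetext" > "$linefile"          # one-line input file for text2image
    run text2image --fonts_dir="$fontsdir" --text="$linefile" \
      --strip_unrenderable_words --xsize=2500 --ysize=300 --leading=32 \
      --margin=12 --exposure=0 --font="$fontname" --outputbase="$base"
    run tesseract "${base}.tif" "$base" -l "$lang" --psm 13 --dpi 300 lstm.train
    n=$((n + 1))
  done < "$textfile"
)
```

Calling `DRY_RUN=1 make_line_data ./ara.training_text "Amiri" ara ~/.fonts ./linedata` only prints the commands (and writes the one-line text files), which is a cheap way to check the file names before running the real thing without `DRY_RUN`. The `./linedata` directory is a placeholder name.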

