Sorry shree, I just deleted my Ubuntu installation completely, and I regret wasting my time on the s*** Tesseract training guide. It is complicated and mixed up, and every step I took gave me a new error. I think if the developers want more languages to be trained, they should either build GUI software for training or prepare better documentation. I think anyone who has worked with the Tesseract training tools will understand what I mean: I spent 4 days and nights on it with no luck!

But I would like you to train the model, if you can. I'll give you all the needed files, and I will also make the unicharset, the wordlist and the training texts; you would just create and train it.
I'm waiting for your reply.

On Wednesday, January 8, 2020 at 5:32:48 PM UTC+3:30, shree wrote:
>
> you can test with attached traineddata file for Kurdish.
>
> On Wed, Jan 8, 2020 at 7:08 PM Ayub Rauf <[email protected]> wrote:
>
>> "Training from scratch will take a long time - days/weeks!" Is that
>> also the case if I want to train for only one font?
>> I want to train Kurdish written in Arabic script, but the Arabic script
>> traineddata has a lot of characters that don't exist in Kurdish.
>> Can you tell me a shortcut for that "long time - days/weeks"? I want to
>> make the best traineddata for it.
>> Thanks again
>> On Wednesday, January 8, 2020 at 4:07:42 PM UTC+3:30, shree wrote:
>>>
>>> If you want to train using text, then you also need to specify a set of
>>> fonts, e.g.
>>>
>>> ~/tesseract/src/training/tesstrain.sh \
>>>   --fonts_dir ~/.fonts \
>>>   --lang ara \
>>>   --linedata_only \
>>>   --noextract_font_properties \
>>>   --langdata_dir ~/langdata \
>>>   --tessdata_dir ~/tessdata \
>>>   --fontlist "Amiri" \
>>>   "Amiri Bold Italic" \
>>>   "Amiri Bold" \
>>>   "Amiri Italic" \
>>>   --training_text ./ara.training_text \
>>>   --workspace_dir ~/tmp/ \
>>>   --save_box_tiff \
>>>   --output_dir ~/tesstutorial/araeval
>>>
>>> This will create a set of lstmf files and a list of them, and those can
>>> be used for lstmtraining.
>>>
>>> If you don't want to use existing traineddata, then follow the
>>> instructions to train from scratch -
>>>
>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#training-from-scratch
>>>
>>> Training from scratch will take a long time - days/weeks.
>>>
>>> On Wed, Jan 8, 2020 at 4:09 PM Ayub Rauf <[email protected]> wrote:
>>>
>>>> Thanks, it helped and I could create a multi-page tif. But as far as I
>>>> know, Tesseract 4 accepts single-line tifs with their ground-truth
>>>> text and doesn't need a box file, am I right? I mean that I only need
>>>> the lstmf file, not the box file. Is that right?
>>>>
>>>> Anyway, I'll find a splitter and get the data ready. Do you have any
>>>> solution that can split and rename the files automatically, both the
>>>> multi-page tif and the multi-line text?
>>>> And will those two paired files, the tif and its ground-truth text, be
>>>> enough to start creating my language model? Because when I try to
>>>> train, it says:
>>>> "Tesseract couldn't load any languages!
>>>> Could not initialize tesseract."
>>>> When I searched for how to make a .traineddata file I found tesstrain.sh
>>>> <https://github.com/tesseract-ocr/tesseract/blob/master/src/training/tesstrain.sh>
>>>> but I don't know how to run it or work with it. So please, if you can,
>>>> help me make a new traineddata, because I don't want to use the
>>>> existing traineddata!
>>>> Thanks
>>>>
>>>> On Wednesday, January 8, 2020 at 8:35:56 AM UTC+3:30, shree wrote:
>>>>>
>>>>> Read your text file line by line and
>>>>> run text2image to create box/tif, similar to the following.
>>>>>
>>>>> text2image --fonts_dir="$unicodefontdir" --text="${linetext}" \
>>>>>   --strip_unrenderable_words --xsize=2500 --ysize=300 --leading=32 \
>>>>>   --margin=12 --exposure=0 --font="$fontname" \
>>>>>   --outputbase="${fontname// /_}.exp0"
>>>>>
>>>>> Run tesseract to create lstmf files, similar to the following.
>>>>>
>>>>> tesseract "${fontname// /_}.exp0".tif "${fontname// /_}.exp0" -l "$lang" \
>>>>>   --psm 13 --dpi 300 lstm.train
>>>>>
>>>>> On Wed, Jan 8, 2020 at 1:24 AM Ayub Rauf <[email protected]> wrote:
>>>>>
>>>>>> Hi, could someone please help me create single-line tifs from text
>>>>>> and use them for training my model?
>>>>>> Thanks
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/47c002a2-9a79-431d-8ff5-8acce2e00941%40googlegroups.com
>>>>>
>>>>> --
>>>>>
>>>>> ____________________________________________________________
>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
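The per-line recipe quoted above (read the text file line by line, run text2image, then run tesseract with lstm.train) could be wrapped in one small script. The following is only a sketch under stated assumptions: the flags are copied from the quoted commands, while the function names, the DRY_RUN switch, the ".lineN" file naming and the ".gt.txt" suffix are made up for illustration. It also assumes that text2image reads its --text argument as a file path, so each line is written to its own file first; check `text2image --help` on your installation.

```shell
#!/bin/sh
# Sketch only: one text2image + tesseract run per line of a training text.
# Flags come from the commands quoted in this thread; function names,
# DRY_RUN and the ".lineN" / ".gt.txt" naming are assumptions.

# "Amiri Bold" 2 -> "Amiri_Bold.line2.exp0" (same idea as ${fontname// /_} in bash)
outputbase_for() {
  printf '%s.line%s.exp0\n' "$(printf '%s' "$1" | tr ' ' '_')" "$2"
}

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@" </dev/null   # keep the tools from reading the loop's input file
  fi
}

# Usage: make_line_data TEXTFILE FONTNAME FONTS_DIR LANG OUTDIR
make_line_data() {
  textfile=$1 fontname=$2 fontsdir=$3 lang=$4 outdir=$5
  mkdir -p "$outdir" || return 1
  n=0
  while IFS= read -r linetext || [ -n "$linetext" ]; do
    [ -n "$linetext" ] || continue   # skip empty lines
    n=$((n + 1))
    base="$outdir/$(outputbase_for "$fontname" "$n")"
    # Assumption: --text takes a file, so each line gets its own text file.
    printf '%s\n' "$linetext" > "$base.gt.txt"
    run text2image --fonts_dir="$fontsdir" --text="$base.gt.txt" \
      --strip_unrenderable_words --xsize=2500 --ysize=300 --leading=32 \
      --margin=12 --exposure=0 --font="$fontname" --outputbase="$base" || return 1
    # --psm 13 treats the image as one raw text line; lstm.train emits $base.lstmf
    run tesseract "$base.tif" "$base" -l "$lang" --psm 13 --dpi 300 lstm.train || return 1
  done < "$textfile"
}
```

For example, `DRY_RUN=1; make_line_data kur.training_text "Amiri Bold" ~/.fonts ara ./linedata` would only print the commands, which makes it easy to check names before rendering anything (the file and directory names here are placeholders). Note that the tesseract step needs a traineddata file for the language given with -l in the tessdata directory; if it is missing, that is a likely cause of the "couldn't load any languages" error mentioned earlier in the thread.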
-- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/57c84ff8-5340-4128-8889-a2d8846ce7e0%40googlegroups.com.

