Will training from scratch take that long (days or weeks) even if I only want to train for one font? I want to train Kurdish written in Arabic script, but the Arabic-script traineddata contains a lot of characters that don't exist in Kurdish. Can you tell me a shortcut around that "long time - days/weeks"? I want to make the best possible traineddata for it.

Thanks again

On Wednesday, January 8, 2020 at 4:07:42 PM UTC+3:30, shree wrote:
>
> If you want to train using text, then you also need to specify a set of
> fonts, e.g.
>
> ~/tesseract/src/training/tesstrain.sh \
>   --fonts_dir ~/.fonts \
>   --lang ara \
>   --linedata_only \
>   --noextract_font_properties \
>   --langdata_dir ~/langdata \
>   --tessdata_dir ~/tessdata \
>   --fontlist "Amiri" \
>     "Amiri Bold Italic" \
>     "Amiri Bold" \
>     "Amiri Italic" \
>   --training_text ./ara.training_text \
>   --workspace_dir ~/tmp/ \
>   --save_box_tiff \
>   --output_dir ~/tesstutorial/araeval
>
> This will create a set of lstmf files and their list and those can be used
> for lstmtraining.
>
> If you don't want to use existing traineddata, then follow instructions to
> train from scratch -
>
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#training-from-scratch
>
> Training from scratch will take a long time - days/weeks.
>
> On Wed, Jan 8, 2020 at 4:09 PM Ayub Rauf <[email protected]> wrote:
>
>> Thanks, it helped and I could create a multi-page tif. But as far as I
>> know, Tesseract 4 accepts a single-line tif with its truth text and
>> doesn't need a box file, am I right? I mean that I only need the lstmf
>> file, not the box file. Is that right?
>> Anyway, I'll find a splitter and get the data ready. Do you have any
>> solution that can split and rename the files automatically, for both
>> multi-page tif and multi-line text?
>> And will those two paired files, the tif and its truth text, be enough
>> to start creating my language model? Because when I try training,
>> it says "Tesseract couldn't load any languages!
>> Could not initialize tesseract."
>> When I searched for how to make a .traineddata I found tesstrain.sh
>> <https://github.com/tesseract-ocr/tesseract/blob/master/src/training/tesstrain.sh>
>> but I don't know how to run it and work with it, so please help me
>> make a new traineddata, because I don't want to use an existing
>> traineddata!
>> Thanks
>>
>> On Wednesday, January 8, 2020 at 8:35:56 AM UTC+3:30, shree wrote:
>>>
>>> Read your textfile line by line
>>> run text2image to create box/tif, similar to following.
>>>
>>> text2image --fonts_dir="$unicodefontdir" --text="${linetext}" \
>>>   --strip_unrenderable_words --xsize=2500 --ysize=300 --leading=32 \
>>>   --margin=12 --exposure=0 --font="$fontname" \
>>>   --outputbase="${fontname// /_}.exp0"
>>>
>>> run tesseract to create lstmf files, similar to following.
>>>
>>> tesseract "${fontname// /_}.exp0".tif "${fontname// /_}.exp0" -l "$lang" \
>>>   --psm 13 --dpi 300 lstm.train
>>>
>>> On Wed, Jan 8, 2020 at 1:24 AM Ayub Rauf <[email protected]> wrote:
>>>
>>>> Hi, could someone please help me create single-line tif files from
>>>> text and use them for training my model?
>>>> Thanks
>>>
>>> --
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
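
For anyone following the quoted advice to "read your textfile line by line", here is a rough sketch of how the splitting and renaming step might be scripted. It is only an illustration, not an official Tesseract tool: the function names split_lines and render_lines, the <prefix>_<NNNN>.gt.txt naming scheme and the argument order are all made up for this example. The text2image flags are the ones from the quoted command, and --text is given the path of a one-line file.

```shell
# Sketch only. Function names and the file naming scheme are assumptions.

# Write every non-blank line of text file $1 to $2/<prefix>_<NNNN>.gt.txt
# and print how many files were written. The ( ... ) body keeps the
# variables local to the function.
split_lines() (
  textfile=$1
  outdir=$2
  prefix=${3:-line}
  n=0
  mkdir -p "$outdir"
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *[![:space:]]*) ;;   # at least one non-blank character: keep it
      *) continue ;;       # empty or whitespace-only line: skip it
    esac
    n=$((n + 1))
    printf '%s\n' "$line" > "$(printf '%s/%s_%04d.gt.txt' "$outdir" "$prefix" "$n")"
  done < "$textfile"
  echo "$n"
)

# Render each one-line file in directory $1 with font $2 (fonts from
# directory $3). Needs text2image on PATH; the flag values are copied
# from the quoted command and may need tuning for your font.
render_lines() (
  dir=$1
  fontname=$2
  fonts_dir=$3
  for gt in "$dir"/*.gt.txt; do
    [ -e "$gt" ] || continue
    base=${gt%.gt.txt}
    text2image --fonts_dir="$fonts_dir" --font="$fontname" --text="$gt" \
      --strip_unrenderable_words --xsize=2500 --ysize=300 --leading=32 \
      --margin=12 --exposure=0 --outputbase="$base"
  done
)
```

A possible call would be split_lines kur.training_text ./lines kur followed by render_lines ./lines "Amiri" ~/.fonts, where kur.training_text and ./lines are placeholder names. Each resulting tif could then be passed to the quoted tesseract ... lstm.train command to produce the lstmf file. As far as I know that step still needs some traineddata for the language given with -l, which would explain the "couldn't load any languages" message.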
-- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/827b054d-1ac3-49c1-96ca-0159adf0ebc3%40googlegroups.com.

