Training from scratch will take a long time (days or weeks)! Is that still 
true if I want to train for only one font? 
I want to train Kurdish written in Arabic script, but the Arabic script 
traineddata contains a lot of characters that don't exist in Kurdish. 
Can you tell me a shortcut that avoids that "long time - days/weeks"? I want 
to make the best traineddata I can for it.
Thanks again
On Wednesday, January 8, 2020 at 4:07:42 PM UTC+3:30, shree wrote:
>
> If you want to train using text, then you also need to specify a set of 
> fonts, e.g.
>
> ~/tesseract/src/training/tesstrain.sh \
>   --fonts_dir ~/.fonts \
>   --lang ara \
>   --linedata_only \
>   --noextract_font_properties \
>   --langdata_dir ~/langdata \
>   --tessdata_dir ~/tessdata \
>   --fontlist "Amiri" \
>   "Amiri Bold Italic" \
>   "Amiri Bold" \
>   "Amiri Italic" \
>   --training_text ./ara.training_text \
>   --workspace_dir ~/tmp/ \
>   --save_box_tiff \
>   --output_dir ~/tesstutorial/araeval
>
> This will create a set of lstmf files and a list of them, which can then 
> be used for lstmtraining.
>
> If you don't want to use existing traineddata, then follow the instructions 
> to train from scratch:
>
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#training-from-scratch
>  
>
> Training from scratch will take a long time - days/weeks. 
>
> On Wed, Jan 8, 2020 at 4:09 PM Ayub Rauf <[email protected]> wrote:
>
>> Thanks, it helped, and I could create a multi-page tif. But as you know, 
>> Tesseract 4 accepts a single-line tif with its ground-truth text and 
>> doesn't need a box file, am I right? That is, I only need the lstmf file, 
>> not the box file; is that right? Anyway, I'll find a splitter and get the 
>> data ready. Do you know of a tool that can split and rename files 
>> automatically, for both a multi-page tif and multi-line text?
>> And will those paired files (the tif and its ground-truth text) be 
>> enough to start creating my language model? When I try to train, 
>> it says "Tesseract couldn't load any languages!
>> Could not initialize tesseract."
>> When I searched for how to make a .traineddata file I found tesstrain.sh 
>> <https://github.com/tesseract-ocr/tesseract/blob/master/src/training/tesstrain.sh>
>> but I don't know how to run it or work with it, so please help me make a 
>> new traineddata if you can, because I don't want to use the existing 
>> traineddata!
>> Thanks
>>
>>
>> On Wednesday, January 8, 2020 at 8:35:56 AM UTC+3:30, shree wrote:
>>>
>>> Read your text file line by line, and for each line 
>>> run text2image to create a box/tif pair, similar to the following.
>>>
>>> text2image --fonts_dir="$unicodefontdir" --text="${linetext}" \
>>>   --strip_unrenderable_words --xsize=2500 --ysize=300 --leading=32 \
>>>   --margin=12 --exposure=0 --font="$fontname" \
>>>   --outputbase="${fontname// /_}.exp0"
>>>
>>>
>>> Then run tesseract to create the lstmf files, similar to the following.
>>>
>>> tesseract "${fontname// /_}.exp0".tif "${fontname// /_}.exp0" -l "$lang" \
>>>   --psm 13 --dpi 300 lstm.train
>>>
>>>
>>>
>>> On Wed, Jan 8, 2020 at 1:24 AM Ayub Rauf <[email protected]> wrote:
>>>
>>>> Hi, could someone please show me how to create single-line tif images 
>>>> from text and use them for training my model?
>>>> Thanks
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/47c002a2-9a79-431d-8ff5-8acce2e00941%40googlegroups.com
>>>>
>>>
>>>
>>> -- 
>>>
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>
>
>
>
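
The quoted advice to "read your text file line by line" and the question about splitting multi-line text automatically could be handled along these lines. This is only a minimal sketch: the function name split_lines, the four-digit numbering, and the .gt.txt suffix are illustrative assumptions, not anything Tesseract requires.

```shell
# Sketch: split a multi-line training text into one file per line.
# split_lines, the 4-digit counter and the ".gt.txt" suffix are hypothetical
# naming choices made for this example only.
split_lines() {
  local input="$1"
  local outdir="$2"
  local prefix="$3"
  local n=0
  local line
  mkdir -p "$outdir"
  # The "|| [ -n "$line" ]" part keeps a last line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip lines that are empty or contain only whitespace.
    case "$line" in
      *[![:space:]]*) ;;
      *) continue ;;
    esac
    n=$((n + 1))
    printf '%s\n' "$line" > "$(printf '%s/%s_%04d.gt.txt' "$outdir" "$prefix" "$n")"
  done < "$input"
  # Print how many line files were written.
  echo "$n"
}
```

Calling, for example, split_lines ./ara.training_text ./lines kur would write ./lines/kur_0001.gt.txt, ./lines/kur_0002.gt.txt and so on. Each of those files could then be rendered and converted with the text2image and tesseract commands quoted above, assuming text2image is pointed at the per-line file.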

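The "list" of lstmf files mentioned in the quoted reply is a plain text file with one lstmf path per line. If the lstmf files were produced one by one rather than through tesstrain.sh, such a list could be built roughly as follows. This is a sketch only; make_lstmf_list is a hypothetical helper name.

```shell
# Sketch: collect every .lstmf file directly inside a directory into a list
# file, one path per line, sorted so the result is reproducible.
# make_lstmf_list is a hypothetical helper name used for this example.
make_lstmf_list() {
  local dir="$1"
  local listfile="$2"
  find "$dir" -maxdepth 1 -type f -name '*.lstmf' | sort > "$listfile"
}
```

The resulting file is the kind of input that lstmtraining takes through its --train_listfile option.
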