Thanks, that helped, and I was able to create a multipage TIF. But as far
as I know, Tesseract 4 accepts single-line TIFs paired with their
ground-truth text and doesn't need a box file, am I right? Anyway, I'll
find a splitter and get the data ready. Do you know of any tool that can
split and rename the files automatically?
And will those two kinds of files, I mean the paired TIF and ground-truth
text files, be enough to start creating my language model? I'm creating a
new model for Kurdish in Arabic script, which is written right to left. The
existing model I tested was not very good, so I prepared a huge word list
and a large corpus to start with, and I want to create a unicharset for it.
But now I'm stuck: how can Tesseract train a new language with only the
prepared TIFFs and their ground-truth texts?
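
To make the splitting question concrete, this is roughly what I have in
mind. It is only a sketch: split_lines, the numbered file-name pattern and
the "kur" prefix are names I made up, and the .gt.txt suffix is my
assumption about what the training scripts expect next to each line image.
It writes one ground-truth file per non-empty line of the corpus and does
not render any images.

```shell
# Hypothetical helper: split a corpus into one ground-truth file per line.
# Usage: split_lines CORPUS OUTDIR [PREFIX]
# Prints the number of files written.
# The function body is a subshell, so its variables stay private.
split_lines() (
  corpus=$1
  outdir=$2
  prefix=${3:-line}
  n=0
  mkdir -p "$outdir"
  # IFS= and -r keep leading spaces and backslashes intact; the extra
  # test picks up a final line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip empty and whitespace-only lines.
    case $line in
      *[![:space:]]*) ;;
      *) continue ;;
    esac
    n=$((n + 1))
    # Assumed naming: PREFIX_000001.gt.txt, PREFIX_000002.gt.txt, ...
    out=$(printf '%s/%s_%06d.gt.txt' "$outdir" "$prefix" "$n")
    printf '%s\n' "$line" > "$out"
  done < "$corpus"
  echo "$n"
)
```

I would call it as: split_lines corpus.txt ground-truth kur
and then render each resulting file to a TIF with the same base name.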
Thanks
On Wednesday, January 8, 2020 at 8:35:56 AM UTC+3:30, shree wrote:
>
> Read your text file line by line and run text2image on each line to
> create a box/tif pair, similar to the following.
>
> text2image --fonts_dir="$unicodefontdir" --text="${linetext}" \
>   --strip_unrenderable_words --xsize=2500 --ysize=300 --leading=32 \
>   --margin=12 --exposure=0 --font="$fontname" \
>   --outputbase="${fontname// /_}.exp0"
>
>
> Then run tesseract to create the lstmf files, similar to the following.
>
> tesseract "${fontname// /_}.exp0".tif "${fontname// /_}.exp0" -l "$lang" \
>   --psm 13 --dpi 300 lstm.train
>
>
>
> On Wed, Jan 8, 2020 at 1:24 AM Ayub Rauf <[email protected]>
> wrote:
>
>> Hi, could someone please help me create single-line TIFs from text and
>> use them for training my model?
>> Thanks
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/47c002a2-9a79-431d-8ff5-8acce2e00941%40googlegroups.com
>>
>> <https://groups.google.com/d/msgid/tesseract-ocr/47c002a2-9a79-431d-8ff5-8acce2e00941%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
>
> --
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>