How can I convert the images listed above into fonts, so that I can run the
tesstrain.sh command, which generates the image files along with the box and
.lstmf files?
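
For reference, the single-image command from the quoted reply below could be
applied to a whole directory with a loop like the following. This is only a
minimal sketch under assumptions: the helper name make_lstmf and the DRY_RUN
switch are made up for illustration, the .jpg/.tif extensions are taken from
the listing and the reply, and it presumes the .box files are already in
Tesseract 4 format, as the reply requires.

```shell
#!/bin/sh
# Sketch: run the lstm.train config over every image that has a matching
# .box file. make_lstmf and DRY_RUN are hypothetical names, not part of
# Tesseract; only the tesseract invocation itself comes from the thread.

make_lstmf() {
    dir=${1:-.}
    for img in "$dir"/*.tif "$dir"/*.jpg; do
        [ -e "$img" ] || continue      # skip glob patterns that matched nothing
        base=${img%.*}                 # e.g. digits.f5.exp0
        if [ ! -e "$base.box" ]; then
            echo "skipping $img: no $base.box" >&2
            continue
        fi
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Only print the command that would be run.
            echo "tesseract $img $base lstm.train"
        else
            tesseract "$img" "$base" lstm.train
        fi
    done
}
```

Calling `DRY_RUN=1; make_lstmf /path/to/training/files` prints the commands
without running them, which is a cheap way to check that every image has its
box file before starting the real run.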


On Thu, Jun 14, 2018 at 11:05 AM chandra churh chatterjee <
chandrachurh.chatterje...@gmail.com> wrote:

> Can you tell me from which directory we have to run the command you
> suggested, and what its arguments should be if we are using our trained
> data, which contains the following files:
> -07-2016     12:45             11 digits.f4.exp0.txt
> -a----       08-07-2016     12:37            198 digits.f5.exp0.box
> -a----       08-07-2016     12:10          14044 digits.f5.exp0.jpg
> -a----       08-07-2016     12:45          16309 digits.f5.exp0.tr
> -a----       08-07-2016     12:45             11 digits.f5.exp0.txt
> -a----       08-07-2016     12:31            188 digits.f6.exp0.box
> -a----       23-06-2016     13:06           9824 digits.f6.exp0.jpg
> -a----       08-07-2016     12:45          17538 digits.f6.exp0.tr
> -a----       08-07-2016     12:45             11 digits.f6.exp0.txt
> -a----       08-07-2016     12:38            199 digits.f7.exp0.box
> -a----       08-07-2016     12:11          13178 digits.f7.exp0.jpg
> -a----       08-07-2016     12:45          16019 digits.f7.exp0.tr
> -a----       08-07-2016     12:45             11 digits.f7.exp0.txt
> -a----       08-07-2016     12:38            198 digits.f8.exp0.box
> -a----       23-06-2016     13:06           9485 digits.f8.exp0.jpg
> -a----       08-07-2016     12:45          17078 digits.f8.exp0.tr
> -a----       08-07-2016     12:45             11 digits.f8.exp0.txt
> -a----       08-07-2016     12:38            199 digits.f9.exp0.box
> -a----       08-07-2016     12:11          13411 digits.f9.exp0.jpg
> -a----       08-07-2016     12:45          15916 digits.f9.exp0.tr
> -a----       08-07-2016     12:45             11 digits.f9.exp0.txt
> -a----       08-07-2016     12:57            543 digits.font_properties
> -a----       08-07-2016     12:59         184521 digits.inttemp
> -a----       08-07-2016     13:00           4832 digits.normproto
> -a----       08-07-2016     12:59             84 digits.pffmtable
> -a----       08-07-2016     12:59           6520 digits.shapetable
> -a----       08-07-2016     13:01         196755 digits.traineddata
> -a----       08-07-2016     12:59            658 digits.unicharset
> -a----       08-07-2016     12:55            648 unicharset
>
> How do we convert these files, and from where do we run the command you
> suggested?
>
> On Wed, Jun 13, 2018 at 8:38 PM ShreeDevi Kumar <shreesh...@gmail.com>
> wrote:
>
>> If you have box/tiff pairs in Tesseract 4 format, you can generate the
>> .lstmf files by running
>>
>> tesseract lang.file.exp0.tif lang.file.exp0 lstm.train
>>
>> lstm.train is a config file.
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>>
>> On Wed, Jun 13, 2018 at 6:46 PM chandra churh chatterjee <
>> chandrachurh.chatterje...@gmail.com> wrote:
>>
>>> I have trained Tesseract 3 with 64 fonts using the respective box and .tr
>>> files, but now I want to use the same training data to train Tesseract 4
>>> after creating the starter traineddata, following the "Using tesstrain"
>>> instructions:
>>>
>>> The setup for running tesstrain.sh is the same as for base Tesseract.
>>> Use --linedata_only option for LSTM training. Note that it is
>>> beneficial to have more training text and make more pages though, as neural
>>> nets don't generalize as well and need to train on something similar to
>>> what they will be running on. If the target domain is severely limited,
>>> then all the dire warnings about needing a lot of training data may not
>>> apply, but the network specification may need to be changed.
>>>
>>> Training data is created using tesstrain.sh
>>> <https://github.com/tesseract-ocr/tesseract/blob/master/src/training/tesstrain.sh>
>>> as follows (note that your fonts location may vary):
>>>
>>> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>>>   --noextract_font_properties --langdata_dir ../langdata \
>>>   --tessdata_dir ./tessdata --output_dir ~/tesstutorial/engtrain
>>>
>>> The above command makes LSTM training data equivalent to the data used
>>> to train base Tesseract for English. For making a general-purpose
>>> LSTM-based OCR engine, it is woefully inadequate, but makes a good tutorial
>>> demo.
>>>
>>> Now try this to make eval data for the 'Impact' font:
>>>
>>> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>>>   --noextract_font_properties --langdata_dir ../langdata \
>>>   --tessdata_dir ./tessdata \
>>>   --fontlist "Impact Condensed" --output_dir ~/tesstutorial/engeval
>>>
>>>
>>>
>>> Now I want to proceed with the training using my previously trained
>>> data, but the problem is that it consists of .tr files and box files,
>>> whereas Tesseract 4 requires .lstmf files.
>>> Is there any solution?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/f3d6c64e-7763-478e-b047-a64edd032d99%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/f3d6c64e-7763-478e-b047-a64edd032d99%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>

