Best and fast are both from the same checkpoint.

You have to use --convert_to_int together with --stop_training to convert the
model from floating point to integer.

Please see
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#lstmtraining-command-line

for the exact syntax.
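
As a rough sketch of what those two invocations might look like, the Python
snippet below only builds the argument lists for lstmtraining; it does not run
them. The flag names (--stop_training, --convert_to_int, --continue_from,
--traineddata, --model_output) are the ones described on the wiki page above.
The helper name and all file paths are placeholders assumed for illustration,
not the ones I actually used.

```python
import shlex


def build_stop_training_cmd(checkpoint, start_traineddata, output, fast=False):
    """Build an lstmtraining command that turns a checkpoint into a traineddata file.

    fast=False keeps the floating-point model ("best").
    fast=True adds --convert_to_int to produce the integer model ("fast").
    """
    cmd = [
        "lstmtraining",
        "--stop_training",
        "--continue_from", checkpoint,
        "--traineddata", start_traineddata,
        "--model_output", output,
    ]
    if fast:
        # Insert the integer-conversion flag right after --stop_training.
        cmd.insert(2, "--convert_to_int")
    return cmd


# Placeholder paths (assumptions): adjust to where your training run wrote its files.
checkpoint = "output/digits_checkpoint"
starter = "train/eng/eng.traineddata"

# Both commands read the same checkpoint; only the output format differs.
best_cmd = build_stop_training_cmd(checkpoint, starter, "best/digits.traineddata")
fast_cmd = build_stop_training_cmd(checkpoint, starter, "fast/digits.traineddata", fast=True)

print(shlex.join(best_cmd))
print(shlex.join(fast_cmd))
```

The printed lines can be pasted into a shell once the paths match your system.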

Since the digits traineddata does not add any new characters, you will probably
need fewer iterations.

I created this traineddata in response to a post in the forum, using number
formats in the training text and a font similar to the one in the sample image
provided.
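
To illustrate the idea of a training text made of number formats, here is a
minimal Python sketch that generates lines of digits and signs. The particular
formats (optional sign, thousands separators, two decimals, percentages), the
function names, and the line counts are all assumptions chosen for the example;
they are not the formats in my own training text, so replace them with the
patterns that appear in your images.

```python
import random


def random_number_token(rng):
    """Return one number written in a randomly chosen (hypothetical) format."""
    sign = rng.choice(["", "+", "-"])
    whole = rng.randint(0, 999999)
    frac = rng.randint(0, 99)
    formats = [
        f"{sign}{whole}",                # plain integer, e.g. -4711
        f"{sign}{whole:,}",              # thousands separator
        f"{sign}{whole}.{frac:02d}",     # two decimal places
        f"{sign}{whole:,}.{frac:02d}",   # separator plus decimals
        f"{whole % 100}%",               # percentage
    ]
    return rng.choice(formats)


def make_training_text(n_lines, per_line=8, seed=0):
    """Build n_lines lines, each holding per_line space-separated numbers."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    lines = [
        " ".join(random_number_token(rng) for _ in range(per_line))
        for _ in range(n_lines)
    ]
    return "\n".join(lines) + "\n"


text = make_training_text(n_lines=5)
print(text)
```

The resulting text could be saved as the training text file in your modified
langdata directory before rendering it with the fonts you want to train on.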



On 04-Jan-2018 11:54 PM, "Thomas Menguy" <[email protected]> wrote:

> Thanks! Really great that you took the time, very much appreciated. With that
> level of information we'll be able to find our way :)
>
> For your set which fonts did you use? (You have a best and a fast one)
>
> Thanks again
> Thomas
>
> Sent from my iPhone
>
> On 4 Jan 2018, at 17:19, ShreeDevi Kumar <[email protected]> wrote:
>
> I am attaching a zip file.
>
> The files in langdata/eng are my modified version of training text and
> input files for punctuation and number formats. You can modify them further
> to match your requirements.
>
> I could not find a saved script with the command I used. Instead, please
> see the attached engtrain.sh, which was posted by one of the users in the
> forum. You will need to modify it based on the file locations on your system.
> If you know the font used in the images you need to OCR, you can train with
> just that font or similar fonts.
>
>
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Thu, Jan 4, 2018 at 7:23 PM, Thomas Menguy <[email protected]>
> wrote:
>
>> Thanks a lot. I had seen the tutorial but was a bit confused, as it is written
>> to "remove" characters so that only the digits are left, and I was not sure
>> which chars should be removed (the whole Unicode set minus the digits?).
>> Anyway, thanks again for the answer. It would be awesome if you could find
>> the command line again ;)
>> BR
>>
>> Sent from my iPhone
>>
>> On 4 Jan 2018, at 10:08, ShreeDevi Kumar <[email protected]> wrote:
>>
>> I will have to look for the exact commands and training text I used at
>> that time.
>>
>> You should be able to recreate the training by following the instructions
>> given at
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters
>>
>> I modified the English langdata files and then renamed the resulting
>> traineddata to digits after completing training.
>>
>> Create a training text which has digits and signs.
>>
>> Replace the word list to match the kind of number patterns you expect or
>> don't use a word list at all.
>>
>>
>>
>> On 04-Jan-2018 12:04 PM, "Thomas Menguy" <[email protected]> wrote:
>>
>> Hi Shree,
>>
>> I tried your data for digits and it works really well!
>> I need to build a training set with numbers and signs, for example. Could you
>> point me to how you created your own training data? (Sorry, I am fairly new to
>> Tesseract and have never trained it before.)
>>
>> Thanks for your help!
>> BR
>>
>> On Tuesday, October 3, 2017 at 6:39:30 PM UTC+2, shree wrote:
>>>
>>> You can try the plus-minus type of training if you just want a digits
>>> type of traineddata.
>>>
>>> Your training_text can contain numbers in the format you need and you
>>> can train with a font matching your images.
>>>
>>> For proof of concept you can try my experimental version at
>>>
>>> https://github.com/Shreeshrii/tessdata4alpha/blob/master/fast/digits.traineddata
>>>
>>> On Friday, September 29, 2017 at 12:32:41 PM UTC+5:30, John Miller wrote:
>>>>
>>>> Today, I found that the problem had been posted on
>>>> https://github.com/tesseract-ocr/tesseract/issues/751
>>>>
>>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/5f98dc8f-55e9-46dc-84b2-4ee1c7adc868%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
> <engtrain.zip>
>
>

