Best and fast are both from the same checkpoint. To convert the model from floating point to integer, run lstmtraining with --convert_to_int together with --stop_training.
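A minimal sketch of how the two variants could be produced from one checkpoint. The helper name and all file paths are hypothetical; only the lstmtraining flags (--stop_training, --convert_to_int, --continue_from, --traineddata, --model_output) are taken from the Tesseract 4 training documentation. It only builds and prints the command lines, it does not run them.

```python
import shlex


def build_convert_command(checkpoint, traineddata, output, to_int=True):
    """Build an lstmtraining command that turns a checkpoint into a traineddata file.

    The function name and argument names are illustrative, not part of Tesseract.
    """
    cmd = [
        "lstmtraining",
        "--stop_training",               # finish training and write the final model
        "--continue_from", checkpoint,   # the checkpoint to convert
        "--traineddata", traineddata,    # starter traineddata used during training
        "--model_output", output,        # where the resulting traineddata is written
    ]
    if to_int:
        # Integer conversion: a smaller and faster model than the float one.
        cmd.append("--convert_to_int")
    return cmd


# Hypothetical paths; adjust them to the locations on your system.
best_cmd = build_convert_command(
    "output/digits_checkpoint", "eng/eng.traineddata",
    "best/digits.traineddata", to_int=False)
fast_cmd = build_convert_command(
    "output/digits_checkpoint", "eng/eng.traineddata",
    "fast/digits.traineddata", to_int=True)

print(shlex.join(best_cmd))
print(shlex.join(fast_cmd))
```

Both commands read the same checkpoint; the only difference is the extra --convert_to_int flag for the integer model.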
Please see https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#lstmtraining-command-line for the exact syntax.

Since the digits traineddata does not add any characters, you will probably need fewer iterations. I created this traineddata in response to a post in the forum, using number formats in the training text and a font similar to the sample image provided.

On 04-Jan-2018 11:54 PM, "Thomas Menguy" wrote:

> Thanks! Really great that you took the time, very much appreciated. With
> that level of information we'll be able to find our way :)
>
> For your set, which fonts did you use? (You have a best and a fast one.)
>
> Thanks again
> Thomas
>
> On 4 Jan 2018, at 17:19, ShreeDevi Kumar wrote:
>
> I am attaching a zip file.
>
> The files in langdata/eng are my modified version of the training text and
> the input files for punctuation and number formats. You can modify them
> further to match your requirements.
>
> I could not find a saved script with the command I used. Instead, please
> see the attached engtrain.sh, which was posted by one of the users in the
> forum. You will need to modify it based on the file locations on your
> system. If you know the font used in the images you need to OCR, you can
> train with just that font or similar fonts.
>
> ShreeDevi
>
> On Thu, Jan 4, 2018 at 7:23 PM, Thomas Menguy wrote:
>
>> Thanks a lot. I have seen the tutorial but was a bit confused, as it is
>> written to "remove" characters so that only the digits are left, and I
>> was not sure which characters to remove (the whole of Unicode minus the
>> digits?).
>> Anyway, thanks again for the answer. It would be awesome if you could
>> find the command line again ;)
>> BR
>>
>> On 4 Jan 2018, at 10:08, ShreeDevi Kumar wrote:
>>
>> I will have to look for the exact commands and training text I used at
>> that time.
>>
>> You should be able to recreate the training by following the instructions
>> given at
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters
>>
>> I modified the English langdata files and then renamed the traineddata
>> to digits after completing training.
>>
>> Create a training text which has digits and signs.
>>
>> Replace the word list to match the kind of number patterns you expect, or
>> don't use a word list at all.
>>
>> On 04-Jan-2018 12:04 PM, "Thomas Menguy" wrote:
>>
>> Hi Shree,
>>
>> Tried your data for digits ... it really works well!
>> I need to build a training set with numbers and signs, for example. Could
>> you point me to how you made your own training data? (Sorry, I am fairly
>> new to Tesseract and have never trained it before.)
>>
>> Thanks for your help!
>> BR
>>
>> On Tuesday, October 3, 2017 at 6:39:30 PM UTC+2, shree wrote:
>>>
>>> You can try the plus-minus type of training if you just want a digits
>>> type of traineddata.
>>>
>>> Your training_text can contain numbers in the format you need, and you
>>> can train with a font matching your images.
>>>
>>> For a proof of concept you can try my experimental version at
>>>
>>> https://github.com/Shreeshrii/tessdata4alpha/blob/master/fast/digits.traineddata
>>>
>>> On Friday, September 29, 2017 at 12:32:41 PM UTC+5:30, John Miller wrote:
>>>>
>>>> Today I found that the problem had been posted at
>>>> https://github.com/tesseract-ocr/tesseract/issues/751
>
> <engtrain.zip>
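The advice in the thread to create a training text containing digits and signs, in the number formats you expect, could be sketched as below. This is only an illustration: the function name, the set of signs and the two number formats are assumptions, not the contents of the training text actually used for digits.traineddata.

```python
import random


def make_training_lines(count, seed=0):
    """Generate sample number strings for a digits-and-signs training text.

    The signs and formats chosen here are hypothetical; replace them with
    the patterns that occur in the images you want to recognize.
    """
    rng = random.Random(seed)        # fixed seed so the output is repeatable
    signs = ["", "+", "-", "±"]
    lines = []
    for _ in range(count):
        sign = rng.choice(signs)
        whole = rng.randint(0, 99999)
        if rng.random() < 0.5:
            # Thousands separator plus two decimal places, e.g. 12,345.67
            lines.append(f"{sign}{whole:,}.{rng.randint(0, 99):02d}")
        else:
            # Plain integer
            lines.append(f"{sign}{whole}")
    return lines


sample = make_training_lines(20)
print("\n".join(sample))
```

The printed lines can be saved as the training text and rendered with a font matching your images.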

