Thanks! Really great that you took the time, very much appreciated. With that level of information we'll be able to find our way :)
For your set, which fonts did you use? (You have a best and a fast one.)

Thanks again,
Thomas

Sent from my iPhone

> On 4 Jan 2018, at 17:19, ShreeDevi Kumar <[email protected]> wrote:
>
> I am attaching a zip file.
>
> The files in langdata/eng are my modified version of training text and input
> files for punctuation and number formats. You can modify them further to
> match your requirements.
>
> I could not find a saved script with the command I used. Instead please see
> attached engtrain.sh - it was posted by one of the users in the forum. You will
> need to modify it based on the file locations on your system. If you know the
> font used in the images you need to OCR, you can train with just that
> font/similar fonts.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
>> On Thu, Jan 4, 2018 at 7:23 PM, Thomas Menguy <[email protected]> wrote:
>> Thanks a lot. I have seen the tutorial but was a bit confused, as it is made
>> to "remove" characters to leave only the digits, and I was not sure which
>> chars should be removed ... (the whole Unicode minus the digits?) ...
>> Anyway thanks again for the answer ... it would be awesome if you could find
>> the command line again ;)
>> BR
>>
>> Sent from my iPhone
>>
>>> On 4 Jan 2018, at 10:08, ShreeDevi Kumar <[email protected]> wrote:
>>>
>>> I will have to look for the exact commands and training text I used at that
>>> time.
>>>
>>> You should be able to recreate the training by following instructions given
>>> at
>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters
>>>
>>> I had modified the english langdata files and then finally renamed the
>>> traineddata to digits after completing training.
>>>
>>> Create a training text which has digits and signs.
>>>
>>> Replace the word list to match the kind of number patterns you expect or
>>> don't use a word list at all.
>>>
>>> On 04-Jan-2018 12:04 PM, "Thomas Menguy" <[email protected]> wrote:
>>> Hi Shree,
>>>
>>> Tried your Data for digits ... really works well!
>>> Need to do a training set with number and signs for example ... could you
>>> point me on how you've done your own training data (sorry fairly new to
>>> Tesseract, never trained it before)
>>>
>>> Thanks for your help!
>>> BR
>>>
>>>> On Tuesday, October 3, 2017 at 6:39:30 PM UTC+2, shree wrote:
>>>> You can try the plus-minus type of training if you just want a digits type
>>>> of traineddata.
>>>>
>>>> Your training_text can contain numbers in the format you need and you can
>>>> train with a font matching your images.
>>>>
>>>> For proof of concept you can try my experimental version at
>>>>
>>>> https://github.com/Shreeshrii/tessdata4alpha/blob/master/fast/digits.traineddata
>>>>
>>>>> On Friday, September 29, 2017 at 12:32:41 PM UTC+5:30, John Miller wrote:
>>>>> Today, I found that the problem had been posted on
>>>>> https://github.com/tesseract-ocr/tesseract/issues/751
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/5f98dc8f-55e9-46dc-84b2-4ee1c7adc868%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
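For anyone following the thread: the advice quoted above ("Create a training text which has digits and signs", in "the format you need") can be sketched roughly as below. This is only an illustration, not the script or training text that was actually used. The sign and separator sets, the number ranges, and the names `make_number` and `make_training_lines` are all assumptions made up for this example; adapt them to the number patterns in your own images.

```python
import random
from typing import List

# Hypothetical character sets: adjust to the signs and formats you expect.
SIGNS = ["+", "-", ""]       # "" means an unsigned number
SEPARATORS = [".", ","]      # decimal / grouping marks seen in your data
DIGITS = "0123456789"


def make_number(rng: random.Random) -> str:
    """Return one number string, optionally signed and with a fractional part."""
    sign = rng.choice(SIGNS)
    whole = str(rng.randint(0, 99999))
    if rng.random() < 0.5:
        sep = rng.choice(SEPARATORS)
        frac = "".join(rng.choice(DIGITS) for _ in range(rng.randint(1, 3)))
        return f"{sign}{whole}{sep}{frac}"
    return f"{sign}{whole}"


def make_training_lines(n_lines: int, per_line: int = 8, seed: int = 0) -> List[str]:
    """Build n_lines lines of space-separated numbers (reproducible via seed)."""
    rng = random.Random(seed)
    return [
        " ".join(make_number(rng) for _ in range(per_line))
        for _ in range(n_lines)
    ]


if __name__ == "__main__":
    # Print a small sample; in practice you would write many more lines
    # into your training text file.
    for line in make_training_lines(3):
        print(line)
```

The generated lines would then replace (or be appended to) the training text in your copy of langdata/eng before running the training script with a font that matches your images.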

