Thanks! It's really great that you took the time, very much appreciated. With that 
level of information we'll be able to find our way :)

For your set, which fonts did you use? (You have a best and a fast one.)
 
Thanks again
Thomas

Sent from my iPhone

> On 4 Jan 2018, at 17:19, ShreeDevi Kumar <[email protected]> wrote:
> 
> I am attaching a zip file.
> 
> The files in langdata/eng are my modified version of training text and input 
> files for punctuation and number formats. You can modify them further to 
> match your requirements.
> 
> I could not find a saved script with the command I used. Instead, please see 
> the attached engtrain.sh - it was posted by one of the users in the forum. You 
> will need to modify it based on the file locations on your system. If you know 
> the font used in the images you need to OCR, you can train with just that 
> font or similar fonts.
> 
> 
> 
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
> 
>> On Thu, Jan 4, 2018 at 7:23 PM, Thomas Menguy <[email protected]> 
>> wrote:
>> Thanks a lot. I have seen the tutorial but was a bit confused, as it is about 
>> "removing" characters to leave only the digits, and I was not sure which chars 
>> should be removed (the whole of Unicode minus the digits?).
>> Anyway, thanks again for the answer. It would be awesome if you could find 
>> the command line again ;)
>> BR
>> 
>> Sent from my iPhone
>> 
>>> On 4 Jan 2018, at 10:08, ShreeDevi Kumar <[email protected]> wrote:
>>> 
>>> I will have to look for the exact commands and training text I used at that 
>>> time.
>>> 
>>> You should be able to recreate the training by following instructions given 
>>> at 
>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters
>>> 
>>> I modified the English langdata files and, after completing training, 
>>> renamed the traineddata to digits.
>>> 
>>> Create a training text which has digits and signs. 
>>> 
>>> Replace the word list to match the kind of number patterns you expect or 
>>> don't use a word list at all.
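As a rough illustration of the "training text which has digits and signs" step above, here is a minimal Python sketch that generates lines of signed integers and decimals. The token format, the line count, and the function names are assumptions made for this example, not the text used for the original digits traineddata; adjust them to the number patterns in your own images.

```python
import random


def make_number_line(rng, tokens_per_line=8):
    """Build one line of space-separated numeric tokens (hypothetical format)."""
    signs = ["", "+", "-"]
    tokens = []
    for _ in range(tokens_per_line):
        sign = rng.choice(signs)
        whole = str(rng.randint(0, 99999))
        # Roughly a third of the tokens get a two-digit decimal part.
        if rng.random() < 0.33:
            frac = "." + str(rng.randint(0, 99)).zfill(2)
        else:
            frac = ""
        tokens.append(sign + whole + frac)
    return " ".join(tokens)


def make_training_text(num_lines=1000, seed=0):
    """Return num_lines lines of numeric tokens, reproducible for a given seed."""
    rng = random.Random(seed)
    return "\n".join(make_number_line(rng) for _ in range(num_lines)) + "\n"


if __name__ == "__main__":
    # Print a small sample; redirect the output to your training text file.
    print(make_training_text(num_lines=3), end="")
```

The output only contains the characters 0-9, "+", "-", "." and whitespace, so it can replace the English training text when the goal is a digits-and-signs model.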
>>> 
>>> 
>>> 
>>> On 04-Jan-2018 12:04 PM, "Thomas Menguy" <[email protected]> wrote:
>>> Hi Shree, 
>>> 
>>> I tried your data for digits and it works really well!
>>> I need to build a training set with numbers and signs, for example. Could you 
>>> point me to how you made your own training data? (Sorry, I am fairly new to 
>>> Tesseract and have never trained it before.)
>>> 
>>> Thanks for your help!
>>> BR
>>> 
>>>> On Tuesday, October 3, 2017 at 6:39:30 PM UTC+2, shree wrote:
>>>> You can try the plus-minus type of training if you just want a digits type 
>>>> of traineddata.
>>>> 
>>>> Your training_text can contain numbers in the format you need and you can 
>>>> train with a font matching your images.
>>>> 
>>>> For proof of concept you can try my experimental version at 
>>>> 
>>>> https://github.com/Shreeshrii/tessdata4alpha/blob/master/fast/digits.traineddata
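To try the downloaded digits.traineddata, something like the following Python sketch could be used to call the tesseract command-line tool. The image path, output base, tessdata directory, and the choice of page segmentation mode 7 (single text line) are assumptions for illustration; the function names are hypothetical.

```python
import subprocess


def build_tesseract_command(image_path, output_base, tessdata_dir,
                            lang="digits", psm=7):
    """Return the argv list for a tesseract run with a custom traineddata.

    tessdata_dir is the directory that holds <lang>.traineddata.
    """
    return [
        "tesseract", image_path, output_base,
        "--tessdata-dir", tessdata_dir,
        "-l", lang,
        "--psm", str(psm),
    ]


def run_digits_ocr(image_path, output_base, tessdata_dir):
    """Run tesseract; the recognized text is written to <output_base>.txt."""
    command = build_tesseract_command(image_path, output_base, tessdata_dir)
    subprocess.run(command, check=True)
```

For example, run_digits_ocr("meter.png", "meter", "./tessdata") would write the result to meter.txt, assuming digits.traineddata was saved in ./tessdata and tesseract 4 is on the PATH.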
>>>> 
>>>>> On Friday, September 29, 2017 at 12:32:41 PM UTC+5:30, John Miller wrote:
>>>>> Today, I found that the problem had been posted at 
>>>>> https://github.com/tesseract-ocr/tesseract/issues/751
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/5f98dc8f-55e9-46dc-84b2-4ee1c7adc868%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>> 
>> 
> 
> <engtrain.zip>
