This is very helpful, thank you!

On Monday, July 6, 2015 at 3:17:50 PM UTC+2, shree wrote:
>
> You may also find it helpful to read Training Tesseract for Ancient Greek 
> OCR by Nick White -  http://ancientgreekocr.org/e29-a01.pdf 
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Mon, Jul 6, 2015 at 6:41 PM, ShreeDevi Kumar <[email protected]> wrote:
>
>> Please see https://github.com/tesseract-ocr/langdata/tree/master/lat
>>
>> which has the language data used for Latin. You can use this as the basis 
>> to create your own traineddata file for an old historical version of 
>> Latin.
>>
>> ShreeDevi
>>
>> On Mon, Jul 6, 2015 at 6:37 PM, Brennan Nunamaker <[email protected]> wrote:
>>
>>> I need to use my own trained data, because in the future we will be 
>>> using it on text that has no trained data, so we will have to generate it 
>>> ourselves. If I don't understand what I am doing wrong, I won't be able 
>>> to... 
>>>
>>> Thank you anyway
>>>
>>> On Monday, July 6, 2015 at 3:03:20 PM UTC+2, shree wrote:
>>>>
>>>> Did you try with the Latin traineddata?
>>>>
>>>> https://github.com/tesseract-ocr/tessdata/blob/master/lat.traineddata?raw=true
>>>>
>>>> ShreeDevi
>>>>
>>>> On Mon, Jul 6, 2015 at 5:46 PM, Brennan Nunamaker <[email protected]> 
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I just generated the traineddata file for an old historical version of 
>>>>> Latin text, but when I run Tesseract on the .tif that I used to train 
>>>>> it for the language (as well as on other sample images), it 
>>>>> returns an empty result. However, when I use the English language data 
>>>>> for recognition, it generates text with a few errors caused by some 
>>>>> specific characters it does not recognize. (This suggests that the fault 
>>>>> lies with the traineddata and not with the samples I am running it on.)
>>>>>
>>>>> Why could this be? I have been struggling to even generate the 
>>>>> traineddata, and ended up using a fairly short training text (see 
>>>>> attachment). Do I need to use a longer training text/tif?
>>>>>
>>>>> If anyone could point me in the right direction I would be extremely 
>>>>> grateful.
>>>>>
>>>>> Thanks in advance!
>>>>> -Brennan
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/29355c0a-deeb-4f65-a176-9abae60bcb9c%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>
>>
>

