You may also find it helpful to read "Training Tesseract for Ancient Greek
OCR" by Nick White: http://ancientgreekocr.org/e29-a01.pdf
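
As a quick check for the empty-output problem described in the quoted messages, it may help to run the same image through both the stock Latin data and your own traineddata and compare what comes out. Below is a minimal Python sketch that only builds and runs the tesseract command line. The image name sample.tif, the language code mylat and the ./tessdata directory are placeholders (assumptions, not names from this thread), and it assumes a tesseract build that accepts the --tessdata-dir option.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_tesseract_cmd(image: str, out_base: str, lang: str,
                        tessdata_dir: Optional[str] = None) -> List[str]:
    """Build: tesseract IMAGE OUTBASE [--tessdata-dir DIR] -l LANG."""
    cmd = ["tesseract", image, out_base]
    if tessdata_dir is not None:
        # Directory holding <lang>.traineddata (assumed layout).
        cmd += ["--tessdata-dir", tessdata_dir]
    cmd += ["-l", lang]
    return cmd


def run_ocr(image: str, lang: str, tessdata_dir: Optional[str] = None) -> str:
    """Run tesseract and return the recognized text."""
    out_base = f"{Path(image).stem}_{lang}"
    # check=True raises CalledProcessError if tesseract exits with an error,
    # for example when the requested traineddata cannot be loaded.
    subprocess.run(build_tesseract_cmd(image, out_base, lang, tessdata_dir),
                   check=True)
    # tesseract appends ".txt" to the output base name.
    return Path(out_base + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    image = "sample.tif"  # placeholder image name
    if shutil.which("tesseract") is None or not Path(image).exists():
        print("tesseract or the sample image is missing; nothing to run")
    else:
        # "lat" is the stock Latin data; "mylat" is a hypothetical custom model.
        for lang, data_dir in (("lat", None), ("mylat", "./tessdata")):
            text = run_ocr(image, lang, data_dir)
            print(f"{lang}: {len(text.strip())} characters recognized")
```

If the stock lat run yields text and the custom run yields nothing, that would point at the training step rather than at the image.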

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Mon, Jul 6, 2015 at 6:41 PM, ShreeDevi Kumar <[email protected]>
wrote:

> Please see https://github.com/tesseract-ocr/langdata/tree/master/lat,
> which has the language data used for Latin. You can use it as the basis
> for creating your own traineddata file for an old historical version of
> Latin.
>
> ShreeDevi
>
> On Mon, Jul 6, 2015 at 6:37 PM, Brennan Nunamaker <[email protected]>
> wrote:
>
>> I need to use my own trained data, because in the future we will be using
>> it on text that has no trained data, so we will have to generate it
>> ourselves. If I don't understand what I am doing wrong, I won't be able
>> to...
>>
>> Thank you anyway
>>
>> On Monday, July 6, 2015 at 3:03:20 PM UTC+2, shree wrote:
>>>
>>> Did you try with the Latin traineddata?
>>>
>>> https://github.com/tesseract-ocr/tessdata/blob/master/lat.traineddata?raw=true
>>>
>>> ShreeDevi
>>>
>>> On Mon, Jul 6, 2015 at 5:46 PM, Brennan Nunamaker <[email protected]>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I just generated the traineddata file for an old historical version of
>>>> Latin text, but when I run tesseract on the .tif that I used for
>>>> training (as well as on other sample images), it returns an empty
>>>> result. However, when I use the English language data for recognition,
>>>> it produces text with a few errors caused by some specific characters
>>>> it does not recognize. This suggests that the fault lies with the
>>>> traineddata and not with the samples I am running it on.
>>>>
>>>> Why could this be? I have been struggling to even generate the
>>>> traineddata, and ended up using a fairly short training text (see
>>>> attachment). Do I need to use a longer training text/tif?
>>>>
>>>> If anyone could point me in the right direction I would be extremely
>>>> grateful.
>>>>
>>>> Thanks in advance!
>>>> -Brennan
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/29355c0a-deeb-4f65-a176-9abae60bcb9c%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/29355c0a-deeb-4f65-a176-9abae60bcb9c%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>
>

