You may also want to look at the latest code and the tesstrain.sh script for
newer developments in training:
https://github.com/tesseract-ocr/tesseract/tree/master/training

Also see the release history on http://ancientgreekocr.org/, since Nick has
updated the software to follow changes in Tesseract. The article is older.
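
To narrow down an empty result, it can help to run the same image once with
the stock data and once with your own traineddata and compare the two. Below
is a minimal Python sketch of that check, not a definitive recipe. The helper
names are made up for illustration, and it assumes the tesseract binary is on
your PATH and writes its text to <outputbase>.txt. Only the -l and
--tessdata-dir options are used; how --tessdata-dir interprets the path has
varied between versions, so check your own build.

```python
import subprocess
from pathlib import Path


def build_tesseract_cmd(image, out_base, lang, tessdata_dir=None):
    """Build the argument list for one tesseract run (nothing is executed here)."""
    cmd = ["tesseract", str(image), str(out_base), "-l", lang]
    if tessdata_dir is not None:
        # Assumption: your build accepts --tessdata-dir to locate <lang>.traineddata.
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    return cmd


def is_empty_result(text):
    """True when the OCR output holds nothing but whitespace."""
    return text.strip() == ""


def run_ocr(image, out_base, lang, tessdata_dir=None):
    """Run tesseract and return the text it wrote to <out_base>.txt."""
    cmd = build_tesseract_cmd(image, out_base, lang, tessdata_dir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if tesseract fails
    return Path(f"{out_base}.txt").read_text(encoding="utf-8")
```

Calling run_ocr once with "eng" and once with your own language code on the
same .tif, then passing each result to is_empty_result, shows whether only
the custom traineddata produces nothing.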


ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Jul 6, 2015 at 7:30 PM, Brennan Nunamaker <[email protected]>
wrote:

> This is very helpful, thank you!
>
> On Monday, July 6, 2015 at 3:17:50 PM UTC+2, shree wrote:
>>
>> You may also find it helpful to read Training Tesseract for Ancient Greek
>> OCR by Nick White -  http://ancientgreekocr.org/e29-a01.pdf
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Mon, Jul 6, 2015 at 6:41 PM, ShreeDevi Kumar <[email protected]>
>> wrote:
>>
>>> Please see https://github.com/tesseract-ocr/langdata/tree/master/lat
>>>
>>> which has the language data used for Latin. You can use it as the
>>> basis for creating your own traineddata file for an older, historical
>>> form of Latin.
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Mon, Jul 6, 2015 at 6:37 PM, Brennan Nunamaker <[email protected]>
>>> wrote:
>>>
>>>> I need to use my own trained data, because in the future we will be
>>>> using it on text that has no trained data, so we will have to generate it
>>>> ourselves. If I don't understand what I am doing wrong, I won't be able
>>>> to...
>>>>
>>>> Thank you anyway
>>>>
>>>> On Monday, July 6, 2015 at 3:03:20 PM UTC+2, shree wrote:
>>>>>
>>>>> Did you try with the Latin traineddata?
>>>>>
>>>>>
>>>>> https://github.com/tesseract-ocr/tessdata/blob/master/lat.traineddata?raw=true
>>>>>
>>>>>
>>>>>
>>>>> ShreeDevi
>>>>> ____________________________________________________________
>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>
>>>>> On Mon, Jul 6, 2015 at 5:46 PM, Brennan Nunamaker <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I just generated the traineddata file for an old historical version
>>>>>> of Latin text, but when I run tesseract on the .tif that I used to train
>>>>>> tesseract for the language (as well as on other sample images), it
>>>>>> returns an empty result. However, when I use the English language for
>>>>>> classification, it generates text with a few errors due to a lack of
>>>>>> recognition for some specific characters (meaning that the fault lies
>>>>>> with the traineddata and not the samples I am running it on).
>>>>>>
>>>>>> Why could this be? I have been struggling to even generate the
>>>>>> traineddata, and ended up using a fairly short training text (see
>>>>>> attachment). Do I need to use a longer training text/tif?
>>>>>>
>>>>>> If anyone could point me in the right direction I would be extremely
>>>>>> grateful.
>>>>>>
>>>>>> Thanks in advance!
>>>>>> -Brennan
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/29355c0a-deeb-4f65-a176-9abae60bcb9c%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/29355c0a-deeb-4f65-a176-9abae60bcb9c%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>
>>>
>>>
>

