You may also find it helpful to read Training Tesseract for Ancient Greek OCR by Nick White - http://ancientgreekocr.org/e29-a01.pdf

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Jul 6, 2015 at 6:41 PM, ShreeDevi Kumar <[email protected]> wrote:

> Please see https://github.com/tesseract-ocr/langdata/tree/master/lat
>
> which has the language data used for latin. You can use this as the basis
> to create your own traineddata file for an old historical version of
> latin
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Mon, Jul 6, 2015 at 6:37 PM, Brennan Nunamaker <[email protected]>
> wrote:
>
>> I need to use my own trained data, because in the future we will be using
>> it on text that has no trained data, so we will have to generate it
>> ourselves. If I don't understand what I am doing wrong, I won't be able
>> to...
>>
>> Thank you anyway
>>
>> On Monday, July 6, 2015 at 3:03:20 PM UTC+2, shree wrote:
>>>
>>> Did you try with the Latin traineddata
>>>
>>> https://github.com/tesseract-ocr/tessdata/blob/master/lat.traineddata?raw=true
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Mon, Jul 6, 2015 at 5:46 PM, Brennan Nunamaker <[email protected]>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I just generated the traineddata file for an old historical version of
>>>> latin text, but when I run tesseract on the .tif that I used to train
>>>> tesseract for the language (as well as with other sample images), it
>>>> returns an empty result. However, when I use the English language for
>>>> classification, it generates text with a few errors due to a lack of
>>>> recognition for some specific characters. (Meaning that the fault lies with
>>>> the traineddata and not the samples I am running it on)
>>>>
>>>> Why could this be? I have been struggling to even generate the
>>>> traineddata, and ended up using a fairly short training text (see
>>>> attachment). Do I need to use a longer training text/tif?
>>>>
>>>> If anyone could point me in the right direction I would be extremely
>>>> grateful.
>>>>
>>>> Thanks in advance!
>>>> -Brennan

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduXiNzt3P%2Bi-Xp6E-tbMrzpewTPzfSyUhT4TTQtnwiiZTg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

