You may also want to see the latest code and the tesstrain.sh script for the newer developments in training at https://github.com/tesseract-ocr/tesseract/tree/master/training
Also see the release history on http://ancientgreekocr.org/, since Nick has updated the software to follow the changes in Tesseract; the article is older.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Jul 6, 2015 at 7:30 PM, Brennan Nunamaker wrote:

> This is very helpful, thank you!
>
> On Monday, July 6, 2015 at 3:17:50 PM UTC+2, shree wrote:
>>
>> You may also find it helpful to read "Training Tesseract for Ancient Greek
>> OCR" by Nick White: http://ancientgreekocr.org/e29-a01.pdf
>>
>> ShreeDevi
>>
>> On Mon, Jul 6, 2015 at 6:41 PM, ShreeDevi Kumar wrote:
>>
>>> Please see https://github.com/tesseract-ocr/langdata/tree/master/lat
>>> which has the language data used for Latin. You can use this as the
>>> basis to create your own traineddata file for an old historical version
>>> of Latin.
>>>
>>> ShreeDevi
>>>
>>> On Mon, Jul 6, 2015 at 6:37 PM, Brennan Nunamaker wrote:
>>>
>>>> I need to use my own trained data, because in the future we will be
>>>> using it on text that has no trained data, so we will have to generate it
>>>> ourselves. If I don't understand what I am doing wrong, I won't be able
>>>> to...
>>>>
>>>> Thank you anyway
>>>>
>>>> On Monday, July 6, 2015 at 3:03:20 PM UTC+2, shree wrote:
>>>>>
>>>>> Did you try with the Latin traineddata?
>>>>>
>>>>> https://github.com/tesseract-ocr/tessdata/blob/master/lat.traineddata?raw=true
>>>>>
>>>>> ShreeDevi
>>>>>
>>>>> On Mon, Jul 6, 2015 at 5:46 PM, Brennan Nunamaker wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I just generated the traineddata file for an old historical version
>>>>>> of Latin text, but when I run tesseract on the .tif that I used to train
>>>>>> tesseract for the language (as well as on other sample images), it
>>>>>> returns an empty result. However, when I use the English language for
>>>>>> classification, it generates text with a few errors due to a lack of
>>>>>> recognition for some specific characters. (Meaning that the fault lies
>>>>>> with the traineddata and not with the samples I am running it on.)
>>>>>>
>>>>>> Why could this be? I have been struggling to even generate the
>>>>>> traineddata, and ended up using a fairly short training text (see
>>>>>> attachment). Do I need to use a longer training text/tif?
>>>>>>
>>>>>> If anyone could point me in the right direction I would be extremely
>>>>>> grateful.
>>>>>>
>>>>>> Thanks in advance!
>>>>>> -Brennan

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/tesseract-ocr. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduUSNhZoqDqjJNpaU_sjZOz%3D_rUVmjZxe%2BG6DOak_rzBTg%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.

