This is very helpful, thank you!

On Monday, July 6, 2015 at 3:17:50 PM UTC+2, shree wrote:
>
> You may also find it helpful to read Training Tesseract for Ancient Greek
> OCR by Nick White - http://ancientgreekocr.org/e29-a01.pdf
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Mon, Jul 6, 2015 at 6:41 PM, ShreeDevi Kumar <[email protected]> wrote:
>
>> Please see https://github.com/tesseract-ocr/langdata/tree/master/lat
>>
>> which has the language data used for Latin. You can use this as the basis
>> to create your own traineddata file for an old historical version of
>> Latin.
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Mon, Jul 6, 2015 at 6:37 PM, Brennan Nunamaker <[email protected]> wrote:
>>
>>> I need to use my own trained data, because in the future we will be
>>> using it on text that has no trained data, so we will have to generate it
>>> ourselves. If I don't understand what I am doing wrong, I won't be able
>>> to...
>>>
>>> Thank you anyway
>>>
>>> On Monday, July 6, 2015 at 3:03:20 PM UTC+2, shree wrote:
>>>>
>>>> Did you try with the Latin traineddata?
>>>>
>>>> https://github.com/tesseract-ocr/tessdata/blob/master/lat.traineddata?raw=true
>>>>
>>>> ShreeDevi
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
>>>> On Mon, Jul 6, 2015 at 5:46 PM, Brennan Nunamaker <[email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I just generated the traineddata file for an old historical version of
>>>>> Latin text, but when I run tesseract on the .tif that I used to train
>>>>> tesseract for the language (as well as on other sample images), it
>>>>> returns an empty result. However, when I use the English language for
>>>>> classification, it generates text with a few errors due to a lack of
>>>>> recognition for some specific characters (meaning that the fault lies
>>>>> with the traineddata and not with the samples I am running it on).
>>>>>
>>>>> Why could this be? I have been struggling to even generate the
>>>>> traineddata, and ended up using a fairly short training text (see
>>>>> attachment). Do I need to use a longer training text/tif?
>>>>>
>>>>> If anyone could point me in the right direction I would be extremely
>>>>> grateful.
>>>>>
>>>>> Thanks in advance!
>>>>> -Brennan
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/29355c0a-deeb-4f65-a176-9abae60bcb9c%40googlegroups.com
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>

