For clarification: by "text", I meant languages.

On Monday, July 6, 2015 at 3:07:36 PM UTC+2, Brennan Nunamaker wrote:
>
> I need to use my own trained data, because in the future we will be using
> it on text that has no trained data, so we will have to generate it
> ourselves. If I don't understand what I am doing wrong, I won't be able
> to...
>
> Thank you anyway.
>
> On Monday, July 6, 2015 at 3:03:20 PM UTC+2, shree wrote:
>>
>> Did you try the Latin traineddata?
>>
>> https://github.com/tesseract-ocr/tessdata/blob/master/lat.traineddata?raw=true
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Mon, Jul 6, 2015 at 5:46 PM, Brennan Nunamaker <[email protected]>
>> wrote:
>>
>>> Hello,
>>>
>>> I just generated the traineddata file for an old historical version of
>>> Latin text, but when I run tesseract on the .tif that I used to train
>>> tesseract for the language (as well as on other sample images), it
>>> returns an empty result. However, when I use the English language for
>>> classification, it generates text with only a few errors, caused by it
>>> not recognizing some specific characters. (This means the fault lies
>>> with the traineddata and not with the samples I am running it on.)
>>>
>>> Why could this be? I have been struggling even to generate the
>>> traineddata, and ended up using a fairly short training text (see
>>> attachment). Do I need to use a longer training text/tif?
>>>
>>> If anyone could point me in the right direction, I would be extremely
>>> grateful.
>>>
>>> Thanks in advance!
>>> -Brennan
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/29355c0a-deeb-4f65-a176-9abae60bcb9c%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/29355c0a-deeb-4f65-a176-9abae60bcb9c%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>
>>

