Hi Martin,

Some things can indeed be done to improve the results for the upper word.
- Source image (inet009.jpg)
- Upscale by 5x. This is needed because the characters of your upper word are too small. (inet009_rs.jpg)
- Crop out your upper word: you need to help Tess with layout analysis. (inet009_rs_cr.jpg)
- Threshold: you need to help Tess with binarization.
    convert inet009_rs_cr.jpg -threshold 45% inet009_rs_cr_ts.jpg
  (inet009_rs_cr_ts.jpg)
- Call Tess. I don't know whether the Spanish traineddata contains the two-dotted "e" (ë), but the French one surely does. I used Tess compiled from source as of 2015-02-03. Perfect OCR result.
    tesseract inet009_rs_cr_ts.jpg inet009_rs_cr_ts.jpg -l fra
  (inet009_rs_cr_ts.jpg.txt)

For the lower word, just cropping it out is enough for it to be recognized normally.

Best regards,
Dmitri Silaev
www.CustomOCR.com

On Sat, May 2, 2015 at 2:01 AM, Martín Ochoa <[email protected]> wrote:

> Hi,
>
> I'm developing an app that will have to read text from an image in order to
> do some things that have nothing to do with my question. So I have that
> image and I want to read the text, but unfortunately it's not reading it
> right. I tried to do some image preprocessing but I didn't understand it
> since I'm new at this, and I don't know if I even have to do it.
> This is my output:
>
> Coürdow
> Abathur
>
> I've changed the language to spa, so it would read the "ë". But then I
> think the problem is that the word "Caërdagor" doesn't exist in any
> language since it's an invented name. Then again, "Abathur" doesn't exist
> either, but it's reading it OK.
> All the images that the app would read are the same as this, but obviously
> with different text. Any tips on how to improve this? Remember I'm a noob
> at this. Also, do you think it would be a good idea to "train" the language,
> adding these invented names as the app reads them?
>
>
> Thanks in advance.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/37bb44d3-5299-4576-ac31-57d68b901204%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAKzLxFNvpHb-2LppNKK_RPtDKzSFv62T8iaiUk7SrHPOsF0pGg%40mail.gmail.com.
Caërdagor
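Since all the images the app reads share the same layout, the upscale, crop, threshold and OCR steps from the reply could be scripted. Below is a minimal Python sketch, not the author's code, assuming ImageMagick 6 (`convert`) and the `tesseract` command-line tool are installed and on PATH. The reply gives no command for the 5x upscale, so `-resize 500%` is an assumption; `build_commands` and `run_pipeline` are hypothetical helper names, and the crop geometry in the usage comment is a made-up placeholder that has to be measured on the real, already upscaled images.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(src, crop, lang="fra", scale_pct=500, threshold_pct=45):
    """Return the command line of each step as an argument list (nothing is run)."""
    stem = Path(src).stem              # "inet009" for "inet009.jpg"
    rs = f"{stem}_rs.jpg"              # upscaled
    cr = f"{stem}_rs_cr.jpg"           # upscaled, then cropped
    ts = f"{stem}_rs_cr_ts.jpg"        # upscaled, cropped, then thresholded
    return [
        # Assumed upscale command (the reply only says "upscale by 5x")
        ["convert", src, "-resize", f"{scale_pct}%", rs],
        # -crop takes WxH+X+Y; +repage drops the leftover virtual-canvas offset
        ["convert", rs, "-crop", crop, "+repage", cr],
        ["convert", cr, "-threshold", f"{threshold_pct}%", ts],
        # tesseract appends ".txt" to the output base name
        ["tesseract", ts, ts, "-l", lang],
    ]


def run_pipeline(src, crop, **options):
    """Run every step (outputs land in the current directory) and return the OCR text."""
    commands = build_commands(src, crop, **options)
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on the first failing step
    ocr_base = commands[-1][2]           # output base name passed to tesseract
    return Path(ocr_base + ".txt").read_text(encoding="utf-8").strip()


if __name__ == "__main__":
    # Hypothetical usage: python ocr_pipeline.py inet009.jpg 1200x150+40+30
    if len(sys.argv) == 3:
        print(run_pipeline(sys.argv[1], sys.argv[2]))
    else:
        print("usage: ocr_pipeline.py IMAGE CROP_GEOMETRY", file=sys.stderr)
```

Building the argument lists separately from running them makes it easy to print or check the exact commands before touching any files. The sketch keeps the .jpg names from the reply; writing the thresholded image as PNG instead would avoid JPEG compression reintroducing gray pixels after binarization.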

