Hi Martin,

Some things can indeed be done to improve the results for the upper word.

- Source image
(inet009.jpg)

- Upscale by 5x. This is required because the characters of your upper
word are too small.
(inet009_rs.jpg)

- Crop out your upper word - you need to help Tess with layout analysis
(inet009_rs_cr.jpg)

- Threshold - you need to help Tess with binarization
>convert inet009_rs_cr.jpg -threshold 45% inet009_rs_cr_ts.jpg
(inet009_rs_cr_ts.jpg)

- Call Tess. I don't know whether the Spanish traineddata contains "ë"
(e with diaeresis), but the French one surely does. I used Tess compiled
from source as of 2015-02-03. The OCR result ("Caërdagor") is perfect.
>tesseract inet009_rs_cr_ts.jpg inet009_rs_cr_ts.jpg -l fra
(inet009_rs_cr_ts.jpg.txt)
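To make the image-side steps above less of a black box, here is a toy, stdlib-only sketch of what upscaling, cropping and thresholding do to an 8-bit grayscale image stored as a list of rows. The helper names are hypothetical and this is not ImageMagick's code: its -resize interpolates with a filter rather than copying pixels, while -threshold sets values above the limit to white and the rest to black, which is what threshold() mimics here.

```python
# Toy, stdlib-only sketch of the three preprocessing steps on an 8-bit
# grayscale image stored as a list of rows (0 = black, 255 = white).
# upscale/crop/threshold are illustrative helpers, not ImageMagick code.
from typing import List

Image = List[List[int]]


def upscale(img: Image, factor: int) -> Image:
    """Nearest-neighbour upscale: each pixel becomes a factor x factor block.
    (ImageMagick's -resize interpolates with a filter instead.)"""
    out: Image = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def crop(img: Image, x: int, y: int, w: int, h: int) -> Image:
    """Keep the w x h box whose top-left corner is (x, y), like -crop WxH+X+Y."""
    return [row[x:x + w] for row in img[y:y + h]]


def threshold(img: Image, percent: float) -> Image:
    """Global binarization: pixels above percent% of full scale become white."""
    limit = 255 * percent / 100
    return [[255 if px > limit else 0 for px in row] for row in img]


if __name__ == "__main__":
    page = [[200, 90], [120, 30]]  # made-up 2x2 "source image"
    big = upscale(page, 5)         # 10x10
    word = crop(big, 0, 0, 10, 5)  # top half, i.e. the original top row
    print(threshold(word, 45)[0])  # 200 -> 255 (white), 90 -> 0 (black)
```

A single global threshold works here because the word is a uniform colour on a fairly uniform background; with uneven lighting an adaptive method would be the next thing to try.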

For the lower word, just cropping it out is enough for it to be recognized
correctly.
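If you want your app to run these steps for both words, a minimal Python sketch could simply chain the same convert and tesseract calls. Assumptions: ImageMagick's convert and tesseract are on your PATH; the crop geometries below are made-up placeholders you would measure on inet009_rs.jpg; the "spa" language for the lower word is just your original setting; and build_commands, Word and the per-word file names are hypothetical. By default it only prints the commands; pass --run to execute them.

```python
import subprocess
import sys
from typing import List, NamedTuple, Optional


class Word(NamedTuple):
    name: str                     # only used to build file names
    geometry: str                 # ImageMagick crop geometry WxH+X+Y on the upscaled image
    lang: str                     # tesseract language code
    threshold_pct: Optional[int]  # None = skip binarization


def build_commands(src: str, words: List[Word]) -> List[List[str]]:
    """Return the convert/tesseract calls as argv lists, without running them."""
    base = src.rsplit(".", 1)[0]
    resized = f"{base}_rs.jpg"
    cmds = [["convert", src, "-resize", "500%", resized]]  # upscale once, 5x
    for word in words:
        image = f"{base}_rs_cr_{word.name}.jpg"
        cmds.append(["convert", resized, "-crop", word.geometry, "+repage", image])
        if word.threshold_pct is not None:
            binarized = f"{base}_rs_cr_{word.name}_ts.jpg"
            cmds.append(["convert", image, "-threshold", f"{word.threshold_pct}%", binarized])
            image = binarized
        # tesseract <image> <outputbase> -l <lang> writes <outputbase>.txt
        cmds.append(["tesseract", image, image, "-l", word.lang])
    return cmds


# Hypothetical crop boxes: measure the real ones on inet009_rs.jpg.
WORDS = [
    Word("upper", "900x160+40+30", "fra", 45),     # needs binarization
    Word("lower", "700x140+40+220", "spa", None),  # cropping alone was enough
]

if __name__ == "__main__":
    execute = "--run" in sys.argv[1:]  # default is a dry run that only prints
    for argv in build_commands("inet009.jpg", WORDS):
        print(" ".join(argv))
        if execute:
            subprocess.run(argv, check=True)  # stop at the first failing step
```

Building the argv lists separately from running them makes it easy to check the exact commands before touching any files.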

Best regards,
Dmitri Silaev
www.CustomOCR.com

On Sat, May 2, 2015 at 2:01 AM, Martín Ochoa <[email protected]> wrote:

> Hi,
> I'm developing an app that has to read text from an image in order to
> do some things that have nothing to do with my question. I have this
> image and I want to read the text, but unfortunately it isn't read
> correctly. I tried some image preprocessing, but I didn't understand it
> since I'm new at this, and I don't know whether I even need to do it.
> This is my output:
>
> Coürdow
> Abathur
>
> I've changed the language to spa so that it would read the "ë". But I
> think the problem is that the word "Caërdagor" doesn't exist in any
> language, since it's an invented name. Then again, "Abathur" doesn't exist
> either, but it is read correctly.
> All the images the app will read look like this one, but obviously
> with different text. Any tips on how to improve this? Remember I'm a noob
> at this. Also, do you think it would be a good idea to "train" the language
> by adding these invented names as the app reads them?
>
>
> Thanks in advance.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/37bb44d3-5299-4576-ac31-57d68b901204%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/37bb44d3-5299-4576-ac31-57d68b901204%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

