OCR a test image with your app and store the result in a text file. Then OCR
the same image with the tesseract executable (it writes its output to a text
file by default) and compare the results.
If the output from the tesseract executable is OK but the output from your app
is wrong (e.g. it contains only ASCII letters), the problem is within your app
(e.g. it does not handle Unicode strings correctly).
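
A minimal sketch of that check on the Java side, assuming the app loses the
ligature when it converts the OCR string to bytes. The class and method names
(OcrEncodingCheck, encodeAndDecode, saveUtf8, hasNonAscii) are made up for
illustration, and the literal string stands in for whatever doOCR() returns;
only the JDK is used here, not Tess4J:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class OcrEncodingCheck {

    // Encode the text with the given charset and decode it again.
    // Characters the charset cannot represent come back as '?'.
    static String encodeAndDecode(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // Write the OCR text as UTF-8 explicitly instead of relying on
    // the platform default charset.
    static void saveUtf8(Path target, String ocrText) throws IOException {
        Files.write(target, ocrText.getBytes(StandardCharsets.UTF_8));
    }

    // True if the text contains at least one character outside ASCII.
    static boolean hasNonAscii(String text) {
        return text.chars().anyMatch(c -> c > 127);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by doOCR(); \uFB02 is the "fl" ligature.
        String ocrText = "\uFB02uidi";

        // An ASCII round trip reproduces the symptom from the report.
        System.out.println(encodeAndDecode(ocrText, StandardCharsets.US_ASCII)); // prints ?uidi

        // A UTF-8 round trip keeps the ligature.
        System.out.println(encodeAndDecode(ocrText, StandardCharsets.UTF_8).equals(ocrText)); // prints true

        // Save the app result as UTF-8 so it can be compared with the
        // text file written by the tesseract executable.
        Path appOutput = Files.createTempFile("app-ocr", ".txt");
        saveUtf8(appOutput, ocrText);
        System.out.println(hasNonAscii(ocrText)); // prints true
    }
}
```

For the other half of the comparison, something like "tesseract image.png out -l ita"
should write out.txt (the file names are placeholders), which can then be
compared with the file saved by the app.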

Zdenko

On Fri, Oct 17, 2014 at 12:07 PM, Salvo Piazza <[email protected]>
wrote:

> Hi Zdenko,
> thanks for your response.
>
> I'm only a beginner with tesseract, so can you tell me how I can
> check it? (I use a Linux version of tesseract...)
>
> Thanks,
> Salvo.
>
>
> On Thursday, October 16, 2014 at 21:46:31 UTC+2, zdenop wrote:
>>
>> fl is recognized as a ligature in English, so there could be the same issue
>> in Italian. If it is replaced with '?', I would guess you have a problem with
>> Unicode... Can you check it with the tesseract executable?
>>
>> Zdenko
>>
>> On Thu, Oct 16, 2014 at 10:18 AM, Salvo Piazza <
>> [email protected]> wrote:
>>
>>> Hi all,
>>> I've written a small, simple program to extract text from an image with
>>> tesseract 3.0.2:
>>>
>>> Tesseract instance = Tesseract.getInstance();
>>> instance.setDatapath(currentDir);
>>> instance.setLanguage("ita");
>>> String returner = instance.doOCR(new File(filename));
>>>
>>> It works fine, but there are many question mark characters '?' in the
>>> extracted text.
>>>
>>> For example, the word *fluidi* is recognized as *?uidi*, and there are
>>> many more examples...
>>>
>>> Does anyone have any tips for fixing this behaviour?
>>>
>>> Thanks in advance,
>>> Salvo.
>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/tesseract-ocr/dc3dc154-fc24-48d8-8f5e-4a1df7f36282%
>>> 40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/dc3dc154-fc24-48d8-8f5e-4a1df7f36282%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

