Re: [tesseract-ocr] Many 'question mark' chars in recognized text

2014-10-17 Thread Salvo Piazza
Hi Zdenko, thanks for your response. I'm only a beginner with tesseract, so can you tell me how I can check it? (I use a Linux version of tesseract...) Thanks, Salvo. On Thursday, 16 October 2014 21:46:31 UTC+2, zdenop wrote: fl is recognized as a ligature in English, so there

Re: [tesseract-ocr] Many 'question mark' chars in recognized text

2014-10-17 Thread Rick Leir
On Linux, try YAGF; it is a GUI front end for Tesseract. As zdenop said, you have a Unicode problem. You need to use UTF-8 for strings. On Friday, October 17, 2014 6:07:26 AM UTC-4, Salvo Piazza wrote: Hi Zdenko, thanks for your response. I know tesseract at very beginning level, so can you
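A minimal sketch of how a charset mismatch can turn a recognized ligature into a question mark, assuming the text passes through a non-Unicode encoding somewhere in the app. The class and method names are hypothetical; only standard JDK calls are used, and nothing here is taken from the original poster's program.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingRoundTrip {
    // Encode the text to bytes in the given charset, then decode it again.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for characters it cannot represent; for US-ASCII that byte is '?'.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // U+FB02 is LATIN SMALL LIGATURE FL, followed by plain "ower".
        String recognized = "\uFB02ower";

        // ASCII cannot represent the ligature, so it comes back as '?'.
        System.out.println(roundTrip(recognized, StandardCharsets.US_ASCII)); // prints "?ower"

        // UTF-8 can represent it, so the text survives unchanged.
        System.out.println(roundTrip(recognized, StandardCharsets.UTF_8).equals(recognized)); // prints "true"
    }
}
```

If the question marks appear only where the image contains ligatures or accented letters, a lossy conversion like the ASCII case above is a likely suspect.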

Re: [tesseract-ocr] Many 'question mark' chars in recognized text

2014-10-17 Thread zdenko podobny
OCR a test image with your app and store the result in a text file. Then OCR the same image with the tesseract executable (its output should go to a text file by default) and compare the results. If the output from the tesseract executable is OK, but the output from your app is wrong (e.g. there are only ASCII letters), then you have a problem
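The comparison described above could be sketched roughly as follows, assuming both results were saved as UTF-8 text files. The class name, helper names, and the two file paths passed on the command line are hypothetical; only standard JDK calls are used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class OcrOutputCompare {
    // Index of the first position where the two strings differ, or -1 if they are identical.
    static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // Same prefix: identical if lengths match, otherwise they differ where the shorter one ends.
        return a.length() == b.length() ? -1 : n;
    }

    // True if every character is in the 7-bit ASCII range.
    static boolean isAsciiOnly(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    // Read a whole file, decoding it as UTF-8.
    static String readUtf8(String path) throws IOException {
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: OcrOutputCompare <app-output.txt> <tesseract-output.txt>");
            return;
        }
        String fromApp = readUtf8(args[0]);
        String fromExecutable = readUtf8(args[1]);

        System.out.println("first difference at index: " + firstDifference(fromApp, fromExecutable));
        System.out.println("app output is ASCII only: " + isAsciiOnly(fromApp));
        System.out.println("executable output is ASCII only: " + isAsciiOnly(fromExecutable));
    }
}
```

If the executable's file contains non-ASCII characters while the app's file is ASCII only and the first difference falls on a question mark, that points at the app's string handling and not at the recognition itself.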

[tesseract-ocr] Many 'question mark' chars in recognized text

2014-10-16 Thread Salvo Piazza
Hi all, I've written a simple little program to extract text from an image with tesseract 3.0.2 as: Tesseract instance = Tesseract.getInstance(); instance.setDatapath(currentDir); instance.setLanguage("ita"); String returner = instance.doOCR(new File(filename)); It works fine but I've many question

Re: [tesseract-ocr] Many 'question mark' chars in recognized text

2014-10-16 Thread Greg Dunkel
Many OCR programs have trouble with ligatures. On Oct 16, 2014 11:21 AM, Salvo Piazza s.pia...@tsc-consulting.com wrote: Hi all, I've written a little simple program to extract text from image with tesseract 3.0.2 as: Tesseract instance = Tesseract.getInstance();