Many OCR programs have trouble with ligatures, such as the "fl" at the start of "fluidi".

On Oct 16, 2014 11:21 AM, "Salvo Piazza" <s.pia...@tsc-consulting.com> wrote:
> Hi all,
>
> I've written a simple little program to extract text from images with
> tesseract 3.0.2:
>
> Tesseract instance = Tesseract.getInstance();
> instance.setDatapath(currentDir);
> instance.setLanguage("ita");
> String returner = instance.doOCR(new File(filename));
>
> It works fine, but I get many question-mark characters '?' in the
> extracted text.
>
> For example, the word *fluidi* is recognized as *?uidi*, and there are
> many more examples like this...
>
> Does anyone know any tips to fix this behaviour?
>
> Thanks in advance,
> Salvo.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/dc3dc154-fc24-48d8-8f5e-4a1df7f36282%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
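One possible workaround for the ligature problem, sketched under assumptions: if the '?' appears because the recognized text contains a ligature codepoint such as U+FB02 (the single "fl" glyph) and the console or output file uses an encoding that cannot represent it, Java will typically substitute '?' when writing. In that case the string returned by doOCR can be post-processed with the JDK's java.text.Normalizer, whose NFKC form expands compatibility ligatures into their component letters. This is plain Java, not a Tesseract or Tess4J option, and it will not help if the engine itself emitted a literal '?'. The class and method names below are hypothetical.

```java
import java.text.Normalizer;

public class LigatureFix {

    // Hypothetical helper: expands typographic ligatures (e.g. U+FB01 "fi",
    // U+FB02 "fl") into separate letters via Unicode NFKC normalization.
    // Assumes the OCR output actually contains the ligature codepoints.
    static String expandLigatures(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Simulated OCR result: the "fl" ligature codepoint followed by "uidi".
        String ocrOutput = "\uFB02uidi";
        System.out.println(expandLigatures(ocrOutput)); // prints fluidi
    }
}
```

Note that NFKC also folds other compatibility characters (for example superscript digits and full-width forms), so apply it only if that is acceptable for your text. Alternatively, writing the output as UTF-8 instead of the platform default encoding may be enough to keep the ligature visible instead of a '?'.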