I have a scanned bank statement, printed in a sans-serif font. Using gocr, the only problem I have is '1' being recognized as 'I'. ocrad is a lot worse, but still useful. With tesseract, my results for the same file are complete gibberish.
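As a stopgap for the gocr output, the '1'/'I' confusion could be patched after recognition rather than in the OCR engine itself. The sketch below is only an illustration in Python, not a feature of gocr: `fix_ones` and `NUMERIC_TOKEN` are hypothetical names, and the rule it applies (a token made up only of digits, 'I', and amount or date punctuation, with at least one real digit, is treated as a number) is an assumption about how figures appear on the statement.

```python
import re

# Hypothetical rule, not part of gocr: a token counts as numeric if it
# contains only digits, 'I', and common amount/date punctuation, and
# includes at least one real digit.
NUMERIC_TOKEN = re.compile(r"^[0-9I.,$/-]*[0-9][0-9I.,$/-]*$")


def fix_ones(text: str) -> str:
    """Replace 'I' with '1' inside tokens that look numeric."""

    def fix(match: re.Match) -> str:
        token = match.group(0)
        if NUMERIC_TOKEN.match(token):
            return token.replace("I", "1")
        return token  # leave ordinary words (and a lone 'I') untouched

    # Process each whitespace-delimited token; whitespace is preserved.
    return re.sub(r"\S+", fix, text)


print(fix_ones("INTEREST PAID I2.50 on 0I/I5, I think"))
# prints: INTEREST PAID 12.50 on 01/15, I think
```

A token with no real digit in it, such as "II", is left alone because it cannot be told apart from ordinary text, so amounts like 11 would still need checking by hand.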
The scan is very high resolution and very high contrast. I can't show it, as it contains my bank statements. Is there some kind of guide for tuning the tool? At this point I'm trying tesseract to see whether it recognizes the '1's better, since the numbers are what matter, but at this stage the output is useless. The text is in English, by the way. Here's an excerpt of the output; I think it's safe to paste, as it seems to contain nothing intelligible.

-----------------------------------------------------
F’I?IE`\!I()L.IS ST4¤n.TIEI**‘IEI\IT . 6 I)IEF’()SITS 4¤n.I\II) ()TI—IE| 51 (ZI—IE(ZI(S 4¤n.I\II) ()TI—IEI2 I IINTEIQEST F’4¤n.II) TI—IIS F’| SEI?\!I(ZE (ZI—I4¤n.I2(5E 4¤n.I**‘I()L.II\I` (ZLJIQIQEINT ]B4¤n.I.4¤n.I\I(.TIE 4¤n.S (III I\IL.II**‘IZBIEI2 (III: I)4¤n.‘¤’S II\I ST. 4¤n.I\II\IL.I4¤n.I. F’IEI?(.TEI\IT4¤¤.l 4¤n.\!EI24¤n.(5E I)4¤n.II.‘¤’ IB. IINTEIQEST F’4¤n.II) ‘¤’| ]D4¤n.TE 4¤n.I**‘I()L.II\I
-----------------------------------------------------

