That sounds very much like this: http://groups.google.com/group/tesseract-ocr/browse_thread/thread/bc687b07cac549ed?hl=en
On Jan 15, 9:17 pm, Arthur Pemberton <[email protected]> wrote:
> I have a scanned bank statement, printed with a sans-serif font. Using
> gocr, the only problem I have is '1' being recognized as 'I'. ocrad is
> a lot worse, but still useful. My results with the same file are
> complete gibberish with tesseract.
>
> The file is very high resolution, very high contrast. I can't show it
> as it contains my bank statements.
>
> Is there some kind of guide for tuning the tool? At this point I'm
> trying it to see if it recognizes the '1's better, as the numbers are
> of importance. But at this stage, the output is useless. English
> language, by the way.
>
> Here's an excerpt of the output. I think it's safe to paste, as it seems
> to contain nothing intelligible.
>
> -----------------------------------------------------
> F’I?IE`\!I()L.IS ST4¤n.TIEI**‘IEI\IT .
> 6 I)IEF’()SITS 4¤n.I\II) ()TI—IE|
> 51 (ZI—IE(ZI(S 4¤n.I\II) ()TI—IEI2 I
> IINTEIQEST F’4¤n.II) TI—IIS F’|
> SEI?\!I(ZE (ZI—I4¤n.I2(5E 4¤n.I**‘I()L.II\I`
> (ZLJIQIQEINT ]B4¤n.I.4¤n.I\I(.TIE 4¤n.S (III
> I\IL.II**‘IZBIEI2 (III: I)4¤n.‘¤’S II\I ST.
> 4¤n.I\II\IL.I4¤n.I. F’IEI?(.TEI\IT4¤¤.l
> 4¤n.\!EI24¤n.(5E I)4¤n.II.‘¤’ IB.
> IINTEIQEST F’4¤n.II) ‘¤’|
> ]D4¤n.TE 4¤n.I**‘I()L.II\I
> -----------------------------------------------------

You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

