That sounds very much like this:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/bc687b07cac549ed?hl=en

On Jan 15, 9:17 pm, Arthur Pemberton <[email protected]> wrote:
> I have a scanned bank statement, printed with a sans-serif font. Using
> gocr, the only problem I have is '1' being recognized as 'I'. ocrad is
> a lot worse, but still useful. My results with the same file are
> complete gibberish with tesseract.
>
> The file is very high resolution, very high contrast. I can't show it
> as it contains my bank statements.
>
> Is there some kind of guide for tuning the tool? At this point I'm
> trying it to see if it recognizes the '1's better as the numbers are
> of importance. But at this stage, the output is useless. English
> language by the way.
>
> Here's an excerpt of the output. I think it's safe to paste as it seems
> to contain nothing intelligible.
>
> -----------------------------------------------------
> F’I?IE`\!I()L.IS ST4¤n.TIEI**‘IEI\IT .
> 6 I)IEF’()SITS 4¤n.I\II) ()TI—IE|
> 51 (ZI—IE(ZI(S 4¤n.I\II) ()TI—IEI2 I
> IINTEIQEST F’4¤n.II) TI—IIS F’|
> SEI?\!I(ZE (ZI—I4¤n.I2(5E 4¤n.I**‘I()L.II\I`
> (ZLJIQIQEINT ]B4¤n.I.4¤n.I\I(.TIE 4¤n.S (III
> I\IL.II**‘IZBIEI2 (III: I)4¤n.‘¤’S II\I ST.
> 4¤n.I\II\IL.I4¤n.I. F’IEI?(.TEI\IT4¤¤.l
> 4¤n.\!EI24¤n.(5E I)4¤n.II.‘¤’ IB.
> IINTEIQEST F’4¤n.II) ‘¤’|
> ]D4¤n.TE 4¤n.I**‘I()L.II\I
> -----------------------------------------------------
