Well, basically what I do is this: if the input has a gradient background, I create the black-and-white image myself using a fixed threshold; if the input is blurry, I sharpen it; and if the input is too small, I zoom and then sharpen. So nothing special, but it helps me get good results for my purpose.
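
The three steps described above (fixed-threshold binarization, sharpening, zooming) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code actually used: the image is a plain list of grayscale rows so the sketch runs without any imaging library, and the threshold of 128, the zoom factor of 2, the `min_height` cutoff, and all function names are hypothetical choices.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def binarize(img: Image, threshold: int = 128) -> Image:
    """Fixed threshold: pixels >= threshold become white (255), the rest black (0)."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


def zoom(img: Image, factor: int = 2) -> Image:
    """Nearest-neighbour upscaling by an integer factor."""
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def sharpen(img: Image) -> Image:
    """3x3 sharpening kernel (centre 5, four neighbours -1), borders replicated."""
    h, w = len(img), len(img[0])

    def at(y: int, x: int) -> int:
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            v = 5 * at(y, x) - at(y - 1, x) - at(y + 1, x) - at(y, x - 1) - at(y, x + 1)
            row.append(min(max(v, 0), 255))  # keep the result in 0..255
        out.append(row)
    return out


def preprocess_for_ocr(img: Image, blurry: bool = False,
                       min_height: int = 40, threshold: int = 128) -> Image:
    """Zoom small inputs, sharpen blurry or zoomed inputs, then binarize."""
    if len(img) < min_height:  # "too small" cutoff is an assumed value
        img = zoom(img, 2)
        blurry = True          # small inputs get zoom followed by sharpen
    if blurry:
        img = sharpen(img)
    return binarize(img, threshold)
```

The binarized result would then be handed to Tesseract in place of the original image. A real pipeline would normally do the same operations with an imaging library rather than nested lists.
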
Mike

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of mw18888
Sent: Wednesday, July 6, 2011 15:57
To: tesseract-ocr
Subject: Re: Teseract vs Abbyy

Mike and Patrick,

Thank you for the comments. Mike, can you clarify what you mean by "preprocessing well"?

Regards,
mw18888

On Jul 6, 7:07 am, "Lutz, Michael" <[email protected]> wrote:
> If you are referring to http://www.abbyyusa.com/, then I think the biggest
> difference is that Tesseract is open source and ABBYY is not :).
> So with ABBYY you pay for the image preprocessing and with Tesseract you do not.
> I totally agree with Patrick: if you do the preprocessing well, I always
> get perfect results with Tesseract, but I have never tried ABBYY.
>
> Mike
>
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Patrick Questembert
> Sent: Wednesday, July 6, 2011 12:54
> To: [email protected]
> Subject: Re: Teseract vs Abbyy
>
> It's really a long list of approaches, including:
> - spacing: we don't trust any spacing determination by Tesseract. We
>   reevaluate every space indicated by Tesseract for possible elimination and
>   consider every pair of adjacent letters for a possible space insertion
> - obvious mistakes: this is by far the largest category of corrections we
>   make. For example, VV is usually corrected back to W, but there are
>   hundreds more cases
> - ambiguous letters such as i versus l: surprisingly, Tesseract makes a ton
>   of incongruous mistakes that lead me to believe there is no feature
>   analysis whatsoever. For example, a 'y' may get mapped to 'g', even though
>   there is 0% chance of that based on a wide open gap on top. For these
>   types of mistakes we go back to the source image to apply our own OCR of
>   sorts.
> - dictionaries: another big disappointment. From our testing we found that
>   Tesseract applies the dictionary in less than 5% of the cases where it
>   should (i.e. where the letter mistake is one listed in the ambigs files,
>   with the correct spelling in the user dictionary), so we implemented our
>   own dictionaries
> - pattern matching: the regular expressions we use include wide tolerance
>   for mistakes. Under the "protection" of a regular expression for a
>   specific pattern we have the flexibility to include hundreds of
>   ambiguities (because these trigger only when they help complete a match,
>   which makes it more likely to be a valid substitution)
>
> Patrick
>
> On Mon, Jul 4, 2011 at 12:56 AM, Andres <[email protected]> wrote:
>
> Hello Patrick,
>
> Could you elaborate a little on what you mean by Tesseract heuristics?
>
> Thanks,
>
> Andres
>
> 2011/7/3 patrickq <[email protected]>:
> The answer is (of course) "it depends":
> 1. If you compare Tesseract and ABBYY on the same image, without applying
>    preprocessing to it, ABBYY wins (because Tesseract's image processing
>    is very rudimentary at best). Of course, if your test images are
>    produced (for example) by a flatbed scanner, the lack of image
>    processing is not an issue; refer to case 2 below.
> 2. If you compare Tesseract and ABBYY on a clean (processed) image,
>    without applying any post-Tesseract heuristic, ABBYY may have an
>    advantage.
> 3. However, if you compare Tesseract + image processing + heuristics &
>    corrections, Tesseract actually beats ABBYY hands down.
>
> ScanBizCards is case #3 around Tesseract 3.01. If you want to test
> this combo please do this:
> - go to http://www.scanbizcards.com/webdemo
> - upload an image (under Batch Actions). Warning: ScanBizCards is
>   geared towards recognizing text on business cards, so it would be best
>   if you tested on something *like* a business card (sparse text), not a
>   full page with lots of text
> - click that image, then "Image Editor" on top, and OCR it
> - when done testing please delete the test images from this demo
>   account (or get your own online account) ...
>
> You can also test instead on your Android or iPhone mobile device by
> installing the free version of ScanBizCards. ABBYY powers two iPhone
> apps made by German company - Business Card Reader (by Shape Services)
> and Card Reader (by xRoot Software) - and of course ABBYY's own
> iPhone / Android business card reader app.
>
> Patrick
>
> On Jul 3, 10:10 am, mw18888 <[email protected]> wrote:
> > Can anyone comment on the accuracy of Tesseract vs Abbyy?
> >
> > Regards,
> > mw18888

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

This message is confidential and intended only for the addressee. If you have received this message in error, please immediately notify the [email protected] and delete it from your system as well as any copies. The content of e-mails as well as traffic data may be monitored by NDS for employment and security purposes. To protect the environment please do not print this e-mail unless necessary.
An NDS Group Limited company. www.nds.com
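
As a rough illustration of two of the post-OCR heuristics Patrick lists in the quoted message ("obvious mistakes" such as VV for W, and ambiguity substitutions applied only under the "protection" of a regular expression), here is a minimal Python sketch. It is not ScanBizCards code: the ambiguity table, the phone-number pattern, the dictionary check guarding the VV fix, and the function names are all assumptions made for the example.

```python
import re
from typing import Dict, Pattern, Set

# Hypothetical ambiguity table: letters that OCR often confuses with digits.
AMBIGS: Dict[str, str] = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}

# Hypothetical pattern for a phone-like token: optional '+', digits, dashes, spaces.
PHONE: Pattern[str] = re.compile(r"\+?\d[\d\- ]{5,}\d")


def fix_under_pattern(token: str, pattern: Pattern[str] = PHONE,
                      ambigs: Dict[str, str] = AMBIGS) -> str:
    """Apply ambiguity substitutions only if they make the token match the pattern."""
    if pattern.fullmatch(token):
        return token  # already a valid match, nothing to fix
    candidate = "".join(ambigs.get(ch, ch) for ch in token)
    # The substitution is accepted only when it completes a match.
    return candidate if pattern.fullmatch(candidate) else token


def fix_vv(word: str, dictionary: Set[str]) -> str:
    """Turn VV back into W when that yields a known word (dictionary check is an assumption)."""
    if word.lower() in dictionary:
        return word
    candidate = word.replace("VV", "W").replace("vv", "w")
    return candidate if candidate.lower() in dictionary else word


print(fix_under_pattern("555-l2O4"))          # substitutions complete a phone match
print(fix_under_pattern("Ollie"))             # no match after substitution, left alone
print(fix_vv("VVilliams", {"williams"}))      # corrected via the dictionary
```

Tying each substitution to a pattern or dictionary hit is what keeps a large ambiguity table from corrupting ordinary words, which is the point made in the pattern-matching item.
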

