If you are referring to http://www.abbyyusa.com/, then I think the biggest difference is that Tesseract is open source and ABBYY is not :). So with ABBYY you pay for the image preprocessing and with Tesseract you don't. I totally agree with Patrick: if I do the preprocessing well, I always get perfect results with Tesseract, but I have never tried ABBYY.
Mike

From: [email protected] [mailto:[email protected]] On behalf of Patrick Questembert
Sent: Wednesday, July 6, 2011 12:54
To: [email protected]
Subject: Re: Teseract vs Abbyy

It's really a long list of approaches, including:

- spacing: we don't trust any spacing determination by Tesseract; we reevaluate every space indicated by Tesseract for possible elimination and consider every pair of adjacent letters for a possible space insertion
- obvious mistakes: this is by far the largest category of corrections we make. For example, VV is usually corrected back to W, but there are hundreds more cases
- ambiguous letters such as i versus l: surprisingly, Tesseract makes a ton of incongruous mistakes that lead me to believe there is no feature analysis whatsoever. For example, a 'y' may get mapped to 'g', even though there is 0% chance of that based on the wide-open gap on top. For these types of mistakes we go back to the source image to apply our own OCR of sorts.
- dictionaries: another big disappointment. From our testing we found that Tesseract applies the dictionary in less than 5% of the cases where it should (i.e. where the letter mistake is one listed in the ambigs files, with the correct spelling in the user dictionary), so we implemented our own dictionaries
- pattern matching: the regular expressions we use include wide tolerance for mistakes. Under the "protection" of a regular expression for a specific pattern we have the flexibility to include hundreds of ambiguities (because these trigger only when they help complete a match, which makes it more likely to be a valid substitution)

Patrick

On Mon, Jul 4, 2011 at 12:56 AM, Andres <[email protected]> wrote:

Hello Patrick,

Could you expand a little on what you mean by Tesseract heuristics?

Thanks,
Andres

2011/7/3 patrickq <[email protected]>

The answer is (of course) "it depends":

1. If you compare Tesseract and ABBYY on the same image, without applying preprocessing to it, ABBYY wins (because Tesseract's image processing is very rudimentary, at best). Of course, if your test images are produced (for example) by a flatbed scanner, the lack of image processing is not an issue; refer to case 2 below.
2. If you compare Tesseract and ABBYY on a clean (processed) image, without applying any post-Tesseract heuristic, ABBYY may have an advantage.
3. However, if you compare Tesseract + image processing + heuristics & corrections, Tesseract actually beats ABBYY hands down.

ScanBizCards is case #3, built around Tesseract 3.01. If you want to test this combo, please do this:

- go to http://www.scanbizcards.com/webdemo
- upload an image (under Batch Actions). Warning: ScanBizCards is geared towards recognizing text on business cards, so it would be best if you tested on something *like* a business card (sparse text), not a full page with lots of text
- click that image, then "Image Editor" on top, and OCR it
- when done testing, please delete the test images from this demo account (or get your own online account)

You can also test on your Android or iPhone mobile device instead by installing the free version of ScanBizCards.

ABBYY powers two German-made iPhone apps, Business Card Reader (by Shape Services) and Card Reader (by xRoot Software), and of course ABBYY's own iPhone / Android business card reader app.

Patrick

On Jul 3, 10:10 am, mw18888 <[email protected]> wrote:
> Can anyone comment on the accuracy of Tesseract vs Abbyy?
>
> Regards,
>
> mw18888

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
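
For readers who want something concrete, the "obvious mistakes" and "pattern matching" corrections described in the thread could be sketched roughly as below. This is only an illustration in Python, not ScanBizCards' actual code: the ambiguity table, the phone-number pattern, and the function names `fix_obvious` and `fix_under_pattern` are all assumptions made up for this example.

```python
import re
from itertools import product

# Hypothetical ambiguity table: OCR output character -> plausible intended characters.
AMBIGUITIES = {
    "O": ["0"],
    "o": ["0"],
    "l": ["1"],
    "I": ["1"],
    "S": ["5"],
    "B": ["8"],
}

# Hypothetical target pattern: a phone number shaped like 555-123-4567.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")


def fix_obvious(text):
    """Unconditional correction: 'VV' is read back as 'W'.

    The thread says this is "usually" right, so a real system would
    presumably guard it with more context than this sketch does.
    """
    return text.replace("VV", "W")


def fix_under_pattern(token, pattern=PHONE, table=AMBIGUITIES, max_candidates=4096):
    """Return a variant of token that fully matches pattern, else token unchanged.

    Substitutions are accepted only when they complete a match, so the
    table can be generous without corrupting ordinary words.
    """
    if pattern.fullmatch(token):
        return token
    # Each position may keep its character or take one of its ambiguities.
    options = [[ch] + table.get(ch, []) for ch in token]
    total = 1
    for opts in options:
        total *= len(opts)
        if total > max_candidates:
            return token  # too many combinations; leave the token alone
    for candidate in product(*options):
        joined = "".join(candidate)
        if pattern.fullmatch(joined):
            return joined
    return token
```

With these assumed tables, `fix_under_pattern("5S5-l23-4S67")` is rewritten to `555-123-4567`, while an ordinary word such as `Bill` is left alone because no substitution completes the pattern. That is the "protection" the regular expression provides.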

