Mike,

Thank you for your comment.

On Jul 7, 4:49 am, "Lutz, Michael" <[email protected]> wrote:
> Well, basically what I do is this: if I have a gradient background, I create
> the black-and-white image myself using a fixed threshold; if the input is
> blurry, I sharpen it.
> If the input is too small, I zoom and sharpen. So nothing special, but it
> helps me get good results for my purpose.
>
> Mike
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of mw18888
> Sent: Wednesday, July 6, 2011 15:57
> To: tesseract-ocr
> Subject: Re: Teseract vs Abbyy
>
> Mike and Patrick,
>
> Thank you for the comment.
>
> Mike, can you clarify the "preprocessing well"?
>
> Regards,
>
> mw18888
>
> On Jul 6, 7:07 am, "Lutz, Michael" <[email protected]> wrote:
> > If you are referring to http://www.abbyyusa.com/, then I think the biggest
> > difference is that Tesseract is open source and ABBYY is not :).
> > So with ABBYY you pay for the image preprocessing and with Tesseract you
> > do not.
> > I totally agree with Patrick: if you do the preprocessing well, then I
> > always get perfect results with Tesseract, but I never tried ABBYY.
> >
> > Mike
> >
> > From: [email protected] [mailto:[email protected]]
> > On Behalf Of Patrick Questembert
> > Sent: Wednesday, July 6, 2011 12:54
> > To: [email protected]
> > Subject: Re: Teseract vs Abbyy
> >
> > It's really a long list of approaches, including:
> > - spacing: we don't trust any spacing determination by Tesseract and
> > reevaluate every space indicated by Tesseract for possible elimination, or
> > consider every two letters for a possible space insertion
> > - obvious mistakes: this is by far the largest category of corrections we
> > make.
> > For example VV is usually corrected back to W - but there are
> > hundreds more cases
> > - ambiguous letters such as i versus l: surprisingly, Tesseract makes a ton
> > of incongruous mistakes that lead me to believe there is no feature
> > analysis whatsoever - for example a 'y' may get mapped to 'g', even though
> > there is 0% chance of that based on a wide open gap on top. For these types
> > of mistakes we go back to the source image to apply our own OCR of sorts.
> > - dictionaries: another big disappointment - from our testing we found that
> > Tesseract applies the dictionary in less than 5% of the cases where it
> > should (i.e. where the letter mistake is one listed in the ambigs files,
> > with the correct spelling in the user dictionary), so we implemented our
> > own dictionaries
> > - pattern matching: the regular expressions we use include wide tolerance
> > for mistakes. Under the "protection" of a regular expression for a specific
> > pattern we have the flexibility to include hundreds of ambiguities (because
> > these trigger only when they help complete a match, which makes it more
> > likely to be a valid substitution)
> >
> > Patrick
> >
> > On Mon, Jul 4, 2011 at 12:56 AM, Andres <[email protected]> wrote:
> >
> > Hello Patrick,
> >
> > Could you expand a little on what you mean by Tesseract heuristics?
> >
> > Thanks,
> >
> > Andres
> >
> > 2011/7/3 patrickq <[email protected]>
> >
> > The answer is (of course) "it depends":
> >
> > 1. If you compare Tesseract and ABBYY on the same image, without applying
> > preprocessing to it, ABBYY wins (because Tesseract's image processing
> > is very rudimentary - at best). Of course, if your test images are
> > produced (for example) by a flatbed scanner, the lack of image
> > processing is not an issue; refer to case 2 below.
> > 2. If you compare Tesseract and ABBYY on a clean (processed) image,
> > without applying any post-Tesseract heuristic, ABBYY may have an
> > advantage.
> > 3. However, if you compare Tesseract + image processing + heuristics &
> > corrections, Tesseract actually beats ABBYY hands down.
> >
> > ScanBizCards is case #3 around Tesseract 3.01. If you want to test
> > this combo please do this:
> > - go to http://www.scanbizcards.com/webdemo
> > - upload an image (under Batch Actions). Warning: ScanBizCards is
> > geared towards recognizing text on business cards so it would be best
> > if you tested on something *like* a business card (sparse text), not a
> > full page with lots of text
> > - click that image then "Image Editor" on top and OCR it
> > - when done testing please delete the test images from this demo
> > account (or get your own online account) ...
> >
> > You can also test instead on your Android or iPhone mobile device by
> > installing the free version of ScanBizCards. ABBYY powers two iPhone
> > apps made by German company - Business Card Reader (by Shape Services)
> > and Card Reader (by xRoot Software) - and of course ABBYY's own
> > iPhone / Android business card reader app.
> >
> > Patrick
> >
> > On Jul 3, 10:10 am, mw18888 <[email protected]> wrote:
> >
> > > Can anyone comment on the accuracy of Tesseract vs Abbyy?
> > >
> > > Regards,
> > >
> > > mw18888
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
> >
> > ________________________________
> > This message is confidential and intended only for the addressee. If you
> > have received this message in error, please immediately notify the
> > [email protected] and delete it from your system as well as any copies.
> > The content of e-mails as well as traffic data may be monitored by NDS for
> > employment and security purposes.
> > To protect the environment please do not print this e-mail unless necessary.
> >
> > An NDS Group Limited company. www.nds.com
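
For anyone trying the preprocessing Mike describes in the thread (fixed threshold, sharpen, zoom when the input is small), here is a minimal sketch of the idea. It is not Mike's code: the function names, the threshold of 128, the zoom factor of 2, the `min_height` of 40 pixels and the particular 3x3 sharpening kernel are all assumptions chosen for illustration, and the image is just a list of rows of 0-255 grayscale values so that nothing beyond the Python standard library is needed.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def binarize(img: Gray, threshold: int = 128) -> Gray:
    """Fixed threshold: pixels at or above `threshold` become white, the rest black."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


def zoom(img: Gray, factor: int = 2) -> Gray:
    """Nearest-neighbour upscaling by an integer factor."""
    out: Gray = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def sharpen(img: Gray) -> Gray:
    """Apply a 3x3 kernel (centre 5, four neighbours -1); border pixels are copied."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = (5 * img[y][x] - img[y - 1][x] - img[y + 1][x]
                 - img[y][x - 1] - img[y][x + 1])
            out[y][x] = max(0, min(255, v))  # clamp to the valid pixel range
    return out


def preprocess(img: Gray, min_height: int = 40, threshold: int = 128) -> Gray:
    """Zoom and sharpen small inputs (assumed cutoff), then binarize."""
    if len(img) < min_height:
        img = sharpen(zoom(img))
    return binarize(img, threshold)
```

The result of `preprocess` would then be written out as an image and handed to Tesseract. Deciding when an input counts as "blurry" is left out on purpose, since the thread does not say how that is judged.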

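
Patrick's "pattern matching" and "obvious mistakes" points from the thread can be illustrated in the same spirit. The sketch below is not ScanBizCards code: the ambiguity table, the phone-number pattern and the function names are hypothetical. It only shows the principle he describes, namely that an aggressive substitution (O for 0, l for 1, VV for W) is applied only when it helps complete a regular-expression match or yields a dictionary word.

```python
import re

# Hypothetical ambiguity table: OCR output character -> digit it may stand for.
DIGIT_AMBIGS = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}

# One "digit": a real digit or any character commonly mistaken for one.
D = "[0-9" + "".join(DIGIT_AMBIGS) + "]"

# Assumed pattern: NNN-NNN-NNNN with '-', '.' or ' ' as separators.
PHONE = re.compile(rf"(?<!\w){D}{{3}}[-. ]{D}{{3}}[-. ]{D}{{4}}(?!\w)")


def fix_phone_numbers(text: str) -> str:
    """Map look-alike letters to digits, but only inside a phone-number match."""
    def repair(match: "re.Match[str]") -> str:
        return "".join(DIGIT_AMBIGS.get(c, c) for c in match.group(0))
    return PHONE.sub(repair, text)


def fix_vv(word: str, dictionary: set) -> str:
    """Replace 'VV' with 'W' only when that turns an unknown word into a known one."""
    if word in dictionary:
        return word
    candidate = word.replace("VV", "W")
    return candidate if candidate in dictionary else word


print(fix_phone_numbers("Tel: 4l5-55S-O123 Oslo"))  # Tel: 415-555-0123 Oslo
print(fix_vv("VVilliams", {"Williams", "SAVVY"}))   # Williams
print(fix_vv("SAVVY", {"Williams", "SAVVY"}))       # SAVVY
```

Note that "Oslo" is left alone even though it contains O and l: outside the protection of the pattern, no substitution fires.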
