Mike,

Thank you for your comment.



On Jul 7, 4:49 am, "Lutz, Michael" <[email protected]> wrote:
> Basically, if the image has a gradient background, I create the
> black-and-white image myself using a fixed threshold. If the input is
> blurry, I sharpen it. If the input is too small, I zoom in and then
> sharpen. Nothing special, but it helps me get good results for my purpose.
>
> Mike
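
For reference, the steps described above could be sketched roughly as follows. This is only an illustration in pure Python on a grayscale image stored as a list of rows, not Mike's actual code. The threshold of 128, the minimum height of 20 pixels, the zoom factor of 2, and the 3x3 sharpening kernel are all assumptions, and the caller has to decide whether the input counts as blurry.

```python
def binarize(gray, threshold=128):
    """Fixed threshold: pixels >= threshold become white (255), the rest black (0)."""
    return [[255 if p >= threshold else 0 for p in row] for row in gray]


def zoom(gray, factor=2):
    """Nearest-neighbour upscaling by an integer factor."""
    out = []
    for row in gray:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def sharpen(gray):
    """3x3 sharpening kernel (centre 5, four neighbours -1), edges replicated."""
    h, w = len(gray), len(gray[0])

    def px(y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return gray[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            v = (5 * px(y, x) - px(y - 1, x) - px(y + 1, x)
                 - px(y, x - 1) - px(y, x + 1))
            row.append(min(255, max(0, v)))  # clamp to the 0..255 range
        out.append(row)
    return out


def preprocess(gray, min_height=20, blurry=False, threshold=128):
    """Hypothetical pipeline: zoom + sharpen small inputs, sharpen blurry ones,
    then binarize with a fixed threshold before handing the image to OCR."""
    if len(gray) < min_height:
        gray = sharpen(zoom(gray))
    elif blurry:
        gray = sharpen(gray)
    return binarize(gray, threshold)
```

In practice an imaging library would do the same work much faster; the point is only the order of the steps.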
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] 
> On Behalf Of mw18888
> Sent: Wednesday, July 6, 2011 15:57
> To: tesseract-ocr
> Subject: Re: Teseract vs Abbyy
>
> Mike and Patrick,
>
> Thank you for the comment.
>
> Mike, can you clarify the "preprocessing well"?
>
> Regards,
>
> mw18888
>
> On Jul 6, 7:07 am, "Lutz, Michael" <[email protected]> wrote:
>
> > If you are referring to http://www.abbyyusa.com/, then I think the biggest 
> > difference is that Tesseract is open source and ABBYY is not :).
> > So with ABBYY you pay for the image preprocessing, and with Tesseract you 
> > do not. I totally agree with Patrick: if I do the preprocessing well, I 
> > always get perfect results with Tesseract. I have never tried ABBYY, though.
>
> > Mike
>
> > From: [email protected] [mailto:[email protected]] 
> > On Behalf Of Patrick Questembert
> > Sent: Wednesday, July 6, 2011 12:54
> > To: [email protected]
> > Subject: Re: Teseract vs Abbyy
>
> > It's really a long list of approaches, including:
> > - spacing: we don't trust any spacing determination by Tesseract. We 
> > reevaluate every space Tesseract reports for possible elimination and 
> > consider every pair of adjacent letters for a possible space insertion
> > - obvious mistakes: this is by far the largest category of corrections we 
> > make. For example VV is usually corrected back to W - but there are 
> > hundreds more cases
> > - ambiguous letters such as i versus l: surprisingly, Tesseract makes a ton 
> > of incongruous mistakes that lead me to believe there is no feature 
> > analysis whatsoever - for example a 'y' may get mapped to 'g', even though 
> > there is 0% chance of that based on a wide open gap on top. For these types 
> > of mistakes we go back to the source image to apply our own OCR of sorts.
> > - dictionaries: another big disappointment - from our testing we found that 
> > Tesseract applies the dictionary in less than 5% of the cases where it 
> > should (i.e. where the letter mistake is one listed in the ambigs files, 
> > with the correct spelling in the user dictionary) so we implemented our own 
> > dictionaries
> > - pattern matching: the regular expressions we use include wide tolerance 
> > for mistakes. Under the "protection" of a regular expression for a specific 
> > pattern we have the flexibility to include hundreds of ambiguities, because 
> > these trigger only when they help complete a match, which makes it more 
> > likely that the substitution is valid
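
To make the "obvious mistakes" and "pattern matching" items above more concrete, here is a minimal sketch. It is not ScanBizCards code: the ambiguity table, the NNN-NNN-NNNN phone pattern, and the function names are all hypothetical and chosen only for illustration. The idea shown is that digit lookalikes are substituted only inside a regular-expression match, so the same letters elsewhere in the text are left alone.

```python
import re

# Hypothetical ambiguity table: characters OCR may confuse with digits.
DIGIT_LOOKALIKES = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}

# A "digit" position tolerates either a real digit or one of the lookalikes.
_D = "[0-9" + "".join(DIGIT_LOOKALIKES) + "]"

# Illustrative pattern only: a phone number written as NNN-NNN-NNNN.
PHONE = re.compile(rf"(?<!\w){_D}{{3}}-{_D}{{3}}-{_D}{{4}}(?!\w)")


def fix_phone_numbers(text):
    """Apply the lookalike substitutions only inside a phone-pattern match."""
    def repair(match):
        return "".join(DIGIT_LOOKALIKES.get(ch, ch) for ch in match.group())
    return PHONE.sub(repair, text)


def fix_obvious(text):
    """Unconditional correction of one 'obvious mistake': 'VV' read for 'W'."""
    return text.replace("VV", "W")


print(fix_phone_numbers("Call 4O8-55S-l2I2 or Bill"))  # Call 408-555-1212 or Bill
print(fix_obvious("VVilliam VVest"))                   # William West
```

Note that "Bill" keeps its "B" and "l" because it is not part of a match, whereas the blanket "VV" replacement would also rewrite any legitimate "VV", which is presumably why that correction is described as applying "usually".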
>
> > Patrick
>
> > On Mon, Jul 4, 2011 at 12:56 AM, Andres <[email protected]> wrote:
>
> > Hello Patrick,
>
> > Could you expand a little on what you mean by Tesseract heuristics?
>
> > Thanks,
>
> > Andres
>
> > 2011/7/3 patrickq <[email protected]>
> > The answer is (of course) "it depends":
> > 1. If you compare Tesseract and ABBYY on the same image, without applying
> > preprocessing to it, ABBYY wins (because Tesseract's image processing
> > is very rudimentary at best). Of course, if your test images are
> > produced (for example) by a flatbed scanner, the lack of image
> > processing is not an issue, so case 2 below applies.
> > 2. If you compare Tesseract and ABBYY on a clean (processed) image,
> > without applying any post-Tesseract heuristics, ABBYY may have an
> > advantage.
> > 3. However, if you compare Tesseract + image processing + heuristics &
> > corrections, Tesseract actually beats ABBYY hands down.
>
> > ScanBizCards is an example of case #3, built around Tesseract 3.01. If
> > you want to test this combo please do this:
> > - go to http://www.scanbizcards.com/webdemo
>
> > - upload an image (under Batch Actions). Warning: ScanBizCards is
> > geared towards recognizing text on business cards so it would be best
> > if you tested on something *like* a business card (sparse text), not a
> > full page with lots of text
> > - click that image then "Image Editor" on top and OCR it
> > - when done testing please delete the test images from this demo
> > account (or get your own online account) ...
>
> > You can also test on your Android or iPhone mobile device instead by
> > installing the free version of ScanBizCards. ABBYY powers two iPhone
> > apps made by German companies - Business Card Reader (by Shape Services)
> > and Card Reader (by xRoot Software) - and of course ABBYY's own
> > iPhone / Android business card reader app.
>
> > Patrick
>
> > On Jul 3, 10:10 am, mw18888 <[email protected]> wrote:
>
> > > Can anyone comment on the accuracy of Tesseract vs Abbyy?
>
> > > Regards,
>
> > > mw18888
>
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
>
> > ________________________________
> > This message is confidential and intended only for the addressee. If you 
> > have received this message in error, please immediately notify the 
> > [email protected] and delete it from your system as well as any copies. 
> > The content of e-mails as well as traffic data may be monitored by NDS for 
> > employment and security purposes.
> > To protect the environment please do not print this e-mail unless necessary.
>
> > An NDS Group Limited company. www.nds.com
