Well, basically what I do is this: if I have a gradient background, I create 
the black-and-white image myself using a fixed threshold; if the input is 
blurry, I sharpen it.
If the input is too small, I zoom in (upscale) and then sharpen. So nothing 
special, but it helps me get good results for my purpose.
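
As a rough illustration of the three steps described above (fixed threshold, sharpen, zoom and sharpen), here is a minimal sketch in plain Python. It is not Mike's actual code: the function names, the threshold of 128, the 2x zoom factor, the 3x3 sharpening kernel and the `min_height` cutoff are all assumptions chosen for the example, and images are represented as lists of rows of 0-255 grayscale values rather than with a real imaging library.

```python
# Hypothetical helpers; an "image" is a list of rows of 0-255 grayscale values.

def binarize(img, threshold=128):
    """Fixed-threshold binarization: pixels >= threshold become white (255)."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


def zoom(img, factor=2):
    """Nearest-neighbour upscaling by an integer factor."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def sharpen(img):
    """Apply a 3x3 sharpening kernel (centre 5, four neighbours -1).

    Border pixels reuse the nearest edge value; results are clamped to 0..255.
    """
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            v = (5 * px(y, x) - px(y - 1, x) - px(y + 1, x)
                 - px(y, x - 1) - px(y, x + 1))
            row.append(min(255, max(0, v)))
        out.append(row)
    return out


def preprocess(img, threshold=128, min_height=20, blurry=False):
    """Zoom and sharpen small inputs, sharpen blurry ones, then binarize."""
    if len(img) < min_height:      # assumed cutoff for "too small"
        img = sharpen(zoom(img))
    elif blurry:                   # blur detection is left to the caller
        img = sharpen(img)
    return binarize(img, threshold)
```

In practice the threshold and the size cutoff would have to be tuned per image source, and the binarized result would then be saved and handed to Tesseract.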

Mike

-----Original Message-----
From: [email protected] [mailto:[email protected]] On 
Behalf Of mw18888
Sent: Wednesday, July 6, 2011 15:57
To: tesseract-ocr
Subject: Re: Teseract vs Abbyy

Mike and Patrick,

Thank you for the comment.

Mike, can you clarify what you mean by "preprocessing well"?

Regards,

mw18888

On Jul 6, 7:07 am, "Lutz, Michael" <[email protected]> wrote:
> If you are referring to http://www.abbyyusa.com/, then I think the biggest 
> difference is that Tesseract is open source and ABBYY is not :).
> So with ABBYY you pay for the image preprocessing, and with Tesseract you don't.
> I totally agree with Patrick: when I do the preprocessing well, I always 
> get perfect results with Tesseract, but I have never tried ABBYY.
>
> Mike
>
> From: [email protected] [mailto:[email protected]] 
> On Behalf Of Patrick Questembert
> Sent: Wednesday, July 6, 2011 12:54
> To: [email protected]
> Subject: Re: Teseract vs Abbyy
>
> It's really a long list of approaches, including:
> - spacing: we don't trust any spacing determination by Tesseract. We 
> reevaluate every space indicated by Tesseract for possible elimination and 
> consider every pair of adjacent letters for a possible space insertion
> - obvious mistakes: this is by far the largest category of corrections we 
> make. For example VV is usually corrected back to W - but there are hundreds 
> more cases
> - ambiguous letters such as i versus l: surprisingly, Tesseract makes a ton 
> of incongruous mistakes that lead me to believe there is no feature analysis 
> whatsoever - for example a 'y' may get mapped to 'g', even though there is 0% 
> chance of that based on a wide open gap on top. For these types of mistakes 
> we go back to the source image to apply our own OCR of sorts.
> - dictionaries: another big disappointment - from our testing we found that 
> Tesseract applies the dictionary in less than 5% of the cases where it should 
> (i.e. where the letter mistake is one listed in the ambigs files, with the 
> correct spelling in the user dictionary) so we implemented our own 
> dictionaries
> - pattern matching: the regular expressions we use include a wide tolerance for 
> mistakes. Under the "protection" of a regular expression for a specific 
> pattern, we have the flexibility to include hundreds of ambiguities, because 
> these trigger only when they help complete a match, which makes a valid 
> substitution more likely.
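
To make the "obvious mistakes" and "pattern matching" items above concrete, here is a minimal Python sketch. It is not the ScanBizCards implementation: the lookalike table, the phone-number pattern and the function names are hypothetical, picked only to show how ambiguous characters can be accepted inside a regular expression and mapped back to digits when the whole pattern matches. Only the VV to W correction comes from the message itself.

```python
import re

# "Obvious mistake" fixes. The thread names only VV -> W; a real table would
# be much larger and would need guards for text where "VV" is legitimate.
OBVIOUS_FIXES = {"VV": "W"}


def fix_obvious(text):
    """Apply unconditional substitutions for well-known OCR confusions."""
    for wrong, right in OBVIOUS_FIXES.items():
        text = text.replace(wrong, right)
    return text


# Hypothetical ambiguity classes: characters that OCR may output for a digit.
DIGIT_LOOKALIKES = {"0": "0OoQ", "1": "1lI|", "5": "5S", "8": "8B"}
TO_DIGIT = {c: d for d, chars in DIGIT_LOOKALIKES.items() for c in chars}

# Character class that accepts any digit or any of its lookalikes.
_CHARS = "".join(sorted(set("0123456789" + "".join(DIGIT_LOOKALIKES.values()))))
D = "[" + re.escape(_CHARS) + "]"

# Assumed pattern: a 3-3-4 phone number with '-', '.' or ' ' separators.
PHONE = re.compile(rf"(?<!\w)({D}{{3}})[-. ]({D}{{3}})[-. ]({D}{{4}})(?!\w)")


def fix_phone(text):
    """Map lookalikes to digits, but only inside a complete phone match."""

    def repl(match):
        groups = ("".join(TO_DIGIT.get(c, c) for c in g) for g in match.groups())
        return "-".join(groups)

    return PHONE.sub(repl, text)
```

For example, `fix_phone("Tel: 4l5-55S-O1B2")` rewrites the number as `415-555-0182`, while a word such as `Solo` is left alone because no complete pattern matches around it. That is the "protection" idea: a substitution as risky as S to 5 is applied only where the surrounding pattern makes it plausible.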
>
> Patrick
>
> On Mon, Jul 4, 2011 at 12:56 AM, Andres 
> <[email protected]<mailto:[email protected]>> wrote:
>
> Hello Patrick,
>
> Could you expand a little on what you mean by Tesseract heuristics?
>
> Thanks,
>
> Andres
> 2011/7/3 patrickq 
> <[email protected]<mailto:[email protected]>>
> The answer is (of course) "it depends":
> 1. If you compare Tesseract and ABBYY on the same image, without applying
> preprocessing to it, ABBYY wins (because Tesseract's image processing
> is very rudimentary at best). Of course, if your test images are
> produced (for example) by a flatbed scanner, the lack of image
> processing is not an issue; refer to case 2 below.
> 2. If you compare Tesseract and ABBYY on a clean (processed) image,
> without applying any post-Tesseract heuristics, ABBYY may have an
> advantage.
> 3. However, if you compare Tesseract + image processing + heuristics &
> corrections, Tesseract actually beats ABBYY hands down.
>
> ScanBizCards is case #3, built around Tesseract 3.01. If you want to test
> this combo, please do this:
> - go to http://www.scanbizcards.com/webdemo
>
> - upload an image (under Batch Actions). Warning: ScanBizCards is
> geared towards recognizing text on business cards so it would be best
> if you tested on something *like* a business card (sparse text), not a
> full page with lots of text
> - click that image then "Image Editor" on top and OCR it
> - when done testing please delete the test images from this demo
> account (or get your own online account) ...
>
> You can also test on your Android or iPhone mobile device instead by
> installing the free version of ScanBizCards. ABBYY powers two iPhone
> apps made by German company - Business Card Reader (by Shape Services)
> and Card Reader (by xRoot Software) - and of course ABBYY's own
> iPhone / Android business card reader app.
>
> Patrick
>
> On Jul 3, 10:10 am, mw18888 <[email protected]<mailto:[email protected]>> 
> wrote:
>
> > Can anyone comment on the accuracy of Tesseract vs Abbyy?
>
> > Regards,
>
> > mw18888
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to 
> [email protected]<mailto:[email protected]>
> To unsubscribe from this group, send email to
> [email protected]<mailto:tesseract-ocr%[email protected]>
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ________________________________
> This message is confidential and intended only for the addressee. If you have 
> received this message in error, please immediately notify the 
> [email protected] and delete it from your system as well as any copies. The 
> content of e-mails as well as traffic data may be monitored by NDS for 
> employment and security purposes.
> To protect the environment please do not print this e-mail unless necessary.
>
> An NDS Group Limited company. www.nds.com
