If you are referring to http://www.abbyyusa.com/, then I think the biggest 
difference is that Tesseract is open source and ABBYY is not :).
So with ABBYY you pay for the image preprocessing; with Tesseract you don't.
I totally agree with Patrick: if the preprocessing is done well, I always get 
perfect results with Tesseract, but I have never tried ABBYY.

Mike

From: [email protected] [mailto:[email protected]] On 
behalf of Patrick Questembert
Sent: Wednesday, July 6, 2011 12:54
To: [email protected]
Subject: Re: Tesseract vs Abbyy

It's really a long list of approaches, including:
- spacing: we don't trust any spacing determination by Tesseract. We reevaluate 
every space Tesseract reports for possible elimination, and consider every pair 
of adjacent letters for a possible space insertion
- obvious mistakes: this is by far the largest category of corrections we make. 
For example, VV is usually corrected back to W, but there are hundreds more 
cases
- ambiguous letters such as i versus l: surprisingly, Tesseract makes a lot of 
incongruous mistakes that lead me to believe there is no feature analysis 
whatsoever. For example, a 'y' may get mapped to 'g' even though the wide-open 
gap at the top makes that impossible. For these types of mistakes we go back 
to the source image and apply our own OCR of sorts.
- dictionaries: another big disappointment. From our testing, Tesseract applies 
the dictionary in less than 5% of the cases where it should (i.e. where the 
letter mistake is listed in the ambigs files and the correct spelling is in 
the user dictionary), so we implemented our own dictionaries
- pattern matching: the regular expressions we use include wide tolerance for 
mistakes. Under the "protection" of a regular expression for a specific 
pattern, we have the flexibility to include hundreds of ambiguities, because 
these trigger only when they help complete a match, which makes it more likely 
to be a valid substitution
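Patrick doesn't show any code, so here is only a minimal sketch of how the 
spacing, obvious-mistake, dictionary and pattern-matching heuristics above 
could fit together. Everything in it is a hypothetical illustration, not 
ScanBizCards' actual rules: the AMBIGUITIES table, the DIGIT_LOOKALIKES map, 
the phone-number pattern and the function names are all assumptions made up 
for this example.

```python
import re

# Hypothetical ambiguity table (OCR output -> plausible intended text).
# Illustrative only: the thread mentions "hundreds" of cases but lists just VV -> W.
AMBIGUITIES = {
    "VV": ["W"],
    "l": ["i", "1"],
    "i": ["l"],
    "g": ["y"],
    "o": ["0"],
    "O": ["0"],
}


def candidates(word: str) -> set[str]:
    """Every spelling reachable by applying ambiguity substitutions at any positions."""
    results: set[str] = set()

    def walk(i: int, built: str) -> None:
        if i == len(word):
            results.add(built)
            return
        walk(i + 1, built + word[i])  # keep the character as recognized
        for wrong, rights in AMBIGUITIES.items():
            if word.startswith(wrong, i):
                for right in rights:
                    walk(i + len(wrong), built + right)

    walk(0, "")
    return results


def correct_with_dictionary(word: str, dictionary: set[str]) -> str:
    """Replace an unknown word with an ambiguity variant found in the dictionary."""
    if word.lower() in dictionary:
        return word
    matches = sorted(c for c in candidates(word) if c.lower() in dictionary)
    return matches[0] if matches else word


def respace(tokens: list[str], dictionary: set[str]) -> list[str]:
    """Re-evaluate every space (possible merge) and every letter boundary (possible split)."""

    def known(s: str) -> bool:
        return s.lower() in dictionary

    out: list[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # Space elimination: merge only if the two pieces are not both real words.
        if nxt is not None and known(tok + nxt) and not (known(tok) and known(nxt)):
            out.append(tok + nxt)
            i += 2
            continue
        # Space insertion: try a split between every pair of adjacent letters.
        if not known(tok):
            for k in range(1, len(tok)):
                if known(tok[:k]) and known(tok[k:]):
                    out.extend([tok[:k], tok[k:]])
                    break
            else:
                out.append(tok)
        else:
            out.append(tok)
        i += 1
    return out


# Regex "protection": look-alike letters count as digits only inside a full match.
DIGIT_LOOKALIKES = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
D = "[0-9" + "".join(DIGIT_LOOKALIKES) + "]"
PHONE = re.compile(rf"\b{D}{{3}}-{D}{{3}}-{D}{{4}}\b")  # hypothetical 555-123-4567 format


def fix_phone_numbers(line: str) -> str:
    """Normalize look-alike letters to digits, but only inside a complete phone match."""

    def normalize(m: re.Match) -> str:
        return "".join(DIGIT_LOOKALIKES.get(ch, ch) for ch in m.group(0))

    return PHONE.sub(normalize, line)
```

For example, correct_with_dictionary("VVorld", {"world"}) yields "World", and 
fix_phone_numbers("Tel: 555-l23-4S67") yields "Tel: 555-123-4567", while 
"Sales: lOO units" is left alone because it never completes the pattern. A 
real system would also need scoring to choose between several valid 
candidates instead of simply taking the alphabetically first one.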

Patrick
On Mon, Jul 4, 2011 at 12:56 AM, Andres <[email protected]> wrote:
Hello Patrick,

Could you elaborate a little on what you mean by Tesseract heuristics?

Thanks,

Andres
2011/7/3 patrickq <[email protected]>
The answer is (of course) "it depends":
1. If you compare Tesseract and ABBYY on the same image without applying
any preprocessing to it, ABBYY wins (because Tesseract's image processing
is very rudimentary, at best). Of course, if your test images are
produced by, for example, a flatbed scanner, the lack of image
processing is not an issue; see case 2 below.
2. If you compare Tesseract and ABBYY on a clean (processed) image
without applying any post-Tesseract heuristics, ABBYY may have an
advantage.
3. However, if you add image processing plus heuristics and
corrections to Tesseract, Tesseract actually beats ABBYY hands down.
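The thread doesn't say which image processing ScanBizCards (or Mike) 
actually applies. One common preprocessing step before handing an image to 
Tesseract is global binarization, so below is a minimal sketch using Otsu's 
method. This is a swapped-in standard technique, not necessarily what anyone 
in this thread used. It works on a flat list of 8-bit grayscale values so it 
runs without an imaging library, and it assumes dark text on a light 
background.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the gray level t (0-255) that maximizes between-class variance,
    where class 0 is pixels <= t and class 1 is pixels > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0 so far
    sum0 = 0  # intensity sum of class 0 so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: no split at this level
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels: list[int]) -> list[int]:
    """Map each pixel to 0 (assumed ink) or 255 (assumed paper) via the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]
```

A single global threshold is only a starting point: photos of business cards 
taken with a phone often have uneven lighting and skew, which usually call for 
adaptive (local) thresholding and deskewing as well.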

ScanBizCards is case #3, built around Tesseract 3.01. If you want to test
this combination, please do this:
- go to http://www.scanbizcards.com/webdemo
- upload an image (under Batch Actions). Warning: ScanBizCards is
geared towards recognizing text on business cards, so it would be best
to test on something *like* a business card (sparse text), not a
full page with lots of text
- click that image, then "Image Editor" at the top, and OCR it
- when done testing, please delete the test images from this demo
account (or get your own online account) ...

You can also test on your Android or iPhone mobile device instead by
installing the free version of ScanBizCards. ABBYY powers two iPhone
apps from German companies, Business Card Reader (by Shape Services)
and Card Reader (by xRoot Software), as well as, of course, ABBYY's own
iPhone/Android business card reader app.

Patrick

On Jul 3, 10:10 am, mw18888 <[email protected]> wrote:
> Can anyone comment on the accuracy of Tesseract vs Abbyy?
>
> Regards,
>
> mw18888

--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to 
[email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

________________________________
This message is confidential and intended only for the addressee. If you have 
received this message in error, please immediately notify the 
[email protected] and delete it from your system as well as any copies. The 
content of e-mails as well as traffic data may be monitored by NDS for 
employment and security purposes.
To protect the environment please do not print this e-mail unless necessary.

An NDS Group Limited company. www.nds.com
