Re: a test suit for Polish (was: [ocropus] Re: OCRopus 0.5.4 and UTF-8 encoding)

c.kruk Thu, 26 Jul 2012 15:09:17 -0700

 

A few days ago my machine stopped working and displayed a screen full of 
obscure error messages. I took a picture of the screen and rebooted the 
machine. Because I was too lazy to spend an hour copying out the contents 
of the screen by hand, I decided to try some OCR engines. I tried 
gocr 0.49, OCRopus 0.5.4, and Tesseract 3.01.


After four days of intensive work I have learned a bit about OCR, and I 
now know that none of these programs can properly process strings of 
numbers and letters such as “[226158.728554] [<c1430000>] ? 
cs5520_init_one+0x14e/0x35f”. Personally, I doubt that any other OCR 
engine could process such text from a photo of moderate quality. The only 
solution is to copy out these messages manually.

It is an instructive example of what is called the irony of fate.


I studied the “Report on the comparison of Tesseract and ABBYY FineReader 
OCR engines” by Heliński, Kmieciak, and Parkoła 
(http://lib.psnc.pl/dlibra/docmetadata?id=358&from=publication&showContent=true).

It is very interesting, at least for users of these two programs, though 
other people interested in OCR engines should find it worth reading as 
well. The report is very reliable and informative. Thank you, professor, 
for that valuable link.


