Re: [Vo]:Neat new OCR technology
2010/3/19 Michel Jullian:
> ... if you convert a clearscan pdf back to image format in higher
> resolution e.g. 600 dpi (this can be set in
> edit>preferences>convert from pdf>TIFF>edit settings), make a new pdf
> from that, and re-do an OCR on it, interestingly the recognition
> accuracy is improved,

Let me retract this. After experimenting on a few more pages, it turns out the 2nd OCR pass makes roughly the same number of recognition errors as the 1st pass on average. What fooled me is that it doesn't make them on the same words. So there is really no point in going through the complexity and hard work of a 2nd pass.

There is another use of the trick of saving as TIFF and re-pdf-ing before OCRing, a useful one this time: it circumvents the "Acrobat could not perform recognition (OCR) on this page because: This page contains renderable text." error you get on some documents, which annoyingly aborts the whole OCR process. If anyone knows of a simpler way, I am interested.

Last point: I see they have integrated the "OCR multiple files" feature into the main menu in version 9, so one doesn't have to go through the batch processing procedure to OCR a large collection of documents. Much more convenient.

Michel
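The point about the two passes making a similar number of errors, but on different words, can be illustrated with a small sketch. It assumes you have a hand-corrected reference transcription to compare against; the function name `error_positions` and the sample sentences are made up for illustration and have nothing to do with Acrobat itself.

```python
from difflib import SequenceMatcher


def error_positions(truth, ocr):
    """Return the indices of words in `truth` that `ocr` got wrong or dropped.

    Words that the OCR inserted out of nowhere have no position in `truth`
    and are ignored by this simple count.
    """
    wrong = set()
    matcher = SequenceMatcher(a=truth, b=ocr, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            wrong.update(range(i1, i2))
    return wrong


# Hypothetical data: a reference text and two OCR passes over the same page.
truth = "the quick brown fox jumps over the lazy dog".split()
pass1 = "the qu1ck brown fox jumps over the lazy d0g".split()
pass2 = "the quick brovvn fox jumps ovcr the lazy dog".split()

errors1 = error_positions(truth, pass1)
errors2 = error_positions(truth, pass2)

print("errors in pass 1:", len(errors1))
print("errors in pass 2:", len(errors2))
print("words wrong in both passes:", len(errors1 & errors2))
```

With this invented data both passes have the same error count and no misread word in common, which is the situation that can make a second pass look better than it is when only a few words are spot-checked.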
Re: [Vo]:Neat new OCR technology
One can download Acrobat 9 from their web site and try it for a month for free.

Disappointingly, the accuracy of the recognition itself is not better with this clearscan option, it's just the look.

However, thanks to the zoomable (vector) nature of the clearscan characters, if you convert a clearscan pdf back to image format in higher resolution e.g. 600 dpi (this can be set in edit>preferences>convert from pdf>TIFF>edit settings), make a new pdf from that, and re-do an OCR on it, interestingly the recognition accuracy is improved, at least it seemed to be in the couple of trials I have done. If this is confirmed, hopefully they will realize this and automate the two-pass OCR in version 10.

Michel

2010/3/18 Jed Rothwell:
> That is impressive!
>
> I hate Adobe's user interface and documentation, but I might get this
> product anyway.
>
> - Jed
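For anyone who would rather script the "PDF back to 600 dpi image" step than click through the Acrobat preferences, here is a minimal sketch. It is an assumption on my part, not the Acrobat procedure described above: it swaps in Ghostscript (the `gs` executable, named `gswin64c` on Windows) to render the pages, and the file names are hypothetical. The resulting TIFF still has to be turned back into a PDF and OCRed in Acrobat by hand.

```python
import subprocess


def rasterize_command(pdf_path, tiff_path, dpi=600):
    """Build a Ghostscript command that renders a PDF to a multi-page TIFF."""
    return [
        "gs",                         # assumed to be on PATH
        "-dNOPAUSE", "-dBATCH",       # run non-interactively and exit when done
        "-sDEVICE=tiff24nc",          # 24-bit colour TIFF output device
        f"-r{dpi}",                   # rendering resolution in dpi
        f"-sOutputFile={tiff_path}",  # no %d in the name: one multi-page file
        pdf_path,
    ]


def rasterize(pdf_path, tiff_path, dpi=600):
    """Run the command; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(rasterize_command(pdf_path, tiff_path, dpi), check=True)


# Hypothetical file names; this only prints the command, it does not run it.
command = rasterize_command("clearscan.pdf", "clearscan-600dpi.tif")
print(" ".join(command))
```

Calling `rasterize("clearscan.pdf", "clearscan-600dpi.tif")` would actually run the conversion, provided Ghostscript is installed.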
Re: [Vo]:Neat new OCR technology
That is impressive!

I hate Adobe's user interface and documentation, but I might get this product anyway.

- Jed
[Vo]:Neat new OCR technology
Jed, have you tried the "clearscan" setting in Adobe Acrobat 9 OCR? Very impressive. They explain their clever (and "obvious", in retrospect) trick in this demo video:

http://my.adobe.acrobat.com/p28891758/

Michel