2010/3/19 Michel Jullian <[email protected]>:
> ... if you convert a clearscan pdf back to image format in higher
> resolution e.g. 600 dpi (this can be set in
> edit>preferences>convert from pdf>TIFF>edit settings), make a new pdf
> from that, and re-do an OCR on it, interestingly the recognition
> accuracy is improved,

Let me retract this. After experimenting on a few more pages, it turns out that the 2nd OCR pass makes roughly the same number of recognition errors as the 1st pass on average. What fooled me is that it doesn't make them on the same words. So there is really no point in going through the complexity and hard work of a 2nd pass.

There is, however, another and genuinely useful application of the trick of saving as TIFF and re-PDF-ing before OCRing: it circumvents the "Acrobat could not perform recognition (OCR) on this page because: This page contains renderable text." error you get on some documents, which annoyingly aborts the whole OCR process. If anyone knows of a simpler way, I am interested.

Last point: I see they have integrated the "OCR multiple files" feature into the main menu in version 9, so one no longer has to go through the batch processing procedure to OCR a large collection of documents. Much more convenient.

Michel

