I am working with some scanned PDF documents, one image per page, with OCR text behind the page image. I need to extract the OCR text that lies under a rectangle the user selects with the mouse. I believe I can use the technique from the ExtractTextByArea example, but I need to scale from image pixel coordinates to the PDF user-space units (1/72 inch) used for text.
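
To make the scaling concrete, here is a minimal sketch of what I have in mind. It rests on assumptions I have not verified: the scanned image covers the whole page, the page is not rotated, and both the mouse selection and the text-extraction region use a top-left origin. `SelectionScaler` and `toPdfUnits` are hypothetical names of my own, not part of PDFBox.

```java
import java.awt.geom.Rectangle2D;

/**
 * Hypothetical helper (not part of PDFBox): scales a selection rectangle
 * from image pixel coordinates to PDF user-space units (1/72 inch).
 * Assumes the image fills the entire page and the page is not rotated.
 */
final class SelectionScaler {

    private SelectionScaler() {
    }

    static Rectangle2D.Double toPdfUnits(Rectangle2D selectionPx,
                                         double imageWidthPx, double imageHeightPx,
                                         double pageWidthPt, double pageHeightPt) {
        if (imageWidthPx <= 0 || imageHeightPx <= 0) {
            throw new IllegalArgumentException("image dimensions must be positive");
        }
        // PDF units per image pixel, computed separately for each axis.
        double scaleX = pageWidthPt / imageWidthPx;
        double scaleY = pageHeightPt / imageHeightPx;
        // Assumption: both coordinate systems have a top-left origin,
        // so no vertical flip is applied here.
        return new Rectangle2D.Double(
                selectionPx.getX() * scaleX,
                selectionPx.getY() * scaleY,
                selectionPx.getWidth() * scaleX,
                selectionPx.getHeight() * scaleY);
    }

    public static void main(String[] args) {
        // Letter page scanned at 300 dpi: 2550 x 3300 pixels, 612 x 792 units.
        Rectangle2D selectionPx = new Rectangle2D.Double(300, 600, 600, 150);
        Rectangle2D region = toPdfUnits(selectionPx, 2550, 3300, 612, 792);
        // The result would then be passed to the by-area text extraction,
        // e.g. as the rectangle of a PDFTextStripperByArea region.
        System.out.println(region);
    }
}
```

For the Letter-size example in `main`, a selection starting one inch from the left edge (300 pixels at 300 dpi) should come out at about 72 units, which is the behaviour I am after. The image and page dimensions are passed in as plain numbers because getting reliable values for them out of the PDF is exactly the part I am stuck on.
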
When I run the PrintImageLocations example, I get strange width and height values that I cannot interpret. A search of the PDFBox mailing-list archive turns up a discussion of this problem from December 2009. In the thread http://markmail.org/message/m5tcighpru2dccbu Andreas Lehmkühler recommends the technique used in http://svn.apache.org/repos/asf/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/operator/pagedrawer/Invoke.java but, unfortunately, that URL is currently broken.

Any assistance or pointers would be greatly appreciated.

Thanks,
Michael

