I am working with some scanned PDF documents that have one image per page,
with OCR text behind each page image.
I need to extract the OCR text that lies behind a rectangle the user selects with the mouse.
I believe I can use the technique from the ExtractTextByArea example, but I first need to
scale from the image (pixel) coordinates to the PDF text units of 1/72 inch.
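A minimal sketch of that scaling, assuming the scanned image is drawn over the whole page (common for scanned PDFs, but worth checking) and that both the mouse selection and the text-extraction region use an upper-left origin with y growing downward. If the region instead uses PDF's lower-left origin, flip y as pageHeight - (y + height). Page rotation and a MediaBox that does not start at (0, 0) are ignored. The class and method names are hypothetical, not part of PDFBox.

```java
import java.awt.geom.Rectangle2D;

/** Hypothetical helper: scale a mouse selection from image pixels to PDF points. */
public class SelectionScaler {
    /**
     * Maps a rectangle in image pixels to PDF user-space units (1/72 inch).
     * Assumes the image covers the full page and that both coordinate systems
     * put the origin at the upper-left corner with y growing downward.
     */
    static Rectangle2D toPdfUnits(Rectangle2D selPx, int imageWidthPx, int imageHeightPx,
                                  double pageWidthPt, double pageHeightPt) {
        // Multiply before dividing so results that are whole numbers come out exact.
        double x = selPx.getX() * pageWidthPt / imageWidthPx;
        double y = selPx.getY() * pageHeightPt / imageHeightPx;
        double w = selPx.getWidth() * pageWidthPt / imageWidthPx;
        double h = selPx.getHeight() * pageHeightPt / imageHeightPx;
        return new Rectangle2D.Double(x, y, w, h);
    }

    public static void main(String[] args) {
        // Hypothetical: a 300 dpi Letter scan (2550 x 3300 px) on a 612 x 792 pt page.
        Rectangle2D sel = new Rectangle2D.Double(300, 600, 1200, 300);
        Rectangle2D pdf = toPdfUnits(sel, 2550, 3300, 612, 792);
        // prints: 72.0 144.0 288.0 72.0
        System.out.println(pdf.getX() + " " + pdf.getY() + " " + pdf.getWidth() + " " + pdf.getHeight());
    }
}
```

The resulting rectangle could then be used as the region for PDFTextStripperByArea, as in ExtractTextByArea, converted to whatever rectangle type your PDFBox version's addRegion expects.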

When I run the PrintImageLocations example, I get strange,
unexpected values for the image width and height.
A search of the PDFBox mail archive shows that this problem was discussed
back in December 2009.
In the thread
  http://markmail.org/message/m5tcighpru2dccbu
Andreas Lehmkühler recommends using the technique used in
  http://svn.apache.org/repos/asf/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/operator/pagedrawer/Invoke.java
Unfortunately, this URL is currently broken.
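As far as I can tell, the underlying point is that an image's pixel width and height are not its size on the page: PDF paints every image into a unit square that the current transformation matrix (CTM) maps onto the page, which is presumably what the page-drawing code accounts for. Below is a minimal, library-independent sketch of that arithmetic. Reading the actual CTM (for example from the graphics state while the Do operator is processed) depends on the PDFBox version and is not shown; the class name, matrix values and pixel counts are hypothetical.

```java
/**
 * Sketch: derive an image's on-page size from the CTM [a b c d e f] in effect
 * when the image is painted. The image fills the unit square of image space,
 * so the CTM alone determines where and how large it appears; the pixel
 * width/height only tells you its resolution.
 */
public class ImagePlacement {
    /** Length of the image's bottom edge in user space: the unit vector (1,0) maps to (a,b). */
    static double displayedWidth(double a, double b) {
        return Math.hypot(a, b);
    }

    /** Length of the image's left edge in user space: the unit vector (0,1) maps to (c,d). */
    static double displayedHeight(double c, double d) {
        return Math.hypot(c, d);
    }

    /** Pixels per inch along one axis, given the pixel count and displayed size in points. */
    static double effectiveDpi(int pixels, double sizePt) {
        return pixels / (sizePt / 72.0);
    }

    public static void main(String[] args) {
        // Hypothetical CTM for a scan drawn over a whole US Letter page:
        // "612 0 0 792 0 0 cm" followed by the image's Do operator.
        double[] ctm = {612, 0, 0, 792, 0, 0};
        double w = displayedWidth(ctm[0], ctm[1]);
        double h = displayedHeight(ctm[2], ctm[3]);
        // (e, f) is where image-space (0,0) lands: the lower-left corner
        // of the image when the CTM has no rotation or flip.
        System.out.println("size: " + w + " x " + h + " pt at (" + ctm[4] + ", " + ctm[5] + ")");
        // Hypothetical 2550 x 3300 pixel scan.
        System.out.println("dpi: " + effectiveDpi(2550, w) + " x " + effectiveDpi(3300, h));
    }
}
```

Dividing the pixel dimensions by these displayed sizes gives the pixels-per-point ratio needed to map a selection on the image back to PDF units.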

Any assistance or pointers would be greatly appreciated.

Thanks,
Michael
