Hello, I have just started using PDFBox and have had some luck extracting text by area. I have run into a slight problem, and after searching I am still unable to find anything that might help.

The PDFs I am processing are a generic form. Most of the time the text I want is in the same location, but occasionally it is shifted down by one line because one of the fields runs longer than expected. Is there a way to locate the x,y coordinates of a field's name so that I can get around this problem?

To illustrate: a single-page PDF has three fields, each separated by a blank line (imagine Word document output). The three fields are Name, Address, and Hobby. In most cases the Address field is three lines long (street, city/state, zip), but occasionally a fourth line (apartment number, floor, etc.) is present. When the fourth line is present, the Hobby field is shifted down by one line.

Is there a way to find the location of the "Hobby" string on the page, so that the location is a variable rather than hard-coded into the program?

Thanks for your help,
James Vines
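To make the idea concrete, here is a rough sketch of what I am hoping to do, written in plain Java with no PDFBox calls. Everything in it is hypothetical: the Fragment class, the helper names, and the coordinates are made up for illustration. In PDFBox the fragments could presumably be collected by subclassing PDFTextStripper and overriding writeString(String, List<TextPosition>), reading getXDirAdj() and getYDirAdj() from the first TextPosition of each string, and the resulting rectangle could then be handed to PDFTextStripperByArea.addRegion(String, Rectangle2D). That wiring is an assumption on my part and is not shown or tested here.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;
import java.util.Optional;

public class FieldLocator {

    /** Hypothetical model: a run of text and the position where it starts on the page. */
    static final class Fragment {
        final String text;
        final double x;
        final double y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Returns the first fragment whose trimmed text starts with the given label, if any. */
    static Optional<Fragment> findLabel(List<Fragment> fragments, String label) {
        return fragments.stream()
                .filter(f -> f.text.trim().startsWith(label))
                .findFirst();
    }

    /** Builds an extraction rectangle offset from the label instead of from fixed page coordinates. */
    static Rectangle2D regionRelativeTo(Fragment label, double dx, double dy,
                                        double width, double height) {
        return new Rectangle2D.Double(label.x + dx, label.y + dy, width, height);
    }

    public static void main(String[] args) {
        // Made-up coordinates: the address has a fourth line here,
        // so "Hobby" sits one line lower than it usually would.
        List<Fragment> page = List.of(
                new Fragment("Name", 72, 100),
                new Fragment("Address", 72, 130),
                new Fragment("Hobby", 72, 220));

        Optional<Fragment> hobby = findLabel(page, "Hobby");
        if (hobby.isPresent()) {
            // Region to the right of the label, on the same line (offsets are guesses).
            Rectangle2D region = regionRelativeTo(hobby.get(), 60, 0, 400, 14);
            System.out.println("Extract from: " + region);
        } else {
            System.out.println("Label not found on this page");
        }
    }
}
```

The point is only that the rectangle is computed from wherever the label turns up, so an extra address line moves the region along with it.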

