Hello,

I have just started using PDFBox and I have had some luck with extracting
text by area. I have run into a slight problem, and after searching I am
still unable to find anything that might help.

The PDFs I am processing are a generic form. Most of the time the text I
want is in the same location, but occasionally the text is shifted down by
one line if one of the fields runs longer than expected. Is there a way to
locate the x,y coordinates of a field's name so that I can get around this
problem?

To illustrate my problem:

A single-page PDF has three fields, each separated by a blank line
(imagine a Word document output). The three fields are Name, Address, Hobby.
In most cases the address field will be three lines long (street,
city/state, zip), but occasionally there will be a fourth line (apt num,
floor, etc) present. When the fourth line is present, the hobby field is
shifted down by one line. Is there a way to find the location of the hobby
string on the page so that the location is no longer hard coded into the
program, but is a variable?
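
To make the goal concrete, here is a rough sketch of what I would like to
end up with. It does not use PDFBox at all: PositionedLine, findLabel and
the coordinates are made up for illustration, and getting the real
positions out of the PDF is exactly the part I do not know how to do.

```java
import java.util.List;
import java.util.Optional;

public class FieldLocator {

    /** One line of extracted text plus its position on the page (hypothetical type). */
    static final class PositionedLine {
        final String text;
        final double x;
        final double y;

        PositionedLine(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Returns the first line whose trimmed text starts with the given label, if any. */
    static Optional<PositionedLine> findLabel(List<PositionedLine> lines, String label) {
        return lines.stream()
                .filter(line -> line.text.trim().startsWith(label))
                .findFirst();
    }

    public static void main(String[] args) {
        // Made-up data for the four-line address case: the Hobby line sits
        // one line lower than it would with a three-line address.
        List<PositionedLine> lines = List.of(
                new PositionedLine("Name: J. Doe", 72, 100),
                new PositionedLine("Address: 1 Example St", 72, 128),
                new PositionedLine("Apt 4", 72, 142),
                new PositionedLine("Sometown, ST", 72, 156),
                new PositionedLine("00000", 72, 170),
                new PositionedLine("Hobby: Chess", 72, 198));

        // The y coordinate is looked up at run time instead of being hard coded.
        findLabel(lines, "Hobby")
                .ifPresent(hobby -> System.out.println("Hobby label at y=" + hobby.y));
    }
}
```

With the label's y value in hand, the extraction area for the hobby text
could be positioned relative to it rather than at a fixed offset.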

Thanks for your help,
James Vines
