Alexander Meyer wrote:
There is this
command line utility ExtractText coming (soon, the website says) with
PDFBox. If this tool were able to extract not just the text but
also its position, I think it might solve the problem.

Internally, it already represents the needed information (the Java class exists). It prints text fragments; the printed objects are TextPositions, which carry the represented text and its position on the page as attributes.
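To illustrate why those position attributes matter, here is a self-contained sketch. It does not use PDFBox itself. It models each fragment as a hypothetical `Fragment` record (text, x, y), loosely mirroring what a TextPosition is described as carrying above, and uses the coordinates to rebuild reading order by grouping fragments into lines. The `Fragment` type, the top-left origin and the 2-point line tolerance are illustrative assumptions, not PDFBox's API. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PositionedText {

    // Hypothetical fragment type (NOT PDFBox's TextPosition): the text plus
    // where it sits on the page. Assumes a top-left origin, y growing downward.
    public record Fragment(String text, float x, float y) {}

    // Groups fragments into lines: a fragment joins the current line if its y
    // is within `tolerance` of that line's first fragment; each line is then
    // ordered left to right by x.
    public static List<List<Fragment>> groupIntoLines(List<Fragment> fragments, float tolerance) {
        List<Fragment> byY = new ArrayList<>(fragments);
        byY.sort(Comparator.comparingDouble(Fragment::y));
        List<List<Fragment>> lines = new ArrayList<>();
        List<Fragment> current = new ArrayList<>();
        float lineY = 0f;
        for (Fragment f : byY) {
            if (!current.isEmpty() && Math.abs(f.y() - lineY) > tolerance) {
                lines.add(current);          // close the previous line
                current = new ArrayList<>();
            }
            if (current.isEmpty()) {
                lineY = f.y();               // first fragment fixes the line's y
            }
            current.add(f);
        }
        if (!current.isEmpty()) {
            lines.add(current);
        }
        for (List<Fragment> line : lines) {
            line.sort(Comparator.comparingDouble(Fragment::x));
        }
        return lines;
    }

    // Rebuilds plain text in reading order, one output line per detected line.
    public static String toPlainText(List<Fragment> fragments, float tolerance) {
        StringBuilder sb = new StringBuilder();
        for (List<Fragment> line : groupIntoLines(fragments, tolerance)) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            for (int i = 0; i < line.size(); i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append(line.get(i).text());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Fragments deliberately out of order, as a content stream may emit them.
        List<Fragment> frags = List.of(
                new Fragment("world", 120f, 100.4f),
                new Fragment("Second", 72f, 114f),
                new Fragment("Hello", 72f, 100f),
                new Fragment("line", 130f, 114.2f));
        // Print each fragment with its position, as the text-plus-position idea suggests.
        for (Fragment f : frags) {
            System.out.printf("%s at (%.1f, %.1f)%n", f.text(), f.x(), f.y());
        }
        System.out.println(toPlainText(frags, 2f));
    }
}
```

With real PDFBox output, the fragments would come from its TextPosition objects instead; the accessor names for the character and coordinates have changed between PDFBox versions, so they should be checked against the version in use.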

The problems mentioned in my earlier post and in this thread still remain.

Regards,
 Roman
