Hi Spud:

At least not for me. The TextPosition object appears to have methods
that let you work with the font, and such an object is said to be
available from the PDFTextStripper class. However, I have tried both
writing code and reading through the source, and I am now doubly
convinced that things do not work that way. I posted my questions about
this here, but so far nobody has been able to give an answer.

Best,

Felix


spud wrote:
A few months ago I was trying to extract formatted text from a PDF
and output it in a structured format (ideally XML/HTML). The text
attributes I required to be available for each line of text were:

- Paragraph (i.e. relative location on page)
- Font
- Font size
- Font weight
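
To make the target output concrete, here is a minimal sketch in plain
Java of a per-line record carrying the four attributes above and
writing them out as XML. It does not use PDFBox at all; the class name
StyledLine, its fields, the element and attribute names, and the
reduction of font weight to bold/normal are all assumptions made for
illustration, and filling the fields from a PDF is the part still left
open.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical holder for the per-line attributes listed above. */
final class StyledLine {
    final int paragraph;   // assumed: index of the paragraph on the page
    final String font;     // font name as reported by the extractor
    final float fontSize;  // size in points
    final boolean bold;    // assumed: weight reduced to bold / not bold
    final String text;     // the text of the line itself

    StyledLine(int paragraph, String font, float fontSize,
               boolean bold, String text) {
        this.paragraph = paragraph;
        this.font = font;
        this.fontSize = fontSize;
        this.bold = bold;
        this.text = text;
    }

    /** Escapes the characters that are special in XML text and attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Renders this line as a single XML element. */
    String toXml() {
        return String.format(Locale.ROOT,
            "<line paragraph=\"%d\" font=\"%s\" size=\"%.1f\" weight=\"%s\">%s</line>",
            paragraph, escape(font), fontSize,
            bold ? "bold" : "normal", escape(text));
    }

    /** Wraps a list of lines in a hypothetical page element. */
    static String toXml(List<StyledLine> lines) {
        StringBuilder sb = new StringBuilder("<page>\n");
        for (StyledLine line : lines) {
            sb.append("  ").append(line.toXml()).append('\n');
        }
        return sb.append("</page>").toString();
    }
}
```

A line built as new StyledLine(1, "Times-Bold", 12f, true, "Intro")
would be written as one line element with paragraph, font, size and
weight attributes.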

I tried to do this with PDFBox at the time but was unable to. I posted
to the mailing list and was told this functionality was not available
yet, and I would have to implement it myself. I didn't have the time
(and possibly the ability) to do this, so I went with a commercial
tool.

Has PDFBox now moved on enough for it to be able to do the above out
of the box (no pun intended!)?

Thanks.
