Hi,
PDFBox doesn't offer much for this. Apache Tika (which uses PDFBox) has
better support for tables.
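If you stay with PDFBox, one possible way to combine the two extractions is to
record a page number and position for every text run and every image, then sort
them into reading order before writing the HTML. Below is a minimal sketch in
plain Java, not PDFBox API: the class PositionedHtmlMerge, the Item holder and
toHtml are hypothetical names, and it assumes the coordinates have already been
converted to a top-left origin in points (PDF user space starts at the bottom
left, so positions taken from a transformation matrix would need flipping).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class PositionedHtmlMerge {

    /** Hypothetical holder for one extracted item: a text run or an image. */
    public static final class Item {
        public final int page;        // page number
        public final float x;         // left edge in points, top-left origin (assumed)
        public final float y;         // top edge in points, top-left origin (assumed)
        public final boolean image;   // true: content is an image file name
        public final String content;  // text of the run, or image file name

        public Item(int page, float x, float y, boolean image, String content) {
            this.page = page;
            this.x = x;
            this.y = y;
            this.image = image;
            this.content = content;
        }
    }

    /** Escapes the characters that are special in HTML text and attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Sorts by page, then top to bottom, then left to right, and emits one div per page. */
    public static String toHtml(List<Item> items) {
        List<Item> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingInt((Item i) -> i.page)
                .thenComparingDouble(i -> i.y)
                .thenComparingDouble(i -> i.x));

        StringBuilder html = new StringBuilder();
        boolean pageOpen = false;
        int currentPage = 0;
        for (Item item : sorted) {
            if (!pageOpen || item.page != currentPage) {
                if (pageOpen) {
                    html.append("</div>\n");
                }
                html.append("<div class=\"page\" style=\"position:relative\">\n");
                pageOpen = true;
                currentPage = item.page;
            }
            // Absolute positioning keeps each item where it was on the page.
            String style = String.format(Locale.ROOT,
                    "position:absolute;left:%.1fpt;top:%.1fpt", item.x, item.y);
            if (item.image) {
                html.append("<img src=\"").append(escape(item.content))
                    .append("\" style=\"").append(style).append("\">\n");
            } else {
                html.append("<span style=\"").append(style).append("\">")
                    .append(escape(item.content)).append("</span>\n");
            }
        }
        if (pageOpen) {
            html.append("</div>\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        items.add(new Item(1, 72f, 300f, false, "Caption under the picture"));
        items.add(new Item(1, 72f, 100f, true, "page1-img1.png"));
        items.add(new Item(1, 72f, 50f, false, "Heading"));
        System.out.print(toHtml(items));
    }
}
```

The text extractor and the image extractor would each fill the same list, so
the HTML writer never needs to know where an item came from. This only covers
placement; it does nothing to detect tables.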
Tilman
On 16.05.2025 12:47, Mathias Hultman wrote:
Hi!
I am trying to use PDFBox to convert a number of PDFs to HTML. As far as
possible, the output should look like the PDF, with the structure of images,
tables, and text intact. But I'm running into problems, and I find the
documentation somewhat lacking.
I've managed to extract all the text from the PDF, and I've managed to extract
all the images by extending PDFStreamEngine. Now I want to merge the two into
one application that takes the placement of the images relative to the text
into account. Can anyone please help me out?
Regards,
Mathias Hultman