Extract text and images into html version of pdf

Mathias Hultman Fri, 16 May 2025 03:48:25 -0700

Hi!

I am trying to use PDFBox to convert a number of PDFs into HTML. The result 
should look as much like the PDF as possible, with the structure of 
images, tables, and text intact. But I'm running into problems when trying to 
accomplish this, and I find the documentation somewhat lacking.
I've managed to extract all the text from the PDF, and I've managed to extract 
all the images by extending PDFStreamEngine. Now I want to ‘merge’ these two 
into the same application, taking into account the placement of the pictures 
relative to the text. Can anyone please help me out?


Regards,
Mathias Hultman
