Hi! I am trying to use PDFBox to convert a number of PDFs into HTML. The result should, as far as possible, look like the original PDF, with the structure of images, tables, and text intact. But I'm running into problems when trying to accomplish this, and I find the documentation somewhat lacking. I've managed to extract all the text from the PDF, and I've managed to extract all the images by extending PDFStreamEngine. Now I want to merge these two into the same application, so that the placement of the images relative to the text is taken into account. Can anyone please help me out?
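To make the goal concrete, here is a rough sketch of the merge step I have in mind. It does not call PDFBox at all: the `LayoutMerge` class, the `Item` type, and the `toHtml` method are hypothetical names made up for illustration. It assumes that both extractors can report a page number and an x/y position in points for each piece of text or image, and that y has already been converted to a top-left origin (PDF user space normally starts at the bottom-left).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class LayoutMerge {
    // Hypothetical container for one extracted piece of content.
    // x and y are assumed to be in points from the top-left of the page.
    static final class Item {
        final int page;
        final float x;
        final float y;
        final boolean image;   // true: content is an image file name; false: content is text
        final String content;

        Item(int page, float x, float y, boolean image, String content) {
            this.page = page;
            this.x = x;
            this.y = y;
            this.image = image;
            this.content = content;
        }
    }

    // Minimal escaping so extracted text cannot break the markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Merge text and image items into one HTML string, one <div> per page,
    // with every item absolutely positioned at its reported coordinates.
    static String toHtml(List<Item> items) {
        List<Item> sorted = new ArrayList<>(items);
        // Order by page, then top to bottom, then left to right.
        sorted.sort(Comparator.comparingInt((Item i) -> i.page)
                .thenComparingDouble(i -> i.y)
                .thenComparingDouble(i -> i.x));

        StringBuilder html = new StringBuilder();
        int currentPage = -1;
        for (Item it : sorted) {
            if (it.page != currentPage) {
                if (currentPage != -1) {
                    html.append("</div>\n");
                }
                html.append("<div class=\"page\" style=\"position:relative\">\n");
                currentPage = it.page;
            }
            String style = String.format(Locale.ROOT,
                    "position:absolute;left:%.1fpt;top:%.1fpt", it.x, it.y);
            if (it.image) {
                html.append("<img src=\"").append(escape(it.content))
                    .append("\" style=\"").append(style).append("\">\n");
            } else {
                html.append("<span style=\"").append(style).append("\">")
                    .append(escape(it.content)).append("</span>\n");
            }
        }
        if (currentPage != -1) {
            html.append("</div>\n");
        }
        return html.toString();
    }
}
```

What I am missing is the PDFBox side: how to collect the text and the images in a single pass, with coordinates, so that a list like this can be filled in. A real page `<div>` would also need a width and height taken from the PDF page size, and tables are not handled here at all.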
Regards, Mathias Hultman