To render the pages of a PDF as images, look at the source code of the PDFToImage command line utility (https://pdfbox.apache.org/2.0/commandline.html#pdftoimage). I adapted that code for my own PDF-to-image conversion. From a quick glance at my old code, I think the class you want is org.apache.pdfbox.rendering.PDFRenderer.
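Since you also mentioned embedding images as base64 in the HTML, here is a rough sketch of that last step. It is not taken from my old code. It uses only the JDK, and the class name PageImageHtml and the method toImgTag are made up for illustration. The commented part shows how I would expect the page images to come out of PDFBox 2.0 (PDFRenderer.renderImageWithDPI returns a BufferedImage); the file name and the 150 DPI value there are placeholders.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import javax.imageio.ImageIO;

// Hypothetical helper: turns one rendered page into an inline HTML <img> tag.
class PageImageHtml {

    // With PDFBox 2.0 on the classpath, the pages could be obtained roughly like this
    // (input.pdf and 150 DPI are placeholder values):
    //
    //   try (PDDocument doc = PDDocument.load(new File("input.pdf"))) {
    //       PDFRenderer renderer = new PDFRenderer(doc);
    //       for (int i = 0; i < doc.getNumberOfPages(); i++) {
    //           BufferedImage page = renderer.renderImageWithDPI(i, 150, ImageType.RGB);
    //           html.append(PageImageHtml.toImgTag(page, "Page " + (i + 1)));
    //       }
    //   }

    /** Encodes the image as PNG and wraps it in an img tag with a base64 data URI. */
    static String toImgTag(BufferedImage image, String altText) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            // ImageIO.write returns false when no writer exists for the format.
            if (!ImageIO.write(image, "png", buffer)) {
                throw new IllegalStateException("No PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String base64 = Base64.getEncoder().encodeToString(buffer.toByteArray());
        // Note: altText is inserted as-is; escape it if it can contain quotes or markup.
        return "<img src=\"data:image/png;base64," + base64 + "\" alt=\"" + altText + "\">";
    }

    public static void main(String[] args) {
        // Stand-in for a rendered page: a small blank image.
        BufferedImage page = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        System.out.println(toImgTag(page, "Page 1"));
    }
}
```

One image per page at a reasonable DPI keeps the HTML self-contained, at the cost of a large file and no selectable text.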
On Fri, Feb 2, 2018 at 3:04 AM, Serban Alexe <serban.al...@gmail.com> wrote:

> Thanks for the hints, I'll look into both of them.
>
> I'm aware that it's not possible to obtain something that looks like the
> original PDF, I'm rather aiming for something as close as possible, at
> least from the content perspective.
>
> *As an alternative* I could settle for a solution that extracts each page
> from the pdf as an individual image. What options would I have in this case?
>
> Thanks.
>
> On 2018/02/01 16:14:00, Serban Alexe <s...@gmail.com> wrote:
> > Hello everybody,
> >
> > I need to write a Java class that converts a *.pdf* document to the html
> > format, preferably keeping the original formatting to the best extent
> > possible.
> > Also, I need to be able to extract the images (and preferably encode them
> > as base64 in the html file).
> >
> > *Can you please provide me some useful starting points and/or examples?*
> >
> > Through google search, I was able to find some limited functionality
> > examples. None of these deal with images, and also my guess is that they
> > refer to some older version of the PDFBox suite...
> >
> > Thank you,
> >
> > Serban

--
"Hell hath no limits, nor is circumscrib'd
In one self-place; but where we are is hell,
And where hell is, there must we ever be"
--Christopher Marlowe, *Doctor Faustus* (v. 121-24)