Re: Convert PDF to HTML with PDFBox in a Java app - Need some introductory info & guidance

2018-02-03 Thread Tilman Hausherr
On 02.02.2018 at 09:04, Serban Alexe wrote: Thanks for the hints, I'll look into both of them. I'm aware that it's not possible to obtain something that looks like the original PDF; I'm rather aiming for something as close as possible, at least from the content perspective. *As an alternative*

Re: Convert PDF to HTML with PDFBox in a Java app - Need some introductory info & guidance

2018-02-02 Thread Thad Humphries
To create an image from a PDF, look at the code for the PDFToImage command line utility (https://pdfbox.apache.org/2.0/commandline.html#pdftoimage). I adapted this to convert PDFs to images. From a quick glance at my old code, I think you want org.apache.pdfbox.rendering.PDFRenderer. On Fri, Feb 2
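
For reference, a minimal sketch of the PDFRenderer approach mentioned above, assuming PDFBox 2.0.x on the classpath. The class name PdfToImages, the method renderPages and the output file naming are made up for illustration; they are not taken from the PDFToImage utility itself.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

// Hypothetical helper class, not part of PDFBox.
public class PdfToImages {

    // Renders every page of the PDF to a PNG file in outDir and returns the files written.
    static List<File> renderPages(File pdf, File outDir, float dpi) throws IOException {
        if (!outDir.isDirectory() && !outDir.mkdirs()) {
            throw new IOException("Cannot create output directory: " + outDir);
        }
        List<File> written = new ArrayList<>();
        // PDDocument.load(File) is the PDFBox 2.0.x way of opening a document.
        try (PDDocument document = PDDocument.load(pdf)) {
            PDFRenderer renderer = new PDFRenderer(document);
            for (int i = 0; i < document.getNumberOfPages(); i++) {
                // Page indexes are zero-based; file names start at 1 (naming is an assumption).
                BufferedImage image = renderer.renderImageWithDPI(i, dpi, ImageType.RGB);
                File out = new File(outDir, "page-" + (i + 1) + ".png");
                ImageIO.write(image, "png", out);
                written.add(out);
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PdfToImages input.pdf output-directory
        List<File> files = renderPages(new File(args[0]), new File(args[1]), 150f);
        System.out.println("Wrote " + files.size() + " image(s)");
    }
}
```

Each page ends up as one image, which could then be referenced from an HTML page with an img tag if a purely visual copy is enough.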

Re: Convert PDF to HTML with PDFBox in a Java app - Need some introductory info & guidance

2018-02-02 Thread Serban Alexe
Thanks for the hints, I'll look into both of them. I'm aware that it's not possible to obtain something that looks like the original PDF; I'm rather aiming for something as close as possible, at least from the content perspective. *As an alternative* I could settle for a solution that extracts ea

Re: Convert PDF to HTML with PDFBox in a Java app - Need some introductory info & guidance

2018-02-01 Thread Jason Harrop
https://github.com/FitLayout/PDFAnalyzer is promising On 2 Feb. 2018 3:31 am, "Serban Alexe" wrote: > Hello everybody, > > I need to write a Java class that converts a *.pdf* document to the html > format, preferably keeping the original formatting to the best extent > possible. > Also, I need t

Re: Convert PDF to HTML with PDFBox in a Java app - Need some introductory info & guidance

2018-02-01 Thread Tilman Hausherr
Hi, Please have a look at the PDFText2HTML class in the source code download. There is also an ExtractImages and a PrintImageLocations class, but each of them works on its own... you'll never get something like a PDF because PDF and HTML are really two different things. Tilman On 01.02.2018 at 17:1
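
A rough sketch of how PDFText2HTML might be called from your own code, assuming PDFBox 2.0.x with the pdfbox-tools artifact on the classpath (that is where org.apache.pdfbox.tools.PDFText2HTML lives). The wrapper class PdfToHtml and the method toHtml are hypothetical names for illustration. The result is text-only HTML; images and layout are not carried over.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.tools.PDFText2HTML;

// Hypothetical wrapper class, not part of PDFBox.
public class PdfToHtml {

    // Extracts the text of the PDF and returns it wrapped in simple HTML markup.
    static String toHtml(File pdf) throws IOException {
        try (PDDocument document = PDDocument.load(pdf)) {
            // PDFText2HTML extends PDFTextStripper, so getText(...) drives the conversion.
            PDFText2HTML stripper = new PDFText2HTML();
            return stripper.getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PdfToHtml input.pdf output.html
        String html = toHtml(new File(args[0]));
        Files.write(Paths.get(args[1]), html.getBytes(StandardCharsets.UTF_8));
    }
}
```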