PDFBox to pdftohtml comparison?

Mark Kerzner Thu, 23 Jul 2009 12:13:47 -0700

Hi,
I have compared PDFBox's PDF-to-text conversion with running pdftohtml (on
Linux) and then converting the HTML to text, and I found the second approach
a little clearer. For example, the bottom lines of a PDF (copyright notices,
etc.) were combined into one line by the PDFBox conversion, but came out as
three separate pieces with the other approach.


I am using the last stable PDFBox jar, which dates back to 2006, and the
pdftohtml utility is from about the same time, so I can understand this.

My question then is twofold: does the comparison make sense, and should I
use pdftohtml combined with a text converter, or should I try to build the
latest PDFBox from SVN?

Thank you,
Mark
