Kristian,
I assume all of your comments refer to the 0.7.0 version of PDFBox. That
version brought some great improvements in terms of speed and
accuracy.
> That's curious, because we found that pdftotext was able to
> convert 33% more PDF documents than PDFBox.
Depending on the se
Hi Christiaan
Just to defend PDFBox: we actually recently decided to move in the
opposite direction.
I didn't want to offend PDFBox *g*
We just removed pdftotext from our application and are now using PDFBox
0.7.0 for all our PDF processing. Previously we used both in
parallel: pdftotext for
Kristian Hermsdorf wrote:
We're using pdftotext as well, because PDFBox is really slow. If your
application has to work under Windows, you will probably experience some
mysterious Java VM crashes while executing external processes in batch mode.
(This is because of a bug in Windows-VM... we implemen
Hi
I have a problem running a conversion tool that converts a PDF to
HTML under Linux. I have an executable, "pdftohtml", which works
correctly in batch mode, and it also works when I call it through Java under
Windows 2000, BUT it does not work at all on the server under
Linux. I
Check out http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-traps.html
which provides some pointers and code which should be helpful.
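
A common trap when launching an external tool from Java is leaving the child's output unread: once the pipe buffer fills, the child blocks and the Java side appears to hang. Below is a minimal sketch of one way to avoid that, assuming a Java version with ProcessBuilder and try-with-resources. The class name ExternalTool and its Result type are hypothetical names made up for this illustration; they are not taken from the article or from any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

// Hypothetical helper for illustration only.
final class ExternalTool {

    /** Exit code plus everything the child wrote to stdout and stderr. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /** Runs the command, drains its output fully, then waits for it to exit. */
    static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so a single reader drains both streams.
        builder.redirectErrorStream(true);
        Process process = builder.start();

        // We send no input; closing stdin gives the child an immediate EOF.
        process.getOutputStream().close();

        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] buffer = new byte[4096];
            int n;
            // Read until EOF so the child never blocks on a full pipe.
            while ((n = in.read(buffer)) != -1) {
                captured.write(buffer, 0, n);
            }
        }

        // Only after the output is drained is it safe to wait for the exit code.
        int exitCode = process.waitFor();
        return new Result(exitCode, captured.toString());
    }
}
```

A call might look like ExternalTool.run(Arrays.asList("pdftohtml", "input.pdf", "output")), where the file names are placeholders. Passing the arguments as a list rather than one command string avoids platform-specific quoting differences, and logging the captured output and exit code usually shows why a tool that works on one machine fails on another (missing executable on the PATH, permissions, wrong working directory).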
Cheers,
Kelvin
http://www.supermind.org
On Mon, 31 Jan 2005 19:01:11 +0100, Bertrand VENZAL wrote:
> Hi all,
>
> I ve a kind of problem to execute a convertin
I will assume you are asking this question on the Lucene mailing list
because you now want to index that PDF document.
Have you tried PDFBox? It can't create an HTML file for you, but it can
extract text.
Ben
http://www.pdfbox.org
On Mon, 31 Jan 2005, Bertrand VENZAL wrote:
> Hi all,
>
> I v