Ben Litchfield wrote:
Different PDFs will exhibit different extraction speeds because of the way
that PDF documents are structured.
Yes, I am aware of that - this is the reason I picked PDFs containing
only text, arranged in one column. Anyway, there probably are lots of
different
Hi,
I have a serious performance problem while extracting text from pdf.
Here is the code (w/o try/catch blocks):
File file = new File("test.pdf");
FileInputStream reader = new FileInputStream(file);
PDFParser parser = new PDFParser(reader);
parser.parse();
PDDocument pdDoc =
as the
size of the file grows.
Try that first, and then rebenchmark.
Cheers
Paul Smith
-Original Message-
From: Miroslaw Milewski [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 29, 2004 7:24 AM
To: [EMAIL PROTECTED]
Subject: pdfbox performance.
Hi,
I have a serious performance
Paul Smith wrote:
The first thing that I would do is wrap the FileInputStream with a
BufferedInputStream.
Change:
FileInputStream reader = new FileInputStream(file);
To:
InputStream reader = new BufferedInputStream(new FileInputStream(file));
You get a significant boost reading in
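For anyone who wants to check the effect of the wrapper on their own machine, here is a minimal sketch. It does not use PDFBox at all; it only times byte-at-a-time reads of a temporary file through a plain FileInputStream and through a BufferedInputStream. The class name BufferedReadSketch, the helper names, and the 256 KiB file size are made up for illustration, and the assumption that the parser reads in small chunks is mine, not something confirmed in this thread.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of PDFBox.
public class BufferedReadSketch {

    // Reads the stream one byte at a time and returns the number of bytes seen.
    // Single-byte reads are the pattern where an unbuffered stream costs the most.
    public static long countBytes(InputStream in) throws IOException {
        long n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    // Times one full pass over the file, with or without the buffering wrapper.
    public static long timeMillis(Path path, boolean buffered) throws IOException {
        long start = System.nanoTime();
        try (InputStream in = buffered
                ? new BufferedInputStream(new FileInputStream(path.toFile()))
                : new FileInputStream(path.toFile())) {
            countBytes(in);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative test file: 256 KiB of zero bytes.
        Path tmp = Files.createTempFile("buffered-read-sketch", ".bin");
        try {
            Files.write(tmp, new byte[256 * 1024]);
            System.out.println("unbuffered ms: " + timeMillis(tmp, false));
            System.out.println("buffered ms:   " + timeMillis(tmp, true));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The absolute numbers depend on the OS and disk cache, so treat them as a rough indication only. To benchmark the real case, time the original extraction code with and without the wrapper on the same PDF.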
Different PDFs will exhibit different extraction speeds because of the way
that PDF documents are structured.
I assume you are using the latest version, 0.6.6. Could you give 0.6.5 a
try and see if you notice faster speeds?
Ben
On Thu, 29 Jul 2004, Miroslaw Milewski wrote:
Paul Smith wrote: