Re: pdfbox performance.

2004-07-29 Thread Miroslaw Milewski
Ben Litchfield wrote: Different PDFs will exhibit different extraction speeds because of the way that PDF documents are structured. Yes, I am aware of that - this is the reason I picked PDFs containing only text, arranged in one column. Anyway, there probably are lots of different

pdfbox performance.

2004-07-28 Thread Miroslaw Milewski
Hi, I have a serious performance problem while extracting text from a PDF. Here is the code (w/o try/catch blocks): File file = new File("test.pdf"); FileInputStream reader = new FileInputStream(file); PDFParser parser = new PDFParser(reader); parser.parse(); PDDocument pdDoc =

RE: pdfbox performance.

2004-07-28 Thread Paul Smith
as the size of the file grows. Try that first, and then rebenchmark. Cheers, Paul Smith -----Original Message----- From: Miroslaw Milewski [mailto:[EMAIL PROTECTED] Sent: Thursday, July 29, 2004 7:24 AM To: [EMAIL PROTECTED] Subject: pdfbox performance. Hi, I have a serious performance

Re: pdfbox performance.

2004-07-28 Thread Miroslaw Milewski
Paul Smith wrote: The first thing that I would do is wrap the FileInputStream with a BufferedInputStream. Change: FileInputStream reader = new FileInputStream(file); To: InputStream reader = new BufferedInputStream(new FileInputStream(file)); You get a significant boost reading in
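
The change Paul Smith describes can be sketched in self-contained form roughly as follows. This is only an illustration using java.io: the class name BufferedReadSketch and its two helper methods are hypothetical, and the PDFParser step from the original post is left out so that the example runs without PDFBox on the classpath. The byte-at-a-time loop stands in for a parser that reads small amounts at a time, which is the access pattern where buffering typically helps.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, not part of PDFBox.
class BufferedReadSketch {

    // Opens the file as suggested in the reply: the FileInputStream is
    // wrapped in a BufferedInputStream so that small reads are served
    // from an in-memory buffer instead of going to the file each time.
    static InputStream openBuffered(File file) throws IOException {
        return new BufferedInputStream(new FileInputStream(file));
    }

    // Reads the stream one byte at a time and returns the number of
    // bytes read. Closing the wrapper also closes the underlying stream.
    static long countBytes(File file) throws IOException {
        try (InputStream in = openBuffered(file)) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }
}
```

In the code from the original post, the stream returned by something like openBuffered(file) would be passed to the parser in place of the bare FileInputStream. How much this helps depends on how the parser reads its input, so it is worth rebenchmarking after the change.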

Re: pdfbox performance.

2004-07-28 Thread Ben Litchfield
Different PDFs will exhibit different extraction speeds because of the way that PDF documents are structured. I assume you are using the latest version, 0.6.6; could you give 0.6.5 a try and see if you notice faster speeds? Ben On Thu, 29 Jul 2004, Miroslaw Milewski wrote: Paul Smith wrote: