getText() performance in PDFBox 1.5 release

Zhang, Lisheng Fri, 04 Nov 2011 09:24:17 -0700

Hi,
 
I have been using PDFBox to extract text from PDF files for full-text search
for a few years, and have found it to be a great product. I recently
downloaded PDFBox 1.5 and found that it can extract text from many PDF files
that could not be processed previously. Thanks!
 
The problem I have is that PDFTextStripper.getText(..) takes a long time to
finish. For example, our client has a 27MB PDF file that contains some
graphics; getText(..) took 50 minutes to finish, even though it eventually
extracted only 100K of text.
 
I tried changing the input parameters, but the results were essentially the
same. I would like to know whether this speed is expected and whether it can
be improved.
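
For anyone who wants to narrow down where the time goes, a minimal sketch of
timing the extraction one page at a time might look like the following. It is
not the code used for the measurement above. The names PageTimer,
PageExtractor and PageTiming are hypothetical, and the extractor hook stands
in for whatever call actually produces the text of a single page.

```java
import java.util.ArrayList;
import java.util.List;

/** Times text extraction one page at a time to show where the time goes. */
final class PageTimer {

    /** Hypothetical hook: returns the text of a single 1-based page. */
    interface PageExtractor {
        String extract(int page) throws Exception;
    }

    /** Result for one page: page number, characters extracted, elapsed ms. */
    static final class PageTiming {
        final int page;
        final int chars;
        final long millis;

        PageTiming(int page, int chars, long millis) {
            this.page = page;
            this.chars = chars;
            this.millis = millis;
        }
    }

    /** Calls the extractor for pages 1..pageCount and records each duration. */
    static List<PageTiming> timePages(PageExtractor extractor, int pageCount)
            throws Exception {
        List<PageTiming> timings = new ArrayList<PageTiming>();
        for (int page = 1; page <= pageCount; page++) {
            long start = System.nanoTime();
            String text = extractor.extract(page);
            long millis = (System.nanoTime() - start) / 1000000L;
            int chars = (text == null) ? 0 : text.length();
            timings.add(new PageTiming(page, chars, millis));
        }
        return timings;
    }
}
```

With PDFBox 1.x, the extractor could, I believe, be backed by a
PDFTextStripper by calling setStartPage(page) and setEndPage(page) before
getText(document), with the page count taken from
PDDocument.getNumberOfPages(). A page whose time is far out of proportion to
its character count would be a candidate for closer inspection.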
 
Thanks very much for any help, Lisheng
