Hi,
I have been using PDFBox to extract text from PDF files for full-text search
for a few years and have found it to be a great product. Recently I downloaded
PDFBox 1.5 and found that it can extract text from many PDF files that could
not be processed previously. Thanks!
The problem I have is that PDFTextStripper.getText(..) takes a long time to
finish. For example, our client has a 27MB PDF file that contains some
graphics; getText(..) took 50 minutes to finish, even though it extracted
only 100K of text in the end.
I tried changing the input parameters, but the results were essentially the
same. I would like to know whether this speed is expected and whether it can
be improved.
Thanks very much for your help,
Lisheng