public void save(File file) throws IOException
{
    save(new BufferedOutputStream(new FileOutputStream(file)));
}
so it is more efficient than
    save(OutputStream output)
which writes to whatever stream it is given, without adding a buffer. See also
https://issues.apache.org/jira/browse/PDFBOX-3121
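
To illustrate why the extra BufferedOutputStream can matter, here is a minimal
sketch that uses only the JDK. It is not PDFBox code: CountingOutputStream,
writeSmallChunks and the other names are hypothetical, and it assumes the
writer emits its output in many small pieces, which this thread does not
confirm for PDFBox. It counts how many write calls reach the underlying stream
with and without a buffer in between.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Illustration only: counts the write calls that reach the underlying stream. */
final class BufferedSaveSketch {

    /** Hypothetical stand-in for a FileOutputStream; it only counts calls and bytes. */
    static final class CountingOutputStream extends OutputStream {
        long calls;
        long bytes;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    /** Assumption: mimics a writer that emits its output one byte at a time. */
    static void writeSmallChunks(OutputStream out, int count) throws IOException {
        for (int i = 0; i < count; i++) {
            out.write('x');
        }
        out.flush();
    }

    /** Write calls seen by the sink when the caller passes it in directly. */
    static long callsUnbuffered(int count) {
        CountingOutputStream sink = new CountingOutputStream();
        try {
            writeSmallChunks(sink, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    /** Write calls seen by the sink when it is wrapped in a BufferedOutputStream. */
    static long callsBuffered(int count) {
        CountingOutputStream sink = new CountingOutputStream();
        try {
            writeSmallChunks(new BufferedOutputStream(sink), count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered calls: " + callsUnbuffered(10_000));
        System.out.println("buffered calls:   " + callsBuffered(10_000));
    }
}
```

With a real FileOutputStream each of those calls goes to the operating system,
which may be why slow or networked drives suffer most. If you need to keep
using save(OutputStream), wrapping the stream yourself in the same way as
save(File) does above should give the same effect.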
Tilman
On 21.03.2016 at 20:58, Stahle, Patrick wrote:
Hi John / Tilman,
I have narrowed it down to how PDDocument.save() is called: the problem occurs
when I pass a FileOutputStream, and it does not occur when I pass a Java File
instead. We have only been able to reproduce it on some larger PDF files, and
it seems to happen only in certain environments. On my Linux virtual machine I
have not been able to reproduce it at all; it does show up on Windows and on a
Solaris server (3PAR drive cluster). I have some simple sample code that
reproduces the problem, but I don't think I can send you the 2 PDF files I
have at hand. One is a 3D PDF of ours (TE Classified) and the other,
ironically, is the iText v1 manual in PDF form. The difference in times is
drastic: on Windows, saving the 3D PDF takes about 3 seconds with the Java
File class vs. 29 seconds with the FileOutputStream. The iText manual is not
as bad, at 2 vs. 20 seconds.
Anyway, we have a workaround: we changed our code to pass a Java File to
PDFBox. If I can find a suitable PDF that reproduces the problem, I will send
it your way.
Thanks,
Patrick
-----Original Message-----
From: John Hewson [mailto:[email protected]]
Sent: Friday, March 18, 2016 4:45 PM
To: [email protected]
Subject: Re: Strange performance problem with certain PDF files
On 18 Mar 2016, at 12:01, Stahle, Patrick <[email protected]> wrote:
Hi all,
I am running into a lot of strange performance issues with certain PDF files.
Background info:
The strange thing is that I can't reproduce this consistently across
environments, although within a particular environment the behavior for a
given PDF seems consistent. I do most of my development inside a VirtualBox
virtual machine running Fedora. The PDF files I am having problems with never
show performance issues when run on my virtual machine's local drive, but if I
use a VirtualBox shared drive as the source / destination for the PDF, I see
the problem. A co-worker working in a pure Windows environment experiences the
performance problem too. We are also seeing the same issue on our dev Solaris
servers. The performance range can be quite drastic: in my local environment,
one of our 3D PDFs (12 MB) can be opened, stamped with some text, encrypted,
and saved in around 8 seconds. Doing the same job pointing to a VirtualBox
shared drive, or on our Solaris server, takes minutes. In my co-worker's
Windows environment it takes around 30 seconds. We have really only reproduced
this consistently with the 12 MB 3D PDF. I have a much smaller PDF (non-3D,
converted from MS Office) that shows a similar performance issue, but there
the times range from 200 ms locally to 8 seconds.
You need to isolate the problem, you’ve got too many variables to make any
sense of it all. Get a reproducible problem on one, non-virtualised JVM first.
— John
The one thing the 2 files have in common is that I see a lot of the following
messages on the console. This output is from the 12 MB 3D PDF file:
:
:
1787 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser -
parsed=COSObject{13166, 0}
These messages seem to be logged during the PDDocument.open and, from what I
can tell, I get 13,166 of them for this example PDF.
The slowness does not happen until the following line:
document.save(outputPDFStream);
With other PDFs, including some quite large ones, I see neither this
performance issue nor those log messages.
I know this is not much to go on. I am working on isolating this down to
something more concrete and reproducible, but I thought I would send this out
to see if anyone has any ideas or has seen similar issues. Suggestions?
Thanks,
Patrick
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]