Hi John / Tillman,

I have narrowed it down to how PDDocument.save() is called: the problem occurs 
when I pass a FileOutputStream, and does not occur when I pass a java.io.File 
instead. We have only been able to reproduce it with some larger PDF files, and 
it seems to happen only in certain environments. On my Linux virtual machine I 
have not been able to reproduce it at all; it does reproduce on Windows and on 
our Solaris server (3PAR drive cluster). I have some simple sample code that 
reproduces the problem, but I don't think I can send you the two PDF files I 
have at hand. One is a 3D PDF of ours (TE Classified) and the other, 
ironically, is the iText v1 manual in PDF form. The difference in timing is 
drastic: on Windows, saving the 3D PDF takes about 3 seconds with a File vs. 
29 seconds with a FileOutputStream. The iText manual is not as bad, at 2 vs. 
20 seconds.

Anyway, we have a workaround: we changed our code to pass a java.io.File to 
PDFBox instead of a FileOutputStream. If I can find a suitable PDF that 
reproduces the problem, I will send it your way.
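
One possible explanation, which is only a guess and has not been verified 
against the PDFBox source, is that the FileOutputStream being passed in is 
unbuffered, so every small write goes straight to the operating system. That 
would hurt most on shared or network drives. The sketch below uses only 
java.io to compare the two ways of writing. The class name, the chunk 
contents, and the 64 KB buffer size are made up for illustration, and it does 
not call PDFBox at all.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-alone sketch: compares many small writes to a raw
// FileOutputStream with the same writes through a BufferedOutputStream.
class BufferedSaveSketch {

    // Stand-in for a writer that emits its output in many tiny pieces.
    // ASSUMPTION: this only imitates a "lots of small writes" pattern;
    // it is not how PDFBox actually writes a document.
    static void writeSmallChunks(OutputStream out, int chunks) throws IOException {
        byte[] chunk = {'0', ' ', 'o', 'b', 'j', '\n'}; // 6 bytes per write
        for (int i = 0; i < chunks; i++) {
            out.write(chunk);
        }
    }

    // Each write() call is handed directly to the underlying file.
    static long timeUnbuffered(File target, int chunks) throws IOException {
        long start = System.nanoTime();
        try (OutputStream out = new FileOutputStream(target)) {
            writeSmallChunks(out, chunks);
        }
        return System.nanoTime() - start;
    }

    // Writes are collected in a 64 KB in-memory buffer and flushed in bulk.
    static long timeBuffered(File target, int chunks) throws IOException {
        long start = System.nanoTime();
        try (OutputStream out =
                 new BufferedOutputStream(new FileOutputStream(target), 64 * 1024)) {
            writeSmallChunks(out, chunks);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        int chunks = 200_000;
        File raw = File.createTempFile("unbuffered", ".bin");
        File buffered = File.createTempFile("buffered", ".bin");
        raw.deleteOnExit();
        buffered.deleteOnExit();

        long rawNanos = timeUnbuffered(raw, chunks);
        long bufferedNanos = timeBuffered(buffered, chunks);

        // Timings depend heavily on the drive and operating system.
        System.out.println("unbuffered ms: " + rawNanos / 1_000_000);
        System.out.println("buffered ms:   " + bufferedNanos / 1_000_000);
    }
}
```

If that guess is right, passing 
new BufferedOutputStream(new FileOutputStream(file)) to 
PDDocument.save(OutputStream) might behave like the File variant, but that 
has not been measured here.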

Thanks,
Patrick

-----Original Message-----
From: John Hewson [mailto:[email protected]] 
Sent: Friday, March 18, 2016 4:45 PM
To: [email protected]
Subject: Re: Strange performance problem with certain PDF files


> On 18 Mar 2016, at 12:01, Stahle, Patrick <[email protected]> wrote:
> 
> Hi all,
> 
> I am running into a lot of strange performance issues with certain PDF files.
> 
> Background info:
> The strange thing is that I can't reproduce this consistently. For a given 
> PDF in a particular environment, the behavior seems consistent. I do most of 
> my development inside a VirtualBox virtual machine running Fedora. The PDF 
> files I am having problems with never show performance issues when run on my 
> virtual machine's local drive, but if I use a VirtualBox shared drive as the 
> source / destination for the PDF, I see the problem. A co-worker working in 
> a pure Windows environment experiences the performance problem too. We are 
> also seeing the same issue on our dev Solaris servers. The performance range 
> can be quite drastic: one of our 3D PDFs (12 MB), running in my local 
> environment, can be opened, stamped with some text, encrypted, and saved in 
> around 8 seconds. Doing the same job against a VirtualBox shared drive or on 
> our Solaris server takes minutes. In my co-worker's Windows environment it 
> takes around 30 seconds. We have really only reproduced this consistently 
> with the 12 MB 3D PDF. I have a much smaller PDF (non-3D, converted from 
> MS Office) that shows a similar performance issue, but the times range 
> from 200 ms locally to 8 seconds.

You need to isolate the problem; you’ve got too many variables to make any 
sense of it all. Get a reproducible problem on one, non-virtualised JVM first.

— John

> The one thing the two files have in common is that both produce a lot of 
> messages like the following on the console.
> This output is from the 12 MB 3D PDF file:
> :
> :
> 1787 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser  - 
> parsed=COSObject{13166, 0}
> 
> These messages seem to happen on the PDDocument.open and from what I can 
> tell, I get 13,166 of these messages in this example PDF.
> The slowness does not happen until the following line:
> document.save(outputPDFStream);
> 
> With other PDFs, including some quite large ones, I see neither this 
> performance issue nor those log messages.
> 
> I know this is not much to go on. I am working on isolating this down to 
> something more concrete and reproducible, but I thought I would send this 
> out to see if anyone has any ideas or has seen similar issues. 
> Suggestions?
> 
> Thanks,
> Patrick
> 


