Hi Maruan,

The temporary files are not a problem; I just wanted to know whether it is necessary to keep them open until the merge has finished. Your answer implies that it is, so let it be.

I distilled the code I am using:

private static void downloadMergedPDF(HttpServletResponse response,
    List<InputStream> documentList, String fileName)
        throws IOException, COSVisitorException {

    response.setContentType("application/pdf");
    response.setContentLength(-1);
    response.addHeader("Content-disposition", "attachment; filename=" + fileName);
    OutputStream output = response.getOutputStream();

    PDFMergerUtility merger = new PDFMergerUtility();
    for (InputStream document : documentList) {
        merger.addSource(document);
    }
    merger.setDestinationStream(output);
    merger.mergeDocuments();

    output.flush();
    output.close();
}

I would still like to know when the merger starts writing bytes to the output stream: already during the merge, or only after the merge has finished? This matters to me because I want to estimate how long the user has to wait for the download to begin.
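
One way to measure this empirically, as a minimal sketch: wrap the destination stream in a small filter stream that notes when the first byte passes through. The class FirstWriteTimingStream and its method names are hypothetical and not part of pdfbox; only java.io classes are used, and nothing here assumes how pdfbox behaves internally.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper (not part of pdfbox): records when the first byte
// reaches the wrapped stream and counts the bytes passing through.
class FirstWriteTimingStream extends FilterOutputStream {
    private final long createdNanos = System.nanoTime();
    private boolean written = false;
    private long firstWriteNanos;
    private long byteCount = 0;

    FirstWriteTimingStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        markFirstWrite();
        out.write(b);
        byteCount++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (len > 0) {
            markFirstWrite();
        }
        // Delegate the whole chunk instead of writing byte by byte.
        out.write(b, off, len);
        byteCount += len;
    }

    private void markFirstWrite() {
        if (!written) {
            written = true;
            firstWriteNanos = System.nanoTime();
        }
    }

    boolean hasWritten() {
        return written;
    }

    long getByteCount() {
        return byteCount;
    }

    // Milliseconds between construction and the first write, or -1 if
    // nothing has been written yet.
    long millisUntilFirstWrite() {
        return written ? (firstWriteNanos - createdNanos) / 1_000_000 : -1;
    }
}
```

In the method above, the wrapper would be created around response.getOutputStream() just before the merge and passed to merger.setDestinationStream(...); logging millisUntilFirstWrite() after mergeDocuments() returns and comparing it with the total merge time shows whether writing starts early or only at the end. The servlet container buffers the response as well, so this measures when the merger starts writing, not when the client receives the first byte.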

Regards
Joern

On 07.12.2013 09:05, Maruan Sahyoun wrote:
Hi Joern,

You could do it completely in memory, but at the cost of higher memory
consumption, since all files have to be kept until the merge finishes. So from
my perspective, adjusting the open file limit is the better option.

Maybe you can post a code snippet showing how you load the files and do the
merging. There may be an easy way to improve that.

BR
Maruan Sahyoun

On 07.12.2013 at 01:01, Jörn Haferstroh <[email protected]> wrote:

Hi,

First, let me give some credit to the developers of pdfbox for this very useful
tool. Please continue your work, guys!

I have a web application storing lots of PDF documents in a database. For 
easier bulk download and printing, I am using pdfbox to merge multiple PDF 
documents into one large PDF document for download. The destination stream of 
the merge is the HTTP output stream, so the merged PDF data goes directly to 
the requesting web client.

Today I learned from a "too many open files" error that pdfbox creates a
temporary file for each source input stream and keeps it open until the end of
the merge process (I tried to merge 1025 PDF sources into one PDF on a Linux
box). Is this behaviour necessary, perhaps because of the PDF format? In any
case, I was able to work around it by increasing the open file limit of the
user.

When does pdfbox write the first bytes into the merge output stream? Does it
happen during the merge process or after the last source has been merged? In
other words, does the requesting web client have to wait for the download to
start until all sources have been merged, or not?

Thanks for any information
Joern


