Re: Scratch files - too many files open
On 06/06/2015 18:44, Andreas Lehmkuehler wrote:
> I've added the patch in r1683929 to the trunk with some slight modifications.

Thank you.

- To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org
Re: Scratch files - too many files open
Hi,

Jesse Long jesse.long...@gmail.com wrote on 3 June 2015 at 13:20:
> On 03/06/2015 12:46, Andreas Lehmkühler wrote:
>> Hi,
>>
>> Jesse Long jesse.long...@gmail.com wrote on 3 June 2015 at 08:45:
>>> On 02/06/2015 17:48, Andreas Lehmkuehler wrote:
>>>> Hi,
>>>>
>>>> On 02.06.2015 at 16:15, Jesse Long wrote:
>>>>> Hi All,
>>>>>
>>>>> Regarding PDFBOX-2301, and the use of scratch files: right now, each COSStream uses one or two scratch files. I recently ran into the problem on Linux where the max number of open files allowed to the JVM by the OS was reached because of this. Is there a plan around this? Is it maybe that my use case is not expected?
>>>>
>>>> I'm aware of that. The refactoring is still in progress. I expect to reduce the number of open files.
>>>>
>>>>> My use case is:
>>>>>
>>>>> Open PDDocument 1
>>>>> Open PDDocument 2
>>>>> for a few hundred times
>>>>>     import page 1 of PDDocument 1 into PDDocument 2 and overlay some stuff on top.
>>>>> save PDDocument 2.
>>>>>
>>>>> I have written a patch to use one single java.io.RandomAccessFile as a scratch file per COSDocument, using pages in a doubly linked list to separate streams in the same file. Would you be interested in adding this to PDFBox?
>>>>
>>>> Using only one file led to problems when creating PDFs from scratch. It is possible to write to 2 COSStreams at the same time, which corrupts the PDF.
>>>
>>> Hi Andreas,
>>>
>>> Do you mean at the same time, as in multiple threads, or a single thread writing a bit to this stream and then a bit to another stream, back and forth?
>>
>> It's about the second case. You can't add fonts and/or images to a page while adding content to a content stream at the same time. You have to add those before opening a stream, or you have to close the stream first.
>>
>>> For the single thread use case, I have solved this in my patch. Actually, even multiple threads should be easy to support with synchronization. I'll work on some docs and submit, and you can see if you like it.
>>
>> At least it sounds interesting and I'm happy to look at it.
>
> Please see patch attached.

I've attached your patch to PDFBOX-2301 so that it can't get lost.

> Thanks,
> Jesse

BR
Andreas
Re: Scratch files - too many files open
On 03/06/2015 12:46, Andreas Lehmkühler wrote:
> [...]
> It's about the second case. You can't add fonts and/or images to a page while adding content to a content stream at the same time. You have to add those before opening a stream, or you have to close the stream first.
> [...]
> At least it sounds interesting and I'm happy to look at it.

Please see patch attached.
Thanks,
Jesse

diff --git a/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDocument.java b/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDocument.java
index 2317ee1..a1048e0 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDocument.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDocument.java
@@ -25,6 +25,7 @@ import java.util.List;
 import java.util.Map;
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
+import org.apache.pdfbox.io.ScratchFile;
 import org.apache.pdfbox.pdfparser.PDFObjectStreamParser;
 
 /**
@@ -74,10 +75,8 @@ public class COSDocument extends COSBase implements Closeable
     private boolean closed = false;
 
     private boolean isXRefStream;
-
-    private final File scratchDirectory;
-
-    private final boolean useScratchFile;
+
+    private ScratchFile scratchFile;
 
     /**
      * Constructor.
@@ -102,8 +101,14 @@ public class COSDocument extends COSBase implements Closeable
      */
     public COSDocument(File scratchDir, boolean useScratchFiles)
     {
-        scratchDirectory = scratchDir;
-        useScratchFile = useScratchFiles;
+        if (useScratchFiles)
+        {
+            try {
+                scratchFile = new ScratchFile(scratchDir);
+            }catch (IOException e){
+                LOG.error("Can't create temp file, using memory buffer instead", e);
+            }
+        }
     }
 
     /**
@@ -121,7 +126,7 @@ public class COSDocument extends COSBase implements Closeable
      */
     public COSStream createCOSStream()
     {
-        return new COSStream( useScratchFile, scratchDirectory);
+        return new COSStream(scratchFile);
     }
 
     /**
@@ -133,7 +138,7 @@ public class COSDocument extends COSBase implements Closeable
      */
     public COSStream createCOSStream(COSDictionary dictionary)
     {
-        return new COSStream( dictionary, useScratchFile, scratchDirectory );
+        return new COSStream( dictionary, scratchFile );
     }
 
     /**
@@ -424,6 +429,11 @@ public class COSDocument extends COSBase implements Closeable
                 }
             }
         }
+
+        if (scratchFile != null){
+            scratchFile.close();
+        }
+
         closed = true;
     }
 }
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/cos/COSStream.java b/pdfbox/src/main/java/org/apache/pdfbox/cos/COSStream.java
index a5d6b46..7f73329 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/cos/COSStream.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/cos/COSStream.java
@@ -21,7 +21,6 @@ import java.io.BufferedOutputStream;
 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
 import java.io.Closeable;
-import java.io.File;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.OutputStream;
@@ -34,9 +33,9 @@ import org.apache.pdfbox.filter.FilterFactory;
 import org.apache.pdfbox.io.IOUtils;
 import org.apache.pdfbox.io.RandomAccess;
 import org.apache.pdfbox.io.RandomAccessBuffer;
-import org.apache.pdfbox.io.RandomAccessFile;
Re: Scratch files - too many files open
On 02/06/2015 17:48, Andreas Lehmkuehler wrote:
> Hi,
>
> On 02.06.2015 at 16:15, Jesse Long wrote:
>> Hi All,
>>
>> Regarding PDFBOX-2301, and the use of scratch files: right now, each COSStream uses one or two scratch files. I recently ran into the problem on Linux where the max number of open files allowed to the JVM by the OS was reached because of this. Is there a plan around this? Is it maybe that my use case is not expected?
>
> I'm aware of that. The refactoring is still in progress. I expect to reduce the number of open files.
>
>> My use case is:
>>
>> Open PDDocument 1
>> Open PDDocument 2
>> for a few hundred times
>>     import page 1 of PDDocument 1 into PDDocument 2 and overlay some stuff on top.
>> save PDDocument 2.
>>
>> I have written a patch to use one single java.io.RandomAccessFile as a scratch file per COSDocument, using pages in a doubly linked list to separate streams in the same file. Would you be interested in adding this to PDFBox?
>
> Using only one file led to problems when creating PDFs from scratch. It is possible to write to 2 COSStreams at the same time, which corrupts the PDF.

Hi Andreas,

Do you mean at the same time, as in multiple threads, or a single thread writing a bit to this stream and then a bit to another stream, back and forth?

For the single thread use case, I have solved this in my patch. Actually, even multiple threads should be easy to support with synchronization. I'll work on some docs and submit, and you can see if you like it.

Thanks,
Jesse
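The synchronization idea mentioned above could look roughly like the following. This is only a minimal sketch, not the code from the patch: the class `SynchronizedScratch` and its methods `writeAt`, `readAt` and `demo` are hypothetical names invented for illustration. It assumes every access to the shared java.io.RandomAccessFile goes through a synchronized method, so that a seek and the write that follows it cannot be separated by another thread.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

/** Hypothetical sketch: one scratch file guarded by the object's monitor. */
final class SynchronizedScratch implements AutoCloseable {
    private final File file;
    private final RandomAccessFile raf;

    SynchronizedScratch() throws IOException {
        file = File.createTempFile("scratch", ".tmp");
        raf = new RandomAccessFile(file, "rw");
    }

    /** seek + write run as one unit, so threads cannot move each other's file pointer. */
    synchronized void writeAt(long pos, byte[] data) throws IOException {
        raf.seek(pos);
        raf.write(data);
    }

    synchronized byte[] readAt(long pos, int len) throws IOException {
        byte[] out = new byte[len];
        raf.seek(pos);
        raf.readFully(out);
        return out;
    }

    @Override
    public synchronized void close() throws IOException {
        raf.close();
        file.delete();
    }

    /** Two threads fill disjoint regions of the same file, one byte per call. */
    static byte[] demo() throws Exception {
        final int n = 500;
        try (SynchronizedScratch s = new SynchronizedScratch()) {
            Thread a = new Thread(() -> fill(s, 0, (byte) 'A', n));
            Thread b = new Thread(() -> fill(s, n, (byte) 'B', n));
            a.start();
            b.start();
            a.join();
            b.join();
            return s.readAt(0, 2 * n);
        }
    }

    private static void fill(SynchronizedScratch s, long start, byte value, int count) {
        try {
            for (int i = 0; i < count; i++) {
                s.writeAt(start + i, new byte[] { value });
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Locking on every call is the simplest choice here. It serializes all I/O on the file, which may or may not be acceptable for large documents.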
Re: Scratch files - too many files open
Hi,

On 03.06.2015 at 13:20, Jesse Long wrote:
> [...]
> Please see patch attached.

Looks promising, I'll have a deeper look later.

> Thanks,
> Jesse

Thanks,
Andreas
Scratch files - too many files open
Hi All,

Regarding PDFBOX-2301, and the use of scratch files: right now, each COSStream uses one or two scratch files. I recently ran into the problem on Linux where the max number of open files allowed to the JVM by the OS was reached because of this. Is there a plan around this? Is it maybe that my use case is not expected?

My use case is:

Open PDDocument 1
Open PDDocument 2
for a few hundred times
    import page 1 of PDDocument 1 into PDDocument 2 and overlay some stuff on top.
save PDDocument 2.

I have written a patch to use one single java.io.RandomAccessFile as a scratch file per COSDocument, using pages in a doubly linked list to separate streams in the same file. Would you be interested in adding this to PDFBox?

Thanks,
Jesse
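To illustrate the design described above (one java.io.RandomAccessFile, with pages in a doubly linked list separating the streams), here is a rough sketch. It is not the patch itself: `PagedScratchFile`, `PageStream`, the 16-byte page size and the in-memory `next`/`prev` link tables are all assumptions made for this example. Each logical stream owns a chain of fixed-size pages, so two streams can be written alternately from one thread without overwriting each other.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

/** Hypothetical sketch: many logical streams stored as page chains in one file. */
final class PagedScratchFile implements AutoCloseable {
    static final int PAGE_SIZE = 16;   // deliberately tiny so streams span several pages
    private static final int NONE = -1;

    private final File file;
    private final RandomAccessFile raf;
    private int[] next = new int[8];   // forward link for each page
    private int[] prev = new int[8];   // backward link; would allow unlinking pages, unused here
    private int pageCount = 0;

    PagedScratchFile() throws IOException {
        file = File.createTempFile("scratch", ".tmp");
        raf = new RandomAccessFile(file, "rw");
    }

    /** Appends a fresh page to the file and links it behind 'previous'. */
    private int allocatePage(int previous) {
        if (pageCount == next.length) {
            next = Arrays.copyOf(next, pageCount * 2);
            prev = Arrays.copyOf(prev, pageCount * 2);
        }
        int page = pageCount++;
        next[page] = NONE;
        prev[page] = previous;
        if (previous != NONE) {
            next[previous] = page;
        }
        return page;
    }

    /** One logical stream: a chain of pages inside the shared file. */
    final class PageStream {
        private int first = NONE;
        private int last = NONE;
        private long length = 0;

        void write(byte[] data) throws IOException {
            int off = 0;
            while (off < data.length) {
                int inPage = (int) (length % PAGE_SIZE);
                if (inPage == 0) {     // no page yet, or the last page is full
                    last = allocatePage(last);
                    if (first == NONE) {
                        first = last;
                    }
                }
                int n = Math.min(PAGE_SIZE - inPage, data.length - off);
                raf.seek((long) last * PAGE_SIZE + inPage);
                raf.write(data, off, n);
                off += n;
                length += n;
            }
        }

        byte[] readAll() throws IOException {
            byte[] out = new byte[(int) length];
            int off = 0;
            for (int page = first; page != NONE && off < out.length; page = next[page]) {
                int n = Math.min(PAGE_SIZE, out.length - off);
                raf.seek((long) page * PAGE_SIZE);
                raf.readFully(out, off, n);
                off += n;
            }
            return out;
        }
    }

    PageStream createStream() {
        return new PageStream();
    }

    @Override
    public void close() throws IOException {
        raf.close();
        file.delete();
    }
}
```

With this layout only one file descriptor is held no matter how many streams exist. The cost is that a stream's bytes are no longer contiguous on disk, so every read has to follow the page chain.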
Re: Scratch files - too many files open
On 02.06.2015 at 16:15, Jesse Long wrote:
> Hi All,
>
> Regarding PDFBOX-2301, and the use of scratch files: right now, each COSStream uses one or two scratch files. I recently ran into the problem on Linux where the max number of open files allowed to the JVM by the OS was reached because of this. Is there a plan around this? Is it maybe that my use case is not expected?
>
> My use case is:
>
> Open PDDocument 1
> Open PDDocument 2
> for a few hundred times
>     import page 1 of PDDocument 1 into PDDocument 2 and overlay some stuff on top.
> save PDDocument 2.

Did you close the documents when done?

Tilman

> I have written a patch to use one single java.io.RandomAccessFile as a scratch file per COSDocument, using pages in a doubly linked list to separate streams in the same file. Would you be interested in adding this to PDFBox?
>
> Thanks,
> Jesse
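The question about closing matters because each open document keeps its scratch file descriptors until it is closed. PDDocument implements Closeable, so try-with-resources applies to it. The sketch below avoids the PDFBox API and uses a hypothetical stand-in class, `ScratchBackedDocument`, with a made-up `OPEN` counter, only to show that closing inside the loop keeps the number of simultaneously open scratch files at one.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a document that holds a scratch file open. */
final class ScratchBackedDocument implements Closeable {
    /** Number of instances that currently hold an open file. */
    static final AtomicInteger OPEN = new AtomicInteger();

    private final File file;
    private final RandomAccessFile scratch;

    ScratchBackedDocument() throws IOException {
        file = File.createTempFile("doc", ".tmp");
        scratch = new RandomAccessFile(file, "rw");
        OPEN.incrementAndGet();
    }

    @Override
    public void close() throws IOException {
        scratch.close();   // releases the file descriptor
        file.delete();
        OPEN.decrementAndGet();
    }

    /** Opens and closes 'count' documents in turn; returns the peak number open at once. */
    static int processMany(int count) throws IOException {
        int peak = 0;
        for (int i = 0; i < count; i++) {
            try (ScratchBackedDocument doc = new ScratchBackedDocument()) {
                peak = Math.max(peak, OPEN.get());
            }   // doc.close() runs here, even if the body throws
        }
        return peak;
    }
}
```

This does not help in the use case described in the thread, where a single document accumulates hundreds of streams before it is saved. It only rules out leaked documents as the cause.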
Re: Scratch files - too many files open
Hi,

On 02.06.2015 at 16:15, Jesse Long wrote:
> Hi All,
>
> Regarding PDFBOX-2301, and the use of scratch files: right now, each COSStream uses one or two scratch files. I recently ran into the problem on Linux where the max number of open files allowed to the JVM by the OS was reached because of this. Is there a plan around this? Is it maybe that my use case is not expected?

I'm aware of that. The refactoring is still in progress. I expect to reduce the number of open files.

> My use case is:
>
> Open PDDocument 1
> Open PDDocument 2
> for a few hundred times
>     import page 1 of PDDocument 1 into PDDocument 2 and overlay some stuff on top.
> save PDDocument 2.
>
> I have written a patch to use one single java.io.RandomAccessFile as a scratch file per COSDocument, using pages in a doubly linked list to separate streams in the same file. Would you be interested in adding this to PDFBox?

Using only one file led to problems when creating PDFs from scratch. It is possible to write to 2 COSStreams at the same time, which corrupts the PDF.

> Thanks,
> Jesse

BR
Andreas Lehmkühler