Re: Scratch files - too many files open

2015-06-08 Thread Jesse Long

On 06/06/2015 18:44, Andreas Lehmkuehler wrote:
I've added the patch in r1683929 to the trunk with some slight
modifications.


Thank you.

-
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org



Re: Scratch files - too many files open

2015-06-05 Thread Andreas Lehmkühler
Hi,

 Jesse Long jesse.long...@gmail.com wrote on 3 June 2015 at 13:20:
 
 
 On 03/06/2015 12:46, Andreas Lehmkühler wrote:
  Hi,
 
  Jesse Long jesse.long...@gmail.com wrote on 3 June 2015 at 08:45:
 
 
  On 02/06/2015 17:48, Andreas Lehmkuehler wrote:
  Hi,
 
  On 02.06.2015 at 16:15, Jesse Long wrote:
  Hi All,

  Regarding PDFBOX-2301, and the use of scratch files: right now, each
  COSStream uses one or two scratch files.

  I recently ran into the problem on Linux where the max number of open
  files allowed to the JVM by the OS was reached because of this.
 
  Is there a plan around this?
 
  Is it maybe that my use case is not expected?
  I'm aware of that. The refactoring is still in progress. I expect to
  reduce the number of open files.
 
  My use case is:
  Open PDDocument 1
  Open PDDocument 2
  for a few hundred times
    import page 1 of PDDocument 1 into PDDocument 2 and overlay some
    stuff on top.
  save PDDocument 2.

  I have written a patch to use one single java.io.RandomAccessFile as a
  scratch file per COSDocument, using pages in a doubly linked list to
  separate streams in the same file. Would you be interested in adding
  this to PDFBox?
  Using only one file led to problems when creating PDFs from scratch.
  It is possible to write to 2 COSStreams at the same time, which
  corrupts the PDF.
  Hi Andreas,
 
  Do you mean at the same time, as in multiple threads, or single thread
  writing a bit to this stream and then a bit to another stream back and
  forth?
  It's about the second case. You can't add fonts and/or images to a page
  while adding content to a content stream at the same time. You have to
  add those before opening a stream, or you have to close the stream first.
 
  For the single thread use case, I have solved this in my patch.
  Actually, even multiple threads should be easy to support with
  synchronization. I'll work on some docs and submit, and you can see if
  you like it.
  At least it sounds interesting and I'm happy to look at it.
 
 
 Please see patch attached.
I've attached your patch to PDFBOX-2301 so that it can't get lost.

 
 Thanks,
 Jesse

BR
Andreas




Re: Scratch files - too many files open

2015-06-03 Thread Jesse Long

On 03/06/2015 12:46, Andreas Lehmkühler wrote:

Hi,


Jesse Long jesse.long...@gmail.com wrote on 3 June 2015 at 08:45:


On 02/06/2015 17:48, Andreas Lehmkuehler wrote:

Hi,

On 02.06.2015 at 16:15, Jesse Long wrote:

Hi All,

Regarding PDFBOX-2301, and the use of scratch files: right now, each
COSStream uses one or two scratch files.

I recently ran into the problem on Linux where the max number of open
files allowed to the JVM by the OS was reached because of this.

Is there a plan around this?

Is it maybe that my use case is not expected?

I'm aware of that. The refactoring is still in progress. I expect to
reduce the number of open files.


My use case is:
Open PDDocument 1
Open PDDocument 2
for a few hundred times
  import page 1 of PDDocument 1 into PDDocument 2 and overlay some
  stuff on top.
save PDDocument 2.

I have written a patch to use one single java.io.RandomAccessFile as a
scratch file per COSDocument, using pages in a doubly linked list to
separate streams in the same file. Would you be interested in adding
this to PDFBox?

Using only one file led to problems when creating PDFs from scratch.
It is possible to write to 2 COSStreams at the same time, which
corrupts the PDF.

Hi Andreas,

Do you mean at the same time, as in multiple threads, or single thread
writing a bit to this stream and then a bit to another stream back and
forth?

It's about the second case. You can't add fonts and/or images to a page
while adding content to a content stream at the same time. You have to
add those before opening a stream, or you have to close the stream first.


For the single thread use case, I have solved this in my patch.
Actually, even multiple threads should be easy to support with
synchronization. I'll work on some docs and submit, and you can see if
you like it.

At least it sounds interesting and I'm happy to look at it.



Please see patch attached.

Thanks,
Jesse
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDocument.java b/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDocument.java
index 2317ee1..a1048e0 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDocument.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDocument.java
@@ -25,6 +25,7 @@ import java.util.List;
 import java.util.Map;
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
+import org.apache.pdfbox.io.ScratchFile;
 import org.apache.pdfbox.pdfparser.PDFObjectStreamParser;
 
 /**
@@ -74,10 +75,8 @@ public class COSDocument extends COSBase implements Closeable
 private boolean closed = false;
 
 private boolean isXRefStream;
-
-private final File scratchDirectory;
-
-private final boolean useScratchFile;
+
+private ScratchFile scratchFile;
 
 /**
  * Constructor.
@@ -102,8 +101,14 @@ public class COSDocument extends COSBase implements Closeable
  */
 public COSDocument(File scratchDir, boolean useScratchFiles)
 {
-scratchDirectory = scratchDir;
-useScratchFile = useScratchFiles;
+if (useScratchFiles)
+{
+try {
+scratchFile = new ScratchFile(scratchDir);
+}catch (IOException e){
+LOG.error("Can't create temp file, using memory buffer instead", e);
+}
+}
 }
 
 /**
@@ -121,7 +126,7 @@ public class COSDocument extends COSBase implements Closeable
  */
 public COSStream createCOSStream()
 {
-return new COSStream( useScratchFile, scratchDirectory);
+return new COSStream(scratchFile);
 }
 
 /**
@@ -133,7 +138,7 @@ public class COSDocument extends COSBase implements Closeable
  */
 public COSStream createCOSStream(COSDictionary dictionary)
 {
-return new COSStream( dictionary, useScratchFile, scratchDirectory );
+return new COSStream( dictionary, scratchFile );
 }
 
 /**
@@ -424,6 +429,11 @@ public class COSDocument extends COSBase implements Closeable
 }
 }
 }
+
+if (scratchFile != null){
+scratchFile.close();
+}
+
 closed = true;
 }
 }
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/cos/COSStream.java b/pdfbox/src/main/java/org/apache/pdfbox/cos/COSStream.java
index a5d6b46..7f73329 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/cos/COSStream.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/cos/COSStream.java
@@ -21,7 +21,6 @@ import java.io.BufferedOutputStream;
 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
 import java.io.Closeable;
-import java.io.File;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.OutputStream;
@@ -34,9 +33,9 @@ import org.apache.pdfbox.filter.FilterFactory;
 import org.apache.pdfbox.io.IOUtils;
 import org.apache.pdfbox.io.RandomAccess;
 import org.apache.pdfbox.io.RandomAccessBuffer;
-import org.apache.pdfbox.io.RandomAccessFile;
 

Re: Scratch files - too many files open

2015-06-03 Thread Jesse Long

On 02/06/2015 17:48, Andreas Lehmkuehler wrote:

Hi,

On 02.06.2015 at 16:15, Jesse Long wrote:

Hi All,

Regarding PDFBOX-2301, and the use of scratch files: right now, each
COSStream uses one or two scratch files.

I recently ran into the problem on Linux where the max number of open
files allowed to the JVM by the OS was reached because of this.

Is there a plan around this?

Is it maybe that my use case is not expected?
I'm aware of that. The refactoring is still in progress. I expect to 
reduce the number of open files.



My use case is:
Open PDDocument 1
Open PDDocument 2
for a few hundred times
  import page 1 of PDDocument 1 into PDDocument 2 and overlay some
  stuff on top.
save PDDocument 2.

I have written a patch to use one single java.io.RandomAccessFile as a
scratch file per COSDocument, using pages in a doubly linked list to
separate streams in the same file. Would you be interested in adding
this to PDFBox?
Using only one file led to problems when creating PDFs from scratch.
It is possible to write to 2 COSStreams at the same time, which
corrupts the PDF.


Hi Andreas,

Do you mean at the same time, as in multiple threads, or single thread 
writing a bit to this stream and then a bit to another stream back and 
forth?


For the single thread use case, I have solved this in my patch.
Actually, even multiple threads should be easy to support with
synchronization. I'll work on some docs and submit, and you can see if
you like it.


Thanks,
Jesse






Re: Scratch files - too many files open

2015-06-03 Thread Andreas Lehmkuehler

Hi,

On 03.06.2015 at 13:20, Jesse Long wrote:

On 03/06/2015 12:46, Andreas Lehmkühler wrote:

Hi,


Jesse Long jesse.long...@gmail.com wrote on 3 June 2015 at 08:45:


On 02/06/2015 17:48, Andreas Lehmkuehler wrote:

Hi,

On 02.06.2015 at 16:15, Jesse Long wrote:

Hi All,

Regarding PDFBOX-2301, and the use of scratch files: right now, each
COSStream uses one or two scratch files.

I recently ran into the problem on Linux where the max number of open
files allowed to the JVM by the OS was reached because of this.

Is there a plan around this?

Is it maybe that my use case is not expected?

I'm aware of that. The refactoring is still in progress. I expect to
reduce the number of open files.


My use case is:
Open PDDocument 1
Open PDDocument 2
for a few hundred times
  import page 1 of PDDocument 1 into PDDocument 2 and overlay some
  stuff on top.
save PDDocument 2.

I have written a patch to use one single java.io.RandomAccessFile as a
scratch file per COSDocument, using pages in a doubly linked list to
separate streams in the same file. Would you be interested in adding
this to PDFBox?

Using only one file led to problems when creating PDFs from scratch.
It is possible to write to 2 COSStreams at the same time, which
corrupts the PDF.

Hi Andreas,

Do you mean at the same time, as in multiple threads, or single thread
writing a bit to this stream and then a bit to another stream back and
forth?

It's about the second case. You can't add fonts and/or images to a page
while adding content to a content stream at the same time. You have to
add those before opening a stream, or you have to close the stream first.


For the single thread use case, I have solved this in my patch.
Actually, even multiple threads should be easy to support with
synchronization. I'll work on some docs and submit, and you can see if
you like it.

At least it sounds interesting and I'm happy to look at it.



Please see patch attached.

Looks promising, I'll have a deeper look later.



Thanks,
Jesse


Thanks,
Andreas





Scratch files - too many files open

2015-06-02 Thread Jesse Long

Hi All,

Regarding PDFBOX-2301, and the use of scratch files: right now, each 
COSStream uses one or two scratch files.


I recently ran into the problem on Linux where the max number of open 
files allowed to the JVM by the OS was reached because of this.
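
To see how close a JVM is to that limit, the descriptor counts can be read from inside the process. The following is only a diagnostic sketch, not part of PDFBox: the class name OpenFileCount is made up for illustration, and it assumes a HotSpot/OpenJDK runtime that exposes com.sun.management.UnixOperatingSystemMXBean. On other runtimes or platforms it simply reports that the numbers are unavailable.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper for illustration; not a PDFBox class.
class OpenFileCount {

    /**
     * Returns {open, max} file descriptor counts for this JVM,
     * or null when the platform bean does not expose them.
     */
    static long[] descriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // Assumption: a HotSpot/OpenJDK JVM on a Unix-like OS.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = descriptorUsage();
        if (usage == null) {
            System.out.println("Descriptor counts not available on this JVM");
        } else {
            System.out.println("open=" + usage[0] + " max=" + usage[1]);
        }
    }
}
```

Calling descriptorUsage() before and after the import loop should show whether the open count grows with the number of streams.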


Is there a plan around this?

Is it maybe that my use case is not expected?

My use case is:
Open PDDocument 1
Open PDDocument 2
for a few hundred times
  import page 1 of PDDocument 1 into PDDocument 2 and overlay
  some stuff on top.
save PDDocument 2.

I have written a patch to use one single java.io.RandomAccessFile as a 
scratch file per COSDocument, using pages in a doubly linked list to 
separate streams in the same file. Would you be interested in adding 
this to PDFBox?


Thanks,
Jesse
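
The idea described in this message can be sketched roughly as follows. This is not the attached patch. It is a minimal illustration that assumes a fixed page size and a small per-page header holding the previous page, the next page and the number of bytes used. The class name PagedScratchSketch, its nested Stream class and the on-disk layout are all invented for the example. Each logical stream only remembers its first and last page, so several streams can share one java.io.RandomAccessFile and be written alternately from a single thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and page layout are assumptions, not PDFBox code.
class PagedScratchSketch implements AutoCloseable {
    // Assumed page layout: [prev:int][next:int][used:int][data bytes]
    private static final int HEADER = 12;
    private static final int DATA = 16;   // tiny on purpose so the demo spans pages
    private static final int PAGE = HEADER + DATA;
    private static final int NONE = -1;

    private final File file;
    private final RandomAccessFile raf;
    private int pageCount = 0;

    PagedScratchSketch() throws IOException {
        file = File.createTempFile("scratch", ".tmp");
        raf = new RandomAccessFile(file, "rw");
    }

    /** Appends an empty page linked back to prev and returns its index. */
    private int allocate(int prev) throws IOException {
        int page = pageCount++;
        raf.seek((long) page * PAGE);
        raf.writeInt(prev);
        raf.writeInt(NONE);
        raf.writeInt(0);
        raf.write(new byte[DATA]);
        if (prev != NONE) {
            // Point the previous page's "next" field at the new page.
            raf.seek((long) prev * PAGE + 4);
            raf.writeInt(page);
        }
        return page;
    }

    /** One logical stream stored as a chain of pages in the shared file. */
    public final class Stream {
        private final int first;
        private int last;

        private Stream() throws IOException {
            first = allocate(NONE);
            last = first;
        }

        public void write(byte[] bytes) throws IOException {
            int off = 0;
            while (off < bytes.length) {
                raf.seek((long) last * PAGE + 8);
                int used = raf.readInt();
                if (used == DATA) {          // last page is full: chain a new one
                    last = allocate(last);
                    used = 0;
                }
                int n = Math.min(DATA - used, bytes.length - off);
                raf.seek((long) last * PAGE + HEADER + used);
                raf.write(bytes, off, n);
                raf.seek((long) last * PAGE + 8);
                raf.writeInt(used + n);
                off += n;
            }
        }

        public byte[] readAll() throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int page = first;
            while (page != NONE) {
                raf.seek((long) page * PAGE + 4);
                int next = raf.readInt();
                int used = raf.readInt();
                byte[] buf = new byte[used];
                raf.readFully(buf);
                out.write(buf, 0, used);
                page = next;
            }
            return out.toByteArray();
        }
    }

    public Stream createStream() throws IOException {
        return new Stream();
    }

    @Override
    public void close() throws IOException {
        raf.close();
        file.delete();
    }

    public static void main(String[] args) throws IOException {
        try (PagedScratchSketch scratch = new PagedScratchSketch()) {
            Stream a = scratch.createStream();
            Stream b = scratch.createStream();
            a.write("content part 1, ".getBytes(StandardCharsets.UTF_8));
            b.write("font data longer than one page".getBytes(StandardCharsets.UTF_8));
            a.write("content part 2".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(a.readAll(), StandardCharsets.UTF_8));
            System.out.println(new String(b.readAll(), StandardCharsets.UTF_8));
        }
    }
}
```

The sketch uses a single file descriptor regardless of how many streams are created. It does not free or reuse pages and is not thread-safe. Supporting multiple threads would need synchronization around every seek and read or write pair.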




Re: Scratch files - too many files open

2015-06-02 Thread Tilman Hausherr

On 02.06.2015 at 16:15, Jesse Long wrote:

Hi All,

Regarding PDFBOX-2301, and the use of scratch files: right now, each 
COSStream uses one or two scratch files.


I recently ran into the problem on Linux where the max number of open 
files allowed to the JVM by the OS was reached because of this.


Is there a plan around this?

Is it maybe that my use case is not expected?

My use case is:
Open PDDocument 1
Open PDDocument 2
for a few hundred times
  import page 1 of PDDocument 1 into PDDocument 2 and overlay
  some stuff on top.
save PDDocument 2.


Did you close the documents when done?

Tilman



I have written a patch to use one single java.io.RandomAccessFile as a 
scratch file per COSDocument, using pages in a doubly linked list to 
separate streams in the same file. Would you be interested in adding 
this to PDFBox?


Thanks,
Jesse




Re: Scratch files - too many files open

2015-06-02 Thread Andreas Lehmkuehler

Hi,

On 02.06.2015 at 16:15, Jesse Long wrote:

Hi All,

Regarding PDFBOX-2301, and the use of scratch files: right now, each COSStream
uses one or two scratch files.

I recently ran into the problem on Linux where the max number of open files
allowed to the JVM by the OS was reached because of this.

Is there a plan around this?

Is it maybe that my use case is not expected?
I'm aware of that. The refactoring is still in progress. I expect to reduce the 
number of open files.



My use case is:
Open PDDocument 1
Open PDDocument 2
for a few hundred times
  import page 1 of PDDocument 1 into PDDocument 2 and overlay some stuff
  on top.
save PDDocument 2.

I have written a patch to use one single java.io.RandomAccessFile as a scratch
file per COSDocument, using pages in a doubly linked list to separate streams in
the same file. Would you be interested in adding this to PDFBox?
Using only one file led to problems when creating PDFs from scratch. It is
possible to write to 2 COSStreams at the same time, which corrupts the PDF.



Thanks,
Jesse



BR
Andreas Lehmkühler

