Re: How to proceed?

2013-05-04 Thread Maruan Sahyoun
Hi,

I added the information about patching a branch to the developer section of
the CMS site. Maybe it's helpful for other people too :-)

Maruan Sahyoun

On 03.05.2013 at 10:50, Thomas Chojecki i...@rayman2200.de wrote:

 On 01.05.2013 19:56, Andreas Lehmkuehler wrote:
 Hi,
 On 01.05.2013 11:55, Thomas Chojecki wrote:
 It seems I am missing some basic knowledge about maintaining branches.
 I have only used branches for bug fix releases, without merging. So I would
 just commit the change on the branch and on the trunk as separate commits,
 without any merging attempts. If this is the wrong way, how do I merge the
 changes from 2.0.0 into the 1.8.x branch (like the Oracle JVM error)?
 Also, I thought that the trunk and the branch would never be merged in the
 end, so why do it this way?
 Let's have a look at the 1.8.1 release. When I started the preparation,
 the branch contained the 1.8.0 source. I did some cherry-picking, chose
 some of the fixes that had been made in the trunk and merged those changes
 into the 1.8 branch. Based on that code I created the 1.8.1 release.
 I'm a little bit confused.
 Maybe we are talking about the very same thing. :-) I'm using something like this:
 - check out the branch
 - cd to the branch directory
 - merge some changes from the trunk using
   svn merge -cREV1,REV2,REV3... https://svn.apache.org/repos/asf/pdfbox/trunk
 - commit the changes
 
 Thx for that explanation. I didn't know about cherry-picking in svn, and it
 makes sense :-) This example helps a lot.
 So for me it makes no difference whether we merge from the branch to the
 trunk or vice versa. But if we mainly work on the trunk, it makes sense to
 merge from the trunk to the branch.
 
 A 1.9 branch would only be needed if we really want to release a new
 version including improvements based on the current API. Are you planning
 to do something like that? I'd like to concentrate on 2.0 and limit the
 support for the old version to bugfixes and maybe smaller enhancements.
 Hmm, to be honest, I would do that extra work if applying a patch to the
 branch were not too complicated.
 I was just wondering if we really need another branch. All bugfixes should
 go to the 1.8 branch as long as we don't plan a new feature release.
 I just used the wrong wording. I meant maybe setting the branch version to
 1.9.0-SNAPSHOT.
 But this is just nice to have and not really necessary.
 
 BR
 Andreas Lehmkühler
 
 Best regards
 Thomas
 



Build failed in Jenkins: PDFBox-trunk » Apache PDFBox webapp #646

2013-05-04 Thread Apache Jenkins Server
See https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox-war/646/

--
[INFO] 
[INFO] 
[INFO] Building Apache PDFBox webapp 2.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ pdfbox-war ---
[INFO] Deleting https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox-war/ws/target
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.2.1:process (default) @ pdfbox-war ---
[INFO] 
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ pdfbox-war ---
[debug] execute contextualize
[INFO] Using 'ISO-8859-1' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox-war/ws/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ pdfbox-war ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ pdfbox-war ---
[debug] execute contextualize
[INFO] Using 'ISO-8859-1' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox-war/ws/src/test/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ pdfbox-war ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.9:test (default-test) @ pdfbox-war ---
[INFO] Surefire report directory: https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox-war/ws/target/surefire-reports

---
 T E S T S
---


Results :

Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

[JENKINS] Recording test results


Build failed in Jenkins: PDFBox-trunk #646

2013-05-04 Thread Apache Jenkins Server
See https://builds.apache.org/job/PDFBox-trunk/646/

--
[...truncated 750 lines...]
[INFO] Building jar: https://builds.apache.org/job/PDFBox-trunk/ws/trunk/lucene/target/pdfbox-lucene-2.0.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ pdfbox-lucene ---
[INFO] 
[INFO] --- apache-rat-plugin:0.6:check (default) @ pdfbox-lucene ---
[INFO] Exclude: release.properties
[INFO] 
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ pdfbox-lucene ---
[INFO] Installing https://builds.apache.org/job/PDFBox-trunk/ws/trunk/lucene/target/pdfbox-lucene-2.0.0-SNAPSHOT.jar to /export/home/hudson/hudson-slave/maven-repositories/0/org/apache/pdfbox/pdfbox-lucene/2.0.0-SNAPSHOT/pdfbox-lucene-2.0.0-SNAPSHOT.jar
[INFO] Installing https://builds.apache.org/job/PDFBox-trunk/ws/trunk/lucene/pom.xml to /export/home/hudson/hudson-slave/maven-repositories/0/org/apache/pdfbox/pdfbox-lucene/2.0.0-SNAPSHOT/pdfbox-lucene-2.0.0-SNAPSHOT.pom
[INFO] 
[INFO] --- maven-deploy-plugin:2.6:deploy (default-deploy) @ pdfbox-lucene ---
Downloading: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-lucene/2.0.0-SNAPSHOT/maven-metadata.xml
Downloaded: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-lucene/2.0.0-SNAPSHOT/maven-metadata.xml (780 B at 0.7 KB/sec)
Uploading: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-lucene/2.0.0-SNAPSHOT/pdfbox-lucene-2.0.0-20130504.100020-4.jar
Uploaded: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-lucene/2.0.0-SNAPSHOT/pdfbox-lucene-2.0.0-20130504.100020-4.jar (17 KB at 10.3 KB/sec)
Uploading: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-lucene/2.0.0-SNAPSHOT/pdfbox-lucene-2.0.0-20130504.100020-4.pom
Uploaded: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-lucene/2.0.0-SNAPSHOT/pdfbox-lucene-2.0.0-20130504.100020-4.pom (2 KB at 1.1 KB/sec)
Downloading: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-lucene/maven-metadata.xml
Downloaded: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-lucene/maven-metadata.xml (470 B at 0.5 KB/sec)
Uploading: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-lucene/2.0.0-SNAPSHOT/maven-metadata.xml
Uploaded: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-lucene/2.0.0-SNAPSHOT/maven-metadata.xml (780 B at 0.5 KB/sec)
Uploading: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-lucene/maven-metadata.xml
Uploaded: https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-lucene/maven-metadata.xml (470 B at 0.2 KB/sec)
[INFO] 
[INFO] 
[INFO] Building Apache PDFBox for Ant 2.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ pdfbox-ant ---
[INFO] Deleting https://builds.apache.org/job/PDFBox-trunk/ws/trunk/ant/target
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.2.1:process (default) @ pdfbox-ant ---
[INFO] 
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ pdfbox-ant ---
[debug] execute contextualize
[INFO] Using 'ISO-8859-1' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory https://builds.apache.org/job/PDFBox-trunk/ws/trunk/ant/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ pdfbox-ant ---
[INFO] Compiling 1 source file to https://builds.apache.org/job/PDFBox-trunk/ws/trunk/ant/target/classes
[INFO] 
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ pdfbox-ant ---
[debug] execute contextualize
[INFO] Using 'ISO-8859-1' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory https://builds.apache.org/job/PDFBox-trunk/ws/trunk/ant/src/test/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ pdfbox-ant ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.9:test (default-test) @ pdfbox-ant ---
[INFO] Surefire report directory: https://builds.apache.org/job/PDFBox-trunk/ws/trunk/ant/target/surefire-reports


---
 T E S T S
---

Results :

Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

[JENKINS] Recording test results
[INFO] 
[INFO] --- maven-jar-plugin:2.3.1:jar (default-jar) @ pdfbox-ant ---

[jira] [Commented] (PDFBOX-1586) IndexOutOfBoundsException when saving a document (at random)

2013-05-04 Thread JIRA

[ https://issues.apache.org/jira/browse/PDFBOX-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649108#comment-13649108 ]

Andreas Lehmkühler commented on PDFBOX-1586:


I removed/reworked the direct access to the scratch file in revision 1479136. 
Now it should be easier to find the real cause of the described issue, and it 
may also become easier to remove the direct access altogether.

 IndexOutOfBoundsException when saving a document (at random)
 

 Key: PDFBOX-1586
 URL: https://issues.apache.org/jira/browse/PDFBOX-1586
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.8.1
Reporter: James Green
Priority: Critical

 Getting the following stacktrace:
 org.apache.pdfbox.exceptions.COSVisitorException: java.lang.IndexOutOfBoundsException: Index: 28, Size: 0
 at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1245)
 at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:201)
 at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:206)
 at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:524)
 at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:434)
 at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1056)
 at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:496)
 at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1392)
 at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1157)
 at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1138)
 ...
 Caused by: java.lang.IndexOutOfBoundsException: Index: 28, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:604)
 at java.util.ArrayList.get(ArrayList.java:382)
 at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:84)
 at org.apache.pdfbox.io.RandomAccessFileInputStream.read(RandomAccessFileInputStream.java:96)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
 at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1232)
 I'll add some context. We have a data pipeline in which a Windows Print 
 Monitor sends PostScript to a servlet, which then uses Ghostscript 9.05 to 
 convert it to PDF in memory. This PDF is then loaded into PDFBox using 
 PDDocument.load().
 We then split the original PDF into multiple smaller ones, each of which is 
 saved to a ByteArrayOutputStream. The call to save() is where we are having 
 serious reliability issues.
 We saved an original PDF from Ghostscript into a unit test to replicate the 
 problem, without success. If we re-execute the pipeline to take the original 
 PDF and split it, an apparently random percentage of the documents is saved.
 For instance, for a 990-page document (text, no images) split into 990 
 one-page documents on Tomcat 7 with -Xmx512m:
 Pass 1: 50% were saved, 50% ended with stack traces
 Pass 2: 100% were saved
 Pass 3: 100% were saved
 The same test with -Xmx128m ended several times with just 1 document saved; 
 the rest were stack traces.
 We have also seen this hit, at random, a sample document of four pages split 
 into two two-page documents, so it does not appear to be memory related. We 
 also added code to catch the IndexOutOfBoundsException and retry up to ten 
 times, but save() seems to either work the first time or not at all.
 We suspect environmental factors, but we are now focused on getting this 
 nailed down. Any advice or assistance is welcome.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (PDFBOX-1586) IndexOutOfBoundsException when saving a document (at random)

2013-05-04 Thread JIRA

[ https://issues.apache.org/jira/browse/PDFBOX-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649108#comment-13649108 ]

Andreas Lehmkühler edited comment on PDFBOX-1586 at 5/4/13 4:49 PM:


I removed/reworked some of the direct accesses to the scratch file in revision 
1479136. Now it should be easier to find the real cause of the described issue, 
and it may also become easier to remove the direct access altogether.

  was (Author: lehmi):
I removed/reworked the direct access to the scratch file in revision 
1479136. Now it should be easier to find the real cause of the described issue, 
and it may also become easier to remove the direct access altogether.
  


[jira] [Commented] (PDFBOX-1586) IndexOutOfBoundsException when saving a document (at random)

2013-05-04 Thread James Green (JIRA)

[ https://issues.apache.org/jira/browse/PDFBOX-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649156#comment-13649156 ]

James Green commented on PDFBOX-1586:

I looked at this today. I now understand how we experience this problem.

We have a class, Divider, which has a number of methods. We'll start at method 
a(), which accepts a ByteArrayInputStream that holds a PDF. a() calls splitter(), 
handing it this ByteArrayInputStream and expecting a list of PDDocuments back.

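import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

class Divider {

    // Serialises every split document; this is where save() fails.
    public List<byte[]> a(ByteArrayInputStream pdfBytes)
            throws IOException, COSVisitorException {
        List<PDDocument> split = splitter(pdfBytes);
        List<byte[]> retVal = new ArrayList<byte[]>();
        for (PDDocument p : split) {
            ByteArrayOutputStream os = new ByteArrayOutputStream();
            p.save(os);
            retVal.add(os.toByteArray());
        }
        return retVal;
    }

    public List<PDDocument> splitter(ByteArrayInputStream masterPdf) throws IOException {
        List<PDDocument> retVal = new ArrayList<PDDocument>();
        // The master document becomes unreachable as soon as this method returns.
        PDDocument doc = PDDocument.load(masterPdf);
        @SuppressWarnings("unchecked")
        List<PDPage> pages = (List<PDPage>) doc.getDocumentCatalog().getAllPages();

        // Iterate over the pages and import each page into new PDDocuments added to retVal
        return retVal;
    }
}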

Because splitter() internally creates new PDDocuments and calls importPage() 
with the individual pages of the master document, the master document falls 
out of scope as soon as splitter() returns. Not unreasonable.

The trouble is that importPage() gives the new PDPage a reference to the 
original document's scratchFile. So once control has returned to a() and the GC 
cleans up the master document, the scratchFile is closed, causing the save in 
a() to crash. It is all blindingly obvious once you realise that importPage() 
copies the scratchFile reference instead of performing the clean clone most 
people would expect.

That hopefully concludes things. We can of course rework our code to avoid the 
bug, but it would be sensible to make importPage() perform a proper clone at 
some point.
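The lifetime problem described above can be reproduced without PDFBox in a small toy Java model. Everything here is hypothetical: Scratch, Page, Doc and SplitDemo are illustrative classes, not PDFBox API. As described in the comment, Doc.importPage() shares the source page (and therefore the source's scratch storage) instead of cloning it. In the real pipeline the master is closed whenever the garbage collector gets to it, which plausibly explains why failures varied between passes and with the heap size; the model closes it explicitly so that the failure is deterministic.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the scratch file owned by one loaded document. */
class Scratch {
    private boolean closed = false;

    byte[] read(String key) {
        if (closed) {
            throw new IllegalStateException("scratch file already closed");
        }
        return key.getBytes(StandardCharsets.UTF_8);
    }

    void close() { closed = true; }
}

/** Hypothetical page whose content still lives in its source document's scratch. */
class Page {
    final Scratch scratch;
    final String key;

    Page(Scratch scratch, String key) { this.scratch = scratch; this.key = key; }
}

/** Hypothetical document; importPage() shares the page instead of cloning it. */
class Doc implements AutoCloseable {
    final Scratch scratch = new Scratch();
    final List<Page> pages = new ArrayList<Page>();

    void importPage(Page page) { pages.add(page); }

    byte[] save() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Page p : pages) {
            byte[] content = p.scratch.read(p.key); // reads through the SOURCE's scratch
            out.write(content, 0, content.length);
        }
        return out.toByteArray();
    }

    @Override
    public void close() { scratch.close(); }
}

class SplitDemo {
    static Doc load(List<String> pageKeys) {
        Doc master = new Doc();
        for (String key : pageKeys) {
            master.pages.add(new Page(master.scratch, key));
        }
        return master;
    }

    /** Failing order: the master is closed before the parts are saved. */
    static List<byte[]> splitThenSave(List<String> pageKeys) {
        Doc master = load(pageKeys);
        List<Doc> parts = new ArrayList<Doc>();
        for (Page p : master.pages) {
            Doc part = new Doc();
            part.importPage(p);
            parts.add(part);
        }
        master.close(); // stands in for the GC finalizing the unreachable master
        List<byte[]> saved = new ArrayList<byte[]>();
        for (Doc part : parts) {
            saved.add(part.save()); // throws: the shared scratch is closed
        }
        return saved;
    }

    /** Workaround: keep the master open until every part has been saved. */
    static List<byte[]> splitAndSave(List<String> pageKeys) {
        List<byte[]> saved = new ArrayList<byte[]>();
        try (Doc master = load(pageKeys)) {
            for (Page p : master.pages) {
                try (Doc part = new Doc()) {
                    part.importPage(p);
                    saved.add(part.save());
                }
            }
        }
        return saved;
    }
}
```

Mapped back onto the Divider code, the analogous workaround would presumably be to keep a reference to the master PDDocument, save every split document, and only then close the split documents and the master.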

