AW: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents
Hi Tilman,

I really appreciated the introduction of MemoryUsageSetting a couple of versions ago, as it saved me a lot of time and headaches when we stumbled across an out-of-memory problem while merging single-page documents into large multi-page documents.

If the PDFBox tools currently do not make use of this configuration option, people using these tools do not benefit from the introduction of MemoryUsageSetting - that's all :-)

Daniel

-----Original Message-----
From: Tilman Hausherr [mailto:thaush...@t-online.de]
Sent: Friday, 14 July 2017 17:08
To: users@pdfbox.apache.org
Subject: Re: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents

Hi,

No, I did the setter/getter solution, which is what you wrote.

Re PDFSplit command line - is this a problem that actually happened to you or just an idea? If I start putting a memory option there I may have to put it in every tool :-(

Tilman

On 14.07.2017 at 10:39, d.ham...@aurenz.de wrote:
> Hi Tilman,
>
> I used a decompiler to have a look at the sources.
>
> Perhaps it would be a good idea to mark Splitter() as deprecated
>
>     @Deprecated
>     public Splitter() {}
>
>     public Splitter(MemoryUsageSetting memoryUsageSetting) {
>         this.memoryUsageSetting = memoryUsageSetting;
>     }
>
> to point people to the improvement before they fall into the out of memory
> hole themselves.
>
> Please add a program argument to PDFSplit.split() like so:
> [...]
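For readers following the thread, the difference between the "setter/getter solution" Tilman mentions and the deprecated-constructor idea can be pictured with a small stand-alone sketch. Everything below (`MemoryMode`, `ToyDocument`, `ToySplitter`) is a hypothetical stand-in invented for illustration, not PDFBox code; the actual change is the one tracked in PDFBOX-3869.

```java
// Hypothetical stand-ins for illustration only; none of these are PDFBox types.
enum MemoryMode { MAIN_MEMORY, TEMP_FILE, MIXED }

// Minimal document that remembers which buffering mode it was created with.
class ToyDocument {
    final MemoryMode mode;

    ToyDocument(MemoryMode mode) {
        this.mode = mode;
    }
}

class ToySplitter {
    // Defaulting to main memory keeps the behaviour of existing callers unchanged.
    private MemoryMode memoryMode = MemoryMode.MAIN_MEMORY;

    MemoryMode getMemoryMode() {
        return memoryMode;
    }

    void setMemoryMode(MemoryMode memoryMode) {
        if (memoryMode == null) {
            throw new IllegalArgumentException("memoryMode must not be null");
        }
        this.memoryMode = memoryMode;
    }

    // Every document produced by a split inherits the configured mode.
    protected ToyDocument createNewDocument() {
        return new ToyDocument(memoryMode);
    }
}
```

With a setter, existing code that calls the no-argument constructor keeps compiling without warnings. The trade-off, which is what the deprecation idea tries to address, is that the option is easier to overlook.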
Re: AW: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents
You are looking in the wrong place. pdfbox-app is just a meta project that creates a convenience binary of all relevant subprojects. It doesn't contain any source code. The source code you are looking for is here:

https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/2.0.7-SNAPSHOT/

Andreas

> d.ham...@aurenz.de wrote on 14 July 2017 at 11:05:
>
> Hi,
>
> I am talking about the snapshot versions provided here:
>
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.7-SNAPSHOT/
>
> Can you tell me where to download jars containing source files? The source
> jars there just contain the META-INF directory but nothing else.
>
> Thank you!
>
> [...]
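The observation that a "-sources" jar holds nothing but META-INF can be checked without unpacking the archive. The following is a minimal sketch using only the JDK's `java.util.jar` API; the class name `SourceJarCheck` is made up for this example, and the jar paths are whatever you pass on the command line.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of PDFBox.
class SourceJarCheck {

    // Counts the .java entries in a jar; 0 suggests a sources jar without sources.
    static int countJavaSources(Path jar) throws IOException {
        int count = 0;
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().endsWith(".java")) {
                    count++;
                }
            }
        }
        return count;
    }

    // Prints the number of .java entries for each jar given as an argument.
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + countJavaSources(Paths.get(arg)) + " .java entries");
        }
    }
}
```

Run against the sources jar of a meta project such as pdfbox-app, this should report zero entries, consistent with Andreas's explanation; the sources jar of the pdfbox artifact itself is where the code would be expected.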
AW: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents
Hi,

I am talking about the snapshot versions provided here:

https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.7-SNAPSHOT/

Can you tell me where to download jars containing source files? The source jars there just contain the META-INF directory but nothing else.

Thank you!

-----Original Message-----
From: Gilad Denneboom [mailto:gilad.denneb...@gmail.com]
Sent: Friday, 14 July 2017 11:03
To: users@pdfbox.apache.org
Subject: Re: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents

You don't need a decompiler... PDFBox is an open-source library. All the code is available online.

On Fri, Jul 14, 2017 at 10:39 AM, wrote:

> Hi Tilman,
>
> I used a decompiler to have a look at the sources.
>
> [...]
AW: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents
Hi Tilman,

I used a decompiler to have a look at the sources.

Perhaps it would be a good idea to mark Splitter() as deprecated

    @Deprecated
    public Splitter() {}

    public Splitter(MemoryUsageSetting memoryUsageSetting) {
        this.memoryUsageSetting = memoryUsageSetting;
    }

to point people to the improvement before they fall into the out of memory hole themselves.

Please add a program argument to PDFSplit.split() like so:

    if (args[i].equals("-memory")) {
        if (++i >= args.length) {
            PDFSplit.usage();
        }
        if (args[i].equals("tempFile")) {
            memoryUsageSetting = .
        } else if (args[i].equals("mainMemory")) {
            memoryUsageSetting = .
        } else if (args[i].equals("mixed")) {
            memoryUsageSetting = .
        } else {
            PDFSplit.usage();
        }
        continue;
    }

Perhaps it would be a good idea to even make "maxMainMemoryBytes" and "maxStorageBytes" configurable, too.

Thanks a lot - I really appreciate your great work and support!

Cheers,

Daniel

-----Original Message-----
From: Tilman Hausherr [mailto:thaush...@t-online.de]
Sent: Thursday, 13 July 2017 21:21
To: users@pdfbox.apache.org
Subject: Re: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents

See
https://issues.apache.org/jira/browse/PDFBOX-3869

and try a snapshot from
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.7-SNAPSHOT/
(at the bottom)

Please give feedback whether this is what you wanted. Please do it quickly because a new version will be built on Monday, so either I'd have to revert before then or we'll be stuck with this API.

Re: a global configuration - maybe at a later time. I'm not THAT convinced that it is needed.

Tilman

On 13.07.2017 at 09:20, d.ham...@aurenz.de wrote:
> Hi dear contributors to pdfbox,
>
> I would just like to report that Splitter.createNewDocument() should be able
> to consider different MemoryUsageSetting configurations.
>
> [...]
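The "-memory" argument proposed for PDFSplit can be sketched as a stand-alone parser. This is only an illustration under stated assumptions: `BufferMode` and `MemoryOption` are hypothetical stand-ins rather than PDFBox types, and the "-maxMainMemoryBytes" flag is an invented name for the configurable limit suggested in the message. Unlike the snippet in the message, which relies on usage() terminating the program, this version throws when a value is missing, so it never reads past the end of the argument array.

```java
// Hypothetical stand-in for the three modes named in the thread; not a PDFBox type.
enum BufferMode { MAIN_MEMORY, TEMP_FILE, MIXED }

class MemoryOption {
    final BufferMode mode;
    final long maxMainMemoryBytes; // -1 means "no limit" in this sketch

    MemoryOption(BufferMode mode, long maxMainMemoryBytes) {
        this.mode = mode;
        this.maxMainMemoryBytes = maxMainMemoryBytes;
    }

    // Parses "-memory <mode>" and the invented "-maxMainMemoryBytes <n>"; other arguments are ignored.
    static MemoryOption parse(String[] args) {
        BufferMode mode = BufferMode.MAIN_MEMORY; // main memory is the default, as in the thread
        long maxMain = -1;
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-memory")) {
                mode = toMode(valueAfter(args, i++));
            } else if (args[i].equals("-maxMainMemoryBytes")) {
                maxMain = Long.parseLong(valueAfter(args, i++));
            }
        }
        return new MemoryOption(mode, maxMain);
    }

    // Returns the value following the flag at index i, or fails instead of reading past the end.
    private static String valueAfter(String[] args, int i) {
        if (i + 1 >= args.length) {
            throw new IllegalArgumentException("missing value for " + args[i]);
        }
        return args[i + 1];
    }

    // Maps the mode names used in the thread to the stand-in enum.
    private static BufferMode toMode(String name) {
        switch (name) {
            case "tempFile": return BufferMode.TEMP_FILE;
            case "mainMemory": return BufferMode.MAIN_MEMORY;
            case "mixed": return BufferMode.MIXED;
            default: throw new IllegalArgumentException("unknown memory mode: " + name);
        }
    }
}
```

In a real tool the parsed mode would then be translated into one of the MemoryUsageSetting factory methods and handed to the Splitter; that wiring is deliberately left out here because it depends on the API that ends up in the release.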
AW: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents
Hi Tilman,

thanks a lot for addressing this topic so incredibly fast.

I wanted to do the review, but the source jars from

https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.7-SNAPSHOT/

are significantly smaller than the jars there that contain class files only. That is because they contain just the META-INF directory and nothing else - at least that is what I found with "pdfbox-app-2.0.7-20170713.214057-144-sources".

I'm pretty sure, though, that your code change is exactly what I suggested. If you could point me to the source code of the 2.0.7-SNAPSHOT today before 16:00, I will definitely have a look at it.

Cheers,

Daniel

-----Original Message-----
From: Tilman Hausherr [mailto:thaush...@t-online.de]
Sent: Thursday, 13 July 2017 21:21
To: users@pdfbox.apache.org
Subject: Re: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents

See
https://issues.apache.org/jira/browse/PDFBOX-3869

and try a snapshot from
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.7-SNAPSHOT/
(at the bottom)

Please give feedback whether this is what you wanted. Please do it quickly because a new version will be built on Monday, so either I'd have to revert before then or we'll be stuck with this API.

Re: a global configuration - maybe at a later time. I'm not THAT convinced that it is needed.

Tilman

On 13.07.2017 at 09:20, d.ham...@aurenz.de wrote:
> Hi dear contributors to pdfbox,
>
> I would just like to report that Splitter.createNewDocument() should be able
> to consider different MemoryUsageSetting configurations.
>
> In version 2.0.6 this method is implemented as
>
>     protected PDDocument createNewDocument() throws IOException
>     {
>         PDDocument document = new PDDocument();
>         document.getDocument().setVersion(getSourceDocument().getVersion());
>         document.setDocumentInformation(getSourceDocument().getDocumentInformation());
>         document.getDocumentCatalog().setViewerPreferences(
>                 getSourceDocument().getDocumentCatalog().getViewerPreferences());
>         return document;
>     }
>
> I would suggest introducing a member variable "MemoryUsageSetting
> memSetting" that can be set for each instance of "Splitter".
>
> This way createNewDocument() could be implemented as
>
>     protected PDDocument createNewDocument() throws IOException
>     {
>         PDDocument document = new PDDocument(this.memSetting);
>         document.getDocument().setVersion(getSourceDocument().getVersion());
>         document.setDocumentInformation(getSourceDocument().getDocumentInformation());
>         document.getDocumentCatalog().setViewerPreferences(
>                 getSourceDocument().getDocumentCatalog().getViewerPreferences());
>         return document;
>     }
>
> Thankfully createNewDocument() is not private, so I could override
> this method in my child class (as I did for "protected void
> processPage()", too - just FYI, to create process messages).
>
> Please have a look at "PDFMergerUtility.mergeDocuments()", which has been
> deprecated since MemoryUsageSetting was introduced. Now, the usage of
> "PDFMergerUtility.mergeDocuments(MemoryUsageSetting memUsageSetting)" is
> encouraged.
>
> By the way: The utility "PDFSplit" would have to be updated to pass a
> configured MemoryUsageSetting to "Splitter" - otherwise this tool relies on
> main memory only.
>
> Perhaps it would be a good thing to be able to define a "pdfbox-wide"
> basic MemoryUsageSetting which could be used everywhere as a fallback.
> This way the default constructor of PDDocument could be changed from
> its implementation in version 2.0.6
>
>     public PDDocument()
>     {
>         this(MemoryUsageSetting.setupMainMemoryOnly());
>     }
>
> to something like
>
>     public PDDocument()
>     {
>         this(MemoryUsageSetting.asConfigured());
>     }
>
> Regards,
>
> Daniel

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org
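The "pdfbox-wide" fallback suggested at the end of the original message could, for instance, be driven by a JVM system property. The sketch below is purely illustrative: `asConfigured()` is the name proposed in the thread and is not presented here as an existing PDFBox method, the property name `example.pdf.memory` is invented, and `FallbackMode` and `MemoryDefaults` are stand-ins rather than PDFBox types. Tilman's reply above also makes clear that he was not convinced such a global setting is needed.

```java
// Hypothetical stand-in for the three modes named in the thread; not a PDFBox type.
enum FallbackMode { MAIN_MEMORY, TEMP_FILE, MIXED }

class MemoryDefaults {
    // Invented property name, used only by this sketch.
    static final String PROPERTY = "example.pdf.memory";

    // Resolves the process-wide default; missing or unknown values fall back to main memory.
    static FallbackMode asConfigured() {
        String value = System.getProperty(PROPERTY, "mainMemory");
        switch (value) {
            case "tempFile": return FallbackMode.TEMP_FILE;
            case "mixed": return FallbackMode.MIXED;
            default: return FallbackMode.MAIN_MEMORY;
        }
    }
}
```

Under these assumptions, starting the JVM with `-Dexample.pdf.memory=tempFile` would switch every caller of `asConfigured()` to temp-file buffering without code changes. The design cost is hidden global state: two libraries in the same process can no longer pick different defaults.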