AW: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents

2017-07-17 Thread D.Hamann
Hi Tilman,

I really appreciated the introduction of MemoryUsageSetting a couple of versions 
ago, as it saved me a lot of time and headaches when we stumbled across an out of 
memory problem while merging single-page documents into large multi-page documents.

As long as the PDFBox tools do not make use of this configuration option, people 
using these tools do not benefit from the introduction of MemoryUsageSetting - 
that's all :-)

Daniel




-Original Message-
From: Tilman Hausherr [mailto:thaush...@t-online.de] 
Sent: Friday, 14 July 2017 17:08
To: users@pdfbox.apache.org
Subject: Re: Splitter.createNewDocument() always uses main memory only - this 
leads to out of memory when splitting large documents

Hi,

No, I did the setter/getter solution which is what you wrote.
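
The setter/getter design can be illustrated with a minimal, self-contained sketch. The names below (SplitterSketch, Doc, Mode) are stand-ins invented for illustration and are not PDFBox types. The assumption is that the real change adds a setter/getter pair to Splitter whose value createNewDocument() passes on to each new document, with main memory remaining the default.

```java
/** Stand-in for a splitter with a configurable memory mode; not PDFBox code. */
class SplitterSketch {
    /** Stand-in for the memory configuration. */
    enum Mode { MAIN_MEMORY, TEMP_FILE, MIXED }

    /** Stand-in for a document that remembers how it was configured. */
    static final class Doc {
        final Mode mode;

        Doc(Mode mode) {
            this.mode = mode;
        }
    }

    // The default keeps the old behaviour, so existing callers are unaffected.
    private Mode memoryMode = Mode.MAIN_MEMORY;

    Mode getMemoryMode() {
        return memoryMode;
    }

    void setMemoryMode(Mode memoryMode) {
        if (memoryMode == null) {
            throw new IllegalArgumentException("memoryMode must not be null");
        }
        this.memoryMode = memoryMode;
    }

    /** Every newly created part document receives the configured mode. */
    protected Doc createNewDocument() {
        return new Doc(memoryMode);
    }
}
```

Compared with deprecating the no-argument constructor, this approach keeps the existing constructor valid and needs no second constructor.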

Re PDFSplit command line - is this a problem that actually happened to you or 
just an idea?  If I start putting a memory option there I may have to put it in 
every tool :-(

Tilman

On 14.07.2017 at 10:39, d.ham...@aurenz.de wrote:
> Hi Tilman,
>
> I used a decompiler to have a look at the sources.
>
> Perhaps it would be a good idea to mark Splitter() as deprecated
>
>  @Deprecated
>  public Splitter() {}
>
>  public Splitter(MemoryUsageSetting memoryUsageSetting) {
>      this.memoryUsageSetting = memoryUsageSetting;
>  }
>
>
> to point people to the improvement before they fall into the out of memory 
> hole themselves.
>
>
> Please add a program argument to PDFSplit.split() like so:
>
[...]



Re: AW: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents

2017-07-14 Thread Andreas Lehmkühler
You are looking in the wrong place. pdfbox-app is just a meta project to create 
a convenience binary of all relevant subprojects. It doesn't contain any source 
code.

The source code you are looking for is here:

https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/2.0.7-SNAPSHOT/

Andreas

> d.ham...@aurenz.de wrote on 14 July 2017 at 11:05:
> 
> 
> Hi,
> 
> I am talking about the snapshot versions provided here:
> 
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.7-SNAPSHOT/
> 
> Can you tell me where to download jars containing source files? The source 
> jars there contain just the META-INF directory and nothing else.
> 
> Thank you!
> 

AW: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents

2017-07-14 Thread D.Hamann
Hi,

I am talking about the snapshot versions provided here:

https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.7-SNAPSHOT/

Can you tell me where to download jars containing source files? The source jars 
there contain just the META-INF directory and nothing else.

Thank you!

-Original Message-
From: Gilad Denneboom [mailto:gilad.denneb...@gmail.com] 
Sent: Friday, 14 July 2017 11:03
To: users@pdfbox.apache.org
Subject: Re: Splitter.createNewDocument() always uses main memory only - this 
leads to out of memory when splitting large documents

You don't need a decompiler... PDFBox is an open-source library. All the code 
is available online.


AW: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents

2017-07-14 Thread D.Hamann
Hi Tilman,

I used a decompiler to have a look at the sources.

Perhaps it would be a good idea to mark Splitter() as deprecated

@Deprecated
public Splitter() {}

public Splitter(MemoryUsageSetting memoryUsageSetting) {
    this.memoryUsageSetting = memoryUsageSetting;
}


to point people to the improvement before they fall into the out of memory hole 
themselves.


Please add a program argument to PDFSplit.split() like so:

if (args[i].equals("-memory")) {
    if (++i >= args.length) {
        PDFSplit.usage();
    }
    if (args[i].equals("tempFile")) {
        memoryUsageSetting = .
    } else if (args[i].equals("mainMemory")) {
        memoryUsageSetting = .
    } else if (args[i].equals("mixed")) {
        memoryUsageSetting = .
    } else {
        PDFSplit.usage();
    }
    continue;
}
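
The option handling above can be sketched in runnable form without PDFBox on the classpath. MemoryOptionParser and its Mode enum are hypothetical names used only for this sketch. In a real tool each mode would presumably map to one of the MemoryUsageSetting factory methods (setupTempFileOnly(), setupMainMemoryOnly(), setupMixed(maxMainMemoryBytes)), and the exception thrown here stands in for the call to usage().

```java
/** Parses a "-memory <mode>" command-line option; illustration only. */
final class MemoryOptionParser {
    /** Stand-in for the three modes named in the snippet above. */
    enum Mode { MAIN_MEMORY, TEMP_FILE, MIXED }

    private MemoryOptionParser() {
    }

    /**
     * Scans args for "-memory <mode>" and returns the selected mode.
     * Returns MAIN_MEMORY when the option is absent.
     * Throws IllegalArgumentException for a missing or unknown value.
     */
    static Mode parse(String[] args) {
        Mode mode = Mode.MAIN_MEMORY;
        for (int i = 0; i < args.length; i++) {
            if (!args[i].equals("-memory")) {
                continue; // other options are ignored in this sketch
            }
            if (++i >= args.length) {
                throw new IllegalArgumentException("-memory requires a value");
            }
            switch (args[i]) {
                case "tempFile":
                    mode = Mode.TEMP_FILE;
                    break;
                case "mainMemory":
                    mode = Mode.MAIN_MEMORY;
                    break;
                case "mixed":
                    mode = Mode.MIXED;
                    break;
                default:
                    throw new IllegalArgumentException("unknown -memory value: " + args[i]);
            }
        }
        return mode;
    }
}
```

Throwing instead of calling usage() avoids reading args[i] past the end of the array if usage() were ever changed to return normally.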

Perhaps it would be a good idea to even make "maxMainMemoryBytes" and 
"maxStorageBytes" configurable, too.

Thanks a lot - I really appreciate your great work and support!

Cheers, 

Daniel



AW: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents

2017-07-14 Thread D.Hamann
Hi Tilman,

thanks a lot for addressing this topic so incredibly fast. I wanted to do the 
review but source jars from

https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.7-SNAPSHOT/

are significantly smaller than the jars there containing class files only. That 
is because they just contain the META-INF directory but nothing else - at least 
as I found with "pdfbox-app-2.0.7-20170713.214057-144-sources"

I'm pretty sure, though, that your code change is exactly what I suggested. If you 
could point me to the source code of the 2.0.7-SNAPSHOT today before 16:00, I 
will definitely have a look at it.

Cheers,

Daniel


-Original Message-
From: Tilman Hausherr [mailto:thaush...@t-online.de] 
Sent: Thursday, 13 July 2017 21:21
To: users@pdfbox.apache.org
Subject: Re: Splitter.createNewDocument() always uses main memory only - this 
leads to out of memory when splitting large documents

See
https://issues.apache.org/jira/browse/PDFBOX-3869

and try a snapshot from
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.7-SNAPSHOT/
(at the bottom)

Please give feedback whether this is what you wanted. Please do it quickly 
because a new version will be built on monday so either I'd have to revert 
before or we'll be stuck with this API.

Re: a global configuration - maybe at a later time. I'm not THAT convinced that 
it is needed.

Tilman


On 13.07.2017 at 09:20, d.ham...@aurenz.de wrote:
> Hi dear contributors to pdfbox,
>
> I just would like to report that Splitter.createNewDocument() should be able 
> to consider different MemoryUsageSetting configurations.
>
> In version 2.0.6 this method is implemented as
>
>
> protected PDDocument createNewDocument() throws IOException
> {
>     PDDocument document = new PDDocument();
>     document.getDocument().setVersion(getSourceDocument().getVersion());
>     document.setDocumentInformation(getSourceDocument().getDocumentInformation());
>     document.getDocumentCatalog().setViewerPreferences(
>             getSourceDocument().getDocumentCatalog().getViewerPreferences());
>     return document;
> }
>
>
>
> I would suggest introducing a member variable "MemoryUsageSetting 
> memSetting" that can be set for each instance of "Splitter".
>
> This way createNewDocument() could be implemented as
>
>
> protected PDDocument createNewDocument() throws IOException
> {
>     PDDocument document = new PDDocument(this.memSetting);
>     document.getDocument().setVersion(getSourceDocument().getVersion());
>     document.setDocumentInformation(getSourceDocument().getDocumentInformation());
>     document.getDocumentCatalog().setViewerPreferences(
>             getSourceDocument().getDocumentCatalog().getViewerPreferences());
>     return document;
> }
>
>
> Thankfully createNewDocument() is not private, so I could override 
> this method in my child class, as I also did for "protected void 
> processPage()" (just FYI - to create process messages).
>
>
> Please have a look at "PDFMergerUtility.mergeDocuments()" which is deprecated 
> since MemoryUsageSetting was introduced. Now, the usage of 
> "PDFMergerUtility.mergeDocuments(MemoryUsageSetting memUsageSetting)" is 
> encouraged.
>
>
> By the way: The utility "PDFSplit" would have to be updated to pass a 
> configured MemoryUsageSetting to "Splitter" - otherwise this tool relies on 
> main memory only.
>
> Perhaps it would be a good thing to be able to define a "pdfbox-wide" 
> basic MemoryUsageSetting which could be used everywhere as a fallback. 
> This way the default constructor of PDDocument could be changed from
>
> its implementation in version 2.0.6
>
> public PDDocument()
> {
>     this(MemoryUsageSetting.setupMainMemoryOnly());
> }
>
>
> to something like
>
>
> public PDDocument()
> {
>     this(MemoryUsageSetting.asConfigured());
> }
>
>
>
> Regards,
>
> Daniel
>
> -
> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: users-h...@pdfbox.apache.org
>
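
The "pdfbox-wide" fallback proposed in the quoted message could look roughly like the following self-contained sketch. Note that asConfigured() is only a proposal in this thread, not an existing PDFBox method. The class GlobalMemoryDefault, its Mode enum and the system property name "example.pdf.memory" are all hypothetical and chosen for illustration only.

```java
/** Illustrates a process-wide default that falls back to main memory. */
final class GlobalMemoryDefault {
    /** Stand-in for the memory configuration; not a PDFBox type. */
    enum Mode { MAIN_MEMORY, TEMP_FILE, MIXED }

    /** Hypothetical property name, used by this sketch only. */
    static final String PROPERTY = "example.pdf.memory";

    private GlobalMemoryDefault() {
    }

    /**
     * Plays the role of the proposed asConfigured(): reads the system
     * property on each call and falls back to MAIN_MEMORY when it is
     * unset or has an unrecognised value.
     */
    static Mode asConfigured() {
        String value = System.getProperty(PROPERTY);
        if (value == null) {
            return Mode.MAIN_MEMORY;
        }
        switch (value) {
            case "tempFile":
                return Mode.TEMP_FILE;
            case "mixed":
                return Mode.MIXED;
            default:
                return Mode.MAIN_MEMORY;
        }
    }
}
```

With such a fallback, a tool could be switched to temp-file buffering by starting the JVM with -Dexample.pdf.memory=tempFile, without a separate option in every command-line tool.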

