Hi Tilman,
I used a decompiler to have a look at the sources.
Perhaps it would be a good idea to mark the no-argument constructor Splitter()
as deprecated:
    @Deprecated
    public Splitter() {}
    public Splitter(MemoryUsageSetting memoryUsageSetting) {
        this.memoryUsageSetting = memoryUsageSetting;
    }
That would point people to the improvement before they run into the
out-of-memory problem themselves.
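To make the deprecation idea concrete, here is a minimal sketch of the pattern.
SplitterSketch and MemoryMode are hypothetical stand-ins so the snippet compiles
without PDFBox; in the real class the field would be a MemoryUsageSetting. The
main-memory default for the old constructor is an assumption, chosen to match
what PDDocument() does in 2.0.6.

```java
/** Hypothetical stand-in for Splitter; not the PDFBox class. */
class SplitterSketch {
    /** Hypothetical stand-in for MemoryUsageSetting. */
    enum MemoryMode { MAIN_MEMORY, TEMP_FILE, MIXED }

    private final MemoryMode memoryMode;

    /**
     * @deprecated keeps every split document in main memory; use
     *             {@link #SplitterSketch(MemoryMode)} instead.
     */
    @Deprecated
    SplitterSketch() {
        // Assumption: the old behaviour (main memory only) stays the default.
        this(MemoryMode.MAIN_MEMORY);
    }

    SplitterSketch(MemoryMode memoryMode) {
        if (memoryMode == null) {
            throw new IllegalArgumentException("memoryMode must not be null");
        }
        this.memoryMode = memoryMode;
    }

    MemoryMode getMemoryMode() {
        return memoryMode;
    }
}
```

Existing callers keep compiling and only get a deprecation warning, which is
the nudge I have in mind.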
Please add a program argument to PDFSplit.split() like so:
    if (args[i].equals("-memory")) {
        if (++i >= args.length) {
            PDFSplit.usage();
        }
        if (args[i].equals("tempFile")) {
            memoryUsageSetting = .........
        } else if (args[i].equals("mainMemory")) {
            memoryUsageSetting = .........
        } else if (args[i].equals("mixed")) {
            memoryUsageSetting = .........
        } else {
            PDFSplit.usage();
        }
        continue;
    }
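A self-contained sketch of the same parsing, in case it helps. MemoryArgSketch
and its Mode enum are hypothetical stand-ins, not PDFBox code, and the sketch
throws an IllegalArgumentException where the real tool would call usage(). The
main-memory default is an assumption that mirrors PDDocument() in 2.0.6.

```java
/** Hypothetical stand-in for the "-memory" handling; not PDFBox code. */
final class MemoryArgSketch {
    /** Hypothetical stand-in for the three MemoryUsageSetting flavours. */
    enum Mode { MAIN_MEMORY, TEMP_FILE, MIXED }

    private MemoryArgSketch() {
    }

    /** Returns the mode selected by "-memory <value>", or MAIN_MEMORY if absent. */
    static Mode parse(String[] args) {
        Mode mode = Mode.MAIN_MEMORY; // assumed default
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-memory")) {
                if (++i >= args.length) {
                    throw new IllegalArgumentException("missing value for -memory");
                }
                mode = parseMode(args[i]);
            }
            // All other arguments are left to the existing option handling.
        }
        return mode;
    }

    private static Mode parseMode(String value) {
        switch (value) {
            case "mainMemory":
                return Mode.MAIN_MEMORY;
            case "tempFile":
                return Mode.TEMP_FILE;
            case "mixed":
                return Mode.MIXED;
            default:
                throw new IllegalArgumentException("unknown -memory value: " + value);
        }
    }
}
```

In PDFSplit itself each mode would presumably be turned into a
MemoryUsageSetting via setupMainMemoryOnly() and its sibling factory methods.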
It might even be worth making "maxMainMemoryBytes" and "maxStorageBytes"
configurable as well.
Thanks a lot - I really appreciate your great work and support!
Cheers,
Daniel
-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]]
Sent: Thursday, 13 July 2017 21:21
To: [email protected]
Subject: Re: Splitter.createNewDocument() always uses main memory only - this
leads to out of memory when splitting large documents
See
https://issues.apache.org/jira/browse/PDFBOX-3869
and try a snapshot from
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.7-SNAPSHOT/
(at the bottom)
Please let me know whether this is what you wanted, and please do so quickly:
a new version will be built on Monday, so either I revert before then or
we'll be stuck with this API.
Re: a global configuration - maybe at a later time. I'm not THAT convinced that
it is needed.
Tilman
On 13.07.2017 at 09:20, [email protected] wrote:
> Hi dear contributors to pdfbox,
>
> I just would like to report that Splitter.createNewDocument() should be able
> to consider different MemoryUsageSetting configurations.
>
> In version 2.0.6 this method is implemented as
>
>
> protected PDDocument createNewDocument() throws IOException
> {
>     PDDocument document = new PDDocument();
>     document.getDocument().setVersion(getSourceDocument().getVersion());
>     document.setDocumentInformation(getSourceDocument().getDocumentInformation());
>     document.getDocumentCatalog().setViewerPreferences(
>         getSourceDocument().getDocumentCatalog().getViewerPreferences());
>     return document;
> }
>
>
>
> I would suggest introducing a member variable "MemoryUsageSetting
> memSetting" that can be set for each instance of "Splitter".
>
> This way createNewDocument() could be implemented as
>
>
> protected PDDocument createNewDocument() throws IOException
> {
>     PDDocument document = new PDDocument(this.memSetting);
>     document.getDocument().setVersion(getSourceDocument().getVersion());
>     document.setDocumentInformation(getSourceDocument().getDocumentInformation());
>     document.getDocumentCatalog().setViewerPreferences(
>         getSourceDocument().getDocumentCatalog().getViewerPreferences());
>     return document;
> }
>
>
> Thankfully createNewDocument() is not private, so I could override
> this method in my child class, as I also did for "protected void
> processPage()" (just FYI: that one is there to create process messages).
>
>
> Please have a look at "PDFMergerUtility.mergeDocuments()", which has been
> deprecated since MemoryUsageSetting was introduced. The use of
> "PDFMergerUtility.mergeDocuments(MemoryUsageSetting memUsageSetting)" is
> now encouraged instead.
>
>
> By the way: The utility "PDFSplit" would have to be updated to pass a
> configured MemoryUsageSetting to "Splitter" - otherwise this tool relies on
> main memory only.
>
> Perhaps it would be a good thing to be able to define a "pdfbox-wide"
> basic MemoryUsageSetting which could be used everywhere as a fallback.
> This way the default constructor of PDDocument could be changed from
>
> its implementation in version 2.0.6
>
> public PDDocument()
> {
> this(MemoryUsageSetting.setupMainMemoryOnly());
> }
>
>
> to something like
>
>
> public PDDocument()
> {
> this(MemoryUsageSetting.asConfigured());
> }
>
>
>
> Regards,
>
> Daniel
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>