Hi,

On 04.06.24 at 10:44, Constantine Dokolas wrote:
Hi all!

I have a requirement for PDFBox memory management where a multi-threaded
process that is generating PDF files (one per thread, at most) should share
a certain total amount of RAM (any excess should use scratch files). This
is because PDFs are in the order of thousands of pages and in-memory
resources must come from a limited, common pool for efficient use of heap
memory.
That seems to be a rare combination: multiple threads and the creation of huge PDFs. However, I'd like to share some thoughts.

Is there any mechanism available for this purpose? MemoryUsageSetting
appears to control each PDF separately, but I need more flexibility, i.e.
some sort of pooling of RAM/file resources.
I guess this can't be done with 2.0.x, as the cache management is hidden under the hood, so configuration is limited to the variations implemented in MemoryUsageSetting, as you already said.

I've looked a little into the improvements in 3.0 regarding the "stream
cache" and it could be a solution, albeit with some extra work.
In 3.0 the caching was overhauled. It is limited to write operations, which fits your use case perfectly. More importantly, the user is now able to control how the stream cache is used.

In your case it should be possible to create exactly one instance of org.apache.pdfbox.io.ScratchFile with the desired configuration and reuse it for all the PDFs you are creating. The class should be thread-safe. You might implement your own StreamCache if you need something more sophisticated.
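
To illustrate what "more sophisticated" could mean, here is a rough sketch of the bookkeeping a custom cache might do. It assumes fixed-size pages and uses only the JDK. SharedPageBudget and its methods are hypothetical names made up for this mail, not part of PDFBox. The idea is that every buffer asks the shared budget before keeping another page in RAM and writes the page to its scratch file when the budget is exhausted.

```java
import java.util.concurrent.Semaphore;

/**
 * Hypothetical helper (not PDFBox API): one budget of in-memory pages
 * shared by all threads that are generating PDFs.
 */
final class SharedPageBudget {
    // One permit per page that may be held in RAM across all documents.
    private final Semaphore pages;

    SharedPageBudget(int maxPagesInMemory) {
        this.pages = new Semaphore(maxPagesInMemory);
    }

    /**
     * Non-blocking: returns true if the caller may keep one more page in RAM.
     * A false result means the caller should write the page to a scratch file.
     */
    boolean tryAcquirePage() {
        return pages.tryAcquire();
    }

    /**
     * Returns a page to the pool. Must be called exactly once per successful
     * tryAcquirePage(), e.g. when the owning buffer is closed.
     */
    void releasePage() {
        pages.release();
    }

    /** Number of pages that can still be kept in RAM right now. */
    int availablePages() {
        return pages.availablePermits();
    }
}
```

All threads would share a single instance, e.g. new SharedPageBudget(totalBytes / pageSize). How this is wired into a cache implementation depends on the 3.0 interfaces, so please check them before relying on this.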

Andreas


Any ideas?

C.D.

