There is another issue with BigBufferedImage - it does not work well on 32-bit JVMs. The problem is that it creates a single MappedByteBuffer covering the entire size of a channel's data (for the problem file, this means that each buffer is 172,975,104 bytes).
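One way to avoid a single huge mapping is to map the backing file in fixed-size windows and remap on demand. The following is a minimal sketch of that idea only. It is not BigBufferedImage's actual code and not an existing PDFBox class: `WindowedChannelStore`, its methods and the window size are hypothetical names chosen for illustration. It does not address the delayed release of mapped buffers on Windows that is discussed further down the thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: one image channel in a temp file, mapped one window at a time. */
final class WindowedChannelStore implements AutoCloseable {
    private final Path file;
    private final FileChannel channel;
    private final long size;
    private final int windowSize;
    private MappedByteBuffer window;   // currently mapped window, or null
    private long windowIndex = -1;     // index of the currently mapped window

    public WindowedChannelStore(long size, int windowSize) throws IOException {
        if (size <= 0 || windowSize <= 0) {
            throw new IllegalArgumentException("size and windowSize must be positive");
        }
        this.size = size;
        this.windowSize = windowSize;
        this.file = Files.createTempFile("channel", ".tmp");
        this.channel = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE);
        // Pre-size the file by writing a single zero byte at the last position.
        channel.write(ByteBuffer.wrap(new byte[] {0}), size - 1);
    }

    /** Number of windows needed to cover the whole channel (ceiling division). */
    public static long windowCount(long size, int windowSize) {
        return (size + windowSize - 1) / windowSize;
    }

    /** Returns the window containing pos, remapping only when pos leaves the current one. */
    private MappedByteBuffer windowFor(long pos) throws IOException {
        if (pos < 0 || pos >= size) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        long index = pos / windowSize;
        if (index != windowIndex) {
            long start = index * windowSize;
            long length = Math.min(windowSize, size - start);
            // The previous window is only dropped here, not explicitly unmapped.
            window = channel.map(FileChannel.MapMode.READ_WRITE, start, length);
            windowIndex = index;
        }
        return window;
    }

    public byte get(long pos) throws IOException {
        return windowFor(pos).get((int) (pos % windowSize));
    }

    public void put(long pos, byte value) throws IOException {
        windowFor(pos).put((int) (pos % windowSize), value);
    }

    @Override
    public void close() throws IOException {
        window = null;
        channel.close();
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            // Deleting can fail while a mapping is still alive (notably on Windows).
            file.toFile().deleteOnExit();
        }
    }
}
```

For the problem file, a channel of 13152 x 13152 = 172,975,104 bytes with a 64 MiB window would need 3 windows, so each access maps at most 64 MiB of address space for that channel, plus any earlier windows the JVM has not yet released.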
I think it would be better to create a DataBuffer implementation that uses org.apache.pdfbox.io.ScratchFile and create a PdfBoxBufferedImage...

- K

Kevin Day

trumpet
p| 480.961.6003 x1002
e| ke...@trumpetinc.com
www.trumpetinc.com <http://trumpetinc.com/>
LinkedIn <https://www.linkedin.com/company/trumpet-inc.> | Trumpet Blog <http://trumpetinc.com/blog/> | Twitter <https://twitter.com/trumpetinc>

Proud to be Great Place To Work <https://www.greatplacetowork.com/certified-company/7012667> certified since 2019


On Sun, Nov 7, 2021 at 8:36 AM Tilman Hausherr <thaush...@t-online.de> wrote:

> On 07.11.2021 at 16:09, Kevin Day wrote:
> > ok - I think that maybe there are two things going on, then. The first
> > is that we are allocating a single buffer that is bigger than the
> > default heap (700MB > 500MB). I really think that the only two
> > solutions are to move this off the heap or reduce the size of the
> > allocation.
> >
> > Then the second issue is maybe a memory leak (I have not been able to
> > see this because of the first problem unless I set my max heap to a
> > huge number - and that makes memory leak analysis very difficult). I am
> > going to clone the PdfBox git repo now and swap that class in then do a
> > little bit of profiling with jvisualvm to see if I can help find the
> > memory leak.
> >
> > Question: The Git repo for PdfBox says that it is a mirror. If I do
> > wind up creating a pull request against this repo, will you be able to
> > accept it?
>
> Not directly but we can create a .diff / .patch file from the PR.
>
> In the meantime I ran my rendering regression checks, which render 1000 files.
> The hard disk C: went full after using 260 GB :-(
>
> Tilman
>
> >
> > FYI - the use of sun.* classes is just to work around a long standing
> > native resource leak bug in the Windows MappedByteBuffer
> > implementation - On Windows, MappedByteBuffer does not release the
> > allocated native resource until finalize() is called, and because the
> > MappedByteBuffer itself does not take up much heap space, it can stay
> > on the heap for quite a long time even after all references are gone.
> > So the sun.* reference is just to force the release of those native
> > resources. I used the same technique when I created the high
> > throughput IO layer for iText. Instead of referencing the package
> > directly, we can use reflection (and only use it for winXXX platform) -
> > this shouldn't be a big problem.
> >
> > K
> >
> > On Sun, Nov 7, 2021 at 7:53 AM Tilman Hausherr <thaush...@t-online.de>
> > wrote:
> >
> >> I tried it and it works pretty fast. The memory leak is still there but
> >> it comes later.
> >>
> >> Other drawbacks are that it doesn't clean up after itself fast enough,
> >> and it uses a sun.* class.
> >>
> >> Tilman
> >>
> >> On 07.11.2021 at 14:31, Tilman Hausherr wrote:
> >>> On 05.11.2021 at 22:09, Kevin Day wrote:
> >>>> ok - so should we be clamping the xstep in some way? Or at this
> >>>> depth of the algorithm do we not have enough context to actually
> >>>> tell that it will be outside the page/clipping region?
> >>> Yes + yes, that is the problem.
> >>> The context would have to include the region but also the current
> >>> transformation matrix. Which could change despite using the same
> >>> pattern. So it's tricky.
> >>>
> >>>>
> >>>> I'm beginning to think that the BigBufferedImage might be the right
> >>>> solution... This is very inefficient, but honestly, the PDF is not
> >>>> exactly well formed, so if these particular files wind up being
> >>>> slower to render b/c they have to swap image content to disk, I
> >>>> think that is OK...
> >>> Maybe... the good thing is that we know when the image will be "too
> >>> big". The license is OK:
> >>>
> >>> https://www.apache.org/legal/resolved.html
> >>>
> >>> But I'm still wondering whether we have a memory leak. If we have,
> >>> then it should be fixed.
> >>>
> >>> Tilman
> >>>
> >>>> - K
> >>>>
> >>>> On Fri, Nov 5, 2021 at 12:47 PM Tilman Hausherr
> >>>> <thaush...@t-online.de> wrote:
> >>>>
> >>>>> On 05.11.2021 at 20:38, Kevin Day wrote:
> >>>>>> I do have a bit of experience with memory leaks (side effect of
> >>>>>> writing high performance java code, unfortunately!)
> >>>>>>
> >>>>>> I am pretty sure that this is not a memory leak, though. If this
> >>>>>> was a memory leak, we would see calls happening multiple times
> >>>>>> before failure.
> >>>>>>
> >>>>>> The failure happens on the very first call to create the buffered
> >>>>>> image:
> >>>>>>
> >>>>>>     BufferedImage image = new BufferedImage(rasterWidth, rasterHeight,
> >>>>>>             BufferedImage.TYPE_INT_ARGB);
> >>>>>>
> >>>>>> Nothing is even getting put into the weakcache.
> >>>>>>
> >>>>> That is done in TilingPaintFactory. The cache was needed for some
> >>>>> files that have used the pattern several times, to avoid recreating
> >>>>> the image for the TexturePaint.
> >>>>>
> >>>>>> I think the core issue is that the raster data for the image is
> >>>>>> just too big to store on the heap (without the heap being
> >>>>>> massive). It's 13152x13152x4 bytes (there is an alpha channel on
> >>>>>> this as well) - that's almost 700MB - just for this one pattern.
> >>>>>>
> >>>>>> Tilman wrote: "That pattern has an XStep and YStep of 23438
> >>>>>> although the image is 2148 x 440. Because of a matrix scale of
> >>>>>> 0.0673396 the image pattern size is 1578 x 1578 at 72 dpi. So at
> >>>>>> 1200 dpi the size would be about 26300 x 26300"
> >>>>>>
> >>>>>> I am fairly familiar with PDF (I contributed the parsing library
> >>>>>> to iText back in the day), but I'm not very familiar with
> >>>>>> rendering tile patterns, so the above is not intuitive to me
> >>>>>> (yet). Is there some chance that PdfBox is doing a lot more work
> >>>>>> than is necessary for this particular PDF? Like is the matrix
> >>>>>> scale just absurdly bad in this file? For an image that is
> >>>>>> originally 2148x440, it seems like requiring a 700MB raster should
> >>>>>> not be necessary - but I do NOT have nearly enough experience with
> >>>>>> this area of PDF...
> >>>>> The ridiculous thing isn't the matrix, it's the XStep and YStep. This
> >>>>> goes well outside of the page. The 0.067 matrix scaling makes it
> >>>>> less bad.
> >>>>>
> >>>>> A weakness in PDFBox is that we don't readjust the tile image, or
> >>>>> use an alternative painting method that doesn't use TexturePaint
> >>>>> when the image would be painted only once.
> >>>>>
> >>>>> Tilman
> >>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> - Kevin
> >>>>>>
> >>>>>> On Fri, Nov 5, 2021 at 11:27 AM Tilman Hausherr
> >>>>>> <thaush...@t-online.de> wrote:
> >>>>>>
> >>>>>>> If you're experienced with memory leaks then it would be nice if
> >>>>>>> you could search.
> >>>>>>>
> >>>>>>> Things I tried that didn't help:
> >>>>>>> - calling graphics.setPaint(null) after operations (to "lose"
> >>>>>>>   TilingPaint objects)
> >>>>>>> - disabling the (weak) cache of TilingPaint objects
> >>>>>>> - adding finalize to see if the TilingPaint class gets finalized
> >>>>>>>   (yes)
> >>>>>>> - adding finalize to see if the HighResolutionImageIcon class gets
> >>>>>>>   finalized (yes)
> >>>>>>>
> >>>>>>> Tilman
> >>>>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> >>>>> For additional commands, e-mail: users-h...@pdfbox.apache.org