OK, I think there may be two things going on, then. The first is that we are allocating a single buffer that is bigger than the default heap (700MB > 500MB). I really think the only two solutions are to move this allocation off the heap or to reduce its size.
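The first problem can be detected before the allocation is attempted by computing the raster size and comparing it with the heap the JVM can actually hand out. The sketch below is only an illustration: `RasterBudget`, `argbRasterBytes`, `fitsInOneIntArray` and `fitsInHeap` are hypothetical names, not PDFBox API. It assumes a `TYPE_INT_ARGB` image, which stores one 32-bit int per pixel in a single `int[]`. For the 13152 x 13152 tile (roughly what the 1578-unit tile becomes at 600 dpi, since 1578 x 600 / 72 = 13150), that comes to 691,900,416 bytes, more than a 500MB heap can hold.

```java
// Hypothetical helper (not PDFBox API): check up front whether a
// TYPE_INT_ARGB raster of the given size can plausibly live on the heap.
final class RasterBudget {

    // TYPE_INT_ARGB stores each pixel as one 32-bit int, i.e. 4 bytes.
    static long argbRasterBytes(int width, int height) {
        return (long) width * (long) height * 4L;
    }

    // The pixels go into a single int[], so the pixel count must also stay
    // below the practical array-length limit (conservatively MAX_VALUE - 8).
    static boolean fitsInOneIntArray(int width, int height) {
        return (long) width * (long) height <= Integer.MAX_VALUE - 8L;
    }

    // Heuristic only: is the raster no larger than `fraction` of the heap that
    // is not currently in use? Fragmentation and GC timing can still cause an
    // OutOfMemoryError, so callers must still be prepared for failure.
    static boolean fitsInHeap(long bytes, double fraction) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long available = rt.maxMemory() - used;
        return bytes <= (long) (available * fraction);
    }

    public static void main(String[] args) {
        int w = 13152;
        int h = 13152;
        long bytes = argbRasterBytes(w, h);
        boolean ok = fitsInOneIntArray(w, h) && fitsInHeap(bytes, 0.5);
        System.out.printf("%d x %d ARGB raster = %,d bytes, max heap = %,d bytes, allocate on heap: %b%n",
                w, h, bytes, Runtime.getRuntime().maxMemory(), ok);
    }
}
```

Running it with `-Xmx500m` reproduces the situation described above: the check reports `false` for this tile, which is the point where a fallback (an off-heap image such as BigBufferedImage, or a smaller raster) would take over.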
The second issue may be a memory leak. I have not been able to see it because of the first problem, unless I set my max heap to a huge number, and that makes memory leak analysis very difficult. I am going to clone the PDFBox git repo now, swap that class in, and then do a bit of profiling with jvisualvm to see if I can help find the leak.

Question: the Git repo for PDFBox says that it is a mirror. If I do wind up creating a pull request against this repo, will you be able to accept it?

FYI: the use of sun.* classes is just to work around a long-standing native resource leak in the Windows MappedByteBuffer implementation. On Windows, MappedByteBuffer does not release its native resource until the buffer object itself is garbage-collected, and because a MappedByteBuffer takes up very little heap space, it can stay on the heap for quite a long time even after all references to it are gone. The sun.* reference is only there to force the release of those native resources. I used the same technique when I created the high-throughput IO layer for iText. Instead of referencing the package directly, we can use reflection (and only do it on Windows), so this shouldn't be a big problem.

K

Kevin Day

trumpet
p| 480.961.6003 x1002
e| ke...@trumpetinc.com
www.trumpetinc.com <http://trumpetinc.com/>
LinkedIn <https://www.linkedin.com/company/trumpet-inc.> | Trumpet Blog <http://trumpetinc.com/blog/> | Twitter <https://twitter.com/trumpetinc>

Proud to be Great Place To Work <https://www.greatplacetowork.com/certified-company/7012667> certified since 2019

On Sun, Nov 7, 2021 at 7:53 AM Tilman Hausherr <thaush...@t-online.de> wrote:

> I tried it and it works pretty fast. The memory leak is still there, but
> it comes later.
>
> Other drawbacks are that it doesn't clean up after itself fast enough,
> and it uses a sun.* class.
>
> Tilman
>
> On 07.11.2021 at 14:31, Tilman Hausherr wrote:
> > On 05.11.2021 at 22:09, Kevin Day wrote:
> >> OK, so should we be clamping the XStep in some way? Or, at this depth
> >> of the algorithm, do we not have enough context to actually tell that
> >> it will be outside the page/clipping region?
> >
> > Yes + yes, that is the problem. The context would have to include the
> > region but also the current transformation matrix, which could change
> > despite using the same pattern. So it's tricky.
> >
> >> I'm beginning to think that the BigBufferedImage might be the right
> >> solution... It is very inefficient, but honestly, the PDF is not
> >> exactly well formed, so if these particular files wind up being slower
> >> to render because they have to swap image content to disk, I think
> >> that is OK...
> >
> > Maybe... the good thing is that we know when the image will be "too
> > big". The license is OK:
> >
> > https://www.apache.org/legal/resolved.html
> >
> > But I'm still wondering whether we have a memory leak. If we do, then
> > it should be fixed.
> >
> > Tilman
> >
> >> - K
> >>
> >> On Fri, Nov 5, 2021 at 12:47 PM Tilman Hausherr <thaush...@t-online.de>
> >> wrote:
> >>
> >>> On 05.11.2021 at 20:38, Kevin Day wrote:
> >>>> I do have a bit of experience with memory leaks (a side effect of
> >>>> writing high-performance Java code, unfortunately!)
> >>>>
> >>>> I am pretty sure that this is not a memory leak, though.
> >>>> If this were a memory leak, we would see calls happening multiple
> >>>> times before the failure.
> >>>>
> >>>> The failure happens on the very first call to create the buffered
> >>>> image:
> >>>>
> >>>>     BufferedImage image = new BufferedImage(rasterWidth, rasterHeight,
> >>>>             BufferedImage.TYPE_INT_ARGB);
> >>>>
> >>>> Nothing is even getting put into the weak cache.
> >>>>
> >>> That is done in TilingPaintFactory. The cache was needed for some files
> >>> that use the pattern several times, to avoid recreating the image for
> >>> the TexturePaint.
> >>>
> >>>> I think the core issue is that the raster data for the image is just
> >>>> too big to store on the heap (without the heap being massive). It's
> >>>> 13152 x 13152 x 4 bytes (there is an alpha channel on this as well),
> >>>> which is almost 700MB (691,900,416 bytes) just for this one pattern.
> >>>>
> >>>> Tilman wrote: "That pattern has an XStep and YStep of 23438 although
> >>>> the image is 2148 x 440. Because of a matrix scale of 0.0673396 the
> >>>> image pattern size is 1578 x 1578 at 72 dpi. So at 1200 dpi the size
> >>>> would be about 26300 x 26300."
> >>>>
> >>>> I am fairly familiar with PDF (I contributed the parsing library to
> >>>> iText back in the day), but I'm not very familiar with rendering
> >>>> tiling patterns, so the above is not intuitive to me (yet). Is there
> >>>> some chance that PDFBox is doing a lot more work than is necessary for
> >>>> this particular PDF? Is the matrix scale just absurdly bad in this
> >>>> file? For an image that is originally 2148 x 440, it seems like a
> >>>> 700MB raster should not be necessary, but I do NOT have nearly enough
> >>>> experience with this area of PDF...
> >>> The ridiculous thing isn't the matrix, it's the XStep and YStep. This
> >>> goes well outside of the page. The 0.067 matrix scaling makes it less
> >>> bad.
> >>>
> >>> A weakness in PDFBox is that we don't readjust the tile image, or use
> >>> an alternative painting method that doesn't use TexturePaint when the
> >>> image would be painted only once.
> >>>
> >>> Tilman
> >>>
> >>>> Thanks,
> >>>>
> >>>> - Kevin
> >>>>
> >>>> On Fri, Nov 5, 2021 at 11:27 AM Tilman Hausherr <thaush...@t-online.de>
> >>>> wrote:
> >>>>
> >>>>> If you're experienced with memory leaks, then it would be nice if you
> >>>>> could search.
> >>>>>
> >>>>> Things I tried that didn't help:
> >>>>> - calling graphics.setPaint(null) after operations (to "lose"
> >>>>>   TilingPaint objects)
> >>>>> - disabling the (weak) cache of TilingPaint objects
> >>>>> - adding finalize to see if the TilingPaint class gets finalized (yes)
> >>>>> - adding finalize to see if the HighResolutionImageIcon class gets
> >>>>>   finalized (yes)
> >>>>>
> >>>>> Tilman
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> >>> For additional commands, e-mail: users-h...@pdfbox.apache.org
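Kevin's suggestion of calling the sun.* cleanup through reflection, and only on Windows, could look roughly like the sketch below. It assumes Java 9 or later, where `sun.misc.Unsafe` (module `jdk.unsupported`) has an `invokeCleaner(ByteBuffer)` method; on Java 8 that method does not exist and the helper simply reports failure. `BufferUnmapper` and its methods are hypothetical names, not the code Kevin contributed and not a PDFBox API. After a successful unmap the buffer must never be touched again, because accessing unmapped memory can crash the JVM.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.util.Locale;

// Hypothetical helper: eagerly release the native memory/mapping behind a
// direct or memory-mapped ByteBuffer instead of waiting for the GC.
final class BufferUnmapper {

    static boolean isWindows() {
        return System.getProperty("os.name", "")
                .toLowerCase(Locale.ROOT)
                .startsWith("windows");
    }

    // Only attempt the unsupported cleanup on Windows; elsewhere the buffer
    // is left to the garbage collector as usual.
    static boolean unmapOnWindows(ByteBuffer buffer) {
        return isWindows() && unmap(buffer);
    }

    // Returns true if the cleaner was invoked, false if the buffer is not
    // direct or the reflective call is unavailable (e.g. on Java 8).
    static boolean unmap(ByteBuffer buffer) {
        if (buffer == null || !buffer.isDirect()) {
            return false; // heap buffers have no native resource to release
        }
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true); // sun.misc is opened by jdk.unsupported
            Object unsafe = theUnsafe.get(null);
            Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
            // Fails (wrapped IllegalArgumentException) for slices and duplicates.
            invokeCleaner.invoke(unsafe, buffer);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false;
        }
    }
}
```

Going through reflection keeps the code compiling without a direct sun.* import, which addresses Tilman's objection; a caller would drop its own reference to the buffer as soon as `unmapOnWindows` returns `true`.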
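Tilman's point that PDFBox could skip TexturePaint when the tile would be painted only once comes down to counting how many pattern cells actually intersect the clip. A minimal sketch, assuming the cell size, XStep/YStep and clip are already in the same device space and the pattern matrix is only a scale plus translation (no rotation or skew); `originX`/`originY` is where cell 0's bounding box starts. `TileCoverage` and its methods are hypothetical, not PDFBox code, and a real implementation would also have to handle the general transformation matrix Tilman mentions.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper: how many copies of a tiling-pattern cell overlap the clip?
// Assumes an axis-aligned pattern (scale + translate only), all values in device
// space. A result of 0 or 1 means a repeating TexturePaint is not really needed.
final class TileCoverage {

    // Number of integers i whose cell [origin + i*step, origin + i*step + size)
    // overlaps the half-open interval [min, max).
    static long countAlongAxis(double min, double max, double origin,
                               double step, double size) {
        if (step <= 0 || size <= 0) {
            throw new IllegalArgumentException("step and size must be positive");
        }
        // Overlap requires origin + i*step < max and origin + i*step + size > min.
        long first = (long) Math.floor((min - origin - size) / step) + 1;
        long last = (long) Math.ceil((max - origin) / step) - 1;
        return Math.max(0, last - first + 1);
    }

    static long cellsIntersecting(Rectangle2D clip, double originX, double originY,
                                  double xStep, double yStep,
                                  double cellWidth, double cellHeight) {
        long nx = countAlongAxis(clip.getMinX(), clip.getMaxX(), originX, xStep, cellWidth);
        long ny = countAlongAxis(clip.getMinY(), clip.getMaxY(), originY, yStep, cellHeight);
        return nx * ny;
    }

    public static void main(String[] args) {
        // US Letter page at 72 dpi; step of about 1578 and a cell of roughly
        // 2148 x 440 scaled by 0.0673396 (about 145 x 30).
        Rectangle2D page = new Rectangle2D.Double(0, 0, 612, 792);
        long n = cellsIntersecting(page, 0, 0, 1578, 1578, 145, 30);
        System.out.println(n <= 1 ? "paint the cell directly" : "use TexturePaint (" + n + " cells)");
    }
}
```

With a step larger than the page, only one cell can ever be visible, so the image could be drawn once at the cell's size instead of allocating a raster that covers the full XStep x YStep.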