PS - PdfBox already has the sun.* stuff in it - see IOUtils::unmapper()

Kevin Day

*trumpet*
*p|* 480.961.6003 x1002
*e|* ke...@trumpetinc.com
www.trumpetinc.com <http://trumpetinc.com/>
LinkedIn <https://www.linkedin.com/company/trumpet-inc.> | Trumpet Blog <http://trumpetinc.com/blog/> | Twitter <https://twitter.com/trumpetinc>

Proud to be Great Place To Work <https://www.greatplacetowork.com/certified-company/7012667> certified since 2019

On Sun, Nov 7, 2021 at 8:09 AM Kevin Day <ke...@trumpetinc.com> wrote:

> ok - I think that maybe there are two things going on, then. The first is
> that we are allocating a single buffer that is bigger than the default heap
> (700MB > 500MB). I really think that the only two solutions are to move
> this off the heap or reduce the size of the allocation.
>
> Then the second issue is maybe a memory leak (I have not been able to see
> this because of the first problem unless I set my max heap to a huge number
> - and that makes memory leak analysis very difficult). I am going to clone
> the PdfBox git repo now and swap that class in then do a little bit of
> profiling with jvisualvm to see if I can help find the memory leak.
>
> Question: The Git repo for PdfBox says that it is a mirror. If I do wind
> up creating a pull request against this repo, will you be able to accept it?
>
> FYI - the use of sun.* classes is just to work around a long standing
> native resource leak bug in the Windows MappedByteBuffer implementation -
> On Windows, MappedByteBuffer does not release the allocated native resource
> until finalize() is called, and because the MappedByteBuffer itself does
> not take up much heap space, it can stay on the heap for quite a long time
> even after all references are gone. So the sun.* reference is just to
> force the release of those native resources. I used the same technique
> when I created the high throughput IO layer for iText. Instead of
> referencing the package directly, we can use reflection (and only use it
> for winXXX platform) - this shouldn't be a big problem.
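For illustration, the reflection approach described above could look roughly like the sketch below. It is not the code in PdfBox's IOUtils or in iText. The class name ReflectiveUnmapper and its methods are made up for this example, and it assumes Java 9 or later, where sun.misc.Unsafe has an invokeCleaner(ByteBuffer) method. On Java 8 the lookup fails and the method simply returns false, leaving the release to the garbage collector.

```java
import java.io.IOException;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: releases a direct/mapped buffer without a compile-time sun.* reference. */
public final class ReflectiveUnmapper {
    private ReflectiveUnmapper() {}

    /** True on Windows, where the delayed release of mappings hurts most. */
    public static boolean isWindows() {
        return System.getProperty("os.name", "").toLowerCase().contains("win");
    }

    /**
     * Tries to release the native memory behind a direct buffer right away.
     * Returns true if the cleaner was invoked, false if the release is left to the GC.
     * The buffer must not be touched again after a successful call.
     */
    public static boolean unmap(ByteBuffer buffer) {
        if (buffer == null || !buffer.isDirect()) {
            return false; // heap buffers have no native resource to release
        }
        try {
            // Everything is looked up by name, so this class compiles and loads
            // even on a JVM that does not ship sun.misc.Unsafe.
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            Object unsafe = theUnsafe.get(null);
            Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
            invokeCleaner.invoke(unsafe, buffer);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false; // not available or not permitted: fall back to the GC
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("unmap-demo", ".bin");
        MappedByteBuffer mapped;
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            mapped = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            mapped.put(0, (byte) 42);
        }
        boolean released = unmap(mapped);
        mapped = null; // never use the buffer after unmapping
        System.out.println("released eagerly: " + released);
        // On Windows the delete below typically fails while the mapping is still alive.
        Files.delete(tmp);
    }
}
```

A caller that only wants this on Windows could guard the call with isWindows() and skip it everywhere else.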
>
> K
>
> On Sun, Nov 7, 2021 at 7:53 AM Tilman Hausherr <thaush...@t-online.de> wrote:
>
>> I tried it and it works pretty fast. The memory leak is still there but
>> it comes later.
>>
>> Other drawbacks are that it doesn't clean up after itself fast enough,
>> and it uses a sun.* class.
>>
>> Tilman
>>
>> On 07.11.2021 at 14:31, Tilman Hausherr wrote:
>> > On 05.11.2021 at 22:09, Kevin Day wrote:
>> >> ok - so should we be clamping the xstep in some way? Or at this depth of
>> >> the algorithm do we not have enough context to actually tell that it will
>> >> be outside the page/clipping region?
>> >
>> > Yes + yes, that is the problem. The context would have to include the
>> > region but also the current transformation matrix. Which could change
>> > despite using the same pattern. So it's tricky.
>> >
>> >> I'm beginning to think that the BigBufferedImage might be the right
>> >> solution... This is very inefficient, but honestly, the PDF is not exactly
>> >> well formed, so if these particular files wind up being slower to render
>> >> b/c they have to swap image content to disk, I think that is OK...
>> >
>> > Maybe... the good thing is that we know when the image will be "too
>> > big". The license is OK:
>> >
>> > https://www.apache.org/legal/resolved.html
>> >
>> > But I'm still wondering whether we have a memory leak. If we have,
>> > then it should be fixed.
>> >
>> > Tilman
>> >
>> >> - K
>> >>
>> >> On Fri, Nov 5, 2021 at 12:47 PM Tilman Hausherr <thaush...@t-online.de> wrote:
>> >>
>> >>> On 05.11.2021 at 20:38, Kevin Day wrote:
>> >>>> I do have a bit of experience with memory leaks (side effect of writing
>> >>>> high performance java code, unfortunately!)
>> >>>>
>> >>>> I am pretty sure that this is not a memory leak, though. If this was a
>> >>>> memory leak, we would see calls happening multiple times before failure.
>> >>>>
>> >>>> The failure happens on the very first call to create the buffered image:
>> >>>>
>> >>>>     BufferedImage image = new BufferedImage(rasterWidth, rasterHeight,
>> >>>>             BufferedImage.TYPE_INT_ARGB);
>> >>>>
>> >>>> Nothing is even getting put into the weakcache.
>> >>>>
>> >>> That is done in TilingPaintFactory. The cache was needed for some files
>> >>> that have used the pattern several times, to avoid recreating the image
>> >>> for the TexturePaint.
>> >>>
>> >>>> I think the core issue is that the raster data for the image is just too
>> >>>> big to store on the heap (without the heap being massive). It's
>> >>>> 13152x13152x4 bytes (there is an alpha channel on this as well) - that's
>> >>>> almost 700MB - just for this one pattern.
>> >>>>
>> >>>> Tilman wrote: "That
>> >>>> pattern has an XStep and YStep of 23438 although the image is 2148 x
>> >>>> 440.
>> >>>> Because of a matrix scale of 0.0673396 the image pattern size is
>> >>>> 1578 x 1578 at 72 dpi. So at 1200 dpi the size would be about 26300 x
>> >>>> 26300"
>> >>>>
>> >>>> I am fairly familiar with PDF (I contributed the parsing library to iText
>> >>>> back in the day), but I'm not very familiar with rendering tile patterns,
>> >>>> so the above is not intuitive to me (yet). Is there some chance that
>> >>>> PdfBox is doing a lot more work than is necessary for this particular PDF?
>> >>>> Like is the matrix scale just absurdly bad in this file? For an image that
>> >>>> is originally 2148x440, it seems like requiring a 700MB raster should not
>> >>>> be necessary - but I do NOT have nearly enough experience with this area of
>> >>>> PDF...
>> >>>
>> >>> The ridiculous thing isn't the matrix, it's the XStep and YStep. This
>> >>> goes well outside of the page. The 0.067 matrix scaling makes it less bad.
>> >>>
>> >>> A weakness in PDFBox is that we don't readjust the tile image, or use an
>> >>> alternative painting method that doesn't use TexturePaint when the image
>> >>> would be painted only once.
>> >>>
>> >>> Tilman
>> >>>
>> >>>> Thanks,
>> >>>>
>> >>>> - Kevin
>> >>>>
>> >>>> On Fri, Nov 5, 2021 at 11:27 AM Tilman Hausherr <thaush...@t-online.de> wrote:
>> >>>>
>> >>>>> If you're experienced with memory leaks then it would be nice if you
>> >>>>> could search.
>> >>>>>
>> >>>>> Things I tried that didn't help:
>> >>>>> - calling graphics.setPaint(null) after operations (to "lose"
>> >>>>>   TilingPaint objects)
>> >>>>> - disabling the (weak) cache of TilingPaint objects
>> >>>>> - adding finalize to see if the TilingPaint class gets finalized (yes)
>> >>>>> - adding finalize to see if the HighResolutionImageIcon class gets
>> >>>>>   finalized (yes)
>> >>>>>
>> >>>>> Tilman
>> >>>>>
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
>> >>> For additional commands, e-mail: users-h...@pdfbox.apache.org
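The sizes quoted in the thread can be checked with a few lines of arithmetic. The sketch below is only a calculator and is not taken from PDFBox: the class TileRasterEstimate and its methods are hypothetical names, and it assumes 4 bytes per pixel for TYPE_INT_ARGB and a 72 dpi base resolution. The 13152 pixel edge is taken as given from the thread. It is consistent with rendering at about 600 dpi, but that is an inference, since the thread does not state the dpi.

```java
/** Hypothetical calculator for the tile raster sizes discussed in the thread. */
public final class TileRasterEstimate {
    private TileRasterEstimate() {}

    /** Tile edge in device pixels: pattern step, scaled by the matrix, rendered at the given dpi. */
    public static long tilePixels(double step, double matrixScale, double dpi) {
        return Math.round(step * matrixScale * dpi / 72.0);
    }

    /** Raster bytes for TYPE_INT_ARGB, which stores one 4-byte int per pixel. */
    public static long argbBytes(long width, long height) {
        return Math.multiplyExact(Math.multiplyExact(width, height), 4L);
    }

    /** True if the raster alone would not fit into the configured maximum heap. */
    public static boolean exceedsMaxHeap(long bytes) {
        return bytes > Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        // XStep/YStep of 23438 with a matrix scale of 0.0673396, at 72 dpi.
        long edge72 = tilePixels(23438, 0.0673396, 72);
        // The same 1578 px tile scaled up to 1200 dpi.
        long edge1200 = tilePixels(1578, 1.0, 1200);
        // The raster that failed to allocate, using the edge reported in the thread.
        long failing = argbBytes(13152, 13152);

        System.out.println("edge at 72 dpi:   " + edge72);
        System.out.println("edge at 1200 dpi: " + edge1200);
        System.out.println("13152 x 13152 ARGB bytes: " + failing);
        System.out.println("26300 x 26300 ARGB bytes: " + argbBytes(edge1200, edge1200));
        System.out.println("failing raster exceeds max heap: " + exceedsMaxHeap(failing));
    }
}
```

The 13152 x 13152 case works out to 691,900,416 bytes, which matches the "almost 700MB" figure, and the 1200 dpi case would need about 2.77 GB for a single tile. A check like exceedsMaxHeap is one possible way to decide ahead of time that an image will be "too big" for the heap.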