PS - PdfBox already has the sun.* stuff in it - see IOUtils::unmapper()
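
For anyone following along, here is a minimal sketch of the reflection approach described in the quoted message below. It is not the PdfBox IOUtils code; the class name UnmapSketch and the method tryUnmap are made up for illustration. It assumes a HotSpot-style JVM: on Java 9+ it goes through sun.misc.Unsafe.invokeCleaner(ByteBuffer), and on Java 8 it falls back to the buffer's internal cleaner() method. If neither path is permitted it just returns false and leaves the release to the GC.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class UnmapSketch {

    /**
     * Tries to release the native resources behind a direct (e.g. mapped)
     * buffer immediately instead of waiting for garbage collection.
     * Returns true if a cleaner was invoked, false otherwise.
     * The buffer must never be accessed again after a successful call.
     */
    public static boolean tryUnmap(ByteBuffer buffer) {
        if (buffer == null || !buffer.isDirect()) {
            return false; // heap buffers have nothing to unmap
        }
        try {
            // Java 9+ path: sun.misc.Unsafe.invokeCleaner(ByteBuffer)
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            Object unsafe = theUnsafe.get(null);
            Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
            invokeCleaner.invoke(unsafe, buffer);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            // method missing (Java 8) or call rejected: try the older path
        }
        try {
            // Java 8 path: buffer.cleaner().clean(), looked up reflectively
            Method cleanerMethod = buffer.getClass().getMethod("cleaner");
            cleanerMethod.setAccessible(true);
            Object cleaner = cleanerMethod.invoke(buffer);
            if (cleaner == null) {
                return false;
            }
            Method clean = cleaner.getClass().getMethod("clean");
            clean.setAccessible(true);
            clean.invoke(cleaner);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false; // not permitted on this JVM; the GC will release it eventually
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("unmap-sketch", ".bin");
        Files.write(tmp, new byte[4096]);
        MappedByteBuffer mapped;
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
        System.out.println("first byte: " + mapped.get(0));
        boolean released = tryUnmap(mapped);
        mapped = null; // drop the reference; the buffer is unusable now
        System.out.println("released eagerly: " + released);
        // On Windows the delete can fail while the mapping is still alive.
        Files.deleteIfExists(tmp);
    }
}
```

Nothing here references a sun.* class at compile time, so it builds on any JDK. Restricting the call to Windows, as suggested in the quoted message, would be an os.name check around tryUnmap.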

Kevin Day

trumpet
p | 480.961.6003 x1002
e | ke...@trumpetinc.com
www.trumpetinc.com <http://trumpetinc.com/>
LinkedIn <https://www.linkedin.com/company/trumpet-inc.>| Trumpet Blog
<http://trumpetinc.com/blog/>| Twitter  <https://twitter.com/trumpetinc>

Proud to be Great Place To Work
<https://www.greatplacetowork.com/certified-company/7012667> certified
since 2019


On Sun, Nov 7, 2021 at 8:09 AM Kevin Day <ke...@trumpetinc.com> wrote:

> ok - I think that maybe there are two things going on, then.  The first is
> that we are allocating a single buffer that is bigger than the default heap
> (700MB > 500MB). I really think that the only two solutions are to move
> this off the heap or reduce the size of the allocation.
>
> Then the second issue is maybe a memory leak (I have not been able to see
> this because of the first problem unless I set my max heap to a huge number
> - and that makes memory leak analysis very difficult).  I am going to clone
> the PdfBox git repo now and swap that class in then do a little bit of
> profiling with jvisualvm to see if I can help find the memory leak.
>
> Question:  The Git repo for PdfBox says that it is a mirror.  If I do wind
> up creating a pull request against this repo, will you be able to accept it?
>
>
> FYI - the use of sun.* classes is just to work around a long standing
> native resource leak bug in the Windows MappedByteBuffer implementation -
> On Windows, MappedByteBuffer does not release the allocated native resource
> until finalize() is called, and because the MappedByteBuffer itself does
> not take up much heap space, it can stay on the heap for quite a long time
> even after all references are gone.  So the sun.* reference is just to
> force the release of those native resources.  I used the same technique
> when I created the high throughput IO layer for iText.  Instead of
> referencing the package directly, we can use reflection (and only use it
> for winXXX platform) - this shouldn't be a big problem.
>
> K
>
>
> On Sun, Nov 7, 2021 at 7:53 AM Tilman Hausherr <thaush...@t-online.de>
> wrote:
>
>> I tried it and it works pretty fast. The memory leak is still there but
>> it comes later.
>>
>> Other drawbacks are that it doesn't clean up after itself fast enough,
>> and it uses a sun.* class.
>>
>> Tilman
>>
>> Am 07.11.2021 um 14:31 schrieb Tilman Hausherr:
>> > Am 05.11.2021 um 22:09 schrieb Kevin Day:
>> >> ok - so should we be clamping the xstep in some way?  Or at this depth
>> >> of the algorithm do we not have enough context to actually tell that it
>> >> will be outside the page/clipping region?
>> >
>> > Yes + yes, that is the problem. The context would have to include the
>> > region but also the current transformation matrix. Which could change
>> > despite using the same pattern. So it's tricky.
>> >
>> >>
>> >>
>> >> I'm beginning to think that the BigBufferedImage might be the right
>> >> solution...  This is very inefficient, but honestly, the PDF is not
>> >> exactly well formed, so if these particular files wind up being slower
>> >> to render b/c they have to swap image content to disk, I think that is
>> >> OK...
>> >
>> > Maybe... the good thing is that we know when the image will be "too
>> > big". The license is OK:
>> >
>> > https://www.apache.org/legal/resolved.html
>> >
>> > But I'm still wondering whether we have a memory leak. If we have,
>> > then it should be fixed.
>> >
>> > Tilman
>> >
>> >
>> >>
>> >> - K
>> >>
>> >>
>> >>
>> >> On Fri, Nov 5, 2021 at 12:47 PM Tilman Hausherr <thaush...@t-online.de>
>> >> wrote:
>> >>
>> >>> Am 05.11.2021 um 20:38 schrieb Kevin Day:
>> >>>> I do have a bit of experience with memory leaks (side effect of
>> >>>> writing high performance java code, unfortunately!)
>> >>>>
>> >>>>
>> >>>> I am pretty sure that this is not a memory leak, though.  If this
>> >>>> was a memory leak, we would see calls happening multiple times before
>> >>>> failure.
>> >>>>
>> >>>> The failure happens on the very first call to create the buffered
>> >>>> image:
>> >>>>
>> >>>>     BufferedImage image = new BufferedImage(rasterWidth, rasterHeight,
>> >>>>             BufferedImage.TYPE_INT_ARGB);
>> >>>>
>> >>>> Nothing is even getting put into the weakcache.
>> >>>>
>> >>> That is done in TilingPaintFactory. The cache was needed for some files
>> >>> that have used the pattern several times, to avoid recreating the image
>> >>> for the TexturePaint.
>> >>>
>> >>>
>> >>>> I think the core issue is that the raster data for the image is just
>> >>>> too big to store on the heap (without the heap being massive). It's
>> >>>> 13152x13152x4 bytes (there is an alpha channel on this as well) -
>> >>>> that's almost 700MB - just for this one pattern.
>> >>>>
>> >>>>
>> >>>> Tilman wrote:  "That
>> >>>> pattern has an XStep and YStep of 23438 although the image is 2148 x
>> >>>> 440. Because of a matrix scale of 0.0673396 the image pattern size is
>> >>>> 1578 x 1578 at 72 dpi. So at 1200 dpi the size would be about 26300 x
>> >>>> 26300"
>> >>>>
>> >>>> I am fairly familiar with PDF (I contributed the parsing library to
>> >>>> iText back in the day), but I'm not very familiar with rendering tile
>> >>>> patterns, so the above is not intuitive to me (yet).  Is there some
>> >>>> chance that PdfBox is doing a lot more work than is necessary for this
>> >>>> particular PDF?  Like is the matrix scale just absurdly bad in this
>> >>>> file?  For an image that is originally 2148x440, it seems like
>> >>>> requiring a 700MB raster should not be necessary - but I do NOT have
>> >>>> nearly enough experience with this area of PDF...
>> >>> The ridiculous thing isn't the matrix, it's the XStep and YStep. This
>> >>> goes well outside of the page. The 0.067 matrix scaling makes it less
>> >>> bad.
>> >>>
>> >>> A weakness in PDFBox is that we don't readjust the tile image, or use
>> >>> an alternative painting method that doesn't use TexturePaint when the
>> >>> image would be painted only once.
>> >>>
>> >>> Tilman
>> >>>
>> >>>
>> >>>
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> - Kevin
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Fri, Nov 5, 2021 at 11:27 AM Tilman Hausherr
>> >>>> <thaush...@t-online.de>
>> >>>> wrote:
>> >>>>
>> >>>>> If you're experienced with memory leaks then it would be nice if you
>> >>>>> could search.
>> >>>>>
>> >>>>> Things I tried that didn't help:
>> >>>>> - calling graphics.setPaint(null) after operations (to "lose"
>> >>>>> TilingPaint objects)
>> >>>>> - disabling the (weak) cache of TilingPaint objects
>> >>>>> - adding finalize to see if the TilingPaint class gets finalized
>> >>>>> (yes)
>> >>>>> - adding finalize to see if the HighResolutionImageIcon class gets
>> >>>>> finalized (yes)
>> >>>>>
>> >>>>> Tilman
>> >>>>>
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
>> >>> For additional commands, e-mail: users-h...@pdfbox.apache.org
>> >>>
>> >>>
>> >
>> >
>> >
>>
>>
>>
>>
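
The size figures quoted in the thread can be checked with a few lines of arithmetic. This is only a sketch: the class name TileRasterSize and its methods are made up, and it is not how PDFBox computes rasterWidth/rasterHeight. It assumes 72 user-space units per inch and rounds down. The 600 dpi value is an inference, not something stated in the thread; it is the resolution at which the quoted XStep and matrix scale give the 13152 pixel edge.

```java
public class TileRasterSize {

    /** Pixels along one tile edge at the given dpi (72 user-space units per inch). */
    public static long edgePixels(double step, double matrixScale, double dpi) {
        return (long) Math.floor(step * matrixScale * dpi / 72.0);
    }

    /** Bytes for a square TYPE_INT_ARGB raster: one 4-byte int per pixel. */
    public static long argbBytes(long edge) {
        return edge * edge * 4L;
    }

    public static void main(String[] args) {
        double xStep = 23438;      // XStep/YStep quoted in the thread
        double scale = 0.0673396;  // matrix scale quoted in the thread

        long edge600 = edgePixels(xStep, scale, 600);    // assumed dpi, see lead-in
        long edge1200 = edgePixels(xStep, scale, 1200);  // dpi mentioned by Tilman

        System.out.println("edge @600 dpi:   " + edge600);
        System.out.println("bytes @600 dpi:  " + argbBytes(edge600));
        System.out.println("edge @1200 dpi:  " + edge1200);
        System.out.println("bytes @1200 dpi: " + argbBytes(edge1200));

        // Compare against what this JVM is allowed to allocate.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("fits in max heap @600 dpi: " + (argbBytes(edge600) < maxHeap));
    }
}
```

At 600 dpi this gives a 13152 pixel edge and 691,900,416 bytes, which matches the "almost 700MB" above. Doubling the dpi quadruples the byte count, so the same pattern at 1200 dpi needs roughly 2.8GB.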
