There is another issue with BigBufferedImage: it does not work well on 32-bit
JVMs. The problem is that it creates a single MappedByteBuffer for the
entire size of a channel's data (for the problem file, this means that each
buffer has 172,975,104 bytes).
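The single-mapping problem above could be avoided by mapping the backing file in fixed-size windows instead of as one MappedByteBuffer per channel. Then no single mapping needs 172,975,104 contiguous bytes of virtual address space. The following is a minimal sketch under stated assumptions, not BigBufferedImage's code. `ChunkedMappedFile`, `getInt` and `putInt` are hypothetical names. Windows are mapped lazily on first access. Any window that has been mapped is still released only when the garbage collector gets to it.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

/** Hypothetical helper: int storage in a file, mapped lazily in fixed-size windows. */
final class ChunkedMappedFile implements AutoCloseable {
    private final RandomAccessFile file;
    private final FileChannel channel;
    private final MappedByteBuffer[] windows;
    private final long totalBytes;
    private final int windowBytes;

    ChunkedMappedFile(Path path, long intCount, int windowBytes) throws IOException {
        if (windowBytes <= 0 || windowBytes % 4 != 0) {
            throw new IllegalArgumentException("windowBytes must be a positive multiple of 4");
        }
        this.totalBytes = intCount * 4;
        this.windowBytes = windowBytes;
        this.file = new RandomAccessFile(path.toFile(), "rw");
        this.channel = file.getChannel();
        // One slot per window; nothing is mapped until a window is first touched
        this.windows = new MappedByteBuffer[(int) ((totalBytes + windowBytes - 1) / windowBytes)];
    }

    private MappedByteBuffer window(long byteIndex) throws IOException {
        int w = (int) (byteIndex / windowBytes);
        if (windows[w] == null) {
            long offset = (long) w * windowBytes;
            long size = Math.min(windowBytes, totalBytes - offset);
            // READ_WRITE grows the file if this region lies past its current end
            windows[w] = channel.map(FileChannel.MapMode.READ_WRITE, offset, size);
        }
        return windows[w];
    }

    int getInt(long intIndex) throws IOException {
        long byteIndex = intIndex * 4;
        // windowBytes is a multiple of 4, so an int never straddles two windows
        return window(byteIndex).getInt((int) (byteIndex % windowBytes));
    }

    void putInt(long intIndex, int value) throws IOException {
        long byteIndex = intIndex * 4;
        window(byteIndex).putInt((int) (byteIndex % windowBytes), value);
    }

    @Override
    public void close() throws IOException {
        // Mappings created earlier stay valid after the channel is closed
        file.close();
    }
}
```

With 4 MiB windows, a 172,975,104-byte channel would be split into 42 separate mappings.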

I think it would be better to create a DataBuffer implementation that uses
org.apache.pdfbox.io.ScratchFile and create a PdfBoxBufferedImage...
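A DataBuffer implementation along those lines might look roughly like the sketch below. It pages int pixels into fixed-size blocks outside the Java heap. This sketch does not assume the exact ScratchFile API, so direct ByteBuffers stand in for the pages. A real version would back them with ScratchFile storage instead. `PagedIntDataBuffer` and `createArgbImage` are hypothetical names. The raster is a plain WritableRaster subclass, so the JDK never tries to treat the buffer as a DataBufferInt. The resulting BufferedImage is TYPE_CUSTOM, so Java2D falls back to its slower generic paths when drawing into it.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.SampleModel;
import java.awt.image.WritableRaster;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

/** Hypothetical single-bank TYPE_INT DataBuffer whose elements live off-heap in pages. */
final class PagedIntDataBuffer extends DataBuffer {
    private static final int PAGE_SHIFT = 20;              // 2^20 ints = 4 MiB per page
    private static final int PAGE_SIZE = 1 << PAGE_SHIFT;
    private static final int PAGE_MASK = PAGE_SIZE - 1;
    private final IntBuffer[] pages;

    PagedIntDataBuffer(int size) {
        super(DataBuffer.TYPE_INT, size);
        int pageCount = (int) (((long) size + PAGE_MASK) >>> PAGE_SHIFT);
        pages = new IntBuffer[pageCount];
        for (int p = 0; p < pageCount; p++) {
            int ints = Math.min(PAGE_SIZE, size - p * PAGE_SIZE);
            // Stand-in for a ScratchFile page: direct memory, outside the Java heap
            pages[p] = ByteBuffer.allocateDirect(ints * 4).asIntBuffer();
        }
    }

    @Override
    public int getElem(int bank, int i) {
        return pages[i >>> PAGE_SHIFT].get(i & PAGE_MASK); // bank ignored: single bank
    }

    @Override
    public void setElem(int bank, int i, int val) {
        pages[i >>> PAGE_SHIFT].put(i & PAGE_MASK, val);
    }

    /** Minimal raster subclass; WritableRaster's constructor is protected. */
    private static final class PagedRaster extends WritableRaster {
        PagedRaster(SampleModel sm, DataBuffer db) {
            super(sm, db, new Point(0, 0));
        }
    }

    /** Builds a non-premultiplied ARGB image (TYPE_CUSTOM) backed by paged storage. */
    static BufferedImage createArgbImage(int width, int height) {
        ColorModel cm = ColorModel.getRGBdefault();
        SampleModel sm = cm.createCompatibleSampleModel(width, height);
        DataBuffer db = new PagedIntDataBuffer(Math.multiplyExact(width, height));
        return new BufferedImage(cm, new PagedRaster(sm, db), false, null);
    }
}
```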

- K

Kevin Day

trumpet | p: 480.961.6003 x1002
e: ke...@trumpetinc.com
www.trumpetinc.com <http://trumpetinc.com/>
LinkedIn <https://www.linkedin.com/company/trumpet-inc.> | Trumpet Blog <http://trumpetinc.com/blog/> | Twitter <https://twitter.com/trumpetinc>

Proud to be Great Place To Work
<https://www.greatplacetowork.com/certified-company/7012667> certified
since 2019


On Sun, Nov 7, 2021 at 8:36 AM Tilman Hausherr <thaush...@t-online.de>
wrote:

> On 07.11.2021 at 16:09, Kevin Day wrote:
> > OK, I think that maybe there are two things going on, then. The first is
> > that we are allocating a single buffer that is bigger than the default
> > heap (700MB > 500MB). I really think that the only two solutions are to
> > move this off the heap or to reduce the size of the allocation.
> >
> > The second issue may be a memory leak (I have not been able to see it
> > because of the first problem unless I set my max heap to a huge number,
> > and that makes memory leak analysis very difficult). I am going to clone
> > the PdfBox git repo now, swap that class in, and then do a little
> > profiling with jvisualvm to see if I can help find the memory leak.
> >
> > Question: the Git repo for PdfBox says that it is a mirror. If I do wind
> > up creating a pull request against this repo, will you be able to accept
> > it?
>
> Not directly but we can create a .diff / .patch file from the PR.
>
> In the meantime I ran my rendering regression checks, which render 1000
> files. Drive C: filled up after 260 GB had been used :-(
>
> Tilman
>
>
> >
> >
> > FYI, the use of sun.* classes is just to work around a long-standing
> > native resource leak bug in the Windows MappedByteBuffer implementation.
> > On Windows, MappedByteBuffer does not release the allocated native
> > resource until finalize() is called, and because the MappedByteBuffer
> > itself does not take up much heap space, it can stay on the heap for
> > quite a long time even after all references are gone. So the sun.*
> > reference is just there to force the release of those native resources.
> > I used the same technique when I created the high-throughput IO layer
> > for iText. Instead of referencing the package directly, we can use
> > reflection (and only use it on winXXX platforms); this shouldn't be a
> > big problem.
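The reflection approach described above might look like the following minimal sketch. It assumes JDK 9 or later, where `sun.misc.Unsafe#invokeCleaner(ByteBuffer)` is reachable through the `jdk.unsupported` module. On older JDKs the lookup fails, and `release` simply returns false, leaving the mapping to the garbage collector. The class name `MappedBufferReleaser` is hypothetical. This is not the code BigBufferedImage uses.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;

/** Hypothetical helper: frees a mapping's native resources without compiling against sun.*. */
final class MappedBufferReleaser {
    private static final Object UNSAFE;
    private static final Method INVOKE_CLEANER;

    static {
        Object unsafe = null;
        Method invokeCleaner = null;
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field field = unsafeClass.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            unsafe = field.get(null);
            // Present on JDK 9+; NoSuchMethodException on JDK 8
            invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Not available on this JVM: fall back to GC-driven release
            unsafe = null;
            invokeCleaner = null;
        }
        UNSAFE = unsafe;
        INVOKE_CLEANER = invokeCleaner;
    }

    /** Returns true if the mapping was released; the buffer must not be used afterwards. */
    static boolean release(MappedByteBuffer buffer) {
        if (UNSAFE == null || INVOKE_CLEANER == null) {
            return false;
        }
        try {
            INVOKE_CLEANER.invoke(UNSAFE, buffer);
            return true;
        } catch (ReflectiveOperationException e) {
            // e.g. the buffer was a slice or duplicate, which invokeCleaner rejects
            return false;
        }
    }
}
```

A caller could restrict this to Windows, as suggested, by checking `System.getProperty("os.name")` first. Neither the buffer nor any view of it may be touched after `release` returns true, because accessing unmapped memory can crash the JVM.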
> >
> > K
> >
> >
> > On Sun, Nov 7, 2021 at 7:53 AM Tilman Hausherr <thaush...@t-online.de>
> > wrote:
> >
> >> I tried it and it works pretty fast. The memory leak is still there, but
> >> it shows up later.
> >>
> >> Other drawbacks are that it doesn't clean up after itself fast enough,
> >> and that it uses a sun.* class.
> >>
> >> Tilman
> >>
> >> On 07.11.2021 at 14:31, Tilman Hausherr wrote:
> >>> On 05.11.2021 at 22:09, Kevin Day wrote:
> >>>> OK, so should we be clamping the XStep in some way? Or, at this depth
> >>>> of the algorithm, do we not have enough context to tell that it will be
> >>>> outside the page/clipping region?
> >>> Yes and yes, that is the problem. The context would have to include the
> >>> region but also the current transformation matrix, which could change
> >>> even when the same pattern is used. So it's tricky.
> >>>
> >>>>
> >>>> I'm beginning to think that BigBufferedImage might be the right
> >>>> solution... It is very inefficient, but honestly, the PDF is not
> >>>> exactly well formed, so if these particular files wind up being slower
> >>>> to render because they have to swap image content to disk, I think that
> >>>> is OK...
> >>> Maybe... the good thing is that we know when the image will be "too
> >>> big". The license is OK:
> >>>
> >>> https://www.apache.org/legal/resolved.html
> >>>
> >>> But I'm still wondering whether we have a memory leak. If we have,
> >>> then it should be fixed.
> >>>
> >>> Tilman
> >>>
> >>>
> >>>> - K
> >>>>
> >>>>
> >>>> On Fri, Nov 5, 2021 at 12:47 PM Tilman Hausherr <thaush...@t-online.de>
> >>>> wrote:
> >>>>
> >>>>> On 05.11.2021 at 20:38, Kevin Day wrote:
> >>>>>> I do have a bit of experience with memory leaks (a side effect of
> >>>>>> writing high-performance Java code, unfortunately!)
> >>>>>>
> >>>>>>
> >>>>>> I am pretty sure that this is not a memory leak, though. If it were a
> >>>>>> memory leak, we would see the call succeed several times before it
> >>>>>> failed.
> >>>>>>
> >>>>>> The failure happens on the very first call to create the buffered
> >>>>>> image:
> >>>>>>
> >>>>>>     BufferedImage image = new BufferedImage(rasterWidth, rasterHeight,
> >>>>>>             BufferedImage.TYPE_INT_ARGB);
> >>>>>>
> >>>>>> Nothing is even getting put into the weak cache.
> >>>>>>
> >>>>> That is done in TilingPaintFactory. The cache was needed for some files
> >>>>> that use the same pattern several times, to avoid recreating the image
> >>>>> for the TexturePaint.
> >>>>>
> >>>>>
> >>>>>> I think the core issue is that the raster data for the image is just
> >>>>>> too big to store on the heap (unless the heap is massive). It's
> >>>>>> 13152x13152x4 bytes (there is an alpha channel on this as well), which
> >>>>>> is almost 700MB just for this one pattern.
> >>>>>>
> >>>>>>
> >>>>>> Tilman wrote: "That pattern has an XStep and YStep of 23438 although
> >>>>>> the image is 2148 x 440. Because of a matrix scale of 0.0673396, the
> >>>>>> image pattern size is 1578 x 1578 at 72 dpi. So at 1200 dpi the size
> >>>>>> would be about 26300 x 26300"
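The size arithmetic quoted above can be re-checked in a few lines. This sketch assumes a uniform pattern-matrix scale and 72 user-space units per inch. It also assumes that the pixel size is truncated; PDFBox's actual rounding is not assumed. Under those assumptions, 600 dpi appears to reproduce the 13152-pixel edge mentioned above. Its square is exactly the 172,975,104-element per-channel buffer size from the top of the thread. At 4 bytes per ARGB pixel, that gives the "almost 700MB". `PatternSizeMath` is a hypothetical name.

```java
/** Re-checks the tiling-pattern size figures discussed in this thread. */
final class PatternSizeMath {
    /** Pattern cell edge in user space (72 dpi): XStep times the pattern-matrix scale. */
    static double cellSizeAt72Dpi(double step, double matrixScale) {
        return step * matrixScale;
    }

    /** Cell edge in device pixels at the given resolution (truncation is an assumption). */
    static int cellSizeInPixels(double step, double matrixScale, int dpi) {
        return (int) Math.floor(cellSizeAt72Dpi(step, matrixScale) * dpi / 72.0);
    }

    /** Bytes for a square TYPE_INT_ARGB raster: 4 bytes per pixel. */
    static long argbRasterBytes(int edgePixels) {
        return 4L * edgePixels * edgePixels;
    }
}
```

With the thread's numbers, `cellSizeInPixels(23438, 0.0673396, 600)` is 13152, the 1200 dpi value is 26305, and `argbRasterBytes(13152)` is 691,900,416 bytes.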
> >>>>>>
> >>>>>> I am fairly familiar with PDF (I contributed the parsing library to
> >>>>>> iText back in the day), but I'm not very familiar with rendering
> >>>>>> tiling patterns, so the above is not intuitive to me (yet). Is there
> >>>>>> some chance that PdfBox is doing a lot more work than is necessary for
> >>>>>> this particular PDF? For example, is the matrix scale just absurdly bad
> >>>>>> in this file? For an image that is originally 2148x440, it seems like
> >>>>>> requiring a 700MB raster should not be necessary, but I do NOT have
> >>>>>> nearly enough experience with this area of PDF...
> >>>>> The ridiculous thing isn't the matrix, it's the XStep and YStep. They
> >>>>> go well outside of the page. The 0.067 matrix scaling makes it less
> >>>>> bad.
> >>>>>
> >>>>> A weakness in PDFBox is that we don't readjust the tile image, nor do
> >>>>> we use an alternative painting method that avoids TexturePaint when the
> >>>>> image would be painted only once.
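One input to the "painted only once" decision described above could be a count of how many pattern cells the clip can touch. The sketch below is a hypothetical helper, not PDFBox code. It assumes `patternToDevice` is the pattern matrix already combined with the current transformation and output scale. As noted earlier in the thread, that combination can differ between uses of the same pattern. The helper counts only XStep x YStep grid cells that overlap the clip's bounding box in pattern space. A tile whose BBox extends beyond its cell is not accounted for. A count of 1 would suggest drawing the tile content once directly instead of building a huge TexturePaint image.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Rectangle2D;

/** Hypothetical helper: how many pattern grid cells can a device-space clip touch? */
final class PatternCellCounter {
    static long cellsTouchingClip(AffineTransform patternToDevice, double xStep, double yStep,
                                  Rectangle2D deviceClip) throws NoninvertibleTransformException {
        // Map the clip back into pattern space, where the cells form an axis-aligned grid
        AffineTransform deviceToPattern = patternToDevice.createInverse();
        Rectangle2D b = deviceToPattern.createTransformedShape(deviceClip).getBounds2D();
        // PDF allows negative steps; only the spacing of the lattice matters here
        double sx = Math.abs(xStep);
        double sy = Math.abs(yStep);
        // Cell (i, j) covers [i*sx, (i+1)*sx) x [j*sy, (j+1)*sy)
        long columns = (long) Math.floor(b.getMaxX() / sx) - (long) Math.floor(b.getMinX() / sx) + 1;
        long rows = (long) Math.floor(b.getMaxY() / sy) - (long) Math.floor(b.getMinY() / sy) + 1;
        return columns * rows;
    }
}
```

Take the thread's 0.0673396 scale with XStep = YStep = 23438 and a 612 x 792 (US Letter) clip. The clip then touches a single cell. With a step of 100 and an identity matrix, the same clip touches 56 cells.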
> >>>>>
> >>>>> Tilman
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> - Kevin
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Nov 5, 2021 at 11:27 AM Tilman Hausherr <thaush...@t-online.de>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> If you're experienced with memory leaks, then it would be nice if you
> >>>>>>> could look for it.
> >>>>>>>
> >>>>>>> Things I tried that didn't help:
> >>>>>>> - calling graphics.setPaint(null) after operations (to "lose"
> >>>>>>>   TilingPaint objects)
> >>>>>>> - disabling the (weak) cache of TilingPaint objects
> >>>>>>> - adding finalize() to see whether the TilingPaint class gets
> >>>>>>>   finalized (yes)
> >>>>>>> - adding finalize() to see whether the HighResolutionImageIcon class
> >>>>>>>   gets finalized (yes)
> >>>>>>>
> >>>>>>> Tilman
> >>>>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> >>>>> For additional commands, e-mail: users-h...@pdfbox.apache.org
> >>>>>
> >>>>>
> >>>
> >>
> >>
>
>
>
