Re: Trimming an overly large PDF?

Aaron Mulder Mon, 20 Jan 2025 10:56:01 -0800

Wow, thank you!!

I got it.  16 MB to 336 kB -- that's a pretty good weight-loss treatment!


Thanks,
       Aaron

On Mon, Jan 20, 2025 at 1:29 PM Tilman Hausherr <thaush...@t-online.de>
wrote:

> Hi,
>
> It's tricky... The easiest would probably be to uncompress the whole
> file and remove all references. One has to remove it from the resources,
> and also from the content streams, and find the correct one. It takes
> more time to explain it than to do it. Here it is:
>
>
> https://drive.google.com/file/d/1AmehgqCs3dkz_zR6TCrxNPOxpRTz09sb/view?usp=sharing
>
> Please tell me when you've got it, so I can delete it.
>
> Tilman
>
> On 20.01.2025 18:10, Aaron Mulder wrote:
> > Thank you!  I guess I'm too inexperienced with PDFDebugger -- I didn't
> > find that page background.
> >
> > So my next question is, is there a way to use PDFBox to open this PDF,
> > delete those three images (and any references to them) and then save it
> > back out?
> >
> > I'm able to run a similar grep to your example and see the lines matching
> > the large images, but I don't understand enough to know what the object
> > IDs of the images themselves are.  I guess I'm kind of hoping that if I had
> > their IDs I could go into PDFBox and open the file and try to delete the
> > items with those IDs?  But I'm speculating here, lol.  Don't know if I'd
> > have to manually track down all usage of them and clear it in order to
> > write a valid PDF back out.
> >
> > Thanks,
> >         Aaron
> >
> > On Mon, Jan 20, 2025 at 11:51 AM Tilman Hausherr <thaush...@t-online.de>
> > wrote:
> >
> >> I looked at it with PDFDebugger... There's an image that is 4 MB
> >> compressed and is used by both pages, likely a background.
> >>
> >> Then I looked at it with Notepad++ and searched for /Length. This was
> >> possible because it didn't have compressed object streams.
> >>
> >> There is a second large image with 5 MB. Then I ran a regular expression
> >> and got this:
> >>
> >> Line  10698: <</BitsPerComponent 8/ColorSpace 9 0
> >> R/Filter/FlateDecode/Height 3234/Intent/RelativeColorimetric/*Length
> >> 4784102*/Metadata 87 0 R/Name/X/Subtype/Image/Type/XObject/Width
> >> 2522>>stream
> >> Line  49536: <</BitsPerComponent
> >> 8/ColorSpace/DeviceGray/DecodeParms<</BitsPerComponent 1/Colors
> >> 1/Columns 2447>>/Filter/FlateDecode/Height
> >> 3161/Intent/RelativeColorimetric/*Length
> >> 5345666*/Name/X/Subtype/Image/Type/XObject/Width 2447>>stream
> >> Line 103697: <</BitsPerComponent
> >> 8/ColorSpace/DeviceGray/DecodeParms<</BitsPerComponent 1/Colors
> >> 1/Columns 2448>>/Filter/FlateDecode/Height
> >> 3161/Intent/RelativeColorimetric/*Length
> >> 5509296*/Name/X/Subtype/Image/Type/XObject/Width 2448>>stream
> >>
> >> So you have 3 large images in total. I didn't bother researching where
> >> they are; your PDF is very nested. I suspect that these are more
> >> backgrounds, which contain these "dirty" lines.
> >>
> >> Tilman
> >>
> >> On 20.01.2025 16:56, Aaron Mulder wrote:
> >>> OK this is a long shot but... have a look at this PDF:
> >>>
> >>>
> >>> https://media.dndbeyond.com/compendium-images/phb/downloads/DnD_2024_Character-Sheet.pdf
> >>>
> >>> It's 16 MB.  Eyeballing the thing, it doesn't seem like there's that much
> >>> complexity in there, though it does have a lot of background images or
> >>> textures.
> >>>
> >>> Is there any way to inspect it for "what part of this is so huge?" and
> >>> possibly cut some things out to craft a version more like 1 MB?  You know,
> >>> if there are a few 4 MB images embedded I could just edit it to cut them
> >>> out, or whatever -- some loss of fanciness is OK to me.
> >>>
> >>> I'm going to be creating a bunch of digital D&D character records with that
> >>> sheet and it seems like an epic waste of storage and bandwidth 😂
> >>>
> >>> I looked at it in the PDF Debugger and couldn't find a way to identify all
> >>> the elements on the page, much less by "largest first", though I may have
> >>> missed something.
> >>>
> >>> Thanks,
> >>>         Aaron
> >>>
>
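
The /Length search described in the thread can be approximated without a
text editor. What follows is only a minimal sketch in plain Java (no PDFBox
calls), assuming, as was the case for this file, that the stream
dictionaries are not inside compressed object streams and that /Length is
written as a direct integer. The class name LargeStreamFinder, the Hit type
and the 1 MB threshold are made up for illustration. Line numbers count '\n'
only, so they may not match what an editor reports.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
class LargeStreamFinder {

    /** One /Length entry at or above the threshold. */
    static final class Hit {
        final int line;     // 1-based line number, counting '\n' only
        final long length;  // declared stream length in bytes

        Hit(int line, long length) {
            this.line = line;
            this.length = length;
        }
    }

    // Matches direct integer values only. "/Length 12 0 R" (an indirect
    // reference) and keys such as "/Length1" are deliberately skipped.
    private static final Pattern LENGTH =
            Pattern.compile("/Length\\s+(\\d{1,18}+)(?!\\s+\\d+\\s+R\\b)");

    static List<Hit> find(String pdfText, long minBytes) {
        List<Hit> hits = new ArrayList<>();
        Matcher m = LENGTH.matcher(pdfText);
        int line = 1;
        int scanned = 0;
        while (m.find()) {
            // Count the newlines between the previous match and this one.
            for (int i = scanned; i < m.start(); i++) {
                if (pdfText.charAt(i) == '\n') {
                    line++;
                }
            }
            scanned = m.start();
            long length = Long.parseLong(m.group(1));
            if (length >= minBytes) {
                hits.add(new Hit(line, length));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LargeStreamFinder.java <file.pdf>");
            return;
        }
        // ISO-8859-1 maps every byte to exactly one char, so the binary
        // stream data in the file cannot cause decoding errors.
        String text = new String(Files.readAllBytes(Paths.get(args[0])),
                StandardCharsets.ISO_8859_1);
        for (Hit h : find(text, 1_000_000L)) {
            System.out.println("line " + h.line + ": /Length " + h.length);
        }
    }
}
```

This only locates the large streams. Removing an image still means deleting
it from the resources and from the content streams that draw it, as
described in the thread.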
