Wow, thank you!! I got it. 16 MB to 336 kB -- that's a pretty good weight-loss treatment!
Thanks,
Aaron

On Mon, Jan 20, 2025 at 1:29 PM Tilman Hausherr <thaush...@t-online.de> wrote:

> Hi,
>
> It's tricky... The easiest would probably be to uncompress the whole file and remove all references. One has to remove it from the resources, and also from the content streams, and find the correct one. It takes more time to explain it than to do it. Here it is:
>
> https://drive.google.com/file/d/1AmehgqCs3dkz_zR6TCrxNPOxpRTz09sb/view?usp=sharing
>
> Please tell me when you've got it, so I can delete it.
>
> Tilman
>
> On 20.01.2025 18:10, Aaron Mulder wrote:
> > Thank you! I guess I'm too inexperienced with PDFDebugger -- I didn't find that page background.
> >
> > So my next question is, is there a way to use PDFBox to open this PDF, delete those three images (and any references to them) and then save it back out?
> >
> > I'm able to run a similar grep to your example and see the lines matching the large images, but I don't understand enough to know what the object IDs of the images themselves are. I guess I'm kind of hoping that if I had their IDs I could go into PDFBox and open the file and try to delete the items with those IDs? But I'm speculating here, lol. Don't know if I'd have to manually track down all usage of them and clear it in order to write a valid PDF back out.
> >
> > Thanks,
> > Aaron
> >
> > On Mon, Jan 20, 2025 at 11:51 AM Tilman Hausherr <thaush...@t-online.de> wrote:
> >
> >> I looked at it with PDFDebugger... There's an image that is 4 MB compressed which is used by both pages, likely a background.
> >>
> >> Then I looked at it with Notepad++ and searched for /Length. This was possible because it didn't have compressed object streams.
> >>
> >> There is a second large image with 5 MB.
> >> Then I ran a regular expression and got this:
> >>
> >> Line 10698: <</BitsPerComponent 8/ColorSpace 9 0 R/Filter/FlateDecode/Height 3234/Intent/RelativeColorimetric/*Length 4784102*/Metadata 87 0 R/Name/X/Subtype/Image/Type/XObject/Width 2522>>stream
> >> Line 49536: <</BitsPerComponent 8/ColorSpace/DeviceGray/DecodeParms<</BitsPerComponent 1/Colors 1/Columns 2447>>/Filter/FlateDecode/Height 3161/Intent/RelativeColorimetric/*Length 5345666*/Name/X/Subtype/Image/Type/XObject/Width 2447>>stream
> >> Line 103697: <</BitsPerComponent 8/ColorSpace/DeviceGray/DecodeParms<</BitsPerComponent 1/Colors 1/Columns 2448>>/Filter/FlateDecode/Height 3161/Intent/RelativeColorimetric/*Length 5509296*/Name/X/Subtype/Image/Type/XObject/Width 2448>>stream
> >>
> >> So you have 3 large images in total. I didn't bother researching where they are; your PDF is very nested. I suspect that these are more backgrounds, which contain these "dirty" lines.
> >>
> >> Tilman
> >>
> >> On 20.01.2025 16:56, Aaron Mulder wrote:
> >>> OK this is a long shot but... have a look at this PDF:
> >>>
> >>> https://media.dndbeyond.com/compendium-images/phb/downloads/DnD_2024_Character-Sheet.pdf
> >>>
> >>> It's 16 MB. Eyeballing the thing, it doesn't seem like there's that much complexity in there, though it does have a lot of background images or textures.
> >>>
> >>> Is there any way to inspect it for "what part of this is so huge?" and possibly cut some things out to craft a version more like 1 MB? You know, if there are a few 4 MB images embedded I could just edit it to cut them out, or whatever -- some loss of fanciness is OK to me.
> >>>
> >>> I'm going to be creating a bunch of digital D&D character records with that sheet and it seems like an epic waste of storage and bandwidth 😂
> >>>
> >>> I looked at it in the PDF Debugger and couldn't find a way to identify all the elements on the page, much less by "largest first", though I may have missed something.
> >>>
> >>> Thanks,
> >>> Aaron
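
For anyone repeating the /Length search from this thread without a text editor, here is a minimal sketch in plain Java. It is not PDFBox API and not necessarily the regular expression Tilman used. The class name `LargeStreamFinder`, the nested `Hit` type, and the 1,000,000-byte threshold are all assumptions made for illustration. As noted in the thread, this only works when the PDF does not use compressed object streams. It reports every stream with a large direct /Length, not only images, and binary stream data could in principle produce a false hit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class LargeStreamFinder {

    /** One match: the declared stream length and where "/Length" starts. */
    public static final class Hit {
        public final long length;
        public final int offset;

        Hit(long length, int offset) {
            this.length = length;
            this.offset = offset;
        }
    }

    // "/Length <digits>" with a direct value. The possessive quantifier and
    // the lookaheads skip indirect references such as "/Length 12 0 R".
    // Keys like "/Length1" do not match because whitespace must follow "Length".
    private static final Pattern DIRECT_LENGTH =
            Pattern.compile("/Length\\s+(\\d{1,18}+)(?!\\d)(?!\\s+\\d+\\s+R)");

    /** Returns all direct /Length entries of at least minBytes, largest first. */
    public static List<Hit> find(String pdfText, long minBytes) {
        List<Hit> hits = new ArrayList<>();
        Matcher m = DIRECT_LENGTH.matcher(pdfText);
        while (m.find()) {
            long length = Long.parseLong(m.group(1));
            if (length >= minBytes) {
                hits.add(new Hit(length, m.start()));
            }
        }
        hits.sort(Comparator.comparingLong((Hit h) -> h.length).reversed());
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java LargeStreamFinder file.pdf");
            return;
        }
        // ISO-8859-1 maps every byte to exactly one char, so the reported
        // offsets are byte offsets into the file.
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        for (Hit h : find(text, 1_000_000L)) {
            System.out.printf("%,d bytes at offset %d%n", h.length, h.offset);
        }
    }
}
```

The offsets only tell you where to look in the file. Deciding which hits are images, and removing them from both the resources and the content streams, still has to be done separately as described in the thread.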