Thank you! I guess I'm too inexperienced with PDFDebugger -- I didn't find that page background.
So my next question is, is there a way to use PDFBox to open this PDF, delete those three images (and any references to them) and then save it back out? I'm able to run a similar grep to your example and see the lines matching the large images, but I don't understand enough to know what the object IDs of the images themselves are.

I guess I'm kind of hoping that if I had their IDs I could go into PDFBox, open the file and try to delete the items with those IDs? But I'm speculating here, lol. I don't know if I'd have to manually track down every usage of them and clear it in order to write a valid PDF back out.

Thanks,
Aaron

On Mon, Jan 20, 2025 at 11:51 AM Tilman Hausherr <thaush...@t-online.de> wrote:

> I looked at it with PDFDebugger... There's an image that is 4 MB
> compressed which is used by both pages, likely a background.
>
> Then I looked at it with Notepad++ and searched for /Length. This was
> possible because it didn't have compressed object streams.
>
> There is a second large image with 5 MB. Then I ran a regular expression
> and got this:
>
> Line 10698: <</BitsPerComponent 8/ColorSpace 9 0 R/Filter/FlateDecode/Height 3234/Intent/RelativeColorimetric/*Length 4784102*/Metadata 87 0 R/Name/X/Subtype/Image/Type/XObject/Width 2522>>stream
> Line 49536: <</BitsPerComponent 8/ColorSpace/DeviceGray/DecodeParms<</BitsPerComponent 1/Colors 1/Columns 2447>>/Filter/FlateDecode/Height 3161/Intent/RelativeColorimetric/*Length 5345666*/Name/X/Subtype/Image/Type/XObject/Width 2447>>stream
> Line 103697: <</BitsPerComponent 8/ColorSpace/DeviceGray/DecodeParms<</BitsPerComponent 1/Colors 1/Columns 2448>>/Filter/FlateDecode/Height 3161/Intent/RelativeColorimetric/*Length 5509296*/Name/X/Subtype/Image/Type/XObject/Width 2448>>stream
>
> So you have 3 large images in total. I didn't bother researching where
> they are, your PDF is very nested. I suspect that these are more
> backgrounds, which contain these "dirty" lines.
>
> Tilman
>
> On 20.01.2025 16:56, Aaron Mulder wrote:
> > OK this is a long shot but... have a look at this PDF:
> >
> > https://media.dndbeyond.com/compendium-images/phb/downloads/DnD_2024_Character-Sheet.pdf
> >
> > It's 16 MB. Eyeballing the thing, it doesn't seem like there's that much
> > complexity in there, though it does have a lot of background images or
> > textures.
> >
> > Is there any way to inspect it for "what part of this is so huge?" and
> > possibly cut some things out to craft a version more like 1 MB? You know,
> > if there are a few 4 MB images embedded I could just edit it to cut them
> > out, or whatever -- some loss of fanciness is OK to me.
> >
> > I'm going to be creating a bunch of digital D&D character records with that
> > sheet and it seems like an epic waste of storage and bandwidth 😂
> >
> > I looked at it in the PDF Debugger and couldn't find a way to identify all
> > the elements on the page, much less by "largest first", though I may have
> > missed something.
> >
> > Thanks,
> > Aaron
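
On the object ID question: in an uncompressed PDF each indirect object starts with a header of the form "N G obj", and the dictionary that the grep matches follows that header. So the IDs can probably be recovered by extending the same text search. Below is a rough sketch of that idea in plain Java (no PDFBox calls). Everything in it is an assumption for illustration: the class name LargeImageFinder, the 1 MB threshold, reading the file as ISO-8859-1 so bytes map 1:1 to chars, and the expectation that /Length is a direct number, as it is in the three matches quoted above. It relies on the file having no compressed object streams, and binary stream data could in principle produce a false match.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: a text scan of an uncompressed PDF.
class LargeImageFinder {
    // Header of an indirect object, e.g. "87 0 obj".
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b");
    private static final Pattern IMAGE_SUBTYPE =
            Pattern.compile("/Subtype\\s*/Image\\b");
    // Direct /Length values only; an indirect "/Length 12 0 R" is skipped.
    private static final Pattern DIRECT_LENGTH =
            Pattern.compile("/Length\\s+(\\d{1,18})\\b(?!\\s+\\d+\\s+R\\b)");

    /** Maps "N G R" references of image streams to their /Length, if >= minLength. */
    static Map<String, Long> findLargeImages(String pdfText, long minLength) {
        Map<String, Long> hits = new LinkedHashMap<>();
        Matcher header = OBJ_HEADER.matcher(pdfText);
        while (header.find()) {
            int dictStart = header.end();
            int streamPos = pdfText.indexOf("stream", dictStart);
            int endObjPos = pdfText.indexOf("endobj", dictStart);
            // Skip objects that end before any "stream" keyword (no stream body).
            if (streamPos < 0 || (endObjPos >= 0 && endObjPos < streamPos)) {
                continue;
            }
            String dict = pdfText.substring(dictStart, streamPos);
            if (!IMAGE_SUBTYPE.matcher(dict).find()) {
                continue;
            }
            Matcher len = DIRECT_LENGTH.matcher(dict);
            if (len.find()) {
                long length = Long.parseLong(len.group(1));
                if (length >= minLength) {
                    hits.put(header.group(1) + " " + header.group(2) + " R", length);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LargeImageFinder.java file.pdf");
            return;
        }
        // ISO-8859-1 keeps every byte as one char, so binary data cannot break decoding.
        String text = new String(Files.readAllBytes(Paths.get(args[0])),
                StandardCharsets.ISO_8859_1);
        findLargeImages(text, 1_000_000L)
                .forEach((ref, len) -> System.out.println(ref + " -> /Length " + len));
    }
}
```

This only reports candidate IDs in the "N G R" form that other objects would use to reference the images. It does not remove anything, and whether deleting those objects yields a valid PDF is still the open question above.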