You can try using pdfimages with the -j parameter. If an image is stored with JPEG compression, it will be saved as a JPEG file as-is, which avoids the recompression that would otherwise increase the size.
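To make the pdfimages suggestion concrete, here is a minimal sketch of how it could be run over a file from a script. The function names (build_pdfimages_cmd, extract_images) and the output layout are just assumptions for illustration; the only thing taken from pdfimages itself is the -j option and the usual "pdfimages [options] PDF-file image-root" calling convention.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdfimages_cmd(pdf_path, out_prefix):
    """Build the argument list for pdfimages (hypothetical helper)."""
    # -j writes JPEG-compressed (DCT) images out as .jpg files unchanged,
    # instead of decoding them to PPM/PBM.
    return ["pdfimages", "-j", str(pdf_path), str(out_prefix)]


def extract_images(pdf_path, out_dir):
    """Run pdfimages on one PDF and return the files it wrote (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages was not found on PATH")
    stem = Path(pdf_path).stem
    # pdfimages names its output <image-root>-NNN.<ext>
    subprocess.run(build_pdfimages_cmd(pdf_path, out_dir / stem), check=True)
    return sorted(out_dir.glob(stem + "-*"))
```

Images that are not JPEG-compressed in the PDF will still come out as PBM/PPM files, so this only helps for the JPEG ones.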
Or you can try writing a custom script for pdfedit that looks in the content stream. Depending on how the watermark appears in the content stream, this may be anything between easy and almost impossible (it depends on how hard it is for the script to distinguish the watermark images from the images you want to keep :)

But from the dump of the stream it looks like the watermark always has the same size, so it should be quite easy.

Can you send me one of these documents (not to the list, but to my email), so I can have a look at it without downloading the whole archive?

Martin Petricek

On Sat, 23 Jul 2011 09:45:27 +0200, Federico Leva (Nemo) wrote:
> Hello,
> you might have heard about
> <http://arstechnica.com/tech-policy/news/2011/07/swartz-supporter-dumps-18592-jstor-docs-on-the-pirate-bay.ars>
> We're now going to upload those ~19000 PDFs to the Internet Archive, but
> we need to remove a watermark. Could you please give me a suggestion
> about how to do it? Sadly I don't know anything about PDF manipulation.
> We tried pdfimages, which output a .pbm per page plus a .ppm (the
> footer/watermark); using ImageMagick to recombine pages in a PDF
> compressed with LZM produced a PDF almost 3 times as big as the original
> one, so I think it's better to edit the original PDF without converting
> it to other raster formats.
> The PDF looks like this: http://p.defau.lt/?8I_tQEf0Q2SZpi9CJx6I8A
> Apparently, we need to remove this image:
> /GxMWCL: 18 0 R, 187 x 248
> Which is like this in other PDFs:
> http://p.defau.lt/?I1lqfJPL8ociEfOpvTfPaA
> How can I do it?
> Thank you,
> Federico
_______________________________________________
Pdfedit-support mailing list
Pdfedit-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pdfedit-support