OK, I've Googled this one till my brain hurts and got nothing... time to seek the higher wisdom.

I get large PDF files from publishers to index, which I do by running them through a few bash scripts and then working with the printed output. I have found a way to do everything via bash, but lately the file sizes are getting bigger and bigger (the latest was over 500 MB!) and it takes forever to open and print these -- not to mention paging through them if I need to find something.

The images are of no use to me, so an easy way to compress the files would be to eliminate the images, but as far as I can tell there is no simple way to remove all the images at once from a PDF file, while keeping the text and page layout. Have I missed something obvious, or is this really the case? If so, [insert profane expression of incredulity here]!
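For what it's worth, the one candidate I've seen mentioned but haven't been able to verify is Ghostscript's -dFILTERIMAGE switch (added around version 9.23, if I have that right), which is supposed to drop all raster images while keeping the text and layout. A sketch of what that would look like as a bash helper (the function name and file names are just placeholders):

```shell
#!/usr/bin/env bash
# Sketch only: drop every raster image from a PDF, keeping text/layout.
# Assumes a Ghostscript new enough to support -dFILTERIMAGE (9.23+).
strip_images() {
    local in="$1" out="$2"
    gs -q -dNOPAUSE -dBATCH -dSAFER \
       -sDEVICE=pdfwrite \
       -dFILTERIMAGE \
       -sOutputFile="$out" "$in"
}
```

Usage would be something like: strip_images big.pdf text-only.pdf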

The second-best option is to reduce the quality of the images to a bare minimum, but so far the only way I can find to do this is to use a Windows system, open the file in Adobe Acrobat, go to the Print dialog, change the settings and print the whole thing to another PDF file with minimal image quality. It's a pain and it takes forever.

Any ideas? There are various suggestions on the web about using Ghostscript, ImageMagick, ps2ps and so on, but everything I've tried so far seems to make the resulting file larger instead of smaller.
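To be concrete, the sort of Ghostscript invocation the web suggests is roughly the following (sketched as a bash helper; the function name and file names are placeholders). As I understand it, -dPDFSETTINGS=/screen is the most aggressive built-in preset, downsampling images to around 72 dpi -- and leaving that flag out may be why some recipes produce a *larger* file:

```shell
#!/usr/bin/env bash
# Sketch only: recompress a PDF with heavily downsampled images via
# Ghostscript's pdfwrite device. /screen is the lowest-quality preset.
shrink_pdf() {
    local in="$1" out="$2"
    gs -q -dNOPAUSE -dBATCH -dSAFER \
       -sDEVICE=pdfwrite \
       -dCompatibilityLevel=1.4 \
       -dPDFSETTINGS=/screen \
       -sOutputFile="$out" "$in"
}
```

Usage would be something like: shrink_pdf big.pdf small.pdf -- but on my files this kind of thing has so far gone the wrong way, hence the question.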

I'm doing this quite often, so a bash script would be useful. I can also probably make sense of Python, but anything beyond that might be a stretch.

Thanks in advance,

Jon.
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
