Re: [CODE4LIB] PDF Compression
If you didn't optimize images before making the PDF, you can go into Preflight and do things like convert to grayscale, convert to black and white, etc. You can select which pages to do it on, so you can knock down the resolution and color depth for text pages but leave them high for images.

OCR stores text on a separate layer, so if you flatten layers, run OCR afterward. Also, in Acrobat Pro 9, if you run OCR as a batch processing function, your staff time is almost nothing. You just leave the computer churning away for days or weeks, and it can process all PDFs in folders and subfolders. So start it on files before a long weekend, and come back to the project when your backlog is processed.

By the way, for long-term archival purposes, you are better off losing some quality in the PDFs as a trade-off for manageable file size. If you compress, then uncompressing at a later date may be a problem. Whether you will be able to uncompress in the future should be a primary factor in selecting a compression program.

-Wilhelmina Randtke

On Oct 24, 2012 12:09 PM, danielle plumer dcplu...@gmail.com wrote:

As you probably know, you can compress PDFs by compressing or flattening the layers (most useful for born-digital materials, such as artwork) or by applying a compression algorithm to the underlying images for PDFs assembled from digitized images, which seems to be what you're doing. Reducing the image size (pixels) and bit depth prior to assembling images in a PDF (i.e., don't start with your 800ppi TIFF master) can make a dramatic difference in the total size of the PDF.

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Nathan Tallman
Sent: Wednesday, October 24, 2012 10:29 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] PDF Compression

Can anyone recommend some good PDF compression software? Preferably open-source or low-cost.

We're scanning archival collections and the PDFs can be quite large for a single folder. The folder may be thick or thin, and contain a mix of text and images. We've fiddled with various Acrobat settings for getting the file size down, but we haven't found a good balance between quality and file size. (Plus, these need to be OCR'ed; so far we've been doing that in Acrobat.)

We were looking at LuraTech PDF Compressor, but the cost for an enterprise license is pretty high. It did do an excellent job though.

Thanks,
Nathan
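The "all PDFs in folders and subfolders" batch idea from the reply above can also be sketched outside Acrobat. This is only a rough Python sketch: it finds the PDFs under a folder and plans an output path for each, and leaves the actual OCR or compression step to whatever tool you choose. The function names, the `_small` suffix, and the mirrored output layout are assumptions made up for illustration, not anything Acrobat does.

```python
from pathlib import Path


def find_pdfs(root):
    """Return every PDF under root, including subfolders, in sorted order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def plan_batch(root, out_root, suffix="_small"):
    """Pair each source PDF with an output path that mirrors the folder tree.

    The suffix and the mirrored layout are hypothetical choices for this
    sketch; the point is that originals are never overwritten.
    """
    root, out_root = Path(root), Path(out_root)
    jobs = []
    for src in find_pdfs(root):
        rel = src.relative_to(root)
        dst = out_root / rel.with_name(rel.stem + suffix + rel.suffix)
        jobs.append((src, dst))
    return jobs
```

Writing to a separate output tree keeps the scanned masters untouched, so a bad setting only costs you a rerun over the weekend.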
[CODE4LIB] OCR news (was Re: [CODE4LIB] PDF Compression)
On a vaguely related note: I was talking to some people at the ABBYY booth in the exhibit hall for KMWorld/Enterprise Search Summit/Taxonomy Bootcamp, and asked whether the Linux version of their software was going to be updated. Apparently it will be skipping a major version and will be syncing with the Windows server version some time in Q1 2013.

I think enough people hated having to keep a Windows server in the loop (i.e., everybody not running SharePoint) that it was worth the effort. Since their SDK provides the OCR engine for a lot of other products, this may have ripple effects for those products.

Simon

On Sun, Oct 28, 2012 at 2:10 AM, Wilhelmina Randtke rand...@gmail.com wrote:

OCR will store text on a separate layer, so if you flatten layers run OCR later. Also, in Acrobat Pro 9, if you do OCR as a batch processing function, then your staff time is nothing.
Re: [CODE4LIB] PDF Compression
Have you looked into IrfanView's [www.irfanview.com] batch conversion settings and plugins? There might be something useful there.

Cheers, Paul

+-+-+-+-+-+-+-+-+-+-+-+-+
Paul R Butler
Assistant Systems Librarian
Simpson Library
University of Mary Washington
1801 College Avenue
Fredericksburg, VA 22401
540.654.1756
libraries.umw.edu

Sent from the mighty Dell Vostro 230.
Re: [CODE4LIB] PDF Compression
Have you tried Ghostscript? It should be available for any *nix-like OS or Windows [1].

Cheers,
Bridger

[1] http://www.ghostscript.com/download/

On Wed, Oct 24, 2012 at 10:59 AM, Paul Butler (pbutler3) pbutl...@umw.edu wrote:

Have you looked into IrfanView's [www.irfanview.com] batch conversion settings and plugins? Might be something there that is useful.
Re: [CODE4LIB] PDF Compression
As you probably know, you can compress PDFs by compressing or flattening the layers (most useful for born-digital materials, such as artwork) or by applying a compression algorithm to the underlying images for PDFs assembled from digitized images, which seems to be what you're doing. Reducing the image size (pixels) and bit depth prior to assembling images in a PDF (i.e., don't start with your 800ppi TIFF master) can make a dramatic difference in the total size of the PDF.

Beyond that, lossless and lossy compression algorithms can reduce the size of the underlying image files, with different techniques working well on different types of images. IrfanView and Ghostscript can help with this. LZW is one of the more common lossless compression algorithms for TIFF images. JPEG2000 also offers good lossless compression.

In addition to LuraTech, there's at least one other proprietary PDF compression system, developed by SAFER Inc. (http://www.saferinc.com/). Based on a conversation with someone from the company about 18 months ago, they use algorithms that do automatic edge detection and background detection, applying compression non-uniformly to regions that appear to contain little information. At the time of this conversation, they weren't able to give me any white papers or peer-reviewed articles describing the algorithms used, which made me hesitant about recommending the system for anything remotely archival, though they claimed it was lossless. For use copies, though, the software does work very well, and file size reduction is dramatic. I don't know anything about pricing.

LuraTech may use something similar in their Mixed Raster Content (MRC) or layered compression. As far as I know, IrfanView and Ghostscript don't include algorithms to do anything similar.
Danielle Cunniff Plumer
dcplumer associates
www.dcplumer.com
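To put rough numbers on the point about reducing pixel dimensions and bit depth before assembling the PDF, here is a small Python sketch of the arithmetic. The letter-size page and the 300ppi derivative settings are assumptions chosen for illustration (only the 800ppi master comes from the message above), and the figures are uncompressed sizes that ignore row padding and file headers.

```python
def raw_image_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Uncompressed size in bytes of a scan at the given resolution and bit depth."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical letter-size page (8.5 x 11 in), before any compression.
master = raw_image_bytes(8.5, 11, 800, 24)   # 24-bit color master
gray = raw_image_bytes(8.5, 11, 300, 8)      # 8-bit grayscale derivative
bitonal = raw_image_bytes(8.5, 11, 300, 1)   # 1-bit black and white derivative

for label, size in [
    ("800ppi color", master),
    ("300ppi grayscale", gray),
    ("300ppi bitonal", bitonal),
]:
    print(f"{label}: {size / 1_000_000:.1f} MB")
```

Under these assumptions a single page drops from about 179.5 MB to 8.4 MB in grayscale or 1.1 MB in black and white before any compression algorithm is applied, which is why the choice of derivative matters more than the choice of compressor.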
Re: [CODE4LIB] PDF Compression
+1 for Ghostscript. I used it for exactly the purpose you are talking about and found it very useful. Its copious options list (http://ghostscript.com/doc/current/Ps2pdf.htm#Options) means that the learning curve is a little steep, but it will let you do pretty much anything you want to your PDFs.

Chad

On Wed, Oct 24, 2012 at 11:22 AM, Bridger Dyson-Smith bdysonsm...@gmail.com wrote:

Have you tried ghostscript? It should be available for any *nix-like OS or Windows [1].

[1] http://www.ghostscript.com/download/
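As a starting point for that options list, here is a minimal Python sketch that builds a Ghostscript command line using the `pdfwrite` device and one of the `-dPDFSETTINGS` presets. The file names, the function names, and the `gs` executable name are assumptions (on Windows the console binary is typically named differently), and this is one common way to shrink a PDF rather than the setup anyone in this thread actually used. Check on a few sample files that image quality and the OCR text layer come through acceptably before running it on a whole collection.

```python
import subprocess


def gs_compress_command(src, dst, preset="ebook", gs="gs"):
    """Build a Ghostscript pdfwrite command that rewrites src as a smaller dst.

    preset is one of Ghostscript's -dPDFSETTINGS names; /screen and /ebook
    downsample images more aggressively than /printer or /prepress.
    """
    allowed = {"screen", "ebook", "printer", "prepress", "default"}
    if preset not in allowed:
        raise ValueError(f"unknown preset: {preset}")
    return [
        gs,
        "-sDEVICE=pdfwrite",          # write a new PDF
        f"-dPDFSETTINGS=/{preset}",   # bundle of image downsampling defaults
        "-dNOPAUSE",                  # do not prompt between pages
        "-dBATCH",                    # exit when the input is finished
        "-dQUIET",                    # suppress routine messages
        f"-sOutputFile={dst}",
        str(src),
    ]


def compress(src, dst, preset="ebook"):
    """Run Ghostscript on one file; raises CalledProcessError if it fails."""
    subprocess.run(gs_compress_command(src, dst, preset), check=True)


# Hypothetical file names; printing the command lets you inspect it first.
print(" ".join(gs_compress_command("folder_001.pdf", "folder_001_small.pdf")))
```

Building the argument list separately from running it makes it easy to log each command, try different presets on the same file, and compare the resulting sizes.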
Re: [CODE4LIB] PDF Compression
Thank you everyone for the replies and ideas. It looks like Ghostscript is going to be my best bet... on to testing!

Thanks,
Nathan