Re: [CODE4LIB] PDF Compression

2012-10-28 Thread Wilhelmina Randtke
If you didn't optimize the images before making the PDF, you can go into
Preflight and do things like convert pages to grayscale or to black and white.
You can also select which pages to do it on, so you can knock down the
resolution and color depth for text but leave them high for images.
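
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. It assumes a US Letter page (8.5 x 11 inches) and counts raw, uncompressed pixels only; the function name and the example resolutions are illustrative, not Acrobat settings.

```python
def raw_page_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page image."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: US Letter, 8.5 x 11 inches.
LETTER = (8.5, 11)

color_600 = raw_page_bytes(*LETTER, ppi=600, bits_per_pixel=24)  # photo page
gray_300 = raw_page_bytes(*LETTER, ppi=300, bits_per_pixel=8)    # grayscale text
bw_300 = raw_page_bytes(*LETTER, ppi=300, bits_per_pixel=1)      # black-and-white text

for label, size in [("600 ppi color", color_600),
                    ("300 ppi gray", gray_300),
                    ("300 ppi b/w", bw_300)]:
    print(f"{label:>14}: {size / 1_000_000:7.1f} MB")
```

Even before any compression is applied, the black-and-white text page in this sketch is nearly a hundred times smaller than the color page, which is why treating text pages and image pages differently pays off.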

OCR will store text on a separate layer, so if you flatten layers, run OCR
afterward. Also, in Acrobat Pro 9, if you do OCR as a batch processing
function, then your staff time is nothing. You just leave the computer
churning away for days or weeks, and it can do all the PDFs in folders and
subfolders. So start it on the files before a long weekend, and come back to
the project when your backlog is processed.
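
Outside Acrobat, the "folders and subfolders" part of a batch job is easy to script. A minimal Python sketch, assuming a hypothetical `scans` root folder; it only builds the work list, and whatever OCR tool you use would consume it.

```python
from pathlib import Path


def find_pdfs(root):
    """Return every PDF under root, including subfolders, in sorted order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    # "scans" is a hypothetical folder name, not one from this thread.
    for pdf in find_pdfs("scans"):
        print(pdf)
```

Matching on the lowercased suffix picks up files named `.PDF` as well as `.pdf`, which is common with scanner output.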

By the way, for long-term archival purposes, you are better off losing
some quality in the PDFs as a trade-off for a manageable file size. If you
compress, then decompressing at a later date may be a problem, so whether
the files can still be decompressed in the future should be a primary factor
in selecting a compression program.
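
One way to make "can we decompress it later" concrete is a round-trip check at ingest time. The sketch below uses Python's standard `zlib` module (Deflate, the algorithm behind PDF's FlateDecode filter) purely as a stand-in for whatever lossless codec you choose; the sample bytes are made up.

```python
import hashlib
import zlib


def round_trip_ok(original: bytes) -> bool:
    """Compress, decompress, and confirm the bytes come back unchanged."""
    restored = zlib.decompress(zlib.compress(original, 9))
    return hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()


# Made-up stand-in for raw image data: long runs, like a scanned page's white space.
sample = b"\xff" * 10_000 + b"\x00" * 200 + b"\xff" * 10_000
packed = zlib.compress(sample, 9)

print(len(sample), "bytes ->", len(packed), "bytes")
print("bit-exact after round trip:", round_trip_ok(sample))
```

Storing a checksum of the uncompressed data alongside the compressed file gives a future reader something to verify against, provided the decompressor itself is still available.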

-Wilhelmina Randtke



[CODE4LIB] OCR news (was Re: [CODE4LIB] PDF Compression)

2012-10-28 Thread Simon Spero
On a vaguely related note: I was talking to some people at the ABBYY booth
in the exhibit hall for KMWorld/Enterprise Search Summit/Taxonomy
Bootcamp, and asked whether the Linux version of their software was going
to be updated. Apparently it will skip a major version and will be
synced with the Windows server version some time in Q1 2013.

I think enough people hated having to keep a Windows server in the loop
(i.e. everybody not running SharePoint) that it was worth the effort.

Since their SDK provides the OCR engine for a lot of other products, this
may have ripple effects for those products.

Simon

On Sun, Oct 28, 2012 at 2:10 AM, Wilhelmina Randtke rand...@gmail.com wrote:

 OCR will store text on a separate layer, so if you flatten layers, run OCR
 afterward. Also, in Acrobat Pro 9, if you do OCR as a batch processing
 function, then your staff time is nothing.


Re: [CODE4LIB] PDF Compression

2012-10-24 Thread Paul Butler (pbutler3)
Have you looked into IrfanView's [www.irfanview.com] batch conversion settings
and plugins? There might be something useful there.
Cheers, Paul
+-+-+-+-+-+-+-+-+-+-+-+-+
Paul R Butler
Assistant Systems Librarian
Simpson Library
University of Mary Washington
1801 College Avenue
Fredericksburg, VA 22401
540.654.1756
libraries.umw.edu

Sent from the mighty Dell Vostro 230.


-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Nathan 
Tallman
Sent: Wednesday, October 24, 2012 10:29 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] PDF Compression

Can anyone recommend some good PDF compression software? Preferably open-source
or low-cost. We're scanning archival collections and the PDFs can be quite
large for a single folder. The folder may be thick or thin, and contain a mix 
of text and images. We've fiddled with various Acrobat settings for getting the 
file size down, but we haven't found a good balance between quality and file 
size. (Plus, these need to be OCR'ed; so far we've been doing that in Acrobat.)

We were looking at LuraTech PDF Compressor, but the cost for an enterprise 
license is pretty high. It did do an excellent job though.

Thanks,
Nathan


Re: [CODE4LIB] PDF Compression

2012-10-24 Thread Bridger Dyson-Smith
Have you tried Ghostscript? It should be available for any *nix-like OS or
Windows [1].

Cheers,
Bridger
[1] http://www.ghostscript.com/download/




Re: [CODE4LIB] PDF Compression

2012-10-24 Thread danielle plumer
As you probably know, you can compress PDFs by compressing or flattening
the layers (most useful for born-digital materials, such as artwork) or, for
PDFs assembled from digitized images, by applying a compression algorithm to
the underlying images, which seems to be what you're doing.
Reducing the image size (pixels) and bit depth before assembling the images
into a PDF (i.e., don't start with your 800ppi TIFF master) can make a
dramatic difference to the total size of the PDF. Beyond that, lossless and
lossy compression algorithms can reduce the size of the underlying image
files, with different techniques working well on different types of images.
IrfanView and Ghostscript can help with this. LZW is one of the more common
lossless compression algorithms for TIFF images. JPEG 2000 also offers good
lossless compression.
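
For anyone curious what LZW actually does, here is a textbook sketch in Python. It shows the dictionary-building idea only; TIFF's real LZW bitstream adds variable-width codes and clear codes, which are left out here, and the sample scan line is made up.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW dictionary codes."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate          # keep extending the known phrase
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # learn the new, longer phrase
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes from LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code not in the table yet: it must be previous + its own first byte.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


# Made-up scan line: mostly white with a short dark stroke.
row = b"\xff" * 40 + b"\x00" * 8 + b"\xff" * 40
codes = lzw_compress(row)
print(len(row), "bytes ->", len(codes), "codes")
print("lossless:", lzw_decompress(codes) == row)
```

Long runs of identical pixels, like the white space on a scanned page, are where this kind of dictionary coding does best; noisy photographic content gains much less.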

In addition to LuraTech, there's at least one other proprietary PDF
compression system, developed by SAFER Inc. (http://www.saferinc.com/).
According to someone from the company I spoke with about 18 months ago, they
use algorithms that do automatic edge detection and background detection,
applying compression non-uniformly to regions that appear to contain little
information. At the time of this conversation, they weren't able to give me
any white papers or peer-reviewed articles describing the algorithms used,
which made me hesitant about recommending the system for anything remotely
archival, though they claimed it was lossless. For use copies, though, the
software does work very well, and file size reduction is dramatic. I don't
know anything about pricing. LuraTech may use something similar in their
Mixed Raster Content (MRC) or layered compression. As far as I know,
IrfanView and Ghostscript don't include algorithms to do anything similar.
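
For readers who haven't met Mixed Raster Content, the general idea is to split a page into a 1-bit mask of the text pixels plus a smooth background, so each layer can be compressed with a method that suits it. The toy Python sketch below illustrates only that split, using a fixed threshold on a tiny made-up grayscale grid; it is not LuraTech's or SAFER's algorithm, which this thread doesn't describe.

```python
def split_layers(page, threshold=128):
    """Split a grayscale page (rows of 0-255 ints) into a text mask and a background.

    mask[y][x] is 1 where the pixel is darker than the threshold (likely ink).
    In the background, masked pixels are replaced by the average of the
    non-ink pixels, which leaves a smooth layer that compresses well.
    """
    mask = [[1 if px < threshold else 0 for px in row] for row in page]
    paper = [px for row in page for px in row if px >= threshold]
    fill = round(sum(paper) / len(paper)) if paper else 255
    background = [
        [fill if m else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(page, mask)
    ]
    return mask, background


# Made-up 4x6 "scan": light paper (about 240) with a few dark ink pixels.
page = [
    [240, 241, 239, 240, 242, 240],
    [240,  20,  25, 240,  30, 241],
    [239,  22, 240, 240,  28, 240],
    [240, 240, 241, 239, 240, 240],
]

mask, background = split_layers(page)
for row in mask:
    print("".join("#" if m else "." for m in row))
```

A real system would choose the mask adaptively (that is where edge and background detection come in) and then code the mask with a bitonal method and the background with a lossy image codec at lower resolution.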

Danielle Cunniff Plumer
dcplumer associates
www.dcplumer.com






Re: [CODE4LIB] PDF Compression

2012-10-24 Thread Chad Nelson
+1 for Ghostscript. I used it for exactly the purpose you are talking about
and found it very useful.

Its copious options list (http://ghostscript.com/doc/current/Ps2pdf.htm#Options)
means that the learning curve is a little steep, but it will let you do
pretty much anything you want to your PDFs.
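
As a starting point through those options, here is a minimal Python sketch that assembles a typical Ghostscript command line for shrinking a PDF with the `pdfwrite` device and one of its `-dPDFSETTINGS` presets. The helper name and the file names are placeholders, and the executable is assumed to be `gs` (on Windows it is usually `gswin64c` or `gswin32c`).

```python
PRESETS = {"screen", "ebook", "printer", "prepress", "default"}


def gs_compress_args(src, dst, preset="ebook", gs="gs"):
    """Build a Ghostscript command line that rewrites src as a smaller PDF at dst."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        gs,
        "-sDEVICE=pdfwrite",         # write a new PDF
        f"-dPDFSETTINGS=/{preset}",  # bundle of downsampling/compression defaults
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


if __name__ == "__main__":
    # Placeholder file names; this only prints the command.
    print(" ".join(gs_compress_args("in.pdf", "out.pdf")))
    # To actually run it: subprocess.run(gs_compress_args(...), check=True)
```

The presets trade quality for size in a fixed order (`screen` is the most aggressive, `prepress` the least), so it is worth trying two or three on a sample folder and checking the output by eye before committing to one.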

Chad.







Re: [CODE4LIB] PDF Compression

2012-10-24 Thread Nathan Tallman
Thank you everyone for the replies and ideas. It looks like Ghostscript is
going to be my best bet... on to testing!

Thanks,
Nathan
