Sorry for the long post, but I think that
the issue needs some discussion.

I wrote to this group several days ago
about the larger PDF file output sizes
when using 1.7.1 vs 1.6.0.

I have done some work and discovered that
the larger PDF output files first appeared
between 1.6.0 and 1.7.0. I was wrong to
suggest that the problem was introduced
in 1.7.1.

I tried some test cases and concluded that
a study of the source was needed.

It turns out that the issue may not really be
a "bug" but more of a "feature". However,
the lack of consistency (and documentation?)
has caused some of the problems I've encountered.

There are two obvious ways to instantiate the
PDJpeg object: one with a FileInputStream, and
the other with a BufferedImage.

With the FileInputStream instantiation, everything
looks the same between 1.6.0 and 1.7.0. Files are
the same size.

I cannot use this instantiation for my project
because I have both monochrome (gray-scale) and
color images. The FileInputStream code marks all
images as color, and that causes PDF readers to
choke: some throw an error, others just won't
display the gray-scale image.
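Whether a JPEG is gray-scale can be read from its frame header without decoding any pixels: the SOFn segment records the number of image components (1 for gray-scale, 3 for YCbCr/RGB, 4 for CMYK/YCCK). A minimal sketch using only java.io follows; the class and method names are hypothetical and the code is not taken from PDFBox.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: reads a JPEG's frame header without decoding pixels. */
final class JpegComponents {

    private JpegComponents() {}

    /**
     * Returns the component count (Nf) from the first SOFn frame header:
     * 1 = gray-scale, 3 = YCbCr/RGB, 4 = CMYK/YCCK.
     */
    static int countComponents(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readUnsignedShort() != 0xFFD8) {
            throw new IOException("Not a JPEG stream (missing SOI marker)");
        }
        while (true) {
            if (data.readUnsignedByte() != 0xFF) {
                throw new IOException("Expected a JPEG marker");
            }
            int marker = data.readUnsignedByte();
            while (marker == 0xFF) {              // optional fill bytes before a marker
                marker = data.readUnsignedByte();
            }
            if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7)) {
                continue;                          // TEM and RSTn carry no length field
            }
            if (marker == 0xD9 || marker == 0xDA) {
                throw new IOException("Reached EOI/SOS before any frame header");
            }
            int length = data.readUnsignedShort(); // segment length, includes these 2 bytes
            boolean isSof = marker >= 0xC0 && marker <= 0xCF
                    && marker != 0xC4 && marker != 0xC8 && marker != 0xCC; // not DHT/JPG/DAC
            if (isSof) {
                data.readUnsignedByte();           // sample precision
                data.readUnsignedShort();          // number of lines (height)
                data.readUnsignedShort();          // samples per line (width)
                return data.readUnsignedByte();    // Nf: number of image components
            }
            if (length < 2) {
                throw new IOException("Corrupt segment length " + length);
            }
            data.readFully(new byte[length - 2]);  // skip the rest of this segment
        }
    }
}
```

Code on the stream path could use a count of 1 to pick a gray color space instead of assuming color. The helper consumes the stream, so the file has to be reopened (or the bytes buffered) before being handed on.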

To get around that behavior, I changed my
production code to use the BufferedImage
instantiation. Apparently, with version
1.7.0, the PDFBox code checks for gray-scale
vs color, and sets the correct parameters.
This is a good thing, and removes the need
for my work-around.
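For a decoded BufferedImage, an equivalent gray-vs-color decision can look at the color model, or at the pixels themselves. A minimal sketch with hypothetical names, not taken from PDFBox, that maps the result onto the PDF color space names DeviceGray and DeviceRGB:

```java
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;

/** Hypothetical helpers for choosing a PDF color space for a BufferedImage. */
final class GrayCheck {

    private GrayCheck() {}

    /** "DeviceGray" when the color model is single-channel gray, else "DeviceRGB". */
    static String pdfColorSpaceName(BufferedImage image) {
        ColorSpace cs = image.getColorModel().getColorSpace();
        return cs.getType() == ColorSpace.TYPE_GRAY ? "DeviceGray" : "DeviceRGB";
    }

    /** True when every pixel has equal R, G and B values, i.e. gray in content. */
    static boolean allPixelsGray(BufferedImage image) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int argb = image.getRGB(x, y);      // converted to default sRGB
                int r = (argb >> 16) & 0xFF;
                int g = (argb >> 8) & 0xFF;
                int b = argb & 0xFF;
                if (r != g || g != b) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

The color-model test alone misses 1-bit TYPE_BYTE_BINARY images, whose IndexColorModel is defined over sRGB, so the pixel scan is a useful fallback. The sketch also treats CMYK images as RGB, which real code would need to handle separately.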

However, the larger PDF files I was observing
were due to a change in how the quality
(compression) level is handled for JPG images
instantiated this way.

The execution paths for these two instantiation
methods are quite different, and this leads to
different behavior between the two. One obvious
difference I have already described: the setting
of the correct gray vs color parameter.

The other differences center on the JPG quality
setting. Higher quality means larger PDF files
(about 3x the size produced under 1.6.0).

In addition, there is a "compression" parameter used
in PDJpeg (which can also be specified in the constructor)
that is not passed on to the image writer in 1.7.0.
It appears to be a dead-end value.

The BufferedImage constructor seems to set
the quality to 1.0, while the previous code
appears to have used 0.75. This difference in
quality may explain the larger output files.
However, there seems to be no way to set the
quality explicitly.
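At the javax.imageio level, JPEG quality is set through ImageWriteParam in explicit compression mode. Assuming the PDFBox code ultimately relies on the standard javax.imageio JPEG writer (not confirmed here), a sketch like the following, with a hypothetical class name, shows where such a setting would go and makes it easy to compare output sizes at 1.0 and 0.75:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

/** Hypothetical helper: encodes a BufferedImage as JPEG with an explicit quality. */
final class JpegQuality {

    private JpegQuality() {}

    /** Quality runs from 0.0 (smallest, worst) to 1.0 (largest, best). */
    static byte[] encode(BufferedImage image, float quality) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpeg");
        if (!writers.hasNext()) {
            throw new IOException("No JPEG ImageWriter available");
        }
        ImageWriter writer = writers.next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT); // use our quality, not the default
        param.setCompressionQuality(quality);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray(); // stream closed above, so all bytes are flushed
    }
}
```

Encoding the same image at 1.0f and at 0.75f and comparing the array lengths gives a quick measure of how much of the size increase comes from the quality setting alone.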


In addition, the image writer now used in 1.7.0
(ImageIOUtil.writeImage) is invoked with no "resolution"
parameter, and so seems to set an arbitrary 72 dpi
resolution for the images. I do not understand the
purpose of this "resolution" entry. Can someone
explain it? Is it important?
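One plausible reading, and only an assumption here, is that the "resolution" ends up as the JPEG's JFIF density header (resUnits, Xdensity, Ydensity). That header is metadata: it records how many pixels per inch the image claims, but it does not change the pixel count or the compressed data. In a PDF, the size at which an image is drawn is set by the transformation in the page content, not by this header; whether PDFBox consults it when choosing a default drawn size is not confirmed here. A sketch using the standard JPEG metadata tree, with a hypothetical class name, that writes and reads the density:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataNode;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;

/** Hypothetical helper: writes and reads the JFIF pixel-density ("dpi") header. */
final class JfifDensity {

    private static final String FORMAT = "javax_imageio_jpeg_image_1.0";

    private JfifDensity() {}

    /** Encodes the image as JPEG, declaring dpi dots per inch in the JFIF header. */
    static byte[] writeWithDpi(BufferedImage image, int dpi) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        try {
            ImageWriteParam param = writer.getDefaultWriteParam();
            IIOMetadata metadata =
                    writer.getDefaultImageMetadata(new ImageTypeSpecifier(image), param);
            IIOMetadataNode root = (IIOMetadataNode) metadata.getAsTree(FORMAT);
            IIOMetadataNode jfif = (IIOMetadataNode) root.getElementsByTagName("app0JFIF").item(0);
            if (jfif == null) {
                throw new IOException("Default metadata has no JFIF segment");
            }
            jfif.setAttribute("resUnits", "1");           // 1 = dots per inch, 2 = per cm
            jfif.setAttribute("Xdensity", Integer.toString(dpi));
            jfif.setAttribute("Ydensity", Integer.toString(dpi));
            metadata.setFromTree(FORMAT, root);

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
                writer.setOutput(out);
                writer.write(null, new IIOImage(image, null, metadata), param);
            }
            return bytes.toByteArray();
        } finally {
            writer.dispose();
        }
    }

    /** Returns the horizontal density declared in the JFIF header. */
    static int readXDensity(byte[] jpeg) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(new ByteArrayInputStream(jpeg))) {
            ImageReader reader = ImageIO.getImageReaders(in).next();
            try {
                reader.setInput(in);
                IIOMetadataNode root = (IIOMetadataNode) reader.getImageMetadata(0).getAsTree(FORMAT);
                IIOMetadataNode jfif = (IIOMetadataNode) root.getElementsByTagName("app0JFIF").item(0);
                if (jfif == null) {
                    throw new IOException("No JFIF segment found");
                }
                return Integer.parseInt(jfif.getAttribute("Xdensity"));
            } finally {
                reader.dispose();
            }
        }
    }
}
```

Writing the same image at 72 and at 300 dpi yields files that decode to identical pixel dimensions; only the declared density differs.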

Were there changes to the JavaDoc explaining the
settings for these two ways of instantiating
PDJpeg? Unfortunately, I seem to have mislaid
the link to the JavaDoc for PDFBox 1.7.1.

The "bottom line" may be that there are no
guidelines that I know of for which instantiation
to use for PDJpeg. I have no information on the
other settings, such as the "resolution", and
the "quality" of the JPG files has been fixed
at a specific value, with no mechanism to alter
it (not necessarily a bug?).

[Part of the reason for asking the above questions is
that I need the illustrations to be good enough to be
zoomed in on closely when a user reads the book with
a reader such as Acrobat. If I create what I hope are
good-quality JPGs, but the final image looks poor when
zoomed in, then all the extra work is wasted.]
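For zoom quality, the number that matters is how many image pixels land on each inch of the page. PDF user space defaults to 1/72 inch per unit, so an image drawn W points wide has pixelWidth / (W / 72) pixels per inch, regardless of any dpi value stored in the JPEG. A small sketch with hypothetical names; the zoom estimate assumes the viewer's 100% setting shows the page at physical size on a screen of the given pixel density, which varies between viewers and systems:

```java
/** Hypothetical helpers for estimating how far a placed image can be zoomed. */
final class EffectiveResolution {

    /** Default PDF user-space unit is 1/72 inch. */
    static final double POINTS_PER_INCH = 72.0;

    private EffectiveResolution() {}

    /** Image pixels per inch when drawn drawnWidthPoints wide on the page. */
    static double effectivePpi(int pixelWidth, double drawnWidthPoints) {
        return pixelWidth / (drawnWidthPoints / POINTS_PER_INCH);
    }

    /** Zoom factor (1.0 = 100%) at which one image pixel covers one screen pixel. */
    static double zoomBeforePixelation(int pixelWidth, double drawnWidthPoints, double screenPpi) {
        return effectivePpi(pixelWidth, drawnWidthPoints) / screenPpi;
    }
}
```

For example, an illustration 1800 pixels wide drawn 432 points (6 inches) wide gives 300 pixels per inch; on a 96 ppi screen it can be magnified a little over 3x before individual image pixels become visible. JPEG quality then determines how visible compression artifacts are within that range.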

Thanks, as always, for all the good work that has
gone into PDFBox.

Bob Swanson

