Re: Fwd: Getting and setting image and mask separately (PDImageXObject)

Constantine Dokolas Fri, 24 Sep 2021 01:47:01 -0700

Keeping the implementation hidden is reasonable. The thing is that access
to the different forms/parts of a standard-conforming image XObject is
still limited (not to mention creation). That is, you are hiding stuff
that's *not* in the API, and that's not really fair. I should be able to
get/set the two images as BufferedImage separately. Anyway, it seems I
already have a working workaround.

What I meant by avoiding BufferedImage was actually avoiding getImage(),
which combines the base image and mask. My bad. The thing is that the two
are usually split so that the background can be downscaled when it's low on
"content", as with scanned documents (the background is low-fluctuation
black-grey and the mask is just a stencil for that).
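
As a rough sketch of the alpha-layer route mentioned in the quoted reply
below, the following uses only java.awt to pull an opaque base and a
separate 8-bit mask out of a combined ARGB BufferedImage, and then
downscales just the base. It assumes the combined image is ARGB; the class
and method names are invented for illustration and are not part of PDFBox.
It only separates pixels, so it does not by itself preserve the original
encoded streams.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not a PDFBox class.
final class MaskSplitSketch {

    /** Copies the colour channels into an opaque RGB image (alpha is dropped). */
    static BufferedImage extractBase(BufferedImage argb) {
        BufferedImage base = new BufferedImage(
                argb.getWidth(), argb.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < argb.getHeight(); y++) {
            for (int x = 0; x < argb.getWidth(); x++) {
                base.setRGB(x, y, argb.getRGB(x, y) & 0x00FFFFFF);
            }
        }
        return base;
    }

    /** Copies the alpha channel into an 8-bit gray image (255 = fully opaque). */
    static BufferedImage extractMask(BufferedImage argb) {
        BufferedImage mask = new BufferedImage(
                argb.getWidth(), argb.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < argb.getHeight(); y++) {
            for (int x = 0; x < argb.getWidth(); x++) {
                int alpha = argb.getRGB(x, y) >>> 24;
                // Write the raw sample to avoid any colour-space conversion.
                mask.getRaster().setSample(x, y, 0, alpha);
            }
        }
        return mask;
    }

    /** Shrinks an image by an integer factor using bilinear interpolation. */
    static BufferedImage downscale(BufferedImage src, int factor) {
        int w = Math.max(1, src.getWidth() / factor);
        int h = Math.max(1, src.getHeight() / factor);
        BufferedImage out = new BufferedImage(w, h, src.getType());
        Graphics2D g = out.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for a combined image; a real one would come from the PDF.
        BufferedImage combined = new BufferedImage(8, 8, BufferedImage.TYPE_INT_ARGB);
        combined.setRGB(2, 3, 0x80112233);

        BufferedImage base = downscale(extractBase(combined), 4);
        BufferedImage mask = extractMask(combined);
        System.out.println("base: " + base.getWidth() + "x" + base.getHeight());
        System.out.println("mask: " + mask.getWidth() + "x" + mask.getHeight());
    }
}
```

The mask is kept at full resolution while only the base is reduced, which
mirrors the low-resolution background plus high-resolution stencil layout
described above.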

For example, I was processing a page with a full-page masked image: the
base was a 659x904 grayscale image (JPEG2000-encoded to 906 bytes) and the
mask was 2636x3616 (JBIG2-encoded to 21836 bytes). Using getImage() and
replacing the original with the result produced base and SMask images
(PDFBox changed the pair from Base-Mask to Base-SMask), both at 2636x3616,
encoded as grayscale JPEGs of 511252 and 691079 bytes respectively.
So, from ~23KB, it went all the way to ~1.2MB. That is, of course, a kind
of worst-case scenario (i.e. without trying any compression options).
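
For reference, the arithmetic behind that comparison can be reproduced with
a few lines of Java. The byte counts and dimensions are the ones quoted
above; the class and method names are made up for illustration, and only
the two image streams are counted, not the rest of the file.

```java
// Hypothetical helper for checking the numbers quoted in this message.
final class SizeComparison {

    /** Sums a list of stream sizes in bytes. */
    static long total(long... sizes) {
        long sum = 0;
        for (long s : sizes) {
            sum += s;
        }
        return sum;
    }

    public static void main(String[] args) {
        long before = total(906, 21836);      // JPEG2000 base + JBIG2 mask
        long after = total(511252, 691079);   // JPEG base + JPEG SMask

        long basePixels = 659L * 904;         // original base resolution
        long maskPixels = 2636L * 3616;       // original mask resolution

        System.out.println("before (bytes): " + before);
        System.out.println("after  (bytes): " + after);
        System.out.printf("growth factor: %.1f%n", (double) after / before);
        System.out.println("mask/base pixel ratio: " + (maskPixels / basePixels));
    }
}
```

The mask is exactly four times the base resolution in each dimension, so
upscaling the base to the mask's size multiplies its pixel count by 16
before any re-encoding happens.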

Hope these images pass through the mailer...
[image: image.png][image: image.png]

So, I'm only saying that the API needs to get a little better at supporting
PDF 32000-1:2008 section 8.9.6 in particular. Beyond that, I must still
congratulate the team for what's already done. PDFBox is a great tool
that's given us the opportunity to do great stuff. Keep up the good work. I
honestly hope to be able to contribute, but after 3 years of working with
the standard I still don't consider myself proficient in the PDF format.

Thanks,
Constantine
--
There is a computer disease that anybody who works with computers knows
about. It's a very serious disease and it interferes completely with the
work. The trouble with computers is that you 'play' with them!
- Richard P. Feynman

On Thu, Sep 23, 2021 at 8:06 PM Tilman Hausherr <thaush...@t-online.de>
wrote:

> The reason we kept this package local is so that we can make changes
> without breaking the API. So for you the best would be to copy that file.
>
> Alternatively copy the mask from the alpha layer of the BufferedImage.
> You mention you don't want to use BufferedImage but how else would you
> process this?
>
> Tilman
>
> Am 23.09.2021 um 12:04 schrieb Constantine Dokolas:
> > It just occurred to me that this should have been posted on "users" not
> > "dev", so I'm forwarding it here.
> > Sorry for the confusion.
> > Constantine
> >
> > ---------- Forwarded message ---------
> > From: Constantine Dokolas <cdoko...@gmail.com>
> > Date: Wed, Sep 22, 2021 at 7:02 PM
> > Subject: Getting and setting image and mask separately (PDImageXObject)
> > To: <d...@pdfbox.apache.org>
> >
> >
> > I'm processing images in PDFs and I sometimes get images with a "Mask". I
> > want to separately retrieve the image (being the base) and the "Mask" and
> > also generate (from these) a new base-Mask pair. This is in order to
> > preserve the original format of the (optimized) image resource and
> > size/compression of the individual images (using the BufferedImage can
> > affect the resource size significantly).
> >
> > Unfortunately, SampledImageReader is only package-visible and I can't use
> > it. What are my options?
> >
> > There is also no PDImageXObject.setMask(...), but I guess I can directly
> > set the "Mask" in the dictionary.
> >
> > Thanks in advance,
> > Constantine
> >
> > --
> > There is a computer disease that anybody who works with computers knows
> > about. It's a very serious disease and it interferes completely with the
> > work. The trouble with computers is that you 'play' with them!
> > - Richard P. Feynman
> >
