On 17.05.2016 at 04:20, Romain Guillaume wrote:
Hi everyone,

I would like to overlay two PDF files, but with particular modifications.
I know how to overlay two PDFs, but sometimes I need to remove some elements of
one of them during the overlay operation.
For example, imagine an invoice composed of two files:
- one is the background page (containing a logo or other fixed graphical
elements)
- one is the text page (containing amounts, dates, invoice number, ...)
As you probably guessed, to obtain the final invoice I overlay these two pages
(and it works perfectly in 99.99% of cases).
That brings me to my problem. Sometimes the "text page" is somewhat
"dirty": some text areas have a white background instead
of a transparent one. So when I do the overlay, I see white areas on the final
invoice that cover parts of the background page (they hide some
graphical elements, and they should not).
So my question is how to say during the overlay operation: "don't keep elements
that are white backgrounds, or replace them with transparent backgrounds". I
don't know how to parse each element of the PDF and say "if this element is a
white background, don't keep it".
My question is not "how to keep text only". I want to remove only the white
backgrounds (or replace them with transparent backgrounds) and keep all
other elements (all images, all text, all backgrounds that are not
white, ...).
I use PDFBox 1.8.11.

I thank you in advance for your help.

Tricky, even if you can share the file.

You should look at the file with the PDFDebugger app (2.x is better). Then find path operators like m, l and re in the content stream of a page. Then find the color assignment for the non-stroking color (could be cs, sc, scn, k, g, rg) and either remove the white fill, or insert a transparent graphics state parameter and restore it later. Get the token list, change it, and write it into a new content stream.
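To make the token-list idea more concrete, here is a minimal sketch in plain Java. It is not PDFBox code: the class WhiteFillFilter and its methods are hypothetical names, and the content stream is assumed to be already split into simple whitespace-separated tokens (numbers, /Names and operators only, no strings, arrays or inline images). In PDFBox the tokens would come from the content stream parser instead of a split string. Only g, rg and k are recognized as "white"; colors set with cs, sc or scn are conservatively treated as not white.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of PDFBox: drops paths filled with pure white.
final class WhiteFillFilter {

    // Operands are numbers or names (/Name); anything else is treated as an operator.
    static boolean isOperand(String t) {
        return t.startsWith("/") || t.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)");
    }

    // True if there are exactly 'count' numeric operands and all equal 'value'.
    static boolean allEqual(List<String> args, int count, double value) {
        if (args.size() != count) return false;
        for (String a : args) {
            if (a.startsWith("/") || Double.parseDouble(a) != value) return false;
        }
        return true;
    }

    static List<String> strip(List<String> tokens) {
        List<String> out = new ArrayList<>();
        List<String> operands = new ArrayList<>();  // operands waiting for their operator
        List<String> path = new ArrayList<>();      // path under construction, not yet painted
        Deque<Boolean> saved = new ArrayDeque<>();  // fill-is-white flag saved by q
        boolean white = false;                      // initial fill color is black
        boolean clip = false;                       // pending path also sets a clip (W, W*)
        for (String t : tokens) {
            if (isOperand(t)) { operands.add(t); continue; }
            List<String> args = new ArrayList<>(operands);
            operands.clear();
            List<String> instr = new ArrayList<>(args);
            instr.add(t);
            switch (t) {
                case "g":  white = allEqual(args, 1, 1.0); out.addAll(instr); break;
                case "rg": white = allEqual(args, 3, 1.0); out.addAll(instr); break;
                case "k":  white = allEqual(args, 4, 0.0); out.addAll(instr); break;
                case "cs": case "sc": case "scn":
                    white = false; out.addAll(instr); break;  // unknown color: keep it
                case "q": saved.push(white); out.addAll(instr); break;
                case "Q": if (!saved.isEmpty()) white = saved.pop(); out.addAll(instr); break;
                case "m": case "l": case "c": case "v": case "y": case "h": case "re":
                    path.addAll(instr); break;
                case "W": case "W*":
                    clip = true; path.addAll(instr); break;
                case "f": case "F": case "f*":
                    if (!white) { out.addAll(path); out.add(t); }
                    else if (clip) { out.addAll(path); out.add("n"); }  // keep clip, skip fill
                    path.clear(); clip = false; break;
                case "B": case "B*": case "b": case "b*":
                    out.addAll(path);
                    // white fill plus stroke: keep only the stroke
                    out.add(white ? (t.startsWith("b") ? "s" : "S") : t);
                    path.clear(); clip = false; break;
                case "S": case "s": case "n":
                    out.addAll(path); out.add(t);
                    path.clear(); clip = false; break;
                default:
                    out.addAll(instr);
            }
        }
        out.addAll(path);      // flush anything left over from a truncated stream
        out.addAll(operands);
        return out;
    }

    // Convenience wrapper for whitespace-separated content.
    static String strip(String content) {
        return String.join(" ", strip(Arrays.asList(content.trim().split("\\s+"))));
    }
}
```

With this sketch, WhiteFillFilter.strip("q 1 1 1 rg 0 0 100 50 re f Q 0 g 10 10 20 20 re f") returns "q 1 1 1 rg Q 0 g 10 10 20 20 re f": the white rectangle is gone and the black one stays. Stroked paths are left alone, and the q/Q stack matters because the fill color is part of the graphics state.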

It might be even more tricky if a PDF uses forms (new elements with their own content stream). Or if the colors are such that one can't easily tell what is white.

However, paths are also used to draw lines, boxes, etc. that you don't want to remove.

And what about invoices that come with a background image? Or a logo? Or an image that is actually a bunch of vector graphics?

This is a terrible assignment, maybe the result of a poor business decision.

Tilman

