Hello everybody,

I would like to ensure that a pre-existing PDF/UA document is still conformant 
after flattening its form.
The issue is that the content added by flattening, which displays the text 
fields' contents, is not marked as required.

My current approach is to read every PDFormXObject, parse it with 
PDFStreamParser, and then look for the relevant parts which need to be marked. 
I would then simply envelop each part in an Artifact sequence (as the 
information about that content is still in the form object) and write the 
result back into the XObject. However, I'm struggling with identifying these 
relevant parts.

Below are the content streams of some form fields, along with the parts of 
them that need to be marked.

Text field: (The whole thing needs to be marked)
/Tx BMC
q
1 1 148 20 re
W
n
BT
/Helv 12 Tf
/DeviceGray cs
0 sc
2 6.692 Td
(eberhardt heinzmann) Tj
ET
Q
EMC

Dropdown field: (Lines 3 and 4 need to be marked, as this is the coloured 
background rectangle behind the selection. Everything that follows needs to be 
marked as well, because it is like a text field)
/DeviceGray cs
1 sc
0 0 126.7 20 re
f
/Tx BMC
q
1 1 124.7 18 re
W
n
BT
/Helv 12 Tf
/DeviceGray cs
0 sc
2 5.692 Td
(weiblich) Tj
ET
Q
EMC

For check boxes and radio buttons nothing needs to be done as they are marked 
already; here is an example anyway:
q
1 g
/Artifact BMC
/Artifact BMC
0 0 18.000299453 18 re
f
EMC
EMC
/Artifact BMC
/Artifact BMC
.5 .5 17.000299453 17 re
s
EMC
EMC
Q
q
1 1 16.000299453 16 re
W
n
BT
/ZaDb 14.531999588 Tf
2.8531 4.080999851 Td
13.994299888 TL
/Artifact BMC
(4) Tj
EMC
ET
Q

Right now, I would simply scan for Tx sequences and <4 numbers> re f sequences 
and mark them as long as they are not already inside an Artifact sequence. 
However, this does not seem very robust. Perhaps someone can suggest 
improvements. Perhaps there is also a more high-level approach?
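
To make the scan described above concrete, here is a minimal sketch in plain 
Java. It is not PDFBox code: it assumes the content stream has already been 
tokenized into a list of strings (in practice the tokens would come from 
PDFStreamParser), and the class name ArtifactMarker and the method 
markUnmarked are hypothetical names chosen for illustration. As a 
simplification, it treats anything inside an open BMC/BDC sequence as already 
marked, instead of checking that the enclosing tag really is /Artifact.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: works on pre-tokenized operators/operands.
class ArtifactMarker {

    // True for plain PDF numbers such as 1, -2, 126.7 or .5
    static boolean isNumber(String t) {
        return t.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)");
    }

    // True if the tokens at i form "<4 numbers> re f"
    static boolean isFilledRect(List<String> t, int i) {
        if (i + 5 >= t.size()) return false;
        for (int k = 0; k < 4; k++) {
            if (!isNumber(t.get(i + k))) return false;
        }
        return t.get(i + 4).equals("re") && t.get(i + 5).equals("f");
    }

    // Given the index of a BMC/BDC operator, returns the index of its EMC, or -1
    static int matchingEmc(List<String> t, int start) {
        int depth = 0;
        for (int i = start; i < t.size(); i++) {
            String tok = t.get(i);
            if (tok.equals("BMC") || tok.equals("BDC")) depth++;
            else if (tok.equals("EMC") && --depth == 0) return i;
        }
        return -1;
    }

    // Wraps unmarked /Tx sequences and unmarked filled rectangles in /Artifact BMC ... EMC
    static List<String> markUnmarked(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int depth = 0; // nesting level of marked-content sequences (assumption: any level > 0 counts as marked)
        int i = 0;
        while (i < tokens.size()) {
            String tok = tokens.get(i);
            boolean txStart = tok.equals("/Tx") && i + 1 < tokens.size()
                    && tokens.get(i + 1).equals("BMC");
            if (depth == 0 && txStart) {
                int end = matchingEmc(tokens, i + 1);
                if (end >= 0) {
                    out.add("/Artifact");
                    out.add("BMC");
                    out.addAll(tokens.subList(i, end + 1)); // the whole /Tx BMC ... EMC
                    out.add("EMC");
                    i = end + 1;
                    continue;
                }
            }
            if (depth == 0 && isFilledRect(tokens, i)) {
                out.add("/Artifact");
                out.add("BMC");
                out.addAll(tokens.subList(i, i + 6)); // x y w h re f
                out.add("EMC");
                i += 6;
                continue;
            }
            if (tok.equals("BMC") || tok.equals("BDC")) depth++;
            else if (tok.equals("EMC") && depth > 0) depth--;
            out.add(tok);
            i++;
        }
        return out;
    }
}
```

Applied to the dropdown example, this would wrap "0 0 126.7 20 re f" and the 
following /Tx sequence in one Artifact sequence each, while the check box 
example would pass through unchanged because its rectangles already sit inside 
marked content. Colour operators before the rectangle stay outside the 
Artifact sequence.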

Cheers
Ben

___________________________________________________________

Ben Kirsche
Software Engineer

M | +49 151 40230865
@ |ben.kirs...@accso.de<mailto:|ben.kirs...@accso.de>

Accso - Accelerated Solutions GmbH
Hilpertstraße 12, 64295 Darmstadt, Germany
www.accso.de<http://www.accso.de/>  | i...@accso.de<mailto:i...@accso.de>

Managing partners: Jürgen Artmann, Tim Bölsche, Prof. Dr. Markus Voß
District court and commercial register number: Darmstadt - HRB 89212
____________________________________________________________
