Hello everybody,

I would like to ensure that an existing PDF/UA document is still conformant after its form has been flattened. The problem is that the text elements added by flattening, which display the text fields' contents, are not marked as such.
My current approach is to read all PDFormXObject instances, parse them with PDFStreamParser, and then look for the relevant parts that need to be marked. I would then simply wrap each of those parts in an Artifact sequence (the information about that content is still in the form object) and write the result back into the XObject. However, I'm struggling to identify these relevant parts. Below are the content streams of some form fields, together with the parts of them that need to be marked.

Text field (the whole thing needs to be marked):

    /Tx BMC
    q
    1 1 148 20 re
    W
    n
    BT
    /Helv 12 Tf
    /DeviceGray cs
    0 sc
    2 6.692 Td
    (eberhardt heinzmann) Tj
    ET
    Q
    EMC

Dropdown field (lines 3 and 4 need to be marked, as they draw the coloured background rectangle behind the selection; everything after them needs to be marked as well, because it is just like a text field):

    /DeviceGray cs
    1 sc
    0 0 126.7 20 re
    f
    /Tx BMC
    q
    1 1 124.7 18 re
    W
    n
    BT
    /Helv 12 Tf
    /DeviceGray cs
    0 sc
    2 5.692 Td
    (weiblich) Tj
    ET
    Q
    EMC

For check boxes and radio buttons nothing needs to be done, as they are marked already. Here is an example anyway:

    q
    1 g
    /Artifact BMC
    /Artifact BMC
    0 0 18.000299453 18 re
    f
    EMC
    EMC
    /Artifact BMC
    /Artifact BMC
    .5 .5 17.000299453 17 re
    s
    EMC
    EMC
    Q
    q
    1 1 16.000299453 16 re
    W
    n
    BT
    /ZaDb 14.531999588 Tf
    2.8531 4.080999851 Td
    13.994299888 TL
    /Artifact BMC
    (4) Tj
    EMC
    ET
    Q

Right now, I would simply scan for /Tx sequences and "<4 numbers> re f" sequences and mark them as long as they are not already inside an Artifact sequence. However, this does not seem very robust. Perhaps someone can suggest improvements. Is there perhaps also a more high-level approach?
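To make the scanning idea concrete, here is a minimal sketch of it in plain Java, without any PDFBox dependency. It keeps a stack of the open marked-content sequences so that anything already inside an Artifact is left alone. The names ArtifactWrapper, tokenize and wrap are made up for this illustration, and the whitespace-based tokenizer is only a stand-in for the tokens PDFStreamParser would deliver. It also assumes the rectangle fill always appears as exactly "x y w h re f", and it only recognises the "/Artifact /Name BDC" form of BDC.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper for illustration only; not part of PDFBox.
final class ArtifactWrapper {

    enum Kind { ARTIFACT, WRAPPED_TX, OTHER }

    /** Splits on whitespace but keeps "(literal strings)" as one token. */
    static List<String> tokenize(String stream) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < stream.length()) {
            char c = stream.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (c == '(') {
                int depth = 0;
                do {
                    char d = stream.charAt(i);
                    if (d == '\\') i++;          // skip the escaped character
                    else if (d == '(') depth++;
                    else if (d == ')') depth--;
                    i++;
                } while (depth > 0 && i < stream.length());
            } else {
                while (i < stream.length() && !Character.isWhitespace(stream.charAt(i))) i++;
            }
            tokens.add(stream.substring(start, Math.min(i, stream.length())));
        }
        return tokens;
    }

    static boolean isNumber(String t) {
        return t.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)");
    }

    /** True if the tokens at i form "x y w h re f". */
    static boolean isFilledRect(List<String> in, int i) {
        if (i + 5 >= in.size()) return false;
        for (int k = i; k < i + 4; k++) if (!isNumber(in.get(k))) return false;
        return in.get(i + 4).equals("re") && in.get(i + 5).equals("f");
    }

    /** Wraps unmarked /Tx sequences and rectangle fills in /Artifact BMC ... EMC. */
    static List<String> wrap(List<String> in) {
        List<String> out = new ArrayList<>();
        Deque<Kind> open = new ArrayDeque<>(); // currently open marked-content sequences
        int covered = 0;                       // how many of them already imply "artifact"
        for (int i = 0; i < in.size(); i++) {
            String t = in.get(i);
            if (i + 1 < in.size() && in.get(i + 1).equals("BMC")) {
                Kind kind = t.equals("/Artifact") ? Kind.ARTIFACT
                        : (t.equals("/Tx") && covered == 0) ? Kind.WRAPPED_TX : Kind.OTHER;
                if (kind == Kind.WRAPPED_TX) { out.add("/Artifact"); out.add("BMC"); }
                if (kind != Kind.OTHER) covered++;
                open.push(kind);
                out.add(t); out.add("BMC");
                i++;                           // the BMC operator was consumed as well
            } else if (t.equals("BDC")) {
                // Simplification: only "/Artifact /Name BDC" counts as an artifact here.
                Kind kind = i >= 2 && in.get(i - 2).equals("/Artifact") ? Kind.ARTIFACT : Kind.OTHER;
                if (kind != Kind.OTHER) covered++;
                open.push(kind);
                out.add(t);
            } else if (t.equals("EMC")) {
                out.add(t);
                Kind kind = open.isEmpty() ? Kind.OTHER : open.pop();
                if (kind == Kind.WRAPPED_TX) out.add("EMC"); // close the added wrapper
                if (kind != Kind.OTHER) covered--;
            } else if (covered == 0 && isFilledRect(in, i)) {
                out.add("/Artifact"); out.add("BMC");
                out.addAll(in.subList(i, i + 6));
                out.add("EMC");
                i += 5;
            } else {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String dropdown = "/DeviceGray cs 1 sc 0 0 126.7 20 re f /Tx BMC q 1 1 124.7 18 re W n "
                + "BT /Helv 12 Tf /DeviceGray cs 0 sc 2 5.692 Td (weiblich) Tj ET Q EMC";
        System.out.println(String.join(" ", wrap(tokenize(dropdown))));
    }
}
```

Tracking the nesting with a stack, rather than matching text patterns, is what keeps the check box stream from above unchanged: its fills and its Tj already sit inside Artifact sequences, so nothing gets wrapped a second time. Other painting operators (f*, B, S and so on) would need the same treatment and are not covered by this sketch.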
Cheers,
Ben

Ben Kirsche
Software Engineer
Accso - Accelerated Solutions GmbH