Hi Tilman

Thanks for your reply ... I did not really succeed. We'll probably end up looking at how the PDFDebugger code does it ;).

On Tue, Aug 18, 2015 at 9:08 PM, Tilman Hausherr <[email protected]> wrote:

> On 18.08.2015 at 20:50, Roberto Nibali wrote:
>
>> Hi
>>
>> I'd like to print out the corresponding object id given a specific form
>> field. How would I do that with PDFBox programmatically?
>>
>> Let's, for the sake of the argument, assume that the form field is
>> represented by the following obj:
>>
>> obj 218 0
>>  <<
>>    /DA <2B94B0298F2FD7F81F32C6E22043>
>>    /F 4
>>    /FT /Tx
>>    /Ff 4194304
>>    /MK
>>    /P 28 0 R
>>    /Parent 46 0 R
>>    /Rect [159.781 764.53 347.142 777.195]
>>    /Subtype /Widget
>>    /T <5EB6B730886188AB3D3194B9654C18094C>
>>    /Type /Annot
>>    /V <45BBBA249C618BBD3974A4BE61501E57181D>
>>    /AP 666 0 R
>>  >>
>>
>> If I am going over all PDField entries of a PDF, how would I get to the
>> underlying obj number (in the above case 218) from a PDField object?
>
> I haven't tried this myself, but I think you could "synchronise" the
> getChildren() results with the getCOSObject().getItem(COSName.KIDS) array,
> i.e. sort out which indirect type is which item returned from
> getChildren(). The Kids COSArray has indirect objects (= COSObject type),
> as seen here:
>
> COSObject.getObject() returns the dereferenced object.

The reason I asked is that while migrating some documents we found that the originating PDFs differ from the template in more than their text (mostly legal changes in the fixed text): in certain cases the client also modified the PDFs by adding borders or other graphical elements. Those obviously do not show up in the template PDF. My (maybe stupid) idea was to simply print out the obj id, or even the whole object, and insert it into the template for the final PDF during the form field migration, updating all references to the new obj id along the way. At least for simple geometric shapes, like rectangles, this should be feasible, no?
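
Coming back to the "synchronise" idea: this is how I understood it, as a minimal sketch in plain Java. The types are stand-ins I made up for illustration, not PDFBox classes: IndirectRef models an indirect Kids entry (object number, generation, dereferenced target) and KidsMatcher is a hypothetical helper. The lookup goes by object identity, since two field dictionaries can have equal content.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for an indirect object such as "218 0 R". NOT a PDFBox class.
final class IndirectRef {
    final long number;      // object number, e.g. 218
    final int generation;   // generation number, e.g. 0
    final Object target;    // the dereferenced dictionary the reference points to

    IndirectRef(long number, int generation, Object target) {
        this.number = number;
        this.generation = generation;
        this.target = target;
    }
}

// Hypothetical helper: recovers object numbers for already dereferenced children.
final class KidsMatcher {
    // Maps each dereferenced target to the object number of the Kids entry
    // that points at it. Keys are compared by identity (==), not equals().
    static Map<Object, Long> numbersByTarget(List<IndirectRef> kids) {
        Map<Object, Long> result = new IdentityHashMap<>();
        for (IndirectRef kid : kids) {
            result.put(kid.target, kid.number);
        }
        return result;
    }
}
```

With Kids entries 218 and 230 pointing at two dictionaries a and b, numbersByTarget(kids).get(a) yields 218, while a third dictionary that merely has the same content as a is not found.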

Anyway, after constantly getting "null" from getCOSObject().getItem(COSName.KIDS) and nothing out of getChildren() for a given PDField, I kind of gave up. Imagine you had the following code and additionally wanted to dump the underlying object id and the referencing ids of each PDField:

    import java.io.File;
    import java.io.IOException;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
    import org.apache.pdfbox.pdmodel.interactive.form.PDField;
    import org.apache.pdfbox.pdmodel.interactive.form.PDNonTerminalField;
    import org.apache.pdfbox.pdmodel.interactive.form.PDSignatureField;
    import org.junit.Test;

    // srcDocName and logerr() are defined elsewhere in the test class.

    @Test
    public void excuteDumpFields() throws IOException {
        // try-with-resources closes the document exactly once
        try (PDDocument srcDoc = PDDocument.load(new File(srcDocName))) {
            PDAcroForm acroForm = srcDoc.getDocumentCatalog().getAcroForm();
            for (PDField field : acroForm.getFields()) {
                dumpField(srcDoc, field);
            }
        } catch (Exception e) {
            logerr(e.getMessage());
        }
    }

    private void dumpField(PDDocument srcDoc, PDField srcField) throws IOException {
        if (srcField instanceof PDNonTerminalField) {
            // Recurse into the children of non-terminal fields
            for (PDField child : ((PDNonTerminalField) srcField).getChildren()) {
                dumpField(srcDoc, child);
            }
        } else if (!(srcField instanceof PDSignatureField)) {
            String fqName = srcField.getFullyQualifiedName();
            String type = srcField.getClass().getSimpleName();
            System.out.printf("fqName=%s type=%s%n", fqName, type);
        }
    }

I have made a habit of dumping the objects with pdf-parser (http://blog.didierstevens.com/programs/pdf-tools/) to investigate issues further. Here is an excerpt showing the dump of object 228:

    $ python pdf-parser.py -o 228 ../../ccmig2.pdf
    obj 228 0
     Type: /Annot
     Referencing: 685 0 R, 28 0 R, 46 0 R, 686 0 R

      <<
        /AA << /K 685 0 R >>
        /DA <92F8913CB200CF3C13A363C2D20D>
        /F 4
        /FT /Tx
        /Ff 12582912
        /MK
        /MaxLen 1
        /P 28 0 R
        /Parent 46 0 R
        /Q 1
        /Rect [454.437 769.504 465.482 782.169]
        /Subtype /Widget
        /T <8C8A>
        /Type /Annot
        /V ()
        /AP 686 0 R
      >>

And to get the objects referencing object 228:

    $ python pdf-parser.py -r 228 ../../ccmig2.pdf
    obj 28 0
     Type: /Page
     Referencing: 101 0 R, 217 0 R, 218 0 R, 219 0 R, 220 0 R, 221 0 R, 222 0 R, 223 0 R, 224 0 R, 225 0 R, 226 0 R, 227 0 R, 228 0 R, 229 0 R, 230 0 R, 231 0 R, 232 0 R, 61 0 R, 60 0 R, 62 0 R, 63 0 R, 64 0 R, 65 0 R, 66 0 R, 67 0 R, 69 0 R, 68 0 R, 70 0 R, 71 0 R, 72 0 R, 73 0 R, 74 0 R, 75 0 R, 76 0 R, 77 0 R, 78 0 R, 79 0 R, 80 0 R, 81 0 R, 82 0 R, 83 0 R, 84 0 R, 86 0 R, 87 0 R, 88 0 R, 89 0 R, 90 0 R, 91 0 R, 92 0 R, 93 0 R, 94 0 R, 95 0 R, 96 0 R, 97 0 R, 85 0 R, 233 0 R, 234 0 R, 235 0 R, 236 0 R, 237 0 R, 238 0 R, 239 0 R, 22 0 R, 240 0 R, 241 0 R, 242 0 R, 243 0 R, 244 0 R, 245 0 R, 246 0 R, 247 0 R, 103 0 R, 248 0 R, 6 0 R, 205 0 R, 206 0 R, 207 0 R, 208 0 R, 209 0 R, 210 0 R, 211 0 R, 213 0 R, 212 0 R

      <<
        /Annots '[101 0 R 217 0 R 218 0 R 219 0 R 220 0 R 221 0 R 222 0 R 223 0 R 224 0 R 225 0 R\n226 0 R 227 0 R 228 0 R 229 0 R 230 0 R 231 0 R 232 0 R 61 0 R 60 0 R 62 0 R\n63 0 R 64 0 R 65 0 R 66 0 R 67 0 R 69 0 R 68 0 R 70 0 R 71 0 R 72 0 R\n73 0 R 74 0 R 75 0 R 76 0 R 77 0 R 78 0 R 79 0 R 80 0 R 81 0 R 82 0 R\n83 0 R 84 0 R 86 0 R 87 0 R 88 0 R 89 0 R 90 0 R 91 0 R 92 0 R 93 0 R\n94 0 R 95 0 R 96 0 R 97 0 R 85 0 R 233 0 R 234 0 R 235 0 R 236 0 R 237 0 R\n238 0 R 239 0 R 22 0 R 240 0 R 241 0 R 242 0 R 243 0 R 244 0 R 245 0 R 246 0 R\n247 0 R 103 0 R]'
        /BleedBox [0.0 0.0 595.276 841.89]
        /Contents 248 0 R
        /CropBox [0.0 0.0 595.276 841.89]
        /MediaBox [0.0 0.0 595.276 841.89]
        /Parent 6 0 R
        /Resources <<
          /ExtGState << /GS0 205 0 R /GS1 206 0 R /GS2 207 0 R /GS3 208 0 R >>
          /Font << /C2_0 209 0 R /C2_1 210 0 R /TT0 211 0 R /TT1 213 0 R /TT2 212 0 R >>
          /ProcSet [/PDF /Text]
        >>
        /Rotate 0
        /Tabs /W
        /TrimBox [0.0 0.0 595.276 841.89]
        /Type /Page
      >>

    obj 46 0
     Type:
     Referencing: 218 0 R, 230 0 R, 231 0 R, 232 0 R, 219 0 R, 217 0 R, 220 0 R, 221 0 R, 222 0 R, 223 0 R, 224 0 R, 225 0 R, 226 0 R, 227 0 R, 228 0 R, 229 0 R, 17 0 R

      <<
        /Kids '[218 0 R 230 0 R 231 0 R 232 0 R 219 0 R 217 0 R 220 0 R 221 0 R 222 0 R 223 0 R\n224 0 R 225 0 R 226 0 R 227 0 R 228 0 R 229 0 R]'
        /Parent 17 0 R
        /T <32AB37>
      >>

It would be tremendous if I could at least get the proper object id out of the PDFields using PDFBox.
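
Until then, the cross-check against the pdf-parser dumps can at least be scripted. A minimal sketch, assuming references always appear in the "N G R" form seen in the dumps above; RefScanner is a hypothetical helper of mine, not part of PDFBox or pdf-parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls object numbers out of a textual dictionary dump.
final class RefScanner {
    // "<object number> <generation> R"; the lookbehind keeps the match from
    // starting in the middle of a number such as 595.276.
    private static final Pattern REF =
            Pattern.compile("(?<![\\d.])(\\d+)\\s+(\\d+)\\s+R\\b");

    // Returns the object numbers of all indirect references, in order of appearance.
    static List<Long> objectNumbers(String dump) {
        List<Long> numbers = new ArrayList<>();
        Matcher m = REF.matcher(dump);
        while (m.find()) {
            numbers.add(Long.parseLong(m.group(1)));
        }
        return numbers;
    }
}
```

For the string "/Kids '[218 0 R 230 0 R]' /Parent 17 0 R" it returns [218, 230, 17], so the /Parent entry has to be cut off first if only the kids are wanted.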

Take care
Roberto

