Hi, can you try the non-sequential parser, PDDocument.loadNonSeq(..)?
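In your code that would mean replacing PDDocument.load(new File("test.pdf")) with something like PDDocument.loadNonSeq(new File("test.pdf"), null). I am assuming PDFBox 1.8.x and the overload that takes a File plus a scratch file, with null passed for the scratch file.

Some background on why it may help: the non-sequential parser locates objects through the cross-reference table instead of reading the file front to back. The sketch below is a toy model of that difference in plain Java. It is not PDFBox code and does not reproduce how either PDFBox parser is implemented. The class name, the helper names and the two-revision sample file are all made up for illustration; in particular, the earlier revision of object 11 without /V is an assumption, since your mail only shows the revision after the EOF marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: hypothetical names, not part of PDFBox.
public class XrefLookupSketch {

    // Matches "<num> <gen> obj ... endobj" and captures number, generation and body.
    private static final Pattern OBJ =
            Pattern.compile("(\\d+) (\\d+) obj\\s*(.*?)\\s*endobj", Pattern.DOTALL);

    // Front-to-back scan: collects every body found for objNum, in file order.
    // With several revisions in the file, the caller still has to pick one.
    public static List<String> scanSequentially(String file, int objNum) {
        List<String> found = new ArrayList<String>();
        Matcher m = OBJ.matcher(file);
        while (m.find()) {
            if (Integer.parseInt(m.group(1)) == objNum) {
                found.add(m.group(3));
            }
        }
        return found;
    }

    // Xref-style lookup: jump to a recorded offset and read exactly one object.
    public static String readAtOffset(String file, int offset) {
        Matcher m = OBJ.matcher(file);
        m.region(offset, file.length());
        if (!m.lookingAt()) {
            throw new IllegalArgumentException("no object starts at offset " + offset);
        }
        return m.group(3);
    }

    // Assumed sample: object 11 saved twice, the later revision adds /V.
    public static String sampleFile() {
        return "11 0 obj\n<</FT/Tx/T(Marke_01)>>\nendobj\n"
             + "%%EOF\n"
             + "11 0 obj\n<</FT/Tx/T(Marke_01)/V(test hallo)>>\nendobj\n"
             + "%%EOF\n";
    }

    public static void main(String[] args) {
        String file = sampleFile();
        // Stand-in for an xref entry: the offset of the current revision of object 11.
        int currentOffset = file.lastIndexOf("11 0 obj");

        System.out.println("sequential scan: " + scanSequentially(file, 11));
        System.out.println("xref lookup:     " + readAtOffset(file, currentOffset));
    }
}
```

The scan finds both revisions of object 11, while the offset lookup returns only the one the (simulated) cross-reference entry points to, which is the one carrying /V. Whether this is what happens with your file is a guess until you have tried loadNonSeq.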
BR
Maruan Sahyoun

On 04.04.2013 at 12:00, Tobias Tertel <[email protected]> wrote:

> Hey,
>
> I have a PDF document and want to extract the values. The PDF document is a
> form. Therefore I can't use the example with processField and field.getValue().
> I've done the following:
>
> PDDocument pdf = PDDocument.load(new File("test.pdf"));
> PDDocumentCatalog docCatalog = pdf.getDocumentCatalog();
> PDAcroForm acroForm = docCatalog.getAcroForm();
>
> for (Object pObject : acroForm.getFields()) {
>     PDField pfield = (PDField) pObject;
>     COSDictionary cos1 = (COSDictionary) pfield.getCOSObject();
>     for (Entry<COSName, COSBase> test : cos1.entrySet())
>     {
>         System.out.println(test.getKey().toString() + " : " +
>             test.getValue().toString());
>     }
>     System.out.println(cos1.getString("T") + " => " +
>         cos1.getString("V"));
>     System.out.println("done");
> }
>
> The output is the following:
>
> COSName{DA} : COSString{/HelveticaNeueLTPro-Bd 18 Tf 0 g}
> COSName{FT} : COSName{Tx}
> COSName{Kids} : COSArray{[COSObject{23, 0}, COSObject{33, 0}]}
> COSName{Q} : COSInt{1}
> COSName{T} : COSString{Marke_01}
> Marke_01 => null
> done
> COSName{DA} : COSString{/HelveticaNeueLTPro-Bd 48 Tf 0 g}
> COSName{FT} : COSName{Tx}
> COSName{Ff} : COSInt{4096}
> COSName{Kids} : COSArray{[COSObject{24, 0}, COSObject{34, 0}]}
> COSName{Q} : COSInt{1}
> COSName{T} : COSString{Produkt_01}
> COSName{V} : COSString{PåskesortimentnullNormalsortiment}
> Produkt_01 => PåskesortimentnullNormalsortiment
> done
>
> In the first part, COSName{V} is missing.
>
> The content of the PDF document is:
>
> %EOF
> 11 0 obj
> <</DA(/HelveticaNeueLTPro-Bd 18 Tf 0 g)/FT/Tx/Kids[23 0 R 33 0 R]/Q
> 1/T(Marke_01)/V(test hallo)>>
> endobj
> 2 0 obj
> <</DA(/HelveticaNeueLTPro-Bd 48 Tf 0 g)/FT/Tx/Ff 4096/Kids[24 0 R 34 0 R]/Q
> 1/T(Produkt_01)/V(Påskesortiment\rNormalsortiment)>>
> endobj
>
> Can anybody help me?
> Thank you.
>
> Best regards
> Tobias

