Hi,
Thanks for looking into the code; I'm a bit confused, though. Are you suggesting that I inspect the three metadata locations "by hand"? What would be the best way to proceed?

Best,
Robin

-----Original message-----
From: Andreas Lehmkühler <andr...@lehmi.de>
Sent: Thu 22-10-2009 22:36
To: pdfbox-users@incubator.apache.org;
Subject: Re: Question XMP metadata extraction

Robin Diederen wrote:
> Andreas,
>
> According to the JavaDoc
> (http://www.pdfbox.org/javadoc/org/pdfbox/pdmodel/common/PDMetadata.html#PDMetadata%28org.pdfbox.pdmodel.PDDocument%29)
> the exportXMPMetadata method should be able to do this. Or am I missing
> something?

OK, I had a deeper look, and it seems that there are three supported locations for metadata within pdfbox: PDDocumentCatalog, PDPage and PDXObject. The "classic" metadata are located in the catalog. Perhaps you will find the metadata you are looking for in the two other objects?

BR
Andreas Lehmkühler

> Thanks for your help, greatly appreciated!
>
> Best, Robin
>
> -----Original message-----
> From: Andreas Lehmkühler <andr...@lehmi.de>
> Sent: Thu 22-10-2009 22:09
> To: pdfbox-users@incubator.apache.org;
> Subject: Re: Question XMP metadata extraction
>
> Hi,
>
> Robin Diederen wrote:
>> Hello Andreas,
>>
>> I did have a look at the PrintDocumentMetaData.java file; there I find that
>> the metadata is extracted using PDDocumentInformation. This code is useful
>> for PDF files with "classic" metadata, but not for PDF files only carrying
>> XMP metadata, right?
> OK, I see. I'm not that familiar with the XMP stuff, but I guess I
> understand your problem.
>
>> There's my issue: as soon as I have a PDF file with only XMP metadata, I
>> need some other way to extract this metadata.
> I'm afraid that pdfbox is still limited to the handling of "classic" metadata.
>
>
>> Best, Robin
>>
>> -----Original message-----
>> From: Andreas Lehmkühler <andr...@lehmi.de>
>> Sent: Thu 22-10-2009 21:47
>> To: pdfbox-users@incubator.apache.org;
>> Subject: Re: Question XMP metadata extraction
>>
>> Hi,
>>
>> Robin Diederen wrote:
>>> Hello all,
>>>
>>> I'm quite new to PDFbox and am currently figuring out how to extract
>>> metadata in XMP format from PDF files.
>>>
>>> I have a few files containing XMP metadata, but I cannot get any of them
>>> to work, and I can't seem to figure out where I am going wrong.
>>>
>>> A code snippet (all non-relevant code was deleted):
>>>
>>> String inputFile = "/some/file.pdf";
>>>
>>> PDDocument pdfDocument = null;
>>> pdfDocument = new PDDocument();
>>> pdfDocument = PDDocument.load(inputFile);
>>> PDMetadata pdfMetaData = new PDMetadata(pdfDocument);
>>>
>>> int metadataLength = pdfMetaData.getLength();
>>> System.out.println(pdfMetaData.getLength());
>>>
>>> pdfMetaData.exportXMPMetadata();
>>>
>>> The getLength call always returns 0; the exportXMPMetadata call fails
>>> with an error:
>>>
>>> [Fatal Error] :-1:-1: Premature end of file.
>>> Exception in thread "main" java.io.IOException: Premature end of file.
>>>     at org.apache.jempbox.impl.XMLUtil.parse(XMLUtil.java:78)
>>>     at org.apache.jempbox.xmp.XMPMetadata.load(XMPMetadata.java:554)
>>>     at org.apache.pdfbox.pdmodel.common.PDMetadata.exportXMPMetadata(PDMetadata.java:86)
>>>     at com.robindiederen.pdf.Extractor.extractMetaDataFromXMP(Extractor.java:124)
>>>     at com.robindiederen.pdf.Extractor.main(Extractor.java:90)
>>>
>>> This happens for every PDF I test. Extracting metadata from the
>>> DocumentInformation table works like a charm. I'm using PDFbox 0.80 on
>>> Java 1.5.
>> Have a look at PrintDocumentMetaData as an example of how to extract the
>> document's metadata.
>>
>> HTH
>> Andreas Lehmkühler
>>
> BR
> Andreas Lehmkühler
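
One way to check "by hand" whether a file carries XMP at all, independent of which of the three objects it hangs off: XMP packets are wrapped in `<?xpacket begin=...?>` and `<?xpacket end=...?>` processing instructions, and PDF metadata streams are often stored uncompressed, so the packets can frequently be found by scanning the raw bytes of the file. The sketch below does only that. It is not PDFBox code and not what PDFBox does internally; the class name `XmpPacketScanner` and the method `findPackets` are made up for illustration, it assumes a Java 7+ JDK (not the Java 1.5 mentioned in the thread), it assumes the packet is UTF-8, and it will find nothing if the metadata stream is compressed or encrypted. Within PDFBox itself, I believe the catalog-level stream is reached through the document catalog rather than by constructing a new PDMetadata, but I have not verified that against 0.80.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: scans raw file bytes for XMP packets.
class XmpPacketScanner {
    private static final String BEGIN = "<?xpacket begin=";
    private static final String END = "<?xpacket end=";

    // Returns every XMP packet found in the data, in file order.
    static List<String> findPackets(byte[] data) {
        // ISO-8859-1 maps each byte to exactly one char, so string
        // indexes below are also byte offsets into the original data.
        String raw = new String(data, StandardCharsets.ISO_8859_1);
        List<String> packets = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = raw.indexOf(BEGIN, from);
            if (start < 0) break;
            int endMarker = raw.indexOf(END, start);
            if (endMarker < 0) break;
            // The end marker is itself a processing instruction closed by "?>".
            int close = raw.indexOf("?>", endMarker);
            if (close < 0) break;
            int stop = close + 2;
            // Assumption: the packet is UTF-8 encoded; decode the slice as such.
            byte[] slice = Arrays.copyOfRange(data, start, stop);
            packets.add(new String(slice, StandardCharsets.UTF_8));
            from = stop;
        }
        return packets;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java XmpPacketScanner <file.pdf>");
            return;
        }
        List<String> packets = findPackets(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("XMP packets found: " + packets.size());
        for (String packet : packets) {
            System.out.println(packet);
        }
    }
}
```

If this reports more than one packet, the extra ones are presumably attached to pages or XObjects rather than to the catalog, which would match the three locations Andreas listed. If it reports none for a file known to carry XMP, the stream is probably compressed and a real PDF parser is needed.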