Andreas,
According to the JavaDoc (http://www.pdfbox.org/javadoc/org/pdfbox/pdmodel/common/PDMetadata.html#PDMetadata%28org.pdfbox.pdmodel.PDDocument%29), the exportXMPMetadata method should be able to do this. Or am I missing something?

Thanks for your help, greatly appreciated!

Best,
Robin

-----Original message-----
From: Andreas Lehmkühler <andr...@lehmi.de>
Sent: Thu 22-10-2009 22:09
To: pdfbox-users@incubator.apache.org;
Subject: Re: Question XMP metadata extraction

Hi,

Robin Diederen wrote:
> Hello Andreas,
>
> I had a look at the PrintDocumentMetaData.java file; there I see that the
> metadata is extracted using PDDocumentInformation. This code is useful
> for PDF files with "classic" metadata, but not for PDF files that carry only
> XMP metadata, right?

OK, I see. I'm not that familiar with the XMP stuff, but I think I understand your problem.

> That's my issue: as soon as I have a PDF file with only XMP metadata, I need
> some other way to extract it.

I'm afraid that PDFBox is so far limited to the handling of "classic" metadata.

> Best, Robin
>
> -----Original message-----
> From: Andreas Lehmkühler <andr...@lehmi.de>
> Sent: Thu 22-10-2009 21:47
> To: pdfbox-users@incubator.apache.org;
> Subject: Re: Question XMP metadata extraction
>
> Hi,
>
> Robin Diederen wrote:
>> Hello all,
>>
>> I'm quite new to PDFBox and am currently figuring out how to extract
>> metadata in XMP format from PDF files.
>>
>> I have a few files containing XMP metadata, but I cannot get any of them
>> to work, and I can't figure out where I am going wrong.
>>
>> A code snippet (all non-relevant code was deleted):
>>
>> String inputFile = "/some/file.pdf";
>>
>> PDDocument pdfDocument = null;
>> pdfDocument = new PDDocument();
>> pdfDocument = PDDocument.load(inputFile);
>> PDMetadata pdfMetaData = new PDMetadata(pdfDocument);
>>
>> int metadataLength = pdfMetaData.getLength();
>> System.out.println(pdfMetaData.getLength());
>>
>> pdfMetaData.exportXMPMetadata();
>>
>> The getLength call always returns 0; the exportXMPMetadata call throws an
>> exception:
>>
>> [Fatal Error] :-1:-1: Premature end of file.
>> Exception in thread "main" java.io.IOException: Premature end of file.
>>     at org.apache.jempbox.impl.XMLUtil.parse(XMLUtil.java:78)
>>     at org.apache.jempbox.xmp.XMPMetadata.load(XMPMetadata.java:554)
>>     at org.apache.pdfbox.pdmodel.common.PDMetadata.exportXMPMetadata(PDMetadata.java:86)
>>     at com.robindiederen.pdf.Extractor.extractMetaDataFromXMP(Extractor.java:124)
>>     at com.robindiederen.pdf.Extractor.main(Extractor.java:90)
>>
>> This happens for every PDF I test. Extracting metadata from the
>> DocumentInformation table works like a charm. I'm using PDFBox 0.8.0 on
>> Java 1.5.
> Have a look at PrintDocumentMetaData as an example of how to extract the
> document's metadata.
>
> HTH
> Andreas Lehmkühler

BR
Andreas Lehmkühler
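A note for readers of this thread: in PDFBox, the PDMetadata(PDDocument) constructor creates a new, empty metadata stream rather than reading the one stored in the file, which would be consistent with getLength() returning 0 and the "Premature end of file" error above. The file's existing XMP stream is normally reached through pdfDocument.getDocumentCatalog().getMetadata(), which returns null when the catalog has no metadata entry.

Independently of PDFBox, a quick way to check whether a file carries XMP at all is packet scanning: XMP is usually embedded inside an <?xpacket begin=...?> ... <?xpacket end=...?> wrapper, so the raw bytes can be searched for it. The sketch below is a swapped-in technique, not PDFBox API, and the class name XmpPacketScanner is hypothetical. It assumes the packet is UTF-8 encoded and stored uncompressed (a FlateDecode-compressed metadata stream will not be found), and it targets Java 7 or later rather than the Java 1.5 mentioned above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds XMP packets by scanning raw file bytes (not PDFBox API). */
public class XmpPacketScanner {

    private static final String HEADER = "<?xpacket begin=";
    private static final String TRAILER = "<?xpacket end=";
    private static final String PI_END = "?>";

    /** Returns each packet from its header through the end of its trailer. */
    public static List<String> findPackets(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so string indexes equal byte offsets.
        String raw = new String(data, StandardCharsets.ISO_8859_1);
        List<String> packets = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = raw.indexOf(HEADER, from);
            if (start < 0) {
                break;
            }
            int trailer = raw.indexOf(TRAILER, start);
            if (trailer < 0) {
                break;
            }
            int close = raw.indexOf(PI_END, trailer);
            if (close < 0) {
                break;
            }
            int end = close + PI_END.length();
            // Assumption: the packet bytes are UTF-8 encoded.
            packets.add(new String(data, start, end - start, StandardCharsets.UTF_8));
            from = end;
        }
        return packets;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "/some/file.pdf";
        byte[] pdf = Files.readAllBytes(Paths.get(path));
        List<String> packets = findPackets(pdf);
        if (packets.isEmpty()) {
            System.out.println("No uncompressed XMP packet found in " + path);
        }
        for (String packet : packets) {
            System.out.println(packet);
        }
    }
}
```

Run it as java XmpPacketScanner /some/file.pdf. An empty result only means no uncompressed packet was found, not that the file has no XMP. A file can also contain several packets (for example image-level metadata), and this scan cannot tell which one the catalog's /Metadata entry points to; for that, going through the document catalog is the more reliable route.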