Hello Andreas,

I did have a look at the PrintDocumentMetaData.java file; there I see that the 
metadata is extracted using PDDocumentInformation. This code is useful for PDF 
files with "classic" (Info dictionary) metadata, but not for PDF files that carry 
only XMP metadata, right?

That's my issue: as soon as I have a PDF file with only XMP metadata, I need 
some other way to extract it.
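The zero length and the "Premature end of file" error in the quoted code happen because `new PDMetadata(pdfDocument)` creates a brand-new, empty metadata stream instead of returning the one stored in the file. The existing XMP stream is referenced from the /Metadata entry of the document catalog. Below is a minimal sketch, assuming the PDFBox 1.x/2.x API (`PDDocument.load(File)`, `getDocumentCatalog().getMetadata()`, `createInputStream()`) and a UTF-8-encoded XMP packet; the class and method names are hypothetical. In PDFBox 3.x, loading moved to `Loader.loadPDF`.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.common.PDMetadata;

// Hypothetical helper class: reads the document-level XMP stream via the catalog.
public class XmpFromCatalog {

    /** Reads an input stream to the end and returns all of its bytes. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /**
     * Returns the catalog's XMP metadata as a string, or null when the catalog
     * has no /Metadata entry (e.g. a file with only an Info dictionary).
     */
    static String readDocumentXmp(File pdf) throws IOException {
        PDDocument document = PDDocument.load(pdf);
        try {
            // Ask the catalog for the stream that is already in the file;
            // constructing a new PDMetadata would create an empty one instead.
            PDMetadata metadata = document.getDocumentCatalog().getMetadata();
            if (metadata == null) {
                return null;
            }
            // createInputStream() returns the stream contents with filters decoded.
            InputStream in = metadata.createInputStream();
            try {
                // Assumption: the packet is UTF-8 (XMP also permits UTF-16/32).
                return new String(readFully(in), "UTF-8");
            } finally {
                in.close();
            }
        } finally {
            document.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XmpFromCatalog file.pdf");
            return;
        }
        String xmp = readDocumentXmp(new File(args[0]));
        System.out.println(xmp == null ? "No XMP metadata stream in catalog" : xmp);
    }
}
```

Run it as `java XmpFromCatalog /some/file.pdf` with PDFBox on the classpath. With the jempbox-based releases (0.8.x/1.x), calling `exportXMPMetadata()` on the object returned by `getMetadata()`, rather than on a newly constructed `PDMetadata`, should also give a parsed `XMPMetadata`.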

Best, Robin
 
-----Original message-----
From: Andreas Lehmkühler <andr...@lehmi.de>
Sent: Thu 22-10-2009 21:47
To: pdfbox-users@incubator.apache.org; 
Subject: Re: Question XMP metadata extraction

Hi,

Robin Diederen wrote:
> Hello all,
> 
> I'm quite new to PDFBox and am currently figuring out how to extract 
> XMP-format metadata from PDF files.
> 
> I have a few files containing XMP metadata, but I cannot get any of them to 
> work, and I can't figure out where I am going wrong.
> 
> A code snippet (all non-relevant code was deleted):
> 
> String inputFile = "/some/file.pdf";
> 
> PDDocument pdfDocument = null;
> pdfDocument = new PDDocument();
> pdfDocument = PDDocument.load(inputFile);     
> PDMetadata pdfMetaData = new PDMetadata(pdfDocument);
>             
> int metadataLength = pdfMetaData.getLength();
> System.out.println(pdfMetaData.getLength());
>  
> 
> pdfMetaData.exportXMPMetadata();
>  
> 
> The getLength call always returns 0; the exportXMPMetadata call throws an 
> exception:
> 
> [Fatal Error] :-1:-1: Premature end of file.
> Exception in thread "main" java.io.IOException: Premature end of file.
>     at org.apache.jempbox.impl.XMLUtil.parse(XMLUtil.java:78)
>     at org.apache.jempbox.xmp.XMPMetadata.load(XMPMetadata.java:554)
>     at org.apache.pdfbox.pdmodel.common.PDMetadata.exportXMPMetadata(PDMetadata.java:86)
>     at com.robindiederen.pdf.Extractor.extractMetaDataFromXMP(Extractor.java:124)
>     at com.robindiederen.pdf.Extractor.main(Extractor.java:90)
> 
>  
> 
> This happens for every PDF I test. Extracting metadata from the 
> DocumentInformation table works like a charm. I'm using PDFBox 0.8.0 on Java 1.5.
Have a look at PrintDocumentMetaData for an example of how to extract the
document's metadata.

HTH
Andreas Lehmkühler
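When no PDF library is at hand, or as a quick cross-check, a crude alternative is to scan the raw file bytes for an XMP packet wrapper (`<?xpacket begin=` ... `<?xpacket end=`), the packet-scanning fallback the XMP specification describes for files whose format is not understood. This is a minimal sketch, not the PDFBox approach: it assumes a UTF-8 packet stored in an unfiltered (uncompressed, unencrypted) stream, and it returns only the first packet, which in a PDF may belong to a page or image rather than the document. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper class: byte-level XMP packet scan, no PDF parsing.
public class XmpPacketScanner {

    private static final String BEGIN = "<?xpacket begin=";
    private static final String END = "<?xpacket end=";

    /**
     * Returns the first XMP packet, from "<?xpacket begin=" through the "?>"
     * closing the end instruction, or null if none is found. Packets inside
     * compressed or encrypted streams are invisible to this scan.
     */
    static String findFirstPacket(byte[] data) {
        // ISO-8859-1 maps each byte to exactly one char, so string indexes
        // equal byte offsets into data.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        int start = text.indexOf(BEGIN);
        if (start < 0) {
            return null;
        }
        int end = text.indexOf(END, start);
        if (end < 0) {
            return null;
        }
        int close = text.indexOf("?>", end);
        if (close < 0) {
            return null;
        }
        // Decode the packet bytes as UTF-8 (assumption stated above).
        byte[] packet = Arrays.copyOfRange(data, start, close + 2);
        return new String(packet, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XmpPacketScanner file.pdf");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        String packet = findFirstPacket(data);
        System.out.println(packet == null ? "No unfiltered XMP packet found" : packet);
    }
}
```

If this finds nothing but the catalog has a /Metadata entry, the stream is most likely filtered, and reading it through the PDF library is the reliable route.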
