RE: Re: Question XMP metadata extraction

Robin Diederen Thu, 22 Oct 2009 13:17:55 -0700

Andreas,


According to the JavaDoc 
(http://www.pdfbox.org/javadoc/org/pdfbox/pdmodel/common/PDMetadata.html#PDMetadata%28org.pdfbox.pdmodel.PDDocument%29)
the exportXMPMetadata method should be able to do this. Or am I missing 
something?

 

Thanks for your help, greatly appreciated!

 

Best, Robin
 
-----Original message-----
From: Andreas Lehmkühler <andr...@lehmi.de>
Sent: Thu 22-10-2009 22:09
To: pdfbox-users@incubator.apache.org; 
Subject: Re: Question XMP metadata extraction

Hi,

Robin Diederen wrote:
> Hello Andreas,
> 
> I did have a look at the PrintDocumentMetaData.java file; there I see that 
> the metadata is extracted using PDDocumentInformation. This code is useful 
> for PDF files with "classic" metadata, but not for PDF files carrying only 
> XMP metadata, right?
OK, I see. I'm not that familiar with the XMP stuff, but I guess I
understand your problem.

> That's my issue: as soon as I have a PDF file with only XMP metadata, I need 
> some other way to extract this metadata.
I'm afraid that PDFBox is still limited to handling "classic" metadata.


> Best, Robin
>  
> -----Original message-----
> From: Andreas Lehmkühler <andr...@lehmi.de>
> Sent: Thu 22-10-2009 21:47
> To: pdfbox-users@incubator.apache.org; 
> Subject: Re: Question XMP metadata extraction
> 
> Hi,
> 
> Robin Diederen wrote:
>> Hello all,
>>
>> I'm quite new to PDFBox and am currently figuring out how to extract 
>> XMP-format metadata from PDF files.
>>
>> I have a few files containing XMP metadata, but I cannot get any of them 
>> to work, and I can't figure out where I am going wrong.
>>
>> A code snippet (all non-relevant code was deleted):
>>
>> String inputFile = "/some/file.pdf";
>>
>> PDDocument pdfDocument = null;
>> pdfDocument = new PDDocument();
>> pdfDocument = PDDocument.load(inputFile);     
>> PDMetadata pdfMetaData = new PDMetadata(pdfDocument);
>>             
>> int metadataLength = pdfMetaData.getLength();
>> System.out.println(pdfMetaData.getLength());
>>  
>>
>> pdfMetaData.exportXMPMetadata();
>>  
>>
>> The getLength call always returns 0; the exportXMPMetadata call returns an 
>> error:
>>
>> [Fatal Error] :-1:-1: Premature end of file.
>> Exception in thread "main" java.io.IOException: Premature end of file.
>>     at org.apache.jempbox.impl.XMLUtil.parse(XMLUtil.java:78)
>>     at org.apache.jempbox.xmp.XMPMetadata.load(XMPMetadata.java:554)
>>     at org.apache.pdfbox.pdmodel.common.PDMetadata.exportXMPMetadata(PDMetadata.java:86)
>>     at com.robindiederen.pdf.Extractor.extractMetaDataFromXMP(Extractor.java:124)
>>     at com.robindiederen.pdf.Extractor.main(Extractor.java:90)
>>
>>  
>>
>> This happens for every PDF I test. Extracting metadata from the 
>> DocumentInformation table works like a charm. I'm using PDFBox 0.80 on Java 
>> 1.5.
> Have a look at PrintDocumentMetaData as an example of how to extract the
> document's metadata.
> 
> HTH
> Andreas Lehmkühler
> 
> 
BR
Andreas Lehmkühler
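
A note on the error quoted above: a getLength() of 0 followed by "Premature
end of file" is what one would expect if the XML parser were handed an empty
stream. The thread does not confirm this, but one plausible reading is that
new PDMetadata(pdfDocument) creates a fresh, empty metadata stream instead of
reading the one already stored in the file. The sketch below uses only the
JDK's XML parser (no PDFBox calls) to show that empty input fails to parse
while a minimal XMP-like packet parses. The class name EmptyXmpParseDemo and
the helper parseAndDescribe are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

public class EmptyXmpParseDemo {

    // Hypothetical helper: tries to parse the bytes as XML and reports
    // either "parsed" or "failed: <parser message>".
    static String parseAndDescribe(byte[] data) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(data));
            return "parsed";
        } catch (SAXException e) {
            // The JDK parser rejects input that contains no root element.
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Zero bytes, standing in for a metadata stream of length 0.
        System.out.println(parseAndDescribe(new byte[0]));

        // A minimal well-formed packet, standing in for real XMP content.
        byte[] packet = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(parseAndDescribe(packet));
    }
}
```

If that reading is right, the thing to check is whether the PDMetadata object
being exported actually refers to the metadata stream stored in the document,
rather than to a newly created one.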
