RE: Re: Question XMP metadata extraction

Robin Diederen Thu, 22 Oct 2009 13:53:47 -0700

Hi,


Thanks for looking into the code; I'm a bit confused though. I guess it's your 
suggestion to inspect the three locations for metadata "by hand"?  What would 
be the best way to proceed?
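
One rough way to check a file "by hand", independent of PDFBox, is to scan the raw bytes for the XMP packet markers (`<?xpacket begin` ... `<?xpacket end ...?>`). The sketch below is plain Java and not part of the PDFBox API; the class name `XmpPacketScanner` and the method `findXmpPacket` are hypothetical names chosen for illustration. It assumes Java 7 or later (for `StandardCharsets` and `Files`) and that the metadata stream is stored uncompressed; a packet inside a compressed stream will not be found this way, and only the first packet in the file is returned.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox: finds the first uncompressed XMP packet in a file.
class XmpPacketScanner {

    // Returns the first XMP packet found in the given bytes, or null if there is none.
    static String findXmpPacket(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so string indexes equal byte offsets.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);

        int start = raw.indexOf("<?xpacket begin");
        if (start < 0) {
            return null; // no packet header
        }
        int trailer = raw.indexOf("<?xpacket end", start);
        if (trailer < 0) {
            return null; // header without trailer
        }
        int close = raw.indexOf("?>", trailer);
        if (close < 0) {
            return null; // trailer processing instruction is not closed
        }
        int end = close + 2; // include the closing "?>"

        // Assumption: the packet is UTF-8 encoded, which is the usual case in PDF files.
        return new String(pdfBytes, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XmpPacketScanner <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String packet = findXmpPacket(bytes);
        System.out.println(packet != null ? packet : "No uncompressed XMP packet found");
    }
}
```

If this prints a packet, the file does carry XMP, and the remaining question is which of the three objects (catalog, page or XObject) the stream belongs to.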

 

Best, Robin
 
-----Original message-----
From: Andreas Lehmkühler <andr...@lehmi.de>
Sent: Thu 22-10-2009 22:36
To: pdfbox-users@incubator.apache.org; 
Subject: Re: Question XMP metadata extraction


Robin Diederen wrote:
> Andreas,
> 
> According to the JavaDoc 
> (http://www.pdfbox.org/javadoc/org/pdfbox/pdmodel/common/PDMetadata.html#PDMetadata%28org.pdfbox.pdmodel.PDDocument%29)
>  the exportXMPMetadata method should be able to do this. Or am I missing 
> something?
Ok, I had a deeper look and it seems that there are 3 supported
locations for metadata within pdfbox: PDDocumentCatalog, PDPage and
PDXObject. The "classic" metadata are located in the catalog. Perhaps
you will find the metadata you are looking for in the two other objects?

BR
Andreas Lehmkühler

> Thanks for your help, greatly appreciated!
> 
>  
> 
> Best, Robin
>  
> -----Original message-----
> From: Andreas Lehmkühler <andr...@lehmi.de>
> Sent: Thu 22-10-2009 22:09
> To: pdfbox-users@incubator.apache.org; 
> Subject: Re: Question XMP metadata extraction
> 
> Hi,
> 
> Robin Diederen wrote:
>> Hello Andreas,
>>
>> I did have a look at the PrintDocumentMetaData.java file; there the metadata 
>> is extracted using PDDocumentInformation. This code is useful for PDF files 
>> with "classic" metadata, but not for PDF files carrying only XMP metadata, 
>> right?
> OK, I see. I'm not that familiar with the XMP stuff, but I guess I
> understand your problem.
> 
>> There's my issue.. as soon as I have a PDF file with only XMP metadata I 
>> need some other way to extract this metadata..
> I'm afraid that pdfbox is still limited to the handling of "classic" metadata.
> 
> 
>> Best, Robin
>>  
>> -----Original message-----
>> From: Andreas Lehmkühler <andr...@lehmi.de>
>> Sent: Thu 22-10-2009 21:47
>> To: pdfbox-users@incubator.apache.org; 
>> Subject: Re: Question XMP metadata extraction
>>
>> Hi,
>>
>> Robin Diederen wrote:
>>> Hello all,
>>>
>>> I'm quite new to PDFbox and currently figuring out how to extract metadata 
>>> from PDF files which is in XMP format.
>>>
>>> I have a few files containing XMP metadata, but I can not get any of those 
>>> to work. And I can't seem to figure out where I am failing.
>>>
>>> A code snippet (all non-relevant code was deleted):
>>>
>>> String inputFile = "/some/file.pdf";
>>>
>>> PDDocument pdfDocument = null;
>>> pdfDocument = new PDDocument();
>>> pdfDocument = PDDocument.load(inputFile);     
>>> PDMetadata pdfMetaData = new PDMetadata(pdfDocument);
>>>             
>>> int metadataLength = pdfMetaData.getLength();
>>> System.out.println(pdfMetaData.getLength());
>>>  
>>>
>>> pdfMetaData.exportXMPMetadata();
>>>  
>>>
>>> The getLength call always returns 0; the exportXMPMetadata call returns an 
>>> error:
>>>
>>> [Fatal Error] :-1:-1: Premature end of file.
>>> Exception in thread "main" java.io.IOException: Premature end of file.
>>>     at org.apache.jempbox.impl.XMLUtil.parse(XMLUtil.java:78)
>>>     at org.apache.jempbox.xmp.XMPMetadata.load(XMPMetadata.java:554)
>>>     at org.apache.pdfbox.pdmodel.common.PDMetadata.exportXMPMetadata(PDMetadata.java:86)
>>>     at com.robindiederen.pdf.Extractor.extractMetaDataFromXMP(Extractor.java:124)
>>>     at com.robindiederen.pdf.Extractor.main(Extractor.java:90)
>>>
>>>  
>>>
>>> This happens for every PDF I test. Extracting metadata from the 
>>> DocumentInformation table works like a charm. I'm using PDFbox 0.80 on Java 
>>> 1.5.
>> Have a look at PrintDocumentMetaData as an example of how to extract the
>> document's metadata.
>>
>> HTH
>> Andreas Lehmkühler
>>
>>
> BR
> Andreas Lehmkühler
> 
> 
>
