Re: Parse metadata only

Nick Burch Tue, 29 May 2012 06:51:05 -0700

On Tue, 29 May 2012, Thinus Prinsloo wrote:

I would like to parse the meta-data of a massive amount of PDF files only. I do not want to extract the text, not yet anyway, only get meta-data information such as "Creation-Date", etc. Is it possible for Tika to provide the meta-data without doing a parse on the whole document (with a content handler, etc.)?

At the moment, that's not possible. Most file formats don't have all their metadata in entirely separate places, so you end up having to process almost all of the file anyway. (There has been talk about implementing this in the past, but this problem has largely meant it hasn't been tackled.)


If you don't want the text, you can just pass in a content handler that
ignores everything
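
As a rough sketch of that approach (not an official recipe): the JDK's org.xml.sax.helpers.DefaultHandler has no-op callbacks, so it can serve as the "ignore everything" handler. The class name MetadataOnly and the helper extractMetadata are made up for illustration, and the example assumes tika-parsers is on the classpath. The whole file is still parsed; only the text events are thrown away.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named here only for illustration.
public class MetadataOnly {

    // Parses the stream and returns whatever metadata the parser collected.
    // DefaultHandler ignores every SAX event, so no text is accumulated.
    static Metadata extractMetadata(InputStream stream) throws Exception {
        Metadata metadata = new Metadata();
        new AutoDetectParser().parse(stream, new DefaultHandler(), metadata, new ParseContext());
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is assumed to be the path of the document to inspect.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Metadata metadata = extractMetadata(in);
            for (String name : metadata.names()) {
                System.out.println(name + ": " + metadata.get(name));
            }
        }
    }
}
```

Which metadata keys come back (for example "Creation-Date") depends on the file and the Tika version, so it is safer to print all names first than to assume a particular key is present.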

Nick
