On Tue, 29 May 2012, Thinus Prinsloo wrote:
I would like to parse the meta-data of a massive amount of PDF files only. I do not want to extract the text, not yet anyway, only get meta-data information such as "Creation-Date", etc. Is it possible for Tika to provide the meta-data without doing a parse on the whole document (with a content handler, etc.)?

At the moment, that's not possible. Most file formats don't keep all their metadata in one separate place, so you end up having to process almost all of the file anyway. (There has been talk in the past about implementing metadata-only parsing, but this problem has largely meant it hasn't been tackled.)

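If you don't want the text, you can just pass in a content handler that
ignores everything, such as org.xml.sax.helpers.DefaultHandler. The parse
still runs over the whole file, but the text is thrown away and the
Metadata object you pass in is still filled in.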
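
As a rough illustration of the "handler that ignores everything" idea, here is a minimal sketch using only the JDK's SAX classes. The class and method names (IgnoreContentSketch, TextCollector, feed) are made up for this example, and the small XHTML string stands in for the events a Tika parser would emit. The Tika call itself appears only in a comment and was not compiled here, so treat it as an assumption to check against your Tika version.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: contrasts a text-collecting SAX handler with the
// JDK's DefaultHandler, whose callbacks are all no-ops.
class IgnoreContentSketch {

    /** Keeps every piece of character data, like a text-extracting handler. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses the XML string and sends the SAX events to the given handler. */
    static void feed(String xml, DefaultHandler handler) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new ByteArrayInputStream(bytes), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><p>Hello world</p></body></html>";

        // A collecting handler buffers all of the text in memory.
        TextCollector collector = new TextCollector();
        feed(xhtml, collector);
        System.out.println("collected chars: " + collector.text.length());

        // DefaultHandler receives the same events and discards them all.
        feed(xhtml, new DefaultHandler());
        System.out.println("ignored all content events");

        // With Tika on the classpath, the same no-op handler would be passed
        // as the ContentHandler argument (assumed usage, not compiled here):
        //
        //   Metadata metadata = new Metadata();
        //   new AutoDetectParser().parse(stream, new DefaultHandler(),
        //                                metadata, new ParseContext());
        //   String created = metadata.get("Creation-Date");
    }
}
```

The point of using DefaultHandler is that no text is buffered, which matters when running over a large number of files. It does not make the parse itself any shorter.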

Nick
