Hi,

On Thu, Aug 12, 2010 at 3:30 PM, Sergiy Karpenko <[email protected]> wrote:
> As I know, Parser extracts metadata and file content.
>
> parse(stream, contentHandler, metadata, parseContext);
>
> But full content extraction is redundant if I want extract only metadata.
>
> How can I do this with Tika?

Just pass a "new org.xml.sax.helpers.DefaultHandler()" as the content
handler argument to the parse() method. Tika will still do content
extraction along with metadata extraction, but the text content is
simply ignored.

There was some earlier discussion about adding a dedicated dummy
ContentHandler instance (or allowing a null handler) that would tell
the underlying parsers that the client is only interested in the
document metadata. So far the need for such an optimization has not
been too pressing, so we haven't implemented it yet.

BR,

Jukka Zitting
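
PS. A minimal sketch of why the DefaultHandler trick works, using only the SAX classes that ship with the JDK. The emitEvents() method is a hypothetical stand-in for a parser, not Tika code, and the class and handler names are made up for illustration. It shows that the content events are still produced either way; DefaultHandler just drops them, since all of its callbacks are no-ops.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class IgnoreContentDemo {

    // Hypothetical stand-in for a parser (NOT Tika code): it always emits
    // the content events, whether or not the handler does anything with them.
    static int emitEvents(ContentHandler handler) throws SAXException {
        char[] text = "body text of the document".toCharArray();
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        handler.characters(text, 0, text.length);
        handler.endElement("", "p", "p");
        handler.endDocument();
        return text.length; // amount of text the "parser" had to produce
    }

    // For contrast: a handler that keeps the text it receives.
    static class CollectingHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    public static void main(String[] args) throws SAXException {
        CollectingHandler collecting = new CollectingHandler();
        emitEvents(collecting);
        System.out.println("collected: " + collecting.text);

        // Same events, but DefaultHandler's callbacks do nothing,
        // so the text is produced and then discarded.
        int emitted = emitEvents(new DefaultHandler());
        System.out.println("emitted " + emitted + " chars, kept none");
    }
}
```

With Tika on the classpath, the same "new DefaultHandler()" goes in as the second argument of parse(stream, contentHandler, metadata, parseContext), and the metadata object is read once the call returns.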
