On Thu, 27 Sep 2012, Vigneshwaran wrote:
I am new to Apache Tika. I want Tika to output only the names of the files within the archive (if the input file is an archive) and the file content as usual if the input file is not an archive. Is there a way I can do that?

Yup. Rather than setting something like AutoDetectParser as the Parser in the ParseContext, pass in your own custom one. When it is called for an embedded document (e.g. a document within an archive), rather than processing the embedded resource, simply print out the name and return.
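As a rough sketch of that idea (not tested against any particular Tika release): the class names ListArchiveEntries and NameOnlyParser are made up for illustration, and the entry name is assumed to be stored under the "resourceName" metadata key. The outer parse still uses AutoDetectParser; only the parser registered in the ParseContext for embedded documents is replaced.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Set;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public class ListArchiveEntries {

    /** Hypothetical parser for embedded documents: prints the name, skips the content. */
    static class NameOnlyParser implements Parser {
        private static final long serialVersionUID = 1L;

        @Override
        public Set<MediaType> getSupportedTypes(ParseContext context) {
            // Assumption: the embedded-document machinery calls parse() directly,
            // so no media types need to be advertised here.
            return Collections.emptySet();
        }

        @Override
        public void parse(InputStream stream, ContentHandler handler,
                          Metadata metadata, ParseContext context)
                throws IOException, SAXException, TikaException {
            // Assumption: "resourceName" is the key holding the embedded entry's name.
            System.out.println(metadata.get("resourceName"));
            // Return without reading the stream, so the entry itself is not processed.
        }
    }

    public static void main(String[] args) throws Exception {
        ParseContext context = new ParseContext();
        // Embedded documents are handed to whatever Parser is set in the context.
        context.set(Parser.class, new NameOnlyParser());

        // -1 disables the default limit on how much text the handler collects.
        BodyContentHandler handler = new BodyContentHandler(-1);
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            new AutoDetectParser().parse(in, handler, new Metadata(), context);
        }
        // For a non-archive input this is the file content as usual.
        System.out.print(handler.toString());
    }
}
```

Run it with the path of the input file as the only argument. Note that any document with embedded resources, not just an archive, would have those resource names printed the same way.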

Nick
