Hi Chris,

Thank you so much - that's a great start.

Andreas

________________________________
From: "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Cc: "u...@tika.apache.org" <u...@tika.apache.org>
Sent: Fri, February 25, 2011 1:21:33 PM
Subject: Re: Tika metadata extracted per supported document format?

Hi Andreas,

In Tika 0.8+, you can run the --list-met-models command from tika-app:

java -jar tika-app-<version>.jar --list-met-models

And get a print out of the met keys that Tika supports. Some parsers add
their own that aren't part of this met listing, but this is a relatively
comprehensive list.

Cheers,
Chris

On Feb 25, 2011, at 12:10 PM, Andreas Kemkes wrote:

> Hello,
>
> I've asked this on the Tika mailing list w/o an answer, so apologies for
> cross-posting.
>
> I'm trying to find information that tells me specifically what metadata is
> provided for the different supported document formats. Unfortunately all I
> was able to find so far is "The Metadata produced depends on the type of
> document submitted."
>
> Currently, I'm using ExtractingRequestHandler from Solr 1.4 (with Tika 0.4),
> so I'm particularly interested in that version, but also in changes that are
> provided in newer versions of Tika.
>
> Where are the best places to look for such information?
>
> Thanks in advance,
>
> Andreas

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
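
Since the question also asks about changes between Tika versions, one rough way to use the --list-met-models command mentioned above is to capture its output from two tika-app jars (both 0.8 or later, as the option is only described for those) and compare the listings. The following is a minimal sketch, not an official tool: the helper names and jar file names are hypothetical, and it assumes only that the command prints one entry per line, which may not match the real output layout exactly.

```python
import subprocess


def list_met_models(tika_app_jar):
    """Run `java -jar <jar> --list-met-models` and return its stdout as text.

    `tika_app_jar` is a path to a tika-app jar (0.8 or later).
    """
    result = subprocess.run(
        ["java", "-jar", tika_app_jar, "--list-met-models"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def entries(listing):
    """Return the set of non-blank lines in a listing, with whitespace stripped.

    Assumption: one entry per line. Stripping indentation means a key that
    appears under several groups collapses into a single entry.
    """
    return {line.strip() for line in listing.splitlines() if line.strip()}


def diff_listings(old_listing, new_listing):
    """Return (added, removed) entries between two listings, each sorted."""
    old, new = entries(old_listing), entries(new_listing)
    return sorted(new - old), sorted(old - new)
```

For example, with two hypothetical jar paths, `diff_listings(list_met_models("tika-app-old.jar"), list_met_models("tika-app-new.jar"))` would return the lines that appear only in the newer listing and the lines that appear only in the older one. As noted in the reply, some parsers add keys that are not part of this listing, so the comparison is indicative rather than complete.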