Hi Chris,

java -jar tika-app-0.9.jar --list-met-models
TikaMetadataKeys
 PROTECTED
 RESOURCE_NAME_KEY
TikaMimeKeys
 MIME_TYPE_MAGIC
 TIKA_MIME_FILE

Both 0.8 and 0.9 give me the same list. Is that a configuration issue?

I'm a bit unclear if that gets me to what I was looking for - metadata like "content_type" or "last_modified". Or am I confusing Tika metadata with SolrCell metadata? I thought SolrCell metadata comes from Tika, or does it not?

Regards,

Andreas

________________________________
From: "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Cc: "u...@tika.apache.org" <u...@tika.apache.org>
Sent: Fri, February 25, 2011 1:21:33 PM
Subject: Re: Tika metadata extracted per supported document format?

Hi Andreas,

In Tika 0.8+, you can run the --list-met-models command from tika-app:

java -jar tika-app-<version>.jar --list-met-models

And get a print out of the met keys that Tika supports. Some parsers add their own that aren't part of this met listing, but this is a relatively comprehensive list.

Cheers,
Chris

On Feb 25, 2011, at 12:10 PM, Andreas Kemkes wrote:

> Hello,
>
> I've asked this on the Tika mailing list w/o an answer, so apologies for
> cross-posting.
>
> I'm trying to find information that tells me specifically what metadata is
> provided for the different supported document formats. Unfortunately all I
> was able to find so far is "The Metadata produced depends on the type of
> document submitted."
>
> Currently, I'm using ExtractingRequestHandler from Solr 1.4 (with Tika 0.4),
> so I'm particularly interested in that version, but also in changes that are
> provided in newer versions of Tika.
>
> Where are the best places to look for such information?
>
> Thanks in advance,
>
> Andreas

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
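
A side note on why names such as "content_type" do not appear in the --list-met-models output: the entries it prints (RESOURCE_NAME_KEY, MIME_TYPE_MAGIC) look like Java constant names, not the metadata names a parser attaches to a document. The following is a minimal sketch of that distinction, assuming the listing is built by reflecting over interfaces of string constants. ExampleKeys, its two constants, and listKeys are hypothetical stand-ins for illustration, not Tika classes or Tika's actual implementation.

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Comparator;

public class MetKeyListing {

    // Hypothetical stand-in for a metadata-key interface; NOT a Tika class.
    public interface ExampleKeys {
        String RESOURCE_NAME_KEY = "resourceName";
        String CONTENT_TYPE = "Content-Type";
    }

    // Returns the interface name followed by one indented line per public
    // constant, sorted by constant name, in the form "NAME = value".
    public static String listKeys(Class<?> model) {
        StringBuilder out = new StringBuilder(model.getSimpleName()).append('\n');
        Field[] fields = model.getFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        for (Field f : fields) {
            try {
                // Interface fields are static, so no instance is needed.
                out.append(' ').append(f.getName())
                   .append(" = ").append(f.get(null)).append('\n');
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(listKeys(ExampleKeys.class));
    }
}
```

Printing the value next to each constant name shows both sides at once: the constant name is what a reflection-based listing would display, while the string value is what would be stored as the metadata name on a document.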