Hi Chris,

java -jar tika-app-0.9.jar --list-met-models
TikaMetadataKeys
 PROTECTED
 RESOURCE_NAME_KEY
TikaMimeKeys
 MIME_TYPE_MAGIC
 TIKA_MIME_FILE

Both 0.8 and 0.9 give me the same list.  Is that a configuration issue?

I'm not sure that gets me what I was looking for: metadata like
"content_type" or "last_modified".  Or am I confusing Tika metadata
with SolrCell metadata?

I thought SolrCell metadata comes from Tika, or does it not?

Regards,

Andreas



________________________________
From: "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Cc: "u...@tika.apache.org" <u...@tika.apache.org>
Sent: Fri, February 25, 2011 1:21:33 PM
Subject: Re: Tika metadata extracted per supported document format?

Hi Andreas,

In Tika 0.8+, you can run the --list-met-models command from tika-app:

java -jar tika-app-<version>.jar --list-met-models

And get a print out of the met keys that Tika supports. Some parsers add their
own that aren't part of this met listing, but this is a relatively
comprehensive list.

Cheers,
Chris

On Feb 25, 2011, at 12:10 PM, Andreas Kemkes wrote:

> Hello,
> 
> I've asked this on the Tika mailing list w/o an answer, so apologies for 
> cross-posting.
> 
> I'm trying to find information that tells me specifically what metadata is
> provided for the different supported document formats.  Unfortunately all I
> was able to find so far is "The Metadata produced depends on the type of
> document submitted."
> 
> Currently, I'm using ExtractingRequestHandler from Solr 1.4 (with Tika 0.4),
> so I'm particularly interested in that version, but also in changes that are
> provided in newer versions of Tika.
> 
> Where are the best places to look for such information?
> 
> Thanks in advance,
> 
> Andreas
> 
> 


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


      
