The "ExtractingRequestHandler" page has been changed by iorixxx:
http://wiki.apache.org/solr/ExtractingRequestHandler?action=diff&rev1=80&rev2=81

Comment:
broken Tika documentation link corrected

   * resource.name=<File Name> - The optional name of the file.  Tika can use 
it as a hint for detecting the MIME type.
   * capture=<Tika XHTML NAME> - Capture XHTML elements with the given name 
separately and add them to the Solr document.  This can be useful for grabbing 
chunks of the XHTML into a separate field.  For instance, it could be used to 
grab paragraphs (<p>) and index them into a separate field.  Note that the 
content is still captured into the overall "content" field as well.
   * captureAttr=true|false - Index attributes of the Tika XHTML elements into 
separate fields, named after the element.  For example, when extracting from 
HTML, Tika can return the href attributes in <a> tags as fields named "a". See 
the examples below.
-  * xpath=<XPath expression> - When extracting, only return Tika XHTML content 
that satisfies the XPath expression.  See 
http://lucene.apache.org/tika/documentation.html for details on the format of 
Tika XHTML.  See also TikaExtractOnlyExampleOutput.
+  * xpath=<XPath expression> - When extracting, only return Tika XHTML content 
that satisfies the XPath expression.  See 
http://tika.apache.org/1.2/parser.html for details on the format of Tika XHTML. 
 See also TikaExtractOnlyExampleOutput.
   * lowernames=true|false - Map all field names to lowercase with underscores. 
 For example, Content-Type would be mapped to content_type.
   * literalsOverride=true|false - <!> [[Solr4.0]] When true, literal field 
values override other values with the same field name, such as metadata and 
content. When false, literal field values are appended to any data extracted 
by Tika, and the resulting field must be multivalued. Default: true
   * resource.password=<password> - <!> [[Solr4.0]] The optional password for a 
password-protected PDF or OOXML file. File format support depends on Tika.
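
As a rough illustration of how the parameters listed above might be combined in one request, the sketch below only builds the request URL and does not contact a server. The base URL (host, port and the /update/extract handler path) and the file name "report.pdf" are assumptions for this example; the actual path depends on how the handler is mapped in solrconfig.xml. The helper build_extract_url is hypothetical and not part of Solr or Tika.

```python
from urllib.parse import urlencode

# Assumed location of the handler; adjust to match your solrconfig.xml.
BASE_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(base_url, params):
    """Return base_url with the handler parameters appended as a query string.

    Python booleans are written as lowercase true/false, the form the
    parameter list uses (for example captureAttr=true|false).
    """
    pairs = [
        (name, str(value).lower() if isinstance(value, bool) else value)
        for name, value in params.items()
    ]
    return f"{base_url}?{urlencode(pairs)}"


url = build_extract_url(
    BASE_URL,
    {
        "resource.name": "report.pdf",  # hint for MIME type detection
        "capture": "p",                 # capture <p> elements separately
        "captureAttr": True,            # index element attributes as fields
        "lowernames": True,             # Content-Type -> content_type
        "literalsOverride": False,      # append literals; field must be multivalued
    },
)
print(url)
```

The file itself would still have to be sent as the body of the request, for example with an HTTP client of your choice; that step is left out here because it depends on the client and the Solr setup.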