Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The "TikaEntityProcessor" page has been changed by NoblePaul.
http://wiki.apache.org/solr/TikaEntityProcessor?action=diff&rev1=1&rev2=2

--------------------------------------------------

  = Configuration =
  Sample configuration
  {{{
- 
- <entity processor="TikaEntityProcessor" tikaConfig="tikaconfig.xml" url="${some.var.goes.here}" format="text">
+ <entity processor="TikaEntityProcessor" tikaConfig="tikaconfig.xml" url="${some.var.goes.here}" dataSource="bin" format="text">
        <!-- Map fields appropriately here. meta="true" means the field is read from the document's metadata -->
        <field column="Author" meta="true" name="author"/>
        <field column="title" meta="true" name="docTitle"/>
        <!-- 'text' is an implicit field emitted by TikaEntityProcessor. Map it appropriately. -->
        <field column="text"/>
- </entity>
+ </entity>  
-   
  }}}
+ === attributes ===
+  * url : (required) The URL of the source document. How it is resolved depends on the !DataSource being used.
+  * tikaConfig : (optional) The Tika config file. If missing, the default config is used. A relative path is resolved against the conf directory.
+  * format : (optional) Output format. Values are text|xml|html|none; the default is 'text'. Irrespective of the format, the body is emitted as a field called 'text'; only the format of its content differs. Use 'none' if the body is not to be parsed, i.e. only metadata is emitted.
+  * parser : (optional) Default is org.apache.tika.parser.!AutoDetectParser. Provide the fully qualified name (FQN) of a class that implements org.apache.tika.parser.Parser.
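The sample above references dataSource="bin" without defining it. Below is a minimal data-config sketch, assuming the DataImportHandler binary data source !BinFileDataSource for local files (!BinURLDataSource would be the assumed counterpart for http URLs). The entity name, the file path and the choice of org.apache.tika.parser.pdf.PDFParser are hypothetical and only illustrate the format and parser attributes:

{{{
<dataConfig>
  <!-- 'bin' must name a binary DataSource; BinFileDataSource reads local files -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- hypothetical entity name and path; format="none" skips the body, so only metadata is emitted -->
    <entity name="doc" processor="TikaEntityProcessor" dataSource="bin"
            url="/data/docs/report.pdf" format="none"
            parser="org.apache.tika.parser.pdf.PDFParser">
      <field column="Author" meta="true" name="author"/>
      <field column="title" meta="true" name="docTitle"/>
    </entity>
  </document>
</dataConfig>
}}}

With tikaConfig omitted, the default Tika config is used; dropping the parser attribute falls back to !AutoDetectParser.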
  
+ ==== fields ====
+ Each field may have an optional attribute meta="true", which means the field is obtained from the document's metadata. The column value is used as the key into the metadata. See the lists of available keys in [[http://svn.apache.org/viewvc/lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/DublinCore.java?revision=801678&view=markup|DublinCore]] and [[http://svn.apache.org/viewvc/lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/MSOffice.java?revision=801678&view=markup|MSOffice]].
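As an illustration of meta="true", here is a sketch that maps two more metadata keys. It assumes the key strings 'subject' (from !DublinCore) and 'Keywords' (from MSOffice) as defined in the Tika revision linked above; the Solr field names subject, keywords and content are hypothetical and must exist in your schema:

{{{
<entity processor="TikaEntityProcessor" url="${some.var.goes.here}" dataSource="bin" format="text">
  <!-- meta="true": the column value is the Tika metadata key to look up -->
  <field column="subject" meta="true" name="subject"/>
  <field column="Keywords" meta="true" name="keywords"/>
  <!-- no meta attribute: 'text' is the implicit body field, mapped here to a hypothetical 'content' field -->
  <field column="text" name="content"/>
</entity>
}}}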
+ 
