The "TikaPlugin" page has been changed by JulienNioche.
http://wiki.apache.org/nutch/TikaPlugin?action=diff&rev1=3&rev2=4

--------------------------------------------------

  = Tika Plugin =
  The Tika plugin in http://issues.apache.org/jira/browse/NUTCH-766 is a first 
attempt at delegating parsing to Tika instead of maintaining the parser 
plugins in Nutch. This page lists the differences in coverage or functionality 
between the Tika plugin and the existing Nutch parsers. Tika also supports 
additional formats that Nutch does not cover, which are not described here, and 
it has a more generic way of representing structured content, which could be 
useful for HtmlParseFilters (currently limited to HTML content).
  
- '''html''': ?
+ '''html''': comparable
  
  '''js''': ?
  
@@ -21, +21 @@

  
  '''rss''': ?
  
- '''rtf''': comparable
+ '''rtf''': deactivated in Nutch for licensing reasons | works in Tika
  
  '''swf''': not yet covered in Tika (see https://issues.apache.org/jira/browse/TIKA-337)
  
