The order in which plugins run depends on three things.

First, plugins are executed according to the component they belong to. For
instance, URL filter plugins run before other plugins because URL filtering
happens at an early stage of crawling. The following summarizes the order
of components in Nutch:

seed list
-> URL filter
-> crawldb (some functions from scoring plugins are called here)
-> fetching raw data (downloading site contents)
-> parsing the raw data
-> structuring the parsed data into fields (title, text, anchor text,
   metadata and so on)
-> sending the structured data to storage for use (such as Elasticsearch
   or Solr)

Second, if you have multiple URL filters, then in my experience they are
executed in the order in which they appear in the plugin property.
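
If you would rather not depend on that behaviour, nutch-default.xml also has,
as far as I know, a urlfilter.order property that takes a space-separated list
of filter class names. A minimal sketch, assuming the urlfilter-regex and
urlfilter-prefix plugins are both enabled in plugin.includes (these two
filters are chosen only for illustration):

```xml
<!-- Sketch for conf/nutch-site.xml. The two filters named here are an
     assumed example, not something taken from the original question. -->
<property>
  <name>urlfilter.order</name>
  <value>org.apache.nutch.urlfilter.regex.RegexURLFilter org.apache.nutch.urlfilter.prefix.PrefixURLFilter</value>
</property>
```

With a value like this, the regex filter should be applied before the prefix
filter. Please check the property description in your own Nutch version
before relying on it.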

Third, Nutch lets you configure the order of plugins within the same
component using a property. For instance, this is taken from nutch-default.xml:

<property>
  <name>indexingfilter.order</name>
  <value></value>
  <description>The order by which index filters are applied.
  If empty, all available index filters (as dictated by properties
  plugin-includes and plugin-excludes above) are loaded and applied in system
  defined order. If not empty, only named filters are loaded and applied
  in given order. For example, if this property has value:
  org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter
  then BasicIndexingFilter is applied first, and MoreIndexingFilter second.

  Filter ordering might have impact on result if one filter depends on output of
  another filter.
  </description>
</property>
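
To actually set an order, the usual approach is to override the property in
conf/nutch-site.xml rather than edit nutch-default.xml. A minimal sketch,
reusing the two filters from the description above and assuming that both the
index-basic and index-more plugins are enabled in plugin.includes:

```xml
<?xml version="1.0"?>
<!-- Sketch of conf/nutch-site.xml; assumes index-basic and index-more
     are both enabled, otherwise the named filters cannot be loaded. -->
<configuration>
  <property>
    <name>indexingfilter.order</name>
    <!-- Space-separated class names, applied left to right. -->
    <value>org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter</value>
  </property>
</configuration>
```

Keep in mind that, according to the description, once this value is not empty
only the filters named in it are loaded, so any indexing filter you leave out
of the list will not run.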

I hope that answers your question.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Order-of-Execution-of-Plugins-tp4190766p4190779.html
Sent from the Nutch - User mailing list archive at Nabble.com.
