The order in which plugins run depends on three things. First, plugins are executed according to the component they belong to. For instance, the URL filter plugins are executed before other plugins because URL filtering happens at an early stage of crawling. The following summarizes the order of components in Nutch:
seed list -> URL filter -> crawldb (some functions from scoring plugins are called here) -> fetching raw data (downloading site contents) -> parsing the raw data -> structuring the parsed data into fields (title, text, anchor text, metadata and so on) -> sending the structured data to storage for use (such as Elasticsearch or Solr)

Second, if you have multiple URL filters, then in my experience their execution order follows the order in which they appear in the plugin property.

Third, Nutch lets you configure the order in which plugins within the same component run, using a property. For instance, this is taken from nutch-default.xml:

    <property>
      <name>indexingfilter.order</name>
      <value></value>
      <description>The order by which index filters are applied.
      If empty, all available index filters (as dictated by properties
      plugin-includes and plugin-excludes above) are loaded and applied in system
      defined order. If not empty, only named filters are loaded and applied
      in given order. For example, if this property has value:
      org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter
      then BasicIndexingFilter is applied first, and MoreIndexingFilter second.

      Filter ordering might have impact on result if one filter depends on output of
      another filter.
      </description>
    </property>

I hope that answers your question.
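To make the third point concrete, here is a minimal sketch of overriding that property. It assumes you put your overrides in conf/nutch-site.xml (the usual place for site-specific settings) and that both filters named in the description are actually enabled through your plugin includes; adjust the class names to whatever filters you really use.

```xml
<!-- conf/nutch-site.xml (assumed location for local overrides) -->
<configuration>
  <property>
    <name>indexingfilter.order</name>
    <!-- Space-separated, fully qualified class names.
         Per the description in nutch-default.xml, only the filters
         listed here are loaded, and they are applied left to right:
         BasicIndexingFilter first, MoreIndexingFilter second. -->
    <value>org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter</value>
  </property>
</configuration>
```

Note the side effect spelled out in the description: once the value is non-empty, any indexing filter you leave out of the list is not applied at all, so list every filter you still want to run.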

