Hi Sourajit,

>>>> We have an implementation of Indexing filter that runs side-by-side the indexer-basic plugin. How is the order determined ? <<<<

First, make sure your indexing filter plugin is listed correctly in the plugin.includes property in the nutch-site.xml configuration file. The order in which the indexing filters run is determined by the indexingfilter.order property, which can be set like this:

    // Placeholder: use the fully qualified class name of your own filter.
    String class1 = "YourIndexingFilter";
    String class2 = "org.apache.nutch.indexer.basic.BasicIndexingFilter";
    // Class names are separated by a space; class1 runs before class2.
    conf.set(IndexingFilters.INDEXINGFILTER_ORDER, class1 + " " + class2);
    IndexingFilters filters = new IndexingFilters(conf);
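To make the effect of the order string a bit more concrete, here is a small stand-alone sketch. It is only a simplified model and not Nutch's actual code: the class FilterOrderSketch and the method resolveOrder are hypothetical names, filters are represented by plain strings, and it assumes that only the filters named in the property are applied, in the listed order, with names that are not loaded being skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone model; it does not use any Nutch classes.
public class FilterOrderSketch {

    /**
     * Returns the loaded filters in the order given by orderProperty.
     * Assumptions: class names are separated by whitespace, names that
     * are not loaded are skipped, and an empty property falls back to
     * the order in which the filters were loaded.
     */
    static List<String> resolveOrder(String orderProperty, Map<String, String> loaded) {
        List<String> result = new ArrayList<>();
        if (orderProperty == null || orderProperty.trim().isEmpty()) {
            result.addAll(loaded.values());
            return result;
        }
        for (String className : orderProperty.trim().split("\\s+")) {
            String filter = loaded.get(className);
            if (filter != null) {
                result.add(filter);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Filters as they might be discovered via plugin.includes (load order).
        Map<String, String> loaded = new LinkedHashMap<>();
        loaded.put("org.apache.nutch.indexer.basic.BasicIndexingFilter", "basic");
        loaded.put("YourIndexingFilter", "custom");

        String order = "YourIndexingFilter org.apache.nutch.indexer.basic.BasicIndexingFilter";
        System.out.println(resolveOrder(order, loaded)); // prints [custom, basic]
    }
}
```

In this model the custom filter runs before the basic one because it is listed first in the order string, even though it was loaded second.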

>>>>> Also, how do I do conditional indexing i.e. stop certain urls from being indexed ? I think I can apply a filter but that approach will not work since we index based on the page contents. <<<<<

For now, you can skip certain URLs by returning null from your indexing filter; see the comment in the IndexingFilter API. However, an indexing filter currently cannot delete an existing document from the back-end search engine. Markus will fix this in https://issues.apache.org/jira/browse/NUTCH-1449 .

On Wed, Jan 23, 2013 at 7:24 PM, Sourajit Basak <[email protected]> wrote:
> Markus - Can you please share your patch ?
>
> On Wed, Jan 23, 2013 at 1:52 PM, Tejas Patil <[email protected]> wrote:
>
> > Hi Sourajit,
> > See indexingfilter.order in nutch-default.xml
> >
> > Thanks,
> > Tejas Patil
> >
> > On Wed, Jan 23, 2013 at 12:16 AM, Sourajit Basak <[email protected]> wrote:
> >
> > > We have an implementation of Indexing filter that runs side-by-side the
> > > indexer-basic plugin. How is the order determined ?
> > > Also, how do I do conditional indexing i.e. stop certain urls from being
> > > indexed ? I think I can apply a filter but that approach will not work
> > > since we index based on the page contents.

--
Don't Grow Old, Grow Up... :-)

