Thanks for the pointer on ordering. Based on the presence of certain fields, Nutch can decide whether to send a document to Solr. However, I guess that means changing the main code line rather than driving it from a plugin. Let me look at the source.
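
To make the idea concrete, here is a rough sketch of field-based conditional indexing. It does not use the real Nutch classes: `Doc`, `Filter`, `RequiredFieldFilter`, and `FilterChain` are hypothetical stand-ins, and the `price` field is only an example. It assumes, as the IndexingFilter API comment suggests, that a filter returning null causes the document to be skipped, and that filters run in the configured order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: every nested type here is a hypothetical stand-in, not a Nutch class.
class ConditionalIndexingSketch {

    // Stand-in for a document headed to the index.
    static class Doc {
        private final Map<String, String> fields = new HashMap<>();

        Doc add(String name, String value) {
            fields.put(name, value);
            return this;
        }

        String get(String name) {
            return fields.get(name);
        }
    }

    // Stand-in for an indexing filter: return the doc to keep it, null to skip it.
    interface Filter {
        Doc filter(Doc doc);
    }

    // Skips any document that has no non-empty value for the required field.
    static class RequiredFieldFilter implements Filter {
        private final String requiredField;

        RequiredFieldFilter(String requiredField) {
            this.requiredField = requiredField;
        }

        @Override
        public Doc filter(Doc doc) {
            String value = doc.get(requiredField);
            return (value == null || value.isEmpty()) ? null : doc;
        }
    }

    // Runs filters in the order they were added and stops at the first null.
    static class FilterChain {
        private final List<Filter> filters = new ArrayList<>();

        FilterChain then(Filter f) {
            filters.add(f);
            return this;
        }

        Doc run(Doc doc) {
            for (Filter f : filters) {
                doc = f.filter(doc);
                if (doc == null) {
                    return null; // discarded: nothing would be sent to the index
                }
            }
            return doc;
        }
    }

    public static void main(String[] args) {
        FilterChain chain = new FilterChain()
            .then(new RequiredFieldFilter("price"))  // content-based check runs first
            .then(d -> d.add("indexed", "true"));    // stands in for a later filter

        Doc product = new Doc().add("url", "http://example.com/p1").add("price", "9.99");
        Doc about = new Doc().add("url", "http://example.com/about");

        System.out.println(chain.run(product) != null); // prints true
        System.out.println(chain.run(about) != null);   // prints false
    }
}
```

Putting the content check first means later filters never see a document that is going to be dropped anyway. Note that this only skips documents at indexing time; it does not remove a document that is already in the index.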
I am eagerly waiting for the patch from Markus.

Best,
Sourajit

On Thu, Jan 24, 2013 at 7:08 AM, feng lu <[email protected]> wrote:

> Hi Sourajit
>
> >>>>
> We have an implementation of Indexing filter that runs side-by-side the
> indexer-basic plugin. How is the order determined ?
> <<<<
> First, make sure your indexing filter plugin is set correctly in the
> plugin.includes property in the nutch-site.xml configuration file. The
> indexing filter order is determined by the indexingfilter.order property,
> like this:
>
>   String class1 = "YouIndexingFilter";
>   String class2 = "org.apache.nutch.indexer.basic.BasicIndexingFilter";
>   conf.set(IndexingFilters.INDEXINGFILTER_ORDER, class1 + " " + class2);
>   IndexingFilters filters = new IndexingFilters(conf);
>
> >>>>>
> Also, how do I do conditional indexing i.e. stop certain urls from being
> indexed ? I think I can apply a filter but that approach will not work
> since we index based on the page contents.
> <<<<<
> Maybe for now you can filter certain URLs by returning a null value; see
> the IndexingFilter API comment. But at present an indexing filter cannot
> delete an existing document in the back-end search engine, and Markus
> will fix this in
> https://issues.apache.org/jira/browse/NUTCH-1449
>
> On Wed, Jan 23, 2013 at 7:24 PM, Sourajit Basak <[email protected]> wrote:
>
> > Markus - Can you please share your patch ?
> >
> > On Wed, Jan 23, 2013 at 1:52 PM, Tejas Patil <[email protected]> wrote:
> >
> > > Hi Sourajit,
> > > See indexingfilter.order in nutch-default.xml
> > >
> > > Thanks,
> > > Tejas Patil
> > >
> > > On Wed, Jan 23, 2013 at 12:16 AM, Sourajit Basak <[email protected]> wrote:
> > >
> > > > We have an implementation of Indexing filter that runs side-by-side the
> > > > indexer-basic plugin. How is the order determined ?
> > > > Also, how do I do conditional indexing i.e. stop certain urls from being
> > > > indexed ? I think I can apply a filter but that approach will not work
> > > > since we index based on the page contents.
>
> --
> Don't Grow Old, Grow Up... :-)

