Hi,

I just need some feedback on the patch before I commit it. It's quite a big change and we need to get it right.
J.

On 24 January 2013 07:40, Markus Jelsma <[email protected]> wrote:

> Hi
>
> I'll have to wait until Julien commits his work for pluggable indexing
> back ends. That should not take a long time now.
>
> Cheers
>
> > -----Original message-----
> > From: Sourajit Basak <[email protected]>
> > Sent: Thu 24-Jan-2013 05:36
> > To: [email protected]
> > Subject: Re: conditional indexing
> >
> > Thanks for the pointer on ordering.
> >
> > Based upon the presence of certain fields, nutch can decide whether to
> > send a doc to solr. However, I guess it means changes to the main code
> > line instead of being driven by a plugin. Let me see the source.
> >
> > I am eagerly waiting for the patch from Markus.
> >
> > Best,
> > Sourajit
> >
> > On Thu, Jan 24, 2013 at 7:08 AM, feng lu <[email protected]> wrote:
> >
> > > Hi Sourajit
> > >
> > > >>>>
> > > We have an implementation of Indexing filter that runs side-by-side the
> > > indexer-basic plugin. How is the order determined?
> > > <<<<
> > > First, make sure your indexing filter plugin is set correctly in the
> > > plugin.includes property in the nutch-site.xml configuration file. The
> > > indexing filter order is determined by the indexingfilter.order
> > > property, like this:
> > > String class1 = "YouIndexingFilter";
> > > String class2 = "org.apache.nutch.indexer.basic.BasicIndexingFilter";
> > > conf.set(IndexingFilters.INDEXINGFILTER_ORDER, class1 + " " + class2);
> > > IndexingFilters filters = new IndexingFilters(conf);
> > >
> > > >>>>>
> > > Also, how do I do conditional indexing i.e. stop certain urls from being
> > > indexed? I think I can apply a filter but that approach will not work
> > > since we index based on the page contents.
> > > <<<<<
> > > For now, you may be able to filter certain urls by returning a null
> > > value; see the IndexingFilter API comment. However, an indexing filter
> > > currently cannot delete an existing document in the back-end search
> > > engine. Markus will fix this in
> > > https://issues.apache.org/jira/browse/NUTCH-1449
> > >
> > >
> > > On Wed, Jan 23, 2013 at 7:24 PM, Sourajit Basak <[email protected]> wrote:
> > >
> > > > Markus - Can you please share your patch?
> > > >
> > > > On Wed, Jan 23, 2013 at 1:52 PM, Tejas Patil <[email protected]> wrote:
> > > >
> > > > > Hi Sourajit,
> > > > > See indexingfilter.order in nutch-default.xml
> > > > >
> > > > > Thanks,
> > > > > Tejas Patil
> > > > >
> > > > > On Wed, Jan 23, 2013 at 12:16 AM, Sourajit Basak
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > We have an implementation of Indexing filter that runs
> > > > > > side-by-side the indexer-basic plugin. How is the order
> > > > > > determined?
> > > > > > Also, how do I do conditional indexing i.e. stop certain urls
> > > > > > from being indexed? I think I can apply a filter but that
> > > > > > approach will not work since we index based on the page contents.
> > > > >
> > > >
> > >
> > >
> > > --
> > > Don't Grow Old, Grow Up... :-)
> > >
> >
>

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
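
For readers of the archive, the two ideas discussed in the quoted thread (running filters in a configured order, and skipping a document when a filter returns null) can be sketched in plain Java. This is only an illustration under stated assumptions, not Nutch code: `DocFilter`, `orderFilters`, `runFilters`, `requireField` and `addField` are hypothetical names, a document is reduced to a map of field names to values, and the real IndexingFilter interface has a different signature. Skipping filters that are not named in the order string is a simplification made for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConditionalIndexingSketch {

    /** Hypothetical stand-in for an indexing filter: returns the document, or null to skip it. */
    public interface DocFilter {
        Map<String, String> filter(Map<String, String> doc);
    }

    /** Resolves a space-separated list of filter names into an ordered chain (assumption: unknown names are ignored). */
    public static List<DocFilter> orderFilters(Map<String, DocFilter> available, String order) {
        List<DocFilter> ordered = new ArrayList<>();
        for (String name : order.trim().split("\\s+")) {
            DocFilter f = available.get(name);
            if (f != null) {
                ordered.add(f);
            }
        }
        return ordered;
    }

    /** Runs the chain in order; stops and returns null as soon as one filter rejects the document. */
    public static Map<String, String> runFilters(List<DocFilter> filters, Map<String, String> doc) {
        for (DocFilter f : filters) {
            doc = f.filter(doc);
            if (doc == null) {
                return null; // the caller would not send this document to the back end
            }
        }
        return doc;
    }

    /** Content-based condition: reject any document that lacks the given field. */
    public static DocFilter requireField(String field) {
        return doc -> doc.containsKey(field) ? doc : null;
    }

    /** Simple enrichment filter: returns a copy of the document with one extra field. */
    public static DocFilter addField(String field, String value) {
        return doc -> {
            Map<String, String> copy = new HashMap<>(doc);
            copy.put(field, value);
            return copy;
        };
    }

    public static void main(String[] args) {
        Map<String, DocFilter> available = new HashMap<>();
        available.put("requireTitle", requireField("title"));
        available.put("addSite", addField("site", "example"));

        // The order string plays the role of the indexingfilter.order value.
        List<DocFilter> chain = orderFilters(available, "requireTitle addSite");

        Map<String, String> withTitle = new HashMap<>();
        withTitle.put("title", "Hello");

        System.out.println("with title:    " + runFilters(chain, withTitle));
        System.out.println("without title: " + runFilters(chain, new HashMap<>()));
    }
}
```

Putting the rejecting filter first in the order string means later filters never see a document that is going to be dropped. As noted in the thread, returning null only prevents a document from being sent; it does not remove a copy already present in the search back end.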

