Thanks for the pointer on ordering.

So, based on the presence of certain fields, Nutch can decide whether to send
a doc to Solr. However, I guess that means changes to the main code line
rather than something driven by a plugin. Let me look at the source.
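
To make the idea concrete, here is a rough sketch of what I have in mind. It
is only a model: DocFilter, requireField and the "price" field are stand-ins I
made up for illustration, not Nutch classes or fields. It assumes, as the
IndexingFilter API comment suggests, that a filter returning null means the
document is discarded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in types only: none of these are Nutch classes.
class ConditionalIndexingSketch {

    /** Hypothetical stand-in for an indexing filter: return null to discard the doc. */
    interface DocFilter {
        Map<String, String> filter(Map<String, String> doc);
    }

    /** Discards any document that lacks the given field (field name is an assumption). */
    static DocFilter requireField(String field) {
        return doc -> {
            String value = doc.get(field);
            return (value == null || value.isEmpty()) ? null : doc;
        };
    }

    /** Applies the filters in list order and stops as soon as one returns null. */
    static Map<String, String> applyFilters(List<DocFilter> filters, Map<String, String> doc) {
        for (DocFilter f : filters) {
            doc = f.filter(doc);
            if (doc == null) {
                return null; // discarded: nothing would be sent to the index
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<DocFilter> filters = new ArrayList<>();
        filters.add(requireField("price"));

        Map<String, String> withField = new HashMap<>();
        withField.put("url", "http://example.com/a");
        withField.put("price", "10");

        Map<String, String> withoutField = new HashMap<>();
        withoutField.put("url", "http://example.com/b");

        System.out.println(applyFilters(filters, withField) != null);    // prints true
        System.out.println(applyFilters(filters, withoutField) != null); // prints false
    }
}
```

If the real filter chain behaves like applyFilters here, the decision could
live entirely in our own plugin, but I still need to confirm that in the source.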

I am eagerly waiting for the patch from Markus.

Best,
Sourajit

On Thu, Jan 24, 2013 at 7:08 AM, feng lu <[email protected]> wrote:

> Hi Sourajit
>
> >>>>
> We have an implementation of Indexing filter that runs side-by-side the
> indexer-basic plugin. How is the order determined ?
> <<<<
> First, make sure your indexing filter plugin is set correctly in the
> plugin.includes property in the nutch-site.xml configuration file. The
> indexing filter order is determined by the indexingfilter.order property,
> like this:
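> // Placeholder: substitute your filter's fully qualified class name.
> String class1 = "YouIndexingFilter";
> String class2 = "org.apache.nutch.indexer.basic.BasicIndexingFilter";
> // Filters are applied in the order listed, separated by spaces.
> conf.set(IndexingFilters.INDEXINGFILTER_ORDER, class1 + " " + class2);
> IndexingFilters filters = new IndexingFilters(conf);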
>
> >>>>>
> Also, how do I do conditional indexing i.e. stop certain urls from being
> indexed ? I think I can apply a filter but that approach will not work
> since we index based on the page contents.
> <<<<<
> For now, you may be able to filter out certain URLs by returning a null
> value from your filter; see the IndexingFilter API comment. However, an
> indexing filter currently cannot delete an existing document in the
> back-end search engine. Markus will fix this in
> https://issues.apache.org/jira/browse/NUTCH-1449
>
>
> On Wed, Jan 23, 2013 at 7:24 PM, Sourajit Basak <[email protected]> wrote:
>
> > Markus - Can you please share your patch ?
> >
> > On Wed, Jan 23, 2013 at 1:52 PM, Tejas Patil <[email protected]> wrote:
> >
> > > Hi Sourajit,
> > > See indexingfilter.order in nutch-default.xml
> > >
> > > Thanks,
> > > Tejas Patil
> > >
> > > On Wed, Jan 23, 2013 at 12:16 AM, Sourajit Basak <[email protected]> wrote:
> > >
> > > > We have an implementation of Indexing filter that runs side-by-side the
> > > > indexer-basic plugin. How is the order determined ?
> > > > Also, how do I do conditional indexing i.e. stop certain urls from being
> > > > indexed ? I think I can apply a filter but that approach will not work
> > > > since we index based on the page contents.
> > > >
> > >
> >
>
>
>
> --
> Don't Grow Old, Grow Up... :-)
>
