Hi Sourajit

>>>>
We have an implementation of Indexing filter that runs side-by-side the
indexer-basic plugin. How is the order determined ?
<<<<
First, make sure your indexing filter plugin is listed correctly in the
plugin.includes property in the nutch-site.xml configuration file. The order
in which indexing filters run is determined by the indexingfilter.order
property, like this:
// Filters are applied in the order listed, separated by spaces.
String class1 = "YourIndexingFilter"; // placeholder: your filter's class name
String class2 = "org.apache.nutch.indexer.basic.BasicIndexingFilter";
conf.set(IndexingFilters.INDEXINGFILTER_ORDER, class1 + " " + class2);
IndexingFilters filters = new IndexingFilters(conf);

>>>>>
Also, how do I do conditional indexing i.e. stop certain urls from being
indexed ? I think I can apply a filter but that approach will not work
since we index based on the page contents.
<<<<<
For now, you can probably skip certain URLs by returning null from your
indexing filter; see the comment on the IndexingFilter API. However, an
indexing filter currently cannot delete an existing document from the
back-end search engine. Markus will fix this in
https://issues.apache.org/jira/browse/NUTCH-1449.
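
To illustrate the return-null idea, here is a minimal, self-contained sketch. It does not use the real Nutch classes: Doc, Filter, SkipByContentFilter and runChain are hypothetical stand-ins that I made up so the example compiles on its own, and the "DO-NOT-INDEX" marker is only an assumed condition. The point is just that a filter can look at the page content and return null, and the chain then drops the document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConditionalIndexingSketch {

    // Hypothetical stand-in for a document to be indexed (not NutchDocument).
    static class Doc {
        final Map<String, String> fields = new LinkedHashMap<>();
    }

    // Hypothetical stand-in for an indexing filter (not the Nutch interface).
    // Returning null means "do not index this page".
    interface Filter {
        Doc filter(Doc doc, String url, String pageText);
    }

    // Skips any page whose text contains the given marker string.
    static class SkipByContentFilter implements Filter {
        private final String marker;

        SkipByContentFilter(String marker) {
            this.marker = marker;
        }

        @Override
        public Doc filter(Doc doc, String url, String pageText) {
            if (pageText != null && pageText.contains(marker)) {
                return null; // decision is based on content, not on the URL
            }
            doc.fields.put("checked", "true");
            return doc;
        }
    }

    // Runs the filters in order and stops as soon as one returns null.
    static Doc runChain(List<Filter> chain, String url, String pageText) {
        Doc doc = new Doc();
        doc.fields.put("url", url);
        for (Filter f : chain) {
            doc = f.filter(doc, url, pageText);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Filter> chain = new ArrayList<>();
        chain.add(new SkipByContentFilter("DO-NOT-INDEX"));

        Doc kept = runChain(chain, "http://example.com/a", "ordinary page text");
        Doc skipped = runChain(chain, "http://example.com/b", "text DO-NOT-INDEX text");

        System.out.println("a indexed: " + (kept != null));
        System.out.println("b indexed: " + (skipped != null));
    }
}
```

In a real plugin the same check would go inside your own indexing filter, where the parsed page text is available to you. Keep in mind the limitation above: this only stops a page from being added, it does not remove a document that is already in the index.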


On Wed, Jan 23, 2013 at 7:24 PM, Sourajit Basak <[email protected]>wrote:

> Markus - Can you please share your patch ?
>
> On Wed, Jan 23, 2013 at 1:52 PM, Tejas Patil <[email protected]
> >wrote:
>
> > Hi Sourajit,
> > See indexingfilter.order in nutch-default.xml
> >
> > Thanks,
> > Tejas Patil
> >
> > On Wed, Jan 23, 2013 at 12:16 AM, Sourajit Basak
> > <[email protected]>wrote:
> >
> > > We have an implementation of Indexing filter that runs side-by-side the
> > > indexer-basic plugin. How is the order determined ?
> > > Also, how do I do conditional indexing i.e. stop certain urls from
> being
> > > indexed ? I think I can apply a filter but that approach will not work
> > > since we index based on the page contents.
> > >
> >
>



-- 
Don't Grow Old, Grow Up... :-)
