Hi,

I just need some feedback on the patch before I commit it. It's quite a big
change and we need to get it right.

J.

On 24 January 2013 07:40, Markus Jelsma <[email protected]> wrote:

> Hi
>
> I'll have to wait until Julien commits his work for pluggable indexing
> back ends. That should not take a long time now.
>
> Cheers
>
>
>
> -----Original message-----
> > From:Sourajit Basak <[email protected]>
> > Sent: Thu 24-Jan-2013 05:36
> > To: [email protected]
> > Subject: Re: conditional indexing
> >
> > Thanks for the pointer on ordering.
> >
> > Based upon the presence of certain fields, Nutch can decide whether to send
> > a doc to Solr. However, I guess this means changes to the main code line
> > instead of being driven by a plugin. Let me see the source.
> >
> > I am eagerly waiting for the patch from Markus.
> >
> > Best,
> > Sourajit
> >
> > On Thu, Jan 24, 2013 at 7:08 AM, feng lu <[email protected]> wrote:
> >
> > > Hi Sourajit
> > >
> > > >>>>
> > > We have an implementation of Indexing filter that runs side-by-side the
> > > indexer-basic plugin. How is the order determined ?
> > > <<<<
> > > First, make sure your indexing filter plugin is set correctly in the
> > > plugin.includes property in the nutch-site.xml configuration file. The
> > > order of the indexing filters is determined by the indexingfilter.order
> > > property, like this:
> > > String class1 = "YouIndexingFilter";
> > > String class2 = "org.apache.nutch.indexer.basic.BasicIndexingFilter";
> > > conf.set(IndexingFilters.INDEXINGFILTER_ORDER, class1 + " " + class2);
> > > IndexingFilters filters = new IndexingFilters(conf);
> > >
> > > >>>>>
> > > Also, how do I do conditional indexing i.e. stop certain urls from being
> > > indexed ? I think I can apply a filter but that approach will not work
> > > since we index based on the page contents.
> > > <<<<<
> > > For now, you may be able to filter out certain urls by returning a null
> > > value from your indexing filter; see the IndexingFilter API comment. But
> > > currently an indexing filter cannot delete an existing document from the
> > > back-end search engine. Markus will fix this in
> > > https://issues.apache.org/jira/browse/NUTCH-1449
> > >
> > >
> > > On Wed, Jan 23, 2013 at 7:24 PM, Sourajit Basak <[email protected]> wrote:
> > >
> > > > Markus - Can you please share your patch ?
> > > >
> > > > On Wed, Jan 23, 2013 at 1:52 PM, Tejas Patil <[email protected]> wrote:
> > > >
> > > > > Hi Sourajit,
> > > > > See indexingfilter.order in nutch-default.xml
> > > > >
> > > > > Thanks,
> > > > > Tejas Patil
> > > > >
> > > > > On Wed, Jan 23, 2013 at 12:16 AM, Sourajit Basak <[email protected]> wrote:
> > > > >
> > > > > > We have an implementation of Indexing filter that runs side-by-side the
> > > > > > indexer-basic plugin. How is the order determined ?
> > > > > > Also, how do I do conditional indexing i.e. stop certain urls from being
> > > > > > indexed ? I think I can apply a filter but that approach will not work
> > > > > > since we index based on the page contents.
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Don't Grow Old, Grow Up... :-)
> > >
> >
>
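
The suggestion quoted above (an indexing filter that returns null so the
document is not sent to the index) can be illustrated with a small,
self-contained sketch. It deliberately does not use the Nutch API: Doc, Filter,
runFilters, REQUIRE_TITLE and ADD_URL are hypothetical stand-ins invented for
this example, and the "title" rule is only an assumed content-based condition.
The filter order is given by the list order, in the spirit of
indexingfilter.order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConditionalIndexingSketch {

    /** Hypothetical stand-in for an index document: field name -> value. */
    public static final class Doc {
        public final Map<String, String> fields = new LinkedHashMap<>();

        public Doc add(String name, String value) {
            fields.put(name, value);
            return this;
        }
    }

    /** Hypothetical stand-in for an indexing filter; null means "do not index". */
    public interface Filter {
        Doc filter(Doc doc, String url);
    }

    /** Assumed content-based rule: drop documents without a non-empty title. */
    public static final Filter REQUIRE_TITLE = (doc, url) -> {
        String title = doc.fields.get("title");
        return (title == null || title.isEmpty()) ? null : doc;
    };

    /** Stand-in for a basic filter that simply adds the url field. */
    public static final Filter ADD_URL = (doc, url) -> doc.add("url", url);

    /** Runs the filters in list order and stops as soon as one returns null. */
    public static Doc runFilters(List<Filter> ordered, Doc doc, String url) {
        Doc current = doc;
        for (Filter f : ordered) {
            current = f.filter(current, url);
            if (current == null) {
                return null; // document rejected, nothing is indexed
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // The conditional filter runs first, then the basic one.
        List<Filter> chain = Arrays.asList(REQUIRE_TITLE, ADD_URL);

        Doc kept = runFilters(chain, new Doc().add("title", "Hello"),
                "http://example.com/a");
        Doc dropped = runFilters(chain, new Doc(), "http://example.com/b");

        System.out.println("first document indexed: " + (kept != null));
        System.out.println("second document skipped: " + (dropped == null));
    }
}
```

Putting the rejecting filter first in the order avoids running the remaining
filters on a document that will be discarded anyway. As noted in the thread,
this only prevents a document from being added; it does not remove a document
that is already in the back end.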



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
