Thanks, Arkadi. Thanks, all, for your patience and guidance.

On Tue, Feb 8, 2011 at 8:48 AM, <[email protected]> wrote:

> You can exclude documents by returning null from an indexing filter.
>
> Regards,
>
> Arkadi
>
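A minimal sketch of the approach Arkadi describes, built on simplified stand-in types: `Doc` and `Filter` below are hypothetical stand-ins for Nutch's `NutchDocument` and `IndexingFilter`, whose real `filter(...)` method also takes `Parse`, `Text url`, `CrawlDatum` and `Inlinks` and may throw `IndexingException`. The filter returns null to drop a document, and the hypothetical `runChain` helper stops at the first null, so a dropped document never reaches the indexer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for Nutch's NutchDocument and
// IndexingFilter; not the real Nutch API.
public class ExcludeByNullFilter {

    /** Stand-in for NutchDocument: a bag of field values. */
    static class Doc {
        final Map<String, String> fields = new HashMap<>();
        Doc add(String name, String value) { fields.put(name, value); return this; }
        String get(String name) { return fields.get(name); }
    }

    /** Stand-in filter contract: return the (possibly modified) doc, or null to drop it. */
    interface Filter {
        Doc filter(Doc doc);
    }

    /** Hypothetical filter: drops any document whose "url" contains a blocked substring. */
    static class BlockUrlFilter implements Filter {
        private final String blocked;
        BlockUrlFilter(String blocked) { this.blocked = blocked; }

        @Override
        public Doc filter(Doc doc) {
            String url = doc.get("url");
            if (url != null && url.contains(blocked)) {
                return null; // exclude the whole document
            }
            return doc; // keep the document unchanged
        }
    }

    /** Runs each document through the chain; a null from any filter discards it. */
    static List<Doc> runChain(List<Doc> docs, List<Filter> chain) {
        List<Doc> kept = new ArrayList<>();
        for (Doc doc : docs) {
            Doc current = doc;
            for (Filter f : chain) {
                current = f.filter(current);
                if (current == null) break; // dropped: skip the remaining filters
            }
            if (current != null) kept.add(current);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc().add("url", "http://example.com/page"),
            new Doc().add("url", "http://example.com/private/page"));
        List<Filter> chain = List.of(new BlockUrlFilter("/private/"));
        System.out.println(runChain(docs, chain).size() + " document(s) kept");
    }
}
```

In a real plugin, the same check would sit inside `filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)`, returning `doc` to keep the page and `null` to discard it. `addIndexBackendOptions` only configures field options, so it plays no part in dropping a document.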
> >-----Original Message-----
> >From: .: Abhishek :. [mailto:[email protected]]
> >Sent: Tuesday, February 08, 2011 11:44 AM
> >To: [email protected]; [email protected]
> >Subject: Re: Indexing question - Setting low boost
> >
> >Hi folks,
> >
> > Any help would be appreciated. Thanks a bunch.
> >
> >Cheers,
> >Abi
> >
> >
> >On Mon, Feb 7, 2011 at 10:46 AM, .: Abhishek :. <[email protected]> wrote:
> >
> >> Hi,
> >>
> >>  Thanks again for your time and patience.
> >>
> >>  The boost makes sense now. I am not quite sure how to exclude the entire
> >> document, though, because there are only two methods:
> >>
> >>    - public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
> >>      CrawlDatum datum, Inlinks inlinks) throws IndexingException
> >>    - public void addIndexBackendOptions(Configuration conf)
> >>
> >>
> >>  Maybe I should add nothing to the document and/or return null?
> >>
> >> ./Abi
> >>
> >>
> >> On Mon, Feb 7, 2011 at 10:07 AM, Markus Jelsma <[email protected]> wrote:
> >>
> >>> Hi,
> >>>
> >>> How high the boost must be depends on the index-time and query-time
> >>> boosts on other fields. If the highest boost on a field is N, then N*100
> >>> will certainly do the trick.
> >>>
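One way to apply Markus's suggestion at query time, sketched under assumptions: Lucene/Solr standard query syntax, where `+` marks a required clause and `*:*` matches every document. The user's query is required, and an optional clause that matches every document except the unwanted ones gets a boost of 100 times the highest field boost, per the rule of thumb above. Documents carrying the unwanted value still match, but they score far lower. The class, the field `tag` and the value `spam` are hypothetical.

```java
// Hypothetical helper: builds a standard-syntax query string that demotes
// documents with field:value by boosting every document that lacks it.
public class NegativeBoostQuery {

    /** Rule of thumb from the thread: 100x the highest boost used on any field. */
    static float negationBoost(float highestFieldBoost) {
        return highestFieldBoost * 100f;
    }

    /**
     * "+(userQuery)" keeps only documents that match the user's query.
     * "(*:* -field:value)^boost" is optional: it matches every document except
     * those with field:value, so those documents miss out on the large boost.
     */
    static String demote(String userQuery, String field, String value, float highestFieldBoost) {
        return "+(" + userQuery + ") (*:* -" + field + ":" + value + ")^"
                + negationBoost(highestFieldBoost);
    }

    public static void main(String[] args) {
        // prints +(content:nutch) (*:* -tag:spam)^500.0
        System.out.println(demote("content:nutch", "tag", "spam", 5.0f));
    }
}
```

The leading `+` matters: without it, the optional clause alone would match almost the whole index. Other query parsers may need the clause expressed differently.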
> >>> I haven't studied the LuceneWriter, but the storing and indexing
> >>> parameters are very familiar. Storing a field means it can be retrieved
> >>> along with the document when the document matches a query. Indexing a
> >>> field just means it can be queried. But these options apply to fields,
> >>> not to the entire document.
> >>>
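A toy model (not the Lucene or Nutch API) of the two per-field flags Markus describes. A field that is stored but not indexed, as with the `LuceneWriter.STORE.YES, LuceneWriter.INDEX.NO` options Abi asks about, comes back with a hit but cannot be searched on, while the document itself stays in the index. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory "index" illustrating stored vs. indexed fields; hypothetical names.
public class StoredVsIndexed {

    static final class FieldOpt {
        final boolean stored, indexed;
        FieldOpt(boolean stored, boolean indexed) { this.stored = stored; this.indexed = indexed; }
    }

    static final class Doc {
        final Map<String, String> values = new LinkedHashMap<>();
    }

    private final Map<String, FieldOpt> options = new LinkedHashMap<>();
    private final List<Doc> docs = new ArrayList<>();

    void setOptions(String field, boolean stored, boolean indexed) {
        options.put(field, new FieldOpt(stored, indexed));
    }

    void add(Doc doc) { docs.add(doc); }

    /** Exact-match search; only indexed fields can be queried. */
    List<Doc> search(String field, String value) {
        List<Doc> hits = new ArrayList<>();
        FieldOpt opt = options.get(field);
        if (opt == null || !opt.indexed) return hits; // field is not searchable
        for (Doc d : docs) {
            if (value.equals(d.values.get(field))) hits.add(d);
        }
        return hits;
    }

    /** Only stored fields are returned along with a hit. */
    Map<String, String> retrieve(Doc doc) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : doc.values.entrySet()) {
            FieldOpt opt = options.get(e.getKey());
            if (opt != null && opt.stored) out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        StoredVsIndexed index = new StoredVsIndexed();
        index.setOptions("title", true, true); // stored and indexed
        index.setOptions("tag", true, false);  // stored, not indexed
        Doc doc = new Doc();
        doc.values.put("title", "nutch");
        doc.values.put("tag", "spam");
        index.add(doc);
        System.out.println(index.search("tag", "spam").size());    // prints 0
        System.out.println(index.search("title", "nutch").size()); // prints 1
        System.out.println(index.retrieve(doc));                   // prints {title=nutch, tag=spam}
    }
}
```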
> >>> In an indexing filter, what you want is to exclude the entire document.
> >>>
> >>> Cheers,
> >>>
> >>> > Hi Markus,
> >>> >
> >>> >  Thanks for the quick reply.
> >>> >
> >>> >  Could you tell me a possible value for the high boost that would make
> >>> > the negation work? Or is there a way I can calculate or find that out?
> >>> >
> >>> >  Also, for the other approach of using an indexing filter, would
> >>> > ("...", LuceneWriter.STORE.YES, LuceneWriter.INDEX.NO, conf); do the
> >>> > job?
> >>> >
> >>> > Thanks,
> >>> > Abi
> >>> >
> >>> > On Mon, Feb 7, 2011 at 9:34 AM, Markus Jelsma <[email protected]> wrote:
> >>> > > Hi,
> >>> > >
> >>> > > A negative boost does not exist, and a very low boost is still a
> >>> > > boost. In queries, you can work around the problem by giving a very
> >>> > > high boost to documents that do not match; the negation parameter
> >>> > > with a high boost will do the trick.
> >>> > >
> >>> > > If you don't want to index certain documents, then you'll need an
> >>> > > indexing filter. That's a different approach.
> >>> > >
> >>> > > Cheers,
> >>> > >
> >>> > > > Hi all,
> >>> > > >
> >>> > > >  I was looking at the following example,
> >>> > > >
> >>> > > >  http://wiki.apache.org/nutch/WritingPluginExample
> >>> > > >
> >>> > > >  In the example, the author sets a boost of 5.0f for the recommended
> >>> > > >  tag.
> >>> > > >
> >>> > > >  In the same way, can I also set a boost value such that a tag or
> >>> > > > content is never indexed at all? If so, what would the boost value
> >>> > > > be? On a related note, what content is indexed by Lucene by
> >>> > > > default?
> >>> > > >
> >>> > > >  Thanks a bunch for all your time and patience. Have a good day.
> >>> > > >
> >>> > > > Cheers,
> >>> > > > Abi
> >>>
> >>
> >>
>
