Hi,

 Thanks again for your time and patience.

 The boost makes sense now. I am still not sure how to exclude the entire
document, though, because the filter interface has only two methods,

   - public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
   CrawlDatum datum, Inlinks inlinks)
       throws IndexingException
   - public void addIndexBackendOptions(Configuration conf)


 Maybe I should add nothing to the document and/or return null?
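
 To make the return-null idea concrete, here is a minimal sketch. It uses
stand-in types (Doc, shouldExclude) that I made up rather than the real Nutch
classes, and it assumes the indexer skips any document for which the filter
returns null, which I have not verified:

```java
import java.util.HashMap;
import java.util.Map;

public class ExcludeFilterSketch {

    // Hypothetical stand-in for NutchDocument: just a bag of field values.
    static class Doc {
        final Map<String, String> fields = new HashMap<String, String>();
    }

    // Hypothetical rule: exclude any document whose URL contains "/private/".
    static boolean shouldExclude(String url) {
        return url.contains("/private/");
    }

    // Mirrors the shape of filter(...): return the document to keep it,
    // or null to signal that it should be discarded (assumed behaviour).
    static Doc filter(Doc doc, String url) {
        if (shouldExclude(url)) {
            return null;
        }
        doc.fields.put("url", url);
        return doc;
    }

    public static void main(String[] args) {
        Doc dropped = filter(new Doc(), "http://example.com/private/a");
        Doc kept = filter(new Doc(), "http://example.com/public/b");
        System.out.println("dropped is null: " + (dropped == null));
        System.out.println("kept is null: " + (kept == null));
    }
}
```

 In a real plugin the same check would sit at the top of filter(NutchDocument,
Parse, Text, CrawlDatum, Inlinks), before any fields are added.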

./Abi

On Mon, Feb 7, 2011 at 10:07 AM, Markus Jelsma
<[email protected]> wrote:

> Hi,
>
> A high boost depends on the index-time and query-time boosts on other
> fields. If the highest boost on a field is N, then N*100 will certainly do
> the trick.
>
> I haven't studied the LuceneWriter but the storing and indexing parameters
> are very familiar. Storing a field means it can be retrieved along with the
> document if it's queried. Having it indexed just means it can be queried.
> But this is about fields, not the entire document itself.
>
> In an indexing filter you want to exclude the entire document.
>
> Cheers,
>
> > Hi Markus,
> >
> >  Thanks for the quick reply.
> >
> >  Could you tell me a possible value for the high boost that goes with the
> > negation? Or is there a way I can calculate or find that out?
> >
> >  Also, for the other approach using an indexing filter, does ("...",
> > LuceneWriter.STORE.YES, LuceneWriter.INDEX.NO, conf); do the job?
> >
> > Thanks,
> > Abi
> >
> > On Mon, Feb 7, 2011 at 9:34 AM, Markus Jelsma
> > <[email protected]> wrote:
> > > Hi,
> > >
> > > A negative boost does not exist and a very low boost is still a boost.
> > > In queries, you can work around the problem by giving a very high boost
> > > to documents that do not match; the negation parameter with a high boost
> > > will do the trick.
> > >
> > > If you don't want to index certain documents then you'll need an
> > > indexing filter. That's a different approach.
> > >
> > > Cheers,
> > >
> > > > Hi all,
> > > >
> > > >  I was looking at the following example,
> > > >
> > > >  http://wiki.apache.org/nutch/WritingPluginExample
> > > >
> > > >  In the example, the author sets a boost of 5.0f for the recommended
> > > >  tag.
> > > >
> > > >  In this same way, can I also set a boost value such that a tag or
> > > > content is never indexed at all? If so, what would be the boost value?
> > > > On a related note, what content is usually (by default) indexed by
> > > > Lucene?
> > > >
> > > >  Thanks a bunch for all your time and patience. Have a good day.
> > > >
> > > > Cheers,
> > > > Abi
>
