Hi,
Thanks again for your time and patience.
The boost makes sense now. I am still not sure how to exclude the entire
document, because the indexing filter has only two methods:
- public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
      CrawlDatum datum, Inlinks inlinks)
      throws IndexingException
- public void addIndexBackendOptions(Configuration conf)
Should I add nothing to the document and/or return null?
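
To make the return-null idea concrete, here is a rough sketch of what I have in mind. It uses stand-in types (SketchDocument, SketchIndexingFilter, ExcludePrivateFilter, runFilters are all made up for illustration) rather than the real Nutch classes, and it assumes the indexer treats a null return from filter() as "skip this document", which I have not verified. The "/private/" rule is only a placeholder condition.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for NutchDocument: just a bag of field name -> value.
class SketchDocument {
    final Map<String, String> fields = new HashMap<>();
}

// Stand-in for the indexing filter interface, reduced to document and URL.
interface SketchIndexingFilter {
    SketchDocument filter(SketchDocument doc, String url);
}

// Hypothetical filter: drops any document whose URL contains "/private/".
class ExcludePrivateFilter implements SketchIndexingFilter {
    @Override
    public SketchDocument filter(SketchDocument doc, String url) {
        if (url.contains("/private/")) {
            return null; // assumed signal: "do not index this document"
        }
        doc.fields.put("url", url);
        return doc;
    }
}

class ExcludeDocSketch {
    // Runs the filters in order and stops as soon as one returns null.
    static SketchDocument runFilters(List<SketchIndexingFilter> filters,
                                     SketchDocument doc, String url) {
        for (SketchIndexingFilter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                return null; // document was discarded by a filter
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<SketchIndexingFilter> filters =
                Arrays.asList(new ExcludePrivateFilter());
        String[] urls = {
            "http://example.com/index.html",
            "http://example.com/private/notes.html"
        };
        for (String url : urls) {
            SketchDocument result = runFilters(filters, new SketchDocument(), url);
            System.out.println(url + " -> " + (result == null ? "skipped" : "indexed"));
        }
    }
}
```

If the real indexer does not check for null, this would presumably fail with a NullPointerException instead of skipping the document, so that is the part I would need confirmed.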
./Abi
On Mon, Feb 7, 2011 at 10:07 AM, Markus Jelsma <[email protected]> wrote:
> Hi,
>
> A high boost depends on the index-time and query-time boosts on other fields.
> If the highest boost on a field is N, then N*100 will certainly do the trick.
>
> I haven't studied the LuceneWriter, but the storing and indexing parameters
> are very familiar. Storing a field means it can be retrieved along with the
> document when it's queried. Having it indexed just means it can be queried.
> But this is about fields, not about the entire document itself.
>
> In an indexing filter you want to exclude the entire document.
>
> Cheers,
>
> > Hi Markus,
> >
> > Thanks for the quick reply.
> >
> > Could you tell me a possible value for the high boost such that it is to
> > be negated? Or is there a way I can calculate or find that out?
> >
> > Also, for the other approach of using an indexing filter, does ("...",
> > LuceneWriter.STORE.YES, LuceneWriter.INDEX.NO, conf); do the work?
> >
> > Thanks,
> > Abi
> >
> > On Mon, Feb 7, 2011 at 9:34 AM, Markus Jelsma <[email protected]> wrote:
> > > Hi,
> > >
> > > A negative boost does not exist and a very low boost is still a boost. In
> > > queries, you can work around the problem by giving a very high boost to
> > > documents that do not match; the negation parameter with a high boost
> > > will do the trick.
> > >
> > > If you don't want to index certain documents then you'll need an
> > > indexing filter. That's a different approach.
> > >
> > > Cheers,
> > >
> > > > Hi all,
> > > >
> > > > I was looking at the following example,
> > > >
> > > > http://wiki.apache.org/nutch/WritingPluginExample
> > > >
> > > > In the example, the author sets a boost of 5.0f for the recommended
> > > > tag.
> > > >
> > > > In the same way, can I also set a boost value such that a tag or
> > > > content is never indexed at all? If so, what would be the boost value?
> > > > On a related note, what content does Lucene usually index by default?
> > > >
> > > > Thanks a bunch for all your time and patience. Have a good day.
> > > >
> > > > Cheers,
> > > > Abi
>