You can exclude documents by returning null from an indexing filter.

Regards,
Arkadi

>-----Original Message-----
>From: .: Abhishek :. [mailto:[email protected]]
>Sent: Tuesday, February 08, 2011 11:44 AM
>To: [email protected]; [email protected]
>Subject: Re: Indexing question - Setting low boost
>
>Hi folks,
>
> Some help would be appreciated. Thanks a bunch..
>
>Cheers,
>Abi
>
>
>On Mon, Feb 7, 2011 at 10:46 AM, .: Abhishek :. <[email protected]> wrote:
>
>> Hi,
>>
>> Thanks again for your time and patience.
>>
>> The boost makes sense now. I am kind of not sure how to exclude the entire
>> document because there are only two methods,
>>
>> - public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
>>   CrawlDatum datum, Inlinks inlinks) throws IndexingException
>> - public void addIndexBackendOptions(Configuration conf)
>>
>>
>> May be should I add nothing in the document and/or return a null??
>>
>> ./Abi
>>
>>
>> On Mon, Feb 7, 2011 at 10:07 AM, Markus Jelsma <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> A high boost depends on the index and query time boosts on other fields.
>>> If the highest boost on a field is N, then N*100 will certainly do the
>>> trick.
>>>
>>> I haven't studied the LuceneWriter but storing and indexing parameters are
>>> very familiar. Storing a field means it can be retrieved along with the
>>> document if it's queried. Having it indexed just means it can be queried.
>>> But this is about fields, not on the entire document itself.
>>>
>>> In an indexing filter you want to exclude the entire document.
>>>
>>> Cheers,
>>>
>>> > Hi Markus,
>>> >
>>> > Thanks for the quick reply.
>>> >
>>> > Could you tell me a possible a value for the high boost such that its to
>>> > be negated? or Is there a way I can calculate or find that out.
>>> >
>>> > Also, for the other approach on using indexing filter does the ("...",
>>> > LuceneWriter.STORE.YES, LuceneWriter.INDEX.NO, conf); does the work?
>>> >
>>> > Thanks,
>>> > Abi
>>> >
>>> > On Mon, Feb 7, 2011 at 9:34 AM, Markus Jelsma <[email protected]> wrote:
>>> > > Hi,
>>> > >
>>> > > A negative boost does not exist and a very low boost is still a boost.
>>> > > In queries, you can work around the problem by giving a very high boost
>>> > > do documents that do not match; the negation parameter with a high
>>> > > boost will do the trick.
>>> > >
>>> > > If you don't want to index certain documents then you'll need an
>>> > > indexing filter. That's a different approach.
>>> > >
>>> > > Cheers,
>>> > >
>>> > > > Hi all,
>>> > > >
>>> > > > I was looking at the following example,
>>> > > >
>>> > > > http://wiki.apache.org/nutch/WritingPluginExample
>>> > > >
>>> > > > In the example, the author sets a boost of 5.0f for the recommended
>>> > > > tag.
>>> > > >
>>> > > > In this same way, can I also set a boost value such that a tag or
>>> > > > content is never indexed at all? If so, what would be the boost
>>> > > > value? On a related note, what are the default content that are
>>> > > > usually(by default) indexed by Lucene?
>>> > > >
>>> > > > Thanks a bunch for all your time and patience. Have a good day.
>>> > > >
>>> > > > Cheers,
>>> > > > Abi
>>>
>>
>>
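
For reference, here is a minimal sketch of the "return null from an indexing filter" pattern discussed in this thread. It is self-contained and does not use Nutch itself: Doc, DocFilter, SkipPrivatePages and runFilters are hypothetical stand-ins invented for illustration, and the "/private/" rule is only an example. In a real plugin the method to implement is the filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) method quoted above; the assumption that a null result causes the document to be skipped comes from the advice in this thread and should be checked against your Nutch version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins only; none of these types are Nutch classes.
class ExcludeFilterSketch {

    /** Stand-in for NutchDocument: a simple bag of field values. */
    static class Doc {
        final Map<String, String> fields = new LinkedHashMap<>();

        Doc(String url) {
            fields.put("url", url);
        }
    }

    /** Stand-in for an indexing filter: return the doc to keep it, null to drop it. */
    interface DocFilter {
        Doc filter(Doc doc);
    }

    /** Example rule (an assumption for this sketch): skip URLs under /private/. */
    static class SkipPrivatePages implements DocFilter {
        @Override
        public Doc filter(Doc doc) {
            if (doc.fields.get("url").contains("/private/")) {
                return null; // exclude the entire document, not just a field
            }
            return doc;
        }
    }

    /** Runs the filters in order; a null result means the document is not indexed. */
    static List<Doc> runFilters(List<Doc> docs, List<DocFilter> filters) {
        List<Doc> kept = new ArrayList<>();
        for (Doc original : docs) {
            Doc current = original;
            for (DocFilter f : filters) {
                current = f.filter(current);
                if (current == null) {
                    break; // a filter rejected the document; stop the chain
                }
            }
            if (current != null) {
                kept.add(current);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("http://example.com/a"),
                new Doc("http://example.com/private/b"));
        List<DocFilter> filters = Arrays.asList(new SkipPrivatePages());
        for (Doc d : runFilters(docs, filters)) {
            System.out.println(d.fields.get("url"));
        }
    }
}
```

The point of the design is that exclusion is a decision about the whole document, so it belongs in the filter's return value rather than in a boost or in per-field store/index flags.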

