Thanks Arkadi. Thanks all for your patience and guidance.
On Tue, Feb 8, 2011 at 8:48 AM, <[email protected]> wrote:

> You can exclude documents by returning NULL from an index filter.
>
> Regards,
>
> Arkadi
>
> >-----Original Message-----
> >From: .: Abhishek :. [mailto:[email protected]]
> >Sent: Tuesday, February 08, 2011 11:44 AM
> >To: [email protected]; [email protected]
> >Subject: Re: Indexing question - Setting low boost
> >
> >Hi folks,
> >
> > Some help would be appreciated. Thanks a bunch..
> >
> >Cheers,
> >Abi
> >
> >
> >On Mon, Feb 7, 2011 at 10:46 AM, .: Abhishek :. <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> Thanks again for your time and patience.
> >>
> >> The boost makes sense now. I am kind of not sure how to exclude the entire
> >> document because there are only two methods,
> >>
> >> - public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
> >>   CrawlDatum datum, Inlinks inlinks) throws IndexingException
> >> - public void addIndexBackendOptions(Configuration conf)
> >>
> >>
> >> May be should I add nothing in the document and/or return a null??
> >>
> >> ./Abi
> >>
> >>
> >> On Mon, Feb 7, 2011 at 10:07 AM, Markus Jelsma <[email protected]> wrote:
> >>
> >>> Hi,
> >>>
> >>> A high boost depends on the index and query time boosts on other fields.
> >>> If the highest boost on a field is N, then N*100 will certainly do the
> >>> trick.
> >>>
> >>> I haven't studied the LuceneWriter but storing and indexing parameters are
> >>> very familiar. Storing a field means it can be retrieved along with the
> >>> document if it's queried. Having it indexed just means it can be queried.
> >>> But this is about fields, not on the entire document itself.
> >>>
> >>> In an indexing filter you want to exclude the entire document.
> >>>
> >>> Cheers,
> >>>
> >>> > Hi Markus,
> >>> >
> >>> > Thanks for the quick reply.
> >>> >
> >>> > Could you tell me a possible a value for the high boost such that its to
> >>> > be negated? or Is there a way I can calculate or find that out.
> >>> >
> >>> > Also, for the other approach on using indexing filter does the ("...",
> >>> > LuceneWriter.STORE.YES, LuceneWriter.INDEX.NO, conf); does the work?
> >>> >
> >>> > Thanks,
> >>> > Abi
> >>> >
> >>> > On Mon, Feb 7, 2011 at 9:34 AM, Markus Jelsma <[email protected]> wrote:
> >>> > > Hi,
> >>> > >
> >>> > > A negative boost does not exist and a very low boost is still a boost.
> >>> > > In queries, you can work around the problem by giving a very high boost
> >>> > > do documents that do not match; the negation parameter with a high
> >>> > > boost will do the trick.
> >>> > >
> >>> > > If you don't want to index certain documents then you'll need an
> >>> > > indexing filter. That's a different approach.
> >>> > >
> >>> > > Cheers,
> >>> > >
> >>> > > > Hi all,
> >>> > > >
> >>> > > > I was looking at the following example,
> >>> > > >
> >>> > > > http://wiki.apache.org/nutch/WritingPluginExample
> >>> > > >
> >>> > > > In the example, the author sets a boost of 5.0f for the recommended
> >>> > > > tag.
> >>> > > >
> >>> > > > In this same way, can I also set a boost value such that a tag or
> >>> > > > content is never indexed at all? If so, what would be the boost
> >>> > > > value? On a related note, what are the default content that are
> >>> > > > usually(by default) indexed by Lucene?
> >>> > > >
> >>> > > > Thanks a bunch for all your time and patience. Have a good day.
> >>> > > >
> >>> > > > Cheers,
> >>> > > > Abi
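
For the archive, here is a rough sketch of the "return null from an indexing filter" idea discussed in this thread. It is not real Nutch code. `SketchDocument` and `SketchIndexingFilter` are hypothetical stand-ins for `NutchDocument` and the `filter(...)` method quoted above, reduced to two parameters so the example compiles on its own, and the "/private/" rule is only an assumed exclusion criterion. In an actual plugin the same check would presumably go inside the real `filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)` method.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Nutch's NutchDocument: just a bag of field values.
final class SketchDocument {
    private final Map<String, String> fields = new HashMap<>();

    void add(String name, String value) {
        fields.put(name, value);
    }

    String get(String name) {
        return fields.get(name);
    }
}

// Hypothetical stand-in for the filter(...) method quoted in the thread,
// with the Parse, CrawlDatum and Inlinks parameters left out for brevity.
interface SketchIndexingFilter {
    SketchDocument filter(SketchDocument doc, String url);
}

// Assumed rule for illustration: skip any URL containing "/private/".
final class ExcludePrivateFilter implements SketchIndexingFilter {
    @Override
    public SketchDocument filter(SketchDocument doc, String url) {
        if (url.contains("/private/")) {
            // Returning null is the signal that the document should not be indexed.
            return null;
        }
        // Otherwise hand the document back unchanged.
        return doc;
    }
}
```

With this sketch, `new ExcludePrivateFilter().filter(doc, url)` returns null for a URL under "/private/" and returns the same `doc` for any other URL. Unlike a very low boost, the excluded document never reaches the index at all.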

