Hi folks,

Some help would be appreciated. Thanks a bunch.
Cheers,
Abi

On Mon, Feb 7, 2011 at 10:46 AM, .: Abhishek :. <[email protected]> wrote:

> Hi,
>
> Thanks again for your time and patience.
>
> The boost makes sense now. I am not sure how to exclude the entire
> document, because there are only two methods:
>
> - public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
>   CrawlDatum datum, Inlinks inlinks) throws IndexingException
> - public void addIndexBackendOptions(Configuration conf)
>
> Maybe I should add nothing to the document and/or return null?
>
> ./Abi
>
> On Mon, Feb 7, 2011 at 10:07 AM, Markus Jelsma <[email protected]> wrote:
>
>> Hi,
>>
>> A high boost depends on the index and query time boosts on other fields.
>> If the highest boost on a field is N, then N*100 will certainly do the
>> trick.
>>
>> I haven't studied the LuceneWriter but storing and indexing parameters
>> are very familiar. Storing a field means it can be retrieved along with
>> the document if it's queried. Having it indexed just means it can be
>> queried. But this is about fields, not about the entire document itself.
>>
>> In an indexing filter you want to exclude the entire document.
>>
>> Cheers,
>>
>> > Hi Markus,
>> >
>> > Thanks for the quick reply.
>> >
>> > Could you tell me a possible value for the high boost such that it's
>> > to be negated? Or is there a way I can calculate or find that out?
>> >
>> > Also, for the other approach using an indexing filter, does
>> > ("...", LuceneWriter.STORE.YES, LuceneWriter.INDEX.NO, conf); do the
>> > work?
>> >
>> > Thanks,
>> > Abi
>> >
>> > On Mon, Feb 7, 2011 at 9:34 AM, Markus Jelsma <[email protected]> wrote:
>> > > Hi,
>> > >
>> > > A negative boost does not exist and a very low boost is still a boost.
>> > > In queries, you can work around the problem by giving a very high
>> > > boost to documents that do not match; the negation parameter with a
>> > > high boost will do the trick.
>> > >
>> > > If you don't want to index certain documents then you'll need an
>> > > indexing filter. That's a different approach.
>> > >
>> > > Cheers,
>> > >
>> > > > Hi all,
>> > > >
>> > > > I was looking at the following example,
>> > > >
>> > > > http://wiki.apache.org/nutch/WritingPluginExample
>> > > >
>> > > > In the example, the author sets a boost of 5.0f for the recommended
>> > > > tag.
>> > > >
>> > > > In the same way, can I also set a boost value such that a tag or
>> > > > content is never indexed at all? If so, what would be the boost
>> > > > value? On a related note, what content is usually (by default)
>> > > > indexed by Lucene?
>> > > >
>> > > > Thanks a bunch for all your time and patience. Have a good day.
>> > > >
>> > > > Cheers,
>> > > > Abi
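
On the open question of excluding a whole document from an indexing filter: a rough sketch of the "return null" idea follows. It does not use the Nutch classes. `Doc`, `DocFilter`, `ExcludeByUrlFilter`, and `FilterChain` are hypothetical stand-ins so the example compiles on its own, and the "/private/" URL rule is made up for illustration. It assumes the filter chain treats a null return as "discard this document", which is how the Nutch 1.x chain is commonly understood to behave, so please check IndexingFilters in your version before relying on it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for NutchDocument: a simple bag of field values (not the Nutch class). */
class Doc {
    final Map<String, String> fields = new HashMap<>();
}

/** Stand-in for an indexing filter, reduced to a single argument. */
interface DocFilter {
    /** Returns the (possibly modified) document, or null to discard it. */
    Doc filter(Doc doc);
}

/** Hypothetical filter: discards any document whose URL contains a marker string. */
class ExcludeByUrlFilter implements DocFilter {
    private final String marker;

    ExcludeByUrlFilter(String marker) {
        this.marker = marker;
    }

    @Override
    public Doc filter(Doc doc) {
        String url = doc.fields.get("url");
        if (url != null && url.contains(marker)) {
            return null; // exclude the entire document, not just one field
        }
        return doc; // keep the document unchanged
    }
}

/** Stand-in for the filter chain (assumption: the first null return ends the chain). */
class FilterChain {
    private final List<DocFilter> filters = new ArrayList<>();

    FilterChain add(DocFilter f) {
        filters.add(f);
        return this;
    }

    Doc run(Doc doc) {
        for (DocFilter f : filters) {
            doc = f.filter(doc);
            if (doc == null) {
                return null; // a filter discarded the document; skip indexing it
            }
        }
        return doc;
    }
}

class ExcludeDemo {
    /** Builds a stand-in document that carries only a url field. */
    static Doc docWithUrl(String url) {
        Doc d = new Doc();
        d.fields.put("url", url);
        return d;
    }

    public static void main(String[] args) {
        FilterChain chain = new FilterChain().add(new ExcludeByUrlFilter("/private/"));
        Doc dropped = chain.run(docWithUrl("http://example.com/private/a.html"));
        Doc kept = chain.run(docWithUrl("http://example.com/public/b.html"));
        System.out.println("private doc indexed: " + (dropped != null));
        System.out.println("public doc indexed: " + (kept != null));
    }
}
```

In a real plugin the same decision would sit inside filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks). Returning the document with no fields added is a different thing: the document would still be indexed with whatever the other filters contributed.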

