Hi folks,

 Any help with the question below would be appreciated. Thanks a bunch.

Cheers,
Abi
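
P.S. To make the "return a null" idea from my earlier mail (quoted below) concrete, here is a minimal, self-contained sketch of what I have in mind. It does not use the Nutch API: SketchDoc, SketchFilter, ExcludeDocSketch and the "noindex-me" marker are hypothetical stand-ins invented for illustration. The assumption, which I have not verified against the Nutch source, is that the code calling filter() treats a null return as "skip this document".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for NutchDocument; NOT a real Nutch class.
final class SketchDoc {
    final String url;
    final String content;

    SketchDoc(String url, String content) {
        this.url = url;
        this.content = content;
    }
}

// Hypothetical stand-in for an indexing filter.
// Convention assumed here: returning null means "exclude this document".
interface SketchFilter {
    SketchDoc filter(SketchDoc doc);
}

final class ExcludeDocSketch {
    // Example filter (assumption): reject pages whose content carries a marker.
    static final SketchFilter SKIP_MARKED =
        doc -> doc.content.contains("noindex-me") ? null : doc;

    // Runs the filters in order and stops as soon as one of them rejects the document.
    static SketchDoc runFilters(List<SketchFilter> filters, SketchDoc doc) {
        for (SketchFilter f : filters) {
            doc = f.filter(doc);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }

    // Returns only the documents that survived every filter.
    static List<SketchDoc> index(List<SketchFilter> filters, List<SketchDoc> docs) {
        List<SketchDoc> indexed = new ArrayList<>();
        for (SketchDoc d : docs) {
            SketchDoc kept = runFilters(filters, d);
            if (kept != null) {
                indexed.add(kept);
            }
        }
        return indexed;
    }
}
```

If the real indexer honours a null return in the same way, an indexing filter could drop a whole document without touching boosts or per-field store/index settings.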


On Mon, Feb 7, 2011 at 10:46 AM, .: Abhishek :. <[email protected]> wrote:

> Hi,
>
>  Thanks again for your time and patience.
>
>  The boost makes sense now. I am still not sure how to exclude the entire
> document, because the indexing filter has only two methods:
>
>    - public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
>    CrawlDatum datum, Inlinks inlinks)
>        throws IndexingException
>    - public void addIndexBackendOptions(Configuration conf)
>
>
>  Maybe I should add nothing to the document and/or return null?
>
> ./Abi
>
>
> On Mon, Feb 7, 2011 at 10:07 AM, Markus Jelsma <[email protected]> wrote:
>
>> Hi,
>>
>> How high the boost must be depends on the index-time and query-time
>> boosts on other fields. If the highest boost on a field is N, then N*100
>> will certainly do the trick.
>>
>> I haven't studied the LuceneWriter, but the storing and indexing
>> parameters are very familiar. Storing a field means it can be retrieved
>> along with the document when the document is returned by a query. Having
>> it indexed just means it can be queried. But this is about fields, not
>> about the entire document itself.
>>
>> In an indexing filter you want to exclude the entire document.
>>
>> Cheers,
>>
>> > Hi Markus,
>> >
>> >  Thanks for the quick reply.
>> >
>> >  Could you tell me a possible value for the high boost, so that the
>> > match is effectively negated? Or is there a way I can calculate or find
>> > that out?
>> >
>> >  Also, for the other approach with an indexing filter, does ("...",
>> > LuceneWriter.STORE.YES, LuceneWriter.INDEX.NO, conf); do the job?
>> >
>> > Thanks,
>> > Abi
>> >
>> > On Mon, Feb 7, 2011 at 9:34 AM, Markus Jelsma <[email protected]> wrote:
>> > > Hi,
>> > >
>> > > A negative boost does not exist and a very low boost is still a boost.
>> > > In queries, you can work around the problem by giving a very high
>> > > boost to documents that do not match; the negation parameter with a
>> > > high boost will do the trick.
>> > >
>> > > If you don't want to index certain documents then you'll need an
>> > > indexing filter. That's a different approach.
>> > >
>> > > Cheers,
>> > >
>> > > > Hi all,
>> > > >
>> > > >  I was looking at the following example,
>> > > >
>> > > >  http://wiki.apache.org/nutch/WritingPluginExample
>> > > >
>> > > >  In the example, the author sets a boost of 5.0f for the recommended
>> > > >  tag.
>> > > >
>> > > >  In the same way, can I also set a boost value such that a tag or
>> > > > content is never indexed at all? If so, what would be the boost
>> > > > value? On a related note, what content is indexed by Lucene by
>> > > > default?
>> > > >
>> > > >  Thanks a bunch for all your time and patience. Have a good day.
>> > > >
>> > > > Cheers,
>> > > > Abi
>>
>
>
