You can exclude a document by returning null from the indexing filter's filter() method.
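
A minimal sketch of the idea, in case it helps. To keep it runnable without Nutch on the classpath, Doc, DocFilter, ExcludePrefixFilter and runFilters are hypothetical stand-ins I made up for NutchDocument, IndexingFilter and the filter chain; the excluded URL prefix is also just an assumption. In a real plugin you would put the same null return inside the filter(...) method quoted below.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NutchDocument: just a bag of fields.
class Doc {
    final Map<String, String> fields = new HashMap<>();
}

// Hypothetical stand-in for the indexing filter contract:
// return the document to keep it, or null to drop it entirely.
interface DocFilter {
    Doc filter(Doc doc, String url);
}

// Example filter: drops every document whose URL starts with a given prefix.
class ExcludePrefixFilter implements DocFilter {
    private final String prefix;

    ExcludePrefixFilter(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Doc filter(Doc doc, String url) {
        if (url.startsWith(prefix)) {
            return null; // exclude the whole document from the index
        }
        doc.fields.put("url", url);
        return doc;
    }
}

class SkipDocumentDemo {
    // Runs the filters in order and stops as soon as one discards the document.
    static Doc runFilters(List<DocFilter> filters, Doc doc, String url) {
        for (DocFilter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<DocFilter> filters = new ArrayList<>();
        filters.add(new ExcludePrefixFilter("http://example.com/private/"));

        String[] urls = {
            "http://example.com/private/a.html",
            "http://example.com/public/b.html"
        };
        for (String url : urls) {
            Doc result = runFilters(filters, new Doc(), url);
            System.out.println(url + " -> " + (result == null ? "skipped" : "indexed"));
        }
    }
}
```

The point is only that the caller treats a null return as "do not index this document", so nothing has to be done with boosts or per-field store/index flags.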

Regards,

Arkadi

>-----Original Message-----
>From: .: Abhishek :. [mailto:[email protected]]
>Sent: Tuesday, February 08, 2011 11:44 AM
>To: [email protected]; [email protected]
>Subject: Re: Indexing question - Setting low boost
>
>Hi folks,
>
> Some help would be appreciated. Thanks a bunch..
>
>Cheers,
>Abi
>
>
>On Mon, Feb 7, 2011 at 10:46 AM, .: Abhishek :. <[email protected]>
>wrote:
>
>> Hi,
>>
>>  Thanks again for your time and patience.
>>
>>  The boost makes sense now. I am not sure how to exclude the entire
>> document, because there are only two methods:
>>
>>    - public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
>>      CrawlDatum datum, Inlinks inlinks)
>>        throws IndexingException
>>    - public void addIndexBackendOptions(Configuration conf)
>>
>>
>>  Maybe I should add nothing to the document and/or return null?
>>
>> ./Abi
>>
>>
>> On Mon, Feb 7, 2011 at 10:07 AM, Markus Jelsma <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> What counts as a high boost depends on the index-time and query-time
>>> boosts on other fields. If the highest boost on a field is N, then N*100
>>> will certainly do the trick.
>>>
>>> I haven't studied the LuceneWriter, but the storing and indexing
>>> parameters are very familiar. Storing a field means it can be retrieved
>>> along with the document if it's queried. Having it indexed just means it
>>> can be queried. But this is about fields, not the entire document itself.
>>>
>>> In an indexing filter you want to exclude the entire document.
>>>
>>> Cheers,
>>>
>>> > Hi Markus,
>>> >
>>> >  Thanks for the quick reply.
>>> >
>>> >  Could you tell me a possible value for the high boost such that it's
>>> > to be negated? Or is there a way I can calculate or find that out?
>>> >
>>> >  Also, for the other approach of using an indexing filter, does
>>> > ("...", LuceneWriter.STORE.YES, LuceneWriter.INDEX.NO, conf); do the
>>> > work?
>>> >
>>> > Thanks,
>>> > Abi
>>> >
>>> > On Mon, Feb 7, 2011 at 9:34 AM, Markus Jelsma <[email protected]> wrote:
>>> > > Hi,
>>> > >
>>> > > A negative boost does not exist, and a very low boost is still a
>>> > > boost. In queries, you can work around the problem by giving a very
>>> > > high boost to documents that do not match; the negation parameter
>>> > > with a high boost will do the trick.
>>> > >
>>> > > If you don't want to index certain documents then you'll need an
>>> > > indexing filter. That's a different approach.
>>> > >
>>> > > Cheers,
>>> > >
>>> > > > Hi all,
>>> > > >
>>> > > >  I was looking at the following example,
>>> > > >
>>> > > >  http://wiki.apache.org/nutch/WritingPluginExample
>>> > > >
>>> > > >  In the example, the author sets a boost of 5.0f for the
>>> > > >  recommended tag.
>>> > > >
>>> > > >  In the same way, can I also set a boost value such that a tag or
>>> > > > content is never indexed at all? If so, what would be the boost
>>> > > > value? On a related note, what content is usually (by default)
>>> > > > indexed by Lucene?
>>> > > >
>>> > > >  Thanks a bunch for all your time and patience. Have a good day.
>>> > > >
>>> > > > Cheers,
>>> > > > Abi
>>>
>>
>>
