That's a great suggestion, Ken. Thanks a lot.

--Chris

On Dec 19, 2007 2:53 AM, Ken Krugler <[EMAIL PROTECTED]> wrote:

> >It seems an OK way of doing it to me.
> >
> >I don't know how expensive those range queries are, but if it turns
> >out they do eat a lot of performance and/or you want more control
> >over exactly how scoring is done, AFAIK you'll have to get into the
> >guts of Lucene and define a custom scorer as documented here:
> >
> >http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/package-summary.html#scoring
> >
> >However, this is expert territory, so I wouldn't go there lightly.
>
> Another approach, which I've tried out using raw Lucene but not via
> Nutch, is to implement Andrzej's suggestion: add a new "dateBoost"
> field and set its content to "1 1 1 1 ...", where the number of "1"
> tokens equals the time elapsed between some arbitrary earliest date
> and the date of the page.
>
> For example, I set it to the number of weeks since 1997, which was my
> earliest known document date.
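A minimal sketch of building the dateBoost field value, assuming weeks are counted from 1997-01-01. The message only says "from 1997", so the exact epoch day is an assumption, and the helper names `weeks_since_epoch` and `date_boost_value` are hypothetical. Adding the resulting string to the document as a tokenized field is left to whatever indexing code is in use.

```python
from datetime import date

# Assumption: counting starts on 1997-01-01; the original message only says
# "from 1997", so the exact epoch day is a hypothetical choice.
EPOCH = date(1997, 1, 1)

def weeks_since_epoch(doc_date: date, epoch: date = EPOCH) -> int:
    """Whole weeks between the epoch and the document date, clamped at zero."""
    return max(0, (doc_date - epoch).days // 7)

def date_boost_value(doc_date: date, epoch: date = EPOCH) -> str:
    """Hypothetical helper: text for the dateBoost field, one "1" token per week."""
    return " ".join(["1"] * weeks_since_epoch(doc_date, epoch))
```

For a page dated 2007-12-18 this produces 571 tokens. A page less than a week newer than the epoch (or older than it) gets an empty field, so a required "AND dateBoost:1" clause would exclude it.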
>
> Then, at query time, include "... AND dateBoost:1", so that newer
> documents get a higher score.
>
> You can experiment with a query-time boost on the dateBoost field
> to tune the importance of the document's last-modified time
> relative to other factors (static doc score, other query terms).
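Why newer pages score higher: with N "1" tokens in the field, the clause dateBoost:1 matches with term frequency N. A back-of-the-envelope sketch, assuming Lucene's DefaultSimilarity (tf = sqrt(freq), lengthNorm = 1/sqrt(numTerms)) and ignoring idf and queryNorm, which are the same for every matching document. Because every token in the field is a "1", freq equals numTerms, so if length norms are stored for the field the two factors cancel and the recency effect disappears; the sketch therefore assumes norms are omitted for dateBoost. If Nutch or your index uses a different Similarity, the arithmetic changes. The function names are hypothetical.

```python
import math

# Assumption: Lucene DefaultSimilarity formulas, before norms are
# byte-encoded; other Similarity implementations will differ.

def default_tf(freq: int) -> float:
    """DefaultSimilarity-style tf: square root of the term frequency."""
    return math.sqrt(freq)

def default_length_norm(num_terms: int) -> float:
    """DefaultSimilarity-style lengthNorm: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_terms)

def date_boost_factor(weeks: int, omit_norms: bool) -> float:
    """Hypothetical helper: relative weight of the dateBoost:1 match.

    idf and queryNorm are left out because they are identical for every
    matching document.
    """
    if weeks <= 0:
        return 0.0  # empty field: the term "1" does not occur, so no match
    tf = default_tf(weeks)
    if omit_norms:
        return tf
    # With norms stored, freq == num_terms, so tf * lengthNorm is always 1.
    return tf * default_length_norm(weeks)

def recency_query(user_query: str, boost: float) -> str:
    """Combine a user query with a boosted recency clause (Lucene query syntax)."""
    return f"({user_query}) AND dateBoost:1^{boost}"
```

With norms omitted the weight grows as sqrt(N), so the effect is sublinear: a page with 400 tokens gets twice the factor of one with 100 (20 versus 10). That compression is one reason tuning the query-time boost on the clause matters.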
>
> -- Ken
>
>
> >On Dec 18, 2007, at 12:01 AM, chris sleeman wrote:
> >
> >>Hi,
> >>
> >>I am interested in writing a plugin where the recency of a document
> >>would also be a factor in relevance scoring. I don't want to sort by
> >>date; rather, I would like to boost the score of the pages that were
> >>most recently indexed.
> >>
> >>I have tried adding range queries using a custom query filter, which
> >>creates queries of the form:
> >>
> >><query> AND +(date:[20071215 TO 20071218]^3.0 date:[20071201 TO 20071214]^2.0 date:[20071103 TO 20071130]^1.5 date:[00000000 TO 20071102])^1.0
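If the range-query approach is kept, the tiers can be computed from the current date instead of being hard-coded. A minimal sketch, assuming the date field holds yyyyMMdd strings as in the example (so lexicographic order matches chronological order). The tier boundaries, boosts and the name `recency_clause` are hypothetical and mirror the example, except that the third range ends on 20071130 so it does not overlap the 20071201 start of the second range; both bounds of a [A TO B] range are inclusive. On the performance question: if the clause ends up as a plain Lucene 2.x RangeQuery, it is rewritten into one clause per distinct term in the range, so a wide open-ended range over day-resolution dates can hit the default limit of 1024 boolean clauses (BooleanQuery.TooManyClauses).

```python
from datetime import date, timedelta

# Hypothetical tiers: (maximum age in days, boost). Everything older than
# the last tier gets OLDEST_BOOST.
TIERS = ((3, 3.0), (17, 2.0), (45, 1.5))
OLDEST_BOOST = 1.0

def fmt(d: date) -> str:
    """Format a date as yyyyMMdd, the assumed format of the indexed field."""
    return d.strftime("%Y%m%d")

def recency_clause(today: date, tiers=TIERS, oldest_boost=OLDEST_BOOST) -> str:
    """Build a required group of non-overlapping, boosted date ranges."""
    clauses = []
    newest_age = 0  # youngest age (in days) covered by the current tier
    for max_age, boost in tiers:
        start = today - timedelta(days=max_age)
        end = today - timedelta(days=newest_age)
        clauses.append(f"date:[{fmt(start)} TO {fmt(end)}]^{boost}")
        # The next tier ends one day before this one starts, so ranges never overlap.
        newest_age = max_age + 1
    cutoff = today - timedelta(days=newest_age)
    clauses.append(f"date:[00000000 TO {fmt(cutoff)}]^{oldest_boost}")
    return "+(" + " ".join(clauses) + ")"
```

Called with date(2007, 12, 18), this yields the same four ranges as the query above, with the third range ending on 20071130.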
> >>
> >>
> >>But I am not sure whether this is a good approach, or whether the date
> >>range clauses would hurt performance. Am I missing something? Is
> >>there a better way of doing this? Any help would be much appreciated.
> >>
> >>Regards,
> >>Chris
>
>
> --
> Ken Krugler
> Krugle, Inc.
> +1 530-210-6378
> "If you can't find it, you can't fix it"
>
