That seems like a reasonable way of doing it to me.
I don't know how expensive those range queries are, but if it turns
out they do eat a lot of performance and/or you want more control
over exactly how scoring is done, AFAIK you'll have to get into the
guts of Lucene and define a custom scorer, as documented here:
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/package-summary.html#scoring
However, this is expert territory, so I wouldn't go there lightly.
Another approach, which I've tried out using raw Lucene but not via
Nutch, is to implement Andrzej's suggestion: add a new "dateBoost"
field and set the content of that field to "1 1 1 1 ...", where the
number of "1" terms equals the time elapsed between some arbitrary
earliest date and the date of the page.
For example, I set it to the number of weeks from 1997, which was my
earliest known document date.
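As a rough sketch of how that field value could be generated (plain
Java, no Lucene calls): the class name, method names, and the exact
epoch of 1 January 1997 are my own assumptions for illustration. I
also clamp the count to at least one "1", which is my addition, so
that a page dated at or before the epoch still matches a required
dateBoost:1 clause.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Collections;

public class DateBoostField {
    // Assumed epoch: earliest known document date (hypothetical exact day).
    static final LocalDate EPOCH = LocalDate.of(1997, 1, 1);

    /** Whole weeks elapsed between EPOCH and the page date (never negative). */
    static long weeksSinceEpoch(LocalDate pageDate) {
        return Math.max(0, ChronoUnit.WEEKS.between(EPOCH, pageDate));
    }

    /**
     * Builds the "1 1 1 ..." field content: one "1" term per elapsed week.
     * Clamped to at least one term (an assumption, see the note above).
     */
    static String fieldValue(LocalDate pageDate) {
        int terms = (int) Math.max(1, weeksSinceEpoch(pageDate));
        return String.join(" ", Collections.nCopies(terms, "1"));
    }

    public static void main(String[] args) {
        // A page dated three weeks after the epoch gets three terms.
        System.out.println(fieldValue(LocalDate.of(1997, 1, 22)));
    }
}
```

The resulting string would then be indexed as the tokenized "dateBoost"
field of the document.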
Then, at query time, include "... AND dateBoost:1", so that newer
pages, whose dateBoost field contains more "1" terms, get a higher
score.
You can fool around with specifying a run-time boost on the dateBoost
field to tune the importance of the document's last modified time
relative to other factors (static doc score, other query terms).
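To make the query-time side concrete, here is a minimal sketch that
appends the dateBoost clause, with a run-time boost, to a user query
string. It relies only on the standard Lucene query syntax
("field:term^boost"); the class and method names are hypothetical, and
how the string reaches Lucene (a Nutch query filter, for instance) is
left out.

```java
import java.util.Locale;

public class DateBoostQuery {
    /**
     * Wraps the user's query and ANDs in the dateBoost clause.
     * The boost argument scales how much recency counts relative
     * to the other query terms.
     */
    static String withDateBoost(String userQuery, float boost) {
        // Locale.ROOT keeps the decimal separator a '.' in every locale.
        return String.format(Locale.ROOT,
                "(%s) AND dateBoost:1^%.1f", userQuery, boost);
    }

    public static void main(String[] args) {
        System.out.println(withDateBoost("lucene scoring", 2.0f));
    }
}
```

Raising the boost above 1.0 makes recency count for more; lowering it
toward 0 lets the other terms dominate.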
-- Ken
On Dec 18, 2007, at 12:01 AM, chris sleeman wrote:
Hi,
I am interested in writing a plugin in which the recency of a document
is also a factor in relevance/scoring. I don't want to sort by date,
but would rather boost the score of the pages that were indexed most
recently.
Have tried adding range queries using a custom query filter, which creates
queries of the form -
<query> AND +(date:[20071215 TO 20071218]^3.0
date:[20071201 TO 20071214]^2.0 date:[20071103 TO 20071201]^1.5
date:[00000000 TO 20071102])^1.0
But I am not sure whether this is a good way or whether including date range
clauses would have an adverse impact on performance.
Am I missing something? Is there a better way of doing this? Any help would
be much appreciated.
Regards,
Chris
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"