It seems an OK way of doing it to me.

I don't know how expensive those range queries are, but if they turn out to eat a lot of performance, and/or you want more control over exactly how scoring is done, then AFAIK you'll have to get into the guts of Lucene and define a custom scorer, as documented here:

http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/package-summary.html#scoring

However, this is expert territory, so I wouldn't go there lightly.

Another approach, which I've tried out using raw Lucene but not via Nutch, is to implement Andrzej's suggestion: add a new "dateBoost" field and set its content to "1 1 1 1 ...", where the number of "1" terms equals the time elapsed between some arbitrary earliest date and the date of the page.

For example, I set it to the number of weeks from 1997, which was my earliest known document date.
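
As a rough illustration of how that field content could be generated, here is a minimal Python sketch. The function name, the exact epoch of 1 January 1997, and the floor of one token are my assumptions for illustration; the floor is there only so that a document dated at (or before) the earliest date still contains the term "1" and is not excluded by a required dateBoost:1 clause.

```python
from datetime import date

# Assumption: earliest known document date, following the 1997 example in the text.
EPOCH = date(1997, 1, 1)

def date_boost_value(doc_date, epoch=EPOCH):
    """Return dateBoost field text: one "1" token per whole week since epoch.

    Hypothetical helper. At least one token is always emitted (an assumption),
    so that every document still matches a required dateBoost:1 clause.
    """
    weeks = (doc_date - epoch).days // 7
    return " ".join(["1"] * max(weeks, 1))

# A page dated two weeks after the epoch gets two tokens.
print(date_boost_value(date(1997, 1, 15)))
```

The returned string would then be indexed as the tokenized content of the dateBoost field, so that a newer page carries more occurrences of the term.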

Then, at query time, include "... AND dateBoost:1", so that newer documents (which contain more "1" terms) get a higher score.

You can fool around with specifying a run-time boost on the dateBoost field to tune the importance of the document's last modified time relative to other factors (static doc score, other query terms).
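
To make the query-time side concrete, here is a small Python sketch that builds such a query string in ordinary Lucene query-parser syntax. The function name and the default boost are assumptions for illustration; in Nutch the clause would more likely be added by a query filter plugin than by string concatenation, which I haven't tried.

```python
def with_date_boost(user_query, boost=1.0):
    """Require a match on dateBoost:1 and weight that clause by `boost`.

    Hypothetical helper: it only assembles a query string and does not
    call Lucene or Nutch.
    """
    return f"({user_query}) AND dateBoost:1^{boost}"

# A larger boost gives recency more weight relative to the other query terms.
print(with_date_boost("nutch plugin", 2.0))
```

Trying a few boost values against a known set of documents is probably the quickest way to find a setting where recency matters without swamping the other factors.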

-- Ken


On Dec 18, 2007, at 12:01 AM, chris sleeman wrote:

Hi,

I am interested in writing a plugin where the recency of the document
would also be a determinant of relevance/scoring. I don't want to sort
by date, but would rather boost the score of the pages that were most
recently indexed.

I have tried adding range queries using a custom query filter, which creates
queries of the form:

<query> AND +(date:[20071215 TO 20071218]^3.0 date:[20071201 TO 20071214]^2.0
date:[20071103 TO 20071201]^1.5 date:[00000000 TO 20071102])^1.0


But I am not sure whether this is a good way or whether including date range
clauses would have an adverse impact on performance.
Am I missing something? Is there a better way of doing this? Any help would
be much appreciated.

Regards,
Chris


--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"
