That's a great suggestion, Ken. Thanks a lot. --Chris

On Dec 19, 2007 2:53 AM, Ken Krugler <[EMAIL PROTECTED]> wrote:

> >It seems an OK way of doing it to me.
> >
> >I don't know how expensive those range queries are, but if it turns
> >out they do eat a lot of performance and/or you want more control
> >over exactly how scoring is done, AFAIK you'll have to get into the
> >guts of Lucene and define a custom scorer as documented here:
> >
> >http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/package-summary.html#scoring
> >.
> >
> >However this is expert territory so I wouldn't go there lightly.
>
> Another approach, which I've tried out using raw Lucene but not via
> Nutch, is to implement Andrzej's suggestion - have a new "dateBoost"
> field, and set the content of that field to be "1 1 1 1 ...", where
> the number of "1" characters equals the date of the page from some
> arbitrary earliest date.
>
> For example, I set it to the number of weeks from 1997, which was my
> earliest known document date.
>
> Then, at query time include "... AND dateBoost:1", so that newer
> fields get a higher score.
>
> You can fool around with specifying a run-time boost on the dateBoost
> field to tune the importance of the document's last modified time
> relative to other factors (static doc score, other query terms).
>
> -- Ken
>
> >On Dec 18, 2007, at 12:01 AM, chris sleeman wrote:
> >
> >>Hi,
> >>
> >>I am interested in writing a plugin, where the recency of the document,
> >>would also be a determinant as far as relevance/scoring is concerned. I
> >>don't want to sort by date, but would rather like to boost the score for
> >>pages which are most recently indexed.
> >>
> >>Have tried adding range queries using a custom query filter, which creates
> >>queries of the form -
> >>
> >><query> AND +(date:[20071215 TO 20071218]^3.0
> >>              date:[20071201 TO 20071214]^2.0
> >>              date:[20071103 TO 20071201]^1.5
> >>              date:[00000000 TO 20071102])^1.0
> >>
> >>But I am not sure whether this is a good way or whether including date range
> >>clauses would have an adverse impact on performance.
> >>Am I missing something? Is there a better way of doing this? Any help would
> >>be much appreciated.
> >>
> >>Regards,
> >>Chris
>
> --
> Ken Krugler
> Krugle, Inc.
> +1 530-210-6378
> "If you can't find it, you can't fix it"
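
For anyone following the thread, here is a rough sketch of how the "dateBoost" field content and query clause described above might be generated. It is plain Python with no Lucene or Nutch calls; the function names, the 1997-01-01 epoch, and the default boost are assumptions for illustration, not Ken's actual code. Writing the value into the index and parsing the query string are left to whatever indexing code you already have.

```python
from datetime import date

# Assumed earliest document date; the thread only says "from 1997".
EPOCH = date(1997, 1, 1)


def date_boost_value(last_modified, epoch=EPOCH):
    """Build the dateBoost field content: one "1" token per whole week
    between epoch and last_modified. Dates before epoch give ""."""
    weeks = max((last_modified - epoch).days // 7, 0)
    return " ".join(["1"] * weeks)


def with_date_boost(user_query, boost=1.0):
    """Append the dateBoost clause to a query string. The boost value is
    the run-time knob for tuning how much recency matters."""
    return f"({user_query}) AND dateBoost:1^{boost}"


if __name__ == "__main__":
    # Two whole weeks after the epoch yields two tokens.
    print(repr(date_boost_value(date(1997, 1, 15))))
    print(with_date_boost("nutch plugin", boost=2.0))
```

Newer pages get more "1" tokens, so the dateBoost:1 term matches them with a higher term frequency. How much that moves the final score depends on the similarity settings in use, so treat the boost as something to tune empirically.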
