On 5/18/06, jason rutherglen <[EMAIL PROTECTED]> wrote:
> It uses Jakarta HTTP Client, and implements a PriorityQueue-like structure
> using the java.util.concurrent queues and a thread pool for merging results.
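
For readers following the thread, the kind of merge described above might look roughly like the sketch below. This is not the poster's code: the class ShardMergeSketch, the Hit holder, and the Callable standing in for each per-server HTTP search are all hypothetical names, and a bounded java.util.PriorityQueue is assumed as the "PriorityQueue-like" structure.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ShardMergeSketch {
    /** One scored hit from one server (hypothetical holder, not a Lucene class). */
    static final class Hit {
        final String shard;
        final int doc;
        final float score;

        Hit(String shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Runs every shard search on a thread pool and keeps the k best hits overall. */
    static List<Hit> mergeTopK(List<Callable<List<Hit>>> shardSearches, int k) {
        Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score);
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shardSearches.size()));
        try {
            // Min-heap on score: the weakest retained hit is always at the head.
            PriorityQueue<Hit> heap = new PriorityQueue<>(byScore);
            for (Future<List<Hit>> result : pool.invokeAll(shardSearches)) {
                for (Hit hit : result.get()) {
                    heap.offer(hit);
                    if (heap.size() > k) {
                        heap.poll(); // evict the lowest-scoring hit
                    }
                }
            }
            List<Hit> top = new ArrayList<>(heap);
            top.sort(byScore.reversed()); // best hit first
            return top;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching shards", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that merging by raw score like this only makes sense if the scores from different servers are comparable, which is where the IDF question in the rest of the thread comes in.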

Are you able to contribute this code, or is it proprietary?

Have you implemented sorting by field also?  That would currently
require the additional constraint that the sort field be stored as
well as indexed (Lucene only requires it be indexed).

> Perhaps the global IDF is not a big deal?  The idea is to distribute the
> documents evenly over all the machines.  However, when a new server comes
> online, this may present a problem, as it would start at 0 documents.

Hmmm, yes, idf values could get out-of-whack when there are very few
documents on a server.

> I probably would not cache the global IDF; I would simply merge it each
> time.  I do not actually fully understand what the global IDF means, as I
> need to dig more deeply into this.

Inverse document frequency.  It makes rarer terms count more.
Its two components are the number of docs in the collection and the
number of docs containing a specific term.
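
As a rough illustration of those two components, the sketch below computes IDF per server and then a "global" IDF from the summed counts. The formula 1 + ln(numDocs / (docFreq + 1)) is the form used by Lucene's classic default similarity; the class GlobalIdfSketch, the ShardStats holder, and the sample counts are made up for this example and are not from the thread.

```java
import java.util.Arrays;
import java.util.List;

class GlobalIdfSketch {
    /** Per-server counts for one term (hypothetical holder, not a Lucene class). */
    static final class ShardStats {
        final long numDocs; // documents indexed on this server
        final long docFreq; // documents on this server containing the term

        ShardStats(long numDocs, long docFreq) {
            this.numDocs = numDocs;
            this.docFreq = docFreq;
        }
    }

    /** Classic Lucene-style IDF: rarer terms (small docFreq) get larger values. */
    static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    /** Global IDF: sum both counts over all servers, then apply the same formula. */
    static double globalIdf(List<ShardStats> shards) {
        long numDocs = 0;
        long docFreq = 0;
        for (ShardStats s : shards) {
            numDocs += s.numDocs;
            docFreq += s.docFreq;
        }
        return idf(numDocs, docFreq);
    }

    public static void main(String[] args) {
        // An established server next to one that has only just come online.
        List<ShardStats> shards = Arrays.asList(
                new ShardStats(1_000_000, 50_000),
                new ShardStats(10, 5));
        for (ShardStats s : shards) {
            System.out.println("local idf  = " + idf(s.numDocs, s.docFreq));
        }
        System.out.println("global idf = " + globalIdf(shards));
    }
}
```

Running main shows the local value on the nearly empty server diverging sharply from the other two, which is the out-of-whack case mentioned earlier.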

> > I don't think everything can be done in a single call, since by the
> > time you score docs against a query you have lost how you arrived at
> > the composite score.

> I'm not sure what "you have lost how you arrived at the composite
> score" means.  Could you explain?

If you query for "x OR y", the doc score you get will be a combination
of the doc score for x and the doc score for y.   After you have the
document score for the complete query, you can't adjust the IDF for
just one of the terms because you don't know the individual scores for
x and y anymore.
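
A small numeric sketch of that point follows. It assumes a deliberately simplified score, tf * idf summed over the two terms, which is not Lucene's full scoring formula; the class name and all the numbers are hypothetical.

```java
class CompositeScoreSketch {
    /** Simplified score for "x OR y": sum of tf * idf over the two terms (an assumption). */
    static double score(double tfX, double idfX, double tfY, double idfY) {
        return tfX * idfX + tfY * idfY;
    }

    public static void main(String[] args) {
        // IDF values as computed locally on one server.
        double localIdfX = 1.0, localIdfY = 2.0;
        // Suppose merging statistics shows x is globally rarer than this server thought.
        double globalIdfX = 1.5, globalIdfY = 2.0;

        // Doc A matches only x (tf = 2); doc B matches only y (tf = 1).
        double localA = score(2, localIdfX, 0, localIdfY);
        double localB = score(0, localIdfX, 1, localIdfY);
        double globalA = score(2, globalIdfX, 0, globalIdfY);
        double globalB = score(0, globalIdfX, 1, globalIdfY);

        // Locally the two docs tie, so the composite number alone cannot tell
        // them apart; with the corrected IDF for x they no longer tie.
        System.out.println("local:  A=" + localA + " B=" + localB);
        System.out.println("global: A=" + globalA + " B=" + globalB);
    }
}
```

Two documents with the same composite score can need different corrections, so the per-term contributions (or the global statistics up front) have to be available before scoring.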

-Yonik
