On Thu, Feb 11, 2010 at 6:56 AM, abhishes <abhis...@gmail.com> wrote:
>
> Thanks really useful article.
>
> I am wondering about this statement in the article
>
> "Keep in mind that Solr does not calculate universal term/doc frequencies.
> At a large scale, its not likely  to matter that tf/idf is calculated at the
> shard level - however, if your collection is heavily skewed in its
> distribution across servers, you might take issue with the relevance
> results. Its probably best to randomly distribute documents to your shards"
>
> So if there is no universal tf/idf kept, then how does solr determine the
> rank of two documents which came from different shards in a distributed
> search query?

tf is per document, so it's the same whether the search is distributed
or not.  idf (inverse document frequency) measures how rare a term is.
In distributed search, each shard computes idf from its own index only,
so scoring reflects term rareness within that shard.  Solr still merges
and orders documents from different shards by that score.
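
To make the effect concrete, here is a rough sketch in Python (not Solr
code).  It assumes the classic Lucene-style formula
idf = 1 + ln(numDocs / (docFreq + 1)) and a simplified score of
sqrt(tf) * idf^2, ignoring norms and boosts.  The shard names and counts
are made up for illustration.

```python
import math

def idf(num_docs, doc_freq):
    # Classic Lucene-style idf (assumed here for illustration).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def score(term_freq, num_docs, doc_freq):
    # Simplified single-term score: sqrt(tf) * idf^2, no norms or boosts.
    return math.sqrt(term_freq) * idf(num_docs, doc_freq) ** 2

# Hypothetical, heavily skewed shards: the term is common on shard_a
# and rare on shard_b.
SHARDS = {
    "shard_a": {"num_docs": 1000, "doc_freq": 500},
    "shard_b": {"num_docs": 1000, "doc_freq": 5},
}

def shard_local_scores(shards, term_freq=1):
    # Each shard scores with its own document counts only.
    return {
        name: score(term_freq, s["num_docs"], s["doc_freq"])
        for name, s in shards.items()
    }

def global_scores(shards, term_freq=1):
    # What a "universal" idf would give: counts summed over all shards.
    num_docs = sum(s["num_docs"] for s in shards.values())
    doc_freq = sum(s["doc_freq"] for s in shards.values())
    return {name: score(term_freq, num_docs, doc_freq) for name in shards}

if __name__ == "__main__":
    print("shard-local:", shard_local_scores(SHARDS))
    print("global idf: ", global_scores(SHARDS))
```

With the same tf, the document on shard_b scores higher under
shard-local idf simply because the term is rarer there; with a global
idf the two documents tie.  If documents are spread randomly across
shards, the per-shard counts come out close to each other and the
difference mostly disappears.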

Even after we integrate distributed idf, it will be optional because
it comes with a cost and is often unnecessary.

-Yonik
http://www.lucidimagination.com
