On Fri, Dec 19, 2008 at 1:59 AM, Pooja Verlani <pooja.verl...@gmail.com> wrote:
> Hi,
> I am planning to use Solr's distributed searching for my project. But while
> going through http://wiki.apache.org/solr/DistributedSearch, i found a few
> limitations with it. Can anyone please explain the 2nd and 3rd points in the
> limitations sections on the page. The points are:
>
> When duplicate doc IDs are received, Solr chooses the first doc and
> discards subsequent ones

This one isn't a limitation IMO... IDs are supposed to be unique. If someone
indexes the same document on more than one shard (same meaning it has the same
ID/uniqueKey), distributed search handles this error condition relatively
gracefully: it keeps the first doc returned and ignores the others.

> No distributed idf

idf (inverse document frequency) is part of scoring: rarer terms in the query
count more than common terms. When the index is split across multiple shards,
idf is currently computed locally on each shard, not globally across all
shards. If the index isn't well mixed (say, all documents pertaining to one
subject are in one shard), that can skew scoring, since a term that is rare on
one shard may not be rare across the whole collection.

-Yonik
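To make the idf skew concrete, here is a minimal sketch on a hypothetical toy corpus. It uses the textbook idf, log(N/df), as a stand-in; Lucene's own formula differs in detail, and the shard names, documents and helper functions are illustrative only, not Solr APIs. It compares the idf each shard computes locally against the idf a single combined index would compute.

```python
import math

# Hypothetical toy corpus: each shard is a list of documents, each document a set of terms.
# Shard "a" holds mostly Solr-related docs, shard "b" mostly other topics,
# i.e. the index is NOT well mixed.
SHARDS = {
    "a": [{"solr", "search"}, {"solr", "index"}, {"solr", "shard"}, {"solr", "query"}],
    "b": [{"solr", "cooking"}, {"pasta", "cooking"}, {"garden", "soil"}, {"music", "jazz"}],
}


def idf(df: int, num_docs: int) -> float:
    """Textbook idf, log(N / df). Assumption: not Lucene's exact formula."""
    return math.log(num_docs / df)


def doc_freq(docs, term):
    """Number of documents in `docs` that contain `term`."""
    return sum(1 for d in docs if term in d)


def local_idf(shards, term):
    """idf computed independently on each shard (the per-shard behavior described above)."""
    result = {}
    for name, docs in shards.items():
        df = doc_freq(docs, term)
        if df > 0:  # a shard without the term contributes no matches for it
            result[name] = idf(df, len(docs))
    return result


def global_idf(shards, term):
    """idf computed over the whole collection, as a single unsharded index would."""
    all_docs = [d for docs in shards.values() for d in docs]
    return idf(doc_freq(all_docs, term), len(all_docs))


if __name__ == "__main__":
    term = "solr"
    for name, value in local_idf(SHARDS, term).items():
        print(f"shard {name}: local idf({term!r}) = {value:.3f}")
    print(f"global idf({term!r}) = {global_idf(SHARDS, term):.3f}")
```

With this data, "solr" gets an idf of 0 on shard a (every document there contains it) but log 4 ≈ 1.386 on shard b, while the collection-wide value is log(8/5) ≈ 0.470. A match on shard b is therefore boosted relative to an equivalent match on shard a, which is exactly the skew a poorly mixed index produces.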