Re: Distributed Searching - Limitations?

Yonik Seeley Fri, 19 Dec 2008 06:11:24 -0800

On Fri, Dec 19, 2008 at 1:59 AM, Pooja Verlani <pooja.verl...@gmail.com> wrote:
> Hi,
> I am planning to use Solr's distributed searching for my project. But while
> going through http://wiki.apache.org/solr/DistributedSearch, i found a few
> limitations with it. Can anyone please explain the 2nd and 3rd points in the
> limitations sections on the page. The points are:
>
>   When duplicate doc IDs are received, Solr chooses the first doc and
>   discards subsequent ones


This one isn't a limitation IMO... IDs are supposed to be unique.
If someone indexes the same document (meaning one with the same
ID/uniqueKey) on more than one shard, distributed search will handle
this error condition relatively gracefully by using the first doc
returned and ignoring the others.
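
To make the "first doc wins" behavior concrete, here is a rough sketch
of the idea in Python. It is not Solr's actual merge code; the function
name, the dict-based docs, and the "id" and "score" field names are all
made up for illustration.

```python
def merge_shard_results(shard_results, unique_key="id"):
    """Merge per-shard hit lists, keeping the first doc seen per ID.

    shard_results is a list of hit lists, in the order the shard
    responses are processed. Each hit is a dict with a unique key and
    a "score" field (hypothetical layout, for illustration only).
    """
    seen = set()
    merged = []
    for hits in shard_results:
        for doc in hits:
            doc_id = doc[unique_key]
            if doc_id in seen:
                continue  # duplicate ID: discard the later copy
            seen.add(doc_id)
            merged.append(doc)
    # Rank the surviving docs by score, highest first.
    merged.sort(key=lambda d: d["score"], reverse=True)
    return merged


# Doc "2" was (incorrectly) indexed on both shards.
shard_a = [{"id": "1", "score": 2.0}, {"id": "2", "score": 1.0}]
shard_b = [{"id": "2", "score": 5.0}, {"id": "3", "score": 1.5}]
merged = merge_shard_results([shard_a, shard_b])
```

In this sketch the copy of doc "2" from shard_b is dropped even though
it has the higher score, simply because shard_a's copy was seen first.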

>   No distributed idf

idf is part of scoring, based on the inverse document frequency of
terms in the query - rarer terms count more than common terms.

When the index is split across multiple shards, idf is currently
computed locally on each shard from that shard's own documents, not
globally across all shards. If the index isn't well mixed, say all
documents pertaining to one subject are in one shard, then that can
skew the scoring, since a term that is rare on one shard may not be
rare across the whole collection (and vice versa).
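
A small numeric sketch of the skew, in Python. It assumes a
Lucene-style idf of 1 + ln(numDocs / (docFreq + 1)); the shard names
and document counts are invented for illustration, not taken from any
real index.

```python
import math


def idf(doc_freq, num_docs):
    """Lucene-style idf (assumed formula): rarer terms score higher."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical, poorly mixed index: (docs containing the term, total docs).
# Nearly all documents about the subject landed on shard_a.
shards = {
    "shard_a": (500, 1000),
    "shard_b": (5, 1000),
}

# What each shard computes on its own today.
local_idf = {name: idf(df, n) for name, (df, n) in shards.items()}

# What a distributed idf would compute from collection-wide statistics.
global_df = sum(df for df, _ in shards.values())
global_n = sum(n for _, n in shards.values())
global_idf = idf(global_df, global_n)
```

With these made-up numbers the term looks common on shard_a and very
rare on shard_b, so a matching doc from shard_b gets a larger idf boost
than the collection-wide statistics would justify, and a doc from
shard_a gets a smaller one.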

-Yonik
