Re: SolrCloud logical shards

Yonik Seeley Sun, 17 Jan 2010 11:15:52 -0800

On Thu, Jan 14, 2010 at 2:43 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Thu, Jan 14, 2010 at 1:58 PM, Chris Hostetter
> <hossman_luc...@fucit.org> wrote:
>> : parameter we use for this.  Suggestions?  logicalshards=shard1,shard2?
>> : lshards=shard1,shard2?  slice=shard1,shard2? It doesn't seem like it
>> : would be easy to reuse the "shards" parameter for this since it refers
>> : to physical shard addresses.
>>
>> I haven't been following the SolrCloud stuff much, but from a client
>> perspective is there really any difference between asking for a physical
>> shard, vs asking for a logical shard (or slice name)? ... shouldn't the
>> latter case just result in a resolution from logical->physical w/o
>> requiring the client code to know/care whether the String they have is a
>> physical shard URL, or a slice name?
>
> That might be doable... but we would need to be able to tell the difference.
> Perhaps we could always require a slash in a physical address
> (localhost/context) and prohibit it in slice names?
>
> But... I think there's still a potentially bigger difference: today,
> if shards is set, it means it's a distributed search (and shards is
> removed for sub-requests).  But the slice of the index being requested
> may not have a one-to-one mapping with a full request on a solr core.
> And shards may be able to move around, and so it seems important to be
> able to declare what part of the index you're looking for when you're
> querying a shard.
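
To make the slash rule quoted above concrete, here is a minimal sketch in Python. It assumes the convention proposed in this thread (a physical address always contains a slash, a slice name never does). The function names are hypothetical and are not part of Solr.

```python
def classify_shard_entry(entry: str) -> str:
    """Return "physical" or "logical" for one entry of the shards param.

    Assumption: physical addresses always contain a slash (host/context)
    and slice names never do, as proposed in this thread.
    """
    entry = entry.strip()
    if not entry:
        raise ValueError("empty shards entry")
    return "physical" if "/" in entry else "logical"


def split_shards_param(shards: str) -> dict:
    """Group the comma-separated entries of a shards param by kind."""
    groups = {"physical": [], "logical": []}
    for entry in shards.split(","):
        groups[classify_shard_entry(entry)].append(entry.strip())
    return groups
```

With this rule, a mixed value such as shards=localhost:8983/solr,shard_200911 could be split into one physical address and one slice name without the client having to say which is which.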


If we want to go this route for parameters (allowing either physical
or logical shards in the shards param), I've updated the wiki with one
way to do it:

"""
The presence of "shards" is what currently signals that a request is
distributed, and distrib search removes this param for sub-requests.
But with future micro-sharding or having a single core support
multiple shards, the request will need to contain what shards are
being requested. Reusing "shards" for this (per Hoss' suggestion) by
allowing either physical urls or logical shards (slices) would require
that either

    * a) The search component detect when it has all of the shards
      requested, and turn it into a non-distributed request (any error
      here could easily result in an infinite request loop until
      deadlock). It seems better to return a specific error if this
      node no longer contains the shard being queried in a non-distrib
      search.
    * b) Use a different distrib=true flag to indicate if this is a
      distributed search. This isn't backward compatible, though,
      unless we also consider any request where shards contains a URL
      to be distributed.

http://localhost:8983/solr/collection1/select?shards=shard_200911,shard_200912,shard_201001&distrib=true

If we adopt "distrib=true" then it should replace "shards=auto" in the
other example URLs
"""
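
A rough sketch of how option (b) and the "specific error" idea from option (a) might fit together, written in Python for brevity. Everything here is an assumption for illustration: the function names, the exception class, and the use of a slash to recognize a URL are hypothetical and not existing Solr code.

```python
class ShardNotHostedError(Exception):
    """Raised when a non-distributed request names a slice this node lacks."""


def is_distributed(params: dict) -> bool:
    """Option (b): distrib=true marks a distributed request.

    For backward compatibility, a shards value containing a physical URL
    (assumed here to be any entry with a slash) also counts as distributed.
    """
    if params.get("distrib", "").lower() == "true":
        return True
    shards = params.get("shards", "")
    return any("/" in entry for entry in shards.split(","))


def check_local_request(params: dict, hosted_slices: set) -> None:
    """Fail fast on a non-distributed request for slices not hosted here.

    Returning an error avoids re-distributing the request, which is the
    loop risk described under option (a).
    """
    requested = [s.strip() for s in params.get("shards", "").split(",") if s.strip()]
    missing = [s for s in requested if s not in hosted_slices]
    if missing:
        raise ShardNotHostedError("not hosted on this node: " + ",".join(missing))
```

A node would call is_distributed first and only fan the request out when it returns true; otherwise it would call check_local_request and answer from its own index.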

So the top-level distributed request shown above would resolve to
potentially multiple sub-requests of the following form (note that
distrib=true has been removed):

http://localhost:1234/solr/collection1/select?shards=shard_200911
http://localhost:1235/solr/collection1/select?shards=shard_200912
http://localhost:1236/solr/collection1/select?shards=shard_201001
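
For illustration, the fan-out from the top-level request to these sub-requests could be sketched as below. This is Python rather than Solr's Java, and the slice-to-node mapping is a hypothetical stand-in for whatever cluster state lookup would really supply it; build_sub_requests is not an existing Solr function.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_sub_requests(top_level_url: str, slice_to_node: dict) -> list:
    """Turn one distributed request into one sub-request URL per slice.

    slice_to_node maps a slice name to a "host:port" string (hypothetical
    input; a real system would resolve this from cluster state).
    """
    parts = urlsplit(top_level_url)
    params = dict(parse_qsl(parts.query))
    # Sub-requests are not distributed, so the flag is dropped.
    params.pop("distrib", None)
    slices = [s for s in params.pop("shards", "").split(",") if s]
    urls = []
    for name in slices:
        # Each sub-request still states which slice it wants.
        query = urlencode({**params, "shards": name})
        urls.append(
            urlunsplit((parts.scheme, slice_to_node[name], parts.path, query, ""))
        )
    return urls
```

Called with the top-level URL from the wiki text and a mapping of shard_200911, shard_200912 and shard_201001 to localhost:1234, localhost:1235 and localhost:1236, this yields the three sub-request URLs listed above.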

-Yonik
http://www.lucidimagination.com
