Re: SolrCloud logical shards

Lance Norskog Thu, 14 Jan 2010 16:31:32 -0800

Yonik spake-
    I'm actually starting to lean toward "slice" instead of "logical shard".
    In the future we'll want to enable overlapping shards I think (due to
    an Amazon Dynamo type of replication, or due to merging shards, etc.),
    and a separate word for a logical slice of the index seems desirable.

    For instance, one could specify slice=1000-1999 (defined by the ids or
    hashcodes of the ids) and that could end up querying multiple servers.
    For this first iteration, slices would just be opaque identifiers
    though (and that functionality would always remain, allowing for user
    partitioning by time or by geo region).

+1

Logical-to-physical mapping should not assume that a logical shard is
made up of a whole number of physical shards. Overlapping and partial
physical shards should be addressable as a logical shard. If you're
going to do something this major, do it right.
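A minimal sketch of the mapping described above, assuming (hypothetically) that each physical shard owns a contiguous, inclusive range of id hashcodes, as in Yonik's slice=1000-1999 example. A logical slice resolves to every physical shard whose range intersects it, so a slice may cover part of one shard, span several, or hit overlapping copies. The PhysicalShard class, the resolve_slice function, and the host URLs are illustrative names, not Solr APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalShard:
    """Hypothetical physical shard owning the inclusive hash range [lo, hi]."""
    url: str
    lo: int
    hi: int


def resolve_slice(lo, hi, shards):
    """Map the logical slice [lo, hi] onto physical shards.

    Returns (url, overlap_lo, overlap_hi) for every shard whose range
    intersects the slice. Partial overlaps are allowed, so a slice need
    not correspond to a whole number of physical shards.
    """
    if lo > hi:
        raise ValueError("empty slice")
    targets = []
    for s in shards:
        start, end = max(lo, s.lo), min(hi, s.hi)
        if start <= end:  # the two ranges intersect
            targets.append((s.url, start, end))
    return targets


# Illustrative layout: host3 overlaps both host1 and host2.
shards = [
    PhysicalShard("http://host1/solr/core1", 0, 1499),
    PhysicalShard("http://host2/solr/core1", 1500, 2999),
    PhysicalShard("http://host3/solr/core1", 1000, 2499),
]
```

Here resolve_slice(1000, 1999, shards) returns partial ranges from host1 (1000-1499) and host2 (1500-1999) plus the overlapping host3 range (1000-1999); choosing a non-redundant subset among overlapping copies is left to a separate routing step.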

On Thu, Jan 14, 2010 at 3:29 PM, Ted Dunning wrote:
> Shard has the interesting additional implication that it is part of a
> composite index made up of many sub-indexes.
>
> A Lucene index could be a complete index or a shard.  I would presume the
> same of what might be called a core.
>
> On Thu, Jan 14, 2010 at 3:21 PM, Jason Rutherglen wrote:
>
>> Uri,
>>
>> > "core" to represent a single index and "shard" to be
>> > represented by a single core
>>
>> Can you elaborate on what you mean? Isn't a core a single index
>> too? It seems like shard was used to represent a remote index
>> (perhaps?), though here I'd prefer "remote core", because to the
>> uninitiated Solr outsider it's immediately obvious (i.e., they
>> need only know what a core is, from the Solr glossary or term
>> dictionary).
>>
>> In Google vernacular, which is where the name shard came from, a
>> "shard" is basically a local sub-index
>> http://research.google.com/archive/googlecluster.html where
>> there would be many "shards" per server. However that's a
>> digression at this point.
>>
>> I personally prefer relatively straightforward, self-evident names
>> rather than inventing new language for fairly simple concepts.
>> Slice, even though it comes from our buddy Yonik, probably doesn't
>> make any immediate sense to external users when compared with the
>> word shard. Of course, software projects have a tendency to create
>> their own words to somewhat mystify users into believing in some
>> sort of magic occurring underneath. If that's what we're after,
>> it's cool, I mean that makes sense. I don't mean to be derogatory
>> here; however, this is an open source project created in part to
>> educate users on search and to be as easily accessible as possible
>> to the greatest number of users. I think Doug did a great job of
>> this when Lucene started, with amazingly succinct code for fairly
>> complex concepts (i.e., anti-mystification of search).
>>
>> Jason
>>
>> On Thu, Jan 14, 2010 at 2:58 PM, Uri Boness <[email protected]> wrote:
>> > Although Jason has some valid points here, I'm with Yonik. I do believe
>> > that we've gotten used to the terms "core" to represent a single index
>> > and "shard" to be represented by a single core. A "node" seems to
>> > indicate a machine or a JVM. Changing any of these (perhaps informal)
>> > definitions will only cause confusion. That's why I think "slice" is a
>> > good solution now: first, it's a new term for a new view of the index
>> > (logical shards, AFAIK, don't really exist yet), so people won't need to
>> > get used to it, but it's also descriptive and intuitive. I do like
>> > Jason's idea about having a protocol attached to the URLs.
>> >
>> > Cheers,
>> > Uri
>> >
>> > Jason Rutherglen wrote:
>> >>>
>> >>> But I've kind of gotten used to thinking of shards as the
>> >>> actual physical queryable things...
>> >>>
>> >>
>> >> I think a mistake was made referring to Solr cores as shards.
>> >> It's the same thing with 2 different names. Slice adds yet
>> >> another name, which seems to imply the same thing yet again. I'd
>> >> rather see disambiguation here, and call them cores (partially
>> >> because that's what's in the code and on the wiki), and cores
>> >> only. It's a Solr-specific term and it's going to be confused with
>> >> microprocessor cores, but at least there's only one name, which
>> >> as search people, we know creates fewer posting lists :).
>> >>
>> >> Logical groupings of cores can occur, which can be aptly named
>> >> core groups. This way I can submit a query to a core group, and
>> >> it's reasonable to assume I'm hitting N cores. Further, cores
>> >> could point to a logical or physical entity via a URL. (As a
>> >> side note, I've always found it odd that the shards param to
>> >> RequestHandler lacks the protocol, what if I want to use HTTPS
>> >> for example?).
>> >>
>> >> So there could be http://host/solr/core1 (physical),
>> >> core://megacorename (logical),
>> >> coregroup://supergreatcoregroupname (a group of cores) in the
>> >> "shards" parameter (whose name can perhaps be changed for
>> >> clarity in a future release). Then people can mix and match and
>> >> we won't have many different XML elements floating around. We'd
>> >> have a simple list of URLs that are translated into real
>> >> physical network requests.
>> >>
>> >>
>> >> On Thu, Jan 14, 2010 at 12:56 PM, Yonik Seeley wrote:
>> >>
>> >>>
>> >>> On Thu, Jan 14, 2010 at 1:38 PM, Yonik Seeley wrote:
>> >>>
>> >>>>
>> >>>> On Thu, Jan 14, 2010 at 12:46 PM, Yonik Seeley wrote:
>> >>>>
>> >>>>>
>> >>>>> I'm actually starting to lean toward "slice" instead of "logical
>> >>>>> shard".
>> >>>>>
>> >>>
>> >>> Alternate terminology could be "index" for the actual physical Lucene
>> >>> index (and also enough of the URL that unambiguously identifies it),
>> >>> and then "shard" could be the logical entity.
>> >>>
>> >>> But I've kind of gotten used to thinking of shards as the actual
>> >>> physical queryable things...
>> >>>
>> >>> -Yonik
>> >>> http://www.lucidimagination.com
>> >>>
>> >>>
>> >>
>> >>
>> >
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
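Jason's proposal of one "shards" list that mixes physical http:// URLs with logical core:// and coregroup:// names can be sketched as a resolver that expands every entry into physical URLs before any request is sent. The core:// and coregroup:// schemes are only proposals in this thread, not existing Solr features, and the LOGICAL_CORES and CORE_GROUPS registries and the expand_shards function are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical registries; a real cluster would look these up in shared
# cluster state rather than hard-coding them.
LOGICAL_CORES = {
    "megacorename": ["http://host1/solr/core1", "http://host2/solr/core1"],
}
CORE_GROUPS = {
    # Members may be logical core names or physical URLs.
    "supergreatcoregroupname": ["megacorename", "https://host3/solr/core2"],
}


def expand_shards(entries):
    """Expand a mixed list of shard URLs into physical http(s) URLs."""
    physical = []
    for entry in entries:
        parsed = urlparse(entry)
        if parsed.scheme in ("http", "https"):
            physical.append(entry)  # already physical; protocol kept as given
        elif parsed.scheme == "core":
            physical.extend(LOGICAL_CORES[parsed.netloc])
        elif parsed.scheme == "coregroup":
            for member in CORE_GROUPS[parsed.netloc]:
                if "://" in member:
                    physical.extend(expand_shards([member]))
                else:
                    physical.extend(LOGICAL_CORES[member])
        else:
            raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    # Drop duplicate URLs while preserving first-seen order.
    return list(dict.fromkeys(physical))
```

Keeping the scheme on every entry also lets an https:// entry pass through unchanged, which addresses Jason's side note about the missing protocol.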



-- 
Lance Norskog