Re: SolrCloud logical shards

Uri Boness Fri, 15 Jan 2010 01:25:08 -0800


Can you elaborate on what you mean, isn't a core a single index
too? It seems like shard was used to represent a remote index
(perhaps?).

Yes, a core is a single index and a shard is a conceptual idea which at the moment concretely refers to a remote core (but not a specific one, as the same shard can be represented by multiple core replicas). The point I was trying to make is that I believe that if you start changing terminologies now, people will be very confused. And I thought of sticking to Yonik's suggestion of a "slice" just to prevent this confusion. On the other hand, one can argue that the terminology as it is today is already confusing... and if you really want to get it right and be aligned with the "rest of the world" (if there is such a thing... from what I've seen so far, sharding is used differently in different contexts), then perhaps a "good" time for making such terminology changes is with a major release (Solr 2.0?), as with such a release people tend to be more open to new/changed concepts.


Cheers,
Uri

Jason Rutherglen wrote:

Uri,

"core" to represent a single index and "shard" to be
represented by a single core


Can you elaborate on what you mean, isn't a core a single index
too? It seems like shard was used to represent a remote index
(perhaps?). Though here I'd prefer "remote core", because to the
uninitiated Solr outsider it's immediately obvious (i.e. they
need only know what a core is, in the Solr glossary or term
dictionary).

In Google vernacular, which is where the name shard came from, a
"shard" is basically a local sub-index
http://research.google.com/archive/googlecluster.html where
there would be many "shards" per server. However that's a
digression at this point.

I personally prefer relatively straightforward names, that are
self-evident, rather than inventing new language for fairly
simple concepts. Slice, even though it comes from our buddy
Yonik, probably doesn't make any immediate sense to external
users when compared with the word shard. Of course software
projects have a tendency to create their own words to somewhat
mystify users into believing in some sort of magic occurring
underneath. If that's what we're after, it's cool, I mean that
makes sense. And I don't mean to be derogatory here; however, this
is an open source project created in part to educate users on
search and to be made as easily accessible as possible, to the
greatest number of users possible. I think Doug did a great job
of this when Lucene started with amazingly succinct code for
fairly complex concepts (e.g., anti-mystification of search).

Jason

On Thu, Jan 14, 2010 at 2:58 PM, Uri Boness <ubon...@gmail.com> wrote:

Although Jason has some valid points here, I'm with Yonik here. I do believe
that we've gotten used to the terms "core" to represent a single index and
"shard" to be represented by a single core. A "node" seems to indicate a
machine or a JVM. Changing any of these (informal perhaps) definitions will
only cause confusion. That's why I think a "slice" is a good solution now...
first, it's a new term for a new view of the index (logical shards AFAIK don't
really exist yet) so people won't need to get used to it, but it's also
descriptive and intuitive. I do like Jason's idea about having a protocol
attached to the URLs.

Cheers,
Uri

Jason Rutherglen wrote:

But I've kind of gotten used to thinking of shards as the
actual physical queryable things...

I think a mistake was made referring to Solr cores as shards.
It's the same thing with 2 different names. "Slice" adds yet
another name which seems to imply the same thing yet again. I'd
rather see disambiguation here, and call them cores (partially
because that's what's in the code and on the wiki), and cores
only. It's a Solr specific term, it's going to be confused with
microprocessor cores, but at least there's only one name, which
as search people, we know creates fewer posting lists :).

Logical groupings of cores can occur, which can be aptly named
core groups. This way I can submit a query to a core group, and
it's reasonable to assume I'm hitting N cores. Further, cores
could point to a logical or physical entity via a URL. (As a
side note, I've always found it odd that the shards param to
RequestHandler lacks the protocol, what if I want to use HTTPS
for example?).

So there could be http://host/solr/core1 (physical),
core://megacorename (logical),
coregroup://supergreatcoregroupname (a group of cores) in the
"shards" parameter (whose name can perhaps be changed for
clarity in a future release). Then people can mix and match and
we won't have many different XML elements floating around. We'd
have a simple list of URLs that are transposed into a real
physical network request.
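
To make the URL-list idea above concrete, here is a minimal Python sketch of how such a mixed "shards" value might be expanded into physical URLs. It is an illustration only, not existing Solr behavior: the lookup tables, the host names host-a/host-b/host-c, the core name "othercore", and the functions resolve_entry and resolve_shards are all assumptions made up for the example. Picking the first replica is a placeholder for whatever load balancing would really be used.

```python
from urllib.parse import urlsplit

# Hypothetical lookup tables. In a real cluster these would come from
# some shared cluster state, not from hard-coded dictionaries.
LOGICAL_CORES = {
    "megacorename": ["http://host-a/solr/core1", "http://host-b/solr/core1"],
    "othercore": ["http://host-c/solr/core2"],
}
CORE_GROUPS = {
    "supergreatcoregroupname": ["megacorename", "othercore"],
}


def resolve_entry(entry):
    """Expand one entry of the list into a list of physical URLs."""
    parts = urlsplit(entry)
    scheme, name = parts.scheme, parts.netloc
    if scheme in ("http", "https"):
        # Already a physical address: use it unchanged.
        return [entry]
    if scheme == "core":
        # Logical core: pick one replica (assumption: simply the first).
        return [LOGICAL_CORES[name][0]]
    if scheme == "coregroup":
        # Group of cores: resolve every member core in turn.
        urls = []
        for core in CORE_GROUPS[name]:
            urls.extend(resolve_entry("core://" + core))
        return urls
    raise ValueError("unsupported scheme in entry: " + entry)


def resolve_shards(param):
    """Expand a comma-separated list of entries into physical URLs."""
    urls = []
    for entry in param.split(","):
        entry = entry.strip()
        if entry:
            urls.extend(resolve_entry(entry))
    return urls
```

With these made-up tables, passing the three example entries above as one comma-separated value yields one physical URL for the plain HTTP entry, one for the logical core, and one per member core for the core group. An entry with an unknown scheme is rejected rather than guessed at.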


On Thu, Jan 14, 2010 at 12:56 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:

On Thu, Jan 14, 2010 at 1:38 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:

On Thu, Jan 14, 2010 at 12:46 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:

I'm actually starting to lean toward "slice" instead of "logical
shard".

Alternate terminology could be "index" for the actual physical Lucene
index (and also enough of the URL that unambiguously identifies it),
and then "shard" could be the logical entity.

But I've kind of gotten used to thinking of shards as the actual
physical queryable things...

-Yonik
http://www.lucidimagination.com
