Re: Creating new cluster with existing config in zookeeper

2016-03-23 Thread Shawn Heisey
On 3/23/2016 9:43 AM, Robert Brown wrote:
> When going to the admin UI on this new server I can see the
> shards/replicas of the existing collection, and can even query it,
> even though this new server has no cores on it itself.
>
> Is this all expected behaviour?
>

Yes. There were some bugs early on in the 4.x versions where this
*didn't* work, but those were fixed. This functionality is fully
intentional.

> Is there any performance gain with what I have at this precise stage?
> The extra server certainly makes it appear I could balance more
> load/requests, but I guess the queries are just being forwarded on to
> the servers with the actual data?

As Erick said, the requests are being handled by the servers that
actually host the data.  The new node is just acting as a data ferry,
passing requests and responses along.  There is no performance gain
unless you use the ADDREPLICA action of the Collections API to add
replicas of your existing shards to the new node, so that some of the
query load moves onto it.
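
A minimal sketch of what such an ADDREPLICA request could look like,
built in Python without actually sending it.  The host name, collection
name, shard name and node name below are hypothetical placeholders; the
node value is assumed to follow the host:port_solr form shown under
live_nodes in the admin UI.

```python
from urllib.parse import urlencode


def add_replica_url(base_url, collection, shard, node):
    """Build (but do not send) a Collections API ADDREPLICA request URL."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,  # existing collection to extend
        "shard": shard,            # shard that should gain a replica
        "node": node,              # target node, e.g. the newly added server
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical names, for illustration only.
url = add_replica_url(
    "http://newhost:8983/solr", "products", "shard1", "newhost:8983_solr"
)
print(url)
```

One such call per shard would place a copy of every shard on the new
node, at which point it can answer part of the query load itself.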

> Am I correct in thinking I can now create a new collection on this
> host, and begin to build up a new cluster?  and they won't interfere
> with each other at all?
>
> Also, that I'll be able to see both collections when using the admin
> UI Cloud page on any of the servers in either collection?

You're still mixing up "collection" and "cluster" in your terminology. 
This is somewhat understandable ... the concepts do have some
similarity.  Within a single cluster (servers sharing a particular
database/chroot in zookeeper), all collections (logical indexes) in the
entire cluster will be usable on any machine in the cluster.
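
To make the "same database/chroot" idea concrete, here is a small
Python sketch that splits a zkHost-style connection string into its
ensemble hosts and chroot, and compares two of them.  The helper names
and host names are made up for illustration, and the comparison assumes
both strings list the ensemble hosts in full.

```python
def split_zk_host(zk_host):
    """Split 'h1:2181,h2:2181/chroot' into (list of hosts, chroot path)."""
    hosts_part, slash, chroot = zk_host.partition("/")
    hosts = [h for h in hosts_part.split(",") if h]
    # No chroot given means the ZooKeeper root is used.
    return hosts, ("/" + chroot if slash else "/")


def same_cluster(zk_a, zk_b):
    """Two Solr nodes are in one cluster if ensemble and chroot both match."""
    hosts_a, chroot_a = split_zk_host(zk_a)
    hosts_b, chroot_b = split_zk_host(zk_b)
    return set(hosts_a) == set(hosts_b) and chroot_a == chroot_b


# Hypothetical connection strings.
print(same_cluster("zk1:2181,zk2:2181/solr-a", "zk2:2181,zk1:2181/solr-a"))
print(same_cluster("zk1:2181,zk2:2181/solr-a", "zk1:2181,zk2:2181/solr-b"))
```

Nodes whose strings differ only in the chroot share the ZooKeeper
ensemble but belong to separate clusters, and will not see each other's
collections.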

Thanks,
Shawn



Re: Creating new cluster with existing config in zookeeper

2016-03-23 Thread Robert Brown

Thanks all,

I am no doubt confusing things myself - I (rather stupidly) have 5 
completely separate clouds, with separate ZK trees - a bad design 
decision on day one when I thought each config needed a separate ZK tree.


So it could all be simplified a bit, but that's my current view, which 
is probably sounding confused.


Cheers,
Rob


On 03/23/2016 04:03 PM, Tom Evans wrote:
> On Wed, Mar 23, 2016 at 3:43 PM, Robert Brown wrote:
>> So I set up a new Solr server to point to my existing ZK configs.
>>
>> When going to the admin UI on this new server I can see the shards/replicas
>> of the existing collection, and can even query it, even though this new
>> server has no cores on it itself.
>>
>> Is this all expected behaviour?
>>
>> Is there any performance gain with what I have at this precise stage?  The
>> extra server certainly makes it appear I could balance more load/requests,
>> but I guess the queries are just being forwarded on to the servers with the
>> actual data?
>>
>> Am I correct in thinking I can now create a new collection on this host, and
>> begin to build up a new cluster?  and they won't interfere with each other
>> at all?
>>
>> Also, that I'll be able to see both collections when using the admin UI
>> Cloud page on any of the servers in either collection?
>
> I'm confused slightly:
>
> SolrCloud is a (singular) cluster of servers, storing all of its state
> and configuration underneath a single zookeeper path. The cluster
> contains collections. Collections are tied to a particular config set
> within the cluster. Collections are made up of 1 or more shards. Each
> shard has 1 or more replicas, and each replica is a core on some node.
>
> You can add more servers to the cluster, and then create a new
> collection with the same config as an existing collection, but it is
> still part of the same cluster. Of course, you could think of a set of
> servers within a cluster as a "logical" cluster if it just serves a
> particular collection, but "cluster" to me would be all of the servers
> within the same zookeeper tree, because that is where cluster state is
> maintained.
>
> Cheers
>
> Tom




Re: Creating new cluster with existing config in zookeeper

2016-03-23 Thread Erick Erickson
> Is this all expected behaviour?
Yes. Since each Solr node has access to the entire state, an
arbitrary Solr node can figure out where to forward a request for some
collection it doesn't host.
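
As a toy illustration of that idea (not Solr's actual routing code),
the Python sketch below treats the shared state as a plain dictionary
and picks one hosting node per shard.  The collection, shard and node
names are invented, and choosing a replica at random is a simplifying
assumption.

```python
import random

# Toy stand-in for the cluster state that every node can read.
CLUSTER_STATE = {
    "products": {
        "shard1": ["node1:8983_solr", "node2:8983_solr"],
        "shard2": ["node2:8983_solr", "node3:8983_solr"],
    },
}


def plan_query(collection, state, rng=random):
    """Return {shard: node} naming one hosting node for every shard."""
    return {
        shard: rng.choice(nodes)
        for shard, nodes in state[collection].items()
    }


# A node that hosts nothing (say node4) can still build this plan,
# because the plan depends only on the shared state.
plan = plan_query("products", CLUSTER_STATE, random.Random(0))
for shard, node in sorted(plan.items()):
    print(shard, "->", node)
```

The node receiving the request never appears in the plan unless it
hosts a replica, which is why an empty node adds no search capacity.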

>Is there any performance gain with what I have at this precise stage?  The 
>extra server certainly makes it appear i could balance more load/requests, but 
>I guess the queries are just being forwarded on to the servers with the actual 
>data?

Not really. The work of searching is done by the Solr nodes hosting
the collection.

> Am I correct in thinking I can now create a new collection on this
> host, and begin to build up a new cluster?  and they won't interfere
> with each other at all?

Yes

> Also, that I'll be able to see both collections when using the admin
> UI Cloud page on any of the servers in either collection?

Yes



On Wed, Mar 23, 2016 at 8:43 AM, Robert Brown wrote:
> So I set up a new Solr server to point to my existing ZK configs.
>
> When going to the admin UI on this new server I can see the shards/replicas
> of the existing collection, and can even query it, even though this new
> server has no cores on it itself.
>
> Is this all expected behaviour?
>
> Is there any performance gain with what I have at this precise stage?  The
> extra server certainly makes it appear I could balance more load/requests,
> but I guess the queries are just being forwarded on to the servers with the
> actual data?
>
> Am I correct in thinking I can now create a new collection on this host, and
> begin to build up a new cluster?  and they won't interfere with each other
> at all?
>
> Also, that I'll be able to see both collections when using the admin UI
> Cloud page on any of the servers in either collection?
>
> Thanks,
> Rob
>
>
>
> On 03/22/2016 04:47 PM, Erick Erickson wrote:
>>
>> The whole _point_ of configsets is to re-use them in multiple
>> collections, so please do!
>>
>> Best,
>> Erick
>>
>> On Tue, Mar 22, 2016 at 5:38 AM, Robert Brown 
>> wrote:
>>>
>>> Hi,
>>>
>>> Is it safe to create a new cluster but use an existing config set that's
>>> in
>>> zookeeper?  Or does that config set contain the cluster status too?
>>>
>>> I want to (re)-build a cluster from scratch, with a different amount of
>>> shards, but not using shard-splitting.
>>>
>>> Thanks,
>>> Rob
>>>
>


Re: Creating new cluster with existing config in zookeeper

2016-03-23 Thread Tom Evans
On Wed, Mar 23, 2016 at 3:43 PM, Robert Brown wrote:
> So I set up a new Solr server to point to my existing ZK configs.
>
> When going to the admin UI on this new server I can see the shards/replicas
> of the existing collection, and can even query it, even though this new
> server has no cores on it itself.
>
> Is this all expected behaviour?
>
> Is there any performance gain with what I have at this precise stage?  The
> extra server certainly makes it appear I could balance more load/requests,
> but I guess the queries are just being forwarded on to the servers with the
> actual data?
>
> Am I correct in thinking I can now create a new collection on this host, and
> begin to build up a new cluster?  and they won't interfere with each other
> at all?
>
> Also, that I'll be able to see both collections when using the admin UI
> Cloud page on any of the servers in either collection?
>

I'm confused slightly:

SolrCloud is a (singular) cluster of servers, storing all of its state
and configuration underneath a single zookeeper path. The cluster
contains collections. Collections are tied to a particular config set
within the cluster. Collections are made up of 1 or more shards. Each
shard has 1 or more replicas, and each replica is a core on some node.
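
A minimal Python sketch of that hierarchy, using dataclasses.  All
class, field and instance names here are illustrative, and it assumes
each replica is backed by one core on one node.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    core_name: str  # the core backing this replica
    node: str       # the server that hosts the core


@dataclass
class Shard:
    name: str
    replicas: list = field(default_factory=list)


@dataclass
class Collection:
    name: str
    config_set: str  # several collections may name the same config set
    shards: list = field(default_factory=list)


@dataclass
class Cluster:
    zk_path: str  # the single zookeeper path holding all cluster state
    collections: dict = field(default_factory=dict)

    def nodes(self):
        """All servers hosting at least one core of any collection."""
        return {
            r.node
            for c in self.collections.values()
            for s in c.shards
            for r in s.replicas
        }


cluster = Cluster(zk_path="/solr")
cluster.collections["products"] = Collection(
    "products", "shared_conf",
    [Shard("shard1", [Replica("products_shard1_replica1", "node1"),
                      Replica("products_shard1_replica2", "node2")])],
)
cluster.collections["products_v2"] = Collection(
    "products_v2", "shared_conf",
    [Shard("shard1", [Replica("products_v2_shard1_replica1", "node3")]),
     Shard("shard2", [Replica("products_v2_shard2_replica1", "node3")])],
)
print(sorted(cluster.nodes()))
```

Both collections point at the same config set yet live under one
cluster object, which mirrors the point that a second collection is not
a second cluster.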

You can add more servers to the cluster, and then create a new
collection with the same config as an existing collection, but it is
still part of the same cluster. Of course, you could think of a set of
servers within a cluster as a "logical" cluster if it just serves a
particular collection, but "cluster" to me would be all of the servers
within the same zookeeper tree, because that is where cluster state is
maintained.

Cheers

Tom


Re: Creating new cluster with existing config in zookeeper

2016-03-23 Thread Shawn Heisey
On 3/22/2016 11:16 AM, Robert Brown wrote:
> Thanks Erick and Shawn, a "collection" is indeed what I meant.
>
> I was under the impression the entire Tree view in the admin GUI was
> showing everything in ZK, including things like
> "collections/name/state.json", not just the /configs directory.
>
> The solr.xml file is in there too, isn't it? (I added it to ZK as per
> the docs.) It's just a bit confusing to see some files/directories from
> ZK, and some not.

Info you may already know:  When you create a new collection using the
Collections API, you can give it the name of an existing config with the
collection.configName parameter.  Changes to that config will affect
all collections using it.  You'll usually need to reload the collection
so it re-reads the config from zookeeper.
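
A rough Python sketch of the two requests involved, built as URLs but
not sent.  The parameter is spelled collection.configName in the
Collections API; the host, collection name, config name and shard
counts are hypothetical placeholders.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, params):
    """Build (but do not send) a Collections API request URL."""
    return f"{base_url}/admin/collections?{urlencode(params)}"


BASE = "http://localhost:8983/solr"  # assumed Solr base URL

# New collection with its own shard count, reusing an uploaded config.
create_url = collections_api_url(BASE, {
    "action": "CREATE",
    "name": "products_v2",
    "numShards": 4,
    "replicationFactor": 2,
    "collection.configName": "products_conf",
})

# After changing the shared config, reload each collection that uses it.
reload_url = collections_api_url(BASE, {
    "action": "RELOAD",
    "name": "products_v2",
})

print(create_url)
print(reload_url)
```

This covers rebuilding with a different number of shards without shard
splitting: the new collection gets its own numShards while sharing the
config already in zookeeper.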

The "Tree" view does show you the entire zookeeper database from Solr's
point of view.  This is the common information available to every Solr
server in your entire cloud.  It contains information about every
collection, all of your uploaded configs, and a few other things.

SolrCloud still requires a fair amount of information at the core level
that isn't stored in zookeeper.  The index is too big, and we haven't
gotten around to the rest of it.  We do want to get to the point where
everything relevant for a SolrCloud core (except the Lucene index) is
stored in zookeeper -- for internal discussions, we call this "ZK as truth":

https://issues.apache.org/jira/browse/SOLR-7269

Thanks,
Shawn



Re: Creating new cluster with existing config in zookeeper

2016-03-23 Thread Robert Brown

So I set up a new Solr server to point to my existing ZK configs.

When going to the admin UI on this new server I can see the
shards/replicas of the existing collection, and can even query it, even
though this new server has no cores on it itself.

Is this all expected behaviour?

Is there any performance gain with what I have at this precise stage?
The extra server certainly makes it appear I could balance more
load/requests, but I guess the queries are just being forwarded on to
the servers with the actual data?

Am I correct in thinking I can now create a new collection on this host,
and begin to build up a new cluster?  and they won't interfere with each
other at all?


Also, that I'll be able to see both collections when using the admin UI 
Cloud page on any of the servers in either collection?


Thanks,
Rob



On 03/22/2016 04:47 PM, Erick Erickson wrote:
> The whole _point_ of configsets is to re-use them in multiple
> collections, so please do!
>
> Best,
> Erick
>
> On Tue, Mar 22, 2016 at 5:38 AM, Robert Brown wrote:
>> Hi,
>>
>> Is it safe to create a new cluster but use an existing config set that's in
>> zookeeper?  Or does that config set contain the cluster status too?
>>
>> I want to (re)-build a cluster from scratch, with a different amount of
>> shards, but not using shard-splitting.
>>
>> Thanks,
>> Rob





Re: Creating new cluster with existing config in zookeeper

2016-03-22 Thread Robert Brown

Thanks Erick and Shawn, a "collection" is indeed what I meant.

I was under the impression the entire Tree view in the admin GUI was 
showing everything in ZK, including things like 
"collections/name/state.json", not just the /configs directory.


The solr.xml file is in there too, isn't it? (I added it to ZK as per
the docs.) It's just a bit confusing to see some files/directories from
ZK, and some not.


Thanks for any more insight.



On 03/22/2016 04:57 PM, Shawn Heisey wrote:
> On 3/22/2016 6:38 AM, Robert Brown wrote:
>> Is it safe to create a new cluster but use an existing config set
>> that's in zookeeper?  Or does that config set contain the cluster
>> status too?
>>
>> I want to (re)-build a cluster from scratch, with a different amount
>> of shards, but not using shard-splitting.
>
> When you say "cluster" what exactly do you mean?
>
> To me, "cluster" in a Solr context means "a bunch of Solr servers."
> If this is what you mean, there is nothing built in to copy things
> from an existing cluster.  You *can* run multiple SolrCloud clusters
> on one Zookeeper ensemble.
>
> If you are actually talking about a *collection* when you say
> "cluster", then what Erick said is 100% correct.
>
> Thanks,
> Shawn





Re: Creating new cluster with existing config in zookeeper

2016-03-22 Thread Shawn Heisey

On 3/22/2016 6:38 AM, Robert Brown wrote:
> Is it safe to create a new cluster but use an existing config set
> that's in zookeeper?  Or does that config set contain the cluster
> status too?
>
> I want to (re)-build a cluster from scratch, with a different amount
> of shards, but not using shard-splitting.


When you say "cluster" what exactly do you mean?

To me, "cluster" in a Solr context means "a bunch of Solr servers."  If 
this is what you mean, there is nothing built in to copy things from an 
existing cluster.  You *can* run multiple SolrCloud clusters on one 
Zookeeper ensemble.


If you are actually talking about a *collection* when you say "cluster", 
then what Erick said is 100% correct.


Thanks,
Shawn



Re: Creating new cluster with existing config in zookeeper

2016-03-22 Thread Erick Erickson
The whole _point_ of configsets is to re-use them in multiple
collections, so please do!

Best,
Erick

On Tue, Mar 22, 2016 at 5:38 AM, Robert Brown  wrote:
> Hi,
>
> Is it safe to create a new cluster but use an existing config set that's in
> zookeeper?  Or does that config set contain the cluster status too?
>
> I want to (re)-build a cluster from scratch, with a different amount of
> shards, but not using shard-splitting.
>
> Thanks,
> Rob
>


Creating new cluster with existing config in zookeeper

2016-03-22 Thread Robert Brown

Hi,

Is it safe to create a new cluster but use an existing config set that's 
in zookeeper?  Or does that config set contain the cluster status too?


I want to (re)-build a cluster from scratch, with a different amount of 
shards, but not using shard-splitting.


Thanks,
Rob