I think my question is now simpler, because I believe the problem below
was caused by the very first startup of the 'ldwa01' collection (ZK
config name 'ldwa01cfg') not specifying the number of shards, which
therefore defaulted to 1.

So, how can I change the number of shards for an existing
collection/ZK config name, especially when the ZK ensemble in question
is the production one and supports other Solr collections that I do
not want to interrupt? (I think this rules out simply deleting
clusterstate.json and restarting the ZKs, as that would also lose the
other Solr collections' information.)
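
For what it's worth, my current thinking is to drop and recreate just
the ldwa01 collection through the Collections API, which (as I
understand it) leaves the other collections registered in ZK
untouched. A rough sketch, using one of the ldwa01 node addresses from
my earlier mail below - the replicationFactor value is my guess at
what we want rather than anything taken from the current setup, and
maxShardsPerNode may also need setting if there are fewer live nodes
than cores required:

  curl 'http://192.168.45.17:8983/solr/admin/collections?action=DELETE&name=ldwa01'
  curl 'http://192.168.45.17:8983/solr/admin/collections?action=CREATE&name=ldwa01&numShards=24&replicationFactor=2&collection.configName=ldwa01cfg'

Is that safe to run against the production ensemble, or is there a
better way?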

Thanks in advance, Gil

-----Original Message-----
From: Hoggarth, Gil [mailto:gil.hogga...@bl.uk] 
Sent: 24 October 2013 10:13
To: solr-user@lucene.apache.org
Subject: RE: New shard leaders or existing shard replicas depends on
zookeeper?

Absolutely, the scenario I'm seeing does _sound_ like I've not
specified the number of shards, but I think I have - the evidence is:
- -DnumShards=24 is defined within the /etc/sysconfig/solrnode* files
(see the excerpt after this list)

- -DnumShards=24 is seen on each 'ps' line (two nodes listed here):
"tomcat   26135     1  5 09:51 ?        00:00:22 /opt/java/bin/java
   -Djava.util.logging.config.file=/opt/tomcat_instances/solrnode1/conf/logging.properties
   -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
   -Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode1
   -Duser.language=en -Duser.country=uk
   -Dbootstrap_confdir=/opt/solrnode1/ldwa01/conf
   -Dcollection.configName=ldwa01cfg -DnumShards=24
   -Dsolr.data.dir=/opt/data/solrnode1/ldwa01/data
   -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl.uk:9983
   -Djava.endorsed.dirs=/opt/tomcat/endorsed
   -classpath /opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
   -Dcatalina.base=/opt/tomcat_instances/solrnode1
   -Dcatalina.home=/opt/tomcat
   -Djava.io.tmpdir=/opt/tomcat_instances/solrnode1/tmp
   org.apache.catalina.startup.Bootstrap start
tomcat   26225     1  5 09:51 ?        00:00:19 /opt/java/bin/java
   -Djava.util.logging.config.file=/opt/tomcat_instances/solrnode2/conf/logging.properties
   -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
   -Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode2
   -Duser.language=en -Duser.country=uk
   -Dbootstrap_confdir=/opt/solrnode2/ldwa01/conf
   -Dcollection.configName=ldwa01cfg -DnumShards=24
   -Dsolr.data.dir=/opt/data/solrnode2/ldwa01/data
   -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl.uk:9983
   -Djava.endorsed.dirs=/opt/tomcat/endorsed
   -classpath /opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
   -Dcatalina.base=/opt/tomcat_instances/solrnode2
   -Dcatalina.home=/opt/tomcat
   -Djava.io.tmpdir=/opt/tomcat_instances/solrnode2/tmp
   org.apache.catalina.startup.Bootstrap start"

- The Solr node dashboard shows "-DnumShards=24" in its list of Args for
each node
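
For reference, the relevant fragment of /etc/sysconfig/solrnode1 looks
roughly like this (paraphrased from memory; the JAVA_OPTS variable
name comes from our own init scripts rather than anything
Solr-standard):

  JAVA_OPTS="$JAVA_OPTS -Dbootstrap_confdir=/opt/solrnode1/ldwa01/conf \
    -Dcollection.configName=ldwa01cfg -DnumShards=24 \
    -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl.uk:9983"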

And yet, the ldwa01 nodes become the leader and replicas of shard 17
only, and no other shard leaders are created. Plus, if the only change
I make is to point the ZK ensemble declarations in
/etc/sysconfig/solrnode* at the different dev ZK servers, all 24
leaders are created before any replicas are added.
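
To compare what each ensemble has actually recorded for ldwa01, I can
dump clusterstate.json straight from ZooKeeper, e.g. (the zkCli.sh
path is from our ZooKeeper 3.4.5 install; yours may differ):

  /opt/zookeeper/bin/zkCli.sh -server zk01.solr.wa.bl.uk:9983 get /clusterstate.json

Against the dev ensemble I'd expect this to show 24 shards under
"ldwa01"; my suspicion is that the production ensemble records just
the one.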

I should also mention that when I browse the Cloud view I can see
both the ldwa01 and ukdomain collections listed, which suggests this
information comes from the ZKs - I assume that's expected. Also, the
correct node addresses (e.g., 192.168.45.17:8984) are listed for
ldwa01, but these same addresses are also listed as 'Down' in the
ukdomain collection (except for :8983, which only shows in the ldwa01
collection).

Any help very gratefully received.
Gil

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 23 October 2013 18:50
To: solr-user@lucene.apache.org
Subject: Re: New shard leaders or existing shard replicas depends on
zookeeper?

My first impulse would be to ask how you created the collection. It sure
_sounds_ like you didn't specify 24 shards and thus have only a single
shard, one leader and 23 replicas....

bq: ...to point to the zookeeper ensemble also used for the ukdomain
collection...

so my guess is that this ZK ensemble has the ldwa01 collection defined
as having only one shard....

I admit I pretty much skimmed your post though...

Best,
Erick


On Wed, Oct 23, 2013 at 12:54 PM, Hoggarth, Gil <gil.hogga...@bl.uk>
wrote:

> Hi solr-users,
>
>
>
> I'm seeing some confusing behaviour in Solr/zookeeper and hope you can
> shed some light on what's happening/how I can correct it.
>
>
>
> We have two physical servers running automated builds of RedHat 6.4 
> and Solr 4.4.0 that host two separate Solr services. The first server 
> (called ld01) has 24 shards and hosts a collection called 'ukdomain'; 
> the second server (ld02) also has 24 shards and hosts a different 
> collection called 'ldwa01'. It's evidently important to note that 
> previously both of these physical servers provided the 'ukdomain'
> collection, but the 'ldwa01' server has been rebuilt for the new 
> collection.
>
>
>
> When I start the ldwa01 solr nodes with their zookeeper configuration
> (defined in /etc/sysconfig/solrnode* and with collection.configName as
> 'ldwa01cfg') pointing to the development zookeeper ensemble, all nodes
> initially become shard leaders and then replicas as I'd expect. But if
> I change the ldwa01 solr nodes to point to the zookeeper ensemble also
> used for the ukdomain collection, all ldwa01 solr nodes start on the
> same shard (that is, the first ldwa01 solr node becomes the shard
> leader, then every other solr node becomes a replica for this shard).
> The significant point here is no other ldwa01 shards gain leaders (or
> replicas).
>
>
>
> The ukdomain collection uses a zookeeper collection.configName of
> 'ukdomaincfg', and prior to the creation of this ldwa01 service the
> collection.configName of 'ldwa01cfg' has never previously been used. So
> I'm confused why the ldwa01 service would differ when the only
> difference is which zookeeper ensemble is used (both zookeeper
> ensembles are automatedly built using version 3.4.5).
>
>
>
> If anyone can explain why this is happening and how I can get the
> ldwa01 services to start correctly using the non-development zookeeper
> ensemble, I'd be very grateful! If more information or explanation is
> needed, just ask.
>
>
>
> Thanks, Gil
>
>
>
> Gil Hoggarth
>
> Web Archiving Technical Services Engineer
>
> The British Library, Boston Spa, West Yorkshire, LS23 7BQ
>
>
>
>
