I think my question is now simpler. I believe the problem below was caused by the very first startup of the 'ldwa01' collection: the 'ldwa01cfg' ZooKeeper config name was bootstrapped without specifying the number of shards (which therefore defaulted to 1).

So, how can I change the number of shards for an existing collection/ZooKeeper config name, especially when the ZooKeeper ensemble in question is the production one and supports other Solr collections that I do not want to interrupt? (Which I think means I can't simply delete clusterstate.json and restart the ZooKeepers, as that would also lose the other collections' information.)
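One possible route (a sketch, with assumptions: the host/port is taken from the node addresses mentioned later in this thread, and the replicationFactor/maxShardsPerNode values are illustrative). Solr's Collections API can drop and recreate a single collection; DELETE removes only that collection's entry from clusterstate.json, leaving the other collections untouched, though ldwa01 would need reindexing afterwards:

  curl 'http://192.168.45.17:8983/solr/admin/collections?action=DELETE&name=ldwa01'
  curl 'http://192.168.45.17:8983/solr/admin/collections?action=CREATE&name=ldwa01&numShards=24&collection.configName=ldwa01cfg&replicationFactor=2&maxShardsPerNode=2'

This assumes the 'ldwa01cfg' config set is already in ZooKeeper, which the bootstrap_confdir flag shown below takes care of.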
Thanks in advance, Gil

-----Original Message-----
From: Hoggarth, Gil [mailto:gil.hogga...@bl.uk]
Sent: 24 October 2013 10:13
To: solr-user@lucene.apache.org
Subject: RE: New shard leaders or existing shard replicas depends on zookeeper?

Absolutely, the scenario I'm seeing does _sound_ like I've not specified the number of shards, but I think I have. The evidence:

- -DnumShards=24 is defined within the /etc/sysconfig/solrnode* files
- -DnumShards=24 is seen on each 'ps' line (two nodes listed here):

  tomcat 26135 1 5 09:51 ? 00:00:22 /opt/java/bin/java -Djava.util.logging.config.file=/opt/tomcat_instances/solrnode1/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode1 -Duser.language=en -Duser.country=uk -Dbootstrap_confdir=/opt/solrnode1/ldwa01/conf -Dcollection.configName=ldwa01cfg -DnumShards=24 -Dsolr.data.dir=/opt/data/solrnode1/ldwa01/data -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath /opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/opt/tomcat_instances/solrnode1 -Dcatalina.home=/opt/tomcat -Djava.io.tmpdir=/opt/tomcat_instances/solrnode1/tmp org.apache.catalina.startup.Bootstrap start

  tomcat 26225 1 5 09:51 ? 00:00:19 /opt/java/bin/java -Djava.util.logging.config.file=/opt/tomcat_instances/solrnode2/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode2 -Duser.language=en -Duser.country=uk -Dbootstrap_confdir=/opt/solrnode2/ldwa01/conf -Dcollection.configName=ldwa01cfg -DnumShards=24 -Dsolr.data.dir=/opt/data/solrnode2/ldwa01/data -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath /opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/opt/tomcat_instances/solrnode2 -Dcatalina.home=/opt/tomcat -Djava.io.tmpdir=/opt/tomcat_instances/solrnode2/tmp org.apache.catalina.startup.Bootstrap start

- The Solr node dashboard shows "-DnumShards=24" in its list of Args for each node

And yet the ldwa01 nodes are leader and replica of shard 17, and no other shard leaders are created. Moreover, if I change only the ZooKeeper ensemble declarations in /etc/sysconfig/solrnode* to point at the different dev ZooKeeper servers, all 24 leaders are created before any replicas are added.

I can also mention that when I browse the Cloud view I can see both the ldwa01 collection and the ukdomain collection listed, suggesting that this information comes from the ZooKeepers - I assume this is as expected. Also, the correct node addresses (e.g., 192.168.45.17:8984) are listed for ldwa01, but these addresses are also listed as 'Down' in the ukdomain collection (except for :8983, which only shows in the ldwa01 collection).

Any help very gratefully received.
Gil
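One way to verify what the production ensemble has actually recorded for ldwa01 (a sketch; the zkCli.sh location is an assumption):

  /opt/zookeeper/bin/zkCli.sh -server zk01.solr.wa.bl.uk:9983
  get /clusterstate.json
  get /collections/ldwa01

In clusterstate.json, the 'shards' map under 'ldwa01' shows how many shards the collection was created with, and /collections/ldwa01 shows which configName it is linked to. numShards is only consulted when a collection is first created, so a shard layout already recorded in ZooKeeper wins over the -DnumShards=24 startup argument.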
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 23 October 2013 18:50
To: solr-user@lucene.apache.org
Subject: Re: New shard leaders or existing shard replicas depends on zookeeper?

My first impulse would be to ask how you created the collection. It sure _sounds_ like you didn't specify 24 shards and thus have only a single shard: one leader and 23 replicas....

bq: ...to point to the zookeeper ensemble also used for the ukdomain collection...

so my guess is that this ZK ensemble has the ldwa01 collection defined as having only one shard.... I admit I pretty much skimmed your post though...

Best,
Erick

On Wed, Oct 23, 2013 at 12:54 PM, Hoggarth, Gil <gil.hogga...@bl.uk> wrote:
> Hi solr-users,
>
> I'm seeing some confusing behaviour in Solr/ZooKeeper and hope you can
> shed some light on what's happening and how I can correct it.
>
> We have two physical servers running automated builds of RedHat 6.4
> and Solr 4.4.0 that host two separate Solr services. The first server
> (called ld01) has 24 shards and hosts a collection called 'ukdomain';
> the second server (ld02) also has 24 shards and hosts a different
> collection called 'ldwa01'. It's important to note that previously
> both of these physical servers provided the 'ukdomain' collection,
> but the ld02 server has been rebuilt for the new collection.
>
> When I start the ldwa01 Solr nodes with their ZooKeeper configuration
> (defined in /etc/sysconfig/solrnode* and with collection.configName
> set to 'ldwa01cfg') pointing to the development ZooKeeper ensemble,
> all nodes initially become shard leaders and then replicas, as I'd
> expect. But if I change the ldwa01 Solr nodes to point to the
> ZooKeeper ensemble also used for the ukdomain collection, all ldwa01
> Solr nodes start on the same shard - that is, the first ldwa01 Solr
> node becomes the shard leader, then every other Solr node becomes a
> replica for this shard. The significant point here is that no other
> ldwa01 shards gain leaders (or replicas).
>
> The ukdomain collection uses a ZooKeeper collection.configName of
> 'ukdomaincfg', and prior to the creation of this ldwa01 service the
> collection.configName 'ldwa01cfg' had never been used. So I'm
> confused about why the ldwa01 service behaves differently when the
> only difference is which ZooKeeper ensemble is used (both ZooKeeper
> ensembles are built by the same automation, using version 3.4.5).
>
> If anyone can explain why this is happening and how I can get the
> ldwa01 services to start correctly using the non-development
> ZooKeeper ensemble, I'd be very grateful! If more information or
> explanation is needed, just ask.
>
> Thanks, Gil
>
> Gil Hoggarth
> Web Archiving Technical Services Engineer
> The British Library, Boston Spa, West Yorkshire, LS23 7BQ
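A closing note on the bootstrap flags visible in the 'ps' output above: -Dbootstrap_confdir and -Dcollection.configName upload and name the config set at startup, while -DnumShards only takes effect if the collection does not yet exist in ZooKeeper - consistent with Erick's guess about why a fresh dev ensemble behaves differently from the production one. An alternative that avoids depending on first-startup behaviour is to upload the config with Solr's cloud-scripts zkcli.sh (path assumed; note this is Solr's zkcli.sh, not ZooKeeper's zkCli.sh):

  /opt/solr/example/cloud-scripts/zkcli.sh -cmd upconfig -zkhost zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl.uk:9983 -confdir /opt/solrnode1/ldwa01/conf -confname ldwa01cfg

and then create the collection explicitly with a Collections API CREATE call like the one sketched near the top of this thread, so the shard count is set once, explicitly, for the whole cluster.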