Hmmm, shouldn't be happening. How sure are you that the upgrade to 4.4 was carried out on all machines?
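
One way to double-check is to ask each node which version it is actually running. Below is a minimal Python sketch, not a tested tool. It assumes each node answers on the system info handler at /solr/admin/info/system?wt=json and reports its version under lucene -> solr-spec-version; verify both against your install. The host names in NODES are placeholders I made up.

```python
import json
from http.client import HTTPException
from urllib.request import urlopen

# Hypothetical node list; replace with your own hosts and ports.
NODES = ["solr1:8983", "solr2:8983"]


def extract_version(payload):
    """Pull the Solr spec version out of a parsed system-info response."""
    return payload.get("lucene", {}).get("solr-spec-version")


def fetch_version(node, timeout=5):
    """Ask one node for its version; returns None if it cannot be reached."""
    # Assumed endpoint: the system info handler (check the path on your setup).
    url = "http://%s/solr/admin/info/system?wt=json" % node
    try:
        with urlopen(url, timeout=timeout) as resp:
            return extract_version(json.loads(resp.read().decode("utf-8")))
    except (OSError, ValueError, HTTPException):
        return None


def find_mismatches(versions, expected_prefix="4.4"):
    """Return the nodes whose reported version lacks the expected prefix."""
    return sorted(node for node, version in versions.items()
                  if version is None or not version.startswith(expected_prefix))


def report(nodes):
    """Print every node that is unreachable or not on the expected version."""
    versions = {node: fetch_version(node) for node in nodes}
    for node in find_mismatches(versions):
        print("%s reports %r" % (node, versions[node]))
```

Calling report(NODES) prints only the nodes that are unreachable or report something other than 4.4.x, so no output means every node answered with the expected version.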

Erick

On Tue, Aug 6, 2013 at 5:23 PM, Joshi, Shital <shital.jo...@gs.com> wrote:

> Machines are definitely up. Solr4 node and zookeeper instance share the
> machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know
> about the zk instances.
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, August 06, 2013 5:03 PM
> To: solr-user@lucene.apache.org
> Subject: Re: external zookeeper with SolrCloud
>
> First off, even 6 ZK instances are overkill, vast overkill. 3 should be
> more than enough.
>
> That aside, however, how are you letting your Solr nodes know about the zk
> machines?
> Is it possible you've pointed some of your Solr nodes at specific ZK
> machines that aren't up when you have this problem? I.e.
> -zkHost=zk1,zk2,zk3....
>
> Best
> Erick
>
>
> On Tue, Aug 6, 2013 at 4:56 PM, Joshi, Shital <shital.jo...@gs.com> wrote:
>
> > Hi,
> >
> > We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes.
> > We have 6 zookeeper instances. We are planning to change to odd number of
> > zookeeper instances.
> >
> > With Solr 4.3.0, if all zookeeper instances are not up, the solr4 node
> > never connects to zookeeper (can't see the admin page) until all zookeeper
> > instances are up and we restart all solr nodes. It was suggested that it
> > could be due this bug https://issues.apache.org/jira/browse/SOLR-4899 and
> > this bug is solved in Solr 4.4
> >
> > We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6
> > zookeeper instances and then brought up all ten Solr4 nodes. We kept seeing
> > this exception in Solr logs:
> >
> > 751395 [main-SendThread] WARN org.apache.zookeeper.ClientCnxn ? Session
> > 0x0 for server null, unexpected error, closing socket connection and
> > attempting reconnect java.net.ConnectException: Connection refused
> >     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> >     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
> >     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> >
> > And after a while saw this exception.
> >
> > INFO - 2013-08-05 22:24:07.582;
> > org.apache.solr.common.cloud.ConnectionManager; Watcher
> > org.apache.solr.common.cloud.ConnectionManager@5140709 name:ZooKeeperConnection Watcher:
> > qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com,
> > qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got
> > event WatchedEvent state:SyncConnected type:None path:null path:null
> > type:None
> > INFO - 2013-08-05 22:24:07.662;
> > org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status
> > change trigger but we are already closed
> > 754311 [main-EventThread] INFO
> > org.apache.solr.common.cloud.ConnectionManager ? Client->ZooKeeper status
> > change trigger but we are already closed
> >
> > We brought up all zookeeper instances but the cloud never came up until
> > all solr nodes were restarted. Do we need to change any settings? After
> > weekend reboot, all zookeeper instances come up one by one. While zookeeper
> > instances are coming up solr nodes are also getting started. With this
> > issue, we have to put checks to make sure all zookeeper instances are up
> > before we bring up any solr node.
> >
> > Thanks!!
> >
> > -----Original Message-----
> > From: Mark Miller [mailto:markrmil...@gmail.com]
> > Sent: Tuesday, June 11, 2013 10:42 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: external zookeeper with SolrCloud
> >
> >
> > On Jun 11, 2013, at 10:15 AM, "Joshi, Shital" <shital.jo...@gs.com> wrote:
> >
> > > Thanks Mark.
> > >
> > > Looks like this bug is fixed in Solr 4.4. Do you have any date for
> > > official release of 4.4?
> >
> > Looks like it might come out in a couple of weeks.
> >
> > > Is there any instruction available on how to build Solr 4.4 from SVN
> > > repository?
> >
> > It's java, so it's pretty easy - you might find some help here:
> > http://wiki.apache.org/solr/HowToContribute
> >
> > - Mark
> >
> > >
> > > -----Original Message-----
> > > From: Mark Miller [mailto:markrmil...@gmail.com]
> > > Sent: Monday, June 10, 2013 8:05 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: external zookeeper with SolrCloud
> > >
> > > This might be https://issues.apache.org/jira/browse/SOLR-4899
> > >
> > > - Mark
> > >
> > > On Jun 10, 2013, at 5:59 PM, "Joshi, Shital" <shital.jo...@gs.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> We're setting up 5 shard SolrCloud with external zoo keeper. When we
> > >> bring up Solr nodes while the zookeeper instance is not up and running,
> > >> we see this error in Solr logs.
> > >>
> > >> java.net.ConnectException: Connection refused
> > >>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> > >>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> > >>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
> > >>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> > >>
> > >> INFO - 2013-06-10 15:03:35.422;
> > >> org.apache.solr.common.cloud.ConnectionManager; Watcher 592147
> > >> [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ?
> > >> Watcher org.apache.solr.common.cloud.ConnectionManager@530d0eae name:ZooKeeperConnection
> > >> Watcher: ................. got event WatchedEvent
> > >> state:SyncConnected type:None path:null path:null type:None
> > >>
> > >> INFO - 2013-06-10 15:03:35.423;
> > >> org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status
> > >> change trigger but we are already closed
> > >>
> > >> 592148 [main-EventThread] INFO
> > >> org.apache.solr.common.cloud.ConnectionManager ? Client->ZooKeeper status
> > >> change trigger but we are already closed
> > >>
> > >> After we bring up zookeeper instance, the node never connects to
> > >> zookeeper and we can't see the solr admin page, until we restart the node.
> > >>
> > >> Does the zookeeper instance has to be up when we bring up Solr node?
> > >> That's not what the documentation say though.
> > >>
> > >> Thanks.
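
On the startup-ordering workaround mentioned in the thread above (checking ZooKeeper before bringing up any Solr node), here is a rough Python sketch of such a gate. It is only an illustration under a few assumptions: the ensemble is given as a zkHost-style string, the servers answer ZooKeeper's "ruok" four-letter command (newer ZooKeeper releases may need it whitelisted), and waiting for a majority is enough. A majority is floor(n / 2) + 1, so 4 of 6 instances already counts. All function names are made up for this example.

```python
import socket
import time


def parse_zk_hosts(zk_host, default_port=2181):
    """Split a zkHost-style string into (host, port) pairs, ignoring any chroot."""
    pairs = []
    for entry in zk_host.split("/", 1)[0].split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        pairs.append((host, int(port) if port else default_port))
    return pairs


def quorum_size(ensemble_size):
    """Smallest majority of an ensemble: floor(n / 2) + 1."""
    return ensemble_size // 2 + 1


def is_serving(host, port, timeout=2.0):
    """Send the 'ruok' four-letter command; True if the server answers 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(4) == b"imok"
    except OSError:
        return False


def wait_for_quorum(zk_host, probe=is_serving, attempts=60, delay=5.0):
    """Poll until a majority of the listed servers respond, or give up."""
    servers = parse_zk_hosts(zk_host)
    needed = quorum_size(len(servers))
    for _ in range(attempts):
        up = sum(1 for host, port in servers if probe(host, port))
        if up >= needed:
            return True
        time.sleep(delay)
    return False
```

A start script could call wait_for_quorum("zk1,zk2,zk3,zk4,zk5") and launch Solr only when it returns True. Note that "imok" only says the server process is running, not that the ensemble has elected a leader, so treat this as a coarse check. To require every instance instead of a majority, compare against len(servers).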