This may not be a bug in the ZK /server/, but it does seem like a
problem case for client-side software.
If we were able to guarantee the server process was always running, then
we wouldn't ever need more than a one-node ensemble. Suggesting that
clients extract names from zoo.cfg or use numeric addresses makes things
worse rather than better.
I suspect the real issue is that connect strings with multiple names or
addresses are handled differently by /clients/ than a name that resolves
to multiple addresses.
In the Oracle client software, we had to correct such an oversight when
we introduced the "single client access name" (SCAN) to the RAC
database. The SCAN is a DNS name that expands to multiple addresses,
normally on different hosts. The client is expected to get all of the
addresses back when it resolves the name, typically in a pseudo-random
order. If the client fails to connect on the first, it tries the
second, etc. until there are no more (unless retries are specified).
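To make that behavior concrete, here is a minimal sketch in plain Java of
"resolve every address, then try each in turn". It is not the Oracle client
code and not ZooKeeper's; the class and method names (MultiAddressConnect,
resolveAll, connectToAny) are made up for illustration, and retries are
left out.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
public class MultiAddressConnect {

    /** Expand one DNS name into every address it resolves to, in shuffled order. */
    static List<InetSocketAddress> resolveAll(String host, int port)
            throws UnknownHostException {
        List<InetSocketAddress> result = new ArrayList<>();
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            result.add(new InetSocketAddress(addr, port));
        }
        // Shuffle so that clients do not all start with the same server.
        Collections.shuffle(result);
        return result;
    }

    /** Try each candidate in order and return the first socket that connects. */
    static Socket connectToAny(List<InetSocketAddress> candidates, int timeoutMs)
            throws IOException {
        IOException last = new IOException("no addresses to try");
        for (InetSocketAddress candidate : candidates) {
            Socket socket = new Socket();
            try {
                socket.connect(candidate, timeoutMs);
                return socket;
            } catch (IOException e) {
                // This address is down or refusing; remember why and move on.
                last = e;
                socket.close();
            }
        }
        // Every address failed; report the most recent failure.
        throw last;
    }
}
```

A caller would do something like
connectToAny(resolveAll("test-zookeeper.domain.name", 2181), 2000), and a
dead address behind the name costs one failed attempt instead of the whole
connection.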
It is very convenient to not have to configure clients with explicit
names for the server addresses, using a single name to represent the
entire collection. It also makes it possible to add and delete
servers from the group transparently to the clients by manipulating the
DNS entry for the group.
-dB,
Oracle RAC Database and Cluster Infrastructure Architect
On 7/21/2016 1:16 PM, Michael Han wrote:
This does not sound like a ZK bug - the contract with ZooKeeper is that
every IP address resolved from a host DNS name in the connection string
should have a ZK server process running. So in this case either the
'bad' IP should be removed from the record or you can use the IP address
instead of the DNS name in zoo.cfg for the connection string.
On Wed, Jul 20, 2016 at 6:24 PM, 蒋丽诗 <[email protected]> wrote:
Hi,
I am using zookeeper 3.4.6.
I have created A records for "test-zookeeper.domain.name" with 2 IPs behind it.
One has ZooKeeper running, the other does not.
21 Jul 2016 01:12:24,616 [WARN] (main-SendThread)
org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected
error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
21 Jul 2016 01:12:24,724 [ERROR] (main) KafkaProducerConfig: Failed to get
data from zookeeper, as KeeperErrorCode = ConnectionLoss for /brokers/ids
My code:
// ZooKeeper will close the session after 60s
ZooKeeper zk = new ZooKeeper("test-zookeeper.domain.name:2181", 60000, null);
List<String> ids = zk.getChildren("/brokers/ids", false);
=====Some debug I have already done===
ConnectStringParser connectStringParser =
    new ConnectStringParser("test-zookeeper.domain.name:2181");
Collection<InetSocketAddress> serverAddresses =
    connectStringParser.getServerAddresses();
StaticHostProvider test = new StaticHostProvider(serverAddresses);
LOG.info(test.size()); // the result is 2
--
Thanks,
Lishi