This may not be a bug in the ZK /server/, but it does seem like a problem case for client-side software.

If we could guarantee that the server process was always running, we would never need more than a one-node ensemble. Suggesting that clients extract names from zoo.cfg or use numeric addresses makes things worse rather than better.

I suspect the real issue is that /clients/ handle a connect string with multiple names or addresses differently from a single name that resolves to multiple addresses.

In the Oracle client software, we had to correct such an oversight when we introduced the "single client access name" (SCAN) to the RAC database. The SCAN is a DNS name that expands to multiple addresses, normally on different hosts. The client is expected to get all of the addresses back when it resolves the name, typically in a pseudo-random order. If the client fails to connect on the first, it tries the second, etc. until there are no more (unless retries are specified).
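The try-each-address behavior described above can be sketched in plain Java roughly as follows. This is only an illustration under my own assumptions, not the Oracle client code and not anything in the ZooKeeper client: the class name MultiAddressConnect and the methods resolveAll and connectToAny are hypothetical, and retries are left out.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
class MultiAddressConnect {

    // Expand one DNS name into every address it resolves to.
    public static List<InetSocketAddress> resolveAll(String host, int port) throws IOException {
        List<InetSocketAddress> candidates = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            candidates.add(new InetSocketAddress(address, port));
        }
        // Shuffle so that clients do not all start with the same server.
        Collections.shuffle(candidates);
        return candidates;
    }

    // Try each candidate in order and return the first socket that connects.
    // If none connects, rethrow the last failure seen.
    public static Socket connectToAny(List<InetSocketAddress> candidates, int timeoutMillis)
            throws IOException {
        IOException lastFailure = new IOException("no addresses to try");
        for (InetSocketAddress candidate : candidates) {
            Socket socket = new Socket();
            try {
                socket.connect(candidate, timeoutMillis);
                return socket;
            } catch (IOException e) {
                socket.close();   // release the failed attempt before moving on
                lastFailure = e;  // e.g. "Connection refused" from a dead address
            }
        }
        throw lastFailure;
    }
}
```

With something like this, a call such as connectToAny(resolveAll("test-zookeeper.domain.name", 2181), 5000) would skip an address that refuses the connection and carry on to the next one instead of giving up.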

It is very convenient to not have to configure clients with explicit names for the server addresses, using a single name to represent the entire collection. It also makes it possible to add and delete servers from the group transparently to the clients by manipulating the DNS entry for the group.

-dB,

Oracle RAC Database and Cluster Infrastructure Architect


On 7/21/2016 1:16 PM, Michael Han wrote:
This does not sound like a ZK bug. The contract with ZooKeeper is that every
IP address resolved from the DNS host name in the connection string should
have a ZK server process running. So in this case either the 'bad' IP should
be removed from the record, or you can use the IP address instead of the DNS
name in zoo.cfg for the connection string.

On Wed, Jul 20, 2016 at 6:24 PM, 蒋丽诗 <[email protected]> wrote:

Hi,

I am using zookeeper 3.4.6.
I have created an A record "test-zookeeper.domain.name" with 2 IPs behind it.
One has ZooKeeper running, the other does not.

21 Jul 2016 01:12:24,616 [WARN]  (main-SendThread) org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
21 Jul 2016 01:12:24,724 [ERROR]  (main) KafkaProducerConfig: Failed to get data from zookeeper, as KeeperErrorCode = ConnectionLoss for /brokers/ids

My code:
ZooKeeper zk = new ZooKeeper("test-zookeeper.domain.name:2181", 60000, null); // ZooKeeper will close the session after 60s
List<String> ids = zk.getChildren("/brokers/ids", false);

===== Some debugging I have already done =====
ConnectStringParser connectStringParser = new ConnectStringParser("test-zookeeper.domain.name:2181");
Collection<InetSocketAddress> serverAddresses = connectStringParser.getServerAddresses();
StaticHostProvider test = new StaticHostProvider(serverAddresses);
LOG.info(test.size()); // the result is 2

--
Thanks,
Lishi



