Hi Gwen,

I have explored and tested this approach in the past. It does not work, for two reasons: (A) the ZKClient implementation and (B) the JVM's DNS caching behavior.

A. The ZkConnection [1] managed by ZKClient uses a legacy constructor of org.apache.zookeeper.ZooKeeper [2]. The ZooKeeper instance it creates relies on a StaticHostProvider [3]. This host provider implementation resolves the DNS names when it is instantiated. So as soon as the Kafka broker creates its ZKClient instance, all server addresses are resolved, and the corresponding InetSocketAddress instances keep those IPs for their lifetime. :(

I believe the right thing to do would be for ZKClient to use a custom HostProvider implementation that creates a new InetSocketAddress instance on each invocation of `HostProvider#next()` [4] (therefore resolving the address again). You would think this is enough, but no, because the JVM itself caches DNS lookups.

B. When an InetSocketAddress instance resolves a DNS name, the JVM caches the result! So even if a dynamic HostProvider implementation is used, the JVM might return a cached value. The default TTL is implementation specific. If I remember correctly, the Oracle JVM caches them forever [5]. So you must also configure the Kafka JVM correctly.

Hope it helps,
Alexis

[1] https://github.com/sgroschupf/zkclient/blob/ec77080a5d7a5d920fa0e8ea5bd5119fb02a06f1/src/main/java/org/I0Itec/zkclient/ZkConnection.java#L69
[2] https://github.com/apache/zookeeper/blob/a0fcb8ff6c2eece8804ca6c009c175cf8a86335d/src/java/main/org/apache/zookeeper/ZooKeeper.java#L1210
[3] https://github.com/apache/zookeeper/blob/a0fcb8ff6c2eece8804ca6c009c175cf8a86335d/src/java/main/org/apache/zookeeper/client/StaticHostProvider.java
[4] https://github.com/apache/zookeeper/blob/a0fcb8ff6c2eece8804ca6c009c175cf8a86335d/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1071
[5] http://docs.oracle.com/javase/8/docs/technotes/guides/net/properties.html

*`networkaddress.cache.ttl`*
Specified in java.security to indicate the caching policy for successful name lookups from the name service. The value is specified as integer to indicate the number of seconds to cache the successful lookup.
A value of -1 indicates "cache forever". The default behavior is to cache forever when a security manager is installed, and to cache for an implementation specific period of time, when a security manager is not installed. See also `networkaddress.cache.negative.ttl`.

On Wed, Aug 3, 2016 at 9:45 AM Gwen Shapira <g...@confluent.io> wrote:

> Can you define a DNS name that round-robins to multiple IP addresses?
> This way ZKClient will cache the name and you can rotate IPs behind
> the scenes with no issues?
>
> On Wed, Aug 3, 2016 at 7:22 AM, Zuber <objectsp...@gmail.com> wrote:
> > Hello –
> >
> > We are planning to use Kafka as Event Store in a system which is being
> > built using event sourcing design approach.
> > Here is how we deployed the cluster in AWS to verify HA in the cloud (in
> > our POC we only had 1 topic with 1 partition and 3 replication factor) -
> > 1) 3 ZK servers running in different AZs (managed by Auto Scaling Group)
> > 2) 3 Kafka brokers EC2 running in different AZs (managed by Auto Scaling Group)
> > 3) Kafka logs are stored in EBS volumes
> > 4) A type addresses are defined for all ZK servers & Kafka brokers in Route53
> > EC2 instance registers its IP for corresponding A type address (in
> > Route53) on startup
> >
> > But due a bug in ZKClient used by Kafka broker which caches ZK IP
> > forever, I don’t see any other option other than bouncing all brokers.
> >
> > One of the Netflix presentation (following links) mentions about the
> > issue as well as couple of ZK JIRA defects but I haven’t found any concrete
> > solution yet.
> > I would really appreciate any help in this regard.
> >
> > http://image.slidesharecdn.com/netflix-kafka-150325105558-conversion-gate01/95/netflix-data-pipeline-with-kafka-36-638.jpg?cb=1427281139
> > https://issues.apache.org/jira/browse/ZOOKEEPER-338
> > https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> > http://grokbase.com/t/kafka/users/131x67h1bt/zookeeper-caching-dns-entries
> >
> > Thanks,
> > Zuber
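
P.S. To make points A and B concrete, here is a minimal Java sketch. It is not the ZKClient or ZooKeeper code: the class name ReResolvingHosts and the method limitDnsCache are made up for illustration, and the class deliberately does not implement the real HostProvider interface. It only shows the idea of keeping the host names unresolved and building a new InetSocketAddress on every next() call. It assumes a plain "host:port,host:port" connect string without a chroot suffix, and falls back to port 2181 when none is given.

```java
import java.net.InetSocketAddress;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: stores host names unresolved and looks one up on every next() call. */
final class ReResolvingHosts {
    private final List<InetSocketAddress> unresolved = new ArrayList<>();
    private int index = -1;

    /** Assumes "host1:2181,host2:2181" with no chroot suffix. */
    ReResolvingHosts(String connectString) {
        for (String hostPort : connectString.split(",")) {
            String[] parts = hostPort.trim().split(":");
            int port = parts.length > 1 ? Integer.parseInt(parts[1]) : 2181;
            // createUnresolved keeps only the name; no DNS lookup happens here.
            unresolved.add(InetSocketAddress.createUnresolved(parts[0], port));
        }
    }

    int size() {
        return unresolved.size();
    }

    /** Round-robins over the hosts; the new InetSocketAddress attempts a lookup each time. */
    synchronized InetSocketAddress next() {
        index = (index + 1) % unresolved.size();
        InetSocketAddress template = unresolved.get(index);
        return new InetSocketAddress(template.getHostString(), template.getPort());
    }

    /** Point B: bound the JVM-wide cache for successful lookups (value in seconds). */
    static void limitDnsCache(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }
}
```

Usage would look roughly like this: call ReResolvingHosts.limitDnsCache(30) early in startup (30 is an arbitrary example value), then ask the provider for next() before each connection attempt. As far as I know the JVM reads networkaddress.cache.ttl only once, when its address cache is first initialized, so setting it after the first lookup may have no effect. Without the TTL change, next() can still get a cached IP back from the JVM, which is exactly point B.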