Re: Is fetching from in-sync replicas possible?
That's right - it should not help significantly assuming even distribution of leaders and even distribution of partition volume (average inbound messages/sec).

Theo's use-case is a bit different though in which you want to avoid cross-zone consumer reads especially if you have a high fan-out in number of consumers.

On Wed, May 27, 2015 at 05:56:56PM +, Aditya Auradkar wrote:
> Is that necessarily the case? On a cluster hosting partitions, assuming the leaders are evenly distributed, every node should receive a roughly equal share of the traffic. It does help a lot when the consumer throughput of a single partition exceeds the capacity of a single leader but at that point the topic ideally needs more partitions.
> Aditya
>
> From: James Cheng [jch...@tivo.com]
> Sent: Wednesday, May 27, 2015 10:50 AM
> To: users@kafka.apache.org
> Subject: Re: Is fetching from in-sync replicas possible?
>
> On May 26, 2015, at 1:44 PM, Joel Koshy jjkosh...@gmail.com wrote:
>>> Apologies if this question has been asked before. If I understand things correctly a client can only fetch from the leader of a partition, not from an (in-sync) replica. I have a use case where it would be very beneficial if it were possible to fetch from a replica instead of just the leader, and I wonder why it is not allowed? Are there any consistency problems with allowing it, for example? Is there any way to configure Kafka to allow it?
>> Yes this should be possible. I don't think there are any consistency issues (barring any bugs) since we never expose past the high-watermark and the follower HW is strictly <= leader HW. Can you file a jira for this?
> Wouldn't this allow Kafka to scale to handle a lot more consumer traffic? Currently, consumers all have to read from the leader, which means that the network/disk bandwidth of a particular leader is the bottleneck. If consumers could read from in-sync replicas, then a single node no longer is the bottleneck for reads. You could scale out your read capacity as far as you want.
> -James
>>> The use case is a Kafka cluster running in EC2 across three availability zones.
>> Out of curiosity - what's the typical latency (distribution) you see between zones?
>> Joel
Re: Is fetching from in-sync replicas possible?
>> Out of curiosity - what's the typical latency (distribution) you see between zones?
> Unfortunately I don't have any good numbers on that. Since we're publishing both in the same AZ and to other AZs the latency metrics reflect both. If I figure out a good way to measure this I will report back.

Thanks - I was wondering if you had done some simple ping/traceroute tests. It's just that with replicas in different zones the end-to-end latency from producer to consumer will be correspondingly higher. Does your zookeeper setup span zones as well?

> On Tue, May 26, 2015 at 10:44 PM, Joel Koshy jjkosh...@gmail.com wrote:
>>> Apologies if this question has been asked before. If I understand things correctly a client can only fetch from the leader of a partition, not from an (in-sync) replica. I have a use case where it would be very beneficial if it were possible to fetch from a replica instead of just the leader, and I wonder why it is not allowed? Are there any consistency problems with allowing it, for example? Is there any way to configure Kafka to allow it?
>> Yes this should be possible. I don't think there are any consistency issues (barring any bugs) since we never expose past the high-watermark and the follower HW is strictly <= leader HW. Can you file a jira for this?
>>> The use case is a Kafka cluster running in EC2 across three availability zones.
>> Out of curiosity - what's the typical latency (distribution) you see between zones?
>> Joel
Re: Is fetching from in-sync replicas possible?
On May 27, 2015, at 11:23 AM, Joel Koshy jjkosh...@gmail.com wrote:
> That's right - it should not help significantly assuming even distribution of leaders and even distribution of partition volume (average inbound messages/sec).

Aditya, Joel,

Oh, right, that makes sense. If I had a 10 partition topic across 10 nodes where each leader handles 1/10th of the consumer traffic for that topic, I could change that and instead have 100 partition topic across 100 nodes, and then each leader would only have to handle 1/100th of the consumer traffic for that topic.

-James

> Theo's use-case is a bit different though in which you want to avoid cross-zone consumer reads especially if you have a high fan-out in number of consumers.
>
> On Wed, May 27, 2015 at 05:56:56PM +, Aditya Auradkar wrote:
>> Is that necessarily the case? On a cluster hosting partitions, assuming the leaders are evenly distributed, every node should receive a roughly equal share of the traffic. It does help a lot when the consumer throughput of a single partition exceeds the capacity of a single leader but at that point the topic ideally needs more partitions.
>> Aditya
>>
>> From: James Cheng [jch...@tivo.com]
>> Sent: Wednesday, May 27, 2015 10:50 AM
>> To: users@kafka.apache.org
>> Subject: Re: Is fetching from in-sync replicas possible?
>>
>> On May 26, 2015, at 1:44 PM, Joel Koshy jjkosh...@gmail.com wrote:
>>>> Apologies if this question has been asked before. If I understand things correctly a client can only fetch from the leader of a partition, not from an (in-sync) replica. I have a use case where it would be very beneficial if it were possible to fetch from a replica instead of just the leader, and I wonder why it is not allowed? Are there any consistency problems with allowing it, for example? Is there any way to configure Kafka to allow it?
>>> Yes this should be possible. I don't think there are any consistency issues (barring any bugs) since we never expose past the high-watermark and the follower HW is strictly <= leader HW. Can you file a jira for this?
>> Wouldn't this allow Kafka to scale to handle a lot more consumer traffic? Currently, consumers all have to read from the leader, which means that the network/disk bandwidth of a particular leader is the bottleneck. If consumers could read from in-sync replicas, then a single node no longer is the bottleneck for reads. You could scale out your read capacity as far as you want.
>> -James
>>>> The use case is a Kafka cluster running in EC2 across three availability zones.
>>> Out of curiosity - what's the typical latency (distribution) you see between zones?
>>> Joel
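
The 10-partition versus 100-partition comparison in this message can be written out as a small calculation. This is only a sketch under the assumptions stated in the thread (leaders spread evenly over brokers, equal traffic on every partition); the function name `busiest_leader_share` is made up for illustration and is not part of Kafka.

```python
from fractions import Fraction


def busiest_leader_share(partitions: int, brokers: int) -> Fraction:
    """Share of a topic's consumer traffic served by the busiest broker.

    Assumes (hypothetically) that partition leaders are spread as evenly
    as possible over the brokers and that every partition carries the
    same consumer traffic.
    """
    if partitions <= 0 or brokers <= 0:
        raise ValueError("partitions and brokers must be positive")
    # Ceiling division: the busiest broker leads this many partitions.
    leaders_on_busiest = -(-partitions // brokers)
    return Fraction(leaders_on_busiest, partitions)


if __name__ == "__main__":
    for partitions, brokers in [(10, 10), (100, 100), (100, 10)]:
        share = busiest_leader_share(partitions, brokers)
        print(f"{partitions} partitions on {brokers} brokers: {share}")
```

Under this model, adding partitions without adding brokers leaves the per-broker share unchanged, which matches the point that the gain comes from spreading leaders over more nodes.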
RE: Is fetching from in-sync replicas possible?
Is that necessarily the case? On a cluster hosting partitions, assuming the leaders are evenly distributed, every node should receive a roughly equal share of the traffic. It does help a lot when the consumer throughput of a single partition exceeds the capacity of a single leader but at that point the topic ideally needs more partitions.

Aditya

From: James Cheng [jch...@tivo.com]
Sent: Wednesday, May 27, 2015 10:50 AM
To: users@kafka.apache.org
Subject: Re: Is fetching from in-sync replicas possible?

On May 26, 2015, at 1:44 PM, Joel Koshy jjkosh...@gmail.com wrote:
>> Apologies if this question has been asked before. If I understand things correctly a client can only fetch from the leader of a partition, not from an (in-sync) replica. I have a use case where it would be very beneficial if it were possible to fetch from a replica instead of just the leader, and I wonder why it is not allowed? Are there any consistency problems with allowing it, for example? Is there any way to configure Kafka to allow it?
> Yes this should be possible. I don't think there are any consistency issues (barring any bugs) since we never expose past the high-watermark and the follower HW is strictly <= leader HW. Can you file a jira for this?

Wouldn't this allow Kafka to scale to handle a lot more consumer traffic? Currently, consumers all have to read from the leader, which means that the network/disk bandwidth of a particular leader is the bottleneck. If consumers could read from in-sync replicas, then a single node no longer is the bottleneck for reads. You could scale out your read capacity as far as you want.

-James

>> The use case is a Kafka cluster running in EC2 across three availability zones.
> Out of curiosity - what's the typical latency (distribution) you see between zones?
> Joel
Re: Is fetching from in-sync replicas possible?
Thank you for your response Joel,

> Can you file a jira for this?

I've created https://issues.apache.org/jira/browse/KAFKA-2225

> Out of curiosity - what's the typical latency (distribution) you see between zones?

Unfortunately I don't have any good numbers on that. Since we're publishing both in the same AZ and to other AZs the latency metrics reflect both. If I figure out a good way to measure this I will report back.

T#

On Tue, May 26, 2015 at 10:44 PM, Joel Koshy jjkosh...@gmail.com wrote:
>> Apologies if this question has been asked before. If I understand things correctly a client can only fetch from the leader of a partition, not from an (in-sync) replica. I have a use case where it would be very beneficial if it were possible to fetch from a replica instead of just the leader, and I wonder why it is not allowed? Are there any consistency problems with allowing it, for example? Is there any way to configure Kafka to allow it?
> Yes this should be possible. I don't think there are any consistency issues (barring any bugs) since we never expose past the high-watermark and the follower HW is strictly <= leader HW. Can you file a jira for this?
>> The use case is a Kafka cluster running in EC2 across three availability zones.
> Out of curiosity - what's the typical latency (distribution) you see between zones?
> Joel
Re: Is fetching from in-sync replicas possible?
On May 26, 2015, at 1:44 PM, Joel Koshy jjkosh...@gmail.com wrote:
>> Apologies if this question has been asked before. If I understand things correctly a client can only fetch from the leader of a partition, not from an (in-sync) replica. I have a use case where it would be very beneficial if it were possible to fetch from a replica instead of just the leader, and I wonder why it is not allowed? Are there any consistency problems with allowing it, for example? Is there any way to configure Kafka to allow it?
> Yes this should be possible. I don't think there are any consistency issues (barring any bugs) since we never expose past the high-watermark and the follower HW is strictly <= leader HW. Can you file a jira for this?

Wouldn't this allow Kafka to scale to handle a lot more consumer traffic? Currently, consumers all have to read from the leader, which means that the network/disk bandwidth of a particular leader is the bottleneck. If consumers could read from in-sync replicas, then a single node no longer is the bottleneck for reads. You could scale out your read capacity as far as you want.

-James

>> The use case is a Kafka cluster running in EC2 across three availability zones.
> Out of curiosity - what's the typical latency (distribution) you see between zones?
> Joel
Is fetching from in-sync replicas possible?
Hello,

Apologies if this question has been asked before. If I understand things correctly a client can only fetch from the leader of a partition, not from an (in-sync) replica. I have a use case where it would be very beneficial if it were possible to fetch from a replica instead of just the leader, and I wonder why it is not allowed? Are there any consistency problems with allowing it, for example? Is there any way to configure Kafka to allow it?

The use case is a Kafka cluster running in EC2 across three availability zones. I would like clients to only fetch from a replica in the same zone to avoid paying for more inter-zone traffic than absolutely necessary. The producer side and the replication will still send traffic across zones, but it would be nice to be able to avoid another zone crossing on the consumer side. We currently have a Kafka cluster that costs about the same in bandwidth cost as running the instances, so every little bit helps.

yours,
Theo
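
To make the inter-zone argument in this message concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that consumers are spread uniformly over the zones, that leaders are too, and that each replica of a partition sits in a different zone. None of these assumptions come from the thread, and `cross_zone_fetch_fraction` is a hypothetical helper, not a Kafka API.

```python
from fractions import Fraction


def cross_zone_fetch_fraction(zones: int,
                              replicas_per_partition: int,
                              fetch_from_replica: bool) -> Fraction:
    """Expected fraction of consumer fetches that cross a zone boundary.

    Illustrative model only: consumers and leaders are uniformly spread
    over the zones, and each replica lives in a distinct zone.
    """
    if zones <= 0 or replicas_per_partition <= 0:
        raise ValueError("zones and replicas_per_partition must be positive")
    if fetch_from_replica:
        # A consumer can read locally whenever its zone holds any replica.
        zones_with_copy = min(replicas_per_partition, zones)
    else:
        # Only the leader's zone can serve reads.
        zones_with_copy = 1
    return Fraction(zones - zones_with_copy, zones)


if __name__ == "__main__":
    print("leader only:  ", cross_zone_fetch_fraction(3, 3, False))
    print("local replica:", cross_zone_fetch_fraction(3, 3, True))
```

With three zones and leader-only fetching, this model puts two thirds of consumer reads across a zone boundary; allowing reads from a same-zone replica would remove that crossing, while producer and replication traffic would still cross zones as described above.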
Re: Is fetching from in-sync replicas possible?
> Apologies if this question has been asked before. If I understand things correctly a client can only fetch from the leader of a partition, not from an (in-sync) replica. I have a use case where it would be very beneficial if it were possible to fetch from a replica instead of just the leader, and I wonder why it is not allowed? Are there any consistency problems with allowing it, for example? Is there any way to configure Kafka to allow it?

Yes this should be possible. I don't think there are any consistency issues (barring any bugs) since we never expose past the high-watermark and the follower HW is strictly <= leader HW. Can you file a jira for this?

> The use case is a Kafka cluster running in EC2 across three availability zones.

Out of curiosity - what's the typical latency (distribution) you see between zones?

Joel
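
The high-watermark argument in this reply can be illustrated with a toy model. This is a deliberately simplified sketch, not Kafka's replication code: `ToyReplica` and `replicate` are hypothetical names, there is a single follower, and a whole replication round happens in one step. It only shows why a consumer reading from the follower would never see more than one reading from the leader.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReplica:
    """Toy partition replica: a log plus a high watermark (HW)."""
    log: list = field(default_factory=list)
    high_watermark: int = 0  # offsets below this are visible to consumers

    def fetch(self, offset: int) -> list:
        """Return only messages in [offset, high_watermark)."""
        return self.log[offset:self.high_watermark]


def replicate(leader: ToyReplica, follower: ToyReplica) -> None:
    """One simplified replication round with a single follower.

    The follower copies the leader's log and adopts the HW the leader
    had at the start of the round; the leader then advances its own HW
    to what the follower now holds. The follower's HW therefore trails
    the leader's and never exceeds it.
    """
    follower.log = list(leader.log)
    follower.high_watermark = min(leader.high_watermark, len(follower.log))
    leader.high_watermark = min(len(leader.log), len(follower.log))


if __name__ == "__main__":
    leader, follower = ToyReplica(), ToyReplica()
    for batch in (["m0", "m1"], ["m2"], []):
        leader.log.extend(batch)
        replicate(leader, follower)
        # The invariant from the reply above: follower HW <= leader HW.
        assert follower.high_watermark <= leader.high_watermark
        print(leader.high_watermark, follower.high_watermark)
```

In this model a follower read can lag behind a leader read, but it is always a prefix of what the leader would expose.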