Re: New producer: metadata update problem on 2 Node cluster.
Creating a new consumer instance *does not* solve this problem. Attaching the producer/consumer code that I used for testing.

On Wed, May 6, 2015 at 6:31 AM, Ewen Cheslack-Postava e...@confluent.io wrote: I'm not sure about the old producer behavior in this same failure scenario, but creating a new producer instance would resolve the issue, since it would start with the list of bootstrap nodes and, assuming at least one of them was up, it would be able to fetch up-to-date metadata.

On Tue, May 5, 2015 at 5:32 PM, Jason Rosenberg j...@squareup.com wrote: Can you clarify: is this issue specific to the new producer? With the old producer, we routinely construct a new producer, which makes a fresh metadata request (via a VIP connected to all nodes in the cluster). Would this approach work with the new producer? Jason

On Tue, May 5, 2015 at 1:12 PM, Rahul Jain rahul...@gmail.com wrote: Mayuresh, I was testing this in a development environment and manually brought down a node to simulate this, so the dead node never came back up. My colleague and I were able to consistently see this behaviour several times during the testing.

On 5 May 2015 20:32, Mayuresh Gharat gharatmayures...@gmail.com wrote: I agree that, to find the least loaded node, the producer should fall back to the bootstrap nodes if it's not able to connect to any nodes in the current metadata. That should resolve this. Rahul, I suppose the problem went away because the dead node in your case might have come back up and allowed a metadata update. Can you confirm this? Thanks, Mayuresh

On Tue, May 5, 2015 at 5:10 AM, Rahul Jain rahul...@gmail.com wrote: We observed the exact same error. We're not very clear about the root cause, although it appears to be related to the leastLoadedNode implementation. Interestingly, the problem went away after increasing the value of reconnect.backoff.ms to 1000 ms.

On 29 Apr 2015 00:32, Ewen Cheslack-Postava e...@confluent.io wrote: Ok, all of that makes sense.

The only way to possibly recover from that state is either for K2 to come back up, allowing the metadata refresh to eventually succeed, or to eventually try some other node in the cluster. Reusing the bootstrap nodes is one possibility. Another would be for the client to fetch more metadata than is required for the topics it needs, to ensure it has more nodes to use as options when looking for a node to fetch metadata from. I added your description to KAFKA-1843, although it might also make sense as a separate bug, since fixing it could be considered incremental progress towards resolving 1843.

On Tue, Apr 28, 2015 at 9:18 AM, Manikumar Reddy ku...@nmsworks.co.in wrote: Hi Ewen, thanks for the response. I agree with you; in some cases we should use the bootstrap servers.

"If you have logs at debug level, are you seeing this message in between the connection attempts: Give up sending metadata request since no node is available" Yes, this log appeared a couple of times.

"Also, if you let it continue running, does it recover after the metadata.max.age.ms timeout?" It does not reconnect. It continuously tries to connect to the dead node. -Manikumar

-- Thanks, Ewen

-- -Regards, Mayuresh R. Gharat (862) 250-7125

-- Thanks, Ewen
Re: New producer: metadata update problem on 2 Node cluster.
Sorry, I meant creating a new producer, not consumer. Here's the code. Producer - http://pastebin.com/Kqq1ymCX Consumer - http://pastebin.com/i2Z8PTYB Callback - http://pastebin.com/x253z7bG

As you'll notice, I am creating a new producer for each message, so the bootstrap nodes should be refreshed. I have a single topic (receive.queue) replicated across 3 nodes, and I add all 3 nodes to the bootstrap list. On bringing one of the nodes down, some messages start failing (metadata update timeout error). As I mentioned earlier, the problem goes away simply by setting the reconnect.backoff.ms property to 1000 ms.

On 7 May 2015 23:18, Ewen Cheslack-Postava e...@confluent.io wrote: Rahul, the mailing list filters attachments; you'd have to post the code somewhere else for people to be able to see it. But I don't think anyone suggested that creating a new consumer would fix anything. Creating a new producer *and discarding the old one* basically just makes it start from scratch using the bootstrap nodes, which is why that would allow recovery from that condition. But that's just a workaround. The real issue is that the producer only maintains metadata for the nodes that are replicas for the partitions of the topics the producer sends data to. In some cases this is a small set of servers, and it can get the producer stuck if a node goes offline and it doesn't have any other nodes that it can try to communicate with to get updated metadata (since the topic partitions should have a new leader). Falling back on the original bootstrap servers is one solution to this problem. Another would be to maintain metadata for additional servers, so you always have extra bootstrap nodes in your current metadata set, even if they aren't replicas for any of the topics you're working with. -Ewen

On Thu, May 7, 2015 at 12:06 AM, Rahul Jain rahul...@gmail.com wrote: Creating a new consumer instance *does not* solve this problem. Attaching the producer/consumer code that I used for testing.
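Rahul's setup and workaround can be summarized as a producer config sketch (host names are placeholders; reconnect.backoff.ms raised to 1000 ms as he describes):

```properties
# New (Java) producer config sketch - broker host names are illustrative
bootstrap.servers=node1:9092,node2:9092,node3:9092
# Workaround reported in this thread: raising the reconnect backoff
# made the "metadata update timeout" errors go away
reconnect.backoff.ms=1000
```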
RE: Significance of SimpleConsumer id string
A client id is used to logically identify an application. Ideally, multiple consumers belonging to the same application should use the same client id. More concretely, metrics can be gathered per client, and quotas in the future will be enforced by clientId. Aditya

From: Magnus Vojbacke [magnus.vojba...@digitalroute.com] Sent: Thursday, May 07, 2015 4:44 AM To: users@kafka.apache.org Subject: Significance of SimpleConsumer id string

Hi, The kafka.consumer.SimpleConsumer takes an "id: String" constructor parameter. What is the significance of this id? Are there any consequences or risks associated with using the exact same id for several consumers of the same topic/partition? /Magnus
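Following Aditya's point, the id is the last constructor argument of the old SimpleConsumer; sharing it across an application's consumers is safe and mainly affects how metrics are labeled. A sketch (host, port, timeout, and buffer values are illustrative, not prescribed):

```scala
// Hedged sketch: the same clientId shared by all SimpleConsumers
// of one application, so per-client metrics aggregate sensibly
val consumer = new kafka.consumer.SimpleConsumer(
  "broker1",    // host (placeholder)
  9092,         // port
  30000,        // socket timeout in ms
  64 * 1024,    // receive buffer size in bytes
  "my-app"      // clientId: logical application identifier
)
```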
Re: New producer: metadata update problem on 2 Node cluster.
Rahul, the mailing list filters attachments; you'd have to post the code somewhere else for people to be able to see it. But I don't think anyone suggested that creating a new consumer would fix anything. Creating a new producer *and discarding the old one* basically just makes it start from scratch using the bootstrap nodes, which is why that would allow recovery from that condition. But that's just a workaround. The real issue is that the producer only maintains metadata for the nodes that are replicas for the partitions of the topics the producer sends data to. In some cases this is a small set of servers, and it can get the producer stuck if a node goes offline and it doesn't have any other nodes that it can try to communicate with to get updated metadata (since the topic partitions should have a new leader). Falling back on the original bootstrap servers is one solution to this problem. Another would be to maintain metadata for additional servers, so you always have extra bootstrap nodes in your current metadata set, even if they aren't replicas for any of the topics you're working with. -Ewen

On Thu, May 7, 2015 at 12:06 AM, Rahul Jain rahul...@gmail.com wrote: Creating a new consumer instance *does not* solve this problem. Attaching the producer/consumer code that I used for testing.
Auto-rebalance not triggering in 2.10-0.8.1.1
I'm running 2.10-0.8.1.1, and rebalance will not trigger on its own.

From http://grokbase.com/t/kafka/users/14bj5ps9hp/partition-auto-rebalance#20141118rf39q8cs4sjh6vzjgdw92e37cw I think the leader imbalance means: for a single broker, add up all the partitions it is leading (Y), and count the ones for which it's not the preferred broker (X). The ratio X:Y is the one being used.

I have about 10 topics spread between the 3 brokers, each with 4 or 8 partitions. If I restart broker A, wait 5 min, then restart B, leadership ends up entirely on C (even though A was in the ISR when B went down). Nothing triggers on its own. Triggering it manually works (with bin/kafka-preferred-replica-election.sh).

Is there something I should be checking, or is there a downside to just adding a cron job to trigger replica election once an hour? Thanks Steve
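The imbalance ratio Steve describes is governed by a few broker settings. A sketch of the relevant names as they appear in the 0.8.x broker config (values shown are the commonly documented defaults, hedged; verify against your version):

```properties
# Must be enabled for any automatic rebalance to run at all
auto.leader.rebalance.enable=true
# A broker is considered imbalanced when more than this percentage of
# the partitions it leads are ones it is not the preferred replica for
leader.imbalance.per.broker.percentage=10
# How often the controller checks the imbalance ratio
leader.imbalance.check.interval.seconds=300
```

If the automatic path proves unreliable on 0.8.1.1, the cron-job approach is essentially a periodic invocation like `bin/kafka-preferred-replica-election.sh --zookeeper zkhost:2181` (paths and the ZooKeeper address here are placeholders).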
Custom Partition example in SCALA
Hi, Could you please point me to an example Scala program for a custom partitioner? Regards, Rajesh
Getting NotLeaderForPartitionException in kafka broker
Hi team, I have a 12-node cluster that has 800 topics, each of which has only 1 partition. I observed that one of the nodes keeps generating NotLeaderForPartitionException, which causes the node to become unresponsive to all requests. Below is the exception:

[2015-05-07 04:16:01,014] ERROR [ReplicaFetcherThread-1-12], Error for partition [topic1,0] to broker 12:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)

All other nodes in the cluster generate lots of replication errors too, as shown below, due to the unresponsiveness of the above node.

[2015-05-07 04:17:34,917] WARN [Replica Manager on Broker 1]: Fetch request with correlation id 3630911 from client ReplicaFetcherThread-0-1 on partition [topic1,0] failed due to Leader not local for partition [cg22_user.item_attr_info.lcr,0] on broker 1 (kafka.server.ReplicaManager)

Any suggestion why the node runs into this unstable state, and any configuration I can set to prevent this? I use kafka 0.8.2.1. And here is the server.properties:

broker.id=5
port=9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.dirs=/mnt/kafka
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=1
log.segment.bytes=1073741824
log.retention.check.interval.ms=30
log.cleaner.enable=false
zookeeper.connect=ip:2181
zookeeper.connection.timeout.ms=6000
unclean.leader.election.enable=false
delete.topic.enable=true
default.replication.factor=3
num.replica.fetchers=3
delete.topic.enable=true
kafka.metrics.reporters=report.KafkaMetricsCollector
straas.hubble.conf.file=/etc/kafka/report.conf

-- Regards, Tao
Significance of SimpleConsumer id string
Hi, The kafka.consumer.SimpleConsumer takes an "id: String" constructor parameter. What is the significance of this id? Are there any consequences or risks associated with using the exact same id for several consumers of the same topic/partition? /Magnus
Differences between new and legacy scala producer API
Hi,
- The legacy Scala producer API has KeyedMessage with topic, key, partKey, and message, while the new API has no partKey. What's the difference between key and partKey?
- In the javadoc, the new producer API's send method is always async; is the producer.type property overridden?
- Will the legacy Scala API be deprecated any time soon?
Rendy
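For context on the first question, a hedged sketch of the two constructors side by side (class names from the 0.8.x clients; topic, key, and payload strings are placeholders, and this is not compiled here):

```scala
// Legacy Scala producer: partKey drives partitioning,
// while key is what gets stored alongside the message
val legacy = new kafka.producer.KeyedMessage[String, String](
  "my-topic",     // topic
  "stored-key",   // key: serialized into the message
  "routing-key",  // partKey: only used to pick a partition
  "payload")      // message

// New Java producer: a single key serves both roles - it is stored
// with the record and used by the default partitioner
val record = new org.apache.kafka.clients.producer.ProducerRecord[String, String](
  "my-topic", "key", "payload")
```

On the second question, my understanding (worth verifying against the docs for your version) is that the new producer does not read producer.type at all: send() always returns a Future, and batching behavior is controlled by settings such as batch.size and linger.ms.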
Re: Support https or ssl
Hi Jamie, I am currently working on providing SSL support for Kafka. Here are the JIRAs: https://issues.apache.org/jira/browse/KAFKA-1690 and https://issues.apache.org/jira/browse/KAFKA-1684. If you are using a REST API to front the Kafka producer, then you can probably run that HTTP server over SSL. -- Harsha
Support https or ssl
Hello, It's been a while since my team worked on a Kafka-related project. By the way, our previous project using Kafka worked wonderfully for us. Now I have a requirement to use HTTPS or SSL. I am wondering if the latest version has support for SSL. If not, what is the timeline for this functionality to be supported, and is there any suggestion on what I can do in the interim to provide similar functionality using Kafka? Thank you in advance for your time and help. Jamie
Re: Custom Partition example in SCALA
Hi, I have written the custom partitioner program below in Scala, but the producer is not calling its partition method.

class TestRoundRobinPartitioner(props: VerifiableProperties) extends Partitioner {
  def partition(key: Any, numPartitions: Int): Int = {
    println("key: " + key)
    println("partition: " + numPartitions)
    Integer.parseInt(key.toString()) % 4
  }
}

In the producer class I have configured this class:

props.put("partitioner.class", "TestRoundRobinPartitioner")

Could you please help me to resolve this issue? Regards, Rajesh

On Thu, May 7, 2015 at 12:38 PM, Madabhattula Rajesh Kumar mrajaf...@gmail.com wrote: Hi, Could you please point me to an example Scala program for a custom partitioner? Regards, Rajesh