[jira] [Commented] (KAFKA-1530) howto update continuously
[ https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425454#comment-15425454 ] Alexey Ozeritskiy commented on KAFKA-1530: -- I think this ticket may be closed: unclean.leader.election.enable=false helps us. We have also developed a tool, kafka-restarter, that restarts Kafka node by node and checks the ISR status, and a tool, fix-isr, that can fix the ISR after a cluster power failure.

> howto update continuously
>
> Key: KAFKA-1530
> URL: https://issues.apache.org/jira/browse/KAFKA-1530
> Project: Kafka
> Issue Type: Wish
> Reporter: Stanislav Gilmulin
> Assignee: Guozhang Wang
> Priority: Minor
> Labels: operating_manual, performance
>
> Hi,
>
> Could I ask you a question about the Kafka update procedure?
> Is there a way to update software, which doesn't require service interruption
> or lead to data losses?
> We can't stop message brokering during the update as we have a strict SLA.
>
> Best regards
> Stanislav Gilmulin

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1530) howto update continuously
[ https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064815#comment-14064815 ] Oleg Golovin commented on KAFKA-1530: - Thank you for mentioning the unclean.leader.election.enable option. It seems to be a new option we didn't know about. We will need some time to test it and will report back on how it went once we have.
[jira] [Commented] (KAFKA-1530) howto update continuously
[ https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058568#comment-14058568 ] Andrey Stepachev commented on KAFKA-1530: - It looks like [~ovgolovin]'s problem with wrong replica election can be fixed by adding a notion of min-replicas somewhere around this code: https://github.com/apache/kafka/blob/3c4ca854fd2da5e5fcecdaf0856a38a9ebe4763c/core/src/main/scala/kafka/cluster/Partition.scala#L165. That way we can restrict leader election/re-election to partitions whose ISR has the configured size. As for [~renew]'s situation, it is not realistic to lose data when the leader is stopped and one of the replicas becomes the leader, _if_ the required acks are greater than 1. Kafka maintains a 'high watermark' for each partition, and for each request it waits for the required replicas to catch up with the leader before responding to the client. So unless it is a correlated failure (when we lose 2 replicas at once), it will work correctly. If there are 2 replicas in the ISR and 1 replica outside it, and both ISR replicas die, then it is possible to bring up the third replica, and the new data held only by the two dead replicas will be lost. Just to be clear, Kafka is a 'primary-backup' replication system, so, unlike a quorum system, it doesn't tolerate correlated failures, but it gives high throughput in return. That's how it stands :)
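To make the high-watermark argument concrete, here is a minimal Python sketch under simplifying assumptions: one partition, logs as plain lists, followers that copy the whole leader log, and a producer that waits for every ISR replica. The `Partition` class and its methods are hypothetical illustrations, not Kafka's code. It shows that a write is acknowledged only after it is on all in-sync replicas, which is why losing a single replica does not lose acknowledged data.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Toy model of one partition: each replica's log is a list of messages."""
    leader: str
    logs: dict[str, list[str]]
    isr: set[str]

    def log_end_offset(self, replica: str) -> int:
        # Offset one past the last message this replica holds.
        return len(self.logs[replica])

    def high_watermark(self) -> int:
        # Every offset below the smallest log end offset in the ISR is
        # present on all in-sync replicas.
        return min(self.log_end_offset(r) for r in self.isr)

    def append(self, message: str) -> int:
        # Producer writes go to the leader; return the new message's offset.
        self.logs[self.leader].append(message)
        return self.log_end_offset(self.leader) - 1

    def replicate(self, follower: str) -> None:
        # The follower copies the leader's full log (a simplification).
        self.logs[follower] = list(self.logs[self.leader])

    def is_acked(self, offset: int) -> bool:
        # When the producer waits for all ISR replicas, the write is
        # acknowledged only once the high watermark has passed its offset.
        return self.high_watermark() > offset


p = Partition(leader="b1", logs={"b1": [], "b2": [], "b3": []}, isr={"b1", "b2", "b3"})
offset = p.append("m0")
print(p.is_acked(offset))  # False: b2 and b3 do not have m0 yet
p.replicate("b2")
p.replicate("b3")
print(p.is_acked(offset))  # True: m0 is on every ISR replica
```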
[jira] [Commented] (KAFKA-1530) howto update continuously
[ https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058893#comment-14058893 ] Jun Rao commented on KAFKA-1530: We do have an option, unclean.leader.election.enable, to prevent unclean leader election. So, if you care more about durability than availability, you can set this option to false. Then the new leader will only be elected from the ISR. The unavailability window of a partition could be longer, though, since we have to wait until at least one broker in the ISR is back online.
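The durability/availability trade-off Jun describes can be sketched in Python as a toy election function. The function name, its parameters, and the rule that candidates are tried in replica-assignment order are assumptions for illustration; this is not the controller's actual code.

```python
from typing import Optional


def elect_leader(
    assigned: list[str],
    isr: set[str],
    live: set[str],
    unclean_election_enabled: bool,
) -> Optional[str]:
    """Pick a new partition leader; None means the partition stays offline."""
    # Clean election: first live replica (in assignment order) that is in the ISR.
    for replica in assigned:
        if replica in live and replica in isr:
            return replica
    if not unclean_election_enabled:
        # Durability over availability: wait for an ISR member to come back.
        return None
    # Unclean election: any live replica, even one that is missing data.
    for replica in assigned:
        if replica in live:
            return replica
    return None


assigned = ["b1", "b2", "b3"]
# b1 was the only ISR member and is down; b2 and b3 are alive but lagging.
print(elect_leader(assigned, isr={"b1"}, live={"b2", "b3"}, unclean_election_enabled=True))   # b2
print(elect_leader(assigned, isr={"b1"}, live={"b2", "b3"}, unclean_election_enabled=False))  # None
```

With the flag set to false, the partition stays without a leader until b1 returns, which is the longer unavailability window mentioned above.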
[jira] [Commented] (KAFKA-1530) howto update continuously
[ https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056049#comment-14056049 ] Stanislav Gilmulin commented on KAFKA-1530: --- Thank you. I'd like to ask some follow-up questions. If a cluster has a lot of nodes and a few of them are lagging or down, we can't guarantee we would stop and restart nodes properly and in the right order. Is there a recommended way to manage this, or even an existing script or tool for it? Replication factor = 3. Version 0.8.1.1.
[jira] [Commented] (KAFKA-1530) howto update continuously
[ https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056339#comment-14056339 ] Guozhang Wang commented on KAFKA-1530: -- Not sure I understand your question. What do you mean by "we can't guarantee we would stop and restart nodes properly and in the right order"?
[jira] [Commented] (KAFKA-1530) howto update continuously
[ https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056411#comment-14056411 ] Stanislav Gilmulin commented on KAFKA-1530: --- It means we are taking risks. You're right, my question wasn't clear; let me try to explain. First of all, according to our business requirements we can't stop the service, so we can't stop all nodes before updating. As you've advised, our option would be to update step by step. But if we update without following the right procedure, we could lose an unknown number of messages, as in the example below. We have 3 replicas of one partition, 2 of which are lagging behind. Then we restart the leader. At that very moment one of the two lagging replicas becomes the new leader. After that, when the former leader (which in fact has the newest data) starts working again, it truncates all of its newest data to match the newly elected leader. This happens quite often when we restart a highly loaded Kafka cluster, so we lose some of our data.
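Stanislav's scenario can be put into numbers with a small Python sketch. It assumes the returning ex-leader truncates its log to the new leader's log end offset and that each lagging replica's log is a prefix of the old leader's; the `lost_on_rejoin` helper and the message names are hypothetical.

```python
def lost_on_rejoin(old_leader_log: list[str], new_leader_log: list[str]) -> list[str]:
    """Messages the returning ex-leader discards when it follows the new leader.

    Toy model: the ex-leader truncates its log to the new leader's log end
    offset and re-fetches from there. Assumes new_leader_log is a prefix of
    old_leader_log (the lagging replica simply has fewer messages).
    """
    truncate_at = len(new_leader_log)
    return old_leader_log[truncate_at:]


# Three replicas of one partition, two of them lagging behind the leader.
leader = [f"m{i}" for i in range(10)]  # has m0..m9
lagging_a = leader[:7]                 # has m0..m6
lagging_b = leader[:5]                 # has m0..m4

# The leader is restarted and one of the lagging replicas is elected.
print(lost_on_rejoin(leader, lagging_a))  # ['m7', 'm8', 'm9']
print(lost_on_rejoin(leader, lagging_b))  # ['m5', 'm6', 'm7', 'm8', 'm9']
```

Whether those discarded messages had already been acknowledged to producers depends on the acks setting they used.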
[jira] [Commented] (KAFKA-1530) howto update continuously
[ https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056422#comment-14056422 ] Oleg commented on KAFKA-1530: - Another situation that happened in our production: we have a replication factor of 3. One of the 3 replicas of a partition started lagging behind (due to network connectivity problems, etc.). Then, while upgrading, we restarted the whole cluster. After the upgrade, Kafka started electing leaders for each partition, and it may well elect the lagging replica as the leader. That results in truncating the other two replicas, and in this case we lose data. So we are looking for a way to restart/upgrade Kafka without data loss.
[jira] [Commented] (KAFKA-1530) howto update continuously
[ https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056448#comment-14056448 ] Guozhang Wang commented on KAFKA-1530: -- Hi Stanislav/Oleg, the Kafka server has a config called controlled.shutdown.enable. When it is turned on, the shutdown process first waits for all partition leaders on the node being shut down to migrate to other nodes before stopping the server (http://kafka.apache.org/documentation.html#brokerconfigs). In your first case, where the node being shut down is the only replica in the ISR, the shutdown process will block until other nodes are back in the ISR and can take over the partitions. In your second case, where more than one node is in the ISR, the leaders on the node being shut down are guaranteed to move to another ISR node.
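The controlled-shutdown behaviour described above can be sketched in Python as a toy model. The `controlled_shutdown` function, the dict-based partition state, and picking the alphabetically first remaining ISR member as the new leader are all assumptions for illustration, not Kafka's implementation.

```python
def controlled_shutdown(
    broker: str,
    leaders: dict[str, str],
    isr: dict[str, set[str]],
) -> list[str]:
    """Move leadership off `broker`; return the partitions that block shutdown.

    For every partition led by the shutting-down broker, hand leadership to
    another ISR member. If the broker is the only ISR member, the partition
    cannot move and the shutdown has to wait.
    """
    blocked = []
    for partition, leader in leaders.items():
        if leader != broker:
            continue
        # Assumption: choose the alphabetically first remaining ISR member.
        candidates = sorted(isr[partition] - {broker})
        if candidates:
            leaders[partition] = candidates[0]
        else:
            blocked.append(partition)
    return blocked


leaders = {"t-0": "b1", "t-1": "b1", "t-2": "b2"}
isr = {"t-0": {"b1", "b2", "b3"}, "t-1": {"b1"}, "t-2": {"b2", "b3"}}
print(controlled_shutdown("b1", leaders, isr))  # ['t-1']
print(leaders)  # {'t-0': 'b2', 't-1': 'b1', 't-2': 'b2'}
```

Partition t-0 corresponds to the second case (leadership moves), while t-1 corresponds to the first case (the shutdown blocks until another replica rejoins the ISR).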
[jira] [Commented] (KAFKA-1530) howto update continuously
[ https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055437#comment-14055437 ] Guozhang Wang commented on KAFKA-1530: -- Hi Stanislav, upgrading a Kafka server does require bouncing it with the new jar. However, if you have a cluster of more than one server and data replication is turned on (i.e. replication factor > 1 for all hosted topics), this should not interrupt message brokering: you bounce the brokers one at a time, and while one node is down, its brokering functionality moves to the other replicas. The only exception is upgrading from 0.7 to 0.8.*; more details can be found on this wiki: https://cwiki.apache.org/confluence/display/KAFKA/Changes+in+Kafka+0.8
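The sequential bounce can be sketched in Python as a loop that restarts one broker and then waits until no partition is under-replicated before moving on, which is roughly what the kafka-restarter tool mentioned in this thread is described as doing. `restart_broker` and `under_replicated_partitions` are hypothetical hooks the operator would supply (for example wrapping a service manager and a broker metrics query); they are not Kafka APIs.

```python
import time
from typing import Callable


def _wait_for_full_isr(
    count_urp: Callable[[], int], poll_seconds: float, timeout_seconds: float, when: str
) -> None:
    # Poll until no partition is under-replicated, or give up after the timeout.
    deadline = time.monotonic() + timeout_seconds
    while count_urp() > 0:
        if time.monotonic() > deadline:
            raise TimeoutError(f"partitions still under-replicated {when}")
        time.sleep(poll_seconds)


def rolling_restart(
    brokers: list[str],
    restart_broker: Callable[[str], None],
    under_replicated_partitions: Callable[[], int],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> None:
    """Bounce brokers one at a time, never starting with a degraded ISR."""
    # Refuse to start if some replicas are already lagging or down.
    _wait_for_full_isr(under_replicated_partitions, poll_seconds, timeout_seconds,
                       "before the first restart")
    for broker in brokers:
        restart_broker(broker)
        # Do not touch the next broker until every ISR is full again.
        _wait_for_full_isr(under_replicated_partitions, poll_seconds, timeout_seconds,
                           f"after restarting {broker}")


# Usage with fake hooks: each restart leaves 2 under-replicated partitions
# that recover over the next two polls.
restarted = []
pending = {"urp": 0}


def fake_restart(broker: str) -> None:
    restarted.append(broker)
    pending["urp"] = 2


def fake_urp() -> int:
    current = pending["urp"]
    pending["urp"] = max(0, current - 1)
    return current


rolling_restart(["b1", "b2", "b3"], fake_restart, fake_urp, poll_seconds=0)
print(restarted)  # ['b1', 'b2', 'b3']
```

Checking the ISR before the first restart matters for the lagging-replica cases raised earlier in the thread: bouncing a broker while another replica is already out of sync is exactly what can leave a partition with no clean leader.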