[jira] [Commented] (KAFKA-5778) Kafka cluster is not responding when one broker hangs and resulted in too many connections in close_wait in other brokers
[ https://issues.apache.org/jira/browse/KAFKA-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728943#comment-16728943 ]

Jungbae Jun commented on KAFKA-5778:
------------------------------------

[~tju_lushilin] I have not experienced the same problem since updating the Kafka version.

> Kafka cluster is not responding when one broker hangs and resulted in too many connections in close_wait in other brokers
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-5778
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5778
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.0.1
>            Reporter: saichand
>            Priority: Critical
>
> In a cluster of 3 brokers, one broker (Broker-1) hung, and from then on the other two brokers had connections in CLOSE_WAIT for Java client producers/consumers; even some broker-to-broker connections between those two brokers were in CLOSE_WAIT.
> Kafka Version: 0.10.0.1
> In the logs I found "connection refused" exceptions from the replica fetcher threads:
> In broker 0: replica fetcher 0-1, replica fetcher 0-2
> In broker 2: replica fetcher 0-0, replica fetcher 0-1
> In broker 1: it was hung, so no logs were available at that time.
> We tried restarting Kafka on broker-2, but it was not successful; it terminated with a ZooKeeper timeout.
> Then we tried restarting Kafka on broker-0 and got the same error.
> Broker-1 was hung, so we could not even log in to it,
> so we restarted the broker-1 machine.
> Then we restarted all ZooKeepers and then the Kafka brokers; now everything is fine.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (KAFKA-5778) Kafka cluster is not responding when one broker hangs and resulted in too many connections in close_wait in other brokers
[ https://issues.apache.org/jira/browse/KAFKA-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728624#comment-16728624 ]

Shilin Lu commented on KAFKA-5778:
----------------------------------

Hello, we hit the same problem in our prod environment when the controller is re-elected. The symptom is that the number of TCP connections in CLOSE_WAIT increases quickly and the ISR shrinks. I don't think this problem can be resolved by modifying the Linux sysctl config; it may be a bug in the code. How can it be resolved? Have you found anything new? Thank you!
[jira] [Commented] (KAFKA-5778) Kafka cluster is not responding when one broker hangs and resulted in too many connections in close_wait in other brokers
[ https://issues.apache.org/jira/browse/KAFKA-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352063#comment-16352063 ]

tianye commented on KAFKA-5778:
-------------------------------

Hello, I have the same problem: the number of CLOSE_WAIT connections on the Kafka broker keeps increasing. I configured the following in /etc/sysctl.conf:

net.ipv4.tcp_keepalive_intvl=5
net.ipv4.tcp_keepalive_probes=5
net.ipv4.tcp_time=1200
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1

But it doesn't work. The number of open files increases too; after about 2 hours it exceeds the system limit and Kafka shuts down by itself. How can I solve it? Please help me. Thank you!
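For anyone trying to track this symptom over time, here is a minimal sketch (not taken from the issue) that counts sockets per TCP state on Linux by parsing the text of /proc/net/tcp. The function name `count_tcp_states` is made up for illustration; the state codes are the kernel's standard values, where 08 is CLOSE_WAIT. A socket sits in CLOSE_WAIT when the peer has closed its side and the local process has not yet closed the socket, whereas the tcp_tw_* settings concern sockets in TIME_WAIT, which may be why the sysctl changes above had no visible effect.

```python
from collections import Counter

# TCP state codes as they appear in the "st" column of /proc/net/tcp (Linux).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp.

    Hypothetical helper for illustration; it only parses text and does
    not touch the network or any Kafka API.
    """
    counts = Counter()
    # The first row is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated rows
        code = fields[3].upper()  # fourth column is the state code
        counts[TCP_STATES.get(code, code)] += 1
    return counts
```

On a broker host this could be called as `count_tcp_states(open("/proc/net/tcp").read())` and the `CLOSE_WAIT` entry logged periodically (IPv6 sockets are listed separately in /proc/net/tcp6, in the same format).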
[jira] [Commented] (KAFKA-5778) Kafka cluster is not responding when one broker hangs and resulted in too many connections in close_wait in other brokers
[ https://issues.apache.org/jira/browse/KAFKA-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172583#comment-16172583 ]

Jungbae Jun commented on KAFKA-5778:
------------------------------------

I patched my Kafka version to 0.10.2.1 and the problem has not occurred for 2 weeks.

> Kafka cluster is not responding when one broker hangs and resulted in too many connections in close_wait in other brokers
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-5778
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5778
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.0.1
>            Reporter: saichand
>            Priority: Blocker

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (KAFKA-5778) Kafka cluster is not responding when one broker hangs and resulted in too many connections in close_wait in other brokers
[ https://issues.apache.org/jira/browse/KAFKA-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148669#comment-16148669 ]

Jungbae Jun commented on KAFKA-5778:
------------------------------------

I have experienced this 3 times in a month, with the same version and symptoms.
In the last occurrence, the hung broker was automatically removed from the replicas after 50 minutes (Kafka was unavailable for those 50 minutes).
But I couldn't find any error message in the log files.