[jira] [Commented] (KAFKA-5778) Kafka cluster is not responding when one broker hangs and resulted in too many connections in close_wait in other brokers

2018-12-26 Thread Jungbae Jun (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728943#comment-16728943
 ] 

Jungbae Jun commented on KAFKA-5778:


[~tju_lushilin]

I have not experienced the same problem since updating the Kafka version.

 

> Kafka cluster is not responding when one broker hangs and resulted in too 
> many connections in close_wait in other brokers
> -
>
> Key: KAFKA-5778
> URL: https://issues.apache.org/jira/browse/KAFKA-5778
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1
>Reporter: saichand
>Priority: Critical
>
> In a cluster of 3 brokers, one of the brokers (Broker-1) hung, and from then 
> on the other two brokers had connections in CLOSE_WAIT for Java client 
> producers/consumers; even some broker-to-broker connections between those 
> two brokers were in CLOSE_WAIT.
> Kafka Version: 0.10.0.1
> In the logs I found "connection refused" exceptions from the replica fetcher threads:
> In broker 0: replica fetcher 0-1, replica fetcher 0-2
> In broker 2: replica fetcher 0-0, replica fetcher 0-1
> In broker 1: it was hung, so no logs were available at that time.
> We tried restarting Kafka on broker 2, but it was not successful: it 
> terminated with a ZooKeeper timeout.
> Then we tried restarting Kafka on broker 0 and got the same error.
> Broker 1 was hung, so we could not even log in to it,
> so we restarted the broker 1 machine.
> Then we restarted all ZooKeeper nodes and then the Kafka brokers; now everything is fine.
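To confirm the CLOSE_WAIT symptom on a surviving broker, one option is to count sockets in that state on the broker's listener port. The sketch below is a minimal, Linux-only example that parses `/proc/net/tcp` and `/proc/net/tcp6`, where the `st` column holds the TCP state in hex and `08` is CLOSE_WAIT. Port 9092 is an assumption (Kafka's default listener port); substitute the broker's configured listener port. Both inbound client connections (local port 9092) and outbound broker-to-broker connections (remote port 9092) are counted.

```python
import os

CLOSE_WAIT = "08"  # TCP_CLOSE_WAIT as printed (hex) in the "st" column of /proc/net/tcp


def parse_proc_net_tcp(text):
    """Yield (local_port, remote_port, state_hex) from /proc/net/tcp-formatted text."""
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        # Addresses look like "0100007F:2384" (IPv4) or a 32-hex-digit IPv6 address
        # followed by ":PORT"; the port is always the hex part after the last colon.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        yield local_port, remote_port, fields[3]


def count_close_wait(text, port=9092):
    """Count CLOSE_WAIT sockets whose local or remote port equals `port`."""
    return sum(
        1
        for lport, rport, state in parse_proc_net_tcp(text)
        if state == CLOSE_WAIT and port in (lport, rport)
    )


def broker_close_wait(port=9092):
    """Sum CLOSE_WAIT sockets over IPv4 and IPv6 tables (0 if /proc is unavailable)."""
    total = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        if os.path.exists(path):
            with open(path) as f:
                total += count_close_wait(f.read(), port)
    return total


if __name__ == "__main__":
    print("CLOSE_WAIT sockets on port 9092:", broker_close_wait())
```

Sampling this count periodically on each broker would show whether CLOSE_WAIT sockets are accumulating, as described in the report.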



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-5778) Kafka cluster is not responding when one broker hangs and resulted in too many connections in close_wait in other brokers

2018-12-25 Thread Shilin Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728624#comment-16728624
 ] 

Shilin Lu commented on KAFKA-5778:
--

Hello, we hit the same problem in our production environment when the controller 
was re-elected. The symptoms are that the number of TCP connections in CLOSE_WAIT 
increases quickly and the ISR shrinks. I don't think this problem can be resolved 
by modifying the Linux sysctl configuration; it may be a bug in the code.

How can it be resolved? Have you made any new discoveries? Thank you!



[jira] [Commented] (KAFKA-5778) Kafka cluster is not responding when one broker hangs and resulted in too many connections in close_wait in other brokers

2018-02-04 Thread tianye (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352063#comment-16352063
 ] 

tianye commented on KAFKA-5778:
---

Hello, I have the same problem: the number of CLOSE_WAIT connections on the Kafka 
broker keeps increasing. I configured /etc/sysctl.conf with:

net.ipv4.tcp_keepalive_intvl=5

net.ipv4.tcp_keepalive_probes=5

net.ipv4.tcp_time=1200

net.ipv4.tcp_tw_recycle=1

net.ipv4.tcp_tw_reuse=1

But it doesn't work. The number of open files increases too; after about 2 hours, 
the open files exceed the system limit and Kafka shuts itself down.

How can I solve it? Please help me. Thank you!
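Two hedged observations may explain why these settings had no effect. A socket in CLOSE_WAIT has already received the peer's FIN and is waiting for the local process to call close(); its file descriptor is released only when the owning process closes it, so `tcp_tw_reuse`/`tcp_tw_recycle` (which concern TIME_WAIT) are not expected to help. Keepalive settings apply only to sockets with SO_KEEPALIVE set, and a dead peer is detected after roughly tcp_keepalive_time + tcp_keepalive_intvl × tcp_keepalive_probes seconds of idleness. The sketch below assumes the `net.ipv4.tcp_time=1200` line above was meant as `net.ipv4.tcp_keepalive_time`, which gives 1200 + 5 × 5 = 1225 s. It also reports how close a process is to its open-file limit, which is what eventually stopped the broker here. The helper names are illustrative, and passing this script's own PID is a placeholder for the broker's PID.

```python
import os


def keepalive_detection_seconds(keepalive_time, intvl, probes):
    """Approximate idle seconds before Linux declares a keepalive-enabled peer dead:
    wait keepalive_time, then send `probes` probes spaced `intvl` seconds apart."""
    return keepalive_time + intvl * probes


def parse_max_open_files(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text (None if unlimited)."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # columns: Max open files <soft> <hard> files
            return None if soft == "unlimited" else int(soft)
    raise ValueError("'Max open files' row not found")


def fd_usage(pid):
    """(open fds, soft limit) for a process; Linux only, needs permission to read /proc/<pid>."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        return open_fds, parse_max_open_files(f.read())


if __name__ == "__main__":
    # Values from the sysctl.conf above, assuming "tcp_time" meant tcp_keepalive_time.
    print("keepalive detection bound:", keepalive_detection_seconds(1200, 5, 5), "s")
    if os.path.isdir("/proc/self/fd"):
        # Placeholder: pass the broker's PID here instead of this script's own PID.
        print("open fds / soft limit:", fd_usage(os.getpid()))
```

If the open-fd count tracks the CLOSE_WAIT count upward, that is consistent with sockets not being closed by the broker process rather than with a kernel tuning problem.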



[jira] [Commented] (KAFKA-5778) Kafka cluster is not responding when one broker hangs and resulted in too many connections in close_wait in other brokers

2017-09-19 Thread Jungbae Jun (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172583#comment-16172583
 ] 

Jungbae Jun commented on KAFKA-5778:


I upgraded Kafka to 0.10.2.1, and the problem has not occurred 
for 2 weeks.


> Kafka cluster is not responding when one broker hangs and resulted in too 
> many connections in close_wait in other brokers
> -
>
> Key: KAFKA-5778
> URL: https://issues.apache.org/jira/browse/KAFKA-5778
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1
>Reporter: saichand
>Priority: Blocker
>
> In a cluster of 3 brokers, one of the brokers (Broker-1) hung, and from then 
> on the other two brokers had connections in CLOSE_WAIT for Java client 
> producers/consumers; even some broker-to-broker connections between those 
> two brokers were in CLOSE_WAIT.
> Kafka Version: 0.10.0.1
> In the logs I found "connection refused" exceptions from the replica fetcher threads:
> In broker 0: replica fetcher 0-1, replica fetcher 0-2
> In broker 2: replica fetcher 0-0, replica fetcher 0-1
> In broker 1: it was hung, so no logs were available at that time.
> We tried restarting Kafka on broker 2, but it was not successful: it 
> terminated with a ZooKeeper timeout.
> Then we tried restarting Kafka on broker 0 and got the same error.
> Broker 1 was hung, so we could not even log in to it,
> so we restarted the broker 1 machine.
> Then we restarted all ZooKeeper nodes and then the Kafka brokers; now everything is fine.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5778) Kafka cluster is not responding when one broker hangs and resulted in too many connections in close_wait in other brokers

2017-08-31 Thread Jungbae Jun (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148669#comment-16148669
 ] 

Jungbae Jun commented on KAFKA-5778:


I have experienced this 3 times in a month, with the same version and symptoms.

In the last occurrence, the hung broker was automatically removed from the replica 
set after 50 minutes (Kafka was unavailable for 50 minutes).
But I couldn't find any error message in the log files.


