[jira] [Created] (KAFKA-13720) A few topic partitions remain under-replicated after brokers lose connectivity to ZooKeeper

2022-03-09 Thread Dhirendra Singh (Jira)
Dhirendra Singh created KAFKA-13720:
---

 Summary: A few topic partitions remain under-replicated after brokers 
lose connectivity to ZooKeeper
 Key: KAFKA-13720
 URL: https://issues.apache.org/jira/browse/KAFKA-13720
 Project: Kafka
  Issue Type: Bug
  Components: controller
Affects Versions: 2.7.1
Reporter: Dhirendra Singh


A few topic partitions remain under-replicated after the brokers lose connectivity to ZooKeeper.
The issue only happens when the loss of ZooKeeper connectivity results in a change of active controller, and it occurs randomly rather than every time.
The issue never occurs when the active controller stays the same after the brokers lose connectivity to ZooKeeper.
The following error message appears in the log file:


[2022-02-28 04:01:20,217] WARN [Partition __consumer_offsets-4 broker=1] Controller failed to update ISR to PendingExpandIsr(isr=Set(1), newInSyncReplicaId=2) due to unexpected UNKNOWN_SERVER_ERROR. Retrying. (kafka.cluster.Partition)
[2022-02-28 04:01:20,217] ERROR [broker-1-to-controller] Uncaught error in request completion: (org.apache.kafka.clients.NetworkClient)
java.lang.IllegalStateException: Failed to enqueue `AlterIsr` request with state LeaderAndIsr(leader=1, leaderEpoch=2728, isr=List(1, 2), zkVersion=4719) for partition __consumer_offsets-4
at kafka.cluster.Partition.sendAlterIsrRequest(Partition.scala:1403)
at kafka.cluster.Partition.$anonfun$handleAlterIsrResponse$1(Partition.scala:1438)
at kafka.cluster.Partition.handleAlterIsrResponse(Partition.scala:1417)
at kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1(Partition.scala:1398)
at kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1$adapted(Partition.scala:1398)
at kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8(AlterIsrManager.scala:166)
at kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8$adapted(AlterIsrManager.scala:163)
at scala.collection.immutable.List.foreach(List.scala:333)
at kafka.server.AlterIsrManagerImpl.handleAlterIsrResponse(AlterIsrManager.scala:163)
at kafka.server.AlterIsrManagerImpl.responseHandler$1(AlterIsrManager.scala:94)
at kafka.server.AlterIsrManagerImpl.$anonfun$sendRequest$2(AlterIsrManager.scala:104)
at kafka.server.BrokerToControllerRequestThread.handleResponse(BrokerToControllerChannelManagerImpl.scala:175)
at kafka.server.BrokerToControllerRequestThread.$anonfun$generateRequests$1(BrokerToControllerChannelManagerImpl.scala:158)
at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:586)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:578)
at kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:71)
at kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManagerImpl.scala:183)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
 
The under-replicated partition count goes back to zero after the controller broker is restarted, but this requires manual intervention.
The expectation is that when the brokers reconnect to the ZooKeeper cluster, the cluster should come back to a stable state with an under-replicated partition count of zero by itself, without any manual intervention.
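
For reference, below is a minimal diagnostic sketch (not part of the original report) that uses the Java AdminClient to list under-replicated partitions and print which broker is currently the active controller, i.e. the broker whose manual restart is described above. The bootstrap address is a placeholder and the class name is purely illustrative.

import java.util.Map;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartitionInfo;

public class UnderReplicatedCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder address; point it at the affected cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Which broker is currently the active controller.
            Node controller = admin.describeCluster().controller().get();
            System.out.println("Active controller: " + controller);

            // Report every partition whose ISR is smaller than its replica set.
            Set<String> topics = admin.listTopics().names().get();
            Map<String, TopicDescription> descriptions = admin.describeTopics(topics).all().get();
            for (TopicDescription desc : descriptions.values()) {
                for (TopicPartitionInfo p : desc.partitions()) {
                    if (p.isr().size() < p.replicas().size()) {
                        System.out.printf("Under-replicated: %s-%d isr=%s replicas=%s%n",
                                desc.name(), p.partition(), p.isr(), p.replicas());
                    }
                }
            }
        }
    }
}

Running this after the brokers reconnect to ZooKeeper would show whether the under-replicated set clears on its own or only after the reported controller restart.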





[jira] [Created] (KAFKA-9135) Kafka producer/consumer are creating too many open files

2019-11-04 Thread Dhirendra Singh (Jira)
Dhirendra Singh created KAFKA-9135:
--

 Summary: Kafka producer/consumer are creating too many open files
 Key: KAFKA-9135
 URL: https://issues.apache.org/jira/browse/KAFKA-9135
 Project: Kafka
  Issue Type: Bug
  Components: admin, consumer, producer 
Affects Versions: 1.0.1
 Environment: apache kafka client :- 1.0.1
Kafka version :- 1.0.1
Open JDK :- java-1.8.0-openjdk-1.8.0.222.b10-1
CentOS version :- CentOS Linux release 7.6.1810
Reporter: Dhirendra Singh


We have a 3-node Kafka cluster deployment with 5 topics and 6 partitions per 
topic, configured with a replication factor of 3. We are seeing a very 
strange problem: the number of open file descriptors has crossed the ulimit 
(which is 50K for our application).

As per the lsof command and our analysis:

1. There are 15K established connections from the Kafka producers/consumers towards 
the brokers, and at the same time the thread dump shows thousands of entries 
for the Kafka 'admin-client-network-thread' (see the sketch after these observations):

admin-client-network-thread" #224398 daemon prio=5 os_prio=0 
tid=0x7f12ca119800 nid=0x5363 runnable [0x7f12c4db8000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x0005e0603238> (a sun.nio.ch.Util$3)
- locked <0x0005e0603228> (a java.util.Collections$UnmodifiableSet)
- locked <0x0005e0602f08> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.kafka.common.network.Selector.select(Selector.java:672)
at org.apache.kafka.common.network.Selector.poll(Selector.java:396)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:238)
- locked <0x0005e0602dc0> (a 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:214)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:205)
at kafka.admin.AdminClient$$anon$1.run(AdminClient.scala:61)
at java.lang.Thread.run(Thread.java:748)


2. As per the lsof output, we observed 35K entries for pipes and eventpoll descriptors:

java 5441 notifs 374r FIFO 0,9 0t0 22415240 pipe
java 5441 notifs 375w FIFO 0,9 0t0 22415240 pipe
java 5441 notifs 376u a_inode 0,10 0 6379 [eventpoll]
java 5441 notifs 377r FIFO 0,9 0t0 2247 pipe
java 5441 notifs 378r FIFO 0,9 0t0 28054726 pipe
java 5441 notifs 379r FIFO 0,9 0t0 22415241 pipe
java 5441 notifs 380w FIFO 0,9 0t0 22415241 pipe
java 5441 notifs 381u a_inode 0,10 0 6379 [eventpoll]
java 5441 notifs 382w FIFO 0,9 0t0 2247 pipe
java 5441 notifs 383u a_inode 0,10 0 6379 [eventpoll]
java 5441 notifs 384u a_inode 0,10 0 6379 [eventpoll]
java 5441 notifs 385r FIFO 0,9 0t0 40216087 pipe
java 5441 notifs 386r FIFO 0,9 0t0 22483470 pipe
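
A likely explanation for the two observations above, offered here as a sketch under assumptions rather than something confirmed by this report: each admin client instance owns its own network thread and NIO selector, and the selector is what holds the pipe/eventpoll descriptors seen in the lsof output, so creating a new client per request without ever closing it accumulates both threads and descriptors. The sketch below contrasts the suspected anti-pattern with the safer variant using the Java org.apache.kafka.clients.admin.AdminClient from kafka-clients 1.0.x; the Scala kafka.admin.AdminClient seen in the thread dump has an analogous close() that would likewise need to be called.

import java.util.Properties;
import java.util.Set;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class AdminClientUsage {

    // Suspected anti-pattern: a new admin client per call that is never closed
    // leaks its network thread plus the selector's pipe/eventpoll descriptors.
    static Set<String> leakyListTopics(String bootstrap) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        AdminClient client = AdminClient.create(props);   // never closed
        return client.listTopics().names().get();
    }

    // Safer pattern: reuse one long-lived client, or close it deterministically
    // with try-with-resources so the thread and descriptors are released.
    static Set<String> listTopics(String bootstrap) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        try (AdminClient client = AdminClient.create(props)) {
            return client.listTopics().names().get();
        }
    }
}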


Setup details :- 
apache kafka client :- 1.0.1
Kafka version :- 1.0.1
Open JDK :- java-1.8.0-openjdk-1.8.0.222.b10-1
CentOS version :- CentOS Linux release 7.6.1810

Note :- After the VM was restarted, the file descriptor count cleared and returned to 
a normal count of about 1000; then, after a few seconds, the count started to 
increase again and reaches 50K (the ulimit) after about a week, even in an idle scenario.
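
To track that growth over time, one option (again a sketch, not from the original report) is to poll the JVM's own open-descriptor count via com.sun.management.UnixOperatingSystemMXBean, which is exposed on Linux OpenJDK builds such as the one listed above; the class name and polling interval are illustrative.

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

public class FdMonitor {
    public static void main(String[] args) throws InterruptedException {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (!(os instanceof UnixOperatingSystemMXBean)) {
            System.err.println("Open FD count is only exposed on Unix-like JVMs");
            return;
        }
        UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
        // Log open/max descriptor counts once a minute; a steadily climbing open
        // count under idle load matches the leak pattern described in this report.
        while (true) {
            System.out.printf("open FDs: %d / max: %d%n",
                    unixOs.getOpenFileDescriptorCount(), unixOs.getMaxFileDescriptorCount());
            Thread.sleep(60_000);
        }
    }
}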


