[ https://issues.apache.org/jira/browse/KAFKA-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Randall Hauch updated KAFKA-9017:
---------------------------------
    Component/s:     (was: KafkaConnect)
                 core

> We see timeout in kafka in production cluster
> ---------------------------------------------
>
>                 Key: KAFKA-9017
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9017
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.1.0
>         Environment: Production
>            Reporter: Suhas
>            Priority: Critical
>         Attachments: stderr (7), stdout (12)
>
>
> We see timeouts in Kafka in our production cluster. Kafka is running on
> DC/OS (Mesos). The errors are below.
> *+Exception 1: from the application logs+*
> 2019-10-07 10:01:59 Error: java.lang.RuntimeException:
> java.util.concurrent.ExecutionException:
> org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for
> ie-lrx-audit-evt-3: 30030 ms has passed since batch creation plus linger time
> *+Exception 2: from the application logs+*
> {"eventTime":"2019-10-07 08:20:43.265", "logType":"ERROR", "stackMessage" :
> "java.util.concurrent.ExecutionException:
> org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for
> ie-lrx-audit-evt-3: 30028 ms has passed since batch creation plus linger
> time", "stackTrace" :
> *+Exception (from the broker logs)+*
> [2019-10-10 06:32:10,844] INFO [ReplicaFetcher replicaId=4, leaderId=2,
> fetcherId=0] Error sending fetch request (sessionId=919177392, epoch=INITIAL)
> to node 2: java.io.IOException: Connection to 2 was disconnected before the
> response was read. (org.apache.kafka.clients.FetchSessionHandler)
> [2019-10-10 06:32:10,849] WARN [ReplicaFetcher replicaId=4, leaderId=2,
> fetcherId=0] Error in response for fetch request (type=FetchRequest,
> replicaId=4, maxWait=500, minBytes=1, maxBytes=10485760,
> fetchData={ie-lrx-rxer-audit-evt-0=(offset=0,
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]),
> mft-hdfs-landing-evt-1=(offset=0, logStartOffset=0, maxBytes=1048576,
> currentLeaderEpoch=Optional[108]), dca-audit-evt-2=(offset=0,
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]),
> it-sou-audit-evt-7=(offset=94819, logStartOffset=94819, maxBytes=1048576,
> currentLeaderEpoch=Optional[100]), intg-ie-lrx-rxer-audit-evt-2=(offset=0,
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[78]),
> prod-pipelines-errors-evt-0=(offset=0, logStartOffset=0, maxBytes=1048576,
> currentLeaderEpoch=Optional[117]), __consumer_offsets-36=(offset=3,
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]),
> panel-data-change-evt-4=(offset=0, logStartOffset=0, maxBytes=1048576,
> currentLeaderEpoch=Optional[108]), gdcp-notification-evt-2=(offset=0,
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]),
> data-transfer-change-evt-0=(offset=0, logStartOffset=0, maxBytes=1048576,
> currentLeaderEpoch=Optional[108]), __consumer_offsets-11=(offset=15,
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]),
> dca-heartbeat-evt-2=(offset=0, logStartOffset=0, maxBytes=1048576,
> currentLeaderEpoch=Optional[105]), ukwhs-error-topic-1=(offset=8,
> logStartOffset=8, maxBytes=1048576, currentLeaderEpoch=Optional[105]),
> intg-ie-lrx-audit-evt-4=(offset=21, logStartOffset=21, maxBytes=1048576,
> currentLeaderEpoch=Optional[74]), __consumer_offsets-16=(offset=11329814,
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]),
> __consumer_offsets-31=(offset=3472033, logStartOffset=0, maxBytes=1048576,
> currentLeaderEpoch=Optional[107]), ukpai-hdfs-evt-1=(offset=0,
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[107]),
> mft-pflow-evt-1=(offset=0, logStartOffset=0, maxBytes=1048576,
> currentLeaderEpoch=Optional[108]), ukwhs-hdfs-landing-evt-01-2=(offset=0,
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[105]),
> it-sou-audit-evt-2=(offset=490084, logStartOffset=490084, maxBytes=1048576,
> currentLeaderEpoch=Optional[105]), ie-lrx-pat-audit-evt-4=(offset=0,
> logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104])},
> isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=919177392,
> epoch=INITIAL)) (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 2 was disconnected before the response was read
>     at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
>     at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97)
>     at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
>     at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:241)
>     at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
>     at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
>     at scala.Option.foreach(Option.scala:257)
>     at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
>     at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)