Hi all,

We recently upgraded the Kafka-Storm client for our Kafka spout from 1.0.3 to 1.1. Before the upgrade our application ran without any issue, but since the upgrade performance has dropped significantly whenever we use more than a single partition for our Kafka topic. With more than one partition, the following error appears in the Storm log and throughput drops sharply:

org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:600)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:541)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
    at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
    at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
    at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:426)
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1059)
    at org.apache.storm.kafka.spout.KafkaSpout.commitOffsetsForAckedTuples(KafkaSpout.java:302)
    at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:204)
    at org.apache.storm.daemon.executor$fn__6505$fn__6520$fn__6551.invoke(executor.clj:651)
    at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.lang.Thread.run(Thread.java:748)

We have tried changing the values of max.poll.records and session.timeout.ms, but that did not help at all. I was wondering what has changed that could cause this performance issue.

P.S. In addition to the upgrade, we are now using dedicated Kafka brokers, so I am not entirely sure that the issue is related to the client upgrade. Previously, Kafka and the Storm supervisor were co-located on the same host.

Regards,
Ali
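
For reference, the two settings named in the exception message are ordinary Kafka consumer properties. Below is a minimal sketch of how they can be collected in a java.util.Properties object before being handed to a consumer. The class name, the helper name consumerProps, and the values 30000 and 100 are assumptions chosen for illustration; they are not the values tried above, and the way the properties reach the spout depends on the client version in use and is not shown here.

```java
import java.util.Properties;

public class ConsumerTimeoutSketch {

    // Hypothetical helper: collects the two knobs named in the
    // CommitFailedException message into a Properties object.
    static Properties consumerProps(int sessionTimeoutMs, int maxPollRecords) {
        Properties props = new Properties();
        // How long the group coordinator waits for a heartbeat before
        // considering this consumer dead and triggering a rebalance.
        props.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        // Upper bound on the number of records returned by one poll() call.
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only (assumptions, not recommendations).
        Properties props = consumerProps(30000, 100);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Note that the broker only accepts a session.timeout.ms that lies between its own group.min.session.timeout.ms and group.max.session.timeout.ms settings, so a larger client-side value may also require a broker-side change.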