Hi Abhishek,

Have a look at KafkaSpoutConfig (and KafkaSpoutConfig.Builder), particularly setMaxUncommittedOffsets and ConsumerConfig.MAX_POLL_RECORDS_CONFIG (set via kConfigBuilder.setProp(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, i)).
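For example, roughly something like this (just a sketch: the broker address, topic name, class name, and the numeric limits are placeholders you would tune for your workload):

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.storm.kafka.spout.KafkaSpout;
    import org.apache.storm.kafka.spout.KafkaSpoutConfig;

    public class ThrottledSpoutExample {
        public static KafkaSpout<String, String> buildSpout() {
            KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("kafka-broker:9092", "my-topic")
                    // Cap the number of polled-but-not-yet-committed offsets,
                    // so the spout stops emitting while the bolts are behind.
                    .setMaxUncommittedOffsets(10000)
                    // Cap how many records a single consumer poll() returns.
                    .setProp(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500)
                    .build();
            return new KafkaSpout<>(spoutConfig);
        }
    }

setMaxUncommittedOffsets bounds how far the spout can run ahead of the slowest bolt task, and max.poll.records limits how much data is handed to the topology per poll. Javadocs: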
https://storm.apache.org/releases/current/javadocs/org/apache/storm/kafka/spout/KafkaSpoutConfig.Builder.html
https://kafka.apache.org/21/javadoc/index.html?org/apache/kafka/clients/consumer/ConsumerConfig.html

This will give you more control at the spout level.

Also, have a look at the Storm 2.x documentation; the backpressure mechanism has changed:
http://storm.apache.org/releases/current/Performance.html

Backpressure model for Storm 2.0:
https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit#heading=h.w07mdxni7moh
https://issues.apache.org/jira/browse/STORM-2306

On Fri, 24 Apr 2020 at 18:39, Abhishek Raj <[email protected]> wrote:

> Hi,
>
> We have a topology which consumes data generated by multiple applications
> via Kafka. The data for each application is aggregated in a single bolt
> task using fields grouping. The applications push data at different rates,
> so some executors of the bolt are busier/more heavily loaded than others,
> and the capacity distribution is non-uniform.
>
> The problem we're facing now is that when there's a spike in the data
> produced by one (or more) applications, capacity goes up for that executor,
> we see frequent GC pauses, and eventually the corresponding JVM crashes,
> causing worker restarts.
>
> As an ideal solution, we want to slow down only the application(s) that
> cause the spike. We cannot use the built-in backpressure here because it
> acts at the spout level and slows down the entire pipeline.
>
> What are your thoughts on this? How can we fix this?
>
> Thanks
