Hi Abhishek,

Have a look at KafkaSpoutConfig (and KafkaSpoutConfig.Builder),
particularly setMaxUncommittedOffsets
and ConsumerConfig.MAX_POLL_RECORDS_CONFIG
(set via kConfigBuilder.setProp(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, i)).

https://storm.apache.org/releases/current/javadocs/org/apache/storm/kafka/spout/KafkaSpoutConfig.Builder.html
https://kafka.apache.org/21/javadoc/index.html?org/apache/kafka/clients/consumer/ConsumerConfig.html

This will allow you to have more control at the Spout level.
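
For illustration, here is a minimal sketch of setting both knobs on the
spout config (the broker address, topic name, and numeric limits are
placeholders, not recommendations):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;

// Placeholder broker/topic; tune the limits for your workload.
KafkaSpoutConfig<String, String> spoutConfig =
    KafkaSpoutConfig.builder("kafka-broker:9092", "my-topic")
        // Max offsets emitted but not yet committed before the spout
        // stops polling Kafka (caps in-flight tuples per spout task).
        .setMaxUncommittedOffsets(250)
        // Max records returned by a single Kafka consumer poll().
        .setProp(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 200)
        .build();

KafkaSpout<String, String> spout = new KafkaSpout<>(spoutConfig);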

Also, have a look at the Storm 2.x documentation. The backpressure
mechanism has changed:
http://storm.apache.org/releases/current/Performance.html

Backpressure model for Storm 2.0:
https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit#heading=h.w07mdxni7moh

https://issues.apache.org/jira/browse/STORM-2306


On Fri, 24 Apr 2020 at 18:39, Abhishek Raj <[email protected]> wrote:

> Hi,
>
> We have a topology which consumes data generated by multiple applications
> via Kafka. The data for one application is aggregated in a single bolt task
> using fields grouping. All applications push data at different rates, so
> some executors of the bolt are busier/overloaded than others and the
> capacity distribution is non-uniform.
>
> The problem we're facing now is that when there's a spike in data produced
> by one (or more) applications, capacity goes up for that executor, we see
> frequent GC pauses, and eventually the corresponding JVM crashes, causing
> worker restarts.
>
> As an ideal solution, we want to slow down only the application(s) which
> cause the spike. We cannot use the built-in backpressure here because it
> happens at the spout level and slows down the entire pipeline.
>
> What are your thoughts on this? How can we fix this?
>
> Thanks
>
