Hi Ashish,

Gordon (in CC) might be able to help you.

Cheers, Fabian

2017-11-05 16:24 GMT+01:00 Ashish Pokharel <ashish...@yahoo.com>:

> All,
>
> I am starting to notice strange behavior in a particular streaming app.
> I initially thought it was a Producer issue, as I was seeing timeout
> exceptions (records expiring in the queue). I tried modifying
> request.timeout.ms, linger.ms, etc. to help with the issue in case it was
> caused by a sudden burst of data or something along those lines. However,
> that only caused the app to build up back pressure and become slower and
> slower until the timeout was reached. With lower timeouts, the app would
> actually raise an exception and recover faster. I can tell it is not related
> to connectivity, as other apps are running just fine around the same time
> frame, connected to the same brokers (we have at least 10 streaming apps
> connected to the same list of brokers) from the same data nodes. We have
> enabled the Graphite Reporter in all of our applications. After deep diving
> into some of the consumer and producer stats, I noticed that the consumer
> fetch-rate drops tremendously while fetch-size grows exponentially BEFORE
> the producer actually starts to show higher response-time and lower rates.
> Eventually, I noticed connection resets start to occur and connection
> counts go up momentarily, after which things get back to normal. Data
> producer rates remain constant around that timeframe - we have a Logstash
> producer sending data over. We checked both Logstash and Kafka metrics, and
> they seem to show the same pattern (a sort of sine wave) throughout.
>
> It seems to point to a Kafka issue (perhaps some tuning between the Flink app
> and Kafka), but I wanted to check with the experts before I start knocking
> down the Kafka Admin’s doors. Is there anything else I can look into? There
> are quite a few default stats in Graphite, but those were the ones that made
> the most sense.
>
> Thanks, Ashish
