Thanks Jeff!

On Fri, Feb 2, 2018 at 11:58 AM, Jeff Widman <j...@jeffwidman.com> wrote:
> This means either the brokers are not healthy (bad hardware) or that the
> replication fetchers can't keep up with the rate of incoming messages.
>
> If the latter, you need to figure out where the latency bottleneck is and
> what your latency SLAs are.
>
> Common sources of latency bottlenecks:
> - network has slow roundtrip speeds: Increase network speed, or increase
>   bytes per trip, or increase number of simultaneous fetchers, or increase
>   the timeout so that the broker has time to fill all the bytes in the fetch
>   request...
> - broker slow disk I/O: increase disk speed, or increase Linux page cache
>   size
>
> There are JMX metrics that help disambiguate whether the problem is disk vs
> network... unfortunately the Datadog check is lacking many of these,
> something that I've had on my todo list to patch as we also use Datadog at
> my day job.
>
> One other possible problem is when you have a combination of a lot of
> low-volume partitions being replicated in each call along with a couple of
> high-volume partitions... then the broker can take a long time assembling
> the responses because it has to look at each partition, which might add
> only 1 KB, so it takes a long time to hit the 1MB bytes partition... so it
> hits the timeout first. Then it sends a small response, even though you've
> got a handful of partitions that are really hot and will soon be marked as
> not being in sync.
>
> I know this doesn't provide full details, but hopefully it's enough to get
> you pointed in the right direction...
>
> Cheers,
> Jeff
>
> On Fri, Feb 2, 2018 at 11:27 AM, Richard Rodseth <rrods...@gmail.com>
> wrote:
>
> > We have a DataDog integration showing some metrics, and for one of our
> > clusters the above two values are > 0 and highlighted in red.
> >
> > What's the usual remedy (Confluent Platform, OSS version)?
> >
> > Thanks
>
> --
> *Jeff Widman*
> jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
> <><
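
For anyone finding this thread later, here is a rough back-of-envelope sketch of the "slow roundtrip" case from the list above. It is not from Jeff's message and it is not how Kafka computes anything internally. It assumes each fetcher thread has one fetch request in flight at a time, so a follower can pull at most (fetchers x bytes per trip / round-trip time) from a leader. The function names and the numbers are made up for illustration. The broker settings that seem to correspond to the three knobs are `replica.fetch.max.bytes`, `num.replica.fetchers` and `replica.fetch.wait.max.ms`, but check the docs for your version before changing them.

```python
def max_replication_rate(bytes_per_fetch, round_trip_s, num_fetchers=1):
    """Rough upper bound, in bytes/second, on what a follower can pull.

    Assumes (as a simplification) that each fetcher sends one request,
    waits a full round trip, and receives at most bytes_per_fetch back.
    """
    if round_trip_s <= 0:
        raise ValueError("round_trip_s must be positive")
    if num_fetchers < 1:
        raise ValueError("num_fetchers must be at least 1")
    return num_fetchers * bytes_per_fetch / round_trip_s


def keeps_up(incoming_bytes_per_s, bytes_per_fetch, round_trip_s, num_fetchers=1):
    """True if the estimated replication ceiling covers the incoming rate."""
    ceiling = max_replication_rate(bytes_per_fetch, round_trip_s, num_fetchers)
    return ceiling >= incoming_bytes_per_s


if __name__ == "__main__":
    one_mib = 1024 * 1024       # hypothetical bytes returned per fetch
    rtt = 0.05                  # hypothetical 50 ms round trip
    incoming = 50 * 1000 * 1000 # hypothetical 50 MB/s produced to the leader

    for fetchers in (1, 2, 4):
        rate = max_replication_rate(one_mib, rtt, fetchers)
        ok = keeps_up(incoming, one_mib, rtt, fetchers)
        print(f"{fetchers} fetcher(s): ~{rate / 1e6:.1f} MB/s ceiling, keeps up: {ok}")
```

With these made-up numbers a single fetcher tops out at roughly 21 MB/s and falls behind a 50 MB/s producer, while four fetchers (or a 4x larger fetch, or a 4x faster round trip) would clear it. That is the sense in which the three remedies in the list are interchangeable. The model ignores disk I/O and the many-small-partitions case entirely.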