It sounds like you're reaching the limits of what your disks can do, either on reads or on writes. Debug it as you would any other disk-based application; https://haydenjames.io/linux-server-performance-disk-io-slowing-application/ might help.
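The system.io.await metric in the quoted message is, in iostat terms, the average time in milliseconds an I/O request spends queued plus being serviced. To cross-check the Datadog graph directly on a broker host, here is a minimal Python sketch. It assumes a Linux host and computes an iostat-style await per device from two /proc/diskstats snapshots. The function names are hypothetical, and it is not necessarily how Datadog derives its own metric.

```python
import time
from typing import Dict, Tuple

# Per-device counters taken from /proc/diskstats (Linux only).
# Whitespace-split field layout: 0 major, 1 minor, 2 device name,
# 3 reads completed, 6 ms spent reading, 7 writes completed, 10 ms spent writing.
Counters = Tuple[int, int, int, int]  # (reads, read_ms, writes, write_ms)


def parse_diskstats(text: str) -> Dict[str, Counters]:
    """Parse the contents of /proc/diskstats into per-device counters."""
    stats: Dict[str, Counters] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or truncated lines
        stats[fields[2]] = (int(fields[3]), int(fields[6]),
                            int(fields[7]), int(fields[10]))
    return stats


def await_ms(before: Dict[str, Counters],
             after: Dict[str, Counters]) -> Dict[str, float]:
    """iostat-style await: average ms per completed I/O between two snapshots."""
    result: Dict[str, float] = {}
    for name, (r1, rms1, w1, wms1) in after.items():
        if name not in before:
            continue  # device appeared between snapshots; no baseline
        r0, rms0, w0, wms0 = before[name]
        ios = (r1 - r0) + (w1 - w0)
        # No completed I/O in the interval: await is undefined, report 0.
        result[name] = ((rms1 - rms0) + (wms1 - wms0)) / ios if ios > 0 else 0.0
    return result


def sample(interval_s: float = 1.0) -> Dict[str, float]:
    """Take two /proc/diskstats snapshots interval_s seconds apart."""
    with open("/proc/diskstats") as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = parse_diskstats(f.read())
    return await_ms(before, after)


if __name__ == "__main__":
    for device, value in sorted(sample().items()):
        print(f"{device:12s} await={value:.1f} ms")
```

If the await spikes line up with the 200 ms produce latencies, that points at the disks rather than at Kafka itself. `iostat -x 1` from the sysstat package gives the same figure with less effort.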
On Tue, 22 Jan 2019 at 09:19, wenxing zheng <wenxing.zh...@gmail.com> wrote:
> Dear all,
>
> We have a Kafka cluster with 5 nodes. From the Datadog metrics, we
> found that the time taken to send to Kafka regularly exceeded 200 ms,
> and at the same time there was a peak in system.io.await.
>
> Please advise what the problem might be. Any hints?
> [image: image.png]
>
> Kind regards, Wenxing