Hello Mangat,

Thanks for reporting your observations. I have some questions:

1) Are your global state stores also in-memory, or are they persisted on disk?
2) Are your Kafka brokers and Kafka Streams applications colocated on the same nodes?
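
In the meantime, it may help to check which process the reads are actually attributed to. On Linux, `/proc/<pid>/io` exposes per-process counters: `read_bytes` counts bytes fetched from the storage layer, while `rchar` counts everything passed through read-style system calls, including data served from the page cache. Below is a minimal sketch that prints them; the class name `ProcIoSnapshot` is made up for illustration, and it assumes a Linux container and Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Kafka): dumps I/O counters from /proc/<pid>/io.
public class ProcIoSnapshot {

    // Parses "key: value" lines as found in /proc/<pid>/io into a map.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // skip anything that is not "key: value"
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            counters.put(key, Long.parseLong(value));
        }
        return counters;
    }

    public static void main(String[] args) throws IOException {
        // Pass the PID of the Streams JVM; "self" would only measure this helper.
        String pid = args.length > 0 ? args[0] : "self";
        Map<String, Long> io = parse(Files.readAllLines(Path.of("/proc", pid, "io")));
        System.out.println("read_bytes  (from storage layer): " + io.get("read_bytes"));
        System.out.println("rchar       (all read calls):     " + io.get("rchar"));
        System.out.println("write_bytes (to storage layer):   " + io.get("write_bytes"));
    }
}
```

Running it twice a few minutes apart inside the container (as the same user as the JVM) and comparing the `read_bytes` deltas against `container_fs_reads_bytes_total` should show whether the Streams process itself accounts for the reads.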


Guozhang

On Tue, Aug 10, 2021 at 6:10 AM mangat rai <mangatm...@gmail.com> wrote:

> Hey All,
>
> We are using the low-level Processor API to build Kafka Streams
> applications. Each app has one or more in-memory state stores with caching
> disabled and changelogging enabled. Some of the apps also have global stores.
> We noticed from the node metrics (Kubernetes) that the Streams applications
> are consuming too much disk I/O. Digging deeper, I found the following:
>
> 1. Running locally with Docker, I could see some pretty high disk reads. I
> used `docker stats` and got `BLOCK I/O` as `438MB / 0B`. For comparison, we
> did only a few gigabytes of Net I/O.
> 2. In Kubernetes, `container_fs_reads_bytes_total` gives us pretty big
> numbers, whereas `container_fs_writes_bytes_total` is almost negligible.
>
> Now, we are *not* using RocksDB. The pattern is not correlated with having a
> global store. I read various documents, but I still can't figure out why a
> Streams application would read so much from disk. It is not even writing,
> so that rules out swap space, buffering, and the like.
>
> I also noticed that disk reads grow in proportion to the amount of data
> consumed. Is it possible that the data is zero-copied from the network
> interface to disk and the Kafka app is then reading it from there? I am
> not aware of any mechanism that does that.
>
> I would really appreciate any help in debugging this issue.
>
> Thanks,
> Mangat
>


-- 
-- Guozhang
