Hello Niklas,

If you monitor the consumer lag on your repartition topic and see it increasing consistently, it means your downstream processor simply cannot keep up with the throughput of the upstream processor. Usually this is because your downstream operators are heavier (e.g. aggregations and joins, which are all stateful) than your upstream ones (e.g. ones that simply shuffle the data to repartition topics). Since task assignment only considers a task as the smallest unit of work and does not differentiate between "heavy" and "light" tasks, such an imbalance in task assignment may happen. At the moment, to resolve this you should add more resources so that the heavy tasks get enough computational resources assigned (e.g. more threads).
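As a rough sketch of the "more threads" option: the thread count is controlled by the `num.stream.threads` setting (the value of StreamsConfig.NUM_STREAM_THREADS_CONFIG). The example below uses plain java.util.Properties with the literal key so that it runs without the Kafka jars; the class name, application id, bootstrap server and the thread count of 8 are placeholders I made up, not values taken from your setup.

```java
import java.util.Properties;

// Sketch only: builds the config you would hand to KafkaStreams.
// Literal keys are used so this compiles without Kafka on the classpath.
final class StreamThreadsSketch {

    // Same string as StreamsConfig.NUM_STREAM_THREADS_CONFIG.
    static final String NUM_STREAM_THREADS = "num.stream.threads";

    // Returns a copy of the base config with the stream thread count set.
    static Properties withStreamThreads(Properties base, int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty(NUM_STREAM_THREADS, Integer.toString(threads));
        return props;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("application.id", "my-streams-app");     // hypothetical
        base.setProperty("bootstrap.servers", "localhost:9092");  // hypothetical

        // 8 is a placeholder; pick a value based on your partition count and hardware.
        Properties tuned = withStreamThreads(base, 8);
        System.out.println(NUM_STREAM_THREADS + "=" + tuned.getProperty(NUM_STREAM_THREADS));
    }
}
```

Starting more instances of the application is the other way to add resources; the setting above only affects the number of threads per instance.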
If your observed consumer lag plateaus after increasing to some point, it means your consumer can actually keep up, just with some constant lag. If you hit your open file limit before seeing this, you either need to increase the open file limit, OR you can increase the segment size to reduce the number of files, by using "StreamsConfig.TOPIC_PREFIX" to set the value of TopicConfig.SEGMENT_BYTES_CONFIG.

Guozhang

On Tue, Jan 22, 2019 at 4:38 AM Niklas Lönn <niklas.l...@gmail.com> wrote:

> Hi Kafka Devs & Users,
>
> We recently had an issue where we processed a lot of old data and we
> crashed our brokers due to too many memory mapped files.
> It seems to me that the nature of Kafka / Kafka Streams is a bit
> suboptimal in terms of resource management. (Keeping all files open all the
> time, maybe there should be something managing this more on-demand?)
>
> In the issue I described, the repartition topic was produced very fast,
> but not consumed, causing a lot of segments and files to be open at the
> same time.
>
> I have worked around the issue by making sure I have more threads than
> partitions to force tasks to subscribe to internal topics only, but seems a
> bit hacky and maybe there should be some guidance in documentation if
> considered part of design..
>
> After quite some testing and code reversing it seems that the nature of
> this imbalance lies within how the broker multiplexes the consumed
> topic-partitions.
>
> I have attached a slide that I will present to my team to explain the
> issue in a bit more detail, it might be good to check it out to understand
> the context.
>
> Any thoughts about my findings and concerns?
>
> Kind regards
> Niklas
>

--
-- Guozhang
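
P.S. A minimal sketch of the segment size override mentioned in my reply. StreamsConfig.TOPIC_PREFIX is the string "topic." and TopicConfig.SEGMENT_BYTES_CONFIG is "segment.bytes", so the resulting key is "topic.segment.bytes". Plain java.util.Properties and literal keys are used so that it runs without the Kafka jars; the class name and the 256 MiB value are placeholders of mine, not a recommendation.

```java
import java.util.Properties;

// Sketch only: sets the segment size that Streams requests for its internal topics.
final class SegmentSizeSketch {

    static final String TOPIC_PREFIX = "topic.";          // StreamsConfig.TOPIC_PREFIX
    static final String SEGMENT_BYTES = "segment.bytes";  // TopicConfig.SEGMENT_BYTES_CONFIG

    // Returns a copy of the base config with the prefixed segment size added.
    static Properties withInternalTopicSegmentBytes(Properties base, int segmentBytes) {
        if (segmentBytes <= 0) {
            throw new IllegalArgumentException("segmentBytes must be positive");
        }
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty(TOPIC_PREFIX + SEGMENT_BYTES, Integer.toString(segmentBytes));
        return props;
    }

    public static void main(String[] args) {
        // 256 MiB is a placeholder; larger segments mean fewer files per partition.
        Properties tuned = withInternalTopicSegmentBytes(new Properties(), 256 * 1024 * 1024);
        System.out.println(tuned);
    }
}
```

With the Kafka Streams library on the classpath, the same key can be built as StreamsConfig.topicPrefix(TopicConfig.SEGMENT_BYTES_CONFIG).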