Re: Urgent: Mitigating Slow Consumer Impact and Seeking Open-Source Solutions in Apache Kafka Consumers
Hi All,

We are also looking for a custom partitioning strategy; it would be helpful for us too.

Thanks and regards,
Gowtham S

On Mon, 18 Sept 2023 at 12:13, Karthick wrote:

> Thanks Liu Ron for the suggestion.
>
> Can you please give any pointers/references for the custom partitioning
> strategy? We are currently using murmur hashing with the device unique id.
> It would be helpful if you could guide us or refer us to any other
> strategies.
>
> Thanks and regards,
> Karthick.
>
> On Mon, Sep 18, 2023 at 9:18 AM liu ron wrote:
>
>> Hi, Karthick
>>
>> It looks like a data-skew problem, and I think one of the easiest and
>> most efficient ways to address it is to increase the number of
>> partitions and see how it works first, for example, try expanding by 100
>> first.
>>
>> Best,
>> Ron
>>
>> On Sun, Sep 17, 2023 at 17:03, Karthick wrote:
>>
>>> Thanks Wei Chen and Giannis for your time.
>>>
>>>> For starters, you need to better size and estimate the required
>>>> number of partitions you will need on the Kafka side in order to
>>>> process 1000+ messages/second.
>>>> The number of partitions should also define the maximum parallelism
>>>> for the Flink job reading from Kafka.
>>>
>>> Thanks for the pointer. Can you please guide us on the factors we need
>>> to consider here?
>>>
>>>> use a custom partitioner that spreads those devices to somewhat
>>>> separate partitions.
>>>
>>> Please suggest a working solution for the custom partitioner to
>>> distribute the load. It would be helpful.
>>>
>>>> What we were doing at that time was to define multiple topics and
>>>> each has a different # of partitions
>>>
>>> Thanks for the suggestion. Is there any calculation for choosing the
>>> topic count? Are there any formulae/factors to determine this number?
>>> Please let me know if available; it will be helpful for us.
>>>
>>> Thanks and regards,
>>> Karthick.
>>>
>>> On Sun, Sep 17, 2023 at 4:04 AM Wei Chen wrote:
>>>
>>>> Hi Karthick,
>>>> We've experienced a similar issue before. What we did at the time was
>>>> to define multiple topics, each with a different # of partitions,
>>>> which means the topics with more partitions have higher parallelism
>>>> for processing.
>>>> You can further divide the topics into several groups, where each
>>>> group has a similar # of partitions. Each group can then be defined
>>>> as a Flink data-stream source, so the groups run in parallel with
>>>> different parallelisms.
>>>>
>>>> --
>>>>
>>>> ---------- Original ----------
>>>> *From:* Giannis Polyzos
>>>> *Date:* Sat, Sep 16, 2023 11:52 PM
>>>> *To:* Karthick
>>>> *Cc:* Gowtham S, user
>>>> *Subject:* Re: Urgent: Mitigating Slow Consumer Impact and Seeking
>>>> Open-Source Solutions in Apache Kafka Consumers
>>>>
>>>> Can you provide some more context on what your Flink job will be
>>>> doing? There might be some things you can do to fix the data skew on
>>>> the Flink side, but first, you want to start with Kafka.
>>>> For starters, you need to better size and estimate the required
>>>> number of partitions you will need on the Kafka side in order to
>>>> process 1000+ messages/second.
>>>> The number of partitions should also define the maximum parallelism
>>>> for the Flink job reading from Kafka.
>>>> If you know your "hot devices" in advance, you might want to use a
>>>> custom partitioner that spreads those devices to somewhat separate
>>>> partitions.
>>>> Overall this is somewhat of a trial-and-error process. You might also
>>>> want to check that the partitions are evenly balanced among your
>>>> brokers and don't cause too much stress on particular brokers.
>>>>
>>>> Best
>>>>
>>>> On Sat, Sep 16, 2023 at 6:03 PM Karthick wrote:
>>>>
>>>>> Hi Gowtham, I agree with you.
>>>>>
>>>>> I'm eager to resolve the issue or gain a better understanding.
>>>>> Your assistance would be greatly appreciated.
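The custom-partitioner idea discussed in this thread — salting known hot devices across a few partitions while leaving everything else on the usual hash(key) % num_partitions mapping — can be sketched as follows. This is only an illustration in plain Python, not Kafka's actual `Partitioner` API; the device ids, the hot-device set, the fanout, and the use of MD5 as a stand-in for murmur2 are all assumptions for the sketch.

```python
import hashlib

NUM_PARTITIONS = 10          # assumed topic size, per the original post
HOT_DEVICES = {"device-42"}  # hypothetical: hot devices known in advance
HOT_FANOUT = 4               # spread each hot device over this many salted keys

def stable_hash(key: str) -> int:
    # Stand-in for Kafka's murmur2 hash; any stable hash works for the sketch.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

def choose_partition(device_id: str, seq: int) -> int:
    """Route one record. Hot devices are salted across HOT_FANOUT derived
    keys, trading strict per-device ordering for throughput; normal devices
    keep the usual hash(key) % partitions mapping, and so keep ordering."""
    if device_id in HOT_DEVICES:
        salt = seq % HOT_FANOUT  # rotate the salt per record
        return stable_hash(f"{device_id}-{salt}") % NUM_PARTITIONS
    return stable_hash(device_id) % NUM_PARTITIONS
```

Note the trade-off Giannis hints at: once a hot device is salted across several partitions, strict per-device ordering is lost for that device, so this only fits devices whose records can be reordered or re-sequenced downstream.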
Re: Urgent: Mitigating Slow Consumer Impact and Seeking Open-Source Solutions in Apache Kafka Consumers
Hi Karthick,

This appears to be a common challenge related to slow consumers. Those with relevant experience in addressing such matters should be able to help.

Thanks and regards,
Gowtham S

On Fri, 15 Sept 2023 at 23:06, Giannis Polyzos wrote:

> Hi Karthick,
>
> On a high level this seems like a data-skew issue: do some partitions
> have way more data than others?
> What is the number of your devices? How many messages are you processing?
> Most of the things you share above sound like you are looking for
> suggestions around load distribution for Kafka, i.e. the number of
> partitions, how to distribute your device data, etc.
> It would be good to also share what your Flink job is doing, as I don't
> see anything mentioned around that. Are you observing backpressure in
> the Flink UI?
>
> Best
>
> On Fri, Sep 15, 2023 at 3:46 PM Karthick wrote:
>
>> Dear Apache Flink Community,
>>
>> I am writing to urgently address a critical challenge we've encountered
>> in our IoT platform that relies on Apache Kafka and real-time data
>> processing. We believe this issue is of paramount importance and may
>> have broad implications for the community.
>>
>> In our IoT ecosystem, we receive data streams from numerous devices,
>> each uniquely identified. To maintain data integrity and ordering,
>> we've configured a Kafka topic with ten partitions, ensuring that each
>> device's data is directed to its respective partition based on its
>> unique identifier.
>> This architectural choice has proven effective in maintaining data
>> order, but it has also unveiled a significant problem:
>>
>> *One device's data-processing slowness is interfering with other
>> devices' data, causing a detrimental ripple effect throughout our
>> system.*
>>
>> To put it simply, when a single device experiences processing delays,
>> it acts as a bottleneck within its Kafka partition, leading to delays
>> in processing data from other devices sharing the same partition. This
>> issue undermines the efficiency and scalability of our entire
>> data-processing pipeline.
>>
>> Additionally, I would like to highlight that we are currently using the
>> default partitioner to choose the partition for each device's data. If
>> there are alternative partitioning strategies that can help alleviate
>> this problem, we are eager to explore them.
>>
>> We are in dire need of a highly scalable solution that ensures each
>> device's data processing is independent and prevents any interference
>> or collisions between devices' data streams. Our primary objectives
>> are:
>>
>> 1. *Isolation and Independence:* We require a strategy that guarantees
>> that one device's processing speed does not affect other devices in the
>> same Kafka partition; in other words, a solution that ensures the
>> independent processing of each device's data.
>>
>> 2. *Open-Source Implementation:* We are actively seeking pointers to
>> open-source implementations or references to working solutions that
>> address this specific challenge within the Apache ecosystem. Any
>> existing projects, libraries, or community-contributed solutions that
>> align with our requirements would be immensely valuable.
>>
>> We recognize that many Apache Flink users face similar issues and may
>> have already found innovative ways to tackle them. We implore you to
>> share your knowledge and experiences on this matter.
>> Specifically, we are interested in:
>>
>> *- Strategies or architectural patterns that ensure independent
>> processing of device data.*
>>
>> *- Insights into load balancing, scalability, and efficient data
>> processing across Kafka partitions.*
>>
>> *- Any existing open-source projects or implementations that address
>> similar challenges.*
>>
>> We are confident that your contributions will not only help us resolve
>> this critical issue but also assist the broader Apache Flink community
>> facing similar obstacles.
>>
>> Please respond to this thread with your expertise, solutions, or any
>> relevant resources. Your support will be invaluable to our team and the
>> entire Apache Flink community.
>>
>> Thank you for your prompt attention to this matter.
>>
>> Thanks & Regards,
>> Karthick.
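On the sizing question raised earlier in the thread: there is no exact formula, but a widely used rule of thumb is to divide the target throughput by the measured per-partition producer and consumer rates, keep the larger of the two, and add headroom for growth. A back-of-the-envelope sketch with made-up rates (the numbers below are illustrative assumptions, not measurements from this thread):

```python
import math

def estimate_partitions(target_msgs_per_sec: float,
                        per_partition_produce_rate: float,
                        per_partition_consume_rate: float,
                        headroom: float = 1.5) -> int:
    """Rule-of-thumb sizing: enough partitions that neither the producer
    nor the (often slower) consumer side becomes the bottleneck, with
    headroom for growth. Rates should be measured, not guessed."""
    needed = max(target_msgs_per_sec / per_partition_produce_rate,
                 target_msgs_per_sec / per_partition_consume_rate)
    return math.ceil(needed * headroom)

# e.g. 1000+ msgs/sec overall, and suppose one consumer keeps up with
# ~50 msgs/sec on the slowest processing path:
print(estimate_partitions(1000, 500, 50))  # -> 30
```

Since the partition count caps the useful parallelism of the Flink Kafka source, as Giannis notes above, the result of this estimate also bounds the source parallelism worth configuring.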