Re: Urgent: Mitigating Slow Consumer Impact and Seeking Open-Source Solutions in Apache Kafka Consumers

2023-09-18 Thread Gowtham S
Hi All,

We are also looking for a custom partitioning strategy; it would be
helpful for us too.


Thanks and regards,
Gowtham S
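A minimal sketch of the kind of custom partitioning strategy being discussed, in Python for illustration only; the hot-device ids, partition count, and the `partition_for` helper are assumptions for this sketch, not part of any Kafka API:

```python
# Sketch: reserve partitions for known "hot" devices and hash the rest.
# A slow hot device then only blocks its own partition, not everyone else's.
import hashlib

NUM_PARTITIONS = 10
HOT_DEVICES = {"device-42": 8, "device-77": 9}   # assumed hot ids -> reserved partitions
SHARED_PARTITIONS = NUM_PARTITIONS - len(HOT_DEVICES)

def partition_for(device_id: str) -> int:
    # Hot devices get a partition to themselves.
    if device_id in HOT_DEVICES:
        return HOT_DEVICES[device_id]
    # All other devices are hashed over the remaining partitions
    # (md5 here stands in for murmur; any stable hash works for the sketch).
    digest = hashlib.md5(device_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % SHARED_PARTITIONS
```

In the Java clients, logic like this would live in a class implementing the `Partitioner` interface, wired in via the producer's `partitioner.class` config.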


On Mon, 18 Sept 2023 at 12:13, Karthick  wrote:

> Thanks Liu Ron for the suggestion.
>
> Can you please share any pointers/references for the custom partitioning
> strategy? We are currently using murmur hashing with the device's unique id.
> It would be helpful if you could point us to any other strategies.
>
> Thanks and regards
> Karthick.
>
> On Mon, Sep 18, 2023 at 9:18 AM liu ron  wrote:
>
>> Hi, Karthick
>>
>> It looks like a data skew problem. One of the easiest and most
>> effective first steps is to increase the number of partitions and see
>> how that works, e.g. try expanding by 100 first.
>>
>> Best,
>> Ron
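As a rough way to size such an expansion, partition count is often estimated from the target throughput divided by the per-partition (or per-consumer) throughput, with some headroom. The function and the numbers below are illustrative assumptions only, not figures from this thread:

```python
# Rough sizing rule: partitions >= max(target / producer rate per partition,
#                                      target / consumer rate per consumer),
# multiplied by a headroom factor for growth and rebalancing slack.
import math

def estimate_partitions(target_msgs_per_sec: float,
                        per_partition_produce_rate: float,
                        per_consumer_consume_rate: float,
                        headroom: float = 2.0) -> int:
    needed = max(target_msgs_per_sec / per_partition_produce_rate,
                 target_msgs_per_sec / per_consumer_consume_rate)
    return math.ceil(needed * headroom)

# Example: 1000 msg/s target, but each consumer only handles ~50 msg/s,
# so the consumer side dominates: ceil(max(2, 20) * 2.0) = 40 partitions.
print(estimate_partitions(1000, 500, 50))  # prints 40
```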
>>
>> Karthick  于2023年9月17日周日 17:03写道:
>>
>>> Thanks Wei Chen, Giannis for the time,
>>>
>>>
>>> For starters, you need to better size and estimate the required number
>>>> of partitions you will need on the Kafka side in order to process 1000+
>>>> messages/second.
>>>> The number of partitions should also define the maximum parallelism for
>>>> the Flink job reading from Kafka.
>>>
>>> Thanks for the pointer. Can you please advise on what factors we
>>> need to consider here?
>>>
>>> use a custom partitioner that spreads those devices to somewhat separate
>>>> partitions.
>>>
>>> Please suggest a working approach for the custom partitioner to
>>> distribute the load; it would be helpful.
>>>
>>>
>>> What we were doing at that time was to define multiple topics and each
>>>> has a different # of partitions
>>>
>>> Thanks for the suggestion. Is there any calculation for choosing the
>>> topic count? If there are any formulae/factors to determine this
>>> number, please let us know; it would help us choose.
>>>
>>> Thanks and Regards
>>> Karthick.
>>>
>>>
>>>
>>> On Sun, Sep 17, 2023 at 4:04 AM Wei Chen  wrote:
>>>
>>>> Hi Karthick,
>>>> We’ve experienced a similar issue before. What we did at that
>>>> time was define multiple topics, each with a different number of
>>>> partitions, so the topics with more partitions get higher
>>>> parallelism for processing.
>>>> You can further divide the topics into several groups, where each
>>>> group has a similar number of partitions. Each group can then be
>>>> defined as a Flink data stream source, so the groups run in
>>>> parallel with different parallelisms.
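The multi-topic grouping described above could be sketched as a routing rule from expected device load to topic; the topic names, rate thresholds, and partition counts below are purely illustrative assumptions:

```python
# Sketch: route devices to topics sized by expected load, so heavy
# devices land on a topic with many partitions (and a high-parallelism
# Flink source), while light devices share a small topic.
def topic_for(device_id: str, expected_rate: float) -> str:
    if expected_rate > 100:        # msgs/sec; thresholds are assumptions
        return "telemetry-heavy"   # e.g. 32 partitions
    if expected_rate > 10:
        return "telemetry-medium"  # e.g. 8 partitions
    return "telemetry-light"       # e.g. 2 partitions

print(topic_for("cam-001", 500))   # prints telemetry-heavy
```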
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>> -- Original --
>>>> *From:* Giannis Polyzos 
>>>> *Date:* Sat,Sep 16,2023 11:52 PM
>>>> *To:* Karthick 
>>>> *Cc:* Gowtham S , user >>> >
>>>> *Subject:* Re: Urgent: Mitigating Slow Consumer Impact and Seeking
>>>> Open-Source Solutions in Apache Kafka Consumers
>>>>
>>>> Can you provide some more context on what your Flink job will be doing?
>>>> There might be some things you can do to fix the data skew on the
>>>> Flink side, but first, you want to start with Kafka.
>>>> For starters, you need to better size and estimate the required number
>>>> of partitions you will need on the Kafka side in order to process 1000+
>>>> messages/second.
>>>> The number of partitions also defines the maximum parallelism for
>>>> the Flink job reading from Kafka.
>>>> If you know your "hot devices" in advance, you might want to use a
>>>> custom partitioner that spreads those devices across separate
>>>> partitions.
>>>> Overall this is somewhat of a trial-and-error process. You might also
>>>> want to check that these partitions are evenly balanced among your brokers
>>>> and don't cause too much stress on particular brokers.
>>>>
>>>> Best
>>>>
>>>> On Sat, Sep 16, 2023 at 6:03 PM Karthick 
>>>> wrote:
>>>>
>>>>> Hi Gowtham, I agree with you.
>>>>>
>>>>> I'm eager to resolve the issue or gain a better understanding. Your
>>>>> assistance would be greatly appreciated.

Re: Urgent: Mitigating Slow Consumer Impact and Seeking Open-Source Solutions in Apache Kafka Consumers

2023-09-15 Thread Gowtham S
Hi Karthick,

This appears to be a common challenge related to slow consumers. Those
with relevant experience in addressing such matters should be able to
provide assistance.

Thanks and regards,
Gowtham S


On Fri, 15 Sept 2023 at 23:06, Giannis Polyzos 
wrote:

> Hi Karthick,
>
> On a high level this seems like a data skew issue: do some partitions
> have way more data than others?
> How many devices do you have? How many messages are you processing?
> Most of the things you share above sound like you are looking for
> suggestions around load distribution for Kafka, i.e. the number of
> partitions, how to distribute your device data, etc.
> It would also be good to share what your Flink job is doing, as I
> don't see anything mentioned about that. Are you observing
> backpressure in the Flink UI?
>
> Best
>
> On Fri, Sep 15, 2023 at 3:46 PM Karthick 
> wrote:
>
>> Dear Apache Flink Community,
>>
>>
>>
>> I am writing to urgently address a critical challenge we've encountered
>> in our IoT platform that relies on Apache Kafka and real-time data
>> processing. We believe this issue is of paramount importance and may have
>> broad implications for the community.
>>
>>
>>
>> In our IoT ecosystem, we receive data streams from numerous devices, each
>> uniquely identified. To maintain data integrity and ordering, we've
>> meticulously configured a Kafka topic with ten partitions, ensuring that
>> each device's data is directed to its respective partition based on its
>> unique identifier. This architectural choice has proven effective in
>> maintaining data order, but it has also unveiled a significant problem:
>>
>>
>>
>> *One device's data processing slowness is interfering with other devices'
>> data, causing a detrimental ripple effect throughout our system.*
>>
>> To put it simply, when a single device experiences processing delays, it
>> acts as a bottleneck within the Kafka partition, leading to delays in
>> processing data from other devices sharing the same partition. This issue
>> undermines the efficiency and scalability of our entire data processing
>> pipeline.
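The head-of-line blocking described above can be shown with a toy simulation; all device names and processing times below are hypothetical:

```python
# Toy model: records in one partition are consumed strictly in order,
# so a single slow device delays every record queued behind it, while
# a partition without the slow device finishes almost immediately.
def processing_finish_times(records: list, cost: dict) -> dict:
    t, finish = 0.0, {}
    for dev in records:            # in-order consumption within a partition
        t += cost[dev]
        finish.setdefault(dev, t)  # first completion time per device
    return finish

cost = {"slow": 5.0, "fast": 0.1}
same_partition = processing_finish_times(["slow", "fast"], cost)
own_partition = processing_finish_times(["fast"], cost)
# "fast" waits behind "slow" when they share a partition (5.1 vs 0.1).
```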
>>
>> Additionally, I would like to highlight that we are currently using the
>> default partitioner for choosing the partition of each device's data. If
>> there are alternative partitioning strategies that can help alleviate this
>> problem, we are eager to explore them.
>>
>> We are in dire need of a high-scalability solution that not only ensures
>> each device's data processing is independent but also prevents any
>> interference or collisions between devices' data streams. Our primary
>> objectives are:
>>
>> 1. *Isolation and Independence:* We require a strategy that guarantees
>> one device's processing speed does not affect other devices in the same
>> Kafka partition. In other words, we need a solution that ensures the
>> independent processing of each device's data.
>>
>>
>> 2. *Open-Source Implementation:* We are actively seeking pointers to
>> open-source implementations or references to working solutions that
>> address this specific challenge within the Apache ecosystem. Any
>> existing projects, libraries, or community-contributed solutions that
>> align with our requirements would be immensely valuable.
>>
>> We recognize that many Apache Flink users face similar issues and may
>> have already found innovative ways to tackle them. We implore you to share
>> your knowledge and experiences on this matter. Specifically, we are
>> interested in:
>>
>> *- Strategies or architectural patterns that ensure independent
>> processing of device data.*
>>
>> *- Insights into load balancing, scalability, and efficient data
>> processing across Kafka partitions.*
>>
>> *- Any existing open-source projects or implementations that address
>> similar challenges.*
>>
>>
>>
>> We are confident that your contributions will not only help us resolve
>> this critical issue but also assist the broader Apache Flink community
>> facing similar obstacles.
>>
>>
>>
>> Please respond to this thread with your expertise, solutions, or any
>> relevant resources. Your support will be invaluable to our team and the
>> entire Apache Flink community.
>>
>> Thank you for your prompt attention to this matter.
>>
>>
>> Thanks & Regards
>>
>> Karthick.
>>
>