Robert

We are checking using the metric
flink_taskmanager_job_task_operator_KafkaConsumer_assigned_partitions{jobname="SPECIFICJOBNAME"}

This metric gives the number of partitions assigned to each task(kafka
consumer operator).

Prasanna.


On Wed, Aug 4, 2021 at 8:59 PM Robert Metzger <rmetz...@apache.org> wrote:

> Hi Prasanna,
>
> How are you checking the assignment of Kafka partitions to the consumers?
>
> The FlinkKafkaConsumer doesn't have a rebalance() method, this is a
> generic concept of the DataStream API. Is it possible that you are
> somehow partitioning your data in your Flink job, and this is causing the
> data distribution issues you are observing?
>
>
> On Wed, Aug 4, 2021 at 4:00 PM Prasanna kumar <
> prasannakumarram...@gmail.com> wrote:
>
>> Robert
>>
>> When we apply a rebalance method to the kafka consumer, it is assigning
>> partitions of various topics evenly.
>>
>> But my only concern is that the rebalance method might have a performance
>> impact .
>>
>> Thanks,
>> Prasanna.
>>
>>
>> On Wed, Aug 4, 2021 at 5:55 PM Prasanna kumar <
>> prasannakumarram...@gmail.com> wrote:
>>
>>> Robert,
>>>
>>> Flink version 1.12.2.
>>> Flink connector Kafka Version 2..12
>>>
>>> The partitions are assigned equally if we are reading from a single
>>> topic.
>>>
>>> Our Use case is to read from multiple topics [topics r4 regex pattern]
>>> we use 6 topics and 1 partition per topic for this job.
>>>
>>> In this case , few of the kafka consumer tasks are not allocated.
>>>
>>> Thanks,
>>> Prasanna.
>>>
>>> On Tue, 20 Jul 2021, 17:44 Robert Metzger, <rmetz...@apache.org> wrote:
>>>
>>>> Hi Prasanna,
>>>> which Flink version and Kafka connector are you using? (the
>>>> "KafkaSource" or "FlinkKafkaConsumer"?)
>>>>
>>>> The partition assignment for the FlinkKafkaConsumer is defined here:
>>>> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/KafkaTopicPartitionAssigner.java#L27-L43
>>>>
>>>>
>>>> I assume all your topics have one partition only. Still, the
>>>> "startIndex" should be determined based on the hash of the topic name. My
>>>> only explanation is that your unlucky with the distribution of the hashes.
>>>> If this leads to performance issues, consider using topics with
>>>> multiple partitions, change the name of the topics or increase the
>>>> parallelism of your consumer.
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jul 20, 2021 at 7:53 AM Prasanna kumar <
>>>> prasannakumarram...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a Flink job reading from multiple Kafka topics based on a
>>>>> regex pattern.
>>>>>
>>>>> What we have found out is that the topics are not shared between the
>>>>> kafka consumers in an even manner .
>>>>>
>>>>> Example if there are 8 topics and 4 kafka consumer operators . 1
>>>>> consumer is assigned 6 topics , 2 consumers assigned 1 each and the last
>>>>> consumer is not assigned at all.
>>>>>
>>>>> This leads to inadequate usage of the resources.
>>>>>
>>>>> I could not find any setting/configuration which would make them as
>>>>> even as possible.
>>>>>
>>>>> Let me know if there's a way to do the same.
>>>>>
>>>>> Thanks,
>>>>> Prasanna.
>>>>>
>>>>

Reply via email to