Re: Urgent: Mitigating Slow Consumer Impact and Seeking Open-Source Solutions in Apache Kafka Consumers

2023-09-15 Thread Gowtham S
Hi Karthick,

This looks like a common slow-consumer challenge; those with experience
addressing it should be able to help.

Thanks and regards,
Gowtham S


On Fri, 15 Sept 2023 at 23:06, Giannis Polyzos 
wrote:



Re: Urgent: Mitigating Slow Consumer Impact and Seeking Open-Source Solutions in Apache Kafka Consumers

2023-09-15 Thread Karthick
Hi Giannis,
Thanks for the reply.

> some partitions have way more data than others?

Yes, some partitions are overloaded; roughly 5 out of 10 are. We are
currently using the default partitioner, with each device's unique
identifier as the key.

> What is the number of your devices? how many messages are you processing?

As of now there are 1000 devices, and we need to process 1000 messages per
second. (This count will grow in the near future.)
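For intuition, here is a dependency-free sketch of how hash-based key partitioning distributes 1000 device keys over 10 partitions. Kafka's default partitioner actually uses murmur2 over the key bytes; `String.hashCode` below is only a stand-in, so the exact spread will differ from a real cluster.

```java
// Toy model of hash-based key partitioning (NOT Kafka's murmur2).
public class PartitionSkewSketch {

    public static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative hashes
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 10;
        int[] devicesPerPartition = new int[numPartitions];
        for (int d = 0; d < 1000; d++) {
            devicesPerPartition[partitionFor("device-" + d, numPartitions)]++;
        }
        // Device *count* per partition is only half the story: per-partition
        // message volume also depends on how chatty each device is, which is
        // why a few hot devices can overload whichever partitions they hash to.
        for (int p = 0; p < numPartitions; p++) {
            System.out.println("partition " + p + ": " + devicesPerPartition[p] + " devices");
        }
    }
}
```

One common mitigation is salting the keys of known-hot devices (e.g. `deviceId + "#" + shard`), at the cost of losing strict per-device ordering across the salted shards.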

We are using Apache Storm now and plan to migrate to Flink. Could you shed
some light on backpressure?

We would also like to know whether there is a scalable way to give each
device its own processing channel, so that one device's data never collides
with another's.

Your advice/help will be appreciated.

Thanks and regards
Karthick.



On Fri, Sep 15, 2023 at 6:58 PM Giannis Polyzos 
wrote:



Re: Flink 1.15 KubernetesHaServicesFactory

2023-09-15 Thread Chen Zhanghao
Hi Alexey,
This is expected: Flink 1.15 introduced a new multiple-component leader
election service that runs a single leader election per Flink process. If you
run into any issues, you can set `high-availability.use-old-ha-services: true`
to fall back to the old high-availability services.
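For reference, the fallback is a one-line flink-conf.yaml change (note this option was intended as a temporary escape hatch and may not exist in newer releases; verify against your version's docs):

```yaml
# Opt back into the pre-1.15 HA services if the new multi-component
# leader election causes problems
high-availability.use-old-ha-services: true
```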

Best,
Zhanghao Chen

From: Alexey Trenikhun 
Sent: Saturday, September 16, 2023 10:36:41 AM
To: Flink User Mail List 
Subject: Flink 1.15 KubernetesHaServicesFactory

Hello,
After upgrading Flink to 1.15.4 from 1.14.6, I've noticed that there are no 
"{clusterId}-{componentName}-leader" config maps anymore; instead there are 
"{clusterId}-cluster-config-map" and "{clusterId}--cluster-config-map". 
Is that expected?

Thanks,
Alexey


Flink 1.15 KubernetesHaServicesFactory

2023-09-15 Thread Alexey Trenikhun
Hello,
After upgrading Flink to 1.15.4 from 1.14.6, I've noticed that there are no 
"{clusterId}-{componentName}-leader" config maps anymore; instead there are 
"{clusterId}-cluster-config-map" and "{clusterId}--cluster-config-map". 
Is that expected?

Thanks,
Alexey


Re: Urgent: Mitigating Slow Consumer Impact and Seeking Open-Source Solutions in Apache Kafka Consumers

2023-09-15 Thread Giannis Polyzos
Hi Karthick,

At a high level this looks like a data skew issue: do some partitions have
far more data than others? How many devices do you have, and how many
messages are you processing? Most of what you describe sounds like you are
looking for suggestions around load distribution in Kafka, i.e. the number
of partitions, how to distribute your device data, etc.
It would also help to share what your Flink job is doing, since nothing is
mentioned about that. Are you observing backpressure in the Flink UI?

Best

On Fri, Sep 15, 2023 at 3:46 PM Karthick  wrote:



Re: Unsubscribe

2023-09-15 Thread Chen Zhanghao
Hi,

To unsubscribe from the user-zh@flink.apache.org mailing list, please send 
an email with any content to user-zh-unsubscr...@flink.apache.org.

Best,
Zhanghao Chen

From: Lynn Chen 
Sent: September 15, 2023 16:56
To: user-zh@flink.apache.org 
Subject: Unsubscribe

Unsubscribe

Re: flink-metrics: how to obtain the application ID

2023-09-15 Thread Chen Zhanghao
Hi,

To unsubscribe from the user-zh@flink.apache.org mailing list, please send 
an email with any content to user-zh-unsubscr...@flink.apache.org.

Best,
Zhanghao Chen

From: im huzi 
Sent: September 15, 2023 18:14
To: user-zh@flink.apache.org 
Subject: Re: flink-metrics: how to obtain the application ID

Unsubscribe
On Wed, Aug 30, 2023 at 19:14 allanqinjy  wrote:

> hi,
> I have a question: when reporting metrics to Prometheus, a random suffix is
> appended to the job name (in the source code it comes from `new
> AbstractID()`). Is there a way to obtain the application ID of the job
> being reported here?


Recommended Download Directory for File Source

2023-09-15 Thread Kirti Dhar Upadhyay K via user
Hello Guys,

I am using Flink File Source with Amazon S3.
AFAIK, the File Source first downloads the file to a temporary location and 
then starts reading it and emitting records. By default the download 
location is the /tmp directory.

In a containerized environment, where pods have limited memory, is the /tmp 
directory the recommended download location? Or should we use a persistent 
location instead, by configuring io.tmp.dirs? Is there a significant 
performance impact either way?
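For what it's worth, the directory in question is controlled by `io.tmp.dirs`; a minimal flink-conf.yaml sketch pointing it at a mounted volume (the path is illustrative only):

```yaml
# Redirect Flink's temporary files (including staged downloads and spills)
# away from /tmp onto a mounted volume
io.tmp.dirs: /mnt/flink-tmp
```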

Regards,
Kirti Dhar




Re: About Flink parquet format

2023-09-15 Thread Feng Jin
Hi Kamal

Check whether checkpointing is enabled for the job and is triggering
correctly. By default, Parquet files are written with a rolling policy that
rolls a new part file on each checkpoint.

Best,
Feng

On Thu, Sep 14, 2023 at 7:27 PM Kamal Mittal via user 
wrote:

> Hello,
>
>
>
> Tried parquet file creation with file sink bulk writer.
>
>
>
> If the Parquet page size is configured as low as 1 byte (an allowed
> configuration), then Flink keeps creating multiple ‘in-progress’ part files
> whose only content is ‘PAR1’, and the files are never closed.
>
> I want to know why the files are never closed and multiple ‘in-progress’
> part files are created, or why no error is reported if this configuration
> is invalid?
>
>
>
> Rgds,
>
> Kamal
>


Urgent: Mitigating Slow Consumer Impact and Seeking Open-Source Solutions in Apache Kafka Consumers

2023-09-15 Thread Karthick
Dear Apache Flink Community,



I am writing to urgently address a critical challenge we've encountered in
our IoT platform that relies on Apache Kafka and real-time data processing.
We believe this issue is of paramount importance and may have broad
implications for the community.



In our IoT ecosystem, we receive data streams from numerous devices, each
uniquely identified. To maintain data integrity and ordering, we've
meticulously configured a Kafka topic with ten partitions, ensuring that
each device's data is directed to its respective partition based on its
unique identifier. This architectural choice has proven effective in
maintaining data order, but it has also unveiled a significant problem:



*One device's data processing slowness is interfering with other devices'
data, causing a detrimental ripple effect throughout our system.*

To put it simply, when a single device experiences processing delays, it
acts as a bottleneck within the Kafka partition, leading to delays in
processing data from other devices sharing the same partition. This issue
undermines the efficiency and scalability of our entire data processing
pipeline.

Additionally, I would like to highlight that we are currently using the
default partitioner for choosing the partition of each device's data. If
there are alternative partitioning strategies that can help alleviate this
problem, we are eager to explore them.

We are in dire need of a high-scalability solution that not only ensures
each device's data processing is independent but also prevents any
interference or collisions between devices' data streams. Our primary
objectives are:

1. *Isolation and Independence:* We require a strategy that guarantees one
device's processing speed does not affect other devices in the same Kafka
partition. In other words, we need a solution that ensures the independent
processing of each device's data.


2. *Open-Source Implementation:* We are actively seeking pointers to
open-source implementations or references to working solutions that address
this specific challenge within the Apache ecosystem. Any existing projects,
libraries, or community-contributed solutions that align with our
requirements would be immensely valuable.
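Beyond Kafka repartitioning, objective 1 can also be attacked on the consumer side. Below is a dependency-free sketch (a hypothetical helper, not from any library) that serializes work per device while letting different devices proceed independently, so one slow device cannot delay the others:

```java
import java.util.concurrent.*;

// Hypothetical sketch: per-device serialized processing with cross-device
// independence, done by chaining each device's tasks on its own future tail.
public class PerKeySerialExecutor {
    private final ConcurrentMap<String, CompletableFuture<Void>> tails =
            new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Tasks for the same deviceId run in submission order; tasks for
    // different devices run concurrently on the shared pool.
    public CompletableFuture<Void> submit(String deviceId, Runnable task) {
        return tails.compute(deviceId,
                (id, tail) -> (tail == null
                        ? CompletableFuture.<Void>completedFuture(null)
                        : tail)
                        .thenRunAsync(task, pool));
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        PerKeySerialExecutor ex = new PerKeySerialExecutor();
        ex.submit("device-1", () -> System.out.println("device-1 step 1"));
        ex.submit("device-2", () -> System.out.println("device-2 step 1"));
        ex.submit("device-1", () -> System.out.println("device-1 step 2")).join();
        ex.shutdown();
    }
}
```

In Flink, `keyBy(deviceId)` gives a similar per-key ordering guarantee, but a slow key still blocks the other keys that hash to the same subtask, which is the same skew problem in another form.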

We recognize that many Apache Flink users face similar issues and may have
already found innovative ways to tackle them. We implore you to share your
knowledge and experiences on this matter. Specifically, we are interested
in:

*- Strategies or architectural patterns that ensure independent processing
of device data.*

*- Insights into load balancing, scalability, and efficient data processing
across Kafka partitions.*

*- Any existing open-source projects or implementations that address
similar challenges.*



We are confident that your contributions will not only help us resolve this
critical issue but also assist the broader Apache Flink community facing
similar obstacles.



Please respond to this thread with your expertise, solutions, or any
relevant resources. Your support will be invaluable to our team and the
entire Apache Flink community.

Thank you for your prompt attention to this matter.


Thanks & Regards

Karthick.


Maven plugin to detect issues in Flink projects

2023-09-15 Thread Kartoglu, Emre
Hello Flink users,

We recently released a Maven plugin that detects known Flink issues at 
packaging/compile time:

https://github.com/awslabs/static-checker-flink

Its scope is currently limited to finding known connector incompatibility 
issues.
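For anyone wanting to try it, wiring a check like this into a build would typically look like the fragment below; the coordinates and version here are guesses, so take the real ones from the project's README:

```xml
<!-- Hypothetical coordinates; consult
     https://github.com/awslabs/static-checker-flink for the real ones -->
<plugin>
  <groupId>software.amazon.flink</groupId>
  <artifactId>static-checker-flink</artifactId>
  <version>0.1.0</version>
  <executions>
    <execution>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```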

Some future ideas:


  *   Check for other statically detectable issues, e.g. whether Flink 
operator IDs are set.
  *   A Gradle plugin.
  *   A simple CLI tool to detect the issues in jar files (not coupled to any 
build tool).


This came out of a Hackathon, so the code needs a lot of improvement (please 
feel absolutely free to raise a PR).

Please let us know if you have any feedback.

Emre


Re: flink-metrics: how to obtain the application ID

2023-09-15 Thread im huzi
Unsubscribe
On Wed, Aug 30, 2023 at 19:14 allanqinjy  wrote:

> hi,
> I have a question: when reporting metrics to Prometheus, a random suffix is
> appended to the job name (in the source code it comes from `new
> AbstractID()`). Is there a way to obtain the application ID of the job
> being reported here?


Re: Securing Keytab File in Flink

2023-09-15 Thread Gabor Somogyi
Hi Chirag,

A couple of things can be done to reduce the attack surface (including but
not limited to):
* Use delegation tokens where only JM needs the keytab file:
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/security/security-delegation-token/
* Limit the access rights of the keytab to the minimum
* Rotate the keytab time to time
* The keytab can be encrypted at rest but that's fully custom logic outside
of Flink
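A sketch of the delegation-token setup described above, where only the JobManager touches the keytab (the path and principal are placeholders, and the key names should be checked against your Flink version's security docs):

```yaml
security.kerberos.login.keytab: /secure/path/flink.keytab
security.kerberos.login.principal: flink/host@EXAMPLE.COM
# With delegation tokens enabled, TaskManagers receive short-lived tokens
# instead of needing the keytab themselves
security.delegation.tokens.enabled: true
```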

G


On Fri, Sep 15, 2023 at 7:05 AM Chirag Dewan via user 
wrote:

> Hi,
>
> I am trying to implement a HDFS Source connector that can collect files
> from Kerberos enabled HDFS. As per the Kerberos support, I have provided my
> keytab file to Job Managers and all the Task Managers.
>
> Now, I understand that keytab file is a security concern and if left
> unsecured can be used by hackers to gain access to HDFS.
>
> So I wanted to understand if there's a way to encrypt this file on storage
> (at rest) and later decrypt before JM and TMs can initialize the
> KerberosModule?
>
> Or are there any other standard practices for controlling keytab access on
> storage? Would appreciate some ideas.
>
>
> Thanks
>


Re: Help needed on stack overflow query

2023-09-15 Thread Nihar Rao
Thanks Feng, it worked.

On Wed, Sep 6, 2023 at 8:09 AM Feng Jin  wrote:

> Hi Nihar,
> Have you tried using the following configuration:
>
> metrics.reporter.my_reporter.filter.includes:
> jobmanager:*:*;taskmanager:*:*
>
> Please note that the default delimiter for the List parameter in Flink is
> ";".
>
> Best regards,
> Feng
>
> On Thu, Aug 24, 2023 at 11:36 PM Nihar Rao  wrote:
>
>> Hello,
>> Creating a new question for this query, as I am not able to reply to the
>> original post.
>>
>> Can anyone help with the below query
>>
>>
>> https://stackoverflow.com/questions/76949195/how-to-include-multiple-filters-in-filter-includes-parameter-for-my-flink-metric
>>
>> Thanks
>>
>