Unsubscribe

2023-06-08 Thread huang huang



Call for Presentations: Flink Forward Seattle 2023

2023-06-08 Thread Jing Ge via user
Dear Flink developers & users,

We hope this email finds you well. We are excited to announce the Call for
Presentations for the upcoming Flink Forward Seattle 2023, the premier
event dedicated to Apache Flink and stream processing technologies. As a
prominent figure in the field, you are invited to submit your innovative
research, insightful experiences, and cutting-edge use cases for
consideration as a speaker at the conference.

Flink Forward Conference 2023 Details:
Date: November 6-7 (training), November 8 (conference)
Location: Seattle, United States

Flink Forward is a conference dedicated to the Apache Flink® community. In
2023, a full conference day will follow a two-day training session. The
conference gathers an international audience of CTOs/CIOs,
developers, data architects, data scientists, Apache Flink® core
committers, and the stream processing community, to share experiences,
exchange ideas and knowledge, and receive hands-on training sessions led by
Flink experts. We are seeking compelling presentations and
thought-provoking talks that cover a broad range of topics related to
Apache Flink, including but not limited to:

Flink architecture and internals
Flink performance optimization
Advanced Flink features and enhancements
Real-world use cases and success stories
Flink ecosystem and integrations
Stream processing at scale
Best practices for Flink application development

If you have an inspiring story, valuable insights, a real-world
application, a research breakthrough, a use case, a best practice, or a
compelling vision of the future for Flink, we encourage you to present it
to a highly skilled and enthusiastic community. We welcome submissions from
both industry professionals and academic researchers.

To submit your proposal, please visit the Flink Forward Conference website
at https://www.flink-forward.org/seattle-2023/call-for-presentations. The
submission form will require you to provide an abstract of your talk, along
with a brief biography and any supporting materials. The deadline for
submissions is July 12th 11:59 pm PDT.

We believe your contribution will greatly enrich the Flink Forward
Conference and provide invaluable insights to our attendees. This is an
excellent opportunity to connect with a diverse community of Flink
enthusiasts, network with industry experts, and gain recognition for your
expertise. We look forward to receiving your submission and welcoming you
as a speaker at the Flink Forward Conference.

Thank you for your time and consideration.

Best regards,

-- 

Jing Ge | Head of Engineering

j...@ververica.com



Follow us @VervericaData

--

Join Flink Forward  - The Apache Flink
Conference - Tickets on SALE now!

Re: [DISCUSS] Status of Statefun Project

2023-06-08 Thread Galen Warren via user
Thanks Martijn.

Personally, I'm already using a local fork of Statefun that is compatible
with Flink 1.16.x, so I wouldn't have any need for a released version
compatible with 1.15.x. I'd be happy to do the PRs to modify Statefun to
work with new versions of Flink as they come along.

As for testing, Statefun does have unit tests and Gordon also sent me
instructions a while back for how to do some additional smoke tests which
are pretty straightforward. Perhaps he could weigh in on whether the
combination of automated tests plus those smoke tests should be sufficient
for testing with new Flink versions (I believe the answer is yes).

-- Galen



On Thu, Jun 8, 2023 at 8:01 AM Martijn Visser 
wrote:

> Hi all,
>
> Apologies for the late reply.
>
> I'm willing to help out with merging requests in Statefun to keep them
> compatible with new Flink releases and create new releases. I do think that
> validation of the functionality of these releases depends a lot on those
> who do these compatibility updates, with PMC members helping out with the
> formal process.
>
> > Why can't the Apache Software Foundation allow community members to bring
> it up to date?
>
> There's nothing preventing anyone from reviewing any of the current PRs or
> opening new ones. However, none of them are approved [1], so there's also
> nothing to merge.
>
> > I believe that there are people and companies on this mailing list
> interested in supporting Apache Flink Stateful Functions.
>
> If so, then now is the time to show.
>
> Would there be a preference to create a release with Galen's merged
> compatibility update to Flink 1.15.2, or do we want to skip that and go
> straight to a newer version?
>
> Best regards,
>
> Martijn
>
> [1]
>
> https://github.com/apache/flink-statefun/pulls?q=is%3Apr+is%3Aopen+review%3Aapproved
>
> On Tue, Jun 6, 2023 at 3:55 PM Marco Villalobos  >
> wrote:
>
> > Why can't the Apache Software Foundation allow community members to bring
> > it up to date?
> >
> > What's the process for that?
> >
> > I believe that there are people and companies on this mailing list
> > interested in supporting Apache Flink Stateful Functions.
> >
> > You already had two people on this thread express interest.
> >
> > At the very least, we could keep the library versions up to date.
> >
> > There is only a short list of new features that might be worthwhile:
> >
> > 1. event time processing
> > 2. state rest api
> >
> >
> > On Jun 6, 2023, at 3:06 AM, Chesnay Schepler  wrote:
> >
> > If you were to fork it *and want to redistribute it* then the short
> > version is that
> >
> >1. you have to adhere to the Apache licensing requirements
> >2. you have to make it clear that your fork does not belong to the
> >Apache Flink project. (Trademarks and all that)
> >
> > Neither should be significant hurdles (there should also be plenty of
> > online resources regarding 1), and if you do this then you can freely
> share
> > your fork with others.
> >
> > I've also pinged Martijn to take a look at this thread.
> > To my knowledge the project hasn't decided anything yet.
> >
> > On 27/05/2023 04:05, Galen Warren wrote:
> >
> > Ok, I get it. No interest.
> >
> > If this project is being abandoned, I guess I'll work with my own fork.
> Is
> > there anything I should consider here? Can I share it with other people
> who
> > use this project?
> >
> > On Tue, May 16, 2023 at 10:50 AM Galen Warren 
> 
> > wrote:
> >
> >
> > Hi Martijn, since you opened this discussion thread, I'm curious what
> your
> > thoughts are in light of the responses? Thanks.
> >
> > On Wed, Apr 19, 2023 at 1:21 PM Galen Warren  <
> ga...@cvillewarrens.com>
> > wrote:
> >
> >
> > I use Apache Flink for stream processing, and StateFun as a hand-off
> >
> > point for the rest of the application.
> > It serves well as a bridge between a Flink Streaming job and
> > micro-services.
> >
> > This is essentially how I use it as well, and I would also be sad to see
> > it sunsetted. It works well; I don't know that there is a lot of new
> > development required, but if there are no new Statefun releases, then
> > Statefun can only be used with older Flink versions.
> >
> > On Tue, Apr 18, 2023 at 10:04 PM Marco Villalobos <
> mvillalo...@kineteque.com> wrote:
> >
> >
> > I am currently using Stateful Functions in my application.
> >
> > I use Apache Flink for stream processing, and StateFun as a hand-off
> > point for the rest of the application.
> > It serves well as a bridge between a Flink Streaming job and
> > micro-services.
> >
> > I would be disappointed if StateFun were sunsetted. It's a good idea.
> >
> > If there is anything I can do to help, as a contributor perhaps, please
> > let me know.
> >
> >
> > On Apr 3, 2023, at 2:02 AM, Martijn Visser  <
> martijnvis...@apache.org>
> >
> > wrote:
> >
> > Hi everyone,
> >
> > I want to open a discussion on the status of the Statefun Project [1]
> >
> > in Apache Flink. As you might have noticed, there hasn't 

Re: [DISCUSS] Status of Statefun Project

2023-06-08 Thread Martijn Visser
Hi all,

Apologies for the late reply.

I'm willing to help out with merging requests in Statefun to keep them
compatible with new Flink releases and create new releases. I do think that
validation of the functionality of these releases depends a lot on those
who do these compatibility updates, with PMC members helping out with the
formal process.

> Why can't the Apache Software Foundation allow community members to bring
it up to date?

There's nothing preventing anyone from reviewing any of the current PRs or
opening new ones. However, none of them are approved [1], so there's also
nothing to merge.

> I believe that there are people and companies on this mailing list
interested in supporting Apache Flink Stateful Functions.

If so, then now is the time to show.

Would there be a preference to create a release with Galen's merged
compatibility update to Flink 1.15.2, or do we want to skip that and go
straight to a newer version?

Best regards,

Martijn

[1]
https://github.com/apache/flink-statefun/pulls?q=is%3Apr+is%3Aopen+review%3Aapproved

On Tue, Jun 6, 2023 at 3:55 PM Marco Villalobos 
wrote:

> Why can't the Apache Software Foundation allow community members to bring
> it up to date?
>
> What's the process for that?
>
> I believe that there are people and companies on this mailing list
> interested in supporting Apache Flink Stateful Functions.
>
> You already had two people on this thread express interest.
>
> At the very least, we could keep the library versions up to date.
>
> There is only a short list of new features that might be worthwhile:
>
> 1. event time processing
> 2. state rest api
>
>
> On Jun 6, 2023, at 3:06 AM, Chesnay Schepler  wrote:
>
> If you were to fork it *and want to redistribute it* then the short
> version is that
>
>1. you have to adhere to the Apache licensing requirements
>2. you have to make it clear that your fork does not belong to the
>Apache Flink project. (Trademarks and all that)
>
> Neither should be significant hurdles (there should also be plenty of
> online resources regarding 1), and if you do this then you can freely share
> your fork with others.
>
> I've also pinged Martijn to take a look at this thread.
> To my knowledge the project hasn't decided anything yet.
>
> On 27/05/2023 04:05, Galen Warren wrote:
>
> Ok, I get it. No interest.
>
> If this project is being abandoned, I guess I'll work with my own fork. Is
> there anything I should consider here? Can I share it with other people who
> use this project?
>
> On Tue, May 16, 2023 at 10:50 AM Galen Warren  
> 
> wrote:
>
>
> Hi Martijn, since you opened this discussion thread, I'm curious what your
> thoughts are in light of the responses? Thanks.
>
> On Wed, Apr 19, 2023 at 1:21 PM Galen Warren  
> 
> wrote:
>
>
> I use Apache Flink for stream processing, and StateFun as a hand-off
>
> point for the rest of the application.
> It serves well as a bridge between a Flink Streaming job and
> micro-services.
>
> This is essentially how I use it as well, and I would also be sad to see
> it sunsetted. It works well; I don't know that there is a lot of new
> development required, but if there are no new Statefun releases, then
> Statefun can only be used with older Flink versions.
>
> On Tue, Apr 18, 2023 at 10:04 PM Marco Villalobos  
> wrote:
>
>
> I am currently using Stateful Functions in my application.
>
> I use Apache Flink for stream processing, and StateFun as a hand-off
> point for the rest of the application.
> It serves well as a bridge between a Flink Streaming job and
> micro-services.
>
> I would be disappointed if StateFun were sunsetted. It's a good idea.
>
> If there is anything I can do to help, as a contributor perhaps, please
> let me know.
>
>
> On Apr 3, 2023, at 2:02 AM, Martijn Visser  
> 
>
> wrote:
>
> Hi everyone,
>
> I want to open a discussion on the status of the Statefun Project [1]
>
> in Apache Flink. As you might have noticed, there hasn't been much
> development over the past months in the Statefun repository [2]. There is
> currently a lack of active contributors and committers who are able to help
> with the maintenance of the project.
>
> In order to improve the situation, we need to solve the lack of
>
> committers and the lack of contributors.
>
> On the lack of committers:
>
> 1. Ideally, there are some of the current Flink committers who have
>
> the bandwidth and can help with reviewing PRs and merging them.
>
> 2. If that's not an option, it could be a consideration that current
>
> committers only approve and review PRs, that are approved by those who are
> willing to contribute to Statefun and if the CI passes
>
> On the lack of contributors:
>
> 3. Next to having this discussion on the Dev and User mailing list, we
>
> can also create a blog with a call for new contributors on the Flink
> project website, send out some tweets on the Flink / Statefun twitter
> accounts, post messages on Slack etc. In that message, we would inform how
> those that 

Re: Flink job latency metrics

2023-06-08 Thread 17610775726
Hi

All the monitoring you mentioned can be done with metrics and alerting. As
for the LatencyMarker you mentioned: since it does not take part in the
operators' internal computation logic, that metric is not precise. However,
when the job is backpressured the LatencyMarker is blocked as well, so it
can still roughly reflect the job's latency. If you want to measure
end-to-end latency accurately, take a start-time timestamp when consuming
from Kafka and an end-time timestamp at the sink, then report the
difference through a custom metric and build your end-to-end latency
monitoring on that metric.

Best
JasonLee

---- Original message ----
| From | casel.chen |
| Date | 2023-06-08 16:39 |
| To | user-zh@flink.apache.org |
| Subject | Flink job latency metrics |
Is there a metric that tells me how far behind a Flink job currently is? I
want to set up latency alerts to reflect job health: whether there is
backpressure, whether more resources are needed, and so on.
Take real-time synchronization from a MySQL table to a Doris table as an
example: mysql binlog -> kafka -> flink -> doris
The latency metrics include:
1. Business latency: business latency = current system time - event time of
the last record the system processed
   e.g.: the time a Kafka message is written to Doris - the time the
message itself was produced (e.g. when the MySQL record was updated)
2. Data staleness latency: staleness = the time the data enters the
streaming job - the event time of the data
   e.g.: the time Flink consumes a Kafka message - the time the message
itself was produced (e.g. when the MySQL record was updated)

Currently we substitute Kafka consumer-group lag alerts for this, but that
number is inaccurate: first, Flink only commits offsets on checkpoints;
second, production traffic varies across the day, so alerts are not timely
during low-traffic periods.
The docs mention a LatencyMarker that can be enabled. Once enabled, how do
I observe the latency? Does this metric have to be reported to Prometheus
before it can be read?

A separate problem: metric names generated for jobs submitted with Flink
SQL are very long, because the operatorId is derived from the SQL text, and
they keep overwhelming Prometheus. Is there a way to solve this?
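JasonLee's end-to-end suggestion and the two latency definitions above reduce to simple timestamp arithmetic. A minimal sketch in plain Python with made-up timestamps (in a real job the values would come from the record's event time, the Kafka fetch time, and the sink write time, and the result would be reported through a custom Flink metric rather than printed):

```python
# Hypothetical record: event_time is when the MySQL row was updated,
# ingest_time is when Flink consumed it from Kafka (the "start time"),
# sink_time is when it was written to Doris (the "end time").
record = {
    'event_time': 1_686_200_000.0,
    'ingest_time': 1_686_200_004.0,
    'sink_time': 1_686_200_009.5,
}

def business_latency(rec, now):
    """Business latency = current system time - event time of the
    last processed record."""
    return now - rec['event_time']

def staleness_latency(rec):
    """Data staleness = time the record entered the streaming job
    minus its event time."""
    return rec['ingest_time'] - rec['event_time']

def end_to_end_latency(rec):
    """JasonLee's suggestion: stamp a start time at the Kafka source and
    an end time at the sink, report the difference as a custom metric."""
    return rec['sink_time'] - rec['ingest_time']

now = 1_686_200_010.0
print(business_latency(record, now))   # 10.0 seconds
print(staleness_latency(record))       # 4.0 seconds
print(end_to_end_latency(record))      # 5.5 seconds
```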

Flink job latency metrics

2023-06-08 Thread casel.chen
Is there a metric that tells me how far behind a Flink job currently is? I
want to set up latency alerts to reflect job health: whether there is
backpressure, whether more resources are needed, and so on.
Take real-time synchronization from a MySQL table to a Doris table as an
example: mysql binlog -> kafka -> flink -> doris
The latency metrics include:
1. Business latency: business latency = current system time - event time of
the last record the system processed
   e.g.: the time a Kafka message is written to Doris - the time the
message itself was produced (e.g. when the MySQL record was updated)
2. Data staleness latency: staleness = the time the data enters the
streaming job - the event time of the data
   e.g.: the time Flink consumes a Kafka message - the time the message
itself was produced (e.g. when the MySQL record was updated)

Currently we substitute Kafka consumer-group lag alerts for this, but that
number is inaccurate: first, Flink only commits offsets on checkpoints;
second, production traffic varies across the day, so alerts are not timely
during low-traffic periods.
The docs mention a LatencyMarker that can be enabled. Once enabled, how do
I observe the latency? Does this metric have to be reported to Prometheus
before it can be read?

A separate problem: metric names generated for jobs submitted with Flink
SQL are very long, because the operatorId is derived from the SQL text, and
they keep overwhelming Prometheus. Is there a way to solve this?

Re: PyFlink Error JAR files

2023-06-08 Thread Leo

Hi Kadiyala,

I think there is a typo in your email: Flink version 1.7.1. Maybe 1.17.1?

About the error: the reason your code cannot run successfully is that the
class "org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer" has
been deprecated since Flink 1.14 and was removed in Flink 1.17.

You need to use "KafkaSource" and "KafkaSink" to meet your requirement.


Regards,

*Leo*
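A rough sketch of what the migration to KafkaSource might look like in PyFlink (untested against Confluent; the server, topic, and credential values are the placeholders from the original snippet, and the pyflink part is wrapped in a try/except so the connector-properties portion stands on its own where pyflink or its JVM gateway is unavailable):

```python
# Kafka/SASL properties carried over from the original snippet;
# the credential values are placeholders, not real secrets.
props = {
    'bootstrap.servers': 'bootstrap-server',
    'security.protocol': 'SASL_SSL',
    'sasl.mechanism': 'PLAIN',
    'sasl.jaas.config': (
        'org.apache.kafka.common.security.plain.PlainLoginModule required '
        'username="API_KEY" password="API_SECRET";'
    ),
    'group.id': 'python-group-1',
}

try:
    # KafkaSource replaces FlinkKafkaConsumer, which was removed in 1.17.
    from pyflink.common.serialization import SimpleStringSchema
    from pyflink.common.watermark_strategy import WatermarkStrategy
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.connectors.kafka import (
        KafkaOffsetsInitializer, KafkaSource)

    builder = (KafkaSource.builder()
               .set_topics('TOPIC_NAME')
               .set_starting_offsets(KafkaOffsetsInitializer.earliest())
               .set_value_only_deserializer(SimpleStringSchema()))
    for key, value in props.items():
        builder.set_property(key, value)

    env = StreamExecutionEnvironment.get_execution_environment()
    stream = env.from_source(builder.build(),
                             WatermarkStrategy.no_watermarks(),
                             'kafka-source')
    stream.print()
except Exception:
    # pyflink (or its JVM gateway) is not available in this environment;
    # only the connector properties above were built.
    stream = None

print(sorted(props))
```

The sql-connector fat jar still needs to be on the pipeline classpath (`pipeline.jars` or `env.add_jars`), as in the original snippet.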


On 2023/6/7 23:27, Kadiyala, Ruthvik via user wrote:

Hi,

Please find below the code I have been using to consume a Kafka stream
that is hosted on Confluent. It returns an error regarding the JAR
files. Please find the error below the code snippet. Let me know what
I am doing wrong. I am running this on *Docker* with *Flink version:
1.7.1*.


*Code:*

from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.kafka import FlinkKafkaConsumer
from pyflink.datastream.formats.json import JsonRowDeserializationSchema
import glob
import os
import sys
import logging

# Set up the execution environment
env = StreamExecutionEnvironment.get_execution_environment()

logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                    format="%(message)s")

# the sql connector for kafka is used here as it's a fat jar and could
# avoid dependency issues
env.add_jars("file://flink-sql-connector-kafka-1.17.1.jar")
env.add_classpaths("file://flink-sql-connector-kafka-1.17.1.jar")
env.add_jars("file://flink-connector-kafka_2.11-1.9.2-javadoc.jar")
env.add_classpaths("file://flink-connector-kafka_2.11-1.9.2-javadoc.jar")

# Set up the Confluent Cloud Kafka configuration
kafka_config = {
    'bootstrap.servers': 'bootstrap-server',
    'security.protocol': 'SASL_SSL',
    'sasl.mechanism': 'PLAIN',
    'sasl.jaas.config':
        'org.apache.kafka.common.security.plain.PlainLoginModule required '
        'username="API_KEY" password="API_SECRET";'
}

topic = 'TOPIC_NAME'

deserialization_schema = JsonRowDeserializationSchema.Builder() \
    .type_info(Types.ROW([Types.INT(), Types.STRING()])) \
    .build()

# Set up the Kafka consumer properties
consumer_props = {
    'bootstrap.servers': kafka_config['bootstrap.servers'],
    'security.protocol': kafka_config['security.protocol'],
    'sasl.mechanism': kafka_config['sasl.mechanism'],
    'sasl.jaas.config': kafka_config['sasl.jaas.config'],
    'group.id': 'python-group-1'
}

# Create a Kafka consumer
kafka_consumer = FlinkKafkaConsumer(
    topics=topic,  # Kafka topic
    deserialization_schema=deserialization_schema,
    properties=consumer_props,  # Consumer properties
)
kafka_consumer.set_start_from_earliest()
# Add the Kafka consumer as a source to the execution environment
stream = env.add_source(kafka_consumer)

# Define your data processing logic here
# For example, you can print the stream to the console
stream.print()

# Execute the job
env.execute()

*Error:*
*Traceback (most recent call last):*
File "/home/pyflink/test.py", line 45, in 
kafka_consumer = FlinkKafkaConsumer(
File 
"/usr/local/lib/python3.10/dist-packages/pyflink/datastream/connectors/kafka.py", 
line 203, in __init__
j_flink_kafka_consumer = _get_kafka_consumer(topics, properties, 
deserialization_schema,
File 
"/usr/local/lib/python3.10/dist-packages/pyflink/datastream/connectors/kafka.py", 
line 161, in _get_kafka_consumer

j_flink_kafka_consumer = j_consumer_clz(topics,
File 
"/usr/local/lib/python3.10/dist-packages/pyflink/util/exceptions.py", 
line 185, in wrapped_call

raise TypeError(
*TypeError: Could not found the Java class 
'org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer'. The 
Java dependencies could be specified via command line argument 
'--jarfile' or the config option 'pipeline.jars'*




Cheers & Regards,
Ruthvik Kadiyala




Re: Kubernetes operator listing jobs TimeoutException

2023-06-08 Thread Evgeniy Lyutikov
Hi, thanks for the reply.
These errors occur on jobs that have already been successfully deployed and
are running.

When such an error occurs, the operator starts treating the job as being in
the DEPLOYING or DEPLOYED_NOT_READY status, while the job is in fact in the
RUNNING state the whole time and no actions are performed on it.

It seems this problem appeared after updating the FlinkDeployment resource
to change the version of the running job.


2023-06-08 06:31:02,741 o.a.f.k.o.o.JobStatusObserver  [WARN 
][job-name/job-name] Exception while listing jobs
2023-06-08 06:31:02,741 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] Observing JobManager deployment. Previous status: READY
2023-06-08 06:31:03,758 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] JobManager is being deployed
2023-06-08 06:31:03,824 o.a.f.k.o.l.AuditUtils [INFO 
][job-name/job-name] >>> Status | Info| STABLE  | The resource 
deployment is considered to be stable and won’t be rolled back
2023-06-08 06:31:03,825 o.a.f.k.o.a.JobAutoScalerImpl  [INFO 
][job-name/job-name] Job autoscaler is disabled
2023-06-08 06:31:03,825 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO 
][job-name/job-name] Resource fully reconciled, nothing to do...
2023-06-08 06:31:03,825 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] End of reconciliation
2023-06-08 06:31:13,828 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] Starting reconciliation
2023-06-08 06:31:13,829 o.a.f.k.o.s.FlinkResourceContextFactory [INFO 
][job-name/job-name] Getting service for job-name
2023-06-08 06:31:13,829 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] Observing JobManager deployment. Previous status: DEPLOYING
2023-06-08 06:31:14,849 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] JobManager is being deployed
2023-06-08 06:31:14,850 o.a.f.k.o.a.JobAutoScalerImpl  [INFO 
][job-name/job-name] Job autoscaler is disabled
2023-06-08 06:31:14,850 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO 
][job-name/job-name] Resource fully reconciled, nothing to do...
2023-06-08 06:31:14,850 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] End of reconciliation
2023-06-08 06:31:24,853 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] Starting reconciliation
2023-06-08 06:31:24,854 o.a.f.k.o.s.FlinkResourceContextFactory [INFO 
][job-name/job-name] Getting service for job-name
2023-06-08 06:31:24,854 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] Observing JobManager deployment. Previous status: DEPLOYING
2023-06-08 06:31:24,858 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] JobManager deployment port is ready, waiting for the Flink 
REST API...
2023-06-08 06:31:24,926 o.a.f.k.o.l.AuditUtils [INFO 
][job-name/job-name] >>> Status | Info| STABLE  | The resource 
deployment is considered to be stable and won’t be rolled back
2023-06-08 06:31:24,927 o.a.f.k.o.a.JobAutoScalerImpl  [INFO 
][job-name/job-name] Job autoscaler is disabled
2023-06-08 06:31:24,927 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO 
][job-name/job-name] Resource fully reconciled, nothing to do...
2023-06-08 06:31:24,927 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] End of reconciliation
2023-06-08 06:31:34,930 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] Starting reconciliation
2023-06-08 06:31:34,931 o.a.f.k.o.s.FlinkResourceContextFactory [INFO 
][job-name/job-name] Getting service for job-name
2023-06-08 06:31:34,931 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] Observing JobManager deployment. Previous status: 
DEPLOYED_NOT_READY
2023-06-08 06:31:34,931 o.a.f.k.o.o.d.ApplicationObserver [INFO 
][job-name/job-name] JobManager deployment is ready
2023-06-08 06:31:34,931 o.a.f.k.o.o.JobStatusObserver  [INFO 
][job-name/job-name] Observing job status
2023-06-08 06:31:34,936 o.a.f.k.o.o.JobStatusObserver  [INFO 
][job-name/job-name] Job status changed from RECONCILING to RUNNING
2023-06-08 06:31:34,960 o.a.f.k.o.l.AuditUtils [INFO 
][job-name/job-name] >>> Event  | Info| JOBSTATUSCHANGED | Job status 
changed from RECONCILING to RUNNING
2023-06-08 06:31:35,031 o.a.f.k.o.l.AuditUtils [INFO 
][job-name/job-name] >>> Status | Info| STABLE  | The resource 
deployment is considered to be stable and won’t be rolled back
2023-06-08 06:31:35,032 o.a.f.k.o.a.JobAutoScalerImpl  [INFO 
][job-name/job-name] Job autoscaler is disabled
2023-06-08 06:31:35,032 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO 
][job-name/job-name] Resource fully reconciled, nothing to do...
2023-06-08 06:31:35,032 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] End of reconciliation
2023-06-08 06:32:35,035 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][job-name/job-name] Starting reconciliation
2023-06-08 06:32:35,035 

RE: Parquet decoding exception - Flink 1.16.x

2023-06-08 Thread Kamal Mittal via user
Hello,

Can you please share your view on how, for "file system sources", custom
metrics can be created, e.g. a count of corrupt records?

I am using the Flink file source API as in the mail below and decoding
Parquet-formatted data. I am able to count corrupt records, but how do I
report that count to Flink?

Rgds,
Kamal

From: Kamal Mittal via user 
Sent: 07 June 2023 05:48 PM
To: Martijn Visser 
Cc: Kamal Mittal via user 
Subject: RE: Parquet decoding exception - Flink 1.16.x

Hello,

The metrics link given in the mail below doesn't describe any way to create
metrics for a source function, right?

I am using the Flink API below to read/decode Parquet data. The question is
where an exception such as a "decoding exception" thrown by the internal
Parquet API (e.g. "AvroParquetReader") can be caught, so that metrics for
corrupt records can be created:

FileSource.FileSourceBuilder source =
FileSource.forRecordStreamFormat(streamformat, path); // streamformat is of
type AvroParquetRecordFormat

Please suggest.

Rgds,
Kamal

From: Martijn Visser <martijnvis...@apache.org>
Sent: 07 June 2023 03:39 PM
To: Kamal Mittal <kamal.mit...@ericsson.com>
Cc: Kamal Mittal via user <user@flink.apache.org>
Subject: Re: Raise alarm for corrupt records

Hi Kamal,

Documentation on the metrics can be found at 
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/

Best regards,

Martijn

On Wed, Jun 7, 2023 at 10:13 AM Kamal Mittal via user
<user@flink.apache.org> wrote:
Hello,

Thanks for quick reply.

I am using the Parquet encoder/decoder, and if any corrupt record comes in
during decoding, I need to raise an alarm and maintain metrics visible in
the Flink metrics GUI.

So can custom metrics be created in Flink? Please give a reference to any
such documentation.
any such documentation.

Rgds,
Kamal

From: Martijn Visser <martijnvis...@apache.org>
Sent: 07 June 2023 12:31 PM
To: Kamal Mittal <kamal.mit...@ericsson.com>
Cc: user@flink.apache.org
Subject: Re: Raise alarm for corrupt records

Hi Kamal,

No, but it should be straightforward to create metrics or events for these 
types of situations and integrate them with your own alerting solution.

Best regards,

Martijn

On Wed, Jun 7, 2023 at 8:25 AM Kamal Mittal via user
<user@flink.apache.org> wrote:
Hello Community,

Is there any way Flink provides out of the box to raise an alarm for
corrupt records (e.g. due to decoding failure) in the middle of a running
data pipeline, and to send this alarm outside of the task manager process?

Rgds,
Kamal
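Martijn's suggestion earlier in the thread (count corrupt records yourself and expose the count as a custom metric) can be sketched in plain Python. The counter class below is a stand-in for a Flink Counter; in a PyFlink function one would register a real one via runtime_context.get_metrics_group().counter(...), and `int` stands in for the Parquet deserializer:

```python
class CorruptRecordCounter:
    """Stand-in for a Flink Counter; a real one registered on the metric
    group would show up in the Flink metrics GUI / Prometheus."""
    def __init__(self):
        self.count = 0

    def inc(self):
        self.count += 1

def decode_all(raw_records, decode, counter):
    """Decode records, skipping and counting the ones that fail.
    `decode` stands in for the Parquet/Avro deserializer."""
    good = []
    for raw in raw_records:
        try:
            good.append(decode(raw))
        except ValueError:
            counter.inc()  # corrupt record: count it instead of failing
    return good

counter = CorruptRecordCounter()
records = ['1', 'oops', '2', 'bad', '3']
decoded = decode_all(records, int, counter)
print(decoded, counter.count)  # [1, 2, 3] 2
```

An alert can then be driven from the exported counter (e.g. a Prometheus alerting rule on its rate), rather than from inside the task manager process.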