Re: Profiling on flink jobs

2023-12-01 Thread Matthias Pohl via user
I missed the Reply All button in my previous message. Here's my previous email, sent to the user ML once more for the sake of transparency: Hi Oscar, sorry for the late reply. I didn't see that you posted the question at the beginning of the month already. I used jmap [1] in the past to get some

Re: Doubts about state and table API

2023-11-29 Thread Matthias Pohl via user
Hi Oscar, could you provide the Java code to illustrate what you were doing? The difference between version A and B might be especially helpful. I assume you already looked into the FAQ about operator IDs [1]? Adding the JM and TM logs might help as well to investigate the issue, as Yu Chen

Re: Java 17 as default

2023-11-29 Thread Matthias Pohl via user
The 1.18 Docker images were pushed on Oct 31. This also included Java 17 images [1]. [1] https://hub.docker.com/_/flink/tags?page=1=java17 On Wed, Nov 15, 2023 at 7:56 AM Tauseef Janvekar wrote: > Dear Team, > > I saw the documentation for 1.18 and Java 17 is not supported and the > image is

Re: [DISCUSS][FLINK-33240] Document deprecated options as well

2023-10-30 Thread Matthias Pohl via user
Thanks for your proposal, Zhanghao Chen. I think it adds more transparency to the configuration documentation. +1 from my side on the proposal On Wed, Oct 11, 2023 at 2:09 PM Zhanghao Chen wrote: > Hi Flink users and developers, > > Currently, Flink won't generate doc for the deprecated

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Matthias Pohl via user
Congratulations and good luck with pushing the project forward. On Mon, Mar 27, 2023 at 2:35 PM Jing Ge via user wrote: > Congrats! > > Best regards, > Jing > > On Mon, Mar 27, 2023 at 2:32 PM Leonard Xu wrote: > >> Congratulations! >> >> >> Best, >> Leonard >> >> On Mar 27, 2023, at 5:23 PM,

Re: Issue with the flink version 1.10.1

2023-03-27 Thread Matthias Pohl via user
Hi Kiran, it's really hard to come up with an answer based on your description. Usually, it helps to share some logs with the exact error that's appearing and a clear description of what you're observing and what you're expecting. A plain "no jobs are running" is too general to come up with a

Re: [ANNOUNCE] Apache Flink 1.17.0 released

2023-03-27 Thread Matthias Pohl via user
Here are a few things I noticed from the 1.17 release retrospectively which I want to share (other release managers might have a different view or might disagree): - Google Meet might not be the best choice for the release sync. We need to be able to invite attendees even if the creator of the

Re: [ANNOUNCE] Apache Flink 1.17.0 released

2023-03-23 Thread Matthias Pohl via user
Thanks for getting this release over the finish line. One additional thing: Feel free to reach out to the release managers (or respond to this thread) with feedback on the release process. Our goal is to constantly improve the release process. Feedback on what could be improved or things

Re: Job Cancellation Failing

2023-02-21 Thread Matthias Pohl via user
I noticed a test instability that sounds quite similar to what you're experiencing. I created FLINK-31168 [1] to follow-up on this one. [1] https://issues.apache.org/jira/browse/FLINK-31168 On Mon, Feb 20, 2023 at 4:50 PM Matthias Pohl wrote: > What do you mean by "earlier it used to fail due

Re: Job Cancellation Failing

2023-02-20 Thread Matthias Pohl via user
What do you mean by "earlier it used to fail due to ExecutionGraphStore not existing in /tmp" folder? Did you get the error message "Could not create executionGraphStorage directory in /tmp." and creating this folder fixed the issue? It also looks like the stacktrace doesn't match any of the 1.15

Re: Blob server connection problem

2023-01-24 Thread Matthias Pohl via user
We had issues like that in the past (e.g. FLINK-24923 [1], FLINK-10683 [2]). The error you're observing is caused by an unexpected byte being read from the socket. The BlobServer protocol expects either 0 (for put messages) or 1 (for get messages) being retrieved as a header for new message blocks
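The header check described above can be pictured with a small sketch. It assumes the header is a single byte read from the connection's input stream, with 0 meaning put and 1 meaning get as stated in the post; the class and method names are hypothetical and not taken from Flink's BlobServer code.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Simplified sketch of the header check: a new message block must start with
 * 0 (put) or 1 (get). All names are hypothetical; this is not Flink's BlobServer code.
 */
final class BlobHeaderCheck {

    enum Operation { PUT, GET }

    // Header values as described in the post.
    static final int PUT_HEADER = 0;
    static final int GET_HEADER = 1;

    /** Maps a header byte to an operation and rejects any other value. */
    static Operation classify(int header) {
        switch (header) {
            case PUT_HEADER:
                return Operation.PUT;
            case GET_HEADER:
                return Operation.GET;
            default:
                // Corresponds to the reported failure: some other byte arrived
                // where the protocol expected a header.
                throw new IllegalArgumentException("Unexpected header byte: " + header);
        }
    }

    /** Reads the next header byte from a connection's input stream. */
    static Operation readOperation(InputStream in) throws IOException {
        int header = in.read();
        if (header == -1) {
            throw new IOException("Connection closed before a header byte was read");
        }
        return classify(header);
    }

    private BlobHeaderCheck() {}
}
```

Any byte other than 0 or 1 at a message boundary ends up in the `default` branch, which is the kind of "unexpected byte" situation the error message points to.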

Re: How does Flink plugin system work?

2023-01-02 Thread Matthias Pohl via user
#L457 > [3] > https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/plugins/ > > Matthias Pohl via user wrote on Mon, Jan 2, 2023 at 20:27: > >> Hi Ruibin, >> could you switch to using the currently supported way for instantiating >> reporters using t

Re: The use of zookeeper in flink

2023-01-02 Thread Matthias Pohl via user
And I screwed up the reply again. -.- Here's my previous response for the ML thread and not only spoon_lz: Hi spoon_lz, Thanks for reaching out to the community and sharing your use case. You're right about the fact that Flink's HA feature relies on the leader election. The HA backend not being

Re: How does Flink plugin system work?

2023-01-02 Thread Matthias Pohl via user
Hi Ruibin, could you switch to using the currently supported way for instantiating reporters using the factory configuration parameter [1][2]? Based on the ClassNotFoundException, your suspicion might be right that the plugin didn't make it onto the classpath. Could you share the startup logs of

Re: Cleanup for high-availability.storageDir

2022-12-08 Thread Matthias Pohl via user
Yes, the wrong button was pushed when replying last time. -.- Looking into the code once again [1], you're right. It looks like for "last-state", no job is cancelled but the cluster deployment is just deleted. I was assuming that the artifacts the documentation about the JobResultStore resource

Re: How's JobManager bring up TaskManager in Application Mode or Session Mode?

2022-11-28 Thread Matthias Pohl via user
Hi Mark, the JobManager is not necessarily in charge of spinning up TaskManager instances. It depends on the resource provider configuration you choose. Flink differentiates between active and passive Resource Management (see the two available implementations of ResourceManager [1]). Active
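A minimal way to picture the active/passive distinction, assuming the current set of deployment targets: standalone (including standalone on Kubernetes) on the passive side, YARN and native Kubernetes on the active side. The enum and method names are made up for illustration; in Flink itself the split shows up as the standalone and active ResourceManager implementations.

```java
/**
 * Hypothetical illustration (not Flink code) of active vs. passive resource
 * management per deployment target.
 */
enum DeploymentTarget {
    // Passive: TaskManagers are started externally (scripts, containers,
    // standalone Kubernetes manifests) and register with the ResourceManager.
    STANDALONE(false),
    // Active: the ResourceManager requests TaskManager containers or pods
    // from the cluster framework on demand.
    YARN(true),
    NATIVE_KUBERNETES(true);

    private final boolean activeResourceManagement;

    DeploymentTarget(boolean activeResourceManagement) {
        this.activeResourceManagement = activeResourceManagement;
    }

    /** Whether the JobManager's ResourceManager starts TaskManagers itself. */
    boolean jobManagerStartsTaskManagers() {
        return activeResourceManagement;
    }
}
```

The choice is orthogonal to application vs. session mode: both modes can run on either kind of resource provider.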

Re: [Security] - Critical OpenSSL Vulnerability

2022-11-01 Thread Matthias Pohl via user
The Docker image for Flink 1.12.7 uses an older base image which comes with openssl 1.1.1k. There was a previous post in the OpenSSL mailing list reporting a low-severity vulnerability being fixed with 3.0.6 and 1.1.1r (both versions being explicitly mentioned) [1]. Therefore, I understand the post in a
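Checking whether an image's OpenSSL predates a fix release comes down to comparing version strings like `1.1.1k` and `1.1.1r`. The following is a sketch assuming the `major.minor.patch` format with an optional letter suffix; the `OpenSslVersion` helper is hypothetical, and the comparison is only meaningful against the fix version of the same release line (1.1.1r for 1.1.1x, 3.0.6 for 3.0.x).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Minimal OpenSSL version comparison, assuming "major.minor.patch" plus an
 * optional letter suffix (e.g. "1.1.1k", "3.0.6"). Hypothetical helper.
 */
final class OpenSslVersion implements Comparable<OpenSslVersion> {

    private static final Pattern FORMAT = Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)([a-z]*)");

    private final int major;
    private final int minor;
    private final int patch;
    private final String letters; // "" when there is no suffix

    private OpenSslVersion(int major, int minor, int patch, String letters) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
        this.letters = letters;
    }

    static OpenSslVersion parse(String version) {
        Matcher m = FORMAT.matcher(version.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized OpenSSL version: " + version);
        }
        return new OpenSslVersion(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                m.group(4));
    }

    @Override
    public int compareTo(OpenSslVersion other) {
        int result = Integer.compare(major, other.major);
        if (result == 0) result = Integer.compare(minor, other.minor);
        if (result == 0) result = Integer.compare(patch, other.patch);
        // Letter releases sort alphabetically: "" < "a" < ... < "z" < "za".
        if (result == 0) result = letters.compareTo(other.letters);
        return result;
    }

    /** True if this version predates the given fix release. */
    boolean isOlderThan(String fixVersion) {
        return compareTo(parse(fixVersion)) < 0;
    }
}
```

For example, `OpenSslVersion.parse("1.1.1k").isOlderThan("1.1.1r")` is `true`, i.e. the 1.12.7 base image predates the 1.1.1 fix release mentioned in the post.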

Re: Watermark generating mechanism in Flink SQL

2022-10-17 Thread Matthias Pohl via user
Hi Hunk, there is documentation about watermarking in FlinkSQL [1]. There is also a FlinkSQL cookbook entry about watermarking [2]. Essentially, you define the watermark strategy in your CREATE TABLE statement and specify the lateness for a given event (not the period in which watermarks are
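To illustrate the difference between the declared lateness and the emission period, here is a simplified model of a watermark declared as `WATERMARK FOR ts AS ts - INTERVAL '5' SECOND`. It is a hypothetical class, not Flink's implementation: the expression is evaluated per event and the largest result so far becomes the watermark, while how often that value is emitted is configured separately (in Flink, via `pipeline.auto-watermark-interval`).

```java
/**
 * Simplified model of "WATERMARK FOR ts AS ts - INTERVAL '5' SECOND".
 * Hypothetical class, not Flink's implementation.
 */
final class BoundedLatenessWatermark {

    private final long allowedLatenessMillis;
    private long currentWatermark = Long.MIN_VALUE;

    BoundedLatenessWatermark(long allowedLatenessMillis) {
        this.allowedLatenessMillis = allowedLatenessMillis;
    }

    /** Per event: evaluate "ts - lateness" and keep the maximum seen so far. */
    void onEvent(long eventTimestampMillis) {
        currentWatermark = Math.max(currentWatermark, eventTimestampMillis - allowedLatenessMillis);
    }

    /** Called periodically; the period is independent of the lateness bound. */
    long emitWatermark() {
        return currentWatermark;
    }

    /** An event is late if its timestamp is at or before the current watermark. */
    boolean isLate(long eventTimestampMillis) {
        return eventTimestampMillis <= currentWatermark;
    }
}
```

With a 5-second bound, events at 10 s and then 8 s leave the watermark at 5 s, because an out-of-order event never moves it backwards.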

Re: jobmaster's fatal error will kill the session cluster

2022-10-17 Thread Matthias Pohl via user
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) > ~[flink-scala_2.12-1.15.0.jar:1.15.0] > at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) > ~[flink-scala_2.12-1.15.0.jar:1.15.0] > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)

Re: Sometimes checkpoints to s3 fail

2022-10-14 Thread Matthias Pohl via user
Hi Evgeniy, is it Ceph which you're using as an S3 server? All the Google search entries point to Ceph when looking for the error message. Could it be that there's a problem with the version of the underlying system? The stacktrace you provided looks like Flink struggles to close the File and,

Re: jobmaster's fatal error will kill the session cluster

2022-10-14 Thread Matthias Pohl via user
Hi Jie Han, welcome to the community. Just a little side note: These kinds of questions are more suitable to be asked in the user mailing list. The dev mailing list is rather used for discussing feature development or project-related topics. See [1] for further details. About your question: The

Re: Cancel a job in status INITIALIZING

2022-09-26 Thread Matthias Pohl via user
Could you provide the JobManager logs for this case? It sounds odd that the job was stuck in the INITIALIZING phase. Matthias On Wed, Sep 21, 2022 at 11:50 AM Christian Lorenz via user < user@flink.apache.org> wrote: > Hi, > > > > we’re running a Flink Cluster in standalone/session mode. During a

Re: Jobmanager fails to come up if the job has an issue

2022-09-26 Thread Matthias Pohl via user
Yes, the JobManager will fail over in HA mode and all jobs will be recovered. On Mon, Sep 26, 2022 at 2:06 PM ramkrishna vasudevan < ramvasu.fl...@gmail.com> wrote: > Thanks @Matthias Pohl . This is informative. So > generally in a session cluster if I have more than one job and only one of >

Re: JobManager restarts on job failure

2022-09-26 Thread Matthias Pohl via user
That's a good point. I forgot about these options. You're right. Cleanup wouldn't be done in that case. So, upgrading would be a viable option as you suggested. Matthias On Mon, Sep 26, 2022 at 12:53 PM Gyula Fóra wrote: > Maybe it is a stupid question but in Flink 1.15 with the following

Re: Jobmanager fails to come up if the job has an issue

2022-09-26 Thread Matthias Pohl via user
I see. Thanks for sharing the logs. It's related to FLINK-9097 [1]. In order for the job to not be cleaned up entirely after a failure while submitting the job, the JobManager is failed fatally resulting in a failover. That's what you're experiencing. One solution is to fix the permission issue

Re: JobManager restarts on job failure

2022-09-26 Thread Matthias Pohl via user
Thanks Evgeniy for reaching out to the community and Gyula for picking it up. I haven't looked into the k8s operator in much detail, yet. So, help me out if I miss something here. But I'm afraid that this is not something that would be fixed by upgrading to 1.15. The issue here is that we're

Re: Jobmanager fails to come up if the job has an issue

2022-09-26 Thread Matthias Pohl via user
Hi Ramkrishna, thanks for reaching out to the Flink community. Could you share the JobManager logs to get a better understanding of what's going on? I'm wondering why the JobManager is failing when the actual problem is that the job is struggling to access a folder. It sounds like there are

Re: Classloading issues with Flink Operator / Kubernetes Native

2022-09-16 Thread Matthias Pohl via user
Are you deploying the job in session or application mode? Could you provide the stacktrace? I'm wondering whether that would be helpful to pin down a code location for further investigation. So far, I couldn't come up with a definite answer about placing the jar in the lib directory. Initially, I would

Re: New licensing for Akka

2022-09-09 Thread Matthias Pohl via user
Looks like there will be a bit of a grace period till Sep 2023 for vulnerability fixes in akka 2.6.x [1] [1] https://discuss.lightbend.com/t/2-6-x-maintenance-proposal/9949 On Wed, Sep 7, 2022 at 4:30 PM Robin Cassan via user wrote: > Thanks a lot for your answers, this is reassuring! > >

Re: New licensing for Akka

2022-09-07 Thread Matthias Pohl via user
There is some more discussion going on in the related PR [1]. Based on the current state of the discussion, akka 2.6.20 will be the last version under the Apache 2.0 license. But, I guess, we'll have to see where this discussion is heading considering that it's kind of fresh. [1]

Re: Slow Tests in Flink 1.15

2022-09-06 Thread Matthias Pohl via user
Hi David, I guess, you're referring to [1]. But as Chesnay already pointed out in the previous thread: It would be helpful to get more insights into what exactly your tests are executing (logs, code, ...). That would help identify the cause. > Can you give us a more complete stacktrace so we

Re: flink ci build run longer than the maximum time of 310 minutes.

2022-09-05 Thread Matthias Pohl via user
Usually, it would be more helpful to provide a link to the PR to get a better picture of the problem. I'm not 100% sure whether I grasp what's wrong. It looks like your branch is based on apache/flink:release-1.15 [1]. Therefore, you should fetch the most recent version from upstream and then do

Re: flink ci build run longer than the maximum time of 310 minutes.

2022-09-02 Thread Matthias Pohl via user
Not sure whether that applies to your case, but there was a recent issue [1] where the e2e_1_ci job ran into a timeout. If that's what you were observing, rebasing your branch might help. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-29161 On Fri, Sep 2, 2022 at 10:51 AM

Re: Failing to maven compile install Flink 1.15

2022-08-22 Thread Matthias Pohl via user
Hi hjw, it would be interesting to know the exact Maven commands you used for the successful run (where you compiled the flink-client module individually) and the failed run (where you tried to build everything at once) and probably a more complete version of the Maven output. The path