Re: Profiling on flink jobs

2023-12-01 Thread Matthias Pohl via user
I missed the Reply All button in my previous message. Here's my previous email for the sake of transparency sent to the user ML once more: Hi Oscar, sorry for the late reply. I didn't see that you posted the question at the beginning of the month already. I used jmap [1] in the past to get some

Re: Doubts about state and table API

2023-11-29 Thread Matthias Pohl via user
Hi Oscar, could you provide the Java code to illustrate what you were doing? The difference between version A and B might be especially helpful. I assume you already looked into the FAQ about operator IDs [1]? Adding the JM and TM logs might help as well to investigate the issue, as Yu Chen

Re: Java 17 as default

2023-11-29 Thread Matthias Pohl via user
The 1.18 Docker images were pushed on Oct 31. This also included Java 17 images [1]. [1] https://hub.docker.com/_/flink/tags?page=1&name=java17 On Wed, Nov 15, 2023 at 7:56 AM Tauseef Janvekar wrote: > Dear Team, > > I saw the documentation for 1.18 and Java 17 is not supported and the > image is

Re: [DISCUSS][FLINK-33240] Document deprecated options as well

2023-10-30 Thread Matthias Pohl via user
Thanks for your proposal, Zhanghao Chen. I think it adds more transparency to the configuration documentation. +1 from my side on the proposal On Wed, Oct 11, 2023 at 2:09 PM Zhanghao Chen wrote: > Hi Flink users and developers, > > Currently, Flink won't generate doc for the deprecated

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Matthias Pohl
Congratulations and good luck with pushing the project forward. On Mon, Mar 27, 2023 at 2:35 PM Jing Ge via user wrote: > Congrats! > > Best regards, > Jing > > On Mon, Mar 27, 2023 at 2:32 PM Leonard Xu wrote: > >> Congratulations! >> >> >> Best, >> Leonard >> >> On Mar 27, 2023, at 5:23 PM,

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Matthias Pohl via user
Congratulations and good luck with pushing the project forward. On Mon, Mar 27, 2023 at 2:35 PM Jing Ge via user wrote: > Congrats! > > Best regards, > Jing > > On Mon, Mar 27, 2023 at 2:32 PM Leonard Xu wrote: > >> Congratulations! >> >> >> Best, >> Leonard >> >> On Mar 27, 2023, at 5:23 PM,

Re: Issue with the flink version 1.10.1

2023-03-27 Thread Matthias Pohl via user
Hi Kiran, it's really hard to come up with an answer based on your description. Usually, it helps to share some logs with the exact error that's appearing and a clear description on what you're observing and what you're expecting. A plain "no jobs are running" is too general to come up with a

Re: [ANNOUNCE] Apache Flink 1.17.0 released

2023-03-27 Thread Matthias Pohl via user
Here are a few things I noticed from the 1.17 release retrospectively which I want to share (other release managers might have a different view or might disagree): - Google Meet might not be the best choice for the release sync. We need to be able to invite attendees even if the creator of the

Re: [ANNOUNCE] Apache Flink 1.17.0 released

2023-03-27 Thread Matthias Pohl
Here are a few things I noticed from the 1.17 release retrospectively which I want to share (other release managers might have a different view or might disagree): - Google Meet might not be the best choice for the release sync. We need to be able to invite attendees even if the creator of the

Re: [ANNOUNCE] Apache Flink 1.17.0 released

2023-03-23 Thread Matthias Pohl
Thanks for getting this release over the finish line. One additional thing: Feel free to reach out to the release managers (or respond to this thread) with feedback on the release process. Our goal is to constantly improve the release process. Feedback on what could be improved or things

Re: [ANNOUNCE] Apache Flink 1.17.0 released

2023-03-23 Thread Matthias Pohl via user
Thanks for getting this release over the finish line. One additional thing: Feel free to reach out to the release managers (or respond to this thread) with feedback on the release process. Our goal is to constantly improve the release process. Feedback on what could be improved or things

Re: Job Cancellation Failing

2023-02-21 Thread Matthias Pohl via user
I noticed a test instability that sounds quite similar to what you're experiencing. I created FLINK-31168 [1] to follow-up on this one. [1] https://issues.apache.org/jira/browse/FLINK-31168 On Mon, Feb 20, 2023 at 4:50 PM Matthias Pohl wrote: > What do you mean by "earlier it used to

Re: Job Cancellation Failing

2023-02-20 Thread Matthias Pohl via user
What do you mean by "earlier it used to fail due to ExecutionGraphStore not existing in /tmp" folder? Did you get the error message "Could not create executionGraphStorage directory in /tmp." and creating this folder fixed the issue? It also looks like the stacktrace doesn't match any of the 1.15

Re: Blob server connection problem

2023-01-24 Thread Matthias Pohl via user
We had issues like that in the past (e.g. FLINK-24923 [1], FLINK-10683 [2]). The error you're observing is caused by an unexpected byte being read from the socket. The BlobServer protocol expects either 0 (for put messages) or 1 (for get messages) being retrieved as a header for new message blocks
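
The header check described above can be sketched as follows. This is a minimal illustration, not Flink's actual BlobServer code: the class name, method name, and error messages are hypothetical, and only the two header values (0 for put, 1 for get) come from the message.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical illustration of a one-byte header dispatch; not Flink's BlobServer code. */
final class BlobHeaderSketch {
    // Header values as described in the message: 0 = put, 1 = get.
    static final int PUT_OPERATION = 0;
    static final int GET_OPERATION = 1;

    enum Operation { PUT, GET }

    /** Reads one header byte and maps it to an operation, rejecting anything else. */
    static Operation readOperation(InputStream in) throws IOException {
        int header = in.read();
        if (header < 0) {
            throw new IOException("Stream closed before a header byte arrived");
        }
        switch (header) {
            case PUT_OPERATION:
                return Operation.PUT;
            case GET_OPERATION:
                return Operation.GET;
            default:
                // Any other byte means the peer is not speaking the expected protocol.
                throw new IOException("Unknown operation " + header);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readOperation(new ByteArrayInputStream(new byte[] {0})));
        System.out.println(readOperation(new ByteArrayInputStream(new byte[] {1})));
    }
}
```

An unexpected header byte typically points to a client that is not a BlobServer client at all (for example, a port scanner or an HTTP request hitting the blob port), which is why the sketch fails fast instead of guessing.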

Re: How does Flink plugin system work?

2023-01-02 Thread Matthias Pohl via user
#L457 > [3] > https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/plugins/ > > On Mon, Jan 2, 2023 at 20:27, Matthias Pohl via user wrote: > >> Hi Ruibin, >> could you switch to using the currently supported way for instantiating >> reporters using t

Re: The use of zookeeper in flink

2023-01-02 Thread Matthias Pohl via user
And I screwed up the reply again. -.- Here's my previous response for the ML thread and not only spoon_lz: Hi spoon_lz, Thanks for reaching out to the community and sharing your use case. You're right about the fact that Flink's HA feature relies on the leader election. The HA backend not being

Re: How does Flink plugin system work?

2023-01-02 Thread Matthias Pohl via user
Hi Ruibin, could you switch to using the currently supported way for instantiating reporters using the factory configuration parameter [1][2]? Based on the ClassNotFoundException, your suspicion might be right that the plugin didn't make it onto the classpath. Could you share the startup logs of

Re: Cleanup for high-availability.storageDir

2022-12-08 Thread Matthias Pohl via user
ve that doesn't count as Cancelled, so > the artifacts for blobs and submitted job graphs are not cleaned up. I > imagine the same logic Gyula mentioned before applies, namely keep the > latest one and clean the older ones. > > Regards, > Alexis. > > On Thu, Dec 8, 2022 at

Re: How's JobManager bring up TaskManager in Application Mode or Session Mode?

2022-11-28 Thread Matthias Pohl via user
Hi Mark, the JobManager is not necessarily in charge of spinning up TaskManager instances. It depends on the resource provider configuration you choose. Flink differentiates between active and passive Resource Management (see the two available implementations of ResourceManager [1]). Active

Re: [Security] - Critical OpenSSL Vulnerability

2022-11-01 Thread Matthias Pohl via user
The Docker image for Flink 1.12.7 uses an older base image which comes with openssl 1.1.1k. There was a previous post in the OpenSSL mailing list reporting a low-severity vulnerability being fixed with 3.0.6 and 1.1.1r (both versions being explicitly mentioned) [1]. Therefore, I understand the post in a

Re: Watermark generating mechanism in Flink SQL

2022-10-17 Thread Matthias Pohl via user
Hi Hunk, there is documentation about watermarking in FlinkSQL [1]. There is also a FlinkSQL cookbook entry about watermarking [2]. Essentially, you define the watermark strategy in your CREATE TABLE statement and specify the lateness for a given event (not the period in which watermarks are
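
To make the "lateness" idea above concrete: in Flink SQL the watermark clause in the DDL usually has the form `WATERMARK FOR ts AS ts - INTERVAL '5' SECOND`, where the column name `ts` and the 5-second bound are illustrative assumptions here. The plain-Java sketch below mimics that arithmetic outside of Flink; the class and method names are hypothetical and this is a simplification, not Flink's implementation.

```java
/** Hypothetical, simplified model of a "timestamp minus allowed lateness" watermark. */
final class WatermarkSketch {
    private final long maxLatenessMillis;
    private long maxTimestampSeen = Long.MIN_VALUE;

    WatermarkSketch(long maxLatenessMillis) {
        this.maxLatenessMillis = maxLatenessMillis;
    }

    /** Records an event timestamp; out-of-order events never move the maximum backwards. */
    void onEvent(long eventTimestampMillis) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMillis);
    }

    /** The watermark trails the highest timestamp seen so far by the configured lateness. */
    long currentWatermark() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // no events yet
        }
        return maxTimestampSeen - maxLatenessMillis;
    }

    public static void main(String[] args) {
        WatermarkSketch sketch = new WatermarkSketch(5_000L);
        sketch.onEvent(10_000L);
        sketch.onEvent(7_000L); // arrives out of order
        System.out.println(sketch.currentWatermark());
    }
}
```

The value configured in the DDL is a bound on how far events may lag behind the newest timestamp, which is a different setting from how often watermarks are emitted.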

Re: Watermark generating mechanism in Flink SQL

2022-10-17 Thread Matthias Pohl
Hi Hunk, there is documentation about watermarking in FlinkSQL [1]. There is also a FlinkSQL cookbook entry about watermarking [2]. Essentially, you define the watermark strategy in your CREATE TABLE statement and specify the lateness for a given event (not the period in which watermarks are

Re: jobmaster's fatal error will kill the session cluster

2022-10-17 Thread Matthias Pohl via user
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) > ~[flink-scala_2.12-1.15.0.jar:1.15.0] > at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) > ~[flink-scala_2.12-1.15.0.jar:1.15.0] > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)

Re: Sometimes checkpoints to s3 fail

2022-10-14 Thread Matthias Pohl via user
Hi Evgeniy, is it Ceph which you're using as an S3 server? All the Google search entries point to Ceph when looking for the error message. Could it be that there's a problem with the version of the underlying system? The stacktrace you provided looks like Flink struggles to close the File and,

Re: jobmaster's fatal error will kill the session cluster

2022-10-14 Thread Matthias Pohl via user
Hi Jie Han, welcome to the community. Just a little side note: These kinds of questions are more suitable to be asked in the user mailing list. The dev mailing list is rather used for discussing feature development or project-related topics. See [1] for further details. About your question: The

Re: Cancel a job in status INITIALIZING

2022-09-26 Thread Matthias Pohl via user
Can you provide the JobManager logs for this case? It sounds odd that the job was stuck in the INITIALIZING phase. Matthias On Wed, Sep 21, 2022 at 11:50 AM Christian Lorenz via user < user@flink.apache.org> wrote: > Hi, > > > > we’re running a Flink Cluster in standalone/session mode. During a

Re: Jobmanager fails to come up if the job has an issue

2022-09-26 Thread Matthias Pohl via user
Yes, the JobManager will failover in HA mode and all jobs would be recovered. On Mon, Sep 26, 2022 at 2:06 PM ramkrishna vasudevan < ramvasu.fl...@gmail.com> wrote: > Thanks @Matthias Pohl . This is informative. So > generally in a session cluster if I have more than one job

Re: JobManager restarts on job failure

2022-09-26 Thread Matthias Pohl via user
ing configs > enabled: > > SHUTDOWN_ON_APPLICATION_FINISH = false > SUBMIT_FAILED_JOB_ON_APPLICATION_ERROR = true > > I think jobmanager pod would not restart but simply go to a terminal > failed state right? > > Gyula > > On Mon, Sep 26, 2022 at 12:31 PM Matthia

Re: Jobmanager fails to come up if the job has an issue

2022-09-26 Thread Matthias Pohl via user
ishna vasudevan < > ramvasu.fl...@gmail.com> wrote: > >> Thank you very much for the reply. I have lost the k8s cluster in this >> case before I could capture the logs. I will try to repro this and get back >> to you. >> >> Regards >> Ram >> >>

Re: JobManager restarts on job failure

2022-09-26 Thread Matthias Pohl via user
Thanks Evgeniy for reaching out to the community and Gyula for picking it up. I haven't looked into the k8s operator in much detail, yet. So, help me out if I miss something here. But I'm afraid that this is not something that would be fixed by upgrading to 1.15. The issue here is that we're

Re: Jobmanager fails to come up if the job has an issue

2022-09-26 Thread Matthias Pohl via user
Hi Ramkrishna, thanks for reaching out to the Flink community. Could you share the JobManager logs to get a better understanding of what's going on? I'm wondering why the JobManager is failing when the actual problem is that the job is struggling to access a folder. It sounds like there are

Re: Classloading issues with Flink Operator / Kubernetes Native

2022-09-16 Thread Matthias Pohl via user
Are you deploying the job in session or application mode? Could you provide the stacktrace? I'm wondering whether that would be helpful to pin a code location for further investigation. So far, I couldn't come up with a definite answer about placing the jar in the lib directory. Initially, I would

Re: New licensing for Akka

2022-09-09 Thread Matthias Pohl via user
Looks like there will be a bit of a grace period till Sep 2023 for vulnerability fixes in akka 2.6.x [1] [1] https://discuss.lightbend.com/t/2-6-x-maintenance-proposal/9949 On Wed, Sep 7, 2022 at 4:30 PM Robin Cassan via user wrote: > Thanks a lot for your answers, this is reassuring! > >

Re: New licensing for Akka

2022-09-07 Thread Matthias Pohl via user
There is some more discussion going on in the related PR [1]. Based on the current state of the discussion, akka 2.6.20 will be the last version under Apache 2.0 license. But, I guess, we'll have to see where this discussion is heading considering that it's kind of fresh. [1]

Re: Slow Tests in Flink 1.15

2022-09-06 Thread Matthias Pohl via user
Hi David, I guess, you're referring to [1]. But as Chesnay already pointed out in the previous thread: It would be helpful to get more insights into what exactly your tests are executing (logs, code, ...). That would help identify the cause. > Can you give us a more complete stacktrace so we

Re: flink ci build run longer than the maximum time of 310 minutes.

2022-09-05 Thread Matthias Pohl via user
imSweet/flink/compare/release-1.15...apache:flink:release-1.15> > behind > <https://github.com/SwimSweet/flink/compare/release-1.15...apache:flink:release-1.15> > apache:release-1.15 > also appear in my pr change files. How can I handle this situation? thx. > > ---

Re: flink ci build run longer than the maximum time of 310 minutes.

2022-09-02 Thread Matthias Pohl via user
Not sure whether that applies to your case, but there was a recent issue [1] where the e2e_1_ci job ran into a timeout. If that's what you were observing, rebasing your branch might help. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-29161 On Fri, Sep 2, 2022 at 10:51 AM

Re: Failing to maven compile install Flink 1.15

2022-08-22 Thread Matthias Pohl via user
Hi hjw, it would be interesting to know the exact Maven commands you used for the successful run (where you compiled the flink-client module individually) and the failed run (where you tried to build everything at once) and probably a more complete version of the Maven output. The path

Re: Jobmanager trying to be registered for Zombie Job

2022-04-26 Thread Matthias Pohl
thanks a lot for your help too! > > It's not quite clear to me, the bug was already there since 1.13.6 but not > reported yet (FLINK-27354 is a new ticket)? > > Best, Peter > > > On Mon, Apr 25, 2022 at 5:48 PM Matthias Pohl > wrote: > >> Thanks again, Pet

Re: Jobmanager trying to be registered for Zombie Job

2022-04-25 Thread Matthias Pohl
details on the investigation in FLINK-27354 [1] itself. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-27354 On Mon, Apr 25, 2022 at 2:00 PM Matthias Pohl wrote: > Thanks Peter, we're looking into it... > > On Mon, Apr 25, 2022 at 11:54 AM Peter Schrott > wr

Re: Jobmanager trying to be registered for Zombie Job

2022-04-25 Thread Matthias Pohl
an be seen on jm 1 that > the job starts crashing and recovering a few times. This happens > until 2022-04-20 12:12:14,607. After that the above described behavior can > be seen. > > I hope this helps. > > Best, Peter > > On Fri, Apr 22, 2022 at 12:06 PM Matthias Pohl

Re: Jobmanager trying to be registered for Zombie Job

2022-04-22 Thread Matthias Pohl
/browse/FLINK-27354 On Fri, Apr 22, 2022 at 11:54 AM Matthias Pohl wrote: > Just by looking through the code, it appears that these logs could be > produced while stopping the job. The ResourceManager sends a confirmation > of the JobMaster being disconnected at the end back to the

Re: Jobmanager trying to be registered for Zombie Job

2022-04-22 Thread Matthias Pohl
) is that stopping the JobMaster didn't finish for some reason. For that it would be helpful to look at the logs to see whether there is some other issue that causes the JobMaster to stop entirely. On Fri, Apr 22, 2022 at 10:14 AM Matthias Pohl wrote: > ...if possible it would be good to get debug rat

Re: Jobmanager trying to be registered for Zombie Job

2022-04-22 Thread Matthias Pohl
...if possible it would be good to get debug rather than only info logs. Did you encounter anything odd in the TaskManager logs as well? Sharing those might be of value as well. On Fri, Apr 22, 2022 at 8:57 AM Matthias Pohl wrote: > Hi Peter, > thanks for sharing. That doesn't sound righ

Re: Jobmanager trying to be registered for Zombie Job

2022-04-22 Thread Matthias Pohl
Hi Peter, thanks for sharing. That doesn't sound right. Could you provide the entire jobmanager logs? Best, Matthias On Thu, Apr 21, 2022 at 6:08 PM Peter Schrott wrote: > Hi Flink-Users, > > I am not sure if this does something to my cluster or not. But since > updating to Flink 1.15 (atm rc4)

Re: Adjusted frame length exceeds 2147483647

2022-03-18 Thread Matthias Pohl
: > This issue did not repeat, so it may be a network issue > > On Thu, Mar 17, 2022 at 6:12 PM Matthias Pohl wrote: > >> Hi Ori, >> that looks odd. The message seems to exceed the maximum size >> of 2147483647 bytes (2GB). I couldn't find anything similar in the ML or in

Re: how to set kafka sink ssl properties

2022-03-17 Thread Matthias Pohl
Could you share more details on what's not working? Is the ssl.truststore.location accessible from the Flink nodes? Matthias On Thu, Mar 17, 2022 at 4:00 PM HG wrote: > Hi all, > I am probably not the smartest but I cannot find how to set ssl-properties > for a Kafka Sink. > My assumption was

Re: Adjusted frame length exceeds 2147483647

2022-03-17 Thread Matthias Pohl
Hi Ori, that looks odd. The message seems to exceed the maximum size of 2147483647 bytes (2GB). I couldn't find anything similar in the ML or in Jira that supports a bug in Flink. Could it be that there was some network issue? Matthias On Tue, Mar 15, 2022 at 6:52 AM Ori Popowski wrote: > I am
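
For context on the number in the error: 2147483647 is `Integer.MAX_VALUE` (2^31 - 1), one byte short of 2 GiB. A length field that decodes to something larger cannot be represented as a Java `int`, which is consistent with corrupted or misaligned bytes on the wire being read as a frame length. The sketch below only illustrates that bound; the class and method names are hypothetical and it is not the decoder used by Flink.

```java
/** Hypothetical bounds check for a decoded frame length; not the actual decoder. */
final class FrameLengthSketch {
    /** A frame length is usable only if it is non-negative and fits into a Java int. */
    static boolean fitsInFrame(long decodedLength) {
        return decodedLength >= 0 && decodedLength <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);          // the limit named in the error message
        System.out.println(fitsInFrame(1_024L));        // a plausible frame
        System.out.println(fitsInFrame(2_147_483_648L)); // one byte over the limit
    }
}
```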

Re: Flink failure rate restart not work as expect

2022-03-01 Thread Matthias Pohl
detailed logs to this thread when it happens again, > since it only happens occasionally, about once every two weeks, when one of our cluster > machines goes down. > ------ > *From:* Matthias Pohl > *Sent:* March 1, 2022 17:57 > *To:* Alexander Preuß > *Cc:* 刘 家锹 ; user@flink.apache.org

Re: Flink failure rate restart not work as expect

2022-03-01 Thread Matthias Pohl
Hi, I second Alex' observation - based on the logs it looks like the task restart functionality worked as expected: It tried to restart the tasks until it reached the limit of 4 attempts due to the missing TaskManager. The job-cluster shut down with an error code. At this point, YARN should pick
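
The behavior described above (restarts are allowed until a failure limit is reached, after which the job fails for good) can be sketched as a sliding-window counter. This is a simplified model and not Flink's implementation; the class name and the 60-second interval are assumptions, and only the limit of 4 is taken from the message.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window model of a failure-rate restart policy. */
final class FailureRateSketch {
    private final int maxFailuresPerInterval;
    private final long intervalMillis;
    private final Deque<Long> failureTimestamps = new ArrayDeque<>();

    FailureRateSketch(int maxFailuresPerInterval, long intervalMillis) {
        this.maxFailuresPerInterval = maxFailuresPerInterval;
        this.intervalMillis = intervalMillis;
    }

    /** Records a failure and reports whether another restart is still allowed. */
    boolean canRestartAfterFailure(long nowMillis) {
        failureTimestamps.addLast(nowMillis);
        // Forget failures that fall outside the measuring interval.
        while (!failureTimestamps.isEmpty()
                && nowMillis - failureTimestamps.peekFirst() > intervalMillis) {
            failureTimestamps.removeFirst();
        }
        return failureTimestamps.size() <= maxFailuresPerInterval;
    }

    public static void main(String[] args) {
        FailureRateSketch policy = new FailureRateSketch(4, 60_000L);
        for (long t = 0; t < 5_000L; t += 1_000L) {
            System.out.println("failure at " + t + " ms, restart allowed: "
                    + policy.canRestartAfterFailure(t));
        }
    }
}
```

Once the policy refuses a restart, the job terminates, and in a per-job YARN deployment it is then up to YARN's own application retry settings whether a new attempt is started.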

Re: [DISCUSS] Future of Per-Job Mode

2022-01-24 Thread Matthias Pohl
Hi all, I agree with Xintong's comment: Reducing the number of deployment modes would help users. There is a clearer distinction between session mode and the two other deployment modes (i.e. application and job mode). The difference between application and job mode is not that easy to grasp for

Re: Flink Kinesis Producer con't connect with AWS credentials

2022-01-07 Thread Matthias Pohl
FunctionWrapper) -> Sink: Unnamed (1/1)#0] WARN > o.a.f.k.s.c.a.s.k.producer.KinesisProducerConfiguration - Property > aws.credentials.provider.basic.accesskeyid ignored as there is no > corresponding set method in KinesisProducerConfiguration > > On Mon, Jan 3, 2022 at 5:34 PM Matthias Pohl

Re: Flink connection to remote server

2022-01-03 Thread Matthias Pohl
Hi Mariam, a quick mailing list query and Jira query didn't reveal any pointers for Flink with Milvus, unfortunately. But have you had a look at Flink's AsyncIO API [1]? I haven't worked with it, yet. But it sounds like something that might help you access an external system. Matthias [1]

Re: Flink Kinesis Producer con't connect with AWS credentials

2022-01-03 Thread Matthias Pohl
Hi Daniel, I'm assuming you already looked into the Flink documentation for this topic [1]? I'm gonna add Fabian to this thread. Maybe, he's able to help out here. Matthias [1] https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/connectors/kinesis.html#kinesis-producer On Fri, Dec

Re: JsonRowSerializationSchema unable to parse TIMESTAMP_LTZ fields

2022-01-03 Thread Matthias Pohl
For documentation purposes: Surendra started a discussion in FLINK-25411 [1]. [1] https://issues.apache.org/jira/browse/FLINK-25411 On Wed, Dec 22, 2021 at 9:51 AM Surendra Lalwani wrote: > > Hi Team, > > JsonRowSerializationSchema is unable to parse fields with type > TIMESTAMP_LTZ, seems

Re: log4j2 upgrade requirement

2022-01-03 Thread Matthias Pohl
Hi Puneet, Flink logs things like the job name which can be specified by the user. Hence, a user could (as far as I understand) add a job name containing malicious content. This is where the Flink cluster's log4j version comes into play. Therefore, it's not enough to provide only an updated log4j

Re: Scala Case Class Serialization

2021-12-07 Thread Matthias Pohl
Hi Lars, not sure about the out-of-the-box support for case classes with primitive member types (could you refer to the section which made you conclude this?). I haven't used Scala with Flink, yet. So maybe, others can give more context. But have you looked into using the TypeInfoFactory to define

Re: Query regarding exceptions API(/jobs/:jobid/exceptions)

2021-11-30 Thread Matthias Pohl
: > Hi Matthias, > > We have created a JIRA ticket for this issue. Please find the jira id below > > https://issues.apache.org/jira/browse/FLINK-25096 > > Thanks > Mahima > > On Mon, Nov 29, 2021 at 2:24 PM Matthias Pohl > wrote: > >> Thanks Mahima, >>

Re: Query regarding exceptions API(/jobs/:jobid/exceptions)

2021-11-29 Thread Matthias Pohl
92bcba074' on > file system 'file'. > > Caused by: java.lang.IllegalStateException: Failed to rollback to > checkpoint/savepoint > file:/mnt/c/Users/abc/Documents/checkpoints/a737088e21206281db87f6492bcba074/chk-144. > Thanks and Regards > Mahima Agarwal > > > On Fri, N

Re: Query regarding exceptions API(/jobs/:jobid/exceptions)

2021-11-25 Thread Matthias Pohl
Just to add a bit of context: The first-level members all-exceptions, root-exceptions, truncated and timestamp have been around for a longer time. The exceptionHistory was added in Flink 1.13. As part of this change, the aforementioned members were deprecated (see [1]). We kept them for

Re: Recommended metaspace memory config for 16GB hosts.

2021-11-23 Thread Matthias Pohl
> Non-Heap 416 MB 404 MB 2.23 GB > Total 5.28 GB 2.55 GB 7.10 GB > Outside JVM > Type > Count > Used > Capacity > Direct 32,836 1.01 GB 1.01 GB > Mapped 0 0 B 0 B > > > > On Tue, 23 Nov 2021 at 02:23, Matthias Pohl > wrote: > >> In general, running

Re: Recommended metaspace memory config for 16GB hosts.

2021-11-22 Thread Matthias Pohl
> > Kafka -> JSon Validation (Jackson) -> filter -> JDBC to database. > > On Mon, 22 Nov 2021 at 10:24, Matthias Pohl > wrote: > >> Hi John, >> have you had a look at the memory model for Flink 1.10? [1] Based on the >> documentation, you could try increa

Re: Flink on Native Kubernetes S3 checkpointing error

2021-11-22 Thread Matthias Pohl
3 checkpointing is working fine. > So, in short, on AWS use EKS v1.20+ for IAM Pod Identity Webhook. > > Thanks, > Hemant > > On Mon, Nov 22, 2021 at 7:26 PM Matthias Pohl > wrote: > >> Hi bat man, >> this feature seems to be tied to a certain AWS SDK version

Re: Flink CLI - pass command line arguments to a pyflink job

2021-11-22 Thread Matthias Pohl
Hi Kamil, afaik, the parameter passing should work as normal by just appending them to the Flink job submission similar to the Java job submission: ``` $ ./flink run --help Action "run" compiles and runs a program. Syntax: run [OPTIONS] [...] ``` Matthias On Mon, Nov 22, 2021 at 3:58 PM

Re: Recommended metaspace memory config for 16GB hosts.

2021-11-22 Thread Matthias Pohl
Hi John, have you had a look at the memory model for Flink 1.10? [1] Based on the documentation, you could try increasing the Metaspace size independently of the Flink memory usage (i.e. flink.size). The heap size is part of the overall Flink memory. I hope that helps. Best, Matthias [1]

Re: Table API Filesystem connector - disable interval rolling policy

2021-11-22 Thread Matthias Pohl
Hi Kamil, by looking at the code I'd say that the only option you have is to increase the parameter you already mentioned to a very high number. But I'm not sure about the side effects. I'm gonna add Francesco to this thread. Maybe he has better ideas on how to answer your question. Best,

Re: Flink on Native Kubernetes S3 checkpointing error

2021-11-22 Thread Matthias Pohl
Hi bat man, this feature seems to be tied to a certain AWS SDK version [1] which you already considered. But I checked the version used in Flink 1.13.1 for the s3 filesystem. It seems like the version that's used (1.11.788) is good enough to provide this feature (which was added in 1.11.704): ```

Re: Kubernetes HA: New jobs stuck in Initializing for a long time after a certain number of existing jobs are running

2021-11-22 Thread Matthias Pohl
Hi Joey, that looks like a cluster configuration issue. The 192.168.100.79:6123 is not accessible from the JobManager pod (see line 1224f in the provided JM logs): 2021-11-19 04:06:45,049 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed

Re: FlinkJobNotFoundException

2021-10-14 Thread Matthias Pohl
wrote: > Hi Matthias, > > > > Do you have any update here? > > > > Thank you, > > Doug > > > > *From:* Gusick, Doug S [Engineering] > *Sent:* Thursday, October 7, 2021 9:03 AM > *To:* Hailu, Andreas [Engineering] ; > Matthias Pohl

Re: Unable to connect to Mesos on mesos-appmaster.sh start

2021-09-30 Thread Matthias Pohl
; > > where $PORT1 refers to my second host open port, mapped to 6123 on the > Docker container (first port is mapped to 8081). > I can see in the log that $HOST and $PORT1 resolve to the correct values, > 10.0.20.25 > and 31608 > > On Wed, Sep 29, 2021 at 9:41 AM Matthias Pohl

Re: Start Flink cluster, k8s pod behavior

2021-09-30 Thread Matthias Pohl
Hi Qihua, I guess, looking into kubectl describe and the JobManager logs would help in understanding what's going on. Best, Matthias On Wed, Sep 29, 2021 at 8:37 PM Qihua Yang wrote: > Hi, > I deployed flink in session mode. I didn't run any jobs. I saw below logs. > That is normal, same as

Re: FlinkJobNotFoundException

2021-09-30 Thread Matthias Pohl
ogs there. Please let me know if you need any more information. > > > > Thank you, > > Doug > > > > *From:* Matthias Pohl > *Sent:* Wednesday, September 29, 2021 12:00 PM > *To:* Gusick, Doug S [Engineering] > *Cc:* user@flink.apache.org

Re: Unable to connect to Mesos on mesos-appmaster.sh start

2021-09-29 Thread Matthias Pohl
...and if possible, it would be helpful to provide debug logs as well. On Wed, Sep 29, 2021 at 6:33 PM Matthias Pohl wrote: > May you provide the entire JobManager logs so that we can see what's going > on? > > On Wed, Sep 29, 2021 at 12:42 PM Javier Vegas wrote: > >> T

Re: Unable to connect to Mesos on mesos-appmaster.sh start

2021-09-29 Thread Matthias Pohl
at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Unknown Source) > > On Wed, Sep 29, 2021 at 2:36 AM Matthias Pohl > wrote: > >> The port has its separate configuration par

Re: FlinkJobNotFoundException

2021-09-29 Thread Matthias Pohl
Hi Doug, thanks for reaching out to the community. First of all, 1.9.2 is quite an old Flink version. You might want to consider upgrading to a newer version. The community only offers support for the two most-recent Flink versions. Newer versions might include fixes for your issue. But back to

Re: flink rest endpoint creation failure

2021-09-29 Thread Matthias Pohl
Hi Curt, could you elaborate a bit more on your setup? Maybe, provide commands you used to deploy the jobs and the Flink/YARN logs. What's puzzling me is your statement about "two JobManagers spinning up" and "everything's working fine if two TaskManagers are running on different instances". -

Re: Unable to connect to Mesos on mesos-appmaster.sh start

2021-09-29 Thread Matthias Pohl
ation.doAs(UserGroupInformation.java:1762) > at > org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) > at > org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:186) > ... 2 common frames omitte

Re: Unable to connect to Mesos on mesos-appmaster.sh start

2021-09-29 Thread Matthias Pohl
problem, or is this a red > herring and the problem is a different one? > > > Thanks, > > > Javier Vegas > > > > > > > > > On Tue, Sep 28, 2021 at 10:23 AM Javier Vegas wrote: > >> Thanks, Matthias! >> >> There are lots of apps deployed to

Re: Unable to connect to Mesos on mesos-appmaster.sh start

2021-09-28 Thread Matthias Pohl
Hi Javier, I don't see anything that's configured in the wrong way based on the jobmanager logs you've provided. Have you been able to deploy other applications to this Mesos cluster? Do the Mesos master logs reveal anything? The variable resolution on the TaskManager side is a valid concern

Re: hdfs lease issues on flink retry

2021-09-20 Thread Matthias Pohl
"s"*, *" "*).replace(*" "*, *"0"*) > + Integer.*toString*(taskNumber + 1) > + *"_0"*); > > > > > > So basically I am creating a directory named *attempt__0123_r_0001_0 *instead > of *attempt___r_0001_0*. I ha

Re: Fast serialization for Kotlin data classes

2021-09-16 Thread Matthias Pohl
16686 > > > > Regards, > > Alexis. > > > > *From:* Matthias Pohl > *Sent:* Thursday, September 16, 2021 13:12 > *To:* Alex Cruise > *Cc:* Flink ML > *Subject:* Re: Fast serialization for Kotlin data classes > > > > Hi Alex, > >

Re: Fast serialization for Kotlin data classes

2021-09-16 Thread Matthias Pohl
Hi Alex, have you had a look at TypeInfoFactory? That might be the best way to come up with a custom serialization mechanism. See the docs [1] for further details. Best, Matthias [1]

Re: [ANNOUNCE] Flink mailing lists archive service has migrated to Apache Archive service

2021-09-15 Thread Matthias Pohl
Thanks Leonard for the announcement. I guess that is helpful. @Robert is there any way we can change the default setting to something else (e.g. greater than 0 days)? Only having the last month available as a default is kind of annoying considering that the time setting is quite hidden. Matthias

Re: [ANNOUNCE] Flink mailing lists archive service has migrated to Apache Archive service

2021-09-15 Thread Matthias Pohl
Thanks Leonard for the announcement. I guess that is helpful. @Robert is there any way we can change the default setting to something else (e.g. greater than 0 days)? Only having the last month available as a default is kind of annoying considering that the time setting is quite hidden. Matthias

Re: KafkaSource builder and checkpointing with parallelism > kafka partitions

2021-09-15 Thread Matthias Pohl
Hi Lars, I guess you are looking for execution.checkpointing.checkpoints-after-tasks-finish.enabled [1]. This configuration parameter is going to be introduced in the upcoming Flink 1.14 release. Best, Matthias [1]

Re: hdfs lease issues on flink retry

2021-09-07 Thread Matthias Pohl
Just for documentation purposes: I created FLINK-24147 [1] to cover this issue. [1] https://issues.apache.org/jira/browse/FLINK-24147 On Thu, Aug 26, 2021 at 6:14 PM Matthias Pohl wrote: > I see - I should have checked my mailbox before answering. I received the > email and was able to

Re: FLINK-14316 happens on version 1.13.2

2021-09-07 Thread Matthias Pohl
Hi Xiangyu, thanks for reaching out to the community. Could you share the entire TaskManager and JobManager logs with us? That might help investigate what's going on. Best, Matthias On Fri, Sep 3, 2021 at 10:07 AM Xiangyu Su wrote: > Hi Yun, > Thanks a lot. > I am running a test, and facing

Re: Savepoint failure along with JobManager crash

2021-08-31 Thread Matthias Pohl
Hi Prasanna, thanks for reaching out to the community. What you're experiencing is that the savepoint was created but the job itself ended up in an inconsistent state with Executions being cancelled instead of being finished. This should have triggered a global failover resulting in a job restart.

Re: Table API demo problem

2021-08-31 Thread Matthias Pohl
I missed the point that the purpose of the walkthrough is to have the user implement the functionality. So, FLINK-24076 is actually not valid. I initially thought of it as some kind of demo implementation. Sorry for the confusion. On Tue, Aug 31, 2021 at 11:15 AM Matthias Pohl wrote

Re: Table API demo problem

2021-08-31 Thread Matthias Pohl
Hi Manraj, the error messages about libjemalloc.so are caused by the Flink 1.13.1 image, which was accidentally published with the wrong architecture. I created FLINK-24075 [1] to cover this issue. As a workaround, you could upgrade the base image to Flink 1.13.2 until the Flink 1.13.1 images are

Re: Queries regarding Flink upgrade strategies

2021-08-27 Thread Matthias Pohl
when during > upgrade Jobmanagers & Taskmanager pods are on different versions. Also the > impact of recreate strategy in the same context. > > Regards, > Amit > > On Fri, Aug 27, 2021 at 5:32 PM Matthias Pohl > wrote: > >> The upgrade approach mentioned in my previo

Re: Queries regarding Flink upgrade strategies

2021-08-27 Thread Matthias Pohl
> (physical/virtual) deployment. > I want to understand the upgrade strategies on kubernetes deployments > where Flink is running in pods. If you could help in that area it would be > great. > > Regards, > Amit Bhatia > > On Thu, Aug 26, 2021 at 5:25 PM Matthias Pohl >

Re: hdfs lease issues on flink retry

2021-08-26 Thread Matthias Pohl
I see - I should have checked my mailbox before answering. I received the email and was able to log in. On Thu, Aug 26, 2021 at 6:12 PM Matthias Pohl wrote: > The link doesn't work, i.e. I'm redirected to a login page. It would be > also good to include the Flink logs and make them acce

Re: hdfs lease issues on flink retry

2021-08-26 Thread Matthias Pohl
Please let us know if > you’re not able to see the files. > > > > > > *From:* Matthias Pohl > *Sent:* Thursday, August 26, 2021 9:47 AM > *To:* Shah, Siddharth [Engineering] > *Cc:* user@flink.apache.org; Hailu, Andreas [Engineering] < > andreas.ha...@ny.email.gs.

Re: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden)

2021-08-26 Thread Matthias Pohl
Hi Jonas, have you included the s3 credentials in the Flink config file like it's described in [1]? I'm not sure about this hive.s3.use-instance-credentials being a valid configuration parameter. Best, Matthias [1]
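A minimal sketch of what such an entry in flink-conf.yaml might look like, assuming one of Flink's bundled S3 filesystem plugins (flink-s3-fs-hadoop or flink-s3-fs-presto) is in use. The values are placeholders.

```yaml
# Placeholders only: substitute real credentials, or leave these keys out
# and rely on IAM roles where the environment supports them.
s3.access-key: your-access-key
s3.secret-key: your-secret-key
```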

Re: hdfs lease issues on flink retry

2021-08-26 Thread Matthias Pohl
Hi Siddharth, thanks for reaching out to the community. This might be a bug. Could you share your Flink and YARN logs? This way we could get a better understanding of what's going on. Best, Matthias On Tue, Aug 24, 2021 at 10:19 PM Shah, Siddharth [Engineering] < siddharth.x.s...@gs.com> wrote:

Re: Disabling autogenerated uid/hash doesn't work when using file source

2021-08-26 Thread Matthias Pohl
Hi Vishal, you're right: the FileSource itself doesn't provide these methods. But you could get them through the DataStreamSource (which extends SingleOutputStreamOperator and thereby provides these two methods [1,2]). It is returned by StreamExecutionEnvironment.fromSource [3]. fromSource would need

Re: Flink Avro Timestamp Precision Issue

2021-08-26 Thread Matthias Pohl
Hi Akshay, thanks for reaching out to the community. There was a similar question on the mailing list earlier this month [1]. Unfortunately, it just doesn't seem to be supported, yet. The feature request was already created with FLINK-23589 [2]. Best, Matthias [1]

Re: Kinesis Producer not working with Flink 1.11.2

2021-08-26 Thread Matthias Pohl
Hi Sanket, have you considered reaching out to the Kinesis community? I might be wrong but it looks like a Kinesis issue. Best, Matthias On Tue, Aug 24, 2021 at 7:13 PM Sanket Agrawal wrote: > Hi, > > > > We are trying to use Kinesis along with Flink(1.11.2) and JDK 11 on EMR > cluster(6.2).

Re: checkpoints/.../shared cleanup

2021-08-26 Thread Matthias Pohl
Hi Alexey, thanks for reaching out to the community. I have a question: What do you mean by "the shared subfolder still grows"? As far as I understand, the shared folder contains the state of incremental checkpoints. If you cancel the corresponding job and start a new job from one of the retained
