[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-26 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820657#comment-17820657 ] Gyula Fora commented on FLINK-34451: I opened a new ticket to track this issue explicitly for the

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-26 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820622#comment-17820622 ] Gyula Fora commented on FLINK-34451: It looks like there is a race condition between handling the

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-25 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820604#comment-17820604 ] Gyula Fora commented on FLINK-34451: I took a closer look at this and it also happens with the

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-23 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820276#comment-17820276 ] Gyula Fora commented on FLINK-34451: I will definitely try this on Monday, I was just curious if

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-23 Thread Alex Hoffer (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820275#comment-17820275 ] Alex Hoffer commented on FLINK-34451: - You’re the expert, I’m merely a user! :D I’d be curious if you

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-23 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820272#comment-17820272 ] Gyula Fora commented on FLINK-34451: To me the logs are not very surprising. The way it is currently

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-23 Thread Alex Hoffer (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820233#comment-17820233 ] Alex Hoffer commented on FLINK-34451: - The clue seems to be in the logs. The operator thinks it is

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-23 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820209#comment-17820209 ] Gyula Fora commented on FLINK-34451: That's a good catch, if this is a bug related to adaptive

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-23 Thread Alex Hoffer (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820151#comment-17820151 ] Alex Hoffer commented on FLINK-34451: - [~gyfora] I can recreate on basic-checkpoint-ha by adding 

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-22 Thread Alex Hoffer (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819853#comment-17819853 ] Alex Hoffer commented on FLINK-34451: - I just got the checkpoint example running and it doesn't show

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-22 Thread Alex Hoffer (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819829#comment-17819829 ] Alex Hoffer commented on FLINK-34451: - I'm bound to Azure unfortunately. Also, why would that only

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-22 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819826#comment-17819826 ] Gyula Fora commented on FLINK-34451: Also one thing that occurred to me is that the issue could be

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-22 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819810#comment-17819810 ] Gyula Fora commented on FLINK-34451: I tried killing TMs and immediately bumping the restartNonce at

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-22 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819807#comment-17819807 ] Gyula Fora commented on FLINK-34451: Hm, so this really seems to be somehow adaptive scheduler

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-22 Thread Alex Hoffer (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819803#comment-17819803 ] Alex Hoffer commented on FLINK-34451: - Putting JMs at 1 with HA on still leads to the same result -

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-22 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819795#comment-17819795 ] Gyula Fora commented on FLINK-34451: I did not mean to turn off HA but only to reduce the replicas

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-22 Thread Alex Hoffer (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819791#comment-17819791 ] Alex Hoffer commented on FLINK-34451: - That "fixed it", but not in a satisfying way... Using non-HA

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-22 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819783#comment-17819783 ] Gyula Fora commented on FLINK-34451: Could this be related to the JobManager HA? Instead of 2

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-22 Thread Alex Hoffer (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819774#comment-17819774 ] Alex Hoffer commented on FLINK-34451: - [~gyfora] I see it on 1.18.1 as well. Only flipping off the

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-22 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819567#comment-17819567 ] Gyula Fora commented on FLINK-34451: I am only asking because there have been fixes / improvements

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-22 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819564#comment-17819564 ] Gyula Fora commented on FLINK-34451: Which 1.18 version are you using? I have only tried to repro

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-22 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819563#comment-17819563 ] Gyula Fora commented on FLINK-34451: [~alexdchoffer] so, just to confirm: This issue doesn't occur

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-21 Thread Alex Hoffer (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819410#comment-17819410 ] Alex Hoffer commented on FLINK-34451: - # Here is my FlinkDeployment: {code:java} apiVersion:

[jira] [Commented] (FLINK-34451) [Kubernetes Operator] Job with restarting TaskManagers uses wrong/misleading fallback approach

2024-02-16 Thread Gyula Fora (Jira)
[ https://issues.apache.org/jira/browse/FLINK-34451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818126#comment-17818126 ] Gyula Fora commented on FLINK-34451: Before we can investigate the root cause it would be great to