[jira] [Closed] (FLINK-36250) Remove CheckpointStorage-related configuration getters/setters that return/set complex Java objects

2024-09-18 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-36250.
---
Resolution: Done

df412add3895e5fa88a54c26b67f188940345d0c

> Remove CheckpointStorage-related configuration getters/setters that 
> return/set complex Java objects
> ---
>
> Key: FLINK-36250
> URL: https://issues.apache.org/jira/browse/FLINK-36250
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0-preview
>
>
> FLINK-33581/FLIP-381: Deprecate configuration getters/setters that return or 
> set complex Java objects.
> In Flink 2.0, we will remove these deprecated methods and fields. This change 
> will prevent users from configuring their jobs by passing complex Java 
> objects, encouraging them to use {{ConfigOption}} instead.
> This JIRA will remove the public API associated with CheckpointStorage 
> getters/setters that return/set complex Java objects.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-36249) Remove RestartStrategy-related configuration getters/setters that return/set complex Java objects

2024-09-17 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-36249.
---
Resolution: Done

06d045f77cfb67a08d7e397d3c06ef8189bbe68a
4e825107c34d4163edb0691332877be2004b28e3

> Remove RestartStrategy-related configuration getters/setters that return/set 
> complex Java objects
> -
>
> Key: FLINK-36249
> URL: https://issues.apache.org/jira/browse/FLINK-36249
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0-preview
>
>
> FLINK-33581/FLIP-381: Deprecate configuration getters/setters that return or 
> set complex Java objects.
> In Flink 2.0, we will remove these deprecated methods and fields. This change 
> will prevent users from configuring their jobs by passing complex Java 
> objects, encouraging them to use {{ConfigOption}} instead.
> This JIRA will remove the public API related to {{RestartStrategy}} 
> getter/setter methods.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-36251) Remove StateBackend-related configuration getters/setters that return/set complex Java objects

2024-09-11 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-36251:
---

Assignee: Junrui Li

> Remove StateBackend-related configuration getters/setters that return/set 
> complex Java objects
> --
>
> Key: FLINK-36251
> URL: https://issues.apache.org/jira/browse/FLINK-36251
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
> Fix For: 2.0-preview
>
>
> FLINK-33581/FLIP-381: Deprecate configuration getters/setters that return or 
> set complex Java objects.
> In Flink 2.0, we will remove these deprecated methods and fields. This change 
> will prevent users from configuring their jobs by passing complex Java 
> objects, encouraging them to use {{ConfigOption}} instead.
> This JIRA will remove the public API associated with StateBackend 
> getters/setters that return/set complex Java objects.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-36249) Remove RestartStrategy-related configuration getters/setters that return/set complex Java objects

2024-09-11 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-36249:
---

Assignee: Junrui Li

> Remove RestartStrategy-related configuration getters/setters that return/set 
> complex Java objects
> -
>
> Key: FLINK-36249
> URL: https://issues.apache.org/jira/browse/FLINK-36249
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
> Fix For: 2.0-preview
>
>
> FLINK-33581/FLIP-381: Deprecate configuration getters/setters that return or 
> set complex Java objects.
> In Flink 2.0, we will remove these deprecated methods and fields. This change 
> will prevent users from configuring their jobs by passing complex Java 
> objects, encouraging them to use {{ConfigOption}} instead.
> This JIRA will remove the public API related to {{RestartStrategy}} 
> getter/setter methods.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-36250) Remove CheckpointStorage-related configuration getters/setters that return/set complex Java objects

2024-09-11 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-36250:
---

Assignee: Junrui Li

> Remove CheckpointStorage-related configuration getters/setters that 
> return/set complex Java objects
> ---
>
> Key: FLINK-36250
> URL: https://issues.apache.org/jira/browse/FLINK-36250
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
> Fix For: 2.0-preview
>
>
> FLINK-33581/FLIP-381: Deprecate configuration getters/setters that return or 
> set complex Java objects.
> In Flink 2.0, we will remove these deprecated methods and fields. This change 
> will prevent users from configuring their jobs by passing complex Java 
> objects, encouraging them to use {{ConfigOption}} instead.
> This JIRA will remove the public API associated with CheckpointStorage 
> getters/setters that return/set complex Java objects.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-36185) Remove RuntimeContext#getExecutionConfig

2024-09-11 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-36185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881152#comment-17881152
 ] 

Zhu Zhu commented on FLINK-36185:
-

581a4f761d231166f1471fcd1f18ae55df680983
d564e88926453cfa05863255031999da155b833b
c7b1f4be424396bb07cc65777fdb31053f785be1
356e2af4b27b5a7250d9780ee7b41ce945b98c1e

> Remove RuntimeContext#getExecutionConfig
> 
>
> Key: FLINK-36185
> URL: https://issues.apache.org/jira/browse/FLINK-36185
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 2.0-preview
>
>
> FLINK-33713/FLIP-391 Deprecate RuntimeContext#getExecutionConfig
> In Flink 2.0 we should remove these deprecated method.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-36185) Remove RuntimeContext#getExecutionConfig

2024-09-11 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-36185.
---
Release Note: The public method `RuntimeContext#getExecutionConfig()`, 
which was deprecated in FLINK-33713/FLIP-391, is removed.
  Resolution: Done

> Remove RuntimeContext#getExecutionConfig
> 
>
> Key: FLINK-36185
> URL: https://issues.apache.org/jira/browse/FLINK-36185
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 2.0-preview
>
>
> FLINK-33713/FLIP-391 Deprecate RuntimeContext#getExecutionConfig
> In Flink 2.0 we should remove these deprecated method.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32384) Remove deprecated configuration keys which violate YAML spec

2024-09-10 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-32384.
---
Resolution: Duplicate

> Remove deprecated configuration keys which violate YAML spec
> 
>
> Key: FLINK-32384
> URL: https://issues.apache.org/jira/browse/FLINK-32384
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Zhu Zhu
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: 2.0-related
> Fix For: 2.0-preview
>
>
> In FLINK-29372, key that violate YAML spec are renamed to a valid form and 
> the old names are deprecated.
> In Flink 2.0 we should remove these deprecated keys. This will prevent users 
> (unintentionally) to create invalid YAML form flink-conf.yaml.
> Then with the work of FLINK-23620,  we can remove the non-standard YAML 
> parsing logic and enforce standard YAML validation in CI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33677) Remove flink-conf.yaml from flink dist

2024-09-10 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33677.
---
Release Note: 
FLINK-33297/FLIP-366 introduces support for parsing standard YAML files for 
Flink configuration. A new configuration file named config.yaml, which adheres 
to the standard YAML format, has been introduced.

In the Flink 2.0, we have removed the old configuration file from the 
flink-dist, along with support for the old configuration parser. Users are now 
required to use the new config.yaml file when starting the cluster and 
submitting jobs. They can utilize the migration tool to assist in migrating 
legacy configuration files to the new format.
  Resolution: Done

> Remove flink-conf.yaml from flink dist
> --
>
> Key: FLINK-33677
> URL: https://issues.apache.org/jira/browse/FLINK-33677
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Zhu Zhu
>Assignee: Junrui Li
>Priority: Major
>  Labels: 2.0-related, pull-request-available
> Fix For: 2.0-preview
>
>
> FLINK-33297/FLIP-366 supports parsing standard YAML files for Flink 
> configuration. A new configuration file config.yaml, which should be a 
> standard YAML file, is introduced.
> To ensure compatibility, in Flink 1.x, the old configuration parser will 
> still be used if the old configuration file flink-conf.yaml exists. Only if 
> it does not exist, the new configuration file will be used.
> In Flink 2.0, we should remove the old configuration file from flink dist, as 
> well as the old configuration parser.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33677) Remove flink-conf.yaml from flink dist

2024-09-10 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880830#comment-17880830
 ] 

Zhu Zhu edited comment on FLINK-33677 at 9/11/24 2:36 AM:
--

flink master: c0176fd6794295dc3706ed7f536f3f70b6110923
flink-docker dev-master: 6bac0c6248e1b13eb09c1fb0fb51f5b400202679


was (Author: zhuzh):
flink master: c0176fd6794295dc3706ed7f536f3f70b6110923

> Remove flink-conf.yaml from flink dist
> --
>
> Key: FLINK-33677
> URL: https://issues.apache.org/jira/browse/FLINK-33677
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Zhu Zhu
>Assignee: Junrui Li
>Priority: Major
>  Labels: 2.0-related, pull-request-available
> Fix For: 2.0-preview
>
>
> FLINK-33297/FLIP-366 supports parsing standard YAML files for Flink 
> configuration. A new configuration file config.yaml, which should be a 
> standard YAML file, is introduced.
> To ensure compatibility, in Flink 1.x, the old configuration parser will 
> still be used if the old configuration file flink-conf.yaml exists. Only if 
> it does not exist, the new configuration file will be used.
> In Flink 2.0, we should remove the old configuration file from flink dist, as 
> well as the old configuration parser.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33749) Remove deprecated setter/getter method in Configuration.

2024-09-10 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33749.
---
Resolution: Duplicate

> Remove deprecated setter/getter method in Configuration.
> 
>
> Key: FLINK-33749
> URL: https://issues.apache.org/jira/browse/FLINK-33749
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: 2.0-related, pull-request-available
> Fix For: 2.0-preview
>
>
> Currently, the Configuration has several setter/getter methods for retrieving 
> config values based on a String key, such as getString(String key, String 
> defaultValue), and all of which have been deprecated.
> We should remove these getter methods in FLINK-2.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33677) Remove flink-conf.yaml from flink dist

2024-09-10 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880830#comment-17880830
 ] 

Zhu Zhu commented on FLINK-33677:
-

flink master: c0176fd6794295dc3706ed7f536f3f70b6110923

> Remove flink-conf.yaml from flink dist
> --
>
> Key: FLINK-33677
> URL: https://issues.apache.org/jira/browse/FLINK-33677
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Zhu Zhu
>Assignee: Junrui Li
>Priority: Major
>  Labels: 2.0-related, pull-request-available
> Fix For: 2.0-preview
>
>
> FLINK-33297/FLIP-366 supports parsing standard YAML files for Flink 
> configuration. A new configuration file config.yaml, which should be a 
> standard YAML file, is introduced.
> To ensure compatibility, in Flink 1.x, the old configuration parser will 
> still be used if the old configuration file flink-conf.yaml exists. Only if 
> it does not exist, the new configuration file will be used.
> In Flink 2.0, we should remove the old configuration file from flink dist, as 
> well as the old configuration parser.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-36063) Migrate StreamGraph and its related classes to flink-runtime

2024-08-27 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-36063.
---
Fix Version/s: 2.0.0
   Resolution: Fixed

master: 51f1684be0214a1a8c47ffb4ae20b6ebf6f0fc0c

> Migrate StreamGraph and its related classes to flink-runtime
> 
>
> Key: FLINK-36063
> URL: https://issues.apache.org/jira/browse/FLINK-36063
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>
> In order to allow relevant components in the runtime layer to access the 
> StreamGraph and its related classes, we should migrate StreamGraph and its 
> related classes from flink-streaming-java to flink-runtime.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-36032) FLIP-468: Introducing StreamGraph-Based Job Submission

2024-08-27 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-36032:
---

Assignee: Junrui Li

> FLIP-468: Introducing StreamGraph-Based Job Submission
> --
>
> Key: FLINK-36032
> URL: https://issues.apache.org/jira/browse/FLINK-36032
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination, Runtime / REST
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>
> This is the umbrella ticket for 
> [FLIP-468|https://cwiki.apache.org/confluence/display/FLINK/FLIP-468%3A+Introducing+StreamGraph-Based+Job+Submission]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-36033) FLIP-469: Supports Adaptive Optimization of StreamGraph

2024-08-27 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-36033:
---

Assignee: Junrui Li

> FLIP-469: Supports Adaptive Optimization of StreamGraph
> ---
>
> Key: FLINK-36033
> URL: https://issues.apache.org/jira/browse/FLINK-36033
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>
> This is the umbrella ticket for 
> [FLIP-469|https://cwiki.apache.org/confluence/display/FLINK/FLIP-469%3A+Supports+Adaptive+Optimization+of+StreamGraph]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-36066) Introducing the AdaptiveGraphGenerator component

2024-08-27 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-36066:
---

Assignee: Lei Yang

> Introducing the AdaptiveGraphGenerator component
> 
>
> Key: FLINK-36066
> URL: https://issues.apache.org/jira/browse/FLINK-36066
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Lei Yang
>Priority: Major
>
> To support the incremental generation of JobGraph and update the StreamGraph 
> based on runtime information, we plan to introduce the AdaptiveGraphManager 
> component, which will implement the AdaptiveGraphGenerator interface. The 
> AdaptiveGraphGenerator interface is responsible for:
>  # Generating the JobGraph
>  # Responding to upstream job vertex finished events, generating JobVertex, 
> and updating it to the JobGraph
>  # Providing StreamGraphContext to allow StreamGraphOptimizer to add, delete, 
> and modify StreamNode and StreamEdge



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-36065) Support for submitting StreamGraph

2024-08-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-36065:
---

Assignee: Junrui Li

> Support for submitting StreamGraph
> --
>
> Key: FLINK-36065
> URL: https://issues.apache.org/jira/browse/FLINK-36065
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>
> This ticket will encompass the following tasks:
>  # Make StreamGraph serializable
>  # Support for persisting StreamGraph
>  # Enable submission of StreamGraph
> Note that this ticket will not modify the existing JobGraph submission path.
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-36063) Migrate StreamGraph and its related classes to flink-runtime

2024-08-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-36063:
---

Assignee: Junrui Li

> Migrate StreamGraph and its related classes to flink-runtime
> 
>
> Key: FLINK-36063
> URL: https://issues.apache.org/jira/browse/FLINK-36063
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> In order to allow relevant components in the runtime layer to access the 
> StreamGraph and its related classes, we should migrate StreamGraph and its 
> related classes from flink-streaming-java to flink-runtime.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-36064) Refactor CollectResultIterator to eliminate dependency on OperatorID

2024-08-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-36064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-36064:
---

Assignee: Junrui Li

> Refactor CollectResultIterator to eliminate dependency on OperatorID
> 
>
> Key: FLINK-36064
> URL: https://issues.apache.org/jira/browse/FLINK-36064
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the CollectResultIterator relies on the operator ID to send REST 
> requests to the JM for querying results. However, with the introduction of 
> Stream Graph submission, the Job Graph will be compiled and generated within 
> the JM, resulting in the client being unable to access the operator ID.
> To resolve this issue, we intend to refactor CollectResultIterator to remove 
> its dependency on the operator ID and utilize an alternative UUID instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33677) Remove flink-conf.yaml from flink dist

2024-08-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33677:
---

Assignee: Junrui Li

> Remove flink-conf.yaml from flink dist
> --
>
> Key: FLINK-33677
> URL: https://issues.apache.org/jira/browse/FLINK-33677
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Zhu Zhu
>Assignee: Junrui Li
>Priority: Major
>  Labels: 2.0-related
>
> FLINK-33297/FLIP-366 supports parsing standard YAML files for Flink 
> configuration. A new configuration file config.yaml, which should be a 
> standard YAML file, is introduced.
> To ensure compatibility, in Flink 1.x, the old configuration parser will 
> still be used if the old configuration file flink-conf.yaml exists. Only if 
> it does not exist, the new configuration file will be used.
> In Flink 2.0, we should remove the old configuration file from flink dist, as 
> well as the old configuration parser.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35766) When the job contains many YieldingOperatorFactory instances, compiling the JobGraph hangs

2024-08-13 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35766.
---
Fix Version/s: 2.0.0
   1.20.1
   Resolution: Fixed

master(2.0.0): ff3cfb498e1987365320dc77fb0ec40d1573ee25
1.20: b8eb349ef509dfc6ab41736f3568b3a4967a2381

> When the job contains many YieldingOperatorFactory instances, compiling the 
> JobGraph hangs
> --
>
> Key: FLINK-35766
> URL: https://issues.apache.org/jira/browse/FLINK-35766
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0, 1.20.1
>
>
> When a job contains YieldingOperatorFactory instances, the time complexity of 
> compiling the JobGraph is very high (with a complexity of O(N!)). This leads 
> to the job compilation hanging on creating chains when there are many 
> YieldingOperatorFactory instances (e.g., more than 30).
> This is a very rare bug, but we have users who use SQL that contains many 
> LookupJoins that use YieldingOperatorFactory in the production environment. A 
> simple reproducible case is as follows:
> {code:java}
> @Test
> void test() {
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.createLocalEnvironment(1);
> env.fromSource(
> new NumberSequenceSource(0, 10), 
> WatermarkStrategy.noWatermarks(), "input")
> .map((x) -> x)
> // add 32 YieldingOperatorFactory
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperatorFactory<>())
> .transform(
> "test", BasicTypeInfo.LONG_TYPE_INFO, new 
> YieldingTestOperator

[jira] [Closed] (FLINK-35731) Sink V2 operator is mistakenly assumed always to be parallelism configured

2024-07-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35731.
---
Fix Version/s: 1.20.0
   1.19.2
   Resolution: Fixed

1.20: 2b3a47d98bf06ffde3d6d7c850414cc07a47d3f2
1.19: 6caf576d582bf3b2fcb9c6ef71f46115c58ea59c

> Sink V2 operator is mistakenly assumed always to be parallelism configured
> --
>
> Key: FLINK-35731
> URL: https://issues.apache.org/jira/browse/FLINK-35731
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.18.1, 1.20.0, 1.19.1
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0, 1.19.2
>
>
> Currently, the Sink V2 operator is always marked as parallelism configured, 
> which prevents parallelism from being inferred. This can cause confusion for 
> users utilizing the Adaptive Batch scheduler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35731) Sink V2 operator is mistakenly assumed always to be parallelism configured

2024-07-05 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35731:
---

Assignee: Junrui Li

> Sink V2 operator is mistakenly assumed always to be parallelism configured
> --
>
> Key: FLINK-35731
> URL: https://issues.apache.org/jira/browse/FLINK-35731
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the Sink V2 operator is always marked as parallelism configured, 
> which prevents parallelism from being inferred. This can cause confusion for 
> users utilizing the Adaptive Batch scheduler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35731) Sink V2 operator is mistakenly assumed always to be parallelism configured

2024-07-05 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-35731:

Affects Version/s: 1.19.1
   1.18.1
   1.17.2
   1.20.0

> Sink V2 operator is mistakenly assumed always to be parallelism configured
> --
>
> Key: FLINK-35731
> URL: https://issues.apache.org/jira/browse/FLINK-35731
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.18.1, 1.20.0, 1.19.1
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the Sink V2 operator is always marked as parallelism configured, 
> which prevents parallelism from being inferred. This can cause confusion for 
> users utilizing the Adaptive Batch scheduler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35731) Sink V2 operator is mistakenly assumed always to be parallelism configured

2024-07-04 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863109#comment-17863109
 ] 

Zhu Zhu commented on FLINK-35731:
-

[~JunRuiLi] could you open PRs to backport this fix to 1.19 & 1.20?

> Sink V2 operator is mistakenly assumed always to be parallelism configured
> --
>
> Key: FLINK-35731
> URL: https://issues.apache.org/jira/browse/FLINK-35731
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the Sink V2 operator is always marked as parallelism configured, 
> which prevents parallelism from being inferred. This can cause confusion for 
> users utilizing the Adaptive Batch scheduler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35635) Release Testing: Verify FLIP-445: Support dynamic parallelism inference for HiveSource

2024-06-26 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860303#comment-17860303
 ] 

Zhu Zhu commented on FLINK-35635:
-

The ticket is assigned to you. Thanks for volunteering! [~JunRuiLi]

> Release Testing: Verify FLIP-445: Support dynamic parallelism inference for 
> HiveSource
> --
>
> Key: FLINK-35635
> URL: https://issues.apache.org/jira/browse/FLINK-35635
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: xingbe
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.20.0
>
>
> This issue aims to verify FLIP-445.
> The Hive Source now supports dynamic parallelism inference, and the default 
> parallelism inference mode has also changed from static to dynamic inference. 
> For detailed information, please refer to the 
> [documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#source-parallelism-inference].
> We may need to cover the following types of test cases:
> Test 1: Confirm that the dynamic parallelism inference feature is effective.
> Test 2: Confirm whether it is possible to switch the parallelism inference 
> mode or disable parallelism inference through the config option 
> `table.exec.hive.infer-source-parallelism.mode`.
> Test 3: Construct scenarios where DynamicPartitionPruning is effective, and 
> confirm whether the dynamically inferred source parallelism has been 
> optimized.
>  
> Follow up the test for https://issues.apache.org/jira/browse/FLINK-35293
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35635) Release Testing: Verify FLIP-445: Support dynamic parallelism inference for HiveSource

2024-06-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35635:
---

Assignee: Junrui Li

> Release Testing: Verify FLIP-445: Support dynamic parallelism inference for 
> HiveSource
> --
>
> Key: FLINK-35635
> URL: https://issues.apache.org/jira/browse/FLINK-35635
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: xingbe
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.20.0
>
>
> This issue aims to verify FLIP-445.
> The Hive Source now supports dynamic parallelism inference, and the default 
> parallelism inference mode has also changed from static to dynamic inference. 
> For detailed information, please refer to the 
> [documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#source-parallelism-inference].
> We may need to cover the following types of test cases:
> Test 1: Confirm that the dynamic parallelism inference feature is effective.
> Test 2: Confirm whether it is possible to switch the parallelism inference 
> mode or disable parallelism inference through the config option 
> `table.exec.hive.infer-source-parallelism.mode`.
> Test 3: Construct scenarios where DynamicPartitionPruning is effective, and 
> confirm whether the dynamically inferred source parallelism has been 
> optimized.
>  
> Follow up the test for https://issues.apache.org/jira/browse/FLINK-35293
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35669) Release Testing: Verify FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-06-26 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860302#comment-17860302
 ] 

Zhu Zhu commented on FLINK-35669:
-

Assigned. Thanks for volunteering! [~xiasun]

> Release Testing: Verify FLIP-383: Support Job Recovery from JobMaster 
> Failures for Batch Jobs
> -
>
> Key: FLINK-35669
> URL: https://issues.apache.org/jira/browse/FLINK-35669
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Junrui Li
>Assignee: xingbe
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.20.0
>
>
> In 1.20, we introduced a batch job recovery mechanism to enable batch jobs to 
> recover as much progress as possible after a JobMaster failover, avoiding the 
> need to rerun tasks that have already been finished.
> More information about this feature and how to enable it could be found in: 
> [https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/recovery_from_job_master_failure/]
> We may need the following tests:
>  # Start a batch job with High Availability (HA) enabled, and after it has 
> progressed to a certain point, kill the JobManager (jm), then observe whether 
> the job recovers its progress normally.
>  # Use a custom source and ensure that its SplitEnumerator implements the 
> SupportsBatchSnapshot interface, submit the job, and after it has progressed 
> to a certain point, kill the JobManager (jm), then observe whether the job 
> recovers its progress normally.
>  
> Follow up the test for https://issues.apache.org/jira/browse/FLINK-33892



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35669) Release Testing: Verify FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-06-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35669:
---

Assignee: xingbe

> Release Testing: Verify FLIP-383: Support Job Recovery from JobMaster 
> Failures for Batch Jobs
> -
>
> Key: FLINK-35669
> URL: https://issues.apache.org/jira/browse/FLINK-35669
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Junrui Li
>Assignee: xingbe
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.20.0
>
>
> In 1.20, we introduced a batch job recovery mechanism to enable batch jobs to 
> recover as much progress as possible after a JobMaster failover, avoiding the 
> need to rerun tasks that have already been finished.
> More information about this feature and how to enable it could be found in: 
> [https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/recovery_from_job_master_failure/]
> We may need the following tests:
>  # Start a batch job with High Availability (HA) enabled, and after it has 
> progressed to a certain point, kill the JobManager (jm), then observe whether 
> the job recovers its progress normally.
>  # Use a custom source and ensure that its SplitEnumerator implements the 
> SupportsBatchSnapshot interface, submit the job, and after it has progressed 
> to a certain point, kill the JobManager (jm), then observe whether the job 
> recovers its progress normally.
>  
> Follow up the test for https://issues.apache.org/jira/browse/FLINK-33892



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35656) Hive Source has issues setting max parallelism in dynamic inference mode

2024-06-24 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35656.
---
Resolution: Fixed

master: 01c3fd67ac46898bd520477ae861cd29cceaa636
release-1.20: d93f7421e0d520d6b2899cbff5844867374b96ab

> Hive Source has issues setting max parallelism in dynamic inference mode
> 
>
> Key: FLINK-35656
> URL: https://issues.apache.org/jira/browse/FLINK-35656
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> In the dynamic parallelism inference mode of Hive Source, when 
> `table.exec.hive.infer-source-parallelism.max` is not configured, it does not 
> use `execution.batch.adaptive.auto-parallelism.default-source-parallelism` as 
> the upper bound for parallelism inference, which is inconsistent with the 
> behavior described in the documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35656) Hive Source has issues setting max parallelism in dynamic inference mode

2024-06-20 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35656:
---

Assignee: xingbe

> Hive Source has issues setting max parallelism in dynamic inference mode
> 
>
> Key: FLINK-35656
> URL: https://issues.apache.org/jira/browse/FLINK-35656
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
> Fix For: 1.20.0
>
>
> In the dynamic parallelism inference mode of Hive Source, when 
> `table.exec.hive.infer-source-parallelism.max` is not configured, it does not 
> use `execution.batch.adaptive.auto-parallelism.default-source-parallelism` as 
> the upper bound for parallelism inference, which is inconsistent with the 
> behavior described in the documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35635) Release Testing: Verify FLIP-445: Support dynamic parallelism inference for HiveSource

2024-06-18 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35635:
---

Assignee: (was: xingbe)

> Release Testing: Verify FLIP-445: Support dynamic parallelism inference for 
> HiveSource
> --
>
> Key: FLINK-35635
> URL: https://issues.apache.org/jira/browse/FLINK-35635
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: xingbe
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.20.0
>
>
> This issue aims to verify FLIP-445.
> The Hive Source now supports dynamic parallelism inference, and the default 
> parallelism inference mode has also changed from static to dynamic inference. 
> For detailed information, please refer to the 
> [documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#source-parallelism-inference].
> We may need to cover the following types of test cases:
> Test 1: Confirm that the dynamic parallelism inference feature is effective.
> Test 2: Confirm whether it is possible to switch the parallelism inference 
> mode or disable parallelism inference through the config option 
> `table.exec.hive.infer-source-parallelism.mode`.
> Test 3: Construct scenarios where DynamicPartitionPruning is effective, and 
> confirm whether the dynamically inferred source parallelism has been 
> optimized.
>  
> Follow up the test for https://issues.apache.org/jira/browse/FLINK-35293
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35522) The source task may get stuck after a failover occurs in batch jobs

2024-06-06 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35522.
---
Fix Version/s: 1.18.2
   1.19.2
   Resolution: Fixed

1.18: 8ae7986e1dca80a686b678feb4ea3bdfff4a19bb
1.19: 7afa6eaae3218f273c7971434ea88a2cfd966ced

> The source task may get stuck after a failover occurs in batch jobs
> ---
>
> Key: FLINK-35522
> URL: https://issues.apache.org/jira/browse/FLINK-35522
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.2
>
>
> If the source task does not get assigned a split because the SplitEnumerator 
> has no more splits, and a failover occurs during the closing process, the 
> SourceCoordinatorContext will not resend the NoMoreSplit event to the newly 
> started source task, causing the source vertex to remain stuck indefinitely. 
> This case may only occur in batch jobs where speculative execution has been 
> enabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35522) The source task may get stuck after a failover occurs in batch jobs

2024-06-05 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852611#comment-17852611
 ] 

Zhu Zhu commented on FLINK-35522:
-

[~xiasun] would you open 2 PRs to backport this change to 1.18 and 1.19?

> The source task may get stuck after a failover occurs in batch jobs
> ---
>
> Key: FLINK-35522
> URL: https://issues.apache.org/jira/browse/FLINK-35522
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> If the source task does not get assigned a split because the SplitEnumerator 
> has no more splits, and a failover occurs during the closing process, the 
> SourceCoordinatorContext will not resend the NoMoreSplit event to the newly 
> started source task, causing the source vertex to remain stuck indefinitely. 
> This case may only occur in batch jobs where speculative execution has been 
> enabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35522) The source task may get stuck after a failover occurs in batch jobs

2024-06-04 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852241#comment-17852241
 ] 

Zhu Zhu commented on FLINK-35522:
-

Thanks for reporting this problem and volunteering to fix it! [~xiasun]
The ticket is assigned to you.

> The source task may get stuck after a failover occurs in batch jobs
> ---
>
> Key: FLINK-35522
> URL: https://issues.apache.org/jira/browse/FLINK-35522
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
> Fix For: 1.20.0
>
>
> If the source task does not get assigned a split because the SplitEnumerator 
> has no more splits, and a failover occurs during the closing process, the 
> SourceCoordinatorContext will not resend the NoMoreSplit event to the newly 
> started source task, causing the source vertex to remain stuck indefinitely. 
> This case may only occur in batch jobs where speculative execution has been 
> enabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35522) The source task may get stuck after a failover occurs in batch jobs

2024-06-04 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35522:
---

Assignee: xingbe

> The source task may get stuck after a failover occurs in batch jobs
> ---
>
> Key: FLINK-35522
> URL: https://issues.apache.org/jira/browse/FLINK-35522
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
> Fix For: 1.20.0
>
>
> If the source task does not get assigned a split because the SplitEnumerator 
> has no more splits, and a failover occurs during the closing process, the 
> SourceCoordinatorContext will not resend the NoMoreSplit event to the newly 
> started source task, causing the source vertex to remain stuck indefinitely. 
> This case may only occur in batch jobs where speculative execution has been 
> enabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35483) BatchJobRecoveryTest related to JM failover produced no output for 900 second

2024-05-31 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35483.
---
Fix Version/s: 1.20.0
   Resolution: Fixed

f3a3f926c6c6c931bb7ccc52e823d70cfd8aadf5

> BatchJobRecoveryTest related to JM failover produced no output for 900 second
> -
>
> Key: FLINK-35483
> URL: https://issues.apache.org/jira/browse/FLINK-35483
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> testRecoverFromJMFailover
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59919&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9476



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35399) Add documents for batch job master failure recovery

2024-05-30 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850739#comment-17850739
 ] 

Zhu Zhu commented on FLINK-35399:
-

56cd9607713d0da874dcc54c4cf6d5b3b52b1050 refined the doc a bit.

> Add documents for batch job master failure recovery
> ---
>
> Key: FLINK-35399
> URL: https://issues.apache.org/jira/browse/FLINK-35399
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Zhu Zhu
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32384) Remove deprecated configuration keys which violate YAML spec

2024-05-30 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850734#comment-17850734
 ] 

Zhu Zhu commented on FLINK-32384:
-

Thanks for volunteering to contribute to Flink. [~kartikeypant]
However, this is a breaking change. Therefore, we cannot do it until Flink 1.20 
is released and release cycle of Flink 2.0 is started.
You are welcome to take this task if you are free at that moment.
Before that, you can take a look at other tickets.

> Remove deprecated configuration keys which violate YAML spec
> 
>
> Key: FLINK-32384
> URL: https://issues.apache.org/jira/browse/FLINK-32384
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Zhu Zhu
>Priority: Major
>  Labels: 2.0-related
> Fix For: 2.0.0
>
>
> In FLINK-29372, key that violate YAML spec are renamed to a valid form and 
> the old names are deprecated.
> In Flink 2.0 we should remove these deprecated keys. This will prevent users 
> (unintentionally) to create invalid YAML form flink-conf.yaml.
> Then with the work of FLINK-23620,  we can remove the non-standard YAML 
> parsing logic and enforce standard YAML validation in CI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33892) FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-05-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33892.
---
Fix Version/s: 1.20.0
   Resolution: Done

> FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs
> -
>
> Key: FLINK-33892
> URL: https://issues.apache.org/jira/browse/FLINK-33892
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> This is the umbrella ticket for 
> [FLIP-383|https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+for+Batch+Jobs]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33892) FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-05-29 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850582#comment-17850582
 ] 

Zhu Zhu commented on FLINK-33892:
-

The feature development is done except for some follow-up tasks.
Would you add some release notes to this ticket and close it? [~JunRuiLi]

> FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs
> -
>
> Key: FLINK-33892
> URL: https://issues.apache.org/jira/browse/FLINK-33892
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> This is the umbrella ticket for 
> [FLIP-383|https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+for+Batch+Jobs]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35399) Add documents for batch job master failure recovery

2024-05-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35399.
---
Fix Version/s: 1.20.0
   Resolution: Done

aabfd6e2eaa37d554632129c959a309f85d528c5

> Add documents for batch job master failure recovery
> ---
>
> Key: FLINK-35399
> URL: https://issues.apache.org/jira/browse/FLINK-35399
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Zhu Zhu
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35465) Introduce BatchJobRecoveryHandler for recovery of batch jobs from JobMaster failures.

2024-05-27 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35465.
---
Resolution: Done

master:
b8173eb662ee5823de40de356869d0064de2c22a
3206659db5b7c4ce645072f11f091e0e9e92b0ce
e964af392476e011147be73ae4dab8ff89512994

> Introduce BatchJobRecoveryHandler for recovery of batch jobs from JobMaster 
> failures.
> -
>
> Key: FLINK-35465
> URL: https://issues.apache.org/jira/browse/FLINK-35465
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35465) Introduce BatchJobRecoveryHandler for recovery of batch jobs from JobMaster failures.

2024-05-27 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-35465:

Fix Version/s: 1.20.0

> Introduce BatchJobRecoveryHandler for recovery of batch jobs from JobMaster 
> failures.
> -
>
> Key: FLINK-35465
> URL: https://issues.apache.org/jira/browse/FLINK-35465
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35465) Introduce BatchJobRecoveryHandler for recovery of batch jobs from JobMaster failures.

2024-05-27 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35465:
---

Assignee: Junrui Li

> Introduce BatchJobRecoveryHandler for recovery of batch jobs from JobMaster 
> failures.
> -
>
> Key: FLINK-35465
> URL: https://issues.apache.org/jira/browse/FLINK-35465
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35426) Change the distribution of DynamicFilteringDataCollector to Broadcast

2024-05-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35426.
---
Resolution: Done

master: 4b342da6d149113dde821b370a136beef3430fff

> Change the distribution of DynamicFilteringDataCollector to Broadcast
> -
>
> Key: FLINK-35426
> URL: https://issues.apache.org/jira/browse/FLINK-35426
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Currently, the DynamicFilteringDataCollector is utilized in the dynamic 
> partition pruning feature of batch jobs to collect the partition information 
> dynamically filtered by the source. Its current data distribution method is 
> rebalance, and it also acts as an upstream vertex to the probe side Source.
> Presently, when the Scheduler dynamically infers the parallelism for vertices 
> that are both downstream and Source, it considers factors from both sides, 
> which can lead to an overestimation of parallelism due to 
> DynamicFilteringDataCollector being an upstream of the Source. We aim to 
> change the distribution method of the DynamicFilteringDataCollector to 
> broadcast to prevent the dynamic overestimation of Source parallelism.
> Furthermore, given that the DynamicFilteringDataCollector transmits data 
> through the OperatorCoordinator rather than through normal data distribution, 
> this change will not affect the DPP (Dynamic Partition Pruning) functionality.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35426) Change the distribution of DynamicFilteringDataCollector to Broadcast

2024-05-22 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848801#comment-17848801
 ] 

Zhu Zhu commented on FLINK-35426:
-

Good point! [~xiasun]
The task is assigned to you. Feel free to open a pr for it.

> Change the distribution of DynamicFilteringDataCollector to Broadcast
> -
>
> Key: FLINK-35426
> URL: https://issues.apache.org/jira/browse/FLINK-35426
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
> Fix For: 1.20.0
>
>
> Currently, the DynamicFilteringDataCollector is utilized in the dynamic 
> partition pruning feature of batch jobs to collect the partition information 
> dynamically filtered by the source. Its current data distribution method is 
> rebalance, and it also acts as an upstream vertex to the probe side Source.
> Presently, when the Scheduler dynamically infers the parallelism for vertices 
> that are both downstream and Source, it considers factors from both sides, 
> which can lead to an overestimation of parallelism due to 
> DynamicFilteringDataCollector being an upstream of the Source. We aim to 
> change the distribution method of the DynamicFilteringDataCollector to 
> broadcast to prevent the dynamic overestimation of Source parallelism.
> Furthermore, given that the DynamicFilteringDataCollector transmits data 
> through the OperatorCoordinator rather than through normal data distribution, 
> this change will not affect the DPP (Dynamic Partition Pruning) functionality.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35426) Change the distribution of DynamicFilteringDataCollector to Broadcast

2024-05-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35426:
---

Assignee: xingbe

> Change the distribution of DynamicFilteringDataCollector to Broadcast
> -
>
> Key: FLINK-35426
> URL: https://issues.apache.org/jira/browse/FLINK-35426
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
> Fix For: 1.20.0
>
>
> Currently, the DynamicFilteringDataCollector is utilized in the dynamic 
> partition pruning feature of batch jobs to collect the partition information 
> dynamically filtered by the source. Its current data distribution method is 
> rebalance, and it also acts as an upstream vertex to the probe side Source.
> Presently, when the Scheduler dynamically infers the parallelism for vertices 
> that are both downstream and Source, it considers factors from both sides, 
> which can lead to an overestimation of parallelism due to 
> DynamicFilteringDataCollector being an upstream of the Source. We aim to 
> change the distribution method of the DynamicFilteringDataCollector to 
> broadcast to prevent the dynamic overestimation of Source parallelism.
> Furthermore, given that the DynamicFilteringDataCollector transmits data 
> through the OperatorCoordinator rather than through normal data distribution, 
> this change will not affect the DPP (Dynamic Partition Pruning) functionality.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35384) Expose TaskIOMetricGroup to custom Partitioner via init Context

2024-05-21 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848126#comment-17848126
 ] 

Zhu Zhu commented on FLINK-35384:
-

Enabling metrics for partitioners makes sense and the proposed approach of 
introducing a context sounds good.

How about introducing a sub-metric group specifically for partitioner metrics? 
A single task might contain multiple partitioners for which the metrics should 
not get mixed. It also avoids exposing the internal TaskIOMetricGroup class to 
users.

> Expose TaskIOMetricGroup to custom Partitioner via init Context
> ---
>
> Key: FLINK-35384
> URL: https://issues.apache.org/jira/browse/FLINK-35384
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Affects Versions: 1.9.4
>Reporter: Steven Zhen Wu
>Priority: Major
>
> I am trying to implement a custom range partitioner in the Flink Iceberg 
> sink. Want to publish some counter metrics for certain scenarios. This is 
> like the network metrics exposed in `TaskIOMetricGroup`.
> We can implement the range partitioner using the public interface from 
> `DataStream`. 
> {code}
> public  DataStream partitionCustom(
> Partitioner partitioner, KeySelector keySelector)
> {code}
> We can pass the `TaskIOMetricGroup` to the `StreamPartitioner` that 
> `CustomPartitionerWrapper` extends from. `CustomPartitionerWrapper` wraps the 
> pubic `Partitioner` interface, where we can implement the custom range 
> partitioner.
> `Partitioner` interface is a functional interface today. we can add a new 
> default `setup` method without breaking the backward compatibility.
> {code}
> @Public
> @FunctionalInterface
> public interface Partitioner extends java.io.Serializable, Function {
> *default void setup(TaskIOMetricGroup metrics) {}*
> int partition(K key, int numPartitions);
> }
> {code}
> I know public interface requires a FLIP process. will do that if the 
> community agree with this feature request.
> Personally, `numPartitions` should be passed in the `setup` method too. But 
> it is a breaking change that is NOT worth the benefit right now.
> {code}
> @Public
> @FunctionalInterface
> public interface Partitioner extends java.io.Serializable, Function {
> public void setup(int numPartitions, TaskIOMetricGroup metrics) {}
> int partition(K key);
> }
> {code}
> That would be similar to `StreamPartitioner#setup()` method that we would 
> need to modify for passing the metrics group.
> {code}
> @Internal
> public abstract class StreamPartitioner
> implements ChannelSelector>>, 
> Serializable {
> @Override
> public void setup(int numberOfChannels, TaskIOMetricGroup metrics) {
> this.numberOfChannels = numberOfChannels;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35399) Add documents for batch job master failure recovery

2024-05-19 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-35399:
---

 Summary: Add documents for batch job master failure recovery
 Key: FLINK-35399
 URL: https://issues.apache.org/jira/browse/FLINK-35399
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Zhu Zhu
Assignee: Junrui Li






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33983) Introduce JobEvent and JobEventStore for Batch Job Recovery

2024-05-19 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33983.
---
Fix Version/s: 1.20.0
   Resolution: Done

master: c7c1d78752836b96591e31422c65b85eca38bd50

> Introduce JobEvent and JobEventStore for Batch Job Recovery
> ---
>
> Key: FLINK-33983
> URL: https://issues.apache.org/jira/browse/FLINK-33983
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33986) Extend shuffleMaster to support batch snapshot.

2024-05-14 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33986:

Component/s: Runtime / Coordination

> Extend shuffleMaster to support batch snapshot.
> ---
>
> Key: FLINK-33986
> URL: https://issues.apache.org/jira/browse/FLINK-33986
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Extend shuffleMaster to support batch snapshot as follows:
>  # Add method supportsBatchSnapshot to identify whether the shuffle master 
> supports taking snapshot in batch scenarios
>  # Add method snapshotState and restoreState to snapshot and restore the 
> shuffle master's state.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33986) Extend shuffleMaster to support batch snapshot.

2024-05-14 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33986.
---
Fix Version/s: 1.20.0
   Resolution: Done

65d31e26534836909f6b8139c6bd6cd45b91bba4

> Extend shuffleMaster to support batch snapshot.
> ---
>
> Key: FLINK-33986
> URL: https://issues.apache.org/jira/browse/FLINK-33986
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Extend shuffleMaster to support batch snapshot as follows:
>  # Add method supportsBatchSnapshot to identify whether the shuffle master 
> supports taking snapshot in batch scenarios
>  # Add method snapshotState and restoreState to snapshot and restore the 
> shuffle master's state.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35293) FLIP-445: Support dynamic parallelism inference for HiveSource

2024-05-13 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846177#comment-17846177
 ] 

Zhu Zhu commented on FLINK-35293:
-

The change is merged. Could you add release notes for it and close the ticket? 
[~xiasun].

> FLIP-445: Support dynamic parallelism inference for HiveSource
> --
>
> Key: FLINK-35293
> URL: https://issues.apache.org/jira/browse/FLINK-35293
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  introduces dynamic source parallelism inference, which, compared to static 
> inference, utilizes runtime information to more accurately determine the 
> source parallelism. The FileSource already possesses the capability for 
> dynamic parallelism inference. As a follow-up task to FLIP-379, this FLIP 
> plans to implement the dynamic parallelism inference interface for 
> HiveSource, and also switches the default static parallelism inference to 
> dynamic parallelism inference.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35293) FLIP-445: Support dynamic parallelism inference for HiveSource

2024-05-13 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846137#comment-17846137
 ] 

Zhu Zhu commented on FLINK-35293:
-

master: ddb5a5355f9aca3d223f1fff6581d83dd317c2de

> FLIP-445: Support dynamic parallelism inference for HiveSource
> --
>
> Key: FLINK-35293
> URL: https://issues.apache.org/jira/browse/FLINK-35293
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  introduces dynamic source parallelism inference, which, compared to static 
> inference, utilizes runtime information to more accurately determine the 
> source parallelism. The FileSource already possesses the capability for 
> dynamic parallelism inference. As a follow-up task to FLIP-379, this FLIP 
> plans to implement the dynamic parallelism inference interface for 
> HiveSource, and also switches the default static parallelism inference to 
> dynamic parallelism inference.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34661) TaskExecutor supports retain partitions after JM crashed.

2024-05-13 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34661.
---
Fix Version/s: 1.20.0
   Resolution: Done

4e6b42046adbe2f337460d2e50f1fee12cff21a5

> TaskExecutor supports retain partitions after JM crashed.
> -
>
> Key: FLINK-34661
> URL: https://issues.apache.org/jira/browse/FLINK-34661
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35270.
---
Resolution: Fixed

547e4b53ebe36c39066adcf3a98123a1f7890c15

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Haifei Chen
>Assignee: Haifei Chen
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.20.0
>
>
> Good logs helps debug a lot in production environment
> Therefore, it'll be better to show more information in logs  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35270:
---

Assignee: Haifei Chen

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Haifei Chen
>Assignee: Haifei Chen
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.20.0
>
>
> Good logs helps debug a lot in production environment
> Therefore, it'll be better to show more information in logs  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-35270:

Component/s: API / Core

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Haifei Chen
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.20.0
>
>
> Good logs helps debug a lot in production environment
> Therefore, it'll be better to show more information in logs  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-35270:

Labels: pull-request-available starter  (was: pull-request-available)

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>Reporter: Haifei Chen
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.20.0
>
>
> Good logs helps debug a lot in production environment
> Therefore, it'll be better to show more information in logs  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-35270:

Fix Version/s: 1.20.0

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>Reporter: Haifei Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Good logs helps debug a lot in production environment
> Therefore, it'll be better to show more information in logs  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35293) FLIP-445: Support dynamic parallelism inference for HiveSource

2024-05-06 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35293:
---

Assignee: xingbe

> FLIP-445: Support dynamic parallelism inference for HiveSource
> --
>
> Key: FLINK-35293
> URL: https://issues.apache.org/jira/browse/FLINK-35293
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  introduces dynamic source parallelism inference, which, compared to static 
> inference, utilizes runtime information to more accurately determine the 
> source parallelism. The FileSource already possesses the capability for 
> dynamic parallelism inference. As a follow-up task to FLIP-379, this FLIP 
> plans to implement the dynamic parallelism inference interface for 
> HiveSource, and also switches the default static parallelism inference to 
> dynamic parallelism inference.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35009) Change on getTransitivePredecessors breaks connectors

2024-04-09 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835577#comment-17835577
 ] 

Zhu Zhu commented on FLINK-35009:
-

[~martijnvisser] Thanks for monitoring the kafka connector builds and reporting 
this problem!
Feel free to loop me in to review the pr.

> Change on getTransitivePredecessors breaks connectors
> -
>
> Key: FLINK-35009
> URL: https://issues.apache.org/jira/browse/FLINK-35009
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / Kafka
>Affects Versions: 1.18.2, 1.20.0, 1.19.1
>Reporter: Martijn Visser
>Assignee: Martijn Visser
>Priority: Blocker
>
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-kafka: Compilation failure: 
> Compilation failure: 
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[214,24]
>  
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  is not abstract and does not override abstract method 
> getTransitivePredecessorsInternal() in org.apache.flink.api.dag.Transformation
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[220,44]
>  getTransitivePredecessors() in 
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  cannot override getTransitivePredecessors() in 
> org.apache.flink.api.dag.Transformation
> Error:overridden method is final
> {code}
> Example: 
> https://github.com/apache/flink-connector-kafka/actions/runs/8494349338/job/23269406762#step:15:167



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35009) Change on getTransitivePredecessors breaks connectors

2024-04-07 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834799#comment-17834799
 ] 

Zhu Zhu commented on FLINK-35009:
-

Thanks for looking into the issue. [~Weijie Guo]
So it is just a test class which uses Flink `@Internal` classes is broken. And 
that test class is even not used. 
I think it's better to just remove `MockTransformation` from kafka connector.
WDYT? [~martijnvisser]

> Change on getTransitivePredecessors breaks connectors
> -
>
> Key: FLINK-35009
> URL: https://issues.apache.org/jira/browse/FLINK-35009
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / Kafka
>Affects Versions: 1.18.2, 1.20.0, 1.19.1
>Reporter: Martijn Visser
>Priority: Blocker
>
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-kafka: Compilation failure: 
> Compilation failure: 
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[214,24]
>  
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  is not abstract and does not override abstract method 
> getTransitivePredecessorsInternal() in org.apache.flink.api.dag.Transformation
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[220,44]
>  getTransitivePredecessors() in 
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  cannot override getTransitivePredecessors() in 
> org.apache.flink.api.dag.Transformation
> Error:overridden method is final
> {code}
> Example: 
> https://github.com/apache/flink-connector-kafka/actions/runs/8494349338/job/23269406762#step:15:167



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-04-07 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834798#comment-17834798
 ] 

Zhu Zhu edited comment on FLINK-32513 at 4/8/24 6:27 AM:
-

{{Transformation}} is not a public interface, it is an @Internal class. 
Ideally, kafka connectors should not directly manipulate {{Transformation}}.
Yet we may try to find a workaround to avoid break existing kafka connectors.


was (Author: zhuzh):
{{Transformation}} is not a public interface, it is an @Internal class. 
Ideally, kafka connectors should not directly manipulate {{Transformation}}.
Production-wise, we may take a look whether there is a workaround to avoid 
break existing kafka connectors.

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> -
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> the method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation

[jira] [Commented] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-04-07 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834798#comment-17834798
 ] 

Zhu Zhu commented on FLINK-32513:
-

{{Transformation}} is not a public interface, it is an @Internal class. 
Ideally, kafka connectors should not directly manipulate {{Transformation}}.
Production-wise, we may take a look whether there is a workaround to avoid 
break existing kafka connectors.

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> -
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> the method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>

[jira] [Assigned] (FLINK-34661) TaskExecutor supports retain partitions after JM crashed.

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34661:
---

Assignee: Junrui Li

> TaskExecutor supports retain partitions after JM crashed.
> -
>
> Key: FLINK-34661
> URL: https://issues.apache.org/jira/browse/FLINK-34661
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33983) Introduce JobEvent and JobEventStore for Batch Job Recovery

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33983:
---

Assignee: Junrui Li

> Introduce JobEvent and JobEventStore for Batch Job Recovery
> ---
>
> Key: FLINK-33983
> URL: https://issues.apache.org/jira/browse/FLINK-33983
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33986) Extend shuffleMaster to support batch snapshot.

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33986:
---

Assignee: Junrui Li

> Extend shuffleMaster to support batch snapshot.
> ---
>
> Key: FLINK-33986
> URL: https://issues.apache.org/jira/browse/FLINK-33986
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>
> Extend shuffleMaster to support batch snapshot as follows:
>  # Add method supportsBatchSnapshot to identify whether the shuffle master 
> supports taking snapshot in batch scenarios
>  # Add method snapshotState and restoreState to snapshot and restore the 
> shuffle master's state.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33984) Introduce SupportsBatchSnapshot for operator coordinator

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33984.
---
Fix Version/s: 1.20.0
   Resolution: Done

master:
38255652406becbfbcb7cbec557aa5ba9a1ebbb3
558ca75da2fcec875d1e04a8d75a24fd0ad42ccc

> Introduce SupportsBatchSnapshot for operator coordinator
> 
>
> Key: FLINK-33984
> URL: https://issues.apache.org/jira/browse/FLINK-33984
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34945) Support recover shuffle descriptor and partition metrics from tiered storage

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34945:
---

Assignee: Junrui Li

> Support recover shuffle descriptor and partition metrics from tiered storage
> 
>
> Key: FLINK-34945
> URL: https://issues.apache.org/jira/browse/FLINK-34945
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33985) Support obtain all partitions existing in cluster through ShuffleMaster.

2024-04-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33985.
---
Fix Version/s: 1.20.0
   Resolution: Done

master: a44709662956b306fe686623d00358a6b076f637

> Support obtain all partitions existing in cluster through ShuffleMaster.
> 
>
> Key: FLINK-33985
> URL: https://issues.apache.org/jira/browse/FLINK-33985
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Support obtain all partitions existing in cluster through ShuffleMaster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33985) Support obtain all partitions existing in cluster through ShuffleMaster.

2024-04-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33985:

Component/s: Runtime / Coordination

> Support obtain all partitions existing in cluster through ShuffleMaster.
> 
>
> Key: FLINK-33985
> URL: https://issues.apache.org/jira/browse/FLINK-33985
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Support obtain all partitions existing in cluster through ShuffleMaster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33982) Introduce new config options for Batch Job Recovery

2024-04-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33982.
---
Fix Version/s: 1.20.0
   Resolution: Done

master: ec1311c8eb805f91b3b8d7d7cbe192e8cad05a76

> Introduce new config options for Batch Job Recovery
> ---
>
> Key: FLINK-33982
> URL: https://issues.apache.org/jira/browse/FLINK-33982
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34565) Enhance flink kubernetes configMap to accommodate additional configuration files

2024-03-27 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831336#comment-17831336
 ] 

Zhu Zhu commented on FLINK-34565:
-

IIUC, the requirement is to ship more user files, which may be needed by user 
code, to the pod. Supporting configuration files is just a special case of it. 
Shipping them via ConfigMap sounds a bit tricky to me.
cc [~wangyang0918]

> Enhance flink kubernetes configMap to accommodate additional configuration 
> files
> 
>
> Key: FLINK-34565
> URL: https://issues.apache.org/jira/browse/FLINK-34565
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Reporter: Surendra Singh Lilhore
>Priority: Major
>  Labels: pull-request-available
>
> Flink kubernetes client currently supports a fixed number of files 
> (flink-conf.yaml, logback-console.xml, log4j-console.properties) in the JM 
> and TM Pod ConfigMap. In certain scenarios, particularly in app mode, 
> additional configuration files are required for jobs to run successfully. 
> Presently, users must resort to workarounds to include dynamic configuration 
> files in the JM and TM. This proposed improvement allows users to easily add 
> extra files by configuring the 
> '{*}kubernetes.flink.configmap.additional.resources{*}' property. Users can 
> provide a semicolon-separated list of local files in the client Flink config 
> directory that should be included in the Flink ConfigMap.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33984) Introduce SupportsBatchSnapshot for operator coordinator

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33984:
---

Assignee: Junrui Li

> Introduce SupportsBatchSnapshot for operator coordinator
> 
>
> Key: FLINK-33984
> URL: https://issues.apache.org/jira/browse/FLINK-33984
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33985) Support obtain all partitions existing in cluster through ShuffleMaster.

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33985:
---

Assignee: Junrui Li

> Support obtain all partitions existing in cluster through ShuffleMaster.
> 
>
> Key: FLINK-33985
> URL: https://issues.apache.org/jira/browse/FLINK-33985
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Support obtain all partitions existing in cluster through ShuffleMaster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33982) Introduce new config options for Batch Job Recovery

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33982:
---

Assignee: Junrui Li

> Introduce new config options for Batch Job Recovery
> ---
>
> Key: FLINK-33982
> URL: https://issues.apache.org/jira/browse/FLINK-33982
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33892) FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33892:

Summary: FLIP-383: Support Job Recovery from JobMaster Failures for Batch 
Jobs  (was: FLIP-383: Support Job Recovery for Batch Jobs)

> FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs
> -
>
> Key: FLINK-33892
> URL: https://issues.apache.org/jira/browse/FLINK-33892
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Junrui Li
>Priority: Major
>
> This is the umbrella ticket for 
> [FLIP-383|https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+for+Batch+Jobs]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-03-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-32513.
---
Fix Version/s: 1.18.2
   1.20.0
   1.19.1
   Resolution: Fixed

master: 8dcb0ae9063b66af1d674b7b0b3be76b6d752692
release-1.19: 5ec4bf2f18168001b5cbb9012f331d3405228516
release-1.18: 940b3bbda5b10abe3a41d60467d33fd424c7dae6

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> -
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> the method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformati

[jira] [Closed] (FLINK-34731) Remove SpeculativeScheduler and incorporate its features into AdaptiveBatchScheduler

2024-03-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34731.
---
Resolution: Done

master: cf0d75c4bb324825a057dc72243bb6a2046f8479

> Remove SpeculativeScheduler and incorporate its features into 
> AdaptiveBatchScheduler
> 
>
> Key: FLINK-34731
> URL: https://issues.apache.org/jira/browse/FLINK-34731
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Presently, speculative execution is exposed to users as a feature of the 
> AdaptiveBatchScheduler.
> To streamline our codebase and reduce maintenance overhead, this ticket will 
> consolidate the SpeculativeScheduler into the AdaptiveBatchScheduler, 
> eliminating the need for a separate SpeculativeScheduler class.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-03-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-32513:
---

Assignee: Jeyhun Karimov

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> -
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> the method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api

[jira] [Closed] (FLINK-34725) Dockerfiles for release publishing has incorrect config.yaml path

2024-03-20 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34725.
---
Resolution: Fixed

master: 3f4a80989fe7243983926f09fac2283f6fa63693
release-1.19: f53c5628e43777b4b924ec81224acc3df938800a

> Dockerfiles for release publishing has incorrect config.yaml path
> -
>
> Key: FLINK-34725
> URL: https://issues.apache.org/jira/browse/FLINK-34725
> Project: Flink
>  Issue Type: Bug
>  Components: flink-docker
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0, 1.19.1
>
>
> An issue found when do docker image publishing, unexpected error msg:
> {code:java}
> sed: can't read /config.yaml: No such file or directory{code}
>  
> also found in flink-docker/master daily Publish SNAPSHOTs  action:
> [https://github.com/apache/flink-docker/actions/runs/8210534289/job/22458150514#step:8:588]
> [https://github.com/apache/flink-docker/actions/runs/8210534289/job/22458150322#step:8:549]
>  
> This related to changes by https://issues.apache.org/jira/browse/FLINK-34205



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34731) Remove SpeculativeScheduler and incorporate its features into AdaptiveBatchScheduler

2024-03-20 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34731:
---

Assignee: Junrui Li

> Remove SpeculativeScheduler and incorporate its features into 
> AdaptiveBatchScheduler
> 
>
> Key: FLINK-34731
> URL: https://issues.apache.org/jira/browse/FLINK-34731
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
> Fix For: 1.20.0
>
>
> Presently, speculative execution is exposed to users as a feature of the 
> AdaptiveBatchScheduler.
> To streamline our codebase and reduce maintenance overhead, this ticket will 
> consolidate the SpeculativeScheduler into the AdaptiveBatchScheduler, 
> eliminating the need for a separate SpeculativeScheduler class.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-03-06 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34105.
---
Fix Version/s: 1.19.0
 Assignee: dizhou cao  (was: Yangze Guo)
   Resolution: Fixed

1.19: 837f8e584850bdcbc586a54f58e3fe953a816be8

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: dizhou cao
>Priority: Critical
> Fix For: 1.19.0
>
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34377) Release Testing: Verify FLINK-33297 Support standard YAML for FLINK configuration

2024-02-25 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820553#comment-17820553
 ] 

Zhu Zhu commented on FLINK-34377:
-

Thanks for volunteering! [~xiasun]
I have assigned you the ticket.

> Release Testing: Verify FLINK-33297 Support standard YAML for FLINK 
> configuration
> -
>
> Key: FLINK-34377
> URL: https://issues.apache.org/jira/browse/FLINK-34377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Junrui Li
>Assignee: xingbe
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-366.
> Starting with version 1.19, Flink has officially introduced full support for 
> the standard YAML 1.2 syntax. For detailed information, please refer to the 
> Flink 
> website:https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#flink-configuration-file
> We may need to cover the following two types of test cases:
> Test 1: For newly created jobs, utilize a config.yaml file to set up the 
> Flink cluster. We need to verify that the job runs as expected with this new 
> configuration.
> Test 2: For existing jobs, migrate the legacy flink-conf.yaml to the new 
> config.yaml. Test the job runs just like before post-migration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34377) Release Testing: Verify FLINK-33297 Support standard YAML for FLINK configuration

2024-02-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34377:
---

Assignee: xingbe

> Release Testing: Verify FLINK-33297 Support standard YAML for FLINK 
> configuration
> -
>
> Key: FLINK-34377
> URL: https://issues.apache.org/jira/browse/FLINK-34377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Junrui Li
>Assignee: xingbe
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-366.
> Starting with version 1.19, Flink has officially introduced full support for 
> the standard YAML 1.2 syntax. For detailed information, please refer to the 
> Flink 
> website:https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#flink-configuration-file
> We may need to cover the following two types of test cases:
> Test 1: For newly created jobs, utilize a config.yaml file to set up the 
> Flink cluster. We need to verify that the job runs as expected with this new 
> configuration.
> Test 2: For existing jobs, migrate the legacy flink-conf.yaml to the new 
> config.yaml. Test the job runs just like before post-migration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34383) Modify the comment with incorrect syntax

2024-02-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34383.
---
Resolution: Won't Fix

> Modify the comment with incorrect syntax
> 
>
> Key: FLINK-34383
> URL: https://issues.apache.org/jira/browse/FLINK-34383
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: li you
>Priority: Major
>  Labels: pull-request-available
>
> There is an error in the syntax of the comment for the class 
> PermanentBlobCache



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34356) Release Testing: Verify FLINK-33768 Support dynamic source parallelism inference for batch jobs

2024-02-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34356:
---

Assignee: Junrui Li

> Release Testing: Verify FLINK-33768  Support dynamic source parallelism 
> inference for batch jobs 
> -
>
> Key: FLINK-34356
> URL: https://issues.apache.org/jira/browse/FLINK-34356
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-379.
> New Source can implement the interface DynamicParallelismInference to enable 
> dynamic parallelism inference. For detailed information, please refer to the 
> documentation.
> We may need to cover the following two types of test cases:
> Test 1: FileSource has implemented the dynamic source parallelism inference. 
> Test the automatic parallelism inference of FileSource.
> Test 2: Test the dynamic source parallelism inference of a custom Source.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34356) Release Testing: Verify FLINK-33768 Support dynamic source parallelism inference for batch jobs

2024-02-21 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819142#comment-17819142
 ] 

Zhu Zhu commented on FLINK-34356:
-

Assigned. Thanks for volunteering! [~JunRuiLi]


> Release Testing: Verify FLINK-33768  Support dynamic source parallelism 
> inference for batch jobs 
> -
>
> Key: FLINK-34356
> URL: https://issues.apache.org/jira/browse/FLINK-34356
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-379.
> New Source can implement the interface DynamicParallelismInference to enable 
> dynamic parallelism inference. For detailed information, please refer to the 
> documentation.
> We may need to cover the following two types of test cases:
> Test 1: FileSource has implemented the dynamic source parallelism inference. 
> Test the automatic parallelism inference of FileSource.
> Test 2: Test the dynamic source parallelism inference of a custom Source.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33241) Align config option generation documentation for Flink's config documentation

2024-02-20 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33241.
---
Fix Version/s: 1.19.0
   Resolution: Fixed

Fixed via 8fe005cf1325b7477d7e1808a46bd80798165029

> Align config option generation documentation for Flink's config documentation
> -
>
> Key: FLINK-33241
> URL: https://issues.apache.org/jira/browse/FLINK-33241
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.19.0
>
>
> The configuration parameter docs generation is documented in two places in 
> different ways:
> [docs/README.md:62|https://github.com/apache/flink/blob/5c1e9f3b1449cb77276d578b344d9a69c7cf9a3c/docs/README.md#L62]
>  and 
> [flink-docs/README.md:44|https://github.com/apache/flink/blob/7bebd2d9fac517c28afc24c0c034d77cfe2b43a6/flink-docs/README.md#L44].
> We should remove the corresponding command from {{docs/README.md}} and refer 
> to {{flink-docs/README.md}} for the documentation. That way, we only have to 
> maintain a single file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34247) Document FLIP-366: Support standard YAML for FLINK configuration

2024-02-04 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34247.
---
  Assignee: Junrui Li
Resolution: Done

master/release-1.19:
5b61baadd02ccdfa702834e2e63aeb8d1d9e1250
04dd91f2b6c830b9ac0e445f72938e3d6f479edd
e9bea09510e18c6143e6e14ca17a894abfaf92bf

> Document FLIP-366: Support standard YAML for FLINK configuration
> 
>
> Key: FLINK-34247
> URL: https://issues.apache.org/jira/browse/FLINK-34247
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (FLINK-33768) FLIP-379: Support dynamic source parallelism inference for batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reopened FLINK-33768:
-

> FLIP-379: Support dynamic source parallelism inference for batch jobs
> -
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> `{*}execution.batch.adaptive.auto-parallelism.default-source-parallelism{*}` 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}}  is used as the source parallelism, which is actually a temporary 
> implementation solution.
> We aim to support dynamic source parallelism inference for batch jobs. More 
> details see 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34145) File source connector support dynamic source parallelism inference in batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34145:

Parent: FLINK-33768
Issue Type: Sub-task  (was: Improvement)

> File source connector support dynamic source parallelism inference in batch 
> jobs
> 
>
> Key: FLINK-34145
> URL: https://issues.apache.org/jira/browse/FLINK-34145
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / FileSystem
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  has introduced support for dynamic source parallelism inference in batch 
> jobs, and we plan to give priority to enabling this feature for the file 
> source connector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34144) Update the documentation and configuration description about dynamic source parallelism inference

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34144:

Parent: FLINK-33768
Issue Type: Sub-task  (was: Improvement)

> Update the documentation and configuration description about dynamic source 
> parallelism inference
> -
>
> Key: FLINK-34144
> URL: https://issues.apache.org/jira/browse/FLINK-34144
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource]
>  introduces the new feature of dynamic source parallelism inference, and we 
> plan to update the documentation and configuration items accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34143) Modify the effective strategy of `execution.batch.adaptive.auto-parallelism.default-source-parallelism`

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34143:

Parent: FLINK-33768
Issue Type: Sub-task  (was: Improvement)

> Modify the effective strategy of 
> `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
> ---
>
> Key: FLINK-34143
> URL: https://issues.apache.org/jira/browse/FLINK-34143
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, if users do not set the 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}}
>  configuration option, the AdaptiveBatchScheduler defaults to a parallelism 
> of 1 for source vertices. In 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource],
>  the value of 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}}
>  will act as the upper bound for inferring dynamic source parallelism, and 
> continuing with the current policy is no longer appropriate.
> We plan to change the effectiveness strategy of 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}};
>  when the user does not set this config option, we will use the value of 
> `{{{}execution.batch.adaptive.auto-parallelism.max-parallelism`{}}} as the 
> upper bound for source parallelism inference. If 
> {{`execution.batch.adaptive.auto-parallelism.max-parallelism`}} is also not 
> configured, the value of `{{{}parallelism.default`{}}} will be used as a 
> fallback.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33768) FLIP-379: Support dynamic source parallelism inference for batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33768:

Summary: FLIP-379: Support dynamic source parallelism inference for batch 
jobs  (was: [FLIP-379] Support dynamic source parallelism inference for batch 
jobs)

> FLIP-379: Support dynamic source parallelism inference for batch jobs
> -
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> `{*}execution.batch.adaptive.auto-parallelism.default-source-parallelism{*}` 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}}  is used as the source parallelism, which is actually a temporary 
> implementation solution.
> We aim to support dynamic source parallelism inference for batch jobs. More 
> details see 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33768) [FLIP-379] Support dynamic source parallelism inference for batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33768:

Summary: [FLIP-379] Support dynamic source parallelism inference for batch 
jobs  (was: Support dynamic source parallelism inference for batch jobs)

> [FLIP-379] Support dynamic source parallelism inference for batch jobs
> --
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> `{*}execution.batch.adaptive.auto-parallelism.default-source-parallelism{*}` 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}}  is used as the source parallelism, which is actually a temporary 
> implementation solution.
> We aim to support dynamic source parallelism inference for batch jobs. More 
> details see 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34132) Batch WordCount job fails when run with AdaptiveBatch scheduler

2024-02-01 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813095#comment-17813095
 ] 

Zhu Zhu commented on FLINK-34132:
-

We should also migrate the existing batch examples from DataSet to DataStream 
so that it can directly work with AdaptiveBatchScheduler. 
This work needs to be done before removing the DataSet API in Flink 2.0.
cc [~Wencong Liu]

> Batch WordCount job fails when run with AdaptiveBatch scheduler
> ---
>
> Key: FLINK-34132
> URL: https://issues.apache.org/jira/browse/FLINK-34132
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.17.1, 1.18.1
>Reporter: Prabhu Joseph
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Batch WordCount job fails when run with AdaptiveBatch scheduler.
> *Repro Steps*
> {code:java}
> flink-yarn-session -Djobmanager.scheduler=adaptive -d
>  flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input 
> s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT
> {code}
> *Error logs*
> {code:java}
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>   at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
>   at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
>   at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
>   at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
>   at 
> org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>   at 
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1067)
>   at 
> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:144)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:73)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>   ... 12 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1062)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)

[jira] [Updated] (FLINK-34132) Batch WordCount job fails when run with AdaptiveBatch scheduler

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34132:

Component/s: Documentation

> Batch WordCount job fails when run with AdaptiveBatch scheduler
> ---
>
> Key: FLINK-34132
> URL: https://issues.apache.org/jira/browse/FLINK-34132
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.17.1, 1.18.1
>Reporter: Prabhu Joseph
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Batch WordCount job fails when run with AdaptiveBatch scheduler.
> *Repro Steps*
> {code:java}
> flink-yarn-session -Djobmanager.scheduler=adaptive -d
>  flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input 
> s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT
> {code}
> *Error logs*
> {code:java}
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>   at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
>   at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
>   at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
>   at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
>   at 
> org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>   at 
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1067)
>   at 
> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:144)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:73)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>   ... 12 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1062)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at 
> java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457)
>   at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:

  1   2   3   4   5   6   7   8   9   10   >