[jira] [Commented] (FLINK-35009) Change on getTransitivePredecessors breaks connectors

2024-04-09 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835577#comment-17835577
 ] 

Zhu Zhu commented on FLINK-35009:
---------------------------------

[~martijnvisser] Thanks for monitoring the Kafka connector builds and reporting 
this problem!
Feel free to loop me in to review the PR.

> Change on getTransitivePredecessors breaks connectors
> -----------------------------------------------------
>
> Key: FLINK-35009
> URL: https://issues.apache.org/jira/browse/FLINK-35009
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / Kafka
>Affects Versions: 1.18.2, 1.20.0, 1.19.1
>Reporter: Martijn Visser
>Assignee: Martijn Visser
>Priority: Blocker
>
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-kafka: Compilation failure: 
> Compilation failure: 
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[214,24]
>  
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  is not abstract and does not override abstract method 
> getTransitivePredecessorsInternal() in org.apache.flink.api.dag.Transformation
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[220,44]
>  getTransitivePredecessors() in 
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  cannot override getTransitivePredecessors() in 
> org.apache.flink.api.dag.Transformation
> Error:overridden method is final
> {code}
> Example: 
> https://github.com/apache/flink-connector-kafka/actions/runs/8494349338/job/23269406762#step:15:167
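The compile error above can be reproduced with a simplified model. In the sketch below, `Transformation`, `MockTransformation`, and all method bodies are illustrative stand-ins, not Flink's actual classes: the point is that once the public traversal method becomes final and delegates to a new abstract hook, a subclass that only overrides the public method stops compiling until it overrides the hook instead.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for org.apache.flink.api.dag.Transformation (not the real
// class): getTransitivePredecessors() is now final, and subclasses must implement
// getTransitivePredecessorsInternal() instead.
abstract class Transformation {
    public final List<Transformation> getTransitivePredecessors() {
        // The real implementation does more work here; this sketch just delegates.
        return getTransitivePredecessorsInternal();
    }

    protected abstract List<Transformation> getTransitivePredecessorsInternal();
}

// What a test subclass like the connector's MockTransformation has to do after
// the change: override the new abstract hook rather than the final public method.
class MockTransformation extends Transformation {
    @Override
    protected List<Transformation> getTransitivePredecessorsInternal() {
        List<Transformation> result = new ArrayList<>();
        result.add(this); // a transformation is a predecessor of itself
        return result;
    }
}
```

Overriding `getTransitivePredecessors()` directly, as the old test code does, now fails with exactly the two errors quoted above: the class is "not abstract" (missing the hook) and the overridden method "is final".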



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35009) Change on getTransitivePredecessors breaks connectors

2024-04-08 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834799#comment-17834799
 ] 

Zhu Zhu commented on FLINK-35009:
---------------------------------

Thanks for looking into the issue, [~Weijie Guo]!
So it is just a test class, which uses Flink `@Internal` classes, that is 
broken. And that test class is not even used. 
I think it's better to just remove `MockTransformation` from the Kafka connector.
WDYT? [~martijnvisser]

> Change on getTransitivePredecessors breaks connectors
> -----------------------------------------------------
>
> Key: FLINK-35009
> URL: https://issues.apache.org/jira/browse/FLINK-35009
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / Kafka
>Affects Versions: 1.18.2, 1.20.0, 1.19.1
>Reporter: Martijn Visser
>Priority: Blocker
>
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-kafka: Compilation failure: 
> Compilation failure: 
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[214,24]
>  
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  is not abstract and does not override abstract method 
> getTransitivePredecessorsInternal() in org.apache.flink.api.dag.Transformation
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[220,44]
>  getTransitivePredecessors() in 
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  cannot override getTransitivePredecessors() in 
> org.apache.flink.api.dag.Transformation
> Error:overridden method is final
> {code}
> Example: 
> https://github.com/apache/flink-connector-kafka/actions/runs/8494349338/job/23269406762#step:15:167





[jira] [Comment Edited] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-04-08 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834798#comment-17834798
 ] 

Zhu Zhu edited comment on FLINK-32513 at 4/8/24 6:27 AM:
---------------------------------------------------------

{{Transformation}} is not a public interface; it is an @Internal class. 
Ideally, Kafka connectors should not directly manipulate {{Transformation}}.
Yet we may try to find a workaround to avoid breaking existing Kafka connectors.


was (Author: zhuzh):
{{Transformation}} is not a public interface; it is an @Internal class. 
Ideally, Kafka connectors should not directly manipulate {{Transformation}}.
Production-wise, we may take a look at whether there is a workaround to avoid 
breaking existing Kafka connectors.

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> ---------------------------------------------------------------------------
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> A Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes a very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> that method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>     

[jira] [Commented] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-04-08 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834798#comment-17834798
 ] 

Zhu Zhu commented on FLINK-32513:
---------------------------------

{{Transformation}} is not a public interface; it is an @Internal class. 
Ideally, Kafka connectors should not directly manipulate {{Transformation}}.
Production-wise, we may take a look at whether there is a workaround to avoid 
breaking existing Kafka connectors.

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> ---------------------------------------------------------------------------
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> A Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes a very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> that method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> 
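The repeating frames in the thread dump above are consistent with an un-memoized recursive traversal: each {{TwoInputTransformation}} concatenates the full predecessor lists of both inputs, so shared sub-graphs are re-expanded once per path, which grows exponentially with the number of "diamond" shapes in the graph. The sketch below is a simplified model of that mechanism and of the deduplication that fixes it; the {{Node}} class and both methods are illustrative, not Flink code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model (not Flink code) of the traversal seen in the thread dump:
// each node rebuilds its inputs' predecessor lists from scratch, so a chain of
// "diamond" shapes is visited an exponential number of times.
class Node {
    final List<Node> inputs = new ArrayList<>();
    static long visits = 0;

    // Naive traversal, analogous to the pre-fix getTransitivePredecessors().
    List<Node> transitivePredecessorsNaive() {
        visits++;
        List<Node> result = new ArrayList<>();
        result.add(this);
        for (Node input : inputs) {
            result.addAll(input.transitivePredecessorsNaive());
        }
        return result;
    }

    // Deduplicated traversal: a visited set ensures each node is expanded once.
    Set<Node> transitivePredecessorsDedup(Set<Node> seen) {
        if (seen.add(this)) {
            visits++;
            for (Node input : inputs) {
                input.transitivePredecessorsDedup(seen);
            }
        }
        return seen;
    }
}
```

For a chain of k diamonds (3k + 1 nodes in total), the naive traversal performs 4·2^k − 3 visits while the deduplicated one performs only 3k + 1, which matches the observed freeze once a job has a few dozen transformations.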

[jira] [Assigned] (FLINK-34661) TaskExecutor supports retain partitions after JM crashed.

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34661:
---

Assignee: Junrui Li

> TaskExecutor supports retain partitions after JM crashed.
> ---------------------------------------------------------
>
> Key: FLINK-34661
> URL: https://issues.apache.org/jira/browse/FLINK-34661
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Assigned] (FLINK-33983) Introduce JobEvent and JobEventStore for Batch Job Recovery

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33983:
---

Assignee: Junrui Li

> Introduce JobEvent and JobEventStore for Batch Job Recovery
> ---
>
> Key: FLINK-33983
> URL: https://issues.apache.org/jira/browse/FLINK-33983
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>






[jira] [Assigned] (FLINK-33986) Extend shuffleMaster to support batch snapshot.

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33986:
---

Assignee: Junrui Li

> Extend shuffleMaster to support batch snapshot.
> ---
>
> Key: FLINK-33986
> URL: https://issues.apache.org/jira/browse/FLINK-33986
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>
> Extend the ShuffleMaster to support batch snapshots as follows:
>  # Add method supportsBatchSnapshot to identify whether the shuffle master 
> supports taking snapshots in batch scenarios.
>  # Add methods snapshotState and restoreState to snapshot and restore the 
> shuffle master's state.
>  
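One possible shape for the two additions is sketched below. The method names ({{supportsBatchSnapshot}}, {{snapshotState}}, {{restoreState}}) come from the ticket text; the signatures, the Map-based state representation, and the in-memory implementation are assumptions for illustration only, not Flink's actual ShuffleMaster API.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the two additions described in the ticket. Everything beyond
// the three method names is an illustrative assumption.
interface BatchSnapshotSupport {
    // 1. Whether this shuffle master can take snapshots in batch scenarios.
    default boolean supportsBatchSnapshot() {
        return false;
    }

    // 2. Persist / restore the shuffle master's state.
    default void snapshotState(Map<String, byte[]> out) throws IOException {
        throw new UnsupportedOperationException("batch snapshot not supported");
    }

    default void restoreState(Map<String, byte[]> in) throws IOException {
        throw new UnsupportedOperationException("batch snapshot not supported");
    }
}

// Minimal in-memory implementation used only to exercise the sketch.
class InMemoryShuffleMaster implements BatchSnapshotSupport {
    final Map<String, byte[]> partitions = new HashMap<>();

    @Override public boolean supportsBatchSnapshot() { return true; }

    @Override public void snapshotState(Map<String, byte[]> out) {
        out.putAll(partitions);
    }

    @Override public void restoreState(Map<String, byte[]> in) {
        partitions.clear();
        partitions.putAll(in);
    }
}
```

Defaulting {{supportsBatchSnapshot}} to false would let existing shuffle master implementations stay source-compatible while opting in explicitly.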





[jira] [Closed] (FLINK-33984) Introduce SupportsBatchSnapshot for operator coordinator

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33984.
---
Fix Version/s: 1.20.0
   Resolution: Done

master:
38255652406becbfbcb7cbec557aa5ba9a1ebbb3
558ca75da2fcec875d1e04a8d75a24fd0ad42ccc

> Introduce SupportsBatchSnapshot for operator coordinator
> 
>
> Key: FLINK-33984
> URL: https://issues.apache.org/jira/browse/FLINK-33984
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>






[jira] [Assigned] (FLINK-34945) Support recover shuffle descriptor and partition metrics from tiered storage

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34945:
---

Assignee: Junrui Li

> Support recover shuffle descriptor and partition metrics from tiered storage
> 
>
> Key: FLINK-34945
> URL: https://issues.apache.org/jira/browse/FLINK-34945
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>






[jira] [Closed] (FLINK-33985) Support obtain all partitions existing in cluster through ShuffleMaster.

2024-04-02 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33985.
---
Fix Version/s: 1.20.0
   Resolution: Done

master: a44709662956b306fe686623d00358a6b076f637

> Support obtain all partitions existing in cluster through ShuffleMaster.
> 
>
> Key: FLINK-33985
> URL: https://issues.apache.org/jira/browse/FLINK-33985
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Support obtain all partitions existing in cluster through ShuffleMaster.





[jira] [Updated] (FLINK-33985) Support obtain all partitions existing in cluster through ShuffleMaster.

2024-04-02 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33985:

Component/s: Runtime / Coordination

> Support obtain all partitions existing in cluster through ShuffleMaster.
> 
>
> Key: FLINK-33985
> URL: https://issues.apache.org/jira/browse/FLINK-33985
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Support obtain all partitions existing in cluster through ShuffleMaster.





[jira] [Closed] (FLINK-33982) Introduce new config options for Batch Job Recovery

2024-04-02 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33982.
---
Fix Version/s: 1.20.0
   Resolution: Done

master: ec1311c8eb805f91b3b8d7d7cbe192e8cad05a76

> Introduce new config options for Batch Job Recovery
> ---
>
> Key: FLINK-33982
> URL: https://issues.apache.org/jira/browse/FLINK-33982
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>






[jira] [Commented] (FLINK-34565) Enhance flink kubernetes configMap to accommodate additional configuration files

2024-03-27 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831336#comment-17831336
 ] 

Zhu Zhu commented on FLINK-34565:
---------------------------------

IIUC, the requirement is to ship more user files, which may be needed by user 
code, to the pod. Supporting configuration files is just a special case of it. 
Shipping them via ConfigMap sounds a bit tricky to me.
cc [~wangyang0918]

> Enhance flink kubernetes configMap to accommodate additional configuration 
> files
> 
>
> Key: FLINK-34565
> URL: https://issues.apache.org/jira/browse/FLINK-34565
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Reporter: Surendra Singh Lilhore
>Priority: Major
>  Labels: pull-request-available
>
> The Flink Kubernetes client currently supports a fixed set of files 
> (flink-conf.yaml, logback-console.xml, log4j-console.properties) in the JM 
> and TM Pod ConfigMap. In certain scenarios, particularly in application mode, 
> additional configuration files are required for jobs to run successfully. 
> Presently, users must resort to workarounds to include dynamic configuration 
> files in the JM and TM. This proposed improvement allows users to easily add 
> extra files by configuring the 
> '{*}kubernetes.flink.configmap.additional.resources{*}' property. Users can 
> provide a semicolon-separated list of local files in the client Flink config 
> directory that should be included in the Flink ConfigMap.
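As a rough illustration of the client-side behavior described above, the sketch below parses such a semicolon-separated list. The property name comes from the ticket; the class, method, and trimming behavior are hypothetical, since the feature is only proposed and not part of Flink.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical parsing of the proposed
// 'kubernetes.flink.configmap.additional.resources' property value.
// Blank and empty entries are ignored so trailing semicolons are harmless.
class AdditionalConfigMapResources {
    static List<String> parse(String semicolonSeparated) {
        if (semicolonSeparated == null || semicolonSeparated.trim().isEmpty()) {
            return List.of();
        }
        return Arrays.stream(semicolonSeparated.split(";"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }
}
```

For example, a value like "hive-site.xml; extra.properties" would yield two file names to be added to the Flink ConfigMap alongside the fixed ones.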





[jira] [Assigned] (FLINK-33984) Introduce SupportsBatchSnapshot for operator coordinator

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33984:
---

Assignee: Junrui Li

> Introduce SupportsBatchSnapshot for operator coordinator
> 
>
> Key: FLINK-33984
> URL: https://issues.apache.org/jira/browse/FLINK-33984
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Assigned] (FLINK-33985) Support obtain all partitions existing in cluster through ShuffleMaster.

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33985:
---

Assignee: Junrui Li

> Support obtain all partitions existing in cluster through ShuffleMaster.
> 
>
> Key: FLINK-33985
> URL: https://issues.apache.org/jira/browse/FLINK-33985
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Support obtain all partitions existing in cluster through ShuffleMaster.





[jira] [Assigned] (FLINK-33982) Introduce new config options for Batch Job Recovery

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33982:
---

Assignee: Junrui Li

> Introduce new config options for Batch Job Recovery
> ---
>
> Key: FLINK-33982
> URL: https://issues.apache.org/jira/browse/FLINK-33982
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (FLINK-33892) FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33892:

Summary: FLIP-383: Support Job Recovery from JobMaster Failures for Batch 
Jobs  (was: FLIP-383: Support Job Recovery for Batch Jobs)

> FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs
> ---------------------------------------------------------------------
>
> Key: FLINK-33892
> URL: https://issues.apache.org/jira/browse/FLINK-33892
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Junrui Li
>Priority: Major
>
> This is the umbrella ticket for 
> [FLIP-383|https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+for+Batch+Jobs]





[jira] [Closed] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-03-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-32513.
---
Fix Version/s: 1.18.2
   1.20.0
   1.19.1
   Resolution: Fixed

master: 8dcb0ae9063b66af1d674b7b0b3be76b6d752692
release-1.19: 5ec4bf2f18168001b5cbb9012f331d3405228516
release-1.18: 940b3bbda5b10abe3a41d60467d33fd424c7dae6

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> ---------------------------------------------------------------------------
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> A Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes a very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> that method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> 

[jira] [Closed] (FLINK-34731) Remove SpeculativeScheduler and incorporate its features into AdaptiveBatchScheduler

2024-03-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34731.
---
Resolution: Done

master: cf0d75c4bb324825a057dc72243bb6a2046f8479

> Remove SpeculativeScheduler and incorporate its features into 
> AdaptiveBatchScheduler
> 
>
> Key: FLINK-34731
> URL: https://issues.apache.org/jira/browse/FLINK-34731
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Presently, speculative execution is exposed to users as a feature of the 
> AdaptiveBatchScheduler.
> To streamline our codebase and reduce maintenance overhead, this ticket will 
> consolidate the SpeculativeScheduler into the AdaptiveBatchScheduler, 
> eliminating the need for a separate SpeculativeScheduler class.





[jira] [Assigned] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-03-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-32513:
---

Assignee: Jeyhun Karimov

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> ---------------------------------------------------------------------------
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> A Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes a very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> that method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> 
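The repeated frames in the (truncated) dump above come from each transformation re-expanding its shared predecessors on every path, which is exponential on diamond-shaped transformation graphs. A minimal sketch of the deduplicated traversal that avoids this blow-up; the class and method names here are illustrative, not Flink's actual implementation:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative stand-in for a transformation node; not Flink code.
class Node {
    final List<Node> inputs = new ArrayList<>();

    // Exponential variant, mirroring the stack trace above: each input is
    // expanded independently, so a shared ancestor is visited once per path.
    List<Node> transitivePredecessorsNaive() {
        List<Node> result = new ArrayList<>();
        for (Node in : inputs) {
            result.addAll(in.transitivePredecessorsNaive());
            result.add(in);
        }
        return result;
    }

    // Linear variant: a visited set guarantees each node is expanded once.
    Set<Node> transitivePredecessors() {
        Set<Node> visited = new LinkedHashSet<>();
        Deque<Node> stack = new ArrayDeque<>(inputs);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (visited.add(n)) {
                stack.addAll(n.inputs);
            }
        }
        return visited;
    }
}
```

On a diamond (two branches rejoining), the naive traversal already counts the shared ancestor twice; with 30+ chained joins the duplication compounds, matching the long startup and GC pressure reported above.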

[jira] [Closed] (FLINK-34725) Dockerfiles for release publishing has incorrect config.yaml path

2024-03-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34725.
---
Resolution: Fixed

master: 3f4a80989fe7243983926f09fac2283f6fa63693
release-1.19: f53c5628e43777b4b924ec81224acc3df938800a

> Dockerfiles for release publishing has incorrect config.yaml path
> -
>
> Key: FLINK-34725
> URL: https://issues.apache.org/jira/browse/FLINK-34725
> Project: Flink
>  Issue Type: Bug
>  Components: flink-docker
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0, 1.19.1
>
>
> An issue was found when publishing docker images, with an unexpected error message:
> {code:java}
> sed: can't read /config.yaml: No such file or directory{code}
>  
> The error also appears in the flink-docker/master daily Publish SNAPSHOTs action:
> [https://github.com/apache/flink-docker/actions/runs/8210534289/job/22458150514#step:8:588]
> [https://github.com/apache/flink-docker/actions/runs/8210534289/job/22458150322#step:8:549]
>  
> This is related to changes from https://issues.apache.org/jira/browse/FLINK-34205





[jira] [Assigned] (FLINK-34731) Remove SpeculativeScheduler and incorporate its features into AdaptiveBatchScheduler

2024-03-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34731:
---

Assignee: Junrui Li

> Remove SpeculativeScheduler and incorporate its features into 
> AdaptiveBatchScheduler
> 
>
> Key: FLINK-34731
> URL: https://issues.apache.org/jira/browse/FLINK-34731
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
> Fix For: 1.20.0
>
>
> Presently, speculative execution is exposed to users as a feature of the 
> AdaptiveBatchScheduler.
> To streamline our codebase and reduce maintenance overhead, this ticket will 
> consolidate the SpeculativeScheduler into the AdaptiveBatchScheduler, 
> eliminating the need for a separate SpeculativeScheduler class.





[jira] [Closed] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-03-06 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34105.
---
Fix Version/s: 1.19.0
 Assignee: dizhou cao  (was: Yangze Guo)
   Resolution: Fixed

1.19: 837f8e584850bdcbc586a54f58e3fe953a816be8

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: dizhou cao
>Priority: Critical
> Fix For: 1.19.0
>
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed Akka timeouts happening in the 10TB TPC-DS benchmarks in 1.19. The 
> problem did not occur in 1.18.0.
> After bisecting, we found that the problem was introduced by FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 





[jira] [Commented] (FLINK-34377) Release Testing: Verify FLINK-33297 Support standard YAML for FLINK configuration

2024-02-25 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820553#comment-17820553
 ] 

Zhu Zhu commented on FLINK-34377:
-

Thanks for volunteering! [~xiasun]
I have assigned you the ticket.

> Release Testing: Verify FLINK-33297 Support standard YAML for FLINK 
> configuration
> -
>
> Key: FLINK-34377
> URL: https://issues.apache.org/jira/browse/FLINK-34377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Junrui Li
>Assignee: xingbe
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-366.
> Starting with version 1.19, Flink has officially introduced full support for 
> the standard YAML 1.2 syntax. For detailed information, please refer to the 
> Flink website: 
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#flink-configuration-file
> We may need to cover the following two types of test cases:
> Test 1: For newly created jobs, utilize a config.yaml file to set up the 
> Flink cluster. We need to verify that the job runs as expected with this new 
> configuration.
> Test 2: For existing jobs, migrate the legacy flink-conf.yaml to the new 
> config.yaml and verify that the job runs just as it did before the migration.
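As an illustration of the migration in Test 2 (the option keys shown are common Flink settings; the concrete values are made up for this example): a legacy flat flink-conf.yaml entry and its standard-YAML equivalent in config.yaml.

```yaml
# Legacy flink-conf.yaml style: flat "key.path: value" lines parsed by Flink's
# own loader (shown here as comments for comparison):
#   taskmanager.numberOfTaskSlots: 4
#   jobmanager.memory.process.size: 1600m

# Standard YAML 1.2 in config.yaml: the same options as nested mappings.
taskmanager:
  numberOfTaskSlots: 4
jobmanager:
  memory:
    process:
      size: 1600m
```

Per the linked documentation, config.yaml is parsed by a standard YAML 1.2 parser, so nested mappings like the above become available; migrated jobs should behave identically under either spelling of the same option.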





[jira] [Assigned] (FLINK-34377) Release Testing: Verify FLINK-33297 Support standard YAML for FLINK configuration

2024-02-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34377:
---

Assignee: xingbe

> Release Testing: Verify FLINK-33297 Support standard YAML for FLINK 
> configuration
> -
>
> Key: FLINK-34377
> URL: https://issues.apache.org/jira/browse/FLINK-34377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Junrui Li
>Assignee: xingbe
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-366.
> Starting with version 1.19, Flink has officially introduced full support for 
> the standard YAML 1.2 syntax. For detailed information, please refer to the 
> Flink website: 
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#flink-configuration-file
> We may need to cover the following two types of test cases:
> Test 1: For newly created jobs, utilize a config.yaml file to set up the 
> Flink cluster. We need to verify that the job runs as expected with this new 
> configuration.
> Test 2: For existing jobs, migrate the legacy flink-conf.yaml to the new 
> config.yaml and verify that the job runs just as it did before the migration.





[jira] [Closed] (FLINK-34383) Modify the comment with incorrect syntax

2024-02-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34383.
---
Resolution: Won't Fix

> Modify the comment with incorrect syntax
> 
>
> Key: FLINK-34383
> URL: https://issues.apache.org/jira/browse/FLINK-34383
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: li you
>Priority: Major
>  Labels: pull-request-available
>
> There is an error in the syntax of the comment for the class 
> PermanentBlobCache





[jira] [Assigned] (FLINK-34356) Release Testing: Verify FLINK-33768 Support dynamic source parallelism inference for batch jobs

2024-02-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34356:
---

Assignee: Junrui Li

> Release Testing: Verify FLINK-33768  Support dynamic source parallelism 
> inference for batch jobs 
> -
>
> Key: FLINK-34356
> URL: https://issues.apache.org/jira/browse/FLINK-34356
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-379.
> New Sources can implement the DynamicParallelismInference interface to enable 
> dynamic parallelism inference. For detailed information, please refer to the 
> documentation.
> We may need to cover the following two types of test cases:
> Test 1: FileSource has implemented the dynamic source parallelism inference. 
> Test the automatic parallelism inference of FileSource.
> Test 2: Test the dynamic source parallelism inference of a custom Source.
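A conceptual sketch of what the custom Source in Test 2 needs to do: propose a parallelism from its own data (e.g. its split count) and clamp it to the upper bound supplied by the scheduler. The interface below is a simplified stand-in for illustration, not Flink's actual DynamicParallelismInference signature:

```java
// Simplified stand-in for the dynamic-parallelism contract; names are illustrative.
interface ParallelismInferable {
    // upperBound is supplied by the scheduler; the source proposes a parallelism.
    int inferParallelism(int upperBound);
}

// Hypothetical source that wants one subtask per split.
class SplitCountingSource implements ParallelismInferable {
    private final int splitCount;

    SplitCountingSource(int splitCount) {
        this.splitCount = splitCount;
    }

    @Override
    public int inferParallelism(int upperBound) {
        // One subtask per split, but never more than the scheduler allows,
        // and at least 1 so an empty source still gets a subtask.
        return Math.max(1, Math.min(splitCount, upperBound));
    }
}
```

Verifying Test 2 then amounts to checking that the scheduler actually calls the inference hook and respects the returned value instead of falling back to a fixed source parallelism.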





[jira] [Commented] (FLINK-34356) Release Testing: Verify FLINK-33768 Support dynamic source parallelism inference for batch jobs

2024-02-21 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819142#comment-17819142
 ] 

Zhu Zhu commented on FLINK-34356:
-

Assigned. Thanks for volunteering! [~JunRuiLi]


> Release Testing: Verify FLINK-33768  Support dynamic source parallelism 
> inference for batch jobs 
> -
>
> Key: FLINK-34356
> URL: https://issues.apache.org/jira/browse/FLINK-34356
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-379.
> New Sources can implement the DynamicParallelismInference interface to enable 
> dynamic parallelism inference. For detailed information, please refer to the 
> documentation.
> We may need to cover the following two types of test cases:
> Test 1: FileSource has implemented the dynamic source parallelism inference. 
> Test the automatic parallelism inference of FileSource.
> Test 2: Test the dynamic source parallelism inference of a custom Source.





[jira] [Closed] (FLINK-33241) Align config option generation documentation for Flink's config documentation

2024-02-20 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33241.
---
Fix Version/s: 1.19.0
   Resolution: Fixed

Fixed via 8fe005cf1325b7477d7e1808a46bd80798165029

> Align config option generation documentation for Flink's config documentation
> -
>
> Key: FLINK-33241
> URL: https://issues.apache.org/jira/browse/FLINK-33241
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.19.0
>
>
> The configuration parameter docs generation is documented in two places in 
> different ways:
> [docs/README.md:62|https://github.com/apache/flink/blob/5c1e9f3b1449cb77276d578b344d9a69c7cf9a3c/docs/README.md#L62]
>  and 
> [flink-docs/README.md:44|https://github.com/apache/flink/blob/7bebd2d9fac517c28afc24c0c034d77cfe2b43a6/flink-docs/README.md#L44].
> We should remove the corresponding command from {{docs/README.md}} and refer 
> to {{flink-docs/README.md}} for the documentation. That way, we only have to 
> maintain a single file.





[jira] [Closed] (FLINK-34247) Document FLIP-366: Support standard YAML for FLINK configuration

2024-02-04 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34247.
---
  Assignee: Junrui Li
Resolution: Done

master/release-1.19:
5b61baadd02ccdfa702834e2e63aeb8d1d9e1250
04dd91f2b6c830b9ac0e445f72938e3d6f479edd
e9bea09510e18c6143e6e14ca17a894abfaf92bf

> Document FLIP-366: Support standard YAML for FLINK configuration
> 
>
> Key: FLINK-34247
> URL: https://issues.apache.org/jira/browse/FLINK-34247
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>






[jira] [Reopened] (FLINK-33768) FLIP-379: Support dynamic source parallelism inference for batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reopened FLINK-33768:
-

> FLIP-379: Support dynamic source parallelism inference for batch jobs
> -
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}} is used as the source parallelism, which is only a temporary 
> implementation.
> We aim to support dynamic source parallelism inference for batch jobs. More 
> details see 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].





[jira] [Updated] (FLINK-34145) File source connector support dynamic source parallelism inference in batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34145:

Parent: FLINK-33768
Issue Type: Sub-task  (was: Improvement)

> File source connector support dynamic source parallelism inference in batch 
> jobs
> 
>
> Key: FLINK-34145
> URL: https://issues.apache.org/jira/browse/FLINK-34145
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / FileSystem
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  has introduced support for dynamic source parallelism inference in batch 
> jobs, and we plan to give priority to enabling this feature for the file 
> source connector.





[jira] [Updated] (FLINK-34144) Update the documentation and configuration description about dynamic source parallelism inference

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34144:

Parent: FLINK-33768
Issue Type: Sub-task  (was: Improvement)

> Update the documentation and configuration description about dynamic source 
> parallelism inference
> -
>
> Key: FLINK-34144
> URL: https://issues.apache.org/jira/browse/FLINK-34144
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource]
>  introduces the new feature of dynamic source parallelism inference, and we 
> plan to update the documentation and configuration items accordingly.





[jira] [Updated] (FLINK-34143) Modify the effective strategy of `execution.batch.adaptive.auto-parallelism.default-source-parallelism`

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34143:

Parent: FLINK-33768
Issue Type: Sub-task  (was: Improvement)

> Modify the effective strategy of 
> `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
> ---
>
> Key: FLINK-34143
> URL: https://issues.apache.org/jira/browse/FLINK-34143
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, if users do not set the 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> configuration option, the AdaptiveBatchScheduler defaults to a parallelism 
> of 1 for source vertices. In 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource],
>  the value of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> will act as the upper bound for inferring dynamic source parallelism, and 
> continuing with the current policy is no longer appropriate.
> We plan to change the effectiveness strategy of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}}: 
> when the user does not set this config option, we will use the value of 
> {{execution.batch.adaptive.auto-parallelism.max-parallelism}} as the 
> upper bound for source parallelism inference. If 
> {{execution.batch.adaptive.auto-parallelism.max-parallelism}} is also not 
> configured, the value of {{parallelism.default}} will be used as a 
> fallback.
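The fallback order described above can be sketched as follows. The option names are from the ticket; the resolution helper itself is a hypothetical illustration, not Flink code:

```java
import java.util.OptionalInt;

// Hypothetical helper mirroring the proposed fallback order; not Flink code.
class SourceParallelismFallback {
    // Upper bound for dynamic source parallelism inference:
    // default-source-parallelism if set, else max-parallelism,
    // else parallelism.default as the final fallback.
    static int inferenceUpperBound(OptionalInt defaultSourceParallelism,
                                   OptionalInt maxParallelism,
                                   int parallelismDefault) {
        if (defaultSourceParallelism.isPresent()) {
            return defaultSourceParallelism.getAsInt();
        }
        if (maxParallelism.isPresent()) {
            return maxParallelism.getAsInt();
        }
        return parallelismDefault;
    }
}
```

For example, with default-source-parallelism unset and max-parallelism set to 128, the inference upper bound becomes 128 instead of the old fixed parallelism of 1.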





[jira] [Updated] (FLINK-33768) FLIP-379: Support dynamic source parallelism inference for batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33768:

Summary: FLIP-379: Support dynamic source parallelism inference for batch 
jobs  (was: [FLIP-379] Support dynamic source parallelism inference for batch 
jobs)

> FLIP-379: Support dynamic source parallelism inference for batch jobs
> -
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}} is used as the source parallelism, which is only a temporary 
> implementation.
> We aim to support dynamic source parallelism inference for batch jobs. More 
> details see 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].





[jira] [Updated] (FLINK-33768) [FLIP-379] Support dynamic source parallelism inference for batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33768:

Summary: [FLIP-379] Support dynamic source parallelism inference for batch 
jobs  (was: Support dynamic source parallelism inference for batch jobs)

> [FLIP-379] Support dynamic source parallelism inference for batch jobs
> --
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}} is used as the source parallelism, which is only a temporary 
> implementation.
> We aim to support dynamic source parallelism inference for batch jobs. More 
> details see 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].





[jira] [Commented] (FLINK-34132) Batch WordCount job fails when run with AdaptiveBatch scheduler

2024-02-01 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813095#comment-17813095
 ] 

Zhu Zhu commented on FLINK-34132:
-

We should also migrate the existing batch examples from DataSet to DataStream 
so that they work directly with the AdaptiveBatchScheduler. 
This work needs to be done before removing the DataSet API in Flink 2.0.
cc [~Wencong Liu]

> Batch WordCount job fails when run with AdaptiveBatch scheduler
> ---
>
> Key: FLINK-34132
> URL: https://issues.apache.org/jira/browse/FLINK-34132
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.17.1, 1.18.1
>Reporter: Prabhu Joseph
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Batch WordCount job fails when run with AdaptiveBatch scheduler.
> *Repro Steps*
> {code:java}
> flink-yarn-session -Djobmanager.scheduler=adaptive -d
>  flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input 
> s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT
> {code}
> *Error logs*
> {code:java}
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>   at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
>   at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
>   at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
>   at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
>   at 
> org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>   at 
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1067)
>   at 
> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:144)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:73)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>   ... 12 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1062)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at 
> 

[jira] [Updated] (FLINK-34132) Batch WordCount job fails when run with AdaptiveBatch scheduler

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34132:

Component/s: Documentation

> Batch WordCount job fails when run with AdaptiveBatch scheduler
> ---
>
> Key: FLINK-34132
> URL: https://issues.apache.org/jira/browse/FLINK-34132
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.17.1, 1.18.1
>Reporter: Prabhu Joseph
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Batch WordCount job fails when run with AdaptiveBatch scheduler.
> *Repro Steps*
> {code:java}
> flink-yarn-session -Djobmanager.scheduler=adaptive -d
>  flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input 
> s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT
> {code}
> *Error logs*
> {code:java}
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>   at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
>   at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
>   at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
>   at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
>   at 
> org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>   at 
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1067)
>   at 
> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:144)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:73)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>   ... 12 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1062)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at 
> java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457)
>   at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> {code}

[jira] [Closed] (FLINK-34132) Batch WordCount job fails when run with AdaptiveBatch scheduler

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34132.
---
Fix Version/s: 1.19.0
 Assignee: Junrui Li
   Resolution: Fixed

The documentation is updated via dd3e60a4b1e473c167837b7c3bc4fb90c0a1f51a

> Batch WordCount job fails when run with AdaptiveBatch scheduler
> ---
>
> Key: FLINK-34132
> URL: https://issues.apache.org/jira/browse/FLINK-34132
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.17.1, 1.18.1
>Reporter: Prabhu Joseph
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Batch WordCount job fails when run with AdaptiveBatch scheduler.
> *Repro Steps*
> {code:java}
> flink-yarn-session -Djobmanager.scheduler=adaptive -d
>  flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input 
> s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT
> {code}
> *Error logs*
> {code:java}
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>   at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
>   at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
>   at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
>   at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
>   at 
> org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>   at 
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1067)
>   at 
> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:144)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:73)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>   ... 12 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1062)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at 
> java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457)
> {code}

[jira] [Closed] (FLINK-34126) Correct the description of jobmanager.scheduler

2024-01-31 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34126.
---
Fix Version/s: 1.19.0
 Assignee: Junrui Li
   Resolution: Fixed

Fixed via 2be1ea801cf616d0d0a82729829245c205caaad8

> Correct the description of jobmanager.scheduler
> ---
>
> Key: FLINK-34126
> URL: https://issues.apache.org/jira/browse/FLINK-34126
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The config option jobmanager.scheduler currently has this description: 
> _Determines which scheduler implementation is used to schedule tasks. 
> Accepted values are:_
>  * _'Default': Default scheduler_
>  * _'Adaptive': Adaptive scheduler. More details can be found 
> [here|https://nightlies.apache.org/flink/flink-docs-master/zh/docs/deployment/elastic_scaling#adaptive-scheduler]._
>  * _'AdaptiveBatch': Adaptive batch scheduler. More details can be found 
> [here|https://nightlies.apache.org/flink/flink-docs-master/zh/docs/deployment/elastic_scaling#adaptive-batch-scheduler]._
> _Possible values:_
>  * _"Default"_
>  * _"Adaptive"_
>  * _"AdaptiveBatch"_
>  
> However, after FLIP-283 the default scheduler for batch jobs was changed to 
> the AdaptiveBatchScheduler. The current description can mislead users into 
> thinking that the 'Default' scheduler is the universal fallback for both 
> batch and streaming jobs.
> We should update this description.
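To make the corrected fallback behavior concrete, here is a minimal sketch. It is illustrative only; the function name and the selection logic are assumptions modeling the FLIP-283 behavior described above, not Flink's actual code:

```python
from typing import Optional

# Illustrative sketch only -- not Flink's implementation. It models the
# behavior described above: since FLIP-283, an unset jobmanager.scheduler
# falls back to AdaptiveBatch for batch jobs, not to the Default scheduler.
ACCEPTED_VALUES = {"Default", "Adaptive", "AdaptiveBatch"}

def effective_scheduler(configured: Optional[str], runtime_mode: str) -> str:
    """Return the scheduler that would be used for a job (assumed logic)."""
    if configured is None:
        # 'Default' is NOT the universal fallback: unset config means
        # batch jobs get the adaptive batch scheduler.
        return "AdaptiveBatch" if runtime_mode == "BATCH" else "Default"
    if configured not in ACCEPTED_VALUES:
        raise ValueError(f"Unknown jobmanager.scheduler value: {configured}")
    return configured
```

Under this assumed logic, leaving `jobmanager.scheduler` unset gives a batch job the adaptive batch scheduler rather than the default scheduler, which is exactly the nuance the current description hides.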



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34206) CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed

2024-01-31 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34206.
---
Fix Version/s: 1.19.0
   Resolution: Fixed

Fixed via b737b71859672e8020881ce2abf998735ee98abb

> CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed
> ---
>
> Key: FLINK-34206
> URL: https://issues.apache.org/jira/browse/FLINK-34206
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: xingbe
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56728&view=logs&j=a657ddbf-d986-5381-9649-342d9c92e7fb&t=dc085d4a-05c8-580e-06ab-21f5624dab16&l=8763
> {code}
> Jan 23 01:39:48 01:39:48.152 [ERROR] Tests run: 6, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 19.24 s <<< FAILURE! -- in 
> org.apache.flink.test.streaming.runtime.CacheITCase
> Jan 23 01:39:48 01:39:48.152 [ERROR] 
> org.apache.flink.test.streaming.runtime.CacheITCase.testRetryOnCorruptedClusterDataset(Path)
>  -- Time elapsed: 4.755 s <<< ERROR!
> Jan 23 01:39:48 org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.rpc.pekko.PekkoInvocationHandler.lambda$invokeRpc$1(PekkoInvocationHandler.java:268)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1287)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$1.onComplete(ScalaFutureUtils.java:47)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:310)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:307)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:234)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:231)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$DirectExecutionContext.execute(ScalaFutureUtils.java:65)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
> {code}

[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-29 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812123#comment-17812123
 ] 

Zhu Zhu commented on FLINK-34105:
-

Thanks for the updates! [~guoyangze]

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: Yangze Guo
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed Akka timeouts happening in the 10TB TPC-DS benchmarks on 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we found that the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 





[jira] [Closed] (FLINK-34257) Update Flink YAML Parser to Support YAML 1.2 Specification

2024-01-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34257.
---
Resolution: Fixed

Fixed via 081051a2cacaddf6dfe613da061f15f28a015a41

>  Update Flink YAML Parser to Support YAML 1.2 Specification
> ---
>
> Key: FLINK-34257
> URL: https://issues.apache.org/jira/browse/FLINK-34257
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Runtime / Configuration
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> FLINK-33364 and FLINK-33577 added snakeyaml and pyyaml dependencies to 
> support a standard YAML parser. However, these parsers support the YAML 1.1 
> specification rather than the YAML 1.2 specification, which is the version 
> referenced by [FLINK official 
> website|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#configuration].
>  Therefore, we need to switch to dependencies that support YAML 1.2.
> The updated dependencies are as follows:
> 1. For Java: change from snakeyaml to snakeyaml-engine
> 2. For Python: change from pyyaml to ruamel.yaml
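The user-visible difference between the two specifications is mostly scalar resolution: YAML 1.1 resolves far more plain scalars to booleans than YAML 1.2. A minimal sketch of the two rule sets (simplified for illustration; the real parsers implement full resolution schemas):

```python
# Simplified sketch of boolean scalar resolution -- not the actual parser code.
# YAML 1.1 treats y/yes/no/on/off (case-insensitively) as booleans, while the
# YAML 1.2 core schema only resolves the spellings of true/false.
YAML_11_BOOLS = {"y", "yes", "n", "no", "true", "false", "on", "off"}
YAML_12_BOOLS = {"true", "True", "TRUE", "false", "False", "FALSE"}

def is_bool_scalar(value: str, spec: str) -> bool:
    """Would a plain scalar be parsed as a boolean under the given spec?"""
    if spec == "1.1":
        return value.lower() in YAML_11_BOOLS
    if spec == "1.2":
        return value in YAML_12_BOOLS
    raise ValueError(f"unknown spec: {spec}")
```

So a config value like `on` stays a string under YAML 1.2 but silently becomes a boolean under YAML 1.1, which is why the parser should match the spec version documented on the Flink website.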





[jira] [Updated] (FLINK-34257) Update Flink YAML Parser to Support YAML 1.2 Specification

2024-01-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34257:

Affects Version/s: 1.19.0

>  Update Flink YAML Parser to Support YAML 1.2 Specification
> ---
>
> Key: FLINK-34257
> URL: https://issues.apache.org/jira/browse/FLINK-34257
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> FLINK-33364 and FLINK-33577 added snakeyaml and pyyaml dependencies to 
> support a standard YAML parser. However, these parsers support the YAML 1.1 
> specification rather than the YAML 1.2 specification, which is the version 
> referenced by [FLINK official 
> website|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#configuration].
>  Therefore, we need to switch to dependencies that support YAML 1.2.
> The updated dependencies are as follows:
> 1. For Java: change from snakeyaml to snakeyaml-engine
> 2. For Python: change from pyyaml to ruamel.yaml





[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-29 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811867#comment-17811867
 ] 

Zhu Zhu commented on FLINK-34105:
-

Hi [~lsdy], what's the status of the fix?

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: Yangze Guo
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed Akka timeouts happening in the 10TB TPC-DS benchmarks on 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we found that the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 





[jira] [Assigned] (FLINK-34257) Update Flink YAML Parser to Support YAML 1.2 Specification

2024-01-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34257:
---

Assignee: Junrui Li

>  Update Flink YAML Parser to Support YAML 1.2 Specification
> ---
>
> Key: FLINK-34257
> URL: https://issues.apache.org/jira/browse/FLINK-34257
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> FLINK-33364 and FLINK-33577 added snakeyaml and pyyaml dependencies to 
> support a standard YAML parser. However, these parsers support the YAML 1.1 
> specification, not the YAML 1.2 specification. Therefore, we need to switch 
> to dependencies that support YAML 1.2.
> The updated dependencies are as follows:
> 1. For Java: change from snakeyaml to snakeyaml-engine
> 2. For Python: change from pyyaml to ruamel.yaml





[jira] [Updated] (FLINK-34257) Update Flink YAML Parser to Support YAML 1.2 Specification

2024-01-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34257:

Component/s: Runtime / Configuration

>  Update Flink YAML Parser to Support YAML 1.2 Specification
> ---
>
> Key: FLINK-34257
> URL: https://issues.apache.org/jira/browse/FLINK-34257
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Runtime / Configuration
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> FLINK-33364 and FLINK-33577 added snakeyaml and pyyaml dependencies to 
> support a standard YAML parser. However, these parsers support the YAML 1.1 
> specification, not the YAML 1.2 specification. Therefore, we need to switch 
> to dependencies that support YAML 1.2.
> The updated dependencies are as follows:
> 1. For Java: change from snakeyaml to snakeyaml-engine
> 2. For Python: change from pyyaml to ruamel.yaml





[jira] [Closed] (FLINK-34245) CassandraSinkTest.test_cassandra_sink fails under JDK17 and JDK21 due to InaccessibleObjectException

2024-01-28 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34245.
---
Fix Version/s: 1.19.0
 Assignee: Junrui Li
   Resolution: Fixed

Fixed via ddbf87f2a7aeeeb20a8590578c6d037b239d5593

> CassandraSinkTest.test_cassandra_sink fails under JDK17 and JDK21 due to 
> InaccessibleObjectException
> 
>
> Key: FLINK-34245
> URL: https://issues.apache.org/jira/browse/FLINK-34245
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Connectors / Cassandra
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56942&view=logs&j=b53e1644-5cb4-5a3b-5d48-f523f39bcf06&t=b68c9f5c-04c9-5c75-3862-a3a27aabbce3]
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56942&view=logs&j=60960eae-6f09-579e-371e-29814bdd1adc&t=7a70c083-6a74-5348-5106-30a76c29d8fa&l=63680]
> {code:java}
> Jan 26 01:29:27 E   py4j.protocol.Py4JJavaError: An error 
> occurred while calling 
> z:org.apache.flink.python.util.PythonConfigUtil.configPythonOperator.
> Jan 26 01:29:27 E   : 
> java.lang.reflect.InaccessibleObjectException: Unable to make field final 
> java.util.Map java.util.Collections$UnmodifiableMap.m accessible: module 
> java.base does not "opens java.util" to unnamed module @17695df3
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.AccessibleObject.throwInaccessibleObjectException(AccessibleObject.java:391)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:367)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:315)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:183)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.Field.setAccessible(Field.java:177)
> Jan 26 01:29:27 E at 
> org.apache.flink.python.util.PythonConfigUtil.registerPythonBroadcastTransformationTranslator(PythonConfigUtil.java:357)
> Jan 26 01:29:27 E at 
> org.apache.flink.python.util.PythonConfigUtil.configPythonOperator(PythonConfigUtil.java:101)
> Jan 26 01:29:27 E at 
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.Method.invoke(Method.java:580)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
> Jan 26 01:29:27 E at 
> java.base/java.lang.Thread.run(Thread.java:1583) {code}





[jira] [Commented] (FLINK-34200) AutoRescalingITCase#testCheckpointRescalingInKeyedState fails

2024-01-27 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811633#comment-17811633
 ] 

Zhu Zhu commented on FLINK-34200:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57024&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba

> AutoRescalingITCase#testCheckpointRescalingInKeyedState fails
> -
>
> Key: FLINK-34200
> URL: https://issues.apache.org/jira/browse/FLINK-34200
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
> Attachments: FLINK-34200.failure.log.gz
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56601&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=8200]
> {code:java}
> Jan 19 02:31:53 02:31:53.954 [ERROR] Tests run: 32, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 1050 s <<< FAILURE! -- in 
> org.apache.flink.test.checkpointing.AutoRescalingITCase
> Jan 19 02:31:53 02:31:53.954 [ERROR] 
> org.apache.flink.test.checkpointing.AutoRescalingITCase.testCheckpointRescalingInKeyedState[backend
>  = rocksdb, buffersPerChannel = 2] -- Time elapsed: 59.10 s <<< FAILURE!
> Jan 19 02:31:53 java.lang.AssertionError: expected:<[(0,8000), (0,32000), 
> (0,48000), (0,72000), (1,78000), (1,3), (1,54000), (0,2000), (0,1), 
> (0,5), (0,66000), (0,74000), (0,82000), (1,8), (1,0), (1,16000), 
> (1,24000), (1,4), (1,56000), (1,64000), (0,12000), (0,28000), (0,52000), 
> (0,6), (0,68000), (0,76000), (1,18000), (1,26000), (1,34000), (1,42000), 
> (1,58000), (0,6000), (0,14000), (0,22000), (0,38000), (0,46000), (0,62000), 
> (0,7), (1,4000), (1,2), (1,36000), (1,44000)]> but was:<[(0,8000), 
> (0,32000), (0,48000), (0,72000), (1,78000), (1,3), (1,54000), (0,2000), 
> (0,1), (0,5), (0,66000), (0,74000), (0,82000), (1,8), (1,0), 
> (1,16000), (1,24000), (1,4), (1,56000), (1,64000), (0,12000), (0,28000), 
> (0,52000), (0,6), (0,68000), (0,76000), (0,1000), (0,25000), (0,33000), 
> (0,41000), (1,18000), (1,26000), (1,34000), (1,42000), (1,58000), (0,6000), 
> (0,14000), (0,22000), (0,38000), (0,46000), (0,62000), (0,7), (1,4000), 
> (1,2), (1,36000), (1,44000)]>
> Jan 19 02:31:53   at org.junit.Assert.fail(Assert.java:89)
> Jan 19 02:31:53   at org.junit.Assert.failNotEquals(Assert.java:835)
> Jan 19 02:31:53   at org.junit.Assert.assertEquals(Assert.java:120)
> Jan 19 02:31:53   at org.junit.Assert.assertEquals(Assert.java:146)
> Jan 19 02:31:53   at 
> org.apache.flink.test.checkpointing.AutoRescalingITCase.testCheckpointRescalingKeyedState(AutoRescalingITCase.java:296)
> Jan 19 02:31:53   at 
> org.apache.flink.test.checkpointing.AutoRescalingITCase.testCheckpointRescalingInKeyedState(AutoRescalingITCase.java:196)
> Jan 19 02:31:53   at java.lang.reflect.Method.invoke(Method.java:498)
> Jan 19 02:31:53   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) 
> {code}





[jira] [Closed] (FLINK-34145) File source connector support dynamic source parallelism inference in batch jobs

2024-01-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34145.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
11631cb59568df60d40933fb13c8433062ed9290

> File source connector support dynamic source parallelism inference in batch 
> jobs
> 
>
> Key: FLINK-34145
> URL: https://issues.apache.org/jira/browse/FLINK-34145
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / FileSystem
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  has introduced support for dynamic source parallelism inference in batch 
> jobs, and we plan to prioritize enabling this feature for the file source 
> connector.
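As a rough illustration of what dynamic inference means for a file source, the parallelism can be derived at runtime from the input volume. The formula and names below are assumptions for illustration only, not the actual FLIP-379 implementation:

```python
import math

# Illustrative sketch (assumed logic, not Flink's implementation): scale the
# source parallelism with the total input size, clamped to a configured cap.
def infer_source_parallelism(total_bytes: int,
                             bytes_per_subtask: int,
                             max_parallelism: int) -> int:
    """Return a parallelism in [1, max_parallelism] proportional to input size."""
    if total_bytes <= 0:
        return 1
    wanted = math.ceil(total_bytes / bytes_per_subtask)
    return max(1, min(wanted, max_parallelism))
```

For example, a 10 GiB input with 1 GiB assigned per subtask would infer a parallelism of 10, unless the configured maximum caps it lower.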





[jira] [Commented] (FLINK-34206) CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed

2024-01-26 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811231#comment-17811231
 ] 

Zhu Zhu commented on FLINK-34206:
-

Disabled {{CacheITCase.testRetryOnCorruptedClusterDataset}} temporarily via 
05ee359ebd564af3dd8ab31975cd479e92ba1785

> CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed
> ---
>
> Key: FLINK-34206
> URL: https://issues.apache.org/jira/browse/FLINK-34206
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: xingbe
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56728&view=logs&j=a657ddbf-d986-5381-9649-342d9c92e7fb&t=dc085d4a-05c8-580e-06ab-21f5624dab16&l=8763
> {code}
> Jan 23 01:39:48 01:39:48.152 [ERROR] Tests run: 6, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 19.24 s <<< FAILURE! -- in 
> org.apache.flink.test.streaming.runtime.CacheITCase
> Jan 23 01:39:48 01:39:48.152 [ERROR] 
> org.apache.flink.test.streaming.runtime.CacheITCase.testRetryOnCorruptedClusterDataset(Path)
>  -- Time elapsed: 4.755 s <<< ERROR!
> Jan 23 01:39:48 org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
[jira] [Assigned] (FLINK-34206) CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed

2024-01-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34206:
---

Assignee: xingbe

> CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed
> ---
>
> Key: FLINK-34206
> URL: https://issues.apache.org/jira/browse/FLINK-34206
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: xingbe
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56728=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=8763
> {code}
> Jan 23 01:39:48 01:39:48.152 [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 19.24 s <<< FAILURE! -- in org.apache.flink.test.streaming.runtime.CacheITCase
> Jan 23 01:39:48 01:39:48.152 [ERROR] org.apache.flink.test.streaming.runtime.CacheITCase.testRetryOnCorruptedClusterDataset(Path) -- Time elapsed: 4.755 s <<< ERROR!
> Jan 23 01:39:48 org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> Jan 23 01:39:48   at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> Jan 23 01:39:48   at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at org.apache.flink.runtime.rpc.pekko.PekkoInvocationHandler.lambda$invokeRpc$1(PekkoInvocationHandler.java:268)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1287)
> Jan 23 01:39:48   at org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
> Jan 23 01:39:48   at org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> Jan 23 01:39:48   at org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$1.onComplete(ScalaFutureUtils.java:47)
> Jan 23 01:39:48   at org.apache.pekko.dispatch.OnComplete.internal(Future.scala:310)
> Jan 23 01:39:48   at org.apache.pekko.dispatch.OnComplete.internal(Future.scala:307)
> Jan 23 01:39:48   at org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:234)
> Jan 23 01:39:48   at org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:231)
> Jan 23 01:39:48   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
> Jan 23 01:39:48   at org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$DirectExecutionContext.execute(ScalaFutureUtils.java:65)
> Jan 23 01:39:48   at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
> Jan 23 01:39:48   at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
> Jan 23 01:39:48   at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
> Jan 23 01:39:48   at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
> Jan 23 01:39:48   at org.apache.pekko.pattern.PromiseActorRef.$bang(AskSupport.scala:629)
> Jan 

[jira] [Closed] (FLINK-34223) Introduce a migration tool to transfer legacy config file to new config file

2024-01-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34223.
---
  Assignee: Junrui Li
Resolution: Done

master/release-1.19:
8fceb101e7e45f3fdc9357019757230bd8c16aa7

> Introduce a migration tool to transfer legacy config file to new config file
> 
>
> Key: FLINK-34223
> URL: https://issues.apache.org/jira/browse/FLINK-34223
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Scripts
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> As we transition to new configuration files that adhere to the standard YAML 
> format, users are expected to manually migrate their existing config files. 
> However, this process can be error-prone and time-consuming.
> To simplify the migration, we're introducing an automated script. This script 
> leverages BashJavaUtils to efficiently convert old flink-conf.yaml files into 
> the new config file config.yaml, thereby reducing the effort required for 
> migration.
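
The flat-to-nested conversion described above can be sketched as follows. This is a simplified, hypothetical illustration in Python, not the actual BashJavaUtils-based script: it turns legacy dotted flink-conf.yaml keys into the nested structure of the new config.yaml, ignoring comments and edge cases (e.g. a key that is both a scalar and a prefix) that a real migration tool must handle.

```python
def migrate(flat_lines):
    """Convert legacy flat flink-conf.yaml lines (dotted keys) into a
    nested dict suitable for dumping as the new config.yaml.
    Simplified sketch; the real tool is implemented via BashJavaUtils."""
    root = {}
    for line in flat_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        node = root
        parts = key.strip().split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # descend, creating nested maps
        node[parts[-1]] = value.strip()
    return root

legacy = [
    "jobmanager.rpc.address: localhost",
    "taskmanager.numberOfTaskSlots: 4",
]
print(migrate(legacy))
```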



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34232) Config file unexpectedly lacks support for env.java.home

2024-01-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34232.
---
  Assignee: Junrui Li
Resolution: Fixed

Fixed in master/release-1.19:
e623c07f4e56fdef1bd8514ccd02df347af5b122

> Config file unexpectedly lacks support for env.java.home
> 
>
> Key: FLINK-34232
> URL: https://issues.apache.org/jira/browse/FLINK-34232
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> We removed the option to set JAVA_HOME in the config file with commit 
> [24091|https://github.com/apache/flink/pull/24091] to improve how we handle 
> standard YAML with BashJavaUtils. But since setting JAVA_HOME is a publicly 
> documented feature, we need to keep it available for users. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34206) CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed

2024-01-25 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811084#comment-17811084
 ] 

Zhu Zhu commented on FLINK-34206:
-

We will take a look.

> CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed
> ---
>
> Key: FLINK-34206
> URL: https://issues.apache.org/jira/browse/FLINK-34206
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Blocker
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56728=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=8763
> {code}
> Jan 23 01:39:48 01:39:48.152 [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 19.24 s <<< FAILURE! -- in org.apache.flink.test.streaming.runtime.CacheITCase
> Jan 23 01:39:48 01:39:48.152 [ERROR] org.apache.flink.test.streaming.runtime.CacheITCase.testRetryOnCorruptedClusterDataset(Path) -- Time elapsed: 4.755 s <<< ERROR!
> Jan 23 01:39:48 org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> Jan 23 01:39:48   at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> Jan 23 01:39:48   at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at org.apache.flink.runtime.rpc.pekko.PekkoInvocationHandler.lambda$invokeRpc$1(PekkoInvocationHandler.java:268)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1287)
> Jan 23 01:39:48   at org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
> Jan 23 01:39:48   at org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> Jan 23 01:39:48   at org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$1.onComplete(ScalaFutureUtils.java:47)
> Jan 23 01:39:48   at org.apache.pekko.dispatch.OnComplete.internal(Future.scala:310)
> Jan 23 01:39:48   at org.apache.pekko.dispatch.OnComplete.internal(Future.scala:307)
> Jan 23 01:39:48   at org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:234)
> Jan 23 01:39:48   at org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:231)
> Jan 23 01:39:48   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
> Jan 23 01:39:48   at org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$DirectExecutionContext.execute(ScalaFutureUtils.java:65)
> Jan 23 01:39:48   at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
> Jan 23 01:39:48   at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
> Jan 23 01:39:48   at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
> Jan 23 01:39:48   at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
> Jan 23 01:39:48   at org.apache.pekko.pattern.PromiseActorRef.$bang(AskSupport.scala:629)
> Jan 23 01:39:48   at 

[jira] [Closed] (FLINK-33577) Make "conf.yaml" as the default Flink configuration file

2024-01-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33577.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
9721ce835f5a7f28f2ad187346e009633307097b

> Make "conf.yaml" as the default Flink configuration file
> 
>
> Key: FLINK-33577
> URL: https://issues.apache.org/jira/browse/FLINK-33577
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> This update ensures that the flink-dist package in FLINK will include the new 
> configuration file "conf.yaml" by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34144) Update the documentation and configuration description about dynamic source parallelism inference

2024-01-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34144.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
78e31f0dcc4da65d88e18560f3374a47cc0a7c9b

> Update the documentation and configuration description about dynamic source 
> parallelism inference
> -
>
> Key: FLINK-34144
> URL: https://issues.apache.org/jira/browse/FLINK-34144
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource]
>  introduces the new feature of dynamic source parallelism inference, and we 
> plan to update the documentation and configuration items accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34229) Duplicate entry in InnerClasses attribute in class file FusionStreamOperator

2024-01-25 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810856#comment-17810856
 ] 

Zhu Zhu commented on FLINK-34229:
-

[~FrankZou] the query plan can differ between q35 of `flink-tpcds-test` 
and q35 of a 10TB TPC-DS benchmark; e.g., dynamic partition pruning (DPP) or 
runtime filters may not be created in `flink-tpcds-test` because the data 
size is very small.

> Duplicate entry in InnerClasses attribute in class file FusionStreamOperator
> 
>
> Key: FLINK-34229
> URL: https://issues.apache.org/jira/browse/FLINK-34229
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.19.0
>Reporter: xingbe
>Priority: Major
> Attachments: image-2024-01-24-17-05-47-883.png
>
>
> I noticed a runtime error happening in the 10TB TPC-DS (q35.sql) benchmarks in 
> 1.19; the problem did not happen in 1.18.0. This issue may have been 
> introduced recently. !image-2024-01-24-17-05-47-883.png|width=589,height=279!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34205) Update flink-docker's Dockerfile and docker-entrypoint.sh to use BashJavaUtils for Flink configuration management

2024-01-23 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34205.
---
Fix Version/s: 1.19.0
   Resolution: Done

dev-master:
44f058287cc956a620b12b6f8ed214e44dc3db77

> Update flink-docker's Dockerfile and docker-entrypoint.sh to use 
> BashJavaUtils for Flink configuration management
> -
>
> Key: FLINK-34205
> URL: https://issues.apache.org/jira/browse/FLINK-34205
> Project: Flink
>  Issue Type: Sub-task
>  Components: flink-docker
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The flink-docker's Dockerfile and docker-entrypoint.sh currently use shell 
> scripting techniques with grep and sed for configuration reading and 
> modification. This method is not suitable for the standard YAML configuration 
> format.
> Following the changes introduced in FLINK-33721, we should update 
> flink-docker's Dockerfile and docker-entrypoint.sh to use BashJavaUtils for 
> Flink configuration reading and writing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34205) Update flink-docker's Dockerfile and docker-entrypoint.sh to use BashJavaUtils for Flink configuration management

2024-01-23 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34205:
---

Assignee: Junrui Li

> Update flink-docker's Dockerfile and docker-entrypoint.sh to use 
> BashJavaUtils for Flink configuration management
> -
>
> Key: FLINK-34205
> URL: https://issues.apache.org/jira/browse/FLINK-34205
> Project: Flink
>  Issue Type: Sub-task
>  Components: flink-docker
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> The flink-docker's Dockerfile and docker-entrypoint.sh currently use shell 
> scripting techniques with grep and sed for configuration reading and 
> modification. This method is not suitable for the standard YAML configuration 
> format.
> Following the changes introduced in FLINK-33721, we should update 
> flink-docker's Dockerfile and docker-entrypoint.sh to use BashJavaUtils for 
> Flink configuration reading and writing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34143) Modify the effective strategy of `execution.batch.adaptive.auto-parallelism.default-source-parallelism`

2024-01-23 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34143.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
97caa3c251e416640a6f54ea103912839c346f70

> Modify the effective strategy of 
> `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
> ---
>
> Key: FLINK-34143
> URL: https://issues.apache.org/jira/browse/FLINK-34143
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, if users do not set the 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> configuration option, the AdaptiveBatchScheduler defaults to a parallelism 
> of 1 for source vertices. In 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource],
>  the value of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> will act as the upper bound for inferring dynamic source parallelism, and 
> continuing with the current policy is no longer appropriate.
> We plan to change the effective strategy of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}}: 
> when the user does not set this config option, we will use the value of 
> {{execution.batch.adaptive.auto-parallelism.max-parallelism}} as the 
> upper bound for source parallelism inference. If 
> {{execution.batch.adaptive.auto-parallelism.max-parallelism}} is also not 
> configured, the value of {{parallelism.default}} will be used as a 
> fallback.
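
The fallback chain described in this issue can be sketched as follows. This is an illustrative Python sketch, not Flink code: the option keys mirror the real Flink configuration options, but the dict and the last-resort default of 1 are assumptions standing in for Flink's Configuration object.

```python
def source_parallelism_upper_bound(conf):
    """Resolve the upper bound used for dynamic source parallelism
    inference, following the fallback chain described above:
    default-source-parallelism -> max-parallelism -> parallelism.default."""
    for key in (
        "execution.batch.adaptive.auto-parallelism.default-source-parallelism",
        "execution.batch.adaptive.auto-parallelism.max-parallelism",
        "parallelism.default",
    ):
        if conf.get(key) is not None:
            return conf[key]  # first configured option wins
    return 1  # assumed last-resort default for this sketch

print(source_parallelism_upper_bound({"parallelism.default": 8}))
```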



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34144) Update the documentation and configuration description about dynamic source parallelism inference

2024-01-23 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34144:
---

Assignee: xingbe

> Update the documentation and configuration description about dynamic source 
> parallelism inference
> -
>
> Key: FLINK-34144
> URL: https://issues.apache.org/jira/browse/FLINK-34144
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource]
>  introduces the new feature of dynamic source parallelism inference, and we 
> plan to update the documentation and configuration items accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34145) File source connector support dynamic source parallelism inference in batch jobs

2024-01-23 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34145:
---

Assignee: xingbe

> File source connector support dynamic source parallelism inference in batch 
> jobs
> 
>
> Key: FLINK-34145
> URL: https://issues.apache.org/jira/browse/FLINK-34145
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / FileSystem
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  has introduced support for dynamic source parallelism inference in batch 
> jobs, and we plan to give priority to enabling this feature for the file 
> source connector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33721) Extend BashJavaUtils to Support Reading and Writing Standard YAML Files

2024-01-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33721.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
1f7622d4d23bfcb76f466469ec36585054864f04
c148b62166d8fec9e7e525a836de890c2f12973b

> Extend BashJavaUtils to Support Reading and Writing Standard YAML Files
> ---
>
> Key: FLINK-33721
> URL: https://issues.apache.org/jira/browse/FLINK-33721
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, Flink's shell scripts, such as those used for end-to-end (e2e) 
> testing and Docker image building, require the ability to read from and 
> modify Flink's configuration files. With the introduction of standard YAML 
> files as the configuration format, the existing shell scripts are not 
> equipped to correctly handle read and modify operations on these files.
> To accommodate this requirement and enhance our script capabilities, we 
> propose an extension to the BashJavaUtils functionality. This extension will 
> enable BashJavaUtils to support the reading and modifying of standard YAML 
> files, ensuring that our shell scripts can seamlessly interact with the new 
> configuration format.
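
The read/modify operations this extension provides can be illustrated with a minimal Python sketch. This is not the BashJavaUtils API, only a hypothetical model of the two operations: looking up and setting a nested value addressed by a flat, dot-separated key in a standard-YAML-style configuration.

```python
def get_key(cfg, flat_key):
    """Read a nested value addressed by a dotted key; None if absent."""
    node = cfg
    for part in flat_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def set_key(cfg, flat_key, value):
    """Set a nested value addressed by a dotted key, creating maps as needed."""
    parts = flat_key.split(".")
    node = cfg
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value

cfg = {"taskmanager": {"memory": {"process": {"size": "1728m"}}}}
set_key(cfg, "taskmanager.numberOfTaskSlots", 2)
print(get_key(cfg, "taskmanager.memory.process.size"))
```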



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33768) Support dynamic source parallelism inference for batch jobs

2024-01-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33768.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
c3c836216eaaaf24c1add3b490c8f425fda01d7c

> Support dynamic source parallelism inference for batch jobs
> ---
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}} is used as the source parallelism, which is only a temporary 
> solution.
> We aim to support dynamic source parallelism inference for batch jobs. More 
> details see 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].
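
The core idea of the feature can be sketched as follows. This is an illustrative Python sketch under assumed inputs, not the FLIP-379 interface: it caps a size-based parallelism estimate at a configured upper bound; the function name, the byte estimates, and the per-subtask data target are all hypothetical.

```python
import math

def infer_source_parallelism(estimated_input_bytes, bytes_per_subtask, upper_bound):
    """Illustrative only: derive a source parallelism from the estimated
    input size, then cap it at the configured upper bound (see FLIP-379
    for the actual inference protocol)."""
    inferred = max(1, math.ceil(estimated_input_bytes / bytes_per_subtask))
    return min(inferred, upper_bound)

# e.g. 10 GiB of input at ~1 GiB per subtask, capped at 32
print(infer_source_parallelism(10 * 1024**3, 1024**3, 32))
```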



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-18 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808457#comment-17808457
 ] 

Zhu Zhu commented on FLINK-34105:
-

[~lsdy] Sounds good to me. Feel free to open a PR for it.

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: Yangze Guo
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed that Akka timeouts happen in the 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34144) Update the documentation and configuration description about dynamic source parallelism inference

2024-01-18 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808424#comment-17808424
 ] 

Zhu Zhu commented on FLINK-34144:
-

[~xiasun] Assigned. Feel free to open a PR for it.

> Update the documentation and configuration description about dynamic source 
> parallelism inference
> -
>
> Key: FLINK-34144
> URL: https://issues.apache.org/jira/browse/FLINK-34144
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: xingbe
>Priority: Major
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource]
>  introduces the new feature of dynamic source parallelism inference, and we 
> plan to update the documentation and configuration items accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34143) Modify the effective strategy of `execution.batch.adaptive.auto-parallelism.default-source-parallelism`

2024-01-18 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808423#comment-17808423
 ] 

Zhu Zhu commented on FLINK-34143:
-

[~xiasun]Assigned. Feel free to open a PR for it.

> Modify the effective strategy of 
> `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
> ---
>
> Key: FLINK-34143
> URL: https://issues.apache.org/jira/browse/FLINK-34143
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>
> Currently, if users do not set the 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> configuration option, the AdaptiveBatchScheduler defaults to a parallelism 
> of 1 for source vertices. In 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource],
>  the value of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> will act as the upper bound for inferring dynamic source parallelism, and 
> continuing with the current policy is no longer appropriate.
> We plan to change the effective strategy of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}}: 
> when the user does not set this config option, we will use the value of 
> {{execution.batch.adaptive.auto-parallelism.max-parallelism}} as the 
> upper bound for source parallelism inference. If 
> {{execution.batch.adaptive.auto-parallelism.max-parallelism}} is also not 
> configured, the value of {{parallelism.default}} will be used as a 
> fallback.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34143) Modify the effective strategy of `execution.batch.adaptive.auto-parallelism.default-source-parallelism`

2024-01-18 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808423#comment-17808423
 ] 

Zhu Zhu edited comment on FLINK-34143 at 1/19/24 1:54 AM:
--

[~xiasun] Assigned. Feel free to open a PR for it.


was (Author: zhuzh):
[~xiasun]Assigned. Feel free to open a PR for it.

> Modify the effective strategy of 
> `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
> ---
>
> Key: FLINK-34143
> URL: https://issues.apache.org/jira/browse/FLINK-34143
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>
> Currently, if users do not set the 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> configuration option, the AdaptiveBatchScheduler defaults to a parallelism 
> of 1 for source vertices. In 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource],
>  the value of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> will act as the upper bound for inferring dynamic source parallelism, and 
> continuing with the current policy is no longer appropriate.
> We plan to change the effectiveness strategy of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}}: 
> when the user does not set this config option, we will use the value of 
> {{execution.batch.adaptive.auto-parallelism.max-parallelism}} as the 
> upper bound for source parallelism inference. If 
> {{execution.batch.adaptive.auto-parallelism.max-parallelism}} is also not 
> configured, the value of {{parallelism.default}} will be used as a 
> fallback.
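The fallback chain described above can be sketched as a small resolver. This is a hedged illustration only: the method name `resolveUpperBound` and its signature are hypothetical, not Flink's actual API.

```java
import java.util.OptionalInt;

// Hypothetical helper (not Flink code) illustrating the proposed fallback chain
// for the source-parallelism inference upper bound:
// default-source-parallelism -> max-parallelism -> parallelism.default.
public class SourceParallelismFallback {
    static int resolveUpperBound(OptionalInt defaultSourceParallelism,
                                 OptionalInt maxParallelism,
                                 int parallelismDefault) {
        if (defaultSourceParallelism.isPresent()) {
            return defaultSourceParallelism.getAsInt();
        }
        if (maxParallelism.isPresent()) {
            return maxParallelism.getAsInt();
        }
        return parallelismDefault;
    }

    public static void main(String[] args) {
        // Neither option set: fall back to parallelism.default.
        System.out.println(resolveUpperBound(OptionalInt.empty(), OptionalInt.empty(), 8));  // 8
        // Only max-parallelism set: it becomes the inference upper bound.
        System.out.println(resolveUpperBound(OptionalInt.empty(), OptionalInt.of(128), 8)); // 128
    }
}
```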





[jira] [Assigned] (FLINK-34143) Modify the effective strategy of `execution.batch.adaptive.auto-parallelism.default-source-parallelism`

2024-01-18 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34143:
---

Assignee: xingbe

> Modify the effective strategy of 
> `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
> ---
>
> Key: FLINK-34143
> URL: https://issues.apache.org/jira/browse/FLINK-34143
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>
> Currently, if users do not set the 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> configuration option, the AdaptiveBatchScheduler defaults to a parallelism 
> of 1 for source vertices. In 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource],
>  the value of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> will act as the upper bound for inferring dynamic source parallelism, and 
> continuing with the current policy is no longer appropriate.
> We plan to change the effectiveness strategy of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}}: 
> when the user does not set this config option, we will use the value of 
> {{execution.batch.adaptive.auto-parallelism.max-parallelism}} as the 
> upper bound for source parallelism inference. If 
> {{execution.batch.adaptive.auto-parallelism.max-parallelism}} is also not 
> configured, the value of {{parallelism.default}} will be used as a 
> fallback.





[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-16 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807554#comment-17807554
 ] 

Zhu Zhu commented on FLINK-34105:
-

I think it is a regression because existing jobs can become unstable.
Could we use a thread pool to parallelize the serialization before task 
submission, so that it is not counted as part of the Pekko RPC processing?
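The idea above can be sketched with a plain `ExecutorService`. This is a simplified illustration under stated assumptions, not Flink's actual submission code: the payloads and the "RPC send" stand-in are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: serialize task payloads on a dedicated thread pool before submission,
// so the serialization time is not spent on the single RPC dispatcher thread.
public class ParallelSerialization {
    static byte[] serialize(Serializable o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> payloads = Arrays.asList("task-1", "task-2", "task-3");

        // Serialize all payloads concurrently, off the "RPC" thread.
        List<Future<byte[]>> futures = new ArrayList<>();
        for (String p : payloads) {
            futures.add(pool.submit(() -> serialize(p)));
        }

        int submitted = 0;
        for (Future<byte[]> f : futures) {
            byte[] bytes = f.get();            // pre-serialized payload
            if (bytes.length > 0) submitted++; // stand-in for the actual RPC send
        }
        pool.shutdown();
        System.out.println(submitted);
    }
}
```

The RPC thread then only hands out already-serialized byte arrays, keeping each ask well under the timeout.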

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: Yangze Guo
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 





[jira] [Closed] (FLINK-34116) GlobalConfigurationTest.testInvalidStandardYamlFile fails

2024-01-16 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34116.
---
Fix Version/s: 1.19.0
 Assignee: Junrui Li
   Resolution: Fixed

Fixed via
cdf314d30b59994283e0bbf70f350618de02118c

> GlobalConfigurationTest.testInvalidStandardYamlFile fails
> -
>
> Key: FLINK-34116
> URL: https://issues.apache.org/jira/browse/FLINK-34116
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> All build failures in the same build:
>  * 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56416=logs=f0ac5c25-1168-55a5-07ff-0e88223afed9=50bf7a25-bdc4-5e56-5478-c7b4511dde53=6274]
>  * 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56416=logs=675bf62c-8558-587e-2555-dcad13acefb5=5878eed3-cc1e-5b12-1ed0-9e7139ce0992=6256]
>  * 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56416=logs=d06b80b4-9e88-5d40-12a2-18072cf60528=609ecd5a-3f6e-5d0c-2239-2096b155a4d0=6176]
> {code:java}
> Jan 16 01:38:10 01:38:10.780 [ERROR] Failures: 
> Jan 16 01:38:10 01:38:10.781 [ERROR]   
> GlobalConfigurationTest.testInvalidStandardYamlFile:200 
> Jan 16 01:38:10 Multiple Failures (1 failure)
> Jan 16 01:38:10 -- failure 1 --
> Jan 16 01:38:10 Expecting actual:
> Jan 16 01:38:10   "java.lang.RuntimeException: Error parsing YAML 
> configuration.
> Jan 16 01:38:10   at 
> org.apache.flink.configuration.GlobalConfiguration.loadYAMLResource(GlobalConfiguration.java:351)
> Jan 16 01:38:10   at 
> org.apache.flink.configuration.GlobalConfiguration.loadConfiguration(GlobalConfiguration.java:162)
> Jan 16 01:38:10   at 
> org.apache.flink.configuration.GlobalConfiguration.loadConfiguration(GlobalConfiguration.java:115)
> Jan 16 01:38:10   at 
> org.apache.flink.configuration.GlobalConfigurationTest.lambda$testInvalidStandardYamlFile$3(GlobalConfigurationTest.java:198)
> Jan 16 01:38:10   at 
> org.assertj.core.api.ThrowableAssert.catchThrowable(ThrowableAssert.java:63)
> Jan 16 01:38:10   at 
> org.assertj.core.api.AssertionsForClassTypes.catchThrowable(AssertionsForClassTypes.java:892)
> Jan 16 01:38:10   at 
> org.assertj.core.api.Assertions.catchThrowable(Assertions.java:1366)
> Jan 16 01:38:10   at 
> org.assertj.core.api.Assertions.assertThatThrownBy(Assertions.java:1210)
> Jan 16 01:38:10   at 
> org.apache.flink.configuration.GlobalConfigurationTest.testInvalidStandardYamlFile(GlobalConfigurationTest.java:198)
> Jan 16 01:38:10   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jan 16 01:38:10   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jan 16 01:38:10   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jan 16 01:38:10   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> [...] {code}





[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-16 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807505#comment-17807505
 ] 

Zhu Zhu commented on FLINK-34105:
-

We tried increasing {{pekko.ask.timeout}} to {{1min}} (from the default 
{{10s}}), and the problem did not happen again.
So I guess it's not related to the framesize.
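For reference, the workaround mentioned above is a one-line change in the Flink configuration file (shown here as a sketch; the value is the one quoted in the comment):

```yaml
# Raise the RPC ask timeout from the default 10 s to 1 min.
pekko.ask.timeout: 1min
```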

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: Yangze Guo
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 





[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-16 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807203#comment-17807203
 ] 

Zhu Zhu commented on FLINK-34105:
-

[~guoyangze] pekko.framesize is not configured, so the default value should 
have been used.
Here is the full configuration:
{code}
parallelism.default: 1500
slotmanager.number-of-slots.max: 1500
taskmanager.numberOfTaskSlots: 10
jobmanager.memory.process.size: 24000m
taskmanager.memory.process.size: 24000m
resourcemanager.taskmanager-timeout: 90
taskmanager.memory.network.fraction: 0.2
cluster.evenly-spread-out-slots: true

table.optimizer.join-reorder-enabled: true
table.optimizer.join.broadcast-threshold: 10485760
table.exec.operator-fusion-codegen.enabled: true
{code}

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: Yangze Guo
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 





[jira] [Closed] (FLINK-33980) Reorganize job configuration

2024-01-15 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33980.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
06b37089f0c1cdf70ca21970a40d15c3eaba07ed
290b633c4265540d481ac563454c7f4a3e706b9d
65b8b3baad6a27e6253a93701508ba24dc8fcfe0
d02ef1cebf302c56a0d9d51664d2c7fb6f5be932
eb8af0c589ce46b091f403e848c7dc84b7e3ee8b

> Reorganize job configuration
> 
>
> Key: FLINK-33980
> URL: https://issues.apache.org/jira/browse/FLINK-33980
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, job configuration in FLINK is spread out across different 
> components, including StreamExecutionEnvironment, CheckpointConfig, and 
> ExecutionConfig. This distribution leads to inconsistencies among the 
> configurations stored within these components. Furthermore, the methods used 
> to configure these components vary; some rely on complex Java objects, while 
> others use ConfigOption, which is a key-value configuration approach. This 
> variation complicates the effective management of job configurations. 
> Additionally, passing complex Java objects (e.g., StateBackend and 
> CheckpointStorage) between the environment, StreamGraph, and JobGraph adds 
> complexity to development.
> With the completion of FLIP-381, it is now time to standardize and unify job 
> configuration in FLINK. The goals of this JIRA are as follows:
>  # Migrate configuration from non-ConfigOption objects to use ConfigOption.
>  # Adopt a single Configuration object to house all configurations.
>  # Create complex Java objects, such as RestartBackoffTimeStrategyFactory, 
> CheckpointStorage, and StateBackend, directly from the configuration on the 
> JM side.
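The ConfigOption-based approach described above can be illustrated with a minimal stand-in. The classes below are hypothetical simplifications, not Flink's actual `ConfigOption`/`Configuration` API; they only show the pattern of typed options backed by one key-value store.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for the key-value ConfigOption pattern this issue proposes:
// every setting is declared as a typed option with a default, and all values
// live in a single Configuration map instead of scattered Java objects.
public class ConfigOptionSketch {
    static final class ConfigOption<T> {
        final String key;
        final T defaultValue;
        ConfigOption(String key, T defaultValue) {
            this.key = key;
            this.defaultValue = defaultValue;
        }
    }

    static final class Configuration {
        private final Map<String, Object> values = new HashMap<>();
        <T> void set(ConfigOption<T> option, T value) { values.put(option.key, value); }
        @SuppressWarnings("unchecked")
        <T> T get(ConfigOption<T> option) {
            return (T) values.getOrDefault(option.key, option.defaultValue);
        }
    }

    // A typed option with a default, instead of a field on e.g. ExecutionConfig.
    static final ConfigOption<Integer> RESTART_ATTEMPTS =
            new ConfigOption<>("restart-strategy.fixed-delay.attempts", 3);

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        System.out.println(conf.get(RESTART_ATTEMPTS)); // default
        conf.set(RESTART_ATTEMPTS, 10);
        System.out.println(conf.get(RESTART_ATTEMPTS)); // overridden
    }
}
```

With this shape, complex objects like a restart strategy factory can be built on the JM side purely from the recovered `Configuration`.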





[jira] [Updated] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-15 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34105:

Priority: Critical  (was: Major)

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 





[jira] [Comment Edited] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-15 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807052#comment-17807052
 ] 

Zhu Zhu edited comment on FLINK-34105 at 1/16/24 6:06 AM:
--

[~lsdy] [~guoyangze] would you help to take a look?

This indicates that existing large Flink batch jobs may become unstable in 
1.19.


was (Author: zhuzh):
[~lsdy] [~guoyangze] would you help to take a look?

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Priority: Major
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 





[jira] [Updated] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-15 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34105:

Description: 
We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The problem 
did not happen in 1.18.0.
After bisecting, we find the problem was introduced in FLINK-33532.

 !image-2024-01-16-13-59-45-556.png|width=250! 

  was:
We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The problem 
did not happen in 1.18.0.
After bisecting, we find the problem was introduced in FLINK-33532.

 !image-2024-01-16-13-59-45-556.png! 


> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Priority: Major
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=250! 





[jira] [Updated] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-15 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34105:

Description: 
We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The problem 
did not happen in 1.18.0.
After bisecting, we find the problem was introduced in FLINK-33532.

 !image-2024-01-16-13-59-45-556.png|width=800! 

  was:
We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The problem 
did not happen in 1.18.0.
After bisecting, we find the problem was introduced in FLINK-33532.

 !image-2024-01-16-13-59-45-556.png|width=600! 


> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Priority: Major
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 





[jira] [Updated] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-15 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34105:

Description: 
We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The problem 
did not happen in 1.18.0.
After bisecting, we find the problem was introduced in FLINK-33532.

 !image-2024-01-16-13-59-45-556.png|width=600! 

  was:
We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The problem 
did not happen in 1.18.0.
After bisecting, we find the problem was introduced in FLINK-33532.

 !image-2024-01-16-13-59-45-556.png|width=250! 


> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Priority: Major
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=600! 





[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-15 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807052#comment-17807052
 ] 

Zhu Zhu commented on FLINK-34105:
-

[~lsdy] [~guoyangze] would you help to take a look?

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Priority: Major
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png! 





[jira] [Created] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-15 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-34105:
---

 Summary: Akka timeout happens in TPC-DS benchmarks
 Key: FLINK-34105
 URL: https://issues.apache.org/jira/browse/FLINK-34105
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.19.0
Reporter: Zhu Zhu
 Attachments: image-2024-01-16-13-59-45-556.png

We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The problem 
did not happen in 1.18.0.
After bisecting, we find the problem was introduced in FLINK-33532.

 !image-2024-01-16-13-59-45-556.png! 





[jira] [Closed] (FLINK-33576) Introduce new Flink conf file "config.yaml" supporting standard YAML syntax

2024-01-15 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33576.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19: 
ddc7171dd83ade50e6b551d3bc2d31acb5ff6451
b6b2a0f7abebc92f849bfa3ef6f1a71f25b9672d
0153b903d4df2f49c95ee2c53737aa5e492439be

> Introduce new Flink conf file "config.yaml" supporting standard YAML syntax
> ---
>
> Key: FLINK-33576
> URL: https://issues.apache.org/jira/browse/FLINK-33576
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Introduce new Flink conf file config.yaml, and this file will be parsed by 
> standard YAML syntax.





[jira] [Closed] (FLINK-33968) Compute the number of subpartitions when initializing execution job vertices

2024-01-04 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33968.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/1.19: b25dfaee80727d6662a5fd445fe51cc139a8b9eb

> Compute the number of subpartitions when initializing execution job vertices
> ---
>
> Key: FLINK-33968
> URL: https://issues.apache.org/jira/browse/FLINK-33968
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Lijie Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, when using dynamic graphs, the subpartition-num of a task is 
> lazily calculated until the task deployment moment. This may lead to some 
> uncertainties in job recovery scenarios:
> Before the JM crashes, when deploying upstream tasks, the parallelism of the 
> downstream vertex may be unknown, so the subpartition-num will be the max 
> parallelism of the downstream job vertex. However, after the JM restarts, 
> when deploying upstream tasks, the parallelism of the downstream job vertex 
> may be known (it was calculated before the JM crashed and recovered after 
> the JM restarted), so the subpartition-num will be the actual parallelism of 
> the downstream job vertex. This difference in the calculated subpartition-num 
> means that partitions generated before the JM crashed cannot be reused after 
> the JM restarts.
> We will solve this problem by advancing the calculation of subpartition-num 
> to the moment of initializing the execution job vertex (in the ctor of 
> IntermediateResultPartition).
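The deployment-time rule described above can be sketched as follows. The signature below is hypothetical (not Flink's actual constructor); it only shows why computing the value once at initialization removes the failover ambiguity.

```java
// Sketch of the proposed fix: decide the subpartition count once, when the
// execution job vertex is initialized, so the value is derived from recovered
// vertex state and stays stable across JM failover instead of depending on
// what happens to be known at deployment time.
public class SubpartitionNumSketch {
    static int computeNumSubpartitions(boolean downstreamParallelismDecided,
                                       int downstreamParallelism,
                                       int downstreamMaxParallelism) {
        // Known downstream parallelism -> use it; otherwise fall back to the
        // max parallelism of the downstream job vertex.
        return downstreamParallelismDecided ? downstreamParallelism : downstreamMaxParallelism;
    }

    public static void main(String[] args) {
        // Before the fix, these two cases could occur on different sides of a
        // JM failover for the same partition and disagree; computing once at
        // initialization pins a single value.
        System.out.println(computeNumSubpartitions(false, -1, 128));  // parallelism unknown
        System.out.println(computeNumSubpartitions(true, 100, 128));  // parallelism decided
    }
}
```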





[jira] [Assigned] (FLINK-33980) Reorganize job configuration

2024-01-03 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33980:
---

Assignee: Junrui Li

> Reorganize job configuration
> 
>
> Key: FLINK-33980
> URL: https://issues.apache.org/jira/browse/FLINK-33980
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>
> Currently, job configuration in FLINK is spread out across different 
> components, including StreamExecutionEnvironment, CheckpointConfig, and 
> ExecutionConfig. This distribution leads to inconsistencies among the 
> configurations stored within these components. Furthermore, the methods used 
> to configure these components vary; some rely on complex Java objects, while 
> others use ConfigOption, which is a key-value configuration approach. This 
> variation complicates the effective management of job configurations. 
> Additionally, passing complex Java objects (e.g., StateBackend and 
> CheckpointStorage) between the environment, StreamGraph, and JobGraph adds 
> complexity to development.
> With the completion of FLIP-381, it is now time to standardize and unify job 
> configuration in FLINK. The goals of this JIRA are as follows:
>  # Migrate configuration from non-ConfigOption objects to use ConfigOption.
>  # Adopt a single Configuration object to house all configurations.
>  # Create complex Java objects, such as RestartBackoffTimeStrategyFactory, 
> CheckpointStorage, and StateBackend, directly from the configuration on the 
> JM side.





[jira] [Comment Edited] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2023-12-21 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799647#comment-17799647
 ] 

Zhu Zhu edited comment on FLINK-32513 at 12/22/23 2:46 AM:
---

The problem happens because {{getTransitivePredecessors()}} of 
{{TwoInputTransformation}} is not properly implemented. If the two inputs share 
the same upstream node, that node will be visited twice. This results in 2^N 
(N = the number of TwoInputTransformations) time cost to iterate over the 
predecessors, and a matching space cost to store them.
A possible solution is to add a cache of predecessors to each 
Transformation and use a LinkedHashSet to deduplicate the predecessors.
Note that a few other transformations can lead to the same problem too, e.g. 
UnionTransformation and AbstractMultipleInputTransformation.


was (Author: zhuzh):
The problem happens because the {{getTransitivePredecessors()}} of 
{{TwoInputTransformation}} is not properly implemented, which results in a 
2^N(N = number of TwoInputTransformation) time cost to iterate and space cost 
to store the predecessors.
A possible solution can be adding a cache of predecessors for each 
Transformation and using LinkedHashSet to deduplicate the predecessors.
Note that a few other transformations can lead to the same problem too, e.g. 
UnionTransformation, AbstractMultipleInputTransformation.

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> -
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Priority: Critical
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> A Flink job executed in BATCH mode with a significant number of 
> transformations (more than 30 in my case) takes a very long time to start 
> due to the method StreamGraphGenerator.existsUnboundedSource(). Also, during 
> the execution of the method, a lot of memory is consumed, which causes the 
> GC to fire frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> 

[jira] [Commented] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2023-12-21 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799647#comment-17799647
 ] 

Zhu Zhu commented on FLINK-32513:
-

The problem happens because {{getTransitivePredecessors()}} of 
{{TwoInputTransformation}} is not properly implemented, which results in 2^N 
(N = the number of TwoInputTransformations) time cost to iterate over the 
predecessors, and a matching space cost to store them.
A possible solution is to add a cache of predecessors to each 
Transformation and use a LinkedHashSet to deduplicate the predecessors.
Note that a few other transformations can lead to the same problem too, e.g. 
UnionTransformation and AbstractMultipleInputTransformation.
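The cache-plus-dedup idea can be sketched on a toy graph. The `Node` class below is a simplified stand-in for Flink's `Transformation` hierarchy (the names and structure are illustrative, not the real API): memoizing the result and deduplicating with a `LinkedHashSet` means a shared upstream node is traversed once, turning the 2^N walk into roughly linear work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class TransitivePredecessors {
    static class Node {
        final List<Node> inputs = new ArrayList<>();
        private List<Node> cached; // memoized transitive predecessors

        Node(Node... ins) { inputs.addAll(Arrays.asList(ins)); }

        List<Node> getTransitivePredecessors() {
            if (cached == null) {
                // LinkedHashSet deduplicates while keeping insertion order.
                LinkedHashSet<Node> set = new LinkedHashSet<>();
                set.add(this);
                for (Node in : inputs) {
                    set.addAll(in.getTransitivePredecessors());
                }
                cached = new ArrayList<>(set);
            }
            return cached; // cache hit: no re-traversal of shared subgraphs
        }
    }

    public static void main(String[] args) {
        // Chain of "two-input" nodes where both inputs are the same upstream
        // node: without caching this recursion is 2^N calls; with it, linear.
        Node current = new Node();
        for (int i = 0; i < 40; i++) {
            current = new Node(current, current); // shared upstream input
        }
        System.out.println(current.getTransitivePredecessors().size()); // 41 distinct nodes
    }
}
```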

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> -
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Priority: Critical
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> A Flink job executed in BATCH mode with a significant number of 
> transformations (more than 30 in my case) takes a very long time to start 
> due to the method StreamGraphGenerator.existsUnboundedSource(). Also, during 
> the execution of the method, a lot of memory is consumed, which causes the 
> GC to fire frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> 
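The hot path above (each getTransitivePredecessors call recursing into every input) revisits shared subgraphs once per path. A minimal, hypothetical sketch follows (plain classes, not Flink's actual Transformation API) of the blowup and of the usual fix, deduplicating with a visited set:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch (hypothetical classes, not Flink code) of why a naive
// getTransitivePredecessors() explodes on DAGs with shared inputs.
class Node {
    final List<Node> inputs = new ArrayList<>();
    static long calls = 0;

    // Naive: every input re-collects its own predecessors, so a shared
    // subgraph is traversed once per path leading to it (exponential).
    List<Node> naivePredecessors() {
        calls++;
        List<Node> result = new ArrayList<>();
        result.add(this);
        for (Node in : inputs) {
            result.addAll(in.naivePredecessors());
        }
        return result;
    }

    // Deduplicated: a shared visited set expands each node at most once (linear).
    void collect(Set<Node> visited) {
        calls++;
        if (!visited.add(this)) {
            return;
        }
        for (Node in : inputs) {
            in.collect(visited);
        }
    }
}

public class PredecessorBlowupDemo {
    // Chain of "diamond" nodes: node i has the two previous nodes as inputs.
    static Node buildDiamondChain(int depth) {
        Node a = new Node();
        Node b = new Node();
        Node top = a;
        for (int i = 0; i < depth; i++) {
            top = new Node();
            top.inputs.add(a);
            top.inputs.add(b);
            b = a;
            a = top;
        }
        return top;
    }

    public static void main(String[] args) {
        Node top = buildDiamondChain(25);

        Node.calls = 0;
        top.naivePredecessors();
        long naive = Node.calls;

        Node.calls = 0;
        top.collect(new HashSet<>());
        long dedup = Node.calls;

        // The gap between the two counts is what froze jobs with 30+ transformations.
        if (dedup >= naive) throw new AssertionError();
        System.out.println("naive=" + naive + " dedup=" + dedup);
    }
}
```

On a diamond-shaped DAG the naive call count grows roughly like a Fibonacci sequence in the graph depth, while the deduplicated traversal stays linear in the number of edges.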

[jira] [Assigned] (FLINK-33712) FLIP-391: Deprecate RuntimeContext#getExecutionConfig

2023-12-11 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33712:
---

Assignee: Junrui Li

> FLIP-391: Deprecate RuntimeContext#getExecutionConfig
> -
>
> Key: FLINK-33712
> URL: https://issues.apache.org/jira/browse/FLINK-33712
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>
> Deprecate RuntimeContext#getExecutionConfig and introduce alternative 
> getter methods that allow users to access specific information without 
> exposing unnecessary runtime details. For more details, see: 
> [FLIP-391|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278465937]
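The idea of the FLIP, replacing one broad getter with narrow read-only accessors, can be sketched as follows. All names here are hypothetical illustrations, not the actual FLIP-391 API:

```java
// Hypothetical sketch of the FLIP-391 pattern: instead of handing user functions
// the whole mutable execution config, expose only the specific facts they need.
interface OldStyleContext {
    @Deprecated
    ExecutionConfigLike getExecutionConfig(); // broad: leaks runtime internals
}

interface NewStyleContext {
    boolean isObjectReuseEnabled(); // narrow, read-only accessors
    int getMaxParallelism();
}

class ExecutionConfigLike {
    boolean objectReuse;
    int maxParallelism;
}

class Context implements OldStyleContext, NewStyleContext {
    private final ExecutionConfigLike config;

    Context(ExecutionConfigLike config) { this.config = config; }

    @Override @Deprecated
    public ExecutionConfigLike getExecutionConfig() { return config; }

    @Override
    public boolean isObjectReuseEnabled() { return config.objectReuse; }

    @Override
    public int getMaxParallelism() { return config.maxParallelism; }
}

public class NarrowGetterDemo {
    public static void main(String[] args) {
        ExecutionConfigLike cfg = new ExecutionConfigLike();
        cfg.objectReuse = true;
        cfg.maxParallelism = 128;
        // Callers depend only on the narrow interface, not on the config object.
        NewStyleContext ctx = new Context(cfg);
        if (!ctx.isObjectReuseEnabled() || ctx.getMaxParallelism() != 128) {
            throw new AssertionError();
        }
    }
}
```

The narrow interface can later evolve independently of the deprecated config object, which is the point of the deprecation.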



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33713) Deprecate RuntimeContext#getExecutionConfig

2023-12-11 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33713.
---
Fix Version/s: 1.19.0
   Resolution: Done

Done via 3aa70df4e7da93ed32c26cfabdaeb606233419b1

> Deprecate RuntimeContext#getExecutionConfig
> ---
>
> Key: FLINK-33713
> URL: https://issues.apache.org/jira/browse/FLINK-33713
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Deprecate RuntimeContext#getExecutionConfig





[jira] [Assigned] (FLINK-33768) Support dynamic source parallelism inference for batch jobs

2023-12-06 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33768:
---

Assignee: xingbe

> Support dynamic source parallelism inference for batch jobs
> ---
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> `{*}execution.batch.adaptive.auto-parallelism.default-source-parallelism{*}` 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}} is used as the source parallelism, which is only a temporary 
> implementation.
> We aim to support dynamic source parallelism inference for batch jobs. For 
> more details, see 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].
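A volume-based inference of the kind described can be sketched as below; `inferParallelism` and its parameters are hypothetical illustrations, not the actual AdaptiveBatchScheduler code:

```java
// Hypothetical sketch of volume-based parallelism inference: derive a
// parallelism from the input byte count and a target bytes-per-subtask,
// clamped to configured bounds. Not Flink's actual implementation.
public class ParallelismInference {
    static int inferParallelism(long inputBytes, long bytesPerSubtask,
                                int minParallelism, int maxParallelism) {
        // Aim for roughly bytesPerSubtask of input per parallel instance
        // (ceiling division), then clamp to the configured min/max.
        long needed = (inputBytes + bytesPerSubtask - 1) / bytesPerSubtask;
        long clamped = Math.max(minParallelism, Math.min(maxParallelism, needed));
        return (int) clamped;
    }

    public static void main(String[] args) {
        // 10 GiB of input with a 1 GiB-per-subtask target -> parallelism 10.
        if (inferParallelism(10L << 30, 1L << 30, 1, 128) != 10) {
            throw new AssertionError();
        }
        // No input data still yields the configured minimum.
        if (inferParallelism(0, 1L << 30, 1, 128) != 1) {
            throw new AssertionError();
        }
    }
}
```

Replacing a fixed default of 1 with such an inference is what lets source vertices scale with their actual input volume.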





[jira] [Closed] (FLINK-33669) Update the documentation for RestartStrategy, Checkpoint Storage, and State Backend.

2023-12-05 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33669.
---
Fix Version/s: 1.19.0
   Resolution: Done

master:
e27f8a3a0783d551457a2f424b01267bd1c8c2c2
d9bcb3b40ed5cefadbbaf391dacaa0ecd7fc8243
52d8d3583e5c989da84126a8805ab335408c46c2

> Update the documentation for RestartStrategy, Checkpoint Storage, and State 
> Backend.
> 
>
> Key: FLINK-33669
> URL: https://issues.apache.org/jira/browse/FLINK-33669
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> After the deprecation of complex Java object getter and setter methods in 
> FLIP-381, Flink now recommends the use of ConfigOptions for the configuration 
> of RestartStrategy, Checkpoint Storage, and State Backend. We should update 
> the Flink documentation to clearly instruct users on this new recommended 
> approach.





[jira] [Commented] (FLINK-33581) FLIP-381: Deprecate configuration getters/setters that return/set complex Java objects

2023-12-04 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792830#comment-17792830
 ] 

Zhu Zhu commented on FLINK-33581:
-

for pyflink:
17b82a7662c76dc9f41125782ba5232bb1a7eea4
4a7def9fe00387df66af8daf23603a6ea0848e03
17b82a7662c76dc9f41125782ba5232bb1a7eea4

> FLIP-381: Deprecate configuration getters/setters that return/set complex 
> Java objects
> --
>
> Key: FLINK-33581
> URL: https://issues.apache.org/jira/browse/FLINK-33581
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / DataStream
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: image-2023-11-30-17-59-42-650.png
>
>
> Deprecate the non-ConfigOption objects in the StreamExecutionEnvironment, 
> CheckpointConfig, and ExecutionConfig, and ultimately remove them in Flink 
> 2.0.





[jira] [Comment Edited] (FLINK-33731) failover.flip1 package can be rename to failover

2023-12-03 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792696#comment-17792696
 ] 

Zhu Zhu edited comment on FLINK-33731 at 12/4/23 7:46 AM:
--

LGTM. It is a legacy issue. When the new failover logic was introduced, there 
were legacy classes directly in the package 
{{org.apache.flink.runtime.executiongraph.failover}}, so the sub-package was 
introduced to avoid mixing them up. 
Now that the legacy failover classes have been removed, it's time to do the 
renaming.


was (Author: zhuzh):
LGTM. It is a legacy issue. When the new failover logics is introduced, were 
legacy classes directly in package 
{{org.apache.flink.runtime.executiongraph.failover}}, so the sub-package was 
introduced to avoid mixing them up. 
Now that the legacy failover classes are removed already and it's time to do 
the renaming.

> failover.flip1 package can be rename to failover
> 
>
> Key: FLINK-33731
> URL: https://issues.apache.org/jira/browse/FLINK-33731
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Affects Versions: 1.18.0, 1.17.2
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, there is an org.apache.flink.runtime.executiongraph.failover.flip1 
> package.
> I propose renaming failover.flip1 to failover, in other words: removing the 
> flip1 part. I have 2 reasons:
>  * The naming of a package should be based on business semantics, not on a 
> FLIP number, and the code under the failover.flip1 package has also changed 
> significantly since FLIP-1.
>  * All code under the failover.flip1 package is internal code rather than 
> @Public code, so it can be renamed directly.





[jira] [Commented] (FLINK-33731) failover.flip1 package can be rename to failover

2023-12-03 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792696#comment-17792696
 ] 

Zhu Zhu commented on FLINK-33731:
-

LGTM. It is a legacy issue. When the new failover logic was introduced, there 
were legacy classes directly in the package 
{{org.apache.flink.runtime.executiongraph.failover}}, so the sub-package was 
introduced to avoid mixing them up. 
Now that the legacy failover classes have been removed, it's time to do the 
renaming.

> failover.flip1 package can be rename to failover
> 
>
> Key: FLINK-33731
> URL: https://issues.apache.org/jira/browse/FLINK-33731
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Affects Versions: 1.18.0, 1.17.2
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, there is an org.apache.flink.runtime.executiongraph.failover.flip1 
> package.
> I propose renaming failover.flip1 to failover, in other words: removing the 
> flip1 part. I have 2 reasons:
>  * The naming of a package should be based on business semantics, not on a 
> FLIP number, and the code under the failover.flip1 package has also changed 
> significantly since FLIP-1.
>  * All code under the failover.flip1 package is internal code rather than 
> @Public code, so it can be renamed directly.





[jira] [Closed] (FLINK-33364) Introduce standard YAML for flink configuration

2023-11-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33364.
---
Fix Version/s: 1.19.0
   Resolution: Fixed

Done via d5b1afb7f67fe7da166193deb7711b8a8b163bcf

> Introduce standard YAML for flink configuration
> ---
>
> Key: FLINK-33364
> URL: https://issues.apache.org/jira/browse/FLINK-33364
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>






[jira] [Created] (FLINK-33677) Remove flink-conf.yaml from flink dist

2023-11-28 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-33677:
---

 Summary: Remove flink-conf.yaml from flink dist
 Key: FLINK-33677
 URL: https://issues.apache.org/jira/browse/FLINK-33677
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Configuration
Reporter: Zhu Zhu


FLINK-33297/FLIP-366 supports parsing standard YAML files for Flink 
configuration. A new configuration file config.yaml, which should be a standard 
YAML file, is introduced.
To ensure compatibility, in Flink 1.x, the old configuration parser will still 
be used if the old configuration file flink-conf.yaml exists. Only if it does 
not exist will the new configuration file be used.
In Flink 2.0, we should remove the old configuration file from flink dist, as 
well as the old configuration parser.
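For illustration only (the option keys below are examples, not a definitive mapping): the legacy file was parsed line by line as flat `key: value` pairs, whereas a standard YAML config.yaml may also use regular nesting and is readable by any compliant YAML parser.

```yaml
# Legacy flink-conf.yaml style: flat keys, custom line-based parser.
# taskmanager.numberOfTaskSlots: 4
# parallelism.default: 2

# Standard config.yaml style (illustrative): the same options as nested YAML.
taskmanager:
  numberOfTaskSlots: 4
parallelism:
  default: 2
```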






[jira] [Closed] (FLINK-33581) FLIP-381: Deprecate configuration getters/setters that return/set complex Java objects

2023-11-27 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33581.
---
Fix Version/s: 1.19.0
   Resolution: Done

Done via
0e0099b4eb1285929fec02326f661cba899eedcd
139db3f4bc7faed4478393a91a063ad54d15a523
dae2eb5b61f71b9453a73e4f0b3c69fd28f54ebf

> FLIP-381: Deprecate configuration getters/setters that return/set complex 
> Java objects
> --
>
> Key: FLINK-33581
> URL: https://issues.apache.org/jira/browse/FLINK-33581
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / DataStream
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Deprecate the non-ConfigOption objects in the StreamExecutionEnvironment, 
> CheckpointConfig, and ExecutionConfig, and ultimately remove them in Flink 
> 2.0.





[jira] [Updated] (FLINK-33297) FLIP-366: Support standard YAML for FLINK configuration

2023-11-14 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33297:

Labels: pull-request-available  (was: pull-request-available stale-assigned)

> FLIP-366: Support standard YAML for FLINK configuration
> ---
>
> Key: FLINK-33297
> URL: https://issues.apache.org/jira/browse/FLINK-33297
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Configuration
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Support standard YAML for FLINK configuration

This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33080) The checkpoint storage configured in the job level by config option will not take effect

2023-10-30 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33080:

Affects Version/s: 1.17.1
   1.18.0
   1.19.0

> The checkpoint storage configured in the job level by config option will not 
> take effect
> 
>
> Key: FLINK-33080
> URL: https://issues.apache.org/jira/browse/FLINK-33080
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / Configuration
>Affects Versions: 1.18.0, 1.17.1, 1.19.0
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> When we configure the checkpoint storage at the job level, it can only be 
> done through the following method:
> {code:java}
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.getCheckpointConfig().setCheckpointStorage(xxx); {code}
> or configure filesystem storage by config option 
> CheckpointingOptions.CHECKPOINTS_DIRECTORY through the following method:
> {code:java}
> Configuration configuration = new Configuration();
> configuration.set(CheckpointingOptions.CHECKPOINTS_DIRECTORY, checkpointDir); 
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment(configuration);{code}
> However, configuring another type of checkpoint storage via the job-side 
> configuration, like the following, will not take effect:
> {code:java}
> Configuration configuration = new Configuration();
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment(configuration);
> configuration.set(CheckpointingOptions.CHECKPOINT_STORAGE, 
> "aaa.bbb.ccc.CustomCheckpointStorage");
> configuration.set(CheckpointingOptions.CHECKPOINTS_DIRECTORY, checkpointDir); 
> {code}
> This behavior is unexpected; configuring it this way should also take effect.
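Note that the failing snippet mutates the Configuration only after the environment has been created. A minimal, hypothetical sketch (simplified classes, not Flink's actual implementation) of how snapshot-at-construction semantics can silently drop such late mutations:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch: if the environment defensively copies the configuration
// at construction time, later mutations of the caller's object are never seen.
class Config {
    final Map<String, String> values = new HashMap<>();
    Config() {}
    Config(Config other) { values.putAll(other.values); } // defensive copy
    void set(String k, String v) { values.put(k, v); }
    String get(String k) { return values.get(k); }
}

class Env {
    private final Config snapshot;
    Env(Config userConfig) { this.snapshot = new Config(userConfig); } // snapshot here
    String checkpointStorage() { return snapshot.get("checkpoint.storage"); }
}

public class LateMutationDemo {
    public static void main(String[] args) {
        Config conf = new Config();
        Env env = new Env(conf);                  // snapshot taken at this point
        conf.set("checkpoint.storage", "custom"); // mutation after construction
        // The environment never observes the late setting:
        if (env.checkpointStorage() != null) throw new AssertionError();
    }
}
```

Setting all options before creating the environment, as in the working snippet above, avoids the problem regardless of the underlying mechanism.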





[jira] [Closed] (FLINK-33080) The checkpoint storage configured in the job level by config option will not take effect

2023-10-30 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33080.
---
Resolution: Fixed

Fixed via 25697476095a5b9cf38dc3b61c684d0e912b1353

> The checkpoint storage configured in the job level by config option will not 
> take effect
> 
>
> Key: FLINK-33080
> URL: https://issues.apache.org/jira/browse/FLINK-33080
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / Configuration
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> When we configure the checkpoint storage at the job level, it can only be 
> done through the following method:
> {code:java}
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.getCheckpointConfig().setCheckpointStorage(xxx); {code}
> or configure filesystem storage by config option 
> CheckpointingOptions.CHECKPOINTS_DIRECTORY through the following method:
> {code:java}
> Configuration configuration = new Configuration();
> configuration.set(CheckpointingOptions.CHECKPOINTS_DIRECTORY, checkpointDir); 
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment(configuration);{code}
> However, configuring another type of checkpoint storage via the job-side 
> configuration, like the following, will not take effect:
> {code:java}
> Configuration configuration = new Configuration();
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment(configuration);
> configuration.set(CheckpointingOptions.CHECKPOINT_STORAGE, 
> "aaa.bbb.ccc.CustomCheckpointStorage");
> configuration.set(CheckpointingOptions.CHECKPOINTS_DIRECTORY, checkpointDir); 
> {code}
> This behavior is unexpected; configuring it this way should also take effect.





  1   2   3   4   5   6   7   8   9   10   >