[jira] [Commented] (FLINK-35426) Change the distribution of DynamicFilteringDataCollector to Broadcast

2024-05-22 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848801#comment-17848801
 ] 

Zhu Zhu commented on FLINK-35426:
-

Good point! [~xiasun]
The task is assigned to you. Feel free to open a pr for it.

> Change the distribution of DynamicFilteringDataCollector to Broadcast
> -
>
> Key: FLINK-35426
> URL: https://issues.apache.org/jira/browse/FLINK-35426
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
> Fix For: 1.20.0
>
>
> Currently, the DynamicFilteringDataCollector is utilized in the dynamic 
> partition pruning feature of batch jobs to collect the partition information 
> dynamically filtered by the source. Its current data distribution method is 
> rebalance, and it also acts as an upstream vertex to the probe side Source.
> Presently, when the Scheduler dynamically infers the parallelism for vertices 
> that are both downstream and Source, it considers factors from both sides, 
> which can lead to an overestimation of parallelism due to 
> DynamicFilteringDataCollector being an upstream of the Source. We aim to 
> change the distribution method of the DynamicFilteringDataCollector to 
> broadcast to prevent the dynamic overestimation of Source parallelism.
> Furthermore, given that the DynamicFilteringDataCollector transmits data 
> through the OperatorCoordinator rather than through normal data distribution, 
> this change will not affect the DPP (Dynamic Partition Pruning) functionality.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35426) Change the distribution of DynamicFilteringDataCollector to Broadcast

2024-05-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35426:
---

Assignee: xingbe

> Change the distribution of DynamicFilteringDataCollector to Broadcast
> -
>
> Key: FLINK-35426
> URL: https://issues.apache.org/jira/browse/FLINK-35426
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
> Fix For: 1.20.0
>
>
> Currently, the DynamicFilteringDataCollector is utilized in the dynamic 
> partition pruning feature of batch jobs to collect the partition information 
> dynamically filtered by the source. Its current data distribution method is 
> rebalance, and it also acts as an upstream vertex to the probe side Source.
> Presently, when the Scheduler dynamically infers the parallelism for vertices 
> that are both downstream and Source, it considers factors from both sides, 
> which can lead to an overestimation of parallelism due to 
> DynamicFilteringDataCollector being an upstream of the Source. We aim to 
> change the distribution method of the DynamicFilteringDataCollector to 
> broadcast to prevent the dynamic overestimation of Source parallelism.
> Furthermore, given that the DynamicFilteringDataCollector transmits data 
> through the OperatorCoordinator rather than through normal data distribution, 
> this change will not affect the DPP (Dynamic Partition Pruning) functionality.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35384) Expose TaskIOMetricGroup to custom Partitioner via init Context

2024-05-21 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848126#comment-17848126
 ] 

Zhu Zhu commented on FLINK-35384:
-

Enabling metrics for partitioners makes sense and the proposed approach of 
introducing a context sounds good.

How about introducing a sub-metric group specifically for partitioner metrics? 
A single task might contain multiple partitioners for which the metrics should 
not get mixed. It also avoids exposing the internal TaskIOMetricGroup class to 
users.

> Expose TaskIOMetricGroup to custom Partitioner via init Context
> ---
>
> Key: FLINK-35384
> URL: https://issues.apache.org/jira/browse/FLINK-35384
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Affects Versions: 1.9.4
>Reporter: Steven Zhen Wu
>Priority: Major
>
> I am trying to implement a custom range partitioner in the Flink Iceberg 
> sink. Want to publish some counter metrics for certain scenarios. This is 
> like the network metrics exposed in `TaskIOMetricGroup`.
> We can implement the range partitioner using the public interface from 
> `DataStream`. 
> {code}
> public  DataStream partitionCustom(
> Partitioner partitioner, KeySelector keySelector)
> {code}
> We can pass the `TaskIOMetricGroup` to the `StreamPartitioner` that 
> `CustomPartitionerWrapper` extends from. `CustomPartitionerWrapper` wraps the 
> pubic `Partitioner` interface, where we can implement the custom range 
> partitioner.
> `Partitioner` interface is a functional interface today. we can add a new 
> default `setup` method without breaking the backward compatibility.
> {code}
> @Public
> @FunctionalInterface
> public interface Partitioner extends java.io.Serializable, Function {
> *default void setup(TaskIOMetricGroup metrics) {}*
> int partition(K key, int numPartitions);
> }
> {code}
> I know public interface requires a FLIP process. will do that if the 
> community agree with this feature request.
> Personally, `numPartitions` should be passed in the `setup` method too. But 
> it is a breaking change that is NOT worth the benefit right now.
> {code}
> @Public
> @FunctionalInterface
> public interface Partitioner extends java.io.Serializable, Function {
> public void setup(int numPartitions, TaskIOMetricGroup metrics) {}
> int partition(K key);
> }
> {code}
> That would be similar to `StreamPartitioner#setup()` method that we would 
> need to modify for passing the metrics group.
> {code}
> @Internal
> public abstract class StreamPartitioner
> implements ChannelSelector>>, 
> Serializable {
> @Override
> public void setup(int numberOfChannels, TaskIOMetricGroup metrics) {
> this.numberOfChannels = numberOfChannels;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35399) Add documents for batch job master failure recovery

2024-05-19 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-35399:
---

 Summary: Add documents for batch job master failure recovery
 Key: FLINK-35399
 URL: https://issues.apache.org/jira/browse/FLINK-35399
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Zhu Zhu
Assignee: Junrui Li






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33983) Introduce JobEvent and JobEventStore for Batch Job Recovery

2024-05-19 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33983.
---
Fix Version/s: 1.20.0
   Resolution: Done

master: c7c1d78752836b96591e31422c65b85eca38bd50

> Introduce JobEvent and JobEventStore for Batch Job Recovery
> ---
>
> Key: FLINK-33983
> URL: https://issues.apache.org/jira/browse/FLINK-33983
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33986) Extend shuffleMaster to support batch snapshot.

2024-05-14 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33986:

Component/s: Runtime / Coordination

> Extend shuffleMaster to support batch snapshot.
> ---
>
> Key: FLINK-33986
> URL: https://issues.apache.org/jira/browse/FLINK-33986
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Extend shuffleMaster to support batch snapshot as follows:
>  # Add method supportsBatchSnapshot to identify whether the shuffle master 
> supports taking snapshot in batch scenarios
>  # Add method snapshotState and restoreState to snapshot and restore the 
> shuffle master's state.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33986) Extend shuffleMaster to support batch snapshot.

2024-05-14 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33986.
---
Fix Version/s: 1.20.0
   Resolution: Done

65d31e26534836909f6b8139c6bd6cd45b91bba4

> Extend shuffleMaster to support batch snapshot.
> ---
>
> Key: FLINK-33986
> URL: https://issues.apache.org/jira/browse/FLINK-33986
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Extend shuffleMaster to support batch snapshot as follows:
>  # Add method supportsBatchSnapshot to identify whether the shuffle master 
> supports taking snapshot in batch scenarios
>  # Add method snapshotState and restoreState to snapshot and restore the 
> shuffle master's state.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35293) FLIP-445: Support dynamic parallelism inference for HiveSource

2024-05-13 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846177#comment-17846177
 ] 

Zhu Zhu commented on FLINK-35293:
-

The change is merged. Could you add release notes for it and close the ticket? 
[~xiasun].

> FLIP-445: Support dynamic parallelism inference for HiveSource
> --
>
> Key: FLINK-35293
> URL: https://issues.apache.org/jira/browse/FLINK-35293
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  introduces dynamic source parallelism inference, which, compared to static 
> inference, utilizes runtime information to more accurately determine the 
> source parallelism. The FileSource already possesses the capability for 
> dynamic parallelism inference. As a follow-up task to FLIP-379, this FLIP 
> plans to implement the dynamic parallelism inference interface for 
> HiveSource, and also switches the default static parallelism inference to 
> dynamic parallelism inference.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35293) FLIP-445: Support dynamic parallelism inference for HiveSource

2024-05-13 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846137#comment-17846137
 ] 

Zhu Zhu commented on FLINK-35293:
-

master: ddb5a5355f9aca3d223f1fff6581d83dd317c2de

> FLIP-445: Support dynamic parallelism inference for HiveSource
> --
>
> Key: FLINK-35293
> URL: https://issues.apache.org/jira/browse/FLINK-35293
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  introduces dynamic source parallelism inference, which, compared to static 
> inference, utilizes runtime information to more accurately determine the 
> source parallelism. The FileSource already possesses the capability for 
> dynamic parallelism inference. As a follow-up task to FLIP-379, this FLIP 
> plans to implement the dynamic parallelism inference interface for 
> HiveSource, and also switches the default static parallelism inference to 
> dynamic parallelism inference.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34661) TaskExecutor supports retain partitions after JM crashed.

2024-05-13 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34661.
---
Fix Version/s: 1.20.0
   Resolution: Done

4e6b42046adbe2f337460d2e50f1fee12cff21a5

> TaskExecutor supports retain partitions after JM crashed.
> -
>
> Key: FLINK-34661
> URL: https://issues.apache.org/jira/browse/FLINK-34661
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35270.
---
Resolution: Fixed

547e4b53ebe36c39066adcf3a98123a1f7890c15

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Haifei Chen
>Assignee: Haifei Chen
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.20.0
>
>
> Good logs helps debug a lot in production environment
> Therefore, it'll be better to show more information in logs  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35270:
---

Assignee: Haifei Chen

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Haifei Chen
>Assignee: Haifei Chen
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.20.0
>
>
> Good logs helps debug a lot in production environment
> Therefore, it'll be better to show more information in logs  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-35270:

Component/s: API / Core

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Haifei Chen
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.20.0
>
>
> Good logs helps debug a lot in production environment
> Therefore, it'll be better to show more information in logs  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-35270:

Labels: pull-request-available starter  (was: pull-request-available)

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>Reporter: Haifei Chen
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.20.0
>
>
> Good logs helps debug a lot in production environment
> Therefore, it'll be better to show more information in logs  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-35270:

Fix Version/s: 1.20.0

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>Reporter: Haifei Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Good logs helps debug a lot in production environment
> Therefore, it'll be better to show more information in logs  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35293) FLIP-445: Support dynamic parallelism inference for HiveSource

2024-05-06 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35293:
---

Assignee: xingbe

> FLIP-445: Support dynamic parallelism inference for HiveSource
> --
>
> Key: FLINK-35293
> URL: https://issues.apache.org/jira/browse/FLINK-35293
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  introduces dynamic source parallelism inference, which, compared to static 
> inference, utilizes runtime information to more accurately determine the 
> source parallelism. The FileSource already possesses the capability for 
> dynamic parallelism inference. As a follow-up task to FLIP-379, this FLIP 
> plans to implement the dynamic parallelism inference interface for 
> HiveSource, and also switches the default static parallelism inference to 
> dynamic parallelism inference.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35009) Change on getTransitivePredecessors breaks connectors

2024-04-09 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835577#comment-17835577
 ] 

Zhu Zhu commented on FLINK-35009:
-

[~martijnvisser] Thanks for monitoring the kafka connector builds and reporting 
this problem!
Feel free to loop me in to review the pr.

> Change on getTransitivePredecessors breaks connectors
> -
>
> Key: FLINK-35009
> URL: https://issues.apache.org/jira/browse/FLINK-35009
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / Kafka
>Affects Versions: 1.18.2, 1.20.0, 1.19.1
>Reporter: Martijn Visser
>Assignee: Martijn Visser
>Priority: Blocker
>
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-kafka: Compilation failure: 
> Compilation failure: 
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[214,24]
>  
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  is not abstract and does not override abstract method 
> getTransitivePredecessorsInternal() in org.apache.flink.api.dag.Transformation
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[220,44]
>  getTransitivePredecessors() in 
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  cannot override getTransitivePredecessors() in 
> org.apache.flink.api.dag.Transformation
> Error:overridden method is final
> {code}
> Example: 
> https://github.com/apache/flink-connector-kafka/actions/runs/8494349338/job/23269406762#step:15:167



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35009) Change on getTransitivePredecessors breaks connectors

2024-04-08 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834799#comment-17834799
 ] 

Zhu Zhu commented on FLINK-35009:
-

Thanks for looking into the issue. [~Weijie Guo]
So it is just a test class which uses Flink `@Internal` classes is broken. And 
that test class is even not used. 
I think it's better to just remove `MockTransformation` from kafka connector.
WDYT? [~martijnvisser]

> Change on getTransitivePredecessors breaks connectors
> -
>
> Key: FLINK-35009
> URL: https://issues.apache.org/jira/browse/FLINK-35009
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / Kafka
>Affects Versions: 1.18.2, 1.20.0, 1.19.1
>Reporter: Martijn Visser
>Priority: Blocker
>
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-kafka: Compilation failure: 
> Compilation failure: 
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[214,24]
>  
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  is not abstract and does not override abstract method 
> getTransitivePredecessorsInternal() in org.apache.flink.api.dag.Transformation
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[220,44]
>  getTransitivePredecessors() in 
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  cannot override getTransitivePredecessors() in 
> org.apache.flink.api.dag.Transformation
> Error:overridden method is final
> {code}
> Example: 
> https://github.com/apache/flink-connector-kafka/actions/runs/8494349338/job/23269406762#step:15:167



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-04-08 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834798#comment-17834798
 ] 

Zhu Zhu edited comment on FLINK-32513 at 4/8/24 6:27 AM:
-

{{Transformation}} is not a public interface, it is an @Internal class. 
Ideally, kafka connectors should not directly manipulate {{Transformation}}.
Yet we may try to find a workaround to avoid break existing kafka connectors.


was (Author: zhuzh):
{{Transformation}} is not a public interface, it is an @Internal class. 
Ideally, kafka connectors should not directly manipulate {{Transformation}}.
Production-wise, we may take a look whether there is a workaround to avoid 
break existing kafka connectors.

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> -
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> the method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>     

[jira] [Commented] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-04-08 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834798#comment-17834798
 ] 

Zhu Zhu commented on FLINK-32513:
-

{{Transformation}} is not a public interface, it is an @Internal class. 
Ideally, kafka connectors should not directly manipulate {{Transformation}}.
Production-wise, we may take a look whether there is a workaround to avoid 
break existing kafka connectors.

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> -
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> the method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> 

[jira] [Assigned] (FLINK-34661) TaskExecutor supports retain partitions after JM crashed.

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34661:
---

Assignee: Junrui Li

> TaskExecutor supports retain partitions after JM crashed.
> -
>
> Key: FLINK-34661
> URL: https://issues.apache.org/jira/browse/FLINK-34661
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33983) Introduce JobEvent and JobEventStore for Batch Job Recovery

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33983:
---

Assignee: Junrui Li

> Introduce JobEvent and JobEventStore for Batch Job Recovery
> ---
>
> Key: FLINK-33983
> URL: https://issues.apache.org/jira/browse/FLINK-33983
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33986) Extend shuffleMaster to support batch snapshot.

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33986:
---

Assignee: Junrui Li

> Extend shuffleMaster to support batch snapshot.
> ---
>
> Key: FLINK-33986
> URL: https://issues.apache.org/jira/browse/FLINK-33986
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>
> Extend shuffleMaster to support batch snapshot as follows:
>  # Add method supportsBatchSnapshot to identify whether the shuffle master 
> supports taking snapshot in batch scenarios
>  # Add method snapshotState and restoreState to snapshot and restore the 
> shuffle master's state.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33984) Introduce SupportsBatchSnapshot for operator coordinator

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33984.
---
Fix Version/s: 1.20.0
   Resolution: Done

master:
38255652406becbfbcb7cbec557aa5ba9a1ebbb3
558ca75da2fcec875d1e04a8d75a24fd0ad42ccc

> Introduce SupportsBatchSnapshot for operator coordinator
> 
>
> Key: FLINK-33984
> URL: https://issues.apache.org/jira/browse/FLINK-33984
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34945) Support recover shuffle descriptor and partition metrics from tiered storage

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34945:
---

Assignee: Junrui Li

> Support recover shuffle descriptor and partition metrics from tiered storage
> 
>
> Key: FLINK-34945
> URL: https://issues.apache.org/jira/browse/FLINK-34945
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33985) Support obtain all partitions existing in cluster through ShuffleMaster.

2024-04-02 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33985.
---
Fix Version/s: 1.20.0
   Resolution: Done

master: a44709662956b306fe686623d00358a6b076f637

> Support obtain all partitions existing in cluster through ShuffleMaster.
> 
>
> Key: FLINK-33985
> URL: https://issues.apache.org/jira/browse/FLINK-33985
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Support obtain all partitions existing in cluster through ShuffleMaster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33985) Support obtain all partitions existing in cluster through ShuffleMaster.

2024-04-02 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33985:

Component/s: Runtime / Coordination

> Support obtain all partitions existing in cluster through ShuffleMaster.
> 
>
> Key: FLINK-33985
> URL: https://issues.apache.org/jira/browse/FLINK-33985
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Support obtain all partitions existing in cluster through ShuffleMaster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33982) Introduce new config options for Batch Job Recovery

2024-04-02 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33982.
---
Fix Version/s: 1.20.0
   Resolution: Done

master: ec1311c8eb805f91b3b8d7d7cbe192e8cad05a76

> Introduce new config options for Batch Job Recovery
> ---
>
> Key: FLINK-33982
> URL: https://issues.apache.org/jira/browse/FLINK-33982
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34565) Enhance flink kubernetes configMap to accommodate additional configuration files

2024-03-27 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831336#comment-17831336
 ] 

Zhu Zhu commented on FLINK-34565:
-

IIUC, the requirement is to ship more user files, which may be needed by user 
code, to the pod. Supporting configuration files is just a special case of it. 
Shipping them via ConfigMap sounds a bit tricky to me.
cc [~wangyang0918]

> Enhance flink kubernetes configMap to accommodate additional configuration 
> files
> 
>
> Key: FLINK-34565
> URL: https://issues.apache.org/jira/browse/FLINK-34565
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Reporter: Surendra Singh Lilhore
>Priority: Major
>  Labels: pull-request-available
>
> Flink kubernetes client currently supports a fixed number of files 
> (flink-conf.yaml, logback-console.xml, log4j-console.properties) in the JM 
> and TM Pod ConfigMap. In certain scenarios, particularly in app mode, 
> additional configuration files are required for jobs to run successfully. 
> Presently, users must resort to workarounds to include dynamic configuration 
> files in the JM and TM. This proposed improvement allows users to easily add 
> extra files by configuring the 
> '{*}kubernetes.flink.configmap.additional.resources{*}' property. Users can 
> provide a semicolon-separated list of local files in the client Flink config 
> directory that should be included in the Flink ConfigMap.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33984) Introduce SupportsBatchSnapshot for operator coordinator

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33984:
---

Assignee: Junrui Li

> Introduce SupportsBatchSnapshot for operator coordinator
> 
>
> Key: FLINK-33984
> URL: https://issues.apache.org/jira/browse/FLINK-33984
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33985) Support obtain all partitions existing in cluster through ShuffleMaster.

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33985:
---

Assignee: Junrui Li

> Support obtain all partitions existing in cluster through ShuffleMaster.
> 
>
> Key: FLINK-33985
> URL: https://issues.apache.org/jira/browse/FLINK-33985
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Support obtain all partitions existing in cluster through ShuffleMaster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33982) Introduce new config options for Batch Job Recovery

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33982:
---

Assignee: Junrui Li

> Introduce new config options for Batch Job Recovery
> ---
>
> Key: FLINK-33982
> URL: https://issues.apache.org/jira/browse/FLINK-33982
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33892) FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33892:

Summary: FLIP-383: Support Job Recovery from JobMaster Failures for Batch 
Jobs  (was: FLIP-383: Support Job Recovery for Batch Jobs)

> FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs
> -
>
> Key: FLINK-33892
> URL: https://issues.apache.org/jira/browse/FLINK-33892
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Junrui Li
>Priority: Major
>
> This is the umbrella ticket for 
> [FLIP-383|https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+for+Batch+Jobs]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-03-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-32513.
---
Fix Version/s: 1.18.2
   1.20.0
   1.19.1
   Resolution: Fixed

master: 8dcb0ae9063b66af1d674b7b0b3be76b6d752692
release-1.19: 5ec4bf2f18168001b5cbb9012f331d3405228516
release-1.18: 940b3bbda5b10abe3a41d60467d33fd424c7dae6

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> -
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> the method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> 

[jira] [Closed] (FLINK-34731) Remove SpeculativeScheduler and incorporate its features into AdaptiveBatchScheduler

2024-03-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34731.
---
Resolution: Done

master: cf0d75c4bb324825a057dc72243bb6a2046f8479

> Remove SpeculativeScheduler and incorporate its features into 
> AdaptiveBatchScheduler
> 
>
> Key: FLINK-34731
> URL: https://issues.apache.org/jira/browse/FLINK-34731
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Presently, speculative execution is exposed to users as a feature of the 
> AdaptiveBatchScheduler.
> To streamline our codebase and reduce maintenance overhead, this ticket will 
> consolidate the SpeculativeScheduler into the AdaptiveBatchScheduler, 
> eliminating the need for a separate SpeculativeScheduler class.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-03-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-32513:
---

Assignee: Jeyhun Karimov

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> -
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> the method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> 

[jira] [Closed] (FLINK-34725) Dockerfiles for release publishing has incorrect config.yaml path

2024-03-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34725.
---
Resolution: Fixed

master: 3f4a80989fe7243983926f09fac2283f6fa63693
release-1.19: f53c5628e43777b4b924ec81224acc3df938800a

> Dockerfiles for release publishing has incorrect config.yaml path
> -
>
> Key: FLINK-34725
> URL: https://issues.apache.org/jira/browse/FLINK-34725
> Project: Flink
>  Issue Type: Bug
>  Components: flink-docker
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0, 1.19.1
>
>
> An issue found when do docker image publishing, unexpected error msg:
> {code:java}
> sed: can't read /config.yaml: No such file or directory{code}
>  
> also found in flink-docker/master daily Publish SNAPSHOTs  action:
> [https://github.com/apache/flink-docker/actions/runs/8210534289/job/22458150514#step:8:588]
> [https://github.com/apache/flink-docker/actions/runs/8210534289/job/22458150322#step:8:549]
>  
> This related to changes by https://issues.apache.org/jira/browse/FLINK-34205



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34731) Remove SpeculativeScheduler and incorporate its features into AdaptiveBatchScheduler

2024-03-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34731:
---

Assignee: Junrui Li

> Remove SpeculativeScheduler and incorporate its features into 
> AdaptiveBatchScheduler
> 
>
> Key: FLINK-34731
> URL: https://issues.apache.org/jira/browse/FLINK-34731
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
> Fix For: 1.20.0
>
>
> Presently, speculative execution is exposed to users as a feature of the 
> AdaptiveBatchScheduler.
> To streamline our codebase and reduce maintenance overhead, this ticket will 
> consolidate the SpeculativeScheduler into the AdaptiveBatchScheduler, 
> eliminating the need for a separate SpeculativeScheduler class.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-03-06 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34105.
---
Fix Version/s: 1.19.0
 Assignee: dizhou cao  (was: Yangze Guo)
   Resolution: Fixed

1.19: 837f8e584850bdcbc586a54f58e3fe953a816be8

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: dizhou cao
>Priority: Critical
> Fix For: 1.19.0
>
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34377) Release Testing: Verify FLINK-33297 Support standard YAML for FLINK configuration

2024-02-25 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820553#comment-17820553
 ] 

Zhu Zhu commented on FLINK-34377:
-

Thanks for volunteering! [~xiasun]
I have assigned you the ticket.

> Release Testing: Verify FLINK-33297 Support standard YAML for FLINK 
> configuration
> -
>
> Key: FLINK-34377
> URL: https://issues.apache.org/jira/browse/FLINK-34377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Junrui Li
>Assignee: xingbe
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-366.
> Starting with version 1.19, Flink has officially introduced full support for 
> the standard YAML 1.2 syntax. For detailed information, please refer to the 
> Flink 
> website:https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#flink-configuration-file
> We may need to cover the following two types of test cases:
> Test 1: For newly created jobs, utilize a config.yaml file to set up the 
> Flink cluster. We need to verify that the job runs as expected with this new 
> configuration.
> Test 2: For existing jobs, migrate the legacy flink-conf.yaml to the new 
> config.yaml. Test the job runs just like before post-migration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34377) Release Testing: Verify FLINK-33297 Support standard YAML for FLINK configuration

2024-02-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34377:
---

Assignee: xingbe

> Release Testing: Verify FLINK-33297 Support standard YAML for FLINK 
> configuration
> -
>
> Key: FLINK-34377
> URL: https://issues.apache.org/jira/browse/FLINK-34377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Junrui Li
>Assignee: xingbe
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-366.
> Starting with version 1.19, Flink has officially introduced full support for 
> the standard YAML 1.2 syntax. For detailed information, please refer to the 
> Flink 
> website:https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#flink-configuration-file
> We may need to cover the following two types of test cases:
> Test 1: For newly created jobs, utilize a config.yaml file to set up the 
> Flink cluster. We need to verify that the job runs as expected with this new 
> configuration.
> Test 2: For existing jobs, migrate the legacy flink-conf.yaml to the new 
> config.yaml. Test the job runs just like before post-migration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34383) Modify the comment with incorrect syntax

2024-02-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34383.
---
Resolution: Won't Fix

> Modify the comment with incorrect syntax
> 
>
> Key: FLINK-34383
> URL: https://issues.apache.org/jira/browse/FLINK-34383
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: li you
>Priority: Major
>  Labels: pull-request-available
>
> There is an error in the syntax of the comment for the class 
> PermanentBlobCache



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34356) Release Testing: Verify FLINK-33768 Support dynamic source parallelism inference for batch jobs

2024-02-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34356:
---

Assignee: Junrui Li

> Release Testing: Verify FLINK-33768  Support dynamic source parallelism 
> inference for batch jobs 
> -
>
> Key: FLINK-34356
> URL: https://issues.apache.org/jira/browse/FLINK-34356
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-379.
> New Source can implement the interface DynamicParallelismInference to enable 
> dynamic parallelism inference. For detailed information, please refer to the 
> documentation.
> We may need to cover the following two types of test cases:
> Test 1: FileSource has implemented the dynamic source parallelism inference. 
> Test the automatic parallelism inference of FileSource.
> Test 2: Test the dynamic source parallelism inference of a custom Source.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34356) Release Testing: Verify FLINK-33768 Support dynamic source parallelism inference for batch jobs

2024-02-21 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819142#comment-17819142
 ] 

Zhu Zhu commented on FLINK-34356:
-

Assigned. Thanks for volunteering! [~JunRuiLi]


> Release Testing: Verify FLINK-33768  Support dynamic source parallelism 
> inference for batch jobs 
> -
>
> Key: FLINK-34356
> URL: https://issues.apache.org/jira/browse/FLINK-34356
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-379.
> New Source can implement the interface DynamicParallelismInference to enable 
> dynamic parallelism inference. For detailed information, please refer to the 
> documentation.
> We may need to cover the following two types of test cases:
> Test 1: FileSource has implemented the dynamic source parallelism inference. 
> Test the automatic parallelism inference of FileSource.
> Test 2: Test the dynamic source parallelism inference of a custom Source.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33241) Align config option generation documentation for Flink's config documentation

2024-02-20 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33241.
---
Fix Version/s: 1.19.0
   Resolution: Fixed

Fixed via 8fe005cf1325b7477d7e1808a46bd80798165029

> Align config option generation documentation for Flink's config documentation
> -
>
> Key: FLINK-33241
> URL: https://issues.apache.org/jira/browse/FLINK-33241
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.19.0
>
>
> The configuration parameter docs generation is documented in two places in 
> different ways:
> [docs/README.md:62|https://github.com/apache/flink/blob/5c1e9f3b1449cb77276d578b344d9a69c7cf9a3c/docs/README.md#L62]
>  and 
> [flink-docs/README.md:44|https://github.com/apache/flink/blob/7bebd2d9fac517c28afc24c0c034d77cfe2b43a6/flink-docs/README.md#L44].
> We should remove the corresponding command from {{docs/README.md}} and refer 
> to {{flink-docs/README.md}} for the documentation. That way, we only have to 
> maintain a single file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34247) Document FLIP-366: Support standard YAML for FLINK configuration

2024-02-04 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34247.
---
  Assignee: Junrui Li
Resolution: Done

master/release-1.19:
5b61baadd02ccdfa702834e2e63aeb8d1d9e1250
04dd91f2b6c830b9ac0e445f72938e3d6f479edd
e9bea09510e18c6143e6e14ca17a894abfaf92bf

> Document FLIP-366: Support standard YAML for FLINK configuration
> 
>
> Key: FLINK-34247
> URL: https://issues.apache.org/jira/browse/FLINK-34247
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (FLINK-33768) FLIP-379: Support dynamic source parallelism inference for batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reopened FLINK-33768:
-

> FLIP-379: Support dynamic source parallelism inference for batch jobs
> -
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> `{*}execution.batch.adaptive.auto-parallelism.default-source-parallelism{*}` 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}}  is used as the source parallelism, which is actually a temporary 
> implementation solution.
> We aim to support dynamic source parallelism inference for batch jobs. More 
> details see 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34145) File source connector support dynamic source parallelism inference in batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34145:

Parent: FLINK-33768
Issue Type: Sub-task  (was: Improvement)

> File source connector support dynamic source parallelism inference in batch 
> jobs
> 
>
> Key: FLINK-34145
> URL: https://issues.apache.org/jira/browse/FLINK-34145
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / FileSystem
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  has introduced support for dynamic source parallelism inference in batch 
> jobs, and we plan to give priority to enabling this feature for the file 
> source connector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34144) Update the documentation and configuration description about dynamic source parallelism inference

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34144:

Parent: FLINK-33768
Issue Type: Sub-task  (was: Improvement)

> Update the documentation and configuration description about dynamic source 
> parallelism inference
> -
>
> Key: FLINK-34144
> URL: https://issues.apache.org/jira/browse/FLINK-34144
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource]
>  introduces the new feature of dynamic source parallelism inference, and we 
> plan to update the documentation and configuration items accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34143) Modify the effective strategy of `execution.batch.adaptive.auto-parallelism.default-source-parallelism`

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34143:

Parent: FLINK-33768
Issue Type: Sub-task  (was: Improvement)

> Modify the effective strategy of 
> `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
> ---
>
> Key: FLINK-34143
> URL: https://issues.apache.org/jira/browse/FLINK-34143
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, if users do not set the 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}}
>  configuration option, the AdaptiveBatchScheduler defaults to a parallelism 
> of 1 for source vertices. In 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource],
>  the value of 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}}
>  will act as the upper bound for inferring dynamic source parallelism, and 
> continuing with the current policy is no longer appropriate.
> We plan to change the effectiveness strategy of 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}};
>  when the user does not set this config option, we will use the value of 
> `{{{}execution.batch.adaptive.auto-parallelism.max-parallelism`{}}} as the 
> upper bound for source parallelism inference. If 
> {{`execution.batch.adaptive.auto-parallelism.max-parallelism`}} is also not 
> configured, the value of `{{{}parallelism.default`{}}} will be used as a 
> fallback.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33768) FLIP-379: Support dynamic source parallelism inference for batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33768:

Summary: FLIP-379: Support dynamic source parallelism inference for batch 
jobs  (was: [FLIP-379] Support dynamic source parallelism inference for batch 
jobs)

> FLIP-379: Support dynamic source parallelism inference for batch jobs
> -
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> `{*}execution.batch.adaptive.auto-parallelism.default-source-parallelism{*}` 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}}  is used as the source parallelism, which is actually a temporary 
> implementation solution.
> We aim to support dynamic source parallelism inference for batch jobs. More 
> details see 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33768) [FLIP-379] Support dynamic source parallelism inference for batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33768:

Summary: [FLIP-379] Support dynamic source parallelism inference for batch 
jobs  (was: Support dynamic source parallelism inference for batch jobs)

> [FLIP-379] Support dynamic source parallelism inference for batch jobs
> --
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> `{*}execution.batch.adaptive.auto-parallelism.default-source-parallelism{*}` 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}}  is used as the source parallelism, which is actually a temporary 
> implementation solution.
> We aim to support dynamic source parallelism inference for batch jobs. More 
> details see 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34132) Batch WordCount job fails when run with AdaptiveBatch scheduler

2024-02-01 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813095#comment-17813095
 ] 

Zhu Zhu commented on FLINK-34132:
-

We should also migrate the existing batch examples from DataSet to DataStream 
so that it can directly work with AdaptiveBatchScheduler. 
This work needs to be done before removing the DataSet API in Flink 2.0.
cc [~Wencong Liu]

> Batch WordCount job fails when run with AdaptiveBatch scheduler
> ---
>
> Key: FLINK-34132
> URL: https://issues.apache.org/jira/browse/FLINK-34132
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.17.1, 1.18.1
>Reporter: Prabhu Joseph
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Batch WordCount job fails when run with AdaptiveBatch scheduler.
> *Repro Steps*
> {code:java}
> flink-yarn-session -Djobmanager.scheduler=adaptive -d
>  flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input 
> s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT
> {code}
> *Error logs*
> {code:java}
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>   at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
>   at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
>   at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
>   at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
>   at 
> org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>   at 
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1067)
>   at 
> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:144)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:73)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>   ... 12 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1062)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at 
> 

[jira] [Updated] (FLINK-34132) Batch WordCount job fails when run with AdaptiveBatch scheduler

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34132:

Component/s: Documentation

> Batch WordCount job fails when run with AdaptiveBatch scheduler
> ---
>
> Key: FLINK-34132
> URL: https://issues.apache.org/jira/browse/FLINK-34132
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.17.1, 1.18.1
>Reporter: Prabhu Joseph
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Batch WordCount job fails when run with AdaptiveBatch scheduler.
> *Repro Steps*
> {code:java}
> flink-yarn-session -Djobmanager.scheduler=adaptive -d
>  flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input 
> s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT
> {code}
> *Error logs*
> {code:java}
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>   at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
>   at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
>   at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
>   at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
>   at 
> org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>   at 
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1067)
>   at 
> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:144)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:73)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>   ... 12 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1062)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at 
> java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457)
>   at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>   at 
> 

[jira] [Closed] (FLINK-34132) Batch WordCount job fails when run with AdaptiveBatch scheduler

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34132.
---
Fix Version/s: 1.19.0
 Assignee: Junrui Li
   Resolution: Fixed

The documentation is updated via dd3e60a4b1e473c167837b7c3bc4fb90c0a1f51a

> Batch WordCount job fails when run with AdaptiveBatch scheduler
> ---
>
> Key: FLINK-34132
> URL: https://issues.apache.org/jira/browse/FLINK-34132
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.17.1, 1.18.1
>Reporter: Prabhu Joseph
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Batch WordCount job fails when run with AdaptiveBatch scheduler.
> *Repro Steps*
> {code:java}
> flink-yarn-session -Djobmanager.scheduler=adaptive -d
>  flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input 
> s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT
> {code}
> *Error logs*
> {code:java}
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>   at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
>   at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
>   at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
>   at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
>   at 
> org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>   at 
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1067)
>   at 
> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:144)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:73)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>   ... 12 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1062)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at 
> java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457)
>   at 

[jira] [Closed] (FLINK-34126) Correct the description of jobmanager.scheduler

2024-01-31 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34126.
---
Fix Version/s: 1.19.0
 Assignee: Junrui Li
   Resolution: Fixed

Fixed via 2be1ea801cf616d0d0a82729829245c205caaad8

> Correct the description of jobmanager.scheduler
> ---
>
> Key: FLINK-34126
> URL: https://issues.apache.org/jira/browse/FLINK-34126
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Now the config option jobmanager.scheduler has description: 
> _Determines which scheduler implementation is used to schedule tasks. 
> Accepted values are:_
>  * _'Default': Default scheduler_
>  * _'Adaptive': Adaptive scheduler. More details can be found 
> [here|https://nightlies.apache.org/flink/flink-docs-master/zh/docs/deployment/elastic_scaling#adaptive-scheduler]._
>  * _'AdaptiveBatch': Adaptive batch scheduler. More details can be found 
> [here|https://nightlies.apache.org/flink/flink-docs-master/zh/docs/deployment/elastic_scaling#adaptive-batch-scheduler]._
> _Possible values:_
>  * _"Default"_
>  * _"Adaptive"_
>  * _"AdaptiveBatch"_
>  
> However, after FLIP-283 we changed the default scheduler for batch job to 
> AdaptiveBatchScheduler. This config option description will mislead users 
> that the 'DefaultScheduler' is the universal fallback for both batch and 
> streaming jobs.
> We should update this description.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34206) CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed

2024-01-31 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34206.
---
Fix Version/s: 1.19.0
   Resolution: Fixed

Fixed via b737b71859672e8020881ce2abf998735ee98abb

> CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed
> ---
>
> Key: FLINK-34206
> URL: https://issues.apache.org/jira/browse/FLINK-34206
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: xingbe
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56728=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=8763
> {code}
> Jan 23 01:39:48 01:39:48.152 [ERROR] Tests run: 6, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 19.24 s <<< FAILURE! -- in 
> org.apache.flink.test.streaming.runtime.CacheITCase
> Jan 23 01:39:48 01:39:48.152 [ERROR] 
> org.apache.flink.test.streaming.runtime.CacheITCase.testRetryOnCorruptedClusterDataset(Path)
>  -- Time elapsed: 4.755 s <<< ERROR!
> Jan 23 01:39:48 org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.rpc.pekko.PekkoInvocationHandler.lambda$invokeRpc$1(PekkoInvocationHandler.java:268)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1287)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$1.onComplete(ScalaFutureUtils.java:47)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:310)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:307)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:234)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:231)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$DirectExecutionContext.execute(ScalaFutureUtils.java:65)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
> 

[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-29 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812123#comment-17812123
 ] 

Zhu Zhu commented on FLINK-34105:
-

Thanks for the updates! [~guoyangze]

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: Yangze Guo
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34257) Update Flink YAML Parser to Support YAML 1.2 Specification

2024-01-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34257.
---
Resolution: Fixed

Fixed via 081051a2cacaddf6dfe613da061f15f28a015a41

>  Update Flink YAML Parser to Support YAML 1.2 Specification
> ---
>
> Key: FLINK-34257
> URL: https://issues.apache.org/jira/browse/FLINK-34257
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Runtime / Configuration
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> FLINK-33364 and FLINK-33577 added snakeyaml and pyyaml dependencies to 
> support a standard YAML parser. However, these parsers support the YAML 1.1 
> specification rather than the YAML 1.2 specification, which is the version 
> referenced by [FLINK official 
> website|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#configuration].
>  Therefore, we need to update these dependencies that support YAML 1.2.
> The updated dependencies are as follows:
> 1. For Java: change from snakeyaml to snakeyaml-engine
> 2. For Python: change from pyyaml to ruamel.yaml



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34257) Update Flink YAML Parser to Support YAML 1.2 Specification

2024-01-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34257:

Affects Version/s: 1.19.0

>  Update Flink YAML Parser to Support YAML 1.2 Specification
> ---
>
> Key: FLINK-34257
> URL: https://issues.apache.org/jira/browse/FLINK-34257
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> FLINK-33364 and FLINK-33577 added snakeyaml and pyyaml dependencies to 
> support a standard YAML parser. However, these parsers support the YAML 1.1 
> specification rather than the YAML 1.2 specification, which is the version 
> referenced by [FLINK official 
> website|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#configuration].
>  Therefore, we need to update these dependencies that support YAML 1.2.
> The updated dependencies are as follows:
> 1. For Java: change from snakeyaml to snakeyaml-engine
> 2. For Python: change from pyyaml to ruamel.yaml



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-29 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811867#comment-17811867
 ] 

Zhu Zhu commented on FLINK-34105:
-

Hi [~lsdy], what's the status of the fix?

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: Yangze Guo
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34257) Update Flink YAML Parser to Support YAML 1.2 Specification

2024-01-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34257:
---

Assignee: Junrui Li

>  Update Flink YAML Parser to Support YAML 1.2 Specification
> ---
>
> Key: FLINK-34257
> URL: https://issues.apache.org/jira/browse/FLINK-34257
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> FLINK-33364 and FLINK-33577 added snakeyaml and pyyaml dependencies to 
> support a standard YAML parser. However, these parsers support the YAML 1.1 
> specification, not the YAML 1.2 specification. Therefore, we need to update 
> these dependencies that support YAML 1.2.
> The updated dependencies are as follows:
> 1. For Java: change from snakeyaml to snakeyaml-engine
> 2. For Python: change from pyyaml to ruamel.yaml



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34257) Update Flink YAML Parser to Support YAML 1.2 Specification

2024-01-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34257:

Component/s: Runtime / Configuration

>  Update Flink YAML Parser to Support YAML 1.2 Specification
> ---
>
> Key: FLINK-34257
> URL: https://issues.apache.org/jira/browse/FLINK-34257
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Runtime / Configuration
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> FLINK-33364 and FLINK-33577 added snakeyaml and pyyaml dependencies to 
> support a standard YAML parser. However, these parsers support the YAML 1.1 
> specification, not the YAML 1.2 specification. Therefore, we need to update 
> these dependencies that support YAML 1.2.
> The updated dependencies are as follows:
> 1. For Java: change from snakeyaml to snakeyaml-engine
> 2. For Python: change from pyyaml to ruamel.yaml



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34245) CassandraSinkTest.test_cassandra_sink fails under JDK17 and JDK21 due to InaccessibleObjectException

2024-01-28 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34245.
---
Fix Version/s: 1.19.0
 Assignee: Junrui Li
   Resolution: Fixed

Fixed via ddbf87f2a7aeeeb20a8590578c6d037b239d5593

> CassandraSinkTest.test_cassandra_sink fails under JDK17 and JDK21 due to 
> InaccessibleObjectException
> 
>
> Key: FLINK-34245
> URL: https://issues.apache.org/jira/browse/FLINK-34245
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Connectors / Cassandra
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56942=logs=b53e1644-5cb4-5a3b-5d48-f523f39bcf06=b68c9f5c-04c9-5c75-3862-a3a27aabbce3]
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56942=logs=60960eae-6f09-579e-371e-29814bdd1adc=7a70c083-6a74-5348-5106-30a76c29d8fa=63680]
> {code:java}
> Jan 26 01:29:27 E   py4j.protocol.Py4JJavaError: An error 
> occurred while calling 
> z:org.apache.flink.python.util.PythonConfigUtil.configPythonOperator.
> Jan 26 01:29:27 E   : 
> java.lang.reflect.InaccessibleObjectException: Unable to make field final 
> java.util.Map java.util.Collections$UnmodifiableMap.m accessible: module 
> java.base does not "opens java.util" to unnamed module @17695df3
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.AccessibleObject.throwInaccessibleObjectException(AccessibleObject.java:391)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:367)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:315)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:183)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.Field.setAccessible(Field.java:177)
> Jan 26 01:29:27 E at 
> org.apache.flink.python.util.PythonConfigUtil.registerPythonBroadcastTransformationTranslator(PythonConfigUtil.java:357)
> Jan 26 01:29:27 E at 
> org.apache.flink.python.util.PythonConfigUtil.configPythonOperator(PythonConfigUtil.java:101)
> Jan 26 01:29:27 E at 
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.Method.invoke(Method.java:580)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
> Jan 26 01:29:27 E at 
> java.base/java.lang.Thread.run(Thread.java:1583) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34200) AutoRescalingITCase#testCheckpointRescalingInKeyedState fails

2024-01-27 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811633#comment-17811633
 ] 

Zhu Zhu commented on FLINK-34200:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57024=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba

> AutoRescalingITCase#testCheckpointRescalingInKeyedState fails
> -
>
> Key: FLINK-34200
> URL: https://issues.apache.org/jira/browse/FLINK-34200
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
> Attachments: FLINK-34200.failure.log.gz
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56601=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=8200]
> {code:java}
> Jan 19 02:31:53 02:31:53.954 [ERROR] Tests run: 32, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 1050 s <<< FAILURE! -- in 
> org.apache.flink.test.checkpointing.AutoRescalingITCase
> Jan 19 02:31:53 02:31:53.954 [ERROR] 
> org.apache.flink.test.checkpointing.AutoRescalingITCase.testCheckpointRescalingInKeyedState[backend
>  = rocksdb, buffersPerChannel = 2] -- Time elapsed: 59.10 s <<< FAILURE!
> Jan 19 02:31:53 java.lang.AssertionError: expected:<[(0,8000), (0,32000), 
> (0,48000), (0,72000), (1,78000), (1,3), (1,54000), (0,2000), (0,1), 
> (0,5), (0,66000), (0,74000), (0,82000), (1,8), (1,0), (1,16000), 
> (1,24000), (1,4), (1,56000), (1,64000), (0,12000), (0,28000), (0,52000), 
> (0,6), (0,68000), (0,76000), (1,18000), (1,26000), (1,34000), (1,42000), 
> (1,58000), (0,6000), (0,14000), (0,22000), (0,38000), (0,46000), (0,62000), 
> (0,7), (1,4000), (1,2), (1,36000), (1,44000)]> but was:<[(0,8000), 
> (0,32000), (0,48000), (0,72000), (1,78000), (1,3), (1,54000), (0,2000), 
> (0,1), (0,5), (0,66000), (0,74000), (0,82000), (1,8), (1,0), 
> (1,16000), (1,24000), (1,4), (1,56000), (1,64000), (0,12000), (0,28000), 
> (0,52000), (0,6), (0,68000), (0,76000), (0,1000), (0,25000), (0,33000), 
> (0,41000), (1,18000), (1,26000), (1,34000), (1,42000), (1,58000), (0,6000), 
> (0,14000), (0,22000), (0,38000), (0,46000), (0,62000), (0,7), (1,4000), 
> (1,2), (1,36000), (1,44000)]>
> Jan 19 02:31:53   at org.junit.Assert.fail(Assert.java:89)
> Jan 19 02:31:53   at org.junit.Assert.failNotEquals(Assert.java:835)
> Jan 19 02:31:53   at org.junit.Assert.assertEquals(Assert.java:120)
> Jan 19 02:31:53   at org.junit.Assert.assertEquals(Assert.java:146)
> Jan 19 02:31:53   at 
> org.apache.flink.test.checkpointing.AutoRescalingITCase.testCheckpointRescalingKeyedState(AutoRescalingITCase.java:296)
> Jan 19 02:31:53   at 
> org.apache.flink.test.checkpointing.AutoRescalingITCase.testCheckpointRescalingInKeyedState(AutoRescalingITCase.java:196)
> Jan 19 02:31:53   at java.lang.reflect.Method.invoke(Method.java:498)
> Jan 19 02:31:53   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34145) File source connector support dynamic source parallelism inference in batch jobs

2024-01-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34145.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
11631cb59568df60d40933fb13c8433062ed9290

> File source connector support dynamic source parallelism inference in batch 
> jobs
> 
>
> Key: FLINK-34145
> URL: https://issues.apache.org/jira/browse/FLINK-34145
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / FileSystem
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  has introduced support for dynamic source parallelism inference in batch 
> jobs, and we plan to give priority to enabling this feature for the file 
> source connector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34206) CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed

2024-01-26 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811231#comment-17811231
 ] 

Zhu Zhu commented on FLINK-34206:
-

Disabled {{CacheITCase.testRetryOnCorruptedClusterDataset}} temporarily via 
05ee359ebd564af3dd8ab31975cd479e92ba1785

> CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed
> ---
>
> Key: FLINK-34206
> URL: https://issues.apache.org/jira/browse/FLINK-34206
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: xingbe
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56728=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=8763
> {code}
> Jan 23 01:39:48 01:39:48.152 [ERROR] Tests run: 6, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 19.24 s <<< FAILURE! -- in 
> org.apache.flink.test.streaming.runtime.CacheITCase
> Jan 23 01:39:48 01:39:48.152 [ERROR] 
> org.apache.flink.test.streaming.runtime.CacheITCase.testRetryOnCorruptedClusterDataset(Path)
>  -- Time elapsed: 4.755 s <<< ERROR!
> Jan 23 01:39:48 org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.rpc.pekko.PekkoInvocationHandler.lambda$invokeRpc$1(PekkoInvocationHandler.java:268)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1287)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$1.onComplete(ScalaFutureUtils.java:47)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:310)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:307)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:234)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:231)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$DirectExecutionContext.execute(ScalaFutureUtils.java:65)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
> Jan 23 01:39:48   at 
> 

[jira] [Assigned] (FLINK-34206) CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed

2024-01-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34206:
---

Assignee: xingbe

> CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed
> ---
>
> Key: FLINK-34206
> URL: https://issues.apache.org/jira/browse/FLINK-34206
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: xingbe
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56728=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=8763
> {code}
> Jan 23 01:39:48 01:39:48.152 [ERROR] Tests run: 6, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 19.24 s <<< FAILURE! -- in 
> org.apache.flink.test.streaming.runtime.CacheITCase
> Jan 23 01:39:48 01:39:48.152 [ERROR] 
> org.apache.flink.test.streaming.runtime.CacheITCase.testRetryOnCorruptedClusterDataset(Path)
>  -- Time elapsed: 4.755 s <<< ERROR!
> Jan 23 01:39:48 org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.rpc.pekko.PekkoInvocationHandler.lambda$invokeRpc$1(PekkoInvocationHandler.java:268)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1287)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$1.onComplete(ScalaFutureUtils.java:47)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:310)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:307)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:234)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:231)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$DirectExecutionContext.execute(ScalaFutureUtils.java:65)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
> Jan 23 01:39:48   at 
> org.apache.pekko.pattern.PromiseActorRef.$bang(AskSupport.scala:629)
> Jan 

[jira] [Closed] (FLINK-34223) Introduce a migration tool to transfer legacy config file to new config file

2024-01-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34223.
---
  Assignee: Junrui Li
Resolution: Done

master/release-1.19:
8fceb101e7e45f3fdc9357019757230bd8c16aa7

> Introduce a migration tool to transfer legacy config file to new config file
> 
>
> Key: FLINK-34223
> URL: https://issues.apache.org/jira/browse/FLINK-34223
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Scripts
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> As we transition to new configuration files that adhere to the standard YAML 
> format, users are expected to manually migrate their existing config files. 
> However, this process can be error-prone and time-consuming.
> To simplify the migration, we're introducing an automated script. This script 
> leverages BashJavaUtils to efficiently convert old flink-conf.yaml files into 
> the new config file config.yaml, thereby reducing the effort required for 
> migration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34232) Config file unexpectedly lacks support for env.java.home

2024-01-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34232.
---
  Assignee: Junrui Li
Resolution: Fixed

Fixed in master/release-1.19:
e623c07f4e56fdef1bd8514ccd02df347af5b122

> Config file unexpectedly lacks support for env.java.home
> 
>
> Key: FLINK-34232
> URL: https://issues.apache.org/jira/browse/FLINK-34232
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> We removed the option to set JAVA_HOME in the config file with commit 
> [24091|https://github.com/apache/flink/pull/24091] to improve how we handle 
> standard YAML with BashJavaUtils. But since setting JAVA_HOME is a publicly 
> documented feature, we need to keep it available for users. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34206) CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed

2024-01-25 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811084#comment-17811084
 ] 

Zhu Zhu commented on FLINK-34206:
-

We will take a look.

> CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed
> ---
>
> Key: FLINK-34206
> URL: https://issues.apache.org/jira/browse/FLINK-34206
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Blocker
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56728=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=8763
> {code}
> Jan 23 01:39:48 01:39:48.152 [ERROR] Tests run: 6, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 19.24 s <<< FAILURE! -- in 
> org.apache.flink.test.streaming.runtime.CacheITCase
> Jan 23 01:39:48 01:39:48.152 [ERROR] 
> org.apache.flink.test.streaming.runtime.CacheITCase.testRetryOnCorruptedClusterDataset(Path)
>  -- Time elapsed: 4.755 s <<< ERROR!
> Jan 23 01:39:48 org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.rpc.pekko.PekkoInvocationHandler.lambda$invokeRpc$1(PekkoInvocationHandler.java:268)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1287)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$1.onComplete(ScalaFutureUtils.java:47)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:310)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:307)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:234)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:231)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$DirectExecutionContext.execute(ScalaFutureUtils.java:65)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
> Jan 23 01:39:48   at 
> org.apache.pekko.pattern.PromiseActorRef.$bang(AskSupport.scala:629)
> Jan 23 01:39:48   at 

[jira] [Closed] (FLINK-33577) Make "conf.yaml" as the default Flink configuration file

2024-01-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33577.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
9721ce835f5a7f28f2ad187346e009633307097b

> Make "conf.yaml" as the default Flink configuration file
> 
>
> Key: FLINK-33577
> URL: https://issues.apache.org/jira/browse/FLINK-33577
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> This update ensures that the flink-dist package in FLINK will include the new 
> configuration file "conf.yaml" by default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34144) Update the documentation and configuration description about dynamic source parallelism inference

2024-01-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34144.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
78e31f0dcc4da65d88e18560f3374a47cc0a7c9b

> Update the documentation and configuration description about dynamic source 
> parallelism inference
> -
>
> Key: FLINK-34144
> URL: https://issues.apache.org/jira/browse/FLINK-34144
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource]
>  introduces the new feature of dynamic source parallelism inference, and we 
> plan to update the documentation and configuration items accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34229) Duplicate entry in InnerClasses attribute in class file FusionStreamOperator

2024-01-25 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810856#comment-17810856
 ] 

Zhu Zhu commented on FLINK-34229:
-

[~FrankZou] the query plan can be different between q35 of `flink-tpcds-test` 
and q35 of a 10TB TPC-DS benchmark. e.g., DPP or runtime filters may not be 
created in `flink-tpcds-test` due to the data size is very small.

> Duplicate entry in InnerClasses attribute in class file FusionStreamOperator
> 
>
> Key: FLINK-34229
> URL: https://issues.apache.org/jira/browse/FLINK-34229
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.19.0
>Reporter: xingbe
>Priority: Major
> Attachments: image-2024-01-24-17-05-47-883.png
>
>
> I noticed a runtime error happens in 10TB TPC-DS (q35.sql) benchmarks in 
> 1.19, the problem did not happen in 1.18.0. This issue may have been newly 
> introduced recently. !image-2024-01-24-17-05-47-883.png|width=589,height=279!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34205) Update flink-docker's Dockerfile and docker-entrypoint.sh to use BashJavaUtils for Flink configuration management

2024-01-23 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34205.
---
Fix Version/s: 1.19.0
   Resolution: Done

dev-master:
44f058287cc956a620b12b6f8ed214e44dc3db77

> Update flink-docker's Dockerfile and docker-entrypoint.sh to use 
> BashJavaUtils for Flink configuration management
> -
>
> Key: FLINK-34205
> URL: https://issues.apache.org/jira/browse/FLINK-34205
> Project: Flink
>  Issue Type: Sub-task
>  Components: flink-docker
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The flink-docker's Dockerfile and docker-entrypoint.sh currently use shell 
> scripting techniques with grep and sed for configuration reading and 
> modification. This method is not suitable for the standard YAML configuration 
> format.
> Following the changes introduced in FLINK-33721, we should update 
> flink-docker's Dockerfile and docker-entrypoint.sh to use BashJavaUtils for 
> Flink configuration reading and writing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34205) Update flink-docker's Dockerfile and docker-entrypoint.sh to use BashJavaUtils for Flink configuration management

2024-01-23 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34205:
---

Assignee: Junrui Li

> Update flink-docker's Dockerfile and docker-entrypoint.sh to use 
> BashJavaUtils for Flink configuration management
> -
>
> Key: FLINK-34205
> URL: https://issues.apache.org/jira/browse/FLINK-34205
> Project: Flink
>  Issue Type: Sub-task
>  Components: flink-docker
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> The flink-docker's Dockerfile and docker-entrypoint.sh currently use shell 
> scripting techniques with grep and sed for configuration reading and 
> modification. This method is not suitable for the standard YAML configuration 
> format.
> Following the changes introduced in FLINK-33721, we should update 
> flink-docker's Dockerfile and docker-entrypoint.sh to use BashJavaUtils for 
> Flink configuration reading and writing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34143) Modify the effective strategy of `execution.batch.adaptive.auto-parallelism.default-source-parallelism`

2024-01-23 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34143.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
97caa3c251e416640a6f54ea103912839c346f70

> Modify the effective strategy of 
> `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
> ---
>
> Key: FLINK-34143
> URL: https://issues.apache.org/jira/browse/FLINK-34143
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, if users do not set the 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}}
>  configuration option, the AdaptiveBatchScheduler defaults to a parallelism 
> of 1 for source vertices. In 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource],
>  the value of 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}}
>  will act as the upper bound for inferring dynamic source parallelism, and 
> continuing with the current policy is no longer appropriate.
> We plan to change the effectiveness strategy of 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}};
>  when the user does not set this config option, we will use the value of 
> `{{{}execution.batch.adaptive.auto-parallelism.max-parallelism`{}}} as the 
> upper bound for source parallelism inference. If 
> {{`execution.batch.adaptive.auto-parallelism.max-parallelism`}} is also not 
> configured, the value of `{{{}parallelism.default`{}}} will be used as a 
> fallback.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34144) Update the documentation and configuration description about dynamic source parallelism inference

2024-01-23 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34144:
---

Assignee: xingbe

> Update the documentation and configuration description about dynamic source 
> parallelism inference
> -
>
> Key: FLINK-34144
> URL: https://issues.apache.org/jira/browse/FLINK-34144
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource]
>  introduces the new feature of dynamic source parallelism inference, and we 
> plan to update the documentation and configuration items accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34145) File source connector support dynamic source parallelism inference in batch jobs

2024-01-23 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34145:
---

Assignee: xingbe

> File source connector support dynamic source parallelism inference in batch 
> jobs
> 
>
> Key: FLINK-34145
> URL: https://issues.apache.org/jira/browse/FLINK-34145
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / FileSystem
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  has introduced support for dynamic source parallelism inference in batch 
> jobs, and we plan to give priority to enabling this feature for the file 
> source connector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33721) Extend BashJavaUtils to Support Reading and Writing Standard YAML Files

2024-01-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33721.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
1f7622d4d23bfcb76f466469ec36585054864f04
c148b62166d8fec9e7e525a836de890c2f12973b

> Extend BashJavaUtils to Support Reading and Writing Standard YAML Files
> ---
>
> Key: FLINK-33721
> URL: https://issues.apache.org/jira/browse/FLINK-33721
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, Flink's shell scripts, such as those used for end-to-end (e2e) 
> testing and Docker image building, require the ability to read from and 
> modify Flink's configuration files. With the introduction of standard YAML 
> files as the configuration format, the existing shell scripts are not 
> equipped to correctly handle read and modify operations on these files.
> To accommodate this requirement and enhance our script capabilities, we 
> propose an extension to the BashJavaUtils functionality. This extension will 
> enable BashJavaUtils to support the reading and modifying of standard YAML 
> files, ensuring that our shell scripts can seamlessly interact with the new 
> configuration format.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33768) Support dynamic source parallelism inference for batch jobs

2024-01-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33768.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
c3c836216eaaaf24c1add3b490c8f425fda01d7c

> Support dynamic source parallelism inference for batch jobs
> ---
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> `{*}execution.batch.adaptive.auto-parallelism.default-source-parallelism{*}` 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}}  is used as the source parallelism, which is actually a temporary 
> implementation solution.
> We aim to support dynamic source parallelism inference for batch jobs. More 
> details see 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-18 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808457#comment-17808457
 ] 

Zhu Zhu commented on FLINK-34105:
-

[~lsdy] Sounds good to me. Feel free to open a PR for it.

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: Yangze Guo
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34144) Update the documentation and configuration description about dynamic source parallelism inference

2024-01-18 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808424#comment-17808424
 ] 

Zhu Zhu commented on FLINK-34144:
-

[~xiasun] Assigned. Feel free to open a PR for it.

> Update the documentation and configuration description about dynamic source 
> parallelism inference
> -
>
> Key: FLINK-34144
> URL: https://issues.apache.org/jira/browse/FLINK-34144
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: xingbe
>Priority: Major
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource]
>  introduces the new feature of dynamic source parallelism inference, and we 
> plan to update the documentation and configuration items accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34143) Modify the effective strategy of `execution.batch.adaptive.auto-parallelism.default-source-parallelism`

2024-01-18 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808423#comment-17808423
 ] 

Zhu Zhu commented on FLINK-34143:
-

[~xiasun]Assigned. Feel free to open a PR for it.

> Modify the effective strategy of 
> `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
> ---
>
> Key: FLINK-34143
> URL: https://issues.apache.org/jira/browse/FLINK-34143
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>
> Currently, if users do not set the 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}}
>  configuration option, the AdaptiveBatchScheduler defaults to a parallelism 
> of 1 for source vertices. In 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource],
>  the value of 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}}
>  will act as the upper bound for inferring dynamic source parallelism, and 
> continuing with the current policy is no longer appropriate.
> We plan to change the effectiveness strategy of 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}};
>  when the user does not set this config option, we will use the value of 
> `{{{}execution.batch.adaptive.auto-parallelism.max-parallelism`{}}} as the 
> upper bound for source parallelism inference. If 
> {{`execution.batch.adaptive.auto-parallelism.max-parallelism`}} is also not 
> configured, the value of `{{{}parallelism.default`{}}} will be used as a 
> fallback.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34143) Modify the effective strategy of `execution.batch.adaptive.auto-parallelism.default-source-parallelism`

2024-01-18 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808423#comment-17808423
 ] 

Zhu Zhu edited comment on FLINK-34143 at 1/19/24 1:54 AM:
--

[~xiasun] Assigned. Feel free to open a PR for it.


was (Author: zhuzh):
[~xiasun]Assigned. Feel free to open a PR for it.

> Modify the effective strategy of 
> `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
> ---
>
> Key: FLINK-34143
> URL: https://issues.apache.org/jira/browse/FLINK-34143
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>
> Currently, if users do not set the 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}}
>  configuration option, the AdaptiveBatchScheduler defaults to a parallelism 
> of 1 for source vertices. In 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource],
>  the value of 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}}
>  will act as the upper bound for inferring dynamic source parallelism, and 
> continuing with the current policy is no longer appropriate.
> We plan to change the effectiveness strategy of 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}};
>  when the user does not set this config option, we will use the value of 
> `{{{}execution.batch.adaptive.auto-parallelism.max-parallelism`{}}} as the 
> upper bound for source parallelism inference. If 
> {{`execution.batch.adaptive.auto-parallelism.max-parallelism`}} is also not 
> configured, the value of `{{{}parallelism.default`{}}} will be used as a 
> fallback.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34143) Modify the effective strategy of `execution.batch.adaptive.auto-parallelism.default-source-parallelism`

2024-01-18 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34143:
---

Assignee: xingbe

> Modify the effective strategy of 
> `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
> ---
>
> Key: FLINK-34143
> URL: https://issues.apache.org/jira/browse/FLINK-34143
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>
> Currently, if users do not set the 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}}
>  configuration option, the AdaptiveBatchScheduler defaults to a parallelism 
> of 1 for source vertices. In 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource],
>  the value of 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}}
>  will act as the upper bound for inferring dynamic source parallelism, and 
> continuing with the current policy is no longer appropriate.
> We plan to change the effectiveness strategy of 
> `{{{}execution.batch.adaptive.auto-parallelism.default-source-parallelism`{}}};
>  when the user does not set this config option, we will use the value of 
> `{{{}execution.batch.adaptive.auto-parallelism.max-parallelism`{}}} as the 
> upper bound for source parallelism inference. If 
> {{`execution.batch.adaptive.auto-parallelism.max-parallelism`}} is also not 
> configured, the value of `{{{}parallelism.default`{}}} will be used as a 
> fallback.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-16 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807554#comment-17807554
 ] 

Zhu Zhu commented on FLINK-34105:
-

I think it is a regression because existing jobs can become unstable.
Is it possible that we use a thread pool to do parallelized serialization 
before conducting task submission, so that it will not be counted as part of 
pekko RPC process?

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: Yangze Guo
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34116) GlobalConfigurationTest.testInvalidStandardYamlFile fails

2024-01-16 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34116.
---
Fix Version/s: 1.19.0
 Assignee: Junrui Li
   Resolution: Fixed

Fixed via
cdf314d30b59994283e0bbf70f350618de02118c

> GlobalConfigurationTest.testInvalidStandardYamlFile fails
> -
>
> Key: FLINK-34116
> URL: https://issues.apache.org/jira/browse/FLINK-34116
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> All build failures in the same build:
>  * 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56416=logs=f0ac5c25-1168-55a5-07ff-0e88223afed9=50bf7a25-bdc4-5e56-5478-c7b4511dde53=6274]
>  * 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56416=logs=675bf62c-8558-587e-2555-dcad13acefb5=5878eed3-cc1e-5b12-1ed0-9e7139ce0992=6256]
>  * 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56416=logs=d06b80b4-9e88-5d40-12a2-18072cf60528=609ecd5a-3f6e-5d0c-2239-2096b155a4d0=6176]
> {code:java}
> Jan 16 01:38:10 01:38:10.780 [ERROR] Failures: 
> Jan 16 01:38:10 01:38:10.781 [ERROR]   
> GlobalConfigurationTest.testInvalidStandardYamlFile:200 
> Jan 16 01:38:10 Multiple Failures (1 failure)
> Jan 16 01:38:10 -- failure 1 --
> Jan 16 01:38:10 Expecting actual:
> Jan 16 01:38:10   "java.lang.RuntimeException: Error parsing YAML 
> configuration.
> Jan 16 01:38:10   at 
> org.apache.flink.configuration.GlobalConfiguration.loadYAMLResource(GlobalConfiguration.java:351)
> Jan 16 01:38:10   at 
> org.apache.flink.configuration.GlobalConfiguration.loadConfiguration(GlobalConfiguration.java:162)
> Jan 16 01:38:10   at 
> org.apache.flink.configuration.GlobalConfiguration.loadConfiguration(GlobalConfiguration.java:115)
> Jan 16 01:38:10   at 
> org.apache.flink.configuration.GlobalConfigurationTest.lambda$testInvalidStandardYamlFile$3(GlobalConfigurationTest.java:198)
> Jan 16 01:38:10   at 
> org.assertj.core.api.ThrowableAssert.catchThrowable(ThrowableAssert.java:63)
> Jan 16 01:38:10   at 
> org.assertj.core.api.AssertionsForClassTypes.catchThrowable(AssertionsForClassTypes.java:892)
> Jan 16 01:38:10   at 
> org.assertj.core.api.Assertions.catchThrowable(Assertions.java:1366)
> Jan 16 01:38:10   at 
> org.assertj.core.api.Assertions.assertThatThrownBy(Assertions.java:1210)
> Jan 16 01:38:10   at 
> org.apache.flink.configuration.GlobalConfigurationTest.testInvalidStandardYamlFile(GlobalConfigurationTest.java:198)
> Jan 16 01:38:10   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jan 16 01:38:10   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jan 16 01:38:10   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jan 16 01:38:10   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> [...] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-16 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807505#comment-17807505
 ] 

Zhu Zhu commented on FLINK-34105:
-

We tried increasing {{pekko.ask.timeout}} to {{1min}}(from the default 
{{10s}}), and the problem did not happen again.
So I guess it's not related to the framesize.

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: Yangze Guo
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-16 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807203#comment-17807203
 ] 

Zhu Zhu commented on FLINK-34105:
-

[~guoyangze] pekko.framesize is not configured and the default value should 
have been used.
Here are all the configuration:
```
parallelism.default: 1500
slotmanager.number-of-slots.max: 1500
taskmanager.numberOfTaskSlots: 10
jobmanager.memory.process.size: 24000m
taskmanager.memory.process.size: 24000m
resourcemanager.taskmanager-timeout: 90
taskmanager.memory.network.fraction: 0.2
cluster.evenly-spread-out-slots: true

table.optimizer.join-reorder-enabled: true
table.optimizer.join.broadcast-threshold: 10485760
table.exec.operator-fusion-codegen.enabled: true
```

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: Yangze Guo
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33980) Reorganize job configuration

2024-01-15 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33980.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
06b37089f0c1cdf70ca21970a40d15c3eaba07ed
290b633c4265540d481ac563454c7f4a3e706b9d
65b8b3baad6a27e6253a93701508ba24dc8fcfe0
d02ef1cebf302c56a0d9d51664d2c7fb6f5be932
eb8af0c589ce46b091f403e848c7dc84b7e3ee8b

> Reorganize job configuration
> 
>
> Key: FLINK-33980
> URL: https://issues.apache.org/jira/browse/FLINK-33980
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, job configuration in FLINK is spread out across different 
> components, including StreamExecutionEnvironment, CheckpointConfig, and 
> ExecutionConfig. This distribution leads to inconsistencies among the 
> configurations stored within these components. Furthermore, the methods used 
> to configure these components vary; some rely on complex Java objects, while 
> others use ConfigOption, which is a key-value configuration approach. This 
> variation complicates the effective management of job configurations. 
> Additionally, passing complex Java objects (e.g., StateBackend and 
> CheckpointStorage) between the environment, StreamGraph, and JobGraph adds 
> complexity to development.
> With the completion of FLIP-381, it is now time to standardize and unify job 
> configuration in FLINK. The goals of this JIRA are as follows:
>  # Migrate configuration from non-ConfigOption objects to use ConfigOption.
>  # Adopt a single Configuration object to house all configurations.
>  # Create complex Java objects, such as RestartBackoffTimeStrategyFactory, 
> CheckpointStorage, and StateBackend, directly from the configuration on the 
> JM side.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-15 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34105:

Priority: Critical  (was: Major)

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-15 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807052#comment-17807052
 ] 

Zhu Zhu edited comment on FLINK-34105 at 1/16/24 6:06 AM:
--

[~lsdy] [~guoyangze] would you help to take a look?

This indicates a problem that existing large Flink batch jobs may become 
unstable in 1.19.


was (Author: zhuzh):
[~lsdy] [~guoyangze] would you help to take a look?

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Priority: Major
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-15 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34105:

Description: 
We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The problem 
did not happen in 1.18.0.
After bisecting, we find the problem was introduced in FLINK-33532.

 !image-2024-01-16-13-59-45-556.png|width=250! 

  was:
We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The problem 
did not happen in 1.18.0.
After bisecting, we find the problem was introduced in FLINK-33532.

 !image-2024-01-16-13-59-45-556.png! 


> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Priority: Major
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=250! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-15 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34105:

Description: 
We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The problem 
did not happen in 1.18.0.
After bisecting, we find the problem was introduced in FLINK-33532.

 !image-2024-01-16-13-59-45-556.png|width=800! 

  was:
We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The problem 
did not happen in 1.18.0.
After bisecting, we find the problem was introduced in FLINK-33532.

 !image-2024-01-16-13-59-45-556.png|width=600! 


> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Priority: Major
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-15 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34105:

Description: 
We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The problem 
did not happen in 1.18.0.
After bisecting, we find the problem was introduced in FLINK-33532.

 !image-2024-01-16-13-59-45-556.png|width=600! 

  was:
We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The problem 
did not happen in 1.18.0.
After bisecting, we find the problem was introduced in FLINK-33532.

 !image-2024-01-16-13-59-45-556.png|width=250! 


> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Priority: Major
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=600! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-15 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807052#comment-17807052
 ] 

Zhu Zhu commented on FLINK-34105:
-

[~lsdy] [~guoyangze] would you help to take a look?

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Priority: Major
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-15 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-34105:
---

 Summary: Akka timeout happens in TPC-DS benchmarks
 Key: FLINK-34105
 URL: https://issues.apache.org/jira/browse/FLINK-34105
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.19.0
Reporter: Zhu Zhu
 Attachments: image-2024-01-16-13-59-45-556.png

We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The problem 
did not happen in 1.18.0.
After bisecting, we find the problem was introduced in FLINK-33532.

 !image-2024-01-16-13-59-45-556.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33576) Introduce new Flink conf file "config.yaml" supporting standard YAML syntax

2024-01-15 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33576.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19: 
ddc7171dd83ade50e6b551d3bc2d31acb5ff6451
b6b2a0f7abebc92f849bfa3ef6f1a71f25b9672d
0153b903d4df2f49c95ee2c53737aa5e492439be

> Introduce new Flink conf file "config.yaml" supporting standard YAML syntax
> ---
>
> Key: FLINK-33576
> URL: https://issues.apache.org/jira/browse/FLINK-33576
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Introduce new Flink conf file config.yaml, and this file will be parsed by 
> standard YAML syntax.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33968) Compute the number of subpartitions when initializing executon job vertices

2024-01-04 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33968.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/1.19: b25dfaee80727d6662a5fd445fe51cc139a8b9eb

> Compute the number of subpartitions when initializing executon job vertices
> ---
>
> Key: FLINK-33968
> URL: https://issues.apache.org/jira/browse/FLINK-33968
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Lijie Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, when using dynamic graphs, the subpartition-num of a task is 
> lazily calculated until the task deployment moment, this may lead to some 
> uncertainties in job recovery scenarios:
> Before jm crashs, when deploying upstream tasks, the parallelism of 
> downstream vertex may be unknown, so the subpartiton-num will be the max 
> parallelism of downstream job vertex. However, after jm restarts, when 
> deploying upstream tasks, the parallelism of downstream job vertex may be 
> known(has been calculated before jm crashs and been recovered after jm 
> restarts), so the subpartiton-num will be the actual parallelism of 
> downstream job vertex. The difference of calculated subpartition-num will 
> lead to the partitions generated before jm crashs cannot be reused after jm 
> restarts.
> We will solve this problem by advancing the calculation of subpartitoin-num 
> to the moment of initializing executon job vertex (in ctor of 
> IntermediateResultPartition)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


  1   2   3   4   5   6   7   8   9   10   >