[GitHub] [flink] zhuzhurk commented on a diff in pull request #21970: [FLINK-31041][runtime] Fix multiple restoreState when GlobalFailure occurs in a short period.

2023-02-21 Thread via GitHub


zhuzhurk commented on code in PR #21970:
URL: https://github.com/apache/flink/pull/21970#discussion_r1112513443


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java:
##
@@ -377,6 +377,10 @@ private void restartTasks(
         final Set<ExecutionVertexID> verticesToRestart =
                 executionVertexVersioner.getUnmodifiedExecutionVertices(executionVertexVersions);
 
+        if (verticesToRestart.isEmpty()) {
+            return;

Review Comment:
   I'm not entirely sure, but I am a bit concerned that Flink may take some 
important actions in 
`CheckpointCoordinator#restoreLatestCheckpointedStateInternal()` even if 
`verticesToRestart` is empty, e.g. invoking 
`OperatorCoordinator#resetToCheckpoint(...)`. These actions were always taken 
previously, but may be skipped after this change (when a global failover and a 
regional failover happen concurrently).
   
   I haven't had the chance to examine it thoroughly yet. It would be 
appreciated if you could also help examine this.
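
   For context, a self-contained sketch of the version-guard semantics under 
discussion (simplified: `String` stands in for `ExecutionVertexID`, and plain 
maps stand in for the `ExecutionVertexVersioner`):
   ```java
   import java.util.Collections;
   import java.util.Map;
   import java.util.Set;
   import java.util.stream.Collectors;

   public class RestartGuardSketch {

       /** Vertices whose recorded version still matches the current one. */
       static Set<String> unmodifiedVertices(
               Map<String, Long> recorded, Map<String, Long> current) {
           return recorded.entrySet().stream()
                   .filter(e -> e.getValue().equals(current.get(e.getKey())))
                   .map(Map.Entry::getKey)
                   .collect(Collectors.toSet());
       }

       static void restartTasks(Map<String, Long> recorded, Map<String, Long> current) {
           Set<String> verticesToRestart = unmodifiedVertices(recorded, current);
           if (verticesToRestart.isEmpty()) {
               // The concern above: returning here also skips the restore path that
               // would otherwise reach the CheckpointCoordinator/OperatorCoordinator.
               return;
           }
           System.out.println("restoring state and restarting " + verticesToRestart);
       }

       public static void main(String[] args) {
           // A global failover bumped the version in between: the restart is dropped.
           restartTasks(Collections.singletonMap("v1", 1L), Collections.singletonMap("v1", 2L));
           // Versions still match: the restore and restart proceed.
           restartTasks(Collections.singletonMap("v1", 3L), Collections.singletonMap("v1", 3L));
       }
   }
   ```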



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-ml] lindong28 commented on a diff in pull request #214: [FLINK-31127] Add public API classes for FLIP-289

2023-02-21 Thread via GitHub


lindong28 commented on code in PR #214:
URL: https://github.com/apache/flink-ml/pull/214#discussion_r1113910042


##
flink-ml-servable-core/src/main/java/org/apache/flink/ml/servable/api/DataFrame.java:
##
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.servable.api;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.ml.servable.types.DataType;
+
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * A DataFrame consists of some number of rows, each of which has the same list of column names
+ * and data types.
+ *
+ * <p>All values in a column must have the same data type: integer, float, string etc.
+ */
+@PublicEvolving
+public class DataFrame {
+
+    private final List<String> columnNames;
+    private final List<DataType> dataTypes;
+    private final List<Row> rows;
+
+    public DataFrame(List<String> columnNames, List<DataType> dataTypes, List<Row> rows) {
+        int numColumns = columnNames.size();
+        if (dataTypes.size() != numColumns) {
+            throw new IllegalArgumentException(
+                    "The number of data types is different from the number of column names.");
+        }
+        for (Row row : rows) {
+            if (row.size() != numColumns) {
+                throw new IllegalArgumentException(
+                        "The number of values in some row is different from the number of column names");
+            }
+        }
+
+        this.columnNames = columnNames;
+        this.dataTypes = dataTypes;
+        this.rows = rows;
+    }
+
+    /** Returns a list of the names of all the columns in this DataFrame. */
+    public List<String> getColumnNames() {
+        return columnNames;
+    }
+
+    /**
+     * Returns the index of the column with the given name.
+     *
+     * @throws IllegalArgumentException if the column is not present in this table
+     */
+    public int getIndex(String name) {
+        for (int i = 0; i < columnNames.size(); i++) {
+            if (columnNames.get(i).equals(name)) {
+                return i;
+            }
+        }
+        throw new IllegalArgumentException("Failed to find the column with the given name.");
+    }
+
+    /**
+     * Returns the data type of the column with the given name.
+     *
+     * @throws IllegalArgumentException if the column is not present in this table
+     */
+    public DataType getDataType(String name) {
+        int index = getIndex(name);
+        return dataTypes.get(index);
+    }
+
+    /**
+     * Adds to this DataFrame a column with the given name, data type, and values.
+     *
+     * @throws IllegalArgumentException if the number of values is different from the number of
+     *     rows.
+     */
+    public DataFrame addColumn(String columnName, DataType dataType, List<Object> values) {
+        if (values.size() != rows.size()) {
+            throw new RuntimeException(
+                    "The number of values is different from the number of rows.");
+        }
+        columnNames.add(columnName);
+        dataTypes.add(dataType);

Review Comment:
   Thanks for catching this. It is fixed now.
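
   For reference, a short usage sketch of the API above (assuming a 
`Row(List<Object>)` constructor and `DataTypes` constants such as 
`INT`/`STRING`/`DOUBLE` from the same package; both are assumptions, not shown 
in this diff):
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.List;

   import org.apache.flink.ml.servable.api.DataFrame;
   import org.apache.flink.ml.servable.api.Row;
   import org.apache.flink.ml.servable.types.DataType;
   import org.apache.flink.ml.servable.types.DataTypes;

   public class DataFrameSketch {
       public static void main(String[] args) {
           // Mutable lists, since addColumn appends to them in place.
           List<String> columnNames = new ArrayList<>(Arrays.asList("id", "name"));
           List<DataType> dataTypes =
                   new ArrayList<>(Arrays.asList(DataTypes.INT, DataTypes.STRING));
           List<Row> rows =
                   new ArrayList<>(
                           Collections.singletonList(new Row(Arrays.asList(1, "flink"))));

           DataFrame df = new DataFrame(columnNames, dataTypes, rows);
           df.addColumn("score", DataTypes.DOUBLE, Collections.singletonList(0.9));
           System.out.println(df.getIndex("score")); // 2
           System.out.println(df.getDataType("score")); // the DOUBLE data type
       }
   }
   ```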



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-31180) Fail early when installing minikube and check whether we can retry

2023-02-21 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-31180:
-

 Summary: Fail early when installing minikube and check whether we 
can retry
 Key: FLINK-31180
 URL: https://issues.apache.org/jira/browse/FLINK-31180
 Project: Flink
  Issue Type: Improvement
  Components: Test Infrastructure
Affects Versions: 1.16.1, 1.15.3, 1.17.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46367&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&t=b2642e3a-5b86-574d-4c8a-f7e2842bfb14&l=4726

We experienced a build failure where Minikube couldn't be installed due to some 
network issues. Two things we could do here:
* check whether we can add a retry loop (maybe also for other resources)
* fail early if CI didn't manage to install the binaries
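
A minimal sketch of the retry-loop idea (illustrative only; the actual CI 
install scripts are shell-based, this just shows the shape of the loop with a 
hypothetical download helper):
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class RetryingDownload {

    /** Downloads a binary, retrying on transient errors and failing early afterwards. */
    static void download(String url, Path target, int maxAttempts)
            throws IOException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try (InputStream in = new URL(url).openStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                return;
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // fail early once retries are exhausted
                }
                Thread.sleep(1_000L * attempt); // simple linear backoff
            }
        }
    }
}
{code}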



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-table-store] JingsongLi commented on pull request #547: [FLINK-31128] Add Create Table As for flink table store

2023-02-21 Thread via GitHub


JingsongLi commented on PR #547:
URL: 
https://github.com/apache/flink-table-store/pull/547#issuecomment-1439554535

   > Thanks @zhangjun0x01 , at present, the biggest problem with CTAS is that 
it cannot specify the primary key and partition column, which makes it almost 
unavailable for production. Or can we consider supporting them?
   
   @zhangjun0x01 The core problem is that Flink SQL CTAS does not support 
primary key / partition key declaration. Maybe we can provide options for the 
primary key and partition key, just like the Spark CREATE TABLE support for 
table store. What do you think?
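
   For illustration only, roughly what such option-based declarations could 
look like (the `primary-key` and `partition` option names below are 
hypothetical, not an existing table store API; assumes a registered `source` 
table):
   ```java
   import org.apache.flink.table.api.EnvironmentSettings;
   import org.apache.flink.table.api.TableEnvironment;

   public class CtasOptionsSketch {
       public static void main(String[] args) {
           TableEnvironment tEnv =
                   TableEnvironment.create(EnvironmentSettings.inStreamingMode());
           // Hypothetical: declare primary key and partition key through WITH
           // options, since Flink SQL CTAS cannot declare them in the schema today.
           tEnv.executeSql(
                   "CREATE TABLE target WITH ("
                           + " 'primary-key' = 'id',"
                           + " 'partition' = 'dt'"
                           + ") AS SELECT id, dt, v FROM source");
       }
   }
   ```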


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #21992: [BP-1.17][FLINK-31132] Let hive compact operator without setting parallelism subject to sink operator's configured parallelism.

2023-02-21 Thread via GitHub


flinkbot commented on PR #21992:
URL: https://github.com/apache/flink/pull/21992#issuecomment-1439566892

   
   ## CI report:
   
   * 2da857058638b87b37c50be1cc402022fdc3 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] liuyongvs opened a new pull request, #21993: [FLINK-31166][table] Fix array_contains does not support null argumen…

2023-02-21 Thread via GitHub


liuyongvs opened a new pull request, #21993:
URL: https://github.com/apache/flink/pull/21993

   ## What is the purpose of the change
   
   *Fix array_contains not supporting a null argument when the array element 
type is not nullable.*
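
   A minimal reproduction sketch (assuming a build where `ARRAY_CONTAINS` is 
available, i.e. 1.18-SNAPSHOT):
   ```java
   import org.apache.flink.table.api.EnvironmentSettings;
   import org.apache.flink.table.api.TableEnvironment;

   public class ArrayContainsNullRepro {
       public static void main(String[] args) {
           TableEnvironment tEnv =
                   TableEnvironment.create(EnvironmentSettings.inBatchMode());
           // ARRAY[1, 2, 3] has a NOT NULL element type, so before this fix the
           // NULL needle was rejected instead of being evaluated.
           tEnv.executeSql("SELECT ARRAY_CONTAINS(ARRAY[1, 2, 3], CAST(NULL AS INT))")
                   .print();
       }
   }
   ```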
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] dannycranmer merged pull request #21970: [FLINK-31041][runtime] Fix multiple restoreState when GlobalFailure occurs in a short period.

2023-02-21 Thread via GitHub


dannycranmer merged PR #21970:
URL: https://github.com/apache/flink/pull/21970


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] dannycranmer commented on pull request #21970: [FLINK-31041][runtime] Fix multiple restoreState when GlobalFailure occurs in a short period.

2023-02-21 Thread via GitHub


dannycranmer commented on PR #21970:
URL: https://github.com/apache/flink/pull/21970#issuecomment-1439572097

   Thanks @huwh, seems like all open questions are resolved and CI passes. 
Merging.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-ml] jiangxin369 commented on a diff in pull request #214: [FLINK-31127] Add public API classes for FLIP-289

2023-02-21 Thread via GitHub


jiangxin369 commented on code in PR #214:
URL: https://github.com/apache/flink-ml/pull/214#discussion_r1113903069


##
flink-ml-servable-core/src/main/java/org/apache/flink/ml/servable/api/ModelServable.java:
##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.servable.api;
+
+import org.apache.flink.annotation.PublicEvolving;
+
+import java.io.IOException;
+import java.io.InputStream;
+
+/**
+ * A ModelServable is a TransformerServable with the extra API to set model data.
+ *
+ * @param <T> The class type of the ModelServable implementation itself.
+ */
+@PublicEvolving
+public interface ModelServable<T extends ModelServable<T>> extends TransformerServable<T> {
+
+    /** Sets model data using the serialized model data from the given input stream. */

Review Comment:
   ```suggestion
       /** Sets model data using the serialized model data from the given input streams. */
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31061) Release Testing: Verify FLINK-30376 Introduce a new flink bushy join reorder rule which based on greedy algorithm

2023-02-21 Thread Yunhong Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691961#comment-17691961
 ] 

Yunhong Zheng commented on FLINK-31061:
---

Thanks, [~JunRuiLi], for your careful testing. No problems were found during 
the test, so I will close this issue.

> Release Testing: Verify FLINK-30376 Introduce a new flink bushy join reorder 
> rule which based on greedy algorithm
> -
>
> Key: FLINK-31061
> URL: https://issues.apache.org/jira/browse/FLINK-31061
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Affects Versions: 1.17.0
>Reporter: Yunhong Zheng
>Assignee: Junrui Li
>Priority: Major
> Fix For: 1.17.0
>
>
> This issue aims to verify FLINK-30376: [Introduce a new flink bushy join 
> reorder rule which based on greedy 
> algorithm|https://issues.apache.org/jira/browse/FLINK-30376].
>  In Flink 1.17, the bushy join reorder strategy is the default join reorder 
> strategy, and this strategy can be disabled by setting the option 
> 'table.optimizer.bushy-join-reorder-threshold' to a value smaller than the 
> number of tables to be reordered. If disabled, the Lopt join reorder 
> strategy, which is the default join reorder strategy in Flink 1.16, will be 
> chosen. 
> We can verify it in the SQL client after we build the flink-dist package.
>  # Firstly, we need to create several tables (the best case is that these 
> tables have table and column statistics).
>  # Secondly, we need to set 'table.optimizer.join-reorder-enabled = true' to 
> enable join reorder.
>  # Verify bushy join reorder (the default bushy join reorder threshold is 
> 12, so if the number of tables is smaller than 12, the join reorder strategy 
> is bushy join reorder).
>  # Compare the results of the bushy join reorder and Lopt join reorder 
> strategies; they need to be the same.
>  # If we want to create a bushy join tree after join reorder, we need to add 
> statistics, like 'JoinReorderITCaseBase.testBushyTreeJoinReorder'. 
> If you meet any problems, feel free to ping me directly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] zhuzhurk commented on a diff in pull request #21970: [FLINK-31041][runtime] Fix multiple restoreState when GlobalFailure occurs in a short period.

2023-02-21 Thread via GitHub


zhuzhurk commented on code in PR #21970:
URL: https://github.com/apache/flink/pull/21970#discussion_r1113903871


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java:
##
@@ -377,6 +377,10 @@ private void restartTasks(
         final Set<ExecutionVertexID> verticesToRestart =
                 executionVertexVersioner.getUnmodifiedExecutionVertices(executionVertexVersions);
 
+        if (verticesToRestart.isEmpty()) {
+            return;

Review Comment:
   Yes, you are right. If a global failover is in progress, no regional 
failover will be triggered.
   Thanks for the explanation!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-31061) Release Testing: Verify FLINK-30376 Introduce a new flink bushy join reorder rule which based on greedy algorithm

2023-02-21 Thread Yunhong Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yunhong Zheng closed FLINK-31061.
-
Resolution: Fixed

> Release Testing: Verify FLINK-30376 Introduce a new flink bushy join reorder 
> rule which based on greedy algorithm
> -
>
> Key: FLINK-31061
> URL: https://issues.apache.org/jira/browse/FLINK-31061
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Affects Versions: 1.17.0
>Reporter: Yunhong Zheng
>Assignee: Junrui Li
>Priority: Major
> Fix For: 1.17.0
>
>
> This issue aims to verify FLINK-30376: [Introduce a new flink bushy join 
> reorder rule which based on greedy 
> algorithm|https://issues.apache.org/jira/browse/FLINK-30376].
>  In Flink 1.17, the bushy join reorder strategy is the default join reorder 
> strategy, and this strategy can be disabled by setting the option 
> 'table.optimizer.bushy-join-reorder-threshold' to a value smaller than the 
> number of tables to be reordered. If disabled, the Lopt join reorder 
> strategy, which is the default join reorder strategy in Flink 1.16, will be 
> chosen. 
> We can verify it in the SQL client after we build the flink-dist package.
>  # Firstly, we need to create several tables (the best case is that these 
> tables have table and column statistics).
>  # Secondly, we need to set 'table.optimizer.join-reorder-enabled = true' to 
> enable join reorder.
>  # Verify bushy join reorder (the default bushy join reorder threshold is 
> 12, so if the number of tables is smaller than 12, the join reorder strategy 
> is bushy join reorder).
>  # Compare the results of the bushy join reorder and Lopt join reorder 
> strategies; they need to be the same.
>  # If we want to create a bushy join tree after join reorder, we need to add 
> statistics, like 'JoinReorderITCaseBase.testBushyTreeJoinReorder'. 
> If you meet any problems, feel free to ping me directly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-ml] lindong28 commented on pull request #214: [FLINK-31127] Add public API classes for FLIP-289

2023-02-21 Thread via GitHub


lindong28 commented on PR #214:
URL: https://github.com/apache/flink-ml/pull/214#issuecomment-1439531423

   @zhipeng93 Can you help review this PR? If this PR looks good to you, I will 
need to send an email to the FLIP-289 voting thread to discuss the proposed API 
change (e.g. let `setModelData()` take multiple input streams) before this PR 
is merged.
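
   For reference, a rough sketch of the proposed shape (illustrative only; the 
final signature is subject to the FLIP-289 vote, and `TransformerServable` is 
omitted here for brevity):
   ```java
   import java.io.IOException;
   import java.io.InputStream;

   // Illustrative only: a varargs setModelData, so a servable can consume one
   // or more serialized model-data streams.
   public interface ModelServable<T extends ModelServable<T>> {
       T setModelData(InputStream... modelDataInputs) throws IOException;
   }
   ```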


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] reswqa opened a new pull request, #21992: [BP-1.17][FLINK-31132] Let hive compact operator without setting parallelism subject to sink operator's configured parallelism.

2023-02-21 Thread via GitHub


reswqa opened a new pull request, #21992:
URL: https://github.com/apache/flink/pull/21992

   Backport FLINK-31132 to release-1.17.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] ChenZhongPu commented on a diff in pull request #21989: [FLINK-31174][Doc] Resolve inconsistent data format between Learn-Flink Doc and flink-training-repo

2023-02-21 Thread via GitHub


ChenZhongPu commented on code in PR #21989:
URL: https://github.com/apache/flink/pull/21989#discussion_r1113906467


##
docs/content.zh/docs/learn-flink/etl.md:
##
@@ -223,11 +215,11 @@ minutesByStartCell
 
 在 Flink 不参与管理状态的情况下,你的应用也可以使用状态,但 Flink 为其管理状态提供了一些引人注目的特性:
 
-* **本地性**: Flink 状态是存储在使用它的机器本地的,并且可以以内存访问速度来获取
-* **持久性**: Flink 状态是容错的,例如,它可以自动按一定的时间间隔产生 checkpoint,并且在任务失败后进行恢复
-* **纵向可扩展性**: Flink 状态可以存储在集成的 RocksDB 实例中,这种方式下可以通过增加本地磁盘来扩展空间
-* **横向可扩展性**: Flink 状态可以随着集群的扩缩容重新分布
-* **可查询性**: Flink 状态可以通过使用 [状态查询 API]({{< ref 
"docs/dev/datastream/fault-tolerance/queryable_state" >}}) 从外部进行查询。
+- **本地性**: Flink 状态是存储在使用它的机器本地的,并且可以以内存访问速度来获取

Review Comment:
   I filed an extra [jira](https://issues.apache.org/jira/browse/FLINK-31177) 
about markdown formatting.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] shuiqiangchen commented on pull request #21962: [FLINK-31124][Connectors/Hive] Add IT case for HiveTableSink speculative execution

2023-02-21 Thread via GitHub


shuiqiangchen commented on PR #21962:
URL: https://github.com/apache/flink/pull/21962#issuecomment-1439524626

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31177) To introduce a formatter for Markdown files

2023-02-21 Thread CHEN Zhongpu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHEN Zhongpu updated FLINK-31177:
-
Description: 
Currently, markdown files in *docs* are maintained and updated by many 
contributors, and different people have different code style tastes. Moreover, 
as the syntax of markdown is not really strict, the styles tend to be 
inconsistent.

To name a few,
 * Some prefer `*` to make a list item, while others may prefer `-`.
 * It is common to leave many unnecessary blank lines and spaces.
 * To make a divider, the number of `-` can vary.

To this end, I think it would be nicer to encourage or require contributors to 
format their markdown files before making a pull request. Personally, I think 
Prettier ([https://prettier.io/]) is a good candidate.

  was:
Currently, markdown files in *docs* are maintained and updated by many 
contributors, and different people have different code style tastes. Moreover, 
as the syntax of markdown is not really strict, the styles can also be 
inconsistent.

To name a few,
 * Some prefer `*` to make a list item, while others may prefer `-`.
 * It is common to leave many unnecessary blank lines and spaces.
 * To make a divider, the number of `-` can vary.

To this end, I think it would be nicer to encourage or require contributors to 
format their markdown files before making a pull request. Personally, I think 
Prettier ([https://prettier.io/]) is a good candidate.


> To introduce a formatter for Markdown files
> ---
>
> Key: FLINK-31177
> URL: https://issues.apache.org/jira/browse/FLINK-31177
> Project: Flink
>  Issue Type: Improvement
>Reporter: CHEN Zhongpu
>Priority: Minor
>
> Currently, markdown files in *docs* are maintained and updated by many 
> contributors, and different people have different code style tastes. 
> Moreover, as the syntax of markdown is not really strict, the styles tend to 
> be inconsistent.
> To name a few,
>  * Some prefer `*` to make a list item, while others may prefer `-`.
>  * It is common to leave many unnecessary blank lines and spaces.
>  * To make a divider, the number of `-` can vary.
> To this end, I think it would be nicer to encourage or require contributors 
> to format their markdown files before making a pull request. Personally, I 
> think Prettier ([https://prettier.io/]) is a good candidate.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-ml] lindong28 commented on a diff in pull request #214: [FLINK-31127] Add public API classes for FLIP-289

2023-02-21 Thread via GitHub


lindong28 commented on code in PR #214:
URL: https://github.com/apache/flink-ml/pull/214#discussion_r1113911222


##
flink-ml-servable-core/src/main/java/org/apache/flink/ml/servable/api/ModelServable.java:
##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.servable.api;
+
+import org.apache.flink.annotation.PublicEvolving;
+
+import java.io.IOException;
+import java.io.InputStream;
+
+/**
+ * A ModelServable is a TransformerServable with the extra API to set model data.
+ *
+ * @param <T> The class type of the ModelServable implementation itself.
+ */
+@PublicEvolving
+public interface ModelServable<T extends ModelServable<T>> extends TransformerServable<T> {
+
+    /** Sets model data using the serialized model data from the given input stream. */

Review Comment:
   Thanks for catching this. It is fixed now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31166) array_contains element type error

2023-02-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31166:
---
Labels: pull-request-available  (was: )

> array_contains element type error
> -
>
> Key: FLINK-31166
> URL: https://issues.apache.org/jira/browse/FLINK-31166
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.18.0
>Reporter: jackylau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
> Attachments: image-2023-02-21-18-37-45-202.png, 
> image-2023-02-21-18-41-19-385.png, image-2023-02-22-09-56-59-257.png
>
>
> !image-2023-02-22-09-56-59-257.png!
>  
> !image-2023-02-21-18-41-19-385.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #21990: [Flink 31170] [docs]The spelling error of the document word causes sql to fail to execute

2023-02-21 Thread via GitHub


flinkbot commented on PR #21990:
URL: https://github.com/apache/flink/pull/21990#issuecomment-1439515422

   
   ## CI report:
   
   * 9dd7ee22fcd390e9a32e54fe914640ef2d7b888f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-table-store] JingsongLi opened a new pull request, #551: [FLINK-31179] Make data structures serializable

2023-02-21 Thread via GitHub


JingsongLi opened a new pull request, #551:
URL: https://github.com/apache/flink-table-store/pull/551

   This can make our internal data structures easy to use.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31179) Make data structures serializable

2023-02-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31179:
---
Labels: pull-request-available  (was: )

> Make data structures serializable
> -
>
> Key: FLINK-31179
> URL: https://issues.apache.org/jira/browse/FLINK-31179
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31179) Make data structures serializable

2023-02-21 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31179:


 Summary: Make data structures serializable
 Key: FLINK-31179
 URL: https://issues.apache.org/jira/browse/FLINK-31179
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.4.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #21991: [BP-1.17][FLINK-31091][sql-gateway] Add Ser/de for Interval types (#21945)

2023-02-21 Thread via GitHub


flinkbot commented on PR #21991:
URL: https://github.com/apache/flink/pull/21991#issuecomment-1439549889

   
   ## CI report:
   
   * f2a617eb6560c48e85f0dbff2cc1d550194bf67b UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-31177) To introduce a formatter for Markdown files

2023-02-21 Thread CHEN Zhongpu (Jira)
CHEN Zhongpu created FLINK-31177:


 Summary: To introduce a formatter for Markdown files
 Key: FLINK-31177
 URL: https://issues.apache.org/jira/browse/FLINK-31177
 Project: Flink
  Issue Type: Improvement
Reporter: CHEN Zhongpu


Currently, markdown files in *docs* are maintained and updated by many 
contributors, and different people have different code style tastes. Moreover, 
as the syntax of markdown is not really strict, the styles can also be 
inconsistent.

To name a few,
 * Some prefer `*` to make a list item, while others may prefer `-`.
 * It is common to leave many unnecessary blank lines and spaces.
 * To make a divider, the number of `-` can vary.

To this end, I think it would be nicer to encourage or require contributors to 
format their markdown files before making a pull request. Personally, I think 
Prettier ([https://prettier.io/]) is a good candidate.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31169) KubernetesResourceManagerDriverTest.testOnPodDeleted fails fatally due to 239 exit code

2023-02-21 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691994#comment-17691994
 ] 

Matthias Pohl commented on FLINK-31169:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46382&view=logs&j=5cae8624-c7eb-5c51-92d3-4d2dacedd221&t=5acec1b4-945b-59ca-34f8-168928ce5199&l=26468

> KubernetesResourceManagerDriverTest.testOnPodDeleted fails fatally due to 239 
> exit code
> ---
>
> Key: FLINK-31169
> URL: https://issues.apache.org/jira/browse/FLINK-31169
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0
>Reporter: Matthias Pohl
>Assignee: Xintong Song
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> master: 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46341&view=logs&j=5cae8624-c7eb-5c51-92d3-4d2dacedd221&t=5acec1b4-945b-59ca-34f8-168928ce5199&l=27329
> {code}
> [...]
> Feb 21 04:44:11 [ERROR] Process Exit Code: 239
> Feb 21 04:44:11 [ERROR] Crashed tests:
> Feb 21 04:44:11 [ERROR] 
> org.apache.flink.kubernetes.KubernetesResourceManagerDriverTest
> Feb 21 04:44:11 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:748)
> [...]
> {code}
> {code}
> [...]
> Test 
> org.apache.flink.kubernetes.KubernetesResourceManagerDriverTest.testOnPodDeleted[testOnPodDeleted()]
>  is running.
> 
> 04:43:57,681 [ForkJoinPool-4-worker-1] INFO  
> org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Recovered 0 
> pods from previous attempts, current attempt id is 1.
> 04:43:57,701 [testing-rpc-main-thread] INFO  
> org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled 
> external resources: []
> 04:43:57,705 [testing-rpc-main-thread] INFO  
> org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Creating 
> new TaskManager pod with name testing-flink-cluster-taskmanager-1-1 and 
> resource <704,0.0>.
> 04:43:57,708 [testing-rpc-main-thread] INFO  
> org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Received 
> new TaskManager pod: testing-flink-cluster-taskmanager-1-1
> 04:43:57,708 [testing-rpc-main-thread] INFO  
> org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Pod 
> testing-flink-cluster-taskmanager-1-1 is created.
> 04:43:57,708 [testing-rpc-main-thread] WARN  
> org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Pod 
> testing-flink-cluster-taskmanager-1-1 is terminated before being scheduled.
> 04:43:57,709 [testing-rpc-main-thread] ERROR 
> org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Error 
> completing resource request.
> org.apache.flink.util.FlinkException: Pod is terminated.
> at 
> org.apache.flink.kubernetes.KubernetesResourceManagerDriver.onPodTerminated(KubernetesResourceManagerDriver.java:379)
>  ~[classes/:?]
> at 
> org.apache.flink.kubernetes.KubernetesResourceManagerDriver.lambda$handlePodEventsInMainThread$2(KubernetesResourceManagerDriver.java:347)
>  ~[classes/:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_292]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [?:1.8.0_292]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  [?:1.8.0_292]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  [?:1.8.0_292]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_292]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_292]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
> 04:43:57,724 [testing-rpc-main-thread] ERROR 
> org.apache.flink.util.FatalExitExceptionHandler  [] - FATAL: 
> Thread 'testing-rpc-main-thread' produced an uncaught exception. Stopping the 
> process...
> java.util.concurrent.CompletionException: java.lang.RuntimeException: 
> org.apache.flink.util.FlinkException: Pod is terminated.
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
>  ~[?:1.8.0_292]
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
>  ~[?:1.8.0_292]
> at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) 
> ~[?:1.8.0_292]
> at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
>  ~[?:1.8.0_292]
> at 
> 

[GitHub] [flink] snuyanzin opened a new pull request, #21991: [BP-1.17][FLINK-31091][sql-gateway] Add Ser/de for Interval types (#21945)

2023-02-21 Thread via GitHub


snuyanzin opened a new pull request, #21991:
URL: https://github.com/apache/flink/pull/21991

   This is a backport of 14adc1679fb3d025a2808af91f23f14e7c6f6e24 to the 
1.17.0 branch.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #546: [FLINK-31009] Add recordCount to snapshot meta

2023-02-21 Thread via GitHub


JingsongLi commented on code in PR #546:
URL: https://github.com/apache/flink-table-store/pull/546#discussion_r1113926517


##
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/operation/FileStoreCommitImpl.java:
##
@@ -477,12 +477,20 @@ private boolean tryCommitOnce(
         List<ManifestFileMeta> newMetas = new ArrayList<>();
         List<ManifestFileMeta> changelogMetas = new ArrayList<>();
         try {
+            long previousChangesRecordCount = 0L;
             if (latestSnapshot != null) {
+                List<ManifestFileMeta> previousManifests =
+                        latestSnapshot.dataManifests(manifestList);
                 // read all previous manifest files
-                oldMetas.addAll(latestSnapshot.readAllDataManifests(manifestList));
+                oldMetas.addAll(previousManifests);
+                previousChangesRecordCount =
+                        latestSnapshot.version() >= Snapshot.CURRENT_VERSION

Review Comment:
   Create a util method: `totalRecordCount(Snapshot snapshot)`.
   And I think maybe we should just check whether that field is null, instead 
of checking the snapshot version?
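
   For reference, a rough sketch of the suggested helper (assuming a nullable 
`Snapshot#totalRecordCount()` accessor for the field added in this PR; 
`recomputeFromManifests` is a hypothetical fallback):
   ```java
   import org.apache.flink.table.store.file.Snapshot;

   final class SnapshotUtils {

       private SnapshotUtils() {}

       /** Total record count, judged by the nullable field rather than the snapshot version. */
       static long totalRecordCount(Snapshot snapshot) {
           Long recordCount = snapshot.totalRecordCount(); // null for table store <= 0.3
           return recordCount != null ? recordCount : recomputeFromManifests(snapshot);
       }

       // Hypothetical fallback: sum the record counts from the snapshot's data manifests.
       private static long recomputeFromManifests(Snapshot snapshot) {
           throw new UnsupportedOperationException("sketch only");
       }
   }
   ```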



##
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/Snapshot.java:
##
@@ -124,6 +130,24 @@ public class Snapshot {
     @JsonProperty(FIELD_LOG_OFFSETS)
     private final Map<Integer, Long> logOffsets;
 
+    // record count of all changes from the previous snapshots
+    // null for table store <= 0.3
+    @JsonProperty(FIELD_BASE_RECORD_COUNT)
+    @Nullable
+    private final Long baseRecordCount;

Review Comment:
   maybe just provide a `totalRecordCount`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-31178) Public Writer API

2023-02-21 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31178:


 Summary: Public Writer API
 Key: FLINK-31178
 URL: https://issues.apache.org/jira/browse/FLINK-31178
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.4.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31041) Race condition in DefaultScheduler results in memory leak and busy loop

2023-02-21 Thread Danny Cranmer (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691998#comment-17691998
 ] 

Danny Cranmer commented on FLINK-31041:
---

Merged commit 
[{{b3e1492}}|https://github.com/apache/flink/commit/b3e14928e815dd6dbdffbe3c5616733d4c7c8825]
 into apache:master

> Race condition in DefaultScheduler results in memory leak and busy loop
> ---
>
> Key: FLINK-31041
> URL: https://issues.apache.org/jira/browse/FLINK-31041
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.3, 1.16.1
>Reporter: Danny Cranmer
>Assignee: Weihua Hu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.15.4, 1.16.2
>
> Attachments: failovers.log, flink-31041-heap-dump.png, 
> test-restart-strategy.log
>
>
> h4. Context
> When a job creates multiple sources that use the {{SourceCoordinator}} 
> (FLIP-27), there is a failure race condition that results in:
>  * Memory leak of {{ExecutionVertexVersion}}
>  * Busy loop constantly trying to restart job
>  * Restart strategy is not respected
> This results in the Job Manager becoming unresponsive.
> h4. !flink-31041-heap-dump.png!
> h4. Reproduction Steps
> This can be reproduced by a job that creates multiple sources that fail in 
> the {{{}SplitEnumerator{}}}. We observed this with multiple {{KafkaSource's}} 
> trying to load a non-existent cert from the file system and throwing FNFE. 
> Thus, here is a simple job to reproduce (BE WARNED: running this locally will 
> lock up your IDE):
> {code:java}
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.setParallelism(1);
> env.setRestartStrategy(new 
> RestartStrategies.FailureRateRestartStrategyConfiguration(1, Time.of(10, 
> TimeUnit.SECONDS), Time.of(10, TimeUnit.SECONDS)));
> KafkaSource<String> source = KafkaSource.<String>builder()
> .setProperty("security.protocol", "SASL_SSL")
> // SSL configurations
> // Configure the path of truststore (CA) provided by the server
> .setProperty("ssl.truststore.location", 
> "/path/to/kafka.client.truststore.jks")
> .setProperty("ssl.truststore.password", "test1234")
> // Configure the path of keystore (private key) if client 
> authentication is required
> .setProperty("ssl.keystore.location", 
> "/path/to/kafka.client.keystore.jks")
> .setProperty("ssl.keystore.password", "test1234")
> // SASL configurations
> // Set SASL mechanism as SCRAM-SHA-256
> .setProperty("sasl.mechanism", "SCRAM-SHA-256")
> // Set JAAS configurations
> .setProperty("sasl.jaas.config", 
> "org.apache.kafka.common.security.scram.ScramLoginModule required 
> username=\"username\" password=\"password\";")
> .setBootstrapServers("http://localhost:3456")
> .setTopics("input-topic")
> .setGroupId("my-group")
> .setStartingOffsets(OffsetsInitializer.earliest())
> .setValueOnlyDeserializer(new SimpleStringSchema())
> .build();
> List<SingleOutputStreamOperator<String>> sources = IntStream.range(0, 32)
> .mapToObj(i -> env
> .fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka 
> Source " + i).uid("source-" + i)
> .keyBy(s -> s.charAt(0))
> .map(s -> s))
> .collect(Collectors.toList());
> env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka 
> Source").uid("source")
> .keyBy(s -> s.charAt(0))
> .union(sources.toArray(new SingleOutputStreamOperator[] {}))
> .print();
> env.execute("test job"); {code}
> h4. Root Cause
> We can see that the {{OperatorCoordinatorHolder}} already has a [debounce 
> mechanism|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/coordination/OperatorCoordinatorHolder.java#L609],
>  however the {{DefaultScheduler}} does not. We need a debounce mechanism in 
> the {{DefaultScheduler}} since it handles many 
> {{{}OperatorCoordinatorHolder{}}}.
> h4. Fix
> I have managed to fix this, I will open a PR, but would need feedback from 
> people who understand this code better than me!
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] liuyongvs commented on pull request #21993: [FLINK-31166][table] Fix array_contains does not support null argumen…

2023-02-21 Thread via GitHub


liuyongvs commented on PR #21993:
URL: https://github.com/apache/flink/pull/21993#issuecomment-1439574746

   Hi @snuyanzin, after digging into the code, I have fixed it now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-31079) Release Testing: Verify FLINK-29663 Further improvements of adaptive batch scheduler

2023-02-21 Thread Lijie Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lijie Wang closed FLINK-31079.
--
Resolution: Done

> Release Testing: Verify FLINK-29663 Further improvements of adaptive batch 
> scheduler
> 
>
> Key: FLINK-31079
> URL: https://issues.apache.org/jira/browse/FLINK-31079
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: miamiaoxyz
>Priority: Blocker
> Fix For: 1.17.0
>
> Attachments: image-2023-02-22-14-00-13-646.png
>
>
> This task aims to verify FLINK-29663 which improves the adaptive batch 
> scheduler.
> Before the change of FLINK-29663, the adaptive batch scheduler distributed 
> subpartitions according to the number of subpartitions, making different 
> downstream subtasks consume roughly the same number of subpartitions. This 
> leads to imbalanced loads across downstream tasks when the subpartitions 
> contain different amounts of data.
> To solve this problem, in FLINK-29663 we let the adaptive batch scheduler 
> distribute subpartitions according to the amount of data, so that different 
> downstream subtasks consume roughly the same amount of data. Note that 
> currently it only takes effect for All-To-All edges.
> The documentation of adaptive scheduler can be found 
> [here|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/elastic_scaling/#adaptive-batch-scheduler]
> One can verify it by creating intended data skew on All-To-All edges.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31079) Release Testing: Verify FLINK-29663 Further improvements of adaptive batch scheduler

2023-02-21 Thread Lijie Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691969#comment-17691969
 ] 

Lijie Wang edited comment on FLINK-31079 at 2/22/23 7:14 AM:
-

Thanks [~lsy]. Currently, the 
[{{execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task}}|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-ta]
 is cluster level, but I personally think it makes sense to make it job level, 
especially when using session cluster, I will evaluate it in the future version.


was (Author: wanglijie95):
Thanks [~lsy]. Currently, the 
[{{execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task}}|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-ta]
 is cluster level, but I personally think it makes sense to make it a job 
level, especially when using session cluster, I will evaluate it in the future 
version.

> Release Testing: Verify FLINK-29663 Further improvements of adaptive batch 
> scheduler
> 
>
> Key: FLINK-31079
> URL: https://issues.apache.org/jira/browse/FLINK-31079
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: miamiaoxyz
>Priority: Blocker
> Fix For: 1.17.0
>
> Attachments: image-2023-02-22-14-00-13-646.png
>
>
> This task aims to verify FLINK-29663 which improves the adaptive batch 
> scheduler.
> Before the change of FLINK-29663, the adaptive batch scheduler distributed 
> subpartitions according to the number of subpartitions, making different 
> downstream subtasks consume roughly the same number of subpartitions. This 
> leads to imbalanced loads across downstream tasks when the subpartitions 
> contain different amounts of data.
> To solve this problem, in FLINK-29663 we let the adaptive batch scheduler 
> distribute subpartitions according to the amount of data, so that different 
> downstream subtasks consume roughly the same amount of data. Note that 
> currently it only takes effect for All-To-All edges.
> The documentation of adaptive scheduler can be found 
> [here|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/elastic_scaling/#adaptive-batch-scheduler]
> One can verify it by creating intended data skew on All-To-All edges.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31079) Release Testing: Verify FLINK-29663 Further improvements of adaptive batch scheduler

2023-02-21 Thread Lijie Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691969#comment-17691969
 ] 

Lijie Wang commented on FLINK-31079:


Thanks [~lsy]. Currently, the 
[{{execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task}}|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-ta]
 is cluster level, but I personally think it makes sense to make it a job 
level, especially when using session cluster, I will evaluate it in the future 
version.

> Release Testing: Verify FLINK-29663 Further improvements of adaptive batch 
> scheduler
> 
>
> Key: FLINK-31079
> URL: https://issues.apache.org/jira/browse/FLINK-31079
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: miamiaoxyz
>Priority: Blocker
> Fix For: 1.17.0
>
> Attachments: image-2023-02-22-14-00-13-646.png
>
>
> This task aims to verify FLINK-29663 which improves the adaptive batch 
> scheduler.
> Before the change of FLINK-29663, the adaptive batch scheduler distributed 
> subpartitions according to the number of subpartitions, making different 
> downstream subtasks consume roughly the same number of subpartitions. This 
> leads to imbalanced loads across downstream tasks when the subpartitions 
> contain different amounts of data.
> To solve this problem, in FLINK-29663 we let the adaptive batch scheduler 
> distribute subpartitions according to the amount of data, so that different 
> downstream subtasks consume roughly the same amount of data. Note that 
> currently it only takes effect for All-To-All edges.
> The documentation of adaptive scheduler can be found 
> [here|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/elastic_scaling/#adaptive-batch-scheduler]
> One can verify it by creating intended data skew on All-To-All edges.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31079) Release Testing: Verify FLINK-29663 Further improvements of adaptive batch scheduler

2023-02-21 Thread Lijie Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691969#comment-17691969
 ] 

Lijie Wang edited comment on FLINK-31079 at 2/22/23 7:14 AM:
-

Thanks [~lsy]. Currently, the 
[{{execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task}}|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-ta]
 is cluster level, but I personally think it makes sense to make it job level, 
especially when using session cluster, I will evaluate it in the future.


was (Author: wanglijie95):
Thanks [~lsy]. Currently, the 
[{{execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task}}|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-ta]
 is cluster level, but I personally think it makes sense to make it job level, 
especially when using session cluster, I will evaluate it in the future version.

> Release Testing: Verify FLINK-29663 Further improvements of adaptive batch 
> scheduler
> 
>
> Key: FLINK-31079
> URL: https://issues.apache.org/jira/browse/FLINK-31079
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: miamiaoxyz
>Priority: Blocker
> Fix For: 1.17.0
>
> Attachments: image-2023-02-22-14-00-13-646.png
>
>
> This task aims to verify FLINK-29663 which improves the adaptive batch 
> scheduler.
> Before the change of FLINK-29663, the adaptive batch scheduler distributed 
> subpartitions according to the number of subpartitions, making different 
> downstream subtasks consume roughly the same number of subpartitions. This 
> leads to imbalanced loads across downstream tasks when the subpartitions 
> contain different amounts of data.
> To solve this problem, in FLINK-29663 we let the adaptive batch scheduler 
> distribute subpartitions according to the amount of data, so that different 
> downstream subtasks consume roughly the same amount of data. Note that 
> currently it only takes effect for All-To-All edges.
> The documentation of adaptive scheduler can be found 
> [here|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/elastic_scaling/#adaptive-batch-scheduler]
> One can verify it by creating intended data skew on All-To-All edges.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31167) Verify that no exclusions were erroneously added to the japicmp plugin

2023-02-21 Thread Biao Liu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691970#comment-17691970
 ] 

Biao Liu commented on FLINK-31167:
--

Hi [~mapohl],

As you mentioned, the new method with context was introduced to replace the 
old one.

Let's take {{finalizeGlobal}} as an example. If we do not provide a default 
implementation for {{finalizeGlobal(int)}} and someone wants to use the new 
method with context, he has to implement both {{finalizeGlobal(int)}} and 
{{finalizeGlobal(FinalizationContext)}}, even though the former means nothing 
to him.

That's not what we expect users to do. {{finalizeGlobal(int)}} works for the 
legacy code, and {{finalizeGlobal(FinalizationContext)}} works for the new 
implementation. Users only need to implement one of them. It's sort of like how 
SinkFunction is handled in https://issues.apache.org/jira/browse/FLINK-7552.

I think it's not a breaking change; however, it cannot pass the japicmp check.
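
For illustration, a simplified sketch of the pattern described above (types 
reduced to the essentials; the real interface lives in flink-core):
{code:java}
import java.io.IOException;

interface FinalizeOnMasterSketch {

    /** Legacy method; kept as a default so new implementations can ignore it. */
    default void finalizeGlobal(int parallelism) throws IOException {}

    /** New method; its default delegates, so legacy implementations keep working. */
    default void finalizeGlobal(FinalizationContext context) throws IOException {
        finalizeGlobal(context.getParallelism());
    }

    interface FinalizationContext {
        int getParallelism();
    }
}
{code}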

> Verify that no exclusions were erroneously added to the japicmp plugin
> --
>
> Key: FLINK-31167
> URL: https://issues.apache.org/jira/browse/FLINK-31167
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>
> Verify that no exclusions were erroneously added to the japicmp plugin that 
> break compatibility guarantees. Check the exclusions for the 
> japicmp-maven-plugin in the root pom (see 
> [apache/flink:pom.xml:2175ff|https://github.com/apache/flink/blob/3856c49af77601cf7943a5072d8c932279ce46b4/pom.xml#L2175])
> for exclusions that:
> * For minor releases: break source compatibility for {{@Public}} APIs
> * For patch releases: break source/binary compatibility for 
> {{@Public}}/{{@PublicEvolving}}  APIs
> Any such exclusion must be properly justified, in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-30129) Push projection through ChangelogNormalize

2023-02-21 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-30129:

Fix Version/s: 1.18.0

> Push projection through ChangelogNormalize
> --
>
> Key: FLINK-30129
> URL: https://issues.apache.org/jira/browse/FLINK-30129
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / Planner
>Reporter: Jark Wu
>Priority: Major
> Fix For: 1.18.0
>
>
> Currently, the ChangelogNormalize node is generated during the physical 
> optimization phase. That means the projection is not pushed through 
> ChangelogNormalize if the {{TableSource}} doesn't support 
> {{SupportsProjectionPushDown}}. We can implement such an optimization to reduce 
> the state size (fewer columns in the state value) and improve throughput (only 
> changes on the selected columns will be emitted). 
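
For context, a source opts into projection pushdown today by implementing the ability interface below; a rough sketch (runtime-provider methods omitted via an abstract class, so treat this as illustrative only):

{code:java}
import org.apache.flink.table.connector.source.ScanTableSource;
import org.apache.flink.table.connector.source.abilities.SupportsProjectionPushDown;

// Illustrative: with this ability the planner can prune unused columns at the
// source; FLINK-30129 is about applying the same pruning through
// ChangelogNormalize even when the source does not implement it.
public abstract class ProjectableSource implements ScanTableSource, SupportsProjectionPushDown {

    // Indices of the columns the query actually reads, e.g. {{0}, {2}}.
    protected int[][] projectedFields;

    @Override
    public boolean supportsNestedProjection() {
        return false; // this sketch only prunes top-level columns
    }

    @Override
    public void applyProjection(int[][] projectedFields) {
        // Remember the projection; the runtime reader should emit only these columns.
        this.projectedFields = projectedFields;
    }
}
{code}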



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] MartijnVisser commented on pull request #21985: [FLINK-31162][yarn] Use currUsr.getCredentials.getTokens instead of currUsr.getTokens

2023-02-21 Thread via GitHub


MartijnVisser commented on PR #21985:
URL: https://github.com/apache/flink/pull/21985#issuecomment-1439560715

   > The issue here is different and is only specific to `1.16`
   
   So this issue only occurs in 1.16 and not in 1.17 and later? Then we're good 
yes :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-ml] Fanoid commented on a diff in pull request #210: [FLINK-31010] Add Transformer and Estimator for GBTClassifier and GBTRegressor

2023-02-21 Thread via GitHub


Fanoid commented on code in PR #210:
URL: https://github.com/apache/flink-ml/pull/210#discussion_r1113947913


##
flink-ml-lib/src/main/java/org/apache/flink/ml/common/gbt/GBTModelParams.java:
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.common.gbt;
+
+import org.apache.flink.ml.classification.gbtclassifier.GBTClassifierModel;
+import org.apache.flink.ml.common.param.HasCategoricalCols;
+import org.apache.flink.ml.common.param.HasFeaturesCol;
+import org.apache.flink.ml.common.param.HasLabelCol;
+import org.apache.flink.ml.common.param.HasPredictionCol;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.param.StringArrayParam;
+import org.apache.flink.ml.regression.gbtregressor.GBTRegressorModel;
+
+/**
+ * Params of {@link GBTClassifierModel} and {@link GBTRegressorModel}.
+ *
+ * If the input features come from 1 column of vector type, `featuresCol` should be used, and all
+ * features are treated as continuous features. Otherwise, `inputCols` should be used for multiple
+ * columns. Columns whose names specified in `categoricalCols` are treated as categorical features,
+ * while others are continuous features.
+ *
+ * NOTE: `inputCols` and `featuresCol` are in conflict with each other, so they should not be set
+ * at the same time. In addition, `inputCols` has a higher precedence than `featuresCol`, that is,
+ * `featuresCol` is ignored when `inputCols` is not `null`.
+ *
+ * @param <T> The class type of this instance.
+ */
+public interface GBTModelParams<T>
+        extends HasFeaturesCol<T>, HasLabelCol<T>, HasCategoricalCols<T>, HasPredictionCol<T> {
+
+    Param<String[]> INPUT_COLS = new StringArrayParam("inputCols", "Input column names.", null);

Review Comment:
   I'm OK to rename `HasInputCols` to `HasFeaturesCols`.
   
   The reason to have both parameters is related to the handling of 
categorical columns.
   
   SparkML's implementation only supports a vector column as input, but it can 
store meta information in its schema [1], so the algorithm can distinguish 
between numerical and categorical values.
   
   In Flink, there is no way to store meta information in the Table schema. So 
when end-users want to feed categorical values, our implementation can only 
accept the data as multiple columns, i.e., `HasFeaturesCols`.  Otherwise, 
`HasFeaturesCol` is used.
   
   [1] 
https://github.com/apache/spark/blob/27ed89b7be5ebb91e4a0b106b1669a7867a6012d/mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala#L194
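   
   To make the two modes concrete, a hedged usage sketch (the setter names follow this PR's params and common Flink ML conventions, so treat them as assumptions):
   
   ```java
   // Hypothetical usage of the two input modes discussed above.
   GBTClassifier gbt = new GBTClassifier();
   
   // Mode 1: one vector-typed column; every feature is treated as continuous.
   gbt.setFeaturesCol("features");
   
   // Mode 2: multiple scalar columns; categorical ones are listed explicitly.
   // When both are set, the multi-column variant takes precedence.
   gbt.setInputCols("age", "income", "city");
   gbt.setCategoricalCols("city");
   ```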



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31166) array_contains element type error

2023-02-21 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692003#comment-17692003
 ] 

Martijn Visser commented on FLINK-31166:


[~jackylau] So this is specifically about the behaviour when the needle (given 
ARRAY_CONTAINS(haystack, needle) is the function) is null? 

> array_contains element type error
> -
>
> Key: FLINK-31166
> URL: https://issues.apache.org/jira/browse/FLINK-31166
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.18.0
>Reporter: jackylau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
> Attachments: image-2023-02-21-18-37-45-202.png, 
> image-2023-02-21-18-41-19-385.png, image-2023-02-22-09-56-59-257.png
>
>
> !image-2023-02-22-09-56-59-257.png!
>  
> !image-2023-02-21-18-41-19-385.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] MartijnVisser merged pull request #21964: [FLINK-30948][Formats/AWS] Remove GlueSchemaRegistry Avro and JSON fo…

2023-02-21 Thread via GitHub


MartijnVisser merged PR #21964:
URL: https://github.com/apache/flink/pull/21964


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-30948) Remove flink-avro-glue-schema-registry and flink-json-glue-schema-registry from Flink main repo

2023-02-21 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser updated FLINK-30948:
---
Fix Version/s: 1.17.0

> Remove flink-avro-glue-schema-registry and flink-json-glue-schema-registry 
> from Flink main repo
> ---
>
> Key: FLINK-30948
> URL: https://issues.apache.org/jira/browse/FLINK-30948
> Project: Flink
>  Issue Type: Sub-task
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Environment: Remove flink-avro-glue-schema-registry and 
> flink-json-glue-schema-registry from Flink main repo, along with associated 
> end-to-end tests
>Reporter: Hong Liang Teoh
>Assignee: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31153) Create a release branch

2023-02-21 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-31153:
--
Description: 
If you are doing a new minor release, you need to update Flink version in the 
following repositories:
 * [apache/flink|https://github.com/apache/flink]
 * [apache/flink-docker|https://github.com/apache/flink-docker]
 * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks]

Patch releases don't require these repositories to be touched. Simply 
check out the already existing branch for that version:
{code:java}
$ cd ./tools
$ git checkout release-$SHORT_RELEASE_VERSION
{code}
h4. Flink repository

Create a branch for the new version that we want to release before updating the 
master branch to the next development version:
{code:bash}
$ cd ./tools
$ releasing/create_snapshot_branch.sh
$ git checkout master
$ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION 
releasing/update_branch_version.sh
{code}
In the {{master}} branch, add a new value (e.g. {{{}v1_16("1.16"){}}}) to 
[apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
 as the last entry:
{code:java}
// ...
v1_12("1.12"),
v1_13("1.13"),
v1_14("1.14"),
v1_15("1.15"),
v1_16("1.16");
{code}
The newly created branch and updated {{master}} branch need to be pushed to the 
official repository.
h4. Flink Docker Repository

Afterwards fork off from {{dev-master}} a {{dev-x.y}} branch in the 
[apache/flink-docker|https://github.com/apache/flink-docker] repository. Make 
sure that 
[apache/flink-docker:.github/workflows/ci.yml|https://github.com/apache/flink-docker/blob/dev-master/.github/workflows/ci.yml]
 points to the correct snapshot version; for {{dev-x.y}} it should point to 
{{{}x.y-SNAPSHOT{}}}, while for {{dev-master}} it should point to the most 
recent snapshot version ({{$NEXT_SNAPSHOT_VERSION}}).

After pushing the new minor release branch, as the last step you should also 
update the documentation workflow to also build the documentation for the new 
release branch. Check [Managing 
Documentation|https://cwiki.apache.org/confluence/display/FLINK/Managing+Documentation]
 on details on how to do that. You may also want to manually trigger a build to 
make the changes visible as soon as possible.

 

h3. Expectations (Minor Version only)
 * Release branch has been created and pushed
 * Cron job has been added on the release branch in 
([apache-flink:./tools/azure-pipelines/build-apache-repo.yml|https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-apache-repo.yml])
 * Originating branch has the version information updated to the new version
 * New version is added to the 
[apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
 enum.
 * Make sure {{flink-docker}} has {{dev-x.y}} branch and docker e2e tests run 
against this branch
 * docs/config.toml has been updated appropriately.
 * The {{dev-x.y}} branch of ({{{}$CURRENT_SNAPSHOT_VERSION{}}}) has been 
created in the Flink Benchmark repo.
 * The {{flink.version}} property of Flink Benchmark repo has been updated to 
the latest snapshot version.

  was:
If you are doing a new major release, you need to update Flink version in the 
following repositories:
 * [apache/flink|https://github.com/apache/flink]
 * [apache/flink-docker|https://github.com/apache/flink-docker]
 * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks]

Minor releases don't require these repositories to be touched. Simply 
check out the already existing branch for that version:
{code:java}
$ cd ./tools
$ git checkout release-$SHORT_RELEASE_VERSION
{code}
h4. Flink repository

Create a branch for the new version that we want to release before updating the 
master branch to the next development version:
{code:bash}
$ cd ./tools
$ releasing/create_snapshot_branch.sh
$ git checkout master
$ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION 
releasing/update_branch_version.sh
{code}
In the {{master}} branch, add a new value (e.g. {{{}v1_16("1.16"){}}}) to 
[apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
 as the last entry:
{code:java}
// ...
v1_12("1.12"),
v1_13("1.13"),
v1_14("1.14"),
v1_15("1.15"),
v1_16("1.16");
{code}
The newly created branch and updated {{master}} branch need to be pushed to the 
official repository.
h4. Flink Docker Repository

Afterwards fork off from {{dev-master}} a {{dev-x.y}} branch in the 
[apache/flink-docker|https://github.com/apache/flink-docker] repository. Make 
sure that 

[jira] [Commented] (FLINK-31059) Release Testing: Verify FLINK-29717 Supports hive udaf such as sum/count by native implementation

2023-02-21 Thread miamiaoxyz (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691464#comment-17691464
 ] 

miamiaoxyz commented on FLINK-31059:


I first used the SQL client to test the feature, but 
`exec.hive.native-agg-function.enabled` did not work there. 


I then used IT cases to verify.
 #    I used an IT case to turn `exec.hive.native-agg-function.enabled` on and off 
and verify that the two results are the same; the testSql function checks whether the 
same SQL gets the same result with `exec.hive.native-agg-function.enabled` on and off.

I verified via IT cases that the plan uses HashAgg when 
`exec.hive.native-agg-function.enabled` is turned on, and SortAgg when it is 
turned off.
!image-2023-02-21-15-45-48-226.png|width=549,height=234!
It passes all the IT cases below. 
!image-2023-02-21-15-46-13-966.png|width=501,height=371!
2. I verified that the data results are the same when combining sum/count/avg/min/max 
functions in a query with `exec.hive.native-agg-function.enabled` on and off, 
using the IT case below.

I verified via IT cases that the plan uses HashAgg when 
`exec.hive.native-agg-function.enabled` is turned on, and SortAgg when it is 
turned off.

!image-2023-02-21-15-49-58-854.png|width=536,height=219!

3. `array` and `struct` do not support the max function. The count 
function does not store `array` or `struct` in the aggregation, so bigint is used 
instead, and hash-agg is chosen.

!image-2023-02-21-15-59-44-470.png|width=1016,height=189!

4. Since `first_value` and `last_value` are not implemented in Hive 
[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF)],
 I used `collect_set` to test instead. All the plans use SortAgg and get the same 
result, which meets the expectations.

```

---  turn on `table.exec.hive.native-agg-function.enabled`

== Abstract Syntax Tree ==
LogicalProject(x=[$0], _o__c1=[$1])
+- LogicalAggregate(group=[\{0}], agg#0=[collect_set($1)])
   +- LogicalProject($f0=[$0], $f1=[$1])
      +- LogicalTableScan(table=[[test-catalog, default, foo]])

== Optimized Physical Plan ==
SortAggregate(isMerge=[true], groupBy=[x], select=[x, Final_collect_set($f1) AS 
$f1])
+- Sort(orderBy=[x ASC])
   +- Exchange(distribution=[hash[x]])
      +- LocalSortAggregate(groupBy=[x], select=[x, Partial_collect_set(y) AS 
$f1])
         +- Sort(orderBy=[x ASC])
            +- TableSourceScan(table=[[test-catalog, default, foo]], fields=[x, 
y])

== Optimized Execution Plan ==
SortAggregate(isMerge=[true], groupBy=[x], select=[x, Final_collect_set($f1) AS 
$f1])
+- Exchange(distribution=[forward])
   +- Sort(orderBy=[x ASC])
      +- Exchange(distribution=[hash[x]])
         +- LocalSortAggregate(groupBy=[x], select=[x, Partial_collect_set(y) 
AS $f1])
            +- Exchange(distribution=[forward])
               +- Sort(orderBy=[x ASC])
                  +- TableSourceScan(table=[[test-catalog, default, foo]], 
fields=[x, y])

---  turn off `table.exec.hive.native-agg-function.enabled`

== Abstract Syntax Tree ==
LogicalProject(x=[$0], _o__c1=[$1])
+- LogicalAggregate(group=[\{0}], agg#0=[collect_set($1)])
   +- LogicalProject($f0=[$0], $f1=[$1])
      +- LogicalTableScan(table=[[test-catalog, default, foo]])

== Optimized Physical Plan ==
SortAggregate(isMerge=[true], groupBy=[x], select=[x, Final_collect_set($f1) AS 
$f1])
+- Sort(orderBy=[x ASC])
   +- Exchange(distribution=[hash[x]])
      +- LocalSortAggregate(groupBy=[x], select=[x, Partial_collect_set(y) AS 
$f1])
         +- Sort(orderBy=[x ASC])
            +- TableSourceScan(table=[[test-catalog, default, foo]], fields=[x, 
y])

== Optimized Execution Plan ==
SortAggregate(isMerge=[true], groupBy=[x], select=[x, Final_collect_set($f1) AS 
$f1])
+- Exchange(distribution=[forward])
   +- Sort(orderBy=[x ASC])
      +- Exchange(distribution=[hash[x]])
         +- LocalSortAggregate(groupBy=[x], select=[x, Partial_collect_set(y) 
AS $f1])
            +- Exchange(distribution=[forward])
               +- Sort(orderBy=[x ASC])
                  +- TableSourceScan(table=[[test-catalog, default, foo]], 
fields=[x, y])

```

 

 

!image-2023-02-21-16-31-58-361.png|width=620,height=261!

 

5. I disabled hash-agg to force sort-agg for all of the tests above. The 
result with hash-agg forcibly disabled is the same as the result with 
`exec.hive.native-agg-function.enabled` turned on and off, which meets 
the expectations.

!image-2023-02-21-16-35-46-294.png|width=632,height=392!

 

Problems:

a. `exec.hive.native-agg-function.enabled` does not work in the SQL client: 
hash-agg is not chosen there.

!https://intranetproxy.alipay.com/skylark/lark/0/2023/png/83756403/1676952029939-182fa078-3a07-4e45-bdbb-832f7f74c838.png|width=703,height=383,id=u4fc84338!

b. Enabling and disabling `table.exec.hive.native-agg-function.enabled` gives 
different results.
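
For reference, a minimal sketch of one way such a toggle can be flipped in an IT-style test (the query execution and result capture are elided; only the configuration key comes from the ticket):

```
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class NativeAggToggleExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Run the query once with native aggregate functions enabled...
        tEnv.getConfig().getConfiguration()
                .setString("table.exec.hive.native-agg-function.enabled", "true");
        // ... execute and capture results here ...

        // ...and once disabled, then compare the two result sets.
        tEnv.getConfig().getConfiguration()
                .setString("table.exec.hive.native-agg-function.enabled", "false");
    }
}
```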


[GitHub] [flink] FangYongs commented on a diff in pull request #21901: [FLINK-30968][sql-client] Sql client supports dynamic config to open session

2023-02-21 Thread via GitHub


FangYongs commented on code in PR #21901:
URL: https://github.com/apache/flink/pull/21901#discussion_r1112748189


##
flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/gateway/DefaultContextUtils.java:
##
@@ -51,12 +52,17 @@ public static DefaultContext buildDefaultContext(CliOptions.EmbeddedCliOptions o
         } else {
             libDirs = Collections.emptyList();
         }
-        return DefaultContext.load(
-                options.getPythonConfiguration(), discoverDependencies(jars, libDirs), true);
+        Configuration pythonConfiguration = options.getPythonConfiguration();
+        pythonConfiguration.addAll(
+                ConfigurationUtils.createConfiguration(options.getSessionConfig()));
+        return DefaultContext.load(pythonConfiguration, discoverDependencies(jars, libDirs), true);

Review Comment:
   LGTM, DONE



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] MartijnVisser merged pull request #21969: [FLINK-30948][Formats/AWS] Remove GlueSchemaRegistry Avro and JSON fo…

2023-02-21 Thread via GitHub


MartijnVisser merged PR #21969:
URL: https://github.com/apache/flink/pull/21969


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-31150) Cross team testing

2023-02-21 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reassigned FLINK-31150:
-

Assignee: Qingsheng Ren

> Cross team testing
> --
>
> Key: FLINK-31150
> URL: https://issues.apache.org/jira/browse/FLINK-31150
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Matthias Pohl
>Assignee: Qingsheng Ren
>Priority: Major
>
> For user facing features that go into the release we'd like to ensure they 
> can actually _be used_ by Flink users. To achieve this the release managers 
> ensure that an issue for cross team testing is created in the Apache Flink 
> Jira. This can and should be picked up by other community members to verify 
> the functionality and usability of the feature.
> The issue should contain some entry points which enable other community 
> members to test it. It should not contain documentation on how to use the 
> feature as this should be part of the actual documentation. The cross team 
> tests are performed after the feature freeze. Documentation should be in 
> place before that. Those tests are manual tests, so do not confuse them with 
> automated tests.
> To sum that up:
>  * User facing features should be tested by other contributors
>  * The scope is usability and sanity of the feature
>  * The feature needs to be already documented
>  * The contributor creates an issue containing some pointers on how to get 
> started (e.g. link to the documentation, suggested targets of verification)
>  * Other community members pick those issues up and provide feedback
>  * Cross team testing happens right after the feature freeze



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31165) Over Agg: The window rank function without order by error in top N query

2023-02-21 Thread P Rohan Kumar (Jira)
P Rohan Kumar created FLINK-31165:
-

 Summary: Over Agg: The window rank function without order by error 
in top N query
 Key: FLINK-31165
 URL: https://issues.apache.org/jira/browse/FLINK-31165
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 1.16.0
Reporter: P Rohan Kumar


 
{code:java}
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.{DataTypes, Schema, TableDescriptor}
import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment

val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

val tableEnv = StreamTableEnvironment.create(env)


val td = TableDescriptor.forConnector("datagen").option("rows-per-second", "10")
  .option("number-of-rows", "10")
  .schema(Schema
.newBuilder()
.column("NAME", DataTypes.VARCHAR(2147483647))
.column("ROLLNO", DataTypes.DECIMAL(5, 0))
.column("DOB", DataTypes.DATE())
.column("CLASS", DataTypes.DECIMAL(2, 0))
.column("SUBJECT", DataTypes.VARCHAR(2147483647))
.build())
  .build()

val table = tableEnv.from(td)


tableEnv.createTemporaryView("temp_table", table)

val newTable = tableEnv.sqlQuery("select temp_table.*,cast('2022-01-01' as 
date) SRC_NO from temp_table")

tableEnv.createTemporaryView("temp_table2", newTable)


val newTable2 = tableEnv.sqlQuery("select * from (select 
NAME,ROLLNO,row_number() over (partition by NAME ORDER BY SRC_NO) AS rownum  
from temp_table2 a) where rownum <= 1")

tableEnv.toChangelogStream(newTable2).print()

env.execute()
 {code}
 

 

I am getting the below error if I run the above code.

I have already provided an order by column.

If I change the order by column to some other column, such as "SUBJECT", then 
the job runs fine.

 

 
{code:java}
Exception in thread "main" java.lang.RuntimeException: Error while applying 
rule FlinkLogicalOverAggregateConverter(in:NONE,out:LOGICAL), args 
[rel#245:LogicalWindow.NONE.any.None: 
0.[NONE].[NONE](input=RelSubset#244,window#0=window(partition {0} rows between 
UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()]))]
    at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:256)
    at 
org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58)
    at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510)
    at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:312)
    at 
org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:62)
    at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram.$anonfun$optimize$1(FlinkChainedProgram.scala:59)
    at 
scala.collection.TraversableOnce$folder$1$.apply(TraversableOnce.scala:187)
    at 
scala.collection.TraversableOnce$folder$1$.apply(TraversableOnce.scala:185)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:189)
    at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:184)
    at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:108)
    at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram.optimize(FlinkChainedProgram.scala:55)
    at 
org.apache.flink.table.planner.plan.optimize.StreamCommonSubGraphBasedOptimizer.optimizeTree(StreamCommonSubGraphBasedOptimizer.scala:176)
    at 
org.apache.flink.table.planner.plan.optimize.StreamCommonSubGraphBasedOptimizer.doOptimize(StreamCommonSubGraphBasedOptimizer.scala:83)
    at 
org.apache.flink.table.planner.plan.optimize.CommonSubGraphBasedOptimizer.optimize(CommonSubGraphBasedOptimizer.scala:87)
    at 
org.apache.flink.table.planner.delegation.PlannerBase.optimize(PlannerBase.scala:315)
    at 
org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:195)
    at 
org.apache.flink.table.api.bridge.internal.AbstractStreamTableEnvironmentImpl.toStreamInternal(AbstractStreamTableEnvironmentImpl.java:224)
    at 
org.apache.flink.table.api.bridge.internal.AbstractStreamTableEnvironmentImpl.toStreamInternal(AbstractStreamTableEnvironmentImpl.java:219)
    at 
org.apache.flink.table.api.bridge.scala.internal.StreamTableEnvironmentImpl.toChangelogStream(StreamTableEnvironmentImpl.scala:160)
    at org.example.OverAggregateBug$.main(OverAggregateBug.scala:39)
    at org.example.OverAggregateBug.main(OverAggregateBug.scala)
Caused by: org.apache.flink.table.api.ValidationException: Over Agg: The window 
rank function without order by. please re-check the over window statement.
    at 

[jira] [Assigned] (FLINK-31153) Create a release branch

2023-02-21 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reassigned FLINK-31153:
-

Assignee: Leonard Xu

> Create a release branch
> ---
>
> Key: FLINK-31153
> URL: https://issues.apache.org/jira/browse/FLINK-31153
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Matthias Pohl
>Assignee: Leonard Xu
>Priority: Major
>
> If you are doing a new minor release, you need to update Flink version in the 
> following repositories:
>  * [apache/flink|https://github.com/apache/flink]
>  * [apache/flink-docker|https://github.com/apache/flink-docker]
>  * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks]
> Patch releases don't require these repositories to be touched. Simply 
> check out the already existing branch for that version:
> {code:java}
> $ cd ./tools
> $ git checkout release-$SHORT_RELEASE_VERSION
> {code}
> h4. Flink repository
> Create a branch for the new version that we want to release before updating 
> the master branch to the next development version:
> {code:bash}
> $ cd ./tools
> $ releasing/create_snapshot_branch.sh
> $ git checkout master
> $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION 
> releasing/update_branch_version.sh
> {code}
> In the {{master}} branch, add a new value (e.g. {{{}v1_16("1.16"){}}}) to 
> [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
>  as the last entry:
> {code:java}
> // ...
> v1_12("1.12"),
> v1_13("1.13"),
> v1_14("1.14"),
> v1_15("1.15"),
> v1_16("1.16");
> {code}
> The newly created branch and updated {{master}} branch need to be pushed to 
> the official repository.
> h4. Flink Docker Repository
> Afterwards fork off from {{dev-master}} a {{dev-x.y}} branch in the 
> [apache/flink-docker|https://github.com/apache/flink-docker] repository. Make 
> sure that 
> [apache/flink-docker:.github/workflows/ci.yml|https://github.com/apache/flink-docker/blob/dev-master/.github/workflows/ci.yml]
>  points to the correct snapshot version; for {{dev-x.y}} it should point to 
> {{{}x.y-SNAPSHOT{}}}, while for {{dev-master}} it should point to the most 
> recent snapshot version ({{$NEXT_SNAPSHOT_VERSION}}).
> After pushing the new minor release branch, as the last step you should also 
> update the documentation workflow to also build the documentation for the new 
> release branch. Check [Managing 
> Documentation|https://cwiki.apache.org/confluence/display/FLINK/Managing+Documentation]
>  on details on how to do that. You may also want to manually trigger a build 
> to make the changes visible as soon as possible.
>  
> 
> h3. Expectations (Minor Version only)
>  * Release branch has been created and pushed
>  * Cron job has been added on the release branch in 
> ([apache-flink:./tools/azure-pipelines/build-apache-repo.yml|https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-apache-repo.yml])
>  * Originating branch has the version information updated to the new version
>  * New version is added to the 
> [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
>  enum.
>  * Make sure {{flink-docker}} has {{dev-x.y}} branch and docker e2e tests run 
> against this branch
>  * docs/config.toml has been updated appropriately.
>  * The {{dev-x.y}} branch of ({{{}$CURRENT_SNAPSHOT_VERSION{}}}) has been 
> created in the Flink Benchmark repo.
>  * The {{flink.version}} property of Flink Benchmark repo has been updated to 
> the latest snapshot version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31150) Cross team testing

2023-02-21 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691465#comment-17691465
 ] 

Matthias Pohl commented on FLINK-31150:
---

[~renqs] created FLINK-30926 as an umbrella ticket for the 1.17 release-testing 
efforts.

> Cross team testing
> --
>
> Key: FLINK-31150
> URL: https://issues.apache.org/jira/browse/FLINK-31150
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Matthias Pohl
>Assignee: Qingsheng Ren
>Priority: Major
>
> For user facing features that go into the release we'd like to ensure they 
> can actually _be used_ by Flink users. To achieve this the release managers 
> ensure that an issue for cross team testing is created in the Apache Flink 
> Jira. This can and should be picked up by other community members to verify 
> the functionality and usability of the feature.
> The issue should contain some entry points which enable other community 
> members to test it. It should not contain documentation on how to use the 
> feature as this should be part of the actual documentation. The cross team 
> tests are performed after the feature freeze. Documentation should be in 
> place before that. Those tests are manual tests, so do not confuse them with 
> automated tests.
> To sum that up:
>  * User facing features should be tested by other contributors
>  * The scope is usability and sanity of the feature
>  * The feature needs to be already documented
>  * The contributor creates an issue containing some pointers on how to get 
> started (e.g. link to the documentation, suggested targets of verification)
>  * Other community members pick those issues up and provide feedback
>  * Cross team testing happens right after the feature freeze



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30948) Remove flink-avro-glue-schema-registry and flink-json-glue-schema-registry from Flink main repo

2023-02-21 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691466#comment-17691466
 ] 

Martijn Visser commented on FLINK-30948:


Fixed in:

master: 29f009b7e8c714cd5af0626e9725eb8538a4bd0f
release-1.17: d58829335557dac6ce428df5a80d4244fccf4491

> Remove flink-avro-glue-schema-registry and flink-json-glue-schema-registry 
> from Flink main repo
> ---
>
> Key: FLINK-30948
> URL: https://issues.apache.org/jira/browse/FLINK-30948
> Project: Flink
>  Issue Type: Sub-task
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Environment: Remove flink-avro-glue-schema-registry and 
> flink-json-glue-schema-registry from Flink main repo, along with associated 
> end-to-end tests
>Reporter: Hong Liang Teoh
>Assignee: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-30948) Remove flink-avro-glue-schema-registry and flink-json-glue-schema-registry from Flink main repo

2023-02-21 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser closed FLINK-30948.
--
Resolution: Fixed

> Remove flink-avro-glue-schema-registry and flink-json-glue-schema-registry 
> from Flink main repo
> ---
>
> Key: FLINK-30948
> URL: https://issues.apache.org/jira/browse/FLINK-30948
> Project: Flink
>  Issue Type: Sub-task
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Environment: Remove flink-avro-glue-schema-registry and 
> flink-json-glue-schema-registry from Flink main repo, along with associated 
> end-to-end tests
>Reporter: Hong Liang Teoh
>Assignee: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] dmvk commented on a diff in pull request #21981: [FLINK-21450][runtime] Support LocalRecovery by AdaptiveScheduler

2023-02-21 Thread via GitHub


dmvk commented on code in PR #21981:
URL: https://github.com/apache/flink/pull/21981#discussion_r1112754334


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/AdaptiveScheduler.java:
##
@@ -794,7 +802,23 @@ public void goToWaitingForResources() {
                         LOG,
                         desiredResources,
                         this.initialResourceAllocationTimeout,
-                        this.resourceStabilizationTimeout));
+                        this.resourceStabilizationTimeout,
+                        null));
+    }
+
+    @Override
+    public void goToWaitingForResources(ExecutionGraph executionGraph) {

Review Comment:
   I'm wondering, do we need an extra `goToWaitingForResources` method? 🤔 
   
   Can we do something along these lines instead?
   
   ```
   final ExecutionGraph previousExecutionGraph = 
state.as(StateWithExecutionGraph.class)
   .map(StateWithExecutionGraph::getExecutionGraph)
   .orElse(null);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31153) Create a release branch

2023-02-21 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-31153:
--
Description: 
If you are doing a new minor release, you need to update Flink version in the 
following repositories:
 * [apache/flink|https://github.com/apache/flink]
 * [apache/flink-docker|https://github.com/apache/flink-docker]
 * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks]

Patch releases don't require these repositories to be touched. Simply 
check out the already existing branch for that version:
{code:java}
$ cd ./tools
$ git checkout release-$SHORT_RELEASE_VERSION
{code}
h4. Flink repository

Create a branch for the new version that we want to release before updating the 
master branch to the next development version:
{code:bash}
$ cd ./tools
$ releasing/create_snapshot_branch.sh
$ git checkout master
$ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION 
releasing/update_branch_version.sh
{code}
In the {{master}} branch, add a new value (e.g. {{{}v1_16("1.16"){}}}) to 
[apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
 as the last entry:
{code:java}
// ...
v1_12("1.12"),
v1_13("1.13"),
v1_14("1.14"),
v1_15("1.15"),
v1_16("1.16");
{code}
The newly created branch and updated {{master}} branch need to be pushed to the 
official repository.
h4. Flink Docker Repository

Afterwards fork off from {{dev-master}} a {{dev-x.y}} branch in the 
[apache/flink-docker|https://github.com/apache/flink-docker] repository. Make 
sure that 
[apache/flink-docker:.github/workflows/ci.yml|https://github.com/apache/flink-docker/blob/dev-master/.github/workflows/ci.yml]
 points to the correct snapshot version; for {{dev-x.y}} it should point to 
{{{}x.y-SNAPSHOT{}}}, while for {{dev-master}} it should point to the most 
recent snapshot version ({{$NEXT_SNAPSHOT_VERSION}}).

After pushing the new minor release branch, as the last step you should also 
update the documentation workflow to also build the documentation for the new 
release branch. Check [Managing 
Documentation|https://cwiki.apache.org/confluence/display/FLINK/Managing+Documentation]
 on details on how to do that. You may also want to manually trigger a build to 
make the changes visible as soon as possible.

 

h3. Expectations (Minor Version only)
 * Release branch has been created and pushed
 * Cron job has been added on the release branch in 
([apache-flink:./tools/azure-pipelines/build-apache-repo.yml|https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-apache-repo.yml])
 * Originating branch has the version information updated to the new version
 * New version is added to the 
[apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
 enum.
 * Make sure [flink-docker|https://github.com/apache/flink-docker/] has 
{{dev-x.y}} branch and docker e2e tests run against this branch
 * docs/config.toml has been updated appropriately.
 * The {{dev-x.y}} branch of ({{{}$CURRENT_SNAPSHOT_VERSION{}}}) has been 
created in the Flink Benchmark repo.
 * The {{flink.version}} property of Flink Benchmark repo has been updated to 
the latest snapshot version.

  was:
If you are doing a new minor release, you need to update Flink version in the 
following repositories:
 * [apache/flink|https://github.com/apache/flink]
 * [apache/flink-docker|https://github.com/apache/flink-docker]
 * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks]

Patch releases don't require these repositories to be touched. Simply 
check out the already existing branch for that version:
{code:java}
$ cd ./tools
$ git checkout release-$SHORT_RELEASE_VERSION
{code}
h4. Flink repository

Create a branch for the new version that we want to release before updating the 
master branch to the next development version:
{code:bash}
$ cd ./tools
$ releasing/create_snapshot_branch.sh
$ git checkout master
$ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION 
releasing/update_branch_version.sh
{code}
In the {{master}} branch, add a new value (e.g. {{{}v1_16("1.16"){}}}) to 
[apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
 as the last entry:
{code:java}
// ...
v1_12("1.12"),
v1_13("1.13"),
v1_14("1.14"),
v1_15("1.15"),
v1_16("1.16");
{code}
The newly created branch and updated {{master}} branch need to be pushed to the 
official repository.
h4. Flink Docker Repository

Afterwards fork off from {{dev-master}} a {{dev-x.y}} branch in the 
[apache/flink-docker|https://github.com/apache/flink-docker] 

[GitHub] [flink-ml] Fanoid commented on a diff in pull request #213: [FLINK-31126] Move classes not depending on Flink runtime from flink-ml-core to flink-ml-servable-core

2023-02-21 Thread via GitHub


Fanoid commented on code in PR #213:
URL: https://github.com/apache/flink-ml/pull/213#discussion_r1112756657


##
flink-ml-servable-core/pom.xml:
##
@@ -0,0 +1,127 @@
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-ml-parent</artifactId>
+    <version>2.2-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>flink-ml-servable-core</artifactId>
+  <name>Flink ML : Servable : Core</name>
+
+  <dependencies>
+
+    <dependency>
+      <groupId>org.apache.flink</groupId>

Review Comment:
   @lindong28 and I had a discussion about `flink-core` in a previous draft PR 
[1]. We agreed this change can be put in a follow-up PR.
   
   [1] https://github.com/apache/flink-ml/pull/199#discussion_r1090122715



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-31092) Hive ITCases fail with OutOfMemoryError

2023-02-21 Thread Leonard Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu reassigned FLINK-31092:
--

Assignee: luoyuxia

> Hive ITCases fail with OutOfMemoryError
> ---
>
> Key: FLINK-31092
> URL: https://issues.apache.org/jira/browse/FLINK-31092
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.17.0
>Reporter: Matthias Pohl
>Assignee: luoyuxia
>Priority: Critical
>  Labels: test-stability
> Attachments: VisualVM-FLINK-31092.png
>
>
> We're experiencing an OutOfMemoryError where the heap space reaches the upper 
> limit:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46161=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23142
> {code}
> Feb 15 05:05:14 [INFO] Running 
> org.apache.flink.table.catalog.hive.HiveCatalogITCase
> Feb 15 05:05:17 [INFO] java.lang.OutOfMemoryError: Java heap space
> Feb 15 05:05:17 [INFO] Dumping heap to java_pid9669.hprof ...
> Feb 15 05:05:28 [INFO] Heap dump file created [1957090051 bytes in 11.718 
> secs]
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.cancelPingScheduler(ForkedBooter.java:209)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.acknowledgedExit(ForkedBooter.java:419)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:186)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:548)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-30951) Release Testing: Verify FLINK-29635 Hive sink should support merge files in batch mode

2023-02-21 Thread Shengkai Fang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shengkai Fang closed FLINK-30951.
-
Resolution: Fixed

> Release Testing: Verify FLINK-29635 Hive sink should support merge files in 
> batch mode
> --
>
> Key: FLINK-30951
> URL: https://issues.apache.org/jira/browse/FLINK-30951
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: luoyuxia
>Assignee: Shengkai Fang
>Priority: Blocker
> Fix For: 1.17.0
>
> Attachments: screenshot-1.png
>
>
> The issue aims to verify FLINK-29635.
> Please verify in batch mode; the documentation is at 
> [https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#file-compaction]:
>  
> 1: enable auto-compaction, write some data to a Hive table such that the 
> average size of the files is less than compaction.small-files.avg-size (16MB 
> by default), and verify that these files are merged (see the sketch below).
> 2: enable auto-compaction, set compaction.small-files.avg-size to a smaller 
> value, then write some data to a Hive table such that the average 
> size of the files is greater than compaction.small-files.avg-size, and verify 
> that these files are not merged.
> 3. set sink.parallelism manually, check that the parallelism of the compact 
> operator is equal to sink.parallelism.
> 4. set compaction.parallelism manually, check that the parallelism of the compact 
> operator is equal to compaction.parallelism.
> 5. set compaction.file-size, check that the size of each merged target file is 
> about the `compaction.file-size`.
>  
> We should verify it by writing to a non-partitioned table, a static partition 
> table, and a dynamic partition table.
> We can find example SQL for how to create & write a Hive table in the 
> codebase: 
> [HiveTableCompactSinkITCase|https://github.com/apache/flink/blob/0915c9850d861165e283acc0f60545cd836f0567/flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/HiveTableCompactSinkITCase.java].
>  
>  
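
A minimal sketch of step 1 above, assuming a Hive catalog is already registered and tables named sink_table/source_table exist (the compaction options are supplied as dynamic table options here; they can equally be set in the table's properties):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HiveCompactionSketch {
    public static void main(String[] args) {
        // File compaction for the Hive sink applies in batch mode.
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());

        tEnv.executeSql(
                "INSERT INTO sink_table /*+ OPTIONS("
                        + "'auto-compaction'='true', "
                        + "'compaction.small-files.avg-size'='16MB') */ "
                        + "SELECT * FROM source_table");
    }
}
{code}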



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-ml] zhipeng93 commented on a diff in pull request #213: [FLINK-31126] Move classes not depending on Flink runtime from flink-ml-core to flink-ml-servable-core

2023-02-21 Thread via GitHub


zhipeng93 commented on code in PR #213:
URL: https://github.com/apache/flink-ml/pull/213#discussion_r1112772912


##
flink-ml-servable-core/pom.xml:
##
@@ -0,0 +1,127 @@
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-ml-parent</artifactId>
+    <version>2.2-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>flink-ml-servable-core</artifactId>
+  <name>Flink ML : Servable : Core</name>
+
+  <dependencies>
+
+    <dependency>
+      <groupId>org.apache.flink</groupId>

Review Comment:
   Thanks for the explanation. I agree it is a reasonable solution in the long 
term.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31059) Release Testing: Verify FLINK-29717 Supports hive udaf such as sum/count by native implementation

2023-02-21 Thread dalongliu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691472#comment-17691472
 ] 

dalongliu commented on FLINK-31059:
---

[~miamiaoxyz] Thanks for your verification. As for the problem, I will look into it.

> Release Testing: Verify FLINK-29717 Supports hive udaf such as sum/count by 
> native implementation
> -
>
> Key: FLINK-31059
> URL: https://issues.apache.org/jira/browse/FLINK-31059
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Affects Versions: 1.17.0
>Reporter: dalongliu
>Assignee: miamiaoxyz
>Priority: Blocker
> Fix For: 1.17.0
>
> Attachments: image-2023-02-21-15-45-48-226.png, 
> image-2023-02-21-15-46-13-966.png, image-2023-02-21-15-47-54-043.png, 
> image-2023-02-21-15-49-58-854.png, image-2023-02-21-15-59-44-470.png, 
> image-2023-02-21-16-28-22-038.png, image-2023-02-21-16-29-42-983.png, 
> image-2023-02-21-16-31-58-361.png, image-2023-02-21-16-35-46-294.png
>
>
> This task aims to verify 
> [FLINK-29717|https://issues.apache.org/jira/browse/FLINK-29717] which 
> improves the hive udaf performance.
> As described in the documentation [PR|https://github.com/apache/flink/pull/21789], 
> please verify:
> 1. Enabling the option `table.exec.hive.native-agg-function.enabled`, use the 
> sum/count/avg/min/max functions separately in the query to verify if the 
> hash-agg strategy is chosen via plan, and verify if the data results are the 
> same as when the option `table.exec.hive.native-agg-function.enabled` is 
> disabled.
> 2. Enabling the option `table.exec.hive.native-agg-function.enabled`, combine 
> sum/count/avg/min/max functions in query, verify if the hash-agg strategy is 
> chosen via plan, and verify if the data results are the same as when option 
> `table.exec.hive.native-agg-function.enabled` is disabled.
> 3. Enabling the option `table.exec.hive.native-agg-function.enabled`, count 
> or max array and other complex types in query, verify whether the 
> sort-agg strategy is chosen via plan, verify whether the data result is the 
> same as when option `table.exec.hive.native-agg-function.enabled` is disabled.
> 4. Enabling the option `table.exec.hive.native-agg-function.enabled`, use the 
> sum/count and first_value/last_value functions in the query simultaneously, 
> verify that the sort-agg strategy is chosen via plan, verify that the data is 
> the same as when option `table.exec.hive.native-agg-function.enabled` is 
> disabled.
> 5. Enabling the option `table.exec.hive.native-agg-function.enabled`, use the 
> sum/count/avg/min/max functions in the query and open sort-agg strategy 
> forcibly, verify that the data results are the same as when option 
> `table.exec.hive.native-agg-function.enabled` is disabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-31059) Release Testing: Verify FLINK-29717 Supports hive udaf such as sum/count by native implementation

2023-02-21 Thread dalongliu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dalongliu closed FLINK-31059.
-
Resolution: Fixed

> Release Testing: Verify FLINK-29717 Supports hive udaf such as sum/count by 
> native implementation
> -
>
> Key: FLINK-31059
> URL: https://issues.apache.org/jira/browse/FLINK-31059
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Affects Versions: 1.17.0
>Reporter: dalongliu
>Assignee: miamiaoxyz
>Priority: Blocker
> Fix For: 1.17.0
>
> Attachments: image-2023-02-21-15-45-48-226.png, 
> image-2023-02-21-15-46-13-966.png, image-2023-02-21-15-47-54-043.png, 
> image-2023-02-21-15-49-58-854.png, image-2023-02-21-15-59-44-470.png, 
> image-2023-02-21-16-28-22-038.png, image-2023-02-21-16-29-42-983.png, 
> image-2023-02-21-16-31-58-361.png, image-2023-02-21-16-35-46-294.png
>
>
> This task aims to verify 
> [FLINK-29717|https://issues.apache.org/jira/browse/FLINK-29717] which 
> improves the hive udaf performance.
> As described in the documentation [PR|https://github.com/apache/flink/pull/21789], 
> please verify:
> 1. Enabling the option `table.exec.hive.native-agg-function.enabled`, use the 
> sum/count/avg/min/max functions separately in the query to verify if the 
> hash-agg strategy is chosen via plan, and verify if the data results are the 
> same as when the option `table.exec.hive.native-agg-function.enabled` is 
> disabled.
> 2. Enabling the option `table.exec.hive.native-agg-function.enabled`, combine 
> sum/count/avg/min/max functions in query, verify if the hash-agg strategy is 
> chosen via plan, and verify if the data results are the same as when option 
> `table.exec.hive.native-agg-function.enabled` is disabled.
> 3. Enabling the option `table.exec.hive.native-agg-function.enabled`, count 
> or max array and other complex types in query, verify whether the 
> sort-agg strategy is chosen via plan, verify whether the data result is the 
> same as when option `table.exec.hive.native-agg-function.enabled` is disabled.
> 4. Enabling the option `table.exec.hive.native-agg-function.enabled`, use the 
> sum/count and first_value/last_value functions in the query simultaneously, 
> verify that the sort-agg strategy is chosen via plan, verify that the data is 
> the same as when option `table.exec.hive.native-agg-function.enabled` is 
> disabled.
> 5. Enabling the option `table.exec.hive.native-agg-function.enabled`, use the 
> sum/count/avg/min/max functions in the query and open sort-agg strategy 
> forcibly, verify that the data results are the same as when option 
> `table.exec.hive.native-agg-function.enabled` is disabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-ml] zhipeng93 commented on pull request #213: [FLINK-31126] Move classes not depending on Flink runtime from flink-ml-core to flink-ml-servable-core

2023-02-21 Thread via GitHub


zhipeng93 commented on PR #213:
URL: https://github.com/apache/flink-ml/pull/213#issuecomment-1438119732

   Thanks for the PR. LGTM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-ml] zhipeng93 merged pull request #213: [FLINK-31126] Move classes not depending on Flink runtime from flink-ml-core to flink-ml-servable-core

2023-02-21 Thread via GitHub


zhipeng93 merged PR #213:
URL: https://github.com/apache/flink-ml/pull/213


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-ml] lindong28 commented on pull request #213: [FLINK-31126] Move classes not depending on Flink runtime from flink-ml-core to flink-ml-servable-core

2023-02-21 Thread via GitHub


lindong28 commented on PR #213:
URL: https://github.com/apache/flink-ml/pull/213#issuecomment-1438122258

   Thank you both for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31153) Create a release branch

2023-02-21 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-31153:
--
Description: 
If you are doing a new minor release, you need to update Flink version in the 
following repositories:
 * [apache/flink|https://github.com/apache/flink]
 * [apache/flink-docker|https://github.com/apache/flink-docker]
 * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks]

Patch releases don't require these repositories to be touched. Simply 
check out the already existing branch for that version:
{code:java}
$ cd ./tools
$ git checkout release-$SHORT_RELEASE_VERSION
{code}
h4. Flink repository

Create a branch for the new version that we want to release before updating the 
master branch to the next development version:
{code:bash}
$ cd ./tools
$ releasing/create_snapshot_branch.sh
$ git checkout master
$ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION 
releasing/update_branch_version.sh
{code}
In the {{master}} branch, add a new value (e.g. {{{}v1_16("1.16"){}}}) to 
[apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
 as the last entry:
{code:java}
// ...
v1_12("1.12"),
v1_13("1.13"),
v1_14("1.14"),
v1_15("1.15"),
v1_16("1.16");
{code}
The newly created branch and updated {{master}} branch need to be pushed to the 
official repository.
h4. Flink Docker Repository

Afterwards fork off from {{dev-master}} a {{dev-x.y}} branch in the 
[apache/flink-docker|https://github.com/apache/flink-docker] repository. Make 
sure that 
[apache/flink-docker:.github/workflows/ci.yml|https://github.com/apache/flink-docker/blob/dev-master/.github/workflows/ci.yml]
 points to the correct snapshot version; for {{dev-x.y}} it should point to 
{{{}x.y-SNAPSHOT{}}}, while for {{dev-master}} it should point to the most 
recent snapshot version ({{$NEXT_SNAPSHOT_VERSION}}).

After pushing the new minor release branch, as the last step you should 
update the documentation workflow so that it also builds the documentation for 
the new release branch. Check [Managing 
Documentation|https://cwiki.apache.org/confluence/display/FLINK/Managing+Documentation]
 for details on how to do that. You may also want to manually trigger a build to 
make the changes visible as soon as possible.

 

h3. Expectations (Minor Version only)
 * Release branch has been created and pushed
 * Cron job has been added on the release branch in 
([apache-flink:./tools/azure-pipelines/build-apache-repo.yml|https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-apache-repo.yml])
 * Changes on the new release branch are picked up by [Azure 
CI|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=1&_a=summary]
 * Originating branch has the version information updated to the new version
 * New version is added to the 
[apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
 enum.
 * Make sure [flink-docker|https://github.com/apache/flink-docker/] has 
{{dev-x.y}} branch and docker e2e tests run against this branch in the 
corresponding Apache Flink release branch (see 
[apache/flink:flink-end-to-end-tests/test-scripts/common_docker.sh:51|https://github.com/apache/flink/blob/master/flink-end-to-end-tests/test-scripts/common_docker.sh#L51])
 * 
[apache-flink:docs/config.toml|https://github.com/apache/flink/blob/release-1.17/docs/config.toml]
 has been updated appropriately in the new Apache Flink release branch.
 * The {{dev-x.y}} branch ({{$CURRENT_SNAPSHOT_VERSION}}) has been 
created in the Flink Benchmark repo.
 * The {{flink.version}} property (see 
[apache/flink-benchmarks:pom.xml|https://github.com/apache/flink-benchmarks/blob/master/pom.xml#L48])
 of the Flink Benchmark repo has been updated to the latest snapshot version.

  was:
If you are doing a new minor release, you need to update Flink version in the 
following repositories:
 * [apache/flink|https://github.com/apache/flink]
 * [apache/flink-docker|https://github.com/apache/flink-docker]
 * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks]

Patch releases don't require these repositories to be touched. Simply 
check out the already existing branch for that version:
{code:java}
$ cd ./tools
$ git checkout release-$SHORT_RELEASE_VERSION
{code}
h4. Flink repository

Create a branch for the new version that we want to release before updating the 
master branch to the next development version:
{code:bash}
$ cd ./tools
$ releasing/create_snapshot_branch.sh
$ git checkout master
$ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION 
releasing/update_branch_version.sh
{code}
In the {{master}} branch, add a new 

[jira] [Updated] (FLINK-31147) Create a new version in JIRA

2023-02-21 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-31147:
--
Description: 
When contributors resolve an issue in JIRA, they are tagging it with a release 
that will contain their changes. With the release currently underway, new 
issues should be resolved against a subsequent future release. Therefore, you 
should create a release item for this subsequent release, as follows:
 # In JIRA, navigate to the [Flink > Administration > 
Versions|https://issues.apache.org/jira/plugins/servlet/project-config/FLINK/versions].
 # Add a new release: choose the next minor version number compared to the one 
currently underway, select today’s date as the Start Date, and choose Add.
(Note: Only PMC members have access to the project administration. If you do 
not have access, ask on the mailing list for assistance.)

 

h3. Expectations
 * The new version should be listed in the dropdown menu of {{fixVersion}} or 
{{affectedVersion}} under "unreleased versions" when creating a new Jira issue.

  was:
When contributors resolve an issue in JIRA, they are tagging it with a release 
that will contain their changes. With the release currently underway, new 
issues should be resolved against a subsequent future release. Therefore, you 
should create a release item for this subsequent release, as follows:
 # In JIRA, navigate to the [Flink > Administration > 
Versions|https://issues.apache.org/jira/plugins/servlet/project-config/FLINK/versions].
 # Add a new release: choose the next minor version number compared to the one 
currently underway, select today’s date as the Start Date, and choose Add.
(Note: Only PMC members have access to the project administration. If you do 
not have access, ask on the mailing list for assistance.)


> Create a new version in JIRA
> 
>
> Key: FLINK-31147
> URL: https://issues.apache.org/jira/browse/FLINK-31147
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Matthias Pohl
>Priority: Major
>
> When contributors resolve an issue in JIRA, they are tagging it with a 
> release that will contain their changes. With the release currently underway, 
> new issues should be resolved against a subsequent future release. Therefore, 
> you should create a release item for this subsequent release, as follows:
>  # In JIRA, navigate to the [Flink > Administration > 
> Versions|https://issues.apache.org/jira/plugins/servlet/project-config/FLINK/versions].
>  # Add a new release: choose the next minor version number compared to the 
> one currently underway, select today’s date as the Start Date, and choose Add.
> (Note: Only PMC members have access to the project administration. If you do 
> not have access, ask on the mailing list for assistance.)
>  
> 
> h3. Expectations
>  * The new version should be listed in the dropdown menu of {{fixVersion}} or 
> {{affectedVersion}} under "unreleased versions" when creating a new Jira 
> issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31149) Review and update documentation

2023-02-21 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-31149:
--
Description: 
There are a few pages in the documentation that need to be reviewed and updated 
for each release.
 * Ensure that there exists a release notes page for each non-bugfix release 
(e.g., 1.5.0) in {{{}./docs/release-notes/{}}}, that it is up-to-date, and 
linked from the start page of the documentation.
 * Upgrading Applications and Flink Versions: 
[https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html]
 * ...

 

h3. Expectations
 * Update upgrade compatibility table 
([apache-flink:./docs/content/docs/ops/upgrading.md|https://github.com/apache/flink/blob/master/docs/content/docs/ops/upgrading.md#compatibility-table]
 and 
[apache-flink:./docs/content.zh/docs/ops/upgrading.md|https://github.com/apache/flink/blob/master/docs/content.zh/docs/ops/upgrading.md#compatibility-table]).
 * Update [Release Overview in 
Confluence|https://cwiki.apache.org/confluence/display/FLINK/Release+Management+and+Feature+Plan]
 * (minor only) The documentation for the new major release is visible under 
[https://nightlies.apache.org/flink/flink-docs-release-$SHORT_RELEASE_VERSION] 
(after at least one [doc 
build|https://github.com/apache/flink/actions/workflows/docs.yml] succeeded).
 * (minor only) The documentation for the new major release does not contain 
"-SNAPSHOT" in its version title, and all links refer to the corresponding 
version docs instead of {{{}master{}}}.

  was:
There are a few pages in the documentation that need to be reviewed and updated 
for each release.
 * Ensure that there exists a release notes page for each non-bugfix release 
(e.g., 1.5.0) in {{{}./docs/release-notes/{}}}, that it is up-to-date, and 
linked from the start page of the documentation.
 * Upgrading Applications and Flink Versions: 
[https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html]
 * ...

 

h3. Expectations
 * Update upgrade compatibility table 
([apache-flink:./docs/content/docs/ops/upgrading.md|https://github.com/apache/flink/blob/master/docs/content/docs/ops/upgrading.md#compatibility-table]
 and 
[apache-flink:./docs/content.zh/docs/ops/upgrading.md|https://github.com/apache/flink/blob/master/docs/content.zh/docs/ops/upgrading.md#compatibility-table]).
 * Update [Release Overview in 
Confluence|https://cwiki.apache.org/confluence/display/FLINK/Release+Management+and+Feature+Plan]
 * (major only) The documentation for the new major release is visible under 
https://nightlies.apache.org/flink/flink-docs-release-$SHORT_RELEASE_VERSION 
(after at least one [doc 
build|https://github.com/apache/flink/actions/workflows/docs.yml] succeeded).
 * (major only) The documentation for the new major release does not contain 
"-SNAPSHOT" in its version title, and all links refer to the corresponding 
version docs instead of {{master}}.


> Review and update documentation
> ---
>
> Key: FLINK-31149
> URL: https://issues.apache.org/jira/browse/FLINK-31149
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Matthias Pohl
>Priority: Major
>
> There are a few pages in the documentation that need to be reviewed and 
> updated for each release.
>  * Ensure that there exists a release notes page for each non-bugfix release 
> (e.g., 1.5.0) in {{{}./docs/release-notes/{}}}, that it is up-to-date, and 
> linked from the start page of the documentation.
>  * Upgrading Applications and Flink Versions: 
> [https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html]
>  * ...
>  
> 
> h3. Expectations
>  * Update upgrade compatibility table 
> ([apache-flink:./docs/content/docs/ops/upgrading.md|https://github.com/apache/flink/blob/master/docs/content/docs/ops/upgrading.md#compatibility-table]
>  and 
> [apache-flink:./docs/content.zh/docs/ops/upgrading.md|https://github.com/apache/flink/blob/master/docs/content.zh/docs/ops/upgrading.md#compatibility-table]).
>  * Update [Release Overview in 
> Confluence|https://cwiki.apache.org/confluence/display/FLINK/Release+Management+and+Feature+Plan]
>  * (minor only) The documentation for the new major release is visible under 
> [https://nightlies.apache.org/flink/flink-docs-release-$SHORT_RELEASE_VERSION]
>  (after at least one [doc 
> build|https://github.com/apache/flink/actions/workflows/docs.yml] succeeded).
>  * (minor only) The documentation for the new major release does not contain 
> "-SNAPSHOT" in its version title, and all links refer to the corresponding 
> version docs instead of {{{}master{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] reswqa commented on a diff in pull request #21977: [FLINK-31132] Compact without setting parallelism does not follow the configured sink parallelism for HiveTableSink

2023-02-21 Thread via GitHub


reswqa commented on code in PR #21977:
URL: https://github.com/apache/flink/pull/21977#discussion_r1112641083


##
flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/HiveTableCompactSinkITCase.java:
##
@@ -42,17 +44,18 @@
 import static org.assertj.core.api.Assertions.assertThat;
 
 /** IT case for Hive table compaction in batch mode. */
-public class HiveTableCompactSinkITCase {
+@ExtendWith(TestLoggerExtension.class)
+class HiveTableCompactSinkITCase {
 
 @RegisterExtension
-private static final MiniClusterExtension MINI_CLUSTER = new 
MiniClusterExtension();
+public static final MiniClusterExtension MINI_CLUSTER = new 
MiniClusterExtension();

Review Comment:
   TBH, the reason I changed it is only to follow the usage in the 
`MiniClusterExtension` class 
[javadoc](https://github.com/apache/flink/blob/bacdc326b58749924acbd8921d63eda06663a225/flink-test-utils-parent/flink-test-utils/src/main/java/org/apache/flink/test/junit5/MiniClusterExtension.java#L70).
   
   At the same time, in the `architecture check` of IT cases, we can see a similar 
[requirement](https://github.com/apache/flink/blob/bacdc326b58749924acbd8921d63eda06663a225/flink-architecture-tests/flink-architecture-tests-test/src/main/java/org/apache/flink/architecture/rules/ITCaseRules.java#L67).
 Unfortunately, when I looked at the actual check logic, it didn't seem to require 
the field to be public, which really confused me.
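   
   For reference, a minimal sketch of the pattern that javadoc describes (the test class and body here are illustrative, not code from this PR). Note that JUnit 5 itself only requires the extension field to be non-private; `public` matches the javadoc example and the architecture rule's wording:
   
   ```java
   import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
   import org.apache.flink.test.junit5.MiniClusterExtension;
   import org.apache.flink.util.TestLoggerExtension;
   
   import org.junit.jupiter.api.Test;
   import org.junit.jupiter.api.extension.ExtendWith;
   import org.junit.jupiter.api.extension.RegisterExtension;
   
   @ExtendWith(TestLoggerExtension.class)
   class ExampleITCase {
   
       // JUnit 5 only rejects private extension fields; "public" follows the javadoc.
       @RegisterExtension
       public static final MiniClusterExtension MINI_CLUSTER =
               new MiniClusterExtension(
                       new MiniClusterResourceConfiguration.Builder()
                               .setNumberTaskManagers(1)
                               .setNumberSlotsPerTaskManager(1)
                               .build());
   
       @Test
       void smokeTest() {
           // A real IT case would submit a job against the mini cluster here.
       }
   }
   ```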
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-ml] lindong28 commented on pull request #214: [FLINK-31127] Add public API classes for FLIP-289

2023-02-21 Thread via GitHub


lindong28 commented on PR #214:
URL: https://github.com/apache/flink-ml/pull/214#issuecomment-1438134181

   @jiangxin369 Can you help review this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] FangYongs commented on pull request #21956: [FLINK-31113] Support AND filter push down in orc

2023-02-21 Thread via GitHub


FangYongs commented on PR #21956:
URL: https://github.com/apache/flink/pull/21956#issuecomment-1438133167

   Thanks @libenchao, I have updated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31152) Setup for the executing release manager

2023-02-21 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-31152:
--
Description: 
h4. GPG Key

You need to have a GPG key to sign the release artifacts. Please be aware of 
the ASF-wide [release signing 
guidelines|https://www.apache.org/dev/release-signing.html]. If you don’t have 
a GPG key associated with your Apache account, please create one according to 
the guidelines.

Determine your Apache GPG Key and Key ID, as follows:
{code:java}
$ gpg --list-keys
{code}
This will list your GPG keys. One of these should reflect your Apache account, 
for example:
{code:java}
--
pub   2048R/845E6689 2016-02-23
uid  Nomen Nescio 
sub   2048R/BA4D50BE 2016-02-23
{code}
In the example above, the key ID is the 8-digit hex string in the {{pub}} line: 
{{{}845E6689{}}}.

Now, add your Apache GPG key to the Flink’s {{KEYS}} file in the [Apache Flink 
release KEYS file|https://dist.apache.org/repos/dist/release/flink/KEYS] 
repository at [dist.apache.org|http://dist.apache.org/]. Follow the 
instructions listed at the top of these files. (Note: Only PMC members have 
write access to the release repository. If you end up getting 403 errors ask on 
the mailing list for assistance.)

Configure {{git}} to use this key when signing code by giving it your key ID, 
as follows:
{code:java}
$ git config --global user.signingkey 845E6689
{code}
You may drop the {{--global}} option if you’d prefer to use this key for the 
current repository only.

You may wish to start {{gpg-agent}} to unlock your GPG key only once using your 
passphrase. Otherwise, you may need to enter this passphrase hundreds of times. 
The setup for {{gpg-agent}} varies based on operating system, but may be 
something like this:
{code:bash}
$ eval $(gpg-agent --daemon --no-grab --write-env-file $HOME/.gpg-agent-info)
$ export GPG_TTY=$(tty)
$ export GPG_AGENT_INFO
{code}
h4. Access to Apache Nexus repository

Configure access to the [Apache Nexus 
repository|https://repository.apache.org/], which enables final deployment of 
releases to the Maven Central Repository.
 # Log in with your Apache account.
 # Confirm you have appropriate access by finding {{org.apache.flink}} under 
{{{}Staging Profiles{}}}.
 # Navigate to your {{Profile}} (top right drop-down menu of the page).
 # Choose {{User Token}} from the dropdown, then click {{{}Access User 
Token{}}}. Copy a snippet of the Maven XML configuration block.
 # Insert this snippet twice into your global Maven {{settings.xml}} file, 
typically {{{}${HOME}/.m2/settings.xml{}}}. The end result should look like 
this, where {{TOKEN_NAME}} and {{TOKEN_PASSWORD}} are your secret tokens:
{code:xml}
<settings>
  <servers>
    <server>
      <id>apache.releases.https</id>
      <username>TOKEN_NAME</username>
      <password>TOKEN_PASSWORD</password>
    </server>
    <server>
      <id>apache.snapshots.https</id>
      <username>TOKEN_NAME</username>
      <password>TOKEN_PASSWORD</password>
    </server>
  </servers>
</settings>
{code}

h4. Website development setup

Get ready for updating the Flink website by following the [website development 
instructions|https://flink.apache.org/contributing/improve-website.html].
h4. GNU Tar Setup for Mac (Skip this step if you are not using a Mac)

The default tar application on Mac does not support GNU archive format and 
defaults to Pax. This bloats the archive with unnecessary metadata that can 
result in additional files when decompressing (see [1.15.2-RC2 vote 
thread|https://lists.apache.org/thread/mzbgsb7y9vdp9bs00gsgscsjv2ygy58q]). 
Install gnu-tar and create a symbolic link to use in preference of the default 
tar program.
{code:bash}
$ brew install gnu-tar
$ ln -s /usr/local/bin/gtar /usr/local/bin/tar
$ which tar
{code}
 

h3. Expectations
 * Release Manager’s GPG key is published to 
[dist.apache.org|http://dist.apache.org/]
 * Release Manager’s GPG key is configured in git configuration
 * Release Manager's GPG key is configured as the default gpg key.
 * Release Manager has {{org.apache.flink}} listed under Staging Profiles in 
Nexus
 * Release Manager’s Nexus User Token is configured in settings.xml

  was:
h4. GPG Key

You need to have a GPG key to sign the release artifacts. Please be aware of 
the ASF-wide [release signing 
guidelines|https://www.apache.org/dev/release-signing.html]. If you don’t have 
a GPG key associated with your Apache account, please create one according to 
the guidelines.

Determine your Apache GPG Key and Key ID, as follows:
{code:java}
$ gpg --list-keys
{code}
This will list your GPG keys. One of these should reflect your Apache account, 
for example:
{code:java}
--
pub   2048R/845E6689 2016-02-23
uid  Nomen Nescio 
sub   2048R/BA4D50BE 2016-02-23
{code}
In the example above, the key ID is the 8-digit hex string in the {{pub}} line: 
{{{}845E6689{}}}.

Now, add your Apache GPG key to the Flink’s {{KEYS}} file in the [Apache 

[jira] [Updated] (FLINK-31144) Slow scheduling on large-scale batch jobs

2023-02-21 Thread Julien Tournay (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Tournay updated FLINK-31144:
---
Attachment: image-2023-02-21-10-29-49-388.png

> Slow scheduling on large-scale batch jobs 
> --
>
> Key: FLINK-31144
> URL: https://issues.apache.org/jira/browse/FLINK-31144
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Julien Tournay
>Priority: Major
> Attachments: flink-1.17-snapshot-1676473798013.nps, 
> image-2023-02-21-10-29-49-388.png
>
>
> When executing a complex job graph at high parallelism, 
> `DefaultPreferredLocationsRetriever.getPreferredLocationsBasedOnInputs` can 
> get slow and cause long pauses where the JobManager becomes unresponsive and 
> all the taskmanagers just wait. I've attached a VisualVM snapshot to 
> illustrate the problem. [^flink-1.17-snapshot-1676473798013.nps]
> At Spotify we have complex jobs where this issue can cause batch "pauses" of 
> 40+ minutes and make the overall execution 30% slower or more.
> More importantly, this prevents us from running said jobs on larger clusters, 
> as adding resources to the cluster worsens the issue.
> We have successfully tested a modified Flink version where 
> `DefaultPreferredLocationsRetriever.getPreferredLocationsBasedOnInputs` was 
> commented out entirely and simply returns an empty collection, and confirmed 
> that it solves the issue.
> In the same spirit as a recent change 
> ([https://github.com/apache/flink/blob/43f419d0eccba86ecc8040fa6f521148f1e358ff/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultPreferredLocationsRetriever.java#L98-L102]),
>  there could be a mechanism in place to detect when Flink runs into this 
> specific issue and just skip the call to `getInputLocationFutures` 
> ([https://github.com/apache/flink/blob/43f419d0eccba86ecc8040fa6f521148f1e358ff/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultPreferredLocationsRetriever.java#L105-L108]).
> I'm not familiar enough with the internals of Flink to propose a more 
> advanced fix; however, it seems like a configurable threshold on the number of 
> consumer vertices, above which the preferred location is not computed, would 
> do. If this solution is good enough, I'd be happy to submit a PR.
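
A rough sketch of the proposed threshold (an editorial illustration; 
{{MAX_PRODUCERS_FOR_LOCALITY}} and the surrounding class are assumed names, not 
an existing Flink option or API):
{code:java}
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;

final class PreferredLocationsSketch {

    // Assumed threshold; a real implementation would make this configurable.
    private static final int MAX_PRODUCERS_FOR_LOCALITY = 1000;

    // Skip the expensive input-location lookup for vertices with too many
    // upstream producers and fall back to "no location preference".
    static <T> CompletableFuture<Collection<T>> preferredLocations(
            int numProducers, CompletableFuture<Collection<T>> inputLocationsLookup) {
        if (numProducers > MAX_PRODUCERS_FOR_LOCALITY) {
            return CompletableFuture.completedFuture(Collections.<T>emptyList());
        }
        return inputLocationsLookup;
    }
}
{code}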



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-ml] zhipeng93 commented on a diff in pull request #212: [FLINK-31125] Flink ML benchmark framework should minimize the source operator overhead

2023-02-21 Thread via GitHub


zhipeng93 commented on code in PR #212:
URL: https://github.com/apache/flink-ml/pull/212#discussion_r1112795550


##
flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/datagenerator/common/DenseVectorArrayGenerator.java:
##
@@ -42,7 +42,7 @@ protected RowGenerator[] getRowGenerators() {
 return new RowGenerator[] {
 new RowGenerator(getNumValues(), getSeed()) {
 @Override
-protected Row nextRow() {
+protected Row getRow() {

Review Comment:
   Reusing one row to simulate the input may make the benchmark result 
unstable, since processing two different rows may take different amounts of time.
   
   The two optimizations introduced in this PR give us a 12% speedup. Is there a 
breakdown analysis of these two optimizations?
   
   If the data generation cost is indeed high, can we sample a fixed number of 
rows and reuse it in the evaluation?
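   
   A minimal sketch of that pooling idea (names are illustrative, not the actual Flink ML `RowGenerator` API): pre-generate a fixed pool once and serve rows round-robin, so the generation cost is paid only once per pooled row while downstream operators still see varying inputs.
   
   ```java
   import org.apache.flink.types.Row;
   
   import java.util.Random;
   
   /** Pre-generates a fixed pool of random rows and serves them round-robin. */
   class PooledRowSource {
       private static final int POOL_SIZE = 1024;
       private final Row[] pool = new Row[POOL_SIZE];
       private long served;
   
       PooledRowSource(long seed, int vectorSize) {
           Random random = new Random(seed);
           for (int i = 0; i < POOL_SIZE; i++) {
               double[] values = new double[vectorSize];
               for (int j = 0; j < vectorSize; j++) {
                   values[j] = random.nextDouble();
               }
               // One field per row, holding the generated vector.
               pool[i] = Row.of((Object) values);
           }
       }
   
       Row getRow() {
           // Reuse pre-generated rows instead of allocating a new one per record.
           return pool[(int) (served++ % POOL_SIZE)];
       }
   }
   ```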



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31144) Slow scheduling on large-scale batch jobs

2023-02-21 Thread Julien Tournay (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691474#comment-17691474
 ] 

Julien Tournay commented on FLINK-31144:


Hi [~huwh],


Thank you for the quick reply :)


{quote}when the topology is complex.
{quote}

Indeed. For the issue to be noticeable, the job graph has to be fairly complex, 
feature all-to-all distributions, and execute with high parallelism.

 
{quote}1. Is the slow scheduling or the scheduled result of location preferred 
make your job slow?
{quote}

Yes, it very much does. We have a job that takes ~2h30 (after many, many tweaks 
to get the best possible performance). It's impossible to get it to run in less 
time because adding more taskmanagers makes the scheduling slower and the 
overall execution longer. Removing preferred locations makes it possible to run 
it in less than 2h (we're aiming at ~1h45min).

 
{quote}2. "we have complex jobs where this issue can cause batch "pause" of 40+ 
minutes"  What does "pause" meaning? Is the getPreferredLocationsBasedOnInputs 
take more than 40+ minutes?
{quote}

By "pause" I mean that at the beginning of the execution, the taskmanagers will 
wait for the JobManager for ~40min and then will start processing. With Flink 
1.17 and no preferred location, the "pause" is down to ~5min.

I should also mention that the JM is very unresponsive and the web console 
struggles to show anything.


{quote}Could you provide the topology of the complex job.
{quote}

I can, but I'm not sure what format to use. The graph is quite big and a simple 
screenshot is unreadable: !image-2023-02-21-10-29-49-388.png!

I can maybe share the archived execution JSON file (~500Mb) if that's helpful?

 

> Slow scheduling on large-scale batch jobs 
> --
>
> Key: FLINK-31144
> URL: https://issues.apache.org/jira/browse/FLINK-31144
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Julien Tournay
>Priority: Major
> Attachments: flink-1.17-snapshot-1676473798013.nps, 
> image-2023-02-21-10-29-49-388.png
>
>
> When executing a complex job graph at high parallelism, 
> `DefaultPreferredLocationsRetriever.getPreferredLocationsBasedOnInputs` can 
> get slow and cause long pauses where the JobManager becomes unresponsive and 
> all the taskmanagers just wait. I've attached a VisualVM snapshot to 
> illustrate the problem. [^flink-1.17-snapshot-1676473798013.nps]
> At Spotify we have complex jobs where this issue can cause batch "pauses" of 
> 40+ minutes and make the overall execution 30% slower or more.
> More importantly, this prevents us from running said jobs on larger clusters, 
> as adding resources to the cluster worsens the issue.
> We have successfully tested a modified Flink version where 
> `DefaultPreferredLocationsRetriever.getPreferredLocationsBasedOnInputs` was 
> commented out entirely and simply returns an empty collection, and confirmed 
> that it solves the issue.
> In the same spirit as a recent change 
> ([https://github.com/apache/flink/blob/43f419d0eccba86ecc8040fa6f521148f1e358ff/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultPreferredLocationsRetriever.java#L98-L102]),
>  there could be a mechanism in place to detect when Flink runs into this 
> specific issue and just skip the call to `getInputLocationFutures` 
> ([https://github.com/apache/flink/blob/43f419d0eccba86ecc8040fa6f521148f1e358ff/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultPreferredLocationsRetriever.java#L105-L108]).
> I'm not familiar enough with the internals of Flink to propose a more 
> advanced fix; however, it seems like a configurable threshold on the number of 
> consumer vertices, above which the preferred location is not computed, would 
> do. If this solution is good enough, I'd be happy to submit a PR.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31153) Create a release branch

2023-02-21 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-31153:
--
Description: 
If you are doing a new minor release, you need to update Flink version in the 
following repositories:
 * [apache/flink|https://github.com/apache/flink]
 * [apache/flink-docker|https://github.com/apache/flink-docker]
 * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks]

Patch releases don't require these repositories to be touched. Simply 
check out the already existing branch for that version:
{code:java}
$ cd ./tools
$ git checkout release-$SHORT_RELEASE_VERSION
{code}
h4. Flink repository

Create a branch for the new version that we want to release before updating the 
master branch to the next development version:
{code:bash}
$ cd ./tools
$ releasing/create_snapshot_branch.sh
$ git checkout master
$ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION 
releasing/update_branch_version.sh
{code}
In the {{master}} branch, add a new value (e.g. {{{}v1_16("1.16"){}}}) to 
[apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
 as the last entry:
{code:java}
// ...
v1_12("1.12"),
v1_13("1.13"),
v1_14("1.14"),
v1_15("1.15"),
v1_16("1.16");
{code}
The newly created branch and updated {{master}} branch need to be pushed to the 
official repository.
h4. Flink Docker Repository

Afterwards fork off from {{dev-master}} a {{dev-x.y}} branch in the 
[apache/flink-docker|https://github.com/apache/flink-docker] repository. Make 
sure that 
[apache/flink-docker:.github/workflows/ci.yml|https://github.com/apache/flink-docker/blob/dev-master/.github/workflows/ci.yml]
 points to the correct snapshot version; for {{dev-x.y}} it should point to 
{{{}x.y-SNAPSHOT{}}}, while for {{dev-master}} it should point to the most 
recent snapshot version ({{$NEXT_SNAPSHOT_VERSION}}).

After pushing the new minor release branch, as the last step you should 
update the documentation workflow so that it also builds the documentation for 
the new release branch. Check [Managing 
Documentation|https://cwiki.apache.org/confluence/display/FLINK/Managing+Documentation]
 for details on how to do that. You may also want to manually trigger a build to 
make the changes visible as soon as possible.

 

h3. Expectations (Minor Version only)
 * Release branch has been created and pushed
 * Cron job has been added on the release branch in 
([apache-flink:./tools/azure-pipelines/build-apache-repo.yml|https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-apache-repo.yml])
 * Changes on the new release branch are picked up by [Azure 
CI|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=1&_a=summary]
 * {{master}} branch has the version information updated to the new version 
(check the pom.xml files and the 
[apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
 enum)
 * New version is added to the 
[apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java]
 enum.
 * Make sure [flink-docker|https://github.com/apache/flink-docker/] has 
{{dev-x.y}} branch and docker e2e tests run against this branch in the 
corresponding Apache Flink release branch (see 
[apache/flink:flink-end-to-end-tests/test-scripts/common_docker.sh:51|https://github.com/apache/flink/blob/master/flink-end-to-end-tests/test-scripts/common_docker.sh#L51])
 * 
[apache-flink:docs/config.toml|https://github.com/apache/flink/blob/release-1.17/docs/config.toml]
 has been updated appropriately in the new Apache Flink release branch.
 * The {{flink.version}} property (see 
[apache/flink-benchmarks:pom.xml|https://github.com/apache/flink-benchmarks/blob/master/pom.xml#L48])
 of the Flink Benchmark repo has been updated to the latest snapshot version.

  was:
If you are doing a new minor release, you need to update Flink version in the 
following repositories:
 * [apache/flink|https://github.com/apache/flink]
 * [apache/flink-docker|https://github.com/apache/flink-docker]
 * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks]

Patch releases don't require these repositories to be touched. Simply 
check out the already existing branch for that version:
{code:java}
$ cd ./tools
$ git checkout release-$SHORT_RELEASE_VERSION
{code}
h4. Flink repository

Create a branch for the new version that we want to release before updating the 
master branch to the next development version:
{code:bash}
$ cd ./tools
$ releasing/create_snapshot_branch.sh
$ git checkout master
$ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION 

[GitHub] [flink-ml] zhipeng93 closed pull request #176: Bump jackson-databind from 2.12.6.1 to 2.12.7.1

2023-02-21 Thread via GitHub


zhipeng93 closed pull request #176: Bump jackson-databind from 2.12.6.1 to 
2.12.7.1
URL: https://github.com/apache/flink-ml/pull/176


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-ml] dependabot[bot] commented on pull request #176: Bump jackson-databind from 2.12.6.1 to 2.12.7.1

2023-02-21 Thread via GitHub


dependabot[bot] commented on PR #176:
URL: https://github.com/apache/flink-ml/pull/176#issuecomment-1438157269

   OK, I won't notify you again about this release, but will get in touch when 
a new version is available. If you'd rather skip all updates until the next 
major or minor version, let me know by commenting `@dependabot ignore this 
major version` or `@dependabot ignore this minor version`.
   
   If you change your mind, just re-open this PR and I'll resolve any conflicts 
on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] dmvk commented on a diff in pull request #21981: [FLINK-21450][runtime] Support LocalRecovery by AdaptiveScheduler

2023-02-21 Thread via GitHub


dmvk commented on code in PR #21981:
URL: https://github.com/apache/flink/pull/21981#discussion_r1112810683


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateLocalitySlotAssigner.java:
##
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.scheduler.adaptive.allocator;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.executiongraph.ExecutionGraph;
+import org.apache.flink.runtime.executiongraph.ExecutionJobVertex;
+import org.apache.flink.runtime.executiongraph.ExecutionVertex;
+import org.apache.flink.runtime.jobgraph.JobVertexID;
+import org.apache.flink.runtime.jobmaster.SlotInfo;
+import 
org.apache.flink.runtime.scheduler.adaptive.allocator.SlotSharingSlotAllocator.ExecutionSlotSharingGroup;
+import 
org.apache.flink.runtime.scheduler.adaptive.allocator.SlotSharingSlotAllocator.ExecutionSlotSharingGroupAndSlot;
+import org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID;
+import org.apache.flink.runtime.state.KeyGroupRange;
+import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
+import org.apache.flink.util.Preconditions;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.PriorityQueue;
+import java.util.stream.StreamSupport;
+
+import static java.util.function.Function.identity;
+import static java.util.stream.Collectors.toMap;
+
+/** A {@link SlotAssigner} that assigns slots based on the number of local key 
groups. */
+@Internal
+public class StateLocalitySlotAssigner implements SlotAssigner {
+
+private static class AllocationScore implements Comparable<AllocationScore> {
+
+private final String group;
+private final AllocationID allocationId;
+
+public AllocationScore(String group, AllocationID allocationId, int 
score) {
+this.group = group;
+this.allocationId = allocationId;
+this.score = score;
+}
+
+private final int score;
+
+public String getGroup() {
+return group;
+}
+
+public AllocationID getAllocationId() {
+return allocationId;
+}
+
+public int getScore() {
+return score;
+}
+
+@Override
+public int compareTo(StateLocalitySlotAssigner.AllocationScore other) {
+int result = Integer.compare(score, other.score);
+if (result != 0) {
+return result;
+}
+result = other.allocationId.compareTo(allocationId);
+if (result != 0) {
+return result;
+}
+return other.group.compareTo(group);
+}
+}
+
+private final Map> locality;
+private final Map maxParallelism;
+
+public StateLocalitySlotAssigner(ExecutionGraph archivedExecutionGraph) {
+this(
+calculateLocalKeyGroups(archivedExecutionGraph),
+StreamSupport.stream(
+
archivedExecutionGraph.getVerticesTopologically().spliterator(),
+false)
+.collect(
+toMap(
+ExecutionJobVertex::getJobVertexId,
+
ExecutionJobVertex::getMaxParallelism)));
+}
+
+public StateLocalitySlotAssigner(
+Map> locality,
+Map maxParallelism) {
+this.locality = locality;
+this.maxParallelism = maxParallelism;
+}
+
+@Override
+public AssignmentResult assignSlots(
+Collection slots, 
Collection groups) {
+
+final Map parallelism = new HashMap<>();
+groups.forEach(
+group ->
+group.getContainedExecutionVertices()
+.forEach(
+evi ->
+parallelism.merge(
+

[GitHub] [flink] dmvk commented on a diff in pull request #21981: [FLINK-21450][runtime] Support LocalRecovery by AdaptiveScheduler

2023-02-21 Thread via GitHub


dmvk commented on code in PR #21981:
URL: https://github.com/apache/flink/pull/21981#discussion_r1112815613


##
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/SlotSharingSlotAllocator.java:
##
@@ -121,16 +133,13 @@ public Optional 
determineParallelism(
 slotSharingGroupParallelism.get(
 slotSharingGroup.getSlotSharingGroupId()));
 
-final Iterable 
sharedSlotToVertexAssignment =
+final List sharedSlotToVertexAssignment 
=
 createExecutionSlotSharingGroups(vertexParallelism);
 
-for (ExecutionSlotSharingGroup executionSlotSharingGroup :
-sharedSlotToVertexAssignment) {
-final SlotInfo slotInfo = slotIterator.next();
-
-assignments.add(
-new 
ExecutionSlotSharingGroupAndSlot(executionSlotSharingGroup, slotInfo));
-}
+SlotAssigner.AssignmentResult result =
+slotAssigner.assignSlots(freeSlots, 
sharedSlotToVertexAssignment);
+assignments.addAll(result.assignments);
+freeSlots = result.remainingSlots;

Review Comment:
   Is there a reason to run this within the loop at all? Can we simply move 
this to after the loop?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31142) Some queries lead to abrupt sql client close

2023-02-21 Thread Sergey Nuyanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Nuyanzin updated FLINK-31142:

Priority: Blocker  (was: Major)

> Some queries lead to abrupt sql client close
> 
>
> Key: FLINK-31142
> URL: https://issues.apache.org/jira/browse/FLINK-31142
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.17.0
>Reporter: Sergey Nuyanzin
>Priority: Blocker
>
> Although the behavior has been changed in 1.17.0, I'm not sure whether it is 
> a blocker or not, since in both cases it is an invalid query.
> The difference in the behavior is that before 1.17.0
> a query like 
> {code:sql}
> select /* multiline comment;
> {code}
> fails to execute and the sql client prompts to submit another query.
> In 1.17.0 it shuts down the session, failing with 
> {noformat}
> Exception in thread "main" org.apache.flink.table.client.SqlClientException: 
> Could not read from command line.
>   at 
> org.apache.flink.table.client.cli.CliClient.getAndExecuteStatements(CliClient.java:205)
>   at 
> org.apache.flink.table.client.cli.CliClient.executeInteractive(CliClient.java:168)
>   at 
> org.apache.flink.table.client.cli.CliClient.executeInInteractiveMode(CliClient.java:113)
>   at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:169)
>   at org.apache.flink.table.client.SqlClient.start(SqlClient.java:118)
>   at 
> org.apache.flink.table.client.SqlClient.startClient(SqlClient.java:228)
>   at org.apache.flink.table.client.SqlClient.main(SqlClient.java:179)
> Caused by: org.apache.flink.sql.parser.impl.TokenMgrError: Lexical error at 
> line 1, column 29.  Encountered:  after : ""
>   at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImplTokenManager.getNextToken(FlinkSqlParserImplTokenManager.java:26752)
>   at 
> org.apache.flink.table.client.cli.parser.SqlCommandParserImpl$TokenIterator.scan(SqlCommandParserImpl.java:89)
>   at 
> org.apache.flink.table.client.cli.parser.SqlCommandParserImpl$TokenIterator.next(SqlCommandParserImpl.java:81)
>   at 
> org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.checkIncompleteStatement(SqlCommandParserImpl.java:141)
>   at 
> org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.getCommand(SqlCommandParserImpl.java:111)
>   at 
> org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.parseStatement(SqlCommandParserImpl.java:52)
>   at 
> org.apache.flink.table.client.cli.parser.SqlMultiLineParser.parse(SqlMultiLineParser.java:82)
>   at 
> org.jline.reader.impl.LineReaderImpl.acceptLine(LineReaderImpl.java:2964)
>   at 
> org.jline.reader.impl.LineReaderImpl$1.apply(LineReaderImpl.java:3778)
>   at 
> org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:679)
>   at 
> org.apache.flink.table.client.cli.CliClient.getAndExecuteStatements(CliClient.java:183)
>   ... 6 more
> Shutting down the session...
> done.
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31142) Some queries lead to abrupt sql client close

2023-02-21 Thread Sergey Nuyanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Nuyanzin updated FLINK-31142:

Description: 
Although the behavior has been changed in 1.17.0, I'm not sure whether it is a 
blocker or not, since in both cases it is an invalid query.
I set it to Blocker just because it is a regression.

The difference in the behavior is that before 1.17.0
a query like 
{code:sql}
select /* multiline comment;
{code}
fails to execute and the sql client prompts to submit another query.

In 1.17.0 it shuts down the session, failing with 
{noformat}
Exception in thread "main" org.apache.flink.table.client.SqlClientException: 
Could not read from command line.
at 
org.apache.flink.table.client.cli.CliClient.getAndExecuteStatements(CliClient.java:205)
at 
org.apache.flink.table.client.cli.CliClient.executeInteractive(CliClient.java:168)
at 
org.apache.flink.table.client.cli.CliClient.executeInInteractiveMode(CliClient.java:113)
at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:169)
at org.apache.flink.table.client.SqlClient.start(SqlClient.java:118)
at 
org.apache.flink.table.client.SqlClient.startClient(SqlClient.java:228)
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:179)
Caused by: org.apache.flink.sql.parser.impl.TokenMgrError: Lexical error at 
line 1, column 29.  Encountered:  after : ""
at 
org.apache.flink.sql.parser.impl.FlinkSqlParserImplTokenManager.getNextToken(FlinkSqlParserImplTokenManager.java:26752)
at 
org.apache.flink.table.client.cli.parser.SqlCommandParserImpl$TokenIterator.scan(SqlCommandParserImpl.java:89)
at 
org.apache.flink.table.client.cli.parser.SqlCommandParserImpl$TokenIterator.next(SqlCommandParserImpl.java:81)
at 
org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.checkIncompleteStatement(SqlCommandParserImpl.java:141)
at 
org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.getCommand(SqlCommandParserImpl.java:111)
at 
org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.parseStatement(SqlCommandParserImpl.java:52)
at 
org.apache.flink.table.client.cli.parser.SqlMultiLineParser.parse(SqlMultiLineParser.java:82)
at 
org.jline.reader.impl.LineReaderImpl.acceptLine(LineReaderImpl.java:2964)
at 
org.jline.reader.impl.LineReaderImpl$1.apply(LineReaderImpl.java:3778)
at 
org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:679)
at 
org.apache.flink.table.client.cli.CliClient.getAndExecuteStatements(CliClient.java:183)
... 6 more

Shutting down the session...
done.

{noformat}



  was:
Although the behavior has been changed in 1.17.0, I'm not sure whether it is a 
blocker or not, since in both cases it is an invalid query.

The difference in the behavior is that before 1.17.0
a query like 
{code:sql}
select /* multiline comment;
{code}
fails to execute and the sql client prompts to submit another query.

In 1.17.0 it shuts down the session, failing with 
{noformat}
Exception in thread "main" org.apache.flink.table.client.SqlClientException: 
Could not read from command line.
at 
org.apache.flink.table.client.cli.CliClient.getAndExecuteStatements(CliClient.java:205)
at 
org.apache.flink.table.client.cli.CliClient.executeInteractive(CliClient.java:168)
at 
org.apache.flink.table.client.cli.CliClient.executeInInteractiveMode(CliClient.java:113)
at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:169)
at org.apache.flink.table.client.SqlClient.start(SqlClient.java:118)
at 
org.apache.flink.table.client.SqlClient.startClient(SqlClient.java:228)
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:179)
Caused by: org.apache.flink.sql.parser.impl.TokenMgrError: Lexical error at 
line 1, column 29.  Encountered:  after : ""
at 
org.apache.flink.sql.parser.impl.FlinkSqlParserImplTokenManager.getNextToken(FlinkSqlParserImplTokenManager.java:26752)
at 
org.apache.flink.table.client.cli.parser.SqlCommandParserImpl$TokenIterator.scan(SqlCommandParserImpl.java:89)
at 
org.apache.flink.table.client.cli.parser.SqlCommandParserImpl$TokenIterator.next(SqlCommandParserImpl.java:81)
at 
org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.checkIncompleteStatement(SqlCommandParserImpl.java:141)
at 
org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.getCommand(SqlCommandParserImpl.java:111)
at 
org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.parseStatement(SqlCommandParserImpl.java:52)
at 
org.apache.flink.table.client.cli.parser.SqlMultiLineParser.parse(SqlMultiLineParser.java:82)
at 
org.jline.reader.impl.LineReaderImpl.acceptLine(LineReaderImpl.java:2964)
at 

[jira] [Updated] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found

2023-02-21 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-31168:
--
Description: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706

We see this build failure because a job couldn't be found:
{code}
java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.lang.RuntimeException: Error while waiting for job to be initialized
at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319)
at 
org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061)
at 
org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958)
at 
org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942)
at 
org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235)
at 
org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
Error while waiting for job to be initialized
at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
at 
org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056)
... 4 more
Caused by: java.lang.RuntimeException: Error while waiting for job to be 
initialized
at 
org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160)
at 
org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.lambda$execute$2(AbstractSessionClusterExecutor.java:82)
at 
org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73)
at 
java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
at 
java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479)
at 
java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at 
java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at 
java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at 
java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at 
java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: java.util.concurrent.ExecutionException: 
org.apache.flink.runtime.rest.util.RestClientException: 
[org.apache.flink.runtime.rest.NotFoundException: Job 
865dcd87f4828dbeb3d93eb52e2636b1 not found
at 
org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:99)
at 
java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
at 
java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at 
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at 
org.apache.flink.runtime.rest.handler.legacy.DefaultExecutionGraphCache.lambda$getExecutionGraphInternal$0(DefaultExecutionGraphCache.java:109)
at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at 
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at 
org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:252)
at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at 
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at 
org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1387)
at 
org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
   

[jira] [Updated] (FLINK-31133) PartiallyFinishedSourcesITCase hangs if a checkpoint fails

2023-02-21 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-31133:
--
Summary: PartiallyFinishedSourcesITCase hangs if a checkpoint fails  (was: 
AdaptiveSchedulerITCase took extraordinary long to finish)

> PartiallyFinishedSourcesITCase hangs if a checkpoint fails
> --
>
> Key: FLINK-31133
> URL: https://issues.apache.org/jira/browse/FLINK-31133
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.3
>Reporter: Matthias Pohl
>Assignee: Roman Khachatryan
>Priority: Major
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b
> This build ran into a timeout. Based on the stacktraces reported, it was 
> either caused by 
> [SnapshotMigrationTestBase.restoreAndExecute|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=13475]:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7f23d800b800 nid=0x60cdd waiting on 
> condition [0x7f23e1c0d000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.flink.test.checkpointing.utils.SnapshotMigrationTestBase.restoreAndExecute(SnapshotMigrationTestBase.java:382)
>   at 
> org.apache.flink.test.migration.TypeSerializerSnapshotMigrationITCase.testSnapshot(TypeSerializerSnapshotMigrationITCase.java:172)
>   at sun.reflect.GeneratedMethodAccessor47.invoke(Unknown Source)
> [...]
> {code}
> or 
> [PartiallyFinishedSourcesITCase.test|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b&l=10401]:
> {code}
> 2023-02-20T07:13:05.6084711Z "main" #1 prio=5 os_prio=0 
> tid=0x7fd35c00b800 nid=0x8c8a waiting on condition [0x7fd363d0f000]
> 2023-02-20T07:13:05.6085149Zjava.lang.Thread.State: TIMED_WAITING 
> (sleeping)
> 2023-02-20T07:13:05.6085487Z  at java.lang.Thread.sleep(Native Method)
> 2023-02-20T07:13:05.6085925Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> 2023-02-20T07:13:05.6086512Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:138)
> 2023-02-20T07:13:05.6087103Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitForSubtasksToFinish(CommonTestUtils.java:291)
> 2023-02-20T07:13:05.6087730Z  at 
> org.apache.flink.runtime.operators.lifecycle.TestJobExecutor.waitForSubtasksToFinish(TestJobExecutor.java:226)
> 2023-02-20T07:13:05.6088410Z  at 
> org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase.test(PartiallyFinishedSourcesITCase.java:138)
> 2023-02-20T07:13:05.6088957Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}
> Still, it sounds odd: Based on a code analysis it's quite unlikely that those 
> two caused the issue. The former one has a 5 min timeout (see related code in 
> [SnapshotMigrationTestBase:382|https://github.com/apache/flink/blob/release-1.15/flink-tests/src/test/java/org/apache/flink/test/checkpointing/utils/SnapshotMigrationTestBase.java#L382]).
>  For the other one, we found it being not responsible in the past when some 
> other concurrent test caused the issue (see FLINK-30261).
> An investigation on where we lose the time for the timeout revealed that 
> {{AdaptiveSchedulerITCase}} took 2980s to finish (see [build 
> logs|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b&l=5265]).
> {code}
> 2023-02-20T03:43:55.4546050Z Feb 20 03:43:55 [ERROR] Picked up 
> JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
> 2023-02-20T03:43:58.0448506Z Feb 20 03:43:58 [INFO] Running 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> 2023-02-20T04:33:38.6824634Z Feb 20 04:33:38 [INFO] Tests run: 6, Failures: 
> 0, Errors: 0, Skipped: 0, Time elapsed: 2,980.445 s - in 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] Mulavar commented on pull request #21545: [FLINK-30396][table]make alias hint take effect in correlate

2023-02-21 Thread via GitHub


Mulavar commented on PR #21545:
URL: https://github.com/apache/flink/pull/21545#issuecomment-1438817485

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31138) Either StreamCheckpointingITCase/StreamFaultToleranceTestBase or EventTimeWindowCheckpointingITCase are timing out

2023-02-21 Thread Roman Khachatryan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691725#comment-17691725
 ] 

Roman Khachatryan commented on FLINK-31138:
---

Thanks [~mapohl] and [~fanrui],

In the artifacts 
(logs-cron_azure-test_cron_azure_finegrained_resource_management-1676692614/mvn-1.log),
I've found the same problem with PartiallyFinishedSourcesITCase at 04:46 as in 
FLINK-31133: checkpoint failure -> running for too long -> no space left on 
device.

So I'd close this ticket as a duplicate.

 

However, there are some strange exceptions before that:

[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46283&view=logs&j=4d4a0d10-fca2-5507-8eed-c07f0bdf4887&t=7b25afdf-cc6c-566f-5459-359dc2585798&l=5584]
 
{code:java}
Feb 18 03:58:47 [INFO] Running 
org.apache.flink.core.memory.OffHeapUnsafeMemorySegmentTest
Exception in thread "Thread-13" java.lang.IllegalStateException: MemorySegment 
can be freed only once!
    at org.apache.flink.core.memory.MemorySegment.free(MemorySegment.java:244)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-15" java.lang.IllegalStateException: MemorySegment 
can be freed only once!
    at org.apache.flink.core.memory.MemorySegment.free(MemorySegment.java:244)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-17" java.lang.IllegalStateException: MemorySegment 
can be freed only once!
    at org.apache.flink.core.memory.MemorySegment.free(MemorySegment.java:244)
    at java.lang.Thread.run(Thread.java:748){code}
 

Starting at 04:08:
{code:java}
04:32:18,352 [flink-akka.actor.default-dispatcher-8] INFO  
org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Job 
f1ec611893996fc2fc1830697195194b reached terminal state FAILED.
org.apache.flink.runtime.JobException: Recovery is suppressed by 
NoRestartBackoffTimeStrategy
        at 
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
        at 
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
        at 
org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:301)
        at 
org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:291)
        at 
org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:282)
        at 
org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:739)
        at 
org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:78)
        at 
org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:443)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:304)
        at 
org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:302)
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
        at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
        at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
        at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at akka.actor.Actor.aroundReceive(Actor.scala:537)
        at akka.actor.Actor.aroundReceive$(Actor.scala:535)
        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
        at akka.actor.ActorCell.invoke(ActorCell.scala:548)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
        at akka.dispatch.Mailbox.run(Mailbox.scala:231)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at 

[jira] [Updated] (FLINK-31162) Avoid setting private tokens to AM container context when kerberos delegation token fetch is disabled

2023-02-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31162:
---
Labels: pull-request-available  (was: )

> Avoid setting private tokens to AM container context when kerberos delegation 
> token fetch is disabled
> -
>
> Key: FLINK-31162
> URL: https://issues.apache.org/jira/browse/FLINK-31162
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.16.1
>Reporter: Venkata krishnan Sowrirajan
>Priority: Major
>  Labels: pull-request-available
>
> In our internal env, we have enabled [Consistent Reads from HDFS Observer 
> NameNode|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html].
>  With this, some _ObserverReadProxyProvider_ implementations clone the 
> delegation token for the HA service and mark those tokens private so that they 
> won't be accessible through _ugi.getCredentials()_.
> But Flink internally uses _currUsr.getTokens()_ 
> [here|https://github.com/apache/flink/blob/release-1.16.1/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L222]
>  to collect the current user's tokens (including the private ones) to be set 
> in the AM context for submitting the YARN app to the RM.
> This fails with the following error:
> {code:java}
> Unable to add the application to the delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, 
> Service: test01-ha4.abc:9000, Ident: (HDFS_DELEGATION_TOKEN token 151335106 
> for john)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:495)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$900(DelegationTokenRenewer.java:79)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:939)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:916)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category WRITE is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2044)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1451)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5348)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:733)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:1056)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:525)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:495)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1905)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2856)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1342)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> at com.sun.proxy.$Proxy87.renewDelegationToken(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewDelegationToken(ClientNamenodeProtocolTranslatorPB.java:986)
> at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
> at 
> 

[jira] [Commented] (FLINK-31133) AdaptiveSchedulerITCase took extraordinary long to finish

2023-02-21 Thread Roman Khachatryan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691693#comment-17691693
 ] 

Roman Khachatryan commented on FLINK-31133:
---

I think the issue is actually PartiallyFinishedSourcesITCase.

It starts at 3:40
{code:java}
03:40:55,702 [                main] INFO  
org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase [] -

Test test[complex graph ALL_SUBTASKS, failover: true, strategy: 
full](org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase)
 is running.
{code}
then a checkpoint fails because of a timeout:
{code:java}
03:41:10,775 [ChangelogRetryScheduler-1] INFO  
org.apache.flink.changelog.fs.RetryingExecutor               [] - failed with 3 
attempts: Attempt 3 timed out after 1000ms
03:41:10,777 [AsyncOperations-thread-1] INFO  
org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - 
transform-2-keyed (4/4)#0 - asynchronous part of checkpoint 2 could not be 
completed.
java.util.concurrent.CompletionException: java.io.IOException: 
java.util.concurrent.TimeoutException: Attempt 3 timed out after 1000ms
        at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
 ~[?:1.8.0_292]
        at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
 ~[?:1.8.0_292]
        at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) 
~[?:1.8.0_292]
        at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
 ~[?:1.8.0_292]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) 
~[?:1.8.0_292]
        at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
 ~[?:1.8.0_292]
        at 
org.apache.flink.changelog.fs.FsStateChangelogWriter$UploadCompletionListener.onFailure(FsStateChangelogWriter.java:383)
 ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.changelog.fs.FsStateChangelogWriter.lambda$null$0(FsStateChangelogWriter.java:223)
 ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at java.util.ArrayList.removeIf(ArrayList.java:1415) ~[?:1.8.0_292]
        at 
org.apache.flink.changelog.fs.FsStateChangelogWriter.lambda$handleUploadFailure$4(FsStateChangelogWriter.java:222)
 ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
 ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) 
~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
 ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
 ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
 ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:807)
 ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:756) 
~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
 ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) 
~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) 
~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
Caused by: java.io.IOException: java.util.concurrent.TimeoutException: Attempt 
3 timed out after 1000ms
        at 
org.apache.flink.changelog.fs.FsStateChangelogWriter$UploadCompletionListener.onFailure(FsStateChangelogWriter.java:377)
 ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        ... 15 more
Caused by: java.util.concurrent.TimeoutException: Attempt 3 timed out after 
1000ms
        at 
org.apache.flink.changelog.fs.RetryingExecutor$RetriableActionAttempt.fmtError(RetryingExecutor.java:285)
 ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 

[jira] [Updated] (FLINK-31133) AdaptiveSchedulerITCase took extraordinary long to finish

2023-02-21 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-31133:
--
Priority: Major  (was: Critical)

> AdaptiveSchedulerITCase took extraordinary long to finish
> -
>
> Key: FLINK-31133
> URL: https://issues.apache.org/jira/browse/FLINK-31133
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.3
>Reporter: Matthias Pohl
>Assignee: Roman Khachatryan
>Priority: Major
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b
> This build ran into a timeout. Based on the stacktraces reported, it was 
> either caused by 
> [SnapshotMigrationTestBase.restoreAndExecute|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b&l=13475]:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7f23d800b800 nid=0x60cdd waiting on 
> condition [0x7f23e1c0d000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.flink.test.checkpointing.utils.SnapshotMigrationTestBase.restoreAndExecute(SnapshotMigrationTestBase.java:382)
>   at 
> org.apache.flink.test.migration.TypeSerializerSnapshotMigrationITCase.testSnapshot(TypeSerializerSnapshotMigrationITCase.java:172)
>   at sun.reflect.GeneratedMethodAccessor47.invoke(Unknown Source)
> [...]
> {code}
> or 
> [PartiallyFinishedSourcesITCase.test|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b&l=10401]:
> {code}
> 2023-02-20T07:13:05.6084711Z "main" #1 prio=5 os_prio=0 
> tid=0x7fd35c00b800 nid=0x8c8a waiting on condition [0x7fd363d0f000]
> 2023-02-20T07:13:05.6085149Zjava.lang.Thread.State: TIMED_WAITING 
> (sleeping)
> 2023-02-20T07:13:05.6085487Z  at java.lang.Thread.sleep(Native Method)
> 2023-02-20T07:13:05.6085925Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> 2023-02-20T07:13:05.6086512Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:138)
> 2023-02-20T07:13:05.6087103Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitForSubtasksToFinish(CommonTestUtils.java:291)
> 2023-02-20T07:13:05.6087730Z  at 
> org.apache.flink.runtime.operators.lifecycle.TestJobExecutor.waitForSubtasksToFinish(TestJobExecutor.java:226)
> 2023-02-20T07:13:05.6088410Z  at 
> org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase.test(PartiallyFinishedSourcesITCase.java:138)
> 2023-02-20T07:13:05.6088957Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}
> Still, it sounds odd: Based on a code analysis it's quite unlikely that those 
> two caused the issue. The former one has a 5 min timeout (see related code in 
> [SnapshotMigrationTestBase:382|https://github.com/apache/flink/blob/release-1.15/flink-tests/src/test/java/org/apache/flink/test/checkpointing/utils/SnapshotMigrationTestBase.java#L382]).
>  For the other one, we found it being not responsible in the past when some 
> other concurrent test caused the issue (see FLINK-30261).
> An investigation on where we lose the time for the timeout revealed that 
> {{AdaptiveSchedulerITCase}} took 2980s to finish (see [build 
> logs|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b&l=5265]).
> {code}
> 2023-02-20T03:43:55.4546050Z Feb 20 03:43:55 [ERROR] Picked up 
> JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
> 2023-02-20T03:43:58.0448506Z Feb 20 03:43:58 [INFO] Running 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> 2023-02-20T04:33:38.6824634Z Feb 20 04:33:38 [INFO] Tests run: 6, Failures: 
> 0, Errors: 0, Skipped: 0, Time elapsed: 2,980.445 s - in 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found

2023-02-21 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691710#comment-17691710
 ] 

Matthias Pohl commented on FLINK-31168:
---

Based on the stacktrace listed in the description, the job is executed (see 
[JobManagerHAProcessFailureRecoveryITCase:235|https://github.com/apache/flink/blob/489827520b1a53db04a94346c98327d0d42301c5/flink-tests/src/test/java/org/apache/flink/test/recovery/JobManagerHAProcessFailureRecoveryITCase.java#L235]),
 which submits the job and then waits for the initialization to be over. This 
is implemented using {{ClientUtils.waitUntilJobInitializationFinished}} in 
[AbstractSessionClusterExecutor:82ff|https://github.com/apache/flink/blob/f4b59f615438e76c2b42999fc0a8ebce6a543b07/flink-clients/src/main/java/org/apache/flink/client/deployment/executors/AbstractSessionClusterExecutor.java#L82],
 utilizing the {{RestClusterClient}}'s {{getJobStatus}} and 
{{requestJobResult}} methods.

The actual error happened in 
[Dispatcher:480f|https://github.com/apache/flink/blob/9e908567ecc06b607d08fca8e2e781b463a38de6/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L840],
 where the {{JobManagerRunner}} is already deregistered. Therefore, 
{{requestExecutionGraphInfo}} fails, returning an exceptionally completed 
future with a {{FlinkJobNotFoundException}}.

{{ClientUtils.waitUntilJobInitializationFinished}} uses {{new 
ExponentialWaitStrategy(50, 2000)}} as its {{WaitStrategy}}. My theory is that 
the 2s cap of the wait strategy is too long: the job starts and finishes 
between two {{getJobStatus}} calls, causing the corresponding 
{{JobManagerRunner}} to be removed and the next {{getJobStatus}} call to fail 
as observed.
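
To make the suspected race concrete, here is a minimal sketch of such a 
polling loop. This is purely hypothetical and simplified: the helper method, 
its signature, and the doubling-based backoff are my own illustration of 
{{ExponentialWaitStrategy(50, 2000)}}, not the actual implementation.
{code:java}
import java.util.concurrent.ExecutionException;

import org.apache.flink.api.common.JobID;
import org.apache.flink.api.common.JobStatus;
import org.apache.flink.client.program.ClusterClient;

// Hypothetical sketch of waiting for job initialization: the sleep grows
// from 50ms and is capped at 2000ms.
static JobStatus waitUntilInitialized(ClusterClient<?> client, JobID jobId)
        throws InterruptedException, ExecutionException {
    long sleepMs = 50;
    JobStatus status = client.getJobStatus(jobId).get();
    while (status == JobStatus.INITIALIZING) {
        Thread.sleep(sleepMs);
        sleepMs = Math.min(sleepMs * 2, 2000);
        // If the job starts AND finishes within this gap, the
        // JobManagerRunner may already be deregistered, and this call fails
        // with FlinkJobNotFoundException instead of a terminal status.
        status = client.getJobStatus(jobId).get();
    }
    return status;
}
{code}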

> JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
> --
>
> Key: FLINK-31168
> URL: https://issues.apache.org/jira/browse/FLINK-31168
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.3, 1.16.1
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=747432ad-a576-5911-1e2a-68c6bedc248a&l=12706
> We see this build failure because a job couldn't be found:
> {code}
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Error while waiting for job to be initialized
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942)
>   at 
> org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235)
>   at 
> org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336)
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Error while waiting for job to be initialized
>   at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
>   at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056)
>   ... 4 more
> Caused by: java.lang.RuntimeException: Error while waiting for job to be 
> initialized
>   at 
> org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160)
>   at 
> org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.lambda$execute$2(AbstractSessionClusterExecutor.java:82)
>   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73)
>   at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
>   at 
> java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479)
>   at 
> java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>   at 
> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
>   at 
> java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
>   at 
> java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
>   at 
> java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
> Caused by: 

[GitHub] [flink] venkata91 opened a new pull request, #21985: [FLINK-31162][yarn] Use currUsr.getTokens instead of currUsr.getCrede…

2023-02-21 Thread via GitHub


venkata91 opened a new pull request, #21985:
URL: https://github.com/apache/flink/pull/21985

   …ntials to avoid including private tokens in AM container context
   
   
   
   ## What is the purpose of the change
   
   Avoid setting private tokens in the AM container context when Kerberos 
delegation token fetching is disabled and DTs are managed externally.
   
   ## Brief change log
   
   Currently, while setting user credentials in the AM container context, 
`ugi.getTokens()` is used, which returns the private tokens along with the 
regular UGI tokens. Private tokens should not be passed to the AM container 
context: this causes the launch of the YARN app to fail in some cases, for 
example when the [Consistent Reads from HDFS Observer 
NameNode](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html)
 feature is enabled.
   
   Instead, change it to `ugi.getCredentials().getAllTokens()` to only get the 
user's credentials tokens. Spark sets the [user credentials in the AM 
container 
context](https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L348)
 in a similar way.
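   
   A rough sketch of the intended change (illustrative only; the helper name 
`tokensForAmContext` and the surrounding shape are mine, not the exact diff):
   
   ```java
   import org.apache.hadoop.security.Credentials;
   import org.apache.hadoop.security.UserGroupInformation;
   import org.apache.hadoop.security.token.Token;
   import org.apache.hadoop.security.token.TokenIdentifier;
   
   static Credentials tokensForAmContext() throws java.io.IOException {
       Credentials credentials = new Credentials();
       UserGroupInformation currUsr = UserGroupInformation.getCurrentUser();
       // Before: currUsr.getTokens(), which also returns private tokens (e.g.
       // the ones cloned by an ObserverReadProxyProvider for the HA service).
       // After: only the tokens held in the user's credentials.
       for (Token<? extends TokenIdentifier> token :
               currUsr.getCredentials().getAllTokens()) {
           credentials.addToken(token.getService(), token);
       }
       return credentials;
   }
   ```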
   
   ## Verifying this change
   
   Tested this change internally in our environment, as it requires a managed 
way of fetching delegation tokens. Also tested with the Kerberos delegation 
token fetch feature enabled to make sure it doesn't regress.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-31133) AdaptiveSchedulerITCase took extraordinary long to finish

2023-02-21 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan reassigned FLINK-31133:
-

Assignee: Roman Khachatryan

> AdaptiveSchedulerITCase took extraordinary long to finish
> -
>
> Key: FLINK-31133
> URL: https://issues.apache.org/jira/browse/FLINK-31133
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.3
>Reporter: Matthias Pohl
>Assignee: Roman Khachatryan
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b
> This build ran into a timeout. Based on the stacktraces reported, it was 
> either caused by 
> [SnapshotMigrationTestBase.restoreAndExecute|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b&l=13475]:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7f23d800b800 nid=0x60cdd waiting on 
> condition [0x7f23e1c0d000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.flink.test.checkpointing.utils.SnapshotMigrationTestBase.restoreAndExecute(SnapshotMigrationTestBase.java:382)
>   at 
> org.apache.flink.test.migration.TypeSerializerSnapshotMigrationITCase.testSnapshot(TypeSerializerSnapshotMigrationITCase.java:172)
>   at sun.reflect.GeneratedMethodAccessor47.invoke(Unknown Source)
> [...]
> {code}
> or 
> [PartiallyFinishedSourcesITCase.test|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b&l=10401]:
> {code}
> 2023-02-20T07:13:05.6084711Z "main" #1 prio=5 os_prio=0 
> tid=0x7fd35c00b800 nid=0x8c8a waiting on condition [0x7fd363d0f000]
> 2023-02-20T07:13:05.6085149Zjava.lang.Thread.State: TIMED_WAITING 
> (sleeping)
> 2023-02-20T07:13:05.6085487Z  at java.lang.Thread.sleep(Native Method)
> 2023-02-20T07:13:05.6085925Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> 2023-02-20T07:13:05.6086512Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:138)
> 2023-02-20T07:13:05.6087103Z  at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitForSubtasksToFinish(CommonTestUtils.java:291)
> 2023-02-20T07:13:05.6087730Z  at 
> org.apache.flink.runtime.operators.lifecycle.TestJobExecutor.waitForSubtasksToFinish(TestJobExecutor.java:226)
> 2023-02-20T07:13:05.6088410Z  at 
> org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase.test(PartiallyFinishedSourcesITCase.java:138)
> 2023-02-20T07:13:05.6088957Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [...]
> {code}
> Still, it sounds odd: Based on a code analysis it's quite unlikely that those 
> two caused the issue. The former one has a 5 min timeout (see related code in 
> [SnapshotMigrationTestBase:382|https://github.com/apache/flink/blob/release-1.15/flink-tests/src/test/java/org/apache/flink/test/checkpointing/utils/SnapshotMigrationTestBase.java#L382]).
>  For the other one, we found it being not responsible in the past when some 
> other concurrent test caused the issue (see FLINK-30261).
> An investigation on where we lose the time for the timeout revealed that 
> {{AdaptiveSchedulerITCase}} took 2980s to finish (see [build 
> logs|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b&l=5265]).
> {code}
> 2023-02-20T03:43:55.4546050Z Feb 20 03:43:55 [ERROR] Picked up 
> JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
> 2023-02-20T03:43:58.0448506Z Feb 20 03:43:58 [INFO] Running 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> 2023-02-20T04:33:38.6824634Z Feb 20 04:33:38 [INFO] Tests run: 6, Failures: 
> 0, Errors: 0, Skipped: 0, Time elapsed: 2,980.445 s - in 
> org.apache.flink.test.scheduling.AdaptiveSchedulerITCase
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31143) Invalid request: offset doesn't match when restarting from a savepoint

2023-02-21 Thread Weijie Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691714#comment-17691714
 ] 

Weijie Guo commented on FLINK-31143:


I'm a little suspicious that the current logic can't handle the case where the 
`JobClient` is also restarted, as opposed to a simple task failover. For 
example, the `sinkRestarted` method 
[here|https://github.com/apache/flink/blob/5cda70d873c9630c898d765633ec7a6cfe53e3c6/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/AbstractCollectResultBuffer.java#L87]
 will not be called because our version (0) is equal to `INIT_VERSION`. I'm 
not sure why a restart of the JobClient is not taken into account; one 
possible reason is that transferring data from `CollectSink` to `JobClient` is 
a bit like writing data to an external system, where it is not easy to ensure 
end-to-end EXACTLY_ONCE semantics.
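
To illustrate the suspected gap, here is a simplified, hypothetical rendering 
of the version handling in `AbstractCollectResultBuffer` (not the exact code; 
method and accessor names are approximations):
{code:java}
// The client's local 'version' is reset to INIT_VERSION when the client
// (e.g. a restarted JobClient) starts from scratch.
private void handleResponse(CollectCoordinationResponse response) {
    if (INIT_VERSION.equals(version)) {
        // First response after a client (re)start: adopt the server version.
        // sinkRestarted(...) is never invoked on this path, so a restarted
        // client cannot tell that the sink was restarted as well.
        version = response.getVersion();
    } else if (!version.equals(response.getVersion())) {
        sinkRestarted(response.getLastCheckpointedOffset());
    }
}
{code}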


> Invalid request: offset doesn't match when restarting from a savepoint
> --
>
> Key: FLINK-31143
> URL: https://issues.apache.org/jira/browse/FLINK-31143
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.17.0, 1.15.3, 1.16.1
>Reporter: Matthias Pohl
>Priority: Critical
>
> I tried to run the following case:
> {code:java}
> public static void main(String[] args) throws Exception {
> final String createTableQuery =
> "CREATE TABLE left_table (a int, c varchar) "
> + "WITH ("
> + " 'connector' = 'datagen', "
> + " 'rows-per-second' = '1', "
> + " 'fields.a.kind' = 'sequence', "
> + " 'fields.a.start' = '0', "
> + " 'fields.a.end' = '10'"
> + ");";
> final String selectQuery = "SELECT * FROM left_table;";
> final Configuration initialConfig = new Configuration();
> initialConfig.set(CoreOptions.DEFAULT_PARALLELISM, 1);
> final EnvironmentSettings initialSettings =
> EnvironmentSettings.newInstance()
> .inStreamingMode()
> .withConfiguration(initialConfig)
> .build();
> final TableEnvironment initialTableEnv = 
> TableEnvironment.create(initialSettings);
> // create job and consume two results
> initialTableEnv.executeSql(createTableQuery);
> final TableResult tableResult = 
> initialTableEnv.sqlQuery(selectQuery).execute();
> tableResult.await();
> // stop job with savepoint (after consuming two results)
> final String savepointPath;
> try (CloseableIterator<Row> tableResultIterator = 
> tableResult.collect()) {
> System.out.println(tableResultIterator.next());
> System.out.println(tableResultIterator.next());
> final JobClient jobClient =
> 
> tableResult.getJobClient().orElseThrow(IllegalStateException::new);
> final File savepointDirectory = Files.createTempDir();
> savepointPath =
> jobClient
> .stopWithSavepoint(
> true,
> savepointDirectory.getAbsolutePath(),
> SavepointFormatType.CANONICAL)
> .get();
> }
> // restart the very same job from the savepoint
> final SavepointRestoreSettings savepointRestoreSettings =
> SavepointRestoreSettings.forPath(savepointPath, true);
> final Configuration restartConfig = new Configuration(initialConfig);
> SavepointRestoreSettings.toConfiguration(savepointRestoreSettings, 
> restartConfig);
> final EnvironmentSettings restartSettings =
> EnvironmentSettings.newInstance()
> .inStreamingMode()
> .withConfiguration(restartConfig)
> .build();
> final TableEnvironment restartTableEnv = 
> TableEnvironment.create(restartSettings);
> restartTableEnv.executeSql(createTableQuery);
> restartTableEnv.sqlQuery(selectQuery).execute().print();
> }
> {code}
> h3. Expected behavior
> The job continues, omitting the initial two records, and starts printing 
> results from 2 onwards.
> h3. Observed behavior
> No results are printed. The logs show that an invalid request was handled:
> {code:java}
> org.apache.flink.streaming.api.operators.collect.CollectSinkFunction [] - 
> Invalid request. Received version = b497a74f-e85c-404b-8df3-1b4b1a0c2de1, 
> offset = 0, while expected version = b497a74f-e85c-404b-8df3-1b4b1a0c2de1, 
> 

[GitHub] [flink] akalash commented on a diff in pull request #21923: FLINK-13871: Consolidate volatile status fields in StreamTask

2023-02-21 Thread via GitHub


akalash commented on code in PR #21923:
URL: https://github.com/apache/flink/pull/21923#discussion_r1113358376


##
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java:
##
@@ -949,7 +952,9 @@ public final void cleanUp(Throwable throwable) throws 
Exception {
 // disabled the interruptions or not.
 getCompletionFuture().exceptionally(unused -> null).join();
 // clean up everything we initialized
-isRunning = false;
+if (!isCanceled() && !isFailing()) {

Review Comment:
   I think it is highly likely that we can have an FSM for this logic, but I 
agree that it requires more research into its correctness.
   
   So if we want to keep this ticket simple and preserve the logic that we 
have now, we cannot have `failing` in the enum, because a task can be 
`failing` and `running` at the same time. Maybe we should have a `TaskState` 
object with the enum and a `failing` flag inside. It doesn't look as clean as 
the solution in this PR, but at least it is cleaner than the current one and 
it keeps the same logic.
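   
   A quick sketch of that shape (hypothetical, just to make the idea 
concrete; this is not code from the PR):
   
   ```java
   /** Status enum plus a separate 'failing' flag kept outside the enum. */
   final class TaskState {
       enum Status { CREATED, RUNNING, CANCELED, FINISHED }
   
       private volatile Status status = Status.CREATED;
       // Kept as a flag because a task can be RUNNING and failing at once.
       private volatile boolean failing;
   
       boolean isRunning() { return status == Status.RUNNING; }
       boolean isCanceled() { return status == Status.CANCELED; }
       boolean isFailing() { return failing; }
   
       void markFailing() { failing = true; }
       void transitionTo(Status next) { status = next; }
   }
   ```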
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31138) Either StreamCheckpointingITCase/StreamFaultToleranceTestBase or EventTimeWindowCheckpointingITCase are timing out

2023-02-21 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-31138:
--
Priority: Major  (was: Critical)

> Either StreamCheckpointingITCase/StreamFaultToleranceTestBase or 
> EventTimeWindowCheckpointingITCase are timing out
> ---
>
> Key: FLINK-31138
> URL: https://issues.apache.org/jira/browse/FLINK-31138
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.15.3
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
> Attachments: 
> logs-cron_azure-test_cron_azure_finegrained_resource_management-1676692614.zip
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46283&view=logs&j=4d4a0d10-fca2-5507-8eed-c07f0bdf4887&t=7b25afdf-cc6c-566f-5459-359dc2585798
> This build timed out. The stacktraces revealed multiple tests that might have 
> caused this:
> * {{StreamCheckpointingITCase}} through 
> {{StreamFaultToleranceTestBase.runCheckpointedProgram}}
> {code}
> [...]
> 2023-02-18T07:37:47.6861582Z- locked <0x83c85250> (a 
> java.lang.Object)
> 2023-02-18T07:37:47.6862179Zat 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$SwitchingOnClose.collect(StreamSourceContexts.java:103)
> 2023-02-18T07:37:47.6862981Zat 
> org.apache.flink.test.checkpointing.StreamCheckpointingITCase$StringGeneratingSourceFunction.run(StreamCheckpointingITCase.java:169)
> 2023-02-18T07:37:47.6863762Z- locked <0x83c85250> (a 
> java.lang.Object)
> [...]
> 2023-02-18T07:37:47.7904307Z "main" #1 prio=5 os_prio=0 
> tid=0x7fca5000b800 nid=0x56636 waiting on condition [0x7fca57c58000]
> 2023-02-18T07:37:47.7904803Zjava.lang.Thread.State: WAITING (parking)
> 2023-02-18T07:37:47.7905160Zat sun.misc.Unsafe.park(Native Method)
> 2023-02-18T07:37:47.7905932Z- parking to wait for  <0x83c9df48> 
> (a java.util.concurrent.CompletableFuture$Signaller)
> 2023-02-18T07:37:47.7906498Zat 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 2023-02-18T07:37:47.7907074Zat 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> 2023-02-18T07:37:47.7907764Zat 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> 2023-02-18T07:37:47.7908457Zat 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> 2023-02-18T07:37:47.7909019Zat 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2023-02-18T07:37:47.7909605Zat 
> org.apache.flink.test.util.TestUtils.submitJobAndWaitForResult(TestUtils.java:93)
> 2023-02-18T07:37:47.7910413Zat 
> org.apache.flink.test.checkpointing.StreamFaultToleranceTestBase.runCheckpointedProgram(StreamFaultToleranceTestBase.java:134)
> [...]
> {code}
> or {{LocalRecoveryITCase}}/{{EventTimeWindowCheckpointingITCase}}:
> {code}
> [...]
> 2023-02-18T07:37:51.6744983Z "main" #1 prio=5 os_prio=0 
> tid=0x7efc4000b800 nid=0x5645a waiting on condition [0x7efc49b4e000]
> 2023-02-18T07:37:51.6745471Zjava.lang.Thread.State: WAITING (parking)
> 2023-02-18T07:37:51.6745823Zat sun.misc.Unsafe.park(Native Method)
> 2023-02-18T07:37:51.6746482Z- parking to wait for  <0x8718cce8> 
> (a java.util.concurrent.CompletableFuture$Signaller)
> 2023-02-18T07:37:51.6747147Zat 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 2023-02-18T07:37:51.6747725Zat 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> 2023-02-18T07:37:51.6748313Zat 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> 2023-02-18T07:37:51.6748892Zat 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> 2023-02-18T07:37:51.6749457Zat 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2023-02-18T07:37:51.6750118Zat 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1989)
> 2023-02-18T07:37:51.6750881Zat 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1969)
> 2023-02-18T07:37:51.6751694Zat 
> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase.testSlidingTimeWindow(EventTimeWindowCheckpointingITCase.java:524)
> 2023-02-18T07:37:51.6752476Zat 
> org.apache.flink.test.checkpointing.LocalRecoveryITCase.executeTest(LocalRecoveryITCase.java:84)
> 2023-02-18T07:37:51.6753157Zat 
> org.apache.flink.test.checkpointing.LocalRecoveryITCase.executeTest(LocalRecoveryITCase.java:66)
> 

[jira] [Comment Edited] (FLINK-31133) AdaptiveSchedulerITCase took extraordinary long to finish

2023-02-21 Thread Roman Khachatryan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691693#comment-17691693
 ] 

Roman Khachatryan edited comment on FLINK-31133 at 2/21/23 4:26 PM:


I think the issue is actually PartiallyFinishedSourcesITCase.

It starts at 3:40
{code:java}
03:40:55,702 [                main] INFO  
org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase [] -

Test test[complex graph ALL_SUBTASKS, failover: true, strategy: 
full](org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase)
 is running.
{code}
then a checkpoint fails because of a timeout:
{code:java}
03:41:10,775 [ChangelogRetryScheduler-1] INFO  
org.apache.flink.changelog.fs.RetryingExecutor               [] - failed with 3 
attempts: Attempt 3 timed out after 1000ms
03:41:10,777 [AsyncOperations-thread-1] INFO  
org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - 
transform-2-keyed (4/4)#0 - asynchronous part of checkpoint 2 could not be 
completed.
java.util.concurrent.CompletionException: java.io.IOException: 
java.util.concurrent.TimeoutException: Attempt 3 timed out after 1000ms
        at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
 ~[?:1.8.0_292]
        at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
 ~[?:1.8.0_292]
        at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) 
~[?:1.8.0_292]
        at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
 ~[?:1.8.0_292]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) 
~[?:1.8.0_292]
        at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
 ~[?:1.8.0_292]
        at 
org.apache.flink.changelog.fs.FsStateChangelogWriter$UploadCompletionListener.onFailure(FsStateChangelogWriter.java:383)
 ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.changelog.fs.FsStateChangelogWriter.lambda$null$0(FsStateChangelogWriter.java:223)
 ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at java.util.ArrayList.removeIf(ArrayList.java:1415) ~[?:1.8.0_292]
        at 
org.apache.flink.changelog.fs.FsStateChangelogWriter.lambda$handleUploadFailure$4(FsStateChangelogWriter.java:222)
 ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
 ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) 
~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
 ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
 ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
 ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:807)
 ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:756) 
~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
 ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) 
~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) 
~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
Caused by: java.io.IOException: java.util.concurrent.TimeoutException: Attempt 
3 timed out after 1000ms
        at 
org.apache.flink.changelog.fs.FsStateChangelogWriter$UploadCompletionListener.onFailure(FsStateChangelogWriter.java:377)
 ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        ... 15 more
Caused by: java.util.concurrent.TimeoutException: Attempt 3 timed out after 
1000ms
        at 
org.apache.flink.changelog.fs.RetryingExecutor$RetriableActionAttempt.fmtError(RetryingExecutor.java:285)
 ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
        at 

[GitHub] [flink] flinkbot commented on pull request #21985: [FLINK-31162][yarn] Use currUsr.getTokens instead of currUsr.getCrede…

2023-02-21 Thread via GitHub


flinkbot commented on PR #21985:
URL: https://github.com/apache/flink/pull/21985#issuecomment-1438893837

   
   ## CI report:
   
   * bf60278b45ba5c499da3755d50c70aaf546f8bfa UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31109) Fails with proxy user not supported even when security.kerberos.fetch.delegation-token is set to false

2023-02-21 Thread Venkata krishnan Sowrirajan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691740#comment-17691740
 ] 

Venkata krishnan Sowrirajan commented on FLINK-31109:
-

[~martijnvisser] I'm still looking into it. I'll probably need until the end 
of this week for the fix and all the internal testing that needs to be done 
before raising the PR.
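
For reference, one possible shape of the fix, as a purely hypothetical sketch 
(option, field, and method names here are assumptions, not the actual patch): 
gate the Kerberos login probe in {{HadoopModule#install}} behind the 
delegation token fetch flag instead of always calling 
{{KerberosLoginProvider.isLoginPossible()}}.
{code:java}
// Hypothetical sketch only; names are assumptions.
boolean fetchDelegationToken =
        flinkConfig.getBoolean(SecurityOptions.KERBEROS_FETCH_DELEGATION_TOKEN);
if (fetchDelegationToken && kerberosLoginProvider.isLoginPossible()) {
    kerberosLoginProvider.doLogin();
    loginUser = UserGroupInformation.getLoginUser();
} else {
    // Fall back to the ambient UGI (e.g. tokens picked up via
    // HADOOP_TOKEN_FILE_LOCATION plus HADOOP_PROXY_USER) without probing
    // proxy-user support, which currently throws.
    loginUser = UserGroupInformation.getCurrentUser();
}
{code}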

> Fails with proxy user not supported even when 
> security.kerberos.fetch.delegation-token is set to false
> --
>
> Key: FLINK-31109
> URL: https://issues.apache.org/jira/browse/FLINK-31109
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0
>Reporter: Venkata krishnan Sowrirajan
>Assignee: Venkata krishnan Sowrirajan
>Priority: Blocker
>
> With
> {code:java}
> security.kerberos.fetch.delegation-token: false
> {code}
> and delegation tokens obtained through our internal service which sets both 
> HADOOP_TOKEN_FILE_LOCATION to pick up the DTs and also sets the 
> HADOOP_PROXY_USER which fails with the below error
> {code:java}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/export/home/vsowrira/flink-1.18-SNAPSHOT/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/export/apps/hadoop/hadoop-bin_2100503/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> org.apache.flink.runtime.security.modules.SecurityModule$SecurityInstallException:
>  Unable to set the Hadoop login user
>   at 
> org.apache.flink.runtime.security.modules.HadoopModule.install(HadoopModule.java:106)
>   at 
> org.apache.flink.runtime.security.SecurityUtils.installModules(SecurityUtils.java:76)
>   at 
> org.apache.flink.runtime.security.SecurityUtils.install(SecurityUtils.java:57)
>   at 
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1188)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
> Caused by: java.lang.UnsupportedOperationException: Proxy user is not 
> supported
>   at 
> org.apache.flink.runtime.security.token.hadoop.KerberosLoginProvider.throwProxyUserNotSupported(KerberosLoginProvider.java:137)
>   at 
> org.apache.flink.runtime.security.token.hadoop.KerberosLoginProvider.isLoginPossible(KerberosLoginProvider.java:81)
>   at 
> org.apache.flink.runtime.security.modules.HadoopModule.install(HadoopModule.java:73)
>   ... 4 more
> {code}
> This seems to have gotten changed after 
> [480e6edf|https://github.com/apache/flink/commit/480e6edf9732f8334ef7576080fdbfc98051cb28]
>  ([FLINK-28330][runtime][security] Remove old delegation token framework code)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] venkata91 commented on pull request #21985: [FLINK-31162][yarn] Use currUsr.getCredentials.getTokens instead of currUsr.getTokens

2023-02-21 Thread via GitHub


venkata91 commented on PR #21985:
URL: https://github.com/apache/flink/pull/21985#issuecomment-1438988558

   > Re title: You mean 'Use currUsr.getCredentials().getAllTokens() instead of 
currUsr.getTokens() to avoid including private tokens' ?
   
   Oh. Yes. Thanks for pointing out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] venkata91 commented on pull request #21985: [FLINK-31162][yarn] Use currUsr.getCredentials.getTokens instead of currUsr.getTokens

2023-02-21 Thread via GitHub


venkata91 commented on PR #21985:
URL: https://github.com/apache/flink/pull/21985#issuecomment-1438989508

   > Change looks good to me. Not sure how testable this is ...
   
   Yeah not sure how we can add either unit tests or integration tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] venkata91 commented on pull request #21985: [FLINK-31162][yarn] Use currUsr.getTokens instead of currUsr.getCrede…

2023-02-21 Thread via GitHub


venkata91 commented on PR #21985:
URL: https://github.com/apache/flink/pull/21985#issuecomment-1438936505

   Please take a look. cc @gaborgsomogyi @MartijnVisser @becketqin


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


