[GitHub] [flink] zhuzhurk commented on a diff in pull request #21970: [FLINK-31041][runtime] Fix multiple restoreState when GlobalFailure occurs in a short period.
zhuzhurk commented on code in PR #21970: URL: https://github.com/apache/flink/pull/21970#discussion_r1112513443 ## flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java: ## @@ -377,6 +377,10 @@ private void restartTasks( final Set<ExecutionVertexID> verticesToRestart = executionVertexVersioner.getUnmodifiedExecutionVertices(executionVertexVersions); +if (verticesToRestart.isEmpty()) { +return; Review Comment: I'm not entirely sure, but I'm a bit concerned that Flink may take some important actions in `CheckpointCoordinator#restoreLatestCheckpointedStateInternal()` even if `verticesToRestart` is empty, e.g. invoking `OperatorCoordinator#resetToCheckpoint(...)`. These actions were always taken previously, but may be skipped after this change (when a global failover and a regional failover happen concurrently). I haven't had the chance to examine it all yet. It would be appreciated if you could also help examine this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-ml] lindong28 commented on a diff in pull request #214: [FLINK-31127] Add public API classes for FLIP-289
lindong28 commented on code in PR #214: URL: https://github.com/apache/flink-ml/pull/214#discussion_r1113910042 ## flink-ml-servable-core/src/main/java/org/apache/flink/ml/servable/api/DataFrame.java: ## @@ -0,0 +1,120 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.ml.servable.api; + +import org.apache.flink.annotation.PublicEvolving; +import org.apache.flink.ml.servable.types.DataType; + +import java.util.Iterator; +import java.util.List; + +/** + * A DataFrame consists of some number of rows, each of which has the same list of column names and + * data types. + * + * All values in a column must have the same data type: integer, float, string etc. 
+ */ +@PublicEvolving +public class DataFrame { + +private final List<String> columnNames; +private final List<DataType> dataTypes; +private final List<Row> rows; + +public DataFrame(List<String> columnNames, List<DataType> dataTypes, List<Row> rows) { +int numColumns = columnNames.size(); +if (dataTypes.size() != numColumns) { +throw new IllegalArgumentException( +"The number of data types is different from the number of column names."); +} +for (Row row : rows) { +if (row.size() != numColumns) { +throw new IllegalArgumentException( +"The number of values in some row is different from the number of column names"); +} +} + +this.columnNames = columnNames; +this.dataTypes = dataTypes; +this.rows = rows; +} + +/** Returns a list of the names of all the columns in this DataFrame. */ +public List<String> getColumnNames() { +return columnNames; +} + +/** + * Returns the index of the column with the given name. + * + * @throws IllegalArgumentException if the column is not present in this table + */ +public int getIndex(String name) { +for (int i = 0; i < columnNames.size(); i++) { +if (columnNames.get(i).equals(name)) { +return i; +} +} +throw new IllegalArgumentException("Failed to find the column with the given name."); +} + +/** + * Returns the data type of the column with the given name. + * + * @throws IllegalArgumentException if the column is not present in this table + */ +public DataType getDataType(String name) { +int index = getIndex(name); +return dataTypes.get(index); +} + +/** + * Adds to this DataFrame a column with the given name, data type, and values. + * + * @throws IllegalArgumentException if the number of values is different from the number of + * rows. + */ +public DataFrame addColumn(String columnName, DataType dataType, List<Object> values) { +if (values.size() != rows.size()) { +throw new RuntimeException( +"The number of values is different from the number of rows."); +} +columnNames.add(columnName); +dataTypes.add(dataType); Review Comment: Thanks for catching this. It is fixed now.
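The DataFrame contract reviewed above can be illustrated with a self-contained sketch (not the Flink ML class itself; `DataFrameSketch` and its simplified row representation are invented for illustration). It keeps the validation discussed in the thread: row widths must match the column list, and `addColumn` rejects a mismatched value count with `IllegalArgumentException`, as the javadoc promises.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified sketch of the reviewed DataFrame contract: columns are named,
// every row has one value per column, and addColumn validates its input.
class DataFrameSketch {
    private final List<String> columnNames;
    private final List<List<Object>> rows; // each inner list is one row

    DataFrameSketch(List<String> columnNames, List<List<Object>> rows) {
        for (List<Object> row : rows) {
            if (row.size() != columnNames.size()) {
                throw new IllegalArgumentException(
                        "The number of values in some row differs from the number of column names.");
            }
        }
        this.columnNames = new ArrayList<>(columnNames);
        this.rows = rows;
    }

    /** Returns the index of the column with the given name. */
    int getIndex(String name) {
        int i = columnNames.indexOf(name);
        if (i < 0) {
            throw new IllegalArgumentException("Failed to find the column: " + name);
        }
        return i;
    }

    /** Appends a column; the value count must equal the row count. */
    DataFrameSketch addColumn(String columnName, List<Object> values) {
        if (values.size() != rows.size()) {
            throw new IllegalArgumentException(
                    "The number of values differs from the number of rows.");
        }
        columnNames.add(columnName);
        for (int i = 0; i < rows.size(); i++) {
            rows.get(i).add(values.get(i));
        }
        return this;
    }
}
```

Note that the sketch throws `IllegalArgumentException` (matching the documented contract) rather than the bare `RuntimeException` seen in the diff.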
[jira] [Created] (FLINK-31180) Fail early when installing minikube and check whether we can retry
Matthias Pohl created FLINK-31180: - Summary: Fail early when installing minikube and check whether we can retry Key: FLINK-31180 URL: https://issues.apache.org/jira/browse/FLINK-31180 Project: Flink Issue Type: Improvement Components: Test Infrastructure Affects Versions: 1.16.1, 1.15.3, 1.17.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46367=logs=bea52777-eaf8-5663-8482-18fbc3630e81=b2642e3a-5b86-574d-4c8a-f7e2842bfb14=4726 We experienced a build failure where Minikube couldn't be installed due to some network issues. Two things we could do here: * check whether we can add a retry loop (maybe also for other resources) * fail early if CI didn't manage to install the binaries -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink-table-store] JingsongLi commented on pull request #547: [FLINK-31128] Add Create Table As for flink table store
JingsongLi commented on PR #547: URL: https://github.com/apache/flink-table-store/pull/547#issuecomment-1439554535 > Thanks @zhangjun0x01 , at present, the biggest problem with CTAS is that it cannot specify the primary key and partition column, which makes it almost unavailable for production. Or can we consider supporting them? @zhangjun0x01 The core problem is that Flink SQL CTAS does not support primary key and partition key declarations. Maybe we can provide options for the primary key and partition key, just like Spark's create table, for table store. What do you think?
[GitHub] [flink] flinkbot commented on pull request #21992: [BP-1.17][FLINK-31132] Let hive compact operator without setting parallelism subject to sink operator's configured parallelism.
flinkbot commented on PR #21992: URL: https://github.com/apache/flink/pull/21992#issuecomment-1439566892 ## CI report: * 2da857058638b87b37c50be1cc402022fdc3 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] liuyongvs opened a new pull request, #21993: [FLINK-31166][table] Fix array_contains does not support null argumen…
liuyongvs opened a new pull request, #21993: URL: https://github.com/apache/flink/pull/21993 ## What is the purpose of the change *Fix array_contains does not support null argument when the array element type is not null.* ## Brief change log *(for example:)* - *The TaskInfo is stored in the blob store on job creation time as a persistent artifact* - *Deployments RPC transmits only the blob storage reference* - *TaskManagers retrieve the TaskInfo from the blob cache* ## Verifying this change Please make sure both new and modified tests in this PR follow the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing *(Please pick either of the following options)* This change is a trivial rework / code cleanup without any test coverage. *(or)* This change is already covered by existing tests, such as *(please describe tests)*. *(or)* This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end deployment with large payloads (100MB)* - *Extended integration test for recovery after master (JobManager) failure* - *Added test that validates that TaskInfo is transferred only once across recoveries* - *Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no) - The serializers: (yes / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / no / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know) - The S3 file system
connector: (yes / no / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / no) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
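The null-handling behavior the PR title describes can be sketched in plain Java (this is an illustration of the semantics, not Flink's planner code; `ArrayContainsSketch` is a hypothetical name): the needle may be null even when the elements are declared NOT NULL, and comparison uses `Objects.equals` so a null argument no longer fails.

```java
import java.util.List;
import java.util.Objects;

// Null-tolerant "array contains" sketch: a null needle matches only
// null elements, and never throws on a null argument.
final class ArrayContainsSketch {
    private ArrayContainsSketch() {}

    static boolean contains(List<?> haystack, Object needle) {
        for (Object element : haystack) {
            if (Objects.equals(element, needle)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, `contains(Arrays.asList(1, 2, 3), null)` is simply `false` rather than an error, since the array holds no null element.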
[GitHub] [flink] dannycranmer merged pull request #21970: [FLINK-31041][runtime] Fix multiple restoreState when GlobalFailure occurs in a short period.
dannycranmer merged PR #21970: URL: https://github.com/apache/flink/pull/21970
[GitHub] [flink] dannycranmer commented on pull request #21970: [FLINK-31041][runtime] Fix multiple restoreState when GlobalFailure occurs in a short period.
dannycranmer commented on PR #21970: URL: https://github.com/apache/flink/pull/21970#issuecomment-1439572097 Thanks @huwh, seems like all open questions are resolved and CI passes. Merging.
[GitHub] [flink-ml] jiangxin369 commented on a diff in pull request #214: [FLINK-31127] Add public API classes for FLIP-289
jiangxin369 commented on code in PR #214: URL: https://github.com/apache/flink-ml/pull/214#discussion_r1113903069 ## flink-ml-servable-core/src/main/java/org/apache/flink/ml/servable/api/ModelServable.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.ml.servable.api; + +import org.apache.flink.annotation.PublicEvolving; + +import java.io.IOException; +import java.io.InputStream; + +/** + * A ModelServable is a TransformerServable with the extra API to set model data. + * + * @param <T> The class type of the ModelServable implementation itself. + */ +@PublicEvolving +public interface ModelServable<T extends ModelServable<T>> extends TransformerServable<T> { + +/** Sets model data using the serialized model data from the given input stream. */ Review Comment: ```suggestion /** Sets model data using the serialized model data from the given input streams. */ ```
[jira] [Commented] (FLINK-31061) Release Testing: Verify FLINK-30376 Introduce a new flink bushy join reorder rule which based on greedy algorithm
[ https://issues.apache.org/jira/browse/FLINK-31061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691961#comment-17691961 ] Yunhong Zheng commented on FLINK-31061: --- Thanks, [~JunRuiLi], for your careful testing. No problems were found during the test, so I will close this issue. > Release Testing: Verify FLINK-30376 Introduce a new flink bushy join reorder > rule which based on greedy algorithm > - > > Key: FLINK-31061 > URL: https://issues.apache.org/jira/browse/FLINK-31061 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Planner >Affects Versions: 1.17.0 >Reporter: Yunhong Zheng >Assignee: Junrui Li >Priority: Major > Fix For: 1.17.0 > > > This issue aims to verify FLINK-30376: [Introduce a new flink bushy join > reorder rule which based on greedy > algorithm|https://issues.apache.org/jira/browse/FLINK-30376]. > In Flink-1.17, the bushy join reorder strategy is the default join reorder > strategy, and this strategy can be disabled by setting the factor > 'table.optimizer.bushy-join-reorder-threshold' smaller than the number of > tables to be reordered. If disabled, the Lopt join reorder strategy, which is > the default join reorder strategy in Flink-1.16, will be chosen. > We can verify it in the SQL client after we build the flink-dist package. > # Firstly, we need to create several tables (the best case is that these > tables have table and column statistics). > # Secondly, we need to set 'table.optimizer.join-reorder-enabled = true' to > enable join reorder. > # Verify bushy join reorder (the default bushy join reorder threshold is 12, > so if the number of tables is smaller than 12, the join reorder strategy is bushy > join reorder). > # Compare the results of the bushy join reorder and Lopt join reorder strategies. > They need to be the same. > # If we want to create a bushy join tree after join reorder, we need to add > statistics, like 'JoinReorderITCaseBase.testBushyTreeJoinReorder'. > If you meet any problems, feel free to ping me directly.
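The verification steps quoted above boil down to a few SQL-client statements; a hedged sketch (the table names t1, t2, t3 are placeholders, while the option keys are taken from the issue text):

```sql
-- Enable join reorder (bushy reorder is the default strategy in 1.17).
SET 'table.optimizer.join-reorder-enabled' = 'true';

-- To fall back to the 1.16 Lopt strategy instead, set the bushy threshold
-- below the number of tables being reordered, e.g.:
-- SET 'table.optimizer.bushy-join-reorder-threshold' = '2';

SELECT *
FROM t1
JOIN t2 ON t1.k = t2.k
JOIN t3 ON t2.k = t3.k;
```

Running the same query under both settings should produce identical results, differing only in the generated join tree.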
[GitHub] [flink] zhuzhurk commented on a diff in pull request #21970: [FLINK-31041][runtime] Fix multiple restoreState when GlobalFailure occurs in a short period.
zhuzhurk commented on code in PR #21970: URL: https://github.com/apache/flink/pull/21970#discussion_r1113903871 ## flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java: ## @@ -377,6 +377,10 @@ private void restartTasks( final Set<ExecutionVertexID> verticesToRestart = executionVertexVersioner.getUnmodifiedExecutionVertices(executionVertexVersions); +if (verticesToRestart.isEmpty()) { +return; Review Comment: Yes you are right. If a global failover is in progress, no regional failover will be triggered. Thanks for the explanation!
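The guard discussed in this thread can be sketched in isolation (this is not Flink code; the class, method names, and string-keyed versioning are simplified stand-ins): a restart attempt restores state only if at least one of the requested vertex versions is still unmodified, so a stale regional restart that raced with a global failover becomes a no-op.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal sketch of "skip the restart when every vertex version was
// modified by a concurrent failover", mirroring the early return in the PR.
class RestartGuardSketch {
    private final Map<String, Long> currentVersions = new HashMap<>();

    void recordVersion(String vertex, long version) {
        currentVersions.put(vertex, version);
    }

    /** Returns the vertices whose requested version still matches the current one. */
    Set<String> unmodifiedVertices(Map<String, Long> requestedVersions) {
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Long> e : requestedVersions.entrySet()) {
            if (e.getValue().equals(currentVersions.get(e.getKey()))) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** No unmodified vertices means the restart attempt is stale: do not restore state. */
    boolean shouldRestoreState(Map<String, Long> requestedVersions) {
        return !unmodifiedVertices(requestedVersions).isEmpty();
    }
}
```

In this model, a global failover bumps every vertex version, so a previously scheduled regional restart carrying the old versions finds `unmodifiedVertices` empty and skips the second `restoreState`.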
[jira] [Closed] (FLINK-31061) Release Testing: Verify FLINK-30376 Introduce a new flink bushy join reorder rule which based on greedy algorithm
[ https://issues.apache.org/jira/browse/FLINK-31061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yunhong Zheng closed FLINK-31061. - Resolution: Fixed > Release Testing: Verify FLINK-30376 Introduce a new flink bushy join reorder > rule which based on greedy algorithm > - > > Key: FLINK-31061 > URL: https://issues.apache.org/jira/browse/FLINK-31061 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Planner >Affects Versions: 1.17.0 >Reporter: Yunhong Zheng >Assignee: Junrui Li >Priority: Major > Fix For: 1.17.0 > > > This issue aims to verify FLINK-30376: [Introduce a new flink bushy join > reorder rule which based on greedy > algorithm|https://issues.apache.org/jira/browse/FLINK-30376]. > In Flink-1.17, the bushy join reorder strategy is the default join reorder > strategy, and this strategy can be disabled by setting the factor > 'table.optimizer.bushy-join-reorder-threshold' smaller than the number of > tables to be reordered. If disabled, the Lopt join reorder strategy, which is > the default join reorder strategy in Flink-1.16, will be chosen. > We can verify it in the SQL client after we build the flink-dist package. > # Firstly, we need to create several tables (the best case is that these > tables have table and column statistics). > # Secondly, we need to set 'table.optimizer.join-reorder-enabled = true' to > enable join reorder. > # Verify bushy join reorder (the default bushy join reorder threshold is 12, > so if the number of tables is smaller than 12, the join reorder strategy is bushy > join reorder). > # Compare the results of the bushy join reorder and Lopt join reorder strategies. > They need to be the same. > # If we want to create a bushy join tree after join reorder, we need to add > statistics, like 'JoinReorderITCaseBase.testBushyTreeJoinReorder'. > If you meet any problems, feel free to ping me directly.
[GitHub] [flink-ml] lindong28 commented on pull request #214: [FLINK-31127] Add public API classes for FLIP-289
lindong28 commented on PR #214: URL: https://github.com/apache/flink-ml/pull/214#issuecomment-1439531423 @zhipeng93 Can you help review this PR? If this PR looks good to you, I will need to send an email to the FLIP-289 voting thread to discuss the proposed API change (e.g. let `setModelData()` take multiple input streams) before this PR is merged.
[GitHub] [flink] reswqa opened a new pull request, #21992: [BP-1.17][FLINK-31132] Let hive compact operator without setting parallelism subject to sink operator's configured parallelism.
reswqa opened a new pull request, #21992: URL: https://github.com/apache/flink/pull/21992 Backport FLINK-31132 to release-1.17.
[GitHub] [flink] ChenZhongPu commented on a diff in pull request #21989: [FLINK-31174][Doc] Resolve inconsistent data format between Learn-Flink Doc and flink-training-repo
ChenZhongPu commented on code in PR #21989: URL: https://github.com/apache/flink/pull/21989#discussion_r1113906467 ## docs/content.zh/docs/learn-flink/etl.md: ## @@ -223,11 +215,11 @@ minutesByStartCell 在 Flink 不参与管理状态的情况下,你的应用也可以使用状态,但 Flink 为其管理状态提供了一些引人注目的特性: -* **本地性**: Flink 状态是存储在使用它的机器本地的,并且可以以内存访问速度来获取 -* **持久性**: Flink 状态是容错的,例如,它可以自动按一定的时间间隔产生 checkpoint,并且在任务失败后进行恢复 -* **纵向可扩展性**: Flink 状态可以存储在集成的 RocksDB 实例中,这种方式下可以通过增加本地磁盘来扩展空间 -* **横向可扩展性**: Flink 状态可以随着集群的扩缩容重新分布 -* **可查询性**: Flink 状态可以通过使用 [状态查询 API]({{< ref "docs/dev/datastream/fault-tolerance/queryable_state" >}}) 从外部进行查询。 +- **本地性**: Flink 状态是存储在使用它的机器本地的,并且可以以内存访问速度来获取 Review Comment: I added an extra [jira](https://issues.apache.org/jira/browse/FLINK-31177) in terms of markdown formatting.
[GitHub] [flink] shuiqiangchen commented on pull request #21962: [FLINK-31124][Connectors/Hive] Add IT case for HiveTableSink speculative execution
shuiqiangchen commented on PR #21962: URL: https://github.com/apache/flink/pull/21962#issuecomment-1439524626 @flinkbot run azure
[jira] [Updated] (FLINK-31177) To introduce a formatter for Markdown files
[ https://issues.apache.org/jira/browse/FLINK-31177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] CHEN Zhongpu updated FLINK-31177: - Description: Currently, markdown files in *docs* are maintained and updated by many contributors, and different people have varying code style taste. By the way, as the syntax of markdown is not really strict, the styles tend to be inconsistent. To name a few, * Some prefer `*` to make a list item, while others may prefer `-`. * It is common to leave many unnecessary blank lines and spaces. * To make a divider, the number of `-` can be varying. To this end, I think it would be nicer to encourage or demand contributors to format their markdown files before making a pull request. Personally, I think Prettier ([https://prettier.io/)] is a good candidate. was: Currently, markdown files in *docs* are maintained and updated by many contributors, and different people have varying code style taste. By the way, as the syntax of markdown is not really strict, the styles can also inconsistent. To name a few, * Some prefer `*` to make a list item, while others may prefer `-`. * It is common to leave many unnecessary blank lines and spaces. * To make a divider, the number of `-` can be varying. To this end, I think it would be nicer to encourage or demand contributors to format their markdown files before making a pull request. Personally, I think Prettier ([https://prettier.io/)] is a good candidate. > To introduce a formatter for Markdown files > --- > > Key: FLINK-31177 > URL: https://issues.apache.org/jira/browse/FLINK-31177 > Project: Flink > Issue Type: Improvement >Reporter: CHEN Zhongpu >Priority: Minor > > Currently, markdown files in *docs* are maintained and updated by many > contributors, and different people have varying code style taste. By the way, > as the syntax of markdown is not really strict, the styles tend to be > inconsistent. 
> To name a few, > * Some prefer `*` to make a list item, while others may prefer `-`. > * It is common to leave many unnecessary blank lines and spaces. > * To make a divider, the number of `-` can be varying. > To this end, I think it would be nicer to encourage or demand contributors to > format their markdown files before making a pull request. Personally, I > think Prettier ([https://prettier.io/)] is a good candidate.
[GitHub] [flink-ml] lindong28 commented on a diff in pull request #214: [FLINK-31127] Add public API classes for FLIP-289
lindong28 commented on code in PR #214: URL: https://github.com/apache/flink-ml/pull/214#discussion_r1113911222 ## flink-ml-servable-core/src/main/java/org/apache/flink/ml/servable/api/ModelServable.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.ml.servable.api; + +import org.apache.flink.annotation.PublicEvolving; + +import java.io.IOException; +import java.io.InputStream; + +/** + * A ModelServable is a TransformerServable with the extra API to set model data. + * + * @param <T> The class type of the ModelServable implementation itself. + */ +@PublicEvolving +public interface ModelServable<T extends ModelServable<T>> extends TransformerServable<T> { + +/** Sets model data using the serialized model data from the given input stream. */ Review Comment: Thanks for catching this. It is fixed now.
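The pattern behind the `setModelData()` API discussed in this thread (a servable deserializing its model from one or more input streams) can be sketched as follows. The class name, field, and wire format (a count followed by doubles) are invented purely for illustration; they are not the Flink ML implementation.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical servable that loads its model data from serialized streams,
// in the spirit of ModelServable#setModelData.
class LinearModelSketch {
    double[] coefficients = new double[0];

    // Accepts multiple streams as proposed in the FLIP-289 discussion;
    // for simplicity this sketch reads only the first one.
    LinearModelSketch setModelData(InputStream... modelDataInputs) throws IOException {
        try (DataInputStream in = new DataInputStream(modelDataInputs[0])) {
            int n = in.readInt();
            double[] coeffs = new double[n];
            for (int i = 0; i < n; i++) {
                coeffs[i] = in.readDouble();
            }
            this.coefficients = coeffs;
        }
        return this;
    }
}
```

Returning `this` mirrors the fluent style of the servable interfaces, where `setModelData` returns the servable itself so calls can be chained.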
[jira] [Updated] (FLINK-31166) array_contains element type error
[ https://issues.apache.org/jira/browse/FLINK-31166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-31166: --- Labels: pull-request-available (was: ) > array_contains element type error > - > > Key: FLINK-31166 > URL: https://issues.apache.org/jira/browse/FLINK-31166 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.18.0 >Reporter: jackylau >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > Attachments: image-2023-02-21-18-37-45-202.png, > image-2023-02-21-18-41-19-385.png, image-2023-02-22-09-56-59-257.png > > > !image-2023-02-22-09-56-59-257.png! > > !image-2023-02-21-18-41-19-385.png!
[GitHub] [flink] flinkbot commented on pull request #21990: [FLINK-31170][docs] A spelling error in the documentation causes SQL to fail to execute
flinkbot commented on PR #21990: URL: https://github.com/apache/flink/pull/21990#issuecomment-1439515422 ## CI report: * 9dd7ee22fcd390e9a32e54fe914640ef2d7b888f UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink-table-store] JingsongLi opened a new pull request, #551: [FLINK-31179] Make data structures serializable
JingsongLi opened a new pull request, #551: URL: https://github.com/apache/flink-table-store/pull/551 This can make our internal data structures easy to use.
[jira] [Updated] (FLINK-31179) Make data structures serializable
[ https://issues.apache.org/jira/browse/FLINK-31179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-31179: --- Labels: pull-request-available (was: ) > Make data structures serializable > - > > Key: FLINK-31179 > URL: https://issues.apache.org/jira/browse/FLINK-31179 > Project: Flink > Issue Type: Improvement > Components: Table Store >Reporter: Jingsong Lee >Assignee: Jingsong Lee >Priority: Major > Labels: pull-request-available > Fix For: table-store-0.4.0 > >
[jira] [Created] (FLINK-31179) Make data structures serializable
Jingsong Lee created FLINK-31179: Summary: Make data structures serializable Key: FLINK-31179 URL: https://issues.apache.org/jira/browse/FLINK-31179 Project: Flink Issue Type: Improvement Components: Table Store Reporter: Jingsong Lee Assignee: Jingsong Lee Fix For: table-store-0.4.0
[GitHub] [flink] flinkbot commented on pull request #21991: [BP-1.17][FLINK-31091][sql-gateway] Add Ser/de for Interval types (#21945)
flinkbot commented on PR #21991: URL: https://github.com/apache/flink/pull/21991#issuecomment-1439549889 ## CI report: * f2a617eb6560c48e85f0dbff2cc1d550194bf67b UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[jira] [Created] (FLINK-31177) To introduce a formatter for Markdown files
CHEN Zhongpu created FLINK-31177: Summary: To introduce a formatter for Markdown files Key: FLINK-31177 URL: https://issues.apache.org/jira/browse/FLINK-31177 Project: Flink Issue Type: Improvement Reporter: CHEN Zhongpu Currently, markdown files in *docs* are maintained and updated by many contributors, and different people have varying code style taste. By the way, as the syntax of markdown is not really strict, the styles can also inconsistent. To name a few, * Some prefer `*` to make a list item, while others may prefer `-`. * It is common to leave many unnecessary blank lines and spaces. * To make a divider, the number of `-` can be varying. To this end, I think it would be nicer to encourage or demand contributors to format their markdown files before making a pull request. Personally, I think Prettier ([https://prettier.io/)] is a good candidate.
[jira] [Commented] (FLINK-31169) KubernetesResourceManagerDriverTest.testOnPodDeleted fails fatally due to 239 exit code
[ https://issues.apache.org/jira/browse/FLINK-31169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691994#comment-17691994 ] Matthias Pohl commented on FLINK-31169: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46382=logs=5cae8624-c7eb-5c51-92d3-4d2dacedd221=5acec1b4-945b-59ca-34f8-168928ce5199=26468 > KubernetesResourceManagerDriverTest.testOnPodDeleted fails fatally due to 239 > exit code > --- > > Key: FLINK-31169 > URL: https://issues.apache.org/jira/browse/FLINK-31169 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.17.0 >Reporter: Matthias Pohl >Assignee: Xintong Song >Priority: Blocker > Labels: pull-request-available, test-stability > > master: > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46341=logs=5cae8624-c7eb-5c51-92d3-4d2dacedd221=5acec1b4-945b-59ca-34f8-168928ce5199=27329 > {code} > [...] > Feb 21 04:44:11 [ERROR] Process Exit Code: 239 > Feb 21 04:44:11 [ERROR] Crashed tests: > Feb 21 04:44:11 [ERROR] > org.apache.flink.kubernetes.KubernetesResourceManagerDriverTest > Feb 21 04:44:11 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:748) > [...] > {code} > {code} > [...] > Test > org.apache.flink.kubernetes.KubernetesResourceManagerDriverTest.testOnPodDeleted[testOnPodDeleted()] > is running. > > 04:43:57,681 [ForkJoinPool-4-worker-1] INFO > org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Recovered 0 > pods from previous attempts, current attempt id is 1. > 04:43:57,701 [testing-rpc-main-thread] INFO > org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled > external resources: [] > 04:43:57,705 [testing-rpc-main-thread] INFO > org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Creating > new TaskManager pod with name testing-flink-cluster-taskmanager-1-1 and > resource <704,0.0>. 
> 04:43:57,708 [testing-rpc-main-thread] INFO > org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Received > new TaskManager pod: testing-flink-cluster-taskmanager-1-1 > 04:43:57,708 [testing-rpc-main-thread] INFO > org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Pod > testing-flink-cluster-taskmanager-1-1 is created. > 04:43:57,708 [testing-rpc-main-thread] WARN > org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Pod > testing-flink-cluster-taskmanager-1-1 is terminated before being scheduled. > 04:43:57,709 [testing-rpc-main-thread] ERROR > org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Error > completing resource request. > org.apache.flink.util.FlinkException: Pod is terminated. > at > org.apache.flink.kubernetes.KubernetesResourceManagerDriver.onPodTerminated(KubernetesResourceManagerDriver.java:379) > ~[classes/:?] > at > org.apache.flink.kubernetes.KubernetesResourceManagerDriver.lambda$handlePodEventsInMainThread$2(KubernetesResourceManagerDriver.java:347) > ~[classes/:?] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [?:1.8.0_292] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [?:1.8.0_292] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > [?:1.8.0_292] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > [?:1.8.0_292] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_292] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [?:1.8.0_292] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] > 04:43:57,724 [testing-rpc-main-thread] ERROR > org.apache.flink.util.FatalExitExceptionHandler [] - FATAL: > Thread 'testing-rpc-main-thread' produced an uncaught exception. Stopping the > process... 
> java.util.concurrent.CompletionException: java.lang.RuntimeException: > org.apache.flink.util.FlinkException: Pod is terminated. > at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) > ~[?:1.8.0_292] > at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) > ~[?:1.8.0_292] > at > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) > ~[?:1.8.0_292] > at > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) > ~[?:1.8.0_292] > at >
[GitHub] [flink] snuyanzin opened a new pull request, #21991: [BP-1.17][FLINK-31091][sql-gateway] Add Ser/de for Interval types (#21945)
snuyanzin opened a new pull request, #21991: URL: https://github.com/apache/flink/pull/21991 This is a backport of 14adc1679fb3d025a2808af91f23f14e7c6f6e24 to the 1.17.0 branch
[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #546: [FLINK-31009] Add recordCount to snapshot meta
JingsongLi commented on code in PR #546: URL: https://github.com/apache/flink-table-store/pull/546#discussion_r1113926517 ## flink-table-store-core/src/main/java/org/apache/flink/table/store/file/operation/FileStoreCommitImpl.java: ## @@ -477,12 +477,20 @@ private boolean tryCommitOnce( List newMetas = new ArrayList<>(); List changelogMetas = new ArrayList<>(); try { +long previousChangesRecordCount = 0L; if (latestSnapshot != null) { +List previousManifests = +latestSnapshot.dataManifests(manifestList); // read all previous manifest files - oldMetas.addAll(latestSnapshot.readAllDataManifests(manifestList)); +oldMetas.addAll(previousManifests); +previousChangesRecordCount = +latestSnapshot.version() >= Snapshot.CURRENT_VERSION Review Comment: Create a util method: `totalRecordCount(Snapshot snapshot)`. And maybe we should just check whether that field is null instead of checking the snapshot version? ## flink-table-store-core/src/main/java/org/apache/flink/table/store/file/Snapshot.java: ## @@ -124,6 +130,24 @@ public class Snapshot { @JsonProperty(FIELD_LOG_OFFSETS) private final Map logOffsets; +// record count of all changes from the previous snapshots +// null for table store <= 0.3 +@JsonProperty(FIELD_BASE_RECORD_COUNT) +@Nullable +private final Long baseRecordCount; Review Comment: maybe just provide a `totalRecordCount`?
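The suggested util method could look roughly like the sketch below. Note this is an illustration of the review suggestion only: `Snapshot` here is a bare hypothetical stand-in with just the nullable count field, not the real table-store class.

```java
// Sketch of the reviewer's suggestion: decide based on whether the field is
// null (written by table store <= 0.3) rather than on the snapshot version.
// The Snapshot stand-in below is hypothetical; the real class holds far more.
class SnapshotCountUtil {

    /** Minimal stand-in for the table-store Snapshot class. */
    static class Snapshot {
        // null for snapshots written by table store <= 0.3
        private final Long totalRecordCount;

        Snapshot(Long totalRecordCount) {
            this.totalRecordCount = totalRecordCount;
        }

        Long totalRecordCount() {
            return totalRecordCount;
        }
    }

    /**
     * Returns the record count of all changes up to the given snapshot,
     * treating a null field (older writer version) as unknown, mapped to 0.
     */
    static long totalRecordCount(Snapshot snapshot) {
        Long count = snapshot.totalRecordCount();
        return count == null ? 0L : count;
    }
}
```

The null check keeps the helper working for snapshots written by older versions without consulting the version number at all.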
[jira] [Created] (FLINK-31178) Public Writer API
Jingsong Lee created FLINK-31178: Summary: Public Writer API Key: FLINK-31178 URL: https://issues.apache.org/jira/browse/FLINK-31178 Project: Flink Issue Type: Improvement Components: Table Store Reporter: Jingsong Lee Assignee: Jingsong Lee Fix For: table-store-0.4.0
[jira] [Commented] (FLINK-31041) Race condition in DefaultScheduler results in memory leak and busy loop
[ https://issues.apache.org/jira/browse/FLINK-31041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691998#comment-17691998 ] Danny Cranmer commented on FLINK-31041: --- Merged commit [{{b3e1492}}|https://github.com/apache/flink/commit/b3e14928e815dd6dbdffbe3c5616733d4c7c8825] into apache:master > Race condition in DefaultScheduler results in memory leak and busy loop > --- > > Key: FLINK-31041 > URL: https://issues.apache.org/jira/browse/FLINK-31041 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1 >Reporter: Danny Cranmer >Assignee: Weihua Hu >Priority: Critical > Labels: pull-request-available > Fix For: 1.17.0, 1.15.4, 1.16.2 > > Attachments: failovers.log, flink-31041-heap-dump.png, > test-restart-strategy.log > > > h4. Context > When a job creates multiple sources that use the {{SourceCoordinator}} > (FLIP-27), there is a failure race condition that results in: > * Memory leak of {{ExecutionVertexVersion}} > * Busy loop constantly trying to restart job > * Restart strategy is not respected > This results in the Job Manager becoming unresponsive. > h4. !flink-31041-heap-dump.png! > h4. Reproduction Steps > This can be reproduced by a job that creates multiple sources that fail in > the {{{}SplitEnumerator{}}}. We observed this with multiple {{KafkaSource's}} > trying to load a non-existent cert from the file system and throwing FNFE. 
> Thus, here is a simple job to reproduce (BE WARNED: running this locally will > lock up your IDE): > {code:java} > StreamExecutionEnvironment env = > StreamExecutionEnvironment.getExecutionEnvironment(); > env.setParallelism(1); > env.setRestartStrategy(new > RestartStrategies.FailureRateRestartStrategyConfiguration(1, Time.of(10, > TimeUnit.SECONDS), Time.of(10, TimeUnit.SECONDS))); > KafkaSource source = KafkaSource.builder() > .setProperty("security.protocol", "SASL_SSL") > // SSL configurations > // Configure the path of truststore (CA) provided by the server > .setProperty("ssl.truststore.location", > "/path/to/kafka.client.truststore.jks") > .setProperty("ssl.truststore.password", "test1234") > // Configure the path of keystore (private key) if client > authentication is required > .setProperty("ssl.keystore.location", > "/path/to/kafka.client.keystore.jks") > .setProperty("ssl.keystore.password", "test1234") > // SASL configurations > // Set SASL mechanism as SCRAM-SHA-256 > .setProperty("sasl.mechanism", "SCRAM-SHA-256") > // Set JAAS configurations > .setProperty("sasl.jaas.config", > "org.apache.kafka.common.security.scram.ScramLoginModule required > username=\"username\" password=\"password\";") > .setBootstrapServers("http://localhost:3456") > .setTopics("input-topic") > .setGroupId("my-group") > .setStartingOffsets(OffsetsInitializer.earliest()) > .setValueOnlyDeserializer(new SimpleStringSchema()) > .build(); > List> sources = IntStream.range(0, 32) > .mapToObj(i -> env > .fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka > Source " + i).uid("source-" + i) > .keyBy(s -> s.charAt(0)) > .map(s -> s)) > .collect(Collectors.toList()); > env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka > Source").uid("source") > .keyBy(s -> s.charAt(0)) > .union(sources.toArray(new SingleOutputStreamOperator[] {})) > .print(); > env.execute("test job"); {code} > h4. 
Root Cause > We can see that the {{OperatorCoordinatorHolder}} already has a [debounce > mechanism|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/coordination/OperatorCoordinatorHolder.java#L609], > however the {{DefaultScheduler}} does not. We need a debounce mechanism in > the {{DefaultScheduler}} since it handles many > {{{}OperatorCoordinatorHolder{}}}. > h4. Fix > I have managed to fix this, I will open a PR, but would need feedback from > people who understand this code better than me! >
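The debounce idea discussed in the ticket and in the linked PR review can be sketched with plain collections standing in for Flink's `ExecutionVertexVersioner` bookkeeping. All names below are illustrative, not the real Flink signatures:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the guard discussed for DefaultScheduler#restartTasks: when
// every vertex requested for restart was already restarted by a concurrent
// failover (its recorded version is stale), skip the restore entirely.
class RestartDebounce {

    private final Map<String, Long> currentVersions; // vertexId -> version

    RestartDebounce(Map<String, Long> currentVersions) {
        this.currentVersions = currentVersions;
    }

    /** Returns the vertices whose recorded version still matches the current one. */
    Set<String> unmodifiedVertices(Map<String, Long> recordedVersions) {
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Long> e : recordedVersions.entrySet()) {
            if (e.getValue().equals(currentVersions.get(e.getKey()))) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Returns true if a restart (and state restore) was actually performed. */
    boolean restartTasks(Map<String, Long> recordedVersions) {
        Set<String> verticesToRestart = unmodifiedVertices(recordedVersions);
        if (verticesToRestart.isEmpty()) {
            // Debounce: a concurrent failover already handled these vertices,
            // so avoid restoring state again in a busy loop.
            return false;
        }
        // ... the real scheduler would restore state and redeploy here ...
        return true;
    }
}
```

As the review thread notes, the real guard must also consider side effects such as `OperatorCoordinator#resetToCheckpoint(...)` that previously ran even for an empty restart set.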
[GitHub] [flink] liuyongvs commented on pull request #21993: [FLINK-31166][table] Fix array_contains does not support null argumen…
liuyongvs commented on PR #21993: URL: https://github.com/apache/flink/pull/21993#issuecomment-1439574746 hi @snuyanzin after digging into the code, I have fixed it now.
[jira] [Closed] (FLINK-31079) Release Testing: Verify FLINK-29663 Further improvements of adaptive batch scheduler
[ https://issues.apache.org/jira/browse/FLINK-31079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lijie Wang closed FLINK-31079. -- Resolution: Done > Release Testing: Verify FLINK-29663 Further improvements of adaptive batch > scheduler > > > Key: FLINK-31079 > URL: https://issues.apache.org/jira/browse/FLINK-31079 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Reporter: Lijie Wang >Assignee: miamiaoxyz >Priority: Blocker > Fix For: 1.17.0 > > Attachments: image-2023-02-22-14-00-13-646.png > > > This task aims to verify FLINK-29663, which improves the adaptive batch > scheduler. > Before the change of FLINK-29663, the adaptive batch scheduler would distribute > subpartitions according to the number of subpartitions, making different > downstream subtasks consume roughly the same number of subpartitions. This > leads to imbalanced loads across downstream tasks when the > subpartitions contain different amounts of data. > To solve this problem, in FLINK-29663, we let the adaptive batch scheduler > distribute subpartitions according to the amount of data, so that different > downstream subtasks consume roughly the same amount of data. Note that > currently it only takes effect for All-To-All edges. > The documentation of the adaptive scheduler can be found > [here|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/elastic_scaling/#adaptive-batch-scheduler] > One can verify it by creating intended data skew on All-To-All edges.
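The difference between count-based and data-volume-based distribution can be illustrated with a toy splitter: close a contiguous range of subpartitions once it has accumulated roughly `total / numSubtasks` bytes. This is a deliberate simplification of what FLINK-29663 does, not the scheduler's actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of distributing subpartitions by data volume rather than by
// count: greedily close a range once it holds ~(total / numSubtasks) bytes.
class VolumeBasedSplitter {

    /** Splits subpartition sizes into up to numSubtasks contiguous ranges. */
    static List<List<Long>> split(long[] subpartitionBytes, int numSubtasks) {
        long total = 0;
        for (long b : subpartitionBytes) {
            total += b;
        }
        long target = Math.max(1, total / numSubtasks);

        List<List<Long>> ranges = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long b : subpartitionBytes) {
            current.add(b);
            currentBytes += b;
            // close the range once it reached the target, unless it is the last one
            if (currentBytes >= target && ranges.size() < numSubtasks - 1) {
                ranges.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
        }
        if (!current.isEmpty()) {
            ranges.add(current);
        }
        return ranges;
    }
}
```

With skewed sizes such as `[100, 1, 1, 1, 1]` and two subtasks, a count-based split would hand one subtask most of the data; the volume-based split gives the heavy subpartition its own range.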
[jira] [Comment Edited] (FLINK-31079) Release Testing: Verify FLINK-29663 Further improvements of adaptive batch scheduler
[ https://issues.apache.org/jira/browse/FLINK-31079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691969#comment-17691969 ] Lijie Wang edited comment on FLINK-31079 at 2/22/23 7:14 AM: - Thanks [~lsy]. Currently, the [{{execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task}}|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-ta] is cluster level, but I personally think it makes sense to make it job level, especially when using session cluster, I will evaluate it in the future version. was (Author: wanglijie95): Thanks [~lsy]. Currently, the [{{execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task}}|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-ta] is cluster level, but I personally think it makes sense to make it a job level, especially when using session cluster, I will evaluate it in the future version.
[jira] [Commented] (FLINK-31079) Release Testing: Verify FLINK-29663 Further improvements of adaptive batch scheduler
[ https://issues.apache.org/jira/browse/FLINK-31079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691969#comment-17691969 ] Lijie Wang commented on FLINK-31079: Thanks [~lsy]. Currently, the [{{execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task}}|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-ta] is cluster level, but I personally think it makes sense to make it a job level, especially when using session cluster, I will evaluate it in the future version.
[jira] [Comment Edited] (FLINK-31079) Release Testing: Verify FLINK-29663 Further improvements of adaptive batch scheduler
[ https://issues.apache.org/jira/browse/FLINK-31079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691969#comment-17691969 ] Lijie Wang edited comment on FLINK-31079 at 2/22/23 7:14 AM: - Thanks [~lsy]. Currently, the [{{execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task}}|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-ta] is cluster level, but I personally think it makes sense to make it job level, especially when using session cluster, I will evaluate it in the future. was (Author: wanglijie95): Thanks [~lsy]. Currently, the [{{execution.batch.adaptive.auto-parallelism.avg-data-volume-per-task}}|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#execution-batch-adaptive-auto-parallelism-avg-data-volume-per-ta] is cluster level, but I personally think it makes sense to make it job level, especially when using session cluster, I will evaluate it in the future version.
[jira] [Commented] (FLINK-31167) Verify that no exclusions were erroneously added to the japicmp plugin
[ https://issues.apache.org/jira/browse/FLINK-31167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691970#comment-17691970 ] Biao Liu commented on FLINK-31167: -- Hi [~mapohl], As you mentioned, the new method with context was introduced to replace the old one. Let's take {{finalizeGlobal}} as an example. If we do not provide a default implementation for {{finalizeGlobal(int)}} and someone wants to use the new method with context, he has to implement both {{finalizeGlobal(int)}} and {{finalizeGlobal(FinalizationContext)}} even if the former means nothing to him. That's not what we expect users to do. {{finalizeGlobal(int)}} works for the legacy code, and {{finalizeGlobal(FinalizationContext)}} works for the new implementation. Users only need to implement one of them. It's sort of like how SinkFunction is handled in https://issues.apache.org/jira/browse/FLINK-7552. I think it's not a breaking change; however, it cannot pass the japicmp check. > Verify that no exclusions were erroneously added to the japicmp plugin > -- > > Key: FLINK-31167 > URL: https://issues.apache.org/jira/browse/FLINK-31167 > Project: Flink > Issue Type: Sub-task >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > > Verify that no exclusions were erroneously added to the japicmp plugin that > break compatibility guarantees. Check the exclusions for the > japicmp-maven-plugin in the root pom (see > [apache/flink:pom.xml:2175ff|https://github.com/apache/flink/blob/3856c49af77601cf7943a5072d8c932279ce46b4/pom.xml#L2175] > for exclusions that: > * For minor releases: break source compatibility for {{@Public}} APIs > * For patch releases: break source/binary compatibility for > {{@Public}}/{{@PublicEvolving}} APIs > Any such exclusion must be properly justified, in advance.
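The default-method pattern described in the comment can be illustrated with a plain-Java sketch. `FinalizationContext` and `GlobalFinalizer` below are hand-rolled stand-ins, not Flink's actual interfaces:

```java
// Sketch of the compatibility pattern: both methods get a default
// implementation, the new one delegating to the legacy one, so an
// implementor overrides exactly one of them.
class FinalizeCompat {

    interface FinalizationContext {
        int getParallelism();
    }

    interface GlobalFinalizer {
        /** Legacy method; default no-op so new implementations may ignore it. */
        default void finalizeGlobal(int parallelism) {}

        /** New method; falls back to the legacy signature by default. */
        default void finalizeGlobal(FinalizationContext context) {
            finalizeGlobal(context.getParallelism());
        }
    }

    /** Legacy-style implementation: only overrides the old method. */
    static class LegacyImpl implements GlobalFinalizer {
        int seenParallelism = -1;

        @Override
        public void finalizeGlobal(int parallelism) {
            seenParallelism = parallelism;
        }
    }
}
```

When the framework calls the new context-based entry point on a legacy implementation, the default delegates to the old signature, so neither old nor new implementors must write both methods — which is exactly why adding a default is source-compatible even though japicmp flags it.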
[jira] [Updated] (FLINK-30129) Push projection through ChangelogNormalize
[ https://issues.apache.org/jira/browse/FLINK-30129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jark Wu updated FLINK-30129: Fix Version/s: 1.18.0 > Push projection through ChangelogNormalize > -- > > Key: FLINK-30129 > URL: https://issues.apache.org/jira/browse/FLINK-30129 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Planner >Reporter: Jark Wu >Priority: Major > Fix For: 1.18.0 > > > Currently, the ChangelogNormalize node is generated during the physical > optimization phase. That means the projection is not pushed through > ChangelogNormalize if the {{TableSource}} doesn't support > {{SupportsProjectionPushDown}}. We can implement such optimization to reduce > the state size (fewer columns in state value) and better throughput (only > changes on the selected columns will be emitted).
[GitHub] [flink] MartijnVisser commented on pull request #21985: [FLINK-31162][yarn] Use currUsr.getCredentials.getTokens instead of currUsr.getTokens
MartijnVisser commented on PR #21985: URL: https://github.com/apache/flink/pull/21985#issuecomment-1439560715 > The issue here is different and is only specific to `1.16` So this issue only occurs in 1.16 and not in 1.17 and later? Then we're good yes :)
[GitHub] [flink-ml] Fanoid commented on a diff in pull request #210: [FLINK-31010] Add Transformer and Estimator for GBTClassifier and GBTRegressor
Fanoid commented on code in PR #210: URL: https://github.com/apache/flink-ml/pull/210#discussion_r1113947913 ## flink-ml-lib/src/main/java/org/apache/flink/ml/common/gbt/GBTModelParams.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.ml.common.gbt; + +import org.apache.flink.ml.classification.gbtclassifier.GBTClassifierModel; +import org.apache.flink.ml.common.param.HasCategoricalCols; +import org.apache.flink.ml.common.param.HasFeaturesCol; +import org.apache.flink.ml.common.param.HasLabelCol; +import org.apache.flink.ml.common.param.HasPredictionCol; +import org.apache.flink.ml.param.Param; +import org.apache.flink.ml.param.StringArrayParam; +import org.apache.flink.ml.regression.gbtregressor.GBTRegressorModel; + +/** + * Params of {@link GBTClassifierModel} and {@link GBTRegressorModel}. + * + * If the input features come from 1 column of vector type, `featuresCol` should be used, and all + * features are treated as continuous features. Otherwise, `inputCols` should be used for multiple + * columns. Columns whose names specified in `categoricalCols` are treated as categorical features, + * while others are continuous features. 
+ * + * NOTE: `inputCols` and `featuresCol` are in conflict with each other, so they should not be set + * at the same time. In addition, `inputCols` has a higher precedence than `featuresCol`, that is, + * `featuresCol` is ignored when `inputCols` is not `null`. + * + * @param The class type of this instance. + */ +public interface GBTModelParams +extends HasFeaturesCol, HasLabelCol, HasCategoricalCols, HasPredictionCol { + +Param INPUT_COLS = new StringArrayParam("inputCols", "Input column names.", null); Review Comment: I'm OK to rename `HasInputCols` to `HasFeaturesCols`. The reason to have these two parameters is related to the handling of categorical columns. SparkML's implementation only supports a vector column as input, but it can store meta information in its schema [1], so the algorithm can distinguish between numerical and categorical values. In Flink, there is no way to store meta information in the Table schema. So when end users want to feed categorical values, our implementation can only accept the data as multiple columns, i.e., `HasFeaturesCols`. Otherwise, `HasFeaturesCol` is used. [1] https://github.com/apache/spark/blob/27ed89b7be5ebb91e4a0b106b1669a7867a6012d/mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala#L194
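The precedence rule spelled out in the Javadoc ("`featuresCol` is ignored when `inputCols` is not `null`") can be sketched as plain Java. The helper below is hypothetical and sits outside the actual Flink ML `Param` machinery:

```java
// Sketch of the documented precedence rule: `inputCols` wins when it is set.
// Plain strings stand in for the real Flink ML parameter objects.
class FeatureColumnResolver {

    /**
     * Returns the columns the algorithm should read: all of inputCols
     * (individual numerical/categorical features) when set, otherwise the
     * single vector-typed featuresCol.
     */
    static String[] resolve(String[] inputCols, String featuresCol) {
        if (inputCols != null) {
            return inputCols;
        }
        return new String[] {featuresCol};
    }
}
```

The split mirrors the rationale in the comment: a vector column cannot carry per-feature categorical metadata in a Flink Table schema, so categorical features must arrive as separate columns.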
[jira] [Commented] (FLINK-31166) array_contains element type error
[ https://issues.apache.org/jira/browse/FLINK-31166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692003#comment-17692003 ] Martijn Visser commented on FLINK-31166: [~jackylau] So this is specifically about the behaviour when the needle (given ARRAY_CONTAINS(haystack, needle) is the function) is null? > array_contains element type error > - > > Key: FLINK-31166 > URL: https://issues.apache.org/jira/browse/FLINK-31166 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.18.0 >Reporter: jackylau >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > Attachments: image-2023-02-21-18-37-45-202.png, > image-2023-02-21-18-41-19-385.png, image-2023-02-22-09-56-59-257.png > > > !image-2023-02-22-09-56-59-257.png! > > !image-2023-02-21-18-41-19-385.png!
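One possible null-tolerant semantics for the needle, as a self-contained model. This only illustrates the behaviour under discussion in the ticket, and is not necessarily what the Flink fix implements:

```java
import java.util.List;
import java.util.Objects;

// Illustrative semantics for ARRAY_CONTAINS(haystack, needle): a null array
// yields NULL (SQL three-valued logic), and a null needle matches a null
// element instead of raising a type error.
class ArrayContains {

    static Boolean arrayContains(List<?> haystack, Object needle) {
        if (haystack == null) {
            return null; // NULL array -> NULL result
        }
        for (Object element : haystack) {
            if (Objects.equals(element, needle)) {
                return true;
            }
        }
        return false;
    }
}
```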
[GitHub] [flink] MartijnVisser merged pull request #21964: [FLINK-30948][Formats/AWS] Remove GlueSchemaRegistry Avro and JSON fo…
MartijnVisser merged PR #21964: URL: https://github.com/apache/flink/pull/21964
[jira] [Updated] (FLINK-30948) Remove flink-avro-glue-schema-registry and flink-json-glue-schema-registry from Flink main repo
[ https://issues.apache.org/jira/browse/FLINK-30948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn Visser updated FLINK-30948: --- Fix Version/s: 1.17.0 > Remove flink-avro-glue-schema-registry and flink-json-glue-schema-registry > from Flink main repo > --- > > Key: FLINK-30948 > URL: https://issues.apache.org/jira/browse/FLINK-30948 > Project: Flink > Issue Type: Sub-task > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) > Environment: Remove flink-avro-glue-schema-registry and > flink-json-glue-schema-registry from Flink main repo, along with associated > end-to-end tests >Reporter: Hong Liang Teoh >Assignee: Hong Liang Teoh >Priority: Major > Labels: pull-request-available > Fix For: 1.17.0 > >
[jira] [Updated] (FLINK-31153) Create a release branch
[ https://issues.apache.org/jira/browse/FLINK-31153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-31153: -- Description: If you are doing a new minor release, you need to update the Flink version in the following repositories: * [apache/flink|https://github.com/apache/flink] * [apache/flink-docker|https://github.com/apache/flink-docker] * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks] Patch releases don't require these repositories to be touched. Simply checkout the already existing branch for that version: {code:java} $ cd ./tools $ git checkout release-$SHORT_RELEASE_VERSION {code} h4. Flink repository Create a branch for the new version that we want to release before updating the master branch to the next development version: {code:bash} $ cd ./tools $ releasing/create_snapshot_branch.sh $ git checkout master $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION releasing/update_branch_version.sh {code} In the {{master}} branch, add a new value (e.g. {{{}v1_16("1.16"){}}}) to [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java] as the last entry: {code:java} // ... v1_12("1.12"), v1_13("1.13"), v1_14("1.14"), v1_15("1.15"), v1_16("1.16"); {code} The newly created branch and updated {{master}} branch need to be pushed to the official repository. h4. Flink Docker Repository Afterwards fork off from {{dev-master}} a {{dev-x.y}} branch in the [apache/flink-docker|https://github.com/apache/flink-docker] repository. 
Make sure that [apache/flink-docker:.github/workflows/ci.yml|https://github.com/apache/flink-docker/blob/dev-master/.github/workflows/ci.yml] points to the correct snapshot version; for {{dev-x.y}} it should point to {{{}x.y-SNAPSHOT{}}}, while for {{dev-master}} it should point to the most recent snapshot version ({{$NEXT_SNAPSHOT_VERSION}}).
After pushing the new minor release branch, as the last step you should also update the documentation workflow to build the documentation for the new release branch. Check [Managing Documentation|https://cwiki.apache.org/confluence/display/FLINK/Managing+Documentation] for details on how to do that. You may also want to manually trigger a build to make the changes visible as soon as possible.
h3. Expectations (Minor Version only)
* Release branch has been created and pushed
* Cron job has been added on the release branch in ([apache-flink:./tools/azure-pipelines/build-apache-repo.yml|https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-apache-repo.yml])
* Originating branch has the version information updated to the new version
* New version is added to the [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java] enum.
* Make sure {{flink-docker}} has a {{dev-x.y}} branch and docker e2e tests run against this branch
* docs/config.toml has been updated appropriately.
* The {{dev-x.y}} branch of ({{{}$CURRENT_SNAPSHOT_VERSION{}}}) has been created in the Flink Benchmark repo.
* The {{flink.version}} property of the Flink Benchmark repo has been updated to the latest snapshot version.
was: If you are doing a new major release, you need to update Flink version in the following repositories: * [apache/flink|https://github.com/apache/flink] * [apache/flink-docker|https://github.com/apache/flink-docker] * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks] Minor releases don't require the these repositories to be touched. Simply checkout the already existing branch for that version: {code:java} $ cd ./tools $ git checkout release-$SHORT_RELEASE_VERSION {code} h4. Flink repository Create a branch for the new version that we want to release before updating the master branch to the next development version: {code:bash} $ cd ./tools $ releasing/create_snapshot_branch.sh $ git checkout master $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION releasing/update_branch_version.sh {code} In the {{master}} branch, add a new value (e.g. {{{}v1_16("1.16"){}}}) to [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java] as the last entry: {code:java} // ... v1_12("1.12"), v1_13("1.13"), v1_14("1.14"), v1_15("1.15"), v1_16("1.16"); {code} The newly created branch and updated {{master}} branch need to be pushed to the official repository. h4. Flink Docker Repository Afterwards fork off from {{dev-master}} a {{dev-x.y}} branch in the [apache/flink-docker|https://github.com/apache/flink-docker] repository. Make sure that
[jira] [Commented] (FLINK-31059) Release Testing: Verify FLINK-29717 Supports hive udaf such as sum/count by native implementation
[ https://issues.apache.org/jira/browse/FLINK-31059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691464#comment-17691464 ] miamiaoxyz commented on FLINK-31059: I first used the SQL client to test the feature, but `exec.hive.native-agg-function.enabled` did not work there, so I used IT cases to verify instead.
1. I used an IT case to toggle `exec.hive.native-agg-function.enabled` on and off and verify that the two results are the same; the testSql function checks whether the same SQL produces the same result with the option on and off. Via the IT case I verified that the plan uses HashAgg when `exec.hive.native-agg-function.enabled` is on, and SortAgg when it is off. !image-2023-02-21-15-45-48-226.png|width=549,height=234! All the IT cases below pass. !image-2023-02-21-15-46-13-966.png|width=501,height=371!
2. I verified that the data results are the same when combining sum/count/avg/min/max functions in a query with `exec.hive.native-agg-function.enabled` on and off, using the IT case below. Again, the plan uses HashAgg when the option is on and SortAgg when it is off. !image-2023-02-21-15-49-58-854.png|width=536,height=219!
3. `array` and `struct` do not support the max function. The count function does not store `array` or `struct` values in the aggregation; it uses bigint instead, so hash-agg is chosen. !image-2023-02-21-15-59-44-470.png|width=1016,height=189!
4. `first_value` and `last_value` are not implemented in Hive ([https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF)]), so I used `collect_set` to test instead. All plans use SortAgg and produce the same result, which meets the expectations.
```
--- turn on `table.exec.hive.native-agg-function.enabled`
== Abstract Syntax Tree ==
LogicalProject(x=[$0], _o__c1=[$1])
+- LogicalAggregate(group=[{0}], agg#0=[collect_set($1)])
   +- LogicalProject($f0=[$0], $f1=[$1])
      +- LogicalTableScan(table=[[test-catalog, default, foo]])

== Optimized Physical Plan ==
SortAggregate(isMerge=[true], groupBy=[x], select=[x, Final_collect_set($f1) AS $f1])
+- Sort(orderBy=[x ASC])
   +- Exchange(distribution=[hash[x]])
      +- LocalSortAggregate(groupBy=[x], select=[x, Partial_collect_set(y) AS $f1])
         +- Sort(orderBy=[x ASC])
            +- TableSourceScan(table=[[test-catalog, default, foo]], fields=[x, y])

== Optimized Execution Plan ==
SortAggregate(isMerge=[true], groupBy=[x], select=[x, Final_collect_set($f1) AS $f1])
+- Exchange(distribution=[forward])
   +- Sort(orderBy=[x ASC])
      +- Exchange(distribution=[hash[x]])
         +- LocalSortAggregate(groupBy=[x], select=[x, Partial_collect_set(y) AS $f1])
            +- Exchange(distribution=[forward])
               +- Sort(orderBy=[x ASC])
                  +- TableSourceScan(table=[[test-catalog, default, foo]], fields=[x, y])

--- turn off `table.exec.hive.native-agg-function.enabled`
== Abstract Syntax Tree ==
LogicalProject(x=[$0], _o__c1=[$1])
+- LogicalAggregate(group=[{0}], agg#0=[collect_set($1)])
   +- LogicalProject($f0=[$0], $f1=[$1])
      +- LogicalTableScan(table=[[test-catalog, default, foo]])

== Optimized Physical Plan ==
SortAggregate(isMerge=[true], groupBy=[x], select=[x, Final_collect_set($f1) AS $f1])
+- Sort(orderBy=[x ASC])
   +- Exchange(distribution=[hash[x]])
      +- LocalSortAggregate(groupBy=[x], select=[x, Partial_collect_set(y) AS $f1])
         +- Sort(orderBy=[x ASC])
            +- TableSourceScan(table=[[test-catalog, default, foo]], fields=[x, y])

== Optimized Execution Plan ==
SortAggregate(isMerge=[true], groupBy=[x], select=[x, Final_collect_set($f1) AS $f1])
+- Exchange(distribution=[forward])
   +- Sort(orderBy=[x ASC])
      +- Exchange(distribution=[hash[x]])
         +- LocalSortAggregate(groupBy=[x], select=[x, Partial_collect_set(y) AS $f1])
            +- Exchange(distribution=[forward])
               +- Sort(orderBy=[x ASC])
                  +- TableSourceScan(table=[[test-catalog, default, foo]], fields=[x, y])
```
!image-2023-02-21-16-31-58-361.png|width=620,height=261!
5. I disabled hash-agg to force sort-agg for all of the tests above; the result with hash-agg forced off is the same as the results with `exec.hive.native-agg-function.enabled` on and off, which meets the expectations. !image-2023-02-21-16-35-46-294.png|width=632,height=392!
Problems:
a. `exec.hive.native-agg-function.enabled` does not work in the SQL client; HashAgg is not chosen there. !https://intranetproxy.alipay.com/skylark/lark/0/2023/png/83756403/1676952029939-182fa078-3a07-4e45-bdbb-832f7f74c838.png|width=703,height=383,id=u4fc84338!
b. Enabling and disabling `table.exec.hive.native-agg-function.enabled` produce different results.
[GitHub] [flink] FangYongs commented on a diff in pull request #21901: [FLINK-30968][sql-client] Sql client supports dynamic config to open session
FangYongs commented on code in PR #21901: URL: https://github.com/apache/flink/pull/21901#discussion_r1112748189 ## flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/gateway/DefaultContextUtils.java: ## @@ -51,12 +52,17 @@ public static DefaultContext buildDefaultContext(CliOptions.EmbeddedCliOptions o } else { libDirs = Collections.emptyList(); } -return DefaultContext.load( -options.getPythonConfiguration(), discoverDependencies(jars, libDirs), true); +Configuration pythonConfiguration = options.getPythonConfiguration(); +pythonConfiguration.addAll( + ConfigurationUtils.createConfiguration(options.getSessionConfig())); +return DefaultContext.load(pythonConfiguration, discoverDependencies(jars, libDirs), true); Review Comment: LGTM, DONE -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
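The change reviewed above folds the `-D` session options supplied on the SQL client command line into the Python configuration before the default context is loaded, so later entries take precedence. A minimal sketch of that merge precedence in plain Java — `merge` and the `HashMap`-based config maps below are illustrative stand-ins, not Flink's actual `Configuration` API:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigMergeSketch {

    // Hypothetical stand-in for an addAll-style merge:
    // entries added later overwrite entries with the same key.
    static Map<String, String> merge(Map<String, String> base, Map<String, String> overrides) {
        Map<String, String> merged = new HashMap<>(base);
        merged.putAll(overrides);
        return merged;
    }

    public static void main(String[] args) {
        // Configuration derived from Python-specific options.
        Map<String, String> pythonConfig = new HashMap<>();
        pythonConfig.put("python.executable", "python3");
        pythonConfig.put("parallelism.default", "1");

        // Session config passed on the command line; should win on conflict.
        Map<String, String> sessionConfig = new HashMap<>();
        sessionConfig.put("parallelism.default", "4");

        Map<String, String> merged = merge(pythonConfig, sessionConfig);
        System.out.println(merged.get("parallelism.default")); // prints "4" - session value wins
        System.out.println(merged.get("python.executable"));   // prints "python3" - untouched keys survive
    }
}
```

The ordering matters: merging the session config *into* the python configuration (rather than the reverse) is what lets user-supplied session options override defaults.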
[GitHub] [flink] MartijnVisser merged pull request #21969: [FLINK-30948][Formats/AWS] Remove GlueSchemaRegistry Avro and JSON fo…
MartijnVisser merged PR #21969: URL: https://github.com/apache/flink/pull/21969 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (FLINK-31150) Cross team testing
[ https://issues.apache.org/jira/browse/FLINK-31150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl reassigned FLINK-31150: - Assignee: Qingsheng Ren > Cross team testing > -- > > Key: FLINK-31150 > URL: https://issues.apache.org/jira/browse/FLINK-31150 > Project: Flink > Issue Type: Sub-task >Reporter: Matthias Pohl >Assignee: Qingsheng Ren >Priority: Major > > For user facing features that go into the release we'd like to ensure they > can actually _be used_ by Flink users. To achieve this the release managers > ensure that an issue for cross team testing is created in the Apache Flink > Jira. This can and should be picked up by other community members to verify > the functionality and usability of the feature. > The issue should contain some entry points which enables other community > members to test it. It should not contain documentation on how to use the > feature as this should be part of the actual documentation. The cross team > tests are performed after the feature freeze. Documentation should be in > place before that. Those tests are manual tests, so do not confuse them with > automated tests. > To sum that up: > * User facing features should be tested by other contributors > * The scope is usability and sanity of the feature > * The feature needs to be already documented > * The contributor creates an issue containing some pointers on how to get > started (e.g. link to the documentation, suggested targets of verification) > * Other community members pick those issues up and provide feedback > * Cross team testing happens right after the feature freeze -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31165) Over Agg: The window rank function without order by error in top N query
P Rohan Kumar created FLINK-31165: - Summary: Over Agg: The window rank function without order by error in top N query Key: FLINK-31165 URL: https://issues.apache.org/jira/browse/FLINK-31165 Project: Flink Issue Type: Bug Components: Table SQL / API Affects Versions: 1.16.0 Reporter: P Rohan Kumar
{code:java}
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = StreamTableEnvironment.create(env)
val td = TableDescriptor.forConnector("datagen")
  .option("rows-per-second", "10")
  .option("number-of-rows", "10")
  .schema(Schema
    .newBuilder()
    .column("NAME", DataTypes.VARCHAR(2147483647))
    .column("ROLLNO", DataTypes.DECIMAL(5, 0))
    .column("DOB", DataTypes.DATE())
    .column("CLASS", DataTypes.DECIMAL(2, 0))
    .column("SUBJECT", DataTypes.VARCHAR(2147483647))
    .build())
  .build()
val table = tableEnv.from(td)
tableEnv.createTemporaryView("temp_table", table)
val newTable = tableEnv.sqlQuery("select temp_table.*,cast('2022-01-01' as date) SRC_NO from temp_table")
tableEnv.createTemporaryView("temp_table2", newTable)
val newTable2 = tableEnv.sqlQuery("select * from (select NAME,ROLLNO,row_number() over (partition by NAME ORDER BY SRC_NO) AS rownum from temp_table2 a) where rownum <= 1")
tableEnv.toChangelogStream(newTable2).print()
env.execute()
{code}
I am getting the below error if I run the above code. I have already provided an order by column. If I change the order by column to some other column, such as "SUBJECT", then the job runs fine.
{code:java}
Exception in thread "main" java.lang.RuntimeException: Error while applying rule FlinkLogicalOverAggregateConverter(in:NONE,out:LOGICAL), args [rel#245:LogicalWindow.NONE.any.None: 0.[NONE].[NONE](input=RelSubset#244,window#0=window(partition {0} rows between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()]))]
    at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:256)
    at org.apache.calcite.plan.volcano.IterativeRuleDriver.drive(IterativeRuleDriver.java:58)
    at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:510)
    at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:312)
    at org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:62)
    at org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram.$anonfun$optimize$1(FlinkChainedProgram.scala:59)
    at scala.collection.TraversableOnce$folder$1$.apply(TraversableOnce.scala:187)
    at scala.collection.TraversableOnce$folder$1$.apply(TraversableOnce.scala:185)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:189)
    at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:184)
    at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:108)
    at org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram.optimize(FlinkChainedProgram.scala:55)
    at org.apache.flink.table.planner.plan.optimize.StreamCommonSubGraphBasedOptimizer.optimizeTree(StreamCommonSubGraphBasedOptimizer.scala:176)
    at org.apache.flink.table.planner.plan.optimize.StreamCommonSubGraphBasedOptimizer.doOptimize(StreamCommonSubGraphBasedOptimizer.scala:83)
    at org.apache.flink.table.planner.plan.optimize.CommonSubGraphBasedOptimizer.optimize(CommonSubGraphBasedOptimizer.scala:87)
    at org.apache.flink.table.planner.delegation.PlannerBase.optimize(PlannerBase.scala:315)
    at org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:195)
    at org.apache.flink.table.api.bridge.internal.AbstractStreamTableEnvironmentImpl.toStreamInternal(AbstractStreamTableEnvironmentImpl.java:224)
    at org.apache.flink.table.api.bridge.internal.AbstractStreamTableEnvironmentImpl.toStreamInternal(AbstractStreamTableEnvironmentImpl.java:219)
    at org.apache.flink.table.api.bridge.scala.internal.StreamTableEnvironmentImpl.toChangelogStream(StreamTableEnvironmentImpl.scala:160)
    at org.example.OverAggregateBug$.main(OverAggregateBug.scala:39)
    at org.example.OverAggregateBug.main(OverAggregateBug.scala)
Caused by: org.apache.flink.table.api.ValidationException: Over Agg: The window rank function without order by. please re-check the over window statement.
    at
[jira] [Assigned] (FLINK-31153) Create a release branch
[ https://issues.apache.org/jira/browse/FLINK-31153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl reassigned FLINK-31153: - Assignee: Leonard Xu > Create a release branch > --- > > Key: FLINK-31153 > URL: https://issues.apache.org/jira/browse/FLINK-31153 > Project: Flink > Issue Type: Sub-task >Reporter: Matthias Pohl >Assignee: Leonard Xu >Priority: Major > > If you are doing a new minor release, you need to update Flink version in the > following repositories: > * [apache/flink|https://github.com/apache/flink] > * [apache/flink-docker|https://github.com/apache/flink-docker] > * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks] > Patch releases don't require the these repositories to be touched. Simply > checkout the already existing branch for that version: > {code:java} > $ cd ./tools > $ git checkout release-$SHORT_RELEASE_VERSION > {code} > h4. Flink repository > Create a branch for the new version that we want to release before updating > the master branch to the next development version: > {code:bash} > $ cd ./tools > $ releasing/create_snapshot_branch.sh > $ git checkout master > $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION > releasing/update_branch_version.sh > {code} > In the {{master}} branch, add a new value (e.g. {{{}v1_16("1.16"){}}}) to > [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java] > as the last entry: > {code:java} > // ... > v1_12("1.12"), > v1_13("1.13"), > v1_14("1.14"), > v1_15("1.15"), > v1_16("1.16"); > {code} > The newly created branch and updated {{master}} branch need to be pushed to > the official repository. > h4. Flink Docker Repository > Afterwards fork off from {{dev-master}} a {{dev-x.y}} branch in the > [apache/flink-docker|https://github.com/apache/flink-docker] repository. 
Make > sure that > [apache/flink-docker:.github/workflows/ci.yml|https://github.com/apache/flink-docker/blob/dev-master/.github/workflows/ci.yml] > points to the correct snapshot version; for {{dev-x.y}} it should point to > {{{}x.y-SNAPSHOT{}}}, while for {{dev-master}} it should point to the most > recent snapshot version (\{[$NEXT_SNAPSHOT_VERSION}}). > After pushing the new minor release branch, as the last step you should also > update the documentation workflow to also build the documentation for the new > release branch. Check [Managing > Documentation|https://cwiki.apache.org/confluence/display/FLINK/Managing+Documentation] > on details on how to do that. You may also want to manually trigger a build > to make the changes visible as soon as possible. > > > h3. Expectations (Minor Version only) > * Release branch has been created and pushed > * Cron job has been added on the release branch in > ([apache-flink:./tools/azure-pipelines/build-apache-repo.yml|https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-apache-repo.yml]) > * Originating branch has the version information updated to the new version > * New version is added to the > [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java] > enum. > * Make sure {{flink-docker}} has {{dev-x.y}} branch and docker e2e tests run > against this branch > * docs/config.toml has been updated appropriately. > * The {{dev-x.y}} branch of ({{{}$CURRENT_SNAPSHOT_VERSION{}}}) have been > created in the Flink Benchmark repo. > * The {{flink.version}} property of Flink Benchmark repo has been updated to > the latest snapshot version. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-31150) Cross team testing
[ https://issues.apache.org/jira/browse/FLINK-31150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691465#comment-17691465 ] Matthias Pohl commented on FLINK-31150: --- [~renqs] created FLINK-30926 as an umbrella ticket for the 1.17 release-testing efforts. > Cross team testing > -- > > Key: FLINK-31150 > URL: https://issues.apache.org/jira/browse/FLINK-31150 > Project: Flink > Issue Type: Sub-task >Reporter: Matthias Pohl >Assignee: Qingsheng Ren >Priority: Major > > For user facing features that go into the release we'd like to ensure they > can actually _be used_ by Flink users. To achieve this the release managers > ensure that an issue for cross team testing is created in the Apache Flink > Jira. This can and should be picked up by other community members to verify > the functionality and usability of the feature. > The issue should contain some entry points which enables other community > members to test it. It should not contain documentation on how to use the > feature as this should be part of the actual documentation. The cross team > tests are performed after the feature freeze. Documentation should be in > place before that. Those tests are manual tests, so do not confuse them with > automated tests. > To sum that up: > * User facing features should be tested by other contributors > * The scope is usability and sanity of the feature > * The feature needs to be already documented > * The contributor creates an issue containing some pointers on how to get > started (e.g. link to the documentation, suggested targets of verification) > * Other community members pick those issues up and provide feedback > * Cross team testing happens right after the feature freeze -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-30948) Remove flink-avro-glue-schema-registry and flink-json-glue-schema-registry from Flink main repo
[ https://issues.apache.org/jira/browse/FLINK-30948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691466#comment-17691466 ] Martijn Visser commented on FLINK-30948: Fixed in: master: 29f009b7e8c714cd5af0626e9725eb8538a4bd0f release-1.17: d58829335557dac6ce428df5a80d4244fccf4491 > Remove flink-avro-glue-schema-registry and flink-json-glue-schema-registry > from Flink main repo > --- > > Key: FLINK-30948 > URL: https://issues.apache.org/jira/browse/FLINK-30948 > Project: Flink > Issue Type: Sub-task > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) > Environment: Remove flink-avro-glue-schema-registry and > flink-json-glue-schema-registry from Flink main repo, along with associated > end-to-end tests >Reporter: Hong Liang Teoh >Assignee: Hong Liang Teoh >Priority: Major > Labels: pull-request-available > Fix For: 1.17.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-30948) Remove flink-avro-glue-schema-registry and flink-json-glue-schema-registry from Flink main repo
[ https://issues.apache.org/jira/browse/FLINK-30948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn Visser closed FLINK-30948. -- Resolution: Fixed > Remove flink-avro-glue-schema-registry and flink-json-glue-schema-registry > from Flink main repo > --- > > Key: FLINK-30948 > URL: https://issues.apache.org/jira/browse/FLINK-30948 > Project: Flink > Issue Type: Sub-task > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) > Environment: Remove flink-avro-glue-schema-registry and > flink-json-glue-schema-registry from Flink main repo, along with associated > end-to-end tests >Reporter: Hong Liang Teoh >Assignee: Hong Liang Teoh >Priority: Major > Labels: pull-request-available > Fix For: 1.17.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] dmvk commented on a diff in pull request #21981: [FLINK-21450][runtime] Support LocalRecovery by AdaptiveScheduler
dmvk commented on code in PR #21981: URL: https://github.com/apache/flink/pull/21981#discussion_r1112754334 ## flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/AdaptiveScheduler.java: ## @@ -794,7 +802,23 @@ public void goToWaitingForResources() { LOG, desiredResources, this.initialResourceAllocationTimeout, -this.resourceStabilizationTimeout)); +this.resourceStabilizationTimeout, +null)); +} + +@Override +public void goToWaitingForResources(ExecutionGraph executionGraph) { Review Comment: I'm wondering, do we need an extra `goToWaitingForResources` method? Can we do something along these lines instead?
```
final ExecutionGraph previousExecutionGraph =
    state.as(StateWithExecutionGraph.class)
        .map(StateWithExecutionGraph::getExecutionGraph)
        .orElse(null);
```
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
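The suggestion above avoids a dedicated overload by asking the current state whether it can be viewed as a more specific type and falling back to `null` otherwise. That `state.as(...).map(...).orElse(null)` pattern can be sketched in plain Java — the `State` hierarchy below is a simplified stand-in, not the actual AdaptiveScheduler types:

```java
import java.util.Optional;

public class StateCastSketch {

    // Simplified stand-in for the scheduler's state interface.
    interface State {
        // Returns this state viewed as the given subtype, if it is an instance of it.
        default <T> Optional<T> as(Class<T> clazz) {
            return clazz.isInstance(this) ? Optional.of(clazz.cast(this)) : Optional.empty();
        }
    }

    static class StateWithExecutionGraph implements State {
        String getExecutionGraph() {
            return "execution-graph";
        }
    }

    static class WaitingForResources implements State {}

    public static void main(String[] args) {
        State current = new StateWithExecutionGraph();
        // Mirrors: state.as(StateWithExecutionGraph.class)
        //              .map(StateWithExecutionGraph::getExecutionGraph).orElse(null)
        String previousGraph = current.as(StateWithExecutionGraph.class)
                .map(StateWithExecutionGraph::getExecutionGraph)
                .orElse(null);
        System.out.println(previousGraph); // prints "execution-graph"

        // A state without an execution graph simply yields the fallback.
        State other = new WaitingForResources();
        System.out.println(other.as(StateWithExecutionGraph.class).isPresent()); // prints "false"
    }
}
```

The appeal of the pattern is that callers never need to know which concrete state is active; states that don't carry an execution graph degrade gracefully to the `orElse` fallback.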
[jira] [Updated] (FLINK-31153) Create a release branch
[ https://issues.apache.org/jira/browse/FLINK-31153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-31153: -- Description: If you are doing a new minor release, you need to update the Flink version in the following repositories:
* [apache/flink|https://github.com/apache/flink]
* [apache/flink-docker|https://github.com/apache/flink-docker]
* [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks]

Patch releases don't require these repositories to be touched. Simply check out the already existing branch for that version:
{code:java}
$ cd ./tools
$ git checkout release-$SHORT_RELEASE_VERSION
{code}
h4. Flink repository
Create a branch for the new version that we want to release before updating the master branch to the next development version:
{code:bash}
$ cd ./tools
$ releasing/create_snapshot_branch.sh
$ git checkout master
$ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION releasing/update_branch_version.sh
{code}
In the {{master}} branch, add a new value (e.g. {{{}v1_16("1.16"){}}}) to [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java] as the last entry:
{code:java}
// ...
v1_12("1.12"),
v1_13("1.13"),
v1_14("1.14"),
v1_15("1.15"),
v1_16("1.16");
{code}
The newly created branch and updated {{master}} branch need to be pushed to the official repository.
h4. Flink Docker Repository
Afterwards fork off from {{dev-master}} a {{dev-x.y}} branch in the [apache/flink-docker|https://github.com/apache/flink-docker] repository.
Make sure that [apache/flink-docker:.github/workflows/ci.yml|https://github.com/apache/flink-docker/blob/dev-master/.github/workflows/ci.yml] points to the correct snapshot version; for {{dev-x.y}} it should point to {{{}x.y-SNAPSHOT{}}}, while for {{dev-master}} it should point to the most recent snapshot version ({{$NEXT_SNAPSHOT_VERSION}}).
After pushing the new minor release branch, as the last step you should also update the documentation workflow to build the documentation for the new release branch. Check [Managing Documentation|https://cwiki.apache.org/confluence/display/FLINK/Managing+Documentation] for details on how to do that. You may also want to manually trigger a build to make the changes visible as soon as possible.
h3. Expectations (Minor Version only)
* Release branch has been created and pushed
* Cron job has been added on the release branch in ([apache-flink:./tools/azure-pipelines/build-apache-repo.yml|https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-apache-repo.yml])
* Originating branch has the version information updated to the new version
* New version is added to the [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java] enum.
* Make sure [flink-docker|https://github.com/apache/flink-docker/] has a {{dev-x.y}} branch and docker e2e tests run against this branch
* docs/config.toml has been updated appropriately.
* The {{dev-x.y}} branch of ({{{}$CURRENT_SNAPSHOT_VERSION{}}}) has been created in the Flink Benchmark repo.
* The {{flink.version}} property of the Flink Benchmark repo has been updated to the latest snapshot version.
was: If you are doing a new minor release, you need to update Flink version in the following repositories: * [apache/flink|https://github.com/apache/flink] * [apache/flink-docker|https://github.com/apache/flink-docker] * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks] Patch releases don't require the these repositories to be touched. Simply checkout the already existing branch for that version: {code:java} $ cd ./tools $ git checkout release-$SHORT_RELEASE_VERSION {code} h4. Flink repository Create a branch for the new version that we want to release before updating the master branch to the next development version: {code:bash} $ cd ./tools $ releasing/create_snapshot_branch.sh $ git checkout master $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION releasing/update_branch_version.sh {code} In the {{master}} branch, add a new value (e.g. {{{}v1_16("1.16"){}}}) to [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java] as the last entry: {code:java} // ... v1_12("1.12"), v1_13("1.13"), v1_14("1.14"), v1_15("1.15"), v1_16("1.16"); {code} The newly created branch and updated {{master}} branch need to be pushed to the official repository. h4. Flink Docker Repository Afterwards fork off from {{dev-master}} a {{dev-x.y}} branch in the [apache/flink-docker|https://github.com/apache/flink-docker]
[GitHub] [flink-ml] Fanoid commented on a diff in pull request #213: [FLINK-31126] Move classes not depending on Flink runtime from flink-ml-core to flink-ml-servable-core
Fanoid commented on code in PR #213: URL: https://github.com/apache/flink-ml/pull/213#discussion_r1112756657 ## flink-ml-servable-core/pom.xml: ## @@ -0,0 +1,127 @@ + + +http://maven.apache.org/POM/4.0.0; + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + 4.0.0 + + +org.apache.flink +flink-ml-parent +2.2-SNAPSHOT + + + flink-ml-servable-core + Flink ML : Servable : Core + + + + + org.apache.flink Review Comment: @lindong28 and I had a discussion about `flink-core` in a previous draft PR [1]. We agreed this change can be put in a follow-up PR. [1] https://github.com/apache/flink-ml/pull/199#discussion_r1090122715 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (FLINK-31092) Hive ITCases fail with OutOfMemoryError
[ https://issues.apache.org/jira/browse/FLINK-31092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leonard Xu reassigned FLINK-31092: -- Assignee: luoyuxia > Hive ITCases fail with OutOfMemoryError > --- > > Key: FLINK-31092 > URL: https://issues.apache.org/jira/browse/FLINK-31092 > Project: Flink > Issue Type: Bug > Components: Connectors / Hive >Affects Versions: 1.17.0 >Reporter: Matthias Pohl >Assignee: luoyuxia >Priority: Critical > Labels: test-stability > Attachments: VisualVM-FLINK-31092.png > > > We're experiencing a OutOfMemoryError where the heap space reaches the upper > limit: > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46161=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23142 > {code} > Feb 15 05:05:14 [INFO] Running > org.apache.flink.table.catalog.hive.HiveCatalogITCase > Feb 15 05:05:17 [INFO] java.lang.OutOfMemoryError: Java heap space > Feb 15 05:05:17 [INFO] Dumping heap to java_pid9669.hprof ... > Feb 15 05:05:28 [INFO] Heap dump file created [1957090051 bytes in 11.718 > secs] > java.lang.OutOfMemoryError: Java heap space > at > org.apache.maven.surefire.booter.ForkedBooter.cancelPingScheduler(ForkedBooter.java:209) > at > org.apache.maven.surefire.booter.ForkedBooter.acknowledgedExit(ForkedBooter.java:419) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:186) > at > org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:548) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-30951) Release Testing: Verify FLINK-29635 Hive sink should support merge files in batch mode
[ https://issues.apache.org/jira/browse/FLINK-30951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shengkai Fang closed FLINK-30951. - Resolution: Fixed > Release Testing: Verify FLINK-29635 Hive sink should support merge files in > batch mode > -- > > Key: FLINK-30951 > URL: https://issues.apache.org/jira/browse/FLINK-30951 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Reporter: luoyuxia >Assignee: Shengkai Fang >Priority: Blocker > Fix For: 1.17.0 > > Attachments: screenshot-1.png > > > The issue aims to verify FLINK-29635. > Please verify in batch mode; the documentation is at > [https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#file-compaction]: > > 1: enable auto-compaction, write some data to a Hive table such that the > average size of the files is less than compaction.small-files.avg-size (16MB > by default), and verify that these files are merged. > 2: enable auto-compaction, set compaction.small-files.avg-size to a smaller > value, then write some data to a Hive table such that the average > size of the files is greater than compaction.small-files.avg-size, and verify > that these files are not merged. > 3. set sink.parallelism manually, check that the parallelism of the compact > operator is equal to sink.parallelism. > 4. set compaction.parallelism manually, check that the parallelism of the compact > operator is equal to compaction.parallelism. > 5. set compaction.file-size, check that the size of each merged target file is > about `compaction.file-size`. > > We should verify this by writing to non-partitioned tables, static partition > tables, and dynamic partition tables. 
> Example SQL for creating and writing a Hive table can be found in the > codebase: > [HiveTableCompactSinkITCase|https://github.com/apache/flink/blob/0915c9850d861165e283acc0f60545cd836f0567/flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/HiveTableCompactSinkITCase.java]. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
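The merge condition exercised by steps 1 and 2 above can be sketched in plain Java. This is a hedged, simplified model for illustration only: the class, the `shouldCompact` method, and its signature are assumptions, not Flink's actual compaction code; the 16 MB default is taken from the description above.

```java
import java.util.Arrays;

/** Simplified sketch of the small-files compaction decision described above. */
class CompactionDecision {

    /** Default of compaction.small-files.avg-size (16 MB), per the description above. */
    static final long DEFAULT_AVG_SIZE = 16L * 1024 * 1024;

    /** Files are merged when their average size is below the configured threshold. */
    static boolean shouldCompact(long[] fileSizesInBytes, long avgSizeThreshold) {
        if (fileSizesInBytes.length == 0) {
            return false; // nothing to merge
        }
        long total = Arrays.stream(fileSizesInBytes).sum();
        long avg = total / fileSizesInBytes.length;
        return avg < avgSizeThreshold;
    }

    public static void main(String[] args) {
        long[] small = {1L << 20, 2L << 20, 3L << 20}; // 1 MB, 2 MB, 3 MB files

        // Step 1: average size (2 MB) below the 16 MB default -> files should be merged.
        System.out.println(shouldCompact(small, DEFAULT_AVG_SIZE)); // true

        // Step 2: threshold lowered to 1 MB, average exceeds it -> files are left alone.
        System.out.println(shouldCompact(small, 1L << 20)); // false
    }
}
```

Steps 3 and 4 are about operator parallelism rather than the merge decision, so they are not modeled here.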
[GitHub] [flink-ml] zhipeng93 commented on a diff in pull request #213: [FLINK-31126] Move classes not depending on Flink runtime from flink-ml-core to flink-ml-servable-core
zhipeng93 commented on code in PR #213: URL: https://github.com/apache/flink-ml/pull/213#discussion_r1112772912

## flink-ml-servable-core/pom.xml:

## @@ -0,0 +1,127 @@

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-ml-parent</artifactId>
        <version>2.2-SNAPSHOT</version>
    </parent>

    <artifactId>flink-ml-servable-core</artifactId>
    <name>Flink ML : Servable : Core</name>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            ...
```

Review Comment: Thanks for the explanation. I agree it is a reasonable solution in the long term. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-31059) Release Testing: Verify FLINK-29717 Supports hive udaf such as sum/count by native implementation
[ https://issues.apache.org/jira/browse/FLINK-31059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691472#comment-17691472 ] dalongliu commented on FLINK-31059: --- [~miamiaoxyz] Thanks for your verification; as for the problem, I will look into it. > Release Testing: Verify FLINK-29717 Supports hive udaf such as sum/count by > native implementation > - > > Key: FLINK-31059 > URL: https://issues.apache.org/jira/browse/FLINK-31059 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Affects Versions: 1.17.0 >Reporter: dalongliu >Assignee: miamiaoxyz >Priority: Blocker > Fix For: 1.17.0 > > Attachments: image-2023-02-21-15-45-48-226.png, > image-2023-02-21-15-46-13-966.png, image-2023-02-21-15-47-54-043.png, > image-2023-02-21-15-49-58-854.png, image-2023-02-21-15-59-44-470.png, > image-2023-02-21-16-28-22-038.png, image-2023-02-21-16-29-42-983.png, > image-2023-02-21-16-31-58-361.png, image-2023-02-21-16-35-46-294.png > > > This task aims to verify > [FLINK-29717|https://issues.apache.org/jira/browse/FLINK-29717] which > improves the hive udaf performance. > As described in the documentation [PR|https://github.com/apache/flink/pull/21789], > please verify: > 1. Enabling the option `table.exec.hive.native-agg-function.enabled`, use the > sum/count/avg/min/max functions separately in the query to verify if the > hash-agg strategy is chosen via plan, and verify if the data results are the > same as when the option `table.exec.hive.native-agg-function.enabled` is > disabled. > 2. Enabling the option `table.exec.hive.native-agg-function.enabled`, combine > sum/count/avg/min/max functions in query, verify if the hash-agg strategy is > chosen via plan, and verify if the data results are the same as when option > `table.exec.hive.native-agg-function.enabled` is disabled. > 3. 
Enabling the option `table.exec.hive.native-agg-function.enabled`, count > or max array and other complex types in query, verify whether the > sort-agg strategy is chosen via plan, verify whether the data result is the > same as when option `table.exec.hive.native-agg-function.enabled` is disabled. > 4. Enabling the option `table.exec.hive.native-agg-function.enabled`, use the > sum/count and first_value/last_value functions in the query simultaneously, > verify that the sort-agg strategy is chosen via plan, verify that the data is > the same as when option `table.exec.hive.native-agg-function.enabled` is > disabled. > 5. Enabling the option `table.exec.hive.native-agg-function.enabled`, use the > sum/count/avg/min/max functions in the query and open sort-agg strategy > forcibly, verify that the data results are the same as when option > `table.exec.hive.native-agg-function.enabled` is disabled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
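The expected plan outcomes in the verification steps above can be summarized in a small sketch. This is a hypothetical model of the expectations being checked (the class, enum, and rule are assumptions for illustration, not the Flink planner's real decision logic): hash-agg is expected when every aggregate is one of the natively supported sum/count/avg/min/max functions over simple types, while complex types (e.g. arrays) or first_value/last_value push the plan to sort-agg.

```java
import java.util.Arrays;
import java.util.List;

/** Hedged sketch of the plan-strategy expectations in the steps above. */
class AggStrategySketch {

    enum Strategy { HASH_AGG, SORT_AGG }

    // Functions the steps treat as hash-agg friendly when native agg is enabled.
    static final List<String> HASH_FRIENDLY =
            Arrays.asList("sum", "count", "avg", "min", "max");

    /**
     * Steps 1-4: hash-agg is expected only when every function is hash-friendly
     * and no complex (e.g. array) types are aggregated; otherwise sort-agg.
     */
    static Strategy expectedStrategy(List<String> functions, boolean aggregatesComplexTypes) {
        boolean allHashFriendly =
                functions.stream().map(String::toLowerCase).allMatch(HASH_FRIENDLY::contains);
        return (allHashFriendly && !aggregatesComplexTypes)
                ? Strategy.HASH_AGG
                : Strategy.SORT_AGG;
    }

    public static void main(String[] args) {
        // Step 2: combined sum/count/avg -> hash-agg expected.
        System.out.println(expectedStrategy(Arrays.asList("sum", "count", "avg"), false));
        // Step 4: mixing in first_value -> sort-agg expected.
        System.out.println(expectedStrategy(Arrays.asList("sum", "first_value"), false));
        // Step 3: max over an array column -> sort-agg expected.
        System.out.println(expectedStrategy(Arrays.asList("max"), true));
    }
}
```

Step 5 (forcing sort-agg) only changes the strategy, not the results, so it is not modeled here.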
[jira] [Closed] (FLINK-31059) Release Testing: Verify FLINK-29717 Supports hive udaf such as sum/count by native implementation
[ https://issues.apache.org/jira/browse/FLINK-31059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dalongliu closed FLINK-31059. - Resolution: Fixed > Release Testing: Verify FLINK-29717 Supports hive udaf such as sum/count by > native implementation > - > > Key: FLINK-31059 > URL: https://issues.apache.org/jira/browse/FLINK-31059 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Affects Versions: 1.17.0 >Reporter: dalongliu >Assignee: miamiaoxyz >Priority: Blocker > Fix For: 1.17.0 > > Attachments: image-2023-02-21-15-45-48-226.png, > image-2023-02-21-15-46-13-966.png, image-2023-02-21-15-47-54-043.png, > image-2023-02-21-15-49-58-854.png, image-2023-02-21-15-59-44-470.png, > image-2023-02-21-16-28-22-038.png, image-2023-02-21-16-29-42-983.png, > image-2023-02-21-16-31-58-361.png, image-2023-02-21-16-35-46-294.png > > > This task aims to verify > [FLINK-29717|https://issues.apache.org/jira/browse/FLINK-29717] which > improves the hive udaf performance. > As described in the documentation [PR|https://github.com/apache/flink/pull/21789], > please verify: > 1. Enabling the option `table.exec.hive.native-agg-function.enabled`, use the > sum/count/avg/min/max functions separately in the query to verify if the > hash-agg strategy is chosen via plan, and verify if the data results are the > same as when the option `table.exec.hive.native-agg-function.enabled` is > disabled. > 2. Enabling the option `table.exec.hive.native-agg-function.enabled`, combine > sum/count/avg/min/max functions in query, verify if the hash-agg strategy is > chosen via plan, and verify if the data results are the same as when option > `table.exec.hive.native-agg-function.enabled` is disabled. > 3. 
Enabling the option `table.exec.hive.native-agg-function.enabled`, count > or max array and other complex types in query, verify whether the > sort-agg strategy is chosen via plan, verify whether the data result is the > same as when option `table.exec.hive.native-agg-function.enabled` is disabled. > 4. Enabling the option `table.exec.hive.native-agg-function.enabled`, use the > sum/count and first_value/last_value functions in the query simultaneously, > verify that the sort-agg strategy is chosen via plan, verify that the data is > the same as when option `table.exec.hive.native-agg-function.enabled` is > disabled. > 5. Enabling the option `table.exec.hive.native-agg-function.enabled`, use the > sum/count/avg/min/max functions in the query and open sort-agg strategy > forcibly, verify that the data results are the same as when option > `table.exec.hive.native-agg-function.enabled` is disabled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink-ml] zhipeng93 commented on pull request #213: [FLINK-31126] Move classes not depending on Flink runtime from flink-ml-core to flink-ml-servable-core
zhipeng93 commented on PR #213: URL: https://github.com/apache/flink-ml/pull/213#issuecomment-1438119732 Thanks for the PR. LGTM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-ml] zhipeng93 merged pull request #213: [FLINK-31126] Move classes not depending on Flink runtime from flink-ml-core to flink-ml-servable-core
zhipeng93 merged PR #213: URL: https://github.com/apache/flink-ml/pull/213 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-ml] lindong28 commented on pull request #213: [FLINK-31126] Move classes not depending on Flink runtime from flink-ml-core to flink-ml-servable-core
lindong28 commented on PR #213: URL: https://github.com/apache/flink-ml/pull/213#issuecomment-1438122258 Thank you both for the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-31153) Create a release branch
[ https://issues.apache.org/jira/browse/FLINK-31153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-31153: -- Description: If you are doing a new minor release, you need to update the Flink version in the following repositories: * [apache/flink|https://github.com/apache/flink] * [apache/flink-docker|https://github.com/apache/flink-docker] * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks] Patch releases don't require these repositories to be touched. Simply checkout the already existing branch for that version: {code:java} $ cd ./tools $ git checkout release-$SHORT_RELEASE_VERSION {code} h4. Flink repository Create a branch for the new version that we want to release before updating the master branch to the next development version: {code:bash} $ cd ./tools $ releasing/create_snapshot_branch.sh $ git checkout master $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION releasing/update_branch_version.sh {code} In the {{master}} branch, add a new value (e.g. {{{}v1_16("1.16"){}}}) to [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java] as the last entry: {code:java} // ... v1_12("1.12"), v1_13("1.13"), v1_14("1.14"), v1_15("1.15"), v1_16("1.16"); {code} The newly created branch and updated {{master}} branch need to be pushed to the official repository. h4. Flink Docker Repository Afterwards fork off from {{dev-master}} a {{dev-x.y}} branch in the [apache/flink-docker|https://github.com/apache/flink-docker] repository. 
Make sure that [apache/flink-docker:.github/workflows/ci.yml|https://github.com/apache/flink-docker/blob/dev-master/.github/workflows/ci.yml] points to the correct snapshot version; for {{dev-x.y}} it should point to {{{}x.y-SNAPSHOT{}}}, while for {{dev-master}} it should point to the most recent snapshot version ({{$NEXT_SNAPSHOT_VERSION}}). After pushing the new minor release branch, as the last step you should also update the documentation workflow so that it builds the documentation for the new release branch. Check [Managing Documentation|https://cwiki.apache.org/confluence/display/FLINK/Managing+Documentation] for details on how to do that. You may also want to manually trigger a build to make the changes visible as soon as possible. h3. Expectations (Minor Version only) * Release branch has been created and pushed * Cron job has been added on the release branch in ([apache-flink:./tools/azure-pipelines/build-apache-repo.yml|https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-apache-repo.yml]) * Changes on the new release branch are picked up by [Azure CI|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=1&_a=summary] * Originating branch has the version information updated to the new version * New version is added to the [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java] enum. 
* Make sure [flink-docker|https://github.com/apache/flink-docker/] has a {{dev-x.y}} branch and docker e2e tests run against this branch in the corresponding Apache Flink release branch (see [apache/flink:flink-end-to-end-tests/test-scripts/common_docker.sh:51|https://github.com/apache/flink/blob/master/flink-end-to-end-tests/test-scripts/common_docker.sh#L51]) * [apache-flink:docs/config.toml|https://github.com/apache/flink/blob/release-1.17/docs/config.toml] has been updated appropriately in the new Apache Flink release branch. * The {{dev-x.y}} branch of ({{{}$CURRENT_SNAPSHOT_VERSION{}}}) has been created in the Flink Benchmark repo. * The {{flink.version}} property (see [apache/flink-benchmarks:pom.xml|https://github.com/apache/flink-benchmarks/blob/master/pom.xml#L48]) of the Flink Benchmark repo has been updated to the latest snapshot version. was: If you are doing a new minor release, you need to update the Flink version in the following repositories: * [apache/flink|https://github.com/apache/flink] * [apache/flink-docker|https://github.com/apache/flink-docker] * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks] Patch releases don't require these repositories to be touched. Simply checkout the already existing branch for that version: {code:java} $ cd ./tools $ git checkout release-$SHORT_RELEASE_VERSION {code} h4. Flink repository Create a branch for the new version that we want to release before updating the master branch to the next development version: {code:bash} $ cd ./tools $ releasing/create_snapshot_branch.sh $ git checkout master $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION releasing/update_branch_version.sh {code} In the {{master}} branch, add a new
[jira] [Updated] (FLINK-31147) Create a new version in JIRA
[ https://issues.apache.org/jira/browse/FLINK-31147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-31147: -- Description: When contributors resolve an issue in JIRA, they are tagging it with a release that will contain their changes. With the release currently underway, new issues should be resolved against a subsequent future release. Therefore, you should create a release item for this subsequent release, as follows: # In JIRA, navigate to the [Flink > Administration > Versions|https://issues.apache.org/jira/plugins/servlet/project-config/FLINK/versions]. # Add a new release: choose the next minor version number compared to the one currently underway, select today’s date as the Start Date, and choose Add. (Note: Only PMC members have access to the project administration. If you do not have access, ask on the mailing list for assistance.) h3. Expectations * The new version should be listed in the dropdown menu of {{fixVersion}} or {{affectedVersion}} under "unreleased versions" when creating a new Jira issue. was: When contributors resolve an issue in JIRA, they are tagging it with a release that will contain their changes. With the release currently underway, new issues should be resolved against a subsequent future release. Therefore, you should create a release item for this subsequent release, as follows: # In JIRA, navigate to the [Flink > Administration > Versions|https://issues.apache.org/jira/plugins/servlet/project-config/FLINK/versions]. # Add a new release: choose the next minor version number compared to the one currently underway, select today’s date as the Start Date, and choose Add. (Note: Only PMC members have access to the project administration. If you do not have access, ask on the mailing list for assistance.) 
> Create a new version in JIRA > > > Key: FLINK-31147 > URL: https://issues.apache.org/jira/browse/FLINK-31147 > Project: Flink > Issue Type: Sub-task >Reporter: Matthias Pohl >Priority: Major > > When contributors resolve an issue in JIRA, they are tagging it with a > release that will contain their changes. With the release currently underway, > new issues should be resolved against a subsequent future release. Therefore, > you should create a release item for this subsequent release, as follows: > # In JIRA, navigate to the [Flink > Administration > > Versions|https://issues.apache.org/jira/plugins/servlet/project-config/FLINK/versions]. > # Add a new release: choose the next minor version number compared to the > one currently underway, select today’s date as the Start Date, and choose Add. > (Note: Only PMC members have access to the project administration. If you do > not have access, ask on the mailing list for assistance.) > > > h3. Expectations > * The new version should be listed in the dropdown menu of {{fixVersion}} or > {{affectedVersion}} under "unreleased versions" when creating a new Jira > issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-31149) Review and update documentation
[ https://issues.apache.org/jira/browse/FLINK-31149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-31149: -- Description: There are a few pages in the documentation that need to be reviewed and updated for each release. * Ensure that there exists a release notes page for each non-bugfix release (e.g., 1.5.0) in {{{}./docs/release-notes/{}}}, that it is up-to-date, and linked from the start page of the documentation. * Upgrading Applications and Flink Versions: [https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html] * ... h3. Expectations * Update upgrade compatibility table ([apache-flink:./docs/content/docs/ops/upgrading.md|https://github.com/apache/flink/blob/master/docs/content/docs/ops/upgrading.md#compatibility-table] and [apache-flink:./docs/content.zh/docs/ops/upgrading.md|https://github.com/apache/flink/blob/master/docs/content.zh/docs/ops/upgrading.md#compatibility-table]). * Update [Release Overview in Confluence|https://cwiki.apache.org/confluence/display/FLINK/Release+Management+and+Feature+Plan] * (minor only) The documentation for the new major release is visible under [https://nightlies.apache.org/flink/flink-docs-release-$SHORT_RELEASE_VERSION] (after at least one [doc build|https://github.com/apache/flink/actions/workflows/docs.yml] succeeded). * (minor only) The documentation for the new major release does not contain "-SNAPSHOT" in its version title, and all links refer to the corresponding version docs instead of {{{}master{}}}. was: There are a few pages in the documentation that need to be reviewed and updated for each release. * Ensure that there exists a release notes page for each non-bugfix release (e.g., 1.5.0) in {{{}./docs/release-notes/{}}}, that it is up-to-date, and linked from the start page of the documentation. * Upgrading Applications and Flink Versions: [https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html] * ... h3. 
Expectations * Update upgrade compatibility table ([apache-flink:./docs/content/docs/ops/upgrading.md|https://github.com/apache/flink/blob/master/docs/content/docs/ops/upgrading.md#compatibility-table] and [apache-flink:./docs/content.zh/docs/ops/upgrading.md|https://github.com/apache/flink/blob/master/docs/content.zh/docs/ops/upgrading.md#compatibility-table]). * Update [Release Overview in Confluence|https://cwiki.apache.org/confluence/display/FLINK/Release+Management+and+Feature+Plan] * (major only) The documentation for the new major release is visible under https://nightlies.apache.org/flink/flink-docs-release-$SHORT_RELEASE_VERSION (after at least one [doc build|https://github.com/apache/flink/actions/workflows/docs.yml] succeeded). * (major only) The documentation for the new major release does not contain "-SNAPSHOT" in its version title, and all links refer to the corresponding version docs instead of {{master}}. > Review and update documentation > --- > > Key: FLINK-31149 > URL: https://issues.apache.org/jira/browse/FLINK-31149 > Project: Flink > Issue Type: Sub-task >Reporter: Matthias Pohl >Priority: Major > > There are a few pages in the documentation that need to be reviewed and > updated for each release. > * Ensure that there exists a release notes page for each non-bugfix release > (e.g., 1.5.0) in {{{}./docs/release-notes/{}}}, that it is up-to-date, and > linked from the start page of the documentation. > * Upgrading Applications and Flink Versions: > [https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html] > * ... > > > h3. Expectations > * Update upgrade compatibility table > ([apache-flink:./docs/content/docs/ops/upgrading.md|https://github.com/apache/flink/blob/master/docs/content/docs/ops/upgrading.md#compatibility-table] > and > [apache-flink:./docs/content.zh/docs/ops/upgrading.md|https://github.com/apache/flink/blob/master/docs/content.zh/docs/ops/upgrading.md#compatibility-table]). 
> * Update [Release Overview in > Confluence|https://cwiki.apache.org/confluence/display/FLINK/Release+Management+and+Feature+Plan] > * (minor only) The documentation for the new major release is visible under > [https://nightlies.apache.org/flink/flink-docs-release-$SHORT_RELEASE_VERSION] > (after at least one [doc > build|https://github.com/apache/flink/actions/workflows/docs.yml] succeeded). > * (minor only) The documentation for the new major release does not contain > "-SNAPSHOT" in its version title, and all links refer to the corresponding > version docs instead of {{{}master{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] reswqa commented on a diff in pull request #21977: [FLINK-31132] Compact without setting parallelism does not follow the configured sink parallelism for HiveTableSink
reswqa commented on code in PR #21977: URL: https://github.com/apache/flink/pull/21977#discussion_r1112641083 ## flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/HiveTableCompactSinkITCase.java: ## @@ -42,17 +44,18 @@ import static org.assertj.core.api.Assertions.assertThat; /** IT case for Hive table compaction in batch mode. */ -public class HiveTableCompactSinkITCase { +@ExtendWith(TestLoggerExtension.class) +class HiveTableCompactSinkITCase { @RegisterExtension -private static final MiniClusterExtension MINI_CLUSTER = new MiniClusterExtension(); +public static final MiniClusterExtension MINI_CLUSTER = new MiniClusterExtension(); Review Comment: TBH, the reason I changed it was only to follow the usage in the `MiniClusterExtension` class [javadoc](https://github.com/apache/flink/blob/bacdc326b58749924acbd8921d63eda06663a225/flink-test-utils-parent/flink-test-utils/src/main/java/org/apache/flink/test/junit5/MiniClusterExtension.java#L70). At the same time, in the `architecture check` of IT cases, we can see a similar [requirement](https://github.com/apache/flink/blob/bacdc326b58749924acbd8921d63eda06663a225/flink-architecture-tests/flink-architecture-tests-test/src/main/java/org/apache/flink/architecture/rules/ITCaseRules.java#L67). Unfortunately, when I looked at the actual check logic, it didn't seem to require the field to be public, which really confused me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-ml] lindong28 commented on pull request #214: [FLINK-31127] Add public API classes for FLIP-289
lindong28 commented on PR #214: URL: https://github.com/apache/flink-ml/pull/214#issuecomment-1438134181 @jiangxin369 Can you help review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] FangYongs commented on pull request #21956: [FLINK-31113] Support AND filter push down in orc
FangYongs commented on PR #21956: URL: https://github.com/apache/flink/pull/21956#issuecomment-1438133167 Thanks @libenchao , I have updated -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-31152) Setup for the executing release manager
[ https://issues.apache.org/jira/browse/FLINK-31152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-31152: -- Description: h4. GPG Key You need to have a GPG key to sign the release artifacts. Please be aware of the ASF-wide [release signing guidelines|https://www.apache.org/dev/release-signing.html]. If you don’t have a GPG key associated with your Apache account, please create one according to the guidelines. Determine your Apache GPG Key and Key ID, as follows: {code:java} $ gpg --list-keys {code} This will list your GPG keys. One of these should reflect your Apache account, for example: {code:java} -- pub 2048R/845E6689 2016-02-23 uid Nomen Nescio sub 2048R/BA4D50BE 2016-02-23 {code} In the example above, the key ID is the 8-digit hex string in the {{pub}} line: {{{}845E6689{}}}. Now, add your Apache GPG key to the Flink’s {{KEYS}} file in the [Apache Flink release KEYS file|https://dist.apache.org/repos/dist/release/flink/KEYS] repository at [dist.apache.org|http://dist.apache.org/]. Follow the instructions listed at the top of these files. (Note: Only PMC members have write access to the release repository. If you end up getting 403 errors ask on the mailing list for assistance.) Configure {{git}} to use this key when signing code by giving it your key ID, as follows: {code:java} $ git config --global user.signingkey 845E6689 {code} You may drop the {{--global}} option if you’d prefer to use this key for the current repository only. You may wish to start {{gpg-agent}} to unlock your GPG key only once using your passphrase. Otherwise, you may need to enter this passphrase hundreds of times. The setup for {{gpg-agent}} varies based on operating system, but may be something like this: {code:bash} $ eval $(gpg-agent --daemon --no-grab --write-env-file $HOME/.gpg-agent-info) $ export GPG_TTY=$(tty) $ export GPG_AGENT_INFO {code} h4. 
Access to Apache Nexus repository Configure access to the [Apache Nexus repository|https://repository.apache.org/], which enables final deployment of releases to the Maven Central Repository. # You log in with your Apache account. # Confirm you have appropriate access by finding {{org.apache.flink}} under {{{}Staging Profiles{}}}. # Navigate to your {{Profile}} (top right drop-down menu of the page). # Choose {{User Token}} from the dropdown, then click {{{}Access User Token{}}}. Copy a snippet of the Maven XML configuration block. # Insert this snippet twice into your global Maven {{settings.xml}} file, typically {{{}${HOME}/.m2/settings.xml{}}}. The end result should look like this, where {{TOKEN_NAME}} and {{TOKEN_PASSWORD}} are your secret tokens:

{code:xml}
<settings>
  <servers>
    <server>
      <id>apache.releases.https</id>
      <username>TOKEN_NAME</username>
      <password>TOKEN_PASSWORD</password>
    </server>
    <server>
      <id>apache.snapshots.https</id>
      <username>TOKEN_NAME</username>
      <password>TOKEN_PASSWORD</password>
    </server>
  </servers>
</settings>
{code}

h4. Website development setup Get ready for updating the Flink website by following the [website development instructions|https://flink.apache.org/contributing/improve-website.html]. h4. GNU Tar Setup for Mac (Skip this step if you are not using a Mac) The default tar application on Mac does not support GNU archive format and defaults to Pax. This bloats the archive with unnecessary metadata that can result in additional files when decompressing (see [1.15.2-RC2 vote thread|https://lists.apache.org/thread/mzbgsb7y9vdp9bs00gsgscsjv2ygy58q]). Install gnu-tar and create a symbolic link to use in preference of the default tar program. {code:bash} $ brew install gnu-tar $ ln -s /usr/local/bin/gtar /usr/local/bin/tar $ which tar {code} h3. Expectations * Release Manager’s GPG key is published to [dist.apache.org|http://dist.apache.org/] * Release Manager’s GPG key is configured in git configuration * Release Manager's GPG key is configured as the default gpg key. 
* Release Manager has {{org.apache.flink}} listed under Staging Profiles in Nexus * Release Manager’s Nexus User Token is configured in settings.xml was: h4. GPG Key You need to have a GPG key to sign the release artifacts. Please be aware of the ASF-wide [release signing guidelines|https://www.apache.org/dev/release-signing.html]. If you don’t have a GPG key associated with your Apache account, please create one according to the guidelines. Determine your Apache GPG Key and Key ID, as follows: {code:java} $ gpg --list-keys {code} This will list your GPG keys. One of these should reflect your Apache account, for example: {code:java} -- pub 2048R/845E6689 2016-02-23 uid Nomen Nescio sub 2048R/BA4D50BE 2016-02-23 {code} In the example above, the key ID is the 8-digit hex string in the {{pub}} line: {{{}845E6689{}}}. Now, add your Apache GPG key to the Flink’s {{KEYS}} file in the [Apache
[jira] [Updated] (FLINK-31144) Slow scheduling on large-scale batch jobs
[ https://issues.apache.org/jira/browse/FLINK-31144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Tournay updated FLINK-31144: --- Attachment: image-2023-02-21-10-29-49-388.png > Slow scheduling on large-scale batch jobs > -- > > Key: FLINK-31144 > URL: https://issues.apache.org/jira/browse/FLINK-31144 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: Julien Tournay >Priority: Major > Attachments: flink-1.17-snapshot-1676473798013.nps, > image-2023-02-21-10-29-49-388.png > > > When executing a complex job graph at high parallelism, > `DefaultPreferredLocationsRetriever.getPreferredLocationsBasedOnInputs` can > get slow and cause long pauses where the JobManager becomes unresponsive and > all the taskmanagers just wait. I've attached a VisualVM snapshot to > illustrate the problem.[^flink-1.17-snapshot-1676473798013.nps] > At Spotify we have complex jobs where this issue can cause batch "pauses" of > 40+ minutes and make the overall execution 30% slower or more. > More importantly, this prevents us from running said jobs on larger clusters, as > adding resources to the cluster worsens the issue. > We have successfully tested a modified Flink version where > `DefaultPreferredLocationsRetriever.getPreferredLocationsBasedOnInputs` was > completely commented out and simply returns an empty collection, and confirmed it > solves the issue. > In the same spirit as a recent change > ([https://github.com/apache/flink/blob/43f419d0eccba86ecc8040fa6f521148f1e358ff/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultPreferredLocationsRetriever.java#L98-L102]) > there could be a mechanism in place to detect when Flink runs into this > specific issue and just skip the call to `getInputLocationFutures` > [https://github.com/apache/flink/blob/43f419d0eccba86ecc8040fa6f521148f1e358ff/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultPreferredLocationsRetriever.java#L105-L108]. 
> I'm not familiar enough with the internals of Flink to propose a more > advanced fix, however it seems like a configurable threshold on the number of > consumer vertices above which the preferred location is not computed would > do. If this solution is good enough, I'd be happy to submit a PR. -- This message was sent by Atlassian Jira (v8.20.10#820010)
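The threshold idea proposed in the issue can be sketched in a few lines. This is a minimal, self-contained illustration only: the class name, method name, signature, and the threshold value are all hypothetical, not Flink's actual `DefaultPreferredLocationsRetriever` API.

```java
// Sketch of the proposed guard: skip input-based location preferences when
// a vertex has too many consumer vertices. Names and the cutoff value are
// illustrative assumptions, not the real Flink scheduler code.
import java.util.Collection;
import java.util.Collections;

public class PreferredLocationsSketch {

    // Hypothetical configurable threshold.
    static final int MAX_CONSUMERS_FOR_LOCATION_PREFERENCE = 100;

    public static Collection<String> preferredLocations(
            int numConsumerVertices, Collection<String> inputLocations) {
        if (numConsumerVertices > MAX_CONSUMERS_FOR_LOCATION_PREFERENCE) {
            // Large fan-in: computing input-based preferences would touch a
            // huge number of upstream executions, so fall back to "no
            // preference" instead of paying the quadratic-ish cost.
            return Collections.emptyList();
        }
        return inputLocations;
    }
}
```

The existing skip in `DefaultPreferredLocationsRetriever` (linked above) follows the same shape: a cheap size check that short-circuits an expensive location computation.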
[GitHub] [flink-ml] zhipeng93 commented on a diff in pull request #212: [FLINK-31125] Flink ML benchmark framework should minimize the source operator overhead
zhipeng93 commented on code in PR #212: URL: https://github.com/apache/flink-ml/pull/212#discussion_r1112795550 ## flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/datagenerator/common/DenseVectorArrayGenerator.java: ## @@ -42,7 +42,7 @@ protected RowGenerator[] getRowGenerators() { return new RowGenerator[] { new RowGenerator(getNumValues(), getSeed()) { @Override -protected Row nextRow() { +protected Row getRow() { Review Comment: Reusing one row to simulate the input may make the benchmark result unstable, since processing two different rows may take different amounts of time. The two optimizations introduced in this PR give us a 12% speedup. Is there a breakdown analysis of these two optimizations? If the data generation cost is indeed high, can we sample a fixed number of rows and reuse them later? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
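The reviewer's suggestion, "sample a fixed number of rows and reuse them", can be sketched as follows. This is an illustrative toy, assuming hypothetical class and method names; it is not the actual flink-ml benchmark `RowGenerator` API, and it uses plain `double[]` instead of Flink's `Row`/`DenseVector` types.

```java
// Sketch: pre-generate a fixed pool of distinct rows once, then cycle
// through it. Rows still vary record to record (unlike reusing one row),
// but the per-record cost is just an index bump instead of fresh
// random generation.
import java.util.Random;

public class PooledRowGenerator {

    private final double[][] pool;
    private int next;

    public PooledRowGenerator(int poolSize, int vectorDim, long seed) {
        Random rnd = new Random(seed);
        pool = new double[poolSize][vectorDim];
        for (double[] row : pool) {
            for (int i = 0; i < vectorDim; i++) {
                row[i] = rnd.nextDouble();
            }
        }
    }

    // Cheap per-record call: returns a pre-built row, wrapping around
    // when the pool is exhausted.
    public double[] nextRow() {
        double[] row = pool[next];
        next = (next + 1) % pool.length;
        return row;
    }
}
```

The pool size trades memory for variability: a pool of a few thousand rows keeps generation cost negligible while avoiding the single-reused-row instability the reviewer describes.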
[jira] [Commented] (FLINK-31144) Slow scheduling on large-scale batch jobs
[ https://issues.apache.org/jira/browse/FLINK-31144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691474#comment-17691474 ] Julien Tournay commented on FLINK-31144: Hi [~huwh], Thank you for the quick reply :) {quote}when the topology is complex. {quote} Indeed. For the issue to be noticeable, the jobgraph has to be fairly complex, feature all-to-all distributions, and execute with a high parallelism. {quote}1. Is the slow scheduling or the scheduled result of location preferred make your job slow? {quote} Yes, it very much does. We have a job that takes ~2h30 (after many, many tweaks to get the best possible performance). It's impossible to get it to run in less time because adding more taskmanagers makes the scheduling slow, and overall the execution gets longer. Removing preferred locations makes it possible to run it in less than 2h (we're aiming at ~1h45min). {quote}2. "we have complex jobs where this issue can cause batch "pause" of 40+ minutes" What does "pause" meaning? Is the getPreferredLocationsBasedOnInputs take more than 40+ minutes? {quote} By "pause" I mean that at the beginning of the execution, the taskmanagers will wait for the JobManager for ~40min and then start processing. With Flink 1.17 and no preferred locations, the "pause" is down to ~5min. I should also mention the JM is very unresponsive and the web console struggles to show anything. {quote}Could you provide the topology of the complex job. {quote} I can, but I'm not sure what format to use. The graph is quite big and a simple screenshot is unreadable: !image-2023-02-21-10-29-49-388.png! I can maybe share the archived execution JSON file (~500Mb) if that's helpful? 
> Slow scheduling on large-scale batch jobs > -- > > Key: FLINK-31144 > URL: https://issues.apache.org/jira/browse/FLINK-31144 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: Julien Tournay >Priority: Major > Attachments: flink-1.17-snapshot-1676473798013.nps, > image-2023-02-21-10-29-49-388.png > > > When executing a complex job graph at high parallelism, > `DefaultPreferredLocationsRetriever.getPreferredLocationsBasedOnInputs` can > get slow and cause long pauses where the JobManager becomes unresponsive and > all the taskmanagers just wait. I've attached a VisualVM snapshot to > illustrate the problem. [^flink-1.17-snapshot-1676473798013.nps] > At Spotify we have complex jobs where this issue can cause batch "pauses" of > 40+ minutes and make the overall execution 30% slower or more. > More importantly, this prevents us from running said jobs on larger clusters, as > adding resources to the cluster worsens the issue. > We have successfully tested a modified Flink version where > `DefaultPreferredLocationsRetriever.getPreferredLocationsBasedOnInputs` was > completely commented out and simply returns an empty collection, and confirmed that it > solves the issue. > In the same spirit as a recent change > ([https://github.com/apache/flink/blob/43f419d0eccba86ecc8040fa6f521148f1e358ff/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultPreferredLocationsRetriever.java#L98-L102)] > there could be a mechanism in place to detect when Flink runs into this > specific issue and just skip the call to `getInputLocationFutures` > [https://github.com/apache/flink/blob/43f419d0eccba86ecc8040fa6f521148f1e358ff/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultPreferredLocationsRetriever.java#L105-L108.] 
> I'm not familiar enough with the internals of Flink to propose a more > advanced fix; however, it seems like a configurable threshold on the number of > consumer vertices, above which the preferred location is not computed, would > do. If this solution is good enough, I'd be happy to submit a PR. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-31153) Create a release branch
[ https://issues.apache.org/jira/browse/FLINK-31153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-31153: -- Description: If you are doing a new minor release, you need to update the Flink version in the following repositories: * [apache/flink|https://github.com/apache/flink] * [apache/flink-docker|https://github.com/apache/flink-docker] * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks] Patch releases don't require these repositories to be touched. Simply checkout the already existing branch for that version: {code:java} $ cd ./tools $ git checkout release-$SHORT_RELEASE_VERSION {code} h4. Flink repository Create a branch for the new version that we want to release before updating the master branch to the next development version: {code:bash} $ cd ./tools $ releasing/create_snapshot_branch.sh $ git checkout master $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$NEXT_SNAPSHOT_VERSION releasing/update_branch_version.sh {code} In the {{master}} branch, add a new value (e.g. {{{}v1_16("1.16"){}}}) to [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java] as the last entry: {code:java} // ... v1_12("1.12"), v1_13("1.13"), v1_14("1.14"), v1_15("1.15"), v1_16("1.16"); {code} The newly created branch and updated {{master}} branch need to be pushed to the official repository. h4. Flink Docker Repository Afterwards, fork off from {{dev-master}} a {{dev-x.y}} branch in the [apache/flink-docker|https://github.com/apache/flink-docker] repository. 
Make sure that [apache/flink-docker:.github/workflows/ci.yml|https://github.com/apache/flink-docker/blob/dev-master/.github/workflows/ci.yml] points to the correct snapshot version; for {{dev-x.y}} it should point to {{{}x.y-SNAPSHOT{}}}, while for {{dev-master}} it should point to the most recent snapshot version ({{$NEXT_SNAPSHOT_VERSION}}). After pushing the new minor release branch, as the last step you should also update the documentation workflow to build the documentation for the new release branch. Check [Managing Documentation|https://cwiki.apache.org/confluence/display/FLINK/Managing+Documentation] for details on how to do that. You may also want to manually trigger a build to make the changes visible as soon as possible. h3. Expectations (Minor Version only) * Release branch has been created and pushed * Cron job has been added on the release branch in ([apache-flink:./tools/azure-pipelines/build-apache-repo.yml|https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-apache-repo.yml]) * Changes on the new release branch are picked up by [Azure CI|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=1&_a=summary] * {{master}} branch has the version information updated to the new version (check pom.xml files and the [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java] enum) * New version is added to the [apache-flink:flink-annotations/src/main/java/org/apache/flink/FlinkVersion|https://github.com/apache/flink/blob/master/flink-annotations/src/main/java/org/apache/flink/FlinkVersion.java] enum. 
* Make sure [flink-docker|https://github.com/apache/flink-docker/] has a {{dev-x.y}} branch and docker e2e tests run against this branch in the corresponding Apache Flink release branch (see [apache/flink:flink-end-to-end-tests/test-scripts/common_docker.sh:51|https://github.com/apache/flink/blob/master/flink-end-to-end-tests/test-scripts/common_docker.sh#L51]) * [apache-flink:docs/config.toml|https://github.com/apache/flink/blob/release-1.17/docs/config.toml] has been updated appropriately in the new Apache Flink release branch. * The {{flink.version}} property (see [apache/flink-benchmarks:pom.xml|https://github.com/apache/flink-benchmarks/blob/master/pom.xml#L48]) of the Flink Benchmark repo has been updated to the latest snapshot version. was: If you are doing a new minor release, you need to update the Flink version in the following repositories: * [apache/flink|https://github.com/apache/flink] * [apache/flink-docker|https://github.com/apache/flink-docker] * [apache/flink-benchmarks|https://github.com/apache/flink-benchmarks] Patch releases don't require these repositories to be touched. Simply checkout the already existing branch for that version: {code:java} $ cd ./tools $ git checkout release-$SHORT_RELEASE_VERSION {code} h4. Flink repository Create a branch for the new version that we want to release before updating the master branch to the next development version: {code:bash} $ cd ./tools $ releasing/create_snapshot_branch.sh $ git checkout master $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION
[GitHub] [flink-ml] zhipeng93 closed pull request #176: Bump jackson-databind from 2.12.6.1 to 2.12.7.1
zhipeng93 closed pull request #176: Bump jackson-databind from 2.12.6.1 to 2.12.7.1 URL: https://github.com/apache/flink-ml/pull/176 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-ml] dependabot[bot] commented on pull request #176: Bump jackson-databind from 2.12.6.1 to 2.12.7.1
dependabot[bot] commented on PR #176: URL: https://github.com/apache/flink-ml/pull/176#issuecomment-1438157269 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] dmvk commented on a diff in pull request #21981: [FLINK-21450][runtime] Support LocalRecovery by AdaptiveScheduler
dmvk commented on code in PR #21981: URL: https://github.com/apache/flink/pull/21981#discussion_r1112810683 ## flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateLocalitySlotAssigner.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.runtime.scheduler.adaptive.allocator; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.runtime.clusterframework.types.AllocationID; +import org.apache.flink.runtime.executiongraph.ExecutionGraph; +import org.apache.flink.runtime.executiongraph.ExecutionJobVertex; +import org.apache.flink.runtime.executiongraph.ExecutionVertex; +import org.apache.flink.runtime.jobgraph.JobVertexID; +import org.apache.flink.runtime.jobmaster.SlotInfo; +import org.apache.flink.runtime.scheduler.adaptive.allocator.SlotSharingSlotAllocator.ExecutionSlotSharingGroup; +import org.apache.flink.runtime.scheduler.adaptive.allocator.SlotSharingSlotAllocator.ExecutionSlotSharingGroupAndSlot; +import org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID; +import org.apache.flink.runtime.state.KeyGroupRange; +import org.apache.flink.runtime.state.KeyGroupRangeAssignment; +import org.apache.flink.util.Preconditions; + +import java.util.ArrayList; +import java.util.Collection; +import java.util.Comparator; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.PriorityQueue; +import java.util.stream.StreamSupport; + +import static java.util.function.Function.identity; +import static java.util.stream.Collectors.toMap; + +/** A {@link SlotAssigner} that assigns slots based on the number of local key groups. 
*/ +@Internal +public class StateLocalitySlotAssigner implements SlotAssigner { + +private static class AllocationScore implements Comparable { + +private final String group; +private final AllocationID allocationId; + +public AllocationScore(String group, AllocationID allocationId, int score) { +this.group = group; +this.allocationId = allocationId; +this.score = score; +} + +private final int score; + +public String getGroup() { +return group; +} + +public AllocationID getAllocationId() { +return allocationId; +} + +public int getScore() { +return score; +} + +@Override +public int compareTo(StateLocalitySlotAssigner.AllocationScore other) { +int result = Integer.compare(score, other.score); +if (result != 0) { +return result; +} +result = other.allocationId.compareTo(allocationId); +if (result != 0) { +return result; +} +return other.group.compareTo(group); +} +} + +private final Map> locality; +private final Map maxParallelism; + +public StateLocalitySlotAssigner(ExecutionGraph archivedExecutionGraph) { +this( +calculateLocalKeyGroups(archivedExecutionGraph), +StreamSupport.stream( + archivedExecutionGraph.getVerticesTopologically().spliterator(), +false) +.collect( +toMap( +ExecutionJobVertex::getJobVertexId, + ExecutionJobVertex::getMaxParallelism))); +} + +public StateLocalitySlotAssigner( +Map> locality, +Map maxParallelism) { +this.locality = locality; +this.maxParallelism = maxParallelism; +} + +@Override +public AssignmentResult assignSlots( +Collection slots, Collection groups) { + +final Map parallelism = new HashMap<>(); +groups.forEach( +group -> +group.getContainedExecutionVertices() +.forEach( +evi -> +parallelism.merge( +
[GitHub] [flink] dmvk commented on a diff in pull request #21981: [FLINK-21450][runtime] Support LocalRecovery by AdaptiveScheduler
dmvk commented on code in PR #21981: URL: https://github.com/apache/flink/pull/21981#discussion_r1112815613 ## flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/SlotSharingSlotAllocator.java: ## @@ -121,16 +133,13 @@ public Optional determineParallelism( slotSharingGroupParallelism.get( slotSharingGroup.getSlotSharingGroupId())); -final Iterable sharedSlotToVertexAssignment = +final List sharedSlotToVertexAssignment = createExecutionSlotSharingGroups(vertexParallelism); -for (ExecutionSlotSharingGroup executionSlotSharingGroup : -sharedSlotToVertexAssignment) { -final SlotInfo slotInfo = slotIterator.next(); - -assignments.add( -new ExecutionSlotSharingGroupAndSlot(executionSlotSharingGroup, slotInfo)); -} +SlotAssigner.AssignmentResult result = +slotAssigner.assignSlots(freeSlots, sharedSlotToVertexAssignment); +assignments.addAll(result.assignments); +freeSlots = result.remainingSlots; Review Comment: Is there a reason to run this within the loop at all? Can we simply move this to after the loop? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-31142) Some queries lead to abrupt sql client close
[ https://issues.apache.org/jira/browse/FLINK-31142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Nuyanzin updated FLINK-31142: Priority: Blocker (was: Major) > Some queries lead to abrupt sql client close > > > Key: FLINK-31142 > URL: https://issues.apache.org/jira/browse/FLINK-31142 > Project: Flink > Issue Type: Bug > Components: Table SQL / Client >Affects Versions: 1.17.0 >Reporter: Sergey Nuyanzin >Priority: Blocker > > Although the behavior has been changed in 1.17.0, I'm not sure whether it is > a blocker or not, since in both cases it is an invalid query. > The difference in the behavior is that before 1.17.0 > a query like > {code:sql} > select /* multiline comment; > {code} > fails to execute and the SQL client prompts to submit another query. > In 1.17.0 it shuts down the session, failing with > {noformat} > Exception in thread "main" org.apache.flink.table.client.SqlClientException: > Could not read from command line. > at > org.apache.flink.table.client.cli.CliClient.getAndExecuteStatements(CliClient.java:205) > at > org.apache.flink.table.client.cli.CliClient.executeInteractive(CliClient.java:168) > at > org.apache.flink.table.client.cli.CliClient.executeInInteractiveMode(CliClient.java:113) > at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:169) > at org.apache.flink.table.client.SqlClient.start(SqlClient.java:118) > at > org.apache.flink.table.client.SqlClient.startClient(SqlClient.java:228) > at org.apache.flink.table.client.SqlClient.main(SqlClient.java:179) > Caused by: org.apache.flink.sql.parser.impl.TokenMgrError: Lexical error at > line 1, column 29. 
Encountered: after : "" > at > org.apache.flink.sql.parser.impl.FlinkSqlParserImplTokenManager.getNextToken(FlinkSqlParserImplTokenManager.java:26752) > at > org.apache.flink.table.client.cli.parser.SqlCommandParserImpl$TokenIterator.scan(SqlCommandParserImpl.java:89) > at > org.apache.flink.table.client.cli.parser.SqlCommandParserImpl$TokenIterator.next(SqlCommandParserImpl.java:81) > at > org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.checkIncompleteStatement(SqlCommandParserImpl.java:141) > at > org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.getCommand(SqlCommandParserImpl.java:111) > at > org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.parseStatement(SqlCommandParserImpl.java:52) > at > org.apache.flink.table.client.cli.parser.SqlMultiLineParser.parse(SqlMultiLineParser.java:82) > at > org.jline.reader.impl.LineReaderImpl.acceptLine(LineReaderImpl.java:2964) > at > org.jline.reader.impl.LineReaderImpl$1.apply(LineReaderImpl.java:3778) > at > org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:679) > at > org.apache.flink.table.client.cli.CliClient.getAndExecuteStatements(CliClient.java:183) > ... 6 more > Shutting down the session... > done. > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
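One plausible reading of the stack trace above: `TokenMgrError` extends `java.lang.Error` rather than `Exception` (as is standard for JavaCC-generated lexers), so an incomplete-statement check that only guards against exceptions would let it propagate and kill the session. The defensive pattern can be sketched as follows; this is an illustrative toy with hypothetical names, not the actual `SqlCommandParserImpl` code or the real fix.

```java
// Illustrative sketch only: a toy lexer that fails like JavaCC's
// TokenMgrError (an Error, not an Exception), and a validity check
// that survives it instead of crashing the client.
public class SafeStatementChecker {

    // Stand-in for a generated token manager: an unterminated block
    // comment triggers a lexical Error, mirroring the report above.
    static void lex(String statement) {
        if (statement.contains("/*") && !statement.contains("*/")) {
            throw new Error("Lexical error: unterminated comment");
        }
    }

    // Catching Error here keeps the session alive; a plain
    // `catch (Exception e)` would miss TokenMgrError-style failures.
    public static boolean isStatementValid(String statement) {
        try {
            lex(statement);
            return true;
        } catch (Error e) {
            return false;
        }
    }
}
```

With such a guard, the invalid query would be rejected and the client would prompt for another statement, matching the pre-1.17 behavior described in the issue.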
[jira] [Updated] (FLINK-31142) Some queries lead to abrupt sql client close
[ https://issues.apache.org/jira/browse/FLINK-31142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Nuyanzin updated FLINK-31142: Description: Although the behavior has been changed in 1.17.0, I'm not sure whether it is a blocker or not, since in both cases it is an invalid query. I set it to Blocker just because of the regression. The difference in the behavior is that before 1.17.0 a query like {code:sql} select /* multiline comment; {code} fails to execute and the SQL client prompts to submit another query. In 1.17.0 it shuts down the session, failing with {noformat} Exception in thread "main" org.apache.flink.table.client.SqlClientException: Could not read from command line. at org.apache.flink.table.client.cli.CliClient.getAndExecuteStatements(CliClient.java:205) at org.apache.flink.table.client.cli.CliClient.executeInteractive(CliClient.java:168) at org.apache.flink.table.client.cli.CliClient.executeInInteractiveMode(CliClient.java:113) at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:169) at org.apache.flink.table.client.SqlClient.start(SqlClient.java:118) at org.apache.flink.table.client.SqlClient.startClient(SqlClient.java:228) at org.apache.flink.table.client.SqlClient.main(SqlClient.java:179) Caused by: org.apache.flink.sql.parser.impl.TokenMgrError: Lexical error at line 1, column 29. 
Encountered: after : "" at org.apache.flink.sql.parser.impl.FlinkSqlParserImplTokenManager.getNextToken(FlinkSqlParserImplTokenManager.java:26752) at org.apache.flink.table.client.cli.parser.SqlCommandParserImpl$TokenIterator.scan(SqlCommandParserImpl.java:89) at org.apache.flink.table.client.cli.parser.SqlCommandParserImpl$TokenIterator.next(SqlCommandParserImpl.java:81) at org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.checkIncompleteStatement(SqlCommandParserImpl.java:141) at org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.getCommand(SqlCommandParserImpl.java:111) at org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.parseStatement(SqlCommandParserImpl.java:52) at org.apache.flink.table.client.cli.parser.SqlMultiLineParser.parse(SqlMultiLineParser.java:82) at org.jline.reader.impl.LineReaderImpl.acceptLine(LineReaderImpl.java:2964) at org.jline.reader.impl.LineReaderImpl$1.apply(LineReaderImpl.java:3778) at org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:679) at org.apache.flink.table.client.cli.CliClient.getAndExecuteStatements(CliClient.java:183) ... 6 more Shutting down the session... done. {noformat} was: Although the behavior has been changed in 1.17.0, I'm not sure whether it is a blocker or not, since in both cases it is invalid query. The difference in the behavior is that before 1.17.0 a query like {code:sql} select /* multiline comment; {code} fails to execute and sql client prompts to submit another query. In 1.17.0 it shuts down the session failing with {noformat} Exception in thread "main" org.apache.flink.table.client.SqlClientException: Could not read from command line. 
at org.apache.flink.table.client.cli.CliClient.getAndExecuteStatements(CliClient.java:205) at org.apache.flink.table.client.cli.CliClient.executeInteractive(CliClient.java:168) at org.apache.flink.table.client.cli.CliClient.executeInInteractiveMode(CliClient.java:113) at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:169) at org.apache.flink.table.client.SqlClient.start(SqlClient.java:118) at org.apache.flink.table.client.SqlClient.startClient(SqlClient.java:228) at org.apache.flink.table.client.SqlClient.main(SqlClient.java:179) Caused by: org.apache.flink.sql.parser.impl.TokenMgrError: Lexical error at line 1, column 29. Encountered: after : "" at org.apache.flink.sql.parser.impl.FlinkSqlParserImplTokenManager.getNextToken(FlinkSqlParserImplTokenManager.java:26752) at org.apache.flink.table.client.cli.parser.SqlCommandParserImpl$TokenIterator.scan(SqlCommandParserImpl.java:89) at org.apache.flink.table.client.cli.parser.SqlCommandParserImpl$TokenIterator.next(SqlCommandParserImpl.java:81) at org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.checkIncompleteStatement(SqlCommandParserImpl.java:141) at org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.getCommand(SqlCommandParserImpl.java:111) at org.apache.flink.table.client.cli.parser.SqlCommandParserImpl.parseStatement(SqlCommandParserImpl.java:52) at org.apache.flink.table.client.cli.parser.SqlMultiLineParser.parse(SqlMultiLineParser.java:82) at org.jline.reader.impl.LineReaderImpl.acceptLine(LineReaderImpl.java:2964) at
[jira] [Updated] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-31168: -- Description: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706 We see this build failure because a job couldn't be found: {code} java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error while waiting for job to be initialized at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319) at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958) at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942) at org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235) at org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336) Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error while waiting for job to be initialized at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056) ... 
4 more Caused by: java.lang.RuntimeException: Error while waiting for job to be initialized at org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160) at org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.lambda$execute$2(AbstractSessionClusterExecutor.java:82) at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73) at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) at java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479) at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.rest.util.RestClientException: [org.apache.flink.runtime.rest.NotFoundException: Job 865dcd87f4828dbeb3d93eb52e2636b1 not found at org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:99) at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986) at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) at org.apache.flink.runtime.rest.handler.legacy.DefaultExecutionGraphCache.lambda$getExecutionGraphInternal$0(DefaultExecutionGraphCache.java:109) at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:252) at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1387) at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
[jira] [Updated] (FLINK-31133) PartiallyFinishedSourcesITCase hangs if a checkpoint fails
[ https://issues.apache.org/jira/browse/FLINK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Khachatryan updated FLINK-31133: -- Summary: PartiallyFinishedSourcesITCase hangs if a checkpoint fails (was: AdaptiveSchedulerITCase took extraordinary long to finish) > PartiallyFinishedSourcesITCase hangs if a checkpoint fails > -- > > Key: FLINK-31133 > URL: https://issues.apache.org/jira/browse/FLINK-31133 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3 >Reporter: Matthias Pohl >Assignee: Roman Khachatryan >Priority: Major > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b > This build ran into a timeout. Based on the stacktraces reported, it was > either caused by > [SnapshotMigrationTestBase.restoreAndExecute|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=13475]: > {code} > "main" #1 prio=5 os_prio=0 tid=0x7f23d800b800 nid=0x60cdd waiting on > condition [0x7f23e1c0d000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.flink.test.checkpointing.utils.SnapshotMigrationTestBase.restoreAndExecute(SnapshotMigrationTestBase.java:382) > at > org.apache.flink.test.migration.TypeSerializerSnapshotMigrationITCase.testSnapshot(TypeSerializerSnapshotMigrationITCase.java:172) > at sun.reflect.GeneratedMethodAccessor47.invoke(Unknown Source) > [...] 
> {code} > or > [PartiallyFinishedSourcesITCase.test|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=10401]: > {code} > 2023-02-20T07:13:05.6084711Z "main" #1 prio=5 os_prio=0 > tid=0x7fd35c00b800 nid=0x8c8a waiting on condition [0x7fd363d0f000] > 2023-02-20T07:13:05.6085149Zjava.lang.Thread.State: TIMED_WAITING > (sleeping) > 2023-02-20T07:13:05.6085487Z at java.lang.Thread.sleep(Native Method) > 2023-02-20T07:13:05.6085925Z at > org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145) > 2023-02-20T07:13:05.6086512Z at > org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:138) > 2023-02-20T07:13:05.6087103Z at > org.apache.flink.runtime.testutils.CommonTestUtils.waitForSubtasksToFinish(CommonTestUtils.java:291) > 2023-02-20T07:13:05.6087730Z at > org.apache.flink.runtime.operators.lifecycle.TestJobExecutor.waitForSubtasksToFinish(TestJobExecutor.java:226) > 2023-02-20T07:13:05.6088410Z at > org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase.test(PartiallyFinishedSourcesITCase.java:138) > 2023-02-20T07:13:05.6088957Z at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [...] > {code} > Still, it sounds odd: Based on a code analysis it's quite unlikely that those > two caused the issue. The former one has a 5 min timeout (see related code in > [SnapshotMigrationTestBase:382|https://github.com/apache/flink/blob/release-1.15/flink-tests/src/test/java/org/apache/flink/test/checkpointing/utils/SnapshotMigrationTestBase.java#L382]). > For the other one, we found it being not responsible in the past when some > other concurrent test caused the issue (see FLINK-30261). 
> An investigation on where we lose the time for the timeout revealed that > {{AdaptiveSchedulerITCase}} took 2980s to finish (see [build > logs|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46299=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=5265]). > {code} > 2023-02-20T03:43:55.4546050Z Feb 20 03:43:55 [ERROR] Picked up > JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError > 2023-02-20T03:43:58.0448506Z Feb 20 03:43:58 [INFO] Running > org.apache.flink.test.scheduling.AdaptiveSchedulerITCase > 2023-02-20T04:33:38.6824634Z Feb 20 04:33:38 [INFO] Tests run: 6, Failures: > 0, Errors: 0, Skipped: 0, Time elapsed: 2,980.445 s - in > org.apache.flink.test.scheduling.AdaptiveSchedulerITCase > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] Mulavar commented on pull request #21545: [FLINK-30396][table]make alias hint take effect in correlate
Mulavar commented on PR #21545: URL: https://github.com/apache/flink/pull/21545#issuecomment-1438817485 @flinkbot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-31138) Either StreamCheckpointingITCase/StreamFaultToleranceTestBase or EventTimeWindowCheckpointingITCase are timing out
[ https://issues.apache.org/jira/browse/FLINK-31138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691725#comment-17691725 ] Roman Khachatryan commented on FLINK-31138: --- Thanks [~mapohl] and [~fanrui] , In the artifacts (logs-cron_azure-test_cron_azure_finegrained_resource_management-1676692614/mvn-1.log), I've found the same problem with PartiallyFinishedSourcesITCase at 04:46 as in FLINK-31133: checkpoint failure -> running for too long -> no space left on device. So I'd close this ticket as a duplicate. However, there are some strange exceptions before that: [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46283=logs=4d4a0d10-fca2-5507-8eed-c07f0bdf4887=7b25afdf-cc6c-566f-5459-359dc2585798=5584] {code:java} Feb 18 03:58:47 [INFO] Running org.apache.flink.core.memory.OffHeapUnsafeMemorySegmentTest Exception in thread "Thread-13" java.lang.IllegalStateException: MemorySegment can be freed only once! at org.apache.flink.core.memory.MemorySegment.free(MemorySegment.java:244) at java.lang.Thread.run(Thread.java:748) Exception in thread "Thread-15" java.lang.IllegalStateException: MemorySegment can be freed only once! at org.apache.flink.core.memory.MemorySegment.free(MemorySegment.java:244) at java.lang.Thread.run(Thread.java:748) Exception in thread "Thread-17" java.lang.IllegalStateException: MemorySegment can be freed only once! at org.apache.flink.core.memory.MemorySegment.free(MemorySegment.java:244) at java.lang.Thread.run(Thread.java:748){code} Starting at 04:08: {code:java} 04:32:18,352 [flink-akka.actor.default-dispatcher-8] INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job f1ec611893996fc2fc1830697195194b reached terminal state FAILED. 
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:301)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:291)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:282)
    at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:739)
    at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:78)
    at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:443)
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:304)
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:302)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
    at akka.actor.Actor.aroundReceive(Actor.scala:537)
    at akka.actor.Actor.aroundReceive$(Actor.scala:535)
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
    at akka.actor.ActorCell.invoke(ActorCell.scala:548)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
    at akka.dispatch.Mailbox.run(Mailbox.scala:231)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at
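The repeated "MemorySegment can be freed only once!" exceptions quoted above come from a free() guard that rejects a second release of the same segment. A minimal, self-contained model of such a guard (not Flink's actual MemorySegment code; the class name here is made up for illustration) looks like:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of a segment whose free() may succeed at most once.
final class OneShotSegment {
    private final AtomicBoolean freed = new AtomicBoolean(false);

    void free() {
        // compareAndSet guarantees exactly one caller wins, even when
        // several threads race to release the segment concurrently.
        if (!freed.compareAndSet(false, true)) {
            throw new IllegalStateException("MemorySegment can be freed only once!");
        }
        // the actual resource release (e.g. off-heap memory) would happen here
    }

    boolean isFreed() {
        return freed.get();
    }
}
```

Seeing this exception in a test log therefore means two code paths both tried to release the same segment, which is what the quoted `Thread-13`/`Thread-15`/`Thread-17` stacks ran into.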
[jira] [Updated] (FLINK-31162) Avoid setting private tokens to AM container context when kerberos delegation token fetch is disabled
[ https://issues.apache.org/jira/browse/FLINK-31162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-31162: --- Labels: pull-request-available (was: ) > Avoid setting private tokens to AM container context when kerberos delegation > token fetch is disabled > - > > Key: FLINK-31162 > URL: https://issues.apache.org/jira/browse/FLINK-31162 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN >Affects Versions: 1.16.1 >Reporter: Venkata krishnan Sowrirajan >Priority: Major > Labels: pull-request-available > > In our internal env, we have enabled [Consistent Reads from HDFS Observer > NameNode|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html]. > With this, some of the _ObserverReadProxyProvider_ implementations clone the > delegation token for the HA service and mark those tokens private so that they > won't be accessible through _ugi.getCredentials()._ > But Flink internally uses _currUsr.getTokens()_ > [here|https://github.com/apache/flink/blob/release-1.16.1/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L222] > to get the current user credentials tokens to be set in the AM context for > submitting the YARN app to the RM. > This fails with the following error: > {code:java} > Unable to add the application to the delegation token renewer. 
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: test01-ha4.abc:9000, Ident: (HDFS_DELEGATION_TOKEN token 151335106 > for john) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:495) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$900(DelegationTokenRenewer.java:79) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:939) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:916) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): > Operation category WRITE is not supported in state standby. 
Visit > https://s.apache.org/sbnn-error > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2044) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1451) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5348) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:733) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:1056) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:525) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:495) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1905) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2856) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499) > at org.apache.hadoop.ipc.Client.call(Client.java:1445) > at org.apache.hadoop.ipc.Client.call(Client.java:1342) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy87.renewDelegationToken(Unknown Source) > at > 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewDelegationToken(ClientNamenodeProtocolTranslatorPB.java:986) > at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source) > at >
[jira] [Commented] (FLINK-31133) AdaptiveSchedulerITCase took extraordinary long to finish
[ https://issues.apache.org/jira/browse/FLINK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691693#comment-17691693 ] Roman Khachatryan commented on FLINK-31133: --- I think the issue is actually PartiallyFinishedSourcesITCase. It starts at 3:40 {code:java} 03:40:55,702 [ main] INFO org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase [] - Test test[complex graph ALL_SUBTASKS, failover: true, strategy: full](org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase) is running. {code} then a checkpoint fails because of a timeout: {code:java} 03:41:10,775 [ChangelogRetryScheduler-1] INFO org.apache.flink.changelog.fs.RetryingExecutor [] - failed with 3 attempts: Attempt 3 timed out after 1000ms 03:41:10,777 [AsyncOperations-thread-1] INFO org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - transform-2-keyed (4/4)#0 - asynchronous part of checkpoint 2 could not be completed. java.util.concurrent.CompletionException: java.io.IOException: java.util.concurrent.TimeoutException: Attempt 3 timed out after 1000ms at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_292] at org.apache.flink.changelog.fs.FsStateChangelogWriter$UploadCompletionListener.onFailure(FsStateChangelogWriter.java:383) ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.changelog.fs.FsStateChangelogWriter.lambda$null$0(FsStateChangelogWriter.java:223) 
~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at java.util.ArrayList.removeIf(ArrayList.java:1415) ~[?:1.8.0_292] at org.apache.flink.changelog.fs.FsStateChangelogWriter.lambda$handleUploadFailure$4(FsStateChangelogWriter.java:222) ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338) ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324) ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:807) ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:756) ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] Caused by: java.io.IOException: 
java.util.concurrent.TimeoutException: Attempt 3 timed out after 1000ms at org.apache.flink.changelog.fs.FsStateChangelogWriter$UploadCompletionListener.onFailure(FsStateChangelogWriter.java:377) ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] ... 15 more Caused by: java.util.concurrent.TimeoutException: Attempt 3 timed out after 1000ms at org.apache.flink.changelog.fs.RetryingExecutor$RetriableActionAttempt.fmtError(RetryingExecutor.java:285) ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at
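The "Attempt 3 timed out after 1000ms" failures above are produced by a retrying executor that gives every attempt its own timeout and surfaces the last error once the attempt budget is exhausted. This is a rough, self-contained sketch of that behavior (not Flink's actual RetryingExecutor; the class name, method names, and the exact retry policy are assumptions):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Toy model: each attempt gets its own timeout; after maxAttempts failures,
// the last failure is rethrown (matching the "failed with 3 attempts" log line).
final class BoundedRetry {
    static <T> T callWithRetries(Callable<T> action, int maxAttempts, long attemptTimeoutMillis)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> f = executor.submit(action);
                try {
                    return f.get(attemptTimeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    f.cancel(true); // interrupt the stuck attempt before retrying
                    last = new TimeoutException(
                            "Attempt " + attempt + " timed out after " + attemptTimeoutMillis + "ms");
                } catch (ExecutionException e) {
                    last = e;
                }
            }
            throw last; // retry budget spent: surface the final failure
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With an upload that never completes within the attempt timeout, the caller ends up with exactly the kind of `TimeoutException: Attempt 3 timed out after 1000ms` seen in the checkpoint failure above.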
[jira] [Updated] (FLINK-31133) AdaptiveSchedulerITCase took extraordinary long to finish
[ https://issues.apache.org/jira/browse/FLINK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Khachatryan updated FLINK-31133: -- Priority: Major (was: Critical)
[jira] [Commented] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691710#comment-17691710 ] Matthias Pohl commented on FLINK-31168: --- Based on the stacktrace listed in the description, the job is executed (see [JobManagerHAProcessFailureRecoveryITCase:235|https://github.com/apache/flink/blob/489827520b1a53db04a94346c98327d0d42301c5/flink-tests/src/test/java/org/apache/flink/test/recovery/JobManagerHAProcessFailureRecoveryITCase.java#L235]), which submits the job and then waits for the initialization to be over. This is implemented using {{ClientUtils.waitUntilJobInitializationFinished}} in [AbstractSessionClusterExecutor:82ff|https://github.com/apache/flink/blob/f4b59f615438e76c2b42999fc0a8ebce6a543b07/flink-clients/src/main/java/org/apache/flink/client/deployment/executors/AbstractSessionClusterExecutor.java#L82], utilizing a {{RestClusterClient}}'s {{getJobStatus}} and {{requestJobResult}} methods. The actual error happened in [Dispatcher:480f|https://github.com/apache/flink/blob/9e908567ecc06b607d08fca8e2e781b463a38de6/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L840], where the {{JobManagerRunner}} is already deregistered. Therefore, {{requestExecutionGraphInfo}} fails, returning an exceptionally completed future with the {{FlinkJobNotFoundException}}. {{ClientUtils.waitUntilJobInitializationFinished}} uses {{new ExponentialWaitStrategy(50, 2000)}} as a {{WaitStrategy}}. My theory is that the 2s cap of the wait strategy is too long: the job starts and finishes between two {{getJobStatus}} calls, causing the corresponding {{JobManagerRunner}} to be removed and the {{getJobStatus}} call to fail as observed. 
> JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1 >Reporter: Matthias Pohl >Priority: Critical > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706 > We see this build failure because a job couldn't be found: > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056) > ... 
4 more > Caused by: java.lang.RuntimeException: Error while waiting for job to be > initialized > at > org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160) > at > org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.lambda$execute$2(AbstractSessionClusterExecutor.java:82) > at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73) > at > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) > at > java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479) > at > java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) > at > java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) > at > java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) > at > java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) > at > java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) > Caused by:
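The polling race described in the comment above hinges on {{ExponentialWaitStrategy(50, 2000)}}: the interval between two {{getJobStatus}} probes doubles up to a 2 s cap, so a short-lived job can start and finish entirely between probes. A toy model of such a capped strategy (the exact formula Flink uses is an assumption here):

```java
// Toy model of a capped exponential poll interval in the spirit of
// ExponentialWaitStrategy(50, 2000). Not Flink's actual implementation.
final class ExponentialWait {
    private final long initialWaitMillis;
    private final long maxWaitMillis;

    ExponentialWait(long initialWaitMillis, long maxWaitMillis) {
        this.initialWaitMillis = initialWaitMillis;
        this.maxWaitMillis = maxWaitMillis;
    }

    // attempt 0 -> 50 ms, 1 -> 100 ms, 2 -> 200 ms, ... capped at 2000 ms
    long sleepTime(int attempt) {
        long doubled = initialWaitMillis << Math.min(attempt, 30); // cap the shift to avoid overflow
        return Math.min(doubled, maxWaitMillis);
    }
}
```

Once the cap is reached, every probe is 2 s apart; a job whose whole lifecycle fits into that window is observed only after its JobManagerRunner is gone, producing the `FlinkJobNotFoundException` seen in the build failure.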
[GitHub] [flink] venkata91 opened a new pull request, #21985: [FLINK-31162][yarn] Use currUsr.getTokens instead of currUsr.getCrede…
venkata91 opened a new pull request, #21985: URL: https://github.com/apache/flink/pull/21985 …ntials to avoid including private tokens in AM container context ## What is the purpose of the change Avoid setting private tokens to the AM container context when kerberos delegation token fetch is disabled and DTs are managed. ## Brief change log Currently, while setting user credentials to the AM container context, `ugi.getTokens()` is used, which also returns the private tokens along with the UGI tokens. But these should not be passed to the AM container context. This causes the launch of the YARN RM app to fail in some cases, for example when the [Consistent Reads from HDFS Observer NameNode](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html) feature is enabled. Instead, change it to `ugi.getCredentials().getAllTokens()` to only get the user credentials tokens. Spark uses a similar way of setting the [user credentials to the AM container context](https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L348) as well. ## Verifying this change Tested this change internally in our environment, as it requires a managed way of delegation token fetch. Also tested enabling the kerberos delegation token fetch feature to make sure it doesn't regress. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know) - The S3 file system connector: (yes / no / don't know) ## Documentation - Does this pull request introduce a new feature? 
(yes / **no**) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
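The core idea of the fix — forward only the non-private credential tokens to the AM container context — can be modeled with plain Java. This is a toy illustration only, not Hadoop's Token/Credentials API; every name below is made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a delegation token: HA proxy providers clone
// tokens and mark the clones private, and private clones must not be
// shipped to the AM container context.
final class Token {
    final String service;
    final boolean isPrivate;

    Token(String service, boolean isPrivate) {
        this.service = service;
        this.isPrivate = isPrivate;
    }
}

final class TokenFilterDemo {
    // Mirrors the intent of the change: keep only non-private tokens.
    static List<Token> tokensForAmContext(List<Token> allUgiTokens) {
        List<Token> result = new ArrayList<>();
        for (Token t : allUgiTokens) {
            if (!t.isPrivate) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Token> all = Arrays.asList(
                new Token("hdfs-ha-service", false),
                new Token("observer-clone", true)); // private clone stays behind
        System.out.println(tokensForAmContext(all).size()); // 1
    }
}
```

With real Hadoop, this filtering falls out of switching from `ugi.getTokens()` to `ugi.getCredentials().getAllTokens()`, since (per the linked Jira) the private token clones are not part of the credentials.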
[jira] [Assigned] (FLINK-31133) AdaptiveSchedulerITCase took extraordinary long to finish
[ https://issues.apache.org/jira/browse/FLINK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Khachatryan reassigned FLINK-31133: - Assignee: Roman Khachatryan
[jira] [Commented] (FLINK-31143) Invalid request: offset doesn't match when restarting from a savepoint
[ https://issues.apache.org/jira/browse/FLINK-31143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691714#comment-17691714 ] Weijie Guo commented on FLINK-31143: I'm a little suspicious that the current logic can't handle the case where the `JobClient` is also restarted, as opposed to a simple task failover. There are a few reasons on my side. For example, the `sinkRestarted` method [here|https://github.com/apache/flink/blob/5cda70d873c9630c898d765633ec7a6cfe53e3c6/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/collect/AbstractCollectResultBuffer.java#L87] will not be called because our version (0) is equal to `INIT_VERSION`. I'm not sure why a restart of the JobClient is not taken into account; one possible reason is that transferring data from `CollectSink` to `JobClient` is a bit like writing data to an external system, and it is not easy to ensure end-to-end EXACTLY_ONCE semantics. > Invalid request: offset doesn't match when restarting from a savepoint > -- > > Key: FLINK-31143 > URL: https://issues.apache.org/jira/browse/FLINK-31143 > Project: Flink > Issue Type: Bug > Components: Table SQL / Runtime >Affects Versions: 1.17.0, 1.15.3, 1.16.1 >Reporter: Matthias Pohl >Priority: Critical > > I tried to run the following case: > {code:java} > public static void main(String[] args) throws Exception { > final String createTableQuery = > "CREATE TABLE left_table (a int, c varchar) " > + "WITH (" > + " 'connector' = 'datagen', " > + " 'rows-per-second' = '1', " > + " 'fields.a.kind' = 'sequence', " > + " 'fields.a.start' = '0', " > + " 'fields.a.end' = '10'" > + ");"; > final String selectQuery = "SELECT * FROM left_table;"; > final Configuration initialConfig = new Configuration(); > initialConfig.set(CoreOptions.DEFAULT_PARALLELISM, 1); > final EnvironmentSettings initialSettings = > EnvironmentSettings.newInstance() > .inStreamingMode() > .withConfiguration(initialConfig) > .build(); > final TableEnvironment initialTableEnv = > 
TableEnvironment.create(initialSettings); > // create job and consume two results > initialTableEnv.executeSql(createTableQuery); > final TableResult tableResult = > initialTableEnv.sqlQuery(selectQuery).execute(); > tableResult.await(); > System.out.println(tableResultIterator.next()); > System.out.println(tableResultIterator.next()); > // stop job with savepoint > final String savepointPath; > try (CloseableIterator tableResultIterator = > tableResult.collect()) { > final JobClient jobClient = > > tableResult.getJobClient().orElseThrow(IllegalStateException::new); > final File savepointDirectory = Files.createTempDir(); > savepointPath = > jobClient > .stopWithSavepoint( > true, > savepointDirectory.getAbsolutePath(), > SavepointFormatType.CANONICAL) > .get(); > } > // restart the very same job from the savepoint > final SavepointRestoreSettings savepointRestoreSettings = > SavepointRestoreSettings.forPath(savepointPath, true); > final Configuration restartConfig = new Configuration(initialConfig); > SavepointRestoreSettings.toConfiguration(savepointRestoreSettings, > restartConfig); > final EnvironmentSettings restartSettings = > EnvironmentSettings.newInstance() > .inStreamingMode() > .withConfiguration(restartConfig) > .build(); > final TableEnvironment restartTableEnv = > TableEnvironment.create(restartSettings); > restartTableEnv.executeSql(createTableQuery); > restartTableEnv.sqlQuery(selectQuery).execute().print(); > } > {code} > h3. Expected behavior > The job continues omitting the inital two records and starts printing results > from 2 onwards. > h3. Observed behavior > No results are printed. The logs show that an invalid request was handled: > {code:java} > org.apache.flink.streaming.api.operators.collect.CollectSinkFunction [] - > Invalid request. Received version = b497a74f-e85c-404b-8df3-1b4b1a0c2de1, > offset = 0, while expected version = b497a74f-e85c-404b-8df3-1b4b1a0c2de1, >
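Weijie Guo's comment above can be illustrated with a simplified model of the version check he points at in `AbstractCollectResultBuffer`: the client only treats the sink as restarted when it observes a version that differs from one it already recorded, and a freshly created client starts at `INIT_VERSION`, so a restart of the JobClient itself never trips the check. All names below are illustrative assumptions, not the actual Flink API.

```java
// Hypothetical sketch (not the real AbstractCollectResultBuffer): a client
// whose own version is still INIT_VERSION treats the first response as an
// initial connect, never as a sink restart -- which is why sinkRestarted()
// is not invoked when the whole JobClient is recreated from a savepoint.
public class CollectVersionCheckSketch {
    public static final long INIT_VERSION = 0;

    private long knownVersion = INIT_VERSION;
    private boolean sinkRestartedCalled = false;

    /** Called with the version reported by the sink in each response. */
    public void onResponse(long reportedVersion) {
        if (knownVersion != INIT_VERSION && reportedVersion != knownVersion) {
            // only a mid-stream version change counts as a sink restart
            sinkRestartedCalled = true;
        }
        knownVersion = reportedVersion;
    }

    public boolean sinkRestartedCalled() {
        return sinkRestartedCalled;
    }
}
```

Under this model a restarted JobClient (version still `INIT_VERSION`) silently adopts the sink's version and keeps its stale offset, matching the "Invalid request: offset doesn't match" symptom.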
[GitHub] [flink] akalash commented on a diff in pull request #21923: FLINK-13871: Consolidate volatile status fields in StreamTask
akalash commented on code in PR #21923: URL: https://github.com/apache/flink/pull/21923#discussion_r1113358376 ## flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java: ## @@ -949,7 +952,9 @@ public final void cleanUp(Throwable throwable) throws Exception { // disabled the interruptions or not. getCompletionFuture().exceptionally(unused -> null).join(); // clean up everything we initialized -isRunning = false; +if (!isCanceled() && !isFailing()) { Review Comment: I think it's highly likely we could have an FSM for this logic, but I agree that it requires more research into its correctness. So if we want to keep this ticket simple and preserve the logic we have now: we cannot have `failing` in the enum, because a task can be `failing` and `running` at the same time. Maybe we should have a `TaskState` object with an enum and a `failing` flag inside. It doesn't look as clean as the solution in this PR, but at least it is cleaner than the current one and it keeps the same logic.
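akalash's suggestion of a `TaskState` object with an enum plus a separate `failing` flag could look roughly like the sketch below. All names are hypothetical (the comment proposes the idea but no concrete API); the key point it demonstrates is that `failing` is orthogonal to the lifecycle phase, since a task can be failing and running at the same time.

```java
// Hypothetical sketch of the proposed TaskState: mutually exclusive lifecycle
// phases live in the enum, while "failing" is a separate flag because it can
// overlap with RUNNING. Not actual Flink code.
public class TaskStateSketch {
    enum Phase { CREATED, RUNNING, CANCELED, FINISHED }

    private volatile Phase phase = Phase.CREATED;
    private volatile boolean failing = false;

    public void run() { phase = Phase.RUNNING; }

    /** May be called while the task is still RUNNING. */
    public void markFailing() { failing = true; }

    public void cancel() { phase = Phase.CANCELED; }

    public boolean isRunning() { return phase == Phase.RUNNING; }

    public boolean isFailing() { return failing; }
}
```

This keeps the semantics of the existing volatile boolean fields while consolidating them behind one object, as the comment suggests.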
[jira] [Updated] (FLINK-31138) Either StreamCheckpointingITCase/StreamFaultToleranceTestBase or EventTimeWindowCheckpointingITCase are timing out
[ https://issues.apache.org/jira/browse/FLINK-31138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Khachatryan updated FLINK-31138: -- Priority: Major (was: Critical) > Either StreamCheckpointingITCase/StreamFaultToleranceTestBase or > EventTimeWindowCheckpointingITCase are timing out > --- > > Key: FLINK-31138 > URL: https://issues.apache.org/jira/browse/FLINK-31138 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / State Backends >Affects Versions: 1.15.3 >Reporter: Matthias Pohl >Priority: Major > Labels: test-stability > Attachments: > logs-cron_azure-test_cron_azure_finegrained_resource_management-1676692614.zip > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46283=logs=4d4a0d10-fca2-5507-8eed-c07f0bdf4887=7b25afdf-cc6c-566f-5459-359dc2585798 > This build timed out. The stacktraces revealed multiple tests that might have > caused this: > * {{StreamCheckpointingITCase}} through > {{StreamFaultToleranceTestBase.runCheckpointedProgram}} > {code} > [...] > 2023-02-18T07:37:47.6861582Z- locked <0x83c85250> (a > java.lang.Object) > 2023-02-18T07:37:47.6862179Zat > org.apache.flink.streaming.api.operators.StreamSourceContexts$SwitchingOnClose.collect(StreamSourceContexts.java:103) > 2023-02-18T07:37:47.6862981Zat > org.apache.flink.test.checkpointing.StreamCheckpointingITCase$StringGeneratingSourceFunction.run(StreamCheckpointingITCase.java:169) > 2023-02-18T07:37:47.6863762Z- locked <0x83c85250> (a > java.lang.Object) > [...] 
> 2023-02-18T07:37:47.7904307Z "main" #1 prio=5 os_prio=0 > tid=0x7fca5000b800 nid=0x56636 waiting on condition [0x7fca57c58000] > 2023-02-18T07:37:47.7904803Zjava.lang.Thread.State: WAITING (parking) > 2023-02-18T07:37:47.7905160Zat sun.misc.Unsafe.park(Native Method) > 2023-02-18T07:37:47.7905932Z- parking to wait for <0x83c9df48> > (a java.util.concurrent.CompletableFuture$Signaller) > 2023-02-18T07:37:47.7906498Zat > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > 2023-02-18T07:37:47.7907074Zat > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707) > 2023-02-18T07:37:47.7907764Zat > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > 2023-02-18T07:37:47.7908457Zat > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742) > 2023-02-18T07:37:47.7909019Zat > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) > 2023-02-18T07:37:47.7909605Zat > org.apache.flink.test.util.TestUtils.submitJobAndWaitForResult(TestUtils.java:93) > 2023-02-18T07:37:47.7910413Zat > org.apache.flink.test.checkpointing.StreamFaultToleranceTestBase.runCheckpointedProgram(StreamFaultToleranceTestBase.java:134) > [...] > {code} > or {{LocalRecoveryITCase}}/{{EventTimeWindowCheckpointingITCase}}: > {code} > [...] 
> 2023-02-18T07:37:51.6744983Z "main" #1 prio=5 os_prio=0 > tid=0x7efc4000b800 nid=0x5645a waiting on condition [0x7efc49b4e000] > 2023-02-18T07:37:51.6745471Zjava.lang.Thread.State: WAITING (parking) > 2023-02-18T07:37:51.6745823Zat sun.misc.Unsafe.park(Native Method) > 2023-02-18T07:37:51.6746482Z- parking to wait for <0x8718cce8> > (a java.util.concurrent.CompletableFuture$Signaller) > 2023-02-18T07:37:51.6747147Zat > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > 2023-02-18T07:37:51.6747725Zat > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707) > 2023-02-18T07:37:51.6748313Zat > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > 2023-02-18T07:37:51.6748892Zat > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742) > 2023-02-18T07:37:51.6749457Zat > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) > 2023-02-18T07:37:51.6750118Zat > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1989) > 2023-02-18T07:37:51.6750881Zat > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1969) > 2023-02-18T07:37:51.6751694Zat > org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase.testSlidingTimeWindow(EventTimeWindowCheckpointingITCase.java:524) > 2023-02-18T07:37:51.6752476Zat > org.apache.flink.test.checkpointing.LocalRecoveryITCase.executeTest(LocalRecoveryITCase.java:84) > 2023-02-18T07:37:51.6753157Zat > org.apache.flink.test.checkpointing.LocalRecoveryITCase.executeTest(LocalRecoveryITCase.java:66) >
[jira] [Comment Edited] (FLINK-31133) AdaptiveSchedulerITCase took extraordinary long to finish
[ https://issues.apache.org/jira/browse/FLINK-31133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691693#comment-17691693 ] Roman Khachatryan edited comment on FLINK-31133 at 2/21/23 4:26 PM: I think the issue is actually PartiallyFinishedSourcesITCase. It starts at 3:40 {code:java} 03:40:55,702 [ main] INFO org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase [] - Test test[complex graph ALL_SUBTASKS, failover: true, strategy: full](org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase) is running. {code} then a checkpoint fails because of a timeout: {code:java} 03:41:10,775 [ChangelogRetryScheduler-1] INFO org.apache.flink.changelog.fs.RetryingExecutor [] - failed with 3 attempts: Attempt 3 timed out after 1000ms 03:41:10,777 [AsyncOperations-thread-1] INFO org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - transform-2-keyed (4/4)#0 - asynchronous part of checkpoint 2 could not be completed. java.util.concurrent.CompletionException: java.io.IOException: java.util.concurrent.TimeoutException: Attempt 3 timed out after 1000ms at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_292] at org.apache.flink.changelog.fs.FsStateChangelogWriter$UploadCompletionListener.onFailure(FsStateChangelogWriter.java:383) ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at 
org.apache.flink.changelog.fs.FsStateChangelogWriter.lambda$null$0(FsStateChangelogWriter.java:223) ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at java.util.ArrayList.removeIf(ArrayList.java:1415) ~[?:1.8.0_292] at org.apache.flink.changelog.fs.FsStateChangelogWriter.lambda$handleUploadFailure$4(FsStateChangelogWriter.java:222) ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338) ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324) ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:807) ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:756) ~[flink-streaming-java-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) ~[flink-runtime-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at 
java.lang.Thread.run(Thread.java:748) [?:1.8.0_292] Caused by: java.io.IOException: java.util.concurrent.TimeoutException: Attempt 3 timed out after 1000ms at org.apache.flink.changelog.fs.FsStateChangelogWriter$UploadCompletionListener.onFailure(FsStateChangelogWriter.java:377) ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] ... 15 more Caused by: java.util.concurrent.TimeoutException: Attempt 3 timed out after 1000ms at org.apache.flink.changelog.fs.RetryingExecutor$RetriableActionAttempt.fmtError(RetryingExecutor.java:285) ~[flink-dstl-dfs-1.15-SNAPSHOT.jar:1.15-SNAPSHOT] at
[GitHub] [flink] flinkbot commented on pull request #21985: [FLINK-31162][yarn] Use currUsr.getTokens instead of currUsr.getCrede…
flinkbot commented on PR #21985: URL: https://github.com/apache/flink/pull/21985#issuecomment-1438893837 ## CI report: * bf60278b45ba5c499da3755d50c70aaf546f8bfa UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[jira] [Commented] (FLINK-31109) Fails with proxy user not supported even when security.kerberos.fetch.delegation-token is set to false
[ https://issues.apache.org/jira/browse/FLINK-31109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691740#comment-17691740 ] Venkata krishnan Sowrirajan commented on FLINK-31109: - [~martijnvisser] I'm still looking into it. I would probably need till the end of this week for the fix and all the internal testing that needs to be done before raising the PR. > Fails with proxy user not supported even when > security.kerberos.fetch.delegation-token is set to false > -- > > Key: FLINK-31109 > URL: https://issues.apache.org/jira/browse/FLINK-31109 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.17.0 >Reporter: Venkata krishnan Sowrirajan >Assignee: Venkata krishnan Sowrirajan >Priority: Blocker > > With > {code:java} > security.kerberos.fetch.delegation-token: false > {code} > and delegation tokens obtained through our internal service which sets both > HADOOP_TOKEN_FILE_LOCATION to pick up the DTs and also sets the > HADOOP_PROXY_USER which fails with the below error > {code:java} > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/export/home/vsowrira/flink-1.18-SNAPSHOT/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/export/apps/hadoop/hadoop-bin_2100503/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. 
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > org.apache.flink.runtime.security.modules.SecurityModule$SecurityInstallException: > Unable to set the Hadoop login user > at > org.apache.flink.runtime.security.modules.HadoopModule.install(HadoopModule.java:106) > at > org.apache.flink.runtime.security.SecurityUtils.installModules(SecurityUtils.java:76) > at > org.apache.flink.runtime.security.SecurityUtils.install(SecurityUtils.java:57) > at > org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1188) > at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157) > Caused by: java.lang.UnsupportedOperationException: Proxy user is not > supported > at > org.apache.flink.runtime.security.token.hadoop.KerberosLoginProvider.throwProxyUserNotSupported(KerberosLoginProvider.java:137) > at > org.apache.flink.runtime.security.token.hadoop.KerberosLoginProvider.isLoginPossible(KerberosLoginProvider.java:81) > at > org.apache.flink.runtime.security.modules.HadoopModule.install(HadoopModule.java:73) > ... 4 more > {code} > This seems to have gotten changed after > [480e6edf|https://github.com/apache/flink/commit/480e6edf9732f8334ef7576080fdbfc98051cb28] > ([FLINK-28330][runtime][security] Remove old delegation token framework code)
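The stack trace above shows `HadoopModule.install` probing `KerberosLoginProvider.isLoginPossible` even though delegation-token fetching is disabled, which is what rejects the `HADOOP_PROXY_USER` setup. A hedged sketch of the behavior the reporter appears to expect follows; the method and parameter names are hypothetical, not the actual `HadoopModule`/`KerberosLoginProvider` code or the eventual fix.

```java
// Hypothetical sketch, not actual Flink code: when
// security.kerberos.fetch.delegation-token is false, skip the Kerberos login
// probe entirely, so proxy-user setups that obtain tokens externally
// (e.g. via HADOOP_TOKEN_FILE_LOCATION) never hit
// "Proxy user is not supported".
public class KerberosGuardSketch {
    public static boolean shouldProbeKerberosLogin(
            boolean fetchDelegationTokens, boolean isProxyUser) {
        if (!fetchDelegationTokens) {
            // tokens are provided out of band; nothing to probe
            return false;
        }
        if (isProxyUser) {
            throw new UnsupportedOperationException("Proxy user is not supported");
        }
        return true;
    }
}
```

With this ordering, the proxy-user rejection only fires when Flink itself would actually try to fetch delegation tokens.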
[GitHub] [flink] venkata91 commented on pull request #21985: [FLINK-31162][yarn] Use currUsr.getCredentials.getTokens instead of currUsr.getTokens
venkata91 commented on PR #21985: URL: https://github.com/apache/flink/pull/21985#issuecomment-1438988558 > Re title: You mean 'Use currUsr.getCredentials().getAllTokens() instead of currUsr.getTokens() to avoid including private tokens' ? Oh. Yes. Thanks for pointing out.
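For context on the title correction: a Hadoop UGI can hold "private" tokens, internal per-service clones that should not be shipped along with the application's credentials. The sketch below is a simplified model of that distinction with illustrative names; it is not the real Hadoop `UserGroupInformation`/`Credentials` API, only a demonstration of why a raw token view and a private-token-filtered view can differ.

```java
// Simplified model (hypothetical names, not the Hadoop API): the raw token
// set includes private per-service clones, while the filtered view -- the
// one appropriate for handing tokens to the application -- excludes them.
public class TokenFilterSketch {
    public static final class Token {
        final String service;
        final boolean isPrivate;

        public Token(String service, boolean isPrivate) {
            this.service = service;
            this.isPrivate = isPrivate;
        }
    }

    /** Raw view: every token the user holds, private clones included. */
    public static int countAll(Token[] tokens) {
        return tokens.length;
    }

    /** Filtered view: private tokens excluded. */
    public static int countPublic(Token[] tokens) {
        int n = 0;
        for (Token t : tokens) {
            if (!t.isPrivate) {
                n++;
            }
        }
        return n;
    }
}
```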
[GitHub] [flink] venkata91 commented on pull request #21985: [FLINK-31162][yarn] Use currUsr.getCredentials.getTokens instead of currUsr.getTokens
venkata91 commented on PR #21985: URL: https://github.com/apache/flink/pull/21985#issuecomment-1438989508 > Change looks good to me. Not sure how testable this is ... Yeah, not sure how we can add either unit tests or integration tests.
[GitHub] [flink] venkata91 commented on pull request #21985: [FLINK-31162][yarn] Use currUsr.getTokens instead of currUsr.getCrede…
venkata91 commented on PR #21985: URL: https://github.com/apache/flink/pull/21985#issuecomment-1438936505 Please take a look. cc @gaborgsomogyi @MartijnVisser @becketqin