[jira] [Commented] (FLINK-29427) LookupJoinITCase failed with classloader problem
[ https://issues.apache.org/jira/browse/FLINK-29427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649614#comment-17649614 ] Matthias Pohl commented on FLINK-29427:
---
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44084=logs=de826397-1924-5900-0034-51895f69d4b7=f311e913-93a2-5a37-acab-4a63e1328f94=17551

> LookupJoinITCase failed with classloader problem
>
> Key: FLINK-29427
> URL: https://issues.apache.org/jira/browse/FLINK-29427
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.16.0, 1.17.0
> Reporter: Huang Xingbo
> Assignee: Alexander Smirnov
> Priority: Blocker
> Labels: pull-request-available, test-stability
>
> {code:java}
> 2022-09-27T02:49:20.9501313Z Sep 27 02:49:20 Caused by: org.codehaus.janino.InternalCompilerException: Compiling "KeyProjection$108341": Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
> 2022-09-27T02:49:20.9502654Z Sep 27 02:49:20 	at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:382)
> 2022-09-27T02:49:20.9503366Z Sep 27 02:49:20 	at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:237)
> 2022-09-27T02:49:20.9504044Z Sep 27 02:49:20 	at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:465)
> 2022-09-27T02:49:20.9504704Z Sep 27 02:49:20 	at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:216)
> 2022-09-27T02:49:20.9505341Z Sep 27 02:49:20 	at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:207)
> 2022-09-27T02:49:20.9505965Z Sep 27 02:49:20 	at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
> 2022-09-27T02:49:20.9506584Z Sep 27 02:49:20 	at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:75)
> 2022-09-27T02:49:20.9507261Z Sep 27 02:49:20 	at org.apache.flink.table.runtime.generated.CompileUtils.doCompile(CompileUtils.java:104)
> 2022-09-27T02:49:20.9507883Z Sep 27 02:49:20 	... 30 more
> 2022-09-27T02:49:20.9509266Z Sep 27 02:49:20 Caused by: java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
> 2022-09-27T02:49:20.9510835Z Sep 27 02:49:20 	at org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:184)
> 2022-09-27T02:49:20.9511760Z Sep 27 02:49:20 	at org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:192)
> 2022-09-27T02:49:20.9512456Z Sep 27 02:49:20 	at java.lang.Class.forName0(Native Method)
> 2022-09-27T02:49:20.9513014Z Sep 27 02:49:20 	at java.lang.Class.forName(Class.java:348)
> 2022-09-27T02:49:20.9513649Z Sep 27 02:49:20 	at org.codehaus.janino.ClassLoaderIClassLoader.findIClass(ClassLoaderIClassLoader.java:89)
> 2022-09-27T02:49:20.9514339Z Sep 27 02:49:20 	at org.codehaus.janino.IClassLoader.loadIClass(IClassLoader.java:312)
> 2022-09-27T02:49:20.9514990Z Sep 27 02:49:20 	at org.codehaus.janino.UnitCompiler.findTypeByName(UnitCompiler.java:8556)
> 2022-09-27T02:49:20.9515659Z Sep 27 02:49:20 	at org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:6749)
> 2022-09-27T02:49:20.9516337Z Sep 27 02:49:20 	at org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:6594)
> 2022-09-27T02:49:20.9516989Z Sep 27 02:49:20 	at org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:6573)
> 2022-09-27T02:49:20.9517632Z Sep 27 02:49:20 	at org.codehaus.janino.UnitCompiler.access$13900(UnitCompiler.java:215)
> 2022-09-27T02:49:20.9518319Z Sep 27 02:49:20 	at org.codehaus.janino.UnitCompiler$22$1.visitReferenceType(UnitCompiler.java:6481)
> 2022-09-27T02:49:20.9519018Z Sep 27 02:49:20 	at org.codehaus.janino.UnitCompiler$22$1.visitReferenceType(UnitCompiler.java:6476)
> 2022-09-27T02:49:20.9519680Z Sep 27 02:49:20 	at org.codehaus.janino.Java$ReferenceType.accept(Java.java:3928)
> 2022-09-27T02:49:20.9520386Z Sep 27 02:49:20 	at org.codehaus.janino.UnitCompiler$22.visitType(UnitCompiler.java:6476)
> 2022-09-27T02:49:20.9521042Z Sep 27 02:49:20 	at org.codehaus.janino.UnitCompiler$22.visitType(UnitCompiler.java:6469)
> 2022-09-27T02:49:20.9521677Z Sep 27 02:49:20 	at
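The exception text above names its own escape hatch: the leaked-classloader check can be disabled via configuration while the underlying leak is investigated. A minimal flink-conf.yaml sketch (the option name is taken verbatim from the error message; disabling it is a workaround, not a fix):

```yaml
# Disables the "Trying to access closed classloader" safety check quoted in the
# stack trace above. Only use this while the classloader leak itself is being fixed.
classloader.check-leaked-classloader: false
```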
[jira] [Commented] (FLINK-29859) TPC-DS end-to-end test with adaptive batch scheduler failed due to non-empty .out files.
[ https://issues.apache.org/jira/browse/FLINK-29859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649613#comment-17649613 ] Matthias Pohl commented on FLINK-29859:
---
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44084=logs=f8e16326-dc75-5ba0-3e95-6178dd55bf6c=15c1d318-5ca8-529f-77a2-d113a700ec34=9253

> TPC-DS end-to-end test with adaptive batch scheduler failed due to non-empty .out files.
>
> Key: FLINK-29859
> URL: https://issues.apache.org/jira/browse/FLINK-29859
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.16.0, 1.17.0
> Reporter: Leonard Xu
> Priority: Major
> Labels: test-stability
>
> Nov 03 02:02:12 [FAIL] 'TPC-DS end-to-end test with adaptive batch scheduler' failed after 21 minutes and 44 seconds! Test exited with exit code 0 but the logs contained errors, exceptions or non-empty .out files
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=42766=logs=ae4f8708-9994-57d3-c2d7-b892156e7812=af184cdd-c6d8-5084-0b69-7e9c67b35f7a

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Closed] (FLINK-29664) Collect subpartition sizes of BLOCKING result partitions
[ https://issues.apache.org/jira/browse/FLINK-29664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lijie Wang closed FLINK-29664.
---
Resolution: Done

> Collect subpartition sizes of BLOCKING result partitions
>
> Key: FLINK-29664
> URL: https://issues.apache.org/jira/browse/FLINK-29664
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Reporter: Lijie Wang
> Assignee: Lijie Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.17.0
>
> In order to divide subpartition ranges according to the amount of data, the scheduler needs to collect the size of each subpartition produced by upstream tasks.
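The motivation in the issue description — dividing subpartition ranges according to the amount of data collected from upstream tasks — can be illustrated with a simple greedy split. This is only a sketch of the idea, not Flink's actual scheduler code; the class and method names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative greedy split of subpartitions into ranges of roughly equal data volume. */
public class SubpartitionRangeSplitter {

    /** Returns inclusive [start, end] index pairs covering all subpartitions in order. */
    static List<int[]> splitIntoRanges(long[] subpartitionBytes, int numRanges) {
        long total = 0;
        for (long b : subpartitionBytes) {
            total += b;
        }
        // Desired amount of data per downstream range.
        long target = Math.max(1, total / numRanges);
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        long acc = 0;
        for (int i = 0; i < subpartitionBytes.length; i++) {
            acc += subpartitionBytes[i];
            // Close the current range once it reaches the target,
            // leaving room for the final range to absorb the remainder.
            if (acc >= target && ranges.size() < numRanges - 1) {
                ranges.add(new int[] {start, i});
                start = i + 1;
                acc = 0;
            }
        }
        if (start < subpartitionBytes.length) {
            ranges.add(new int[] {start, subpartitionBytes.length - 1});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Six subpartitions totalling 3000 "bytes", split into three ~1000-byte ranges.
        long[] sizes = {300, 700, 200, 800, 400, 600};
        for (int[] r : splitIntoRanges(sizes, 3)) {
            System.out.println(r[0] + "-" + r[1]); // prints 0-1, 2-3, 4-5 on separate lines
        }
    }
}
```

The point of size-based (rather than count-based) splitting is that skewed subpartitions no longer force one downstream task to read far more data than its siblings.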
[jira] [Commented] (FLINK-29664) Collect subpartition sizes of BLOCKING result partitions
[ https://issues.apache.org/jira/browse/FLINK-29664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649603#comment-17649603 ] Lijie Wang commented on FLINK-29664:
---
Done via 96795ae5bf0b6d6c8b1fa793be742a7c1af3
[GitHub] [flink] wanglijie95 closed pull request #21111: [FLINK-29664][runtime] Collect subpartition sizes of blocking result partitions
wanglijie95 closed pull request #21111: [FLINK-29664][runtime] Collect subpartition sizes of blocking result partitions
URL: https://github.com/apache/flink/pull/21111

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (FLINK-30456) OLM Bundle Description Version Problems
[ https://issues.apache.org/jira/browse/FLINK-30456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora reassigned FLINK-30456:
---
Assignee: James Busche

> OLM Bundle Description Version Problems
>
> Key: FLINK-30456
> URL: https://issues.apache.org/jira/browse/FLINK-30456
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.3.0
> Reporter: James Busche
> Assignee: James Busche
> Priority: Major
> Labels: pull-request-available
>
> OLM is working great with OperatorHub, but I noticed a few items that need fixing.
> 1. The basic.yaml example version is release-1.1 instead of the latest release (release-1.3). This needs fixing in two places:
> tools/olm/generate-olm-bundle.sh
> tools/olm/docker-entry.sh
> 2. The label versions in the description are hardcoded to 1.2.0 instead of the latest release (1.3.0).
> 3. The Provider is a blank space " " but soon needs to have some text in it to avoid CI errors with the latest operator-sdk versions. The person who noticed it recommended "Community" for now, but maybe we can begin making it "The Apache Software Foundation" now? Not sure if we're ready for that yet or not...
>
> I'm working on a PR to address these items. Can you assign the issue to me? Thanks!
> fyi [~tedhtchang] [~gyfora]
[GitHub] [flink-kubernetes-operator] jbusche commented on pull request #491: [FLINK-30456] Fixing Version and provider in OLM Description
jbusche commented on PR #491:
URL: https://github.com/apache/flink-kubernetes-operator/pull/491#issuecomment-1358917278

I tested my PR briefly on OpenShift 4.10 and will do some more thorough testing tomorrow and post the results here. I think @tedhtchang might test on his server(s) too.
[jira] [Updated] (FLINK-30453) Fix 'can't find CatalogFactory' error when using FLINK sql-client to add table store bundle jar
[ https://issues.apache.org/jira/browse/FLINK-30453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-30453:
---
Labels: pull-request-available (was: )

> Fix 'can't find CatalogFactory' error when using FLINK sql-client to add table store bundle jar
>
> Key: FLINK-30453
> URL: https://issues.apache.org/jira/browse/FLINK-30453
> Project: Flink
> Issue Type: Bug
> Components: Table Store
> Reporter: yuzelin
> Priority: Major
> Labels: pull-request-available
> Fix For: table-store-0.3.0
>
> Flink 1.16 introduced a new mechanism that allows passing a ClassLoader to EnvironmentSettings ([FLINK-15635|https://issues.apache.org/jira/browse/FLINK-15635]), and the Flink SQL Client passes a `ClientWrapperClassLoader`, so the table store CatalogFactory cannot be found if it is added by an 'ADD JAR' statement through the SQL Client.
[GitHub] [flink-table-store] yuzelin opened a new pull request, #442: [FLINK-30453] Fix 'can't find CatalogFactory' error when using FLINK sql-client to add table store bundle jar
yuzelin opened a new pull request, #442:
URL: https://github.com/apache/flink-table-store/pull/442

Main changes:
- Allow passing a ClassLoader parameter when creating a catalog;
- Pass the ClassLoader from the context when creating `FlinkCatalog`.
[GitHub] [flink] huwh commented on pull request #21496: [FLINK-29869][ResourceManager] make ResourceAllocator declarative.
huwh commented on PR #21496:
URL: https://github.com/apache/flink/pull/21496#issuecomment-1358905082

Thanks for your comment, @xintongsong. I made the following changes:
1. Deleted UnwantedWorker and UnwantedWorkerWithResourceProfile; unwanted workers are now treated as a hint.
2. Added totalWorkerCounter to maintain the count of all workers that have a resource spec (including recovered workers).
3. Removed the delayed check of declarationResource, since its cost is not very heavy.
[jira] [Commented] (FLINK-30260) FLIP-271: Add initial Autoscaling implementation
[ https://issues.apache.org/jira/browse/FLINK-30260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649592#comment-17649592 ] SwathiChandrashekar commented on FLINK-30260:
---
Hi [~mxm], [~gyfora], I would like to volunteer if there are any subtasks where I can contribute.

> FLIP-271: Add initial Autoscaling implementation
>
> Key: FLINK-30260
> URL: https://issues.apache.org/jira/browse/FLINK-30260
> Project: Flink
> Issue Type: New Feature
> Components: Kubernetes Operator
> Reporter: Maximilian Michels
> Assignee: Maximilian Michels
> Priority: Major
> Fix For: kubernetes-operator-1.4.0
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling
[jira] [Commented] (FLINK-30435) `SHOW CREATE TABLE` statement shows column comment
[ https://issues.apache.org/jira/browse/FLINK-30435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649587#comment-17649587 ] Yubin Li commented on FLINK-30435:
---
Hi [~lzljs3620320], after FLINK-29679 was merged, only a few lines need to be modified to implement the feature. Could you please give it a review? Thanks!

> `SHOW CREATE TABLE` statement shows column comment
>
> Key: FLINK-30435
> URL: https://issues.apache.org/jira/browse/FLINK-30435
> Project: Flink
> Issue Type: Improvement
> Reporter: Yubin Li
> Priority: Major
> Labels: pull-request-available
>
> After a table with column comments is created, the results generated by the `SHOW CREATE TABLE` statement lose the column comments.
> As created tables have been migrated to the new schema framework in https://issues.apache.org/jira/browse/FLINK-29679, it is easy to get this done.
> Note: the feature doesn't change SQL syntax; it just makes the output consistent with the DDL and more user-friendly.
[jira] [Updated] (FLINK-30445) Add Flink Web3 Connector
[ https://issues.apache.org/jira/browse/FLINK-30445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] luoyuxia updated FLINK-30445:
---
Description:
# Web3 is very hot. But if you search GitHub for open source blockchain explorers, the project with the most stars is blockscout ([https://github.com/blockscout/blockscout]), which uses Elixir as a parallel engine to sync blocks from a blockchain node into a file (CSV format). I think Flink is the best solution for this ingestion. Reasons:
(1) A blockchain connector needs to support different chains, including Ethereum, Bitcoin, Solana, etc., through JSON-RPC.
(2) Like EtherScan, the blockchain needs to fetch the latest blocks into storage for the index to search.
(3) As a supplement to (2), we need a connector to fully sync all blocks from a blockchain node. I think Flink's stream/batch alignment feature suits this scenario.
(4) According to FLIP-27, we could use the block number as the SourceSplit to read. It is very natural.
(5) The Flink community could use the web3 topic to get PR effects in the web3 circle.

> Add Flink Web3 Connector
>
> Key: FLINK-30445
> URL: https://issues.apache.org/jira/browse/FLINK-30445
> Project: Flink
> Issue Type: New Feature
> Reporter: Junyao Huang
> Priority: Major
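Point (4) of the description — using the block number as the SourceSplit — can be sketched as follows. Note this is purely illustrative: the `SourceSplit` interface below only mimics the shape of Flink's FLIP-27 interface of the same name, and the class and method names are invented, not part of any proposed connector:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: modeling block-number ranges as FLIP-27-style source splits. */
public class BlockRangeSplits {

    /** Minimal stand-in for Flink's SourceSplit interface (split id only). */
    interface SourceSplit {
        String splitId();
    }

    /** A split covering an inclusive range of block numbers on the chain. */
    static final class BlockRangeSplit implements SourceSplit {
        final long startBlock; // inclusive
        final long endBlock;   // inclusive

        BlockRangeSplit(long startBlock, long endBlock) {
            this.startBlock = startBlock;
            this.endBlock = endBlock;
        }

        @Override
        public String splitId() {
            return startBlock + "-" + endBlock;
        }
    }

    /** Enumerates fixed-size block ranges, e.g. for a batch backfill of historical blocks. */
    static List<BlockRangeSplit> enumerate(long fromBlock, long toBlock, long blocksPerSplit) {
        List<BlockRangeSplit> splits = new ArrayList<>();
        for (long start = fromBlock; start <= toBlock; start += blocksPerSplit) {
            long end = Math.min(start + blocksPerSplit - 1, toBlock);
            splits.add(new BlockRangeSplit(start, end));
        }
        return splits;
    }

    public static void main(String[] args) {
        for (BlockRangeSplit s : enumerate(0, 99, 40)) {
            System.out.println(s.splitId()); // prints 0-39, 40-79, 80-99 on separate lines
        }
    }
}
```

Because block numbers are monotonically increasing, a split enumerator could hand out bounded ranges for the historical (batch) part and keep assigning new ranges as the chain head advances for the streaming part — which is exactly the stream/batch alignment argument in point (3).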
[jira] [Commented] (FLINK-30445) Add Flink Web3 Connector
[ https://issues.apache.org/jira/browse/FLINK-30445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649558#comment-17649558 ] Junyao Huang commented on FLINK-30445:
---
[https://lists.apache.org/list.html?d...@flink.apache.org] posted. [~martijnvisser]
[jira] [Commented] (FLINK-30445) Add Flink Web3 Connector
[ https://issues.apache.org/jira/browse/FLINK-30445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649553#comment-17649553 ] Junyao Huang commented on FLINK-30445:
---
Is it this one? [https://lists.apache.org/list.html?u...@flink.apache.org] [~martijnvisser]
[GitHub] [flink] ramkrish86 commented on a diff in pull request #21508: [FLINK-30128] Introduce Azure Data Lake Gen2 APIs in the Hadoop Recoverable path
ramkrish86 commented on code in PR #21508:
URL: https://github.com/apache/flink/pull/21508#discussion_r1052900496

## flink-filesystems/flink-azure-fs-hadoop/pom.xml:
@@ -185,13 +185,6 @@ under the License.
-
- org.apache.flink:flink-hadoop-fs
-
- org/apache/flink/runtime/util/HadoopUtils
- org/apache/flink/runtime/fs/hdfs/HadoopRecoverable*
-
-

Review Comment:
Can I resolve this?
[GitHub] [flink] ramkrish86 commented on a diff in pull request #21508: [FLINK-30128] Introduce Azure Data Lake Gen2 APIs in the Hadoop Recoverable path
ramkrish86 commented on code in PR #21508:
URL: https://github.com/apache/flink/pull/21508#discussion_r1052900374

## flink-filesystems/flink-azure-fs-hadoop/src/main/java/org/apache/flink/fs/azurefs/AzureBlobFsRecoverableDataOutputStream.java:
@@ -0,0 +1,299 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.fs.azurefs;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.core.fs.RecoverableFsDataOutputStream;
+import org.apache.flink.core.fs.RecoverableWriter;
+import org.apache.flink.runtime.fs.hdfs.HadoopFsRecoverable;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * An implementation of the {@link RecoverableFsDataOutputStream} for AzureBlob's file system
+ * abstraction.
+ */
+@Internal
+public class AzureBlobFsRecoverableDataOutputStream extends RecoverableFsDataOutputStream {
+
+    private static final Logger LOG =
+            LoggerFactory.getLogger(AzureBlobFsRecoverableDataOutputStream.class);
+
+    private final FileSystem fs;
+
+    private final Path targetFile;
+
+    private final Path tempFile;
+
+    private final FSDataOutputStream out;
+
+    // Not final to override in tests
+    public static int minBufferLength = 2097152;
+
+    // init to 0. When ever recovery is done add this to the pos.
+    private long initialFileSize = 0;
+
+    public AzureBlobFsRecoverableDataOutputStream(FileSystem fs, Path targetFile, Path tempFile)
+            throws IOException {
+        this.fs = checkNotNull(fs);
+        this.targetFile = checkNotNull(targetFile);
+        LOG.debug("The targetFile is {}", targetFile.getName());
+        this.tempFile = checkNotNull(tempFile);
+        LOG.debug("The tempFile is {}", tempFile.getName());
+        this.out = fs.create(tempFile);
+    }
+
+    public AzureBlobFsRecoverableDataOutputStream(FileSystem fs, HadoopFsRecoverable recoverable)
+            throws IOException {
+        this.fs = checkNotNull(fs);
+        this.targetFile = checkNotNull(recoverable.targetFile());
+        this.tempFile = checkNotNull(recoverable.tempFile());
+        long len = fs.getFileStatus(tempFile).getLen();
+        LOG.info("The recoverable offset is {} and the file len is {}", recoverable.offset(), len);
+        // Happens when we recover from a previously committed offset. Otherwise this is not
+        // really needed
+        if (len > recoverable.offset()) {
+            truncate(fs, recoverable);
+        } else if (len < recoverable.offset()) {
+            LOG.error(
+                    "Temp file length {} is less than the expected recoverable offset {}",
+                    len,
+                    recoverable.offset());
+            throw new IOException(
+                    "Unable to create recoverable outputstream as length of file "
+                            + len
+                            + " is less than "
+                            + "recoverable offset "
+                            + recoverable.offset());
+        }
+        // In ABFS when we try to append we don't account for the initial file size like we do in
+        // DFS.
+        // So we explicitly store this and when we do a persist call we make use of it.
+        initialFileSize = fs.getFileStatus(tempFile).getLen();
+        out = fs.append(tempFile);
+        LOG.debug("Created a new OS for appending {}", tempFile);
+    }
+
+    private void truncate(FileSystem fs, HadoopFsRecoverable recoverable) throws IOException {
+        Path renameTempPath = new Path(tempFile.toString() + ".rename");
+        try {
+            LOG.info(
+                    "Creating the temp rename file {} for truncating the tempFile {}",
+                    renameTempPath,
+                    tempFile);
+            FSDataOutputStream fsDataOutputStream =
[jira] [Created] (FLINK-30457) Introduce periodic compaction to FRocksDB
Yun Tang created FLINK-30457:
---
Summary: Introduce periodic compaction to FRocksDB
Key: FLINK-30457
URL: https://issues.apache.org/jira/browse/FLINK-30457
Project: Flink
Issue Type: Improvement
Components: Runtime / State Backends
Reporter: Yun Tang
Assignee: Yun Tang
Fix For: 1.17.0

RocksDB supports periodic compaction once the compaction filter is enabled (see https://github.com/facebook/rocksdb/wiki/Leveled-Compaction#periodic-compaction), which is really useful for the Flink TTL feature to pick up really old files during compaction. The current FRocksDB base version, 6.20.3, already includes the necessary code on the C++ side (https://github.com/facebook/rocksdb/pull/5166 and https://github.com/facebook/rocksdb/pull/5184); we just need to cherry-pick https://github.com/facebook/rocksdb/pull/8579 on the Java side to support it in Flink.
[GitHub] [flink] yangjunhan commented on a diff in pull request #21483: [FLINK-30358][rest] Exception location contains TM resource id and link to TM pages.
yangjunhan commented on code in PR #21483:
URL: https://github.com/apache/flink/pull/21483#discussion_r1052849668

## flink-runtime-web/web-dashboard/src/app/pages/job/exceptions/job-exceptions.component.ts:
@@ -151,4 +153,10 @@ export class JobExceptionsComponent implements OnInit, OnDestroy {
     });
   });
 }
+
+  public navigateTo(taskManagerId: string): void {
+    if (taskManagerId != null) {

Review Comment:
```suggestion
    if (taskManagerId !== null) {
```
[GitHub] [flink] yangjunhan commented on a diff in pull request #21483: [FLINK-30358][rest] Exception location contains TM resource id and link to TM pages.
yangjunhan commented on code in PR #21483:
URL: https://github.com/apache/flink/pull/21483#discussion_r1052849595

## flink-runtime-web/web-dashboard/src/app/pages/job/exceptions/job-exceptions.component.ts:
@@ -151,4 +153,10 @@ export class JobExceptionsComponent implements OnInit, OnDestroy {
     });
   });
 }
+
+  public navigateTo(taskManagerId: string): void {

Review Comment:
```suggestion
  public navigateTo(taskManagerId: string | null): void {
```
[jira] [Updated] (FLINK-24932) Frocksdb cannot run on Apple M1
[ https://issues.apache.org/jira/browse/FLINK-24932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yun Tang updated FLINK-24932: - Priority: Major (was: Minor) > Frocksdb cannot run on Apple M1 > --- > > Key: FLINK-24932 > URL: https://issues.apache.org/jira/browse/FLINK-24932 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: Yun Tang >Assignee: Yun Tang >Priority: Major > Fix For: 1.17.0 > > > After we bump up RocksDB version to 6.20.3, we support to run RocksDB on > linux arm cluster. However, according to the feedback from Robert, Apple M1 > machines cannot run FRocksDB yet: > {code:java} > java.lang.Exception: Exception while creating StreamOperatorStateContext. > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255) > ~[flink-streaming-java_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268) > ~[flink-streaming-java_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:109) > ~[flink-streaming-java_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:711) > ~[flink-streaming-java_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55) > ~[flink-streaming-java_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:687) > ~[flink-streaming-java_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654) > ~[flink-streaming-java_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958) > ~[flink-runtime-1.14.0.jar:1.14.0] > 
at > org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) > ~[flink-runtime-1.14.0.jar:1.14.0] > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) > ~[flink-runtime-1.14.0.jar:1.14.0] > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) > ~[flink-runtime-1.14.0.jar:1.14.0] > at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_312] > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for StreamFlatMap_c21234bcbf1e8eb4c61f1927190efebd_(1/1) from > any of the 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160) > ~[flink-streaming-java_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:346) > ~[flink-streaming-java_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:164) > ~[flink-streaming-java_2.11-1.14.0.jar:1.14.0] > ... 
11 more > Caused by: java.io.IOException: Could not load the native RocksDB library > at > org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.ensureRocksDBIsLoaded(EmbeddedRocksDBStateBackend.java:882) > ~[flink-statebackend-rocksdb_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:402) > ~[flink-statebackend-rocksdb_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:345) > ~[flink-statebackend-rocksdb_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:87) > ~[flink-statebackend-rocksdb_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:329) > ~[flink-streaming-java_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168) > ~[flink-streaming-java_2.11-1.14.0.jar:1.14.0] > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > ~[flink-streaming-java_2.11-1.14.0.jar:1.14.0] > at >
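Until an Apple M1 (aarch64) build of FRocksDB is available, one possible stop-gap for local development — a suggestion on the editor's part, not something proposed in this ticket — is to switch the job to the heap-based state backend, which needs no native library. This is only appropriate when the job's state fits in JVM heap memory:

```yaml
# flink-conf.yaml — stop-gap for machines where the FRocksDB native
# library cannot be loaded. Only suitable if state fits on the JVM heap.
state.backend: hashmap
state.checkpoints.dir: file:///tmp/flink-checkpoints  # example path
```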
[GitHub] [flink-kubernetes-operator] jbusche commented on pull request #491: [FLINK-30456] Fixing Version and provider in OLM Description
jbusche commented on PR #491: URL: https://github.com/apache/flink-kubernetes-operator/pull/491#issuecomment-1358748799 Yes @morhidi it took me a while to fill out completely -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] morhidi commented on pull request #491: [FLINK-30456] Fixing Version and provider in OLM Description
morhidi commented on PR #491: URL: https://github.com/apache/flink-kubernetes-operator/pull/491#issuecomment-1358742932 @jbusche please fill out the PR template properly
[jira] [Updated] (FLINK-30456) OLM Bundle Description Version Problems
[ https://issues.apache.org/jira/browse/FLINK-30456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-30456: --- Labels: pull-request-available (was: ) > OLM Bundle Description Version Problems > --- > > Key: FLINK-30456 > URL: https://issues.apache.org/jira/browse/FLINK-30456 > Project: Flink > Issue Type: Bug > Components: Kubernetes Operator >Affects Versions: kubernetes-operator-1.3.0 >Reporter: James Busche >Priority: Major > Labels: pull-request-available > > OLM is working great with OperatorHub, but I noticed a few items that need > fixing. > 1. The basic.yaml example version is release-1.1 instead of the latest > release (release-1.3). This needs fixing in two places: > tools/olm/generate-olm-bundle.sh > tools/olm/docker-entry.sh > 2. The label versions in the description are hardcoded to 1.2.0 instead of > the latest release (1.3.0) > 3. The Provider is a blank space " " but soon needs to have some text in there > to avoid CI errors with the latest operator-sdk versions. The person who > noticed it recommended "Community" for now, but maybe we can begin making it > "The Apache Software Foundation" now? Not sure if we're ready for that yet > or not... > > I'm working on a PR to address these items. Can you assign the issue to me? > Thanks! > fyi [~tedhtchang] [~gyfora] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink-table-store] zjureel closed pull request #441: [FLINK-30423] Introduce codegen for CastExecutor
zjureel closed pull request #441: [FLINK-30423] Introduce codegen for CastExecutor URL: https://github.com/apache/flink-table-store/pull/441
[GitHub] [flink-kubernetes-operator] jbusche opened a new pull request, #491: [FLINK-30456] Fixing Version and provider in OLM Description
jbusche opened a new pull request, #491: URL: https://github.com/apache/flink-kubernetes-operator/pull/491 Signed-off-by: James Busche ## What is the purpose of the change *(For example: This pull request adds a new feature to periodically create and maintain savepoints through the `FlinkDeployment` custom resource.)* ## Brief change log *(for example:)* - *Periodic savepoint trigger is introduced to the custom resource* - *The operator checks on reconciliation whether the required time has passed* - *The JobManager's dispose savepoint API is used to clean up obsolete savepoints* ## Verifying this change *(Please pick either of the following options)* This change is a trivial rework / code cleanup without any test coverage. *(or)* This change is already covered by existing tests, such as *(please describe tests)*. *(or)* This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end deployment with large payloads (100MB)* - *Extended integration test for recovery after master (JobManager) failure* - *Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / no) - The public API, i.e., is any changes to the `CustomResourceDescriptors`: (yes / no) - Core observer or reconciler logic that is regularly executed: (yes / no) ## Documentation - Does this pull request introduce a new feature? (yes / no) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] xintongsong commented on pull request #21508: [FLINK-30128] Introduce Azure Data Lake Gen2 APIs in the Hadoop Recoverable path
xintongsong commented on PR #21508: URL: https://github.com/apache/flink/pull/21508#issuecomment-1358726442 > Mainly 2 TODOs. One is the 2MB buffer and next is the flush support. I think for now it should be good. But I can raise TODO JIRAs and work on it later. Is that ok ? I'm fine with the plan. Just trying to understand the plan. I have no strong opinions on this.
[GitHub] [flink] xintongsong commented on a diff in pull request #21508: [FLINK-30128] Introduce Azure Data Lake Gen2 APIs in the Hadoop Recoverable path
xintongsong commented on code in PR #21508: URL: https://github.com/apache/flink/pull/21508#discussion_r1052787197 ## flink-filesystems/flink-azure-fs-hadoop/src/main/java/org/apache/flink/fs/azurefs/AzureBlobFsRecoverableDataOutputStream.java: ## @@ -0,0 +1,299 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.fs.azurefs; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.core.fs.RecoverableFsDataOutputStream; +import org.apache.flink.core.fs.RecoverableWriter; +import org.apache.flink.runtime.fs.hdfs.HadoopFsRecoverable; + +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.FileNotFoundException; +import java.io.IOException; + +import static org.apache.flink.util.Preconditions.checkNotNull; + +/** + * An implementation of the {@link RecoverableFsDataOutputStream} for AzureBlob's file system + * abstraction. 
+ */ +@Internal +public class AzureBlobFsRecoverableDataOutputStream extends RecoverableFsDataOutputStream { + +private static final Logger LOG = + LoggerFactory.getLogger(AzureBlobFsRecoverableDataOutputStream.class); + +private final FileSystem fs; + +private final Path targetFile; + +private final Path tempFile; + +private final FSDataOutputStream out; + +// Not final to override in tests +public static int minBufferLength = 2097152; + +// init to 0. When ever recovery is done add this to the pos. +private long initialFileSize = 0; + +public AzureBlobFsRecoverableDataOutputStream(FileSystem fs, Path targetFile, Path tempFile) +throws IOException { +this.fs = checkNotNull(fs); +this.targetFile = checkNotNull(targetFile); +LOG.debug("The targetFile is {}", targetFile.getName()); +this.tempFile = checkNotNull(tempFile); +LOG.debug("The tempFile is {}", tempFile.getName()); +this.out = fs.create(tempFile); +} + +public AzureBlobFsRecoverableDataOutputStream(FileSystem fs, HadoopFsRecoverable recoverable) +throws IOException { +this.fs = checkNotNull(fs); +this.targetFile = checkNotNull(recoverable.targetFile()); +this.tempFile = checkNotNull(recoverable.tempFile()); +long len = fs.getFileStatus(tempFile).getLen(); +LOG.info("The recoverable offset is {} and the file len is {}", recoverable.offset(), len); +// Happens when we recover from a previously committed offset. Otherwise this is not +// really needed +if (len > recoverable.offset()) { +truncate(fs, recoverable); +} else if (len < recoverable.offset()) { +LOG.error( +"Temp file length {} is less than the expected recoverable offset {}", +len, +recoverable.offset()); +throw new IOException( +"Unable to create recoverable outputstream as length of file " ++ len ++ " is less than " ++ "recoverable offset " ++ recoverable.offset()); +} +// In ABFS when we try to append we don't account for the initial file size like we do in +// DFS. +// So we explicitly store this and when we do a persist call we make use of it. 
+initialFileSize = fs.getFileStatus(tempFile).getLen(); +out = fs.append(tempFile); +LOG.debug("Created a new OS for appending {}", tempFile); +} + +private void truncate(FileSystem fs, HadoopFsRecoverable recoverable) throws IOException { +Path renameTempPath = new Path(tempFile.toString() + ".rename"); +try { +LOG.info( +"Creating the temp rename file {} for truncating the tempFile {}", +renameTempPath, +tempFile); +FSDataOutputStream fsDataOutputStream =
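The `truncate` helper in the diff above is cut off mid-method, but the visible portion suggests a copy-into-side-file-and-rename approach (ABFS does not expose a native truncate). As a rough, generic illustration of that pattern — plain `java.nio` on local files with hypothetical names, not the actual `AzureBlobFsRecoverableDataOutputStream` code, which streams rather than buffering the whole file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

public class TruncateByRename {
    /**
     * Truncates {@code file} to {@code length} bytes by writing the prefix
     * into a ".rename" side file and moving it over the original.
     * Sketch only: a real implementation would stream, not buffer in memory.
     */
    static void truncateByRename(Path file, long length) throws IOException {
        Path side = file.resolveSibling(file.getFileName() + ".rename");
        byte[] contents = Files.readAllBytes(file);
        Files.write(side, Arrays.copyOf(contents, (int) length));
        Files.move(side, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("truncate-demo", ".bin");
        Files.write(f, new byte[] {1, 2, 3, 4, 5});
        truncateByRename(f, 3);
        if (Files.size(f) != 3) {
            throw new AssertionError("expected 3 bytes after truncate");
        }
        Files.delete(f);
        System.out.println("truncate-by-rename ok");
    }
}
```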
[jira] [Updated] (FLINK-30456) OLM Bundle Description Version Problems
[ https://issues.apache.org/jira/browse/FLINK-30456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Busche updated FLINK-30456: - Description: OLM is working great with OperatorHub, but I noticed a few items that need fixing. 1. The basic.yaml example version is release-1.1 instead of the latest release (release-1.3). This needs fixing in two places: tools/olm/generate-olm-bundle.sh tools/olm/docker-entry.sh 2. The label versions in the description are hardcoded to 1.2.0 instead of the latest release (1.3.0) 3. The Provider is a blank space " " but soon needs to have some text in there to avoid CI errors with the latest operator-sdk versions. The person who noticed it recommended "Community" for now, but maybe we can begin making it "The Apache Software Foundation" now? Not sure if we're ready for that yet or not... I'm working on a PR to address these items. Can you assign the issue to me? Thanks! fyi [~tedhtchang] [~gyfora] was: OLM is working great with OperatorHub, but I noticed a few items that need fixing. 1. The basic.yaml example version is release-1.1 instead of the latest release (release-1.3). This needs fixing in two places: tools/olm/generate-olm-bundle.sh tools/olm/docker-entry.sh 2. The label versions in the description are hardcoded to 1.2.0 instead of the latest release (1.3.0) 3. The Provider is a blank space " " but soon needs to have some text in there to avoid CI errors with the latest operator-sdk versions. The person who noticed it recommended "Community" for now, but maybe we can begin making it "The Apache Software Foundation" now? Not sure if we're ready for that yet or not... I'm working on a PR to address these items. Can you assign the issue to me? Thanks!
fyi [~tedchang] [~gyfora] > OLM Bundle Description Version Problems > --- > > Key: FLINK-30456 > URL: https://issues.apache.org/jira/browse/FLINK-30456 > Project: Flink > Issue Type: Bug > Components: Kubernetes Operator >Affects Versions: kubernetes-operator-1.3.0 >Reporter: James Busche >Priority: Major > > OLM is working great with OperatorHub, but I noticed a few items that need > fixing. > 1. The basic.yaml example version is release-1.1 instead of the latest > release (release-1.3). This needs fixing in two places: > tools/olm/generate-olm-bundle.sh > tools/olm/docker-entry.sh > 2. The label versions in the description are hardcoded to 1.2.0 instead of > the latest release (1.3.0) > 3. The Provider is a blank space " " but soon needs to have some text in there > to avoid CI errors with the latest operator-sdk versions. The person who > noticed it recommended "Community" for now, but maybe we can begin making it > "The Apache Software Foundation" now? Not sure if we're ready for that yet > or not... > > I'm working on a PR to address these items. Can you assign the issue to me? > Thanks! > fyi [~tedhtchang] [~gyfora]
[jira] [Created] (FLINK-30456) OLM Bundle Description Version Problems
James Busche created FLINK-30456: Summary: OLM Bundle Description Version Problems Key: FLINK-30456 URL: https://issues.apache.org/jira/browse/FLINK-30456 Project: Flink Issue Type: Bug Components: Kubernetes Operator Affects Versions: kubernetes-operator-1.3.0 Reporter: James Busche OLM is working great with OperatorHub, but I noticed a few items that need fixing. 1. The basic.yaml example version is release-1.1 instead of the latest release (release-1.3). This needs fixing in two places: tools/olm/generate-olm-bundle.sh tools/olm/docker-entry.sh 2. The label versions in the description are hardcoded to 1.2.0 instead of the latest release (1.3.0) 3. The Provider is a blank space " " but soon needs to have some text in there to avoid CI errors with the latest operator-sdk versions. The person who noticed it recommended "Community" for now, but maybe we can begin making it "The Apache Software Foundation" now? Not sure if we're ready for that yet or not... I'm working on a PR to address these items. Can you assign the issue to me? Thanks! fyi [~tedchang] [~gyfora]
[jira] [Updated] (FLINK-30406) Jobmanager Deployment error without HA metadata should not lead to unrecoverable error
[ https://issues.apache.org/jira/browse/FLINK-30406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-30406: --- Labels: pull-request-available (was: ) > Jobmanager Deployment error without HA metadata should not lead to > unrecoverable error > -- > > Key: FLINK-30406 > URL: https://issues.apache.org/jira/browse/FLINK-30406 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Reporter: Gyula Fora >Assignee: Gyula Fora >Priority: Critical > Labels: pull-request-available > Fix For: kubernetes-operator-1.4.0 > > > Currently we don't have a good way of asserting that the job never started > after a savepoint upgrade when the JM deployment fails (such as on an incorrect > image). > This easily leads to scenarios which require manual recovery by the user. > We should try to avoid this with some mechanism to greatly improve the > robustness of savepoint upgrades.
[GitHub] [flink-kubernetes-operator] gyfora opened a new pull request, #489: [FLINK-30406] Detect when jobmanager never started
gyfora opened a new pull request, #489: URL: https://github.com/apache/flink-kubernetes-operator/pull/489 ## What is the purpose of the change The purpose of this PR is to fix the long-standing, annoying case where the job was stuck in a non-upgradable state after starting/upgrading from a savepoint when the JobManager never starts. In these cases we previously only supported last-state (HA-based) upgrade, which was impossible if the JM never started and never created the HA metadata configmaps. The PR introduces a check of whether the JobManager pods ever started, by checking the Availability conditions on the JM deployment and comparing the condition times with the deployment creation timestamp. If availability is False and the Deployment never transitioned out of this state after creation, we can assume that the JM never started and we can perform the upgrade using the last recorded savepoint. This also removes the slightly ad hoc logic we had in place for upgrades on initial deployments before stable state (which basically intended to work around this limitation). ## Verifying this change Unit tests + manual testing on minikube ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changes to the `CustomResourceDescriptors`: no - Core observer or reconciler logic that is regularly executed: yes ## Documentation - Does this pull request introduce a new feature? no - If yes, how is the feature documented? not applicable
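The timestamp comparison the PR describes can be sketched in isolation as follows — a plain-Java approximation with a hypothetical `Condition` type, not the operator's actual Kubernetes model classes or the real PR code:

```java
import java.time.Instant;

public class JmStartupCheck {
    // Hypothetical, simplified view of a Deployment "Available" condition.
    record Condition(String type, String status, Instant lastTransitionTime) {}

    /**
     * True if the deployment's Available condition is False and has never
     * transitioned since the deployment was created — i.e. the JobManager
     * pods most likely never started, so upgrading from the last recorded
     * savepoint is safe.
     */
    static boolean jobManagerNeverStarted(Condition available, Instant deploymentCreated) {
        return "False".equals(available.status())
                && !available.lastTransitionTime().isAfter(deploymentCreated);
    }

    public static void main(String[] args) {
        Instant created = Instant.parse("2022-12-19T10:00:00Z");
        // Condition stuck at False since creation -> JM never started.
        Condition stuck = new Condition("Available", "False", created);
        // Condition that flipped later -> the JM ran at some point.
        Condition flipped = new Condition("Available", "False", created.plusSeconds(600));
        if (!jobManagerNeverStarted(stuck, created)) throw new AssertionError("stuck case");
        if (jobManagerNeverStarted(flipped, created)) throw new AssertionError("flipped case");
        System.out.println("startup-check sketch ok");
    }
}
```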
[jira] [Closed] (FLINK-30260) FLIP-271: Add initial Autoscaling implementation
[ https://issues.apache.org/jira/browse/FLINK-30260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora closed FLINK-30260. -- Fix Version/s: kubernetes-operator-1.4.0 Resolution: Fixed > FLIP-271: Add initial Autoscaling implementation > - > > Key: FLINK-30260 > URL: https://issues.apache.org/jira/browse/FLINK-30260 > Project: Flink > Issue Type: New Feature > Components: Kubernetes Operator >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: kubernetes-operator-1.4.0 > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling
[jira] [Updated] (FLINK-30437) State incompatibility issue might cause state loss
[ https://issues.apache.org/jira/browse/FLINK-30437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora updated FLINK-30437: --- Fix Version/s: kubernetes-operator-1.4.0 > State incompatibility issue might cause state loss > -- > > Key: FLINK-30437 > URL: https://issues.apache.org/jira/browse/FLINK-30437 > Project: Flink > Issue Type: Bug > Components: Kubernetes Operator >Affects Versions: kubernetes-operator-1.2.0, kubernetes-operator-1.3.0 >Reporter: Gyula Fora >Assignee: Gyula Fora >Priority: Blocker > Labels: pull-request-available > Fix For: kubernetes-operator-1.4.0 > > > Even though we set: > execution.shutdown-on-application-finish: false > execution.submit-failed-job-on-application-error: true > If there is a state incompatibility, the jobmanager marks the job failed, > cleans up the HA metadata and restarts itself. This is a very concerning behaviour, > but we have to fix this on the operator side to at least guarantee no state > loss. > The solution is to harden the HA metadata check properly
[jira] [Closed] (FLINK-30408) Add unit test for HA metadata check logic
[ https://issues.apache.org/jira/browse/FLINK-30408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora closed FLINK-30408. -- Resolution: Fixed merged to main c4e76402f02f05932c6446d97bdc3d60861b9b27 > Add unit test for HA metadata check logic > - > > Key: FLINK-30408 > URL: https://issues.apache.org/jira/browse/FLINK-30408 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Reporter: Gyula Fora >Priority: Critical > Fix For: kubernetes-operator-1.4.0 > > > The current mechanism to check for the existence of HA metadata in the > operator is not guarded by any unit tests, which makes it more susceptible to > accidental regressions. > We should add at least a few simple test cases to cover the expected behaviour
[jira] [Closed] (FLINK-30437) State incompatibility issue might cause state loss
[ https://issues.apache.org/jira/browse/FLINK-30437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora closed FLINK-30437. -- Resolution: Fixed merged to main c4e76402f02f05932c6446d97bdc3d60861b9b27 > State incompatibility issue might cause state loss > -- > > Key: FLINK-30437 > URL: https://issues.apache.org/jira/browse/FLINK-30437 > Project: Flink > Issue Type: Bug > Components: Kubernetes Operator >Affects Versions: kubernetes-operator-1.2.0, kubernetes-operator-1.3.0 >Reporter: Gyula Fora >Assignee: Gyula Fora >Priority: Blocker > Labels: pull-request-available > > Even though we set: > execution.shutdown-on-application-finish: false > execution.submit-failed-job-on-application-error: true > If there is a state incompatibility, the jobmanager marks the job failed, > cleans up the HA metadata and restarts itself. This is a very concerning behaviour, > but we have to fix this on the operator side to at least guarantee no state > loss. > The solution is to harden the HA metadata check properly
[GitHub] [flink-kubernetes-operator] gyfora merged pull request #488: [FLINK-30437][FLINK-30408] Harden HA meta check to avoid state loss
gyfora merged PR #488: URL: https://github.com/apache/flink-kubernetes-operator/pull/488
[jira] [Commented] (FLINK-30444) State recovery error not handled correctly and always causes JM failure
[ https://issues.apache.org/jira/browse/FLINK-30444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649470#comment-17649470 ] Gyula Fora commented on FLINK-30444: I think the JobManager shutting down is inconsistent with the execution.shutdown-on-application-finish configuration. All other fatal job errors simply leave the jobmanager there. Also, in Application mode, this should trigger the proper shutdown of the whole application, which in Kubernetes means that it won't restart. > State recovery error not handled correctly and always causes JM failure > --- > > Key: FLINK-30444 > URL: https://issues.apache.org/jira/browse/FLINK-30444 > Project: Flink > Issue Type: Bug > Components: Client / Job Submission >Affects Versions: 1.16.0, 1.14.6, 1.15.3 >Reporter: Gyula Fora >Assignee: David Morávek >Priority: Critical > > When you submit a job in Application mode and you try to restore from an > incompatible savepoint, there is a very unexpected behaviour. > Even with the following config: > {noformat} > execution.shutdown-on-application-finish: false > execution.submit-failed-job-on-application-error: true{noformat} > The job goes into a FAILED state, and the jobmanager fails. In a Kubernetes > environment (when using the native Kubernetes integration) this means that > the JobManager is restarted automatically. > This will mean that if you have the job result store enabled, after the JM comes > back you will end up with an empty application cluster. > I think the correct behaviour would be, depending on the above-mentioned config: > 1. If there is a job recovery error and you have > (execution.submit-failed-job-on-application-error) configured, then the job > should show up as failed, and the JM should not exit (if > execution.shutdown-on-application-finish is false) > 2. 
If (execution.shutdown-on-application-finish is true) then the jobmanager > should exit cleanly, as on a normal job terminal state, and thus stop the > deployment in Kubernetes, preventing a JM restart cycle
[GitHub] [flink-kubernetes-operator] gyfora merged pull request #484: FLIP-271: Autoscaling
gyfora merged PR #484: URL: https://github.com/apache/flink-kubernetes-operator/pull/484
[GitHub] [flink] MartijnVisser commented on pull request #21529: [DO NOT MERGE] Testing CI with Hadoop 3 upgrade
MartijnVisser commented on PR #21529: URL: https://github.com/apache/flink/pull/21529#issuecomment-1358263883 @luoyuxia I could use your help with this. This is a PR to validate that after the update of Hadoop from 2.8.5 to 2.10.2 everything would still work. As part of the upgrade for Hadoop 2, we've concluded that we also need to update our Hadoop 3 support from Hadoop 3.1.3 to Hadoop 3.2.3. This PR was just to check that everything would work if we would run the Hadoop 3 profile (via `-Dflink.hadoop.version=3.2.3 -Phadoop3-tests,hive3`). Everything works except that now the `HiveRunnerITCase` fails with: ``` 2022-12-19T19:42:44.1764995Z Dec 19 19:42:44 [ERROR] org.apache.flink.connectors.hive.HiveRunnerITCase.testOrcSchemaEvol Time elapsed: 0.882 s <<< ERROR! 2022-12-19T19:42:44.1766652Z Dec 19 19:42:44 java.lang.IllegalArgumentException: Failed to executeQuery Hive query insert into table db1.src values (1,100),(2,200): Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. 
org/apache/hadoop/metrics/Updater 2022-12-19T19:42:44.1767729Z Dec 19 19:42:44 at com.klarna.hiverunner.HiveServerContainer.executeStatement(HiveServerContainer.java:143) 2022-12-19T19:42:44.1768485Z Dec 19 19:42:44 at com.klarna.hiverunner.builder.HiveShellBase.executeStatementsWithCommandShellEmulation(HiveShellBase.java:121) 2022-12-19T19:42:44.1769312Z Dec 19 19:42:44 at com.klarna.hiverunner.builder.HiveShellBase.executeScriptWithCommandShellEmulation(HiveShellBase.java:110) 2022-12-19T19:42:44.1770263Z Dec 19 19:42:44 at com.klarna.hiverunner.builder.HiveShellBase.execute(HiveShellBase.java:129) 2022-12-19T19:42:44.1770974Z Dec 19 19:42:44 at org.apache.flink.connectors.hive.HiveRunnerITCase.testOrcSchemaEvol(HiveRunnerITCase.java:540) 2022-12-19T19:42:44.1771599Z Dec 19 19:42:44 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2022-12-19T19:42:44.1772276Z Dec 19 19:42:44 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 2022-12-19T19:42:44.1773807Z Dec 19 19:42:44 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2022-12-19T19:42:44.1775029Z Dec 19 19:42:44 at java.lang.reflect.Method.invoke(Method.java:498) 2022-12-19T19:42:44.1776298Z Dec 19 19:42:44 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) 2022-12-19T19:42:44.1777597Z Dec 19 19:42:44 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) 2022-12-19T19:42:44.1778841Z Dec 19 19:42:44 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) 2022-12-19T19:42:44.1780098Z Dec 19 19:42:44 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) 2022-12-19T19:42:44.1781377Z Dec 19 19:42:44 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 2022-12-19T19:42:44.1782663Z Dec 19 19:42:44 at 
org.apache.flink.connectors.hive.FlinkEmbeddedHiveRunner.runTestMethod(FlinkEmbeddedHiveRunner.java:128) 2022-12-19T19:42:44.1784165Z Dec 19 19:42:44 at org.apache.flink.connectors.hive.FlinkEmbeddedHiveRunner.runChild(FlinkEmbeddedHiveRunner.java:116) 2022-12-19T19:42:44.1785673Z Dec 19 19:42:44 at org.apache.flink.connectors.hive.FlinkEmbeddedHiveRunner.runChild(FlinkEmbeddedHiveRunner.java:71) 2022-12-19T19:42:44.1786903Z Dec 19 19:42:44 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 2022-12-19T19:42:44.1787956Z Dec 19 19:42:44 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 2022-12-19T19:42:44.1789050Z Dec 19 19:42:44 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 2022-12-19T19:42:44.1790152Z Dec 19 19:42:44 at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 2022-12-19T19:42:44.1791230Z Dec 19 19:42:44 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 2022-12-19T19:42:44.1792405Z Dec 19 19:42:44 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 2022-12-19T19:42:44.1793627Z Dec 19 19:42:44 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 2022-12-19T19:42:44.1794944Z Dec 19 19:42:44 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) 2022-12-19T19:42:44.1796202Z Dec 19 19:42:44 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) 2022-12-19T19:42:44.1797250Z Dec 19 19:42:44 at org.junit.rules.RunRules.evaluate(RunRules.java:20) 2022-12-19T19:42:44.1797836Z Dec 19 19:42:44 at
[jira] [Commented] (FLINK-30232) shading of netty epoll shared library does not account for ARM64 platform
[ https://issues.apache.org/jira/browse/FLINK-30232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649454#comment-17649454 ] Martijn Visser commented on FLINK-30232: [~cthomson] Have you heard anything? Otherwise I can take a look myself > shading of netty epoll shared library does not account for ARM64 platform > - > > Key: FLINK-30232 > URL: https://issues.apache.org/jira/browse/FLINK-30232 > Project: Flink > Issue Type: Bug > Components: BuildSystem / Shaded >Affects Versions: 1.15.2 > Environment: Kubernetes 1.23 provided by the AWS managed Kubernetes > service (EKS) with Graviton 2 based EC2 instances (ARM64) using Flink 1.15.2, > native epoll enabled (taskmanager.network.netty.transport: epoll) >Reporter: Chris Thomson >Priority: Major > > While evaluating migration of a Flink application to Graviton 2 based EC2 > instances in an AWS managed Kubernetes service (EKS) using Kubernetes 1.23, I > found that the shaded Netty library renames the AMD64 version of the shared > library as part of relocation of the Netty library, but does not rename the > matching ARM64 shared library.
> This results in the following error when `taskmanager.network.netty.transport: epoll` is used:
>
> Suppressed: java.lang.UnsatisfiedLinkError: no org_apache_flink_shaded_netty4_netty_transport_native_epoll in java.library.path
>     at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1860) ~[?:1.8.0_352]
>     at java.lang.Runtime.loadLibrary0(Runtime.java:843) ~[?:1.8.0_352]
>     at java.lang.System.loadLibrary(System.java:1136) ~[?:1.8.0_352]
>     at org.apache.flink.shaded.netty4.io.netty.util.internal.NativeLibraryUtil.loadLibrary(NativeLibraryUtil.java:38) ~[flink-dist-1.15.2.jar:1.15.2]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_352]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_352]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_352]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_352]
>     at org.apache.flink.shaded.netty4.io.netty.util.internal.NativeLibraryLoader$1.run(NativeLibraryLoader.java:335) ~[flink-dist-1.15.2.jar:1.15.2]
>     at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_352]
>     at org.apache.flink.shaded.netty4.io.netty.util.internal.NativeLibraryLoader.loadLibraryByHelper(NativeLibraryLoader.java:327) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.shaded.netty4.io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:293) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.shaded.netty4.io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:136) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.Native.loadNativeLibrary(Native.java:309) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.Native.<clinit>(Native.java:85) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.Epoll.<clinit>(Epoll.java:40) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.<init>(EpollEventLoop.java:51) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:185) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:36) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:84) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:60) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:49) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.shaded.netty4.io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:59) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:113) ~[flink-dist-1.15.2.jar:1.15.2]
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:100) ~[flink-dist-1.15.2.jar:1.15.2]
>     at
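The naming mismatch behind this failure can be illustrated with a short sketch. The shade prefix and the architecture suffixes below follow the names visible in the report and Netty's native-library naming convention; the class and helper methods are invented for illustration and are not part of Flink's build tooling:

```java
// Sketch of the native-library names involved in FLINK-30232.
// Hypothetical helpers; not Flink's actual shading/build logic.
public class ShadedNativeLibNames {
    static final String SHADE_PREFIX = "org_apache_flink_shaded_netty4_";

    /** Name Netty derives for the epoll transport on a given platform. */
    static String epollLibraryName(String osArch) {
        String base = "netty_transport_native_epoll";
        // Netty appends an architecture suffix such as _x86_64 or _aarch_64.
        String arch = osArch.contains("aarch") ? "_aarch_64" : "_x86_64";
        return base + arch;
    }

    /** Name the relocated (shaded) loader looks for inside the jar. */
    static String shadedName(String libraryName) {
        return SHADE_PREFIX + libraryName;
    }

    public static void main(String[] args) {
        // On ARM64 the loader asks for the shaded _aarch_64 variant; the bug is
        // that the build only renamed the x86_64 .so inside flink-dist.
        System.out.println(shadedName(epollLibraryName("aarch64")));
        System.out.println(shadedName(epollLibraryName("amd64")));
    }
}
```

The takeaway is that relocation has to rename every architecture-suffixed `.so`, not just the x86_64 one, for the shaded loader to find its library on ARM64.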
[GitHub] [flink] MartijnVisser commented on pull request #21529: [DO NOT MERGE] Testing CI with Hadoop 3 upgrade
MartijnVisser commented on PR #21529: URL: https://github.com/apache/flink/pull/21529#issuecomment-1358137678 @flinkbot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-30444) State recovery error not handled correctly and always causes JM failure
[ https://issues.apache.org/jira/browse/FLINK-30444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649416#comment-17649416 ] David Morávek commented on FLINK-30444: --- This is a deeper issue with how the savepoints are recovered when JobMaster starts up. If anything goes sideways during execution graph restore, we fail the job master, which is ultimately handled as a fatal exception by the dispatcher (_DefaultExecutionGraphFactory.tryRestoreExecutionGraphFromSavepoint_ is the relevant code path). Since the job has already been registered for execution by the dispatcher, we should handle the exception and fail the job accordingly because we can't recover from this. > This also not consistent with some other startup errors such as, missing > application jar. That causes a jobmanager restart loop, but does not put the > job a terminal FAILED state. This behaviour is more desirable as it doesn't > lead to empty application clusters on Kubernetes This is a special class of errors that could be qualified as "pre-main method errors"; I think this is orthogonal to this issue and should be discussed separately. > State recovery error not handled correctly and always causes JM failure > --- > > Key: FLINK-30444 > URL: https://issues.apache.org/jira/browse/FLINK-30444 > Project: Flink > Issue Type: Bug > Components: Client / Job Submission >Affects Versions: 1.16.0, 1.14.6, 1.15.3 >Reporter: Gyula Fora >Assignee: David Morávek >Priority: Critical > > When you submit a job in Application mode and you try to restore from an > incompatible savepoint, there is a very unexpected behaviour. > Even with the following config: > {noformat} > execution.shutdown-on-application-finish: false > execution.submit-failed-job-on-application-error: true{noformat} > The job goes into a FAILED state, and the jobmanager fails. In a kubernetes > environment (when using the native kubernetes integration) this means that > the JobManager is restarted automatically. 
> This will mean that if you have jobresult store enabled, after the JM comes > back you will end up with an empty application cluster. > I think the correct behaviour would be, depending on the above mention config: > 1. If there is a job recovery error and you have > (execution.submit-failed-job-on-application-error) configured, then the job > should show up as failed, and the JM should not exit (if > execution.shutdown-on-application-finish is false) > 2. If (execution.shutdown-on-application-finish is true) then the jobmanager > should exit cleanly like on normal job terminal state and thus stop the > deployment in Kubernetes, preventing a JM restart cycle -- This message was sent by Atlassian Jira (v8.20.10#820010)
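The behaviour proposed in this thread can be condensed into a sketch. All class and method names below are hypothetical stand-ins, not Flink's actual dispatcher code; the point is only that a savepoint-restore failure should end in a terminal FAILED job instead of escalating to a fatal JobManager error:

```java
// Hypothetical sketch of the error handling discussed in FLINK-30444.
public class RecoveryHandlingSketch {
    enum JobStatus { RUNNING, FAILED }

    interface GraphRestorer {
        void restoreFromSavepoint() throws Exception;
    }

    /**
     * Desired behaviour: if restoring the execution graph fails (e.g. from an
     * incompatible savepoint), mark the already-registered job FAILED rather
     * than rethrowing, so that with
     * `execution.submit-failed-job-on-application-error: true` the failed job
     * stays visible and the JobManager process survives.
     */
    static JobStatus startJob(GraphRestorer restorer) {
        try {
            restorer.restoreFromSavepoint();
            return JobStatus.RUNNING;
        } catch (Exception e) {
            // Unrecoverable restore error: fail the job, don't kill the JM.
            return JobStatus.FAILED;
        }
    }

    public static void main(String[] args) {
        JobStatus status = startJob(() -> {
            throw new IllegalStateException("incompatible savepoint");
        });
        System.out.println(status); // FAILED
    }
}
```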
[GitHub] [flink] pnowojski commented on a diff in pull request #20151: [FLINK-26803][checkpoint] Merging channel state files
pnowojski commented on code in PR #20151: URL: https://github.com/apache/flink/pull/20151#discussion_r1052418421

## flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateWriteRequestExecutorImpl.java:

```
@@ -51,27 +66,40 @@ class ChannelStateWriteRequestExecutorImpl implements ChannelStateWriteRequestEx
 private final Thread thread;
 private volatile Exception thrown = null;
 private volatile boolean wasClosed = false;
-private final String taskName;
+
+private final Map> unreadyQueues =
+new ConcurrentHashMap<>();
+
+private final JobID jobID;
+private final Set subtasks;
+private final AtomicBoolean isRegistering = new AtomicBoolean(true);
+private final int numberOfSubtasksShareFile;
```

Review Comment:
> BlockingDeque.take() will wait until an element becomes available. Deque is hard to achieve. BlockingDeque is easy to implement the producer & consumer model.

That's a slight complexity, but should be easily solved via `lock.wait()` and `lock.notifyAll()` called in one or two places (`close()` and whenever we add anything to the current `dequeue`)? https://howtodoinjava.com/java/multi-threading/wait-notify-and-notifyall-methods/

The loop probably should look like this:
```
while (true) {
    synchronized (lock) {
        if (wasClosed) {
            return;
        }
        (...)
        ChannelStateWriteRequest request = waitAndTakeUnsafe();
        (...)
    }
}

private ChannelStateWriteRequest waitAndTakeUnsafe() throws InterruptedException {
    while (true) {
        ChannelStateWriteRequest request = dequeue.pollFirst();
        if (request == null) {
            lock.wait();
        } else {
            return request;
        }
    }
}
```
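For reference, the wait/notify pattern suggested in the review above can be fleshed out into a small self-contained class. The names below (`WaitNotifyQueue`, `add`, `close`, `take`) are invented for illustration; the real code lives in `ChannelStateWriteRequestExecutorImpl` and works on `ChannelStateWriteRequest` elements:

```java
import java.util.ArrayDeque;

/**
 * Minimal sketch of replacing a BlockingDeque with a plain deque guarded by
 * lock.wait()/lock.notifyAll(), as proposed in the review comment.
 */
public class WaitNotifyQueue<T> {
    private final Object lock = new Object();
    private final ArrayDeque<T> deque = new ArrayDeque<>();
    private boolean closed = false;

    /** Producer side: enqueue and wake up the waiting consumer. */
    public void add(T element) {
        synchronized (lock) {
            deque.addLast(element);
            lock.notifyAll();
        }
    }

    /** Wakes up the consumer so it can observe the closed flag. */
    public void close() {
        synchronized (lock) {
            closed = true;
            lock.notifyAll();
        }
    }

    /** Consumer side: blocks until an element arrives; null once closed and drained. */
    public T take() throws InterruptedException {
        synchronized (lock) {
            while (true) {
                T element = deque.pollFirst();
                if (element != null) {
                    return element;
                }
                if (closed) {
                    return null;
                }
                lock.wait();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        WaitNotifyQueue<String> queue = new WaitNotifyQueue<>();
        queue.add("request-1");
        System.out.println(queue.take()); // request-1
        queue.close();
        System.out.println(queue.take()); // null (queue closed)
    }
}
```

Note that `wait()` must always sit inside a loop that rechecks the condition, since a thread can wake up spuriously or after another consumer already drained the element.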
[GitHub] [flink] flinkbot commented on pull request #21532: FLINK-30455 [core] Excluding java.lang.String and primitive types from closure cleaning
flinkbot commented on PR #21532: URL: https://github.com/apache/flink/pull/21532#issuecomment-1357891772 ## CI report: * 5cb42cb1d6c5850906ac77e54d1943182097474d UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] gunnarmorling commented on pull request #21531: FLINK-30454 [runtime] Fixing visibility issue for SizeGauge.SizeSupplier
gunnarmorling commented on PR #21531: URL: https://github.com/apache/flink/pull/21531#issuecomment-1357888979 // CC @rmetzger
[jira] [Commented] (FLINK-30455) Avoid "cleaning" of java.lang.String in ClosureCleaner
[ https://issues.apache.org/jira/browse/FLINK-30455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649387#comment-17649387 ] Gunnar Morling commented on FLINK-30455: Sent PR [https://github.com/apache/flink/pull/21532,] seems simple enough. > Avoid "cleaning" of java.lang.String in ClosureCleaner > -- > > Key: FLINK-30455 > URL: https://issues.apache.org/jira/browse/FLINK-30455 > Project: Flink > Issue Type: Sub-task >Reporter: Gunnar Morling >Assignee: Gunnar Morling >Priority: Major > Labels: pull-request-available > > When running a simple "hello world" example on JDK 17, I noticed the closure > cleaner tries to reflectively access the {{java.lang.String}} class, which > fails due to the strong encapsulation enabled by default in JDK 17 and > beyond. I don't think the closure cleaner needs to act on {{String}} at all, > as it doesn't contain any inner classes. Unless there are objections, I'll > provide a fix along those lines. With this change in place, I can run that > example on JDK 17. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-30455) Avoid "cleaning" of java.lang.String in ClosureCleaner
[ https://issues.apache.org/jira/browse/FLINK-30455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-30455: --- Labels: pull-request-available (was: ) > Avoid "cleaning" of java.lang.String in ClosureCleaner > -- > > Key: FLINK-30455 > URL: https://issues.apache.org/jira/browse/FLINK-30455 > Project: Flink > Issue Type: Sub-task >Reporter: Gunnar Morling >Assignee: Gunnar Morling >Priority: Major > Labels: pull-request-available > > When running a simple "hello world" example on JDK 17, I noticed the closure > cleaner tries to reflectively access the {{java.lang.String}} class, which > fails due to the strong encapsulation enabled by default in JDK 17 and > beyond. I don't think the closure cleaner needs to act on {{String}} at all, > as it doesn't contain any inner classes. Unless there are objections, I'll > provide a fix along those lines. With this change in place, I can run that > example on JDK 17. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] gunnarmorling opened a new pull request, #21532: FLINK-30455 [core] Excluding java.lang.String and primitive types from closure cleaning
gunnarmorling opened a new pull request, #21532: URL: https://github.com/apache/flink/pull/21532 https://issues.apache.org/jira/browse/FLINK-30455 ## What is the purpose of the change Avoiding reflective access to java.lang.String which fails on Java 17 and beyond. ## Brief change log Excluding java.lang.String and primitive types from closure cleaning ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? No - If yes, how is the feature documented? Not applicable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
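As a rough illustration of the guard this PR describes: the method name and structure below are hypothetical; Flink's actual ClosureCleaner walks object graphs reflectively and its real signatures differ.

```java
// Hypothetical sketch of skipping String and primitives during closure cleaning.
public class CleaningGuard {
    /** Types that never need cleaning: they contain no user inner classes. */
    static boolean needsCleaning(Class<?> cls) {
        if (cls == null) {
            return false;
        }
        if (cls.isPrimitive() || cls == String.class) {
            // Reflective access into java.lang.String fails on JDK 17+ because
            // of the strong encapsulation of the java.base module.
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(needsCleaning(String.class));   // false
        System.out.println(needsCleaning(int.class));      // false
        System.out.println(needsCleaning(Runnable.class)); // true
    }
}
```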
[GitHub] [flink] flinkbot commented on pull request #21531: FLINK-30454 [runtime] Fixing visibility issue for SizeGauge.SizeSupplier
flinkbot commented on PR #21531: URL: https://github.com/apache/flink/pull/21531#issuecomment-1357877649 ## CI report: * b64e541cca8bfe43c5d0ec84918bbb2bd0851d24 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (FLINK-30455) Avoid "cleaning" of java.lang.String in ClosureCleaner
[ https://issues.apache.org/jira/browse/FLINK-30455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn Visser reassigned FLINK-30455: -- Assignee: Gunnar Morling > Avoid "cleaning" of java.lang.String in ClosureCleaner > -- > > Key: FLINK-30455 > URL: https://issues.apache.org/jira/browse/FLINK-30455 > Project: Flink > Issue Type: Sub-task >Reporter: Gunnar Morling >Assignee: Gunnar Morling >Priority: Major > > When running a simple "hello world" example on JDK 17, I noticed the closure > cleaner tries to reflectively access the {{java.lang.String}} class, which > fails due to the strong encapsulation enabled by default in JDK 17 and > beyond. I don't think the closure cleaner needs to act on {{String}} at all, > as it doesn't contain any inner classes. Unless there are objections, I'll > provide a fix along those lines. With this change in place, I can run that > example on JDK 17. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-30454) Inconsistent class hierarchy in TaskIOMetricGroup
[ https://issues.apache.org/jira/browse/FLINK-30454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649383#comment-17649383 ] Gunnar Morling edited comment on FLINK-30454 at 12/19/22 3:51 PM: -- Ok, cool. Created PR [https://github.com/apache/flink/pull/21531/.|https://github.com/apache/flink/pull/21531/] was (Author: gunnar.morling): Ok, cool. Logged [https://github.com/apache/flink/pull/21531/.|https://github.com/apache/flink/pull/21531/] > Inconsistent class hierarchy in TaskIOMetricGroup > - > > Key: FLINK-30454 > URL: https://issues.apache.org/jira/browse/FLINK-30454 > Project: Flink > Issue Type: Improvement > Components: Runtime / Metrics >Reporter: Gunnar Morling >Priority: Major > Labels: pull-request-available > > I noticed an interesting issue when trying to compile the flink-runtime > module with Eclipse (same for VSCode): the _private_ inner class > {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.SizeGauge}} has > yet another _public_ inner class, {{{}SizeSupplier{}}}. The public method > {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.registerMailboxSizeSupplier(SizeSupplier)}} > has a parameter of that type. > The invocation of this method in > {{org.apache.flink.streaming.runtime.tasks.StreamTask.StreamTask(Environment, > TimerService, UncaughtExceptionHandler, StreamTaskActionExecutor, > TaskMailbox)}} can be compiled with the javac compiler of the JDK, but fails > to compile with ecj: > {code:java} > The type TaskIOMetricGroup.SizeGauge from the descriptor computed for the > target context is not visible here. > {code} > I tend to believe that the behavior of Eclipse's compiler is the correct one. > After all, you couldn't explicitly reference the {{SizeSupplier}} type either. > One possible fix would be to promote {{SizeSupplier}} to the same level as > {{{}SizeGauge{}}}. This would be source-compatible but not binary-compatible, > though. I.e. 
code compiled against the earlier signature of > {{registerMailboxSizeSupplier()}} would be broken with a > {{{}NoSuchMethodError{}}}. Depending on whether > {{registerMailboxSizeSupplier()}} are expected in client code or not, this > may or may not be acceptable. > Another fix would be to make {{SizeGauge}} public. I think that's the change > I'd do. Curious what other folks here think. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-30454) Inconsistent class hierarchy in TaskIOMetricGroup
[ https://issues.apache.org/jira/browse/FLINK-30454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649383#comment-17649383 ] Gunnar Morling commented on FLINK-30454: Ok, cool. Logged [https://github.com/apache/flink/pull/21531/.|https://github.com/apache/flink/pull/21531/] > Inconsistent class hierarchy in TaskIOMetricGroup > - > > Key: FLINK-30454 > URL: https://issues.apache.org/jira/browse/FLINK-30454 > Project: Flink > Issue Type: Improvement > Components: Runtime / Metrics >Reporter: Gunnar Morling >Priority: Major > Labels: pull-request-available > > I noticed an interesting issue when trying to compile the flink-runtime > module with Eclipse (same for VSCode): the _private_ inner class > {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.SizeGauge}} has > yet another _public_ inner class, {{{}SizeSupplier{}}}. The public method > {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.registerMailboxSizeSupplier(SizeSupplier)}} > has a parameter of that type. > The invocation of this method in > {{org.apache.flink.streaming.runtime.tasks.StreamTask.StreamTask(Environment, > TimerService, UncaughtExceptionHandler, StreamTaskActionExecutor, > TaskMailbox)}} can be compiled with the javac compiler of the JDK, but fails > to compile with ecj: > {code:java} > The type TaskIOMetricGroup.SizeGauge from the descriptor computed for the > target context is not visible here. > {code} > I tend to believe that the behavior of Eclipse's compiler is the correct one. > After all, you couldn't explicitly reference the {{SizeSupplier}} type either. > One possible fix would be to promote {{SizeSupplier}} to the same level as > {{{}SizeGauge{}}}. This would be source-compatible but not binary-compatible, > though. I.e. code compiled against the earlier signature of > {{registerMailboxSizeSupplier()}} would be broken with a > {{{}NoSuchMethodError{}}}. 
Depending on whether > {{registerMailboxSizeSupplier()}} are expected in client code or not, this > may or may not be acceptable. > Another fix would be to make {{SizeGauge}} public. I think that's the change > I'd do. Curious what other folks here think.
[jira] [Updated] (FLINK-30454) Inconsistent class hierarchy in TaskIOMetricGroup
[ https://issues.apache.org/jira/browse/FLINK-30454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-30454: --- Labels: pull-request-available (was: ) > Inconsistent class hierarchy in TaskIOMetricGroup > - > > Key: FLINK-30454 > URL: https://issues.apache.org/jira/browse/FLINK-30454 > Project: Flink > Issue Type: Improvement > Components: Runtime / Metrics >Reporter: Gunnar Morling >Priority: Major > Labels: pull-request-available > > I noticed an interesting issue when trying to compile the flink-runtime > module with Eclipse (same for VSCode): the _private_ inner class > {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.SizeGauge}} has > yet another _public_ inner class, {{{}SizeSupplier{}}}. The public method > {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.registerMailboxSizeSupplier(SizeSupplier)}} > has a parameter of that type. > The invocation of this method in > {{org.apache.flink.streaming.runtime.tasks.StreamTask.StreamTask(Environment, > TimerService, UncaughtExceptionHandler, StreamTaskActionExecutor, > TaskMailbox)}} can be compiled with the javac compiler of the JDK, but fails > to compile with ecj: > {code:java} > The type TaskIOMetricGroup.SizeGauge from the descriptor computed for the > target context is not visible here. > {code} > I tend to believe that the behavior of Eclipse's compiler is the correct one. > After all, you couldn't explicitly reference the {{SizeSupplier}} type either. > One possible fix would be to promote {{SizeSupplier}} to the same level as > {{{}SizeGauge{}}}. This would be source-compatible but not binary-compatible, > though. I.e. code compiled against the earlier signature of > {{registerMailboxSizeSupplier()}} would be broken with a > {{{}NoSuchMethodError{}}}. Depending on whether > {{registerMailboxSizeSupplier()}} are expected in client code or not, this > may or may not be acceptable. > Another fix would be to make {{SizeGauge}} public. 
I think that's the change > I'd do. Curious what other folks here think.
[GitHub] [flink] gunnarmorling opened a new pull request, #21531: FLINK-30454 [runtime] Fixing visibility issue for SizeGauge.SizeSupplier
gunnarmorling opened a new pull request, #21531: URL: https://github.com/apache/flink/pull/21531 ## What is the purpose of the change Making sure the runtime module can be compiled with ecj. I encountered two locations which seem inconsistent with the JLS but were still accepted by javac. ## Brief change log Reworked two code places to make it compile with ecj. Note the change around `SizeGauge.SizeSupplier` is source-compatible but not binary-compatible, based on the assumption that this code is not invoked in external client code. If that actually is the case, I'd rework the change to simply change the visibility of `SizeGauge` to public. ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature: no - If yes, how is the feature documented: not applicable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
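The javac/ecj inconsistency this PR works around can be reproduced in a few lines. The names below mirror the ones in the FLINK-30454 report, but this is not the actual Flink source:

```java
// Minimal reproduction of the nested-visibility structure from FLINK-30454.
public class VisibilitySketch {
    private static class SizeGauge {
        // A *public* interface nested inside a *private* class:
        public interface SizeSupplier<R> {
            R get();
        }
    }

    // Per the report, javac accepts call sites of this method in other
    // classes, while ecj rejects them: the parameter's descriptor mentions
    // SizeGauge, which is not visible outside VisibilitySketch.
    public static int register(SizeGauge.SizeSupplier<Integer> supplier) {
        return supplier.get();
    }

    public static void main(String[] args) {
        // A lambda call site never has to name the nested type explicitly.
        System.out.println(register(() -> 42)); // 42
    }
}
```

Promoting `SizeSupplier` out of `SizeGauge` (the PR's approach) keeps call sites source-compatible, while making `SizeGauge` public would also keep them binary-compatible.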
[jira] [Commented] (FLINK-30455) Avoid "cleaning" of java.lang.String in ClosureCleaner
[ https://issues.apache.org/jira/browse/FLINK-30455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649371#comment-17649371 ] Gunnar Morling commented on FLINK-30455: [~rmetzger] et al., if you think this makes sense, could you assign this issue to me? Thanks! > Avoid "cleaning" of java.lang.String in ClosureCleaner > -- > > Key: FLINK-30455 > URL: https://issues.apache.org/jira/browse/FLINK-30455 > Project: Flink > Issue Type: Sub-task >Reporter: Gunnar Morling >Priority: Major > > When running a simple "hello world" example on JDK 17, I noticed the closure > cleaner tries to reflectively access the {{java.lang.String}} class, which > fails due to the strong encapsulation enabled by default in JDK 17 and > beyond. I don't think the closure cleaner needs to act on {{String}} at all, > as it doesn't contain any inner classes. Unless there are objections, I'll > provide a fix along those lines. With this change in place, I can run that > example on JDK 17. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-30455) Avoid "cleaning" of java.lang.String in ClosureCleaner
Gunnar Morling created FLINK-30455: -- Summary: Avoid "cleaning" of java.lang.String in ClosureCleaner Key: FLINK-30455 URL: https://issues.apache.org/jira/browse/FLINK-30455 Project: Flink Issue Type: Sub-task Reporter: Gunnar Morling When running a simple "hello world" example on JDK 17, I noticed the closure cleaner tries to reflectively access the {{java.lang.String}} class, which fails due to the strong encapsulation enabled by default in JDK 17 and beyond. I don't think the closure cleaner needs to act on {{String}} at all, as it doesn't contain any inner classes. Unless there are objections, I'll provide a fix along those lines. With this change in place, I can run that example on JDK 17. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-30454) Inconsistent class hierarchy in TaskIOMetricGroup
[ https://issues.apache.org/jira/browse/FLINK-30454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649365#comment-17649365 ] Robert Metzger commented on FLINK-30454: I'm pretty sure these are not public classes. We use the @Public / @PublicEvolving / @Internal annotations to mark interface visibility in Flink. The TaskIOMetricGroup-stuff is, in my understanding anyways internal. > Inconsistent class hierarchy in TaskIOMetricGroup > - > > Key: FLINK-30454 > URL: https://issues.apache.org/jira/browse/FLINK-30454 > Project: Flink > Issue Type: Improvement > Components: Runtime / Metrics >Reporter: Gunnar Morling >Priority: Major > > I noticed an interesting issue when trying to compile the flink-runtime > module with Eclipse (same for VSCode): the _private_ inner class > {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.SizeGauge}} has > yet another _public_ inner class, {{{}SizeSupplier{}}}. The public method > {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.registerMailboxSizeSupplier(SizeSupplier)}} > has a parameter of that type. > The invocation of this method in > {{org.apache.flink.streaming.runtime.tasks.StreamTask.StreamTask(Environment, > TimerService, UncaughtExceptionHandler, StreamTaskActionExecutor, > TaskMailbox)}} can be compiled with the javac compiler of the JDK, but fails > to compile with ecj: > {code:java} > The type TaskIOMetricGroup.SizeGauge from the descriptor computed for the > target context is not visible here. > {code} > I tend to believe that the behavior of Eclipse's compiler is the correct one. > After all, you couldn't explicitly reference the {{SizeSupplier}} type either. > One possible fix would be to promote {{SizeSupplier}} to the same level as > {{{}SizeGauge{}}}. This would be source-compatible but not binary-compatible, > though. I.e. code compiled against the earlier signature of > {{registerMailboxSizeSupplier()}} would be broken with a > {{{}NoSuchMethodError{}}}. 
Depending on whether > {{registerMailboxSizeSupplier()}} are expected in client code or not, this > may or may not be acceptable. > Another fix would be to make {{SizeGauge}} public. I think that's the change > I'd do. Curious what other folks here think.
[GitHub] [flink] pnowojski commented on a diff in pull request #20151: [FLINK-26803][checkpoint] Merging channel state files
pnowojski commented on code in PR #20151: URL: https://github.com/apache/flink/pull/20151#discussion_r1052319366 ## flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateCheckpointWriter.java: ## @@ -286,27 +277,32 @@ private void runWithChecks(RunnableWithException r) { } } -public void fail(JobID jobID, JobVertexID jobVertexID, int subtaskIndex, Throwable e) { +/** + * The throwable is just used for special subtask, other subtasks should fail by {@link + * CHANNEL_STATE_SHARED_STREAM_EXCEPTION}. Review Comment: nit: > The throwable is just used for specific subtask that triggered the failure. Other subtasks should fail by {@link CHANNEL_STATE_SHARED_STREAM_EXCEPTION} ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-30450) FileSystem supports exporting client-side metrics
[ https://issues.apache.org/jira/browse/FLINK-30450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649364#comment-17649364 ] Anton Ippolitov commented on FLINK-30450: - I am mostly interested in the checkpoint / savepoint side of things, the main idea is to monitor the pressure on the FS during these operations. I already have some server-side metrics for S3 etc but client-side metrics would offer a more granular view. > FileSystem supports exporting client-side metrics > - > > Key: FLINK-30450 > URL: https://issues.apache.org/jira/browse/FLINK-30450 > Project: Flink > Issue Type: New Feature > Components: FileSystems >Reporter: Hangxiang Yu >Priority: Major > > Client-side metrics, or job level metrics for filesystem could help us to > monitor filesystem more precisely. > Some metrics (like request rate , throughput, latency, retry count, etc) are > useful to monitor the network or client problem of checkpointing or other > access cases for a job. > Some filesystems like s3, s3-presto, gs have supported enabling some metrics, > these could be exported in the filesystem. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-15786) Load connector code with separate classloader
[ https://issues.apache.org/jira/browse/FLINK-15786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn Visser updated FLINK-15786: --- Priority: Major (was: Not a Priority) > Load connector code with separate classloader > - > > Key: FLINK-15786 > URL: https://issues.apache.org/jira/browse/FLINK-15786 > Project: Flink > Issue Type: Improvement > Components: Runtime / Task >Reporter: Guowei Ma >Priority: Major > Labels: auto-deprioritized-major, auto-deprioritized-minor, > usability > > Currently, connector code can be seen as part of user code. Usually, users > only need to add the corresponding connector as a dependency and package it > in the user jar. This is convenient enough. > However, connectors usually need to interact with external systems and often > introduce heavy dependencies, there is a high possibility of a class conflict > of different connectors or the user code of the same job. For example, every > one or two weeks, we will receive issue reports relevant with connector class > conflict from our users. The problem can get worse when users want to analyze > data from different sources and write output to different sinks. > Using separate classloader to load the different connector code could resolve > the problem. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-30450) FileSystem supports exporting client-side metrics
[ https://issues.apache.org/jira/browse/FLINK-30450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649362#comment-17649362 ] Martijn Visser commented on FLINK-30450: With regards to this request, are you looking at FileSource / FileSink (to read and/or write files), or also looking at this from a checkpoint/savepoint perspective? If the former, it would be a duplicate of FLINK-28021 / FLINK-28117 imho. Exposing the Presto / Hadoop metrics might not be trivial given that these are loaded via the Flink plugin mechanism https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/plugins/ It would probably require a FLIP and a discussion on the Dev mailing list
[GitHub] [flink] pnowojski commented on a diff in pull request #20151: [FLINK-26803][checkpoint] Merging channel state files
pnowojski commented on code in PR #20151: URL: https://github.com/apache/flink/pull/20151#discussion_r1052295272 ## flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateCheckpointWriter.java: ## @@ -18,280 +18,265 @@ package org.apache.flink.runtime.checkpoint.channel; import org.apache.flink.annotation.VisibleForTesting; -import org.apache.flink.runtime.checkpoint.channel.ChannelStateWriter.ChannelStateWriteResult; +import org.apache.flink.api.common.JobID; +import org.apache.flink.runtime.checkpoint.CheckpointException; import org.apache.flink.runtime.io.network.buffer.Buffer; import org.apache.flink.runtime.io.network.logger.NetworkActionsLogger; -import org.apache.flink.runtime.state.AbstractChannelStateHandle; +import org.apache.flink.runtime.jobgraph.JobVertexID; import org.apache.flink.runtime.state.AbstractChannelStateHandle.StateContentMetaInfo; import org.apache.flink.runtime.state.CheckpointStateOutputStream; import org.apache.flink.runtime.state.CheckpointStreamFactory; -import org.apache.flink.runtime.state.InputChannelStateHandle; -import org.apache.flink.runtime.state.ResultSubpartitionStateHandle; import org.apache.flink.runtime.state.StreamStateHandle; -import org.apache.flink.runtime.state.memory.ByteStreamStateHandle; import org.apache.flink.util.Preconditions; import org.apache.flink.util.function.RunnableWithException; import org.slf4j.Logger; import org.slf4j.LoggerFactory; +import javax.annotation.Nonnull; import javax.annotation.concurrent.NotThreadSafe; import java.io.DataOutputStream; import java.io.IOException; -import java.util.ArrayList; -import java.util.Collection; import java.util.HashMap; -import java.util.List; +import java.util.HashSet; import java.util.Map; -import java.util.Optional; -import java.util.concurrent.CompletableFuture; +import java.util.Objects; +import java.util.Set; -import static java.util.Collections.emptyList; -import static java.util.Collections.singletonList; -import static 
java.util.UUID.randomUUID; +import static org.apache.flink.runtime.checkpoint.CheckpointFailureReason.CHANNEL_STATE_SHARED_STREAM_EXCEPTION; import static org.apache.flink.runtime.state.CheckpointedStateScope.EXCLUSIVE; import static org.apache.flink.util.ExceptionUtils.findThrowable; import static org.apache.flink.util.ExceptionUtils.rethrow; +import static org.apache.flink.util.Preconditions.checkArgument; import static org.apache.flink.util.Preconditions.checkNotNull; import static org.apache.flink.util.Preconditions.checkState; -/** Writes channel state for a specific checkpoint-subtask-attempt triple. */ +/** Writes channel state for multiple subtasks of the same checkpoint. */ @NotThreadSafe class ChannelStateCheckpointWriter { private static final Logger LOG = LoggerFactory.getLogger(ChannelStateCheckpointWriter.class); private final DataOutputStream dataStream; private final CheckpointStateOutputStream checkpointStream; -private final ChannelStateWriteResult result; -private final Map inputChannelOffsets = new HashMap<>(); -private final Map resultSubpartitionOffsets = -new HashMap<>(); + +/** + * Indicates whether the current checkpoints of all subtasks have exception. If it's not null, + * the checkpoint will fail. + */ +private Throwable throwable; + private final ChannelStateSerializer serializer; private final long checkpointId; -private boolean allInputsReceived = false; -private boolean allOutputsReceived = false; private final RunnableWithException onComplete; -private final int subtaskIndex; -private final String taskName; + +// Subtasks that have not yet register writer result. 
+private final Set waitedSubtasks; + +private final Map pendingResults = new HashMap<>(); ChannelStateCheckpointWriter( -String taskName, -int subtaskIndex, -CheckpointStartRequest startCheckpointItem, +Set subtasks, +long checkpointId, CheckpointStreamFactory streamFactory, ChannelStateSerializer serializer, RunnableWithException onComplete) throws Exception { this( -taskName, -subtaskIndex, -startCheckpointItem.getCheckpointId(), -startCheckpointItem.getTargetResult(), +subtasks, +checkpointId, streamFactory.createCheckpointStateOutputStream(EXCLUSIVE), serializer, onComplete); } @VisibleForTesting ChannelStateCheckpointWriter( -String taskName, -int subtaskIndex, +Set subtasks, long checkpointId, -ChannelStateWriteResult result, CheckpointStateOutputStream stream, ChannelStateSerializer serializer,
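The core bookkeeping change in this diff — one writer shared across several subtasks of the same checkpoint, which completes only once every expected subtask has registered its result — can be sketched roughly as follows (class and method names are illustrative stand-ins, not the actual Flink classes; error handling and the stream/serializer plumbing are omitted):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the "wait for all subtasks" bookkeeping: the writer is created
// with the full set of expected subtasks and marks itself finished only
// after the last one registers its result.
class SharedChannelStateWriter {
    private final Set<String> waitingSubtasks;
    private boolean finished = false;

    SharedChannelStateWriter(Set<String> expectedSubtasks) {
        this.waitingSubtasks = new HashSet<>(expectedSubtasks);
    }

    /** Returns true once the last expected subtask has reported in. */
    boolean registerSubtaskResult(String subtask) {
        if (!waitingSubtasks.remove(subtask)) {
            throw new IllegalStateException("unknown or duplicate subtask: " + subtask);
        }
        if (waitingSubtasks.isEmpty()) {
            finished = true; // here the real writer would finalize the shared output stream
        }
        return finished;
    }
}
```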
[GitHub] [flink] Aitozi commented on pull request #21441: [hotfix] Remove the duplicated test case
Aitozi commented on PR #21441: URL: https://github.com/apache/flink/pull/21441#issuecomment-1357774211 Hi @twalthr could help take a look at this simple cleanup? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (FLINK-30444) State recovery error not handled correctly and always causes JM failure
[ https://issues.apache.org/jira/browse/FLINK-30444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Morávek reassigned FLINK-30444: - Assignee: David Morávek > State recovery error not handled correctly and always causes JM failure > --- > > Key: FLINK-30444 > URL: https://issues.apache.org/jira/browse/FLINK-30444 > Project: Flink > Issue Type: Bug > Components: Client / Job Submission >Affects Versions: 1.16.0, 1.14.6, 1.15.3 >Reporter: Gyula Fora >Assignee: David Morávek >Priority: Critical > > When you submit a job in Application mode and try to restore from an > incompatible savepoint, the behaviour is very unexpected. > Even with the following config: > {noformat} > execution.shutdown-on-application-finish: false > execution.submit-failed-job-on-application-error: true{noformat} > the job goes into a FAILED state and the JobManager fails. In a Kubernetes > environment (when using the native Kubernetes integration) this means that > the JobManager is restarted automatically. > This means that if you have the job result store enabled, after the JM comes > back you will end up with an empty application cluster. > I think the correct behaviour would be, depending on the above-mentioned config: > 1. If there is a job recovery error and you have > execution.submit-failed-job-on-application-error configured, then the job > should show up as failed, and the JM should not exit (if > execution.shutdown-on-application-finish is false) > 2. If execution.shutdown-on-application-finish is true, then the JobManager > should exit cleanly as on a normal job terminal state and thus stop the > deployment in Kubernetes, preventing a JM restart cycle
[jira] [Commented] (FLINK-30450) FileSystem supports exporting client-side metrics
[ https://issues.apache.org/jira/browse/FLINK-30450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649357#comment-17649357 ] Anton Ippolitov commented on FLINK-30450: - Thank you for creating this ticket! To expand on what I mentioned in the original [email thread,|https://lists.apache.org/thread/dpgh6sh0r21sgohjxxbqtm2mrmjdolgr] we'd like to be able to see the following metrics: * Request rate (if possible tagged by HTTP method) * Request latency * Upload / download byte rates * Error rate (if possible tagged by error) - would be useful to track throttling errors from S3 for example * Retry count * Number of active connections As mentioned in the thread, the S3 Presto client already gathers these metrics [here|https://github.com/prestodb/presto/blob/0.272/presto-hive/src/main/java/com/facebook/presto/hive/s3/PrestoS3FileSystemStats.java] but they are not exposed anywhere in Flink. The S3A client also has built-in [metrics|https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Metrics] that are actually already exposed via JMX when the client is used in Flink but it would obviously be great to standardize the way we expose FS metrics on the Flink side. I haven't looked into GCS or Azure Storage yet but definitely interested in metrics from these clients too. > FileSystem supports exporting client-side metrics > - > > Key: FLINK-30450 > URL: https://issues.apache.org/jira/browse/FLINK-30450 > Project: Flink > Issue Type: New Feature > Components: FileSystems >Reporter: Hangxiang Yu >Priority: Major > > Client-side metrics, or job level metrics for filesystem could help us to > monitor filesystem more precisely. > Some metrics (like request rate , throughput, latency, retry count, etc) are > useful to monitor the network or client problem of checkpointing or other > access cases for a job. > Some filesystems like s3, s3-presto, gs have supported enabling some metrics, > these could be exported in the filesystem. 
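Client-side byte and request counters of the kind requested in this thread could, in principle, be collected by wrapping the filesystem's streams. The following is a hedged sketch of that idea only — not Flink's metrics API (in Flink, such counters would be registered on a metric group):

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: count bytes and read calls flowing through a filesystem input
// stream. Only the array-based read is instrumented, and only that variant
// is used below.
class CountingInputStream extends FilterInputStream {
    final AtomicLong bytesRead = new AtomicLong();
    final AtomicLong readCalls = new AtomicLong();

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        readCalls.incrementAndGet();
        int n = super.read(buf, off, len);
        if (n > 0) {
            bytesRead.addAndGet(n);
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        CountingInputStream in =
                new CountingInputStream(new ByteArrayInputStream(new byte[1024]));
        byte[] buf = new byte[256];
        while (in.read(buf, 0, buf.length) != -1) {
            // drain the stream
        }
        System.out.println(in.bytesRead.get()); // 1024
    }
}
```

The same wrapping approach could track request latency or error counts around an S3/GCS client call, which is what the server-side view cannot provide.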
[GitHub] [flink] flinkbot commented on pull request #21530: [FLINK-30376][table-planner] Introduce a new flink busy join reorder rule which based on greedy algorithm
flinkbot commented on PR #21530: URL: https://github.com/apache/flink/pull/21530#issuecomment-1357765946 ## CI report: * c139c227d20dd5e3406ac080d3efcc0e1c106e9d UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[jira] [Updated] (FLINK-30376) Introduce a new flink busy join reorder rule which based on greedy algorithm
[ https://issues.apache.org/jira/browse/FLINK-30376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-30376: --- Labels: pull-request-available (was: ) > Introduce a new flink busy join reorder rule which based on greedy algorithm > > > Key: FLINK-30376 > URL: https://issues.apache.org/jira/browse/FLINK-30376 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Planner >Affects Versions: 1.17.0 >Reporter: Yunhong Zheng >Priority: Major > Labels: pull-request-available > Fix For: 1.17.0 > > > Introducing a new Flink busy join reorder strategy which based on the greedy > algorithm. The old join reorder rule will also be the default join reorder > rule and the new busy join reorder strategy will be optional. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] swuferhong opened a new pull request, #21530: [FLINK-30376][table-planner] Introduce a new flink busy join reorder rule which based on greedy algorithm
swuferhong opened a new pull request, #21530: URL: https://github.com/apache/flink/pull/21530 ## What is the purpose of the change Introducing a new Flink busy join reorder strategy which is based on the greedy algorithm. The old join reorder rule will remain the default join reorder rule, and the new busy join reorder strategy will be optional. ## Brief change log - Introducing a new Flink busy join reorder strategy named `FlinkBusyJoinReorderRule` - Adding plan tests - Adding ITCase ## Verifying this change - Adding plan tests: `FlinkBusyJoinReorderRuleTest` - Adding ITCase: `JoinReorderITCase` in stream and batch mode ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? yes - If yes, how is the feature documented? no docs
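A greedy join-reorder heuristic of the kind this PR describes can be sketched as follows. This is a simplified illustration under an assumed constant join selectivity; the actual `FlinkBusyJoinReorderRule` works on the planner's cost model and join conditions:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Greedy join reorder sketch: repeatedly merge the pair of join groups with
// the smallest estimated result size until all relations are joined.
class GreedyJoinReorder {
    static final double SELECTIVITY = 0.1; // stand-in for real statistics

    static List<String> order(LinkedHashMap<String, Double> cardinalities) {
        List<List<String>> groups = new ArrayList<>();
        List<Double> sizes = new ArrayList<>();
        for (Map.Entry<String, Double> e : cardinalities.entrySet()) {
            groups.add(new ArrayList<>(List.of(e.getKey())));
            sizes.add(e.getValue());
        }
        while (groups.size() > 1) {
            int bi = 0, bj = 1;
            double best = Double.MAX_VALUE;
            // Find the cheapest pair of groups to join next.
            for (int i = 0; i < groups.size(); i++) {
                for (int j = i + 1; j < groups.size(); j++) {
                    double est = sizes.get(i) * sizes.get(j) * SELECTIVITY;
                    if (est < best) { best = est; bi = i; bj = j; }
                }
            }
            groups.get(bi).addAll(groups.get(bj)); // merge the cheapest pair
            sizes.set(bi, best);
            groups.remove(bj);
            sizes.remove(bj);
        }
        return groups.get(0);
    }
}
```

For example, with estimated cardinalities A=100, B=10, C=1000 the heuristic joins A and B first (smallest estimated intermediate result), then attaches C.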
[GitHub] [flink] ramkrish86 commented on pull request #21508: [FLINK-30128] Introduce Azure Data Lake Gen2 APIs in the Hadoop Recoverable path
ramkrish86 commented on PR #21508: URL: https://github.com/apache/flink/pull/21508#issuecomment-1357757731 @flinkbot run azure
[GitHub] [flink] Aitozi commented on pull request #15137: [FLINK-21396][table] Resolve tables when calling Catalog.createTable/alterTable
Aitozi commented on PR #15137: URL: https://github.com/apache/flink/pull/15137#issuecomment-1357750969 Thanks @twalthr for your input. Because `TableEnvironment#getCatalog` leaks the internal `Catalog` to users, it's not compatible to enforce that the Catalog interacts with resolved tables only, right? Recently, I've been trying to migrate `TableSchema` to the new APIs in the Hive connector, and I ran into some problems when dealing with the HiveCatalog. The best way, I think, is to let it deal with the ResolvedCatalogTable directly. Now, with your notes, I think it makes sense to do an extra cast internally.
[jira] [Updated] (FLINK-30454) Inconsistent class hierarchy in TaskIOMetricGroup
[ https://issues.apache.org/jira/browse/FLINK-30454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunnar Morling updated FLINK-30454: --- Description: I noticed an interesting issue when trying to compile the flink-runtime module with Eclipse (same for VSCode): the _private_ inner class {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.SizeGauge}} has yet another _public_ inner class, {{{}SizeSupplier{}}}. The public method {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.registerMailboxSizeSupplier(SizeSupplier)}} has a parameter of that type. The invocation of this method in {{org.apache.flink.streaming.runtime.tasks.StreamTask.StreamTask(Environment, TimerService, UncaughtExceptionHandler, StreamTaskActionExecutor, TaskMailbox)}} can be compiled with the javac compiler of the JDK, but fails to compile with ecj: {code:java} The type TaskIOMetricGroup.SizeGauge from the descriptor computed for the target context is not visible here. {code} I tend to believe that the behavior of Eclipse's compiler is the correct one. After all, you couldn't explicitly reference the {{SizeSupplier}} type either. One possible fix would be to promote {{SizeSupplier}} to the same level as {{{}SizeGauge{}}}. This would be source-compatible but not binary-compatible, though. I.e. code compiled against the earlier signature of {{registerMailboxSizeSupplier()}} would be broken with a {{{}NoSuchMethodError{}}}. Depending on whether {{registerMailboxSizeSupplier()}} are expected in client code or not, this may or may not be acceptable. Another fix would be to make {{SizeGauge}} public. I think that's the change I'd do. Curious what other folks here think. was: I noticed an interesting issue when trying to compile the flink-runtime module with Eclipse (same for VSCode): the _private_ inner class {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.SizeGauge}} has yet another _public_ inner class, {{SizeSupplier}}. 
The public method {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.registerMailboxSizeSupplier(SizeSupplier)}} has a parameter of that type. The invocation of this method in {{org.apache.flink.streaming.runtime.tasks.StreamTask.StreamTask(Environment, TimerService, UncaughtExceptionHandler, StreamTaskActionExecutor, TaskMailbox)}} can be compiled with the javac compiler of the JDK but fails to compile with ecj: {code} The type TaskIOMetricGroup.SizeGauge from the descriptor computed for the target context is not visible here. {code} I tend to believe that the behavior of Eclipse's compiler is the correct one. After all, you couldn't explicitly reference the {{SizeSupplier}} type either. One possible fix would be to promote {{SizeSupplier}} to the same level as {{SizeGauge}}. This would be source-compatible but not binary-compatible, though. I.e. code compiled against the earlier signature of {{registerMailboxSizeSupplier()}} would be broken with a {{NoSuchMethodError}}. Depending on whether {{registerMailboxSizeSupplier()}} are expected in client code or not, this may or may not be acceptable. Another fix would be to make {{SizeGauge}} public. I think that's the change I'd do. Curious what other folks here think. > Inconsistent class hierarchy in TaskIOMetricGroup > - > > Key: FLINK-30454 > URL: https://issues.apache.org/jira/browse/FLINK-30454 > Project: Flink > Issue Type: Improvement > Components: Runtime / Metrics >Reporter: Gunnar Morling >Priority: Major > > I noticed an interesting issue when trying to compile the flink-runtime > module with Eclipse (same for VSCode): the _private_ inner class > {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.SizeGauge}} has > yet another _public_ inner class, {{{}SizeSupplier{}}}. The public method > {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.registerMailboxSizeSupplier(SizeSupplier)}} > has a parameter of that type. 
> The invocation of this method in > {{org.apache.flink.streaming.runtime.tasks.StreamTask.StreamTask(Environment, > TimerService, UncaughtExceptionHandler, StreamTaskActionExecutor, > TaskMailbox)}} can be compiled with the javac compiler of the JDK, but fails > to compile with ecj: > {code:java} > The type TaskIOMetricGroup.SizeGauge from the descriptor computed for the > target context is not visible here. > {code} > I tend to believe that the behavior of Eclipse's compiler is the correct one. > After all, you couldn't explicitly reference the {{SizeSupplier}} type either. > One possible fix would be to promote {{SizeSupplier}} to the same level as > {{{}SizeGauge{}}}. This would be source-compatible but not binary-compatible, > though. I.e. code compiled against the earlier signature of > {{registerMailboxSizeSupplier()}} would be
[jira] [Comment Edited] (FLINK-30454) Inconsistent class hierarchy in TaskIOMetricGroup
[ https://issues.apache.org/jira/browse/FLINK-30454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649345#comment-17649345 ] Gunnar Morling edited comment on FLINK-30454 at 12/19/22 2:16 PM: -- [~rmetzger], curious about your thoughts on that compatibility question. was (Author: gunnar.morling): [~rmetzger] , curious about your thoughts on that compatibility question. > Inconsistent class hierarchy in TaskIOMetricGroup > - > > Key: FLINK-30454 > URL: https://issues.apache.org/jira/browse/FLINK-30454 > Project: Flink > Issue Type: Improvement > Components: Runtime / Metrics >Reporter: Gunnar Morling >Priority: Major > > I noticed an interesting issue when trying to compile the flink-runtime > module with Eclipse (same for VSCode): the _private_ inner class > {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.SizeGauge}} has > yet another _public_ inner class, {{SizeSupplier}}. The public method > {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.registerMailboxSizeSupplier(SizeSupplier)}} > has a parameter of that type. The invocation of this method in > {{org.apache.flink.streaming.runtime.tasks.StreamTask.StreamTask(Environment, > TimerService, UncaughtExceptionHandler, StreamTaskActionExecutor, > TaskMailbox)}} can be compiled with the javac compiler of the JDK but fails > to compile with ecj: > {code} > The type TaskIOMetricGroup.SizeGauge from the descriptor computed for the > target context is not visible here. > {code} > I tend to believe that the behavior of Eclipse's compiler is the correct one. > After all, you couldn't explicitly reference the {{SizeSupplier}} type > either. One possible fix would be to promote {{SizeSupplier}} to the same > level as {{SizeGauge}}. This would be source-compatible but not > binary-compatible, though. I.e. code compiled against the earlier signature > of {{registerMailboxSizeSupplier()}} would be broken with a > {{NoSuchMethodError}}. 
Depending on whether {{registerMailboxSizeSupplier()}} > are expected in client code or not, this may or may not be acceptable. > Another fix would be to make {{SizeGauge}} public. I think that's the change > I'd do. Curious what other folks here think. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-30454) Inconsistent class hierarchy in TaskIOMetricGroup
Gunnar Morling created FLINK-30454: -- Summary: Inconsistent class hierarchy in TaskIOMetricGroup Key: FLINK-30454 URL: https://issues.apache.org/jira/browse/FLINK-30454 Project: Flink Issue Type: Improvement Components: Runtime / Metrics Reporter: Gunnar Morling I noticed an interesting issue when trying to compile the flink-runtime module with Eclipse (same for VSCode): the _private_ inner class {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.SizeGauge}} has yet another _public_ inner class, {{SizeSupplier}}. The public method {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.registerMailboxSizeSupplier(SizeSupplier)}} has a parameter of that type. The invocation of this method in {{org.apache.flink.streaming.runtime.tasks.StreamTask.StreamTask(Environment, TimerService, UncaughtExceptionHandler, StreamTaskActionExecutor, TaskMailbox)}} can be compiled with the javac compiler of the JDK but fails to compile with ecj: {code} The type TaskIOMetricGroup.SizeGauge from the descriptor computed for the target context is not visible here. {code} I tend to believe that the behavior of Eclipse's compiler is the correct one. After all, you couldn't explicitly reference the {{SizeSupplier}} type either. One possible fix would be to promote {{SizeSupplier}} to the same level as {{SizeGauge}}. This would be source-compatible but not binary-compatible, though. I.e. code compiled against the earlier signature of {{registerMailboxSizeSupplier()}} would be broken with a {{NoSuchMethodError}}. Depending on whether {{registerMailboxSizeSupplier()}} are expected in client code or not, this may or may not be acceptable. Another fix would be to make {{SizeGauge}} public. I think that's the change I'd do. Curious what other folks here think. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-30454) Inconsistent class hierarchy in TaskIOMetricGroup
[ https://issues.apache.org/jira/browse/FLINK-30454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649345#comment-17649345 ] Gunnar Morling commented on FLINK-30454: [~rmetzger] , curious about your thoughts on that compatibility question. > Inconsistent class hierarchy in TaskIOMetricGroup > - > > Key: FLINK-30454 > URL: https://issues.apache.org/jira/browse/FLINK-30454 > Project: Flink > Issue Type: Improvement > Components: Runtime / Metrics >Reporter: Gunnar Morling >Priority: Major > > I noticed an interesting issue when trying to compile the flink-runtime > module with Eclipse (same for VSCode): the _private_ inner class > {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.SizeGauge}} has > yet another _public_ inner class, {{SizeSupplier}}. The public method > {{org.apache.flink.runtime.metrics.groups.TaskIOMetricGroup.registerMailboxSizeSupplier(SizeSupplier)}} > has a parameter of that type. The invocation of this method in > {{org.apache.flink.streaming.runtime.tasks.StreamTask.StreamTask(Environment, > TimerService, UncaughtExceptionHandler, StreamTaskActionExecutor, > TaskMailbox)}} can be compiled with the javac compiler of the JDK but fails > to compile with ecj: > {code} > The type TaskIOMetricGroup.SizeGauge from the descriptor computed for the > target context is not visible here. > {code} > I tend to believe that the behavior of Eclipse's compiler is the correct one. > After all, you couldn't explicitly reference the {{SizeSupplier}} type > either. One possible fix would be to promote {{SizeSupplier}} to the same > level as {{SizeGauge}}. This would be source-compatible but not > binary-compatible, though. I.e. code compiled against the earlier signature > of {{registerMailboxSizeSupplier()}} would be broken with a > {{NoSuchMethodError}}. Depending on whether {{registerMailboxSizeSupplier()}} > are expected in client code or not, this may or may not be acceptable. 
> Another fix would be to make {{SizeGauge}} public. I think that's the change > I'd do. Curious what other folks here think.
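The visibility quirk Gunnar describes can be reproduced with a self-contained sketch. The names below are stand-ins for the Flink classes, not the real ones: a public method's parameter type is a public interface nested inside a private inner class, which javac tolerates at lambda call sites while ecj does not.

```java
// Stand-in for TaskIOMetricGroup: the parameter type of a public method is
// nested inside a *private* inner class.
class MetricGroup {
    private static class SizeGauge {
        public interface SizeSupplier<R> {
            R get();
        }
    }

    // Callers cannot name SizeGauge.SizeSupplier explicitly, but javac still
    // accepts lambda arguments here; ecj reports the descriptor type as not
    // visible at the call site.
    public int registerMailboxSizeSupplier(SizeGauge.SizeSupplier<Integer> supplier) {
        return supplier.get();
    }
}

public class DescriptorVisibility {
    public static void main(String[] args) {
        // Compiles with javac; ecj rejects this line with the error quoted
        // in the ticket.
        int size = new MetricGroup().registerMailboxSizeSupplier(() -> 42);
        System.out.println(size);
    }
}
```

Promoting `SizeSupplier` to a top-level nested type of `MetricGroup` (the fix discussed above) makes both compilers agree, at the cost of the binary-compatibility concern raised in the ticket.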
[jira] [Commented] (FLINK-28875) Add FlinkSessionJobControllerTest
[ https://issues.apache.org/jira/browse/FLINK-28875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649343#comment-17649343 ] Gyula Fora commented on FLINK-28875: merged to release-1.3 69f7e94808cbded27f2238bf3039b9d693009f4a > Add FlinkSessionJobControllerTest > - > > Key: FLINK-28875 > URL: https://issues.apache.org/jira/browse/FLINK-28875 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Reporter: Gyula Fora >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-1.4.0, kubernetes-operator-1.3.1 > > > There are currently no tests for the FlinkSessionJobController, only unit > tests for the individual components of it. > We should add a larger integration style test similar to the > FlinkDeploymentControllerTest.
[jira] [Updated] (FLINK-28875) Add FlinkSessionJobControllerTest
[ https://issues.apache.org/jira/browse/FLINK-28875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora updated FLINK-28875: --- Fix Version/s: kubernetes-operator-1.3.1
[GitHub] [flink-kubernetes-operator] gyfora merged pull request #486: [backport][FLINK-28875] Add FlinkSessionJobControllerTest
gyfora merged PR #486: URL: https://github.com/apache/flink-kubernetes-operator/pull/486
[GitHub] [flink] ramkrish86 commented on pull request #21508: [FLINK-30128] Introduce Azure Data Lake Gen2 APIs in the Hadoop Recoverable path
ramkrish86 commented on PR #21508: URL: https://github.com/apache/flink/pull/21508#issuecomment-1357728930 Pushed the changes. We will update here once the other tests are done.
[GitHub] [flink-connector-mongodb] Jiabao-Sun commented on a diff in pull request #1: [FLINK-6573][connectors/mongodb] Flink MongoDB Connector
Jiabao-Sun commented on code in PR #1: URL: https://github.com/apache/flink-connector-mongodb/pull/1#discussion_r1052217885 ## flink-connector-mongodb/src/test/java/org/apache/flink/connector/mongodb/table/converter/MongoConvertersTest.java: ## @@ -146,28 +146,31 @@ public void testConvertBsonToRowData() { GenericRowData.of( StringData.fromString(oid.toHexString()), StringData.fromString("string"), -StringData.fromString(uuid.toString()), +StringData.fromString( +"{\"_value\": {\"$binary\": {\"base64\": \"gR+qXamERr+L0IyxvO9daQ==\", \"subType\": \"04\"}}}"), 2, 3L, 4.1d, DecimalData.fromBigDecimal(new BigDecimal("5.1"), 10, 2), false, TimestampData.fromEpochMillis(now.getEpochSecond() * 1000), +TimestampData.fromEpochMillis(now.toEpochMilli()), StringData.fromString( -OffsetDateTime.ofInstant( - Instant.ofEpochMilli(now.toEpochMilli()), -ZoneOffset.UTC) -.format(ISO_OFFSET_DATE_TIME)), -StringData.fromString("/^9$/i"), -StringData.fromString("function() { return 10; }"), -StringData.fromString("function() { return 11; }"), -StringData.fromString("12"), -StringData.fromString(oid.toHexString()), +"{\"_value\": {\"$regularExpression\": {\"pattern\": \"^9$\", \"options\": \"i\"}}}"), Review Comment: No matter which form it is, it cannot be directly handed over to MongoDB for processing. The bson document stored in mongodb needs to be converted into `BsonDocument`. We need to extract and convert the `RowData` fileds stored as string types, and finally convert the entire `RowData` into `BsonDocument`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-30355) crictl causes long wait in e2e tests
[ https://issues.apache.org/jira/browse/FLINK-30355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn Visser closed FLINK-30355. -- Fix Version/s: 1.17.0 Resolution: Fixed Fixed in master: f9557068cf203a2eabf31ffd3911c40373d7e9ce > crictl causes long wait in e2e tests > > > Key: FLINK-30355 > URL: https://issues.apache.org/jira/browse/FLINK-30355 > Project: Flink > Issue Type: Bug > Components: Test Infrastructure, Tests >Affects Versions: 1.17.0 >Reporter: Matthias Pohl >Assignee: Martijn Visser >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.17.0 > > > We observed strange behavior in the e2e test where the e2e test run times > out: > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=43824=logs=bea52777-eaf8-5663-8482-18fbc3630e81=ae4f8708-9994-57d3-c2d7-b892156e7812=b2642e3a-5b86-574d-4c8a-f7e2842bfb14=7446 > The issue seems to be related to {{crictl}} again because we see the > following error message in multiple tests. No logs are produced afterwards > for ~30mins resulting in the overall test run taking too long: > {code} > Dec 09 08:55:39 crictl > fatal: destination path 'cri-dockerd' already exists and is not an empty > directory. > fatal: a branch named 'v0.2.3' already exists > mkdir: cannot create directory ‘bin’: File exists > Dec 09 09:26:41 fs.protected_regular = 0 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] MartijnVisser merged pull request #21520: [FLINK-30355][Kubernetes][Tests] Remove `cri-dockerd` if already cloned earlier
MartijnVisser merged PR #21520: URL: https://github.com/apache/flink/pull/21520
[GitHub] [flink-connector-mongodb] Jiabao-Sun commented on a diff in pull request #1: [FLINK-6573][connectors/mongodb] Flink MongoDB Connector
Jiabao-Sun commented on code in PR #1: URL: https://github.com/apache/flink-connector-mongodb/pull/1#discussion_r1052209447

## flink-connector-mongodb/src/test/java/org/apache/flink/connector/mongodb/table/converter/MongoConvertersTest.java: (quotes the same `testConvertBsonToRowData` hunk as the earlier comment)

Review Comment: The current BSON API only supports converting whole BSON documents. For a single BSON value, we would need to customize `JsonWriter` and `JsonReader`. So in the previous implementation we used a `_value` field to wrap the single BSON value into a BSON document, so that we can parse it easily.

```java
import org.bson.BsonDocument;
import org.bson.BsonRegularExpression;
import org.bson.json.JsonMode;
import org.bson.json.JsonWriterSettings;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class JsonConversionTest {

    @Test
    public void bsonToJsonTest() {
        BsonRegularExpression original = new BsonRegularExpression("regex", "i");
        BsonDocument wrapped = new BsonDocument("_value", original);

        String json =
                wrapped.toJson(
                        JsonWriterSettings.builder().outputMode(JsonMode.EXTENDED).build());
        // {"_value": {"$regularExpression": {"pattern": "regex", "options": "i"}}}
        System.out.println(json);

        BsonDocument parsed = BsonDocument.parse(json);
        BsonRegularExpression parsedRegularExpression = parsed.getRegularExpression("_value");
        assertEquals(parsedRegularExpression, original);
    }
}
```
[GitHub] [flink] MartijnVisser commented on pull request #21128: [FLINK-29710] Bump minimum supported Hadoop version to 2.10.2
MartijnVisser commented on PR #21128: URL: https://github.com/apache/flink/pull/21128#issuecomment-1357654602 @flinkbot run azure
[GitHub] [flink-connector-mongodb] Jiabao-Sun commented on a diff in pull request #1: [FLINK-6573][connectors/mongodb] Flink MongoDB Connector
Jiabao-Sun commented on code in PR #1: URL: https://github.com/apache/flink-connector-mongodb/pull/1#discussion_r1052185934

## flink-connector-mongodb/src/main/java/org/apache/flink/connector/mongodb/table/MongoDynamicTableFactory.java:

```diff
@@ -151,16 +146,11 @@ public DynamicTableSink createDynamicTableSink(Context context) {
         MongoConfiguration config = new MongoConfiguration(helper.getOptions());
         helper.validate();

-        ResolvedSchema schema = context.getCatalogTable().getResolvedSchema();
-        SerializableFunction keyExtractor =
-                MongoKeyExtractor.createKeyExtractor(schema);
-
         return new MongoDynamicTableSink(
                 getConnectionOptions(config),
                 getWriteOptions(config),
                 config.getSinkParallelism(),
-                context.getPhysicalRowDataType(),
-                keyExtractor);
+                context.getCatalogTable().getResolvedSchema());
```

Review Comment: OK
[GitHub] [flink-connector-mongodb] twalthr commented on a diff in pull request #1: [FLINK-6573][connectors/mongodb] Flink MongoDB Connector
twalthr commented on code in PR #1: URL: https://github.com/apache/flink-connector-mongodb/pull/1#discussion_r1052181728

## flink-connector-mongodb/src/main/java/org/apache/flink/connector/mongodb/table/MongoDynamicTableFactory.java: (quotes the same `createDynamicTableSink` hunk as the preceding comment)

Review Comment: Personally, I would keep the catalog's schema out of the sink. The sink has no schema, only a data type. Let's use a boolean `isUpsert` flag in the constructor.

## flink-connector-mongodb/src/test/java/org/apache/flink/connector/mongodb/table/converter/MongoConvertersTest.java: (quotes the same `testConvertBsonToRowData` hunk as the earlier comment)

Review Comment: Whether we need to customize a `JsonWriter` depends on the round-trip story: is it possible to read a RegExp as a string and write it out again as a string that MongoDB would correctly classify as a RegExp?
[jira] [Updated] (FLINK-27177) Implement Binary Sorted State
[ https://issues.apache.org/jira/browse/FLINK-27177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot updated FLINK-27177: - Description: Following the plan in [FLIP-220|https://cwiki.apache.org/confluence/x/Xo_FD] (was: Following the plan in [FLIP-220|[https://cwiki.apache.org/confluence/x/Xo_FD]|https://cwiki.apache.org/confluence/x/Xo_FD],]) > Implement Binary Sorted State > - > > Key: FLINK-27177 > URL: https://issues.apache.org/jira/browse/FLINK-27177 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: David Anderson >Priority: Major > > Following the plan in [FLIP-220|https://cwiki.apache.org/confluence/x/Xo_FD]
[GitHub] [flink] wangyang0918 commented on pull request #21527: [FLINK-27925] watch tm pod performance optimization
wangyang0918 commented on PR #21527: URL: https://github.com/apache/flink/pull/21527#issuecomment-1357635178 We need to add a test to guard this behavior.
[GitHub] [flink] flinkbot commented on pull request #21529: [DO NOT MERGE] Testing CI with Hadoop 3 upgrade
flinkbot commented on PR #21529: URL: https://github.com/apache/flink/pull/21529#issuecomment-1357633959

## CI report:

* d56fcee002d98d3a9b70818c0243cbb4856b3874 UNKNOWN

Bot commands: the @flinkbot bot supports the following commands:

- `@flinkbot run azure` re-run the last Azure build
[jira] [Assigned] (FLINK-27925) Avoid to create watcher without the resourceVersion
[ https://issues.apache.org/jira/browse/FLINK-27925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Wang reassigned FLINK-27925: - Assignee: ouyangwulin > Avoid to create watcher without the resourceVersion > --- > > Key: FLINK-27925 > URL: https://issues.apache.org/jira/browse/FLINK-27925 > Project: Flink > Issue Type: Improvement > Components: Deployment / Kubernetes >Reporter: Aitozi >Assignee: ouyangwulin >Priority: Major > Labels: pull-request-available > Attachments: image-2022-12-19-20-19-41-303.png > > > Currently, we create the watcher in KubernetesResourceManager. But it do not > pass the resourceVersion parameter, it will trigger a request to etcd. It > will bring the burden to the etcd in large scale cluster (which have been > seen in our internal k8s cluster). More detail can be found > [here|https://kubernetes.io/docs/reference/using-api/api-concepts/#the-resourceversion-parameter] > > I think we could use the informer to improve it (which will spawn a > list-watch and maintain the resourceVersion internally)
[jira] [Commented] (FLINK-27925) Avoid to create watcher without the resourceVersion
[ https://issues.apache.org/jira/browse/FLINK-27925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649317#comment-17649317 ] Yang Wang commented on FLINK-27925: --- Thanks [~ouyangwuli] for sharing the investigation. +1 for adding the resourceVersion=0 when doing the list pods.
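The effect of that parameter can be illustrated with a small, self-contained sketch. Note this is a hypothetical helper, not Flink's actual Kubernetes client code; the class, method names, and URL base are illustrative. Leaving `resourceVersion` unset forces a consistent quorum read from etcd, while `resourceVersion=0` lets the apiserver answer from its watch cache:

```java
// Hypothetical sketch of the REST query a Kubernetes client ends up issuing
// when listing pods; not Flink's real client code.
public class ResourceVersionSketch {

    // Builds the "list pods" URL. With resourceVersion="0" the apiserver may
    // serve the list from its cache; with it unset, it does a quorum read on etcd.
    static String listPodsUrl(String namespace, String resourceVersion) {
        String base = "https://kubernetes.default.svc/api/v1/namespaces/" + namespace + "/pods";
        return resourceVersion == null ? base : base + "?resourceVersion=" + resourceVersion;
    }

    public static void main(String[] args) {
        System.out.println(listPodsUrl("flink", null)); // consistent, etcd-backed read
        System.out.println(listPodsUrl("flink", "0"));  // cache-backed read, cheaper at scale
    }
}
```

At large scale the difference matters because every consistent list is a quorum read against etcd, which is exactly the bottleneck described in this issue.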
[GitHub] [flink] MartijnVisser opened a new pull request, #21529: [DO NOT MERGE] Testing CI with Hadoop 3 upgrade
MartijnVisser opened a new pull request, #21529: URL: https://github.com/apache/flink/pull/21529 This PR is only to validate if https://github.com/apache/flink/pull/21128 compiles with the Hadoop 3 profile
[GitHub] [flink-connector-aws] boring-cyborg[bot] commented on pull request #41: [FLINK-30224][Connectors/Kinesis] An IT test for slow FlinKinesisConsumer's run() which caused an NPE in close
boring-cyborg[bot] commented on PR #41: URL: https://github.com/apache/flink-connector-aws/pull/41#issuecomment-1357608490 Thanks for opening this pull request! Please check out our contributing guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)
[GitHub] [flink-connector-aws] astamur opened a new pull request, #41: [FLINK-30224][Connectors/Kinesis] An IT test for slow FlinKinesisConsumer's run() which caused an NPE in close
astamur opened a new pull request, #41: URL: https://github.com/apache/flink-connector-aws/pull/41

## What is the purpose of the change

[FLINK-30224](https://issues.apache.org/jira/projects/FLINK/issues/FLINK-30224)
- An IT test was added which ensures that the issue with a slow KDS consumer initialization doesn't cause an NPE during cancellation;
- Previously the issue itself was fixed in [FLINK-29324](https://issues.apache.org/jira/projects/FLINK/issues/FLINK-29324).

## Brief change log

Improvement. An IT test was added which checks this fix: [FLINK-29324](https://issues.apache.org/jira/projects/FLINK/issues/FLINK-29324)

## Verifying this change

This change only adds a new test. Azure build succeeded.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): **No**
- The public API, i.e., is any changed class annotated with @Public(Evolving): **No**
- The serializers: **No**
- The runtime per-record code paths (performance sensitive): **No**
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: **No**
- The S3 file system connector: **No**

## Documentation

- Does this pull request introduce a new feature? **No**
- If yes, how is the feature documented? not applicable
[jira] [Comment Edited] (FLINK-27925) Avoid to create watcher without the resourceVersion
[ https://issues.apache.org/jira/browse/FLINK-27925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649232#comment-17649232 ] ouyangwulin edited comment on FLINK-27925 at 12/19/22 12:31 PM:

In the case of large-scale starting and stopping of jobs, constantly reading data from etcd can make etcd performance a bottleneck. I agree with [~wangyang0918] that using an informer will increase memory pressure. Instead, we can set resourceVersion=0 in the watcher to reduce the data read from etcd. As [https://kubernetes.io/docs/reference/using-api/api-concepts/#the-resourceversion-parameter] and the code screenshot describe:

1. resourceVersion unset means "Most Recent": the returned data must be consistent (in detail: served from etcd via a quorum read).
2. resourceVersion="0" means "Any": return data at any resource version. The newest available resource version is preferred, but strong consistency is not required; data at any resource version may be served. It is possible for the request to return data at a much older resource version than the client has previously observed, particularly in high-availability configurations, due to partitions or stale caches.

!image-2022-12-19-20-19-41-303.png!

was (Author: ouyangwuli): In the case of large-scale start and stop jobs, constantly reading data from etcd can cause a bottleneck in etcd performance. I agree with [~wangyang0918] using informer will increase memory pressure. We can increase resourceversion=0 in watch to reduce data read from etcd.
[jira] [Updated] (FLINK-30423) Introduce cast executor codegen for column type evolution
[ https://issues.apache.org/jira/browse/FLINK-30423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-30423: --- Labels: pull-request-available (was: ) > Introduce cast executor codegen for column type evolution > - > > Key: FLINK-30423 > URL: https://issues.apache.org/jira/browse/FLINK-30423 > Project: Flink > Issue Type: Sub-task > Components: Table Store >Affects Versions: table-store-0.3.0 >Reporter: Shammon >Priority: Major > Labels: pull-request-available > > Introduce cast executor codegen for column type evolution
[GitHub] [flink-table-store] zjureel opened a new pull request, #441: [FLINK-30423] Introduce codegen for CastExecutor
zjureel opened a new pull request, #441: URL: https://github.com/apache/flink-table-store/pull/441 This PR aims to introduce codegen for `CastExecutor`; the main code is copied from Flink's `CastRule` and related classes.
[GitHub] [flink-connector-mongodb] Jiabao-Sun commented on a diff in pull request #1: [FLINK-6573][connectors/mongodb] Flink MongoDB Connector
Jiabao-Sun commented on code in PR #1: URL: https://github.com/apache/flink-connector-mongodb/pull/1#discussion_r1052145855

## flink-connector-mongodb/src/test/java/org/apache/flink/connector/mongodb/table/converter/MongoConvertersTest.java: (quotes the same `testConvertBsonToRowData` hunk as the earlier comment)

Review Comment: Thanks @twalthr. The `JsonWriter` cannot directly write a `BsonValue` into a string. It throws an exception when writing a bare `BsonValue`, so we used `_value` to wrap the BSON value into a BSON document. However, we could also extend `JsonWriter` so that it does not perform this check when writing a BSON value directly. Do you think we need to customize a `JsonWriter`?

```java
package org.apache.flink;

import org.bson.BsonRegularExpression;
import org.bson.json.JsonWriter;

import java.io.IOException;
import java.io.StringWriter;

public class JsonWriterTest {

    public static void main(String[] args) throws IOException {
        try (StringWriter stringWriter = new StringWriter();
                JsonWriter jsonWriter = new JsonWriter(stringWriter)) {
            BsonRegularExpression regularExpression = new BsonRegularExpression("regex", "i");
            jsonWriter.writeRegularExpression(regularExpression);
        }
    }
}
```

```shell
Exception in thread "main" org.bson.BsonInvalidOperationException: A RegularExpression value cannot be written to the root level of a BSON document.
	at org.bson.AbstractBsonWriter.throwInvalidState(AbstractBsonWriter.java:740)
	at org.bson.AbstractBsonWriter.checkPreconditions(AbstractBsonWriter.java:701)
	at org.bson.AbstractBsonWriter.writeRegularExpression(AbstractBsonWriter.java:590)
	at org.apache.flink.JsonWriterTest.main(JsonWriterTest.java:15)
```
[jira] [Resolved] (FLINK-29996) Link to the task manager's thread dump page in the backpressure tab
[ https://issues.apache.org/jira/browse/FLINK-29996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yun Tang resolved FLINK-29996. -- Resolution: Fixed merged in master: 3c043cfd68fd52135175d4ae3d65a1e1909af133 > Link to the task manager's thread dump page in the backpressure tab > --- > > Key: FLINK-29996 > URL: https://issues.apache.org/jira/browse/FLINK-29996 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Web Frontend >Reporter: Yun Tang >Assignee: Yu Chen >Priority: Major > Labels: pull-request-available > Fix For: 1.17.0 > > > Currently, we have a complex steps to find the thread dump of backpressured > tasks, however, this could be simplified with a link in the backpressure tab > of web UI.
[GitHub] [flink] Myasuka merged pull request #21465: [FLINK-29996] Link to the task manager's thread dump page in the back…
Myasuka merged PR #21465: URL: https://github.com/apache/flink/pull/21465