[jira] [Commented] (FLINK-10819) The instability problem of CI, JobManagerHAProcessFailureRecoveryITCase.testDispatcherProcessFailure test fail.
[ https://issues.apache.org/jira/browse/FLINK-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756979#comment-16756979 ] Till Rohrmann commented on FLINK-10819: --- Another instance: https://api.travis-ci.org/v3/job/486563876/log.txt > The instability problem of CI, > JobManagerHAProcessFailureRecoveryITCase.testDispatcherProcessFailure test > fail. > --- > > Key: FLINK-10819 > URL: https://issues.apache.org/jira/browse/FLINK-10819 > Project: Flink > Issue Type: Test > Components: Tests >Affects Versions: 1.7.0 >Reporter: sunjincheng >Priority: Critical > Labels: test-stability > Fix For: 1.8.0 > > > Found the following error in the process of CI: > Results : > Tests in error: > JobManagerHAProcessFailureRecoveryITCase.testDispatcherProcessFailure:331 » > IllegalArgument > Tests run: 1463, Failures: 0, Errors: 1, Skipped: 29 > 18:40:55.828 [INFO] > > 18:40:55.829 [INFO] BUILD FAILURE > 18:40:55.829 [INFO] > > 18:40:55.830 [INFO] Total time: 30:19 min > 18:40:55.830 [INFO] Finished at: 2018-11-07T18:40:55+00:00 > 18:40:56.294 [INFO] Final Memory: 92M/678M > 18:40:56.294 [INFO] > > 18:40:56.294 [WARNING] The requested profile "include-kinesis" could not be > activated because it does not exist. > 18:40:56.295 [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test > (integration-tests) on project flink-tests_2.11: There are test failures. > 18:40:56.295 [ERROR] > 18:40:56.295 [ERROR] Please refer to > /home/travis/build/sunjincheng121/flink/flink-tests/target/surefire-reports > for the individual test results. > 18:40:56.295 [ERROR] -> [Help 1] > 18:40:56.295 [ERROR] > 18:40:56.295 [ERROR] To see the full stack trace of the errors, re-run Maven > with the -e switch. > 18:40:56.295 [ERROR] Re-run Maven using the -X switch to enable full debug > logging. 
> 18:40:56.295 [ERROR] > 18:40:56.295 [ERROR] For more information about the errors and possible > solutions, please read the following articles: > 18:40:56.295 [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException > MVN exited with EXIT CODE: 1. > Trying to KILL watchdog (11329). > ./tools/travis_mvn_watchdog.sh: line 269: 11329 Terminated watchdog > PRODUCED build artifacts. > But after the rerun, the error disappeared. > Currently, no specific cause has been found; we will continue to monitor this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] tillrohrmann commented on issue #7606: [FLINK-10774] Rework lifecycle management of partitionDiscoverer in FlinkKafkaConsumerBase
tillrohrmann commented on issue #7606: [FLINK-10774] Rework lifecycle management of partitionDiscoverer in FlinkKafkaConsumerBase URL: https://github.com/apache/flink/pull/7606#issuecomment-459245503 Thanks for the review @tzulitai. Merging. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Comment Edited] (FLINK-11370) Check and port ZooKeeperLeaderElectionITCase to new code base if necessary
[ https://issues.apache.org/jira/browse/FLINK-11370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756943#comment-16756943 ] lining edited comment on FLINK-11370 at 1/31/19 7:08 AM: - [~till.rohrmann], I found a bug in the cluster initialization of LeaderChangeClusterComponentsTest. Since I was already updating that test, I fixed it in this PR as well. Please review it too. was (Author: lining): [~till.rohrmann], I found some bugs in the cluster initialization of LeaderChangeClusterComponentsTest. Since I was already updating that test, I fixed them in this PR as well. Please review it too. > Check and port ZooKeeperLeaderElectionITCase to new code base if necessary > -- > > Key: FLINK-11370 > URL: https://issues.apache.org/jira/browse/FLINK-11370 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Till Rohrmann >Assignee: lining >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Check and port {{ZooKeeperLeaderElectionITCase}} to new code base if > necessary.
[GitHub] jinglining commented on a change in pull request #7613: [FLINK-11370][test]Check and port ZooKeeperLeaderElectionITCase to ne…
jinglining commented on a change in pull request #7613: [FLINK-11370][test]Check and port ZooKeeperLeaderElectionITCase to ne… URL: https://github.com/apache/flink/pull/7613#discussion_r252554942 ## File path: flink-runtime/src/test/java/org/apache/flink/runtime/leaderelection/LeaderChangeClusterComponentsTest.java ## @@ -68,8 +69,8 @@ public static void setupClass() throws Exception { miniCluster = new TestingMiniCluster( new MiniClusterConfiguration.Builder() + .setNumTaskManagers(NUM_TMS) .setNumSlotsPerTaskManager(SLOTS_PER_TM) - .setNumSlotsPerTaskManager(NUM_TMS) Review comment: @tillrohrmann, this is the bug; I have fixed it.
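The one-line fix in the diff above is easy to miss in review, so here is a dependency-free sketch of the failure mode. The builder below is a hypothetical stand-in shaped like `MiniClusterConfiguration.Builder`, not Flink's actual class, and the defaults of 1 are an assumption for illustration: calling `setNumSlotsPerTaskManager` twice means the second call silently overwrites the first, and `numTaskManagers` keeps its default.

```java
// Hypothetical stand-in for MiniClusterConfiguration.Builder (assumption:
// both fields default to 1); illustrates the duplicated-setter bug.
public class BuilderBugSketch {

    static final class MiniClusterConfig {
        final int numTaskManagers;
        final int numSlotsPerTaskManager;

        private MiniClusterConfig(int numTaskManagers, int numSlotsPerTaskManager) {
            this.numTaskManagers = numTaskManagers;
            this.numSlotsPerTaskManager = numSlotsPerTaskManager;
        }

        static final class Builder {
            private int numTaskManagers = 1;        // assumed default
            private int numSlotsPerTaskManager = 1; // assumed default

            Builder setNumTaskManagers(int n) { numTaskManagers = n; return this; }
            Builder setNumSlotsPerTaskManager(int n) { numSlotsPerTaskManager = n; return this; }
            MiniClusterConfig build() { return new MiniClusterConfig(numTaskManagers, numSlotsPerTaskManager); }
        }
    }

    public static void main(String[] args) {
        final int NUM_TMS = 2;
        final int SLOTS_PER_TM = 2;

        // Before the fix: the second setter call overwrites the first,
        // so the cluster still starts with the default single TaskManager.
        MiniClusterConfig buggy = new MiniClusterConfig.Builder()
                .setNumSlotsPerTaskManager(SLOTS_PER_TM)
                .setNumSlotsPerTaskManager(NUM_TMS)
                .build();

        // After the fix: each dimension is set exactly once.
        MiniClusterConfig fixed = new MiniClusterConfig.Builder()
                .setNumTaskManagers(NUM_TMS)
                .setNumSlotsPerTaskManager(SLOTS_PER_TM)
                .build();

        System.out.println("buggy TMs=" + buggy.numTaskManagers
                + ", fixed TMs=" + fixed.numTaskManagers);
    }
}
```

This kind of copy-paste slip is exactly what a test like `LeaderChangeClusterComponentsTest` catches once the expected task-manager count is asserted.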
[jira] [Commented] (FLINK-11370) Check and port ZooKeeperLeaderElectionITCase to new code base if necessary
[ https://issues.apache.org/jira/browse/FLINK-11370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756943#comment-16756943 ] lining commented on FLINK-11370: [~till.rohrmann], I found some bugs in the cluster initialization of LeaderChangeClusterComponentsTest. Since I was already updating that test, I fixed them in this PR as well. Please review it too. > Check and port ZooKeeperLeaderElectionITCase to new code base if necessary > -- > > Key: FLINK-11370 > URL: https://issues.apache.org/jira/browse/FLINK-11370 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Till Rohrmann >Assignee: lining >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Check and port {{ZooKeeperLeaderElectionITCase}} to new code base if > necessary.
[jira] [Commented] (FLINK-11370) Check and port ZooKeeperLeaderElectionITCase to new code base if necessary
[ https://issues.apache.org/jira/browse/FLINK-11370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756941#comment-16756941 ] lining commented on FLINK-11370: Hi, [~Tison]. I have pushed the code; please review it. Thanks for your reply. > Check and port ZooKeeperLeaderElectionITCase to new code base if necessary > -- > > Key: FLINK-11370 > URL: https://issues.apache.org/jira/browse/FLINK-11370 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Till Rohrmann >Assignee: lining >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Check and port {{ZooKeeperLeaderElectionITCase}} to new code base if > necessary.
[GitHub] jinglining opened a new pull request #7613: [FLINK-11370][test]Check and port ZooKeeperLeaderElectionITCase to ne…
jinglining opened a new pull request #7613: [FLINK-11370][test]Check and port ZooKeeperLeaderElectionITCase to ne… URL: https://github.com/apache/flink/pull/7613 ## What is the purpose of the change This pull request checks and ports ZooKeeperLeaderElectionITCase to the new code base where necessary. ## Brief change log - delete ZooKeeperLeaderElectionITCase; its testJobExecutionOnClusterWithLeaderReelection is already covered by LeaderChangeClusterComponentsTest - add testTaskManagerRegisterReelectionOfResourceManager in LeaderChangeClusterComponentsTest to replace testTaskManagerRegistrationAtReelectedLeader. ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable)
[jira] [Updated] (FLINK-11370) Check and port ZooKeeperLeaderElectionITCase to new code base if necessary
[ https://issues.apache.org/jira/browse/FLINK-11370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-11370: --- Labels: pull-request-available (was: ) > Check and port ZooKeeperLeaderElectionITCase to new code base if necessary > -- > > Key: FLINK-11370 > URL: https://issues.apache.org/jira/browse/FLINK-11370 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Till Rohrmann >Assignee: lining >Priority: Major > Labels: pull-request-available > > Check and port {{ZooKeeperLeaderElectionITCase}} to new code base if > necessary.
[GitHub] klion26 commented on issue #7612: [FLINK-11483][tests] Improve StreamOperatorSnapshotRestoreTest with JUnit's Parameterized
klion26 commented on issue #7612: [FLINK-11483][tests] Improve StreamOperatorSnapshotRestoreTest with JUnit's Parameterized URL: https://github.com/apache/flink/pull/7612#issuecomment-459236367 @tzulitai thank you for your quick review. I think you may have pinged the wrong person in the last comment.
[jira] [Closed] (FLINK-11322) Use try-with-resource for FlinkKafkaConsumer010
[ https://issues.apache.org/jira/browse/FLINK-11322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tzu-Li (Gordon) Tai closed FLINK-11322. --- Resolution: Fixed Merged for 1.8.0: 64cbf2a20178cf8b4f643ba510653881a3e5ce2f > Use try-with-resource for FlinkKafkaConsumer010 > --- > > Key: FLINK-11322 > URL: https://issues.apache.org/jira/browse/FLINK-11322 > Project: Flink > Issue Type: Improvement > Components: Kafka Connector >Affects Versions: 1.7.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Fix For: 1.8.0 > > Time Spent: 20m > Remaining Estimate: 0h >
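The change behind this ticket replaces manual `close()` handling with try-with-resources. A minimal sketch of the pattern, using a hypothetical `Consumer` stand-in rather than Flink's actual `FlinkKafkaConsumer010`: any `AutoCloseable` declared in the `try (...)` header is closed automatically when the block exits, including on exceptions.

```java
// Sketch of try-with-resources; Consumer is a hypothetical stand-in
// for any AutoCloseable resource such as a Kafka consumer.
public class TryWithResourceSketch {

    static final StringBuilder LOG = new StringBuilder();

    static final class Consumer implements AutoCloseable {
        void poll() { LOG.append("poll;"); }

        @Override
        public void close() { LOG.append("closed;"); } // runs automatically
    }

    public static void main(String[] args) {
        try (Consumer consumer = new Consumer()) {
            consumer.poll();
        } // close() is invoked here, even if poll() had thrown
        System.out.println(LOG); // prints: poll;closed;
    }
}
```

The benefit over a `finally` block is that close ordering and suppressed exceptions are handled by the language, which is why such cleanups are common in test code.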
[jira] [Created] (FLINK-11485) Migrate PojoSerializer to use new serialization compatibility abstractions
Tzu-Li (Gordon) Tai created FLINK-11485: --- Summary: Migrate PojoSerializer to use new serialization compatibility abstractions Key: FLINK-11485 URL: https://issues.apache.org/jira/browse/FLINK-11485 Project: Flink Issue Type: Sub-task Components: Type Serialization System Reporter: Tzu-Li (Gordon) Tai Assignee: Tzu-Li (Gordon) Tai Fix For: 1.8.0 This subtask covers migration of the {{PojoSerializer}}. Pojo schema evolution is out of scope here. This ticket is only for migrating the {{PojoSerializer}} to be equivalent in behaviour / features to the current master (1.8-SNAPSHOT).
[GitHub] klion26 commented on a change in pull request #7612: [FLINK-11483][tests] Improve StreamOperatorSnapshotRestoreTest with JUnit's Parameterized
klion26 commented on a change in pull request #7612: [FLINK-11483][tests] Improve StreamOperatorSnapshotRestoreTest with JUnit's Parameterized URL: https://github.com/apache/flink/pull/7612#discussion_r252551404 ## File path: flink-tests/src/test/java/org/apache/flink/test/state/operator/restore/StreamOperatorSnapshotRestoreTest.java ## @@ -80,6 +94,18 @@ protected static TemporaryFolder temporaryFolder; + @Parameterized.Parameter + public StateBackendEnum stateBackendEnum; + + enum StateBackendEnum { Review comment: Here we use the enum to iterate over the three state backends; without it, we would have to return StateBackend instances from `parameter()`. Please correct me if I'm wrong.
[jira] [Closed] (FLINK-11461) Remove useless MockRecordWriter
[ https://issues.apache.org/jira/browse/FLINK-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tzu-Li (Gordon) Tai closed FLINK-11461. --- Resolution: Fixed Merged for 1.8.0: 0e94ecfc7b65021905087b7efa94e3e04f761baf > Remove useless MockRecordWriter > --- > > Key: FLINK-11461 > URL: https://issues.apache.org/jira/browse/FLINK-11461 > Project: Flink > Issue Type: Task > Components: Tests >Reporter: zhijiang >Assignee: zhijiang >Priority: Minor > Labels: pull-request-available > Fix For: 1.8.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The {{MockRecordWriter}} is not used in current code path, so delete it > directly.
[GitHub] hequn8128 commented on a change in pull request #6787: [FLINK-8577][table] Implement proctime DataStream to Table upsert conversion
hequn8128 commented on a change in pull request #6787: [FLINK-8577][table] Implement proctime DataStream to Table upsert conversion URL: https://github.com/apache/flink/pull/6787#discussion_r252550288 ## File path: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/logical/FlinkLogicalUpsertToRetraction.scala ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.plan.nodes.logical + +import java.util.{List => JList} + +import org.apache.calcite.plan._ +import org.apache.calcite.rel.{RelNode, SingleRel} +import org.apache.calcite.rel.convert.ConverterRule +import org.apache.calcite.rel.metadata.RelMetadataQuery +import org.apache.flink.table.plan.logical.rel.LogicalUpsertToRetraction +import org.apache.flink.table.plan.nodes.FlinkConventions + +class FlinkLogicalUpsertToRetraction( Review comment: Thanks for your reply. I guess we are on the same page now? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] asfgit closed pull request #7611: [FLINK-11460][test] Remove the useless class AcknowledgeStreamMockEnvironment
asfgit closed pull request #7611: [FLINK-11460][test] Remove the useless class AcknowledgeStreamMockEnvironment URL: https://github.com/apache/flink/pull/7611
[GitHub] asfgit closed pull request #7610: [FLINK-11461][test] Remove the useless MockRecordReader class
asfgit closed pull request #7610: [FLINK-11461][test] Remove the useless MockRecordReader class URL: https://github.com/apache/flink/pull/7610
[jira] [Closed] (FLINK-11460) Remove useless AcknowledgeStreamMockEnvironment
[ https://issues.apache.org/jira/browse/FLINK-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tzu-Li (Gordon) Tai closed FLINK-11460. --- Resolution: Fixed Merged for 1.8.0: 52c2b22bd047912c690edd1d770d567250f6d710 > Remove useless AcknowledgeStreamMockEnvironment > --- > > Key: FLINK-11460 > URL: https://issues.apache.org/jira/browse/FLINK-11460 > Project: Flink > Issue Type: Task > Components: Tests >Reporter: zhijiang >Assignee: zhijiang >Priority: Minor > Labels: pull-request-available > Fix For: 1.8.0 > > Time Spent: 10m > Remaining Estimate: 0h > > This class is not used any more in the code path, so delete it directly.
[GitHub] tzulitai commented on issue #7610: [FLINK-11461][test] Remove the useless MockRecordReader class
tzulitai commented on issue #7610: [FLINK-11461][test] Remove the useless MockRecordReader class URL: https://github.com/apache/flink/pull/7610#issuecomment-459232224 LGTM, +1 Merging ..
[GitHub] tzulitai commented on issue #7611: [FLINK-11460][test] Remove the useless class AcknowledgeStreamMockEnvironment
tzulitai commented on issue #7611: [FLINK-11460][test] Remove the useless class AcknowledgeStreamMockEnvironment URL: https://github.com/apache/flink/pull/7611#issuecomment-459232032 LGTM, +1. Merging this ..
[GitHub] tzulitai commented on a change in pull request #7612: [FLINK-11483][tests] Improve StreamOperatorSnapshotRestoreTest with JUnit's Parameterized
tzulitai commented on a change in pull request #7612: [FLINK-11483][tests] Improve StreamOperatorSnapshotRestoreTest with JUnit's Parameterized URL: https://github.com/apache/flink/pull/7612#discussion_r252544276 ## File path: flink-tests/src/test/java/org/apache/flink/test/state/operator/restore/StreamOperatorSnapshotRestoreTest.java ## @@ -26,6 +26,7 @@ import org.apache.flink.api.common.typeinfo.TypeInformation; import org.apache.flink.api.common.typeutils.base.IntSerializer; import org.apache.flink.api.java.functions.KeySelector; +import org.apache.flink.contrib.streaming.state.RocksDBStateBackend; Review comment: ah I see, we need to move the test to a different module because of the dependency of the RocksDB backend classes
[GitHub] tzulitai commented on a change in pull request #7612: [FLINK-11483][tests] Improve StreamOperatorSnapshotRestoreTest with JUnit's Parameterized
tzulitai commented on a change in pull request #7612: [FLINK-11483][tests] Improve StreamOperatorSnapshotRestoreTest with JUnit's Parameterized URL: https://github.com/apache/flink/pull/7612#discussion_r252544244 ## File path: flink-tests/src/test/java/org/apache/flink/test/state/operator/restore/StreamOperatorSnapshotRestoreTest.java ## @@ -80,6 +94,18 @@ protected static TemporaryFolder temporaryFolder; + @Parameterized.Parameter + public StateBackendEnum stateBackendEnum; + + enum StateBackendEnum { Review comment: I don't think we need `Enum` in the name; it seems redundant.
[GitHub] tzulitai commented on a change in pull request #7612: [FLINK-11483][tests] Improve StreamOperatorSnapshotRestoreTest with JUnit's Parameterized
tzulitai commented on a change in pull request #7612: [FLINK-11483][tests] Improve StreamOperatorSnapshotRestoreTest with JUnit's Parameterized URL: https://github.com/apache/flink/pull/7612#discussion_r252543815 ## File path: flink-tests/src/test/java/org/apache/flink/test/state/operator/restore/StreamOperatorSnapshotRestoreTest.java ## @@ -16,7 +16,7 @@ * limitations under the License. */ -package org.apache.flink.streaming.api.operators; +package org.apache.flink.test.state.operator.restore; Review comment: Is this necessary? I think the test was placed here so that it is in the same namespace as `StreamOperator`.
[jira] [Commented] (FLINK-11471) Job is reported RUNNING prematurely for queued scheduling
[ https://issues.apache.org/jira/browse/FLINK-11471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756910#comment-16756910 ] vinoyang commented on FLINK-11471: -- I am not sure whether it is feasible to introduce another JobStatus; I just wanted to put forward my thoughts. I am not sure whether the committers feel it should be improved. Maybe we can ask for their opinions? [~StephanEwen] [~aljoscha] [~till.rohrmann] > Job is reported RUNNING prematurely for queued scheduling > - > > Key: FLINK-11471 > URL: https://issues.apache.org/jira/browse/FLINK-11471 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, Scheduler >Affects Versions: 1.7.1 >Reporter: Nico Kruber >Assignee: vinoyang >Priority: Major > > With queued scheduling enabled (seems to be the default now), the job's > status is changed to RUNNING before all tasks are actually running. > Although {{JobStatus#RUNNING}} states > {quote}Some tasks are scheduled or running, some may be pending, some may be > finished.{quote} > you may argue whether this is the right thing to report, e.g. in the REST > interface, when a user wants to react on the actual state change from > SCHEDULED to RUNNING. It seems, some intermediate state is missing here which > would clarify things.
[GitHub] tzulitai commented on issue #7206: [FLINK-11042][kafka, tests] Drop invalid kafka producer tests
tzulitai commented on issue #7206: [FLINK-11042][kafka,tests] Drop invalid kafka producer tests URL: https://github.com/apache/flink/pull/7206#issuecomment-459229838 Ok. Since we've already been removing tests that rely on shutting down Kafka brokers (which were mostly the ones causing a lot of the instabilities), I don't feel strongly against removing this one either. +1 from my side.
[GitHub] casidiablo commented on issue #2671: [FLINK-4862] fix Timer register in ContinuousEventTimeTrigger
casidiablo commented on issue #2671: [FLINK-4862] fix Timer register in ContinuousEventTimeTrigger URL: https://github.com/apache/flink/pull/2671#issuecomment-459222191 I don't have enough time to fix this. I documented what I found here https://issues.apache.org/jira/browse/FLINK-11408 though
[jira] [Commented] (FLINK-11471) Job is reported RUNNING prematurely for queued scheduling
[ https://issues.apache.org/jira/browse/FLINK-11471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756836#comment-16756836 ] Zhu Zhu commented on FLINK-11471: - I see. So in the future, a Job is "RUNNING" when and only when at least one task is "RUNNING". Looks good. Will there be a new JobStatus marking that the job is trying to schedule tasks but no task is RUNNING yet? > Job is reported RUNNING prematurely for queued scheduling > - > > Key: FLINK-11471 > URL: https://issues.apache.org/jira/browse/FLINK-11471 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, Scheduler >Affects Versions: 1.7.1 >Reporter: Nico Kruber >Assignee: vinoyang >Priority: Major > > With queued scheduling enabled (seems to be the default now), the job's > status is changed to RUNNING before all tasks are actually running. > Although {{JobStatus#RUNNING}} states > {quote}Some tasks are scheduled or running, some may be pending, some may be > finished.{quote} > you may argue whether this is the right thing to report, e.g. in the REST > interface, when a user wants to react on the actual state change from > SCHEDULED to RUNNING. It seems, some intermediate state is missing here which > would clarify things.
[GitHub] hequn8128 commented on a change in pull request #7587: [FLINK-11064] [table] Setup a new flink-table module structure
hequn8128 commented on a change in pull request #7587: [FLINK-11064] [table] Setup a new flink-table module structure URL: https://github.com/apache/flink/pull/7587#discussion_r252527514 ## File path: flink-table/flink-table-planner/pom.xml ## @@ -22,13 +22,18 @@ under the License.
 	<parent>
 		<groupId>org.apache.flink</groupId>
-		<artifactId>flink-libraries</artifactId>
+		<artifactId>flink-table</artifactId>
 		<version>1.8-SNAPSHOT</version>
 		<relativePath>..</relativePath>
 	</parent>

-	<artifactId>flink-table_${scala.binary.version}</artifactId>
-	<name>flink-table</name>
+	<artifactId>flink-table-planner_${scala.binary.version}</artifactId>
+	<name>flink-table-planner</name>
+
+	<description>
+		This module bridges Table/SQL API and runtime. It contains
+		all resources that are required during pre-flight and runtime
+		phase.
+	</description>
Review comment: Makes sense. Thanks for the explanation.
[jira] [Updated] (FLINK-11483) Improve StreamOperatorSnapshotRestoreTest with Parameterized
[ https://issues.apache.org/jira/browse/FLINK-11483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-11483: --- Labels: pull-request-available (was: ) > Improve StreamOperatorSnapshotRestoreTest with Parameterized > > > Key: FLINK-11483 > URL: https://issues.apache.org/jira/browse/FLINK-11483 > Project: Flink > Issue Type: Test > Components: State Backends, Checkpointing, Tests >Reporter: Congxian Qiu >Assignee: Congxian Qiu >Priority: Major > Labels: pull-request-available > > In the current implementation, we test {{StreamOperatorSnapshot}} with three > state backends: {{File}}, {{RocksDB_FULL}}, {{RocksDB_Incremental}}, each in a > separate class; we could improve this with Parameterized.
[GitHub] klion26 opened a new pull request #7612: [FLINK-11483][tests] Improve StreamOperatorSnapshotRestoreTest with JUnit's Parameterized
klion26 opened a new pull request #7612: [FLINK-11483][tests] Improve StreamOperatorSnapshotRestoreTest with JUnit's Parameterized URL: https://github.com/apache/flink/pull/7612 ## What is the purpose of the change Improve StreamOperatorSnapshotRestoreTest with JUnit's Parameterized ## Brief change log Currently StreamOperatorSnapshotRestore is tested with three backends: `file`, `rocksdb full` and `rocksdb incremental`, each in a separate class. This patch improves that by using JUnit's Parameterized runner. ## Verifying this change This change is already covered by existing tests, such as `StreamOperatorSnapshotRestoreTest`. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (**no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (**no**) - The serializers: (**no**) - The runtime per-record code paths (performance sensitive): (**no**) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (**no**) - The S3 file system connector: (**no**) ## Documentation - Does this pull request introduce a new feature? (**no**) - If yes, how is the feature documented? (**not applicable**)
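Conceptually, JUnit's Parameterized runner executes the same test body once per declared parameter value. A dependency-free sketch of that idea (the enum name mirrors the PR, but this is illustrative code, not the actual JUnit test class):

```java
// Dependency-free sketch of JUnit's Parameterized idea: one test body,
// run once per state-backend variant, replacing three separate classes.
public class ParameterizedSketch {

    enum StateBackendEnum { FILE, ROCKSDB_FULL, ROCKSDB_INCREMENTAL }

    static int runs = 0;

    // Stand-in for the snapshot/restore test body; the real test would
    // construct the chosen backend and snapshot/restore an operator with it.
    static void testSnapshotRestore(StateBackendEnum backend) {
        runs++;
    }

    public static void main(String[] args) {
        // The runner's @Parameters method plays the role of this loop:
        // it yields the values, and JUnit instantiates the test per value.
        for (StateBackendEnum backend : StateBackendEnum.values()) {
            testSnapshotRestore(backend);
        }
        System.out.println("executed " + runs + " variants");
    }
}
```

The payoff is the same as in the PR: adding a fourth backend becomes one new enum constant instead of a fourth copied test class.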
[jira] [Closed] (FLINK-10769) port InMemoryExternalCatalog to java
[ https://issues.apache.org/jira/browse/FLINK-10769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li closed FLINK-10769. Resolution: Invalid Release Note: no need to port existing InMemoryCatalog since we will develop new in memory catalog with completely new APIs from scratch > port InMemoryExternalCatalog to java > > > Key: FLINK-10769 > URL: https://issues.apache.org/jira/browse/FLINK-10769 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Reporter: Bowen Li >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > Fix For: 1.8.0 > > Time Spent: 10m > Remaining Estimate: 0h > > In the Flink-Hive integration design, we propose a new FlinkInMemoryCatalog > (FLINK-10697) for production use. FlinkInMemoryCatalog will share some part > with the existing InMemoryExternalCatalog, thus we need to make changes to > InMemoryExternalCatalog. > Background: there are two parallel efforts going on right now - FLINK-10686, > driven by Timo, includes moving external catalogs APIs from flink-table to > flink-table-common, also from Scala to Java; FLINK-10744 I'm working on right > now to integrate Flink with Hive and enhance external catalog functionality. > As discussed with @twalthr in FLINK-10689, we'd better parallelize these > efforts while introducing minimal overhead for integrating them later. An > agreed way is to writing new code/feature related to external catalogs/hive > in Java in flink-table. This way, it will minimize migration efforts later to > move these new code into flink-table-common. If existing classes are modified > for a feature we can start migrating it to Java in a separate commit first > and then perform the actual feature changes, and migrated classes can be > placed in flink-table/src/main/java until we find a better module structure. > Therefore, we will port InMemoryExternalCatalog to java first. This PR only > port scala to java with NO feature or behavior change. 
This is a > pre-requisite for FLINK-10697 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-10026) Integrate heap based timers in RocksDB backend with incremental checkpoints
[ https://issues.apache.org/jira/browse/FLINK-10026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] boshu Zheng reassigned FLINK-10026: --- Assignee: boshu Zheng > Integrate heap based timers in RocksDB backend with incremental checkpoints > --- > > Key: FLINK-10026 > URL: https://issues.apache.org/jira/browse/FLINK-10026 > Project: Flink > Issue Type: Sub-task > Components: State Backends, Checkpointing >Affects Versions: 1.6.0 >Reporter: Stefan Richter >Assignee: boshu Zheng >Priority: Major > Fix For: 1.8.0 > > > Currently, heap-based priority queue state does not participate in RocksDB > incremental checkpointing and still uses the legacy (synchronous) way of > checkpointing timers. > We should integrate it with incremental checkpoints in such a way that the > state is written non-incrementally into a file, and another key-grouped state > handle becomes part of the non-shared section of the incremental state handle. > After that, we can also remove the path for legacy checkpointing completely > and also remove the key-grouping from the de-duplication hash map in the > {{HeapPriorityQueueSet}}. All can go into one map. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11471) Job is reported RUNNING prematurely for queued scheduling
[ https://issues.apache.org/jira/browse/FLINK-11471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756815#comment-16756815 ] vinoyang commented on FLINK-11471: -- Currently, the transition to {{JobStatus.RUNNING}} does not depend on the state of any Task. Its state machine is completely independent on the JM side (it does not even require any Task to be scheduled or executed). I think we can make it more accurate based on the status reports of the Tasks. For example, we could only change the JobStatus to RUNNING after receiving a "RUNNING" report from any Task. > Job is reported RUNNING prematurely for queued scheduling > - > > Key: FLINK-11471 > URL: https://issues.apache.org/jira/browse/FLINK-11471 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, Scheduler >Affects Versions: 1.7.1 >Reporter: Nico Kruber >Assignee: vinoyang >Priority: Major > > With queued scheduling enabled (seems to be the default now), the job's > status is changed to RUNNING before all tasks are actually running. > Although {{JobStatus#RUNNING}} states > {quote}Some tasks are scheduled or running, some may be pending, some may be > finished.{quote} > you may argue whether this is the right thing to report, e.g. in the REST > interface, when a user wants to react on the actual state change from > SCHEDULED to RUNNING. It seems, some intermediate state is missing here which > would clarify things. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
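The proposal in the comment — flip the job to RUNNING only once a Task actually reports RUNNING — can be sketched as deriving the job status from task reports instead of flipping it eagerly on the JM side. This is an illustrative sketch with made-up enum and method names, not Flink's actual ExecutionGraph code:

```java
import java.util.List;

// Hypothetical sketch of the proposed rule; names are illustrative,
// not Flink's real API.
public class JobStatusSketch {
    public enum TaskState { SCHEDULED, DEPLOYING, RUNNING, FINISHED }
    public enum JobStatus { SCHEDULED, RUNNING }

    // The job is only considered RUNNING once at least one task
    // has actually reported RUNNING (or already finished).
    public static JobStatus deriveStatus(List<TaskState> taskStates) {
        boolean anyRunning = taskStates.stream()
                .anyMatch(s -> s == TaskState.RUNNING || s == TaskState.FINISHED);
        return anyRunning ? JobStatus.RUNNING : JobStatus.SCHEDULED;
    }

    public static void main(String[] args) {
        // All tasks still scheduled/deploying: the job stays SCHEDULED.
        System.out.println(deriveStatus(List.of(TaskState.SCHEDULED, TaskState.DEPLOYING)));
        // One task reports RUNNING: the job becomes RUNNING.
        System.out.println(deriveStatus(List.of(TaskState.SCHEDULED, TaskState.RUNNING)));
    }
}
```

Note that under this rule a batch job with some tasks running and others still pending would still count as RUNNING, which matches the existing {{JobStatus#RUNNING}} wording quoted in the issue.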
[jira] [Updated] (FLINK-11461) Remove useless MockRecordWriter
[ https://issues.apache.org/jira/browse/FLINK-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-11461: --- Labels: pull-request-available (was: ) > Remove useless MockRecordWriter > --- > > Key: FLINK-11461 > URL: https://issues.apache.org/jira/browse/FLINK-11461 > Project: Flink > Issue Type: Task > Components: Tests >Reporter: zhijiang >Assignee: zhijiang >Priority: Minor > Labels: pull-request-available > Fix For: 1.8.0 > > > The {{MockRecordWriter}} is not used in current code path, so delete it > directly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11460) Remove useless AcknowledgeStreamMockEnvironment
[ https://issues.apache.org/jira/browse/FLINK-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-11460: --- Labels: pull-request-available (was: ) > Remove useless AcknowledgeStreamMockEnvironment > --- > > Key: FLINK-11460 > URL: https://issues.apache.org/jira/browse/FLINK-11460 > Project: Flink > Issue Type: Task > Components: Tests >Reporter: zhijiang >Assignee: zhijiang >Priority: Minor > Labels: pull-request-available > Fix For: 1.8.0 > > > This class is not used any more in the code path, so delete it directly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] zhijiangW opened a new pull request #7611: [FLINK-11460][test] Remove the useless class AcknowledgeStreamMockEnvironment
zhijiangW opened a new pull request #7611: [FLINK-11460][test] Remove the useless class AcknowledgeStreamMockEnvironment URL: https://github.com/apache/flink/pull/7611 ## What is the purpose of the change *The class `AcknowledgeStreamMockEnvironment` is not used any more in the current code path, so delete it to keep the code clean.* ## Brief change log - *Delete the class `AcknowledgeStreamMockEnvironment` from the code path* ## Verifying this change *This change is a code cleanup without any test coverage.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**) - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] zhijiangW opened a new pull request #7610: [FLINK-11461][test] Remove the useless MockRecordReader class
zhijiangW opened a new pull request #7610: [FLINK-11461][test] Remove the useless MockRecordReader class URL: https://github.com/apache/flink/pull/7610 ## What is the purpose of the change *The `MockRecordReader` class is not used any more in the current code path, so delete it to keep the code clean.* ## Brief change log - *Delete the class `MockRecordReader`* ## Verifying this change This change is a code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**) - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
[GitHub] yanghua commented on a change in pull request #7571: [FLINK-10724] Refactor failure handling in check point coordinator
yanghua commented on a change in pull request #7571: [FLINK-10724] Refactor failure handling in check point coordinator URL: https://github.com/apache/flink/pull/7571#discussion_r252518530

File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java

@@ -1277,6 +1293,34 @@ private void discardCheckpoint(PendingCheckpoint pendingCheckpoint, @Nullable Th } }
+ private void tryHandleCheckpointException(DeclineCheckpoint message, PendingCheckpoint currentCheckpoint) {
+     boolean needFailJob = message.getReason() != null && !(message.getReason() instanceof CheckpointDeclineException)
+         && failOnCheckpointingErrors && !currentCheckpoint.getProps().isSavepoint();
+
+     if (needFailJob) {
+         boolean processed = false;
+         ExecutionAttemptID failedExecutionID = message.getTaskExecutionId();
+         for (ExecutionVertex vertex : tasksToWaitFor) {
+             if (vertex.getCurrentExecutionAttempt().getAttemptId().equals(failedExecutionID)) {
+                 if (currentPeriodicTrigger != null) {
+                     currentPeriodicTrigger.cancel(true);
+                     currentPeriodicTrigger = null;
+                 }
+
+                 vertex.fail(message.getReason());

Review comment: @aljoscha Can you help determine if this operation will have side effects? cc @azagrebin
[jira] [Commented] (FLINK-11440) ChainLengthIncreaseTest failed on Travis
[ https://issues.apache.org/jira/browse/FLINK-11440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756800#comment-16756800 ] vinoyang commented on FLINK-11440: -- [~aljoscha] In fact, I am not very sure whether this failure is due to my PR. Coincidentally, my PR is to refactor the checkpoint implementation. In the refactoring logic, there is an operation that triggers the ExecutionVertex#fail call in the CheckpointCoordinator. I will ping you in this PR related code, can you help determine if this operation will have side effects? > ChainLengthIncreaseTest failed on Travis > > > Key: FLINK-11440 > URL: https://issues.apache.org/jira/browse/FLINK-11440 > Project: Flink > Issue Type: Test > Components: Tests >Reporter: vinoyang >Priority: Major > Labels: test-stability > > > {code:java} > 12:55:46.212 [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time > elapsed: 11.875 s <<< FAILURE! - in > org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthIncreaseTest > 12:55:46.214 [ERROR] testMigrationAndRestore[Migrate Savepoint: > 1.7](org.apache.flink.test.state.operator.restore.unkeyed.ChainLengthIncreaseTest) > Time elapsed: 0.109 s <<< ERROR! > java.util.concurrent.ExecutionException: > java.util.concurrent.CompletionException: java.lang.IllegalStateException: > Checkpoint executing was failureTask received cancellation from one of its > inputs. > Caused by: java.util.concurrent.CompletionException: > java.lang.IllegalStateException: Checkpoint executing was failureTask > received cancellation from one of its inputs. > Caused by: java.lang.IllegalStateException: Checkpoint executing was > failureTask received cancellation from one of its inputs. > {code} > > log details : https://api.travis-ci.org/v3/job/485352065/log.txt > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11471) Job is reported RUNNING prematurely for queued scheduling
[ https://issues.apache.org/jira/browse/FLINK-11471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756792#comment-16756792 ] Zhu Zhu commented on FLINK-11471: - Hi [~yanghua] and [~NicoK], do you have the solution for this issue already? For batch job, there can always be some tasks running, some tasks in scheduling and some tasks pending for scheduling checking. What would the job status be in this case? > Job is reported RUNNING prematurely for queued scheduling > - > > Key: FLINK-11471 > URL: https://issues.apache.org/jira/browse/FLINK-11471 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, Scheduler >Affects Versions: 1.7.1 >Reporter: Nico Kruber >Assignee: vinoyang >Priority: Major > > With queued scheduling enabled (seems to be the default now), the job's > status is changed to RUNNING before all tasks are actually running. > Although {{JobStatus#RUNNING}} states > {quote}Some tasks are scheduled or running, some may be pending, some may be > finished.{quote} > you may argue whether this is the right thing to report, e.g. in the REST > interface, when a user wants to react on the actual state change from > SCHEDULED to RUNNING. It seems, some intermediate state is missing here which > would clarify things. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] tzulitai commented on issue #7606: [FLINK-10774] Rework lifecycle management of partitionDiscoverer in FlinkKafkaConsumerBase
tzulitai commented on issue #7606: [FLINK-10774] Rework lifecycle management of partitionDiscoverer in FlinkKafkaConsumerBase URL: https://github.com/apache/flink/pull/7606#issuecomment-459186967 Since I've already reviewed this as part of the efforts in #7020, and Travis is green, LGTM +1
[GitHub] tzulitai removed a comment on issue #7606: [FLINK-10774] Rework lifecycle management of partitionDiscoverer in FlinkKafkaConsumerBase
tzulitai removed a comment on issue #7606: [FLINK-10774] Rework lifecycle management of partitionDiscoverer in FlinkKafkaConsumerBase URL: https://github.com/apache/flink/pull/7606#issuecomment-459186967 Since I've already reviewed this as part of the efforts in #7020, and Travis is green, LGTM +1 Thanks for fixing this @stevenzwu @tillrohrmann
[GitHub] tzulitai edited a comment on issue #7606: [FLINK-10774] Rework lifecycle management of partitionDiscoverer in FlinkKafkaConsumerBase
tzulitai edited a comment on issue #7606: [FLINK-10774] Rework lifecycle management of partitionDiscoverer in FlinkKafkaConsumerBase URL: https://github.com/apache/flink/pull/7606#issuecomment-459186967 Since I've already reviewed this as part of the efforts in #7020, and Travis is green, LGTM +1 Thanks for fixing this @stevenzwu @tillrohrmann
[jira] [Created] (FLINK-11484) Blink java.util.concurrent.TimeoutException
pj created FLINK-11484: -- Summary: Blink java.util.concurrent.TimeoutException Key: FLINK-11484 URL: https://issues.apache.org/jira/browse/FLINK-11484 Project: Flink Issue Type: Bug Components: Table API SQL Affects Versions: 1.5.5 Environment: The link of the Blink source code: [github.com/apache/flink/tree/blink|https://github.com/apache/flink/tree/blink] Reporter: pj

*This happens if I run a Blink application on YARN and the parallelism is larger than 1.*

*Following is the command:*

./flink run -m yarn-cluster -ynm FLINK_NG_ENGINE_1 -ys 4 -yn 10 -ytm 5120 -p 40 -c XXMain ~/xx.jar

*Following is the code:*

{{DataStream outputStream = tableEnv.toAppendStream(curTable, Row.class); outputStream.print();}}

*All subtasks of the application will hang for a long time and finally the {{toAppendStream()}} function will throw an exception like below:*

org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: f5e4f7243d06035202e8fa250c364304)
 at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:276)
 at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:482)
 at org.apache.flink.streaming.api.environment.StreamContextEnvironment.executeInternal(StreamContextEnvironment.java:85)
 at org.apache.flink.streaming.api.environment.StreamContextEnvironment.executeInternal(StreamContextEnvironment.java:37)
 at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1893)
 at com.ngengine.main.KafkaMergeMain.startApp(KafkaMergeMain.java:352)
 at com.ngengine.main.KafkaMergeMain.main(KafkaMergeMain.java:94)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:561)
 at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:445)
 at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:419)
 at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:786)
 at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:280)
 at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
 at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1029)
 at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1105)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
 at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
 at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1105)
Caused by: java.util.concurrent.TimeoutException
 at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:834)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-11471) Job is reported RUNNING prematurely for queued scheduling
[ https://issues.apache.org/jira/browse/FLINK-11471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang reassigned FLINK-11471: Assignee: vinoyang > Job is reported RUNNING prematurely for queued scheduling > - > > Key: FLINK-11471 > URL: https://issues.apache.org/jira/browse/FLINK-11471 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, Scheduler >Affects Versions: 1.7.1 >Reporter: Nico Kruber >Assignee: vinoyang >Priority: Major > > With queued scheduling enabled (seems to be the default now), the job's > status is changed to RUNNING before all tasks are actually running. > Although {{JobStatus#RUNNING}} states > {quote}Some tasks are scheduled or running, some may be pending, some may be > finished.{quote} > you may argue whether this is the right thing to report, e.g. in the REST > interface, when a user wants to react on the actual state change from > SCHEDULED to RUNNING. It seems, some intermediate state is missing here which > would clarify things. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] jinglining commented on issue #7552: [FLINK-11405][rest]can see task and task attempt exception by start e…
jinglining commented on issue #7552: [FLINK-11405][rest]can see task and task attempt exception by start e… URL: https://github.com/apache/flink/pull/7552#issuecomment-459183827 @zentol can you review it?
[GitHub] themez opened a new pull request #7609: [FLINK-filesystems] [flink-s3-fs-hadoop] Add mirrored config key for s3 endpoint
themez opened a new pull request #7609: [FLINK-filesystems] [flink-s3-fs-hadoop] Add mirrored config key for s3 endpoint URL: https://github.com/apache/flink/pull/7609 ## What is the purpose of the change Add an S3 endpoint configuration option so Flink can connect to S3 in China regions, which have different endpoints. ## Brief change log Add a `MIRRORED_CONFIG_KEYS` item for the S3 endpoint ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (yes) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (docs)
[jira] [Created] (FLINK-11483) Improve StreamOperatorSnapshotRestoreTest with Parameterized
Congxian Qiu created FLINK-11483: Summary: Improve StreamOperatorSnapshotRestoreTest with Parameterized Key: FLINK-11483 URL: https://issues.apache.org/jira/browse/FLINK-11483 Project: Flink Issue Type: Test Components: State Backends, Checkpointing, Tests Reporter: Congxian Qiu Assignee: Congxian Qiu In the current implementation, we test {{StreamOperatorSnapshot}} with three state backends: {{File}}, {{RocksDB_FULL}}, {{RocksDB_Incremental}}, each in a separate class; we could improve this with Parameterized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
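The refactoring idea above — one shared test body driven by a parameter list, which is what JUnit's Parameterized runner does — can be sketched without the JUnit dependency. The backend names below mirror the three listed in the issue; `runSnapshotRestore` is a hypothetical stand-in for the shared test body, not the real test code:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the Parameterized pattern (illustrative only; the real
// change would use JUnit's @RunWith(Parameterized.class) and @Parameters).
public class ParameterizedSketch {
    // Hypothetical stand-in for the shared snapshot/restore test body.
    public static String runSnapshotRestore(String backend) {
        return "snapshot/restore OK with " + backend;
    }

    // Run the same body once per parameter instead of once per test class.
    public static List<String> runAll(List<String> backends) {
        List<String> results = new ArrayList<>();
        for (String backend : backends) {
            results.add(runSnapshotRestore(backend));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> backends = List.of("FILE", "ROCKSDB_FULL", "ROCKSDB_INCREMENTAL");
        runAll(backends).forEach(System.out::println);
    }
}
```

The payoff is the same as in the issue: adding a fourth backend means adding one list entry rather than a fourth test class.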
[jira] [Assigned] (FLINK-11470) LocalEnvironment doesn't call FileSystem.initialize()
[ https://issues.apache.org/jira/browse/FLINK-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang reassigned FLINK-11470: Assignee: vinoyang > LocalEnvironment doesn't call FileSystem.initialize() > - > > Key: FLINK-11470 > URL: https://issues.apache.org/jira/browse/FLINK-11470 > Project: Flink > Issue Type: Bug > Components: Local Runtime >Affects Versions: 1.6.2 >Reporter: Nico Kruber >Assignee: vinoyang >Priority: Major > > Proper Flink cluster components, e.g. task manager or job manager, initialize > configured file systems with their parsed {{Configuration}} objects. However, > the {{LocalEnvironment}} does not seem to do that and we therefore lack the > ability to configure access credentials etc like in the following example: > {code} > Configuration config = new Configuration(); > config.setString("s3.access-key", "user"); > config.setString("s3.secret-key", "secret"); > //FileSystem.initialize(config); > final ExecutionEnvironment exEnv = > ExecutionEnvironment.createLocalEnvironment(config); > {code} > The workaround is to call {{FileSystem.initialize(config);}} yourself but it > is actually surprising that this is not done automatically. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-11455) Support evictor operations on slicing and merging operators
[ https://issues.apache.org/jira/browse/FLINK-11455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang reassigned FLINK-11455: Assignee: vinoyang > Support evictor operations on slicing and merging operators > --- > > Key: FLINK-11455 > URL: https://issues.apache.org/jira/browse/FLINK-11455 > Project: Flink > Issue Type: Sub-task >Reporter: Rong Rong >Assignee: vinoyang >Priority: Major > > The original implementation POC of SliceStream and MergeStream does not > consider evicting window operations. This support can be further expanded in > order to cover multiple timeout duration session windows. See > [https://docs.google.com/document/d/1ziVsuW_HQnvJr_4a9yKwx_LEnhVkdlde2Z5l6sx5HlY/edit#heading=h.ihxm3alf3tk0.] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] EronWright commented on issue #7390: [FLINK-11240] [table] External Catalog Factory and Descriptor
EronWright commented on issue #7390: [FLINK-11240] [table] External Catalog Factory and Descriptor URL: https://github.com/apache/flink/pull/7390#issuecomment-459178024 @twalthr ping
[jira] [Commented] (FLINK-10744) Integrate Flink with Hive metastore
[ https://issues.apache.org/jira/browse/FLINK-10744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756766#comment-16756766 ] Eron Wright commented on FLINK-10744: -- Update on my end, all issues assigned to me have PRs open; awaiting review. > Integrate Flink with Hive metastore > > > Key: FLINK-10744 > URL: https://issues.apache.org/jira/browse/FLINK-10744 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Affects Versions: 1.6.2 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang >Priority: Major > > This JIRA keeps track of the effort of FLINK-10556 on Hive metastore > integration. It mainly covers two aspects: > # Register Hive metastore as an external catalog of Flink, such that Hive > table metadata can be accessed directly. > # Store Flink metadata (tables, views, UDFs, etc) in a catalog that utilizes > Hive as the schema registry. > Discussions and resulting design doc will be shared here, but detailed work > items will be tracked by sub-tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] jinglining commented on issue #7554: [FLINK-11357][test]Check and port LeaderChangeJobRecoveryTest to new …
jinglining commented on issue #7554: [FLINK-11357][test]Check and port LeaderChangeJobRecoveryTest to new … URL: https://github.com/apache/flink/pull/7554#issuecomment-459178142 Thanks.
[GitHub] EronWright commented on issue #7392: [FLINK-11241][table] - Enhance TableEnvironment to connect to a catalog via descriptor
EronWright commented on issue #7392: [FLINK-11241][table] - Enhance TableEnvironment to connect to a catalog via descriptor URL: https://github.com/apache/flink/pull/7392#issuecomment-459178101 @twalthr ping
[GitHub] EronWright commented on issue #7393: [FLINK-9172][table][sql-client] - Support external catalogs in SQL-Client
EronWright commented on issue #7393: [FLINK-9172][table][sql-client] - Support external catalogs in SQL-Client URL: https://github.com/apache/flink/pull/7393#issuecomment-459177902 @twalthr ping
[GitHub] EronWright commented on issue #7389: [FLINK-11237] [table] Enhance LocalExecutor to wrap TableEnvironment w/ classloader
EronWright commented on issue #7389: [FLINK-11237] [table] Enhance LocalExecutor to wrap TableEnvironment w/ classloader URL: https://github.com/apache/flink/pull/7389#issuecomment-459177499 @twalthr ping
[jira] [Assigned] (FLINK-11456) Improve window operator with sliding window assigners
[ https://issues.apache.org/jira/browse/FLINK-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang reassigned FLINK-11456: Assignee: vinoyang > Improve window operator with sliding window assigners > - > > Key: FLINK-11456 > URL: https://issues.apache.org/jira/browse/FLINK-11456 > Project: Flink > Issue Type: Sub-task > Components: DataStream API >Reporter: Rong Rong >Assignee: vinoyang >Priority: Major > > With slicing and merging operators that expose the internals of window > operators, the current sliding window can be improved by eliminating duplicate > aggregations or duplicate element inserts into multiple panes (e.g. > namespaces). > The following sliding window operation > {code:java} > val resultStream: DataStream = inputStream > .keyBy("key") > .window(SlidingEventTimeWindow.of(Time.seconds(5L), Time.seconds(15L))) > .sum("value") > {code} > can produce a job graph equivalent to > {code:java} > val resultStream: DataStream = inputStream > .keyBy("key") > .sliceWindow(Time.seconds(5L)) > .sum("value") > .slideOver(Count.of(3)) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
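The equivalence in the issue rests on a simple observation: with a 15s window sliding by 5s, each element belongs to three windows, but pre-aggregating each non-overlapping 5s slice once lets every window be computed by combining three slice aggregates instead of re-aggregating raw elements three times. A minimal sketch of that combining step (method and variable names are hypothetical, not the proposed API):

```java
import java.util.Arrays;

// Illustrative sketch of slice-based sliding aggregation: each pane (slice)
// is pre-aggregated once; a sliding window is then the combination of
// panesPerWindow consecutive pane aggregates.
public class SlicingSketch {
    public static int[] slidingSums(int[] paneSums, int panesPerWindow) {
        int n = paneSums.length - panesPerWindow + 1;
        int[] windows = new int[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < panesPerWindow; j++) {
                windows[i] += paneSums[i + j]; // combine pane aggregates, not raw elements
            }
        }
        return windows;
    }

    public static void main(String[] args) {
        // Per-pane sums for five 5s slices; a 15s window sliding by 5s covers 3 panes.
        int[] paneSums = {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(slidingSums(paneSums, 3))); // [6, 9, 12]
    }
}
```

This mirrors the `.sliceWindow(Time.seconds(5L)).sum("value").slideOver(Count.of(3))` shape quoted above: the inner `sum` produces the pane aggregates, and `slideOver(Count.of(3))` corresponds to the combining loop.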
[jira] [Issue Comment Deleted] (FLINK-10618) Introduce catalog for Flink tables
[ https://issues.apache.org/jira/browse/FLINK-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated FLINK-10618: Comment: was deleted (was: This task is superseded by FLINK-11482. ) > Introduce catalog for Flink tables > -- > > Key: FLINK-10618 > URL: https://issues.apache.org/jira/browse/FLINK-10618 > Project: Flink > Issue Type: Sub-task > Components: SQL Client >Affects Versions: 1.6.1 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang >Priority: Major > Fix For: 1.8.0 > > > This JIRA covers the 2nd aspect of Flink-Hive metastore integration. > Besides meta objects such as tables that may come from an > {{ExternalCatalog}}, Flink also deals with tables/views/functions that are > created on the fly (in memory), or specified in a configuration file. Those > objects don't belong to any {{ExternalCatalog}}, yet Flink either stores them > in memory, which are non-persistent, or recreates them from a file, which is > a big pain for the user. Those objects are only known to Flink but Flink has > a poor management for them. > Since they are typical objects in a database catalog, it's natural to have a > catalog that manages those objects. The interface will be similar to > {{ExternalCatalog}}, which contains meta objects that are not managed by > Flink. There are several possible implementations of the Flink internal > catalog interface: memory, file, external registry (such as confluent schema > registry or Hive metastore), and relational database, etc. > The initial functionality as well as the catalog hierarchy could be very > simple. The basic functionality of the catalog will be mostly create, alter, > and drop tables, views, functions, etc. Obviously, this can evolve over the > time. > We plan to provide implementations: in-memory and in Hive metastore. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10697) Create FlinkInMemoryCatalog class, an in-memory catalog that stores Flink's meta objects for production use
[ https://issues.apache.org/jira/browse/FLINK-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756756#comment-16756756 ] Xuefu Zhang commented on FLINK-10697: - The task here is superseded by FLINK-11475. > Create FlinkInMemoryCatalog class, an in-memory catalog that stores Flink's > meta objects for production use > --- > > Key: FLINK-10697 > URL: https://issues.apache.org/jira/browse/FLINK-10697 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Affects Versions: 1.6.1 >Reporter: Xuefu Zhang >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > Fix For: 1.8.0 > > > Currently all Flink meta objects (currently tables only) are stored in memory > as part of the Calcite catalog. Some of those objects are temporary (such as inline > tables), while others are meant to live beyond the user session. As we introduce a > catalog for those objects (tables, views, and UDFs), it makes sense to > organize them neatly. Further, having a catalog implementation that stores > those objects in memory retains the current behavior, which can be > configured by the user. > Please note that this implementation is different from the current > {{InMemoryExternalCatalog}}, which is used mainly for testing and doesn't > reflect what's actually needed for Flink meta objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-10697) Create FlinkInMemoryCatalog class, an in-memory catalog that stores Flink's meta objects for production use
[ https://issues.apache.org/jira/browse/FLINK-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang closed FLINK-10697. --- Resolution: Duplicate Fix Version/s: (was: 1.8.0) > Create FlinkInMemoryCatalog class, an in-memory catalog that stores Flink's > meta objects for production use > --- > > Key: FLINK-10697 > URL: https://issues.apache.org/jira/browse/FLINK-10697 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Affects Versions: 1.6.1 >Reporter: Xuefu Zhang >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > > Currently all Flink meta objects (currently tables only) are stored in memory > as part of the Calcite catalog. Some of those objects are temporary (such as inline > tables), while others are meant to live beyond the user session. As we introduce a > catalog for those objects (tables, views, and UDFs), it makes sense to > organize them neatly. Further, having a catalog implementation that stores > those objects in memory retains the current behavior, which can be > configured by the user. > Please note that this implementation is different from the current > {{InMemoryExternalCatalog}}, which is used mainly for testing and doesn't > reflect what's actually needed for Flink meta objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-10699) Create FlinkHmsCatalog for persistent Flink meta objects using Hive metastore as a registry
[ https://issues.apache.org/jira/browse/FLINK-10699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang closed FLINK-10699. --- Resolution: Duplicate This duplicates FLINK-11482. > Create FlinkHmsCatalog for persistent Flink meta objects using Hive metastore > as a registry > --- > > Key: FLINK-10699 > URL: https://issues.apache.org/jira/browse/FLINK-10699 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Affects Versions: 1.6.1 >Reporter: Xuefu Zhang >Assignee: Bowen Li >Priority: Major > > Similar to FLINK-10697, but using Hive metastore as persistent storage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11482) Implement GenericHiveMetastoreCatalog
Xuefu Zhang created FLINK-11482: --- Summary: Implement GenericHiveMetastoreCatalog Key: FLINK-11482 URL: https://issues.apache.org/jira/browse/FLINK-11482 Project: Flink Issue Type: Sub-task Components: Table API SQL Reporter: Xuefu Zhang Assignee: Bowen Li {{GenericHiveMetastoreCatalog}} is a special implementation of {{ReadableWritableCatalog}} interface to store tables/views/functions defined in Flink to Hive metastore. With respect to the objects stored, {{GenericHiveMetastoreCatalog}} is similar to {{GenericInMemoryCatalog}}, but the storage used is a Hive metastore instead of memory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10618) Introduce catalog for Flink tables
[ https://issues.apache.org/jira/browse/FLINK-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756752#comment-16756752 ] Xuefu Zhang commented on FLINK-10618: - This task is superseded by FLINK-11482. > Introduce catalog for Flink tables > -- > > Key: FLINK-10618 > URL: https://issues.apache.org/jira/browse/FLINK-10618 > Project: Flink > Issue Type: Sub-task > Components: SQL Client >Affects Versions: 1.6.1 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang >Priority: Major > Fix For: 1.8.0 > > > This JIRA covers the 2nd aspect of Flink-Hive metastore integration. > Besides meta objects such as tables that may come from an > {{ExternalCatalog}}, Flink also deals with tables/views/functions that are > created on the fly (in memory), or specified in a configuration file. Those > objects don't belong to any {{ExternalCatalog}}, yet Flink either stores them > in memory, which is non-persistent, or recreates them from a file, which is > a big pain for the user. Those objects are only known to Flink, but Flink has > poor management of them. > Since they are typical objects in a database catalog, it's natural to have a > catalog that manages those objects. The interface will be similar to > {{ExternalCatalog}}, which contains meta objects that are not managed by > Flink. There are several possible implementations of the Flink internal > catalog interface: memory, file, external registry (such as the Confluent schema > registry or Hive metastore), and relational database, etc. > The initial functionality as well as the catalog hierarchy could be very > simple. The basic functionality of the catalog will be mostly create, alter, > and drop tables, views, functions, etc. Obviously, this can evolve over > time. > We plan to provide implementations: in-memory and in Hive metastore. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-10542) Implement an external catalog for Hive
[ https://issues.apache.org/jira/browse/FLINK-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang closed FLINK-10542. --- Resolution: Duplicate > Implement an external catalog for Hive > -- > > Key: FLINK-10542 > URL: https://issues.apache.org/jira/browse/FLINK-10542 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Affects Versions: 1.6.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang >Priority: Major > > Similar to FLINK-2167, but instead registers the Hive metastore as an external > catalog in the {{TableEnvironment}}. After registration, Table API and SQL > queries should be able to access all Hive tables (metadata only with this > JIRA completed). > This might supersede FLINK-2167, as the Hive metastore stores a superset of the tables > available via hCat without an indirection. > This JIRA covers the 1st aspect of Flink-Hive metastore integration outlined > in FLINK-10744. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10542) Implement an external catalog for Hive
[ https://issues.apache.org/jira/browse/FLINK-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756748#comment-16756748 ] Xuefu Zhang commented on FLINK-10542: - The task here is superseded by FLINK-11479. > Implement an external catalog for Hive > -- > > Key: FLINK-10542 > URL: https://issues.apache.org/jira/browse/FLINK-10542 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Affects Versions: 1.6.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang >Priority: Major > > Similar to FLINK-2167, but instead registers the Hive metastore as an external > catalog in the {{TableEnvironment}}. After registration, Table API and SQL > queries should be able to access all Hive tables (metadata only with this > JIRA completed). > This might supersede FLINK-2167, as the Hive metastore stores a superset of the tables > available via hCat without an indirection. > This JIRA covers the 1st aspect of Flink-Hive metastore integration outlined > in FLINK-10744. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11481) Integrate HiveTableFactory with existing table factory discovery mechanism
Xuefu Zhang created FLINK-11481: --- Summary: Integrate HiveTableFactory with existing table factory discovery mechanism Key: FLINK-11481 URL: https://issues.apache.org/jira/browse/FLINK-11481 Project: Flink Issue Type: Sub-task Components: Table API SQL Reporter: Xuefu Zhang Assignee: Xuefu Zhang Currently a table factory is auto-discovered based on table properties. However, {{HiveTableFactory}}, the factory class for Hive tables, is specifically returned from {{HiveCatalog}}. Since we allow both mechanisms, we need to integrate the two so that they work seamlessly. Please refer to the design doc for details. Some further design thought is necessary, however. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11480) Create HiveTableFactory that creates TableSource/Sink from a Hive table
Xuefu Zhang created FLINK-11480: --- Summary: Create HiveTableFactory that creates TableSource/Sink from a Hive table Key: FLINK-11480 URL: https://issues.apache.org/jira/browse/FLINK-11480 Project: Flink Issue Type: Sub-task Components: Table API SQL Reporter: Xuefu Zhang Assignee: Xuefu Zhang This may require some design thought because {{HiveTableFactory}} is different from existing {{TableFactory}} implementations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11275) Unified Catalog APIs
[ https://issues.apache.org/jira/browse/FLINK-11275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated FLINK-11275: Description: During Flink-Hive integration, we found that the current Catalog APIs are quite cumbersome and to a large extent require significant rework. While the previous APIs are essentially not used, at least not in the Flink code base, we need to be careful in defining a new set of APIs to avoid future rework again. This is an uber JIRA covering all the work outlined in [FLIP-30|https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs]. was: During Flink-Hive integration, we found that the current Catalog APIs are quite cumbersome and to a large extent require significant rework. While the previous APIs are essentially not used, at least not in the Flink code base, we need to be careful in defining a new set of APIs to avoid future rework again. This is an uber JIRA covering all the work outlined in FLIP-30. > Unified Catalog APIs > > > Key: FLINK-11275 > URL: https://issues.apache.org/jira/browse/FLINK-11275 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang >Priority: Major > > During Flink-Hive integration, we found that the current Catalog APIs are > quite cumbersome and to a large extent require significant rework. While the > previous APIs are essentially not used, at least not in the Flink code base, we > need to be careful in defining a new set of APIs to avoid future rework > again. > This is an uber JIRA covering all the work outlined in > [FLIP-30|https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11479) Implement HiveCatalog
Xuefu Zhang created FLINK-11479: --- Summary: Implement HiveCatalog Key: FLINK-11479 URL: https://issues.apache.org/jira/browse/FLINK-11479 Project: Flink Issue Type: Sub-task Components: Table API SQL Reporter: Xuefu Zhang Assignee: Bowen Li {{HiveCatalog}} is an implementation of {{ReadableWritableCatalog}} interface for meta objects managed by Hive Metastore. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11477) Define catalog entries in SQL client YAML file and handle the creation and registration of those entries
Xuefu Zhang created FLINK-11477: --- Summary: Define catalog entries in SQL client YAML file and handle the creation and registration of those entries Key: FLINK-11477 URL: https://issues.apache.org/jira/browse/FLINK-11477 Project: Flink Issue Type: Sub-task Components: SQL Client Reporter: Xuefu Zhang Assignee: Bowen Li As the configuration for the SQL client, the YAML file currently allows one to register tables along with other entities such as deployment, execution, and so on. However, it doesn't have a section for catalog registration. We need to support users registering catalogs in the SQL client. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11478) Handle existing table registration via YAML file in the context of catalog support
Xuefu Zhang created FLINK-11478: --- Summary: Handle existing table registration via YAML file in the context of catalog support Key: FLINK-11478 URL: https://issues.apache.org/jira/browse/FLINK-11478 Project: Flink Issue Type: Sub-task Components: SQL Client, Table API SQL Reporter: Xuefu Zhang Assignee: Xuefu Zhang Before we introduced Catalog, it was common for users to define their tables in the SQL client YAML file. With catalog support, this is no longer necessary. However, to keep backward compatibility, this mechanism should continue functioning. Behind the scenes, we need to solve the problem of the same tables being registered every time the user launches the SQL client. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11476) Create CatalogManager to manage multiple catalogs and encapsulate Calcite schema
Xuefu Zhang created FLINK-11476: --- Summary: Create CatalogManager to manage multiple catalogs and encapsulate Calcite schema Key: FLINK-11476 URL: https://issues.apache.org/jira/browse/FLINK-11476 Project: Flink Issue Type: Sub-task Components: Table API SQL Reporter: Xuefu Zhang Assignee: Xuefu Zhang Flink allows more than one registered catalog. The {{CatalogManager}} class is the holding class to manage and encapsulate the catalogs and their interrelations with Calcite. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
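The holding-class role described above can be sketched as follows (plain Java; a hypothetical shape, not Flink's actual `CatalogManager`, and the Calcite schema wiring is omitted): several named catalogs are registered, and an unqualified table path is resolved against a default catalog.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a manager holding multiple named catalogs with a default.
// Each catalog is simplified here to a map of table name -> DDL string.
public class CatalogManager {
    private final Map<String, Map<String, String>> catalogs = new HashMap<>();
    private String defaultCatalog;

    public void registerCatalog(String name) {
        catalogs.put(name, new HashMap<>());
        if (defaultCatalog == null) {
            defaultCatalog = name; // first registered catalog becomes the default
        }
    }

    public void setDefaultCatalog(String name) {
        defaultCatalog = name;
    }

    public void putTable(String catalog, String table, String ddl) {
        catalogs.get(catalog).put(table, ddl);
    }

    // Accepts "catalog.table", or a bare "table" resolved in the default catalog.
    public String resolveTable(String path) {
        String catalog = defaultCatalog;
        String table = path;
        int dot = path.indexOf('.');
        if (dot >= 0) {
            catalog = path.substring(0, dot);
            table = path.substring(dot + 1);
        }
        return catalogs.getOrDefault(catalog, Map.of()).get(table);
    }
}
```

The design question the manager answers is exactly this resolution order: which catalog a bare table name belongs to, and how fully-qualified paths bypass the default.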
[jira] [Created] (FLINK-11475) Adapt existing InMemoryExternalCatalog to GenericInMemoryCatalog
Xuefu Zhang created FLINK-11475: --- Summary: Adapt existing InMemoryExternalCatalog to GenericInMemoryCatalog Key: FLINK-11475 URL: https://issues.apache.org/jira/browse/FLINK-11475 Project: Flink Issue Type: Sub-task Components: Table API SQL Reporter: Xuefu Zhang Assignee: Bowen Li {{GenericInMemoryCatalog}} needs to implement the {{ReadableWritableCatalog}} interface based on the design. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11474) Add ReadableCatalog, ReadableWritableCatalog, and other related interfaces
Xuefu Zhang created FLINK-11474: --- Summary: Add ReadableCatalog, ReadableWritableCatalog, and other related interfaces Key: FLINK-11474 URL: https://issues.apache.org/jira/browse/FLINK-11474 Project: Flink Issue Type: Sub-task Components: Table API SQL Reporter: Xuefu Zhang Assignee: Xuefu Zhang Also deprecate the corresponding related existing classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] TisonKun commented on issue #7570: [FLINK-11422] Prefer testing class to mock StreamTask in AbstractStre…
TisonKun commented on issue #7570: [FLINK-11422] Prefer testing class to mock StreamTask in AbstractStre… URL: https://github.com/apache/flink/pull/7570#issuecomment-459151470 @pnowojski thanks for your review! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Comment Edited] (FLINK-11363) Check and remove TaskManagerConfigurationTest
[ https://issues.apache.org/jira/browse/FLINK-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756567#comment-16756567 ] Gary Yao edited comment on FLINK-11363 at 1/30/19 9:24 PM: --- Fixed via 7af5a98adebb83b6a8e687ba6ec38ec527bb325b 85c3e7e2c2b46b3a2a7efeee77317dd8aabe3479 was (Author: gjy): Fixed via 969c8b2d08afae9dafe0349c2a10cf43c6deb918 7af5a98adebb83b6a8e687ba6ec38ec527bb325b > Check and remove TaskManagerConfigurationTest > - > > Key: FLINK-11363 > URL: https://issues.apache.org/jira/browse/FLINK-11363 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Till Rohrmann >Assignee: Yun Tang >Priority: Major > Labels: pull-request-available > Fix For: 1.8.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Check whether {{TaskManagerConfigurationTest}} contains any relevant tests > for the new code base and then remove this test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11363) Check and remove TaskManagerConfigurationTest
[ https://issues.apache.org/jira/browse/FLINK-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Yao updated FLINK-11363: - Fix Version/s: 1.8.0 > Check and remove TaskManagerConfigurationTest > - > > Key: FLINK-11363 > URL: https://issues.apache.org/jira/browse/FLINK-11363 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Till Rohrmann >Assignee: Yun Tang >Priority: Major > Labels: pull-request-available > Fix For: 1.8.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Check whether {{TaskManagerConfigurationTest}} contains any relevant tests > for the new code base and then remove this test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-11363) Check and remove TaskManagerConfigurationTest
[ https://issues.apache.org/jira/browse/FLINK-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Yao closed FLINK-11363. Resolution: Fixed > Check and remove TaskManagerConfigurationTest > - > > Key: FLINK-11363 > URL: https://issues.apache.org/jira/browse/FLINK-11363 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Till Rohrmann >Assignee: Yun Tang >Priority: Major > Labels: pull-request-available > Fix For: 1.8.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Check whether {{TaskManagerConfigurationTest}} contains any relevant tests > for the new code base and then remove this test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (FLINK-11363) Check and remove TaskManagerConfigurationTest
[ https://issues.apache.org/jira/browse/FLINK-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Yao reopened FLINK-11363: -- > Check and remove TaskManagerConfigurationTest > - > > Key: FLINK-11363 > URL: https://issues.apache.org/jira/browse/FLINK-11363 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Till Rohrmann >Assignee: Yun Tang >Priority: Major > Labels: pull-request-available > Fix For: 1.8.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Check whether {{TaskManagerConfigurationTest}} contains any relevant tests > for the new code base and then remove this test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-11363) Check and remove TaskManagerConfigurationTest
[ https://issues.apache.org/jira/browse/FLINK-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Yao closed FLINK-11363. Resolution: Fixed Fixed via 969c8b2d08afae9dafe0349c2a10cf43c6deb918 7af5a98adebb83b6a8e687ba6ec38ec527bb325b > Check and remove TaskManagerConfigurationTest > - > > Key: FLINK-11363 > URL: https://issues.apache.org/jira/browse/FLINK-11363 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Till Rohrmann >Assignee: Yun Tang >Priority: Major > Labels: pull-request-available > Fix For: 1.8.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Check whether {{TaskManagerConfigurationTest}} contains any relevant tests > for the new code base and then remove this test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] GJL merged pull request #7595: [FLINK-11363][tests] Port TaskManagerConfigurationTest to new code base
GJL merged pull request #7595: [FLINK-11363][tests] Port TaskManagerConfigurationTest to new code base URL: https://github.com/apache/flink/pull/7595 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] walterddr commented on issue #7607: [FLINK-10076][table] Upgrade Calcite dependency to 1.18
walterddr commented on issue #7607: [FLINK-10076][table] Upgrade Calcite dependency to 1.18 URL: https://github.com/apache/flink/pull/7607#issuecomment-459101279 @zentol yes I think definitely wait for @twalthr to merge #7587 first. It shouldn't be too much of a merge issue. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-11473) Clarify Documenation on Latency Tracking
[ https://issues.apache.org/jira/browse/FLINK-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Knauf updated FLINK-11473: - Description: The documentation on Latency Tracking states: {noformat} All intermediate operators keep a list of the last n latencies from each source to compute a latency distribution. The sink operators keep a list from each source, and each parallel source instance to allow detecting latency issues caused by individual machines.{noformat} {{This is not correct anymore as this behavior is controlled by a configuration `metrics.latency.granularity` now.}} was: The documentation on Latency Markers states: {noformat} All intermediate operators keep a list of the last n latencies from each source to compute a latency distribution. The sink operators keep a list from each source, and each parallel source instance to allow detecting latency issues caused by individual machines.{noformat} {{This is not correct anymore as this behavior is controlled by a configuration `metrics.latency.granularity` now.}} > Clarify Documenation on Latency Tracking > > > Key: FLINK-11473 > URL: https://issues.apache.org/jira/browse/FLINK-11473 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.7.1 >Reporter: Konstantin Knauf >Assignee: Konstantin Knauf >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The documentation on Latency Tracking states: > {noformat} > All intermediate operators keep a list of the last n latencies from each > source to compute a latency distribution. The sink operators keep a list from > each source, and each parallel source instance to allow detecting latency > issues caused by individual machines.{noformat} > {{This is not correct anymore as this behavior is controlled by a > configuration `metrics.latency.granularity` now.}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
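The "list of the last n latencies" bookkeeping that the quoted documentation describes can be sketched like this (plain Java; a hypothetical helper, not Flink's actual latency-tracking implementation): a bounded buffer of recent latencies per source, from which a distribution statistic such as a percentile is computed.

```java
import java.util.ArrayDeque;

// Sketch: keep only the last n observed latencies and derive a
// distribution statistic (a percentile) from that window.
public class LatencyHistory {
    private final int capacity;
    private final ArrayDeque<Long> latencies = new ArrayDeque<>();

    public LatencyHistory(int capacity) {
        this.capacity = capacity;
    }

    public void record(long latencyMs) {
        if (latencies.size() == capacity) {
            latencies.removeFirst(); // evict the oldest to keep only the last n
        }
        latencies.addLast(latencyMs);
    }

    // p in (0, 1]; nearest-rank percentile over the retained window.
    public long percentile(double p) {
        long[] sorted = latencies.stream().mapToLong(Long::longValue).sorted().toArray();
        if (sorted.length == 0) {
            throw new IllegalStateException("no latencies recorded");
        }
        int idx = (int) Math.ceil(p * sorted.length) - 1;
        return sorted[Math.max(0, idx)];
    }
}
```

The granularity the issue refers to (`metrics.latency.granularity`) controls whether such a history is kept per source, per source-and-subtask, or in a single bucket.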
[jira] [Updated] (FLINK-11473) Clarify Documenation on Latency Tracking
[ https://issues.apache.org/jira/browse/FLINK-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Knauf updated FLINK-11473: - Summary: Clarify Documenation on Latency Tracking (was: Clarify Documenation on Latency Markers) > Clarify Documenation on Latency Tracking > > > Key: FLINK-11473 > URL: https://issues.apache.org/jira/browse/FLINK-11473 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.7.1 >Reporter: Konstantin Knauf >Assignee: Konstantin Knauf >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The documentation on Latency Markers states: > {noformat} > All intermediate operators keep a list of the last n latencies from each > source to compute a latency distribution. The sink operators keep a list from > each source, and each parallel source instance to allow detecting latency > issues caused by individual machines.{noformat} > {{This is not correct anymore as this behavior is controlled by a > configuration `metrics.latency.granularity` now.}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11473) Clarify Documenation on Latency Markers
[ https://issues.apache.org/jira/browse/FLINK-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-11473: --- Labels: pull-request-available (was: ) > Clarify Documenation on Latency Markers > --- > > Key: FLINK-11473 > URL: https://issues.apache.org/jira/browse/FLINK-11473 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.7.1 >Reporter: Konstantin Knauf >Assignee: Konstantin Knauf >Priority: Minor > Labels: pull-request-available > > The documentation on Latency Markers states: > {noformat} > All intermediate operators keep a list of the last n latencies from each > source to compute a latency distribution. The sink operators keep a list from > each source, and each parallel source instance to allow detecting latency > issues caused by individual machines.{noformat} > {{This is not correct anymore as this behavior is controlled by a > configuration `metrics.latency.granularity` now.}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] knaufk opened a new pull request #7608: [FLINK-11473][documentation] Clarify documenation on Latency Tracking
knaufk opened a new pull request #7608: [FLINK-11473][documentation] Clarify documenation on Latency Tracking URL: https://github.com/apache/flink/pull/7608
[jira] [Created] (FLINK-11473) Clarify Documenation on Latency Markers
Konstantin Knauf created FLINK-11473: Summary: Clarify Documenation on Latency Markers Key: FLINK-11473 URL: https://issues.apache.org/jira/browse/FLINK-11473 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 1.7.1 Reporter: Konstantin Knauf Assignee: Konstantin Knauf The documentation on Latency Markers states: {noformat} All intermediate operators keep a list of the last n latencies from each source to compute a latency distribution. The sink operators keep a list from each source, and each parallel source instance to allow detecting latency issues caused by individual machines.{noformat} {{This is not correct anymore as this behavior is controlled by a configuration `metrics.latency.granularity` now.}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] zentol commented on issue #7607: [FLINK-10076][table] Upgrade Calcite dependency to 1.18
zentol commented on issue #7607: [FLINK-10076][table] Upgrade Calcite dependency to 1.18 URL: https://github.com/apache/flink/pull/7607#issuecomment-459066684 With #7587 being close to being merged you should probably re-base your PR on of it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Closed] (FLINK-11357) Check and port LeaderChangeJobRecoveryTest to new code base if necessary
[ https://issues.apache.org/jira/browse/FLINK-11357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Yao closed FLINK-11357. Resolution: Fixed Fixed via 969c8b2d08afae9dafe0349c2a10cf43c6deb918 > Check and port LeaderChangeJobRecoveryTest to new code base if necessary > > > Key: FLINK-11357 > URL: https://issues.apache.org/jira/browse/FLINK-11357 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Till Rohrmann >Assignee: lining >Priority: Major > Labels: pull-request-available > Fix For: 1.8.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Check and port {{LeaderChangeJobRecoveryTest}} to new code base if necessary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] GJL merged pull request #7554: [FLINK-11357][test]Check and port LeaderChangeJobRecoveryTest to new …
GJL merged pull request #7554: [FLINK-11357][test]Check and port LeaderChangeJobRecoveryTest to new … URL: https://github.com/apache/flink/pull/7554 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-10076) Upgrade Calcite dependency to 1.18
[ https://issues.apache.org/jira/browse/FLINK-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-10076: --- Labels: pull-request-available (was: ) > Upgrade Calcite dependency to 1.18 > -- > > Key: FLINK-10076 > URL: https://issues.apache.org/jira/browse/FLINK-10076 > Project: Flink > Issue Type: Task > Components: Table API SQL >Reporter: Shuyi Chen >Assignee: Shuyi Chen >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] walterddr opened a new pull request #7607: [FLINK-10076][table] Upgrade Calcite dependency to 1.18
walterddr opened a new pull request #7607: [FLINK-10076][table] Upgrade Calcite dependency to 1.18 URL: https://github.com/apache/flink/pull/7607 ## What is the purpose of the change Upgrade Calcite dependency to 1.18 for the flink-table package, including various bug fixes and improvements (details can be found in the JIRA discussion). ## Brief change log - Fix TimeIndicator type optimization - Due to: [CALCITE-1413], [CALCITE-2695] - Fix: include `digest` for `TimeIndicatorRelDataType` - Under-simplify rexcall chain - Due to: [CALCITE-2726] RexSimplify does not simplify enough for some of the nested operations, doesn't do nested visitCall - Fix: updated tests, future fixes needed on Calcite - Remove duplicated expressions & casting optimization - Due to: [CALCITE-1413], [CALCITE-2631], [CALCITE-2639] - Fix: update tests - Nested selected time indicator type not working properly - Due to: [CALCITE-2470], project method combines expressions if the underlying node is a project - Fix: pulled in `RelBuilder` from Calcite and changed the flag to not optimize project merge. ## Verifying this change This change is already covered by existing tests. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): yes - Updated Calcite dependency and list of exclusions - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no - If yes, how is the feature documented? N/A This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (FLINK-11419) StreamingFileSink fails to recover after taskmanager failure
[ https://issues.apache.org/jira/browse/FLINK-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756394#comment-16756394 ] Edward Rojas commented on FLINK-11419: -- [~kkl0u] The Travis build is failing, but it's not related to the changes made in this PR. [~StephanEwen] I saw that you were the last person to modify this file; maybe you could take a look at the changes proposed. > StreamingFileSink fails to recover after taskmanager failure > > > Key: FLINK-11419 > URL: https://issues.apache.org/jira/browse/FLINK-11419 > Project: Flink > Issue Type: Bug > Components: filesystem-connector > Affects Versions: 1.7.1 > Reporter: Edward Rojas > Assignee: Edward Rojas > Priority: Major > Labels: pull-request-available > Fix For: 1.7.2 > > Time Spent: 10m > Remaining Estimate: 0h > > If a job with a StreamingFileSink sending data to HDFS is running in a > cluster with multiple taskmanagers and the taskmanager executing the job goes > down (for some reason), then when another taskmanager starts executing the job, > it fails saying that there is some "missing data in tmp file" because it is > not able to perform a truncate on the file. 
> Here the full stack trace: > {code:java} > java.io.IOException: Missing data in tmp file: > hdfs://path/to/hdfs/2019-01-20/.part-0-0.inprogress.823f9c20-3594-4fe3-ae8c-f57b6c35e191 > at > org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.(HadoopRecoverableFsDataOutputStream.java:93) > at > org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.recover(HadoopRecoverableWriter.java:72) > at > org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.restoreInProgressFile(Bucket.java:140) > at > org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.(Bucket.java:127) > at > org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.restore(Bucket.java:396) > at > org.apache.flink.streaming.api.functions.sink.filesystem.DefaultBucketFactoryImpl.restoreBucket(DefaultBucketFactoryImpl.java:64) > at > org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.handleRestoredBucketState(Buckets.java:177) > at > org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.initializeActiveBuckets(Buckets.java:165) > at > org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.initializeState(Buckets.java:149) > at > org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.initializeState(StreamingFileSink.java:334) > at > org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178) > at > org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160) > at > org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:278) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289) > at 
org.apache.flink.runtime.taskmanager.Task.run(Task.java:704) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): > Failed to TRUNCATE_FILE > /path/to/hdfs/2019-01-20/.part-0-0.inprogress.823f9c20-3594-4fe3-ae8c-f57b6c35e191 > for DFSClient_NONMAPREDUCE_-2103482360_62 on x.xxx.xx.xx because this file > lease is currently owned by DFSClient_NONMAPREDUCE_1834204750_59 on x.xx.xx.xx > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3190) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncateInternal(FSNamesystem.java:2282) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncateInt(FSNamesystem.java:2228) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncate(FSNamesystem.java:2198) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.truncate(NameNodeRpcServer.java:1056) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.truncate(ClientNamenodeProtocolServerSideTranslatorPB.java:622) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) > at
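The stack trace above has two layers: the sink's recoverable writer restores an in-progress file by truncating it back to the byte length recorded in the last checkpoint, and on HDFS that truncate additionally requires holding the file lease, which the `AlreadyBeingCreatedException` shows was still owned by the old client. The sketch below illustrates only the truncate-to-valid-length idea on a local file; it is not the Flink/HDFS code path, and all names are illustrative.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Simplified, local-file analogue of what a recoverable writer does on restore:
// the checkpoint records how many bytes of the in-progress file were valid, and
// recovery truncates the file back to that length before appending resumes.
public class TruncateRecovery {

    /** Cuts the in-progress file back to the byte count recorded in the checkpoint. */
    static void truncateToValidLength(Path inProgressFile, long checkpointedLength) throws IOException {
        if (Files.size(inProgressFile) < checkpointedLength) {
            // Mirrors the "Missing data in tmp file" check in the stack trace above.
            throw new IOException("Missing data in tmp file: " + inProgressFile);
        }
        try (FileChannel channel = FileChannel.open(inProgressFile, StandardOpenOption.WRITE)) {
            channel.truncate(checkpointedLength);
        }
    }

    /** Writes a file with an uncommitted tail, recovers, and returns the surviving content. */
    static String demo() throws IOException {
        Path file = Files.createTempFile("part-0-0", ".inprogress");
        Files.write(file, "checkpointed|uncommitted-tail".getBytes(StandardCharsets.UTF_8));
        truncateToValidLength(file, 13); // 13 bytes were covered by the last checkpoint
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // prints "checkpointed|"
    }
}
```

On HDFS the equivalent truncate goes through the NameNode, which is why a stale lease held by the failed taskmanager's client blocks recovery until lease recovery completes.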
[GitHub] NicoK commented on a change in pull request #6705: [FLINK-10356][network] add sanity checks to SpillingAdaptiveSpanningRecordDeserializer
NicoK commented on a change in pull request #6705: [FLINK-10356][network] add sanity checks to SpillingAdaptiveSpanningRecordDeserializer URL: https://github.com/apache/flink/pull/6705#discussion_r252367679 ## File path: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/api/serialization/SpanningRecordSerializationTest.java ## @@ -105,11 +119,235 @@ public void testHandleMixedLargeRecords() throws Exception { testSerializationRoundTrip(originalRecords, segmentSize); } + /** +* Non-spanning, deserialization reads one byte too many and succeeds - failure report comes +* from an additional check in {@link SpillingAdaptiveSpanningRecordDeserializer}. +*/ + @Test + public void testHandleDeserializingTooMuchNonSpanning1() throws Exception { + testHandleWrongDeserialization( + DeserializingTooMuch.getValue(), + 32 * 1024); + } + + /** +* Non-spanning, deserialization reads one byte too many and fails. +*/ + @Test + public void testHandleDeserializingTooMuchNonSpanning2() throws Exception { + testHandleWrongDeserialization( + DeserializingTooMuch.getValue(), + (serializedLength) -> serializedLength, + isA(IndexOutOfBoundsException.class)); + } + + /** +* Spanning, deserialization reads one byte too many and fails. +*/ + @Test + public void testHandleDeserializingTooMuchSpanning1() throws Exception { + testHandleWrongDeserialization( + DeserializingTooMuch.getValue(), + (serializedLength) -> serializedLength - 1, + isA(EOFException.class)); + } + + /** +* Spanning, deserialization reads one byte too many and fails. +*/ + @Test + public void testHandleDeserializingTooMuchSpanning2() throws Exception { + testHandleWrongDeserialization( + DeserializingTooMuch.getValue(), + (serializedLength) -> 1, + isA(EOFException.class)); + } + + /** +* Spanning, spilling, deserialization reads one byte too many. 
+*/ + @Test + public void testHandleDeserializingTooMuchSpanningLargeRecord() throws Exception { + testHandleWrongDeserialization( + LargeObjectTypeDeserializingTooMuch.getRandom(), + 32 * 1024, + isA(EOFException.class)); + } + + /** +* Non-spanning, deserialization forgets to read one byte - failure report comes from an +* additional check in {@link SpillingAdaptiveSpanningRecordDeserializer}. +*/ + @Test + public void testHandleDeserializingNotEnoughNonSpanning() throws Exception { + testHandleWrongDeserialization( + DeserializingNotEnough.getValue(), + 32 * 1024); + } + + /** +* Spanning, deserialization forgets to read one byte - failure report comes from an additional +* check in {@link SpillingAdaptiveSpanningRecordDeserializer}. +*/ + @Test + public void testHandleDeserializingNotEnoughSpanning1() throws Exception { + testHandleWrongDeserialization( + DeserializingNotEnough.getValue(), + (serializedLength) -> serializedLength - 1); + } + + /** +* Spanning, serialization length is 17 (including headers), deserialization forgets to read one +* byte - failure report comes from an additional check in {@link SpillingAdaptiveSpanningRecordDeserializer}. +*/ + @Test + public void testHandleDeserializingNotEnoughSpanning2() throws Exception { + testHandleWrongDeserialization( + DeserializingNotEnough.getValue(), + 1); + } + + /** +* Spanning, spilling, deserialization forgets to read one byte - failure report comes from an +* additional check in {@link SpillingAdaptiveSpanningRecordDeserializer}. +*/ + @Test + public void testHandleDeserializingNotEnoughSpanningLargeRecord() throws Exception { + testHandleWrongDeserialization( + LargeObjectTypeDeserializingNotEnough.getRandom(), + 32 * 1024); + } + + private void testHandleWrongDeserialization( + WrongDeserializationValue testValue, + IntFunction segmentSizeProvider, + Matcher expectedCause) throws Exception { + expectedException.expectCause(expectedCause); + testHandleWrongDeserialization(testValue,
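The sanity checks these tests exercise reduce to one invariant: every serialized record is preceded by its length, so after the record's deserializer runs, the framework can compare the bytes actually consumed against the recorded length and fail fast on both "read too much" and "read too little". A minimal sketch of that check, with hypothetical names rather than Flink's actual classes:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class LengthCheckedDeserializer {

    /** A record type knows how to read itself from a stream. */
    interface RecordReader {
        void read(DataInputStream in) throws IOException;
    }

    /**
     * Reads one length-prefixed record and verifies that the reader consumed
     * exactly as many bytes as the length header announced.
     */
    static void readRecord(byte[] buffer, RecordReader reader) throws IOException {
        ByteArrayInputStream bytes = new ByteArrayInputStream(buffer);
        DataInputStream in = new DataInputStream(bytes);
        int recordLength = in.readInt();   // length header written by the serializer
        int before = bytes.available();
        reader.read(in);                   // user deserialization logic
        int consumed = before - bytes.available();
        if (consumed != recordLength) {
            throw new IOException("Deserializer consumed " + consumed
                + " bytes, expected " + recordLength
                + (consumed < recordLength ? " (read too little)" : " (read too much)"));
        }
    }
}
```

A reader that attempts to read past the end of the stream fails with an `EOFException` from `DataInputStream` itself, analogous to the spanning cases above, while the under- and over-read cases within the buffer are caught by the explicit length comparison.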
[GitHub] tillrohrmann commented on a change in pull request #7356: [FLINK-10868][flink-yarn] Enforce maximum TMs failure rate in ResourceManagers
tillrohrmann commented on a change in pull request #7356: [FLINK-10868][flink-yarn] Enforce maximum TMs failure rate in ResourceManagers URL: https://github.com/apache/flink/pull/7356#discussion_r252360494 ## File path: flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManager.java ## @@ -395,12 +405,20 @@ public void onContainersAllocated(List containers) { nodeManagerClient.startContainer(container, taskExecutorLaunchContext); } catch (Throwable t) { log.error("Could not start TaskManager in container {}.", container.getId(), t); - // release the failed container workerNodeMap.remove(resourceId); resourceManagerClient.releaseAssignedContainer(container.getId()); - // and ask for a new one - requestYarnContainerIfRequired(); + log.error("Could not start TaskManager in container {}.", container.getId(), t); + recordFailure(); + if (shouldRejectRequests()) { + rejectAllPendingSlotRequests(new MaximumFailedTaskManagerExceedingException( + String.format("Maximum number of failed container %d in interval %s" + + "is detected in Resource Manager", maximumFailureTaskExecutorPerInternal, + failureInterval.toString()), t)); Review comment: This branch should go into the `recordFailure` method.
[GitHub] tillrohrmann commented on a change in pull request #7356: [FLINK-10868][flink-yarn] Enforce maximum TMs failure rate in ResourceManagers
tillrohrmann commented on a change in pull request #7356: [FLINK-10868][flink-yarn] Enforce maximum TMs failure rate in ResourceManagers URL: https://github.com/apache/flink/pull/7356#discussion_r252354300 ## File path: docs/_includes/generated/mesos_configuration.html ## @@ -27,6 +27,11 @@ -1 The maximum number of failed workers before the cluster fails. May be set to -1 to disable this feature. This option is ignored unless Flink is in legacy mode. + +mesos.maximum-failed-workers-per-interval +-1 +Maximum number of workers the system is going to reallocate in case of a failure in an interval. Review comment: Is this a Mesos specific configuration? If we want to support that for all `ResourceManagers`, then it might be better to use a more generic configuration name. E.g. `resourcemanager.maximum-workers-failure-rate`. Should we combine `maximum-failed-workers-per-interval` and `workers-failure-rate-interval` into a `failure-rate` config option which defines how many workers can fail per minute or per hour?
[GitHub] tillrohrmann commented on a change in pull request #7356: [FLINK-10868][flink-yarn] Enforce maximum TMs failure rate in ResourceManagers
tillrohrmann commented on a change in pull request #7356: [FLINK-10868][flink-yarn] Enforce maximum TMs failure rate in ResourceManagers URL: https://github.com/apache/flink/pull/7356#discussion_r252354509 ## File path: docs/_includes/generated/yarn_config_configuration.html ## @@ -42,6 +47,11 @@ (none) Maximum number of containers the system is going to reallocate in case of a failure. + +yarn.maximum-failed-containers-per-interval +-1 +Maximum number of containers the system is going to reallocate in case of a failure in an interval. + Review comment: Duplicate config option.
[GitHub] tillrohrmann commented on a change in pull request #7356: [FLINK-10868][flink-yarn] Enforce maximum TMs failure rate in ResourceManagers
tillrohrmann commented on a change in pull request #7356: [FLINK-10868][flink-yarn] Enforce maximum TMs failure rate in ResourceManagers URL: https://github.com/apache/flink/pull/7356#discussion_r252364001 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java ## @@ -145,6 +148,15 @@ /** All registered listeners for status updates of the ResourceManager. */ private ConcurrentMap infoMessageListeners; + protected final Time failureInterval; + + protected final int maximumFailureTaskExecutorPerInternal; + + private boolean checkFailureRate; + + private final ArrayDeque taskExecutorFailureTimestamps; Review comment: I think we should encapsulate the metering logic behind some interface. Ideally I would like to use an exponentially-weighted moving average to approximate the failure rate. This would have the benefit that we would not have to store every timestamp here. However, with an interface we could also have different implementations (e.g. the exact one you have implemented here).
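The encapsulation the review suggests could look like the sketch below: a small metering interface with the exact timestamp-deque implementation from the PR behind it, so that an EWMA-based approximation can later implement the same interface. All names here are illustrative, not taken from the PR.

```java
import java.util.ArrayDeque;

public class FailureRater {

    /** The metering abstraction the review proposes introducing. */
    interface FailureRateMeter {
        void recordFailure(long timestampMillis);
        boolean exceedsRate();
    }

    /** Exact implementation: keep one timestamp per failure inside the sliding interval. */
    static class TimestampDequeMeter implements FailureRateMeter {
        private final ArrayDeque<Long> timestamps = new ArrayDeque<>();
        private final int maxFailuresPerInterval;
        private final long intervalMillis;

        TimestampDequeMeter(int maxFailuresPerInterval, long intervalMillis) {
            this.maxFailuresPerInterval = maxFailuresPerInterval;
            this.intervalMillis = intervalMillis;
        }

        @Override
        public void recordFailure(long timestampMillis) {
            timestamps.addLast(timestampMillis);
            // Evict failures that fell out of the sliding interval.
            while (!timestamps.isEmpty()
                    && timestamps.peekFirst() < timestampMillis - intervalMillis) {
                timestamps.removeFirst();
            }
        }

        @Override
        public boolean exceedsRate() {
            return timestamps.size() > maxFailuresPerInterval;
        }
    }
}
```

An EWMA variant would implement the same interface with a single decaying counter instead of the deque, trading exactness for constant memory, which is the benefit the review points out.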
[GitHub] tillrohrmann commented on a change in pull request #7356: [FLINK-10868][flink-yarn] Enforce maximum TMs failure rate in ResourceManagers
tillrohrmann commented on a change in pull request #7356: [FLINK-10868][flink-yarn] Enforce maximum TMs failure rate in ResourceManagers URL: https://github.com/apache/flink/pull/7356#discussion_r252360333 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/SlotManager.java ## @@ -300,6 +304,18 @@ public boolean registerSlotRequest(SlotRequest slotRequest) throws SlotManagerEx } } + /** +* Rejects all pending slot requests. +* @param cause the exception caused the rejection +*/ + public void rejectAllPendingSlotRequests(Exception cause) { + for (PendingSlotRequest pendingSlotRequest : pendingSlotRequests.values()) { + rejectPendingSlotRequest(pendingSlotRequest, cause); + } + + pendingSlotRequests.clear(); + } Review comment: We should add a test for this method.
[GitHub] tillrohrmann commented on a change in pull request #7356: [FLINK-10868][flink-yarn] Enforce maximum TMs failure rate in ResourceManagers
tillrohrmann commented on a change in pull request #7356: [FLINK-10868][flink-yarn] Enforce maximum TMs failure rate in ResourceManagers URL: https://github.com/apache/flink/pull/7356#discussion_r252360809 ## File path: flink-yarn/src/test/java/org/apache/flink/yarn/YarnResourceManagerTest.java ## @@ -521,4 +533,56 @@ public void testOnContainerCompleted() throws Exception { }); }}; } + + /** +* Tests that YarnResourceManager rejects all pending slot requests when the maximum number of failed +* containers is hit. +*/ + @Test + public void testOnContainersAllocatedWithFailure() throws Exception { Review comment: With the generalized test for the failure rate behaviour in `ResourceManagerTest` we would only need to check that a failing container would call `ResourceManager#recordFailure`.