[jira] [Commented] (FLINK-10508) Port JobManagerITCase to new code base
[ https://issues.apache.org/jira/browse/FLINK-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641583#comment-16641583 ] TisonKun commented on FLINK-10508:
--

This test corresponds not to {{JobMasterITCase}} but to {{DispatcherTest}}. It seems to mainly test multiple types of jobs and their behavior.

> Port JobManagerITCase to new code base
> --
>
> Key: FLINK-10508
> URL: https://issues.apache.org/jira/browse/FLINK-10508
> Project: Flink
> Issue Type: Sub-task
> Components: Tests
> Affects Versions: 1.7.0
> Reporter: TisonKun
> Assignee: TisonKun
> Priority: Major
> Fix For: 1.7.0
>
> Port {{JobManagerITCase}} to new code base.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-10516) YarnApplicationMasterRunner fail to initialize FileSystem with correct Flink Configuration during setup
[ https://issues.apache.org/jira/browse/FLINK-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642911#comment-16642911 ] TisonKun commented on FLINK-10516:
--

[~suez1224] What about the refactor? {{YarnApplicationMasterRunner}} looks like a class from the legacy (non-FLIP-6) code base, and we currently have FLINK-10392, so I doubt whether we should fix it for 1.7.0. cc [~till.rohrmann]

> YarnApplicationMasterRunner fail to initialize FileSystem with correct Flink Configuration during setup
> --
>
> Key: FLINK-10516
> URL: https://issues.apache.org/jira/browse/FLINK-10516
> Project: Flink
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.4.0, 1.5.0, 1.6.0, 1.7.0
> Reporter: Shuyi Chen
> Assignee: Shuyi Chen
> Priority: Blocker
> Fix For: 1.7.0, 1.6.2, 1.5.5
>
> Will add a fix, and refactor YarnApplicationMasterRunner to add a unit test to prevent future regression.
[jira] [Commented] (FLINK-10527) Cleanup constant isNewMode in YarnTestBase
[ https://issues.apache.org/jira/browse/FLINK-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645927#comment-16645927 ] TisonKun commented on FLINK-10527:
--

How about adding this issue as a subtask of FLINK-10392?

> Cleanup constant isNewMode in YarnTestBase
> --
>
> Key: FLINK-10527
> URL: https://issues.apache.org/jira/browse/FLINK-10527
> Project: Flink
> Issue Type: Bug
> Components: YARN
> Reporter: vinoyang
> Assignee: vinoyang
> Priority: Major
> Labels: pull-request-available
>
> This seems to be a residual problem from FLINK-10396, where the constant was set to true. Currently it has three usage scenarios:
> 1. In assertions, where it makes the assumption fail:
> {code:java}
> assumeTrue("The new mode does not start TMs upfront.", !isNewMode);
> {code}
> 2. In {{if (!isNewMode)}} blocks: the logic inside is never invoked, so the block can be removed.
> 3. In {{if (isNewMode)}} blocks: the logic is always invoked, so the if statement can be removed.
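To make the three cleanup scenarios above concrete, here is a minimal, self-contained Java sketch. The class and method names are invented for illustration and are not the actual {{YarnTestBase}} code; it only demonstrates how a boolean constant hard-wired to {{true}} turns each guard into dead or redundant code.

```java
// Hypothetical sketch (names invented); illustrates the isNewMode cleanup,
// not the real YarnTestBase implementation.
public class IsNewModeCleanupSketch {

    // After FLINK-10396 the flag is fixed to true: only the new (FLIP-6) mode runs.
    static final boolean isNewMode = true;

    // Scenario 1: assumeTrue("...", !isNewMode) -- the assumption never holds,
    // so the guarded test is always skipped and the assumption can be deleted.
    static boolean assumptionHolds() {
        return !isNewMode; // always false
    }

    // Scenario 2: the branch behind if (!isNewMode) is dead code.
    static String legacyOnlyPath() {
        if (!isNewMode) {
            return "legacy path"; // never reached -- whole block removable
        }
        return "unreachable branch removed";
    }

    // Scenario 3: if (isNewMode) always runs, so the guard is redundant
    // and only its body needs to stay.
    static String newModePath() {
        if (isNewMode) {
            return "new mode path"; // always reached
        }
        return "never returned";
    }

    public static void main(String[] args) {
        System.out.println(assumptionHolds());
        System.out.println(legacyOnlyPath());
        System.out.println(newModePath());
    }
}
```

Once the constant is always true, scenarios 2 and 3 reduce to deleting the dead block and inlining the always-taken body, which is exactly the cleanup the issue proposes.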
[jira] [Commented] (FLINK-10508) Port JobManagerITCase to new code base
[ https://issues.apache.org/jira/browse/FLINK-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647375#comment-16647375 ] TisonKun commented on FLINK-10508:
--

checklist:
- JobManagerITCase.The JobManager actor must handle jobs when not enough slots
- JobManagerITCase.The JobManager actor must support immediate scheduling of a single vertex
- JobManagerITCase.The JobManager actor must support queued scheduling of a single vertex
- JobManagerITCase.The JobManager actor must support forward jobs
- JobManagerITCase.The JobManager actor must support bipartite job
- JobManagerITCase.The JobManager actor must support two input job failing edge mismatch
- JobManagerITCase.The JobManager actor must support two input job
- JobManagerITCase.The JobManager actor must support scheduling all at once
- JobManagerITCase.The JobManager actor must handle job with a failing sender vertex
- JobManagerITCase.The JobManager actor must handle job with an occasionally failing sender vertex
- JobManagerITCase.The JobManager actor must handle job with a failing receiver vertex
- JobManagerITCase.The JobManager actor must handle job with all vertices failing during instantiation
- JobManagerITCase.The JobManager actor must handle job with some vertices failing during instantiation
- JobManagerITCase.The JobManager actor must check that all job vertices have completed the call to finalizeOnMaster before the job completes
- JobManagerITCase.The JobManager actor must remove execution graphs when the client ends the session explicitly
- JobManagerITCase.The JobManager actor must remove execution graphs when the client's session times out
- JobManagerITCase.The JobManager actor must handle trigger savepoint response for non-existing job
- JobManagerITCase.The JobManager actor must handle trigger savepoint response for job with disabled checkpointing
- JobManagerITCase.The JobManager actor must handle trigger savepoint response after trigger savepoint failure
- JobManagerITCase.The JobManager actor must handle failed savepoint triggering
- JobManagerITCase.The JobManager actor must handle trigger savepoint response after succeeded savepoint future
[jira] [Created] (FLINK-10540) Remove legacy LocalFlinkMiniCluster
TisonKun created FLINK-10540:
--

Summary: Remove legacy LocalFlinkMiniCluster
Key: FLINK-10540
URL: https://issues.apache.org/jira/browse/FLINK-10540
Project: Flink
Issue Type: Sub-task
Affects Versions: 1.7.0
Reporter: TisonKun
Fix For: 1.7.0

{{LocalFlinkMiniCluster}} is based on the legacy cluster mode and should no longer be used.
[jira] [Created] (FLINK-10541) Removed unused code of LocalFlinkMiniCluster
TisonKun created FLINK-10541:
--

Summary: Removed unused code of LocalFlinkMiniCluster
Key: FLINK-10541
URL: https://issues.apache.org/jira/browse/FLINK-10541
Project: Flink
Issue Type: Task
Components: Tests
Affects Versions: 1.7.0
Reporter: TisonKun
Assignee: TisonKun
Fix For: 1.7.0

{{TestBaseUtils#startCluster}} depends on the legacy class {{LocalFlinkMiniCluster}}; however, it is itself unused. Let's remove it to help confirm that we can directly remove {{LocalFlinkMiniCluster}}.
[jira] [Updated] (FLINK-10541) Removed unused code of LocalFlinkMiniCluster
[ https://issues.apache.org/jira/browse/FLINK-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10541:
--

Issue Type: Sub-task (was: Task)
Parent: FLINK-10392
[jira] [Updated] (FLINK-10541) Removed unused code depends on LocalFlinkMiniCluster
[ https://issues.apache.org/jira/browse/FLINK-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10541:
--

Summary: Removed unused code depends on LocalFlinkMiniCluster (was: Removed unused code of LocalFlinkMiniCluster)
[jira] [Created] (FLINK-10545) Remove JobManagerLeaderSessionIDITCase
TisonKun created FLINK-10545:
--

Summary: Remove JobManagerLeaderSessionIDITCase
Key: FLINK-10545
URL: https://issues.apache.org/jira/browse/FLINK-10545
Project: Flink
Issue Type: Sub-task
Components: Tests
Affects Versions: 1.7.0
Reporter: TisonKun
Assignee: TisonKun
Fix For: 1.7.0

{{JobManagerLeaderSessionIDITCase.scala}} is based on the legacy mode. I think we now have {{FencingToken}}, so there is no need to maintain or port this test. Simply remove it.
[jira] [Created] (FLINK-10546) Remove StandaloneMiniCluster
TisonKun created FLINK-10546:
--

Summary: Remove StandaloneMiniCluster
Key: FLINK-10546
URL: https://issues.apache.org/jira/browse/FLINK-10546
Project: Flink
Issue Type: Task
Components: JobManager
Affects Versions: 1.7.0
Reporter: TisonKun
Assignee: TisonKun
Fix For: 1.7.0

Before doing as the title says, I want to check whether {{StandaloneMiniCluster}} is still in use. IIRC there was once a deprecation of `start-local.sh`, but I don't know if it is relevant. Furthermore, this class seems unused everywhere else. Since it depends on the legacy mode, I wonder whether we can *JUST* remove it. cc [~till.rohrmann] and [~Zentol]
[jira] [Updated] (FLINK-10546) Remove StandaloneMiniCluster
[ https://issues.apache.org/jira/browse/FLINK-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10546:
--

Description:

Before doing as the title says, I want to check whether {{StandaloneMiniCluster}} is still in use. IIRC there was once a deprecation of {{start-local.sh}}, but I don't know if it is relevant. Furthermore, this class seems unused everywhere else. Since it depends on the legacy mode, I wonder whether we can *JUST* remove it. cc [~till.rohrmann] and [~Zentol]

was:

Before doing as the title says, I want to check whether {{StandaloneMiniCluster}} is still in use. IIRC there was once a deprecation of `start-local.sh`, but I don't know if it is relevant. Furthermore, this class seems unused everywhere else. Since it depends on the legacy mode, I wonder whether we can *JUST* remove it. cc [~till.rohrmann] and [~Zentol]
[jira] [Updated] (FLINK-10540) Remove legacy FlinkMiniCluster
[ https://issues.apache.org/jira/browse/FLINK-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10540:
--

Summary: Remove legacy FlinkMiniCluster (was: Remove legacy LocalFlinkMiniCluster)
[jira] [Updated] (FLINK-10540) Remove legacy FlinkMiniCluster
[ https://issues.apache.org/jira/browse/FLINK-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10540:
--

Description: {{FlinkMiniCluster}} is based on the legacy cluster mode and should no longer be used. (was: {{LocalFlinkMiniCluster}} is based on the legacy cluster mode and should no longer be used.)
[jira] [Created] (FLINK-10549) Remove LegacyJobRetrievalITCase
TisonKun created FLINK-10549:
--

Summary: Remove LegacyJobRetrievalITCase
Key: FLINK-10549
URL: https://issues.apache.org/jira/browse/FLINK-10549
Project: Flink
Issue Type: Sub-task
Components: Tests
Affects Versions: 1.7.0
Reporter: TisonKun
Assignee: TisonKun
Fix For: 1.7.0

We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
[jira] [Updated] (FLINK-10549) Remove Legacy* Tests
[ https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10549:
--

Summary: Remove Legacy* Tests (was: Remove LegacyJobRetrievalITCase)
[jira] [Updated] (FLINK-10549) Remove Legacy* Tests
[ https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10549:
--

Description:

This Jira tracks the removal of tests that start with "LegacyXXX" and are covered by a test named "XXX" on the same topic.

1. We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.

was: We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
[jira] [Updated] (FLINK-10549) Remove Legacy* Tests
[ https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10549:
--

Description:

This Jira tracks the removal of tests that start with "LegacyXXX" and are covered by a test named "XXX" on the same topic.

# We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
# We already have {{AccumulatorLiveITCase}}, and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.

was:

This Jira tracks the removal of tests that start with "LegacyXXX" and are covered by a test named "XXX" on the same topic.

1. We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
[jira] [Updated] (FLINK-10549) Remove Legacy* Tests
[ https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10549:
--

Description:

This Jira tracks the removal of tests that start with "LegacyXXX" and are covered by a test named "XXX" on the same topic.

# We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
# We already have {{AccumulatorLiveITCase}}, and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
# We already have {{TaskCancelAsyncProducerConsumerITCase}}, and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove it.

was:

This Jira tracks the removal of tests that start with "LegacyXXX" and are covered by a test named "XXX" on the same topic.

# We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
# We already have {{AccumulatorLiveITCase}}, and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode
[ https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10549:
--

Summary: Remove Legacy* Tests based on legacy mode (was: Remove Legacy* Tests)
[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode
[ https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10549:
--

Description:

This Jira tracks the removal of tests based on the legacy mode that start with "LegacyXXX" and are covered by a test named "XXX".

# We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
# We already have {{AccumulatorLiveITCase}}, and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
# We already have {{TaskCancelAsyncProducerConsumerITCase}}, and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove it.

was:

This Jira tracks the removal of tests based on the legacy mode that start with "LegacyXXX" and are covered by a test named "XXX" on the same topic.

# We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
# We already have {{AccumulatorLiveITCase}}, and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
# We already have {{TaskCancelAsyncProducerConsumerITCase}}, and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove it.
[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode
[ https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10549:
--

Description:

This Jira tracks the removal of tests based on the legacy mode that start with "LegacyXXX" and are covered by a test named "XXX".

# We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
# We already have {{AccumulatorLiveITCase}}, and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
# We already have {{TaskCancelAsyncProducerConsumerITCase}}, and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove it.
# We already have {{ClassLoaderITCase}}, and all test cases of {{LegacyClassLoaderITCase}} are covered. Simply remove it.

was:

This Jira tracks the removal of tests based on the legacy mode that start with "LegacyXXX" and are covered by a test named "XXX".

# We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
# We already have {{AccumulatorLiveITCase}}, and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
# We already have {{TaskCancelAsyncProducerConsumerITCase}}, and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove it.
[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode
[ https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10549:
--

Description:

This Jira tracks the removal of tests based on the legacy mode that start with "LegacyXXX" and are covered by a test named "XXX".

# We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
# We already have {{AccumulatorLiveITCase}}, and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
# We already have {{TaskCancelAsyncProducerConsumerITCase}}, and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove it.
# We already have {{ClassLoaderITCase}}, and all test cases of {{LegacyClassLoaderITCase}} are covered. Simply remove it.
# We already have {{SlotCountExceedingParallelismTest}}, and all test cases of {{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.

was:

This Jira tracks the removal of tests based on the legacy mode that start with "LegacyXXX" and are covered by a test named "XXX".

# We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
# We already have {{AccumulatorLiveITCase}}, and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
# We already have {{TaskCancelAsyncProducerConsumerITCase}}, and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove it.
# We already have {{ClassLoaderITCase}}, and all test cases of {{LegacyClassLoaderITCase}} are covered. Simply remove it.
[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode
[ https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10549:
--

Description:

This Jira tracks the removal of tests based on the legacy mode that start with "LegacyXXX" and are covered by a test named "XXX".

# We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
# We already have {{AccumulatorLiveITCase}}, and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
# We already have {{TaskCancelAsyncProducerConsumerITCase}}, and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove it.
# We already have {{ClassLoaderITCase}}, and all test cases of {{LegacyClassLoaderITCase}} are covered. Simply remove it.
# We already have {{SlotCountExceedingParallelismTest}}, and all test cases of {{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.
# We already have {{ScheduleOrUpdateConsumersTest}}, and all test cases of {{LegacyScheduleOrUpdateConsumersTest}} are covered. Simply remove it.

was:

This Jira tracks the removal of tests based on the legacy mode that start with "LegacyXXX" and are covered by a test named "XXX".

# We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
# We already have {{AccumulatorLiveITCase}}, and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
# We already have {{TaskCancelAsyncProducerConsumerITCase}}, and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove it.
# We already have {{ClassLoaderITCase}}, and all test cases of {{LegacyClassLoaderITCase}} are covered. Simply remove it.
# We already have {{SlotCountExceedingParallelismTest}}, and all test cases of {{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.
[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode
[ https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10549:
--

Description:

This Jira tracks the removal of tests based on the legacy mode that start with "LegacyXXX" and are covered by a test named "XXX".

# We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
# We already have {{AccumulatorLiveITCase}}, and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
# We already have {{TaskCancelAsyncProducerConsumerITCase}}, and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove it.
# We already have {{ClassLoaderITCase}}, and all test cases of {{LegacyClassLoaderITCase}} are covered. Simply remove it.
# We already have {{SlotCountExceedingParallelismTest}}, and all test cases of {{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.
# We already have {{ScheduleOrUpdateConsumersTest}}, and all test cases of {{LegacyScheduleOrUpdateConsumersTest}} are covered. Simply remove it.
# We already have {{PartialConsumePipelinedResultTest}}, and all test cases of {{LegacyPartialConsumePipelinedResultTest}} are covered. Simply remove it.

was:

This Jira tracks the removal of tests based on the legacy mode that start with "LegacyXXX" and are covered by a test named "XXX".

# We already have {{JobRetrievalITCase}}, and all test cases of {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
# We already have {{AccumulatorLiveITCase}}, and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
# We already have {{TaskCancelAsyncProducerConsumerITCase}}, and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove it.
# We already have {{ClassLoaderITCase}}, and all test cases of {{LegacyClassLoaderITCase}} are covered. Simply remove it.
# We already have {{SlotCountExceedingParallelismTest}}, and all test cases of {{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.
# We already have {{ScheduleOrUpdateConsumersTest}}, and all test cases of {{LegacyScheduleOrUpdateConsumersTest}} are covered. Simply remove it.
[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode
[ https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10549: - Description: This Jira tracks the removal of tests based on legacy mode and starting with "LegacyXXX" while covered by a test named "XXX" and covering the same topic. # We already have {{JobRetrievalITCase}} and all test cases of {{LegacyJobRetrievalITCase}} are covered. Just simply remove it. # We already have {{AccumulatorLiveITCase}} and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Just simply remove it. # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Just simply remove it. was: This Jira tracks the removal of tests that start with "LegacyXXX" and covered by a test named "XXX" and covering the same topic. # We already have {{JobRetrievalITCase}} and all test cases of {{LegacyJobRetrievalITCase}} are covered. Just simply remove it. # We already have {{AccumulatorLiveITCase}} and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Just simply remove it. # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Just simply remove it. > Remove Legacy* Tests based on legacy mode > - > > Key: FLINK-10549 > URL: https://issues.apache.org/jira/browse/FLINK-10549 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.7.0 >Reporter: TisonKun >Assignee: TisonKun >Priority: Major > Fix For: 1.7.0 > > > This Jira tracks the removal of tests based on legacy mode and starting with > "LegacyXXX" while covered by a test named "XXX" and covering the same topic. > # We already have {{JobRetrievalITCase}} and all test cases of > {{LegacyJobRetrievalITCase}} are covered. Just simply remove it. 
> # We already have {{AccumulatorLiveITCase}} and all test cases of > {{LegacyAccumulatorLiveITCase}} are covered. Just simply remove it. > # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test > cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Just > simply remove it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode
[ https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10549: - Description: This Jira tracks the removal of tests based on legacy mode and starting with "LegacyXXX" while covered by a test named "XXX" . # We already have {{JobRetrievalITCase}} and all test cases of {{LegacyJobRetrievalITCase}} are covered. Just simply remove it. # We already have {{AccumulatorLiveITCase}} and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Just simply remove it. # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Just simply remove it. # We already have {{ClassLoaderITCase}} and all test cases of {{LegacyClassLoaderITCase}} are covered. Just simply remove it. # We already have {{SlotCountExceedingParallelismTest}} and all test cases of {{LegacySlotCountExceedingParallelismTest}} are covered. Just simply remove it. # We already have {{ScheduleOrUpdateConsumersTest}} and all test cases of {{LegacyScheduleOrUpdateConsumersTest}} are covered. Just simply remove it. # We already have {{PartialConsumePipelinedResultTest}} and all test cases of {{LegacyPartialConsumePipelinedResultTest}} are covered. Just simply remove it. # We already have {{AvroExternalJarProgramITCase}} and all test cases of {{LegacyAvroExternalJarProgramITCase}} are covered. Just simply remove it. was: This Jira tracks the removal of tests based on legacy mode and starting with "LegacyXXX" while covered by a test named "XXX" . # We already have {{JobRetrievalITCase}} and all test cases of {{LegacyJobRetrievalITCase}} are covered. Just simply remove it. # We already have {{AccumulatorLiveITCase}} and all test cases of {{LegacyAccumulatorLiveITCase}} are covered. Just simply remove it. # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. 
Just simply remove it. # We already have {{ClassLoaderITCase}} and all test cases of {{LegacyClassLoaderITCase}} are covered. Just simply remove it. # We already have {{SlotCountExceedingParallelismTest}} and all test cases of {{LegacySlotCountExceedingParallelismTest}} are covered. Just simply remove it. # We already have {{ScheduleOrUpdateConsumersTest}} and all test cases of {{LegacyScheduleOrUpdateConsumersTest}} are covered. Just simply remove it. # We already have {{PartialConsumePipelinedResultTest}} and all test cases of {{LegacyPartialConsumePipelinedResultTest}} are covered. Just simply remove it. > Remove Legacy* Tests based on legacy mode > - > > Key: FLINK-10549 > URL: https://issues.apache.org/jira/browse/FLINK-10549 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.7.0 >Reporter: TisonKun >Assignee: TisonKun >Priority: Major > Fix For: 1.7.0 > > > This Jira tracks the removal of tests based on legacy mode and starting with > "LegacyXXX" while covered by a test named "XXX" . > # We already have {{JobRetrievalITCase}} and all test cases of > {{LegacyJobRetrievalITCase}} are covered. Just simply remove it. > # We already have {{AccumulatorLiveITCase}} and all test cases of > {{LegacyAccumulatorLiveITCase}} are covered. Just simply remove it. > # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test > cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Just > simply remove it. > # We already have {{ClassLoaderITCase}} and all test cases of > {{LegacyClassLoaderITCase}} are covered. Just simply remove it. > # We already have {{SlotCountExceedingParallelismTest}} and all test cases > of {{LegacySlotCountExceedingParallelismTest}} are covered. Just simply > remove it. > # We already have {{ScheduleOrUpdateConsumersTest}} and all test cases of > {{LegacyScheduleOrUpdateConsumersTest}} are covered. Just simply remove it. 
> # We already have {{PartialConsumePipelinedResultTest}} and all test cases > of {{LegacyPartialConsumePipelinedResultTest}} are covered. Just simply > remove it. > # We already have {{AvroExternalJarProgramITCase}} and all test cases of > {{LegacyAvroExternalJarProgramITCase}} are covered. Just simply remove it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-10546) Remove StandaloneMiniCluster
[ https://issues.apache.org/jira/browse/FLINK-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10546: - Issue Type: Sub-task (was: Task) Parent: FLINK-10392 > Remove StandaloneMiniCluster > > > Key: FLINK-10546 > URL: https://issues.apache.org/jira/browse/FLINK-10546 > Project: Flink > Issue Type: Sub-task > Components: JobManager >Affects Versions: 1.7.0 >Reporter: TisonKun >Assignee: TisonKun >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > Before doing as the title says, I want to check whether {{StandaloneMiniCluster}} > is still in use. > IIRC {{start-local.sh}} was once deprecated, but I don't know whether > that is relevant. > Further, this class seems unused everywhere else. Since it depends on the > legacy mode, I wonder whether we can *JUST* remove it. > > cc [~till.rohrmann] and [~Zentol] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10436) Example config uses deprecated key jobmanager.rpc.address
[ https://issues.apache.org/jira/browse/FLINK-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649891#comment-16649891 ] TisonKun commented on FLINK-10436: -- [~till.rohrmann] Sounds reasonable. It is a bit strange that {{jobmanager}} can somehow be replaced by {{rest}}. I did a bisect and the commit is https://github.com/apache/flink/commit/efd7336fa693a9f82b9ecfb5d81c0ef747ab7801 which was authored by [~gjy] and reviewed by [~till.rohrmann]. Thus I am looping in [~gjy] here to learn the original intent. If we go in the same direction as Till's thought above, we can revert the {{.withDeprecatedKeys(JobManagerOptions.ADDRESS.key())}} line. > Example config uses deprecated key jobmanager.rpc.address > - > > Key: FLINK-10436 > URL: https://issues.apache.org/jira/browse/FLINK-10436 > Project: Flink > Issue Type: Sub-task > Components: Startup Shell Scripts >Affects Versions: 1.7.0 >Reporter: Ufuk Celebi >Assignee: TisonKun >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > The example {{flink-conf.yaml}} shipped as part of the Flink distribution > (https://github.com/apache/flink/blob/master/flink-dist/src/main/resources/flink-conf.yaml) > has the following entry: > {code} > jobmanager.rpc.address: localhost > {code} > When using this key, the following deprecation warning is logged. > {code} > 2018-09-26 12:01:46,608 WARN org.apache.flink.configuration.Configuration > - Config uses deprecated configuration key > 'jobmanager.rpc.address' instead of proper key 'rest.address' > {code} > The example config should not use deprecated config options. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-10555) Port AkkaSslITCase to new code base
TisonKun created FLINK-10555: Summary: Port AkkaSslITCase to new code base Key: FLINK-10555 URL: https://issues.apache.org/jira/browse/FLINK-10555 Project: Flink Issue Type: Sub-task Components: Tests Affects Versions: 1.7.0 Reporter: TisonKun Assignee: TisonKun Fix For: 1.7.0 Port {{AkkaSslITCase}} to new code base, as {{MiniClusterSslITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-2592) Rework of FlinkMiniCluster
[ https://issues.apache.org/jira/browse/FLINK-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun closed FLINK-2592. --- Resolution: Won't Fix In fact, we want to remove {{FlinkMiniCluster}} since it is based on deprecated legacy mode. > Rework of FlinkMiniCluster > -- > > Key: FLINK-2592 > URL: https://issues.apache.org/jira/browse/FLINK-2592 > Project: Flink > Issue Type: Improvement > Components: Distributed Coordination >Reporter: Till Rohrmann >Priority: Minor > > Over the time, the {{FlinkMiniCluster}} has become quite complex to support > all different execution modes (batch vs. streaming, standalone vs. ha with > ZooKeeper, single {{ActorSystem}} vs. multiple {{ActorSystems}}, etc.). There > is no consistent way of configuring all these options. Therefore it would be > good to rework the {{FlinkMiniCluster}} to avoid configuring it via the > {{Configuration}} object and instead to use explicit options which can be > turned on and off. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-10559) Remove LegacyLocalStreamEnvironment
TisonKun created FLINK-10559: Summary: Remove LegacyLocalStreamEnvironment Key: FLINK-10559 URL: https://issues.apache.org/jira/browse/FLINK-10559 Project: Flink Issue Type: Sub-task Components: Local Runtime Affects Versions: 1.7.0 Reporter: TisonKun Assignee: TisonKun Fix For: 1.7.0 See the corresponding GitHub pull request for the diagnosis; basically, this class is not in use any more. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10540) Remove legacy FlinkMiniCluster
[ https://issues.apache.org/jira/browse/FLINK-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651288#comment-16651288 ] TisonKun commented on FLINK-10540: -- Hi [~dangdangdang], feel free to take over it. FYI, this is somehow an umbrella issue since there are many dependencies of {{FlinkMiniCluster}} its subclasses {{LocalFlinkMiniCluster}} and {{TestingMiniCluster}}. We finally remove {{FlinkMiniCluster}} but the process might be a bit further than just removal. > Remove legacy FlinkMiniCluster > -- > > Key: FLINK-10540 > URL: https://issues.apache.org/jira/browse/FLINK-10540 > Project: Flink > Issue Type: Sub-task >Affects Versions: 1.7.0 >Reporter: TisonKun >Priority: Major > Fix For: 1.7.0 > > > {{FlinkMiniCluster}} is based on legacy cluster mode and should be no longer > used. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (FLINK-10540) Remove legacy FlinkMiniCluster
[ https://issues.apache.org/jira/browse/FLINK-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651288#comment-16651288 ] TisonKun edited comment on FLINK-10540 at 10/16/18 8:04 AM: Hi [~dangdangdang], feel free to take over it. FYI, this is somehow an umbrella issue since there are many dependencies of {{FlinkMiniCluster}} and its subclasses {{LocalFlinkMiniCluster}} and {{TestingCluster}}. We finally remove {{FlinkMiniCluster}} but the process might be a bit further than just removal. was (Author: tison): Hi [~dangdangdang], feel free to take over it. FYI, this is somehow an umbrella issue since there are many dependencies of {{FlinkMiniCluster}} and its subclasses {{LocalFlinkMiniCluster}} and {{TestingMiniCluster}}. We finally remove {{FlinkMiniCluster}} but the process might be a bit further than just removal. > Remove legacy FlinkMiniCluster > -- > > Key: FLINK-10540 > URL: https://issues.apache.org/jira/browse/FLINK-10540 > Project: Flink > Issue Type: Sub-task >Affects Versions: 1.7.0 >Reporter: TisonKun >Priority: Major > Fix For: 1.7.0 > > > {{FlinkMiniCluster}} is based on legacy cluster mode and should be no longer > used. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (FLINK-10540) Remove legacy FlinkMiniCluster
[ https://issues.apache.org/jira/browse/FLINK-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651288#comment-16651288 ] TisonKun edited comment on FLINK-10540 at 10/16/18 8:04 AM: Hi [~dangdangdang], feel free to take over it. FYI, this is somehow an umbrella issue since there are many dependencies of {{FlinkMiniCluster}} and its subclasses {{LocalFlinkMiniCluster}} and {{TestingMiniCluster}}. We finally remove {{FlinkMiniCluster}} but the process might be a bit further than just removal. was (Author: tison): Hi [~dangdangdang], feel free to take over it. FYI, this is somehow an umbrella issue since there are many dependencies of {{FlinkMiniCluster}} its subclasses {{LocalFlinkMiniCluster}} and {{TestingMiniCluster}}. We finally remove {{FlinkMiniCluster}} but the process might be a bit further than just removal. > Remove legacy FlinkMiniCluster > -- > > Key: FLINK-10540 > URL: https://issues.apache.org/jira/browse/FLINK-10540 > Project: Flink > Issue Type: Sub-task >Affects Versions: 1.7.0 >Reporter: TisonKun >Priority: Major > Fix For: 1.7.0 > > > {{FlinkMiniCluster}} is based on legacy cluster mode and should be no longer > used. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-10565) Refactor SchedulerTestBase
TisonKun created FLINK-10565: Summary: Refactor SchedulerTestBase Key: FLINK-10565 URL: https://issues.apache.org/jira/browse/FLINK-10565 Project: Flink Issue Type: Sub-task Components: Tests Affects Versions: 1.7.0 Reporter: TisonKun Assignee: TisonKun Fix For: 1.7.0 Basically, the current {{SchedulerTestBase}} covers {{Scheduler}} tests, {{Scheduler}} being a scheduling mechanism of the legacy mode. We can remove it from {{SchedulerTestBase}}. Besides, inside {{SchedulerTestBase}} there are two useful testing classes, {{TestingSlotPool}} and {{TestingSlotPoolSlotProvider}}. Extract them for further use. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10564) tm costs too much time when communicating with jm
[ https://issues.apache.org/jira/browse/FLINK-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651641#comment-16651641 ] TisonKun commented on FLINK-10564: -- Hi [~chenlf], could you tell us on which version of Flink, i.e. the {{Affects Version/s}}, you hit this issue? > tm costs too much time when communicating with jm > -- > > Key: FLINK-10564 > URL: https://issues.apache.org/jira/browse/FLINK-10564 > Project: Flink > Issue Type: Bug > Components: Core, JobManager, TaskManager > Environment: configs are as follows: > jm > high-availability zookeeper > taskmanager.heap.mb 16384 > taskmanager.memory.preallocate false > taskmanager.numberOfTaskSlots 64 > tm > slots 128 > free slots 0-128 > cpu core 40 > Physical Memory 95gb > free Memory 32gb-50gb > Flink Managed Memory 22gb-35gb >Reporter: chenlf >Priority: Major > Attachments: timeout.log > > > It works fine until the number of tasks is above about 400. > There are 600+ tasks (each task handles billions of records) running in our cluster > now, and the problem is that it costs too much time (or even times out) when > submitting/canceling/querying a task. > Resources like memory and CPU are at normal levels. > After debugging, we found this method is the culprit: > org.apache.flink.runtime.util.LeaderRetrievalUtils.LeaderGatewayListener.notifyLeaderAddress(String, > UUID) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10436) Example config uses deprecated key jobmanager.rpc.address
[ https://issues.apache.org/jira/browse/FLINK-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651705#comment-16651705 ] TisonKun commented on FLINK-10436: -- Thanks for explaining the fallback behavior. I agree. > Example config uses deprecated key jobmanager.rpc.address > - > > Key: FLINK-10436 > URL: https://issues.apache.org/jira/browse/FLINK-10436 > Project: Flink > Issue Type: Sub-task > Components: Startup Shell Scripts >Affects Versions: 1.7.0 >Reporter: Ufuk Celebi >Assignee: TisonKun >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > The example {{flink-conf.yaml}} shipped as part of the Flink distribution > (https://github.com/apache/flink/blob/master/flink-dist/src/main/resources/flink-conf.yaml) > has the following entry: > {code} > jobmanager.rpc.address: localhost > {code} > When using this key, the following deprecation warning is logged. > {code} > 2018-09-26 12:01:46,608 WARN org.apache.flink.configuration.Configuration > - Config uses deprecated configuration key > 'jobmanager.rpc.address' instead of proper key 'rest.address' > {code} > The example config should not use deprecated config options. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10436) Example config uses deprecated key jobmanager.rpc.address
[ https://issues.apache.org/jira/browse/FLINK-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651774#comment-16651774 ] TisonKun commented on FLINK-10436: -- [~till.rohrmann] can we achieve this by first introducing a class {{FallbackKey}} and changing the type of {{deprecatedKeys}} to it? {code:java} public class FallbackKey { private String key; private boolean isDeprecated; ... } {code} > Example config uses deprecated key jobmanager.rpc.address > - > > Key: FLINK-10436 > URL: https://issues.apache.org/jira/browse/FLINK-10436 > Project: Flink > Issue Type: Sub-task > Components: Startup Shell Scripts >Affects Versions: 1.7.0 >Reporter: Ufuk Celebi >Assignee: TisonKun >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > The example {{flink-conf.yaml}} shipped as part of the Flink distribution > (https://github.com/apache/flink/blob/master/flink-dist/src/main/resources/flink-conf.yaml) > has the following entry: > {code} > jobmanager.rpc.address: localhost > {code} > When using this key, the following deprecation warning is logged. > {code} > 2018-09-26 12:01:46,608 WARN org.apache.flink.configuration.Configuration > - Config uses deprecated configuration key > 'jobmanager.rpc.address' instead of proper key 'rest.address' > {code} > The example config should not use deprecated config options. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
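The {{FallbackKey}} proposal above could be fleshed out roughly as follows. This is a minimal sketch of the idea only: the factory-method names and the distinction between deprecated keys (which log a warning) and plain fallback keys (which do not) are assumptions for illustration, not a confirmed Flink API.

```java
// Sketch of the proposed FallbackKey: each alternative key for a ConfigOption
// records whether reading it should trigger a deprecation warning.
public class FallbackKey {
    private final String key;
    private final boolean isDeprecated;

    private FallbackKey(String key, boolean isDeprecated) {
        this.key = key;
        this.isDeprecated = isDeprecated;
    }

    // A key kept only for backwards compatibility; reading it would log a warning.
    public static FallbackKey createDeprecatedKey(String key) {
        return new FallbackKey(key, true);
    }

    // A legitimate alternative key; reading it would log no warning.
    public static FallbackKey createFallbackKey(String key) {
        return new FallbackKey(key, false);
    }

    public String getKey() {
        return key;
    }

    public boolean isDeprecated() {
        return isDeprecated;
    }
}
```

With this in place, `rest.address` could list `jobmanager.rpc.address` as a non-deprecated fallback key, so the example config would stop producing the warning without losing the fallback behavior.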
[jira] [Created] (FLINK-10569) Clean up uses of Scheduler and Instance in valid tests
TisonKun created FLINK-10569: Summary: Clean up uses of Scheduler and Instance in valid tests Key: FLINK-10569 URL: https://issues.apache.org/jira/browse/FLINK-10569 Project: Flink Issue Type: Sub-task Components: Tests Affects Versions: 1.7.0 Reporter: TisonKun Fix For: 1.7.0 The legacy classes {{Scheduler}} and {{Instance}} are still used in some valid tests like {{ExecutionGraphRestartTest}}. We should replace them with the FLIP-6 schedule mode. The best way I can find is to use {{SimpleSlotProvider}}. Note that we need not remove all use points across all files, since most of them stay in the legacy codebase, like {{JobManager.scala}}, and will be removed later. *Feel free to take it over.* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
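To see why a simple slot provider is enough for such tests, here is an illustrative-only sketch (deliberately not Flink's actual {{SimpleSlotProvider}} API) of the shape it takes: the test only needs a component that hands out a bounded number of slots, with no legacy {{Scheduler}}/{{Instance}} machinery behind it.

```java
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.Queue;

// Illustrative stand-in for a fixed-size test slot provider.
public class SimpleSlotProviderSketch {

    // Stand-in for a task slot; the real one would carry TaskManager location etc.
    static final class Slot {
        final int index;
        Slot(int index) { this.index = index; }
    }

    private final Queue<Slot> freeSlots = new ArrayDeque<>();

    public SimpleSlotProviderSketch(int numSlots) {
        for (int i = 0; i < numSlots; i++) {
            freeSlots.add(new Slot(i));
        }
    }

    // Allocate a slot if one is free; an empty Optional models "no resource available".
    public Optional<Slot> allocateSlot() {
        return Optional.ofNullable(freeSlots.poll());
    }

    // Returning a slot makes it available for the next request, e.g. after a restart.
    public void returnSlot(Slot slot) {
        freeSlots.add(slot);
    }
}
```

A restart test can then exhaust the slots, release them, and assert that rescheduling succeeds, without any legacy scheduling code on the path.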
[jira] [Commented] (FLINK-10575) Remove deprecated ExecutionGraphBuilder.buildGraph method
[ https://issues.apache.org/jira/browse/FLINK-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652863#comment-16652863 ] TisonKun commented on FLINK-10575: -- [~isunjin] did you analyze the relevant code path? From my perspective this deprecated method does not depend on legacy code. This method looks like it exists for historical purposes (as a fallback when the vertex parallelism is set to {{ExecutionConfig.PARALLELISM_AUTO_MAX}}), so technically I am not opposed to removing it. But I hesitate to do the removal so eagerly without necessity. > Remove deprecated ExecutionGraphBuilder.buildGraph method > - > > Key: FLINK-10575 > URL: https://issues.apache.org/jira/browse/FLINK-10575 > Project: Flink > Issue Type: Sub-task > Components: JobManager >Affects Versions: 1.7.0 >Reporter: JIN SUN >Assignee: JIN SUN >Priority: Major > > ExecutionGraphBuilder is not a public API and we should be able to remove > deprecated methods such as: > @Deprecated > public static ExecutionGraph buildGraph > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-10569) Clean up uses of Scheduler and Instance in valid tests
[ https://issues.apache.org/jira/browse/FLINK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun reassigned FLINK-10569: Assignee: TisonKun > Clean up uses of Scheduler and Instance in valid tests > -- > > Key: FLINK-10569 > URL: https://issues.apache.org/jira/browse/FLINK-10569 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.7.0 >Reporter: TisonKun >Assignee: TisonKun >Priority: Major > Fix For: 1.7.0 > > > Legacy class {{Scheduler}} and {{Instance}} are still used in some valid > tests like {{ExecutionGraphRestartTest}}. We should replace them with FLIP-6 > schedule mode. The best way I can find is use {{SimpleSlotProvider}}. > Note that we need not to remove all use points among all files since most of > them stay in legacy codebase like {{JobManager.scala}} and would be removed > later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-10569) Clean up uses of Scheduler and Instance in valid tests
[ https://issues.apache.org/jira/browse/FLINK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10569: - Description: Legacy class {{Scheduler}} and {{Instance}} are still used in some valid tests like {{ExecutionGraphRestartTest}}. We should replace them with FLIP-6 schedule mode. The best way I can find is use {{SimpleSlotProvider}}. Note that we need not to remove all use points among all files since most of them stay in legacy codebase like {{JobManager.scala}} and would be removed later. was: Legacy class {{Scheduler}} and {{Instance}} are still used in some valid tests like {{ExecutionGraphRestartTest}}. We should replace them with FLIP-6 schedule mode. The best way I can find is use {{SimpleSlotProvider}}. Note that we need not to remove all use points among all files since most of them stay in legacy codebase like {{JobManager.scala}} and would be removed later. *Feel free to take over it.* > Clean up uses of Scheduler and Instance in valid tests > -- > > Key: FLINK-10569 > URL: https://issues.apache.org/jira/browse/FLINK-10569 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.7.0 >Reporter: TisonKun >Priority: Major > Fix For: 1.7.0 > > > Legacy class {{Scheduler}} and {{Instance}} are still used in some valid > tests like {{ExecutionGraphRestartTest}}. We should replace them with FLIP-6 > schedule mode. The best way I can find is use {{SimpleSlotProvider}}. > Note that we need not to remove all use points among all files since most of > them stay in legacy codebase like {{JobManager.scala}} and would be removed > later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (FLINK-10038) Parallel the creation of InputSplit if necessary
[ https://issues.apache.org/jira/browse/FLINK-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10038: - Comment: was deleted (was: My original point in mentioning "parallelize the creation of InputSplit" might have been to parallelize the creation of ONE InputSplit. Take a look at {{FileInputFormat#createInputSplits}}: it creates InputSplits file by file. That is where I aim to parallelize. Thus it said "the interface for the creation of input splits is definitely InputSplitSource#createInputSplits". And this could be done without modifying the interface, by changing the implementation of {{createInputSplits}}. However, your ideas here are also bright. A typical case that benefits from these ideas is a batch job with many files, which would prefer to use the RegionFailover strategy if possible. Here I see 3 options. 1. Create InputSplits before the job runs. 2. Create InputSplits concurrently with scheduling the job. 3. Use a dedicated single task to generate the work. Option 1 is easier to implement, as [~StephanEwen] said. Below are concrete challenges for the remaining options. The main issue I am concerned about is that in a batch job we prefer not to cancel all vertices and restart. What's worse, since we don't have batch checkpoints, the batch job has to restart completely. This is unacceptable for a large-scale batch job. For option 2, what if the JM fails over after some input splits have been computed and sent off? We don't have a specific JM failover strategy now, thus it causes the job to restart completely. Continuing with this option leads to discussing a JM failover strategy, that is, when the JM fails over and restarts, it can recover (reconcile) state from the previous one. For option 3, there would be a wider consideration about the Source. Take the two-input case into consideration (below).
Currently we read from the source in a blocking way. If we compute the input splits as a single task and still use the blocking approach, the downstream may get stuck waiting for one input while the other input is ready to be read. Src1 \ Src2 > Join One way to solve this issue is to read from the source in a non-blocking way. Assume we introduce a method {{boolean SourceFunction#next(Collector)}}: when the downstream calls it, the source sends its data to the collector and returns true. If no more data remains, it returns false. This also lets the source asynchronously read from the file and produce data. To sum up, focusing on batch jobs, the main issues of concern would be JM failover for options 1 and 2 (also external but significant: batch checkpoints), and a more flexible source for option 3.) > Parallel the creation of InputSplit if necessary > > > Key: FLINK-10038 > URL: https://issues.apache.org/jira/browse/FLINK-10038 > Project: Flink > Issue Type: Improvement > Components: Distributed Coordination >Affects Versions: 1.5.0 >Reporter: TisonKun >Priority: Major > Labels: improvement, inputformat, parallel, perfomance > > As a continuation of the discussion in the PR about parallelizing the creation of > ExecutionJobVertex [here|https://github.com/apache/flink/pull/6353]. > [~StephanEwen] suggested that we could parallelize the creation of > InputSplit, from which we gain performance improvements. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
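The deleted comment above proposes a pull-based source contract, {{boolean SourceFunction#next(Collector)}}, so that a two-input consumer is never stuck blocking on one input. A minimal self-contained sketch of that contract follows; the interface and method names are assumptions for illustration, not Flink's actual {{SourceFunction}} API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of the proposed pull-based source: the downstream calls next(out);
// the source emits at most one record and returns false once it is exhausted.
public class PullSourceSketch {

    interface Collector<T> {
        void collect(T record);
    }

    interface PullSource<T> {
        // Emit the next record into the collector; return false when drained.
        boolean next(Collector<T> out);
    }

    static <T> PullSource<T> fromIterable(Iterable<T> data) {
        Iterator<T> it = data.iterator();
        return out -> {
            if (!it.hasNext()) {
                return false;
            }
            out.collect(it.next());
            return true;
        };
    }

    // A join-like consumer can alternate between two sources without blocking
    // on a stalled input: if one source is drained, it keeps pulling the other.
    static <T> List<T> drain(PullSource<T> src1, PullSource<T> src2) {
        List<T> result = new ArrayList<>();
        boolean live1 = true;
        boolean live2 = true;
        while (live1 || live2) {
            if (live1) {
                live1 = src1.next(result::add);
            }
            if (live2) {
                live2 = src2.next(result::add);
            }
        }
        return result;
    }
}
```

The point of the sketch is the Src1/Src2-into-Join case from the comment: because each call emits at most one record and reports exhaustion via the return value, neither input can stall the other.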
[jira] [Commented] (FLINK-10569) Clean up uses of Scheduler and Instance in valid tests
[ https://issues.apache.org/jira/browse/FLINK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655620#comment-16655620 ] TisonKun commented on FLINK-10569: -- I find this topic is more complex than I ever thought. We actually have two version of slot implementation and current {{ExecutionGraph}} related tests heavily depend on legacy slot implementation. Maybe we should separated this thread into stepwise issues decouples existing valid tests with legacy slot implementation(as well as Scheduler and Instance). cc [~till.rohrmann]. > Clean up uses of Scheduler and Instance in valid tests > -- > > Key: FLINK-10569 > URL: https://issues.apache.org/jira/browse/FLINK-10569 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.7.0 >Reporter: TisonKun >Assignee: TisonKun >Priority: Major > Fix For: 1.7.0 > > > Legacy class {{Scheduler}} and {{Instance}} are still used in some valid > tests like {{ExecutionGraphRestartTest}}. We should replace them with FLIP-6 > schedule mode. The best way I can find is use {{SimpleSlotProvider}}. > Note that we need not to remove all use points among all files since most of > them stay in legacy codebase like {{JobManager.scala}} and would be removed > later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10540) Remove legacy FlinkMiniCluster
[ https://issues.apache.org/jira/browse/FLINK-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656218#comment-16656218 ] TisonKun commented on FLINK-10540: -- Great take [~dangdangdang]! > Remove legacy FlinkMiniCluster > -- > > Key: FLINK-10540 > URL: https://issues.apache.org/jira/browse/FLINK-10540 > Project: Flink > Issue Type: Sub-task >Affects Versions: 1.7.0 >Reporter: TisonKun >Assignee: Shimin Yang >Priority: Major > Fix For: 1.7.0 > > > {{FlinkMiniCluster}} is based on legacy cluster mode and should be no longer > used. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10540) Remove legacy FlinkMiniCluster
[ https://issues.apache.org/jira/browse/FLINK-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656239#comment-16656239 ] TisonKun commented on FLINK-10540: -- Yes, I see. And in the testing code its subclasses ({{LocalFlinkMiniCluster}} and {{TestingCluster}}) are still in use; we might want to get rid of those uses, too. Otherwise they become blockers for this issue. > Remove legacy FlinkMiniCluster > -- > > Key: FLINK-10540 > URL: https://issues.apache.org/jira/browse/FLINK-10540 > Project: Flink > Issue Type: Sub-task >Affects Versions: 1.7.0 >Reporter: TisonKun >Assignee: Shimin Yang >Priority: Major > Fix For: 1.7.0 > > > {{FlinkMiniCluster}} is based on the legacy cluster mode and should no longer > be used. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-10610) Port CoLocationConstraintITCase to new codebase
TisonKun created FLINK-10610: Summary: Port CoLocationConstraintITCase to new codebase Key: FLINK-10610 URL: https://issues.apache.org/jira/browse/FLINK-10610 Project: Flink Issue Type: Sub-task Components: Tests Affects Versions: 1.7.0 Reporter: TisonKun Assignee: TisonKun Fix For: 1.7.0 Port {{CoLocationConstraintITCase}} to new codebase -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-10610) Port slot sharing cases to new codebase
[ https://issues.apache.org/jira/browse/FLINK-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10610: - Summary: Port slot sharing cases to new codebase (was: Port CoLocationConstraintITCase to new codebase) > Port slot sharing cases to new codebase > --- > > Key: FLINK-10610 > URL: https://issues.apache.org/jira/browse/FLINK-10610 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.7.0 >Reporter: TisonKun >Assignee: TisonKun >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > Port {{CoLocationConstraintITCase}} to new codebase -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-10610) Port slot sharing cases to new codebase
[ https://issues.apache.org/jira/browse/FLINK-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10610: - Description: Port {{CoLocationConstraintITCase}} and {{SlotSharingITCase}} to new codebase (was: Port {{CoLocationConstraintITCase}} to new codebase) > Port slot sharing cases to new codebase > --- > > Key: FLINK-10610 > URL: https://issues.apache.org/jira/browse/FLINK-10610 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.7.0 >Reporter: TisonKun >Assignee: TisonKun >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > Port {{CoLocationConstraintITCase}} and {{SlotSharingITCase}} to new codebase -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-10558) Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new code base
[ https://issues.apache.org/jira/browse/FLINK-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun reassigned FLINK-10558: Assignee: TisonKun > Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new > code base > --- > > Key: FLINK-10558 > URL: https://issues.apache.org/jira/browse/FLINK-10558 > Project: Flink > Issue Type: Sub-task >Reporter: vinoyang >Assignee: TisonKun >Priority: Minor > > {{YARNHighAvailabilityITCase}}, > {{YARNSessionCapacitySchedulerITCase#testClientStartup,}} > {{YARNSessionCapacitySchedulerITCase#testTaskManagerFailure}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10558) Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new code base
[ https://issues.apache.org/jira/browse/FLINK-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657694#comment-16657694 ] TisonKun commented on FLINK-10558: -- [~till.rohrmann] I'd like to know whether the present status is expected, as [~yanghua] commented on GitHub: bq. @tillrohrmann For test case testTaskManagerFailure, the current test mode (based on stdout and stderr's special print message) cannot verify the Flip-6 scene. Because it doesn't pre-start a TM, so I can only commit a job to drive the TM registration, but this will cause the Job's output to flush out the flink's output, which shares stdout and stderr. Therefore, many judgments will not take effect. I wonder whether, in the FLIP-6 YARN session mode, we want to (really) pre-start TMs; if so, we have to mark the workaround (submitting a job to drive the TM registration) as temporary. > Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new > code base > --- > > Key: FLINK-10558 > URL: https://issues.apache.org/jira/browse/FLINK-10558 > Project: Flink > Issue Type: Sub-task >Reporter: vinoyang >Assignee: TisonKun >Priority: Minor > > {{YARNHighAvailabilityITCase}}, > {{YARNSessionCapacitySchedulerITCase#testClientStartup,}} > {{YARNSessionCapacitySchedulerITCase#testTaskManagerFailure}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10640) Enable Slot Resource Profile for Resource Management
[ https://issues.apache.org/jira/browse/FLINK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658869#comment-16658869 ] TisonKun commented on FLINK-10640: -- Hi [~wuzang], I think this is a big proposal on resource management and requires a discussion of the full picture. I'd like to know your thoughts on "roughly" slot-based management versus truly resource-based management. It seems you are not going to change the slot concept, but only change how slots are generated: from parallelism-based to resource-managed. And on this topic, how do you plan to make it work with FLINK-10407? Also, you introduce the concept of an "elastic session" and say that it can somehow act as the session mode. What is the relation between the "elastic session" and the current session/per-job modes, and do you want to replace them with such an "elastic session"? By the way, you can typically assign this JIRA to yourself for further contribution. I will ping [~till.rohrmann] here since he is the committer currently working on slot/resource management and can grant the contributor bit to you if needed. > Enable Slot Resource Profile for Resource Management > > > Key: FLINK-10640 > URL: https://issues.apache.org/jira/browse/FLINK-10640 > Project: Flink > Issue Type: New Feature > Components: ResourceManager >Reporter: Tony Xintong Song >Priority: Major > > Motivation & Backgrounds > * The existing concept of task slots roughly represents how many pipelines of > tasks a TaskManager can hold. However, it does not consider the differences > in resource needs and usage of individual tasks. Enabling resource profiles > of slots may allow Flink to better allocate execution resources according to > tasks' fine-grained resource needs. > * The community version of Flink already contains APIs and some implementation > for slot resource profiles. However, such logic is not truly used. > (The ResourceProfile of slot requests is by default set to UNKNOWN with negative > values, thus matching any given slot.) 
> Preliminary Design > * Slot Management > A slot represents a certain amount of resources for a single pipeline of > tasks to run in on a TaskManager. Initially, a TaskManager does not have any > slots but a total amount of resources. When allocating, the ResourceManager > finds proper TMs to generate new slots for the tasks to run according to the > slot requests. Once generated, the slot's size (resource profile) does not > change until it's freed. ResourceManager can apply different, portable > strategies to allocate slots from TaskManagers. > * TM Management > The size and number of TaskManagers and when to start them can also be > flexible. TMs can be started and released dynamically, and may have different > sizes. We may have many different, portable strategies. E.g., an elastic > session that can run multiple jobs like the session mode while dynamically > adjusting the size of session (number of TMs) according to the realtime > working load. > * About Slot Sharing > Slot sharing is a good heuristic to easily calculate how many slots needed > to get the job running and get better utilization when there is no resource > profile in slots. However, with resource profiles enabling finer-grained > resource management, each individual task has its specific resource need and > it does not make much sense to have multiple tasks sharing the resource of > the same slot. Instead, we may introduce locality preferences/constraints to > support the semantics of putting tasks in same/different TMs in a more > general way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
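The "UNKNOWN matches any slot" behavior described in the proposal can be illustrated with a small sketch. The types below are hypothetical simplifications — Flink's real {{ResourceProfile}} carries more dimensions — but they show the matching rule: a request with negative values matches every offered profile, while a concrete request is compared field by field.

```java
public class ResourceProfileSketch {
    static final class ResourceProfile {
        final double cpuCores;
        final int memoryMB;
        ResourceProfile(double cpuCores, int memoryMB) {
            this.cpuCores = cpuCores;
            this.memoryMB = memoryMB;
        }
        /** An UNKNOWN request (negative values) matches any offered slot;
         *  otherwise each resource dimension must be satisfied. */
        boolean isMatchedBy(ResourceProfile offered) {
            if (cpuCores < 0 || memoryMB < 0) {
                return true;
            }
            return offered.cpuCores >= cpuCores && offered.memoryMB >= memoryMB;
        }
    }

    public static void main(String[] args) {
        ResourceProfile unknown = new ResourceProfile(-1.0, -1);
        ResourceProfile smallSlot = new ResourceProfile(1.0, 1024);
        ResourceProfile bigRequest = new ResourceProfile(4.0, 8192);
        System.out.println(unknown.isMatchedBy(smallSlot));    // UNKNOWN matches any
        System.out.println(bigRequest.isMatchedBy(smallSlot)); // not enough resources
    }
}
```

This is exactly why the existing profile machinery is "not truly used": with every request defaulting to UNKNOWN, the field-by-field comparison is never exercised.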
[jira] [Created] (FLINK-10665) Port YARNSessionFIFOITCase#testJavaAPI to new codebase
TisonKun created FLINK-10665: Summary: Port YARNSessionFIFOITCase#testJavaAPI to new codebase Key: FLINK-10665 URL: https://issues.apache.org/jira/browse/FLINK-10665 Project: Flink Issue Type: Sub-task Components: Tests Affects Versions: 1.7.0 Reporter: TisonKun Assignee: TisonKun Fix For: 1.7.0 Port {{YARNSessionFIFOITCase#testJavaAPI}} to new codebase -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-10666) Port YarnClusterDescriptorTest to new codebase
TisonKun created FLINK-10666: Summary: Port YarnClusterDescriptorTest to new codebase Key: FLINK-10666 URL: https://issues.apache.org/jira/browse/FLINK-10666 Project: Flink Issue Type: Sub-task Components: Tests Affects Versions: 1.7.0 Reporter: TisonKun Assignee: TisonKun Fix For: 1.7.0 Port {{YarnClusterDescriptorTest}} to new codebase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10558) Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new code base
[ https://issues.apache.org/jira/browse/FLINK-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662197#comment-16662197 ] TisonKun commented on FLINK-10558: -- For {{YARNHighAvailabilityITCase}}, I wonder how we can kill an AM. Previously we got the {{JobManagerGateway}} via {{LeaderRetrievalUtils#retrieveLeaderGateway}}, which I believe is no longer available in the FLIP-6 codebase. Besides, if we kill the JM via {{postStop}}, the job would be suspended. I think what we need is to kill AMs externally, with Yarn restart attempts configured, to test whether the blocked job can be taken over. > Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new > code base > --- > > Key: FLINK-10558 > URL: https://issues.apache.org/jira/browse/FLINK-10558 > Project: Flink > Issue Type: Sub-task >Reporter: vinoyang >Assignee: TisonKun >Priority: Minor > > {{YARNHighAvailabilityITCase}}, > {{YARNSessionCapacitySchedulerITCase#testClientStartup,}} > {{YARNSessionCapacitySchedulerITCase#testTaskManagerFailure}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-10924) StreamExecutionEnvironment method javadocs incorrect in regards to used charset
[ https://issues.apache.org/jira/browse/FLINK-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun reassigned FLINK-10924: Assignee: TisonKun > StreamExecutionEnvironment method javadocs incorrect in regards to used > charset > --- > > Key: FLINK-10924 > URL: https://issues.apache.org/jira/browse/FLINK-10924 > Project: Flink > Issue Type: Bug > Components: Documentation, Streaming >Affects Versions: 1.3.3, 1.4.2, 1.5.5, 1.6.2, 1.7.0 >Reporter: Chesnay Schepler >Assignee: TisonKun >Priority: Major > > Various methods of the {{StreamExecutionEnvironment}} (like > [readTextFile|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L939]) > are documented to be using the system's default charset if none is specified. > This is incorrect as they default to UTF-8 instead. The javadocs should be > updated to reflect this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10924) StreamExecutionEnvironment method javadocs incorrect in regards to used charset
[ https://issues.apache.org/jira/browse/FLINK-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16691414#comment-16691414 ] TisonKun commented on FLINK-10924: -- Starting to dig in; will correct it within a few hours. > StreamExecutionEnvironment method javadocs incorrect in regards to used > charset > --- > > Key: FLINK-10924 > URL: https://issues.apache.org/jira/browse/FLINK-10924 > Project: Flink > Issue Type: Bug > Components: Documentation, Streaming >Affects Versions: 1.3.3, 1.4.2, 1.5.5, 1.6.2, 1.7.0 >Reporter: Chesnay Schepler >Assignee: TisonKun >Priority: Major > > Various methods of the {{StreamExecutionEnvironment}} (like > [readTextFile|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L939]) > are documented to be using the system's default charset if none is specified. > This is incorrect as they default to UTF-8 instead. The javadocs should be > updated to reflect this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
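Why the javadoc wording matters can be shown with a dependency-free snippet (this is an illustration of the charset pitfall in general, not Flink code): bytes written as UTF-8 only round-trip when decoded as UTF-8, so documenting "the system's default charset" when the code actually uses UTF-8 misleads users on platforms with a different default.

```java
import java.nio.charset.StandardCharsets;

public class CharsetPitfall {
    public static void main(String[] args) {
        // "héllo" written with an escape so the source file encoding cannot interfere.
        String text = "h\u00e9llo";
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        // Decoding with the charset that was actually used always round-trips.
        System.out.println(new String(utf8Bytes, StandardCharsets.UTF_8).equals(text));
        // Decoding the same bytes with a different charset mangles the text --
        // which is what would happen if a reader trusted the "system default
        // charset" claim on a non-UTF-8 platform.
        System.out.println(new String(utf8Bytes, StandardCharsets.ISO_8859_1).equals(text));
    }
}
```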
[jira] [Commented] (FLINK-10914) Prefer notifying status changed to waiting and checking in ExecutionGraph tests
[ https://issues.apache.org/jira/browse/FLINK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694152#comment-16694152 ] TisonKun commented on FLINK-10914: -- [~till.rohrmann] [~StephanEwen] do you mean, per your conversation above, that there is an ongoing effort to change the concurrency model of the execution graph, and that we should therefore delay this issue until that work is done and then see how to improve test stability? If so, I'd like to see whether I can help with the work to discuss/implement the new concurrency model (single-threaded dispatcher). > Prefer notifying status changed to waiting and checking in ExecutionGraph > tests > --- > > Key: FLINK-10914 > URL: https://issues.apache.org/jira/browse/FLINK-10914 > Project: Flink > Issue Type: Improvement > Components: Tests >Affects Versions: 1.8.0 >Reporter: TisonKun >Assignee: TisonKun >Priority: Major > Fix For: 1.8.0 > > > Currently there are several tests of {{ExecutionGraph}} based on > {{ExecutionGraphTestUtils#waitUntilJobStatus}} and > {{ExecutionGraphTestUtils#waitUntilExecution(Vertex)State}}. I notice that we > have {{JobStatusListener}} s and {{ExecutionStatusListener}} s registered on > an {{ExecutionGraph}}. It is possible to replace the waiting version with a > notifying version, which incurs less CPU load and is more reliable (the > waiting version might miss some very quick status/state changes). > What do you think, [~till.rohrmann] [~Zentol]? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
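The notify-instead-of-poll idea proposed above can be sketched in plain Java (a conceptual illustration with hypothetical names, not Flink's actual listener API): a latch-backed listener observes a state transition synchronously, so even a transition that lasts "zero" time is seen — something a poll-and-sleep loop can easily miss.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class NotifyVsPoll {
    enum State { CREATED, RUNNING, FINISHED }

    /** Minimal stand-in for an execution that reports its state changes. */
    static class Execution {
        private Consumer<State> listener = s -> { };
        void register(Consumer<State> l) { listener = l; }
        void transition(State s) { listener.accept(s); }
    }

    public static void main(String[] args) throws InterruptedException {
        Execution exec = new Execution();
        CountDownLatch sawRunning = new CountDownLatch(1);
        // The listener fires on every transition, so even a very short-lived
        // RUNNING state is observed; a waitUntil-style polling loop sampling
        // the state periodically could miss it entirely.
        exec.register(s -> { if (s == State.RUNNING) sawRunning.countDown(); });
        exec.transition(State.RUNNING);
        exec.transition(State.FINISHED); // RUNNING lasted effectively zero time
        System.out.println(sawRunning.await(1, TimeUnit.SECONDS));
    }
}
```

Besides reliability, the latch also blocks without burning CPU, which matches the "less CPU workload" motivation in the issue.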
[jira] [Updated] (FLINK-10436) Example config uses deprecated key jobmanager.rpc.address
[ https://issues.apache.org/jira/browse/FLINK-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10436: - Issue Type: Bug (was: Sub-task) Parent: (was: FLINK-10392) > Example config uses deprecated key jobmanager.rpc.address > - > > Key: FLINK-10436 > URL: https://issues.apache.org/jira/browse/FLINK-10436 > Project: Flink > Issue Type: Bug > Components: Startup Shell Scripts >Affects Versions: 1.7.0 >Reporter: Ufuk Celebi >Assignee: TisonKun >Priority: Major > Labels: pull-request-available > Fix For: 1.8.0 > > > The example {{flink-conf.yaml}} shipped as part of the Flink distribution > (https://github.com/apache/flink/blob/master/flink-dist/src/main/resources/flink-conf.yaml) > has the following entry: > {code} > jobmanager.rpc.address: localhost > {code} > When using this key, the following deprecation warning is logged. > {code} > 2018-09-26 12:01:46,608 WARN org.apache.flink.configuration.Configuration > - Config uses deprecated configuration key > 'jobmanager.rpc.address' instead of proper key 'rest.address' > {code} > The example config should not use deprecated config options. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10436) Example config uses deprecated key jobmanager.rpc.address
[ https://issues.apache.org/jira/browse/FLINK-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695943#comment-16695943 ] TisonKun commented on FLINK-10436: -- Changed this to an individual issue since it is unrelated to the legacy mode. > Example config uses deprecated key jobmanager.rpc.address > - > > Key: FLINK-10436 > URL: https://issues.apache.org/jira/browse/FLINK-10436 > Project: Flink > Issue Type: Bug > Components: Startup Shell Scripts >Affects Versions: 1.7.0 >Reporter: Ufuk Celebi >Assignee: TisonKun >Priority: Major > Labels: pull-request-available > Fix For: 1.8.0 > > > The example {{flink-conf.yaml}} shipped as part of the Flink distribution > (https://github.com/apache/flink/blob/master/flink-dist/src/main/resources/flink-conf.yaml) > has the following entry: > {code} > jobmanager.rpc.address: localhost > {code} > When using this key, the following deprecation warning is logged. > {code} > 2018-09-26 12:01:46,608 WARN org.apache.flink.configuration.Configuration > - Config uses deprecated configuration key > 'jobmanager.rpc.address' instead of proper key 'rest.address' > {code} > The example config should not use deprecated config options. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
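The warning quoted in the issue comes from a deprecated-key fallback of the kind sketched below. The method names are hypothetical; Flink's real {{Configuration}} class implements this differently, but the lookup order — proper key first, deprecated key as a warned fallback — is the general pattern:

```java
import java.util.HashMap;
import java.util.Map;

public class DeprecatedKeyLookup {
    /** Returns the value for properKey, falling back to deprecatedKey with a warning. */
    static String get(Map<String, String> conf, String properKey, String deprecatedKey) {
        if (conf.containsKey(properKey)) {
            return conf.get(properKey);
        }
        if (conf.containsKey(deprecatedKey)) {
            System.out.println("WARN Config uses deprecated configuration key '"
                    + deprecatedKey + "' instead of proper key '" + properKey + "'");
            return conf.get(deprecatedKey);
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Only the deprecated key is set, as in the shipped flink-conf.yaml.
        conf.put("jobmanager.rpc.address", "localhost");
        System.out.println(get(conf, "rest.address", "jobmanager.rpc.address"));
    }
}
```

This also shows why the example config triggers the warning on every startup: the value still resolves, but only through the deprecated branch.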
[jira] [Commented] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)
[ https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701842#comment-16701842 ] TisonKun commented on FLINK-10333: -- I agree with [~xiaogang.shi] that our leader election is not perfect. The main issue here is that we check the leadership and perform the real action non-atomically. Theoretically, both the implementation [~xiaogang.shi] suggested and the one [~StephanEwen] suggested would work. [~xiaogang.shi]'s suggestion looks good to me, but it requires quite a bit of work. [~StephanEwen]'s suggestion is simpler, but I cannot find a settable {{id}} (i.e., the payload) in Curator 2.12.0, and we would need to add the leader ID to the payload after leadership is granted. > Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, > CompletedCheckpoints) > - > > Key: FLINK-10333 > URL: https://issues.apache.org/jira/browse/FLINK-10333 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.5.3, 1.6.0, 1.7.0 >Reporter: Till Rohrmann >Priority: Major > Fix For: 1.8.0 > > > While going over the ZooKeeper based stores > ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, > {{ZooKeeperCompletedCheckpointStore}}) and the underlying > {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were > introduced with past incremental changes. > * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} > or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization > problems will either lead to removing the Znode or not > * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of > exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case > of a failure) > * {{ZooKeeperStateHandleStore}} has too many responsibilities. 
It would be > better to move {{RetrievableStateStorageHelper}} out of it for a better > separation of concerns > * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even > if it is locked. This should not happen since it could leave another system > in an inconsistent state (imagine a changed {{JobGraph}} which restores from > an old checkpoint) > * Redundant but also somewhat inconsistent put logic in the different stores > * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} > which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}} > * Getting rid of the {{SubmittedJobGraphListener}} would be helpful > These problems made me think how reliable these components actually work. > Since these components are very important, I propose to refactor them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)
[ https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701846#comment-16701846 ] TisonKun commented on FLINK-10333: -- [~xiaogang.shi] if we add the leader ID as the payload and perform the real action with that ID, the action can be guarded by the fencing token mechanism. That is, instead of relying on ZooKeeper transactions, with the lock and the fencing token made atomic, a node/component that has lost its leadership cannot actually perform the real action. > Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, > CompletedCheckpoints) > - > > Key: FLINK-10333 > URL: https://issues.apache.org/jira/browse/FLINK-10333 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.5.3, 1.6.0, 1.7.0 >Reporter: Till Rohrmann >Priority: Major > Fix For: 1.8.0 > > > While going over the ZooKeeper based stores > ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, > {{ZooKeeperCompletedCheckpointStore}}) and the underlying > {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were > introduced with past incremental changes. > * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} > or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization > problems will either lead to removing the Znode or not > * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of > exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case > of a failure) > * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be > better to move {{RetrievableStateStorageHelper}} out of it for a better > separation of concerns > * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even > if it is locked. 
This should not happen since it could leave another system > in an inconsistent state (imagine a changed {{JobGraph}} which restores from > an old checkpoint) > * Redundant but also somewhat inconsistent put logic in the different stores > * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} > which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}} > * Getting rid of the {{SubmittedJobGraphListener}} would be helpful > These problems made me think how reliable these components actually work. > Since these components are very important, I propose to refactor them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
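The fencing-token mechanism discussed in this thread can be sketched in plain Java. This is a conceptual illustration only — Flink's actual {{FencedRpcEndpoint}} and the ZooKeeper-based variants differ in detail — but it shows the core idea: every action carries the token it was issued under, and is rejected once leadership has moved on.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicReference;

public class FencingTokenGuard {
    /** Current leader session id; replaced atomically on every leader election. */
    private final AtomicReference<UUID> currentToken = new AtomicReference<>(UUID.randomUUID());

    /** Elects a new leader and returns its fencing token. */
    UUID grantLeadership() {
        UUID token = UUID.randomUUID();
        currentToken.set(token);
        return token;
    }

    /** Performs the action only if the caller still holds the current token. */
    boolean runGuarded(UUID callerToken, Runnable action) {
        if (!currentToken.get().equals(callerToken)) {
            return false; // caller lost leadership; the action is fenced off
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        FencingTokenGuard guard = new FencingTokenGuard();
        UUID oldLeader = guard.grantLeadership();
        System.out.println(guard.runGuarded(oldLeader, () -> { })); // still leader
        guard.grantLeadership(); // a new leader is elected
        System.out.println(guard.runGuarded(oldLeader, () -> { })); // fenced off
    }
}
```

Note that this sketch still checks the token and runs the action as two separate steps — exactly the non-atomic check-then-act gap the comments above debate; a real fix must bind the check and the action together, e.g. by validating the lock node inside a ZooKeeper transaction.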
[jira] [Issue Comment Deleted] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)
[ https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-10333: - Comment: was deleted (was: [~xiaogang.shi] adding the leader ID as payload and do the real action with that ID, then the action could be guarded by the fencing token mechanism. That is, not rely on zookeeper's transaction but with the atomic between lock and fencing token, a node/component lost his leadership could not actually do the real action.) > Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, > CompletedCheckpoints) > - > > Key: FLINK-10333 > URL: https://issues.apache.org/jira/browse/FLINK-10333 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.5.3, 1.6.0, 1.7.0 >Reporter: Till Rohrmann >Priority: Major > Fix For: 1.8.0 > > > While going over the ZooKeeper based stores > ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, > {{ZooKeeperCompletedCheckpointStore}}) and the underlying > {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were > introduced with past incremental changes. > * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} > or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization > problems will either lead to removing the Znode or not > * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of > exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case > of a failure) > * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be > better to move {{RetrievableStateStorageHelper}} out of it for a better > separation of concerns > * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even > if it is locked. 
This should not happen since it could leave another system > in an inconsistent state (imagine a changed {{JobGraph}} which restores from > an old checkpoint) > * Redundant but also somewhat inconsistent put logic in the different stores > * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} > which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}} > * Getting rid of the {{SubmittedJobGraphListener}} would be helpful > These problems made me think how reliable these components actually work. > Since these components are very important, I propose to refactor them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)
[ https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702631#comment-16702631 ] TisonKun commented on FLINK-10333: -- [~StephanEwen] I infer your original proposal is to make the lock and the fencing token atomic, so that the JobManager performs actions with its lock and thus atomically with its fencing token, and the actions would then be filtered by the fencing token. Have I misunderstood? After a look into ZooKeeper and Curator, I prefer [~xiaogang.shi]'s suggestion. That approach is a bit costly to implement but quite natural. We might do it all with Curator, but Curator does not expose the {{election-node-path}} (an EPHEMERAL_SEQUENTIAL node), which forces a fallback to ZooKeeper. > Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, > CompletedCheckpoints) > - > > Key: FLINK-10333 > URL: https://issues.apache.org/jira/browse/FLINK-10333 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.5.3, 1.6.0, 1.7.0 >Reporter: Till Rohrmann >Priority: Major > Fix For: 1.8.0 > > > While going over the ZooKeeper based stores > ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, > {{ZooKeeperCompletedCheckpointStore}}) and the underlying > {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were > introduced with past incremental changes. > * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} > or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization > problems will either lead to removing the Znode or not > * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of > exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case > of a failure) > * {{ZooKeeperStateHandleStore}} has too many responsibilities. 
It would be > better to move {{RetrievableStateStorageHelper}} out of it for a better > separation of concerns > * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even > if it is locked. This should not happen since it could leave another system > in an inconsistent state (imagine a changed {{JobGraph}} which restores from > an old checkpoint) > * Redundant but also somewhat inconsistent put logic in the different stores > * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} > which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}} > * Getting rid of the {{SubmittedJobGraphListener}} would be helpful > These problems made me think how reliable these components actually work. > Since these components are very important, I propose to refactor them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10558) Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new code base
[ https://issues.apache.org/jira/browse/FLINK-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705067#comment-16705067 ] TisonKun commented on FLINK-10558: -- Hi [~till.rohrmann], I think this jira and FLINK-10665 are the final blockers for getting rid of the directly akka-based yarn cluster. If we actually implement yarn session mode as proposed by FLIP-6, we should pass {{YARNSessionCapacitySchedulerITCase#testClientStartup}} and {{YARNSessionCapacitySchedulerITCase#testTaskManagerFailure}} without hacking them to start a real job; that is, we can retain the structure of these two tests. Thus I suggest regarding these two as not-yet-implemented and hence ignored tests. If so, with FLINK-10558 (this jira) and FLINK-10665 resolved, we can start to remove legacy flink-yarn production code. > Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new > code base > --- > > Key: FLINK-10558 > URL: https://issues.apache.org/jira/browse/FLINK-10558 > Project: Flink > Issue Type: Sub-task >Reporter: vinoyang >Assignee: TisonKun >Priority: Minor > Labels: pull-request-available > > {{YARNHighAvailabilityITCase}}, > {{YARNSessionCapacitySchedulerITCase#testClientStartup,}} > {{YARNSessionCapacitySchedulerITCase#testTaskManagerFailure}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10640) Enable Slot Resource Profile for Resource Management
[ https://issues.apache.org/jira/browse/FLINK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705598#comment-16705598 ] TisonKun commented on FLINK-10640: -- About the "TM Management" section: currently yarn session mode launches without starting any TM. Does this JIRA also cover starting arbitrary TMs once the yarn session is launched? [~till.rohrmann] [~wuzang] what do you think about the gap between the current TM management and the proposed one, especially for flink on yarn? > Enable Slot Resource Profile for Resource Management > > > Key: FLINK-10640 > URL: https://issues.apache.org/jira/browse/FLINK-10640 > Project: Flink > Issue Type: New Feature > Components: ResourceManager >Reporter: Tony Xintong Song >Priority: Major > > Motivation & Backgrounds > * The existing concept of task slots roughly represents how many pipeline of > tasks a TaskManager can hold. However, it does not consider the differences > in resource needs and usage of individual tasks. Enabling resource profiles > of slots may allow Flink to better allocate execution resources according to > tasks fine-grained resource needs. > * The community version Flink already contains APIs and some implementation > for slot resource profile. However, such logic is not truly used. > (ResourceProfile of slot requests is by default set to UNKNOWN with negative > values, thus matches any given slot.) > Preliminary Design > * Slot Management > A slot represents a certain amount of resources for a single pipeline of > tasks to run in on a TaskManager. Initially, a TaskManager does not have any > slots but a total amount of resources. When allocating, the ResourceManager > finds proper TMs to generate new slots for the tasks to run according to the > slot requests. Once generated, the slot's size (resource profile) does not > change until it's freed. ResourceManager can apply different, portable > strategies to allocate slots from TaskManagers. 
> * TM Management > The size and number of TaskManagers and when to start them can also be > flexible. TMs can be started and released dynamically, and may have different > sizes. We may have many different, portable strategies. E.g., an elastic > session that can run multiple jobs like the session mode while dynamically > adjusting the size of session (number of TMs) according to the realtime > working load. > * About Slot Sharing > Slot sharing is a good heuristic to easily calculate how many slots needed > to get the job running and get better utilization when there is no resource > profile in slots. However, with resource profiles enabling finer-grained > resource management, each individual task has its specific resource need and > it does not make much sense to have multiple tasks sharing the resource of > the same slot. Instead, we may introduce locality preferences/constraints to > support the semantics of putting tasks in same/different TMs in a more > general way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
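The description above notes that a slot request's {{ResourceProfile}} defaults to UNKNOWN (negative values) and therefore matches any given slot. A minimal sketch of that matching rule; the fields (cpu cores, memory) and the class are a simplified assumption for illustration, not Flink's actual {{ResourceProfile}} implementation:

```java
// Simplified sketch of slot/request matching. Flink's real class is
// ResourceProfile in flink-runtime; the fields and rule here are an
// illustrative assumption only.
public class ResourceProfileSketch {
    final double cpuCores; // negative value means UNKNOWN
    final int memoryMB;    // negative value means UNKNOWN

    ResourceProfileSketch(double cpuCores, int memoryMB) {
        this.cpuCores = cpuCores;
        this.memoryMB = memoryMB;
    }

    /**
     * An UNKNOWN request (negative values) matches any slot; otherwise
     * every requested dimension must fit into the offered slot.
     */
    boolean isMatching(ResourceProfileSketch slot) {
        if (cpuCores < 0 || memoryMB < 0) {
            return true; // UNKNOWN request matches any given slot
        }
        return cpuCores <= slot.cpuCores && memoryMB <= slot.memoryMB;
    }

    public static void main(String[] args) {
        ResourceProfileSketch unknown = new ResourceProfileSketch(-1, -1);
        ResourceProfileSketch small = new ResourceProfileSketch(1, 1024);
        ResourceProfileSketch big = new ResourceProfileSketch(4, 8192);
        System.out.println(unknown.isMatching(small)); // true: UNKNOWN matches anything
        System.out.println(big.isMatching(small));     // false: does not fit
        System.out.println(small.isMatching(big));     // true: fits
    }
}
```

Enabling fine-grained resource management would then amount to submitting requests with non-negative values so that the componentwise comparison takes effect instead of the catch-all UNKNOWN path.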
[jira] [Created] (FLINK-11056) Remove MesosApplicationMasterRunner
TisonKun created FLINK-11056: Summary: Remove MesosApplicationMasterRunner Key: FLINK-11056 URL: https://issues.apache.org/jira/browse/FLINK-11056 Project: Flink Issue Type: Sub-task Components: Mesos Affects Versions: 1.8.0 Reporter: TisonKun Assignee: TisonKun Fix For: 1.8.0 {{MesosApplicationMasterRunner}} is for the legacy mode and nothing depends on it now, so we can remove it directly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-11069) Remove FutureUtil
[ https://issues.apache.org/jira/browse/FLINK-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun reassigned FLINK-11069: Assignee: TisonKun > Remove FutureUtil > - > > Key: FLINK-11069 > URL: https://issues.apache.org/jira/browse/FLINK-11069 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Till Rohrmann >Assignee: TisonKun >Priority: Minor > > The utility class {{FutureUtil}} contains duplicate methods and methods which > are not implemented optimally. I suggest merging {{FutureUtil}} with > {{FutureUtils}} and getting rid of the {{waitForAll}} method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11078) Capability to define the numerical range for running TaskExecutors
TisonKun created FLINK-11078: Summary: Capability to define the numerical range for running TaskExecutors Key: FLINK-11078 URL: https://issues.apache.org/jira/browse/FLINK-11078 Project: Flink Issue Type: New Feature Components: Client, ResourceManager Affects Versions: 1.8.0 Reporter: TisonKun Assignee: TisonKun Fix For: 1.8.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10640) Enable Slot Resource Profile for Resource Management
[ https://issues.apache.org/jira/browse/FLINK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710041#comment-16710041 ] TisonKun commented on FLINK-10640: -- @[~wuzang] After an offline discussion with [~till.rohrmann], for part of the "TM Management" issue, i.e., starting arbitrary TMs once the yarn session is launched, I propose introducing a pair (min, max) representing the minimum and maximum number of running {{TaskExecutor}}s. With such an option, setting {{minimum = maximum = n}} effectively gives the same behaviour as the pre-FLIP-6 code, that is, a fixed number of pre-allocated TMs; and setting {{minimum = 0, maximum = inf}} effectively gives the same behaviour as the current code path. I think such a feature improves "TM Management", especially when users want to run jobs on a specific cluster, and requires fewer changes than achieving an arbitrarily flexible "TM Management". What do you think? > Enable Slot Resource Profile for Resource Management > > > Key: FLINK-10640 > URL: https://issues.apache.org/jira/browse/FLINK-10640 > Project: Flink > Issue Type: New Feature > Components: ResourceManager >Reporter: Tony Xintong Song >Priority: Major > > Motivation & Backgrounds > * The existing concept of task slots roughly represents how many pipeline of > tasks a TaskManager can hold. However, it does not consider the differences > in resource needs and usage of individual tasks. Enabling resource profiles > of slots may allow Flink to better allocate execution resources according to > tasks fine-grained resource needs. > * The community version Flink already contains APIs and some implementation > for slot resource profile. However, such logic is not truly used. > (ResourceProfile of slot requests is by default set to UNKNOWN with negative > values, thus matches any given slot.) 
> Preliminary Design > * Slot Management > A slot represents a certain amount of resources for a single pipeline of > tasks to run in on a TaskManager. Initially, a TaskManager does not have any > slots but a total amount of resources. When allocating, the ResourceManager > finds proper TMs to generate new slots for the tasks to run according to the > slot requests. Once generated, the slot's size (resource profile) does not > change until it's freed. ResourceManager can apply different, portable > strategies to allocate slots from TaskManagers. > * TM Management > The size and number of TaskManagers and when to start them can also be > flexible. TMs can be started and released dynamically, and may have different > sizes. We may have many different, portable strategies. E.g., an elastic > session that can run multiple jobs like the session mode while dynamically > adjusting the size of session (number of TMs) according to the realtime > working load. > * About Slot Sharing > Slot sharing is a good heuristic to easily calculate how many slots needed > to get the job running and get better utilization when there is no resource > profile in slots. However, with resource profiles enabling finer-grained > resource management, each individual task has its specific resource need and > it does not make much sense to have multiple tasks sharing the resource of > the same slot. Instead, we may introduce locality preferences/constraints to > support the semantics of putting tasks in same/different TMs in a more > general way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11078) Capability to define the numerical range for running TaskExecutors
[ https://issues.apache.org/jira/browse/FLINK-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-11078: - Description: In pre-FLIP-6 context, we start a yarn session with fixed number of running {{TaskManagers}}. This is good for user to run a series of small jobs on a specific cluster and reduce the cost of deploying a cluster per job. In current code base, we start a yarn session with none pre-allocated TMs, but allocates them on a certain job submitted and running. To get benefits from both mode, I propose introducing a pair {{(min, max)}} represents the minimum and maximum for the number of running {{TaskExecutors}}. With such option, when setting minimum = maximum = n we effectively have the same behaviour as before with the pre-Flip-6 code; and when setting minimum = 0, maximum = inf we effectively have the same behaviour as current code path. Most of the implementation area would be passing such option in {{FlinkYarnSessionCli}} and respecting it in {{ResourceManager}} on {{grantLeadership}} and {{stopWorker}}. Hopefully the changes are not that big and I would try to draft more details, and be glad to your suggestions and ideas. > Capability to define the numerical range for running TaskExecutors > -- > > Key: FLINK-11078 > URL: https://issues.apache.org/jira/browse/FLINK-11078 > Project: Flink > Issue Type: New Feature > Components: Client, ResourceManager >Affects Versions: 1.8.0 >Reporter: TisonKun >Assignee: TisonKun >Priority: Major > Fix For: 1.8.0 > > > In pre-FLIP-6 context, we start a yarn session with fixed number of running > {{TaskManagers}}. This is good for user to run a series of small jobs on a > specific cluster and reduce the cost of deploying a cluster per job. In > current code base, we start a yarn session with none pre-allocated TMs, but > allocates them on a certain job submitted and running. 
> To get benefits from both mode, I propose introducing a pair {{(min, max)}} > represents the minimum and maximum for the number of running > {{TaskExecutors}}. > With such option, when setting minimum = maximum = n we effectively have the > same behaviour as before with the pre-Flip-6 code; and when setting minimum = > 0, maximum = inf we effectively have the same behaviour as current code path. > Most of the implementation area would be passing such option in > {{FlinkYarnSessionCli}} and respecting it in {{ResourceManager}} on > {{grantLeadership}} and {{stopWorker}}. > Hopefully the changes are not that big and I would try to draft more details, > and be glad to your suggestions and ideas. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
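The {{(min, max)}} proposal above boils down to clamping the demand-driven TaskExecutor count into the configured bounds. A hypothetical sketch of that policy; the class and method names are illustrative, not taken from the Flink code base:

```java
// Hypothetical sketch of the proposed (min, max) range for running
// TaskExecutors: the ResourceManager would clamp the demand-driven TM
// count into [min, max].
public class TaskExecutorRange {
    final int min; // TMs pre-allocated when the session starts
    final int max; // hard cap on running TMs (Integer.MAX_VALUE for "inf")

    TaskExecutorRange(int min, int max) {
        if (min < 0 || max < min) {
            throw new IllegalArgumentException("require 0 <= min <= max");
        }
        this.min = min;
        this.max = max;
    }

    /** Number of TMs to run given the current demand from slot requests. */
    int targetCount(int demand) {
        return Math.max(min, Math.min(max, demand));
    }

    public static void main(String[] args) {
        // min = max = n: fixed pre-allocated TMs, like the pre-FLIP-6 yarn session.
        TaskExecutorRange fixed = new TaskExecutorRange(5, 5);
        // min = 0, max = inf: fully demand-driven, like the current code path.
        TaskExecutorRange elastic = new TaskExecutorRange(0, Integer.MAX_VALUE);
        System.out.println(fixed.targetCount(0));   // 5
        System.out.println(fixed.targetCount(100)); // 5
        System.out.println(elastic.targetCount(7)); // 7
    }
}
```

With this framing, the two existing behaviours are just the endpoints of one configuration space, which is why the description expects the change to stay small.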
[jira] [Commented] (FLINK-11078) Capability to define the numerical range for running TaskExecutors
[ https://issues.apache.org/jira/browse/FLINK-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710072#comment-16710072 ] TisonKun commented on FLINK-11078: -- Yes. I've updated the description. > Capability to define the numerical range for running TaskExecutors > -- > > Key: FLINK-11078 > URL: https://issues.apache.org/jira/browse/FLINK-11078 > Project: Flink > Issue Type: New Feature > Components: Client, ResourceManager >Affects Versions: 1.8.0 >Reporter: TisonKun >Assignee: TisonKun >Priority: Major > Fix For: 1.8.0 > > > In pre-FLIP-6 context, we start a yarn session with fixed number of running > {{TaskManagers}}. This is good for user to run a series of small jobs on a > specific cluster and reduce the cost of deploying a cluster per job. In > current code base, we start a yarn session with none pre-allocated TMs, but > allocates them on a certain job submitted and running. > To get benefits from both mode, I propose introducing a pair {{(min, max)}} > represents the minimum and maximum for the number of running > {{TaskExecutors}}. > With such option, when setting minimum = maximum = n we effectively have the > same behaviour as before with the pre-Flip-6 code; and when setting minimum = > 0, maximum = inf we effectively have the same behaviour as current code path. > Most of the implementation area would be passing such option in > {{FlinkYarnSessionCli}} and respecting it in {{ResourceManager}} on > {{grantLeadership}} and {{stopWorker}}. > Hopefully the changes are not that big and I would try to draft more details, > and be glad to your suggestions and ideas. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (FLINK-10640) Enable Slot Resource Profile for Resource Management
[ https://issues.apache.org/jira/browse/FLINK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710041#comment-16710041 ] TisonKun edited comment on FLINK-10640 at 12/5/18 1:36 PM: --- @[~wuzang] After an offline discussion with [~till.rohrmann], for part of the "TM Management" issue, i.e., starting arbitrary TMs once the yarn session is launched, I propose introducing a pair (min, max) representing the minimum and maximum number of running {{TaskExecutor}}s. With such an option, setting {{minimum = maximum = n}} effectively gives the same behaviour as the pre-FLIP-6 code, that is, a fixed number of pre-allocated TMs; and setting {{minimum = 0, maximum = inf}} effectively gives the same behaviour as the current code path. I think such a feature improves "TM Management", especially when users want to run jobs on a specific cluster, and requires fewer changes than achieving an arbitrarily flexible "TM Management". What do you think? (FYI I created a separate JIRA, FLINK-11078, to discuss this topic) was (Author: tison): @[~wuzang] After an offline discuss with [~till.rohrmann], for part of "TM Management" issue, i.e., start arbitrary TMs on yarn session launched, I propose introduce a pair (min, max) represents the minimum and maximum for the number of running {{TaskExecutor}}s. With such option, when setting {{minimum = maximum = n}} we effectively have the same behaviour as before with the pre-Flip-6 code, that is, a fixed number of pre-allocated TMs; and when setting {{minimum = 0, maximum = inf}} we effectively have the same behaviour as current code path. I think such a feature improve "TM Management" especially when user want to running job on a specific cluster and require less changes than achieving an arbitrarily flexible "TM Management". What do you think? 
> Enable Slot Resource Profile for Resource Management > > > Key: FLINK-10640 > URL: https://issues.apache.org/jira/browse/FLINK-10640 > Project: Flink > Issue Type: New Feature > Components: ResourceManager >Reporter: Tony Xintong Song >Priority: Major > > Motivation & Backgrounds > * The existing concept of task slots roughly represents how many pipeline of > tasks a TaskManager can hold. However, it does not consider the differences > in resource needs and usage of individual tasks. Enabling resource profiles > of slots may allow Flink to better allocate execution resources according to > tasks fine-grained resource needs. > * The community version Flink already contains APIs and some implementation > for slot resource profile. However, such logic is not truly used. > (ResourceProfile of slot requests is by default set to UNKNOWN with negative > values, thus matches any given slot.) > Preliminary Design > * Slot Management > A slot represents a certain amount of resources for a single pipeline of > tasks to run in on a TaskManager. Initially, a TaskManager does not have any > slots but a total amount of resources. When allocating, the ResourceManager > finds proper TMs to generate new slots for the tasks to run according to the > slot requests. Once generated, the slot's size (resource profile) does not > change until it's freed. ResourceManager can apply different, portable > strategies to allocate slots from TaskManagers. > * TM Management > The size and number of TaskManagers and when to start them can also be > flexible. TMs can be started and released dynamically, and may have different > sizes. We may have many different, portable strategies. E.g., an elastic > session that can run multiple jobs like the session mode while dynamically > adjusting the size of session (number of TMs) according to the realtime > working load. 
> * About Slot Sharing > Slot sharing is a good heuristic to easily calculate how many slots needed > to get the job running and get better utilization when there is no resource > profile in slots. However, with resource profiles enabling finer-grained > resource management, each individual task has its specific resource need and > it does not make much sense to have multiple tasks sharing the resource of > the same slot. Instead, we may introduce locality preferences/constraints to > support the semantics of putting tasks in same/different TMs in a more > general way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11078) Capability to define the numerical range for running TaskExecutors
[ https://issues.apache.org/jira/browse/FLINK-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-11078: - Description: In pre-FLIP-6 context, we start a yarn session with fixed number of running {{TaskManagers}}. This is good for user to run a series of small jobs on a specific cluster and reduce the cost of deploying a cluster per job. In current code base, we start a yarn session with none pre-allocated TMs, but allocates them on a certain job submitted and running. To get benefits from both mode, I propose introducing a pair {{(min, max)}} represents the minimum and maximum for the number of running {{TaskExecutors}}. With such option, when setting minimum = maximum = n we effectively have the same behaviour as before with the pre-Flip-6 code; and when setting minimum = 0, maximum = inf we effectively have the same behaviour as current code path. Most of the implementation area would be passing such option in {{FlinkYarnSessionCli}} and respecting it in {{ResourceManager}} on {{grantLeadership}}(thus starting the {{SlotManager}}) and {{stopWorker}}. Hopefully the changes are not that big and I would try to draft more details, and be glad to your suggestions and ideas. was: In pre-FLIP-6 context, we start a yarn session with fixed number of running {{TaskManagers}}. This is good for user to run a series of small jobs on a specific cluster and reduce the cost of deploying a cluster per job. In current code base, we start a yarn session with none pre-allocated TMs, but allocates them on a certain job submitted and running. To get benefits from both mode, I propose introducing a pair {{(min, max)}} represents the minimum and maximum for the number of running {{TaskExecutors}}. With such option, when setting minimum = maximum = n we effectively have the same behaviour as before with the pre-Flip-6 code; and when setting minimum = 0, maximum = inf we effectively have the same behaviour as current code path. 
Most of the implementation area would be passing such option in {{FlinkYarnSessionCli}} and respecting it in {{ResourceManager}} on {{grantLeadership}} and {{stopWorker}}. Hopefully the changes are not that big and I would try to draft more details, and be glad to your suggestions and ideas. > Capability to define the numerical range for running TaskExecutors > -- > > Key: FLINK-11078 > URL: https://issues.apache.org/jira/browse/FLINK-11078 > Project: Flink > Issue Type: New Feature > Components: Client, ResourceManager >Affects Versions: 1.8.0 >Reporter: TisonKun >Assignee: TisonKun >Priority: Major > Fix For: 1.8.0 > > > In pre-FLIP-6 context, we start a yarn session with fixed number of running > {{TaskManagers}}. This is good for user to run a series of small jobs on a > specific cluster and reduce the cost of deploying a cluster per job. In > current code base, we start a yarn session with none pre-allocated TMs, but > allocates them on a certain job submitted and running. > To get benefits from both mode, I propose introducing a pair {{(min, max)}} > represents the minimum and maximum for the number of running > {{TaskExecutors}}. > With such option, when setting minimum = maximum = n we effectively have the > same behaviour as before with the pre-Flip-6 code; and when setting minimum = > 0, maximum = inf we effectively have the same behaviour as current code path. > Most of the implementation area would be passing such option in > {{FlinkYarnSessionCli}} and respecting it in {{ResourceManager}} on > {{grantLeadership}}(thus starting the {{SlotManager}}) and {{stopWorker}}. > Hopefully the changes are not that big and I would try to draft more details, > and be glad to your suggestions and ideas. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11078) Capability to define the numerical range for running TaskExecutors
[ https://issues.apache.org/jira/browse/FLINK-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-11078: - Description: In pre-FLIP-6 context, we start a yarn session with fixed number of running {{TaskManagers}}. This is good for user to run a series of small jobs on a specific cluster and reduce the cost of deploying a cluster per job. In current code base, we start a yarn session with none pre-allocated TMs, but allocates them on a certain job submitted and running. To get benefits from both mode, I propose introducing a pair {{(min, max)}} represents the minimum and maximum for the number of running {{TaskExecutors}}. With such option, when setting minimum = maximum = n we effectively have the same behaviour as before with the pre-Flip-6 code; and when setting minimum = 0, maximum = inf we effectively have the same behaviour as current code path. Most of the implementation area would be passing such option in {{FlinkYarnSessionCli}} and respecting it in {{ResourceManager}} on {{grantLeadership}}(thus starting the {{SlotManager}}) and {{stopWorker}}. Hopefully the changes are not that big and I would try to draft more details. Glad to see your suggestions and ideas. was: In pre-FLIP-6 context, we start a yarn session with fixed number of running {{TaskManagers}}. This is good for user to run a series of small jobs on a specific cluster and reduce the cost of deploying a cluster per job. In current code base, we start a yarn session with none pre-allocated TMs, but allocates them on a certain job submitted and running. To get benefits from both mode, I propose introducing a pair {{(min, max)}} represents the minimum and maximum for the number of running {{TaskExecutors}}. With such option, when setting minimum = maximum = n we effectively have the same behaviour as before with the pre-Flip-6 code; and when setting minimum = 0, maximum = inf we effectively have the same behaviour as current code path. 
Most of the implementation area would be passing such option in {{FlinkYarnSessionCli}} and respecting it in {{ResourceManager}} on {{grantLeadership}}(thus starting the {{SlotManager}}) and {{stopWorker}}. Hopefully the changes are not that big and I would try to draft more details, and be glad to your suggestions and ideas. > Capability to define the numerical range for running TaskExecutors > -- > > Key: FLINK-11078 > URL: https://issues.apache.org/jira/browse/FLINK-11078 > Project: Flink > Issue Type: New Feature > Components: Client, ResourceManager >Affects Versions: 1.8.0 >Reporter: TisonKun >Assignee: TisonKun >Priority: Major > Fix For: 1.8.0 > > > In pre-FLIP-6 context, we start a yarn session with fixed number of running > {{TaskManagers}}. This is good for user to run a series of small jobs on a > specific cluster and reduce the cost of deploying a cluster per job. In > current code base, we start a yarn session with none pre-allocated TMs, but > allocates them on a certain job submitted and running. > To get benefits from both mode, I propose introducing a pair {{(min, max)}} > represents the minimum and maximum for the number of running > {{TaskExecutors}}. > With such option, when setting minimum = maximum = n we effectively have the > same behaviour as before with the pre-Flip-6 code; and when setting minimum = > 0, maximum = inf we effectively have the same behaviour as current code path. > Most of the implementation area would be passing such option in > {{FlinkYarnSessionCli}} and respecting it in {{ResourceManager}} on > {{grantLeadership}}(thus starting the {{SlotManager}}) and {{stopWorker}}. > Hopefully the changes are not that big and I would try to draft more details. > Glad to see your suggestions and ideas. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11079) Update LICENSE and NOTICE files for flnk-storm-examples
[ https://issues.apache.org/jira/browse/FLINK-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710100#comment-16710100 ] TisonKun commented on FLINK-11079: -- Given FLINK-10509, I wonder whether we are going to drop {{flink-storm}}, in which case {{flink-storm-examples}} would become obsolete? > Update LICENSE and NOTICE files for flnk-storm-examples > --- > > Key: FLINK-11079 > URL: https://issues.apache.org/jira/browse/FLINK-11079 > Project: Flink > Issue Type: Bug > Components: Storm Compatibility >Affects Versions: 1.5.5, 1.6.2, 1.7.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Fix For: 1.5.6, 1.6.3, 1.8.0, 1.7.1 > > > Similar to FLINK-10987 we should also update the {{LICENSE}} and {{NOTICE}} > for {{flink-storm-examples}}. > > This project creates several fat example jars that are deployed to maven > central. > Alternatively we could think about dropping these examples. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)
[ https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710963#comment-16710963 ] TisonKun commented on FLINK-10333: -- Hi [~StephanEwen] and [~till.rohrmann] After an offline discussion with [~xiaogang.shi], we see that ZK has a transactional mechanism with which we can ensure that only the leader writes to ZK. Given this knowledge and the inconsistencies [~till.rohrmann] noticed, before going into reimplementation I surveyed the usage of ZK based stores in flink. Ideally there is exactly one role that writes a specific znode. There are four types of znodes that flink writes. Besides {{SubmittedJobGraphStore}} written by {{Dispatcher}}, {{CompletedCheckpointStore}} written by {{JobMaster}} and {{MesosWorkerStore}} written by {{MesosResourceManager}}, there is the {{RunningJobsRegistry}} that also has a ZK based implementation. All of the first three write ZK under a heavy "store lock", but as [~xiaogang.shi] pointed out, this still lacks atomicity. With a solution based on ZK transactions (for example, the current {{Dispatcher}} leader issuing {{setData}} together with a {{check}} on its {{election-node-path}}) we can ensure atomicity while getting rid of the lock. For the last one, {{RunningJobsRegistry}}, the situation becomes a bit more complex. {{JobManagerRunner}} is responsible for {{setJobRunning}} and {{setJobFinished}} while {{Dispatcher}} is responsible for {{clearJob}}. This goes against the ideal of one role per znode. Also I notice that the distinction between the semantics of {{DONE}} and those of clearing is ambiguous. {{JobSchedulingStatus}} becomes {{DONE}} only once an {{ArchivedExecutionGraph}} has been generated, so we could prevent the same job from being processed again by an approach other than checking an ephemeral {{DONE}} status. What if we replace {{setJobFinished}} with clearing {{RunningJobsRegistry}}? 
> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, > CompletedCheckpoints) > - > > Key: FLINK-10333 > URL: https://issues.apache.org/jira/browse/FLINK-10333 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.5.3, 1.6.0, 1.7.0 >Reporter: Till Rohrmann >Priority: Major > Fix For: 1.8.0 > > > While going over the ZooKeeper based stores > ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, > {{ZooKeeperCompletedCheckpointStore}}) and the underlying > {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were > introduced with past incremental changes. > * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} > or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization > problems will either lead to removing the Znode or not > * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of > exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case > of a failure) > * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be > better to move {{RetrievableStateStorageHelper}} out of it for a better > separation of concerns > * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even > if it is locked. This should not happen since it could leave another system > in an inconsistent state (imagine a changed {{JobGraph}} which restores from > an old checkpoint) > * Redundant but also somewhat inconsistent put logic in the different stores > * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} > which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}} > * Getting rid of the {{SubmittedJobGraphListener}} would be helpful > These problems made me think how reliable these components actually work. > Since these components are very important, I propose to refactor them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
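The {{setData}}-with-{{check}} idea in the comment above is a transactional fenced write: the store update commits only if the writer's election znode still exists. In ZooKeeper this would be a single {{multi()}} call combining {{Op.check}} and {{Op.setData}}; the in-memory sketch below (all paths and names hypothetical) only illustrates the atomic check-then-write logic, not real ZooKeeper calls:

```java
import java.util.HashMap;
import java.util.Map;

// In-memory illustration of a fenced, transactional write: a store update is
// applied only if the writer's election node (its fencing witness) is still
// present. Real code would use ZooKeeper's multi() with Op.check + Op.setData
// so that both steps commit or abort together.
public class FencedStoreSketch {
    private final Map<String, byte[]> znodes = new HashMap<>();

    void createNode(String path, byte[] data) {
        znodes.put(path, data);
    }

    void deleteNode(String path) {
        znodes.remove(path);
    }

    /** Atomically: check the election node exists, then write; else reject. */
    synchronized boolean writeIfLeader(String electionNodePath, String storePath, byte[] data) {
        if (!znodes.containsKey(electionNodePath)) {
            return false; // leadership lost: the whole transaction is rejected
        }
        znodes.put(storePath, data);
        return true;
    }

    public static void main(String[] args) {
        FencedStoreSketch zk = new FencedStoreSketch();
        zk.createNode("/leader/election-0001", new byte[0]);
        // Leader holds its ephemeral election node: the write commits.
        System.out.println(zk.writeIfLeader("/leader/election-0001", "/jobgraphs/job-1", new byte[]{1}));
        // The ephemeral node disappears (session lost): the write is rejected.
        zk.deleteNode("/leader/election-0001");
        System.out.println(zk.writeIfLeader("/leader/election-0001", "/jobgraphs/job-1", new byte[]{2}));
    }
}
```

Because the check and the write happen in one atomic step, a stale leader can never overwrite the store after losing its election node, which is what removes the need for the heavy "store lock" discussed above.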
[jira] [Comment Edited] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)
[ https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710963#comment-16710963 ] TisonKun edited comment on FLINK-10333 at 12/6/18 11:49 AM: Hi [~StephanEwen] and [~till.rohrmann] After an offline discussion with [~xiaogang.shi], we see that ZK has a transactional mechanism with which we can ensure that only the leader writes ZK. Given this knowledge and the inconsistencies [~till.rohrmann] noticed, before going into reimplementation, I did a survey of the usage of ZK based stores in Flink. Ideally there is exactly one role that writes a specific znode. There are four types of znodes that Flink writes. Besides {{SubmittedJobGraphStore}} written by {{Dispatcher}}, {{CompletedCheckpointStore}} written by {{JobMaster}} and {{MesosWorkerStore}} written by {{MesosResourceManager}}, there is the {{RunningJobsRegistry}}, which also has a ZK based implementation. All of the first three write ZK under a heavy “store lock”, but as [~xiaogang.shi] pointed out, this still lacks atomicity. With a solution based on ZK transactions — for example, the current {{Dispatcher}} leader performs {{setData}} together with a {{check}} on {{election-node-path}} — we can ensure atomicity while getting rid of the lock. For the last one, {{RunningJobsRegistry}}, the situation becomes a bit more complex. {{JobManagerRunner}} is responsible for {{setJobRunning}} and {{setJobFinished}}, while {{Dispatcher}} is responsible for {{clearJob}}. This goes against the ideal of one role per znode. Also, I notice that the gap between the semantics of {{DONE}} and those of clearing is ambiguous. We might try to prevent the same job from being processed again by an approach other than checking an ephemeral {{DONE}} status. What if we replace {{setJobFinished}} with clearing the {{RunningJobsRegistry}}? 
was (Author: tison): Hi [~StephanEwen] and [~till.rohrmann] With an offline discuss with [~xiaogang.shi] we see ZK has a transactional mechanism so that we can ensure only the leader writes ZK. Given this knowledge and the inconsistencies [~till.rohrmann] noticed, before go into reimplementation, I did a survey of the usage of ZK based stores in flink. Ideally there is exact one role who writes a specific znode. There are four types of znodes that flink writes. Besides {{SubmittedJobGraphStore}} written by {{Dispatcher}}, {{CompletedCheckpointStore}} written by {{JobMaster}} and {{MesosWorkerStore}} written by {{MesosResourceManager}}, there is the {{RunningJobsRegistry}} that also has a ZK based implementation. All of the first three write ZK with a heavy “store lock”, but as [~xiaogang.shi] pointing out, it still lacks of atomicity. And with the solution based on ZK transaction — for example, a current {{Dispatcher}} leader {{setData}} with {{check}} for {{election-node-path}} — we can ensure the atomicity while getting rid of the lock. For the last one, {{RunningJobsRegistry}}, situation becomes a bit more complex. {{JobManagerRunner}} is responsible for {{setJobRunning}} and {{setJobFinished}} and {{Dispatcher}} is responsible for {{clearJob}}. This is against the ideal that one role for one znode. Also I notice that the gap between the semantic of {{DONE}} and that of clear is ambiguous. {{JobSchedulingStatus}} becomes {{DONE}} only if an {{ArchivedExecutionGraph}} generated so that we can prevent the same job be processed by an approach other than check an ephemeral {{DONE}} status. What if we replace {{setJobFinished}} with clearing {{RunningJobsRegistry}}? 
> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, > CompletedCheckpoints) > - > > Key: FLINK-10333 > URL: https://issues.apache.org/jira/browse/FLINK-10333 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.5.3, 1.6.0, 1.7.0 >Reporter: Till Rohrmann >Priority: Major > Fix For: 1.8.0 > > > While going over the ZooKeeper based stores > ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, > {{ZooKeeperCompletedCheckpointStore}}) and the underlying > {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were > introduced with past incremental changes. > * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} > or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization > problems will either lead to removing the Znode or not > * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of > exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case > of a failure) > * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be > better to move {{RetrievableStateStorageHelper}} out of it for a better
[jira] [Commented] (FLINK-11097) Migrate flink-table runtime InputFormat classes
[ https://issues.apache.org/jira/browse/FLINK-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712631#comment-16712631 ] TisonKun commented on FLINK-11097: -- [~x1q1j1] if you don't have contributor permission yet (to assign this JIRA to yourself), feel free to send a mail to the d...@flink.apache.org mailing list. > Migrate flink-table runtime InputFormat classes > > > Key: FLINK-11097 > URL: https://issues.apache.org/jira/browse/FLINK-11097 > Project: Flink > Issue Type: Sub-task > Components: Table API & SQL >Reporter: xuqianjin >Priority: Major > > As discussed in FLINK-11065, this is a subtask which migrates flink-table > org.apache.flink.table.runtime.io Scala files in the directory to java in > module flink-table-runtime -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11105) Add a new implementation of the HighAvailabilityServices using etcd
[ https://issues.apache.org/jira/browse/FLINK-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713102#comment-16713102 ] TisonKun commented on FLINK-11105: -- Thanks for bringing up this proposal! I like it, and I think such a new feature needs a bit more in-depth discussion. A design document describing the integration with etcd should help. > Add a new implementation of the HighAvailabilityServices using etcd > --- > > Key: FLINK-11105 > URL: https://issues.apache.org/jira/browse/FLINK-11105 > Project: Flink > Issue Type: New Feature >Reporter: Yang Wang >Priority: Major > > In flink, we use HighAvailabilityServices to do many things, e.g. RM/JM > leader election and retrieval. ZooKeeperHaServices is an implementation of > HighAvailabilityServices using Apache ZooKeeper. It is very easy to integrate > with hadoop ecosystem. However, the cloud native and micro service are become > more and more popular. We just need to follow the step and add a new > implementation EtcdHaService using etcd. > Now flink has supported to run StandaloneSession on kubernetes and FLINK-9953 > start to make an native integration with kubernetes. If we have the > EtcdHaService, both of them will benefit from this and we will not have > deploy a zookeeper service on kubernetes cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11106) Remove legacy flink-yarn component
TisonKun created FLINK-11106: Summary: Remove legacy flink-yarn component Key: FLINK-11106 URL: https://issues.apache.org/jira/browse/FLINK-11106 Project: Flink Issue Type: Sub-task Components: Client, Cluster Management, YARN Affects Versions: 1.8.0 Reporter: TisonKun Assignee: TisonKun Fix For: 1.8.0 After flink-yarn tests are ported to the new codebase, we are able to remove the legacy flink-yarn component, including - {{ApplicationClient.scala}} - {{YarnJobManager.scala}} - {{YarnMessages.scala}} - {{YarnTaskManager.scala}} - {{org.apache.flink.yarn.messages.*}} - {{LegacyYarnClusterDescriptor.java}} - {{RegisteredYarnWorkerNode.java}} - {{YarnApplicationMasterRunner.java}} - {{YarnClusterClient.java}} - {{YarnContainerInLaunch.java}} - {{YarnFlinkResourceManager.java}} - {{YarnResourceManagerCallbackHandler.java}} - {{YarnTaskManagerRunnerFactory.java}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11106) Remove legacy flink-yarn component
[ https://issues.apache.org/jira/browse/FLINK-11106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated FLINK-11106: - Description: After flink-yarn tests ported to new codebase, we are able to remove legacy flink-yarn component, including - {{ApplicationClient.scala}} - {{YarnJobManager.scala}} - {{YarnMessages.scala}} - {{YarnTaskManager.scala}} - {{org.apache.flink.yarn.messages.*}} - {{LegacyYarnClusterDescriptor.java}} - {{RegisteredYarnWorkerNode.java}} - {{YarnApplicationMasterRunner.java}} - {{YarnClusterClient.java}} - {{YarnContainerInLaunch.java}} - {{YarnFlinkResourceManager.java}} - {{YarnResourceManagerCallbackHandler.java}} - {{YarnTaskManagerRunnerFactory.java}} was: After flink-yarn tests ported to new codebase, we are able to remove Remove legacy flink-yarn component, including - {{ApplicationClient.scala}} - {{YarnJobManager.scala}} - {{YarnMessages.scala}} - {{YarnTaskManager.scala}} - {{org.apache.flink.yarn.messages.*}} - {{LegacyYarnClusterDescriptor.java}} - {{RegisteredYarnWorkerNode.java}} - {{YarnApplicationMasterRunner.java}} - {{YarnClusterClient.java}} - {{YarnContainerInLaunch.java}} - {{YarnFlinkResourceManager.java}} - {{YarnResourceManagerCallbackHandler.java}} - {{YarnTaskManagerRunnerFactory.java}} > Remove legacy flink-yarn component > -- > > Key: FLINK-11106 > URL: https://issues.apache.org/jira/browse/FLINK-11106 > Project: Flink > Issue Type: Sub-task > Components: Client, Cluster Management, YARN >Affects Versions: 1.8.0 >Reporter: TisonKun >Assignee: TisonKun >Priority: Major > Fix For: 1.8.0 > > > After flink-yarn tests ported to new codebase, we are able to remove legacy > flink-yarn component, including > - {{ApplicationClient.scala}} > - {{YarnJobManager.scala}} > - {{YarnMessages.scala}} > - {{YarnTaskManager.scala}} > - {{org.apache.flink.yarn.messages.*}} > - {{LegacyYarnClusterDescriptor.java}} > - {{RegisteredYarnWorkerNode.java}} > - {{YarnApplicationMasterRunner.java}} > - 
{{YarnClusterClient.java}} > - {{YarnContainerInLaunch.java}} > - {{YarnFlinkResourceManager.java}} > - {{YarnResourceManagerCallbackHandler.java}} > - {{YarnTaskManagerRunnerFactory.java}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)
[ https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713900#comment-16713900 ] TisonKun commented on FLINK-10333: -- Diving into this issue is quite tricky, so we'd better list the problems we want to resolve and the questions we need to answer. h3. *Problems of the current ZooKeeper based stores* 1. We don't have a good znode layout. What is suggested is documented at {{ZooKeeperHaServices}}, but {{ZooKeeperCompletedCheckpointStore}} and others break it. 2. We use the Curator API to write the ZooKeeper based stores, which lacks atomicity. Specifically, we check the leadership and write the znode non-atomically. 3. We use {{RunningJobsRegistry}} to track whether a job is running on a JM. Ideally the termination of a job at the JM, its termination at the dispatcher, and the removal of the submitted job graph should happen atomically, but they don't. h3. *Questions about resolving the problems* *1. How to lay out znodes?* Actually we could follow the documentation at {{ZooKeeperHaServices}}. That is, all job related nodes should be children of the job node, and likewise for cluster related nodes. For example, a znode like "checkpoint/job_id/1 persistent" is better changed to "job_id/checkpoint/1 persistent". *2. What actions do we want to be atomic?* At least two. One is checking the leadership and writing a znode, for example, committing a checkpoint only if the JobManager is the leader. The other is terminating a job atomically at the JM and the dispatcher; since the JM and the dispatcher are different components, it's important to deal with the gap between the JM finishing the job and the dispatcher committing it (specifically, removing the submitted job graph). Also it is worth considering how users see the status of a job on the WebUI, depending on the JM or the dispatcher. *3. How to achieve atomicity?* As discussed above, we can make use of ZooKeeper transactions. An alternative might be a distributed lock. This scope is mainly about the implementation. 
> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, > CompletedCheckpoints) > - > > Key: FLINK-10333 > URL: https://issues.apache.org/jira/browse/FLINK-10333 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.5.3, 1.6.0, 1.7.0 >Reporter: Till Rohrmann >Priority: Major > Fix For: 1.8.0 > > > While going over the ZooKeeper based stores > ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, > {{ZooKeeperCompletedCheckpointStore}}) and the underlying > {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were > introduced with past incremental changes. > * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} > or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization > problems will either lead to removing the Znode or not > * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of > exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case > of a failure) > * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be > better to move {{RetrievableStateStorageHelper}} out of it for a better > separation of concerns > * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even > if it is locked. This should not happen since it could leave another system > in an inconsistent state (imagine a changed {{JobGraph}} which restores from > an old checkpoint) > * Redundant but also somewhat inconsistent put logic in the different stores > * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} > which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}} > * Getting rid of the {{SubmittedJobGraphListener}} would be helpful > These problems made me think how reliable these components actually work. > Since these components are very important, I propose to refactor them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
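The layout question (1.) above amounts to flipping the path order so that everything about a job lives under one job node. A tiny helper sketches the contrast; the class and method names are hypothetical, not an actual Flink class.

```java
/**
 * Hypothetical helper contrasting the current store-first znode layout with
 * the job-first layout suggested by the documentation at ZooKeeperHaServices.
 */
public class ZnodeLayout {
    /** Current style: "checkpoint/<jobId>/1" — the store node owns the job subtree. */
    public static String storeFirst(String store, String jobId, String name) {
        return "/" + store + "/" + jobId + "/" + name;
    }

    /**
     * Suggested style: "<jobId>/checkpoint/1" — all job related nodes are
     * children of the job node, so cleaning up a finished job is a single
     * subtree delete instead of one delete per store.
     */
    public static String jobFirst(String store, String jobId, String name) {
        return "/" + jobId + "/" + store + "/" + name;
    }
}
```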
[jira] [Comment Edited] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)
[ https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713900#comment-16713900 ] TisonKun edited comment on FLINK-10333 at 12/9/18 8:47 AM: --- Diving into this issue is quite tricky, so we'd better list the problems we want to resolve and the questions we need to answer. h3. *Problems of the current ZooKeeper based stores* 1. We don't have a good znode layout. What is suggested is documented at {{ZooKeeperHaServices}}, but {{ZooKeeperCompletedCheckpointStore}} and others break it. 2. We use the Curator API to write the ZooKeeper based stores, which lacks atomicity. Specifically, we check the leadership and write the znode non-atomically. 3. We use {{RunningJobsRegistry}} to track whether a job is running on a JM. Ideally the termination of a job at the JM, its termination at the dispatcher, and the removal of the submitted job graph should happen atomically, but they don't. h3. *Questions about resolving the problems* *1. How to lay out znodes?* Actually we could follow the documentation at {{ZooKeeperHaServices}}. That is, all job related nodes should be children of the job node, and likewise for cluster related nodes. For example, a znode like "checkpoint/job_id/1 persistent" is better changed to "job_id/checkpoint/1 persistent". *2. What actions do we want to be atomic?* At least two. One is checking the leadership and writing a znode, for example, committing a checkpoint only if the JobManager is the leader. The other is terminating a job atomically at the JM and the dispatcher; since the JM and the dispatcher are different components, it's important to deal with the gap between the JM finishing the job and the dispatcher committing it (specifically, removing the submitted job graph). Also it is worth considering how users see the status of a job on the WebUI, depending on the JM or the dispatcher. *3. How to achieve atomicity?* As discussed above, we can make use of ZooKeeper transactions. An alternative might be a distributed lock. This scope is mainly about the implementation. 
was (Author: tison): Diving into this issue, it is quite tricky that we'd better list what Problems we need to answer. h3. *Problems on current ZooKeeper based store* 1. We don't have a good znode layout. What is suggested is documented at {{ZooKeeperHaServices}} but {{ZooKeeperCompletedCheckpointStore}} and others break it. 2. We use Curator API to deal with writing ZooKeeper based store, it is lack of atomicity. Specifically, we check the leadership and write the znode non-atomically. 3. We use {{RunningJobsRegistry}} to track if a job running by a jm. Ideally the termination of a job at jm, at dispatcher and the removal of the submitted jobgraph, these three action should happen atomically, but it isn't. h3. *Questions about resolving the problems* *1. How to layout znode?* Actually we could follow the document at {{ZooKeeperHaServices}}. That is, all job related nodes should be children of the job node, and so as cluster. For example, znode like "checkpoint/job_id/1 persistent" is better to change to "job_id/checkpoint/1 persistent". *2. What actions we want to be atomic?* At least two. One is we check the leader ship and write a znode, for example, commit checkpoint only if JobManager is leader. The other is we terminate a job atomically at jm and dispatcher, but since jm and dispatcher are different components, it's important to deal the gap between jm finished the job and dispatcher commit it(specifically, remove submitted jobgraph). Also it is worth to consider how users see the status of a job on WebUI, depending on jm or dispatcher. *3. How to achieve atomicity?* As discussion above, we can take use of ZooKeeper transaction. An alternative might be a distributed lock. This scope is mainly about the implementation. 
> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, > CompletedCheckpoints) > - > > Key: FLINK-10333 > URL: https://issues.apache.org/jira/browse/FLINK-10333 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.5.3, 1.6.0, 1.7.0 >Reporter: Till Rohrmann >Priority: Major > Fix For: 1.8.0 > > > While going over the ZooKeeper based stores > ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, > {{ZooKeeperCompletedCheckpointStore}}) and the underlying > {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were > introduced with past incremental changes. > * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} > or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization > problems will either lead to removing the Znode or not > * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of >
[jira] [Commented] (FLINK-11125) Remove useless import
[ https://issues.apache.org/jira/browse/FLINK-11125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716274#comment-16716274 ] TisonKun commented on FLINK-11125: -- @[~hequn8128] If the change is not big, it might be handled as a hotfix. Otherwise it's better to make the title more specific, such as "Remove useless imports under XXXClass". > Remove useless import > -- > > Key: FLINK-11125 > URL: https://issues.apache.org/jira/browse/FLINK-11125 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL, Tests >Reporter: Hequn Cheng >Assignee: Hequn Cheng >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-13750) Separate HA services between client-/ and server-side
[ https://issues.apache.org/jira/browse/FLINK-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919295#comment-16919295 ] TisonKun commented on FLINK-13750: -- Hi here. I noticed that {{ClusterClient#getClusterConnectionInfo}} does not work as expected in the YARN scenario. In the YARN scenario we set the port exactly to the WebMonitor's port. {code:java} final String host = appReport.getHost(); final int rpcPort = appReport.getRpcPort(); LOG.info("Found application JobManager host name '{}' and port '{}' from supplied application id '{}'", host, rpcPort, applicationId); flinkConfiguration.setString(JobManagerOptions.ADDRESS, host); flinkConfiguration.setInteger(JobManagerOptions.PORT, rpcPort); flinkConfiguration.setString(RestOptions.ADDRESS, host); flinkConfiguration.setInteger(RestOptions.PORT, rpcPort); {code} thus the interface never works as expected. I suspect this hasn't been found before because no one relies on it. After further investigation, all usages (omitting those in logs; the rest are under scala-shell and thus the remote executor) actually communicate with the {{WebMonitor}} instead of the {{Dispatcher}}. I'd like to just remove this interface. > Separate HA services between client-/ and server-side > - > > Key: FLINK-13750 > URL: https://issues.apache.org/jira/browse/FLINK-13750 > Project: Flink > Issue Type: Improvement > Components: Command Line Client, Runtime / Coordination >Reporter: Chesnay Schepler >Assignee: TisonKun >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, we use the same {{HighAvailabilityServices}} on the client and > server. However, the client does not need several of the features that the > services currently provide (access to the blobstore or checkpoint metadata). 
> Additionally, due to how these services are set up they also require the > client to have access to the blob storage, despite it never actually being > used, which can cause issues, like FLINK-13500. > [~Tison] Would you be interested in this issue? -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (FLINK-13912) Remove ClusterClient#getClusterConnectionInfo
TisonKun created FLINK-13912: Summary: Remove ClusterClient#getClusterConnectionInfo Key: FLINK-13912 URL: https://issues.apache.org/jira/browse/FLINK-13912 Project: Flink Issue Type: Improvement Components: Command Line Client, Runtime / Coordination Affects Versions: 1.10.0 Reporter: TisonKun Fix For: 1.10.0 As discussed in FLINK-13750, we actually don't need this method any more. All the configuration needed is the WebMonitor address and port in standalone HA mode. We can safely remove this method and replace its usages with {{ClusterClient#getWebInterfaceURL}} cc [~till.rohrmann] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (FLINK-13750) Separate HA services between client-/ and server-side
[ https://issues.apache.org/jira/browse/FLINK-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919380#comment-16919380 ] TisonKun commented on FLINK-13750: -- [~till.rohrmann] Sure. FYI FLINK-13912; I'd like to mark this one as blocked by that one, because we can then avoid introducing a new endpoint at the WebMonitor. > Separate HA services between client-/ and server-side > - > > Key: FLINK-13750 > URL: https://issues.apache.org/jira/browse/FLINK-13750 > Project: Flink > Issue Type: Improvement > Components: Command Line Client, Runtime / Coordination >Reporter: Chesnay Schepler >Assignee: TisonKun >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, we use the same {{HighAvailabilityServices}} on the client and > server. However, the client does not need several of the features that the > services currently provide (access to the blobstore or checkpoint metadata). > Additionally, due to how these services are set up they also require the > client to have access to the blob storage, despite it never actually being > used, which can cause issues, like FLINK-13500. > [~Tison] Would you be interested in this issue? -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (FLINK-13912) Remove ClusterClient#getClusterConnectionInfo
[ https://issues.apache.org/jira/browse/FLINK-13912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919381#comment-16919381 ] TisonKun commented on FLINK-13912: -- If there are no more concerns, I volunteer to start the work. > Remove ClusterClient#getClusterConnectionInfo > - > > Key: FLINK-13912 > URL: https://issues.apache.org/jira/browse/FLINK-13912 > Project: Flink > Issue Type: Improvement > Components: Command Line Client, Runtime / Coordination >Affects Versions: 1.10.0 >Reporter: TisonKun >Priority: Major > Fix For: 1.10.0 > > > As discussed in FLINK-13750, we actually don't need this method any more. > All the configuration needed is the WebMonitor address and port in standalone HA > mode. We can safely remove this method and replace its usages with > {{ClusterClient#getWebInterfaceURL}} > cc [~till.rohrmann] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Comment Edited] (FLINK-13750) Separate HA services between client-/ and server-side
[ https://issues.apache.org/jira/browse/FLINK-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919295#comment-16919295 ] TisonKun edited comment on FLINK-13750 at 8/30/19 9:55 AM: --- Hi here. I noticed that {{ClusterClient#getClusterConnectionInfo}} does not work as expected in the YARN scenario. In the YARN scenario we set the port exactly to the WebMonitor's port. {code:java} final String host = appReport.getHost(); final int rpcPort = appReport.getRpcPort(); LOG.info("Found application JobManager host name '{}' and port '{}' from supplied application id '{}'", host, rpcPort, applicationId); flinkConfiguration.setString(JobManagerOptions.ADDRESS, host); flinkConfiguration.setInteger(JobManagerOptions.PORT, rpcPort); flinkConfiguration.setString(RestOptions.ADDRESS, host); flinkConfiguration.setInteger(RestOptions.PORT, rpcPort); {code} thus the interface never works as expected. I suspect this hasn't been found so far because no one relies on it. After further investigation, all usages (omitting those in logs; the rest are under scala-shell and thus the remote executor) actually communicate with the {{WebMonitor}} instead of the {{Dispatcher}}. I'd like to just remove this interface. was (Author: tison): Hi here. I noticed that {{ClusterClient#getClusterConnectionInfo}} works our of expected in YARN scenario. In YARN scenario we set the port exactly to WebMonitor's port. {code:java} final String host = appReport.getHost(); final int rpcPort = appReport.getRpcPort(); LOG.info("Found application JobManager host name '{}' and port '{}' from supplied application id '{}'", host, rpcPort, applicationId); flinkConfiguration.setString(JobManagerOptions.ADDRESS, host); flinkConfiguration.setInteger(JobManagerOptions.PORT, rpcPort); flinkConfiguration.setString(RestOptions.ADDRESS, host); flinkConfiguration.setInteger(RestOptions.PORT, rpcPort); {code} thus the interface never works as expected. I suspect this doesn't find before because no one relies on it. 
Given a further investigating all usages(omit used in logs, all under scala-shell and thus remote executor) actually communicate with {{WebMonitor}} instead of {{Dispatcher}}. I'd like to just remove this interface though. > Separate HA services between client-/ and server-side > - > > Key: FLINK-13750 > URL: https://issues.apache.org/jira/browse/FLINK-13750 > Project: Flink > Issue Type: Improvement > Components: Command Line Client, Runtime / Coordination >Reporter: Chesnay Schepler >Assignee: TisonKun >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, we use the same {{HighAvailabilityServices}} on the client and > server. However, the client does not need several of the features that the > services currently provide (access to the blobstore or checkpoint metadata). > Additionally, due to how these services are setup they also require the > client to have access to the blob storage, despite it never actually being > used, which can cause issues, like FLINK-13500. > [~Tison] Would be be interested in this issue? -- This message was sent by Atlassian Jira (v8.3.2#803003)
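The YARN code quoted in the comment above writes the same {{appReport.getRpcPort()}} into both the JobManager and REST options. A reduced sketch (plain string keys instead of Flink's {{Configuration}}; the key names mirror {{JobManagerOptions.PORT}} and {{RestOptions.PORT}} but the class is purely illustrative) shows why any connection info derived from the JM port silently points at the REST port:

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in for the YARN client-side configuration setup. */
public class YarnPortSketch {
    public static Map<String, Integer> configure(int yarnRpcPort) {
        Map<String, Integer> conf = new HashMap<>();
        // Both keys receive the same value taken from the YARN application
        // report, which in practice is the WebMonitor (REST) port, so the
        // "JM port" is never a real RPC port in this scenario.
        conf.put("jobmanager.rpc.port", yarnRpcPort);
        conf.put("rest.port", yarnRpcPort);
        return conf;
    }
}
```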
[jira] [Created] (FLINK-13929) Revisit REST & JM URL
TisonKun created FLINK-13929: Summary: Revisit REST & JM URL Key: FLINK-13929 URL: https://issues.apache.org/jira/browse/FLINK-13929 Project: Flink Issue Type: Improvement Components: Runtime / Configuration, Runtime / Coordination Reporter: TisonKun Currently we have several issues with the URL (i.e., ADDRESS and PORT) configurations of REST (WebMonitor) and JM (DispatcherRMComponent). # Client-side code should only retrieve the REST PORT, but for historical reasons we sometimes pass the JM PORT. This hasn't become a problem because some of these values are unused, while in the other cases the JM PORT is incorrectly set to the REST PORT value, so we are wrong twice but end up succeeding. # Generally speaking, going back to the design of [FLIP-6|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077], there is no concept named {{WebMonitor}}. The responsibility to communicate with the client is covered by {{Dispatcher}}. So there seems to be no argument for separating {{JobManagerOptions.ADDRESS}} and {{RestOptions.ADDRESS}}. Besides, we unfortunately use different PORTs because the REST server uses a netty connection while the JM requires an actor system, which has to bind to another port. Theoretically all messages can be passed via the same port, either by handling REST requests in the Akka scope or handling RPC in the netty scope, so that this "two-port" requirement is hopefully no longer needed. # nit: The deprecated config {{WebOptions.PORT}} is still in use at {{YarnEntrypointUtils.loadConfiguration}}. This should be easily resolved by replacing it with {{RestOptions.PORT}}. -- This message was sent by Atlassian Jira (v8.3.2#803003)
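The nit in point 3 above (replacing the deprecated {{WebOptions.PORT}} with {{RestOptions.PORT}}) amounts to a precedence rule over two config keys. A hedged sketch with a plain map follows; the key names approximate Flink's option keys and the real code goes through Flink's {{Configuration}} and its deprecated-key handling, so treat this as conceptual only.

```java
import java.util.Map;

/**
 * Illustrative port resolution: prefer the new key, fall back to the
 * deprecated one, then to a default. Not the actual Flink implementation.
 */
public class RestPortResolution {
    public static int resolve(Map<String, Integer> conf) {
        if (conf.containsKey("rest.port")) {
            return conf.get("rest.port");   // approximates RestOptions.PORT
        }
        if (conf.containsKey("web.port")) {
            return conf.get("web.port");    // approximates deprecated WebOptions.PORT
        }
        return 8081;                        // assumed default REST port
    }
}
```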
[jira] [Created] (FLINK-13932) PyTest ExecutionConfigTests.test_equals_and_hash fail
TisonKun created FLINK-13932: Summary: PyTest ExecutionConfigTests.test_equals_and_hash fail Key: FLINK-13932 URL: https://issues.apache.org/jira/browse/FLINK-13932 Project: Flink Issue Type: Bug Components: API / Python Reporter: TisonKun Not yet seen on master, but independent pull requests, even trivial fixes, fail on the same case {code:java} === FAILURES === __ ExecutionConfigTests.test_equals_and_hash ___ self = def test_equals_and_hash(self): config1 = ExecutionEnvironment.get_execution_environment().get_config() config2 = ExecutionEnvironment.get_execution_environment().get_config() self.assertEqual(config1, config2) > self.assertEqual(hash(config1), hash(config2)) E AssertionError: 897378335 != 1596606912 pyflink/common/tests/test_execution_config.py:277: AssertionError === warnings summary === .tox/py37/lib/python3.7/site-packages/py4j/java_collections.py:13 .tox/py37/lib/python3.7/site-packages/py4j/java_collections.py:13 .tox/py37/lib/python3.7/site-packages/py4j/java_collections.py:13 .tox/py37/lib/python3.7/site-packages/py4j/java_collections.py:13 .tox/py37/lib/python3.7/site-packages/py4j/java_collections.py:13 /home/travis/build/flink-ci/flink/flink-python/.tox/py37/lib/python3.7/site-packages/py4j/java_collections.py:13: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working from collections import ( {code} https://api.travis-ci.com/v3/job/229672435/log.txt https://api.travis-ci.com/v3/job/229721832/log.txt cc [~sunjincheng121] [~dian.fu] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (FLINK-13417) Bump Zookeeper to 3.5.5
[ https://issues.apache.org/jira/browse/FLINK-13417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921386#comment-16921386 ] TisonKun commented on FLINK-13417: -- [~till.rohrmann] Here is [the thread|https://lists.apache.org/x/thread.html/a35ef4da607121ac3621f7eee01b377b4d205b6bf676357bd25d949f@%3Cuser.zookeeper.apache.org%3E] as well as the upgrade FAQ [page|https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ]. It is announced that there are no intended or known breaking changes if we upgrade zk to 3.5, provided the quorum runs on 3.5. However, if we upgrade to zk 3.5 and make use of {{CreateMode.CONTAINER}}, the packet will not be recognized by a 3.4 server and will fail with {{KeeperException.UnimplementedException}}. To sum up, the upgrade should not require any changes on the Flink side, but it would require our users to run Flink against zk 3.5 servers, unless we add a switch to opt in and opt out that manually falls back from the new features introduced in 3.5. > Bump Zookeeper to 3.5.5 > --- > > Key: FLINK-13417 > URL: https://issues.apache.org/jira/browse/FLINK-13417 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.9.0 >Reporter: Konstantin Knauf >Priority: Blocker > Fix For: 1.10.0 > > > User might want to secure their Zookeeper connection via SSL. > This requires a Zookeeper version >= 3.5.1. We might as well try to bump it > to 3.5.5, which is the latest version. -- This message was sent by Atlassian Jira (v8.3.2#803003)
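The opt-in fallback mentioned above could look like the following sketch. This is an assumption about a possible approach, written against a hypothetical {{Creator}} interface and stand-in types rather than a live ZooKeeper client; in the real client a 3.4 server's rejection surfaces as {{KeeperException.UnimplementedException}}:

```java
// Sketch of a fallback from the 3.5-only CONTAINER node type to PERSISTENT.
// All types here (Creator, CreateMode, UnimplementedException) are stand-ins
// for the real ZooKeeper client API, so the sketch stays self-contained.
public class ContainerFallback {
    enum CreateMode { PERSISTENT, CONTAINER }

    // Stand-in for KeeperException.UnimplementedException from a 3.4 server.
    static class UnimplementedException extends Exception {}

    interface Creator {
        void create(String path, CreateMode mode) throws UnimplementedException;
    }

    static CreateMode createWithFallback(Creator zk, String path) {
        try {
            // Container nodes are auto-deleted once childless; only 3.5+ servers
            // understand the request.
            zk.create(path, CreateMode.CONTAINER);
            return CreateMode.CONTAINER;
        } catch (UnimplementedException e) {
            // 3.4 server: container nodes are unknown, degrade to persistent.
            try {
                zk.create(path, CreateMode.PERSISTENT);
            } catch (UnimplementedException impossible) {
                throw new IllegalStateException(impossible);
            }
            return CreateMode.PERSISTENT;
        }
    }

    public static void main(String[] args) {
        // Simulated 3.4 server that only understands PERSISTENT nodes.
        Creator oldServer = (path, mode) -> {
            if (mode == CreateMode.CONTAINER) {
                throw new UnimplementedException();
            }
        };
        System.out.println(createWithFallback(oldServer, "/flink/leader")); // prints PERSISTENT
    }
}
```

Whether such a per-call fallback is preferable to a single configuration switch is exactly the trade-off discussed above: the fallback silently loses the auto-cleanup behavior of container nodes on old servers.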
[jira] [Commented] (FLINK-13946) Remove deactivated JobSession-related code.
[ https://issues.apache.org/jira/browse/FLINK-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921951#comment-16921951 ] TisonKun commented on FLINK-13946: -- Good to hear. I volunteer to review your patch :-) > Remove deactivated JobSession-related code. > --- > > Key: FLINK-13946 > URL: https://issues.apache.org/jira/browse/FLINK-13946 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission >Affects Versions: 1.9.0 >Reporter: Kostas Kloudas >Assignee: Kostas Kloudas >Priority: Major > > This issue refers to removing the code related to job session as described in > [FLINK-2097|https://issues.apache.org/jira/browse/FLINK-2097]. The feature > is deactivated, as pointed out by the comment > [here|https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/ExecutionEnvironment.java#L285] > and it complicates the code paths related to job submission, namely the > lifecycle of the Remote and LocalExecutors. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (FLINK-13961) Remove obsolete abstraction JobExecutor(Service)
TisonKun created FLINK-13961: Summary: Remove obsolete abstraction JobExecutor(Service) Key: FLINK-13961 URL: https://issues.apache.org/jira/browse/FLINK-13961 Project: Flink Issue Type: Sub-task Components: Client / Job Submission Affects Versions: 1.10.0 Reporter: TisonKun Fix For: 1.10.0 Refer to Till's comment: The JobExecutor and the JobExecutorService were introduced to bridge between the Flip-6 MiniCluster and the legacy FlinkMiniCluster. They should be obsolete now and could be removed if needed. Ideally we should make use of {{MiniClusterClient}} for submission, but we have some tests based on MiniCluster in flink-runtime or elsewhere that do not have a dependency on flink-client, and it is unclear whether moving {{MiniClusterClient}} to flink-runtime would be reasonable. Thus I'd prefer to keep {{executeJobBlocking}} for now and defer the possible refactor. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (FLINK-13417) Bump Zookeeper to 3.5.5
[ https://issues.apache.org/jira/browse/FLINK-13417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922970#comment-16922970 ] TisonKun commented on FLINK-13417: -- Yes [~till.rohrmann]. I built locally with zk 3.5 and no compile errors were reported; when I fired CI it passed almost all builds, see also https://travis-ci.org/TisonKun/flink/builds/580757901, which reported failures in {{HBaseConnectorITCase}} when the HBase MiniCluster started the MiniZooKeeperCluster. It seems like a problem in the testing class implementation. Maybe [~carp84] can provide some input here.
{code:java}
java.io.IOException: Waiting for startup of standalone server
	at org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.startup(MiniZooKeeperCluster.java:261)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniZKCluster(HBaseTestingUtility.java:814)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniZKCluster(HBaseTestingUtility.java:784)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1041)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:917)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:899)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:881)
	at org.apache.flink.addons.hbase.util.HBaseTestingClusterAutoStarter.setUp(HBaseTestingClusterAutoStarter.java:147)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
{code}
> Bump Zookeeper to 3.5.5 > --- > > Key: FLINK-13417 > URL: https://issues.apache.org/jira/browse/FLINK-13417 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.9.0 >Reporter: Konstantin Knauf >Priority: Blocker > Fix For: 1.10.0 > > > User might want to secure their Zookeeper connection via SSL. > This requires a Zookeeper version >= 3.5.1. We might as well try to bump it > to 3.5.5, which is the latest version. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (FLINK-13961) Remove obsolete abstraction JobExecutor(Service)
[ https://issues.apache.org/jira/browse/FLINK-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923003#comment-16923003 ] TisonKun commented on FLINK-13961: -- Hi [~kkl0u], I'd like to work on this issue. Please assign it to me if there are no further concerns. I'm going to start once FLINK-13946 is resolved, since there would be several conflicts and FLINK-13946 is almost done. > Remove obsolete abstraction JobExecutor(Service) > - > > Key: FLINK-13961 > URL: https://issues.apache.org/jira/browse/FLINK-13961 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Affects Versions: 1.10.0 >Reporter: TisonKun >Priority: Major > Fix For: 1.10.0 > > > Refer to Till's comment > The JobExecutor and the JobExecutorService were introduced to bridge > between the Flip-6 MiniCluster and the legacy FlinkMiniCluster. They should > be obsolete now and could be removed if needed. > Ideally we should make use of {{MiniClusterClient}} for submission, but we > have some tests based on MiniCluster in flink-runtime or elsewhere that do > not have a dependency on flink-client, and it is unclear whether moving > {{MiniClusterClient}} to flink-runtime would be reasonable. Thus I'd prefer > to keep {{executeJobBlocking}} for now and defer the possible refactor. -- This message was sent by Atlassian Jira (v8.3.2#803003)