[jira] [Commented] (FLINK-10508) Port JobManagerITCase to new code base

2018-10-08 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641583#comment-16641583
 ] 

TisonKun commented on FLINK-10508:
--

This test corresponds not to {{JobMasterITCase}} but to {{DispatcherTest}}. It 
seems to mainly test multiple types of jobs and their behavior.

> Port JobManagerITCase to new code base
> --
>
> Key: FLINK-10508
> URL: https://issues.apache.org/jira/browse/FLINK-10508
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> Port {{JobManagerITCase}} to new code base.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10516) YarnApplicationMasterRunner fail to initialize FileSystem with correct Flink Configuration during setup

2018-10-09 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642911#comment-16642911
 ] 

TisonKun commented on FLINK-10516:
--

[~suez1224] What about the refactor? {{YarnApplicationMasterRunner}} looks like 
a class from the legacy (non-FLIP-6) code base, and we currently have 
FLINK-10392, so I doubt whether we should fix it for 1.7.0.

cc [~till.rohrmann]

> YarnApplicationMasterRunner fail to initialize FileSystem with correct Flink 
> Configuration during setup
> ---
>
> Key: FLINK-10516
> URL: https://issues.apache.org/jira/browse/FLINK-10516
> Project: Flink
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.4.0, 1.5.0, 1.6.0, 1.7.0
>Reporter: Shuyi Chen
>Assignee: Shuyi Chen
>Priority: Blocker
> Fix For: 1.7.0, 1.6.2, 1.5.5
>
>
> Will add a fix, and refactor YarnApplicationMasterRunner to add a unit test 
> to prevent future regression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10527) Cleanup constant isNewMode in YarnTestBase

2018-10-10 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645927#comment-16645927
 ] 

TisonKun commented on FLINK-10527:
--

How about adding this issue as a subtask of FLINK-10392?

> Cleanup constant isNewMode in YarnTestBase
> --
>
> Key: FLINK-10527
> URL: https://issues.apache.org/jira/browse/FLINK-10527
> Project: Flink
>  Issue Type: Bug
>  Components: YARN
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>
> This seems to be a residual problem from FLINK-10396, where the constant was 
> set to true. Currently it has three usage scenarios:
> 1. In an assumption that can now never hold:
> {code:java}
> assumeTrue("The new mode does not start TMs upfront.", !isNewMode);
> {code}
> 2. In {{if (!isNewMode)}} blocks, whose logic is never invoked; these blocks 
> can be removed.
> 3. In {{if (isNewMode)}} blocks, which are always entered; the if statements 
> can be removed (see the sketch below).
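>
> For illustration, a minimal before/after sketch of case 3 (the method call 
> inside the guard is invented for the example):
> {code:java}
> // before: the guard is always true because isNewMode == true
> if (isNewMode) {
>     startFlinkCluster(configuration);
> }
>
> // after: drop the dead guard and keep the body
> startFlinkCluster(configuration);
> {code}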



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10508) Port JobManagerITCase to new code base

2018-10-11 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647375#comment-16647375
 ] 

TisonKun commented on FLINK-10508:
--

checklist:

- JobManagerITCase.The JobManager actor must handle jobs when not enough slots
- JobManagerITCase.The JobManager actor must support immediate scheduling of a single vertex
- JobManagerITCase.The JobManager actor must support queued scheduling of a single vertex
- JobManagerITCase.The JobManager actor must support forward jobs
- JobManagerITCase.The JobManager actor must support bipartite job
- JobManagerITCase.The JobManager actor must support two input job failing edge mismatch
- JobManagerITCase.The JobManager actor must support two input job
- JobManagerITCase.The JobManager actor must support scheduling all at once
- JobManagerITCase.The JobManager actor must handle job with a failing sender vertex
- JobManagerITCase.The JobManager actor must handle job with an occasionally failing sender vertex
- JobManagerITCase.The JobManager actor must handle job with a failing receiver vertex
- JobManagerITCase.The JobManager actor must handle job with all vertices failing during instantiation
- JobManagerITCase.The JobManager actor must handle job with some vertices failing during instantiation
- JobManagerITCase.The JobManager actor must check that all job vertices have completed the call to finalizeOnMaster before the job completes
- JobManagerITCase.The JobManager actor must remove execution graphs when the client ends the session explicitly
- JobManagerITCase.The JobManager actor must remove execution graphs when the client's session times out
- JobManagerITCase.The JobManager actor must handle trigger savepoint response for non-existing job
- JobManagerITCase.The JobManager actor must handle trigger savepoint response for job with disabled checkpointing
- JobManagerITCase.The JobManager actor must handle trigger savepoint response after trigger savepoint failure
- JobManagerITCase.The JobManager actor must handle failed savepoint triggering
- JobManagerITCase.The JobManager actor must handle trigger savepoint response after succeeded savepoint future

> Port JobManagerITCase to new code base
> --
>
> Key: FLINK-10508
> URL: https://issues.apache.org/jira/browse/FLINK-10508
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> Port {{JobManagerITCase}} to new code base.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10540) Remove legacy LocalFlinkMiniCluster

2018-10-12 Thread TisonKun (JIRA)
TisonKun created FLINK-10540:


 Summary: Remove legacy LocalFlinkMiniCluster
 Key: FLINK-10540
 URL: https://issues.apache.org/jira/browse/FLINK-10540
 Project: Flink
  Issue Type: Sub-task
Affects Versions: 1.7.0
Reporter: TisonKun
 Fix For: 1.7.0


{{LocalFlinkMiniCluster}} is based on the legacy cluster mode and should no 
longer be used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10541) Removed unused code of LocalFlinkMiniCluster

2018-10-12 Thread TisonKun (JIRA)
TisonKun created FLINK-10541:


 Summary: Removed unused code of LocalFlinkMiniCluster
 Key: FLINK-10541
 URL: https://issues.apache.org/jira/browse/FLINK-10541
 Project: Flink
  Issue Type: Task
  Components: Tests
Affects Versions: 1.7.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.7.0


{{TestBaseUtils#startCluster}} depends on the legacy class 
{{LocalFlinkMiniCluster}}; however, the method itself is unused. Let's remove 
it and thereby help confirm that we can directly remove 
{{LocalFlinkMiniCluster}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10541) Removed unused code of LocalFlinkMiniCluster

2018-10-12 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10541:
-
Issue Type: Sub-task  (was: Task)
Parent: FLINK-10392

> Removed unused code of LocalFlinkMiniCluster
> 
>
> Key: FLINK-10541
> URL: https://issues.apache.org/jira/browse/FLINK-10541
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> {{TestBaseUtils#startCluster}} depends on the legacy class 
> {{LocalFlinkMiniCluster}}; however, the method itself is unused. Let's remove 
> it and thereby help confirm that we can directly remove 
> {{LocalFlinkMiniCluster}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10541) Removed unused code depends on LocalFlinkMiniCluster

2018-10-12 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10541:
-
Summary: Removed unused code depends on LocalFlinkMiniCluster  (was: 
Removed unused code of LocalFlinkMiniCluster)

> Removed unused code depends on LocalFlinkMiniCluster
> 
>
> Key: FLINK-10541
> URL: https://issues.apache.org/jira/browse/FLINK-10541
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> {{TestBaseUtils#startCluster}} depends on the legacy class 
> {{LocalFlinkMiniCluster}}; however, the method itself is unused. Let's remove 
> it and thereby help confirm that we can directly remove 
> {{LocalFlinkMiniCluster}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10545) Remove JobManagerLeaderSessionIDITCase

2018-10-14 Thread TisonKun (JIRA)
TisonKun created FLINK-10545:


 Summary: Remove JobManagerLeaderSessionIDITCase
 Key: FLINK-10545
 URL: https://issues.apache.org/jira/browse/FLINK-10545
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.7.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.7.0


{{JobManagerLeaderSessionIDITCase.scala}} is based on the legacy mode; since we 
now have {{FencingToken}}, I think we need not maintain or port this test. 
Simply remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10546) Remove StandaloneMiniCluster

2018-10-14 Thread TisonKun (JIRA)
TisonKun created FLINK-10546:


 Summary: Remove StandaloneMiniCluster
 Key: FLINK-10546
 URL: https://issues.apache.org/jira/browse/FLINK-10546
 Project: Flink
  Issue Type: Task
  Components: JobManager
Affects Versions: 1.7.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.7.0


Before doing as the title says, I want to check: is {{StandaloneMiniCluster}} 
still in use?
 IIRC there was once a deprecation of `start-local.sh`, but I don’t know if it 
is relevant.
 Further, this class seems unused in all other places. Since it depends on 
legacy mode, I wonder whether we can *JUST* remove it.

 

cc [~till.rohrmann] and [~Zentol]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10546) Remove StandaloneMiniCluster

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10546:
-
Description: 
Before doing as the title says, I want to check: is {{StandaloneMiniCluster}} 
still in use?
 IIRC there was once a deprecation of {{start-local.sh}}, but I don’t know if 
it is relevant.
 Further, this class seems unused in all other places. Since it depends on 
legacy mode, I wonder whether we can *JUST* remove it.

 

cc [~till.rohrmann] and [~Zentol]

  was:
Before doing as the title says, I want to check: is {{StandaloneMiniCluster}} 
still in use?
 IIRC there was once a deprecation of `start-local.sh`, but I don’t know if it 
is relevant.
 Further, this class seems unused in all other places. Since it depends on 
legacy mode, I wonder whether we can *JUST* remove it.

 

cc [~till.rohrmann] and [~Zentol]


> Remove StandaloneMiniCluster
> 
>
> Key: FLINK-10546
> URL: https://issues.apache.org/jira/browse/FLINK-10546
> Project: Flink
>  Issue Type: Task
>  Components: JobManager
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> Before doing as the title says, I want to check: is {{StandaloneMiniCluster}} 
> still in use?
>  IIRC there was once a deprecation of {{start-local.sh}}, but I don’t know if 
> it is relevant.
>  Further, this class seems unused in all other places. Since it depends on 
> legacy mode, I wonder whether we can *JUST* remove it.
>  
> cc [~till.rohrmann] and [~Zentol]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10540) Remove legacy FlinkMiniCluster

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10540:
-
Summary: Remove legacy FlinkMiniCluster  (was: Remove legacy 
LocalFlinkMiniCluster)

> Remove legacy FlinkMiniCluster
> --
>
> Key: FLINK-10540
> URL: https://issues.apache.org/jira/browse/FLINK-10540
> Project: Flink
>  Issue Type: Sub-task
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> {{LocalFlinkMiniCluster}} is based on the legacy cluster mode and should no 
> longer be used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10540) Remove legacy FlinkMiniCluster

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10540:
-
Description: {{FlinkMiniCluster}} is based on the legacy cluster mode and 
should no longer be used.  (was: {{LocalFlinkMiniCluster}} is based on the 
legacy cluster mode and should no longer be used.)

> Remove legacy FlinkMiniCluster
> --
>
> Key: FLINK-10540
> URL: https://issues.apache.org/jira/browse/FLINK-10540
> Project: Flink
>  Issue Type: Sub-task
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> {{FlinkMiniCluster}} is based on the legacy cluster mode and should no longer 
> be used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10549) Remove LegacyJobRetrievalITCase

2018-10-14 Thread TisonKun (JIRA)
TisonKun created FLINK-10549:


 Summary: Remove LegacyJobRetrievalITCase
 Key: FLINK-10549
 URL: https://issues.apache.org/jira/browse/FLINK-10549
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.7.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.7.0


We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10549) Remove Legacy* Tests

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10549:
-
Summary: Remove Legacy* Tests  (was: Remove LegacyJobRetrievalITCase)

> Remove Legacy* Tests
> 
>
> Key: FLINK-10549
> URL: https://issues.apache.org/jira/browse/FLINK-10549
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> We already have {{JobRetrievalITCase}} and all test cases of 
> {{LegacyJobRetrievalITCase}} are covered. Simply remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10549) Remove Legacy* Tests

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10549:
-
Description: 
This Jira tracks the removal of tests that start with "LegacyXXX" and are 
covered by a test named "XXX" on the same topic.

1. We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.

  was:We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.


> Remove Legacy* Tests
> 
>
> Key: FLINK-10549
> URL: https://issues.apache.org/jira/browse/FLINK-10549
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> This Jira tracks the removal of tests that start with "LegacyXXX" and are 
> covered by a test named "XXX" on the same topic.
> 1. We already have {{JobRetrievalITCase}} and all test cases of 
> {{LegacyJobRetrievalITCase}} are covered. Simply remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10549) Remove Legacy* Tests

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10549:
-
Description: 
This Jira tracks the removal of tests that start with "LegacyXXX" and are 
covered by a test named "XXX" on the same topic.
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.

  was:
This Jira tracks the removal of tests that start with "LegacyXXX" and are 
covered by a test named "XXX" on the same topic.

1. We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.


> Remove Legacy* Tests
> 
>
> Key: FLINK-10549
> URL: https://issues.apache.org/jira/browse/FLINK-10549
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> This Jira tracks the removal of tests that start with "LegacyXXX" and are 
> covered by a test named "XXX" on the same topic.
>  # We already have {{JobRetrievalITCase}} and all test cases of 
> {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
>  # We already have {{AccumulatorLiveITCase}} and all test cases of 
> {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10549) Remove Legacy* Tests

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10549:
-
Description: 
This Jira tracks the removal of tests that start with "LegacyXXX" and are 
covered by a test named "XXX" on the same topic.
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.

  was:
This Jira tracks the removal of tests that start with "LegacyXXX" and are 
covered by a test named "XXX" on the same topic.
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.


> Remove Legacy* Tests
> 
>
> Key: FLINK-10549
> URL: https://issues.apache.org/jira/browse/FLINK-10549
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> This Jira tracks the removal of tests that start with "LegacyXXX" and are 
> covered by a test named "XXX" on the same topic.
>  # We already have {{JobRetrievalITCase}} and all test cases of 
> {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
>  # We already have {{AccumulatorLiveITCase}} and all test cases of 
> {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
>  # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test 
> cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply 
> remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10549:
-
Summary: Remove Legacy* Tests based on legacy mode  (was: Remove Legacy* 
Tests)

> Remove Legacy* Tests based on legacy mode
> -
>
> Key: FLINK-10549
> URL: https://issues.apache.org/jira/browse/FLINK-10549
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> This Jira tracks the removal of tests that start with "LegacyXXX" and are 
> covered by a test named "XXX" on the same topic.
>  # We already have {{JobRetrievalITCase}} and all test cases of 
> {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
>  # We already have {{AccumulatorLiveITCase}} and all test cases of 
> {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
>  # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test 
> cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply 
> remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10549:
-
Description: 
This Jira tracks the removal of tests that are based on legacy mode, start 
with "LegacyXXX", and are covered by a test named "XXX".
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.

  was:
This Jira tracks the removal of tests that are based on legacy mode, start 
with "LegacyXXX", and are covered by a test named "XXX" on the same topic.
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.


> Remove Legacy* Tests based on legacy mode
> -
>
> Key: FLINK-10549
> URL: https://issues.apache.org/jira/browse/FLINK-10549
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> This Jira tracks the removal of tests that are based on legacy mode, start 
> with "LegacyXXX", and are covered by a test named "XXX".
>  # We already have {{JobRetrievalITCase}} and all test cases of 
> {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
>  # We already have {{AccumulatorLiveITCase}} and all test cases of 
> {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
>  # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test 
> cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply 
> remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10549:
-
Description: 
This Jira tracks the removal of tests that are based on legacy mode, start 
with "LegacyXXX", and are covered by a test named "XXX".
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.
 # We already have {{ClassLoaderITCase}} and all test cases of 
{{LegacyClassLoaderITCase}} are covered. Simply remove it.

  was:
This Jira tracks the removal of tests that are based on legacy mode, start 
with "LegacyXXX", and are covered by a test named "XXX".
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.


> Remove Legacy* Tests based on legacy mode
> -
>
> Key: FLINK-10549
> URL: https://issues.apache.org/jira/browse/FLINK-10549
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> This Jira tracks the removal of tests that are based on legacy mode, start 
> with "LegacyXXX", and are covered by a test named "XXX".
>  # We already have {{JobRetrievalITCase}} and all test cases of 
> {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
>  # We already have {{AccumulatorLiveITCase}} and all test cases of 
> {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
>  # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test 
> cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply 
> remove it.
>  # We already have {{ClassLoaderITCase}} and all test cases of 
> {{LegacyClassLoaderITCase}} are covered. Simply remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10549:
-
Description: 
This Jira tracks the removal of tests that are based on legacy mode, start 
with "LegacyXXX", and are covered by a test named "XXX".
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.
 # We already have {{ClassLoaderITCase}} and all test cases of 
{{LegacyClassLoaderITCase}} are covered. Simply remove it.
 # We already have {{SlotCountExceedingParallelismTest}} and all test cases of 
{{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.

  was:
This Jira tracks the removal of tests that are based on legacy mode, start 
with "LegacyXXX", and are covered by a test named "XXX".
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.
 # We already have {{ClassLoaderITCase}} and all test cases of 
{{LegacyClassLoaderITCase}} are covered. Simply remove it.


> Remove Legacy* Tests based on legacy mode
> -
>
> Key: FLINK-10549
> URL: https://issues.apache.org/jira/browse/FLINK-10549
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> This Jira tracks the removal of tests that are based on legacy mode, start 
> with "LegacyXXX", and are covered by a test named "XXX".
>  # We already have {{JobRetrievalITCase}} and all test cases of 
> {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
>  # We already have {{AccumulatorLiveITCase}} and all test cases of 
> {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
>  # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test 
> cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply 
> remove it.
>  # We already have {{ClassLoaderITCase}} and all test cases of 
> {{LegacyClassLoaderITCase}} are covered. Simply remove it.
>  # We already have {{SlotCountExceedingParallelismTest}} and all test cases 
> of {{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10549:
-
Description: 
This Jira tracks the removal of tests that are based on legacy mode, start 
with "LegacyXXX", and are covered by a test named "XXX".
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.
 # We already have {{ClassLoaderITCase}} and all test cases of 
{{LegacyClassLoaderITCase}} are covered. Simply remove it.
 # We already have {{SlotCountExceedingParallelismTest}} and all test cases of 
{{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.
 # We already have {{ScheduleOrUpdateConsumersTest}} and all test cases of 
{{LegacyScheduleOrUpdateConsumersTest}} are covered. Simply remove it.


  was:
This Jira tracks the removal of tests that are based on legacy mode, start 
with "LegacyXXX", and are covered by a test named "XXX".
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.
 # We already have {{ClassLoaderITCase}} and all test cases of 
{{LegacyClassLoaderITCase}} are covered. Simply remove it.
 # We already have {{SlotCountExceedingParallelismTest}} and all test cases of 
{{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.


> Remove Legacy* Tests based on legacy mode
> -
>
> Key: FLINK-10549
> URL: https://issues.apache.org/jira/browse/FLINK-10549
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> This Jira tracks the removal of tests that are based on legacy mode, start 
> with "LegacyXXX", and are covered by a test named "XXX".
>  # We already have {{JobRetrievalITCase}} and all test cases of 
> {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
>  # We already have {{AccumulatorLiveITCase}} and all test cases of 
> {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
>  # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test 
> cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply 
> remove it.
>  # We already have {{ClassLoaderITCase}} and all test cases of 
> {{LegacyClassLoaderITCase}} are covered. Simply remove it.
>  # We already have {{SlotCountExceedingParallelismTest}} and all test cases 
> of {{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.
>  # We already have {{ScheduleOrUpdateConsumersTest}} and all test cases of 
> {{LegacyScheduleOrUpdateConsumersTest}} are covered. Simply remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10549:
-
Description: 
This Jira tracks the removal of tests that are based on legacy mode, start 
with "LegacyXXX", and are covered by a test named "XXX".
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.
 # We already have {{ClassLoaderITCase}} and all test cases of 
{{LegacyClassLoaderITCase}} are covered. Simply remove it.
 # We already have {{SlotCountExceedingParallelismTest}} and all test cases of 
{{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.
 # We already have {{ScheduleOrUpdateConsumersTest}} and all test cases of 
{{LegacyScheduleOrUpdateConsumersTest}} are covered. Simply remove it.
 # We already have {{PartialConsumePipelinedResultTest}} and all test cases of 
{{LegacyPartialConsumePipelinedResultTest}} are covered. Simply remove it.


  was:
This Jira tracks the removal of tests that are based on legacy mode, start 
with "LegacyXXX", and are covered by a test named "XXX".
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.
 # We already have {{ClassLoaderITCase}} and all test cases of 
{{LegacyClassLoaderITCase}} are covered. Simply remove it.
 # We already have {{SlotCountExceedingParallelismTest}} and all test cases of 
{{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.
 # We already have {{ScheduleOrUpdateConsumersTest}} and all test cases of 
{{LegacyScheduleOrUpdateConsumersTest}} are covered. Simply remove it.



> Remove Legacy* Tests based on legacy mode
> -
>
> Key: FLINK-10549
> URL: https://issues.apache.org/jira/browse/FLINK-10549
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> This Jira tracks the removal of tests that are based on legacy mode, start 
> with "LegacyXXX", and are covered by a test named "XXX".
>  # We already have {{JobRetrievalITCase}} and all test cases of 
> {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
>  # We already have {{AccumulatorLiveITCase}} and all test cases of 
> {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
>  # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test 
> cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply 
> remove it.
>  # We already have {{ClassLoaderITCase}} and all test cases of 
> {{LegacyClassLoaderITCase}} are covered. Simply remove it.
>  # We already have {{SlotCountExceedingParallelismTest}} and all test cases 
> of {{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.
>  # We already have {{ScheduleOrUpdateConsumersTest}} and all test cases of 
> {{LegacyScheduleOrUpdateConsumersTest}} are covered. Simply remove it.
>  # We already have {{PartialConsumePipelinedResultTest}} and all test cases 
> of {{LegacyPartialConsumePipelinedResultTest}} are covered. Simply remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10549:
-
Description: 
This Jira tracks the removal of tests that are based on legacy mode, start 
with "LegacyXXX", and are covered by a test named "XXX" on the same topic.
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.

  was:
This Jira tracks the removal of tests that start with "LegacyXXX" and are 
covered by a test named "XXX" on the same topic.
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.


> Remove Legacy* Tests based on legacy mode
> -
>
> Key: FLINK-10549
> URL: https://issues.apache.org/jira/browse/FLINK-10549
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> This Jira tracks the removal of tests that are based on legacy mode, start 
> with "LegacyXXX", and are covered by a test named "XXX" on the same topic.
>  # We already have {{JobRetrievalITCase}} and all test cases of 
> {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
>  # We already have {{AccumulatorLiveITCase}} and all test cases of 
> {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
>  # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test 
> cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply 
> remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10549) Remove Legacy* Tests based on legacy mode

2018-10-14 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10549:
-
Description: 
This Jira tracks the removal of tests that are based on legacy mode, start 
with "LegacyXXX", and are covered by a test named "XXX".
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.
 # We already have {{ClassLoaderITCase}} and all test cases of 
{{LegacyClassLoaderITCase}} are covered. Simply remove it.
 # We already have {{SlotCountExceedingParallelismTest}} and all test cases of 
{{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.
 # We already have {{ScheduleOrUpdateConsumersTest}} and all test cases of 
{{LegacyScheduleOrUpdateConsumersTest}} are covered. Simply remove it.
 # We already have {{PartialConsumePipelinedResultTest}} and all test cases of 
{{LegacyPartialConsumePipelinedResultTest}} are covered. Simply remove it.
 # We already have {{AvroExternalJarProgramITCase}} and all test cases of 
{{LegacyAvroExternalJarProgramITCase}} are covered. Simply remove it.

  was:
This Jira tracks the removal of tests that are based on legacy mode, start 
with "LegacyXXX", and are covered by a test named "XXX".
 # We already have {{JobRetrievalITCase}} and all test cases of 
{{LegacyJobRetrievalITCase}} are covered. Simply remove it.
 # We already have {{AccumulatorLiveITCase}} and all test cases of 
{{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
 # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test cases 
of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply remove 
it.
 # We already have {{ClassLoaderITCase}} and all test cases of 
{{LegacyClassLoaderITCase}} are covered. Simply remove it.
 # We already have {{SlotCountExceedingParallelismTest}} and all test cases of 
{{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.
 # We already have {{ScheduleOrUpdateConsumersTest}} and all test cases of 
{{LegacyScheduleOrUpdateConsumersTest}} are covered. Simply remove it.
 # We already have {{PartialConsumePipelinedResultTest}} and all test cases of 
{{LegacyPartialConsumePipelinedResultTest}} are covered. Simply remove it.



> Remove Legacy* Tests based on legacy mode
> -
>
> Key: FLINK-10549
> URL: https://issues.apache.org/jira/browse/FLINK-10549
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> This Jira tracks the removal of tests that are based on legacy mode, start 
> with "LegacyXXX", and are covered by a test named "XXX".
>  # We already have {{JobRetrievalITCase}} and all test cases of 
> {{LegacyJobRetrievalITCase}} are covered. Simply remove it.
>  # We already have {{AccumulatorLiveITCase}} and all test cases of 
> {{LegacyAccumulatorLiveITCase}} are covered. Simply remove it.
>  # We already have {{TaskCancelAsyncProducerConsumerITCase}} and all test 
> cases of {{LegacyTaskCancelAsyncProducerConsumerITCase}} are covered. Simply 
> remove it.
>  # We already have {{ClassLoaderITCase}} and all test cases of 
> {{LegacyClassLoaderITCase}} are covered. Simply remove it.
>  # We already have {{SlotCountExceedingParallelismTest}} and all test cases 
> of {{LegacySlotCountExceedingParallelismTest}} are covered. Simply remove it.
>  # We already have {{ScheduleOrUpdateConsumersTest}} and all test cases of 
> {{LegacyScheduleOrUpdateConsumersTest}} are covered. Simply remove it.
>  # We already have {{PartialConsumePipelinedResultTest}} and all test cases 
> of {{LegacyPartialConsumePipelinedResultTest}} are covered. Simply remove it.
>  # We already have {{AvroExternalJarProgramITCase}} and all test cases of 
> {{LegacyAvroExternalJarProgramITCase}} are covered. Simply remove it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10546) Remove StandaloneMiniCluster

2018-10-15 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10546:
-
Issue Type: Sub-task  (was: Task)
Parent: FLINK-10392

> Remove StandaloneMiniCluster
> 
>
> Key: FLINK-10546
> URL: https://issues.apache.org/jira/browse/FLINK-10546
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>
> Before doing as the title says, I want to check: is {{StandaloneMiniCluster}} 
> still in use?
>  IIRC there was once a deprecation of {{start-local.sh}}, but I don’t know if 
> it is relevant.
>  Further, this class seems unused in all other places. Since it depends on 
> legacy mode, I wonder whether we can *JUST* remove it.
>  
> cc [~till.rohrmann] and [~Zentol]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10436) Example config uses deprecated key jobmanager.rpc.address

2018-10-15 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649891#comment-16649891
 ] 

TisonKun commented on FLINK-10436:
--

[~till.rohrmann] Sounds reasonable. It is a bit strange if jobmanager can 
somehow be replaced by rest. I did a bisect; the commit is 
https://github.com/apache/flink/commit/efd7336fa693a9f82b9ecfb5d81c0ef747ab7801 
which was authored by [~gjy] and reviewed by [~till.rohrmann]. Thus I involve 
[~gjy] here to learn the original intent. If we go in the same direction as 
Till's thought above, we can revert the 
{{.withDeprecatedKeys(JobManagerOptions.ADDRESS.key())}} line.
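
For reference, a sketch of roughly how the option definition in question is 
shaped with the 1.7-era untyped {{ConfigOptions}} builder (shape assumed from 
memory, not the verbatim source); reverting the fix would mean dropping the 
final line:

{code:java}
public static final ConfigOption<String> ADDRESS =
    ConfigOptions.key("rest.address")
        .noDefaultValue()
        .withDeprecatedKeys(JobManagerOptions.ADDRESS.key());
{code}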

> Example config uses deprecated key jobmanager.rpc.address
> -
>
> Key: FLINK-10436
> URL: https://issues.apache.org/jira/browse/FLINK-10436
> Project: Flink
>  Issue Type: Sub-task
>  Components: Startup Shell Scripts
>Affects Versions: 1.7.0
>Reporter: Ufuk Celebi
>Assignee: TisonKun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>
> The example {{flink-conf.yaml}} shipped as part of the Flink distribution 
> (https://github.com/apache/flink/blob/master/flink-dist/src/main/resources/flink-conf.yaml)
>  has the following entry:
> {code}
> jobmanager.rpc.address: localhost
> {code}
> When using this key, the following deprecation warning is logged.
> {code}
> 2018-09-26 12:01:46,608 WARN  org.apache.flink.configuration.Configuration
>   - Config uses deprecated configuration key 
> 'jobmanager.rpc.address' instead of proper key 'rest.address'
> {code}
> The example config should not use deprecated config options.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10555) Port AkkaSslITCase to new code base

2018-10-15 Thread TisonKun (JIRA)
TisonKun created FLINK-10555:


 Summary: Port AkkaSslITCase to new code base
 Key: FLINK-10555
 URL: https://issues.apache.org/jira/browse/FLINK-10555
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.7.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.7.0


Port {{AkkaSslITCase}} to new code base, as {{MiniClusterSslITCase}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-2592) Rework of FlinkMiniCluster

2018-10-15 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun closed FLINK-2592.
---
Resolution: Won't Fix

In fact, we want to remove {{FlinkMiniCluster}} since it is based on the 
deprecated legacy mode.

> Rework of FlinkMiniCluster
> --
>
> Key: FLINK-2592
> URL: https://issues.apache.org/jira/browse/FLINK-2592
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Coordination
>Reporter: Till Rohrmann
>Priority: Minor
>
> Over the time, the {{FlinkMiniCluster}} has become quite complex to support 
> all different execution modes (batch vs. streaming, standalone vs. ha with 
> ZooKeeper, single {{ActorSystem}} vs. multiple {{ActorSystems}}, etc.). There 
> is no consistent way of configuring all these options. Therefore it would be 
> good to rework the {{FlinkMiniCluster}} to avoid configuring it via the 
> {{Configuration}} object and instead to use explicit options which can be 
> turned on and off.
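>
> For illustration, a hypothetical shape of such explicit, builder-style 
> configuration (all names below are invented for the example, not an existing 
> API):
> {code:java}
> // turn options on and off explicitly instead of threading a Configuration through
> MiniClusterSettings settings = MiniClusterSettings.builder()
>     .numTaskManagers(2)
>     .numSlotsPerTaskManager(4)
>     .zooKeeperHighAvailability(false)
>     .useSingleActorSystem(true)
>     .build();
> {code}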



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10559) Remove LegacyLocalStreamEnvironment

2018-10-15 Thread TisonKun (JIRA)
TisonKun created FLINK-10559:


 Summary: Remove LegacyLocalStreamEnvironment
 Key: FLINK-10559
 URL: https://issues.apache.org/jira/browse/FLINK-10559
 Project: Flink
  Issue Type: Sub-task
  Components: Local Runtime
Affects Versions: 1.7.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.7.0


See the corresponding GitHub pull request for the diagnosis; basically, this 
class is not in use any more.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10540) Remove legacy FlinkMiniCluster

2018-10-16 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651288#comment-16651288
 ] 

TisonKun commented on FLINK-10540:
--

Hi [~dangdangdang], feel free to take it over.

FYI, this is in effect an umbrella issue, since there are many dependencies on 
{{FlinkMiniCluster}} its subclasses {{LocalFlinkMiniCluster}} and 
{{TestingMiniCluster}}. Eventually we will remove {{FlinkMiniCluster}}, but the 
process involves more than just the removal.

> Remove legacy FlinkMiniCluster
> --
>
> Key: FLINK-10540
> URL: https://issues.apache.org/jira/browse/FLINK-10540
> Project: Flink
>  Issue Type: Sub-task
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> {{FlinkMiniCluster}} is based on the legacy cluster mode and should no longer 
> be used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-10540) Remove legacy FlinkMiniCluster

2018-10-16 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651288#comment-16651288
 ] 

TisonKun edited comment on FLINK-10540 at 10/16/18 8:04 AM:


Hi [~dangdangdang], feel free to take it over.

FYI, this is in effect an umbrella issue, since there are many dependencies on 
{{FlinkMiniCluster}} and its subclasses {{LocalFlinkMiniCluster}} and 
{{TestingCluster}}. Eventually we will remove {{FlinkMiniCluster}}, but the 
process involves more than just the removal.


was (Author: tison):
Hi [~dangdangdang], feel free to take it over.

FYI, this is in effect an umbrella issue, since there are many dependencies on 
{{FlinkMiniCluster}} and its subclasses {{LocalFlinkMiniCluster}} and 
{{TestingMiniCluster}}. Eventually we will remove {{FlinkMiniCluster}}, but the 
process involves more than just the removal.

> Remove legacy FlinkMiniCluster
> --
>
> Key: FLINK-10540
> URL: https://issues.apache.org/jira/browse/FLINK-10540
> Project: Flink
>  Issue Type: Sub-task
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> {{FlinkMiniCluster}} is based on the legacy cluster mode and should no longer 
> be used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-10540) Remove legacy FlinkMiniCluster

2018-10-16 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651288#comment-16651288
 ] 

TisonKun edited comment on FLINK-10540 at 10/16/18 8:04 AM:


Hi [~dangdangdang], feel free to take it over.

FYI, this is in effect an umbrella issue, since there are many dependencies on 
{{FlinkMiniCluster}} and its subclasses {{LocalFlinkMiniCluster}} and 
{{TestingMiniCluster}}. Eventually we will remove {{FlinkMiniCluster}}, but the 
process involves more than just the removal.


was (Author: tison):
Hi [~dangdangdang], feel free to take it over.

FYI, this is in effect an umbrella issue, since there are many dependencies on 
{{FlinkMiniCluster}} its subclasses {{LocalFlinkMiniCluster}} and 
{{TestingMiniCluster}}. Eventually we will remove {{FlinkMiniCluster}}, but the 
process involves more than just the removal.

> Remove legacy FlinkMiniCluster
> --
>
> Key: FLINK-10540
> URL: https://issues.apache.org/jira/browse/FLINK-10540
> Project: Flink
>  Issue Type: Sub-task
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> {{FlinkMiniCluster}} is based on the legacy cluster mode and should no longer 
> be used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10565) Refactor SchedulerTestBase

2018-10-16 Thread TisonKun (JIRA)
TisonKun created FLINK-10565:


 Summary: Refactor SchedulerTestBase
 Key: FLINK-10565
 URL: https://issues.apache.org/jira/browse/FLINK-10565
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.7.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.7.0


Basically, the current {{SchedulerTestBase}} covers {{Scheduler}} tests, and 
{{Scheduler}} is a scheduling mechanism of the legacy mode, so we can remove 
it from {{SchedulerTestBase}}. Besides, inside {{SchedulerTestBase}} there are 
two useful testing classes, {{TestingSlotPool}} and 
{{TestingSlotPoolSlotProvider}}. Extract them for further use.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10564) tm costs too much time when communicating with jm

2018-10-16 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651641#comment-16651641
 ] 

TisonKun commented on FLINK-10564:
--

Hi [~chenlf], could you tell us on which version of Flink, i.e., the {{Affects 
Version/s}}, you hit this issue?

> tm costs too much time when communicating with  jm
> --
>
> Key: FLINK-10564
> URL: https://issues.apache.org/jira/browse/FLINK-10564
> Project: Flink
>  Issue Type: Bug
>  Components: Core, JobManager, TaskManager
> Environment: the configs are as follows:
> JM:
> high-availability: zookeeper
> taskmanager.heap.mb: 16384
> taskmanager.memory.preallocate: false
> taskmanager.numberOfTaskSlots: 64
> TM:
> slots: 128
> free slots: 0-128
> CPU cores: 40
> physical memory: 95 GB
> free memory: 32-50 GB
> Flink managed memory: 22-35 GB
>Reporter: chenlf
>Priority: Major
> Attachments: timeout.log
>
>
> It works fine until the number of tasks rises above about 400.
> There are 600+ tasks (each handling billions of records) running in our 
> cluster now, and the problem is that it costs too much time (it even times 
> out) when submitting/canceling/querying a task.
> Resources like memory and CPU are at normal levels.
> After debugging, we found this method to be the culprit:
> org.apache.flink.runtime.util.LeaderRetrievalUtils.LeaderGatewayListener.notifyLeaderAddress(String,
>  UUID)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10436) Example config uses deprecated key jobmanager.rpc.address

2018-10-16 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651705#comment-16651705
 ] 

TisonKun commented on FLINK-10436:
--

Thanks for explaining the fallback behavior. I agree.

> Example config uses deprecated key jobmanager.rpc.address
> -
>
> Key: FLINK-10436
> URL: https://issues.apache.org/jira/browse/FLINK-10436
> Project: Flink
>  Issue Type: Sub-task
>  Components: Startup Shell Scripts
>Affects Versions: 1.7.0
>Reporter: Ufuk Celebi
>Assignee: TisonKun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>
> The example {{flink-conf.yaml}} shipped as part of the Flink distribution 
> (https://github.com/apache/flink/blob/master/flink-dist/src/main/resources/flink-conf.yaml)
>  has the following entry:
> {code}
> jobmanager.rpc.address: localhost
> {code}
> When using this key, the following deprecation warning is logged.
> {code}
> 2018-09-26 12:01:46,608 WARN  org.apache.flink.configuration.Configuration
>   - Config uses deprecated configuration key 
> 'jobmanager.rpc.address' instead of proper key 'rest.address'
> {code}
> The example config should not use deprecated config options.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10436) Example config uses deprecated key jobmanager.rpc.address

2018-10-16 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651774#comment-16651774
 ] 

TisonKun commented on FLINK-10436:
--

[~till.rohrmann] can we achieve this by first introducing a class 
{{FallbackKey}} and changing the type of {{deprecatedKeys}} to it?

{code:java}
public class FallbackKey {
  // the alternative key to look up in the configuration
  private final String key;
  // whether resolving via this key should log a deprecation warning
  private final boolean isDeprecated;
  ...
}
{code}
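
For illustration, the lookup could then warn only for keys that are really 
deprecated. A rough sketch, assuming hypothetical accessors {{fallbackKeys()}}, 
{{getKey()}} and {{isDeprecated()}} (none of these exist yet):

{code:java}
// sketch: resolve an option, warning only when a *deprecated* fallback key is hit
private static String resolve(Configuration config, ConfigOption<String> option) {
    if (config.containsKey(option.key())) {
        return config.getString(option);
    }
    for (FallbackKey fallbackKey : option.fallbackKeys()) {
        if (config.containsKey(fallbackKey.getKey())) {
            if (fallbackKey.isDeprecated()) {
                LOG.warn("Config uses deprecated configuration key '{}' instead of proper key '{}'",
                    fallbackKey.getKey(), option.key());
            }
            return config.getString(fallbackKey.getKey(), null);
        }
    }
    return null;
}
{code}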


> Example config uses deprecated key jobmanager.rpc.address
> -
>
> Key: FLINK-10436
> URL: https://issues.apache.org/jira/browse/FLINK-10436
> Project: Flink
>  Issue Type: Sub-task
>  Components: Startup Shell Scripts
>Affects Versions: 1.7.0
>Reporter: Ufuk Celebi
>Assignee: TisonKun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>
> The example {{flink-conf.yaml}} shipped as part of the Flink distribution 
> (https://github.com/apache/flink/blob/master/flink-dist/src/main/resources/flink-conf.yaml)
>  has the following entry:
> {code}
> jobmanager.rpc.address: localhost
> {code}
> When using this key, the following deprecation warning is logged.
> {code}
> 2018-09-26 12:01:46,608 WARN  org.apache.flink.configuration.Configuration
>   - Config uses deprecated configuration key 
> 'jobmanager.rpc.address' instead of proper key 'rest.address'
> {code}
> The example config should not use deprecated config options.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10569) Clean up uses of Scheduler and Instance in valid tests

2018-10-16 Thread TisonKun (JIRA)
TisonKun created FLINK-10569:


 Summary: Clean up uses of Scheduler and Instance in valid tests
 Key: FLINK-10569
 URL: https://issues.apache.org/jira/browse/FLINK-10569
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.7.0
Reporter: TisonKun
 Fix For: 1.7.0


Legacy classes {{Scheduler}} and {{Instance}} are still used in some valid 
tests like {{ExecutionGraphRestartTest}}. We should replace them with the 
FLIP-6 scheduling mode. The best way I can find is to use 
{{SimpleSlotProvider}}.

Note that we need not remove all usages across all files, since most of them 
stay in the legacy codebase (e.g. {{JobManager.scala}}) and will be removed 
later.

*Feel free to take over it.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10575) Remove deprecated ExecutionGraphBuilder.buildGraph method

2018-10-16 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652863#comment-16652863
 ] 

TisonKun commented on FLINK-10575:
--

[~isunjin] did you analyze the relevant code path?

From my perspective this deprecated method does not depend on legacy code. It 
looks like it exists for historical reasons (as a fallback when the vertex 
parallelism is set to {{ExecutionConfig.PARALLELISM_AUTO_MAX}}), so technically 
I am not opposed to removing it. But I hesitate to do the removal so eagerly 
without necessity.

> Remove deprecated ExecutionGraphBuilder.buildGraph method
> -
>
> Key: FLINK-10575
> URL: https://issues.apache.org/jira/browse/FLINK-10575
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager
>Affects Versions: 1.7.0
>Reporter: JIN SUN
>Assignee: JIN SUN
>Priority: Major
>
> ExecutionGraphBuilder is not a public API, so we should be able to remove 
> deprecated methods such as:
> @Deprecated
> public static ExecutionGraph buildGraph
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-10569) Clean up uses of Scheduler and Instance in valid tests

2018-10-16 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun reassigned FLINK-10569:


Assignee: TisonKun

> Clean up uses of Scheduler and Instance in valid tests
> --
>
> Key: FLINK-10569
> URL: https://issues.apache.org/jira/browse/FLINK-10569
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> Legacy classes {{Scheduler}} and {{Instance}} are still used in some valid 
> tests like {{ExecutionGraphRestartTest}}. We should replace them with the 
> FLIP-6 scheduling mode. The best way I can find is to use 
> {{SimpleSlotProvider}}.
> Note that we need not remove all usages across all files, since most of them 
> stay in the legacy codebase (e.g. {{JobManager.scala}}) and will be removed 
> later.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10569) Clean up uses of Scheduler and Instance in valid tests

2018-10-16 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10569:
-
Description: 
Legacy classes {{Scheduler}} and {{Instance}} are still used in some valid 
tests like {{ExecutionGraphRestartTest}}. We should replace them with the 
FLIP-6 scheduling mode. The best way I can find is to use 
{{SimpleSlotProvider}}.

Note that we need not remove all usages across all files, since most of them 
stay in the legacy codebase (e.g. {{JobManager.scala}}) and will be removed 
later.

  was:
Legacy classes {{Scheduler}} and {{Instance}} are still used in some valid 
tests like {{ExecutionGraphRestartTest}}. We should replace them with the 
FLIP-6 scheduling mode. The best way I can find is to use 
{{SimpleSlotProvider}}.

Note that we need not remove all usages across all files, since most of them 
stay in the legacy codebase (e.g. {{JobManager.scala}}) and will be removed 
later.

*Feel free to take over it.*


> Clean up uses of Scheduler and Instance in valid tests
> --
>
> Key: FLINK-10569
> URL: https://issues.apache.org/jira/browse/FLINK-10569
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> Legacy classes {{Scheduler}} and {{Instance}} are still used in some valid 
> tests like {{ExecutionGraphRestartTest}}. We should replace them with the 
> FLIP-6 scheduling mode. The best way I can find is to use 
> {{SimpleSlotProvider}}.
> Note that we need not remove all usages across all files, since most of them 
> stay in the legacy codebase (e.g. {{JobManager.scala}}) and will be removed 
> later.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (FLINK-10038) Parallel the creation of InputSplit if necessary

2018-10-17 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10038:
-
Comment: was deleted

(was: My original purpose in mentioning "parallelize the creation of 
InputSplit" might be to parallelize the creation of ONE InputSplit. Take a look 
at {{FileInputFormat#createInputSplits}}: it creates InputSplits file by file. 
This is where I aim to parallelize. That is why I said "the interface for the 
creation of input splits is definitely InputSplitSource#createInputSplits". And 
this could be done without modifying the interface, by changing the 
implementation of {{createInputSplits}}.

However, your ideas here are also bright. A typical case that would gain 
benefits from these ideas is a batch job with many files, which would prefer 
the RegionFailover strategy if possible.
Here I see 3 options: 1. create InputSplits before the job runs; 2. create 
InputSplits concurrently with scheduling the job; 3. use a specific single task 
to generate the splits.

Option 1 is easier to implement, as [~StephanEwen] said. Below are concrete 
challenges for the remaining options.

The main issue I am concerned about is that for batch jobs, we prefer not to 
cancel all vertices and restart. What's worse, since we don't have batch 
checkpoints, the batch job has to restart completely. This is unacceptable for 
large-scale batch jobs.
For option 2, what if the JM fails over after some input splits have been 
computed and sent off? We don't have a specific JM failover strategy now, thus 
it causes the job to be restarted completely. Continuing with this option leads 
to discussing a JM failover strategy, that is, when the JM fails over and 
restarts, it can recover (reconcile) state from the previous run.
For option 3, there would be a wider consideration about the Source. Take the 
two-input case into consideration (below). Currently we read from the source in 
a blocking way; if we compute the input splits in a single task and still use 
the blocking approach, the downstream may be stuck waiting for one input while 
the other input is ready to be read.

Src1 \
Src2 > Join

One way to solve this issue is to read from the source in a non-blocking way. 
Assume we introduce a method {{boolean SourceFunction#next(Collector)}}: when 
the downstream calls it, the source sends its data to the collector and returns 
true. If no more data remains, it returns false. This also lets the source read 
from the file and produce data asynchronously.

To sum up, focusing more on batch jobs, the main concerns would be JM failover 
for options 1 and 2 (plus the external but significant batch checkpointing), 
and a more flexible source for option 3.)

> Parallel the creation of InputSplit if necessary
> 
>
> Key: FLINK-10038
> URL: https://issues.apache.org/jira/browse/FLINK-10038
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Coordination
>Affects Versions: 1.5.0
>Reporter: TisonKun
>Priority: Major
>  Labels: improvement, inputformat, parallel, perfomance
>
> As a continue to the discussion in the PR about parallelize the creation of 
> ExecutionJobVertex [here|https://github.com/apache/flink/pull/6353].
> [~StephanEwen] suggested that we could parallelize the creation of 
> InputSplit, from which we gain performance improvements.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10569) Clean up uses of Scheduler and Instance in valid tests

2018-10-18 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655620#comment-16655620
 ] 

TisonKun commented on FLINK-10569:
--

I find this topic is more complex than I ever thought. We actually have two 
version of slot implementation and current {{ExecutionGraph}} related tests 
heavily depend on legacy slot implementation. Maybe we should separated this 
thread into stepwise issues decouples existing valid tests with legacy slot 
implementation(as well as Scheduler and Instance).

cc [~till.rohrmann].

> Clean up uses of Scheduler and Instance in valid tests
> --
>
> Key: FLINK-10569
> URL: https://issues.apache.org/jira/browse/FLINK-10569
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> Legacy classes {{Scheduler}} and {{Instance}} are still used in some valid 
> tests like {{ExecutionGraphRestartTest}}. We should replace them with the 
> FLIP-6 scheduling mode. The best way I can find is to use 
> {{SimpleSlotProvider}}.
> Note that we need not remove all usages across all files, since most of them 
> stay in the legacy codebase (e.g. {{JobManager.scala}}) and will be removed 
> later.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-10569) Clean up uses of Scheduler and Instance in valid tests

2018-10-18 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655620#comment-16655620
 ] 

TisonKun edited comment on FLINK-10569 at 10/18/18 5:22 PM:


I find this topic is more complex than I ever thought. We actually have two 
version of slot implementation and current {{ExecutionGraph}} tests heavily 
depend on legacy slot implementation. Maybe we should separated this thread 
into stepwise issues decouples existing valid tests with legacy slot 
implementation(as well as Scheduler and Instance).

cc [~till.rohrmann].


was (Author: tison):
I find this topic is more complex than I ever thought. We actually have two 
version of slot implementation and current {{ExecutionGraph}} related tests 
heavily depend on legacy slot implementation. Maybe we should separated this 
thread into stepwise issues decouples existing valid tests with legacy slot 
implementation(as well as Scheduler and Instance).

cc [~till.rohrmann].

> Clean up uses of Scheduler and Instance in valid tests
> --
>
> Key: FLINK-10569
> URL: https://issues.apache.org/jira/browse/FLINK-10569
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> Legacy classes {{Scheduler}} and {{Instance}} are still used in some valid 
> tests like {{ExecutionGraphRestartTest}}. We should replace them with the 
> FLIP-6 scheduling mode. The best way I can find is to use 
> {{SimpleSlotProvider}}.
> Note that we need not remove all usages across all files, since most of them 
> stay in the legacy codebase (e.g. {{JobManager.scala}}) and will be removed 
> later.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-10569) Clean up uses of Scheduler and Instance in valid tests

2018-10-18 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655620#comment-16655620
 ] 

TisonKun edited comment on FLINK-10569 at 10/18/18 5:22 PM:


I find this topic is more complex than I ever thought. We actually have two 
versions of the slot implementation, and the current {{ExecutionGraph}} tests 
heavily depend on the legacy one. Maybe we should separate this thread into 
stepwise issues that decouple the existing valid tests from the legacy slot 
implementation (as well as from Scheduler and Instance).

cc [~till.rohrmann].


was (Author: tison):
I find this topic is more complex than I ever thought. We actually have two 
version of slot implementation and current {{ExecutionGraph}} tests heavily 
depend on legacy slot implementation. Maybe we should separated this thread 
into stepwise issues decouples existing valid tests with legacy slot 
implementation(as well as Scheduler and Instance).

cc [~till.rohrmann].

> Clean up uses of Scheduler and Instance in valid tests
> --
>
> Key: FLINK-10569
> URL: https://issues.apache.org/jira/browse/FLINK-10569
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.7.0
>
>
> Legacy classes {{Scheduler}} and {{Instance}} are still used in some valid 
> tests like {{ExecutionGraphRestartTest}}. We should replace them with the 
> FLIP-6 scheduling mode. The best way I can find is to use 
> {{SimpleSlotProvider}}.
> Note that we need not remove all usages across all files, since most of them 
> stay in the legacy codebase (e.g. {{JobManager.scala}}) and will be removed 
> later.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10540) Remove legacy FlinkMiniCluster

2018-10-18 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656218#comment-16656218
 ] 

TisonKun commented on FLINK-10540:
--

Great take [~dangdangdang]!

> Remove legacy FlinkMiniCluster
> --
>
> Key: FLINK-10540
> URL: https://issues.apache.org/jira/browse/FLINK-10540
> Project: Flink
>  Issue Type: Sub-task
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: Shimin Yang
>Priority: Major
> Fix For: 1.7.0
>
>
> {{FlinkMiniCluster}} is based on legacy cluster mode and should be no longer 
> used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10540) Remove legacy FlinkMiniCluster

2018-10-18 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656239#comment-16656239
 ] 

TisonKun commented on FLINK-10540:
--

Yes, I see. In the testing code its subclasses ({{LocalFlinkMiniCluster}} and 
{{TestingCluster}}) are still in use; we might want to get rid of those uses, 
too. Otherwise they become blockers for this.

> Remove legacy FlinkMiniCluster
> --
>
> Key: FLINK-10540
> URL: https://issues.apache.org/jira/browse/FLINK-10540
> Project: Flink
>  Issue Type: Sub-task
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: Shimin Yang
>Priority: Major
> Fix For: 1.7.0
>
>
> {{FlinkMiniCluster}} is based on legacy cluster mode and should be no longer 
> used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10610) Port CoLocationConstraintITCase to new codebase

2018-10-19 Thread TisonKun (JIRA)
TisonKun created FLINK-10610:


 Summary: Port CoLocationConstraintITCase to new codebase
 Key: FLINK-10610
 URL: https://issues.apache.org/jira/browse/FLINK-10610
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.7.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.7.0


Port {{CoLocationConstraintITCase}} to new codebase



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10610) Port slot sharing cases to new codebase

2018-10-19 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10610:
-
Summary: Port slot sharing cases to new codebase  (was: Port 
CoLocationConstraintITCase to new codebase)

> Port slot sharing cases to new codebase
> ---
>
> Key: FLINK-10610
> URL: https://issues.apache.org/jira/browse/FLINK-10610
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>
> Port {{CoLocationConstraintITCase}} to new codebase



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10610) Port slot sharing cases to new codebase

2018-10-19 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10610:
-
Description: Port {{CoLocationConstraintITCase}} and {{SlotSharingITCase}} 
to new codebase  (was: Port {{CoLocationConstraintITCase}} to new codebase)

> Port slot sharing cases to new codebase
> ---
>
> Key: FLINK-10610
> URL: https://issues.apache.org/jira/browse/FLINK-10610
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>
> Port {{CoLocationConstraintITCase}} and {{SlotSharingITCase}} to new codebase



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-10558) Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new code base

2018-10-19 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun reassigned FLINK-10558:


Assignee: TisonKun

> Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new 
> code base
> ---
>
> Key: FLINK-10558
> URL: https://issues.apache.org/jira/browse/FLINK-10558
> Project: Flink
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: TisonKun
>Priority: Minor
>
> {{YARNHighAvailabilityITCase}}, 
> {{YARNSessionCapacitySchedulerITCase#testClientStartup,}} 
> {{YARNSessionCapacitySchedulerITCase#testTaskManagerFailure}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10558) Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new code base

2018-10-19 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657694#comment-16657694
 ] 

TisonKun commented on FLINK-10558:
--

[~till.rohrmann] I'd like to know whether the present status is expected, as 
[~yanghua] commented on GitHub:

bq. @tillrohrmann For the test case testTaskManagerFailure, the current test 
mode (based on special messages printed to stdout and stderr) cannot verify the 
FLIP-6 scenario. Because it doesn't pre-start a TM, I can only submit a job to 
drive the TM registration, but this causes the job's output to flush out 
Flink's output, which shares stdout and stderr. Therefore, many assertions will 
not take effect.

I wonder whether, in the FLIP-6 YARN session mode, we'd like to (really) 
pre-start TMs; if so, we have to mark the workaround (submitting a job to drive 
the TM registration) as temporary.

> Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new 
> code base
> ---
>
> Key: FLINK-10558
> URL: https://issues.apache.org/jira/browse/FLINK-10558
> Project: Flink
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: TisonKun
>Priority: Minor
>
> {{YARNHighAvailabilityITCase}}, 
> {{YARNSessionCapacitySchedulerITCase#testClientStartup,}} 
> {{YARNSessionCapacitySchedulerITCase#testTaskManagerFailure}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10640) Enable Slot Resource Profile for Resource Management

2018-10-22 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658869#comment-16658869
 ] 

TisonKun commented on FLINK-10640:
--

Hi [~wuzang], I think this is a big proposal on resource management and it 
requires a discussion of the full picture.

I'd like to know your thoughts on "roughly" slot-based management versus real 
resource-based management. It seems you are not going to change the slot 
concept but just change how slots are generated, from parallelism-based to 
resource-based. On this topic, how do you plan to work with FLINK-10407?
Also, you introduce a concept of "elastic session" and say that it can somehow 
act as the session mode. What is the relation between the "elastic session" and 
the current session/per-job modes, and do you want to replace them with such an 
"elastic session"?

By the way, typically you can assign this JIRA to yourself for further 
contribution. I will ping [~till.rohrmann] here since he is the committer who 
currently works on slot/resource management and can grant you the contributor 
permission if needed.

> Enable Slot Resource Profile for Resource Management
> 
>
> Key: FLINK-10640
> URL: https://issues.apache.org/jira/browse/FLINK-10640
> Project: Flink
>  Issue Type: New Feature
>  Components: ResourceManager
>Reporter: Tony Xintong Song
>Priority: Major
>
> Motivation & Backgrounds
>  * The existing concept of task slots roughly represents how many pipelines of 
> tasks a TaskManager can hold. However, it does not consider the differences 
> in resource needs and usage of individual tasks. Enabling resource profiles 
> of slots may allow Flink to better allocate execution resources according to 
> tasks fine-grained resource needs.
>  * The community version Flink already contains APIs and some implementation 
> for slot resource profile. However, such logic is not truly used. 
> (ResourceProfile of slot requests is by default set to UNKNOWN with negative 
> values, thus matches any given slot.)
> Preliminary Design
>  * Slot Management
>  A slot represents a certain amount of resources for a single pipeline of 
> tasks to run on a TaskManager. Initially, a TaskManager does not have any 
> slots but a total amount of resources. When allocating, the ResourceManager 
> finds proper TMs to generate new slots for the tasks to run according to the 
> slot requests. Once generated, the slot's size (resource profile) does not 
> change until it's freed. ResourceManager can apply different, portable 
> strategies to allocate slots from TaskManagers.
>  * TM Management
>  The size and number of TaskManagers and when to start them can also be 
> flexible. TMs can be started and released dynamically, and may have different 
> sizes. We may have many different, portable strategies. E.g., an elastic 
> session that can run multiple jobs like the session mode while dynamically 
> adjusting the size of session (number of TMs) according to the realtime 
> working load.
>  * About Slot Sharing
>  Slot sharing is a good heuristic to easily calculate how many slots are needed 
> to get the job running and get better utilization when there is no resource 
> profile in slots. However, with resource profiles enabling finer-grained 
> resource management, each individual task has its specific resource need and 
> it does not make much sense to have multiple tasks sharing the resource of 
> the same slot. Instead, we may introduce locality preferences/constraints to 
> support the semantics of putting tasks in same/different TMs in a more 
> general way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10665) Port YARNSessionFIFOITCase#testJavaAPI to new codebase

2018-10-24 Thread TisonKun (JIRA)
TisonKun created FLINK-10665:


 Summary: Port YARNSessionFIFOITCase#testJavaAPI to new codebase
 Key: FLINK-10665
 URL: https://issues.apache.org/jira/browse/FLINK-10665
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.7.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.7.0


Port {{YARNSessionFIFOITCase#testJavaAPI}} to new codebase



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10666) Port YarnClusterDescriptorTest to new codebase

2018-10-24 Thread TisonKun (JIRA)
TisonKun created FLINK-10666:


 Summary: Port YarnClusterDescriptorTest to new codebase
 Key: FLINK-10666
 URL: https://issues.apache.org/jira/browse/FLINK-10666
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.7.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.7.0


Port {{YarnClusterDescriptorTest}} to new codebase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10558) Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new code base

2018-10-24 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662197#comment-16662197
 ] 

TisonKun commented on FLINK-10558:
--

For {{YARNHighAvailabilityITCase}}, I wonder how we can kill an AM. Previously 
we obtained the {{JobManagerGateway}} via 
{{LeaderRetrievalUtils#retrieveLeaderGateway}}, which I believe is not alive in 
the FLIP-6 codebase. Besides, if we kill the JM via {{postStop}}, the job would 
be suspended. I think what we need is to kill AMs externally, with YARN restart 
attempts configured, to test whether the blocking job can be taken over.

> Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new 
> code base
> ---
>
> Key: FLINK-10558
> URL: https://issues.apache.org/jira/browse/FLINK-10558
> Project: Flink
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: TisonKun
>Priority: Minor
>
> {{YARNHighAvailabilityITCase}}, 
> {{YARNSessionCapacitySchedulerITCase#testClientStartup,}} 
> {{YARNSessionCapacitySchedulerITCase#testTaskManagerFailure}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-10924) StreamExecutionEnvironment method javadocs incorrect in regards to used charset

2018-11-19 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun reassigned FLINK-10924:


Assignee: TisonKun

> StreamExecutionEnvironment method javadocs incorrect in regards to used 
> charset
> ---
>
> Key: FLINK-10924
> URL: https://issues.apache.org/jira/browse/FLINK-10924
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Streaming
>Affects Versions: 1.3.3, 1.4.2, 1.5.5, 1.6.2, 1.7.0
>Reporter: Chesnay Schepler
>Assignee: TisonKun
>Priority: Major
>
> Various methods of the {{StreamExecutionEnvironment}} (like 
> [readTextFile|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L939])
>  are documented to be using the system's default charset if none is specified.
> This is incorrect as they default to UTF-8 instead. The javadocs should be 
> updated to reflect this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10924) StreamExecutionEnvironment method javadocs incorrect in regards to used charset

2018-11-19 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16691414#comment-16691414
 ] 

TisonKun commented on FLINK-10924:
--

I will start digging in and correcting this within hours.

> StreamExecutionEnvironment method javadocs incorrect in regards to used 
> charset
> ---
>
> Key: FLINK-10924
> URL: https://issues.apache.org/jira/browse/FLINK-10924
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Streaming
>Affects Versions: 1.3.3, 1.4.2, 1.5.5, 1.6.2, 1.7.0
>Reporter: Chesnay Schepler
>Assignee: TisonKun
>Priority: Major
>
> Various methods of the {{StreamExecutionEnvironment}} (like 
> [readTextFile|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L939])
>  are documented to be using the system's default charset if none is specified.
> This is incorrect as they default to UTF-8 instead. The javadocs should be 
> updated to reflect this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10914) Prefer notifying status changed to waiting and checking in ExecutionGraph tests

2018-11-20 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694152#comment-16694152
 ] 

TisonKun commented on FLINK-10914:
--

[~till.rohrmann] [~StephanEwen] do you mean, per your conversation above, that 
there is some ongoing effort to change the concurrency model of the execution 
graph, and that we should delay this issue until that has been done and then 
see how to improve this test's stability?

If so, I'd like to see whether I can help with the work to discuss/implement 
the new concurrency model (single-threaded dispatcher).
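
For reference, a minimal sketch of the notifying version I have in mind, 
assuming the current {{JobStatusListener}} signature (jobId, newJobStatus, 
timestamp, error):

{code:java}
// sketch: complete a future from the status listener instead of busy-waiting
final CompletableFuture<JobStatus> reachedRunning = new CompletableFuture<>();
executionGraph.registerJobStatusListener(
    (jobId, newJobStatus, timestamp, error) -> {
        if (newJobStatus == JobStatus.RUNNING) {
            reachedRunning.complete(newJobStatus);
        }
    });
// times out instead of polling, and cannot miss a quick status change
reachedRunning.get(10, TimeUnit.SECONDS);
{code}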

> Prefer notifying status changed to waiting and checking in ExecutionGraph 
> tests
> ---
>
> Key: FLINK-10914
> URL: https://issues.apache.org/jira/browse/FLINK-10914
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.8.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.8.0
>
>
> Currently there are several tests of {{ExecutionGraph}} based on 
> {{ExecutionGraphTestUtils#waitUntilJobStatus}} and 
> {{ExecutionGraphTestUtils#waitUntilExecution(Vertex)State}}. I notice that we 
> have {{JobStatusListener}}s and {{ExecutionStatusListener}}s registered on an 
> {{ExecutionGraph}}. It is possible to replace the waiting version with a 
> notifying version, which causes less CPU load and is more reliable (the 
> waiting version might miss some very quick status/state changes).
> What do you think [~till.rohrmann] [~Zentol]?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10436) Example config uses deprecated key jobmanager.rpc.address

2018-11-22 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10436:
-
Issue Type: Bug  (was: Sub-task)
Parent: (was: FLINK-10392)

> Example config uses deprecated key jobmanager.rpc.address
> -
>
> Key: FLINK-10436
> URL: https://issues.apache.org/jira/browse/FLINK-10436
> Project: Flink
>  Issue Type: Bug
>  Components: Startup Shell Scripts
>Affects Versions: 1.7.0
>Reporter: Ufuk Celebi
>Assignee: TisonKun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.8.0
>
>
> The example {{flink-conf.yaml}} shipped as part of the Flink distribution 
> (https://github.com/apache/flink/blob/master/flink-dist/src/main/resources/flink-conf.yaml)
>  has the following entry:
> {code}
> jobmanager.rpc.address: localhost
> {code}
> When using this key, the following deprecation warning is logged.
> {code}
> 2018-09-26 12:01:46,608 WARN  org.apache.flink.configuration.Configuration
>   - Config uses deprecated configuration key 
> 'jobmanager.rpc.address' instead of proper key 'rest.address'
> {code}
> The example config should not use deprecated config options.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10436) Example config uses deprecated key jobmanager.rpc.address

2018-11-22 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695943#comment-16695943
 ] 

TisonKun commented on FLINK-10436:
--

Changed this to an individual issue since it is unrelated to the legacy mode.

> Example config uses deprecated key jobmanager.rpc.address
> -
>
> Key: FLINK-10436
> URL: https://issues.apache.org/jira/browse/FLINK-10436
> Project: Flink
>  Issue Type: Bug
>  Components: Startup Shell Scripts
>Affects Versions: 1.7.0
>Reporter: Ufuk Celebi
>Assignee: TisonKun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.8.0
>
>
> The example {{flink-conf.yaml}} shipped as part of the Flink distribution 
> (https://github.com/apache/flink/blob/master/flink-dist/src/main/resources/flink-conf.yaml)
>  has the following entry:
> {code}
> jobmanager.rpc.address: localhost
> {code}
> When using this key, the following deprecation warning is logged.
> {code}
> 2018-09-26 12:01:46,608 WARN  org.apache.flink.configuration.Configuration
>   - Config uses deprecated configuration key 
> 'jobmanager.rpc.address' instead of proper key 'rest.address'
> {code}
> The example config should not use deprecated config options.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)

2018-11-28 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701842#comment-16701842
 ] 

TisonKun commented on FLINK-10333:
--

Agree with [~xiaogang.shi] that our leader election is not perfect. The main 
issue here is that we check the leadership and perform the real action 
non-atomically.

Theoretically, both the implementation [~xiaogang.shi] suggested and the one 
[~StephanEwen] suggested would work. [~xiaogang.shi]'s suggestion looks good to 
me but it requires quite a bit of work. [~StephanEwen]'s suggestion is simpler, 
but I cannot find a settable {{id}} (i.e., the payload) in Curator 2.12.0, and 
we would need to add the leader ID to the payload after leadership is granted.

> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -
>
> Key: FLINK-10333
> URL: https://issues.apache.org/jira/browse/FLINK-10333
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.5.3, 1.6.0, 1.7.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.8.0
>
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case 
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be 
> better to move {{RetrievableStateStorageHelper}} out of it for a better 
> separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even 
> if it is locked. This should not happen since it could leave another system 
> in an inconsistent state (imagine a changed {{JobGraph}} which restores from 
> an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} 
> which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me think how reliable these components actually work. 
> Since these components are very important, I propose to refactor them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)

2018-11-28 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701846#comment-16701846
 ] 

TisonKun commented on FLINK-10333:
--

[~xiaogang.shi] adding the leader ID as payload and do the real action with 
that ID, then the action could be guarded by the fencing token mechanism. That 
is, not rely on zookeeper's transaction but with the atomic between lock and 
fencing token, a node/component lost his leadership could not actually do the 
real action.

> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -
>
> Key: FLINK-10333
> URL: https://issues.apache.org/jira/browse/FLINK-10333
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.5.3, 1.6.0, 1.7.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.8.0
>
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case 
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be 
> better to move {{RetrievableStateStorageHelper}} out of it for a better 
> separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even 
> if it is locked. This should not happen since it could leave another system 
> in an inconsistent state (imagine a changed {{JobGraph}} which restores from 
> an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} 
> which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me think how reliable these components actually work. 
> Since these components are very important, I propose to refactor them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)

2018-11-28 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701842#comment-16701842
 ] 

TisonKun edited comment on FLINK-10333 at 11/28/18 12:58 PM:
-

Agree with [~xiaogang.shi] that our leader election is not perfect. The main 
issue here is that we check the leadership and perform the real action 
non-atomically.

Sounds like both the implementation [~xiaogang.shi] suggested and the one 
[~StephanEwen] suggested would work. [~xiaogang.shi]'s suggestion looks good to 
me but it requires quite a bit of work. [~StephanEwen]'s suggestion is simpler, 
but I cannot find a settable {{id}} (i.e., the payload) in Curator 2.12.0, and 
we would need to add the leader ID to the payload after leadership is granted.


was (Author: tison):
Agree with [~xiaogang.shi] that our leader election is not perfect. The main 
issue here is that we check the leadership and perform the real action 
non-atomically.

Theoretically, both the implementation [~xiaogang.shi] suggested and the one 
[~StephanEwen] suggested would work. [~xiaogang.shi]'s suggestion looks good to 
me but it requires quite a bit of work. [~StephanEwen]'s suggestion is simpler, 
but I cannot find a settable {{id}} (i.e., the payload) in Curator 2.12.0, and 
we would need to add the leader ID to the payload after leadership is granted.

> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -
>
> Key: FLINK-10333
> URL: https://issues.apache.org/jira/browse/FLINK-10333
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.5.3, 1.6.0, 1.7.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.8.0
>
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case 
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be 
> better to move {{RetrievableStateStorageHelper}} out of it for a better 
> separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even 
> if it is locked. This should not happen since it could leave another system 
> in an inconsistent state (imagine a changed {{JobGraph}} which restores from 
> an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} 
> which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me think how reliable these components actually work. 
> Since these components are very important, I propose to refactor them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)

2018-11-28 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-10333:
-
Comment: was deleted

(was: [~xiaogang.shi] adding the leader ID as payload and do the real action 
with that ID, then the action could be guarded by the fencing token mechanism. 
That is, not rely on zookeeper's transaction but with the atomic between lock 
and fencing token, a node/component lost his leadership could not actually do 
the real action.)

> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -
>
> Key: FLINK-10333
> URL: https://issues.apache.org/jira/browse/FLINK-10333
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.5.3, 1.6.0, 1.7.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.8.0
>
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case 
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be 
> better to move {{RetrievableStateStorageHelper}} out of it for a better 
> separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even 
> if it is locked. This should not happen since it could leave another system 
> in an inconsistent state (imagine a changed {{JobGraph}} which restores from 
> an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} 
> which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me think how reliable these components actually work. 
> Since these components are very important, I propose to refactor them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)

2018-11-28 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702631#comment-16702631
 ] 

TisonKun commented on FLINK-10333:
--

[~StephanEwen] I infer that your original proposal is to make the lock and the 
fencing token atomic, so that the JobManager performs actions under its lock, 
meaning atomically with its fencing token. Thus the actions would be filtered 
by the fencing token. Have I misunderstood?

After a look into ZK and Curator, I prefer [~xiaogang.shi]'s suggestion. This 
approach is a bit costly to implement but quite natural. We might do it all 
with Curator, but Curator does not expose the {{election-node-path}} (an 
EPHEMERAL_SEQUENTIAL node), which forces a fallback to ZooKeeper.
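
To make it concrete, a minimal sketch of the ZooKeeper fallback (the path and 
variable names are only for illustration):

{code:java}
// sketch: create the election node with the leader/fencing ID already in its
// payload, so granting leadership and publishing the ID happen in one call
String electionNodePath = zooKeeper.create(
    "/flink/leader/election-",
    leaderId.toString().getBytes(StandardCharsets.UTF_8),
    ZooDefs.Ids.OPEN_ACL_UNSAFE,
    CreateMode.EPHEMERAL_SEQUENTIAL);
// the contender owning the child with the smallest sequence number is the
// leader; the others watch their predecessor, as in the standard election recipe
{code}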

> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -
>
> Key: FLINK-10333
> URL: https://issues.apache.org/jira/browse/FLINK-10333
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.5.3, 1.6.0, 1.7.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.8.0
>
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case 
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be 
> better to move {{RetrievableStateStorageHelper}} out of it for a better 
> separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even 
> if it is locked. This should not happen since it could leave another system 
> in an inconsistent state (imagine a changed {{JobGraph}} which restores from 
> an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} 
> which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me think how reliable these components actually work. 
> Since these components are very important, I propose to refactor them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10558) Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new code base

2018-11-30 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705067#comment-16705067
 ] 

TisonKun commented on FLINK-10558:
--

Hi [~till.rohrmann], I think this JIRA and FLINK-10665 are the final blockers 
to getting rid of the directly Akka-based YARN cluster.

If we actually implement the YARN session mode as proposed by FLIP-6, we should 
be able to pass {{YARNSessionCapacitySchedulerITCase#testClientStartup}} and 
{{YARNSessionCapacitySchedulerITCase#testTaskManagerFailure}} without the hack 
of starting a real job. Put another way, we can keep the structure of these two 
tests. Thus I suggest regarding these two as not-yet-implemented and therefore 
ignored tests. If so, with FLINK-10558 (this JIRA) and FLINK-10665 resolved, we 
can start to remove the legacy flink-yarn production code.

> Port YARNHighAvailabilityITCase and YARNSessionCapacitySchedulerITCase to new 
> code base
> ---
>
> Key: FLINK-10558
> URL: https://issues.apache.org/jira/browse/FLINK-10558
> Project: Flink
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: TisonKun
>Priority: Minor
>  Labels: pull-request-available
>
> {{YARNHighAvailabilityITCase}}, 
> {{YARNSessionCapacitySchedulerITCase#testClientStartup,}} 
> {{YARNSessionCapacitySchedulerITCase#testTaskManagerFailure}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10640) Enable Slot Resource Profile for Resource Management

2018-11-30 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705598#comment-16705598
 ] 

TisonKun commented on FLINK-10640:
--

About the "TM Management" section: currently the YARN session mode launches 
without starting any TM. Does this JIRA also cover starting an arbitrary number 
of TMs once the YARN session is launched? [~till.rohrmann] [~wuzang] what do 
you think about the gap between the current TM management and the proposed one, 
especially for Flink on YARN?

> Enable Slot Resource Profile for Resource Management
> 
>
> Key: FLINK-10640
> URL: https://issues.apache.org/jira/browse/FLINK-10640
> Project: Flink
>  Issue Type: New Feature
>  Components: ResourceManager
>Reporter: Tony Xintong Song
>Priority: Major
>
> Motivation & Backgrounds
>  * The existing concept of task slots roughly represents how many pipelines of 
> tasks a TaskManager can hold. However, it does not consider the differences 
> in resource needs and usage of individual tasks. Enabling resource profiles 
> of slots may allow Flink to better allocate execution resources according to 
> tasks fine-grained resource needs.
>  * The community version Flink already contains APIs and some implementation 
> for slot resource profile. However, such logic is not truly used. 
> (ResourceProfile of slot requests is by default set to UNKNOWN with negative 
> values, thus matches any given slot.)
> Preliminary Design
>  * Slot Management
>  A slot represents a certain amount of resources for a single pipeline of 
> tasks to run on a TaskManager. Initially, a TaskManager does not have any 
> slots but a total amount of resources. When allocating, the ResourceManager 
> finds proper TMs to generate new slots for the tasks to run according to the 
> slot requests. Once generated, the slot's size (resource profile) does not 
> change until it's freed. ResourceManager can apply different, portable 
> strategies to allocate slots from TaskManagers.
>  * TM Management
>  The size and number of TaskManagers and when to start them can also be 
> flexible. TMs can be started and released dynamically, and may have different 
> sizes. We may have many different, portable strategies. E.g., an elastic 
> session that can run multiple jobs like the session mode while dynamically 
> adjusting the size of session (number of TMs) according to the realtime 
> working load.
>  * About Slot Sharing
>  Slot sharing is a good heuristic to easily calculate how many slots are needed 
> to get the job running and get better utilization when there is no resource 
> profile in slots. However, with resource profiles enabling finer-grained 
> resource management, each individual task has its specific resource need and 
> it does not make much sense to have multiple tasks sharing the resource of 
> the same slot. Instead, we may introduce locality preferences/constraints to 
> support the semantics of putting tasks in same/different TMs in a more 
> general way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11056) Remove MesosApplicationMasterRunner

2018-12-03 Thread TisonKun (JIRA)
TisonKun created FLINK-11056:


 Summary: Remove MesosApplicationMasterRunner
 Key: FLINK-11056
 URL: https://issues.apache.org/jira/browse/FLINK-11056
 Project: Flink
  Issue Type: Sub-task
  Components: Mesos
Affects Versions: 1.8.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.8.0


{{MesosApplicationMasterRunner}} is for the legacy mode and nothing depends on 
it now, so we can remove it directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-11069) Remove FutureUtil

2018-12-04 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun reassigned FLINK-11069:


Assignee: TisonKun

> Remove FutureUtil
> -
>
> Key: FLINK-11069
> URL: https://issues.apache.org/jira/browse/FLINK-11069
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Till Rohrmann
>Assignee: TisonKun
>Priority: Minor
>
> The utility class {{FutureUtil}} contains duplicate methods and methods which 
> are not implemented optimally. I suggest merging {{FutureUtil}} with 
> {{FutureUtils}} and getting rid of the {{waitForAll}} method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11078) Capability to define the numerical range for running TaskExecutors

2018-12-05 Thread TisonKun (JIRA)
TisonKun created FLINK-11078:


 Summary: Capability to define the numerical range for running 
TaskExecutors
 Key: FLINK-11078
 URL: https://issues.apache.org/jira/browse/FLINK-11078
 Project: Flink
  Issue Type: New Feature
  Components: Client, ResourceManager
Affects Versions: 1.8.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.8.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10640) Enable Slot Resource Profile for Resource Management

2018-12-05 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710041#comment-16710041
 ] 

TisonKun commented on FLINK-10640:
--

@[~wuzang]

After an offline discussion with [~till.rohrmann], for part of the "TM 
Management" issue, i.e., starting an arbitrary number of TMs once the YARN 
session is launched, I propose introducing a pair (min, max) representing the 
minimum and maximum number of running {{TaskExecutor}}s.

With such an option, setting {{minimum = maximum = n}} effectively gives the 
same behaviour as the pre-FLIP-6 code, that is, a fixed number of pre-allocated 
TMs; and setting {{minimum = 0, maximum = inf}} effectively gives the same 
behaviour as the current code path. I think such a feature improves "TM 
Management", especially when users want to run jobs on a specific cluster, and 
it requires fewer changes than an arbitrarily flexible "TM Management".
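
For example, in a hypothetical {{flink-conf.yaml}} (the key names below are 
made up for illustration and are not final):

{code}
# pre-allocate a fixed pool of 4 TMs, like the pre-FLIP-6 session mode
yarn.taskexecutor.minimum: 4
yarn.taskexecutor.maximum: 4

# fully dynamic allocation, like the current FLIP-6 behaviour
# yarn.taskexecutor.minimum: 0
# yarn.taskexecutor.maximum: -1   # -1 meaning unbounded
{code}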

What do you think?

> Enable Slot Resource Profile for Resource Management
> 
>
> Key: FLINK-10640
> URL: https://issues.apache.org/jira/browse/FLINK-10640
> Project: Flink
>  Issue Type: New Feature
>  Components: ResourceManager
>Reporter: Tony Xintong Song
>Priority: Major
>
> Motivation & Backgrounds
>  * The existing concept of task slots roughly represents how many pipelines of 
> tasks a TaskManager can hold. However, it does not consider the differences 
> in resource needs and usage of individual tasks. Enabling resource profiles 
> of slots may allow Flink to better allocate execution resources according to 
> tasks fine-grained resource needs.
>  * The community version Flink already contains APIs and some implementation 
> for slot resource profile. However, such logic is not truly used. 
> (ResourceProfile of slot requests is by default set to UNKNOWN with negative 
> values, thus matches any given slot.)
> Preliminary Design
>  * Slot Management
>  A slot represents a certain amount of resources for a single pipeline of 
> tasks to run in on a TaskManager. Initially, a TaskManager does not have any 
> slots but a total amount of resources. When allocating, the ResourceManager 
> finds proper TMs to generate new slots for the tasks to run according to the 
> slot requests. Once generated, the slot's size (resource profile) does not 
> change until it's freed. ResourceManager can apply different, portable 
> strategies to allocate slots from TaskManagers.
>  * TM Management
>  The size and number of TaskManagers and when to start them can also be 
> flexible. TMs can be started and released dynamically, and may have different 
> sizes. We may have many different, portable strategies. E.g., an elastic 
> session that can run multiple jobs like the session mode while dynamically 
> adjusting the size of session (number of TMs) according to the realtime 
> working load.
>  * About Slot Sharing
>  Slot sharing is a good heuristic to easily calculate how many slots needed 
> to get the job running and get better utilization when there is no resource 
> profile in slots. However, with resource profiles enabling finer-grained 
> resource management, each individual task has its specific resource need and 
> it does not make much sense to have multiple tasks sharing the resource of 
> the same slot. Instead, we may introduce locality preferences/constraints to 
> support the semantics of putting tasks in same/different TMs in a more 
> general way.
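
To make the UNKNOWN-profile point above concrete, a minimal sketch against the 
existing API; the constructor arguments are illustrative values only:

{code:java}
import org.apache.flink.runtime.clusterframework.types.ResourceProfile;

public class ResourceProfileSketch {

    public static void main(String[] args) {
        // By default, slot requests carry ResourceProfile.UNKNOWN (negative
        // values), which matches any given slot.
        ResourceProfile unknown = ResourceProfile.UNKNOWN;

        // A fine-grained profile: 1.0 cpu cores and 1024 MB of heap memory.
        ResourceProfile profile = new ResourceProfile(1.0, 1024);

        // isMatching decides whether this profile can serve a given request.
        System.out.println(profile.isMatching(unknown));
    }
}
{code}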



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11078) Capability to define the numerical range for running TaskExecutors

2018-12-05 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-11078:
-
Description: 
In the pre-FLIP-6 context, we start a yarn session with a fixed number of running 
{{TaskManagers}}. This is good for users who run a series of small jobs on a 
specific cluster, and it reduces the cost of deploying a cluster per job. In the 
current code base, we start a yarn session with no pre-allocated TMs and allocate 
them once a job is submitted and running.

To get the benefits of both modes, I propose introducing a pair {{(min, max)}} 
representing the minimum and maximum number of running {{TaskExecutors}}.

With such an option, when setting minimum = maximum = n we effectively have the 
same behaviour as before with the pre-FLIP-6 code; and when setting minimum = 
0, maximum = inf we effectively have the same behaviour as the current code path.

Most of the implementation work would be passing such an option through 
{{FlinkYarnSessionCli}} and respecting it in the {{ResourceManager}} on 
{{grantLeadership}} and {{stopWorker}}.

Hopefully the changes are not that big. I will try to draft more details and 
would be glad to hear your suggestions and ideas.

> Capability to define the numerical range for running TaskExecutors
> --
>
> Key: FLINK-11078
> URL: https://issues.apache.org/jira/browse/FLINK-11078
> Project: Flink
>  Issue Type: New Feature
>  Components: Client, ResourceManager
>Affects Versions: 1.8.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.8.0
>
>
> In the pre-FLIP-6 context, we start a yarn session with a fixed number of running 
> {{TaskManagers}}. This is good for users who run a series of small jobs on a 
> specific cluster, and it reduces the cost of deploying a cluster per job. In 
> the current code base, we start a yarn session with no pre-allocated TMs and 
> allocate them once a job is submitted and running.
> To get the benefits of both modes, I propose introducing a pair {{(min, max)}} 
> representing the minimum and maximum number of running 
> {{TaskExecutors}}.
> With such an option, when setting minimum = maximum = n we effectively have the 
> same behaviour as before with the pre-FLIP-6 code; and when setting minimum = 
> 0, maximum = inf we effectively have the same behaviour as the current code path.
> Most of the implementation work would be passing such an option through 
> {{FlinkYarnSessionCli}} and respecting it in the {{ResourceManager}} on 
> {{grantLeadership}} and {{stopWorker}}.
> Hopefully the changes are not that big. I will try to draft more details and 
> would be glad to hear your suggestions and ideas.
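
A minimal sketch of how the {{(min, max)}} pair could surface as configuration 
options; the option keys and the holder class below are hypothetical, only meant 
to illustrate the shape of the proposal:

{code:java}
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public class TaskExecutorRangeOptions {

    /** Hypothetical key: TaskExecutors pre-allocated when the session starts. */
    public static final ConfigOption<Integer> MIN_RUNNING_TASK_EXECUTORS =
        ConfigOptions.key("yarn.taskexecutor.minimum").defaultValue(0);

    /** Hypothetical key: bound beyond which the ResourceManager stops allocating. */
    public static final ConfigOption<Integer> MAX_RUNNING_TASK_EXECUTORS =
        ConfigOptions.key("yarn.taskexecutor.maximum").defaultValue(Integer.MAX_VALUE);
}
{code}

With these defaults, the current behaviour is kept; setting both keys to the same 
value n reproduces the pre-FLIP-6 behaviour.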



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11078) Capability to define the numerical range for running TaskExecutors

2018-12-05 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710072#comment-16710072
 ] 

TisonKun commented on FLINK-11078:
--

Yes. I've updated the description.

> Capability to define the numerical range for running TaskExecutors
> --
>
> Key: FLINK-11078
> URL: https://issues.apache.org/jira/browse/FLINK-11078
> Project: Flink
>  Issue Type: New Feature
>  Components: Client, ResourceManager
>Affects Versions: 1.8.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.8.0
>
>
> In the pre-FLIP-6 context, we start a yarn session with a fixed number of running 
> {{TaskManagers}}. This is good for users who run a series of small jobs on a 
> specific cluster, and it reduces the cost of deploying a cluster per job. In 
> the current code base, we start a yarn session with no pre-allocated TMs and 
> allocate them once a job is submitted and running.
> To get the benefits of both modes, I propose introducing a pair {{(min, max)}} 
> representing the minimum and maximum number of running 
> {{TaskExecutors}}.
> With such an option, when setting minimum = maximum = n we effectively have the 
> same behaviour as before with the pre-FLIP-6 code; and when setting minimum = 
> 0, maximum = inf we effectively have the same behaviour as the current code path.
> Most of the implementation work would be passing such an option through 
> {{FlinkYarnSessionCli}} and respecting it in the {{ResourceManager}} on 
> {{grantLeadership}} and {{stopWorker}}.
> Hopefully the changes are not that big. I will try to draft more details and 
> would be glad to hear your suggestions and ideas.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-10640) Enable Slot Resource Profile for Resource Management

2018-12-05 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710041#comment-16710041
 ] 

TisonKun edited comment on FLINK-10640 at 12/5/18 1:36 PM:
---

@[~wuzang]

After an offline discussion with [~till.rohrmann], for part of the "TM Management" 
issue, i.e., starting an arbitrary number of TMs once the yarn session is launched, 
I propose introducing a pair (min, max) representing the minimum and maximum 
number of running {{TaskExecutor}}s.

With such an option, when setting {{minimum = maximum = n}} we effectively have 
the same behaviour as before with the pre-FLIP-6 code, that is, a fixed number 
of pre-allocated TMs; and when setting {{minimum = 0, maximum = inf}} we 
effectively have the same behaviour as the current code path. I think such a 
feature improves "TM Management", especially when users want to run jobs on a 
specific cluster, and it requires fewer changes than an arbitrarily flexible 
"TM Management".

What do you think? (FYI I created a separate JIRA, FLINK-11078, to discuss this 
topic)


was (Author: tison):
@[~wuzang]

After an offline discussion with [~till.rohrmann], for part of the "TM Management" 
issue, i.e., starting an arbitrary number of TMs once the yarn session is launched, 
I propose introducing a pair (min, max) representing the minimum and maximum 
number of running {{TaskExecutor}}s.

With such an option, when setting {{minimum = maximum = n}} we effectively have 
the same behaviour as before with the pre-FLIP-6 code, that is, a fixed number 
of pre-allocated TMs; and when setting {{minimum = 0, maximum = inf}} we 
effectively have the same behaviour as the current code path. I think such a 
feature improves "TM Management", especially when users want to run jobs on a 
specific cluster, and it requires fewer changes than an arbitrarily flexible 
"TM Management".

What do you think?

> Enable Slot Resource Profile for Resource Management
> 
>
> Key: FLINK-10640
> URL: https://issues.apache.org/jira/browse/FLINK-10640
> Project: Flink
>  Issue Type: New Feature
>  Components: ResourceManager
>Reporter: Tony Xintong Song
>Priority: Major
>
> Motivation & Background
>  * The existing concept of task slots roughly represents how many pipelines of 
> tasks a TaskManager can hold. However, it does not consider the differences 
> in resource needs and usage of individual tasks. Enabling resource profiles 
> of slots may allow Flink to better allocate execution resources according to 
> tasks' fine-grained resource needs.
>  * The community version of Flink already contains APIs and some implementation 
> for slot resource profiles. However, such logic is not truly used. 
> (The ResourceProfile of slot requests is by default set to UNKNOWN with negative 
> values, and thus matches any given slot.)
> Preliminary Design
>  * Slot Management
>  A slot represents a certain amount of resources for a single pipeline of 
> tasks to run on a TaskManager. Initially, a TaskManager does not have any 
> slots but a total amount of resources. When allocating, the ResourceManager 
> finds proper TMs to generate new slots for the tasks to run on according to the 
> slot requests. Once generated, a slot's size (resource profile) does not 
> change until it is freed. The ResourceManager can apply different, portable 
> strategies to allocate slots from TaskManagers.
>  * TM Management
>  The size and number of TaskManagers and when to start them can also be 
> flexible. TMs can be started and released dynamically, and may have different 
> sizes. We may have many different, portable strategies. E.g., an elastic 
> session that can run multiple jobs like the session mode while dynamically 
> adjusting the size of the session (number of TMs) according to the realtime 
> working load.
>  * About Slot Sharing
>  Slot sharing is a good heuristic to easily calculate how many slots are needed 
> to get the job running and to get better utilization when there is no resource 
> profile in slots. However, with resource profiles enabling finer-grained 
> resource management, each individual task has its specific resource needs and 
> it does not make much sense to have multiple tasks sharing the resource of 
> the same slot. Instead, we may introduce locality preferences/constraints to 
> support the semantics of putting tasks in the same/different TMs in a more 
> general way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11078) Capability to define the numerical range for running TaskExecutors

2018-12-05 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-11078:
-
Description: 
In the pre-FLIP-6 context, we start a yarn session with a fixed number of running 
{{TaskManagers}}. This is good for users who run a series of small jobs on a 
specific cluster, and it reduces the cost of deploying a cluster per job. In the 
current code base, we start a yarn session with no pre-allocated TMs and allocate 
them once a job is submitted and running.

To get the benefits of both modes, I propose introducing a pair {{(min, max)}} 
representing the minimum and maximum number of running {{TaskExecutors}}.

With such an option, when setting minimum = maximum = n we effectively have the 
same behaviour as before with the pre-FLIP-6 code; and when setting minimum = 
0, maximum = inf we effectively have the same behaviour as the current code path.

Most of the implementation work would be passing such an option through 
{{FlinkYarnSessionCli}} and respecting it in the {{ResourceManager}} on 
{{grantLeadership}} (thus starting the {{SlotManager}}) and {{stopWorker}}.

Hopefully the changes are not that big. I will try to draft more details and 
would be glad to hear your suggestions and ideas.

  was:
In the pre-FLIP-6 context, we start a yarn session with a fixed number of running 
{{TaskManagers}}. This is good for users who run a series of small jobs on a 
specific cluster, and it reduces the cost of deploying a cluster per job. In the 
current code base, we start a yarn session with no pre-allocated TMs and allocate 
them once a job is submitted and running.

To get the benefits of both modes, I propose introducing a pair {{(min, max)}} 
representing the minimum and maximum number of running {{TaskExecutors}}.

With such an option, when setting minimum = maximum = n we effectively have the 
same behaviour as before with the pre-FLIP-6 code; and when setting minimum = 
0, maximum = inf we effectively have the same behaviour as the current code path.

Most of the implementation work would be passing such an option through 
{{FlinkYarnSessionCli}} and respecting it in the {{ResourceManager}} on 
{{grantLeadership}} and {{stopWorker}}.

Hopefully the changes are not that big. I will try to draft more details and 
would be glad to hear your suggestions and ideas.


> Capability to define the numerical range for running TaskExecutors
> --
>
> Key: FLINK-11078
> URL: https://issues.apache.org/jira/browse/FLINK-11078
> Project: Flink
>  Issue Type: New Feature
>  Components: Client, ResourceManager
>Affects Versions: 1.8.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.8.0
>
>
> In the pre-FLIP-6 context, we start a yarn session with a fixed number of running 
> {{TaskManagers}}. This is good for users who run a series of small jobs on a 
> specific cluster, and it reduces the cost of deploying a cluster per job. In 
> the current code base, we start a yarn session with no pre-allocated TMs and 
> allocate them once a job is submitted and running.
> To get the benefits of both modes, I propose introducing a pair {{(min, max)}} 
> representing the minimum and maximum number of running 
> {{TaskExecutors}}.
> With such an option, when setting minimum = maximum = n we effectively have the 
> same behaviour as before with the pre-FLIP-6 code; and when setting minimum = 
> 0, maximum = inf we effectively have the same behaviour as the current code path.
> Most of the implementation work would be passing such an option through 
> {{FlinkYarnSessionCli}} and respecting it in the {{ResourceManager}} on 
> {{grantLeadership}} (thus starting the {{SlotManager}}) and {{stopWorker}}.
> Hopefully the changes are not that big. I will try to draft more details and 
> would be glad to hear your suggestions and ideas.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11078) Capability to define the numerical range for running TaskExecutors

2018-12-05 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-11078:
-
Description: 
In the pre-FLIP-6 context, we start a yarn session with a fixed number of running 
{{TaskManagers}}. This is good for users who run a series of small jobs on a 
specific cluster, and it reduces the cost of deploying a cluster per job. In the 
current code base, we start a yarn session with no pre-allocated TMs and allocate 
them once a job is submitted and running.

To get the benefits of both modes, I propose introducing a pair {{(min, max)}} 
representing the minimum and maximum number of running {{TaskExecutors}}.

With such an option, when setting minimum = maximum = n we effectively have the 
same behaviour as before with the pre-FLIP-6 code; and when setting minimum = 
0, maximum = inf we effectively have the same behaviour as the current code path.

Most of the implementation work would be passing such an option through 
{{FlinkYarnSessionCli}} and respecting it in the {{ResourceManager}} on 
{{grantLeadership}} (thus starting the {{SlotManager}}) and {{stopWorker}}.

Hopefully the changes are not that big. I will try to draft more details. 
Glad to see your suggestions and ideas.

  was:
In the pre-FLIP-6 context, we start a yarn session with a fixed number of running 
{{TaskManagers}}. This is good for users who run a series of small jobs on a 
specific cluster, and it reduces the cost of deploying a cluster per job. In the 
current code base, we start a yarn session with no pre-allocated TMs and allocate 
them once a job is submitted and running.

To get the benefits of both modes, I propose introducing a pair {{(min, max)}} 
representing the minimum and maximum number of running {{TaskExecutors}}.

With such an option, when setting minimum = maximum = n we effectively have the 
same behaviour as before with the pre-FLIP-6 code; and when setting minimum = 
0, maximum = inf we effectively have the same behaviour as the current code path.

Most of the implementation work would be passing such an option through 
{{FlinkYarnSessionCli}} and respecting it in the {{ResourceManager}} on 
{{grantLeadership}} (thus starting the {{SlotManager}}) and {{stopWorker}}.

Hopefully the changes are not that big. I will try to draft more details and 
would be glad to hear your suggestions and ideas.


> Capability to define the numerical range for running TaskExecutors
> --
>
> Key: FLINK-11078
> URL: https://issues.apache.org/jira/browse/FLINK-11078
> Project: Flink
>  Issue Type: New Feature
>  Components: Client, ResourceManager
>Affects Versions: 1.8.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.8.0
>
>
> In the pre-FLIP-6 context, we start a yarn session with a fixed number of running 
> {{TaskManagers}}. This is good for users who run a series of small jobs on a 
> specific cluster, and it reduces the cost of deploying a cluster per job. In 
> the current code base, we start a yarn session with no pre-allocated TMs and 
> allocate them once a job is submitted and running.
> To get the benefits of both modes, I propose introducing a pair {{(min, max)}} 
> representing the minimum and maximum number of running 
> {{TaskExecutors}}.
> With such an option, when setting minimum = maximum = n we effectively have the 
> same behaviour as before with the pre-FLIP-6 code; and when setting minimum = 
> 0, maximum = inf we effectively have the same behaviour as the current code path.
> Most of the implementation work would be passing such an option through 
> {{FlinkYarnSessionCli}} and respecting it in the {{ResourceManager}} on 
> {{grantLeadership}} (thus starting the {{SlotManager}}) and {{stopWorker}}.
> Hopefully the changes are not that big. I will try to draft more details. 
> Glad to see your suggestions and ideas.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11079) Update LICENSE and NOTICE files for flnk-storm-examples

2018-12-05 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710100#comment-16710100
 ] 

TisonKun commented on FLINK-11079:
--

Given FLINK-10509, I wonder whether we are going to drop {{flink-storm}}, in which 
case {{flink-storm-examples}} would become invalid as well?

> Update LICENSE and NOTICE files for flnk-storm-examples
> ---
>
> Key: FLINK-11079
> URL: https://issues.apache.org/jira/browse/FLINK-11079
> Project: Flink
>  Issue Type: Bug
>  Components: Storm Compatibility
>Affects Versions: 1.5.5, 1.6.2, 1.7.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.5.6, 1.6.3, 1.8.0, 1.7.1
>
>
> Similar to FLINK-10987 we should also update the {{LICENSE}} and {{NOTICE}} 
> for {{flink-storm-examples}}.
>  
> This project creates several fat example jars that are deployed to maven 
> central.
> Alternatively we could think about dropping these examples.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)

2018-12-05 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710963#comment-16710963
 ] 

TisonKun commented on FLINK-10333:
--

Hi [~StephanEwen] and [~till.rohrmann]

After an offline discussion with [~xiaogang.shi], we see that ZK has a 
transactional mechanism with which we can ensure that only the leader writes to ZK.

Given this knowledge and the inconsistencies [~till.rohrmann] noticed, before 
going into a reimplementation, I did a survey of the usage of the ZK based stores 
in Flink. Ideally there is exactly one role that writes a specific znode.

There are four types of znodes that Flink writes. Besides the 
{{SubmittedJobGraphStore}} written by the {{Dispatcher}}, the 
{{CompletedCheckpointStore}} written by the {{JobMaster}} and the 
{{MesosWorkerStore}} written by the {{MesosResourceManager}}, there is the 
{{RunningJobsRegistry}} that also has a ZK based implementation.

All of the first three write to ZK under a heavy "store lock", but as 
[~xiaogang.shi] pointed out, this still lacks atomicity. With a solution based on 
ZK transactions (for example, the current {{Dispatcher}} leader performing 
{{setData}} with a {{check}} on the {{election-node-path}}), we can ensure 
atomicity while getting rid of the lock.

For the last one, the {{RunningJobsRegistry}}, the situation is a bit more 
complex. {{JobManagerRunner}} is responsible for {{setJobRunning}} and 
{{setJobFinished}}, while {{Dispatcher}} is responsible for {{clearJob}}. This 
goes against the ideal of one role per znode. I also notice that the gap between 
the semantics of {{DONE}} and those of clearing is ambiguous.

{{JobSchedulingStatus}} becomes {{DONE}} only once an {{ArchivedExecutionGraph}} 
has been generated, so we cannot prevent the same job from being processed again 
other than by checking an ephemeral {{DONE}} status. What if we replace 
{{setJobFinished}} with clearing the {{RunningJobsRegistry}}?
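
To make the transaction idea concrete, a minimal sketch with the Curator API; the 
znode paths are illustrative and this is not the actual store code:

{code:java}
import org.apache.curator.framework.CuratorFramework;

public class LeaderGuardedWrite {

    // The check on the election znode and the write commit atomically: if the
    // leadership changed in between, the whole transaction fails and nothing
    // is written.
    static void writeAsLeader(CuratorFramework client, byte[] stateHandle) throws Exception {
        client.inTransaction()
            .check().forPath("/flink/leader/election-node-path")
            .and()
            .setData().forPath("/flink/jobgraphs/job-id", stateHandle)
            .and()
            .commit();
    }
}
{code}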

> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -
>
> Key: FLINK-10333
> URL: https://issues.apache.org/jira/browse/FLINK-10333
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.5.3, 1.6.0, 1.7.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.8.0
>
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending on whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case 
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be 
> better to move {{RetrievableStateStorageHelper}} out of it for a better 
> separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even 
> if it is locked. This should not happen since it could leave another system 
> in an inconsistent state (imagine a changed {{JobGraph}} which restores from 
> an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} 
> which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me question how reliably these components actually work. 
> Since these components are very important, I propose to refactor them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)

2018-12-06 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710963#comment-16710963
 ] 

TisonKun edited comment on FLINK-10333 at 12/6/18 11:49 AM:


Hi [~StephanEwen] and [~till.rohrmann]

After an offline discussion with [~xiaogang.shi], we see that ZK has a 
transactional mechanism with which we can ensure that only the leader writes to ZK.

Given this knowledge and the inconsistencies [~till.rohrmann] noticed, before 
going into a reimplementation, I did a survey of the usage of the ZK based stores 
in Flink. Ideally there is exactly one role that writes a specific znode.

There are four types of znodes that Flink writes. Besides the 
{{SubmittedJobGraphStore}} written by the {{Dispatcher}}, the 
{{CompletedCheckpointStore}} written by the {{JobMaster}} and the 
{{MesosWorkerStore}} written by the {{MesosResourceManager}}, there is the 
{{RunningJobsRegistry}} that also has a ZK based implementation.

All of the first three write to ZK under a heavy "store lock", but as 
[~xiaogang.shi] pointed out, this still lacks atomicity. With a solution based on 
ZK transactions (for example, the current {{Dispatcher}} leader performing 
{{setData}} with a {{check}} on the {{election-node-path}}), we can ensure 
atomicity while getting rid of the lock.

For the last one, the {{RunningJobsRegistry}}, the situation is a bit more 
complex. {{JobManagerRunner}} is responsible for {{setJobRunning}} and 
{{setJobFinished}}, while {{Dispatcher}} is responsible for {{clearJob}}. This 
goes against the ideal of one role per znode. I also notice that the gap between 
the semantics of {{DONE}} and those of clearing is ambiguous.

We might try to prevent the same job from being processed again by an approach 
other than checking an ephemeral {{DONE}} status. What if we replace 
{{setJobFinished}} with clearing the {{RunningJobsRegistry}}?


was (Author: tison):
Hi [~StephanEwen] and [~till.rohrmann]

After an offline discussion with [~xiaogang.shi], we see that ZK has a 
transactional mechanism with which we can ensure that only the leader writes to ZK.

Given this knowledge and the inconsistencies [~till.rohrmann] noticed, before 
going into a reimplementation, I did a survey of the usage of the ZK based stores 
in Flink. Ideally there is exactly one role that writes a specific znode.

There are four types of znodes that Flink writes. Besides the 
{{SubmittedJobGraphStore}} written by the {{Dispatcher}}, the 
{{CompletedCheckpointStore}} written by the {{JobMaster}} and the 
{{MesosWorkerStore}} written by the {{MesosResourceManager}}, there is the 
{{RunningJobsRegistry}} that also has a ZK based implementation.

All of the first three write to ZK under a heavy "store lock", but as 
[~xiaogang.shi] pointed out, this still lacks atomicity. With a solution based on 
ZK transactions (for example, the current {{Dispatcher}} leader performing 
{{setData}} with a {{check}} on the {{election-node-path}}), we can ensure 
atomicity while getting rid of the lock.

For the last one, the {{RunningJobsRegistry}}, the situation is a bit more 
complex. {{JobManagerRunner}} is responsible for {{setJobRunning}} and 
{{setJobFinished}}, while {{Dispatcher}} is responsible for {{clearJob}}. This 
goes against the ideal of one role per znode. I also notice that the gap between 
the semantics of {{DONE}} and those of clearing is ambiguous.

{{JobSchedulingStatus}} becomes {{DONE}} only once an {{ArchivedExecutionGraph}} 
has been generated, so we cannot prevent the same job from being processed again 
other than by checking an ephemeral {{DONE}} status. What if we replace 
{{setJobFinished}} with clearing the {{RunningJobsRegistry}}?

> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -
>
> Key: FLINK-10333
> URL: https://issues.apache.org/jira/browse/FLINK-10333
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.5.3, 1.6.0, 1.7.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.8.0
>
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending on whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case 
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be 
> better to move {{RetrievableStateStorageHelper}} out of it for a better

[jira] [Commented] (FLINK-11097) Migrate flink-table runtime InputFormat classes

2018-12-07 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712631#comment-16712631
 ] 

TisonKun commented on FLINK-11097:
--

[~x1q1j1] if you don't have contributor permissions yet (to assign this JIRA to 
yourself), feel free to send a mail to the d...@flink.apache.org mailing list.

> Migrate flink-table runtime InputFormat classes 
> 
>
> Key: FLINK-11097
> URL: https://issues.apache.org/jira/browse/FLINK-11097
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: xuqianjin
>Priority: Major
>
> As discussed in FLINK-11065, this is a subtask which migrates the Scala files 
> under org.apache.flink.table.runtime.io in flink-table to Java in the 
> module flink-table-runtime



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11105) Add a new implementation of the HighAvailabilityServices using etcd

2018-12-07 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713102#comment-16713102
 ] 

TisonKun commented on FLINK-11105:
--

Thanks for bringing up this proposal! I like it and think such a new feature 
needs a bit more in-depth discussion. A design document describing the 
integration of etcd should help.

> Add a new implementation of the HighAvailabilityServices using etcd
> ---
>
> Key: FLINK-11105
> URL: https://issues.apache.org/jira/browse/FLINK-11105
> Project: Flink
>  Issue Type: New Feature
>Reporter: Yang Wang
>Priority: Major
>
> In flink, we use HighAvailabilityServices to do many things, e.g. RM/JM 
> leader election and retrieval. ZooKeeperHaServices is an implementation of 
> HighAvailabilityServices using Apache ZooKeeper. It is very easy to integrate 
> with the hadoop ecosystem. However, cloud native and micro services are 
> becoming more and more popular. We just need to follow this trend and add a new 
> implementation, EtcdHaService, using etcd.
> Now flink supports running a StandaloneSession on kubernetes, and FLINK-9953 
> starts to make a native integration with kubernetes. If we have the 
> EtcdHaService, both of them will benefit from this and we will not have to 
> deploy a zookeeper service on the kubernetes cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11106) Remove legacy flink-yarn component

2018-12-07 Thread TisonKun (JIRA)
TisonKun created FLINK-11106:


 Summary: Remove legacy flink-yarn component
 Key: FLINK-11106
 URL: https://issues.apache.org/jira/browse/FLINK-11106
 Project: Flink
  Issue Type: Sub-task
  Components: Client, Cluster Management, YARN
Affects Versions: 1.8.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.8.0


After the flink-yarn tests are ported to the new codebase, we are able to remove 
the legacy flink-yarn component, including

- {{ApplicationClient.scala}}
- {{YarnJobManager.scala}}
- {{YarnMessages.scala}}
- {{YarnTaskManager.scala}}
- {{org.apache.flink.yarn.messages.*}}
- {{LegacyYarnClusterDescriptor.java}}
- {{RegisteredYarnWorkerNode.java}}
- {{YarnApplicationMasterRunner.java}}
- {{YarnClusterClient.java}}
- {{YarnContainerInLaunch.java}}
- {{YarnFlinkResourceManager.java}}
- {{YarnResourceManagerCallbackHandler.java}}
- {{YarnTaskManagerRunnerFactory.java}}





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11106) Remove legacy flink-yarn component

2018-12-07 Thread TisonKun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TisonKun updated FLINK-11106:
-
Description: 
After the flink-yarn tests are ported to the new codebase, we are able to remove 
the legacy flink-yarn component, including

- {{ApplicationClient.scala}}
- {{YarnJobManager.scala}}
- {{YarnMessages.scala}}
- {{YarnTaskManager.scala}}
- {{org.apache.flink.yarn.messages.*}}
- {{LegacyYarnClusterDescriptor.java}}
- {{RegisteredYarnWorkerNode.java}}
- {{YarnApplicationMasterRunner.java}}
- {{YarnClusterClient.java}}
- {{YarnContainerInLaunch.java}}
- {{YarnFlinkResourceManager.java}}
- {{YarnResourceManagerCallbackHandler.java}}
- {{YarnTaskManagerRunnerFactory.java}}



  was:
After the flink-yarn tests are ported to the new codebase, we are able to remove 
the legacy flink-yarn component, including

- {{ApplicationClient.scala}}
- {{YarnJobManager.scala}}
- {{YarnMessages.scala}}
- {{YarnTaskManager.scala}}
- {{org.apache.flink.yarn.messages.*}}
- {{LegacyYarnClusterDescriptor.java}}
- {{RegisteredYarnWorkerNode.java}}
- {{YarnApplicationMasterRunner.java}}
- {{YarnClusterClient.java}}
- {{YarnContainerInLaunch.java}}
- {{YarnFlinkResourceManager.java}}
- {{YarnResourceManagerCallbackHandler.java}}
- {{YarnTaskManagerRunnerFactory.java}}




> Remove legacy flink-yarn component
> --
>
> Key: FLINK-11106
> URL: https://issues.apache.org/jira/browse/FLINK-11106
> Project: Flink
>  Issue Type: Sub-task
>  Components: Client, Cluster Management, YARN
>Affects Versions: 1.8.0
>Reporter: TisonKun
>Assignee: TisonKun
>Priority: Major
> Fix For: 1.8.0
>
>
> After the flink-yarn tests are ported to the new codebase, we are able to remove 
> the legacy flink-yarn component, including
> - {{ApplicationClient.scala}}
> - {{YarnJobManager.scala}}
> - {{YarnMessages.scala}}
> - {{YarnTaskManager.scala}}
> - {{org.apache.flink.yarn.messages.*}}
> - {{LegacyYarnClusterDescriptor.java}}
> - {{RegisteredYarnWorkerNode.java}}
> - {{YarnApplicationMasterRunner.java}}
> - {{YarnClusterClient.java}}
> - {{YarnContainerInLaunch.java}}
> - {{YarnFlinkResourceManager.java}}
> - {{YarnResourceManagerCallbackHandler.java}}
> - {{YarnTaskManagerRunnerFactory.java}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)

2018-12-09 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713900#comment-16713900
 ] 

TisonKun commented on FLINK-10333:
--

Diving into this issue, it is quite tricky, so we'd better list what problems we 
want to resolve and what questions we need to answer.
h3. *Problems with the current ZooKeeper based stores*

1. We don't have a good znode layout. The suggested layout is documented at 
{{ZooKeeperHaServices}}, but {{ZooKeeperCompletedCheckpointStore}} and others 
break it.
 2. We use the Curator API to write to the ZooKeeper based stores, which lacks 
atomicity. Specifically, we check the leadership and write the znode 
non-atomically.
 3. We use {{RunningJobsRegistry}} to track whether a job is running on a JM. 
Ideally the termination of a job at the JM, its termination at the dispatcher, 
and the removal of the submitted jobgraph should happen atomically, but they 
don't.
h3. *Questions about resolving the problems*

*1. How to lay out znodes?*
 We could follow the documentation at {{ZooKeeperHaServices}}. That is, all 
job related nodes should be children of the job node, and likewise for the 
cluster. For example, a znode like "checkpoint/job_id/1 persistent" is better 
changed to "job_id/checkpoint/1 persistent".

*2. Which actions do we want to be atomic?*
 At least two. One is checking the leadership and writing a znode, for example, 
committing a checkpoint only if the JobManager is the leader. The other is 
terminating a job atomically at the JM and the dispatcher; since the JM and the 
dispatcher are different components, it's important to deal with the gap between 
the JM finishing the job and the dispatcher committing it (specifically, removing 
the submitted jobgraph). It is also worth considering how users see the status of 
a job on the WebUI, depending on the JM or the dispatcher.

*3. How to achieve atomicity?*
 As discussed above, we can make use of ZooKeeper transactions. An alternative 
might be a distributed lock. This question is mainly about the implementation.
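
For questions 2 and 3, a minimal sketch of committing a job termination atomically 
as a single ZooKeeper transaction (Curator API; the znode paths are illustrative):

{code:java}
import org.apache.curator.framework.CuratorFramework;

public class AtomicJobTermination {

    // The jobgraph removal and the registry cleanup either both commit or both
    // fail, and only while the dispatcher's leader znode still exists.
    static void commitJobFinished(CuratorFramework client, String jobId) throws Exception {
        client.inTransaction()
            .check().forPath("/flink/leader/dispatcher")
            .and()
            .delete().forPath("/flink/jobgraphs/" + jobId)
            .and()
            .delete().forPath("/flink/running_job_registry/" + jobId)
            .and()
            .commit();
    }
}
{code}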

> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -
>
> Key: FLINK-10333
> URL: https://issues.apache.org/jira/browse/FLINK-10333
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.5.3, 1.6.0, 1.7.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.8.0
>
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending on whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case 
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be 
> better to move {{RetrievableStateStorageHelper}} out of it for a better 
> separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even 
> if it is locked. This should not happen since it could leave another system 
> in an inconsistent state (imagine a changed {{JobGraph}} which restores from 
> an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} 
> which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me question how reliably these components actually work. 
> Since these components are very important, I propose to refactor them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)

2018-12-09 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713900#comment-16713900
 ] 

TisonKun edited comment on FLINK-10333 at 12/9/18 8:47 AM:
---

Diving into this issue, it is quite tricky, so we'd better list what problems we 
want to resolve and what questions we need to answer.
h3. *Problems with the current ZooKeeper based stores*

1. We don't have a good znode layout. The suggested layout is documented at 
{{ZooKeeperHaServices}}, but {{ZooKeeperCompletedCheckpointStore}} and others 
break it.
 2. We use the Curator API to write to the ZooKeeper based stores, which lacks 
atomicity. Specifically, we check the leadership and write the znode 
non-atomically.
 3. We use {{RunningJobsRegistry}} to track whether a job is running on a JM. 
Ideally the termination of a job at the JM, its termination at the dispatcher, 
and the removal of the submitted jobgraph should happen atomically, but they 
don't.
h3. *Questions about resolving the problems*

*1. How to lay out znodes?*
 We could follow the documentation at {{ZooKeeperHaServices}}. That is, all 
job related nodes should be children of the job node, and likewise for the 
cluster. For example, a znode like "checkpoint/job_id/1 persistent" is better 
changed to "job_id/checkpoint/1 persistent".

*2. Which actions do we want to be atomic?*
 At least two. One is checking the leadership and writing a znode, for example, 
committing a checkpoint only if the JobManager is the leader. The other is 
terminating a job atomically at the JM and the dispatcher; since the JM and the 
dispatcher are different components, it's important to deal with the gap between 
the JM finishing the job and the dispatcher committing it (specifically, removing 
the submitted jobgraph). It is also worth considering how users see the status of 
a job on the WebUI, depending on the JM or the dispatcher.

*3. How to achieve atomicity?*
 As discussed above, we can make use of ZooKeeper transactions. An alternative 
might be a distributed lock. This question is mainly about the implementation.


was (Author: tison):
Diving into this issue, it is quite tricky, so we'd better list what problems we 
want to resolve and what questions we need to answer.
h3. *Problems with the current ZooKeeper based stores*

1. We don't have a good znode layout. The suggested layout is documented at 
{{ZooKeeperHaServices}}, but {{ZooKeeperCompletedCheckpointStore}} and others 
break it.
 2. We use the Curator API to write to the ZooKeeper based stores, which lacks 
atomicity. Specifically, we check the leadership and write the znode 
non-atomically.
 3. We use {{RunningJobsRegistry}} to track whether a job is running on a JM. 
Ideally the termination of a job at the JM, its termination at the dispatcher, 
and the removal of the submitted jobgraph should happen atomically, but they 
don't.
h3. *Questions about resolving the problems*

*1. How to lay out znodes?*
 We could follow the documentation at {{ZooKeeperHaServices}}. That is, all 
job related nodes should be children of the job node, and likewise for the 
cluster. For example, a znode like "checkpoint/job_id/1 persistent" is better 
changed to "job_id/checkpoint/1 persistent".

*2. Which actions do we want to be atomic?*
 At least two. One is checking the leadership and writing a znode, for example, 
committing a checkpoint only if the JobManager is the leader. The other is 
terminating a job atomically at the JM and the dispatcher; since the JM and the 
dispatcher are different components, it's important to deal with the gap between 
the JM finishing the job and the dispatcher committing it (specifically, removing 
the submitted jobgraph). It is also worth considering how users see the status of 
a job on the WebUI, depending on the JM or the dispatcher.

*3. How to achieve atomicity?*
 As discussed above, we can make use of ZooKeeper transactions. An alternative 
might be a distributed lock. This question is mainly about the implementation.

> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -
>
> Key: FLINK-10333
> URL: https://issues.apache.org/jira/browse/FLINK-10333
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.5.3, 1.6.0, 1.7.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.8.0
>
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending on whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
>

[jira] [Commented] (FLINK-11125) Remove useless import

2018-12-10 Thread TisonKun (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716274#comment-16716274
 ] 

TisonKun commented on FLINK-11125:
--

@[~hequn8128] If the change is not big, it might be handled as a hotfix. 
Otherwise it's better to make the title more specific, e.g. "Remove useless 
import under XXXClass" or something.

> Remove useless import 
> --
>
> Key: FLINK-11125
> URL: https://issues.apache.org/jira/browse/FLINK-11125
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL, Tests
>Reporter: Hequn Cheng
>Assignee: Hequn Cheng
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-13750) Separate HA services between client-/ and server-side

2019-08-30 Thread TisonKun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919295#comment-16919295
 ] 

TisonKun commented on FLINK-13750:
--

Hi here. I noticed that {{ClusterClient#getClusterConnectionInfo}} does not work 
as expected in the YARN scenario. In the YARN scenario we set the port exactly to 
the WebMonitor's port.


{code:java}
final String host = appReport.getHost();
final int rpcPort = appReport.getRpcPort();

LOG.info("Found application JobManager host name '{}' and port '{}' from 
supplied application id '{}'",
host, rpcPort, applicationId);

flinkConfiguration.setString(JobManagerOptions.ADDRESS, host);
flinkConfiguration.setInteger(JobManagerOptions.PORT, rpcPort);

flinkConfiguration.setString(RestOptions.ADDRESS, host);
flinkConfiguration.setInteger(RestOptions.PORT, rpcPort);
{code}

thus the interface never works as expected. I suspect this hasn't been found so 
far because no one relies on it.

Looking further into all usages (omitting uses in logs; all are under scala-shell 
and thus the remote executor), they actually communicate with the {{WebMonitor}} 
instead of the {{Dispatcher}}. I'd like to just remove this interface.

> Separate HA services between client-/ and server-side
> -
>
> Key: FLINK-13750
> URL: https://issues.apache.org/jira/browse/FLINK-13750
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Runtime / Coordination
>Reporter: Chesnay Schepler
>Assignee: TisonKun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we use the same {{HighAvailabilityServices}} on the client and 
> server. However, the client does not need several of the features that the 
> services currently provide (access to the blobstore or checkpoint metadata).
> Additionally, due to how these services are setup they also require the 
> client to have access to the blob storage, despite it never actually being 
> used, which can cause issues, like FLINK-13500.
> [~Tison] Would be be interested in this issue?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13912) Remove ClusterClient#getClusterConnectionInfo

2019-08-30 Thread TisonKun (Jira)
TisonKun created FLINK-13912:


 Summary: Remove ClusterClient#getClusterConnectionInfo
 Key: FLINK-13912
 URL: https://issues.apache.org/jira/browse/FLINK-13912
 Project: Flink
  Issue Type: Improvement
  Components: Command Line Client, Runtime / Coordination
Affects Versions: 1.10.0
Reporter: TisonKun
 Fix For: 1.10.0


As discussed in FLINK-13750, we actually don't need this method any more. All the 
configuration needed is the WebMonitor address and port in standalone HA mode. We 
can safely remove this method and replace its usages with 
{{ClusterClient#getWebInterfaceURL}}.

cc [~till.rohrmann]
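
The replacement itself is mechanical; a minimal sketch (the helper method below 
is hypothetical):

{code:java}
import org.apache.flink.client.program.ClusterClient;

public class ConnectionInfoMigration {

    // Callers that only need an endpoint to reach the cluster can use the
    // REST/web URL instead of the address/port pair previously obtained from
    // getClusterConnectionInfo().
    static String clusterEndpoint(ClusterClient<?> client) {
        return client.getWebInterfaceURL(); // e.g. "http://host:8081"
    }
}
{code}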



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (FLINK-13750) Separate HA services between client-/ and server-side

2019-08-30 Thread TisonKun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919380#comment-16919380
 ] 

TisonKun commented on FLINK-13750:
--

[~till.rohrmann] Sure. FYI FLINK-13912; I'd like to mark this one as blocked by 
that one, because we can then avoid introducing a new endpoint at the WebMonitor.

> Separate HA services between client-/ and server-side
> -
>
> Key: FLINK-13750
> URL: https://issues.apache.org/jira/browse/FLINK-13750
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Runtime / Coordination
>Reporter: Chesnay Schepler
>Assignee: TisonKun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we use the same {{HighAvailabilityServices}} on the client and 
> server. However, the client does not need several of the features that the 
> services currently provide (access to the blobstore or checkpoint metadata).
> Additionally, due to how these services are setup they also require the 
> client to have access to the blob storage, despite it never actually being 
> used, which can cause issues, like FLINK-13500.
> [~Tison] Would be be interested in this issue?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (FLINK-13912) Remove ClusterClient#getClusterConnectionInfo

2019-08-30 Thread TisonKun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919381#comment-16919381
 ] 

TisonKun commented on FLINK-13912:
--

If there are no further concerns I volunteer to start progress.

> Remove ClusterClient#getClusterConnectionInfo
> -
>
> Key: FLINK-13912
> URL: https://issues.apache.org/jira/browse/FLINK-13912
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Runtime / Coordination
>Affects Versions: 1.10.0
>Reporter: TisonKun
>Priority: Major
> Fix For: 1.10.0
>
>
> As discussed in FLINK-13750, we actually don't need this method any more. 
> All the configuration needed is the WebMonitor address and port in standalone 
> HA mode. We can safely remove this method and replace its usages with 
> {{ClusterClient#getWebInterfaceURL}}.
> cc [~till.rohrmann]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (FLINK-13750) Separate HA services between client-/ and server-side

2019-08-30 Thread TisonKun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919295#comment-16919295
 ] 

TisonKun edited comment on FLINK-13750 at 8/30/19 9:55 AM:
---

Hi here. I noticed that {{ClusterClient#getClusterConnectionInfo}} does not work 
as expected in the YARN scenario. In the YARN scenario we set the port exactly to 
the WebMonitor's port.


{code:java}
final String host = appReport.getHost();
final int rpcPort = appReport.getRpcPort();

LOG.info("Found application JobManager host name '{}' and port '{}' from 
supplied application id '{}'",
host, rpcPort, applicationId);

flinkConfiguration.setString(JobManagerOptions.ADDRESS, host);
flinkConfiguration.setInteger(JobManagerOptions.PORT, rpcPort);

flinkConfiguration.setString(RestOptions.ADDRESS, host);
flinkConfiguration.setInteger(RestOptions.PORT, rpcPort);
{code}

thus the interface never works as expected. I suspect this hasn't been found so 
far because no one relies on it.

Looking further into all usages (omitting uses in logs; all are under scala-shell 
and thus the remote executor), they actually communicate with the {{WebMonitor}} 
instead of the {{Dispatcher}}. I'd like to just remove this interface.


was (Author: tison):
Hi here. I noticed that {{ClusterClient#getClusterConnectionInfo}} does not work 
as expected in the YARN scenario. In the YARN scenario we set the port exactly to 
the WebMonitor's port.


{code:java}
final String host = appReport.getHost();
final int rpcPort = appReport.getRpcPort();

LOG.info("Found application JobManager host name '{}' and port '{}' from 
supplied application id '{}'",
host, rpcPort, applicationId);

flinkConfiguration.setString(JobManagerOptions.ADDRESS, host);
flinkConfiguration.setInteger(JobManagerOptions.PORT, rpcPort);

flinkConfiguration.setString(RestOptions.ADDRESS, host);
flinkConfiguration.setInteger(RestOptions.PORT, rpcPort);
{code}

thus the interface never works as expected. I suspect this hasn't been found so 
far because no one relies on it.

Looking further into all usages (omitting uses in logs; all are under scala-shell 
and thus the remote executor), they actually communicate with the {{WebMonitor}} 
instead of the {{Dispatcher}}. I'd like to just remove this interface.

> Separate HA services between client-/ and server-side
> -
>
> Key: FLINK-13750
> URL: https://issues.apache.org/jira/browse/FLINK-13750
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Runtime / Coordination
>Reporter: Chesnay Schepler
>Assignee: TisonKun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we use the same {{HighAvailabilityServices}} on the client and 
> server. However, the client does not need several of the features that the 
> services currently provide (access to the blobstore or checkpoint metadata).
> Additionally, due to how these services are setup they also require the 
> client to have access to the blob storage, despite it never actually being 
> used, which can cause issues, like FLINK-13500.
> [~Tison] Would be be interested in this issue?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13929) Revisit REST & JM URL

2019-08-30 Thread TisonKun (Jira)
TisonKun created FLINK-13929:


 Summary: Revisit REST & JM URL 
 Key: FLINK-13929
 URL: https://issues.apache.org/jira/browse/FLINK-13929
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration, Runtime / Coordination
Reporter: TisonKun


Currently we have several issues with the URL (i.e., ADDRESS and PORT) 
configurations of REST (WebMonitor) and JM (DispatcherRMComponent).
 # Client side code should only retrieve the REST PORT, but for historical 
reasons we sometimes pass the JM PORT. This has not become a problem because some 
of these values are unused, while in the other cases the JM PORT is incorrectly 
set to the REST PORT value, so we are wrong twice but end up succeeding.
 # Generally speaking, going back to the design of 
[FLIP-6|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077],
 there is no concept named {{WebMonitor}}; the responsibility for communicating 
with the client is covered by the {{Dispatcher}}. So there seems to be no reason 
to separate {{JobManagerOptions.ADDRESS}} and {{RestOptions.ADDRESS}}. Besides, 
we unfortunately use different PORTs because the REST server uses a netty 
connection while the JM requires an actor system which has to bind to another 
port. Theoretically all messages could be passed via the same port, either by 
handling REST requests in the Akka scope or by handling RPC in the netty scope, 
so that this "two-port" requirement is hopefully no longer needed.
 # nit: the deprecated config {{WebOptions.PORT}} is still in use at 
{{YarnEntrypointUtils.loadConfiguration}}. This should be easily resolved by 
replacing it with {{RestOptions.PORT}}; see the sketch below.
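
A minimal sketch of the nit fix in point 3; the helper class below is 
hypothetical and the surrounding {{loadConfiguration}} logic is elided:

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;

public class PortConfigFix {

    // Write the REST port under the non-deprecated key (was: WebOptions.PORT).
    static void setRestPort(Configuration configuration, int port) {
        configuration.setInteger(RestOptions.PORT, port);
    }
}
{code}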



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13932) PyTest ExecutionConfigTests.test_equals_and_hash fail

2019-08-30 Thread TisonKun (Jira)
TisonKun created FLINK-13932:


 Summary: PyTest ExecutionConfigTests.test_equals_and_hash fail
 Key: FLINK-13932
 URL: https://issues.apache.org/jira/browse/FLINK-13932
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Reporter: TisonKun


Not yet seen on master, but independent pull requests, even trivial fixes, fail 
on the same case:


{code:java}
=== FAILURES ===
__ ExecutionConfigTests.test_equals_and_hash ___
self = 
def test_equals_and_hash(self):

config1 = ExecutionEnvironment.get_execution_environment().get_config()

config2 = ExecutionEnvironment.get_execution_environment().get_config()

self.assertEqual(config1, config2)

>   self.assertEqual(hash(config1), hash(config2))
E   AssertionError: 897378335 != 1596606912
pyflink/common/tests/test_execution_config.py:277: AssertionError
=== warnings summary ===
.tox/py37/lib/python3.7/site-packages/py4j/java_collections.py:13
.tox/py37/lib/python3.7/site-packages/py4j/java_collections.py:13
.tox/py37/lib/python3.7/site-packages/py4j/java_collections.py:13
.tox/py37/lib/python3.7/site-packages/py4j/java_collections.py:13
.tox/py37/lib/python3.7/site-packages/py4j/java_collections.py:13
  
/home/travis/build/flink-ci/flink/flink-python/.tox/py37/lib/python3.7/site-packages/py4j/java_collections.py:13:
 DeprecationWarning: Using or importing the ABCs from 'collections' instead of 
from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import (
{code}
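
The failure indicates a broken equals/hashCode contract: the two configs 
compare equal but hash differently. As an illustration (a hypothetical class, 
not the actual pyflink wrapper), both methods must be derived from the same 
state:

{code:java}
import java.util.Objects;

// Hypothetical config-like class illustrating the contract: equal objects
// must produce equal hashes.
public final class ConfigLike {

    private final int parallelism;
    private final boolean closureCleanerEnabled;

    public ConfigLike(int parallelism, boolean closureCleanerEnabled) {
        this.parallelism = parallelism;
        this.closureCleanerEnabled = closureCleanerEnabled;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ConfigLike)) {
            return false;
        }
        ConfigLike that = (ConfigLike) o;
        return parallelism == that.parallelism
                && closureCleanerEnabled == that.closureCleanerEnabled;
    }

    @Override
    public int hashCode() {
        // Derived from the same fields as equals(); if this instead fell back
        // to identity hashing, two equal configs would hash differently,
        // which is exactly the failure reported above.
        return Objects.hash(parallelism, closureCleanerEnabled);
    }
}
{code}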

https://api.travis-ci.com/v3/job/229672435/log.txt
https://api.travis-ci.com/v3/job/229721832/log.txt

cc [~sunjincheng121] [~dian.fu]




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (FLINK-13417) Bump Zookeeper to 3.5.5

2019-09-03 Thread TisonKun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921386#comment-16921386
 ] 

TisonKun commented on FLINK-13417:
--

[~till.rohrmann]

Here is [the 
thread|https://lists.apache.org/x/thread.html/a35ef4da607121ac3621f7eee01b377b4d205b6bf676357bd25d949f@%3Cuser.zookeeper.apache.org%3E]
 as well as the upgrade FAQ 
[page|https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ]. 

It is announced that there are no intended or known breaking changes if we 
upgrade ZK to 3.5 and require quorums to run on 3.5. However, since we would 
like to make use of {{CreateMode.CONTAINER}}, after the upgrade that packet 
won't be recognized by a 3.4 server and will fail with 
{{KeeperException.UnimplementedException}}.

To sum up, the upgrade should not require any changes on the Flink side, but it 
would require our users to run Flink against ZK 3.5 servers, unless we add a 
switch to opt in and opt out, manually falling back from the new features 
introduced in 3.5.
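
For concreteness, a minimal sketch (assuming the ZooKeeper 3.5 client API; the 
connect string and znode path are made up) of the incompatibility described 
above:

{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public final class ContainerNodeProbe {

    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {});
        try {
            // Container znodes (new in 3.5) are cleaned up automatically once
            // their last child is gone, which is why we would like to use them.
            zk.create("/container-probe", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.CONTAINER);
        } catch (KeeperException.UnimplementedException e) {
            // A 3.4.x server does not understand the container create packet.
            System.err.println("Quorum still runs 3.4.x: " + e);
        } finally {
            zk.close();
        }
    }
}
{code}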

> Bump Zookeeper to 3.5.5
> ---
>
> Key: FLINK-13417
> URL: https://issues.apache.org/jira/browse/FLINK-13417
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.9.0
>Reporter: Konstantin Knauf
>Priority: Blocker
> Fix For: 1.10.0
>
>
> User might want to secure their Zookeeper connection via SSL.
> This requires a Zookeeper version >= 3.5.1. We might as well try to bump it 
> to 3.5.5, which is the latest version. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (FLINK-13946) Remove deactivated JobSession-related code.

2019-09-03 Thread TisonKun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921951#comment-16921951
 ] 

TisonKun commented on FLINK-13946:
--

Good to hear. I volunteer to review your patch :-)

> Remove deactivated JobSession-related code.
> ---
>
> Key: FLINK-13946
> URL: https://issues.apache.org/jira/browse/FLINK-13946
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission
>Affects Versions: 1.9.0
>Reporter: Kostas Kloudas
>Assignee: Kostas Kloudas
>Priority: Major
>
> This issue refers to removing the code related to job session as described in 
> [FLINK-2097|https://issues.apache.org/jira/browse/FLINK-2097].  The feature 
> is deactivated, as pointed by the comment 
> [here|https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/ExecutionEnvironment.java#L285]
>  and it complicates the code paths related to job submission, namely the 
> lifecycle of the Remote and LocalExecutors.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13961) Remove obsolete abstraction JobExecutor(Service)

2019-09-04 Thread TisonKun (Jira)
TisonKun created FLINK-13961:


 Summary: Remove obsolete abstraction JobExecutor(Service) 
 Key: FLINK-13961
 URL: https://issues.apache.org/jira/browse/FLINK-13961
 Project: Flink
  Issue Type: Sub-task
  Components: Client / Job Submission
Affects Versions: 1.10.0
Reporter: TisonKun
 Fix For: 1.10.0


Refer to Till's comment:

The JobExecutor and the JobExecutorService have been introduced to bridge 
between the Flip-6 MiniCluster and the legacy FlinkMiniCluster. They should be 
obsolete now and could be removed if needed.

Ideally we should make use of {{MiniClusterClient}} for submission, but we have 
some tests based on MiniCluster in flink-runtime or other modules that do not 
have a dependency on flink-clients, and it is unclear whether moving 
{{MiniClusterClient}} to flink-runtime would be reasonable. Thus I'd prefer to 
keep {{executeJobBlocking}} for now (see the sketch below) and defer the 
possible refactoring.
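
As a sketch of the path this keeps (assuming the flink-runtime {{MiniCluster}} 
API around 1.9; the helper class itself is made up), flink-runtime tests can 
submit a job blockingly without depending on flink-clients:

{code:java}
import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.runtime.minicluster.MiniCluster;

// Hypothetical helper showing the blocking submission used by tests that
// cannot reach MiniClusterClient in flink-clients.
public final class BlockingSubmission {

    private BlockingSubmission() {}

    /** Submits the given graph and blocks until the job terminates. */
    public static JobExecutionResult run(MiniCluster miniCluster, JobGraph jobGraph)
            throws Exception {
        return miniCluster.executeJobBlocking(jobGraph);
    }
}
{code}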



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (FLINK-13417) Bump Zookeeper to 3.5.5

2019-09-04 Thread TisonKun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922970#comment-16922970
 ] 

TisonKun commented on FLINK-13417:
--

Yes [~till.rohrmann]. I built locally with ZK 3.5 and no compile errors were 
reported, and the CI run I fired passed almost all builds, see also

https://travis-ci.org/TisonKun/flink/builds/580757901

which reported failures on {{HBaseConnectorITCase}} when starting the HBase 
MiniCluster, which in turn starts a MiniZooKeeperCluster. It seems like a 
problem in the testing class implementation.

Maybe [~carp84] can provide some input here.


{code:java}
java.io.IOException: Waiting for startup of standalone server

at 
org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.startup(MiniZooKeeperCluster.java:261)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniZKCluster(HBaseTestingUtility.java:814)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniZKCluster(HBaseTestingUtility.java:784)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1041)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:917)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:899)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:881)
at 
org.apache.flink.addons.hbase.util.HBaseTestingClusterAutoStarter.setUp(HBaseTestingClusterAutoStarter.java:147)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
{code}


> Bump Zookeeper to 3.5.5
> ---
>
> Key: FLINK-13417
> URL: https://issues.apache.org/jira/browse/FLINK-13417
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.9.0
>Reporter: Konstantin Knauf
>Priority: Blocker
> Fix For: 1.10.0
>
>
> User might want to secure their Zookeeper connection via SSL.
> This requires a Zookeeper version >= 3.5.1. We might as well try to bump it 
> to 3.5.5, which is the latest version. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (FLINK-13961) Remove obsolete abstraction JobExecutor(Service)

2019-09-04 Thread TisonKun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923003#comment-16923003
 ] 

TisonKun commented on FLINK-13961:
--

Hi [~kkl0u], I'd like to work on this issue. Please assign it to me if there 
are no further concerns. I'm going to start once FLINK-13946 is resolved, since 
there would be several conflicts and FLINK-13946 is almost done.

> Remove obsolete abstraction JobExecutor(Service) 
> -
>
> Key: FLINK-13961
> URL: https://issues.apache.org/jira/browse/FLINK-13961
> Project: Flink
>  Issue Type: Sub-task
>  Components: Client / Job Submission
>Affects Versions: 1.10.0
>Reporter: TisonKun
>Priority: Major
> Fix For: 1.10.0
>
>
> Refer to Till's comment:
> The JobExecutor and the JobExecutorService have been introduced to bridge 
> between the Flip-6 MiniCluster and the legacy FlinkMiniCluster. They should 
> be obsolete now and could be removed if needed.
> Ideally we should make use of {{MiniClusterClient}} for submission, but we 
> have some tests based on MiniCluster in flink-runtime or other modules that 
> do not have a dependency on flink-clients, and it is unclear whether moving 
> {{MiniClusterClient}} to flink-runtime would be reasonable. Thus I'd prefer 
> to keep {{executeJobBlocking}} for now and defer the possible refactoring.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

