[jira] [Commented] (FLINK-8976) End-to-end test: Resume with different parallelism

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413429#comment-16413429
 ] 

ASF GitHub Bot commented on FLINK-8976:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5745


> End-to-end test: Resume with different parallelism
> --------------------------------------------------
>
>                 Key: FLINK-8976
>                 URL: https://issues.apache.org/jira/browse/FLINK-8976
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Tests
>    Affects Versions: 1.5.0
>            Reporter: Till Rohrmann
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> Similar to FLINK-8975, we should have an end-to-end test which resumes a job 
> with a different parallelism after taking 
> a) a savepoint
> b) from the last retained checkpoint



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8976) End-to-end test: Resume with different parallelism

2018-03-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411091#comment-16411091
 ] 

ASF GitHub Bot commented on FLINK-8976:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/5745
  
We may be able to improve the test times by writing a custom reporter 
specifically for this test that writes to the log once the condition is met.




[jira] [Commented] (FLINK-8976) End-to-end test: Resume with different parallelism

2018-03-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411075#comment-16411075
 ] 

ASF GitHub Bot commented on FLINK-8976:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/5745#discussion_r176678588
  
--- Diff: flink-end-to-end-tests/test-scripts/test_resume_savepoint.sh ---
@@ -17,11 +17,25 @@
 # limitations under the License.
 

 
+if [ -z $1 ] || [ -z $2 ]; then
+  echo "Usage: ./test_resume_savepoint.sh  "
+  exit 1
+fi
+
 source "$(dirname "$0")"/common.sh
 
-# modify configuration to have 2 slots
+ORIGINAL_DOP=$1
+NEW_DOP=$2
+
+if (( $ORIGINAL_DOP >= $NEW_DOP )); then
+  NUM_SLOTS=$(( $ORIGINAL_DOP + 1 ))
--- End diff --

 Yes, it is for the Kafka event generator job.
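
The slot calculation discussed in this diff can be sketched as a small function. This is only an illustration: the helper name `required_slots` is hypothetical and not part of the PR, and it assumes the Kafka event generator job occupies exactly one slot next to the state machine job.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the PR): number of task slots the test needs.
required_slots() {
  local original_dop=$1
  local new_dop=$2

  # The state machine job runs with one parallelism before the savepoint and
  # another after the restore, so the larger of the two must fit.
  local max_dop=$(( original_dop >= new_dop ? original_dop : new_dop ))

  # Assumption: one extra slot for the Kafka event generator job.
  echo $(( max_dop + 1 ))
}

required_slots 2 4   # scale up, prints 5
required_slots 4 2   # scale down, prints 5
```

Taking the maximum first keeps the `+ 1` in a single place, which also makes it easier to document where it comes from.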




[jira] [Commented] (FLINK-8976) End-to-end test: Resume with different parallelism

2018-03-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411078#comment-16411078
 ] 

ASF GitHub Bot commented on FLINK-8976:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/5745
  
Thanks @zentol.
I'll merge this (as well as #5733) with your comments addressed, as soon as 
Travis shows green.




[jira] [Commented] (FLINK-8976) End-to-end test: Resume with different parallelism

2018-03-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411074#comment-16411074
 ] 

ASF GitHub Bot commented on FLINK-8976:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/5745#discussion_r176678540
  
--- Diff: flink-end-to-end-tests/test-scripts/test_resume_savepoint.sh ---
@@ -17,11 +17,25 @@
 # limitations under the License.
 

 
+if [ -z $1 ] || [ -z $2 ]; then
+  echo "Usage: ./test_resume_savepoint.sh  "
+  exit 1
+fi
+
 source "$(dirname "$0")"/common.sh
 
-# modify configuration to have 2 slots
+ORIGINAL_DOP=$1
+NEW_DOP=$2
+
+if (( $ORIGINAL_DOP >= $NEW_DOP )); then
+  NUM_SLOTS=$(( $ORIGINAL_DOP + 1 ))
+else
+  NUM_SLOTS=$(( $NEW_DOP + 1 ))
+fi
+
+# modify configuration to have enough slots
 cp $FLINK_DIR/conf/flink-conf.yaml $FLINK_DIR/conf/flink-conf.yaml.bak
-sed -i -e 's/taskmanager.numberOfTaskSlots: 1/taskmanager.numberOfTaskSlots: 2/' $FLINK_DIR/conf/flink-conf.yaml
+sed -i -e "s/taskmanager.numberOfTaskSlots: 1/taskmanager.numberOfTaskSlots: $NUM_SLOTS/" $FLINK_DIR/conf/flink-conf.yaml
 
 # modify configuration to use SLF4J reporter; we will be using this to monitor the state machine progress
 cp $FLINK_DIR/opt/flink-metrics-slf4j-1.6-SNAPSHOT.jar $FLINK_DIR/lib/
--- End diff --

That makes sense, will fix this.




[jira] [Commented] (FLINK-8976) End-to-end test: Resume with different parallelism

2018-03-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411068#comment-16411068
 ] 

ASF GitHub Bot commented on FLINK-8976:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/5745#discussion_r176677065
  
--- Diff: flink-end-to-end-tests/test-scripts/test_resume_savepoint.sh ---
@@ -17,11 +17,25 @@
 # limitations under the License.
 

 
+if [ -z $1 ] || [ -z $2 ]; then
+  echo "Usage: ./test_resume_savepoint.sh  "
+  exit 1
+fi
+
 source "$(dirname "$0")"/common.sh
 
-# modify configuration to have 2 slots
+ORIGINAL_DOP=$1
+NEW_DOP=$2
+
+if (( $ORIGINAL_DOP >= $NEW_DOP )); then
+  NUM_SLOTS=$(( $ORIGINAL_DOP + 1 ))
--- End diff --

It would be good to explain where the +1 comes from (I guess it is for the 
Kafka job).




[jira] [Commented] (FLINK-8976) End-to-end test: Resume with different parallelism

2018-03-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411067#comment-16411067
 ] 

ASF GitHub Bot commented on FLINK-8976:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/5745#discussion_r176677259
  
--- Diff: flink-end-to-end-tests/test-scripts/test_resume_savepoint.sh ---
@@ -17,11 +17,25 @@
 # limitations under the License.
 

 
+if [ -z $1 ] || [ -z $2 ]; then
+  echo "Usage: ./test_resume_savepoint.sh  "
+  exit 1
+fi
+
 source "$(dirname "$0")"/common.sh
 
-# modify configuration to have 2 slots
+ORIGINAL_DOP=$1
+NEW_DOP=$2
+
+if (( $ORIGINAL_DOP >= $NEW_DOP )); then
+  NUM_SLOTS=$(( $ORIGINAL_DOP + 1 ))
+else
+  NUM_SLOTS=$(( $NEW_DOP + 1 ))
+fi
+
+# modify configuration to have enough slots
 cp $FLINK_DIR/conf/flink-conf.yaml $FLINK_DIR/conf/flink-conf.yaml.bak
-sed -i -e 's/taskmanager.numberOfTaskSlots: 1/taskmanager.numberOfTaskSlots: 2/' $FLINK_DIR/conf/flink-conf.yaml
+sed -i -e "s/taskmanager.numberOfTaskSlots: 1/taskmanager.numberOfTaskSlots: $NUM_SLOTS/" $FLINK_DIR/conf/flink-conf.yaml
 
 # modify configuration to use SLF4J reporter; we will be using this to monitor the state machine progress
 cp $FLINK_DIR/opt/flink-metrics-slf4j-1.6-SNAPSHOT.jar $FLINK_DIR/lib/
--- End diff --

Can we define the version as an environment variable in common.sh?
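
One way this suggestion could look is sketched below. The variable name `FLINK_VERSION` and the helper `slf4j_reporter_jar` are assumptions made for illustration; the thread does not say what was actually added to common.sh.

```shell
#!/usr/bin/env bash

# Hypothetical addition to common.sh: keep the version in one place so that
# test scripts do not hard-code it in jar file names. The default matches the
# version that appears in the diff above and can be overridden from outside.
FLINK_VERSION="${FLINK_VERSION:-1.6-SNAPSHOT}"

# Hypothetical helper: path of the SLF4J metrics reporter jar in the
# distribution. FLINK_DIR is expected to be set by the test environment.
slf4j_reporter_jar() {
  echo "${FLINK_DIR}/opt/flink-metrics-slf4j-${FLINK_VERSION}.jar"
}
```

A test script that sources common.sh could then run `cp "$(slf4j_reporter_jar)" "$FLINK_DIR/lib/"` instead of spelling out the versioned file name.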




[jira] [Commented] (FLINK-8976) End-to-end test: Resume with different parallelism

2018-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409264#comment-16409264
 ] 

ASF GitHub Bot commented on FLINK-8976:
---

GitHub user tzulitai opened a pull request:

https://github.com/apache/flink/pull/5745

[FLINK-8976] [test] Add end-to-end test for resuming savepoints with 
different parallelism

## What is the purpose of the change

This PR adds end-to-end tests for resuming a savepoint with different 
parallelisms.


## Brief change log

The changes are based on the new `test_resume_savepoint.sh` test script in 
#5733, so only the last commit is relevant.

That script now accepts the parallelism of the state machine job before and 
after the savepoint restore as arguments.

- Adapt `test_resume_savepoint.sh` to be able to use different parallelism 
after the savepoint restore.
- Add scale up / scale down tests to be executed by Travis.
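
As an illustration of how the two parallelism arguments could be checked before the test starts, here is a minimal sketch. The function name `validate_dop_args` is hypothetical, and the stricter checks (quoting, integer validation) are a suggestion rather than what the PR does.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the PR): validate the original and new
# parallelism before using them in arithmetic.
validate_dop_args() {
  if [ "$#" -ne 2 ]; then
    echo "expected two arguments: original and new parallelism" >&2
    return 1
  fi

  local dop
  for dop in "$@"; do
    case "$dop" in
      # Reject empty values and anything containing a non-digit.
      ''|*[!0-9]*)
        echo "not a positive integer: '$dop'" >&2
        return 1
        ;;
    esac
    if [ "$dop" -lt 1 ]; then
      echo "parallelism must be at least 1: '$dop'" >&2
      return 1
    fi
  done
}
```

Quoting the arguments avoids relying on how `[ -z $1 ]` behaves when `$1` is empty, and rejecting non-numeric input early keeps a later `(( ... ))` comparison from failing with a less obvious error.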

## Verifying this change

This PR adds new tests.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (yes / **no**)
  - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
  - The serializers: (yes / **no** / don't know)
  - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
  - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
  - The S3 file system connector: (yes / **no** / don't know)

## Documentation

  - Does this pull request introduce a new feature? (yes / **no**)
  - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tzulitai/flink FLINK-8976

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5745.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5745


commit 529e060cb05fd723b8656dcc9ef48f8011282dd8
Author: Tzu-Li (Gordon) Tai 
Date:   2018-03-21T08:25:37Z

[FLINK-8975] [test] Add Kafka events generator job for StateMachineExample

commit 6b5126c006c752c6c1bee2699d429927e74587c9
Author: Tzu-Li (Gordon) Tai 
Date:   2018-03-21T08:32:51Z

[FLINK-8975] [test] Add resume from savepoint end-to-end test

commit 475ef4de0f8abbd3bb485e0e6fefcce2612074d3
Author: Tzu-Li (Gordon) Tai 
Date:   2018-03-22T07:34:49Z

fixup! Use SLF4J reporter to monitor state machine progress

commit 55d69567bc4cb61cd08ad852126c9c2c39b8feb4
Author: Tzu-Li (Gordon) Tai 
Date:   2018-03-22T09:09:53Z

[FLINK-8976] [test] Add end-to-end tests for resuming savepoint with 
differrent parallelism






[jira] [Commented] (FLINK-8976) End-to-end test: Resume with different parallelism

2018-03-19 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405096#comment-16405096
 ] 

Till Rohrmann commented on FLINK-8976:
--

My oversight. It should also work with incremental checkpoints. At least it 
worked for me when trying it out. I will correct the description.

> End-to-end test: Resume with different parallelism
> --------------------------------------------------
>
>                 Key: FLINK-8976
>                 URL: https://issues.apache.org/jira/browse/FLINK-8976
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Tests
>    Affects Versions: 1.5.0
>            Reporter: Till Rohrmann
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> Similar to FLINK-8975, we should have an end-to-end test which resumes a job 
> with a different parallelism after taking 
> a) a savepoint
> b) from the last retained checkpoint (this won't work with RocksDB 
> incremental checkpoints) 





[jira] [Commented] (FLINK-8976) End-to-end test: Resume with different parallelism

2018-03-16 Thread Sihua Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402023#comment-16402023
 ] 

Sihua Zhou commented on FLINK-8976:
---

Hi [~till.rohrmann] could I ask why this issue won't work with RocksDB 
incremental checkpoints?
