[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

2015-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328028#comment-14328028
 ] 

ASF GitHub Bot commented on FLINK-1556:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/422


> JobClient does not wait until a job failed completely if submission exception
> -
>
> Key: FLINK-1556
> URL: https://issues.apache.org/jira/browse/FLINK-1556
> Project: Flink
>  Issue Type: Bug
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>
> If an exception occurs during job submission, the {{JobClient}} receives a 
> {{SubmissionFailure}}. Upon receiving this message, the {{JobClient}} 
> terminates itself and returns the error to the {{Client}}. This indicates to 
> the user that the job has failed completely, which is not necessarily 
> true. 
> If the user submits another job directly after such a failure, not all slots 
> of the previously failed job may have been returned yet. This 
> can lead to a {{NoRessourceAvailableException}}.
> We can solve this problem by having the {{JobClient}} wait until the job 
> failure has completed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

2015-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327812#comment-14327812
 ] 

ASF GitHub Bot commented on FLINK-1556:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/422#issuecomment-75102524
  
Looks good, will merge this as well...




[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

2015-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327632#comment-14327632
 ] 

ASF GitHub Bot commented on FLINK-1556:
---

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/422

[FLINK-1556] Corrects faulty JobClient behaviour in case of a submission 
failure

Corrects the behaviour of the ```JobClient``` in case of a submission 
failure. The PR also contains test cases for the job submission.

Additionally, this PR reworks how exceptions are transmitted from the 
```JobManager``` to the ```JobClient```. They are wrapped directly into an 
```akka.actor.Status.Failure``` and sent to the ```JobClient```.
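
To illustrate the idea of shipping the exception itself inside a failure message, here is a minimal sketch in plain Java. It does not use Akka and is not the code from this PR: the `Failure` class is only a stand-in for `akka.actor.Status.Failure`, and `reply` and `handle` are hypothetical names chosen for this example.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Illustrative only: plain-Java stand-ins, not Akka or Flink classes.
final class StatusFailureSketch {

    /** Stand-in for akka.actor.Status.Failure: a message that carries the cause. */
    static final class Failure {
        final Throwable cause;

        Failure(Throwable cause) {
            this.cause = Objects.requireNonNull(cause);
        }
    }

    /** Sender side (hypothetical): run a step and reply with its result or a Failure. */
    static Object reply(Supplier<String> step) {
        try {
            return step.get();
        } catch (RuntimeException e) {
            // The exception object is wrapped as-is, so its stack trace is preserved.
            return new Failure(e);
        }
    }

    /** Receiver side (hypothetical): tell failures apart from regular replies. */
    static String handle(Object message) {
        if (message instanceof Failure) {
            Throwable cause = ((Failure) message).cause;
            return "failed: " + cause.getMessage();
        }
        return "ok: " + message;
    }

    public static void main(String[] args) {
        System.out.println(handle(reply(() -> "job-1")));
        System.out.println(handle(reply(() -> {
            throw new IllegalStateException("no slots");
        })));
    }
}
```

Because the receiver gets the original `Throwable` rather than a message string, it can still print or rethrow the full stack trace.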

This PR is based on #419.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink fixSubmissionExceptions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/422.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #422


commit 8cc604d61d75370972146333c5a016b5fcdddc77
Author: Till Rohrmann 
Date:   2015-02-19T10:04:56Z

[FLINK-1584] [runtime][tests] Fixes TaskManagerFailsITCase by replacing the 
TestingCluster with a ForkableFlinkMiniCluster

commit 8ecca959d2bf96fa8be1961b413f4a2c45cf50e1
Author: Till Rohrmann 
Date:   2015-02-19T11:44:32Z

[FLINK-1556] [runtime] Fails jobs properly in case of a job submission 
exception

Conflicts:

flink-runtime/src/test/scala/org/apache/flink/runtime/testingUtils/TestingUtils.scala

flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/taskmanager/TaskManagerFailsITCase.scala






[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

2015-02-17 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324364#comment-14324364
 ] 

Robert Metzger commented on FLINK-1556:
---

Also, it seems that these "fail early" jobs are not properly removed from the 
jobmanager?

http://imgur.com/PyuQEfm




[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

2015-02-17 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324359#comment-14324359
 ] 

Robert Metzger commented on FLINK-1556:
---

I think showing the whole stack trace of the exception would help to understand 
the deployment issue better.



[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

2015-02-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324122#comment-14324122
 ] 

ASF GitHub Bot commented on FLINK-1556:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/406




[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

2015-02-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323936#comment-14323936
 ] 

ASF GitHub Bot commented on FLINK-1556:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/406#issuecomment-74633679
  
Ok, I'll merge it.




[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

2015-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323219#comment-14323219
 ] 

ASF GitHub Bot commented on FLINK-1556:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/406#issuecomment-74565029
  
Looks good to me!




[jira] [Commented] (FLINK-1556) JobClient does not wait until a job failed completely if submission exception

2015-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322884#comment-14322884
 ] 

ASF GitHub Bot commented on FLINK-1556:
---

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/406

[FLINK-1556] Corrects faulty JobClient behaviour in case of a submission 
failure

Previously, if an error occurred during job submission, a 
```SubmissionFailure``` was sent to the ```JobClient```. In response, the 
```JobClient``` terminated itself and sent the failure to the ```Client```. 
However, this does not necessarily mean that the job has reached a terminal 
state, because the failure procedure is executed asynchronously.

The ```JobClient``` now waits until it receives a ```JobResult``` message 
indicating that the job has completed and all resources have been properly 
returned.
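
The difference between the two behaviours can be sketched in plain Java. This is a simplified model under stated assumptions, not the Flink or Akka code from this PR: `SlotPool`, `FailedSubmission`, `submitFailingJob`, and `awaitTermination` are hypothetical names, and a `CountDownLatch` stands in for the asynchronous failure procedure so that the timing is controllable.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: a toy model of the race, not Flink's implementation.
final class JobTerminationSketch {

    /** Toy slot pool shared by all jobs. */
    static final class SlotPool {
        private int free;

        SlotPool(int total) {
            free = total;
        }

        synchronized boolean tryAcquire(int n) {
            if (n > free) {
                return false;
            }
            free -= n;
            return true;
        }

        synchronized void release(int n) {
            free += n;
        }

        synchronized int available() {
            return free;
        }
    }

    /** What the client holds after a failed submission: the error and a pending result. */
    static final class FailedSubmission {
        final RuntimeException submissionFailure;
        final CompletableFuture<String> jobResult;

        FailedSubmission(RuntimeException failure, CompletableFuture<String> jobResult) {
            this.submissionFailure = failure;
            this.jobResult = jobResult;
        }
    }

    /** Takes slots, reports the failure at once, and frees the slots only asynchronously. */
    static FailedSubmission submitFailingJob(SlotPool pool, int slots, CountDownLatch cleanupGate) {
        if (!pool.tryAcquire(slots)) {
            throw new IllegalStateException("no resources available");
        }
        CompletableFuture<String> jobResult = CompletableFuture.supplyAsync(() -> {
            try {
                cleanupGate.await(); // models the delay of the asynchronous failure procedure
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            pool.release(slots);
            return "FAILED";
        });
        return new FailedSubmission(new RuntimeException("submission failed"), jobResult);
    }

    /** The fixed behaviour: block until the terminal job result has arrived. */
    static String awaitTermination(FailedSubmission submission) {
        try {
            return submission.jobResult.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the job result", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("job did not reach a terminal state", e);
        }
    }

    public static void main(String[] args) {
        SlotPool pool = new SlotPool(4);
        CountDownLatch gate = new CountDownLatch(1);
        FailedSubmission failed = submitFailingJob(pool, 4, gate);
        System.out.println("free slots right after the failure: " + pool.available());
        gate.countDown();
        System.out.println("terminal state: " + awaitTermination(failed));
        System.out.println("free slots after waiting: " + pool.available());
    }
}
```

In this model, a client that returns as soon as it sees `submissionFailure` can hand control back to the user while the slots are still held, so a second job submitted right away finds no free resources. Waiting on `jobResult` closes that window.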

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink minorFixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/406.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #406


commit 2f32e9c6b87b8e295f792c04306d78fbb858f80d
Author: Till Rohrmann 
Date:   2015-02-16T09:17:21Z

[FLINK-1556] [runtime] Corrects faulty JobClient behaviour in case of a 
submission failure



