[jira] [Commented] (FLINK-10400) Return failed JobResult if job terminates in state FAILED or CANCELED
[ https://issues.apache.org/jira/browse/FLINK-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630088#comment-16630088 ]

ASF GitHub Bot commented on FLINK-10400:

asfgit closed pull request #6742: [FLINK-10400] Fail JobResult if application finished in CANCELED or FAILED state
URL: https://github.com/apache/flink/pull/6742

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/flink-clients/src/main/java/org/apache/flink/client/program/MiniClusterClient.java b/flink-clients/src/main/java/org/apache/flink/client/program/MiniClusterClient.java
index 81cf784441d..3077f183acb 100644
--- a/flink-clients/src/main/java/org/apache/flink/client/program/MiniClusterClient.java
+++ b/flink-clients/src/main/java/org/apache/flink/client/program/MiniClusterClient.java
@@ -21,6 +21,7 @@
 import org.apache.flink.api.common.JobID;
 import org.apache.flink.api.common.JobSubmissionResult;
 import org.apache.flink.configuration.Configuration;
+import org.apache.flink.runtime.client.JobExecutionException;
 import org.apache.flink.runtime.client.JobStatusMessage;
 import org.apache.flink.runtime.clusterframework.messages.GetClusterStatusResponse;
 import org.apache.flink.runtime.executiongraph.AccessExecutionGraph;
@@ -94,8 +95,8 @@ public JobSubmissionResult submitJob(JobGraph jobGraph, ClassLoader classLoader)
 		try {
 			return jobResult.toJobExecutionResult(classLoader);
-		} catch (JobResult.WrappedJobException e) {
-			throw new ProgramInvocationException("Job failed", jobGraph.getJobID(), e.getCause());
+		} catch (JobExecutionException e) {
+			throw new ProgramInvocationException("Job failed", jobGraph.getJobID(), e);
 		} catch (IOException | ClassNotFoundException e) {
 			throw new ProgramInvocationException("Job failed", jobGraph.getJobID(), e);
 		}
diff --git a/flink-clients/src/main/java/org/apache/flink/client/program/rest/RestClusterClient.java b/flink-clients/src/main/java/org/apache/flink/client/program/rest/RestClusterClient.java
index 935a07faf89..86cc52da3b2 100644
--- a/flink-clients/src/main/java/org/apache/flink/client/program/rest/RestClusterClient.java
+++ b/flink-clients/src/main/java/org/apache/flink/client/program/rest/RestClusterClient.java
@@ -32,6 +32,7 @@
 import org.apache.flink.client.program.rest.retry.WaitStrategy;
 import org.apache.flink.configuration.Configuration;
 import org.apache.flink.core.fs.Path;
+import org.apache.flink.runtime.client.JobExecutionException;
 import org.apache.flink.runtime.client.JobStatusMessage;
 import org.apache.flink.runtime.client.JobSubmissionException;
 import org.apache.flink.runtime.clusterframework.messages.GetClusterStatusResponse;
@@ -263,8 +264,8 @@ public JobSubmissionResult submitJob(JobGraph jobGraph, ClassLoader classLoader)
 		try {
 			this.lastJobExecutionResult = jobResult.toJobExecutionResult(classLoader);
 			return lastJobExecutionResult;
-		} catch (JobResult.WrappedJobException we) {
-			throw new ProgramInvocationException("Job failed.", jobGraph.getJobID(), we.getCause());
+		} catch (JobExecutionException e) {
+			throw new ProgramInvocationException("Job failed.", jobGraph.getJobID(), e);
 		} catch (IOException | ClassNotFoundException e) {
 			throw new ProgramInvocationException("Job failed.", jobGraph.getJobID(), e);
 		}
diff --git a/flink-clients/src/test/java/org/apache/flink/client/program/rest/RestClusterClientTest.java b/flink-clients/src/test/java/org/apache/flink/client/program/rest/RestClusterClientTest.java
index 75f16c03330..abe59d38bb6 100644
--- a/flink-clients/src/test/java/org/apache/flink/client/program/rest/RestClusterClientTest.java
+++ b/flink-clients/src/test/java/org/apache/flink/client/program/rest/RestClusterClientTest.java
@@ -31,6 +31,7 @@
 import org.apache.flink.configuration.JobManagerOptions;
 import org.apache.flink.configuration.RestOptions;
 import org.apache.flink.runtime.client.JobStatusMessage;
+import org.apache.flink.runtime.clusterframework.ApplicationStatus;
 import org.apache.flink.runtime.concurrent.FutureUtils;
 import org.apache.flink.runtime.dispatcher.Dispatcher;
 import org.apache.flink.runtime.dispatcher.DispatcherGateway;
@@ -122,6 +123,7 @@
 import java.util.List;
 import
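
The client-side hunks above change which exception becomes the cause of the `ProgramInvocationException`: the caught job-level exception itself rather than its `getCause()`. The following is only a rough sketch of that pattern, not the Flink code. `DemoJobExecutionException`, `DemoProgramInvocationException` and `DemoClient` are hypothetical stand-ins, and the job ID argument of the real constructor is left out.

```java
// Hypothetical stand-ins for illustration only; these are not the Flink classes.
class DemoJobExecutionException extends Exception {
    DemoJobExecutionException(String message, Throwable cause) {
        super(message, cause);
    }
}

class DemoProgramInvocationException extends Exception {
    DemoProgramInvocationException(String message, Throwable cause) {
        super(message, cause);
    }
}

class DemoClient {

    /** Stand-in for a job run: always fails with a job-level exception wrapping a root cause. */
    static String runJob() throws DemoJobExecutionException {
        throw new DemoJobExecutionException(
                "Job execution failed.", new IllegalStateException("root cause"));
    }

    /** Mirrors the shape of the new catch block: the job-level exception becomes the cause. */
    static String submit() throws DemoProgramInvocationException {
        try {
            return runJob();
        } catch (DemoJobExecutionException e) {
            // Pass e rather than e.getCause(), so the job-level exception stays in the chain.
            throw new DemoProgramInvocationException("Job failed", e);
        }
    }

    public static void main(String[] args) {
        try {
            submit();
        } catch (DemoProgramInvocationException e) {
            // Walk the cause chain from the outermost exception to the root cause.
            for (Throwable t = e; t != null; t = t.getCause()) {
                System.out.println(t.getClass().getSimpleName() + ": " + t.getMessage());
            }
        }
    }
}
```

With this shape a caller can inspect `getCause()` to tell a job-level failure apart from, say, a deserialization problem, and still reach the root cause one level further down.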
[jira] [Commented] (FLINK-10400) Return failed JobResult if job terminates in state FAILED or CANCELED
[ https://issues.apache.org/jira/browse/FLINK-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628927#comment-16628927 ]

ASF GitHub Bot commented on FLINK-10400:

tillrohrmann commented on issue #6742: [FLINK-10400] Fail JobResult if application finished in CANCELED or FAILED state
URL: https://github.com/apache/flink/pull/6742#issuecomment-424755066

Thanks for the review @TisonKun and @zentol. Merging this PR once Travis gives the green light (after rebasing).

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Return failed JobResult if job terminates in state FAILED or CANCELED
> ---------------------------------------------------------------------
>
>                 Key: FLINK-10400
>                 URL: https://issues.apache.org/jira/browse/FLINK-10400
>             Project: Flink
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 1.6.1, 1.7.0, 1.5.4
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0, 1.6.2, 1.5.5
>
>
> If the job reaches the globally terminal state {{FAILED}} or {{CANCELED}},
> the {{JobResult}} must return a non-successful result. At the moment, it can
> happen that in the {{CANCELED}} state, where we don't find a failure cause,
> we return a successful {{JobResult}}.
> In order to change this I propose to always return a {{JobResult}} with a
> {{JobCancellationException}} in case of {{CANCELED}} and a
> {{JobExecutionException}} in case of {{FAILED}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-10400) Return failed JobResult if job terminates in state FAILED or CANCELED
[ https://issues.apache.org/jira/browse/FLINK-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628923#comment-16628923 ]

ASF GitHub Bot commented on FLINK-10400:

tillrohrmann commented on a change in pull request #6742: [FLINK-10400] Fail JobResult if application finished in CANCELED or FAILED state
URL: https://github.com/apache/flink/pull/6742#discussion_r220605439

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobResult.java ##
@@ -108,22 +118,39 @@ public long getNetRuntime() {
 	 *
 	 * @param classLoader to use for deserialization
 	 * @return JobExecutionResult
-	 * @throws WrappedJobException if the JobResult contains a serialized exception
+	 * @throws JobExecutionException if the job execution did not succeed

Review comment:
Good point @TisonKun. Will add it.
[jira] [Commented] (FLINK-10400) Return failed JobResult if job terminates in state FAILED or CANCELED
[ https://issues.apache.org/jira/browse/FLINK-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625942#comment-16625942 ]

ASF GitHub Bot commented on FLINK-10400:

TisonKun commented on a change in pull request #6742: [FLINK-10400] Fail JobResult if application finished in CANCELED or FAILED state
URL: https://github.com/apache/flink/pull/6742#discussion_r219871333

## File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobResult.java ##
@@ -108,22 +118,39 @@ public long getNetRuntime() {
 	 *
 	 * @param classLoader to use for deserialization
 	 * @return JobExecutionResult
-	 * @throws WrappedJobException if the JobResult contains a serialized exception
+	 * @throws JobExecutionException if the job execution did not succeed

Review comment:
This method might throw `JobCancellationException` if the job is cancelled; that should be documented.
[jira] [Commented] (FLINK-10400) Return failed JobResult if job terminates in state FAILED or CANCELED
[ https://issues.apache.org/jira/browse/FLINK-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625902#comment-16625902 ]

ASF GitHub Bot commented on FLINK-10400:

tillrohrmann opened a new pull request #6742: [FLINK-10400] Fail JobResult if application finished in CANCELED or FAILED state
URL: https://github.com/apache/flink/pull/6742

## What is the purpose of the change

When generating a `JobExecutionResult` from a `JobResult`, we fail the conversion if the application status was not `SUCCEEDED`. In case of the `CANCELED` state, the client will throw a `JobCancellationException`. In case of the `FAILED` state, the client will throw a `JobExecutionException`.

## Brief change log

- Add `ApplicationStatus` field to `JobResult`
- Throw `JobCancellationException` if `ApplicationStatus` was `CANCELED`
- Throw `JobExecutionException` if `ApplicationStatus` was `FAILED`

## Verifying this change

- Added `JobResultTest#testCancelledJobIsFailureResult`, `#testFailedJobIsFailureResult`, `#testCancelledJobThrowsJobCancellationException` and `#testFailedJobThrowsJobExecutionException`.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
- The S3 file system connector: (no)

## Documentation

- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
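
The conversion rule described in the pull request above (succeed only for `SUCCEEDED`, throw a cancellation exception for `CANCELED`, throw an execution exception for `FAILED`) might look roughly like the sketch below. It is a self-contained illustration, not Flink's `JobResult`: every type here (`SketchApplicationStatus`, `SketchJobExecutionException`, `SketchJobCancellationException`, `SketchJobResult`) is a hypothetical stand-in, the result is reduced to a runtime in milliseconds, and modelling the cancellation exception as a subclass of the execution exception is an assumption of the sketch.

```java
// Hypothetical stand-ins for illustration only; these are not the Flink classes.
enum SketchApplicationStatus { SUCCEEDED, FAILED, CANCELED }

class SketchJobExecutionException extends Exception {
    SketchJobExecutionException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Assumption of this sketch: cancellation is modelled as a special case of a failed execution.
class SketchJobCancellationException extends SketchJobExecutionException {
    SketchJobCancellationException(String message, Throwable cause) {
        super(message, cause);
    }
}

class SketchJobResult {
    private final SketchApplicationStatus status;
    private final Throwable failureCause; // may be null, for example for a cancelled job
    private final long netRuntimeMillis;

    SketchJobResult(SketchApplicationStatus status, Throwable failureCause, long netRuntimeMillis) {
        this.status = status;
        this.failureCause = failureCause;
        this.netRuntimeMillis = netRuntimeMillis;
    }

    /** Only SUCCEEDED counts as success, whether or not a failure cause is present. */
    boolean isSuccess() {
        return status == SketchApplicationStatus.SUCCEEDED;
    }

    /** Returns the net runtime on success; otherwise throws according to the status. */
    long toExecutionResult() throws SketchJobExecutionException {
        if (status == SketchApplicationStatus.SUCCEEDED) {
            return netRuntimeMillis;
        }
        if (status == SketchApplicationStatus.CANCELED) {
            throw new SketchJobCancellationException("Job was cancelled.", failureCause);
        }
        throw new SketchJobExecutionException("Job execution failed.", failureCause);
    }
}
```

The point of keying the decision on the status rather than on the presence of a failure cause is that a cancelled job without a recorded cause can no longer be mistaken for a successful one.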
[jira] [Commented] (FLINK-10400) Return failed JobResult if job terminates in state FAILED or CANCELED
[ https://issues.apache.org/jira/browse/FLINK-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625030#comment-16625030 ]

tison commented on FLINK-10400:
-------------------------------

Agreed. It is a code wart that should be fixed. To be clearer: return a {{JobResult}} with the {{Exception}} as described, and {{addSuppressed}} the failure cause if there is one.
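
The {{addSuppressed}} suggestion above could be sketched as follows, using only the standard `Throwable.addSuppressed` method. This is an illustration of the idea in the comment, not necessarily what the merged change does; `SuppressedCauseSketch` and `cancellationWith` are hypothetical names, and a plain `Exception` stands in for the job-level exception type.

```java
// Hypothetical helper for illustration only; not part of Flink.
class SuppressedCauseSketch {

    /** Builds the job-level exception and attaches the failure cause, if any, as suppressed. */
    static Exception cancellationWith(Throwable failureCause) {
        Exception cancelled = new Exception("Job was cancelled.");
        if (failureCause != null) {
            // addSuppressed rejects null, so the missing-cause case is skipped explicitly.
            cancelled.addSuppressed(failureCause);
        }
        return cancelled;
    }

    public static void main(String[] args) {
        Exception withCause = cancellationWith(new IllegalStateException("task failed"));
        System.out.println(withCause.getSuppressed().length);

        Exception withoutCause = cancellationWith(null);
        System.out.println(withoutCause.getSuppressed().length);
    }
}
```

Either way the caller always receives an exception for a non-successful job; the failure cause is extra detail that may or may not be attached.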