[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531842#comment-16531842 ]

ASF GitHub Bot commented on FLINK-8785:
---

Github user zentol closed the pull request at:

    https://github.com/apache/flink/pull/6222

> JobSubmitHandler does not handle JobSubmissionExceptions
>
>
> Key: FLINK-8785
> URL: https://issues.apache.org/jira/browse/FLINK-8785
> Project: Flink
> Issue Type: Bug
> Components: Job-Submission, JobManager, REST
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Chesnay Schepler
> Assignee: Chesnay Schepler
> Priority: Blocker
> Labels: flip-6, pull-request-available
> Fix For: 1.6.0, 1.5.1
>
>
> If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a
> {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal
> server error" instead of signaling the failed job submission.
> This can for example occur if the transmitted execution graph is faulty, as
> tested by the {{JobSubmissionFailsITCase}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531293#comment-16531293 ] ASF GitHub Bot commented on FLINK-8785: --- Github user zentol commented on the issue: https://github.com/apache/flink/pull/6222 Merging, will create the issue once I'm done. > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0, 1.6.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Blocker > Labels: flip-6, pull-request-available > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531275#comment-16531275 ] ASF GitHub Bot commented on FLINK-8785: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/6222#discussion_r199783421 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/JobSubmitHandler.java --- @@ -66,6 +67,9 @@ public JobSubmitHandler( } return gateway.submitJob(jobGraph, timeout) - .thenApply(ack -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID())); + .thenApply(ack -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID())) + .exceptionally(exception -> { + throw new CompletionException(new RestHandlerException("Job submission failed.", HttpResponseStatus.INTERNAL_SERVER_ERROR, exception)); --- End diff -- I would be in favor of approach 3 because we are doing something similar for the `JobExecutionResult`/`JobResult`. We could then throw the exception on the `RestClusterClient`. And I also agree that this is something we can add as a follow up. Can you please create a JIRA issue for this @zentol. > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0, 1.6.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Blocker > Labels: flip-6, pull-request-available > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530990#comment-16530990 ] ASF GitHub Bot commented on FLINK-8785: --- Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/6222#discussion_r199719645 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/JobSubmitHandler.java --- @@ -66,6 +67,9 @@ public JobSubmitHandler( } return gateway.submitJob(jobGraph, timeout) - .thenApply(ack -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID())); + .thenApply(ack -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID())) + .exceptionally(exception -> { + throw new CompletionException(new RestHandlerException("Job submission failed.", HttpResponseStatus.INTERNAL_SERVER_ERROR, exception)); --- End diff -- Do note that this discussion isn't really blocking the PR from being merged as it would effectively be an extension of it. > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0, 1.6.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Blocker > Labels: flip-6, pull-request-available > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529799#comment-16529799 ]

ASF GitHub Bot commented on FLINK-8785:
---

Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6222#discussion_r199477966

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/JobSubmitHandler.java ---
    @@ -66,6 +67,9 @@ public JobSubmitHandler(
             }

             return gateway.submitJob(jobGraph, timeout)
    -            .thenApply(ack -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID()));
    +            .thenApply(ack -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID()))
    +            .exceptionally(exception -> {
    +                throw new CompletionException(new RestHandlerException("Job submission failed.", HttpResponseStatus.INTERNAL_SERVER_ERROR, exception));
    --- End diff --

Well, there's no doubt that it _could_ be helpful; my point is that it can be _harmful_ if not done properly.

`submitJob` should either provide the `JobSubmitHandler` with means to detect these exceptions and create adequate responses, or explicitly throw exceptions with messages that we can safely pass on to users.

That said, I do not know how to do either of these things in a good way.

For completeness' sake, here are ideas that came to mind:

## 1

Introduce a special `FlinkUserFacingException` that we "trust" to contain a good error message.

Con: This provides little additional safety and will never provide a proper HTTP response code.

## 2

Introduce dedicated exceptions for the scenarios that you listed and explicitly look for them in the `exceptionally` block, i.e.

```
.exceptionally(exception -> {
	if (exception instanceof JobAlreadyExistsException) {
		throw new CompletionException(new RestHandlerException("Job already exists.", HttpResponseStatus.BAD_REQUEST, exception));
	} else {
		throw new CompletionException(new RestHandlerException("Job submission failed.", HttpResponseStatus.INTERNAL_SERVER_ERROR, exception));
	}
});
```

Con: Obviously, this approach is inherently flawed as there is no guarantee that a given exception can be thrown; we would have to manually keep it in sync with the actual implementation because `CompletableFuture`s throw a wrench into sane exception handling.

## 3

Encode possible user-facing exceptions in the return value of `submitJob`, i.e. return an `AckOrException`

```
public class AckOrException {
	// holds exception, could also be a series of nullable fields
	private final SuperEither exception;
	...
	public void throwIfError() throws ExceptionA, ExceptionB, ExceptionC;
}
```

Con: Relies on users to call `throwIfError` and introduces an entirely separate channel for passing errors, but it would allow exception matching.

> JobSubmitHandler does not handle JobSubmissionExceptions
>
>
> Key: FLINK-8785
> URL: https://issues.apache.org/jira/browse/FLINK-8785
> Project: Flink
> Issue Type: Bug
> Components: Job-Submission, JobManager, REST
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Chesnay Schepler
> Assignee: Chesnay Schepler
> Priority: Blocker
> Labels: flip-6, pull-request-available
>
>
> If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a
> {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal
> server error" instead of signaling the failed job submission.
> This can for example occur if the transmitted execution graph is faulty, as
> tested by the {{JobSubmissionFailsITCase}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
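Idea 3 from the comment above can be illustrated with a small, self-contained sketch. It is not Flink code: `SubmissionResult`, `JobRejectedException`, and the `submit` stand-in are hypothetical names invented for this example, and the real `submitJob` signature is not reproduced here. The point is only that the future completes normally and the caller rethrows a typed, user-facing exception from the returned value.

```java
import java.util.concurrent.CompletableFuture;

public class SubmissionResultSketch {

    /** Hypothetical user-facing failure whose message is safe to show to a client. */
    static class JobRejectedException extends Exception {
        JobRejectedException(String message) {
            super(message);
        }
    }

    /** Holds either an acknowledgement (failure == null) or a user-facing failure. */
    static final class SubmissionResult {
        private final JobRejectedException failure;

        private SubmissionResult(JobRejectedException failure) {
            this.failure = failure;
        }

        static SubmissionResult ack() {
            return new SubmissionResult(null);
        }

        static SubmissionResult rejected(String reason) {
            return new SubmissionResult(new JobRejectedException(reason));
        }

        /** Rethrows the stored failure as a checked, matchable exception. */
        void throwIfError() throws JobRejectedException {
            if (failure != null) {
                throw failure;
            }
        }
    }

    /** Stand-in for a submit call: the future completes normally even when the job is rejected. */
    static CompletableFuture<SubmissionResult> submit(String jobName) {
        return CompletableFuture.completedFuture(
            jobName.isEmpty()
                ? SubmissionResult.rejected("Job graph is empty.")
                : SubmissionResult.ack());
    }

    /** Caller side: unpack the result and handle the typed exception. */
    static String describe(String jobName) {
        try {
            submit(jobName).join().throwIfError();
            return "accepted";
        } catch (JobRejectedException e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("wordcount"));
        System.out.println(describe(""));
    }
}
```

As the comment notes, the weakness of this design is that nothing forces the caller to invoke `throwIfError`; a caller that ignores the result silently drops the failure.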
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528914#comment-16528914 ] ASF GitHub Bot commented on FLINK-8785: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/6222#discussion_r199333683 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/JobSubmitHandler.java --- @@ -66,6 +67,9 @@ public JobSubmitHandler( } return gateway.submitJob(jobGraph, timeout) - .thenApply(ack -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID())); + .thenApply(ack -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID())) + .exceptionally(exception -> { + throw new CompletionException(new RestHandlerException("Job submission failed.", HttpResponseStatus.INTERNAL_SERVER_ERROR, exception)); --- End diff -- I see your point. I'm just wondering whether some bits of context wouldn't be helpful on the client side when using the CLI. So for example if the job was misconfigured or if it was already submitted to the cluster in HA mode, then it would be helpful for the user to know. > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0, 1.6.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Blocker > Labels: flip-6, pull-request-available > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528609#comment-16528609 ]

ASF GitHub Bot commented on FLINK-8785:
---

Github user yanghua commented on the issue:

    https://github.com/apache/flink/pull/6229

Hi @satybald, thanks for your contribution. Based on [Flink's contribution guide](http://flink.apache.org/how-to-contribute.html), you should open an issue in [JIRA](https://issues.apache.org/jira/projects/FLINK/issues/FLINK-8785?filter=allopenissues) before submitting a PR.

> JobSubmitHandler does not handle JobSubmissionExceptions
>
>
> Key: FLINK-8785
> URL: https://issues.apache.org/jira/browse/FLINK-8785
> Project: Flink
> Issue Type: Bug
> Components: Job-Submission, JobManager, REST
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Chesnay Schepler
> Assignee: Chesnay Schepler
> Priority: Blocker
> Labels: flip-6, pull-request-available
>
>
> If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a
> {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal
> server error" instead of signaling the failed job submission.
> This can for example occur if the transmitted execution graph is faulty, as
> tested by the {{JobSubmissionFailsITCase}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528071#comment-16528071 ]

ASF GitHub Bot commented on FLINK-8785:
---

Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6222#discussion_r199248513

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/JobSubmitHandler.java ---
    @@ -66,6 +67,9 @@ public JobSubmitHandler(
             }

             return gateway.submitJob(jobGraph, timeout)
    -            .thenApply(ack -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID()));
    +            .thenApply(ack -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID()))
    +            .exceptionally(exception -> {
    +                throw new CompletionException(new RestHandlerException("Job submission failed.", HttpResponseStatus.INTERNAL_SERVER_ERROR, exception));
    --- End diff --

I'm not quite fond of the idea. As alluded to in the PR description and JIRA, I agree that the existing error message isn't _helpful_, yet it is better than the current state.

I rather like that so far the REST API has control over the error messages. This ensures that the user only sees messages that were actually meant for them.

In contrast, exception messages are pretty much arbitrary. They may change at will, the audience isn't defined (user vs dev), they may only be helpful if the full stack trace is present, they often don't have any message at all (see usages of `Preconditions`, or NPEs), and they typically only describe what went wrong, not why, how to fix it, or whether it even was a user error.

Given that this would break down the barrier between internal and user-facing messages, you obviously also run into cases where users have _no idea_ what the message even means. Finally, you end up with mismatches between the error message and the error code.

To me the underlying issue is that `submitJob` funnels all manner of exceptions into a `FlinkException/JobSubmissionException` that we can't do much with. We can neither categorize them in any way, nor distinguish who is responsible (user vs Flink), nor tell when in the process the failure occurred. Without diving into the implementation you don't _even know which exceptions are thrown_, but I suppose this is a general issue of `CompletableFutures`.

> JobSubmitHandler does not handle JobSubmissionExceptions
>
>
> Key: FLINK-8785
> URL: https://issues.apache.org/jira/browse/FLINK-8785
> Project: Flink
> Issue Type: Bug
> Components: Job-Submission, JobManager, REST
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Chesnay Schepler
> Assignee: Chesnay Schepler
> Priority: Blocker
> Labels: flip-6, pull-request-available
>
>
> If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a
> {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal
> server error" instead of signaling the failed job submission.
> This can for example occur if the transmitted execution graph is faulty, as
> tested by the {{JobSubmissionFailsITCase}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
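The closing remark about `CompletableFuture` in the comment above has a concrete side that is easy to miss: a future's type says nothing about which exceptions it may carry, and a failure that passes through a dependent stage such as `thenApply` reaches `exceptionally` wrapped in a `CompletionException`. The sketch below, which uses only the JDK, shows the wrapping and one way to classify failures after unwrapping. `DuplicateJobException` and the status codes are assumptions made for this example, not Flink classes or Flink's actual behavior.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class FailureClassificationSketch {

    /** Hypothetical exception marking a failure caused by the user's request. */
    static class DuplicateJobException extends Exception {
        DuplicateJobException(String message) {
            super(message);
        }
    }

    /** Strips the CompletionException wrappers that dependent stages add. */
    static Throwable unwrap(Throwable t) {
        while (t instanceof CompletionException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    /** Picks an HTTP status: 400 for the known user error, 500 for everything else. */
    static int statusFor(Throwable failure) {
        return unwrap(failure) instanceof DuplicateJobException ? 400 : 500;
    }

    /** Resolves to the status a handler would choose; 202 on success is an arbitrary choice here. */
    static CompletableFuture<Integer> status(CompletableFuture<String> submission) {
        return submission
            .thenApply(ack -> 202)
            .exceptionally(FailureClassificationSketch::statusFor);
    }

    /** Reports whether the failure seen after a dependent stage is a CompletionException. */
    static boolean isWrappedAfterThenApply(CompletableFuture<String> submission) {
        return submission
            .thenApply(ack -> ack)
            .handle((value, failure) -> failure instanceof CompletionException)
            .join();
    }

    public static void main(String[] args) {
        CompletableFuture<String> duplicate = new CompletableFuture<>();
        duplicate.completeExceptionally(new DuplicateJobException("Job already exists."));

        System.out.println(isWrappedAfterThenApply(duplicate)); // true: the original type is hidden
        System.out.println(status(duplicate).join());           // 400 once the wrapper is stripped
    }
}
```

An `instanceof` check placed directly on the argument of `exceptionally`, without the unwrapping step, would not match in this chain, which is one more reason a mapping of this kind has to be kept in sync with the implementation by hand.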
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527669#comment-16527669 ] ASF GitHub Bot commented on FLINK-8785: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/6222#discussion_r199170250 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/JobSubmitHandler.java --- @@ -66,6 +67,9 @@ public JobSubmitHandler( } return gateway.submitJob(jobGraph, timeout) - .thenApply(ack -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID())); + .thenApply(ack -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID())) + .exceptionally(exception -> { + throw new CompletionException(new RestHandlerException("Job submission failed.", HttpResponseStatus.INTERNAL_SERVER_ERROR, exception)); --- End diff -- Maybe we could add the `exception.getMessage` to the `message` of the `RestHandlerException`. Otherwise the user will only see `"Job submission failed."` in the `ErrorResponseBody`. With the change it could be `"Job submission failed: Failure cause"` > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0, 1.6.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Blocker > Labels: flip-6, pull-request-available > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
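The suggestion in the comment above, appending the cause's message to the handler's message, can be sketched without Flink on the classpath. `HandlerException` below is a hypothetical stand-in for `RestHandlerException`, and `respond` stands in for the handler's future chain; neither reproduces the real classes. Note that this illustrates the reviewer's proposal only, which the PR author argued against in the thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class SubmitFailureMessageSketch {

    /** Hypothetical stand-in for RestHandlerException; carries an HTTP status code. */
    static class HandlerException extends Exception {
        final int httpStatus;

        HandlerException(String message, int httpStatus, Throwable cause) {
            super(message, cause);
            this.httpStatus = httpStatus;
        }
    }

    /** Strips the CompletionException wrappers that dependent stages add. */
    static Throwable unwrap(Throwable t) {
        while (t instanceof CompletionException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    /** Maps any failure to a HandlerException whose message includes the cause's message. */
    static CompletableFuture<String> respond(CompletableFuture<String> submission) {
        return submission
            .thenApply(jobId -> "/jobs/" + jobId)
            .exceptionally(failure -> {
                Throwable cause = unwrap(failure);
                throw new CompletionException(new HandlerException(
                    "Job submission failed: " + cause.getMessage(), 500, cause));
            });
    }

    /** Returns what a client would see: the job URL on success, the error message on failure. */
    static String clientMessage(CompletableFuture<String> submission) {
        try {
            return respond(submission).join();
        } catch (CompletionException e) {
            return e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("job graph is empty"));

        System.out.println(clientMessage(failed)); // Job submission failed: job graph is empty
        System.out.println(clientMessage(CompletableFuture.completedFuture("abc"))); // /jobs/abc
    }
}
```

If the cause has no message, `getMessage()` returns `null` and the client sees "Job submission failed: null", which is one of the pitfalls raised elsewhere in this thread.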
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527671#comment-16527671 ]

ASF GitHub Bot commented on FLINK-8785:
---

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6222#discussion_r199171079

    --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/rest/handler/job/JobSubmitHandlerTest.java ---
    @@ -87,4 +89,33 @@ public void testSuccessfulJobSubmission() throws Exception {
             handler.handleRequest(new HandlerRequest<>(request, EmptyMessageParameters.getInstance()), mockGateway)
                 .get();
         }
    +
    +    @Test
    +    public void testFailedJobSubmission() throws Exception {
    +        final String errorMessage = "test";
    +        DispatcherGateway mockGateway = mock(DispatcherGateway.class);
    +        when(mockGateway.submitJob(any(JobGraph.class), any(Time.class))).thenReturn(FutureUtils.completedExceptionally(new Exception(errorMessage)));
    +        GatewayRetriever mockGatewayRetriever = mock(GatewayRetriever.class);
    +
    +        JobSubmitHandler handler = new JobSubmitHandler(
    +            CompletableFuture.completedFuture("http://localhost:1234"),
    +            mockGatewayRetriever,
    +            RpcUtils.INF_TIMEOUT,
    +            Collections.emptyMap());
    +
    +        JobGraph job = new JobGraph("testjob");
    +        JobSubmitRequestBody request = new JobSubmitRequestBody(job);
    +
    +        try {
    +            handler.handleRequest(new HandlerRequest<>(request, EmptyMessageParameters.getInstance()), mockGateway)
    +                .get();
    +        } catch (Exception e) {
    +            Throwable t = ExceptionUtils.stripExecutionException(e);
    +            if (t instanceof RestHandlerException){
    +                Assert.assertTrue(t.getMessage().equals("Job submission failed."));
    +            } else {
    +                throw e;
    +            }
    +        }
    --- End diff --

I think we should make sure that `errorMessage` is part of the `RestHandlerException#message`. Otherwise this information won't be sent to the client in the form of the `ErrorResponseBody`.

> JobSubmitHandler does not handle JobSubmissionExceptions
>
>
> Key: FLINK-8785
> URL: https://issues.apache.org/jira/browse/FLINK-8785
> Project: Flink
> Issue Type: Bug
> Components: Job-Submission, JobManager, REST
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Chesnay Schepler
> Assignee: Chesnay Schepler
> Priority: Blocker
> Labels: flip-6, pull-request-available
>
>
> If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a
> {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal
> server error" instead of signaling the failed job submission.
> This can for example occur if the transmitted execution graph is faulty, as
> tested by the {{JobSubmissionFailsITCase}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527670#comment-16527670 ]

ASF GitHub Bot commented on FLINK-8785:
---

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6222#discussion_r199170668

    --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/rest/handler/job/JobSubmitHandlerTest.java ---
    @@ -87,4 +89,33 @@ public void testSuccessfulJobSubmission() throws Exception {
             handler.handleRequest(new HandlerRequest<>(request, EmptyMessageParameters.getInstance()), mockGateway)
                 .get();
         }
    +
    +    @Test
    +    public void testFailedJobSubmission() throws Exception {
    +        final String errorMessage = "test";
    +        DispatcherGateway mockGateway = mock(DispatcherGateway.class);
    +        when(mockGateway.submitJob(any(JobGraph.class), any(Time.class))).thenReturn(FutureUtils.completedExceptionally(new Exception(errorMessage)));
    +        GatewayRetriever mockGatewayRetriever = mock(GatewayRetriever.class);
    --- End diff --

No need to create a mock. `() -> CompletableFuture.completedFuture(mockGateway)` should be good enough.

> JobSubmitHandler does not handle JobSubmissionExceptions
>
>
> Key: FLINK-8785
> URL: https://issues.apache.org/jira/browse/FLINK-8785
> Project: Flink
> Issue Type: Bug
> Components: Job-Submission, JobManager, REST
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Chesnay Schepler
> Assignee: Chesnay Schepler
> Priority: Blocker
> Labels: flip-6, pull-request-available
>
>
> If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a
> {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal
> server error" instead of signaling the failed job submission.
> This can for example occur if the transmitted execution graph is faulty, as
> tested by the {{JobSubmissionFailsITCase}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
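The review comment above relies on the retriever being usable as a lambda target, which works whenever the interface has a single abstract method. A minimal sketch of that idea follows; `Retriever` and `Gateway` are hypothetical interfaces written for this example and are only assumed to resemble the shape of the types under review.

```java
import java.util.concurrent.CompletableFuture;

public class LambdaRetrieverSketch {

    /** Hypothetical single-method interface standing in for a gateway retriever. */
    @FunctionalInterface
    interface Retriever<T> {
        CompletableFuture<T> getFuture();
    }

    /** Hypothetical gateway with one asynchronous call. */
    interface Gateway {
        CompletableFuture<String> submitJob(String jobName);
    }

    /** Code under test: resolves the gateway through the retriever, then submits. */
    static String submitVia(Retriever<Gateway> retriever, String jobName) {
        return retriever.getFuture()
            .thenCompose(gateway -> gateway.submitJob(jobName))
            .join();
    }

    public static void main(String[] args) {
        // A hand-written gateway stub; in the test under review this role is played by a mock.
        Gateway gateway = jobName -> CompletableFuture.completedFuture("ack:" + jobName);

        // The retriever needs no mocking framework: a lambda returning a completed future suffices.
        Retriever<Gateway> retriever = () -> CompletableFuture.completedFuture(gateway);

        System.out.println(submitVia(retriever, "testjob")); // ack:testjob
    }
}
```

Compared with a framework-generated mock, the lambda returns a real future, so code that chains on the retriever's result behaves as it would in production.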
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526130#comment-16526130 ] ASF GitHub Bot commented on FLINK-8785: --- GitHub user zentol opened a pull request: https://github.com/apache/flink/pull/6222 [FLINK-8785][rest] Handle JobSubmissionExceptions ## What is the purpose of the change This PR modifies the `JobSubmitHandler` to handle exceptions contained in the future returned by `DispatcherGateway#submitJob`. An exception handler was added via `CompletableFuture#exceptionally` to return a proper `ErrorResponseBody` signaling that the job submission has failed. This PR is pretty much the bare-bones solution; in the JIRA I advocated for including error messages from exceptions since there are various reasons why the submission could fail, but I can't find a satisfying solution. ## Verifying this change * see new test in `JobSubmitHandlerTest` ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? 
(not applicable) You can merge this pull request into a Git repository by running: $ git pull https://github.com/zentol/flink 8785_basic Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/6222.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6222 commit 32fe49270596cdcf2f91f822c3a6504a14ba40eb Author: zentol Date: 2018-06-28T08:57:01Z [FLINK-8785][rest] Handle JobSubmissionExceptions > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0, 1.6.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Blocker > Labels: flip-6, pull-request-available > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512451#comment-16512451 ] ASF GitHub Bot commented on FLINK-8785: --- Github user zentol closed the pull request at: https://github.com/apache/flink/pull/6158 > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0, 1.6.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Blocker > Labels: flip-6 > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510851#comment-16510851 ] ASF GitHub Bot commented on FLINK-8785: --- Github user buptljy closed the pull request at: https://github.com/apache/flink/pull/5877 > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0, 1.6.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Blocker > Labels: flip-6 > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510844#comment-16510844 ]

ASF GitHub Bot commented on FLINK-8785:
---

GitHub user zentol opened a pull request:

    https://github.com/apache/flink/pull/6158

    [FLINK-8785][rest] Handle JobSubmissionExceptions

    ## What is the purpose of the change

    Currently, if a `JobSubmissionException` occurs, the `JobSubmissionHandler` only returns `500 Internal server error.`, with no details about the original exception. This PR modifies the handler to return the messages of all exceptions in the stack trace.

    ## Brief change log

    * modify `RestHandlerException` to allow multiple error messages, similar to `ErrorResponseBody`
    * modify `JobSubmitHandler` to handle exceptions during the job submission

    ## Verifying this change

    * see `JobSubmitHandlerTest#testFailedJobSubmission`

    ## Does this pull request potentially affect one of the following parts:

      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
      - The S3 file system connector: (no)

    ## Documentation

      - Does this pull request introduce a new feature? (yes)
      - If yes, how is the feature documented? (not applicable)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink 8785

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6158.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6158

commit bdb9a7b3949ce0fc1d31e57459e51f593ee0b699
Author: zentol
Date: 2018-06-13T09:15:35Z

    [FLINK-8785][rest] Handle JobSubmissionExceptions

> JobSubmitHandler does not handle JobSubmissionExceptions
>
>
> Key: FLINK-8785
> URL: https://issues.apache.org/jira/browse/FLINK-8785
> Project: Flink
> Issue Type: Bug
> Components: Job-Submission, JobManager, REST
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Chesnay Schepler
> Assignee: Chesnay Schepler
> Priority: Blocker
> Labels: flip-6
>
>
> If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a
> {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal
> server error" instead of signaling the failed job submission.
> This can for example occur if the transmitted execution graph is faulty, as
> tested by the {{JobSubmissionFailsITCase}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444055#comment-16444055 ]

ASF GitHub Bot commented on FLINK-8785:
---

Github user buptljy commented on the issue:

    https://github.com/apache/flink/pull/5877

@zentol OK, I previously thought this was a small change. So you mean we can make further changes to the exception messages so that they are reported more properly? I will test all the cases in the JobSubmissionFailsITCase.

> JobSubmitHandler does not handle JobSubmissionExceptions
>
>
> Key: FLINK-8785
> URL: https://issues.apache.org/jira/browse/FLINK-8785
> Project: Flink
> Issue Type: Bug
> Components: Job-Submission, JobManager, REST
> Affects Versions: 1.5.0
> Reporter: Chesnay Schepler
> Assignee: buptljy
> Priority: Critical
> Labels: flip-6
>
>
> If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a
> {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal
> server error" instead of signaling the failed job submission.
> This can for example occur if the transmitted execution graph is faulty, as
> tested by the {{JobSubmissionFailsITCase}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443562#comment-16443562 ] ASF GitHub Bot commented on FLINK-8785: --- GitHub user buptljy opened a pull request: https://github.com/apache/flink/pull/5877 [FLINK-8785][Job-Submission]Handle JobSubmissionExceptions ## What is the purpose of the change We will get an "Internal server error" exception if we submit a jobgraph with a restclusterclient. This PR helps us get more details and causes of the exception, such as "The jobgraph is empty" message. ## Brief change log Add causes and details of an exception which happens in job submission. ## Verifying this change ## Does this pull request potentially affect one of the following parts: ## Documentation You can merge this pull request into a Git repository by running: $ git pull https://github.com/buptljy/flink 8785 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5877.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5877 commit 443a3c0fda861cd5324083df93ed5080c2f9f476 Author: wind Date: 2018-04-18T17:29:14Z add error messages > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Assignee: buptljy >Priority: Critical > Labels: flip-6 > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443563#comment-16443563 ] buptljy commented on FLINK-8785: [~Zentol] Can you help review my PR ? > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Assignee: buptljy >Priority: Critical > Labels: flip-6 > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441047#comment-16441047 ] buptljy commented on FLINK-8785: [~Zentol] Okay, I already know how to do this. I will assign this task to myself if you don't mind. > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Priority: Critical > Labels: flip-6 > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440751#comment-16440751 ] Chesnay Schepler commented on FLINK-8785: - The {{ErrorResponseBody}} can accept a list of errors. I would suggest passing in the messages of all exceptions in the trace. For the response code we will have to stick to {{INTERNAL_SERVER_ERROR}} as we just can't differentiate between the errors right now. > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Priority: Critical > Labels: flip-6 > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
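Editor's sketch for the comment above: one way to gather the message of every exception in the trace is to walk the cause chain. This is a minimal illustration under stated assumptions, not the change that was merged. The class {{ExceptionMessages}} and the method {{collectMessages}} are hypothetical names and are not part of Flink; the resulting list is the kind of value that could be handed to the list-accepting {{ErrorResponseBody}} mentioned in the comment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of Flink): flattens a cause chain into a list of messages. */
final class ExceptionMessages {

    private ExceptionMessages() {}

    /** Returns the message of the given error followed by the messages of all its causes. */
    static List<String> collectMessages(Throwable error) {
        List<String> messages = new ArrayList<>();
        // An identity-based set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = error; t != null && seen.add(t); t = t.getCause()) {
            String message = t.getMessage();
            // Fall back to the class name when an exception carries no message.
            messages.add(message != null ? message : t.getClass().getName());
        }
        return messages;
    }
}
```

Keeping each message as its own list entry avoids the concatenated single string discussed earlier in the thread, while the response code can remain {{INTERNAL_SERVER_ERROR}} as suggested.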
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413085#comment-16413085 ] Jiayi commented on FLINK-8785: -- [~Zentol] Thanks! I've reproduced the "internal server error" and run several unit tests of it. I find that the easiest way to return an error message is adding the "error.getMessage()":
{code:java}
HandlerUtils.sendErrorResponse(
    ctx,
    httpRequest,
    new ErrorResponseBody("Internal server error. " + error.getMessage() + "."),
    HttpResponseStatus.INTERNAL_SERVER_ERROR,
    responseHeaders);
{code}
For example, if you submit an empty JobGraph, the response will be "[Internal server error.Could not start JobManager.]". If it's still not explicit enough, I can add more messages in "Dispatcher.submitJob" and the response will be "[Internal server error.Could not start JobManager.Could not set up JobManager.The given job is empty.]". The response is long but I can't do anything about it because the messages are delivered and combined between functions. What do you think of my plan or do you have any suggestions ? > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Priority: Critical > Labels: flip-6 > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16406048#comment-16406048 ] Chesnay Schepler commented on FLINK-8785: - The {{JobSubmissionFailsITCase}} has not been ported to flip6 on the master yet (which we can't since it doesn't work yet). You can reproduce this by starting a flip6 cluster (you can use {{MiniCluster}} in the IDE) and submitting a job using the {{RestClusterClient}}. (See {{MiniClusterResource}} on how to set these up) I went through all cases where we explicitly return an internal server error (by searching for the string), and those are all intended as they truly only occur on internal errors. Handlers are allowed to throw internal server errors, but must do so explicitly. > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Priority: Critical > Labels: flip-6 > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405966#comment-16405966 ] Wind commented on FLINK-8785: - [~Zentol] I can see that many handlers send error response whose "ErrorResponseBody" is "internal server error". I guess you expect that the "ErrorResponseBody" returns the real message of the exception like "The job graph is wrong". I can take this task if you can give me more details about this. And I can't see "internal server error" in JobSubmissionFailsITCase, is it because the job is not submitted in a restful way ? > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Priority: Critical > Labels: flip-6 > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404597#comment-16404597 ] Chesnay Schepler commented on FLINK-8785: - [~wind_ljy] Handlers are only allowed to throw \{{RestHandlerException}}s, to provide proper error messages and error codes. > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Priority: Critical > Labels: flip-6 > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
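Editor's sketch for the comment above: the rule that a handler may only surface one dedicated exception type can be illustrated by translating a failed submission future into that type before it reaches the caller. This is a rough illustration under stated assumptions, not Flink code. {{HandlerSketch}}, {{RestError}}, and {{translateFailure}} are hypothetical names; {{RestError}} is only a stand-in for {{RestHandlerException}}, and the status is a plain int rather than Netty's {{HttpResponseStatus}}.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

/** Hypothetical sketch (not Flink code) of mapping arbitrary failures to one handler exception type. */
final class HandlerSketch {

    private HandlerSketch() {}

    /** Stand-in for RestHandlerException: carries a message and an HTTP status code. */
    static final class RestError extends Exception {
        final int httpStatus;

        RestError(String message, int httpStatus, Throwable cause) {
            super(message, cause);
            this.httpStatus = httpStatus;
        }
    }

    /** Re-wraps any failure of the submission future as a RestError with status 500. */
    static <T> CompletableFuture<T> translateFailure(CompletableFuture<T> submission) {
        return submission.exceptionally(failure -> {
            // Dependent stages wrap the original failure in a CompletionException; strip it.
            Throwable cause = failure instanceof CompletionException && failure.getCause() != null
                    ? failure.getCause()
                    : failure;
            throw new CompletionException(new RestError("Job submission failed.", 500, cause));
        });
    }
}
```

With this shape the caller sees a single exception type that names the failed submission and still carries the original cause, instead of a generic "Internal server error".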
[jira] [Commented] (FLINK-8785) JobSubmitHandler does not handle JobSubmissionExceptions
[ https://issues.apache.org/jira/browse/FLINK-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391274#comment-16391274 ] Wind commented on FLINK-8785: - [~Zentol] You mean the exception should be caught instead of being thrown ? > JobSubmitHandler does not handle JobSubmissionExceptions > > > Key: FLINK-8785 > URL: https://issues.apache.org/jira/browse/FLINK-8785 > Project: Flink > Issue Type: Bug > Components: Job-Submission, JobManager, REST >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Priority: Critical > Labels: flip-6 > > If the job submission, i.e. {{DispatcherGateway#submitJob}} fails with a > {{JobSubmissionException}} the {{JobSubmissionHandler}} returns "Internal > server error" instead of signaling the failed job submission. > This can for example occur if the transmitted execution graph is faulty, as > tested by the \{{JobSubmissionFailsITCase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)