[GitHub] flink issue #4697: [FLINK-7650] [flip6] Port JobCancellationHandler to new R...
Github user zentol commented on the issue: https://github.com/apache/flink/pull/4697 +1 from my side. ---
[GitHub] flink issue #4697: [FLINK-7650] [flip6] Port JobCancellationHandler to new R...
Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/4697 Thanks for the review @zentol. I've rebased it and wait for Travis. ---
[GitHub] flink issue #4697: [FLINK-7650] [flip6] Port JobCancellationHandler to new R...
Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/4697 I guess some of the problems like taking a savepoint could still be served under `/jobs/:jobid/savepoints/:savepointid` because this resource will be removed upon first request anyway. I'm not sure how well providing better error messages will work in the end, because right now, we don't distinguish between completed and running jobs on the handler side. What it currently does is trigger `takeSavepoint` on the `Dispatcher`. The `Dispatcher` will fail with a `FlinkJobNotFoundException` if the job is no longer executed/non-existent. In order to see whether there was a job being executed, the handler then had to access some other information Monitoring a job is indeed a bit tricky with splitting the URLs. Theoretically, this is already the case, since we only store a limited number of `ArchivedExecutionGraphs` on the `MemoryArchivist`. Thus, if you have many completing jobs, it might be the case that you don't see it when querying `/jobs/:jobid`. Nevertheless, the advantages of keeping it like it is are evident. I'll change the method to `PATCH`. Let's see how far we get with that. ---
[GitHub] flink issue #4697: [FLINK-7650] [flip6] Port JobCancellationHandler to new R...
Github user zentol commented on the issue: https://github.com/apache/flink/pull/4697 I see a few problems with separating running jobs. * The monitoring pattern for a job becomes quite weird. While a job is running you ask "/jobs/running/:jobid", until you get a 404 and then check "/jobs/completed/:jobid". * Submitting a job is a POST to "/jobs", which returns "/jobs/running/:jobid" which may return a 404 if the job fails straight away. * A unified URL allows us to actually provide good error messages, i.e. a 405 Method not Allowed when trying to take a savepoint for finished jobs. For separate URLs this would return a 404. * We have to handle the case of a savepoint/stop request arriving for the finished job anyway, if the job finishes between accepting the request and handling it. Given that it only affects taking savepoints and stopping a job i don't think it's really necessary to change things. Note that if we were to do that the job termination should probably be a POST instead. ---
[GitHub] flink issue #4697: [FLINK-7650] [flip6] Port JobCancellationHandler to new R...
Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/4697 Thanks for the review @bowenli86 and @zentol. I've addressed your PR comments modulo the `DELETE` method. I somehow still think that `DELETE` is the right verb and we maybe should change how we represent running and archived jobs differently. What do you think? ---