[GitHub] flink issue #4697: [FLINK-7650] [flip6] Port JobCancellationHandler to new R...

2017-09-27 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4697
  
+1 from my side.


---


[GitHub] flink issue #4697: [FLINK-7650] [flip6] Port JobCancellationHandler to new R...

2017-09-27 Thread tillrohrmann
Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/4697
  
Thanks for the review @zentol. I've rebased it and wait for Travis.


---


[GitHub] flink issue #4697: [FLINK-7650] [flip6] Port JobCancellationHandler to new R...

2017-09-27 Thread tillrohrmann
Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/4697
  
I guess some of the problems like taking a savepoint could still be served 
under `/jobs/:jobid/savepoints/:savepointid` because this resource will be 
removed upon first request anyway.

I'm not sure how well providing better error messages will work in the end, 
because right now, we don't distinguish between completed and running jobs on 
the handler side. What it currently does is trigger `takeSavepoint` on the 
`Dispatcher`. The `Dispatcher` will fail with a `FlinkJobNotFoundException` if 
the job is no longer executed/non-existent. In order to see whether there was a 
job being executed, the handler then had to access some other information

Monitoring a job is indeed a bit tricky with splitting the URLs. 
Theoretically, this is already the case, since we only store a limited number 
of `ArchivedExecutionGraphs` on the `MemoryArchivist`. Thus, if you have many 
completing jobs, it might be the case that you don't see it when querying 
`/jobs/:jobid`.

Nevertheless, the advantages of keeping it like it is are evident. I'll 
change the method to `PATCH`. Let's see how far we get with that.


---


[GitHub] flink issue #4697: [FLINK-7650] [flip6] Port JobCancellationHandler to new R...

2017-09-27 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4697
  
I see a few problems with separating running jobs.

* The monitoring pattern for a job becomes quite weird. While a job is 
running you ask "/jobs/running/:jobid", until you get a 404 and then check 
"/jobs/completed/:jobid".
* Submitting a job is a POST to "/jobs", which returns 
"/jobs/running/:jobid" which may return a 404 if the job fails straight away.
* A unified URL allows us to actually provide good error messages, i.e. a 
405 Method not Allowed when trying to take a savepoint for finished jobs. For 
separate URLs this would return a 404.
* We have to handle the case of a savepoint/stop request arriving for the 
finished job anyway, if the job finishes between accepting the request and 
handling it.

Given that it only affects taking savepoints and stopping a job i don't 
think it's really necessary to change things. Note that if we were to do that 
the job termination should probably be a POST instead.


---


[GitHub] flink issue #4697: [FLINK-7650] [flip6] Port JobCancellationHandler to new R...

2017-09-27 Thread tillrohrmann
Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/4697
  
Thanks for the review @bowenli86 and @zentol. I've addressed your PR 
comments modulo the `DELETE` method. I somehow still think that `DELETE` is the 
right verb and we maybe should change how we represent running and archived 
jobs differently. What do you think?


---