This is being addressed here: https://github.com/apache/flink/pull/2249
On Tue, Jul 12, 2016 at 3:48 PM, Stephan Ewen <se...@apache.org> wrote:
> I think there is a confusion between how Flink thinks about HA and job life
> cycle, and how many users think about it.
>
> Flink thinks that a killing of the YARN session is a failure of the job. So
> as soon as new Yarn resources become available, it tries to recover the job.
> Most users think that killing a Yarn session is equivalent to canceling the
> job.
>
> I am unsure if we should start to interpret the killing of a Yarn session as
> a cancellation. Do Yarn sessions never get killed accidentally, or as the
> result of a Yarn-related failure?
>
> Using Flink-job-at-a-time-on-yarn, cancelling the Flink Job also shuts down
> the Yarn session and hence shuts down everything properly.
>
> Hope that train of thought helps.
>
>
> On Tue, Jul 12, 2016 at 3:15 PM, Ufuk Celebi <u...@apache.org> wrote:
>>
>> Are you running in HA mode? If yes, that's the expected behaviour at
>> the moment, because the ZooKeeper data is only cleaned up on a
>> terminal state (FINISHED, FAILED, CANCELLED). You have to specify
>> separate ZooKeeper root paths via "recovery.zookeeper.path.root".
>> There is an issue which should be fixed for 1.2 to make this
>> configurable in an easy way.
>>
>> On Tue, Jul 12, 2016 at 1:28 PM, Konstantin Gregor
>> <konstantin.gre...@tngtech.com> wrote:
>> > Hello everyone,
>> >
>> > I have a question concerning stopping Flink streaming processes that run
>> > in a detached Yarn session.
>> >
>> > Here's what we do: We start a Yarn session via
>> > yarn-session.sh -n 8 -d -jm 4096 -tm 10000 -s 10 -qu flink_queue
>> >
>> > Then, we start our Flink streaming application via
>> > flink run -p 65 -c SomeClass some.jar > /dev/null 2>&1 &
>> >
>> > The problem occurs when we stop the application.
>> > If we stop the Flink application with
>> > flink cancel <JOB_ID>
>> > and then kill the yarn application with
>> > yarn application -kill <APPLICATION_ID>
>> > everything is fine.
>> > But, contrary to what we expected, when we only kill the yarn application
>> > without specifically canceling the Flink job first, the Flink job stays
>> > lingering on the machine and keeps using resources until it is killed
>> > manually via its process id.
>> >
>> > One thing that we tried was to stop using ephemeral ports for the
>> > application master, namely we set yarn.application-master.port
>> > specifically to some port number, but the problem remains: Killing the
>> > yarn application does not kill the corresponding Flink job.
>> >
>> > Does anyone have an idea about this? Any help is greatly appreciated :-)
>> > By the way, our application reads data from a Kafka queue and writes it
>> > into HDFS, maybe this is also important to know.
>> >
>> > Thank you and best regards
>> >
>> > Konstantin
>> > --
>> > Konstantin Gregor * konstantin.gre...@tngtech.com
>> > TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>> > Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>> > Sitz: Unterföhring * Amtsgericht München * HRB 135082
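
For anyone following along, a minimal sketch of Ufuk's workaround (a separate ZooKeeper root path per job) in flink-conf.yaml could look like the lines below. This is only an illustration for the pre-1.2 setup discussed in this thread: the key names recovery.mode and recovery.zookeeper.quorum are the 1.0/1.1-era HA keys, and the host names and the /flink/my-streaming-job path are placeholders, not values from this thread.

    recovery.mode: zookeeper
    # ZooKeeper ensemble holding the HA metadata (placeholder hosts)
    recovery.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
    # Dedicated root path for this job's HA data, so that the metadata of one
    # killed YARN session cannot trigger recovery attempts for another job
    recovery.zookeeper.path.root: /flink/my-streaming-job

With a per-job root path the recovery metadata is at least isolated; as described above, it is still only cleaned up when the job reaches a terminal state (FINISHED, FAILED, CANCELLED), so cancelling the job before killing the session remains the clean shutdown path.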