This is being addressed here: https://github.com/apache/flink/pull/2249
On Tue, Jul 12, 2016 at 3:48 PM, Stephan Ewen <se...@apache.org> wrote:
> I think there is a confusion between how Flink thinks about HA and job life
> cycle, and how many users think about it.
>
> Flink thinks that a killing of the YARN session is a failure of the job. So
> as soon as new Yarn resources become available, it tries to recover the job.
> Most users think that killing a Yarn session is equivalent to canceling the
> job.
>
> I am unsure if we should start to interpret the killing of a Yarn session as
> a cancellation. Do Yarn sessions never get killed accidentally, or as the
> result of a Yarn-related failure?
>
> Using Flink-job-at-a-time-on-yarn, cancelling the Flink Job also shuts down
> the Yarn session and hence shuts down everything properly.
>
> Hope that train of thought helps.
>
>
> On Tue, Jul 12, 2016 at 3:15 PM, Ufuk Celebi <u...@apache.org> wrote:
>>
>> Are you running in HA mode? If yes, that's the expected behaviour at
>> the moment, because the ZooKeeper data is only cleaned up on a
>> terminal state (FINISHED, FAILED, CANCELLED). You have to specify
>> separate ZooKeeper root paths via "recovery.zookeeper.path.root".
>> There is an issue which should be fixed for 1.2 to make this
>> configurable in an easy way.
>>
>> On Tue, Jul 12, 2016 at 1:28 PM, Konstantin Gregor
>> <konstantin.gre...@tngtech.com> wrote:
>> > Hello everyone,
>> >
>> > I have a question concerning stopping Flink streaming processes that run
>> > in a detached Yarn session.
>> >
>> > Here's what we do: We start a Yarn session via
>> > yarn-session.sh -n 8 -d -jm 4096 -tm 10000 -s 10 -qu flink_queue
>> >
>> > Then, we start our Flink streaming application via
>> > flink run -p 65 -c SomeClass some.jar > /dev/null 2>&1 &
>> >
>> > The problem occurs when we stop the application.
>> > If we stop the Flink application with
>> > flink cancel <JOB_ID>
>> > and then kill the yarn application with
>> > yarn application -kill <APPLICATION_ID>
>> > everything is fine.
>> > But, contrary to what we expected, when we only kill the yarn application
>> > without specifically canceling the Flink job first, the Flink job stays
>> > lingering on the machine and keeps using resources until it is killed
>> > manually via its process id.
>> >
>> > One thing that we tried was to stop using ephemeral ports for the
>> > application master, namely we set yarn.application-master.port
>> > specifically to some port number, but the problem remains: Killing the
>> > yarn application does not kill the corresponding Flink job.
>> >
>> > Does anyone have an idea about this? Any help is greatly appreciated :-)
>> > By the way, our application reads data from a Kafka queue and writes it
>> > into HDFS, maybe this is also important to know.
>> >
>> > Thank you and best regards
>> >
>> > Konstantin
>> > --
>> > Konstantin Gregor * konstantin.gre...@tngtech.com
>> > TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>> > Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>> > Sitz: Unterföhring * Amtsgericht München * HRB 135082
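
For anyone following along, a minimal sketch of Ufuk's workaround (a separate ZooKeeper root path per job) in flink-conf.yaml could look like the lines below. This is only an illustration for the pre-1.2 setup discussed in this thread: the key names recovery.mode and recovery.zookeeper.quorum are the 1.0/1.1-era HA keys, and the host names and the /flink/my-streaming-job path are placeholders, not values from this thread.

    recovery.mode: zookeeper
    # ZooKeeper ensemble holding the HA metadata (placeholder hosts)
    recovery.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
    # Dedicated root path for this job's HA data, so that the metadata of one
    # killed YARN session cannot trigger recovery attempts for another job
    recovery.zookeeper.path.root: /flink/my-streaming-job

With a per-job root path the recovery metadata is at least isolated; as described above, it is still only cleaned up when the job reaches a terminal state (FINISHED, FAILED, CANCELLED), so cancelling the job before killing the session remains the clean shutdown path.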