Hi,

After enabling Flink’s HistoryServer, we observed that different ways of
stopping a running job lead to different results in the HistoryServer:

If we stop a job with cancel or stop-with-savepoint, the HistoryServer can in
most cases display the job's basic information (for example checkpoint
status, exceptions, and the DAG).

But if we kill the job through YARN, that job's history never appears in the
HistoryServer.

This inconsistency is a problem for us: we need to obtain historical job
information reliably, but at present no way of stopping a job guarantees
that its history will appear in the HistoryServer.

I roughly understand the underlying mechanism, but I don't know why the
community designed it this way. This uncertainty makes our upper-level job
operation and maintenance platform more complicated.
