刘方奇 created FLINK-25029:
---------------------------

             Summary: Hadoop Caller Context Setting In Flink
                 Key: FLINK-25029
                 URL: https://issues.apache.org/jira/browse/FLINK-25029
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Task
            Reporter: 刘方奇


For a given HDFS operation (e.g. deleting a file), it is very helpful to track 
which upper-level job issued it. The upper-level callers may be specific Oozie 
tasks, MR jobs, or Hive queries. One scenario: when the NameNode (NN) is 
abused/spammed, the operator may want to know immediately which MR job is to 
blame so that she can kill it. To this end, the caller context contains at 
least the application-dependent "tracking id".

The above is the main purpose of the Caller Context. The HDFS client sets the 
Caller Context, and the NameNode records it in its audit log so that operators 
can act on it.

Spark and Hive already set the Caller Context to meet the HDFS job audit 
requirement.

In my company, Flink jobs often cause problems for HDFS, so we implemented this 
internally to help prevent such cases.

If the feature is general enough, should we support it in Flink? If so, I can 
submit a PR for this.
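
As a rough illustration of the idea (a sketch only, not the implementation 
referred to above), a task could build a short tracking string and hand it to 
Hadoop's org.apache.hadoop.ipc.CallerContext before touching HDFS. The class 
name, the "FLINK_JobID_..." format, and the buildContext helper below are 
assumptions made up for this example. The 128-byte limit mirrors what I believe 
is the default of hadoop.caller.context.max.size. The Hadoop call itself is left 
as a comment so the sketch compiles without Hadoop on the classpath.

```java
import java.nio.charset.StandardCharsets;

public class FlinkCallerContextSketch {

    // Assumption: mirrors the default of hadoop.caller.context.max.size.
    static final int MAX_CONTEXT_BYTES = 128;

    // Hypothetical helper: builds a tracking id for one Flink subtask.
    // The format is illustrative only; nothing in Flink or Hadoop defines it.
    static String buildContext(String jobId, String taskName, int subtaskIndex) {
        String context = "FLINK_JobID_" + jobId + "_Task_" + taskName + "_" + subtaskIndex;
        // Trim from the end until the UTF-8 encoding fits within the limit.
        while (context.getBytes(StandardCharsets.UTF_8).length > MAX_CONTEXT_BYTES) {
            context = context.substring(0, context.length() - 1);
        }
        return context;
    }

    public static void main(String[] args) {
        String context = buildContext("a1b2c3", "Source", 0);
        System.out.println(context); // prints "FLINK_JobID_a1b2c3_Task_Source_0"

        // With Hadoop (2.8 or later) on the classpath, the task thread would
        // then register the context before issuing HDFS calls:
        //
        //   CallerContext.setCurrent(new CallerContext.Builder(context).build());
    }
}
```

Note that the NameNode only writes the context to its audit log when 
hadoop.caller.context.enabled is set to true on the HDFS side.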



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
