Zhao, Qingwen created EAGLE-1024:
------------------------------------

             Summary: Monitor jobs with high RPC throughput 
                 Key: EAGLE-1024
                 URL: https://issues.apache.org/jira/browse/EAGLE-1024
             Project: Eagle
          Issue Type: Improvement
    Affects Versions: v0.5.0
            Reporter: Zhao, Qingwen


We've identified some jobs with high RPC throughput, which puts heavy RPC 
overhead on the NameNode (NN). These jobs issue an extremely large number of 
HDFS operations in a very short window (2 mins).

So we intend to capture jobs that meet both of the following conditions:
a) the job has very large RPC throughput, computed as the job's total HDFS ops 
divided by the job duration, with the throughput larger than 1000
b) the HDFS ops per task is larger than 25
When both conditions hold, send out an alert. Later, we will notify the users 
to optimize their jobs.
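
The two conditions above could be sketched roughly as follows. This is only 
an illustration, not Eagle's actual policy or API: the JobStats type, its 
field names, and the helper functions are hypothetical. The issue does not 
state the time unit behind the 1000 threshold, so the sketch assumes the job 
duration is measured in seconds, and it assumes "HDFS ops per task" means the 
job's total HDFS ops divided by its number of tasks.

```python
from dataclasses import dataclass

# Thresholds taken from the issue text. The time unit for the throughput
# threshold is not stated there; ops per second is an assumption.
RPC_THROUGHPUT_THRESHOLD = 1000.0
OPS_PER_TASK_THRESHOLD = 25.0


@dataclass
class JobStats:
    """Hypothetical per-job counters; not an Eagle data type."""
    job_id: str
    total_hdfs_ops: int
    duration_secs: float
    num_tasks: int


def rpc_throughput(job: JobStats) -> float:
    """Condition a): total HDFS ops divided by the job duration."""
    if job.duration_secs <= 0:
        return 0.0  # no usable duration, so never flag on throughput
    return job.total_hdfs_ops / job.duration_secs


def ops_per_task(job: JobStats) -> float:
    """Condition b): total HDFS ops divided by the number of tasks (assumed)."""
    if job.num_tasks <= 0:
        return 0.0  # no tasks recorded, so never flag on ops per task
    return job.total_hdfs_ops / job.num_tasks


def should_alert(job: JobStats) -> bool:
    """Alert only when both a) and b) exceed their thresholds."""
    return (rpc_throughput(job) > RPC_THROUGHPUT_THRESHOLD
            and ops_per_task(job) > OPS_PER_TASK_THRESHOLD)


# Made-up numbers, for illustration only.
heavy = JobStats("job_a", total_hdfs_ops=240_000, duration_secs=120, num_tasks=4_000)
spread = JobStats("job_b", total_hdfs_ops=240_000, duration_secs=120, num_tasks=20_000)
slow = JobStats("job_c", total_hdfs_ops=60_000, duration_secs=120, num_tasks=1_000)
```

With these made-up numbers, only job_a would be flagged: job_b has the same 
throughput but spreads its ops thinly across many tasks, and job_c stays 
below the throughput threshold.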



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)