Zhao, Qingwen created EAGLE-1024:
------------------------------------

             Summary: Monitor jobs with high RPC throughput
                 Key: EAGLE-1024
                 URL: https://issues.apache.org/jira/browse/EAGLE-1024
             Project: Eagle
          Issue Type: Improvement
    Affects Versions: v0.5.0
            Reporter: Zhao, Qingwen

We've identified some jobs whose high RPC throughput puts heavy RPC overhead on the NameNode (NN). These jobs issue an extremely large number of HDFS operations in a very short window (2 mins). So we intend to capture jobs that meet both of the following conditions:

a) the job has very large RPC throughput, computed as the job's total HDFS ops / the job duration, where the throughput is larger than 1000
b) the HDFS ops per task is larger than 25

Then send out an alert. Later, we will notify the users to optimize their jobs.
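
The two conditions above could be sketched roughly as follows. This is only an illustration, not Eagle's implementation: the `JobStats` type, its field names, and the `is_high_rpc_job` function are hypothetical, and the throughput unit (ops per second) is an assumption, since the description gives the threshold of 1000 without a unit.

```python
from dataclasses import dataclass

# Thresholds taken from the issue description. The unit of the throughput
# threshold is not stated there; ops per second is an assumption.
THROUGHPUT_THRESHOLD = 1000
OPS_PER_TASK_THRESHOLD = 25


@dataclass
class JobStats:
    """Hypothetical per-job counters; not Eagle's actual data model."""
    job_id: str
    total_hdfs_ops: int
    duration_seconds: float
    num_tasks: int


def is_high_rpc_job(job: JobStats) -> bool:
    """Return True only when both condition (a) and condition (b) hold."""
    if job.duration_seconds <= 0 or job.num_tasks <= 0:
        # The ratios are undefined, so skip the job rather than alert.
        return False
    throughput = job.total_hdfs_ops / job.duration_seconds  # condition (a)
    ops_per_task = job.total_hdfs_ops / job.num_tasks       # condition (b)
    return (throughput > THROUGHPUT_THRESHOLD
            and ops_per_task > OPS_PER_TASK_THRESHOLD)
```

For example, under these assumptions a job with 240000 HDFS ops over 120 seconds and 4000 tasks would be flagged (2000 ops/s and 60 ops per task), while the same job spread over 20000 tasks would not be (12 ops per task).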