[
https://issues.apache.org/jira/browse/OOZIE-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099801#comment-13099801
]
Hadoop QA commented on OOZIE-103:
---------------------------------
mislam77 remarked:
Objective :
---------------
1. The long-term objective is: when hadoop is slow, oozie should be able to
throttle the load on the JT/NN, e.g. by submitting fewer jobs.
2. In the short term, we want to instrument oozie so that it can report the
response time of the JT/NN at any time. How the value will be used or presented
is outside the scope of this short-term goal.
3. The design for the short-term objective is expected to be extensible and
reusable for the long-term objective.
Solution:
------------
The following ideas were discussed internally at Y!.
Approach 1:
Use a separate monitoring thread that periodically sends a representative
command to the Hadoop server. For example, for the namenode, the thread would
invoke a command like "ls /tmp".
Pros & Cons :
* This thread will add extra overhead to hadoop as well as to oozie.
* Finding a command that represents the actual health of hadoop might not be
trivial.
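
A minimal sketch of Approach 1, assuming the probe is supplied as a plain
Runnable. ProbeMonitor and its methods are hypothetical names used only for
illustration, not existing oozie classes; in a real setup the probe would wrap
something like a directory listing on the namenode.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not part of oozie): periodically runs a probe
// command on a background thread and remembers how long the last run took.
class ProbeMonitor {
    private final Runnable probe;
    // -1 means "no successful measurement yet" or "last probe failed".
    private final AtomicLong lastLatencyNanos = new AtomicLong(-1);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "probe-monitor");
                t.setDaemon(true); // do not keep the JVM alive for monitoring
                return t;
            });

    ProbeMonitor(Runnable probe) {
        this.probe = probe;
    }

    // Runs the probe once and records the elapsed time in nanoseconds.
    // Returns -1 if the probe threw, so a failing server does not cancel
    // the periodic schedule.
    long probeOnce() {
        long start = System.nanoTime();
        try {
            probe.run();
        } catch (RuntimeException e) {
            lastLatencyNanos.set(-1);
            return -1;
        }
        long elapsed = System.nanoTime() - start;
        lastLatencyNanos.set(elapsed);
        return elapsed;
    }

    // Starts probing with a fixed delay between the end of one probe and
    // the start of the next, so slow probes never overlap.
    void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(
                this::probeOnce, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    long lastLatencyNanos() {
        return lastLatencyNanos.get();
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Using a fixed delay rather than a fixed rate is one way to limit the extra
load noted above: when the server is slow, probes are naturally sent less
often.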
Approach 2:
When oozie calls the NN or JT, it could measure the turnaround time of the
call. The benefit is that no extra command is sent.
Pros & Cons :
* There are different types of commands, and their normal response times
vary. In this case, oozie could restrict the instrumentation to a subset of
commonly used commands. Each command type would have its own instrumented
value.
* When oozie is idle, no data is collected for that period.
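
A rough sketch of Approach 2, assuming each NN/JT call can be wrapped in a
Callable. CallTimer, the command-type strings, and the choice of a running mean
are all assumptions made for illustration; none of them come from the oozie
code base.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not part of oozie): times existing calls and keeps
// separate statistics per command type, since normal response times differ.
class CallTimer {
    static final class Stats {
        final AtomicLong count = new AtomicLong();
        final AtomicLong totalNanos = new AtomicLong();
    }

    private final Map<String, Stats> byType = new ConcurrentHashMap<>();

    // Runs the call, records its turnaround time under commandType, and
    // returns the call's result. Failed calls are timed as well.
    <T> T time(String commandType, Callable<T> call) throws Exception {
        long start = System.nanoTime();
        try {
            return call.call();
        } finally {
            Stats s = byType.computeIfAbsent(commandType, k -> new Stats());
            s.count.incrementAndGet();
            s.totalNanos.addAndGet(System.nanoTime() - start);
        }
    }

    // Number of calls recorded for this command type (0 if never seen).
    long count(String commandType) {
        Stats s = byType.get(commandType);
        return s == null ? 0 : s.count.get();
    }

    // Mean turnaround time in milliseconds (0.0 if never seen).
    double meanMillis(String commandType) {
        Stats s = byType.get(commandType);
        if (s == null || s.count.get() == 0) {
            return 0.0;
        }
        return s.totalNanos.get() / 1e6 / s.count.get();
    }
}
```

A caller would wrap only the subset of commonly used commands, for example
timer.time("nn.listStatus", () -> listTmpDir()), where listTmpDir is a
placeholder for the real call. A count of zero for a command type over some
window corresponds to the idle-period gap noted above.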
Comments please.
> GH-68: Better reporting/handling of problems in Hadoop
> ------------------------------------------------------
>
> Key: OOZIE-103
> URL: https://issues.apache.org/jira/browse/OOZIE-103
> Project: Oozie
> Issue Type: Bug
> Reporter: Hadoop QA
>
> Add instrumentation to track performance stats of NN and JT (how long to get
> directory listing on hdfs; how long to submit a job or query JT queue)