This is an automated email from the ASF dual-hosted git repository.

yuchaoran pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-yunikorn-site.git


The following commit(s) were added to refs/heads/master by this push:
     new be96059  [YUNIKORN-925]Add document to explain existing yunikorn metrics (#94)
be96059 is described below

commit be96059df4c40a926e59696f997a70ba9fea6d40
Author: Tingyao Huang
AuthorDate: Mon Nov 22 14:59:22 2021 +0800

    [YUNIKORN-925]Add document to explain existing yunikorn metrics (#94)
---
 docs/performance/metrics.md | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/docs/performance/metrics.md b/docs/performance/metrics.md
index d7ebfa4..9dedbec 100644
--- a/docs/performance/metrics.md
+++ b/docs/performance/metrics.md
@@ -25,13 +25,50 @@ under the License.
 -->
 
 YuniKorn leverages [Prometheus](https://prometheus.io/) to record metrics. The metrics system keeps tracking of
-scheduler's critical execution paths, to reveal potential performance bottlenecks. Currently, there are two categories
+scheduler's critical execution paths, to reveal potential performance bottlenecks. Currently, there are three categories
 for these metrics:
 
 - scheduler: generic metrics of the scheduler, such as allocation latency, num of apps etc.
 - queue: each queue has its own metrics sub-system, tracking queue status.
+- event: records how many events reach each stage of YuniKorn's event system.
 
 all metrics are declared in `yunikorn` namespace.
+### Scheduler Metrics
+
+| Metrics Name          | Metrics Type  | Description  |
+| --------------------- | ------------  | ------------ |
+| containerAllocation   | Counter       | Total number of attempts to allocate containers. State of the attempt includes `allocated`, `rejected`, `error`, `released`. Increase only. |
+| applicationSubmission | Counter       | Total number of application submissions. State of the attempt includes `accepted` and `rejected`. Increase only. |
+| applicationStatus     | Gauge         | Number of applications in each state. State of the application includes `running` and `completed`. |
+| totalNodeActive       | Gauge         | Total number of active nodes. |
+| totalNodeFailed       | Gauge         | Total number of failed nodes. |
+| nodeResourceUsage     | Gauge         | Total resource usage of node, by resource name. |
+| schedulingLatency     | Histogram     | Latency of the main scheduling routine, in seconds. |
+| nodeSortingLatency    | Histogram     | Latency of sorting all nodes, in seconds. |
+| appSortingLatency     | Histogram     | Latency of sorting all applications, in seconds. |
+| queueSortingLatency   | Histogram     | Latency of sorting all queues, in seconds. |
+| tryNodeLatency        | Histogram     | Latency of node condition checks for container allocations, such as placement constraints, in seconds. |
+
+### Queue Metrics
+
+| Metrics Name              | Metrics Type  | Description |
+| ------------------------- | ------------- | ----------- |
+| appMetrics                | Counter       | Application metrics; records the total number of applications. State of the application includes `accepted`, `rejected` and `completed`. |
+| usedResourceMetrics       | Gauge         | Queue used resource.     |
+| pendingResourceMetrics    | Gauge         | Queue pending resource.  |
+| availableResourceMetrics  | Gauge         | Queue available resource. |
+
+### Event Metrics
+
+| Metrics Name             | Metrics Type  | Description |
+| ------------------------ | ------------  | ----------- |
+| totalEventsCreated       | Gauge         | Total events created.          |
+| totalEventsChanneled     | Gauge         | Total events channeled.        |
+| totalEventsNotChanneled  | Gauge         | Total events not channeled.    |
+| totalEventsProcessed     | Gauge         | Total events processed.        |
+| totalEventsStored        | Gauge         | Total events stored.           |
+| totalEventsNotStored     | Gauge         | Total events not stored.       |
+| totalEventsCollected     | Gauge         | Total events collected.        |
 
 ## Access Metrics
 
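For readers consuming these metrics: Prometheus joins a metric's namespace, subsystem and name with underscores, so everything declared in the `yunikorn` namespace is exposed with a `yunikorn_` prefix. Each histogram (such as `schedulingLatency`) is exposed with `_sum` and `_count` series, and their ratio is the mean observed value. The Python sketch below filters a scrape in the Prometheus text exposition format down to `yunikorn_` samples and computes that mean. The scrape text and every metric name in it (for example `yunikorn_scheduler_scheduling_latency_seconds`) are hypothetical placeholders, not the exact names YuniKorn exports. The helpers `yunikorn_samples` and `mean_latency` are illustrative only.

```python
"""Filter YuniKorn samples out of a Prometheus text-format scrape (sketch)."""
from __future__ import annotations

# Hypothetical scrape excerpt: the metric names are illustrative assumptions
# built from the `yunikorn` namespace, not the exact names YuniKorn exports.
SAMPLE_SCRAPE = """\
# HELP yunikorn_scheduler_scheduling_latency_seconds Latency of the main scheduling routine.
# TYPE yunikorn_scheduler_scheduling_latency_seconds histogram
yunikorn_scheduler_scheduling_latency_seconds_sum 0.75
yunikorn_scheduler_scheduling_latency_seconds_count 300
yunikorn_scheduler_total_node_active 3
go_goroutines 42
"""


def yunikorn_samples(text: str) -> dict[str, float]:
    """Return {series: value} for every sample whose name starts with 'yunikorn_'.

    Labelled series are keyed by their full 'name{labels}' text; label values
    that contain spaces are not handled by this simple whitespace split.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # fields[0] is the series, fields[1] the value (an optional timestamp follows).
        if len(fields) >= 2 and fields[0].startswith("yunikorn_"):
            samples[fields[0]] = float(fields[1])
    return samples


def mean_latency(samples: dict[str, float], histogram: str) -> float | None:
    """Mean observation of a histogram: its _sum divided by its _count."""
    total = samples.get(histogram + "_sum")
    count = samples.get(histogram + "_count")
    # Missing series or zero observations: no meaningful mean.
    if total is None or not count:
        return None
    return total / count


if __name__ == "__main__":
    samples = yunikorn_samples(SAMPLE_SCRAPE)
    print(mean_latency(samples, "yunikorn_scheduler_scheduling_latency_seconds"))
    print(samples["yunikorn_scheduler_total_node_active"])
```

The hand-rolled parser keeps the sketch dependency-free. For real scrapes with labelled series, the official Python client's `prometheus_client.parser.text_string_to_metric_families` is a sturdier choice.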