chenya-zhang commented on a change in pull request #352:
URL:
https://github.com/apache/incubator-yunikorn-core/pull/352#discussion_r775057744
##########
File path: pkg/metrics/queue.go
##########
@@ -28,60 +28,38 @@ import (
"github.com/apache/incubator-yunikorn-core/pkg/log"
)
+// QueueMetrics to declare queue metrics
type QueueMetrics struct {
- // metrics related to app
- appMetrics *prometheus.CounterVec
-
- // metrics related to resource
- usedResourceMetrics *prometheus.GaugeVec
- pendingResourceMetrics *prometheus.GaugeVec
- availableResourceMetrics *prometheus.GaugeVec
+ appMetrics *prometheus.GaugeVec
+ ResourceMetrics *prometheus.GaugeVec
}
-func forQueue(name string) CoreQueueMetrics {
+// InitQueueMetrics to initialize queue metrics
+func InitQueueMetrics(name string) CoreQueueMetrics {
q := &QueueMetrics{}
- // Queue Metrics
- q.appMetrics = prometheus.NewCounterVec(
- prometheus.CounterOpts{
- Namespace: Namespace,
- Subsystem: substituteQueueName(name),
- Name: "app_metrics",
- Help: "Application Metrics",
- }, []string{"state"})
-
- q.usedResourceMetrics = prometheus.NewGaugeVec(
- prometheus.GaugeOpts{
- Namespace: Namespace,
- Subsystem: substituteQueueName(name),
- Name: "used_resource",
- Help: "Queue used resource",
- }, []string{"resource"})
-
- q.pendingResourceMetrics = prometheus.NewGaugeVec(
+ q.appMetrics = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Namespace: Namespace,
Subsystem: substituteQueueName(name),
- Name: "pending_resource",
- Help: "Queue pending resource",
- }, []string{"resource"})
+ Name: "queue_app",
+ Help: "Queue application metrics. State of the application includes `running`.",
+ }, []string{"state"})
Review comment:
Similar to the above, I think it is more meaningful to count the apps
currently running in a queue than "all the apps that have
run/accepted/rejected/completed in a queue since the last scheduler restart".
I am not sure what business value the cumulative count would provide.
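To illustrate the distinction, here is a minimal, stdlib-only sketch. The `queueAppStats` type and its methods are hypothetical stand-ins that I made up for this comment; they are not part of yunikorn-core or the Prometheus client. The `running` field behaves like a gauge (it rises and falls with the queue), while `totalByState` behaves like a counter (it only grows until the process restarts).

```go
package main

import "fmt"

// queueAppStats is a hypothetical illustration of the two metric styles;
// it is not an existing type in yunikorn-core or the Prometheus client.
type queueAppStats struct {
	// running behaves like a gauge: it goes up and down with the queue.
	running int
	// totalByState behaves like a counter: it only grows until restart.
	totalByState map[string]int
}

func newQueueAppStats() *queueAppStats {
	return &queueAppStats{totalByState: make(map[string]int)}
}

// appStarted records an application entering the running state.
func (s *queueAppStats) appStarted() {
	s.running++
	s.totalByState["running"]++
}

// appCompleted records a running application finishing.
func (s *queueAppStats) appCompleted() {
	s.running--
	s.totalByState["completed"]++
}

func main() {
	s := newQueueAppStats()
	s.appStarted()
	s.appStarted()
	s.appCompleted()
	fmt.Println("currently running (gauge-style):", s.running)
	fmt.Println("ever running (counter-style):", s.totalByState["running"])
}
```

After two starts and one completion, the gauge-style value reflects what is in the queue right now, while the counter-style value keeps growing. The first is what I would expect an operator to look at per queue.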
At the scheduler/cluster level, we already have the metrics
`applicationSubmission` and `application` to count apps that have
run/accepted/rejected/completed. I think they should satisfy our operational
needs, since all queues share the same scheduler in a cluster.
One thing to note is that these metrics were never implemented at the queue
level, so this is not a backward-incompatible change. I will help implement
them in future PRs.