yuecong edited a comment on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-539804591
 
 
   @dongjoon-hyun Thanks for fixing this. 
   I have a few questions about it.
   
   1. Short-lived metrics
   Because Prometheus uses a pull model, how do you recommend people use these metrics for executors that are shut down right after they start? And how will this work for short-lived Spark applications, e.g. ones that finish in less than one Prometheus scrape interval (often 30s)?
   See this [blog](https://www.metricfire.com/prometheus-tutorials/prometheus-monitoring-101) about short-lived metrics in Prometheus.
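To make the scrape-window problem concrete, here is a small Python sketch. It is not part of this PR or of Prometheus, and the function names are hypothetical. It assumes scrapes happen at a fixed interval with an arbitrary phase relative to the application start, and counts how many scrapes land inside the application's lifetime. For an application that lives 20s under a 30s scrape interval, one third of the possible phases produce no scrape at all, so that run's metrics are never collected.

```python
import math


def scrapes_during_lifetime(start_s: float, duration_s: float,
                            interval_s: float, phase_s: float = 0.0) -> int:
    """Count scrapes at times phase_s + k * interval_s (k any integer)
    that fall inside the half-open lifetime [start_s, start_s + duration_s)."""
    end_s = start_s + duration_s
    # Index of the first scrape at or after the start of the lifetime.
    first = math.ceil((start_s - phase_s) / interval_s)
    # Index of the first scrape at or after the end of the lifetime.
    past_end = math.ceil((end_s - phase_s) / interval_s)
    return max(0, past_end - first)


def miss_probability(duration_s: float, interval_s: float) -> float:
    """Chance that no scrape hits the lifetime, assuming the scrape phase is
    uniformly random relative to the application start."""
    if duration_s >= interval_s:
        return 0.0
    return 1.0 - duration_s / interval_s


if __name__ == "__main__":
    # Hypothetical 20s application, 30s scrape interval, phases 0..29s.
    misses = sum(scrapes_during_lifetime(0, 20, 30, p) == 0 for p in range(30))
    print(f"phases with no scrape: {misses}/30")
    print(f"analytic miss probability: {miss_probability(20, 30):.3f}")
```

Running it prints 10/30 for the brute-force count and 0.333 for the closed form, which agree.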
   
   2. Cardinality
   It looks like you are using app_id as one of the labels, which will increase the cardinality of the Prometheus metrics. See the discussion of Prometheus's cardinality issue [here](https://www.robustperception.io/cardinality-is-key) as well as this [doc](https://prometheus.io/docs/practices/naming/#labels).
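Prometheus identifies a time series by its metric name plus its full set of label values, so every new label value creates new series. The Python sketch below illustrates this under stated assumptions: the label names `app_id` and `executor_id` and values such as `app-0000` are hypothetical, not the exact output of this PR. It counts distinct values per label across the series seen from several application runs. `executor_id` stays bounded because each simulated run reuses the IDs `driver`, `1`, `2`, while `app_id` grows with every run, which is the kind of unbounded label the linked naming guide warns against.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def label_cardinality(series: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Return the number of distinct values observed for each label name."""
    values = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    return {name: len(vals) for name, vals in values.items()}


def simulate_runs(num_runs: int, executors: List[str]) -> List[Dict[str, str]]:
    """Hypothetical label sets: one per executor per application run,
    with a fresh app_id for every run."""
    return [
        {"app_id": f"app-{run:04d}", "executor_id": executor}
        for run in range(num_runs)
        for executor in executors
    ]


if __name__ == "__main__":
    executors = ["driver", "1", "2"]
    for runs in (1, 10, 100):
        print(runs, label_cardinality(simulate_runs(runs, executors)))
```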
   
   Suppose a user runs a central Prometheus server that scrapes their Spark applications with this PR. Every new Spark application exposes N metrics (say 10) on each of M workers (say 20 on average), i.e. N × M = 200 series per application. Because app_id changes with every run, the series from earlier applications do not disappear as new ones arrive, so over time the total can add up to millions or even billions of series. This puts a heavy load on a traditional Prometheus server. There are several solutions ([M3](https://eng.uber.com/m3/), [Cortex](https://www.cncf.io/blog/2018/12/18/cortex-a-multi-tenant-horizontally-scalable-prometheus-as-a-service/), [Thanos](https://improbable.io/blog/thanos-prometheus-at-scale)) that address this issue, but we should make the cardinality implications clear to users of these metrics.
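The growth in series count can be projected with a short Python sketch. It assumes the example numbers (N = 10 metrics per worker, M = 20 workers per application), that every run gets a fresh app_id, and that nothing is deleted or aged out by retention; the function names are hypothetical. Without the app_id label, and assuming worker/executor IDs are reused between runs, the count would stay at N × M.

```python
def series_per_app(metrics_per_worker: int, workers: int) -> int:
    """Distinct series exposed by a single application run (N x M)."""
    return metrics_per_worker * workers


def accumulated_series(runs: int, metrics_per_worker: int, workers: int,
                       unique_app_id: bool = True) -> int:
    """Series the server has seen after `runs` applications.

    With a unique app_id label, each run adds a fresh set of series;
    without it (and with executor IDs reused), every run hits the same set.
    Retention and deletion are deliberately ignored in this sketch.
    """
    if runs <= 0:
        return 0
    per_app = series_per_app(metrics_per_worker, workers)
    return runs * per_app if unique_app_id else per_app


if __name__ == "__main__":
    n, m = 10, 20  # the example numbers from the comment
    for runs in (1, 1_000, 100_000, 10_000_000):
        with_id = accumulated_series(runs, n, m)
        without_id = accumulated_series(runs, n, m, unique_app_id=False)
        print(f"{runs:>12,} runs: {with_id:>15,} series with app_id, "
              f"{without_id:,} without")
```

With these numbers, 100,000 runs already reach 20,000,000 series and 10,000,000 runs reach 2,000,000,000, while the label set without app_id stays at 200.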
   
