Hello Guys,

*What is the Problem?*
My Grafana dashboards are slow. I'm using Prometheus as the data source and 
need to debug and understand where the bottleneck is.

*What have I tried to improve performance?*
1. Tried Trickster as a caching/accelerator layer between Prometheus and 
Grafana.
2. Increased some query limits:

         --query.max-concurrency=20
                 Maximum number of queries executed concurrently.
         --query.max-samples=50000000
                 Maximum number of samples a single query can load into memory.

   These helped reduce connection timeouts but did not help with the slow 
   performance.
3. Checked system resource usage: the host has enough capacity to handle the 
queries.


*What do I need to know?*
1. I want to understand the timing stats below, which can be fetched from the 
Prometheus query log (evalTotalTime, execQueueTime, execTotalTime, 
innerEvalTime, queryPreparationTime, resultSortTime):

    "stats": {
        "timings": {
            "evalTotalTime": 0.000447452,
            "execQueueTime": 7.599e-06,
            "execTotalTime": 0.000461232,
            "innerEvalTime": 0.000427033,
            "queryPreparationTime": 1.4177e-05,
            "resultSortTime": 6.48e-07
        }
    }
2. We use Prometheus widely but have not been able to find a good resource on 
performance tuning. Can you please share some tunable options or ideas for 
improving Prometheus query performance, and guide me on how to narrow down 
exactly which area is contributing to the slowness?
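
One possible way to narrow this down is to rank the query log entries by execTotalTime and look at which timer dominates each slow query. The following is only a rough Python sketch. It assumes the query log holds one JSON object per line with "params.query" and "stats.timings" fields shaped like the excerpt above. The function names (breakdown, dominant_phase, slowest_queries) and the sample query "up" are made up for illustration, and the notes on what each timer measures are my own reading, not an official definition.

```python
import json

# My reading of the timers under stats.timings (an interpretation, not
# an official definition).
TIMER_NOTES = {
    "execQueueTime": "waiting for a free slot (--query.max-concurrency)",
    "queryPreparationTime": "selecting the series the query needs",
    "innerEvalTime": "evaluating the PromQL expression",
    "resultSortTime": "sorting the final result",
}


def breakdown(timings):
    """Return every timer except execTotalTime as a fraction of it."""
    total = timings.get("execTotalTime", 0.0)
    if total <= 0:
        return {}
    return {name: value / total
            for name, value in timings.items()
            if name != "execTotalTime"}


def dominant_phase(timings):
    """Return the name of the largest timer listed in TIMER_NOTES."""
    phases = {name: timings.get(name, 0.0) for name in TIMER_NOTES}
    return max(phases, key=phases.get)


def slowest_queries(lines, top=5):
    """Parse query-log lines; return the `top` slowest as
    (execTotalTime, query, dominant phase) tuples."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(record, dict):
            continue
        timings = record.get("stats", {}).get("timings")
        if not timings:
            continue
        query = record.get("params", {}).get("query", "<unknown>")
        entries.append((timings.get("execTotalTime", 0.0),
                        query,
                        dominant_phase(timings)))
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries[:top]


# Demo with the timings from this post; the query text "up" is hypothetical.
sample_line = json.dumps({
    "params": {"query": "up"},
    "stats": {"timings": {
        "evalTotalTime": 0.000447452,
        "execQueueTime": 7.599e-06,
        "execTotalTime": 0.000461232,
        "innerEvalTime": 0.000427033,
        "queryPreparationTime": 1.4177e-05,
        "resultSortTime": 6.48e-07,
    }},
})
for total, query, phase in slowest_queries([sample_line]):
    print(f"{total:.6f}s  {query}  dominated by {phase}: {TIMER_NOTES[phase]}")
```

If execQueueTime dominates, the concurrency limit is the likely suspect. If innerEvalTime dominates, the query itself (range, step, number of series) is probably the place to look.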


*Stack details*
OS: CentOS 7
Version: Prometheus 2.20
Deployment: Docker Compose stack (Prometheus, Grafana, Trickster)





*Some additional points*

If prometheus_engine_queries is greater than 
prometheus_engine_queries_concurrent_max, some queries are queued. The time 
a query spends in the queue counts toward the two-minute default query timeout.

We analysed the maximum query rate from our dashboards: it is between 30 and 
40, while our concurrency limit was at the default value of 20. Requests were 
therefore being queued, which caused some of the timeouts and slowness, so we 
have now increased the limit to 60. (This will increase resource utilisation, 
but that is acceptable given our system specification.)
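
The queueing check described above could be scripted roughly as follows. This is a minimal Python sketch, assuming the text has been fetched from Prometheus's own /metrics endpoint and that both metrics appear there without labels. The function names (parse_gauges, queued_queries) are hypothetical, and the sample values (35 and 20) are only illustrative numbers in the range mentioned in this post, not real output.

```python
def parse_gauges(metrics_text, names):
    """Extract values of un-labelled samples called `names` from
    Prometheus text exposition format."""
    values = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] in names:
            values[parts[0]] = float(parts[1])
    return values


def queued_queries(metrics_text):
    """Return how many queries exceed the concurrency limit (0 if none)."""
    values = parse_gauges(
        metrics_text,
        {"prometheus_engine_queries",
         "prometheus_engine_queries_concurrent_max"},
    )
    current = values["prometheus_engine_queries"]
    limit = values["prometheus_engine_queries_concurrent_max"]
    return max(0, int(current - limit))


# Illustrative sample only; real text would come from the /metrics endpoint.
sample = """\
# TYPE prometheus_engine_queries gauge
prometheus_engine_queries 35
# TYPE prometheus_engine_queries_concurrent_max gauge
prometheus_engine_queries_concurrent_max 20
"""
print(queued_queries(sample))  # prints 15
```

A value that stays above zero during dashboard loads would suggest that queue time, not evaluation time, is eating into the timeout.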
