Thanks for the help, Aliaksandr and Julien. I upgraded to the latest 
Prometheus (2.25.0, built with Go 1.15.8) and see a huge performance 
improvement.
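
For anyone who hits the same wall: a quick way to confirm that GC/allocation 
pressure is the bottleneck is to start the binary with the standard Go 
gctrace knob and watch how often GC runs during the load test. This is 
nothing Prometheus-specific, and the config path below is just an example:

  GODEBUG=gctrace=1 ./prometheus --config.file=prometheus.yml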

On Tuesday, March 9, 2021 at 11:22:39 AM UTC-8 [email protected] wrote:

> On Fri, Mar 5, 2021 at 12:10 AM Dhruv Patel <[email protected]> wrote:
>
>> Hi Folks,
>>   We are seeing an issue in our current Prometheus setup where we are not 
>> able to ingest beyond 22 million metrics/min. We have run several load 
>> tests at 25 million, 29 million and 35 million, but the ingestion rate 
>> stays pinned at roughly the same 22 million metrics/min. Moreover, CPU 
>> usage is around 70% and more than 50% of memory is still available. This 
>> suggests we are not hitting resource limits but rather some form of lock 
>> contention.
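>>
>> For reference, we estimate the ingestion rate with a PromQL query along 
>> these lines (the 1m window and the per-minute scaling are just our 
>> choice):
>>
>>   rate(prometheus_tsdb_head_samples_appended_total[1m]) * 60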
>>
>> *Prometheus Version:* 2.9.1
>> *Host Shape:* x7-enclave-104 (a bare-metal host with 104 processor 
>> units). More info is in the screenshots below.
>> *Memory Info:*
>>                  total        used        free      shared  buff/cache   available
>> Mem:              754G         88G        528G        67M        136G        719G
>> Swap:             1.0G          0B        1.0G
>> Total:            755G         88G        529G
>>
>> We also ran some profiling during our load tests at 20 million, 22 
>> million and 25 million metrics/min and see an increase in the time spent 
>> in runtime.mallocgc, which in turn leads to increased time spent in 
>> runtime.futex. Somehow we are not able to figure out what could be 
>> causing the lock contention. I have attached our profiling results at 
>> the different load-test levels in case that's useful. Any ideas on what 
>> could be causing the high time spent in runtime.mallocgc?
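>>
>> In case it helps to reproduce: profiles like the attached ones can be 
>> collected via the standard Go pprof endpoints that Prometheus exposes 
>> (assuming the default listen address of localhost:9090):
>>
>>   # 30-second CPU profile, captured while the load test is running
>>   go tool pprof "http://localhost:9090/debug/pprof/profile?seconds=30"
>>
>>   # heap / allocation profile
>>   go tool pprof "http://localhost:9090/debug/pprof/heap"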
>>
>
> Prometheus is written in Go. The runtime.mallocgc function is called every 
> time Prometheus allocates a new object during its operation. It looks like 
> Prometheus 2.9.1 allocates a lot during the load test. runtime.futex is 
> used internally by the Go runtime during object allocation and the 
> subsequent deallocation (aka garbage collection). It looks like the Go 
> runtime used in Prometheus 2.9.1 isn't well optimized for programs with 
> frequent object allocations that run on systems with many CPU cores. This 
> should be improved in Go 1.15 - "Allocation of small objects now performs 
> much better at high core counts, and has lower worst-case latency" 
> (https://tip.golang.org/doc/go1.15#runtime). So it is recommended to 
> repeat the load test on the latest available version of Prometheus, which 
> is hopefully built with at least Go 1.15 - see 
> https://github.com/prometheus/prometheus/releases .
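>
> To check which Go version a given Prometheus binary was built with, run it 
> with --version; the output includes a "go version" line, roughly like this 
> (illustrative output):
>
>   $ prometheus --version
>   prometheus, version 2.25.0 (branch: HEAD, revision: ...)
>     ...
>     go version:       go1.15.8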
>
> Additionally, you can run the load test on VictoriaMetrics and compare its 
> scalability with that of Prometheus. See 
> https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter .
>
>
> -- 
> Best Regards,
>
> Aliaksandr Valialkin, CTO VictoriaMetrics
>
