Hello,

I'm not "a Prometheus dev" but this is something that I am interested in. 
Could you open up a PR with the benchmark and the fix? I'll help out with 
reviewing.

Thanks,
Giedrius

On Saturday, 4 January 2025 at 09:52:01 UTC+2 Rafał Dowgird wrote:

> Dear esteemed Prometheus developers,
>
> I'd like to discuss with you a performance problem with Pushgateway, 
> namely that the complexity of adding n metrics might get quadratic 
> (O(n^2)). Details follow.
>
> We have a mixed push/scrape system where Pushgateway handles some of the 
> metrics which come from batch jobs. While migrating some jobs to 
> Pushgateway we hit a performance bottleneck. We worked around this by 
> sharding Pushgateway. Still, the sharded setup is more complex and the 
> amount of data wasn't that big, so we investigated the Pushgateway side of 
> things.
>
> It seems that the root of the problem is that every push operation causes 
> recalculation of hashes for all metrics already existing in the database. 
> This is how the consistency check logic works at present.
>
> I have created a simple benchmark to isolate/demonstrate the problem: 
> https://github.com/dowgird/pushgateway/commit/e0629ecb999c2f22cf098c87c78fc71cd0414733
>
> The output shows that later pushes get linearly slower: the time between 
> successive reports (diff) keeps growing:
>
> I: 100 elapsed:220.379138ms diff:220.379138ms
> I: 200 elapsed:505.576881ms diff:285.197743ms
> I: 300 elapsed:841.153205ms diff:335.576324ms
> .
> .
> .
> I: 2700 elapsed:21.806380441s diff:1.391117119s
> I: 2800 elapsed:23.229272852s diff:1.422892411s
> I: 2900 elapsed:24.674250223s diff:1.444977371s
>
> A possible fix doesn't look very complicated algorithmically (memoizing the 
> hashes should work). Code-wise it's a bit more complex, which is part of 
> why I'm writing this message. I can contribute the fix, but this would 
> require some discussion of the client API.
>
> The other part is that I understand from the documentation and discussions 
> in GitHub issues that Pushgateway is not meant to be high-performance. That 
> said, I still think it would be beneficial to remove this particular 
> performance bottleneck - there seem to be other people hitting it (
> https://github.com/prometheus/pushgateway/issues/643 might be caused by 
> this).
>
> Would you be open to accepting a fix for this issue?
>
> --
> Rafał

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion visit 
https://groups.google.com/d/msgid/prometheus-developers/b53b5c14-a4ea-4c21-8a53-aeb1cb0a6036n%40googlegroups.com.
