Marton Greber has posted comments on this change. ( http://gerrit.cloudera.org:8080/21723 )
Change subject: Add Prometheus HTTP service discovery
......................................................................


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21723/10/src/kudu/master/master-test.cc
File src/kudu/master/master-test.cc:

http://gerrit.cloudera.org:8080/#/c/21723/10/src/kudu/master/master-test.cc@4111
PS10, Line 4111: } // namespace master
> Since this changelist has added functionality to enable Prometheus service
Done


http://gerrit.cloudera.org:8080/#/c/21723/10/src/kudu/master/master_path_handlers.cc
File src/kudu/master/master_path_handlers.cc:

http://gerrit.cloudera.org:8080/#/c/21723/10/src/kudu/master/master_path_handlers.cc@984
PS10, Line 984: WriteEmptyPrometheusSDResponse(output);
> nit: since the server responds with HttpStatusCode::ServiceUnavailable, sen
I checked the Prometheus source, and as expected, Prometheus short-circuits on any non-200 status code [1], so yes, this line can be removed.

[1] https://github.com/prometheus/prometheus/blob/7512d13e00c50b0287f8cd8576eb54d91977f77c/discovery/http/http.go#L172


http://gerrit.cloudera.org:8080/#/c/21723/10/src/kudu/master/master_path_handlers.cc@988
PS10, Line 988: if (!l.leader_status().ok()) {
              :   WriteEmptyPrometheusSDResponse(output);
> If we are sending back and empty list with HTTP 200 when a particular insta
Ah, yes, nice catch, thank you! I've looked into this, and it turns out the data is not necessarily wiped in such cases; it simply becomes stale [1], [2]:

“If a target scrape or rule evaluation no longer returns a sample for a time series that was previously present, that time series is marked as stale. If a target is removed, the previously retrieved time series will be marked stale soon after removal.” [1]

What isn't clear to me is whether a series that has been marked stale automatically becomes “live” again once a different master is elected leader, starts serving the SD endpoint, and Prometheus resumes scraping.
If stale series revert to normal as soon as new samples arrive, then we're fine, but we need to test it. I'll implement MiniPrometheus (KUDU-3685) and circle back once we have a proper test environment to verify scenarios like this with confidence.

[1] https://github.com/prometheus/prometheus/blob/7512d13e00c50b0287f8cd8576eb54d91977f77c/discovery/http/http.go#L172
[2] https://www.robustperception.io/staleness-and-promql/


-- 
To view, visit http://gerrit.cloudera.org:8080/21723

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I931aa72a7567c0dde43d7b7ed53a2dd0fa8bc1fe
Gerrit-Change-Number: 21723
Gerrit-PatchSet: 10
Gerrit-Owner: Marton Greber
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Attila Bukor
Gerrit-Reviewer: Gabriella Lotz
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber
Gerrit-Reviewer: Wang Xixu
Gerrit-Reviewer: Zoltan Chovan
Gerrit-Reviewer: Zoltan Martonka
Gerrit-Comment-Date: Wed, 06 Aug 2025 12:57:36 +0000
Gerrit-HasComments: Yes
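For illustration only, here is a minimal, self-contained C++ sketch of the behavior settled in the Line 984 thread. A master that is not the leader answers 503 with an empty body, because Prometheus short-circuits on non-200 statuses (see the http.go link above) and never reads that body. Every name here (`SdResponse`, `HandlePrometheusSd`, `PrometheusAcceptsSd`) is hypothetical and is not part of the Kudu or Prometheus APIs. The JSON layout follows the Prometheus HTTP SD format, which is an array of target groups, each with a `targets` list and optional `labels`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified HTTP response; not a Kudu type.
struct SdResponse {
  int status_code;
  std::string body;
};

// Hypothetical SD handler. Only the leader master returns the target list.
// Any other master replies 503 Service Unavailable and writes no body,
// since Prometheus stops at the status check for non-200 responses.
SdResponse HandlePrometheusSd(bool is_leader,
                              const std::vector<std::string>& targets) {
  if (!is_leader) {
    return {503, ""};
  }
  // Prometheus HTTP SD format: a JSON array of target groups.
  // This sketch emits a single group without labels.
  std::string body = "[{\"targets\":[";
  for (std::size_t i = 0; i < targets.size(); ++i) {
    if (i > 0) body += ",";
    body += "\"" + targets[i] + "\"";
  }
  body += "]}]";
  return {200, body};
}

// Models only the client-side check discussed in the review: any status
// other than 200 is treated as a failed refresh, and the body is ignored.
bool PrometheusAcceptsSd(const SdResponse& resp) {
  return resp.status_code == 200;
}
```

Keeping the 503 path bodiless mirrors the resolution of the Line 984 thread. The Line 988 question (what happens to previously scraped series when a target disappears and later returns) is deliberately not modeled here; that is what the planned MiniPrometheus tests are meant to answer.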
