Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21723 )
Change subject: Add Prometheus HTTP service discovery ...................................................................... Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/21723/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21723/3//COMMIT_MSG@14 PS3, Line 14: servers. This returns the Kudu Master and TServers grouped, according > As a second thought if customers are running Knox I think they can do some Yes, I think Knox might be a good option to explore in this context. Also, it's possible to enable TLS client authentication in 'tls_config' [3] and basic HTTP authn there as well [1] in the 'http_config' sub-section of the 'scape_config' section [2]. However, it necessary to implement the customization of authn per end-point in the squeasel webserver, IIUC. [1] https://prometheus.io/docs/prometheus/latest/configuration/https/ [2] https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config [3] https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master-test.cc File src/kudu/master/master-test.cc: http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master-test.cc@3883 PS3, Line 3883: } // namespace master Does it make sense to add a test to ensure the reported list of targets is empty when master hasn't yet initialized? That's just to make sure we know it's HTTP 200 and an empty array, not HTTP error code from the server. http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master-test.cc File src/kudu/master/master-test.cc: http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master-test.cc@4003 PS8, Line 4003: nit: put these into separate lines, otherwise clang-tidy isn't happy http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.h File src/kudu/master/master_path_handlers.h: http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.h@70 PS8, Line 70: r addresses su nit: wrong indent? http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master_path_handlers.cc File src/kudu/master/master_path_handlers.cc: http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master_path_handlers.cc@936 PS3, Line 936: jw.String("Kudu"); > Done, moreover added location info. Great. Now, should different clusters be scraped by different Prometheus jobs? Imagine 2 Kudu clusters served by the same Prometheus instance: one Kudu cluster have 5 nodes, and another Kudu cluster has 250 nodes. http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master_path_handlers.cc@1010 PS3, Line 1010: server->RegisterJsonPathHandler( : "/prometheus-sd", "Prometheus SD", : [this](const Webserver::WebRequest& req, Webserver::PrerenderedWebResponse* resp) { : this->HandlePrometheusServiceDiscovery(req, resp); : }, : false /*is_on_nav_bar*/); Probably, that's a question for a separate changelist, but should this endpoint be kept open when SPENGO authn is used to secure the embedded webserver? From reading the Prometheus docs [1], I'm not quite sure whether the SD URL should be always kept open, or it's possible to require one of the supported authn options for the SD endpoint? ```The URL to the HTTP SD is not considered secret. The authentication and any API keys should be passed with the appropriate authentication mechanisms. Prometheus supports TLS authentication, basic authentication, OAuth2, and authorization headers.``` [1] https://prometheus.io/docs/prometheus/latest/http_sd/ http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc File src/kudu/master/master_path_handlers.cc: http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc@889 PS8, Line 889: JsonError(s, output); nit: should any of attributes mot be empty? If any, consider adding corresponding DCHECK() for those http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc@917 PS8, Line 917: // Create Prometheus service discovery JSON. Some tablet servers might considered down after they have registered with masters (i.e., haven't heard from them for some time). Should that be somehow reflected in the scrape entries for Prometheus? -- To view, visit http://gerrit.cloudera.org:8080/21723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I931aa72a7567c0dde43d7b7ed53a2dd0fa8bc1fe Gerrit-Change-Number: 21723 Gerrit-PatchSet: 3 Gerrit-Owner: Marton Greber <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Gabriella Lotz <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber <[email protected]> Gerrit-Reviewer: Wang Xixu <[email protected]> Gerrit-Reviewer: Zoltan Chovan <[email protected]> Gerrit-Reviewer: Zoltan Martonka <[email protected]> Gerrit-Comment-Date: Tue, 15 Jul 2025 07:42:05 +0000 Gerrit-HasComments: Yes
