Marton Greber has posted comments on this change. ( http://gerrit.cloudera.org:8080/21723 )
Change subject: Add Prometheus HTTP service discovery ...................................................................... Patch Set 9: (16 comments) http://gerrit.cloudera.org:8080/#/c/21723/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21723/3//COMMIT_MSG@14 PS3, Line 14: servers. This returns the Kudu Master and TServers, according to the > Yes, I think Knox might be a good option to explore in this context. Thanks! I will gather all these follow up information look into them when i'm doing followup, and in the context of the blogpost. http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master-test.cc File src/kudu/master/master-test.cc: http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master-test.cc@3883 PS3, Line 3883: table_identifier_b, CatalogManager::TableInfoMapType::kAllTableType, > Ah, and now I think we should get HTTP 503 response instead of HTTP 200 wit Absolutely. I've changed it to report 503. In the latest rev I've added a tests to cover this: "PrometheusServiceDiscoveryDuringStartup". http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master-test.cc File src/kudu/master/master-test.cc: http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master-test.cc@3961 PS8, Line 3961: NO_FATALS(CheckPrometheusOutput(str)); > nit: would be great to have a small doc blurb on what this scenario tries t Done http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master-test.cc@3969 PS8, Line 3969: : // Verify that Prometheus format metrics for Kudu masters are output : // in accordance with the severity level filtering c > Does it make sense to test for the presence of all the expected fields here Done http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master-test.cc@4003 PS8, Line 4003: const auto& str = buf.ToString(); > nit: put these into separate lines, otherwise clang-tidy isn't happy Done http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.h File src/kudu/master/master_path_handlers.h: http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.h@53 PS8, Line 53: > Even if right now Kudu masters don't have assigned locations, I'd think of Done http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.h@70 PS8, Line 70: nst Webserver: > nit: wrong indent? Done http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master_path_handlers.cc File src/kudu/master/master_path_handlers.cc: http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master_path_handlers.cc@880 PS3, Line 880: jw.EndArray(); > I missed responding to this in prior review rounds, but it's better late th Yep that makes sense, thanks for taking the time to think this through! Added 503 in this patch. Created a TODO in the source to implement the grace period (FLAGS_heartbeat_rpc_timeout_ms + FLAGS_heartbeat_interval_ms) functionality in a followup patch. http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master_path_handlers.cc@936 PS3, Line 936: Substitute > Indeed, it seems Kudu nodes from different clusters should logically belong Yep exactly. Now the default is that we supply the job: "Kudu" in all targets. (for single Kudu cluster users) For customers operating multiple Kudu clusters, they can just give different job names in the scrape config by using relabeling. Using the below scrape config we can change the default job: "Kudu" to "kudu-prod" for example scrape_configs: - job_name: 'kudu-prod' scrape_interval: 5s metrics_path: '/metrics_prometheus' http_sd_configs: - url: 'http://127.0.0.1:8765/prometheus-sd' refresh_interval: 5s relabel_configs: - target_label: job replacement: kudu-prod http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master_path_handlers.cc@1010 PS3, Line 1010: : for (const auto& target : data.tserver_targets) { : WritePrometheusSDTargetGroup(&jw, : "tservers", : target.address, : > Probably, that's a question for a separate changelist, but should this endp In my reading I the http sd endpoint can also be secured. However I think I will need to verify this by running some experiments, alongside the other mentioned Knox approach (whether it works or not). http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc File src/kudu/master/master_path_handlers.cc: http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc@889 PS8, Line 889: DCHECK(!group_name.empty()); > nit: should any of attributes mot be empty? If any, consider adding corres Done http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc@917 PS8, Line 917: } > Ah, please ignore this. After some consideration, it's now clear to me tha Done http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc@933 PS8, Line 933: > Why not to use HostPort::ToString() instead? This is HostPortPB. If i'm not mistaken there is no ToString(). http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc@947 PS8, Line 947: : } > nit: using Substitute() makes messages' format easier to comprehend Done http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc@953 PS8, Line 953: ntinue; > Use HostPort::ToString() instead ditto http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc@962 PS8, Line 962: return Status::OK(); > nit: it might be useful to output the X-Prometheus-Refresh-Interval-Seconds Done -- To view, visit http://gerrit.cloudera.org:8080/21723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I931aa72a7567c0dde43d7b7ed53a2dd0fa8bc1fe Gerrit-Change-Number: 21723 Gerrit-PatchSet: 9 Gerrit-Owner: Marton Greber <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Gabriella Lotz <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber <[email protected]> Gerrit-Reviewer: Wang Xixu <[email protected]> Gerrit-Reviewer: Zoltan Chovan <[email protected]> Gerrit-Reviewer: Zoltan Martonka <[email protected]> Gerrit-Comment-Date: Thu, 17 Jul 2025 11:10:52 +0000 Gerrit-HasComments: Yes
