Marton Greber has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21723 )

Change subject: Add Prometheus HTTP service discovery
......................................................................


Patch Set 9:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/21723/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21723/3//COMMIT_MSG@14
PS3, Line 14: servers. This returns the Kudu Master and TServers, according to 
the
> Yes, I think Knox might be a good option to explore in this context.
Thanks! I will gather all these follow up information look into them when i'm 
doing followup, and in the context of the blogpost.


http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master-test.cc
File src/kudu/master/master-test.cc:

http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master-test.cc@3883
PS3, Line 3883:         table_identifier_b, 
CatalogManager::TableInfoMapType::kAllTableType,
> Ah, and now I think we should get HTTP 503 response instead of HTTP 200 wit
Absolutely.
I've changed it to report 503.
In the latest rev I've added a tests to cover this: 
"PrometheusServiceDiscoveryDuringStartup".


http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master-test.cc
File src/kudu/master/master-test.cc:

http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master-test.cc@3961
PS8, Line 3961:   NO_FATALS(CheckPrometheusOutput(str));
> nit: would be great to have a small doc blurb on what this scenario tries t
Done


http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master-test.cc@3969
PS8, Line 3969:
              : // Verify that Prometheus format metrics for Kudu masters are 
output
              : // in accordance with the severity level filtering c
> Does it make sense to test for the presence of all the expected fields here
Done


http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master-test.cc@4003
PS8, Line 4003:     const auto& str = buf.ToString();
> nit: put these into separate lines, otherwise clang-tidy isn't happy
Done


http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.h
File src/kudu/master/master_path_handlers.h:

http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.h@53
PS8, Line 53:
> Even if right now Kudu masters don't have assigned locations, I'd think of
Done


http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.h@70
PS8, Line 70: nst Webserver:
> nit: wrong indent?
Done


http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master_path_handlers.cc
File src/kudu/master/master_path_handlers.cc:

http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master_path_handlers.cc@880
PS3, Line 880:   jw.EndArray();
> I missed responding to this in prior review rounds, but it's better late th
Yep that makes sense, thanks for taking the time to think this through!
Added 503 in this patch. Created a TODO in the source to implement the grace 
period (FLAGS_heartbeat_rpc_timeout_ms + FLAGS_heartbeat_interval_ms) 
functionality in a followup patch.


http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master_path_handlers.cc@936
PS3, Line 936:           Substitute
> Indeed, it seems Kudu nodes from different clusters should logically belong
Yep exactly.
Now the default is that we supply the
job: "Kudu" in all targets. (for single Kudu cluster users)
For customers operating multiple Kudu clusters, they can just give different 
job names in the scrape config by using relabeling. Using the below scrape 
config we can change the default job: "Kudu" to "kudu-prod" for example

scrape_configs:
  - job_name: 'kudu-prod'

    scrape_interval: 5s
    metrics_path: '/metrics_prometheus'

    http_sd_configs:
      - url: 'http://127.0.0.1:8765/prometheus-sd'
        refresh_interval: 5s

    relabel_configs:
      - target_label: job
        replacement: kudu-prod


http://gerrit.cloudera.org:8080/#/c/21723/3/src/kudu/master/master_path_handlers.cc@1010
PS3, Line 1010:
              :   for (const auto& target : data.tserver_targets) {
              :     WritePrometheusSDTargetGroup(&jw,
              :                                  "tservers",
              :                                  target.address,
              :
> Probably, that's a question for a separate changelist, but should this endp
In my reading I the http sd endpoint can also be secured. However I think I 
will need to verify this by running some experiments, alongside the other 
mentioned Knox approach (whether it works or not).


http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc
File src/kudu/master/master_path_handlers.cc:

http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc@889
PS8, Line 889:   DCHECK(!group_name.empty());
> nit: should any of attributes mot be empty?  If any, consider adding corres
Done


http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc@917
PS8, Line 917: }
> Ah, please ignore this.  After some consideration, it's now clear to me tha
Done


http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc@933
PS8, Line 933:
> Why not to use HostPort::ToString() instead?
This is HostPortPB. If i'm not mistaken there is no ToString().


http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc@947
PS8, Line 947:
             :     }
> nit: using Substitute() makes messages' format easier to comprehend
Done


http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc@953
PS8, Line 953: ntinue;
> Use HostPort::ToString() instead
ditto


http://gerrit.cloudera.org:8080/#/c/21723/8/src/kudu/master/master_path_handlers.cc@962
PS8, Line 962:   return Status::OK();
> nit: it might be useful to output the X-Prometheus-Refresh-Interval-Seconds
Done



--
To view, visit http://gerrit.cloudera.org:8080/21723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I931aa72a7567c0dde43d7b7ed53a2dd0fa8bc1fe
Gerrit-Change-Number: 21723
Gerrit-PatchSet: 9
Gerrit-Owner: Marton Greber <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Gabriella Lotz <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Wang Xixu <[email protected]>
Gerrit-Reviewer: Zoltan Chovan <[email protected]>
Gerrit-Reviewer: Zoltan Martonka <[email protected]>
Gerrit-Comment-Date: Thu, 17 Jul 2025 11:10:52 +0000
Gerrit-HasComments: Yes

Reply via email to