Yan-Daojiang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24096 )

Change subject: KUDU-3692 Export entity attributes as Prometheus labels
......................................................................


Patch Set 4:

(22 comments)

Hi Alexey,
Thank you for the detailed and thorough review.

All comments and issues have been addressed.
I would really appreciate it if you could take another look when you have time.

Thanks again for your help!

http://gerrit.cloudera.org:8080/#/c/24096/3//COMMIT_MSG
Commit Message:

PS3:
> Just to clarify: this patch doesn't introduce host/node attribute into metr
Correct, this patch does not introduce host/node attributes into metrics yet.
I'm planning to add node-related labels in a follow-up patch. Since this 
information is not currently part of the metrics framework, I need to further 
investigate and design how the node-level labels should be structured and 
propagated.
Will follow up on that separately.


http://gerrit.cloudera.org:8080/#/c/24096/3//COMMIT_MSG@12
PS3, Line 12: and part of at
> I guess only the ID was embedded in metric names, right?
Yes, only the ID is embedded in metric names currently.


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics-test.cc
File src/kudu/util/metrics-test.cc:

http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics-test.cc@152
PS3, Line 152: EST_F(MetricsTest, TableAndTabletPrometheus
> nit: use google::FlagSaver if it's necessary to rollback the customized set
Done


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics-test.cc@247
PS3, Line 247:   const auto& out = output.str();
> Instead, use  google::FlagSaver for more robust flags' rollback.
Done


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics-test.cc@837
PS3, Line 837:   hist->IncrementBy(4, 40);
> Use google::FlagSaver instead
Done


http://gerrit.cloudera.org:8080/#/c/24096/2/src/kudu/util/metrics.cc
File src/kudu/util/metrics.cc:

http://gerrit.cloudera.org:8080/#/c/24096/2/src/kudu/util/metrics.cc@42
PS2, Line 42: labe
> Marton, which later release you are thinking about?  May 1.19.0 be such a r
Thanks for the suggestion.

I agree that the legacy format is not quite usable for standard Prometheus 
workflows, and the new label-based format is the right direction.

For now I've kept the flag metrics_prometheus_use_entity_labels defaulting to 
false to avoid breaking existing environments that may have Grafana dashboards 
or alerting rules built on the legacy metric names. Flipping the default would 
immediately invalidate those setups on upgrade.

That said, I'm not very familiar with the community's release process and 
migration conventions, so I'd appreciate your and others' input on the right 
timing and approach for changing the default. I'm happy to help with whatever 
is needed — release notes, migration docs, or follow-up patches.


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc
File src/kudu/util/metrics.cc:

http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@42
PS3, Line 42: metrics_prometheus_use_entit
> Consider naming it to show it's related to metrics, e.g. metrics_prometheus
Done


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@854
PS3, Line 854:
> style nit: move this into a separate line a well once switching to multi-li
Done


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@1015
PS3, Line 1015: st {
> style nit: move this into a separate line a well once switching to multi-li
Done


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@1092
PS3, Line 1092:
> style nit: move this into a separate line a well once switching to multi-li
Done


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@1175
PS3, Line 1175:   };
              :   static constexpr const char* const kHelpTypeFmt =
              :       "# HELP $0$1 $2\n# TYPE $3$4 $5\n";
              :
              :   const char* const name = prototype_->name();
              :   DCHECK(name);
              :   const char* const unit = MetricUnit::Name(prototype_->unit());
              :   DCHECK(unit);
              :
              :   // A snapshot is taken to have more consistent statistics 
while generating
              :   // the output.
              :   const HdrHistogram h(*histogram_);
> nit: split this block and move its parts under the corresponding 'if (FLAGS
Done


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/metrics.cc@1226
PS3, Line 1226:   } else {
              :     // Legacy format: no labels, space after comma, _sum/_count 
have no labels.
> It seems this code generates incorrect unit for the summary count.  In the
Good catch!

The _count metric indeed had two issues:
Issue 1: unit_type label — _count is a dimensionless counter, but unit_type was 
incorrectly set to the histogram's native unit. Fixed by using 
MetricUnit::Name(MetricUnit::kUnits) for _count lines, while keeping the 
original unit for quantile lines and _sum.

Issue 2: # HELP text — The HELP description for _count was generated from 
prototype_->description(), which often contains unit-specific wording like 
"Microseconds spent on appending to the log segment file". Fixed by using 
prototype_->label(), producing "Log Append Latency (count)" instead.


Now:
# HELP kudu_log_append_latency_count Log Append Latency (count)
# TYPE kudu_log_append_latency_count counter
kudu_log_append_latency_count{...,unit_type="units"} 6


Changes:
metrics.cc: Fixed both issues in 3 codepaths — Histogram (new format), 
Histogram (legacy format), and MeanGauge.

metrics-test.cc: Updated existing tests (HistogramPrometheusTest, 
HistogramLegacyPrometheusTest, MeanGaugePrometheusTest) and added 2 new 
dedicated tests (HistogramPrometheusCountUnitTest, 
MeanGaugePrometheusCountUnitTest) that verify:
1. _count carries unit_type="units";
2. _count does NOT carry the native unit;
3. _count HELP uses label-based text, not description.


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.h
File src/kudu/util/prometheus_writer.h:

http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.h@69
PS3, Line 69:     return labels + ",";
> nit: add description for this utility function
Done


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.h@73
PS3, Line 73: // Returns labels of the form: type="master",id="<uuid>"
> nit: add description for this utility function
Done


http://gerrit.cloudera.org:8080/#/c/24096/1/src/kudu/util/prometheus_writer.cc
File src/kudu/util/prometheus_writer.cc:

http://gerrit.cloudera.org:8080/#/c/24096/1/src/kudu/util/prometheus_writer.cc@56
PS1, Line 56:
> +1
Thanks! Yes, I'm planning to add node/host labels in a follow-up patch.


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc
File src/kudu/util/prometheus_writer.cc:

http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@31
PS3, Line 31:
> style nit for there and elsewhere in this .cc file: add 'using std::string;
Done


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@56
PS3, Line 56:
> nit: consider adding DCHECK() here to make sure 'entity_type' == "tablet"
Done


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@66
PS3, Line 66: verPrometheus
> Make this a constant to use here and in WriteAsPrometheus() as well.
Done


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@68
PS3, Line 68: rometheusLabels(ent
> Make this a constant to use here and in WriteAsPrometheus() as well.
Done


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@73
PS3, Line 73: const unor
> nit: change to LOG(DFATAL) -- in release build it will be LOG(ERROR), but i
Done


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@78
PS3, Line 78:     labels = "type=\"tserver\"";
            :   } else {
> style nit: switch to using FindOrNull() or other helper from src/kudu/gutil
Done


http://gerrit.cloudera.org:8080/#/c/24096/3/src/kudu/util/prometheus_writer.cc@91
PS3, Line 91:
> nit for here and elsewhere when well-known and controlled input is used: is
Good point, thanks for raising this.

I’ve applied the DCHECK-instead-of-escape approach for uuid (line 86–87), since 
it is internally generated and guaranteed to contain only hex characters.

For the remaining call sites:

BuildServerPrometheusLabels (line 81)
This is the LOG(DFATAL) fallback path, which should be unreachable in debug 
builds and is kept as a defensive safeguard in release builds. Escaping here is 
intentional, as this path handles unexpected states.

BuildTableTabletPrometheusLabels (line 113: table_id, table_name)
While table_id is also a hex UUID and could theoretically use DCHECK, 
table_name is user-supplied input. ValidateIdentifier() only enforces valid 
UTF-8 and a length ? 256, so it may still contain characters like \, ", or \n, 
which require escaping for Prometheus correctness.

Since both attributes are processed together in the loop over kKnownAttrs, 
splitting the logic just to avoid escaping table_id would introduce additional 
complexity. Therefore, I kept escaping for both fields to keep the 
implementation simple.

Let me know if you think it's worth separating these paths, or if this approach 
looks reasonable to you.



--
To view, visit http://gerrit.cloudera.org:8080/24096
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I690918d06f19c393369b8fc43c3ec90dd2231d3d
Gerrit-Change-Number: 24096
Gerrit-PatchSet: 4
Gerrit-Owner: Yan-Daojiang <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Yan-Daojiang <[email protected]>
Gerrit-Comment-Date: Thu, 19 Mar 2026 10:23:56 +0000
Gerrit-HasComments: Yes

Reply via email to