Re: [PR] [FLINK-21309][metrics] Add UUID as grouping key for prom pushgateway [flink]
JTaky commented on code in PR #27744:
URL: https://github.com/apache/flink/pull/27744#discussion_r3009606186
##
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java:
##
@@ -32,13 +33,16 @@
import java.io.IOException;
import java.net.URL;
+import java.util.LinkedHashMap;
import java.util.Map;
/**
* {@link MetricReporter} that exports {@link Metric Metrics} via Prometheus
{@link PushGateway}.
*/
@PublicEvolving
public class PrometheusPushGatewayReporter extends AbstractPrometheusReporter
implements Scheduled {
+public static final String REPORTER_ID_GROUPING_KEY =
Review Comment:
do we need to make this value public? I feel that package private would
allow to use it freely and encapsulate from the usage in another context
##
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java:
##
@@ -64,7 +69,10 @@ public class PrometheusPushGatewayReporter extends
AbstractPrometheusReporter im
this.basicAuthEnabled = false;
}
this.jobName = Preconditions.checkNotNull(jobName);
-this.groupingKey = Preconditions.checkNotNull(groupingKey);
+this.groupingKey =
+metricsGroupingByReporter
+? groupWithReporterId(groupingKey)
+: new LinkedHashMap<>(groupingKey);
Review Comment:
do we still need to wrap `groupingKey` into LinkedHashMap if new option
`metricsGroupingByReporter` is disabled? I think that groupingKey is not
modified later, thus LinkedHashMap could be miss-leading at least for human
reader.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [FLINK-21309][metrics] Add UUID as grouping key for prom pushgateway [flink]
qinghui-xu commented on code in PR #27744:
URL: https://github.com/apache/flink/pull/27744#discussion_r2958864089
##
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporterOptions.java:
##
@@ -45,12 +45,15 @@ public class PrometheusPushGatewayReporterOptions {
.defaultValue("")
.withDescription("The job name under which metrics will be
pushed");
+@Deprecated
public static final ConfigOption RANDOM_JOB_NAME_SUFFIX =
ConfigOptions.key("randomJobNameSuffix")
.booleanType()
-.defaultValue(true)
+.defaultValue(false)
.withDescription(
-"Specifies whether a random suffix should be
appended to the job name.");
+"Specifies whether random suffixing `job` label
(the old way) to avoid metric collision among reporters from taskamangers."
Review Comment:
I updated the description and regenerated the doc. And it turns out
deprecating this option will discard the entry entirely from the doc site. So
I'd postpone the deprecation for a future commit (or maybe we should still keep
doc for deprecated items that are not yet marked `forRemoval`, but that's a
separate topic) cc @rionmonster
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [FLINK-21309][metrics] Add UUID as grouping key for prom pushgateway [flink]
qinghui-xu commented on code in PR #27744:
URL: https://github.com/apache/flink/pull/27744#discussion_r2910640472
##
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporterOptions.java:
##
@@ -45,13 +45,6 @@ public class PrometheusPushGatewayReporterOptions {
.defaultValue("")
.withDescription("The job name under which metrics will be
pushed");
-public static final ConfigOption RANDOM_JOB_NAME_SUFFIX =
Review Comment:
Sent a followup commit in this sense.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [FLINK-21309][metrics] Add UUID as grouping key for prom pushgateway [flink]
qinghui-xu commented on code in PR #27744:
URL: https://github.com/apache/flink/pull/27744#discussion_r2910638155
##
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java:
##
@@ -96,4 +104,68 @@ public void close() {
}
super.close();
}
+
+/**
+ * (FLINK-21309) Put "flink_prometheus_push_gateway_reporter_id" as the
last entry of the
+ * grouping keys, so that each taskmanger instance and jobmanager will not
collide their metrics
+ * in the PushGateway.
+ */
+static class GroupingKeyMap extends AbstractMap {
+private final Map customGroupingKeys;
+private final String reporterId = new AbstractID().toString();
+private final Set> entrySet = new
GroupingKeySet();
+
+GroupingKeyMap(Map customGroupingKeys) {
+if (customGroupingKeys.containsKey(REPORTER_ID_GROUPPING_KEY)) {
+throw new IllegalArgumentException(
Review Comment:
Done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [FLINK-21309][metrics] Add UUID as grouping key for prom pushgateway [flink]
qinghui-xu commented on code in PR #27744:
URL: https://github.com/apache/flink/pull/27744#discussion_r2910632549
##
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java:
##
@@ -32,13 +33,20 @@
import java.io.IOException;
import java.net.URL;
+import java.util.AbstractMap;
+import java.util.AbstractSet;
+import java.util.Iterator;
import java.util.Map;
+import java.util.NoSuchElementException;
+import java.util.Set;
/**
* {@link MetricReporter} that exports {@link Metric Metrics} via Prometheus
{@link PushGateway}.
*/
@PublicEvolving
public class PrometheusPushGatewayReporter extends AbstractPrometheusReporter
implements Scheduled {
+public static final String REPORTER_ID_GROUPPING_KEY =
Review Comment:
Done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [FLINK-21309][metrics] Add UUID as grouping key for prom pushgateway [flink]
qinghui-xu commented on code in PR #27744:
URL: https://github.com/apache/flink/pull/27744#discussion_r2904411827
##
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporterOptions.java:
##
@@ -45,13 +45,6 @@ public class PrometheusPushGatewayReporterOptions {
.defaultValue("")
.withDescription("The job name under which metrics will be
pushed");
-public static final ConfigOption RANDOM_JOB_NAME_SUFFIX =
Review Comment:
Good point. Let me deprecate it in the PR. I'm wondering should I also
change the default value to `false` while keeping it. I'll use this flag as a
switch: if `job` is not suffixed, a "flink_prometheus_push_gateway_reporter_id"
grouping key will be used.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [FLINK-21309][metrics] Add UUID as grouping key for prom pushgateway [flink]
rionmonster commented on code in PR #27744:
URL: https://github.com/apache/flink/pull/27744#discussion_r2902547588
##
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java:
##
@@ -96,4 +104,68 @@ public void close() {
}
super.close();
}
+
+/**
+ * (FLINK-21309) Put "flink_prometheus_push_gateway_reporter_id" as the
last entry of the
+ * grouping keys, so that each taskmanger instance and jobmanager will not
collide their metrics
+ * in the PushGateway.
+ */
+static class GroupingKeyMap extends AbstractMap {
+private final Map customGroupingKeys;
+private final String reporterId = new AbstractID().toString();
+private final Set> entrySet = new
GroupingKeySet();
+
+GroupingKeyMap(Map customGroupingKeys) {
+if (customGroupingKeys.containsKey(REPORTER_ID_GROUPPING_KEY)) {
+throw new IllegalArgumentException(
Review Comment:
We'll want to add a test here to cover the assertion against the user
supplying "flink_prometheus_push_gateway_reporter_id" as an explicit grouping
(asserting the expected exception).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [FLINK-21309][metrics] Add UUID as grouping key for prom pushgateway [flink]
rionmonster commented on code in PR #27744:
URL: https://github.com/apache/flink/pull/27744#discussion_r2902547588
##
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java:
##
@@ -96,4 +104,68 @@ public void close() {
}
super.close();
}
+
+/**
+ * (FLINK-21309) Put "flink_prometheus_push_gateway_reporter_id" as the
last entry of the
+ * grouping keys, so that each taskmanger instance and jobmanager will not
collide their metrics
+ * in the PushGateway.
+ */
+static class GroupingKeyMap extends AbstractMap {
+private final Map customGroupingKeys;
+private final String reporterId = new AbstractID().toString();
+private final Set> entrySet = new
GroupingKeySet();
+
+GroupingKeyMap(Map customGroupingKeys) {
+if (customGroupingKeys.containsKey(REPORTER_ID_GROUPPING_KEY)) {
+throw new IllegalArgumentException(
Review Comment:
I think we'll want a separate test case here to assert against the
user-supplied "flink_prometheus_push_gateway_reporter_id" exception being
thrown.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [FLINK-21309][metrics] Add UUID as grouping key for prom pushgateway [flink]
rionmonster commented on code in PR #27744:
URL: https://github.com/apache/flink/pull/27744#discussion_r2902536451
##
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporterOptions.java:
##
@@ -45,13 +45,6 @@ public class PrometheusPushGatewayReporterOptions {
.defaultValue("")
.withDescription("The job name under which metrics will be
pushed");
-public static final ConfigOption RANDOM_JOB_NAME_SUFFIX =
Review Comment:
Are we sure we want to remove this configuration entirely? I’d imagine there
are existing jobs and/or users already relying on it. This would also require
updating [the associated
documentation](https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/metric_reporters/#prometheuspushgateway)
and, at a minimum, feels like something that should be deprecated before being
removed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [FLINK-21309][metrics] Add UUID as grouping key for prom pushgateway [flink]
rionmonster commented on code in PR #27744:
URL: https://github.com/apache/flink/pull/27744#discussion_r2902536451
##
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporterOptions.java:
##
@@ -45,13 +45,6 @@ public class PrometheusPushGatewayReporterOptions {
.defaultValue("")
.withDescription("The job name under which metrics will be
pushed");
-public static final ConfigOption RANDOM_JOB_NAME_SUFFIX =
Review Comment:
Are we sure we want to remove this configuration entirely? I’d imagine there
are existing jobs and/or users already relying on it. This would also require
updating the associated documentation and, at a minimum, feels like something
that should be deprecated before being removed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [FLINK-21309][metrics] Add UUID as grouping key for prom pushgateway [flink]
rionmonster commented on code in PR #27744:
URL: https://github.com/apache/flink/pull/27744#discussion_r2902525521
##
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java:
##
@@ -32,13 +33,20 @@
import java.io.IOException;
import java.net.URL;
+import java.util.AbstractMap;
+import java.util.AbstractSet;
+import java.util.Iterator;
import java.util.Map;
+import java.util.NoSuchElementException;
+import java.util.Set;
/**
* {@link MetricReporter} that exports {@link Metric Metrics} via Prometheus
{@link PushGateway}.
*/
@PublicEvolving
public class PrometheusPushGatewayReporter extends AbstractPrometheusReporter
implements Scheduled {
+public static final String REPORTER_ID_GROUPPING_KEY =
Review Comment:
Minor spelling nit: I think we want this to be called
`REPORTER_ID_GROUPING_KEY.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [FLINK-21309][metrics] Add UUID as grouping key for prom pushgateway [flink]
qinghui-xu commented on PR #27744: URL: https://github.com/apache/flink/pull/27744#issuecomment-4017449394 @flinkbot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [FLINK-21309][metrics] Add UUID as grouping key for prom pushgateway [flink]
flinkbot commented on PR #27744: URL: https://github.com/apache/flink/pull/27744#issuecomment-4012508106 ## CI report: * 3c79d1bc3d1af1554ff95f96d375ec38c7e8bc85 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
[PR] [FLINK-21309][metrics] Add UUID as grouping key for prom pushgateway [flink]
qinghui-xu opened a new pull request, #27744: URL: https://github.com/apache/flink/pull/27744 ## What is the purpose of the change Previously we are relying on random uuid suffix of `job` label to avoid metric collisions among taskmangers. This makes it difficult to aggregate over `job` label. Now we use the uuid as grouping key (`flink_prometheus_push_gateway_reporter_id`) instead, while respecting prometheus guide line on `job` label usage. ## Brief change log - Remove the `randomJobNameSuffix` flag from the config options - `job` label will not be suffixed - Systematically add flink_prometheus_push_gateway_reporter_id = UUID as last grouping key in all case. ## Verifying this change Please make sure both new and modified tests in this PR follow [the conventions for tests defined in our code quality guide](https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#7-testing). This change added tests and can be verified as follows: - Added unit test to make sure that flink_prometheus_push_gateway_reporter_id = UUID is appended to user provided grouping keys. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
