[GitHub] Tom-Goong commented on a change in pull request #7820: [FLINK-11742][Metrics]Push metrics to Pushgateway without "instance"

2019-03-02 Thread GitBox
Tom-Goong commented on a change in pull request #7820: 
[FLINK-11742][Metrics]Push metrics to Pushgateway without "instance"
URL: https://github.com/apache/flink/pull/7820#discussion_r261851802
 
 

 ##
 File path: 
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java
 ##
 @@ -73,7 +77,7 @@ public void open(MetricConfig config) {
@Override
public void report() {
try {
-   pushGateway.push(CollectorRegistry.defaultRegistry, 
jobName);
+   pushGateway.push(CollectorRegistry.defaultRegistry, 
jobName, instance);
 
 Review comment:
   Compared with **instance = ""**, **instance = jobname** just enriches the 
meaning of the indicator, but does not solve the data loss problem.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Tom-Goong commented on a change in pull request #7820: [FLINK-11742][Metrics]Push metrics to Pushgateway without "instance"

2019-03-02 Thread GitBox
Tom-Goong commented on a change in pull request #7820: 
[FLINK-11742][Metrics]Push metrics to Pushgateway without "instance"
URL: https://github.com/apache/flink/pull/7820#discussion_r261851648
 
 

 ##
 File path: 
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java
 ##
 @@ -73,7 +77,7 @@ public void open(MetricConfig config) {
@Override
public void report() {
try {
-   pushGateway.push(CollectorRegistry.defaultRegistry, 
jobName);
+   pushGateway.push(CollectorRegistry.defaultRegistry, 
jobName, instance);
 
 Review comment:
   In fact, using a random string is also a compromise. However, if you only 
use **jobName** as the instance, putgateway will merge into one according to 
the **metric name and instance**, and the data will be lost. Just like the 
picture I wrote in JIRA  (https://issues.apache.org/jira/browse/FLINK-11742 ). 
Now I adjust the instance to a splicing of jobName and random strings, which 
can reduce confusion.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Tom-Goong commented on a change in pull request #7820: [FLINK-11742][Metrics]Push metrics to Pushgateway without "instance"

2019-02-28 Thread GitBox
Tom-Goong commented on a change in pull request #7820: 
[FLINK-11742][Metrics]Push metrics to Pushgateway without "instance"
URL: https://github.com/apache/flink/pull/7820#discussion_r261476961
 
 

 ##
 File path: 
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java
 ##
 @@ -73,7 +77,7 @@ public void open(MetricConfig config) {
@Override
public void report() {
try {
-   pushGateway.push(CollectorRegistry.defaultRegistry, 
jobName);
+   pushGateway.push(CollectorRegistry.defaultRegistry, 
jobName, instance);
 
 Review comment:
   First of all, thank you very much for the information provided, which will 
help me better understand Flink related knowledge.
   
   This question depends on how we view the Flink cluster.
   
   First, consider the cluster as a black box. JM is the same as "dispatch" in 
SpringMVC. That whole cluster is a Promethues Job.
   
   Second, the JM class is compared to a more feature-rich nginx. Then 
different jobs correspond to different Promethues Jobs.
   
   Third, treat each Flink Job as a microservice cluster. The same function of 
Task and even Sub-Task is a specific micro-service function, a series of 
associated micro-services complete the entire business. Because the above two 
methods need to solve a problem, different TM runs this different Flink Job's 
task, the same task runs on different TMs. This means that the TM's logo cannot 
accurately separate the tasks of different Flink Jobs, metrics of the same name 
still conflict . JM and TM are just the running containers for jobs. So in 
theory, we need to be precise to the Task and even the Sub-Task level, then use 
the Job prefix to group them.
   
   To solve this problem perfectly, we need a comprehensive understanding of 
the entire Metric system. Maybe we should close this PR and decide how to deal 
with it after a thorough discussion in the community.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Tom-Goong commented on a change in pull request #7820: [FLINK-11742][Metrics]Push metrics to Pushgateway without "instance"

2019-02-28 Thread GitBox
Tom-Goong commented on a change in pull request #7820: 
[FLINK-11742][Metrics]Push metrics to Pushgateway without "instance"
URL: https://github.com/apache/flink/pull/7820#discussion_r261221674
 
 

 ##
 File path: 
flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java
 ##
 @@ -73,7 +77,7 @@ public void open(MetricConfig config) {
@Override
public void report() {
try {
-   pushGateway.push(CollectorRegistry.defaultRegistry, 
jobName);
+   pushGateway.push(CollectorRegistry.defaultRegistry, 
jobName, instance);
 
 Review comment:
   > In Prometheus terms, an endpoint you can scrape is called an instance, 
usually corresponding to a single process. A collection of instances with the 
same purpose, a process replicated for scalability or reliability for example, 
is called a job.
   
   > For example, an API server job with four replicated instances:
   job: api-server
   --  instance 1: 1.2.3.4:5670
   --  instance 2: 1.2.3.4:5671
   --  instance 3: 5.6.7.8:5670
   --  instance 4: 5.6.7.8:5671
   
   https://prometheus.io/docs/concepts/jobs_instances/#jobs-and-instances
   
   I think a Flink job corresponds to a Prometheus job, and taskmanager and 
jobmanager correspond to different instances. If the jobName is used as the 
instance label, the same metrics of different tasksmanages will conflict, and 
operations such as sum will fail.
   
   I failed to use ip:port to form an instance label,  because I am not 
familiar enough with the Flink source. So  use this the easiest way


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services