[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-09-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494002#comment-15494002
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2363


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache, 1.2.0
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15473381#comment-15473381
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2363
  
I'll address the checkNotNull/comment formatting while merging, which I'm 
doing now. Thank you for looking over it again @tillrohrmann .


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-09-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465326#comment-15465326
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r77540987
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricStore.java
 ---
@@ -35,109 +46,111 @@
final Map taskManagers = new 
HashMap<>();
final Map jobs = new HashMap<>();
 
-   /**
-* Adds a metric to this MetricStore.
-*
-* @param name  the metric identifier
-* @param value the metric value
-*/
-   public void add(String name, Object value) {
-   TaskManagerMetricStore tm;
-   JobMetricStore job;
-   TaskMetricStore task;
-
+   public void add(MetricDump metric) {
try {
-   String[] components = name.split(":");
-   switch (components[0]) {
-   /**
-* JobManagerMetricStore metric
-* format: 0:.
-*/
-   case "0":
-   jobManager.metrics.put(components[1], 
value);
-   break;
-   /**
-* TaskManager metric
-* format: 1::.
-*/
-   case "1":
-   if (components.length != 3) {
-   break;
-   }
-   tm = taskManagers.get(components[1]);
+   QueryScopeInfo info = metric.scopeInfo;
+   TaskManagerMetricStore tm;
+   JobMetricStore job;
+   TaskMetricStore task;
+
+   String name = info.scope.isEmpty()
+   ? metric.name
+   : info.scope + "." + metric.name;
+   
+   if (name.isEmpty()) { // malformed transmission
+   return;
+   }
+
+   switch (info.getCategory()) {
+   case INFO_CATEGORY_JM:
--- End diff --

On the other hand, it does not seem too overly complicated to be not 
maintainable. With that in mind, my other comments are mainly obsolete.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-09-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465303#comment-15465303
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r77540202
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricStore.java
 ---
@@ -35,109 +46,111 @@
final Map taskManagers = new 
HashMap<>();
final Map jobs = new HashMap<>();
 
-   /**
-* Adds a metric to this MetricStore.
-*
-* @param name  the metric identifier
-* @param value the metric value
-*/
-   public void add(String name, Object value) {
-   TaskManagerMetricStore tm;
-   JobMetricStore job;
-   TaskMetricStore task;
-
+   public void add(MetricDump metric) {
try {
-   String[] components = name.split(":");
-   switch (components[0]) {
-   /**
-* JobManagerMetricStore metric
-* format: 0:.
-*/
-   case "0":
-   jobManager.metrics.put(components[1], 
value);
-   break;
-   /**
-* TaskManager metric
-* format: 1::.
-*/
-   case "1":
-   if (components.length != 3) {
-   break;
-   }
-   tm = taskManagers.get(components[1]);
+   QueryScopeInfo info = metric.scopeInfo;
+   TaskManagerMetricStore tm;
+   JobMetricStore job;
+   TaskMetricStore task;
+
+   String name = info.scope.isEmpty()
+   ? metric.name
+   : info.scope + "." + metric.name;
+   
+   if (name.isEmpty()) { // malformed transmission
+   return;
+   }
+
+   switch (info.getCategory()) {
+   case INFO_CATEGORY_JM:
--- End diff --

That is true. Performance-wise it is the more efficient way to execute it, 
no doubt. I was just wondering whether this is not a case of premature 
optimization with the price of harder maintainability.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-09-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465265#comment-15465265
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r77538038
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricStore.java
 ---
@@ -35,109 +46,111 @@
final Map taskManagers = new 
HashMap<>();
final Map jobs = new HashMap<>();
 
-   /**
-* Adds a metric to this MetricStore.
-*
-* @param name  the metric identifier
-* @param value the metric value
-*/
-   public void add(String name, Object value) {
-   TaskManagerMetricStore tm;
-   JobMetricStore job;
-   TaskMetricStore task;
-
+   public void add(MetricDump metric) {
try {
-   String[] components = name.split(":");
-   switch (components[0]) {
-   /**
-* JobManagerMetricStore metric
-* format: 0:.
-*/
-   case "0":
-   jobManager.metrics.put(components[1], 
value);
-   break;
-   /**
-* TaskManager metric
-* format: 1::.
-*/
-   case "1":
-   if (components.length != 3) {
-   break;
-   }
-   tm = taskManagers.get(components[1]);
+   QueryScopeInfo info = metric.scopeInfo;
+   TaskManagerMetricStore tm;
+   JobMetricStore job;
+   TaskMetricStore task;
+
+   String name = info.scope.isEmpty()
+   ? metric.name
+   : info.scope + "." + metric.name;
+   
+   if (name.isEmpty()) { // malformed transmission
+   return;
+   }
+
+   switch (info.getCategory()) {
+   case INFO_CATEGORY_JM:
--- End diff --

eh, seemed like the proper way of handling it. Also, (up to) 4 comparisons 
vs a jump.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-09-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465132#comment-15465132
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2363
  
I think the changes look good. Thanks for your work @zentol :-) I only had 
a minor question whether we can substitute the explicit category information by 
the type information of the metric dumps and the `QueryScopeInfo` instances 
(not for serialization but in the `MetricStore`).


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-09-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465125#comment-15465125
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r77529056
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/dump/MetricDump.java
 ---
@@ -25,6 +25,7 @@
public static final byte METRIC_CATEGORY_COUNTER = 0;
public static final byte METRIC_CATEGORY_GAUGE = 1;
public static final byte METRIC_CATEGORY_HISTOGRAM = 2;
+   public static final byte METRIC_CATEGORY_METER = 3;
--- End diff --

Sorry just saw your latest commit.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-09-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465122#comment-15465122
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r77528928
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/dump/MetricDump.java
 ---
@@ -25,6 +25,7 @@
public static final byte METRIC_CATEGORY_COUNTER = 0;
public static final byte METRIC_CATEGORY_GAUGE = 1;
public static final byte METRIC_CATEGORY_HISTOGRAM = 2;
+   public static final byte METRIC_CATEGORY_METER = 3;
--- End diff --

No test added for meter metric type.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-09-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465111#comment-15465111
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r77528158
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/dump/MetricDumpSerialization.java
 ---
@@ -0,0 +1,269 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.metrics.dump;
+
+import org.apache.commons.io.output.ByteArrayOutputStream;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.Gauge;
+import org.apache.flink.metrics.Histogram;
+import org.apache.flink.metrics.HistogramStatistics;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+import static 
org.apache.flink.runtime.metrics.dump.QueryScopeInfo.INFO_CATEGORY_JM;
+import static 
org.apache.flink.runtime.metrics.dump.QueryScopeInfo.INFO_CATEGORY_JOB;
+import static 
org.apache.flink.runtime.metrics.dump.QueryScopeInfo.INFO_CATEGORY_OPERATOR;
+import static 
org.apache.flink.runtime.metrics.dump.QueryScopeInfo.INFO_CATEGORY_TASK;
+import static 
org.apache.flink.runtime.metrics.dump.QueryScopeInfo.INFO_CATEGORY_TM;
+
+/**
+ * Utility class for the serialization of metrics.
+ */
+public class MetricDumpSerialization {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricDumpSerialization.class);
+
+   private MetricDumpSerialization() {
+   }
+
+   // = Serialization 
=
+   public static class MetricDumpSerializer {
+   private ByteArrayOutputStream baos = new 
ByteArrayOutputStream(4096);
+   private DataOutputStream dos = new DataOutputStream(baos);
+
+   /**
+* Serializes the given metrics and returns the resulting byte 
array.
+*
+* @param counters   counters to serialize
+* @param gauges gauges to serialize
+* @param histograms histograms to serialize
+* @return byte array containing the serialized metrics
+* @throws IOException
+*/
+   public byte[] serialize(Map> counters, Map> gauges, 
Map> histograms) throws IOException {
+   baos.reset();
+   dos.writeInt(counters.size());
+   dos.writeInt(gauges.size());
+   dos.writeInt(histograms.size());
+
+   for (Map.Entry> 
entry : counters.entrySet()) {
+   serializeMetricInfo(dos, entry.getValue().f0);
+   serializeString(dos, entry.getValue().f1);
+   serializeCounter(dos, entry.getKey());
+   }
+
+   for (Map.Entry> entry : gauges.entrySet()) {
+   serializeMetricInfo(dos, entry.getValue().f0);
+   serializeString(dos, entry.getValue().f1);
+   serializeGauge(dos, entry.getKey());
+   }
+
+   for (Map.Entry> entry : histograms.entrySet()) {
+   serializeMetricInfo(dos, entry.getValue().f0);
+   

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-09-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465112#comment-15465112
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r77528207
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/dump/MetricDumpSerialization.java
 ---
@@ -0,0 +1,269 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.metrics.dump;
+
+import org.apache.commons.io.output.ByteArrayOutputStream;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.Gauge;
+import org.apache.flink.metrics.Histogram;
+import org.apache.flink.metrics.HistogramStatistics;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+import static 
org.apache.flink.runtime.metrics.dump.QueryScopeInfo.INFO_CATEGORY_JM;
+import static 
org.apache.flink.runtime.metrics.dump.QueryScopeInfo.INFO_CATEGORY_JOB;
+import static 
org.apache.flink.runtime.metrics.dump.QueryScopeInfo.INFO_CATEGORY_OPERATOR;
+import static 
org.apache.flink.runtime.metrics.dump.QueryScopeInfo.INFO_CATEGORY_TASK;
+import static 
org.apache.flink.runtime.metrics.dump.QueryScopeInfo.INFO_CATEGORY_TM;
+
+/**
+ * Utility class for the serialization of metrics.
+ */
+public class MetricDumpSerialization {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricDumpSerialization.class);
+
+   private MetricDumpSerialization() {
+   }
+
+   // = Serialization 
=
+   public static class MetricDumpSerializer {
+   private ByteArrayOutputStream baos = new 
ByteArrayOutputStream(4096);
+   private DataOutputStream dos = new DataOutputStream(baos);
+
+   /**
+* Serializes the given metrics and returns the resulting byte 
array.
+*
+* @param counters   counters to serialize
+* @param gauges gauges to serialize
+* @param histograms histograms to serialize
+* @return byte array containing the serialized metrics
+* @throws IOException
+*/
+   public byte[] serialize(Map> counters, Map> gauges, 
Map> histograms) throws IOException {
+   baos.reset();
+   dos.writeInt(counters.size());
+   dos.writeInt(gauges.size());
+   dos.writeInt(histograms.size());
+
+   for (Map.Entry> 
entry : counters.entrySet()) {
+   serializeMetricInfo(dos, entry.getValue().f0);
+   serializeString(dos, entry.getValue().f1);
+   serializeCounter(dos, entry.getKey());
+   }
+
+   for (Map.Entry> entry : gauges.entrySet()) {
+   serializeMetricInfo(dos, entry.getValue().f0);
+   serializeString(dos, entry.getValue().f1);
+   serializeGauge(dos, entry.getKey());
+   }
+
+   for (Map.Entry> entry : histograms.entrySet()) {
+   serializeMetricInfo(dos, entry.getValue().f0);
+   

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-09-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465108#comment-15465108
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r77527854
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/dump/MetricDump.java
 ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.metrics.dump;
+
+/**
+ * A container for a dumped metric that contains the scope, name and 
value(s) of the metric.
+ */
+public abstract class MetricDump {
+   /** Categories to be returned by {@link MetricDump#getCategory()} to 
avoid instanceof checks. */
+   public static final byte METRIC_CATEGORY_COUNTER = 0;
+   public static final byte METRIC_CATEGORY_GAUGE = 1;
+   public static final byte METRIC_CATEGORY_HISTOGRAM = 2;
+
+   /** The scope information for the stored metric. */
+   public final QueryScopeInfo scopeInfo;
+   /** The name of the stored metric. */
+   public final String name;
+
+   private MetricDump(QueryScopeInfo scopeInfo, String name) {
+   this.scopeInfo = scopeInfo;
+   this.name = name;
--- End diff --

`checkNotNull` missing


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-09-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465106#comment-15465106
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r77527822
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/dump/MetricDump.java
 ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.metrics.dump;
+
+/**
+ * A container for a dumped metric that contains the scope, name and 
value(s) of the metric.
+ */
+public abstract class MetricDump {
+   /** Categories to be returned by {@link MetricDump#getCategory()} to 
avoid instanceof checks. */
+   public static final byte METRIC_CATEGORY_COUNTER = 0;
+   public static final byte METRIC_CATEGORY_GAUGE = 1;
+   public static final byte METRIC_CATEGORY_HISTOGRAM = 2;
+
+   /** The scope information for the stored metric. */
+   public final QueryScopeInfo scopeInfo;
+   /** The name of the stored metric. */
+   public final String name;
+
+   private MetricDump(QueryScopeInfo scopeInfo, String name) {
+   this.scopeInfo = scopeInfo;
+   this.name = name;
+   }
+
+   /**
+* Returns the category for this MetricDump.
+*
+* @return category
+*/
+   public abstract byte getCategory();
--- End diff --

I think we don't need the explicit category information, because it is 
already encoded in the sub-types of `MetricDump`.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-09-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465103#comment-15465103
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r77527637
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricStore.java
 ---
@@ -35,109 +46,111 @@
final Map taskManagers = new 
HashMap<>();
final Map jobs = new HashMap<>();
 
-   /**
-* Adds a metric to this MetricStore.
-*
-* @param name  the metric identifier
-* @param value the metric value
-*/
-   public void add(String name, Object value) {
-   TaskManagerMetricStore tm;
-   JobMetricStore job;
-   TaskMetricStore task;
-
+   public void add(MetricDump metric) {
try {
-   String[] components = name.split(":");
-   switch (components[0]) {
-   /**
-* JobManagerMetricStore metric
-* format: 0:.
-*/
-   case "0":
-   jobManager.metrics.put(components[1], 
value);
-   break;
-   /**
-* TaskManager metric
-* format: 1::.
-*/
-   case "1":
-   if (components.length != 3) {
-   break;
-   }
-   tm = taskManagers.get(components[1]);
+   QueryScopeInfo info = metric.scopeInfo;
+   TaskManagerMetricStore tm;
+   JobMetricStore job;
+   TaskMetricStore task;
+
+   String name = info.scope.isEmpty()
+   ? metric.name
+   : info.scope + "." + metric.name;
+   
+   if (name.isEmpty()) { // malformed transmission
+   return;
+   }
+
+   switch (info.getCategory()) {
+   case INFO_CATEGORY_JM:
+   addMetric(jobManager.metrics, name, 
metric);
+   case INFO_CATEGORY_TM:
+   String tmID = 
((QueryScopeInfo.TaskManagerQueryScopeInfo) info).taskManagerID;
+   tm = taskManagers.get(tmID);
if (tm == null) {
tm = new 
TaskManagerMetricStore();
-   taskManagers.put(components[1], 
tm);
+   taskManagers.put(tmID, tm);
}
-   tm.metrics.put(components[2], value);
+   addMetric(tm.metrics, name, metric);
break;
-   /**
-* Job metric
-* format: 2::.
-*/
-   case "2":
-   if (components.length != 3) {
-   break;
-   }
-   job = jobs.get(components[1]);
+   case INFO_CATEGORY_JOB:
+   QueryScopeInfo.JobQueryScopeInfo 
jobInfo = (QueryScopeInfo.JobQueryScopeInfo) info;
+   job = jobs.get(jobInfo.jobID);
if (job == null) {
job = new JobMetricStore();
-   jobs.put(components[1], job);
+   jobs.put(jobInfo.jobID, job);
}
-   job.metrics.put(components[2], value);
+   addMetric(job.metrics, name, metric);
break;
-   /**
-* Task metric
-* format: 
3.
-*
-* As the 

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-09-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465100#comment-15465100
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r77527444
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricStore.java
 ---
@@ -35,109 +46,111 @@
final Map taskManagers = new 
HashMap<>();
final Map jobs = new HashMap<>();
 
-   /**
-* Adds a metric to this MetricStore.
-*
-* @param name  the metric identifier
-* @param value the metric value
-*/
-   public void add(String name, Object value) {
-   TaskManagerMetricStore tm;
-   JobMetricStore job;
-   TaskMetricStore task;
-
+   public void add(MetricDump metric) {
try {
-   String[] components = name.split(":");
-   switch (components[0]) {
-   /**
-* JobManagerMetricStore metric
-* format: 0:.
-*/
-   case "0":
-   jobManager.metrics.put(components[1], 
value);
-   break;
-   /**
-* TaskManager metric
-* format: 1::.
-*/
-   case "1":
-   if (components.length != 3) {
-   break;
-   }
-   tm = taskManagers.get(components[1]);
+   QueryScopeInfo info = metric.scopeInfo;
+   TaskManagerMetricStore tm;
+   JobMetricStore job;
+   TaskMetricStore task;
+
+   String name = info.scope.isEmpty()
+   ? metric.name
+   : info.scope + "." + metric.name;
+   
+   if (name.isEmpty()) { // malformed transmission
+   return;
+   }
+
+   switch (info.getCategory()) {
+   case INFO_CATEGORY_JM:
--- End diff --

What's the benefit of having an explicit type field over using 
`instanceof`? I think encoding the type via the actual type has the advantage 
that you don't mix up classes with wrong category types.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436834#comment-15436834
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2363
  
I've updated and rebased the PR.

The scope information is now stored in a `QueryScopeInfo` inside the 
`MetricGroups`, for which sub-classes like `TaskQueryScopeInfo` exist. They 
contain fields for specific values, like the job ID, and the remaining scope 
not covered by these fields.

Metrics, or rather their scope, name and value(s), are now serialized with 
a new `MetricDumpSerializer` to a byte[] using a `Data-/ByteArrayOutputStream`. 

On the other end we have the `MetricDumpDeserializer` which deserializes 
the metrics to a `List`. A `MetricDump`is a container for the 
metric value, it's name the `QueryScopeInfo`. There are sub-classes for each 
metric type, like `CounterDump`.

Neither the `MetricQueryService` nor `MetricFetcher` know anything about 
the serialized format, just that it's a byte array.

There is no encoding for field orderings but tests that verify that the 
fields are assigned correctly. If a developer were to change the order of 
fields a test would fail, and the only way for this to make it into master 
would be if a) the test is simply changed to give a green light and b) it isn't 
noticed in the review, at which point all bets are of anyway. So i decided to 
keep it a bit simpler.

The `MetricStore#addMetric()` method has now become a bit smarter in 
regards to handling ´Histograms`. With all values being contained in a 
`HistogramDump` we now only have to analyze the scope once.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433097#comment-15433097
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2363
  
Great to hear @zentol  


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430747#comment-15430747
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2363
  
Regarding hierarchy: I'm close to being done with a container for the scope 
information.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430586#comment-15430586
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2363
  
I think this should be addressed (either way) before merging this PR.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430504#comment-15430504
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2363
  
well...currently that is _still_ done. Whether it _will_ be done once this 
is merged is up in the air.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430493#comment-15430493
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2363
  
Sorry, I meant that the hierarchy information is still encoded in a string 
and then re-parsed. Furthermore, the histogram data is sent as an object array 
without any information about the field orderings.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430475#comment-15430475
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2363
  
only Gauge values are sent as strings.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430428#comment-15430428
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2363
  
But we still send metric data as strings encoded over the wire and have no 
checks that the histogram field order is actually correct, right?


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430395#comment-15430395
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75644101
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerJobMetricGroup.java
 ---
@@ -61,9 +58,7 @@ public TaskManagerJobMetricGroup(
JobID jobId,
@Nullable String jobName) {
 
-   super(registry, jobId, jobName, scopeFormat.formatScope(parent, 
jobId, jobName));
-
-   this.parent = checkNotNull(parent);
+   super(registry, checkNotNull(parent), jobId, jobName, 
scopeFormat.formatScope(parent, jobId, jobName));
--- End diff --

fair enough.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428165#comment-15428165
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75477986
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerJobMetricGroup.java
 ---
@@ -61,9 +58,7 @@ public TaskManagerJobMetricGroup(
JobID jobId,
@Nullable String jobName) {
 
-   super(registry, jobId, jobName, scopeFormat.formatScope(parent, 
jobId, jobName));
-
-   this.parent = checkNotNull(parent);
+   super(registry, checkNotNull(parent), jobId, jobName, 
scopeFormat.formatScope(parent, jobId, jobName));
--- End diff --

But then don't do it here but in the `formatScope` method. Usually the 
method should enforce that it's preconditions are kept.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428157#comment-15428157
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75477146
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerJobMetricGroup.java
 ---
@@ -61,9 +58,7 @@ public TaskManagerJobMetricGroup(
JobID jobId,
@Nullable String jobName) {
 
-   super(registry, jobId, jobName, scopeFormat.formatScope(parent, 
jobId, jobName));
-
-   this.parent = checkNotNull(parent);
+   super(registry, checkNotNull(parent), jobId, jobName, 
scopeFormat.formatScope(parent, jobId, jobName));
--- End diff --

I'm only avoiding redundant calls, that's all.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428139#comment-15428139
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75474901
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerJobMetricGroup.java
 ---
@@ -61,9 +58,7 @@ public TaskManagerJobMetricGroup(
JobID jobId,
@Nullable String jobName) {
 
-   super(registry, jobId, jobName, scopeFormat.formatScope(parent, 
jobId, jobName));
-
-   this.parent = checkNotNull(parent);
+   super(registry, checkNotNull(parent), jobId, jobName, 
scopeFormat.formatScope(parent, jobId, jobName));
--- End diff --

Why do you do them at all if you want to save them in the first place?


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428138#comment-15428138
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75474843
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricStore.java
 ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Nested data-structure to store metrics.
+ *
+ * This structure is not thread-safe.
+ */
+public class MetricStore {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricStore.class);
+
+   JobManagerMetricStore jobManager = new JobManagerMetricStore();
+   Map taskManagers = new HashMap<>();
+   Map jobs = new HashMap<>();
+
+   /**
+* Adds a metric to this MetricStore.
+*
+* @param name  the metric identifier
+* @param value the metric value
+*/
+   public void add(String name, Object value) {
--- End diff --

I think the object array representation (with the contained objects) is not 
as concise as the byte buffer representation (using your own serialization) 
because the standard Java serialization will write for every object the class 
name to identify it.

But again this falls imo into the category of premature optimization.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428131#comment-15428131
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75473604
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricStore.java
 ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Nested data-structure to store metrics.
+ *
+ * This structure is not thread-safe.
+ */
+public class MetricStore {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricStore.class);
+
+   JobManagerMetricStore jobManager = new JobManagerMetricStore();
+   Map taskManagers = new HashMap<>();
+   Map jobs = new HashMap<>();
+
+   /**
+* Adds a metric to this MetricStore.
+*
+* @param name  the metric identifier
+* @param value the metric value
+*/
+   public void add(String name, Object value) {
--- End diff --

Wouldn't this just replace the object array overhead with the byte buffer 
allocation overhead?


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428127#comment-15428127
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2363
  
@tillrohrmann I've addressed most of your comments. Excluded are calling 
`checkNotNull` inside `formatScope` and the serialization format.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428112#comment-15428112
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75472729
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerJobMetricGroup.java
 ---
@@ -61,9 +58,7 @@ public TaskManagerJobMetricGroup(
JobID jobId,
@Nullable String jobName) {
 
-   super(registry, jobId, jobName, scopeFormat.formatScope(parent, 
jobId, jobName));
-
-   this.parent = checkNotNull(parent);
+   super(registry, checkNotNull(parent), jobId, jobName, 
scopeFormat.formatScope(parent, jobId, jobName));
--- End diff --

Generally i would agree, but this way save a checkNotNull call...


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427882#comment-15427882
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75451233
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/ComponentMetricGroup.java
 ---
@@ -33,19 +36,42 @@
  * group could for example include the task attempt number (more fine 
grained identification), or
  * exclude it (for continuity of the namespace across failure and 
recovery).
  */
-public abstract class ComponentMetricGroup extends AbstractMetricGroup {
+public abstract class ComponentMetricGroup 
extends AbstractMetricGroup {
--- End diff --

yes they should, the must've been removed by accident when we moved them to 
`flink-runtime`.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427881#comment-15427881
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75451065
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricQueryService.java
 ---
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.metrics;
+
+import akka.actor.ActorRef;
+import akka.actor.ActorSystem;
+import akka.actor.Props;
+import akka.actor.UntypedActor;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.metrics.CharacterFilter;
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.Gauge;
+import org.apache.flink.metrics.Histogram;
+import org.apache.flink.metrics.HistogramStatistics;
+import org.apache.flink.metrics.Metric;
+import org.apache.flink.runtime.metrics.groups.AbstractMetricGroup;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * The MetricQueryService creates a key-value representation of all 
metrics currently registered with Flink when queried.
+ *
+ * It is realized as an actor and receives the following messages:
+ * - {@code Tuple3 => Notification of 
added Metric}
+ * - {@code Metric => Notification of removed Metric}
+ * - {@code ActorRef => Query for metric dump}
+ */
+public class MetricQueryService extends UntypedActor {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricQueryService.class);
+
+   public static final byte CATEGORY_COUNTER = 0;
+   public static final byte CATEGORY_GAUGE = 1;
+   public static final byte CATEGORY_HISTOGRAM = 2;
+
+   private final Map gauges = new HashMap<>();
+   private final Map counters = new HashMap<>();
+   private final Map histograms = new HashMap<>();
+
+   private final List toRemove = new ArrayList<>();
+
+   @Override
+   public void onReceive(Object message) throws Exception {
+   try {
+   if (message instanceof Tuple3) { // add metric
+   Tuple3 
tuple = (Tuple3) message;
+
+   String metricName = tuple.f0;
+   Metric metric = tuple.f1;
+   AbstractMetricGroup group = tuple.f2;
+
+   String name = 
group.getQueryServiceMetricIdentifier(metricName, new CharacterFilter() {
+   @Override
+   public String filterCharacters(String 
input) {
+   return input.replaceAll("[ 
:.,]", "_");
+   }
+   });
+
+   if (metric instanceof Counter) {
+   counters.put((Counter) metric, name);
+   } else if (metric instanceof Gauge) {
+   gauges.put((Gauge) metric, name);
+   } else if (metric instanceof Histogram) {
+   histograms.put((Histogram) metric, 
name);
+   }
+   } else if (message instanceof Metric) { // remove metric
+   toRemove.add((Metric) message);
+   } else if (message instanceof ActorRef) { // create dump
+   Object[] dump = createDump();
+   

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427880#comment-15427880
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75450989
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricQueryService.java
 ---
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.metrics;
+
+import akka.actor.ActorRef;
+import akka.actor.ActorSystem;
+import akka.actor.Props;
+import akka.actor.UntypedActor;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.metrics.CharacterFilter;
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.Gauge;
+import org.apache.flink.metrics.Histogram;
+import org.apache.flink.metrics.HistogramStatistics;
+import org.apache.flink.metrics.Metric;
+import org.apache.flink.runtime.metrics.groups.AbstractMetricGroup;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * The MetricQueryService creates a key-value representation of all 
metrics currently registered with Flink when queried.
+ *
+ * It is realized as an actor and receives the following messages:
+ * - {@code Tuple3 => Notification of 
added Metric}
+ * - {@code Metric => Notification of removed Metric}
+ * - {@code ActorRef => Query for metric dump}
+ */
+public class MetricQueryService extends UntypedActor {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricQueryService.class);
+
+   public static final byte CATEGORY_COUNTER = 0;
+   public static final byte CATEGORY_GAUGE = 1;
+   public static final byte CATEGORY_HISTOGRAM = 2;
+
+   private final Map gauges = new HashMap<>();
+   private final Map counters = new HashMap<>();
+   private final Map histograms = new HashMap<>();
+
+   private final List toRemove = new ArrayList<>();
+
+   @Override
+   public void onReceive(Object message) throws Exception {
+   try {
+   if (message instanceof Tuple3) { // add metric
+   Tuple3 
tuple = (Tuple3) message;
+
+   String metricName = tuple.f0;
+   Metric metric = tuple.f1;
+   AbstractMetricGroup group = tuple.f2;
+
+   String name = 
group.getQueryServiceMetricIdentifier(metricName, new CharacterFilter() {
+   @Override
+   public String filterCharacters(String 
input) {
+   return input.replaceAll("[ 
:.,]", "_");
+   }
+   });
+
+   if (metric instanceof Counter) {
+   counters.put((Counter) metric, name);
+   } else if (metric instanceof Gauge) {
+   gauges.put((Gauge) metric, name);
+   } else if (metric instanceof Histogram) {
+   histograms.put((Histogram) metric, 
name);
+   }
+   } else if (message instanceof Metric) { // remove metric
+   toRemove.add((Metric) message);
+   } else if (message instanceof ActorRef) { // create dump
+   Object[] dump = createDump();
+   

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427876#comment-15427876
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75450661
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricStore.java
 ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Nested data-structure to store metrics.
+ *
+ * This structure is not thread-safe.
+ */
+public class MetricStore {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricStore.class);
+
+   JobManagerMetricStore jobManager = new JobManagerMetricStore();
+   Map taskManagers = new HashMap<>();
+   Map jobs = new HashMap<>();
+
+   /**
+* Adds a metric to this MetricStore.
+*
+* @param name  the metric identifier
+* @param value the metric value
+*/
+   public void add(String name, Object value) {
--- End diff --

You're right, I've combined two things here. But B) also depends a little 
bit on A) (at least implementation wise)

Concerning A) I'm not quite sure whether this is not a classical case of 
premature optimization. Have we actually measured the impact of the other 
approach? It would be good to know what it costs us. I would assume that the 
overhead is negligible.

Furthermore, I'm think you can also achieve an efficient serialization 
without too many short-living objects if you don't send strings around. First 
of all, we could serialize all metrics into a continuous byte buffer. Then we 
wouldn't have the object array overhead. Next we could use our own serializers 
to store the different object types efficiently (e.g. byte encoding for 
different metric types). On the server side you would only create the metric 
name objects once and that's all. Only upon deserialization you would have to 
create a new set of these objects. But that is ok since it happens on the web 
server side. The important thing is, though, to have a serialization component 
which is responsible for serialization and deserialization. That will avoid 
spreading the encoding over multiple places.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427862#comment-15427862
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75448874
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerJobMetricGroup.java
 ---
@@ -61,9 +58,7 @@ public TaskManagerJobMetricGroup(
JobID jobId,
@Nullable String jobName) {
 
-   super(registry, jobId, jobName, scopeFormat.formatScope(parent, 
jobId, jobName));
-
-   this.parent = checkNotNull(parent);
+   super(registry, checkNotNull(parent), jobId, jobName, 
scopeFormat.formatScope(parent, jobId, jobName));
--- End diff --

Ah ok, I see. Looking at the `formatScope` method, shouldn't rather the 
method enforce the preconditions than the caller? 


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427855#comment-15427855
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75448477
  
--- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala
 ---
@@ -1024,6 +1024,17 @@ class JobManager(
 
 case RequestWebMonitorPort =>
   sender() ! ResponseWebMonitorPort(webMonitorPort)
+
+case MetricRequest =>
+  metricsRegistry match {
+case Some(registry) =>
+  registry.getQueryService match {
+case Some(queryService) =>
+  queryService ! sender()
+case None =>
+  }
+case None =>
--- End diff --

If we directly ask the query service we don't need this code anyway.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427851#comment-15427851
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75448294
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/ComponentMetricGroup.java
 ---
@@ -33,19 +36,42 @@
  * group could for example include the task attempt number (more fine 
grained identification), or
  * exclude it (for continuity of the namespace across failure and 
recovery).
  */
-public abstract class ComponentMetricGroup extends AbstractMetricGroup {
+public abstract class ComponentMetricGroup 
extends AbstractMetricGroup {
--- End diff --

The `JobMetricGroup` is annotated as `internal`. I was just wondering 
whether we should do the same for this metric group and the others (to which it 
applies) as well. 


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427849#comment-15427849
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75448166
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/AbstractMetricGroup.java
 ---
@@ -54,13 +53,19 @@
  * return Counters, Gauges, etc to the code, to prevent exceptions in the 
monitored code.
  * These metrics simply do not get reported any more, when created on a 
closed group.
  */
-public abstract class AbstractMetricGroup implements MetricGroup {
+public abstract class AbstractMetricGroup 
implements MetricGroup {
--- End diff --

Sounds good. Haven't exclude the #2300 commit from the review. My bad.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427847#comment-15427847
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75447958
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricFetcher.java
 ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import akka.dispatch.OnSuccess;
+import org.apache.flink.configuration.ConfigConstants;
+import org.apache.flink.runtime.instance.ActorGateway;
+import org.apache.flink.runtime.instance.Instance;
+import org.apache.flink.runtime.messages.JobManagerMessages;
+import org.apache.flink.runtime.messages.Messages;
+import org.apache.flink.runtime.messages.webmonitor.JobDetails;
+import org.apache.flink.runtime.messages.webmonitor.MultipleJobsDetails;
+import org.apache.flink.runtime.messages.webmonitor.RequestJobDetails;
+import org.apache.flink.runtime.webmonitor.JobManagerRetriever;
+import org.apache.flink.util.Preconditions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import scala.Option;
+import scala.concurrent.ExecutionContext;
+import scala.concurrent.duration.Duration;
+import scala.concurrent.duration.FiniteDuration;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * The MetricFetcher can be used to fetch metrics from the JobManager and 
all registered TaskManagers.
+ *
+ * Metrics will only be fetched when {@link MetricFetcher#update()} is 
called, provided that a sufficient time since
+ * the last call has passed.
+ */
+public class MetricFetcher {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricFetcher.class);
+
+   private final JobManagerRetriever retriever;
+   private final ExecutionContext ctx;
+   private final FiniteDuration timeout = new 
FiniteDuration(Duration.create(ConfigConstants.DEFAULT_AKKA_ASK_TIMEOUT).toMillis(),
 TimeUnit.MILLISECONDS);
+
+   private MetricStore metrics = new MetricStore();
+
+   private long lastUpdateTime;
+
+   public MetricFetcher(JobManagerRetriever retriever, ExecutionContext 
ctx) {
+   this.retriever = Preconditions.checkNotNull(retriever);
+   this.ctx = Preconditions.checkNotNull(ctx);
+   }
+
+   /**
+* Returns the MetricStore containing all stored metrics.
+*
+* @return MetricStore containing all stored metrics;
+*/
+   public MetricStore getMetricStore() {
+   return metrics;
+   }
+
+   /**
+* This method can be used to signal this MetricFetcher that the 
metrics are still in use and should be updated.
+*/
+   public void update() {
+   synchronized (this) {
+   long currentTime = System.currentTimeMillis();
+   if (currentTime - lastUpdateTime > 1) { // 10 
seconds have passed since the last update
+   lastUpdateTime = currentTime;
+   fetchMetrics();
+   }
+   }
+   }
+
+   private void fetchMetrics() {
+   try {
+   Option> 
jobManagerGatewayAndWebPort = retriever.getJobManagerGatewayAndWebPort();
+   if (jobManagerGatewayAndWebPort.isDefined()) {
+   ActorGateway jobManager = 
jobManagerGatewayAndWebPort.get()._1();
+
+   /**
+* Remove all metrics that belong to a job that 
is not running and no longer archived.
+*/
+   jobManager.ask(new 

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426941#comment-15426941
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75360196
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricStore.java
 ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Nested data-structure to store metrics.
+ *
+ * This structure is not thread-safe.
+ */
+public class MetricStore {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricStore.class);
+
+   JobManagerMetricStore jobManager = new JobManagerMetricStore();
+   Map taskManagers = new HashMap<>();
+   Map jobs = new HashMap<>();
+
+   /**
+* Adds a metric to this MetricStore.
+*
+* @param name  the metric identifier
+* @param value the metric value
+*/
+   public void add(String name, Object value) {
--- End diff --

I think we are mixing 2 separate issues here:
A) the format of the transfer
B) how the retrieved metrics are added to the store

Your comment regarding reparsing the string for Histograms targets B. This 
can be fixed easily by providing specific `addHistogram/...` methods to the 
store.

Now, let's talk about A. Essentially, the goal was to reduce the overhead 
for the sender. If you look at the dumping procedure in the 
`MetricQueryService` you will find that for the sender the current requires the 
least amount of work, as far as i can tell. We don't create more elaborate 
containers (which btw. i would personally prefer too!) as this is another 
short-lived object created for every metric.

Also, in regards to sender/receiver going out of sync: we can add a test to 
guarantee that sender do not go out of sync.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426892#comment-15426892
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75355525
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricFetcher.java
 ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import akka.dispatch.OnSuccess;
+import org.apache.flink.configuration.ConfigConstants;
+import org.apache.flink.runtime.instance.ActorGateway;
+import org.apache.flink.runtime.instance.Instance;
+import org.apache.flink.runtime.messages.JobManagerMessages;
+import org.apache.flink.runtime.messages.Messages;
+import org.apache.flink.runtime.messages.webmonitor.JobDetails;
+import org.apache.flink.runtime.messages.webmonitor.MultipleJobsDetails;
+import org.apache.flink.runtime.messages.webmonitor.RequestJobDetails;
+import org.apache.flink.runtime.webmonitor.JobManagerRetriever;
+import org.apache.flink.util.Preconditions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import scala.Option;
+import scala.concurrent.ExecutionContext;
+import scala.concurrent.duration.Duration;
+import scala.concurrent.duration.FiniteDuration;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * The MetricFetcher can be used to fetch metrics from the JobManager and 
all registered TaskManagers.
+ *
+ * Metrics will only be fetched when {@link MetricFetcher#update()} is 
called, provided that a sufficient time since
+ * the last call has passed.
+ */
+public class MetricFetcher {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricFetcher.class);
+
+   private final JobManagerRetriever retriever;
+   private final ExecutionContext ctx;
+   private final FiniteDuration timeout = new 
FiniteDuration(Duration.create(ConfigConstants.DEFAULT_AKKA_ASK_TIMEOUT).toMillis(),
 TimeUnit.MILLISECONDS);
+
+   private MetricStore metrics = new MetricStore();
+
+   private long lastUpdateTime;
+
+   public MetricFetcher(JobManagerRetriever retriever, ExecutionContext 
ctx) {
+   this.retriever = Preconditions.checkNotNull(retriever);
+   this.ctx = Preconditions.checkNotNull(ctx);
+   }
+
+   /**
+* Returns the MetricStore containing all stored metrics.
+*
+* @return MetricStore containing all stored metrics;
+*/
+   public MetricStore getMetricStore() {
+   return metrics;
+   }
+
+   /**
+* This method can be used to signal this MetricFetcher that the 
metrics are still in use and should be updated.
+*/
+   public void update() {
+   synchronized (this) {
+   long currentTime = System.currentTimeMillis();
+   if (currentTime - lastUpdateTime > 1) { // 10 
seconds have passed since the last update
+   lastUpdateTime = currentTime;
+   fetchMetrics();
+   }
+   }
+   }
+
+   private void fetchMetrics() {
+   try {
+   Option> 
jobManagerGatewayAndWebPort = retriever.getJobManagerGatewayAndWebPort();
+   if (jobManagerGatewayAndWebPort.isDefined()) {
+   ActorGateway jobManager = 
jobManagerGatewayAndWebPort.get()._1();
+
+   /**
+* Remove all metrics that belong to a job that 
is not running and no longer archived.
+*/
+   jobManager.ask(new RequestJobDetails(true, 

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426894#comment-15426894
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75355630
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricFetcher.java
 ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import akka.dispatch.OnSuccess;
+import org.apache.flink.configuration.ConfigConstants;
+import org.apache.flink.runtime.instance.ActorGateway;
+import org.apache.flink.runtime.instance.Instance;
+import org.apache.flink.runtime.messages.JobManagerMessages;
+import org.apache.flink.runtime.messages.Messages;
+import org.apache.flink.runtime.messages.webmonitor.JobDetails;
+import org.apache.flink.runtime.messages.webmonitor.MultipleJobsDetails;
+import org.apache.flink.runtime.messages.webmonitor.RequestJobDetails;
+import org.apache.flink.runtime.webmonitor.JobManagerRetriever;
+import org.apache.flink.util.Preconditions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import scala.Option;
+import scala.concurrent.ExecutionContext;
+import scala.concurrent.duration.Duration;
+import scala.concurrent.duration.FiniteDuration;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * The MetricFetcher can be used to fetch metrics from the JobManager and 
all registered TaskManagers.
+ *
+ * Metrics will only be fetched when {@link MetricFetcher#update()} is 
called, provided that a sufficient time since
+ * the last call has passed.
+ */
+public class MetricFetcher {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricFetcher.class);
+
+   private final JobManagerRetriever retriever;
+   private final ExecutionContext ctx;
+   private final FiniteDuration timeout = new 
FiniteDuration(Duration.create(ConfigConstants.DEFAULT_AKKA_ASK_TIMEOUT).toMillis(),
 TimeUnit.MILLISECONDS);
+
+   private MetricStore metrics = new MetricStore();
+
+   private long lastUpdateTime;
+
+   public MetricFetcher(JobManagerRetriever retriever, ExecutionContext 
ctx) {
+   this.retriever = Preconditions.checkNotNull(retriever);
+   this.ctx = Preconditions.checkNotNull(ctx);
+   }
+
+   /**
+* Returns the MetricStore containing all stored metrics.
+*
+* @return MetricStore containing all stored metrics;
+*/
+   public MetricStore getMetricStore() {
+   return metrics;
+   }
+
+   /**
+* This method can be used to signal this MetricFetcher that the 
metrics are still in use and should be updated.
+*/
+   public void update() {
+   synchronized (this) {
+   long currentTime = System.currentTimeMillis();
+   if (currentTime - lastUpdateTime > 1) { // 10 
seconds have passed since the last update
+   lastUpdateTime = currentTime;
+   fetchMetrics();
+   }
+   }
+   }
+
+   private void fetchMetrics() {
+   try {
+   Option> 
jobManagerGatewayAndWebPort = retriever.getJobManagerGatewayAndWebPort();
+   if (jobManagerGatewayAndWebPort.isDefined()) {
+   ActorGateway jobManager = 
jobManagerGatewayAndWebPort.get()._1();
+
+   /**
+* Remove all metrics that belong to a job that 
is not running and no longer archived.
+*/
+   jobManager.ask(new RequestJobDetails(true, 

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426912#comment-15426912
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75356544
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/ComponentMetricGroup.java
 ---
@@ -33,19 +36,42 @@
  * group could for example include the task attempt number (more fine 
grained identification), or
  * exclude it (for continuity of the namespace across failure and 
recovery).
  */
-public abstract class ComponentMetricGroup extends AbstractMetricGroup {
+public abstract class ComponentMetricGroup 
extends AbstractMetricGroup {
--- End diff --

I'm not sure what you mean, but `JobMetricGroup` inherits from 
`ComponentMetricGroup`.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426908#comment-15426908
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75356320
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/AbstractMetricGroup.java
 ---
@@ -54,13 +53,19 @@
  * return Counters, Gauges, etc to the code, to prevent exceptions in the 
monitored code.
  * These metrics simply do not get reported any more, when created on a 
closed group.
  */
-public abstract class AbstractMetricGroup implements MetricGroup {
+public abstract class AbstractMetricGroup 
implements MetricGroup {
--- End diff --

I will address this in #2300


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426906#comment-15426906
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75356190
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricQueryService.java
 ---
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.metrics;
+
+import akka.actor.ActorRef;
+import akka.actor.ActorSystem;
+import akka.actor.Props;
+import akka.actor.UntypedActor;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.metrics.CharacterFilter;
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.Gauge;
+import org.apache.flink.metrics.Histogram;
+import org.apache.flink.metrics.HistogramStatistics;
+import org.apache.flink.metrics.Metric;
+import org.apache.flink.runtime.metrics.groups.AbstractMetricGroup;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * The MetricQueryService creates a key-value representation of all 
metrics currently registered with Flink when queried.
+ *
+ * It is realized as an actor and receives the following messages:
+ * - {@code Tuple3 => Notification of 
added Metric}
+ * - {@code Metric => Notification of removed Metric}
+ * - {@code ActorRef => Query for metric dump}
+ */
+public class MetricQueryService extends UntypedActor {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricQueryService.class);
+
+   public static final byte CATEGORY_COUNTER = 0;
+   public static final byte CATEGORY_GAUGE = 1;
+   public static final byte CATEGORY_HISTOGRAM = 2;
+
+   private final Map gauges = new HashMap<>();
+   private final Map counters = new HashMap<>();
+   private final Map histograms = new HashMap<>();
+
+   private final List toRemove = new ArrayList<>();
+
+   @Override
+   public void onReceive(Object message) throws Exception {
+   try {
+   if (message instanceof Tuple3) { // add metric
--- End diff --

agreed.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426899#comment-15426899
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75355826
  
--- Diff: 
flink-runtime-web/src/test/java/org/apache/flink/runtime/webmonitor/metrics/AbstractMetricHandlerTest.java
 ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import org.apache.flink.runtime.webmonitor.JobManagerRetriever;
+import org.junit.Test;
+import scala.concurrent.ExecutionContext;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+import static org.powermock.api.mockito.PowerMockito.mock;
+
+public class AbstractMetricHandlerTest {
+   @Test
+   public void testHandleRequest() throws Exception {
+   MetricFetcher fetcher = new 
MetricFetcher(mock(JobManagerRetriever.class), mock(ExecutionContext.class));
+   MetricStore store = fetcher.getMetricStore();
+
+   JobVertexMetricsHandler handler = new 
JobVertexMetricsHandler(fetcher);
+
+   store.add("0:abc.metric1", 0);
+   store.add("1:tmid:abc.metric2", 1);
+   store.add("2:jobid:abc.metric3", 2);
+   store.add("3:jobid:taskid:subindex:abc.metric4", 3);
+   store.add("4:jobid:taskid:subindex:opname:abc.metric5", 4);
+
+   Map pathParams = new HashMap<>();
+   Map queryParams = new HashMap<>();
+
+   pathParams.put("jobid", "jobid");
+   pathParams.put("vertexid", "taskid");
+
+   String availableList = handler.handleRequest(pathParams, 
queryParams, null);
+
+   assertEquals("[" +
+   "{\"id\":\"subindex.opname.abc.metric5\"}," +
+   "{\"id\":\"subindex.abc.metric4\"}" +
+   "]",
+   availableList);
+
+   queryParams.put("get", "subindex.opname.abc.metric5");
+
+   String metricValue = handler.handleRequest(pathParams, 
queryParams, null);
+
+   assertEquals("[" +
+   
"{\"id\":\"subindex.opname.abc.metric5\",\"value\":\"4\"}" +
+   "]"
+   , metricValue
+   );
+
+   queryParams.put("get", 
"subindex.opname.abc.metric5,subindex.abc.metric4");
+
+   String metricValues = handler.handleRequest(pathParams, 
queryParams, null);
+
+   assertEquals("[" +
+   
"{\"id\":\"subindex.opname.abc.metric5\",\"value\":\"4\"}," +
+   
"{\"id\":\"subindex.abc.metric4\",\"value\":\"3\"}" +
+   "]",
+   metricValues
+   );
+   }
+
+   @Test
+   public void testInvalidListDoesNotFail() {
+   MetricFetcher fetcher = new 
MetricFetcher(mock(JobManagerRetriever.class), mock(ExecutionContext.class));
+   MetricStore store = fetcher.getMetricStore();
+
+   JobVertexMetricsHandler handler = new 
JobVertexMetricsHandler(fetcher);
+
+   store.add("0:abc.metric1", 0);
+   store.add("1:tmid:abc.metric2", 1);
+   store.add("2:jobid:abc.metric3", 2);
+   store.add("3:jobid:taskid:subindex:abc.metric4", 3);
+   store.add("4:jobid:taskid:subindex:opname:abc.metric5", 4);
+
+   Map pathParams = new HashMap<>();
+   Map queryParams = new HashMap<>();
+
+   pathParams.put("jobid", "jobid");
+   pathParams.put("vertexid", "taskid");
  

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426885#comment-15426885
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75354845
  
--- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/messages/Messages.scala 
---
@@ -47,4 +47,21 @@ object Messages {
* @return The Acknowledge case object instance.
*/
   def getAcknowledge(): Acknowledge.type = Acknowledge
+
+  /**
+* Signals that the receiver (JobManager/TaskManager) should transmit a 
dump of the
+* registered metrics.
+* 
+* This message may be send regularly by the WebInterface.
+*/
+  case object MetricRequest
+
+  /**
+* Accessor for the case object instance, to simplify Java 
interoperability.
+*
+* @return The MetricRequest case object instance.
+*/
+  def getRequestMetrics(): AnyRef = {
--- End diff --

sure


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426884#comment-15426884
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75354763
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricQueryService.java
 ---
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.metrics;
+
+import akka.actor.ActorRef;
+import akka.actor.ActorSystem;
+import akka.actor.Props;
+import akka.actor.UntypedActor;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.metrics.CharacterFilter;
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.Gauge;
+import org.apache.flink.metrics.Histogram;
+import org.apache.flink.metrics.HistogramStatistics;
+import org.apache.flink.metrics.Metric;
+import org.apache.flink.runtime.metrics.groups.AbstractMetricGroup;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * The MetricQueryService creates a key-value representation of all 
metrics currently registered with Flink when queried.
+ *
+ * It is realized as an actor and receives the following messages:
+ * - {@code Tuple3 => Notification of 
added Metric}
+ * - {@code Metric => Notification of removed Metric}
+ * - {@code ActorRef => Query for metric dump}
+ */
+public class MetricQueryService extends UntypedActor {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricQueryService.class);
+
+   public static final byte CATEGORY_COUNTER = 0;
+   public static final byte CATEGORY_GAUGE = 1;
+   public static final byte CATEGORY_HISTOGRAM = 2;
+
+   private final Map gauges = new HashMap<>();
+   private final Map counters = new HashMap<>();
+   private final Map histograms = new HashMap<>();
+
+   private final List toRemove = new ArrayList<>();
+
+   @Override
+   public void onReceive(Object message) throws Exception {
+   try {
+   if (message instanceof Tuple3) { // add metric
+   Tuple3 
tuple = (Tuple3) message;
+
+   String metricName = tuple.f0;
+   Metric metric = tuple.f1;
+   AbstractMetricGroup group = tuple.f2;
+
+   String name = 
group.getQueryServiceMetricIdentifier(metricName, new CharacterFilter() {
+   @Override
+   public String filterCharacters(String 
input) {
+   return input.replaceAll("[ 
:.,]", "_");
+   }
+   });
+
+   if (metric instanceof Counter) {
+   counters.put((Counter) metric, name);
+   } else if (metric instanceof Gauge) {
+   gauges.put((Gauge) metric, name);
+   } else if (metric instanceof Histogram) {
+   histograms.put((Histogram) metric, 
name);
+   }
+   } else if (message instanceof Metric) { // remove metric
+   toRemove.add((Metric) message);
+   } else if (message instanceof ActorRef) { // create dump
+   Object[] dump = createDump();
+   

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426877#comment-15426877
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75354216
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerJobMetricGroup.java
 ---
@@ -61,9 +58,7 @@ public TaskManagerJobMetricGroup(
JobID jobId,
@Nullable String jobName) {
 
-   super(registry, jobId, jobName, scopeFormat.formatScope(parent, 
jobId, jobName));
-
-   this.parent = checkNotNull(parent);
+   super(registry, checkNotNull(parent), jobId, jobName, 
scopeFormat.formatScope(parent, jobId, jobName));
--- End diff --

`parent` is also used in `scopeFormat.formatScope`, as such i wanted the 
check before that call is made.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426870#comment-15426870
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75353675
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricStore.java
 ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Nested data-structure to store metrics.
+ *
+ * This structure is not thread-safe.
+ */
+public class MetricStore {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricStore.class);
+
+   JobManagerMetricStore jobManager = new JobManagerMetricStore();
+   Map taskManagers = new HashMap<>();
+   Map jobs = new HashMap<>();
+
+   /**
+* Adds a metric to this MetricStore.
+*
+* @param name  the metric identifier
+* @param value the metric value
+*/
+   public void add(String name, Object value) {
+   TaskManagerMetricStore tm;
+   JobMetricStore job;
+   TaskMetricStore task;
+
+   try {
+   String[] components = name.split(":");
--- End diff --

yes, the MetricQueryService does that


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426867#comment-15426867
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75353538
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricStore.java
 ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Nested data-structure to store metrics.
+ *
+ * This structure is not thread-safe.
+ */
+public class MetricStore {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricStore.class);
+
+   JobManagerMetricStore jobManager = new JobManagerMetricStore();
+   Map taskManagers = new HashMap<>();
+   Map jobs = new HashMap<>();
--- End diff --

yes


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426830#comment-15426830
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2363
  
Thanks for your contribution @zentol. I've gone over the code and made some 
inline comments. My main concern/question is actually the representation of 
metric's type and hierarchy information. I think that encoding it in a string 
and then re-parsing it on the receiver side to reconstruct the information is 
rather fragile and error-prone especially wrt maintainability. Maybe you can 
give me some background why you decided to do it so.

Apart from that, I think the code contains many tests, which I really like 
:-)


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426822#comment-15426822
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75347741
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricQueryService.java
 ---
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.metrics;
+
+import akka.actor.ActorRef;
+import akka.actor.ActorSystem;
+import akka.actor.Props;
+import akka.actor.UntypedActor;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.metrics.CharacterFilter;
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.Gauge;
+import org.apache.flink.metrics.Histogram;
+import org.apache.flink.metrics.HistogramStatistics;
+import org.apache.flink.metrics.Metric;
+import org.apache.flink.runtime.metrics.groups.AbstractMetricGroup;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * The MetricQueryService creates a key-value representation of all 
metrics currently registered with Flink when queried.
+ *
+ * It is realized as an actor and receives the following messages:
+ * - {@code Tuple3 => Notification of 
added Metric}
+ * - {@code Metric => Notification of removed Metric}
+ * - {@code ActorRef => Query for metric dump}
+ */
+public class MetricQueryService extends UntypedActor {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricQueryService.class);
+
+   public static final byte CATEGORY_COUNTER = 0;
+   public static final byte CATEGORY_GAUGE = 1;
+   public static final byte CATEGORY_HISTOGRAM = 2;
+
+   private final Map gauges = new HashMap<>();
+   private final Map counters = new HashMap<>();
+   private final Map histograms = new HashMap<>();
+
+   private final List toRemove = new ArrayList<>();
+
+   @Override
+   public void onReceive(Object message) throws Exception {
+   try {
+   if (message instanceof Tuple3) { // add metric
+   Tuple3 
tuple = (Tuple3) message;
+
+   String metricName = tuple.f0;
+   Metric metric = tuple.f1;
+   AbstractMetricGroup group = tuple.f2;
+
+   String name = 
group.getQueryServiceMetricIdentifier(metricName, new CharacterFilter() {
+   @Override
+   public String filterCharacters(String 
input) {
+   return input.replaceAll("[ 
:.,]", "_");
+   }
+   });
+
+   if (metric instanceof Counter) {
+   counters.put((Counter) metric, name);
+   } else if (metric instanceof Gauge) {
+   gauges.put((Gauge) metric, name);
+   } else if (metric instanceof Histogram) {
+   histograms.put((Histogram) metric, 
name);
+   }
+   } else if (message instanceof Metric) { // remove metric
+   toRemove.add((Metric) message);
+   } else if (message instanceof ActorRef) { // create dump
+   Object[] dump = createDump();
+   

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426813#comment-15426813
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75346843
  
--- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala
 ---
@@ -1024,6 +1024,17 @@ class JobManager(
 
 case RequestWebMonitorPort =>
   sender() ! ResponseWebMonitorPort(webMonitorPort)
+
+case MetricRequest =>
+  metricsRegistry match {
+case Some(registry) =>
+  registry.getQueryService match {
+case Some(queryService) =>
+  queryService ! sender()
+case None =>
+  }
+case None =>
--- End diff --

In both `None` cases you should return a failure message to the sender so 
that it does not have to wait until the future times out.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426812#comment-15426812
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75346742
  
--- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/messages/Messages.scala 
---
@@ -47,4 +47,21 @@ object Messages {
* @return The Acknowledge case object instance.
*/
   def getAcknowledge(): Acknowledge.type = Acknowledge
+
+  /**
+* Signals that the receiver (JobManager/TaskManager) should transmit a 
dump of the
+* registered metrics.
+* 
+* This message may be send regularly by the WebInterface.
+*/
+  case object MetricRequest
+
+  /**
+* Accessor for the case object instance, to simplify Java 
interoperability.
+*
+* @return The MetricRequest case object instance.
+*/
+  def getRequestMetrics(): AnyRef = {
--- End diff --

Can we name this method like the message?


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426808#comment-15426808
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75346431
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricStore.java
 ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Nested data-structure to store metrics.
+ *
+ * This structure is not thread-safe.
+ */
+public class MetricStore {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricStore.class);
+
+   JobManagerMetricStore jobManager = new JobManagerMetricStore();
+   Map taskManagers = new HashMap<>();
+   Map jobs = new HashMap<>();
+
+   /**
+* Adds a metric to this MetricStore.
+*
+* @param name  the metric identifier
+* @param value the metric value
+*/
+   public void add(String name, Object value) {
--- End diff --

To be honest, I'm not a very big fan of encoding type and hierarchical 
information in a string which has to be reparsed in order to reconstruct the 
afore-mentioned information. The problem with this approach is that everything 
is very implicit and you don't have a tight coupling (in terms of format) 
between the sender and receiver. If something changes at the sender side, you 
won't notice at all that you have to change here something as well. Even at 
runtime the only thing you see is that you don't see the metrics. This makes it 
very hard to debug. I would be in favour of creating for the different types 
different messages, e.g. `JobManagerMetric`, `TaskManagerMetric`, `TaskMetric`, 
etc. These messages contain the respective information. Furthermore, for the 
histogram type case, you will have to do the whole parsing over and over again 
instead of doing it once.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426787#comment-15426787
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75344971
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerJobMetricGroup.java
 ---
@@ -61,9 +58,7 @@ public TaskManagerJobMetricGroup(
JobID jobId,
@Nullable String jobName) {
 
-   super(registry, jobId, jobName, scopeFormat.formatScope(parent, 
jobId, jobName));
-
-   this.parent = checkNotNull(parent);
+   super(registry, checkNotNull(parent), jobId, jobName, 
scopeFormat.formatScope(parent, jobId, jobName));
--- End diff --

shouldn't the `checkNotNull` be performed by the super constructor?


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426786#comment-15426786
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75344832
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/ComponentMetricGroup.java
 ---
@@ -33,19 +36,42 @@
  * group could for example include the task attempt number (more fine 
grained identification), or
  * exclude it (for continuity of the namespace across failure and 
recovery).
  */
-public abstract class ComponentMetricGroup extends AbstractMetricGroup {
+public abstract class ComponentMetricGroup 
extends AbstractMetricGroup {
--- End diff --

Is this an internal component like the `JobMetricGroup`?


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426784#comment-15426784
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75344502
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/ComponentMetricGroup.java
 ---
@@ -33,19 +36,42 @@
  * group could for example include the task attempt number (more fine 
grained identification), or
  * exclude it (for continuity of the namespace across failure and 
recovery).
  */
-public abstract class ComponentMetricGroup extends AbstractMetricGroup {
+public abstract class ComponentMetricGroup 
extends AbstractMetricGroup {
--- End diff --

Type parameter in the description is missing.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426783#comment-15426783
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r7534
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/AbstractMetricGroup.java
 ---
@@ -54,13 +53,19 @@
  * return Counters, Gauges, etc to the code, to prevent exceptions in the 
monitored code.
  * These metrics simply do not get reported any more, when created on a 
closed group.
  */
-public abstract class AbstractMetricGroup implements MetricGroup {
+public abstract class AbstractMetricGroup 
implements MetricGroup {
--- End diff --

Type parameter description in the JavaDocs is missing


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426776#comment-15426776
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75343567
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistry.java
 ---
@@ -207,6 +239,9 @@ public void register(Metric metric, String metricName, 
MetricGroup group) {
}
}
}
+   if (queryService != null) {
+   notifyOfAddedMetric(queryService, metric, 
metricName, (AbstractMetricGroup) group);
--- End diff --

I think it's always good practice to add the class whose static method on 
calls in front of the static method. Otherwise you easily think that this 
method is somewhere defined in this class. Consequently, I would discourage 
static imports a little bit.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426774#comment-15426774
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75343353
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistry.java
 ---
@@ -144,6 +152,30 @@ public MetricRegistry(Configuration config) {
}
}
 
+   /**
+* Initializes the MetricQueryService.
+* 
+* @param actorSystem ActorSystem to create the MetricQueryService on
+ */
+   public void startQueryService(ActorSystem actorSystem) {
+   try {
+   queryService = 
MetricQueryService.startMetricQueryService(actorSystem);
+   } catch (Exception e) {
+   LOG.warn("Could not start MetricDumpActor. No metrics 
will be submitted to the WebInterface.", e);
+   }
+   }
+
+   /**
+* Returns an ActorRef to the MetricQueryService
+* 
+* @return ActorRef to the MetricQueryService
+ */
+   public Option getQueryService() {
+   return queryService == null
--- End diff --

this is equivalent to `Option.apply(queryService)`. If queryService == 
null, then the apply call will return a `None`.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426770#comment-15426770
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75343063
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricQueryService.java
 ---
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.metrics;
+
+import akka.actor.ActorRef;
+import akka.actor.ActorSystem;
+import akka.actor.Props;
+import akka.actor.UntypedActor;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.metrics.CharacterFilter;
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.Gauge;
+import org.apache.flink.metrics.Histogram;
+import org.apache.flink.metrics.HistogramStatistics;
+import org.apache.flink.metrics.Metric;
+import org.apache.flink.runtime.metrics.groups.AbstractMetricGroup;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * The MetricQueryService creates a key-value representation of all 
metrics currently registered with Flink when queried.
+ *
+ * It is realized as an actor and receives the following messages:
+ * - {@code Tuple3 => Notification of 
added Metric}
+ * - {@code Metric => Notification of removed Metric}
+ * - {@code ActorRef => Query for metric dump}
+ */
+public class MetricQueryService extends UntypedActor {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricQueryService.class);
+
+   public static final byte CATEGORY_COUNTER = 0;
+   public static final byte CATEGORY_GAUGE = 1;
+   public static final byte CATEGORY_HISTOGRAM = 2;
+
+   private final Map gauges = new HashMap<>();
+   private final Map counters = new HashMap<>();
+   private final Map histograms = new HashMap<>();
+
+   private final List toRemove = new ArrayList<>();
+
+   @Override
+   public void onReceive(Object message) throws Exception {
+   try {
+   if (message instanceof Tuple3) { // add metric
+   Tuple3 
tuple = (Tuple3) message;
+
+   String metricName = tuple.f0;
+   Metric metric = tuple.f1;
+   AbstractMetricGroup group = tuple.f2;
+
+   String name = 
group.getQueryServiceMetricIdentifier(metricName, new CharacterFilter() {
+   @Override
+   public String filterCharacters(String 
input) {
+   return input.replaceAll("[ 
:.,]", "_");
+   }
+   });
+
+   if (metric instanceof Counter) {
+   counters.put((Counter) metric, name);
+   } else if (metric instanceof Gauge) {
+   gauges.put((Gauge) metric, name);
+   } else if (metric instanceof Histogram) {
+   histograms.put((Histogram) metric, 
name);
+   }
+   } else if (message instanceof Metric) { // remove metric
--- End diff --

The same applies to the remove metric message and the create dump. Both 
should get a proper message, e.g. `RemoveMetric` and `CreateDump`.


> Expose metrics to Webfrontend
> -
>
>  

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426764#comment-15426764
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75342927
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricQueryService.java
 ---
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.metrics;
+
+import akka.actor.ActorRef;
+import akka.actor.ActorSystem;
+import akka.actor.Props;
+import akka.actor.UntypedActor;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.metrics.CharacterFilter;
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.Gauge;
+import org.apache.flink.metrics.Histogram;
+import org.apache.flink.metrics.HistogramStatistics;
+import org.apache.flink.metrics.Metric;
+import org.apache.flink.runtime.metrics.groups.AbstractMetricGroup;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * The MetricQueryService creates a key-value representation of all 
metrics currently registered with Flink when queried.
+ *
+ * It is realized as an actor and receives the following messages:
+ * - {@code Tuple3 => Notification of 
added Metric}
+ * - {@code Metric => Notification of removed Metric}
+ * - {@code ActorRef => Query for metric dump}
+ */
+public class MetricQueryService extends UntypedActor {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricQueryService.class);
+
+   public static final byte CATEGORY_COUNTER = 0;
+   public static final byte CATEGORY_GAUGE = 1;
+   public static final byte CATEGORY_HISTOGRAM = 2;
+
+   private final Map gauges = new HashMap<>();
+   private final Map counters = new HashMap<>();
+   private final Map histograms = new HashMap<>();
+
+   private final List toRemove = new ArrayList<>();
+
+   @Override
+   public void onReceive(Object message) throws Exception {
+   try {
+   if (message instanceof Tuple3) { // add metric
--- End diff --

I think we should create proper message types instead of using a `Tuple3` 
here.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426740#comment-15426740
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75340930
  
--- Diff: 
flink-runtime-web/src/test/java/org/apache/flink/runtime/webmonitor/metrics/AbstractMetricHandlerTest.java
 ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import org.apache.flink.runtime.webmonitor.JobManagerRetriever;
+import org.junit.Test;
+import scala.concurrent.ExecutionContext;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+import static org.powermock.api.mockito.PowerMockito.mock;
+
+public class AbstractMetricHandlerTest {
--- End diff --

Should be added to all tests because this makes debugging with Travis logs 
considerably easier.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426738#comment-15426738
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75340742
  
--- Diff: 
flink-runtime-web/src/test/java/org/apache/flink/runtime/webmonitor/metrics/AbstractMetricHandlerTest.java
 ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import org.apache.flink.runtime.webmonitor.JobManagerRetriever;
+import org.junit.Test;
+import scala.concurrent.ExecutionContext;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+import static org.powermock.api.mockito.PowerMockito.mock;
+
+public class AbstractMetricHandlerTest {
+   @Test
+   public void testHandleRequest() throws Exception {
+   MetricFetcher fetcher = new 
MetricFetcher(mock(JobManagerRetriever.class), mock(ExecutionContext.class));
+   MetricStore store = fetcher.getMetricStore();
+
+   JobVertexMetricsHandler handler = new 
JobVertexMetricsHandler(fetcher);
+
+   store.add("0:abc.metric1", 0);
+   store.add("1:tmid:abc.metric2", 1);
+   store.add("2:jobid:abc.metric3", 2);
+   store.add("3:jobid:taskid:subindex:abc.metric4", 3);
+   store.add("4:jobid:taskid:subindex:opname:abc.metric5", 4);
+
+   Map pathParams = new HashMap<>();
+   Map queryParams = new HashMap<>();
+
+   pathParams.put("jobid", "jobid");
+   pathParams.put("vertexid", "taskid");
+
+   String availableList = handler.handleRequest(pathParams, 
queryParams, null);
+
+   assertEquals("[" +
+   "{\"id\":\"subindex.opname.abc.metric5\"}," +
+   "{\"id\":\"subindex.abc.metric4\"}" +
+   "]",
+   availableList);
+
+   queryParams.put("get", "subindex.opname.abc.metric5");
+
+   String metricValue = handler.handleRequest(pathParams, 
queryParams, null);
+
+   assertEquals("[" +
+   
"{\"id\":\"subindex.opname.abc.metric5\",\"value\":\"4\"}" +
+   "]"
+   , metricValue
+   );
+
+   queryParams.put("get", 
"subindex.opname.abc.metric5,subindex.abc.metric4");
+
+   String metricValues = handler.handleRequest(pathParams, 
queryParams, null);
+
+   assertEquals("[" +
+   
"{\"id\":\"subindex.opname.abc.metric5\",\"value\":\"4\"}," +
+   
"{\"id\":\"subindex.abc.metric4\",\"value\":\"3\"}" +
+   "]",
+   metricValues
+   );
+   }
+
+   @Test
+   public void testInvalidListDoesNotFail() {
+   MetricFetcher fetcher = new 
MetricFetcher(mock(JobManagerRetriever.class), mock(ExecutionContext.class));
+   MetricStore store = fetcher.getMetricStore();
+
+   JobVertexMetricsHandler handler = new 
JobVertexMetricsHandler(fetcher);
+
+   store.add("0:abc.metric1", 0);
+   store.add("1:tmid:abc.metric2", 1);
+   store.add("2:jobid:abc.metric3", 2);
+   store.add("3:jobid:taskid:subindex:abc.metric4", 3);
+   store.add("4:jobid:taskid:subindex:opname:abc.metric5", 4);
+
+   Map pathParams = new HashMap<>();
+   Map queryParams = new HashMap<>();
+
+   pathParams.put("jobid", "jobid");
+   pathParams.put("vertexid", 

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426732#comment-15426732
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75340401
  
--- Diff: 
flink-runtime-web/src/test/java/org/apache/flink/runtime/webmonitor/metrics/AbstractMetricHandlerTest.java
 ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import org.apache.flink.runtime.webmonitor.JobManagerRetriever;
+import org.junit.Test;
+import scala.concurrent.ExecutionContext;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+import static org.powermock.api.mockito.PowerMockito.mock;
+
+public class AbstractMetricHandlerTest {
--- End diff --

`extends TestLogger` missing


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426734#comment-15426734
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75340445
  
--- Diff: 
flink-runtime-web/src/test/java/org/apache/flink/runtime/webmonitor/metrics/AbstractMetricHandlerTest.java
 ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import org.apache.flink.runtime.webmonitor.JobManagerRetriever;
+import org.junit.Test;
+import scala.concurrent.ExecutionContext;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+import static org.powermock.api.mockito.PowerMockito.mock;
+
+public class AbstractMetricHandlerTest {
+   @Test
+   public void testHandleRequest() throws Exception {
--- End diff --

A short description would be helpful


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426720#comment-15426720
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75339017
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricStore.java
 ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Nested data-structure to store metrics.
+ *
+ * This structure is not thread-safe.
+ */
+public class MetricStore {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricStore.class);
+
+   JobManagerMetricStore jobManager = new JobManagerMetricStore();
+   Map taskManagers = new HashMap<>();
+   Map jobs = new HashMap<>();
+
+   /**
+* Adds a metric to this MetricStore.
+*
+* @param name  the metric identifier
+* @param value the metric value
+*/
+   public void add(String name, Object value) {
+   TaskManagerMetricStore tm;
+   JobMetricStore job;
+   TaskMetricStore task;
+
+   try {
+   String[] components = name.split(":");
--- End diff --

Do we filter `:` out in user defined metric names?


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426718#comment-15426718
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75338668
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricStore.java
 ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Nested data-structure to store metrics.
+ *
+ * This structure is not thread-safe.
+ */
+public class MetricStore {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricStore.class);
+
+   JobManagerMetricStore jobManager = new JobManagerMetricStore();
+   Map taskManagers = new HashMap<>();
+   Map jobs = new HashMap<>();
--- End diff --

Can these fields be final?


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426704#comment-15426704
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75337579
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricFetcher.java
 ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import akka.dispatch.OnSuccess;
+import org.apache.flink.configuration.ConfigConstants;
+import org.apache.flink.runtime.instance.ActorGateway;
+import org.apache.flink.runtime.instance.Instance;
+import org.apache.flink.runtime.messages.JobManagerMessages;
+import org.apache.flink.runtime.messages.Messages;
+import org.apache.flink.runtime.messages.webmonitor.JobDetails;
+import org.apache.flink.runtime.messages.webmonitor.MultipleJobsDetails;
+import org.apache.flink.runtime.messages.webmonitor.RequestJobDetails;
+import org.apache.flink.runtime.webmonitor.JobManagerRetriever;
+import org.apache.flink.util.Preconditions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import scala.Option;
+import scala.concurrent.ExecutionContext;
+import scala.concurrent.duration.Duration;
+import scala.concurrent.duration.FiniteDuration;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * The MetricFetcher can be used to fetch metrics from the JobManager and 
all registered TaskManagers.
+ *
+ * Metrics will only be fetched when {@link MetricFetcher#update()} is 
called, provided that a sufficient time since
+ * the last call has passed.
+ */
+public class MetricFetcher {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricFetcher.class);
+
+   private final JobManagerRetriever retriever;
+   private final ExecutionContext ctx;
+   private final FiniteDuration timeout = new 
FiniteDuration(Duration.create(ConfigConstants.DEFAULT_AKKA_ASK_TIMEOUT).toMillis(),
 TimeUnit.MILLISECONDS);
+
+   private MetricStore metrics = new MetricStore();
+
+   private long lastUpdateTime;
+
+   public MetricFetcher(JobManagerRetriever retriever, ExecutionContext 
ctx) {
+   this.retriever = Preconditions.checkNotNull(retriever);
+   this.ctx = Preconditions.checkNotNull(ctx);
+   }
+
+   /**
+* Returns the MetricStore containing all stored metrics.
+*
+* @return MetricStore containing all stored metrics;
+*/
+   public MetricStore getMetricStore() {
+   return metrics;
+   }
+
+   /**
+* This method can be used to signal this MetricFetcher that the 
metrics are still in use and should be updated.
+*/
+   public void update() {
+   synchronized (this) {
+   long currentTime = System.currentTimeMillis();
+   if (currentTime - lastUpdateTime > 1) { // 10 
seconds have passed since the last update
+   lastUpdateTime = currentTime;
+   fetchMetrics();
+   }
+   }
+   }
+
+   private void fetchMetrics() {
+   try {
+   Option> 
jobManagerGatewayAndWebPort = retriever.getJobManagerGatewayAndWebPort();
+   if (jobManagerGatewayAndWebPort.isDefined()) {
+   ActorGateway jobManager = 
jobManagerGatewayAndWebPort.get()._1();
+
+   /**
+* Remove all metrics that belong to a job that 
is not running and no longer archived.
+*/
+   jobManager.ask(new 

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426701#comment-15426701
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75337340
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/MetricFetcher.java
 ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import akka.dispatch.OnSuccess;
+import org.apache.flink.configuration.ConfigConstants;
+import org.apache.flink.runtime.instance.ActorGateway;
+import org.apache.flink.runtime.instance.Instance;
+import org.apache.flink.runtime.messages.JobManagerMessages;
+import org.apache.flink.runtime.messages.Messages;
+import org.apache.flink.runtime.messages.webmonitor.JobDetails;
+import org.apache.flink.runtime.messages.webmonitor.MultipleJobsDetails;
+import org.apache.flink.runtime.messages.webmonitor.RequestJobDetails;
+import org.apache.flink.runtime.webmonitor.JobManagerRetriever;
+import org.apache.flink.util.Preconditions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import scala.Option;
+import scala.concurrent.ExecutionContext;
+import scala.concurrent.duration.Duration;
+import scala.concurrent.duration.FiniteDuration;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * The MetricFetcher can be used to fetch metrics from the JobManager and 
all registered TaskManagers.
+ *
+ * Metrics will only be fetched when {@link MetricFetcher#update()} is 
called, provided that a sufficient time since
+ * the last call has passed.
+ */
+public class MetricFetcher {
+   private static final Logger LOG = 
LoggerFactory.getLogger(MetricFetcher.class);
+
+   private final JobManagerRetriever retriever;
+   private final ExecutionContext ctx;
+   private final FiniteDuration timeout = new 
FiniteDuration(Duration.create(ConfigConstants.DEFAULT_AKKA_ASK_TIMEOUT).toMillis(),
 TimeUnit.MILLISECONDS);
+
+   private MetricStore metrics = new MetricStore();
+
+   private long lastUpdateTime;
+
+   public MetricFetcher(JobManagerRetriever retriever, ExecutionContext 
ctx) {
+   this.retriever = Preconditions.checkNotNull(retriever);
+   this.ctx = Preconditions.checkNotNull(ctx);
+   }
+
+   /**
+* Returns the MetricStore containing all stored metrics.
+*
+* @return MetricStore containing all stored metrics;
+*/
+   public MetricStore getMetricStore() {
+   return metrics;
+   }
+
+   /**
+* This method can be used to signal this MetricFetcher that the 
metrics are still in use and should be updated.
+*/
+   public void update() {
+   synchronized (this) {
+   long currentTime = System.currentTimeMillis();
+   if (currentTime - lastUpdateTime > 1) { // 10 
seconds have passed since the last update
+   lastUpdateTime = currentTime;
+   fetchMetrics();
+   }
+   }
+   }
+
+   private void fetchMetrics() {
+   try {
+   Option> 
jobManagerGatewayAndWebPort = retriever.getJobManagerGatewayAndWebPort();
+   if (jobManagerGatewayAndWebPort.isDefined()) {
+   ActorGateway jobManager = 
jobManagerGatewayAndWebPort.get()._1();
+
+   /**
+* Remove all metrics that belong to a job that 
is not running and no longer archived.
+*/
+   jobManager.ask(new 

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426620#comment-15426620
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2363#discussion_r75328166
  
--- Diff: 
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/metrics/AbstractMetricsHandler.java
 ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.runtime.webmonitor.metrics;
+
+import com.fasterxml.jackson.core.JsonGenerator;
+import org.apache.flink.runtime.instance.ActorGateway;
+import org.apache.flink.runtime.webmonitor.handlers.JsonFactory;
+import org.apache.flink.runtime.webmonitor.handlers.RequestHandler;
+import org.apache.flink.util.Preconditions;
+
+import java.io.IOException;
+import java.io.StringWriter;
+import java.util.Map;
+
+/**
+ * Abstract request handler that returns a list of all available metrics 
or the values for a set of metrics.
+ *
+ * If the query parameters do not contain a "get" parameter the list of 
all metrics is returned.
+ * {@code {"available": [ { "name" : "X", "id" : "X" } ] } }
--- End diff --

`"name" : "X"` won't be written, will it?


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426606#comment-15426606
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2363
  
Fixed the failing test.


> Expose metrics to Webfrontend
> -
>
> Key: FLINK-4389
> URL: https://issues.apache.org/jira/browse/FLINK-4389
> Project: Flink
>  Issue Type: Sub-task
>  Components: Metrics, Webfrontend
>Affects Versions: 1.1.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

2016-08-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418767#comment-15418767
 ] 

ASF GitHub Bot commented on FLINK-4389:
---

GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/2363

[FLINK-4389] Expose metrics to WebFrontend

This PR exposes metrics to the Webfrontend, as proposed in 
[FLIP-7](https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface).

This PR builds on-top of #2300, meaning that 2866f56 is not part of the PR.

I've split the implementation into 5 commits that implement
* the generation of a separate scope string for the WebInterface
* the MetricQueryService, a separate actor running on all Job-/TaskManagers 
whose main purpose is to create and return a dump of the metrics when queried 
to do so
* the MetricStore, a nested data structure used in the WebInterface to 
store transmitted metrics
* the MetricFetcher, which is used by the WebInterface to fetch metrics 
from Job-/TaskManagers
* various MetricsHandler classes, which handle REST calls requesting 
specific metrics

### MetricQueryService
The MetricQueryService is an actor running inside the MetricRegistry acting 
like an unscheduled reporter that is queried from the outside for a report. The 
MetricRegistry notifies it of added/removed metrics whereas the MetricFetcher 
sends report requests to the JM/TM which are then forwarded to the 
MetricQueryService, which answers directly to the MetricFetcher.

The report is one big `Object[]`, which contains for each metric
 1. the type of the metric, encoded as a byte (so that we know how many 
values are transmitted)
 2. the fully qualified metric name (based on the separate format)
 3. the value(s) of the metric (turned into Strings for Gauges)

### MetricStore
The MetricStore is a relatively simple nested data-structure that contains 
one HashMap for every JM/TM/job/task. Received metrics are 
added to these HashMaps based on the format string. There is only a single 
MetricStore instance in the WebInterface.

### MetricFetcher
The MetricFetcher initiates the transfer and cleanup of metrics. It 
contains the MetricStore instance, which is accessed by MetricHandlers. The 
fetching is only done when a handler asks for it, with a minimum duration of 10 
seconds between updates. As such no fetching will be done if the metrics are 
not accessed with REST calls.

The fetching procedure can be summed up in pseudo-code as following:
```
fetch():
askJobManagerForJobDetails()
=> retain all metrics belonging to the given jobs
askJobManagerForMetrics()
=> add received metrics to MetricStore
askJobManagerForRegisteredTaskManagers()
=> retain all metrics belonging to registered task managers
=> for each TaskManager:
askTaskManagerForMetrics()
=> add received metrics to MetricStore
```

### MetricsHandler
The MetricsHandlers deal with two requests:
* getAllAvailableMetrics - any REST request that does not have a `get` 
query parameter is treated as a request for all available metrics for a given 
JM/TM/job/task, denoted by the REST path. The reply will be a JSON array, for 
example: `[{"id":"metric_1"},{"id":"metric_2"}]`
* getMetricValues - the Webfrontend can request the values for several 
metrics by passing a comma-separated list of metric id's as the `get` query 
parameter. The reply will be a JSON array of id:value pairs, for example: 
`[{"id":"metric_1", "value":"4"}]` or an empty string if an error occurred.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink 4389_metrics_exposed

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2363


commit ea0e4d892717f042acf26ec9653a2371d7b21028
Author: zentol 
Date:   2016-07-27T09:25:27Z

[FLINK-4245] Expose all defined variables

commit ea1154644566f8009ccda64a0acbdde7d59ad235
Author: zentol 
Date:   2016-08-05T11:54:37Z

Implement Query Scope

Modifies various MetricGroups to return a separate scope for the query 
service.

commit 3791a94529d703351dffb284ed3d5d19f1ce272c
Author: zentol 
Date:   2016-08-05T11:49:10Z

Implement MetricQueryService

Used on the JM/TM to create a key-value representation of all metrics.

commit a0e1418decc8a3a4b53da15dc744f1702247db9f
Author: zentol 
Date: