Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints

2016-06-09 Thread Aravindan Vijayan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48030/#review136841
---



This patch has been committed. Please close this review.

- Aravindan Vijayan


On June 9, 2016, 1:58 a.m., Jungtaek Lim wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48030/
> ---
> 
> (Updated June 9, 2016, 1:58 a.m.)
> 
> 
> Review request for Ambari, Aravindan Vijayan, Sriharsha Chintalapani, and Sid 
> Wagle.
> 
> 
> Bugs: AMBARI-16946
> https://issues.apache.org/jira/browse/AMBARI-16946
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> There's a mismatch between TimelineMetricsCache and Storm's metrics model: 
> TimelineMetricsCache considers "metric name + timestamp" to be unique, but 
> Storm does not.
> 
> For example, assume that bolt B has tasks T1 and T2, and B has registered 
> metric M1. It's possible for the metrics sink to receive (T1, M1) and (T2, M1) 
> with the same timestamp TS1 (taken from TaskInfo, not the current time), and 
> whichever is received later will be discarded by TimelineMetricsCache.
> 
> If we want each Storm metric point to be unique, we should use "topology name 
> + component name + task id + metric name" as the metric name so that "metric 
> name + timestamp" becomes unique.
> 
> There are other issues I would like to address, too.
> 
> - Currently, the hostname is set to the hostname of the machine that runs the 
> metrics sink. Since TaskInfo carries the hostname of the machine that runs the 
> task, we should use that instead.
> - The timestamp in TaskInfo is in seconds, while the Storm Metrics Sink treats 
> it as milliseconds, resulting in wrong timestamps and broken cache eviction. 
> It should be multiplied by 1000.
> - 'component name' is not unique across the cluster, so it's not a good fit 
> for the app id. 'topology name' is unique, so the proper value for the app id 
> is the topology name.
> 
> Consideration: The hostname used to determine the 'write shard' is set to the 
> hostname of the machine that runs the metrics sink. Since I don't know whether 
> reads are also sharded, I'm not sure it's safe to use TaskInfo's hostname as 
> the hostname of the TimelineMetric. Please review this part carefully.
> 
> 
> Diffs
> -
> 
>   
> ambari-metrics/ambari-metrics-storm-sink/src/main/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSink.java
>  02f5598 
>   
> ambari-metrics/ambari-metrics-storm-sink/src/test/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSinkTest.java
>  8171a4d 
> 
> Diff: https://reviews.apache.org/r/48030/diff/
> 
> 
> Testing
> ---
> 
> I tested only the ambari-metrics module since the changeset is not related 
> to other modules.
> 
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] ambari-metrics . SUCCESS [  0.896 
> s]
> [INFO] Ambari Metrics Common .. SUCCESS [ 13.386 
> s]
> [INFO] Ambari Metrics Hadoop Sink . SUCCESS [  5.085 
> s]
> [INFO] Ambari Metrics Flume Sink .. SUCCESS [  6.827 
> s]
> [INFO] Ambari Metrics Kafka Sink .. SUCCESS [  4.190 
> s]
> [INFO] Ambari Metrics Storm Sink .. SUCCESS [  1.384 
> s]
> [INFO] Ambari Metrics Collector ... SUCCESS [04:06 
> min]
> [INFO] Ambari Metrics Monitor . SUCCESS [  3.556 
> s]
> [INFO] Ambari Metrics Grafana . SUCCESS [01:03 
> min]
> [INFO] Ambari Metrics Assembly  SUCCESS [  3.567 
> s]
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
> 
> [INFO] Total time: 05:48 min
> [INFO] Finished at: 2016-05-30T16:46:07+09:00
> [INFO] Final Memory: 78M/1038M
> [INFO] 
> 
> 
> In fact, I tried to run `mvn test` from the ambari root directory, but the 
> build fails in ambari-web.
> 
> ```
> > fsevents@0.2.1 install 
> > /Users/jlim/WorkArea/JavaProjects/ambari/ambari-web/node_modules/chokidar/node_modules/fsevents
> > node-gyp rebuild
> ...
> npm WARN install:fsevents fsevents@0.2.1 install: `node-gyp rebuild`
> npm WARN install:fsevents Exit status 1
> ```
> 
> No luck with `npm install` either.
> 
> 
> Thanks,
> 
> Jungtaek Lim
> 
>



Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints

2016-06-09 Thread Sriharsha Chintalapani


> On June 7, 2016, 6:07 a.m., Sid Wagle wrote:
> > Changes look good; the only thing to consider is the change to the metric 
> > name. Cluster Aggregation will not occur at topology level since appId = 
> > topologyName for metrics with the same metric name. Is the metric name too 
> > fine-grained? The other thing to consider is whether we need task metrics 
> > to be aggregated across topologies; if yes, taskId cannot be part of the 
> > metric name.
> > 
> > Could you also add Aravindan Vijayan to the reviewers? Thanks.
> 
> Sid Wagle wrote:
> Rephrase: Cluster Aggregation will *now* occur at topology level
> 
> Jungtaek Lim wrote:
> > The other thing to consider is whether we need task metrics to be 
> > aggregated across topologies; if yes, taskId cannot be part of the metric 
> > name.
> 
> It depends on the user, but in most cases I don't think users will aggregate 
> metrics across topologies.
> 
> And as I was saying, technically there's no way to aggregate metrics on the 
> sink side since the parallelism of the sink can be set to more than 1 (I 
> mean, we could have multiple aggregation points, which breaks aggregation). 
> So we need to push task-level metrics into AMS, and the task id should be 
> included in the metric name to make it unique.
> 
> Based on that, we need `series aggregation` to aggregate task-level metrics 
> to a higher level. (I'm trying to address this via AMBARI-17027.)
> 
> The only thing which affects aggregation is appId.
> 
> - When we set appId to 'component name' (current), the same component 
> (Spout, Bolt) name across topologies can be queried together.
> - When we set appId to 'topology name', the same metric name (from Storm's 
> view) from different components in a topology can be queried together.
> - When we set appId to 'Storm', all metrics can be queried together. (The 
> metric name should then also include the topology name.)
> 
> I'm not familiar with the structure of the metrics, so I'm not sure how 
> these choices affect performance while storing / querying. I'd like to hear 
> opinions from the reviewers.
> 
> For reference, below is how storm-graphite composes the prefix of the metric 
> name. It uses topology name, component name, worker host, worker port, and 
> task id.
> 
> https://github.com/verisign/storm-graphite/blob/master/src/main/java/com/verisign/storm/metrics/GraphiteMetricsConsumer.java#L278
> 
> When querying metrics from Graphite, users can query with wildcards & a 
> series function to aggregate metrics into one, and Grafana can show that. 
> That's what I want to bring to AMS.
> 
> Aravindan Vijayan wrote:
> Jungtaek, there is also another level of classification called 
> "instanceId". Every appId can have multiple instanceIds. You should find 
> that in the TimelineMetric object. Perhaps we can use that here.
> 
> Sriharsha Chintalapani wrote:
> @Jungtaek, what Sid was saying is that we won't be able to see the 
> aggregated metrics for a single topology if we use taskId. Let's say I want 
> to see how many tuples were processed in the last hour for that topology; 
> that aggregation won't be possible using taskId.
> 
> Jungtaek Lim wrote:
> @Aravindan
> Yeah, actually I tried to use instanceId at first, but TimelineMetricCache 
> (TimelineMetricHolder) only checks the metric name as the unique key while 
> putting, which is what I described. Furthermore, I would want to aggregate 
> task-level metrics into component-level metrics, but there's no wildcard 
> support on instanceId. So it seems to be too much effort to use instanceId 
> for taskId, and having taskId in the metric name is the easier way to 
> achieve aggregation for now.
> But I'm open to using instanceId and supporting wildcard & aggregation on 
> it. Please let me know if we want to use instanceId. @Sid @Sriharsha
> 
> @Sriharsha
> I guess what Sid was saying is the opposite of what you're saying. Let's 
> assume cluster aggregation occurs.
> 
> A. current
> 
> appId is the component name and the metric name is just the metric name, so 
> there's no distinction between topologies (the topology name is not placed 
> anywhere now). So cluster aggregation will aggregate metrics across 
> topologies, and we can't query the last hour of Topology T1, Bolt B1, 
> Metric M1. There's no way to query within Topology T1 regardless of 
> aggregation.
> 
> B. after patch
> 
> appId is the topology name and the metric name is component name + task id 
> + metric name, so metrics are classified by topology name. So cluster 
> aggregation will aggregate metrics per topology, and we can't query the 
> last hour of Bolt B1, Metric M1 across all topologies.
> There's no way to query across topologies regardless of aggregation.
> 
> So if we want fully flexible queries, the topology name should also be 
> placed in the metric name, the same as the component name and task id.
> 
> @Sid Please correct me if I'm wrong.
> 
> Metrics aggregation at query time is not possible for now but will be 
> possible with AMBARI-17027.

Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints

2016-06-09 Thread Sriharsha Chintalapani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48030/#review136767
---


Ship it!




Ship It!

- Sriharsha Chintalapani


On June 9, 2016, 1:58 a.m., Jungtaek Lim wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48030/
> ---
> 
> (Updated June 9, 2016, 1:58 a.m.)
> 
> 
> Review request for Ambari, Aravindan Vijayan, Sriharsha Chintalapani, and Sid 
> Wagle.
> 
> 
> Bugs: AMBARI-16946
> https://issues.apache.org/jira/browse/AMBARI-16946
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> There's a mismatch between TimelineMetricsCache and Storm's metrics model: 
> TimelineMetricsCache considers "metric name + timestamp" to be unique, but 
> Storm does not.
> 
> For example, assume that bolt B has tasks T1 and T2, and B has registered 
> metric M1. It's possible for the metrics sink to receive (T1, M1) and (T2, M1) 
> with the same timestamp TS1 (taken from TaskInfo, not the current time), and 
> whichever is received later will be discarded by TimelineMetricsCache.
> 
> If we want each Storm metric point to be unique, we should use "topology name 
> + component name + task id + metric name" as the metric name so that "metric 
> name + timestamp" becomes unique.
> 
> There are other issues I would like to address, too.
> 
> - Currently, the hostname is set to the hostname of the machine that runs the 
> metrics sink. Since TaskInfo carries the hostname of the machine that runs the 
> task, we should use that instead.
> - The timestamp in TaskInfo is in seconds, while the Storm Metrics Sink treats 
> it as milliseconds, resulting in wrong timestamps and broken cache eviction. 
> It should be multiplied by 1000.
> - 'component name' is not unique across the cluster, so it's not a good fit 
> for the app id. 'topology name' is unique, so the proper value for the app id 
> is the topology name.
> 
> Consideration: The hostname used to determine the 'write shard' is set to the 
> hostname of the machine that runs the metrics sink. Since I don't know whether 
> reads are also sharded, I'm not sure it's safe to use TaskInfo's hostname as 
> the hostname of the TimelineMetric. Please review this part carefully.
> 
> 
> Diffs
> -
> 
>   
> ambari-metrics/ambari-metrics-storm-sink/src/main/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSink.java
>  02f5598 
>   
> ambari-metrics/ambari-metrics-storm-sink/src/test/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSinkTest.java
>  8171a4d 
> 
> Diff: https://reviews.apache.org/r/48030/diff/
> 
> 
> Testing
> ---
> 
> I tested only the ambari-metrics module since the changeset is not related 
> to other modules.
> 
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] ambari-metrics . SUCCESS [  0.896 
> s]
> [INFO] Ambari Metrics Common .. SUCCESS [ 13.386 
> s]
> [INFO] Ambari Metrics Hadoop Sink . SUCCESS [  5.085 
> s]
> [INFO] Ambari Metrics Flume Sink .. SUCCESS [  6.827 
> s]
> [INFO] Ambari Metrics Kafka Sink .. SUCCESS [  4.190 
> s]
> [INFO] Ambari Metrics Storm Sink .. SUCCESS [  1.384 
> s]
> [INFO] Ambari Metrics Collector ... SUCCESS [04:06 
> min]
> [INFO] Ambari Metrics Monitor . SUCCESS [  3.556 
> s]
> [INFO] Ambari Metrics Grafana . SUCCESS [01:03 
> min]
> [INFO] Ambari Metrics Assembly  SUCCESS [  3.567 
> s]
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
> 
> [INFO] Total time: 05:48 min
> [INFO] Finished at: 2016-05-30T16:46:07+09:00
> [INFO] Final Memory: 78M/1038M
> [INFO] 
> 
> 
> In fact, I tried to run `mvn test` from the ambari root directory, but the 
> build fails in ambari-web.
> 
> ```
> > fsevents@0.2.1 install 
> > /Users/jlim/WorkArea/JavaProjects/ambari/ambari-web/node_modules/chokidar/node_modules/fsevents
> > node-gyp rebuild
> ...
> npm WARN install:fsevents fsevents@0.2.1 install: `node-gyp rebuild`
> npm WARN install:fsevents Exit status 1
> ```
> 
> No luck with `npm install` either.
> 
> 
> Thanks,
> 
> Jungtaek Lim
> 
>



Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints

2016-06-08 Thread Jungtaek Lim

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48030/
---

(Updated June 9, 2016, 1:58 a.m.)


Review request for Ambari, Aravindan Vijayan, Sriharsha Chintalapani, and Sid 
Wagle.


Changes
---

Addressing AMBARI-17133: Replace the underscore character in the Storm metrics 
sink.

Since AMBARI-17133 has to be built on top of AMBARI-16946 and it's tiny, I 
would like AMBARI-16946 to contain AMBARI-17133.


Bugs: AMBARI-16946
https://issues.apache.org/jira/browse/AMBARI-16946


Repository: ambari


Description
---

There's a mismatch between TimelineMetricsCache and Storm's metrics model: 
TimelineMetricsCache considers "metric name + timestamp" to be unique, but Storm 
does not.

For example, assume that bolt B has tasks T1 and T2, and B has registered metric 
M1. It's possible for the metrics sink to receive (T1, M1) and (T2, M1) with the 
same timestamp TS1 (taken from TaskInfo, not the current time), and whichever is 
received later will be discarded by TimelineMetricsCache.

If we want each Storm metric point to be unique, we should use "topology name + 
component name + task id + metric name" as the metric name so that "metric name 
+ timestamp" becomes unique.

There are other issues I would like to address, too.

- Currently, the hostname is set to the hostname of the machine that runs the 
metrics sink. Since TaskInfo carries the hostname of the machine that runs the 
task, we should use that instead.
- The timestamp in TaskInfo is in seconds, while the Storm Metrics Sink treats 
it as milliseconds, resulting in wrong timestamps and broken cache eviction. It 
should be multiplied by 1000.
- 'component name' is not unique across the cluster, so it's not a good fit for 
the app id. 'topology name' is unique, so the proper value for the app id is the 
topology name.

Consideration: The hostname used to determine the 'write shard' is set to the 
hostname of the machine that runs the metrics sink. Since I don't know whether 
reads are also sharded, I'm not sure it's safe to use TaskInfo's hostname as the 
hostname of the TimelineMetric. Please review this part carefully.


Diffs (updated)
-

  
ambari-metrics/ambari-metrics-storm-sink/src/main/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSink.java
 02f5598 
  
ambari-metrics/ambari-metrics-storm-sink/src/test/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSinkTest.java
 8171a4d 

Diff: https://reviews.apache.org/r/48030/diff/


Testing
---

I tested only the ambari-metrics module since the changeset is not related to 
other modules.

[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] ambari-metrics . SUCCESS [  0.896 s]
[INFO] Ambari Metrics Common .. SUCCESS [ 13.386 s]
[INFO] Ambari Metrics Hadoop Sink . SUCCESS [  5.085 s]
[INFO] Ambari Metrics Flume Sink .. SUCCESS [  6.827 s]
[INFO] Ambari Metrics Kafka Sink .. SUCCESS [  4.190 s]
[INFO] Ambari Metrics Storm Sink .. SUCCESS [  1.384 s]
[INFO] Ambari Metrics Collector ... SUCCESS [04:06 min]
[INFO] Ambari Metrics Monitor . SUCCESS [  3.556 s]
[INFO] Ambari Metrics Grafana . SUCCESS [01:03 min]
[INFO] Ambari Metrics Assembly  SUCCESS [  3.567 s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 05:48 min
[INFO] Finished at: 2016-05-30T16:46:07+09:00
[INFO] Final Memory: 78M/1038M
[INFO] 

In fact, I tried to run `mvn test` from the ambari root directory, but the 
build fails in ambari-web.

```
> fsevents@0.2.1 install 
> /Users/jlim/WorkArea/JavaProjects/ambari/ambari-web/node_modules/chokidar/node_modules/fsevents
> node-gyp rebuild
...
npm WARN install:fsevents fsevents@0.2.1 install: `node-gyp rebuild`
npm WARN install:fsevents Exit status 1
```

No luck with `npm install` either.


Thanks,

Jungtaek Lim



Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints

2016-06-07 Thread Jungtaek Lim


> On June 7, 2016, 6:07 a.m., Sid Wagle wrote:
> > Changes look good; the only thing to consider is the change to the metric 
> > name. Cluster Aggregation will not occur at topology level since appId = 
> > topologyName for metrics with the same metric name. Is the metric name too 
> > fine-grained? The other thing to consider is whether we need task metrics 
> > to be aggregated across topologies; if yes, taskId cannot be part of the 
> > metric name.
> > 
> > Could you also add Aravindan Vijayan to the reviewers? Thanks.
> 
> Sid Wagle wrote:
> Rephrase: Cluster Aggregation will *now* occur at topology level
> 
> Jungtaek Lim wrote:
> > The other thing to consider is whether we need task metrics to be 
> > aggregated across topologies; if yes, taskId cannot be part of the metric 
> > name.
> 
> It depends on the user, but in most cases I don't think users will aggregate 
> metrics across topologies.
> 
> And as I was saying, technically there's no way to aggregate metrics on the 
> sink side since the parallelism of the sink can be set to more than 1 (I 
> mean, we could have multiple aggregation points, which breaks aggregation). 
> So we need to push task-level metrics into AMS, and the task id should be 
> included in the metric name to make it unique.
> 
> Based on that, we need `series aggregation` to aggregate task-level metrics 
> to a higher level. (I'm trying to address this via AMBARI-17027.)
> 
> The only thing which affects aggregation is appId.
> 
> - When we set appId to 'component name' (current), the same component 
> (Spout, Bolt) name across topologies can be queried together.
> - When we set appId to 'topology name', the same metric name (from Storm's 
> view) from different components in a topology can be queried together.
> - When we set appId to 'Storm', all metrics can be queried together. (The 
> metric name should then also include the topology name.)
> 
> I'm not familiar with the structure of the metrics, so I'm not sure how 
> these choices affect performance while storing / querying. I'd like to hear 
> opinions from the reviewers.
> 
> For reference, below is how storm-graphite composes the prefix of the metric 
> name. It uses topology name, component name, worker host, worker port, and 
> task id.
> 
> https://github.com/verisign/storm-graphite/blob/master/src/main/java/com/verisign/storm/metrics/GraphiteMetricsConsumer.java#L278
> 
> When querying metrics from Graphite, users can query with wildcards & a 
> series function to aggregate metrics into one, and Grafana can show that. 
> That's what I want to bring to AMS.
> 
> Aravindan Vijayan wrote:
> Jungtaek, there is also another level of classification called 
> "instanceId". Every appId can have multiple instanceIds. You should find 
> that in the TimelineMetric object. Perhaps we can use that here.
> 
> Sriharsha Chintalapani wrote:
> @Jungtaek, what Sid was saying is that we won't be able to see the 
> aggregated metrics for a single topology if we use taskId. Let's say I want 
> to see how many tuples were processed in the last hour for that topology; 
> that aggregation won't be possible using taskId.

@Aravindan
Yeah, actually I tried to use instanceId at first, but TimelineMetricCache 
(TimelineMetricHolder) only checks the metric name as the unique key while 
putting, which is what I described. Furthermore, I would want to aggregate 
task-level metrics into component-level metrics, but there's no wildcard 
support on instanceId. So it seems to be too much effort to use instanceId for 
taskId, and having taskId in the metric name is the easier way to achieve 
aggregation for now.
But I'm open to using instanceId and supporting wildcard & aggregation on it. 
Please let me know if we want to use instanceId. @Sid @Sriharsha

@Sriharsha
I guess what Sid was saying is the opposite of what you're saying. Let's assume 
cluster aggregation occurs.

A. current

appId is the component name and the metric name is just the metric name, so 
there's no distinction between topologies (the topology name is not placed 
anywhere now). So cluster aggregation will aggregate metrics across topologies, 
and we can't query the last hour of Topology T1, Bolt B1, Metric M1. There's no 
way to query within Topology T1 regardless of aggregation.

B. after patch

appId is the topology name and the metric name is component name + task id + 
metric name, so metrics are classified by topology name. So cluster aggregation 
will aggregate metrics per topology, and we can't query the last hour of Bolt 
B1, Metric M1 across all topologies.
There's no way to query across topologies regardless of aggregation.

So if we want fully flexible queries, the topology name should also be placed 
in the metric name, the same as the component name and task id.
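
To illustrate the difference, here is a small hypothetical example (the 
topology, component, and metric names are made up) of how the same data point 
would be keyed under scheme A (current) and scheme B (after the patch):

```
// Hypothetical records -- only to illustrate how appId / metric name change
// between the two schemes discussed above.
public final class AppIdSchemeSketch {
  public static void main(String[] args) {
    // A. current: appId = component name; metric name carries no topology/task info.
    String appIdA = "split-bolt";
    String metricNameA = "emit-count";

    // B. after patch: appId = topology name; metric name = component + task id + metric.
    String appIdB = "wordcount";
    String metricNameB = "split-bolt.1.emit-count";

    System.out.println("A: appId=" + appIdA + ", metricName=" + metricNameA);
    System.out.println("B: appId=" + appIdB + ", metricName=" + metricNameB);
    // Under A, aggregation mixes every topology that has a "split-bolt".
    // Under B, aggregation stays within "wordcount" but cannot span topologies
    // unless the topology name is also put into the metric name.
  }
}
```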

@Sid Please correct me if I'm wrong.

Metrics aggregation at query time is not possible for now but will be possible 
with AMBARI-17027.
https://issues.apache.org/jira/browse/AMBARI-17027

From what I'm understanding, AMS aggregates metrics by 

Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints

2016-06-07 Thread Sriharsha Chintalapani


> On June 7, 2016, 6:07 a.m., Sid Wagle wrote:
> > Changes look good; the only thing to consider is the change to the metric 
> > name. Cluster Aggregation will not occur at topology level since appId = 
> > topologyName for metrics with the same metric name. Is the metric name too 
> > fine-grained? The other thing to consider is whether we need task metrics 
> > to be aggregated across topologies; if yes, taskId cannot be part of the 
> > metric name.
> > 
> > Could you also add Aravindan Vijayan to the reviewers? Thanks.
> 
> Sid Wagle wrote:
> Rephrase: Cluster Aggregation will *now* occur at topology level
> 
> Jungtaek Lim wrote:
> > The other thing to consider is whether we need task metrics to be 
> > aggregated across topologies; if yes, taskId cannot be part of the metric 
> > name.
> 
> It depends on the user, but in most cases I don't think users will aggregate 
> metrics across topologies.
> 
> And as I was saying, technically there's no way to aggregate metrics on the 
> sink side since the parallelism of the sink can be set to more than 1 (I 
> mean, we could have multiple aggregation points, which breaks aggregation). 
> So we need to push task-level metrics into AMS, and the task id should be 
> included in the metric name to make it unique.
> 
> Based on that, we need `series aggregation` to aggregate task-level metrics 
> to a higher level. (I'm trying to address this via AMBARI-17027.)
> 
> The only thing which affects aggregation is appId.
> 
> - When we set appId to 'component name' (current), the same component 
> (Spout, Bolt) name across topologies can be queried together.
> - When we set appId to 'topology name', the same metric name (from Storm's 
> view) from different components in a topology can be queried together.
> - When we set appId to 'Storm', all metrics can be queried together. (The 
> metric name should then also include the topology name.)
> 
> I'm not familiar with the structure of the metrics, so I'm not sure how 
> these choices affect performance while storing / querying. I'd like to hear 
> opinions from the reviewers.
> 
> For reference, below is how storm-graphite composes the prefix of the metric 
> name. It uses topology name, component name, worker host, worker port, and 
> task id.
> 
> https://github.com/verisign/storm-graphite/blob/master/src/main/java/com/verisign/storm/metrics/GraphiteMetricsConsumer.java#L278
> 
> When querying metrics from Graphite, users can query with wildcards & a 
> series function to aggregate metrics into one, and Grafana can show that. 
> That's what I want to bring to AMS.
> 
> Aravindan Vijayan wrote:
> Jungtaek, there is also another level of classification called 
> "instanceId". Every appId can have multiple instanceIds. You should find 
> that in the TimelineMetric object. Perhaps we can use that here.

@Jungtaek, what Sid was saying is that we won't be able to see the aggregated 
metrics for a single topology if we use taskId. Let's say I want to see how 
many tuples were processed in the last hour for that topology; that aggregation 
won't be possible using taskId.


- Sriharsha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48030/#review136411
---


On June 7, 2016, 6:14 a.m., Jungtaek Lim wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48030/
> ---
> 
> (Updated June 7, 2016, 6:14 a.m.)
> 
> 
> Review request for Ambari, Aravindan Vijayan, Sriharsha Chintalapani, and Sid 
> Wagle.
> 
> 
> Bugs: AMBARI-16946
> https://issues.apache.org/jira/browse/AMBARI-16946
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> There's a mismatch between TimelineMetricsCache and Storm's metrics model: 
> TimelineMetricsCache considers "metric name + timestamp" to be unique, but 
> Storm does not.
> 
> For example, assume that bolt B has tasks T1 and T2, and B has registered 
> metric M1. It's possible for the metrics sink to receive (T1, M1) and (T2, M1) 
> with the same timestamp TS1 (taken from TaskInfo, not the current time), and 
> whichever is received later will be discarded by TimelineMetricsCache.
> 
> If we want each Storm metric point to be unique, we should use "topology name 
> + component name + task id + metric name" as the metric name so that "metric 
> name + timestamp" becomes unique.
> 
> There are other issues I would like to address, too.
> 
> - Currently, the hostname is set to the hostname of the machine that runs the 
> metrics sink. Since TaskInfo carries the hostname of the machine that runs the 
> task, we should use that instead.
> - The timestamp in TaskInfo is in seconds, while the Storm Metrics Sink treats 
> it as milliseconds, resulting in wrong timestamps and broken cache eviction. 
> It should be multiplied by 1000.
> - 

Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints

2016-06-07 Thread Sid Wagle


> On June 7, 2016, 6:07 a.m., Sid Wagle wrote:
> > Changes look good; the only thing to consider is the change to the metric 
> > name. Cluster Aggregation will not occur at topology level since appId = 
> > topologyName for metrics with the same metric name. Is the metric name too 
> > fine-grained? The other thing to consider is whether we need task metrics 
> > to be aggregated across topologies; if yes, taskId cannot be part of the 
> > metric name.
> > 
> > Could you also add Aravindan Vijayan to the reviewers? Thanks.

Rephrase: Cluster Aggregation will *now* occur at topology level


- Sid


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48030/#review136411
---


On May 30, 2016, 8:21 a.m., Jungtaek Lim wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48030/
> ---
> 
> (Updated May 30, 2016, 8:21 a.m.)
> 
> 
> Review request for Ambari, Sriharsha Chintalapani and Sid Wagle.
> 
> 
> Bugs: AMBARI-16946
> https://issues.apache.org/jira/browse/AMBARI-16946
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> There's a mismatch between TimelineMetricsCache and Storm's metrics model: 
> TimelineMetricsCache considers "metric name + timestamp" to be unique, but 
> Storm does not.
> 
> For example, assume that bolt B has tasks T1 and T2, and B has registered 
> metric M1. It's possible for the metrics sink to receive (T1, M1) and (T2, M1) 
> with the same timestamp TS1 (taken from TaskInfo, not the current time), and 
> whichever is received later will be discarded by TimelineMetricsCache.
> 
> If we want each Storm metric point to be unique, we should use "topology name 
> + component name + task id + metric name" as the metric name so that "metric 
> name + timestamp" becomes unique.
> 
> There are other issues I would like to address, too.
> 
> - Currently, the hostname is set to the hostname of the machine that runs the 
> metrics sink. Since TaskInfo carries the hostname of the machine that runs the 
> task, we should use that instead.
> - The timestamp in TaskInfo is in seconds, while the Storm Metrics Sink treats 
> it as milliseconds, resulting in wrong timestamps and broken cache eviction. 
> It should be multiplied by 1000.
> - 'component name' is not unique across the cluster, so it's not a good fit 
> for the app id. 'topology name' is unique, so the proper value for the app id 
> is the topology name.
> 
> Consideration: The hostname used to determine the 'write shard' is set to the 
> hostname of the machine that runs the metrics sink. Since I don't know whether 
> reads are also sharded, I'm not sure it's safe to use TaskInfo's hostname as 
> the hostname of the TimelineMetric. Please review this part carefully.
> 
> 
> Diffs
> -
> 
>   
> ambari-metrics/ambari-metrics-storm-sink/src/main/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSink.java
>  02f5598 
>   
> ambari-metrics/ambari-metrics-storm-sink/src/test/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSinkTest.java
>  8171a4d 
> 
> Diff: https://reviews.apache.org/r/48030/diff/
> 
> 
> Testing
> ---
> 
> I tested only the ambari-metrics module since the changeset is not related 
> to other modules.
> 
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] ambari-metrics . SUCCESS [  0.896 
> s]
> [INFO] Ambari Metrics Common .. SUCCESS [ 13.386 
> s]
> [INFO] Ambari Metrics Hadoop Sink . SUCCESS [  5.085 
> s]
> [INFO] Ambari Metrics Flume Sink .. SUCCESS [  6.827 
> s]
> [INFO] Ambari Metrics Kafka Sink .. SUCCESS [  4.190 
> s]
> [INFO] Ambari Metrics Storm Sink .. SUCCESS [  1.384 
> s]
> [INFO] Ambari Metrics Collector ... SUCCESS [04:06 
> min]
> [INFO] Ambari Metrics Monitor . SUCCESS [  3.556 
> s]
> [INFO] Ambari Metrics Grafana . SUCCESS [01:03 
> min]
> [INFO] Ambari Metrics Assembly  SUCCESS [  3.567 
> s]
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
> 
> [INFO] Total time: 05:48 min
> [INFO] Finished at: 2016-05-30T16:46:07+09:00
> [INFO] Final Memory: 78M/1038M
> [INFO] 
> 
> 
> In fact, I tried to run `mvn test` from the ambari root directory, but the 
> build fails in ambari-web.
> 
> ```
> > fsevents@0.2.1 install 
> > /Users/jlim/WorkArea/JavaProjects/ambari/ambari-web/node_modules/chokidar/node_modules/fsevents
> > 

Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints

2016-06-06 Thread Sriharsha Chintalapani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48030/#review136400
---


Ship it!




Ship It!

- Sriharsha Chintalapani


On May 30, 2016, 8:21 a.m., Jungtaek Lim wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48030/
> ---
> 
> (Updated May 30, 2016, 8:21 a.m.)
> 
> 
> Review request for Ambari, Sriharsha Chintalapani and Sid Wagle.
> 
> 
> Bugs: AMBARI-16946
> https://issues.apache.org/jira/browse/AMBARI-16946
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> There's a mismatch between TimelineMetricsCache and Storm's metrics model: 
> TimelineMetricsCache considers "metric name + timestamp" to be unique, but 
> Storm does not.
> 
> For example, assume that bolt B has tasks T1 and T2, and B has registered 
> metric M1. It's possible for the metrics sink to receive (T1, M1) and (T2, M1) 
> with the same timestamp TS1 (taken from TaskInfo, not the current time), and 
> whichever is received later will be discarded by TimelineMetricsCache.
> 
> If we want each Storm metric point to be unique, we should use "topology name 
> + component name + task id + metric name" as the metric name so that "metric 
> name + timestamp" becomes unique.
> 
> There are other issues I would like to address, too.
> 
> - Currently, the hostname is set to the hostname of the machine that runs the 
> metrics sink. Since TaskInfo carries the hostname of the machine that runs the 
> task, we should use that instead.
> - The timestamp in TaskInfo is in seconds, while the Storm Metrics Sink treats 
> it as milliseconds, resulting in wrong timestamps and broken cache eviction. 
> It should be multiplied by 1000.
> - 'component name' is not unique across the cluster, so it's not a good fit 
> for the app id. 'topology name' is unique, so the proper value for the app id 
> is the topology name.
> 
> Consideration: The hostname used to determine the 'write shard' is set to the 
> hostname of the machine that runs the metrics sink. Since I don't know whether 
> reads are also sharded, I'm not sure it's safe to use TaskInfo's hostname as 
> the hostname of the TimelineMetric. Please review this part carefully.
> 
> 
> Diffs
> -
> 
>   
> ambari-metrics/ambari-metrics-storm-sink/src/main/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSink.java
>  02f5598 
>   
> ambari-metrics/ambari-metrics-storm-sink/src/test/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSinkTest.java
>  8171a4d 
> 
> Diff: https://reviews.apache.org/r/48030/diff/
> 
> 
> Testing
> ---
> 
> I tested only the ambari-metrics module since the changeset is not related 
> to other modules.
> 
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] ambari-metrics . SUCCESS [  0.896 
> s]
> [INFO] Ambari Metrics Common .. SUCCESS [ 13.386 
> s]
> [INFO] Ambari Metrics Hadoop Sink . SUCCESS [  5.085 
> s]
> [INFO] Ambari Metrics Flume Sink .. SUCCESS [  6.827 
> s]
> [INFO] Ambari Metrics Kafka Sink .. SUCCESS [  4.190 
> s]
> [INFO] Ambari Metrics Storm Sink .. SUCCESS [  1.384 
> s]
> [INFO] Ambari Metrics Collector ... SUCCESS [04:06 
> min]
> [INFO] Ambari Metrics Monitor . SUCCESS [  3.556 
> s]
> [INFO] Ambari Metrics Grafana . SUCCESS [01:03 
> min]
> [INFO] Ambari Metrics Assembly  SUCCESS [  3.567 
> s]
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
> 
> [INFO] Total time: 05:48 min
> [INFO] Finished at: 2016-05-30T16:46:07+09:00
> [INFO] Final Memory: 78M/1038M
> [INFO] 
> 
> 
> In fact, I tried to run `mvn test` from the ambari root directory, but the 
> build fails in ambari-web.
> 
> ```
> > fsevents@0.2.1 install 
> > /Users/jlim/WorkArea/JavaProjects/ambari/ambari-web/node_modules/chokidar/node_modules/fsevents
> > node-gyp rebuild
> ...
> npm WARN install:fsevents fsevents@0.2.1 install: `node-gyp rebuild`
> npm WARN install:fsevents Exit status 1
> ```
> 
> No luck with `npm install` either.
> 
> 
> Thanks,
> 
> Jungtaek Lim
> 
>



Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints

2016-05-30 Thread Jungtaek Lim

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48030/
---

(Updated May 30, 2016, 8:21 a.m.)


Review request for Ambari, Sriharsha Chintalapani and Sid Wagle.


Bugs: AMBARI-16946
https://issues.apache.org/jira/browse/AMBARI-16946


Repository: ambari


Description
---

There's a mismatch between TimelineMetricsCache and Storm's metrics model: 
TimelineMetricsCache considers "metric name + timestamp" to be unique, but Storm 
does not.

For example, assume that bolt B has tasks T1 and T2, and B has registered metric 
M1. It's possible for the metrics sink to receive (T1, M1) and (T2, M1) with the 
same timestamp TS1 (taken from TaskInfo, not the current time), and whichever is 
received later will be discarded by TimelineMetricsCache.

If we want each Storm metric point to be unique, we should use "topology name + 
component name + task id + metric name" as the metric name so that "metric name 
+ timestamp" becomes unique.

There are other issues I would like to address, too.

- Currently, the hostname is set to the hostname of the machine that runs the 
metrics sink. Since TaskInfo carries the hostname of the machine that runs the 
task, we should use that instead.
- The timestamp in TaskInfo is in seconds, while the Storm Metrics Sink treats 
it as milliseconds, resulting in wrong timestamps and broken cache eviction. It 
should be multiplied by 1000.
- 'component name' is not unique across the cluster, so it's not a good fit for 
the app id. 'topology name' is unique, so the proper value for the app id is the 
topology name.

Consideration: The hostname used to determine the 'write shard' is set to the 
hostname of the machine that runs the metrics sink. Since I don't know whether 
reads are also sharded, I'm not sure it's safe to use TaskInfo's hostname as the 
hostname of the TimelineMetric. Please review this part carefully.


Diffs
-

  
ambari-metrics/ambari-metrics-storm-sink/src/main/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSink.java
 02f5598 
  
ambari-metrics/ambari-metrics-storm-sink/src/test/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSinkTest.java
 8171a4d 

Diff: https://reviews.apache.org/r/48030/diff/


Testing
---

I tested only the ambari-metrics module since the changeset is not related to 
other modules.

[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] ambari-metrics . SUCCESS [  0.896 s]
[INFO] Ambari Metrics Common .. SUCCESS [ 13.386 s]
[INFO] Ambari Metrics Hadoop Sink . SUCCESS [  5.085 s]
[INFO] Ambari Metrics Flume Sink .. SUCCESS [  6.827 s]
[INFO] Ambari Metrics Kafka Sink .. SUCCESS [  4.190 s]
[INFO] Ambari Metrics Storm Sink .. SUCCESS [  1.384 s]
[INFO] Ambari Metrics Collector ... SUCCESS [04:06 min]
[INFO] Ambari Metrics Monitor . SUCCESS [  3.556 s]
[INFO] Ambari Metrics Grafana . SUCCESS [01:03 min]
[INFO] Ambari Metrics Assembly  SUCCESS [  3.567 s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 05:48 min
[INFO] Finished at: 2016-05-30T16:46:07+09:00
[INFO] Final Memory: 78M/1038M
[INFO] 

In fact, I tried to run `mvn test` from the ambari root directory, but the 
build fails in ambari-web.

```
> fsevents@0.2.1 install 
> /Users/jlim/WorkArea/JavaProjects/ambari/ambari-web/node_modules/chokidar/node_modules/fsevents
> node-gyp rebuild
...
npm WARN install:fsevents fsevents@0.2.1 install: `node-gyp rebuild`
npm WARN install:fsevents Exit status 1
```

No luck with `npm install` either.


Thanks,

Jungtaek Lim