Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints

Aravindan Vijayan Tue, 07 Jun 2016 08:06:11 -0700


> On June 7, 2016, 6:07 a.m., Sid Wagle wrote:
> > Changes look good, only thing to consider is the changes to the metric 
> > name. Cluster Aggregation will not occur at topology level since appId = 
> > topologyName for metrics with the same metric name. Is the metric name to 
> > fine grained? Only thing to consider is whether we need task metrics to be 
> > aggregated across topology? If yes, taskId cannot be part of the metric 
> > name. 
> > 
> > Could you also add Aravindan Vijayan to the reviewers? Thanks.
> 
> Sid Wagle wrote:
>     Rephrase: Cluster Aggregation will *now* occur at topology level
> 
> Jungtaek Lim wrote:
>     > Only thing to consider is whether we need task metrics to be aggregated 
> across topology? If yes, taskId cannot be part of the metric name. 
>     
>     It depends on users, but most cases I don't think user will aggregate 
> metrics across topologies.
>     
>     And like what I was saying, technically there're no way to aggregate 
> metrics on sink side since parallelism of sink can be set to more than 1 (I 
> mean, we could have multiple aggregation points which breaks aggregation.)
>     So we need to push task level metrics into AMS, and task id should be 
> included as metric name for making it unique.
>     
>     Based on that, we need `series aggregation` to aggregate task level 
> metrics by higher level. (I'm trying to address this via AMBARI-17027)
>     
>     The only thing which affects aggregation is appId.
>     
>     - When we set appId to 'component name' (current), same component (Spout, 
> Bolt) name across topologies can be queried together.
>     - When we set appId to 'topology name', same metric name (for Storm's 
> view) from different components in topology can be queried together.
>     - When we set appId to 'Storm', all metrics can be queried together. 
> (metric name should also include topology name as well)
>     
>     I'm not familiar with structure of metrics so I'm not sure how they 
> affects performance while storing / querying. So I'd like to hear opinions on 
> reviewers.
>     
>     For reference, below is how storm-graphite composes prefix of metric 
> name. It uses topology name, component name, worker host, worker port, task 
> id.
>     
> https://github.com/verisign/storm-graphite/blob/master/src/main/java/com/verisign/storm/metrics/GraphiteMetricsConsumer.java#L278
>     
>     When querying metrics from Graphite users can query with wildcards & 
> series function to aggregate metrics into one and Grafana can show that. 
> That's what I want to address to AMS.


Jungtaek, there is also another level of classification called "instancedId" 
Every appId can have multiple instanceIds. You should find that in the 
TimelineMetric object. Perhaps, we can use that here.


- Aravindan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48030/#review136411
-----------------------------------------------------------


On June 7, 2016, 6:14 a.m., Jungtaek Lim wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48030/
> -----------------------------------------------------------
> 
> (Updated June 7, 2016, 6:14 a.m.)
> 
> 
> Review request for Ambari, Aravindan Vijayan, Sriharsha Chintalapani, and Sid 
> Wagle.
> 
> 
> Bugs: AMBARI-16946
>     https://issues.apache.org/jira/browse/AMBARI-16946
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> There's a mismatch between TimelineMetricsCache and Storm metrics unit, while 
> TimelineMetricsCache considers "metric name + timestamp" to be unique but 
> Storm is not.
> 
> For example, assume that bolt B has task T1, T2 and B has registered metrics 
> M1. It's possible for metrics sink to receive (T1, M1) and (T2, M1) with same 
> timestamp TS1 (in TaskInfo, not current time), and received later will be 
> discarded from TimelineMetricsCache.
> 
> If we want to have unique metric point of Storm, we should use "topology name 
> + component name + task id + metric name" to metric name so that "metric name 
> + timestamp" will be unique.
> 
> There're other issues I would like to address, too.
> 
> - Currently, hostname is written to hostname of the machine which runs 
> metrics sink. Since TaskInfo has hostname of the machine which runs task, 
> we're better to use this.
> - Unit of timestamp of TaskInfo is second, while Storm Metrics Sink uses this 
> as millisecond, resulting in timestamp flaw, and malfunction of cache 
> eviction. It should be multiplied by 1000.
> - 'component name' is not unique across the cluster, so it's not fit for app 
> id. 'topology name' is unique so proper value of app id is topology name.
> 
> Consideration: Hostname for determining 'write shard' is set to hostname of 
> the machine which runs metrics sink. Since I don't know read also be sharded, 
> I'm not sure it's safe to use TaskInfo's hostname to hostname of 
> TimelineMetric. Please review carefully regarding this.
> 
> 
> Diffs
> -----
> 
>   
> ambari-metrics/ambari-metrics-storm-sink/src/main/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSink.java
>  02f5598 
>   
> ambari-metrics/ambari-metrics-storm-sink/src/test/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSinkTest.java
>  8171a4d 
> 
> Diff: https://reviews.apache.org/r/48030/diff/
> 
> 
> Testing
> -------
> 
> I tested this only ambari-metrics module since changeset is not related on 
> other modules.
> 
> [INFO] 
> ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO]
> [INFO] ambari-metrics ..................................... SUCCESS [  0.896 
> s]
> [INFO] Ambari Metrics Common .............................. SUCCESS [ 13.386 
> s]
> [INFO] Ambari Metrics Hadoop Sink ......................... SUCCESS [  5.085 
> s]
> [INFO] Ambari Metrics Flume Sink .......................... SUCCESS [  6.827 
> s]
> [INFO] Ambari Metrics Kafka Sink .......................... SUCCESS [  4.190 
> s]
> [INFO] Ambari Metrics Storm Sink .......................... SUCCESS [  1.384 
> s]
> [INFO] Ambari Metrics Collector ........................... SUCCESS [04:06 
> min]
> [INFO] Ambari Metrics Monitor ............................. SUCCESS [  3.556 
> s]
> [INFO] Ambari Metrics Grafana ............................. SUCCESS [01:03 
> min]
> [INFO] Ambari Metrics Assembly ............................ SUCCESS [  3.567 
> s]
> [INFO] 
> ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] 
> ------------------------------------------------------------------------
> [INFO] Total time: 05:48 min
> [INFO] Finished at: 2016-05-30T16:46:07+09:00
> [INFO] Final Memory: 78M/1038M
> [INFO] 
> ------------------------------------------------------------------------
> 
> In fact, I tried to run `mvn test` from ambari root directory but build is 
> failing from ambari-web.
> 
> ```
> > [email protected] install 
> > /Users/jlim/WorkArea/JavaProjects/ambari/ambari-web/node_modules/chokidar/node_modules/fsevents
> > node-gyp rebuild
> ...
> npm WARN install:fsevents [email protected] install: `node-gyp rebuild`
> npm WARN install:fsevents Exit status 1
> ```
> 
> No luck on `npm install`, too.
> 
> 
> Thanks,
> 
> Jungtaek Lim
> 
>

Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints

Reply via email to