> On June 7, 2016, 6:07 a.m., Sid Wagle wrote:
> > Changes look good; the only thing to consider is the changes to the metric 
> > name. Cluster Aggregation will not occur at topology level since appId = 
> > topologyName for metrics with the same metric name. Is the metric name too 
> > fine-grained? The only thing to consider is whether we need task metrics to 
> > be aggregated across topologies. If yes, taskId cannot be part of the 
> > metric name. 
> > 
> > Could you also add Aravindan Vijayan to the reviewers? Thanks.
> 
> Sid Wagle wrote:
>     Rephrase: Cluster Aggregation will *now* occur at topology level
> 
> Jungtaek Lim wrote:
>     > Only thing to consider is whether we need task metrics to be aggregated 
> across topology? If yes, taskId cannot be part of the metric name. 
>     
>     It depends on the user, but in most cases I don't think users will 
> aggregate metrics across topologies.
>     
>     And as I was saying, technically there's no way to aggregate metrics on 
> the sink side, since the parallelism of the sink can be set to more than 1 
> (that is, we could have multiple aggregation points, which breaks 
> aggregation).
>     So we need to push task-level metrics into AMS, and the task id should 
> be included in the metric name to make it unique.
>     
>     Based on that, we need `series aggregation` to aggregate task-level 
> metrics at a higher level. (I'm trying to address this via AMBARI-17027.)
>     
>     The only thing which affects aggregation is appId.
>     
>     - When we set appId to 'component name' (current), the same component 
> (Spout, Bolt) name across topologies can be queried together.
>     - When we set appId to 'topology name', the same metric name (from 
> Storm's view) from different components within a topology can be queried 
> together.
>     - When we set appId to 'Storm', all metrics can be queried together. 
> (The metric name should then also include the topology name.)
>     
>     I'm not familiar with the structure of metrics, so I'm not sure how 
> these choices affect storing/querying performance. I'd like to hear the 
> reviewers' opinions.
>     
>     For reference, below is how storm-graphite composes the prefix of the 
> metric name. It uses the topology name, component name, worker host, worker 
> port, and task id.
>     
> https://github.com/verisign/storm-graphite/blob/master/src/main/java/com/verisign/storm/metrics/GraphiteMetricsConsumer.java#L278
>     
>     When querying metrics from Graphite, users can query with wildcards & 
> series functions to aggregate metrics into one, and Grafana can show that. 
> That's what I want to bring to AMS.
> 
> Aravindan Vijayan wrote:
>     Jungtaek, there is also another level of classification called 
> "instanceId". Every appId can have multiple instanceIds. You should find 
> that in the TimelineMetric object. Perhaps we can use that here.
> 
> Sriharsha Chintalapani wrote:
>     @Jungtaek, what Sid was saying is that we won't be able to see the 
> aggregated metrics for a single topology if we use taskId. Let's say I want 
> to see how many tuples were processed in the last hour for that topology; 
> that aggregation won't be possible using taskId.

@Aravindan
Yeah, actually I tried to use instanceId at first, but TimelineMetricCache 
(TimelineMetricHolder) only checks the metric name as the unique key when 
putting, which is what I described. Furthermore, I would want to aggregate 
task-level metrics into component-level metrics, but there's no wildcard 
support on instanceId. So using instanceId for taskId seems to take too much 
effort, and having the taskId in the metric name is the easier way to achieve 
aggregation for now.
But I'm open to using instanceId, and to supporting wildcard & aggregation on 
it. Please let me know if we want to use instanceId. @Sid @Sriharsha
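To illustrate the collision I mean, here is a minimal sketch (the class name 
and metric names are hypothetical, not the actual TimelineMetricCache code) of 
a cache keyed only by the metric name, as TimelineMetricHolder effectively is:

```java
import java.util.HashMap;
import java.util.Map;

public class CacheKeySketch {
    public static void main(String[] args) {
        // Keyed only by metric name: two tasks reporting the same metric
        // at the same timestamp collide, and one data point is lost.
        Map<String, Double> cache = new HashMap<>();
        cache.put("M1", 10.0); // from task 1
        cache.put("M1", 20.0); // from task 2, overwrites task 1's point
        System.out.println(cache.size()); // 1

        // Including the task id in the metric name keeps both points.
        cache.clear();
        cache.put("B1.1.M1", 10.0);
        cache.put("B1.2.M1", 20.0);
        System.out.println(cache.size()); // 2
    }
}
```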

@Sriharsha
I guess what Sid was saying is the opposite of what you're saying. Let's 
pretend cluster aggregation occurs.

A. current

appId is the component name and the metric name is just the metric name, so 
there's no distinction between topologies (the topology name is not placed 
anywhere now). So cluster aggregation will aggregate metrics across 
topologies, and we can't query the last hour of Topology T1, Bolt B1, Metric 
M1. There's no way to query within Topology T1, regardless of aggregation.

B. after patch

appId is the topology name and the metric name is component name + task id + 
metric name, so metrics are classified by topology name. So cluster 
aggregation will aggregate metrics per topology, and we can't query the last 
hour of Bolt B1, Metric M1 across all topologies.
There's no way to query across topologies, regardless of aggregation.

So if we want fully flexible queries, the topology name should also be placed 
in the metric name, just like the component name and task id.
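As a minimal sketch of that naming (the composer method and the dot separator 
are my own illustration, not the patch's actual code):

```java
public class MetricNameSketch {
    // Hypothetical composer: topology + component + task id + metric name.
    static String compose(String topology, String component, int taskId,
                          String metric) {
        return topology + "." + component + "." + taskId + "." + metric;
    }

    public static void main(String[] args) {
        // With the topology in the name, a query can fix the first segment
        // to stay within one topology, or wildcard it to span topologies.
        System.out.println(compose("T1", "B1", 3, "M1")); // T1.B1.3.M1
    }
}
```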

@Sid Please correct me if I'm wrong.

Metrics aggregation at query time is not possible for now, but will be 
possible with AMBARI-17027.
https://issues.apache.org/jira/browse/AMBARI-17027

From what I understand, AMS aggregates metrics at the host level and stores 
them in other tables.
But there could be other kinds of aggregations, so having flexible aggregation 
would be great.
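For illustration only (the flat naming scheme and the sum rollup are my 
assumptions, not AMBARI-17027's actual design), rolling task-level series up 
to component level could look like:

```java
import java.util.HashMap;
import java.util.Map;

public class SeriesAggregationSketch {
    public static void main(String[] args) {
        // Task-level points named "component.taskId.metric".
        Map<String, Double> taskSeries = new HashMap<>();
        taskSeries.put("B1.1.M1", 10.0);
        taskSeries.put("B1.2.M1", 20.0);
        taskSeries.put("B2.1.M1", 5.0);

        // Roll up to component level by dropping the taskId segment
        // and summing the values per "component.metric" key.
        Map<String, Double> componentSeries = new HashMap<>();
        for (Map.Entry<String, Double> e : taskSeries.entrySet()) {
            String[] parts = e.getKey().split("\\.");
            String key = parts[0] + "." + parts[2];
            componentSeries.merge(key, e.getValue(), Double::sum);
        }
        System.out.println(componentSeries.get("B1.M1")); // 30.0
        System.out.println(componentSeries.get("B2.M1")); // 5.0
    }
}
```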


- Jungtaek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48030/#review136411
-----------------------------------------------------------


On June 7, 2016, 6:14 a.m., Jungtaek Lim wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48030/
> -----------------------------------------------------------
> 
> (Updated June 7, 2016, 6:14 a.m.)
> 
> 
> Review request for Ambari, Aravindan Vijayan, Sriharsha Chintalapani, and Sid 
> Wagle.
> 
> 
> Bugs: AMBARI-16946
>     https://issues.apache.org/jira/browse/AMBARI-16946
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> There's a mismatch between TimelineMetricsCache and Storm's metric unit: 
> TimelineMetricsCache considers "metric name + timestamp" to be unique, but 
> for Storm it is not.
> 
> For example, assume that bolt B has tasks T1 and T2, and B has registered 
> metric M1. It's possible for the metrics sink to receive (T1, M1) and (T2, 
> M1) with the same timestamp TS1 (from TaskInfo, not the current time), and 
> the one received later will be discarded from TimelineMetricsCache.
> 
> If we want a unique metric point for Storm, we should use "topology name + 
> component name + task id + metric name" as the metric name so that "metric 
> name + timestamp" will be unique.
> 
> There are other issues I would like to address, too.
> 
> - Currently, the hostname is written as the hostname of the machine which 
> runs the metrics sink. Since TaskInfo has the hostname of the machine which 
> runs the task, we're better off using that.
> - The unit of TaskInfo's timestamp is seconds, while the Storm Metrics Sink 
> uses it as milliseconds, resulting in a timestamp flaw and malfunctioning 
> cache eviction. It should be multiplied by 1000.
> - The 'component name' is not unique across the cluster, so it's not a fit 
> for the app id. The 'topology name' is unique, so the proper value for the 
> app id is the topology name.
> 
> Consideration: the hostname for determining the 'write shard' is set to the 
> hostname of the machine which runs the metrics sink. Since I don't know 
> whether reads are also sharded, I'm not sure it's safe to use TaskInfo's 
> hostname as the hostname of the TimelineMetric. Please review this carefully.
> 
> 
> Diffs
> -----
> 
>   
> ambari-metrics/ambari-metrics-storm-sink/src/main/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSink.java
>  02f5598 
>   
> ambari-metrics/ambari-metrics-storm-sink/src/test/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSinkTest.java
>  8171a4d 
> 
> Diff: https://reviews.apache.org/r/48030/diff/
> 
> 
> Testing
> -------
> 
> I tested only the ambari-metrics module, since the changeset is not related 
> to other modules.
> 
> [INFO] 
> ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO]
> [INFO] ambari-metrics ..................................... SUCCESS [  0.896 
> s]
> [INFO] Ambari Metrics Common .............................. SUCCESS [ 13.386 
> s]
> [INFO] Ambari Metrics Hadoop Sink ......................... SUCCESS [  5.085 
> s]
> [INFO] Ambari Metrics Flume Sink .......................... SUCCESS [  6.827 
> s]
> [INFO] Ambari Metrics Kafka Sink .......................... SUCCESS [  4.190 
> s]
> [INFO] Ambari Metrics Storm Sink .......................... SUCCESS [  1.384 
> s]
> [INFO] Ambari Metrics Collector ........................... SUCCESS [04:06 
> min]
> [INFO] Ambari Metrics Monitor ............................. SUCCESS [  3.556 
> s]
> [INFO] Ambari Metrics Grafana ............................. SUCCESS [01:03 
> min]
> [INFO] Ambari Metrics Assembly ............................ SUCCESS [  3.567 
> s]
> [INFO] 
> ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] 
> ------------------------------------------------------------------------
> [INFO] Total time: 05:48 min
> [INFO] Finished at: 2016-05-30T16:46:07+09:00
> [INFO] Final Memory: 78M/1038M
> [INFO] 
> ------------------------------------------------------------------------
> 
> In fact, I tried to run `mvn test` from the ambari root directory, but the 
> build fails in ambari-web.
> 
> ```
> > fsevents@0.2.1 install 
> > /Users/jlim/WorkArea/JavaProjects/ambari/ambari-web/node_modules/chokidar/node_modules/fsevents
> > node-gyp rebuild
> ...
> npm WARN install:fsevents fsevents@0.2.1 install: `node-gyp rebuild`
> npm WARN install:fsevents Exit status 1
> ```
> 
> No luck with `npm install`, either.
> 
> 
> Thanks,
> 
> Jungtaek Lim
> 
>
