Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48030/#review136841 ---

This patch has been committed. Please close this review.

- Aravindan Vijayan

On June 9, 2016, 1:58 a.m., Jungtaek Lim wrote:
> --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48030/ ---
>
> (Updated June 9, 2016, 1:58 a.m.)
>
> Review request for Ambari, Aravindan Vijayan, Sriharsha Chintalapani, and Sid Wagle.
>
> Bugs: AMBARI-16946
>     https://issues.apache.org/jira/browse/AMBARI-16946
>
> Repository: ambari
>
> Description
> ---
>
> There's a mismatch between TimelineMetricsCache and Storm's metrics model: TimelineMetricsCache considers "metric name + timestamp" to be unique, but in Storm it is not.
>
> For example, assume that bolt B has tasks T1 and T2, and that B has registered metric M1. The metrics sink can receive (T1, M1) and (T2, M1) with the same timestamp TS1 (from TaskInfo, not the current time), and the one received later will be discarded by TimelineMetricsCache.
>
> If we want each Storm metric point to be unique, we should use "topology name + component name + task id + metric name" as the metric name, so that "metric name + timestamp" becomes unique.
>
> There are other issues I would like to address, too.
>
> - Currently, the hostname is set to the hostname of the machine which runs the metrics sink. Since TaskInfo carries the hostname of the machine which runs the task, we had better use that.
> - The timestamp in TaskInfo is in seconds, while the Storm Metrics Sink treats it as milliseconds, resulting in wrong timestamps and broken cache eviction. It should be multiplied by 1000.
> - 'component name' is not unique across the cluster, so it is not a good fit for app id. 'topology name' is unique, so the proper value for app id is the topology name.
>
> Consideration: the hostname used to determine the 'write shard' is set to the hostname of the machine which runs the metrics sink. Since I don't know whether reads are also sharded, I'm not sure it's safe to use TaskInfo's hostname as the hostname of the TimelineMetric. Please review this part carefully.
>
> Diffs
> ---
>
> ambari-metrics/ambari-metrics-storm-sink/src/main/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSink.java 02f5598
> ambari-metrics/ambari-metrics-storm-sink/src/test/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSinkTest.java 8171a4d
>
> Diff: https://reviews.apache.org/r/48030/diff/
>
> Testing
> ---
>
> I tested only the ambari-metrics module, since the changeset is not related to other modules.
>
> [INFO] Reactor Summary:
> [INFO] ambari-metrics ............... SUCCESS [  0.896 s]
> [INFO] Ambari Metrics Common ........ SUCCESS [ 13.386 s]
> [INFO] Ambari Metrics Hadoop Sink ... SUCCESS [  5.085 s]
> [INFO] Ambari Metrics Flume Sink .... SUCCESS [  6.827 s]
> [INFO] Ambari Metrics Kafka Sink .... SUCCESS [  4.190 s]
> [INFO] Ambari Metrics Storm Sink .... SUCCESS [  1.384 s]
> [INFO] Ambari Metrics Collector ..... SUCCESS [04:06 min]
> [INFO] Ambari Metrics Monitor ....... SUCCESS [  3.556 s]
> [INFO] Ambari Metrics Grafana ....... SUCCESS [01:03 min]
> [INFO] Ambari Metrics Assembly ...... SUCCESS [  3.567 s]
> [INFO] BUILD SUCCESS
> [INFO] Total time: 05:48 min
> [INFO] Finished at: 2016-05-30T16:46:07+09:00
> [INFO] Final Memory: 78M/1038M
>
> In fact, I tried to run `mvn test` from the ambari root directory, but the build is failing in ambari-web.
>
> ```
> > fsevents@0.2.1 install /Users/jlim/WorkArea/JavaProjects/ambari/ambari-web/node_modules/chokidar/node_modules/fsevents
> > node-gyp rebuild
> ...
> npm WARN install:fsevents fsevents@0.2.1 install: `node-gyp rebuild`
> npm WARN install:fsevents Exit status 1
> ```
>
> No luck with `npm install`, either.
>
> Thanks,
> Jungtaek Lim
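For illustration, the naming and timestamp fixes described above can be sketched as follows. This is a minimal sketch with hypothetical helper names, not the actual patch code: it composes "topology name + component name + task id + metric name" so that two tasks of the same component no longer collide, and converts TaskInfo's seconds timestamp to the milliseconds AMS expects.

```java
// Hypothetical sketch of the proposed metric-name scheme and the
// seconds-to-milliseconds fix; names are illustrative, not the patch's.
public class MetricNameSketch {
    // "topology + component + taskId + metric" makes the name unique,
    // so "metric name + timestamp" becomes unique in the cache.
    static String uniqueMetricName(String topology, String component, int taskId, String metric) {
        return topology + "." + component + "." + taskId + "." + metric;
    }

    // TaskInfo.timestamp is in seconds; the sink must multiply by 1000
    // before handing it to AMS, which works in milliseconds.
    static long toMillis(long taskInfoTimestampSecs) {
        return taskInfoTimestampSecs * 1000L;
    }

    public static void main(String[] args) {
        // Two tasks T1=1, T2=2 of bolt "B" reporting metric "M1" at the
        // same TaskInfo timestamp no longer share a cache key:
        String k1 = uniqueMetricName("topo1", "B", 1, "M1");
        String k2 = uniqueMetricName("topo1", "B", 2, "M1");
        System.out.println(k1.equals(k2)); // false
        System.out.println(toMillis(1465437480L)); // 1465437480000
    }
}
```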
Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints
> On June 7, 2016, 6:07 a.m., Sid Wagle wrote:
> > Changes look good; the only thing to consider is the changes to the metric name. Cluster Aggregation will not occur at topology level since appId = topologyName for metrics with the same metric name. Is the metric name too fine grained? The only thing to consider is whether we need task metrics to be aggregated across the topology. If yes, taskId cannot be part of the metric name.
> >
> > Could you also add Aravindan Vijayan to the reviewers? Thanks.
>
> Sid Wagle wrote:
>     Rephrase: Cluster Aggregation will *now* occur at topology level
>
> Jungtaek Lim wrote:
>     > The only thing to consider is whether we need task metrics to be aggregated across the topology. If yes, taskId cannot be part of the metric name.
>
>     It depends on the users, but in most cases I don't think users will aggregate metrics across topologies.
>
>     And as I was saying, technically there's no way to aggregate metrics on the sink side, since the parallelism of the sink can be set to more than 1 (that is, we could have multiple aggregation points, which breaks aggregation). So we need to push task-level metrics into AMS, and the task id should be included in the metric name to make it unique.
>
>     Based on that, we need `series aggregation` to aggregate task-level metrics to a higher level. (I'm trying to address this via AMBARI-17027.)
>
>     The only thing which affects aggregation is appId.
>
>     - When we set appId to 'component name' (current), the same component (Spout, Bolt) name across topologies can be queried together.
>     - When we set appId to 'topology name', the same metric name (from Storm's view) from different components in a topology can be queried together.
>     - When we set appId to 'Storm', all metrics can be queried together. (The metric name should then also include the topology name.)
>
>     I'm not familiar with the structure of the metrics, so I'm not sure how these choices affect storing / querying performance. So I'd like to hear the reviewers' opinions.
>
>     For reference, below is how storm-graphite composes the prefix of the metric name. It uses topology name, component name, worker host, worker port, and task id.
>     https://github.com/verisign/storm-graphite/blob/master/src/main/java/com/verisign/storm/metrics/GraphiteMetricsConsumer.java#L278
>
>     When querying metrics from Graphite, users can query with wildcards and a series function to aggregate metrics into one, and Grafana can show that. That's what I want to bring to AMS.
>
> Aravindan Vijayan wrote:
>     Jungtaek, there is also another level of classification called "instanceId". Every appId can have multiple instanceIds. You should find that in the TimelineMetric object. Perhaps we can use that here.
>
> Sriharsha Chintalapani wrote:
>     @Jungtaek, what Sid was saying is that we won't be able to see the aggregated metrics for a single topology if we use taskId. Let's say I want to see how many tuples were processed in the last hour for that topology; that aggregation won't be possible using taskId.
>
> Jungtaek Lim wrote:
>     @Aravindan
>     Yeah, actually I tried to use instanceId at first, but TimelineMetricCache (TimelineMetricHolder) only checks the metric name as the unique key while putting, which is what I described. Furthermore, I would want to aggregate task-level metrics into component-level metrics, but there's no wildcard support on instanceId. So it seems to take too much effort to use instanceId for taskId, and having taskId in the metric name is the easier way to achieve aggregation for now.
>     But I'm open to using instanceId, and to supporting wildcard & aggregation on it. Please let me know if we want to use instanceId. @Sid @Sriharsha
>
>     @Sriharsha
>     I guess what Sid was saying is the opposite of what you're saying. Let's assume cluster aggregation occurs.
>
>     A. current
>
>     appId is the component name and the metric name is just the metric name, so there's no distinction between topologies. (The topology name is not placed anywhere now.) So cluster aggregation will aggregate metrics across topologies, and we can't query the last 1 hour of Topology T1, Bolt B1, Metric M1. There's no way to query within Topology T1, regardless of aggregation.
>
>     B. after patch
>
>     appId is the topology name and the metric name is component name + task id + metric name, so metrics are classified by topology name. So cluster aggregation will aggregate metrics for each topology, and we can't query the last 1 hour of all topologies, Bolt B1, Metric M1. There's no way to query across topologies, regardless of aggregation.
>
>     So if we want fully flexible queries, the topology name should also be placed in the metric name, the same as the component name and task id.
>
>     @Sid Please correct me if I'm wrong.
>
>     Metrics aggregation at query time is not possible for now, but will be possible with AMBARI-17027.
>     https://issues.apache.org/jira/browse/AMBARI-17027
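The A/B comparison above can be sketched as key layouts. This is a hypothetical sketch of how a data point is classified under each scheme, not AMS's actual storage model:

```java
// Sketch of how a data point is keyed under the two schemes discussed
// above (hypothetical helpers; AMS's real storage model differs).
public class AppIdSchemes {
    // A. current: appId = component name; the topology is not recorded
    //    anywhere, so cluster aggregation mixes topologies.
    static String[] current(String topology, String component, int taskId, String metric) {
        return new String[] { component, metric };
    }

    // B. after patch: appId = topology name; the metric name carries
    //    component + task id, so aggregation stays per-topology.
    static String[] afterPatch(String topology, String component, int taskId, String metric) {
        return new String[] { topology, component + "." + taskId + "." + metric };
    }

    public static void main(String[] args) {
        String[] a = current("T1", "B1", 7, "M1");
        String[] b = afterPatch("T1", "B1", 7, "M1");
        // appId / metric name under each scheme:
        System.out.println(a[0] + " / " + a[1]); // B1 / M1
        System.out.println(b[0] + " / " + b[1]); // T1 / B1.7.M1
    }
}
```

Under scheme A the pair (B1, M1) is identical for every topology, which is why per-topology queries are impossible; under scheme B the topology is the appId, which is why cross-topology queries become impossible instead.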
Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48030/#review136767 ---

Ship it!

Ship It!

- Sriharsha Chintalapani

On June 9, 2016, 1:58 a.m., Jungtaek Lim wrote:
> [...]
Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48030/ ---

(Updated June 9, 2016, 1:58 a.m.)

Review request for Ambari, Aravindan Vijayan, Sriharsha Chintalapani, and Sid Wagle.

Changes
---

Addressing AMBARI-17133: Replace underscore character from Storm metrics sink. AMBARI-17133 needs to be built on top of AMBARI-16946, but since it is tiny, I would like AMBARI-16946 to contain AMBARI-17133.

Bugs: AMBARI-16946
    https://issues.apache.org/jira/browse/AMBARI-16946

Repository: ambari

Description
---

There's a mismatch between TimelineMetricsCache and Storm's metrics model: TimelineMetricsCache considers "metric name + timestamp" to be unique, but in Storm it is not.

For example, assume that bolt B has tasks T1 and T2, and that B has registered metric M1. The metrics sink can receive (T1, M1) and (T2, M1) with the same timestamp TS1 (from TaskInfo, not the current time), and the one received later will be discarded by TimelineMetricsCache.

If we want each Storm metric point to be unique, we should use "topology name + component name + task id + metric name" as the metric name, so that "metric name + timestamp" becomes unique.

There are other issues I would like to address, too.

- Currently, the hostname is set to the hostname of the machine which runs the metrics sink. Since TaskInfo carries the hostname of the machine which runs the task, we had better use that.
- The timestamp in TaskInfo is in seconds, while the Storm Metrics Sink treats it as milliseconds, resulting in wrong timestamps and broken cache eviction. It should be multiplied by 1000.
- 'component name' is not unique across the cluster, so it is not a good fit for app id. 'topology name' is unique, so the proper value for app id is the topology name.

Consideration: the hostname used to determine the 'write shard' is set to the hostname of the machine which runs the metrics sink. Since I don't know whether reads are also sharded, I'm not sure it's safe to use TaskInfo's hostname as the hostname of the TimelineMetric. Please review this part carefully.

Diffs (updated)
---

ambari-metrics/ambari-metrics-storm-sink/src/main/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSink.java 02f5598
ambari-metrics/ambari-metrics-storm-sink/src/test/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSinkTest.java 8171a4d

Diff: https://reviews.apache.org/r/48030/diff/

Testing
---

I tested only the ambari-metrics module, since the changeset is not related to other modules.

[INFO] Reactor Summary:
[INFO] ambari-metrics ............... SUCCESS [  0.896 s]
[INFO] Ambari Metrics Common ........ SUCCESS [ 13.386 s]
[INFO] Ambari Metrics Hadoop Sink ... SUCCESS [  5.085 s]
[INFO] Ambari Metrics Flume Sink .... SUCCESS [  6.827 s]
[INFO] Ambari Metrics Kafka Sink .... SUCCESS [  4.190 s]
[INFO] Ambari Metrics Storm Sink .... SUCCESS [  1.384 s]
[INFO] Ambari Metrics Collector ..... SUCCESS [04:06 min]
[INFO] Ambari Metrics Monitor ....... SUCCESS [  3.556 s]
[INFO] Ambari Metrics Grafana ....... SUCCESS [01:03 min]
[INFO] Ambari Metrics Assembly ...... SUCCESS [  3.567 s]
[INFO] BUILD SUCCESS
[INFO] Total time: 05:48 min
[INFO] Finished at: 2016-05-30T16:46:07+09:00
[INFO] Final Memory: 78M/1038M

In fact, I tried to run `mvn test` from the ambari root directory, but the build is failing in ambari-web.

```
> fsevents@0.2.1 install /Users/jlim/WorkArea/JavaProjects/ambari/ambari-web/node_modules/chokidar/node_modules/fsevents
> node-gyp rebuild
...
npm WARN install:fsevents fsevents@0.2.1 install: `node-gyp rebuild`
npm WARN install:fsevents Exit status 1
```

No luck with `npm install`, either.

Thanks,

Jungtaek Lim
Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints
> On June 7, 2016, 6:07 a.m., Sid Wagle wrote:
> > Changes look good; the only thing to consider is the changes to the metric name. Cluster Aggregation will not occur at topology level since appId = topologyName for metrics with the same metric name. Is the metric name too fine grained? The only thing to consider is whether we need task metrics to be aggregated across the topology. If yes, taskId cannot be part of the metric name.
> >
> > Could you also add Aravindan Vijayan to the reviewers? Thanks.
>
> Sid Wagle wrote:
>     Rephrase: Cluster Aggregation will *now* occur at topology level
>
> Jungtaek Lim wrote:
>     > The only thing to consider is whether we need task metrics to be aggregated across the topology. If yes, taskId cannot be part of the metric name.
>
>     It depends on the users, but in most cases I don't think users will aggregate metrics across topologies.
>
>     And as I was saying, technically there's no way to aggregate metrics on the sink side, since the parallelism of the sink can be set to more than 1 (that is, we could have multiple aggregation points, which breaks aggregation). So we need to push task-level metrics into AMS, and the task id should be included in the metric name to make it unique.
>
>     Based on that, we need `series aggregation` to aggregate task-level metrics to a higher level. (I'm trying to address this via AMBARI-17027.)
>
>     The only thing which affects aggregation is appId.
>
>     - When we set appId to 'component name' (current), the same component (Spout, Bolt) name across topologies can be queried together.
>     - When we set appId to 'topology name', the same metric name (from Storm's view) from different components in a topology can be queried together.
>     - When we set appId to 'Storm', all metrics can be queried together. (The metric name should then also include the topology name.)
>
>     I'm not familiar with the structure of the metrics, so I'm not sure how these choices affect storing / querying performance. So I'd like to hear the reviewers' opinions.
>
>     For reference, below is how storm-graphite composes the prefix of the metric name. It uses topology name, component name, worker host, worker port, and task id.
>     https://github.com/verisign/storm-graphite/blob/master/src/main/java/com/verisign/storm/metrics/GraphiteMetricsConsumer.java#L278
>
>     When querying metrics from Graphite, users can query with wildcards and a series function to aggregate metrics into one, and Grafana can show that. That's what I want to bring to AMS.
>
> Aravindan Vijayan wrote:
>     Jungtaek, there is also another level of classification called "instanceId". Every appId can have multiple instanceIds. You should find that in the TimelineMetric object. Perhaps we can use that here.
>
> Sriharsha Chintalapani wrote:
>     @Jungtaek, what Sid was saying is that we won't be able to see the aggregated metrics for a single topology if we use taskId. Let's say I want to see how many tuples were processed in the last hour for that topology; that aggregation won't be possible using taskId.

@Aravindan
Yeah, actually I tried to use instanceId at first, but TimelineMetricCache (TimelineMetricHolder) only checks the metric name as the unique key while putting, which is what I described. Furthermore, I would want to aggregate task-level metrics into component-level metrics, but there's no wildcard support on instanceId. So it seems to take too much effort to use instanceId for taskId, and having taskId in the metric name is the easier way to achieve aggregation for now.
But I'm open to using instanceId, and to supporting wildcard & aggregation on it. Please let me know if we want to use instanceId. @Sid @Sriharsha

@Sriharsha
I guess what Sid was saying is the opposite of what you're saying. Let's assume cluster aggregation occurs.

A. current

appId is the component name and the metric name is just the metric name, so there's no distinction between topologies. (The topology name is not placed anywhere now.) So cluster aggregation will aggregate metrics across topologies, and we can't query the last 1 hour of Topology T1, Bolt B1, Metric M1. There's no way to query within Topology T1, regardless of aggregation.

B. after patch

appId is the topology name and the metric name is component name + task id + metric name, so metrics are classified by topology name. So cluster aggregation will aggregate metrics for each topology, and we can't query the last 1 hour of all topologies, Bolt B1, Metric M1. There's no way to query across topologies, regardless of aggregation.

So if we want fully flexible queries, the topology name should also be placed in the metric name, the same as the component name and task id.

@Sid Please correct me if I'm wrong.

Metrics aggregation at query time is not possible for now, but will be possible with AMBARI-17027.
https://issues.apache.org/jira/browse/AMBARI-17027

From what I'm understanding, AMS aggregates metrics by
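The series aggregation the thread keeps coming back to (rolling task-level series up to a component-level series, as wildcard + series functions do in Graphite/Grafana) can be sketched like this. Everything here is hypothetical illustration, not AMS's or AMBARI-17027's actual API; the match is a simplified prefix test standing in for a wildcard pattern like `T1.B1.*`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of query-time series aggregation: summing
// task-level series (topology.component.taskId.metric) whose names
// share a component prefix into one component-level series.
public class SeriesAggregationSketch {
    static Map<Long, Double> aggregate(Map<String, Map<Long, Double>> series, String prefix) {
        Map<Long, Double> out = new TreeMap<>();
        for (Map.Entry<String, Map<Long, Double>> e : series.entrySet()) {
            // Simplified stand-in for a wildcard match such as "T1.B1.*".
            if (!e.getKey().startsWith(prefix + ".")) continue;
            for (Map.Entry<Long, Double> point : e.getValue().entrySet()) {
                out.merge(point.getKey(), point.getValue(), Double::sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Map<Long, Double>> series = new HashMap<>();
        // Two tasks of bolt B1 in topology T1 reporting the same metric.
        series.put("T1.B1.1.ack-count", new TreeMap<>(Map.of(1000L, 3.0, 2000L, 5.0)));
        series.put("T1.B1.2.ack-count", new TreeMap<>(Map.of(1000L, 2.0, 2000L, 4.0)));
        // Component-level view of bolt B1:
        System.out.println(aggregate(series, "T1.B1")); // {1000=5.0, 2000=9.0}
    }
}
```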
Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints
> On June 7, 2016, 6:07 a.m., Sid Wagle wrote:
> > Changes look good; the only thing to consider is the changes to the metric name. Cluster Aggregation will not occur at topology level since appId = topologyName for metrics with the same metric name. Is the metric name too fine grained? The only thing to consider is whether we need task metrics to be aggregated across the topology. If yes, taskId cannot be part of the metric name.
> >
> > Could you also add Aravindan Vijayan to the reviewers? Thanks.
> [...]

@Jungtaek, what Sid was saying is that we won't be able to see the aggregated metrics for a single topology if we use taskId. Let's say I want to see how many tuples were processed in the last hour for that topology; that aggregation won't be possible using taskId.

- Sriharsha

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48030/#review136411 ---

On June 7, 2016, 6:14 a.m., Jungtaek Lim wrote:
> [...]
Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints
> On June 7, 2016, 6:07 a.m., Sid Wagle wrote:
> > Changes look good; the only thing to consider is the changes to the metric name. Cluster Aggregation will not occur at topology level since appId = topologyName for metrics with the same metric name. Is the metric name too fine grained? The only thing to consider is whether we need task metrics to be aggregated across the topology. If yes, taskId cannot be part of the metric name.
> >
> > Could you also add Aravindan Vijayan to the reviewers? Thanks.

Rephrase: Cluster Aggregation will *now* occur at topology level

- Sid

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48030/#review136411 ---

On May 30, 2016, 8:21 a.m., Jungtaek Lim wrote:
> [...]
Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48030/#review136400
---

Ship it!

Ship It!

- Sriharsha Chintalapani


On May 30, 2016, 8:21 a.m., Jungtaek Lim wrote:

> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48030/
> ---
>
> (Updated May 30, 2016, 8:21 a.m.)
>
>
> Review request for Ambari, Sriharsha Chintalapani and Sid Wagle.
>
>
> Bugs: AMBARI-16946
>     https://issues.apache.org/jira/browse/AMBARI-16946
>
>
> Repository: ambari
>
>
> Description
> ---
>
> There is a mismatch between what TimelineMetricsCache and Storm consider the unit of a metric: TimelineMetricsCache assumes "metric name + timestamp" is unique, but in Storm it is not.
>
> For example, assume bolt B has tasks T1 and T2, and B has registered metric M1. The metrics sink can receive (T1, M1) and (T2, M1) with the same timestamp TS1 (taken from TaskInfo, not the current time), and whichever is received later is discarded by TimelineMetricsCache.
>
> If we want each Storm metric point to be unique, we should use "topology name + component name + task id + metric name" as the metric name, so that "metric name + timestamp" becomes unique.
>
> There are other issues I would like to address, too:
>
> - Currently, the hostname written is that of the machine running the metrics sink. Since TaskInfo carries the hostname of the machine running the task, we had better use that instead.
> - The timestamp in TaskInfo is in seconds, while the Storm metrics sink treats it as milliseconds, resulting in wrong timestamps and broken cache eviction. It should be multiplied by 1000.
> - 'component name' is not unique across the cluster, so it is not a good fit for app id. 'topology name' is unique, so the proper app id is the topology name.
>
> Consideration: the hostname used to determine the 'write shard' is that of the machine running the metrics sink. Since I don't know whether reads are also sharded, I'm not sure it's safe to use TaskInfo's hostname as the hostname of TimelineMetric. Please review this point carefully.
>
>
> Diffs
> -----
>
>   ambari-metrics/ambari-metrics-storm-sink/src/main/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSink.java 02f5598
>   ambari-metrics/ambari-metrics-storm-sink/src/test/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSinkTest.java 8171a4d
>
> Diff: https://reviews.apache.org/r/48030/diff/
>
>
> Testing
> ---
>
> I tested only the ambari-metrics module, since the changeset is not related to other modules.
>
> [INFO] Reactor Summary:
> [INFO]
> [INFO] ambari-metrics ............................ SUCCESS [  0.896 s]
> [INFO] Ambari Metrics Common ..................... SUCCESS [ 13.386 s]
> [INFO] Ambari Metrics Hadoop Sink ................ SUCCESS [  5.085 s]
> [INFO] Ambari Metrics Flume Sink ................. SUCCESS [  6.827 s]
> [INFO] Ambari Metrics Kafka Sink ................. SUCCESS [  4.190 s]
> [INFO] Ambari Metrics Storm Sink ................. SUCCESS [  1.384 s]
> [INFO] Ambari Metrics Collector .................. SUCCESS [04:06 min]
> [INFO] Ambari Metrics Monitor .................... SUCCESS [  3.556 s]
> [INFO] Ambari Metrics Grafana .................... SUCCESS [01:03 min]
> [INFO] Ambari Metrics Assembly ................... SUCCESS [  3.567 s]
> [INFO]
> [INFO] BUILD SUCCESS
> [INFO]
> [INFO] Total time: 05:48 min
> [INFO] Finished at: 2016-05-30T16:46:07+09:00
> [INFO] Final Memory: 78M/1038M
>
> In fact, I tried to run `mvn test` from the ambari root directory, but the build fails in ambari-web:
>
> ```
> > fsevents@0.2.1 install
> > /Users/jlim/WorkArea/JavaProjects/ambari/ambari-web/node_modules/chokidar/node_modules/fsevents
> > node-gyp rebuild
> ...
> npm WARN install:fsevents fsevents@0.2.1 install: `node-gyp rebuild`
> npm WARN install:fsevents Exit status 1
> ```
>
> No luck with `npm install`, either.
>
>
> Thanks,
>
> Jungtaek Lim
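[Editor's note] The timestamp-unit issue discussed in the review (TaskInfo carries seconds, while the sink treated the value as milliseconds) comes down to a single conversion. A minimal sketch; the class and helper name (`toMillis`) are illustrative, not taken from the committed patch:

```java
// Sketch: convert a Storm TaskInfo timestamp (epoch seconds) to the
// epoch milliseconds expected by TimelineMetric. Without this, a
// seconds-valued timestamp is interpreted as a point in January 1970,
// which breaks time-based cache eviction.
public class TimestampUnitSketch {

    // Multiply by 1000 to go from seconds to milliseconds.
    static long toMillis(long taskInfoTimestampSecs) {
        return taskInfoTimestampSecs * 1000L;
    }

    public static void main(String[] args) {
        long tsSecs = 1464594367L;          // some epoch-seconds value
        System.out.println(toMillis(tsSecs)); // 1464594367000
    }
}
```

Using `long` arithmetic (the `1000L` literal) avoids any risk of `int` overflow for current epoch values.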
Re: Review Request 48030: AMBARI-16946 Storm Metrics Sink has high chance to discard some datapoints
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48030/
---

(Updated May 30, 2016, 8:21 a.m.)


Review request for Ambari, Sriharsha Chintalapani and Sid Wagle.


Bugs: AMBARI-16946
    https://issues.apache.org/jira/browse/AMBARI-16946


Repository: ambari


Description
---

There is a mismatch between what TimelineMetricsCache and Storm consider the unit of a metric: TimelineMetricsCache assumes "metric name + timestamp" is unique, but in Storm it is not.

For example, assume bolt B has tasks T1 and T2, and B has registered metric M1. The metrics sink can receive (T1, M1) and (T2, M1) with the same timestamp TS1 (taken from TaskInfo, not the current time), and whichever is received later is discarded by TimelineMetricsCache.

If we want each Storm metric point to be unique, we should use "topology name + component name + task id + metric name" as the metric name, so that "metric name + timestamp" becomes unique.

There are other issues I would like to address, too:

- Currently, the hostname written is that of the machine running the metrics sink. Since TaskInfo carries the hostname of the machine running the task, we had better use that instead.
- The timestamp in TaskInfo is in seconds, while the Storm metrics sink treats it as milliseconds, resulting in wrong timestamps and broken cache eviction. It should be multiplied by 1000.
- 'component name' is not unique across the cluster, so it is not a good fit for app id. 'topology name' is unique, so the proper app id is the topology name.

Consideration: the hostname used to determine the 'write shard' is that of the machine running the metrics sink. Since I don't know whether reads are also sharded, I'm not sure it's safe to use TaskInfo's hostname as the hostname of TimelineMetric. Please review this point carefully.
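[Editor's note] The naming scheme proposed above ("topology name + component name + task id + metric name") can be sketched as a simple key builder. The `.`-separator, class name, and helper name are assumptions for illustration, not the committed implementation:

```java
// Sketch: make "metric name + timestamp" unique per task by prefixing
// the metric name with topology name, component name and task id.
public class MetricNameSketch {

    // Build the fully-qualified metric name for one (task, metric) pair.
    static String uniqueMetricName(String topology, String component,
                                   int taskId, String metricName) {
        return topology + "." + component + "." + taskId + "." + metricName;
    }

    public static void main(String[] args) {
        // Two tasks of the same bolt reporting the same metric at the same
        // TaskInfo timestamp no longer collide in TimelineMetricsCache.
        String m1 = uniqueMetricName("wordcount", "splitBolt", 1, "emit-count");
        String m2 = uniqueMetricName("wordcount", "splitBolt", 2, "emit-count");
        System.out.println(m1); // wordcount.splitBolt.1.emit-count
        System.out.println(m2); // wordcount.splitBolt.2.emit-count
    }
}
```

With the task id folded into the name, the cache key "metric name + timestamp" distinguishes (T1, M1) from (T2, M1) even when both carry the same TaskInfo timestamp.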
Diffs
-----

  ambari-metrics/ambari-metrics-storm-sink/src/main/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSink.java 02f5598
  ambari-metrics/ambari-metrics-storm-sink/src/test/java/org/apache/hadoop/metrics2/sink/storm/StormTimelineMetricsSinkTest.java 8171a4d

Diff: https://reviews.apache.org/r/48030/diff/


Testing
---

I tested only the ambari-metrics module, since the changeset is not related to other modules.

[INFO] Reactor Summary:
[INFO]
[INFO] ambari-metrics ............................ SUCCESS [  0.896 s]
[INFO] Ambari Metrics Common ..................... SUCCESS [ 13.386 s]
[INFO] Ambari Metrics Hadoop Sink ................ SUCCESS [  5.085 s]
[INFO] Ambari Metrics Flume Sink ................. SUCCESS [  6.827 s]
[INFO] Ambari Metrics Kafka Sink ................. SUCCESS [  4.190 s]
[INFO] Ambari Metrics Storm Sink ................. SUCCESS [  1.384 s]
[INFO] Ambari Metrics Collector .................. SUCCESS [04:06 min]
[INFO] Ambari Metrics Monitor .................... SUCCESS [  3.556 s]
[INFO] Ambari Metrics Grafana .................... SUCCESS [01:03 min]
[INFO] Ambari Metrics Assembly ................... SUCCESS [  3.567 s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 05:48 min
[INFO] Finished at: 2016-05-30T16:46:07+09:00
[INFO] Final Memory: 78M/1038M

In fact, I tried to run `mvn test` from the ambari root directory, but the build fails in ambari-web:

```
> fsevents@0.2.1 install
> /Users/jlim/WorkArea/JavaProjects/ambari/ambari-web/node_modules/chokidar/node_modules/fsevents
> node-gyp rebuild
...
npm WARN install:fsevents fsevents@0.2.1 install: `node-gyp rebuild`
npm WARN install:fsevents Exit status 1
```

No luck with `npm install`, either.


Thanks,

Jungtaek Lim