GitHub user mareksimunek opened a pull request: https://github.com/apache/spark/pull/22485
[SPARK-18364][YARN] Expose metrics for YarnShuffleService

## What changes were proposed in this pull request?

This PR is a follow-up to https://github.com/apache/spark/pull/17401, which was closed only due to inactivity, but it is still a nice feature to have.

The review by @jerryshao was taken into consideration, and the following changes were made:
- deleted `@VisibleForTesting` because of dependency conflicts
- removed unnecessary reflection for `MetricsSystemImpl`
- added more available types for gauge

## How was this patch tested?

Manually deployed the new yarn-shuffle jar into a Node Manager and verified that the metrics appear in the Node Manager-standard location. This is JMX with a query endpoint running on `hostname:port`.

Resulting metrics look like this:
```
curl -sk -XGET hostname:port | grep -v '#' | grep 'shuffleService'
hadoop_nodemanager_openblockrequestlatencymillis_rate15{name="shuffleService",} 0.31428910657834713
hadoop_nodemanager_blocktransferratebytes_rate15{name="shuffleService",} 566144.9983653595
hadoop_nodemanager_blocktransferratebytes_ratemean{name="shuffleService",} 2464409.9678099006
hadoop_nodemanager_openblockrequestlatencymillis_rate1{name="shuffleService",} 1.2893844732240272
hadoop_nodemanager_registeredexecutorssize{name="shuffleService",} 2.0
hadoop_nodemanager_openblockrequestlatencymillis_ratemean{name="shuffleService",} 1.255574678369966
hadoop_nodemanager_openblockrequestlatencymillis_count{name="shuffleService",} 315.0
hadoop_nodemanager_openblockrequestlatencymillis_rate5{name="shuffleService",} 0.7661929192569739
hadoop_nodemanager_registerexecutorrequestlatencymillis_ratemean{name="shuffleService",} 0.0
hadoop_nodemanager_registerexecutorrequestlatencymillis_count{name="shuffleService",} 0.0
hadoop_nodemanager_registerexecutorrequestlatencymillis_rate1{name="shuffleService",} 0.0
hadoop_nodemanager_registerexecutorrequestlatencymillis_rate5{name="shuffleService",} 0.0
hadoop_nodemanager_blocktransferratebytes_count{name="shuffleService",} 6.18271213E8
hadoop_nodemanager_registerexecutorrequestlatencymillis_rate15{name="shuffleService",} 0.0
hadoop_nodemanager_blocktransferratebytes_rate5{name="shuffleService",} 1154114.4881816586
hadoop_nodemanager_blocktransferratebytes_rate1{name="shuffleService",} 574745.0749848988
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mareksimunek/spark SPARK-18364

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22485.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22485

----
commit 5d9434bbb2fa650e987bc2e68d183aea691f9ac5
Author: Andrew Ash <andrew@...>
Date:   2017-03-23T02:59:38Z

    [SPARK-18364][YARN] Expose metrics for YarnShuffleService

    Registers the shuffle server's metrics with the Hadoop Node Manager's DefaultMetricsSystem.

    Test metric collector gets right converted calls

    camel-case shuffleService

    Pass scalastyle

    Reformat and organize imports

    With import order specified at http://spark.apache.org/contributing.html

commit 6c96397536af57a8bbe8dd2529547427f643512b
Author: marek.simunek <marek.simunek@...>
Date:   2018-09-19T15:17:53Z

    [SPARK-18364][YARN] YarnShuffleService metrics correction

----
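
For anyone who wants to check the exported values programmatically rather than with `grep`, here is a minimal sketch that parses lines shaped like the metrics output in the testing section and keeps only the samples labelled `name="shuffleService"`. It is not part of the patch: the function name `parse_metrics`, the `SAMPLE` text (including its comment line), and the assumption that every sample has the form `metric{label="value",} number` are illustrative only.

```python
import re

# Assumed sample shape, taken from the output in the PR description:
#   metric_name{label="value",} 1.23
LINE_RE = re.compile(
    r'^(?P<metric>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text, source="shuffleService"):
    """Return {metric_name: float} for samples whose `name` label equals `source`."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' comment lines (what `grep -v '#'` does above).
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("name") == source:
            result[match.group("metric")] = float(match.group("value"))
    return result


# Hypothetical input: three metric lines copied from the PR output,
# preceded by an invented comment line.
SAMPLE = """\
# example comment line
hadoop_nodemanager_registeredexecutorssize{name="shuffleService",} 2.0
hadoop_nodemanager_openblockrequestlatencymillis_count{name="shuffleService",} 315.0
hadoop_nodemanager_blocktransferratebytes_count{name="shuffleService",} 6.18271213E8
"""

if __name__ == "__main__":
    for name, value in sorted(parse_metrics(SAMPLE).items()):
        print(name, value)
```

In practice the input text would come from the same `hostname:port` endpoint queried with `curl` above; the sketch deliberately leaves the HTTP call out.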