GitHub user rezasafi opened a pull request: https://github.com/apache/spark/pull/21916
    [SPARK-24958][WIP] Report executors' process tree total memory information to heartbeat signals

    This is work in progress for SPARK-24958, and this PR is opened on top of the PR for SPARK-23429:
    https://github.com/apache/spark/pull/21221/

    To view only the changes related to SPARK-24958, see:
    https://github.com/rezasafi/spark/pull/1

    Spark executors' process tree total memory information can be really useful, but it is currently not available. The goal of this PR is to compute this information for each executor, add it to the heartbeat signals, and compute the peaks at the driver.

    This PR was tested by running the current unit tests and the ones added by the PR for SPARK-23429. I have also tested it on our internal cluster and verified that it is working.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rezasafi/spark ptreememory

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21916.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21916

----
commit c8e8abedbdfec6e92b0c63e90f3c2c5755fd8978
Author: Edwina Lu <edlu@...>
Date:   2018-03-09T23:39:36Z

    SPARK-23429: Add executor memory metrics to heartbeat and expose in executors REST API

    Add new executor level memory metrics (JVM used memory, on/off heap execution memory, on/off heap storage memory), and expose via the executors REST API. This information will help provide insight into how executor and driver JVM memory is used, and for the different memory regions. It can be used to help determine good values for spark.executor.memory, spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.

    Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory.
    The new ExecutorMetrics will be sent by executors to the driver as part of Heartbeat. A heartbeat will be added for the driver as well, to collect these metrics for the driver.

    Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there is a new peak value for any of the memory metrics for an executor and stage. Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize additional logging.

    Modify the AppStatusListener to record the peak values for each memory metric.

    Add the new memory metrics to the executors REST API.

commit 5d6ae1c34bf6618754e4b8b2e756a9a7b4bad987
Author: Edwina Lu <edlu@...>
Date:   2018-04-02T02:13:41Z

    modify MimaExcludes.scala to filter changes to SparkListenerExecutorMetricsUpdate

commit ad10d2814bbfbaf8c21fcbb1abe83ef7a8e9ffe7
Author: Edwina Lu <edlu@...>
Date:   2018-04-22T00:02:57Z

    Address code review comments, change event logging to stage end.

commit 10ed328bfcf160711e7619aac23472f97bf1c976
Author: Edwina Lu <edlu@...>
Date:   2018-05-15T00:24:22Z

    Add configuration parameter spark.eventLog.logExecutorMetricsUpdates.enabled to enable/disable executor metrics update logging.

    Code review comments.

commit 2d2036760a298c7434eb4816c1bf045c43713e6f
Author: Imran Rashid <irashid@...>
Date:   2018-05-23T19:37:26Z

    wip on enum based metrics

commit f904f1e0bc3fab90db7f7aa7cfcf71b9fb26e890
Author: Imran Rashid <irashid@...>
Date:   2018-05-23T20:50:26Z

    wip ... has both enum and non-enum version

commit c502ec4c7f55083356187c2906d24440d0168d2f
Author: Imran Rashid <irashid@...>
Date:   2018-05-23T21:23:44Z

    case objects, mostly complete

commit 7879e66eed22cfd4dff2367c0ee3138369243711
Author: edwinalu <edwina.lu@...>
Date:   2018-06-03T02:31:14Z

    Merge pull request #1 from squito/metric_enums

    Metric enums

commit 2662f6f9c6a7c34cea34b748f6735eb1625b73cb
Author: Edwina Lu <edlu@...>
Date:   2018-06-10T21:34:19Z

    Address comments (move heartbeater from DAGScheduler to SparkContext, move logic for getting metrics to Heartbeater), and modify tests for the new ExecutorMetrics format.

commit 287133597f819417f96ae5965895c1b640703d86
Author: Edwina Lu <edlu@...>
Date:   2018-03-09T23:39:36Z

    SPARK-23429: Add executor memory metrics to heartbeat and expose in executors REST API

    Add new executor level memory metrics (JVM used memory, on/off heap execution memory, on/off heap storage memory), and expose via the executors REST API. This information will help provide insight into how executor and driver JVM memory is used, and for the different memory regions. It can be used to help determine good values for spark.executor.memory, spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.

    Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory.
    The new ExecutorMetrics will be sent by executors to the driver as part of Heartbeat. A heartbeat will be added for the driver as well, to collect these metrics for the driver.

    Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there is a new peak value for any of the memory metrics for an executor and stage. Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize additional logging.

    Modify the AppStatusListener to record the peak values for each memory metric.

    Add the new memory metrics to the executors REST API.

commit da83f2e58ff7d495111a0c1f36bf54ebcf35d444
Author: Edwina Lu <edlu@...>
Date:   2018-04-02T02:13:41Z

    modify MimaExcludes.scala to filter changes to SparkListenerExecutorMetricsUpdate

commit f25a44b95e4e6a8532c6541ee985789dff5bc7de
Author: Edwina Lu <edlu@...>
Date:   2018-04-22T00:02:57Z

    Address code review comments, change event logging to stage end.

commit ca85c8219f46e3265b8191e82a4017c2cb97fc49
Author: Edwina Lu <edlu@...>
Date:   2018-05-15T00:24:22Z

    Add configuration parameter spark.eventLog.logExecutorMetricsUpdates.enabled to enable/disable executor metrics update logging.

    Code review comments.

commit 8b74ba8fff21b499e7cc9d93f9864831aa29773e
Author: Imran Rashid <irashid@...>
Date:   2018-05-23T19:37:26Z

    wip on enum based metrics

commit 036148cdbe60b7ad7ff318260580896ad0da6cd0
Author: Imran Rashid <irashid@...>
Date:   2018-05-23T20:50:26Z

    wip ... has both enum and non-enum version

commit 91fb1db09504fc4386477ab51221d28240c3c901
Author: Imran Rashid <irashid@...>
Date:   2018-05-23T21:23:44Z

    case objects, mostly complete

commit 2d8894a91f4a0dacd49114dc74cc97b7c9426879
Author: Edwina Lu <edlu@...>
Date:   2018-06-10T21:34:19Z

    Address comments (move heartbeater from DAGScheduler to SparkContext, move logic for getting metrics to Heartbeater), and modify tests for the new ExecutorMetrics format.

commit 99044e6ec0cdc1b760c57dd5b7e74349384c6a98
Author: Edwina Lu <edlu@...>
Date:   2018-06-14T00:15:00Z

    Merge branch 'SPARK-23429.2' of https://github.com/edwinalu/spark into SPARK-23429.2

commit 263c8c846265b6bdfdce471e44c163ab85b930a3
Author: Edwina Lu <edlu@...>
Date:   2018-06-14T23:52:11Z

    code review comments

commit 812fdcf3961bae2a4fa20b4f60e739b45233fcd0
Author: Edwina Lu <edlu@...>
Date:   2018-06-22T23:53:23Z

    code review comments:
    - remove timestamp
    - change ExecutorMetrics to Array[Long]
    - create new SparkListenerStageExecutorMetrics for recording stage executor metric peaks in the history log

    Fix issue where metrics for a removed executor were ignored (save dead executors while there are currently active stages that the executor was alive for).

commit 7ed42a5d0eb0b93bb9ddecf14d9461c80dfe1ea0
Author: Edwina Lu <edlu@...>
Date:   2018-06-28T18:41:58Z

    Address code review comments. Also make executorUpdates in SparkListenerExecutorMetricsUpdate not optional. These are no longer logged, and backward compatibility should not be an issue. These events should only be used to send task and executor updates for heartbeats, and executors and driver should be the same Spark version.

commit 8d9acdf32984c0c9c621a058b45805872bb9e4c5
Author: Edwina Lu <edlu@...>
Date:   2018-06-29T23:27:51Z

    Revert and make executorUpdates in SparkListenerExecutorMetricsUpdate optional again, in case of existing users of SparkListenerExecutorMetricsUpdate.

commit 20799d2af7b70334534be913f7defea6d6b79ffb
Author: Edwina Lu <edlu@...>
Date:   2018-07-25T18:02:45Z

    code review comments: hide array implementation of executor metrics, and add ExecutorMetrics, with getMetricValue() method for accessing executor metric values. Rename MetricGetter to ExecutorMetricType.

    Should ExecutorMetricType be moved to executor package, or ExecutorMetrics be moved to metrics package? Should Json (de)serialization functions be moved from api.scala to ExecutorMetrics?

commit 8905d231c3a959f70266223d3546b17a655cee39
Author: Edwina Lu <edlu@...>
Date:   2018-07-25T20:49:09Z

    merge with master

commit 81dd2e519fb269a90515f5167f3d8f425515b661
Author: Reza Safi <rezasafi@...>
Date:   2018-07-26T21:33:52Z

    Integration of ProcessTreeMetrics with PR 21221

commit 26dc46bde1506a3718d74fc5edac20856c609a88
Author: Reza Safi <rezasafi@...>
Date:   2018-07-27T15:03:59Z

    Some improvements in integration

commit d60e255de5e90d8529c7d6496b95ceae2ae20be3
Author: Reza Safi <rezasafi@...>
Date:   2018-07-27T15:05:34Z

    Integration with the unit tests of the upstream open PR

commit d8c3293e9cd7238fef5b4c517b23ac05f1d83508
Author: Reza Safi <rezasafi@...>
Date:   2018-07-28T05:44:19Z

    Fix an issue with memory info computation.

commit ee9ba5985741da26cb148009760b546353d0cb34
Author: Reza Safi <rezasafi@...>
Date:   2018-07-28T06:09:20Z

    Fix scalastyle errors

----
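
For readers who want a concrete picture of the goal stated in the PR description (compute a process tree total per executor, report it with heartbeats, and keep the peak at the driver), here is a minimal Python sketch. It is not the PR's Scala code. The names `process_tree_total` and `PeakTracker`, and the in-memory process table mapping each pid to its parent pid and memory in bytes, are hypothetical and chosen only for illustration; the PR text does not say how the process tree is discovered or which memory figure is summed.

```python
from collections import defaultdict, deque
from typing import Dict, Tuple


def process_tree_total(root_pid: int, procs: Dict[int, Tuple[int, int]]) -> int:
    """Sum memory over root_pid and all of its descendants.

    procs is a hypothetical process table: pid -> (parent pid, memory in bytes).
    Returns 0 if root_pid is not in the table.
    """
    # Build a parent -> children index once so the walk is linear.
    children = defaultdict(list)
    for pid, (ppid, _mem) in procs.items():
        children[ppid].append(pid)

    total = 0
    seen = set()
    queue = deque([root_pid])
    while queue:
        pid = queue.popleft()
        if pid in seen or pid not in procs:
            continue  # guard against cycles and unknown pids
        seen.add(pid)
        total += procs[pid][1]
        queue.extend(children[pid])
    return total


class PeakTracker:
    """Driver-side bookkeeping: remember the largest value seen per executor."""

    def __init__(self) -> None:
        self._peaks: Dict[str, int] = {}

    def on_heartbeat(self, executor_id: str, tree_total: int) -> bool:
        """Record one heartbeat value; return True if it is a new peak."""
        if tree_total > self._peaks.get(executor_id, -1):
            self._peaks[executor_id] = tree_total
            return True
        return False

    def peak(self, executor_id: str) -> int:
        return self._peaks[executor_id]
```

In this sketch each executor would call `process_tree_total` with its own pid before a heartbeat, and the driver would pass the reported value to `PeakTracker.on_heartbeat`; a `True` result marks the moments when a new peak could be recorded.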