[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API
[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060097#comment-16060097 ]

Jose Soltren commented on SPARK-20391:
--------------------------------------

So, this is months old now and irrelevant, but since you pinged me, I'll say that jerryshao's changes look fine to me. Thanks.

> Properly rename the memory related fields in ExecutorSummary REST API
> ---------------------------------------------------------------------
>
>                 Key: SPARK-20391
>                 URL: https://issues.apache.org/jira/browse/SPARK-20391
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Saisai Shao
>            Assignee: Saisai Shao
>            Priority: Blocker
>             Fix For: 2.2.0
>
>
> Currently in Spark we can get the executor summary through the REST API
> {{/api/v1/applications//executors}}. The format of the executor summary
> is:
> {code}
> class ExecutorSummary private[spark](
>     val id: String,
>     val hostPort: String,
>     val isActive: Boolean,
>     val rddBlocks: Int,
>     val memoryUsed: Long,
>     val diskUsed: Long,
>     val totalCores: Int,
>     val maxTasks: Int,
>     val activeTasks: Int,
>     val failedTasks: Int,
>     val completedTasks: Int,
>     val totalTasks: Int,
>     val totalDuration: Long,
>     val totalGCTime: Long,
>     val totalInputBytes: Long,
>     val totalShuffleRead: Long,
>     val totalShuffleWrite: Long,
>     val isBlacklisted: Boolean,
>     val maxMemory: Long,
>     val executorLogs: Map[String, String],
>     val onHeapMemoryUsed: Option[Long],
>     val offHeapMemoryUsed: Option[Long],
>     val maxOnHeapMemory: Option[Long],
>     val maxOffHeapMemory: Option[Long])
> {code}
> Here are 6 memory related fields: {{memoryUsed}}, {{maxMemory}},
> {{onHeapMemoryUsed}}, {{offHeapMemoryUsed}}, {{maxOnHeapMemory}},
> {{maxOffHeapMemory}}.
> All 6 of these fields reflect the *storage* memory usage in Spark, but from
> their names a user can't really tell whether they refer to *storage* memory
> or to the total memory (storage memory + execution memory). This will be
> misleading.
> So I think we should properly rename these fields to reflect their real
> meanings. Or we should document it.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API
[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976376#comment-15976376 ]

Apache Spark commented on SPARK-20391:
--------------------------------------

User 'jerryshao' has created a pull request for this issue:
https://github.com/apache/spark/pull/17700
[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API
[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976319#comment-15976319 ]

Saisai Shao commented on SPARK-20391:
-------------------------------------

I'm in favor of using a new REST API to define memory-related metrics for executors, rather than adding more fields to {{ExecutorSummary}}. So here I will only rename these 4 newly added fields:

{code}
val onHeapMemoryUsed: Option[Long],
val offHeapMemoryUsed: Option[Long],
val maxOnHeapMemory: Option[Long],
val maxOffHeapMemory: Option[Long]
{code}

I will leave {{maxMemory}} and {{memoryUsed}} as they were. We could properly define a new API, {{ExecutorMemoryMetrics}}, that includes all the memory usage mentioned above.
[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API
[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976298#comment-15976298 ]

Saisai Shao commented on SPARK-20391:
-------------------------------------

bq. I assume managed memory here is spark.memory.fraction on heap + spark.memory.offHeap.size?

{{totalManagedMemory}} should be equal to spark.memory.fraction + spark.memory.offHeap.size, but {{totalStorageMemory}} is no larger than {{totalManagedMemory}}. At the beginning, when no job is running, {{totalStorageMemory}} == {{totalManagedMemory}}; once execution memory is consumed, {{totalStorageMemory}} < {{totalManagedMemory}}.

Here we have two problems in the block manager:

1. All the memory tracked in the block manager is storage memory, so we should clarify the naming, which is the purpose of this JIRA.
2. The block manager only gets the initial snapshot of storage memory ({{totalStorageMemory}} == {{totalManagedMemory}}). Because {{totalStorageMemory}} varies at runtime, the {{memRemaining}} tracked in {{StorageStatus}} is not accurate. This could be addressed in another JIRA.
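The relationship described in this comment can be sketched roughly as follows. This is a simplified model, not Spark's code: `StorageMemoryModel`, `MemorySnapshot`, and the three helper functions are hypothetical names, and the model assumes storage may use whatever managed memory execution is not currently holding.

```scala
// Simplified model of the comment above; all names are hypothetical, not Spark APIs.
object StorageMemoryModel {
  // All sizes are in bytes.
  final case class MemorySnapshot(
      totalManagedMemory: Long,
      executionUsed: Long,
      storageUsed: Long)

  // Memory currently available to storage: managed memory that execution is not using.
  def totalStorageMemory(s: MemorySnapshot): Long =
    s.totalManagedMemory - s.executionUsed

  // What a tracker reports if it only captured the initial snapshot,
  // i.e. it still believes totalStorageMemory == totalManagedMemory.
  def staleMemRemaining(s: MemorySnapshot): Long =
    s.totalManagedMemory - s.storageUsed

  // What is actually left for storage once execution usage is taken into account.
  def actualMemRemaining(s: MemorySnapshot): Long =
    math.max(0L, totalStorageMemory(s) - s.storageUsed)
}
```

With no execution memory in use the two "remaining" values agree; as soon as execution holds memory, the stale value overstates what storage can still claim, which is the inaccuracy described in point 2.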
[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API
[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976145#comment-15976145 ]

Saisai Shao commented on SPARK-20391:
-------------------------------------

Thanks [~irashid] [~tgraves] for your comments.

bq. with the current naming:
bq. maxMemory is (5): the amount of memory managed by spark
bq. maxOnHeapMemory & maxOffHeapMemory are (5) divided into onheap & offheap

In the current Spark code, {{maxMemory}} actually reflects {{totalStorageMemory}}, not the total managed memory; some memory is still left for execution (shuffle, tungsten) and is not counted. So I think it is more precise to rename it to {{totalStorageMemory}}, not {{totalManagedMemory}}. Likewise, {{maxOnHeapMemory}} and {{maxOffHeapMemory}} would be better renamed to {{totalOnHeapStorageMemory}} and {{totalOffHeapStorageMemory}}.
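As a small illustration of the renaming suggested in this comment, the sketch below applies the three proposed names to a generic field map. `StorageFieldRenames` and `rename` are hypothetical helpers introduced only for this example, and the table reflects this comment's suggestion, not necessarily what was eventually merged.

```scala
// Hypothetical helper; the rename table mirrors the names suggested in the comment above.
object StorageFieldRenames {
  val renames: Map[String, String] = Map(
    "maxMemory" -> "totalStorageMemory",
    "maxOnHeapMemory" -> "totalOnHeapStorageMemory",
    "maxOffHeapMemory" -> "totalOffHeapStorageMemory")

  // Renames the known keys and leaves every other key untouched.
  def rename(fields: Map[String, Long]): Map[String, Long] =
    fields.map { case (name, value) => renames.getOrElse(name, name) -> value }
}
```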
[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API
[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975115#comment-15975115 ]

Thomas Graves commented on SPARK-20391:
---------------------------------------

> My proposal was to add 2 extra fields which duplicate the existing ones, so
> that the memory metrics are together and hopefully the meaning is clear.
> totalManagedMemory would be the same as maxMemory; usedStorageMemory would be
> the same as memoryUsed. But I'm not super firm on that, and its definitely
> not "must do" for 2.2.

Yep, makes sense. I would think it is easy enough to do, so we should just do it here.
[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API
[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975062#comment-15975062 ]

Imran Rashid commented on SPARK-20391:
--------------------------------------

bq. If we want to change the names of the other 2 we could simply add 2 extra fields with a more appropriate name and leave the other 2 not sure that is necessary at this point though.

My proposal was to add 2 extra fields which duplicate the existing ones, so that the memory metrics are together and hopefully the meaning is clear. {{totalManagedMemory}} would be the same as {{maxMemory}}; {{usedStorageMemory}} would be the same as {{memoryUsed}}. But I'm not super firm on that, and it's definitely not a "must do" for 2.2.

bq. It think we should document rest api better

Yeah, no objections to better docs. I just see that as a bigger change, and I think I'd rather update the names for 2.2.

bq. I assume managed memory here is spark.memory.fraction on heap + spark.memory.offHeap.size?

Yes.

[~jerryshao] I'm going to mark this as a blocker for 2.2. I think Tom and I basically agree on what needs to be done immediately here. Can you take care of the implementation?
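The definition of managed memory confirmed above ("spark.memory.fraction on heap + spark.memory.offHeap.size") can be sketched as a small calculation. This is only an estimate under stated assumptions: `ManagedMemoryEstimate` and its members are hypothetical names, and the 300 MB heap reservation applied before the fraction is an assumption about the unified memory manager that is not stated in this thread.

```scala
// Rough estimate only; names are hypothetical and the reservation is an assumption.
object ManagedMemoryEstimate {
  // Assumption: heap set aside before spark.memory.fraction is applied.
  val ReservedSystemMemoryBytes: Long = 300L * 1024 * 1024

  // On-heap share managed by Spark: (heap - reserved) * spark.memory.fraction.
  def onHeapManaged(heapBytes: Long, memoryFraction: Double): Long = {
    val usable = math.max(0L, heapBytes - ReservedSystemMemoryBytes)
    (usable * memoryFraction).toLong
  }

  // Total managed memory: on-heap share plus spark.memory.offHeap.size
  // (pass 0 for offHeapSizeBytes when off-heap memory is not enabled).
  def totalManaged(heapBytes: Long, memoryFraction: Double, offHeapSizeBytes: Long): Long =
    onHeapManaged(heapBytes, memoryFraction) + offHeapSizeBytes
}
```

For example, a heap of 1324 MB with a fraction of 0.5 gives 512 MB of on-heap managed memory under these assumptions; any off-heap size is simply added on top.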
[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API
[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975029#comment-15975029 ]

Thomas Graves commented on SPARK-20391:
---------------------------------------

I agree that if it's been released we can't change it; the on/off heap fields we need to change ASAP, before a release. If we want to change the names of the other 2, we could simply add 2 extra fields with more appropriate names and leave the other 2 in place. I'm not sure that is necessary at this point, though.

I think we should document the REST API better, and I think that page would be fine, or we could link to another page, but that might be a separate JIRA if this one is still to change names. It's an API and we should have had that from the beginning. Example of YARN REST API docs: https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
I'm sure there are better examples too.

I think making it a separate ExecutorMemoryMetrics makes sense so we can more easily extend it in the future.

I assume managed memory here is spark.memory.fraction on heap + spark.memory.offHeap.size?
[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API
[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974926#comment-15974926 ]

Imran Rashid commented on SPARK-20391:
--------------------------------------

cc [~tgraves] [~jsoltren]
[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API
[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974924#comment-15974924 ]

Imran Rashid commented on SPARK-20391:
--------------------------------------

{{memoryUsed}} and {{maxMemory}} exist in already released versions, so unfortunately I don't think we can rename those. There also isn't a great place to document the exact meaning of the fields. The only docs we have now are at http://spark.apache.org/docs/latest/monitoring.html#rest-api , which don't go into detail on the fields returned. Do you have any suggestions on where we could document it? I don't think this is worth an api/v2 by itself.

I feel like the best thing for us to do would be to leave {{memoryUsed}} and {{maxMemory}} alone, and reorganize / rename the others.

Let's start by considering what we might want to eventually report in these metrics, so we can make sure we have unambiguous names for all of them.

1) total memory available for the executor, onheap and offheap (eg., include the "overhead" memory on yarn)
2) heap size
3) heap used
4) total memory used by the process
5) amount of memory managed by spark
6) memory used by spark's memory manager
7) memory "designated" for caching rdds (eg. from {{spark.memory.storageFraction}} with unified memory manager)*
8) memory currently used for caching rdds
9) memory currently used for execution

All of the metrics related to Spark's memory management have an onheap & offheap component.

All of the "memory used" metrics will vary over time, so it's not really clear what you want to report. I named the metrics as "current value". But that is strange if you're looking at a completed app, and at anything other than storage memory. That is somewhat orthogonal to the discussion here, though; for now it's just a matter of clearly distinguishing those metrics from the storage metrics.

With the current naming:

* {{maxMemory}} is (5): the amount of memory managed by spark
** {{maxOnHeapMemory}} & {{maxOffHeapMemory}} are (5) divided into onheap & offheap
* {{memoryUsed}} is (8): memory currently used for cached rdds
* {{onHeapMemoryUsed}} and {{offHeapMemoryUsed}} are (8) subdivided into onheap & offheap

Right?

Given the number of different metrics already, with the list potentially growing, I think we should add an {{ExecutorMemoryMetrics}} inside {{ExecutorMetrics}}, with the following names for what we have so far:

* {{totalManagedMemory}}
* {{totalManagedOnHeapMemory}}
* {{totalManagedOffHeapMemory}}
* {{usedStorageMemory}}
* {{usedOnHeapStorageMemory}}
* {{usedOffHeapStorageMemory}}

I'm avoiding "max" and using "total" instead, as in the future I can see that we might want to report a "max used over time" (eg. over the entire lifetime of my application, what was the maximum execution memory?).

How does that sound?
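The proposed grouping can be sketched as a small case class populated from the existing fields. This is an illustration of the proposal only, not the API that Spark ships: `ProposedMemoryMetrics`, `LegacyMemoryFields`, and `fromLegacy` are hypothetical, and the mapping follows the reading in this comment (`maxMemory` as item 5), which is questioned elsewhere in the thread.

```scala
// Illustration of the proposal above; not the class that Spark ships.
object ProposedMemoryMetrics {
  // The six memory-related fields as they appear on ExecutorSummary today.
  final case class LegacyMemoryFields(
      memoryUsed: Long,
      maxMemory: Long,
      onHeapMemoryUsed: Option[Long],
      offHeapMemoryUsed: Option[Long],
      maxOnHeapMemory: Option[Long],
      maxOffHeapMemory: Option[Long])

  // Field names taken from the list proposed in the comment.
  final case class ExecutorMemoryMetrics(
      totalManagedMemory: Long,
      totalManagedOnHeapMemory: Option[Long],
      totalManagedOffHeapMemory: Option[Long],
      usedStorageMemory: Long,
      usedOnHeapStorageMemory: Option[Long],
      usedOffHeapStorageMemory: Option[Long])

  // Copies each existing field into its proposed, less ambiguous name.
  def fromLegacy(l: LegacyMemoryFields): ExecutorMemoryMetrics =
    ExecutorMemoryMetrics(
      totalManagedMemory = l.maxMemory,
      totalManagedOnHeapMemory = l.maxOnHeapMemory,
      totalManagedOffHeapMemory = l.maxOffHeapMemory,
      usedStorageMemory = l.memoryUsed,
      usedOnHeapStorageMemory = l.onHeapMemoryUsed,
      usedOffHeapStorageMemory = l.offHeapMemoryUsed)
}
```

Keeping the metrics in their own class leaves `memoryUsed` and `maxMemory` untouched for existing clients, while new metrics (for example execution memory) could later be added without widening `ExecutorSummary` further.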
[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API
[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974685#comment-15974685 ]

Saisai Shao commented on SPARK-20391:
-------------------------------------

[~irashid], I would be grateful to hear your suggestions.