[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-06-22 Thread Jose Soltren (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060097#comment-16060097 ]

Jose Soltren commented on SPARK-20391:
--

So, this is months old now and irrelevant, but since you pinged me, I'll say 
that jerryshao's changes look fine to me. Thanks.

> Properly rename the memory related fields in ExecutorSummary REST API
> -
>
> Key: SPARK-20391
> URL: https://issues.apache.org/jira/browse/SPARK-20391
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Blocker
> Fix For: 2.2.0
>
>
> Currently in Spark we can get an executor summary through the REST API 
> {{/api/v1/applications//executors}}. The format of the executor summary 
> is:
> {code}
> class ExecutorSummary private[spark](
>     val id: String,
>     val hostPort: String,
>     val isActive: Boolean,
>     val rddBlocks: Int,
>     val memoryUsed: Long,
>     val diskUsed: Long,
>     val totalCores: Int,
>     val maxTasks: Int,
>     val activeTasks: Int,
>     val failedTasks: Int,
>     val completedTasks: Int,
>     val totalTasks: Int,
>     val totalDuration: Long,
>     val totalGCTime: Long,
>     val totalInputBytes: Long,
>     val totalShuffleRead: Long,
>     val totalShuffleWrite: Long,
>     val isBlacklisted: Boolean,
>     val maxMemory: Long,
>     val executorLogs: Map[String, String],
>     val onHeapMemoryUsed: Option[Long],
>     val offHeapMemoryUsed: Option[Long],
>     val maxOnHeapMemory: Option[Long],
>     val maxOffHeapMemory: Option[Long])
> {code}
> There are 6 memory-related fields: {{memoryUsed}}, {{maxMemory}}, 
> {{onHeapMemoryUsed}}, {{offHeapMemoryUsed}}, {{maxOnHeapMemory}}, 
> {{maxOffHeapMemory}}.
> All 6 of these fields reflect the *storage* memory usage in Spark, but from 
> their names a user cannot tell whether they refer to *storage* memory or to 
> the total memory (storage memory + execution memory). This is misleading.
> So I think we should properly rename these fields to reflect their real 
> meanings, or at least document them clearly.
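For illustration only (not part of the original report), here is a minimal Scala sketch of how a client might fetch this endpoint; the UI address and application id below are hypothetical placeholders:

{code}
import scala.io.Source

object ExecutorSummaryProbe {
  def main(args: Array[String]): Unit = {
    // Hypothetical values -- substitute the real UI address and application id.
    val uiAddress = "http://localhost:4040"
    val appId     = "app-20170419120000-0001"

    // The response is a JSON array of ExecutorSummary objects. Note that memoryUsed,
    // maxMemory and the four on/off-heap variants all describe *storage* memory,
    // which the field names alone do not make clear.
    val json = Source.fromURL(s"$uiAddress/api/v1/applications/$appId/executors").mkString
    println(json)
  }
}
{code}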






[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-20 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976376#comment-15976376 ]

Apache Spark commented on SPARK-20391:
--

User 'jerryshao' has created a pull request for this issue:
https://github.com/apache/spark/pull/17700




[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-20 Thread Saisai Shao (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976319#comment-15976319 ]

Saisai Shao commented on SPARK-20391:
-

I'm in favor of using a new REST API to expose memory-related metrics for the 
executor, rather than adding more fields to {{ExecutorSummary}}.

So here I will only rename these 4 newly added fields:

{code}
val onHeapMemoryUsed: Option[Long],
val offHeapMemoryUsed: Option[Long],
val maxOnHeapMemory: Option[Long],
val maxOffHeapMemory: Option[Long]
{code}

{{maxMemory}} and {{memoryUsed}} I will leave as they are.

We could then define a new API, {{ExecutorMemoryMetrics}}, that includes all 
the memory usage mentioned above (a rough sketch follows).
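A possible shape for that, using the storage-scoped names suggested elsewhere in this thread (illustration only, not a committed design):

{code}
// Sketch: group the renamed storage-memory fields in a new ExecutorMemoryMetrics
// class exposed from ExecutorSummary, leaving maxMemory/memoryUsed untouched.
class ExecutorMemoryMetrics private[spark](
    val usedOnHeapStorageMemory: Long,
    val usedOffHeapStorageMemory: Long,
    val totalOnHeapStorageMemory: Long,
    val totalOffHeapStorageMemory: Long)

class ExecutorSummary private[spark](
    // ... existing fields, including maxMemory and memoryUsed, stay as they are ...
    val memoryMetrics: Option[ExecutorMemoryMetrics])
{code}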




[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-20 Thread Saisai Shao (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976298#comment-15976298 ]

Saisai Shao commented on SPARK-20391:
-

bq. I assume managed memory here is spark.memory.fraction on heap + 
spark.memory.offHeap.size?

{{totalManagedMemory}} should equal the on-heap memory governed by 
spark.memory.fraction plus spark.memory.offHeap.size, but {{totalStorageMemory}} 
is no larger than {{totalManagedMemory}}. At the beginning, when no job is 
running, {{totalStorageMemory}} == {{totalManagedMemory}}; once execution memory 
is consumed, {{totalStorageMemory}} < {{totalManagedMemory}}.
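As a rough illustration of that relationship (the 300 MB reserved-memory figure and the exact formula are assumptions about the unified memory manager's defaults, not something stated in this thread):

{code}
// Illustration only: approximate managed-memory arithmetic for a 4 GB heap executor.
val heapBytes      = 4L * 1024 * 1024 * 1024   // executor JVM heap (-Xmx4g)
val reservedBytes  = 300L * 1024 * 1024        // assumed reserved system memory
val memoryFraction = 0.6                       // spark.memory.fraction (default)
val offHeapBytes   = 1L * 1024 * 1024 * 1024   // spark.memory.offHeap.size

val totalManagedMemory =
  ((heapBytes - reservedBytes) * memoryFraction).toLong + offHeapBytes

// totalStorageMemory starts out equal to totalManagedMemory; as execution memory
// is borrowed it shrinks, so totalStorageMemory <= totalManagedMemory at all times.
{code}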

Here we have two problems in the block manager:

1. All the memory tracked in the block manager is storage memory, so we should 
clarify the naming, which is the purpose of this JIRA.
2. The block manager only gets the initial snapshot of storage memory 
({{totalStorageMemory}} == {{totalManagedMemory}}). Since {{totalStorageMemory}} 
varies during runtime, the {{memRemaining}} tracked in {{StorageStatus}} is not 
accurate. That could be addressed in another JIRA.





[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-20 Thread Saisai Shao (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976145#comment-15976145 ]

Saisai Shao commented on SPARK-20391:
-

Thanks [~irashid] [~tgraves] for your comments.

bq. with the current naming:
bq. maxMemory is (5): the amount of memory managed by spark
bq. maxOnHeapMemory & maxOffHeapMemory are (5) divided into onheap & offheap

In the current Spark code, {{maxMemory}} actually reflects {{totalStorageMemory}}, 
not the total managed memory; there is still an amount of memory left for 
execution (shuffle, Tungsten) that is not counted in. So I think it is more 
precise to rename it to {{totalStorageMemory}}, not {{totalManagedMemory}}.

Also, {{maxOnHeapMemory}} and {{maxOffHeapMemory}} would be better renamed to 
{{totalOnHeapStorageMemory}} and {{totalOffHeapStorageMemory}}.




[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-19 Thread Thomas Graves (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975115#comment-15975115 ]

Thomas Graves commented on SPARK-20391:
---

> My proposal was to add 2 extra fields which duplicate the existing ones, so 
> that the memory metrics are together and hopefully the meaning is clear. 
> totalManagedMemory would be the same as maxMemory; usedStorageMemory would be 
> the same as memoryUsed. But I'm not super firm on that, and it's definitely 
> not "must do" for 2.2.

Yep, makes sense. I would think it is easy enough to do; we should just do it 
here.




[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-19 Thread Imran Rashid (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975062#comment-15975062 ]

Imran Rashid commented on SPARK-20391:
--

bq. If we want to change the names of the other 2 we could simply add 2 extra 
fields with more appropriate names and leave the other 2; not sure that is 
necessary at this point though.

My proposal was to add 2 extra fields which duplicate the existing ones, so 
that the memory metrics are together and hopefully the meaning is clear.  
{{totalManagedMemory}} would be the same as {{maxMemory}}; 
{{usedStorageMemory}} would be the same as {{memoryUsed}}.  But I'm not super 
firm on that, and it's definitely not "must do" for 2.2.

bq. I think we should document the REST API better

Yeah, no objections to better docs; I just see that as a bigger change, and I 
think I'd rather update the names for 2.2.

bq. I assume managed memory here is spark.memory.fraction on heap + 
spark.memory.offHeap.size?

yes.

[~jerryshao] I'm going to mark this as a blocker for 2.2; I think Tom and I 
basically agree on what needs to be done immediately here. Can you take care 
of the implementation?




[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-19 Thread Thomas Graves (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975029#comment-15975029 ]

Thomas Graves commented on SPARK-20391:
---

I agree that if it's been released we can't change it; the on/off-heap fields we 
need to change ASAP before a release. If we want to change the names of the other 
2 we could simply add 2 extra fields with more appropriate names and leave the 
other 2; not sure that is necessary at this point though.

I think we should document the REST API better, and I think that page would be 
fine, or it could link to another page, but that might be a separate JIRA if this 
one is still just about changing names. It's an API and we should have had docs 
from the beginning. An example of the YARN REST API docs: 
https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
I'm sure there are better examples too.

I think making it a separate {{ExecutorMemoryMetrics}} makes sense so we can more 
easily extend it in the future. I assume managed memory here is 
spark.memory.fraction on heap + spark.memory.offHeap.size?






[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-19 Thread Imran Rashid (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974926#comment-15974926 ]

Imran Rashid commented on SPARK-20391:
--

cc [~tgraves] [~jsoltren]




[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-19 Thread Imran Rashid (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974924#comment-15974924 ]

Imran Rashid commented on SPARK-20391:
--

{{memoryUsed}} and {{maxMemory}} exist in already released versions, so 
unfortunately I don't think we can rename those. There also isn't a great place 
to document the exact meaning of the fields -- the only docs we have now are 
http://spark.apache.org/docs/latest/monitoring.html#rest-api , which don't go 
into detail on the fields returned. Do you have any suggestions on where we could 
document it?

I don't think this is worth an api/v2 by itself. I feel like the best thing for 
us to do would be to leave {{memoryUsed}} and {{maxMemory}} alone and reorganize 
/ rename the others.

Let's start by considering what we might want to eventually report in these 
metrics, so we can make sure we have unambiguous names for all of them.

1) total memory available for the executor -- onheap and offheap (eg., include 
the "overhead" memory on yarn)
2) heap size
3) heap used
4) total memory used by the process
5) amount of memory managed by spark
6) memory used by spark's memory manager
7) memory "designated" for caching rdds (eg. from 
{{spark.memory.storageFraction}} with unified memory manager)*
8) memory currently used for caching rdds
9) memory currently used for execution

All of the metrics related to Spark's memory management have an on-heap & 
off-heap component.

All of the "memory used" metrics will vary over time, so it's not entirely clear 
what you want to report. I named the metrics as "current value", but that is 
strange if you're looking at a completed app, and for anything other than storage 
memory. That is somewhat orthogonal to the discussion here, though -- for now 
it's just about clearly distinguishing those metrics from the storage metrics.

with the current naming:

* {{maxMemory}} is (5): the amount of memory managed by spark
** {{maxOnHeapMemory}} & {{maxOffHeapMemory}} are (5) divided into onheap & 
offheap
* {{memoryUsed}} is (8): memory currently used for cached rdds
* {{onHeapMemoryUsed}} and {{offHeapMemoryUsed}} are (8) subdivided into onheap 
& offheap

right?

Given the number of different metrics already, with the list potentially 
growing, I think we should add an {{ExecutorMemoryMetrics}} inside 
{{ExecutorMetrics}}, with the following names for what we have so far:

* {{totalManagedMemory}}
* {{totalManagedOnHeapMemory}}
* {{totalManagedOffHeapMemory}}
* {{usedStorageMemory}}
* {{usedOnHeapStorageMemory}}
* {{usedOffHeapStorageMemory}}

I'm avoiding "max" and using "total" instead, as in the future I can see that we 
might want to report a "max used over time" (e.g. over the entire lifetime of my 
application, what was the maximum execution memory?).

How does that sound?
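Put concretely, a minimal sketch of that grouping (the field names come straight from the list above; the surrounding class shape is just an assumption for illustration):

{code}
// Sketch of the proposed ExecutorMemoryMetrics grouping, names as listed above.
class ExecutorMemoryMetrics private[spark](
    val totalManagedMemory: Long,
    val totalManagedOnHeapMemory: Long,
    val totalManagedOffHeapMemory: Long,
    val usedStorageMemory: Long,
    val usedOnHeapStorageMemory: Long,
    val usedOffHeapStorageMemory: Long)
{code}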


[jira] [Commented] (SPARK-20391) Properly rename the memory related fields in ExecutorSummary REST API

2017-04-19 Thread Saisai Shao (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974685#comment-15974685 ]

Saisai Shao commented on SPARK-20391:
-

[~irashid], I would be grateful to hear your suggestions.
