[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI

2014-10-16 Thread Dev Lakhani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173926#comment-14173926
 ] 

Dev Lakhani commented on SPARK-3957:


Here are my thoughts on a possible approach.

Hi All

The broadcast goes from the SparkContext to the BroadcastManager and its 
newBroadcast method. In the first instance, the broadcast data is stored in the 
Block Manager (see HttpBroadcast) of the executor. Any tracking of broadcast 
variables must involve the BlockManagerSlaveActor and 
BlockManagerMasterActor. In particular, UpdateBlockInfo and RemoveBroadcast 
should update the total memory used by blocks when blocks are added and removed.

These can then be hooked up to the UI using a new page, like ExecutorsPage, and 
defining new methods in the relevant listener, such as StorageStatusListener.

These are my initial thoughts as someone new to these components. Any other 
ideas or approaches?
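
To make the bookkeeping idea above concrete, here is a minimal sketch in plain Python. It is not Spark code: the class `BroadcastMemoryTracker` and its methods are hypothetical names that only mirror the UpdateBlockInfo and RemoveBroadcast messages mentioned above, and the `broadcast_<id>` block naming is an assumption made for the example.

```python
from collections import defaultdict


class BroadcastMemoryTracker:
    """Hypothetical per-executor bookkeeping of broadcast block sizes."""

    def __init__(self):
        # executor_id -> {block_id -> bytes held in memory}
        self._blocks = defaultdict(dict)

    def update_block_info(self, executor_id, block_id, mem_size):
        """Record a block report; a size of 0 means the block was dropped."""
        if mem_size > 0:
            self._blocks[executor_id][block_id] = mem_size
        else:
            self._blocks[executor_id].pop(block_id, None)

    def remove_broadcast(self, broadcast_id):
        """Forget every block of one broadcast variable on all executors."""
        prefix = f"broadcast_{broadcast_id}"
        for blocks in self._blocks.values():
            # Match the exact id or an id with a suffix, so that removing
            # broadcast 1 does not also remove broadcast 10.
            doomed = [b for b in blocks
                      if b == prefix or b.startswith(prefix + "_")]
            for block_id in doomed:
                del blocks[block_id]

    def memory_used(self, executor_id):
        """Total broadcast bytes currently tracked for one executor."""
        return sum(self._blocks.get(executor_id, {}).values())
```

A UI page could then read `memory_used(executor_id)` for each executor when rendering a broadcast column.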

 Broadcast variable memory usage not reflected in UI
 ---

             Key: SPARK-3957
             URL: https://issues.apache.org/jira/browse/SPARK-3957
         Project: Spark
      Issue Type: Bug
      Components: Block Manager, Web UI
Affects Versions: 1.0.2, 1.1.0
        Reporter: Shivaram Venkataraman
        Assignee: Nan Zhu

 Memory used by broadcast variables is not reflected in the memory usage 
 reported in the WebUI. For example, the executors tab shows memory used in 
 each executor, but this number doesn't include memory used by broadcast 
 variables. Similarly, the storage tab only shows the list of cached RDDs and 
 how much memory they use.
 We should add a separate column / tab for broadcast variables to make it 
 easier to debug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI

2014-10-16 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173954#comment-14173954
 ] 

Shivaram Venkataraman commented on SPARK-3957:
--

I think it needs to be tracked in the Block Manager. However, we also need to 
track this on a per-executor basis and not just at the driver. Right now, AFAIK, 
executors do not report new broadcast blocks to the master, in order to reduce 
communication. However, we could add broadcast blocks to some periodic report. 
[~andrewor] might know more.




[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI

2014-10-16 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174019#comment-14174019
 ] 

Andrew Or commented on SPARK-3957:
--

Yeah my understanding is that broadcast blocks aren't reported to the driver 
(and it makes sense to not report them because the driver is the one who 
initiated the broadcast in the first place). The source of the broadcast info 
we want to display is in the BlockManager of each executor, and we need to get 
this to the driver somehow. We could add some periodic reporting but that opens 
another channel between the driver and the executors. There is an ongoing 
effort to do something similar for task metrics 
https://github.com/apache/spark/pull/2087, so maybe we can piggyback this 
information on the heartbeats there.
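
As a rough illustration of the piggybacking idea, the sketch below models a heartbeat that carries broadcast block sizes next to the task metrics. It is plain Python, not Spark code: `Heartbeat`, `DriverSideReceiver`, and all field names are hypothetical and are not taken from the pull request above.

```python
from dataclasses import dataclass, field


@dataclass
class Heartbeat:
    """Hypothetical heartbeat payload sent periodically by an executor."""
    executor_id: str
    task_metrics: dict = field(default_factory=dict)
    # Extra field piggybacked on the existing message: block id -> bytes.
    broadcast_blocks: dict = field(default_factory=dict)


class DriverSideReceiver:
    """Hypothetical driver-side handler keeping the latest report per executor."""

    def __init__(self):
        # executor_id -> total broadcast bytes from the most recent heartbeat
        self.broadcast_usage = {}

    def receive(self, heartbeat):
        # Each heartbeat carries a full snapshot, so the newest one simply wins.
        self.broadcast_usage[heartbeat.executor_id] = sum(
            heartbeat.broadcast_blocks.values()
        )
```

Sending a full snapshot on every heartbeat keeps the driver side stateless per executor, at the cost of a larger message; sending only changes would be smaller but needs more bookkeeping.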

Also, I believe this is a duplicate of an older issue, SPARK-1761, though this 
one contains more information, so let's keep this one open. I will close the 
other one in favor of this one.




[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI

2014-10-16 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174024#comment-14174024
 ] 

Andrew Or commented on SPARK-3957:
--

Hey [~devl.development] are you planning to work on this? Or is [~CodingCat]? 
The latter is currently assigned but maybe you guys should work it out.




[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI

2014-10-16 Thread Nan Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174053#comment-14174053
 ] 

Nan Zhu commented on SPARK-3957:


I agree with [~andrewor14]. I was also thinking about piggybacking the 
information on the heartbeat between heartbeatReceiver and the executor.

I'm not sure about the current Hadoop implementation, but in the 1.x versions 
TaskStatus was piggybacked on the heartbeat between TaskTracker and JobTracker. 
To me, it's a very natural way to do this.

I accepted it this morning and have started some work, so, [~devlakhani], 
please let me finish this. Thanks.




[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI

2014-10-16 Thread Dev Lakhani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174107#comment-14174107
 ] 

Dev Lakhani commented on SPARK-3957:


Hi 

For now I am happy for [~CodingCat] to take this on. Maybe once there are some 
commits I can help with the UI side, but for now I'll hold back.






[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI

2014-10-16 Thread Nan Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174664#comment-14174664
 ] 

Nan Zhu commented on SPARK-3957:


After looking at the problem more closely, I think we might just set the 
tellMaster flag to true to get this information (after put, it will report to 
BlockManagerMaster), instead of introducing a fat heartbeat message or opening 
a new channel.

The only thing we need to add is a way to distinguish RDD and broadcast 
variable blocks in BlockStatus.
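
A toy model of that flag, in plain Python rather than Spark code, might look like the following. `MasterStub` and `ExecutorBlockStore` are hypothetical stand-ins for BlockManagerMaster and an executor's block manager, and using `len(data)` as the block's memory size is a simplification for the example.

```python
class MasterStub:
    """Hypothetical stand-in for the master-side block registry."""

    def __init__(self):
        # (executor_id, block_id) -> reported size in bytes
        self.reported = {}

    def update_block_info(self, executor_id, block_id, mem_size):
        self.reported[(executor_id, block_id)] = mem_size


class ExecutorBlockStore:
    """Hypothetical executor-side store with a tell_master switch on put."""

    def __init__(self, executor_id, master):
        self.executor_id = executor_id
        self.master = master
        self.blocks = {}

    def put(self, block_id, data, tell_master=True):
        # Always store locally; report to the master only when asked to.
        self.blocks[block_id] = data
        if tell_master:
            self.master.update_block_info(
                self.executor_id, block_id, len(data)
            )
```

With `tell_master=False` the block exists on the executor but the master never hears about it, which is the gap this issue describes for broadcast blocks.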

What do you guys think?




[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI

2014-10-16 Thread Nan Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174675#comment-14174675
 ] 

Nan Zhu commented on SPARK-3957:


BlockId can directly tell us whether the corresponding block is a broadcast 
variable.
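
For illustration only, here is a small Python sketch of classifying blocks by their ID string and summing memory per kind. The naming patterns (`broadcast_<id>` with an optional suffix, `rdd_<rddId>_<partition>`) are assumptions for the example, and `block_kind` and `memory_by_kind` are hypothetical helpers, not Spark APIs.

```python
import re

# Assumed naming conventions for block ids (illustrative only).
_BROADCAST = re.compile(r"^broadcast_\d+(_.*)?$")
_RDD = re.compile(r"^rdd_\d+_\d+$")


def block_kind(block_name):
    """Classify a block id string as 'broadcast', 'rdd', or 'other'."""
    if _BROADCAST.match(block_name):
        return "broadcast"
    if _RDD.match(block_name):
        return "rdd"
    return "other"


def memory_by_kind(block_sizes):
    """Sum memory per block kind from a {block id: bytes} mapping."""
    totals = {"rdd": 0, "broadcast": 0, "other": 0}
    for name, size in block_sizes.items():
        totals[block_kind(name)] += size
    return totals
```

Because the kind is derived from the ID itself, the status record would not need an extra field to separate RDD memory from broadcast memory.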




[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI

2014-10-16 Thread Nan Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174684#comment-14174684
 ] 

Nan Zhu commented on SPARK-3957:


[~andrewor14], why don't we report broadcast variable resource usage to 
BlockManagerMaster in the current implementation?




[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI

2014-10-16 Thread Nan Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174747#comment-14174747
 ] 

Nan Zhu commented on SPARK-3957:


OK, while working on the executor tab, I realized that we eventually need a 
per-executor record of broadcast usage, so I will still follow the 
heartbeat-based strategy.
