[jira] [Commented] (SPARK-2316) StorageStatusListener should avoid O(blocks) operations
[ https://issues.apache.org/jira/browse/SPARK-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269243#comment-14269243 ]

Paul Wolfe commented on SPARK-2316:
---

Any workaround ideas for users who can't yet upgrade (stuck on version 1.0.0)?

StorageStatusListener should avoid O(blocks) operations
---

Key: SPARK-2316
URL: https://issues.apache.org/jira/browse/SPARK-2316
Project: Spark
Issue Type: Bug
Components: Spark Core, Web UI
Affects Versions: 1.0.0
Reporter: Patrick Wendell
Assignee: Andrew Or
Priority: Critical
Fix For: 1.1.0

In cases where jobs frequently cause dropped blocks, the storage status listener can become a bottleneck. This is slow for a few reasons: one is that we use Scala collection operations, and the other is that we perform operations that are O(number of blocks). I think using a few indices here could make this much faster.

{code}
at java.lang.Integer.valueOf(Integer.java:642)
at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:70)
at org.apache.spark.storage.StorageUtils$$anonfun$9.apply(StorageUtils.scala:82)
at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:328)
at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:327)
at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
at scala.collection.TraversableLike$class.groupBy(TraversableLike.scala:327)
at scala.collection.AbstractTraversable.groupBy(Traversable.scala:105)
at org.apache.spark.storage.StorageUtils$.rddInfoFromStorageStatus(StorageUtils.scala:82)
at org.apache.spark.ui.storage.StorageListener.updateRDDInfo(StorageTab.scala:56)
at org.apache.spark.ui.storage.StorageListener.onTaskEnd(StorageTab.scala:67)
- locked 0xa27ebe30 (a org.apache.spark.ui.storage.StorageListener)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
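The issue description proposes replacing O(number of blocks) work with "a few indices". In the trace, `onTaskEnd` holds the `StorageListener` lock while `rddInfoFromStorageStatus` runs `groupBy` over every block, boxing an `Int` key for each one. A minimal sketch of the indexing idea follows. It assumes simplified, hypothetical types (`RddBlock`, `BlockInfo`, `RddStorageIndex`) that are not Spark's real classes and is not taken from the pull request linked later in this thread. Each block update adjusts per-RDD totals in place, so a task end that reports k updated blocks costs roughly O(k) instead of regrouping every block.

```scala
import scala.collection.mutable

// Hypothetical, simplified identifiers for illustration only; these are not
// Spark's BlockId / BlockStatus classes.
final case class RddBlock(rddId: Int, splitIndex: Int)
final case class BlockInfo(memSize: Long, diskSize: Long)

/** Keeps per-RDD aggregates current so each block update is O(1),
  * instead of regrouping all blocks on every task end. */
final class RddStorageIndex {
  // rddId -> (block -> info): finds an RDD's blocks without scanning all blocks.
  private val blocksByRdd = mutable.HashMap.empty[Int, mutable.HashMap[RddBlock, BlockInfo]]
  // rddId -> running totals, adjusted incrementally on every update.
  private val memByRdd = mutable.HashMap.empty[Int, Long].withDefaultValue(0L)
  private val diskByRdd = mutable.HashMap.empty[Int, Long].withDefaultValue(0L)

  /** Records a new status for `block`. In this sketch, zero sizes mean the block was dropped. */
  def update(block: RddBlock, info: BlockInfo): Unit = {
    val blocks = blocksByRdd.getOrElseUpdate(block.rddId, mutable.HashMap.empty)
    // Remove the block's previous contribution (if any) before applying the new one.
    blocks.get(block).foreach { old =>
      memByRdd(block.rddId) -= old.memSize
      diskByRdd(block.rddId) -= old.diskSize
    }
    if (info.memSize == 0L && info.diskSize == 0L) {
      blocks.remove(block)
    } else {
      blocks(block) = info
      memByRdd(block.rddId) += info.memSize
      diskByRdd(block.rddId) += info.diskSize
    }
  }

  def numCachedPartitions(rddId: Int): Int = blocksByRdd.get(rddId).map(_.size).getOrElse(0)
  def memUsed(rddId: Int): Long = memByRdd(rddId)
  def diskUsed(rddId: Int): Long = diskByRdd(rddId)
}
```

Reading a total is a map lookup, and the lock would only need to be held for the handful of updates a task reports rather than for a pass over every block.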
[jira] [Commented] (SPARK-2316) StorageStatusListener should avoid O(blocks) operations
[ https://issues.apache.org/jira/browse/SPARK-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080449#comment-14080449 ]

Apache Spark commented on SPARK-2316:
---

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/1679
[jira] [Commented] (SPARK-2316) StorageStatusListener should avoid O(blocks) operations
[ https://issues.apache.org/jira/browse/SPARK-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074692#comment-14074692 ]

Shivaram Venkataraman commented on SPARK-2316:
---

On a related note, can we have flags to turn off some of the UI listeners? If the StorageTab is going to be too expensive to update, it'll be good to have a way to turn it off and just have the JobProgress show up in the UI.
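A per-listener on/off flag, as suggested in the comment, could be sketched roughly as follows. Everything here is hypothetical: the `ui.<key>.enabled` key scheme, the `UiListener` trait, `CountingListener`, and `ListenerRegistry` are illustrative names, not Spark configuration keys or Spark's listener API. The idea is simply to filter listeners once at registration time, so a disabled StorageTab listener never receives task-end events at all.

```scala
// Hypothetical listener abstraction for illustration; this is not Spark's SparkListener API.
trait UiListener {
  def key: String       // short name used to build the hypothetical "ui.<key>.enabled" flag
  def onTaskEnd(): Unit // called once per finished task
}

// Toy listener that only counts the events it receives.
final class CountingListener(val key: String) extends UiListener {
  var events: Int = 0
  def onTaskEnd(): Unit = events += 1
}

object ListenerRegistry {
  /** Keeps a listener unless its hypothetical "ui.<key>.enabled" flag is set to "false". */
  def enabled(conf: Map[String, String], candidates: Seq[UiListener]): Seq[UiListener] =
    candidates.filter(l => conf.getOrElse(s"ui.${l.key}.enabled", "true").toBoolean)

  /** Delivers a task-end event only to the listeners that passed the filter. */
  def postTaskEnd(active: Seq[UiListener]): Unit = active.foreach(_.onTaskEnd())
}
```

Defaulting every flag to `"true"` keeps existing behavior unless a user opts out of a specific tab.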
[jira] [Commented] (SPARK-2316) StorageStatusListener should avoid O(blocks) operations
[ https://issues.apache.org/jira/browse/SPARK-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065256#comment-14065256 ]

Shivaram Venkataraman commented on SPARK-2316:
---

I'd just like to add that in cases where we have many thousands of blocks, this stack trace occupies one core constantly on the Master and is probably one of the reasons why the WebUI stops functioning after a certain point.