[jira] [Created] (SPARK-2151) spark-submit issue (int format expected for memory parameter)
Nishkam Ravi created SPARK-2151:
-----------------------------------

Summary: spark-submit issue (int format expected for memory parameter)
Key: SPARK-2151
URL: https://issues.apache.org/jira/browse/SPARK-2151
Project: Spark
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Nishkam Ravi

Get this exception when invoking spark-submit in standalone cluster mode:

Exception in thread "main" java.lang.NumberFormatException: For input string: "38g"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:492)
	at java.lang.Integer.parseInt(Integer.java:527)
	at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
	at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
	at org.apache.spark.deploy.ClientArguments.parse(ClientArguments.scala:55)
	at org.apache.spark.deploy.ClientArguments.<init>(ClientArguments.scala:47)
	at org.apache.spark.deploy.Client$.main(Client.scala:148)
	at org.apache.spark.deploy.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

-- This message was sent by Atlassian JIRA (v6.2#6252)
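The failure above comes from calling toInt on a memory argument like "38g"; the fix direction is to accept JVM-style size suffixes. A standalone sketch of such a parser (a hypothetical helper for illustration, not Spark's actual code or the patch from the issue):

```java
// Hypothetical sketch: parse memory strings such as "38g" or "512m" into an
// int number of megabytes, instead of calling Integer.parseInt directly on
// the raw argument as ClientArguments.parse did.
class MemoryArg {
    static int toMegabytes(String s) {
        String v = s.trim().toLowerCase();
        char unit = v.charAt(v.length() - 1);
        if (Character.isDigit(unit)) {
            return Integer.parseInt(v);              // bare number: treat as MB
        }
        long n = Long.parseLong(v.substring(0, v.length() - 1));
        switch (unit) {
            case 'k': return (int) (n / 1024);       // kilobytes down to MB
            case 'm': return (int) n;                // already MB
            case 'g': return (int) (n * 1024);       // gigabytes up to MB
            case 't': return (int) (n * 1024 * 1024);
            default:
                throw new IllegalArgumentException("bad memory string: " + s);
        }
    }
}
```

With this shape of parser, "38g" becomes 38912 MB instead of raising NumberFormatException.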
[jira] [Commented] (SPARK-2151) spark-submit issue (int format expected for memory parameter)
[ https://issues.apache.org/jira/browse/SPARK-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032133#comment-14032133 ] Nishkam Ravi commented on SPARK-2151: - PR: https://github.com/apache/spark/pull/1095/ spark-submit issue (int format expected for memory parameter) - Key: SPARK-2151 URL: https://issues.apache.org/jira/browse/SPARK-2151 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Nishkam Ravi -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1999) UI : StorageLevel in storage tab and RDD Storage Info never changes
[ https://issues.apache.org/jira/browse/SPARK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1999. Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 Target Version/s: 1.0.1, 1.1.0 Fixed in: https://github.com/apache/spark/pull/968 UI : StorageLevel in storage tab and RDD Storage Info never changes Key: SPARK-1999 URL: https://issues.apache.org/jira/browse/SPARK-1999 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.0.0 Reporter: Chen Chao Assignee: Chen Chao Fix For: 1.0.1, 1.1.0 StorageLevel in 'storage tab' and 'RDD Storage Info' never changes even if you call rdd.unpersist() and then you give the rdd another different storage level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1999) UI : StorageLevel in storage tab and RDD Storage Info never changes
[ https://issues.apache.org/jira/browse/SPARK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1999: --- Assignee: Chen Chao UI : StorageLevel in storage tab and RDD Storage Info never changes Key: SPARK-1999 URL: https://issues.apache.org/jira/browse/SPARK-1999 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.0.0 Reporter: Chen Chao Assignee: Chen Chao Fix For: 1.0.1, 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2148) Document custom class as key needing equals() AND hashcode()
[ https://issues.apache.org/jira/browse/SPARK-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2148: --- Issue Type: Improvement (was: Bug) Document custom class as key needing equals() AND hashcode() Key: SPARK-2148 URL: https://issues.apache.org/jira/browse/SPARK-2148 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.0.0 Reporter: Andrew Ash Several support requests on user@ have been tracked down to using a custom class as the key in a {{groupByKey()}} or {{reduceByKey()}} that has a custom {{equals()}} method but not the corresponding custom {{hashCode()}} method. Let's add a note in the documentation that custom keys need both {{equals()}} and {{hashCode()}} overridden, never just {{equals()}} The right place for this addition might be as a sub-section or note in http://spark.apache.org/docs/latest/programming-guide.html#working-with-key-value-pairs It should probably include a link to http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode() -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2148) Document custom class as key needing equals() AND hashcode()
[ https://issues.apache.org/jira/browse/SPARK-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2148. Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 Issue resolved by pull request 1092 [https://github.com/apache/spark/pull/1092] Document custom class as key needing equals() AND hashcode() Key: SPARK-2148 URL: https://issues.apache.org/jira/browse/SPARK-2148 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.0.0 Reporter: Andrew Ash Fix For: 1.0.1, 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2148) Document custom class as key needing equals() AND hashcode()
[ https://issues.apache.org/jira/browse/SPARK-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2148: --- Assignee: Andrew Ash Document custom class as key needing equals() AND hashcode() Key: SPARK-2148 URL: https://issues.apache.org/jira/browse/SPARK-2148 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.0.0 Reporter: Andrew Ash Assignee: Andrew Ash Fix For: 1.0.1, 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
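The pitfall SPARK-2148 documents can be shown with a minimal key class (a hypothetical example, not taken from the Spark docs): overriding only {{equals()}} leaves Object's identity hash code, so two equal keys can hash to different shuffle partitions and never be grouped together.

```java
import java.util.Objects;

// Minimal illustration of the SPARK-2148 pitfall: a custom class used as a
// key in groupByKey()/reduceByKey() must override BOTH equals() and
// hashCode(), never just equals().
class UserKey {
    private final String name;

    UserKey(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UserKey)) return false;
        return name.equals(((UserKey) o).name);
    }

    // Without this override, two equal UserKey instances would usually report
    // different (identity-based) hash codes, breaking hash partitioning.
    @Override
    public int hashCode() { return Objects.hash(name); }
}
```

The Java contract is that objects equal under equals() must produce the same hashCode(); hash-partitioned shuffles depend on it.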
[jira] [Resolved] (SPARK-2039) Run hadoop output checks for all formats
[ https://issues.apache.org/jira/browse/SPARK-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2039. Resolution: Fixed Fix Version/s: 1.1.0 Fixed by: https://github.com/apache/spark/pull/1088 Run hadoop output checks for all formats Key: SPARK-2039 URL: https://issues.apache.org/jira/browse/SPARK-2039 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Patrick Wendell Assignee: Nan Zhu Fix For: 1.1.0 Now that SPARK-1677 allows users to disable output checks, we should just run them for all types of output formats. I'm not sure why we didn't do this originally but it might have been out of defensiveness since we weren't sure what all implementations did. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1946) Submit stage after executors have been registered
[ https://issues.apache.org/jira/browse/SPARK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-1946: --

Description: Because creating the TaskSetManager and registering executors are asynchronous, running a job without enough executors leads to several issues:
* early stages' tasks run without preferred locality;
* the default parallelism in YARN is based on the number of executors;
* the number of intermediate files per node for shuffle grows (this can bring the node down, btw);
* the amount of memory consumed on a node for MEMORY-persisted RDD data grows (making the job fail if disk is not specified, as in some of the MLlib algorithms);
* and so on... (thanks to [~mridulm80]'s [comments | https://github.com/apache/spark/pull/900#issuecomment-45780405])

A simple workaround is to sleep a few seconds in the application so that executors have enough time to register. A better way is to make the DAGScheduler submit a stage only after enough executors have registered, controlled by configuration properties:
\# submit a stage only after the ratio of successfully registered executors reaches this value; default 0 in standalone mode and 0.9 in YARN mode
spark.scheduler.minRegisteredRatio = 0.8
\# regardless of whether the registered ratio has been reached, submit the stage after maxRegisteredWaitingTime (milliseconds); default value 1
spark.scheduler.maxRegisteredWaitingTime = 5000

was: the same description, but with executor-count-based properties instead:
\# submit a stage only after the number of successfully registered executors reaches this value; default 0
spark.executor.minRegisteredNum = 20
\# regardless of whether the registered number has been reached, submit the stage after maxRegisteredWaitingTime (milliseconds); default value 1
spark.executor.maxRegisteredWaitingTime = 5000

Submit stage after executors have been registered - Key: SPARK-1946 URL: https://issues.apache.org/jira/browse/SPARK-1946 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Zhihui Attachments: Spark Task Scheduler Optimization Proposal.pptx -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2152) Error computing rightNodeAgg in the decision tree algorithm in Spark MLlib
caoli created SPARK-2152: Summary: Error computing rightNodeAgg in the decision tree algorithm in Spark MLlib Key: SPARK-2152 URL: https://issues.apache.org/jira/browse/SPARK-2152 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Environment: Windows 7, 32-bit, 3 GB memory Reporter: caoli In the function extractLeftRightNodeAggregates(), the binData index used to compute rightNodeAgg is wrong. In DecisionTree.scala, around line 980: rightNodeAgg(featureIndex)(2 * (numBins - 2 - splitIndex)) = binData(shift + (2 * (numBins - 2 - splitIndex))) + rightNodeAgg(featureIndex)(2 * (numBins - 1 - splitIndex)) The index computed by binData(shift + (2 * (numBins - 2 - splitIndex))) is wrong, so the resulting rightNodeAgg contains repeated bin data. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2018) Big-Endian (IBM Power7) Spark Serialization issue
[ https://issues.apache.org/jira/browse/SPARK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032255#comment-14032255 ] Gireesh Punathil commented on SPARK-2018: - I was able to identify the root cause. Please see https://github.com/ning/compress/issues/37 for details. Big-Endian (IBM Power7) Spark Serialization issue -- Key: SPARK-2018 URL: https://issues.apache.org/jira/browse/SPARK-2018 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Environment: hardware : IBM Power7 OS:Linux version 2.6.32-358.el6.ppc64 (mockbu...@ppc-017.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Tue Jan 29 11:43:27 EST 2013 JDK: Java(TM) SE Runtime Environment (build pxp6470sr5-20130619_01(SR5)) IBM J9 VM (build 2.6, JRE 1.7.0 Linux ppc64-64 Compressed References 20130617_152572 (JIT enabled, AOT enabled) Hadoop:Hadoop-0.2.3-CDH5.0 Spark:Spark-1.0.0 or Spark-0.9.1 spark-env.sh: export JAVA_HOME=/opt/ibm/java-ppc64-70/ export SPARK_MASTER_IP=9.114.34.69 export SPARK_WORKER_MEMORY=1m export SPARK_CLASSPATH=/home/test1/spark-1.0.0-bin-hadoop2/lib export STANDALONE_SPARK_MASTER_HOST=9.114.34.69 #export SPARK_JAVA_OPTS=' -Xdebug -Xrunjdwp:transport=dt_socket,address=9,server=y,suspend=n ' Reporter: Yanjie Gao We have an application run on Spark on Power7 System . But we meet an important issue about serialization. The example HdfsWordCount can meet the problem. ./bin/run-example org.apache.spark.examples.streaming.HdfsWordCount localdir We used Power7 (Big-Endian arch) and Redhat 6.4. Big-Endian is the main cause since the example ran successfully in another Power-based Little Endian setup. 
here is the exception stack and log: Spark Executor Command: /opt/ibm/java-ppc64-70//bin/java -cp /home/test1/spark-1.0.0-bin-hadoop2/lib::/home/test1/src/spark-1.0.0-bin-hadoop2/conf:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/test1/src/hadoop-2.3.0-cdh5.0.0/etc/hadoop/:/home/test1/src/hadoop-2.3.0-cdh5.0.0/etc/hadoop/ -XX:MaxPermSize=128m -Xdebug -Xrunjdwp:transport=dt_socket,address=9,server=y,suspend=n -Xms512M -Xmx512M org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://spark@9.186.105.141:60253/user/CoarseGrainedScheduler 2 p7hvs7br16 4 akka.tcp://sparkWorker@p7hvs7br16:59240/user/Worker app-20140604023054- 14/06/04 02:31:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/06/04 02:31:21 INFO spark.SecurityManager: Changing view acls to: test1,yifeng 14/06/04 02:31:21 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(test1, yifeng) 14/06/04 02:31:22 INFO slf4j.Slf4jLogger: Slf4jLogger started 14/06/04 02:31:22 INFO Remoting: Starting remoting 14/06/04 02:31:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@p7hvs7br16:39658] 14/06/04 02:31:22 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutor@p7hvs7br16:39658] 14/06/04 02:31:22 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@9.186.105.141:60253/user/CoarseGrainedScheduler 14/06/04 02:31:22 INFO worker.WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@p7hvs7br16:59240/user/Worker 14/06/04 02:31:23 INFO worker.WorkerWatcher: Successfully connected to 
akka.tcp://sparkWorker@p7hvs7br16:59240/user/Worker 14/06/04 02:31:24 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver 14/06/04 02:31:24 INFO spark.SecurityManager: Changing view acls to: test1,yifeng 14/06/04 02:31:24 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(test1, yifeng) 14/06/04 02:31:24 INFO slf4j.Slf4jLogger: Slf4jLogger started 14/06/04 02:31:24 INFO Remoting: Starting remoting 14/06/04 02:31:24 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@p7hvs7br16:58990] 14/06/04 02:31:24 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@p7hvs7br16:58990] 14/06/04 02:31:24 INFO spark.SparkEnv: Connecting to MapOutputTracker: akka.tcp://spark@9.186.105.141:60253/user/MapOutputTracker 14/06/04 02:31:25 INFO spark.SparkEnv: Connecting to BlockManagerMaster: akka.tcp://spark@9.186.105.141:60253/user/BlockManagerMaster 14/06/04 02:31:25 INFO
[jira] [Commented] (SPARK-2126) Move MapOutputTracker behind ShuffleManager interface
[ https://issues.apache.org/jira/browse/SPARK-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032319#comment-14032319 ] Nan Zhu commented on SPARK-2126: [~matei], how about assigning it to me? I'm interested in working on this, thanks! Move MapOutputTracker behind ShuffleManager interface - Key: SPARK-2126 URL: https://issues.apache.org/jira/browse/SPARK-2126 Project: Spark Issue Type: Sub-task Components: Shuffle, Spark Core Reporter: Matei Zaharia This will require changing the interface between the DAGScheduler and MapOutputTracker to be method calls on the ShuffleManager instead. However, it will make it easier to do push-based shuffle and other ideas requiring changes to map output tracking. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1291) Link the spark UI to RM ui in yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032343#comment-14032343 ] Rahul Singhal commented on SPARK-1291: -- I was trying my hand at providing the UI while the app is running. I have a working implementation, except that since the UI is started before the AM, the UI does not know about APPLICATION_WEB_PROXY_BASE and thus relative paths in the UI do not work. :( Any suggestions? Link the spark UI to RM ui in yarn-client mode -- Key: SPARK-1291 URL: https://issues.apache.org/jira/browse/SPARK-1291 Project: Spark Issue Type: Improvement Affects Versions: 0.9.0, 1.0.0 Reporter: Thomas Graves Assignee: Guoqiang Li Currently, when you run Spark on YARN in yarn-client mode, the Spark UI is not linked up to the YARN ResourceManager UI, so it's harder for a user of YARN to find the UI. Note that in yarn-standalone/yarn-cluster mode it is properly linked up. Ideally the yarn-client UI should also be hooked up to the YARN RM proxy for security. The challenge with yarn-client mode is that the UI is started before the application master, so it doesn't know what the YARN proxy link is when the UI starts. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2153) Spark Examples
vishnu created SPARK-2153: - Summary: Spark Examples Key: SPARK-2153 URL: https://issues.apache.org/jira/browse/SPARK-2153 Project: Spark Issue Type: Bug Components: Examples Affects Versions: 1.0.0 Reporter: vishnu Priority: Minor Fix For: 1.0.0 The Spark example CassandraTest.scala cannot be built against newer versions of Cassandra. I tried it on Cassandra 2.0.8. Cassandra treats keyspace names as case-sensitive and stores them in lowercase, and the example uses the keyspace casDemo, so the program fails with an error stating the keyspace was not found. Also, the new Cassandra jars no longer contain org.apache.cassandra.db.IColumn, so org.apache.cassandra.db.Column has to be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2154) Worker goes down.
siva venkat gogineni created SPARK-2154: --- Summary: Worker goes down. Key: SPARK-2154 URL: https://issues.apache.org/jira/browse/SPARK-2154 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 0.9.0, 0.8.1 Environment: Spark on a cluster of three nodes on Ubuntu 12.04.4 LTS Reporter: siva venkat gogineni The worker dies when I try to submit more drivers than the allocated cores. When I submit 9 drivers, each with one core, on a cluster that has 8 cores altogether, the worker dies as soon as I submit the 9th driver. It works fine until all 8 cores are used; as soon as I submit the 9th driver, its status remains "Submitted" and the worker crashes. I understand that we cannot run more drivers than the allocated cores, but the problem is that instead of the 9th driver being queued, it is executed, and as a result it crashes the worker. Let me know if there is a way to get around this issue or whether it is being fixed in an upcoming version. Cluster details: Spark 1.0.0, 2 nodes with 4 cores each. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2155) Support effectful / non-deterministic key expressions in CASE WHEN statements
[ https://issues.apache.org/jira/browse/SPARK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zongheng Yang updated SPARK-2155: - Description: Currently we translate CASE KEY WHEN to CASE WHEN, hence incurring redundant evaluations of the key expression. Relevant discussions here: https://github.com/apache/spark/pull/1055/files#r13784248 Support effectful / non-deterministic key expressions in CASE WHEN statements - Key: SPARK-2155 URL: https://issues.apache.org/jira/browse/SPARK-2155 Project: Spark Issue Type: Bug Components: SQL Reporter: Zongheng Yang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2155) Support effectful / non-deterministic key expressions in CASE WHEN statements
[ https://issues.apache.org/jira/browse/SPARK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zongheng Yang updated SPARK-2155: - Description: Currently we translate CASE KEY WHEN to CASE WHEN, hence incurring redundant evaluations of the key expression. Relevant discussions here: https://github.com/apache/spark/pull/1055/files#r13784248 If we really need support for effectful key expressions, at least we can fall back to the baseline approach of having both CaseWhen and CaseKeyWhen as expressions, which seems to introduce a lot of code duplication (e.g. see https://github.com/concretevitamin/spark/blob/47d406a58d129e5bba68bfadf9dd1faa9054d834/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L216 for a sketch implementation). was: the same description, previously phrased as "which share a lot of code duplications". Support effectful / non-deterministic key expressions in CASE WHEN statements - Key: SPARK-2155 URL: https://issues.apache.org/jira/browse/SPARK-2155 Project: Spark Issue Type: Bug Components: SQL Reporter: Zongheng Yang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1903) Document Spark's network connections
[ https://issues.apache.org/jira/browse/SPARK-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ash resolved SPARK-1903. --- Resolution: Fixed Fix Version/s: 1.0.0 Merged into master and branch-1.0 in time for the 1.0.0 release. Jira experts, what's the difference between Fix Version and Target Version? Document Spark's network connections Key: SPARK-1903 URL: https://issues.apache.org/jira/browse/SPARK-1903 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 1.0.0 Reporter: Andrew Ash Fix For: 1.0.0 For people who want to apply strict firewalls to the Spark cluster, knowing when and why what JVMs connect to what other JVMs on what ports would be super valuable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2100) Allow users to disable Jetty Spark UI in local mode
[ https://issues.apache.org/jira/browse/SPARK-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032803#comment-14032803 ] Sean Owen commented on SPARK-2100: -- Tomcat and Jetty classes don't overlap -- do you mean the Servlet API classes? That's a different known issue. Allow users to disable Jetty Spark UI in local mode --- Key: SPARK-2100 URL: https://issues.apache.org/jira/browse/SPARK-2100 Project: Spark Issue Type: Improvement Reporter: DB Tsai We want to use Spark's Hadoop APIs in local mode at design time to explore the first couple hundred lines of data in HDFS. We also want to use Spark inside our Tomcat application, and starting a Jetty UI makes Tomcat unhappy. In those scenarios the Spark UI is unnecessary and wastes resources, so for local mode it would be desirable to let users disable the Spark UI. A couple of places I found where Jetty is started, in SparkEnv.scala: 1) val broadcastManager = new BroadcastManager(isDriver, conf, securityManager) 2) val httpFileServer = new HttpFileServer(securityManager) httpFileServer.initialize() I don't know whether broadcastManager is needed in local mode, though. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1930) The Container is running beyond physical memory limits, so as to be killed.
[ https://issues.apache.org/jira/browse/SPARK-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-1930. -- Resolution: Fixed Fix Version/s: 1.1.0 The Container is running beyond physical memory limits, so as to be killed. --- Key: SPARK-1930 URL: https://issues.apache.org/jira/browse/SPARK-1930 Project: Spark Issue Type: Bug Components: YARN Reporter: Guoqiang Li Assignee: Guoqiang Li Fix For: 1.0.1, 1.1.0 When the containers occupies 8G memory ,the containers were killed yarn node manager log: {code} 2014-05-23 13:35:30,776 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=4947,containerID=container_1400809535638_0015_01_05] is running beyond physical memory limits. Current usage: 8.6 GB of 8.5 GB physical memory used; 10.0 GB of 17.8 GB virtual memory used. Killing container. Dump of the process-tree for container_1400809535638_0015_01_05 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 4947 25417 4947 4947 (bash) 0 0 110804992 335 /bin/bash -c /usr/java/jdk1.7.0_45-cloudera/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms8192m -Xmx8192m -Xss2m -Djava.io.tmpdir=/yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_05/tmp -Dlog4j.configuration=log4j-spark-container.properties -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 -Dspark.akka.frameSize=20 org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sp...@10dian71.domain.test:45477/user/CoarseGrainedScheduler 3 10dian72.domain.test 4 1 /var/log/hadoop-yarn/container/application_1400809535638_0015/container_1400809535638_0015_01_05/stdout 2 /var/log/hadoop-yarn/container/application_1400809535638_0015/container_1400809535638_0015_01_05/stderr |- 4957 4947 4947 4947 (java) 157809 12620 10667016192 2245522 /usr/java/jdk1.7.0_45-cloudera/bin/java -server 
-XX:OnOutOfMemoryError=kill %p -Xms8192m -Xmx8192m -Xss2m -Djava.io.tmpdir=/yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_05/tmp -Dlog4j.configuration=log4j-spark-container.properties -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 -Dspark.akka.frameSize=20 org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sp...@10dian71.domain.test:45477/user/CoarseGrainedScheduler 3 10dian72.domain.test 4 2014-05-23 13:35:30,776 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Removed ProcessTree with root 4947 2014-05-23 13:35:30,776 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1400809535638_0015_01_05 transitioned from RUNNING to KILLING 2014-05-23 13:35:30,777 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1400809535638_0015_01_05 2014-05-23 13:35:30,788 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1400809535638_0015_01_05 is : 143 2014-05-23 13:35:30,829 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1400809535638_0015_01_05 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL 2014-05-23 13:35:30,830 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_05 2014-05-23 13:35:30,830 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=spark OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1400809535638_0015 CONTAINERID=container_1400809535638_0015_01_05 2014-05-23 13:35:30,830 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1400809535638_0015_01_05 
transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE 2014-05-23 13:35:30,830 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1400809535638_0015_01_05 from application application_1400809535638_0015 {code} I think it is related to {{YarnAllocationHandler.MEMORY_OVERHEAD}} https://github.com/apache/spark/blob/master/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala#L562 Relative to 8 GB, 384 MB of overhead is too small. -- This message was sent by Atlassian JIRA (v6.2#6252)
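The arithmetic behind this report can be sketched as follows. This is an illustrative model, not Spark's actual code; the function names are hypothetical. YARN kills a container once its physical memory usage exceeds the requested size, and at the time the request was the executor heap plus a fixed overhead constant, so a large heap (8 GB) left only 384 MB of headroom for off-heap usage:

```python
# Toy model of the container sizing described in this issue (not Spark's
# actual implementation; names are illustrative).

MEMORY_OVERHEAD_MB = 384  # the fixed overhead constant referenced above

def container_limit_mb(executor_memory_mb):
    """Physical memory YARN grants the container: heap plus fixed overhead."""
    return executor_memory_mb + MEMORY_OVERHEAD_MB

def is_killed(executor_memory_mb, physical_usage_mb):
    """True if the node manager would kill the container for exceeding its limit."""
    return physical_usage_mb > container_limit_mb(executor_memory_mb)

# With -Xmx8192m as in the log, the limit is 8192 + 384 = 8576 MB, so JVM
# off-heap usage (thread stacks, direct buffers, metaspace) of a few hundred
# MB is enough to cross the line.
print(container_limit_mb(8192))   # 8576
print(is_killed(8192, 8806))      # True: ~8.6 GB of usage exceeds the limit
```

This is why a fixed 384 MB is too small relative to an 8 GB heap: off-heap usage tends to scale with heap size, which is the motivation for a proportional overhead.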
[jira] [Commented] (SPARK-1112) When spark.akka.frameSize 10, task results bigger than 10MiB block execution
[ https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032867#comment-14032867 ] Chen Jin commented on SPARK-1112: - To follow up this thread, I have done some experiments with the frameSize around 10MB. 1) spark.akka.frameSize = 10 If one of the partition sizes is very close to 10MB, say 9.97MB, execution blocks without any exception or warning. The worker finishes the task and sends the serialized result, then throws an exception saying the Hadoop IPC client connection stopped (visible after changing the logging to debug level). However, the master never receives the results and the program just hangs. But if the sizes of all partitions are less than some number between 9.96MB and 9.97MB, the program works fine. 2) spark.akka.frameSize = 9 When the partition size is just a little smaller than 9MB, it fails as well. This behavior is not exactly what SPARK-1112 is about; could you please guide me on how to open a separate bug for the case where the serialization size is very close to 10MB? Thanks a lot. When spark.akka.frameSize 10, task results bigger than 10MiB block execution -- Key: SPARK-1112 URL: https://issues.apache.org/jira/browse/SPARK-1112 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0 Reporter: Guillaume Pitel Priority: Critical Fix For: 0.9.2 When I set spark.akka.frameSize to something over 10, messages sent from the executors to the driver completely block execution if the message is bigger than 10MiB and smaller than the frameSize (if it's above the frameSize, it's OK). The workaround is to set spark.akka.frameSize to 10; in that case, since 0.8.1, the blockManager deals with the data to be sent. It seems slower than Akka direct messages though. The configuration seems to be correctly read (see actorSystemConfig.txt), so I don't see where the 10MiB could come from -- This message was sent by Atlassian JIRA (v6.2#6252)
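The "works below 9.96MB, hangs at 9.97MB" observation above is consistent with a frame-size check that ignores envelope overhead: a payload that fits under the configured frame size on its own can still exceed the limit once the message envelope is added, and an oversized message is dropped rather than reported. The sketch below is purely illustrative; the overhead value and names are assumptions, not Akka's or Spark's actual constants.

```python
# Illustrative model of the failure mode in this thread (hypothetical
# overhead value; not Akka's real envelope size).

FRAME_SIZE_MB = 10
OVERHEAD_BYTES = 32 * 1024  # assumed per-message envelope overhead

def fits_in_frame(result_bytes, frame_size_mb=FRAME_SIZE_MB):
    """Does the serialized task result plus envelope fit in one Akka frame?"""
    return result_bytes + OVERHEAD_BYTES <= frame_size_mb * 1024 * 1024

# A 9.5 MB result fits; a 9.99 MB result exceeds the 10 MB frame once the
# envelope is added, so the message would be dropped and the driver hangs.
print(fits_in_frame(int(9.5 * 1024 * 1024)))    # True
print(fits_in_frame(int(9.99 * 1024 * 1024)))   # False
```

Later Spark versions addressed this class of problem by checking serialized result sizes against the frame size (minus a reserved margin) and failing the task with an explicit error instead of hanging.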
[jira] [Commented] (SPARK-2010) Support for nested data in PySpark SQL
[ https://issues.apache.org/jira/browse/SPARK-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032931#comment-14032931 ] Michael Armbrust commented on SPARK-2010: - I know we merged one PR for this, but there are still some open questions about what SQL structs and maps mean in Python, so let's leave this open. Support for nested data in PySpark SQL -- Key: SPARK-2010 URL: https://issues.apache.org/jira/browse/SPARK-2010 Project: Spark Issue Type: Improvement Components: SQL Reporter: Michael Armbrust Assignee: Kan Zhang -- This message was sent by Atlassian JIRA (v6.2#6252)
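The open question mentioned in the comment can be made concrete: in Python, both a SQL struct (fixed named fields) and a SQL map (open key set, uniform value type) naturally round-trip as a dict, so schema inference cannot reliably distinguish them. A toy illustration, not PySpark's API; the heuristic below is hypothetical:

```python
# Toy illustration of the struct-vs-map ambiguity for Python dicts
# (hypothetical heuristic; not PySpark's inference logic).

struct_like = {"name": "alice", "age": 3}  # fixed fields, mixed value types
map_like = {"a": 1, "b": 2, "c": 3}        # open key set, uniform value type

def infer_kind(d):
    """Guess struct vs map: uniform value types suggest a map."""
    value_types = {type(v) for v in d.values()}
    return "map" if len(value_types) == 1 else "struct"

print(infer_kind(struct_like))  # struct
print(infer_kind(map_like))     # map
# A dict like {"x": 1, "y": 2} is ambiguous under any such rule, which is
# why the question stays open.
```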
[jira] [Updated] (SPARK-2147) Master UI forgets about Executors when application exits cleanly
[ https://issues.apache.org/jira/browse/SPARK-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-2147: - Description: When an application exits cleanly, the Master will remove all executors from the application's ApplicationInfo, causing the historic Completed Applications page to report that there were no executors associated with that application. On the contrary, if the application exits uncleanly, then the Master will remove the application FIRST, and will not actually remove the executors from the ApplicationInfo page. This causes the executors to show up correctly in the Completed Applications page. The correct behavior would probably be to gather a history of all executors (so we'd retain executors that we had at one point but were removed during the job), and not remove lost executors. was: When an application exists cleanly, the Master will remove all executors from the application's ApplicationInfo, causing the historic Completed Applications page to report that there were no executors associated with that application. On the contrary, if the application exits uncleanly, then the Master will remove the application FIRST, and will not actually remove the executors from the ApplicationInfo page. This causes the executors to show up correctly in the Completed Applications page. The correct behavior would probably be to gather a history of all executors (so we'd retain executors that we had at one point but were removed during the job), and not remove lost executors. 
Master UI forgets about Executors when application exits cleanly Key: SPARK-2147 URL: https://issues.apache.org/jira/browse/SPARK-2147 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.0.0 Reporter: Aaron Davidson Assignee: Andrew Or When an application exits cleanly, the Master will remove all executors from the application's ApplicationInfo, causing the historic Completed Applications page to report that there were no executors associated with that application. On the contrary, if the application exits uncleanly, then the Master will remove the application FIRST, and will not actually remove the executors from the ApplicationInfo page. This causes the executors to show up correctly in the Completed Applications page. The correct behavior would probably be to gather a history of all executors (so we'd retain executors that we had at one point but were removed during the job), and not remove lost executors. -- This message was sent by Atlassian JIRA (v6.2#6252)
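The fix proposed in the description amounts to retaining a history of removed executors instead of deleting them from the application's state. A minimal sketch of that bookkeeping, with hypothetical names (not Spark's actual ApplicationInfo):

```python
# Minimal sketch of the proposed behavior (illustrative; not Spark's
# ApplicationInfo): mark removed executors and keep them, so the Completed
# Applications page can still list every executor the app ever had.

class AppInfo:
    def __init__(self):
        self.executors = {}          # live executors by id
        self.removed_executors = []  # history of executors that went away

    def add_executor(self, exec_id):
        self.executors[exec_id] = {"id": exec_id, "state": "RUNNING"}

    def remove_executor(self, exec_id):
        info = self.executors.pop(exec_id, None)
        if info is not None:
            info["state"] = "REMOVED"
            self.removed_executors.append(info)  # retain, don't forget

    def all_executors(self):
        return list(self.executors.values()) + self.removed_executors

app = AppInfo()
app.add_executor(0)
app.add_executor(1)
app.remove_executor(0)           # a clean exit removes executors...
print(len(app.all_executors()))  # ...but the history still reports 2
```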
[jira] [Commented] (SPARK-2100) Allow users to disable Jetty Spark UI in local mode
[ https://issues.apache.org/jira/browse/SPARK-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033049#comment-14033049 ] DB Tsai commented on SPARK-2100: [~sowen] You are right. The servlet API is pulled in by Jetty's dependencies. If Jetty is included with the intransitive annotation, Tomcat can start. However, when I start a new SparkContext, it hangs forever without any error message. val sc = new SparkContext(deployMode, appName, sparkConf) Allow users to disable Jetty Spark UI in local mode --- Key: SPARK-2100 URL: https://issues.apache.org/jira/browse/SPARK-2100 Project: Spark Issue Type: Improvement Reporter: DB Tsai We want to use Spark's Hadoop APIs in local mode at design time to explore the first couple hundred lines of data in HDFS. Also, we want to use Spark in our Tomcat application, and starting a Jetty UI makes our Tomcat unhappy. In those scenarios, the Spark UI is unnecessary and wastes resources. As a result, for local mode, it's desirable that users are able to disable the Spark UI. A couple of places I found where Jetty will be started, in SparkEnv.scala: 1) val broadcastManager = new BroadcastManager(isDriver, conf, securityManager) 2) val httpFileServer = new HttpFileServer(securityManager) httpFileServer.initialize() I don't know if broadcastManager is needed in local mode, though. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2100) Allow users to disable Jetty Spark UI in local mode
[ https://issues.apache.org/jira/browse/SPARK-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033049#comment-14033049 ] DB Tsai edited comment on SPARK-2100 at 6/16/14 9:50 PM: - [~sowen] You are right. The servlet api is pulled by jetty's dependency. If the jetty is included with intransitive annotation, the tomcat can start. However, when I start a new SparkContext, it will hang forever without any error message. val sc = new SparkContext(local, appName, sparkConf) was (Author: dbtsai): [~sowen] You are right. The servlet api is pulled by jetty's dependency. If the jetty is included with intransitive annotation, the tomcat can start. However, when I start a new SparkContext, it will hang forever without any error message. val sc = new SparkContext(deployMode, appName, sparkConf) Allow users to disable Jetty Spark UI in local mode --- Key: SPARK-2100 URL: https://issues.apache.org/jira/browse/SPARK-2100 Project: Spark Issue Type: Improvement Reporter: DB Tsai Since we want to use Spark hadoop APIs in local mode for design time to explore the first couple hundred lines of data in HDFS. Also, we want to use Spark in our tomcat application, so starting a jetty UI will make our tomcat unhappy. In those scenarios, Spark UI is not necessary, and wasting resource. As a result, for local mode, it's desirable that users are able to disable the spark UI. Couple places I found where the jetty will be started. In SparkEnv.scala 1) val broadcastManager = new BroadcastManager(isDriver, conf, securityManager) 2) val httpFileServer = new HttpFileServer(securityManager) httpFileServer.initialize() I don't know if broadcastManager is needed in local mode tho. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2100) Allow users to disable Jetty Spark UI in local mode
[ https://issues.apache.org/jira/browse/SPARK-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033055#comment-14033055 ] Sean Owen edited comment on SPARK-2100 at 6/16/14 9:55 PM: --- Yes, the Maven build has to do a little work to exclude copies of the Servlet 2.x API. Spark ends up including one copy of the Servlet 3.0 APIs, which should make everybody happy. But if your build brings back in something else, and it's bringing its own Servlet API, you may need to exclude it. (This dependency is super annoying because different containers have distributed the same classes in different artifacts.) Advert break: SPARK-1949 fixes this type of issue for Spark's own SBT-based build. Not exactly the issue here but related, and would be cool to get it committed. https://issues.apache.org/jira/browse/SPARK-1949 was (Author: srowen): Yes, the Maven build has to do a little work to exclude copies of the Servlet 2.x API. Spark ends up including one copy of the Servlet 3.0 APIs, which should everybody happing. But if your build brings back in something else, and it's bringing its own Servlet API, you may need to exclude it. (This dependency is super annoying because different containers have distributed the same classes in different artifacts.) Advert break: SPARK-1949 fixes this type of issue for Spark's own SBT-based build. Not exactly the issue here but related, and would be cool to get it committed. https://issues.apache.org/jira/browse/SPARK-1949 Allow users to disable Jetty Spark UI in local mode --- Key: SPARK-2100 URL: https://issues.apache.org/jira/browse/SPARK-2100 Project: Spark Issue Type: Improvement Reporter: DB Tsai Since we want to use Spark hadoop APIs in local mode for design time to explore the first couple hundred lines of data in HDFS. Also, we want to use Spark in our tomcat application, so starting a jetty UI will make our tomcat unhappy. In those scenarios, Spark UI is not necessary, and wasting resource. 
As a result, for local mode, it's desirable that users are able to disable the Spark UI. A couple of places I found where Jetty will be started, in SparkEnv.scala: 1) val broadcastManager = new BroadcastManager(isDriver, conf, securityManager) 2) val httpFileServer = new HttpFileServer(securityManager) httpFileServer.initialize() I don't know if broadcastManager is needed in local mode, though. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2100) Allow users to disable Jetty Spark UI in local mode
[ https://issues.apache.org/jira/browse/SPARK-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033055#comment-14033055 ] Sean Owen commented on SPARK-2100: -- Yes, the Maven build has to do a little work to exclude copies of the Servlet 2.x API. Spark ends up including one copy of the Servlet 3.0 APIs, which should make everybody happy. But if your build brings back in something else, and it's bringing its own Servlet API, you may need to exclude it. (This dependency is super annoying because different containers have distributed the same classes in different artifacts.) Advert break: SPARK-1949 fixes this type of issue for Spark's own SBT-based build. Not exactly the issue here but related, and would be cool to get it committed. https://issues.apache.org/jira/browse/SPARK-1949 Allow users to disable Jetty Spark UI in local mode --- Key: SPARK-2100 URL: https://issues.apache.org/jira/browse/SPARK-2100 Project: Spark Issue Type: Improvement Reporter: DB Tsai We want to use Spark's Hadoop APIs in local mode at design time to explore the first couple hundred lines of data in HDFS. Also, we want to use Spark in our Tomcat application, and starting a Jetty UI makes our Tomcat unhappy. In those scenarios, the Spark UI is unnecessary and wastes resources. As a result, for local mode, it's desirable that users are able to disable the Spark UI. A couple of places I found where Jetty will be started, in SparkEnv.scala: 1) val broadcastManager = new BroadcastManager(isDriver, conf, securityManager) 2) val httpFileServer = new HttpFileServer(securityManager) httpFileServer.initialize() I don't know if broadcastManager is needed in local mode, though. -- This message was sent by Atlassian JIRA (v6.2#6252)
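What this request boils down to is a boolean configuration flag consulted before binding the Jetty server. A hedged sketch of such a gate follows; the function and the stand-in return value are illustrative, though the configuration key shown matches the `spark.ui.enabled` switch Spark eventually shipped:

```python
# Illustrative sketch of the requested behavior (not Spark's actual code):
# consult a config flag before starting the Jetty-backed web UI, so a driver
# embedded in another container (e.g. Tomcat) can opt out.

def maybe_start_ui(conf):
    """Start the web UI only if the config does not disable it."""
    if conf.get("spark.ui.enabled", "true") == "false":
        return None  # skip binding a Jetty server entirely
    return "jetty-server"  # stand-in for the real SparkUI bind step

print(maybe_start_ui({"spark.master": "local", "spark.ui.enabled": "false"}))  # None
print(maybe_start_ui({"spark.master": "local"}))  # defaults to starting the UI
```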
[jira] [Created] (SPARK-2160) error of Decision tree algorithm in Spark MLlib
caoli created SPARK-2160: Summary: error of Decision tree algorithm in Spark MLlib Key: SPARK-2160 URL: https://issues.apache.org/jira/browse/SPARK-2160 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.0 Reporter: caoli Fix For: 1.1.0 There is an error in computing rightNodeAgg in the decision tree algorithm in Spark MLlib: in the function extractLeftRightNodeAggregates(), the binData index used when computing rightNodeAgg is wrong. In the DecisionTree.scala file, around line 980: rightNodeAgg(featureIndex)(2 * (numBins - 2 - splitIndex)) = binData(shift + (2 * (numBins - 2 - splitIndex))) + rightNodeAgg(featureIndex)(2 * (numBins - 1 - splitIndex)) The binData(shift + (2 * (numBins - 2 - splitIndex))) index computation is wrong, so the rightNodeAgg result includes repeated data across bins. -- This message was sent by Atlassian JIRA (v6.2#6252)
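The quantity the recurrence above builds is, conceptually, a suffix sum over bins: for each candidate split, the right-node aggregate is the total of all bins strictly to the right of the split. The toy sketch below shows that recurrence on a flat list (it deliberately ignores MLlib's packed per-feature array layout); picking the wrong bin index in the recurrence, as the report describes, would add some bins more than once.

```python
# Toy version of the right-node aggregation recurrence (illustrative; not
# MLlib's packed-array indexing).

def right_node_agg(bin_data):
    """rightAgg[s] = sum of bin_data[s+1:], built right-to-left via
    rightAgg[s] = bin_data[s+1] + rightAgg[s+1]."""
    n = len(bin_data)
    agg = [0] * n
    for s in range(n - 2, -1, -1):
        agg[s] = bin_data[s + 1] + agg[s + 1]
    return agg

bins = [5, 3, 2, 7]
print(right_node_agg(bins))  # [12, 9, 7, 0]
```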
[jira] [Created] (SPARK-2161) UI should remember executors that have been removed
Andrew Or created SPARK-2161: Summary: UI should remember executors that have been removed Key: SPARK-2161 URL: https://issues.apache.org/jira/browse/SPARK-2161 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Andrew Or Fix For: 1.0.1 This applies to all of SparkUI, MasterWebUI, and WorkerWebUI. If an executor fails, it just disappears from these UIs. It would be helpful if the UIs showed the logs explaining why the executors failed. -- This message was sent by Atlassian JIRA (v6.2#6252)