[jira] [Created] (SPARK-2151) spark-submit issue (int format expected for memory parameter)

2014-06-16 Thread Nishkam Ravi (JIRA)
Nishkam Ravi created SPARK-2151:
---

 Summary: spark-submit issue (int format expected for memory 
parameter)
 Key: SPARK-2151
 URL: https://issues.apache.org/jira/browse/SPARK-2151
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Nishkam Ravi


Get this exception when invoking spark-submit in standalone cluster mode:

Exception in thread "main" java.lang.NumberFormatException: For input string: "38g"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:492)
    at java.lang.Integer.parseInt(Integer.java:527)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
    at org.apache.spark.deploy.ClientArguments.parse(ClientArguments.scala:55)
    at org.apache.spark.deploy.ClientArguments.<init>(ClientArguments.scala:47)
    at org.apache.spark.deploy.Client$.main(Client.scala:148)
    at org.apache.spark.deploy.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
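
The failure is in ClientArguments, which parses the driver memory argument with toInt while spark-submit accepts strings like {{38g}}. A minimal sketch of the kind of conversion needed (helper name and placement are illustrative, not the actual change in the PR):

{code}
// Illustrative only: convert a memory string such as "512m" or "38g" to whole megabytes,
// so values that spark-submit accepts do not blow up ClientArguments' toInt parsing.
def memoryStringToMb(str: String): Int = {
  val s = str.trim.toLowerCase
  if (s.endsWith("g")) s.dropRight(1).toInt * 1024
  else if (s.endsWith("m")) s.dropRight(1).toInt
  else s.toInt // assume a bare number already means megabytes
}

// e.g. memoryStringToMb("38g") == 38912, instead of "38g".toInt throwing NumberFormatException
{code}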




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2151) spark-submit issue (int format expected for memory parameter)

2014-06-16 Thread Nishkam Ravi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032133#comment-14032133
 ] 

Nishkam Ravi commented on SPARK-2151:
-

PR: https://github.com/apache/spark/pull/1095/

 spark-submit issue (int format expected for memory parameter)
 -

 Key: SPARK-2151
 URL: https://issues.apache.org/jira/browse/SPARK-2151
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Nishkam Ravi

 Get this exception when invoking spark-submit in standalone cluster mode:
 Exception in thread "main" java.lang.NumberFormatException: For input string: "38g"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:492)
    at java.lang.Integer.parseInt(Integer.java:527)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
    at org.apache.spark.deploy.ClientArguments.parse(ClientArguments.scala:55)
    at org.apache.spark.deploy.ClientArguments.<init>(ClientArguments.scala:47)
    at org.apache.spark.deploy.Client$.main(Client.scala:148)
    at org.apache.spark.deploy.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1999) UI : StorageLevel in storage tab and RDD Storage Info never changes

2014-06-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1999.


  Resolution: Fixed
   Fix Version/s: 1.1.0
  1.0.1
Target Version/s: 1.0.1, 1.1.0

Fixed in:
https://github.com/apache/spark/pull/968

 UI : StorageLevel in storage tab and RDD Storage Info never changes 
 

 Key: SPARK-1999
 URL: https://issues.apache.org/jira/browse/SPARK-1999
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.0.0
Reporter: Chen Chao
Assignee: Chen Chao
 Fix For: 1.0.1, 1.1.0


 StorageLevel in the 'storage tab' and 'RDD Storage Info' never changes, even if 
 you call rdd.unpersist() and then persist the RDD with a different storage 
 level.
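
 A minimal way to reproduce what is being described (a sketch; assumes an existing SparkContext {{sc}}):

{code}
// Sketch of a reproduction, assuming an existing SparkContext sc.
import org.apache.spark.storage.StorageLevel

val rdd = sc.parallelize(1 to 1000)
rdd.persist(StorageLevel.MEMORY_ONLY)
rdd.count()                       // storage tab shows MEMORY_ONLY

rdd.unpersist()
rdd.persist(StorageLevel.DISK_ONLY)
rdd.count()                       // before the fix, the UI still reported MEMORY_ONLY
{code}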



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1999) UI : StorageLevel in storage tab and RDD Storage Info never changes

2014-06-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1999:
---

Assignee: Chen Chao

 UI : StorageLevel in storage tab and RDD Storage Info never changes 
 

 Key: SPARK-1999
 URL: https://issues.apache.org/jira/browse/SPARK-1999
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.0.0
Reporter: Chen Chao
Assignee: Chen Chao
 Fix For: 1.0.1, 1.1.0


 StorageLevel in the 'storage tab' and 'RDD Storage Info' never changes, even if 
 you call rdd.unpersist() and then persist the RDD with a different storage 
 level.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-2148) Document custom class as key needing equals() AND hashcode()

2014-06-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2148:
---

Issue Type: Improvement  (was: Bug)

 Document custom class as key needing equals() AND hashcode()
 

 Key: SPARK-2148
 URL: https://issues.apache.org/jira/browse/SPARK-2148
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Andrew Ash

 Several support requests on user@ have been tracked down to using a custom 
 class as the key in a {{groupByKey()}} or {{reduceByKey()}} that has a custom 
 {{equals()}} method but not the corresponding custom {{hashCode()}} method.
 Let's add a note in the documentation that custom keys need both {{equals()}} 
 and {{hashCode()}} overridden, never just {{equals()}}.
 The right place for this addition might be as a sub-section or note in 
 http://spark.apache.org/docs/latest/programming-guide.html#working-with-key-value-pairs
 It should probably include a link to 
 http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()
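
 For illustration (an example of the pitfall, not the wording that went into the docs): a key class that overrides {{equals()}} without {{hashCode()}} puts equal keys into different hash buckets, so {{reduceByKey()}} silently produces separate groups for keys that compare equal.

{code}
// Broken: equals() without a matching hashCode(), so equal keys may not group together.
class BadKey(val id: Int) {
  override def equals(other: Any): Boolean = other match {
    case k: BadKey => k.id == id
    case _         => false
  }
}

// Correct: override both, and keep them consistent.
class GoodKey(val id: Int) {
  override def equals(other: Any): Boolean = other match {
    case k: GoodKey => k.id == id
    case _          => false
  }
  override def hashCode(): Int = id.hashCode
}
{code}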



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-2148) Document custom class as key needing equals() AND hashcode()

2014-06-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2148.


   Resolution: Fixed
Fix Version/s: 1.1.0
   1.0.1

Issue resolved by pull request 1092
[https://github.com/apache/spark/pull/1092]

 Document custom class as key needing equals() AND hashcode()
 

 Key: SPARK-2148
 URL: https://issues.apache.org/jira/browse/SPARK-2148
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Andrew Ash
 Fix For: 1.0.1, 1.1.0


 Several support requests on user@ have been tracked down to using a custom 
 class as the key in a {{groupByKey()}} or {{reduceByKey()}} that has a custom 
 {{equals()}} method but not the corresponding custom {{hashCode()}} method.
 Let's add a note in the documentation that custom keys need both {{equals()}} 
 and {{hashCode()}} overridden, never just {{equals()}}.
 The right place for this addition might be as a sub-section or note in 
 http://spark.apache.org/docs/latest/programming-guide.html#working-with-key-value-pairs
 It should probably include a link to 
 http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-2148) Document custom class as key needing equals() AND hashcode()

2014-06-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2148:
---

Assignee: Andrew Ash

 Document custom class as key needing equals() AND hashcode()
 

 Key: SPARK-2148
 URL: https://issues.apache.org/jira/browse/SPARK-2148
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Andrew Ash
Assignee: Andrew Ash
 Fix For: 1.0.1, 1.1.0


 Several support requests on user@ have been tracked down to using a custom 
 class as the key in a {{groupByKey()}} or {{reduceByKey()}} that has a custom 
 {{equals()}} method but not the corresponding custom {{hashCode()}} method.
 Let's add a note in the documentation that custom keys need both {{equals()}} 
 and {{hashCode()}} overridden, never just {{equals()}}.
 The right place for this addition might be as a sub-section or note in 
 http://spark.apache.org/docs/latest/programming-guide.html#working-with-key-value-pairs
 It should probably include a link to 
 http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-2039) Run hadoop output checks for all formats

2014-06-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2039.


   Resolution: Fixed
Fix Version/s: 1.1.0

Fixed by:
https://github.com/apache/spark/pull/1088

 Run hadoop output checks for all formats
 

 Key: SPARK-2039
 URL: https://issues.apache.org/jira/browse/SPARK-2039
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Patrick Wendell
Assignee: Nan Zhu
 Fix For: 1.1.0


 Now that SPARK-1677 allows users to disable output checks, we should just run 
 them for all types of output formats. I'm not sure why we didn't do this 
 originally but it might have been out of defensiveness since we weren't sure 
 what all implementations did.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1946) Submit stage after executors have been registered

2014-06-16 Thread Zhihui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihui updated SPARK-1946:
--

Description: 
Because creating the TaskSetManager and registering executors are asynchronous, 
running a job before enough executors have registered leads to several issues:
* early stages' tasks run without preferred locality;
* the default parallelism on YARN is based on the number of registered executors;
* the number of intermediate files per node for shuffle (this can bring the node 
down, by the way);
* the amount of memory consumed on a node for MEMORY-persisted RDD data (making 
the job fail if disk is not specified, as in some of the MLlib algorithms);
* and so on ...
(thanks to [~mridulm80]'s [comments | 
https://github.com/apache/spark/pull/900#issuecomment-45780405])

A simple workaround is to sleep for a few seconds in the application so that 
executors have enough time to register.

A better way is to make the DAGScheduler submit stages only after enough executors 
have been registered, controlled by configuration properties.

\# submit the stage only after the ratio of successfully registered executors has 
been reached; default value 0 in standalone mode and 0.9 in YARN mode
spark.scheduler.minRegisteredRatio = 0.8

\# regardless of how many executors have registered, submit the stage after 
maxRegisteredWaitingTime (milliseconds); default value 1
spark.scheduler.maxRegisteredWaitingTime = 5000

  was:
Because creating TaskSetManager and registering executors are asynchronous, if 
running job without enough executors, it will lead to some issues
* early stages' tasks run without preferred locality.
* the default parallelism in yarn is based on number of executors, 
* the number of intermediate files per node for shuffle (this can bring the 
node down btw)
* and amount of memory consumed on a node for rdd MEMORY persisted data (making 
the job fail if disk is not specified : like some of the mllib algos ?)
* and so on ...
(thanks [~mridulm80] 's [comments | 
https://github.com/apache/spark/pull/900#issuecomment-45780405])

A simple solution is sleeping few seconds in application, so that executors 
have enough time to register.

A better way is to make DAGScheduler submit stage after a few of executors have 
been registered by configuration properties.

\# submit stage only after successfully registered executors arrived the 
number, default value 0
spark.executor.minRegisteredNum = 20

\# whatever registeredRatio is arrived, submit stage after the 
maxRegisteredWaitingTime(millisecond), default value 1
spark.executor.maxRegisteredWaitingTime = 5000


 Submit stage after executors have been registered
 -

 Key: SPARK-1946
 URL: https://issues.apache.org/jira/browse/SPARK-1946
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Zhihui
 Attachments: Spark Task Scheduler Optimization Proposal.pptx


 Because creating the TaskSetManager and registering executors are asynchronous, 
 running a job before enough executors have registered leads to several issues:
 * early stages' tasks run without preferred locality;
 * the default parallelism on YARN is based on the number of registered executors;
 * the number of intermediate files per node for shuffle (this can bring the node 
 down, by the way);
 * the amount of memory consumed on a node for MEMORY-persisted RDD data (making 
 the job fail if disk is not specified, as in some of the MLlib algorithms);
 * and so on ...
 (thanks to [~mridulm80]'s [comments | 
 https://github.com/apache/spark/pull/900#issuecomment-45780405])
 A simple workaround is to sleep for a few seconds in the application so that 
 executors have enough time to register.
 A better way is to make the DAGScheduler submit stages only after enough executors 
 have been registered, controlled by configuration properties.
 \# submit the stage only after the ratio of successfully registered executors has 
 been reached; default value 0 in standalone mode and 0.9 in YARN mode
 spark.scheduler.minRegisteredRatio = 0.8
 \# regardless of how many executors have registered, submit the stage after 
 maxRegisteredWaitingTime (milliseconds); default value 1
 spark.scheduler.maxRegisteredWaitingTime = 5000
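
 For concreteness, this is how an application would set the proposed properties (a sketch using the names and values as proposed above; they may differ from what is eventually merged):

{code}
// Sketch only: property names/values as proposed in this issue, not necessarily final.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("wait-for-executors-example")
  .set("spark.scheduler.minRegisteredRatio", "0.8")        // wait for 80% of executors
  .set("spark.scheduler.maxRegisteredWaitingTime", "5000") // but at most 5 seconds
val sc = new SparkContext(conf)
{code}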



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-2152) Error computing rightNodeAgg in the decision tree algorithm in Spark MLlib

2014-06-16 Thread caoli (JIRA)
caoli created SPARK-2152:


 Summary: Error computing rightNodeAgg in the decision tree algorithm 
in Spark MLlib
 Key: SPARK-2152
 URL: https://issues.apache.org/jira/browse/SPARK-2152
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.0.0
 Environment: Windows 7, 32-bit, 3 GB memory
Reporter: caoli


There is an error in the computation of rightNodeAgg in the decision tree 
algorithm in Spark MLlib, in the function extractLeftRightNodeAggregates(): the 
binData index used when computing rightNodeAgg is wrong. In DecisionTree.scala, 
around line 980:

 rightNodeAgg(featureIndex)(2 * (numBins - 2 - splitIndex)) =
   binData(shift + (2 * (numBins - 2 - splitIndex))) +
     rightNodeAgg(featureIndex)(2 * (numBins - 1 - splitIndex))

The binData(shift + (2 * (numBins - 2 - splitIndex))) index is computed 
incorrectly, so the resulting rightNodeAgg contains repeated bin data.
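
For readers unfamiliar with the aggregation: the intent of rightNodeAgg is a right-to-left cumulative sum over bins, and the report is that the binData offset on the right-hand side does not line up with the element being filled in. A simplified (non-MLlib) version of that pattern looks like this:

{code}
// Simplified illustration (not the MLlib code): a right-to-left cumulative aggregate
// over bins, where rightAgg(i) must be binData(i) + rightAgg(i + 1). Assumes non-empty input.
def rightCumulative(binData: Array[Double]): Array[Double] = {
  val n = binData.length
  val rightAgg = new Array[Double](n)
  rightAgg(n - 1) = binData(n - 1)
  for (i <- (n - 2) to 0 by -1) {
    rightAgg(i) = binData(i) + rightAgg(i + 1)
  }
  rightAgg
}
{code}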



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2018) Big-Endian (IBM Power7) Spark Serialization issue

2014-06-16 Thread Gireesh Punathil (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032255#comment-14032255
 ] 

Gireesh Punathil commented on SPARK-2018:
-

I was able to identify the root cause. Please see 
https://github.com/ning/compress/issues/37 for details.

 Big-Endian (IBM Power7)  Spark Serialization issue
 --

 Key: SPARK-2018
 URL: https://issues.apache.org/jira/browse/SPARK-2018
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.0.0
 Environment: hardware : IBM Power7
 OS:Linux version 2.6.32-358.el6.ppc64 
 (mockbu...@ppc-017.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red 
 Hat 4.4.7-3) (GCC) ) #1 SMP Tue Jan 29 11:43:27 EST 2013
 JDK: Java(TM) SE Runtime Environment (build pxp6470sr5-20130619_01(SR5))
 IBM J9 VM (build 2.6, JRE 1.7.0 Linux ppc64-64 Compressed References 
 20130617_152572 (JIT enabled, AOT enabled)
 Hadoop:Hadoop-0.2.3-CDH5.0
 Spark:Spark-1.0.0 or Spark-0.9.1
 spark-env.sh:
 export JAVA_HOME=/opt/ibm/java-ppc64-70/
 export SPARK_MASTER_IP=9.114.34.69
 export SPARK_WORKER_MEMORY=1m
 export SPARK_CLASSPATH=/home/test1/spark-1.0.0-bin-hadoop2/lib
 export  STANDALONE_SPARK_MASTER_HOST=9.114.34.69
 #export SPARK_JAVA_OPTS=' -Xdebug 
 -Xrunjdwp:transport=dt_socket,address=9,server=y,suspend=n '
Reporter: Yanjie Gao

 We have an application running on Spark on a Power7 system, but we hit an 
 important serialization issue. The example HdfsWordCount reproduces the problem:
 ./bin/run-example org.apache.spark.examples.streaming.HdfsWordCount localdir
 We used Power7 (a big-endian architecture) and Red Hat 6.4.
 Big-endianness is the main cause, since the example ran successfully in another 
 Power-based little-endian setup.
 Here is the exception stack and log:
 Spark Executor Command: /opt/ibm/java-ppc64-70//bin/java -cp 
 /home/test1/spark-1.0.0-bin-hadoop2/lib::/home/test1/src/spark-1.0.0-bin-hadoop2/conf:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/test1/src/hadoop-2.3.0-cdh5.0.0/etc/hadoop/:/home/test1/src/hadoop-2.3.0-cdh5.0.0/etc/hadoop/
  -XX:MaxPermSize=128m  -Xdebug 
 -Xrunjdwp:transport=dt_socket,address=9,server=y,suspend=n -Xms512M 
 -Xmx512M org.apache.spark.executor.CoarseGrainedExecutorBackend 
 akka.tcp://spark@9.186.105.141:60253/user/CoarseGrainedScheduler 2 
 p7hvs7br16 4 akka.tcp://sparkWorker@p7hvs7br16:59240/user/Worker 
 app-20140604023054-
 
 14/06/04 02:31:20 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 14/06/04 02:31:21 INFO spark.SecurityManager: Changing view acls to: 
 test1,yifeng
 14/06/04 02:31:21 INFO spark.SecurityManager: SecurityManager: authentication 
 disabled; ui acls disabled; users with view permissions: Set(test1, yifeng)
 14/06/04 02:31:22 INFO slf4j.Slf4jLogger: Slf4jLogger started
 14/06/04 02:31:22 INFO Remoting: Starting remoting
 14/06/04 02:31:22 INFO Remoting: Remoting started; listening on addresses 
 :[akka.tcp://sparkExecutor@p7hvs7br16:39658]
 14/06/04 02:31:22 INFO Remoting: Remoting now listens on addresses: 
 [akka.tcp://sparkExecutor@p7hvs7br16:39658]
 14/06/04 02:31:22 INFO executor.CoarseGrainedExecutorBackend: Connecting to 
 driver: akka.tcp://spark@9.186.105.141:60253/user/CoarseGrainedScheduler
 14/06/04 02:31:22 INFO worker.WorkerWatcher: Connecting to worker 
 akka.tcp://sparkWorker@p7hvs7br16:59240/user/Worker
 14/06/04 02:31:23 INFO worker.WorkerWatcher: Successfully connected to 
 akka.tcp://sparkWorker@p7hvs7br16:59240/user/Worker
 14/06/04 02:31:24 INFO executor.CoarseGrainedExecutorBackend: Successfully 
 registered with driver
 14/06/04 02:31:24 INFO spark.SecurityManager: Changing view acls to: 
 test1,yifeng
 14/06/04 02:31:24 INFO spark.SecurityManager: SecurityManager: authentication 
 disabled; ui acls disabled; users with view permissions: Set(test1, yifeng)
 14/06/04 02:31:24 INFO slf4j.Slf4jLogger: Slf4jLogger started
 14/06/04 02:31:24 INFO Remoting: Starting remoting
 14/06/04 02:31:24 INFO Remoting: Remoting started; listening on addresses 
 :[akka.tcp://spark@p7hvs7br16:58990]
 14/06/04 02:31:24 INFO Remoting: Remoting now listens on addresses: 
 [akka.tcp://spark@p7hvs7br16:58990]
 14/06/04 02:31:24 INFO spark.SparkEnv: Connecting to MapOutputTracker: 
 akka.tcp://spark@9.186.105.141:60253/user/MapOutputTracker
 14/06/04 02:31:25 INFO spark.SparkEnv: Connecting to BlockManagerMaster: 
 akka.tcp://spark@9.186.105.141:60253/user/BlockManagerMaster
 14/06/04 02:31:25 INFO 

[jira] [Commented] (SPARK-2126) Move MapOutputTracker behind ShuffleManager interface

2014-06-16 Thread Nan Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032319#comment-14032319
 ] 

Nan Zhu commented on SPARK-2126:


[~matei], how about assigning it to me? I'm interested in working on this, 
thanks!

 Move MapOutputTracker behind ShuffleManager interface
 -

 Key: SPARK-2126
 URL: https://issues.apache.org/jira/browse/SPARK-2126
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle, Spark Core
Reporter: Matei Zaharia

 This will require changing the interface between the DAGScheduler and 
 MapOutputTracker to be method calls on the ShuffleManager instead. However, 
 it will make it easier to do push-based shuffle and other ideas requiring 
 changes to map output tracking.
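
 A very rough sketch of the direction described (the trait and method names below are invented for illustration, not the actual design):

{code}
// Invented names, for illustration only: the DAGScheduler would call into the
// ShuffleManager rather than talking to MapOutputTracker directly.
trait ShuffleManager {
  def registerShuffle(shuffleId: Int, numMapTasks: Int): Unit
  def mapOutputSizes(shuffleId: Int, reduceId: Int): Array[Long]
  def unregisterShuffle(shuffleId: Int): Unit
}
// A push-based shuffle could then supply its own map output tracking behind the same interface.
{code}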



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1291) Link the spark UI to RM ui in yarn-client mode

2014-06-16 Thread Rahul Singhal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032343#comment-14032343
 ] 

Rahul Singhal commented on SPARK-1291:
--

I was trying my hand at providing the UI while the app is running. I have a working 
implementation, except that since the UI is started before the AM, the UI 
does not know about APPLICATION_WEB_PROXY_BASE and thus relative paths in the UI do 
not work. :(

Any suggestions?

 Link the spark UI to RM ui in yarn-client mode
 --

 Key: SPARK-1291
 URL: https://issues.apache.org/jira/browse/SPARK-1291
 Project: Spark
  Issue Type: Improvement
Affects Versions: 0.9.0, 1.0.0
Reporter: Thomas Graves
Assignee: Guoqiang Li

 Currently when you run Spark on YARN in yarn-client mode, the Spark UI is 
 not linked up to the YARN Resource Manager UI, so it's harder for a user of 
 YARN to find the UI.  Note that in yarn-standalone/yarn-cluster mode it is 
 properly linked up.
 Ideally the yarn-client UI should also be hooked up to the Yarn RM proxy for 
 security.
 The challenge with the yarn-client mode is that the UI is started before the 
 application master and it doesn't know what the yarn proxy link is when the 
 UI started. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-2153) Spark Examples

2014-06-16 Thread vishnu (JIRA)
vishnu created SPARK-2153:
-

 Summary: Spark Examples
 Key: SPARK-2153
 URL: https://issues.apache.org/jira/browse/SPARK-2153
 Project: Spark
  Issue Type: Bug
  Components: Examples
Affects Versions: 1.0.0
Reporter: vishnu
Priority: Minor
 Fix For: 1.0.0


The Spark example CassandraTest.scala cannot be built against newer versions of 
Cassandra. I tried it on Cassandra 2.0.8.

Cassandra is case-sensitive about keyspace names and stores them in lowercase, 
but the example uses the keyspace casDemo, so the program fails with an error 
stating that the keyspace was not found.

The new Cassandra jars no longer contain org.apache.cassandra.db.IColumn, so 
org.apache.cassandra.db.Column has to be used instead.
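
A sketch of the kind of change described above (illustrative only; the real edits belong in CassandraTest.scala):

{code}
// The column type moved in newer Cassandra jars:
// import org.apache.cassandra.db.IColumn   // no longer exists in Cassandra 2.0.x
import org.apache.cassandra.db.Column       // use this instead

// Keyspaces are stored in lowercase, so refer to the keyspace that way:
val keySpace = "casdemo"                    // instead of "casDemo"
{code}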




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-2154) Worker goes down.

2014-06-16 Thread siva venkat gogineni (JIRA)
siva venkat gogineni created SPARK-2154:
---

 Summary: Worker goes down.
 Key: SPARK-2154
 URL: https://issues.apache.org/jira/browse/SPARK-2154
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0, 0.9.0, 0.8.1
 Environment: Spark on cluster of three nodes on Ubuntu 12.04.4 LTS
Reporter: siva venkat gogineni


The worker dies when I try to submit more drivers than the allocated cores. When I 
submit 9 drivers with one core per driver on a cluster having 8 cores altogether, 
the worker dies as soon as I submit the 9th driver. It works fine until all 8 
cores are in use; as soon as I submit the 9th driver, its status remains 
"Submitted" and the worker crashes. I understand that we cannot run more drivers 
than the allocated cores, but the problem is that instead of the 9th driver being 
queued it is executed, and as a result it crashes the worker. Let me know if there 
is a way to work around this issue, or whether it is being fixed in an upcoming 
version.

Cluster Details:
Spark 1.0.0
2 nodes with 4 cores each.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-2155) Support effectful / non-deterministic key expressions in CASE WHEN statements

2014-06-16 Thread Zongheng Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongheng Yang updated SPARK-2155:
-

Description: Currently we translate CASE KEY WHEN to CASE WHEN, hence 
incurring redundant evaluations of the key expression. Relevant discussions 
here: https://github.com/apache/spark/pull/1055/files#r13784248  (was: 
Currently we translate CASE KEY WHEN to CASE WHEN, hence incurring redundant 
evaluations of the key expression. Relevant discussions here:

https://github.com/apache/spark/pull/1055/files#r13784248)

 Support effectful / non-deterministic key expressions in CASE WHEN statements
 -

 Key: SPARK-2155
 URL: https://issues.apache.org/jira/browse/SPARK-2155
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Zongheng Yang

 Currently we translate CASE KEY WHEN to CASE WHEN, hence incurring redundant 
 evaluations of the key expression. Relevant discussions here: 
 https://github.com/apache/spark/pull/1055/files#r13784248



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-2155) Support effectful / non-deterministic key expressions in CASE WHEN statements

2014-06-16 Thread Zongheng Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongheng Yang updated SPARK-2155:
-

Description: 
Currently we translate CASE KEY WHEN to CASE WHEN, hence incurring redundant 
evaluations of the key expression. Relevant discussions here: 
https://github.com/apache/spark/pull/1055/files#r13784248

If support for effectful key expressions is really needed, we can at least resort 
to the baseline approach of having both CaseWhen and CaseKeyWhen as expressions, 
which seems to introduce a lot of code duplication (e.g. see 
https://github.com/concretevitamin/spark/blob/47d406a58d129e5bba68bfadf9dd1faa9054d834/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L216
 for a sketch implementation). 

  was:
Currently we translate CASE KEY WHEN to CASE WHEN, hence incurring redundant 
evaluations of the key expression. Relevant discussions here: 
https://github.com/apache/spark/pull/1055/files#r13784248

If we are very in need of support for effectful key expressions, at least we 
can resort to the baseline approach of having both CaseWhen and CaseKeyWhen as 
expressions, which share a lot of code duplications (e.g. see 
https://github.com/concretevitamin/spark/blob/47d406a58d129e5bba68bfadf9dd1faa9054d834/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L216
 for a sketch implementation). 


 Support effectful / non-deterministic key expressions in CASE WHEN statements
 -

 Key: SPARK-2155
 URL: https://issues.apache.org/jira/browse/SPARK-2155
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Zongheng Yang

 Currently we translate CASE KEY WHEN to CASE WHEN, hence incurring redundant 
 evaluations of the key expression. Relevant discussions here: 
 https://github.com/apache/spark/pull/1055/files#r13784248
 If support for effectful key expressions is really needed, we can at least resort 
 to the baseline approach of having both CaseWhen and CaseKeyWhen as expressions, 
 which seems to introduce a lot of code duplication (e.g. see 
 https://github.com/concretevitamin/spark/blob/47d406a58d129e5bba68bfadf9dd1faa9054d834/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L216
  for a sketch implementation). 
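
 To make the concern concrete (an illustration, not code from the PR): with an effectful key expression, the keyed form and a naively translated searched form of CASE are not equivalent, because the searched form re-evaluates the key once per branch.

{code}
// Plain-Scala analogy of CASE KEY WHEN vs. its naive CASE WHEN translation.
var calls = 0
def key(): Int = { calls += 1; calls }       // side-effecting "key expression"

// Keyed form: evaluate the key once, then compare.
val k = key()
val keyed = if (k == 1) "a" else if (k == 2) "b" else "c"        // "a"

// Naive searched form: the key is re-evaluated in every branch condition.
val searched = if (key() == 1) "a" else if (key() == 2) "b" else "c"
// calls keeps incrementing past the values being tested, so this yields "c", not "a".
{code}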



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1903) Document Spark's network connections

2014-06-16 Thread Andrew Ash (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ash resolved SPARK-1903.
---

   Resolution: Fixed
Fix Version/s: 1.0.0

Merged into master and branch-1.0 in time for the 1.0.0 release.

Jira experts, what's the difference between Fix Version and Target Version?

 Document Spark's network connections
 

 Key: SPARK-1903
 URL: https://issues.apache.org/jira/browse/SPARK-1903
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Andrew Ash
 Fix For: 1.0.0


 For people who want to apply strict firewalls to the Spark cluster, knowing 
 which JVMs connect to which other JVMs, on which ports, and when and why, would 
 be super valuable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2100) Allow users to disable Jetty Spark UI in local mode

2014-06-16 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032803#comment-14032803
 ] 

Sean Owen commented on SPARK-2100:
--

Tomcat and Jetty classes don't overlap -- do you mean the Servlet API classes? 
That's a different known issue.

 Allow users to disable Jetty Spark UI in local mode
 ---

 Key: SPARK-2100
 URL: https://issues.apache.org/jira/browse/SPARK-2100
 Project: Spark
  Issue Type: Improvement
Reporter: DB Tsai

 We want to use Spark's Hadoop APIs in local mode at design time to explore the 
 first couple hundred lines of data in HDFS. We also want to use Spark in our 
 Tomcat application, and starting a Jetty UI makes our Tomcat unhappy. In those 
 scenarios the Spark UI is not necessary and wastes resources. As a result, for 
 local mode, it's desirable that users are able to disable the Spark UI.
 A couple of places I found where Jetty gets started, in SparkEnv.scala:
 1) val broadcastManager = new BroadcastManager(isDriver, conf, 
 securityManager)
 2) val httpFileServer = new HttpFileServer(securityManager)
 httpFileServer.initialize()
 I don't know if broadcastManager is needed in local mode, though.
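
 What is being asked for is essentially an opt-out switch; a sketch of what that could look like from the user side (the property name here is hypothetical, nothing like it exists in 1.0.0):

{code}
// Hypothetical usage sketch; "spark.ui.enabled" is an assumed property name, not a 1.0.0 setting.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local")
  .setAppName("design-time exploration")
  .set("spark.ui.enabled", "false")   // requested: skip starting the Jetty-backed web UI
val sc = new SparkContext(conf)
{code}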



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1930) The Container is running beyond physical memory limits, so as to be killed.

2014-06-16 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-1930.
--

   Resolution: Fixed
Fix Version/s: 1.1.0

 The Container is running beyond physical memory limits, so as to be killed.
 ---

 Key: SPARK-1930
 URL: https://issues.apache.org/jira/browse/SPARK-1930
 Project: Spark
  Issue Type: Bug
  Components: YARN
Reporter: Guoqiang Li
Assignee: Guoqiang Li
 Fix For: 1.0.1, 1.1.0


 When the containers occupy 8 GB of memory, they are killed.
 YARN node manager log:
 {code}
 2014-05-23 13:35:30,776 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
  Container [pid=4947,containerID=container_1400809535638_0015_01_05] is 
 running beyond physical memory limits. Current usage: 8.6 GB of 8.5 GB 
 physical memory used; 10.0 GB of 17.8 GB virtual memory used. Killing 
 container.
 Dump of the process-tree for container_1400809535638_0015_01_05 :
 |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
 SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
 |- 4947 25417 4947 4947 (bash) 0 0 110804992 335 /bin/bash -c 
 /usr/java/jdk1.7.0_45-cloudera/bin/java -server -XX:OnOutOfMemoryError='kill 
 %p' -Xms8192m -Xmx8192m  -Xss2m 
 -Djava.io.tmpdir=/yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_05/tmp
   -Dlog4j.configuration=log4j-spark-container.properties 
 -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 
 -Dspark.akka.frameSize=20 
 org.apache.spark.executor.CoarseGrainedExecutorBackend 
 akka.tcp://sp...@10dian71.domain.test:45477/user/CoarseGrainedScheduler 3 
 10dian72.domain.test 4 1 
 /var/log/hadoop-yarn/container/application_1400809535638_0015/container_1400809535638_0015_01_05/stdout
  2 
 /var/log/hadoop-yarn/container/application_1400809535638_0015/container_1400809535638_0015_01_05/stderr
  
 |- 4957 4947 4947 4947 (java) 157809 12620 10667016192 2245522 
 /usr/java/jdk1.7.0_45-cloudera/bin/java -server -XX:OnOutOfMemoryError=kill 
 %p -Xms8192m -Xmx8192m -Xss2m 
 -Djava.io.tmpdir=/yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_05/tmp
  -Dlog4j.configuration=log4j-spark-container.properties 
 -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 
 -Dspark.akka.frameSize=20 
 org.apache.spark.executor.CoarseGrainedExecutorBackend 
 akka.tcp://sp...@10dian71.domain.test:45477/user/CoarseGrainedScheduler 3 
 10dian72.domain.test 4 
 2014-05-23 13:35:30,776 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
  Removed ProcessTree with root 4947
 2014-05-23 13:35:30,776 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1400809535638_0015_01_05 transitioned from RUNNING 
 to KILLING
 2014-05-23 13:35:30,777 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
  Cleaning up container container_1400809535638_0015_01_05
 2014-05-23 13:35:30,788 WARN 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
 from container container_1400809535638_0015_01_05 is : 143
 2014-05-23 13:35:30,829 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1400809535638_0015_01_05 transitioned from KILLING 
 to CONTAINER_CLEANEDUP_AFTER_KILL
 2014-05-23 13:35:30,830 INFO 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting 
 absolute path : 
 /yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_05
 2014-05-23 13:35:30,830 INFO 
 org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=spark
 OPERATION=Container Finished - Killed   TARGET=ContainerImpl
 RESULT=SUCCESS  APPID=application_1400809535638_0015
 CONTAINERID=container_1400809535638_0015_01_05
 2014-05-23 13:35:30,830 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1400809535638_0015_01_05 transitioned from 
 CONTAINER_CLEANEDUP_AFTER_KILL to DONE
 2014-05-23 13:35:30,830 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Removing container_1400809535638_0015_01_05 from application 
 application_1400809535638_0015
 {code}
 I think it is related to {{YarnAllocationHandler.MEMORY_OVERHEAD}}: 
 https://github.com/apache/spark/blob/master/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala#L562
 Relative to 8 GB, 384 MB is too small.
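
 Back-of-the-envelope arithmetic for the case above (a sketch; the value names are descriptive, not the YARN allocator's actual fields):

{code}
// Rough accounting for why an 8 GB executor gets killed with a 384 MB overhead.
val executorMemoryMb = 8192                                // -Xms8192m -Xmx8192m
val memoryOverheadMb = 384                                 // fixed overhead at the time
val containerLimitMb = executorMemoryMb + memoryOverheadMb // 8576 MB ~= 8.4 GB requested
// JVM native memory (thread stacks at -Xss2m, direct buffers, permgen, ...) easily
// exceeds 384 MB with an 8 GB heap, so actual usage (8.6 GB above) crosses the limit.
{code}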



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (SPARK-1112) When spark.akka.frameSize > 10, task results bigger than 10MiB block execution

2014-06-16 Thread Chen Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032867#comment-14032867
 ] 

Chen Jin commented on SPARK-1112:
-

To follow up on this thread, I have done some experiments with a frameSize 
around 10MB.

1) spark.akka.frameSize = 10
If one of the partition sizes is very close to 10MB, say 9.97MB, execution 
blocks without any exception or warning. The worker finishes the task and sends 
the serialized result, then throws an exception saying the Hadoop IPC client 
connection stopped (visible after changing the logging to debug level). However, 
the master never receives the results and the program just hangs.
But if all partition sizes are below some number between 9.96MB and 
9.97MB, the program works fine.
2) spark.akka.frameSize = 9
When the partition size is just a little bit smaller than 9MB, it fails as well.

This behavior is not exactly what SPARK-1112 is about; could you please 
guide me on how to open a separate bug for the case where the serialized size 
is very close to 10MB?

Thanks a lot
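
A rough sketch of the accounting that would explain the numbers above (assumption: the whole serialized Akka message, result plus task/metadata envelope, is what gets compared against the frame size):

{code}
// Sketch: how close 9.97 MB is to a 10 MB frame once message overhead is included.
val frameSizeBytes = 10L * 1024 * 1024                 // spark.akka.frameSize = 10
val resultBytes    = (9.97 * 1024 * 1024).toLong       // the problematic partition result
val headroomBytes  = frameSizeBytes - resultBytes      // ~31 KB left for the envelope
// If the serialized TaskResult plus Akka headers exceeds the frame, the message never
// reaches the driver, which would match the "hangs with no exception" symptom above.
{code}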

 When spark.akka.frameSize > 10, task results bigger than 10MiB block execution
 --

 Key: SPARK-1112
 URL: https://issues.apache.org/jira/browse/SPARK-1112
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0
Reporter: Guillaume Pitel
Priority: Critical
 Fix For: 0.9.2


 When I set the spark.akka.frameSize to something over 10, the messages sent 
 from the executors to the driver completely block the execution if the 
 message is bigger than 10MiB and smaller than the frameSize (if it's above 
 the frameSize, it's ok)
 Workaround is to set the spark.akka.frameSize to 10. In this case, since 
 0.8.1, the blockManager deals with the data to be sent. It seems slower than 
 Akka direct messages, though.
 The configuration seems to be correctly read (see actorSystemConfig.txt), so 
 I don't see where the 10MiB could come from 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2010) Support for nested data in PySpark SQL

2014-06-16 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032931#comment-14032931
 ] 

Michael Armbrust commented on SPARK-2010:
-

I know we merged one PR for this, but there are still some open questions about 
what SQL structs and maps mean in Python, so let's leave this open.

 Support for nested data in PySpark SQL
 --

 Key: SPARK-2010
 URL: https://issues.apache.org/jira/browse/SPARK-2010
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Michael Armbrust
Assignee: Kan Zhang





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-2147) Master UI forgets about Executors when application exits cleanly

2014-06-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-2147:
-

Description: 
When an application exits cleanly, the Master will remove all executors from 
the application's ApplicationInfo, causing the historic Completed 
Applications page to report that there were no executors associated with that 
application. 

On the contrary, if the application exits uncleanly, then the Master will 
remove the application FIRST, and will not actually remove the executors from 
the ApplicationInfo page. This causes the executors to show up correctly in the 
Completed Applications page.

The correct behavior would probably be to gather a history of all executors (so 
we'd retain executors that we had at one point but were removed during the 
job), and not remove lost executors.

  was:
When an application exists cleanly, the Master will remove all executors from 
the application's ApplicationInfo, causing the historic Completed 
Applications page to report that there were no executors associated with that 
application. 

On the contrary, if the application exits uncleanly, then the Master will 
remove the application FIRST, and will not actually remove the executors from 
the ApplicationInfo page. This causes the executors to show up correctly in the 
Completed Applications page.

The correct behavior would probably be to gather a history of all executors (so 
we'd retain executors that we had at one point but were removed during the 
job), and not remove lost executors.


 Master UI forgets about Executors when application exits cleanly
 

 Key: SPARK-2147
 URL: https://issues.apache.org/jira/browse/SPARK-2147
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.0.0
Reporter: Aaron Davidson
Assignee: Andrew Or

 When an application exits cleanly, the Master will remove all executors from 
 the application's ApplicationInfo, causing the historic Completed 
 Applications page to report that there were no executors associated with 
 that application. 
 On the contrary, if the application exits uncleanly, then the Master will 
 remove the application FIRST, and will not actually remove the executors from 
 the ApplicationInfo page. This causes the executors to show up correctly in 
 the Completed Applications page.
 The correct behavior would probably be to gather a history of all executors 
 (so we'd retain executors that we had at one point but were removed during 
 the job), and not remove lost executors.
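
 A sketch of the suggested behavior (illustrative data structures, not the Master's actual code): keep removed executors in a history rather than dropping them from ApplicationInfo.

{code}
// Illustrative only: retain removed executors so the Completed Applications page can show them.
import scala.collection.mutable

class ExecutorHistory {
  val active  = mutable.HashMap[Int, String]()        // executor id -> state
  val removed = mutable.ArrayBuffer[(Int, String)]()  // kept for the history page

  def remove(id: Int, reason: String): Unit = {
    active.remove(id)
    removed += ((id, reason))                         // never forgotten, just moved
  }
}
{code}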



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2100) Allow users to disable Jetty Spark UI in local mode

2014-06-16 Thread DB Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033049#comment-14033049
 ] 

DB Tsai commented on SPARK-2100:


[~sowen] You are right. The Servlet API is pulled in by Jetty's dependencies. If 
Jetty is included with the intransitive annotation, Tomcat can start. 
However, when I start a new SparkContext, it hangs forever without any 
error message:

  val sc = new SparkContext(deployMode, appName, sparkConf)


 Allow users to disable Jetty Spark UI in local mode
 ---

 Key: SPARK-2100
 URL: https://issues.apache.org/jira/browse/SPARK-2100
 Project: Spark
  Issue Type: Improvement
Reporter: DB Tsai

 We want to use Spark's Hadoop APIs in local mode at design time to explore the 
 first couple hundred lines of data in HDFS. We also want to use Spark in our 
 Tomcat application, and starting a Jetty UI makes our Tomcat unhappy. In those 
 scenarios the Spark UI is not necessary and wastes resources. As a result, for 
 local mode, it's desirable that users are able to disable the Spark UI.
 A couple of places I found where Jetty gets started, in SparkEnv.scala:
 1) val broadcastManager = new BroadcastManager(isDriver, conf, 
 securityManager)
 2) val httpFileServer = new HttpFileServer(securityManager)
 httpFileServer.initialize()
 I don't know if broadcastManager is needed in local mode, though.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (SPARK-2100) Allow users to disable Jetty Spark UI in local mode

2014-06-16 Thread DB Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033049#comment-14033049
 ] 

DB Tsai edited comment on SPARK-2100 at 6/16/14 9:50 PM:
-

[~sowen] You are right. The Servlet API is pulled in by Jetty's dependencies. If 
Jetty is included with the intransitive annotation, Tomcat can start. 
However, when I start a new SparkContext, it hangs forever without any 
error message:

  val sc = new SparkContext(local, appName, sparkConf)



was (Author: dbtsai):
[~sowen] You are right. The servlet api is pulled by jetty's dependency.  If 
the jetty is included with intransitive annotation, the tomcat can start. 
However, when I start a new SparkContext, it will hang forever without any 
error message. 

  val sc = new SparkContext(deployMode, appName, sparkConf)


 Allow users to disable Jetty Spark UI in local mode
 ---

 Key: SPARK-2100
 URL: https://issues.apache.org/jira/browse/SPARK-2100
 Project: Spark
  Issue Type: Improvement
Reporter: DB Tsai

 We want to use Spark's Hadoop APIs in local mode at design time to explore the 
 first couple hundred lines of data in HDFS. We also want to use Spark in our 
 Tomcat application, and starting a Jetty UI makes our Tomcat unhappy. In those 
 scenarios the Spark UI is not necessary and wastes resources. As a result, for 
 local mode, it's desirable that users are able to disable the Spark UI.
 A couple of places I found where Jetty gets started, in SparkEnv.scala:
 1) val broadcastManager = new BroadcastManager(isDriver, conf, 
 securityManager)
 2) val httpFileServer = new HttpFileServer(securityManager)
 httpFileServer.initialize()
 I don't know if broadcastManager is needed in local mode, though.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (SPARK-2100) Allow users to disable Jetty Spark UI in local mode

2014-06-16 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033055#comment-14033055
 ] 

Sean Owen edited comment on SPARK-2100 at 6/16/14 9:55 PM:
---

Yes, the Maven build has to do a little work to exclude copies of the Servlet 
2.x API. Spark ends up including one copy of the Servlet 3.0 APIs, which should 
make everybody happy. But if your build brings back in something else, and it's 
bringing its own Servlet API, you may need to exclude it. (This dependency is 
super annoying because different containers have distributed the same classes 
in different artifacts.)

Advert break: SPARK-1949 fixes this type of issue for Spark's own SBT-based 
build. Not exactly the issue here but related, and would be cool to get it 
committed. https://issues.apache.org/jira/browse/SPARK-1949


was (Author: srowen):
Yes, the Maven build has to do a little work to exclude copies of the Servlet 
2.x API. Spark ends up including one copy of the Servlet 3.0 APIs, which should 
everybody happing. But if your build brings back in something else, and it's 
bringing its own Servlet API, you may need to exclude it. (This dependency is 
super annoying because different containers have distributed the same classes 
in different artifacts.)

Advert break: SPARK-1949 fixes this type of issue for Spark's own SBT-based 
build. Not exactly the issue here but related, and would be cool to get it 
committed. https://issues.apache.org/jira/browse/SPARK-1949

 Allow users to disable Jetty Spark UI in local mode
 ---

 Key: SPARK-2100
 URL: https://issues.apache.org/jira/browse/SPARK-2100
 Project: Spark
  Issue Type: Improvement
Reporter: DB Tsai

 We want to use Spark's Hadoop APIs in local mode at design time to explore the 
 first couple hundred lines of data in HDFS. We also want to use Spark in our 
 Tomcat application, and starting a Jetty UI makes our Tomcat unhappy. In those 
 scenarios the Spark UI is not necessary and wastes resources. As a result, for 
 local mode, it's desirable that users are able to disable the Spark UI.
 A couple of places I found where Jetty gets started, in SparkEnv.scala:
 1) val broadcastManager = new BroadcastManager(isDriver, conf, 
 securityManager)
 2) val httpFileServer = new HttpFileServer(securityManager)
 httpFileServer.initialize()
 I don't know if broadcastManager is needed in local mode, though.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2100) Allow users to disable Jetty Spark UI in local mode

2014-06-16 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033055#comment-14033055
 ] 

Sean Owen commented on SPARK-2100:
--

Yes, the Maven build has to do a little work to exclude copies of the Servlet 
2.x API. Spark ends up including one copy of the Servlet 3.0 APIs, which should 
make everybody happy. But if your build brings back in something else, and it's 
bringing its own Servlet API, you may need to exclude it. (This dependency is 
super annoying because different containers have distributed the same classes 
in different artifacts.)

Advert break: SPARK-1949 fixes this type of issue for Spark's own SBT-based 
build. Not exactly the issue here but related, and would be cool to get it 
committed. https://issues.apache.org/jira/browse/SPARK-1949

 Allow users to disable Jetty Spark UI in local mode
 ---

 Key: SPARK-2100
 URL: https://issues.apache.org/jira/browse/SPARK-2100
 Project: Spark
  Issue Type: Improvement
Reporter: DB Tsai

 We want to use Spark's Hadoop APIs in local mode at design time to explore the 
 first couple hundred lines of data in HDFS. We also want to use Spark in our 
 Tomcat application, and starting a Jetty UI makes our Tomcat unhappy. In those 
 scenarios the Spark UI is not necessary and wastes resources. As a result, for 
 local mode, it's desirable that users are able to disable the Spark UI.
 A couple of places I found where Jetty gets started, in SparkEnv.scala:
 1) val broadcastManager = new BroadcastManager(isDriver, conf, 
 securityManager)
 2) val httpFileServer = new HttpFileServer(securityManager)
 httpFileServer.initialize()
 I don't know if broadcastManager is needed in local mode, though.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-2160) Error in the decision tree algorithm in Spark MLlib

2014-06-16 Thread caoli (JIRA)
caoli created SPARK-2160:


 Summary: Error in the decision tree algorithm in Spark MLlib
 Key: SPARK-2160
 URL: https://issues.apache.org/jira/browse/SPARK-2160
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.0.0
Reporter: caoli
 Fix For: 1.1.0


There is an error in the computation of rightNodeAgg in the decision tree 
algorithm in Spark MLlib, in the function extractLeftRightNodeAggregates(): the 
binData index used when computing rightNodeAgg is wrong. In DecisionTree.scala, 
around line 980:

 rightNodeAgg(featureIndex)(2 * (numBins - 2 - splitIndex)) =
   binData(shift + (2 * (numBins - 2 - splitIndex))) +
     rightNodeAgg(featureIndex)(2 * (numBins - 1 - splitIndex))

The binData(shift + (2 * (numBins - 2 - splitIndex))) index is computed 
incorrectly, so the resulting rightNodeAgg contains repeated bin data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-2161) UI should remember executors that have been removed

2014-06-16 Thread Andrew Or (JIRA)
Andrew Or created SPARK-2161:


 Summary: UI should remember executors that have been removed
 Key: SPARK-2161
 URL: https://issues.apache.org/jira/browse/SPARK-2161
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Andrew Or
 Fix For: 1.0.1


This applies to all of SparkUI, MasterWebUI, and WorkerWebUI. If an executor 
fails, it just disappears from these UIs. It would be helpful to be able to see, 
in the UIs, the logs explaining why it failed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)