[jira] [Commented] (SPARK-1605) Improve mllib.linalg.Vector

2014-04-24 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979380#comment-13979380
 ] 

Sean Owen commented on SPARK-1605:
--

I think this was on purpose, to try to hide breeze as an implementation detail, 
at least in public APIs?
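
A minimal sketch of that wrapper idea (hypothetical class, not the actual mllib API): the public surface exposes only plain types, and the breeze vector stays a private implementation detail.
{code}
import breeze.linalg.{DenseVector => BDV, Vector => BV}

// Hypothetical wrapper: breeze never appears in public signatures.
class WrappedVector(values: Array[Double]) {
  private val breezeVector: BV[Double] = new BDV[Double](values)

  def size: Int = breezeVector.length
  def apply(i: Int): Double = breezeVector(i)

  // Private escape hatch for internal code that needs breeze.
  private[this] def toBreeze: BV[Double] = breezeVector
}
{code}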

 Improve mllib.linalg.Vector
 ---

 Key: SPARK-1605
 URL: https://issues.apache.org/jira/browse/SPARK-1605
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: Sandeep Singh

 We can make the current Vector a wrapper around breeze.linalg.Vector?
 I want to work on this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1540) Investigate whether we should require keys in PairRDD to be Comparable

2014-04-24 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia resolved SPARK-1540.
--

Resolution: Fixed

Resolved here: https://github.com/apache/spark/pull/487/files. We were able to 
add an implicit Ordering to most methods while giving it a default value of 
null, which allows these methods to work on un-orderable types too, using the 
current, less efficient code path.
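
A minimal standalone sketch of that pattern (hypothetical method, not the actual PairRDDFunctions code): the implicit Ordering defaults to null, so un-orderable key types still compile and simply take the old code path.
{code}
// Hypothetical example of the "implicit Ordering defaulting to null" trick.
def groupSorted[K, V](data: Seq[(K, V)])(implicit ord: Ordering[K] = null): Seq[(K, V)] = {
  if (ord != null) data.sortBy(_._1)(ord) // sort-based path when keys are orderable
  else data                               // fall back to the existing, less efficient path
}

groupSorted(Seq(3 -> "c", 1 -> "a")) // Ordering[Int] found: sorted
groupSorted(Seq(new Object -> "x"))  // no Ordering[Object]: default null, unsorted
{code}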

 Investigate whether we should require keys in PairRDD to be Comparable
 --

 Key: SPARK-1540
 URL: https://issues.apache.org/jira/browse/SPARK-1540
 Project: Spark
  Issue Type: New Feature
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Blocker
 Fix For: 1.0.0


 This is kind of a bigger change, but it would make it easier to do sort-based 
 versions of external operations later. We might also get away without it. 
 Note that sortByKey() already does require an Ordering or Comparables.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1540) Investigate whether we should require keys in PairRDD to be Comparable

2014-04-24 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979393#comment-13979393
 ] 

Matei Zaharia commented on SPARK-1540:
--

Note that this still needs to be added to the Java and Python APIs. But in those 
it can be done through new versions of the methods, so it will be doable in a 
binary-compatible way.

 Investigate whether we should require keys in PairRDD to be Comparable
 --

 Key: SPARK-1540
 URL: https://issues.apache.org/jira/browse/SPARK-1540
 Project: Spark
  Issue Type: New Feature
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Blocker
 Fix For: 1.0.0


 This is kind of a bigger change, but it would make it easier to do sort-based 
 versions of external operations later. We might also get away without it. 
 Note that sortByKey() already does require an Ordering or Comparables.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (SPARK-1494) Hive Dependencies being checked by MIMA

2014-04-24 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust reopened SPARK-1494:
-


 Hive Dependencies being checked by MIMA
 ---

 Key: SPARK-1494
 URL: https://issues.apache.org/jira/browse/SPARK-1494
 Project: Spark
  Issue Type: Bug
  Components: Project Infra, SQL
Affects Versions: 1.0.0
Reporter: Ahir Reddy
Assignee: Michael Armbrust
Priority: Minor
 Fix For: 1.0.0


 It looks like code in companion objects is being invoked by the MIMA checker, 
 as it uses Scala reflection to check all of the interfaces. As a result it 
 starts a SparkContext and eventually hits out-of-memory errors. As a temporary 
 fix, all classes that contain hive or Hive are excluded from the check.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1590) Recommend to use FindBugs

2014-04-24 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979505#comment-13979505
 ] 

Shixiong Zhu commented on SPARK-1590:
-

Agree that periodic scanning is better than incorporating it into the build, since 
there are so many false positives.

 Recommend to use FindBugs
 -

 Key: SPARK-1590
 URL: https://issues.apache.org/jira/browse/SPARK-1590
 Project: Spark
  Issue Type: Question
  Components: Build
Reporter: Shixiong Zhu
Priority: Minor
 Attachments: findbugs.png


 FindBugs is an open source program created by Bill Pugh and David Hovemeyer 
 which looks for bugs in Java code. It uses static analysis to identify 
 hundreds of different potential types of errors in Java programs.
 Although Spark is a Scala project, FindBugs is still helpful. For example, I 
 used it to find SPARK-1583 and SPARK-1589. However, the disadvantage is that 
 the report generated by FindBugs usually contains many false alarms for a 
 Scala project.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1593) Add status command to Spark Daemons(master/worker)

2014-04-24 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979541#comment-13979541
 ] 

Sandeep Singh commented on SPARK-1593:
--

Make a pull request.

 Add status command to Spark Daemons(master/worker)
 --

 Key: SPARK-1593
 URL: https://issues.apache.org/jira/browse/SPARK-1593
 Project: Spark
  Issue Type: New Feature
  Components: Deploy
Affects Versions: 0.9.1
Reporter: Pradeep Chanumolu
  Labels: patch
 Attachments: 
 0001-Adding-Spark-Daemon-master-worker-status-command.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Currently we have only start and stop commands for the Spark daemons 
 (master/worker). So a status command can be added to spark-daemon.sh 
 and spark-daemons.sh that tells whether the master/worker is alive or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1608) Cast.nullable should be true when cast from StringType to NumericType/TimestampType

2014-04-24 Thread Takuya Ueshin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979559#comment-13979559
 ] 

Takuya Ueshin commented on SPARK-1608:
--

Pull-requested: https://github.com/apache/spark/pull/532

 Cast.nullable should be true when cast from StringType to 
 NumericType/TimestampType
 ---

 Key: SPARK-1608
 URL: https://issues.apache.org/jira/browse/SPARK-1608
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Takuya Ueshin

 Cast.nullable should be true when casting from StringType to NumericType or 
 TimestampType, because if a StringType expression holds an illegal number 
 string or an illegal timestamp string, the cast value becomes null.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1609) Executor fails to start

2014-04-24 Thread witgo (JIRA)
witgo created SPARK-1609:


 Summary: Executor fails to start
 Key: SPARK-1609
 URL: https://issues.apache.org/jira/browse/SPARK-1609
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: witgo
Priority: Blocker


{code}
export SPARK_JAVA_OPTS="-server -Dspark.ui.killEnabled=false 
-Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 
-Dspark.locality.wait=1 
-Dspark.storage.blockManagerTimeoutIntervalMs=600 
-Dspark.storage.memoryFraction=0.7 
-Dspark.broadcast.factory=org.apache.spark.broadcast.TorrentBroadcastFactory"
{code}
With these options the executor fails to start.
{code}
export SPARK_JAVA_OPTS="-Dspark.ui.killEnabled=false 
-Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 
-Dspark.locality.wait=1 
-Dspark.storage.blockManagerTimeoutIntervalMs=600 
-Dspark.storage.memoryFraction=0.7 
-Dspark.broadcast.factory=org.apache.spark.broadcast.TorrentBroadcastFactory"
{code}
With these options (the only difference is dropping the -server flag) the 
executor starts and works.







--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1609) Executor fails to start when use spark-submit

2014-04-24 Thread witgo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

witgo updated SPARK-1609:
-

Summary: Executor fails to start when use spark-submit  (was: Executor 
fails to start)

 Executor fails to start when use spark-submit
 -

 Key: SPARK-1609
 URL: https://issues.apache.org/jira/browse/SPARK-1609
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: witgo
Priority: Blocker
 Attachments: spark.log


 {code}
 export SPARK_JAVA_OPTS="-server -Dspark.ui.killEnabled=false 
 -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 
 -Dspark.locality.wait=1 
 -Dspark.storage.blockManagerTimeoutIntervalMs=600 
 -Dspark.storage.memoryFraction=0.7 
 -Dspark.broadcast.factory=org.apache.spark.broadcast.TorrentBroadcastFactory"
 {code}
 With these options the executor fails to start.
 {code}
 export SPARK_JAVA_OPTS="-Dspark.ui.killEnabled=false 
 -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 
 -Dspark.locality.wait=1 
 -Dspark.storage.blockManagerTimeoutIntervalMs=600 
 -Dspark.storage.memoryFraction=0.7 
 -Dspark.broadcast.factory=org.apache.spark.broadcast.TorrentBroadcastFactory"
 {code}
 With these options (the only difference is dropping the -server flag) the 
 executor starts and works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1600) flaky test case in streaming.CheckpointSuite

2014-04-24 Thread Nan Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nan Zhu updated SPARK-1600:
---

  Component/s: Streaming
Affects Version/s: 0.9.1
   1.0.0
   0.9.0

 flaky test case in streaming.CheckpointSuite
 

 Key: SPARK-1600
 URL: https://issues.apache.org/jira/browse/SPARK-1600
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9.0, 1.0.0, 0.9.1
Reporter: Nan Zhu

 the test case "recovery with file input stream" in streaming.CheckpointSuite 
 sometimes fails when Jenkins is very busy with an unrelated change. 
 I have hit it 3 times, and I have also seen it in other places; 
 the latest example is 
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14397/
 where the modification is just in YARN-related files.
 I once reported it on the dev mailing list: 
 http://apache-spark-developers-list.1001551.n3.nabble.com/a-weird-test-case-in-Streaming-td6116.html



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1610) Cast from BooleanType to NumericType should use exact type value.

2014-04-24 Thread Takuya Ueshin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979618#comment-13979618
 ] 

Takuya Ueshin commented on SPARK-1610:
--

Pull-requested: https://github.com/apache/spark/pull/533

 Cast from BooleanType to NumericType should use exact type value.
 -

 Key: SPARK-1610
 URL: https://issues.apache.org/jira/browse/SPARK-1610
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Takuya Ueshin

 Casts from BooleanType to NumericType all use an Int value, which causes a 
 ClassCastException when the cast value is used by a subsequent evaluation, 
 like the code below:
 {quote}
 scala> import org.apache.spark.sql.catalyst._
 import org.apache.spark.sql.catalyst._
 scala> import types._
 import types._
 scala> import expressions._
 import expressions._
 scala> Add(Cast(Literal(true), ShortType), Literal(1.toShort)).eval()
 java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Short
   at scala.runtime.BoxesRunTime.unboxToShort(BoxesRunTime.java:102)
   at scala.math.Numeric$ShortIsIntegral$.plus(Numeric.scala:72)
   at org.apache.spark.sql.catalyst.expressions.Add$$anonfun$eval$2.apply(arithmetic.scala:58)
   at org.apache.spark.sql.catalyst.expressions.Add$$anonfun$eval$2.apply(arithmetic.scala:58)
   at org.apache.spark.sql.catalyst.expressions.Expression.n2(Expression.scala:114)
   at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:58)
   at .<init>(<console>:17)
   at .<clinit>(<console>)
   at .<init>(<console>:7)
   at .<clinit>(<console>)
   at $print(<console>)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:483)
   at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
   at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
   at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
   at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
   at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
   at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
   at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
   at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
   at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
   at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
   at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
   at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
   at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
   at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
   at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
   at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
   at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:83)
   at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
   at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
   at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
 {quote}
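
A minimal sketch of a fix along those lines (illustrative pattern match, not the actual Cast implementation): produce a boxed value of the exact target type rather than always an Int.
{code}
// Illustrative only: return the exact numeric type the cast promises, so a
// later unbox (e.g. unboxToShort) sees the expected boxed class.
def castBoolean(b: Boolean, target: String): Any = target match {
  case "ShortType"  => if (b) 1.toShort else 0.toShort
  case "LongType"   => if (b) 1L else 0L
  case "DoubleType" => if (b) 1.0 else 0.0
  case _            => if (b) 1 else 0 // IntegerType and friends
}
{code}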



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1604) YARN cluster mode broken

2014-04-24 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979624#comment-13979624
 ] 

Kan Zhang commented on SPARK-1604:
--

I doubt it, since when I ran it in YARN client mode, it did work.

 YARN cluster mode broken
 

 Key: SPARK-1604
 URL: https://issues.apache.org/jira/browse/SPARK-1604
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.0.0
Reporter: Kan Zhang
Priority: Blocker

 SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar
  ./bin/spark-submit 
 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar --master 
 yarn --deploy-mode cluster --class org.apache.spark.examples.sql.JavaSparkSQL 
 Exception in thread "main" java.lang.ClassNotFoundException: 
 org.apache.spark.deploy.yarn.Client
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:270)
   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:234)
   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:47)
   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1611) Incorrect initialization order in AppendOnlyMap

2014-04-24 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-1611:
---

 Summary: Incorrect initialization order in AppendOnlyMap
 Key: SPARK-1611
 URL: https://issues.apache.org/jira/browse/SPARK-1611
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu
Priority: Minor


The initialization order of growThreshold and LOAD_FACTOR is incorrect. 
growThreshold will be initialized to 0.
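
A minimal standalone sketch of the pitfall (illustrative names, not the actual AppendOnlyMap code): in Scala, vals in a class body are initialized top to bottom, so a val that reads a later-declared val sees that field's default value.
{code}
// growThreshold is computed while LOAD_FACTOR still holds its default 0.0,
// so it becomes 0; declaring LOAD_FACTOR first fixes it.
class InitOrderDemo(capacity: Int) {
  val growThreshold: Int = (LOAD_FACTOR * capacity).toInt // uses LOAD_FACTOR before init => 0
  val LOAD_FACTOR: Double = 0.7
}

assert(new InitOrderDemo(64).growThreshold == 0) // intended value was 44
{code}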



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1583) Use java.util.HashMap.remove by mistake in BlockManagerMasterActor.removeBlockManager

2014-04-24 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-1583.
-

   Resolution: Fixed
Fix Version/s: 1.0.0

 Use java.util.HashMap.remove by mistake in 
 BlockManagerMasterActor.removeBlockManager
 -

 Key: SPARK-1583
 URL: https://issues.apache.org/jira/browse/SPARK-1583
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu
  Labels: easyfix
 Fix For: 1.0.0


 The following code in BlockManagerMasterActor.removeBlockManager uses a value 
 to remove an entry from java.util.HashMap:
   if (locations.size == 0) {
     blockLocations.remove(locations)
   }
 It should be changed to blockLocations.remove(blockId).
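
A minimal sketch of why the bug compiles silently (illustrative types, not the actual BlockManagerMasterActor fields): java.util.HashMap.remove takes Object, so passing the value instead of the key type-checks but removes nothing.
{code}
import java.util.{HashMap => JHashMap, HashSet => JHashSet}

val blockLocations = new JHashMap[String, JHashSet[String]]()
blockLocations.put("block-0", new JHashSet[String]())

val locations = blockLocations.get("block-0")
blockLocations.remove(locations)   // no-op: no entry is keyed by the value
assert(blockLocations.size == 1)   // stale entry leaks

blockLocations.remove("block-0")   // the fix: remove by key
assert(blockLocations.isEmpty)
{code}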



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1609) Executor fails to start when use spark-submit

2014-04-24 Thread witgo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

witgo updated SPARK-1609:
-

Attachment: (was: spark.log)

 Executor fails to start when use spark-submit
 -

 Key: SPARK-1609
 URL: https://issues.apache.org/jira/browse/SPARK-1609
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: witgo
Priority: Blocker
 Attachments: spark.log


 {code}
 export SPARK_JAVA_OPTS="-server -Dspark.ui.killEnabled=false 
 -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 
 -Dspark.locality.wait=1 
 -Dspark.storage.blockManagerTimeoutIntervalMs=600 
 -Dspark.storage.memoryFraction=0.7 
 -Dspark.broadcast.factory=org.apache.spark.broadcast.TorrentBroadcastFactory"
 {code}
 With these options the executor fails to start.
 {code}
 export SPARK_JAVA_OPTS="-Dspark.ui.killEnabled=false 
 -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 
 -Dspark.locality.wait=1 
 -Dspark.storage.blockManagerTimeoutIntervalMs=600 
 -Dspark.storage.memoryFraction=0.7 
 -Dspark.broadcast.factory=org.apache.spark.broadcast.TorrentBroadcastFactory"
 {code}
 With these options (the only difference is dropping the -server flag) the 
 executor starts and works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1612) Potential resource leaks in Utils.copyStream and Utils.offsetBytes

2014-04-24 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-1612:
---

 Summary: Potential resource leaks in Utils.copyStream and 
Utils.offsetBytes
 Key: SPARK-1612
 URL: https://issues.apache.org/jira/browse/SPARK-1612
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu
Priority: Minor


Should move the close statements into a finally block.
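
A minimal sketch of the fix (hypothetical helper, not the actual Utils code): closing in a finally block releases both streams even when the copy throws.
{code}
import java.io.{InputStream, OutputStream}

def copyStreamSafely(in: InputStream, out: OutputStream): Unit = {
  try {
    val buffer = new Array[Byte](8192)
    var n = in.read(buffer)
    while (n != -1) {
      out.write(buffer, 0, n)
      n = in.read(buffer)
    }
  } finally {
    // Runs on both the normal and the exceptional path; the nested
    // try ensures out is closed even if in.close() throws.
    try { in.close() } finally { out.close() }
  }
}
{code}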



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1612) Potential resource leaks in Utils.copyStream and Utils.offsetBytes

2014-04-24 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-1612:


Labels: easyfix  (was: )

 Potential resource leaks in Utils.copyStream and Utils.offsetBytes
 --

 Key: SPARK-1612
 URL: https://issues.apache.org/jira/browse/SPARK-1612
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu
Priority: Minor
  Labels: easyfix

 Should move the close statements into a finally block.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1612) Potential resource leaks in Utils.copyStream and Utils.offsetBytes

2014-04-24 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979752#comment-13979752
 ] 

Shixiong Zhu commented on SPARK-1612:
-

PR: https://github.com/apache/spark/pull/535

 Potential resource leaks in Utils.copyStream and Utils.offsetBytes
 --

 Key: SPARK-1612
 URL: https://issues.apache.org/jira/browse/SPARK-1612
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu
Priority: Minor
  Labels: easyfix

 Should move the close statements into a finally block.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1613) Difficulty starting up cluster on Amazon EC2

2014-04-24 Thread Johnny King (JIRA)
Johnny King created SPARK-1613:
--

 Summary: Difficulty starting up cluster on Amazon EC2
 Key: SPARK-1613
 URL: https://issues.apache.org/jira/browse/SPARK-1613
 Project: Spark
  Issue Type: Bug
  Components: EC2
Affects Versions: 1.0.0
 Environment: Default Amazon AMI, default instance
Reporter: Johnny King


When I use PySpark or the Spark shell, the master disconnects from all slaves. 

Executor updated: ...  is now FAILED (Command exited with code 1)
Master disconnected from cluster

If it's working for you, can you show me the exact steps you took? I'm not doing 
anything out of the ordinary, just following the EC2 deploy instructions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1614) Move Mesos protobufs out of TaskState

2014-04-24 Thread Shivaram Venkataraman (JIRA)
Shivaram Venkataraman created SPARK-1614:


 Summary: Move Mesos protobufs out of TaskState
 Key: SPARK-1614
 URL: https://issues.apache.org/jira/browse/SPARK-1614
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 0.9.1
Reporter: Shivaram Venkataraman
Priority: Minor


To isolate usage of Mesos protobufs it would be good to move them out of 
TaskState into either a new class (MesosUtils ?) or 
CoarseGrainedMesos{Executor, Backend}.

This would allow applications to build Spark to run without including protobuf 
from Mesos in their shaded jars.  This is one way to avoid protobuf conflicts 
between Mesos and Hadoop (https://issues.apache.org/jira/browse/MESOS-1203)




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1609) Executor fails to start when use spark-submit

2014-04-24 Thread witgo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

witgo updated SPARK-1609:
-

Attachment: (was: spark.log)

 Executor fails to start when use spark-submit
 -

 Key: SPARK-1609
 URL: https://issues.apache.org/jira/browse/SPARK-1609
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: witgo
Priority: Blocker
 Attachments: spark.log


 {code}
 export SPARK_JAVA_OPTS="-server -Dspark.ui.killEnabled=false 
 -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 
 -Dspark.locality.wait=1 
 -Dspark.storage.blockManagerTimeoutIntervalMs=600 
 -Dspark.storage.memoryFraction=0.7 
 -Dspark.broadcast.factory=org.apache.spark.broadcast.TorrentBroadcastFactory"
 {code}
 With these options the executor fails to start.
 {code}
 export SPARK_JAVA_OPTS="-Dspark.ui.killEnabled=false 
 -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 
 -Dspark.locality.wait=1 
 -Dspark.storage.blockManagerTimeoutIntervalMs=600 
 -Dspark.storage.memoryFraction=0.7 
 -Dspark.broadcast.factory=org.apache.spark.broadcast.TorrentBroadcastFactory"
 {code}
 With these options (the only difference is dropping the -server flag) the 
 executor starts and works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1615) Very subtle race condition in SparkListenerSuite

2014-04-24 Thread Andrew Or (JIRA)
Andrew Or created SPARK-1615:


 Summary: Very subtle race condition in SparkListenerSuite
 Key: SPARK-1615
 URL: https://issues.apache.org/jira/browse/SPARK-1615
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.1
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Minor
 Fix For: 1.0.0


Much of SparkListenerSuite relies on LiveListenerBus's waitUntilEmpty() method. 
As the name suggests, this waits until the event queue is empty. However, the 
following race condition could happen:

(1) We dequeue the event
(2) The queue is empty, we return true
(3) The test asserts something assuming that all listeners have finished 
executing
(4) The listeners receive the event

This has been a possible race condition for a long time, but for some reason 
we've never run into it.
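
A minimal standalone sketch of the race (plain queue and thread, not the actual LiveListenerBus code): the consumer dequeues before running the listener, so an emptiness check can succeed while the listener is still executing.
{code}
import java.util.concurrent.LinkedBlockingQueue

val queue = new LinkedBlockingQueue[String]()
@volatile var listenerDone = false

val consumer = new Thread(new Runnable {
  def run(): Unit = {
    val event = queue.take() // (1) dequeue: queue now looks empty
    Thread.sleep(50)         // (4) listener still working after the check below
    listenerDone = true
  }
})
consumer.start()

queue.put("SparkListenerEvent")
while (!queue.isEmpty) Thread.sleep(1) // (2) "waitUntilEmpty" returns here
assert(listenerDone)                   // (3) can fail: this is the race
{code}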



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1604) YARN cluster mode broken

2014-04-24 Thread Kan Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kan Zhang updated SPARK-1604:
-

Component/s: (was: YARN)
 Build

 YARN cluster mode broken
 

 Key: SPARK-1604
 URL: https://issues.apache.org/jira/browse/SPARK-1604
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.0.0
Reporter: Kan Zhang
Priority: Blocker

 SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar
  ./bin/spark-submit 
 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar --master 
 yarn --deploy-mode cluster --class org.apache.spark.examples.sql.JavaSparkSQL 
 Exception in thread "main" java.lang.ClassNotFoundException: 
 org.apache.spark.deploy.yarn.Client
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:270)
   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:234)
   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:47)
   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1615) Very subtle race condition in SparkListenerSuite

2014-04-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-1615:
-

Description: 
Much of SparkListenerSuite relies on LiveListenerBus's waitUntilEmpty() method. 
As the name suggests, this waits until the event queue is empty. However, the 
following race condition could happen:

(1) We dequeue the event
(2) The queue is empty, we return true
(3) The test asserts something assuming that all listeners have finished 
executing (and fails)
(4) The listeners receive the event

This has been a possible race condition for a long time, but for some reason 
we've never run into it.

  was:
Much of SparkListenerSuite relies on LiveListenerBus's waitUntilEmpty() method. 
As the name suggests, this waits until the event queue is empty. However, the 
following race condition could happen:

(1) We dequeue the event
(2) The queue is empty, we return true
(3) The test asserts something assuming that all listeners have finished 
executing
(4) The listeners receive the event

This has been a possible race condition for a long time, but for some reason 
we've never run into it.


 Very subtle race condition in SparkListenerSuite
 

 Key: SPARK-1615
 URL: https://issues.apache.org/jira/browse/SPARK-1615
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.1
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Minor
 Fix For: 1.0.0


 Much of SparkListenerSuite relies on LiveListenerBus's waitUntilEmpty() 
 method. As the name suggests, this waits until the event queue is empty. 
 However, the following race condition could happen:
 (1) We dequeue the event
 (2) The queue is empty, we return true
 (3) The test asserts something assuming that all listeners have finished 
 executing (and fails)
 (4) The listeners receive the event
 This has been a possible race condition for a long time, but for some reason 
 we've never run into it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (SPARK-1604) Couldn't run spark-submit with yarn cluster mode when using deps jar

2014-04-24 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979994#comment-13979994
 ] 

Kan Zhang edited comment on SPARK-1604 at 4/24/14 5:38 PM:
---

Ah, that could be the reason. I was using sbt assemble-deps and then package to 
build. Just verified, when building the normal sbt assembly jar, problem 
disappears. Could be an issue with the former build sequence. 

Moving this to BUILD.


was (Author: kzhang):
Ah, that could be the reason. I was using sbt assemble-deps and then package to 
build. Just verified, when building the normal sbt assembly jar, problem 
disappears. Could be a problem with the former build sequence. 

Moving this BUILD.

 Couldn't run spark-submit with yarn cluster mode when using deps jar
 

 Key: SPARK-1604
 URL: https://issues.apache.org/jira/browse/SPARK-1604
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.0.0
Reporter: Kan Zhang
Priority: Blocker

 SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar
  ./bin/spark-submit 
 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar --master 
 yarn --deploy-mode cluster --class org.apache.spark.examples.sql.JavaSparkSQL 
 Exception in thread "main" java.lang.ClassNotFoundException: 
 org.apache.spark.deploy.yarn.Client
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:270)
   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:234)
   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:47)
   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1604) Couldn't run spark-submit with yarn cluster mode when using deps jar

2014-04-24 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1398#comment-1398
 ] 

Thomas Graves commented on SPARK-1604:
--

If it's just in that particular sequence of build steps, perhaps we should also 
lower it so it's not a blocker.

 Couldn't run spark-submit with yarn cluster mode when using deps jar
 

 Key: SPARK-1604
 URL: https://issues.apache.org/jira/browse/SPARK-1604
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.0.0
Reporter: Kan Zhang
Priority: Blocker

 SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar
  ./bin/spark-submit 
 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar --master 
 yarn --deploy-mode cluster --class org.apache.spark.examples.sql.JavaSparkSQL 
 Exception in thread "main" java.lang.ClassNotFoundException: 
 org.apache.spark.deploy.yarn.Client
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:270)
   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:234)
   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:47)
   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1604) Couldn't run spark-submit with yarn cluster mode when using deps jar

2014-04-24 Thread Kan Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kan Zhang updated SPARK-1604:
-

Priority: Major  (was: Blocker)

 Couldn't run spark-submit with yarn cluster mode when using deps jar
 

 Key: SPARK-1604
 URL: https://issues.apache.org/jira/browse/SPARK-1604
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.0.0
Reporter: Kan Zhang

 SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar
  ./bin/spark-submit 
 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar --master 
 yarn --deploy-mode cluster --class org.apache.spark.examples.sql.JavaSparkSQL 
 Exception in thread "main" java.lang.ClassNotFoundException: 
 org.apache.spark.deploy.yarn.Client
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:270)
   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:234)
   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:47)
   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1604) Couldn't run spark-submit with yarn cluster mode when using deps jar

2014-04-24 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980003#comment-13980003
 ] 

Kan Zhang commented on SPARK-1604:
--

Sure, lowered it to Major.

 Couldn't run spark-submit with yarn cluster mode when using deps jar
 

 Key: SPARK-1604
 URL: https://issues.apache.org/jira/browse/SPARK-1604
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.0.0
Reporter: Kan Zhang

 SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar
  ./bin/spark-submit 
 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar --master 
 yarn --deploy-mode cluster --class org.apache.spark.examples.sql.JavaSparkSQL 
 Exception in thread "main" java.lang.ClassNotFoundException: 
 org.apache.spark.deploy.yarn.Client
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:270)
   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:234)
   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:47)
   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1573) slight modification with regards to sbt/sbt test

2014-04-24 Thread Nishkam Ravi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980015#comment-13980015
 ] 

Nishkam Ravi commented on SPARK-1573:
-

In the documentation at the bottom of this page: https://github.com/apache/spark, 
a clause can be added for sbt/sbt test in the "Note about Hadoop versions" 
section. 

Maybe you can assign it to someone who has edit rights?

 slight modification with regards to sbt/sbt test
 

 Key: SPARK-1573
 URL: https://issues.apache.org/jira/browse/SPARK-1573
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Reporter: Nishkam Ravi

 When the sources are built against a certain Hadoop version with 
 SPARK_YARN=true, the same settings seem necessary when running sbt/sbt test. 
 For example:
 SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0 SPARK_YARN=true sbt/sbt assembly
 SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0 SPARK_YARN=true sbt/sbt test
 Otherwise build errors and failing tests are seen.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1575) failing tests with master branch

2014-04-24 Thread Nishkam Ravi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishkam Ravi updated SPARK-1575:


Priority: Blocker  (was: Major)

 failing tests with master branch 
 -

 Key: SPARK-1575
 URL: https://issues.apache.org/jira/browse/SPARK-1575
 Project: Spark
  Issue Type: Test
Reporter: Nishkam Ravi
Priority: Blocker

 Built the master branch against Hadoop version 2.3.0-cdh5.0.0 with 
 SPARK_YARN=true. sbt tests don't go through successfully (tried multiple 
 runs).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1571) UnresolvedException when running JavaSparkSQL example

2014-04-24 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980034#comment-13980034
 ] 

Kan Zhang commented on SPARK-1571:
--

Thanks, it worked.

 UnresolvedException when running JavaSparkSQL example
 -

 Key: SPARK-1571
 URL: https://issues.apache.org/jira/browse/SPARK-1571
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0
Reporter: Kan Zhang
Assignee: Michael Armbrust
Priority: Blocker

 This happens when running the JavaSparkSQL example using spark-submit in 
 local mode (after fixing the class loading issue in SPARK-1570):
 14/04/22 12:46:47 ERROR Executor: Exception in task ID 0
 org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
 dataType on unresolved object, tree: 'age
   at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:49)
   at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:47)
   at org.apache.spark.sql.catalyst.expressions.Expression.c2(Expression.scala:203)
   at org.apache.spark.sql.catalyst.expressions.GreaterThanOrEqual.eval(predicates.scala:142)
   at org.apache.spark.sql.catalyst.expressions.And.eval(predicates.scala:84)
   at org.apache.spark.sql.execution.Filter$$anonfun$2$$anonfun$apply$1.apply(basicOperators.scala:43)
   at org.apache.spark.sql.execution.Filter$$anonfun$2$$anonfun$apply$1.apply(basicOperators.scala:43)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1548) Add Partial Random Forest algorithm to MLlib

2014-04-24 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-1548:
-

Assignee: Jason Day

 Add Partial Random Forest algorithm to MLlib
 

 Key: SPARK-1548
 URL: https://issues.apache.org/jira/browse/SPARK-1548
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.0.0
Reporter: Manish Amde
Assignee: Jason Day

 This task involves creating an alternate approximate random forest 
 implementation where each tree is constructed per partition.
 The task involves:
 - Justifying with theory and experimental results why this algorithm is a 
 good choice.
 - Comparing the various tradeoffs and finalizing the algorithm before 
 implementation
 - Code implementation
 - Unit tests
 - Functional tests
 - Performance tests
 - Documentation



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1

2014-04-24 Thread Niraj Suthar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niraj Suthar resolved SPARK-1527.
-

Resolution: Fixed

 rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, 
 rootDir1
 ---

 Key: SPARK-1527
 URL: https://issues.apache.org/jira/browse/SPARK-1527
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0
Reporter: Ye Xianjin
Assignee: Niraj Suthar
Priority: Minor
  Labels: starter
   Original Estimate: 24h
  Remaining Estimate: 24h

 In core/src/test/scala/org/apache/spark/storage/DiskBlockManagerSuite.scala:
   val rootDir0 = Files.createTempDir()
   rootDir0.deleteOnExit()
   val rootDir1 = Files.createTempDir()
   rootDir1.deleteOnExit()
   val rootDirs = rootDir0.getName + "," + rootDir1.getName
 rootDir0 and rootDir1 are in the system's temporary directory. 
 rootDir0.getName will not get the full path of the directory, only its last 
 component. When that is passed to the DiskBlockManager constructor, 
 DiskBlockManager creates directories in the working directory, not the 
 temporary directory.
 rootDir0.toString will fix this issue.
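
A short illustration of the difference (assuming Guava's Files.createTempDir, as in the suite): getName returns only the last path component, while toString gives the absolute path.
{code}
import com.google.common.io.Files

val dir = Files.createTempDir()
println(dir.getName)  // e.g. 1398026401638-0       (last component only)
println(dir.toString) // e.g. /tmp/1398026401638-0  (full path, what the suite needs)
{code}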



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1616) input file not found issue

2014-04-24 Thread prasad potipireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

prasad potipireddi updated SPARK-1616:
--

Summary: input file not found issue   (was: input availability)

 input file not found issue 
 ---

 Key: SPARK-1616
 URL: https://issues.apache.org/jira/browse/SPARK-1616
 Project: Spark
  Issue Type: Bug
  Components: Input/Output
Affects Versions: 0.9.0
 Environment: Linux 2.6.18-348.3.1.el5 
Reporter: prasad potipireddi





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1616) input availability

2014-04-24 Thread prasad potipireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980164#comment-13980164
 ] 

prasad potipireddi commented on SPARK-1616:
---

Env:
  I have set up a Spark cluster with one master and 3 workers.

Usecase:
 1. I copied the jar file onto the master node.
 2. I provided the input file only on the master node, not on any of the 
worker nodes.
 3. Ran the jar using 
  java -cp 
~/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar:`echo 
$SCALA_HOME/lib/*.jar | sed 's/ /:/g'`:~/examples/jars/SparkPoc-0.0.1-SNAPSHOT.jar 
-Dspark.master=local com.spark.mllib.LinearRegression spark://master node:7077 
~/examples/data/sampledata.csv

  Then I got a File Not Found error.


Alternative solution:
    I copied the same sampledata.csv onto all the other worker nodes, and then 
it works as expected.





 input availability
 --

 Key: SPARK-1616
 URL: https://issues.apache.org/jira/browse/SPARK-1616
 Project: Spark
  Issue Type: Bug
  Components: Input/Output
Affects Versions: 0.9.0
 Environment: Linux 2.6.18-348.3.1.el5 
Reporter: prasad potipireddi





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1527) rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1

2014-04-24 Thread Niraj Suthar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980185#comment-13980185
 ] 

Niraj Suthar commented on SPARK-1527:
-

Sure, Ye Xianjin.

I am more than happy to do so. After reading the comments I looked at 
HttpBroadcast.scala and will update it appropriately.

If you guys have any suggestions here, please let me know. 

Thank you,
Niraj

 rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, 
 rootDir1
 ---

 Key: SPARK-1527
 URL: https://issues.apache.org/jira/browse/SPARK-1527
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0
Reporter: Ye Xianjin
Assignee: Niraj Suthar
Priority: Minor
  Labels: starter
   Original Estimate: 24h
  Remaining Estimate: 24h

 In core/src/test/scala/org/apache/spark/storage/DiskBlockManagerSuite.scala:
   val rootDir0 = Files.createTempDir()
   rootDir0.deleteOnExit()
   val rootDir1 = Files.createTempDir()
   rootDir1.deleteOnExit()
   val rootDirs = rootDir0.getName + "," + rootDir1.getName
 rootDir0 and rootDir1 are in the system's temporary directory. 
 rootDir0.getName will not get the full path of the directory, only its last 
 component. When that is passed to the DiskBlockManager constructor, 
 DiskBlockManager creates directories in the working directory, not the 
 temporary directory.
 rootDir0.toString will fix this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1557) Set permissions on event log files/directories

2014-04-24 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980213#comment-13980213
 ] 

Thomas Graves commented on SPARK-1557:
--

https://github.com/apache/spark/pull/538

 Set permissions on event log files/directories
 --

 Key: SPARK-1557
 URL: https://issues.apache.org/jira/browse/SPARK-1557
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Thomas Graves
Assignee: Thomas Graves

 We should set the permissions on the event log directories and files so that 
 access is restricted to the users who own them, while also allowing a super 
 user to read them so that they can be displayed by the history server in a 
 multi-tenant secure environment. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1490) Add kerberos support to the HistoryServer

2014-04-24 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-1490.
--

   Resolution: Fixed
Fix Version/s: 1.0.0

 Add kerberos support to the HistoryServer
 -

 Key: SPARK-1490
 URL: https://issues.apache.org/jira/browse/SPARK-1490
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 1.0.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Fix For: 1.0.0


 Now that we have a history server that works on YARN and Mesos, we should add 
 the ability for it to authenticate via Kerberos so that it can read HDFS 
 files without having to be restarted every 24 hours. 
 One solution to this is to have the history server read a keytab file.  The 
 Hadoop UserGroupInformation class has that functionality built in, and as long 
 as it's using RPC to talk to HDFS it will automatically re-login when it needs 
 to.   If the history server isn't using RPC to talk to HDFS then we would 
 have to add some functionality to re-login approximately every 24 hours 
 (configurable time).
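
A minimal sketch of the keytab approach (the principal and keytab path below are made-up placeholders): a single login via UserGroupInformation, which then re-logs in automatically for RPC-based HDFS access.
{code}
import org.apache.hadoop.security.UserGroupInformation

// Assumed placeholder principal and keytab path; UGI handles periodic
// re-login as long as HDFS is accessed over RPC.
UserGroupInformation.loginUserFromKeytab(
  "historyserver/host.example.com@EXAMPLE.COM",
  "/etc/security/keytabs/spark-history.keytab")
{code}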



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1618) Socket receiver not restarting properly when connection is refused

2014-04-24 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-1618:


 Summary: Socket receiver not restarting properly when connection 
is refused
 Key: SPARK-1618
 URL: https://issues.apache.org/jira/browse/SPARK-1618
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Reporter: Tathagata Das
Assignee: Tathagata Das


If the socket receiver cannot connect on the first attempt, it should try to 
restart after a delay. That was broken, as the thread that restarts (hence, 
stops) the receiver waited on Thread.join on itself!
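
A two-line sketch of why joining on yourself hangs: join only returns when the target thread terminates, which a thread cannot do while blocked in join.
{code}
val t: Thread = new Thread(new Runnable {
  def run(): Unit = Thread.currentThread().join() // waits for itself: blocks forever
})
t.start() // t never terminates; the restart thread in this bug hung the same way
{code}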



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (SPARK-1619) Spark shell should use spark-submit

2014-04-24 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reassigned SPARK-1619:
--

Assignee: Patrick Wendell

 Spark shell should use spark-submit
 ---

 Key: SPARK-1619
 URL: https://issues.apache.org/jira/browse/SPARK-1619
 Project: Spark
  Issue Type: New Feature
Affects Versions: 1.0.0
Reporter: Patrick Wendell
Assignee: Patrick Wendell
 Fix For: 1.0.0






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1619) Spark shell should use spark-submit

2014-04-24 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1619:
---

Priority: Blocker  (was: Major)

 Spark shell should use spark-submit
 ---

 Key: SPARK-1619
 URL: https://issues.apache.org/jira/browse/SPARK-1619
 Project: Spark
  Issue Type: New Feature
Affects Versions: 1.0.0
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1619) Spark shell should use spark-submit

2014-04-24 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-1619:
--

 Summary: Spark shell should use spark-submit
 Key: SPARK-1619
 URL: https://issues.apache.org/jira/browse/SPARK-1619
 Project: Spark
  Issue Type: New Feature
Affects Versions: 1.0.0
Reporter: Patrick Wendell
 Fix For: 1.0.0






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-928) Add support for Unsafe-based serializer in Kryo 2.22

2014-04-24 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-928:


Priority: Major  (was: Minor)

 Add support for Unsafe-based serializer in Kryo 2.22
 

 Key: SPARK-928
 URL: https://issues.apache.org/jira/browse/SPARK-928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Matei Zaharia
  Labels: starter
 Fix For: 1.0.0


 This can reportedly be quite a bit faster, but it also requires Chill to 
 update its Kryo dependency. Once that happens we should add a 
 spark.kryo.useUnsafe flag.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1620) Uncaught exception from Akka scheduler

2014-04-24 Thread Mark Hamstra (JIRA)
Mark Hamstra created SPARK-1620:
---

 Summary: Uncaught exception from Akka scheduler
 Key: SPARK-1620
 URL: https://issues.apache.org/jira/browse/SPARK-1620
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0, 1.0.0
Reporter: Mark Hamstra
Priority: Blocker


I've been looking at this one in the context of a BlockManagerMaster that OOMs 
and doesn't respond to heartBeat(), but I suspect that there may be problems 
elsewhere where we use Akka's scheduler.

The basic nature of the problem is that we are expecting exceptions thrown from 
a scheduled function to be caught in the thread where 
_ActorSystem_.scheduler.schedule() or scheduleOnce() has been called.  In fact, 
the scheduled function runs on its own thread, so any exceptions that it throws 
are not caught in the thread that called schedule() -- e.g., unanswered 
BlockManager heartBeats (scheduled in BlockManager#initialize) that end up 
throwing exceptions in BlockManagerMaster#askDriverWithReply do not cause those 
exceptions to be handled by the Executor thread's UncaughtExceptionHandler. 
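
A standalone sketch of the failure mode, using a plain ScheduledExecutorService in place of Akka's scheduler (the behavior is analogous): the scheduled function throws on the scheduler's own thread, so the caller's try/catch never fires.
{code}
import java.util.concurrent.{Executors, TimeUnit}

val scheduler = Executors.newSingleThreadScheduledExecutor()
try {
  scheduler.schedule(new Runnable {
    def run(): Unit = throw new RuntimeException("heartBeat timed out") // thrown on scheduler thread
  }, 100, TimeUnit.MILLISECONDS)
} catch {
  case e: RuntimeException => println("never reached: " + e) // caller cannot catch it here
}
Thread.sleep(500)     // the exception is captured in the ignored ScheduledFuture
scheduler.shutdown()  // ...and never reaches any UncaughtExceptionHandler
{code}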



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-928) Add support for Unsafe-based serializer in Kryo 2.22

2014-04-24 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-928:


Priority: Minor  (was: Major)

 Add support for Unsafe-based serializer in Kryo 2.22
 

 Key: SPARK-928
 URL: https://issues.apache.org/jira/browse/SPARK-928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Matei Zaharia
Priority: Minor
  Labels: starter
 Fix For: 1.0.0


 This can reportedly be quite a bit faster, but it also requires Chill to 
 update its Kryo dependency. Once that happens we should add a 
 spark.kryo.useUnsafe flag.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1104) Worker should not block while killing executors

2014-04-24 Thread Aaron Davidson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Davidson resolved SPARK-1104.
---

   Resolution: Fixed
Fix Version/s: 1.0.0

 Worker should not block while killing executors
 ---

 Key: SPARK-1104
 URL: https://issues.apache.org/jira/browse/SPARK-1104
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 0.9.0, 1.0.0
Reporter: Patrick Cogan
Assignee: Nan Zhu
 Fix For: 1.0.0


 Sometimes due to large shuffles executors will take a long time shutting 
 down. In particular this can happen if large numbers of shuffle files are 
 around (this will be alleviated by SPARK-1103, but nonetheless...).
 The symptom is that you have DEAD workers sitting around in the UI, and the 
 existing workers keep trying to re-register but can't because they've been 
 assumed dead.
 If killing the executor happens in its own thread, or if the ExecutorRunner 
 were an actor, this would not be a problem. For 0.9 I'd prefer the former 
 approach since it minimizes code changes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (SPARK-1621) Update Chill to 0.3.6

2014-04-24 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia reassigned SPARK-1621:


Assignee: Matei Zaharia

 Update Chill to 0.3.6
 -

 Key: SPARK-1621
 URL: https://issues.apache.org/jira/browse/SPARK-1621
 Project: Spark
  Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
 Fix For: 1.0.0


 It registers more Scala classes, including things like Ranges that we had to 
 register manually before. See https://github.com/twitter/chill/releases for 
 Chill's change log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1621) Update Chill to 0.3.6

2014-04-24 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-1621:


 Summary: Update Chill to 0.3.6
 Key: SPARK-1621
 URL: https://issues.apache.org/jira/browse/SPARK-1621
 Project: Spark
  Issue Type: Improvement
Reporter: Matei Zaharia
Priority: Minor
 Fix For: 1.0.0


It registers more Scala classes, including things like Ranges that we had to 
register manually before. See https://github.com/twitter/chill/releases for 
Chill's change log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-928) Add support for Unsafe-based serializer in Kryo 2.22

2014-04-24 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980471#comment-13980471
 ] 

Matei Zaharia commented on SPARK-928:
-

This probably can't be fixed in 1.0.0 because no Chill release uses Kryo 2.22 
yet, and as far as I can tell we can't build the current Chill with Kryo 2.22 
(I get some nasty Scala compiler errors when I try that). We can bump the fix 
version to 1.1.0 once we do the final pass through 1.0.0 issues.

 Add support for Unsafe-based serializer in Kryo 2.22
 

 Key: SPARK-928
 URL: https://issues.apache.org/jira/browse/SPARK-928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Matei Zaharia
Priority: Minor
  Labels: starter
 Fix For: 1.0.0


 This can reportedly be quite a bit faster, but it also requires Chill to 
 update its Kryo dependency. Once that happens we should add a 
 spark.kryo.useUnsafe flag.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1438) Update RDD.sample() API to make seed parameter optional

2014-04-24 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-1438:
-

Assignee: Arun Ramakrishnan

 Update RDD.sample() API to make seed parameter optional
 ---

 Key: SPARK-1438
 URL: https://issues.apache.org/jira/browse/SPARK-1438
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Matei Zaharia
Assignee: Arun Ramakrishnan
Priority: Blocker
  Labels: Starter
 Fix For: 1.0.0


 When a seed is not given, it should pick one based on Math.random().
 This needs to be done in Java and Python as well.
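 A minimal sketch of the Scala-side change (the default below is one way to 
 derive a seed from Math.random(), as described; not necessarily the committed 
 signature, and shown on a plain Seq rather than an RDD):
 {code}
 import scala.util.Random

 object SampleSketch {
   // Callers who don't care about reproducibility omit the seed and get one
   // derived from Math.random().
   def sample[T](items: Seq[T],
                 fraction: Double,
                 seed: Long = (Math.random() * Long.MaxValue).toLong): Seq[T] = {
     val rng = new Random(seed)
     items.filter(_ => rng.nextDouble() < fraction)
   }
 }
 {code}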



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1438) Update RDD.sample() API to make seed parameter optional

2014-04-24 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia resolved SPARK-1438.
--

Resolution: Fixed

 Update RDD.sample() API to make seed parameter optional
 ---

 Key: SPARK-1438
 URL: https://issues.apache.org/jira/browse/SPARK-1438
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Matei Zaharia
Assignee: Arun Ramakrishnan
Priority: Blocker
  Labels: Starter
 Fix For: 1.0.0


 When a seed is not given, it should pick one based on Math.random().
 This needs to be done in Java and Python as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1340) Some Spark Streaming receivers are not restarted when worker fails

2014-04-24 Thread Tathagata Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980595#comment-13980595
 ] 

Tathagata Das commented on SPARK-1340:
--

I haven't explicitly tested this, but this should be fixed after the whole 
refactoring of the receiver API done in https://github.com/apache/spark/pull/300


 Some Spark Streaming receivers are not restarted when worker fails
 --

 Key: SPARK-1340
 URL: https://issues.apache.org/jira/browse/SPARK-1340
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9.0
Reporter: Tathagata Das
Assignee: Tathagata Das
Priority: Critical

 For some streams, like the Kafka stream, the receiver does not get restarted 
 if the worker running the receiver fails. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (SPARK-1340) Some Spark Streaming receivers are not restarted when worker fails

2014-04-24 Thread Tathagata Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980595#comment-13980595
 ] 

Tathagata Das edited comment on SPARK-1340 at 4/25/14 1:34 AM:
---

I haven't explicitly tested this, but this should be fixed after the whole 
refactoring of the receiver API done in https://github.com/apache/spark/pull/300

To elaborate further, the new refactored receiver ensures that the task that 
launches the receiver does not complete until the receiver is explicitly shut 
down. So if the receiver fails with an exception, it should get relaunched. 
Well, ideally. This still needs to be tested.
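Schematically, the relaunch-by-retry idea looks like this (illustrative, not 
the actual code in the pull request):
{code}
import java.util.concurrent.CountDownLatch

// The task body does not return until the receiver is explicitly stopped,
// so a receiver crash fails the task and Spark's retry machinery relaunches it.
class ReceiverTask(startReceiver: () => Unit) {
  private val stopped = new CountDownLatch(1)

  def run(): Unit = {
    startReceiver()   // an exception here fails the task => relaunch
    stopped.await()   // otherwise block until explicit shutdown
  }

  def stop(): Unit = stopped.countDown()
}
{code}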



was (Author: tdas):
I haven't explicitly tested this, but this should be fixed after the whole 
refactoring of the receiver API done in https://github.com/apache/spark/pull/300


 Some Spark Streaming receivers are not restarted when worker fails
 --

 Key: SPARK-1340
 URL: https://issues.apache.org/jira/browse/SPARK-1340
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9.0
Reporter: Tathagata Das
Assignee: Tathagata Das
Priority: Critical

 For some streams, like the Kafka stream, the receiver does not get restarted 
 if the worker running the receiver fails. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1489) Fix the HistoryServer view acls

2014-04-24 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1489.


   Resolution: Fixed
Fix Version/s: 1.0.0

 Fix the HistoryServer view acls
 ---

 Key: SPARK-1489
 URL: https://issues.apache.org/jira/browse/SPARK-1489
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.0.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Fix For: 1.0.0


 If you are running the HistoryServer with view acls enabled (and a filter 
 added to do auth), the application acls don't work properly. It looks at the 
 user running the history server rather than the user who ran the actual 
 application, so basically no one other than the user running the history 
 server can see anything.
 We also need a way to allow all users to see the front page of the history 
 server and to do authorization only on the particular applications.
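 In other words, the check should look roughly like this (hypothetical helper 
 names; the point is that the acls come from the application's owner, not from 
 the user running the HistoryServer process):
 {code}
 object HistoryAclSketch {
   // Hypothetical per-application view info.
   case class AppInfo(owner: String, viewAcls: Set[String])

   // Authorize against the acls of the user who ran the application,
   // not the user running the HistoryServer process.
   def canView(requestingUser: String, app: AppInfo): Boolean =
     requestingUser == app.owner || app.viewAcls.contains(requestingUser)
 }
 {code}
 The listing page itself would stay open to everyone; only per-application 
 pages would call the check.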



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1609) Executor fails to start when use spark-submit

2014-04-24 Thread witgo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980637#comment-13980637
 ] 

witgo commented on SPARK-1609:
--

Spark Executor Command: /opt/jdk1.8.0/bin/java -cp 
:/opt/spark/classes/echo-1.0-SNAPSHOT.jar:/opt/spark/classes/toona-assembly-1.0.0-SNAPSHOT.jar:/opt/spark/spark-1.0.0-cdh3/conf:/opt/spark/spark-1.0.0-cdh3/lib/spark-assembly-1.0.0-SNAPSHOT-hadoop0.20.2-cdh3u5.jar
 -Xss2m -Dspark.ui.killEnabled=false -Xms5120M -Xmx5120M 
org.apache.spark.executor.CoarseGrainedExecutorBackend 
akka.tcp://spark@spark:47185/user/CoarseGrainedScheduler 7 spark 4 
akka.tcp://sparkWorker@spark:35646/user/Worker app-20140425183255-


Invalid thread stack size: -Xss2m -Dspark.ui.killEnabled=false
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
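The log shows the root cause: both options arrive as a single JVM argument, so 
the JVM reads the stack-size flag as "-Xss2m -Dspark.ui.killEnabled=false". A 
minimal sketch of the fix (illustrative, not the exact CommandUtils code) is 
to split the option string into separate arguments before building the command:
{code}
object JavaOptsSketch {
  // Split a multi-option string into individual JVM arguments. A naive
  // whitespace split breaks quoted values; the real fix needs tokenization
  // for those.
  def splitJavaOpts(extraJavaOptions: Option[String]): Seq[String] =
    extraJavaOptions.toSeq.flatMap(_.split("\\s+")).filter(_.nonEmpty)
}

// JavaOptsSketch.splitJavaOpts(Some("-Xss2m -Dspark.ui.killEnabled=false"))
// => Seq("-Xss2m", "-Dspark.ui.killEnabled=false")
{code}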

 Executor fails to start when use spark-submit
 -

 Key: SPARK-1609
 URL: https://issues.apache.org/jira/browse/SPARK-1609
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: witgo
Priority: Blocker
 Attachments: spark.log


 {code}
 export SPARK_JAVA_OPTS="-server -Dspark.ui.killEnabled=false \
   -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 \
   -Dspark.locality.wait=1 \
   -Dspark.storage.blockManagerTimeoutIntervalMs=600 \
   -Dspark.storage.memoryFraction=0.7 \
   -Dspark.broadcast.factory=org.apache.spark.broadcast.TorrentBroadcastFactory"
 Executor fails to start.
 {code}
 export SPARK_JAVA_OPTS="-Dspark.ui.killEnabled=false \
   -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 \
   -Dspark.locality.wait=1 \
   -Dspark.storage.blockManagerTimeoutIntervalMs=600 \
   -Dspark.storage.memoryFraction=0.7 \
   -Dspark.broadcast.factory=org.apache.spark.broadcast.TorrentBroadcastFactory"
 The executor starts correctly. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1623) Broadcast cleaner should use getCononicalPath when deleting files by name

2014-04-24 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-1623:
--

 Summary: Broadcast cleaner should use getCononicalPath when 
deleting files by name
 Key: SPARK-1623
 URL: https://issues.apache.org/jira/browse/SPARK-1623
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Patrick Wendell
Priority: Blocker
 Fix For: 1.0.0






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1609) Executor fails to start when Command.extraJavaOptions contains multiple Java options

2014-04-24 Thread witgo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

witgo updated SPARK-1609:
-

Summary: Executor fails to start when Command.extraJavaOptions contains 
multiple Java options  (was: Executor fails to start when use spark-submit)

 Executor fails to start when Command.extraJavaOptions contains multiple Java 
 options
 

 Key: SPARK-1609
 URL: https://issues.apache.org/jira/browse/SPARK-1609
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: witgo
Priority: Blocker
 Attachments: spark.log


 {code}
 export SPARK_JAVA_OPTS="-server -Dspark.ui.killEnabled=false \
   -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 \
   -Dspark.locality.wait=1 \
   -Dspark.storage.blockManagerTimeoutIntervalMs=600 \
   -Dspark.storage.memoryFraction=0.7 \
   -Dspark.broadcast.factory=org.apache.spark.broadcast.TorrentBroadcastFactory"
 {code}
 Executor fails to start.
 {code}
 export SPARK_JAVA_OPTS="-Dspark.ui.killEnabled=false \
   -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 \
   -Dspark.locality.wait=1 \
   -Dspark.storage.blockManagerTimeoutIntervalMs=600 \
   -Dspark.storage.memoryFraction=0.7 \
   -Dspark.broadcast.factory=org.apache.spark.broadcast.TorrentBroadcastFactory"
 {code}
 The executor starts correctly. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-986) Add job cancellation to PySpark

2014-04-24 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia resolved SPARK-986.
-

Resolution: Fixed

 Add job cancellation to PySpark
 ---

 Key: SPARK-986
 URL: https://issues.apache.org/jira/browse/SPARK-986
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Reporter: Josh Rosen
Assignee: Ahir Reddy
 Fix For: 1.0.0


 We should add support for job cancellation to PySpark.  It would also be nice 
 to be able to cancel jobs via ctrl-c in the PySpark shell.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-986) Add job cancellation to PySpark

2014-04-24 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-986:


Affects Version/s: (was: 0.9.0)

 Add job cancellation to PySpark
 ---

 Key: SPARK-986
 URL: https://issues.apache.org/jira/browse/SPARK-986
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Reporter: Josh Rosen
Assignee: Ahir Reddy
 Fix For: 1.0.0


 We should add support for job cancellation to PySpark.  It would also be nice 
 to be able to cancel jobs via ctrl-c in the PySpark shell.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-986) Add job cancellation to PySpark

2014-04-24 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-986:


Fix Version/s: 1.0.0

 Add job cancellation to PySpark
 ---

 Key: SPARK-986
 URL: https://issues.apache.org/jira/browse/SPARK-986
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Reporter: Josh Rosen
Assignee: Ahir Reddy
 Fix For: 1.0.0


 We should add support for job cancellation to PySpark.  It would also be nice 
 to be able to cancel jobs via ctrl-c in the PySpark shell.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-986) Add job cancellation to PySpark

2014-04-24 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-986:


Assignee: Ahir Reddy

 Add job cancellation to PySpark
 ---

 Key: SPARK-986
 URL: https://issues.apache.org/jira/browse/SPARK-986
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Reporter: Josh Rosen
Assignee: Ahir Reddy
 Fix For: 1.0.0


 We should add support for job cancellation to PySpark.  It would also be nice 
 to be able to cancel jobs via ctrl-c in the PySpark shell.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1586) Fix issues with spark development under windows

2014-04-24 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia resolved SPARK-1586.
--

   Resolution: Fixed
Fix Version/s: 1.0.0

 Fix issues with spark development under windows
 ---

 Key: SPARK-1586
 URL: https://issues.apache.org/jira/browse/SPARK-1586
 Project: Spark
  Issue Type: Bug
Reporter: Mridul Muralidharan
Assignee: Mridul Muralidharan
 Fix For: 1.0.0






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1584) Upgrade Flume dependency to 1.4.0

2014-04-24 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1584.


Resolution: Fixed
  Assignee: Ted Malaska  (was: Sandy Ryza)

 Upgrade Flume dependency to 1.4.0
 -

 Key: SPARK-1584
 URL: https://issues.apache.org/jira/browse/SPARK-1584
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 0.9.0
Reporter: Sandy Ryza
Assignee: Ted Malaska
 Fix For: 1.0.0






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1617) Exposing receiver state and errors in the streaming UI

2014-04-24 Thread Tathagata Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980681#comment-13980681
 ] 

Tathagata Das commented on SPARK-1617:
--

https://github.com/apache/spark/pull/540/

 Exposing receiver state and errors in the streaming UI
 --

 Key: SPARK-1617
 URL: https://issues.apache.org/jira/browse/SPARK-1617
 Project: Spark
  Issue Type: Improvement
  Components: Streaming, Web UI
Reporter: Tathagata Das
Assignee: Tathagata Das
 Fix For: 1.0.0


 The receiver state (active or inactive) and the last error were not exposed 
 in the UI prior to these changes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1618) Socket receiver not restarting properly when connection is refused

2014-04-24 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-1618.
--

Resolution: Fixed

 Socket receiver not restarting properly when connection is refused
 --

 Key: SPARK-1618
 URL: https://issues.apache.org/jira/browse/SPARK-1618
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Reporter: Tathagata Das
Assignee: Tathagata Das
 Fix For: 1.0.0


 If the socket receiver cannot connect on the first attempt, it should try to 
 restart after a delay. That was broken: the thread that restarts (and hence 
 first stops) the receiver waited on Thread.join on itself!
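 Schematically, the fix hands the restart to a separate thread instead of 
 joining on the current one (illustrative names, not the actual patch):
 {code}
 object ReceiverRestartSketch {
   // Restart from a dedicated thread instead of joining on the receiver's
   // own thread, which is what deadlocked before.
   def restartWithDelay(stop: () => Unit, start: () => Unit, delayMs: Long): Unit = {
     val restarter = new Thread("receiver-restarter") {
       override def run(): Unit = {
         stop()                // safe: not the thread being joined
         Thread.sleep(delayMs) // back off before reconnecting
         start()
       }
     }
     restarter.setDaemon(true)
     restarter.start()
   }
 }
 {code}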



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1618) Socket receiver not restarting properly when connection is refused

2014-04-24 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-1618:
-

Fix Version/s: 1.0.0

 Socket receiver not restarting properly when connection is refused
 --

 Key: SPARK-1618
 URL: https://issues.apache.org/jira/browse/SPARK-1618
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Reporter: Tathagata Das
Assignee: Tathagata Das
 Fix For: 1.0.0


 If the socket receiver cannot connect on the first attempt, it should try to 
 restart after a delay. That was broken: the thread that restarts (and hence 
 first stops) the receiver waited on Thread.join on itself!



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1617) Exposing receiver state and errors in the streaming UI

2014-04-24 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-1617.
--

Resolution: Fixed

 Exposing receiver state and errors in the streaming UI
 --

 Key: SPARK-1617
 URL: https://issues.apache.org/jira/browse/SPARK-1617
 Project: Spark
  Issue Type: Improvement
  Components: Streaming, Web UI
Reporter: Tathagata Das
Assignee: Tathagata Das
 Fix For: 1.0.0


 The receiver state (active or inactive) and the last error were not exposed 
 in the UI prior to these changes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1592) Old streaming input blocks not removed automatically from the BlockManagers

2014-04-24 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-1592:
-

Fix Version/s: 1.0.0

 Old streaming input blocks not removed automatically from the BlockManagers
 ---

 Key: SPARK-1592
 URL: https://issues.apache.org/jira/browse/SPARK-1592
 Project: Spark
  Issue Type: Bug
Reporter: Tathagata Das
Assignee: Tathagata Das
Priority: Blocker
 Fix For: 1.0.0


 The raw input data is stored as blocks in BlockManagers. Earlier they were 
 cleared by the cleaner TTL. Now, since streaming no longer requires the 
 cleaner TTL to be set, the blocks do not get cleared. This increases Spark's 
 memory usage, which is not even accounted for or shown in the Spark storage 
 UI. It may cause the data blocks to spill over to disk, which eventually 
 slows down the receiving of data (persisting to memory becomes bottlenecked 
 by writing to disk).
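 Purely as a sketch of the direction (hypothetical interfaces, not Spark's 
 BlockManager API): drop input blocks once the batches that reference them 
 have completed, instead of relying on a global cleaner TTL.
 {code}
 object InputBlockCleanupSketch {
   // Hypothetical stand-in for the BlockManager.
   trait BlockStore { def removeBlock(blockId: String): Unit }

   // Drop input blocks whose batches have completed, rather than relying
   // on a global cleaner TTL.
   def clearCompletedInputBlocks(store: BlockStore,
                                 blockIds: Seq[String],
                                 batchDone: String => Boolean): Unit =
     blockIds.filter(batchDone).foreach(store.removeBlock)
 }
 {code}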



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1510) Add Spark Streaming metrics source for metrics system

2014-04-24 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-1510.
--

   Resolution: Fixed
Fix Version/s: 1.0.0

https://github.com/apache/spark/pull/545

 Add Spark Streaming metrics source for metrics system
 -

 Key: SPARK-1510
 URL: https://issues.apache.org/jira/browse/SPARK-1510
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Reporter: Saisai Shao
 Fix For: 1.0.0


 Since a Spark Streaming application is long-running, it is especially 
 important to monitor its current status. We now have the Streaming UI, which 
 shows the status of the app directly, but it is also necessary to export some 
 of these metrics to the metrics system so that external tools can connect and 
 monitor the status of the app.
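 A sketch of such a source, in the style of Spark's Codahale-backed metrics 
 system (the real Source trait lives in org.apache.spark.metrics.source; this 
 simplified version just shows the shape):
 {code}
 import com.codahale.metrics.{Gauge, MetricRegistry}

 // Simplified streaming metrics source: exposes one gauge that external
 // sinks (Ganglia, JMX, CSV, ...) can poll through the metrics system.
 class StreamingSource(completedBatches: () => Long) {
   val sourceName = "streaming"
   val metricRegistry = new MetricRegistry

   metricRegistry.register(
     MetricRegistry.name(sourceName, "completedBatches"),
     new Gauge[Long] { override def getValue: Long = completedBatches() })
 }
 {code}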



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1592) Old streaming input blocks not removed automatically from the BlockManagers

2014-04-24 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-1592.
--

Resolution: Fixed

https://github.com/apache/spark/pull/512

 Old streaming input blocks not removed automatically from the BlockManagers
 ---

 Key: SPARK-1592
 URL: https://issues.apache.org/jira/browse/SPARK-1592
 Project: Spark
  Issue Type: Bug
Reporter: Tathagata Das
Assignee: Tathagata Das
Priority: Blocker

 The raw input data is stored as blocks in BlockManagers. Earlier they were 
 cleared by the cleaner TTL. Now, since streaming no longer requires the 
 cleaner TTL to be set, the blocks do not get cleared. This increases Spark's 
 memory usage, which is not even accounted for or shown in the Spark storage 
 UI. It may cause the data blocks to spill over to disk, which eventually 
 slows down the receiving of data (persisting to memory becomes bottlenecked 
 by writing to disk).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1624) addShutdownHook in DiskBlockManager Doesn't work the way it is supposed to work

2014-04-24 Thread Sandeep Singh (JIRA)
Sandeep Singh created SPARK-1624:


 Summary: addShutdownHook in DiskBlockManager Doesn't work the way 
it is supposed to work
 Key: SPARK-1624
 URL: https://issues.apache.org/jira/browse/SPARK-1624
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Sandeep Singh
Assignee: Sandeep Singh
Priority: Blocker


The stop() method is called inside a new Thread, where it resolves to 
Thread.stop() instead of the provided stop() method.
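Schematically (a minimal reproduction of the name-resolution trap, not the 
actual DiskBlockManager code):
{code}
class Manager {
  def stop(): Unit = println("cleaning up local directories")

  // BUG: inside the anonymous Thread subclass, the bare stop() resolves to
  // the (deprecated) Thread.stop(), not Manager.stop().
  def buggyHook: Thread = new Thread("shutdown-hook") {
    override def run(): Unit = stop()
  }

  // Fix: qualify the receiver so the outer method is called.
  def fixedHook: Thread = new Thread("shutdown-hook") {
    override def run(): Unit = Manager.this.stop()
  }
}
{code}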



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1624) addShutdownHook in DiskBlockManager Doesn't work the way it is supposed to work

2014-04-24 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980706#comment-13980706
 ] 

Sandeep Singh commented on SPARK-1624:
--

https://github.com/apache/spark/pull/527

 addShutdownHook in DiskBlockManager Doesn't work the way it is supposed to 
 work
 ---

 Key: SPARK-1624
 URL: https://issues.apache.org/jira/browse/SPARK-1624
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Sandeep Singh
Assignee: Sandeep Singh
Priority: Blocker

 The stop() method is called inside a new Thread, where it resolves to 
 Thread.stop() instead of the provided stop() method.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (SPARK-1624) addShutdownHook in DiskBlockManager Doesn't work the way it is supposed to work

2014-04-24 Thread Sandeep Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh closed SPARK-1624.


Resolution: Not a Problem

 addShutdownHook in DiskBlockManager Doesn't work the way it is supposed to 
 work
 ---

 Key: SPARK-1624
 URL: https://issues.apache.org/jira/browse/SPARK-1624
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Sandeep Singh
Assignee: Sandeep Singh
Priority: Blocker

 The stop() method is called inside a new Thread, where it resolves to 
 Thread.stop() instead of the provided stop() method.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (SPARK-1623) Broadcast cleaner should use getCononicalPath when deleting files by name

2014-04-24 Thread Niraj Suthar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niraj Suthar reassigned SPARK-1623:
---

Assignee: Niraj Suthar

 Broadcast cleaner should use getCononicalPath when deleting files by name
 -

 Key: SPARK-1623
 URL: https://issues.apache.org/jira/browse/SPARK-1623
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Niraj Suthar
Priority: Blocker
 Fix For: 1.0.0






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1623) SPARK-1623. Broadcast cleaner should use getCanonicalPath when deleting files by name

2014-04-24 Thread Niraj Suthar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niraj Suthar updated SPARK-1623:


Summary: SPARK-1623. Broadcast cleaner should use getCanonicalPath when 
deleting files by name  (was: Broadcast cleaner should use getCononicalPath 
when deleting files by name)

 SPARK-1623. Broadcast cleaner should use getCanonicalPath when deleting files 
 by name
 -

 Key: SPARK-1623
 URL: https://issues.apache.org/jira/browse/SPARK-1623
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Niraj Suthar
Priority: Blocker
 Fix For: 1.0.0






--
This message was sent by Atlassian JIRA
(v6.2#6252)