[jira] [Resolved] (SPARK-1424) InsertInto should work on JavaSchemaRDD as well.

2014-04-16 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-1424.
-

Resolution: Fixed

 InsertInto should work on JavaSchemaRDD as well.
 

 Key: SPARK-1424
 URL: https://issues.apache.org/jira/browse/SPARK-1424
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.0.0
Reporter: Michael Armbrust
Assignee: Michael Armbrust
Priority: Blocker
 Fix For: 1.0.0








[jira] [Created] (SPARK-1510) Add Spark Streaming metrics source for metrics system

2014-04-16 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-1510:
--

 Summary: Add Spark Streaming metrics source for metrics system
 Key: SPARK-1510
 URL: https://issues.apache.org/jira/browse/SPARK-1510
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Reporter: Saisai Shao


Since a Spark Streaming application is long-running, it is especially important to 
monitor its current status. We now have the Streaming UI, which shows the status of 
the app directly, but it is also necessary to export some of these metrics to the 
metrics system so that external tools can connect and monitor the app.
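
For illustration, here is a minimal sketch of what such a source could look like. It 
assumes Spark's Codahale-based metrics system and its Source trait (which is 
package-private in Spark), and the gauge names and supplier functions are 
hypothetical stand-ins for whatever the streaming listener ends up exposing:

{code}
import com.codahale.metrics.{Gauge, MetricRegistry}
import org.apache.spark.metrics.source.Source

// Sketch only, not the eventual implementation.
class StreamingSource(numReceivers: () => Int,
                      numTotalCompletedBatches: () => Long) extends Source {
  override val sourceName = "spark.streaming"
  override val metricRegistry = new MetricRegistry

  // Each gauge is polled by whatever sinks are configured (console, Ganglia, ...).
  metricRegistry.register(MetricRegistry.name("receivers"),
    new Gauge[Int] { override def getValue: Int = numReceivers() })
  metricRegistry.register(MetricRegistry.name("totalCompletedBatches"),
    new Gauge[Long] { override def getValue: Long = numTotalCompletedBatches() })
}
{code}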





[jira] [Commented] (SPARK-1175) on shutting down a long running job, the cluster does not accept new jobs and gets hung

2014-04-16 Thread Tal Sliwowicz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970641#comment-13970641
 ] 

Tal Sliwowicz commented on SPARK-1175:
--

This prevents us from having a real automated failover solution when the driver 
fails. We cannot automatically start a new driver because Spark is stuck on 
cleanup.

 on shutting down a long running job, the cluster does not accept new jobs and 
 gets hung
 ---

 Key: SPARK-1175
 URL: https://issues.apache.org/jira/browse/SPARK-1175
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.8.1, 0.9.0
Reporter: Tal Sliwowicz
Assignee: Nan Zhu
  Labels: shutdown, worker

 When shutting down a long-running job (24+ hours) that runs periodically on the 
 same context and generates a lot of shuffles (many hundreds of GB), the Spark 
 workers hang for a long while and the cluster does not accept new jobs. The only 
 way to proceed is to kill -9 the workers.
 This is a big problem because when multiple contexts run on the same cluster, 
 one must stop them all for a simple restart.
 The context is stopped using sc.stop().
 This happens both in standalone mode and under Mesos.
 We suspect this is caused by the delete Spark local dirs thread. Attached is a 
 thread dump of the worker; the relevant part may be:
 SIGTERM handler - Thread t@41040
java.lang.Thread.State: BLOCKED
   at java.lang.Shutdown.exit(Shutdown.java:168)
   - waiting to lock 69eab6a3 (a java.lang.Class) owned by SIGTERM 
 handler t@41038
   at java.lang.Terminator$1.handle(Terminator.java:35)
   at sun.misc.Signal$1.run(Signal.java:195)
   at java.lang.Thread.run(Thread.java:662)
Locked ownable synchronizers:
   - None
 delete Spark local dirs - Thread t@40
java.lang.Thread.State: RUNNABLE
   at java.io.UnixFileSystem.delete0(Native Method)
   at java.io.UnixFileSystem.delete(UnixFileSystem.java:251)
   at java.io.File.delete(File.java:904)
   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:482)
   at 
 org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:479)
   at 
 org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:478)
   at 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:478)
   at 
 org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:479)
   at 
 org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:478)
   at 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:478)
   at 
 org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$2.apply(DiskBlockManager.scala:141)
   at 
 org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$2.apply(DiskBlockManager.scala:139)
   at 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
   at 
 org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)
Locked ownable synchronizers:
   - None
 SIGTERM handler - Thread t@41038
java.lang.Thread.State: WAITING
   at java.lang.Object.wait(Native Method)
   - waiting on 355c6c8d (a 
 org.apache.spark.storage.DiskBlockManager$$anon$1)
   at java.lang.Thread.join(Thread.java:1186)
   at java.lang.Thread.join(Thread.java:1239)
   at 
 java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79)
   at 
 java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:24)
   at java.lang.Shutdown.runHooks(Shutdown.java:79)
   at java.lang.Shutdown.sequence(Shutdown.java:123)
   at java.lang.Shutdown.exit(Shutdown.java:168)
   - locked 69eab6a3 (a java.lang.Class)
   at java.lang.Terminator$1.handle(Terminator.java:35)
   at sun.misc.Signal$1.run(Signal.java:195)
   at java.lang.Thread.run(Thread.java:662)
Locked ownable synchronizers:
   - None





[jira] [Updated] (SPARK-1511) Update TestUtils.createCompiledClass() API to work with creating class file on different filesystem

2014-04-16 Thread Ye Xianjin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ye Xianjin updated SPARK-1511:
--

Affects Version/s: 0.8.1
   0.9.0

 Update TestUtils.createCompiledClass() API to work with creating class file 
 on different filesystem
 ---

 Key: SPARK-1511
 URL: https://issues.apache.org/jira/browse/SPARK-1511
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.8.1, 0.9.0, 1.0.0
 Environment: Mac OS X, two disks. 
Reporter: Ye Xianjin
Priority: Minor
  Labels: starter
   Original Estimate: 24h
  Remaining Estimate: 24h

 The createCompiledClass method uses Java's File.renameTo to rename the source 
 file to the destination file, which will fail if the source and destination files 
 are on different disks (or partitions).
 See 
 http://apache-spark-developers-list.1001551.n3.nabble.com/Tests-failed-after-assembling-the-latest-code-from-github-td6315.html
  for more details.
 Using com.google.common.io.Files.move instead of renameTo will solve this issue.
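
A sketch of the suggested change, assuming the method has the compiled class file 
and a destination directory in hand (the variable names below are illustrative, 
not the actual TestUtils internals):

{code}
import java.io.File
import com.google.common.io.Files

val destDir = new File("/tmp/spark-test-classes")   // hypothetical destination dir
val compiledFile = new File("Foo.class")            // class file produced by the compiler
val out = new File(destDir, compiledFile.getName)

// File.renameTo returns false when source and destination are on different
// filesystems; Guava's Files.move falls back to copy-and-delete instead.
Files.move(compiledFile, out)
{code}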





[jira] [Commented] (SPARK-1399) Reason for Stage Failure should be shown in UI

2014-04-16 Thread Lianhui Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971415#comment-13971415
 ] 

Lianhui Wang commented on SPARK-1399:
-

I think the user-defined accumulators of every task also need to be shown in the UI.

 Reason for Stage Failure should be shown in UI
 --

 Key: SPARK-1399
 URL: https://issues.apache.org/jira/browse/SPARK-1399
 Project: Spark
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Kay Ousterhout
Assignee: Nan Zhu

 Right now, we don't show why a stage failed in the UI.  We have this 
 information, and it would be useful for users to see (e.g., to see that a 
 stage was killed because the job was cancelled).





[jira] [Commented] (SPARK-1175) on shutting down a long running job, the cluster does not accept new jobs and gets hung

2014-04-16 Thread Tal Sliwowicz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971520#comment-13971520
 ] 

Tal Sliwowicz commented on SPARK-1175:
--

Yes

 on shutting down a long running job, the cluster does not accept new jobs and 
 gets hung
 ---

 Key: SPARK-1175
 URL: https://issues.apache.org/jira/browse/SPARK-1175
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.8.1, 0.9.0
Reporter: Tal Sliwowicz
Assignee: Nan Zhu
  Labels: shutdown, worker






[jira] [Created] (SPARK-1512) improve spark sql to support table with more than 22 fields

2014-04-16 Thread wangfei (JIRA)
wangfei created SPARK-1512:
--

 Summary: improve spark sql to support table with more than 22 
fields
 Key: SPARK-1512
 URL: https://issues.apache.org/jira/browse/SPARK-1512
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: wangfei
 Fix For: 1.0.0


Spark SQL uses case classes to define tables, so the 22-field limit on case classes 
means Spark SQL cannot support wide tables (more than 22 fields). Wide tables are 
common in many use cases.
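
For reference, the limit comes from Scala 2.10 itself: a case class may have at most 
22 constructor fields, so any schema derived from a case class is capped at 22 
columns. A quick illustration (the field names are arbitrary):

{code}
// Compiles: exactly 22 fields.
case class Ok(f1: Int, f2: Int, f3: Int, f4: Int, f5: Int, f6: Int, f7: Int,
              f8: Int, f9: Int, f10: Int, f11: Int, f12: Int, f13: Int, f14: Int,
              f15: Int, f16: Int, f17: Int, f18: Int, f19: Int, f20: Int,
              f21: Int, f22: Int)

// A 23rd field fails on Scala 2.10 with:
// "Implementation restriction: case classes cannot have more than 22 parameters."
{code}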





[jira] [Resolved] (SPARK-1277) Automatically set the UI persistence directory based on cluster settings

2014-04-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1277.


Resolution: Won't Fix

Subsumed by other configuration patches.

 Automatically set the UI persistence directory based on cluster settings
 

 Key: SPARK-1277
 URL: https://issues.apache.org/jira/browse/SPARK-1277
 Project: Spark
  Issue Type: New Feature
  Components: Web UI
Reporter: Patrick Wendell
Assignee: Andrew Or
Priority: Blocker
 Fix For: 1.0.0


 More details forthcoming





[jira] [Resolved] (SPARK-1497) Spark YARN code isn't checked with Scalastyle and has many style violations

2014-04-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1497.


Resolution: Fixed
  Assignee: Sean Owen

 Spark YARN code isn't checked with Scalastyle and has many style violations
 ---

 Key: SPARK-1497
 URL: https://issues.apache.org/jira/browse/SPARK-1497
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra, YARN
Reporter: Patrick Wendell
Assignee: Sean Owen
 Fix For: 1.1.0


 We should just set SPARK_YARN=true when running Scalastyle. Also, we should 
 clean up the existing style issues.





[jira] [Created] (SPARK-1513) Specialized ColumnType for Timestamp

2014-04-16 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-1513:
-

 Summary: Specialized ColumnType for Timestamp
 Key: SPARK-1513
 URL: https://issues.apache.org/jira/browse/SPARK-1513
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.1.0
Reporter: Cheng Lian








[jira] [Updated] (SPARK-1514) Standardize process for creating Spark packages

2014-04-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1514:
---

Summary: Standardize process for creating Spark packages  (was: Standardize 
way to create Spark releases)

 Standardize process for creating Spark packages
 ---

 Key: SPARK-1514
 URL: https://issues.apache.org/jira/browse/SPARK-1514
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.1.0


 Over time we've accumulated (a) make-distribution.sh, (b) the Maven distribution 
 targets, and (c) the create-release.sh script in /dev. This is pretty confusing for 
 downstream packagers.
 We should have a single way to package releases, probably using a modified 
 Maven distribution.





[jira] [Created] (SPARK-1514) Standardize way to create Spark releases

2014-04-16 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-1514:
--

 Summary: Standardize way to create Spark releases
 Key: SPARK-1514
 URL: https://issues.apache.org/jira/browse/SPARK-1514
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.1.0


Over time we've accumulated (a) make-distribution.sh, (b) the Maven distribution 
targets, and (c) the create-release.sh script in /dev. This is pretty confusing for 
downstream packagers.

We should have a single way to package releases, probably using a modified 
Maven distribution.





[jira] [Resolved] (SPARK-1469) Scheduler mode should accept lower-case definitions and have nicer error messages

2014-04-16 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1469.


   Resolution: Fixed
Fix Version/s: (was: 1.1.0)
   1.0.0

 Scheduler mode should accept lower-case definitions and have nicer error 
 messages
 -

 Key: SPARK-1469
 URL: https://issues.apache.org/jira/browse/SPARK-1469
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Sandeep Singh
  Labels: starter
 Fix For: 1.0.0


 I tried setting spark.scheduler.mode=fair and I got the following nasty 
 exception:
 {code}
 java.util.NoSuchElementException: None.get
   at scala.None$.get(Option.scala:313)
   at scala.None$.get(Option.scala:311)
   at scala.Enumeration.withName(Enumeration.scala:132)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl.<init>(TaskSchedulerImpl.scala:101)
   at 
 org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1338)
   at org.apache.spark.SparkContext.<init>(SparkContext.scala:230)
   at 
 org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:956)
   at $iwC$$iwC.<init>(<console>:8)
   at $iwC.<init>(<console>:14)
   at <init>(<console>:16)
   at .<init>(<console>:20)
   at .<clinit>(<console>)
 {code}
 We should make two improvements:
 1. We should make the built-in modes case-insensitive (fair/FAIR, fifo/FIFO).
 2. If an invalid mode is given, we should print a better error message.
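
A minimal sketch of both improvements, assuming the SchedulingMode enumeration seen 
in the stack trace above (the helper name is illustrative, not the actual 
TaskSchedulerImpl patch):

{code}
import org.apache.spark.scheduler.SchedulingMode

def parseSchedulingMode(value: String): SchedulingMode.SchedulingMode =
  try {
    // Accept "fair"/"FAIR"/"fifo"/"FIFO" by normalizing the case first.
    SchedulingMode.withName(value.toUpperCase)
  } catch {
    case _: NoSuchElementException =>
      // Replace the bare None.get with an actionable message.
      throw new IllegalArgumentException(
        s"Unrecognized spark.scheduler.mode: $value. Valid values are FAIR and FIFO.")
  }
{code}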





[jira] [Commented] (SPARK-1439) Aggregate Scaladocs across projects

2014-04-16 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971662#comment-13971662
 ] 

Sean Owen commented on SPARK-1439:
--

I had a run at this today. First I tried Maven-based approaches, but they didn't 
quite do the trick. I made some progress with unidoc, although not all the way. 
Maybe an SBT expert can help me figure out how to finish it.

*Maven*

http://stackoverflow.com/questions/12301620/how-to-generate-an-aggregated-scaladoc-for-a-maven-site

This works, but it generates *javadoc* for everything, including Scala sources. 
The resulting javadoc is not so helpful. It also complains a lot about not finding 
references, since javadoc doesn't understand the links in the same way.

*Maven #2*

You can also invoke the scala-maven-plugin 'doc' goal as part of the site 
generation:

{code:xml}
<reporting>
  <plugins>
    ...
    <plugin>
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <reportSets>
        <reportSet>
          <reports>
            <report>doc</report>
          </reports>
        </reportSet>
      </reportSets>
    </plugin>
  </plugins>
</reporting>
{code}

It lacks a goal like aggregate that the javadoc plugin has, which takes care 
of combining everything into one set of docs. This only generates scaladoc in 
each module in exploded format.

*Unidoc / SBT*

It is almost as easy as:

- adding the plugin to plugins.sbt: {{addSbtPlugin("com.eed3si9n" % "sbt-unidoc" % "0.3.0")}}
- {{import sbtunidoc.Plugin._}} and {{UnidocKeys._}} in SparkBuild.scala
- adding ++ unidocSettings to rootSettings in SparkBuild.scala

but it was also necessary to:

- {{SPARK_YARN=true}} and {{SPARK_HADOOP_VERSION=2.2.0}}, for example, to make 
YARN scaladoc work
- Exclude {{yarn-alpha}} since scaladoc doesn't like the collision of class 
names:

{code}
  def rootSettings = sharedSettings ++ unidocSettings ++ Seq(
    unidocProjectFilter in (ScalaUnidoc, unidoc) := inAnyProject -- inProjects(yarnAlpha),
    publish := {}
  )
{code}

I still get SBT errors since I think this is not quite correctly finessing the 
build. But it seems almost there.


 Aggregate Scaladocs across projects
 ---

 Key: SPARK-1439
 URL: https://issues.apache.org/jira/browse/SPARK-1439
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Reporter: Matei Zaharia
 Fix For: 1.0.0


 Apparently there's a Unidoc plugin to put together ScalaDocs across 
 modules: https://github.com/akka/akka/blob/master/project/Unidoc.scala





[jira] [Commented] (SPARK-1496) SparkContext.jarOfClass should return Option instead of a sequence

2014-04-16 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971671#comment-13971671
 ] 

haosdent commented on SPARK-1496:
-

Should it return Option[Seq[String]]? Maybe I could help you fix this issue. :-)

 SparkContext.jarOfClass should return Option instead of a sequence
 --

 Key: SPARK-1496
 URL: https://issues.apache.org/jira/browse/SPARK-1496
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Patrick Wendell
 Fix For: 1.0.0


 This is pretty confusing, especially since addJar expects to take a single 
 jar.





[jira] [Created] (SPARK-1516) Yarn Client should not call System.exit, should throw exception instead.

2014-04-16 Thread DB Tsai (JIRA)
DB Tsai created SPARK-1516:
--

 Summary: Yarn Client should not call System.exit, should throw 
exception instead.
 Key: SPARK-1516
 URL: https://issues.apache.org/jira/browse/SPARK-1516
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Reporter: DB Tsai
Assignee: DB Tsai


People submit Spark jobs to a YARN cluster from inside their own applications using 
the Spark YARN client, and it's not desirable for the YARN client to call 
System.exit, which will terminate the parent application as well.

We should throw an exception instead, so people can decide which action to take 
given the exception.
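
A hedged sketch of the proposed behavior; the exception type and the validation 
helper are illustrative names, not the actual yarn/Client.scala code:

{code}
// Surface failures as an exception the embedding application can handle,
// instead of System.exit tearing down its whole JVM.
class SparkYarnClientException(message: String) extends RuntimeException(message)

def validateArgs(userJar: Option[String]): Unit = {
  if (userJar.isEmpty) {
    // Before: System.exit(1), which also kills the parent application.
    throw new SparkYarnClientException("Error: you must specify a user jar")
  }
}

// The caller decides what to do when submission fails:
try {
  validateArgs(None)
} catch {
  case e: SparkYarnClientException =>
    println(s"YARN submission failed: ${e.getMessage}")
}
{code}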





[jira] [Updated] (SPARK-1516) Yarn Client should not call System.exit, should throw exception instead.

2014-04-16 Thread DB Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai updated SPARK-1516:
---

Assignee: (was: DB Tsai)

 Yarn Client should not call System.exit, should throw exception instead.
 

 Key: SPARK-1516
 URL: https://issues.apache.org/jira/browse/SPARK-1516
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Reporter: DB Tsai

 People submit Spark jobs to a YARN cluster from inside their own applications 
 using the Spark YARN client, and it's not desirable for the YARN client to call 
 System.exit, which will terminate the parent application as well.
 We should throw an exception instead, so people can decide which action to take 
 given the exception.





[jira] [Created] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds

2014-04-16 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-1517:
--

 Summary: Publish nightly snapshots of documentation, maven 
artifacts, and binary builds
 Key: SPARK-1517
 URL: https://issues.apache.org/jira/browse/SPARK-1517
 Project: Spark
  Issue Type: Improvement
  Components: Build, Project Infra
Reporter: Patrick Wendell
 Fix For: 1.1.0


Should be pretty easy to do with Jenkins. The only thing I can think of that would 
be tricky is setting up credentials so that Jenkins can publish this stuff 
somewhere on Apache infra.

Ideally we don't want to have to put a private key on every Jenkins box (since they 
are otherwise pretty stateless). One idea is to encrypt these credentials with a 
passphrase and post them somewhere publicly visible. Then the Jenkins build can 
download the credentials, provided we set a passphrase in an environment variable 
in Jenkins. There may be simpler solutions as well.





[jira] [Resolved] (SPARK-1465) Spark compilation is broken with the latest hadoop-2.4.0 release

2014-04-16 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-1465.
--

Resolution: Fixed

 Spark compilation is broken with the latest hadoop-2.4.0 release
 

 Key: SPARK-1465
 URL: https://issues.apache.org/jira/browse/SPARK-1465
 Project: Spark
  Issue Type: Bug
Reporter: Xuan Gong
Priority: Blocker
 Fix For: 1.0.0


 Building Spark with the latest 2.4.0 version of YARN appears to be broken:
 {code}
 [ERROR]   Apps.addToEnvironment(env, Environment.CLASSPATH.name, 
 Environment.PWD.$() +
 [ERROR]^
 [ERROR] 
 /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala:441:
  not enough arguments for method addToEnvironment: (x$1: 
 java.util.Map[String,String], x$2: String, x$3: String, x$4: String)Unit.
 Unspecified value parameter x$4.
 [ERROR] Apps.addToEnvironment(env, Environment.CLASSPATH.name, 
 Environment.PWD.$() +
 [ERROR]  ^
 [ERROR] 
 /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala:446:
  not enough arguments for method addToEnvironment: (x$1: 
 java.util.Map[String,String], x$2: String, x$3: String, x$4: String)Unit.
 Unspecified value parameter x$4.
 [ERROR]   Apps.addToEnvironment(env, Environment.CLASSPATH.name, 
 Environment.PWD.$() +
 [ERROR]^
 [ERROR] 
 /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala:449:
  not enough arguments for method addToEnvironment: (x$1: 
 java.util.Map[String,String], x$2: String, x$3: String, x$4: String)Unit.
 Unspecified value parameter x$4.
 [ERROR] Apps.addToEnvironment(env, Environment.CLASSPATH.name, 
 Environment.PWD.$() +
 [ERROR]  ^
 [ERROR] 
 /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala:170:
  not enough arguments for method setEnvFromInputString: (x$1: 
 java.util.Map[String,String], x$2: String, x$3: String)Unit.
 Unspecified value parameter x$3.
 [ERROR] Apps.setEnvFromInputString(env, 
 System.getenv(SPARK_YARN_USER_ENV))
 {code}
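
A hedged sketch of the call-site change these errors point to: the Hadoop 2.4 
addToEnvironment signature gained a fourth String parameter (per the error text 
above), so the calls need an explicit trailing argument. File.pathSeparator here is 
an illustrative choice, not necessarily the committed fix:

{code}
import java.io.File
import java.util.{HashMap => JHashMap}
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment
import org.apache.hadoop.yarn.util.Apps

val env = new JHashMap[String, String]()

// Old 3-argument form, which no longer compiles against hadoop-2.4.0:
// Apps.addToEnvironment(env, Environment.CLASSPATH.name, Environment.PWD.$())

// 4-argument form, passing the separator explicitly:
Apps.addToEnvironment(env, Environment.CLASSPATH.name, Environment.PWD.$(),
  File.pathSeparator)
{code}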





[jira] [Created] (SPARK-1518) Spark master doesn't compile against hadoop-common trunk

2014-04-16 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-1518:
-

 Summary: Spark master doesn't compile against hadoop-common trunk
 Key: SPARK-1518
 URL: https://issues.apache.org/jira/browse/SPARK-1518
 Project: Spark
  Issue Type: Bug
Reporter: Marcelo Vanzin


FSDataOutputStream::sync() has disappeared from trunk in Hadoop; 
FileLogger.scala is calling it.

I've changed it locally to hsync() so I can compile the code, but haven't 
checked yet whether those are equivalent. hsync() seems to have been there 
forever, so it hopefully works with all versions Spark cares about.
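
For reference, the local change described above amounts to something like the 
following sketch against the Hadoop FileSystem API (hsync() is provided via the 
Syncable interface on FSDataOutputStream in Hadoop 2.x; the path and payload are 
made up):

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
val out = fs.create(new Path("/tmp/spark-event-log"))
out.write("some event json\n".getBytes("UTF-8"))
// FileLogger previously called out.sync(), which is gone from hadoop-common trunk;
// hsync() flushes the stream to disk instead.
out.hsync()
out.close()
{code}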





[jira] [Updated] (SPARK-1519) support minPartitions parameter of wholeTextFiles() in pyspark

2014-04-16 Thread Nan Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nan Zhu updated SPARK-1519:
---

Description: 
Though the Scala implementation provides the minPartitions parameter in 
wholeTextFiles, PySpark doesn't support it yet.

Should be easy to add in context.py.

  was:
though Scala implementation provides the parameter of minPartitions in 
wholeTextFiles, PySpark hasn't provide support to it, 

should be easy to add in context.py


 support minPartitions parameter of wholeTextFiles() in pyspark
 --

 Key: SPARK-1519
 URL: https://issues.apache.org/jira/browse/SPARK-1519
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Nan Zhu

 Though the Scala implementation provides the minPartitions parameter in 
 wholeTextFiles, PySpark doesn't support it yet.
 Should be easy to add in context.py.





[jira] [Commented] (SPARK-1483) Rename minSplits to minPartitions in public APIs

2014-04-16 Thread Nan Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971961#comment-13971961
 ] 

Nan Zhu commented on SPARK-1483:


made the PR: https://github.com/apache/spark/pull/430

 Rename minSplits to minPartitions in public APIs
 

 Key: SPARK-1483
 URL: https://issues.apache.org/jira/browse/SPARK-1483
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Matei Zaharia
Assignee: Nan Zhu
Priority: Critical
 Fix For: 1.0.0


 The parameter name is part of the public API in Scala and Python, since you can 
 pass named parameters to a method, so we should rename it to this more descriptive 
 term. Everywhere else we refer to splits as partitions.
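
A small example of why the name matters in Scala; the local-mode context and the 
file path are just for illustration:

{code}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("example"))

// Callers can pass the argument by name, so the parameter name is part of the
// public API and renaming it is a source-level change for such call sites:
val rdd = sc.textFile("README.md", minPartitions = 8)
// sc.textFile("README.md", minSplits = 8)   // the old name would no longer compile
{code}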





[jira] [Resolved] (SPARK-1462) Examples of ML algorithms are using deprecated APIs

2014-04-16 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia resolved SPARK-1462.
--

   Resolution: Fixed
Fix Version/s: 1.0.0

 Examples of ML algorithms are using deprecated APIs
 ---

 Key: SPARK-1462
 URL: https://issues.apache.org/jira/browse/SPARK-1462
 Project: Spark
  Issue Type: Improvement
  Components: Examples, MLlib
Affects Versions: 1.0.0
Reporter: Nan Zhu
Assignee: Sandeep Singh
 Fix For: 1.0.0


 Mainly Vector; better to use Vectors.dense.





[jira] [Commented] (SPARK-1475) Draining event logging queue before stopping event logger

2014-04-16 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972225#comment-13972225
 ] 

Kan Zhang commented on SPARK-1475:
--

A second PR that fixes the unit test introduced above.

https://github.com/apache/spark/pull/401

 Draining event logging queue before stopping event logger
 -

 Key: SPARK-1475
 URL: https://issues.apache.org/jira/browse/SPARK-1475
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Kan Zhang
Assignee: Kan Zhang
Priority: Blocker
 Fix For: 1.0.0


 When stopping SparkListenerBus, its event queue needs to be drained, and this 
 needs to happen before the event logger is stopped. Otherwise, any event still 
 waiting to be processed in the queue may be lost, and consequently the event log 
 file may be incomplete.
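
A minimal sketch of the ordering the fix needs to enforce (illustrative names, not 
the actual LiveListenerBus internals): drain whatever is still queued before 
downstream listeners such as the event logger are stopped.

{code}
import java.util.concurrent.LinkedBlockingQueue

// Toy listener bus: post() enqueues, stop() drains the queue before returning,
// so an event logger stopped afterwards has already seen every event.
class ToyListenerBus[E](handle: E => Unit) {
  private val queue = new LinkedBlockingQueue[E]()
  def post(event: E): Unit = queue.put(event)
  def stop(): Unit = while (!queue.isEmpty) handle(queue.poll())
}

val bus = new ToyListenerBus[String](e => println(s"logging event: $e"))
bus.post("SparkListenerJobEnd")
bus.stop()   // queued events flushed here; only then stop the event logger
{code}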


