[jira] [Resolved] (SPARK-1424) InsertInto should work on JavaSchemaRDD as well.
[ https://issues.apache.org/jira/browse/SPARK-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-1424. - Resolution: Fixed InsertInto should work on JavaSchemaRDD as well. Key: SPARK-1424 URL: https://issues.apache.org/jira/browse/SPARK-1424 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.0.0 Reporter: Michael Armbrust Assignee: Michael Armbrust Priority: Blocker Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1510) Add Spark Streaming metrics source for metrics system
Saisai Shao created SPARK-1510: -- Summary: Add Spark Streaming metrics source for metrics system Key: SPARK-1510 URL: https://issues.apache.org/jira/browse/SPARK-1510 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Saisai Shao Since a Spark Streaming application is long-running, it is especially important to monitor its current status. We now have the Streaming UI, which shows the status of the app directly; it is also necessary to export some of these metrics to the metrics system, so that external tools can connect and monitor the status of the app. -- This message was sent by Atlassian JIRA (v6.2#6252)
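For readers unfamiliar with Spark's metrics system: a source pairs a name with a Codahale MetricRegistry that the configured sinks poll. Below is a minimal sketch of what a streaming source could look like; the class name, metric name, and callback are hypothetical illustrations, not the eventual SPARK-1510 implementation.
{code:scala}
import com.codahale.metrics.{Gauge, MetricRegistry}

// Hypothetical source: Spark's metrics system consumes objects exposing a
// sourceName and a metricRegistry (see org.apache.spark.metrics.source.Source).
class StreamingMetricsSource(completedBatches: () => Long) {
  val sourceName: String = "streaming"
  val metricRegistry: MetricRegistry = new MetricRegistry

  // Gauge sampled by the configured sinks (console, JMX, Ganglia, ...)
  metricRegistry.register(MetricRegistry.name("completedBatches"),
    new Gauge[Long] {
      override def getValue: Long = completedBatches()
    })
}
{code}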
[jira] [Commented] (SPARK-1175) on shutting down a long running job, the cluster does not accept new jobs and gets hung
[ https://issues.apache.org/jira/browse/SPARK-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970641#comment-13970641 ] Tal Sliwowicz commented on SPARK-1175: -- This prevents us from having a real automated failover solution when the driver fails. We cannot automatically start a new driver because Spark is stuck on cleanup. on shutting down a long running job, the cluster does not accept new jobs and gets hung --- Key: SPARK-1175 URL: https://issues.apache.org/jira/browse/SPARK-1175 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.8.1, 0.9.0 Reporter: Tal Sliwowicz Assignee: Nan Zhu Labels: shutdown, worker When shutting down a long processing job (24+ hours) that runs periodically on the same context and generates a lot of shuffles (many hundreds of GB), the Spark workers hang for a long while and the cluster does not accept new jobs. The only way to proceed is to kill -9 the workers. This is a big problem because when multiple contexts run on the same cluster, one must stop them all for a simple restart. The context is stopped using sc.stop(). This happens both in standalone mode and under Mesos. We suspect this is caused by the "delete Spark local dirs" thread. Attached is a thread dump of the worker; the relevant part may be:
SIGTERM handler - Thread t@41040
  java.lang.Thread.State: BLOCKED
    at java.lang.Shutdown.exit(Shutdown.java:168)
    - waiting to lock 69eab6a3 (a java.lang.Class) owned by SIGTERM handler t@41038
    at java.lang.Terminator$1.handle(Terminator.java:35)
    at sun.misc.Signal$1.run(Signal.java:195)
    at java.lang.Thread.run(Thread.java:662)
  Locked ownable synchronizers:
    - None
delete Spark local dirs - Thread t@40
  java.lang.Thread.State: RUNNABLE
    at java.io.UnixFileSystem.delete0(Native Method)
    at java.io.UnixFileSystem.delete(UnixFileSystem.java:251)
    at java.io.File.delete(File.java:904)
    at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:482)
    at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:479)
    at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:478)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
    at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:478)
    at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:479)
    at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:478)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
    at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:478)
    at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$2.apply(DiskBlockManager.scala:141)
    at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$2.apply(DiskBlockManager.scala:139)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)
  Locked ownable synchronizers:
    - None
SIGTERM handler - Thread t@41038
  java.lang.Thread.State: WAITING
    at java.lang.Object.wait(Native Method)
    - waiting on 355c6c8d (a org.apache.spark.storage.DiskBlockManager$$anon$1)
    at java.lang.Thread.join(Thread.java:1186)
    at java.lang.Thread.join(Thread.java:1239)
    at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79)
    at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:24)
    at java.lang.Shutdown.runHooks(Shutdown.java:79)
    at java.lang.Shutdown.sequence(Shutdown.java:123)
    at java.lang.Shutdown.exit(Shutdown.java:168)
    - locked 69eab6a3 (a java.lang.Class)
    at java.lang.Terminator$1.handle(Terminator.java:35)
    at sun.misc.Signal$1.run(Signal.java:195)
    at java.lang.Thread.run(Thread.java:662)
  Locked ownable synchronizers:
    - None
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1511) Update TestUtils.createCompiledClass() API to work with creating class file on different filesystem
[ https://issues.apache.org/jira/browse/SPARK-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Xianjin updated SPARK-1511: -- Affects Version/s: 0.8.1 0.9.0 Update TestUtils.createCompiledClass() API to work with creating class file on different filesystem --- Key: SPARK-1511 URL: https://issues.apache.org/jira/browse/SPARK-1511 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.8.1, 0.9.0, 1.0.0 Environment: Mac OS X, two disks. Reporter: Ye Xianjin Priority: Minor Labels: starter Original Estimate: 24h Remaining Estimate: 24h The createCompiledClass method uses Java's File.renameTo method to rename the source file to the destination file, which will fail if the source and destination files are on different disks (or partitions). See http://apache-spark-developers-list.1001551.n3.nabble.com/Tests-failed-after-assembling-the-latest-code-from-github-td6315.html for more details. Using com.google.common.io.Files.move instead of renameTo will solve this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
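As a hedged illustration of the suggested fix (paths below are made up): Guava's Files.move falls back to copy-and-delete when a direct rename across filesystems is impossible, and it throws an IOException on failure instead of silently returning false.
{code:scala}
import java.io.File
import com.google.common.io.Files

val source = new File(System.getProperty("java.io.tmpdir"), "TestClass.class")
val destination = new File("/Volumes/Data/tmp/TestClass.class") // on a second disk

// File.renameTo would return false here without raising an error;
// Files.move copies then deletes when a cross-filesystem rename fails.
Files.move(source, destination)
{code}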
[jira] [Commented] (SPARK-1399) Reason for Stage Failure should be shown in UI
[ https://issues.apache.org/jira/browse/SPARK-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971415#comment-13971415 ] Lianhui Wang commented on SPARK-1399: - I think the user-defined accumulators of every task also need to be shown in the UI. Reason for Stage Failure should be shown in UI -- Key: SPARK-1399 URL: https://issues.apache.org/jira/browse/SPARK-1399 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: Kay Ousterhout Assignee: Nan Zhu Right now, we don't show why a stage failed in the UI. We have this information, and it would be useful for users to see (e.g., to see that a stage was killed because the job was cancelled). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1175) on shutting down a long running job, the cluster does not accept new jobs and gets hung
[ https://issues.apache.org/jira/browse/SPARK-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971520#comment-13971520 ] Tal Sliwowicz commented on SPARK-1175: -- Yes on shutting down a long running job, the cluster does not accept new jobs and gets hung --- Key: SPARK-1175 URL: https://issues.apache.org/jira/browse/SPARK-1175 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.8.1, 0.9.0 Reporter: Tal Sliwowicz Assignee: Nan Zhu Labels: shutdown, worker When shutting down a long processing job (24+ hours) that runs periodically on the same context and generates a lot of shuffles (many hundreds of GB), the Spark workers hang for a long while and the cluster does not accept new jobs. The only way to proceed is to kill -9 the workers. This is a big problem because when multiple contexts run on the same cluster, one must stop them all for a simple restart. The context is stopped using sc.stop(). This happens both in standalone mode and under Mesos. We suspect this is caused by the "delete Spark local dirs" thread. Attached is a thread dump of the worker; the relevant part may be:
SIGTERM handler - Thread t@41040
  java.lang.Thread.State: BLOCKED
    at java.lang.Shutdown.exit(Shutdown.java:168)
    - waiting to lock 69eab6a3 (a java.lang.Class) owned by SIGTERM handler t@41038
    at java.lang.Terminator$1.handle(Terminator.java:35)
    at sun.misc.Signal$1.run(Signal.java:195)
    at java.lang.Thread.run(Thread.java:662)
  Locked ownable synchronizers:
    - None
delete Spark local dirs - Thread t@40
  java.lang.Thread.State: RUNNABLE
    at java.io.UnixFileSystem.delete0(Native Method)
    at java.io.UnixFileSystem.delete(UnixFileSystem.java:251)
    at java.io.File.delete(File.java:904)
    at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:482)
    at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:479)
    at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:478)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
    at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:478)
    at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:479)
    at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:478)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
    at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:478)
    at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$2.apply(DiskBlockManager.scala:141)
    at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$2.apply(DiskBlockManager.scala:139)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)
  Locked ownable synchronizers:
    - None
SIGTERM handler - Thread t@41038
  java.lang.Thread.State: WAITING
    at java.lang.Object.wait(Native Method)
    - waiting on 355c6c8d (a org.apache.spark.storage.DiskBlockManager$$anon$1)
    at java.lang.Thread.join(Thread.java:1186)
    at java.lang.Thread.join(Thread.java:1239)
    at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79)
    at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:24)
    at java.lang.Shutdown.runHooks(Shutdown.java:79)
    at java.lang.Shutdown.sequence(Shutdown.java:123)
    at java.lang.Shutdown.exit(Shutdown.java:168)
    - locked 69eab6a3 (a java.lang.Class)
    at java.lang.Terminator$1.handle(Terminator.java:35)
    at sun.misc.Signal$1.run(Signal.java:195)
    at java.lang.Thread.run(Thread.java:662)
  Locked ownable synchronizers:
    - None
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1512) improve spark sql to support table with more than 22 fields
wangfei created SPARK-1512: -- Summary: improve spark sql to support table with more than 22 fields Key: SPARK-1512 URL: https://issues.apache.org/jira/browse/SPARK-1512 Project: Spark Issue Type: Improvement Components: SQL Reporter: wangfei Fix For: 1.0.0 Spark SQL uses a case class to define a table, so the 22-field limit on case classes means Spark SQL cannot support wide tables (more than 22 fields). Wide tables are common in many use cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
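To illustrate the restriction (a sketch, not a proposed design): Scala 2.10 rejects case classes with more than 22 parameters ("Implementation restriction: case classes cannot have more than 22 parameters"), so a flat wide schema cannot be declared directly. Nesting is one workaround, at the cost of a less natural schema.
{code:scala}
// case class WideRow(f1: Int, f2: Int, /* ... */ f23: Int)  // does not compile in Scala 2.10

// Nesting sidesteps the limit (small classes shown for brevity):
case class Part1(f1: Int, f2: Int)    // up to 22 fields each in practice
case class Part2(f23: Int, f24: Int)
case class WideRow(p1: Part1, p2: Part2)
{code}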
[jira] [Resolved] (SPARK-1277) Automatically set the UI persistence directory based on cluster settings
[ https://issues.apache.org/jira/browse/SPARK-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1277. Resolution: Won't Fix Subsumed by other configuration patches. Automatically set the UI persistence directory based on cluster settings Key: SPARK-1277 URL: https://issues.apache.org/jira/browse/SPARK-1277 Project: Spark Issue Type: New Feature Components: Web UI Reporter: Patrick Wendell Assignee: Andrew Or Priority: Blocker Fix For: 1.0.0 More details forthcoming -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1497) Spark YARN code isn't checked with Scalastyle and has many style violations
[ https://issues.apache.org/jira/browse/SPARK-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1497. Resolution: Fixed Assignee: Sean Owen Spark YARN code isn't checked with Scalastyle and has many style violations --- Key: SPARK-1497 URL: https://issues.apache.org/jira/browse/SPARK-1497 Project: Spark Issue Type: Improvement Components: Project Infra, YARN Reporter: Patrick Wendell Assignee: Sean Owen Fix For: 1.1.0 We should just set SPARK_YARN=true when running Scalastyle. Also, we should clean up the existing style issues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1513) Specialized ColumnType for Timestamp
Cheng Lian created SPARK-1513: - Summary: Specialized ColumnType for Timestamp Key: SPARK-1513 URL: https://issues.apache.org/jira/browse/SPARK-1513 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: Cheng Lian -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1514) Standardize process for creating Spark packages
[ https://issues.apache.org/jira/browse/SPARK-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1514: --- Summary: Standardize process for creating Spark packages (was: Standardize way to create Spark releases) Standardize process for creating Spark packages --- Key: SPARK-1514 URL: https://issues.apache.org/jira/browse/SPARK-1514 Project: Spark Issue Type: Bug Components: Build Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker Fix For: 1.1.0 Over time we've got (a) make-distribution.sh (b) maven distribution targets (c) create-release.sh script in /dev. This is pretty confusing for downstream packagers. We should have a single way to package releases, probably using a modified maven distribution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1514) Standardize way to create Spark releases
Patrick Wendell created SPARK-1514: -- Summary: Standardize way to create Spark releases Key: SPARK-1514 URL: https://issues.apache.org/jira/browse/SPARK-1514 Project: Spark Issue Type: Bug Components: Build Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker Fix For: 1.1.0 Over time we've got (a) make-distribution.sh (b) maven distribution targets (c) create-release.sh script in /dev. This is pretty confusing for downstream packagers. We should have a single way to package releases, probably using a modified maven distribution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1469) Scheduler mode should accept lower-case definitions and have nicer error messages
[ https://issues.apache.org/jira/browse/SPARK-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1469. Resolution: Fixed Fix Version/s: (was: 1.1.0) 1.0.0 Scheduler mode should accept lower-case definitions and have nicer error messages - Key: SPARK-1469 URL: https://issues.apache.org/jira/browse/SPARK-1469 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Patrick Wendell Assignee: Sandeep Singh Labels: starter Fix For: 1.0.0 I tried setting spark.scheduler.mode=fair and I got the following nasty exception:
{code}
java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:313)
    at scala.None$.get(Option.scala:311)
    at scala.Enumeration.withName(Enumeration.scala:132)
    at org.apache.spark.scheduler.TaskSchedulerImpl.<init>(TaskSchedulerImpl.scala:101)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1338)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:230)
    at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:956)
    at $iwC$$iwC.<init>(<console>:8)
    at $iwC.<init>(<console>:14)
    at <init>(<console>:16)
    at .<init>(<console>:20)
    at .<clinit>(<console>)
{code}
We should make two improvements: 1. Make the built-in modes case-insensitive (fair/FAIR, fifo/FIFO). 2. Print a better error message when an invalid mode is given. -- This message was sent by Atlassian JIRA (v6.2#6252)
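A minimal sketch of the two improvements, written against a stand-in Enumeration rather than Spark's internal SchedulingMode (names are illustrative, not the merged patch):
{code:scala}
object SchedulingMode extends Enumeration {
  val FAIR, FIFO = Value
}

def parseSchedulingMode(setting: String): SchedulingMode.Value =
  try {
    SchedulingMode.withName(setting.toUpperCase)   // accepts "fair" as well as "FAIR"
  } catch {
    case _: NoSuchElementException =>              // what withName throws on a miss
      throw new IllegalArgumentException(
        s"Unrecognized spark.scheduler.mode: '$setting'. Expected one of: " +
        SchedulingMode.values.mkString(", "))
  }
{code}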
[jira] [Commented] (SPARK-1439) Aggregate Scaladocs across projects
[ https://issues.apache.org/jira/browse/SPARK-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971662#comment-13971662 ] Sean Owen commented on SPARK-1439: -- I had a run at this today. First I tried Maven-based approaches, but they didn't quite do the trick. I made some progress with unidoc, although not all the way there. Maybe an SBT expert can help me figure out how to finish it.
*Maven* http://stackoverflow.com/questions/12301620/how-to-generate-an-aggregated-scaladoc-for-a-maven-site This works but generates *javadoc* for everything, including Scala source. The resulting javadoc is not so helpful. It also complains a lot about not finding references, since javadoc doesn't understand links in quite the same way.
*Maven #2* You can also invoke the scala-maven-plugin 'doc' goal as part of the site generation:
{code:xml}
<reporting>
  <plugins>
    ...
    <plugin>
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <reportSets>
        <reportSet>
          <reports>
            <report>doc</report>
          </reports>
        </reportSet>
      </reportSets>
    </plugin>
  </plugins>
</reporting>
{code}
It lacks a goal like aggregate, which the javadoc plugin has and which takes care of combining everything into one set of docs; this only generates scaladoc in each module in exploded format.
*Unidoc / SBT* It is almost as easy as:
- adding the plugin to plugins.sbt: {{addSbtPlugin("com.eed3si9n" % "sbt-unidoc" % "0.3.0")}}
- {{import sbtunidoc.Plugin._}} and {{UnidocKeys._}} in SparkBuild.scala
- adding ++ unidocSettings to rootSettings in SparkBuild.scala
but it was also necessary to:
- set {{SPARK_YARN=true}} and {{SPARK_HADOOP_VERSION=2.2.0}}, for example, to make YARN scaladoc work
- exclude {{yarn-alpha}}, since scaladoc doesn't like the collision of class names:
{code}
def rootSettings = sharedSettings ++ unidocSettings ++ Seq(
  unidocProjectFilter in (ScalaUnidoc, unidoc) := inAnyProject -- inProjects(yarnAlpha),
  publish := {}
)
{code}
I still get SBT errors, since I think this is not quite correctly finessing the build, but it seems almost there. Aggregate Scaladocs across projects --- Key: SPARK-1439 URL: https://issues.apache.org/jira/browse/SPARK-1439 Project: Spark Issue Type: Sub-task Components: Documentation Reporter: Matei Zaharia Fix For: 1.0.0 Apparently there's a Unidoc plugin to put together ScalaDocs across modules: https://github.com/akka/akka/blob/master/project/Unidoc.scala -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1496) SparkContext.jarOfClass should return Option instead of a sequence
[ https://issues.apache.org/jira/browse/SPARK-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971671#comment-13971671 ] haosdent commented on SPARK-1496: - Should it return Option[Seq[String]]? Maybe I could help you fix this issue. :-) SparkContext.jarOfClass should return Option instead of a sequence -- Key: SPARK-1496 URL: https://issues.apache.org/jira/browse/SPARK-1496 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Patrick Wendell Assignee: Patrick Wendell Fix For: 1.0.0 This is pretty confusing, especially since addJar expects to take a single jar. -- This message was sent by Atlassian JIRA (v6.2#6252)
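For reference, a hedged sketch of what an Option-returning jarOfClass could look like. The exact return type was still being debated in this comment; Option[String] is this sketch's assumption, not the decided API.
{code:scala}
def jarOfClass(cls: Class[_]): Option[String] = {
  val uri = cls.getResource("/" + cls.getName.replace('.', '/') + ".class")
  if (uri != null && uri.toString.startsWith("jar:file:")) {
    // "jar:file:/path/to/foo.jar!/com/Foo.class" -> "/path/to/foo.jar"
    Some(uri.toString.substring("jar:file:".length).takeWhile(_ != '!'))
  } else {
    None
  }
}
{code}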
[jira] [Created] (SPARK-1516) Yarn Client should not call System.exit, should throw exception instead.
DB Tsai created SPARK-1516: -- Summary: Yarn Client should not call System.exit, should throw exception instead. Key: SPARK-1516 URL: https://issues.apache.org/jira/browse/SPARK-1516 Project: Spark Issue Type: Improvement Components: Deploy Reporter: DB Tsai Assignee: DB Tsai People submit Spark jobs inside their applications to a YARN cluster using the Spark YARN client, and it's not desirable to call System.exit in the YARN client, since that terminates the parent application as well. We should throw an exception instead, so people can determine which action they want to take given the exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1516) Yarn Client should not call System.exit, should throw exception instead.
[ https://issues.apache.org/jira/browse/SPARK-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated SPARK-1516: --- Assignee: (was: DB Tsai) Yarn Client should not call System.exit, should throw exception instead. Key: SPARK-1516 URL: https://issues.apache.org/jira/browse/SPARK-1516 Project: Spark Issue Type: Improvement Components: Deploy Reporter: DB Tsai People submit Spark jobs inside their applications to a YARN cluster using the Spark YARN client, and it's not desirable to call System.exit in the YARN client, since that terminates the parent application as well. We should throw an exception instead, so people can determine which action they want to take given the exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
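A minimal sketch of the proposed behavior, with hypothetical method and call names (the actual patch may differ):
{code:scala}
import org.apache.spark.SparkException

// Instead of System.exit(1), which kills the embedding application's JVM,
// surface the failure so the caller decides what happens next.
def failSubmission(message: String): Nothing =
  throw new SparkException(message)

// Embedding applications can then handle it, e.g. (client.run() is hypothetical):
//   try { client.run() } catch { case e: SparkException => /* retry or report */ }
{code}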
[jira] [Created] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds
Patrick Wendell created SPARK-1517: -- Summary: Publish nightly snapshots of documentation, maven artifacts, and binary builds Key: SPARK-1517 URL: https://issues.apache.org/jira/browse/SPARK-1517 Project: Spark Issue Type: Improvement Components: Build, Project Infra Reporter: Patrick Wendell Fix For: 1.1.0 Should be pretty easy to do with Jenkins. The only thing I can think of that would be tricky is to set up credentials so that jenkins can publish this stuff somewhere on apache infra. Ideally we don't want to have to put a private key on every jenkins box (since they are otherwise pretty stateless). One idea is to encrypt these credentials with a passphrase and post them somewhere publicly visible. Then the jenkins build can download the credentials provided we set a passphrase in an environment variable in jenkins. There may be simpler solutions as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1465) Spark compilation is broken with the latest hadoop-2.4.0 release
[ https://issues.apache.org/jira/browse/SPARK-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-1465. -- Resolution: Fixed Spark compilation is broken with the latest hadoop-2.4.0 release Key: SPARK-1465 URL: https://issues.apache.org/jira/browse/SPARK-1465 Project: Spark Issue Type: Bug Reporter: Xuan Gong Priority: Blocker Fix For: 1.0.0 Building Spark with the latest 2.4.0 version of YARN appears to be broken:
{code}
[ERROR] Apps.addToEnvironment(env, Environment.CLASSPATH.name, Environment.PWD.$() +
[ERROR] ^
[ERROR] /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala:441: not enough arguments for method addToEnvironment: (x$1: java.util.Map[String,String], x$2: String, x$3: String, x$4: String)Unit. Unspecified value parameter x$4.
[ERROR] Apps.addToEnvironment(env, Environment.CLASSPATH.name, Environment.PWD.$() +
[ERROR] ^
[ERROR] /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala:446: not enough arguments for method addToEnvironment: (x$1: java.util.Map[String,String], x$2: String, x$3: String, x$4: String)Unit. Unspecified value parameter x$4.
[ERROR] Apps.addToEnvironment(env, Environment.CLASSPATH.name, Environment.PWD.$() +
[ERROR] ^
[ERROR] /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala:449: not enough arguments for method addToEnvironment: (x$1: java.util.Map[String,String], x$2: String, x$3: String, x$4: String)Unit. Unspecified value parameter x$4.
[ERROR] Apps.addToEnvironment(env, Environment.CLASSPATH.name, Environment.PWD.$() +
[ERROR] ^
[ERROR] /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala:170: not enough arguments for method setEnvFromInputString: (x$1: java.util.Map[String,String], x$2: String, x$3: String)Unit. Unspecified value parameter x$3.
[ERROR] Apps.setEnvFromInputString(env, System.getenv(SPARK_YARN_USER_ENV))
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
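For context, the errors show that Hadoop 2.4 gave Apps.addToEnvironment a fourth argument (a separator). Below is a sketch of a call updated for that signature; passing File.pathSeparator is this sketch's assumption, and a real fix would also need a shim to keep older Hadoop versions compiling.
{code:scala}
import java.io.File
import java.util.{HashMap => JHashMap}
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment
import org.apache.hadoop.yarn.util.Apps

val env = new JHashMap[String, String]()
// The fourth argument is the separator used when appending to an existing value.
Apps.addToEnvironment(env, Environment.CLASSPATH.name, Environment.PWD.$(),
  File.pathSeparator)
{code}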
[jira] [Created] (SPARK-1518) Spark master doesn't compile against hadoop-common trunk
Marcelo Vanzin created SPARK-1518: - Summary: Spark master doesn't compile against hadoop-common trunk Key: SPARK-1518 URL: https://issues.apache.org/jira/browse/SPARK-1518 Project: Spark Issue Type: Bug Reporter: Marcelo Vanzin FSDataOutputStream::sync() has disappeared from trunk in Hadoop; FileLogger.scala is calling it. I've changed it locally to hsync() so I can compile the code, but haven't checked yet whether those are equivalent. hsync() seems to have been there forever, so it hopefully works with all versions Spark cares about. -- This message was sent by Atlassian JIRA (v6.2#6252)
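The local change described above, sketched (the path and payload are illustrative; whether hsync() matches the removed sync() exactly is the open question in this report):
{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
val out = fs.create(new Path("/tmp/spark-events/app.log"))
out.writeBytes("event\n")
out.hsync()   // was: out.sync(), removed in hadoop-common trunk
out.close()
{code}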
[jira] [Updated] (SPARK-1519) support minPartitions parameter of wholeTextFiles() in pyspark
[ https://issues.apache.org/jira/browse/SPARK-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-1519: --- Description: though the Scala implementation provides the minPartitions parameter in wholeTextFiles, PySpark doesn't support it yet; it should be easy to add in context.py was: though Scala implementation provides the parameter of minPartitions in wholeTextFiles, PySpark hasn't provide support to it, should be easy to add in context.py support minPartitions parameter of wholeTextFiles() in pyspark -- Key: SPARK-1519 URL: https://issues.apache.org/jira/browse/SPARK-1519 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Nan Zhu though the Scala implementation provides the minPartitions parameter in wholeTextFiles, PySpark doesn't support it yet; it should be easy to add in context.py -- This message was sent by Atlassian JIRA (v6.2#6252)
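For reference, the Scala-side call that PySpark's context.py would mirror (shown in spark-shell style, where sc is the predefined SparkContext; the path is made up):
{code:scala}
val files = sc.wholeTextFiles("hdfs:///data/small-files", minPartitions = 16)
{code}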
[jira] [Commented] (SPARK-1483) Rename minSplits to minPartitions in public APIs
[ https://issues.apache.org/jira/browse/SPARK-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971961#comment-13971961 ] Nan Zhu commented on SPARK-1483: made the PR: https://github.com/apache/spark/pull/430 Rename minSplits to minPartitions in public APIs Key: SPARK-1483 URL: https://issues.apache.org/jira/browse/SPARK-1483 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia Assignee: Nan Zhu Priority: Critical Fix For: 1.0.0 The parameter name is part of the public API in Scala and Python, since you can pass named parameters to a method, so we should rename it to this more descriptive term. Everywhere else we refer to splits as partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
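A one-line illustration of why the parameter name is itself public API (spark-shell style, with sc the predefined SparkContext and a made-up path):
{code:scala}
// sc.textFile("hdfs:///logs", minSplits = 4)              // compiles only before the rename
val logs = sc.textFile("hdfs:///logs", minPartitions = 4)  // call site after the rename
{code}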
[jira] [Resolved] (SPARK-1462) Examples of ML algorithms are using deprecated APIs
[ https://issues.apache.org/jira/browse/SPARK-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1462. -- Resolution: Fixed Fix Version/s: 1.0.0 Examples of ML algorithms are using deprecated APIs --- Key: SPARK-1462 URL: https://issues.apache.org/jira/browse/SPARK-1462 Project: Spark Issue Type: Improvement Components: Examples, MLlib Affects Versions: 1.0.0 Reporter: Nan Zhu Assignee: Sandeep Singh Fix For: 1.0.0 mainly Vector; better to use Vectors.dense -- This message was sent by Atlassian JIRA (v6.2#6252)
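For reference, the non-deprecated construction path the fix moves the examples to:
{code:scala}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

val features: Vector = Vectors.dense(0.0, 1.2, 3.4)  // factory method instead of a deprecated Vector
{code}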
[jira] [Commented] (SPARK-1475) Draining event logging queue before stopping event logger
[ https://issues.apache.org/jira/browse/SPARK-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972225#comment-13972225 ] Kan Zhang commented on SPARK-1475: -- A second PR that fixes the unit test introduced above: https://github.com/apache/spark/pull/401 Draining event logging queue before stopping event logger - Key: SPARK-1475 URL: https://issues.apache.org/jira/browse/SPARK-1475 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Kan Zhang Assignee: Kan Zhang Priority: Blocker Fix For: 1.0.0 When stopping SparkListenerBus, its event queue needs to be drained. And this needs to happen before the event logger is stopped. Otherwise, any event still waiting to be processed in the queue may be lost, and consequently the event log file may be incomplete. -- This message was sent by Atlassian JIRA (v6.2#6252)
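A generic sketch of the ordering this fix enforces; this is not the actual SparkListenerBus code, just the drain-then-stop pattern the description calls for:
{code:scala}
import java.util.concurrent.LinkedBlockingQueue

def shutDown[E](queue: LinkedBlockingQueue[E], process: E => Unit,
                stopEventLogger: () => Unit): Unit = {
  var event = queue.poll()
  while (event != null) {   // drain everything still queued
    process(event)
    event = queue.poll()
  }
  stopEventLogger()         // only now can the event log be closed safely
}
{code}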