[jira] [Commented] (SPARK-1551) Spark master does not build in sbt
[ https://issues.apache.org/jira/browse/SPARK-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975442#comment-13975442 ] holdenk commented on SPARK-1551: Sorry about that, I had something dirty locally that added the requirement for ganglia. Spark master does not build in sbt -- Key: SPARK-1551 URL: https://issues.apache.org/jira/browse/SPARK-1551 Project: Spark Issue Type: Bug Reporter: holdenk metrics-ganglia is missing -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1552) GraphX performs type comparison incorrectly
Ankur Dave created SPARK-1552: - Summary: GraphX performs type comparison incorrectly Key: SPARK-1552 URL: https://issues.apache.org/jira/browse/SPARK-1552 Project: Spark Issue Type: Bug Components: GraphX Reporter: Ankur Dave In GraphImpl, mapVertices and outerJoinVertices use a more efficient implementation when the map function preserves vertex attribute types. This is implemented by comparing the ClassTags of the old and new vertex attribute types. However, ClassTags store _erased_ types, so the comparison will return a false positive for types with different type parameters, such as Option[Int] and Option[Double]. Demo in the Scala shell:
scala> import scala.reflect.{classTag, ClassTag}
scala> def typesEqual[A: ClassTag, B: ClassTag](a: A, b: B): Boolean = classTag[A] equals classTag[B]
scala> typesEqual(Some(1), Some(2.0)) // should return false
res2: Boolean = true
We can require richer TypeTags for these methods, or just take a flag from the caller specifying whether the types are equal. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1552) GraphX performs type comparison incorrectly
[ https://issues.apache.org/jira/browse/SPARK-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave updated SPARK-1552: -- Description: In GraphImpl, mapVertices and outerJoinVertices use a more efficient implementation when the map function preserves vertex attribute types. This is implemented by comparing the ClassTags of the old and new vertex attribute types. However, ClassTags store _erased_ types, so the comparison will return a false positive for types with different type parameters, such as Option[Int] and Option[Double]. Thanks to Pierre-Alexandre Fonta for reporting this bug on the [mailing list|http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-Cast-error-when-comparing-a-vertex-attribute-after-its-type-has-changed-td4119.html]. Demo in the Scala shell:
scala> import scala.reflect.{classTag, ClassTag}
scala> def typesEqual[A: ClassTag, B: ClassTag](a: A, b: B): Boolean = classTag[A] equals classTag[B]
scala> typesEqual(Some(1), Some(2.0)) // should return false
res2: Boolean = true
We can require richer TypeTags for these methods, or just take a flag from the caller specifying whether the types are equal. was: In GraphImpl, mapVertices and outerJoinVertices use a more efficient implementation when the map function preserves vertex attribute types. This is implemented by comparing the ClassTags of the old and new vertex attribute types. However, ClassTags store _erased_ types, so the comparison will return a false positive for types with different type parameters, such as Option[Int] and Option[Double]. Demo in the Scala shell:
scala> import scala.reflect.{classTag, ClassTag}
scala> def typesEqual[A: ClassTag, B: ClassTag](a: A, b: B): Boolean = classTag[A] equals classTag[B]
scala> typesEqual(Some(1), Some(2.0)) // should return false
res2: Boolean = true
We can require richer TypeTags for these methods, or just take a flag from the caller specifying whether the types are equal. GraphX performs type comparison incorrectly --- Key: SPARK-1552 URL: https://issues.apache.org/jira/browse/SPARK-1552 Project: Spark Issue Type: Bug Components: GraphX Reporter: Ankur Dave In GraphImpl, mapVertices and outerJoinVertices use a more efficient implementation when the map function preserves vertex attribute types. This is implemented by comparing the ClassTags of the old and new vertex attribute types. However, ClassTags store _erased_ types, so the comparison will return a false positive for types with different type parameters, such as Option[Int] and Option[Double]. Thanks to Pierre-Alexandre Fonta for reporting this bug on the [mailing list|http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-Cast-error-when-comparing-a-vertex-attribute-after-its-type-has-changed-td4119.html]. Demo in the Scala shell:
scala> import scala.reflect.{classTag, ClassTag}
scala> def typesEqual[A: ClassTag, B: ClassTag](a: A, b: B): Boolean = classTag[A] equals classTag[B]
scala> typesEqual(Some(1), Some(2.0)) // should return false
res2: Boolean = true
We can require richer TypeTags for these methods, or just take a flag from the caller specifying whether the types are equal. -- This message was sent by Atlassian JIRA (v6.2#6252)
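A self-contained illustration of the erasure problem and of one possible compile-time fix. This is only a sketch, not the actual GraphX change; the {{=:=}}-evidence trick with a null default is one way to implement the "flag from the caller" idea mentioned above.
{code}
import scala.reflect.{classTag, ClassTag}

object TypeEqualityDemo {
  // ClassTags compare erased types, so Option[Int] vs Option[Double] looks "equal".
  def erasedEqual[A: ClassTag, B: ClassTag](a: A, b: B): Boolean =
    classTag[A] equals classTag[B]

  // Possible fix: ask the compiler for =:= evidence; it is only supplied when the
  // static types really are the same, so no erased comparison happens at runtime.
  def staticallyEqual[A, B](a: A, b: B)(implicit eq: A =:= B = null): Boolean =
    eq != null

  def main(args: Array[String]): Unit = {
    println(erasedEqual(Some(1), Some(2.0)))     // true  -- the false positive
    println(staticallyEqual(Some(1), Some(2.0))) // false -- types correctly distinguished
    println(staticallyEqual(Some(1), Some(2)))   // true
  }
}
{code}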
[jira] [Commented] (SPARK-1438) Update RDD.sample() API to make seed parameter optional
[ https://issues.apache.org/jira/browse/SPARK-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975486#comment-13975486 ] Arun Ramakrishnan commented on SPARK-1438: -- pull request at https://github.com/apache/spark/pull/462 Update RDD.sample() API to make seed parameter optional --- Key: SPARK-1438 URL: https://issues.apache.org/jira/browse/SPARK-1438 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia Priority: Blocker Labels: Starter Fix For: 1.0.0 When a seed is not given, it should pick one based on Math.random(). This needs to be done in Java and Python as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
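A minimal sketch of the optional-seed idea in plain Scala. This is not the actual RDD.sample() change in the pull request; the method and its random-seed default here are illustrative.
{code}
import scala.util.Random

object OptionalSeedSketch {
  // When the caller omits the seed, default to a randomly chosen one.
  def sample[T](data: Seq[T], fraction: Double,
                seed: Long = new Random().nextLong()): Seq[T] = {
    val rng = new Random(seed)
    data.filter(_ => rng.nextDouble() < fraction)
  }

  def main(args: Array[String]): Unit = {
    println(sample(1 to 100, 0.1))             // seed picked at random each run
    println(sample(1 to 100, 0.1, seed = 42L)) // reproducible
  }
}
{code}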
[jira] [Assigned] (SPARK-1490) Add kerberos support to the HistoryServer
[ https://issues.apache.org/jira/browse/SPARK-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-1490: Assignee: Thomas Graves Add kerberos support to the HistoryServer - Key: SPARK-1490 URL: https://issues.apache.org/jira/browse/SPARK-1490 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 1.0.0 Reporter: Thomas Graves Assignee: Thomas Graves Now that we have a history server that works on YARN and Mesos, we should add the ability for it to authenticate via Kerberos so that it can read HDFS files without having to be restarted every 24 hours. One solution to this is to have the history server read a keytab file. The Hadoop UserGroupInformation class has that functionality built in, and as long as it's using RPC to talk to HDFS it will automatically relogin when it needs to. If the history server isn't using RPC to talk to HDFS then we would have to add some functionality to relogin approximately every 24 hours (configurable time). -- This message was sent by Atlassian JIRA (v6.2#6252)
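A rough sketch of the keytab-login approach described above, using Hadoop's UserGroupInformation. The principal and keytab values would presumably come from new, as-yet-unnamed Spark configuration properties; the object and method names below are illustrative.
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

object HistoryServerKerberosSketch {
  def loginFromKeytab(principal: String, keytabPath: String): Unit = {
    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(hadoopConf)
    // After a keytab login, UGI relogins automatically for RPC-based HDFS access,
    // which is the behavior the ticket relies on.
    UserGroupInformation.loginUserFromKeytab(principal, keytabPath)
  }
}
{code}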
[jira] [Commented] (SPARK-1472) Go through YARN api used in Spark to make sure we aren't using Private Apis
[ https://issues.apache.org/jira/browse/SPARK-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975609#comment-13975609 ] Thomas Graves commented on SPARK-1472: -- So it looks like it's currently impossible to use only public interfaces with Hadoop. There are some LimitedPrivate ones that we will have to use. I filed several JIRAs in Hadoop land to add public interfaces for various things that we either need or that would be handy for all types of applications: https://issues.apache.org/jira/browse/YARN-1953 I should file a couple more JIRAs as well, since things like UserGroupInformation are also marked LimitedPrivate. In this JIRA I will clean up as much as possible, mostly in the yarn stable code since that is where the APIs changed scope. Go through YARN api used in Spark to make sure we aren't using Private Apis --- Key: SPARK-1472 URL: https://issues.apache.org/jira/browse/SPARK-1472 Project: Spark Issue Type: Task Components: YARN Affects Versions: 1.0.0 Reporter: Thomas Graves Assignee: Thomas Graves We need to look through all the YARN APIs we are using to make sure they aren't now Private. If they are private, change the code to not use those APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (SPARK-1439) Aggregate Scaladocs across projects
[ https://issues.apache.org/jira/browse/SPARK-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia reassigned SPARK-1439: Assignee: Matei Zaharia Aggregate Scaladocs across projects --- Key: SPARK-1439 URL: https://issues.apache.org/jira/browse/SPARK-1439 Project: Spark Issue Type: Sub-task Components: Documentation Reporter: Matei Zaharia Assignee: Matei Zaharia Fix For: 1.0.0 Apparently there's a Unidoc plugin to put together ScalaDocs across modules: https://github.com/akka/akka/blob/master/project/Unidoc.scala -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (SPARK-1440) Generate JavaDoc instead of ScalaDoc for Java API
[ https://issues.apache.org/jira/browse/SPARK-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia reassigned SPARK-1440: Assignee: Matei Zaharia Generate JavaDoc instead of ScalaDoc for Java API - Key: SPARK-1440 URL: https://issues.apache.org/jira/browse/SPARK-1440 Project: Spark Issue Type: Sub-task Components: Documentation Reporter: Matei Zaharia Assignee: Matei Zaharia Fix For: 1.0.0 It may be possible to use this plugin: https://github.com/typesafehub/genjavadoc -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1554) Update doc overview page to not mention building if you get a pre-built distro
Matei Zaharia created SPARK-1554: Summary: Update doc overview page to not mention building if you get a pre-built distro Key: SPARK-1554 URL: https://issues.apache.org/jira/browse/SPARK-1554 Project: Spark Issue Type: Sub-task Reporter: Matei Zaharia SBT assembly takes a long time and we should tell people to skip it if they got a binary build (which will likely be the most common case). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1202) Add a cancel button in the UI for stages
[ https://issues.apache.org/jira/browse/SPARK-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1202. Resolution: Fixed Add a cancel button in the UI for stages -- Key: SPARK-1202 URL: https://issues.apache.org/jira/browse/SPARK-1202 Project: Spark Issue Type: New Feature Components: Web UI Reporter: Patrick Wendell Assignee: Sundeep Narravula Priority: Critical Fix For: 1.0.0 Seems like this would be really useful for people. It's not that hard, we just need to lookup the jobs associated with the stage and kill them. Might involve exposing some additional API's in SparkContext. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-918) hadoop-client dependency should be explained for Scala in addition to Java in quickstart
[ https://issues.apache.org/jira/browse/SPARK-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-918. --- Resolution: Won't Fix This was fixed as a result of a separate re-factoring of the docs. hadoop-client dependency should be explained for Scala in addition to Java in quickstart Key: SPARK-918 URL: https://issues.apache.org/jira/browse/SPARK-918 Project: Spark Issue Type: Bug Components: Documentation Reporter: Patrick Wendell Labels: starter Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1529) Support setting spark.local.dirs to a hadoop FileSystem
[ https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975914#comment-13975914 ] Cheng Lian commented on SPARK-1529: --- After some investigation, I came to the conclusion that, unlike adding Tachyon support, to allow setting {{spark.local.dir}} to a Hadoop FS location, instead of adding something like {{HDFSBlockManager}} / {{HDFSStore}}, we have to refactor the related local FS access code to leverage HDFS interfaces. And it seems hard to make this change incremental. Besides writing shuffle map output, at least two places reference {{spark.local.dir}}:
# HTTP broadcasting uses {{spark.local.dir}} as its resource root and accesses the local FS with {{java.io.File}}
# {{FileServerHandler}} accesses {{spark.local.dir}} via {{DiskBlockManager}} and reads local files with {{FileSegment}} and {{java.io.File}}
Adding a new block manager / store for HDFS can't fix these places. I'm currently working on this issue by:
# Refactoring {{FileSegment.file}} from {{java.io.File}} to {{org.apache.hadoop.fs.Path}}
# Refactoring {{DiskBlockManager}}, {{DiskStore}}, {{HttpBroadcast}} and {{FileServerHandler}} to leverage HDFS interfaces
Please leave comments if I missed anything or there are simpler ways to work around this. (PS: We should definitely refactor the block manager related code to reduce duplicate code and encapsulate more details. Maybe the public interface of the block manager should only communicate with other components using block IDs and storage levels.) Support setting spark.local.dirs to a hadoop FileSystem Key: SPARK-1529 URL: https://issues.apache.org/jira/browse/SPARK-1529 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Patrick Wendell Assignee: Cheng Lian Fix For: 1.1.0 In some environments, like with MapR, local volumes are accessed through the Hadoop filesystem interface. We should allow setting spark.local.dir to a Hadoop filesystem location. -- This message was sent by Atlassian JIRA (v6.2#6252)
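To make the direction of the refactoring concrete, here is a minimal sketch of writing a block through the Hadoop FileSystem API instead of {{java.io.File}}. Names like {{writeBlock}} are illustrative, not proposed Spark methods.
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object LocalDirViaHadoopFs {
  // A file:// local dir resolves to LocalFileSystem, while an hdfs:// or maprfs://
  // dir resolves to the corresponding remote filesystem -- the calling code stays the same.
  def writeBlock(localDir: String, blockId: String, bytes: Array[Byte]): Path = {
    val path = new Path(localDir, blockId)
    val fs   = path.getFileSystem(new Configuration())
    val out  = fs.create(path, true) // overwrite if present
    try out.write(bytes) finally out.close()
    path
  }
}
{code}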
[jira] [Issue Comment Deleted] (SPARK-1438) Update RDD.sample() API to make seed parameter optional
[ https://issues.apache.org/jira/browse/SPARK-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Tham updated SPARK-1438: -- Comment: was deleted (was: I can work on this (I'd like to try to submit my first Spark contribution :-) )) Update RDD.sample() API to make seed parameter optional --- Key: SPARK-1438 URL: https://issues.apache.org/jira/browse/SPARK-1438 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia Priority: Blocker Labels: Starter Fix For: 1.0.0 When a seed is not given, it should pick one based on Math.random(). This needs to be done in Java and Python as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1529) Support setting spark.local.dirs to a hadoop FileSystem
[ https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975947#comment-13975947 ] Patrick Wendell commented on SPARK-1529: [~liancheng] Hey Cheng, the tricky thing here is we want to avoid _always_ going through the HDFS filesystem interface when people are actually using local files. We might need to add an intermediate abstraction to deal with this. We already do this elsewhere in the code base; for instance, the JobLogger will load an output stream either directly from a file or from a hadoop file. One thing to note is that the requirement here is really only for the shuffle files, not for the other uses. But I realize we currently conflate these inside of Spark so that might not buy us much. I'll look into this a bit more later. Support setting spark.local.dirs to a hadoop FileSystem Key: SPARK-1529 URL: https://issues.apache.org/jira/browse/SPARK-1529 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Patrick Wendell Assignee: Cheng Lian Fix For: 1.1.0 In some environments, like with MapR, local volumes are accessed through the Hadoop filesystem interface. We should allow setting spark.local.dir to a Hadoop filesystem location. -- This message was sent by Atlassian JIRA (v6.2#6252)
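One possible shape for the intermediate abstraction mentioned in the comment above -- a sketch with invented names, showing how a pure-local backend could avoid the Hadoop layer entirely:
{code}
import java.io.{File, FileOutputStream, OutputStream}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// One interface, two backends: shuffle code writes through the trait and never
// needs to know whether the bytes land on a raw local disk or a Hadoop filesystem.
trait ShuffleFileBackend {
  def create(name: String): OutputStream
}

class LocalBackend(rootDir: File) extends ShuffleFileBackend {
  def create(name: String): OutputStream =
    new FileOutputStream(new File(rootDir, name))
}

class HadoopFsBackend(rootDir: Path, conf: Configuration) extends ShuffleFileBackend {
  def create(name: String): OutputStream =
    rootDir.getFileSystem(conf).create(new Path(rootDir, name), true)
}
{code}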
[jira] [Created] (SPARK-1556) jets3t dependency is outdated
Nan Zhu created SPARK-1556: -- Summary: jets3t dependency is outdated Key: SPARK-1556 URL: https://issues.apache.org/jira/browse/SPARK-1556 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0, 0.8.1, 1.0.0 Reporter: Nan Zhu Assignee: Nan Zhu Fix For: 1.0.0 In Hadoop 2.2.x or newer, Jet3st 0.9.0 which defines S3ServiceException/ServiceException is introduced, however, Spark still relies on Jet3st 0.7.x which has no definition of these classes What I met is that [code] 14/04/21 19:30:53 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id 14/04/21 19:30:53 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 14/04/21 19:30:53 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 14/04/21 19:30:53 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap 14/04/21 19:30:53 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280) at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2316) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:205) at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:205) at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:205) at org.apache.spark.SparkContext.runJob(SparkContext.scala:891) at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:741) at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:692) at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:574) at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:900) at $iwC$$iwC$$iwC$$iwC.init(console:15) at $iwC$$iwC$$iwC.init(console:20) at $iwC$$iwC.init(console:22) at $iwC.init(console:24) at init(console:26) at .init(console:30) at .clinit(console) at .init(console:7) at .clinit(console) at $print(console) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:640) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:604) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:793) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:838) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:750) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:598) at
[jira] [Commented] (SPARK-1556) jets3t dependency is outdated
[ https://issues.apache.org/jira/browse/SPARK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13975957#comment-13975957 ] Sean Owen commented on SPARK-1556: -- Actually, why does Spark have a direct dependency on jets3t at all? it is not used directly in the code. If it's only needed at runtime, it can/should be declared that way. But if the reason it's there is just for Hadoop, then of course hadoop-client is already bringing it in, and should be allowed to bring in the version it wants. jets3t dependency is outdated - Key: SPARK-1556 URL: https://issues.apache.org/jira/browse/SPARK-1556 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.8.1, 0.9.0, 1.0.0 Reporter: Nan Zhu Assignee: Nan Zhu Fix For: 1.0.0 In Hadoop 2.2.x or newer, Jet3st 0.9.0 which defines S3ServiceException/ServiceException is introduced, however, Spark still relies on Jet3st 0.7.x which has no definition of these classes What I met is that [code] 14/04/21 19:30:53 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id 14/04/21 19:30:53 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 14/04/21 19:30:53 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 14/04/21 19:30:53 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap 14/04/21 19:30:53 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280) at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2316) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:205) at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:205) at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:205) at org.apache.spark.SparkContext.runJob(SparkContext.scala:891) at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:741) at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:692) at 
org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:574) at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:900) at $iwC$$iwC$$iwC$$iwC.init(console:15) at $iwC$$iwC$$iwC.init(console:20) at $iwC$$iwC.init(console:22) at $iwC.init(console:24) at init(console:26) at .init(console:30) at .clinit(console) at .init(console:7) at .clinit(console) at $print(console) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609) at
[jira] [Commented] (SPARK-1556) jets3t dependency is outdated
[ https://issues.apache.org/jira/browse/SPARK-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975967#comment-13975967 ] Sean Owen commented on SPARK-1556: -- OK, I partly eat my words. It appears jets3t isn't included by the Hadoop client library; it's only included by the Hadoop server-side components. So yeah, Spark has to include jets3t to make s3:// URLs work in the REPL. FWIW I agree with updating the version -- ideally just in the Hadoop 2.2+ profiles. And it should be <scope>runtime</scope>. jets3t dependency is outdated - Key: SPARK-1556 URL: https://issues.apache.org/jira/browse/SPARK-1556 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.8.1, 0.9.0, 1.0.0 Reporter: Nan Zhu Assignee: Nan Zhu Fix For: 1.0.0 In Hadoop 2.2.x or newer, Jet3st 0.9.0 which defines S3ServiceException/ServiceException is introduced, however, Spark still relies on Jet3st 0.7.x which has no definition of these classes What I met is that [code] 14/04/21 19:30:53 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id 14/04/21 19:30:53 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 14/04/21 19:30:53 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 14/04/21 19:30:53 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap 14/04/21 19:30:53 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280) at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2316) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:205) at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:205) at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:205) at org.apache.spark.SparkContext.runJob(SparkContext.scala:891) at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:741) at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:692) at 
org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:574) at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:900) at $iwC$$iwC$$iwC$$iwC.init(console:15) at $iwC$$iwC$$iwC.init(console:20) at $iwC$$iwC.init(console:22) at $iwC.init(console:24) at init(console:26) at .init(console:30) at .clinit(console) at .init(console:7) at .clinit(console) at $print(console) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609) at
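If the dependency does get bumped and scoped as suggested, the sbt side of it would look roughly like the line below. The coordinates and version shown here are illustrative; the real change would also need the corresponding Maven profile edit.
{code}
// build.sbt sketch: newer jets3t, runtime-only since Spark never references it directly
libraryDependencies += "net.java.dev.jets3t" % "jets3t" % "0.9.0" % "runtime"
{code}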
[jira] [Created] (SPARK-1557) Set permissions on event log files/directories
Thomas Graves created SPARK-1557: Summary: Set permissions on event log files/directories Key: SPARK-1557 URL: https://issues.apache.org/jira/browse/SPARK-1557 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Thomas Graves Assignee: Thomas Graves We should set the permissions on the event log directories and files so that it restricts access to only those users who own them, but could also allow a super user to read them so that they could be displayed by the history server in a multi-tenant secure environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
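A sketch of what setting such permissions could look like through the Hadoop FileSystem API. The exact mode -- owner-only versus owner plus a history-server group -- is a policy choice the ticket leaves open, and the object below is illustrative.
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

object EventLogPermissionsSketch {
  // 770: owner and group only; a super user or history-server group could still read.
  def restrict(eventLogDir: String): Unit = {
    val path = new Path(eventLogDir)
    val fs   = path.getFileSystem(new Configuration())
    fs.setPermission(path, new FsPermission("770"))
  }
}
{code}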
[jira] [Commented] (SPARK-1529) Support setting spark.local.dirs to a hadoop FileSystem
[ https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976038#comment-13976038 ] Patrick Wendell commented on SPARK-1529: One idea proposed by [~adav] was to always use the Hadoop filesystem API, but to potentially implement our own version of the local filesystem if we find the Hadoop version has performance drawbacks. Another issue is that we use FileChannel objects directly in the `DiskBlockObjectWriter`. After looking through this a bit, the functionality there to commit and rewind writes is not actually used anywhere, so we could probably just remove it. [~liancheng] I think it would be worth it to look at a version where we just take all of the File APIs and replace them with Hadoop equivalents, i.e. your proposal. Support setting spark.local.dirs to a hadoop FileSystem Key: SPARK-1529 URL: https://issues.apache.org/jira/browse/SPARK-1529 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Patrick Wendell Assignee: Cheng Lian Fix For: 1.1.0 In some environments, like with MapR, local volumes are accessed through the Hadoop filesystem interface. We should allow setting spark.local.dir to a Hadoop filesystem location. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1399) Reason for Stage Failure should be shown in UI
[ https://issues.apache.org/jira/browse/SPARK-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1399. Resolution: Fixed Reason for Stage Failure should be shown in UI -- Key: SPARK-1399 URL: https://issues.apache.org/jira/browse/SPARK-1399 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: Kay Ousterhout Assignee: Nan Zhu Right now, we don't show why a stage failed in the UI. We have this information, and it would be useful for users to see (e.g., to see that a stage was killed because the job was cancelled). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1399) Reason for Stage Failure should be shown in UI
[ https://issues.apache.org/jira/browse/SPARK-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1399: --- Fix Version/s: 1.0.0 Reason for Stage Failure should be shown in UI -- Key: SPARK-1399 URL: https://issues.apache.org/jira/browse/SPARK-1399 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: Kay Ousterhout Assignee: Nan Zhu Fix For: 1.0.0 Right now, we don't show why a stage failed in the UI. We have this information, and it would be useful for users to see (e.g., to see that a stage was killed because the job was cancelled). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1539) RDDPage.scala contains RddPage
[ https://issues.apache.org/jira/browse/SPARK-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1539. Resolution: Fixed Fix Version/s: 1.0.0 RDDPage.scala contains RddPage -- Key: SPARK-1539 URL: https://issues.apache.org/jira/browse/SPARK-1539 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.0.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Fix For: 1.0.0 SPARK-1386 changed RDDPage to RddPage but didn't change the filename. I tried sbt/sbt publish-local. Inside the spark-core jar, the unit name is RDDPage.class and hence I got the following error: {code} [error] (run-main) java.lang.NoClassDefFoundError: org/apache/spark/ui/storage/RddPage java.lang.NoClassDefFoundError: org/apache/spark/ui/storage/RddPage at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:59) at org.apache.spark.ui.SparkUI.init(SparkUI.scala:52) at org.apache.spark.ui.SparkUI.init(SparkUI.scala:42) at org.apache.spark.SparkContext.init(SparkContext.scala:215) at MovieLensALS$.main(MovieLensALS.scala:38) at MovieLensALS.main(MovieLensALS.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.storage.RddPage at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:59) at org.apache.spark.ui.SparkUI.init(SparkUI.scala:52) at org.apache.spark.ui.SparkUI.init(SparkUI.scala:42) at org.apache.spark.SparkContext.init(SparkContext.scala:215) at MovieLensALS$.main(MovieLensALS.scala:38) at MovieLensALS.main(MovieLensALS.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) {code} This can be fixed after renaming RddPage to RDDPage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1558) [streaming] Update receiver information to match it with code
Tathagata Das created SPARK-1558: Summary: [streaming] Update receiver information to match it with code Key: SPARK-1558 URL: https://issues.apache.org/jira/browse/SPARK-1558 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1505) [streaming] Add 0.9 to 1.0 migration guide for streaming receiver
[ https://issues.apache.org/jira/browse/SPARK-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-1505: - Component/s: Streaming [streaming] Add 0.9 to 1.0 migration guide for streaming receiver - Key: SPARK-1505 URL: https://issues.apache.org/jira/browse/SPARK-1505 Project: Spark Issue Type: Sub-task Components: Documentation, Streaming Reporter: Tathagata Das Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1558) [streaming] Update receiver information to match it with code
[ https://issues.apache.org/jira/browse/SPARK-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-1558: - Component/s: Documentation [streaming] Update receiver information to match it with code - Key: SPARK-1558 URL: https://issues.apache.org/jira/browse/SPARK-1558 Project: Spark Issue Type: Sub-task Components: Documentation, Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1504) [streaming] Add deployment subsection to streaming
[ https://issues.apache.org/jira/browse/SPARK-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-1504: - Component/s: Streaming [streaming] Add deployment subsection to streaming -- Key: SPARK-1504 URL: https://issues.apache.org/jira/browse/SPARK-1504 Project: Spark Issue Type: Sub-task Components: Documentation, Streaming Reporter: Tathagata Das Assignee: Tathagata Das Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (SPARK-1457) Change APIs for training algorithms to take optimizer as parameter
[ https://issues.apache.org/jira/browse/SPARK-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai reassigned SPARK-1457: -- Assignee: DB Tsai Change APIs for training algorithms to take optimizer as parameter --- Key: SPARK-1457 URL: https://issues.apache.org/jira/browse/SPARK-1457 Project: Spark Issue Type: Improvement Components: MLlib Reporter: DB Tsai Assignee: DB Tsai Currently, the training API has signatures like LogisticRegressionWithSGD. If we want to use another optimizer, we have two options: either add a new API like LogisticRegressionWithNewOptimizer, which duplicates 99% of the code, or refactor the API to take the optimizer as a parameter, like the following: class LogisticRegression private (var optimizer: Optimizer) extends GeneralizedLinearAlgorithm[LogisticRegressionModel] -- This message was sent by Atlassian JIRA (v6.2#6252)
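A self-contained sketch of the proposed shape, using simplified stand-in types rather than the real MLlib {{Optimizer}} / {{GeneralizedLinearAlgorithm}} hierarchy (which operates on RDDs and Vectors):
{code}
// Stand-in types for illustration only.
trait Optimizer {
  def optimize(data: Seq[(Double, Array[Double])], initial: Array[Double]): Array[Double]
}

class GradientDescentOpt(stepSize: Double, numIterations: Int) extends Optimizer {
  def optimize(data: Seq[(Double, Array[Double])], initial: Array[Double]): Array[Double] = {
    // ...gradient updates elided...
    initial
  }
}

// The algorithm takes any Optimizer, so no LogisticRegressionWithX class per optimizer.
class LogisticRegression(var optimizer: Optimizer) {
  def run(data: Seq[(Double, Array[Double])]): Array[Double] =
    optimizer.optimize(data, Array.fill(data.head._2.length)(0.0))
}

object OptimizerParamSketch {
  def main(args: Array[String]): Unit = {
    val data  = Seq((1.0, Array(0.5, 1.5)), (0.0, Array(-0.5, -1.5)))
    val lr    = new LogisticRegression(new GradientDescentOpt(stepSize = 0.1, numIterations = 100))
    println(lr.run(data).mkString(", "))
  }
}
{code}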
[jira] [Assigned] (SPARK-1516) Yarn Client should not call System.exit, should throw exception instead.
[ https://issues.apache.org/jira/browse/SPARK-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai reassigned SPARK-1516: -- Assignee: DB Tsai Yarn Client should not call System.exit, should throw exception instead. Key: SPARK-1516 URL: https://issues.apache.org/jira/browse/SPARK-1516 Project: Spark Issue Type: Improvement Components: Deploy Reporter: DB Tsai Assignee: DB Tsai People submit Spark jobs to a YARN cluster from inside their own applications using the Spark YARN client, and it's not desirable for the client to call System.exit, which terminates the parent application as well. We should throw an exception instead, so callers can decide what action to take based on the exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
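A minimal sketch of the proposed behavior. SparkException is an existing Spark class; the method and message below are illustrative, not the actual yarn Client code.
{code}
import org.apache.spark.SparkException

object YarnClientExitSketch {
  def validateArgs(userJar: String): Unit = {
    if (userJar == null) {
      // previously: System.exit(1), which also kills the embedding application's JVM
      throw new SparkException("Error: You must specify a user jar!")
    }
  }

  def main(args: Array[String]): Unit = {
    // The embedding application decides how to react instead of being terminated.
    try validateArgs(null) catch {
      case e: SparkException => println(s"Launch failed: ${e.getMessage}")
    }
  }
}
{code}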
[jira] [Created] (SPARK-1559) Add conf dir to CLASSPATH in compute-classpath.sh dependent on whether SPARK_CONF_DIR is set
Albert Chu created SPARK-1559: - Summary: Add conf dir to CLASSPATH in compute-classpath.sh dependent on whether SPARK_CONF_DIR is set Key: SPARK-1559 URL: https://issues.apache.org/jira/browse/SPARK-1559 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 1.0.0 Reporter: Albert Chu Priority: Minor Attachments: SPARK-1559.patch bin/load-spark-env.sh loads spark-env.sh from SPARK_CONF_DIR if it is set, or from $parent_dir/conf if it is not set. However, compute-classpath.sh adds $FWDIR/conf to the CLASSPATH regardless of whether SPARK_CONF_DIR is set. The attached patch fixes this. A pull request on GitHub will also be sent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-693) Let deploy scripts set alternate conf, work directories
[ https://issues.apache.org/jira/browse/SPARK-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert Chu updated SPARK-693: - Attachment: SPARK-693.patch We required this support in our environment. Attached is my patch to implement this for Spark 1.0.0. Git pull request will be sent too. Let deploy scripts set alternate conf, work directories --- Key: SPARK-693 URL: https://issues.apache.org/jira/browse/SPARK-693 Project: Spark Issue Type: Improvement Affects Versions: 0.6.2 Reporter: David Chiang Priority: Minor Attachments: SPARK-693.patch Currently SPARK_CONF_DIR is overridden in spark-config.sh, and start-slaves.sh doesn't allow the user to pass a -d option in to set the work directory. Allowing this is a small change and makes it possible to have multiple clusters running at once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1543) Add ADMM for solving Lasso (and elastic net) problem
[ https://issues.apache.org/jira/browse/SPARK-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuo Xiang updated SPARK-1543: -- Description: This PR introduces the Alternating Direction Method of Multipliers (ADMM) for solving Lasso (elastic net, in fact) in mllib. ADMM is capable of solving a class of composite minimization problems in a distributed way. Specifically for Lasso (if only L1-regularization) or elastic-net (both L1- and L2- regularization), in each iteration, it requires solving independent systems of linear equations on each partition and a subsequent soft-thresholding operation on the driver machine. Unlike SGD, it is a deterministic algorithm (except for the random partition). Details can be found in the [S. Boyd's paper](http://www.stanford.edu/~boyd/papers/admm_distr_stats.html). The linear algebra operations mainly rely on the Breeze library, particularly, it applies `breeze.linalg.cholesky` to perform Cholesky decomposition on each partition to solve the linear system. I tried to follow the organization of existing Lasso implementation. However, as ADMM is also a good fit for similar optimization problems, e.g., (sparse) logistic regression, it may be worth reorganizing and putting ADMM into a separate section. was: This PR introduces the Alternating Direction Method of Multipliers (ADMM) for solving Lasso (elastic net, in fact) in mllib. ADMM is capable of solving a class of composite minimization problems in a distributed way. Specifically for Lasso (if only L1-regularization) or elastic-net (both L1- and L2- regularization), it requires solving independent systems of linear equations on each partition and a soft-thresholding operation on the driver. Unlike SGD, it is a deterministic algorithm (except for the random partition). Details can be found in the [S. Boyd's paper](http://www.stanford.edu/~boyd/papers/admm_distr_stats.html). The linear algebra operations mainly rely on the Breeze library, particularly, it applies `breeze.linalg.cholesky` to perform Cholesky decomposition on each partition to solve the linear system. I tried to follow the organization of existing Lasso implementation. However, as ADMM is also a good fit for similar optimization problems, e.g., (sparse) logistic regression, it may worth to re-organize and put ADMM into a separate section. PR: https://github.com/apache/spark/pull/458 Add ADMM for solving Lasso (and elastic net) problem Key: SPARK-1543 URL: https://issues.apache.org/jira/browse/SPARK-1543 Project: Spark Issue Type: New Feature Reporter: Shuo Xiang Priority: Minor Labels: features Original Estimate: 168h Remaining Estimate: 168h This PR introduces the Alternating Direction Method of Multipliers (ADMM) for solving Lasso (elastic net, in fact) in mllib. ADMM is capable of solving a class of composite minimization problems in a distributed way. Specifically for Lasso (if only L1-regularization) or elastic-net (both L1- and L2- regularization), in each iteration, it requires solving independent systems of linear equations on each partition and a subsequent soft-thresholding operation on the driver machine. Unlike SGD, it is a deterministic algorithm (except for the random partition). Details can be found in the [S. Boyd's paper](http://www.stanford.edu/~boyd/papers/admm_distr_stats.html). The linear algebra operations mainly rely on the Breeze library, particularly, it applies `breeze.linalg.cholesky` to perform Cholesky decomposition on each partition to solve the linear system. 
I tried to follow the organization of existing Lasso implementation. However, as ADMM is also a good fit for similar optimization problems, e.g., (sparse) logistic regression, it may be worth reorganizing and putting ADMM into a separate section. -- This message was sent by Atlassian JIRA (v6.2#6252)
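For reference, the per-iteration updates for Lasso in Boyd et al.'s formulation, which match the description above (a linear solve followed by a soft-thresholding step; the distributed consensus variant additionally averages the local solutions before thresholding):

x^{k+1} = (A^\top A + \rho I)^{-1} (A^\top b + \rho (z^k - u^k))
z^{k+1} = S_{\lambda/\rho}(x^{k+1} + u^k)
u^{k+1} = u^k + x^{k+1} - z^{k+1}

where S_\kappa(v)_i = \operatorname{sign}(v_i) \max(|v_i| - \kappa, 0) is the soft-thresholding operator.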
[jira] [Created] (SPARK-1561) sbt/sbt assembly generates too many local files
Xiangrui Meng created SPARK-1561: Summary: sbt/sbt assembly generates too many local files Key: SPARK-1561 URL: https://issues.apache.org/jira/browse/SPARK-1561 Project: Spark Issue Type: Improvement Affects Versions: 1.0.0 Reporter: Xiangrui Meng Running `find ./ | wc -l` after `sbt/sbt assembly` returned 564365. This hits the default inode limit of an 8GB ext filesystem (the default volume size for an EC2 instance), which means you can do nothing after `sbt/sbt assembly` on such a partition. Most of the small files are under assembly/target/streams and the same folder under examples/. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1562) Exclude internal catalyst classes from scaladoc, or make them package private
Patrick Wendell created SPARK-1562: -- Summary: Exclude internal catalyst classes from scaladoc, or make them package private Key: SPARK-1562 URL: https://issues.apache.org/jira/browse/SPARK-1562 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Patrick Wendell Assignee: Michael Armbrust Priority: Blocker Fix For: 1.0.0 Michael - this is up to you. But I noticed there are a ton of internal catalyst types that show up in our scaladoc. I'm not sure if you mean these to be user-facing API's. If not, it might be good to hide them from the docs or make them package private. -- This message was sent by Atlassian JIRA (v6.2#6252)
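For what it's worth, the package-private route is a one-line change per type; a hypothetical example (not an actual catalyst class) of what that looks like:
{code}
package org.apache.spark.sql.catalyst

// private[sql] keeps the type usable throughout the sql project but out of the
// public API, so doc tools that only emit public members should skip it.
private[sql] case class InternalTreeNode(name: String, children: Seq[InternalTreeNode])
{code}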
[jira] [Resolved] (SPARK-1440) Generate JavaDoc instead of ScalaDoc for Java API
[ https://issues.apache.org/jira/browse/SPARK-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1440. Resolution: Fixed Generate JavaDoc instead of ScalaDoc for Java API - Key: SPARK-1440 URL: https://issues.apache.org/jira/browse/SPARK-1440 Project: Spark Issue Type: Sub-task Components: Documentation Reporter: Matei Zaharia Assignee: Matei Zaharia Fix For: 1.0.0 It may be possible to use this plugin: https://github.com/typesafehub/genjavadoc -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1439) Aggregate Scaladocs across projects
[ https://issues.apache.org/jira/browse/SPARK-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1439. Resolution: Fixed Aggregate Scaladocs across projects --- Key: SPARK-1439 URL: https://issues.apache.org/jira/browse/SPARK-1439 Project: Spark Issue Type: Sub-task Components: Documentation Reporter: Matei Zaharia Assignee: Matei Zaharia Fix For: 1.0.0 Apparently there's a Unidoc plugin to put together ScalaDocs across modules: https://github.com/akka/akka/blob/master/project/Unidoc.scala -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1332) Improve Spark Streaming's Network Receiver and InputDStream API for future stability
[ https://issues.apache.org/jira/browse/SPARK-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1332. Resolution: Fixed Fix Version/s: 1.0.0 Improve Spark Streaming's Network Receiver and InputDStream API for future stability Key: SPARK-1332 URL: https://issues.apache.org/jira/browse/SPARK-1332 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 0.9.0 Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker Fix For: 1.0.0 The current Network Receiver API makes it slightly complicated to write a new receiver, as one needs to create an instance of BlockGenerator as shown in SocketReceiver https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/SocketInputDStream.scala#L51 Exposing the BlockGenerator interface has made it harder to improve the receiving process. The API of NetworkReceiver (which was not a very stable API anyway) needs to change if we are to ensure future stability. Additionally, functions like streamingContext.socketStream that create input streams return DStream objects. That makes it hard to expose functionality (say, rate limits) unique to input dstreams. They should return InputDStream or NetworkInputDStream. -- This message was sent by Atlassian JIRA (v6.2#6252)
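For context, the rough shape the reworked receiver API takes: user code implements onStart/onStop and calls store(), and never touches BlockGenerator. This sketch is based on the 1.0-era org.apache.spark.streaming.receiver.Receiver class; the trivial receiver below is illustrative.
{code}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class DummyReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
  @volatile private var running = false

  def onStart(): Unit = {
    running = true
    new Thread("dummy-receiver") {
      override def run(): Unit = {
        while (running && !isStopped()) {
          store("hello")       // hand data to Spark; block handling stays internal
          Thread.sleep(100)
        }
      }
    }.start()
  }

  def onStop(): Unit = { running = false }
}
{code}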