[jira] [Created] (SPARK-3811) More robust / standard Utils.deleteRecursively, Utils.createTempDir
Sean Owen created SPARK-3811: Summary: More robust / standard Utils.deleteRecursively, Utils.createTempDir Key: SPARK-3811 URL: https://issues.apache.org/jira/browse/SPARK-3811 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Sean Owen Priority: Minor I noticed a few issues with how temp directories are created and deleted: *Minor* * Guava's {{Files.createTempDir()}} plus {{File.deleteOnExit()}} is used in many tests to make a temp dir, but {{Utils.createTempDir()}} seems to be the standard Spark mechanism * The call to {{File.deleteOnExit()}} could be pushed into {{Utils.createTempDir()}} as well, as part of that replacement. * _I messed up the message in an exception in {{Utils}} in SPARK-3794; fixed here_ *Bit Less Minor* * {{Utils.deleteRecursively()}} fails immediately if any {{IOException}} occurs, instead of trying to delete any remaining files and subdirectories. I've observed this leave temp dirs around. I suggest changing it to continue in the face of an exception and, at the end, throw one of the possibly several exceptions that occurred. * {{Utils.createTempDir()}} adds a JVM shutdown hook every time the method is called, even when the new dir is inside a dir that is already registered, because that check only happens inside the hook. However, {{Utils}} already manages a set of all dirs to delete on shutdown, called {{shutdownDeletePaths}}. A single hook can be registered to delete all of these on exit. This is how Tachyon temp paths are cleaned up in {{TachyonBlockManager}}. I noticed a few other things that might be changed but wanted to ask first: * Shouldn't the set of dirs to delete be {{File}}, not just {{String}} paths? * {{Utils}} manages the set of {{TachyonFile}} that have been registered for deletion, but the shutdown hook is managed in {{TachyonBlockManager}}. Should this logic not live together, and outside {{Utils}}? It's more specific to Tachyon, and looks a bit odd to import in such a generic place. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
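To make the "continue past an IOException" idea concrete, here is a rough Scala sketch under my own assumptions (this is not the current {{Utils}} code): record the first failure, keep deleting the remaining children, and rethrow at the end. A single shutdown hook over {{shutdownDeletePaths}} could then simply call this once per registered dir.

{code}
import java.io.{File, IOException}

// Sketch only: delete children even when one of them fails, and surface
// one of the collected failures once the whole tree has been attempted.
def deleteRecursively(dir: File): Unit = {
  var firstError: Option[IOException] = None
  if (dir.isDirectory) {
    Option(dir.listFiles()).getOrElse(Array.empty[File]).foreach { child =>
      try {
        deleteRecursively(child)
      } catch {
        case e: IOException => if (firstError.isEmpty) firstError = Some(e)
      }
    }
  }
  if (!dir.delete() && dir.exists() && firstError.isEmpty) {
    firstError = Some(new IOException("Failed to delete: " + dir.getAbsolutePath))
  }
  firstError.foreach(e => throw e)
}
{code}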
[jira] [Commented] (SPARK-3828) Spark returns inconsistent results when building with different Hadoop version
[ https://issues.apache.org/jira/browse/SPARK-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161552#comment-14161552 ] Sean Owen commented on SPARK-3828: -- (Agree, although there's an interesting point in here about Spark using the old TextInputFormat even on newer Hadoop. I kind of assume that will persist until Hadoop 1.x support is dropped, rather than bother to use reflection to use the newer class.) Spark returns inconsistent results when building with different Hadoop version --- Key: SPARK-3828 URL: https://issues.apache.org/jira/browse/SPARK-3828 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Environment: OSX 10.9, Spark master branch Reporter: Liquan Pei For the text8 data at http://mattmahoney.net/dc/text8.zip (to reproduce, please unzip it first), Spark built with different Hadoop versions returns different results. {code} val data = sc.textFile(text8) data.count() {code} returns 1 when built with SPARK_HADOOP_VERSION=1.0.4 and returns 2 when built with SPARK_HADOOP_VERSION=2.4.0. Looking through the RDD code, it seems that textFile uses hadoopFile, which creates a HadoopRDD; we should probably create a newHadoopRDD when building Spark with SPARK_HADOOP_VERSION = 2.0.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
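As a point of comparison (a sketch, not a proposed patch): from a spark-shell on a Hadoop 2.x build, the same file can be read through the new-API input format without any change to Spark, which would show whether the old vs. new TextInputFormat is really the source of the difference. {{sc}} and a local text8 path are assumed.

{code}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Read text8 through the new (mapreduce) API rather than the old (mapred) one
// that sc.textFile/hadoopFile uses, then count records the same way.
val data = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat]("text8")
  .map(_._2.toString)
println(data.count())
{code}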
[jira] [Commented] (SPARK-3785) Support off-loading computations to a GPU
[ https://issues.apache.org/jira/browse/SPARK-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163125#comment-14163125 ] Sean Owen commented on SPARK-3785: -- Sure, but a GPU isn't going to be good at general map, filter, reduce, groupBy operations. It can't run arbitrary functions like the JVM can. I wonder how many use cases actually consist of enough computation that can be specialized for the GPU, chained together, to make the GPU worth it. My suspicion is still that there are really only a few wins for this use case, and that they are achievable by just calling the GPU from Java code. I'd love to see that this is in fact a way to transparently speed up a non-trivial slice of mainstream Spark use cases though. Support off-loading computations to a GPU - Key: SPARK-3785 URL: https://issues.apache.org/jira/browse/SPARK-3785 Project: Spark Issue Type: Brainstorming Components: MLlib Reporter: Thomas Darimont Priority: Minor Are there any plans to add support for off-loading computations to the GPU, e.g. via an OpenCL binding? http://www.jocl.org/ https://code.google.com/p/javacl/ http://lwjgl.org/wiki/index.php?title=OpenCL_in_LWJGL -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
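To illustrate the "call the GPU from your own code" route mentioned above, a hedged Scala sketch: {{gpuBatchDot}} stands in for a hypothetical wrapper around an OpenCL/JOCL kernel (it is not a real library call), and {{vectorsRDD}} / {{weightsBroadcast}} are assumed to be an RDD of double arrays and a broadcast weight vector. Only the {{mapPartitions}} plumbing is ordinary Spark.

{code}
// Hypothetical native wrapper around an OpenCL kernel -- placeholder, not a real API.
def gpuBatchDot(vectors: Array[Array[Double]], weights: Array[Double]): Array[Double] = ???

// Hand the GPU a whole partition at a time so per-launch and transfer overhead
// is amortized over many records, then return the scores as an iterator.
val scores = vectorsRDD.mapPartitions { iter =>
  val batch = iter.toArray
  gpuBatchDot(batch, weightsBroadcast.value).iterator
}
{code}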
[jira] [Commented] (SPARK-3895) Scala style: Indentation of method
[ https://issues.apache.org/jira/browse/SPARK-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166415#comment-14166415 ] Sean Owen commented on SPARK-3895: -- To be clear, the second is correct because of line length and brace placement. However the PR you mention shows the opposite, changing the second into the first. The style guide already covers line length and braces. If the net change is moving braces, I am not sure that's worth doing for its own sake, given it is completely non-functional. (Although it can be fixed up when nearby code is edited.) So is there any action that falls out from this JIRA? Scala style: Indentation of method -- Key: SPARK-3895 URL: https://issues.apache.org/jira/browse/SPARK-3895 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: sjk Such as https://github.com/apache/spark/pull/2734 {code:title=core/src/main/scala/org/apache/spark/Aggregator.scala|borderStyle=solid} // for example def combineCombinersByKey(iter: Iterator[_ <: Product2[K, C]], context: TaskContext) : Iterator[(K, C)] = { ... def combineValuesByKey(iter: Iterator[_ <: Product2[K, V]], context: TaskContext): Iterator[(K, C)] = { {code} These do not conform to the rule: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide There is a lot of code like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
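For illustration, here is how I read the two forms once the stripped characters are restored, in a self-contained sketch (the trait and its name are scaffolding for the example, not the real {{Aggregator}}):

{code}
import org.apache.spark.TaskContext

// Illustrative only -- not the actual Spark class.
trait ExampleAggregator[K, V, C] {
  // When the whole signature fits on one line, keep it on one line
  // (the form the comment above calls correct).
  def combineValuesByKey(iter: Iterator[_ <: Product2[K, V]], context: TaskContext): Iterator[(K, C)]

  // When it does not fit, as I read the wiki style guide, wrap the parameter list
  // with 4-space indentation rather than pushing only the return type onto a new line.
  def combineCombinersByKey(
      iter: Iterator[_ <: Product2[K, C]],
      context: TaskContext): Iterator[(K, C)]
}
{code}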
[jira] [Commented] (SPARK-3894) Scala style: line length increase to 160
[ https://issues.apache.org/jira/browse/SPARK-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166417#comment-14166417 ] Sean Owen commented on SPARK-3894: -- This is an ancient religious war. At Google, it was the 80s vs the 120s, and eventually the 120s won after a long bitter battle (an email thread really). I have not heard anyone argue for 160, and the best practical argument against it is that it makes it hard even on largeish screens to have two editors side by side. Even 120 does sometimes. Even if the standard became 160, the entire code base is wrapped at 100, and we'd have code that is mostly 100 characters wide with a bit at 160. I personally think 100 is just fine. Scala style: line length increase to 160 Key: SPARK-3894 URL: https://issues.apache.org/jira/browse/SPARK-3894 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: sjk 100 is too short; our screens are bigger. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3445) Deprecate and later remove YARN alpha support
[ https://issues.apache.org/jira/browse/SPARK-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168577#comment-14168577 ] Sean Owen commented on SPARK-3445: -- YARN alpha is not deprecated or removed yet; this JIRA is not resolved. Even if it were deprecated it should still compile and work. This is an error introduced by a recent change. [~andrewor14] [~andrewor] can you have a look? this line looks like it was introduced in https://github.com/apache/spark/commit/c4022dd52b4827323ff956632dc7623f546da937 / SPARK-3477 Deprecate and later remove YARN alpha support - Key: SPARK-3445 URL: https://issues.apache.org/jira/browse/SPARK-3445 Project: Spark Issue Type: Improvement Components: YARN Reporter: Patrick Wendell This will depend a bit on both user demand and the commitment level of maintainers, but I'd like to propose the following timeline for yarn-alpha support. Spark 1.2: Deprecate YARN-alpha Spark 1.3: Remove YARN-alpha (i.e. require YARN-stable) Since YARN-alpha is clearly identified as an alpha API, it seems reasonable to drop support for it in a minor release. However, it does depend a bit whether anyone uses this outside of Yahoo!, and that I'm not sure of. In the past this API has been used and maintained by Yahoo, but they'll be migrating soon to the stable API's. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3926) result of JavaRDD collectAsMap() is not serializable
[ https://issues.apache.org/jira/browse/SPARK-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169383#comment-14169383 ] Sean Owen commented on SPARK-3926: -- Yeah, seems fine to just let {{MapWrapper}} implement {{Serializable}}, because standard Java {{Map}} implementations are as well. It's backwards-compatible so seems like an easy PR to submit if you like. result of JavaRDD collectAsMap() is not serializable Key: SPARK-3926 URL: https://issues.apache.org/jira/browse/SPARK-3926 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.1.0 Environment: CentOS / Spark 1.1 / Hadoop Hortonworks 2.4.0.2.1.2.0-402 Reporter: Antoine Amend Using the Java API, I want to collect the result of a RDDString, String as a HashMap using collectAsMap function: MapString, String map = myJavaRDD.collectAsMap(); This works fine, but when passing this map to another function, such as... myOtherJavaRDD.mapToPair(new CustomFunction(map)) ...this leads to the following error: Exception in thread main org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158) at org.apache.spark.SparkContext.clean(SparkContext.scala:1242) at org.apache.spark.rdd.RDD.map(RDD.scala:270) at org.apache.spark.api.java.JavaRDDLike$class.mapToPair(JavaRDDLike.scala:99) at org.apache.spark.api.java.JavaPairRDD.mapToPair(JavaPairRDD.scala:44) ../.. MY CLASS ../.. at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.io.NotSerializableException: scala.collection.convert.Wrappers$MapWrapper at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42) at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73) at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164) This seems to be due to WrapAsJava.scala being non serializable ../.. implicit def mapAsJavaMap[A, B](m: Map[A, B]): ju.Map[A, B] = m match { //case JConcurrentMapWrapper(wrapped) = wrapped case JMapWrapper(wrapped) = wrapped.asInstanceOf[ju.Map[A, B]] case _ = new MapWrapper(m) } ../.. 
The workaround is to manually wrap this map in another one (which is serializable): Map<String, String> map = myJavaRDD.collectAsMap(); Map<String, String> tmp = new HashMap<String, String>(map); myOtherJavaRDD.mapToPair(new CustomFunction(tmp)) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3897) Scala style: format example code
[ https://issues.apache.org/jira/browse/SPARK-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3897. -- Resolution: Won't Fix Given recent discussion, and consensus to not make sweeping style changes, I think this is WontFix. Scala style: format example code Key: SPARK-3897 URL: https://issues.apache.org/jira/browse/SPARK-3897 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: sjk https://github.com/apache/spark/pull/2754 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3895) Scala style: Indentation of method
[ https://issues.apache.org/jira/browse/SPARK-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3895. -- Resolution: Won't Fix Given recent discussion, and consensus to not make sweeping style changes, I think this is WontFix. Scala style: Indentation of method -- Key: SPARK-3895 URL: https://issues.apache.org/jira/browse/SPARK-3895 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: sjk such as https://github.com/apache/spark/pull/2734 {code:title=core/src/main/scala/org/apache/spark/Aggregator.scala|borderStyle=solid} // for example def combineCombinersByKey(iter: Iterator[_ : Product2[K, C]], context: TaskContext) : Iterator[(K, C)] = { ... def combineValuesByKey(iter: Iterator[_ : Product2[K, V]], context: TaskContext): Iterator[(K, C)] = { {code} there are not conform to the rule.https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide there are so much code like this -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3781) code style format
[ https://issues.apache.org/jira/browse/SPARK-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3781. -- Resolution: Won't Fix Given recent discussion, and consensus to not make sweeping style changes, I think this is WontFix. code style format - Key: SPARK-3781 URL: https://issues.apache.org/jira/browse/SPARK-3781 Project: Spark Issue Type: Improvement Reporter: sjk -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3890) remove redundant spark.executor.memory in doc
[ https://issues.apache.org/jira/browse/SPARK-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169405#comment-14169405 ] Sean Owen commented on SPARK-3890: -- For some reason the PR was not linked: https://github.com/apache/spark/pull/2745 remove redundant spark.executor.memory in doc - Key: SPARK-3890 URL: https://issues.apache.org/jira/browse/SPARK-3890 Project: Spark Issue Type: Improvement Components: Documentation Reporter: WangTaoTheTonic Priority: Minor Seems like there is a redundant spark.executor.memory config item in docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3896) checkSpeculatableTasks fask quit loop, invoking checkSpeculatableTasks is expensive
[ https://issues.apache.org/jira/browse/SPARK-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169409#comment-14169409 ] Sean Owen commented on SPARK-3896: -- Oops, my bad. I just realized that some PRs didn't link after looking at other recent JIRAs. checkSpeculatableTasks fask quit loop, invoking checkSpeculatableTasks is expensive --- Key: SPARK-3896 URL: https://issues.apache.org/jira/browse/SPARK-3896 Project: Spark Issue Type: Improvement Reporter: sjk -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3662) Importing pandas breaks included pi.py example
[ https://issues.apache.org/jira/browse/SPARK-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169424#comment-14169424 ] Sean Owen commented on SPARK-3662: -- [~esamanas] Do you have a suggested change here, beyond just disambiguating imports in your example? Or a different example that doesn't involve import collision? It sounds like the modified example is then misunderstood to refer to a pandas random class, not the Python one, and that is simply a matter of namespace collision, and why pandas is dragged in. This example seems to fall down before it demonstrates anything else. Importing pandas breaks included pi.py example -- Key: SPARK-3662 URL: https://issues.apache.org/jira/browse/SPARK-3662 Project: Spark Issue Type: Bug Components: PySpark, YARN Affects Versions: 1.1.0 Environment: Xubuntu 14.04. Yarn cluster running on Ubuntu 12.04. Reporter: Evan Samanas If I add import pandas at the top of the included pi.py example and submit using spark-submit --master yarn-client, I get this stack trace: {code} Traceback (most recent call last): File /home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi.py, line 39, in module count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add) File /home/evan/pub_src/spark/python/pyspark/rdd.py, line 759, in reduce vals = self.mapPartitions(func).collect() File /home/evan/pub_src/spark/python/pyspark/rdd.py, line 723, in collect bytesInJava = self._jrdd.collect().iterator() File /home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py, line 538, in __call__ File /home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py, line 300, in get_return_value py4j.protocol.Py4JJavaError14/09/23 15:51:58 INFO TaskSetManager: Lost task 2.3 in stage 0.0 (TID 10) on executor SERVERNAMEREMOVED: org.apache.spark.api.python.PythonException (Traceback (most recent call last): File /yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/worker.py, line 75, in main command = pickleSer._read_with_length(infile) File /yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/serializers.py, line 150, in _read_with_length return self.loads(obj) ImportError: No module named algos {code} The example works fine if I move the statement from random import random from the top and into the function (def f(_)) defined in the example. Near as I can tell, random is getting confused with a function of the same name within pandas.algos. Submitting the same script using --master local works, but gives a distressing amount of random characters to stdout or stderr and messes up my terminal: {code} ... 
@J@J@J@J@J@J@J@J@J@J@J@J@J@JJ@J@J@J@J @J!@J@J#@J$@J%@J@J'@J(@J)@J*@J+@J,@J-@J.@J/@J0@J1@J2@J3@J4@J5@J6@J7@J8@J9@J:@J;@J@J=@J@J?@J@@JA@JB@JC@JD@JE@JF@JG@JH@JI@JJ@JK@JL@JM@JN@JO@JP@JQ@JR@JS@JT@JU@JV@JW@JX@JY@JZ@J[@J\@J]@J^@J_@J`@Ja@Jb@Jc@Jd@Je@Jf@Jg@Jh@Ji@Jj@Jk@Jl@Jm@Jn@Jo@Jp@Jq@Jr@Js@Jt@Ju@Jv@Jw@Jx@Jy@Jz@J{@J|@J}@J~@J@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@JJJ�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@�@J�@J�@J�@J�@J�@J�@J�@J�@J�@J�@JAJAJAJAJAJAJAJAAJ AJ AJ AJ AJAJAJAJAJAJAJAJAJAJAJAJAJAJJAJAJAJAJ AJ!AJAJ#AJ$AJ%AJAJ'AJ(AJ)AJ*AJ+AJ,AJ-AJ.AJ/AJ0AJ1AJ2AJ3AJ4AJ5AJ6AJ7AJ8AJ9AJ:AJ;AJAJ=AJAJ?AJ@AJAAJBAJCAJDAJEAJFAJGAJHAJIAJJAJKAJLAJMAJNAJOAJPAJQAJRAJSAJTAJUAJVAJWAJXAJYAJZAJ[AJ\AJ]AJ^AJ_AJ`AJaAJbAJcAJdAJeAJfAJgAJhAJiAJjAJkAJlAJmAJnAJoAJpAJqAJrAJsAJtAJuAJvAJwAJxAJyAJzAJ{AJ|AJ}AJ~AJAJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJJJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�A14/09/23 15:42:09 INFO SparkContext: Job finished: reduce at /home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi_sframe.py:38, took 11.276879779 s J�AJ�AJ�AJ�AJ�AJ�AJ�A�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJ�AJBJBJBJBJBJBJBJBBJ BJ BJ BJ BJBJBJBJBJBJBJBJBJBJBJBJBJBJJBJBJBJBJ BJ!BJBJ#BJ$BJ%BJBJ'BJ(BJ)BJ*BJ+BJ,BJ-BJ.BJ/BJ0BJ1BJ2BJ3BJ4BJ5BJ6BJ7BJ8BJ9BJ:BJ;BJBJ=BJBJ?BJ@Be. �]qJ#1a. �]qJX4a. �]qJX4a. �]qJ#1a. �]qJX4a. �]qJX4a. �]qJ#1a. �]qJX4a. �]qJX4a. �]qJa. Pi is roughly 3.146136 {code} No idea if that's related, but thought I'd include it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail:
[jira] [Resolved] (SPARK-3506) 1.1.0-SNAPSHOT in docs for 1.1.0 under docs/latest
[ https://issues.apache.org/jira/browse/SPARK-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3506. -- Resolution: Fixed Fix Version/s: 1.1.1 Looks like the site has been updated, and I see no SNAPSHOT on the page. 1.1.0-SNAPSHOT in docs for 1.1.0 under docs/latest -- Key: SPARK-3506 URL: https://issues.apache.org/jira/browse/SPARK-3506 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 1.1.0 Reporter: Jacek Laskowski Assignee: Patrick Wendell Priority: Trivial Fix For: 1.1.1 In https://spark.apache.org/docs/latest/ there are references to 1.1.0-SNAPSHOT: * This documentation is for Spark version 1.1.0-SNAPSHOT. * For the Scala API, Spark 1.1.0-SNAPSHOT uses Scala 2.10. It should be version 1.1.0 since that's the latest released version and the header tells so, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3251) Clarify learning interfaces
[ https://issues.apache.org/jira/browse/SPARK-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169449#comment-14169449 ] Sean Owen commented on SPARK-3251: -- Is this a subset of / duplicate of SPARK-3702 now, given the discussion? Clarify learning interfaces Key: SPARK-3251 URL: https://issues.apache.org/jira/browse/SPARK-3251 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.1.0, 1.1.1 Reporter: Christoph Sawade *Make threshold mandatory* Currently, the output of predict for an example is either the score or the class. This side-effect is caused by clearThreshold. To clarify that behaviour, three different types of predict (predictScore, predictClass, predictProbability) were introduced; the threshold is no longer optional. *Clarify classification interfaces* Currently, some functionality is spread across multiple models. In order to clarify the structure and simplify the implementation of more complex models (like multinomial logistic regression), two new classes are introduced: - BinaryClassificationModel: for all models that derive a binary classification from a single weight vector. Comprises the thresholding functionality to derive a prediction from a score. It basically captures SVMModel and LogisticRegressionModel. - ProbabilisticClassificationModel: This trait defines the interface for models that return a calibrated confidence score (aka probability). *Misc* - some renaming - add test for probabilistic output -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
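For concreteness, a rough sketch of how the two interfaces described above might be shaped; this is inferred from the description only, so the method names and the default implementation are my assumptions rather than the proposal's actual code:

{code}
import org.apache.spark.mllib.linalg.Vector

trait BinaryClassificationModel {
  def weights: Vector
  def threshold: Double  // mandatory rather than clearable, per the summary
  def predictScore(features: Vector): Double
  // Derive the class label from the score via the threshold.
  def predictClass(features: Vector): Double =
    if (predictScore(features) > threshold) 1.0 else 0.0
}

trait ProbabilisticClassificationModel extends BinaryClassificationModel {
  // A calibrated confidence score (aka probability).
  def predictProbability(features: Vector): Double
}
{code}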
[jira] [Commented] (SPARK-3480) Throws out Not a valid command 'yarn-alpha/scalastyle' in dev/scalastyle for sbt build tool during 'Running Scala style checks'
[ https://issues.apache.org/jira/browse/SPARK-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169472#comment-14169472 ] Sean Owen commented on SPARK-3480: -- Given the discussion I suggest this is CannotReproduce? Throws out Not a valid command 'yarn-alpha/scalastyle' in dev/scalastyle for sbt build tool during 'Running Scala style checks' --- Key: SPARK-3480 URL: https://issues.apache.org/jira/browse/SPARK-3480 Project: Spark Issue Type: Bug Components: Build Reporter: Yi Zhou Priority: Minor Symptom: Run ./dev/run-tests and dump outputs as following: SBT_MAVEN_PROFILES_ARGS=-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl [Warn] Java 8 tests will not run because JDK version is 1.8. = Running Apache RAT checks = RAT checks passed. = Running Scala style checks = Scalastyle checks failed at following occurrences: [error] Expected ID character [error] Not a valid command: yarn-alpha [error] Expected project ID [error] Expected configuration [error] Expected ':' (if selecting a configuration) [error] Expected key [error] Not a valid key: yarn-alpha [error] yarn-alpha/scalastyle [error] ^ Possible Cause: I checked the dev/scalastyle, found that there are 2 parameters 'yarn-alpha/scalastyle' and 'yarn/scalastyle' separately,like echo -e q\n | sbt/sbt -Pyarn -Phadoop-0.23 -Dhadoop.version=0.23.9 yarn-alpha/scalastyle \ scalastyle.txt echo -e q\n | sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 yarn/scalastyle \ scalastyle.txt From above error message, sbt seems to complain them due to '/' separator. So it can be run through after I manually modified original ones to 'yarn-alpha:scalastyle' and 'yarn:scalastyle'.. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3924) Upgrade to Akka version 2.3.6
[ https://issues.apache.org/jira/browse/SPARK-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169504#comment-14169504 ] Sean Owen commented on SPARK-3924: -- I think this is a duplicate of SPARK-2707 and SPARK-2805. Upgrade to Akka version 2.3.6 - Key: SPARK-3924 URL: https://issues.apache.org/jira/browse/SPARK-3924 Project: Spark Issue Type: Dependency upgrade Environment: deploy env Reporter: Helena Edelson I tried every sbt in the book but can't use the latest Akka version in my project with Spark. It would be great if I could. Also I can not use the latest Typesafe Config - 1.2.1, which would also be great. See https://issues.apache.org/jira/browse/SPARK-2593 This is a big change. If I have time I can do a PR. [~helena_e] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2707) Upgrade to Akka 2.3
[ https://issues.apache.org/jira/browse/SPARK-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169509#comment-14169509 ] Sean Owen commented on SPARK-2707: -- Can this be considered a duplicate of SPARK-2805, since that's where I see recent action? Upgrade to Akka 2.3 --- Key: SPARK-2707 URL: https://issues.apache.org/jira/browse/SPARK-2707 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.0.0 Reporter: Yardena Upgrade Akka from 2.2 to 2.3. We want to be able to use new Akka and Spray features directly in the same project. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1834) NoSuchMethodError when invoking JavaPairRDD.reduce() in Java
[ https://issues.apache.org/jira/browse/SPARK-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1834. -- Resolution: Duplicate On another look, I'm almost sure this is the same issue as in SPARK-3266, which [~joshrosen] has been looking at. NoSuchMethodError when invoking JavaPairRDD.reduce() in Java Key: SPARK-1834 URL: https://issues.apache.org/jira/browse/SPARK-1834 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.1 Environment: Redhat Linux, Java 7, Hadoop 2.2, Scala 2.10.4 Reporter: John Snodgrass I get a java.lang.NoSuchMethodError when invoking JavaPairRDD.reduce(). Here is the partial stack trace: Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:39) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: java.lang.NoSuchMethodError: org.apache.spark.api.java.JavaPairRDD.reduce(Lorg/apache/spark/api/java/function/Function2;)Lscala/Tuple2; at JavaPairRDDReduceTest.main(JavaPairRDDReduceTest.java:49)... I'm using Spark 0.9.1. I checked to ensure that I'm compiling with the same version of Spark as I am running on the cluster. The reduce() method works fine with JavaRDD, just not with JavaPairRDD. Here is a code snippet that exhibits the problem: ArrayList<Integer> array = new ArrayList(); for (int i = 0; i < 10; ++i) { array.add(i); } JavaRDD<Integer> rdd = javaSparkContext.parallelize(array); JavaPairRDD<String, Integer> testRDD = rdd.map(new PairFunction<Integer, String, Integer>() { @Override public Tuple2<String, Integer> call(Integer t) throws Exception { return new Tuple2("" + t, t); } }).cache(); testRDD.reduce(new Function2<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple2<String, Integer>>() { @Override public Tuple2<String, Integer> call(Tuple2<String, Integer> arg0, Tuple2<String, Integer> arg1) throws Exception { return new Tuple2(arg0._1 + arg1._1, arg0._2 * 10 + arg0._2); } }); -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2493) SBT gen-idea doesn't generate correct Intellij project
[ https://issues.apache.org/jira/browse/SPARK-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169527#comment-14169527 ] Sean Owen commented on SPARK-2493: -- Is this still an issue [~dbtsai] ? For IntelliJ, I find it much easier to point directly at the Maven build, and that's more the primary build system now anyway. SBT gen-idea doesn't generate correct Intellij project -- Key: SPARK-2493 URL: https://issues.apache.org/jira/browse/SPARK-2493 Project: Spark Issue Type: Sub-task Components: Build Reporter: DB Tsai I've a clean clone of spark master repository, and I generated the intellij project file by sbt gen-idea as usual. There are two issues we have after merging SPARK-1776 (read dependencies from Maven). 1) After SPARK-1776, sbt gen-idea will download the dependencies from internet even those jars are in local cache. Before merging, the second time we run gen-idea will not download anything but use the jars in cache. 2) The tests with spark local context can not be run in the intellij. It will show the following exception. The current workaround we've are checking out any snapshot before merging to gen-idea, and then switch back to current master. But this will not work when the master deviate too much from the latest working snapshot. [ERROR] [07/14/2014 16:27:49.967] [ScalaTest-run] [Remoting] Remoting error: [Startup timed out] [ akka.remote.RemoteTransportException: Startup timed out at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129) at akka.remote.Remoting.start(Remoting.scala:191) at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184) at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579) at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577) at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588) at akka.actor.ActorSystem$.apply(ActorSystem.scala:111) at akka.actor.ActorSystem$.apply(ActorSystem.scala:104) at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:104) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:153) at org.apache.spark.SparkContext.init(SparkContext.scala:202) at org.apache.spark.SparkContext.init(SparkContext.scala:117) at org.apache.spark.SparkContext.init(SparkContext.scala:132) at org.apache.spark.mllib.util.LocalSparkContext$class.beforeAll(LocalSparkContext.scala:29) at org.apache.spark.mllib.optimization.LBFGSSuite.beforeAll(LBFGSSuite.scala:27) at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187) at org.apache.spark.mllib.optimization.LBFGSSuite.beforeAll(LBFGSSuite.scala:27) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253) at org.apache.spark.mllib.optimization.LBFGSSuite.run(LBFGSSuite.scala:27) at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55) at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563) at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557) at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044) at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043) at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722) at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043) at 
org.scalatest.tools.Runner$.run(Runner.scala:883) at org.scalatest.tools.Runner.run(Runner.scala) at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:141) at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:32) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134) Caused by: java.util.concurrent.TimeoutException: Futures timed out after [1 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at akka.remote.Remoting.start(Remoting.scala:173) ... 35 more ] An exception or error caused a
[jira] [Resolved] (SPARK-2198) Partition the scala build file so that it is easier to maintain
[ https://issues.apache.org/jira/browse/SPARK-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-2198. -- Resolution: Won't Fix Sounds like a WontFix Partition the scala build file so that it is easier to maintain --- Key: SPARK-2198 URL: https://issues.apache.org/jira/browse/SPARK-2198 Project: Spark Issue Type: Task Components: Build Reporter: Helena Edelson Priority: Minor Original Estimate: 3h Remaining Estimate: 3h Partition to standard Dependencies, Version, Settings, Publish.scala. keeping the SparkBuild clean to describe the modules and their deps so that changes in versions, for example, need only be made in Version.scala, settings changes such as in scalac in Settings.scala, etc. I'd be happy to do this ([~helena_e]) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1849) Broken UTF-8 encoded data gets character replacements and thus can't be fixed
[ https://issues.apache.org/jira/browse/SPARK-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169549#comment-14169549 ] Sean Owen commented on SPARK-1849: -- Yes, I think there isn't a 'fix' here short of a quite different implementation. Hadoop's text support pretty deeply assumes UTF-8 (partly for speed) and the Spark implementation is just Hadoop's. This would have to justify rewriting all that. I think you have to treat this as binary data for now. Broken UTF-8 encoded data gets character replacements and thus can't be fixed --- Key: SPARK-1849 URL: https://issues.apache.org/jira/browse/SPARK-1849 Project: Spark Issue Type: Bug Reporter: Harry Brundage Attachments: encoding_test I'm trying to process a file which isn't valid UTF-8 data inside hadoop using Spark via {{sc.textFile()}}. Is this possible, and if not, is this a bug that we should fix? It looks like {{HadoopRDD}} uses {{org.apache.hadoop.io.Text.toString}} on all the data it ever reads, which I believe replaces invalid UTF-8 byte sequences with the UTF-8 replacement character, \uFFFD. Some example code mimicking what {{sc.textFile}} does underneath: {code} scala> sc.textFile(path).collect()(0) res8: String = ?pple scala> sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text]).map(pair => pair._2.toString).collect()(0).getBytes() res9: Array[Byte] = Array(-17, -65, -67, 112, 112, 108, 101) scala> sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text]).map(pair => pair._2.getBytes).collect()(0) res10: Array[Byte] = Array(-60, 112, 112, 108, 101) {code} In the above example, the first two snippets show the string representation and byte representation of the example line of text. The string shows a question mark for the replacement character and the bytes reveal the replacement character has been swapped in by {{Text.toString}}. The third snippet shows what happens if you call {{getBytes}} on the {{Text}} object which comes back from hadoop land: we get the real bytes in the file out. Now, I think this is a bug, though you may disagree. The text inside my file is perfectly valid iso-8859-1 encoded bytes, which I would like to be able to rescue and re-encode into UTF-8, because I want my application to be smart like that. I think Spark should give me the raw broken string so I can re-encode, but I can't get at the original bytes in order to guess at what the source encoding might be, as they have already been replaced. I'm dealing with data from some CDN access logs which are, to put it nicely, diversely encoded, but I think this is a use case Spark should fully support. So, my suggested fix, on which I'd like some guidance, is to change {{textFile}} to spit out broken strings by not using {{Text}}'s UTF-8 encoding. Further compounding this issue is that my application is actually in PySpark, but we can talk about how bytes fly through to Scala land after this if we agree that this is an issue at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
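Picking up the "treat it as binary data" suggestion, one workable pattern today is a variation on the {{hadoopFile}} calls quoted above: keep the raw bytes and decode them yourself. A sketch, where the ISO-8859-1 charset is just this example's assumption about the data:

{code}
import java.nio.charset.Charset
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

// Copy only the valid region of the (reused) Text buffer, then decode with
// whatever charset the data is actually in, instead of Text.toString's UTF-8.
val lines = sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
  .map { case (_, text) =>
    val bytes = java.util.Arrays.copyOf(text.getBytes, text.getLength)
    new String(bytes, Charset.forName("ISO-8859-1"))
  }
{code}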
[jira] [Resolved] (SPARK-1787) Build failure on JDK8 :: SBT fails to load build configuration file
[ https://issues.apache.org/jira/browse/SPARK-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1787. -- Resolution: Duplicate FWIW SBT + Java 8 has worked fine for me on master for a long while, so assume this does not affect 1.1 or perhaps 1.0. Build failure on JDK8 :: SBT fails to load build configuration file --- Key: SPARK-1787 URL: https://issues.apache.org/jira/browse/SPARK-1787 Project: Spark Issue Type: New Feature Components: Build Affects Versions: 0.9.0 Environment: JDK8 Scala 2.10.X SBT 0.12.X Reporter: Richard Gomes Priority: Minor SBT fails to build under JDK8. Please find steps to reproduce the error below: (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ uname -a Linux terra 3.13-1-amd64 #1 SMP Debian 3.13.10-1 (2014-04-15) x86_64 GNU/Linux (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ java -version java version 1.8.0_05 Java(TM) SE Runtime Environment (build 1.8.0_05-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode) (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ scala -version Scala code runner version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ sbt/sbt clean Launching sbt from sbt/sbt-launch-0.12.4.jar Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=350m; support was removed in 8.0 [info] Loading project definition from /home/rgomes/workspace/spark-0.9.1/project/project [info] Compiling 1 Scala source to /home/rgomes/workspace/spark-0.9.1/project/project/target/scala-2.9.2/sbt-0.12/classes... [error] error while loading CharSequence, class file '/opt/developer/jdk1.8.0_05/jre/lib/rt.jar(java/lang/CharSequence.class)' is broken [error] (bad constant pool tag 15 at byte 1501) [error] error while loading Comparator, class file '/opt/developer/jdk1.8.0_05/jre/lib/rt.jar(java/util/Comparator.class)' is broken [error] (bad constant pool tag 15 at byte 5003) [error] two errors found [error] (compile:compile) Compilation failed Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? q -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1738) Is spark-debugger still available?
[ https://issues.apache.org/jira/browse/SPARK-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1738. -- Resolution: Fixed That document has since been deleted anyway, and I assume the answer is that it no longer exists. Is spark-debugger still available? -- Key: SPARK-1738 URL: https://issues.apache.org/jira/browse/SPARK-1738 Project: Spark Issue Type: Question Components: Documentation Reporter: WangTaoTheTonic Priority: Minor I see the arthur branch (https://github.com/apache/spark/tree/arthur) described in docs/spark-debugger.md does not exist. So is the spark-debugger still available? If not, should the document be deleted? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1605) Improve mllib.linalg.Vector
[ https://issues.apache.org/jira/browse/SPARK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1605. -- Resolution: Won't Fix Another WontFix then? Improve mllib.linalg.Vector --- Key: SPARK-1605 URL: https://issues.apache.org/jira/browse/SPARK-1605 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Sandeep Singh We can make current Vector a wrapper around Breeze.linalg.Vector ? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1573) slight modification with regards to sbt/sbt test
[ https://issues.apache.org/jira/browse/SPARK-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1573. -- Resolution: Won't Fix This has been resolved insofar as the main README.md no longer has this text. slight modification with regards to sbt/sbt test Key: SPARK-1573 URL: https://issues.apache.org/jira/browse/SPARK-1573 Project: Spark Issue Type: Documentation Components: Documentation Reporter: Nishkam Ravi When the sources are built against a certain Hadoop version with SPARK_YARN=true, the same settings seem necessary when running sbt/sbt test. For example: SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0 SPARK_YARN=true sbt/sbt assembly SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0 SPARK_YARN=true sbt/sbt test Otherwise build errors and failing tests are seen. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1479) building spark on 2.0.0-cdh4.4.0 failed
[ https://issues.apache.org/jira/browse/SPARK-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1479. -- Resolution: Won't Fix Given discussion in SPARK-3445, I doubt anything more will be done for YARN alpha support, as it's on its way out. building spark on 2.0.0-cdh4.4.0 failed --- Key: SPARK-1479 URL: https://issues.apache.org/jira/browse/SPARK-1479 Project: Spark Issue Type: Question Environment: 2.0.0-cdh4.4.0 Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL spark 0.9.1 java version 1.6.0_32 Reporter: jackielihf Attachments: mvn.log [INFO] [ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.5:compile (scala-compile-first) on project spark-yarn-alpha_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.1.5:compile failed. CompileFailed - [Help 1] org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.5:compile (scala-compile-first) on project spark-yarn-alpha_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.1.5:compile failed. at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:225) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59) at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183) at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161) at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320) at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156) at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537) at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196) at org.apache.maven.cli.MavenCli.main(MavenCli.java:141) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290) at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409) at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352) Caused by: org.apache.maven.plugin.PluginExecutionException: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.1.5:compile failed. at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:110) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209) ... 
19 more Caused by: Compilation failed at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:76) at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:35) at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:29) at sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply$mcV$sp(AggressiveCompile.scala:71) at sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:71) at sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:71) at sbt.compiler.AggressiveCompile.sbt$compiler$AggressiveCompile$$timed(AggressiveCompile.scala:101) at sbt.compiler.AggressiveCompile$$anonfun$4.compileScala$1(AggressiveCompile.scala:70) at sbt.compiler.AggressiveCompile$$anonfun$4.apply(AggressiveCompile.scala:88) at sbt.compiler.AggressiveCompile$$anonfun$4.apply(AggressiveCompile.scala:60) at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:24) at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:22) at sbt.inc.Incremental$.cycle(Incremental.scala:40) at sbt.inc.Incremental$.compile(Incremental.scala:25) at sbt.inc.IncrementalCompile$.apply(Compile.scala:20) at sbt.compiler.AggressiveCompile.compile2(AggressiveCompile.scala:96) at
[jira] [Commented] (SPARK-1409) Flaky Test: actor input stream test in org.apache.spark.streaming.InputStreamsSuite
[ https://issues.apache.org/jira/browse/SPARK-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169589#comment-14169589 ] Sean Owen commented on SPARK-1409: -- Since this test was removed with SPARK-2805, safe to call this closed? Flaky Test: actor input stream test in org.apache.spark.streaming.InputStreamsSuite - Key: SPARK-1409 URL: https://issues.apache.org/jira/browse/SPARK-1409 Project: Spark Issue Type: Bug Components: Streaming Reporter: Michael Armbrust Assignee: Tathagata Das Here are just a few cases: https://travis-ci.org/apache/spark/jobs/22151827 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13709/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1398) Remove FindBugs jsr305 dependency
[ https://issues.apache.org/jira/browse/SPARK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1398. -- Resolution: Won't Fix From the PR discussion, this had to be reverted because of some build problems, so I assume removing this .jar is a WontFix Remove FindBugs jsr305 dependency - Key: SPARK-1398 URL: https://issues.apache.org/jira/browse/SPARK-1398 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Mark Hamstra Assignee: Mark Hamstra Priority: Minor We're not making much use of FindBugs at this point, but findbugs-2.0.x is a drop-in replacement for 1.3.9 and does offer significant improvements (http://findbugs.sourceforge.net/findbugs2.html), so it's probably where we want to be for Spark 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1339) Build error: org.eclipse.paho:mqtt-client
[ https://issues.apache.org/jira/browse/SPARK-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1339. -- Resolution: Not a Problem Build error: org.eclipse.paho:mqtt-client - Key: SPARK-1339 URL: https://issues.apache.org/jira/browse/SPARK-1339 Project: Spark Issue Type: Bug Components: Build Affects Versions: 0.9.0 Reporter: Ken Williams Using Maven, I'm unable to build the 0.9.0 distribution I just downloaded. The Maven error is: {code} [ERROR] Failed to execute goal on project spark-examples_2.10: Could not resolve dependencies for project org.apache.spark:spark-examples_2.10:jar:0.9.0-incubating: Could not find artifact org.eclipse.paho:mqtt-client:jar:0.4.0 in nexus {code} My Maven version is 3.2.1, running on Java 1.7.0, using Scala 2.10.4. Is there an additional Maven repository I should add or something? If I go into the {{pom.xml}} and comment out the {{external/mqtt}} and {{examples}} modules, the build succeeds. I'm fine without the MQTT stuff, but I would really like to get the examples working because I haven't played with Spark before. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1317) sbt doesn't work for building Spark programs
[ https://issues.apache.org/jira/browse/SPARK-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169629#comment-14169629 ] Sean Owen commented on SPARK-1317: -- PS if you're still interested in this, I am pretty sure #1 is the correct answer. I would use my own sbt (or really, the SBT support in my IDE perhaps, or Maven) to build my own app. sbt doesn't work for building Spark programs Key: SPARK-1317 URL: https://issues.apache.org/jira/browse/SPARK-1317 Project: Spark Issue Type: Bug Components: Build, Documentation Affects Versions: 0.9.0 Reporter: Diana Carroll I don't know if this is a doc bug or a product bug, because I don't know how it is supposed to work. The Spark quick start guide page has a section that walks you through creating a standalone Spark app in Scala. I think the instructions worked in 0.8.1 but I can't get them to work in 0.9.0. The instructions have you create a directory structure in the canonical sbt format, but do not tell you where to locate this directory. However, after setting up the structure, the tutorial then instructs you to use the command {code}sbt/sbt package{code} which implies that the working directory must be SPARK_HOME. I tried it both ways: creating a mysparkapp directory right in SPARK_HOME and creating it in my home directory. Neither worked, with different results: - if I create a mysparkapp directory as instructed in SPARK_HOME, cd to SPARK_HOME and run the command sbt/sbt package as specified, it packages ALL of Spark...but does not build my own app. - if I create a mysparkapp directory elsewhere, cd to that directory, and run the command there, I get an error: {code} $SPARK_HOME/sbt/sbt package awk: cmd. line:1: fatal: cannot open file `./project/build.properties' for reading (No such file or directory) Attempting to fetch sbt /usr/lib/spark/sbt/sbt: line 33: sbt/sbt-launch-.jar: No such file or directory /usr/lib/spark/sbt/sbt: line 33: sbt/sbt-launch-.jar: No such file or directory Our attempt to download sbt locally to sbt/sbt-launch-.jar failed. Please install sbt manually from http://www.scala-sbt.org/ {code} So, either: 1: the Spark distribution of sbt can only be used to build Spark itself, not your own code...in which case the quick start guide is wrong, and should instead say that users should install sbt separately OR 2: the Spark distribution of sbt CAN be used, with proper configuration, in which case that configuration should be documented (I wasn't able to figure it out, but I didn't try that hard either) OR 3: the Spark distribution of sbt is *supposed* to be able to build Spark apps, but is configured incorrectly in the product, in which case there's a product bug rather than a doc bug Although this is not a show-stopper, because the obvious workaround is to simply install sbt separately, I think at least updating the docs is pretty high priority, because most people learning Spark start with that Quick Start page, which doesn't work. (If it's doc issue #1, let me know, and I'll fix the docs myself. :-) ) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
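For what it's worth, going with option #1 and a separately installed sbt needs nothing more than a small build file in the app directory. A minimal sketch for the 0.9.x line described in the issue (the exact name, versions, and the Akka resolver are assumptions drawn from my recollection of the 0.9 quick start; adjust as needed):

{code}
// mysparkapp/build.sbt -- built with a standalone sbt install, not Spark's sbt/sbt
name := "My Spark App"

version := "0.1"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
{code}

With that in place, running plain {{sbt package}} from the app directory produces the application jar, independent of SPARK_HOME.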
[jira] [Resolved] (SPARK-1243) spark compilation error
[ https://issues.apache.org/jira/browse/SPARK-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1243. -- Resolution: Fixed This appears to be long since resolved by something else, perhaps a subsequent change to Jetty deps. I have never seen this personally, and Jenkins builds are fine. spark compilation error --- Key: SPARK-1243 URL: https://issues.apache.org/jira/browse/SPARK-1243 Project: Spark Issue Type: Bug Components: Build Reporter: Qiuzhuang Lian After issuing git pull from git master, spark could not compile any longer Here is the error message, it seems that it is related to jetty upgrade.@rxin compile [info] Compiling 301 Scala sources and 19 Java sources to E:\projects\amplab\spark\core\target\scala-2.10\classes... [warn] Class java.nio.channels.ReadPendingException not found - continuing with a stub. [error] [error] while compiling: E:\projects\amplab\spark\core\src\main\scala\org\apache\spark\HttpServer.scala [error] during phase: erasure [error] library version: version 2.10.3 [error] compiler version: version 2.10.3 [error] reconstructed args: -Xmax-classfile-name 120 -deprecation -bootclasspath C:\Java\jdk1.6.0_27\jre\lib\resources.jar;C:\Java\jdk1.6.0_27\jre\lib\rt.jar;C:\Java\jdk1.6.0_27\jre\lib\sunrsasign.jar;C:\Java\jdk1.6.0_27\jre\lib\jsse.jar;C:\Java\jdk1.6.0_27\jre\lib\jce.jar;C:\Java\jdk1.6.0_27\jre\lib\charsets.jar;C:\Java\jdk1.6.0_27\jre\lib\modules\jdk.boot.jar;C:\Java\jdk1.6.0_27\jre\classes;C:\Users\Kand\.sbt\boot\scala-2.10.3\lib\scala-library.jar -unchecked -classpath
[jira] [Resolved] (SPARK-1306) no instructions provided for sbt assembly with Hadoop 2.2
[ https://issues.apache.org/jira/browse/SPARK-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1306. -- Resolution: Fixed I think this was obviated by subsequent changes to this documentation. SBT is no longer the focus, but building-spark.md now has more comprehensive documentation on building with YARN, including these recent versions. no instructions provided for sbt assembly with Hadoop 2.2 - Key: SPARK-1306 URL: https://issues.apache.org/jira/browse/SPARK-1306 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 0.9.0 Reporter: Diana Carroll On the running-on-yarn.html page, in the section "Building a YARN-Enabled Assembly JAR", only the instructions for building for old Hadoop (2.0.5) are provided. There's a comment that "The build process now also supports new YARN versions (2.2.x). See below." However, the only mention below is a single sentence which says "See Building Spark with Maven for instructions on how to build Spark using the Maven process." There are no instructions for building with sbt. This is different from prior versions of the docs, in which a whole paragraph was provided. I'd like to see the command line to build for Hadoop 2.2 included right at the top of the page. Also remove the bit about how it is now supported. Hadoop 2.2 is now the norm, no longer an exception, as I see it. Unfortunately I'm not sure exactly what the command should be. I tried this, but got errors: SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1234) clean up typos and grammar issues in Spark on YARN page
[ https://issues.apache.org/jira/browse/SPARK-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1234. -- Resolution: Won't Fix Given the discussion in https://github.com/apache/spark/pull/130 , this was abandoned, but I also don't see the bad text on that page anymore anyhow. It probably got improved in a subsequent update. clean up typos and grammar issues in Spark on YARN page --- Key: SPARK-1234 URL: https://issues.apache.org/jira/browse/SPARK-1234 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 0.9.0 Reporter: Diana Carroll Priority: Minor The "Launch spark application with yarn-client mode" section of this page has several incomplete sentences, typos, etc.: http://spark.incubator.apache.org/docs/latest/running-on-yarn.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1192) Around 30 parameters in Spark are used but undocumented and some are having confusing name
[ https://issues.apache.org/jira/browse/SPARK-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169649#comment-14169649 ] Sean Owen commented on SPARK-1192: -- The PR is actually at https://github.com/apache/spark/pull/2312 and is misnamed. Is this still live though? Around 30 parameters in Spark are used but undocumented and some are having confusing name -- Key: SPARK-1192 URL: https://issues.apache.org/jira/browse/SPARK-1192 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.0.0 Reporter: Nan Zhu Assignee: Nan Zhu I grepped the code in the core component and found that around 30 parameters in the implementation are actually used but undocumented. By reading the source code, I found that some of them are actually very useful for the user. I suggest making a complete document on the parameters. Also, some parameters have confusing names: spark.shuffle.copier.threads - this parameter controls how many threads you will use when you start a Netty-based shuffle service, but from the name, we cannot get this information; spark.shuffle.sender.port - a similar problem to the above one: when you use a Netty-based shuffle receiver, you will have to set up a Netty-based sender...this parameter sets the port used by the Netty sender, but the name cannot convey this information -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1149) Bad partitioners can cause Spark to hang
[ https://issues.apache.org/jira/browse/SPARK-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1149. -- Resolution: Fixed Looks like Patrick merged this into master in March. It might have been fixed for ... 1.0? Bad partitioners can cause Spark to hang Key: SPARK-1149 URL: https://issues.apache.org/jira/browse/SPARK-1149 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Bryn Keller Priority: Minor While implementing a unit test for lookup, I accidentally created a situation where a partitioner returned a partition number that was outside its range. It should have returned 0 or 1, but in the last case, it returned a -1. Rather than reporting the problem via an exception, Spark simply hangs during the unit test run. We should catch this bad behavior by partitioners and throw an exception. test("lookup with bad partitioner") { val pairs = sc.parallelize(Array((1,2), (3,4), (5,6), (5,7))) val p = new Partitioner { def numPartitions: Int = 2 def getPartition(key: Any): Int = key.hashCode() % 2 } val shuffled = pairs.partitionBy(p) assert(shuffled.partitioner === Some(p)) assert(shuffled.lookup(1) === Seq(2)) assert(shuffled.lookup(5) === Seq(6,7)) assert(shuffled.lookup(-1) === Seq()) } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
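As a rough illustration of the check being asked for (a sketch of mine, not the change that was merged), the partition index returned by a user-supplied {{Partitioner}} could be validated before it is used, so a bad value fails fast instead of hanging the job:
{code}
import org.apache.spark.Partitioner

// Hypothetical helper: reject out-of-range partition indices up front.
def checkedPartition(p: Partitioner, key: Any): Int = {
  val part = p.getPartition(key)
  require(part >= 0 && part < p.numPartitions,
    s"Partitioner returned $part for key $key, outside [0, ${p.numPartitions})")
  part
}
{code}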
[jira] [Resolved] (SPARK-1083) Build fail
[ https://issues.apache.org/jira/browse/SPARK-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1083. -- Resolution: Cannot Reproduce This looks like a git error, and is ancient at this point. I presume that since we have evidence that Windows builds subsequently worked, this was either a local problem or fixed by something else. Build fail -- Key: SPARK-1083 URL: https://issues.apache.org/jira/browse/SPARK-1083 Project: Spark Issue Type: Bug Components: Build, Windows Affects Versions: 0.7.3 Reporter: Jan Paw Problem with building the latest version from github. {code:none}[info] Loading project definition from C:\Users\Jan\Documents\GitHub\incubator-s park\project\project [debug] [debug] Initial source changes: [debug] removed:Set() [debug] added: Set() [debug] modified: Set() [debug] Removed products: Set() [debug] Modified external sources: Set() [debug] Modified binary dependencies: Set() [debug] Initial directly invalidated sources: Set() [debug] [debug] Sources indirectly invalidated by: [debug] product: Set() [debug] binary dep: Set() [debug] external source: Set() [debug] All initially invalidated sources: Set() [debug] Copy resource mappings: [debug] java.lang.RuntimeException: Nonzero exit code (128): git clone https://github.co m/chenkelmann/junit_xml_listener.git C:\Users\Jan\.sbt\0.13\staging\5f76b43a3aca 87b5c013\junit_xml_listener at scala.sys.package$.error(package.scala:27) at sbt.Resolvers$.run(Resolvers.scala:134) at sbt.Resolvers$.run(Resolvers.scala:123) at sbt.Resolvers$$anon$2.clone(Resolvers.scala:78) at sbt.Resolvers$DistributedVCS$$anonfun$toResolver$1$$anonfun$apply$11$ $anonfun$apply$5.apply$mcV$sp(Resolvers.scala:104) at sbt.Resolvers$.creates(Resolvers.scala:141) at sbt.Resolvers$DistributedVCS$$anonfun$toResolver$1$$anonfun$apply$11. apply(Resolvers.scala:103) at sbt.Resolvers$DistributedVCS$$anonfun$toResolver$1$$anonfun$apply$11. 
apply(Resolvers.scala:103) at sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$3.apply(Bui ldLoader.scala:90) at sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$3.apply(Bui ldLoader.scala:89) at scala.Option.map(Option.scala:145) at sbt.BuildLoader$$anonfun$componentLoader$1.apply(BuildLoader.scala:89 ) at sbt.BuildLoader$$anonfun$componentLoader$1.apply(BuildLoader.scala:85 ) at sbt.MultiHandler.apply(BuildLoader.scala:16) at sbt.BuildLoader.apply(BuildLoader.scala:142) at sbt.Load$.loadAll(Load.scala:314) at sbt.Load$.loadURI(Load.scala:266) at sbt.Load$.load(Load.scala:262) at sbt.Load$.load(Load.scala:253) at sbt.Load$.apply(Load.scala:137) at sbt.Load$.buildPluginDefinition(Load.scala:597) at sbt.Load$.buildPlugins(Load.scala:563) at sbt.Load$.plugins(Load.scala:551) at sbt.Load$.loadUnit(Load.scala:412) at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:258) at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:258) at sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$ apply$5$$anonfun$apply$6.apply(BuildLoader.scala:93) at sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$ apply$5$$anonfun$apply$6.apply(BuildLoader.scala:92) at sbt.BuildLoader.apply(BuildLoader.scala:143) at sbt.Load$.loadAll(Load.scala:314) at sbt.Load$.loadURI(Load.scala:266) at sbt.Load$.load(Load.scala:262) at sbt.Load$.load(Load.scala:253) at sbt.Load$.apply(Load.scala:137) at sbt.Load$.defaultLoad(Load.scala:40) at sbt.BuiltinCommands$.doLoadProject(Main.scala:451) at sbt.BuiltinCommands$$anonfun$loadProjectImpl$2.apply(Main.scala:445) at sbt.BuiltinCommands$$anonfun$loadProjectImpl$2.apply(Main.scala:445) at sbt.Command$$anonfun$applyEffect$1$$anonfun$apply$2.apply(Command.sca la:60) at sbt.Command$$anonfun$applyEffect$1$$anonfun$apply$2.apply(Command.sca la:60) at sbt.Command$$anonfun$applyEffect$2$$anonfun$apply$3.apply(Command.sca la:62) at sbt.Command$$anonfun$applyEffect$2$$anonfun$apply$3.apply(Command.sca la:62) at sbt.Command$.process(Command.scala:95) at sbt.MainLoop$$anonfun$1$$anonfun$apply$1.apply(MainLoop.scala:100) at sbt.MainLoop$$anonfun$1$$anonfun$apply$1.apply(MainLoop.scala:100) at sbt.State$$anon$1.process(State.scala:179) at sbt.MainLoop$$anonfun$1.apply(MainLoop.scala:100) at sbt.MainLoop$$anonfun$1.apply(MainLoop.scala:100) at
[jira] [Resolved] (SPARK-1017) Set the permgen even if we are calling the users sbt (via SBT_OPTS)
[ https://issues.apache.org/jira/browse/SPARK-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1017. -- Resolution: Won't Fix As I understand, only {{sbt/sbt}} is supported for building Spark with SBT, rather than a local {{sbt}}. Maven is the primary build, and it sets {{MaxPermSize}} and {{PermGen}} for scalac and scalatest. I think this is obsolete and/or already covered then? Set the permgen even if we are calling the users sbt (via SBT_OPTS) --- Key: SPARK-1017 URL: https://issues.apache.org/jira/browse/SPARK-1017 Project: Spark Issue Type: Improvement Reporter: Patrick Cogan Assignee: Patrick Cogan Now we will call the user's sbt installation if they have one. But users might run into permgen issues... so we should force the permgen unless the user explicitly overrides it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1463) cleanup unnecessary dependency jars in the spark assembly jars
[ https://issues.apache.org/jira/browse/SPARK-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169709#comment-14169709 ] Sean Owen commented on SPARK-1463: -- FWIW I do not see these packages in the final assembly JAR anymore. This may be obsolete? cleanup unnecessary dependency jars in the spark assembly jars -- Key: SPARK-1463 URL: https://issues.apache.org/jira/browse/SPARK-1463 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 0.9.0 Reporter: Jenny MA Priority: Minor Labels: easyfix Fix For: 1.0.0 There are a couple of GPL/LGPL-based dependencies which are included in the final assembly jar, which are not used by the Spark runtime. I identified the following libraries. We can provide a fix in assembly/pom.xml: <exclude>com.google.code.findbugs:*</exclude> <exclude>org.acplt:oncrpc:*</exclude> <exclude>glassfish:*</exclude> <exclude>com.cenqua.clover:clover:*</exclude> <exclude>org.glassfish:*</exclude> <exclude>org.glassfish.grizzly:*</exclude> <exclude>org.glassfish.gmbal:*</exclude> <exclude>org.glassfish.external:*</exclude> -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1010) Update all unit tests to use SparkConf instead of system properties
[ https://issues.apache.org/jira/browse/SPARK-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169724#comment-14169724 ] Sean Owen commented on SPARK-1010: -- Yes, lots of usage in tests still. A lot looks intentional. {code} find . -name "*Suite.scala" -type f -exec grep -E "System\.[gs]etProperty" {} \; ... .format(System.getProperty("user.name", "unknown"), .format(System.getProperty("user.name", "unknown")).stripMargin System.setProperty("spark.testing", "true") System.setProperty("spark.reducer.maxMbInFlight", "1") System.setProperty("spark.storage.memoryFraction", "0.0001") System.setProperty("spark.storage.memoryFraction", "0.01") System.setProperty("spark.authenticate", "false") System.setProperty("spark.authenticate", "false") System.setProperty("spark.shuffle.manager", "hash") System.setProperty("spark.scheduler.mode", "FIFO") System.setProperty("spark.scheduler.mode", "FAIR") ... {code} Update all unit tests to use SparkConf instead of system properties --- Key: SPARK-1010 URL: https://issues.apache.org/jira/browse/SPARK-1010 Project: Spark Issue Type: New Feature Affects Versions: 0.9.0 Reporter: Patrick Cogan Assignee: Nirmal Priority: Minor Labels: starter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
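For comparison, the SparkConf-based equivalent of the {{System.setProperty}} calls above would look something like this (a sketch; the property names are copied from the grep output, and whether a given test can switch depends on when the setting is read):
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Configuration scoped to one test's SparkContext instead of JVM-wide system properties.
val conf = new SparkConf()
  .setMaster("local")
  .setAppName("test")
  .set("spark.testing", "true")
  .set("spark.scheduler.mode", "FIFO")
val sc = new SparkContext(conf)
{code}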
[jira] [Commented] (SPARK-1209) SparkHadoopUtil should not use package org.apache.hadoop
[ https://issues.apache.org/jira/browse/SPARK-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169835#comment-14169835 ] Sean Owen commented on SPARK-1209: -- Yes, I wonder too: do SparkHadoopMapRedUtil and SparkHadoopMapReduceUtil need to live in {{org.apache.hadoop}} anymore? I assume they may have in the past to access some package-private Hadoop code. But I've tried moving them under {{org.apache.spark}} and compiling against a few Hadoop versions and it all seems fine. Am I missing something or is this worth changing? It's private to Spark (well, org.apache right now by necessity), so I think it's fair game to move. See https://github.com/srowen/spark/tree/SPARK-1209 SparkHadoopUtil should not use package org.apache.hadoop Key: SPARK-1209 URL: https://issues.apache.org/jira/browse/SPARK-1209 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: Sandy Pérez González Assignee: Mark Grover It's private, so the change won't break compatibility -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3934) RandomForest bug in sanity check in DTStatsAggregator
[ https://issues.apache.org/jira/browse/SPARK-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170220#comment-14170220 ] Sean Owen commented on SPARK-3934: -- Yep that fixes the issue I was seeing. Thanks! I can confirm it did not affect DecisionTree too, so it seems to match your analysis. RandomForest bug in sanity check in DTStatsAggregator - Key: SPARK-3934 URL: https://issues.apache.org/jira/browse/SPARK-3934 Project: Spark Issue Type: Bug Components: MLlib Reporter: Joseph K. Bradley Assignee: Joseph K. Bradley When run with a mix of unordered categorical and continuous features, on multiclass classification, RandomForest fails. The bug is in the sanity checks in getFeatureOffset and getLeftRightFeatureOffsets, which use the wrong indices for checking whether features are unordered. Proposal: Remove the sanity checks since they are not really needed, and since they would require DTStatsAggregator to keep track of an extra set of indices (for the feature subset). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3895) Scala style: Indentation of method
[ https://issues.apache.org/jira/browse/SPARK-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170609#comment-14170609 ] Sean Owen commented on SPARK-3895: -- The rule is already in the style guide. There seems to be agreement not to change this all at once. The original PR seemed to contradict the style guide. What change are you proposing in order to reopen this? Scala style: Indentation of method -- Key: SPARK-3895 URL: https://issues.apache.org/jira/browse/SPARK-3895 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: sjk {code:title=core/src/main/scala/org/apache/spark/Aggregator.scala|borderStyle=solid} // for example def combineCombinersByKey(iter: Iterator[_ <: Product2[K, C]], context: TaskContext) : Iterator[(K, C)] = { ... def combineValuesByKey(iter: Iterator[_ <: Product2[K, V]], context: TaskContext): Iterator[(K, C)] = { {code} These do not conform to the rule: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide There is a lot of code like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
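For what it's worth, a self-contained illustration of the declaration style the guide recommends (my own example, not code from Aggregator.scala): parameters on their own lines with a four-space indent, and the return type kept on the line that closes the parameter list.
{code}
def combineValuesByKey[K, V, C](
    iter: Iterator[(K, V)],
    createCombiner: V => C,
    mergeValue: (C, V) => C): Iterator[(K, C)] = {
  val combiners = scala.collection.mutable.Map.empty[K, C]
  for ((k, v) <- iter) {
    combiners(k) = combiners.get(k).map(mergeValue(_, v)).getOrElse(createCombiner(v))
  }
  combiners.iterator
}
{code}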
[jira] [Commented] (SPARK-1209) SparkHadoopUtil should not use package org.apache.hadoop
[ https://issues.apache.org/jira/browse/SPARK-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170617#comment-14170617 ] Sean Owen commented on SPARK-1209: -- Hm, it's {{private[apache]}} though. Couldn't this only be used by people writing in the {{org.apache}} namespace? Naturally a project might do just this to access this code, but I hadn't thought this was promised as a stable API. People who pull this trick can, I suppose, declare their hack in {{org.apache.spark}}, although that's a source change. I can set up a forwarder and deprecate it to see how that looks, but wanted to check if it's really these classes in question that are being used outside Spark. SparkHadoopUtil should not use package org.apache.hadoop Key: SPARK-1209 URL: https://issues.apache.org/jira/browse/SPARK-1209 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: Sandy Ryza Assignee: Mark Grover It's private, so the change won't break compatibility -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
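A sketch of how such a forwarder-plus-deprecation could look (all names here are hypothetical; this only illustrates the pattern, not the actual change): the trait's real home moves under org.apache.spark, and an empty, deprecated subtype stays behind in the old package so existing code keeps compiling with a warning.
{code}
// One source file, two packages; every name below is made up for illustration.
package org.apache.spark.mapred {
  trait HypotheticalMapRedUtil {
    def newTaskName(prefix: String, id: Int): String = prefix + "_" + id
  }
}

package org.apache.hadoop.mapred {
  @deprecated("Use org.apache.spark.mapred.HypotheticalMapRedUtil instead", "1.2.0")
  trait HypotheticalMapRedUtil extends org.apache.spark.mapred.HypotheticalMapRedUtil
}
{code}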
[jira] [Resolved] (SPARK-3480) Throws out Not a valid command 'yarn-alpha/scalastyle' in dev/scalastyle for sbt build tool during 'Running Scala style checks'
[ https://issues.apache.org/jira/browse/SPARK-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3480. -- Resolution: Cannot Reproduce Throws out Not a valid command 'yarn-alpha/scalastyle' in dev/scalastyle for sbt build tool during 'Running Scala style checks' --- Key: SPARK-3480 URL: https://issues.apache.org/jira/browse/SPARK-3480 Project: Spark Issue Type: Bug Components: Build Reporter: Yi Zhou Priority: Minor Symptom: Run ./dev/run-tests and it dumps output like the following: SBT_MAVEN_PROFILES_ARGS=-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl [Warn] Java 8 tests will not run because JDK version is 1.8. = Running Apache RAT checks = RAT checks passed. = Running Scala style checks = Scalastyle checks failed at following occurrences: [error] Expected ID character [error] Not a valid command: yarn-alpha [error] Expected project ID [error] Expected configuration [error] Expected ':' (if selecting a configuration) [error] Expected key [error] Not a valid key: yarn-alpha [error] yarn-alpha/scalastyle [error] ^ Possible Cause: I checked dev/scalastyle and found that there are 2 parameters, 'yarn-alpha/scalastyle' and 'yarn/scalastyle', like echo -e "q\n" | sbt/sbt -Pyarn -Phadoop-0.23 -Dhadoop.version=0.23.9 yarn-alpha/scalastyle \ >> scalastyle.txt echo -e "q\n" | sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 yarn/scalastyle \ >> scalastyle.txt From the above error message, sbt seems to complain about them due to the '/' separator. It runs through after I manually modify the original ones to 'yarn-alpha:scalastyle' and 'yarn:scalastyle'. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3926) result of JavaRDD collectAsMap() is not serializable
[ https://issues.apache.org/jira/browse/SPARK-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171418#comment-14171418 ] Sean Owen commented on SPARK-3926: -- Oops, embarrassed to say I didn't realize {{WrapAsJava.scala}} is a Scala class. Can't change that. This requires subclassing {{MapWrapper}} to add {{java.io.Serializable}}. It still basically seems worthwhile to support this use case, so I'll propose it as a PR. result of JavaRDD collectAsMap() is not serializable Key: SPARK-3926 URL: https://issues.apache.org/jira/browse/SPARK-3926 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.1.0 Environment: CentOS / Spark 1.1 / Hadoop Hortonworks 2.4.0.2.1.2.0-402 Reporter: Antoine Amend Using the Java API, I want to collect the result of a RDDString, String as a HashMap using collectAsMap function: MapString, String map = myJavaRDD.collectAsMap(); This works fine, but when passing this map to another function, such as... myOtherJavaRDD.mapToPair(new CustomFunction(map)) ...this leads to the following error: Exception in thread main org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158) at org.apache.spark.SparkContext.clean(SparkContext.scala:1242) at org.apache.spark.rdd.RDD.map(RDD.scala:270) at org.apache.spark.api.java.JavaRDDLike$class.mapToPair(JavaRDDLike.scala:99) at org.apache.spark.api.java.JavaPairRDD.mapToPair(JavaPairRDD.scala:44) ../.. MY CLASS ../.. at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.io.NotSerializableException: scala.collection.convert.Wrappers$MapWrapper at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42) at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73) at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164) This seems to be due to WrapAsJava.scala being non serializable ../.. implicit def mapAsJavaMap[A, B](m: Map[A, B]): ju.Map[A, B] = m match { //case JConcurrentMapWrapper(wrapped) = wrapped case JMapWrapper(wrapped) = wrapped.asInstanceOf[ju.Map[A, B]] case _ = new MapWrapper(m) } ../.. 
The workaround is to manually wrap this map into another, serializable one: Map<String, String> map = myJavaRDD.collectAsMap(); Map<String, String> tmp = new HashMap<String, String>(map); myOtherJavaRDD.mapToPair(new CustomFunction(tmp)) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
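A sketch of the subclassing idea from the comment above (a minimal version with made-up names, not the actual PR): a Java-facing map view that is also {{java.io.Serializable}}, which {{collectAsMap()}} could return instead of the plain Scala wrapper.
{code}
import java.{util => ju}

// Minimal serializable wrapper exposing a Scala Map through java.util.Map.
class SerializableMapWrapper[A, B](underlying: scala.collection.Map[A, B])
  extends ju.AbstractMap[A, B] with java.io.Serializable {

  override def entrySet(): ju.Set[ju.Map.Entry[A, B]] = {
    val set = new ju.LinkedHashSet[ju.Map.Entry[A, B]]()
    underlying.foreach { case (k, v) =>
      set.add(new ju.AbstractMap.SimpleImmutableEntry[A, B](k, v))
    }
    set
  }
}
{code}
Of course this only serializes usefully if the wrapped keys, values, and underlying Scala map are themselves serializable.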
[jira] [Commented] (SPARK-3955) Different versions between jackson-mapper-asl and jackson-core-asl
[ https://issues.apache.org/jira/browse/SPARK-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14172148#comment-14172148 ] Sean Owen commented on SPARK-3955: -- Looks like Jackson is managed to version 1.8.8 for Avro reasons. I think core just needs to be managed the same way. I'll try it locally to make sure that works. Different versions between jackson-mapper-asl and jackson-core-asl -- Key: SPARK-3955 URL: https://issues.apache.org/jira/browse/SPARK-3955 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 1.1.0 Reporter: Jongyoul Lee The parent pom.xml specifies a version of jackson-mapper-asl. This is used by sql/hive/pom.xml. When mvn assembly runs, however, the jackson-mapper-asl version is not the same as the jackson-core-asl version. This is because other libraries use several versions of jackson, so another version of jackson-core-asl is assembled. This problem is fixed simply if pom.xml specifies the jackson-core-asl version as well. If it's not set, version 1.9.11 is merged into assembly.jar and we cannot use the Jackson library properly. {code} [INFO] Including org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8 in the shaded jar. [INFO] Including org.codehaus.jackson:jackson-core-asl:jar:1.9.11 in the shaded jar. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1209) SparkHadoopUtil should not use package org.apache.hadoop
[ https://issues.apache.org/jira/browse/SPARK-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14172571#comment-14172571 ] Sean Owen commented on SPARK-1209: -- ... and why wouldn't you, that's the title of the JIRA, oops. It's not that class that moves or even changes actually, and yes it should not move. Let me fix the title and fix my PR too. Maybe that's a more palatable change. SparkHadoopUtil should not use package org.apache.hadoop Key: SPARK-1209 URL: https://issues.apache.org/jira/browse/SPARK-1209 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: Sandy Ryza Assignee: Mark Grover It's private, so the change won't break compatibility -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173618#comment-14173618 ] Sean Owen commented on SPARK-3431: -- Yes, that should be what scalatest does. It is a fork of an old surefire, so it only has very few options. This parallelization failed as above for a few reasons. I have not gotten surefire to run the Scala tests. Parallelize execution of tests -- Key: SPARK-3431 URL: https://issues.apache.org/jira/browse/SPARK-3431 Project: Spark Issue Type: Improvement Components: Build Reporter: Nicholas Chammas Running all the tests in {{dev/run-tests}} takes up to 2 hours. A common strategy to cut test time down is to parallelize the execution of the tests. Doing that may in turn require some prerequisite changes to be made to how certain tests run. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS
[ https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175377#comment-14175377 ] Sean Owen commented on SPARK-2426: -- Regarding licensing, if the code is BSD licensed then it does not require an entry in NOTICE file (it's a Category A license), and entries shouldn't be added to NOTICE unless required. I believe that in this case we will need to reproduce the text of the license in LICENSE since it will not be included otherwise from a Maven artifact. So I suggest: don't change NOTICE, and move the license in LICENSE up to the section where other licenses are reproduced in full. It's a complex issue but this is my best understanding of the right thing to do. Quadratic Minimization for MLlib ALS Key: SPARK-2426 URL: https://issues.apache.org/jira/browse/SPARK-2426 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.0.0 Reporter: Debasish Das Assignee: Debasish Das Original Estimate: 504h Remaining Estimate: 504h Current ALS supports least squares and nonnegative least squares. I presented ADMM and IPM based Quadratic Minimization solvers to be used for the following ALS problems: 1. ALS with bounds 2. ALS with L1 regularization 3. ALS with Equality constraint and bounds Initial runtime comparisons are presented at Spark Summit. http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark Based on Xiangrui's feedback I am currently comparing the ADMM based Quadratic Minimization solvers with IPM based QpSolvers and the default ALS/NNLS. I will keep updating the runtime comparison results. For integration the detailed plan is as follows: 1. Add QuadraticMinimizer and Proximal algorithms in mllib.optimization 2. Integrate QuadraticMinimizer in mllib ALS -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3967) Spark applications fail in yarn-cluster mode when the directories configured in yarn.nodemanager.local-dirs are located on different disks/partitions
[ https://issues.apache.org/jira/browse/SPARK-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175748#comment-14175748 ] Sean Owen commented on SPARK-3967: -- You guys should make PRs for these. I am also not sure if it's so necessary to download the file into a temp directory and move it... it may cause a copy instead of rename, and in fact does here, so it is not as if the file appears in the target dir atomically anyway. I'm not sure the code here cleans up the partially downloaded file in case of error, and that could leave a broken file in the target dir instead of just a temp dir. The change to not copy the file when identical looks sound; I bet you can avoid checking if it exists twice. Spark applications fail in yarn-cluster mode when the directories configured in yarn.nodemanager.local-dirs are located on different disks/partitions - Key: SPARK-3967 URL: https://issues.apache.org/jira/browse/SPARK-3967 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Christophe PRÉAUD Attachments: spark-1.1.0-utils-fetch.patch, spark-1.1.0-yarn_cluster_tmpdir.patch Spark applications fail from time to time in yarn-cluster mode (but not in yarn-client mode) when yarn.nodemanager.local-dirs (Hadoop YARN config) is set to a comma-separated list of directories which are located on different disks/partitions. Steps to reproduce: 1. Set yarn.nodemanager.local-dirs (in yarn-site.xml) to a list of directories located on different partitions (the more you set, the more likely it will be to reproduce the bug): (...) <property> <name>yarn.nodemanager.local-dirs</name> <value>file:/d1/yarn/local/nm-local-dir,file:/d2/yarn/local/nm-local-dir,file:/d3/yarn/local/nm-local-dir,file:/d4/yarn/local/nm-local-dir,file:/d5/yarn/local/nm-local-dir,file:/d6/yarn/local/nm-local-dir,file:/d7/yarn/local/nm-local-dir</value> </property> (...) 2. Launch (several times) an application in yarn-cluster mode; it will fail (apparently randomly) from time to time -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
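On the "avoid checking if it exists twice" point, the idea is something like the following (a sketch using Guava's file utilities, with hypothetical helper names; it is not the contents of the attached patches):
{code}
import java.io.File
import com.google.common.io.Files

// Copy source over target only when the target is missing or differs,
// and stat/compare the target a single time.
def copyIfNeeded(source: File, target: File): Unit = {
  val alreadyIdentical =
    target.isFile && target.length() == source.length() && Files.equal(source, target)
  if (!alreadyIdentical) {
    Files.copy(source, target)
  }
}
{code}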
[jira] [Commented] (SPARK-4007) EOF exception to load an JavaRDD from HDFS
[ https://issues.apache.org/jira/browse/SPARK-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176824#comment-14176824 ] Sean Owen commented on SPARK-4007: -- Is this going to have any information associated to it? JavaRDD works fine with HDFS. EOF exception to load an JavaRDD from HDFS --- Key: SPARK-4007 URL: https://issues.apache.org/jira/browse/SPARK-4007 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Environment: hadoop-client-2.30 hadoop-hdfs-2.30 spark-core-1.10 spark-mllib-1.10 Reporter: Cristian Galán -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4002) JavaKafkaStreamSuite.testKafkaStream fails on OSX
[ https://issues.apache.org/jira/browse/SPARK-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176995#comment-14176995 ] Sean Owen commented on SPARK-4002: -- FWIW It doesn't fail for me from master right now if I run dev/run-tests. I'm on OS X Yosemite now (10.10) JavaKafkaStreamSuite.testKafkaStream fails on OSX - Key: SPARK-4002 URL: https://issues.apache.org/jira/browse/SPARK-4002 Project: Spark Issue Type: Bug Environment: Mac OSX 10.9.5. Reporter: Ryan Williams [~sowen] mentioned this on spark-dev [here|http://mail-archives.apache.org/mod_mbox/spark-dev/201409.mbox/%3ccamassdjs0fmsdc-k-4orgbhbfz2vvrmm0hfyifeeal-spft...@mail.gmail.com%3E] and I just reproduced it on {{master}} ([7e63bb4|https://github.com/apache/spark/commit/7e63bb49c526c3f872619ae14e4b5273f4c535e9]). The relevant output I get when running {{./dev/run-tests}} is: {code} [info] KafkaStreamSuite: [info] - Kafka input stream [info] Test run started [info] Test org.apache.spark.streaming.kafka.JavaKafkaStreamSuite.testKafkaStream started [error] Test org.apache.spark.streaming.kafka.JavaKafkaStreamSuite.testKafkaStream failed: junit.framework.AssertionFailedError: expected:3 but was:0 [error] at junit.framework.Assert.fail(Assert.java:50) [error] at junit.framework.Assert.failNotEquals(Assert.java:287) [error] at junit.framework.Assert.assertEquals(Assert.java:67) [error] at junit.framework.Assert.assertEquals(Assert.java:199) [error] at junit.framework.Assert.assertEquals(Assert.java:205) [error] at org.apache.spark.streaming.kafka.JavaKafkaStreamSuite.testKafkaStream(JavaKafkaStreamSuite.java:129) [error] ... [info] Test run finished: 1 failed, 0 ignored, 1 total, 19.798s {code} Seems like this test should be {{@Ignore}}'d, or some note about this made on the {{README.md}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4018) RDD.reduce failing with java.lang.ClassCastException: org.apache.spark.SparkContext$$anonfun$26 cannot be cast to scala.Function2
[ https://issues.apache.org/jira/browse/SPARK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177592#comment-14177592 ] Sean Owen commented on SPARK-4018: -- Your sample code is Java, but the error seems to concern the Scala API. Are you sure the exception occurs on this invocation? Does it compile and then fail at runtime? or are you operating just in the shell? RDD.reduce failing with java.lang.ClassCastException: org.apache.spark.SparkContext$$anonfun$26 cannot be cast to scala.Function2 - Key: SPARK-4018 URL: https://issues.apache.org/jira/browse/SPARK-4018 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.1 Reporter: Haithem Turki Priority: Critical Hey all, A simple reduce operation against Spark 1.1.1 is giving me following exception: {code} 14/10/20 16:27:22 ERROR executor.Executor: Exception in task 9.7 in stage 0.0 (TID 1001) java.lang.ClassCastException: org.apache.spark.SparkContext$$anonfun$26 cannot be cast to scala.Function2 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} My code is a relatively simple map-reduce: {code} MapString, Foo aggregateTracker = rdd.map(new MapFunction(list)) .reduce(new ReduceFunction()); {code} Where: - MapFunction is of type FunctionRecord, MapString, Object - ReduceFunction is of type Function2MapString, Foo, MapString, Foo, MapString, Foo - list is just a list of Foo2 Both Foo1 and Foo2 are serializable I've tried this with both the Java and Scala API, lines for each are: org.apache.spark.api.java.JavaRDD.reduce(JavaRDD.scala:32) org.apache.spark.rdd.RDD.reduce(RDD.scala:861) The thing being flagged is always: org.apache.spark.SparkContext$$anonfun$26 (the number doesn't change). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib
[ https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177707#comment-14177707 ] Sean Owen commented on SPARK-4001: -- FWIW I do perceive Apriori to be *the* basic frequent itemset algorithm. I think this is the original paper -- at least it was on Wikipedia and looks like the right time / author: http://rakesh.agrawal-family.com/papers/vldb94apriori.pdf It is very simple, and probably what you'd cook up if you invented a solution to the problem: http://en.wikipedia.org/wiki/Apriori_algorithm Frequent itemset is not quite the same as a frequent item algorithm. From a bunch of sets of items, it tries to determine which subsets occur frequently. FP-Growth is the other itemset algorithm I have ever heard of. It's more sophisticated. I don't have a paper reference. If you're going to implement frequent itemsets, I think these are the two to start with. That said I perceive frequent itemsets to be kind of 90s and I have never had to use it myself. That is not to say they don't have use, and hey they're simple. I suppose my problem with this type of technique is that it's not really telling you whether the set occurred unusually frequently, just that it did in absolute terms. There is not a probabilistic element to these. Add Apriori algorithm to Spark MLlib Key: SPARK-4001 URL: https://issues.apache.org/jira/browse/SPARK-4001 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Jacky Li Assignee: Jacky Li Apriori is the classic algorithm for frequent item set mining in a transactional data set. It will be useful if Apriori algorithm is added to MLLib in Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
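To make the distinction concrete, here is a toy, single-machine sketch of Apriori over a handful of transactions (purely illustrative; not a proposed MLlib API and not distributed): frequent itemsets of size k are extended by one item at a time and kept only if their support clears the threshold.
{code}
// Toy Apriori: returns each frequent itemset with its support count.
def apriori(transactions: Seq[Set[String]], minSupport: Int): Map[Set[String], Int] = {
  def frequent(candidates: Set[Set[String]]): Map[Set[String], Int] =
    candidates.map(c => c -> transactions.count(t => c.subsetOf(t)))
      .filter(_._2 >= minSupport).toMap

  val items = transactions.flatten.toSet
  var current = frequent(items.map(Set(_)))
  var all = current
  while (current.nonEmpty) {
    val candidates = for (itemset <- current.keySet; i <- items if !itemset(i)) yield itemset + i
    current = frequent(candidates)
    all ++= current
  }
  all
}

// Example: {a, b} is frequent in 2 of 3 baskets.
println(apriori(Seq(Set("a", "b", "c"), Set("a", "b"), Set("b", "c")), minSupport = 2))
{code}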
[jira] [Commented] (SPARK-4021) Kinesis code can cause compile failures with newer JDK's
[ https://issues.apache.org/jira/browse/SPARK-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177725#comment-14177725 ] Sean Owen commented on SPARK-4021: -- This code is just fine though. The error is the kind you get from Java 5 when you @Override something that is not a superclass method. Here it's an interface method, which is perfectly fine in Java 6+. The other new warnings indicate that something turn on -Xlint:all but I don't see that in the build. It seems like a Jenkins config issue and I don't know that it's anything to do with Java 7, from this? 7u71 doesn't seem to have any compiler changes, and certainly wouldn't have any breaking like this. Are we sure Jenkins hasn't somehow located a copy of Java 5 installed somewhere? Kinesis code can cause compile failures with newer JDK's Key: SPARK-4021 URL: https://issues.apache.org/jira/browse/SPARK-4021 Project: Spark Issue Type: Bug Components: Streaming Environment: JDK 7u71 Reporter: Patrick Wendell When compiled with JDK7u71, the Spark build failed due to these issues: {code} [error] -- [error] 1. WARNING in /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java (at line 83) [error] private static final Logger logger = Logger.getLogger(JavaKinesisWordCountASL.class); [error] ^^ [error] The field JavaKinesisWordCountASL.logger is never read locally [error] -- [error] 2. WARNING in /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java (at line 151) [error] JavaDStreamString words = unionStreams.flatMap(new FlatMapFunctionbyte[], String() { [error] ^ [error] The serializable class does not declare a static final serialVersionUID field of type long [error] -- [error] 3. ERROR in /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java (at line 153) [error] public IterableString call(byte[] line) { [error] ^ [error] The method call(byte[]) of type new FlatMapFunctionbyte[],String(){} must override a superclass method [error] -- [error] 4. WARNING in /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java (at line 160) [error] new PairFunctionString, String, Integer() { [error] ^^^ [error] The serializable class does not declare a static final serialVersionUID field of type long [error] -- [error] 5. ERROR in /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java (at line 162) [error] public Tuple2String, Integer call(String s) { [error] ^^ [error] The method call(String) of type new PairFunctionString,String,Integer(){} must override a superclass method [error] -- [error] 6. 
WARNING in /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java (at line 165) [error] }).reduceByKey(new Function2Integer, Integer, Integer() { [error] ^^ [error] The serializable class does not declare a static final serialVersionUID field of type long [error] -- [error] 7. ERROR in /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java (at line 167) [error] public Integer call(Integer i1, Integer i2) { [error] [error] The method call(Integer, Integer) of type new Function2Integer,Integer,Integer(){} must override a superclass method [error] -- [error] 7 problems (3 errors, 4 warnings) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe,
[jira] [Commented] (SPARK-4022) Replace colt dependency (LGPL) with commons-math
[ https://issues.apache.org/jira/browse/SPARK-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177759#comment-14177759 ] Sean Owen commented on SPARK-4022: -- Yeah, looks like it is only in examples at least, so I don't know if Colt ever got technically distributed (? I forget whether it goes out with transitive deps). But best to change it. I can try it, since I know Commons Math well, unless someone's already on it. Replace colt dependency (LGPL) with commons-math Key: SPARK-4022 URL: https://issues.apache.org/jira/browse/SPARK-4022 Project: Spark Issue Type: Bug Components: MLlib Reporter: Patrick Wendell Priority: Critical The colt library we use is LGPL-licensed: http://acs.lbl.gov/ACSSoftware/colt/license.html We need to swap this out for commons-math. It is also a very old library that hasn't been updated since 2004. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4018) RDD.reduce failing with java.lang.ClassCastException: org.apache.spark.SparkContext$$anonfun$26 cannot be cast to scala.Function2
[ https://issues.apache.org/jira/browse/SPARK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177766#comment-14177766 ] Sean Owen commented on SPARK-4018: -- Hm, I mean I suppose it goes without saying that the Java unit tests show that this does in general work, and I am successfully using JavaRDD all day myself with map and reduce. I think the anonymous function here is the context cleaner function perhaps, but I think that's a red herring. I noticed I don't see a org.apache.spark.SparkContext$$anonfun$26 in the byte code when built from master and this reminds me of a potential explanation. Are you building against a different version of Spark than you're running? anonymous function 26 could be something totally different at runtime if so. That would explain why you aren't seeing any problem at compile time. I imagine it's something like this and not a problem with Spark per se. RDD.reduce failing with java.lang.ClassCastException: org.apache.spark.SparkContext$$anonfun$26 cannot be cast to scala.Function2 - Key: SPARK-4018 URL: https://issues.apache.org/jira/browse/SPARK-4018 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.1 Reporter: Haithem Turki Priority: Critical Hey all, A simple reduce operation against Spark 1.1.1 is giving me following exception: {code} 14/10/20 16:27:22 ERROR executor.Executor: Exception in task 9.7 in stage 0.0 (TID 1001) java.lang.ClassCastException: org.apache.spark.SparkContext$$anonfun$26 cannot be cast to scala.Function2 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} My code is a relatively simple map-reduce: {code} MapString, Foo aggregateTracker = rdd.map(new MapFunction(list)) .reduce(new ReduceFunction()); {code} Where: - MapFunction is of type FunctionRecord, MapString, Object - ReduceFunction is of type Function2MapString, Foo, MapString, Foo, MapString, Foo - list is just a list of Foo2 Both Foo1 and Foo2 are serializable I've tried this with both the Java and Scala API, lines for each are: org.apache.spark.api.java.JavaRDD.reduce(JavaRDD.scala:32) org.apache.spark.rdd.RDD.reduce(RDD.scala:861) The thing being flagged is always: org.apache.spark.SparkContext$$anonfun$26 (the number doesn't change). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4022) Replace colt dependency (LGPL) with commons-math
[ https://issues.apache.org/jira/browse/SPARK-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1416#comment-1416 ] Sean Owen commented on SPARK-4022: -- Ah right there is Jet too, not just Colt. The LGPL license actually only pertains to a few parts of Colt, in hep.aida.*, which aren't used by Spark. Another solution is just to make sure these classes never become part of the distribution. Colt and Jet themselves don't appear to be LGPL, in the main. Of course, if there was a desire to just stop using Colt+Jet anyway, I'm cool with that too. Replace colt dependency (LGPL) with commons-math Key: SPARK-4022 URL: https://issues.apache.org/jira/browse/SPARK-4022 Project: Spark Issue Type: Bug Components: MLlib, Spark Core Reporter: Patrick Wendell Assignee: Sean Owen Priority: Critical The colt library we use is LGPL-licensed: http://acs.lbl.gov/ACSSoftware/colt/license.html We need to swap this out for commons-math. It is also a very old library that hasn't been updated since 2004. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
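As an example of the kind of swap involved (my illustration; I haven't enumerated the actual call sites), the Colt/Jet random generators such as cern.jet.random.Poisson generally have direct counterparts in the commons-math3 artifact:
{code}
import org.apache.commons.math3.distribution.PoissonDistribution

// Draw a few Poisson(5.0) samples with commons-math3 instead of Colt/Jet.
val poisson = new PoissonDistribution(5.0)
val draws = Seq.fill(10)(poisson.sample())
println(draws.mkString(", "))
{code}
One practical wrinkle is that seeding and the exact sampling algorithms differ between the libraries, which can matter for tests that expect reproducible draws.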
[jira] [Updated] (SPARK-4032) Deprecate YARN alpha support in Spark 1.2
[ https://issues.apache.org/jira/browse/SPARK-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4032: - Summary: Deprecate YARN alpha support in Spark 1.2 (was: Deprecate YARN support in Spark 1.2) Deprecate YARN alpha support in Spark 1.2 - Key: SPARK-4032 URL: https://issues.apache.org/jira/browse/SPARK-4032 Project: Spark Issue Type: Sub-task Components: Spark Core, YARN Reporter: Patrick Wendell Assignee: Prashant Sharma Priority: Blocker When someone builds for yarn alpha, we should just display a warning like {code} ***WARNING***: Support for YARN-alpha API's will be removed in Spark 1.3 (see SPARK-3445). {code} We can print a warning when the yarn-alpha profile is used: http://stackoverflow.com/questions/3416573/how-can-i-display-a-message-in-maven -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4034) get java.lang.NoClassDefFoundError: com/google/common/util/concurrent/ThreadFactoryBuilder in idea
[ https://issues.apache.org/jira/browse/SPARK-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178219#comment-14178219 ] Sean Owen commented on SPARK-4034: -- You get this when you do what? This is a Guava class. The build compiles and runs correctly, and does for me in IDEA too, so this is probably something wrong with your local setup. get java.lang.NoClassDefFoundError: com/google/common/util/concurrent/ThreadFactoryBuilder in idea Key: SPARK-4034 URL: https://issues.apache.org/jira/browse/SPARK-4034 Project: Spark Issue Type: Bug Reporter: baishuo -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4002) JavaKafkaStreamSuite.testKafkaStream fails on OSX
[ https://issues.apache.org/jira/browse/SPARK-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178239#comment-14178239 ] Sean Owen commented on SPARK-4002: -- This passes for me right now in {{master}}: mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests package mvn test -Dsuites='*KafkaStreamSuite' It also passed with the default settings -- no {{hadoop.version}}, etc. JavaKafkaStreamSuite.testKafkaStream fails on OSX - Key: SPARK-4002 URL: https://issues.apache.org/jira/browse/SPARK-4002 Project: Spark Issue Type: Bug Components: Streaming Environment: Mac OSX 10.9.5. Reporter: Ryan Williams [~sowen] mentioned this on spark-dev [here|http://mail-archives.apache.org/mod_mbox/spark-dev/201409.mbox/%3ccamassdjs0fmsdc-k-4orgbhbfz2vvrmm0hfyifeeal-spft...@mail.gmail.com%3E] and I just reproduced it on {{master}} ([7e63bb4|https://github.com/apache/spark/commit/7e63bb49c526c3f872619ae14e4b5273f4c535e9]). The relevant output I get when running {{./dev/run-tests}} is: {code} [info] KafkaStreamSuite: [info] - Kafka input stream [info] Test run started [info] Test org.apache.spark.streaming.kafka.JavaKafkaStreamSuite.testKafkaStream started [error] Test org.apache.spark.streaming.kafka.JavaKafkaStreamSuite.testKafkaStream failed: junit.framework.AssertionFailedError: expected:3 but was:0 [error] at junit.framework.Assert.fail(Assert.java:50) [error] at junit.framework.Assert.failNotEquals(Assert.java:287) [error] at junit.framework.Assert.assertEquals(Assert.java:67) [error] at junit.framework.Assert.assertEquals(Assert.java:199) [error] at junit.framework.Assert.assertEquals(Assert.java:205) [error] at org.apache.spark.streaming.kafka.JavaKafkaStreamSuite.testKafkaStream(JavaKafkaStreamSuite.java:129) [error] ... [info] Test run finished: 1 failed, 0 ignored, 1 total, 19.798s {code} Seems like this test should be {{@Ignore}}'d, or some note about this made on the {{README.md}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3955) Different versions between jackson-mapper-asl and jackson-core-asl
[ https://issues.apache.org/jira/browse/SPARK-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178294#comment-14178294 ] Sean Owen commented on SPARK-3955: -- I think the issue is matching the version used by Hadoop. Your usage of Jackson shouldn't affect what Spark uses, but I imagine you're saying this is a case of dependency leakage? Different versions between jackson-mapper-asl and jackson-core-asl -- Key: SPARK-3955 URL: https://issues.apache.org/jira/browse/SPARK-3955 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 1.1.0 Reporter: Jongyoul Lee In the parent pom.xml, specified a version of jackson-mapper-asl. This is used by sql/hive/pom.xml. When mvn assembly runs, however, jackson-mapper-asl is not same as jackson-core-asl. This is because other libraries use several versions of jackson, so other version of jackson-core-asl is assembled. Simply, fix this problem if pom.xml has a specific version information of jackson-core-asl. If it's not set, a version 1.9.11 is merged info assembly.jar and we cannot use jackson library properly. {code} [INFO] Including org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8 in the shaded jar. [INFO] Including org.codehaus.jackson:jackson-core-asl:jar:1.9.11 in the shaded jar. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3359) `sbt/sbt unidoc` doesn't work with Java 8
[ https://issues.apache.org/jira/browse/SPARK-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178583#comment-14178583 ] Sean Owen commented on SPARK-3359: -- Sure, but it can be fixed right now too. I tried to figure out how to set plugin properties in SBT and failed, although I'm sure it's not hard at all for someone who knows how it works. `sbt/sbt unidoc` doesn't work with Java 8 - Key: SPARK-3359 URL: https://issues.apache.org/jira/browse/SPARK-3359 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 1.1.0 Reporter: Xiangrui Meng Priority: Minor It seems that Java 8 is stricter on JavaDoc. I got many error messages like {code} [error] /Users/meng/src/spark-mengxr/core/target/java/org/apache/hadoop/mapred/SparkHadoopMapRedUtil.java:2: error: modifier private not allowed here [error] private abstract interface SparkHadoopMapRedUtil { [error] ^ {code} This is minor because we can always use Java 6/7 to generate the doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
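(For reference: compiler-plugin properties are generally passed from sbt through {{scalacOptions}} as {{-P:<plugin>:<key>=<value>}}. The Scala sketch below only illustrates that mechanism for the genjavadoc plugin that unidoc relies on; the artifact version and output path are placeholders, and this is not a tested fix for the Java 8 failures.)
{code}
// Hypothetical sbt settings (e.g. in project/SparkBuild.scala): add the genjavadoc
// compiler plugin and pass it an "out" property via scalacOptions.
lazy val genjavadocSettings: Seq[sbt.Def.Setting[_]] = Seq(
  libraryDependencies += compilerPlugin(
    "com.typesafe.genjavadoc" %% "genjavadoc-plugin" % "0.8" cross CrossVersion.full),
  scalacOptions in Compile += "-P:genjavadoc:out=" + (target.value / "java")
)
{code}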
[jira] [Commented] (SPARK-4021) Issues observed after upgrading Jenkins to JDK7u71
[ https://issues.apache.org/jira/browse/SPARK-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179283#comment-14179283 ] Sean Owen commented on SPARK-4021: -- [~shaneknapp] I don't think this can have anything to do with OpenJDK per se. The error clearly indicates that either a) javac 5 or earlier was used, or b) javac was told to target source 1.5 or earlier. OpenJDK Java 7 will compile Spark source entirely correctly. I just tried it on Ubuntu even. There is something else at play here -- some other Java or other configuration was used after the change. It's probably nothing to do with the upgrade per se but something to do with changing or unsetting things like JAVA_HOME. Issues observed after upgrading Jenkins to JDK7u71 -- Key: SPARK-4021 URL: https://issues.apache.org/jira/browse/SPARK-4021 Project: Spark Issue Type: Bug Components: Project Infra Environment: JDK 7u71 Reporter: Patrick Wendell Assignee: shane knapp The following compile failure was observed after adding JDK7u71 to Jenkins. However, this is likely a misconfiguration from Jenkins rather than an issue with Spark (these errors are specific to JDK5, in fact). {code} [error] -- [error] 1. WARNING in /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java (at line 83) [error] private static final Logger logger = Logger.getLogger(JavaKinesisWordCountASL.class); [error] ^^ [error] The field JavaKinesisWordCountASL.logger is never read locally [error] -- [error] 2. WARNING in /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java (at line 151) [error] JavaDStreamString words = unionStreams.flatMap(new FlatMapFunctionbyte[], String() { [error] ^ [error] The serializable class does not declare a static final serialVersionUID field of type long [error] -- [error] 3. ERROR in /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java (at line 153) [error] public IterableString call(byte[] line) { [error] ^ [error] The method call(byte[]) of type new FlatMapFunctionbyte[],String(){} must override a superclass method [error] -- [error] 4. WARNING in /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java (at line 160) [error] new PairFunctionString, String, Integer() { [error] ^^^ [error] The serializable class does not declare a static final serialVersionUID field of type long [error] -- [error] 5. ERROR in /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java (at line 162) [error] public Tuple2String, Integer call(String s) { [error] ^^ [error] The method call(String) of type new PairFunctionString,String,Integer(){} must override a superclass method [error] -- [error] 6. 
WARNING in /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java (at line 165) [error] }).reduceByKey(new Function2Integer, Integer, Integer() { [error] ^^ [error] The serializable class does not declare a static final serialVersionUID field of type long [error] -- [error] 7. ERROR in /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java (at line 167) [error] public Integer call(Integer i1, Integer i2) { [error] [error] The method call(Integer, Integer) of type new Function2Integer,Integer,Integer(){} must override a superclass method [error] -- [error] 7 problems (3 errors, 4 warnings) {code} -- This message was sent by Atlassian JIRA
[jira] [Resolved] (SPARK-4018) RDD.reduce failing with java.lang.ClassCastException: org.apache.spark.SparkContext$$anonfun$26 cannot be cast to scala.Function2
[ https://issues.apache.org/jira/browse/SPARK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4018. -- Resolution: Not a Problem RDD.reduce failing with java.lang.ClassCastException: org.apache.spark.SparkContext$$anonfun$26 cannot be cast to scala.Function2 - Key: SPARK-4018 URL: https://issues.apache.org/jira/browse/SPARK-4018 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.1 Reporter: Haithem Turki Priority: Critical Hey all, A simple reduce operation against Spark 1.1.1 is giving me following exception: {code} 14/10/20 16:27:22 ERROR executor.Executor: Exception in task 9.7 in stage 0.0 (TID 1001) java.lang.ClassCastException: org.apache.spark.SparkContext$$anonfun$26 cannot be cast to scala.Function2 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} My code is a relatively simple map-reduce: {code} MapString, Foo aggregateTracker = rdd.map(new MapFunction(list)) .reduce(new ReduceFunction()); {code} Where: - MapFunction is of type FunctionRecord, MapString, Object - ReduceFunction is of type Function2MapString, Foo, MapString, Foo, MapString, Foo - list is just a list of Foo2 Both Foo1 and Foo2 are serializable I've tried this with both the Java and Scala API, lines for each are: org.apache.spark.api.java.JavaRDD.reduce(JavaRDD.scala:32) org.apache.spark.rdd.RDD.reduce(RDD.scala:861) The thing being flagged is always: org.apache.spark.SparkContext$$anonfun$26 (the number doesn't change). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4044) Thriftserver fails to start when JAVA_HOME points to JRE instead of JDK
[ https://issues.apache.org/jira/browse/SPARK-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179620#comment-14179620 ] Sean Owen commented on SPARK-4044: -- How about using {{unzip -l}} to probe the contents of the .jar files? They're just zip files after all. You check the exit status to see if it contained the entry in question -- 0 if it did, non-0 otherwise. I am not sure how this will interact with the check for an invalid JAR file that is also in the script though. Thriftserver fails to start when JAVA_HOME points to JRE instead of JDK --- Key: SPARK-4044 URL: https://issues.apache.org/jira/browse/SPARK-4044 Project: Spark Issue Type: Bug Components: Documentation, SQL Affects Versions: 1.1.0, 1.2.0 Reporter: Josh Rosen If {{JAVA_HOME}} points to a JRE instead of a JDK, e.g. {code} JAVA_HOME=/usr/lib/jvm/java-7-oracle/jre/ {code} instead of {code} JAVA_HOME=/usr/lib/jvm/java-7-oracle/ {code} Then start-thriftserver.sh will fail with Datanucleus JAR errors: {code} Caused by: java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at javax.jdo.JDOHelper$18.run(JDOHelper.java:2018) at javax.jdo.JDOHelper$18.run(JDOHelper.java:2016) at java.security.AccessController.doPrivileged(Native Method) at javax.jdo.JDOHelper.forName(JDOHelper.java:2015) at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1162) {code} The root problem seems to be that {{compute-classpath.sh}} uses {{JAVA_HOME}} to find the path to the {{jar}} command, which isn't present in JRE directories. This leads to silent failures when adding the Datanucleus JARs to the classpath. This same issue presumably affects the command that checks whether Spark was built on Java 7 but run on Java 6. We should probably add some error-handling that checks whether the {{jar}} command is actually present and throws an error otherwise, and also update the documentation to say that `JAVA_HOME` must point to a JDK and not a JRE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4046) Incorrect Java example on site
[ https://issues.apache.org/jira/browse/SPARK-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4046: - Priority: Minor (was: Critical) Affects Version/s: 1.1.0 Summary: Incorrect Java example on site (was: Incorrect examples on site) Incorrect Java example on site -- Key: SPARK-4046 URL: https://issues.apache.org/jira/browse/SPARK-4046 Project: Spark Issue Type: Bug Components: Documentation, Java API Affects Versions: 1.1.0 Environment: Web Reporter: Ian Babrou Priority: Minor https://spark.apache.org/examples.html The word count example for Java there is incorrect. It should use mapToPair instead of map. The correct example is here: https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/JavaWordCount.java -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3359) `sbt/sbt unidoc` doesn't work with Java 8
[ https://issues.apache.org/jira/browse/SPARK-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181503#comment-14181503 ] Sean Owen commented on SPARK-3359: -- I inquired about these with the plugin project: https://github.com/typesafehub/genjavadoc/issues/43 https://github.com/typesafehub/genjavadoc/issues/44 `sbt/sbt unidoc` doesn't work with Java 8 - Key: SPARK-3359 URL: https://issues.apache.org/jira/browse/SPARK-3359 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 1.1.0 Reporter: Xiangrui Meng Priority: Minor It seems that Java 8 is stricter on JavaDoc. I got many error messages like {code} [error] /Users/meng/src/spark-mengxr/core/target/java/org/apache/hadoop/mapred/SparkHadoopMapRedUtil.java:2: error: modifier private not allowed here [error] private abstract interface SparkHadoopMapRedUtil { [error] ^ {code} This is minor because we can always use Java 6/7 to generate the doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4022) Replace colt dependency (LGPL) with commons-math
[ https://issues.apache.org/jira/browse/SPARK-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181524#comment-14181524 ] Sean Owen commented on SPARK-4022: -- I have begun work on this. You can see the base change here: https://github.com/srowen/spark/commits/SPARK-4022 https://github.com/srowen/spark/commit/8246dbd39be7ff162392c59c28dee74a1419e236 There are 4 potential problems, each of which might need some assistance as to how to proceed. *HypothesisTestSuite failure* CC [~dorx] {code} HypothesisTestSuite: ... - chi squared pearson RDD[LabeledPoint] *** FAILED *** org.apache.commons.math3.exception.NotStrictlyPositiveException: shape (0) at org.apache.commons.math3.distribution.GammaDistribution.init(GammaDistribution.java:168) ... at org.apache.spark.mllib.stat.test.ChiSqTest$.chiSquaredMatrix(ChiSqTest.scala:241) at org.apache.spark.mllib.stat.test.ChiSqTest$$anonfun$chiSquaredFeatures$4.apply(ChiSqTest.scala:134) at org.apache.spark.mllib.stat.test.ChiSqTest$$anonfun$chiSquaredFeatures$4.apply(ChiSqTest.scala:125) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) {code} The commons-math3 implementation complains that a chi squared distribution is created with 0 degrees of freedom. It looks like that for col 645 in this test, there is just one feature, 0, and two labels. The contingency table should be at least 2x2 but it's 1x2 only, and that's not valid AFAICT. I spent some time staring at this and don't quite know what to make of fixing it. *KMeansClusterSuite failure* CC [~mengxr] {code} KMeansClusterSuite: - task size should be small in both training and prediction *** FAILED *** org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 8, localhost): java.io.InvalidClassException: org.apache.spark.util.random.PoissonSampler; local class incompatible: stream classdesc serialVersionUID = -795011761847245121, local class serialVersionUID = 424924496318419 {code} I understand what it's saying. PoissonSampler did indeed change and its serialized form changed, but, I don't see how two incompatible versions are turning up as a result of one clean build. *RandomForestSuite failure* CC [~josephkb] {code} RandomForestSuite: ... - alternating categorical and continuous features with multiclass labels to test indexing *** FAILED *** java.lang.AssertionError: assertion failed: validateClassifier calculated accuracy 0.75 but required 1.0. at scala.Predef$.assert(Predef.scala:179) at org.apache.spark.mllib.tree.RandomForestSuite$.validateClassifier(RandomForestSuite.scala:227) {code} My guess on this one is that something is sampled differently as a result of this change, and happens to make the decision forest come out differently on this toy data set, and it happens to get 3/4 instead of 4/4 right now. This may be ignorable, meaning, the test was actually a little too strict and optimistic. *Less efficient seeded sampling for series of Poisson variables* CC [~dorx] Colt had a way to seed the RNG, then generate a one-off sample from a Poisson distribution with mean m. commons-math3 lets you seed an instance of a Poisson distribution with mean m, but then not change that mean. To simulate, it's necessary to recreate a Poisson distribution with each successive mean with a deterministic series of seeds. 
See here: https://github.com/srowen/spark/commit/8246dbd39be7ff162392c59c28dee74a1419e236#diff-0544248063499d8688c21f49be0918c8R285 This isn't a problem per se but could be slower. I am not sure if this code can be changed to not require constant reinitialization of the distribution. Replace colt dependency (LGPL) with commons-math Key: SPARK-4022 URL: https://issues.apache.org/jira/browse/SPARK-4022 Project: Spark Issue Type: Bug Components: MLlib, Spark Core Reporter: Patrick Wendell Assignee: Sean Owen Priority: Critical The colt library we use is LGPL-licensed: http://acs.lbl.gov/ACSSoftware/colt/license.html We need to swap this out for commons-math. It is also a very old library that hasn't been updated since 2004. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
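To make the reseeding pattern above concrete, here is a minimal Scala sketch assuming commons-math3 on the classpath; the method name and seed scheme are illustrative only, not the code in the linked commit.
{code}
import org.apache.commons.math3.distribution.PoissonDistribution

// Deterministic one-off Poisson samples for a series of means: commons-math3 fixes the mean
// at construction time, so a new distribution is created and reseeded for each successive mean.
def poissonSamples(means: Seq[Double], baseSeed: Long): Seq[Int] =
  means.zipWithIndex.map { case (mean, i) =>
    val dist = new PoissonDistribution(mean)  // mean must be > 0
    dist.reseedRandomGenerator(baseSeed + i)  // deterministic series of seeds
    dist.sample()
  }
{code}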
[jira] [Commented] (SPARK-4063) Add the ability to send messages to Kafka in the stream
[ https://issues.apache.org/jira/browse/SPARK-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181526#comment-14181526 ] Sean Owen commented on SPARK-4063: -- Is this like a streaming operation that saves an RDD to a Kafka queue? The general ability to write to Kafka is of course accomplishable by just using the Kafka APIs. Add the ability to send messages to Kafka in the stream --- Key: SPARK-4063 URL: https://issues.apache.org/jira/browse/SPARK-4063 Project: Spark Issue Type: New Feature Components: Input/Output Reporter: Helena Edelson Currently you can only receive from Kafka in the stream. This would be adding the ability to publish from the stream as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
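To illustrate the point about the plain Kafka APIs, a rough Scala sketch of publishing from a stream using the Kafka 0.8 producer inside {{foreachRDD}} / {{foreachPartition}}. The broker list, topic and encoder are placeholders, and the producer is created per partition so it is never serialized in a closure.
{code}
import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

// dstream: DStream[String]
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    val props = new Properties()
    props.put("metadata.broker.list", "broker1:9092")
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    val producer = new Producer[String, String](new ProducerConfig(props))
    partition.foreach(msg => producer.send(new KeyedMessage[String, String]("output-topic", msg)))
    producer.close()
  }
}
{code}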
[jira] [Commented] (SPARK-4066) Make whether maven builds fails on scalastyle violation configurable
[ https://issues.apache.org/jira/browse/SPARK-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182658#comment-14182658 ] Sean Owen commented on SPARK-4066: -- Does this work? -Dscalastyle.failOnViolation was already a built-in way to control this, so if it didn't work a) I'm surprised and b) it's not clear this will work either. I suppose I don't think this is something that needs to be configurable. The pain is just that someone writes new code that violates style rules? That has to be fixed anyway, and fixing style issues is not hard. Make whether maven builds fails on scalastyle violation configurable Key: SPARK-4066 URL: https://issues.apache.org/jira/browse/SPARK-4066 Project: Spark Issue Type: Improvement Reporter: Ted Yu Priority: Minor Attachments: spark-4066-v1.txt Here is the thread Koert started: http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bitsubj=scalastyle+annoys+me+a+little+bit It would be more flexible if whether the maven build fails due to a scalastyle violation were configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4022) Replace colt dependency (LGPL) with commons-math
[ https://issues.apache.org/jira/browse/SPARK-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182781#comment-14182781 ] Sean Owen commented on SPARK-4022: -- [~mengxr] [~josephkb] Great, most of this is resolved now. {{KMeansSuite}} still fails for me, yes, after a clean build. I wonder if the problem is that the commons-math3 code is in a different package in the assembly? Before thinking too hard about it, let me open a PR to see what Jenkins makes of it. I also implemented the different approach to Poisson sampler seeding. It would be good to cap the size of the cache, although, I wonder if that could lead to problems. If a sampler is removed and recreated, it will start generating the same sequence again from the same seed. If it is not seeded, it will be nondeterministic. It looks like {{RandomDataGenerator}} instances are short-lived and applied to a fixed set of mean values, which suggests this won't blow up readily. I admit I just glanced at the usages though. Replace colt dependency (LGPL) with commons-math Key: SPARK-4022 URL: https://issues.apache.org/jira/browse/SPARK-4022 Project: Spark Issue Type: Bug Components: MLlib, Spark Core Reporter: Patrick Wendell Assignee: Sean Owen Priority: Critical The colt library we use is LGPL-licensed: http://acs.lbl.gov/ACSSoftware/colt/license.html We need to swap this out for commons-math. It is also a very old library that hasn't been updated since 2004. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
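As an illustration of the caching idea and its caveat, a Scala sketch of a size-capped cache of samplers keyed by mean; the class name, seed scheme and eviction policy are all hypothetical. Note the problem described above: an entry that is evicted and recreated restarts its sequence from the same seed.
{code}
import scala.collection.mutable
import org.apache.commons.math3.distribution.PoissonDistribution

class CachedPoissonSampler(seed: Long, maxSize: Int = 100) {
  private val cache = mutable.Map.empty[Double, PoissonDistribution]
  def sample(mean: Double): Int = {
    val dist = cache.getOrElseUpdate(mean, {
      if (cache.size >= maxSize) cache.clear()  // crude cap; recreated samplers repeat their sequence
      val d = new PoissonDistribution(mean)
      d.reseedRandomGenerator(seed ^ java.lang.Double.doubleToLongBits(mean))
      d
    })
    dist.sample()
  }
}
{code}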
[jira] [Commented] (SPARK-4066) Make whether maven builds fails on scalastyle violation configurable
[ https://issues.apache.org/jira/browse/SPARK-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183340#comment-14183340 ] Sean Owen commented on SPARK-4066: -- Yeah, it's another flag to add to the build, but it's minor. But how about instead just making scalastyle not run for the package phase at all? I don't feel strongly against this, to be sure. Make whether maven builds fails on scalastyle violation configurable Key: SPARK-4066 URL: https://issues.apache.org/jira/browse/SPARK-4066 Project: Spark Issue Type: Improvement Reporter: Ted Yu Priority: Minor Attachments: spark-4066-v1.txt Here is the thread Koert started: http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bitsubj=scalastyle+annoys+me+a+little+bit It would be more flexible if whether the maven build fails due to a scalastyle violation were configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4111) [MLlib] Implement regression model evaluation metrics
[ https://issues.apache.org/jira/browse/SPARK-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186496#comment-14186496 ] Sean Owen commented on SPARK-4111: -- Is this more than just MAE / RMSE / R2? It might be handy to have a little utility class for these although they're almost one-liners already. [MLlib] Implement regression model evaluation metrics - Key: SPARK-4111 URL: https://issues.apache.org/jira/browse/SPARK-4111 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.2.0 Reporter: Yanbo Liang Supervised machine learning include classification and regression. There is classification metrics (BinaryClassificationMetrics) in MLlib, we also need regression metrics to evaluate the regression model and tunning parameter. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
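For the record, the metrics in question are roughly these one-liners over an RDD of (prediction, label) pairs; {{predictionAndLabels}} is an assumed {{RDD[(Double, Double)]}} and this is only a sketch, not proposed API.
{code}
import org.apache.spark.SparkContext._  // implicit Double RDD functions in Spark 1.x

val n = predictionAndLabels.count()
val mae = predictionAndLabels.map { case (p, l) => math.abs(p - l) }.sum() / n
val rmse = math.sqrt(predictionAndLabels.map { case (p, l) => (p - l) * (p - l) }.sum() / n)
val meanLabel = predictionAndLabels.map(_._2).sum() / n
val ssTot = predictionAndLabels.map { case (_, l) => (l - meanLabel) * (l - meanLabel) }.sum()
val ssRes = predictionAndLabels.map { case (p, l) => (l - p) * (l - p) }.sum()
val r2 = 1.0 - ssRes / ssTot
{code}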
[jira] [Commented] (SPARK-4121) Master build failures after shading commons-math3
[ https://issues.apache.org/jira/browse/SPARK-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187475#comment-14187475 ] Sean Owen commented on SPARK-4121: -- Yeah I was seeing this locally, but not on the Jenkins test build, so chalked it up to weirdness in my build. I think the answer may indeed be to do the relocating in core/mllib itself. I'll get on that. Master build failures after shading commons-math3 - Key: SPARK-4121 URL: https://issues.apache.org/jira/browse/SPARK-4121 Project: Spark Issue Type: Bug Components: Build, MLlib, Spark Core Affects Versions: 1.2.0 Reporter: Xiangrui Meng Priority: Blocker The Spark master Maven build kept failing after we replace colt with commons-math3 and shade the latter: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/ The error message is: {code} KMeansClusterSuite: Spark assembly has been built with Hive, including Datanucleus jars on classpath Spark assembly has been built with Hive, including Datanucleus jars on classpath - task size should be small in both training and prediction *** FAILED *** org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 9, localhost): java.io.InvalidClassException: org.apache.spark.util.random.PoissonSampler; local class incompatible: stream classdesc serialVersionUID = -795011761847245121, local class serialVersionUID = 424924496318419 java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57) org.apache.spark.scheduler.Task.run(Task.scala:56) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) {code} This test passed in local sbt build. So the issue should be caused by shading. Maybe there are two versions of commons-math3 (hadoop depends on it), or MLlib doesn't use the shaded version at compile. [~srowen] Could you take a look? Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4132) Spark uses incompatible HDFS API
[ https://issues.apache.org/jira/browse/SPARK-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4132. -- Resolution: Duplicate I'm all but certain you're describing the same thing as SPARK-4078 Spark uses incompatible HDFS API Key: SPARK-4132 URL: https://issues.apache.org/jira/browse/SPARK-4132 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Environment: Spark1.1.0 on Hadoop1.2.1 CentOS 6.3 64bit Reporter: kuromatsu nobuyuki Priority: Minor When I enable event logging and set it to output to HDFS, initialization fails with 'java.lang.ClassNotFoundException' (see trace below). I found that an API incompatibility in org.apache.hadoop.fs.permission.FsPermission between Hadoop 1.0.4 and Hadoop 1.1.0 (and above) causes this error (org.apache.hadoop.fs.permission.FsPermission$2 is used in 1.0.4 but doesn't exist in my 1.2.1 environment). I think that the Spark jar file pre-built for Hadoop1.X should be built on Hadoop Stable version(Hadoop 1.2.1). 2014-10-24 10:43:22,893 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: readAndProcess threw exception java.lang.RuntimeException: readObject can't find class org.apache.hadoop.fs.permission.FsPermission$2. Count of bytes read: 0 java.lang.RuntimeException: readObject can't find class org.apache.hadoop.fs.permission.FsPermission$2 at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:233) at org.apache.hadoop.ipc.RPC$Invocation.readFields(RPC.java:106) at org.apache.hadoop.ipc.Server$Connection.processData(Server.java:1347) at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1326) at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1226) at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:577) at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:384) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:701) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.permission.FsPermission$2 at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:323) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) at java.lang.ClassLoader.loadClass(ClassLoader.java:268) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810) at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:231) ... 9 more -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-683) Spark 0.7 with Hadoop 1.0 does not work with current AMI's HDFS installation
[ https://issues.apache.org/jira/browse/SPARK-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188160#comment-14188160 ] Sean Owen commented on SPARK-683: - PS I think this also turns out to be the same as SPARK-4078 Spark 0.7 with Hadoop 1.0 does not work with current AMI's HDFS installation Key: SPARK-683 URL: https://issues.apache.org/jira/browse/SPARK-683 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 0.7.0 Reporter: Tathagata Das A simple saveAsObjectFile() leads to the following error. org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.protocol.ClientProtocol.create(java.lang.String, org.apache.hadoop.fs.permission.FsPermission, java.lang.String, boolean, boolean, short, long) at java.lang.Class.getMethod(Class.java:1622) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4147) Remove log4j dependency
[ https://issues.apache.org/jira/browse/SPARK-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189739#comment-14189739 ] Sean Owen commented on SPARK-4147: -- Yes, slf4j does not let you actually control logging levels. You have to call into a specific logging implementation for this, and that's why log4j is used directly. You can still re-route log4j to slf4j, and slf4j to your own logger. I think this is on purpose, so I don't necessarily think this should change. Remove log4j dependency --- Key: SPARK-4147 URL: https://issues.apache.org/jira/browse/SPARK-4147 Project: Spark Issue Type: Wish Components: Spark Core Affects Versions: 1.1.0 Reporter: Tobias Pfeiffer spark-core has a hard dependency on log4j, which shouldn't be necessary since slf4j is used. I tried to exclude slf4j-log4j12 and log4j dependencies in my sbt file. Excluding org.slf4j.slf4j-log4j12 works fine if logback is on the classpath. However, removing the log4j dependency fails because in https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/Logging.scala#L121 a static method of org.apache.log4j.LogManager is accessed *even if* log4j is not in use. I guess removing all dependencies on log4j may be a bigger task, but it would be a great help if the access to LogManager would be done only if log4j use was detected before. (This is a 2-line change.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
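A Scala sketch of the guard the reporter describes (only touching log4j when it is actually the active slf4j backend); this illustrates the idea and is not the exact change that would go into Logging.scala.
{code}
import org.slf4j.LoggerFactory

// When the slf4j-log4j12 binding is active, the logger factory is the log4j binding's class.
val usingLog4j = LoggerFactory.getILoggerFactory.getClass.getName == "org.slf4j.impl.Log4jLoggerFactory"
if (usingLog4j) {
  // Only in this branch is it safe to call into log4j directly, e.g. to adjust default levels.
  val rootLogger = org.apache.log4j.LogManager.getRootLogger
  println("log4j root logger level: " + rootLogger.getLevel)
}
{code}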
[jira] [Commented] (SPARK-4121) Master build failures after shading commons-math3
[ https://issues.apache.org/jira/browse/SPARK-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14190708#comment-14190708 ] Sean Owen commented on SPARK-4121: -- Sorry I have been traveling without internet. When I tried revising the shading last night, tests hung. Probably due to network weirdness. My concern was that by shading just 2 modules you have to watch for other modules that accidentally start using Math3. I think a less drastic solution like modulating the version with hadoop profile sounds fine in principle. I'd have to look up what version goes with what. I think we are not using any very new methods. Sorry about that and please proceed as you see fit though I will have another look tonight. Master build failures after shading commons-math3 - Key: SPARK-4121 URL: https://issues.apache.org/jira/browse/SPARK-4121 Project: Spark Issue Type: Bug Components: Build, MLlib, Spark Core Affects Versions: 1.2.0 Reporter: Xiangrui Meng Priority: Blocker The Spark master Maven build kept failing after we replace colt with commons-math3 and shade the latter: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/ The error message is: {code} KMeansClusterSuite: Spark assembly has been built with Hive, including Datanucleus jars on classpath Spark assembly has been built with Hive, including Datanucleus jars on classpath - task size should be small in both training and prediction *** FAILED *** org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 9, localhost): java.io.InvalidClassException: org.apache.spark.util.random.PoissonSampler; local class incompatible: stream classdesc serialVersionUID = -795011761847245121, local class serialVersionUID = 424924496318419 java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57) org.apache.spark.scheduler.Task.run(Task.scala:56) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) {code} This test passed in local sbt build. So the issue should be caused by shading. 
Maybe there are two versions of commons-math3 (hadoop depends on it), or MLlib doesn't use the shaded version at compile. [~srowen] Could you take a look? Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4162) Make scripts symlinkable
[ https://issues.apache.org/jira/browse/SPARK-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4162. -- Resolution: Duplicate Duplicate of https://issues.apache.org/jira/browse/SPARK-3482 and https://issues.apache.org/jira/browse/SPARK-2960 Have a look at the PR for 3482 and suggest changes. This has come up several times so would be good to get it fixed. Make scripts symlinkable - Key: SPARK-4162 URL: https://issues.apache.org/jira/browse/SPARK-4162 Project: Spark Issue Type: Improvement Components: Deploy, EC2, Spark Shell Affects Versions: 1.1.0 Environment: Mac, linux Reporter: Shay Seng Scripts are not symlink-able because they all use: FWDIR=$(cd `dirname $0`/..; pwd) to detect the parent Spark dir, which doesn't take into account symlinks. Instead replace the above line with: SOURCE=$0; SCRIPT=`basename $SOURCE`; while [ -h $SOURCE ]; do SCRIPT=`basename $SOURCE`; LOOKUP=`ls -ld $SOURCE`; TARGET=`expr $LOOKUP : '.*-> \(.*\)$'`; if expr ${TARGET:-.}/ : '/.*/$' > /dev/null; then SOURCE=${TARGET:-.}; else SOURCE=`dirname $SOURCE`/${TARGET:-.}; fi; done; FWDIR=$(cd `dirname $SOURCE`/..; pwd) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-4170) Closure problems when running Scala app that extends App
Sean Owen created SPARK-4170: Summary: Closure problems when running Scala app that extends App Key: SPARK-4170 URL: https://issues.apache.org/jira/browse/SPARK-4170 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Sean Owen Priority: Minor Michael Albert noted this problem on the mailing list (http://apache-spark-user-list.1001560.n3.nabble.com/BUG-when-running-as-quot-extends-App-quot-closures-don-t-capture-variables-td17675.html): {code} object DemoBug extends App { val conf = new SparkConf() val sc = new SparkContext(conf) val rdd = sc.parallelize(List("A", "B", "C", "D")) val str1 = "A" val rslt1 = rdd.filter(x => { x != "A" }).count val rslt2 = rdd.filter(x => { str1 != null && x != "A" }).count println("DemoBug: rslt1 = " + rslt1 + " rslt2 = " + rslt2) } {code} This produces the output: {code} DemoBug: rslt1 = 3 rslt2 = 0 {code} If instead there is a proper main(), it works as expected. I also this week noticed that in a program which extends App, some values were inexplicably null in a closure. When changing to use main(), it was fine. I assume there is a problem with variables not being added to the closure when main() doesn't appear in the standard way. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
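For contrast, a minimal Scala sketch of the {{main()}}-based form that the description says behaves correctly; the object name is illustrative.
{code}
import org.apache.spark.{SparkConf, SparkContext}

object DemoFixed {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf())
    val rdd = sc.parallelize(List("A", "B", "C", "D"))
    val str1 = "A"
    // With a proper main(), str1 is captured correctly and the count is 3 as expected
    val rslt2 = rdd.filter(x => str1 != null && x != "A").count()
    println("DemoFixed: rslt2 = " + rslt2)
  }
}
{code}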
[jira] [Created] (SPARK-4196) Streaming + checkpointing yields NotSerializableException for Hadoop Configuration from saveAsNewAPIHadoopFiles ?
Sean Owen created SPARK-4196: Summary: Streaming + checkpointing yields NotSerializableException for Hadoop Configuration from saveAsNewAPIHadoopFiles ? Key: SPARK-4196 URL: https://issues.apache.org/jira/browse/SPARK-4196 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.1.0 Reporter: Sean Owen I am reasonably sure there is some issue here in Streaming and that I'm not missing something basic, but not 100%. I went ahead and posted it as a JIRA to track, since it's come up a few times before without resolution, and right now I can't get checkpointing to work at all. When Spark Streaming checkpointing is enabled, I see a NotSerializableException thrown for a Hadoop Configuration object, and it seems like it is not one from my user code. Before I post my particular instance see http://mail-archives.apache.org/mod_mbox/spark-user/201408.mbox/%3c1408135046777-12202.p...@n3.nabble.com%3E for another occurrence. I was also on customer site last week debugging an identical issue with checkpointing in a Scala-based program and they also could not enable checkpointing without hitting exactly this error. The essence of my code is: {code} final JavaSparkContext sparkContext = new JavaSparkContext(sparkConf); JavaStreamingContextFactory streamingContextFactory = new JavaStreamingContextFactory() { @Override public JavaStreamingContext create() { return new JavaStreamingContext(sparkContext, new Duration(batchDurationMS)); } }; streamingContext = JavaStreamingContext.getOrCreate( checkpointDirString, sparkContext.hadoopConfiguration(), streamingContextFactory, false); streamingContext.checkpoint(checkpointDirString); {code} It yields: {code} 2014-10-31 14:29:00,211 ERROR OneForOneStrategy:66 org.apache.hadoop.conf.Configuration - field (class org.apache.spark.streaming.dstream.PairDStreamFunctions$$anonfun$9, name: conf$2, type: class org.apache.hadoop.conf.Configuration) - object (class org.apache.spark.streaming.dstream.PairDStreamFunctions$$anonfun$9, function2) - field (class org.apache.spark.streaming.dstream.ForEachDStream, name: org$apache$spark$streaming$dstream$ForEachDStream$$foreachFunc, type: interface scala.Function2) - object (class org.apache.spark.streaming.dstream.ForEachDStream, org.apache.spark.streaming.dstream.ForEachDStream@cb8016a) ... {code} This looks like it's due to PairRDDFunctions, as this saveFunc seems to be org.apache.spark.streaming.dstream.PairDStreamFunctions$$anonfun$9 : {code} def saveAsNewAPIHadoopFiles( prefix: String, suffix: String, keyClass: Class[_], valueClass: Class[_], outputFormatClass: Class[_ : NewOutputFormat[_, _]], conf: Configuration = new Configuration ) { val saveFunc = (rdd: RDD[(K, V)], time: Time) = { val file = rddToFileName(prefix, suffix, time) rdd.saveAsNewAPIHadoopFile(file, keyClass, valueClass, outputFormatClass, conf) } self.foreachRDD(saveFunc) } {code} Is that not a problem? but then I don't know how it would ever work in Spark. But then again I don't see why this is an issue and only when checkpointing is enabled. Long-shot, but I wonder if it is related to closure issues like https://issues.apache.org/jira/browse/SPARK-1866 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
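While this is investigated, one way to sidestep the capture, sketched below in Scala, is to build the Configuration inside the {{foreachRDD}} function rather than passing one in. This is only an illustration (the key/value types and output path are placeholders), and it drops any settings carried on the driver's configuration.
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

// stream: DStream[(Text, Text)]
stream.foreachRDD { (rdd, time) =>
  val conf = new Configuration()  // constructed inside the closure, so no driver-side conf is captured
  val file = "/out/prefix-" + time.milliseconds + ".suffix"
  rdd.saveAsNewAPIHadoopFile(file, classOf[Text], classOf[Text],
    classOf[TextOutputFormat[Text, Text]], conf)
}
{code}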
[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib
[ https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194315#comment-14194315 ] Sean Owen commented on SPARK-1406: -- I put some comments on the PR. Thanks for starting on this. I think PMML interoperability is indeed helpful. So, one big issue here is that MLlib does not at the moment have any notion of a schema. PMML does, and this is vital to actually using the model elsewhere. You have to document what the variables are so they can be matched up with the same variables in another tool. So it's not possible now to do anything but make a model with field_1, field_2, ... This calls into question whether PMML can be meaningfully exported at this point from MLlib? Maybe it will have to wait until other PRs go in that start to add schema. I also thought it would be a little better to separate the representation of a model, from utility methods to write the model to things like files. The latter can be at least separated out of the type hierarchy. I'm also wondering how much value it adds to design for non-PMML export at this stage. (Finally I have some code lying around here that will translate the MLlib logistic regression model to PMML. I can put that in the pot at a suitable time.) PMML model evaluation support via MLib -- Key: SPARK-1406 URL: https://issues.apache.org/jira/browse/SPARK-1406 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Thomas Darimont Attachments: SPARK-1406.pdf, kmeans.xml It would be useful if spark would provide support the evaluation of PMML models (http://www.dmg.org/v4-2/GeneralStructure.html). This would allow to use analytical models that were created with a statistical modeling tool like R, SAS, SPSS, etc. with Spark (MLib) which would perform the actual model evaluation for a given input tuple. The PMML model would then just contain the parameterization of an analytical model. Other projects like JPMML-Evaluator do a similar thing. https://github.com/jpmml/jpmml/tree/master/pmml-evaluator -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
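To illustrate the suggested separation of the model representation from the export utilities, a rough Scala sketch; all names here are hypothetical, not proposed API.
{code}
// A model only knows how to render itself as PMML text; writing to files or streams
// lives in a separate helper rather than in the model type hierarchy.
trait PMMLExportable {
  def toPMML(): String
}

object PMMLUtils {
  def save(model: PMMLExportable, path: String): Unit = {
    val writer = new java.io.PrintWriter(path)
    try writer.write(model.toPMML()) finally writer.close()
  }
}
{code}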
[jira] [Commented] (SPARK-4206) BlockManager warnings in local mode: Block $blockId already exists on this machine; not re-adding it
[ https://issues.apache.org/jira/browse/SPARK-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194550#comment-14194550 ] Sean Owen commented on SPARK-4206: -- I think there was a discussion about this and the consensus was that these aren't anything to worry about and can be info-level messages? BlockManager warnings in local mode: Block $blockId already exists on this machine; not re-adding it - Key: SPARK-4206 URL: https://issues.apache.org/jira/browse/SPARK-4206 Project: Spark Issue Type: Bug Reporter: Imran Rashid Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-2432) Apriori algorithm for frequent itemset mining
[ https://issues.apache.org/jira/browse/SPARK-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-2432. -- Resolution: Duplicate Resolving as duplicate of the later issue with more discussion. Apriori algorithm for frequent itemset mining - Key: SPARK-2432 URL: https://issues.apache.org/jira/browse/SPARK-2432 Project: Spark Issue Type: Improvement Components: MLlib Reporter: lukovnikov A parallel implementation of the apriori algorithm. Apriori is a well-known and simple algorithm that finds frequent itemsets and lends itself perfectly for a parallel implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3954) source code optimization
[ https://issues.apache.org/jira/browse/SPARK-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195956#comment-14195956 ] Sean Owen commented on SPARK-3954: -- "source code optimization" is not a good JIRA title. Please don't change it back again. source code optimization Key: SPARK-3954 URL: https://issues.apache.org/jira/browse/SPARK-3954 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.0.0, 1.1.0 Reporter: 宿荣全 In converting files to RDDs there are 3 loops over the files sequence in the Spark source: 1. files.map(...) 2. files.zip(fileRDDs) 3. .foreach over the zipped pairs. Change the 3 traversals into 1 traversal. Spark source code: private def filesToRDD(files: Seq[String]): RDD[(K, V)] = { val fileRDDs = files.map(file => context.sparkContext.newAPIHadoopFile[K, V, F](file)) files.zip(fileRDDs).foreach { case (file, rdd) => { if (rdd.partitions.size == 0) { logError("File " + file + " has no data in it. Spark Streaming can only ingest " + "files that have been \"moved\" to the directory assigned to the file stream. " + "Refer to the streaming programming guide for more details.") } }} new UnionRDD(context.sparkContext, fileRDDs) } // --- modified code: private def filesToRDD(files: Seq[String]): RDD[(K, V)] = { val fileRDDs = for (file <- files; rdd = context.sparkContext.newAPIHadoopFile[K, V, F](file)) yield { if (rdd.partitions.size == 0) { logError("File " + file + " has no data in it. Spark Streaming can only ingest " + "files that have been \"moved\" to the directory assigned to the file stream. " + "Refer to the streaming programming guide for more details.") } rdd } new UnionRDD(context.sparkContext, fileRDDs) } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3954) Optimization to FileInputDStream
[ https://issues.apache.org/jira/browse/SPARK-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-3954: - Summary: Optimization to FileInputDStream (was: source code optimization) Optimization to FileInputDStream Key: SPARK-3954 URL: https://issues.apache.org/jira/browse/SPARK-3954 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.0.0, 1.1.0 Reporter: 宿荣全 In converting files to RDDs there are 3 loops over the files sequence in the Spark source: 1. files.map(...) 2. files.zip(fileRDDs) 3. .foreach over the zipped pairs. Change the 3 traversals into 1 traversal. Spark source code: private def filesToRDD(files: Seq[String]): RDD[(K, V)] = { val fileRDDs = files.map(file => context.sparkContext.newAPIHadoopFile[K, V, F](file)) files.zip(fileRDDs).foreach { case (file, rdd) => { if (rdd.partitions.size == 0) { logError("File " + file + " has no data in it. Spark Streaming can only ingest " + "files that have been \"moved\" to the directory assigned to the file stream. " + "Refer to the streaming programming guide for more details.") } }} new UnionRDD(context.sparkContext, fileRDDs) } // --- modified code: private def filesToRDD(files: Seq[String]): RDD[(K, V)] = { val fileRDDs = for (file <- files; rdd = context.sparkContext.newAPIHadoopFile[K, V, F](file)) yield { if (rdd.partitions.size == 0) { logError("File " + file + " has no data in it. Spark Streaming can only ingest " + "files that have been \"moved\" to the directory assigned to the file stream. " + "Refer to the streaming programming guide for more details.") } rdd } new UnionRDD(context.sparkContext, fileRDDs) } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4220) Spark New Feature
[ https://issues.apache.org/jira/browse/SPARK-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4220. -- Resolution: Invalid Given two empty JIRAs were opened, I assume these were accidental. Spark New Feature - Key: SPARK-4220 URL: https://issues.apache.org/jira/browse/SPARK-4220 Project: Spark Issue Type: New Feature Reporter: Tao Li -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4219) Spark New Feature
[ https://issues.apache.org/jira/browse/SPARK-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4219. -- Resolution: Invalid Given two empty JIRAs were opened, I assume these were accidental. Spark New Feature - Key: SPARK-4219 URL: https://issues.apache.org/jira/browse/SPARK-4219 Project: Spark Issue Type: New Feature Reporter: Tao Li -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4196) Streaming + checkpointing yields NotSerializableException for Hadoop Configuration from saveAsNewAPIHadoopFiles ?
[ https://issues.apache.org/jira/browse/SPARK-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196762#comment-14196762 ] Sean Owen commented on SPARK-4196: -- Same problem I'm afraid. The serialization error suggests it's the fact that the configuration -- whatever its source in the caller -- is serialized in a call to foreachRDD in saveAsNewAPIHadoopFiles. Streaming + checkpointing yields NotSerializableException for Hadoop Configuration from saveAsNewAPIHadoopFiles ? - Key: SPARK-4196 URL: https://issues.apache.org/jira/browse/SPARK-4196 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.1.0 Reporter: Sean Owen I am reasonably sure there is some issue here in Streaming and that I'm not missing something basic, but not 100%. I went ahead and posted it as a JIRA to track, since it's come up a few times before without resolution, and right now I can't get checkpointing to work at all. When Spark Streaming checkpointing is enabled, I see a NotSerializableException thrown for a Hadoop Configuration object, and it seems like it is not one from my user code. Before I post my particular instance see http://mail-archives.apache.org/mod_mbox/spark-user/201408.mbox/%3c1408135046777-12202.p...@n3.nabble.com%3E for another occurrence. I was also on customer site last week debugging an identical issue with checkpointing in a Scala-based program and they also could not enable checkpointing without hitting exactly this error. The essence of my code is: {code} final JavaSparkContext sparkContext = new JavaSparkContext(sparkConf); JavaStreamingContextFactory streamingContextFactory = new JavaStreamingContextFactory() { @Override public JavaStreamingContext create() { return new JavaStreamingContext(sparkContext, new Duration(batchDurationMS)); } }; streamingContext = JavaStreamingContext.getOrCreate( checkpointDirString, sparkContext.hadoopConfiguration(), streamingContextFactory, false); streamingContext.checkpoint(checkpointDirString); {code} It yields: {code} 2014-10-31 14:29:00,211 ERROR OneForOneStrategy:66 org.apache.hadoop.conf.Configuration - field (class org.apache.spark.streaming.dstream.PairDStreamFunctions$$anonfun$9, name: conf$2, type: class org.apache.hadoop.conf.Configuration) - object (class org.apache.spark.streaming.dstream.PairDStreamFunctions$$anonfun$9, function2) - field (class org.apache.spark.streaming.dstream.ForEachDStream, name: org$apache$spark$streaming$dstream$ForEachDStream$$foreachFunc, type: interface scala.Function2) - object (class org.apache.spark.streaming.dstream.ForEachDStream, org.apache.spark.streaming.dstream.ForEachDStream@cb8016a) ... {code} This looks like it's due to PairRDDFunctions, as this saveFunc seems to be org.apache.spark.streaming.dstream.PairDStreamFunctions$$anonfun$9 : {code} def saveAsNewAPIHadoopFiles( prefix: String, suffix: String, keyClass: Class[_], valueClass: Class[_], outputFormatClass: Class[_ : NewOutputFormat[_, _]], conf: Configuration = new Configuration ) { val saveFunc = (rdd: RDD[(K, V)], time: Time) = { val file = rddToFileName(prefix, suffix, time) rdd.saveAsNewAPIHadoopFile(file, keyClass, valueClass, outputFormatClass, conf) } self.foreachRDD(saveFunc) } {code} Is that not a problem? but then I don't know how it would ever work in Spark. But then again I don't see why this is an issue and only when checkpointing is enabled. 
Long-shot, but I wonder if it is related to closure issues like https://issues.apache.org/jira/browse/SPARK-1866 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4237) add Manifest File for Maven building
[ https://issues.apache.org/jira/browse/SPARK-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197698#comment-14197698 ] Sean Owen commented on SPARK-4237: -- How does the PR address this? the manifest file is already built and included, and contains manifest entries from dependencies. Is this a problem? add Manifest File for Maven building Key: SPARK-4237 URL: https://issues.apache.org/jira/browse/SPARK-4237 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 Running with spark sql jdbc/odbc, the output will be JackydeMacBook-Pro:spark1 jackylee$ bin/beeline Spark assembly has been built with Hive, including Datanucleus jars on classpath Beeline version ??? by Apache Hive we should add Manifest File for Maven building -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1297) Upgrade HBase dependency to 0.98.0
[ https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198514#comment-14198514 ] Sean Owen commented on SPARK-1297: -- Ted, are you updating a pull request? Patches aren't used in this project. Upgrade HBase dependency to 0.98.0 -- Key: SPARK-1297 URL: https://issues.apache.org/jira/browse/SPARK-1297 Project: Spark Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: pom.xml, spark-1297-v2.txt, spark-1297-v4.txt, spark-1297-v5.txt, spark-1297-v6.txt, spark-1297-v7.txt HBase 0.94.6 was released 11 months ago. Upgrade HBase dependency to 0.98.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4271) jetty Server can't tryport+1
[ https://issues.apache.org/jira/browse/SPARK-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4271. -- Resolution: Duplicate Well, here's a good example. I'm sure this is a duplicate of https://issues.apache.org/jira/browse/SPARK-4169, which is solved better in its PR, and that PR is still awaiting review/commit. jetty Server can't tryport+1 Key: SPARK-4271 URL: https://issues.apache.org/jira/browse/SPARK-4271 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Environment: Operating system language is Chinese. Reporter: honestold3 If the operating system language is not English, the message of the exception that occurs does not contain the "BindException" text that org.apache.spark.util.Utils.isBindCollision checks for, so the Jetty server can't try port+1. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
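(As I understand the approach in SPARK-4169's PR, the robust fix is to check the exception type rather than match localized message text; roughly, a sketch of that idea:)
{code}
// Rough sketch of a locale-independent check (not necessarily the exact patch):
// walk the cause chain and look for java.net.BindException by type instead of
// searching for English text in the exception message.
import java.net.BindException

def isBindCollision(exception: Throwable): Boolean = exception match {
  case null => false
  case _: BindException => true
  case e => isBindCollision(e.getCause)
}
{code}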
[jira] [Commented] (SPARK-4231) Add RankingMetrics to examples.MovieLensALS
[ https://issues.apache.org/jira/browse/SPARK-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200396#comment-14200396 ] Sean Owen commented on SPARK-4231: -- So this method basically computes where each test item would rank if you asked for a list of recommendations that ranks every single item. It's not necessarily efficient, but is simple. The reason I did it that way was to avoid recreating a lot of the recommender ranking logic. I don't think one has to define MAP this way -- I effectively averaged over all k up to the # of items. Yes I found the straightforward definition hard to implement at scale. I ended up opting to compute an approximation of AUC for recommender eval in this next version I'm working on: https://github.com/OryxProject/oryx/blob/master/oryx-ml-mllib/src/main/java/com/cloudera/oryx/ml/mllib/als/AUC.java#L106 Sorry for the hard-to-read Java 7; going to redo this in Java 8 soon. Basically you're just sampling random relevant/not-relevant pairs and comparing their scores. You might consider that. I dunno if it's worth bothering with a toy implementation in the examples. The example is really there just to show Spark, not ALS. Add RankingMetrics to examples.MovieLensALS --- Key: SPARK-4231 URL: https://issues.apache.org/jira/browse/SPARK-4231 Project: Spark Issue Type: Improvement Components: Examples Affects Versions: 1.2.0 Reporter: Debasish Das Fix For: 1.2.0 Original Estimate: 24h Remaining Estimate: 24h examples.MovieLensALS computes RMSE for movielens dataset but after addition of RankingMetrics and enhancements to ALS, it is critical to look at not only the RMSE but also measures like prec@k and MAP. In this JIRA we added RMSE and MAP computation for examples.MovieLensALS and also added a flag that takes an input whether user/product recommendation is being validated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
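(The pair-sampling idea, in a rough Scala sketch rather than that Java 7 code; score, relevantIds and notRelevantIds are assumed placeholder inputs for a single user, and ties count as half.)
{code}
// Rough sketch of the sampled-AUC estimate described above: the probability
// that a randomly chosen relevant item scores higher than a randomly chosen
// non-relevant one, estimated from numPairs random pairs.
import scala.util.Random

def approxAUC(score: Int => Double,
              relevantIds: IndexedSeq[Int],
              notRelevantIds: IndexedSeq[Int],
              numPairs: Int = 10000): Double = {
  val rng = new Random()
  var hits = 0.0
  for (_ <- 1 to numPairs) {
    val pos = score(relevantIds(rng.nextInt(relevantIds.size)))
    val neg = score(notRelevantIds(rng.nextInt(notRelevantIds.size)))
    if (pos > neg) hits += 1.0        // correctly ordered pair
    else if (pos == neg) hits += 0.5  // count ties as half
  }
  hits / numPairs
}
{code}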
[jira] [Commented] (SPARK-4231) Add RankingMetrics to examples.MovieLensALS
[ https://issues.apache.org/jira/browse/SPARK-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200470#comment-14200470 ] Sean Owen commented on SPARK-4231: -- Yes I'm mostly questioning implementing this in examples. The definition in RankingMetrics looks like the usual one to me -- average from 1 to min(# recs, # relevant items). You could say the version you found above is 'extended' to look into the long tail (# recs = # items), although the long tail doesn't affect MAP much. Same definition, different limit. precision@k does not have the same question since there is one k value, not lots. AUC may not help you if you're comparing to other things for which you don't have AUC. It was a side comment mostly. (Anyway there is already an AUC implementation here which I am trying to see if I can use.) Add RankingMetrics to examples.MovieLensALS --- Key: SPARK-4231 URL: https://issues.apache.org/jira/browse/SPARK-4231 Project: Spark Issue Type: Improvement Components: Examples Affects Versions: 1.2.0 Reporter: Debasish Das Fix For: 1.2.0 Original Estimate: 24h Remaining Estimate: 24h examples.MovieLensALS computes RMSE for movielens dataset but after addition of RankingMetrics and enhancements to ALS, it is critical to look at not only the RMSE but also measures like prec@k and MAP. In this JIRA we added RMSE and MAP computation for examples.MovieLensALS and also added a flag that takes an input whether user/product recommendation is being validated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
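(To make the "same definition, different limit" point concrete, here is a small sketch of average precision truncated at min(# recs, # relevant items); predicted and relevant are assumed inputs for one user, and this is one common formulation rather than necessarily exactly what RankingMetrics implements.)
{code}
// Sketch of average precision with the limit min(# recs, # relevant items).
// `predicted` is the ranked list of recommended item IDs; `relevant` is the
// ground-truth set of relevant item IDs.
def averagePrecision(predicted: Seq[Int], relevant: Set[Int]): Double = {
  val limit = math.min(predicted.size, relevant.size)
  var hits = 0
  var sumPrecision = 0.0
  predicted.take(limit).zipWithIndex.foreach { case (id, i) =>
    if (relevant.contains(id)) {
      hits += 1
      sumPrecision += hits.toDouble / (i + 1)  // precision at rank i + 1
    }
  }
  if (limit == 0) 0.0 else sumPrecision / limit
}
{code}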
[jira] [Commented] (SPARK-4276) Spark streaming requires at least two working thread
[ https://issues.apache.org/jira/browse/SPARK-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200539#comment-14200539 ] Sean Owen commented on SPARK-4276: -- This is basically the same concern already addressed by https://issues.apache.org/jira/browse/SPARK-4040, no? This code could set a master of, say, local[2], but my understanding was that the examples don't set a master; it is supplied by spark-submit. Then again, SPARK-4040 changed the doc example to set a local[2] master. Hm. Spark streaming requires at least two working thread Key: SPARK-4276 URL: https://issues.apache.org/jira/browse/SPARK-4276 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.1.0 Reporter: varun sharma Fix For: 1.1.0 Spark Streaming requires at least two working threads, but the example in spark/examples/src/main/scala/org/apache/spark/examples/streaming/NetworkWordCount.scala has
// Create the context with a 1 second batch size
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
val ssc = new StreamingContext(sparkConf, Seconds(1))
which creates only 1 thread. It should have at least 2 threads: http://spark.apache.org/docs/latest/streaming-programming-guide.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
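(For reference, a minimal sketch of the form the streaming guide recommends for local testing; outside of local testing the master would normally be left to spark-submit rather than hard-coded like this.)
{code}
// Minimal sketch: use at least 2 local threads so the receiver and the batch
// processing don't starve each other when running locally.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(sparkConf, Seconds(1))
{code}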
[jira] [Commented] (SPARK-4289) Creating an instance of Hadoop Job fails in the Spark shell when toString() is called on the instance.
[ https://issues.apache.org/jira/browse/SPARK-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201827#comment-14201827 ] Sean Owen commented on SPARK-4289: -- This is a Hadoop issue, right? I don't know if Spark can address this directly. I suppose you could work around this with :silent in the shell. Creating an instance of Hadoop Job fails in the Spark shell when toString() is called on the instance. -- Key: SPARK-4289 URL: https://issues.apache.org/jira/browse/SPARK-4289 Project: Spark Issue Type: Bug Reporter: Corey J. Nolet This one is easy to reproduce.
{code}
val job = new Job(sc.hadoopConfiguration)
{code}
I'm not sure what the solution would be offhand, as it's happening when the shell is calling toString() on the instance of Job. The problem is, because of the failure, the instance is never actually assigned to the job val.
java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:283)
at org.apache.hadoop.mapreduce.Job.toString(Job.java:452)
at scala.runtime.ScalaRunTime$.scala$runtime$ScalaRunTime$$inner$1(ScalaRunTime.scala:324)
at scala.runtime.ScalaRunTime$.stringOf(ScalaRunTime.scala:329)
at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:337)
at .<init>(console:10)
at .<clinit>(console)
at $print(console)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:859)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:771)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:616)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:624)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:629)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:954)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:902)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:997)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - 
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
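(A rough sketch of the :silent workaround mentioned above, as a hypothetical spark-shell session; the REPL's automatic result printing is what calls toString on the Job while it is still in the DEFINE state, so suppressing that printing avoids the exception.)
{code}
// Hypothetical spark-shell session (not verbatim output). :silent toggles the
// REPL's automatic printing of results, so Job.toString is never invoked.
scala> :silent
scala> val job = new org.apache.hadoop.mapreduce.Job(sc.hadoopConfiguration)
scala> :silent   // toggle result printing back on afterwards
{code}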
[jira] [Updated] (SPARK-4288) Add Sparse Autoencoder algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4288: - Description: Are you proposing an implementation? Is it related to the neural network JIRA? Target Version/s: (was: 1.3.0) Issue Type: Wish (was: Bug) Add Sparse Autoencoder algorithm to MLlib -- Key: SPARK-4288 URL: https://issues.apache.org/jira/browse/SPARK-4288 Project: Spark Issue Type: Wish Components: MLlib Reporter: Guoqiang Li Labels: features Are you proposing an implementation? Is it related to the neural network JIRA? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4283) Spark source code does not correctly import into eclipse
[ https://issues.apache.org/jira/browse/SPARK-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201853#comment-14201853 ] Sean Owen commented on SPARK-4283: -- This is really an Eclipse problem. I don't personally think it's worth the extra weight in the build for this. (Use pull requests, not patches on JIRAs, in Spark.) Spark source code does not correctly import into eclipse Key: SPARK-4283 URL: https://issues.apache.org/jira/browse/SPARK-4283 Project: Spark Issue Type: Bug Components: Build Reporter: Yang Yang Priority: Minor Attachments: spark_eclipse.diff When I import the Spark source into Eclipse, either by running mvn eclipse:eclipse and then importing existing general projects, or by importing existing Maven projects, it does not recognize the project as a Scala project. I am adding a new plugin so the import works. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org