[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62110522 [Test build #23047 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23047/consoleFull) for PR 3146 at commit [`b8e2a49`](https://github.com/apache/spark/commit/b8e2a49aeed255053a52f22e03ec458ec5aecd84). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4225][SQL] Resorts to SparkContext.vers...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3105#issuecomment-62110929 @liancheng, you need to rebase this :)
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62111028 [Test build #23048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23048/consoleFull) for PR 3149 at commit [`8b2d845`](https://github.com/apache/spark/commit/8b2d84540b154b5092c81f960e463e851ff6ab54). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62111305 [Test build #23042 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23042/consoleFull) for PR 2659 at commit [`a2f6a02`](https://github.com/apache/spark/commit/a2f6a02afed1df72d994d067017a3403c1adf933). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class SizeLimitedStream(object):` * `class CompressedStream(object):` * `class LargeObjectSerializer(Serializer):` * `class CompressedSerializer(Serializer):`
[GitHub] spark pull request: Update JavaCustomReceiver.java
GitHub user xiao321 opened a pull request: https://github.com/apache/spark/pull/3153 Update JavaCustomReceiver.java (array index out of bounds) You can merge this pull request into a Git repository by running: $ git pull https://github.com/xiao321/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3153.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3153 commit 0ed17b578decfdb3221ae1bcba8de6f877983ef2 Author: xiao321 1042460...@qq.com Date: 2014-11-07T08:11:52Z Update JavaCustomReceiver.java (array index out of bounds)
[GitHub] spark pull request: Update JavaCustomReceiver.java
Github user xiao321 commented on the pull request: https://github.com/apache/spark/pull/3153#issuecomment-62111433 The array index is out of bounds.
[GitHub] spark pull request: Update JavaCustomReceiver.java
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3153#issuecomment-62111582 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62111641 This sort of seems like it's reinventing what Thrift or protobuf do. Also, why is it necessary to introduce another serialization-related interface just to customize the serialization? Not objecting so much as asking why you can't just override the serialization with a desired compact serialization, or use a library.
[GitHub] spark pull request: Update JavaCustomReceiver.java
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3153#issuecomment-62111740 LGTM but the title and description are not informative.
[GitHub] spark pull request: [SPARK-1812] Scala 2.11 support.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3111#issuecomment-62112493 [Test build #23043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23043/consoleFull) for PR 3111 at commit [`19a5167`](https://github.com/apache/spark/commit/19a5167ef3d7e573ad053ec33d93e5dc76149bea). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1812] Scala 2.11 support.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3111#issuecomment-62112498 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23043/ Test FAILed.
[GitHub] spark pull request: [WIP][SPARK-3530][MLLIB] pipeline and paramete...
Github user tomerk commented on the pull request: https://github.com/apache/spark/pull/3099#issuecomment-62112717 At @shivaram's suggestion, I started porting over a simple text classifier pipeline, which was already using an Estimator/Transformer abstraction of RDD[U] to RDD[V] transforms, to this interface. The almost-complete port (the imports got messed up when moving files around) can be found at https://github.com/shivaram/spark-ml/commit/522aec73172b28a4bc1b22df030a459fddbd93dd. Beyond what Shivaram already mentioned, here are my thoughts:

1. The trickiest bit by far was all of the implicit conversions. I ended up needing several kinds of implicit conversion imports (case class to SchemaRDD, the Spark SQL DSL, parameter maps, etc.). They also got mysteriously deleted by the IDE as I moved files between projects, so I ended up having to copy and paste them whenever appropriate because I couldn't keep track of them.
2. Like Shivaram, I'm also not familiar with the Spark SQL DSL, so here I also had to copy and paste code. It's unclear what syntax is valid and what isn't. For example, is saying `as outputCol` enough, or is `as Symbol(outputCol)` required?
3. There is a lot of boilerplate code. It was easier to write the Transformers in the form RDD[U] to RDD[V] instead of SchemaRDD to SchemaRDD, so I fully agree with Shivaram on that front. Certain interfaces along those lines (iterator-to-iterator transformers that can be applied to RDDs using mapPartitions) could potentially make it easier for transformers not to depend on local SparkContexts to execute.
4. I found the parameter mapping in estimators fairly verbose; I like Shivaram's idea of having the estimators pass everything to the transformers no matter what.
5. Estimators requiring the transformers they output to extend Model didn't make sense to me. Certain estimators, such as one that chooses only the most frequent tokens in a collection to keep for each document, don't seem like they should output models. On that front, should estimators be required to specify the type of transformer they output? It can be convenient sometimes to just inline an anonymous Transformer without making it a top-level class.
6. There are a lot of parameter traits: HasRegParam, HasMaxIter, HasScoreCol, HasFeatureCol. Does it make sense to have this many specific parameter traits if we still have to maintain boilerplate setter code for Java anyway?
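The RDD[U] to RDD[V] shape of Estimator/Transformer discussed above can be sketched in plain Java. This is a minimal illustration, not the actual MLlib pipeline API: the interface and method names are hypothetical, and `List<U>` stands in for `RDD[U]` to keep the example self-contained.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the Estimator/Transformer abstraction discussed
// above, using List<U> in place of RDD[U] so the sketch runs without Spark.
interface Transformer<U, V> {
    List<V> transform(List<U> input);
}

interface Estimator<U, V> {
    // fit() learns from the training data and returns a Transformer ("model").
    Transformer<U, V> fit(List<U> trainingData);
}

public class PipelineSketch {
    public static void main(String[] args) {
        // A trivial "estimator" that learns an average-length threshold and
        // returns a transformer mapping each document to a boolean feature.
        Estimator<String, Boolean> longDocEstimator = docs -> {
            int avg = (int) docs.stream().mapToInt(String::length).average().orElse(0);
            return input -> input.stream()
                    .map(s -> s.length() > avg)
                    .collect(Collectors.toList());
        };
        Transformer<String, Boolean> model =
                longDocEstimator.fit(List.of("short", "a much longer document"));
        System.out.println(model.transform(List.of("hi", "another long document here")));
    }
}
```

Note how the anonymous transformer returned by `fit` never needs to be a named top-level class, which is the convenience point raised in item 5 above.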
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62113295 **TL;DR:** The goal is to keep the network package small, with minimal dependencies and minimal overhead to verify cross-version compatibility moving forward. It is my feeling that protobuf and thrift are expensive dependencies to have, and that Java serialization is harder to reason about. The problem with using thrift or protobuf is inherently about dependencies. Protobuf dependencies are already a mess in Spark due to different, backwards-incompatible versions being used in Hadoop, Mesos, Akka, etc., and adding a real dependency in Spark just complicates the issue. Thrift is another relatively common dependency and has a few extra dependencies of its own, but I haven't explored that route as far. Since the code here is intended to work while running within other JVMs (e.g., YARN Node Manager), we want to keep dependencies down. Other parts of the network package use the Encodable interface because they write directly to Netty and this API is thus natural (decoding ByteBufs from an IO buffer, for instance). The choice of using Encodable here rather than implementing Externalizable/Serializable objects is for two reasons: simplicity and flexibility. The Java serialization framework brings a lot of baggage and has some non-obvious pitfalls, and accidental misuse may go unnoticed until the serial version id mismatch errors arrive. Second, it is less obvious how to explicitly handle changes in classes between versions. Since we expect the shuffle service to be long-lived, we must be able to simply and straightforwardly verify that code will work in a cross-version manner, and I feel that that is harder to prove when relying on Java serialization. Finally, the thing that makes this problem tractable, in my opinion, is that we should never be serializing complex object graphs at this level of the API. 
Everything should be ultimately simple, primitive types with minimal to no abstract types. We're not trying to solve serialization of general objects, just serialization of small, mostly static messages. Arrays of Strings should be the most complicated things we have to serialize.
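The Encodable style described above (each message knows its encoded length and reads/writes its own fields) can be sketched as follows. This is an illustration, not Spark's actual network-package code: the message type and field names are hypothetical, and `java.nio.ByteBuffer` stands in for Netty's `ByteBuf` so the example is self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of an Encodable-style message: a fixed, explicit wire format of
// primitive fields, with no Java serialization or schema library involved.
public class OpenBlocksMessage {
    public final String appId;
    public final String execId;

    public OpenBlocksMessage(String appId, String execId) {
        this.appId = appId;
        this.execId = execId;
    }

    // Each string is written as a 4-byte length followed by its UTF-8 bytes.
    public int encodedLength() {
        return 4 + appId.getBytes(StandardCharsets.UTF_8).length
             + 4 + execId.getBytes(StandardCharsets.UTF_8).length;
    }

    public void encode(ByteBuffer buf) {
        writeString(buf, appId);
        writeString(buf, execId);
    }

    public static OpenBlocksMessage decode(ByteBuffer buf) {
        return new OpenBlocksMessage(readString(buf), readString(buf));
    }

    private static void writeString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buf.putInt(bytes.length);
        buf.put(bytes);
    }

    private static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OpenBlocksMessage msg = new OpenBlocksMessage("app-1", "exec-7");
        ByteBuffer buf = ByteBuffer.allocate(msg.encodedLength());
        msg.encode(buf);
        buf.flip();
        OpenBlocksMessage decoded = OpenBlocksMessage.decode(buf);
        System.out.println(decoded.appId + " " + decoded.execId);
    }
}
```

Because the wire format is spelled out field by field, cross-version compatibility can be audited by reading the encode/decode pair, which is the verifiability argument made above.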
[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62113523 [Test build #23045 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23045/consoleFull) for PR 3130 at commit [`076322b`](https://github.com/apache/spark/commit/076322bc9151926002f494dbab4e3e1de1caef2e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62113528 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23045/ Test PASSed.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62114325 Thanks! Good to hear the reasoning. It is indeed lightweight, and the use case is not quite the same as the usual general serialization use cases.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62114341 @srowen I was initially actually for protobuf or avro, but looking at the dependency list, it'd be hard to guarantee compatibility in the future. Given that the number of messages we are actually serializing is very small, the work to do a custom serialization protocol is very contained.
[GitHub] spark pull request: [SPARK-4294][Streaming] The same function shou...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3151#issuecomment-62114557 How much change would it take to use `require()` consistently across the code base? Looks like 10-20 occurrences. I wonder if people would find that too disruptive to be worth it, but it seems better to fix it all or not bother fixing it one by one.
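For context, Scala's `require(cond, msg)` throws an `IllegalArgumentException` when the condition fails, which is the consistent pattern being proposed above. A Java rendering of the same idiom might look like this (the helper class and messages are illustrative, not from the Spark codebase):

```java
// A minimal require()-style precondition helper mirroring Scala's Predef.require:
// it throws IllegalArgumentException with a "requirement failed" prefix.
public class Preconditions {
    static void require(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException("requirement failed: " + message);
        }
    }

    public static void main(String[] args) {
        require(1 > 0, "batch duration must be positive"); // passes silently
        try {
            require(false, "batch duration must be positive");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```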
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62115143 It looks good to me in general, and I like the idea of summarizing the convertible data type checking, but in the meantime I am a little afraid it might be error-prone for future maintenance or when new data types are added. Or can we remove the `resolve` method?
[GitHub] spark pull request: Update JavaCustomReceiver.java
Github user xiao321 commented on the pull request: https://github.com/apache/spark/pull/3153#issuecomment-62115778 Oh, sorry. When I run the command `bin/run-example org.apache.spark.examples.streaming.JavaCustomReceiver localhost`, the error is java.lang.ClassNotFoundException: org.apache.spark.examples.arg.apache.spark.examples.streaming.JavaCustomReceiver, so I changed the command to `bin/run-example streaming.JavaCustomReceiver localhost`, and then the error is java.lang.ArrayIndexOutOfBoundsException: 2. Then I viewed the source and found `JavaReceiverInputDStream<String> lines = ssc.receiverStream(new JavaCustomReceiver(args[1], Integer.parseInt(args[2])));`. I think this should be changed to `JavaReceiverInputDStream<String> lines = ssc.receiverStream(new JavaCustomReceiver(args[0], Integer.parseInt(args[1])));`. Am I wrong?
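The off-by-one described above can be demonstrated in isolation. This is a simplified stand-in, not the real example program: with the usage `run-example streaming.JavaCustomReceiver <host> <port>` the args array has two elements, so reading `args[1]`/`args[2]` overruns it, while `args[0]`/`args[1]` is correct.

```java
// Sketch of the indexing bug: args holds {host, port}, i.e. two elements.
public class ArgsIndexDemo {
    public static void main(String[] argv) {
        String[] args = {"localhost", "9999"};
        try {
            String host = args[1];                 // reads "9999" (wrong field)
            int port = Integer.parseInt(args[2]);  // throws: index 2, length 2
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught ArrayIndexOutOfBoundsException");
        }
        // The corrected indexing reads the intended host and port.
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port);
    }
}
```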
[GitHub] spark pull request: [SPARK-4269][SQL] make wait time configurable ...
Github user jackylk commented on the pull request: https://github.com/apache/spark/pull/3133#issuecomment-62115794 IMHO, first, I think it is not good practice to put any hard-coded value in the code; it is better to let the user have more control over the configuration according to his needs, since he knows his environment best. Second, it did fail some SQL queries involving multi-table joins in my own environment. The code is updated according to your comment.
[GitHub] spark pull request: invalid variable
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/3154 invalid variable You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3154.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3154 commit f5bde61e4597e89fcadbec73a7d28c3ccf2ac569 Author: viper-kun xukun...@huawei.com Date: 2014-11-07T09:15:19Z invalid variable
[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62116902 [Test build #23046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23046/consoleFull) for PR 3130 at commit [`076322b`](https://github.com/apache/spark/commit/076322bc9151926002f494dbab4e3e1de1caef2e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62116911 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23046/ Test PASSed.
[GitHub] spark pull request: invalid variable
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3154#issuecomment-62116932 Can one of the admins verify this patch?
[GitHub] spark pull request: Update JavaCustomReceiver.java
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3153#issuecomment-62117084 No, I agree with the change. I'm saying that "Update JavaCustomReceiver.java" with no description is not a helpful title. Normally changes need a JIRA too, although this one is so trivial that it may not.
[GitHub] spark pull request: invalid variable
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3154#issuecomment-62117020 That variable is used on line 194; I don't think you can remove it. This is a trivial change anyway, and it doesn't have any useful description.
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62117247 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23048/ Test PASSed.
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62117243 [Test build #23048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23048/consoleFull) for PR 3149 at commit [`8b2d845`](https://github.com/apache/spark/commit/8b2d84540b154b5092c81f960e463e851ff6ab54). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class RetryingBlockFetcher `
[GitHub] spark pull request: [WIP] Allow disabling direct allocation in Net...
GitHub user aarondav opened a pull request: https://github.com/apache/spark/pull/3155 [WIP] Allow disabling direct allocation in NettyBlockTransferService You can merge this pull request into a Git repository by running: $ git pull https://github.com/aarondav/spark conf Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3155.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3155 commit 5645c30d64c9b4c9095a5a8ff82647e97943be2d Author: Aaron Davidson aa...@databricks.com Date: 2014-11-07T09:21:45Z [WIP] Allow disabling direct allocation in NettyBlockTransferService
[GitHub] spark pull request: Update JavaCustomReceiver.java
Github user xiao321 commented on the pull request: https://github.com/apache/spark/pull/3153#issuecomment-62117528 Sorry, I am a beginner. I will pay attention next time.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62117594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23047/ Test PASSed.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62117588 [Test build #23047 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23047/consoleFull) for PR 3146 at commit [`b8e2a49`](https://github.com/apache/spark/commit/b8e2a49aeed255053a52f22e03ec458ec5aecd84). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/3150#discussion_r20001109 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -37,8 +42,62 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
     case (BooleanType, DateType) => true
     case (DateType, _: NumericType) => true
     case (DateType, BooleanType) => true
-    case (_, DecimalType.Fixed(_, _)) => true  // TODO: not all upcasts here can really give null
-    case _ => child.nullable
+    case (_, DecimalType.Fixed(_, _)) => true  // TODO: not all upcasts here can really give null
+    case _ => false
+  }
+
+  private[this] def resolvableNullability(from: Boolean, to: Boolean) = !from || to
+
+  private[this] def resolve(from: DataType, to: DataType): Boolean = {
+    (from, to) match {
+      case (from, to) if from == to => true
+
+      case (NullType, _) => true
+
+      case (_, StringType) => true
+
+      case (StringType, BinaryType) => true
+
+      case (StringType, BooleanType) => true
+      case (DateType, BooleanType) => true
+      case (TimestampType, BooleanType) => true
+      case (_: NumericType, BooleanType) => true
+
+      case (StringType, TimestampType) => true
+      case (BooleanType, TimestampType) => true
+      case (DateType, TimestampType) => true
+      case (_: NumericType, TimestampType) => true
+
+      case (_, DateType) => true
+
+      case (StringType, _: NumericType) => true
+      case (BooleanType, _: NumericType) => true
+      case (DateType, _: NumericType) => true
+      case (TimestampType, _: NumericType) => true
+      case (_: NumericType, _: NumericType) => true
+
+      case (ArrayType(from, fn), ArrayType(to, tn)) =>
+        resolve(from, to) && resolvableNullability(fn || forceNullable(from, to), tn)
+
+      case (MapType(fromKey, fromValue, fn), MapType(toKey, toValue, tn)) =>
+        resolve(fromKey, toKey) && (!forceNullable(fromKey, toKey)) &&
+          resolve(fromValue, toValue) &&
+          resolvableNullability(fn || forceNullable(fromValue, toValue), tn)
+
+      case (StructType(fromFields), StructType(toFields)) =>
+        fromFields.size == toFields.size && fromFields.zip(toFields).forall {
+          case (fromField, toField) =>
+            resolve(fromField.dataType, toField.dataType) &&
+              resolvableNullability(
+                fromField.nullable || forceNullable(fromField.dataType, toField.dataType),
+                toField.nullable)
+        }
+
+      case _ => false
--- End diff -- Hmm, I think the resolve check should be done during logical plan analysis.
[GitHub] spark pull request: [WIP] Allow disabling direct allocation in Net...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3155#issuecomment-62118189 [Test build #23050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23050/consoleFull) for PR 3155 at commit [`5645c30`](https://github.com/apache/spark/commit/5645c30d64c9b4c9095a5a8ff82647e97943be2d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62118486 [Test build #23051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23051/consoleFull) for PR 3146 at commit [`ed1102a`](https://github.com/apache/spark/commit/ed1102a007097e8eeb1d87f8cac0c85b3e71e2dd). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/3150#discussion_r20001249 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -37,8 +42,62 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
     case (BooleanType, DateType) => true
     case (DateType, _: NumericType) => true
     case (DateType, BooleanType) => true
-    case (_, DecimalType.Fixed(_, _)) => true  // TODO: not all upcasts here can really give null
-    case _ => child.nullable
+    case (_, DecimalType.Fixed(_, _)) => true  // TODO: not all upcasts here can really give null
+    case _ => false
+  }
+
+  private[this] def resolvableNullability(from: Boolean, to: Boolean) = !from || to
+
+  private[this] def resolve(from: DataType, to: DataType): Boolean = {
+    (from, to) match {
+      case (from, to) if from == to => true
+
+      case (NullType, _) => true
+
+      case (_, StringType) => true
+
+      case (StringType, BinaryType) => true
+
+      case (StringType, BooleanType) => true
+      case (DateType, BooleanType) => true
+      case (TimestampType, BooleanType) => true
+      case (_: NumericType, BooleanType) => true
+
+      case (StringType, TimestampType) => true
+      case (BooleanType, TimestampType) => true
+      case (DateType, TimestampType) => true
+      case (_: NumericType, TimestampType) => true
+
+      case (_, DateType) => true
+
+      case (StringType, _: NumericType) => true
+      case (BooleanType, _: NumericType) => true
+      case (DateType, _: NumericType) => true
+      case (TimestampType, _: NumericType) => true
+      case (_: NumericType, _: NumericType) => true
+
+      case (ArrayType(from, fn), ArrayType(to, tn)) =>
+        resolve(from, to) && resolvableNullability(fn || forceNullable(from, to), tn)
+
+      case (MapType(fromKey, fromValue, fn), MapType(toKey, toValue, tn)) =>
+        resolve(fromKey, toKey) && (!forceNullable(fromKey, toKey)) &&
+          resolve(fromValue, toValue) &&
+          resolvableNullability(fn || forceNullable(fromValue, toValue), tn)
+
+      case (StructType(fromFields), StructType(toFields)) =>
+        fromFields.size == toFields.size && fromFields.zip(toFields).forall {
+          case (fromField, toField) =>
+            resolve(fromField.dataType, toField.dataType) &&
+              resolvableNullability(
+                fromField.nullable || forceNullable(fromField.dataType, toField.dataType),
+                toField.nullable)
+        }
+
+      case _ => false
--- End diff -- Some expressions check `resolved` in their `dataType` method, though.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/3150#discussion_r20001270 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -323,28 +371,53 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
       buildCast[Date](_, d => dateToDouble(d))
     case TimestampType =>
       buildCast[Timestamp](_, t => timestampToDouble(t).toFloat)
-    case DecimalType() =>
-      buildCast[Decimal](_, _.toFloat)
     case x: NumericType =>
       b => x.numeric.asInstanceOf[Numeric[Any]].toFloat(b)
   }

-  private[this] lazy val cast: Any => Any = dataType match {
+  private[this] def castArray(from: ArrayType, to: ArrayType): Any => Any = {
+    val elementCast = cast(from.elementType, to.elementType)
+    buildCast[Seq[Any]](_, _.map(v => if (v == null) null else elementCast(v)))
--- End diff -- I don't think we need to handle this case specially, unlike other expressions. Elements of a type with `ArrayType.containsNull == false` are never `null`, so `elementCast(v)` will always be called.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62118596 @chenghao-intel, thank you for your comments. If the `resolve` method is removed, the nullability check (e.g. a cast from `ArrayType(IntegerType, containsNull = true)` to `ArrayType(IntegerType, containsNull = false)` is apparently invalid) is also removed, and that will cause unexpected errors. If there is a better way to ensure the nullability check, we can remove the method.
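The nullability rule being discussed can be sketched without Catalyst. Below is a minimal, self-contained illustration: `resolvableNullability` mirrors the helper in the diff, while `ArrayLike` and `castResolvable` are simplified stand-ins (not Spark's API) that model only the `containsNull` flag:

```scala
// Sketch of the nullability-resolution rule: a cast may only drop
// nullability when the source side is already non-nullable, because
// casting a possibly-null element into a "never null" slot is invalid.
object NullabilityCheck {
  // Same shape as the helper in the patch: !from || to
  def resolvableNullability(from: Boolean, to: Boolean): Boolean = !from || to

  // Simplified stand-in for ArrayType, keeping only containsNull.
  case class ArrayLike(containsNull: Boolean)

  def castResolvable(from: ArrayLike, to: ArrayLike): Boolean =
    resolvableNullability(from.containsNull, to.containsNull)

  def main(args: Array[String]): Unit = {
    // nullable -> non-nullable is rejected; every other direction is fine
    assert(!castResolvable(ArrayLike(containsNull = true), ArrayLike(containsNull = false)))
    assert(castResolvable(ArrayLike(containsNull = false), ArrayLike(containsNull = true)))
    assert(castResolvable(ArrayLike(containsNull = true), ArrayLike(containsNull = true)))
    println("ok")
  }
}
```

This is exactly the invalid case ueshin cites: `containsNull = true` to `containsNull = false` fails the check.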
[GitHub] spark pull request: [SPARK-3967] Ensure that files are fetched ato...
Github user preaudc commented on the pull request: https://github.com/apache/spark/pull/2855#issuecomment-62119248 As @ryan-williams pointed out, this is initially only a workaround to SPARK-3967. I still have no idea why the move fails (with `Permission denied`) when the source and target files are not on the same partition (it is no longer atomic, but it should succeed anyway). I made this patch because it does not seem necessary to download the file into another local directory and then move it (that may cause a copy instead of a rename, and in fact does here).
[GitHub] spark pull request: [SPARK-4294][Streaming] The same function shou...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3152#issuecomment-62119710 (Copying comment from closed PR) Is it worth replacing this same pattern everywhere? It looks like 10-20 occurrences. I don't know if that's too disruptive, but replacing them one at a time is too piecemeal.
[GitHub] spark pull request: [SPARK-3722][Docs]minor improvement and fix in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2579#issuecomment-62119877 [Test build #23049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23049/consoleFull) for PR 2579 at commit [`6f91cec`](https://github.com/apache/spark/commit/6f91cec38959f3510ae41ebf8931a72a20d6b2a7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3722][Docs]minor improvement and fix in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2579#issuecomment-62119885 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23049/ Test PASSed.
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62120054 Actually, this is caused by https://github.com/marmbrus/spark/commit/85872f6e2fbb2385793b645a629ed26ee2e98cbc#diff-1 (in https://github.com/apache/spark/pull/3063). @marmbrus is there a reason you removed `override lazy val toRdd` there? I think we should keep `override lazy val toRdd: RDD[Row] = executedPlan.execute().map(_.copy())` in `HiveContext` to avoid this issue.
[GitHub] spark pull request: [SPARK-4275]fix for path including space
GitHub user shuhuai007 opened a pull request: https://github.com/apache/spark/pull/3156 [SPARK-4275]fix for path including space You can merge this pull request into a Git repository by running: $ git pull https://github.com/shuhuai007/spark branch-1.1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3156.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3156 commit 49e26074280f4de02251aba1422d6924a3c61ef9 Author: Joe zhoujie...@126.com Date: 2014-11-07T09:46:17Z [SPARK-4275]fix for path including space
[GitHub] spark pull request: [SPARK-4275]fix for path including space
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3156#issuecomment-62120878 Duplicate of SPARK-3337
[GitHub] spark pull request: [SPARK-4275]fix for path including space
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3156#issuecomment-62121073 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62121116 In #3063, `HiveContext.toRdd` was removed ([line 377](https://github.com/apache/spark/pull/3063/files#diff-ff50aea397a607b79df9bec6f2a841dbL377)), and the copy operation was moved to `HiveContext.stringResult` ([line 436](https://github.com/apache/spark/pull/3063/files#diff-ff50aea397a607b79df9bec6f2a841dbL436)). However, the Thrift server relies on `HiveContext.toRdd` to retrieve the result RDD, which causes this bug. @marmbrus I'm a bit confused here; could you please elaborate on the reason behind this change? Reverting it should fix this bug, but I'm not sure whether that breaks any other contracts introduced in #3063.
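For context on why `toRdd` maps `_.copy()` over the rows: Spark SQL operators may reuse a single mutable row object across an iterator, so materializing results without copying leaves every reference pointing at the last row. A Spark-free sketch of the hazard (the `MutableRow` class here is illustrative, not Catalyst's):

```scala
// Illustrates why executedPlan.execute().map(_.copy()) matters: if a
// producer reuses one mutable row buffer, collecting the references
// without copying leaves every entry holding the final state.
class MutableRow(var value: Int) {
  def copy(): MutableRow = new MutableRow(value)
}

object RowReuseDemo {
  // Simulates an operator that writes each value into a shared buffer.
  def rows(data: Seq[Int]): Iterator[MutableRow] = {
    val shared = new MutableRow(0)
    data.iterator.map { v => shared.value = v; shared }
  }

  def main(args: Array[String]): Unit = {
    val broken = rows(Seq(1, 2, 3)).toList.map(_.value)
    val fixed  = rows(Seq(1, 2, 3)).map(_.copy()).toList.map(_.value)
    assert(broken == List(3, 3, 3)) // every entry saw the last write
    assert(fixed == List(1, 2, 3))  // copy() snapshots each row
    println("ok")
  }
}
```

The result-set iterator in the Thrift server hits the uncopied variant of this pattern once the copy is no longer performed in `toRdd`.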
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62121643 @scwf Oh, didn't notice you've already pointed this out :)
[GitHub] spark pull request: [MLLIB] SPARK-4231: Add RankingMetrics to exam...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r20002546 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala ---
@@ -165,22 +169,60 @@ object MovieLensALS {
       .setProductBlocks(params.numProductBlocks)
       .run(training)

-    val rmse = computeRmse(model, test, params.implicitPrefs)
-
-    println(s"Test RMSE = $rmse.")
-
+    val (rmse, userMap, productMap) =
+      computeRecommendationMetrics(model, test, params.implicitPrefs)
+
+    println(s"Test RMSE = $rmse, user MAP = $userMap, product MAP = $productMap.")
+
     sc.stop()
   }
-
-  /** Compute RMSE (Root Mean Squared Error). */
-  def computeRmse(model: MatrixFactorizationModel, data: RDD[Rating], implicitPrefs: Boolean) = {
-
-    def mapPredictedRating(r: Double) = if (implicitPrefs) math.max(math.min(r, 1.0), 0.0) else r
-
+
+  /**
+   * Threshold for predictions is at 0.5
+   */
+  def mapPredictedRating(r: Double, implicitPrefs: Boolean) = {
+    if (implicitPrefs) math.max(math.min(r, 1.0), 0.0)
+    else math.max(round(r), 0.0)
+  }
+
+  /**
+   * Compute MAP (Mean Average Precision) statistics
+   */
+  def computeMap(predictedAndLabels: RDD[(Int, (Double, Double))]) = {
+    val ranking = predictedAndLabels.groupByKey.map {
+      case (user, entries) => {
+        val predictionValues = entries.toArray
--- End diff -- I was going to comment on this point too: MAP has a max of 1.0. The input to `RankingMetrics` should be an RDD of (predicted IDs array, ground truth IDs array) pairs, where the predictions are ordered by score (position matters for average precision at K).
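MLnick's point (ranked predicted IDs against a ground-truth set, MAP bounded by 1.0) can be sketched without Spark. This per-user average-precision helper is illustrative only, not MLlib's `RankingMetrics` implementation:

```scala
// Average precision for one user: `predicted` is ordered by score
// (rank position matters), `truth` is an unordered set of relevant IDs.
// MAP, the mean of this quantity over users, is therefore at most 1.0.
object AvgPrecision {
  def averagePrecision(predicted: Seq[Int], truth: Set[Int]): Double = {
    if (truth.isEmpty) return 0.0
    var hits = 0
    var score = 0.0
    for ((id, idx) <- predicted.zipWithIndex) {
      if (truth.contains(id)) {
        hits += 1
        score += hits.toDouble / (idx + 1) // precision at this rank
      }
    }
    score / truth.size
  }

  def main(args: Array[String]): Unit = {
    // A perfect ranking scores 1.0.
    assert(averagePrecision(Seq(1, 2, 3), Set(1, 2, 3)) == 1.0)
    // The single relevant item ranked second scores (1/2) / 1 = 0.5.
    assert(averagePrecision(Seq(9, 1), Set(1)) == 0.5)
    println("ok")
  }
}
```

Note how swapping the order of `predicted` changes the score while the set of IDs stays the same, which is exactly why rating values alone are the wrong input for a ranking metric.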
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/2991#discussion_r20003429 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala ---
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.kafka
+
+import java.util.Properties
+import java.util.concurrent.{ConcurrentHashMap, Executors}
+
+import scala.collection.Map
+import scala.collection.mutable
+import scala.reflect.{classTag, ClassTag}
+
+import kafka.common.TopicAndPartition
+import kafka.consumer.{Consumer, ConsumerConfig, ConsumerConnector}
+import kafka.serializer.Decoder
+import kafka.utils.{ZkUtils, ZKGroupTopicDirs, ZKStringSerializer, VerifiableProperties}
+import org.I0Itec.zkclient.ZkClient
+
+import org.apache.spark.{SparkEnv, Logging}
+import org.apache.spark.storage.{StreamBlockId, StorageLevel}
+import org.apache.spark.streaming.receiver.{BlockGeneratorListener, BlockGenerator, Receiver}
+
+private[streaming]
+class ReliableKafkaReceiver[
+    K: ClassTag,
+    V: ClassTag,
+    U <: Decoder[_]: ClassTag,
+    T <: Decoder[_]: ClassTag](
+    kafkaParams: Map[String, String],
+    topics: Map[String, Int],
+    storageLevel: StorageLevel)
+  extends Receiver[Any](storageLevel) with Logging {
+
+  /** High level consumer to connect to Kafka */
+  private var consumerConnector: ConsumerConnector = null
+
+  /** zkClient to connect to Zookeeper to commit the offsets */
+  private var zkClient: ZkClient = null
+
+  private val groupId = kafkaParams("group.id")
+
+  private lazy val env = SparkEnv.get
+
+  private val AUTO_OFFSET_COMMIT = "auto.commit.enable"
+
+  /** A HashMap to manage the offset for each topic/partition; this HashMap is accessed in
+   *  synchronized blocks, so a mutable HashMap will not hit concurrency issues */
+  private lazy val topicPartitionOffsetMap = new mutable.HashMap[TopicAndPartition, Long]
+
+  /** A concurrent HashMap to store the stream block id and related offset snapshot */
+  private lazy val blockOffsetMap =
+    new ConcurrentHashMap[StreamBlockId, Map[TopicAndPartition, Long]]
+
+  private lazy val blockGeneratorListener = new BlockGeneratorListener {
--- End diff -- It would be good to define a named class for this generator listener.
[GitHub] spark pull request: [WIP] Allow disabling direct allocation in Net...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3155#issuecomment-62127176 [Test build #23050 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23050/consoleFull) for PR 3155 at commit [`5645c30`](https://github.com/apache/spark/commit/5645c30d64c9b4c9095a5a8ff82647e97943be2d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [WIP] Allow disabling direct allocation in Net...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3155#issuecomment-62127181 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23050/ Test PASSed.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62127452 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23051/ Test PASSed.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62127443 [Test build #23051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23051/consoleFull) for PR 3146 at commit [`ed1102a`](https://github.com/apache/spark/commit/ed1102a007097e8eeb1d87f8cac0c85b3e71e2dd). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62135443 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23052/consoleFull) for PR 2906 at commit [`8355f95`](https://github.com/apache/spark/commit/8355f959f02ca67454c9cb070912480db0a44671). * This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-4225][SQL] Resorts to SparkContext.vers...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3105#issuecomment-62139172 @scwf Thanks for reminding, rebased.
[GitHub] spark pull request: [SPARK-4225][SQL] Resorts to SparkContext.vers...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3105#issuecomment-62140381 [Test build #23053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23053/consoleFull) for PR 3105 at commit [`d9585e1`](https://github.com/apache/spark/commit/d9585e1db73798b881f1908e784c6fffd8ff9446). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/3157 SPARK-4297 [BUILD] Build warning fixes omnibus There are a number of warnings generated in a normal, successful build right now. They're mostly Java unchecked cast warnings, which can be suppressed. But there's a grab bag of other Scala language warnings and so on that can all be easily fixed. The forthcoming PR fixes about 90% of the build warnings I see now. You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-4297 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3157.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3157 commit 17bc58143948d6deea8ada0ad9643958b5daf1db Author: Sean Owen so...@cloudera.com Date: 2014-11-07T13:30:33Z Suppress unchecked cast warnings, and several other build warning fixes
[GitHub] spark pull request: [SPARK-3722][Docs]minor improvement and fix in...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2579#issuecomment-62144467 @srowen did you have any further comments?
[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-62144509 [Test build #23054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23054/consoleFull) for PR 3157 at commit [`17bc581`](https://github.com/apache/spark/commit/17bc58143948d6deea8ada0ad9643958b5daf1db). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-62144778 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23054/ Test FAILed.
[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-62144774 [Test build #23054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23054/consoleFull) for PR 3157 at commit [`17bc581`](https://github.com/apache/spark/commit/17bc58143948d6deea8ada0ad9643958b5daf1db). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3722][Docs]minor improvement and fix in...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2579#issuecomment-62144929 LGTM
[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-62146486 [Test build #23055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23055/consoleFull) for PR 3157 at commit [`27800f7`](https://github.com/apache/spark/commit/27800f7602b2e1c338f176f6ebc46b65fc280b9a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62147985 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23052/consoleFull) for PR 2906 at commit [`8355f95`](https://github.com/apache/spark/commit/8355f959f02ca67454c9cb070912480db0a44671). * This patch **passes all tests**. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * `public class JavaHierarchicalClustering ` * `trait HierarchicalClusteringConf extends Serializable ` * `class HierarchicalClustering(` * `class HierarchicalClusteringModel(object):` * `class HierarchicalClustering(object):`
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62147990 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23052/ Test PASSed.
[GitHub] spark pull request: [SPARK-4225][SQL] Resorts to SparkContext.vers...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3105#issuecomment-62156413 [Test build #23053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23053/consoleFull) for PR 3105 at commit [`d9585e1`](https://github.com/apache/spark/commit/d9585e1db73798b881f1908e784c6fffd8ff9446). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4225][SQL] Resorts to SparkContext.vers...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3105#issuecomment-62156422 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23053/ Test PASSed.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date classes w...
Github user culler closed the pull request at: https://github.com/apache/spark/pull/3066
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date classes w...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3066#issuecomment-62158264 Hi @liancheng, now that I have completely screwed up this PR by attempting to rebase the repository, I will close it and open a new one which will hopefully be clean.
[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-62161953 [Test build #23055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23055/consoleFull) for PR 3157 at commit [`27800f7`](https://github.com/apache/spark/commit/27800f7602b2e1c338f176f6ebc46b65fc280b9a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-62161978 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23055/ Test PASSed.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62164793 Those changes are made, and I removed the extra static methods that I added. I agree--it's much cleaner now. Not sure if any cleanup can be done on the existing static methods--looks like they're only used in the test suites, but I'm going to leave them alone for now.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on the pull request: https://github.com/apache/spark/pull/2848#issuecomment-62165292 OK, I refactored a little further. Updates: * renamed the helper function `maybeMoveFile` (instead of `moveFile`) * introduced a second signature for `maybeMoveFile` that just takes two `File`s * this allowed me to bring the 3rd instance of this repeated logic in `Utils.doFetchFile` into the fold, which helps the overall consistency / cleanliness a lot, I think. * incidentally, that last code path handled the `exists` vs. `delete()` trickery differently than I was doing before; it used a boolean `var` that recorded explicitly whether we `shouldCopy` (`true` to start, set to `false` iff we found an identical file to exist). I decided that this way was cleaner, per @andrewor14's and @pwendell's (earlier in this thread) suggestions, and structured `maybeMoveFile` that way. * folded the code path around L397 into `maybeMoveFile` as well, per @andrewor14's last suggestion. lmk how it looks!
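The `shouldCopy` pattern described above can be sketched as follows. This is a minimal, hedged illustration of the boolean-`var` approach (start `true`, flip to `false` iff an identical file already exists), not the PR's actual code: `FileMoveSketch`, `filesEqual`, and the exact signature of `maybeMoveFile` are assumptions.

```scala
import java.io.File
import java.nio.file.{Files, StandardCopyOption}

// Sketch of the repeated copy-if-needed logic the comment describes:
// copy src to dest unless an identical file is already there, deleting
// a stale (different) destination first. Names are illustrative only.
object FileMoveSketch {
  private def filesEqual(a: File, b: File): Boolean =
    Files.readAllBytes(a.toPath).sameElements(Files.readAllBytes(b.toPath))

  def maybeMoveFile(src: File, dest: File): Unit = {
    // `shouldCopy` starts true and is set to false iff an identical
    // file already exists at the destination.
    var shouldCopy = true
    if (dest.exists()) {
      if (filesEqual(src, dest)) {
        shouldCopy = false
      } else {
        dest.delete() // stale copy with different contents; replace it
      }
    }
    if (shouldCopy) {
      Files.copy(src.toPath, dest.toPath, StandardCopyOption.REPLACE_EXISTING)
    }
  }
}
```

Calling `maybeMoveFile` a second time on an unchanged source is then a no-op, which is the point of the `shouldCopy` check.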
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
GitHub user culler opened a pull request: https://github.com/apache/spark/pull/3158 [SPARK-4205][SQL] Timestamp and Date with comparisons / DSL literals This is the same as pull request #3066, which I closed due to corruption of the repository after I tried to rebase so as to include modifications to a test file added after the original pull request was issued. There are two parts: (1) new RichDate and RichTimestamp classes provide comparison operators, which allows them to be used in DSL expressions, and initializers which accept string representations of dates or times; (2) new implicit conversions are added which allow recognition of DSL expressions which have a literal on the left, e.g. 0 < 'x . You can merge this pull request into a Git repository by running: $ git pull https://github.com/culler/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3158.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3158 commit 7180345ba7ce255a2fc389ae8c55998ed20b9a82 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-07T16:11:09Z Adds RichDate and RichTimestamp classes with comparison operators, allowing them to be used in DSL expressions. These classes provide initializers which accept string representations of dates or times. They are renamed as Date and Timestamp when the members of an SQLContext are in scope. commit bcf6e6bb143f8e4a5f22356fadae54fce4f57041 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-07T16:17:33Z Adds new implicit conversions which allow DSL expressions to start with a literal, e.g. 0 < 'x . These conversions expose a conflict with the scalatest === operator if assert(X === Y) is used when the conversions are in scope.
To fix this, several tests are modified, as recommended in the scalatest documentation, by making the change: assert(X === Y) -> assert(convertToEqualizer(X).===(Y)) commit ef5e4a4230d671ed2ae19f74c280f5e8c44f41aa Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-07T16:38:18Z Clarification of one comment.
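Part (1) of the PR description above can be illustrated with a small standalone sketch. This only shows the comparison-operator and string-initializer idea; the real classes additionally hook into Spark SQL's DSL, and everything here beyond the `RichDate` name is an assumption:

```scala
import java.sql.Date

// Illustrative sketch of a Date wrapper with comparison operators, in the
// spirit of the RichDate class the PR describes. Extending Ordered gives
// the <, <=, >, >= operators from a single compare method.
case class RichDate(value: Date) extends Ordered[RichDate] {
  def compare(that: RichDate): Int = value.compareTo(that.value)
}

object RichDate {
  // String initializer, e.g. RichDate("2014-11-07"), mirroring the
  // string-representation initializers mentioned in the description.
  def apply(s: String): RichDate = RichDate(Date.valueOf(s))
}
```

With such a wrapper, expressions like `RichDate("2014-11-07") < RichDate("2014-11-08")` type-check, which is what allows the values to appear in comparison-based DSL expressions.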
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-62175153 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-62175638 @liancheng and @rxin, I am reopening pull request #3066 as #3158 so it can be based on a current commit of the spark source. I messed up #3066 by trying to rebase it after a new test file was added which required minor changes to compile. Sorry for any confusion this causes.
[GitHub] spark pull request: [SPARK-4281][Build] Package Yarn shuffle servi...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3147#discussion_r20023234 --- Diff: network/common/pom.xml --- @@ -41,12 +41,12 @@ <groupId>io.netty</groupId> <artifactId>netty-all</artifactId> </dependency> + + <!-- Provided dependencies --> <dependency> <groupId>org.slf4j</groupId> <artifactId>slf4j-api</artifactId> --- End diff -- Yeah, actually Yarn already provides slf4j so it doesn't need to be a core dependency. For standalone mode, this is also required by Spark so it should just be a provided dependency. HOWEVER I just realized I forgot to actually make it provided by adding the tag.
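The missing tag referred to above is Maven's `<scope>` element. The intended end state is roughly the following fragment (a sketch of the relevant pom.xml section, not the PR's exact diff; version management is assumed to come from the parent pom):

```xml
<!-- Provided dependencies: slf4j is supplied by Yarn (and by Spark
     itself in standalone mode), so it is not bundled. -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <scope>provided</scope>
</dependency>
```

A `provided`-scoped dependency is available at compile time but excluded from the packaged artifact, which is exactly the behavior described in the comment.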
[GitHub] spark pull request: [SPARK-4281][Build] Package Yarn shuffle servi...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3147#discussion_r20023680 --- Diff: network/yarn/pom.xml --- @@ -54,5 +54,38 @@ <build> <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory> <testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory> + <plugins> --- End diff -- My understanding is that the shading plugin is primarily used to create uber jars, and the shading dependency part is just a generally useful thing in this process: http://maven.apache.org/plugins/maven-shade-plugin/. This is how we create assembly jars in say the `example` and `core` modules, except the difference here is that we don't actually need to shade any dependencies. I think this is a pretty standard thing to do and I'm not sure if a comment is necessary.
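A shade-plugin configuration of the kind described (uber jar at `package` time, with no relocations since nothing is actually renamed) looks roughly like this; the executions and filters in the PR itself may differ:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <!-- No <relocations>: nothing is shaded; the plugin is used only to
         bundle the non-provided dependencies into a single jar. -->
    <shadedArtifactAttached>false</shadedArtifactAttached>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```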
[GitHub] spark pull request: [SPARK-4291][Build] Rename network module proj...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3148#issuecomment-62180406 Ok, thanks @shivaram. I renamed it Spark Project Networking. What do others think about this name?
[GitHub] spark pull request: [SPARK-4281][Build] Package Yarn shuffle servi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3147#issuecomment-62180858 [Test build #23056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23056/consoleFull) for PR 3147 at commit [`65db822`](https://github.com/apache/spark/commit/65db8227ef5632ff53574fc8efd7c579b6f26133). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4281][Build] Package Yarn shuffle servi...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3147#discussion_r20024208 --- Diff: network/common/pom.xml --- @@ -41,12 +41,12 @@ <groupId>io.netty</groupId> <artifactId>netty-all</artifactId> </dependency> + + <!-- Provided dependencies --> <dependency> <groupId>org.slf4j</groupId> <artifactId>slf4j-api</artifactId> --- End diff -- Ok I added the tag
[GitHub] spark pull request: [SPARK-4291][Build] Rename network module proj...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3148#issuecomment-62182350 [Test build #23057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23057/consoleFull) for PR 3148 at commit [`eac839b`](https://github.com/apache/spark/commit/eac839b0c8524ae778b09c23b7296a1c75e51297). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62183450 Merging in master & branch-1.2. Thanks.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3146
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-62187026 @witgo Thank you for your suggestion! Could you elaborate how the ALS algorithm design could be used?
[GitHub] spark pull request: invalid variable
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3154#issuecomment-62189370 Do you mind closing this PR?
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62189648 Good catch guys, and thanks for adding a test. The comment on `toRdd` has always been `/** Internal version of the RDD. Avoids copies and has no schema */` so it was kind of confusing that this was different for Hive. I think the right solution here is to avoid using the internal `queryExecution` API from the thrift server and instead just call `.collect()` on `resultRdd`.
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62190459 @marmbrus, i think you mean ```.collect()``` on ```result```, not ```resultRdd```, right?
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-62190595 For matrix factorization we have user x product sparse matrix...You can think of this sparse matrix as the feature matrix for ANN...Now consider two matrices H1 and H2 of size feature x rank...where rank is the number of hidden layers...With this the problem is minimize || X - f(H1'X)H2 || + lambdaL1(H1) + lambdaL2(H2) The major difference is can H1'X breaks the way matrix factorization breaks ? If it can then we should be able to use ALS design...or an extension of ALS design... But say the hidden layer grows from 1 to 10 (Latest Google paper mentioned 22 layers)...then I don't think this idea works...we have to formulate the problem on graphx where the model is distributed over workers and not built on Master @witgo you think we can break f(H1'X) in ALS way? I have not thought more on it !
[GitHub] spark pull request: [SPARK-4291][Build] Rename network module proj...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3148#issuecomment-62191364 [Test build #23057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23057/consoleFull) for PR 3148 at commit [`eac839b`](https://github.com/apache/spark/commit/eac839b0c8524ae778b09c23b7296a1c75e51297). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4291][Build] Rename network module proj...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3148#issuecomment-62191373 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23057/ Test FAILed.
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62191452 Yes, correct.
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-62191470 f is the neural activation; it can be tanh or a sigmoid function (they are non-convex and nonlinear), or ReLU units (max is convex). In this PR https://github.com/apache/spark/pull/2705 I am experimenting with convex and nonlinear functions for the matrix factorization loss. The idea is to use the gradient interfaces for the loss functions. If f(H1'X) can break component-wise, we can reuse a lot of the ALS development.
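A hedged reading of the "break component-wise" question, in my own notation following the objective sketched in the comments above (this is an interpretation, not the PR's formulation):

```latex
% Objective discussed in the thread (notation mine):
\min_{H_1,\, H_2}\; \bigl\lVert X - f(H_1^{\top} X)\, H_2 \bigr\rVert^2
  \;+\; \lambda_1\, L_1(H_1) \;+\; \lambda_2\, L_2(H_2)
```

If f is the identity, fixing H1 turns the H2 update into a regularized least-squares problem that splits across columns, which is exactly the structure ALS exploits. With a nonlinear f, the H2 update (for fixed H1) stays a least-squares problem in the design matrix f(H1'X), but the H1 update becomes non-quadratic and no longer splits the ALS way, which seems to be the obstacle debated here.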
[GitHub] spark pull request: [SPARK-4291][Build] Rename network module proj...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3148#issuecomment-62191626 retest this please
[GitHub] spark pull request: [SPARK-4291][Build] Rename network module proj...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3148#issuecomment-62192179 [Test build #23058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23058/consoleFull) for PR 3148 at commit [`eac839b`](https://github.com/apache/spark/commit/eac839b0c8524ae778b09c23b7296a1c75e51297). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62192492 Hmm, I think ```result.collect``` is OK, but can ```result.toLocalIterator``` get the right answer?
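For readers following the thread, a minimal sketch of the distinction being debated, with hypothetical table and query names that are not taken from the PR:

```scala
// Sketch only — assumes a 2014-era SQLContext named `sqlContext` and a
// registered table `test`; both names are illustrative.
val result = sqlContext.sql("SELECT key FROM test")

// Eager: every partition is computed and shipped to the driver before
// anything is returned, so all rows see the same session state.
val allRows = result.collect()

// Lazy: partitions are computed and fetched one at a time as the iterator
// is consumed; session state that changes between fetches can affect the
// rows produced for later partitions.
val rowIter = result.toLocalIterator
rowIter.foreach(println)
```

This is why `collect` can look correct while `toLocalIterator` misbehaves: the iterator interleaves driver-side consumption with job execution instead of materializing everything up front.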
[GitHub] spark pull request: [SPARK-4281][Build] Package Yarn shuffle servi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3147#issuecomment-62193940 [Test build #23056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23056/consoleFull) for PR 3147 at commit [`65db822`](https://github.com/apache/spark/commit/65db8227ef5632ff53574fc8efd7c579b6f26133). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4281][Build] Package Yarn shuffle servi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3147#issuecomment-62193959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23056/ Test PASSed.