[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62110522 [Test build #23047 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23047/consoleFull) for PR 3146 at commit [`b8e2a49`](https://github.com/apache/spark/commit/b8e2a49aeed255053a52f22e03ec458ec5aecd84). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4225][SQL] Resorts to SparkContext.vers...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3105#issuecomment-62110929 @liancheng, you need to rebase this :)
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62111028 [Test build #23048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23048/consoleFull) for PR 3149 at commit [`8b2d845`](https://github.com/apache/spark/commit/8b2d84540b154b5092c81f960e463e851ff6ab54). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3721] [PySpark] broadcast objects large...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2659#issuecomment-62111305 [Test build #23042 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23042/consoleFull) for PR 2659 at commit [`a2f6a02`](https://github.com/apache/spark/commit/a2f6a02afed1df72d994d067017a3403c1adf933). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class SizeLimitedStream(object):` * `class CompressedStream(object):` * `class LargeObjectSerializer(Serializer):` * `class CompressedSerializer(Serializer):`
[GitHub] spark pull request: Update JavaCustomReceiver.java
GitHub user xiao321 opened a pull request: https://github.com/apache/spark/pull/3153 Update JavaCustomReceiver.java (array index out of bounds) You can merge this pull request into a Git repository by running: $ git pull https://github.com/xiao321/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3153.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3153 commit 0ed17b578decfdb3221ae1bcba8de6f877983ef2 Author: xiao321 1042460...@qq.com Date: 2014-11-07T08:11:52Z Update JavaCustomReceiver.java (array index out of bounds)
[GitHub] spark pull request: Update JavaCustomReceiver.java
Github user xiao321 commented on the pull request: https://github.com/apache/spark/pull/3153#issuecomment-62111433 The array index is out of bounds.
[GitHub] spark pull request: Update JavaCustomReceiver.java
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3153#issuecomment-62111582 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62111641 This sort of seems like it's reinventing what Thrift or protobuf do. Also, why is it necessary to introduce another serialization-related interface just to customize the serialization? Not objecting so much as asking why you can't just override the serialization with a desired compact serialization, or use a library.
[GitHub] spark pull request: Update JavaCustomReceiver.java
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3153#issuecomment-62111740 LGTM but the title and description are not informative.
[GitHub] spark pull request: [SPARK-1812] Scala 2.11 support.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3111#issuecomment-62112493 [Test build #23043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23043/consoleFull) for PR 3111 at commit [`19a5167`](https://github.com/apache/spark/commit/19a5167ef3d7e573ad053ec33d93e5dc76149bea). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1812] Scala 2.11 support.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3111#issuecomment-62112498 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23043/ Test FAILed.
[GitHub] spark pull request: [WIP][SPARK-3530][MLLIB] pipeline and paramete...
Github user tomerk commented on the pull request: https://github.com/apache/spark/pull/3099#issuecomment-62112717 At @shivaram's suggestion, I started porting over a simple text classifier pipeline, which was already using an Estimator/Transformer abstraction of RDD[U] to RDD[V] transforms, to this interface. The almost-complete port (the imports got messed up when moving files around) can be found at https://github.com/shivaram/spark-ml/commit/522aec73172b28a4bc1b22df030a459fddbd93dd. Beyond what Shivaram already mentioned, here are my thoughts:

1. The trickiest bit by far was all of the implicit conversions. I ended up needing several kinds of implicit conversion imports (case class to SchemaRDD, the Spark SQL DSL, parameter maps, etc.). They also got mysteriously deleted by the IDE as I moved files between projects, so I ended up having to copy and paste them whenever appropriate because I couldn't keep track of them.
2. Like Shivaram, I'm also not familiar with the Spark SQL DSL, so here I also had to copy and paste code. It's unclear what syntax is valid and what isn't. For example, is saying `as outputCol` enough, or is `as Symbol(outputCol)` required?
3. There is a lot of boilerplate code. It was easier to write the Transformers in the form RDD[U] to RDD[V] instead of SchemaRDD to SchemaRDD, so I fully agree with Shivaram on that front. Certain interfaces along those lines (iterator-to-iterator transformers that can be applied to RDDs using mapPartitions) could potentially make it easier for transformers not to depend on local SparkContexts to execute.
4. I found the parameter mapping in estimators fairly verbose; I like Shivaram's idea of having the estimators pass everything to the transformers no matter what.
5. Estimators requiring the transformers they output to extend Model didn't make sense to me. Certain estimators, such as one that chooses only the most frequent tokens in a collection to keep for each document, don't seem like they should output models. On that front, should estimators be required to specify the type of transformer they output? It can be convenient sometimes to just inline an anonymous Transformer without making it a top-level class.
6. There are a lot of parameter traits: HasRegParam, HasMaxIter, HasScoreCol, HasFeatureCol. Does it make sense to have this many specific parameter traits if we still have to maintain boilerplate setter code for Java anyway?
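The RDD[U] to RDD[V] shape of Estimator/Transformer discussed above can be sketched in plain Java. This is a minimal illustration, not the actual MLlib pipeline API: the interface and method names are hypothetical, and `List<U>` stands in for `RDD[U]` to keep the example self-contained.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the Estimator/Transformer abstraction discussed
// above, using List<U> in place of RDD[U] so the sketch runs without Spark.
interface Transformer<U, V> {
    List<V> transform(List<U> input);
}

interface Estimator<U, V> {
    // fit() learns from the training data and returns a Transformer ("model").
    Transformer<U, V> fit(List<U> trainingData);
}

public class PipelineSketch {
    public static void main(String[] args) {
        // A trivial "estimator" that learns an average-length threshold and
        // returns a transformer mapping each document to a boolean feature.
        Estimator<String, Boolean> longDocEstimator = docs -> {
            int avg = (int) docs.stream().mapToInt(String::length).average().orElse(0);
            return input -> input.stream()
                    .map(s -> s.length() > avg)
                    .collect(Collectors.toList());
        };
        Transformer<String, Boolean> model =
                longDocEstimator.fit(List.of("short", "a much longer document"));
        System.out.println(model.transform(List.of("hi", "another long document here")));
    }
}
```

Note how the anonymous transformer returned by `fit` never needs to be a named top-level class, which is the convenience point raised in item 5 above.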
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62113295 **TL;DR:** The goal is to keep the network package small, with minimal dependencies and minimal overhead to verify cross-version compatibility moving forward. It is my feeling that protobuf and thrift are expensive dependencies to have, and that Java serialization is harder to reason about. The problem with using thrift or protobuf is inherently about dependencies. Protobuf dependencies are already a mess in Spark due to different, backwards-incompatible versions being used in Hadoop, Mesos, Akka, etc., and adding a real dependency in Spark just complicates the issue. Thrift is another relatively common dependency and has a few extra dependencies of its own, but I haven't explored that route as far. Since the code here is intended to work while running within other JVMs (e.g., YARN Node Manager), we want to keep dependencies down. Other parts of the network package use the Encodable interface because they write directly to Netty and this API is thus natural (decoding ByteBufs from an IO buffer, for instance). The choice of using Encodable here rather than implementing Externalizable/Serializable objects is for two reasons: simplicity and flexibility. The Java serialization framework brings a lot of baggage and has some non-obvious pitfalls, and accidental misuse may go unnoticed until the serial version id mismatch errors arrive. Second, it is less obvious how to explicitly handle changes in classes between versions. Since we expect the shuffle service to be long-lived, we must be able to simply and straightforwardly verify that code will work in a cross-version manner, and I feel that that is harder to prove when relying on Java serialization. Finally, the thing that makes this problem tractable, in my opinion, is that we should never be serializing complex object graphs at this level of the API. 
Everything should be ultimately simple, primitive types with minimal to no abstract types. We're not trying to solve serialization of general objects, just serialization of small, mostly static messages. Arrays of Strings should be the most complicated things we have to serialize.
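The Encodable style described above (each message knows its encoded length and reads/writes its own fields) can be sketched as follows. This is an illustration, not Spark's actual network-package code: the message type and field names are hypothetical, and `java.nio.ByteBuffer` stands in for Netty's `ByteBuf` so the example is self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of an Encodable-style message: a fixed, explicit wire format of
// primitive fields, with no Java serialization or schema library involved.
public class OpenBlocksMessage {
    public final String appId;
    public final String execId;

    public OpenBlocksMessage(String appId, String execId) {
        this.appId = appId;
        this.execId = execId;
    }

    // Each string is written as a 4-byte length followed by its UTF-8 bytes.
    public int encodedLength() {
        return 4 + appId.getBytes(StandardCharsets.UTF_8).length
             + 4 + execId.getBytes(StandardCharsets.UTF_8).length;
    }

    public void encode(ByteBuffer buf) {
        writeString(buf, appId);
        writeString(buf, execId);
    }

    public static OpenBlocksMessage decode(ByteBuffer buf) {
        return new OpenBlocksMessage(readString(buf), readString(buf));
    }

    private static void writeString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buf.putInt(bytes.length);
        buf.put(bytes);
    }

    private static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OpenBlocksMessage msg = new OpenBlocksMessage("app-1", "exec-7");
        ByteBuffer buf = ByteBuffer.allocate(msg.encodedLength());
        msg.encode(buf);
        buf.flip();
        OpenBlocksMessage decoded = OpenBlocksMessage.decode(buf);
        System.out.println(decoded.appId + " " + decoded.execId);
    }
}
```

Because the wire format is spelled out field by field, cross-version compatibility can be audited by reading the encode/decode pair, which is the verifiability argument made above.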
[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62113523 [Test build #23045 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23045/consoleFull) for PR 3130 at commit [`076322b`](https://github.com/apache/spark/commit/076322bc9151926002f494dbab4e3e1de1caef2e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62113528 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23045/ Test PASSed.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62114325 Thanks! Good to hear the reasoning. It is indeed lightweight, and the use case is not quite the same as the usual general serialization use cases.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62114341 @srowen I was initially actually for protobuf or avro, but looking at the dependency list, it'd be hard to guarantee compatibility in the future. Given that the number of messages we are actually serializing is very small, the work to do a custom serialization protocol is very contained.
[GitHub] spark pull request: [SPARK-4294][Streaming] The same function shou...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3151#issuecomment-62114557 How much change would it take to use `require()` consistently across the code base? Looks like 10-20 occurrences. I wonder if people would find that too disruptive to be worth it, but it seems better to fix it all or not bother fixing it one by one.
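For context, Scala's `require(cond, msg)` throws an `IllegalArgumentException` when the condition fails, which is the consistent pattern being proposed above. A Java rendering of the same idiom might look like this (the helper class and messages are illustrative, not from the Spark codebase):

```java
// A minimal require()-style precondition helper mirroring Scala's Predef.require:
// it throws IllegalArgumentException with a "requirement failed" prefix.
public class Preconditions {
    static void require(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException("requirement failed: " + message);
        }
    }

    public static void main(String[] args) {
        require(1 > 0, "batch duration must be positive"); // passes silently
        try {
            require(false, "batch duration must be positive");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```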
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62115143 It looks good to me in general, and I like the idea of summarizing the convertible data type checking, but in the meantime I am a little afraid it might be error-prone for future maintenance or when new data types are added. Or can we remove the `resolve` method?
[GitHub] spark pull request: Update JavaCustomReceiver.java
Github user xiao321 commented on the pull request: https://github.com/apache/spark/pull/3153#issuecomment-62115778 Oh, sorry. When I run the command `bin/run-example org.apache.spark.examples.streaming.JavaCustomReceiver localhost`, the error is java.lang.ClassNotFoundException: org.apache.spark.examples.arg.apache.spark.examples.streaming.JavaCustomReceiver, so I changed the command to `bin/run-example streaming.JavaCustomReceiver localhost`, and then the error is java.lang.ArrayIndexOutOfBoundsException: 2. Then I viewed the source and found `JavaReceiverInputDStream<String> lines = ssc.receiverStream(new JavaCustomReceiver(args[1], Integer.parseInt(args[2])));`. I think this should be changed to `JavaReceiverInputDStream<String> lines = ssc.receiverStream(new JavaCustomReceiver(args[0], Integer.parseInt(args[1])));`. Am I wrong?
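The off-by-one described above can be demonstrated in isolation. This is a simplified stand-in, not the real example program: with the usage `run-example streaming.JavaCustomReceiver <host> <port>` the args array has two elements, so reading `args[1]`/`args[2]` overruns it, while `args[0]`/`args[1]` is correct.

```java
// Sketch of the indexing bug: args holds {host, port}, i.e. two elements.
public class ArgsIndexDemo {
    public static void main(String[] argv) {
        String[] args = {"localhost", "9999"};
        try {
            String host = args[1];                 // reads "9999" (wrong field)
            int port = Integer.parseInt(args[2]);  // throws: index 2, length 2
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught ArrayIndexOutOfBoundsException");
        }
        // The corrected indexing reads the intended host and port.
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port);
    }
}
```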
[GitHub] spark pull request: [SPARK-4269][SQL] make wait time configurable ...
Github user jackylk commented on the pull request: https://github.com/apache/spark/pull/3133#issuecomment-62115794 IMHO, first, I think it is not good practice to put any hard-coded value in the code; it is better to let the user have more control over the configuration according to his needs, since he knows his environment best. Second, it did fail some SQL queries involving multi-table joins in my own environment. The code is updated according to your comment.
[GitHub] spark pull request: invalid variable
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/3154 invalid variable You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3154.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3154 commit f5bde61e4597e89fcadbec73a7d28c3ccf2ac569 Author: viper-kun xukun...@huawei.com Date: 2014-11-07T09:15:19Z invalid variable
[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62116902 [Test build #23046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23046/consoleFull) for PR 3130 at commit [`076322b`](https://github.com/apache/spark/commit/076322bc9151926002f494dbab4e3e1de1caef2e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-62116911 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23046/ Test PASSed.
[GitHub] spark pull request: invalid variable
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3154#issuecomment-62116932 Can one of the admins verify this patch?
[GitHub] spark pull request: Update JavaCustomReceiver.java
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3153#issuecomment-62117084 No, I agree with the change. I'm saying that "Update JavaCustomReceiver.java" with no description is not a helpful title. Normally changes need a JIRA too, although this one is so trivial that it may not.
[GitHub] spark pull request: invalid variable
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3154#issuecomment-62117020 That variable is used on line 194; I don't think you can remove it. This is a trivial change anyway, and it doesn't have any useful description.
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62117247 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23048/ Test PASSed.
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62117243 [Test build #23048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23048/consoleFull) for PR 3149 at commit [`8b2d845`](https://github.com/apache/spark/commit/8b2d84540b154b5092c81f960e463e851ff6ab54). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class RetryingBlockFetcher `
[GitHub] spark pull request: [WIP] Allow disabling direct allocation in Net...
GitHub user aarondav opened a pull request: https://github.com/apache/spark/pull/3155 [WIP] Allow disabling direct allocation in NettyBlockTransferService You can merge this pull request into a Git repository by running: $ git pull https://github.com/aarondav/spark conf Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3155.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3155 commit 5645c30d64c9b4c9095a5a8ff82647e97943be2d Author: Aaron Davidson aa...@databricks.com Date: 2014-11-07T09:21:45Z [WIP] Allow disabling direct allocation in NettyBlockTransferService
[GitHub] spark pull request: Update JavaCustomReceiver.java
Github user xiao321 commented on the pull request: https://github.com/apache/spark/pull/3153#issuecomment-62117528 Sorry, I am a beginner. I will pay attention next time.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62117594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23047/ Test PASSed.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62117588 [Test build #23047 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23047/consoleFull) for PR 3146 at commit [`b8e2a49`](https://github.com/apache/spark/commit/b8e2a49aeed255053a52f22e03ec458ec5aecd84). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/3150#discussion_r20001109 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -37,8 +42,62 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
     case (BooleanType, DateType) => true
     case (DateType, _: NumericType) => true
     case (DateType, BooleanType) => true
-    case (_, DecimalType.Fixed(_, _)) => true  // TODO: not all upcasts here can really give null
-    case _ => child.nullable
+    case (_, DecimalType.Fixed(_, _)) => true  // TODO: not all upcasts here can really give null
+    case _ => false
+  }
+
+  private[this] def resolvableNullability(from: Boolean, to: Boolean) = !from || to
+
+  private[this] def resolve(from: DataType, to: DataType): Boolean = {
+    (from, to) match {
+      case (from, to) if from == to => true
+
+      case (NullType, _) => true
+
+      case (_, StringType) => true
+
+      case (StringType, BinaryType) => true
+
+      case (StringType, BooleanType) => true
+      case (DateType, BooleanType) => true
+      case (TimestampType, BooleanType) => true
+      case (_: NumericType, BooleanType) => true
+
+      case (StringType, TimestampType) => true
+      case (BooleanType, TimestampType) => true
+      case (DateType, TimestampType) => true
+      case (_: NumericType, TimestampType) => true
+
+      case (_, DateType) => true
+
+      case (StringType, _: NumericType) => true
+      case (BooleanType, _: NumericType) => true
+      case (DateType, _: NumericType) => true
+      case (TimestampType, _: NumericType) => true
+      case (_: NumericType, _: NumericType) => true
+
+      case (ArrayType(from, fn), ArrayType(to, tn)) =>
+        resolve(from, to) && resolvableNullability(fn || forceNullable(from, to), tn)
+
+      case (MapType(fromKey, fromValue, fn), MapType(toKey, toValue, tn)) =>
+        resolve(fromKey, toKey) && (!forceNullable(fromKey, toKey)) &&
+          resolve(fromValue, toValue) &&
+          resolvableNullability(fn || forceNullable(fromValue, toValue), tn)
+
+      case (StructType(fromFields), StructType(toFields)) =>
+        fromFields.size == toFields.size && fromFields.zip(toFields).forall {
+          case (fromField, toField) =>
+            resolve(fromField.dataType, toField.dataType) &&
+              resolvableNullability(
+                fromField.nullable || forceNullable(fromField.dataType, toField.dataType),
+                toField.nullable)
+        }
+
+      case _ => false
--- End diff -- Hmm, I think the resolve check should be done during logical plan analysis.
[GitHub] spark pull request: [WIP] Allow disabling direct allocation in Net...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3155#issuecomment-62118189 [Test build #23050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23050/consoleFull) for PR 3155 at commit [`5645c30`](https://github.com/apache/spark/commit/5645c30d64c9b4c9095a5a8ff82647e97943be2d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62118486 [Test build #23051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23051/consoleFull) for PR 3146 at commit [`ed1102a`](https://github.com/apache/spark/commit/ed1102a007097e8eeb1d87f8cac0c85b3e71e2dd). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/3150#discussion_r20001249 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -37,8 +42,62 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
     case (BooleanType, DateType) => true
     case (DateType, _: NumericType) => true
     case (DateType, BooleanType) => true
-    case (_, DecimalType.Fixed(_, _)) => true  // TODO: not all upcasts here can really give null
-    case _ => child.nullable
+    case (_, DecimalType.Fixed(_, _)) => true  // TODO: not all upcasts here can really give null
+    case _ => false
+  }
+
+  private[this] def resolvableNullability(from: Boolean, to: Boolean) = !from || to
+
+  private[this] def resolve(from: DataType, to: DataType): Boolean = {
+    (from, to) match {
+      case (from, to) if from == to => true
+
+      case (NullType, _) => true
+
+      case (_, StringType) => true
+
+      case (StringType, BinaryType) => true
+
+      case (StringType, BooleanType) => true
+      case (DateType, BooleanType) => true
+      case (TimestampType, BooleanType) => true
+      case (_: NumericType, BooleanType) => true
+
+      case (StringType, TimestampType) => true
+      case (BooleanType, TimestampType) => true
+      case (DateType, TimestampType) => true
+      case (_: NumericType, TimestampType) => true
+
+      case (_, DateType) => true
+
+      case (StringType, _: NumericType) => true
+      case (BooleanType, _: NumericType) => true
+      case (DateType, _: NumericType) => true
+      case (TimestampType, _: NumericType) => true
+      case (_: NumericType, _: NumericType) => true
+
+      case (ArrayType(from, fn), ArrayType(to, tn)) =>
+        resolve(from, to) && resolvableNullability(fn || forceNullable(from, to), tn)
+
+      case (MapType(fromKey, fromValue, fn), MapType(toKey, toValue, tn)) =>
+        resolve(fromKey, toKey) && (!forceNullable(fromKey, toKey)) &&
+          resolve(fromValue, toValue) &&
+          resolvableNullability(fn || forceNullable(fromValue, toValue), tn)
+
+      case (StructType(fromFields), StructType(toFields)) =>
+        fromFields.size == toFields.size && fromFields.zip(toFields).forall {
+          case (fromField, toField) =>
+            resolve(fromField.dataType, toField.dataType) &&
+              resolvableNullability(
+                fromField.nullable || forceNullable(fromField.dataType, toField.dataType),
+                toField.nullable)
+        }
+
+      case _ => false
--- End diff -- Some expressions check `resolved` in their `dataType` method, though.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/3150#discussion_r20001270 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -323,28 +371,53 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
       buildCast[Date](_, d => dateToDouble(d))
     case TimestampType =>
       buildCast[Timestamp](_, t => timestampToDouble(t).toFloat)
-    case DecimalType() =>
-      buildCast[Decimal](_, _.toFloat)
     case x: NumericType =>
       b => x.numeric.asInstanceOf[Numeric[Any]].toFloat(b)
   }

-  private[this] lazy val cast: Any => Any = dataType match {
+  private[this] def castArray(from: ArrayType, to: ArrayType): Any => Any = {
+    val elementCast = cast(from.elementType, to.elementType)
+    buildCast[Seq[Any]](_, _.map(v => if (v == null) null else elementCast(v)))
--- End diff -- I don't think we need to handle this case specially, unlike other expressions. Elements of a type with `ArrayType.containsNull == false` are never `null`, so `elementCast(v)` will always be called.
[GitHub] spark pull request: [SPARK-4293][SQL] Make Cast be able to handle ...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/3150#issuecomment-62118596 @chenghao-intel, thank you for your comments. If the `resolve` method is removed, the nullability check (e.g. a cast from `ArrayType(IntegerType, containsNull = true)` to `ArrayType(IntegerType, containsNull = false)` is apparently invalid) is also removed, and that will cause unexpected errors. If there is a better way to ensure the nullability check, we can remove the method.
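The nullability rule being discussed can be sketched without Catalyst. Below is a minimal, self-contained illustration: `resolvableNullability` mirrors the helper in the diff, while `ArrayLike` and `castResolvable` are simplified stand-ins (not Spark's API) that model only the `containsNull` flag:

```scala
// Sketch of the nullability-resolution rule: a cast may only drop
// nullability when the source side is already non-nullable, because
// casting a possibly-null element into a "never null" slot is invalid.
object NullabilityCheck {
  // Same shape as the helper in the patch: !from || to
  def resolvableNullability(from: Boolean, to: Boolean): Boolean = !from || to

  // Simplified stand-in for ArrayType, keeping only containsNull.
  case class ArrayLike(containsNull: Boolean)

  def castResolvable(from: ArrayLike, to: ArrayLike): Boolean =
    resolvableNullability(from.containsNull, to.containsNull)

  def main(args: Array[String]): Unit = {
    // nullable -> non-nullable is rejected; every other direction is fine
    assert(!castResolvable(ArrayLike(containsNull = true), ArrayLike(containsNull = false)))
    assert(castResolvable(ArrayLike(containsNull = false), ArrayLike(containsNull = true)))
    assert(castResolvable(ArrayLike(containsNull = true), ArrayLike(containsNull = true)))
    println("ok")
  }
}
```

This is exactly the invalid case ueshin cites: `containsNull = true` to `containsNull = false` fails the check.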
[GitHub] spark pull request: [SPARK-3967] Ensure that files are fetched ato...
Github user preaudc commented on the pull request: https://github.com/apache/spark/pull/2855#issuecomment-62119248 As @ryan-williams pointed out, this is initially only a workaround to SPARK-3967. I still have no idea why the move fails (with `Permission denied`) when the source and target files are not on the same partition (it is no longer atomic, but it should succeed anyway). I made this patch because it does not seem necessary to download the file into another local directory and then move it (that may cause a copy instead of a rename, and in fact does here).
[GitHub] spark pull request: [SPARK-4294][Streaming] The same function shou...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3152#issuecomment-62119710 (Copying comment from closed PR) Is it worth replacing this same pattern everywhere? It looks like 10-20 occurrences. I don't know if that's too disruptive, but replacing them one at a time is too piecemeal.
[GitHub] spark pull request: [SPARK-3722][Docs]minor improvement and fix in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2579#issuecomment-62119877 [Test build #23049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23049/consoleFull) for PR 2579 at commit [`6f91cec`](https://github.com/apache/spark/commit/6f91cec38959f3510ae41ebf8931a72a20d6b2a7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3722][Docs]minor improvement and fix in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2579#issuecomment-62119885 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23049/ Test PASSed.
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62120054 Actually, this is caused by https://github.com/marmbrus/spark/commit/85872f6e2fbb2385793b645a629ed26ee2e98cbc#diff-1 (in https://github.com/apache/spark/pull/3063). @marmbrus is there a reason you removed `override lazy val toRdd` there? I think we should keep `override lazy val toRdd: RDD[Row] = executedPlan.execute().map(_.copy())` in `HiveContext` to avoid this issue.
[GitHub] spark pull request: [SPARK-4275]fix for path including space
GitHub user shuhuai007 opened a pull request: https://github.com/apache/spark/pull/3156 [SPARK-4275]fix for path including space You can merge this pull request into a Git repository by running: $ git pull https://github.com/shuhuai007/spark branch-1.1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3156.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3156 commit 49e26074280f4de02251aba1422d6924a3c61ef9 Author: Joe zhoujie...@126.com Date: 2014-11-07T09:46:17Z [SPARK-4275]fix for path including space
[GitHub] spark pull request: [SPARK-4275]fix for path including space
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3156#issuecomment-62120878 Duplicate of SPARK-3337
[GitHub] spark pull request: [SPARK-4275]fix for path including space
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3156#issuecomment-62121073 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62121116 In #3063, `HiveContext.toRdd` was removed ([line 377](https://github.com/apache/spark/pull/3063/files#diff-ff50aea397a607b79df9bec6f2a841dbL377)), and the copy operation was moved to `HiveContext.stringResult` ([line 436](https://github.com/apache/spark/pull/3063/files#diff-ff50aea397a607b79df9bec6f2a841dbL436)). However, the Thrift server relies on `HiveContext.toRdd` to retrieve the result RDD, which causes this bug. @marmbrus I'm a bit confused here; could you please elaborate on the reason behind this change? Reverting it should fix this bug, but I'm not sure whether that breaks any other contracts introduced in #3063.
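For context on why `toRdd` maps `_.copy()` over the rows: Spark SQL operators may reuse a single mutable row object across an iterator, so materializing results without copying leaves every reference pointing at the last row. A Spark-free sketch of the hazard (the `MutableRow` class here is illustrative, not Catalyst's):

```scala
// Illustrates why executedPlan.execute().map(_.copy()) matters: if a
// producer reuses one mutable row buffer, collecting the references
// without copying leaves every entry holding the final state.
class MutableRow(var value: Int) {
  def copy(): MutableRow = new MutableRow(value)
}

object RowReuseDemo {
  // Simulates an operator that writes each value into a shared buffer.
  def rows(data: Seq[Int]): Iterator[MutableRow] = {
    val shared = new MutableRow(0)
    data.iterator.map { v => shared.value = v; shared }
  }

  def main(args: Array[String]): Unit = {
    val broken = rows(Seq(1, 2, 3)).toList.map(_.value)
    val fixed  = rows(Seq(1, 2, 3)).map(_.copy()).toList.map(_.value)
    assert(broken == List(3, 3, 3)) // every entry saw the last write
    assert(fixed == List(1, 2, 3))  // copy() snapshots each row
    println("ok")
  }
}
```

The result-set iterator in the Thrift server hits the uncopied variant of this pattern once the copy is no longer performed in `toRdd`.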
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62121643 @scwf Oh, didn't notice you've already pointed this out :)
[GitHub] spark pull request: [MLLIB] SPARK-4231: Add RankingMetrics to exam...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r20002546 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala ---
@@ -165,22 +169,60 @@ object MovieLensALS {
       .setProductBlocks(params.numProductBlocks)
       .run(training)

-    val rmse = computeRmse(model, test, params.implicitPrefs)
-
-    println(s"Test RMSE = $rmse.")
-
+    val (rmse, userMap, productMap) =
+      computeRecommendationMetrics(model, test, params.implicitPrefs)
+
+    println(s"Test RMSE = $rmse, user MAP = $userMap, product MAP = $productMap.")
+
     sc.stop()
   }
-
-  /** Compute RMSE (Root Mean Squared Error). */
-  def computeRmse(model: MatrixFactorizationModel, data: RDD[Rating], implicitPrefs: Boolean) = {
-
-    def mapPredictedRating(r: Double) = if (implicitPrefs) math.max(math.min(r, 1.0), 0.0) else r
-
+
+  /**
+   * Threshold for predictions is at 0.5
+   */
+  def mapPredictedRating(r: Double, implicitPrefs: Boolean) = {
+    if (implicitPrefs) math.max(math.min(r, 1.0), 0.0)
+    else math.max(round(r), 0.0)
+  }
+
+  /**
+   * Compute MAP (Mean Average Precision) statistics
+   */
+  def computeMap(predictedAndLabels: RDD[(Int, (Double, Double))]) = {
+    val ranking = predictedAndLabels.groupByKey.map {
+      case (user, entries) => {
+        val predictionValues = entries.toArray
--- End diff -- I was going to comment on this point too: MAP has a max of 1.0. The input to `RankingMetrics` should be an RDD of (predicted IDs array, ground truth IDs array) pairs, where the predictions are ordered by score (position matters for average precision at K).
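MLnick's point (ranked predicted IDs against a ground-truth set, MAP bounded by 1.0) can be sketched without Spark. This per-user average-precision helper is illustrative only, not MLlib's `RankingMetrics` implementation:

```scala
// Average precision for one user: `predicted` is ordered by score
// (rank position matters), `truth` is an unordered set of relevant IDs.
// MAP, the mean of this quantity over users, is therefore at most 1.0.
object AvgPrecision {
  def averagePrecision(predicted: Seq[Int], truth: Set[Int]): Double = {
    if (truth.isEmpty) return 0.0
    var hits = 0
    var score = 0.0
    for ((id, idx) <- predicted.zipWithIndex) {
      if (truth.contains(id)) {
        hits += 1
        score += hits.toDouble / (idx + 1) // precision at this rank
      }
    }
    score / truth.size
  }

  def main(args: Array[String]): Unit = {
    // A perfect ranking scores 1.0.
    assert(averagePrecision(Seq(1, 2, 3), Set(1, 2, 3)) == 1.0)
    // The single relevant item ranked second scores (1/2) / 1 = 0.5.
    assert(averagePrecision(Seq(9, 1), Set(1)) == 0.5)
    println("ok")
  }
}
```

Note how swapping the order of `predicted` changes the score while the set of IDs stays the same, which is exactly why rating values alone are the wrong input for a ranking metric.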
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/2991#discussion_r20003429 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala ---
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.kafka
+
+import java.util.Properties
+import java.util.concurrent.{ConcurrentHashMap, Executors}
+
+import scala.collection.Map
+import scala.collection.mutable
+import scala.reflect.{classTag, ClassTag}
+
+import kafka.common.TopicAndPartition
+import kafka.consumer.{Consumer, ConsumerConfig, ConsumerConnector}
+import kafka.serializer.Decoder
+import kafka.utils.{ZkUtils, ZKGroupTopicDirs, ZKStringSerializer, VerifiableProperties}
+import org.I0Itec.zkclient.ZkClient
+
+import org.apache.spark.{SparkEnv, Logging}
+import org.apache.spark.storage.{StreamBlockId, StorageLevel}
+import org.apache.spark.streaming.receiver.{BlockGeneratorListener, BlockGenerator, Receiver}
+
+private[streaming]
+class ReliableKafkaReceiver[
+    K: ClassTag,
+    V: ClassTag,
+    U <: Decoder[_]: ClassTag,
+    T <: Decoder[_]: ClassTag](
+    kafkaParams: Map[String, String],
+    topics: Map[String, Int],
+    storageLevel: StorageLevel)
+  extends Receiver[Any](storageLevel) with Logging {
+
+  /** High level consumer to connect to Kafka */
+  private var consumerConnector: ConsumerConnector = null
+
+  /** zkClient to connect to Zookeeper to commit the offsets */
+  private var zkClient: ZkClient = null
+
+  private val groupId = kafkaParams("group.id")
+
+  private lazy val env = SparkEnv.get
+
+  private val AUTO_OFFSET_COMMIT = "auto.commit.enable"
+
+  /** A HashMap to manage the offset for each topic/partition; this HashMap is accessed in
+   *  synchronized blocks, so a mutable HashMap will not hit concurrency issues */
+  private lazy val topicPartitionOffsetMap = new mutable.HashMap[TopicAndPartition, Long]
+
+  /** A concurrent HashMap to store the stream block id and related offset snapshot */
+  private lazy val blockOffsetMap =
+    new ConcurrentHashMap[StreamBlockId, Map[TopicAndPartition, Long]]
+
+  private lazy val blockGeneratorListener = new BlockGeneratorListener {
--- End diff -- It would be good to define a named class for this generator listener.
[GitHub] spark pull request: [WIP] Allow disabling direct allocation in Net...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3155#issuecomment-62127176 [Test build #23050 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23050/consoleFull) for PR 3155 at commit [`5645c30`](https://github.com/apache/spark/commit/5645c30d64c9b4c9095a5a8ff82647e97943be2d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [WIP] Allow disabling direct allocation in Net...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3155#issuecomment-62127181 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23050/ Test PASSed.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62127452 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23051/ Test PASSed.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62127443 [Test build #23051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23051/consoleFull) for PR 3146 at commit [`ed1102a`](https://github.com/apache/spark/commit/ed1102a007097e8eeb1d87f8cac0c85b3e71e2dd). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62135443 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23052/consoleFull) for PR 2906 at commit [`8355f95`](https://github.com/apache/spark/commit/8355f959f02ca67454c9cb070912480db0a44671). * This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-4225][SQL] Resorts to SparkContext.vers...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3105#issuecomment-62139172 @scwf Thanks for reminding, rebased.
[GitHub] spark pull request: [SPARK-4225][SQL] Resorts to SparkContext.vers...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3105#issuecomment-62140381 [Test build #23053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23053/consoleFull) for PR 3105 at commit [`d9585e1`](https://github.com/apache/spark/commit/d9585e1db73798b881f1908e784c6fffd8ff9446). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/3157 SPARK-4297 [BUILD] Build warning fixes omnibus There are a number of warnings generated in a normal, successful build right now. They're mostly Java unchecked cast warnings, which can be suppressed. But there's a grab bag of other Scala language warnings and so on that can all be easily fixed. The forthcoming PR fixes about 90% of the build warnings I see now. You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-4297 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3157.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3157 commit 17bc58143948d6deea8ada0ad9643958b5daf1db Author: Sean Owen so...@cloudera.com Date: 2014-11-07T13:30:33Z Suppress unchecked cast warnings, and several other build warning fixes
[GitHub] spark pull request: [SPARK-3722][Docs]minor improvement and fix in...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2579#issuecomment-62144467 @srowen did you have any further comments?
[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-62144509 [Test build #23054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23054/consoleFull) for PR 3157 at commit [`17bc581`](https://github.com/apache/spark/commit/17bc58143948d6deea8ada0ad9643958b5daf1db). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-62144778 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23054/ Test FAILed.
[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-62144774 [Test build #23054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23054/consoleFull) for PR 3157 at commit [`17bc581`](https://github.com/apache/spark/commit/17bc58143948d6deea8ada0ad9643958b5daf1db). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3722][Docs]minor improvement and fix in...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2579#issuecomment-62144929 LGTM
[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-62146486 [Test build #23055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23055/consoleFull) for PR 3157 at commit [`27800f7`](https://github.com/apache/spark/commit/27800f7602b2e1c338f176f6ebc46b65fc280b9a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62147985 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23052/consoleFull) for PR 2906 at commit [`8355f95`](https://github.com/apache/spark/commit/8355f959f02ca67454c9cb070912480db0a44671). * This patch **passes all tests**. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * `public class JavaHierarchicalClustering ` * `trait HierarchicalClusteringConf extends Serializable ` * `class HierarchicalClustering(` * `class HierarchicalClusteringModel(object):` * `class HierarchicalClustering(object):`
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62147990 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23052/ Test PASSed.
[GitHub] spark pull request: [SPARK-4225][SQL] Resorts to SparkContext.vers...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3105#issuecomment-62156413 [Test build #23053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23053/consoleFull) for PR 3105 at commit [`d9585e1`](https://github.com/apache/spark/commit/d9585e1db73798b881f1908e784c6fffd8ff9446). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4225][SQL] Resorts to SparkContext.vers...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3105#issuecomment-62156422 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23053/ Test PASSed.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date classes w...
Github user culler closed the pull request at: https://github.com/apache/spark/pull/3066
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date classes w...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3066#issuecomment-62158264 Hi @liancheng, now that I have completely screwed up this PR by attempting to rebase the repository, I will close it and open a new one which will hopefully be clean.
[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-62161953 [Test build #23055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23055/consoleFull) for PR 3157 at commit [`27800f7`](https://github.com/apache/spark/commit/27800f7602b2e1c338f176f6ebc46b65fc280b9a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-4297 [BUILD] Build warning fixes omnibus
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3157#issuecomment-62161978 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23055/ Test PASSed.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62164793 Those changes are made, and I removed the extra static methods that I added. I agree--it's much cleaner now. Not sure if any cleanup can be done on the existing static methods--looks like they're only used in the test suites, but I'm going to leave them alone for now.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on the pull request: https://github.com/apache/spark/pull/2848#issuecomment-62165292 OK, I refactored a little further. Updates: * renamed the helper function `maybeMoveFile` (instead of `moveFile`) * introduced a second signature for `maybeMoveFile` that just takes two `File`s * this allowed me to bring the 3rd instance of this repeated logic in `Utils.doFetchFile` into the fold, which helps the overall consistency / cleanliness a lot, I think. * incidentally, that last code path handled the `exists` vs. `delete()` trickery differently than I was doing before; it used a boolean `var` that recorded explicitly whether we `shouldCopy` (`true` to start, set to `false` iff we found an identical file to exist). I decided that this way was cleaner, per @andrewor14's and @pwendell's (earlier in this thread) suggestions, and structured `maybeMoveFile` that way. * folded the code path around L397 into `maybeMoveFile` as well, per @andrewor14's last suggestion. lmk how it looks!
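The `shouldCopy` pattern described above can be sketched as follows. This is a minimal, hedged illustration of the boolean-`var` approach (start `true`, flip to `false` iff an identical file already exists), not the PR's actual code: `FileMoveSketch`, `filesEqual`, and the exact signature of `maybeMoveFile` are assumptions.

```scala
import java.io.File
import java.nio.file.{Files, StandardCopyOption}

// Sketch of the repeated copy-if-needed logic the comment describes:
// copy src to dest unless an identical file is already there, deleting
// a stale (different) destination first. Names are illustrative only.
object FileMoveSketch {
  private def filesEqual(a: File, b: File): Boolean =
    Files.readAllBytes(a.toPath).sameElements(Files.readAllBytes(b.toPath))

  def maybeMoveFile(src: File, dest: File): Unit = {
    // `shouldCopy` starts true and is set to false iff an identical
    // file already exists at the destination.
    var shouldCopy = true
    if (dest.exists()) {
      if (filesEqual(src, dest)) {
        shouldCopy = false
      } else {
        dest.delete() // stale copy with different contents; replace it
      }
    }
    if (shouldCopy) {
      Files.copy(src.toPath, dest.toPath, StandardCopyOption.REPLACE_EXISTING)
    }
  }
}
```

Calling `maybeMoveFile` a second time on an unchanged source is then a no-op, which is the point of the `shouldCopy` check.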
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
GitHub user culler opened a pull request: https://github.com/apache/spark/pull/3158 [SPARK-4205][SQL] Timestamp and Date with comparisons / DSL literals This is the same as pull request #3066, which I closed due to corruption of the repository after I tried to rebase so as to include modifications to a test file added after the original pull request was issued. There are two parts: (1) new RichDate and RichTimestamp classes provide comparison operators, which allows them to be used in DSL expressions, and initializers which accept string representations of dates or times; (2) new implicit conversions are added which allow recognition of DSL expressions which have a literal on the left, e.g. 0 < 'x . You can merge this pull request into a Git repository by running: $ git pull https://github.com/culler/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3158.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3158 commit 7180345ba7ce255a2fc389ae8c55998ed20b9a82 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-07T16:11:09Z Adds RichDate and RichTimestamp classes with comparison operators, allowing them to be used in DSL expressions. These classes provide initializers which accept string representations of dates or times. They are renamed as Date and Timestamp when the members of an SQLContext are in scope. commit bcf6e6bb143f8e4a5f22356fadae54fce4f57041 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-07T16:17:33Z Adds new implicit conversions which allow DSL expressions to start with a literal, e.g. 0 < 'x . These conversions expose a conflict with the scalatest === operator if assert(X === Y) is used when the conversions are in scope.
To fix this, several tests are modified, as recommended in the scalatest documentation, by making the change: assert(X === Y) -> assert(convertToEqualizer(X).===(Y)) commit ef5e4a4230d671ed2ae19f74c280f5e8c44f41aa Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-07T16:38:18Z Clarification of one comment.
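Part (1) of the PR description above can be illustrated with a small standalone sketch. This only shows the comparison-operator and string-initializer idea; the real classes additionally hook into Spark SQL's DSL, and everything here beyond the `RichDate` name is an assumption:

```scala
import java.sql.Date

// Illustrative sketch of a Date wrapper with comparison operators, in the
// spirit of the RichDate class the PR describes. Extending Ordered gives
// the <, <=, >, >= operators from a single compare method.
case class RichDate(value: Date) extends Ordered[RichDate] {
  def compare(that: RichDate): Int = value.compareTo(that.value)
}

object RichDate {
  // String initializer, e.g. RichDate("2014-11-07"), mirroring the
  // string-representation initializers mentioned in the description.
  def apply(s: String): RichDate = RichDate(Date.valueOf(s))
}
```

With such a wrapper, expressions like `RichDate("2014-11-07") < RichDate("2014-11-08")` type-check, which is what allows the values to appear in comparison-based DSL expressions.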
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-62175153 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-62175638 @liancheng and @rxin, I am reopening pull request #3066 as #3158 so it can be based on a current commit of the spark source. I messed up #3066 by trying to rebase it after a new test file was added which required minor changes to compile. Sorry for any confusion this causes.
[GitHub] spark pull request: [SPARK-4281][Build] Package Yarn shuffle servi...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3147#discussion_r20023234 --- Diff: network/common/pom.xml --- @@ -41,12 +41,12 @@ <groupId>io.netty</groupId> <artifactId>netty-all</artifactId> </dependency> + + <!-- Provided dependencies --> <dependency> <groupId>org.slf4j</groupId> <artifactId>slf4j-api</artifactId> --- End diff -- Yeah, actually Yarn already provides slf4j so it doesn't need to be a core dependency. For standalone mode, this is also required by Spark so it should just be a provided dependency. HOWEVER I just realized I forgot to actually make it provided by adding the tag.
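The missing tag referred to above is Maven's `<scope>` element. The intended end state is roughly the following fragment (a sketch of the relevant pom.xml section, not the PR's exact diff; version management is assumed to come from the parent pom):

```xml
<!-- Provided dependencies: slf4j is supplied by Yarn (and by Spark
     itself in standalone mode), so it is not bundled. -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <scope>provided</scope>
</dependency>
```

A `provided`-scoped dependency is available at compile time but excluded from the packaged artifact, which is exactly the behavior described in the comment.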
[GitHub] spark pull request: [SPARK-4281][Build] Package Yarn shuffle servi...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3147#discussion_r20023680 --- Diff: network/yarn/pom.xml --- @@ -54,5 +54,38 @@ <build> <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory> <testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory> + <plugins> --- End diff -- My understanding is that the shading plugin is primarily used to create uber jars, and the shading dependency part is just a generally useful thing in this process: http://maven.apache.org/plugins/maven-shade-plugin/. This is how we create assembly jars in say the `example` and `core` modules, except the difference here is that we don't actually need to shade any dependencies. I think this is a pretty standard thing to do and I'm not sure if a comment is necessary.
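A shade-plugin configuration of the kind described (uber jar at `package` time, with no relocations since nothing is actually renamed) looks roughly like this; the executions and filters in the PR itself may differ:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <!-- No <relocations>: nothing is shaded; the plugin is used only to
         bundle the non-provided dependencies into a single jar. -->
    <shadedArtifactAttached>false</shadedArtifactAttached>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```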
[GitHub] spark pull request: [SPARK-4291][Build] Rename network module proj...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3148#issuecomment-62180406 Ok, thanks @shivaram. I renamed it Spark Project Networking. What do others think about this name?
[GitHub] spark pull request: [SPARK-4281][Build] Package Yarn shuffle servi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3147#issuecomment-62180858 [Test build #23056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23056/consoleFull) for PR 3147 at commit [`65db822`](https://github.com/apache/spark/commit/65db8227ef5632ff53574fc8efd7c579b6f26133). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4281][Build] Package Yarn shuffle servi...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3147#discussion_r20024208 --- Diff: network/common/pom.xml --- @@ -41,12 +41,12 @@ <groupId>io.netty</groupId> <artifactId>netty-all</artifactId> </dependency> + + <!-- Provided dependencies --> <dependency> <groupId>org.slf4j</groupId> <artifactId>slf4j-api</artifactId> --- End diff -- Ok I added the tag
[GitHub] spark pull request: [SPARK-4291][Build] Rename network module proj...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3148#issuecomment-62182350 [Test build #23057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23057/consoleFull) for PR 3148 at commit [`eac839b`](https://github.com/apache/spark/commit/eac839b0c8524ae778b09c23b7296a1c75e51297). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3146#issuecomment-62183450 Merging in master & branch-1.2. Thanks.
[GitHub] spark pull request: [SPARK-4187] [Core] Switch to binary protocol ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3146
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-62187026 @witgo Thank you for your suggestion! Could you elaborate how the ALS algorithm design could be used?
[GitHub] spark pull request: invalid variable
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3154#issuecomment-62189370 Do you mind closing this PR?
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62189648 Good catch guys, and thanks for adding a test. The comment on `toRdd` has always been `/** Internal version of the RDD. Avoids copies and has no schema */` so it was kind of confusing that this was different for Hive. I think the right solution here is to avoid using the internal `queryExecution` API from the thrift server and instead just call `.collect()` on `resultRdd`.
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62190459 @marmbrus, i think you mean ```.collect()``` on ```result```, not ```resultRdd```, right?
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-62190595 For matrix factorization we have user x product sparse matrix...You can think of this sparse matrix as the feature matrix for ANN...Now consider two matrices H1 and H2 of size feature x rank...where rank is the number of hidden layers...With this the problem is minimize || X - f(H1'X)H2 || + lambdaL1(H1) + lambdaL2(H2) The major difference is can H1'X breaks the way matrix factorization breaks ? If it can then we should be able to use ALS design...or an extension of ALS design... But say the hidden layer grows from 1 to 10 (Latest Google paper mentioned 22 layers)...then I don't think this idea works...we have to formulate the problem on graphx where the model is distributed over workers and not built on Master @witgo you think we can break f(H1'X) in ALS way? I have not thought more on it !
[GitHub] spark pull request: [SPARK-4291][Build] Rename network module proj...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3148#issuecomment-62191364 [Test build #23057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23057/consoleFull) for PR 3148 at commit [`eac839b`](https://github.com/apache/spark/commit/eac839b0c8524ae778b09c23b7296a1c75e51297). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4291][Build] Rename network module proj...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3148#issuecomment-62191373 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23057/ Test FAILed.
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62191452 Yes, correct.
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-62191470 f is the neural activation; it can be tanh or a sigmoid function (they are non-convex and nonlinear), or ReLU units (max is convex). In this PR https://github.com/apache/spark/pull/2705 I am experimenting with convex and nonlinear functions for the matrix factorization loss. The idea is to use the gradient interfaces for the loss functions. If f(H1'X) can break component-wise, we can reuse a lot of the ALS development.
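A hedged reading of the "break component-wise" question, in my own notation following the objective sketched in the comments above (this is an interpretation, not the PR's formulation):

```latex
% Objective discussed in the thread (notation mine):
\min_{H_1,\, H_2}\; \bigl\lVert X - f(H_1^{\top} X)\, H_2 \bigr\rVert^2
  \;+\; \lambda_1\, L_1(H_1) \;+\; \lambda_2\, L_2(H_2)
```

If f is the identity, fixing H1 turns the H2 update into a regularized least-squares problem that splits across columns, which is exactly the structure ALS exploits. With a nonlinear f, the H2 update (for fixed H1) stays a least-squares problem in the design matrix f(H1'X), but the H1 update becomes non-quadratic and no longer splits the ALS way, which seems to be the obstacle debated here.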
[GitHub] spark pull request: [SPARK-4291][Build] Rename network module proj...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3148#issuecomment-62191626 retest this please
[GitHub] spark pull request: [SPARK-4291][Build] Rename network module proj...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3148#issuecomment-62192179 [Test build #23058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23058/consoleFull) for PR 3148 at commit [`eac839b`](https://github.com/apache/spark/commit/eac839b0c8524ae778b09c23b7296a1c75e51297). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4292][SQL] Result set iterator bug in J...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3149#issuecomment-62192492 Hmm, I think ```result.collect``` is OK, but can ```result.toLocalIterator``` get the right answer?
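For readers following the thread, a minimal sketch of the distinction being debated, with hypothetical table and query names that are not taken from the PR:

```scala
// Sketch only — assumes a 2014-era SQLContext named `sqlContext` and a
// registered table `test`; both names are illustrative.
val result = sqlContext.sql("SELECT key FROM test")

// Eager: every partition is computed and shipped to the driver before
// anything is returned, so all rows see the same session state.
val allRows = result.collect()

// Lazy: partitions are computed and fetched one at a time as the iterator
// is consumed; session state that changes between fetches can affect the
// rows produced for later partitions.
val rowIter = result.toLocalIterator
rowIter.foreach(println)
```

This is why `collect` can look correct while `toLocalIterator` misbehaves: the iterator interleaves driver-side consumption with job execution instead of materializing everything up front.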
[GitHub] spark pull request: [SPARK-4281][Build] Package Yarn shuffle servi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3147#issuecomment-62193940 [Test build #23056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23056/consoleFull) for PR 3147 at commit [`65db822`](https://github.com/apache/spark/commit/65db8227ef5632ff53574fc8efd7c579b6f26133). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4281][Build] Package Yarn shuffle servi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3147#issuecomment-62193959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23056/ Test PASSed.