date:20150827

[GitHub] spark pull request: Mytest0

2015-08-27 Thread semad

Github user semad closed the pull request at:

https://github.com/apache/spark/pull/8472


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9089] [Core] Fallback to another one if...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8337#issuecomment-135446848
  
Merged build started.



[GitHub] spark pull request: [SPARK-10316][SQL] respect nondeterministic ex...

2015-08-27 Thread cloud-fan

GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/8486

[SPARK-10316][SQL] respect nondeterministic expressions in PhysicalOperation

We did a lot of special handling for non-deterministic expressions in 
`Optimizer`. However, `PhysicalOperation` just collects all Projects and 
Filters and messes up their order. We should respect the operator order 
imposed by non-deterministic expressions in `PhysicalOperation`.
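To see why operator order matters here, the following is a minimal sketch in plain Python, not Catalyst code. It assumes a projection that adds a random column followed by a filter on that column; the function names and the 0.5 threshold are hypothetical. Collapsing the two operators by inlining the projection into the filter evaluates the random expression twice, so the filter no longer describes the rows that come out.

```python
import random


def project_then_filter(rows, rng):
    """Evaluate the non-deterministic projection once, then filter on its result."""
    projected = [(x, rng.random()) for x in rows]
    return [(x, r) for (x, r) in projected if r > 0.5]


def collapsed(rows, rng):
    """Naively inline the projection into the filter.

    The random expression is evaluated once for the filter condition and
    again for the output column, so the two values are unrelated.
    """
    return [(x, rng.random()) for x in rows if rng.random() > 0.5]


rows = list(range(1000))
ordered = project_then_filter(rows, random.Random(0))
merged = collapsed(rows, random.Random(0))

# Every surviving row satisfies the filter only when the order is respected.
ordered_ok = all(r > 0.5 for _, r in ordered)
merged_ok = all(r > 0.5 for _, r in merged)
```

With the order respected, every output row satisfies the filter; in the collapsed version some output rows carry a value the filter would have rejected.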

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8486.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8486


commit 16ae7e2394caf6a3925cb3c69692b4b14c7811cb
Author: Wenchen Fan cloud0...@outlook.com
Date:   2015-08-27T14:37:46Z

respect nondeterministic expressions in PhysicalOperation





[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135445272
  
  [Test build #41687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41687/console) for PR 8484 at commit [`9f88786`](https://github.com/apache/spark/commit/9f88786b4efa89a7eddd81e6bba58700630a4429).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135445314
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135436303
  
  [Test build #41687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41687/consoleFull) for PR 8484 at commit [`9f88786`](https://github.com/apache/spark/commit/9f88786b4efa89a7eddd81e6bba58700630a4429).



[GitHub] spark pull request: [SPARK-10315] remove document on spark.akka.fa...

2015-08-27 Thread CodingCat

Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/8483#issuecomment-135407119
  
I just looked at the branch-1.5 code, and this parameter is not used there (I 
guess it was used back when we used death watch in Spark's implementation).



[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...

2015-08-27 Thread tgravescs

Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/8441#discussion_r38089726
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1696,12 +1711,16 @@ version specified by users. An isolated classloader is used here to avoid depend
   property can be one of three options:
   <ol>
 <li><code>builtin</code></li>
-Use Hive 0.13.1, which is bundled with the Spark assembly jar when <code>-Phive</code> is
+Use Hive 1.2.1, which is bundled with the Spark assembly jar when <code>-Phive</code> is
 enabled. When this option is chosen, <code>spark.sql.hive.metastore.version</code> must be
-either <code>0.13.1</code> or not defined.
+either <code>1.2.1</code> or not defined.
 <li><code>maven</code></li>
-Use Hive jars of specified version downloaded from Maven repositories.
-<li>A classpath in the standard format for both Hive and Hadoop.</li>
+Use Hive jars of specified version downloaded from Maven repositories.  This configuration
+is not generally recommended for production deployments.
+<li>A classpath in the standard format for the JVM.  This classpath must include all of Hive
+and its dependencies, including the correct version of Hadoop.  These jars only need to be
+present on the driver, but if you are running in yarn client mode then you must ensure
--- End diff --

These jars aren't needed by the executors at all? If that is the case, the 
only time they need to be shipped is in yarn cluster mode.
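The three options described in the quoted diff can be sketched as a small validation helper. This is an illustration in plain Python, not Spark code: the helper name `metastore_conf` is hypothetical, and the key `spark.sql.hive.metastore.jars` does not appear in the quoted lines, so treat it as an assumption. Only the rule stated in the diff is enforced, namely that `builtin` requires the version to be 1.2.1 or undefined.

```python
BUILTIN_HIVE_VERSION = "1.2.1"  # version named on the "+" side of the quoted diff


def metastore_conf(jars, version=None):
    """Build a dict of configuration entries for the chosen jars option.

    jars is "builtin", "maven", or (assumed here) a JVM classpath string.
    """
    if jars == "builtin" and version not in (None, BUILTIN_HIVE_VERSION):
        raise ValueError(
            "builtin requires the metastore version to be %s or not defined"
            % BUILTIN_HIVE_VERSION
        )
    # Key name is an assumption; it is not shown in the quoted diff.
    conf = {"spark.sql.hive.metastore.jars": jars}
    if version is not None:
        conf["spark.sql.hive.metastore.version"] = version
    return conf


builtin_conf = metastore_conf("builtin")
maven_conf = metastore_conf("maven", "0.13.1")
```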





[GitHub] spark pull request: [SPARK-9170][SQL] Use OrcStructInspector to be...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7520#issuecomment-135423916
  
Merged build started.



[GitHub] spark pull request: [SPARK-9170][SQL] Use OrcStructInspector to be...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7520#issuecomment-135423867
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8472] [ML] [PySpark] Python API for DCT

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8485#issuecomment-135436880
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8472] [ML] [PySpark] Python API for DCT

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8485#issuecomment-135436933
  
Merged build started.



[GitHub] spark pull request: [SPARK-8472] [ML] [PySpark] Python API for DCT

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8485#issuecomment-135445963
  
  [Test build #41688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41688/console) for PR 8485 at commit [`565a831`](https://github.com/apache/spark/commit/565a83142eae29b23ad1bdae3239df375cc47001).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DCT(JavaTransformer, HasInputCol, HasOutputCol):`




[GitHub] spark pull request: [SPARK-9089] [Core] Fallback to another one if...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8337#issuecomment-135449083
  
  [Test build #41689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41689/consoleFull) for PR 8337 at commit [`573a37c`](https://github.com/apache/spark/commit/573a37c6a541d6993d6a45a2f7977056e936b05d).



[GitHub] spark pull request: [SPARK-9170][SQL] Use OrcStructInspector to be...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7520#issuecomment-135426805
  
  [Test build #41686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41686/consoleFull) for PR 7520 at commit [`dc8bd26`](https://github.com/apache/spark/commit/dc8bd26b21b67b9bc8d4021965a10bc29ce3b379).



[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135434765
  
Merged build started.



[GitHub] spark pull request: [SPARK-9089] [Core] Fallback to another one if...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8337#issuecomment-135446822
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-9089] [Core] Fallback to another one if...

2015-08-27 Thread yanboliang

Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/8337#issuecomment-135446690
  
Jenkins, test this please.



[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.

2015-08-27 Thread tgravescs

Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/8398#issuecomment-135409799
  
I think it would be nice if we updated the docs to tell users that * is 
supported. Can you update docs/configuration.md? Perhaps under each 
description of modify.acls, view.acls, and admin.acls, add something that 
says the special value of * means anyone.
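The wildcard semantics suggested above can be sketched as follows. This is a plain-Python illustration and not the Spark implementation; the helper name `is_allowed` and the comma-separated ACL format are assumptions.

```python
WILDCARD = "*"


def is_allowed(user, acl):
    """Return True if `user` is listed in the ACL or the ACL contains "*".

    `acl` is assumed to be a comma-separated string of user names.
    """
    entries = {entry.strip() for entry in acl.split(",") if entry.strip()}
    return WILDCARD in entries or user in entries


anyone = is_allowed("alice", "*")             # wildcard admits every user
listed = is_allowed("bob", "bob, carol")      # explicit entry
unlisted = is_allowed("alice", "bob, carol")  # neither listed nor wildcard
```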



[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135434696
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-8472] [ML] [PySpark] Python API for DCT

2015-08-27 Thread yanboliang

GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/8485

[SPARK-8472] [ML] [PySpark] Python API for DCT

Add Python API for ml.feature.DCT.
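For readers unfamiliar with the transform being wrapped, here is a minimal pure-Python sketch of an orthonormally scaled DCT-II. It assumes the feature computes this scaled variant; that assumption is not stated in this thread, and the function name `dct2_orthonormal` is hypothetical. It does not use or reproduce the PySpark API.

```python
import math


def dct2_orthonormal(x):
    """Compute the DCT-II of a real sequence, scaled so the transform is orthonormal."""
    n = len(x)
    out = []
    for k in range(n):
        # The k = 0 term uses sqrt(1/n); all other terms use sqrt(2/n).
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        total = sum(
            v * math.cos(math.pi / n * (i + 0.5) * k) for i, v in enumerate(x)
        )
        out.append(scale * total)
    return out


constant = dct2_orthonormal([1.0, 1.0, 1.0, 1.0])
mixed = dct2_orthonormal([0.0, 1.0, -2.0, 3.0])
```

A constant input puts all of its energy into the first coefficient, and the orthonormal scaling keeps the sum of squares unchanged between input and output.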

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-8472

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8485.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8485


commit 565a83142eae29b23ad1bdae3239df375cc47001
Author: Yanbo Liang yblia...@gmail.com
Date:   2015-08-27T13:42:04Z

Python API for DCT





[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135445315
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41687/
Test FAILed.



[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.

2015-08-27 Thread zhuoliu

Github user zhuoliu commented on the pull request:

https://github.com/apache/spark/pull/8398#issuecomment-135452755
  
Sure. Docs updated.



[GitHub] spark pull request: [SPARK-10316][SQL] respect nondeterministic ex...

2015-08-27 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/8486#discussion_r38103673
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -26,83 +24,28 @@ import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.logical._
 
 /**
- * A pattern that matches any number of filter operations on top of another relational operator.
- * Adjacent filter operators are collected and their conditions are broken up and returned as a
- * sequence of conjunctive predicates.
- *
- * @return A tuple containing a sequence of conjunctive predicates that should be used to filter the
- * output and a relational operator.
+ * A pattern that matches at most one Filter and one Project on top of another relational operator.
+ * Filter condition is broken up to conjunctive parts.
  */
-object FilteredOperation extends PredicateHelper {
--- End diff --

The `FilteredOperation` is not used anywhere.
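For context, the behaviour described in the removed doc comment (collect adjacent filters and break their conditions into conjunctive predicates) can be sketched in plain Python. This is an illustrative model only, not the Scala code from patterns.scala; the tuple-based plan encoding and both function names are hypothetical.

```python
def split_conjuncts(pred):
    """Break a predicate of the form ("and", left, right) into its conjuncts."""
    if isinstance(pred, tuple) and pred and pred[0] == "and":
        _, left, right = pred
        return split_conjuncts(left) + split_conjuncts(right)
    return [pred]


def collect_filters(plan):
    """Collect adjacent filters on top of a plan.

    A plan is ("filter", condition, child) or any other tuple, which is
    treated as the underlying relational operator.
    Returns (list of conjunctive predicates, underlying operator).
    """
    conditions = []
    while plan[0] == "filter":
        _, condition, child = plan
        conditions.extend(split_conjuncts(condition))
        plan = child
    return conditions, plan


plan = (
    "filter",
    ("and", "a > 1", "b < 2"),
    ("filter", "c = 3", ("relation", "t")),
)
predicates, relation = collect_filters(plan)
```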



[GitHub] spark pull request: [SPARK-10315] remove document on spark.akka.fa...

2015-08-27 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/8483#issuecomment-135405791
  
@CodingCat yes, looks unused. Is it unused as of 1.5 too?



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8464#issuecomment-135407868
  
  [Test build #41681 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41681/console) for PR 8464 at commit [`22e7bc0`](https://github.com/apache/spark/commit/22e7bc0b9882b637bb06ee39a66d3ece789042fa).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode `
  * `case class UnionNode(children: Seq[LocalNode]) extends LocalNode `




[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8464#issuecomment-135407996
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8464#issuecomment-135407998
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41681/
Test PASSed.



[GitHub] spark pull request: [SPARK-9170][SQL] User-provided columns should...

2015-08-27 Thread viirya

Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/7520#issuecomment-135423114
  
@liancheng Thanks for the clear investigation and explanation.

If I understand it correctly, it means that the original direction of this 
PR is correct.



[GitHub] spark pull request: [SPARK-9170][SQL] Use OrcStructInspector to be...

2015-08-27 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/7520#issuecomment-135437020
  
@viirya Yeah, I agree with you.



[GitHub] spark pull request: [SPARK-8472] [ML] [PySpark] Python API for DCT

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8485#issuecomment-135439908
  
  [Test build #41688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41688/consoleFull) for PR 8485 at commit [`565a831`](https://github.com/apache/spark/commit/565a83142eae29b23ad1bdae3239df375cc47001).



[GitHub] spark pull request: [SPARK-8472] [ML] [PySpark] Python API for DCT

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8485#issuecomment-135446096
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-8472] [ML] [PySpark] Python API for DCT

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8485#issuecomment-135446099
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41688/
Test PASSed.



[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8436#issuecomment-135491197
  
Merged build started.



[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8398#issuecomment-135493649
  
  [Test build #41699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41699/console) for PR 8398 at commit [`b1d49b3`](https://github.com/apache/spark/commit/b1d49b32a7b1e85c265a8cee8930d0138fd3bd8d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8398#issuecomment-135493701
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41699/
Test FAILed.



[GitHub] spark pull request: [SPARK-7685][ML] Apply weights to different sa...

2015-08-27 Thread feynmanliang

Github user feynmanliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/7884#discussion_r38122528
  
--- Diff: project/MimaExcludes.scala ---
@@ -60,6 +60,10 @@ object MimaExcludes {
   org.apache.spark.ml.regression.LeastSquaresCostFun.this),
 ProblemFilters.exclude[MissingMethodProblem](
   org.apache.spark.ml.classification.LogisticCostFun.this),
+ProblemFilters.exclude[MissingMethodProblem](
--- End diff --

Good point, this is OK



[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8436#issuecomment-135502190
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8436#issuecomment-135502194
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41702/
Test PASSed.



[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...

2015-08-27 Thread tgravescs

Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/8441#issuecomment-135508637
  
thanks LGTM



[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8441#issuecomment-135509928
  
  [Test build #41704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41704/consoleFull) for PR 8441 at commit [`f3fdf62`](https://github.com/apache/spark/commit/f3fdf625b0b092984d8d5f0e733a130ff9ff92b4).



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8464#discussion_r38126845
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala ---
@@ -0,0 +1,45 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the License); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an AS IS BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+
+case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode {
+
+  private[this] var count = 0
+
+  override def output: Seq[Attribute] = child.output
--- End diff --

Why do iterators need to know their `output`?



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8464#discussion_r38126899
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala ---
@@ -0,0 +1,45 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the License); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an AS IS BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+
+case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode {
--- End diff --

Is there a need to distinguish `Unary` operators from others?



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8464#discussion_r38126926
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala ---
@@ -0,0 +1,45 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the License); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an AS IS BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+
+case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode {
+
+  private[this] var count = 0
+
+  override def output: Seq[Attribute] = child.output
+
+  override def open(): Unit = child.open()
--- End diff --

Should this also reset the count?
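
To illustrate the question: if `open()` only delegates to the child, a node that is opened a second time would presumably keep its old `count` and return no further rows. Below is a minimal sketch of a limit operator that resets its counter on `open()`. The types `SimpleNode`, `SeqNode` and `SimpleLimit` are hypothetical stand-ins, not the classes in this PR, and the `next()`/`get` protocol is an assumption, since the quoted diff is cut off before those methods.

```scala
// Hypothetical stand-ins for the PR's LocalNode API; names are illustrative only.
trait SimpleNode {
  def open(): Unit
  def next(): Boolean
  def get: Int
  def close(): Unit
}

// Scans an in-memory sequence; open() rewinds to the start.
class SeqNode(data: Seq[Int]) extends SimpleNode {
  private[this] var idx = -1
  override def open(): Unit = { idx = -1 }
  override def next(): Boolean = { idx += 1; idx < data.length }
  override def get: Int = data(idx)
  override def close(): Unit = ()
}

// Limit operator that resets its counter in open(), so it can be re-opened.
class SimpleLimit(limit: Int, child: SimpleNode) extends SimpleNode {
  private[this] var count = 0
  override def open(): Unit = {
    count = 0 // without this reset, a second open() would yield no rows
    child.open()
  }
  override def next(): Boolean =
    if (count < limit) { count += 1; child.next() } else false
  override def get: Int = child.get
  override def close(): Unit = child.close()
}

object LimitDemo {
  // Drains a node from open() to close() and returns the rows seen.
  def collect(node: SimpleNode): List[Int] = {
    node.open()
    val buf = scala.collection.mutable.ListBuffer.empty[Int]
    while (node.next()) buf += node.get
    node.close()
    buf.toList
  }
}
```

Calling `LimitDemo.collect` twice on the same `SimpleLimit` returns the same rows both times only because of the reset in `open()`.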



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8464#discussion_r38128387
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala ---
@@ -0,0 +1,45 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the License); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an AS IS BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+
+case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode {
+
+  private[this] var count = 0
+
+  override def output: Seq[Attribute] = child.output
--- End diff --

Hmm I guess this is useful for `collect` which is nice for debugging.



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8464#discussion_r38128313
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/local/LocalNodeTest.scala
 ---
@@ -0,0 +1,189 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the License); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an AS IS BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import scala.util.control.NonFatal
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.types.StructType
+
+class LocalNodeTest extends SparkFunSuite {
+
+  /**
+   * Runs the LocalNode and makes sure the answer matches the expected result.
+   * @param input the input data to be used.
+   * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate
+   *                     the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def checkAnswer(
+      input: DataFrame,
+      nodeFunction: LocalNode => LocalNode,
+      expectedAnswer: Seq[Row],
+      sortAnswers: Boolean = true): Unit = {
+    doCheckAnswer(
+      input :: Nil,
+      nodes => nodeFunction(nodes.head),
+      expectedAnswer,
+      sortAnswers)
+  }
+
+  /**
+   * Runs the LocalNode and makes sure the answer matches the expected result.
+   * @param left the left input data to be used.
+   * @param right the right input data to be used.
+   * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate
+   *                     the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def checkAnswer2(
+      left: DataFrame,
+      right: DataFrame,
+      nodeFunction: (LocalNode, LocalNode) => LocalNode,
+      expectedAnswer: Seq[Row],
+      sortAnswers: Boolean = true): Unit = {
+    doCheckAnswer(
+      left :: right :: Nil,
+      nodes => nodeFunction(nodes(0), nodes(1)),
+      expectedAnswer,
+      sortAnswers)
+  }
+
+  /**
+   * Runs the `LocalNode`s and makes sure the answer matches the expected result.
+   * @param input the input data to be used.
+   * @param nodeFunction a function which accepts a sequence of input `LocalNode`s and uses them to
+   *                     instantiate the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def doCheckAnswer(
+      input: Seq[DataFrame],
+      nodeFunction: Seq[LocalNode] => LocalNode,
+      expectedAnswer: Seq[Row],
+      sortAnswers: Boolean = true): Unit = {
+    LocalNodeTest.checkAnswer(
+      input.map(dataFrameToSeqScanNode), nodeFunction, expectedAnswer, sortAnswers) match {
+      case Some(errorMessage) => fail(errorMessage)
+      case None =>
+    }
+  }
+
+  protected def dataFrameToSeqScanNode(df: DataFrame): SeqScanNode = {
+new SeqScanNode(
+  df.queryExecution.sparkPlan.output,
+  df.queryExecution.toRdd.map(_.copy()).collect())
+  }
+
+}
+
+/**
+ * Helper methods for writing tests of individual local physical operators.
+ */
+object LocalNodeTest {
+
+  /**
+   * Runs the `LocalNode`s and makes sure the

[GitHub] spark pull request: [SPARK-10020][MLlib]: ML model broadcasts shou...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8249#issuecomment-135513807
  
Build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-10020][MLlib]: ML model broadcasts shou...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8249#issuecomment-135513810
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41693/
Test FAILed.



[GitHub] spark pull request: [SPARK-10020][MLlib]: ML model broadcasts shou...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8249#issuecomment-135513716
  
  [Test build #41693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41693/console) for PR 8249 at commit [`6dd471f`](https://github.com/apache/spark/commit/6dd471fef77092f2a0406f82dada49f7fb176757).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8464#discussion_r38128741
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala ---
@@ -0,0 +1,45 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the License); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an AS IS BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+
+case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode {
+
+  private[this] var count = 0
+
+  override def output: Seq[Attribute] = child.output
+
+  override def open(): Unit = child.open()
+
+  override def close(): Unit = child.close()
+
+  override def get(): InternalRow = child.get()
--- End diff --

`get` should probably not have `()`?
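
For context, the usual Scala convention is that methods with side effects are declared and called with `()`, while pure accessors omit them. A minimal sketch of that convention follows; `Cursor` and `CursorDemo` are hypothetical names used only for illustration and are not part of this PR.

```scala
// Hypothetical cursor type, for illustration only; not the PR's LocalNode.
class Cursor(rows: Vector[String]) {
  private[this] var pos = -1

  // Side-effecting: advances the cursor, so it is declared and called with ().
  def next(): Boolean = { pos += 1; pos < rows.length }

  // Pure accessor: returns the current row without changing state, so no ().
  def get: String = rows(pos)
}

object CursorDemo {
  // Returns the first row if there is one.
  def firstRow(rows: Vector[String]): Option[String] = {
    val c = new Cursor(rows)
    if (c.next()) Some(c.get) else None
  }
}
```

Under this convention the call site reads `c.next()` for the state change and `c.get` for the read, which makes the difference visible to the reader.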



[GitHub] spark pull request: [SPARK-10018][MLlib]: ML model broadcasts shou...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8247#issuecomment-135469131
  
Build started.



[GitHub] spark pull request: [SPARK-10020][MLlib]: ML model broadcasts shou...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8249#issuecomment-135469107
  
Build started.



[GitHub] spark pull request: [SPARK-10019][MLlib]: ML model broadcasts shou...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8248#issuecomment-135469126
  
Merged build started.



[GitHub] spark pull request: [SPARK-10020][MLlib]: ML model broadcasts shou...

2015-08-27 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8249#issuecomment-135468749
  
ok to test



[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135469174
  
  [Test build #41691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41691/console) for PR 8484 at commit [`9f88786`](https://github.com/apache/spark/commit/9f88786b4efa89a7eddd81e6bba58700630a4429).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class LogisticRegressionModel @Since(1.3.0) (`
  * `class SVMModel @Since(1.1.0) (`
  * `class GaussianMixtureModel @Since(1.3.0) (`
  * `class KMeansModel @Since(1.1.0) (@Since(1.0.0) val clusterCenters: Array[Vector])`
  * `class PowerIterationClusteringModel @Since(1.3.0) (`
  * `class StreamingKMeansModel @Since(1.2.0) (`
  * `class StreamingKMeans @Since(1.2.0) (`
  * `class BinaryClassificationMetrics @Since(1.3.0) (`
  * `class MulticlassMetrics @Since(1.1.0) (predictionAndLabels: RDD[(Double, Double)]) `
  * `class MultilabelMetrics @Since(1.2.0) (predictionAndLabels: RDD[(Array[Double], Array[Double])]) `
  * `class RegressionMetrics @Since(1.2.0) (`
  * `class ChiSqSelectorModel @Since(1.3.0) (`
  * `class ChiSqSelector @Since(1.3.0) (`
  * `class ElementwiseProduct @Since(1.4.0) (`
  * `class IDF @Since(1.2.0) (@Since(1.2.0) val minDocFreq: Int) `
  * `class Normalizer @Since(1.1.0) (p: Double) extends VectorTransformer `
  * `class PCA @Since(1.4.0) (@Since(1.4.0) val k: Int) `
  * `class StandardScaler @Since(1.1.0) (withMean: Boolean, withStd: Boolean) extends Logging `
  * `class StandardScalerModel @Since(1.3.0) (`
  * `class FPGrowthModel[Item: ClassTag] @Since(1.3.0) (`
  * `  class FreqItemset[Item] @Since(1.3.0) (`
  * `  class FreqSequence[Item] @Since(1.5.0) (`
  * `class PrefixSpanModel[Item] @Since(1.5.0) (`
  * `class DenseMatrix @Since(1.3.0) (`
  * `class SparseMatrix @Since(1.3.0) (`
  * `class DenseVector @Since(1.0.0) (`
  * `class SparseVector @Since(1.0.0) (`
  * `class BlockMatrix @Since(1.3.0) (`
  * `class CoordinateMatrix @Since(1.0.0) (`
  * `class IndexedRowMatrix @Since(1.0.0) (`
  * `class RowMatrix @Since(1.0.0) (`
  * `class PoissonGenerator @Since(1.1.0) (`
  * `class ExponentialGenerator @Since(1.3.0) (`
  * `class GammaGenerator @Since(1.3.0) (`
  * `class LogNormalGenerator @Since(1.3.0) (`
  * `case class Rating @Since(0.8.0) (`
  * `class MatrixFactorizationModel @Since(0.8.0) (`
  * `abstract class GeneralizedLinearModel @Since(1.0.0) (`
  * `class IsotonicRegressionModel @Since(1.3.0) (`
  * `case class LabeledPoint @Since(1.0.0) (`
  * `class LassoModel @Since(1.1.0) (`
  * `class LinearRegressionModel @Since(1.1.0) (`
  * `class RidgeRegressionModel @Since(1.1.0) (`
  * `class MultivariateGaussian @Since(1.3.0) (`
  * `case class BoostingStrategy @Since(1.4.0) (`
  * `class Strategy @Since(1.3.0) (`
  * `class DecisionTreeModel @Since(1.0.0) (`
  * `class Node @Since(1.2.0) (`
  * `class Predict @Since(1.2.0) (`
  * `class RandomForestModel @Since(1.2.0) (`
  * `class GradientBoostedTreesModel @Since(1.2.0) (`




[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8241#issuecomment-135469129
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-10015][MLlib]: ML model broadcasts shou...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8243#issuecomment-135469145
  
Merged build started.



[GitHub] spark pull request: [SPARK-10018][MLlib]: ML model broadcasts shou...

2015-08-27 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8247#issuecomment-135468873
  
ok to test



[GitHub] spark pull request: [SPARK-10015][MLlib]: ML model broadcasts shou...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8243#issuecomment-135469113
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-10019][MLlib]: ML model broadcasts shou...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8248#issuecomment-135469085
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-10018][MLlib]: ML model broadcasts shou...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8247#issuecomment-135469097
  
 Build triggered.



[GitHub] spark pull request: [SPARK-9890] [Doc] [ML] User guide for CountVe...

2015-08-27 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8487#discussion_r38115529
  
--- Diff: docs/ml-features.md ---
@@ -211,6 +211,87 @@ for feature in result.select("result").take(3):
 </div>
 </div>
 
+## CountVectorizer
+
+As a transformer, `CountVectorizerModel` converts a collection of text documents to vectors of token counts.
+It takes parameter `vocabulary: Array[String]` and produces sparse representations for the documents over the vocabulary, which can then be passed to other algorithms like LDA.
+
+When an a-priori dictionary is not available, `CountVectorizer` can be used as an Estimator to extract the vocabulary and generates a `CountVectorizerModel`.
+It will select the top `vocabSize` words ordered by term frequency across the corpus.
+An optional parameter minDF also affect the fitting process by specifying the minimum number (or fraction if < 1.0) of documents a term must appear in to be included in the vocabulary.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+More details can be found in the API docs for
+[CountVectorizer](api/scala/index.html#org.apache.spark.ml.feature.CountVectorizer) and
+[CountVectorizerModel](api/scala/index.html#org.apache.spark.ml.feature.CountVectorizerModel).
+{% highlight scala %}
+import org.apache.spark.ml.feature.CountVectorizer
+import org.apache.spark.mllib.util.CountVectorizerModel
+
+val df = sqlContext.createDataFrame(Seq(
+  (0, Array("a", "b", "c")),
+  (1, Array("a", "b", "b", "c", "a"))
+)).toDF("id", "words")
+
+// define CountVectorizerModel with a-priori vocabulary
+val cv = new CountVectorizerModel(Array("a", "b", "c"))
+  .setInputCol("words")
+  .setOutputCol("features")
+
+// alternatively, fit a CountVectorizerModel from the corpus
+val cv2: CountVectorizerModel = new CountVectorizer()
+  .setInputCol("words")
+  .setOutputCol("features")
+  .setVocabSize(3)
+  .setMinDF(2) // a term must appear in more than 2 documents to be included in the vocabulary
+  .fit(df)
+
+cv.transform(df).select("features").collect()
+{% endhighlight %}
+</div>
+
+<div data-lang="java" markdown="1">
+More details can be found in the API docs for
+[CountVectorizer](api/java/org/apache/spark/ml/feature/CountVectorizer.html) and
+[CountVectorizerModel](api/java/org/apache/spark/ml/feature/CountVectorizerModel.html).
+{% highlight java %}
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.ml.feature.CountVectorizer;
+import org.apache.spark.ml.feature.CountVectorizerModel;
+import org.apache.spark.sql.DataFrame;
+
+// Input data: Each row is a bag of words from a sentence or document.
+JavaRDD<Row> jrdd = jsc.parallelize(Arrays.asList(
+  RowFactory.create(Arrays.asList("a b c".split(" "))),
+  RowFactory.create(Arrays.asList("a b b c a".split(" ")))
+));
+StructType schema = new StructType(new StructField[]{
+  new StructField("text", new ArrayType(DataTypes.StringType, true), false, Metadata.empty())
+});
+DataFrame documentDF = sqlContext.createDataFrame(jrdd, schema);
+
+// define CountVectorizerModel with a-priori vocabulary
+CountVectorizerModel cv = new CountVectorizerModel(new String[]{"a", "b", "c"})
+  .setInputCol("text")
+  .setOutputCol("feature");
+
+// alternatively, fit a CountVectorizerModel from the corpus
+CountVectorizerModel cv2 = new CountVectorizer()
+  .setInputCol("text")
+  .setOutputCol("feature")
+  .setVocabSize(3)
+  .setMinDF(2) // a term must appear in more than 2 documents to be included in the vocabulary
+  .fit(documentDF);
+
+DataFrame result = cv.transform(documentDF);
--- End diff --

use `cv.transform(documentDF).show()`
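
For readers following the quoted guide text, here is a minimal plain-Python sketch of the behaviour it describes: choose the top `vocabSize` terms by corpus-wide term frequency, filter by `minDF`, then count tokens per document. It is not Spark's implementation. The function names `fit_vocabulary` and `transform` are hypothetical, and reading `minDF` as "at least that many documents" plus the alphabetical tie-break are assumptions of this sketch.

```python
from collections import Counter


def fit_vocabulary(docs, vocab_size, min_df=1.0):
    """Pick up to vocab_size terms, ordered by corpus-wide term frequency.

    min_df >= 1.0 is read as a document count, min_df < 1.0 as a fraction
    of the corpus. Keeping terms whose document frequency is at least the
    threshold is an assumption of this sketch.
    """
    term_freq = Counter()  # total occurrences across the corpus
    doc_freq = Counter()   # number of documents containing the term
    for doc in docs:
        term_freq.update(doc)
        doc_freq.update(set(doc))
    threshold = min_df if min_df >= 1.0 else min_df * len(docs)
    kept = [t for t in term_freq if doc_freq[t] >= threshold]
    # Descending frequency; ties broken alphabetically to stay deterministic.
    kept.sort(key=lambda t: (-term_freq[t], t))
    return kept[:vocab_size]


def transform(doc, vocabulary):
    """Return a sparse {vocabulary index: count} mapping for one document."""
    index = {term: i for i, term in enumerate(vocabulary)}
    counts = Counter(t for t in doc if t in index)
    return {index[t]: c for t, c in counts.items()}


# Same toy corpus as the quoted Scala and Java examples.
docs = [["a", "b", "c"], ["a", "b", "b", "c", "a"]]
vocab = fit_vocabulary(docs, vocab_size=3, min_df=2)
vectors = [transform(d, vocab) for d in docs]
```

With the two-document corpus above, every term appears in both documents, so a threshold of 2 keeps the whole vocabulary under the "at least" reading.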



[GitHub] spark pull request: [SPARK-9890] [Doc] [ML] User guide for CountVe...

2015-08-27 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8487#discussion_r38115521
  
--- Diff: docs/ml-features.md ---
@@ -211,6 +211,87 @@ for feature in result.select("result").take(3):
 </div>
 </div>
 
+## CountVectorizer
+
+As a transformer, `CountVectorizerModel` converts a collection of text documents to vectors of token counts.
+It takes parameter `vocabulary: Array[String]` and produces sparse representations for the documents over the vocabulary, which can then be passed to other algorithms like LDA.
+
+When an a-priori dictionary is not available, `CountVectorizer` can be used as an Estimator to extract the vocabulary and generates a `CountVectorizerModel`.
+It will select the top `vocabSize` words ordered by term frequency across the corpus.
+An optional parameter minDF also affect the fitting process by specifying the minimum number (or fraction if < 1.0) of documents a term must appear in to be included in the vocabulary.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+More details can be found in the API docs for
+[CountVectorizer](api/scala/index.html#org.apache.spark.ml.feature.CountVectorizer) and
+[CountVectorizerModel](api/scala/index.html#org.apache.spark.ml.feature.CountVectorizerModel).
+{% highlight scala %}
+import org.apache.spark.ml.feature.CountVectorizer
+import org.apache.spark.mllib.util.CountVectorizerModel
+
+val df = sqlContext.createDataFrame(Seq(
+  (0, Array("a", "b", "c")),
+  (1, Array("a", "b", "b", "c", "a"))
+)).toDF("id", "words")
+
+// define CountVectorizerModel with a-priori vocabulary
+val cv = new CountVectorizerModel(Array("a", "b", "c"))
+  .setInputCol("words")
+  .setOutputCol("features")
+
+// alternatively, fit a CountVectorizerModel from the corpus
+val cv2: CountVectorizerModel = new CountVectorizer()
+  .setInputCol("words")
+  .setOutputCol("features")
+  .setVocabSize(3)
+  .setMinDF(2) // a term must appear in more than 2 documents to be included in the vocabulary
+  .fit(df)
+
+cv.transform(df).select("features").collect()
+{% endhighlight %}
+</div>
+
+<div data-lang="java" markdown="1">
+More details can be found in the API docs for
+[CountVectorizer](api/java/org/apache/spark/ml/feature/CountVectorizer.html) and
+[CountVectorizerModel](api/java/org/apache/spark/ml/feature/CountVectorizerModel.html).
+{% highlight java %}
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.ml.feature.CountVectorizer;
+import org.apache.spark.ml.feature.CountVectorizerModel;
+import org.apache.spark.sql.DataFrame;
+
+// Input data: Each row is a bag of words from a sentence or document.
+JavaRDD<Row> jrdd = jsc.parallelize(Arrays.asList(
+  RowFactory.create(Arrays.asList("a b c".split(" "))),
+  RowFactory.create(Arrays.asList("a b b c a".split(" ")))
+));
+StructType schema = new StructType(new StructField[]{
+  new StructField("text", new ArrayType(DataTypes.StringType, true), false, Metadata.empty())
+});
+DataFrame documentDF = sqlContext.createDataFrame(jrdd, schema);
--- End diff --

`documentDF` -> `df` to be consistent with Scala code



[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...

2015-08-27 Thread feynmanliang

Github user feynmanliang commented on the pull request:

https://github.com/apache/spark/pull/8451#issuecomment-135486169
  
Whoops, forgot to push the last commit; the Strings and default list size
changes should be there now.



[GitHub] spark pull request: Test pr

2015-08-27 Thread semad

GitHub user semad opened a pull request:

https://github.com/apache/spark/pull/8488

Test pr

Pull Req 1

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/semad/spark test_pr

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8488.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8488


commit 8c2e18f1f3b09b982ea75f95a22b489f4924a9de
Author: shahram emadi shahram@shahrams-macbook-pro.local
Date:   2015-08-26T22:30:54Z

Add GCP stuff)

commit 31095671fde7b70de9f498ff048790872f1157c7
Author: semad se...@users.noreply.github.com
Date:   2015-08-26T23:05:18Z

Update build_remote.sh

commit a091f158be9ae3189a37670866434a575bd0d968
Author: semad se...@users.noreply.github.com
Date:   2015-08-26T23:22:05Z

Update build_remote.sh

commit 03e7d292a795b1d16f5c426b989d18cd9f86cf28
Author: semad se...@users.noreply.github.com
Date:   2015-08-26T23:29:21Z

Update README.md

Test commits

commit 9e582a065207db0b13aa146fef987bf1e52754fa
Author: semad se...@users.noreply.github.com
Date:   2015-08-26T23:54:27Z

Update README.md

commit e39a30b1166d16fabff797781fcfce4eb732ae93
Author: semad se...@users.noreply.github.com
Date:   2015-08-27T16:27:59Z

Update README.md





[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8451#issuecomment-135487161
  
Merged build started.



[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8451#issuecomment-135487130
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8241#issuecomment-135482010
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: Test pr

2015-08-27 Thread semad

Github user semad closed the pull request at:

https://github.com/apache/spark/pull/8488



[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8451#issuecomment-135487780
  
[Test build #41701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41701/consoleFull) for PR 8451 at commit [`0695e51`](https://github.com/apache/spark/commit/0695e5157ff8bd76d49769d787200f6b4799a294).



[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8398#issuecomment-135495613
  
Merged build started.



[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8398#issuecomment-135495581
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8451#issuecomment-135498467
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8451#issuecomment-135498471
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41701/
Test PASSed.



[GitHub] spark pull request: [SPARK-7685][ML] Apply weights to different sa...

2015-08-27 Thread feynmanliang

Github user feynmanliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/7884#discussion_r38122369
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -218,31 +217,59 @@ class LogisticRegression(override val uid: String)
 
   override def getThreshold: Double = super.getThreshold
 
+  /**
+   * Whether to over-/undersamples each of training sample according to the given
+   * weight in `weightCol`. If empty, all samples are supposed to have weights as 1.0.
+   * Default is empty, so all samples have weight one.
+   * @group setParam
+   */
+  def setWeightCol(value: String): this.type = set(weightCol, value)
+  setDefault(weightCol -> "")
+
   override def setThresholds(value: Array[Double]): this.type = super.setThresholds(value)
 
   override def getThresholds: Array[Double] = super.getThresholds
 
   override protected def train(dataset: DataFrame): LogisticRegressionModel = {
     // Extract columns from data.  If dataset is persisted, do not persist oldDataset.
-    val instances = extractLabeledPoints(dataset).map {
-      case LabeledPoint(label: Double, features: Vector) => (label, features)
-    }
+    val instances: Either[RDD[(Double, Vector)], RDD[(Double, Double, Vector)]] =
+      if ($(weightCol).isEmpty) {
+        Left(dataset.select($(labelCol), $(featuresCol)).map {
+          case Row(label: Double, features: Vector) => (label, features)
+        })
+      } else {
+        Right(dataset.select($(labelCol), $(weightCol), $(featuresCol)).map {
+          case Row(label: Double, weight: Double, features: Vector) =>
+            (label, weight, features)
+        })
+      }
+
     val handlePersistence = dataset.rdd.getStorageLevel == StorageLevel.NONE
-    if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)
-
-    val (summarizer, labelSummarizer) = instances.treeAggregate(
-      (new MultivariateOnlineSummarizer, new MultiClassSummarizer))(
-        seqOp = (c, v) => (c, v) match {
-          case ((summarizer: MultivariateOnlineSummarizer, labelSummarizer: MultiClassSummarizer),
-          (label: Double, features: Vector)) =>
-            (summarizer.add(features), labelSummarizer.add(label))
-        },
-        combOp = (c1, c2) => (c1, c2) match {
-          case ((summarizer1: MultivariateOnlineSummarizer,
-          classSummarizer1: MultiClassSummarizer), (summarizer2: MultivariateOnlineSummarizer,
-          classSummarizer2: MultiClassSummarizer)) =>
-            (summarizer1.merge(summarizer2), classSummarizer1.merge(classSummarizer2))
-      })
+    if (handlePersistence) instances.fold(identity, identity).persist(StorageLevel.MEMORY_AND_DISK)
+
+    val (summarizer, labelSummarizer) = {
+      val combOp = (c1: (MultivariateOnlineSummarizer, MultiClassSummarizer),
+        c2: (MultivariateOnlineSummarizer, MultiClassSummarizer)) =>
+          (c1._1.merge(c2._1), c1._2.merge(c2._2))
+
+      instances match {
--- End diff --

OK, I see what's going on: `fold` on the `Either` expects two functions that
return the same type, so type inference picks a common upper bound of
`RDD[(Double, Vector)]` and `RDD[(Double, Double, Vector)]`, whereas in the
earlier code `instances` was bound by the concrete types within the `Either`.

We can leave it as is, or remove the `Either`s and represent the unweighted
instances as `RDD[(Double, Double, Vector)]` with a weight of `1.0`, i.e.
`(label, 1.0, features)`. I am a fan of removing the `Either`s since that will
reduce the pattern-matching code, but both approaches are acceptable to me.
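
The second option can be sketched in plain Python (an analogue only, not the Spark or Scala code under review). The helper names `to_weighted` and `weighted_label_counts` are hypothetical. The idea is that once unweighted rows are normalised to `(label, 1.0, features)`, a single aggregation path serves both cases and no branching on the container type is needed.

```python
def to_weighted(rows, has_weight):
    """Normalise rows to (label, weight, features).

    Unweighted rows are given weight 1.0, so downstream code sees one shape.
    """
    if has_weight:
        return [(label, weight, features) for label, weight, features in rows]
    return [(label, 1.0, features) for label, features in rows]


def weighted_label_counts(instances):
    """Single aggregation path: sum the weights observed for each label."""
    totals = {}
    for label, weight, _features in instances:
        totals[label] = totals.get(label, 0.0) + weight
    return totals


# Toy data; the feature vectors are placeholders and are not used by the count.
unweighted = [(0.0, [1.0, 2.0]), (1.0, [0.5, 0.1]), (1.0, [0.3, 0.9])]
weighted = [(0.0, 2.0, [1.0, 2.0]), (1.0, 0.5, [0.5, 0.1])]

counts_unweighted = weighted_label_counts(to_weighted(unweighted, has_weight=False))
counts_weighted = weighted_label_counts(to_weighted(weighted, has_weight=True))
```

The trade-off is one extra `Double` per unweighted row in exchange for a single code path in the aggregation.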



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8464#discussion_r38127111
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala ---
@@ -0,0 +1,45 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+
+case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode {
--- End diff --

Maybe more generally, if we are never going to do transformations of these 
iterator trees, do they need to inherit from `TreeNode`?
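
To make the question concrete, here is a minimal plain-Python sketch of an iterator-style limit operator (not the Spark code in this PR). `SeqScanNode` and the `execute` method are hypothetical names chosen for illustration. The point is that pulling rows through a chain of operators only requires each node to hold a reference to its child; no tree-rewriting machinery is involved.

```python
from itertools import islice


class SeqScanNode:
    """Hypothetical leaf node that yields rows from an in-memory list."""

    def __init__(self, rows):
        self.rows = rows

    def execute(self):
        return iter(self.rows)


class LimitNode:
    """Yields at most `limit` rows from its child, pulling them lazily."""

    def __init__(self, limit, child):
        self.limit = limit
        self.child = child

    def execute(self):
        # islice stops after `limit` rows without consuming any further input.
        return islice(self.child.execute(), self.limit)


rows = list(LimitNode(2, SeqScanNode([("a",), ("b",), ("c",)])).execute())
```

Whether the real nodes should still extend `TreeNode` depends on needs this sketch does not model, such as plan printing or later transformations.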



[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135514282
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8398#issuecomment-135493699
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-10316][SQL] respect nondeterministic ex...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8486#issuecomment-135496815
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41690/
Test PASSed.



[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8398#issuecomment-135497162
  
[Test build #41703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41703/consoleFull) for PR 8398 at commit [`b1d49b3`](https://github.com/apache/spark/commit/b1d49b32a7b1e85c265a8cee8930d0138fd3bd8d).



[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8451#issuecomment-135498118
  
[Test build #41701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41701/console) for PR 8451 at commit [`0695e51`](https://github.com/apache/spark/commit/0695e5157ff8bd76d49769d787200f6b4799a294).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-7685][ML] Apply weights to different sa...

2015-08-27 Thread feynmanliang

Github user feynmanliang commented on the pull request:

https://github.com/apache/spark/pull/7884#issuecomment-135501418
  
LGTM, I slightly prefer the `RDD[(Double, 1.0, Vector)]` approach but it's 
your call



[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8436#issuecomment-135502050
  
[Test build #41702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41702/console) for PR 8436 at commit [`074583e`](https://github.com/apache/spark/commit/074583e2fb5b31275f94af5d35f58fa0f2737c50).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaStopWordsRemoverSuite `




[GitHub] spark pull request: [SPARK-8510] [CORE] [PYSPARK] NumPy matrices a...

2015-08-27 Thread paberline-rms

Github user paberline-rms commented on the pull request:

https://github.com/apache/spark/pull/8384#issuecomment-135505761
  
JIRA: https://issues.apache.org/jira/browse/SPARK-8510



[GitHub] spark pull request: [SPARK-10182] [MLlib] GeneralizedLinearModel d...

2015-08-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8395



[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...

2015-08-27 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8441#discussion_r38126088
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1696,12 +1711,16 @@ version specified by users. An isolated classloader is used here to avoid depend
   property can be one of three options:
   <ol>
     <li><code>builtin</code></li>
-    Use Hive 0.13.1, which is bundled with the Spark assembly jar when <code>-Phive</code> is
+    Use Hive 1.2.1, which is bundled with the Spark assembly jar when <code>-Phive</code> is
     enabled. When this option is chosen, <code>spark.sql.hive.metastore.version</code> must be
-    either <code>0.13.1</code> or not defined.
+    either <code>1.2.1</code> or not defined.
     <li><code>maven</code></li>
-    Use Hive jars of specified version downloaded from Maven repositories.
-    <li>A classpath in the standard format for both Hive and Hadoop.</li>
+    Use Hive jars of specified version downloaded from Maven repositories.  This configuration
+    is not generally recommended for production deployments.
+    <li>A classpath in the standard format for the JVM.  This classpath must include all of Hive
+    and its dependencies, including the correct version of Hadoop.  These jars only need to be
+    present on the driver, but if you are running in yarn client mode then you must ensure

Correct, they are only used by the driver to get metadata.  Thanks for the 
clarification on cluster vs client.
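
As a rough illustration of the rule quoted in the diff above (with `builtin`, the metastore version must be 1.2.1 or left undefined), the sketch below models the settings as a plain Scala map. The key `spark.sql.hive.metastore.jars`, its assumed default of `builtin`, and the helper `builtinVersionOk` are assumptions made for this example; only `spark.sql.hive.metastore.version` appears in the quoted text, and this is not Spark's own validation code.

```scala
// Hypothetical helper, not part of Spark. It encodes the documented rule that
// with the `builtin` option the metastore version must be 1.2.1 or unset.
object HiveMetastoreSettings {
  val BuiltinVersion = "1.2.1"

  def builtinVersionOk(conf: Map[String, String]): Boolean = {
    // Assumed key name and assumed default value for the jars property.
    val jars = conf.getOrElse("spark.sql.hive.metastore.jars", "builtin")
    val version = conf.get("spark.sql.hive.metastore.version")
    // Other options (maven, explicit classpath) are not constrained here.
    jars != "builtin" || version.forall(_ == BuiltinVersion)
  }
}
```

A map that sets only the version to `0.13.1` would fail this check, while the same version combined with `maven` would pass.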



[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8441#issuecomment-135508406
  
Merged build started.



[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8441#issuecomment-135513991
  
  [Test build #41704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41704/console) for PR 8441 at commit [`f3fdf62`](https://github.com/apache/spark/commit/f3fdf625b0b092984d8d5f0e733a130ff9ff92b4).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135514287
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41700/
Test PASSed.



[GitHub] spark pull request: [SPARK-7685][ML] Apply weights to different sa...

2015-08-27 Thread dbtsai

Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/7884#issuecomment-135503606
  
I know Xiangrui is using `RDD[(Double, 1.0, Vector)]` in isotonic 
regression, so I don't mind either, as long as everyone is on the same page.



[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...

2015-08-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8451



[GitHub] spark pull request: [SPARK-8510] [CORE] [PYSPARK] NumPy matrices a...

2015-08-27 Thread paberline

Github user paberline commented on the pull request:

https://github.com/apache/spark/pull/8384#issuecomment-135506108
  
JIRA: https://issues.apache.org/jira/browse/SPARK-8510



[GitHub] spark pull request: [SPARK-10182] [MLlib] GeneralizedLinearModel d...

2015-08-27 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/8395#issuecomment-135507172
  
(PS: not sure why it doesn't seem to show up, but the tests passed again 
after the last commit: 
https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1685/console)



[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8441#issuecomment-135508384
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8464#discussion_r38126751
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/local/LocalNodeTest.scala ---
@@ -0,0 +1,189 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import scala.util.control.NonFatal
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.types.StructType
+
+class LocalNodeTest extends SparkFunSuite {
+
+  /**
+   * Runs the LocalNode and makes sure the answer matches the expected result.
+   * @param input the input data to be used.
+   * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate
+   *                     the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def checkAnswer(
+      input: DataFrame,
+      nodeFunction: LocalNode => LocalNode,
+      expectedAnswer: Seq[Row],
+      sortAnswers: Boolean = true): Unit = {
+    doCheckAnswer(
+      input :: Nil,
+      nodes => nodeFunction(nodes.head),
+      expectedAnswer,
+      sortAnswers)
+  }
+
+  /**
+   * Runs the LocalNode and makes sure the answer matches the expected result.
+   * @param left the left input data to be used.
+   * @param right the right input data to be used.
+   * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate
+   *                     the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def checkAnswer2(
--- End diff --

Just name this `checkAnswer`.
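
A minimal sketch of what the suggested rename relies on: Scala lets two methods share a name when their parameter lists differ. The types below (`Input` as `Seq[Int]`, the object `CheckAnswerSketch`, and the simplified `doCheckAnswer`) are stand-ins invented for this example, not the PR's `DataFrame`, `LocalNode`, or `Row`. One caveat worth keeping in mind is that Scala allows only one overloaded alternative to declare default arguments, so the sketch omits the `sortAnswers = true` default.

```scala
// Stand-in types for illustration only; the real test uses DataFrame, LocalNode and Row.
object CheckAnswerSketch {
  type Input = Seq[Int]

  // Shared implementation that both overloads delegate to.
  // Results are compared after sorting, mirroring the sortAnswers idea.
  private def doCheckAnswer(
      inputs: Seq[Input],
      build: Seq[Input] => Input,
      expected: Input): Boolean =
    build(inputs).sorted == expected.sorted

  // Single-input variant.
  def checkAnswer(input: Input, nodeFunction: Input => Input, expected: Input): Boolean =
    doCheckAnswer(Seq(input), ins => nodeFunction(ins.head), expected)

  // Two-input variant: same name, distinguished by its parameter list.
  def checkAnswer(
      left: Input,
      right: Input,
      nodeFunction: (Input, Input) => Input,
      expected: Input): Boolean =
    doCheckAnswer(Seq(left, right), ins => nodeFunction(ins(0), ins(1)), expected)
}
```

Callers then pick the variant simply by how many inputs they pass, for example `CheckAnswerSketch.checkAnswer(Seq(1), Seq(2), (l: Seq[Int], r: Seq[Int]) => l ++ r, Seq(1, 2))`.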



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8464#discussion_r38127948
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/local/UnionNode.scala ---
@@ -0,0 +1,75 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+case class UnionNode(children: Seq[LocalNode]) extends LocalNode {
--- End diff --

Consider making this an `Array[LocalNode]`.  In general, we should probably 
only be using `Array` at this level of execution.
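
To make the suggestion concrete, here is a small sketch of a union operator whose children are held in an `Array` and walked with a plain index. `SimpleNode`, `LeafNode`, and `ArrayUnionNode` are simplified stand-ins invented for this example (rows are `Int` rather than `InternalRow`); the comment above does not state the motivation, so the usual reading that arrays keep indexing and iteration overhead low is an assumption here.

```scala
// Simplified stand-in for the PR's LocalNode; the real trait works on InternalRow.
trait SimpleNode {
  def rows: Iterator[Int]
}

case class LeafNode(data: Array[Int]) extends SimpleNode {
  def rows: Iterator[Int] = data.iterator
}

// Children are stored in an Array and visited by index, one child at a time.
case class ArrayUnionNode(children: Array[SimpleNode]) extends SimpleNode {
  def rows: Iterator[Int] = new Iterator[Int] {
    private var idx = 0
    private var current: Iterator[Int] = Iterator.empty

    // Advance past exhausted (or empty) children until a row is available.
    def hasNext: Boolean = {
      while (!current.hasNext && idx < children.length) {
        current = children(idx).rows
        idx += 1
      }
      current.hasNext
    }

    def next(): Int = {
      if (!hasNext) throw new NoSuchElementException("no more rows")
      current.next()
    }
  }
}
```

One trade-off to weigh: a case class with an `Array` field compares that field by reference, so two `ArrayUnionNode` instances built from equal but distinct arrays are not `==`.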



[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8436#issuecomment-135491108
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8436#issuecomment-135492754
  
  [Test build #41702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41702/consoleFull) for PR 8436 at commit [`074583e`](https://github.com/apache/spark/commit/074583e2fb5b31275f94af5d35f58fa0f2737c50).



[GitHub] spark pull request: [SPARK-4223] [Core] Support * in acls.

2015-08-27 Thread tgravescs

Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/8398#issuecomment-135495019
  
retest this please



[GitHub] spark pull request: [SPARK-9089] [Core] Fallback to another one if...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8337#issuecomment-135495054
  
  [Test build #41689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41689/console) for PR 8337 at commit [`573a37c`](https://github.com/apache/spark/commit/573a37c6a541d6993d6a45a2f7977056e936b05d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


