[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19892
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/91/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20349: Fix the path to the examples jar

2018-01-22 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20349
  
Please add [SPARK-XXX][DOC] to the PR title with your JIRA ticket number, 
if you created a JIRA ticket for this. If this is a minor issue without a JIRA 
ticket, just replace it with [Minor][DOC].


---




[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-22 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/19892
  
@MLnick you ok with this then?


---




[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r162873388
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/MulticlassLogisticRegressionWithElasticNetExample.scala ---
@@ -49,6 +49,48 @@ object MulticlassLogisticRegressionWithElasticNetExample {
 // Print the coefficients and intercept for multinomial logistic regression
 println(s"Coefficients: \n${lrModel.coefficientMatrix}")
 println(s"Intercepts: \n${lrModel.interceptVector}")
+
+val trainingSummary = lrModel.summary
+
+val objectiveHistory = trainingSummary.objectiveHistory
--- End diff --

ditto here for the comment to be consistent with Java / Python versions


---




[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r162873036
  
--- Diff: docs/ml-classification-regression.md ---
@@ -97,10 +97,6 @@ only available on the driver.
 [`LogisticRegressionTrainingSummary`](api/scala/index.html#org.apache.spark.ml.classification.LogisticRegressionTrainingSummary) provides a summary for a
 [`LogisticRegressionModel`](api/scala/index.html#org.apache.spark.ml.classification.LogisticRegressionModel).
-Currently, only binary classification is supported and the
--- End diff --

Should we add a note on the difference between the summary and the binary 
summary, perhaps mentioning the `binarySummary` or `asBinary` methods?

I know the example already does this, but a short line in the docs would help.


---




[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r162872261
  
--- Diff: docs/ml-classification-regression.md ---
@@ -125,7 +117,6 @@ Continuing the earlier example:
 [`LogisticRegressionTrainingSummary`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionSummary) provides a summary for a
 [`LogisticRegressionModel`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionModel).
--- End diff --

Shall we just add a short line to the `Example` section of MLoR:

"The following example shows how to train a multiclass logistic regression 
model with elastic net regularization, as well as extract the multiclass 
training summary." 

or something like that.


---




[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r162873193
  
--- Diff: examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py ---
@@ -43,6 +43,43 @@
 # Print the coefficients and intercept for multinomial logistic regression
 print("Coefficients: \n" + str(lrModel.coefficientMatrix))
 print("Intercept: " + str(lrModel.interceptVector))
+
+trainingSummary = lrModel.summary
+
+# Obtain the objective per iteration
+objectiveHistory = trainingSummary.objectiveHistory
+print("objectiveHistory:")
+for objective in objectiveHistory:
+    print(objective)
+
+print("False positive rate by label:")
--- End diff --

Do we want a comment consistent with the Java version above? 
`// for multiclass, we can inspect metrics on a per-label basis`
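
For readers unfamiliar with the per-label metric printed in the diff above, here is a minimal pure-Python sketch of what "false positive rate by label" means. It is not Spark's implementation; the function name `false_positive_rate_by_label` and the toy data are assumptions made purely for illustration.

```python
def false_positive_rate_by_label(pairs):
    """Return {label: false positive rate} for (prediction, true_label) pairs.

    For a label k, the rate is:
      (# rows predicted as k whose true label is not k)
      / (# rows whose true label is not k)
    """
    pairs = list(pairs)
    # Consider every label that appears as a prediction or as a true label.
    labels = sorted({p for p, _ in pairs} | {t for _, t in pairs})
    rates = {}
    for k in labels:
        negatives = [(p, t) for p, t in pairs if t != k]
        false_pos = sum(1 for p, _ in negatives if p == k)
        # Assumption: report 0.0 when a label has no negative rows at all.
        rates[k] = false_pos / len(negatives) if negatives else 0.0
    return rates


# Hypothetical toy data: (prediction, true label)
toy = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (0, 2)]
for label, rate in false_positive_rate_by_label(toy).items():
    print("label %d: %s" % (label, rate))
```

In the multiclass case each label gets its own rate, which is why a comment such as "inspect metrics on a per-label basis" helps the reader of the example.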


---




[GitHub] spark issue #20349: Fix the path to the examples jar

2018-01-22 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20349
  
Can you please check whether there are similar issues elsewhere in the docs?


---




[GitHub] spark issue #20349: Fix the path to the examples jar

2018-01-22 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20349
  
ok to test.


---




[GitHub] spark issue #20349: Fix the path to the examples jar

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20349
  
**[Test build #86467 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86467/testReport)**
 for PR 20349 at commit 
[`20d502f`](https://github.com/apache/spark/commit/20d502fd2a271fcec1614a909c3e89934e81582e).


---




[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2018-01-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17123#discussion_r162874801
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala ---
@@ -53,7 +53,8 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String
    * Values at -inf, inf must be explicitly provided to cover all Double values;
    * otherwise, values outside the splits specified will be treated as errors.
    *
-   * See also [[handleInvalid]], which can optionally create an additional bucket for NaN values.
+   * See also [[handleInvalid]], which can optionally create an additional bucket for NaN/NULL
--- End diff --

This sounds like a behavior change; we should add an item to the migration 
guide in the ML docs.
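
To make the behavior under discussion concrete, here is a minimal pure-Python sketch of bucketing with an extra bucket for invalid values. It is an illustration only, not the `Bucketizer` source: the function `bucketize` and its `handle_invalid` argument are hypothetical stand-ins, and treating `None` (NULL) the same way as NaN is an assumption taken from what the diff above proposes.

```python
import bisect
import math


def bucketize(value, splits, handle_invalid="error"):
    """Map a value to a bucket index given sorted splits.

    Buckets are [splits[i], splits[i+1]), except the last bucket, which
    also includes its upper bound. With handle_invalid="keep", NaN and
    None (assumed to be handled alike) share one extra bucket whose
    index is len(splits) - 1.
    """
    invalid = value is None or (isinstance(value, float) and math.isnan(value))
    if invalid:
        if handle_invalid == "keep":
            return len(splits) - 1
        if handle_invalid == "skip":
            return None  # the caller is expected to drop this row
        raise ValueError("invalid value: %r" % (value,))
    if value < splits[0] or value > splits[-1]:
        raise ValueError("value %r is outside the splits" % (value,))
    if value == splits[-1]:
        return len(splits) - 2
    return bisect.bisect_right(splits, value) - 1


splits = [float("-inf"), 0.0, 10.0, float("inf")]
print(bucketize(-5.0, splits))                 # regular bucket
print(bucketize(float("nan"), splits, "keep"))  # extra bucket
print(bucketize(None, splits, "keep"))          # extra bucket (assumed)
```

If NULL rows previously raised or were dropped and now land in the extra bucket, downstream code that counts buckets would see different results, which is the kind of change a migration note would flag.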


---




[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19892
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19892
  
**[Test build #86464 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86464/testReport)**
 for PR 19892 at commit 
[`014fb08`](https://github.com/apache/spark/commit/014fb08ac279002203267bed65ebce2c980f7912).


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/92/
Test PASSed.


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20343
  
**[Test build #86465 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86465/testReport)**
 for PR 20343 at commit 
[`5d6092c`](https://github.com/apache/spark/commit/5d6092c4bf029a021930a4ba66e6e1de3a4b15ed).


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20277
  
**[Test build #86469 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86469/testReport)**
 for PR 20277 at commit 
[`0c22f5b`](https://github.com/apache/spark/commit/0c22f5bec3ce5d3bd9f54d7950b58bff65f4941b).


---




[GitHub] spark issue #20349: Fix the path to the examples jar

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20349
  
**[Test build #86468 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86468/testReport)**
 for PR 20349 at commit 
[`20d502f`](https://github.com/apache/spark/commit/20d502fd2a271fcec1614a909c3e89934e81582e).


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/94/
Test PASSed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #86470 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86470/testReport)**
 for PR 13599 at commit 
[`3c5cbfc`](https://github.com/apache/spark/commit/3c5cbfc2311a88dce928241da4523f788aa09602).


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-22 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20277#discussion_r162876178
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java ---
@@ -33,18 +33,6 @@
   private final ArrowVectorAccessor accessor;
   private ArrowColumnVector[] childColumns;
 
-  private void ensureAccessible(int index) {
-    ensureAccessible(index, 1);
-  }
-
-  private void ensureAccessible(int index, int count) {
--- End diff --

It is fine to do it later. I agree that we should do the same check in one place.


---




[GitHub] spark issue #20349: Fix the path to the examples jar

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20349
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20349: Fix the path to the examples jar

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20349
  
**[Test build #86467 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86467/testReport)**
 for PR 20349 at commit 
[`20d502f`](https://github.com/apache/spark/commit/20d502fd2a271fcec1614a909c3e89934e81582e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20349: Fix the path to the examples jar

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20349
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86467/
Test PASSed.


---




[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19892
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86464/
Test PASSed.


---




[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19892
  
**[Test build #86464 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86464/testReport)**
 for PR 19892 at commit 
[`014fb08`](https://github.com/apache/spark/commit/014fb08ac279002203267bed65ebce2c980f7912).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19892
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20344: [MINOR] Typo fixes

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20344
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/93/
Test PASSed.


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20277
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/95/
Test PASSed.


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20277
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20277
  
**[Test build #86459 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86459/testReport)**
 for PR 20277 at commit 
[`55a288e`](https://github.com/apache/spark/commit/55a288e925a71cd48a533d6171926e398f857c2e).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20146
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20277
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20146
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86460/
Test FAILed.


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86462/
Test FAILed.


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20277
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86463/
Test FAILed.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20146
  
**[Test build #86460 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86460/testReport)**
 for PR 20146 at commit 
[`540c364`](https://github.com/apache/spark/commit/540c364d2a70ecd6ee5b92fadedc5e9b85026d2c).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20349: Fix the path to the examples jar

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20349
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-22 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20277
  
Jenkins, retest this please.


---




[GitHub] spark issue #20349: Fix the path to the examples jar

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20349
  
**[Test build #86468 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86468/testReport)**
 for PR 20349 at commit 
[`20d502f`](https://github.com/apache/spark/commit/20d502fd2a271fcec1614a909c3e89934e81582e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20349: Fix the path to the examples jar

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20349
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86468/
Test PASSed.


---




[GitHub] spark issue #20349: Fix the path to the examples jar

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20349
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

2018-01-22 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19872#discussion_r162886239
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2221,6 +2223,35 @@ def pandas_udf(f=None, returnType=None, functionType=None):
 
.. seealso:: :meth:`pyspark.sql.GroupedData.apply`
 
+3. GROUP_AGG
+
+   A group aggregate UDF defines a transformation: One or more `pandas.Series` -> A scalar
+   The `returnType` should be a primitive data type, e.g, :class:`DoubleType`.
--- End diff --

very small nit: `e.g.` instead of `e.g`.
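
As a side illustration of the "one or more series -> a scalar" shape described in the docstring above, here is a minimal pure-Python sketch. It deliberately avoids Spark and pandas: plain lists stand in for `pandas.Series`, and the names `group_agg` and `weighted_mean` are hypothetical helpers, not part of any PySpark API.

```python
from collections import defaultdict


def group_agg(rows, key, columns, func):
    """Group rows by `key`, then call `func` once per group.

    `func` receives one list per requested column (a stand-in for one
    pandas.Series per column) and must return a single scalar.
    """
    groups = defaultdict(lambda: [[] for _ in columns])
    for row in rows:
        for series, col in zip(groups[row[key]], columns):
            series.append(row[col])
    return {k: func(*series) for k, series in groups.items()}


def weighted_mean(v, w):
    # Two input "series" in, one scalar out.
    return sum(x * y for x, y in zip(v, w)) / sum(w)


rows = [
    {"id": 1, "v": 1.0, "w": 1.0},
    {"id": 1, "v": 2.0, "w": 3.0},
    {"id": 2, "v": 5.0, "w": 1.0},
]
result = group_agg(rows, "id", ["v", "w"], weighted_mean)
print(result)
```

Because each group collapses to one scalar, a primitive return type such as a double is the natural fit, which matches the `returnType` remark in the docstring.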


---




[GitHub] spark pull request #20348: [SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs...

2018-01-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20348


---




[GitHub] spark issue #20342: [SPARK-23170][SQL] Dump the statistics of effective runs...

2018-01-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20342
  
Thanks! Merged to master/2.3


---




[GitHub] spark issue #20342: [SPARK-23170][SQL] Dump the statistics of effective runs...

2018-01-22 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20342
  
Do we need to update `TPCDSQueryBenchmark`, too? We could replace the updated 
queries there.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/96/
Test PASSed.


---




[GitHub] spark issue #20349: [Minor][DOC] Fix the path to the examples jar

2018-01-22 Thread tashoyan
Github user tashoyan commented on the issue:

https://github.com/apache/spark/pull/20349
  
@jerryshao Not found yet


---




[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-22 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/19892
  
@holdenk everything except my comment in 
https://github.com/apache/spark/pull/19892#discussion_r162900053 


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20343
  
**[Test build #86465 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86465/testReport)**
 for PR 20343 at commit 
[`5d6092c`](https://github.com/apache/spark/commit/5d6092c4bf029a021930a4ba66e6e1de3a4b15ed).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20295: [WIP][SPARK-23011] Support alternative function f...

2018-01-22 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20295#discussion_r162912985
  
--- Diff: python/pyspark/serializers.py ---
@@ -267,13 +267,13 @@ def load_stream(self, stream):
         """
         Deserialize ArrowRecordBatches to an Arrow table and return as a list of pandas.Series.
         """
-        from pyspark.sql.types import _check_dataframe_localize_timestamps
+        from pyspark.sql.types import _check_series_localize_timestamps
         import pyarrow as pa
         reader = pa.open_stream(stream)
         for batch in reader:
             # NOTE: changed from pa.Columns.to_pandas, timezone issue in conversion fixed in 0.7.1
-            pdf = _check_dataframe_localize_timestamps(batch.to_pandas(), self._timezone)
-            yield [c for _, c in pdf.iteritems()]
+            yield [_check_series_localize_timestamps(c.to_pandas(), self._timezone)
+                   for c in pa.Table.from_batches([batch]).itercolumns()]
--- End diff --

Maybe we can remove the comment above (`# NOTE: ...`) ?


---




[GitHub] spark issue #20347: [SPARK-20129][Core] JavaSparkContext should use SparkCon...

2018-01-22 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20347
  
Can you please explain why we need to change to `getOrCreate`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20348: [SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UD...

2018-01-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20348
  
@HyukjinKwon That is fine. I am reviewing all the API changes made in the 
Spark 2.3 release. 

Thanks! Merged to master/2.3


---




[GitHub] spark issue #20046: [SPARK-22362][SQL] Add unit test for Window Aggregate Fu...

2018-01-22 Thread attilapiros
Github user attilapiros commented on the issue:

https://github.com/apache/spark/pull/20046
  
I have already extended 
sql/core/src/test/resources/sql-tests/inputs/window.sql with the missing window 
aggregate functions, but if you would like, I can move it to a different PR too. 


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-22 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20343
  
Do we need to update `TPCDSQueryBenchmark`, too? I think we could replace the 
updated queries there.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #86470 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86470/testReport)**
 for PR 13599 at commit 
[`3c5cbfc`](https://github.com/apache/spark/commit/3c5cbfc2311a88dce928241da4523f788aa09602).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: 
Boolean)`


---




[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20276
  
retest this please


---




[GitHub] spark issue #20177: [SPARK-22954][SQL] Fix the exception thrown by Analyze c...

2018-01-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20177
  
Actually, it's not really useful to have `NoSuchTableException`, 
`NoSuchFunctionException`, etc.; always using `AnalysisException` seems fine. CC 
@gatorsmile 


---




[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2018-01-22 Thread crackcell
Github user crackcell commented on the issue:

https://github.com/apache/spark/pull/17123
  
@WeichenXu123 I have finished my work; please review it. Any suggestions are 
welcome. :-)


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #86471 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86471/testReport)**
 for PR 13599 at commit 
[`789a8e5`](https://github.com/apache/spark/commit/789a8e5222d7e57f3b6a15fac38604b2502e45d2).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: 
Boolean)`
  * `  class DriverEndpoint(override val rpcEnv: RpcEnv)`


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86471/
Test FAILed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #19892: [SPARK-22797][PySpark] Bucketizer support multi-c...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/19892#discussion_r162900053
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -315,13 +315,19 @@ class BucketedRandomProjectionLSHModel(LSHModel, 
JavaMLReadable, JavaMLWritable)
 
 
 @inherit_doc
-class Bucketizer(JavaTransformer, HasInputCol, HasOutputCol, 
HasHandleInvalid,
- JavaMLReadable, JavaMLWritable):
-"""
-Maps a column of continuous features to a column of feature buckets.
-
->>> values = [(0.1,), (0.4,), (1.2,), (1.5,), (float("nan"),), 
(float("nan"),)]
->>> df = spark.createDataFrame(values, ["values"])
+class Bucketizer(JavaTransformer, HasInputCol, HasOutputCol, HasInputCols, 
HasOutputCols,
+ HasHandleInvalid, JavaMLReadable, JavaMLWritable):
+"""
+Maps a column of continuous features to a column of feature buckets. 
Since 2.3.0,
+:py:class:`Bucketizer` can map multiple columns at once by setting the 
:py:attr:`inputCols`
+parameter. Note that when both the :py:attr:`inputCol` and 
:py:attr:`inputCols` parameters
+are set, a log warning will be printed and only :py:attr:`inputCol` 
will take effect, while
--- End diff --

@holdenk this comment will need to be changed as per #19993, but that has 
not been merged yet. I think #19993 will block 2.3 though, so we could 
preemptively change the doc here to match the Scala side in #19993 about 
throwing an exception.
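
The check under discussion can be sketched in plain Python. This is only an 
illustration, not Spark code: the helper name `resolve_input_columns` and its 
error messages are hypothetical, and the actual validation in #19993 is on the 
Scala side and is not reproduced here.

```python
from typing import List, Optional


def resolve_input_columns(input_col: Optional[str],
                          input_cols: Optional[List[str]]) -> List[str]:
    """Return the columns to transform, rejecting ambiguous settings.

    Hypothetical helper: it raises when both the single-column and the
    multi-column parameter are set, instead of logging a warning and
    silently preferring one of them.
    """
    if input_col is not None and input_cols is not None:
        raise ValueError("Only one of inputCol and inputCols may be set.")
    if input_col is not None:
        return [input_col]
    if input_cols:
        return list(input_cols)
    raise ValueError("Either inputCol or inputCols must be set.")
```

With this shape, setting both parameters fails fast, which is the behavior the 
docstring would then have to describe instead of the log warning.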


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86465/
Test PASSed.


---




[GitHub] spark pull request #20341: [MINOR] [SQL] [TEST] Test case cleanups for recen...

2018-01-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20341


---




[GitHub] spark pull request #20342: [SPARK-23170][SQL] Dump the statistics of effecti...

2018-01-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20342


---




[GitHub] spark issue #20342: [SPARK-23170][SQL] Dump the statistics of effective runs...

2018-01-22 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20342
  
In follow-up work, will you make a PR for per-query statistics?


---




[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20276
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20276
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/98/
Test PASSed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/100/
Test PASSed.


---




[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20276
  
**[Test build #86481 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86481/testReport)**
 for PR 20276 at commit 
[`83c5fda`](https://github.com/apache/spark/commit/83c5fda86a55f60ea5844116a5239590140414e5).


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/99/
Test PASSed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...

2018-01-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19285#discussion_r162927703
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -233,17 +235,13 @@ private[spark] class MemoryStore(
 }
 
 if (keepUnrolling) {
-  // We successfully unrolled the entirety of this block
-  val arrayValues = vector.toArray
-  vector = null
-  val entry =
-new DeserializedMemoryEntry[T](arrayValues, 
SizeEstimator.estimate(arrayValues), classTag)
-  val size = entry.size
+  // We need more precise value
+  val size = valuesHolder.esitimatedSize(false)
--- End diff --

Why do we need `esitimatedSize(false)`? It seems we can just build the 
entry and call `entry.size`.


---




[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2018-01-22 Thread crackcell
Github user crackcell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17123#discussion_r162885968
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala 
---
@@ -53,7 +53,8 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") 
override val uid: String
* Values at -inf, inf must be explicitly provided to cover all Double 
values;
* otherwise, values outside the splits specified will be treated as 
errors.
*
-   * See also [[handleInvalid]], which can optionally create an additional 
bucket for NaN values.
+   * See also [[handleInvalid]], which can optionally create an additional 
bucket for NaN/NULL
--- End diff --

@viirya done.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #86471 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86471/testReport)**
 for PR 13599 at commit 
[`789a8e5`](https://github.com/apache/spark/commit/789a8e5222d7e57f3b6a15fac38604b2502e45d2).


---




[GitHub] spark issue #20344: [MINOR] Typo fixes

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20344
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20344: [MINOR] Typo fixes

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20344
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86466/
Test FAILed.


---




[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20277
  
**[Test build #86469 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86469/testReport)**
 for PR 20277 at commit 
[`0c22f5b`](https://github.com/apache/spark/commit/0c22f5bec3ce5d3bd9f54d7950b58bff65f4941b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20046: [SPARK-22362][SQL] Add unit test for Window Aggregate Fu...

2018-01-22 Thread smurakozi
Github user smurakozi commented on the issue:

https://github.com/apache/spark/pull/20046
  
@jiangxb1987 how does your request to cover the SQL interface relate to 
SPARK-23160? 
I assume it is to be covered in that issue; is that correct?


---




[GitHub] spark issue #20046: [SPARK-22362][SQL] Add unit test for Window Aggregate Fu...

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20046
  
**[Test build #86474 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86474/testReport)**
 for PR 20046 at commit 
[`458a0cc`](https://github.com/apache/spark/commit/458a0ccd7530afeededc52c72bfc38bb83f0bbd1).


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/97/
Test PASSed.


---




[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20319
  
**[Test build #86479 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86479/testReport)**
 for PR 20319 at commit 
[`dc7e708`](https://github.com/apache/spark/commit/dc7e7084dbe2c3eb987e05b28da70d54560e6e95).


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #86473 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86473/testReport)**
 for PR 13599 at commit 
[`83d66c5`](https://github.com/apache/spark/commit/83d66c5cf5aad4c1fd877b29dc2a2f6453880dc3).


---




[GitHub] spark issue #20045: [Spark-22360][SQL][TEST] Add unit tests for Window Speci...

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20045
  
**[Test build #86475 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86475/testReport)**
 for PR 20045 at commit 
[`5feb4f7`](https://github.com/apache/spark/commit/5feb4f7d75eaba759eec6d84cfc23d8c1a347f2f).


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #86480 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86480/testReport)**
 for PR 13599 at commit 
[`76918ae`](https://github.com/apache/spark/commit/76918ae1a8d3345f1835a906b5b50b000de1233d).


---




[GitHub] spark issue #20046: [SPARK-22362][SQL] Add unit test for Window Aggregate Fu...

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20046
  
**[Test build #86476 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86476/testReport)**
 for PR 20046 at commit 
[`a0e14cc`](https://github.com/apache/spark/commit/a0e14cc5ec320df430993f3c2f67c08ce9474163).


---




[GitHub] spark issue #20046: [SPARK-22362][SQL] Add unit test for Window Aggregate Fu...

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20046
  
**[Test build #86472 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86472/testReport)**
 for PR 20046 at commit 
[`5c941c7`](https://github.com/apache/spark/commit/5c941c7e47e0f8782c97da6765465d85a66345e5).


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #86478 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86478/testReport)**
 for PR 13599 at commit 
[`c903043`](https://github.com/apache/spark/commit/c903043d78bbf1e8cec157784459f1cce1fe8c93).


---




[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20276
  
**[Test build #86477 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86477/testReport)**
 for PR 20276 at commit 
[`83c5fda`](https://github.com/apache/spark/commit/83c5fda86a55f60ea5844116a5239590140414e5).


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20276
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/101/
Test PASSed.


---




[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20276
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20046: [SPARK-22362][SQL] Add unit test for Window Aggre...

2018-01-22 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/20046#discussion_r162894686
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala
 ---
@@ -86,6 +93,429 @@ class DataFrameWindowFunctionsSuite extends QueryTest 
with SharedSQLContext {
 assert(e.message.contains("requires window to be ordered"))
   }
 
+  test("aggregation and rows between") {
+val df = Seq((1, "1"), (2, "1"), (2, "2"), (1, "1"), (2, 
"2")).toDF("key", "value")
--- End diff --

This test was removed and re-added as a result of a merge conflict. I have 
now cleaned it up.


---




[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2018-01-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17123#discussion_r162910933
  
--- Diff: docs/ml-guide.md ---
@@ -122,6 +122,8 @@ There are no deprecations.
 * [SPARK-21027](https://issues.apache.org/jira/browse/SPARK-21027):
  We are now setting the default parallelism used in `OneVsRest` to be 1 
(i.e. serial), in 2.2 and earlier version,
  the `OneVsRest` parallelism would be parallelism of the default 
threadpool in scala.
+* [SPARK-19781](https://issues.apache.org/jira/browse/SPARK-19781):
+ `Bucketizer` handles NULL values the same way as NaN when handleInvalid 
is skip or keep.
--- End diff --

Hmm, I think that for skip, `dataset.na.drop` drops NULL. We didn't change its 
behavior. 
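
To make the three `handleInvalid` modes concrete, here is a minimal sketch in 
plain Python rather than Spark code. The `bucketize` function is hypothetical, 
and treating `None` (NULL) the same way as NaN is an assumption taken from the 
doc line proposed in this diff, not a statement about what any released 
version does.

```python
import math
from bisect import bisect_right
from typing import List, Optional, Sequence


def is_invalid(value: Optional[float]) -> bool:
    # Assumption for this sketch: NULL (None) is handled like NaN.
    return value is None or math.isnan(value)


def bucketize(values: Sequence[Optional[float]],
              splits: Sequence[float],
              handle_invalid: str = "error") -> List[float]:
    """Map each value to a bucket index given sorted split points.

    Buckets are [splits[i], splits[i + 1]), except the last bucket,
    which also includes its upper bound.
    """
    if handle_invalid not in ("error", "skip", "keep"):
        raise ValueError("handle_invalid must be error, skip or keep")
    num_buckets = len(splits) - 1
    result: List[float] = []
    for v in values:
        if is_invalid(v):
            if handle_invalid == "skip":
                continue  # drop the row
            if handle_invalid == "keep":
                result.append(float(num_buckets))  # extra bucket
                continue
            raise ValueError("invalid value (NaN or NULL) encountered")
        if v < splits[0] or v > splits[-1]:
            raise ValueError("value outside the specified splits")
        if v == splits[-1]:
            result.append(float(num_buckets - 1))
        else:
            result.append(float(bisect_right(splits, v) - 1))
    return result
```

For splits `[-inf, 0.5, 1.4, inf]` and values `[0.1, 0.4, 1.2, 1.5, nan, None]`, 
`keep` places the last two rows in the extra bucket `3.0`, `skip` returns only 
the four valid rows, and `error` raises.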


---




[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-22 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20276
  
retest this please.


---



