[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19892 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/91/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20349: Fix the path to the examples jar
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20349 Please add [SPARK-XXX][DOC] to the PR title with your JIRA ticket number, if you created a JIRA ticket for this. If this is a minor issue without a JIRA ticket, just replace it with [Minor][DOC].
[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/19892 @MLnick you ok with this then?
[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r162873388
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/MulticlassLogisticRegressionWithElasticNetExample.scala ---
@@ -49,6 +49,48 @@ object MulticlassLogisticRegressionWithElasticNetExample {
 // Print the coefficients and intercept for multinomial logistic regression
 println(s"Coefficients: \n${lrModel.coefficientMatrix}")
 println(s"Intercepts: \n${lrModel.interceptVector}")
+
+val trainingSummary = lrModel.summary
+
+val objectiveHistory = trainingSummary.objectiveHistory
--- End diff --
ditto here for the comment to be consistent with Java / Python versions
[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r162873036
--- Diff: docs/ml-classification-regression.md ---
@@ -97,10 +97,6 @@ only available on the driver.
 [`LogisticRegressionTrainingSummary`](api/scala/index.html#org.apache.spark.ml.classification.LogisticRegressionTrainingSummary)
 provides a summary for a
 [`LogisticRegressionModel`](api/scala/index.html#org.apache.spark.ml.classification.LogisticRegressionModel).
-Currently, only binary classification is supported and the
--- End diff --
Should we add a note reflecting the difference between the summary and binary summary? Perhaps indicating the usage of `binarySummary` or `asBinary` method? I know it's done in the example but perhaps a short line about that.
[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r162872261
--- Diff: docs/ml-classification-regression.md ---
@@ -125,7 +117,6 @@ Continuing the earlier example:
 [`LogisticRegressionTrainingSummary`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionSummary)
 provides a summary for a
 [`LogisticRegressionModel`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionModel).
--- End diff --
Shall we just add a short line to the `Example` section of MLoR: "The following example shows how to train a multiclass logistic regression model with elastic net regularization, as well as extract the multiclass training summary." or something like that.
[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r162873193
--- Diff: examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py ---
@@ -43,6 +43,43 @@
 # Print the coefficients and intercept for multinomial logistic regression
 print("Coefficients: \n" + str(lrModel.coefficientMatrix))
 print("Intercept: " + str(lrModel.interceptVector))
+
+trainingSummary = lrModel.summary
+
+# Obtain the objective per iteration
+objectiveHistory = trainingSummary.objectiveHistory
+print("objectiveHistory:")
+for objective in objectiveHistory:
+    print(objective)
+
+print("False positive rate by label:")
--- End diff --
Do we want to have a consistent comment as per the Java version above?: `// for multiclass, we can inspect metrics on a per-label basis`
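To make "metrics on a per-label basis" concrete, here is a minimal pure-Python sketch of what a per-label false positive rate measures. It does not use Spark at all: the function name `false_positive_rate_by_label` and the sample data are hypothetical, and the PR's example obtains these values from the training summary rather than computing them by hand.

```python
from collections import Counter


def false_positive_rate_by_label(pairs):
    """Return {label: false positive rate} for (prediction, label) pairs.

    For a label L, the rate is the number of rows predicted as L whose true
    label is not L, divided by the number of rows whose true label is not L.
    """
    pairs = list(pairs)
    labels = sorted({label for _, label in pairs} | {pred for pred, _ in pairs})
    true_counts = Counter(label for _, label in pairs)
    false_positives = Counter(pred for pred, label in pairs if pred != label)
    total = len(pairs)

    rates = {}
    for label in labels:
        negatives = total - true_counts[label]
        # Guard against a label that covers every row (no negatives at all).
        rates[label] = false_positives[label] / negatives if negatives else 0.0
    return rates


# Hypothetical (prediction, label) pairs for a three-class problem.
sample = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (0, 2)]

print("False positive rate by label:")
for label, rate in false_positive_rate_by_label(sample).items():
    print("label %d: %s" % (label, rate))
```

Each label gets its own rate, which is why a shared comment such as the Java one is useful before the per-label loops in all three language versions.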
[GitHub] spark issue #20349: Fix the path to the examples jar
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20349 Can you please check whether there are similar issues elsewhere in the doc?
[GitHub] spark issue #20349: Fix the path to the examples jar
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20349 ok to test.
[GitHub] spark issue #20349: Fix the path to the examples jar
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20349 **[Test build #86467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86467/testReport)** for PR 20349 at commit [`20d502f`](https://github.com/apache/spark/commit/20d502fd2a271fcec1614a909c3e89934e81582e).
[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r162874801
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala ---
@@ -53,7 +53,8 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String
  * Values at -inf, inf must be explicitly provided to cover all Double values;
  * otherwise, values outside the splits specified will be treated as errors.
  *
- * See also [[handleInvalid]], which can optionally create an additional bucket for NaN values.
+ * See also [[handleInvalid]], which can optionally create an additional bucket for NaN/NULL
--- End diff --
This sounds like a behavior change; we should add an item to the migration guide of the ML docs.
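The behavior change under discussion can be sketched in plain Python. This is only an illustration of the semantics described in the doc comment above, not Spark's implementation: the function `bucketize` and its `handle_invalid` argument are hypothetical names, and treating NULL (`None`) exactly like NaN is an assumption based on the "NaN/NULL" wording in the diff.

```python
import bisect
import math


def bucketize(value, splits, handle_invalid="error"):
    """Map a value to a bucket index given strictly increasing splits.

    n + 1 splits define n buckets [x, y), except the last bucket, which also
    includes its upper bound. Invalid values (None or NaN) are handled as:
      "error": raise, "skip": return None so the caller can drop the row,
      "keep":  place them in one additional bucket with index n.
    """
    num_buckets = len(splits) - 1
    if value is None or math.isnan(value):
        if handle_invalid == "keep":
            return num_buckets
        if handle_invalid == "skip":
            return None
        raise ValueError("invalid value (NaN or NULL) with handle_invalid='error'")
    if value < splits[0] or value > splits[-1]:
        raise ValueError("value %r is outside the provided splits" % value)
    if value == splits[-1]:
        return num_buckets - 1
    return bisect.bisect_right(splits, value) - 1


splits = [float("-inf"), 0.5, 1.4, float("inf")]
values = [0.1, 0.4, 1.2, 1.5, float("nan"), None]
print([bucketize(v, splits, "keep") for v in values])  # [0, 0, 1, 2, 3, 3]
```

Under this reading, rows that previously failed on NULL would now land in the extra bucket when `keep` is selected, which is the kind of difference a migration-guide entry would call out.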
[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19892 Merged build finished. Test PASSed.
[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19892 **[Test build #86464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86464/testReport)** for PR 19892 at commit [`014fb08`](https://github.com/apache/spark/commit/014fb08ac279002203267bed65ebce2c980f7912).
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20343 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/92/ Test PASSed.
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20343 Merged build finished. Test PASSed.
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20343 **[Test build #86465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86465/testReport)** for PR 20343 at commit [`5d6092c`](https://github.com/apache/spark/commit/5d6092c4bf029a021930a4ba66e6e1de3a4b15ed).
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20277 **[Test build #86469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86469/testReport)** for PR 20277 at commit [`0c22f5b`](https://github.com/apache/spark/commit/0c22f5bec3ce5d3bd9f54d7950b58bff65f4941b).
[GitHub] spark issue #20349: Fix the path to the examples jar
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20349 **[Test build #86468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86468/testReport)** for PR 20349 at commit [`20d502f`](https://github.com/apache/spark/commit/20d502fd2a271fcec1614a909c3e89934e81582e).
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/94/ Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #86470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86470/testReport)** for PR 13599 at commit [`3c5cbfc`](https://github.com/apache/spark/commit/3c5cbfc2311a88dce928241da4523f788aa09602).
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark pull request #20277: [SPARK-23090][SQL] polish ColumnVector
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20277#discussion_r162876178
--- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java ---
@@ -33,18 +33,6 @@
   private final ArrowVectorAccessor accessor;
   private ArrowColumnVector[] childColumns;
-  private void ensureAccessible(int index) {
-    ensureAccessible(index, 1);
-  }
-
-  private void ensureAccessible(int index, int count) {
--- End diff --
It is good to do it later. I agree that we should do the same check in one place.
[GitHub] spark issue #20349: Fix the path to the examples jar
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20349 Merged build finished. Test PASSed.
[GitHub] spark issue #20349: Fix the path to the examples jar
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20349 **[Test build #86467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86467/testReport)** for PR 20349 at commit [`20d502f`](https://github.com/apache/spark/commit/20d502fd2a271fcec1614a909c3e89934e81582e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20349: Fix the path to the examples jar
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20349 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86467/ Test PASSed.
[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19892 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86464/ Test PASSed.
[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19892 **[Test build #86464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86464/testReport)** for PR 19892 at commit [`014fb08`](https://github.com/apache/spark/commit/014fb08ac279002203267bed65ebce2c980f7912). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19892 Merged build finished. Test PASSed.
[GitHub] spark issue #20344: [MINOR] Typo fixes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20344 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/93/ Test PASSed.
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20277 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/95/ Test PASSed.
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20277 Merged build finished. Test PASSed.
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20277 **[Test build #86459 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86459/testReport)** for PR 20277 at commit [`55a288e`](https://github.com/apache/spark/commit/55a288e925a71cd48a533d6171926e398f857c2e). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Merged build finished. Test FAILed.
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20277 Merged build finished. Test FAILed.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86460/ Test FAILed.
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20343 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86462/ Test FAILed.
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20277 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86463/ Test FAILed.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20146 **[Test build #86460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86460/testReport)** for PR 20146 at commit [`540c364`](https://github.com/apache/spark/commit/540c364d2a70ecd6ee5b92fadedc5e9b85026d2c). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20349: Fix the path to the examples jar
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20349 Can one of the admins verify this patch?
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20277 Jenkins, retest this please.
[GitHub] spark issue #20349: Fix the path to the examples jar
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20349 **[Test build #86468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86468/testReport)** for PR 20349 at commit [`20d502f`](https://github.com/apache/spark/commit/20d502fd2a271fcec1614a909c3e89934e81582e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20349: Fix the path to the examples jar
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20349 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86468/ Test PASSed.
[GitHub] spark issue #20349: Fix the path to the examples jar
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20349 Merged build finished. Test PASSed.
[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r162886239
--- Diff: python/pyspark/sql/functions.py ---
@@ -2221,6 +2223,35 @@ def pandas_udf(f=None, returnType=None, functionType=None):
 .. seealso:: :meth:`pyspark.sql.GroupedData.apply`
+3. GROUP_AGG
+
+   A group aggregate UDF defines a transformation: One or more `pandas.Series` -> A scalar
+   The `returnType` should be a primitive data type, e.g, :class:`DoubleType`.
--- End diff --
very small nit: `e.g.` instead of `e.g`.
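The "one or more `pandas.Series` -> a scalar" contract in the docstring above can be illustrated without Spark or pandas. In this rough sketch plain Python lists stand in for `pandas.Series`, and `group_aggregate` and `weighted_mean` are hypothetical helper names invented for the illustration; it shows only the shape of the transformation, not how `pandas_udf` is implemented.

```python
from collections import defaultdict


def weighted_mean(values, weights):
    """Aggregate two equally long columns of one group into a single scalar."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def group_aggregate(rows, key_index, column_indices, func):
    """Group rows by a key and reduce the selected columns of each group.

    Every selected column is passed to func as one list per group (the
    stand-in for a pandas.Series), and func returns one scalar per group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return {
        key: func(*[[row[i] for row in members] for i in column_indices])
        for key, members in groups.items()
    }


# Hypothetical rows of (id, value, weight).
rows = [(1, 1.0, 1.0), (1, 2.0, 3.0), (2, 3.0, 1.0), (2, 5.0, 1.0)]
result = group_aggregate(rows, 0, [1, 2], weighted_mean)
print(result)  # {1: 1.75, 2: 4.0}
```

Because each group collapses to a single value, a primitive return type such as a double is the natural fit, which matches the `returnType` note in the docstring.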
[GitHub] spark pull request #20348: [SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20348
[GitHub] spark issue #20342: [SPARK-23170][SQL] Dump the statistics of effective runs...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20342 Thanks! Merged to master/2.3
[GitHub] spark issue #20342: [SPARK-23170][SQL] Dump the statistics of effective runs...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20342 Do we need to update `TPCDSQueryBenchmark`, too? Could we replace the updated queries there?
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/96/ Test PASSed.
[GitHub] spark issue #20349: [Minor][DOC] Fix the path to the examples jar
Github user tashoyan commented on the issue: https://github.com/apache/spark/pull/20349 @jerryshao Not found yet
[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19892 @holdenk everything except my comment in https://github.com/apache/spark/pull/19892#discussion_r162900053
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20343 **[Test build #86465 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86465/testReport)** for PR 20343 at commit [`5d6092c`](https://github.com/apache/spark/commit/5d6092c4bf029a021930a4ba66e6e1de3a4b15ed). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20295: [WIP][SPARK-23011] Support alternative function f...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20295#discussion_r162912985
--- Diff: python/pyspark/serializers.py ---
@@ -267,13 +267,13 @@ def load_stream(self, stream):
 """
 Deserialize ArrowRecordBatches to an Arrow table and return as a list of pandas.Series.
 """
-from pyspark.sql.types import _check_dataframe_localize_timestamps
+from pyspark.sql.types import _check_series_localize_timestamps
 import pyarrow as pa
 reader = pa.open_stream(stream)
 for batch in reader:
     # NOTE: changed from pa.Columns.to_pandas, timezone issue in conversion fixed in 0.7.1
-    pdf = _check_dataframe_localize_timestamps(batch.to_pandas(), self._timezone)
-    yield [c for _, c in pdf.iteritems()]
+    yield [_check_series_localize_timestamps(c.to_pandas(), self._timezone)
+           for c in pa.Table.from_batches([batch]).itercolumns()]
--- End diff --
Maybe we can remove the comment above (`# NOTE: ...`)?
[GitHub] spark issue #20347: [SPARK-20129][Core] JavaSparkContext should use SparkCon...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20347 Can you please explain why we need to change to `getOrCreate`?
[GitHub] spark issue #20348: [SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UD...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20348 @HyukjinKwon That is fine. I am reviewing all the API changes made in Spark 2.3 release. Thanks! Merged to master/2.3
[GitHub] spark issue #20046: [SPARK-22362][SQL] Add unit test for Window Aggregate Fu...
Github user attilapiros commented on the issue: https://github.com/apache/spark/pull/20046 I have already extended sql/core/src/test/resources/sql-tests/inputs/window.sql with the missing window aggregate functions, but if you would like, I can move it to a different PR too.
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20343 Do we need to update `TPCDSQueryBenchmark`, too? I think we could replace the updated queries there.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #86470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86470/testReport)** for PR 13599 at commit [`3c5cbfc`](https://github.com/apache/spark/commit/3c5cbfc2311a88dce928241da4523f788aa09602). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)`
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20276 retest this please
[GitHub] spark issue #20177: [SPARK-22954][SQL] Fix the exception thrown by Analyze c...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20177 Actually, it's not really useful to have `NoSuchTableException`, `NoSuchFunctionException`, etc.; always using `AnalysisException` seems fine. CC @gatorsmile
[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...
Github user crackcell commented on the issue: https://github.com/apache/spark/pull/17123 @WeichenXu123 I have finished my work; please review it. Any suggestions are welcome. :-)
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #86471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86471/testReport)** for PR 13599 at commit [`789a8e5`](https://github.com/apache/spark/commit/789a8e5222d7e57f3b6a15fac38604b2502e45d2). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)` * ` class DriverEndpoint(override val rpcEnv: RpcEnv)`
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86471/ Test FAILed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test FAILed.
[GitHub] spark pull request #19892: [SPARK-22797][PySpark] Bucketizer support multi-c...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/19892#discussion_r162900053 --- Diff: python/pyspark/ml/feature.py --- @@ -315,13 +315,19 @@ class BucketedRandomProjectionLSHModel(LSHModel, JavaMLReadable, JavaMLWritable) @inherit_doc -class Bucketizer(JavaTransformer, HasInputCol, HasOutputCol, HasHandleInvalid, - JavaMLReadable, JavaMLWritable): -""" -Maps a column of continuous features to a column of feature buckets. - ->>> values = [(0.1,), (0.4,), (1.2,), (1.5,), (float("nan"),), (float("nan"),)] ->>> df = spark.createDataFrame(values, ["values"]) +class Bucketizer(JavaTransformer, HasInputCol, HasOutputCol, HasInputCols, HasOutputCols, + HasHandleInvalid, JavaMLReadable, JavaMLWritable): +""" +Maps a column of continuous features to a column of feature buckets. Since 2.3.0, +:py:class:`Bucketizer` can map multiple columns at once by setting the :py:attr:`inputCols` +parameter. Note that when both the :py:attr:`inputCol` and :py:attr:`inputCols` parameters +are set, a log warning will be printed and only :py:attr:`inputCol` will take effect, while --- End diff -- @holdenk this comment will need to be changed as per #19993 - but that has not been merged yet. I think #19993 will block 2.3 though, so we could preemptively change the doc here to match the Scala side in #19993 about throwing an exception.
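The difference between the two behaviors mentioned in this comment (log a warning and use `inputCol`, versus throw an exception when both parameters are set) can be illustrated in plain Python. This is only a sketch under the assumption that the exception-throwing variate proposed for #19993 is adopted; `resolve_input_columns` is a hypothetical helper, not part of PySpark, and the error messages are invented for illustration.

```python
def resolve_input_columns(input_col=None, input_cols=None):
    """Hypothetical helper (not a PySpark API): pick the columns to transform.

    Assumes the stricter behavior discussed in the review: setting both
    parameters is an error instead of silently preferring input_col.
    """
    if input_col is not None and input_cols is not None:
        # Assumed behavior: fail fast rather than log a warning.
        raise ValueError("Only one of inputCol and inputCols may be set, not both.")
    if input_cols is not None:
        return list(input_cols)
    if input_col is not None:
        return [input_col]
    raise ValueError("Either inputCol or inputCols must be set.")
```

With this model, `resolve_input_columns(input_col="values")` yields a single-element list, while passing both arguments raises immediately, which is easier to notice than a warning in the logs.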
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20343 Merged build finished. Test PASSed.
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20343 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86465/ Test PASSed.
[GitHub] spark pull request #20341: [MINOR] [SQL] [TEST] Test case cleanups for recen...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20341
[GitHub] spark pull request #20342: [SPARK-23170][SQL] Dump the statistics of effecti...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20342
[GitHub] spark issue #20342: [SPARK-23170][SQL] Dump the statistics of effective runs...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20342 Will you make a follow-up PR for per-query statistics?
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20276 Merged build finished. Test PASSed.
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20276 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/98/ Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/100/ Test PASSed.
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20276 **[Test build #86481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86481/testReport)** for PR 20276 at commit [`83c5fda`](https://github.com/apache/spark/commit/83c5fda86a55f60ea5844116a5239590140414e5).
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/99/ Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19285#discussion_r162927703 --- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala --- @@ -233,17 +235,13 @@ private[spark] class MemoryStore( } if (keepUnrolling) { - // We successfully unrolled the entirety of this block - val arrayValues = vector.toArray - vector = null - val entry = -new DeserializedMemoryEntry[T](arrayValues, SizeEstimator.estimate(arrayValues), classTag) - val size = entry.size + // We need more precise value + val size = valuesHolder.esitimatedSize(false) --- End diff -- why do we need `esitimatedSize(false)`? It seems we can just build the entry and call `entry.size`.
[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...
Github user crackcell commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r162885968 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -53,7 +53,8 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String * Values at -inf, inf must be explicitly provided to cover all Double values; * otherwise, values outside the splits specified will be treated as errors. * - * See also [[handleInvalid]], which can optionally create an additional bucket for NaN values. + * See also [[handleInvalid]], which can optionally create an additional bucket for NaN/NULL --- End diff -- @viirya done.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #86471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86471/testReport)** for PR 13599 at commit [`789a8e5`](https://github.com/apache/spark/commit/789a8e5222d7e57f3b6a15fac38604b2502e45d2).
[GitHub] spark issue #20344: [MINOR] Typo fixes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20344 Merged build finished. Test FAILed.
[GitHub] spark issue #20344: [MINOR] Typo fixes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20344 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86466/ Test FAILed.
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20277 **[Test build #86469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86469/testReport)** for PR 20277 at commit [`0c22f5b`](https://github.com/apache/spark/commit/0c22f5bec3ce5d3bd9f54d7950b58bff65f4941b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20046: [SPARK-22362][SQL] Add unit test for Window Aggregate Fu...
Github user smurakozi commented on the issue: https://github.com/apache/spark/pull/20046 @jiangxb1987 how does your request to cover the SQL interface relate to SPARK-23160? I assume it is to be covered in that issue, is that correct?
[GitHub] spark issue #20046: [SPARK-22362][SQL] Add unit test for Window Aggregate Fu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20046 **[Test build #86474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86474/testReport)** for PR 20046 at commit [`458a0cc`](https://github.com/apache/spark/commit/458a0ccd7530afeededc52c72bfc38bb83f0bbd1).
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/97/ Test PASSed.
[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20319 **[Test build #86479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86479/testReport)** for PR 20319 at commit [`dc7e708`](https://github.com/apache/spark/commit/dc7e7084dbe2c3eb987e05b28da70d54560e6e95).
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #86473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86473/testReport)** for PR 13599 at commit [`83d66c5`](https://github.com/apache/spark/commit/83d66c5cf5aad4c1fd877b29dc2a2f6453880dc3).
[GitHub] spark issue #20045: [Spark-22360][SQL][TEST] Add unit tests for Window Speci...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20045 **[Test build #86475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86475/testReport)** for PR 20045 at commit [`5feb4f7`](https://github.com/apache/spark/commit/5feb4f7d75eaba759eec6d84cfc23d8c1a347f2f).
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #86480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86480/testReport)** for PR 13599 at commit [`76918ae`](https://github.com/apache/spark/commit/76918ae1a8d3345f1835a906b5b50b000de1233d).
[GitHub] spark issue #20046: [SPARK-22362][SQL] Add unit test for Window Aggregate Fu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20046 **[Test build #86476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86476/testReport)** for PR 20046 at commit [`a0e14cc`](https://github.com/apache/spark/commit/a0e14cc5ec320df430993f3c2f67c08ce9474163).
[GitHub] spark issue #20046: [SPARK-22362][SQL] Add unit test for Window Aggregate Fu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20046 **[Test build #86472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86472/testReport)** for PR 20046 at commit [`5c941c7`](https://github.com/apache/spark/commit/5c941c7e47e0f8782c97da6765465d85a66345e5).
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #86478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86478/testReport)** for PR 13599 at commit [`c903043`](https://github.com/apache/spark/commit/c903043d78bbf1e8cec157784459f1cce1fe8c93).
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20276 **[Test build #86477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86477/testReport)** for PR 20276 at commit [`83c5fda`](https://github.com/apache/spark/commit/83c5fda86a55f60ea5844116a5239590140414e5).
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20276 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/101/ Test PASSed.
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20276 Merged build finished. Test PASSed.
[GitHub] spark pull request #20046: [SPARK-22362][SQL] Add unit test for Window Aggre...
Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/20046#discussion_r162894686 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala --- @@ -86,6 +93,429 @@ class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext { assert(e.message.contains("requires window to be ordered")) } + test("aggregation and rows between") { +val df = Seq((1, "1"), (2, "1"), (2, "2"), (1, "1"), (2, "2")).toDF("key", "value") --- End diff -- This test was removed and re-added as a result of a merge conflict. I have now cleaned it up.
[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r162910933 --- Diff: docs/ml-guide.md --- @@ -122,6 +122,8 @@ There are no deprecations. * [SPARK-21027](https://issues.apache.org/jira/browse/SPARK-21027): We are now setting the default parallelism used in `OneVsRest` to be 1 (i.e. serial), in 2.2 and earlier version, the `OneVsRest` parallelism would be parallelism of the default threadpool in scala. +* [SPARK-19781](https://issues.apache.org/jira/browse/SPARK-19781): + `Bucketizer` handles NULL values the same way as NaN when handleInvalid is skip or keep. --- End diff -- hmm, I think for skip, `dataset.na.drop` drops NULL. We didn't change its behavior.
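The `handleInvalid` semantics under discussion can be modeled in a few lines of plain Python. This is a simplified sketch, not Spark's implementation: `bucketize` and its signature are hypothetical, and it assumes what the proposed doc line states, namely that NULL (`None` here) is treated like NaN for `skip` and `keep`, with `keep` placing invalid values in one extra bucket after the regular ones. Raising on NULL in `error` mode is also an assumption of this model.

```python
import bisect
import math


def bucketize(values, splits, handle_invalid="error"):
    """Hypothetical, simplified model of bucketing with handleInvalid.

    splits must be strictly increasing. Bucket i covers
    [splits[i], splits[i + 1]); the last bucket also includes its upper bound.
    """
    if handle_invalid not in ("error", "skip", "keep"):
        raise ValueError("unknown handle_invalid: %r" % (handle_invalid,))
    num_buckets = len(splits) - 1
    result = []
    for v in values:
        # Assumption: NULL (None) is handled the same way as NaN.
        if v is None or math.isnan(v):
            if handle_invalid == "skip":
                continue  # drop the row
            if handle_invalid == "keep":
                result.append(float(num_buckets))  # extra bucket for invalid values
                continue
            raise ValueError("invalid value (NULL or NaN): %r" % (v,))
        if v < splits[0] or v > splits[-1]:
            raise ValueError("value %r is outside the splits" % (v,))
        if v == splits[-1]:
            result.append(float(num_buckets - 1))
        else:
            result.append(float(bisect.bisect_right(splits, v) - 1))
    return result
```

For example, with splits `[-inf, 0.5, 1.4, inf]`, `skip` drops both NaN and `None` rows from the output, which matches the observation that dropping NULLs was already the existing behavior for `skip`; only `keep` needs an explicit extra bucket.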
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20276 retest this please.