[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-10-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 Thanks @gatorsmile @WeichenXu123 @zhengruifeng --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-10-02 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19229 Thanks! Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-10-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19229 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19229 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82369/ Test PASSed. ---

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19229 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-10-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #82369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82369/testReport)** for PR 19229 at commit

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #82369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82369/testReport)** for PR 19229 at commit

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-30 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 @gatorsmile `withColumn` is reimplemented now. Please take a look when you have time. Thanks. --- - To unsubscribe, e-mail:

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #82347 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82347/testReport)** for PR 19229 at commit

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19229 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19229 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82347/ Test PASSed. ---

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #82347 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82347/testReport)** for PR 19229 at commit

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-30 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19229 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82344/ Test FAILed. ---

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #82344 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82344/testReport)** for PR 19229 at commit

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19229 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #82344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82344/testReport)** for PR 19229 at commit

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-29 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 also cc @jkbradley and @MLnick for final check of the ML change. Thanks. --- - To unsubscribe, e-mail:

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-29 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 ping @gatorsmile Can you take a quick look? Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-26 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 ping @gatorsmile for the SQL part. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands,

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-25 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 @gatorsmile Added the check for case sensitivity. Please take a look again. Thanks. --- - To unsubscribe, e-mail:

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19229 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82141/ Test PASSed. ---

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19229 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #82141 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82141/testReport)** for PR 19229 at commit

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #82141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82141/testReport)** for PR 19229 at commit

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-24 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 @WeichenXu123 Have any more comments on this? Thanks. I think the ML part is straightforward. --- - To unsubscribe, e-mail:

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19229 @viirya Yeah the perf gap I only focus on `mean` which can take advantage of codegen. --- - To unsubscribe, e-mail:

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 Yeah, I think that fix should work for the strategy `Imputer.mean` because `Imputer.mean` aggregates many columns at once now and that can be a too large gen'd code for aggregation. For the

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19229 The performance gap issue (compared with RDD version), I create a separated JIRA to track: https://issues.apache.org/jira/browse/SPARK-22105 As the result of discussion with @cloud-fan

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-22 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19229 I am not familiar with SQL source, but I think it's great to transform all columns at a time --- - To unsubscribe, e-mail:

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-21 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 ping @zhengruifeng @WeichenXu123 Any more comments on this? Thanks. --- - To unsubscribe, e-mail:

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19229 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19229 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81964/ Test PASSed. ---

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #81964 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81964/testReport)** for PR 19229 at commit

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19229 @viirya Thanks very much! Although the perf gap exists (when numCols is large), it won't block this PR. I will create a JIRA to track this. ---

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 numColums | RDD Mean | RDD Median | DataFrame Mean | DataFrame Median -- | -- | -- | -- | -- 1 | 0.1642173481 | 0.199774305 | 0.4260180671006 | 0.2025112919 10 | 0.3713707549 |

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 I ran the test codes to benchmark RDD-version and DataFrame version with this `ImputerModel` change: import org.apache.spark.ml.feature._ import org.apache.spark.sql.{DataFrame,

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #81964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81964/testReport)** for PR 19229 at commit

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19229 Oh. That's what have done in the old PR #18902 .(Because the RDD version (not in master branch, only personal impl here (sorry for put wrong link, the code link is here:

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 @WeichenXu123 I'm not sure I understand it correctly. This change only replaces the chain of `withColumn` to a pass of `withColumns`. We don't have RDD version for this, so I'm not sure what version

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19229 @viirya No, keep the dataframe version code. But I only want to confirm how much performance gap between this and RDD version. (for possible improvements in the future, because in similar test

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 @WeichenXu123 Thanks for verifying that. Do you mean using ApproxQuantiles to compute mean and median? But I think this change is not intended to improve this part. ---

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19229 @viirya I run the code, you're right, most of time cost on the executedPlan generation (The old version code). thanks! But can you append benchmark comparison with `RDD.aggregate` version?

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19229 Looks not the reason. maybe issues somewhere else. Let me run test later. Thanks! But there is some small issues in test: Don't include gen data time: ``` val start =

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 Btw, I don't see any hint about `df.withColumn(..).withColumn(..).withColumn(..)` can prevent the shuffle re-using. --- - To

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 I don't think re-using shuffle is the reason behind the numbers. If you looked at the previous comments, you will find that I ran the test before without `count` after `model.transform`. Namely the

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 New numbers: numColums | Old Mean | Old Median | New Mean | New Median -- | -- | -- | -- | -- 1 | 0.1727832915997 | 0.1537169693 | 0.1687325048997 | 0.1521283075 10 |

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19229 Great! That's it. thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands,

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 Updated test codes: import org.apache.spark.ml.feature._ import org.apache.spark.sql.{DataFrame, Row} import org.apache.spark.sql.types._ import spark.implicits._

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19229 @viirya I guess the reason is, the old PR version: `df.withColumn(..).withColumn(..).withColumn(..)`, the long df chain prevent the shuffle re-using... but now you merge them into one

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 @WeichenXu123 Btw, the test is basically re-using the codes from https://github.com/apache/spark/pull/18902#issuecomment-321727416. Is your concern is specified for this? ---

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 @WeichenXu123 Sure. And I must point out that I ran this benchmark in spark-shell under local mode. It is great if you can run the benchmark too to verify the numbers. ---

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19229 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81756/ Test PASSed. ---

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19229 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #81756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81756/testReport)** for PR 19229 at commit

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19229 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19229 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81755/ Test PASSed. ---

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #81755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81755/testReport)** for PR 19229 at commit

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-13 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 @zhengruifeng Yeah, it is better. Actually the difference between running multiple `withColumn` and one `withColumns` is mainly in the cost of query analysis and plan/dataset initialization. I will

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-13 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19229 In the test code, should we use `model.transform(df).count` instead? --- - To unsubscribe, e-mail:

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-13 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 FYI, the `withColumns` API was proposed in #17819. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-13 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 Ran the similar benchmark as https://github.com/apache/spark/pull/18902#issuecomment-321727416: numColums | Old Mean | Old Median | New Mean | New Median -- | -- | -- | -- | --

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #81756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81756/testReport)** for PR 19229 at commit

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-13 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19229 cc @MLnick @zhengruifeng @yanboliang --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands,

[GitHub] spark issue #19229: [SPARK-22001][ML][SQL] ImputerModel can do withColumn fo...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19229 **[Test build #81755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81755/testReport)** for PR 19229 at commit