GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/19229
[SPARK-22001][ML][SQL] ImputerModel can do withColumn for all input columns
at one pass
## What changes were proposed in this pull request?
SPARK-21690 made `Imputer` one-pass by parallelizing the computation over
all input columns. When we transform a dataset with `ImputerModel`, however,
we still call `withColumn` on each input column sequentially. We can handle
all input columns at once as well by adding a `withColumns` API to `Dataset`.
The new `withColumns` API is for internal use only for now.
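
To illustrate the difference, here is a minimal sketch using only the public
`Dataset` API. It is not the code in this patch: `ImputeSketch`, `filled`,
`sequential`, `onePass` and the `_out` column suffix are hypothetical names
chosen for the example, the null-only fill rule is a simplification of what
`ImputerModel` does, and a single `select` stands in for the internal
`withColumns`. Each `withColumn` call produces a new `Dataset` with one more
projection to analyze, whereas the one-pass variant adds every output column
in a single projection.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object ImputeSketch {
  // Hypothetical helper: replace nulls in column `name` with `surrogate`.
  // (Simplified; it ignores any separate "missing value" marker.)
  def filled(name: String, surrogate: Double): Column =
    when(col(name).isNull, lit(surrogate)).otherwise(col(name))

  // Sequential: one withColumn call per input column.
  def sequential(df: DataFrame, surrogates: Seq[(String, Double)]): DataFrame =
    surrogates.foldLeft(df) { case (acc, (name, s)) =>
      acc.withColumn(name + "_out", filled(name, s))
    }

  // One pass: a single select that carries every new column at once.
  def onePass(df: DataFrame, surrogates: Seq[(String, Double)]): DataFrame = {
    val added: Seq[Column] =
      surrogates.map { case (name, s) => filled(name, s).as(name + "_out") }
    df.select((col("*") +: added): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("impute-sketch")
      .getOrCreate()
    import spark.implicits._

    // Two nullable double columns with one missing entry each.
    val df = Seq[(Option[Double], Option[Double])](
      (Some(1.0), None),
      (None, Some(4.0))
    ).toDF("a", "b")
    val surrogates = Seq("a" -> 1.0, "b" -> 4.0)

    // Both variants produce the same columns and rows.
    sequential(df, surrogates).show()
    onePass(df, surrogates).show()

    spark.stop()
  }
}
```

Both functions return the same result; they differ only in how many
intermediate `Dataset`s and projections are created along the way, which is
what matters when there are many input columns.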
## How was this patch tested?
The `ImputerModel` change is covered by existing tests. Tests were added for
the `withColumns` API.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 SPARK-22001
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19229.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19229
----
commit 4efb64374b7c93bae3e9b0d2fc0ebc4f5ad1e1d5
Author: Liang-Chi Hsieh <[email protected]>
Date: 2017-09-14T03:49:16Z
Do withColumn on all input columns at once.
----