GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/20566
[SPARK-23377][ML] Fixes Bucketizer with multiple columns persistence bug
## What changes were proposed in this pull request?
Since 2.3, `Bucketizer` supports multiple input/output columns. During
transformation, we check whether mutually exclusive params are set. E.g., if
`inputCols` and `outputCol` are both set, an error is thrown.
However, when we write a `Bucketizer`, the default params and the
user-supplied params appear to be merged during writing. On load, all saved
params are read back and set on the newly created model instance. As a result,
the default `outputCol` param from the `HasOutputCol` trait is set in
`paramMap` and becomes a user-supplied param. That makes the check of
exclusive params fail.
This patch changes `DefaultParamsWriter` to save only user-supplied params.
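
The failure mode described above can be sketched with a small toy model. This is not Spark code: `ToyBucketizer`, `param_map`, `check_exclusive_params`, the `user_supplied_only` flag, and the default value `"bucketizer_output"` are all hypothetical names chosen for illustration. The sketch assumes only what the description states: defaults and user-supplied params are merged on write, and every saved param is set back on the loaded instance.

```python
# Toy model of the persistence bug; hypothetical names, not Spark's implementation.
class ToyBucketizer:
    # Default inherited from a shared trait, analogous to the default `outputCol`.
    DEFAULTS = {"outputCol": "bucketizer_output"}

    def __init__(self):
        self.param_map = {}  # holds user-supplied params only

    def set(self, name, value):
        self.param_map[name] = value
        return self

    def is_set(self, name):
        # True only for params that were explicitly set, not for defaults.
        return name in self.param_map

    def check_exclusive_params(self):
        if self.is_set("inputCols") and self.is_set("outputCol"):
            raise ValueError("inputCols and outputCol are both set")

    def save(self, user_supplied_only):
        if user_supplied_only:
            return dict(self.param_map)  # behaviour proposed by the patch
        return {**self.DEFAULTS, **self.param_map}  # defaults merged in

    @classmethod
    def load(cls, saved):
        instance = cls()
        for name, value in saved.items():
            instance.set(name, value)  # every saved param becomes user-supplied
        return instance


original = ToyBucketizer().set("inputCols", ["a", "b"]).set("outputCols", ["x", "y"])
original.check_exclusive_params()  # passes: `outputCol` is only a default here

# Round trip with merged params: the default `outputCol` comes back as set.
buggy = ToyBucketizer.load(original.save(user_supplied_only=False))
try:
    buggy.check_exclusive_params()
    buggy_check_failed = False
except ValueError:
    buggy_check_failed = True

# Round trip with user-supplied params only: the check still passes.
fixed = ToyBucketizer.load(original.save(user_supplied_only=True))
fixed.check_exclusive_params()
```

In this model the only difference between the two round trips is whether defaults are written out, which is the behaviour the patch changes in `DefaultParamsWriter`.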
## How was this patch tested?
Modified test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 SPARK-23377
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20566.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20566
----
commit 7785cacee8dd4a6e9938c3c99dad3ad3117655d3
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-02-10T08:52:17Z
Only save user-supplied params.
----