GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/20566

    [SPARK-23377][ML] Fixes Bucketizer with multiple columns persistence bug

    ## What changes were proposed in this pull request?
    
    Since 2.3, `Bucketizer` supports multiple input/output columns. During transformation it checks whether mutually exclusive params are set; e.g., if both `inputCols` and `outputCol` are set, an error is thrown.
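    The exclusivity rule above can be sketched in plain Scala. This is a simplified model, not Spark's implementation: `ExclusiveParamsCheck`, `checkSingleVsMulti`, and the `isSet` predicate are hypothetical names, and only the param names `inputCol`, `outputCol`, `inputCols`, and `outputCols` come from `Bucketizer`.

```scala
object ExclusiveParamsCheck {
  // Hypothetical helper: rejects mixing single-column and multi-column params.
  // `isSet` answers "was this param explicitly supplied by the user?"
  def checkSingleVsMulti(isSet: String => Boolean): Unit = {
    val single = Seq("inputCol", "outputCol").filter(isSet)
    val multi = Seq("inputCols", "outputCols").filter(isSet)
    // require throws IllegalArgumentException when the condition is false
    require(single.isEmpty || multi.isEmpty,
      s"Cannot set single-column params (${single.mkString(", ")}) " +
        s"together with multi-column params (${multi.mkString(", ")})")
  }

  def main(args: Array[String]): Unit = {
    // A Set[String] is also a String => Boolean, so it can serve as `isSet`.
    checkSingleVsMulti(Set("inputCols", "outputCols")) // only multi-column: passes
    val rejected =
      try { checkSingleVsMulti(Set("inputCols", "outputCol")); false }
      catch { case _: IllegalArgumentException => true }
    println(s"mixed params rejected: $rejected")
  }
}
```

    Note that the check is only as good as `isSet`: if a default value is ever mistaken for a user-supplied one, a valid multi-column configuration is rejected.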
    
    However, when `Bucketizer` is written, the default params and the user-supplied params appear to be merged. On reading, all saved params are loaded back and set on the newly created instance. As a result, the default `outputCol` param from the `HasOutputCol` trait is put into `paramMap` and becomes a user-supplied param, which causes the exclusive-params check to fail.
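    The failure mode described above can be reproduced with a small toy model. `MergedSaveBug`, `ToyParams`, `saveMerged`, and `load` are hypothetical names, not Spark APIs; the model only mirrors the idea that a params holder keeps defaults and user-supplied values apart, and that the reader re-applies every saved entry as if the user had set it. The default value used for `outputCol` is illustrative.

```scala
object MergedSaveBug {
  // Hypothetical toy params holder: defaults and user-supplied values kept apart.
  final class ToyParams(val defaults: Map[String, String]) {
    private var userSupplied: Map[String, String] = Map.empty
    def set(name: String, value: String): this.type = {
      userSupplied = userSupplied.updated(name, value)
      this
    }
    def isSet(name: String): Boolean = userSupplied.contains(name)
  }

  // Buggy writer: merges defaults with user-supplied values before saving.
  def saveMerged(p: ToyParams): Map[String, String] = p.defaults ++ p.userSuppliedView(p)

  // Helper kept outside the class so the writer only sees what it needs.
  implicit class UserView(private val p: ToyParams) extends AnyVal {
    def userSuppliedView(q: ToyParams): Map[String, String] =
      Seq("inputCol", "outputCol", "inputCols", "outputCols")
        .filter(q.isSet).map(n => n -> "<value>").toMap
  }

  // Reader: every saved entry is re-applied with set(), so it becomes user-supplied.
  def load(saved: Map[String, String], defaults: Map[String, String]): ToyParams =
    saved.foldLeft(new ToyParams(defaults)) { case (p, (k, v)) => p.set(k, v) }

  // Illustrative default for outputCol, standing in for the HasOutputCol default.
  val defaults: Map[String, String] = Map("outputCol" -> "default_output")

  def main(args: Array[String]): Unit = {
    val original = new ToyParams(defaults).set("inputCols", "a,b").set("outputCols", "a_out,b_out")
    val restored = load(saveMerged(original), defaults)
    println(s"before save, outputCol set: ${original.isSet("outputCol")}")
    println(s"after load, outputCol set: ${restored.isSet("outputCol")}")
  }
}
```

    After the round trip, `outputCol` and `inputCols` are both reported as set, which is exactly the combination the exclusivity check rejects.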
    
    This patch changes `DefaultParamsWriter` to save only user-supplied params.
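    The effect of the fix can be sketched with a similar toy model. `UserSuppliedSaveFix`, `ToyParams`, `saveUserSupplied`, and `load` are hypothetical names and this is not the actual `DefaultParamsWriter` code; it only shows that persisting just the explicitly set values keeps the loaded instance's `isSet` answers identical to the original's, while defaults remain reachable through a fallback lookup.

```scala
object UserSuppliedSaveFix {
  // Hypothetical toy params holder; not Spark's Params trait.
  final class ToyParams(val defaults: Map[String, String]) {
    private var userSupplied: Map[String, String] = Map.empty
    def set(name: String, value: String): this.type = {
      userSupplied = userSupplied.updated(name, value)
      this
    }
    def isSet(name: String): Boolean = userSupplied.contains(name)
    // Explicit value if present, otherwise the default (if any).
    def getOrDefault(name: String): Option[String] =
      userSupplied.get(name).orElse(defaults.get(name))
    def explicitParams: Map[String, String] = userSupplied
  }

  // Fixed writer: persist only what the user explicitly set.
  def saveUserSupplied(p: ToyParams): Map[String, String] = p.explicitParams

  // Reader is unchanged: every saved entry is re-applied with set().
  def load(saved: Map[String, String], defaults: Map[String, String]): ToyParams =
    saved.foldLeft(new ToyParams(defaults)) { case (p, (k, v)) => p.set(k, v) }

  // Illustrative default for outputCol, standing in for the HasOutputCol default.
  val defaults: Map[String, String] = Map("outputCol" -> "default_output")

  def main(args: Array[String]): Unit = {
    val original = new ToyParams(defaults).set("inputCols", "a,b").set("outputCols", "a_out,b_out")
    val restored = load(saveUserSupplied(original), defaults)
    println(s"after load, outputCol set: ${restored.isSet("outputCol")}")
    println(s"outputCol default still available: ${restored.getOrDefault("outputCol")}")
  }
}
```

    In this sketch, because defaults are no longer persisted, a loaded instance takes its defaults from the class definition at load time rather than from the saved metadata.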
    
    ## How was this patch tested?
    
    Modified test.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-23377

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20566.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20566
    
----
commit 7785cacee8dd4a6e9938c3c99dad3ad3117655d3
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-02-10T08:52:17Z

    Only save user-supplied params.

----

