GitHub user viirya opened a pull request:

    [SPARK-23377][ML] Fixes Bucketizer with multiple columns persistence bug

    ## What changes were proposed in this pull request?
    #### Problem:
    Since 2.3, `Bucketizer` supports multiple input/output columns. During 
transformation, we check whether mutually exclusive params are set. E.g., if 
`inputCols` and `outputCol` are both set, an error is thrown.
    However, when we write a `Bucketizer`, the default params and 
user-supplied params are merged during writing. On load, all saved params are 
set on the newly created instance. So the default `outputCol` param from the 
`HasOutputCol` trait ends up in `paramMap` and becomes a user-supplied param, 
which makes the exclusive-params check fail.
    #### Fix:
    This changes the saving logic of `Bucketizer` to handle this case. This is a 
quick fix to make it in time for 2.3. We should consider modifying the 
persistence mechanism later.
    Please see the discussion in the JIRA.
    Note: The multi-column `QuantileDiscretizer` also has the same issue.
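
The problem and the quick fix described above can be illustrated with a toy model. This is only a sketch in plain Python with no Spark dependency: `ToyBucketizer`, `save_merged`, `save_without_stale_default`, `load` and `round_trip_ok` are hypothetical names invented for illustration, and the code is not the actual `Bucketizer` or `DefaultParamsWriter` implementation.

```python
class ToyBucketizer:
    """Toy stand-in for a Params-style stage; not Spark's actual class."""

    def __init__(self, uid="bucketizer_0"):
        self.uid = uid
        # Defaults are kept apart from user-supplied values.
        self.default_params = {"outputCol": uid + "__output"}
        self.params = {}

    def set(self, name, value):
        self.params[name] = value
        return self

    def is_set(self, name):
        # True only for user-supplied values, never for defaults.
        return name in self.params

    def check_exclusive_params(self):
        if self.is_set("inputCols") and self.is_set("outputCol"):
            raise ValueError("inputCols and outputCol are mutually exclusive")


def save_merged(stage):
    # Problematic behaviour: defaults and user-supplied values are merged.
    return {**stage.default_params, **stage.params}


def save_without_stale_default(stage):
    # Sketch of the quick fix: drop the outputCol default if inputCols is set.
    saved = save_merged(stage)
    if stage.is_set("inputCols") and not stage.is_set("outputCol"):
        saved.pop("outputCol", None)
    return saved


def load(saved, uid="bucketizer_0"):
    # Every saved value is set back as if the user had supplied it.
    stage = ToyBucketizer(uid)
    for name, value in saved.items():
        stage.set(name, value)
    return stage


def round_trip_ok(save_fn):
    # Save and reload a multi-column stage, then rerun the exclusivity check.
    original = ToyBucketizer().set("inputCols", ["a", "b"])
    original.set("outputCols", ["x", "y"])
    original.check_exclusive_params()  # passes before saving
    try:
        load(save_fn(original)).check_exclusive_params()
        return True
    except ValueError:
        return False
```

In this model, `round_trip_ok(save_merged)` fails the check after reload because the default `outputCol` comes back as a user-supplied value, while `round_trip_ok(save_without_stale_default)` passes. An `outputCol` that the user set explicitly is still written out unchanged.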
    ## How was this patch tested?
    Modified tests.

You can merge this pull request into a Git repository by running:

    $ git pull SPARK-23377-2

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20594

commit 9cd7c86fad04c814b2c8f5547583122ba12c359b
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-02-13T03:51:41Z

    Remove outputCol default value if inputCols is set.


