GitHub user actuaryzhang opened a pull request:
https://github.com/apache/spark/pull/17840
[SPARK-20574][ML] Allow Bucketizer to handle non-Double column
## What changes were proposed in this pull request?
Bucketizer currently requires the input column to be of type Double, but its
logic should work for any numeric data type. Many practical problems involve
integer or float columns, and manually casting them to Double before calling
Bucketizer is tedious. This PR extends Bucketizer to handle all numeric types.
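
To illustrate why the bucketing logic is independent of the input type, here is a minimal pure-Python sketch. It is not the Spark implementation: the function name `bucketize` and the split values are hypothetical, and the boundary rule (each bucket is `[lower, upper)`, with the last bucket also including its upper bound) is assumed from Bucketizer's documented behavior. The only type-dependent step is the conversion to a double-precision float before the lookup.

```python
from bisect import bisect_right
from decimal import Decimal


def bucketize(value, splits):
    """Return the bucket index of a numeric value (hypothetical helper).

    splits must be strictly increasing. Buckets are [splits[i], splits[i+1]),
    except the last one, which also includes its upper bound.
    """
    x = float(value)  # the only type-specific step: widen to a double
    if x < splits[0] or x > splits[-1]:
        raise ValueError(f"value {x} is outside the split range")
    if x == splits[-1]:
        return len(splits) - 2  # upper bound belongs to the last bucket
    # index of the rightmost split that is <= x
    return bisect_right(splits, x) - 1


# Assumed example splits: three buckets covering the whole real line.
splits = [float("-inf"), 0.0, 10.0, float("inf")]

# Integer, float, and decimal inputs all go through the same lookup.
indices = [bucketize(v, splits) for v in (-3, 0, 5.5, Decimal("10"), 42)]
```

With these splits, the negative integer falls in bucket 0, the values 0 and 5.5 fall in bucket 1, and the decimal 10 and the integer 42 fall in bucket 2.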
## How was this patch tested?
New test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/actuaryzhang/spark bucketizer
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17840.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17840
----
commit 7d93d5ab3d84d04c0636d9128215881f9d00a479
Author: Wayne Zhang <[email protected]>
Date: 2017-05-03T04:02:33Z
allow bucketizer to work for non-double column
commit a86fbde996afb1aed49c10a5785d934b4c12b2a2
Author: Wayne Zhang <[email protected]>
Date: 2017-05-03T06:01:21Z
update test for non-Double types
----