GitHub user yinxusen opened a pull request:

    https://github.com/apache/spark/pull/5980

    [SPARK-5893]

    @jkbradley, the JIRA issue is [here](https://issues.apache.org/jira/browse/SPARK-5893).
    
    One thing to clarify: the `buckets` parameter, an array of `Double`, acts as a list of split points. For example,
    
    ```scala
    buckets = Array(-0.5, 0.0, 0.5)
    ```
    
    splits the real line into 4 ranges, (-inf, -0.5], (-0.5, 0.0], (0.0, 0.5], (0.5, +inf), which are encoded as 0, 1, 2, 3 respectively.
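
    As a minimal sketch of this encoding (illustrative only, and not necessarily how the code in this PR does it), a value's index is the number of split points strictly below it. The `BucketSketch` object and `bucketIndex` helper are hypothetical names used just for this example.

    ```scala
    object BucketSketch {
      // Hypothetical helper, not part of the PR's API.
      // Ranges are right-closed, (lower, upper], so a value equal to a split
      // point falls into the range that ends at that split point.
      // The index is the number of split points strictly less than x.
      def bucketIndex(splits: Array[Double], x: Double): Int =
        splits.count(_ < x)

      def main(args: Array[String]): Unit = {
        val buckets = Array(-0.5, 0.0, 0.5)
        // Expected indices, in order: 0, 0, 1, 1, 2, 2, 3
        val samples = Seq(-1.0, -0.5, -0.2, 0.0, 0.3, 0.5, 2.0)
        samples.foreach { x =>
          println(s"$x -> ${bucketIndex(buckets, x)}")
        }
      }
    }
    ```

    The linear count keeps the sketch short; with sorted split points a binary search would give the same index in logarithmic time.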

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yinxusen/spark SPARK-5893

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5980.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5980
    
----
commit 5fe190e481ba35f5e14575ba26ce8ff3ff29588e
Author: Xusen Yin <[email protected]>
Date:   2015-05-07T08:09:25Z

    add bucketizer

commit 4024cf1a74ba70f92d21e92db1b7449e72c88357
Author: Xusen Yin <[email protected]>
Date:   2015-05-07T09:30:20Z

    add test suite

commit 998bc87e43c26a1d4890eae8ceb13f057171d58c
Author: Xusen Yin <[email protected]>
Date:   2015-05-07T13:01:04Z

    check buckets

----

