[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2018-01-22 Thread crackcell
Github user crackcell commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r162935486 --- Diff: docs/ml-guide.md --- @@ -122,6 +122,8 @@ There are no deprecations. * [SPARK-21027](https://issues.apache.org/jira/browse/SPARK-21027

[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2018-01-22 Thread crackcell
Github user crackcell commented on the issue: https://github.com/apache/spark/pull/17123 @WeichenXu123 I have finished my work; please review it. Any suggestions are welcome. :-) --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2018-01-22 Thread crackcell
Github user crackcell commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r162885968 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -53,7 +53,8 @@ final class Bucketizer @Since("1.4.0") (@Si

[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2018-01-21 Thread crackcell
Github user crackcell commented on the issue: https://github.com/apache/spark/pull/17123 @WeichenXu123 Sorry, I missed the message for two days. I'm working on it.

[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2017-03-16 Thread crackcell
Github user crackcell commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r106363022 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -105,20 +106,21 @@ final class Bucketizer @Since("1.4.0"

[GitHub] spark issue #17233: [SPARK-11569][ML] Fix StringIndexer to handle null value...

2017-03-13 Thread crackcell
Github user crackcell commented on the issue: https://github.com/apache/spark/pull/17233 @jkbradley Hi, I have made some updates according to your comments, please review it again. :-) --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request #17233: [SPARK-11569][ML] Fix StringIndexer to handle nul...

2017-03-13 Thread crackcell
Github user crackcell commented on a diff in the pull request: https://github.com/apache/spark/pull/17233#discussion_r105820314 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -188,35 +189,45 @@ class StringIndexerModel

[GitHub] spark pull request #17233: [SPARK-11569][ML] Fix StringIndexer to handle nul...

2017-03-13 Thread crackcell
Github user crackcell commented on a diff in the pull request: https://github.com/apache/spark/pull/17233#discussion_r105820279 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -122,6 +122,86 @@ class StringIndexerSuite assert

[GitHub] spark pull request #17233: [SPARK-11569][ML] Fix StringIndexer to handle nul...

2017-03-13 Thread crackcell
Github user crackcell commented on a diff in the pull request: https://github.com/apache/spark/pull/17233#discussion_r105820283 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -122,6 +122,86 @@ class StringIndexerSuite assert

[GitHub] spark issue #17233: [SPARK-11569][ML] Fix StringIndexer to handle null value...

2017-03-10 Thread crackcell
Github user crackcell commented on the issue: https://github.com/apache/spark/pull/17233 cc @srowen @cloud-fan @MLnick

[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2017-03-09 Thread crackcell
Github user crackcell commented on the issue: https://github.com/apache/spark/pull/17123 @cloud-fan Would you please review my code again? I'm now using `Option` to handle NULLs. :-)
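The `Option`-based NULL handling mentioned in this comment might look roughly like the sketch below. It is a minimal, Spark-free illustration and not the code from the pull request: `BucketizerSketch`, `bucketIndex`, the `Int` bucket indices, and the exception types are assumptions made for the example. The mode names follow the `keep`, `skip`, and `error` values of `handleInvalid`.

```scala
object BucketizerSketch {
  /**
   * Maps a possibly missing value to a bucket index.
   * Returns None when the value is invalid and the mode is "skip".
   * Hypothetical helper; a NULL is modeled as None and treated like NaN.
   */
  def bucketIndex(
      splits: Array[Double],
      value: Option[Double],
      handleInvalid: String): Option[Int] = {
    require(splits.length >= 3, "need at least three split points")
    value match {
      case None => invalid(splits, handleInvalid)              // NULL input
      case Some(v) if v.isNaN => invalid(splits, handleInvalid) // NaN input
      case Some(v) if v == splits.last => Some(splits.length - 2) // last bucket includes its upper bound
      case Some(v) =>
        val idx = java.util.Arrays.binarySearch(splits, v)
        if (idx >= 0) {
          Some(idx) // value sits exactly on a lower bound
        } else {
          val insertPos = -idx - 1
          if (insertPos == 0 || insertPos == splits.length) {
            throw new IllegalArgumentException(s"Value $v is outside the splits range")
          }
          Some(insertPos - 1)
        }
    }
  }

  // Invalid values go to one extra bucket ("keep"), are dropped ("skip"),
  // or raise an error ("error").
  private def invalid(splits: Array[Double], mode: String): Option[Int] = mode match {
    case "keep" => Some(splits.length - 1)
    case "skip" => None
    case "error" => throw new IllegalArgumentException("Invalid value: NULL or NaN")
    case other => throw new IllegalArgumentException(s"Unknown handleInvalid mode: $other")
  }
}
```

With splits `(-Infinity, 0.0, 10.0, Infinity)`, `Some(5.0)` falls into bucket 1, while `None` and `Some(Double.NaN)` both land in the extra bucket 3 under `keep`.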

[GitHub] spark pull request #17233: [SPARK-11569][ML] Fix StringIndexer to handle nul...

2017-03-09 Thread crackcell
GitHub user crackcell opened a pull request: https://github.com/apache/spark/pull/17233 [SPARK-11569][ML] Fix StringIndexer to handle null value properly ## What changes were proposed in this pull request? This PR is to enhance StringIndexer with NULL values handling

[GitHub] spark issue #16883: [SPARK-17498][ML] StringIndexer enhancement for handling...

2017-03-07 Thread crackcell
Github user crackcell commented on the issue: https://github.com/apache/spark/pull/16883 Nice work! I was just planning to improve `StringIndexer` in exactly the same way as you did. Now I can have a rest. :-)

[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2017-03-06 Thread crackcell
Github user crackcell commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r104572696 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -105,20 +106,21 @@ final class Bucketizer @Since("1.4.0"

[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2017-03-06 Thread crackcell
Github user crackcell commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r104434899 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -105,20 +106,21 @@ final class Bucketizer @Since("1.4.0"

[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2017-03-04 Thread crackcell
Github user crackcell commented on the issue: https://github.com/apache/spark/pull/17123 @imatiach-msft @cloud-fan I updated the code, replacing `java.lang.Double` with `isNullAt()` and `getDouble()`.
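The pattern described in this comment, checking `isNullAt()` before calling `getDouble()` instead of passing a boxed `java.lang.Double` around, can be sketched as follows. The example does not depend on Spark: `SimpleRow` is a hypothetical stand-in that exposes only those two accessors, and `readFeature` is an illustrative name, so none of this should be read as the pull request's actual code.

```scala
object NullCheckSketch {
  // Hypothetical stand-in for a row; only the two accessors named above are modeled.
  final class SimpleRow(values: Array[Any]) {
    def isNullAt(i: Int): Boolean = values(i) == null

    // Fails on a null cell, so callers are expected to check isNullAt first.
    def getDouble(i: Int): Double = {
      if (isNullAt(i)) throw new NullPointerException(s"Value at index $i is null")
      values(i).asInstanceOf[Double]
    }
  }

  // Reads column i as a primitive Double, mapping a NULL cell to None.
  def readFeature(row: SimpleRow, i: Int): Option[Double] =
    if (row.isNullAt(i)) None else Some(row.getDouble(i))
}
```

The explicit null check keeps the value a primitive `Double` on the non-null path and makes the NULL case visible at the call site.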

[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2017-03-03 Thread crackcell
Github user crackcell commented on the issue: https://github.com/apache/spark/pull/17123 @srowen @cloud-fan Please review my code. Thanks. :-)

[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2017-03-03 Thread crackcell
Github user crackcell commented on the issue: https://github.com/apache/spark/pull/17123 @imatiach-msft Hi, Ilya. I have added two tests based on the original tests for NaN data. Please review my code again. Thanks for your time. :-)

[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2017-03-02 Thread crackcell
Github user crackcell commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r103955065 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -171,23 +176,23 @@ object Bucketizer extends DefaultParamsReadable

[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2017-03-02 Thread crackcell
Github user crackcell commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r103954857 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -105,20 +106,24 @@ final class Bucketizer @Since("1.4.0"

[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2017-03-01 Thread crackcell
Github user crackcell commented on the issue: https://github.com/apache/spark/pull/17123 Fixed style errors during the unit tests.

[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...

2017-03-01 Thread crackcell
GitHub user crackcell opened a pull request: https://github.com/apache/spark/pull/17123 [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucketizer when handleInvalid is on ## What changes were proposed in this pull request? The original Bucketizer can put NaNs