[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-02-09 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16630 @actuaryzhang sorry I'm at Spark Summit East, will take a look soon. For the feature name or "lazy val featureName: Array[String]", I recall there is a sparse (eg output by

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-02-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r100927396 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -1152,4 +1170,32 @@ class

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-02-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r100931445 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -915,6 +917,22 @@ class

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-02-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r100931568 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -1152,4 +1170,32 @@ class

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-02-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r100932182 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -1104,6 +1103,83 @@ class

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-02-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r100932561 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -1152,4 +1170,32 @@ class

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-02-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r100933207 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -1152,4 +1170,32 @@ class

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-02-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r100933747 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -915,6 +917,22 @@ class

[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-02-13 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16630 the code looks very good, I added a few minor comments, will take another look tomorrow, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r100934554 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -798,77 +798,160 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r100934645 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -168,6 +179,7 @@ private[regression] trait

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101338639 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala --- @@ -124,8 +129,8 @@ private[ml] object TreeTests extends

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101338837 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala --- @@ -106,14 +122,18 @@ class

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101339648 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/RandomForestClassifier.scala --- @@ -126,20 +127,22 @@ class

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101339732 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/RandomForestClassifier.scala --- @@ -126,20 +127,20 @@ class

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101339952 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala --- @@ -99,16 +105,31 @@ class DecisionTreeRegressor

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101341596 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/BaggedPoint.scala --- @@ -60,12 +68,14 @@ private[spark] object BaggedPoint

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101342186 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala --- @@ -115,7 +122,10 @@ private[spark] object

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101342791 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -351,6 +370,36 @@ class

[GitHub] spark issue #16722: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16722 the code looks good to me, maybe a contributor can comment? This is a great feature, nice work!

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101346522 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/BaggedPoint.scala --- @@ -60,12 +68,14 @@ private[spark] object BaggedPoint

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101351063 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/BaggedPoint.scala --- @@ -82,16 +92,16 @@ private[spark] object BaggedPoint

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101354732 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -351,6 +370,36 @@ class

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r101356243 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -1104,6 +1103,83 @@ class

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r101356475 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -1152,4 +1173,33 @@ class

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r101357001 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -915,6 +919,23 @@ class

[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16630 Thanks for the updates, the changes look good to me. One question, out of scope of the specific changes in this review: are there any other summary statistics that we could add in the future

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r101362237 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -1104,6 +1103,83 @@ class

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-02-15 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r101362084 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -1104,6 +1103,83 @@ class

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-16 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101531848 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -351,6 +370,36 @@ class

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-16 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101532020 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/BaggedPoint.scala --- @@ -60,12 +68,14 @@ private[spark] object BaggedPoint

[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-02-17 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16630 @actuaryzhang thanks, LGTM!

[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-02-17 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16630 @actuaryzhang sorry, can you comment on this question I had above: One question, out of scope of the specific changes in this review: are there any other summary statistics that we could

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-02-27 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 ok, I will close this and create three new PRs, one for each of the evaluators

[GitHub] spark pull request #16557: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2017-02-27 Thread imatiach-msft
Github user imatiach-msft closed the pull request at: https://github.com/apache/spark/pull/16557

[GitHub] spark pull request #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2017-02-27 Thread imatiach-msft
GitHub user imatiach-msft opened a pull request: https://github.com/apache/spark/pull/17084 [SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added weight column for binary classification evaluator ## What changes were proposed in this pull request? The

[GitHub] spark pull request #17085: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2017-02-27 Thread imatiach-msft
GitHub user imatiach-msft opened a pull request: https://github.com/apache/spark/pull/17085 [SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator ## What changes were proposed in this pull request? The evaluators

[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2017-02-27 Thread imatiach-msft
GitHub user imatiach-msft opened a pull request: https://github.com/apache/spark/pull/17086 [SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added weight column for multiclass classification evaluator ## What changes were proposed in this pull request? The

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-02-27 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 I've created 3 PRs, located here: https://github.com/apache/spark/pull/17084 https://github.com/apache/spark/pull/17085 https://github.com/apache/spark/pull/17086

[GitHub] spark issue #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees handling ...

2016-12-22 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16377 Jenkins, retest this please

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2016-12-22 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r93639635 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala --- @@ -113,6 +113,10 @@ private[spark] object

[GitHub] spark issue #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees handling ...

2016-12-22 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16377 How can I run the build/tests?

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2016-12-22 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 Yep, there is still a TODO to verify the fix. I'm waiting for the dataset from Alok to reproduce the issue: https://issues.apache.org/jira/browse/SPARK-16473

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2016-12-26 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 Hi Alok! Sorry I was away for holiday break. I will try to reproduce the failure. Thank you, Ilya

[GitHub] spark issue #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees handling ...

2016-12-27 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16377 Is there anything I need to do to allow this fix to be pushed to the base branch? Are there any pending questions/comments that still need to be resolved? Thank you, Ilya

[GitHub] spark issue #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees handling ...

2016-12-27 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16377 @sethah would you be able to take a look at the proposed changes? Thank you!

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2016-12-28 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 I have very good news :). I was not only able to repro the issue with your dataset, but I was also able to verify that with the suggested fix the algorithm does not fail (adding the val

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2016-12-28 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 Jenkins, retest this please

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2016-12-28 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 I've updated with a new commit. I was able to reproduce the issue by generating a synthetic sparse dataset similar to the one Alok sent me, in accordance with the test-style of spark

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2016-12-28 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 Jenkins, retest this please

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2016-12-29 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 @jkbradley @srowen any comments on the changes? Thank you!

[GitHub] spark issue #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees handling ...

2016-12-29 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16377 ping @sethah

[GitHub] spark pull request #16436: [SPARK-18698][ML][MLLIB] Adding public constructo...

2016-12-29 Thread imatiach-msft
GitHub user imatiach-msft opened a pull request: https://github.com/apache/spark/pull/16436 [SPARK-18698][ML][MLLIB] Adding public constructor that takes uid for IndexToString ## What changes were proposed in this pull request? Based on SPARK-18698, this adds a public

[GitHub] spark issue #16436: [SPARK-18698][ML][MLLIB] Adding public constructor that ...

2016-12-29 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16436 Jenkins, retest this please

[GitHub] spark issue #16436: [SPARK-18698][ML][MLLIB] Adding public constructor that ...

2016-12-29 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16436 @mengxr @jkbradley @MLnick @HyukjinKwon @holdenk would you be able to take a look at the changes? It looks like you have previously modified the StringIndexer.scala file.

[GitHub] spark issue #16436: [SPARK-18698][ML] Adding public constructor that takes u...

2016-12-29 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16436 done!

[GitHub] spark pull request #16436: [SPARK-18698][ML] Adding public constructor that ...

2016-12-29 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16436#discussion_r94174966 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -219,6 +219,16 @@ class StringIndexerSuite

[GitHub] spark pull request #16436: [SPARK-18698][ML] Adding public constructor that ...

2016-12-29 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16436#discussion_r94175547 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -219,6 +219,16 @@ class StringIndexerSuite

[GitHub] spark pull request #16436: [SPARK-18698][ML] Adding public constructor that ...

2016-12-29 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16436#discussion_r94175562 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -219,6 +219,16 @@ class StringIndexerSuite

[GitHub] spark pull request #16436: [SPARK-18698][ML] Adding public constructor that ...

2016-12-29 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16436#discussion_r94175568 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -219,6 +219,16 @@ class StringIndexerSuite

[GitHub] spark issue #16436: [SPARK-18698][ML] Adding public constructor that takes u...

2016-12-29 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16436 Jenkins, retest this please

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2016-12-29 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 the only problem I see is that with this code we generate k-1 clusters instead of k, but it states in the algorithm documentation that it is not guaranteed to generate k clusters, it could be

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2016-12-30 Thread imatiach-msft
GitHub user imatiach-msft opened a pull request: https://github.com/apache/spark/pull/16441 [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict probability per training instance and fixed interfaces ## What changes were proposed in this pull request? For all of the

[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...

2016-12-30 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 Jenkins, retest this please

[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 Thanks, I've updated the PR based on your comment. The only disadvantage to the current code is that I do the probability computation within the classifier, but it seems like it shou

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r94850577 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -713,6 +713,15 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r94850633 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -713,6 +713,15 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r94852377 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -713,6 +713,15 @@ private[spark] object RandomForest extends

[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 @jkbradley I've updated based on your comments, please take another look, thanks!

[GitHub] spark issue #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees handling ...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16377 @sethah I've updated the code based on your comments, please take a look, thank you!

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94873371 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -215,10 +223,23 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94873411 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -215,10 +223,23 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94874968 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +269,38 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94875629 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +269,38 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94877056 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +269,38 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94898798 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +66,39 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94898909 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +66,39 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94898917 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +66,39 @@ class GBTClassifierSuite extends

[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 @sethah @jkbradley thank you for the review - could you please take another look since I've updated the code review based on your comments?

[GitHub] spark issue #9920: [SPARK-11569] [ML] Fix StringIndexer to handle null value...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/9920 @jliwork @srowen are you currently working on this in-progress JIRA 11569? If not, I would be interested in continuing the initial pull request that was closed. Please let me know, thank you

[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...

2017-01-05 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 It looks like I am failing the binary compatibility tests despite this constructor being private: class GBTClassificationModel private[ml]( @Since("1.6.0") overri

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-06 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94957214 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +269,38 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-06 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94957257 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +269,38 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...

2017-01-06 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r94957348 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +269,38 @@ class GBTClassificationModel private

[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...

2017-01-06 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 Indeed re-adding the constructor seems to make the binary compatibility tests pass (see spark QA build above). I think in favor of making the binary compat tests pass, we can keep the extra

[GitHub] spark issue #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict probabi...

2017-01-06 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 I've removed the WIP from title to reflect the status of the pull request.

[GitHub] spark issue #16471: [SPARK-19078] hashingTF,ChiSqSelector,IDF,StandardScaler...

2017-01-06 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16471 I think you might need to add [ML] to the pull request name, eg: [SPARK-19078][ML] hashingTF,ChiSqSelector,IDF,StandardScaler,PCA transform avoid extra vector conversion I like

[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-06 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r95030008 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -51,6 +54,23 @@ class BisectingKMeansSuite

[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-06 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r95029976 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -29,9 +29,12 @@ class BisectingKMeansSuite

[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-06 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r95030147 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -51,6 +54,23 @@ class BisectingKMeansSuite

[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-06 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r95030212 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -160,6 +162,17 @@ object KMeansSuite

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-06 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 @jkbradley Thank you for taking a look! I've updated the code based on your comments.

[GitHub] spark pull request #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing w...

2017-01-06 Thread imatiach-msft
GitHub user imatiach-msft opened a pull request: https://github.com/apache/spark/pull/16494 [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with ClassCastException ## What changes were proposed in this pull request? LDA fails with a ClassCastException when run on a dataset

[GitHub] spark issue #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with Cla...

2017-01-06 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16494 @jkbradley @vanzin @skyluc @luluorta @uncleGen @kanzhang Could you please take a look at this pull request to fix the method fromEdges in EdgeRDD class used by LDA? Thank you!

[GitHub] spark issue #16516: [SPARK-19133][ML] ML GLR family and link could be upperc...

2017-01-09 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16516 This is a nice fix. It looks like some other learners have this issue as well, eg LogisticRegression.scala under $(root)/mllib/src/main/scala/org/apache/spark/ml/classification

[GitHub] spark issue #16516: [SPARK-19133][ML] ML GLR family and link could be upperc...

2017-01-09 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16516 Maybe a more generic fix would be to fix the method ParamValidators.inArray to be case insensitive. I see this method used in a lot of places. Doing a simple search brings up not just
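The case-insensitive validation suggested in the comment above could look roughly like the sketch below. It is only an illustration of the idea: `CaseInsensitiveValidators` and `inArrayIgnoreCase` are hypothetical names, not part of Spark's `ParamValidators`, and the sketch deliberately avoids any Spark dependency.

```scala
// Hypothetical helper for illustration; this is NOT Spark's ParamValidators API.
object CaseInsensitiveValidators {
  /** Returns a predicate that accepts a string matching any allowed value, ignoring case. */
  def inArrayIgnoreCase(allowed: Array[String]): String => Boolean = {
    // Normalize the allowed values once so each check is a single set lookup.
    val normalized = allowed.map(_.toLowerCase(java.util.Locale.ROOT)).toSet
    // Reject null explicitly instead of throwing a NullPointerException.
    (value: String) =>
      value != null && normalized.contains(value.toLowerCase(java.util.Locale.ROOT))
  }
}
```

A predicate built this way accepts "Gaussian" and "GAUSSIAN" when "gaussian" is in the allowed list. Using `Locale.ROOT` keeps the lowercase conversion independent of the JVM's default locale.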

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-09 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 @jkbradley @yu-iskw @srowen can you please take another look at the bisecting k-means algorithm fix? Thank you!

[GitHub] spark issue #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees handling ...

2017-01-09 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16377 ping @sethah can you please take another look at the decision tree/random forest fixes? Thank you!

[GitHub] spark issue #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict probabi...

2017-01-09 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 ping @sethah @jkbradley could you please take another look since I've updated the code review based on your comments? Thank you!

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 @filousen could you please share the code that you used to load and run the dataset and the full error message with stack trace you are seeing? I'm a bit confused since the dataset is
