[GitHub] spark issue #16516: [SPARK-19155][ML] MLlib GeneralizedLinearRegression fami...

2017-01-20 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16516 Hmm ok, I guess that's fine. I'm just worried this line is duplicated, maybe you could add a method for it and put it in a common place: (value: String) => supportedFamilyN

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-20 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 ping @jkbradley would you be able to take another look at the bisecting kmeans model? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-18 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r96721137 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +72,156 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-18 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r96720750 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala --- @@ -67,3 +66,12 @@ trait Loss extends Serializable

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-18 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r96720362 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LogLoss.scala --- @@ -52,4 +51,10 @@ object LogLoss extends Loss { // The

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-18 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r96719349 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala --- @@ -177,6 +177,8 @@ class

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-18 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r96718917 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -315,8 +368,9 @@ object GBTClassificationModel extends

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-18 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r96716341 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -159,14 +158,21 @@ class GBTClassifier @Since("

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-18 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 ping @jkbradley would you be able to take another look at the bisecting kmeans model? I've updated with the random seed as requested, and the build succeeded. Thank you!

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-01-18 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 Jenkins, retest this please

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-01-17 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 Jenkins, retest this please

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-17 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 ping @jkbradley would you be able to take another look at the bisecting kmeans model? I've updated with the random seed as requested.

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-01-17 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 Jenkins, retest this please

[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-17 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r96534064 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -51,6 +54,18 @@ class BisectingKMeansSuite

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-17 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 @jkbradley done, added seed. Thanks!

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-01-17 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 Jenkins, retest this please

[GitHub] spark issue #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict probabi...

2017-01-17 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 ping @jkbradley would you be able to take a look at the GBTClassifier fix?

[GitHub] spark issue #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees handling ...

2017-01-17 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16377 ping @jkbradley would you be able to take a look at the decision tree fix?

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-17 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 ping @jkbradley would you be able to take another look at the bisecting K-Means fix?

[GitHub] spark issue #16571: [SPARK-19208][ML] MaxAbsScaler and MinMaxScaler are very...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16571 @sethah thank you for your concern, I added my thoughts to the JIRA

[GitHub] spark pull request #16571: [SPARK-19208][ML] MaxAbsScaler and MinMaxScaler a...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16571#discussion_r96096056 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala --- @@ -117,11 +113,75 @@ class MinMaxScaler @Since("1.5.0"

[GitHub] spark pull request #16571: [SPARK-19208][ML] MaxAbsScaler and MinMaxScaler a...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16571#discussion_r96095807 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MaxAbsScaler.scala --- @@ -70,14 +67,40 @@ class MaxAbsScaler @Since("2.0.0"

[GitHub] spark pull request #16571: [SPARK-19208][ML] MaxAbsScaler and MinMaxScaler a...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16571#discussion_r96095266 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MaxAbsScaler.scala --- @@ -70,14 +67,40 @@ class MaxAbsScaler @Since("2.0.0"

[GitHub] spark pull request #16571: [SPARK-19208][ML] MaxAbsScaler and MinMaxScaler a...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16571#discussion_r96095069 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MaxAbsScaler.scala --- @@ -70,14 +67,40 @@ class MaxAbsScaler @Since("2.0.0"

[GitHub] spark pull request #16571: [SPARK-19208][ML] MaxAbsScaler and MinMaxScaler a...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16571#discussion_r96095042 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MaxAbsScaler.scala --- @@ -70,14 +67,40 @@ class MaxAbsScaler @Since("2.0.0"

[GitHub] spark pull request #16571: [SPARK-19208][ML] MaxAbsScaler and MinMaxScaler a...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16571#discussion_r96094806 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MaxAbsScaler.scala --- @@ -70,14 +67,40 @@ class MaxAbsScaler @Since("2.0.0"

[GitHub] spark pull request #16571: [SPARK-19208][ML] MaxAbsScaler and MinMaxScaler a...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16571#discussion_r96092078 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MaxAbsScaler.scala --- @@ -70,14 +67,40 @@ class MaxAbsScaler @Since("2.0.0"

[GitHub] spark issue #16516: [SPARK-19155][ML] Make some string params of ML algorith...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16516 yep, I wrote that in a comment above, I totally agree: 1.) we are specifying some column name as a parameter 2.) RModel formula (from RFormula.scala) 3.) Tokenizer.scala regex

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB][WIP] ML Evaluators should use w...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 Jenkins, retest this please

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB][WIP] ML Evaluators should use w...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley would you be able to take a look at the changes to add a weight column to binary/multiclass/regression evaluators/metrics classes? It

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB][WIP] ML Evaluators should use w...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 Jenkins doesn't seem to be working ...

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB][WIP] ML Evaluators should use w...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 Jenkins, retest this please

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB][WIP] ML Evaluators should use w...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 Jenkins, retest this please

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB][WIP] ML Evaluators should use w...

2017-01-13 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 Jenkins, retest this please

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-12 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 @jkbradley thanks, I've updated the code based on your latest comments - I removed k and the verification for the setters.

[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-12 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r95937613 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -160,6 +162,17 @@ object KMeansSuite

[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-12 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16355#discussion_r95937532 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -51,6 +54,21 @@ class BisectingKMeansSuite

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB][WIP] ML Evaluators should use w...

2017-01-12 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 Jenkins, retest this please

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB][WIP] ML Evaluators should use w...

2017-01-12 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 Jenkins retest this please

[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB][WIP] ML Evaluators should use w...

2017-01-12 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 it looks like a random test timed out: org.apache.spark.scheduler.BasicSchedulerIntegrationSuite.job with fetch failure Error Details java.util.concurrent.TimeoutException

[GitHub] spark issue #16516: [SPARK-19155][ML] Make some string params of ML algorith...

2017-01-12 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16516 It looks like you can also update the metric name in the evaluators (binary, regression, multiclass) as well. Those should be case-insensitive too, I think.

[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

2017-01-12 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16516#discussion_r95848454 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -91,8 +91,8 @@ private[classification] trait

[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

2017-01-12 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16516#discussion_r95846429 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -91,8 +91,8 @@ private[classification] trait

[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

2017-01-12 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16516#discussion_r95846332 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -91,8 +91,8 @@ private[classification] trait

[GitHub] spark pull request #16557: [SPARK-18693][ML][MLLIB][WIP] ML Evaluators shoul...

2017-01-11 Thread imatiach-msft
GitHub user imatiach-msft opened a pull request: https://github.com/apache/spark/pull/16557 [SPARK-18693][ML][MLLIB][WIP] ML Evaluators should use weight column ## What changes were proposed in this pull request? The evaluators BinaryClassificationEvaluator

[GitHub] spark issue #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict probabi...

2017-01-11 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 @sethah thank you for doing the review! I've updated based on your latest comments. @jkbradley would you be able to take a look as well?

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-11 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95697903 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -275,18 +316,30 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-11 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95697512 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -193,6 +199,8 @@ object GBTClassifier extends

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-11 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95697253 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -159,14 +158,21 @@ class GBTClassifier @Since("

[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

2017-01-11 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16516#discussion_r95681819 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -91,8 +91,8 @@ private[classification] trait

[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

2017-01-11 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16516#discussion_r95679245 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -91,8 +91,8 @@ private[classification] trait

[GitHub] spark pull request #16516: [SPARK-19155][ML] Make some string params of ML a...

2017-01-11 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16516#discussion_r95679101 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -365,7 +365,7 @@ class LogisticRegression @Since

[GitHub] spark issue #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees handling ...

2017-01-11 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16377 ping @sethah can you please take another look at the decision tree/random forest fixes? Thank you!

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-11 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r95612086 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -161,6 +161,33 @@ class RandomForestSuite extends

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-11 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r95605827 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -170,12 +197,24 @@ class RandomForestSuite extends

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-11 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r95605416 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -176,6 +203,18 @@ class RandomForestSuite extends

[GitHub] spark issue #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict probabi...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 thank you @sethah, I've updated the PR based on your latest comments. @jkbradley would you be able to take a look when you have time? Thank you!

[GitHub] spark issue #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict probabi...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 with regards to the loss type, I think the real issue is that the user shouldn't be able to change the loss type at all on the model, as with many other parameters. It seems strange to

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95518069 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +72,157 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95517754 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LogLoss.scala --- @@ -20,6 +20,12 @@ package org.apache.spark.mllib.tree.loss

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95517520 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -275,18 +316,33 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95517316 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -275,18 +316,33 @@ class GBTClassificationModel private

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 @jkbradley @yu-iskw @srowen can you please take another look at the bisecting k-means algorithm fix? Thank you!

[GitHub] spark issue #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict probabi...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 ping @sethah @jkbradley could you please take another look since I've updated the code review based on your comments? Thank you!

[GitHub] spark issue #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees handling ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16377 ping @sethah can you please take another look at the decision tree/random forest fixes? Thank you!

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95482452 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LogLoss.scala --- @@ -52,4 +61,8 @@ object LogLoss extends Loss { // The

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95482409 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LogLoss.scala --- @@ -20,6 +20,15 @@ package org.apache.spark.mllib.tree.loss

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95481743 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -241,19 +261,42 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95481100 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -241,19 +261,42 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95480871 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -241,19 +261,42 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95480825 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -215,10 +224,21 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95480340 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -159,14 +157,21 @@ class GBTClassifier @Since("

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 @carocat @filousen Please look at these changes that I updated on December 28: -val height = math.sqrt(Seq(leftIndex, rightIndex).map { childIndex => +val inde

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 It looks like you don't have all of my changes. I also updated the buildSubTree method. Please take a look at the latest commit.

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 @filousen please note this fix is still in review and hasn't been checked into spark yet. Can you send me the error you are seeing? Also, are you sure you have ported my entire fix to

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 @filousen I must have fixed your issue, because if I undo my changes and run your code I can reproduce the error, you must be running your code without this fix: Job aborted due to

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16355 How did you verify that this change does not fix it? I ran the following code and it ran without errors: test("Verify issue from user") { val jsonDs = spark

[GitHub] spark issue #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with Cla...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16494 thank you for the deep investigation and the better root cause analysis. I will refrain from working on this pull request then until either: 1.) the issue is fixed and I can update the

[GitHub] spark issue #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict probabi...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 ping @sethah @jkbradley could you please take another look since I've updated the code review based on your comments? Thank you!

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r95445748 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -176,6 +203,18 @@ class RandomForestSuite extends

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r95445282 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -176,6 +203,18 @@ class RandomForestSuite extends

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r95444661 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -828,8 +828,27 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r95444654 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -161,6 +161,33 @@ class RandomForestSuite extends

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r95444352 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala --- @@ -161,6 +161,33 @@ class RandomForestSuite extends

[GitHub] spark pull request #16377: [SPARK-18036][ML][MLLIB] Fixing decision trees ha...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16377#discussion_r95443910 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -828,8 +828,27 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95442630 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -159,14 +157,21 @@ class GBTClassifier @Since("

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95431357 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -275,6 +321,13 @@ class GBTClassificationModel private[ml

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95431048 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -159,14 +157,21 @@ class GBTClassifier @Since("

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95430551 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -159,14 +157,21 @@ class GBTClassifier @Since("

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95423563 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -159,14 +157,21 @@ class GBTClassifier @Since("

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95418591 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -159,14 +157,21 @@ class GBTClassifier @Since("

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95418341 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -275,6 +321,13 @@ class GBTClassificationModel private[ml

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95408807 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +268,38 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95408548 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -248,12 +268,38 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95407706 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -215,10 +223,23 @@ class GBTClassificationModel private

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95399715 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -159,14 +157,21 @@ class GBTClassifier @Since("

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95399594 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -159,14 +157,21 @@ class GBTClassifier @Since("

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95392592 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +70,79 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95392109 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +70,79 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict ...

2017-01-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/16441#discussion_r95387530 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -66,10 +70,79 @@ class GBTClassifierSuite extends
