[GitHub] spark issue #19680: [SPARK-22641][ML] Refactor Spark ML model summaries

2017-11-06 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/19680 cc @yanboliang @srowen @WeichenXu123 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark pull request #19680: [SPARK-22641][ML] Refactor Spark ML model summari...

2017-11-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19680#discussion_r149216754 --- Diff: mllib/src/main/scala/org/apache/spark/ml/summary/ClusteringSummary.scala --- @@ -30,11 +30,12 @@ import org.apache.spark.sql.{DataFrame, Row

[GitHub] spark pull request #19680: [SPARK-22641][ML] Refactor Spark ML model summari...

2017-11-06 Thread sethah
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/19680 [SPARK-22641][ML] Refactor Spark ML model summaries ## What changes were proposed in this pull request? JIRA: [SPARK-22641](https://issues.apache.org/jira/browse/SPARK-22461) This

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-03 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148881376 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -764,13 +764,17 @@ class LinearRegressionSuite

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-03 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148852449 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -764,13 +764,17 @@ class LinearRegressionSuite

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-03 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148852081 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -764,13 +764,17 @@ class LinearRegressionSuite

[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-02 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18118 Jenkins test this please

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-02 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148662710 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala --- @@ -230,6 +230,13 @@ class MultivariateOnlineSummarizer

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-02 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148664242 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -125,4 +125,14 @@ class RegressionMetrics @Since("

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-02 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148623948 --- Diff: mllib/src/test/scala/org/apache/spark/ml/evaluation/RegressionEvaluatorSuite.scala --- @@ -73,6 +73,11 @@ class RegressionEvaluatorSuite

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-02 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148619202 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala --- @@ -230,6 +230,13 @@ class MultivariateOnlineSummarizer

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-02 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148619618 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala --- @@ -230,6 +230,13 @@ class MultivariateOnlineSummarizer

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-02 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148585371 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/RegressionEvaluator.scala --- @@ -49,8 +49,8 @@ final class RegressionEvaluator @Since("

[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

2017-11-02 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19638#discussion_r148619734 --- Diff: mllib/src/test/scala/org/apache/spark/ml/evaluation/RegressionEvaluatorSuite.scala --- @@ -73,6 +73,11 @@ class RegressionEvaluatorSuite

[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-11-02 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148564380 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -354,6 +356,41 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-10-31 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148063967 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala --- @@ -118,11 +119,12 @@ class DecisionTreeRegressor @Since

[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-10-31 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148095940 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -354,6 +356,41 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-10-31 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148065729 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala --- @@ -118,11 +119,12 @@ class DecisionTreeRegressor @Since

[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-10-31 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148063860 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala --- @@ -108,7 +108,8 @@ class DecisionTreeRegressor @Since("

[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-10-31 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148095844 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -354,6 +356,41 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-10-31 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148096176 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -354,6 +356,41 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-10-31 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r148063169 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -192,6 +197,10 @@ object GBTClassifier extends

[GitHub] spark issue #18610: [SPARK-21386] ML LinearRegression supports warm start fr...

2017-10-27 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18610 A couple questions: 1. do we actually *need* to save the initialModel when we persist the current model? I'm not sure it's necessary and it adds complexity. Also, we could add

[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...

2017-10-27 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r147532256 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -226,6 +246,12 @@ class LinearRegression @Since("

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143571594 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143048261 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143063348 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143051494 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143011936 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143020551 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143029928 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143029396 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143036042 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143051749 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143050445 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143050632 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143053337 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143034710 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143053094 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143050902 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143020469 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143033624 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143021666 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143051105 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143050245 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143034368 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143034586 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143011531 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143048444 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143050140 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143052196 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143020756 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143048339 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143030468 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143007246 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r142992652 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143022987 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143053455 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143051704 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143020307 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r143033876 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r142991123 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -171,20 +210,46 @@ final class Word2Vec @Since("1.4.0") (

[GitHub] spark pull request #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Wor...

2017-10-05 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17673#discussion_r142990145 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -171,20 +210,46 @@ final class Word2Vec @Since("1.4.0") (

[GitHub] spark issue #19020: [SPARK-3181] [ML] Implement huber loss for LinearRegress...

2017-09-25 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/19020 @yanboliang Yeah, I saw the discussion and it seems to me the reason was: there would be too much code duplication. Sure, it's true that there would be code duplication, but to me that's a

[GitHub] spark issue #19020: [SPARK-3181] [ML] Implement huber loss for LinearRegress...

2017-09-20 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/19020 I disagree that this should be combined with Linear Regression. IMO, this belongs as its own algorithm. The fact that there would be code duplication in that case is indicative that we don't

[GitHub] spark issue #19232: [SPARK-22009][ML] Using treeAggregate improve some algs

2017-09-18 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/19232 Sure, we all agree there is a mechanism for avoiding overhead. However, performance tests are very tricky things, 5% is not a huge improvement, and hard-coding the aggregation depth to `2` limits

[GitHub] spark issue #19232: [SPARK-22009][ML] Using treeAggregate improve some algs

2017-09-14 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/19232 I'm not really aware of situations where it would be detrimental, since it has a mechanism for avoiding the intermediate stages when it doesn't make sense. However, one of the big adv
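
The "mechanism for avoiding the intermediate stages" mentioned above can be illustrated with a small, Spark-free sketch. It assumes a rule of the following shape: derive a fan-in (`scale`) from the partition count and the requested depth, and add an intermediate combine level only while doing so still shrinks the work meaningfully. The object name `TreeAggregateSketch`, the method `intermediateLevels`, and the exact stopping rule are assumptions made for illustration, not a copy of Spark's `treeAggregate` source.

```scala
object TreeAggregateSketch {
  // Returns the partition count after each intermediate combine level.
  // An empty list means the partial results would be combined directly,
  // with no intermediate level at all.
  def intermediateLevels(numPartitions: Int, depth: Int): List[Int] = {
    require(numPartitions >= 1 && depth >= 1, "numPartitions and depth must be positive")
    // Assumed fan-in: roughly the depth-th root of the partition count, at least 2.
    val scale = math.max(math.ceil(math.pow(numPartitions.toDouble, 1.0 / depth)).toInt, 2)
    val levels = scala.collection.mutable.ListBuffer.empty[Int]
    var n = numPartitions
    // Assumed stopping rule: stop once another level would not pay for itself.
    while (n > scale + math.ceil(n.toDouble / scale)) {
      n /= scale
      levels += n
    }
    levels.toList
  }
}
```

Under these assumptions, `TreeAggregateSketch.intermediateLevels(100, 2)` yields `List(10)` (one intermediate level of 10 partitions), while `TreeAggregateSketch.intermediateLevels(4, 2)` yields an empty list, so a small job pays nothing extra. A hard-coded depth removes the caller's ability to choose a different trade-off.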

[GitHub] spark issue #19106: [SPARK-21770][ML] ProbabilisticClassificationModel fix c...

2017-09-12 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/19106 Ok, I guess I'm surprised that someone even noticed this... So, basically, we are changing the behavior of a private function for a specific case which is actually impossible to eve

[GitHub] spark pull request #19185: [Spark-21854] Added LogisticRegressionTrainingSum...

2017-09-11 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19185#discussion_r138220213 --- Diff: python/pyspark/ml/tests.py --- @@ -1473,11 +1473,59 @@ def test_logistic_regression_summary(self): self.assertTrue(isinstance

[GitHub] spark pull request #19185: [Spark-21854] Added LogisticRegressionTrainingSum...

2017-09-11 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19185#discussion_r138219915 --- Diff: python/pyspark/ml/tests.py --- @@ -1473,11 +1473,59 @@ def test_logistic_regression_summary(self): self.assertTrue(isinstance

[GitHub] spark pull request #19185: [Spark-21854] Added LogisticRegressionTrainingSum...

2017-09-11 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19185#discussion_r138220005 --- Diff: python/pyspark/ml/tests.py --- @@ -1473,11 +1473,59 @@ def test_logistic_regression_summary(self): self.assertTrue(isinstance

[GitHub] spark pull request #19185: [Spark-21854] Added LogisticRegressionTrainingSum...

2017-09-11 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/19185#discussion_r138220297 --- Diff: python/pyspark/ml/classification.py --- @@ -528,9 +528,11 @@ def summary(self): trained on the training set. An exception is thrown if

[GitHub] spark issue #19106: [SPARK-21770][ML] ProbabilisticClassificationModel fix c...

2017-09-11 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/19106 I'm confused how this issue was discovered in the first place. Did someone actually train an RF/DT and receive all zero probabilities? If so, shouldn't there be a unit test that recr

[GitHub] spark pull request #18315: [SPARK-21108] [ML] convert LinearSVC to aggregato...

2017-08-21 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18315#discussion_r134273202 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/HingeAggregatorSuite.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #18896: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-18 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18896 Thanks for catching this @WeichenXu123! I just added a note about the intent of test. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request #18896: [SPARK-21681][ML] fix bug of MLOR do not work cor...

2017-08-18 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18896#discussion_r134076580 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -1392,6 +1415,61 @@ class LogisticRegressionSuite

[GitHub] spark pull request #18896: [SPARK-21681][ML] fix bug of MLOR do not work cor...

2017-08-18 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18896#discussion_r134076552 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala --- @@ -238,8 +238,17 @@ class LogisticAggregatorSuite

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18899 Ok, it's fairly safe since it's limited to `private[linalg]`. The confusion for me is that this method introduces all sorts of edge cases which have behavior that is not at all obvious or

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-11 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18899 I think there _is_ new functionality, a new method that needs its functionality defined. One specific example, we need a test like: scala test("toSparseWithSize") {

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18899 Ok, yes, I see it now. Though, the point remains but to a lesser degree. We still have a method, albeit private, that indexes the array at potentially unsafe locations. It's probably ok, but a

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18899 Btw, I think the compile error is because `v.toSparse(2)` could resolve to either `v.toSparse(nnz = 2)` OR `v.toSparse.apply(2)`.
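
The two readings of `v.toSparse(2)` can be seen in a minimal, self-contained sketch. `SparseVec` and `DenseVec` are hypothetical stand-ins, not Spark's `linalg` classes; the variant that takes a precomputed non-zero count is given the distinct name `toSparseWithSize` (the name used in the test suggested elsewhere in this thread), so that `v.toSparse(2)` can only mean `v.toSparse.apply(2)`.

```scala
object ToSparseSketch {
  // Hypothetical sparse vector: sorted indices of the non-zeros plus their values.
  final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) {
    // Element lookup; indices that are not stored are implicit zeros.
    def apply(i: Int): Double = {
      val pos = java.util.Arrays.binarySearch(indices, i)
      if (pos >= 0) values(pos) else 0.0
    }
  }

  // Hypothetical dense vector.
  final case class DenseVec(values: Array[Double]) {
    // Parameterless conversion: makes its own pass to count the non-zeros.
    def toSparse: SparseVec = toSparseWithSize(values.count(_ != 0.0))

    // Variant taking a precomputed non-zero count. If nnz is smaller than the
    // true count, the writes below run past the end of the arrays.
    def toSparseWithSize(nnz: Int): SparseVec = {
      val idx = new Array[Int](nnz)
      val vals = new Array[Double](nnz)
      var k = 0
      var i = 0
      while (i < values.length) {
        if (values(i) != 0.0) { idx(k) = i; vals(k) = values(i); k += 1 }
        i += 1
      }
      SparseVec(values.length, idx, vals)
    }
  }

  val v = DenseVec(Array(0.0, 3.0, 5.0))
  val elem: Double = v.toSparse(2)            // read as v.toSparse.apply(2)
  val sv: SparseVec = v.toSparseWithSize(2)   // explicit non-zero count
}
```

Here `ToSparseSketch.elem` is the element at index 2, not a vector built with `nnz = 2`. Giving the sized variant its own name avoids the overload and also makes the unchecked `nnz` argument visible at the call site.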

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18899 This approach doesn't feel right to me. The goal of the change is to avoid making a pass over the values to find out if there are any explicit zeros that need to be eliminated, which is

[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress

2017-08-10 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18899 First suggestion is that there must be unit tests :)

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18832 If you want to change it, that's fine. I think it's fine either way.

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18832 No, I don't think so. Computing parent stats is a very small fraction of the time and memory compared with the overall `allStats` array. That's why we decided to just add it in the f

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18832 I don't agree the comment is _misleading_. It might be confusing, but that's something different. The reason that the `DTStatsAggregator` needs to keep track of `parentStats` is

[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18832 The comment is not wrong. It's added for when we are finding the best split, to compute the right child stats from the left child stats. We would have just used the stats that are already avai

[GitHub] spark issue #18797: [SPARK-21523][ML] update breeze to 0.13.1 for an emergen...

2017-08-01 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18797 Can you change the title? Upgrade to **0.13.2**.

[GitHub] spark issue #18315: [SPARK-21108] [ML] [WIP] convert LinearSVC to aggregator...

2017-07-26 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18315 ping! https://github.com/apache/spark/pull/18305 was merged. This can proceed.

[GitHub] spark issue #18305: [SPARK-20988][ML] Logistic regression uses aggregator hi...

2017-07-26 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18305 Thanks @MLnick, @hhbyyh, and @facaiy for reviewing!

[GitHub] spark issue #18305: [SPARK-20988][ML] Logistic regression uses aggregator hi...

2017-07-21 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18305 What's blocking us here?

[GitHub] spark issue #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-21 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18513 LGTM!

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-21 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18513#discussion_r128786226 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala --- @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-18 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18513 Let's make sure to create doc and python JIRAs before this gets merged btw.

[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-07-18 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r128042544 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala --- @@ -32,40 +34,45 @@ private[ml] trait

[GitHub] spark issue #18305: [SPARK-20988][ML] Logistic regression uses aggregator hi...

2017-07-18 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/18305 Thanks @MLnick. I agree with you about the broadcasting, so have backed it out. I think all comments are addressed now, please let me know if there is anything else.

[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-07-18 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r128041731 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed to the

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-14 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18513#discussion_r127558429 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala --- @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-14 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18513#discussion_r127554746 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala --- @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-14 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/18513#discussion_r127555147 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala --- @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software
