[GitHub] spark pull request #17078: [SPARK-19746][ML] Faster indexing for logistic ag...

2017-02-27 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17078#discussion_r103236865 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1447,7 +1447,7 @@ private class LogisticAggregator

[GitHub] spark pull request #17076: [SPARK-19745][ML] SVCAggregator captures coeffici...

2017-02-26 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17076#discussion_r103139054 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -440,19 +440,9 @@ private class LinearSVCAggregator

[GitHub] spark issue #17076: [SPARK-19745][ML] SVCAggregator captures coefficients in...

2017-02-26 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17076 ping @yanboliang @AnthonyTruchet --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #17078: [SPARK-19746][ML] Faster indexing for logistic ag...

2017-02-26 Thread sethah
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/17078 [SPARK-19746][ML] Faster indexing for logistic aggregator ## What changes were proposed in this pull request? JIRA: [SPARK-19746](https://issues.apache.org/jira/browse/SPARK-19746

[GitHub] spark issue #17078: [SPARK-19746][ML] Faster indexing for logistic aggregato...

2017-02-26 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17078 ping @dbtsai @yanboliang

[GitHub] spark pull request #17076: [SPARK-19745][ML] SVCAggregator captures coeffici...

2017-02-26 Thread sethah
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/17076 [SPARK-19745][ML] SVCAggregator captures coefficients in its closure ## What changes were proposed in this pull request? JIRA: [SPARK-19745](https://issues.apache.org/jira/browse/SPARK

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101449037 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/BaggedPoint.scala --- @@ -60,12 +68,14 @@ private[spark] object BaggedPoint

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101448976 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/BaggedPoint.scala --- @@ -82,16 +92,16 @@ private[spark] object BaggedPoint { val

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101448658 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -351,6 +370,36 @@ class

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101448626 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala --- @@ -115,7 +122,10 @@ private[spark] object DecisionTreeMetadata

[GitHub] spark issue #16715: [Spark-18080][ML][PYTHON] Python API & Examples for Loca...

2017-02-15 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16715 BTW, in the future I'd prefer to separate the examples and the Python API. I'm not sure if we ever fully decided on a normal protocol for this, but it certainly would make the rev

[GitHub] spark pull request #16722: [SPARK-19591][ML][MLlib] Add sample weights to de...

2017-02-15 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r101349655 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -351,6 +370,36 @@ class

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-14 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r101201608 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaMinHashLSHExample.java --- @@ -17,6 +17,7 @@ package

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100970965 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -222,17 +222,18 @@ private[ml] abstract class LSHModel[T <: LSHMode

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100971720 --- Diff: python/pyspark/ml/feature.py --- @@ -755,6 +945,103 @@ def maxAbs(self): @inherit_doc +class MinHashLSH(JavaEstimator

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100970887 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/MinHashLSHExample.scala --- @@ -37,38 +43,45 @@ object MinHashLSHExample { (0

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100971229 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/MinHashLSHExample.scala --- @@ -21,9 +21,15 @@ package org.apache.spark.examples.ml

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-02-13 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15435 Yes I think that's the gist of it, thanks a lot @WeichenXu123 !

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r100901912 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -944,15 +981,27 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r100574976 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -798,77 +798,160 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r100923075 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Instance.scala --- @@ -27,3 +27,25 @@ import org.apache.spark.ml.linalg.Vector * @param

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r100922910 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Instance.scala --- @@ -27,3 +27,25 @@ import org.apache.spark.ml.linalg.Vector * @param

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r100624754 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -798,77 +798,160 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r100919210 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -406,6 +435,14 @@ object GeneralizedLinearRegression

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r100896289 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -168,6 +179,7 @@ private[regression] trait

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r100917713 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -1139,54 +1189,52 @@ class

[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-13 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16699 Ah, thank you very much for that clarification, I don't have much experience using R. I tried this out earlier, and it seems you have to use the same offset column name as you show abo

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100875650 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/MinHashLSHExample.scala --- @@ -37,38 +38,44 @@ object MinHashLSHExample { (0

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100868336 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSH.scala --- @@ -111,8 +111,8 @@ class BucketedRandomProjectionLSHModel

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100878185 --- Diff: docs/ml-features.md --- @@ -1558,6 +1558,15 @@ for more details on the API. {% include_example java/org/apache/spark/examples/ml

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100877318 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaBucketedRandomProjectionLSHExample.java --- @@ -35,6 +35,8 @@ import

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100868312 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,198 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100879790 --- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh_example.py --- @@ -0,0 +1,81 @@ +# +# Licensed to the Apache Software

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100868097 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,198 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100869085 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,198 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #14321: [SPARK-8971][ML] Add stratified sampling to ML Cr...

2017-02-13 Thread sethah
Github user sethah closed the pull request at: https://github.com/apache/spark/pull/14321

[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-13 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16699 I'm finding R's behavior for prediction with offsets to be a bit strange. Yes, R does use the original offsets supplied during training to do prediction, but what if I want to make predic

[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-10 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16699 @actuaryzhang This is looking pretty good overall. Regarding the prediction logic, R glm does not allow you to predict with offsets, correct? I notice that statsmodels in Python _does_ allow it. For

[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-10 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16699 whew, this was a lot of work :)

[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...

2017-02-09 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16715 BTW I pointed out some typos or mistakes in certain places, but if it's in one place it generally was everywhere. I didn't point out each individual one.

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100424220 --- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh_example.py --- @@ -0,0 +1,86 @@ +# +# Licensed to the Apache Software

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100426821 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaMinHashLSHExample.java --- @@ -44,25 +45,67 @@ public static void main(String[] args

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100427045 --- Diff: examples/src/main/python/ml/min_hash_lsh_example.py --- @@ -0,0 +1,85 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100421903 --- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh_example.py --- @@ -0,0 +1,86 @@ +# +# Licensed to the Apache Software

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100422395 --- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh_example.py --- @@ -0,0 +1,86 @@ +# +# Licensed to the Apache Software

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100427531 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,196 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100426756 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaMinHashLSHExample.java --- @@ -44,25 +45,67 @@ public static void main(String[] args

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100426237 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaBucketedRandomProjectionLSHExample.java --- @@ -71,25 +71,32 @@ public static void main

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100427633 --- Diff: python/pyspark/ml/feature.py --- @@ -755,6 +947,101 @@ def maxAbs(self): @inherit_doc +class MinHashLSH(JavaEstimator

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100426683 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaMinHashLSHExample.java --- @@ -44,25 +45,67 @@ public static void main(String[] args

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100428492 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaMinHashLSHExample.java --- @@ -44,25 +45,67 @@ public static void main(String[] args

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100420489 --- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh_example.py --- @@ -0,0 +1,86 @@ +# +# Licensed to the Apache Software

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-09 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100426020 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaBucketedRandomProjectionLSHExample.java --- @@ -71,25 +71,32 @@ public static void main

[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...

2017-02-07 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16715 First pass, thanks @Yunni and @yanboliang !

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99940031 --- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh.py --- @@ -0,0 +1,76 @@ +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99943935 --- Diff: examples/src/main/python/ml/min_hash_lsh.py --- @@ -0,0 +1,75 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99940201 --- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh.py --- @@ -0,0 +1,76 @@ +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99943986 --- Diff: examples/src/main/python/ml/min_hash_lsh.py --- @@ -0,0 +1,75 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99912784 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99901401 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99929663 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99923657 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99909930 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99871800 --- Diff: python/pyspark/ml/feature.py --- @@ -755,6 +951,102 @@ def maxAbs(self): @inherit_doc +class MinHashLSH(JavaEstimator

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99928739 --- Diff: python/pyspark/ml/feature.py --- @@ -755,6 +951,102 @@ def maxAbs(self): @inherit_doc +class MinHashLSH(JavaEstimator

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99871507 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99929977 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99870133 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99870243 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99872069 --- Diff: python/pyspark/ml/feature.py --- @@ -755,6 +951,102 @@ def maxAbs(self): @inherit_doc +class MinHashLSH(JavaEstimator

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-07 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r99870914 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-06 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16740 Regarding the tests - I don't think the tests should change _depending on_ the implementation. I don't think it's valid to say that we don't need to test this thoroughly becau

[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99670310 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala --- @@ -106,14 +122,18 @@ class DecisionTreeClassifier @Since

[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99668948 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LabeledPoint.scala --- @@ -35,4 +35,11 @@ case class LabeledPoint(@Since("2.0.0") lab

[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99668686 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -590,8 +599,8 @@ private[spark] object RandomForest extends Logging

[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99666983 --- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala --- @@ -124,8 +129,8 @@ private[ml] object TreeTests extends SparkFunSuite

[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99667381 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -351,6 +370,36 @@ class

[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r9967 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala --- @@ -281,10 +283,26 @@ object MLTestingUtils extends SparkFunSuite

[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99665910 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Variance.scala --- @@ -70,17 +70,24 @@ object Variance extends Impurity { * Note

[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99665188 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala --- @@ -79,7 +79,12 @@ private[spark] abstract class ImpurityAggregator

[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99664877 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Entropy.scala --- @@ -83,23 +83,29 @@ object Entropy extends Impurity { * @param

[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99664273 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala --- @@ -42,6 +42,7 @@ import org.apache.spark.rdd.RDD private

[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-02-03 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16740#discussion_r99413491 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala --- @@ -89,7 +89,7 @@ private[ml] class

[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-02-03 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16740#discussion_r99407439 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -335,6 +335,11 @@ class GeneralizedLinearRegression

[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-02-03 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16740#discussion_r99408646 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -743,6 +744,50 @@ class

[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-02-02 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16740#discussion_r99269075 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -743,6 +743,55 @@ class

[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-02-02 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16740#discussion_r99268773 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -743,6 +743,55 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-02 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r99263111 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -743,6 +743,84 @@ class

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16740 Ok, yeah, let's go with this fix now then - seems both R and statsmodels fit to compute the null model. Thanks for following up on that!

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-02-02 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15435 So, looking at the design, I'm a bit concerned. Since we're adding summaries in several places around ML, I think we'd ideally design a hierarchy like we did for the estima

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-31 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16740 I agree having a special case is unsatisfying from an engineering perspective. In Spark it's a bit different than R since every iteration of IRLS will launch a Spark job, making a pass over the

[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-01-31 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16722 jenkins retest this please

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-31 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16740 I don't really expect that we'll be changing things so often that this becomes a hassle. I think there is value in getting known results - in the current test the IRLS solver takes 3 ite

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-01-31 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15435 sorry for the delay, hope to get to it soon.

[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices

2017-01-31 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15628 ping @imatiach-msft @dbtsai

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2017-01-31 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/9 ping! I could take this over if needed :)

[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-01-31 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r98807130 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/RandomForestClassifier.scala --- @@ -126,20 +127,22 @@ class RandomForestClassifier @Since

[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-01-31 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r98807054 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala --- @@ -106,14 +122,18 @@ class DecisionTreeClassifier @Since

[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-01-31 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r98778763 --- Diff: mllib-local/src/test/scala/org/apache/spark/ml/util/TestingUtils.scala --- @@ -48,7 +48,7 @@ object TestingUtils { /** * Private

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-31 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16740 Allowing offset will only require a small change to the intercept calculation, won't it? scala val agg = data.agg(sum(w * (col("label") - col("labe
