Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17078#discussion_r103236865
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1447,7 +1447,7 @@ private class LogisticAggregator
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17076#discussion_r103139054
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -440,19 +440,9 @@ private class LinearSVCAggregator
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17076
ping @yanboliang @AnthonyTruchet
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/17078
[SPARK-19746][ML] Faster indexing for logistic aggregator
## What changes were proposed in this pull request?
JIRA: [SPARK-19746](https://issues.apache.org/jira/browse/SPARK-19746)
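The PR title above refers to speeding up coefficient indexing inside the logistic gradient loop. As an editorial illustration only (this is not the Spark code, and the layout shown is an assumption), the idea behind this kind of optimization can be sketched in plain Python: with a flat, feature-major coefficient array, the base index for a feature is computed once and then strided over classes, instead of redoing 2-D index arithmetic per lookup.

```python
# Illustrative sketch, NOT the actual Spark aggregator code.
# Assumed layout: coefs[j * num_classes + k] = weight of feature j
# for class k (feature-major flat array).
def margins_flat(coefs, num_classes, features):
    margins = [0.0] * num_classes
    for j, x in enumerate(features):
        if x == 0.0:                  # skip inactive features, as a
            continue                  # sparse aggregator would
        offset = j * num_classes      # compute the base index once ...
        for k in range(num_classes):
            margins[k] += x * coefs[offset + k]  # ... then stride over classes
    return margins
```

For example, with two features and two classes, `margins_flat([1.0, 2.0, 3.0, 4.0], 2, [1.0, 2.0])` accumulates each class margin in a single pass over the features.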
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17078
ping @dbtsai @yanboliang
GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/17076
[SPARK-19745][ML] SVCAggregator captures coefficients in its closure
## What changes were proposed in this pull request?
JIRA: [SPARK-19745](https://issues.apache.org/jira/browse/SPARK
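The PR title above says the SVC aggregator captures the coefficients in its closure; in Spark terms that means the (possibly large) coefficient vector gets serialized with every task rather than shipped once per executor as a broadcast. A hedged pure-Python analogue of the fix (the `Handle` class stands in for a Spark `Broadcast` variable; none of these names come from the Spark source): the aggregator stores only a cheap handle and dereferences it lazily when rows are added.

```python
# Illustrative analogue, NOT Spark code. `Handle` stands in for a
# Spark Broadcast variable; the aggregator keeps only the handle,
# not a copy of the coefficient array.
class Handle:
    def __init__(self, value):
        self._value = value           # in Spark, this lives on the executor

    @property
    def value(self):
        return self._value

class HingeAggregator:
    def __init__(self, bc_coefs):
        self.bc_coefs = bc_coefs      # small handle, not the array itself
        self.loss_sum = 0.0
        self.count = 0

    def add(self, label, features):
        coefs = self.bc_coefs.value   # dereferenced lazily, per add
        margin = sum(c * x for c, x in zip(coefs, features))
        y = 2.0 * label - 1.0         # map {0, 1} labels to {-1, +1}
        self.loss_sum += max(0.0, 1.0 - y * margin)  # hinge loss
        self.count += 1
        return self
```

The aggregation itself (summing hinge losses over instances) is what a `LinearSVCAggregator` does conceptually; the point of the sketch is that only `bc_coefs` travels with the object.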
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101449037
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/BaggedPoint.scala ---
@@ -60,12 +68,14 @@ private[spark] object BaggedPoint
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101448976
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/BaggedPoint.scala ---
@@ -82,16 +92,16 @@ private[spark] object BaggedPoint {
val
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101448658
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
---
@@ -351,6 +370,36 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101448626
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala
---
@@ -115,7 +122,10 @@ private[spark] object DecisionTreeMetadata
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16715
BTW, in the future I'd prefer to separate the examples and the Python API.
I'm not sure if we ever fully decided on a normal protocol for this, but it
certainly would make the rev
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r101349655
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
---
@@ -351,6 +370,36 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r101201608
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaMinHashLSHExample.java
---
@@ -17,6 +17,7 @@
package
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100970965
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -222,17 +222,18 @@ private[ml] abstract class LSHModel[T <: LSHMode
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100971720
--- Diff: python/pyspark/ml/feature.py ---
@@ -755,6 +945,103 @@ def maxAbs(self):
@inherit_doc
+class MinHashLSH(JavaEstimator
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100970887
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/MinHashLSHExample.scala ---
@@ -37,38 +43,45 @@ object MinHashLSHExample {
(0
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100971229
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/MinHashLSHExample.scala ---
@@ -21,9 +21,15 @@ package org.apache.spark.examples.ml
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15435
Yes I think that's the gist of it, thanks a lot @WeichenXu123 !
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16699#discussion_r100901912
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -944,15 +981,27 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16699#discussion_r100574976
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -798,77 +798,160 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16699#discussion_r100923075
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Instance.scala
---
@@ -27,3 +27,25 @@ import org.apache.spark.ml.linalg.Vector
* @param
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16699#discussion_r100922910
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Instance.scala
---
@@ -27,3 +27,25 @@ import org.apache.spark.ml.linalg.Vector
* @param
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16699#discussion_r100624754
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -798,77 +798,160 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16699#discussion_r100919210
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -406,6 +435,14 @@ object GeneralizedLinearRegression
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16699#discussion_r100896289
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -168,6 +179,7 @@ private[regression] trait
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16699#discussion_r100917713
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -1139,54 +1189,52 @@ class
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16699
Ah, thank you very much for that clarification, I don't have much
experience using R. I tried this out earlier, and it seems you have to use the
same offset column name as you show abo
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100875650
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/MinHashLSHExample.scala ---
@@ -37,38 +38,44 @@ object MinHashLSHExample {
(0
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100868336
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSH.scala
---
@@ -111,8 +111,8 @@ class BucketedRandomProjectionLSHModel
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100878185
--- Diff: docs/ml-features.md ---
@@ -1558,6 +1558,15 @@ for more details on the API.
{% include_example
java/org/apache/spark/examples/ml
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100877318
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaBucketedRandomProjectionLSHExample.java
---
@@ -35,6 +35,8 @@
import
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100868312
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,198 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100879790
--- Diff:
examples/src/main/python/ml/bucketed_random_projection_lsh_example.py ---
@@ -0,0 +1,81 @@
+#
+# Licensed to the Apache Software
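The example file under review demonstrates Spark's `BucketedRandomProjectionLSH`. The core hash behind that family of LSH, sketched here in plain Python as an editorial aside (not the Spark implementation): project each point onto a random direction and discretize the projection into buckets of a chosen width, so that nearby points tend to land in the same bucket.

```python
import math

# Illustrative sketch of a bucketed random-projection hash, NOT
# Spark's code: floor of (point . direction) / bucket_length.
def bucket(point, direction, bucket_length):
    proj = sum(p * d for p, d in zip(point, direction))
    return math.floor(proj / bucket_length)
```

For instance, `bucket([3.0, 4.0], [1.0, 0.0], 2.0)` projects to 3.0 and falls in bucket 1, while a point projecting to 0.5 falls in bucket 0.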
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100868097
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,198 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100869085
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,198 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user sethah closed the pull request at:
https://github.com/apache/spark/pull/14321
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16699
I'm finding R's behavior for prediction with offsets to be a bit strange.
Yes, R does use the original offsets supplied during training to do prediction,
but what if I want to make predic
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16699
@actuaryzhang This is looking pretty good overall. Regarding the prediction
logic, R glm does not allow you to predict with offsets, correct? I notice that
statsmodels in Python _does_ allow it. For
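The comment above contrasts R's glm (which reuses training offsets) with statsmodels (which accepts offsets at prediction time). As a hedged illustration of what "predicting with an offset" means for a log-link GLM (the function name and signature here are invented for the sketch, not any library's API): the offset enters the linear predictor with a fixed coefficient of 1 before the inverse link is applied.

```python
import math

# Illustrative sketch, not an R or statsmodels API: log-link GLM
# prediction where a user-supplied offset is added to the linear
# predictor with coefficient 1.
def predict_with_offset(coefs, intercept, features, offset):
    eta = intercept + sum(c * x for c, x in zip(coefs, features)) + offset
    return math.exp(eta)   # inverse of the log link
```

With a log-link model, an offset of `log(exposure)` simply scales the predicted mean by the exposure, which is why offsets are the standard way to model rates.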
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16699
whew, this was a lot of work :)
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16715
BTW I pointed out some typos or mistakes in certain places, but if it's in
one place it generally was everywhere. I didn't point out each individual one.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100424220
--- Diff:
examples/src/main/python/ml/bucketed_random_projection_lsh_example.py ---
@@ -0,0 +1,86 @@
+#
+# Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100426821
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaMinHashLSHExample.java
---
@@ -44,25 +45,67 @@ public static void main(String[] args
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100427045
--- Diff: examples/src/main/python/ml/min_hash_lsh_example.py ---
@@ -0,0 +1,85 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100421903
--- Diff:
examples/src/main/python/ml/bucketed_random_projection_lsh_example.py ---
@@ -0,0 +1,86 @@
+#
+# Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100422395
--- Diff:
examples/src/main/python/ml/bucketed_random_projection_lsh_example.py ---
@@ -0,0 +1,86 @@
+#
+# Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100427531
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,196 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100426756
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaMinHashLSHExample.java
---
@@ -44,25 +45,67 @@ public static void main(String[] args
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100426237
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaBucketedRandomProjectionLSHExample.java
---
@@ -71,25 +71,32 @@ public static void main
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100427633
--- Diff: python/pyspark/ml/feature.py ---
@@ -755,6 +947,101 @@ def maxAbs(self):
@inherit_doc
+class MinHashLSH(JavaEstimator
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100426683
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaMinHashLSHExample.java
---
@@ -44,25 +45,67 @@ public static void main(String[] args
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100428492
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaMinHashLSHExample.java
---
@@ -44,25 +45,67 @@ public static void main(String[] args
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100420489
--- Diff:
examples/src/main/python/ml/bucketed_random_projection_lsh_example.py ---
@@ -0,0 +1,86 @@
+#
+# Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100426020
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaBucketedRandomProjectionLSHExample.java
---
@@ -71,25 +71,32 @@ public static void main
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16715
First pass, thanks @Yunni and @yanboliang !
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99940031
--- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh.py ---
@@ -0,0 +1,76 @@
+#
+# Licensed to the Apache Software Foundation (ASF
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99943935
--- Diff: examples/src/main/python/ml/min_hash_lsh.py ---
@@ -0,0 +1,75 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99940201
--- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh.py ---
@@ -0,0 +1,76 @@
+#
+# Licensed to the Apache Software Foundation (ASF
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99943986
--- Diff: examples/src/main/python/ml/min_hash_lsh.py ---
@@ -0,0 +1,75 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99912784
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99901401
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99929663
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99923657
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99909930
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99871800
--- Diff: python/pyspark/ml/feature.py ---
@@ -755,6 +951,102 @@ def maxAbs(self):
@inherit_doc
+class MinHashLSH(JavaEstimator
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99928739
--- Diff: python/pyspark/ml/feature.py ---
@@ -755,6 +951,102 @@ def maxAbs(self):
@inherit_doc
+class MinHashLSH(JavaEstimator
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99871507
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99929977
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99870133
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99870243
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99872069
--- Diff: python/pyspark/ml/feature.py ---
@@ -755,6 +951,102 @@ def maxAbs(self):
@inherit_doc
+class MinHashLSH(JavaEstimator
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r99870914
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16740
Regarding the tests - I don't think the tests should change _depending on_
the implementation. I don't think it's valid to say that we don't need to test
this thoroughly becau
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r99670310
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala
---
@@ -106,14 +122,18 @@ class DecisionTreeClassifier @Since
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r99668948
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/LabeledPoint.scala ---
@@ -35,4 +35,11 @@ case class LabeledPoint(@Since("2.0.0") lab
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r99668686
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -590,8 +599,8 @@ private[spark] object RandomForest extends Logging
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r99666983
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala ---
@@ -124,8 +129,8 @@ private[ml] object TreeTests extends SparkFunSuite
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r99667381
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
---
@@ -351,6 +370,36 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r9967
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -281,10 +283,26 @@ object MLTestingUtils extends SparkFunSuite
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r99665910
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Variance.scala ---
@@ -70,17 +70,24 @@ object Variance extends Impurity {
* Note
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r99665188
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala ---
@@ -79,7 +79,12 @@ private[spark] abstract class ImpurityAggregator
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r99664877
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Entropy.scala ---
@@ -83,23 +83,29 @@ object Entropy extends Impurity {
* @param
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r99664273
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala
---
@@ -42,6 +42,7 @@ import org.apache.spark.rdd.RDD
private
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16740#discussion_r99413491
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala
---
@@ -89,7 +89,7 @@ private[ml] class
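The file under review implements Spark's `IterativelyReweightedLeastSquares`. As an editorial aside, the IRLS idea can be sketched in a few lines of plain Python for the simplest non-trivial case, an intercept-only Poisson model with log link (this is an illustration of the algorithm, not Spark's code): each iteration forms a working response and solves a weighted least-squares problem, which collapses to a scalar update here.

```python
import math

# Minimal IRLS sketch, NOT Spark's implementation: intercept-only
# Poisson GLM with log link. Weights w_i = mu and working response
# z_i = eta + (y_i - mu)/mu; weighted LS against a constant design
# column reduces to the weighted mean of z, i.e. a scalar update.
def irls_poisson_intercept(ys, iters=25):
    beta = 0.0
    for _ in range(iters):
        mu = math.exp(beta)                                  # current mean
        beta = beta + sum(y - mu for y in ys) / (len(ys) * mu)
    return beta
```

For this model the fit has a closed form, `log(mean(ys))`, so the sketch also shows IRLS converging to the known answer in a handful of iterations.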
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16740#discussion_r99407439
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -335,6 +335,11 @@ class GeneralizedLinearRegression
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16740#discussion_r99408646
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -743,6 +744,50 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16740#discussion_r99269075
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -743,6 +743,55 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16740#discussion_r99268773
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -743,6 +743,55 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16699#discussion_r99263111
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -743,6 +743,84 @@ class
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16740
Ok, yeah, let's go with this fix now then - seems both R and statsmodels
fit to compute the null model. Thanks for following up on that!
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15435
So, looking at the design, I'm a bit concerned. Since we're adding
summaries in several places around ML, I think we'd ideally design a hierarchy
like we did for the estima
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16740
I agree having a special case is unsatisfying from an engineering
perspective. In Spark it's a bit different than R since every iteration of IRLS
will launch a Spark job, making a pass over the
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16722
jenkins retest this please
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16740
I don't really expect that we'll be changing things so often that this
becomes a hassle. I think there is value in getting known results - in the
current test the IRLS solver takes 3 ite
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15435
sorry for the delay, hope to get to it soon.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15628
ping @imatiach-msft @dbtsai
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/9
ping! I could take this over if needed :)
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r98807130
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/RandomForestClassifier.scala
---
@@ -126,20 +127,22 @@ class RandomForestClassifier @Since
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r98807054
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala
---
@@ -106,14 +122,18 @@ class DecisionTreeClassifier @Since
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16722#discussion_r98778763
--- Diff:
mllib-local/src/test/scala/org/apache/spark/ml/util/TestingUtils.scala ---
@@ -48,7 +48,7 @@ object TestingUtils {
/**
* Private
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16740
Allowing offset will only require a small change to the intercept
calculation, won't it?
scala
val agg = data.agg(sum(w * (col("label") - col("labe
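The truncated Scala snippet above appears to start a weighted aggregate over the label column. A hedged reconstruction of the idea only (the exact expression is cut off, so this is a guess at the intent, shown in plain Python rather than Spark SQL): for a null, intercept-only Gaussian model with an identity link, the intercept that accounts for an offset is the weighted mean of (label - offset).

```python
# Illustrative sketch of the likely intent of the truncated snippet,
# not a reconstruction of the actual Spark code: weighted-mean
# intercept for a null model with offsets (identity link).
def null_intercept(labels, offsets, weights):
    num = sum(w * (y - o) for y, o, w in zip(labels, offsets, weights))
    den = sum(weights)
    return num / den
```

Non-identity links would need the full IRLS machinery rather than a single weighted mean, which is presumably why the special case was under discussion.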