Github user sethah commented on the issue:
https://github.com/apache/spark/pull/19680
cc @yanboliang @srowen @WeichenXu123
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19680#discussion_r149216754
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/summary/ClusteringSummary.scala ---
@@ -30,11 +30,12 @@ import org.apache.spark.sql.{DataFrame, Row
GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/19680
[SPARK-22461][ML] Refactor Spark ML model summaries
## What changes were proposed in this pull request?
JIRA: [SPARK-22461](https://issues.apache.org/jira/browse/SPARK-22461)
This
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148881376
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -764,13 +764,17 @@ class LinearRegressionSuite
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148852449
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -764,13 +764,17 @@ class LinearRegressionSuite
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148852081
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -764,13 +764,17 @@ class LinearRegressionSuite
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18118
Jenkins test this please
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148662710
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala
---
@@ -230,6 +230,13 @@ class MultivariateOnlineSummarizer
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148664242
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -125,4 +125,14 @@ class RegressionMetrics @Since("
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148623948
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/evaluation/RegressionEvaluatorSuite.scala
---
@@ -73,6 +73,11 @@ class RegressionEvaluatorSuite
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148619202
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala
---
@@ -230,6 +230,13 @@ class MultivariateOnlineSummarizer
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148619618
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala
---
@@ -230,6 +230,13 @@ class MultivariateOnlineSummarizer
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148585371
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/RegressionEvaluator.scala
---
@@ -49,8 +49,8 @@ final class RegressionEvaluator @Since("
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148619734
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/evaluation/RegressionEvaluatorSuite.scala
---
@@ -73,6 +73,11 @@ class RegressionEvaluatorSuite
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r148564380
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala
---
@@ -354,6 +356,41 @@ class GBTClassifierSuite extends
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r148063967
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala
---
@@ -118,11 +119,12 @@ class DecisionTreeRegressor @Since
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r148095940
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala
---
@@ -354,6 +356,41 @@ class GBTClassifierSuite extends
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r148065729
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala
---
@@ -118,11 +119,12 @@ class DecisionTreeRegressor @Since
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r148063860
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala
---
@@ -108,7 +108,8 @@ class DecisionTreeRegressor @Since("
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r148095844
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala
---
@@ -354,6 +356,41 @@ class GBTClassifierSuite extends
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r148096176
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala
---
@@ -354,6 +356,41 @@ class GBTClassifierSuite extends
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r148063169
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -192,6 +197,10 @@ object GBTClassifier extends
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18610
A couple questions:
1. do we actually *need* to save the initialModel when we persist the
current model? I'm not sure it's necessary and it adds complexity. Also, we
could add
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18610#discussion_r147532256
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -226,6 +246,12 @@ class LinearRegression @Since("
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143571594
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143048261
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143063348
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143051494
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143011936
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143020551
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143029928
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143029396
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143036042
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143051749
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143050445
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143050632
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143053337
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143034710
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143053094
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143050902
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143020469
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143033624
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143021666
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143051105
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143050245
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143034368
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143034586
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143011531
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143048444
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143050140
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143052196
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143020756
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143048339
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143030468
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143007246
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r142992652
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143022987
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143053455
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143051704
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143020307
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r143033876
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/Word2VecCBOWSolver.scala ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r142991123
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -171,20 +210,46 @@ final class Word2Vec @Since("1.4.0") (
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17673#discussion_r142990145
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala
---
@@ -171,20 +210,46 @@ final class Word2Vec @Since("1.4.0") (
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/19020
@yanboliang Yeah, I saw the discussion and it seems to me the reason was:
there would be too much code duplication. Sure, it's true that there would be
code duplication, but to me that's a
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/19020
I disagree that this should be combined with Linear Regression. IMO, this
belongs as its own algorithm. The fact that there would be code duplication in
that case is indicative that we don't
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/19232
Sure, we all agree there is a mechanism for avoiding overhead. However,
performance tests are very tricky things, 5% is not a huge improvement, and
hard-coding the aggregation depth to `2` limits
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/19232
I'm not really aware of situations where it would be detrimental, since it
has a mechanism for avoiding the intermediate stages when it doesn't make
sense. However, one of the big adv
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/19106
Ok, I guess I'm surprised that someone even noticed this...
So, basically, we are changing the behavior of a private function for a
specific case which is actually impossible to eve
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19185#discussion_r138220213
--- Diff: python/pyspark/ml/tests.py ---
@@ -1473,11 +1473,59 @@ def test_logistic_regression_summary(self):
self.assertTrue(isinstance
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19185#discussion_r138219915
--- Diff: python/pyspark/ml/tests.py ---
@@ -1473,11 +1473,59 @@ def test_logistic_regression_summary(self):
self.assertTrue(isinstance
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19185#discussion_r138220005
--- Diff: python/pyspark/ml/tests.py ---
@@ -1473,11 +1473,59 @@ def test_logistic_regression_summary(self):
self.assertTrue(isinstance
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/19185#discussion_r138220297
--- Diff: python/pyspark/ml/classification.py ---
@@ -528,9 +528,11 @@ def summary(self):
trained on the training set. An exception is thrown if
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/19106
I'm confused how this issue was discovered in the first place. Did someone
actually train an RF/DT and receive all zero probabilities? If so, shouldn't
there be a unit test that recr
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18315#discussion_r134273202
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/HingeAggregatorSuite.scala
---
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18896
Thanks for catching this @WeichenXu123! I just added a note about the
intent of the test.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18896#discussion_r134076580
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
---
@@ -1392,6 +1415,61 @@ class LogisticRegressionSuite
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18896#discussion_r134076552
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala
---
@@ -238,8 +238,17 @@ class LogisticAggregatorSuite
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18899
Ok, it's fairly safe since it's limited to `private[linalg]`. The confusion
for me is that this method introduces all sorts of edge cases which have
behavior that is not at all obvious or
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18899
I think there _is_ new functionality, a new method that needs its
functionality defined. One specific example, we need a test like:
scala
test("toSparseWithSize") {
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18899
Ok, yes, I see it now. Though, the point remains but to a lesser degree. We
still have a method, albeit private, that indexes the array at potentially
unsafe locations. It's probably ok, but a
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18899
Btw, I think the compile error is because `v.toSparse(2)` could resolve to
either `v.toSparse(nnz = 2)` OR `v.toSparse.apply(2)`.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18899
This approach doesn't feel right to me. The goal of the change is to avoid
making a pass over the values to find out if there are any explicit zeros that
need to be eliminated, which is
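The trade-off under discussion can be sketched in a few lines. This is a hypothetical Python illustration, not Spark's Scala implementation: the names `count_nonzeros`, `to_sparse_with_size`, and `to_sparse` are assumptions made for this example, and a sparse vector is modeled as a plain `(size, indices, values)` tuple. It shows how a caller-supplied non-zero count skips one pass over the values, and why a wrong count is unsafe.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a sparse vector: (size, indices, values).
SparseTuple = Tuple[int, List[int], List[float]]


def count_nonzeros(values: Sequence[float]) -> int:
    """One full pass over the values to count the non-zero entries."""
    return sum(1 for v in values if v != 0.0)


def to_sparse_with_size(values: Sequence[float], nnz: int) -> SparseTuple:
    """Build the sparse form, trusting the caller's non-zero count.

    The output arrays are preallocated to length nnz, so no counting pass
    is needed. If nnz is smaller than the true count, the write below
    raises IndexError; if it is larger, trailing zero entries remain.
    """
    indices = [0] * nnz
    nonzero_values = [0.0] * nnz
    k = 0
    for i, v in enumerate(values):
        if v != 0.0:
            indices[k] = i
            nonzero_values[k] = v
            k += 1
    return len(values), indices, nonzero_values


def to_sparse(values: Sequence[float]) -> SparseTuple:
    """Safe variant: pays for the extra counting pass up front."""
    return to_sparse_with_size(values, count_nonzeros(values))
```

In this sketch the safe entry point always counts first, while the size-taking helper is only correct when the caller already knows the exact count, which is the kind of edge case unit tests would need to pin down.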
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18899
First suggestion is that there must be unit tests :)
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18832
If you want to change it, that's fine. I think it's fine either way.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18832
No, I don't think so. Computing parent stats is a very small fraction of
the time and memory compared with the overall `allStats` array. That's why we
decided to just add it in the f
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18832
I don't agree the comment is _misleading_. It might be confusing, but
that's something different.
The reason that the `DTStatsAggregator` needs to keep track of
`parentStats` is
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18832
The comment is not wrong. It's added for when we are finding the best
split, to compute the right child stats from the left child stats. We would
have just used the stats that are already avai
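The subtraction described here can be sketched as follows. This is a hypothetical Python illustration rather than Spark's `DTStatsAggregator` code: it assumes node statistics are simple per-class label counts, and the function name `right_child_stats` is invented for the example.

```python
from typing import List


def right_child_stats(parent_stats: List[float], left_stats: List[float]) -> List[float]:
    """Derive the right child's statistics as parent minus left, element by element."""
    if len(parent_stats) != len(left_stats):
        raise ValueError("parent and left statistics must have the same length")
    return [p - l for p, l in zip(parent_stats, left_stats)]


# Assumed example: a node holding 10 samples of class 0 and 6 of class 1,
# where a candidate split sends 4 and 1 of them to the left child.
parent = [10.0, 6.0]
left = [4.0, 1.0]
right = right_child_stats(parent, left)  # [6.0, 5.0]
```

Because every sample in the parent lands in exactly one child, only the left side has to be accumulated for each candidate split under this assumption.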
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18797
Can you change the title? Upgrade to **0.13.2**.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18315
ping! https://github.com/apache/spark/pull/18305 was merged. This can
proceed.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18305
Thanks @MLnick, @hhbyyh, and @facaiy for reviewing!
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18305
What's blocking us here?
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18513
LGTM!
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r128786226
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18513
Let's make sure to create doc and python JIRAs before this gets merged btw.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r128042544
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
---
@@ -32,40 +34,45 @@ private[ml] trait
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18305
Thanks @MLnick. I agree with you about the broadcasting, so have backed it
out. I think all comments are addressed now, please let me know if there is
anything else.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r128041731
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala
---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r127558429
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r127554746
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r127555147
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software