Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r127498459
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r127557871
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/FeatureHasherSuite.scala ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r127491688
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/FeatureHasherSuite.scala ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18513
Just to clarify:
* If I want to treat a column as categorical that is represented by
integers, I'd have to map those integers to strings, right? I believe that's
one of your bul
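The integer-vs-string categorical point can be sketched with the hashing trick itself. This is a hypothetical, non-Spark illustration; `bucket`, `numFeatures`, and the `col=value` term format are assumptions modeled on how feature hashers commonly distinguish numeric from string columns: a numeric column hashes only its name and keeps the value as the magnitude, while a string column hashes the column/value pair, so an integer column must be stringified first to get one slot per category.

```scala
object FeatureHashingSketch {
  val numFeatures = 16 // small table for illustration

  // Non-negative bucket index for a term (illustrative, not Spark's hash).
  def bucket(term: String): Int =
    ((term.hashCode % numFeatures) + numFeatures) % numFeatures

  // Numeric column: hash the column name, keep the value as the magnitude.
  def hashNumeric(col: String, value: Double): (Int, Double) =
    (bucket(col), value)

  // Categorical column: hash the "col=value" pair with magnitude 1.0,
  // so every distinct value can land in its own slot.
  def hashCategorical(col: String, value: String): (Int, Double) =
    (bucket(s"$col=$value"), 1.0)
}
```

Treating the integer 31 as numeric always yields the same index for the `age` column (magnitude 31.0); mapping it to the string "31" first yields a per-value index with magnitude 1.0.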
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r127064727
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala
---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18305
Did we reach a consensus on the broadcast variables? My opinion is that
it's probably better in this case not to worry about it, and we can back out
the change that destroys them in the test s
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r126446952
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala
---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r125759270
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala
---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r125757263
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/loss/RDDLossFunction.scala ---
@@ -62,8 +62,8 @@ private[ml] class RDDLossFunction[
val
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r125681761
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
---
@@ -32,40 +34,45 @@ private[ml] trait
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r125680954
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala
---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r124615112
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/loss/RDDLossFunction.scala ---
@@ -50,7 +50,7 @@ private[ml] class RDDLossFunction[
Agg
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r124614166
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/DifferentiableLossAggregatorSuite.scala
---
@@ -157,4 +160,38 @@ object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r124615145
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
---
@@ -38,34 +40,39 @@ private[ml] trait
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r124614829
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala
---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r124615494
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
---
@@ -38,34 +40,39 @@ private[ml] trait
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r124384213
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/loss/RDDLossFunction.scala ---
@@ -62,8 +62,8 @@ private[ml] class RDDLossFunction[
val
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18305
also ping @hhbyyh @yanboliang This is a straightforward follow up to
https://github.com/apache/spark/pull/17094. Let me know if I can do anything to
make the review easier.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or...
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18118
I'll take a look at the changes in the next few days. In the meantime, you
can remove "Please review http://spark.apache.org/contributing.html before
opening a pull request." from the
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18389
@zhengruifeng Wasn't there some history on this issue? I thought there was
another PR? If that's the case, it's always helpful to post links to
discussions, or just to summarize the d
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r123050801
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala ---
@@ -150,11 +154,11 @@ class GBTRegressor @Since("1.4.0") (@Si
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r123049135
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -136,6 +136,10 @@ class GBTClassifier @Since("
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r123051195
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala
---
@@ -118,11 +119,12 @@ class DecisionTreeRegressor @Since
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r123040612
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
---
@@ -73,19 +75,21 @@ private[spark] object GradientBoostedTrees
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r123039522
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala ---
@@ -140,6 +140,10 @@ class GBTRegressor @Since("1.4.0") (@Si
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r123049728
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -192,6 +196,9 @@ object GBTClassifier extends
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r123040767
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
---
@@ -284,11 +290,13 @@ private[spark] object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r123051005
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
---
@@ -319,8 +327,10 @@ private[spark] object GradientBoostedTrees
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r123050956
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
---
@@ -284,11 +290,13 @@ private[spark] object
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18118#discussion_r123042480
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala ---
@@ -49,14 +49,16 @@ import org.apache.spark.rdd.RDD
@Since
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18315
Thanks for this pr @hhbyyh. I think we need to add a test suite for the
aggregator, but since https://github.com/apache/spark/pull/18305 needs to be
merged first, it's fine to wait. If you wou
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18305
cc @VinceShieh @MLnick @srowen
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15435
ping?? @yanboliang @MLnick
GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/18305
[SPARK-20988][ML] Logistic regression uses aggregator hierarchy
## What changes were proposed in this pull request?
This change pulls the `LogisticAggregator` class out of
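The aggregator hierarchy this PR introduces can be sketched, outside Spark, as a small trait with logistic loss as one concrete aggregator. Names and signatures below are illustrative assumptions, not Spark's actual private API:

```scala
// A common interface for differentiable losses: accumulate instances,
// then expose the averaged loss and gradient.
trait DiffLossAggregator {
  def add(label: Double, features: Array[Double]): this.type
  def loss: Double
  def gradient: Array[Double]
}

// Binary logistic loss as one concrete aggregator in the hierarchy.
class BinaryLogisticAggregator(coefficients: Array[Double]) extends DiffLossAggregator {
  private var lossSum = 0.0
  private var count = 0L
  private val gradSum = new Array[Double](coefficients.length)

  override def add(label: Double, features: Array[Double]): this.type = {
    var margin = 0.0
    var i = 0
    while (i < features.length) { margin += features(i) * coefficients(i); i += 1 }
    val prob = 1.0 / (1.0 + math.exp(-margin))
    // Negative log-likelihood of this instance.
    lossSum += (if (label > 0.5) -math.log(prob) else -math.log(1.0 - prob))
    val multiplier = prob - label
    i = 0
    while (i < features.length) { gradSum(i) += multiplier * features(i); i += 1 }
    count += 1
    this
  }
  override def loss: Double = lossSum / count
  override def gradient: Array[Double] = gradSum.map(_ / count)
}
```

The payoff of the shared trait is that the optimization loop only sees `add`/`loss`/`gradient`, so hinge, least-squares, and logistic losses can be swapped without touching the driver code.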
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17862
@hhbyyh Thanks for doing the extra work to use the new aggregator here. I
do think it's better to separate those changes from this one, though. There is
actually more that needs to be done fo
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18118
I don't think there's any point in pinging every day :)
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17094
@srowen Speaking for myself, I think the other concerns can be issued as
follow ups, yes.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18151
One minor comment, otherwise LGTM. Thanks for catching this @jkbradley!
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18151#discussion_r119901231
--- Diff: python/pyspark/ml/classification.py ---
@@ -109,6 +109,10 @@ class LinearSVC(JavaEstimator, HasFeaturesCol,
HasLabelCol, HasPredictionCol, Ha
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17894
@VinceShieh Thanks for posting your results. You tested these on datasets
with only 100 samples correct? That's probably not a representative use case of
a normal workload... Also, how many cl
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18151#discussion_r119472374
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
@@ -127,6 +127,27 @@ class LinearSVCSuite extends SparkFunSuite
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17094
Ok, yes all good points. I think since these are all private apis it gives
us room for future changes. For now, I think we can get rid of a lot of code
duplication and fill in some testing gaps with
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18151#discussion_r119275319
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
@@ -127,6 +127,14 @@ class LinearSVCSuite extends SparkFunSuite
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17094
@MLnick I completely agree about the leaky regularization abstraction. In
fact, I think the function composition feature would make it easy to get rid of
that problem. Consider:
In the
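The function-composition idea mentioned above can be sketched with plain functions that each return a (loss, gradient) pair and are summed into one objective. This is a hypothetical illustration; `LossFn`, `l2Reg`, and `compose` are invented names, not Spark's `DifferentiableRegularization` API:

```scala
object LossComposition {
  // A differentiable term: coefficients => (loss, gradient).
  type LossFn = Array[Double] => (Double, Array[Double])

  // L2 regularization as just another term: 0.5 * regParam * ||coef||^2.
  def l2Reg(regParam: Double): LossFn = coef => {
    val loss = 0.5 * regParam * coef.map(c => c * c).sum
    (loss, coef.map(_ * regParam))
  }

  // Compose terms by summing their losses and gradients, so the data
  // loss never needs to know that regularization exists.
  def compose(terms: LossFn*): LossFn = coef => {
    val results = terms.map(_(coef))
    val grad = new Array[Double](coef.length)
    for ((_, g) <- results; i <- g.indices) grad(i) += g(i)
    (results.map(_._1).sum, grad)
  }
}
```

Because regularization is just one more `LossFn`, the "leaky abstraction" concern goes away: nothing downstream has to special-case the penalty term.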
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/18120
cc @BryanCutler.
Bryan did some work on https://github.com/apache/spark/pull/17849. It seems
even with that patch, we still need to add methods like these, hoping Bryan can
confirm. If
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/11974
Mini-batching in Spark generally isn't that efficient, since to extract a
mini-batch you still need to iterate over the entire dataset - and that means
reading it from disk if it doesn'
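The cost argument can be made concrete with a toy Bernoulli sampler (a non-Spark sketch; the `visited` counter is an assumption added purely to expose the behavior): drawing even a 1% mini-batch from an iterator still visits every element, it just keeps few of them.

```scala
object MiniBatchCost {
  // Returns the sampled batch plus how many elements were actually
  // pulled through the iterator to produce it.
  def sample[T](data: Iterator[T], fraction: Double, seed: Long): (Vector[T], Long) = {
    val rng = new scala.util.Random(seed)
    var visited = 0L
    val kept = Vector.newBuilder[T]
    data.foreach { x =>
      visited += 1                                 // every element is touched
      if (rng.nextDouble() < fraction) kept += x   // few are kept
    }
    (kept.result(), visited)
  }
}
```

On a partitioned dataset the same thing happens per partition, which is why a tiny mini-batch can still cost a full scan (and a disk read, if the data is not cached).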
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17094
cc @srowen also
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/11459
This can be closed.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/13959
Yes, this is a tough issue. Let's wait and see if @jkbradley has thoughts
on this issue. If we don't hear anything, then I'd leave it up to @MechCoder on
whether to reopen. Thanks,
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/13959
This is fine, but are we not also policing JIRAs? I've argued above that
the reason this PR has been inactive is simply lack of interest in this issue.
If that's the case, then the JIRA mu
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17094
Thanks @MLnick! I am happy to discuss splitting this into smaller bits as
well, if it can make things easier.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/13959
The lack of bandwidth in MLlib means that sometimes good code that would
make an impact just gets ignored. This is kind of the reality of things.
However, if we are going to close the PR simply
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17094
ping! @MLnick @jkbradley @yanboliang @hhbyyh
Is there any interest in this? I actually think this cleanup will be a
precursor to several different improvements (adding more optimized
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17586
@MLnick There was some discussion
[here](https://github.com/apache/spark/pull/15435) and also on the JIRA for
that pr. We definitely want to design it carefully so it's easy to share code.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17910
@zhengruifeng In the follow up PR, would you mind changing the logistic
regression tests to incorporate `setMaxIter(1)`?
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/12761
@daniel-siegmann-aol Good points, and thanks for following up on this.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17910#discussion_r116408136
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
---
@@ -2318,8 +2319,8 @@ class LogisticRegressionSuite
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15435
ping! @jkbradley @yanboliang
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17894
Would you mind adding `[WIP]` to the title? Without even a benchmark for
dense features, this is definitely a work-in-progress.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17910#discussion_r115813053
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
---
@@ -2318,8 +2319,8 @@ class LogisticRegressionSuite
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17864
So the other PR https://github.com/apache/spark/pull/11601 is really long.
For reference, I am picking out the relevant discussions to this PR (also
someone tell me if there's a better way to
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17793#discussion_r114893509
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -910,26 +944,143 @@ object ALS extends DefaultParamsReadable[ALS] with
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17862#discussion_r114879308
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -145,6 +164,15 @@ class LinearSVC @Since("2.2.0") (
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15435#discussion_r114818175
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -982,19 +989,33 @@ class LogisticRegressionModel private
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17845#discussion_r114660256
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -389,6 +436,17 @@ class ALSModel private[ml
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17845#discussion_r114660148
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -389,6 +436,17 @@ class ALSModel private[ml
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17845#discussion_r114661114
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -372,11 +385,45 @@ class ALSModel private[ml] (
num: Int
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17845#discussion_r114660366
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -356,6 +356,19 @@ class ALSModel private[ml
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17845#discussion_r114658306
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -372,11 +385,45 @@ class ALSModel private[ml] (
num: Int
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17742#discussion_r114655727
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -274,46 +275,62 @@ object
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15435
btw @WeichenXu123 you just have to fix merge conflicts while rebasing. This
is always possible. Squashing commits is rarely necessary and rarely good
practice for an open PR IMO.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15435
ping @jkbradley @srowen. Any hope/interest for 2.2? Probably too late, but
wanted to check.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17556
Thanks @srowen!
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17556#discussion_r114457816
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1037,7 +1042,8 @@ private[spark] object RandomForest extends Logging
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17556#discussion_r114134468
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1009,10 +1009,24 @@ private[spark] object RandomForest extends
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17793#discussion_r114061940
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -910,26 +944,127 @@ object ALS extends DefaultParamsReadable[ALS] with
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17793
btw "You can build just the Spark scaladoc by running build/sbt unidoc from
the SPARK_PROJECT_ROOT directory."
[Link](https://github.com/apache/spark/tree/master/docs)
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17793#discussion_r114009389
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -910,26 +944,127 @@ object ALS extends DefaultParamsReadable[ALS] with
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17793#discussion_r114003560
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -910,26 +944,127 @@ object ALS extends DefaultParamsReadable[ALS] with
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17793#discussion_r114009897
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -1026,7 +1161,24 @@ object ALS extends DefaultParamsReadable[ALS] with
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17793#discussion_r114009801
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
---
@@ -910,26 +944,127 @@ object ALS extends DefaultParamsReadable[ALS] with
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15435#discussion_r113950702
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1231,6 +1295,109 @@ class
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15435
ping @yanboliang @jkbradley This LGTM.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15435#discussion_r113857158
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1231,6 +1295,109 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15435#discussion_r113857182
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1231,6 +1295,109 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15435#discussion_r113856908
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1070,90 +1096,128 @@ private[classification] class
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17793
+1 for this change. I'll try to take a look sometime, but maybe after the
QA period. Also cc @MLnick.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17556
I don't mind the weighted midpoints. However, if for a continuous feature
we find that many points have the exact same value, we are assuming we may find
data points in the test set that are
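The unweighted version of the midpoint idea under discussion can be sketched as follows (an illustrative helper, not Spark's `RandomForest` split-finding code; the weighted variant discussed in the PR would shift each midpoint toward the more frequent value):

```scala
object ContinuousSplits {
  // Candidate thresholds for a continuous feature: the midpoints
  // between consecutive distinct sorted values.
  def midpointThresholds(values: Array[Double]): Array[Double] = {
    val distinct = values.distinct.sorted
    distinct.sliding(2).collect { case Array(a, b) => (a + b) / 2.0 }.toArray
  }
}
```

For values (1.0, 2.0, 2.0, 4.0) this yields thresholds 1.5 and 3.0; a feature with a single distinct value produces no candidate splits at all.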
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17556#discussion_r113855186
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1009,10 +1009,24 @@ private[spark] object RandomForest extends
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17556#discussion_r113855243
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -138,9 +169,10 @@ class RandomForestSuite extends SparkFunSuite
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17556#discussion_r113854473
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -112,9 +138,11 @@ class RandomForestSuite extends SparkFunSuite
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17556#discussion_r113855209
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1037,7 +1051,10 @@ private[spark] object RandomForest extends
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17503
I think the benefit of this would be for speed at predict time or for model
storage. @srowen the nodes don't have to be equal to be merged, they just have
to output the same prediction. Since t
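The merge condition can be sketched on a toy tree type (a hypothetical ADT, not Spark's `Node` classes): sibling leaves need not be structurally equal, they only need the same prediction for the parent to collapse.

```scala
sealed trait Node { def prediction: Double }
case class Leaf(prediction: Double) extends Node
case class Internal(left: Node, right: Node, prediction: Double) extends Node

object TreePrune {
  // Bottom-up: collapse an internal node whose (already merged)
  // children are leaves sharing the same prediction.
  def merge(node: Node): Node = node match {
    case Internal(l, r, p) =>
      (merge(l), merge(r)) match {
        case (Leaf(a), Leaf(b)) if a == b => Leaf(a)
        case (ml, mr)                     => Internal(ml, mr, p)
      }
    case leaf => leaf
  }
}
```

Because the pass is bottom-up, a whole subtree whose leaves all agree collapses to one leaf, which is where the prediction-time and storage savings come from.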
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17706#discussion_r112746518
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
---
@@ -1204,6 +1207,9 @@ class LogisticRegressionSuite
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17706#discussion_r112745372
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
---
@@ -1204,6 +1207,9 @@ class LogisticRegressionSuite
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17706
@WeichenXu123 Thanks for the pr. Is there a JIRA? Why is testing "not
applicable"? Seems you are correct on this, but could you please provide a good
reference?
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15435
@WeichenXu123 I made a PR to your branch. Can you check it? I think you'll
still need to update the Mima file. Also, this may not make 2.2, so then you'd
have to update the since tags.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17416
@srowen Can you confirm what happens when the jars are not found in your
local m2 cache? Do you still find the `-models` jar in the ivy2 cache?
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17556
Seems like a reasonable change. Just left some minor comments.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/17556#discussion_r111434055
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -104,6 +104,18 @@ class RandomForestSuite extends SparkFunSuite
201 - 300 of 1857 matches