Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15140#discussion_r80030192
--- Diff:
core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala ---
@@ -670,6 +670,19 @@ class JavaSparkContext(val sc: SparkContext
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15113
@BryanCutler Thanks for working on this. I'm a bit worried that when users
set ```weightCol = None``` in Python, they mean to set ```weightCol =
null``` in Scala.
The c
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15113
Furthermore, ```weightCol=None``` in the Python API doc may be confusing
for users. I think we can add some annotations to clarify the meaning. Thanks.
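
A minimal sketch of the ambiguity, assuming a keyword-argument API in which ```None``` stands for "param not provided" rather than "set the param to null". ```ToyEstimator```, ```_set``` and ```is_set``` are hypothetical names made up for illustration; this is not PySpark's actual implementation.

```python
class ToyEstimator:
    """Hypothetical estimator used only for illustration; not part of PySpark."""

    def __init__(self, featuresCol="features", weightCol=None):
        self._params = {}
        self._set(featuresCol=featuresCol, weightCol=weightCol)

    def _set(self, **kwargs):
        # Assumption: a None value means "not provided", so it is skipped
        # instead of being stored as an explicit null.
        for name, value in kwargs.items():
            if value is not None:
                self._params[name] = value
        return self

    def is_set(self, name):
        return name in self._params


est = ToyEstimator(weightCol=None)
print(est.is_set("weightCol"))  # the param stays unset, it is not stored as null
```

Under this convention a reader of the signature cannot tell from ```weightCol=None``` alone whether the weight column is unset or explicitly null, which is the kind of thing a doc annotation could spell out.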
---
If your project is set up for it
GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/15214
[SPARK-17017][Follow-up][ML] Refactor of ChiSqSelector and add ML Python
API.
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15131
@HyukjinKwon Sounds good. Do you think it is OK to backport only the
URI-related change?
GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/15215
[Minor][SparkR] Add sparkr-vignettes.html to gitignore.
## What changes were proposed in this pull request?
Add ```sparkr-vignettes.html``` to ```.gitignore```.
## How was this
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15215
cc @shivaram @felixcheung
GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/15216
[SPARK-17577][Follow-up][SparkR] SparkR spark.addFile supports adding
directory recursively
## What changes were proposed in this pull request?
#15140 exposed ```JavaSparkContext.addFile
GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/15217
[SPARK-17577][Core] Update SparkContext.addFile to make it work well on
Windows [2.0 backport]
## What changes were proposed in this pull request?
Update ```SparkContext.addFile``` to
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15217
cc @HyukjinKwon @sarutak @shivaram
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15131
Opened backport PR at #15217. Thanks!
Github user yanboliang closed the pull request at:
https://github.com/apache/spark/pull/15217
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15217
Close this PR. Thanks!
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15214#discussion_r80355100
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -143,13 +149,13 @@ final class ChiSqSelector @Since("
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15214#discussion_r80355103
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -160,6 +166,12 @@ final class ChiSqSelector @Since("
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15214#discussion_r80355108
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala ---
@@ -76,7 +76,7 @@ class ChiSqSelectorSuite extends
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15214
@mpjlu The most important reason for this change is that the fit/train model
should not depend on the order in which users set params. In other words,
users should get the same model whether set
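
To illustrate the property with a hypothetical toy selector (the class, its parameters and the selection logic are assumptions made up for illustration, not the ```ChiSqSelector``` API): if ```fit``` reads only the final parameter values, the order of the setter calls cannot change the result.

```python
class ToySelector:
    """Hypothetical selector; all names here are illustrative only."""

    def __init__(self):
        self.selector_type = "topk"
        self.num_top = 2
        self.threshold = 0.5

    def set_selector_type(self, value):
        self.selector_type = value
        return self

    def set_num_top(self, value):
        self.num_top = value
        return self

    def set_threshold(self, value):
        self.threshold = value
        return self

    def fit(self, scores):
        # Setters only record values; the selection mode is decided here,
        # from the final state, so setter order is irrelevant.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if self.selector_type == "topk":
            return sorted(ranked[: self.num_top])
        return sorted(i for i in ranked if scores[i] >= self.threshold)


scores = [0.9, 0.1, 0.7, 0.4]
a = ToySelector().set_selector_type("topk").set_num_top(1).set_threshold(0.3).fit(scores)
b = ToySelector().set_threshold(0.3).set_num_top(1).set_selector_type("topk").fit(scores)
print(a == b)  # same selection for both setter orders
```

A design in which a setter such as ```set_threshold``` silently switched the selection mode would break this: the last setter called would win, and the two call orders above would produce different models.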
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15215
Merged into master. Thanks for the review.
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r80376680
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala
---
@@ -116,7 +117,7 @@ class
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r80376849
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala ---
@@ -282,9 +281,7 @@ class MLUtilsSuite extends SparkFunSuite with
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r80376739
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala ---
@@ -57,8 +58,7 @@ class MinMaxScalerSuite extends SparkFunSuite
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r80376695
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/evaluation/RegressionEvaluatorSuite.scala
---
@@ -42,9 +43,10 @@ class RegressionEvaluatorSuite
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14035
@HyukjinKwon I have made a pass and this PR looks good overall. Could you
double-check whether all ML test cases are covered? I found that we used
implicit imports of a different style at
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15216#discussion_r80377484
--- Diff: R/pkg/R/context.R ---
@@ -231,17 +231,21 @@ setCheckpointDir <- function(sc, dirName) {
#' filesystems), or an HTTP, HTTPS or FTP
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15216#discussion_r80377496
--- Diff: R/pkg/R/context.R ---
@@ -231,17 +231,21 @@ setCheckpointDir <- function(sc, dirName) {
#' filesystems), or an HTTP, HTTPS or FTP
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15214
@srowen @mpjlu
Another important reason for this change: the current behavior is error-prone
for the Python ML API.
```
def __init__(self, numTopFeatures=50, featuresCol="features",
```
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15214
You can also refer to all the other Estimators in ML: even if you swap the
order in which the arguments are set, you still get the same model. Thanks.
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r80449593
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/ChiSqSelectorSuite.scala ---
@@ -29,8 +29,7 @@ class ChiSqSelectorSuite extends SparkFunSuite
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14035
LGTM, merged into master. Thanks!
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15216
Merged into master, thanks for the review.
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14852
LGTM2, merged into master. Thanks! @WeichenXu123 @sethah
GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/15261
[SPARK-16356][Follow-up][ML] Enforce ML test of exception for
local/distributed Dataset.
## What changes were proposed in this pull request?
#14035 added ```testImplicits``` to ML unit
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15261
cc @HyukjinKwon
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15261
@HyukjinKwon The root cause is that Spark supports creating a local
Dataset, which may not trigger a Spark job. This is consistent with the design
of Dataset, and in most cases they have the same behavior
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15261
@srowen Would you mind taking a look when you are available? Thanks.
GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/15277
[SPARK-17704][ML][MLlib] ChiSqSelector performance improvement.
## What changes were proposed in this pull request?
Several performance improvements for ```ChiSqSelector```:
1, Keep
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15277#discussion_r80863514
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -220,18 +231,22 @@ class ChiSqSelector @Since("
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15277#discussion_r80863287
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -57,22 +69,21 @@ class ChiSqSelectorModel @Since("
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15212
@mpjlu I made some changes to improve ```ChiSqSelector``` performance in
#15277. Let's work together to get that in first, and then we can work on this.
Thanks!
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15277
cc @mpjlu @srowen @avulanov
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15261
Merged into master, thanks for the review.
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15261#discussion_r81082669
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorIndexerSuite.scala ---
@@ -121,10 +119,17 @@ class VectorIndexerSuite extends
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15277#discussion_r81085184
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -57,22 +69,21 @@ class ChiSqSelectorModel @Since("
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15277#discussion_r81101636
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -57,22 +69,21 @@ class ChiSqSelectorModel @Since("
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/12819#discussion_r81106187
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala
---
@@ -150,6 +150,54 @@ class NaiveBayesSuite extends
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/12819#discussion_r81105095
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -27,11 +27,14 @@ import org.json4s.jackson.JsonMethods
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/12819#discussion_r81105041
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -27,11 +27,14 @@ import org.json4s.jackson.JsonMethods
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/12819#discussion_r81105501
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -355,79 +356,33 @@ class NaiveBayes private
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/12819#discussion_r81106353
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala
---
@@ -150,6 +150,54 @@ class NaiveBayesSuite extends
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/12819#discussion_r81107309
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -355,79 +356,33 @@ class NaiveBayes private
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/12819
@zhengruifeng I left only some minor comments; otherwise, this looks good. I
think we should also do a parity check between the ml and mllib test suites, and
add the missing test cases for ml since
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15277#discussion_r8257
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -57,22 +69,21 @@ class ChiSqSelectorModel @Since("
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15277#discussion_r81116099
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -57,22 +69,21 @@ class ChiSqSelectorModel @Since("
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15277
Merged into master. Thanks for all your reviews.
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15277
@srowen I'm sorry for the misunderstanding. I'll revert it first and let's
continue the discussion. Thanks.
GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/15298
Revert "[SPARK-17704][ML][MLLIB] ChiSqSelector performance improvement."
## What changes were proposed in this pull request?
Revert "[SPARK-17704][ML][MLLIB] ChiSqSelec
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15277#discussion_r81119230
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -57,22 +69,21 @@ class ChiSqSelectorModel @Since("
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15298
@srowen Sure. Thanks.
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15211#discussion_r81136637
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/SVM.scala
---
@@ -0,0 +1,527 @@
+/*
+ * Licensed to the Apache Software
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15211#discussion_r81136865
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/SVM.scala
---
@@ -0,0 +1,527 @@
+/*
+ * Licensed to the Apache Software
Github user yanboliang closed the pull request at:
https://github.com/apache/spark/pull/15298
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15298
OK, I'll close this one and move to #15299. Thanks.
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15299
@srowen I think this PR may fail the MiMa tests, since it makes a
binary-incompatible change. The major disagreement between this and #15277 is
whether to keep ```selectedFeatures``` sorted. I think
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15299
We pay the ```sort``` cost in any case, and it makes no difference whether it
is paid in fit/training or in the model. So I think if we want to introduce this
binary-incompatible change, there should be strong
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15299
Oh, ```isSorted``` is kept, so this does not introduce a binary
incompatibility right now. Thanks for the reminder. I'm neutral on this change. Thanks!
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15299
@srowen It looks strange to leave it protected, and deprecating it looks OK
to me unless someone gives a reason not to. BTW, please update the Python API
docs to reflect that ```selectedFeatures``` is
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/12819#discussion_r81285828
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -355,79 +356,33 @@ class NaiveBayes private
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/12819
Merged into master. Thanks!
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/12819
@zhengruifeng Please create JIRAs for the follow-up work:
* Parity check between the ml and mllib test suites, and add the missing
test cases for ml.
* Investigate how to handle ```-1
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15299
@MLnick This change does not break binary compatibility currently. It marks
```isSorted``` as deprecated and will break binary compatibility only when we
delete that method. This should not be a big
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/15313
LGTM, merged into master. Thanks.
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14937
@srowen Please feel free to send that PR. This PR involves some significant
changes and should be discussed carefully, so it may not be merged quickly. Thanks!
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14150#discussion_r70820419
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala ---
@@ -784,7 +784,13 @@ class DistributedLDAModel private[clustering
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14150#discussion_r70821003
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -508,8 +508,9 @@ final class OnlineLDAOptimizer extends
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14150#discussion_r70821947
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/clustering/LDASuite.scala ---
@@ -118,8 +118,8 @@ class LDASuite extends SparkFunSuite with
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14150#discussion_r70822797
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala ---
@@ -42,7 +43,9 @@ class PCASuite extends SparkFunSuite with
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14150#discussion_r70823371
--- Diff: dev/deps/spark-deps-hadoop-2.7 ---
@@ -163,6 +163,7 @@ scala-parser-combinators_2.11-1.0.4.jar
scala-reflect-2.11.8.jar
scala-xml_2.11
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14150
cc @srowen
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14150
@srowen I found no obvious compatibility issues after reading the release
notes. If this looks good, please let it get in, since
[SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14150
Jenkins, test this please.
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14150#discussion_r71151532
--- Diff: mllib/src/test/java/org/apache/spark/ml/feature/JavaPCASuite.java
---
@@ -107,7 +107,11 @@ public VectorPair call(Tuple2 pair
GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/14326
[SPARK-3181] [ML] Implement RobustRegression with huber loss.
## What changes were proposed in this pull request?
The current implementation is a straightforward port of Python
scikit
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14265
LGTM, merged into master. Thanks.
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14326#discussion_r72031054
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala ---
@@ -0,0 +1,473 @@
+/*
+ * Licensed to the Apache
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14326#discussion_r72031141
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala ---
@@ -0,0 +1,466 @@
+/*
+ * Licensed to the Apache
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14326
cc @dbtsai @MechCoder
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14326#discussion_r72033498
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala ---
@@ -0,0 +1,466 @@
+/*
+ * Licensed to the Apache
GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/14346
[SPARK-16710] [SparkR] [ML] spark.glm should support weightCol
## What changes were proposed in this pull request?
Training GLMs on weighted datasets is a very important use case. Users can
GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/14369
[Minor] [ML] Fix some mistake in LinearRegression formula.
## What changes were proposed in this pull request?
Fix some mistakes in the ```LinearRegression``` formula.
## How was this
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14182#discussion_r72376639
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/r/IsotonicRegressionWrapper.scala ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14182#discussion_r72377343
--- Diff: R/pkg/NAMESPACE ---
@@ -24,7 +24,8 @@ exportMethods("glm",
"spark.kmeans",
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14182#discussion_r72377491
--- Diff: R/pkg/R/mllib.R ---
@@ -292,6 +299,43 @@ setMethod("summary", signature(object =
"NaiveBayesModel"),
GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/14378
[SPARK-16750] [ML] Fix GaussianMixture training failed due to feature
column type mistake
## What changes were proposed in this pull request?
ML ```GaussianMixture``` training failed due to
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14378
cc @srowen
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14182#discussion_r72558746
--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -454,4 +454,9 @@ test_that("spark.survreg", {
}
})
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14378#discussion_r72560646
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -111,7 +111,7 @@ class MinMaxScaler @Since("1.5.0") (@Si
GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/14392
[SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapper in SparkR
## What changes were proposed in this pull request?
Gaussian Mixture Model wrapper in SparkR, similar to R
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14392
cc @mengxr
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14346
cc @mengxr
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14378#discussion_r72623737
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -111,7 +111,7 @@ class MinMaxScaler @Since("1.5.0") (@Si