Github user zhengruifeng commented on the pull request:
https://github.com/apache/spark/pull/11917#issuecomment-208689383
cc @holdenk Could you please take a look?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19889
[SPARK-22690][ML] Imputer inherit HasOutputCols
## What changes were proposed in this pull request?
make `Imputer` inherit `HasOutputCols`
## How was this patch tested?
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19889
No other algorithms output multiple columns for now
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19889
retest this please
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19889
retest this please
---
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19084#discussion_r154897263
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -117,11 +113,56 @@ class MinMaxScaler @Since("1.5.0"
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19892
[SPARK-20542][FollowUp][PySpark] Bucketizer support multi-column
## What changes were proposed in this pull request?
`Bucketizer` supports multi-column on the Python side
## How was
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19894
[SPARK-22700][ML] Bucketizer.transform incorrectly drops rows containing NaN
## What changes were proposed in this pull request?
Only drop the rows containing NaN in the input columns
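The intended behavior could be sketched as follows (column names are hypothetical, not the PR's actual code): `na.drop` is restricted to the input columns, so NaN in any other column, e.g. a label, does not drop the row.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical input columns for illustration.
val inputCols = Array("feature1", "feature2")

// Drop rows that contain NaN/null in the *input* columns only;
// NaN in other columns must not cause the row to be dropped.
def dropInvalidRows(df: DataFrame): DataFrame =
  df.na.drop("any", inputCols)
```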
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19892
This PR is currently blocked by
https://github.com/apache/spark/pull/19894#issuecomment-349315711
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19894
ping @MLnick ?
---
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19927
[WIP] OVR transform optimization
## What changes were proposed in this pull request?
optimize OVR transform
## How was this patch tested?
existing tests
You can merge this
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19927
test code:
```
import org.apache.spark.ml.classification._
val df =
spark.read.format("libsvm").load("/Users/zrf/Dev/OpenSource/s
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19927#discussion_r156314727
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -156,54 +153,22 @@ final class OneVsRestModel private[ml
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19950
[SPARK-22450][Core][MLLib][FollowUp] safely register class for mllib -
LabeledPoint/VectorWithNorm/TreePoint
## What changes were proposed in this pull request?
register following classes
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19963
[SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - Link Classification
Example
## What changes were proposed in this pull request?
in https://github.com/apache/spark/pull/18067, only
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19950
Since `VectorWithNorm` and `TreePoint` do not override the `equals` method, we
cannot directly use `===` to compare objects.
`LabeledPoint` is a case class, whose `equals` method is
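The `equals` distinction can be illustrated with plain Scala stand-ins (both classes below are hypothetical, not the Spark types):

```scala
// A case class gets a structural `equals` generated by the compiler,
// so two instances with the same field values compare equal.
case class LabeledPointLike(label: Double, features: Seq[Double])

// A plain class does not override `equals`, so `==` falls back to
// reference equality and structural `===`-style checks cannot be used.
class VectorWithNormLike(val vector: Seq[Double], val norm: Double)

val p1 = LabeledPointLike(1.0, Seq(1.0, 2.0))
val p2 = LabeledPointLike(1.0, Seq(1.0, 2.0))
// p1 == p2 is true: structural equality from the case class.

val v1 = new VectorWithNormLike(Seq(1.0), 1.0)
val v2 = new VectorWithNormLike(Seq(1.0), 1.0)
// v1 == v2 is false: reference equality only.
```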
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19892
retest this please
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19892
retest this please
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19892
ping @holdenk, can you help review this?
---
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19530
[SPARK-22309][ML] Remove unused param in
`LDAModel.getTopicDistributionMethod` & destroy `nodeToFeaturesBc` in
RandomForest
## What changes were proposed in this pull request?
Re
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19618
[SPARK-5484][Followup] PeriodicRDDCheckpointer doc cleanup
## What changes were proposed in this pull request?
PeriodicRDDCheckpointer was already moved out of mllib in SPARK-5484
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r147950431
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,462 @@
+/*
+ * Licensed to the Apache
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r147950931
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,462 @@
+/*
+ * Licensed to the Apache
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r147953998
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,462 @@
+/*
+ * Licensed to the Apache
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19618
retest this please
---
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r148702816
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,456 @@
+/*
+ * Licensed to the Apache
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19288
[SPARK-22075][ML] unpersist datasets cached by PeriodicRDDCheckpointer
## What changes were proposed in this pull request?
PeriodicRDDCheckpointer will automatically persist the last 3
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19288
retest this please
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19288
@srowen In MLlib, `PeriodicRDDCheckpointer` is only used in
`GradientBoostedTrees`.
I just found that there is another checkpointer, `PeriodicGraphCheckpointer`;
I will check it
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19288
@srowen I checked `LDA`: although `unpersistDataSet` is not called in it,
no intermediate cached RDDs are generated after `fit()`.
Then I checked `Pregel`, and found that each call of
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19288
@srowen I found that the cached RDDs in `Pregel` are just the result graph,
and the intermediate RDDs are already unpersisted directly outside of the
graphCheckpointer. So I think we don't ne
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19288
retest this please
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19288
@WeichenXu123 It may be better to destroy intermediate objects ASAP
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19229
I am not familiar with the SQL source, but I think it's great to transform all
columns at once
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19186
retest this please
---
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19490
[Trivial][DOC] update code style for InteractionExample
## What changes were proposed in this pull request?
code style update
no other same issues found
## How was this patch
Github user zhengruifeng closed the pull request at:
https://github.com/apache/spark/pull/19490
---
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/14643
[SPARK-17057][ML] ProbabilisticClassifierModels' prediction more reasonable
with multiple zero thresholds
## What changes were proposed in this pull request?
Change the behavi
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/14643
@srowen I thought of `thresholds` in ML as just a kind of
`weight`. This design is easy to understand. Are there other libraries (like
sklearn) that support thresholds? We can
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18902
@MLnick @yanboliang I update the performance comparison.
The DF-based impl is a little slower than the RDD-based one when the number
of columns is small.
When the number of columns is large (100
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18902
@yanboliang The RDD-based impl is in the [former
commit](https://github.com/apache/spark/pull/18902/commits/8daffc9007c65f04e005ffe5dcfbeca634480465)
---
Github user zhengruifeng closed the pull request at:
https://github.com/apache/spark/pull/17951
---
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/17014#discussion_r135692470
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -85,6 +86,10 @@ abstract class Predictor[
M <: PredictionMo
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18902
@yanboliang Although disappointed by the DF's performance, I also approve the
choice of DF just for less code.
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
@WeichenXu123 @yanboliang I have updated this PR according to the comments.
Thanks.
---
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19084
[SPARK-20711][ML]MultivariateOnlineSummarizer incorrect min/max for NaN
value
## What changes were proposed in this pull request?
the current impl of min/max ignores `NaN`
for a
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19084
ping @WeichenXu123 @srowen
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
@WeichenXu123 Agree that we should pass `handlePersistence` to mllib impl.
Thanks for pointing it out!
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
@WeichenXu123 The current impl of `mllib.KMeans` seems not to support caching;
it just [logs
warnings](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19084
`MinMaxScalerSuite` fails because `MinMaxScaler` needs the behavior of
ignoring `NaN`. So I think there are 2 options:
1, `MultivariateOnlineSummarizer`/`Summarizer` support a param
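Option 1 might look roughly like a NaN-aware update guarded by a parameter; a minimal sketch with a hypothetical `ignoreNaN` flag (not the actual Summarizer API):

```scala
// Sketch of a NaN-aware min update: with ignoreNaN = true the summarizer
// keeps the NaN-skipping behavior that MinMaxScaler relies on; with
// false, a single NaN propagates into the min/max statistics.
def updateMin(currentMin: Double, value: Double, ignoreNaN: Boolean): Double =
  if (value.isNaN) {
    if (ignoreNaN) currentMin else Double.NaN
  } else {
    math.min(currentMin, value)
  }
```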
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
@WeichenXu123 Sounds good. And since adding `handlePersistence` as an
`ml.Param` may influence many algs (more than those in this PR), I think we may
need more discussion @MLnick @yanboliang
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/17014#discussion_r136737427
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -304,16 +304,14 @@ class KMeans @Since("1.5.0") (
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
@WeichenXu123 @jkbradley I am curious about why `ml.Kmeans` is special
in that it needs a separate PR
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18902
@WeichenXu123 No, I only cache the DataFrame. And the RDD-Version is
[here](https://github.com/apache/spark/pull/18902/commits/8daffc9007c65f04e005ffe5dcfbeca634480465).
I use the same
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
Jenkins, retest this please
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
@smurching OK, I will close this PR and resubmit it to the new ticket.
---
Github user zhengruifeng closed the pull request at:
https://github.com/apache/spark/pull/17014
---
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19186
[SPARK-21972][ML] Add param handlePersistence
## What changes were proposed in this pull request?
Add param handlePersistence
## How was this patch tested?
existing tests
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19186
Jenkins, retest this please
---
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19186#discussion_r138237760
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -483,24 +488,17 @@ class LogisticRegression @Since
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19186#discussion_r138243247
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
---
@@ -444,13 +444,13 @@ class
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19107
I am OK to resubmit the original PR if needed.
---
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19197
[SPARK-18608][ML] Fix double caching
## What changes were proposed in this pull request?
`df.rdd.getStorageLevel` => `df.storageLevel`
using cmd `find . -name '*.scala
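The replacement matters because `df.rdd` derives a fresh RDD whose storage level is always `NONE`, so checking it can trigger a second cache. A sketch of the guard pattern implied by the change (method name is hypothetical):

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.storage.StorageLevel

// Persist only if the dataset itself is not already cached.
// `df.storageLevel` reflects the DataFrame's actual cache state,
// unlike `df.rdd.getStorageLevel` on a freshly derived RDD.
def persistIfNeeded(dataset: Dataset[_]): Boolean = {
  val handlePersistence = dataset.storageLevel == StorageLevel.NONE
  if (handlePersistence) {
    dataset.persist(StorageLevel.MEMORY_AND_DISK)
  }
  handlePersistence  // caller unpersists later iff this is true
}
```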
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19198
[MINOR][DOC] Add missing call of `update()` in examples of
PeriodicGraphCheckpointer & PeriodicRDDCheckpointer
## What changes were proposed in this pull request?
forgot to call `up
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19198
retest this please
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19197
ping @jkbradley @WeichenXu123
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18902
Any more comments on this PR? It has been about a month since the last
modification.
---
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19110#discussion_r138517690
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -297,6 +298,16 @@ final class OneVsRest @Since("
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19186
@WeichenXu123 Thanks a lot for pointing it out! I also forgot about this.
@smurching Thanks for your solution; however, I think there may be
another drawback in it: the alg usually use
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19220
LGTM Thanks for this catch!
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19229
In the test code, should we use `model.transform(df).count` instead?
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19186
retest this please
---
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/19232
[SPARK-22009][ML] Using treeAggregate improve some algs
## What changes were proposed in this pull request?
I tested on a dataset of about 10M instances and found that using
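Presumably the gain comes from `treeAggregate` merging partition results in a multi-level tree instead of pulling every partial result to the driver at once. A toy sketch (local mode, toy data, names hypothetical):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[*]").appName("treeAggDemo").getOrCreate()

// Toy aggregation: summing values with a two-level reduction tree.
val data = spark.sparkContext.parallelize(1L to 1000000L, numSlices = 100)
val total = data.treeAggregate(0L)(
  seqOp = (acc, x) => acc + x,   // fold values within a partition
  combOp = (a, b) => a + b,      // merge partial sums pairwise
  depth = 2)                     // intermediate combine level before the driver
```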
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/19232
ping @yanboliang
---
Github user zhengruifeng closed the pull request at:
https://github.com/apache/spark/pull/17384
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17384
@MLnick Agree. I will close this PR.
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18589
@MLnick Sorry for the late reply. It has been a long time since I got the last
comments on the previous PR https://github.com/apache/spark/pull/15324, so I
thought that the community may dislike that design
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18612
@holdenk I think the meaning of `StepSize` in GBT and `Threshold` in
`LinearSVC`/`Binarizer` is almost the same as that in other algs, so it may be
better to make them inherit from the same trait
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18154
@hhbyyh @HyukjinKwon Sorry for the late reply.
I think it may be better to use special logic if it is more efficient in
performance.
What is your opinion? @yanboliang @HyukjinKwon
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18610
LGTM, this is really a great feature
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18610
@yanboliang Agree. I think if there are some params which control the
"shape" of the model coefficients, then they should be overridden if we use an
initial model. Like `k` in KMeans, GMM, L
Github user zhengruifeng closed the pull request at:
https://github.com/apache/spark/pull/17995
---
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/18902
[SPARK-21690][ML] one-pass imputer
## What changes were proposed in this pull request?
parallelize the computation of all columns
## How was this patch tested?
existing tests
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18902
Jenkins, retest this please
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18902
@hhbyyh Yes, I will test the performance.
Btw, the median computation via `stat.approxQuantile` will also
transform the df into an rdd before aggregation. See
https://github.com/apache/spark
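For reference, a per-column approximate median can be obtained via `approxQuantile` (the column name and error tolerance below are illustrative):

```scala
import org.apache.spark.sql.DataFrame

// Approximate median of one column; relativeError = 0.001 trades exactness
// for speed. As noted above, this still falls back to an RDD-based
// aggregation internally.
def approxMedian(df: DataFrame, colName: String): Double =
  df.stat.approxQuantile(colName, Array(0.5), 0.001).head
```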
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18902
I tested the performance on a small dataset; the values in the following table
are the average durations in seconds:
|numColumns| Old Mean | Old Median | New Mean | New Median
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/17014
ping @MLnick ?
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18902
@hhbyyh Good idea! We can also use this trick to compute the median, because
the method
[`multipleApproxQuantiles`](https://github.com/apache/spark/blob/0e80ecae300f3e2033419b2d98da8bf092c105bb/sql/core
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18902
Jenkins, retest this please
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/16763
Jenkins, retest this please
---
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/18902
@hhbyyh I rewrote the impl; now all `NaN` and `missingValue` values are
transformed to `null` first, then the current methods are used.
For columns containing only `null`, `null` is
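The NaN/`missingValue`-to-`null` normalization could be sketched like this (the function name and column handling are assumptions, not the PR's actual code):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, when}

// Map both NaN and the configured missingValue to null so that the
// existing null-aware mean/median paths handle them uniformly.
def normalizeMissing(df: DataFrame, c: String, missingValue: Double): DataFrame =
  df.withColumn(c,
    when(col(c).isNaN || col(c) === missingValue, null).otherwise(col(c)))
```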
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/18538#discussion_r133372183
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala
---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/18538#discussion_r133371918
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala
---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/18538#discussion_r133370353
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala
---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/18538#discussion_r133372968
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/evaluation/ClusteringEvaluatorSuite.scala
---
@@ -0,0 +1,225 @@
+/*
+ * Licensed to the
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/18538#discussion_r133370197
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala
---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/18538#discussion_r133368511
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala
---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/18538#discussion_r133370279
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala
---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/18538#discussion_r133372318
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/evaluation/ClusteringEvaluatorSuite.scala
---
@@ -0,0 +1,225 @@
+/*
+ * Licensed to the
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/18538#discussion_r133368243
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala
---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache