Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/22873
LGTM. Thanks!
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/22764
I think this can target 3.0, since 2.4 will be released soon and this PR
looks a little complex and needs some time to check carefully
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22764#discussion_r228527916
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala
---
@@ -225,13 +227,14 @@ object BisectingKMeansModel
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22790#discussion_r228215468
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala
---
@@ -109,7 +109,7 @@ class BisectingKMeansModel
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/22790
LGTM.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22675#discussion_r227193764
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,113 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/22793
LGTM pending on Jenkins pass.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22764#discussion_r226826701
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala
---
@@ -225,13 +227,14 @@ object BisectingKMeansModel
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/22780
[DOC][MINOR] Fix minor error in the code of graphx guide
## What changes were proposed in this pull request?
Fix minor error in the code "sketch of pregel implementation"
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22764#discussion_r226812121
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala
---
@@ -225,13 +227,14 @@ object BisectingKMeansModel
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22764#discussion_r226520698
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala
---
@@ -225,13 +227,14 @@ object BisectingKMeansModel
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/22763
LGTM. Thanks!
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/22756
LGTM. Thanks!
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22675#discussion_r226379993
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,90 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
--- End
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22675#discussion_r226161636
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22675#discussion_r226161623
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22675#discussion_r226161557
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/22675
This does not block the 2.4 release, but merging it before 2.4 is better.
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/22675
[SPARK-25347][ML][DOC] Spark datasource for image/libsvm user guide
## What changes were proposed in this pull request?
Spark datasource for image/libsvm user guide
## How
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/22618
LGTM. Thanks!
Github user WeichenXu123 closed the pull request at:
https://github.com/apache/spark/pull/22492
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/22510
[SPARK-25321][ML] Fix local LDA model constructor
## What changes were proposed in this pull request?
Change the constructor back to:
```
class LocalLDAModel private[ml
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/22492
@mengxr Should this be put into master?
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/22492
[SPARK-25321][ML] Revert SPARK-14681 to avoid API breaking change
## What changes were proposed in this pull request?
Revert SPARK-14681 to avoid API breaking change. PR [SPARK-14681
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/22449
[SPARK-22666][ML][FOLLOW-UP] Return a correctly formatted URI for invalid
images
## What changes were proposed in this pull request?
Change the URI returned in ImageFileFormat for an
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/22360
Do we need to set `distanceMeasure` again for the parent model?
When the parent model is created, it will use the same `distanceMeasure` as the
one used in training
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22349#discussion_r216117396
--- Diff: python/pyspark/ml/image.py ---
@@ -207,6 +207,9 @@ def readImages(self, path, recursive=False,
numPartitions=-1,
.. note:: If
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/22360
@srowen The delegated `mllib.BisectingKMeansModel` is:
```
class BisectingKMeansModel private[clustering] (
private[clustering] val root: ClusteringTreeNode,
@Since
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/22360
[MINOR][ML] Remove `BisectingKMeansModel.setDistanceMeasure` method
## What changes were proposed in this pull request?
Remove `BisectingKMeansModel.setDistanceMeasure` method.
In
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/22349
[SPARK-25345][ML] Deprecate public APIs from ImageSchema
## What changes were proposed in this pull request?
Deprecate public APIs from ImageSchema.
## How was this patch
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22328#discussion_r215200249
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/source/image/ImageOptions.scala ---
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22328#discussion_r215138998
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22328#discussion_r215138889
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
---
@@ -567,6 +567,7 @@ object DataSource extends
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22328#discussion_r215138862
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala
---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22328#discussion_r215138728
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala
---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22328#discussion_r215138711
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22328#discussion_r215138174
--- Diff: data/mllib/images/images/license.txt ---
@@ -0,0 +1,13 @@
+The images in the folder "kittens" are under the creative commons CC
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22328#discussion_r215135665
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/22328
[SPARK-22666][ML][SQL] Spark datasource for image format
## What changes were proposed in this pull request?
Implement an image schema datasource.
This image datasource
Github user WeichenXu123 closed the pull request at:
https://github.com/apache/spark/pull/19666
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/20446
@srowen I have already explained why I do not use `.show` here:
https://github.com/apache/spark/pull/20446#discussion_r165565121
Thanks
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/21513
LGTM. Thanks! @mengxr Would you mind taking a look?
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21513#discussion_r194214431
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self):
return self.getOrDefault
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21513#discussion_r194214516
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self):
return self.getOrDefault
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21513#discussion_r194214535
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self):
return self.getOrDefault
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21513#discussion_r194214831
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self):
return self.getOrDefault
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21513#discussion_r194215008
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self):
return self.getOrDefault
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21513#discussion_r194167552
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1159,216 @@ def getKeepLastCheckpoint(self):
return self.getOrDefault
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/21119
@huaxingao Creating a new PR is better, I think.
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/21493
[SPARK-15784] Add Power Iteration Clustering to spark.ml
## What changes were proposed in this pull request?
According to the discussion on JIRA, I rewrite the Power Iteration
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/21265
Jenkins, test this please.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21265#discussion_r192000596
--- Diff: python/pyspark/ml/fpm.py ---
@@ -243,3 +244,105 @@ def setParams(self, minSupport=0.3,
minConfidence=0.8, itemsCol="
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21265#discussion_r191996249
--- Diff: python/pyspark/ml/fpm.py ---
@@ -243,3 +244,105 @@ def setParams(self, minSupport=0.3,
minConfidence=0.8, itemsCol="
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21265#discussion_r191995667
--- Diff: python/pyspark/ml/fpm.py ---
@@ -243,3 +244,75 @@ def setParams(self, minSupport=0.3, minConfidence=0.8,
itemsCol="
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/21393
@mengxr @jkbradley
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/21393
[SPARK-20114][ML][FOLLOW-UP] spark.ml parity for sequential pattern mining
- PrefixSpan
## What changes were proposed in this pull request?
Change `PrefixSpan` into a class with
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/21163
Jenkins, test this please.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20973#discussion_r188853310
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20973#discussion_r188491670
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/21129
Jenkins test this please.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17086
LGTM. @jkbradley @mengxr Would you mind taking a look?
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/21163
Jenkins, test this please.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/21274
LGTM!
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20973#discussion_r186994754
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21274#discussion_r186986006
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -232,7 +232,7 @@ class PowerIterationClustering
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/21270
@shahidki31 It seems that what you described above is another issue? You can
create another JIRA for that. :)
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/21272
LGTM!
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/21129
Jenkins, test this please.
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/21265
[SPARK-24146][PySpark][ML] spark.ml parity for sequential pattern mining -
PrefixSpan: Python API
## What changes were proposed in this pull request?
spark.ml parity for sequential
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/13493
LGTM!
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20095#discussion_r186381507
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala ---
@@ -79,7 +82,52 @@ abstract class Estimator[M <: Model[M]] exte
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21097#discussion_r186037589
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala
---
@@ -365,6 +365,20 @@ class GBTClassifierSuite extends
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21218#discussion_r185970925
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
@@ -423,6 +423,8 @@ class GaussianMixture @Since("
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/20261
Jenkins, test this please.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21218#discussion_r185756220
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -378,6 +378,7 @@ class KMeans @Since("
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21218#discussion_r185756193
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
@@ -423,6 +423,8 @@ class GaussianMixture @Since("
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/20973
Jenkins, test this please.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/20261
Jenkins, test this please.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/20973
Jenkins, test this please.
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20973#discussion_r185149879
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala ---
@@ -44,26 +43,37 @@ object PrefixSpan {
*
* @param dataset
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21153#discussion_r184626842
--- Diff: python/pyspark/ml/util.py ---
@@ -523,11 +534,29 @@ def getAndSetParams(instance, metadata):
"""
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21153#discussion_r184620777
--- Diff: python/pyspark/ml/util.py ---
@@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc,
extraMetadata=None, paramMap=None
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21153#discussion_r184620855
--- Diff: python/pyspark/ml/util.py ---
@@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc,
extraMetadata=None, paramMap=None
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/17086
Overall good. @jkbradley Would you mind taking a look?
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17086#discussion_r184584878
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala
---
@@ -55,44 +60,128 @@ class MulticlassMetricsSuite
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17086#discussion_r184566012
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala
---
@@ -95,4 +95,95 @@ class MulticlassMetricsSuite
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21119#discussion_r184342231
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self):
return self.getOrDefault
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21119#discussion_r184346287
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self):
return self.getOrDefault
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21119#discussion_r184343934
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self):
return self.getOrDefault
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21119#discussion_r184344777
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self):
return self.getOrDefault
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21119#discussion_r184344901
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self):
return self.getOrDefault
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21119#discussion_r184345688
--- Diff: python/pyspark/ml/clustering.py ---
@@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self):
return self.getOrDefault
GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/21163
[SPARK-24097][ML] Instruments improvements - RandomForest and
GradientBoostedTree
## What changes were proposed in this pull request?
Instruments improvements for `RandomForest` and
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/21120
I suspect that this will slow down the summarizer performance, because you add
a sum statistic internally (and this sum value could overflow).
We can directly use `count * mean` to
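The comment above is cut off in this listing, but the `count * mean` identity it mentions can be illustrated on its own. The following is a minimal sketch in plain Python, assuming the point is that a sum does not need to be tracked separately when a count and a mean are already available. `sum_from_count_and_mean` is a hypothetical helper for illustration only, not a Spark API.

```python
# Hypothetical helper (not part of Spark): recover a column sum from the
# count and mean that a summarizer already tracks, instead of keeping a
# separate running sum.
def sum_from_count_and_mean(count, mean):
    return count * mean


# Small sample column used only for this illustration.
values = [1.0, 2.0, 3.0, 4.0]
count = len(values)
mean = sum(values) / count

# The recovered value matches the directly computed sum
# (up to floating-point rounding).
recovered = sum_from_count_and_mean(count, mean)
```

Under this reading, the derived value costs one multiplication at query time rather than an extra accumulator per column.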
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17086#discussion_r183645675
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala
---
@@ -95,4 +95,95 @@ class MulticlassMetricsSuite
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17086#discussion_r183647005
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala
---
@@ -95,4 +95,95 @@ class MulticlassMetricsSuite
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17086#discussion_r183645265
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala
---
@@ -95,4 +95,95 @@ class MulticlassMetricsSuite
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17086#discussion_r183647533
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala
---
@@ -95,4 +95,95 @@ class MulticlassMetricsSuite
Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17086#discussion_r183646411
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala
---
@@ -95,4 +95,95 @@ class MulticlassMetricsSuite
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/21129
Jenkins, test this please.