[GitHub] spark issue #22873: [SPARK-25866][ML] Update KMeans formatVersion

2018-11-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/22873 LGTM. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22764: [SPARK-25765][ML] Add training cost to BisectingKMeans s...

2018-10-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/22764 I think it can target for 3.0. since 2.4 will be released soon and this PR looks a little complex and need take some time to check carefully

[GitHub] spark pull request #22764: [SPARK-25765][ML] Add training cost to BisectingK...

2018-10-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22764#discussion_r228527916 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala --- @@ -225,13 +227,14 @@ object BisectingKMeansModel

[GitHub] spark pull request #22790: [SPARK-25793][ML]call SaveLoadV2_0.load for class...

2018-10-25 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22790#discussion_r228215468 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala --- @@ -109,7 +109,7 @@ class BisectingKMeansModel

[GitHub] spark issue #22790: [SPARK-25793][ML]call SaveLoadV2_0.load for classNameV2_...

2018-10-25 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/22790 LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-22 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22675#discussion_r227193764 --- Diff: docs/ml-datasource.md --- @@ -0,0 +1,113 @@ +--- +layout: global +title: Data sources +displayTitle: Data sources

[GitHub] spark issue #22793: [SPARK-25793][ML]Call SaveLoadV2_0.load for classNameV2_...

2018-10-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/22793 LGTM pending on Jenkins pass. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #22764: [SPARK-25765][ML] Add training cost to BisectingK...

2018-10-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22764#discussion_r226826701 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala --- @@ -225,13 +227,14 @@ object BisectingKMeansModel

[GitHub] spark pull request #22780: [DOC][MINOR] Fix minor error in the code of graph...

2018-10-19 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/22780 [DOC][MINOR] Fix minor error in the code of graphx guide ## What changes were proposed in this pull request? Fix minor error in the code "sketch of pregel implementation"

[GitHub] spark pull request #22764: [SPARK-25765][ML] Add training cost to BisectingK...

2018-10-19 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22764#discussion_r226812121 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala --- @@ -225,13 +227,14 @@ object BisectingKMeansModel

[GitHub] spark pull request #22764: [SPARK-25765][ML] Add training cost to BisectingK...

2018-10-18 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22764#discussion_r226520698 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala --- @@ -225,13 +227,14 @@ object BisectingKMeansModel

[GitHub] spark issue #22763: [SPARK-25764][ML][EXAMPLES] Update BisectingKMeans examp...

2018-10-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/22763 LGTM. thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

2018-10-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/22756 LGTM. thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-18 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22675#discussion_r226379993 --- Diff: docs/ml-datasource.md --- @@ -0,0 +1,90 @@ +--- +layout: global +title: Data sources +displayTitle: Data sources --- End

[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22675#discussion_r226161636 --- Diff: docs/ml-datasource.md --- @@ -0,0 +1,49 @@ +--- +layout: global +title: Data sources +displayTitle: Data sources

[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22675#discussion_r226161623 --- Diff: docs/ml-datasource.md --- @@ -0,0 +1,49 @@ +--- +layout: global +title: Data sources +displayTitle: Data sources

[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22675#discussion_r226161557 --- Diff: docs/ml-datasource.md --- @@ -0,0 +1,49 @@ +--- +layout: global +title: Data sources +displayTitle: Data sources

[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/22675 This do not block 2.4 release. But merge before 2.4 is better. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-08 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/22675 [SPARK-25347][ML][DOC] Spark datasource for image/libsvm user guide ## What changes were proposed in this pull request? Spark datasource for image/libsvm user guide ## How

[GitHub] spark issue #22618: [SPARK-25321][ML] Revert SPARK-14681 to avoid API breaki...

2018-10-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/22618 LGTM. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22492: [SPARK-25321][ML] Revert SPARK-14681 to avoid API...

2018-09-25 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/22492 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22510: [SPARK-25321][ML] Fix local LDA model constructor

2018-09-20 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/22510 [SPARK-25321][ML] Fix local LDA model constructor ## What changes were proposed in this pull request? change back the constructor to: ``` class LocalLDAModel private[ml

[GitHub] spark issue #22492: [SPARK-25321][ML] Revert SPARK-14681 to avoid API breaki...

2018-09-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/22492 @mengxr Should this be put into master ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #22492: [SPARK-25321][ML] Revert SPARK-14681 to avoid API...

2018-09-20 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/22492 [SPARK-25321][ML] Revert SPARK-14681 to avoid API breaking change ## What changes were proposed in this pull request? Revert SPARK-14681 to avoid API breaking change. PR [SPARK-14681

[GitHub] spark pull request #22449: [SPARK-22666][ML][FOLLOW-UP] Return a correctly f...

2018-09-18 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/22449 [SPARK-22666][ML][FOLLOW-UP] Return a correctly formatted URI for invalid images ## What changes were proposed in this pull request? Change the URI returned in ImageFileFormat

[GitHub] spark issue #22360: [MINOR][ML] Remove `BisectingKMeansModel.setDistanceMeas...

2018-09-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/22360 Do we need to set `distanceMeasure` again for the parent model ? When parent model created, it will use the same `distanceMeasure` with the one used in training

[GitHub] spark pull request #22349: [SPARK-25345][ML] Deprecate public APIs from Imag...

2018-09-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22349#discussion_r216117396 --- Diff: python/pyspark/ml/image.py --- @@ -207,6 +207,9 @@ def readImages(self, path, recursive=False, numPartitions=-1, .. note

[GitHub] spark issue #22360: [MINOR][ML] Remove `BisectingKMeansModel.setDistanceMeas...

2018-09-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/22360 @srowen The delegated `mllib.BisectingKMeansModel` is: ``` class BisectingKMeansModel private[clustering] ( private[clustering] val root: ClusteringTreeNode, @Since

[GitHub] spark pull request #22360: [MINOR][ML] Remove `BisectingKMeansModel.setDista...

2018-09-07 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/22360 [MINOR][ML] Remove `BisectingKMeansModel.setDistanceMeasure` method ## What changes were proposed in this pull request? Remove `BisectingKMeansModel.setDistanceMeasure` method

[GitHub] spark pull request #22349: [SPARK-25345][ML] Deprecate public APIs from Imag...

2018-09-06 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/22349 [SPARK-25345][ML] Deprecate public APIs from ImageSchema ## What changes were proposed in this pull request? Deprecate public APIs from ImageSchema. ## How was this patch

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215200249 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageOptions.scala --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215138998 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215138889 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -567,6 +567,7 @@ object DataSource extends

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215138862 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215138728 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215138711 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215138174 --- Diff: data/mllib/images/images/license.txt --- @@ -0,0 +1,13 @@ +The images in the folder "kittens" are under the creative commons CC

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215135665 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/22328 [SPARK-22666][ML][SQL] Spark datasource for image format ## What changes were proposed in this pull request? Implement an image schema datasource. This image datasource

[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...

2018-07-09 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/19666 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20446: [SPARK-23254][ML] Add user guide entry and example for D...

2018-07-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20446 @srowen The reason I do not use `.show` I have already reply here https://github.com/apache/spark/pull/20446#discussion_r165565121 thanks

[GitHub] spark issue #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-06-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21513 LGTM. Thanks! @mengxr Would you mind take a look ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194214431 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194214516 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194214535 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194214831 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194215008 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1157,204 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21513: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21513#discussion_r194167552 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1159,216 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

2018-06-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21119 @huaxingao Create a new PR is better I think. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21493: [SPARK-15784] Add Power Iteration Clustering to s...

2018-06-04 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21493 [SPARK-15784] Add Power Iteration Clustering to spark.ml ## What changes were proposed in this pull request? According to the discussion on JIRA. I rewrite the Power Iteration

[GitHub] spark issue #21265: [SPARK-24146][PySpark][ML] spark.ml parity for sequentia...

2018-05-31 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21265 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #21265: [SPARK-24146][PySpark][ML] spark.ml parity for se...

2018-05-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21265#discussion_r192000596 --- Diff: python/pyspark/ml/fpm.py --- @@ -243,3 +244,105 @@ def setParams(self, minSupport=0.3, minConfidence=0.8, itemsCol="

[GitHub] spark pull request #21265: [SPARK-24146][PySpark][ML] spark.ml parity for se...

2018-05-31 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21265#discussion_r191996249 --- Diff: python/pyspark/ml/fpm.py --- @@ -243,3 +244,105 @@ def setParams(self, minSupport=0.3, minConfidence=0.8, itemsCol="

[GitHub] spark pull request #21265: [SPARK-24146][PySpark][ML] spark.ml parity for se...

2018-05-30 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21265#discussion_r191995667 --- Diff: python/pyspark/ml/fpm.py --- @@ -243,3 +244,75 @@ def setParams(self, minSupport=0.3, minConfidence=0.8, itemsCol="

[GitHub] spark issue #21393: [SPARK-20114][ML][FOLLOW-UP] spark.ml parity for sequent...

2018-05-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21393 @mengxr @jkbradley --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21393: [SPARK-20114][ML][FOLLOW-UP] spark.ml parity for ...

2018-05-22 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21393 [SPARK-20114][ML][FOLLOW-UP] spark.ml parity for sequential pattern mining - PrefixSpan ## What changes were proposed in this pull request? Change `PrefixSpan` into a class

[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-05-17 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21163 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-05-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r188853310 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-05-15 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r188491670 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21129 Jenkins test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...

2018-05-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17086 LGTM. @jkbradley @mengxr Would you mind take a look ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-05-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21163 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

2018-05-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21274 LGTM. ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-05-09 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r186994754 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

2018-05-09 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21274#discussion_r186986006 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -232,7 +232,7 @@ class PowerIterationClustering

[GitHub] spark issue #21270: [SPARK-24213][ML]Power Iteration Clustering in SparkML t...

2018-05-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21270 @shahidki31 Seemingly what you said above is anothor issue ? You can create another jira for that. :) --- - To unsubscribe

[GitHub] spark issue #21272: [MINOR][ML][DOC] Improved Naive Bayes user guide explana...

2018-05-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21272 LGTM! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-05-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21129 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #21265: [SPARK-24146][PySpark][ML] spark.ml parity for se...

2018-05-07 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21265 [SPARK-24146][PySpark][ML] spark.ml parity for sequential pattern mining - PrefixSpan: Python API ## What changes were proposed in this pull request? spark.ml parity for sequential

[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2018-05-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13493 LGTM! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #20095: [SPARK-22126][ML] Added fitMultiple method with d...

2018-05-07 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20095#discussion_r186381507 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala --- @@ -79,7 +82,52 @@ abstract class Estimator[M <: Model[M]] exte

[GitHub] spark pull request #21097: [SPARK-14682][ML] Provide evaluateEachIteration m...

2018-05-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21097#discussion_r186037589 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -365,6 +365,20 @@ class GBTClassifierSuite extends

[GitHub] spark pull request #21218: [SPARK-24155][ML] Instrumentation improvements fo...

2018-05-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21218#discussion_r185970925 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -423,6 +423,8 @@ class GaussianMixture @Since("

[GitHub] spark issue #20261: [SPARK-22885][ML][TEST] ML test for StructuredStreaming:...

2018-05-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20261 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #21218: [SPARK-24155][ML] Instrumentation improvements fo...

2018-05-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21218#discussion_r185756220 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -378,6 +378,7 @@ class KMeans @Since("

[GitHub] spark pull request #21218: [SPARK-24155][ML] Instrumentation improvements fo...

2018-05-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21218#discussion_r185756193 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -423,6 +423,8 @@ class GaussianMixture @Since("

[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-05-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20973 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #20261: [SPARK-22885][ML][TEST] ML test for StructuredStreaming:...

2018-05-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20261 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #20973: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-05-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20973 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #20973: [SPARK-20114][ML] spark.ml parity for sequential ...

2018-04-30 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20973#discussion_r185149879 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala --- @@ -44,26 +43,37 @@ object PrefixSpan { * * @param dataset

[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-27 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r184626842 --- Diff: python/pyspark/ml/util.py --- @@ -523,11 +534,29 @@ def getAndSetParams(instance, metadata): """

[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-27 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r184620777 --- Diff: python/pyspark/ml/util.py --- @@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc, extraMetadata=None, paramMap=None

[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-27 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21153#discussion_r184620855 --- Diff: python/pyspark/ml/util.py --- @@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc, extraMetadata=None, paramMap=None

[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17086 overall good, @jkbradley Would you mind take a look ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r184584878 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -55,44 +60,128 @@ class MulticlassMetricsSuite

[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r184566012 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -95,4 +95,95 @@ class MulticlassMetricsSuite

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184343934 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184342231 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184346287 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184344777 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184344901 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...

2018-04-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21119#discussion_r184345688 --- Diff: python/pyspark/ml/clustering.py --- @@ -1156,6 +1156,201 @@ def getKeepLastCheckpoint(self): return self.getOrDefault

[GitHub] spark pull request #21163: [SPARK-24097][ML] Instruments improvements - Rand...

2018-04-26 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/21163 [SPARK-24097][ML] Instruments improvements - RandomForest and GradientBoostedTree ## What changes were proposed in this pull request? Instruments improvements for `RandomForest

[GitHub] spark issue #21120: [SPARK-22448][ML] Added sum function to Summerizer and M...

2018-04-24 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21120 I doubt that this will slow down the summarizer performance because you add sum statistics internally (and this sum value will possible to overflow). We can directly use `count * mean

[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-24 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r183645675 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -95,4 +95,95 @@ class MulticlassMetricsSuite

[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-24 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r183647005 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -95,4 +95,95 @@ class MulticlassMetricsSuite

[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-24 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r183645265 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -95,4 +95,95 @@ class MulticlassMetricsSuite

[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-24 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r183647533 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -95,4 +95,95 @@ class MulticlassMetricsSuite

[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-24 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17086#discussion_r183646411 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/MulticlassMetricsSuite.scala --- @@ -95,4 +95,95 @@ class MulticlassMetricsSuite

[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...

2018-04-24 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/21129 Jenkins, test this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

  1   2   3   4   5   6   7   8   9   10   >