GitHub user jkbradley opened a pull request:
https://github.com/apache/spark/pull/21090
[SPARK-15784][ML] Add Power Iteration Clustering to spark.ml
## What changes were proposed in this pull request?
This PR adds PowerIterationClustering as a Transformer to spark.ml. In the
transform method, it calls spark.mllib's PowerIterationClustering.run() method
and converts the returned assignments (the k-means output of the
pseudo-eigenvector) into a DataFrame (id: LongType, cluster: IntegerType).
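To illustrate the flow described above (power iteration to get a pseudo-eigenvector, k-means on that vector, then one (id, cluster) row per vertex), here is a minimal sketch in plain Python. It is not the spark.mllib implementation: the function names `power_iteration_clustering` and `kmeans_1d`, the degree-based initial vector, the fixed iteration count, and the deterministic 1-D k-means are all assumptions made for illustration.

```python
def kmeans_1d(values, k, iters=50):
    """Tiny deterministic 1-D k-means (Lloyd's algorithm); illustrative only."""
    lo, hi = min(values), max(values)
    if k == 1:
        centers = [sum(values) / len(values)]
    else:
        # Assumption: spread the initial centers evenly between min and max.
        centers = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        # Move each center to the mean of its members (keep it if empty).
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def power_iteration_clustering(edges, k=2, max_iter=10):
    """Hypothetical helper: edges are (src, dst, similarity) triples for an
    undirected graph. Returns (id, cluster) rows sorted by id, mirroring the
    (id: LongType, cluster: IntegerType) output schema."""
    # Build a symmetric neighbor map from the edge list.
    nbrs = {}
    for src, dst, w in edges:
        nbrs.setdefault(src, {})[dst] = w
        nbrs.setdefault(dst, {})[src] = w
    ids = sorted(nbrs)
    degree = {i: sum(nbrs[i].values()) for i in ids}

    # Assumption: start from the degree vector, normalized to sum to 1.
    total = sum(degree.values())
    v = {i: degree[i] / total for i in ids}

    # Truncated power iteration on the row-normalized affinity matrix.
    for _ in range(max_iter):
        nxt = {i: sum(w * v[j] for j, w in nbrs[i].items()) / degree[i] for i in ids}
        norm = sum(abs(x) for x in nxt.values())
        v = {i: x / norm for i, x in nxt.items()}

    # Cluster the entries of the pseudo-eigenvector with k-means.
    labels = kmeans_1d([v[i] for i in ids], k)
    return list(zip(ids, labels))


# A triangle {0, 1, 2} and a 4-clique {3, 4, 5, 6} joined by one weak edge.
edges = [
    (0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
    (3, 4, 1.0), (3, 5, 1.0), (3, 6, 1.0),
    (4, 5, 1.0), (4, 6, 1.0), (5, 6, 1.0),
    (2, 3, 0.1),
]
rows = power_iteration_clustering(edges, k=2, max_iter=10)
print(rows)
```

The iteration is stopped early on purpose: if run to convergence, the vector becomes constant and carries no cluster information, so the useful signal is in the intermediate pseudo-eigenvector.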
This PR is copied and modified from
https://github.com/apache/spark/pull/15770
The primary author is @wangmiao1981.
## How was this patch tested?
This PR has 2 types of tests:
* Copies of tests from spark.mllib's PIC tests
* New tests specific to the spark.ml APIs
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkbradley/spark wangmiao1981-pic
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21090.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21090
----
commit e4492a64b74b0bbbbccc2da8f13353d37bb9bb0c
Author: [email protected] <wm624@...>
Date: 2016-06-13T19:47:42Z
add pic framework (model, class etc)
commit 70862491e5b86ce4add500a0c96ae5220733b35d
Author: [email protected] <wm624@...>
Date: 2016-06-13T23:28:09Z
change a comment
commit b73d8a78fa69f83c278996feb1b19502ef871c5b
Author: [email protected] <wm624@...>
Date: 2016-06-17T17:27:55Z
add missing functions fit predict load save etc.
commit 022fe523f735c5519f948b175871489f79434fb5
Author: [email protected] <wm624@...>
Date: 2016-06-18T01:12:41Z
add unit test flie
commit 552cf54fb03f88af023f080e60fa50f1f39060fc
Author: [email protected] <wm624@...>
Date: 2016-06-20T17:35:05Z
add test cases part 1
commit 0b4954d55b4d344794d3c47366220c67f07d0d43
Author: [email protected] <wm624@...>
Date: 2016-06-20T20:29:54Z
add unit test part 2: test fit, parameters etc.
commit f22b01e06eaaf5951befcebdffc18c8e519183d2
Author: [email protected] <wm624@...>
Date: 2016-06-20T21:22:59Z
fix a type issue
commit 305b194dae40eaff990c18837c3f2bc8d469e60c
Author: [email protected] <wm624@...>
Date: 2016-06-21T20:07:27Z
add more unit tests
commit 4b32cbf02965c5c1a0c094fa144836dab0dfd543
Author: [email protected] <wm624@...>
Date: 2016-06-21T21:46:25Z
delete unused import and add comments
commit f6eda88a6c0af416b988a2c37f46c8b08e5e99cf
Author: [email protected] <wm624@...>
Date: 2016-10-25T21:28:12Z
change version to 2.1.0
commit 45c4b1cd1afa28c775c666b57ecee614ed9a41d0
Author: [email protected] <wm624@...>
Date: 2016-11-03T23:26:01Z
change PIC as a Transformer
commit e8d7ed37138909d010a812fba7d03ef30a4f6e05
Author: [email protected] <wm624@...>
Date: 2016-11-04T17:28:26Z
add LabelCol
commit e4e1e055a9b3ab54b83331ac7dc56d6b792dcf7b
Author: [email protected] <wm624@...>
Date: 2016-11-04T18:36:09Z
change col implementation
commit 8384422ec0e7192cc8ce53df02ddb4ae0401fd0b
Author: [email protected] <wm624@...>
Date: 2017-02-17T22:20:00Z
address some of the comments
commit d6a199c48ff940861d80caf275da29d99375ce33
Author: [email protected] <wm624@...>
Date: 2017-02-21T22:37:51Z
add additional test with dataset having more data
commit b0c3aff4a76ace99c104c2b2c10c9485a028bfd6
Author: [email protected] <wm624@...>
Date: 2017-03-14T23:13:45Z
change input data format
commit 091225dd2f1c353edc28dc4299034a018a92bc81
Author: [email protected] <wm624@...>
Date: 2017-03-15T22:49:45Z
resolve warnings
commit 8bb99567556ce29c75d5f395157d0161dff695bc
Author: [email protected] <wm624@...>
Date: 2017-03-16T18:33:47Z
add neighbor and weight cols
commit 8ba82e8392e6d607ab750ed8eb3caaf386e1352a
Author: wangmiao1981 <wm624@...>
Date: 2017-08-15T21:13:55Z
address review comments 1
commit 468a94741efe6530c9acfbb1af4f46499550ee1f
Author: wangmiao1981 <wm624@...>
Date: 2017-08-15T21:23:39Z
fix style
commit ec10f24336ff51354a1657c7ceadb9ada8cd1484
Author: wangmiao1981 <wm624@...>
Date: 2017-08-15T22:30:28Z
remove unused comments
commit 5710cfcf2e3596c95f353ce043f7358a030d70a0
Author: wangmiao1981 <wm624@...>
Date: 2017-08-15T23:43:14Z
add Since
commit 88654b3055ebd863e3b3c5774abdce28f3cda184
Author: wangmiao1981 <wm624@...>
Date: 2017-08-17T00:12:12Z
fix missing >
commit 804adc6fece91e7264f315ee965faa40c5e334c5
Author: wangmiao1981 <wm624@...>
Date: 2017-08-17T17:26:40Z
fix doc
commit 4a6dd79a9c37f71ea4378692438f19b3247b7913
Author: wangmiao1981 <wm624@...>
Date: 2017-10-25T23:16:55Z
address review comments
commit 5cb8ed6de3865f58719b3b30888b3bc4542905d4
Author: wangmiao1981 <wm624@...>
Date: 2017-10-30T21:44:24Z
fix unit test
commit 6abf6023868d944068a26186cde3fbadffd83a74
Author: Joseph K. Bradley <joseph@...>
Date: 2018-04-03T23:46:40Z
cleanups to docs
commit d9270876797153d7660843fc621e707b4dff71ca
Author: Joseph K. Bradley <joseph@...>
Date: 2018-04-03T23:52:36Z
typo
commit d2157489770a79fe443d567bfc03d61f72fbe161
Author: Joseph K. Bradley <joseph@...>
Date: 2018-04-17T20:17:15Z
final updates for PIC PR
----