GitHub user fjiang6 opened a pull request:
https://github.com/apache/spark/pull/4254
[SPARK-4259][MLlib]: Add Power Iteration Clustering Algorithm with Gaussian
Similarity Function
Adds a single pseudo-eigenvector implementation of PIC.
Includes documentation, one properties file, and an updated pom.xml, along
with the following source files:
mllib/src/main/scala/org/apache/spark/mllib/clustering/PIClustering.scala
mllib/src/test/scala/org/apache/spark/mllib/clustering/PIClusteringSuite.scala
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Huawei-Spark/spark PIC
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/4254.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4254
----
commit a3c5fbe3451b665968d503fa4ee52f1f6118252a
Author: Jiang Fan <[email protected]>
Date: 2015-01-22T21:52:52Z
Adding Power Iteration Clustering
commit d5aae2032c08d097ed3c6cd61ed2612a55a619df
Author: Jiang Fan <[email protected]>
Date: 2015-01-22T21:57:35Z
Adding Power Iteration Clustering and Suite test
commit 3fd5bc895f1594c57a182c31e010966affb47325
Author: sboeschhuawei <[email protected]>
Date: 2015-01-23T00:17:57Z
PIClustering is running in new branch (up to the pseudo-eigenvector
convergence step)
commit 0ef163f89ed82ed72967b51330e16ac3cf5759be
Author: sboeschhuawei <[email protected]>
Date: 2015-01-23T04:20:47Z
Added ConcentricCircles data generation and KMeans clustering
commit 32a90dc5570ea02ee25b80c4440293581416209c
Author: sboeschhuawei <[email protected]>
Date: 2015-01-23T16:48:00Z
Update circles test data values
commit 0700335d7b4fe9132046f034a67eb3405cd20953
Author: sboeschhuawei <[email protected]>
Date: 2015-01-23T22:30:53Z
First end to end working version: but has bad performance issue
commit e5df2b88c3668ecc4bc0cd25cde10dd033b9f72f
Author: sboeschhuawei <[email protected]>
Date: 2015-01-24T04:20:32Z
First end to end working PIC
commit 929426339d9934d61878880b2182bc5e18acee6c
Author: sboeschhuawei <[email protected]>
Date: 2015-01-25T11:00:07Z
Added visualization/plotting of input/output data
commit a2b1e5720266393a1813f0abe43c3709ebf46268
Author: sboeschhuawei <[email protected]>
Date: 2015-01-25T11:21:43Z
Revert inadvertent update to KMeans
commit b7dbcbe56767a8609314a20f24e907c426e827af
Author: sboeschhuawei <[email protected]>
Date: 2015-01-26T00:03:46Z
Added axes and combined into single plot for matplotlib
commit f656c349b059a7df1c6415e69c2010873ba4d2d4
Author: sboeschhuawei <[email protected]>
Date: 2015-01-26T00:04:10Z
Added iris dataset
commit a112f38d0476cee2bb5aa49311ce98b800141f8e
Author: sboeschhuawei <[email protected]>
Date: 2015-01-26T08:42:05Z
Added graphx main and test jars as dependencies to mllib/pom.xml
commit ace9749338c7454d17839dcf98ed75b131a21537
Author: Fan Jiang <[email protected]>
Date: 2015-01-26T18:27:50Z
Update PIClustering.scala
commit b29c0dbf081d8baa30a3a83b57492bf92b2f4b6a
Author: Fan Jiang <[email protected]>
Date: 2015-01-26T18:57:04Z
Update PIClustering.scala
commit bea48eaa0cca25695c283616d86235227357980c
Author: sboeschhuawei <[email protected]>
Date: 2015-01-27T00:58:57Z
Converted custom Linear Algebra datatypes/routines to use Breeze.
commit 90e7fa4b58b6d12f6b04dab3bf5f0a9d50f8d330
Author: sboeschhuawei <[email protected]>
Date: 2015-01-28T02:04:05Z
Converted from custom Linalg routines to Breeze: added JavaDoc comments;
added Markdown documentation
commit be659e31f5d9b1d35561ee43620f36d26732a950
Author: sboeschhuawei <[email protected]>
Date: 2015-01-28T02:06:53Z
Added mllib specific log4j
commit 060e6bf8d45a211a6b71e2cba8e4bf2b14b9e72a
Author: sboeschhuawei <[email protected]>
Date: 2015-01-28T06:49:12Z
Added link to PIC doc from the main clustering md doc
commit 24f438e9c72fcc77691fe5d70f01c1bb577ee874
Author: sboeschhuawei <[email protected]>
Date: 2015-01-28T06:50:29Z
fixed incorrect markdown in clustering doc
commit 88aacc8fa8aa955be2ec81caf001897b2bc91625
Author: sboeschhuawei <[email protected]>
Date: 2015-01-28T19:48:51Z
Add assert to testcase on cluster sizes
commit 43ab10be1c634f88d08f666df71ff15427e8a3d2
Author: sboeschhuawei <[email protected]>
Date: 2015-01-28T19:55:09Z
Change last two println's to log4j logger
commit 218a49d4e74b24bebf94033440904ca7411a28f0
Author: sboeschhuawei <[email protected]>
Date: 2015-01-28T20:38:04Z
Applied Xiangrui's comments - especially removing RDD/PICLinalg classes and
making noncritical methods private
commit 1c3a62ea8d45609e22bf2394a73930b1a334422d
Author: sboeschhuawei <[email protected]>
Date: 2015-01-28T21:23:52Z
removed matplot.py and reordered all private methods to bottom of PIC
commit 121e4d5fc0a0ab61a211fc71fea7a74775feb763
Author: sboeschhuawei <[email protected]>
Date: 2015-01-28T21:33:29Z
Remove unused testing data files
----