GitHub user acflorea opened a pull request:
https://github.com/apache/spark/pull/9699
[SPARK-2344] [MLlib] Add fuzzifier (m) parameter to KMeans to offer support
for Fuzzy CMeans specific training
- Implement support for fuzzy clustering
(https://en.wikipedia.org/wiki/Fuzzy_clustering) in the existing KMeans
classes. The change consists of adding the fuzzifier (m) as a parameter to
the KMeans training phase. If m == 1, the implementation falls back to the
original hard-clustering KMeans.
- Add some extra comments in the KMeans code
- Create unit tests for the fuzzy side of the algorithm, mainly for m=2, as
this is the most commonly used value.
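For reviewers unfamiliar with the role of m, here is a minimal sketch of how
per-point membership weights are usually computed in fuzzy c-means. It is
plain Python, not the Scala code in this PR; the function name
degrees_of_membership and its signature are illustrative assumptions, and the
handling of a point that coincides with a center is one common convention
rather than something taken from the patch.

```python
from math import dist  # available in Python 3.8+


def degrees_of_membership(point, centers, m=2.0):
    """Return one membership weight per center for a single point.

    Hypothetical helper for illustration only; it is not the PR's
    degreesOfMembership method.
    """
    if m < 1.0:
        raise ValueError("fuzzifier m must be >= 1")
    distances = [dist(point, c) for c in centers]

    # m == 1: hard clustering, all weight goes to the closest center.
    if m == 1.0:
        closest = min(range(len(centers)), key=distances.__getitem__)
        return [1.0 if j == closest else 0.0 for j in range(len(centers))]

    # Assumed convention: a point lying exactly on one or more centers
    # belongs only to those centers (avoids dividing by zero).
    zeros = [j for j, d in enumerate(distances) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if j in zeros else 0.0
                for j in range(len(centers))]

    # Textbook fuzzy c-means weight: u_j is proportional to d_j^(-2/(m-1)),
    # normalised so the weights for one point sum to 1.
    exponent = -2.0 / (m - 1.0)
    raw = [d ** exponent for d in distances]
    total = sum(raw)
    return [r / total for r in raw]
```

With m=2, a point at (1, 0) and centers at (0, 0) and (3, 0) get weights of
about 0.8 and 0.2; with m=1 the same call returns [1.0, 0.0], which is the
hard-clustering fallback described above.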
The contribution is my original work and I license the work to the project
under the project's open source license.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/acflorea/spark fuzzy-c-means
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9699.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9699
----
commit 7dcffd5536e729f7db903e1efb0c417a960f73e3
Author: Adrian Florea <[email protected]>
Date: 2015-11-05T20:49:13Z
Add a bunch of comments to KMeans code.
commit 9d6f23894f05d0a3d2e131bab5eb3236331fee6a
Author: Adrian Florea <[email protected]>
Date: 2015-11-06T21:03:28Z
Create degreesOfMembership method - computes the weight for each point with
respect to each cluster
commit daac6545f157484eca4e49c9d37e83f7acaa1c40
Author: Adrian Florea <[email protected]>
Date: 2015-11-08T12:58:32Z
Add distances to degreesOfMembership method, switch to it instead of
findClosest, convert counts to Double, rename it accordingly
commit d4dc4beb7369b29fc6baae3b8ae92df6b2e92c96
Author: Adrian Florea <[email protected]>
Date: 2015-11-08T13:57:22Z
Adjustments for centroid computation. Multiple runs for "two clusters" test
(with a high value for the fuzzyfier and a bad choice of initial centroids the
algorithm fails sometimes)
commit b28fcf4285d489fa051657ca80e21f51070d359f
Author: Adrian Florea <[email protected]>
Date: 2015-11-08T14:31:25Z
Add fuzzifier (m) as parameter, default to Hard Clustering for m==1
commit eed6195d106e2c9448e1c0083abf1ab9034bc618
Author: Adrian Florea <[email protected]>
Date: 2015-11-08T19:59:32Z
Optimizations for membership computation
commit 0e808eddfc41e62856e1e1fece46c531277913a1
Author: Adrian Florea <[email protected]>
Date: 2015-11-11T19:24:04Z
Merge remote-tracking branch 'upstream/master' into fuzzy-c-means
commit 2c0f7a66a55bf187d1a7d93cb4037849403ee414
Author: Adrian Florea <[email protected]>
Date: 2015-11-12T20:34:15Z
Publish m as train parameter, add some fuzzier tests.
commit 490ecd1bc8f668a521a0b9f77b17e69a8e20cb0b
Author: Adrian Florea <[email protected]>
Date: 2015-11-13T18:57:03Z
Merge remote-tracking branch 'upstream/master' into fuzzy-c-means
commit 30d6f2f1b59eb52705f946d6691c205a31afffbb
Author: Adrian Florea <[email protected]>
Date: 2015-11-13T19:29:58Z
Add more fuzzier tests.
----