GitHub user acflorea opened a pull request:

    https://github.com/apache/spark/pull/9699

    [SPARK-2344] [MLlib] Add fuzzifier (m) parameter to KMeans to offer support
    for Fuzzy CMeans specific training

    - Implement support for fuzzy clustering
      (https://en.wikipedia.org/wiki/Fuzzy_clustering) in the existing KMeans
      classes. The change consists of adding the fuzzifier factor (m) as a
      parameter to the KMeans training phase. If m == 1, the implementation
      falls back to the original hard-clustering KMeans.
    
    - Add extra comments to the KMeans code.
    
    - Create unit tests for the fuzzy side of the algorithm, mainly for m = 2,
      as this is the most commonly used value.
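For readers new to fuzzy clustering, the role of m can be sketched with the standard fuzzy c-means membership rule described on the linked Wikipedia page. This is a plain-Python illustration, not the Scala code in this PR. The names `squared_distance` and `degrees_of_membership` are hypothetical, and the handling of a point that coincides exactly with a center is an assumption of this sketch.

```python
from typing import List, Sequence

# Hypothetical illustration of the standard fuzzy c-means membership rule,
# not the Scala code from PR #9699.

def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def degrees_of_membership(point: Sequence[float],
                          centers: Sequence[Sequence[float]],
                          m: float) -> List[float]:
    """Return the membership of `point` in each cluster (values sum to 1)."""
    if m < 1:
        raise ValueError("fuzzifier m must be >= 1")
    d2 = [squared_distance(point, c) for c in centers]
    n = len(centers)
    if m == 1:
        # Hard clustering: all weight goes to the closest center (plain k-means).
        best = min(range(n), key=lambda j: d2[j])
        return [1.0 if j == best else 0.0 for j in range(n)]
    zero = [j for j, d in enumerate(d2) if d == 0.0]
    if zero:
        # Assumption: a point sitting on center(s) is shared equally among them.
        return [1.0 / len(zero) if j in zero else 0.0 for j in range(n)]
    # u_j = 1 / sum_k (d_j / d_k)^(2/(m-1)); with squared distances the
    # exponent becomes 1/(m-1).
    exp = 1.0 / (m - 1.0)
    return [1.0 / sum((d2[j] / d2[k]) ** exp for k in range(n))
            for j in range(n)]
```

With m = 2, a point at squared distances 1 and 9 from two centers gets memberships of roughly 0.9 and 0.1. As m grows, memberships flatten toward 1/c; as m approaches 1 from above, they approach the hard 0/1 assignment.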
    
    The contribution is my original work and I license the work to the project
    under the project's open source license.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/acflorea/spark fuzzy-c-means

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9699.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9699
    
----
commit 7dcffd5536e729f7db903e1efb0c417a960f73e3
Author: Adrian Florea <[email protected]>
Date:   2015-11-05T20:49:13Z

    Add a bunch of comments to KMeans code.

commit 9d6f23894f05d0a3d2e131bab5eb3236331fee6a
Author: Adrian Florea <[email protected]>
Date:   2015-11-06T21:03:28Z

    Create degreesOfMembership method - computes the weight for each point with
    respect to each cluster

commit daac6545f157484eca4e49c9d37e83f7acaa1c40
Author: Adrian Florea <[email protected]>
Date:   2015-11-08T12:58:32Z

    Add distances to degreesOfMembership method, switch to it instead of
    findClosest, convert counts to Double, rename it accordingly

commit d4dc4beb7369b29fc6baae3b8ae92df6b2e92c96
Author: Adrian Florea <[email protected]>
Date:   2015-11-08T13:57:22Z

    Adjustments for centroid computation. Multiple runs for "two clusters" test
    (with a high value for the fuzzyfier and a bad choice of initial centroids the
    algorithm fails sometimes)

commit b28fcf4285d489fa051657ca80e21f51070d359f
Author: Adrian Florea <[email protected]>
Date:   2015-11-08T14:31:25Z

    Add fuzzifier (m) as parameter, default to Hard Clustering for m==1

commit eed6195d106e2c9448e1c0083abf1ab9034bc618
Author: Adrian Florea <[email protected]>
Date:   2015-11-08T19:59:32Z

    Optimizations for membership computation

commit 0e808eddfc41e62856e1e1fece46c531277913a1
Author: Adrian Florea <[email protected]>
Date:   2015-11-11T19:24:04Z

    Merge remote-tracking branch 'upstream/master' into fuzzy-c-means

commit 2c0f7a66a55bf187d1a7d93cb4037849403ee414
Author: Adrian Florea <[email protected]>
Date:   2015-11-12T20:34:15Z

    Publish m as train parameter, add some fuzzier tests.

commit 490ecd1bc8f668a521a0b9f77b17e69a8e20cb0b
Author: Adrian Florea <[email protected]>
Date:   2015-11-13T18:57:03Z

    Merge remote-tracking branch 'upstream/master' into fuzzy-c-means

commit 30d6f2f1b59eb52705f946d6691c205a31afffbb
Author: Adrian Florea <[email protected]>
Date:   2015-11-13T19:29:58Z

    Add more fuzzier tests.

----
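The commits above mention adjusting the centroid computation and converting counts to Double. Under the standard fuzzy c-means update (an assumption here; the patch itself is not reproduced in this email), each center becomes the mean of all points weighted by u^m, so per-cluster "counts" become fractional. A minimal sketch in plain Python, with a hypothetical `update_centers` function:

```python
from typing import List, Sequence

# Hypothetical illustration of the standard fuzzy c-means center update,
# not the Scala code from PR #9699.

def update_centers(points: Sequence[Sequence[float]],
                   memberships: Sequence[Sequence[float]],
                   m: float) -> List[List[float]]:
    """Return each center as the u**m-weighted mean of all points.

    memberships[i][j] is the degree of membership of points[i] in cluster j.
    """
    num_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for j in range(num_clusters):
        # Weights are fractional, so the per-cluster "count" is a float.
        weight_sum = 0.0
        acc = [0.0] * dim
        for x, u in zip(points, memberships):
            w = u[j] ** m
            weight_sum += w
            for d in range(dim):
                acc[d] += w * x[d]
        if weight_sum == 0.0:
            raise ValueError(f"cluster {j} has zero total weight")
        centers.append([a / weight_sum for a in acc])
    return centers
```

Alternating this update with a membership step until the centers stop moving gives the usual fuzzy c-means loop. With m = 1 and 0/1 memberships, the update reduces to the ordinary k-means mean of the assigned points, which matches the fallback for m == 1 described in the PR.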


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
