+1

On 07/31/2015 04:50 PM, Sebastian Raschka wrote:
Hi, Timo,
wow, the code is really short, well organized, and commented. But it's probably better to submit a pull request so that people can comment directly on sections of the code and get notifications and updates.

Best,
Sebastian

On Jul 31, 2015, at 4:35 PM, Timo Erkkilä <timo.erkk...@gmail.com> wrote:

Good ideas. I'm fine integrating the code into Scikit-Learn even though it's a bit of work. :) I've pushed the first version of the code to the feature branch "kmedoids":

https://github.com/terkkila/scikit-learn/blob/kmedoids/sklearn/cluster/k_medoids_.py

I've added drafts of the "clustering" and "distance_metric" arguments. Please take a look and comment.
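One way a `distance_metric` argument could work is to convert the input into a dense dissimilarity matrix up front, so that the medoid search only ever sees distances. A minimal sketch, assuming scikit-learn's `sklearn.metrics.pairwise_distances`; the helper name `medoid_distances` is hypothetical and not taken from the `kmedoids` branch:

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def medoid_distances(X, distance_metric="euclidean"):
    """Hypothetical helper: map a `distance_metric` argument to a dense
    (n_samples, n_samples) dissimilarity matrix for the medoid search."""
    # pairwise_distances accepts metric names such as "euclidean" or
    # "manhattan", a callable taking two 1-D arrays, or "precomputed"
    # (in which case X must already be a square distance matrix).
    return pairwise_distances(np.asarray(X, dtype=float), metric=distance_metric)


X = np.array([[0.0, 0.0], [3.0, 4.0]])
print(medoid_distances(X))               # Euclidean: off-diagonal 5.0
print(medoid_distances(X, "manhattan"))  # Manhattan: off-diagonal 7.0
```

Delegating to `pairwise_distances` means a user-supplied callable also works, e.g. `medoid_distances(X, lambda a, b: np.abs(a - b).max())` for a Chebyshev-style distance.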


Cheers,
Timo


On Fri, Jul 31, 2015 at 7:59 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:

    To address the efficiency issue for large datasets (to some
    extent), we could perhaps have a `clustering` argument that
    accepts `clustering='pam'` or `clustering='clara'`; 'pam' should
    probably be the default.

    In a nutshell, CLARA repeatedly draws random subsamples (each
    smaller than n_samples), applies PAM to each one, and keeps the
    medoids that give the lowest total dissimilarity over the full
    dataset. Here are some good resources for PAM and CLARA:

    - PAM (Partitioning Around Medoids):
    https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/Partitioning_Around_Medoids_(PAM)

    - CLARA (Clustering for Large Applications):
    https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/CLARA
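To make the proposed `clustering='pam'` / `clustering='clara'` switch concrete, here is a rough sketch of how the two could fit together, assuming NumPy and scikit-learn's `sklearn.metrics.pairwise_distances`. The names `pam`, `clara`, and `k_medoids` are hypothetical and not taken from the `kmedoids` branch. The SWAP step is a simple first-improvement search rather than PAM's original best-swap search, each CLARA subsample is drawn independently, and the defaults (5 draws, `40 + 2 * n_clusters` points per draw) follow R's `cluster::clara`.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def _total_cost(D, medoids):
    # Sum over all points of the dissimilarity to their nearest medoid.
    return D[:, medoids].min(axis=1).sum()


def pam(D, n_clusters):
    """Simplified PAM on a square dissimilarity matrix D; returns medoid indices."""
    n = D.shape[0]
    # BUILD: start from the most central point, then greedily add the
    # point that lowers the total cost the most.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < n_clusters:
        candidates = [i for i in range(n) if i not in medoids]
        costs = [_total_cost(D, medoids + [c]) for c in candidates]
        medoids.append(candidates[int(np.argmin(costs))])
    # SWAP (first-improvement variant): exchange a medoid with a
    # non-medoid whenever that lowers the cost; stop when no swap helps.
    best = _total_cost(D, medoids)
    improved = True
    while improved:
        improved = False
        for m in range(n_clusters):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[m] = h
                cost = _total_cost(D, trial)
                if cost < best:
                    best, medoids, improved = cost, trial, True
    return np.array(medoids)


def clara(X, n_clusters, n_draws=5, sample_size=None, metric="euclidean",
          random_state=0):
    """CLARA sketch: run PAM on random subsamples, keep the best medoids overall."""
    rng = np.random.default_rng(random_state)
    n = X.shape[0]
    if sample_size is None:
        # Same default sample size as R's cluster::clara.
        sample_size = min(n, 40 + 2 * n_clusters)
    best_medoids, best_cost = None, np.inf
    for _ in range(n_draws):
        idx = rng.choice(n, size=sample_size, replace=False)
        medoids = idx[pam(pairwise_distances(X[idx], metric=metric), n_clusters)]
        # Score the candidate medoids on the full dataset, not the sample.
        cost = pairwise_distances(X, X[medoids], metric=metric).min(axis=1).sum()
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids


def k_medoids(X, n_clusters, clustering="pam", metric="euclidean", **clara_kwargs):
    """Hypothetical entry point dispatching on the proposed `clustering` argument."""
    X = np.asarray(X, dtype=float)
    if clustering == "pam":
        medoids = pam(pairwise_distances(X, metric=metric), n_clusters)
    elif clustering == "clara":
        medoids = clara(X, n_clusters, metric=metric, **clara_kwargs)
    else:
        raise ValueError("clustering must be 'pam' or 'clara', got %r" % (clustering,))
    # Final assignment only needs an (n_samples, n_clusters) distance matrix.
    labels = pairwise_distances(X, X[medoids], metric=metric).argmin(axis=1)
    return medoids, labels
```

Only the `clara` path avoids building the full n_samples x n_samples matrix; its largest matrices are the (sample_size, sample_size) one inside each draw and the (n_samples, n_clusters) one used for scoring and labeling.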

    Best,
    Sebastian


    On Jul 31, 2015, at 12:46 PM, Andreas Mueller <t3k...@gmail.com> wrote:

    Cool.
    Including the code in scikit-learn is often a bit of a process,
    but it might indeed be interesting.
    You could just start with a pull request, or publish a gist if
    you don't think you'll have time to work on the inclusion and
    leave that part to someone else.

    Cheers,
    Andy

    On 07/31/2015 05:38 AM, Timo Erkkilä wrote:
    That makes sense. The basic implementation is definitely short,
    just ~20 lines of code if you don't count comments etc. I can
    make the source code available so that you can judge whether
    it's worth taking further. I am familiar with the documentation
    tools you are using in Scikit-Learn (Sphinx with NumPy-style
    docstrings), but that's further down the line.
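As an illustration of how short a basic version can be, here is a sketch of the simple alternating k-medoids update (assign each point to its nearest medoid, then move each medoid to the member with the smallest summed dissimilarity), written against a precomputed dissimilarity matrix `D`. This is an assumption about what a "basic implementation" might look like, not Timo's code, and `kmedoids_alternate` is a hypothetical name. It is cheaper per iteration than PAM's swap search but typically more sensitive to the random initialization.

```python
import numpy as np


def kmedoids_alternate(D, n_clusters, max_iter=100, random_state=0):
    """Basic alternating k-medoids on a square dissimilarity matrix D."""
    rng = np.random.default_rng(random_state)
    n = D.shape[0]
    medoids = rng.choice(n, size=n_clusters, replace=False)
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for k in range(n_clusters):
            members = np.flatnonzero(labels == k)
            if members.size == 0:  # only if two medoids are at distance 0
                continue
            # Update step: the member with the smallest summed
            # dissimilarity to the rest of its cluster becomes the medoid.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[k] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # converged: the medoids no longer change
        medoids = new_medoids
    return medoids, np.argmin(D[:, medoids], axis=1)
```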


    Cheers,
    Timo

    On Fri, Jul 31, 2015 at 10:53 AM, Gael Varoquaux
    <gael.varoqu...@normalesup.org> wrote:

        > Is it required that an algorithm, which is implemented in
        > Scikit-Learn, scales well wrt n_samples?
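A straightforward PAM implementation works on the full pairwise dissimilarity matrix, so its memory grows quadratically with n_samples; that is the concern the CLARA suggestion in this thread addresses. A back-of-the-envelope sketch, assuming a dense float64 matrix (8 bytes per entry); `dense_matrix_gb` is a hypothetical helper:

```python
def dense_matrix_gb(n_samples, bytes_per_entry=8):
    """Memory (in GB) of a dense n_samples x n_samples matrix."""
    return n_samples ** 2 * bytes_per_entry / 1e9


# 1,000 samples fit easily; 100,000 samples need about 80 GB.
for n in (1_000, 10_000, 100_000):
    print(f"n_samples={n:>7,}: {dense_matrix_gb(n):g} GB")
```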

        The requirement is 'be actually useful', which is something
        that is a bit hard to judge :).

        I think that K-medoids is borderline on this requirement,
        probably on the right side of the border. I would tend to say
        that if the code is clean and reasonably short (that last
        requirement is important) and it comes with good tests,
        examples, and documentation, it should be possible to merge
        it in.

        Sorry, we are indeed being picky. It's a struggle to find the
        right feature set to keep the package maintainable while
        providing great value to our users.

        Gaël

        
------------------------------------------------------------------------------
        _______________________________________________
        Scikit-learn-general mailing list
        Scikit-learn-general@lists.sourceforge.net
        <mailto:Scikit-learn-general@lists.sourceforge.net>
        https://lists.sourceforge.net/lists/listinfo/scikit-learn-general




    