Hi, Timo,

wow, the code is really short, well organized, and commented. But it's probably better to submit a pull request so that people can comment directly on sections of the code and get notifications and updates.

Best,
Sebastian

> On Jul 31, 2015, at 4:35 PM, Timo Erkkilä <timo.erkk...@gmail.com> wrote:
>
> Good ideas. I'm fine integrating the code to Scikit-Learn even though it's a
> bit of work. :) I've pushed the first version of the code under feature
> branch "kmedoids":
>
> https://github.com/terkkila/scikit-learn/blob/kmedoids/sklearn/cluster/k_medoids_.py
>
> I've added drafts of the "clustering" and "distance_metric" arguments. Please
> take a look and comment.
>
> Cheers,
> Timo
>
> On Fri, Jul 31, 2015 at 7:59 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> To address the efficiency issue for large datasets (to some extent), we could
> maybe have a `clustering` argument where `clustering='pam'` or
> `clustering='clara'`; 'pam' should probably be the default.
>
> In a nutshell, CLARA repeatedly draws random samples (k < n_samples), applies
> PAM to them, and finds the best clustering. Here are some good resources for
> PAM and CLARA:
>
> - PAM (Partitioning Around Medoids):
>   https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/Partitioning_Around_Medoids_(PAM)
> - CLARA (Clustering for Large Applications):
>   https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/CLARA
>
> Best,
> Sebastian
>
>> On Jul 31, 2015, at 12:46 PM, Andreas Mueller <t3k...@gmail.com> wrote:
>>
>> Cool.
>> Including the code in scikit-learn is often a bit of a process but it might
>> indeed be interesting.
>> You could just start with a pull request - or publish a gist if you don't
>> think you'll have time to work on the inclusion and leave that part to
>> someone else.
>>
>> Cheers,
>> Andy
>>
>> On 07/31/2015 05:38 AM, Timo Erkkilä wrote:
>>> That makes sense. The basic implementation is definitely short, just ~20
>>> lines of code if you don't count comments etc. I can make the source code
>>> available so that you can judge whether it's good to take further. I am
>>> familiar with the documentation libraries you are using (Sphinx with Numpy
>>> style docstrings) in Scikit-Learn, but that's further down the line.
>>>
>>> Cheers,
>>> Timo
>>>
>>> On Fri, Jul 31, 2015 at 10:53 AM, Gael Varoquaux
>>> <gael.varoqu...@normalesup.org> wrote:
>>> > Is it required that an algorithm, which is implemented in Scikit-Learn,
>>> > scales well wrt n_samples?
>>>
>>> The requirement is 'be actually useful', which is something that is a bit
>>> hard to judge :).
>>>
>>> I think that K-medoids is borderline on this requirement, probably on the
>>> right side of the border. I would tend to say that if the code is clean and
>>> reasonably short (that last requirement is important), and it comes with
>>> good tests, examples and documentation, it should be possible to merge it
>>> in.
>>>
>>> Sorry, we are indeed being picky. It's a struggle to find the right
>>> feature set to keep the package maintainable while providing great value
>>> to our users.
>>>
>>> Gaël

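The CLARA idea from the quoted thread above (draw random subsamples, run PAM on each, keep the best clustering) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the "kmedoids" branch: the names `pam`, `clara`, `n_draws`, `sample_size` and `seed` are hypothetical, the distance is assumed to be Euclidean, and the `pam` helper uses a random start followed by greedy swaps instead of the BUILD step of the original PAM.

```python
import numpy as np


def pairwise_euclidean(A, B):
    """Euclidean distances between the rows of A and the rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def total_cost(D, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return D[:, medoids].min(axis=1).sum()


def pam(D, k, rng):
    """Swap-based k-medoids on a precomputed distance matrix D.

    Simplification (assumption of this sketch): medoids start from a
    random draw rather than the BUILD step of the original PAM.
    """
    n = D.shape[0]
    medoids = list(rng.choice(n, size=k, replace=False))
    best = total_cost(D, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                # Try replacing medoid i with a non-medoid point.
                trial = medoids.copy()
                trial[i] = candidate
                cost = total_cost(D, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return np.array(medoids)


def clara(X, k, n_draws=5, sample_size=None, seed=0):
    """CLARA sketch: PAM on random subsamples, scored on the full data."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    if sample_size is None:
        # Default chosen for this sketch only.
        sample_size = min(n, 40 + 2 * k)
    best_medoids, best_cost = None, np.inf
    for _ in range(n_draws):
        idx = rng.choice(n, size=sample_size, replace=False)
        local = pam(pairwise_euclidean(X[idx], X[idx]), k, rng)
        medoids = idx[local]  # map sample positions back to rows of X
        # Score the candidate medoids against all samples, not just the draw.
        cost = pairwise_euclidean(X, X[medoids]).min(axis=1).sum()
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    labels = pairwise_euclidean(X, X[best_medoids]).argmin(axis=1)
    return best_medoids, labels, best_cost
```

Calling `clara(X, k=2)` on an `(n_samples, n_features)` array returns the indices of the chosen medoids, a label per sample, and the total cost. Only the subsample distance matrix is ever quadratic in size, which is where the saving over plain PAM on large datasets would come from.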
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general