Hi, Timo,
wow, the code is really short, well organized, and commented. But it's probably 
better to submit a pull request so that people can comment directly on sections 
of the code and get notifications and updates.

Best,
Sebastian
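
P.S. The CLARA idea from the quoted message below (draw a random subsample, run 
PAM on it, keep the best result) could be sketched roughly like this in plain 
Python. This is only an illustration under stated assumptions, not the code in 
Timo's branch: the function names (`dist`, `total_cost`, `pam`, `clara`) and the 
parameters (`n_draws`, `sample_size`, `seed`) are hypothetical, the PAM part is 
simplified (random initial medoids instead of the BUILD step, followed by greedy 
swaps), and a real implementation would likely use NumPy and a configurable 
distance metric.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def pam(points, n_clusters, rng):
    """Simplified PAM: random start, then greedy medoid/non-medoid swaps.

    Assumption: the BUILD initialization of the original PAM is replaced
    by a random choice of initial medoids to keep the sketch short.
    """
    medoids = rng.sample(range(len(points)), n_clusters)
    best = total_cost(points, [points[i] for i in medoids])
    improved = True
    while improved:
        improved = False
        for pos in range(n_clusters):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                # Try replacing the medoid at `pos` with the candidate point.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(points, [points[i] for i in trial])
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return [points[i] for i in medoids]


def clara(points, n_clusters, n_draws=5, sample_size=40, seed=0):
    """CLARA sketch: run PAM on random subsamples and keep the medoids
    that give the lowest cost on the full data set."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    best_medoids, best_cost = None, math.inf
    for _ in range(n_draws):
        sample = rng.sample(points, sample_size)
        medoids = pam(sample, n_clusters, rng)
        cost = total_cost(points, medoids)  # scored on ALL points
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    # Assign every point to the index of its nearest medoid.
    labels = [min(range(n_clusters), key=lambda j: dist(p, best_medoids[j]))
              for p in points]
    return best_medoids, labels, best_cost


# Usage: two well-separated 2-D blobs (made-up data).
data_rng = random.Random(42)
data = ([(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(100)]
        + [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(100)])
medoids, labels, cost = clara(data, n_clusters=2)
```

The point of the design is that the expensive swap search only ever sees 
`sample_size` points, while each candidate set of medoids is still scored on the 
full data set.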

> On Jul 31, 2015, at 4:35 PM, Timo Erkkilä <timo.erkk...@gmail.com> wrote:
> 
> Good ideas. I'm fine with integrating the code into Scikit-Learn even though 
> it's a bit of work. :) I've pushed the first version of the code to the 
> feature branch "kmedoids":
> 
> https://github.com/terkkila/scikit-learn/blob/kmedoids/sklearn/cluster/k_medoids_.py
> 
> I've added drafts of the "clustering" and "distance_metric" arguments. Please 
> take a look and comment.
> 
> 
> Cheers,
> Timo
> 
> 
> On Fri, Jul 31, 2015 at 7:59 PM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> To address the efficiency issue for large datasets (to some extent), we could 
> maybe have a `clustering` argument where `clustering='pam'` or 
> `clustering='clara'`; 'pam' should probably be the default.
> 
> In a nutshell, CLARA repeatedly draws random subsamples (each smaller than 
> n_samples), applies PAM to each of them, and keeps the best clustering. Here 
> are some good resources for PAM and CLARA:
> 
> - PAM (Partitioning Around Medoids): 
> https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/Partitioning_Around_Medoids_(PAM)
> - CLARA (Clustering for Large Applications): 
> https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/CLARA
> 
> Best,
> Sebastian
> 
> 
>> On Jul 31, 2015, at 12:46 PM, Andreas Mueller <t3k...@gmail.com> wrote:
>> 
>> Cool.
>> Including the code in scikit-learn is often a bit of a process but it might 
>> indeed be interesting.
>> You could just start with a pull request - or publish a gist if you don't 
>> think you'll have time to work on the inclusion and leave that part to 
>> someone else.
>> 
>> Cheers,
>> Andy
>> 
>> On 07/31/2015 05:38 AM, Timo Erkkilä wrote:
>>> That makes sense. The basic implementation is definitely short, just ~20 
>>> lines of code if you don't count comments etc. I can make the source code 
>>> available so that you can judge whether it's worth taking further. I am 
>>> familiar with the documentation libraries you are using (Sphinx with NumPy 
>>> style docstrings) in Scikit-Learn, but that's further down the line.
>>> 
>>> 
>>> Cheers,
>>> Timo
>>> 
>>> On Fri, Jul 31, 2015 at 10:53 AM, Gael Varoquaux 
>>> <gael.varoqu...@normalesup.org> wrote:
>>> > Is it required that an algorithm, which is implemented in Scikit-Learn, 
>>> > scales well wrt n_samples?
>>> 
>>> The requirement is 'be actually useful', which is something that is a bit
>>> hard to judge :).
>>> 
>>> I think that K-medoids is borderline on this requirement, probably on the
>>> right side of the border. I would tend to say that if the code is clean and
>>> reasonably short (that last requirement is important), and it comes with
>>> good tests, examples and documentation, it should be possible to merge it
>>> in.
>>> 
>>> Sorry, we are indeed being picky. It's a struggle to find the right
>>> feature set to keep the package maintainable while providing great value
>>> to our users.
>>> 
>>> Gaël
>>> 
>>> ------------------------------------------------------------------------------
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general