Good ideas. I'm fine with integrating the code into Scikit-Learn even though it's
a bit of work. :) I've pushed the first version of the code to the feature
branch "kmedoids":

https://github.com/terkkila/scikit-learn/blob/kmedoids/sklearn/cluster/k_medoids_.py

I've added drafts of the "clustering" and "distance_metric" arguments.
Please take a look and comment.


Cheers,
Timo


On Fri, Jul 31, 2015 at 7:59 PM, Sebastian Raschka <se.rasc...@gmail.com>
wrote:

> To address the efficiency issue for large datasets (to some extent), we
> could maybe have a `clustering` argument where `clustering='pam'` or
> `clustering='clara'`; 'pam' should probably be the default.
>
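> In a nutshell, CLARA repeatedly draws random subsamples (each much smaller
> than n_samples), applies PAM to each one, and keeps the best clustering, i.e.
> the set of medoids with the lowest total dissimilarity over the full
> dataset. Here are some good resources for PAM and CLARA: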
>
> - PAM (Partitioning Around Medoids):
> https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/Partitioning_Around_Medoids_(PAM)
> - CLARA (Clustering for Large Applications):
> https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/CLARA
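The PAM/CLARA split described above could look roughly like the following minimal sketch. It assumes Euclidean distances. It also uses a simplified PAM (a greedy BUILD step followed by first-improvement SWAP passes) instead of following the original procedure exactly. The function names are hypothetical. The defaults `n_draws=5` and `sample_size=min(n, 40 + 2*k)` are illustrative and, as far as I know, mirror the defaults of R's `cluster::clara`. The key point for efficiency is that CLARA scores each candidate medoid set using only an (n, k) distance matrix, never the full (n, n) one.

```python
import numpy as np


def euclidean(A, B):
    """Pairwise Euclidean distances between the rows of A and the rows of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))


def total_cost(D, medoids):
    """Sum of distances from every point to its closest medoid."""
    return D[:, medoids].min(axis=1).sum()


def pam(D, k, max_iter=100):
    """Simplified PAM on a precomputed (n, n) distance matrix D."""
    n = D.shape[0]
    # BUILD: start from the most central point, then greedily add the
    # candidate that lowers the total cost the most.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        costs = [total_cost(D, medoids + [c]) for c in candidates]
        medoids.append(candidates[int(np.argmin(costs))])
    # SWAP: accept any medoid/non-medoid exchange that lowers the cost,
    # until a full pass finds no improvement.
    best = total_cost(D, medoids)
    for _ in range(max_iter):
        improved = False
        for j in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids.copy()
                trial[j] = o
                cost = total_cost(D, trial)
                if cost < best:
                    best, medoids, improved = cost, trial, True
        if not improved:
            break
    return np.array(medoids), best


def clara(X, k, n_draws=5, sample_size=None, seed=0):
    """CLARA: run PAM on random subsamples, keep the best medoids overall."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    if sample_size is None:
        sample_size = min(n, 40 + 2 * k)
    best_medoids, best_cost = None, np.inf
    for _ in range(n_draws):
        idx = rng.choice(n, size=sample_size, replace=False)
        local, _ = pam(euclidean(X[idx], X[idx]), k)
        medoids = idx[local]  # map back to indices in the full dataset
        # Score on ALL points: only an (n, k) distance matrix is needed.
        cost = euclidean(X, X[medoids]).min(axis=1).sum()
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
medoids, cost = clara(X, k=2)
print(sorted(medoids.tolist()), cost)  # [0, 3] 4.0
```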
>
> Best,
> Sebastian
>
>
> On Jul 31, 2015, at 12:46 PM, Andreas Mueller <t3k...@gmail.com> wrote:
>
> Cool.
> Including the code in scikit-learn is often a bit of a process, but it
> might indeed be interesting.
> You could just start with a pull request, or publish a gist if you don't
> think you'll have time to work on the inclusion and leave that part to
> someone else.
>
> Cheers,
> Andy
>
> On 07/31/2015 05:38 AM, Timo Erkkilä wrote:
>
> That makes sense. The basic implementation is definitely short, just ~20
> lines of code if you don't count comments etc. I can make the source code
> available so that you can judge whether it's worth taking further. I am
> familiar with the documentation tools you use in Scikit-Learn (Sphinx with
> NumPy-style docstrings), but that's further down the line.
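One way to get a k-medoids in about twenty lines is the alternating (Voronoi-iteration) heuristic rather than full PAM. It assigns each point to its nearest medoid, then moves each medoid to the cluster member with the smallest summed distance to the other members, and repeats until the medoids stop changing. The sketch below is a minimal version of that heuristic on a precomputed distance matrix. It is not Timo's implementation, and the function name `kmedoids_alternate` and its `init` parameter are hypothetical.

```python
import numpy as np


def kmedoids_alternate(D, k, max_iter=100, init=None, seed=0):
    """Alternating k-medoids on a precomputed (n, n) distance matrix D.

    `init` optionally gives the starting medoid indices; otherwise k
    distinct points are drawn at random.
    """
    n = D.shape[0]
    if init is None:
        medoids = np.random.default_rng(seed).choice(n, size=k, replace=False)
    else:
        medoids = np.asarray(init)
    for _ in range(max_iter):
        # Assignment step: every point joins its closest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # Update step: pick the member with the smallest summed
            # distance to the other members of its cluster.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
medoids, labels = kmedoids_alternate(D, 2, init=[0, 1])
print(medoids, labels)  # [0 3] [0 0 0 1 1 1]
```

Each medoid only moves within its current cluster. That makes this heuristic cheap per iteration, but it can stop at a worse local optimum than PAM's swap search.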
>
>
> Cheers,
> Timo
>
> On Fri, Jul 31, 2015 at 10:53 AM, Gael Varoquaux <
> gael.varoqu...@normalesup.org> wrote:
>
>> > Is it required that an algorithm, which is implemented in Scikit-Learn,
>> > scales well wrt n_samples?
>>
>> The requirement is 'be actually useful', which is something that is a bit
>> hard to judge :).
>>
>> I think that K-medoids is borderline on this requirement, probably on the
>> right side of the border. I would tend to say that if the code is clean and
>> reasonably short (that last requirement is important), and it comes with
>> good tests, examples and documentation, it should be possible to merge it
>> in.
>>
>> Sorry, we are indeed being picky. It's a struggle to find the right
>> feature set to keep the package maintainable while providing great value
>> to our users.
>>
>> Gaël
>>
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>