Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Andreas Mueller
+1 On 07/31/2015 04:50 PM, Sebastian Raschka wrote: Hi, Timo, wow, the code is really short, well organized and commented. But it's probably better to submit a pull request so that people can directly comment on sections of the code and get notifications and updates. Best, Sebastian On Jul 31,

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Sebastian Raschka
Hi, Timo, wow, the code is really short, well organized and commented. But it's probably better to submit a pull request so that people can directly comment on sections of the code and get notifications and updates. Best, Sebastian > On Jul 31, 2015, at 4:35 PM, Timo Erkkilä wrote: > > Good idea

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Timo Erkkilä
Good ideas. I'm fine with integrating the code into Scikit-Learn even though it's a bit of work. :) I've pushed the first version of the code under feature branch "kmedoids": https://github.com/terkkila/scikit-learn/blob/kmedoids/sklearn/cluster/k_medoids_.py I've added drafts of the "clustering" and "d

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Sebastian Raschka
To address the efficiency issue for large datasets (to some extent), we could maybe have a `clustering` argument where `clustering='pam'` or `clustering='clara'`; 'pam' should probably be the default. In a nutshell, CLARA repeatedly draws random samples (k < n_samples), applies PAM to them, and
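
The CLARA idea described in this message can be sketched roughly as follows. This is only an illustration, not the code from the "kmedoids" branch: the names `total_cost`, `pam_swap` and `clara` are hypothetical, Euclidean distance is assumed, and the inner solver is a simplified PAM-style swap search with a random start instead of the full BUILD phase. The last step, keeping the medoids with the lowest cost over the full dataset, follows the standard textbook description of CLARA.

```python
import numpy as np


def total_cost(X, medoid_idx):
    """Sum of Euclidean distances from every row of X to its nearest medoid."""
    d = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=2)
    return d.min(axis=1).sum()


def pam_swap(X, n_clusters, rng):
    """Simplified PAM (assumption: random start, then greedy swaps)."""
    n = X.shape[0]
    medoids = list(rng.choice(n, size=n_clusters, replace=False))
    best = total_cost(X, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(n_clusters):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                # Try replacing medoid i with a non-medoid sample.
                trial = medoids.copy()
                trial[i] = candidate
                cost = total_cost(X, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return np.array(medoids)


def clara(X, n_clusters, n_draws=5, sample_size=40, seed=0):
    """Hypothetical CLARA sketch: PAM on subsamples, scored on all of X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sample_size = min(sample_size, n)
    best_medoids, best_cost = None, np.inf
    for _ in range(n_draws):
        # Draw a random subsample without replacement.
        sample = rng.choice(n, size=sample_size, replace=False)
        local = pam_swap(X[sample], n_clusters, rng)
        medoids = sample[local]          # map back to row indices of X
        cost = total_cost(X, medoids)    # evaluate on the full dataset
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost
```

Because PAM only ever runs on `sample_size` rows, the quadratic cost is paid on the subsample rather than on all of `n_samples`; the price is that the returned medoids are only as good as the best subsample drawn.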

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Andreas Mueller
Cool. Including the code in scikit-learn is often a bit of a process but it might indeed be interesting. You could just start with a pull request - or publish a gist if you don't think you'll have time to work on the inclusion and leave that part to someone else. Cheers, Andy On 07/31/2015 0

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Timo Erkkilä
That makes sense. The basic implementation is definitely short, just ~20 lines of code if you don't count comments etc. I can make the source code available so that you can judge whether it's good to take further. I am familiar with the documentation libraries you are using (Sphinx with NumPy style

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Gael Varoquaux
> Is it required that an algorithm, which is implemented in Scikit-Learn, scales > well wrt n_samples? The requirement is 'be actually useful', which is something that is a bit hard to judge :). I think that K-medoids is borderline on this requirement, probably on the right side of the border. I

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Timo Erkkilä
I was using a dynamic time warping (DTW) distance with KMedoids, which made more sense than using Euclidean distance since the profiles indeed had warps along the time axis. The DTW implementation was taken from MLPY since it's not in Scikit-Learn either. Is it required that an algorithm, which is imp
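
To illustrate why DTW can suit time-warped profiles better than Euclidean distance, here is a minimal sketch of the textbook DTW recurrence together with a precomputed pairwise distance matrix of the kind a k-medoids routine could consume. It is not the MLPY implementation mentioned above; `dtw_distance` and `pairwise_dtw` are hypothetical names, and 1-D series with an absolute-difference step cost are assumed.

```python
import numpy as np


def dtw_distance(a, b):
    """Textbook DTW between two 1-D sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    # acc[i, j] = cheapest cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            acc[i, j] = step + min(acc[i - 1, j],      # repeat b[j-1]
                                   acc[i, j - 1],      # repeat a[i-1]
                                   acc[i - 1, j - 1])  # advance both
    return acc[n, m]


def pairwise_dtw(series):
    """Symmetric matrix of DTW distances between all pairs of sequences."""
    n = len(series)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dtw_distance(series[i], series[j])
    return D


# Two profiles with the same shape, shifted by one step along the time axis.
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
print(dtw_distance(a, b))                          # 0.0
print(np.linalg.norm(np.array(a) - np.array(b)))   # 2.0
```

A medoid is always one of the input samples, so nothing beyond the matrix `D` is needed: for a single cluster, `np.argmin(D.sum(axis=1))` picks the member with the smallest total distance to the others. Averaging DTW-aligned series, as k-means would require, has no equally simple definition.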

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-30 Thread Sebastian Raschka
Yes, it may be far more expensive than k-means. I just used it with Euclidean distance -- was for a comparison. I think k-medoids can still be useful for smaller, maybe noisier datasets, or if you have some distance measure where calculating averages may not make sense. > On Jul 30, 2015, at 2:4

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-30 Thread Andreas Mueller
I think KMedoids has come up before. One issue is that it doesn't really scale to large n_samples, right? There is an implementation mentioned here: https://github.com/scikit-learn/scikit-learn/issues/3799 Do you use it because you have a custom distance matrix? On 07/30/2015 02:27 PM, Sebastia

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-30 Thread Sebastian Raschka
I was looking for K-Medoids too a couple of weeks ago and ended up implementing it myself -- but more like quick & dirty. I would really welcome a nice and efficient implementation available via scikit, for example, using Voronoi iteration. Best, Sebastian > On Jul 30, 2015, at 1:51 PM, Timo
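
For readers unfamiliar with the term, Voronoi iteration alternates between assigning each sample to its nearest medoid and re-picking each cluster's medoid, much like Lloyd's algorithm for k-means. The sketch below is a minimal illustration of that idea, not any of the implementations discussed in this thread; `k_medoids_voronoi` is a hypothetical name and a precomputed square distance matrix `D` is assumed as input.

```python
import numpy as np


def k_medoids_voronoi(D, n_clusters, max_iter=100, seed=0):
    """Alternating k-medoids on a precomputed (n_samples, n_samples) matrix D."""
    rng = np.random.default_rng(seed)
    n_samples = D.shape[0]
    # Start from randomly chosen, distinct medoid indices.
    medoids = rng.choice(n_samples, size=n_clusters, replace=False)
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for k in range(n_clusters):
            members = np.flatnonzero(labels == k)
            if members.size == 0:
                continue  # keep the old medoid if a cluster ends up empty
            # Update step: the member with the smallest total distance
            # to the other members becomes the new medoid.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[k] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # converged: medoids did not change
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels
```

Each iteration costs roughly O(n_samples * n_clusters) for the assignment plus the within-cluster sums, which is typically cheaper than PAM's exhaustive swap search but can stop in a worse local optimum.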

[Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-30 Thread Timo Erkkilä
Hi all, I checked and could find no mention of KMedoids in Scikit-Learn. My friend and I have implemented the algorithm in Python, and were wondering if it could be brought into Scikit-Learn. Thoughts? Cheers, Timo PS: I am new to the mailing list, so please guide me in case I am doing someth