+1
On 07/31/2015 04:50 PM, Sebastian Raschka wrote:
Hi, Timo,
wow, the code is really short, well organized, and commented. But it's
probably better to submit a pull request so that people can directly
comment on sections of the code and get notifications and updates.
Best,
Sebastian
> On Jul 31, 2015, at 4:35 PM, Timo Erkkilä wrote:
Good ideas. I'm fine with integrating the code into Scikit-Learn even though
it's a bit of work. :) I've pushed the first version of the code under the
feature branch "kmedoids":
https://github.com/terkkila/scikit-learn/blob/kmedoids/sklearn/cluster/k_medoids_.py
I've added drafts of the "clustering" and "d
To address the efficiency issue for large datasets (to some extent), we could
maybe have a `clustering` argument where `clustering='pam'` or
`clustering='clara'`; 'pam' should probably be the default.
In a nutshell, CLARA repeatedly draws random samples (of size k < n_samples),
applies PAM to them, and
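As a rough illustration of that idea (the function names and defaults below are hypothetical, not a proposed scikit-learn API), CLARA can be sketched as: draw a subsample, run a PAM-like search on it, then score the resulting medoids against the full dataset and keep the best draw. The inner search here is a simple Voronoi-style stand-in for PAM:

```python
import numpy as np

def _pam_like(D, k, rng, max_iter=100):
    # Tiny Voronoi-iteration stand-in for PAM on a distance matrix D.
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign to nearest medoid
        new = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):
                # Re-pick the medoid minimizing total in-cluster distance.
                within = D[np.ix_(members, members)].sum(axis=1)
                new[j] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids

def clara(X, k, n_draws=5, sample_size=None, rng=None):
    # CLARA sketch: cluster random subsamples, keep the medoids that
    # are cheapest when evaluated on the FULL dataset.
    rng = np.random.default_rng(rng)
    n = len(X)
    sample_size = sample_size or min(n, 40 + 2 * k)
    best_cost, best_medoids = np.inf, None
    for _ in range(n_draws):
        idx = rng.choice(n, size=sample_size, replace=False)
        Ds = np.linalg.norm(X[idx, None] - X[None, idx], axis=-1)
        medoids = idx[_pam_like(Ds, k, rng)]
        # Cost of this draw's medoids over all n samples.
        cost = np.linalg.norm(X[:, None] - X[None, medoids], axis=-1).min(axis=1).sum()
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    return best_medoids, best_cost
```

Since each PAM run only sees `sample_size` points, the quadratic cost of the inner search stays bounded while the selection step still uses the whole dataset.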
Cool.
Including the code in scikit-learn is often a bit of a process but it
might indeed be interesting.
You could just start with a pull request - or publish a gist if you
don't think you'll have time to work on the inclusion and leave that
part to someone else.
Cheers,
Andy
On 07/31/2015 0
That makes sense. The basic implementation is definitely short, just ~20
lines of code if you don't count comments etc. I can make the source code
available so that you can judge whether it's good to take further. I am
familiar with the documentation libraries you are using (Sphinx with Numpy
style
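For readers unfamiliar with that convention, here is a sketch of what a NumPy-style docstring for a hypothetical `k_medoids` function might look like (the signature and parameter names are illustrative only, not the eventual scikit-learn API):

```python
def k_medoids(D, n_clusters=8):
    """Cluster samples from a precomputed distance matrix with k-medoids.

    Parameters
    ----------
    D : ndarray of shape (n_samples, n_samples)
        Precomputed pairwise distance matrix.
    n_clusters : int, default=8
        Number of medoids (and hence clusters) to select.

    Returns
    -------
    labels : ndarray of shape (n_samples,)
        Index of the cluster each sample belongs to.
    """
    # Body omitted; this sketch only illustrates the docstring convention.
```

Sphinx (via the numpydoc extension) renders the `Parameters` and `Returns` sections into the API reference automatically.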
> Is it required that an algorithm, which is implemented in Scikit-Learn, scales
> well wrt n_samples?
The requirement is 'be actually useful', which is something that is a bit
hard to judge :).
I think that K-medoids is borderline on this requirement, probably on the
right side of the border. I
I was using a dynamic time warping (DTW) distance with KMedoids, which made
more sense than using Euclidean distance since the profiles indeed had
warps along the time axis. The DTW implementation was taken from MLPY since
it's not in Scikit-Learn either.
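For readers who haven't seen it, DTW aligns two sequences by warping the time axis and sums the local costs along the cheapest alignment path. A minimal, unoptimized sketch (MLPY's version adds windowing constraints and C-level speed; this is not its API):

```python
import numpy as np

def dtw(a, b):
    """Basic dynamic time warping distance between two 1-D sequences,
    using absolute difference as the local cost. O(len(a) * len(b))."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated alignment cost
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            acc[i, j] = d + min(acc[i - 1, j],      # step in a only
                                acc[i, j - 1],      # step in b only
                                acc[i - 1, j - 1])  # step in both
    return acc[n, m]

# A pairwise DTW matrix built with this could then be fed to a k-medoids
# implementation that accepts precomputed distances.
```

A profile shifted by one time step has zero DTW distance even though its elementwise distance is nonzero, which is exactly why it suits profiles with warps along the time axis.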
Yes, it may be far more expensive than k-means. I just used it with Euclidean
distance -- it was for a comparison. I think k-medoids can still be useful for
smaller, maybe noisier datasets, or if you have some distance measure where
calculating averages may not make sense.
> On Jul 30, 2015, at 2:4
I think KMedoids has come up before.
One issue is that it doesn't really scale to large n_samples, right?
There is an implementation mentioned here:
https://github.com/scikit-learn/scikit-learn/issues/3799
Do you use it because you have a custom distance matrix?
On 07/30/2015 02:27 PM, Sebastia
I was looking for K-Medoids a couple of weeks ago too and ended up
implementing it myself -- but more like quick & dirty. I would really welcome
a nice and efficient implementation available via scikit-learn, for example
using Voronoi iteration.
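Voronoi iteration (the k-means-style alternating scheme for k-medoids) can be sketched in a few lines on a precomputed distance matrix; the names and defaults here are illustrative, not a proposed API:

```python
import numpy as np

def k_medoids(D, k, max_iter=100, rng=None):
    """Voronoi-iteration k-medoids on a precomputed distance matrix D."""
    rng = np.random.default_rng(rng)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assignment step
        new = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):
                # Update step: pick the member that minimizes
                # total distance to the rest of its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new[j] = members[np.argmin(within)]
        if np.array_equal(new, medoids):            # converged
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)
```

Like k-means, this converges quickly (the cost decreases monotonically) but only to a local optimum; PAM's swap search is more thorough and more expensive.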
Best,
Sebastian
> On Jul 30, 2015, at 1:51 PM, Timo
Hi all,
I checked and could find no mention of KMedoids in Scikit-Learn. My friend
and I have implemented the algorithm in Python, and we were wondering if it
could be brought into Scikit-Learn. Thoughts?
Cheers,
Timo
PS: I am new to the mailing list, so please guide me in case I am doing
something wrong.