+1
On 07/31/2015 04:50 PM, Sebastian Raschka wrote:
Hi, Timo,
wow, the code is really short, well organized and commented. But it's
probably better to submit a pull request so that people can directly
comment on sections of the code and get notifications and updates.
Best,
Sebastian
On Jul 31, 2015, at 4:35 PM, Timo Erkkilä <timo.erkk...@gmail.com> wrote:
Good ideas. I'm fine with integrating the code into Scikit-Learn, even though
it's a bit of work. :) I've pushed the first version of the code
under feature branch "kmedoids":
https://github.com/terkkila/scikit-learn/blob/kmedoids/sklearn/cluster/k_medoids_.py
I've added drafts of the "clustering" and "distance_metric"
arguments. Please take a look and comment.
Cheers,
Timo
On Fri, Jul 31, 2015 at 7:59 PM, Sebastian Raschka
<se.rasc...@gmail.com> wrote:
To address the efficiency issue for large datasets (to some
extent), we could maybe have a `clustering` argument that accepts
`clustering='pam'` or `clustering='clara'`; 'pam' should probably
be the default.
In a nutshell, CLARA repeatedly draws random subsamples (each smaller
than n_samples), applies PAM to each subsample, and keeps the medoids
that give the lowest total cost on the full dataset.
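To make the proposed `clustering='pam'` / `clustering='clara'` switch concrete, here is a minimal sketch, assuming Euclidean distances and NumPy only. The names `pam`, `clara`, `k_medoids` and the parameters `n_draws`, `sample_size` and `seed` are hypothetical and are not the code on the `kmedoids` branch. The SWAP step is the naive version (it tries every medoid/non-medoid pair per pass, roughly O(k·n²) distance lookups), and the default subsample size of 40 + 2k is the one R's `cluster::clara` uses, taken here as an assumption rather than a recommendation.

```python
import numpy as np


def pam(D, k, max_iter=100):
    """Naive PAM (BUILD + SWAP) on a precomputed (n, n) distance matrix D."""
    n = D.shape[0]
    # BUILD: start from the point with the smallest total distance to all others,
    # then greedily add the point that lowers the total cost the most.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        nearest = D[:, medoids].min(axis=1)
        gains = np.maximum(nearest[:, None] - D, 0.0).sum(axis=0)
        gains[medoids] = -1.0  # never re-pick an existing medoid
        medoids.append(int(np.argmax(gains)))
    cost = D[:, medoids].min(axis=1).sum()
    # SWAP: apply the single best (medoid, non-medoid) swap until none improves the cost.
    for _ in range(max_iter):
        best_cost, best_swap = cost, None
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                trial_cost = D[:, trial].min(axis=1).sum()
                if trial_cost < best_cost:
                    best_cost, best_swap = trial_cost, (i, h)
        if best_swap is None:
            break
        medoids[best_swap[0]] = best_swap[1]
        cost = best_cost
    return np.array(medoids), cost


def clara(X, k, n_draws=5, sample_size=None, seed=0):
    """CLARA: run PAM on random subsamples, keep the medoids cheapest on all of X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumed default subsample size 40 + 2k (as in R's cluster::clara).
    sample_size = min(n, sample_size or 40 + 2 * k)
    best_medoids, best_cost = None, np.inf
    for _ in range(n_draws):
        idx = rng.choice(n, size=sample_size, replace=False)
        S = X[idx]
        D_sub = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
        local, _ = pam(D_sub, k)
        medoids = idx[local]  # map subsample positions back to rows of X
        # Score these medoids on the FULL dataset (only an n x k distance matrix).
        D_full = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=-1)
        cost = D_full.min(axis=1).sum()
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost


def k_medoids(X, k, clustering="pam", **kwargs):
    """Hypothetical front end mirroring the proposed `clustering` argument."""
    if clustering == "pam":
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        return pam(D, k, **kwargs)
    if clustering == "clara":
        return clara(X, k, **kwargs)
    raise ValueError(f"unknown clustering={clustering!r}")
```

The design point is that full PAM needs the whole n_samples x n_samples distance matrix, while this CLARA sketch only ever builds sample_size x sample_size and n_samples x k matrices, which is what makes it cheaper on large datasets at the price of possibly missing the best medoids.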
Here are some good resources for PAM and CLARA:
- PAM (Partitioning Around Medoids):
https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/Partitioning_Around_Medoids_(PAM)
- CLARA (Clustering for Large Applications):
https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/CLARA
Best,
Sebastian
On Jul 31, 2015, at 12:46 PM, Andreas Mueller <t3k...@gmail.com> wrote:
Cool.
Including the code in scikit-learn is often a bit of a process,
but it might indeed be interesting.
You could just start with a pull request, or publish a gist if
you don't think you'll have time to work on the inclusion and would
rather leave that part to someone else.
Cheers,
Andy
On 07/31/2015 05:38 AM, Timo Erkkilä wrote:
That makes sense. The basic implementation is definitely short,
just ~20 lines of code if you don't count comments etc. I can
make the source code available so that you can judge whether
it's worth taking further. I am familiar with the documentation
tools you use in Scikit-Learn (Sphinx with NumPy-style docstrings),
but that's further down the line.
Cheers,
Timo
On Fri, Jul 31, 2015 at 10:53 AM, Gael Varoquaux
<gael.varoqu...@normalesup.org> wrote:
> Is it required that an algorithm, which is implemented in Scikit-Learn,
> scales well wrt n_samples?
The requirement is 'be actually useful', which is something that is a bit
hard to judge :).
I think that K-medoids is borderline on this requirement, probably on the
right side of the border. I would tend to say that if the code is clean and
reasonably short (that last requirement is important), and it comes with
good tests, examples and documentation, it should be possible to merge it
in.
Sorry, we are indeed being picky. It's a struggle to find the right
feature set to keep the package maintainable while providing great value
to our users.
Gaël
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general