Hi Gaël,
Sure, I understand the rationale behind the requirement of 1000+ citations
etc., and as I mentioned above, I am quite happy to release it via PyPI.
Wang et al. 2008 claim that their approach improves the correctness of
Affinity Propagation clustering (though it increases the running time).
Correct me if I am wrong, but from your reply it looks like you are not
persuaded by the paper and do not recommend including the algorithm in
sklearn.
Best wishes,
ilya.
2014-12-03 13:54 GMT+00:00 Gael Varoquaux <gael.varoqu...@normalesup.org>:
> Hi Ilya,
>
> I'm actually really not excited about affinity propagation. Firstly, it's
> slow. Clustering has pretty much two use cases. The first one is to find
> latent meaningful structure. This is a hard problem in the sense of
> learning theory, so to be able to trust the solution one needs many
> samples. The second one is to reduce the problem size by replacing
> samples with cluster centers. Both of these use cases are really
> relevant only when there are many samples, so a slow clustering method
> is not very useful. The second reason I don't like affinity propagation
> is that it has many parameters to set, and gives very strange/unstable
> results.
>
> I think that the empirical comparison of clustering algorithms that we
> have at the top of the clustering page:
>
> http://scikit-learn.org/stable/modules/clustering.html#overview-of-clustering-methods
> is quite telling in terms of the limitations of affinity propagation. I
> have personally not seen it used in any non-trivial application (or in
> academic papers interested in it theoretically).
>
> Now, the enhancements that you are proposing try to tackle both
> limitations of affinity propagation. So, on paper, they look great.
> However, I am a computer scientist who publishes papers on methods, and
> thus I know how weak a claim is when it appears in a paper by the
> authors of a method. So I don't trust that a method actually has the
> benefits it claims, unless I see it proven on many different
> applications, by many different people. Experience has really taught me
> this, and I must say that there are some methods that I regret pushing
> into scikit-learn. That's why we have the requirements on the number of
> citations. We find that a method that is really useful gets used, and
> thus cited. One way of proving us wrong is to do an implementation
> outside of scikit-learn, in a separate package, and, in the examples of
> this package, show that the method solves very well problems that are
> not solved well by the methods in scikit-learn.
>
>
> Do you understand our line of thought? It's not against methods in
> general, it's just that we are trying hard to find the right subset of
> the literature that we should be struggling to keep alive and kicking.
>
> Cheers,
>
> Gaël
>
>
> On Wed, Dec 03, 2014 at 01:08:17PM +0000, Илья Патрушев wrote:
> > Hi Andy,
>
> > Adaptive Affinity Propagation is essentially an additional
> > optimisation layer on top of the original Affinity Propagation
> > algorithm.
> > The Affinity Propagation algorithm works off the similarity matrix and
> > tries to identify a number of data points that would be "centres" of
> > clusters. The behaviour of the Affinity Propagation algorithm is
> > governed by two parameters: preferences (a vector of size n_samples)
> > and damping.
> > The preferences are, on the one hand, a way to incorporate prior
> > knowledge about likely cluster centres; on the other hand, they
> > control the number of clusters produced by the algorithm. When there
> > is no prior knowledge, the preferences are set to the same value for
> > all sample points. The general relationship between the preference
> > value and the number of clusters is: the greater the value, the
> > greater the number of clusters. The authors of the Affinity
> > Propagation algorithm recommend using the median similarity value, but
> > in the end one has to find the right preference value for each new
> > clustering problem.
> > The damping parameter defines the speed at which the algorithm updates
> > its responsibility/availability evidence. The higher the damping
> > parameter, the less prone the algorithm is to oscillations, but this
> > slows down convergence.
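> >
> > To make this concrete, here is a minimal sketch using scikit-learn's
> > existing AffinityPropagation estimator (the data here are purely
> > illustrative; the median-similarity preference is also the estimator's
> > default):
> >
> >     import numpy as np
> >     from sklearn.cluster import AffinityPropagation
> >     from sklearn.metrics import euclidean_distances
> >
> >     X = np.random.RandomState(0).randn(100, 2)
> >     # Similarity matrix: negative squared Euclidean distances.
> >     S = -euclidean_distances(X, squared=True)
> >     # Authors' recommendation: preference = median similarity.
> >     # Raising it yields more clusters, lowering it fewer.
> >     ap = AffinityPropagation(preference=np.median(S),
> >                              damping=0.5).fit(X)
> >     print(len(ap.cluster_centers_indices_))  # number of clusters found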
> > Wang's solution is to run the Affinity Propagation algorithm starting
> > with a fairly high preference value (like 0.5 of the median
> > similarity). As it converges, the goodness of clustering is measured
> > (they suggested the Silhouette index), the preference is decreased,
> > and these steps are repeated until the algorithm produces some minimal
> > number of clusters. Along with that, the presence of oscillations is
> > monitored; should they appear, they are controlled by adjusting the
> > damping parameter, and should the damping reach its maximum value, by
> > reducing the preference value.
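> >
> > A rough sketch of that scan, as I read it (a simplification under my
> > own assumptions, not Wang's actual code: oscillation monitoring is
> > reduced to raising the damping whenever a run hits the iteration
> > limit, and the preference schedule is a simple geometric one):
> >
> >     import numpy as np
> >     from sklearn.cluster import AffinityPropagation
> >     from sklearn.metrics import silhouette_score, euclidean_distances
> >
> >     def adaptive_ap(X, min_clusters=2, max_damping=0.9):
> >         # Hypothetical helper, not the actual implementation.
> >         S = -euclidean_distances(X, squared=True)
> >         pref = 0.5 * np.median(S)  # high start (similarities are <= 0)
> >         damping, best, best_score = 0.5, None, -np.inf
> >         while True:
> >             ap = AffinityPropagation(preference=pref,
> >                                      damping=damping).fit(X)
> >             if ap.n_iter_ >= ap.max_iter and damping < max_damping:
> >                 damping += 0.05  # likely oscillating: damp harder, retry
> >                 continue
> >             k = len(ap.cluster_centers_indices_)
> >             if k < min_clusters:
> >                 break            # scanned far enough down in preference
> >             if k > 1:            # silhouette needs at least two clusters
> >                 score = silhouette_score(X, ap.labels_)
> >                 if score > best_score:
> >                     best, best_score = ap, score
> >             pref *= 2.0          # more negative => fewer clusters
> >         return best              # best-scoring clustering along the scan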
> > The pdf on arXiv is the English translation of the original paper
> > published in Chinese.
> > I agree, Adaptive Affinity Propagation is not as widely used a method
> > as the FAQ requires; I should have looked into it beforehand. Maybe it
> > can be considered a clear-cut improvement of the Affinity Propagation
> > algorithm?
> > Anyway, if it is not to be added to sklearn, I am quite happy to
> > release it via PyPI.
>
> > Best wishes,
> > ilya
>
>
> > 2014-12-02 14:34 GMT+00:00 Andy <t3k...@gmail.com>:
>
> > Hi Ilya.
>
> > Thanks for your interest in contributing.
> > I am not an expert in affinity propagation, so it would be great if
> > you could give some details of what the advantage of the method is.
> > The reference paper seems to be an arXiv preprint with 88 citations,
> > which would probably not qualify for inclusion in scikit-learn; see
> > the FAQ:
> > http://scikit-learn.org/dev/faq.html#can-i-add-this-new-algorithm-that-i-or-someone-else-just-published
>
> > It might be a candidate for an external experimental / contribution
> > project, which is an idea that has been floating around for a while.
>
> > Cheers,
> > Andy
>
>
>
> > On 12/02/2014 09:06 AM, Илья Патрушев wrote:
>
> > Hi everybody,
>
> > As far as I am aware, there is no adaptive affinity propagation
> > clustering algorithm implementation in either the stable or the
> > development version of sklearn.
> > I have recently implemented the adaptive affinity propagation
> > algorithm as a part of my image analysis project. I based my
> > implementation on the paper by Wang et al., 2007, their Matlab code,
> > and sklearn's affinity propagation algorithm. This is not exactly a
> > port of the Matlab code, since I have slightly modified Wang's
> > approach to dealing with oscillations and added an optional upper
> > limit on the number of clusters.
> > I am planning to submit the code to sklearn eventually. So please let
> > me know if anybody is already working on the algorithm, as we could
> > join our efforts and save some time.
>
> > Best wishes,
> > ilya.
> --
> Gael Varoquaux
> Researcher, INRIA Parietal
> Laboratoire de Neuro-Imagerie Assistee par Ordinateur
> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
> Phone: ++ 33-1-69-08-79-68
> http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
>
--
Ilya Patrushev,
MRC National Institute for Medical Research
The Ridgeway
Mill Hill
London NW7 1AA
UK
Tel: 0208 816 2656
Fax: 0208 906 4477