Re: [scikit-learn] A basic question about kmeans algorithms elkan and lloyd

Andreas Mueller Mon, 30 Mar 2020 12:06:03 -0700

Sorry, I thought it also ran experiments on what they call "sta", but I guess those are not included. The conclusion is the same, though: different algorithms perform differently on different datasets.


The Yinyang k-means paper has some Elkan vs. Lloyd figures:
http://proceedings.mlr.press/v37/ding15.pdf

In Table 2, in the Elkan row, a speedup below 1 means Elkan is slower than Lloyd. Elkan is also more memory intensive, so you can see some missing values in that row where the computation couldn't be performed with Elkan but could with Lloyd.
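A back-of-the-envelope sketch of why Elkan can run out of memory where Lloyd does not. It assumes the textbook Elkan bookkeeping: one lower bound per sample-center pair, one upper bound per sample, and a k-by-k table of center-center distances, all float64. Real implementations may store somewhat different arrays, the function name is hypothetical, and the sizes are illustrative, not the datasets from Table 2. Lloyd's assignment step can process the n-by-k distances in chunks, so it needs no comparable per-pair state that persists across iterations.

```python
def elkan_bound_bytes(n_samples, n_clusters, itemsize=8):
    """Estimate persistent bound storage for textbook Elkan k-means.

    Assumed layout (hypothetical, not any specific library's):
      - one lower bound per (sample, center) pair: n_samples * n_clusters
      - one upper bound per sample:                n_samples
      - center-center distance table:              n_clusters ** 2
    Each value takes `itemsize` bytes (8 for float64).
    """
    lower = n_samples * n_clusters
    upper = n_samples
    center_table = n_clusters ** 2
    return (lower + upper + center_table) * itemsize


if __name__ == "__main__":
    # Illustrative sizes only, not the datasets from Table 2.
    for n, k in [(100_000, 10), (1_000_000, 1_000), (10_000_000, 10_000)]:
        gib = elkan_bound_bytes(n, k) / 2**30
        print(f"n={n:>12,} k={k:>7,}: about {gib:,.2f} GiB of bounds")
```

Under these assumptions the n-by-k lower-bound table is the largest term whenever k > 1 and n > k, so the footprint grows with both the number of samples and the number of clusters.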




On 3/30/20 3:33 AM, 樊 书华 wrote:

Hi,

Thanks for suggesting the paper. However, the paper compares many more algorithms and finds that different algorithms perform differently on datasets of various dimensions, and the Lloyd algorithm is not included. What I want to know is whether we can remove the Lloyd algorithm from k-means in scikit-learn, since Elkan is an optimized version of it with better performance.


Best regards,

George

*From:* scikit-learn <scikit-learn-bounces+mc_george123=hotmail....@python.org> *On Behalf Of* Andreas Mueller

*Sent:* Saturday, March 28, 2020 12:37 AM
*To:* scikit-learn@python.org

*Subject:* Re: [scikit-learn] A basic question about kmeans algorithms elkan and lloyd


There's an interesting analysis in this paper:
Fast K-Means with Accurate Bounds

http://proceedings.mlr.press/v48/newling16.pdf

On 3/26/20 3:40 AM, Alexandre Gramfort wrote:

    hi,

    I suspect Elkan really wins when you have many centroids,
    so the conclusion is not systematic.

    my 2c

    Alex
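To make Alex's point concrete, here is a minimal pure-Python sketch of one assignment step. Lloyd evaluates all n*k point-center distances; the pruned version applies only Elkan's center-center test (skip center j when d(c_best, c_j) >= 2 d(x, c_best)) within a single step. The full Elkan algorithm also carries upper and lower bounds across iterations, which this sketch omits. The function names and the random Gaussian data are hypothetical, chosen for illustration.

```python
import math
import random


def assign_lloyd(points, centers):
    """Lloyd-style assignment: compute every point-center distance."""
    labels = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        # min() returns the first index on ties, i.e. the lowest label.
        labels.append(min(range(len(centers)), key=dists.__getitem__))
    return labels, len(points) * len(centers)


def assign_center_pruned(points, centers):
    """Assignment with Elkan's center-center test (single step only).

    If d(c_best, c_j) >= 2 * d(x, c_best), the triangle inequality gives
    d(x, c_j) >= d(x, c_best), so c_j cannot win and is skipped.
    """
    k = len(centers)
    # k-by-k center-center distances; this overhead is not counted below.
    cc = [[math.dist(a, b) for b in centers] for a in centers]
    labels, n_evals = [], 0
    for x in points:
        best, best_d = 0, math.dist(x, centers[0])
        n_evals += 1
        for j in range(1, k):
            if cc[best][j] >= 2.0 * best_d:
                continue  # pruned: c_j is provably no closer than c_best
            d = math.dist(x, centers[j])
            n_evals += 1
            if d < best_d:  # strict '<' keeps the lowest index on ties
                best, best_d = j, d
        labels.append(best)
    return labels, n_evals


def compare(n=2000, ks=(4, 16, 64), seed=0):
    """Count point-center distance evaluations for several k."""
    rng = random.Random(seed)
    # Hypothetical data: n points from a 2-D standard Gaussian.
    points = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    results = {}
    for k in ks:
        centers = rng.sample(points, k)  # centers drawn from the data
        ref_labels, ref_evals = assign_lloyd(points, centers)
        labels, evals = assign_center_pruned(points, centers)
        assert labels == ref_labels  # same labels as the full scan
        results[k] = (ref_evals, evals)
    return results


if __name__ == "__main__":
    for k, (full, pruned) in compare().items():
        print(f"k={k:3d}: Lloyd {full:7d} distances, pruned {pruned:7d}")
```

The savings depend on the data, so the sketch checks exactness (identical labels) rather than promising a particular speedup. The k-by-k center table is extra work per iteration that the pruning has to outweigh.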

    On Thu, Mar 26, 2020 at 3:18 AM mc_george...@hotmail.com wrote:

        Hi admins,

        My team is currently working on optimizing scikit-learn. When
        it comes to KMeans, I find there are two algorithms: Lloyd and
        Elkan, the latter being an optimization of Lloyd that uses the
        triangle inequality. In older versions of scikit-learn, Elkan
        only supported dense datasets, not sparse ones, while in the
        latest version it supports both types. So why are both
        algorithms kept in KMeans, since they do almost the same thing
        and Elkan is an optimized version of Lloyd? Is there any
        difference in precision between the two algorithms, and how
        can I decide which algorithm to use?
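The triangle-inequality shortcut George mentions is exact, which bears on the precision question. A short derivation of the center-center test, writing $d$ for Euclidean distance, $x$ for a sample, $c$ for its current closest center and $c'$ for any other center:

```latex
\begin{aligned}
d(c, c') &\le d(x, c) + d(x, c') && \text{(triangle inequality)}\\
\Rightarrow\quad d(x, c') &\ge d(c, c') - d(x, c).\\
\text{If } d(c, c') \ge 2\,d(x, c):\quad d(x, c') &\ge 2\,d(x, c) - d(x, c) = d(x, c).
\end{aligned}
```

So $c'$ cannot be strictly closer to $x$ than $c$, and $d(x, c')$ never needs to be computed. The other bounds Elkan's algorithm keeps are also derived from the triangle inequality, so in exact arithmetic it reproduces Lloyd's assignments from the same initial centers; implementations that compute distances differently may still differ slightly through floating-point rounding in near-ties.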

        Best regards,

        George Fan

        _______________________________________________
        scikit-learn mailing list
        scikit-learn@python.org
        https://mail.python.org/mailman/listinfo/scikit-learn


