Sorry, I thought it also ran experiments on what they call "sta", but I
guess those are not included.
The conclusion is the same, though. Different algorithms show different
performance on different datasets.
The Yinyang k-means paper has some Elkan vs. Lloyd figures:
http://proceedings.mlr.press/v37/ding15.pdf
In Table 2, in the Elkan row, a speedup < 1 means Elkan is slower than
Lloyd.
Elkan is also more memory intensive, so you can see some missing values
in that table where Elkan could not be run but Lloyd could.
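To make the trade-off concrete, here is a minimal pure-Python sketch of
the triangle-inequality pruning behind Elkan's method. It is not
scikit-learn's implementation: it only shows the center-to-center test
in a single assignment step, and the function names (assign_lloyd,
assign_pruned) are made up for illustration. The full Elkan algorithm
additionally keeps bounds across iterations (one lower bound per point
and center), which is where the extra memory goes.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_lloyd(points, centers):
    """Plain assignment step: compute every point-to-center distance."""
    labels, n_dist = [], 0
    for p in points:
        d = [dist(p, c) for c in centers]
        n_dist += len(centers)
        labels.append(d.index(min(d)))
    return labels, n_dist


def assign_pruned(points, centers):
    """Assignment step that skips centers ruled out by the triangle inequality.

    If d(c_best, c_j) >= 2 * d(p, c_best), then d(p, c_j) >= d(p, c_best),
    so c_j cannot be strictly closer and d(p, c_j) need not be computed.
    """
    k = len(centers)
    # Center-to-center distances: k * k values, independent of the number
    # of points. They are not counted in n_dist below.
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels, n_dist = [], 0
    for p in points:
        best, best_d = 0, dist(p, centers[0])
        n_dist += 1
        for j in range(1, k):
            if cc[best][j] >= 2 * best_d:
                continue  # pruned: center j cannot beat the current best
            d = dist(p, centers[j])
            n_dist += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, n_dist


if __name__ == "__main__":
    # Hypothetical toy data: four well-separated 2-D blobs.
    random.seed(0)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    points = [(cx + random.gauss(0, 1), cy + random.gauss(0, 1))
              for cx, cy in centers for _ in range(250)]
    labels_full, n_full = assign_lloyd(points, centers)
    labels_pruned, n_pruned = assign_pruned(points, centers)
    print("same labels:", labels_full == labels_pruned)
    print("point-center distances, plain vs pruned:", n_full, n_pruned)
```

How much gets pruned depends on the data and on the number and layout
of the centers, which is consistent with the mixed speedups in the
table.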
On 3/30/20 3:33 AM, 樊 书华 wrote:
Hi,
Thanks for suggesting the paper. However, the paper covers many more
algorithms and finds that different algorithms perform differently on
datasets of various dimensions, and the Lloyd algorithm is not
included. What I want to know is whether we can remove the Lloyd
algorithm from kmeans in scikit-learn, since elkan is an optimized one
with better performance.
Best regards,
George
*From:* scikit-learn
<scikit-learn-bounces+mc_george123=hotmail....@python.org> *On Behalf
Of *Andreas Mueller
*Sent:* Saturday, March 28, 2020 12:37 AM
*To:* scikit-learn@python.org
*Subject:* Re: [scikit-learn] A basic question about kmeans algorithms
elkan and llyod
There's an interesting analysis in this paper:
Fast K-Means with Accurate Bounds
http://proceedings.mlr.press/v48/newling16.pdf
On 3/26/20 3:40 AM, Alexandre Gramfort wrote:
hi,
I suspect Elkan is really winning when you have many centroids
so the conclusion is not systematic
my 2c
Alex
On Thu, Mar 26, 2020 at 3:18 AM mc_george...@hotmail.com wrote:
Hi admins,
My team is currently working on optimizing scikit-learn.
When it comes to kmeans, I find there are two algorithms: lloyd
and elkan, which is an optimized version of lloyd that uses the
triangle inequality. In older versions of scikit-learn, elkan
only supported dense data, not sparse data. In the latest
version, elkan supports both types of data. So my question is
why both algorithms are kept in kmeans, since they do almost
the same thing and elkan is an optimized version of lloyd. Is
there any precision difference between the two algorithms, and
how can I decide which algorithm to use?
Best regards,
George Fan
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org <mailto:scikit-learn@python.org>
https://mail.python.org/mailman/listinfo/scikit-learn