[R] Kmeans performance difference

Moisan Yves Wed, 04 Jul 2007 12:09:43 -0700

Hi All,

A question from a newbie using R 2.5.0 on Windows XP.  Why does kmeans
clustering, with apparently the exact same parameters, behave so
differently in the following two examples?


> cl1 <- kmeans(subset(pointsUXO15555, select = c(2:4)), 10)

This takes about 2 seconds to deliver a result.

> cl1 <- clust(subset(pointsUXO15555, select = c(2:4)), k=10,
+              method="kmeansHartigan")

This dies after about 10 minutes, having filled up the RAM:

*** running kmeansHartigan cluster algorithm...

 *** calculating validity measure... 
Error: cannot allocate vector of size 922.9 Mb
In addition: Warning messages:
1: Reached total allocation of 1023Mb: see help(memory.size) 
2: Reached total allocation of 1023Mb: see help(memory.size) 
3: Reached total allocation of 1023Mb: see help(memory.size) 
4: Reached total allocation of 1023Mb: see help(memory.size)

If I understand correctly, both calls should give roughly the same
results (modulo the initial random centers), since the default algorithm
in kmeans is "Hartigan-Wong".  My data are 3 columns by 15555 rows.  I
gather that kmeans is a "core" R function whereas clust is from the
clustTool package, but isn't clustTool simply wrapping the core kmeans
function?  Why such a difference?
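
A back-of-the-envelope check that may help narrow this down, offered as a
sketch rather than a diagnosis.  It assumes 8-byte doubles and the
lower-triangle storage used by dist(); whether clustTool's validity
measure actually builds such an object is an assumption I have not
verified against its source.  The size of a full set of pairwise
distances for 15555 points comes out at the same figure as the failed
allocation, and the failure is reported after the "calculating validity
measure" message rather than during the clustering step itself.

```r
# Size of a pairwise-distance object for n points, assuming 8-byte doubles
# and lower-triangle storage (one value per distinct pair of points).
n <- 15555                        # number of rows, as stated in the post
n_pairs <- n * (n - 1) / 2        # number of distinct point pairs
dist_mb <- n_pairs * 8 / 1024^2   # bytes converted to Mb (1 Mb = 1024^2 bytes)
round(dist_mb, 1)                 # 922.9

# The clustered data themselves are tiny by comparison:
# n rows by 3 numeric columns.
data_mb <- n * 3 * 8 / 1024^2
round(data_mb, 2)                 # 0.36
```

If that assumption holds, the time and memory would be going into the
validity computation rather than into the kmeans step.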

TIA,

Yves Moisan

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
