Re: [R] Clustering of datasets

2022-09-05 Thread Rui Barradas

Hello,

I am not at all sure that the following answers the question.
The code below tries to find the optimal number of clusters. One of the 
changes I have made to your call to kmeans is to subset DMs with 
drop = FALSE, so that the dim attribute is not dropped.



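library(cluster)

# subset with drop = FALSE: the result keeps its dim attribute
dm <- DMs[, 2, drop = FALSE]

max_clust <- 10
wss <- numeric(max_clust)

# total within-cluster sum of squares for k = 1, ..., max_clust
for(k in seq_len(max_clust)) {
  km <- kmeans(dm, centers = k, nstart = 25)
  wss[k] <- km$tot.withinss
}
plot(wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Where is the elbow, at 2 or at 4?
factoextra::fviz_nbclust(dm, kmeans, method = "wss")
factoextra::fviz_nbclust(dm, kmeans, method = "silhouette")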

k2 <- kmeans(dm, centers = 2, nstart = 25)
k3 <- kmeans(dm, centers = 3, nstart = 25)
k4 <- kmeans(dm, centers = 4, nstart = 25)

# centers is a matrix with one row per cluster
main2 <- paste(nrow(k2$centers), "clusters")
main3 <- paste(nrow(k3$centers), "clusters")
main4 <- paste(nrow(k4$centers), "clusters")

old_par <- par(mfcol = c(1, 3))
plot(DMs[,2], col = k2$cluster, pch = 19, main = main2)
plot(DMs[,2], col = k3$cluster, pch = 19, main = main3)
plot(DMs[,2], col = k4$cluster, pch = 19, main = main4)
par(old_par)
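
As a cross-check on the silhouette plot above, the average silhouette width can also be computed directly with cluster::silhouette. This is only a minimal sketch: the vector x simply repeats the DATA column from the original post, and the range 2:6 and the call to set.seed are assumptions made for illustration, not part of the code above.

```r
library(cluster)

# DATA column from the original post (45 countries, in the posted order)
x <- c(-0.0092, -0.0235, -0.0239, -0.0333, -0.0022, -0.0963, -0.0706, -0.0322, -0.1233,
       -0.0141, -0.0142, -0.0656, -0.0230, -0.0006, -0.0085, -0.0398, -0.0423, -0.0604,
       -0.0227, -0.0085, -0.0272, -0.3519, -0.0210, -0.0057, -0.0595, -0.0451, -0.0023,
       -0.0351, -0.0048, -0.0495, -0.0081, -0.0044, -0.1210, -0.0031, -0.0213, -0.0106,
       -0.0152, -0.0030, -0.0086, -0.0144, -0.0078, -0.0042, -0.0035, -0.0211, -0.0538)
dm <- matrix(x, ncol = 1)

set.seed(1)    # assumption: fixed seed, since kmeans starts from random centers
ks <- 2:6      # assumption: candidate numbers of clusters
d <- dist(dm)  # pairwise distances, needed by silhouette()

# average silhouette width for each candidate k
avg_sil <- sapply(ks, function(k) {
  km <- kmeans(dm, centers = k, nstart = 25)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})
names(avg_sil) <- ks

avg_sil                     # one value per k, each between -1 and 1
ks[which.max(avg_sil)]      # the k with the largest average width
```

A larger average width suggests better separated clusters, but with one variable and one extreme value (GrE) the result should be read with care.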


Hope this helps,

Rui Barradas


At 12:31 on 05/09/2022, Subhamitra Patra wrote:

Dear all,

I am trying to cluster my dataset using k-means clustering in R, but I am
getting scattered results. I have pasted my code below. Please suggest where
my code goes wrong. I read in my data before applying the k-means method as
follows.

DMs<-read.table(text="Country DATA
   IS -0.0092
   BA -0.0235
   HK -0.0239
   JA -0.0333
   KU -0.0022
   OM -0.0963
   QA -0.0706
   SK -0.0322
   SA -0.1233
   SI -0.0141
   TA -0.0142
   UAE -0.0656
   AUS -0.0230
  BEL -0.0006
  CYP -0.0085
  CR  -0.0398
 DEN  -0.0423
   EST -0.0604
   FIN -0.0227
   FRA -0.0085
  GER -0.0272
  GrE -0.3519
  ICE -0.0210
  IRE -0.0057
  LAT -0.0595
 LITH -0.0451
 LUXE -0.0023
 MAL  -0.0351
 NETH -0.0048
   NOR -0.0495
   POL -0.0081
 PORT -0.0044
 SLOVA -0.1210
 SLOVE -0.0031
   SPA -0.0213
   SWE -0.0106
 SWIT -0.0152
   UK -0.0030
 HUNG -0.0086
   CAN -0.0144
 CHIL -0.0078
   USA -0.0042
 BERM -0.0035
 AUST -0.0211
 NEWZ -0.0538" ,
  header = TRUE,stringsAsFactors=FALSE)
library(cluster)
k1<-kmeans(DMs[,2],centers=2,nstart=25)
plot(DMs[,2],col=k1$cluster,pch=19,xlim=c(1,46), ylim=c(-0.12,0))
text(1:46,DMs[,2],DMs[,1],col=k1$cluster)
legend(10,1,c("cluster 1: Highly Integrated","cluster 2: Less Integrated"),
col=1:2,pch=19)




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Clustering of datasets

2022-09-05 Thread Jim Lemon

Hi Subhamitra,
I think the fact that you are passing a vector of values rather than a
matrix is part of the problem. As you have only one value for each
country, the points plotted will have the index on the x-axis and the
value for each country on the y-axis. Passing ylim=c(-0.12,0) means
that you are cutting off the lowest points (SA, SLOVA and GrE all lie
below -0.12). Here is an example that will give you two clusters and
show the values for the centers in the middle of the plot. Perhaps this
is all you need, but I suspect there is more work to be done.

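k2 <- kmeans(DMs[, 2], centers = 2)
n <- nrow(DMs)  # 45 countries, not 46
# no ylim here, so that no point is cut off
plot(DMs[, 2], col = k2$cluster, pch = 19, xlim = c(1, n))
text(seq_len(n), DMs[, 2], DMs[, 1], col = k2$cluster)
# cluster centers, drawn in the middle of the plot
points(rep(23, 2), k2$centers, pch = 1:2, cex = 2, col = 1:2)
# kmeans numbers the clusters arbitrarily: check k2$centers before trusting these labels
legend("bottomleft", c("cluster 1: Highly Integrated", "cluster 2: Less Integrated"),
       col = 1:2, pch = 19)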
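
Because kmeans assigns the cluster numbers arbitrarily, fixed legend text such as "cluster 1: Highly Integrated" may end up attached to the wrong group from one run to the next. One way around this is to renumber the clusters by their centers. The sketch below assumes, purely for illustration, that cluster 1 should be the group whose center is closest to zero; the thread does not say which end of the scale counts as highly integrated. The vector x repeats the DATA column from the original post, and the name cl is hypothetical.

```r
# DATA column from the original post (45 countries, in the posted order)
x <- c(-0.0092, -0.0235, -0.0239, -0.0333, -0.0022, -0.0963, -0.0706, -0.0322, -0.1233,
       -0.0141, -0.0142, -0.0656, -0.0230, -0.0006, -0.0085, -0.0398, -0.0423, -0.0604,
       -0.0227, -0.0085, -0.0272, -0.3519, -0.0210, -0.0057, -0.0595, -0.0451, -0.0023,
       -0.0351, -0.0048, -0.0495, -0.0081, -0.0044, -0.1210, -0.0031, -0.0213, -0.0106,
       -0.0152, -0.0030, -0.0086, -0.0144, -0.0078, -0.0042, -0.0035, -0.0211, -0.0538)

set.seed(1)  # assumption: fixed seed, since kmeans starts from random centers
km <- kmeans(x, centers = 2, nstart = 25)

# original cluster ids, ordered from the highest center to the lowest
ord <- order(km$centers[, 1], decreasing = TRUE)

# renumber: new label 1 is the cluster with the highest center
cl <- match(km$cluster, ord)

tapply(x, cl, mean)  # cluster means, now in a fixed order
table(cl)            # cluster sizes
```

Using col = cl instead of col = k2$cluster in the plot and text calls would then keep the colours in step with the legend.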

Jim

On Mon, Sep 5, 2022 at 9:31 PM Subhamitra Patra wrote:
>
> Dear all,
>
> I am trying to cluster my dataset using k-means clustering in R, but I am
> getting scattered results. I have pasted my code below. Please suggest where
> my code goes wrong. I read in my data before applying the k-means method as
> follows.
>
> DMs<-read.table(text="Country DATA
>   IS -0.0092
>   BA -0.0235
>   HK -0.0239
>   JA -0.0333
>   KU -0.0022
>   OM -0.0963
>   QA -0.0706
>   SK -0.0322
>   SA -0.1233
>   SI -0.0141
>   TA -0.0142
>   UAE -0.0656
>   AUS -0.0230
>  BEL -0.0006
>  CYP -0.0085
>  CR  -0.0398
> DEN  -0.0423
>   EST -0.0604
>   FIN -0.0227
>   FRA -0.0085
>  GER -0.0272
>  GrE -0.3519
>  ICE -0.0210
>  IRE -0.0057
>  LAT -0.0595
> LITH -0.0451
> LUXE -0.0023
> MAL  -0.0351
> NETH -0.0048
>   NOR -0.0495
>   POL -0.0081
> PORT -0.0044
> SLOVA -0.1210
> SLOVE -0.0031
>   SPA -0.0213
>   SWE -0.0106
> SWIT -0.0152
>   UK -0.0030
> HUNG -0.0086
>   CAN -0.0144
> CHIL -0.0078
>   USA -0.0042
> BERM -0.0035
> AUST -0.0211
> NEWZ -0.0538" ,
>  header = TRUE,stringsAsFactors=FALSE)
> library(cluster)
> k1<-kmeans(DMs[,2],centers=2,nstart=25)
> plot(DMs[,2],col=k1$cluster,pch=19,xlim=c(1,46), ylim=c(-0.12,0))
> text(1:46,DMs[,2],DMs[,1],col=k1$cluster)
> legend(10,1,c("cluster 1: Highly Integrated","cluster 2: Less Integrated"),
> col=1:2,pch=19)
>
>
> --
> *Best Regards,*
> *Subhamitra Patra*
> *Phd. Research Scholar*
> *Department of Humanities and Social Sciences*
> *Indian Institute of Technology, Kharagpur*
> *INDIA*