[R] Help with 2-D plot of k-mean clustering analysis
Hi all,

I would like to use R to perform k-means clustering on my data, which include 33 samples measured on ~1000 variables. I have already used the kmeans() function for this analysis, and it showed that there are 4 clusters in my data. However, it is really difficult to plot these clusters in 2-D because of the huge number of variables. One possible way is to project the multidimensional space onto a 2-D plane, but I could not find a good way to do that. Any suggestions or comments would be really helpful!

Thanks,
Meng

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with 2-D plot of k-mean clustering analysis
I wonder if it makes sense to reduce the dimensionality of the variables somehow?

David Cross
d.cr...@tcu.edu
www.davidcross.us

On May 18, 2011, at 9:41 AM, Meng Wu wrote:
> I would like to use R to perform k-means clustering on my data which included 33 samples measured with ~1000 variables. [...]
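One common way to reduce the dimensionality is principal component analysis. The following is only a sketch under assumptions: the data are simulated as a stand-in for the real 33 x 1000 matrix, the names `samples`, `km` and `scores` are made up for illustration, and PCA via prcomp() is one possible choice, not something proposed in the thread.

```r
# Simulated stand-in for the real data: 33 samples (rows) x 1000 variables (columns).
set.seed(1)
n_samples <- 33
n_vars <- 1000
true_group <- rep(1:4, length.out = n_samples)
group_means <- matrix(rnorm(4 * n_vars, sd = 2), nrow = 4)
samples <- group_means[true_group, ] +
  matrix(rnorm(n_samples * n_vars), nrow = n_samples)

# Cluster the samples (kmeans() clusters the rows of its input).
km <- kmeans(samples, centers = 4, nstart = 25)

# PCA on the samples; keep the scores on the first two components.
pca <- prcomp(samples, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# 2-D plot of the samples, colored by k-means cluster.
plot(scores, col = km$cluster, pch = 19, xlab = "PC1", ylab = "PC2")
```

Whether to set `scale. = TRUE` depends on whether the variables share a common unit and range, which the original post does not say.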
Re: [R] Help with 2-D plot of k-mean clustering analysis
On Wed, May 18, 2011 at 7:41 AM, Meng Wu mengwu1...@gmail.com wrote:
> However, it's really difficult to plot this cluster in 2-D format since the huge number of variables. One possible way is to project the multidimensional space into 2-D platform, but I could not find any good way to do that.

You could use multidimensional scaling, function cmdscale(), to produce a 2-dimensional representation of your data, then plot it using colors that correspond to the clusters.

For example, suppose your data is stored in matrix X (1000x33). I assume you clustered the samples, not the variables, so you have a vector label[] with length 33 that has values between 1 and 4. Since k-means uses Euclidean distance, you would re-create the distance

  dst = dist(t(X))

then feed it into cmdscale()

  mds = cmdscale(dst);

then plot it:

  plot(mds, col = label)

HTH,
Peter
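For anyone who wants to try the steps above end to end, here is a self-contained sketch. The data are simulated (an assumption, since the real matrix was not posted); `X`, `label`, `dst` and `mds` follow the names used in the message, while `group` and `centers` are hypothetical helpers used only to generate test data.

```r
# Simulated stand-in: 1000 variables (rows) x 33 samples (columns), as in the message.
set.seed(42)
n_vars <- 1000
n_samples <- 33
group <- rep(1:4, length.out = n_samples)
centers <- matrix(rnorm(n_vars * 4, sd = 2), nrow = n_vars)  # one column per simulated group
X <- centers[, group] + matrix(rnorm(n_vars * n_samples), nrow = n_vars)

# kmeans() clusters rows, so transpose to cluster the samples.
label <- kmeans(t(X), centers = 4, nstart = 25)$cluster

# Euclidean distances between samples, then classical MDS down to 2 dimensions.
dst <- dist(t(X))
mds <- cmdscale(dst, k = 2)

# One point per sample, colored by cluster.
plot(mds, col = label, pch = 19, xlab = "MDS 1", ylab = "MDS 2")
legend("topright", legend = paste("cluster", 1:4), col = 1:4, pch = 19)
```

With Euclidean distances, classical MDS gives the same configuration as the first two principal component scores, up to the sign of each axis, so this plot and a PCA plot should look alike.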
Re: [R] Help with 2-D plot of k-mean clustering analysis
Hi Meng,

For suggestions, it would be extremely helpful to tell us what kind of variables your 1000 variables are.

Parallel coordinate plots plot values over (many) variables. Whether this is useful depends very much on your variables. E.g., I have spectral channels: they have an intrinsic order and the values physically have the same meaning (and almost the same range), so the parallel coordinate plot comes naturally (in fact, it produces the spectra).

Claudia

--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany
email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax: +49 2641 206-399
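A parallel coordinate plot of this kind can be sketched with matplot(), drawing one line per sample across the ordered variables. Everything below is an assumption for illustration: the data are simulated spectrum-like profiles, and the names `channel`, `peak_pos`, `samples` and `label` are hypothetical.

```r
# Simulated spectrum-like data: 33 samples (rows) x 1000 ordered channels (columns).
set.seed(7)
n_samples <- 33
n_vars <- 1000
channel <- seq_len(n_vars)
group <- rep(1:4, length.out = n_samples)

# Each simulated group has a smooth peak at a different channel position.
peak_pos <- c(200, 400, 600, 800)
profiles <- t(sapply(group, function(g) exp(-((channel - peak_pos[g]) / 60)^2)))
samples <- profiles + matrix(rnorm(n_samples * n_vars, sd = 0.05), nrow = n_samples)

# Cluster the samples.
label <- kmeans(samples, centers = 4, nstart = 25)$cluster

# matplot() draws one line per column, so pass the data as variables x samples.
matplot(channel, t(samples), type = "l", lty = 1, col = label,
        xlab = "variable index", ylab = "value")
```

This only reads well when the variables have a meaningful order and comparable ranges, as described above; otherwise a projection to 2-D is probably the better choice.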
Re: [R] Help with 2-D plot of k-mean clustering analysis
One idea: pick the three largest clusters; their centers determine a plane. Project your data into that plane.

albyn

On Wed, May 18, 2011 at 06:55:39PM +0200, Claudia Beleites wrote:
> Parallel coordinate plots plot values over (many) variables. Whether this is useful, depends very much on your variables [...]
--
Albyn Jones
Reed College
jo...@reed.edu
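The plane-through-three-centers idea can be sketched as follows. This is one possible reading of the suggestion, not code from the thread: the data are simulated, the plane is spanned by the differences between the three centers, and an orthonormal basis is obtained with a QR decomposition. The names `samples`, `top3`, `ctr`, `basis` and `coords` are hypothetical.

```r
# Simulated stand-in: 33 samples (rows) x 1000 variables (columns).
set.seed(3)
n_samples <- 33
n_vars <- 1000
group <- rep(1:4, length.out = n_samples)
group_means <- matrix(rnorm(4 * n_vars, sd = 2), nrow = 4)
samples <- group_means[group, ] +
  matrix(rnorm(n_samples * n_vars), nrow = n_samples)

km <- kmeans(samples, centers = 4, nstart = 25)

# Centers of the three largest clusters.
top3 <- order(km$size, decreasing = TRUE)[1:3]
ctr <- km$centers[top3, ]
origin <- ctr[1, ]

# Two vectors spanning the plane, orthonormalized via QR.
spanning <- cbind(ctr[2, ] - origin, ctr[3, ] - origin)  # n_vars x 2
basis <- qr.Q(qr(spanning))                              # orthonormal columns

# Orthogonal projection: coordinates of each sample within the plane.
coords <- sweep(samples, 2, origin) %*% basis

plot(coords, col = km$cluster, pch = 19,
     xlab = "plane axis 1", ylab = "plane axis 2")

# Mark all four cluster centers, projected the same way.
points(sweep(km$centers, 2, origin) %*% basis,
       pch = 4, cex = 2, lwd = 2, col = 1:4)
```

The three chosen centers lie exactly in the plane, so those clusters are shown around their true centers; the fourth cluster is only seen through its projection and may overlap the others.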