Re: [R] hierarchical clustering of large dataset

Peter Langfelder Fri, 09 Mar 2012 16:19:44 -0800

On Fri, Mar 9, 2012 at 1:50 PM, Massimo Di Stefano
<massimodisa...@gmail.com> wrote:
> Peter,
>
> Thank you very much for your answer.
>
>
>
> install.packages("flashClust")
> library(flashClust)
> data <- read.csv('/Users/epifanio/Desktop/cluster/x.txt')
> data <- na.omit(data)
> data <- scale(data)
>> mydata
>                 a             b            c          d           e
> 1     -0.207709346 -6.618558e-01  0.481413046  0.7761133  0.96473124
> 2     -0.207709346 -6.618558e-01  0.481413046  0.7761133  0.96473124
> 3     -0.256330843 -6.618558e-01 -0.352285877  0.7761133  0.96473124
> 4     -0.289039851 -6.618558e-01 -0.370032451 -0.2838308  0.96473124
>
>
> My goal is to group my observations by 'speciesID'.
> The speciesID is the last column: 'e'.
>
>
>
> Before going ahead, I need to understand how to tell R to
> generate the groups using column 'e' as the grouping variable,
> so that the groups correspond to speciesID.
>
> Using these instructions:
>
> d <- dist(data)
> clust <- hclust(d)
>
> it is not clear to me how R will know to use column 'e' as the label.


Well, you didn't say that column e was a label that you wanted to keep
separate. Any other labels in the data? You may not want to use labels
in the distance calculation.
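
As a rough sketch of what I mean (not tested on your file): set the label
column aside before scaling and computing distances, cluster on the remaining
columns only, and then compare the resulting clusters with the labels. The toy
data frame below is made up to stand in for x.txt, and the choice of k = 2 is
an assumption for illustration only. This uses hclust from base R (stats);
with flashClust loaded, its hclust is called instead.

```r
# Hypothetical toy data standing in for x.txt; column 'e' plays the role
# of the speciesID label.
set.seed(1)
data <- data.frame(
  a = rnorm(20), b = rnorm(20), c = rnorm(20), d = rnorm(20),
  e = rep(c(1, 2), each = 10)
)
data <- na.omit(data)

# Keep the label aside so it does not enter the distance calculation.
labels   <- data$e
features <- scale(data[, setdiff(names(data), "e")])

d      <- dist(features)        # Euclidean distance on a..d only
clust  <- hclust(d)             # default method is complete linkage
groups <- cutree(clust, k = 2)  # k = 2 is an illustrative assumption

# Cross-tabulate the clusters found against the known labels.
print(table(cluster = groups, species = labels))
```

Clustering will not reproduce the species grouping unless the other columns
actually separate the species; the table only shows how well they agree.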

Do I understand correctly that you want to cluster each species separately?
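
If that is the case, one possible approach (a sketch under my own assumptions,
with made-up data in place of x.txt) is to split the rows by the label and
build one tree per species. Whether to scale the columns over the whole data
set, as here, or within each species is a choice you would have to make.

```r
# Hypothetical toy data; 'e' is assumed to hold the speciesID.
set.seed(1)
data <- data.frame(
  a = rnorm(24), b = rnorm(24), c = rnorm(24), d = rnorm(24),
  e = rep(c(1, 2, 3), each = 8)
)
data <- na.omit(data)

# Scale the numeric columns only, over the whole data set.
features <- as.data.frame(scale(data[, c("a", "b", "c", "d")]))

# One data frame per species, then one dendrogram per species.
by_species <- split(features, data$e)
trees <- lapply(by_species, function(x) hclust(dist(x)))

# Each element is an ordinary hclust object, named by speciesID,
# e.g. plot(trees[["1"]]) draws the tree for species 1.
print(names(trees))
```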

Peter

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
