[R] Re: clustering polypeptide sequences

ucgamdo Mon, 08 Sep 2003 05:14:45 -0700

Hi Peter,

   You didn't give a very specific example, but what you want to do does
not seem complicated. I suppose you have created a table of sequences
vs. descriptors such as hydrophobicity (hydroph), aromaticity (arom),
charge, etc., something like:


seq      hydroph      arom
b0001   0.104762  0.000000
b0002   0.035122  0.065854
b0003   0.024193  0.070968
b0004  -0.096729  0.084112
b0005  -0.973469  0.091837
b0006  -0.402713  0.108527
b0007   0.680672  0.123950
b0008  -0.209779  0.072555
b0009  -0.013334  0.046154
b0010   0.952128  0.143617

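Suppose you have these data in a data frame called myseqs [see the R
documentation on how to import data, e.g. help(read.table); if the table
is saved as a plain-text file, you can try
  > myseqs <- read.table("myseqs.txt", header = TRUE, row.names = 1)
where "myseqs.txt" is only a placeholder file name. row.names = 1 makes
the seq column the row names, so dist() below sees only numeric columns.]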
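If you just want to try the commands below without creating a file, here is a minimal sketch that reads the example table above straight from a string. It relies on the text = argument of read.table(); row.names = 1 turns the seq column into row names so that only the numeric descriptors remain as columns.

```r
# Read the example table from an inline string instead of a file.
# Blank lines in the string are skipped by read.table().
myseqs <- read.table(text = "
seq      hydroph      arom
b0001   0.104762  0.000000
b0002   0.035122  0.065854
b0003   0.024193  0.070968
b0004  -0.096729  0.084112
b0005  -0.973469  0.091837
b0006  -0.402713  0.108527
b0007   0.680672  0.123950
b0008  -0.209779  0.072555
b0009  -0.013334  0.046154
b0010   0.952128  0.143617
", header = TRUE, row.names = 1)

str(myseqs)   # a data frame with two numeric columns: hydroph, arom
```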

# you need to load the necessary libraries

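# library(mva)    # basic clustering (dist, hclust); no longer needed: mva
#                 # was merged into package stats, which R loads by default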
library(cluster)  # more clustering algorithms

# then you need to calculate the 'distances' between sequences

myseqs.d <- dist(myseqs)  # creates the Euclidean distance matrix;
                          # try help(dist) for more info
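To see what dist() computes, here is a small sketch on a three-row subset of the table (the name x is just for illustration): the Euclidean distance between two sequences is the square root of the summed squared differences of their descriptors. Because hydroph varies over a much wider range than arom, it dominates these distances; scaling the columns first with scale() is one common option if you want both descriptors to count equally.

```r
# Three rows of the example table, built directly so the sketch runs alone
x <- data.frame(hydroph = c(0.104762, 0.035122, -0.973469),
                arom    = c(0.000000, 0.065854,  0.091837),
                row.names = c("b0001", "b0002", "b0005"))

d <- dist(x)   # method = "euclidean" is the default

# The same distance by hand: sqrt of the sum of squared differences
m <- as.matrix(x)
manual <- sqrt(sum((m["b0001", ] - m["b0005", ])^2))
all.equal(as.matrix(d)["b0001", "b0005"], manual)   # TRUE

# Centre each column and rescale it to unit variance before measuring
d.scaled <- dist(scale(x))
```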

# then we perform hierarchical clustering

myseqs.clus <- hclust(myseqs.d)

# now check out your results

plot(myseqs.clus) # hey! you see how easy it is?
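The dendrogram is nice to look at, but often you also want explicit group memberships. A sketch, assuming you want three groups (k = 3 is an arbitrary choice here): cutree() cuts the tree returned by hclust() and gives one group number per sequence. The method argument of hclust() selects the linkage, with complete linkage as the default. To keep the example self-contained, the table is rebuilt as a matrix called m, a name made up for this sketch.

```r
# The example table as a numeric matrix; row names are the sequence ids
m <- matrix(c( 0.104762, 0.000000,
               0.035122, 0.065854,
               0.024193, 0.070968,
              -0.096729, 0.084112,
              -0.973469, 0.091837,
              -0.402713, 0.108527,
               0.680672, 0.123950,
              -0.209779, 0.072555,
              -0.013334, 0.046154,
               0.952128, 0.143617),
            ncol = 2, byrow = TRUE,
            dimnames = list(sprintf("b%04d", 1:10), c("hydroph", "arom")))

hc <- hclust(dist(m))           # complete linkage (the default method)
groups <- cutree(hc, k = 3)     # named vector: one group id per sequence
print(table(groups))            # how many sequences fall in each group

# The same step with a different linkage criterion
hc.avg <- hclust(dist(m), method = "average")
groups.avg <- cutree(hc.avg, k = 3)
```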

# the documentation for hclust (help(hclust)) contains much more info
# other fancy clustering algorithms, e.g. pam() from package cluster

myseqs.pam <- pam(myseqs, k = 2)
plot(myseqs.pam)
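pam() needs the number of clusters k up front. One common way to pick it, sketched below, is to compare the average silhouette width (which pam() stores in $silinfo$avg.width) for a few candidate values of k and keep the largest. The candidate range 2:4 is only an assumption that suits ten sequences, and the matrix m is again the example table rebuilt for this sketch.

```r
library(cluster)   # provides pam() and its silhouette information

# The example table as a numeric matrix; row names are the sequence ids
m <- matrix(c( 0.104762, 0.000000,
               0.035122, 0.065854,
               0.024193, 0.070968,
              -0.096729, 0.084112,
              -0.973469, 0.091837,
              -0.402713, 0.108527,
               0.680672, 0.123950,
              -0.209779, 0.072555,
              -0.013334, 0.046154,
               0.952128, 0.143617),
            ncol = 2, byrow = TRUE,
            dimnames = list(sprintf("b%04d", 1:10), c("hydroph", "arom")))

ks <- 2:4                          # candidate numbers of clusters (assumed)
avg.sil <- sapply(ks, function(k) pam(m, k = k)$silinfo$avg.width)
best.k <- ks[which.max(avg.sil)]   # k whose clusters are best separated

fit <- pam(m, k = best.k)
print(fit$medoids)                 # coordinates of each cluster's medoid
print(fit$clustering)              # cluster id for each sequence
```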

I hope this is of some help.

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
