I have a matrix of similarity scores that I want to convert into a matrix of dissimilarity scores so that I can apply some clustering methods to the data. That is, high values in my matrix signify similarity and low values (zero being the lowest) signify no similarity. What functions/options in R or its packages are available for making this kind of transformation of a matrix?
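
To make the question concrete, here is a minimal sketch of the kind of transformation I mean, using base R only. The toy matrix `sim` and its values are made up for illustration, and subtracting from the maximum is just one possible conversion, not necessarily the right one for my data.

```r
# Toy symmetric similarity matrix (made-up values):
# high = similar, 0 = no similarity
ids <- c("a", "b", "c")
sim <- matrix(c(10,  8,  0,
                 8, 10,  2,
                 0,  2, 10),
              nrow = 3, byrow = TRUE,
              dimnames = list(ids, ids))

# Subtract from the maximum so the most similar pair gets the smallest value
dis <- max(sim) - sim

# as.dist() keeps the lower triangle as a "dist" object,
# which is what clustering functions such as hclust() expect
d <- as.dist(dis)
```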

Specifically, I am a molecular biologist. I have a set of 700+ nucleotide sequences that I want to group into clusters based on sequence similarity. The set covers a wide range of sequences, some of which are homologous to other sequences in the set. I want to use clustering to identify these groups.

If the sequences were related and could be trimmed to the same length, I would do an alignment and then use PHYLIP (or some other distance method) to create a distance matrix, but since my sequences are unrelated and cannot be trimmed to the same length, I am at a loss for what to do.

For a set with so many unrelated sequences of different lengths, the only thing I have been able to do is an all-against-all BLAST to create the matrix, but this gives high scores for similarities, not high scores for dissimilarities. The only thought I had was to use the reciprocal of the BLAST score as some perverse measure of distance.
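
The reciprocal idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the score matrix `scores` is made up, averaging the two BLAST directions to get a symmetric matrix is my own choice, and the pseudocount `pseudo` is a hypothetical guard against dividing by zero for pairs with no hit.

```r
# Toy all-against-all BLAST scores (made-up values);
# rows = query, columns = subject, 0 = no hit
ids <- c("s1", "s2", "s3")
scores <- matrix(c(200, 150,   0,
                   140, 180,  20,
                     0,  25, 220),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(ids, ids))

# BLAST scores need not be symmetric, so average the two directions (assumption)
sym <- (scores + t(scores)) / 2

# Reciprocal as a distance; the pseudocount (hypothetical) avoids 1/0 = Inf
pseudo <- 1
dis <- 1 / (sym + pseudo)
diag(dis) <- 0  # a sequence is at distance zero from itself

# Hierarchical clustering on the resulting distances, cut into two groups
hc <- hclust(as.dist(dis), method = "average")
groups <- cutree(hc, k = 2)
```

Whether a reciprocal, a maximum-minus-score, or some normalized score is the more sensible distance here is exactly what I am unsure about.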

I am not subscribed to the list, so can I ask for responses directly to my email address?

Thank you,
Tom Isenbarger


-- [EMAIL PROTECTED] thomas a isenbarger (608) 265-0850
