Hi Chrysanthi,
Chrysanthi A. wrote:
Thanks a lot! What exactly is the sweep function doing? Also, is it
possible to use the mean of only a group of the row values instead of
the mean of the whole row? So the values in the matrix (heat map) used
in the comparison are z-scores and not the intensities of the gene
expressions, right?
I was trying to give a subtle hint below, but maybe I should be a bit
more blunt. One of the coolest things about R is that it is free, and
there are these sweet listservs where people give advice and help for
free as well.
HOWEVER, there is still a price to pay, and that is with your time. All
of these functions have help pages that the developers spent time
writing, and the code is there for you to peruse. Because of this, there
is some expectation that you would have done so prior to asking
questions. Now I have read the help page for sweep, and quite frankly it
is a bit confusing. The term 'sweep' is used without definition, so if
one doesn't know what that means the help page is less than helpful. But
it doesn't take much time or effort to empirically see what it does:
> a <- matrix(rnorm(25), ncol=5)
> a
           [,1]       [,2]       [,3]        [,4]        [,5]
[1,]  0.6841637 -1.0590185 -0.1719887 -0.01916011 -1.61936817
[2,]  0.5707217  1.4790968  1.6736991 -0.72158518  1.22467334
[3,]  0.4440499 -0.3382888 -0.1504191  0.32140022  1.83780859
[4,] -0.6659568  3.0573678 -1.5709904 -1.35618488 -0.01717017
[5,] -0.3182206  2.2777597 -0.2325356 -0.02001414  1.77440090
> rm <- rowMeans(a)
> rm
[1] -0.4370743  0.8453211  0.4229102 -0.1105869  0.6962780
> sweep(a, 1, rm, "-")
            [,1]       [,2]       [,3]       [,4]        [,5]
[1,]  1.12123808 -0.6219441  0.2650857  0.4179142 -1.18229384
[2,] -0.27459943  0.6337756  0.8283779 -1.5669063  0.37935220
[3,]  0.02113977 -0.7611990 -0.5733293 -0.1015100  1.41489842
[4,] -0.55536988  3.1679546 -1.4604035 -1.2455980  0.09341672
[5,] -1.01449866  1.5814817 -0.9288137 -0.7162922  1.07812286
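To spell out what that output shows, here is a small base R sketch (the seed and the names cm, by_row and by_col are made up for illustration). sweep() applies FUN between a summary vector and the matrix along the chosen margin, and FUN defaults to "-".

```r
set.seed(1)                        # arbitrary seed so the run is repeatable
a <- matrix(rnorm(25), ncol = 5)

rm <- rowMeans(a)
by_row <- sweep(a, 1, rm, "-")     # subtract rm[i] from every value in row i

# For rows, plain recycling gives the same answer, because R fills
# matrices column by column and rm has one entry per row.
all.equal(by_row, a - rm)

# For columns, recycling would not line up, which is where sweep() helps.
cm <- colMeans(a)
by_col <- sweep(a, 2, cm, "-")     # subtract cm[j] from every value in column j
all.equal(by_col, t(t(a) - cm))

# After sweeping out the means, each row (or column) averages to zero,
# up to floating point error.
max(abs(rowMeans(by_row)))
max(abs(colMeans(by_col)))
```

So "sweeping" here just means removing (or dividing out) a per-row or per-column summary from every element of that row or column.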
For your second question:
?heatmap.2
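If the built-in scaling turns out not to be what you want, one possible approach is to centre the rows yourself with sweep() and then switch off heatmap.2()'s own scaling with scale="none". This is only a sketch under assumptions: the data are simulated, the reference group (columns 1 to 3) is hypothetical, and the names x, grp, centred and scaled are invented for the example.

```r
set.seed(1)
x <- matrix(rnorm(60), ncol = 6)          # 10 genes x 6 samples, made-up data
grp <- 1:3                                # hypothetical reference group of columns

# Row means computed from the reference group only
grp_mean <- rowMeans(x[, grp, drop = FALSE])
centred  <- sweep(x, 1, grp_mean, "-")    # subtract the group mean from each row

# Optionally divide by the group's row SD to get a z-score-like value
grp_sd <- apply(x[, grp, drop = FALSE], 1, sd)
scaled <- sweep(centred, 1, grp_sd, "/")

# The reference columns now average to zero within each row
max(abs(rowMeans(centred[, grp])))

# Plot without any further scaling (only if gplots is installed)
if (interactive() && requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(scaled, scale = "none", trace = "none")
}
```

Note that the values plotted this way are deviations from the group mean, not the Row Z-Score that scale="row" would give, so the colour key needs to be read accordingly.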
Also, as far as I can understand from the code, heatmap is using the
distfun function for the clustering. Can I use Pearson correlation for
the clustering? My main objective in using the heatmap is to examine the
expression levels of the marker genes and to confirm that the marker
genes are clearly differentially expressed in the two subtypes of the
disease that I examine.
No, heatmap.2() is not using a function called distfun for the
clustering. There isn't a function by that name in either gplots or
base R. If you look at the help page, you can see that distfun is an
argument to the function, and the default is to use the dist() function.
You can use Pearson correlation, but in my experience it takes some
work. Again, if you read the help page, you can see that the Rowv and
Colv arguments can be one of TRUE, FALSE, NULL, or a dendrogram. So if
you want to use Pearson correlation, you should supply heatmap.2() with
dendrograms produced using that correlation. So an example:
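library(gplots)  # provides heatmap.2()
a <- matrix(rnorm(50), ncol=5)
# distance between rows = 1 - Pearson correlation; cor() works on columns, hence t()
rowv <- as.dendrogram(hclust(as.dist(1-cor(t(a)))))
# same idea for the columns
colv <- as.dendrogram(hclust(as.dist(1-cor(a))))
heatmap.2(a, scale="row", Rowv=rowv, Colv=colv)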
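A possible alternative, since distfun is an argument, is to pass a correlation-based distance function instead of pre-built dendrograms. A sketch, assuming that heatmap.2() applies distfun to the matrix for the rows and to its transpose for the columns; the name cor_dist is made up for this example.

```r
set.seed(1)
a <- matrix(rnorm(50), ncol = 5)

# Distance between the ROWS of x: 1 - Pearson correlation.
# cor() correlates columns, so transpose first.
cor_dist <- function(x) as.dist(1 - cor(t(x)))

d_rows <- cor_dist(a)       # distances among the 10 rows
d_cols <- cor_dist(t(a))    # distances among the 5 columns

# The row tree matches the one built by hand from 1 - cor(t(a))
rowv <- as.dendrogram(hclust(as.dist(1 - cor(t(a)))))
identical(order.dendrogram(rowv),
          order.dendrogram(as.dendrogram(hclust(d_rows))))

# Plot only if gplots is installed
if (interactive() && requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(a, scale = "row", distfun = cor_dist, trace = "none")
}
```

One difference to be aware of: with the default Rowv and Colv, heatmap.2() reorders the dendrogram it computes by the row and column means, so the leaves may be drawn in a different order than with dendrograms supplied as-is, even though the tree is the same.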
Best,
Jim
Many thanks,
Chrysanthi.
2009/7/8 James W. MacDonald <jmac...@med.umich.edu>
Hi Chrysanthi,
Chrysanthi A. wrote:
Hi,
I am analysing gene expression data using the heatmap.2 function in R
and I was wondering what is the formula of the "raw z-score" bar which
shows the colors for each pixel. According to that post:
https://mailman.stat.ethz.ch/pipermail/r-help/2006-September/113598.html,
it is the (actual value - mean of the group) / standard deviation.
But, mean of which group? Mean of the gene vector? And actual value of
that gene on a sample? I would be grateful if you could give me some
more details about it or even if there is a book/manual that I could
address to..
How about looking at the code?
    if (scale == "row") {
        retval$rowMeans <- rm <- rowMeans(x, na.rm = na.rm)
        x <- sweep(x, 1, rm)
        retval$rowSDs <- sx <- apply(x, 1, sd, na.rm = na.rm)
        x <- sweep(x, 1, sx, "/")
    }
    else if (scale == "column") {
        retval$colMeans <- rm <- colMeans(x, na.rm = na.rm)
        x <- sweep(x, 2, rm)
        retval$colSDs <- sx <- apply(x, 2, sd, na.rm = na.rm)
        x <- sweep(x, 2, sx, "/")
    }
So the z-score is calculated on either the row or column (or the
default of "none").
I don't see how you can get something saying 'raw z-score'. I get
either 'Row Z-Score' or 'Column Z-Score'. So assuming you meant Row
Z-Score, then the rows are centered and scaled by subtracting the
mean of the row from every value and then dividing the resulting
values by the standard deviation of the row.
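The same two steps can be reproduced outside heatmap.2() to check the result. A small sketch with simulated data (the names x and z and the seed are made up for illustration); note that sd() uses the n - 1 denominator.

```r
set.seed(1)
x <- matrix(rnorm(40, mean = 8, sd = 2), ncol = 5)   # 8 genes x 5 samples

# Same two steps as the scale == "row" branch
rm <- rowMeans(x)
z  <- sweep(x, 1, rm)               # FUN defaults to "-"
sx <- apply(z, 1, sd)
z  <- sweep(z, 1, sx, "/")

# Each row now has mean 0 and standard deviation 1 (up to rounding error)
range(rowMeans(z))
range(apply(z, 1, sd))

# scale() works on columns, so transposing twice gives the same matrix
all.equal(z, t(scale(t(x))), check.attributes = FALSE)
```

So each cell is (value for that gene in that sample - mean of that gene across all samples) / standard deviation of that gene across all samples.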
Best,
Jim
Thanks a lot,
Chrysanthi.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826