Re: [R] heatmap.2: question regarding the "raw z-score"

James W. MacDonald Thu, 09 Jul 2009 07:21:12 -0700

Hi Chrysanthi,

Chrysanthi A. wrote:

Thanks a lot..! What exactly is the sweep function doing? Also, is there a possibility, instead of using the mean of the whole row, to get only the mean of a group of the row values? So the values in the matrix (heatmap) used in the comparison are z-scores and not the intensities of the gene expressions, right?

I was trying to give a subtle hint below, but maybe I should be a bit more blunt. One of the coolest things about R is that it is free, and there are these sweet listservs where people give advice and help for free as well.

HOWEVER, there is still a price to pay, and that is with your time. All of these functions have help pages that the developers spent time writing, and the code is there for you to peruse. Because of this, there is some expectation that you would have done so prior to asking questions. Now I have read the help page for sweep, and quite frankly it is a bit confusing. The term 'sweep' is used without definition, so if one doesn't know what that means the help page is less than helpful. But it doesn't take much time or effort to empirically see what it does:


> a <- matrix(rnorm(25), ncol=5)
> a
           [,1]       [,2]       [,3]        [,4]        [,5]
[1,]  0.6841637 -1.0590185 -0.1719887 -0.01916011 -1.61936817
[2,]  0.5707217  1.4790968  1.6736991 -0.72158518  1.22467334
[3,]  0.4440499 -0.3382888 -0.1504191  0.32140022  1.83780859
[4,] -0.6659568  3.0573678 -1.5709904 -1.35618488 -0.01717017
[5,] -0.3182206  2.2777597 -0.2325356 -0.02001414  1.77440090
> rm <- rowMeans(a)
> rm
[1] -0.4370743  0.8453211  0.4229102 -0.1105869  0.6962780
> sweep(a, 1, rm, "-")
            [,1]       [,2]       [,3]       [,4]        [,5]
[1,]  1.12123808 -0.6219441  0.2650857  0.4179142 -1.18229384
[2,] -0.27459943  0.6337756  0.8283779 -1.5669063  0.37935220
[3,]  0.02113977 -0.7611990 -0.5733293 -0.1015100  1.41489842
[4,] -0.55536988  3.1679546 -1.4604035 -1.2455980  0.09341672
[5,] -1.01449866  1.5814817 -0.9288137 -0.7162922  1.07812286
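
To spell out what the session above shows, here is a minimal sketch under my own assumptions (the seed and the variable names are arbitrary and not from the original session). It checks that sweep() with MARGIN = 1 subtracts the i-th statistic from every entry of row i, and that MARGIN = 2 does the same per column:

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
a <- matrix(rnorm(25), ncol = 5)
row_means <- rowMeans(a)           # named row_means to avoid masking base::rm()

# MARGIN = 1: the i-th statistic is subtracted from every entry of row i
centered <- sweep(a, 1, row_means, "-")

# Same result via recycling, because R stores matrices column by column
stopifnot(isTRUE(all.equal(centered, a - row_means)))

# After centering, every row mean is (numerically) zero
stopifnot(isTRUE(all.equal(rowMeans(centered), rep(0, 5))))

# MARGIN = 2 works per column; this matches scale() with centering only
col_centered <- sweep(a, 2, colMeans(a), "-")
stopifnot(isTRUE(all.equal(col_centered,
                           scale(a, center = TRUE, scale = FALSE),
                           check.attributes = FALSE)))
```

The fourth argument is the function applied to each entry and its statistic, so "/" in place of "-" divides instead of subtracting.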

For your second question:

?heatmap.2
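
On centering each row on the mean of a group rather than the whole row: the scale argument of heatmap.2() only takes "none", "row" or "column", so one possible route is to do the centering yourself and pass the result with scale="none". This is only a sketch. The choice of columns 1 to 3 as the reference group, the gene and sample names, and dividing by the whole-row standard deviation are all my assumptions for illustration.

```r
set.seed(2)                                   # arbitrary seed
x <- matrix(rnorm(60), nrow = 10,
            dimnames = list(paste0("gene", 1:10), paste0("s", 1:6)))
ref <- 1:3                                    # hypothetical reference group

ref_means <- rowMeans(x[, ref])               # per-row mean of the group only
row_sds   <- apply(x, 1, sd)                  # per-row SD over all samples

# Subtract the group mean from each row, then divide by the row SD
z <- sweep(sweep(x, 1, ref_means, "-"), 1, row_sds, "/")

# The reference columns now average to zero within every row
stopifnot(isTRUE(all.equal(unname(rowMeans(z[, ref])), rep(0, 10))))

# Draw only in an interactive session with gplots installed;
# scale = "none" stops heatmap.2() from re-centering the rows
if (interactive() && requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(z, scale = "none", trace = "none")
}
```

The values plotted this way are no longer ordinary row z-scores, so they should be described as such in any figure legend.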

Also, as I can understand from the code, heatmap is using the distfun function for the clustering. Can I use Pearson correlation for the clustering? My main object of using the heatmap is to examine the expression levels of the marker genes and to confirm that the marker genes are clearly differentially expressed in the two subtypes of the disease that I examine.

No, heatmap.2() is not using distfun for the clustering. There isn't a function by that name in either gplots or base R. If you look at the help page, you can see that distfun is an argument to the function, and the default is to use the dist() function.

You can use Pearson correlation, but in my experience it takes some work. Again, if you read the help page, you can see that the Rowv and Colv arguments can be one of TRUE, FALSE, NULL, or a dendrogram. So if you want to use Pearson correlation, you should supply heatmap.2() with dendrograms produced using that correlation. So an example:


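library(gplots)  # provides heatmap.2()

a <- matrix(rnorm(50), ncol=5)
# distance = 1 - Pearson correlation; cor() correlates columns, hence t(a) for rows
rowv <- as.dendrogram(hclust(as.dist(1-cor(t(a)))))
colv <- as.dendrogram(hclust(as.dist(1-cor(a))))
heatmap.2(a, scale="row", Rowv=rowv, Colv=colv)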
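
An alternative that may be worth trying is to pass a correlation-based distance through the distfun argument instead of building the dendrograms by hand. This is a sketch, not a tested recipe: cor_dist is a helper name I made up, and it relies on heatmap.2() applying distfun to the matrix for rows and to its transpose for columns. The check in the middle shows that this distance is unchanged by row z-scoring, so it does not matter whether the clustering sees raw or scaled rows.

```r
# Hypothetical helper: 1 - Pearson correlation between the rows of m
cor_dist <- function(m) as.dist(1 - cor(t(m)))

set.seed(3)                                   # arbitrary seed
a <- matrix(rnorm(50), ncol = 5)

# Row z-scores, computed outside heatmap.2() for comparison
z <- t(scale(t(a)))

# Pearson correlation ignores per-row shifts and rescaling, so the
# distances between rows are the same for raw and z-scored data
stopifnot(isTRUE(all.equal(as.matrix(cor_dist(a)),
                           as.matrix(cor_dist(z)))))

# Draw only in an interactive session with gplots installed
if (interactive() && requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(a, scale = "row", distfun = cor_dist, trace = "none")
}
```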

Best,

Jim


Many thanks,

Chrysanthi.

2009/7/8 James W. MacDonald <[email protected]>


    Hi Chrysanthi,


    Chrysanthi A. wrote:

        Hi,

        I am analysing gene expression data using the heatmap.2 function in R
        and I was wondering what is the formula of the "raw z-score" bar which
        shows the colors for each pixel. According to that post:
        https://mailman.stat.ethz.ch/pipermail/r-help/2006-September/113598.html,
        it is the

        (actual value - mean of the group) / standard deviation.

        But, mean of which group? Mean of the gene vector? And actual value of
        that gene on a sample?  I would be grateful if you could give me some
        more details about it or even if there is a book/manual that I could
        address to..


    How about looking at the code?

       if (scale == "row") {
           retval$rowMeans <- rm <- rowMeans(x, na.rm = na.rm)
           x <- sweep(x, 1, rm)
           retval$rowSDs <- sx <- apply(x, 1, sd, na.rm = na.rm)
           x <- sweep(x, 1, sx, "/")
       }
       else if (scale == "column") {
           retval$colMeans <- rm <- colMeans(x, na.rm = na.rm)
           x <- sweep(x, 2, rm)
           retval$colSDs <- sx <- apply(x, 2, sd, na.rm = na.rm)
           x <- sweep(x, 2, sx, "/")
       }

    So the z-score is calculated on either the row or column (or the
    default of "none").

    I don't see how you can get something saying 'raw z-score'. I get
    either 'Row Z-Score' or 'Column Z-Score'. So assuming you meant Row
    Z-Score, then the rows are centered and scaled by subtracting the
    mean of the row from every value and then dividing the resulting
    values by the standard deviation of the row.
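
As a worked check of that description, the sketch below repeats the two sweep() steps from the scale == "row" branch on a small made-up matrix (the seed, dimensions and variable names are my own assumptions) and compares one entry against the formula computed by hand:

```r
set.seed(4)                                    # arbitrary seed
x <- matrix(rnorm(20, mean = 8, sd = 2), nrow = 4)

# Step 1: subtract each row's mean (FUN defaults to "-")
centered <- sweep(x, 1, rowMeans(x))
# Step 2: divide each row by its standard deviation
z <- sweep(centered, 1, apply(centered, 1, sd), "/")

# One entry by hand: (actual value - mean of its row) / SD of its row
stopifnot(isTRUE(all.equal(z[2, 3], (x[2, 3] - mean(x[2, ])) / sd(x[2, ]))))

# Every row of z has standard deviation 1
stopifnot(isTRUE(all.equal(apply(z, 1, sd), rep(1, 4))))

# scale() works on columns, so transposing twice gives the same row z-scores
stopifnot(isTRUE(all.equal(z, t(scale(t(x))), check.attributes = FALSE)))
```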

    Best,

    Jim



        Thanks a lot,

        Chrysanthi.


        ______________________________________________
        [email protected] <mailto:[email protected]> mailing list
        https://stat.ethz.ch/mailman/listinfo/r-help
        PLEASE do read the posting guide
        http://www.R-project.org/posting-guide.html
        and provide commented, minimal, self-contained, reproducible code.

    --
    James W. MacDonald, M.S.

    Biostatistician
    Douglas Lab
    University of Michigan
    Department of Human Genetics
    5912 Buhl
    1241 E. Catherine St.
    Ann Arbor MI 48109-5618
    734-615-7826



