Re: [R] Correlation between matrices
Thank you Dennis, your tips are really helpful. I don't quite understand the lm(y~mouse) part; my intention was -- in pseudo code -- lm(y(Enzyme) ~ y(each elem)). In addition, attach(d) seems necessary before using lm(y~mouse), and since d$mouse has a length 125, while each elem for each region has a length 5, it generates the following error: coefs = ddply(d, .(regions, elem), coefun) Error in model.frame.default(formula = y ~ mouse, drop.unused.levels = TRUE) : variable lengths differ (found for 'mouse') On Sun, Nov 6, 2011 at 12:53 PM, Dennis Murphy djmu...@gmail.com wrote: Hi: I don't think you want to keep these objects separate; it's better to combine everything into a data frame. Here's a variation of your example - the x variable ends up being a mouse, but you may have another variable that's more appropriate to plot so take this as a starting point. One plot uses the ggplot2 package, the other uses the lattice and latticeExtra packages. library('ggplot2') regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain', 'cerebellum') mice = paste('mouse', 1:5, sep='') elem - c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme') # Generate a data frame from the combinations of # mice, regions and elem: d - data.frame(expand.grid(mice = mice, regions = regions, elem = elem), y = rnorm(125)) # Create a numeric version of mice d$mouse - as.numeric(d$mice) # A function to return regression coefficients coefun - function(df) coef(lm(y ~ mouse), data = df) # Apply to all regions * elem combinations coefs - ddply(d, .(regions, elem), coefun) names(coefs) - c('regions', 'elem', 'b0', 'b1') # Generate the plot using package ggplot2: ggplot(d, aes(x = mouse, y = y)) + geom_point(size = 2.5) + geom_abline(data = coefs, aes(intercept = b0, slope = b1), size = 1) + facet_grid(elem ~ regions) # Same plot in lattice: library('lattice') library('latticeExtra') p - xyplot(y ~ mouse | elem + regions, data = d, type = c('p', 'r'), layout = c(5, 5)) HTH, Dennis On Sat, Nov 5, 2011 at 10:49 AM, Kaiyin Zhong kindlych...@gmail.com wrote: regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain', 'cerebellum') mice = paste('mouse', 1:5, sep='') for (n in c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')) { + assign(n, as.data.frame(replicate(5, rnorm(5 + } names(Cu) = names(Zn) = names(Fe) = names(Ca) = names(Enzyme) = regions row.names(Cu) = row.names(Zn) = row.names(Fe) = row.names(Ca) = row.names(Enzyme) = mice Cu cortex hippocampus brain_stem mid_brain cerebellum mouse1 -0.5436573 -0.31486713 0.1039148 -0.3908665 -1.0849112 mouse2 1.4559136 1.75731752 -2.1195118 -0.9894767 0.3609033 mouse3 -0.6735427 -0.04666507 0.9641000 0.4683339 0.7419944 mouse4 0.6926557 -0.47820023 1.3560802 0.9967562 -1.3727874 mouse5 0.2371585 0.20031393 -1.4978517 0.7535148 0.5632443 Zn cortex hippocampus brain_stem mid_brain cerebellum mouse1 -0.66424043 0.6664478 1.1983546 0.0319403 0.41955740 mouse2 -1.14510448 1.5612235 0.3210821 0.4094753 1.01637466 mouse3 -0.85954416 2.8275458 -0.6922565 -0.8182307 -0.06961242 mouse4 0.03606034 -0.7177256 0.7067217 0.2036655 -0.25542524 mouse5 0.67427572 0.6171704 0.1044267 -1.8636174 -0.07654666 Fe cortex hippocampus brain_stem mid_brain cerebellum mouse1 1.8337008 2.0884261 0.29730413 -1.6884804 0.8336137 mouse2 -0.2734139 -0.5728439 0.63791556 -0.6232828 -1.1352224 mouse3 -0.4795082 0.1627235 0.21775206 1.0751584 -0.5581422 mouse4 1.7125147 -0.5830600 1.40597896 -0.2815305 0.3776360 mouse5 -0.3469067 -0.4813120 -0.09606797 1.0970077 -1.1234038 Ca cortex hippocampus brain_stem mid_brain cerebellum mouse1 -0.7663354 0.8595091 1.33803798 -1.17651576 0.8299963 mouse2 -0.7132260 -0.2626811 0.08025079 -2.40924271 0.7883005 mouse3 -0.7988904 -0.1144639 -0.65901136 0.42462227 0.7068755 mouse4 0.3880393 0.5570068 -0.49969135 0.06633009 -1.3497228 mouse5 1.0077684 0.6023264 -0.57387762 0.25919461 -0.9337281 Enzyme cortex hippocampus brain_stem mid_brain cerebellum mouse1 1.3430936 0.5335819 -0.56992947 1.3565803 -0.8323391 mouse2 1.0520850 -1.0201124 0.8965 1.4719880 1.0854768 mouse3 -0.2802482 0.6863323 -1.37483570 -0.7790174 0.2446761 mouse4 -0.1916415 -0.4566571 1.93365932 1.3493848 0.2130424 mouse5 -1.0349593 -0.1940268 -0.07216321 -0.2968288 1.7406905 In each anatomic region, I would like to calculate the correlation between Enzyme activity and each of the concentrations of Cu, Zn, Fe, and Ca, and do a scatter plot with a tendency line, organizing those plots into a grid. See the image below for the desired effect: http://postimage.org/image/62brra6jn/ How can I achieve this? Thank you in advance. [[alternative HTML version
Re: [R] Correlation between matrices
Hi: On Sat, Nov 5, 2011 at 11:06 PM, Kaiyin Zhong kindlych...@gmail.com wrote: Thank you Dennis, your tips are really helpful. I don't quite understand the lm(y~mouse) part; my intention was -- in pseudo code -- lm(y(Enzyme) ~ y(each elem)). As I said in my first response, I didn't quite understand what you were trying to regress so I used the mouse as a way of showing you how the code works. I think I understand what you want now, though. I'll create a data set in two ways: the first assumes you have the data as constructed in your original post and the second generates random numbers after erecting a 'scaffold' data frame. The game is to separate the enzyme data from the element data and put them into the final data frame as separate columns. Then the regression is easy if that's what you need to do. # Method 1: Generate the data as you did into separate data frames elem0 - c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme') regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain', 'cerebellum') # Creates five 5 x 5 data frames with names V1-V5: for (n in c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')) { assign(n, as.data.frame(replicate(5, rnorm(5 } # Stack the chemical element data using melt() from # the reshape2 package: library('reshape2') d1 - rbind(melt(Cu), melt(Zn), melt(Fe), melt(Ca)) # Relabel V1 - V5 with brain region names, add a factor # to distinguish individual elements and tack on the melted # Enzyme data so that it repeats in each element block d1 - within(d1, { variable - factor(d1$variable, labels = regions) elem - factor(rep(elem0[1:4], each = 25)) Enzyme - melt(Enzyme)[, 2] } ) # Plot the data using lattice and latticeExtra: library('lattice') library('latticeExtra') p - xyplot(Enzyme ~ value | variable + elem, data = d1, type = c('p', 'r')) useOuterStrips(p) ### ## Method 2: Generate the random data after setting ## up the element/region/mouse combinations ## # Generate a data frame from the combinations of # mice, regions and elem: library('ggplot2') mice - paste('mouse', 1:5, sep = '') regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain', 'cerebellum') elem - elem0[1:4] d0 - data.frame(expand.grid(mice = mice, regions = regions, elem = elem)) d0 - within(d0, { value - rnorm(100) # generate element values Enzyme - rnorm(25) # generate enzyme values } ) # the Enzyme values are recycled through all element blocks. # You can either adapt the lattice code above to plot d0, or you # can do the following to get an analogous plot in ggplot2. # It's easier to compute the slopes and intercepts and put # them into a data frame that ggplot() can import, so that's # what we'll do first. # A function to return regression coefficients from a # generic data frame. Since this function goes into ddply(), # the argument df is a (generic) data frame and the output # will be converted to a one-line data frame. coefun - function(df) coef(lm(Enzyme ~ value, data = df)) # Apply the function to all regions * elem combinations. # Output is a data frame of coefficients corresponding to # each region/element combination coefs - ddply(d0, .(regions, elem), coefun) # Rename the columns names(coefs) - c('regions', 'elem', 'b0', 'b1') # Generate the plot using package ggplot2: ggplot(d0, aes(x = val, y = Enzyme)) + geom_point(size = 2.5) + geom_abline(data = coefs, aes(intercept = b0, slope = b1), size = 1) + xlab() + facet_grid(elem ~ regions) In addition, attach(d) seems necessary before using lm(y~mouse), and since d$mouse has a length 125, while each elem for each region has a length 5, it generates the following error: You should never need to use attach() - use the data = argument in lm() instead, where the value of data is the name of a data frame. It's always easier to use the modeling functions in R having formula interfaces with data frames. coefs = ddply(d, .(regions, elem), coefun) Error in model.frame.default(formula = y ~ mouse, drop.unused.levels = TRUE) : variable lengths differ (found for 'mouse') You're clearly doing something here that's messing up the structure of the data. Study what the code (and its output) above are telling you, particularly if you're not familiar with plyr, lattice and/or ggplot2. Writing functions to insert into a **ply() function in plyr can be tricky. If you continue to have problems, please provide a reproducible example as you did here. HTH, Dennis On Sun, Nov 6, 2011 at 12:53 PM, Dennis Murphy djmu...@gmail.com wrote: Hi: I don't think you want to keep these objects separate; it's better to combine everything into a data frame. Here's a variation of your example - the x variable ends up being a mouse, but you may have another variable that's more appropriate to plot so take this as a
[R] Correlation between matrices
regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain', 'cerebellum') mice = paste('mouse', 1:5, sep='') for (n in c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')) { + assign(n, as.data.frame(replicate(5, rnorm(5 + } names(Cu) = names(Zn) = names(Fe) = names(Ca) = names(Enzyme) = regions row.names(Cu) = row.names(Zn) = row.names(Fe) = row.names(Ca) = row.names(Enzyme) = mice Cu cortex hippocampus brain_stem mid_brain cerebellum mouse1 -0.5436573 -0.31486713 0.1039148 -0.3908665 -1.0849112 mouse2 1.4559136 1.75731752 -2.1195118 -0.9894767 0.3609033 mouse3 -0.6735427 -0.04666507 0.9641000 0.4683339 0.7419944 mouse4 0.6926557 -0.47820023 1.3560802 0.9967562 -1.3727874 mouse5 0.2371585 0.20031393 -1.4978517 0.7535148 0.5632443 Zn cortex hippocampus brain_stem mid_brain cerebellum mouse1 -0.66424043 0.6664478 1.1983546 0.0319403 0.41955740 mouse2 -1.14510448 1.5612235 0.3210821 0.4094753 1.01637466 mouse3 -0.85954416 2.8275458 -0.6922565 -0.8182307 -0.06961242 mouse4 0.03606034 -0.7177256 0.7067217 0.2036655 -0.25542524 mouse5 0.67427572 0.6171704 0.1044267 -1.8636174 -0.07654666 Fe cortex hippocampus brain_stem mid_brain cerebellum mouse1 1.8337008 2.0884261 0.29730413 -1.6884804 0.8336137 mouse2 -0.2734139 -0.5728439 0.63791556 -0.6232828 -1.1352224 mouse3 -0.4795082 0.1627235 0.21775206 1.0751584 -0.5581422 mouse4 1.7125147 -0.5830600 1.40597896 -0.2815305 0.3776360 mouse5 -0.3469067 -0.4813120 -0.09606797 1.0970077 -1.1234038 Ca cortex hippocampus brain_stem mid_brain cerebellum mouse1 -0.7663354 0.8595091 1.33803798 -1.17651576 0.8299963 mouse2 -0.7132260 -0.2626811 0.08025079 -2.40924271 0.7883005 mouse3 -0.7988904 -0.1144639 -0.65901136 0.42462227 0.7068755 mouse4 0.3880393 0.5570068 -0.49969135 0.06633009 -1.3497228 mouse5 1.0077684 0.6023264 -0.57387762 0.25919461 -0.9337281 Enzyme cortex hippocampus brain_stem mid_brain cerebellum mouse1 1.3430936 0.5335819 -0.56992947 1.3565803 -0.8323391 mouse2 1.0520850 -1.0201124 0.8965 1.4719880 1.0854768 mouse3 -0.2802482 0.6863323 -1.37483570 -0.7790174 0.2446761 mouse4 -0.1916415 -0.4566571 1.93365932 1.3493848 0.2130424 mouse5 -1.0349593 -0.1940268 -0.07216321 -0.2968288 1.7406905 In each anatomic region, I would like to calculate the correlation between Enzyme activity and each of the concentrations of Cu, Zn, Fe, and Ca, and do a scatter plot with a tendency line, organizing those plots into a grid. See the image below for the desired effect: http://postimage.org/image/62brra6jn/ How can I achieve this? Thank you in advance. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Correlation between matrices
Hi: I don't think you want to keep these objects separate; it's better to combine everything into a data frame. Here's a variation of your example - the x variable ends up being a mouse, but you may have another variable that's more appropriate to plot so take this as a starting point. One plot uses the ggplot2 package, the other uses the lattice and latticeExtra packages. library('ggplot2') regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain', 'cerebellum') mice = paste('mouse', 1:5, sep='') elem - c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme') # Generate a data frame from the combinations of # mice, regions and elem: d - data.frame(expand.grid(mice = mice, regions = regions, elem = elem), y = rnorm(125)) # Create a numeric version of mice d$mouse - as.numeric(d$mice) # A function to return regression coefficients coefun - function(df) coef(lm(y ~ mouse), data = df) # Apply to all regions * elem combinations coefs - ddply(d, .(regions, elem), coefun) names(coefs) - c('regions', 'elem', 'b0', 'b1') # Generate the plot using package ggplot2: ggplot(d, aes(x = mouse, y = y)) + geom_point(size = 2.5) + geom_abline(data = coefs, aes(intercept = b0, slope = b1), size = 1) + facet_grid(elem ~ regions) # Same plot in lattice: library('lattice') library('latticeExtra') p - xyplot(y ~ mouse | elem + regions, data = d, type = c('p', 'r'), layout = c(5, 5)) HTH, Dennis On Sat, Nov 5, 2011 at 10:49 AM, Kaiyin Zhong kindlych...@gmail.com wrote: regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain', 'cerebellum') mice = paste('mouse', 1:5, sep='') for (n in c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')) { + assign(n, as.data.frame(replicate(5, rnorm(5 + } names(Cu) = names(Zn) = names(Fe) = names(Ca) = names(Enzyme) = regions row.names(Cu) = row.names(Zn) = row.names(Fe) = row.names(Ca) = row.names(Enzyme) = mice Cu cortex hippocampus brain_stem mid_brain cerebellum mouse1 -0.5436573 -0.31486713 0.1039148 -0.3908665 -1.0849112 mouse2 1.4559136 1.75731752 -2.1195118 -0.9894767 0.3609033 mouse3 -0.6735427 -0.04666507 0.9641000 0.4683339 0.7419944 mouse4 0.6926557 -0.47820023 1.3560802 0.9967562 -1.3727874 mouse5 0.2371585 0.20031393 -1.4978517 0.7535148 0.5632443 Zn cortex hippocampus brain_stem mid_brain cerebellum mouse1 -0.66424043 0.6664478 1.1983546 0.0319403 0.41955740 mouse2 -1.14510448 1.5612235 0.3210821 0.4094753 1.01637466 mouse3 -0.85954416 2.8275458 -0.6922565 -0.8182307 -0.06961242 mouse4 0.03606034 -0.7177256 0.7067217 0.2036655 -0.25542524 mouse5 0.67427572 0.6171704 0.1044267 -1.8636174 -0.07654666 Fe cortex hippocampus brain_stem mid_brain cerebellum mouse1 1.8337008 2.0884261 0.29730413 -1.6884804 0.8336137 mouse2 -0.2734139 -0.5728439 0.63791556 -0.6232828 -1.1352224 mouse3 -0.4795082 0.1627235 0.21775206 1.0751584 -0.5581422 mouse4 1.7125147 -0.5830600 1.40597896 -0.2815305 0.3776360 mouse5 -0.3469067 -0.4813120 -0.09606797 1.0970077 -1.1234038 Ca cortex hippocampus brain_stem mid_brain cerebellum mouse1 -0.7663354 0.8595091 1.33803798 -1.17651576 0.8299963 mouse2 -0.7132260 -0.2626811 0.08025079 -2.40924271 0.7883005 mouse3 -0.7988904 -0.1144639 -0.65901136 0.42462227 0.7068755 mouse4 0.3880393 0.5570068 -0.49969135 0.06633009 -1.3497228 mouse5 1.0077684 0.6023264 -0.57387762 0.25919461 -0.9337281 Enzyme cortex hippocampus brain_stem mid_brain cerebellum mouse1 1.3430936 0.5335819 -0.56992947 1.3565803 -0.8323391 mouse2 1.0520850 -1.0201124 0.8965 1.4719880 1.0854768 mouse3 -0.2802482 0.6863323 -1.37483570 -0.7790174 0.2446761 mouse4 -0.1916415 -0.4566571 1.93365932 1.3493848 0.2130424 mouse5 -1.0349593 -0.1940268 -0.07216321 -0.2968288 1.7406905 In each anatomic region, I would like to calculate the correlation between Enzyme activity and each of the concentrations of Cu, Zn, Fe, and Ca, and do a scatter plot with a tendency line, organizing those plots into a grid. See the image below for the desired effect: http://postimage.org/image/62brra6jn/ How can I achieve this? Thank you in advance. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] correlation between matrices - both with some NAs
Hi everyone, I'm having trouble applying the Cor() function to two matrices, both of which contain NAs. I am doing the following: a-cor(m1, m2, use=complete.obs) ... and I get the following error message: Error in cor(m1, m2, use = complete.obs) : no complete element pairs Does anyone know how I can apply a correlation, ignoring any NAs? Thanks, rcoder -- View this message in context: http://www.nabble.com/correlation-between-matrices---both-with-some-NAs-tp18721853p18721853.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.