Re: [R] Correlation between matrices

2011-11-06 Thread Kaiyin Zhong
Thank you Dennis, your tips are really helpful.
I don't quite understand the lm(y~mouse) part; my intention was -- in
pseudo code -- lm(y(Enzyme) ~ y(each elem)).

In addition, attach(d) seems to be necessary before using lm(y~mouse),
and since d$mouse has length 125 while each elem in each region has
length 5, it generates the following error:

> coefs = ddply(d, .(regions, elem), coefun)
Error in model.frame.default(formula = y ~ mouse, drop.unused.levels = TRUE) :
  variable lengths differ (found for 'mouse')



Re: [R] Correlation between matrices

2011-11-06 Thread Dennis Murphy
Hi:

On Sat, Nov 5, 2011 at 11:06 PM, Kaiyin Zhong kindlych...@gmail.com wrote:
> Thank you Dennis, your tips are really helpful.
> I don't quite understand the lm(y~mouse) part; my intention was -- in
> pseudo code -- lm(y(Enzyme) ~ y(each elem)).

As I said in my first response, I didn't quite understand what you
were trying to regress so I used the mouse as a way of showing you how
the code works. I think I understand what you want now, though.

I'll create a data set in two ways: the first assumes you have the
data as constructed in your original post and the second generates
random numbers after erecting a 'scaffold' data frame. The game is to
separate the enzyme data from the element data and put them into the
final data frame as separate columns. Then the regression is easy if
that's what you need to do.

# Method 1: Generate the data as you did into separate data frames

elem0 <- c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')
regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain',
            'cerebellum')

# Creates five 5 x 5 data frames with names V1-V5:
for (n in c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')) {
    assign(n, as.data.frame(replicate(5, rnorm(5))))
}

# Stack the chemical element data using melt() from
# the reshape2 package:
library('reshape2')
# Stack in the same order as elem0[1:4] (Cu, Fe, Zn, Ca) so that the
# elem labels assigned below match the right blocks
d1 <- rbind(melt(Cu), melt(Fe), melt(Zn), melt(Ca))
# Relabel V1 - V5 with brain region names, add a factor
# to distinguish individual elements and tack on the melted
# Enzyme data so that it repeats in each element block
d1 <- within(d1, {
    variable <- factor(d1$variable, labels = regions)
    elem <- factor(rep(elem0[1:4], each = 25))
    Enzyme <- melt(Enzyme)[, 2]
})
# Plot the data using lattice and latticeExtra:
library('lattice')
library('latticeExtra')
p <- xyplot(Enzyme ~ value | variable + elem,
            data = d1, type = c('p', 'r'))
useOuterStrips(p)

###
## Method 2: Generate the random data after setting
## up the element/region/mouse combinations
##

# Generate a data frame from the combinations of
# mice, regions and elem:

library('plyr')      # provides ddply(), used below
library('ggplot2')

mice <- paste('mouse', 1:5, sep = '')
regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain',
            'cerebellum')
elem <- elem0[1:4]
d0 <- data.frame(expand.grid(mice = mice,
                             regions = regions, elem = elem))
d0 <- within(d0, {
    value <- rnorm(100)   # generate element values
    Enzyme <- rnorm(25)   # generate enzyme values
})

# the Enzyme values are recycled through all element blocks.

# You can either adapt the lattice code above to plot d0, or you
# can do the following to get an analogous plot in ggplot2.
# It's easier to compute the slopes and intercepts and put
# them into a data frame that ggplot() can import, so that's
# what we'll do first.

# A function to return regression coefficients from a
# generic data frame. Since this function goes into ddply(),
# the argument df is a (generic) data frame and the output
# will be converted to a one-line data frame.

coefun <- function(df) coef(lm(Enzyme ~ value, data = df))

# Apply the function to all regions * elem combinations.
# Output is a data frame of coefficients corresponding to
# each region/element combination

coefs <- ddply(d0, .(regions, elem), coefun)
# Rename the columns
names(coefs) <- c('regions', 'elem', 'b0', 'b1')

# Generate the plot using package ggplot2:
ggplot(d0, aes(x = value, y = Enzyme)) +
   geom_point(size = 2.5) +
   geom_abline(data = coefs, aes(intercept = b0, slope = b1),
               size = 1) +
   xlab('Element concentration') +   # label text assumed; the original argument is missing
   facet_grid(elem ~ regions)
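
The original post asked for the correlations themselves, not only the fitted lines. A minimal base-R sketch of how those might be computed, assuming a long data frame shaped like d0 above (columns regions, elem, value, Enzyme). The data are simulated stand-ins, and the names pieces, cor_by_cell and r_table are hypothetical:

```r
# Simulated stand-in for d0: 5 mice x 5 regions x 4 elements
set.seed(1)
regions <- c('cortex', 'hippocampus', 'brain_stem', 'mid_brain', 'cerebellum')
elem <- c('Cu', 'Fe', 'Zn', 'Ca')
d0 <- expand.grid(mice = paste0('mouse', 1:5), regions = regions, elem = elem)
d0$value <- rnorm(nrow(d0))
d0$Enzyme <- rep(rnorm(25), times = 4)   # same enzyme values in every element block

# One Pearson correlation per region/element cell
pieces <- split(d0, list(d0$regions, d0$elem), drop = TRUE)
cor_by_cell <- do.call(rbind, lapply(pieces, function(df) {
    data.frame(regions = df$regions[1], elem = df$elem[1],
               r = cor(df$Enzyme, df$value))
}))
rownames(cor_by_cell) <- NULL

# Wide layout: rows are elements, columns are regions
r_table <- xtabs(r ~ elem + regions, data = cor_by_cell)
print(round(r_table, 2))
```

With only five mice per cell, each correlation rests on five points, so the values will be very noisy.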


> In addition, attach(d) seems necessary before using lm(y~mouse), and
> since d$mouse has a length 125, while each elem for each region has a
> length 5, it generates the following error:

You should never need to use attach() - use the data = argument in
lm() instead, where the value of data is the name of a data frame.
It's always easier to use the modeling functions in R having formula
interfaces with data frames.
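
A minimal sketch of that advice, in base R only. The toy data frame and the names fit_one and coef_list are made up for illustration. Passing data = df to lm() makes the formula variables resolve inside each subset, so nothing has to be attached:

```r
set.seed(42)
d <- expand.grid(mouse = 1:5, regions = c('cortex', 'cerebellum'))
d$y <- rnorm(nrow(d))

# data = df is an argument of lm(), so y and mouse come from the subset
fit_one <- function(df) coef(lm(y ~ mouse, data = df))

# One coefficient vector per region, without attach()
coef_list <- lapply(split(d, d$regions), fit_one)
coefs <- do.call(rbind, coef_list)
colnames(coefs) <- c('b0', 'b1')
print(coefs)
```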

> > coefs = ddply(d, .(regions, elem), coefun)
> Error in model.frame.default(formula = y ~ mouse, drop.unused.levels = TRUE) :
>   variable lengths differ (found for 'mouse')

You're clearly doing something here that's messing up the structure of
the data. Study what the code (and its output) above are telling you,
particularly if you're not familiar with plyr, lattice and/or ggplot2.
Writing functions to insert into a **ply() function in plyr can be
tricky. If you continue to have problems, please provide a
reproducible example as you did here.
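
One way to debug a function meant for ddply() is to run it on a single piece of the data first. A sketch in base R, using a toy data frame and hypothetical names (piece, coefun_bad, coefun_ok). It also shows what can go wrong when the closing parenthesis of lm() is misplaced:

```r
set.seed(7)
d <- expand.grid(mouse = 1:5, regions = c('cortex', 'cerebellum'),
                 elem = c('Cu', 'Zn'))
d$y <- rnorm(nrow(d))

# Pull out one region/element piece: this is what the function will receive
piece <- subset(d, regions == 'cortex' & elem == 'Cu')
stopifnot(nrow(piece) == 5)

# Misplaced parenthesis: data = df is handed to coef(), so lm() never sees it
coefun_bad <- function(df) coef(lm(y ~ mouse), data = df)
# data = df is an argument of lm(), so the formula is evaluated in the piece
coefun_ok <- function(df) coef(lm(y ~ mouse, data = df))

bad <- tryCatch(coefun_bad(piece), error = function(e) conditionMessage(e))
good <- coefun_ok(piece)
print(bad)    # an error message: lm() looked for y and mouse outside df
print(good)   # intercept and slope for this one piece
```

Once the function behaves on one piece, the same function can be passed to ddply() for all region/element combinations.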

HTH,
Dennis



[R] Correlation between matrices

2011-11-05 Thread Kaiyin Zhong
> regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain',
+             'cerebellum')
> mice = paste('mouse', 1:5, sep='')
> for (n in c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')) {
+   assign(n, as.data.frame(replicate(5, rnorm(5))))
+ }
> names(Cu) = names(Zn) = names(Fe) = names(Ca) = names(Enzyme) = regions
> row.names(Cu) = row.names(Zn) = row.names(Fe) = row.names(Ca) =
+   row.names(Enzyme) = mice
> Cu
           cortex hippocampus brain_stem  mid_brain cerebellum
mouse1 -0.5436573 -0.31486713  0.1039148 -0.3908665 -1.0849112
mouse2  1.4559136  1.75731752 -2.1195118 -0.9894767  0.3609033
mouse3 -0.6735427 -0.04666507  0.9641000  0.4683339  0.7419944
mouse4  0.6926557 -0.47820023  1.3560802  0.9967562 -1.3727874
mouse5  0.2371585  0.20031393 -1.4978517  0.7535148  0.5632443
> Zn
            cortex hippocampus brain_stem  mid_brain  cerebellum
mouse1 -0.66424043   0.6664478  1.1983546  0.0319403  0.41955740
mouse2 -1.14510448   1.5612235  0.3210821  0.4094753  1.01637466
mouse3 -0.85954416   2.8275458 -0.6922565 -0.8182307 -0.06961242
mouse4  0.03606034  -0.7177256  0.7067217  0.2036655 -0.25542524
mouse5  0.67427572   0.6171704  0.1044267 -1.8636174 -0.07654666
> Fe
           cortex hippocampus  brain_stem  mid_brain cerebellum
mouse1  1.8337008   2.0884261  0.29730413 -1.6884804  0.8336137
mouse2 -0.2734139  -0.5728439  0.63791556 -0.6232828 -1.1352224
mouse3 -0.4795082   0.1627235  0.21775206  1.0751584 -0.5581422
mouse4  1.7125147  -0.5830600  1.40597896 -0.2815305  0.3776360
mouse5 -0.3469067  -0.4813120 -0.09606797  1.0970077 -1.1234038
> Ca
           cortex hippocampus  brain_stem   mid_brain cerebellum
mouse1 -0.7663354   0.8595091  1.33803798 -1.17651576  0.8299963
mouse2 -0.7132260  -0.2626811  0.08025079 -2.40924271  0.7883005
mouse3 -0.7988904  -0.1144639 -0.65901136  0.42462227  0.7068755
mouse4  0.3880393   0.5570068 -0.49969135  0.06633009 -1.3497228
mouse5  1.0077684   0.6023264 -0.57387762  0.25919461 -0.9337281
> Enzyme
           cortex hippocampus  brain_stem  mid_brain cerebellum
mouse1  1.3430936   0.5335819 -0.56992947  1.3565803 -0.8323391
mouse2  1.0520850  -1.0201124  0.8965  1.4719880  1.0854768
mouse3 -0.2802482   0.6863323 -1.37483570 -0.7790174  0.2446761
mouse4 -0.1916415  -0.4566571  1.93365932  1.3493848  0.2130424
mouse5 -1.0349593  -0.1940268 -0.07216321 -0.2968288  1.7406905

In each anatomic region, I would like to calculate the correlation between
Enzyme activity and each of the concentrations of Cu, Zn, Fe, and Ca, and
do a scatter plot with a tendency line, organizing those plots into a grid.
See the image below for the desired effect:
http://postimage.org/image/62brra6jn/
How can I achieve this?

Thank you in advance.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Correlation between matrices

2011-11-05 Thread Dennis Murphy
Hi:

I don't think you want to keep these objects separate; it's better to
combine everything into a data frame. Here's a variation of your
example - the x variable ends up being a mouse, but you may have
another variable that's more appropriate to plot so take this as a
starting point. One plot uses the ggplot2 package, the other uses the
lattice and latticeExtra packages.

library('plyr')      # provides ddply(), used below
library('ggplot2')
regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain',
            'cerebellum')
mice = paste('mouse', 1:5, sep='')
elem <- c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')

# Generate a data frame from the combinations of
# mice, regions and elem:
d <- data.frame(expand.grid(mice = mice, regions = regions,
                            elem = elem), y = rnorm(125))
# Create a numeric version of mice
d$mouse <- as.numeric(d$mice)

# A function to return regression coefficients; data = df goes
# inside lm() so that y and mouse are looked up in each subset
coefun <- function(df) coef(lm(y ~ mouse, data = df))
# Apply to all regions * elem combinations
coefs <- ddply(d, .(regions, elem), coefun)
names(coefs) <- c('regions', 'elem', 'b0', 'b1')

# Generate the plot using package ggplot2:
ggplot(d, aes(x = mouse, y = y)) +
   geom_point(size = 2.5) +
   geom_abline(data = coefs, aes(intercept = b0, slope = b1),
               size = 1) +
   facet_grid(elem ~ regions)

# Same plot in lattice:
library('lattice')
library('latticeExtra')
p <- xyplot(y ~ mouse | elem + regions, data = d, type = c('p', 'r'),
            layout = c(5, 5))
useOuterStrips(p)   # draw the plot with strips on the outer margins


HTH,
Dennis



[R] correlation between matrices - both with some NAs

2008-07-29 Thread rcoder

Hi everyone,

I'm having trouble applying the cor() function to two matrices, both of
which contain NAs. I am doing the following:

a <- cor(m1, m2, use = "complete.obs")

... and I get the following error message:

Error in cor(m1, m2, use = "complete.obs") : 
  no complete element pairs
  no complete element pairs

Does anyone know how I can apply a correlation, ignoring any NAs?

Thanks,

rcoder
-- 
View this message in context: 
http://www.nabble.com/correlation-between-matrices---both-with-some-NAs-tp18721853p18721853.html
Sent from the R help mailing list archive at Nabble.com.
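
A hedged sketch of one common way around this error. With use = "complete.obs", cor() drops every row that has an NA in any column of either matrix, and stops if no row survives. With use = "pairwise.complete.obs", rows are dropped separately for each pair of columns. The matrices below are made-up stand-ins for m1 and m2:

```r
set.seed(3)
m1 <- matrix(rnorm(20), nrow = 10, dimnames = list(NULL, c('a', 'b')))
m2 <- matrix(rnorm(20), nrow = 10, dimnames = list(NULL, c('x', 'y')))

# Place NAs so that every row has at least one missing value
m1[1:5, 'a'] <- NA
m2[6:10, 'x'] <- NA

# Casewise deletion leaves no rows, so cor() stops with an error
res <- tryCatch(cor(m1, m2, use = "complete.obs"),
                error = function(e) conditionMessage(e))
print(res)

# Pairwise deletion uses whatever rows are complete for each column pair;
# a pair with no overlapping rows (here a and x) comes back as NA
r <- cor(m1, m2, use = "pairwise.complete.obs")
print(round(r, 2))
```

Because each entry may then be based on a different set of rows, it is worth checking how many complete pairs each column combination actually has before interpreting the result.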
