[R] ggplot2: mapping categorical variable to color aesthetic with faceting

2009-10-06 Thread Bryan Hanson
Hello Again...  I'm making a faceted plot of a response on two categorical
variables using ggplot2 and having trouble with the coloring. Here is a
sample that produces the desired plot:

compareCats <- function(data, res, fac1, fac2, colors) {

    require(ggplot2)
    p <- ggplot(data, aes(fac1, res)) + facet_grid(. ~ fac2)
    jit <- position_jitter(width = 0.1)
    p <- p + layer(geom = "jitter", position = jit, color = colors)
    print(p)
    }

test <- data.frame(res = rnorm(100), fac1 = as.factor(rep(c("A", "B"), 50)),
    fac2 = as.factor(rep(c("lrg", "lrg", "sm", "sm"), 25)))

compareCats(data = test, res = "res", fac1 = "fac1", fac2 = "fac2", colors =
c("red", "blue"))

Now, if I get away from idealized data where there are the same number of
data points per group (25 in this case), I run into problems.  So, if you
do:

rem <- runif(5, 1, 100) # randomly remove a few points here and there
test <- test[-rem,]
compareCats(data = test, res = "res", fac1 = "fac1", fac2 = "fac2", colors =
c("red", "blue"))

R throws an error due to mismatch between the recycling of colors and the
actual number of data points:

Error in `[<-.data.frame`(`*tmp*`, gp, value = list(colour = c("red",  :
  replacement element 1 has 2 rows, need 47

I'm new to ggplot2, but have been through the book and the web site enough
to know that my problem is mapping the variable to the aesthetic; I also
know I can either map or set the colors.

The question, finally:  is there a simple/elegant way to map a list of two
colors corresponding to A and B onto any random sample size of A and B with
faceting?  If not, and I must set the colors:  Do I compute the length of
all possible combos of A, B with lrg, sm, and then create one long vector of
colors for the entire plot?  I tried something like this, and was not
successful, but perhaps could be with more work.

All advice appreciated, Bryan (session info below)

*
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA

> sessionInfo()
R version 2.9.2 (2009-08-24)
i386-apple-darwin8.11.1

locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      datasets  tools     utils     stats     graphics  grDevices methods
[9] base

other attached packages:
 [1] ggplot2_0.8.3      reshape_0.8.3      proto_0.3-8        mvbutils_2.2.0
 [5] ChemoSpec_1.1      lattice_0.17-25    mvoutlier_1.4      plyr_0.1.8
 [9] RColorBrewer_1.0-2 chemometrics_0.4   som_0.3-4          robustbase_0.4-5
[13] rpart_3.1-45       pls_2.1-0          pcaPP_1.7          mvtnorm_0.9-7
[17] nnet_7.2-48        mclust_3.2         MASS_7.2-48        lars_0.9-7
[21] e1071_1.5-19       class_7.2-48

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2: mapping categorical variable to color aesthetic with faceting

2009-10-06 Thread baptiste auguie
Hi,

I may be missing an important design decision, but could you not have
only a single data.frame as an argument of your function? From your
example, it seems that the colour can be mapped to the fac1 variable
of data,

compareCats <- function(data) {

   require(ggplot2)
   p <- ggplot(data, aes(fac1, res, color=fac1)) + facet_grid(. ~ fac2)
   jit <- position_jitter(width = 0.1)
   p <- p + layer(geom = "jitter", position = jit) +
     scale_colour_manual(values=c("red", "blue"))
   print(p)
   }


test <- data.frame(res = rnorm(100), fac1 = as.factor(rep(c("A", "B"), 50)),
   fac2 = as.factor(rep(c("lrg", "lrg", "sm", "sm"), 25)))

compareCats(data = test)

rem <- runif(5, 1, 100) # randomly remove a few points here and there
last_plot() %+% test[-rem,] # replot with new dataset


HTH,

baptiste





Re: [R] ggplot2: mapping categorical variable to color aesthetic with faceting

2009-10-06 Thread baptiste auguie
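Further to my previous reply, it occurred to me that ggplot2 would
only ever use data and colors in your calls to compareCats(): res =
"res", fac1 = "fac1", fac2 = "fac2" have no effect whatsoever, because
aes(fac1, res) and facet_grid(. ~ fac2) look those names up as columns
of data rather than reading the function arguments.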

If you want the user to be able to specify the variables used in the
ggplot2 call, you probably want to look at ?aes_string, as shown
below,

compareCats <- function(data, fac1="fac1", fac2="fac2", res="res",
                        colors=c("red", "blue")) {

  require(ggplot2)
  p <- ggplot(data, aes_string(x=fac1, y=res, color=fac1)) +
    facet_grid(paste(". ~ ", fac2))
  jit <- position_jitter(width = 0.1)
  p <- p + layer(geom = "jitter", position = jit) +
    scale_colour_manual(values=colors)
  print(p)
  }

test <- data.frame(res = rnorm(100), fac1 = as.factor(rep(c("A", "B"), 50)),
  fac2 = as.factor(rep(c("lrg", "lrg", "sm", "sm"), 25)))

compareCats(data = test)

rem <- sample(nrow(test), 10) # randomly remove a few points here and there
last_plot() %+% test[-rem, ] # replot with new dataset

HTH,

baptiste
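
For readers on a more recent ggplot2 than the 0.8.3 used in this thread:
later releases deprecate aes_string(), and the usual replacement is the
.data pronoun inside aes(). The following is only a sketch under that
assumption (ggplot2 3.0.0 or later). The name compareCats2 is hypothetical,
geom_point() with a jitter position stands in for the old layer() call, and
the facet formula is built from a string as in the code above.

```r
library(ggplot2)

# Hypothetical variant of compareCats(): column names are passed as strings
# and looked up in the data with the .data pronoun instead of aes_string().
compareCats2 <- function(data, fac1 = "fac1", fac2 = "fac2", res = "res",
                         colors = c("red", "blue")) {
  ggplot(data, aes(x = .data[[fac1]], y = .data[[res]],
                   colour = .data[[fac1]])) +
    geom_point(position = position_jitter(width = 0.1)) +
    # build ". ~ fac2" from the string, then convert it to a formula
    facet_grid(as.formula(paste(". ~", fac2))) +
    scale_colour_manual(values = colors) +
    # restore readable labels; otherwise they show the .data[[...]] expression
    labs(x = fac1, y = res, colour = fac1)
}

test <- data.frame(res = rnorm(100),
                   fac1 = factor(rep(c("A", "B"), 50)),
                   fac2 = factor(rep(c("lrg", "lrg", "sm", "sm"), 25)))

p <- compareCats2(test)
print(p)
```

Returning the plot object instead of printing it inside the function is a
design choice of this sketch; it lets the caller add layers or replace the
data afterwards.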





Re: [R] ggplot2: mapping categorical variable to color aesthetic with faceting

2009-10-06 Thread Bryan Hanson
Hi Baptiste:  Thanks for the suggestion.  It will work perfectly.

I would have never considered assigning a color to a variable that contained
no colors at all!  I guess this is part of the aesthetic concept, which I
haven't had time to reflect on much.  Then later, specify a manual color
scale which then maps back onto the aesthetic.  Clever.
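
A minimal sketch of that idea, written against a recent ggplot2 rather
than the 0.8.3 of this thread (an assumption), with geom_point() plus a
jitter position standing in for the layer() call. The colour aesthetic is
mapped to the factor, and the manual scale supplies one colour per level,
so the number of points in each group no longer matters. The name
level_cols and the fixed seed are illustrative only.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
test <- data.frame(res = rnorm(100),
                   fac1 = factor(rep(c("A", "B"), 50)),
                   fac2 = factor(rep(c("lrg", "lrg", "sm", "sm"), 25)))
test <- test[-sample(nrow(test), 5), ]  # drop 5 rows: groups are now unequal

# Hypothetical name: one colour per level of fac1, not one per data point
level_cols <- c(A = "red", B = "blue")

p <- ggplot(test, aes(fac1, res, colour = fac1)) +   # map colour to the factor
  geom_point(position = position_jitter(width = 0.1)) +
  facet_grid(. ~ fac2) +
  scale_colour_manual(values = level_cols)           # levels -> actual colours

print(p)
```

Naming the elements of the colour vector ties each colour to a level by
name rather than by position, which should keep the assignment stable if
the levels are later re-ordered.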

As I stated, I'm just learning ggplot2, and I'm finding the language and
concepts a bit different (I'm not familiar with the grammar of graphics,
nor am I a computer scientist).  But, I have to say the code I am working up
replaces a much much longer code in base graphics, so I am really liking the
thought put into ggplot2 and the leanness of it - Thanks Hadley!

Thanks again, Bryan




Re: [R] ggplot2: mapping categorical variable to color aesthetic with faceting

2009-10-06 Thread Bryan Hanson
A few days ago on the list I had wrestled with the aes() vs aes_string()
issue, along with the same issue with faceting.

The way I ended up handling the point you bring up, Baptiste, is perhaps
rather inefficient, but my data sets are not large.  I allow the user to pass
variables, then I use that info to construct extra data frame columns, which
are then suitable for use by ggplot2 since they are known in the data
frame.  Here's what the first part of the actual function looks like; you
can see how I avoided aes_string and related problems with facet:

compareCats <- function(data = NULL, res = NULL, fac1 = NULL, fac2 = NULL,
    fac1order = NULL, fac2order = NULL, fac1cols = NULL,
    method = c("sem", "iqr", "mad", "box", "points"),
    title = "Comparison of Categories", y.lab = "your text here",
    subtitle = "optional explanatory caption") {

    require(ggplot2)

    # restructure data so names will match, re-ordering too

    data$res <- data[, res]
    a <- match(fac1, names(data))
    b <- match(fac2, names(data))
    data$fac1 <- factor(data[[a]], levels = fac1order)
    data$fac2 <- factor(data[[b]], levels = fac2order)

    # now the plot

    p <- ggplot(data, aes(fac1, res, color = fac1)) + facet_grid(. ~ fac2) +
        xlab(NULL) + opts(title = title,
        axis.text.x = theme_text(colour = "black"),
        axis.ticks = theme_blank())

And then, depending on the method specified by the user, additional geoms
are added and the plot created.

This gets the job done, but if there are further suggestions, I'd love to
learn other solutions.

Bryan
