Greg Adkison wrote:
I would be incredibly grateful to anyone who'll help me translate some SAS code into R code.

Say for example that I have a dataset named "dat1" that includes five variables: wshed, site, species, bda, and sla. I can calculate with the following SAS code the mean, CV, se, and number of observations of "bda" and "sla" for each combination of "wshed," "species," and "site," restricting the species considered to only three of several species in dat1 (b, c, and p). Moreover, I can output these calculations and grouping variables to a dataset named "dat2" that will reside in RAM and include the variables wshed, site, species, mBdA, msla, cBda, sBdA, ssla, nBda, and nsla.

proc sort data=dat1;
  by wshed site species;
proc means data=dat1 noprint mean cv stderr n;
  by wshed site species;
  where species in ('b', 'c', 'p');
  var BdA sla;
  output out=dat2
    mean=mBdA msla
    cv=cBdA csla
    stderr=sBdA ssla
    n=nBdA nsla;

Thanks,
Greg

The following handles any number of analysis variables, with automatic naming of all statistics computed from them. It requires the Hmisc package.

# Generate some data.  Put one NA in sla.
set.seed(1)
dat1 <- expand.grid(wshed=1:2, site=c('A','B'),
                    species=c('a','b','c','p'),
                    reps=1:10)
n <- nrow(dat1)
dat1 <- transform(dat1,
                  BdA = rnorm(n, 100, 20),
                  sla = c(rnorm(n-1, 200, 30), NA))
# Can use upData function in Hmisc in place of transform

# Summarization function, per stratum, for a matrix of analysis
# variables
g <- function(y) {
  n <- apply(y, 2, function(z) sum(!is.na(z)))
  m <- apply(y, 2, mean, na.rm=TRUE)
  s <- apply(y, 2, sd,   na.rm=TRUE)
  cv <- s/m
  se <- s/sqrt(n)
  w <- c(m, cv, se, n)
  names(w) <- t(outer(c('m','c','s','n'), colnames(y), paste, sep=''))
  w
}
library(Hmisc)
dat2 <-  with(dat1,
              summarize(cbind(BdA, sla),
                        llist(wshed, site, species),
                        g,
                        subset=species %in% c('b','c','p'),
                        stat.name='mBdA')
              )

options(digits=3)
dat2  # is a data frame

   wshed site species  mBdA msla  cBdA   csla sBdA  ssla nBdA nsla
1      1    A       b 100.5  195 0.133 0.1813 4.23 11.20   10   10
2      1    A       c  99.7  206 0.101 0.1024 3.17  6.68   10   10
3      1    A       p 101.4  188 0.239 0.1580 7.65  9.39   10   10
4      1    B       b 109.9  203 0.118 0.1433 4.09  9.21   10   10
5      1    B       c  98.4  221 0.193 0.1250 6.01  8.72   10   10
6      1    B       p 102.9  203 0.216 0.1446 7.03  9.29   10   10
7      2    A       b  95.8  195 0.241 0.2011 7.31 12.40   10   10
8      2    A       c  98.7  207 0.194 0.1274 6.04  8.33   10   10
9      2    A       p 102.2  191 0.217 0.1709 7.01 10.31   10   10
10     2    B       b  97.8  191 0.235 0.2079 7.27 12.58   10   10
11     2    B       c 100.9  194 0.164 0.0987 5.24  6.07   10   10
12     2    B       p 103.0  209 0.144 0.0769 4.69  5.35   10    9


-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Reply via email to