Re: [R] Problems with sample means and standard deviations

Dennis Murphy Tue, 01 Feb 2011 09:35:33 -0800

Hi:

Here are a few ways to do this.

A useful approach is to use replicate() to generate the samples, flatten the
resulting matrix into a data frame and call one or more packages that are
well capable of handling multiple outputs per data subset.

Step 1: Generate the samples and rearrange into a data frame.

# 400 samples of size 5, drawn without replacement from listMV
u <- replicate(400, sample(listMV, 5))
# Name the columns so that the formulas below (y ~ sample) can find them
df <- data.frame(sample = rep(1:ncol(u), each = nrow(u)), y = as.vector(u))

df is now a 2000 x 2 data frame: one column (sample) is a sample number
indicator, the other (y) holds the values sampled from listMV. You have more
options when the data are arranged in this form.
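
To see why the reshaping works, here is a minimal sketch. It assumes a
stand-in vector pop in place of listMV (the values are made up), and the
names pop and long are only for illustration. replicate() returns a matrix
with one sample per column, and as.vector() unrolls it column by column, so
each block of 5 rows in the long data frame is one sample.

```r
set.seed(1)                               # only so the sketch is repeatable
pop <- seq(1.17, 1.49, length.out = 81)   # hypothetical stand-in for listMV

u <- replicate(400, sample(pop, 5))       # matrix: 5 rows, 400 columns
dim(u)

# Long format: sample number in one column, sampled value in the other
long <- data.frame(sample = rep(seq_len(ncol(u)), each = nrow(u)),
                   y = as.vector(u))
str(long)

# as.vector() goes down the columns, so rows 1 to 5 are exactly sample 1
identical(long$y[1:5], u[, 1])
```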

Step 2: Get the means and standard deviations per sample.

The three packages used below take somewhat different approaches. Each must
be installed from CRAN first if you don't already have it. All three are
documented: doBy and data.table have package vignettes, while plyr has its
own website:
http://had.co.nz/plyr/
All three packages are designed to process data quickly in a groupwise
fashion, which is why I created an indicator for sample number.
# ----------------------

(a) Package doBy:

library(doBy)

# Create a function to return the mean and standard deviation of a variable
g <- function(x) c(mean = mean(x), sd = sd(x))

# Apply the function g() above to the y values in each sample
# (returns a 400 x 3 data frame: sample, y.mean and y.sd)
ww <- summaryBy(y ~ sample, data = df, FUN = g)
head(ww, 3)
  sample   y.mean       y.sd
1      1 1.242163 0.04225226
2      2 1.301827 0.07032729
3      3 1.332400 0.02500223

# The overall mean and standard deviation are obtained as follows:
> summaryBy(y ~ 1, data = df, FUN = g)
    y.mean       y.sd
1 1.299855 0.07606458

# A nice generalization is that summaryBy() can take multiple responses
# on the left side of the formula and return a column of means and standard
# deviations for each by group.
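
A small sketch of the multiple-response case, assuming doBy is installed.
The data frame toy and its columns y1 and y2 are made up for illustration
and are not part of the original problem.

```r
library(doBy)   # assumes doBy is installed from CRAN

set.seed(1)
# Hypothetical toy data: 4 samples of size 5 with two response columns
toy <- data.frame(sample = rep(1:4, each = 5),
                  y1 = rnorm(20, mean = 1.3, sd = 0.08),
                  y2 = rnorm(20, mean = 2.0, sd = 0.10))

# Same summary function as above: mean and standard deviation
g <- function(x) c(mean = mean(x), sd = sd(x))

# Two responses on the left-hand side, one grouping variable on the right
res <- summaryBy(y1 + y2 ~ sample, data = toy, FUN = g)
res   # one row per sample; a mean and an sd column for each of y1 and y2
```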
# -----------------

(b) Package plyr:

library(plyr)
w <- ddply(df, .(sample), summarise, m = mean(y), s = sd(y))
> dim(w)
[1] 400   3
> head(w, 3)
  sample        m          s
1      1 1.242163 0.04225226
2      2 1.301827 0.07032729
3      3 1.332400 0.02500223

Here summarise is passed to ddply() as the function applied to each group;
it builds a new data frame from the summaries m and s. The same summarise()
function can be called directly for overall summaries:

> summarise(df, m = mean(y), s = sd(y))
         m          s
1 1.299855 0.07606458

# -------------------------------
(c) Package data.table:

library(data.table)
dt <- data.table(sample = rep(1:400, each = 5), y = as.vector(u))
w2 <- dt[, list(m = mean(y), s = sd(y)), by = 'sample']
> dim(w2)
[1] 400   3
> head(w2, 3)
     sample        m          s
[1,]      1 1.242163 0.04225226
[2,]      2 1.301827 0.07032729
[3,]      3 1.332400 0.02500223

The overall mean and standard deviation are straightforward:

dt[, list(m = mean(y), s = sd(y))]
            m          s
[1,] 1.299855 0.07606458

# -----------------------------

This is not an exhaustive list, as there are other ways to do the same
thing, which others may take the opportunity to show you.
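
As one such alternative, here is a base R sketch that needs no extra
packages. It assumes a stand-in vector pop in place of listMV, and the name
per_sample is only for illustration. The second half keeps the samples in a
list, as in the original loop. mean() and sd() expect an atomic vector
rather than a list, so they are applied to each element with sapply(), or
to the pooled values after unlist().

```r
set.seed(1)                               # only so the sketch is repeatable
pop <- seq(1.17, 1.49, length.out = 81)   # hypothetical stand-in for listMV

# (1) Samples stored as a matrix, one sample per column
u <- replicate(400, sample(pop, 5))
per_sample <- data.frame(sample = seq_len(ncol(u)),
                         m = colMeans(u),         # mean of each column
                         s = apply(u, 2, sd))     # sd of each column
head(per_sample, 3)
c(m = mean(u), s = sd(as.vector(u)))              # all 2000 values pooled

# (2) Samples stored in a list, as in the original loop
MVdata <- lapply(1:3, function(i) sample(pop, size = 5, replace = FALSE))
sapply(MVdata, mean)      # one mean per sample
sapply(MVdata, sd)        # one standard deviation per sample
mean(unlist(MVdata))      # overall mean of all sampled values
sd(unlist(MVdata))        # overall standard deviation
```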

HTH,
Dennis

On Tue, Feb 1, 2011 at 1:56 AM, Titta <[email protected]> wrote:

> Hi,
>
> I am writing a program that takes samples y times from listMV and saves the
> result to the list MVdata. The problem is that I need the sample mean or
> standard deviation for each sample (y of them) and for all samples together.
> How can I do that? mean() and sd() won't work.
>
> Thanks in advance,
> Titta
>
> >
>
> listMV<-c(1.182101983,1.249382648,1.374104215,1.336153877,1.331386231,1.319032094,1.311126545,1.221740863,1.298848481,1.241727379,1.339273873,1.386809408,1.355919009,1.321051409,1.256459148,1.284277166,1.300219992,1.377359149,1.231984488,1.308793786,1.319114185,1.417506978,1.310797119,1.230818679,1.229165322,1.320724049,1.342038449,1.201942636,1.334793202,1.30065893,1.409992259,1.369055222,1.214696135,1.228829414,1.273789905,1.328549897,1.201871417,1.272051102,1.381760814,1.482881264,1.35225819,1.171344013,1.235416322,1.25905681,1.34637339,1.188881698,1.221856048,1.302875505,1.43703543,1.434648007,1.246797867,1.236886744,1.308768636,1.253534504,1.246544401,1.347202456,1.253535584,1.442176865,1.40847141,1.241578938,1.238772941,1.30662151,1.326978911,1.237433784,1.308488464,1.274562848,1.452806933,1.486559719,1.237405035,1.175760893,1.316972548,1.313807387,1.224698176,1.239616142,1.259846334,1.423991194,1.406917943,1.25118274,1.200447065,1.237256663,1.237398053
> + )
> > y<-3
> > MVdata=c()
> > for(i in 1:y){
> + s<-sample(listMV,size=5, replace=FALSE)
> + MVdata[[length(MVdata)+length(y)]]<-s}
> >
> > MVdata
> [[1]]
> [1] 1.434648 1.256459 1.237405 1.259057 1.334793
>
> [[2]]
> [1] 1.221856 1.201871 1.320724 1.231984 1.259846
>
> [[3]]
> [1] 1.182102 1.214696 1.310797 1.237405 1.308794
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
