Here is one way of doing it:

> # create the rows for each unique combination
> x.split <- split(seq(nrow(mydata)), list(mydata$time, mydata$treatment,
+     mydata$expREP, mydata$techREP), drop=TRUE)
> # now go through the list of indices and add the median
> mydata$Y50 <- 0   # add the dummy median column
> for (i in x.split){
+     mydata$Y50[i] <- median(mydata$Y[i])  # median for each group
+ }
> head(mydata,20)
   cpdID  time treatment  expREP  techREP        Y      Y50
1      7 120hr   control expREP1 techREP3 14.42331 15.74599
2      7 120hr   control expREP1 techREP2 14.05282 15.10810
3      7 120hr   control expREP1 techREP1 14.22814 14.63248
4      7 120hr       trt expREP3 techREP1 13.84921 15.08641
5      7  24hr       trt expREP1 techREP1 14.07648 15.17235
6      7  24hr   control expREP1 techREP3 14.21394 14.63314
7      8  24hr   control expREP1 techREP1 14.33992 14.81328
8      8 120hr       trt expREP1 techREP3 14.36256 15.34493
9      8 120hr   control expREP2 techREP3 14.01419 15.14270
10     8 120hr   control expREP2 techREP2 14.09803 15.10079
11     8 120hr       trt expREP1 techREP2 14.31522 15.39152
12     8 120hr       trt expREP1 techREP1 14.45288 14.65430
13     8  24hr   control expREP2 techREP1 13.95919 14.71188
14     8  24hr       trt expREP2 techREP3 14.40640 14.36332
15     8  24hr       trt expREP1 techREP2 14.04038 14.42856
16     8  24hr   control expREP3 techREP3 14.22859 15.08463
17     8  24hr       trt expREP1 techREP3 14.12598 14.53840
18     8  24hr       trt expREP3 techREP1 13.59257 14.69984
19     8  24hr       trt expREP3 techREP2 13.58308 14.85730
20    10 120hr   control expREP3 techREP1 13.02808 14.07448
>



On 3/7/07, ALAN SMITH <[EMAIL PROTECTED]> wrote:
>
> Hello R users,
>
> Problem.......I do not understand how to use "aggregate","by", or the
> appropriate "apply" to perform a function on data with more than one
> factor on unbalanced data...
> I have a data frame in the long format that does not contain balanced
> data. The ID is a unique identifier corresponding to the experimental
> unit that will later be examined by ANOVA, T-tests etc. Y is the data
> generated from the experiment.  The factors represent the differences
> between each sample or "run" measured.
>
> str(mydata)  ### sample of table at bottom of email ###
> 'data.frame':   129982 obs. of  6 variables:
> $ ID    : num  7 7 7 7 7 7 8 8 8 8 ...
> $ time     : Factor w/ 2 levels "120hr","24hr": 1 1 1 1 2 2 2 1 1 1 ...
> $ treatment: Factor w/ 2 levels "control","trt": 1 1 1 2 2 1 1 2 1 1 ...
> $ expREP   : Factor w/ 3 levels "expREP1","expREP2",..: 1 1 1 3 1 1 1 1 2
> 2 ...
> $ techREP  : Factor w/ 3 levels "techREP1","techREP2",..: 3 2 1 1 1 3
> 1 3 3 2 ...
> $ Y             : num  14.4 14.1 14.2 13.8 14.1 ...
>
> Could someone please help with doing something like the following
> 1. I would like to find the median for each unique combination of
> factors using the data in the   long format (like finding the median
> of a single column of data).
> 2. Create a new column where the median is repeated for the number of
> rows of the unique factor combination
> 3. I would like to learn the most efficient way to do this because I
> want to avoid recreating the table from scratch with many commands
> like the series below. I will have to perform this operation on many
> different data sets some, with many more factors then this example.
>
> ### help me learn to use an apply or other command that will do the
> following #####
> m0<-mydata$cpdID[mydata$time=="24hr" & mydata$treatment=="control" &
> mydata$expREP=="expREP1" & mydata$techREP=="techREP1"]
> m1<-mydata$Y[mydata$time=="24hr" & mydata$treatment=="control" &
> mydata$expREP=="expREP1" & mydata$techREP=="techREP1"]
> m2<-median(m1)
> m3<-cbind(ID=m0,time=rep("24hr",length(m1)),
> treatment=rep("control",length(m1)), expREP=rep("expREP1",length(m1)),
> techREP=rep("techREP1",length(m1)),Y=m1,Y50=rep(m2,length(m1)))
> ######### I would like to avoid writing the above hundreds of times ######
>
> I am able to reshape into wide format and then find the column
> medians. However restacking the data and regenerating the factors
> becomes very very messy on data sets with 150 columns.  I am able to
> preform this analysis is SAS easily using BY, but I would like to know
> how to do it in R.
>
>
> I have tried these commands in a number of different variations with
> no luck and similar error messages
> test1<-aggregate(mydata[,-1],
> list(mydata$time,mydata$treatment,mydata$expREP,mydata$techREP)
> ,median, na.rm=T)
> Error in median.default(X[[1]], ...) : need numeric data ### Y in
> numeric####
>
> test1<-by(mydata[,-1],
> list(mydata$time,mydata$treatment,mydata$expREP,mydata$techREP)
> ,median, na.rm=T)
> Error in median.default(data[x, ], ...) : need numeric data
>
> Thanks
> Alan
> winXP R 2.4.1
>
>
> #####Example data frame######
> mydata<-as.data.frame(structure(list(cpdID = c(7, 7, 7, 7, 7, 7, 8, 8,
> 8, 8, 8, 8,
> 8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
> 10, 10, 10, 10, 10, 10, 19, 19, 19, 19, 19, 19, 23, 23, 23, 23,
> 23, 23, 23, 23, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
> 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
> 33, 33, 33, 33, 33, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 42, 42, 42, 42, 42,
> 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
> 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
> 42, 42, 42, 42, 42, 42, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47,
> 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47), time = structure(as.integer
> (c(1,
> 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2,
> 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2,
> 1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2,
> 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2,
> 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2,
> 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2,
> 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
> 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1,
> 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2,
> 2, 2, 1, 2, 2, 1, 2, 2, 1, 2)), .Label = c("120hr", "24hr"), class =
> "factor"),
>    treatment = structure(as.integer(c(1, 1, 1, 2, 2, 1, 1, 2,
>    1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1,
>    2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2,
>    1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1,
>    1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2,
>    2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2,
>    1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 2,
>    2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 1,
>    1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
>    2, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1,
>    2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1,
>    2, 1)), .Label = c("control", "trt"), class = "factor"),
>    expREP = structure(as.integer(c(1, 1, 1, 3, 1, 1, 1, 1, 2,
>    2, 1, 1, 2, 2, 1, 3, 1, 3, 3, 3, 1, 2, 1, 2, 2, 2, 2, 3,
>    3, 2, 2, 1, 2, 3, 3, 1, 1, 2, 3, 1, 3, 3, 3, 3, 1, 3, 1,
>    1, 2, 1, 1, 2, 3, 2, 2, 1, 3, 2, 2, 2, 3, 2, 1, 2, 2, 2,
>    2, 1, 1, 1, 3, 2, 2, 3, 3, 2, 2, 2, 3, 2, 3, 2, 3, 1, 2,
>    3, 3, 1, 1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 3, 1,
>    1, 1, 2, 3, 2, 2, 3, 2, 2, 2, 1, 1, 1, 3, 3, 2, 2, 2, 1,
>    3, 1, 2, 3, 1, 3, 3, 1, 2, 3, 1, 2, 1, 3, 1, 3, 3, 2, 2,
>    2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 3, 1, 1, 1,
>    1, 3, 3, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 3, 1, 1, 1, 1, 3,
>    3, 1, 1, 1, 3, 2, 1, 1, 2, 1, 3, 2, 1, 2, 1, 3, 1, 1, 2,
>    3)), .Label = c("expREP1", "expREP2", "expREP3"), class = "factor"),
>    techREP = structure(as.integer(c(3, 2, 1, 1, 1, 3, 1, 3,
>    3, 2, 2, 1, 1, 3, 2, 3, 3, 1, 2, 1, 2, 1, 3, 1, 3, 2, 2,
>    3, 1, 1, 3, 3, 2, 3, 3, 3, 2, 2, 2, 2, 1, 1, 2, 3, 1, 2,
>    3, 1, 3, 2, 1, 1, 2, 2, 3, 3, 3, 2, 1, 2, 1, 2, 3, 2, 3,
>    2, 3, 1, 3, 2, 2, 1, 2, 1, 3, 2, 2, 1, 1, 3, 3, 3, 1, 3,
>    1, 3, 2, 2, 2, 1, 2, 1, 3, 1, 3, 2, 1, 3, 1, 3, 2, 3, 1,
>    1, 2, 3, 1, 1, 3, 3, 2, 2, 1, 2, 3, 2, 2, 3, 2, 2, 1, 2,
>    2, 3, 3, 3, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 3, 1, 1, 3, 2,
>    3, 3, 3, 3, 1, 3, 2, 1, 3, 3, 1, 3, 2, 1, 2, 2, 1, 2, 1,
>    2, 1, 1, 1, 2, 2, 3, 3, 1, 2, 3, 2, 3, 3, 3, 1, 2, 2, 1,
>    3, 2, 3, 3, 2, 2, 2, 3, 2, 1, 3, 1, 3, 1, 3, 1, 1, 1, 2,
>    2, 3)), .Label = c("techREP1", "techREP2", "techREP3"), class =
> "factor"),
>    log2Abun = c(14.4233129144089, 14.052822741429, 14.2281422686467,
>    13.8492096005693, 14.076481601207, 14.2139395740777, 14.3399195756207,
>    14.3625602954496, 14.0141948668145, 14.0980320829605, 14.3152203363759,
>    14.4528846974866, 13.9591869268449, 14.4064043323413, 14.0403753485321,
>    14.2285932517829, 14.1259784261721, 13.5925738310379, 13.5830827675029,
>    13.0280787227049, 15.0198078807043, 12.8423503434138, 12.645883554519,
>    13.4644181177386, 12.8399910705399, 12.7879025593914, 12.4978518369511,
>    14.3949985145017, 12.8670856466168, 12.9749522735341, 13.3456824481868,
>    13.4557125040673, 12.8989792046225, 16.0609491915918, 13.6795900568273,
>    16.456466720182, 13.6145948287653, 13.2604785448039, 14.8573006848798,
>    13.1382718001722, 13.690761908446, 14.0557060971613, 13.7495552174335,
>    13.6336764098923, 13.7844303674846, 15.9518993688317, 13.2452555803066,
>    13.1930632791304, 12.1919845133603, 13.8710388986595, 13.6375305515253,
>    12.5919897676151, 17.4797250127015, 17.4014712120155, 17.5948202702163,
>    12.6031626795344, 17.8287811089804, 11.3613955331659, 15.8064741020529,
>    15.1007855146758, 16.0553036215393, 15.7553570530353, 15.9747058600332,
>    15.776715745005, 15.8588066550904, 16.2935434944118, 16.271207673964,
>    16.3660489506706, 16.3273070282017, 15.7632383068689, 14.6030467398838,
>    14.7118820283521, 14.7577545959238, 14.7315311764619, 14.8250084466403,
>    15.6652803936783, 15.8249587405285, 15.6558660906456, 15.5387042614836,
>    14.8487696278309, 15.5477380355109, 15.9451465974129, 16.196755792715,
>    15.9999119421954, 15.8660714836595, 15.9406577104549, 15.8754613979164,
>    16.0358944927638, 16.1785092456522, 16.1992122284106, 15.8087128474547,
>    15.9373968104322, 16.1432636222427, 16.2412011305004, 15.9488234774507,
>    15.7820255767261, 15.7730361533934, 15.7459893802453, 20.7777738189812,
>    21.7489122647969, 21.0374490930058, 20.9765158780184, 21.0464959041766,
>    21.6790715518273, 21.8021013715842, 20.7652083875471, 20.6663696521617,
>    20.3963413756589, 20.7983642126234, 20.1864915044977, 20.4422216681915,
>    20.59064186918, 20.6964531077756, 20.6822196619653, 20.4532414913665,
>    20.8126113450884, 20.4397608946311, 21.4603719009067, 21.5318145314919,
>    21.0400816517662, 21.0466431076593, 20.7459819969019, 20.6723053403015,
>    20.4793421418014, 20.6432035537608, 20.6831942471622, 21.6913537667357,
>    20.6562913013787, 21.0940693071186, 20.9473294479256, 20.5087271424267,
>    16.0871520250047, 16.3816612332698, 16.998645516939, 15.7912392142223,
>    14.5058735666446, 13.6035104425928, 14.4369066987207, 14.6998435295626,
>    14.6818972267862, 14.1086877961546, 14.3539049235617, 15.40862828087,
>    15.0657947671893, 14.8615716011254, 14.5538692431961, 14.2397476835569,
>    13.4381420777437, 13.4499224158638, 13.6887966810545, 14.6550275257018,
>    13.500966330283, 14.9271297886953, 14.7405186421119, 15.0047910398043,
>    14.7051463678038, 14.8325933769599, 12.9854861991046, 13.4203550220891,
>    15.399010832952, 15.4064707685293, 15.0953970227926, 15.0712109416537,
>    15.7587957644032, 15.0013202225009, 15.7608498673217, 14.7604080920677,
>    14.2478533598602, 14.4140245098782, 14.7936541075062, 14.7684428120549,
>    14.595607155062, 16.1507389488284, 16.4915712924337, 14.490161446684,
>    14.721633263063, 14.4341721012904, 15.8747652729112, 14.543333961671,
>    14.8633635585377, 14.6696601802386, 13.3020676725265, 14.0190694293311,
>    15.2168973938334, 12.6304946615056, 12.1972166931101, 12.7960396088298,
>    14.4285564621952, 14.5308330346953, 14.1496677436943, 14.0823985634278,
>    12.8407779235951, 14.6543003749437, 14.3202364452416, 15.1723493709662,
>    14.0744760007345, 14.8132801684508, 12.9183042336999, 14.5225202325766,
>    13.742309436084)), .Names = c("cpdID", "time", "treatment",
> "expREP", "techREP", "Y")))
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

        [[alternative HTML version deleted]]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to