A ddply solution (ddply() comes from the plyr package) is
library(plyr)
dat.out <- ddply(dat, .(time), transform, slope = scale(slope))
but on this small example it is slightly slower than the loop, and slower still than the ave() solution:
> system.time(
+ for (i in 1:3) {
+ mat <- dat[dat$time==i, ]
+ outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
+ if (i==1) {
+ out <- outi
+ } else {
+ out <- rbind(out, outi)
+ }
+ }
+ )
user system elapsed
0.024 0.000 0.025
>
> system.time(
+ dat.out <- ddply(dat, .(time), transform, slope = scale(slope))
+ )
user system elapsed
0.032 0.000 0.031
>
>
> system.time(
+ cbind(dat, slope = ave(dat$slope, list(dat$time), FUN = scale))
+ )
user system elapsed
0.008 0.000 0.007
>
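One caveat on the ave() line above: cbind() keeps the original slope column and appends the scaled one, so the result has two columns named slope. The sketch below instead overwrites slope in place with transform(). It assumes the dat from the original question (quoted below). ave() writes each group's scale() result back into a copy of slope in the original row order, so the new column is a plain numeric vector.

```r
# Sample data from the original question
dat <- data.frame(id = rep(letters[1:5], 3),
                  time = rep(c(1, 2, 3), each = 5),
                  slope = 1:15)

# ave() splits slope by time, applies scale() to each group, and puts
# the results back in the original row order; transform() then
# replaces the existing slope column instead of adding a second one
dat.out <- transform(dat, slope = ave(slope, time, FUN = scale))
dat.out
```

If the unscaled values should be kept as well, give the new column a different name (e.g. slope.scaled) rather than reusing slope.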
On Thu, Aug 26, 2010 at 4:33 PM, Bos, Roger <[email protected]> wrote:
> I created a small example to show something that I do a lot: "scale"
> data by month and return a data.frame with the output. "id" represents
> repeated observations over "time", and I want to scale the "slope"
> variable. The "out" variable shows the output I want. My for loop
> does the job but is probably slow compared with other methods. ddply
> seems ideal, but despite playing with the baseball examples quite a bit,
> I can't figure out how to get it to work with my sample dataset.
>
> TIA for any help, Roger
>
> Here is the sample code:
>
> dat <- data.frame(id=rep(letters[1:5],3),
> time=c(rep(1,5),rep(2,5),rep(3,5)), slope=1:15)
> dat
>
> for (i in 1:3) {
> mat <- dat[dat$time==i, ]
> outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
> if (i==1) {
> out <- outi
> } else {
> out <- rbind(out, outi)
> }
> }
> out
>
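The loop above grows out with rbind() on every pass, which copies the accumulated data frame each time; that is cheap for three groups but scales poorly as the number of groups grows. A minimal sketch of the common base-R alternative, assuming the same dat: split the data by time, scale each piece, and bind all pieces once at the end. Unlike the loop, it keeps the original column names (id, time, slope) rather than mat.time and mat.id.

```r
# Sample data from the question
dat <- data.frame(id = rep(letters[1:5], 3),
                  time = rep(c(1, 2, 3), each = 5),
                  slope = 1:15)

# One data frame per value of time
pieces <- split(dat, dat$time)

# Scale slope within each piece; as.numeric() drops the one-column
# matrix and the attributes that scale() returns
pieces <- lapply(pieces, function(d) {
  d$slope <- as.numeric(scale(d$slope))
  d
})

# Bind once at the end instead of calling rbind() inside the loop
out <- do.call(rbind, pieces)
rownames(out) <- NULL  # drop the "1.1", "1.2", ... names rbind() creates
out
```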
> Here is the sample output:
>
> > dat <- data.frame(id=rep(letters[1:5],3),
> time=c(rep(1,5),rep(2,5),rep(3,5)), slope=1:15)
>
> > dat
> id time slope
> 1 a 1 1
> 2 b 1 2
> 3 c 1 3
> 4 d 1 4
> 5 e 1 5
> 6 a 2 6
> 7 b 2 7
> 8 c 2 8
> 9 d 2 9
> 10 e 2 10
> 11 a 3 11
> 12 b 3 12
> 13 c 3 13
> 14 d 3 14
> 15 e 3 15
>
> > for (i in 1:3) {
> + mat <- dat[dat$time==i, ]
> + outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
> + if (i==1) {
> + out .... [TRUNCATED]
>
> > out
> mat.time mat.id slope
> 1 1 a -1.2649111
> 2 1 b -0.6324555
> 3 1 c 0.0000000
> 4 1 d 0.6324555
> 5 1 e 1.2649111
> 6 2 a -1.2649111
> 7 2 b -0.6324555
> 8 2 c 0.0000000
> 9 2 d 0.6324555
> 10 2 e 1.2649111
> 11 3 a -1.2649111
> 12 3 b -0.6324555
> 13 3 c 0.0000000
> 14 3 d 0.6324555
> 15 3 e 1.2649111
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.