[R] tapply and error bars: Problem Fixed

2018-06-24 Thread Ogbos Okike
Hi Jim,

This is great! It is also tricky! The problem lies in the choice of
ylim. Choosing ylim from the maximum and minimum values of y in the data
was a waste of time, and choosing it by other means was even more
difficult.

I had to start by plotting part of the data in incremental steps of 80 data
points, manually varying ylim until I reached the last data point, 1136,
where I finally used ylim=c(15,162000), which has nothing to do with the
raw data.

Many, many thanks.
Best wishes
Ogbos
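
One way to avoid choosing ylim by trial and error is to compute it from the
quantities that are actually drawn, that is, the group means plus and minus
their standard errors. The following is only a sketch under stated
assumptions: the data are simulated stand-ins, and the names se, m, s and yl
are hypothetical and do not come from the thread.

```r
# Simulated stand-in data (assumption): 16 epoch days, 5 values per day.
set.seed(1)
A <- rep(-5:10, 5)
B <- round(runif(length(A), 30000, 130000))

# Standard error of the mean, ignoring missing values.
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

m <- tapply(B, A, mean, na.rm = TRUE)  # one mean per day
s <- tapply(B, A, se)                  # one standard error per day

# Limits taken from the error-bar ends, padded by 5% on each side,
# so every point and every bar falls inside the plotting region.
yl <- extendrange(c(m - s, m + s), f = 0.05)

plot(-5:10, m, type = "b", ylim = yl,
     xlab = "days (epoch is the day of Fd)", ylab = "mean of B")
```

Because the limits come from the means and not from the raw values, they stay
sensible however many observations go into each mean.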

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tapply and error bars

2018-06-24 Thread Jim Lemon
Hi Ogbos,
The problem is almost certainly with the data. I get the plot I expect
with the sample data that you first posted, so I know that the code
works. If you try this, what do you get?

oodf<-read.table(text="S/N  A       B
 1  -5   64833
 2  -4   95864
 3  -3   82322
 4  -2   95591
 5  -1   69378
 6   0   74281
 7   1  103261
 8   2   92473
 9   3   84344
10   4  127415
11   5  123826
12   6  100029
13   7   76205
14   8  105162
15   9  119533
16  10  106490
17  -5   82322
18  -4   95591
19  -3   69378
20  -2   74281
21  -1  103261
22   0   92473
23   1   84344
24   2  127415
25   3  123826
26   4  100029
27   5   76205
28   6  105162
29   7  119533
30   8  106490
31   9  114771
32  10   55593
33  -5   85694
34  -4   65205
35  -3   80995
36  -2   51723
37  -1   62310
38   0   53401
39   1   65677
40   2   76094
41   3   64035
42   4   68290
43   5   73306
44   6   82176
45   7   75566
46   8   89762
47   9   88063
48  10   94395
49  -5   80651
50  -4   81291
51  -3   63702
52  -2   70297
53  -1   64117
54   0   71219
55   1   57354
56   2   62111
57   3   42252
58   4   35454
59   5   33469
60   6   38899
61   7   64981
62   8   85694
63   9   79452
64  10   85216
65  -5   71219
66  -4   57354
67  -3   62111
68  -2   42252
69  -1   35454
70   0   33469
71   1   38899
72   2   64981
73   3   85694
74   4   79452
75   5   85216
76   6   81721
77   7   91231
78   8  107074
79   9  108103
80  10    7576",
header=TRUE)
library(plotrix)
std.error<-function(x) return(sd(x,na.rm=TRUE)/sqrt(sum(!is.na(x))))
oomean<-as.vector(by(oodf$B,oodf$A,mean))
oose<-as.vector(by(oodf$B,oodf$A,std.error))
plot(-5:10,oomean,type="b",ylim=c(5,11),
 xlab="days (epoch is the day of Fd)",ylab="strikes/km2/day")
dispersion(-5:10,oomean,oose)

I get the attached plot;

Jim


ooplot.pdf
Description: Adobe PDF document


Re: [R] tapply and error bars

2018-06-24 Thread Ogbos Okike
Hi Jim

Thanks again for returning to this.
Please note that the line "oomean<-as.vector(by(oodf$B,oodf$A,mean))" was
omitted (not sure whether deliberately) after you introduced the standard
error function.
When I used the code as posted, an empty plot window with the correct axes
was generated, but no data were displayed. There was no error either.

library(plotrix)
std.error<-function(x) return(sd(x,na.rm=TRUE)/sqrt(sum(!is.na(x))))
oose<-as.vector(by(oodf$B,oodf$A,std.error))
plot(-5:10,oomean,type="b",ylim=c(5,11),
 xlab="days (epoch is the day of Fd)",ylab="strikes/km2/day")
dispersion(-5:10,oomean,oose)

When I included the line, the same empty graph window was generated, but
with the earlier error "Error in FUN(X[[1L]], ...) : could not find function
"FUN"":

library(plotrix)
std.error<-function(x) return(sd(x,na.rm=TRUE)/sqrt(sum(!is.na(x))))
oomean<-as.vector(by(oodf$B,oodf$A,mean))
oose<-as.vector(by(oodf$B,oodf$A,std.error))
plot(-5:10,oomean,type="b",ylim=c(5,11),
 xlab="days (epoch is the day of Fd)",ylab="strikes/km2/day")
dispersion(-5:10,oomean,oose)

I am sure I am missing something but can't place it. Please have another
look to track down my mistake.

Warmest regards
Ogbos


Re: [R] tapply and error bars

2018-06-24 Thread Jim Lemon
Hi Ogbos,
If I use the example data that you sent, I get the error after this line:

oose<-as.vector(by(oodf$B,oodf$A,std.error))
Error in FUN(X[[i]], ...) : object 'std.error' not found

The reason is that you have not defined std.error as a function, but
as the result of a calculation. When I rewrite it like this:

std.error<-function(x) return(sd(x,na.rm=TRUE)/sqrt(sum(!is.na(x))))
oose<-as.vector(by(oodf$B,oodf$A,std.error))
plot(-5:10,oomean,type="b",ylim=c(5,11),
 xlab="days (epoch is the day of Fd)",ylab="strikes/km2/day")
dispersion(-5:10,oomean,oose)

I get the expected plot.

Jim
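
The distinction between a computed value and a function can be seen on a tiny
example. This is a sketch only; the vectors x and g and the names se_value and
se_fun are hypothetical and chosen for illustration.

```r
x <- c(2, 4, 6, 8)
g <- c("a", "a", "b", "b")

# A single number computed once from all the data. It is not a function,
# so by() has nothing it can call for each group.
se_value <- sd(x) / sqrt(sum(!is.na(x)))

# A function. by() calls it once per group with that group's values.
se_fun <- function(v) sd(v) / sqrt(sum(!is.na(v)))

# Passing the number fails; the error is captured instead of stopping the script.
bad <- tryCatch(by(x, g, se_value), error = function(e) e)
print(conditionMessage(bad))

# Passing the function gives one standard error per group.
good <- as.vector(by(x, g, se_fun))
print(good)
```

The same applies to tapply(): its FUN argument must be something that can be
called, not the result of a calculation.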


On Sat, Jun 23, 2018 at 9:36 PM, Ogbos Okike  wrote:
> Hi Jim,
>
> Thanks for assisting. Here is what I did:
>
> A<-matrix(rep(-5:10,71))
> B<-matrix(data)
> std.error = sd(B)/sqrt(sum(!is.na(B)))
>  oodf<-data.frame(A,B)
>
>  oomean<-as.vector(by(oodf$B,oodf$A,mean))
> oose<-as.vector(by(oodf$B,oodf$A,std.error))
> plot(-5:10,oomean,type="b",ylim=c(5,11),
>  xlab="days (epoch is the day of Fd)",ylab="strikes/km2/day")
> dispersion(-5:10,oomean,oose)
>
> And the error says:
> Error in FUN(X[[1L]], ...) : could not find function "FUN"
>
> Please note that I use:
> std.error = sd(B)/sqrt(sum(!is.na(B)))
>  to calculate the standard error, as the code asked for it.
>
> Thanks
> Ogbos
>


Re: [R] tapply and error bars

2018-06-23 Thread Jim Lemon
Hi Ogbos,
This may help:

# assume your data frame is named "oodf"
# std.error() and dispersion() are provided by the plotrix package
library(plotrix)
oomean<-as.vector(by(oodf$B,oodf$A,mean))
oose<-as.vector(by(oodf$B,oodf$A,std.error))
plot(-5:10,oomean,type="b",ylim=c(5,11),
 xlab="days (epoch is the day of Fd)",ylab="strikes/km2/day")
dispersion(-5:10,oomean,oose)

Jim
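
For readers without plotrix, roughly the same plot can be sketched in base R
with tapply() and arrows(). This is an illustration, not the method used in the
thread: it uses only the first 48 rows of the posted sample, the helper se is a
hypothetical name, and the y-axis limits are taken from the data instead of
being fixed by hand.

```r
# First 48 rows of the posted sample: three full cycles of days -5..10.
oodf <- data.frame(
  A = rep(-5:10, 3),
  B = c(64833, 95864, 82322, 95591, 69378, 74281, 103261, 92473,
        84344, 127415, 123826, 100029, 76205, 105162, 119533, 106490,
        82322, 95591, 69378, 74281, 103261, 92473, 84344, 127415,
        123826, 100029, 76205, 105162, 119533, 106490, 114771, 55593,
        85694, 65205, 80995, 51723, 62310, 53401, 65677, 76094,
        64035, 68290, 73306, 82176, 75566, 89762, 88063, 94395)
)

# Standard error of the mean, ignoring missing values.
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

oomean <- tapply(oodf$B, oodf$A, mean, na.rm = TRUE)
oose   <- tapply(oodf$B, oodf$A, se)
days   <- as.numeric(names(oomean))   # -5, -4, ..., 10

plot(days, oomean, type = "b",
     ylim = range(oomean - oose, oomean + oose),
     xlab = "days (epoch is the day of Fd)", ylab = "mean of B")

# Vertical bars from mean - SE to mean + SE with flat caps at both ends.
arrows(days, oomean - oose, days, oomean + oose,
       angle = 90, code = 3, length = 0.05)
```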



[R] tapply and error bars

2018-06-23 Thread Ogbos Okike
Dear workers,
I have data of length 1136. Below is the code I use to get the means of B.
It worked fine and I had the means calculated and plotted.

I wish to plot the error bars as well. I have plotted such means with
error bars before. Please see the attached example.

I tried to redo the same plot but unluckily could not manage it, as I
lost the system containing the script.
Among many attempts, I tried:
library(gplots)

 plotmeans(errors~AB,xlab="Factor A",ylab="mean errors", p=.68, main="Main
  effect Plot",barcol="black")
Nothing worked.

I would be really thankful if somebody could put me back on track.
Many, many thanks for your time.
Ogbos

A sample of the data is:
S/N  A       B
 1  -5   64833
 2  -4   95864
 3  -3   82322
 4  -2   95591
 5  -1   69378
 6   0   74281
 7   1  103261
 8   2   92473
 9   3   84344
10   4  127415
11   5  123826
12   6  100029
13   7   76205
14   8  105162
15   9  119533
16  10  106490
17  -5   82322
18  -4   95591
19  -3   69378
20  -2   74281
21  -1  103261
22   0   92473
23   1   84344
24   2  127415
25   3  123826
26   4  100029
27   5   76205
28   6  105162
29   7  119533
30   8  106490
31   9  114771
32  10   55593
33  -5   85694
34  -4   65205
35  -3   80995
36  -2   51723
37  -1   62310
38   0   53401
39   1   65677
40   2   76094
41   3   64035
42   4   68290
43   5   73306
44   6   82176
45   7   75566
46   8   89762
47   9   88063
48  10   94395
49  -5   80651
50  -4   81291
51  -3   63702
52  -2   70297
53  -1   64117
54   0   71219
55   1   57354
56   2   62111
57   3   42252
58   4   35454
59   5   33469
60   6   38899
61   7   64981
62   8   85694
63   9   79452
64  10   85216
65  -5   71219
66  -4   57354
67  -3   62111
68  -2   42252
69  -1   35454
70   0   33469
71   1   38899
72   2   64981
73   3   85694
74   4   79452
75   5   85216
76   6   81721
77   7   91231
78   8  107074
79   9  108103
80  10    7576

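A<-matrix(rep(-5:10,71))   # epoch day for each of the 1136 values
B<-matrix(data)            # "data" holds the 1136 raw values
AB<-data.frame(A,B)

x<-B

f<-factor(A)
AB<-tapply(x,f,mean)       # mean of B per day (overwrites the data frame AB)
x<- -5:10
plot(x,AB,type="l")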


[R] tapply error svyby function survey package

2014-11-12 Thread Martin Canon
Hi.


I'm trying to calculate the weighted mean score of a quality of life
measure (ovt) in patients with irritable bowel syndrome by their
marital status (d7).

This is a summary of the structure of the dataset:

 str(sii.tesis)
'data.frame':   1063 obs. of  75 variables:
 $ id     : int  51 52 53 54 55 56 57 58 59 60 ...
 $ stratum: Factor w/ 6 levels "MEst","MAcad",..: 1 4 NA 4 4 1 6 NA 4 4 ...
 $ expfc  : num  22.8 17.1 NA 17.1 17.1 ...
 $ d6     : Factor w/ 3 levels "Estudiante","Profesor",..: 1 1 NA 1 1 1 3 NA 1 1 ...
 $ d7     : Factor w/ 6 levels "Soltero","Casado",..: 1 1 NA 1 1 1 1 NA 1 1 ...
 $ d7c    : Factor w/ 2 levels "No estable","Estable": 1 1 NA 1 1 1 1 NA 1 1 ...
 $ s1cm   : Factor w/ 2 levels "No","Si": 1 2 NA 1 1 1 2 NA 1 1 ...
 $ ovt    : num  NA 93.4 NA NA NA ...

I declared the sampling design:

 library(survey)
 sii.design <- svydesign(
  id = ~1,
  strata = ~stratum,
  weights = ~expfc,
  data = subset(sii.tesis, !is.na(stratum)))

Then I tried to get the result:

 svyby(~ovt, ~d7, sii.design, svymean, na.rm = TRUE, level = 0.95)

but I get the error:

Error in tapply(1:NROW(x), list(factor(strata)), function(index) { :
  arguments must have same length


The length of both variables is the same. If the variable ovt exists,
there is a d7 match in the data frame.

I try the same thing using another variable instead - role (d6) -
and it works.

 svyby(~ovt, ~d6, sii.design, svymean, na.rm = TRUE, level = 0.95)
                           d6      ovt       se
Estudiante         Estudiante 71.01805 1.370569
Profesor             Profesor 72.30923 6.518378
Administrativo Administrativo 75.69102 3.715050

If I use the recategorized d7 variable (d7c,  two levels only) it works too:

 svyby(~ovt, ~d7c, sii.design, svymean, na.rm = TRUE, level = 0.95)
                  d7c      ovt      se
No estable No estable 70.92344 1.37460
Estable       Estable 74.53719 4.16954


What could be the problem?


Regards.


Martin Canon
Colombia, South America



Re: [R] tapply error svyby function survey package

2014-11-12 Thread Anthony Damico
try resetting your levels?  if that doesn't work, please dput() an example
data set that we can test with :) thanks!

sii.design <- update( sii.design , d6 = factor( d6 ) )
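
What "resetting the levels" does can be seen on a plain factor, without the
survey package. This is a sketch with made-up values; only the level names are
borrowed from the thread. Note that the suggestion above resets d6, while the
failing call groups by d7; applying the same idiom to d7 would be an untested
assumption.

```r
# A factor that declares a level ("Viudo") which never occurs in the data.
d7 <- factor(c("Soltero", "Casado", "Soltero"),
             levels = c("Soltero", "Casado", "Viudo"))

print(table(d7))         # the unused level appears with a count of zero

# Calling factor() on a factor keeps only the levels that actually occur.
d7_reset <- factor(d7)
print(table(d7_reset))

# droplevels() is an equivalent way to remove unused levels.
print(levels(droplevels(d7)))
```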








Re: [R] tapply error svyby function survey package

2014-11-12 Thread Anthony Damico
hi martin, sending the first 25 rows does not help if it does not re-create
the problem..  when i run the data you have provided, i do not encounter
your problem (see below).  someone else may be able to guess the issue, but
this would be a lot easier to solve if you can create a minimal
reproducible example

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
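
A small, shareable example can be produced with dput(), which writes an object
as R code that rebuilds it. The sketch below uses a made-up five-row data frame
named small (a hypothetical name, with values loosely modelled on the thread)
and checks that the written code recreates the same object.

```r
# Made-up miniature of the data set (assumption, for illustration only).
small <- data.frame(
  id  = 51:55,
  d7  = factor(c("Soltero", "Soltero", NA, "Casado", "Soltero")),
  ovt = c(NA, 93.4, NA, NA, 83.8)
)

# Print the text that would be pasted into an email.
dput(small)

# Round trip through a temporary file to confirm the text rebuilds the object.
tf <- tempfile(fileext = ".R")
dput(small, file = tf)
rebuilt <- dget(tf)
print(all.equal(small, rebuilt))
```

The subset is only useful to others if running the failing call on it
reproduces the error, so it is worth checking that before posting.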


sii.tesis <-
structure(list(id = c(51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L,
59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L,
73L, 74L, 75L, 76L), stratum = structure(c(1L, 4L, NA, 4L, 4L,
1L, 6L, NA, 4L, 4L, 1L, 1L, 1L, 6L, 6L, 3L, 3L, 6L, NA, 1L, 1L,
6L, 4L, 3L, 6L), .Label = c("MEst", "MAcad", "MAdm", "FEst",
"FAcad", "FAdm"), class = "factor"), expfc = c(22.8195266723633,
17.0644626617432, NA, 17.0644626617432, 17.0644626617432, 22.8195266723633,
5.1702127456665, NA, 17.0644626617432, 17.0644626617432, 22.8195266723633,
22.8195266723633, 22.8195266723633, 5.1702127456665, 5.1702127456665,
6.24137926101685, 6.24137926101685, 5.1702127456665, NA, 22.8195266723633,
22.8195266723633, 5.1702127456665, 17.0644626617432, 6.24137926101685,
5.1702127456665), d7 = structure(c(1L, 1L, NA, 1L, 1L, 1L, 1L,
NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, NA, 1L, 1L, 6L, 1L,
6L, 6L), .Label = c("Soltero", "Casado", "Separado", "Divorciado",
"Viudo", "Union libre"), class = "factor"), ovt = c(NA, 93.3823547363281,
NA, NA, NA, NA, 83.8235321044922, NA, NA, NA, NA, NA, NA, NA,
79.4117660522461, NA, NA, 19.1176471710205, NA, NA, NA, 85.2941207885742,
NA, NA, NA)), .Names = c("id", "stratum", "expfc", "d7", "ovt"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
"21", "22", "23", "24", "25"), class = "data.frame")

library(survey)
sii.design <- svydesign(
  id = ~1,
  strata = ~stratum,
  weights = ~expfc,
  data = subset(sii.tesis, !is.na(stratum)))

svyby(~ovt, ~d7, sii.design, svymean, na.rm = TRUE, level = 0.95)


# works fine---
 svyby(~ovt, ~d7, sii.design, svymean, na.rm = TRUE, level = 0.95)
                     d7      ovt       se
Soltero         Soltero 88.94329 3.333485
Casado           Casado 19.11765 0.00
Union libre Union libre 85.29412 0.00







Re: [R] tapply error svyby function survey package

2014-11-12 Thread Martin Canon
Anthony, thanks for your reply.

Resetting the levels didn't work.

These are the first 25 rows of the dataset:

structure(list(id = c(51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L,
59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L,
73L, 74L, 75L, 76L), stratum = structure(c(1L, 4L, NA, 4L, 4L,
1L, 6L, NA, 4L, 4L, 1L, 1L, 1L, 6L, 6L, 3L, 3L, 6L, NA, 1L, 1L,
6L, 4L, 3L, 6L), .Label = c("MEst", "MAcad", "MAdm", "FEst",
"FAcad", "FAdm"), class = "factor"), expfc = c(22.8195266723633,
17.0644626617432, NA, 17.0644626617432, 17.0644626617432, 22.8195266723633,
5.1702127456665, NA, 17.0644626617432, 17.0644626617432, 22.8195266723633,
22.8195266723633, 22.8195266723633, 5.1702127456665, 5.1702127456665,
6.24137926101685, 6.24137926101685, 5.1702127456665, NA, 22.8195266723633,
22.8195266723633, 5.1702127456665, 17.0644626617432, 6.24137926101685,
5.1702127456665), d7 = structure(c(1L, 1L, NA, 1L, 1L, 1L, 1L,
NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, NA, 1L, 1L, 6L, 1L,
6L, 6L), .Label = c("Soltero", "Casado", "Separado", "Divorciado",
"Viudo", "Union libre"), class = "factor"), ovt = c(NA, 93.3823547363281,
NA, NA, NA, NA, 83.8235321044922, NA, NA, NA, NA, NA, NA, NA,
79.4117660522461, NA, NA, 19.1176471710205, NA, NA, NA, 85.2941207885742,
NA, NA, NA)), .Names = c("id", "stratum", "expfc", "d7", "ovt"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
"21", "22", "23", "24", "25"), class = "data.frame")

Regards.

Martin

On Wed, Nov 12, 2014 at 1:39 PM, Anthony Damico ajdam...@gmail.com wrote:
 try resetting your levels?  if that doesn't work, please dput() an example
 data set that we can test with :) thanks!

 sii.design <- update( sii.design , d6 = factor( d6 ) )
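
One way to follow up on the suggestion above is to count how many rows fall in each level of the grouping factor before calling svyby(). The sketch below is only an illustration in base R: it rebuilds d7 and ovt from the 25-row sample posted in this message and does not use the survey package. The failing call groups by d7, so that is the factor checked here. Whether unused levels are what triggers the tapply() error is an assumption; nothing in this thread confirms it.

```r
# d7 codes and ovt values copied from the 25-row sample in this message
d7 <- factor(c(1, 1, NA, 1, 1, 1, 1, NA, 1, 1, 1, 1, 1, 1, 1, 1, 1,
               2, NA, 1, 1, 6, 1, 6, 6),
             levels = 1:6,
             labels = c("Soltero", "Casado", "Separado", "Divorciado",
                        "Viudo", "Union libre"))
ovt <- c(NA, 93.3823547363281, NA, NA, NA, NA, 83.8235321044922,
         NA, NA, NA, NA, NA, NA, NA, 79.4117660522461, NA, NA,
         19.1176471710205, NA, NA, NA, 85.2941207885742, NA, NA, NA)

# Rows per level, split by whether ovt is observed
print(table(d7, observed = !is.na(ovt)))

# Levels that never occur in the sample
empty_levels <- names(which(table(d7) == 0))

# factor() (like droplevels()) rebuilds the factor without unused levels
d7_reset <- factor(d7)
print(levels(d7_reset))
```

With a survey design object, the equivalent reset would go through update(), as in the line quoted above, but with the variable that is actually used for grouping.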






 On Wed, Nov 12, 2014 at 7:59 AM, Martin Canon martin.ca...@gmail.com
 wrote:

 Hi.


 I'm trying to calculate the weighted mean score of a quality of life
 measure (ovt) in patients with irritable bowel syndrome by their
 marital status (d7).

 This is a summary of the structure of the dataset:

  str(sii.tesis)
 'data.frame':   1063 obs. of  75 variables:
  $ id     : int  51 52 53 54 55 56 57 58 59 60 ...
  $ stratum: Factor w/ 6 levels "MEst","MAcad",..: 1 4 NA 4 4 1 6 NA 4 4 ...
  $ expfc  : num  22.8 17.1 NA 17.1 17.1 ...
  $ d6     : Factor w/ 3 levels "Estudiante","Profesor",..: 1 1 NA 1 1 1 3 NA 1 1 ...
  $ d7     : Factor w/ 6 levels "Soltero","Casado",..: 1 1 NA 1 1 1 1 NA 1 1 ...
  $ d7c    : Factor w/ 2 levels "No estable","Estable": 1 1 NA 1 1 1 1 NA 1 1 ...
  $ s1cm   : Factor w/ 2 levels "No","Si": 1 2 NA 1 1 1 2 NA 1 1 ...
  $ ovt    : num  NA 93.4 NA NA NA ...

 I declared the sampling design:

  sii.design <- svydesign(
    id = ~1,
    strata = ~stratum,
    weights = ~expfc,
    data = subset(sii.tesis, !is.na(stratum)))

 Then I tried to get the result:

  svyby(~ovt, ~d7, sii.design, svymean, na.rm = TRUE, level = 0.95)

 but i get the error:

 Error in tapply(1:NROW(x), list(factor(strata)), function(index) { :
   arguments must have same length


 The length of both variables is the same. If the variable ovt exists,
 there is a d7 match in the data frame.

 I try the same thing using another variable instead - role (d6) -
 and it works.

  svyby(~ovt, ~d6, sii.design, svymean, na.rm = TRUE, level = 0.95)
                            d6      ovt       se
 Estudiante         Estudiante 71.01805 1.370569
 Profesor             Profesor 72.30923 6.518378
 Administrativo Administrativo 75.69102 3.715050

 If I use the recategorized d7 variable (d7c,  two levels only) it works
 too:

  svyby(~ovt, ~d7c, sii.design, svymean, na.rm = TRUE, level = 0.95)
                   d7c      ovt      se
 No estable No estable 70.92344 1.37460
 Estable       Estable 74.53719 4.16954


 What could be the problem?


 Regards.


 Martin Canon
 Colombia, South America

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





[R] tapply and functions with more than one objects

2013-01-22 Thread Dominic Roye
Hello,

How can I use a custom function in tapply that takes more than one variable?

I mean, sum(x) needs only one object, but when I have a function
function(x,y) that takes more, how do I indicate where the other
variables are?


I hope someone can help me. Thank you!!

Best regards,

Dominic



Re: [R] tapply and functions with more than one objects

2013-01-22 Thread David Winsemius

On Jan 22, 2013, at 2:24 PM, Dominic Roye wrote:

 Hello,
 
 How can I use a custom function in tapply that takes more than one variable?
 
 I mean, sum(x) needs only one object, but when I have a function
 function(x,y) that takes more, how do I indicate where the other
 variables are?

You can use:

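# each piece d is a data frame holding one category's rows;
# x and y stand for whichever columns your function needs
lapply(split(multi_col_object, category_vec), function(d) sum(d$x, d$y))

aggregate(dat, list(category), FUN = sum)

Or:

do.call(rbind, by(multi_col_object, category_vec, function(d) sum(d$x, d$y)))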

Sometimes `Reduce` is more compact. Other times `mapply` is needed.
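
As a small worked case of the split-and-apply idea above, here is a sketch with a two-argument function. The data frame `dat`, its columns and the helper `wmean` are made up for illustration. The second form shows a common trick for staying with tapply(): group the row indices, then use them to pick matching pieces of both variables.

```r
# Hypothetical data: a value x, a weight w, and a grouping column grp
dat <- data.frame(grp = c("a", "a", "b", "b", "b"),
                  x   = c(1, 2, 3, 4, 5),
                  w   = c(1, 1, 2, 1, 1))

# A function of two variables
wmean <- function(x, w) sum(x * w) / sum(w)

# split() the rows by group, then call the function on two columns of each piece
by_split <- sapply(split(dat, dat$grp), function(d) wmean(d$x, d$w))

# tapply() over row indices: each call receives the indices of one group
by_index <- tapply(seq_len(nrow(dat)), dat$grp,
                   function(i) wmean(dat$x[i], dat$w[i]))

print(by_split)
print(by_index)
```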
-- 

David Winsemius
Alameda, CA, USA



[R] tapply to data.frame or matrix

2012-09-04 Thread Jannis
Dear R users, 


imagine I have a data frame and an indexing vector whose length equals the
number of columns of the data frame. Is there any convenient way to
combine the columns of the data frame into vectors (or straight away apply
functions to these subsets) according to the indexing vector, in a
manner similar to the tapply function?

For example, in the following case, I would like to combine columns 1
and 2 into one vector, and columns 3-5 into another:

test = as.data.frame(matrix(1:20, ncol = 5, nrow=4))
test.ind = c(1,1,2,2,2)


Thanks a lot! 
Jannis



Re: [R] tapply to data.frame or matrix

2012-09-04 Thread Rui Barradas

Hello,

Here's a way.

test <- as.data.frame(matrix(1:20, ncol = 5, nrow=4))
test.ind <- c(1,1,2,2,2)

lapply(split(colnames(test), test.ind), function(x) unlist(test[, x]))
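
For the "straight away apply functions" part of the question, a variation on the same idea is sketched below. It uses split.default(), which splits the columns of a data frame (treated as a list) rather than its rows. The choice of sum() as the function is just an example.

```r
test <- as.data.frame(matrix(1:20, ncol = 5, nrow = 4))
test.ind <- c(1, 1, 2, 2, 2)

# split.default() groups the columns by test.ind;
# each piece is a data frame holding the columns of one group
col_groups <- split.default(test, test.ind)

# Apply a function to all values of each column group
group_sums <- sapply(col_groups, function(d) sum(unlist(d)))
print(group_sums)
```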

Hope this helps,

Rui Barradas


Re: [R] tapply to data.frame or matrix

2012-09-04 Thread arun
Hi,
Here's another way:
testagg <- aggregate(colnames(test), list(test.ind), function(x) test[,x])
list(unlist(testagg[,2][1]), unlist(testagg[,2][2]))
#[[1]]
#0.V11 0.V12 0.V13 0.V14 0.V21 0.V22 0.V23 0.V24
#    1     2     3     4     5     6     7     8

#[[2]]
#1.V31 1.V32 1.V33 1.V34 1.V41 1.V42 1.V43 1.V44 1.V51 1.V52 1.V53 1.V54
#    9    10    11    12    13    14    15    16    17    18    19    20
A.K.






Re: [R] tapply confusion

2012-08-30 Thread andyspeak
Actually it's okay.
I just created 16 subsets of the data frame using the different months and
then ran the Kruskal test 16 times.
I'm sure there is a nice way to code this to do it automatically and produce
a nice table of the results, but I only started learning R two weeks ago!!!

Thanks for all the help



--
View this message in context: 
http://r.789695.n4.nabble.com/tapply-confusion-tp4641729p4641821.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] tapply confusion

2012-08-30 Thread andyspeak
Hello
Thank you for the help.

kruskal.test(Temp, Roof)   is simple but just returns one result for the
whole temperature dataset organised by roof.

I want to compare the Temp data for each Roof in each Month.  So because I
have temperature data on the three roofs for 16 different months, I want
16 separate kruskal.test results.

How do I do this?

Thanks




--
View this message in context: 
http://r.789695.n4.nabble.com/tapply-confusion-tp4641729p4641820.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] tapply confusion

2012-08-30 Thread David Winsemius


On Aug 30, 2012, at 4:02 AM, andyspeak wrote:


Hello
Thank you for the help.

kruskal.test(Temp, Roof)   is simple but just returns one result for the
whole temperature dataset organised by roof.

I want to compare the Temp data for each Roof in each Month.  So because I
have temperature data on the three roofs for 16 different months, I want
16 separate kruskal.test results.



lapply( split(dfrm, dfrm$Month), function(xfrm) {
 kruskal.test(xfrm[["Temp"]], xfrm[["Roof"]]) })

Notice that I used an assumed name for the dataframe. You have  
apparently been following unwise advice to use attach. You would be  
advised to disregard that advice.
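
To show the split-by-month call end to end, here is a sketch on simulated data. The data frame below is invented (3 roofs, 4 months, 10 readings each) and only mimics the column names used in this thread. The last step collects the test results into one table, one row per month.

```r
# Simulated stand-in for the real data
set.seed(1)
dfrm <- expand.grid(rep   = 1:10,
                    Roof  = c("A", "B", "C"),
                    Month = month.abb[1:4],
                    stringsAsFactors = FALSE)
dfrm$Temp <- rnorm(nrow(dfrm), mean = 15, sd = 3)

# One Kruskal-Wallis test per month, comparing Temp across roofs
tests <- lapply(split(dfrm, dfrm$Month), function(xfrm) {
  kruskal.test(xfrm[["Temp"]], factor(xfrm[["Roof"]]))
})

# Pull the pieces of each "htest" result into one row per month
results <- data.frame(
  Month     = names(tests),
  statistic = sapply(tests, function(k) unname(k$statistic)),
  df        = sapply(tests, function(k) unname(k$parameter)),
  p.value   = sapply(tests, function(k) k$p.value),
  row.names = NULL
)
print(results)
```

Note that split() orders the months alphabetically when Month is a character column; turning Month into a factor with the desired level order would keep calendar order.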


--

David Winsemius, MD
Alameda, CA, USA



[R] tapply confusion

2012-08-29 Thread andyspeak
Hello
I have a huge data frame with three columns, 'Roof', 'Month' and 'Temp'.
I want to run analyses on the numerical Temp data by the factors Roof and
Month, separately and together.
For using more than one factor I understand I should use aggregate, but I am
struggling with tapply for single-factor analysis.

  tapply(Temp, INDEX = Roof, FUN = median)

This works fine; however, if I try to do anything a bit more complex, such
as:

 tapply(Temp, INDEX = Roof, FUN = kruskal.test)

it gives the error - Error in length(g) : 'g' is missing

What could be the problem?
Thanks




--
View this message in context: 
http://r.789695.n4.nabble.com/tapply-confusion-tp4641729.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] tapply confusion

2012-08-29 Thread Milan Bouchet-Valat
On Wednesday 29 August 2012 at 07:37 -0700, andyspeak wrote:
 Hello
 I have a huge data frame with three columns 'Roof' 'Month' and 'Temp'
 i want to run analyses on the numerical Temp data by the factors Roof and
 Month, separately and together.
 For using more than one factor i understand i should use aggregate, but i am
 struggling with the tapply for single factor analysis.
 
   tapply(Temp, INDEX = Roof, FUN = median)
 
 This works fine, however if i try to do anything a bit more complex, such
 as:
 
  tapply(Temp, INDEX = Roof, FUN = kruskal.test)
 
 it gives the error - Error in length(g) : 'g' is missing
 
 What could be the problem?
If you read ?kruskal.test, you'll notice its default function takes (at
least) two arguments, the second being g. Its description is:
   g: a vector or factor object giving the group for the
  corresponding elements of ‘x’.  Ignored if ‘x’ is a list.

So you do not need tapply(): just call
kruskal.test(Temp, Roof)


The theoretical reason you cannot use tapply() is that it calls FUN
separately for each subset of the data. kruskal.test() would never be
passed the whole data set, which is needed to make a test of
differences.
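
A small sketch of that point, on made-up data (three roofs, five readings each; the vectors only borrow the names used in this thread): tapply() hands FUN one vector per group and nothing else, which is enough for median() but not for a test that compares groups.

```r
set.seed(42)
Roof <- rep(c("A", "B", "C"), each = 5)
Temp <- rnorm(15, mean = 15, sd = 3)

# Each call to FUN sees only the values of one group
pieces <- tapply(Temp, Roof, function(v) length(v))
print(pieces)

# A one-argument summary therefore works as expected
medians <- tapply(Temp, Roof, median)
print(medians)

# kruskal.test() needs all the values together with the grouping
kt <- kruskal.test(Temp, factor(Roof))
print(kt$p.value)
```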


Regards



Re: [R] tapply confusion

2012-08-29 Thread David Winsemius


On Aug 29, 2012, at 7:37 AM, andyspeak wrote:


Hello
I have a huge data frame with three columns, 'Roof', 'Month' and 'Temp'.
I want to run analyses on the numerical Temp data by the factors Roof and
Month, separately and together.
For using more than one factor I understand I should use aggregate, but I am
struggling with tapply for single-factor analysis.

tapply(Temp, INDEX = Roof, FUN = median)

This works fine; however, if I try to do anything a bit more complex, such
as:

tapply(Temp, INDEX = Roof, FUN = kruskal.test)

it gives the error - Error in length(g) : 'g' is missing


What is the sound of one hand clapping?

You are sending a bunch of single vectors with no grouping variable to  
a function that is expecting two data columns. Maybe you should  
explain what test you had in mind using natural language and we could  
help get you there.



--
David Winsemius, MD
Alameda, CA, USA



Re: [R] tapply for enormous (2^31 row) matrices

2012-02-26 Thread Gabor Grothendieck
On Thu, Feb 23, 2012 at 11:39 AM, Matthew Keller mckellerc...@gmail.com wrote:
 Thank you all very much for your help (on both the r-help and the
 bioconductor listserves).

 Benilton - I couldn't get sqldf to install on the server I'm using
 (error is: Error : package 'gsubfn' does not have a name space). I
 think this was a problem for R 2.13, and I'm trying to get the admin's
 to install a more up-to-date version. I know that I need to probably
 learn a modicum of SQL given the sizes of datasets I'm using now.

Right. See the troubleshooting section of the sqldf home page:
http://code.google.com/p/sqldf/#Troubleshooting

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] tapply for enormous (2^31 row) matrices

2012-02-23 Thread Matthew Keller
Thank you all very much for your help (on both the r-help and the
bioconductor listserves).

Benilton - I couldn't get sqldf to install on the server I'm using
(error is: Error : package 'gsubfn' does not have a name space). I
think this was a problem for R 2.13, and I'm trying to get the admin's
to install a more up-to-date version. I know that I need to probably
learn a modicum of SQL given the sizes of datasets I'm using now.

I ended up using a modified version of Hervé Pagès' excellent code
(thank you!). I got a huge (40-fold) speed bump by using the
data.table package for the indexing/aggregate steps, turning an hours-long
job into a minutes-long job. SO - data.table is hugely useful if you're
dealing with indexing/apply-family functions on huge datasets. By the
way, I'm not sure why, but read.table was a bit faster than scan for
this problem... Here is the code for others:


require(data.table)

computeAllPairSums <- function(filename, nbindiv, nrows.to.read)
{
   con <- file(filename, open="r")
   on.exit(close(con))
   # nbindiv x nbindiv accumulator, indexed by (ID1, ID2)
   ans <- matrix(numeric(nbindiv * nbindiv), nrow=nbindiv)
   chunk <- 0L
   while (TRUE) {
      # read.table faster than scan; the "NULL" class drops the third column
      df0 <- read.table(con, col.names=c("ID1", "ID2", "ignored", "sharing"),
                        colClasses=c("integer", "integer", "NULL", "numeric"),
                        nrows=nrows.to.read, comment.char="")

      # stop once the file is exhausted
      if (nrow(df0) == 0L)
         break

      # sum the sharing column within each (ID1, ID2) pair of this chunk
      DT <- data.table(df0)
      setkey(DT, ID1, ID2)
      ss <- DT[, sum(sharing), by="ID1,ID2"]

      chunk <- chunk + 1L
      cat("Processing chunk", chunk, "... ")

      # add the chunk's pair sums into the accumulator via matrix indexing
      idd <- as.matrix(subset(ss, select=1:2))
      newvec <- as.vector(as.matrix(subset(ss, select=3)))
      ans[idd] <- ans[idd] + newvec

      cat("OK\n")
   }
   ans
}
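
For readers without data.table, the same chunk-and-accumulate pattern can be tried on a toy file with base R only. This is a sketch, not the code used above: it swaps data.table for aggregate(), the function name `pair_sums_chunked` and its arguments are made up, and the five-line input file is invented. It will be far slower than data.table on real data.

```r
pair_sums_chunked <- function(path, n_ids, chunk_rows) {
  con <- file(path, open = "r")
  on.exit(close(con))
  ans <- matrix(0, nrow = n_ids, ncol = n_ids)
  repeat {
    # Read the next block of rows from the open connection.
    # tryCatch() treats any read error as end of input, which is crude:
    # a malformed line would also stop the loop silently.
    chunk <- tryCatch(
      read.table(con,
                 col.names  = c("ID1", "ID2", "ignored", "sharing"),
                 colClasses = c("integer", "integer", "NULL", "numeric"),
                 nrows = chunk_rows, comment.char = ""),
      error = function(e) NULL)
    if (is.null(chunk) || nrow(chunk) == 0L) break

    # Sum sharing within each (ID1, ID2) pair of this chunk
    ss <- aggregate(sharing ~ ID1 + ID2, data = chunk, FUN = sum)

    # Two-column matrix indexing adds each pair sum to its own cell
    idx <- cbind(ss$ID1, ss$ID2)
    ans[idx] <- ans[idx] + ss$sharing
  }
  ans
}

# Toy input: the pairs (1,3) and (2,3) each appear in two different chunks
tmp <- tempfile()
writeLines(c("1 3 58 1.5",
             "2 3 56 1.0",
             "1 3 64 0.5",
             "2 3 58 2.0",
             "1 2 11 4.0"), tmp)
sums <- pair_sums_chunked(tmp, n_ids = 3, chunk_rows = 2)
unlink(tmp)
print(sums)
```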






-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com



Re: [R] tapply for enormous (2^31 row) matrices

2012-02-22 Thread ilai
On Tue, Feb 21, 2012 at 4:04 PM, Matthew Keller mckellerc...@gmail.com wrote:

 X <- read.big.matrix(file.loc.X, sep=" ", type="double")
 hap.indices <- bigsplit(X, 1:2) # this runs for too long to be useful on
 # these matrices
 # I was then going to use a foreach loop to sum across the splits
 # identified by bigsplit

How about just using foreach earlier in the process ? e.g. split
file.loc.X to (80) sub files and then run
read.big.matrix/bigsplit/sum inside %dopar%

If splitting X beforehand is a problem, you could also use ?scan to
read in different chunks of the file, something like (untested
obviously):
# for X a matrix 800x4
lineind <- seq(0, 700, 100)  # number of lines to skip before each 100-line chunk
ReducedX <- foreach(i = 1:8) %dopar% {
  x <- scan('file.loc.X', list(double(0),double(0),double(0),double(0)),
            skip=lineind[i], nlines=100)
  # ... do your thing on x (aggregate/tapply etc.)
}

Hope this helped
Elai.






[R] tapply for enormous (2^31 row) matrices

2012-02-21 Thread Matthew Keller
Hi all,

SETUP:
I have pairwise data on 22 chromosomes. Data matrix X for a given
chromosome looks like this:

1 13 58 1.12
6 142 56 1.11
18 307 64 3.13
22 320 58 0.72

Where column 1 is person ID 1, column 2 is person ID 2, column 3 can
be ignored, and column 4 is how much chromosomal sharing those two
individuals have in some small portion of the chromosome. There are
9000 individual people, and therefore ~ (9000^2)/2 pairwise matches at
each small location on the chromosome, so across an entire chromosome,
these matrices are VERY large (e.g., 3 billion rows, which is > the
2^31 vector size limitation in R). I have access to a server with 64
bit R, 1TB RAM and 80 processors.

PROBLEM:
A pair of individuals (e.g., person 1 and 13 from the first row above)
will show up multiple times in a given file. I want to sum column 4
across each pair of individuals. If I could bring the matrix into R, I
could use tapply() to accomplish this by indexing on
paste(X[,1],X[,2]), but the matrix doesn't fit into R. I have been
trying to use bigmemory and bigtabulate packages in R, but when I try
to use the bigsplit function, R never completes the operation (after a
day, I killed the process). In particular, I did this:

X <- read.big.matrix(file.loc.X, sep=" ", type="double")
hap.indices <- bigsplit(X, 1:2) # this runs for too long to be useful on
# these matrices
# I was then going to use a foreach loop to sum across the splits
# identified by bigsplit
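
For reference, the in-memory tapply() approach described above would look like the sketch below when the matrix is small enough to load. The rows are invented for illustration (same layout as the sample, with two pairs repeated) and are not real data.

```r
# Columns: person ID 1, person ID 2, ignored, sharing
X <- matrix(c( 1,  13, 58, 1.12,
               6, 142, 56, 1.11,
               1,  13, 64, 3.13,
               6, 142, 58, 0.72,
              18, 307, 60, 2.00),
            ncol = 4, byrow = TRUE)

# One key per pair of individuals, then sum column 4 within each key
pair_key  <- paste(X[, 1], X[, 2])
pair_sums <- tapply(X[, 4], pair_key, sum)
print(pair_sums)
```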

SO - does anyone have ideas on how to deal with this problem - i.e.,
how to use a tapply() like function on an enormous matrix? This isn't
necessarily a bigtabulate question (although if I screwed up using
bigsplit, let me know). If another package (e.g., an SQL package) can
do something like this efficiently, I'd like to hear about it and your
experiences using it.

Thank you in advance,

Matt



-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com



[R] tapply with specific quantile value

2011-03-24 Thread Steven Ranney
All -

I have an example data frame

x   l.c.1
43.38812035 085
47.55710661 085
47.55710661 085
51.99211429 085
51.99211429 095
54.78449958 095
54.78449958 095
56.70201864 095
56.70201864 105
59.66361903 105
61.69573564 105
61.69573564 105
63.77469479 115
64.83191994 115
64.83191994 115
66.98222118 115
66.98222118 125
66.98222118 125
66.98222118 125
66.98222118 125

and I'd like to get the 3rd quantile by l.c.1 so I use

tapply(x, l.c.1, quantile)

and my output includes all quantiles (i.e., 0, 25%, 50%, 75%, 100%)
but I'm only interested in the 75% quantile.  Is there an additional
statement or function I can use to get just the quantile that I want?

Thanks for your help -

SR
Steven H. Ranney



Re: [R] tapply with specific quantile value

2011-03-24 Thread Peter Alspach
Tena koe Steven

The  ... argument of the apply series of functions allows one to pass arguments 
to the called function.  So:

tapply(x, l.c.1, quantile, probs=0.75)

should work (although I haven't tested it).
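
Here is that call run on the example data from the question, as a sketch. The group column is entered as character strings to keep the leading zero in "085" and "095"; that is an assumption about how the original column was stored.

```r
x <- c(43.38812035, 47.55710661, 47.55710661, 51.99211429,
       51.99211429, 54.78449958, 54.78449958, 56.70201864,
       56.70201864, 59.66361903, 61.69573564, 61.69573564,
       63.77469479, 64.83191994, 64.83191994, 66.98222118,
       66.98222118, 66.98222118, 66.98222118, 66.98222118)
l.c.1 <- rep(c("085", "095", "105", "115", "125"), each = 4)

# probs = 0.75 travels through tapply()'s ... to quantile()
q75 <- tapply(x, l.c.1, quantile, probs = 0.75)
print(q75)
```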

HTH .

Peter Alspach



Re: [R] tapply with specific quantile value

2011-03-24 Thread Tóth Dénes

Just have a look at ?quantile and the probs argument.

tapply(x, l.c.1, quantile,probs=0.75)

Anyway, quantiles and quartiles are not the same. I guess you meant the
3rd quartile.




Re: [R] tapply with specific quantile value

2011-03-24 Thread Jorge Ivan Velez
Hi Steven,

See the probs argument under ?quantile.  The following should be what you
want:

tapply(x, l.c.1, quantile, probs = 0.75)


HTH,
Jorge


Re: [R] tapply with specific quantile value

2011-03-24 Thread Steven Ranney
Worked just fine.  I had been incorrectly trying

tapply(x, l.c.1, quantile(probs=0.75))

rather than

tapply(x, l.c.1, quantile, probs=0.75)
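
The difference between the two forms can be checked on a toy vector (values and group labels below are made up). In the first form quantile(probs = 0.75) is a call that R evaluates on the spot, without any data, so it fails before tapply() does any grouping. In the second form the function itself is passed and probs is forwarded to it. An anonymous function is an equivalent alternative.

```r
x <- c(1, 2, 3, 4, 10, 20, 30, 40)
g <- rep(c("a", "b"), each = 4)

# Passing a call instead of a function: evaluated immediately, with no x
bad <- tryCatch(tapply(x, g, quantile(probs = 0.75)),
                error = function(e) e)
print(inherits(bad, "error"))

# Passing the function and forwarding probs through ...
via_dots <- tapply(x, g, quantile, probs = 0.75)

# Same thing written with an anonymous function
via_anon <- tapply(x, g, function(v) quantile(v, probs = 0.75))

print(via_dots)
```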

Thanks for the help -

SR
Steven H. Ranney



On Thu, Mar 24, 2011 at 6:03 PM, Peter Alspach
peter.alsp...@plantandfood.co.nz wrote:
 Tena koe Steven

 The  ... argument of the apply series of functions allows one to pass 
 arguments to the called function.  So:

 tapply(x, l.c.1, quantile, probs=0.75)

 should work (although I haven't tested it).

 HTH .

 Peter Alspach

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Steven Ranney
 Sent: Friday, 25 March 2011 12:18 p.m.
 To: r-help@r-project.org
 Subject: [R] tapply with specific quantile value

 All -

 I have an example data frame

 x     l.c.1
 43.38812035   085
 47.55710661      085
 47.55710661      085
 51.99211429      085
 51.99211429      095
 54.78449958   095
 54.78449958   095
 56.70201864   095
 56.70201864   105
 59.66361903   105
 61.69573564      105
 61.69573564      105
 63.77469479   115
 64.83191994   115
 64.83191994   115
 66.98222118      115
 66.98222118      125
 66.98222118      125
 66.98222118      125
 66.98222118      125

 and I'd like to get the 3rd quantile by l.c.1 so I use

 tapply(x, l.c.1, quantile)

 and my output includes all quantiles (i.e., 0, 25%, 50%, 75%, 100%)
 but I'm only interested in the 75% quantile.  Is there an additional
 statement or function I can use to get just the quantile that I want?

 Thanks for your help -

 SR
 Steven H. Ranney



Re: [R] tapply output as a dataframe

2011-02-03 Thread Graves, Gregory
On Mon, Apr 13, 2009 at 12:41 PM, Dan Dube ddube-at-advisen.com wrote:

 i use tapply and by often, but i always end up banging my head against
 the wall with the output.

The proposed solution of Dan's problem posted on R-help was: 

 do.call(rbind,a)

When I use this 'solution' I get 'ERROR:  second argument must be a list'.  So 
head on wall continues.

My tapply output is generated as follows:

 a=tapply(value,list(sampling.date,station.code),mean)

which gives me this (in part):

               A     B     C     D     E     F     G     H     I     J      K
1/15/2008  0.004 0.027 0.019 0.015 0.035 0.022 0.007 0.038 0.042 0.045 0.0350
1/15/2009  0.027 0.027 0.031 0.015 0.008 0.021 0.007 0.027 0.026 0.029 0.0210
1/15/2010  0.016 0.020 0.015 0.022 0.015 0.013 0.007 0.014 0.019 0.019 0.0180
10/15/2007 0.052 0.051 0.032 0.024 0.017 0.044 0.015 0.058 0.063 0.061 0.0640
10/15/2008 0.042 0.054 0.030 0.017 0.024 0.030 0.019 0.044 0.047 0.051 0.0390
10/15/2009 0.047 0.035 0.031 0.020 0.012 0.039 0.019 0.051 0.055 0.054 0.0350

The only way I can figure out how to resolve this, such that I can, for 
example, plot station A against date, is to export the tapply output as a 
csv, and then reimport.

Suggestions?  I couldn't find a solution to this likely SIMPLE problem in 
Crawley or multiple searches of R help.

Gregory A. Graves, Lead Scientist
Everglades REstoration COoordination and VERification (RECOVER) 
Wetland Watershed Sciences / Restoration Sciences Department
South Florida Water Management District
Phones:  DESK: 561 / 682 - 2429 
       CELL:  561 / 719 - 8157



Re: [R] tapply output as a dataframe

2011-02-03 Thread Phil Spector

Try
  as.data.frame(as.table(a))

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu
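
A minimal sketch of what this conversion gives, assuming a tiny made-up stand-in for the poster's data (the four values and the column names chosen below are illustrative only). as.table() followed by as.data.frame() turns the date-by-station matrix into one row per cell, so the dates become an ordinary column that can be converted and plotted.

```r
# Hypothetical stand-in for the poster's vectors (values are made up)
sampling.date <- c("1/15/2008", "1/15/2008", "10/15/2007", "10/15/2007")
station.code  <- c("A", "B", "A", "B")
value         <- c(0.004, 0.027, 0.052, 0.051)

a <- tapply(value, list(sampling.date, station.code), mean)

# One row per (date, station) cell; the default column names are Var1, Var2, Freq
long <- as.data.frame(as.table(a), stringsAsFactors = FALSE)
names(long) <- c("sampling.date", "station.code", "value")

# The dates are now a regular column and can be parsed
long$rdate <- as.Date(long$sampling.date, format = "%m/%d/%Y")

# Station A in date order, ready for plotting
stationA <- long[long$station.code == "A", ]
stationA <- stationA[order(stationA$rdate), ]
# plot(value ~ rdate, data = stationA, type = "b")
```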


On Thu, 3 Feb 2011, Graves, Gregory wrote:


On Mon, Apr 13, 2009 at 12:41 PM, Dan Dube ddube-at-advisen.com wrote:


i use tapply and by often, but i always end up banging my head against
the wall with the output.


The proposed solution of Dan's problem posted on R-help was:


do.call(rbind,a)


When I use this 'solution' I get 'ERROR:  second argument must be a list'.  So 
head on wall continues.

My tapply output is generated as follows:


a=tapply(value,list(sampling.date,station.code),mean)


which gives me this (in part):

               A     B     C     D     E     F     G     H     I     J      K
1/15/2008  0.004 0.027 0.019 0.015 0.035 0.022 0.007 0.038 0.042 0.045 0.0350
1/15/2009  0.027 0.027 0.031 0.015 0.008 0.021 0.007 0.027 0.026 0.029 0.0210
1/15/2010  0.016 0.020 0.015 0.022 0.015 0.013 0.007 0.014 0.019 0.019 0.0180
10/15/2007 0.052 0.051 0.032 0.024 0.017 0.044 0.015 0.058 0.063 0.061 0.0640
10/15/2008 0.042 0.054 0.030 0.017 0.024 0.030 0.019 0.044 0.047 0.051 0.0390
10/15/2009 0.047 0.035 0.031 0.020 0.012 0.039 0.019 0.051 0.055 0.054 0.0350

The only way I can figure out how to resolve this, such that I can, for example, plot 
station A against date, is to export the tapply output as a csv, and then 
reimport.

Suggestions?  I couldn't find a solution to this likely SIMPLE problem in 
Crawley or multiple searches of R help.

Gregory A. Graves, Lead Scientist
Everglades REstoration COoordination and VERification (RECOVER)
Wetland Watershed Sciences / Restoration Sciences Department
South Florida Water Management District
Phones:  DESK: 561 / 682 - 2429
       CELL:  561 / 719 - 8157



Re: [R] tapply output as a dataframe

2011-02-03 Thread David Winsemius


On Feb 3, 2011, at 11:29 AM, Graves, Gregory wrote:

On Mon, Apr 13, 2009 at 12:41 PM, Dan Dube ddube-at-advisen.com  
wrote:


That was nearly two years ago, so I doubt many people still have
that posting in their mail clients. When I went to the archives, Dan
Dube's problem was posed as how to bind a:


dt = data.frame(bucket=rep(1:4,25),val=rnorm(100))
fn = function(x) { ret = c(unname(quantile(x,probs=seq(.25,.75,.25),na.rm=TRUE)),mean(x,na.rm=TRUE)) }

a = tapply(dt$val,dt$bucket,fn)
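
A minimal sketch of why do.call(rbind, a) worked for Dan's case but not for a tapply() call whose function returns a single number. The seed and the names m, b and res are assumptions added for illustration. When FUN returns a vector of length greater than one, tapply() returns a list-like array, which do.call() accepts; when FUN returns a scalar, the result is a plain numeric array and do.call() complains that its second argument is not a list.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
dt <- data.frame(bucket = rep(1:4, 25), val = rnorm(100))
fn <- function(x) {
  c(unname(quantile(x, probs = seq(.25, .75, .25), na.rm = TRUE)),
    mean(x, na.rm = TRUE))
}

# fn returns four numbers per bucket, so 'a' is an array of mode list
a <- tapply(dt$val, dt$bucket, fn)
m <- do.call(rbind, a)   # one row per bucket, one column per statistic

# mean returns one number per bucket, so 'b' is a numeric array, not a list
b <- tapply(dt$val, dt$bucket, mean)
res <- tryCatch(do.call(rbind, b), error = function(e) conditionMessage(e))
```

For the numeric case no binding is needed: the result is already a vector (or a matrix, with two grouping factors).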


i use tapply and by often, but i always end up banging my head  
against

the wall with the output.


The proposed solution of Dan's problem posted on R-help was:


do.call(rbind,a)


When I use this 'solution' I get 'ERROR:  second argument must be a  
list'.  So head on wall continues.


My tapply output is generated as follows:


a=tapply(value,list(sampling.date,station.code),mean)


Why not give us sampling.date (which is probably NOT really a date but  
rather a character vector) and station.code so we can show you how to  
create a more appropriate structure?




which gives me this (in part):

               A     B     C     D     E     F     G     H     I     J      K
1/15/2008  0.004 0.027 0.019 0.015 0.035 0.022 0.007 0.038 0.042 0.045 0.0350
1/15/2009  0.027 0.027 0.031 0.015 0.008 0.021 0.007 0.027 0.026 0.029 0.0210
1/15/2010  0.016 0.020 0.015 0.022 0.015 0.013 0.007 0.014 0.019 0.019 0.0180
10/15/2007 0.052 0.051 0.032 0.024 0.017 0.044 0.015 0.058 0.063 0.061 0.0640
10/15/2008 0.042 0.054 0.030 0.017 0.024 0.030 0.019 0.044 0.047 0.051 0.0390
10/15/2009 0.047 0.035 0.031 0.020 0.012 0.039 0.019 0.051 0.055 0.054 0.0350


The only way I can figure out how to resolve this, such that I can,  
for example, plot station A against date, is to export the tapply  
output as a csv, and then reimport.




Suggestions?  I couldn't find a solution to this likely SIMPLE problem


Perhaps, but we haven't really been told what the problem is, have we?


in Crawley or multiple searches of R help.




Gregory A. Graves, Lead Scientist



David Winsemius, MD
West Hartford, CT



Re: [R] tapply output as a dataframe

2011-02-03 Thread Graves, Gregory
Yes, as far as I can tell, sampling.date is a character vector of the format 
1/15/2008.  It resides in the leftmost column of the tapply output.

station.code are the A, B, C column headers which refer to actual water quality 
station locations, and the values below those headers correspond to the 
sampling.date when samples were taken.  Actually what I have done is to take 
the mid-point of each month and calculated its mean to deal with multiple 
samples taken in one month, and to generate NAs where no sample was taken by 
purposefully not adding the na.rm=T to the tapply command.

Normally I would do this:
 rdate <- as.POSIXct(strptime(date, format="%m/%d/%Y"))  # convert sampling.date to a date R can handle
 plot(A~rdate)

If I just submit station.code like
 A
I get all the values for Station A.

It is in converting the sampling.date to an rdate that has me stumped.  One 
reason being that in the tapply output the character vector representing date 
has no column name.  I can't access that column.

Gregory A. Graves, Lead Scientist
Everglades REstoration COoordination and VERification (RECOVER) 
Wetland Watershed Sciences / Restoration Sciences Department
South Florida Water Management District
Phones:  DESK: 561 / 682 - 2429 
       CELL:  561 / 719 - 8157


-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net] 
Sent: Thursday, February 03, 2011 12:50 PM
To: Graves, Gregory
Cc: r-help@r-project.org; Goodman, Patricia; Gorman, Patricia
Subject: Re: [R] tapply output as a dataframe


On Feb 3, 2011, at 11:29 AM, Graves, Gregory wrote:


 My tapply output is generated as follows:

 a=tapply(value,list(sampling.date,station.code),mean)

Why not give us sampling.date (which is probably NOT really a date but  
rather a character vector) and station.code so we can show you how to  
create a more appropriate structure?


 which gives me this (in part):

                A     B     C     D     E     F     G     H     I     J      K
 1/15/2008  0.004 0.027 0.019 0.015 0.035 0.022 0.007 0.038 0.042 0.045 0.0350
 1/15/2009  0.027 0.027 0.031 0.015 0.008 0.021 0.007 0.027 0.026 0.029 0.0210
 1/15/2010  0.016 0.020 0.015 0.022 0.015 0.013 0.007 0.014 0.019 0.019 0.0180
 10/15/2007 0.052 0.051 0.032 0.024 0.017 0.044 0.015 0.058 0.063 0.061 0.0640
 10/15/2008 0.042 0.054 0.030 0.017 0.024 0.030 0.019 0.044 0.047 0.051 0.0390
 10/15/2009 0.047 0.035 0.031 0.020 0.012 0.039 0.019 0.051 0.055 0.054 0.0350

 The only way I can figure out how to resolve this, such that I can,  
 for example, plot station A against date, is to export the tapply  
 output as a csv, and then reimport.


 Suggestions?  I couldn't find a solution to this likely SIMPLE problem

Perhaps, but we haven't really been told what the problem is, have we?

 in Crawley or multiple searches of R help.


 Gregory A. Graves, Lead Scientist


David Winsemius, MD
West Hartford, CT



Re: [R] tapply output as a dataframe

2011-02-03 Thread David Winsemius


On Feb 3, 2011, at 1:05 PM, Graves, Gregory wrote:

Yes, as far as I can tell, sampling.date is a character vector of  
the format 1/15/2008.  It resides in the leftmost column of the  
tapply output.


station.code are the A, B, C column headers which refer to actual
water quality station locations, and the values below those headers  
correspond to the sampling.date when samples were taken.  Actually  
what I have done is to take the mid-point of each month and  
calculated its mean to deal with multiple samples taken in one  
month, and to generate NAs where no sample was taken by purposefully  
not adding the na.rm=T to the tapply command.


Normally I would do this:
rdate <- as.POSIXct(strptime(date, format="%m/%d/%Y"))  # convert sampling.date to a date R can handle

plot(A~rdate)


If I just submit station.code like

A

I get all the values for Station A.

It is in converting the sampling.date to an rdate that has me  
stumped.  One reason being that in the tapply output the character  
vector representing date has no column name.  I can't access that  
column.


It looks like a zoo object. zoo objects hold their time values in  
the rownames attribute. But since it's not really ordered properly, it
may just be a table with rownames. The str() function applied to the  
object from tapply would tell you the answer.


--
David.
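
A minimal sketch of the str() check suggested above, again assuming a small made-up stand-in for the poster's data (values and the names dates, ord and station_A are illustrative). With two grouping factors, tapply() returns a plain numeric matrix; the dates are its row names rather than a column, which is why they cannot be reached by a column name.

```r
# Hypothetical stand-in for the poster's vectors (values are made up)
sampling.date <- c("1/15/2008", "1/15/2008", "10/15/2007", "10/15/2007")
station.code  <- c("A", "B", "A", "B")
value         <- c(0.004, 0.027, 0.052, 0.051)

a <- tapply(value, list(sampling.date, station.code), mean)

str(a)  # a numeric matrix with dimnames

# The dates live in the row names; parse them and sort chronologically
dates <- as.Date(rownames(a), format = "%m/%d/%Y")
ord   <- order(dates)
station_A <- a[ord, "A"]
# plot(dates[ord], station_A, type = "b", xlab = "date", ylab = "station A")
```

Sorting matters here because the row names are ordered as character strings, so "1/15/2008" sorts before "10/15/2007".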



Gregory A. Graves, Lead Scientist
Everglades REstoration COoordination and VERification (RECOVER)
Wetland Watershed Sciences / Restoration Sciences Department
South Florida Water Management District
Phones:  DESK: 561 / 682 - 2429
   CELL:  561 / 719 - 8157


-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Thursday, February 03, 2011 12:50 PM
To: Graves, Gregory
Cc: r-help@r-project.org; Goodman, Patricia; Gorman, Patricia
Subject: Re: [R] tapply output as a dataframe


On Feb 3, 2011, at 11:29 AM, Graves, Gregory wrote:



My tapply output is generated as follows:


a=tapply(value,list(sampling.date,station.code),mean)


Why not give us sampling.date (which is probably NOT really a date but
rather a character vector) and station.code so we can show you how to
create a more appropriate structure?



which gives me this (in part):

               A     B     C     D     E     F     G     H     I     J      K
1/15/2008  0.004 0.027 0.019 0.015 0.035 0.022 0.007 0.038 0.042 0.045 0.0350
1/15/2009  0.027 0.027 0.031 0.015 0.008 0.021 0.007 0.027 0.026 0.029 0.0210
1/15/2010  0.016 0.020 0.015 0.022 0.015 0.013 0.007 0.014 0.019 0.019 0.0180
10/15/2007 0.052 0.051 0.032 0.024 0.017 0.044 0.015 0.058 0.063 0.061 0.0640
10/15/2008 0.042 0.054 0.030 0.017 0.024 0.030 0.019 0.044 0.047 0.051 0.0390
10/15/2009 0.047 0.035 0.031 0.020 0.012 0.039 0.019 0.051 0.055 0.054 0.0350



The only way I can figure out how to resolve this, such that I can,
for example, plot station A against date, is to export the tapply
output as a csv, and then reimport.




Suggestions?  I couldn't find a solution to this likely SIMPLE  
problem


Perhaps, but we haven't really been told what the problem is, have we?


in Crawley or multiple searches of R help.




Gregory A. Graves, Lead Scientist



David Winsemius, MD
West Hartford, CT




David Winsemius, MD
West Hartford, CT



Re: [R] tapply output as a dataframe

2011-02-03 Thread Gabor Grothendieck
On Thu, Feb 3, 2011 at 1:11 PM, David Winsemius dwinsem...@comcast.net wrote:

 On Feb 3, 2011, at 1:05 PM, Graves, Gregory wrote:

 Yes, as far as I can tell, sampling.date is a character vector of the
 format 1/15/2008.  It resides in the leftmost column of the tapply output.

 station.code are the A, B, C column headers which refer to actual water
 quality station locations, and the values below those headers correspond to
 the sampling.date when samples were taken.  Actually what I have done is
 to take the mid-point of each month and calculated its mean to deal with
 multiple samples taken in one month, and to generate NAs where no sample was
 taken by purposefully not adding the na.rm=T to the tapply command.

 Normally I would do this:

 rdate <- as.POSIXct(strptime(date, format="%m/%d/%Y"))  # convert sampling.date to a date R can handle
 plot(A~rdate)

 If I just submit station.code like

 A

 I get all the values for Station A.

 It is in converting the sampling.date to an rdate that has me stumped.
  One reason being that in the tapply output the character vector
 representing date has no column name.  I can't access that column.

 It looks like a zoo object. zoo objects hold their time values in the
 rownames attribute. But since it's not really ordered properly, it may just
 be a table with rownames. The str() function applied to the object from
 tapply would tell you the answer.


Internally zoo objects hold their time index in the index attribute.

 library(zoo)
 dput(zoo(4:5))
structure(4:5, index = 1:2, class = "zoo")

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] tapply output as a dataframe

2011-02-03 Thread Graves, Gregory
This works.  Thanks.

Gregory A. Graves, Lead Scientist
Everglades REstoration COoordination and VERification (RECOVER) 
Wetland Watershed Sciences / Restoration Sciences Department
South Florida Water Management District
Phones:  DESK: 561 / 682 - 2429 
       CELL:  561 / 719 - 8157


-Original Message-
From: Phil Spector [mailto:spec...@stat.berkeley.edu] 
Sent: Thursday, February 03, 2011 12:41 PM
To: Graves, Gregory
Cc: r-help@r-project.org; Goodman, Patricia; Gorman, Patricia
Subject: Re: [R] tapply output as a dataframe

Try
   as.data.frame(as.table(a))

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Thu, 3 Feb 2011, Graves, Gregory wrote:

 On Mon, Apr 13, 2009 at 12:41 PM, Dan Dube ddube-at-advisen.com wrote:

 i use tapply and by often, but i always end up banging my head against
 the wall with the output.

 The proposed solution of Dan's problem posted on R-help was:

 do.call(rbind,a)

 When I use this 'solution' I get 'ERROR:  second argument must be a list'.  
 So head on wall continues.

 My tapply output is generated as follows:

 a=tapply(value,list(sampling.date,station.code),mean)

 which gives me this (in part):

                A     B     C     D     E     F     G     H     I     J      K
 1/15/2008  0.004 0.027 0.019 0.015 0.035 0.022 0.007 0.038 0.042 0.045 0.0350
 1/15/2009  0.027 0.027 0.031 0.015 0.008 0.021 0.007 0.027 0.026 0.029 0.0210
 1/15/2010  0.016 0.020 0.015 0.022 0.015 0.013 0.007 0.014 0.019 0.019 0.0180
 10/15/2007 0.052 0.051 0.032 0.024 0.017 0.044 0.015 0.058 0.063 0.061 0.0640
 10/15/2008 0.042 0.054 0.030 0.017 0.024 0.030 0.019 0.044 0.047 0.051 0.0390
 10/15/2009 0.047 0.035 0.031 0.020 0.012 0.039 0.019 0.051 0.055 0.054 0.0350

 The only way I can figure out how to resolve this, such that I can, for 
 example, plot station A against date, is to export the tapply output as a 
 csv, and then reimport.

 Suggestions?  I couldn't find a solution to this likely SIMPLE problem in 
 Crawley or multiple searches of R help.

 Gregory A. Graves, Lead Scientist
 Everglades REstoration COoordination and VERification (RECOVER)
 Wetland Watershed Sciences / Restoration Sciences Department
 South Florida Water Management District
 Phones:  DESK: 561 / 682 - 2429
        CELL:  561 / 719 - 8157



Re: [R] tapply output

2010-10-07 Thread Peter Ehlers

On 2010-10-06 13:24, Erik Iverson wrote:

Hello,

You can use ddply from the very useful plyr package to do this.
There must be a way using base R functions, but plyr is
worth looking into in my opinion.

install.packages("plyr")
library(plyr)
ddply(myData, .(class, group, name), function(x) mean(x$height))

  class group name   V1
1     0     A  Tom 62.5
2     0     B Jane 58.5
3     1     A Enzo 66.5
4     1     B Mary 70.5


Or use summarize:

   ddply(myData, .(class, group, name), summarize, mht = mean(height))

  -Peter Ehlers



Geoffrey Smith wrote:

Hello, I am having trouble getting the output from the tapply function
formatted so that it can be made into a nice table.  Below is my question
written in R code.  Does anyone have any suggestions?  Thank you.  Geoff

#Input the data;
name <- c('Tom', 'Tom', 'Jane', 'Jane', 'Enzo', 'Enzo', 'Mary', 'Mary');
year <- c(2008, 2009, 2008, 2009, 2008, 2009, 2008, 2009);
group <- c('A', 'A', 'B', 'B', 'A', 'A', 'B', 'B');
class <- c(0, 0, 0, 0, 1, 1, 1, 1);
height <- c(62, 63, 59, 58, 67, 66, 70, 71);

#Combine the data into a data frame;
myData <- data.frame(name, year, group, class, height);
myData;

#Calculate the mean of height by class, group, and name;
tapply(myData$height, data.frame(myData$class, myData$group, myData$name),
mean);

#The raw output from the tapply function is fine, but I would;
#really like the output to look like this;
#  class   group   name   mean
#    0       A     Tom    62.5
#    0       B     Jane   58.5
#    1       A     Enzo   66.5
#    1       B     Mary   70.5





Re: [R] tapply output

2010-10-07 Thread jim holtman
You can also use sqldf:


 require(sqldf)
 sqldf("select class, `group`, name, avg(height)
+ from myData
+ group by class, 'group', name")
  class group name avg(height)
1     0     B Jane        58.5
2     0     A  Tom        62.5
3     1     A Enzo        66.5
4     1     B Mary        70.5



On Thu, Oct 7, 2010 at 4:49 AM, Peter Ehlers ehl...@ucalgary.ca wrote:
 On 2010-10-06 13:24, Erik Iverson wrote:

 Hello,

 You can use ddply from the very useful plyr package to do this.
 There must be a way using base R functions, but plyr is
 worth looking into in my opinion.

     install.packages("plyr")
    library(plyr)
    ddply(myData, .(class, group, name), function(x) mean(x$height))

    class group name   V1
 1     0     A  Tom 62.5
 2     0     B Jane 58.5
 3     1     A Enzo 66.5
 4     1     B Mary 70.5

 Or use summarize:

    ddply(myData, .(class, group, name), summarize, mht = mean(height))

  -Peter Ehlers


 Geoffrey Smith wrote:

 Hello, I am having trouble getting the output from the tapply function
 formatted so that it can be made into a nice table.  Below is my question
 written in R code.  Does anyone have any suggestions?  Thank you.  Geoff

 #Input the data;
 name <- c('Tom', 'Tom', 'Jane', 'Jane', 'Enzo', 'Enzo', 'Mary', 'Mary');
 year <- c(2008, 2009, 2008, 2009, 2008, 2009, 2008, 2009);
 group <- c('A', 'A', 'B', 'B', 'A', 'A', 'B', 'B');
 class <- c(0, 0, 0, 0, 1, 1, 1, 1);
 height <- c(62, 63, 59, 58, 67, 66, 70, 71);

 #Combine the data into a data frame;
 myData <- data.frame(name, year, group, class, height);
 myData;

 #Calculate the mean of height by class, group, and name;
 tapply(myData$height, data.frame(myData$class, myData$group,
 myData$name),
 mean);

 #The raw output from the tapply function is fine, but I would;
 #really like the output to look like this;
 #  class   group     name     mean
 #    0       A            Tom        62.5
 #    0       B            Jane        58.5
 #    1       A            Enzo       66.5
 #    1       B            Mary       70.5






-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



[R] tapply output

2010-10-06 Thread Geoffrey Smith
Hello, I am having trouble getting the output from the tapply function
formatted so that it can be made into a nice table.  Below is my question
written in R code.  Does anyone have any suggestions?  Thank you.  Geoff

#Input the data;
name <- c('Tom', 'Tom', 'Jane', 'Jane', 'Enzo', 'Enzo', 'Mary', 'Mary');
year <- c(2008, 2009, 2008, 2009, 2008, 2009, 2008, 2009);
group <- c('A', 'A', 'B', 'B', 'A', 'A', 'B', 'B');
class <- c(0, 0, 0, 0, 1, 1, 1, 1);
height <- c(62, 63, 59, 58, 67, 66, 70, 71);

#Combine the data into a data frame;
myData <- data.frame(name, year, group, class, height);
myData;

#Calculate the mean of height by class, group, and name;
tapply(myData$height, data.frame(myData$class, myData$group, myData$name),
mean);

#The raw output from the tapply function is fine, but I would;
#really like the output to look like this;
#  class   group   name   mean
#    0       A     Tom    62.5
#    0       B     Jane   58.5
#    1       A     Enzo   66.5
#    1       B     Mary   70.5

-- 
Geoffrey Smith
Visiting Assistant Professor
Department of Finance
W. P. Carey School of Business
Arizona State University



Re: [R] tapply output

2010-10-06 Thread Henrique Dallazuanna
Try this:

aggregate(height ~ class + group + name, data = myData, FUN = mean)
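
A minimal, self-contained sketch of this call on the data from the original post. The renaming and sorting steps, and the object name res, are additions for illustration only; they are meant to show one way of getting the column called "mean" and the row order requested in the question.

```r
# Data from the original post, built directly as a data frame
myData <- data.frame(
  name   = c("Tom", "Tom", "Jane", "Jane", "Enzo", "Enzo", "Mary", "Mary"),
  year   = rep(c(2008, 2009), 4),
  group  = c("A", "A", "B", "B", "A", "A", "B", "B"),
  class  = c(0, 0, 0, 0, 1, 1, 1, 1),
  height = c(62, 63, 59, 58, 67, 66, 70, 71)
)

# One row per combination of class, group and name that occurs in the data
res <- aggregate(height ~ class + group + name, data = myData, FUN = mean)

# Rename the result column and sort to match the layout asked for
names(res)[names(res) == "height"] <- "mean"
res <- res[order(res$class, res$group), ]
print(res)
```

Unlike tapply(), aggregate() returns a data frame and only lists the combinations that actually occur, so there are no NA cells to filter out.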

On Wed, Oct 6, 2010 at 4:13 PM, Geoffrey Smith g...@asu.edu wrote:

 Hello, I am having trouble getting the output from the tapply function
 formatted so that it can be made into a nice table.  Below is my question
 written in R code.  Does anyone have any suggestions?  Thank you.  Geoff

 #Input the data;
 name <- c('Tom', 'Tom', 'Jane', 'Jane', 'Enzo', 'Enzo', 'Mary', 'Mary');
 year <- c(2008, 2009, 2008, 2009, 2008, 2009, 2008, 2009);
 group <- c('A', 'A', 'B', 'B', 'A', 'A', 'B', 'B');
 class <- c(0, 0, 0, 0, 1, 1, 1, 1);
 height <- c(62, 63, 59, 58, 67, 66, 70, 71);

 #Combine the data into a data frame;
 myData <- data.frame(name, year, group, class, height);
 myData;

 #Calculate the mean of height by class, group, and name;
 tapply(myData$height, data.frame(myData$class, myData$group, myData$name),
 mean);

 #The raw output from the tapply function is fine, but I would;
 #really like the output to look like this;
 #  class   group   name   mean
 #    0       A     Tom    62.5
 #    0       B     Jane   58.5
 #    1       A     Enzo   66.5
 #    1       B     Mary   70.5

 --
 Geoffrey Smith
 Visiting Assistant Professor
 Department of Finance
 W. P. Carey School of Business
 Arizona State University





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] tapply output

2010-10-06 Thread Erik Iverson

Hello,

You can use ddply from the very useful plyr package to do this.
There must be a way using base R functions, but plyr is
worth looking into in my opinion.

 install.packages("plyr")
 library(plyr)
 ddply(myData, .(class, group, name), function(x) mean(x$height))

  class group name   V1
1     0     A  Tom 62.5
2     0     B Jane 58.5
3     1     A Enzo 66.5
4     1     B Mary 70.5

Geoffrey Smith wrote:

Hello, I am having trouble getting the output from the tapply function
formatted so that it can be made into a nice table.  Below is my question
written in R code.  Does anyone have any suggestions?  Thank you.  Geoff

#Input the data;
name <- c('Tom', 'Tom', 'Jane', 'Jane', 'Enzo', 'Enzo', 'Mary', 'Mary');
year <- c(2008, 2009, 2008, 2009, 2008, 2009, 2008, 2009);
group <- c('A', 'A', 'B', 'B', 'A', 'A', 'B', 'B');
class <- c(0, 0, 0, 0, 1, 1, 1, 1);
height <- c(62, 63, 59, 58, 67, 66, 70, 71);

#Combine the data into a data frame;
myData <- data.frame(name, year, group, class, height);
myData;

#Calculate the mean of height by class, group, and name;
tapply(myData$height, data.frame(myData$class, myData$group, myData$name),
mean);

#The raw output from the tapply function is fine, but I would;
#really like the output to look like this;
#  class   group   name   mean
#    0       A     Tom    62.5
#    0       B     Jane   58.5
#    1       A     Enzo   66.5
#    1       B     Mary   70.5





Re: [R] tapply output

2010-10-06 Thread Phil Spector

Geoffrey -
   The output you want is exactly what the aggregate() function
provides:


aggregate(myData$height, myData[c('class','group','name')],mean)

  class group name    x
1     1     A Enzo 66.5
2     0     B Jane 58.5
3     1     B Mary 70.5
4     0     A  Tom 62.5

It should be mentioned that converting tapply's output to this form
isn't too difficult:


tt = tapply(myData$height, data.frame(myData$class, myData$group, myData$name),
            mean)
answer = as.data.frame(as.table(tt))
subset(answer,!is.na(Freq))

   myData.class myData.group myData.name Freq
2             1            A        Enzo 66.5
7             0            B        Jane 58.5
12            1            B        Mary 70.5
13            0            A         Tom 62.5

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu
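
A small variation on the conversion above, offered as a sketch rather than as part of the original reply: indexing by a sub-data-frame with named columns and passing responseName (an argument of as.data.frame for tables) gives the column names asked for in the question directly. The object names tt and answer follow the reply; everything else is illustrative.

```r
# Data from the original post
myData <- data.frame(
  name   = c("Tom", "Tom", "Jane", "Jane", "Enzo", "Enzo", "Mary", "Mary"),
  year   = rep(c(2008, 2009), 4),
  group  = c("A", "A", "B", "B", "A", "A", "B", "B"),
  class  = c(0, 0, 0, 0, 1, 1, 1, 1),
  height = c(62, 63, 59, 58, 67, 66, 70, 71)
)

# Named index columns become the names of the array's dimensions
tt <- tapply(myData$height, myData[c("class", "group", "name")], mean)

# Those dimension names carry over as column names; responseName replaces "Freq"
answer <- as.data.frame(as.table(tt), responseName = "mean")

# Drop the combinations that never occur in the data (their mean is NA)
answer <- answer[!is.na(answer$mean), ]
print(answer)
```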





On Wed, 6 Oct 2010, Geoffrey Smith wrote:


Hello, I am having trouble getting the output from the tapply function
formatted so that it can be made into a nice table.  Below is my question
written in R code.  Does anyone have any suggestions?  Thank you.  Geoff

#Input the data;
name <- c('Tom', 'Tom', 'Jane', 'Jane', 'Enzo', 'Enzo', 'Mary', 'Mary');
year <- c(2008, 2009, 2008, 2009, 2008, 2009, 2008, 2009);
group <- c('A', 'A', 'B', 'B', 'A', 'A', 'B', 'B');
class <- c(0, 0, 0, 0, 1, 1, 1, 1);
height <- c(62, 63, 59, 58, 67, 66, 70, 71);

#Combine the data into a data frame;
myData <- data.frame(name, year, group, class, height);
myData;

#Calculate the mean of height by class, group, and name;
tapply(myData$height, data.frame(myData$class, myData$group, myData$name),
mean);

#The raw output from the tapply function is fine, but I would;
#really like the output to look like this;
#  class   group   name   mean
#    0       A     Tom    62.5
#    0       B     Jane   58.5
#    1       A     Enzo   66.5
#    1       B     Mary   70.5

--
Geoffrey Smith
Visiting Assistant Professor
Department of Finance
W. P. Carey School of Business
Arizona State University



Re: [R] tapply help

2010-06-06 Thread Mark Ebbert
That was very clever. Worked perfectly, thanks!

And thanks to everyone else who provided feedback.



Re: [R] tapply help

2010-06-05 Thread jim holtman
Is this what you are looking for:

 set.seed(1)
 # create range for each possible class
 # 'name' the values so you can use them in the 'sapply' function
 lows <- c(a=1, b=2, c=3, d=4, e=5)
 highs <- c(a=5, b=6, c=7, d=8, e=9)

 # data values
 vals <- sample(1:10, 100, replace=TRUE)

 # classes
 classes <- sample(letters[1:5], 100, replace=TRUE)

 # split the data so that you retain the 'classes' name
 x.split <- split(vals, classes)
 percentage <- sapply(names(x.split), function(.class){
     # compute the percentage based on 'class'
     sum((x.split[[.class]] >= lows[.class]) &
         (x.split[[.class]] <= highs[.class])) /
         length(x.split[[.class]]) * 100
 })
 percentage
        a        b        c        d        e
     50.0     45.0     62.5 54.54545     55.6
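
A possible alternative, offered only as a sketch and not as what was posted above: because lows and highs are named by class, each observation's own bounds can be looked up by indexing with the classes vector, after which tapply() needs no index counter at all. The variable in_range is a name introduced here for illustration.

```r
set.seed(1)
lows  <- c(a = 1, b = 2, c = 3, d = 4, e = 5)
highs <- c(a = 5, b = 6, c = 7, d = 8, e = 9)

vals    <- sample(1:10, 100, replace = TRUE)
classes <- sample(letters[1:5], 100, replace = TRUE)

# lows[classes] picks, for every observation, the bound belonging to its class
in_range <- vals >= lows[classes] & vals <= highs[classes]

# The mean of a logical vector is the proportion of TRUE values
percentage <- tapply(in_range, classes, mean) * 100
percentage
```

This relies on the bounds being named; with unnamed vectors as in the original question, one would first have to attach names, for example with names(lows) <- letters[1:5].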



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



[R] tapply help

2010-06-04 Thread Mark Ebbert
Dear R gurus,

I am trying to perform what I believe will be a pretty simple task, but I'm
struggling to figure out how to do it. I have two vectors of the same length;
the first is numeric and the second is a factor. I understand that tapply is
perfect for applying a function to the numeric vector by subsets of the factors
in the second vector. My issue is trying to make use of two other vectors
within the custom function I've written for tapply. The two other vectors are a
high and low value for each subset I am breaking my data into, and I want to
calculate the percentage of data points that fall into each respective range. I
will attempt to provide a coherent example:

# create range for each possible class
lows <- c(1,2,3,4,5)
highs <- c(5,6,7,8,9)

# data values
vals <- sample(1:10,100,replace=T)

#classes
classes <- sample(letters[1:5],100,replace=T)

# Try to calculate percentage of values that fall
# into the respective range for the given class.
percentages <- tapply(vals,classes,
        function(i){
                # I don't know how to actually keep an index count in tapply,
                # but I'm guessing there's a better way.
                length(i[i >= lows[index] & i <= highs[index]])/length(i)
        })

I really appreciate any help.

ME


Re: [R] tapply function with NA

2010-05-10 Thread Gabor Grothendieck
See ?colSums
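
A minimal sketch of what that help page leads to, assuming a small stand-in matrix (the values below are illustrative, not the poster's data): colSums() takes na.rm directly, and apply() passes extra arguments such as na.rm through to sum().

```r
# Stand-in matrix with one all-NA row, loosely mirroring the structure of m
m <- matrix(c(1, 2, NA, 4,
              5, 6, NA, 8),
            nrow = 4,
            dimnames = list(NULL, c("X4.20.2010", "X4.19.2010")))

# Column sums, ignoring missing values
colSums(m, na.rm = TRUE)

# Same result via apply(): arguments after the function are forwarded to sum()
apply(m, 2, sum, na.rm = TRUE)
```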

On Mon, May 10, 2010 at 12:44 AM, vincent.deluard
vincent.delu...@trimtabs.com wrote:

 Hi R users,

 I have a matrix m of the type:

 m
       X4.20.2010 X4.19.2010   X4.16.2010
 [1,]  0.008319468 0. -0.008250825
 [2,]  0.005574136 0.01816118  0.073081608
 [3,] -0.047830688 0.01612903 -0.030239833
 [4,]           NA         NA           NA
 [5,]  0.008746356 0.02848576 -0.025566107
 [6,] -0.007990868 0. -0.02667

 I want to get the sum of each column. Normally I would do:

 apply(m,2,sum)

 but I get:

 apply(m,2,sum)
 X4.20.2010 X4.19.2010 X4.16.2010
        NA         NA         NA

 This is because of the presence of NA in m. How do you the equivalent of
 sum(m[1:6,1],na.rm=TRUE)
 using apply?

 Many thanks!
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/tapply-function-with-NA-tp2164930p2164930.html
 Sent from the R help mailing list archive at Nabble.com.





[R] tapply function with NA

2010-05-09 Thread vincent.deluard

Hi R users,

I have a matrix m of the type:

m
       X4.20.2010 X4.19.2010   X4.16.2010
[1,]  0.008319468 0. -0.008250825
[2,]  0.005574136 0.01816118  0.073081608
[3,] -0.047830688 0.01612903 -0.030239833
[4,]           NA         NA           NA
[5,]  0.008746356 0.02848576 -0.025566107
[6,] -0.007990868 0. -0.02667

I want to get the sum of each column. Normally I would do:

apply(m,2,sum)

but I get:

 apply(m,2,sum)
X4.20.2010 X4.19.2010 X4.16.2010
        NA         NA         NA

This is because of the presence of NA in m. How do you do the equivalent of
 sum(m[1:6,1],na.rm=TRUE)
using apply?

Many thanks!
-- 
View this message in context: 
http://r.789695.n4.nabble.com/tapply-function-with-NA-tp2164930p2164930.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] tapply function with NA

2010-05-09 Thread RICHARD M. HEIBERGER
It is exactly the same

tmp <- matrix(1:24,6,4)
tmp[4,] <- NA
tmp
apply(tmp, 2, sum, na.rm=TRUE)



Re: [R] Tapply.

2010-04-28 Thread Petr PIKAL
Hi

steven mosher mosherste...@gmail.com wrote on 27.04.2010 17:04:04:

 Thanks,

 I had been wondering what Drop did. That makes it more clear.

 While I have code that loops and does the problem correctly, I wanted to
 do things the R way and be fast and terse. hehe.

 So:
 ID   dy  jan  ...
 11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
 11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA

 in words: for each id, for each year, return
  the max of jan, feb, ... over d
  the min of jan, feb, ... over d
  the mean of jan, feb, ... over d
  the (max+min)/2 of jan, feb, ... over d
  the count of d for jan, feb, ...
  the results of a function called with all elements of this id

Something like

aggregate(data[, months], list(id, d), my.summary)

where my.summary is a function that computes all the required values and
returns them in an appropriate form.

In words: split the selected data into chunks according to the list of
indices, apply the required function to each chunk, and return the result.
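
As a hedged illustration of that idea, the sketch below uses a few rows and three month columns taken from the data posted in this thread. The body of my.summary is an assumption about what "all required values" could mean (max, min, mean, mid-range and count of non-missing values), and grouping by Id and Year rather than by d is likewise an assumption based on the wording of the request.

```r
# A few rows from the posted data (three month columns only)
DF <- data.frame(
  Id   = rep(11264402000, 6),
  D    = c(0, 1, 0, 1, 0, 1),
  Year = c(1981, 1981, 1982, 1982, 1983, 1983),
  Jan  = c(NA, NA, 236, 236, NA, NA),
  Feb  = c(NA, 251, 237, NA, 247, 247),
  Mar  = c(243, NA, 242, NA, NA, NA)
)

# Hypothetical summary function: returns a named vector per chunk
my.summary <- function(x) {
  ok <- x[!is.na(x)]
  if (length(ok) == 0) {
    # No observations in this chunk: report NA for the statistics, 0 for the count
    return(c(max = NA_real_, min = NA_real_, mean = NA_real_,
             mid = NA_real_, n = 0))
  }
  c(max = max(ok), min = min(ok), mean = mean(ok),
    mid = (max(ok) + min(ok)) / 2, n = length(ok))
}

months <- c("Jan", "Feb", "Mar")
res <- aggregate(DF[, months], list(Id = DF$Id, Year = DF$Year), my.summary)

# Each month becomes a matrix column holding the five summaries
res$Feb
```

Because my.summary returns a vector of length five, aggregate() stores each month as a matrix inside the result; individual statistics can be pulled out with, for example, res$Feb[, "mean"].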

Regards
Petr




Re: [R] Tapply.

2010-04-27 Thread steven mosher
Thanks dennis.

Is there a book on R you could recommend?



On Mon, Apr 26, 2010 at 7:12 PM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:


  On Mon, Apr 26, 2010 at 8:01 AM, steven mosher mosherste...@gmail.com wrote:
  Thanks,

   I was trying to stick with the base package and figure out how the base
 routines worked.

 If you want to use base functions, then here's a solution with aggregate
 (the Id column was removed first):

  with(DF, aggregate(DF[, -2], list(Year = Year), FUN = mean, na.rm = TRUE))
   Year    D Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
 1 1980 1.00 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
 2 1981 0.50 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
 3 1982 0.50 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
 4 1983 0.50 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
 5 1986 0.00 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
 6 1987 1.33 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
 7 1988 1.33 238 246 249 246 244 213 212 224 232 238 232 230
 8 1989 1.33 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238

 The problem with tapply() is that the function has to be called repeatedly,
 once for each column you want to summarize. You could do it in a loop:

  res <- matrix(NA, 8, 14)
  res[, 1] <- unique(DF$Year)
  res[, 2] <- with(DF, tapply(D, Year, mean, na.rm = TRUE))
  for(j in 3:14) res[, j] <- tapply(DF[, j], DF$Year, mean, na.rm = TRUE)
  res
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
 [1,] 1980 1.00  NaN  NaN  NaN  NaN  NaN  212  203   209   228   237   NaN   NaN
 [2,] 1981 0.50  NaN  251  243  246  241  NaN  NaN   NaN   230   NaN   231   245
 [3,] 1982 0.50  236  237  242  240  242  205  199   NaN   NaN   NaN   NaN   NaN
 [4,] 1983 0.50  NaN  247  NaN  NaN  NaN  NaN  NaN   205   NaN   225   NaN   NaN
 [5,] 1986 0.00  NaN  NaN  NaN  240  NaN  NaN  NaN   213   NaN   NaN   NaN   NaN
 [6,] 1987 1.33  241  NaN  NaN  NaN  NaN  218  NaN   NaN   235   243   240   NaN
 [7,] 1988 1.33  238  246  249  246  244  213  212   224   232   238   232   230
 [8,] 1989 1.33  232  233  238  239  231  NaN  215   NaN   NaN   NaN   NaN   238

 but it's not the most efficient way to do things.

 Essentially, this approach conforms to the 'split-apply-combine' strategy,
 which is more efficiently implemented in functions like aggregate() or in
 packages such as doBy, plyr, reshape and data.table, some of which were
 mentioned earlier by Petr Pikal.
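
 To make the three steps of 'split-apply-combine' visible, here is a minimal base R sketch on a few rows of the posted data (only the D, Jan, Feb and Mar columns are kept). The names pieces, means and res are introduced here for illustration; this is not the code from the message above.

```r
# A few rows from the posted data
DF <- data.frame(
  Year = c(1981, 1981, 1982, 1982, 1983, 1983),
  D    = c(0, 1, 0, 1, 0, 1),
  Jan  = c(NA, NA, 236, 236, NA, NA),
  Feb  = c(NA, 251, 237, NA, 247, 247),
  Mar  = c(243, NA, 242, NA, NA, NA)
)

# split: one small data frame per year
pieces <- split(DF[, c("D", "Jan", "Feb", "Mar")], DF$Year)

# apply: column means within each piece, ignoring NA
means <- lapply(pieces, colMeans, na.rm = TRUE)

# combine: stack the per-year vectors into one matrix, rows named by year
res <- do.call(rbind, means)
res
```

 Columns with no observed value in a given year come out as NaN here too, just as in the outputs shown above.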

 HTH,
 Dennis


 On Mon, Apr 26, 2010 at 8:01 AM, steven mosher mosherste...@gmail.com wrote:

 Thanks,

   I was trying to stick with the base package and figure out how the base
 routines worked. I looked at plyr and it was very appealing. I guess I'll
 give in and use it.

 On Mon, Apr 26, 2010 at 2:33 AM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:

 Use of ddply() in the plyr package appears to work.

 library(plyr)
 ddply(df[, -1], .(Year), colwise(mean), na.rm = TRUE)

       D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
 1 1.00 1980 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
 2 0.50 1981 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
 3 0.50 1982 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
 4 0.50 1983 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
 5 0.00 1986 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
 6 1.33 1987 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
 7 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
 8 1.33 1989 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238

 Replace the NaNs with NAs and that should do it
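
 One way that replacement could be done is sketched below, assuming the summary is held in a data frame called res (a stand-in built here with illustrative values). is.nan() has no method for whole data frames, so the sketch works column by column.

```r
# Stand-in for a summary table containing NaN (illustrative values)
res <- data.frame(Year = c(1980, 1981),
                  Jan  = c(NaN, NaN),
                  Feb  = c(NaN, 251))

# Replace NaN by NA in every column; res[] keeps the data frame structure
res[] <- lapply(res, function(x) replace(x, is.nan(x), NA))
res
```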

 HTH,
 Dennis


Re: [R] Tapply.

2010-04-27 Thread Petr PIKAL
Hi
r-help-boun...@r-project.org wrote on 26.04.2010 17:05:54:

 I guess my problem was seeing a bunch of examples where they pulled a
 variable from a dataframe..

   tapply(df$data, index=list(..

df$data returns a vector, as does e.g. df[,5], unless you use the
drop=FALSE option.

 and I assumed that the df$data was just generalizable to a collection of
 vectors, a vector of vectors being a vector

df[,1:15] is not a vector of vectors. R can sometimes give you nasty
surprises with object types and modes, but changing the type of an object
merely by selecting some part of it would be quite problematic.

see

str(df$data)
str(df[, 1])
str(df[,1, drop=FALSE])
str(df[,1:15])
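
For readers without the original data at hand, the same checks can be sketched on the small two-column data frame that appears elsewhere in this thread; the names v1, v2, d1 and d2 are introduced here for illustration.

```r
df <- data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5))

v1 <- df$a                    # a plain numeric vector
v2 <- df[, 1]                 # also a vector: a single column is dropped to one
d1 <- df[, 1, drop = FALSE]   # stays a one-column data frame
d2 <- df[, 1:2]               # several columns: always a data frame

str(v1)
str(v2)
str(d1)
str(d2)

length(df)    # number of columns, because a data frame is a list
length(v1)    # number of elements in the vector
```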

Regards
Petr



 
 Thanks.
 
 On Mon, Apr 26, 2010 at 2:43 AM, Petr PIKAL petr.pi...@precheza.cz wrote:

  Hi

  steven mosher mosherste...@gmail.com wrote on 26.04.2010 10:21:37:

   That fails:

   The manual says:

   tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

   Arguments

   X      an atomic object, typically a vector.

   INDEX  list of factors, each of same length as X. The elements are
          coerced to factors by as.factor.

   my error says:

   Error in tapply(DF[, 1:15], DF$Year, mean, na.rm = T) :
     arguments must have same length

   The issue that I have is I dont understand what the requirements for the
   list of factors are. In my example DF$Years is a sequence of
   years..1979,1980,1982,1983, 1987.. like that with missing years: so when
   the manual say: list of factors each the same length as X? what does that
   mean? I could have a DF with 20 rows and only two different years. or 20
   rows and 20 different years.

   Suppose:

   a <- c(1,2,3,4)
   b <- c(2,3,4,5)
   df = data.frame(a,b)
   length(df)

  A data frame is neither a vector nor atomic but a list, hence length(df)
  gives you the number of columns. It is similar to the length of a list:

   lll <- list(a=1, b=2, c=3)
   length(lll)
   [1] 3

  If you accept that the first argument of tapply has to be a vector, you
  cannot put a data frame there.

  Next, the second argument has to be a list of factors, so you can put
  several factors there, each of the same length as the first argument (a
  vector).

  If you want to perform an aggregating operation on a whole data frame you
  should consider

  ?by or ?aggregate

  Other options are the plyr or doBy packages.

  The syntax for aggregate is quite similar to tapply, only the first
  argument can be a data frame.
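
  The similarity in syntax can be sketched side by side on a toy data frame (the year column and its values are assumptions made up for this illustration): tapply() summarises one vector by a grouping vector of the same length, while aggregate() accepts several columns at once with the grouping supplied as a list.

```r
df <- data.frame(year = c(1980, 1980, 1981, 1981),
                 a    = c(1, 2, 3, 4),
                 b    = c(2, 3, 4, 5))

# tapply: X is a single vector, INDEX has the same length as X
by_year_a <- tapply(df$a, df$year, mean)

# aggregate: first argument may be a data frame, grouping given as a list
by_year_all <- aggregate(df[, c("a", "b")], list(year = df$year), mean)

by_year_a
by_year_all
```

  The a column of the aggregate() result holds the same numbers as the tapply() result, one per year.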
 
  Regards
  Petr
 
 
  
   The length of DF is 2.
   Does that mean the list of factors, each of same length as X, would have
   to be 2? that doesnt seem to make sense.

   On Mon, Apr 26, 2010 at 12:26 AM, Petr PIKAL petr.pi...@precheza.cz wrote:
   Hi

   r-help-boun...@r-project.org wrote on 26.04.2010 06:52:55:

    Having some difficulties with understanding how tapply works and getting
    return values I expect
   
    Data: dataframe. DF  DF$Id $D $Year...

             Id D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
    11264402000 1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
    11264402000 0 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
    11264402000 1 1981  NA 251  NA 248 241  NA  NA  NA 235  NA  NA 245
    11264402000 0 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
    11264402000 1 1982 236  NA  NA 240 242  NA  NA  NA  NA  NA  NA  NA
    11264402000 0 1983  NA 247  NA  NA  NA  NA  NA 205  NA  NA  NA  NA
    11264402000 1 1983  NA 247  NA  NA  NA  NA  NA  NA  NA 225  NA  NA
    11264402000 0 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
    11264402000 0 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
    11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
    11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
    11264402000 0 1988 238 246 249  NA 244 213 212 224 232 238 232 230
    11264402000 1 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
    11264402000 3 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
    11264402000 0 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
    11264402000 1 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
    11264402000 3 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238

    and the result should be a dataframe of column means by year with the
    variable D dropped (or kept doesnt matter)

    11264402000  1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
    11264402000 .5 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
    11264402000 .5 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
    11264402000 .5 1983  NA 247  NA  NA  NA  NA  NA 205  NA 225  NA  NA
 112644020001  1986  

Re: [R] Tapply.

2010-04-27 Thread Petr PIKAL
Hi

If you are not satisfied with the introductory R documents distributed with
the R installation, you can consider "Introductory Statistics with R" by
P. Dalgaard for beginners, and maybe "Modern Applied Statistics with S" by
W. N. Venables and B. D. Ripley, which is a bit outdated and applies perhaps
a little more to S, but is still worth reading.

Regards
Petr



Re: [R] Tapply.

2010-04-27 Thread steven mosher
Thanks,

 I had been wondering what Drop did. That makes it more clear.

While I have code that loops and does the problem correctly, I wanted to
do things the R way and be fast and terse. hehe.

So:
ID   dy  jan  ...
11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA

in words: for each id, for each year, return
 the max of jan, feb, ... over d
 the min of jan, feb, ... over d
 the mean of jan, feb, ... over d
 the (max+min)/2 of jan, feb, ... over d
 the count of d for jan, feb, ...
 the results of a function called with all elements of this id

Anyway, your kind attention has been greatly appreciated.







Re: [R] Tapply.

2010-04-26 Thread steven mosher
I've tried both mean and colMeans.

I did succeed with one attempt using mean; however, if a year has only one
value and it is NA, then I get NaN (which I can replace). I'll keep trying.
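
The NaN appears because, with na.rm = TRUE, a group whose only values are NA
has nothing left to average. A minimal sketch of that behaviour follows; the
helper name `mean_or_na` and the sample values are made up for illustration.

```r
# With na.rm = TRUE an all-NA group becomes an empty vector, and the
# mean of an empty numeric vector is NaN.
m <- mean(c(NA_real_), na.rm = TRUE)
is.nan(m)   # TRUE

# Hypothetical helper: return NA instead of NaN when nothing is left.
mean_or_na <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0) NA_real_ else mean(v)
}

vals  <- c(240, 244, NA, NA)        # made-up values
years <- c(1986, 1986, 1987, 1987)  # 1987 has only NA
res <- tapply(vals, years, mean_or_na)
```

Here the 1987 entry of `res` is NA rather than NaN, so no replacement step is
needed afterwards.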




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tapply.

2010-04-26 Thread steven mosher
That fails:

The manual says:

tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

Arguments

X      an atomic object, typically a vector.
INDEX  list of factors, each of same length as X. The elements are coerced
       to factors by as.factor.

my error says:

Error in tapply(DF[, 1:15], DF$Year, mean, na.rm = T) :
  arguments must have same length

The issue that I have is that I don't understand what the requirements for
the list of factors are. In my example DF$Year is a sequence of years
(1979, 1980, 1982, 1983, 1987, ...) like that, with missing years. So when
the manual says "list of factors, each of same length as X", what does that
mean? I could have a DF with 20 rows and only two different years, or 20 rows
and 20 different years.

Suppose:

a <- c(1,2,3,4)
b <- c(2,3,4,5)
df = data.frame(a,b)
length(df)

The length of df is 2.
Does that mean the list of factors, "each of same length as X", would have
to be of length 2? That doesn't seem to make sense.







Re: [R] Tapply.

2010-04-26 Thread Dennis Murphy
Hi:

Use of ddply() in the plyr package appears to work.

library(plyr)
ddply(df[, -1], .(Year), colwise(mean), na.rm = TRUE)
     D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 1.00 1980 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
2 0.50 1981 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
3 0.50 1982 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
4 0.50 1983 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
5 0.00 1986 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
6 1.33 1987 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
7 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
8 1.33 1989 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238

Replace the NaNs with NAs and that should do it
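
One possible way to do that replacement in base R is sketched below. The
data frame `res` is a made-up stand-in for the aggregated result, not the
actual output above.

```r
# Hypothetical stand-in for an aggregated result containing NaN.
res <- data.frame(Year = c(1980, 1981),
                  Jan  = c(NaN, 236),
                  Feb  = c(251, NaN))

# is.nan() has no data frame method, so apply it column by column.
# Assigning into res[] keeps the data frame structure and column names.
res[] <- lapply(res, function(col) replace(col, is.nan(col), NA))
```

Assigning to `res[]` rather than `res` is a deliberate choice: it keeps the
result a data frame instead of turning it into a plain list.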

HTH,
Dennis



Re: [R] Tapply.

2010-04-26 Thread Petr PIKAL
Hi


steven mosher mosherste...@gmail.com wrote on 26.04.2010 10:21:37:

> That fails:
>
> The manual says:
>
> tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
>
> Arguments
>
> X      an atomic object, typically a vector.
>
> INDEX  list of factors, each of same length as X. The elements are
>        coerced to factors by as.factor.
>
> my error says:
>
> Error in tapply(DF[, 1:15], DF$Year, mean, na.rm = T) :
>   arguments must have same length
>
> The issue that I have is that I don't understand what the requirements
> for the list of factors are. In my example DF$Year is a sequence of years
> (1979, 1980, 1982, 1983, 1987, ...) like that, with missing years. So when
> the manual says "list of factors, each of same length as X", what does
> that mean? I could have a DF with 20 rows and only two different years,
> or 20 rows and 20 different years.
>
> Suppose:
>
> a <- c(1,2,3,4)
> b <- c(2,3,4,5)
> df = data.frame(a,b)
> length(df)

A data frame is neither a vector nor atomic; it is a list, hence length(df)
gives you the number of columns. It is similar to the length of a list:

lll <- list(a=1, b=2, c=3)
length(lll)
[1] 3
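
A short sketch of the distinction, reusing the a and b vectors from the
question; the grouping vector `g` is a hypothetical addition for
illustration.

```r
a <- c(1, 2, 3, 4)
b <- c(2, 3, 4, 5)
df <- data.frame(a, b)

length(df)    # 2: number of columns (list elements)
nrow(df)      # 4: number of rows
length(df$a)  # 4: length of one column, which is what tapply() compares with INDEX

# A grouping vector therefore needs one entry per row, not one per column.
g <- c("x", "x", "y", "y")   # hypothetical grouping
sums <- tapply(df$a, g, sum)
```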


If you accept that the first argument of tapply has to be a vector, you
cannot put a data frame there.

Next, the second argument has to be a list of factors, so you can put
several factors there, each of the same length as the first argument (a vector).
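
As an illustration of that second argument, here is a minimal sketch with
made-up values (the names `temp`, `year` and `d` are assumptions, not columns
taken from the poster's data).

```r
# Made-up values for illustration.
temp <- c(212, 203, 243, 244, 236, 237)
year <- c(1980, 1980, 1981, 1981, 1982, 1982)
d    <- c(0, 1, 0, 1, 0, 1)

# One grouping vector: a named result with one mean per year.
by_year <- tapply(temp, year, mean)

# A list of two grouping vectors: a year-by-d matrix, one cell per combination.
by_year_d <- tapply(temp, list(Year = year, D = d), mean)
dim(by_year_d)   # 3 2
```

Every grouping vector has six entries, the same as `temp`, which is the
"same length as X" requirement from the manual.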

If you want to perform an aggregating operation on a whole data frame, you
should consider

?by or ?aggregate

Other options are the plyr or doBy packages.

The syntax for aggregate is quite similar to tapply, only the first argument
can be a data frame.
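
A rough sketch of both base functions follows, on a made-up data frame that
only resembles the layout in the question (the values and the reduced set of
columns are assumptions).

```r
# Made-up subset resembling the thread's layout.
DF <- data.frame(Year = c(1981, 1981, 1982, 1982),
                 Jan  = c(NA, NA, 236, 236),
                 Feb  = c(NA, 251, 237, NA))

# by() hands each year's rows to FUN as a data frame, so colMeans() works here.
pieces <- by(DF[, c("Jan", "Feb")], DF$Year, colMeans, na.rm = TRUE)
res_by <- do.call(rbind, pieces)   # matrix with one row per year

# aggregate() applies FUN to each column within each group and
# returns a data frame that includes the Year column.
res_ag <- aggregate(DF[, c("Jan", "Feb")], list(Year = DF$Year),
                    FUN = mean, na.rm = TRUE)
```

Note that by() is the place where colMeans fits naturally, because each group
arrives as a data frame rather than as a single vector.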

Regards
Petr 


 

Re: [R] Tapply.

2010-04-26 Thread steven mosher
Thanks,

I was trying to stick with the base package and figure out how the base
routines worked. I looked at plyr and it was very appealing. I guess I'll
give in and use it.



Re: [R] Tapply.

2010-04-26 Thread steven mosher
I guess my problem was seeing a bunch of examples where they pulled a
variable from a data frame:

  tapply(df$data, INDEX=list(..

and I assumed that df$data just generalized to a collection of vectors,
a vector of vectors being a vector.

Thanks.


Re: [R] Tapply.

2010-04-26 Thread Dennis Murphy
Hi:

On Mon, Apr 26, 2010 at 8:01 AM, steven mosher mosherste...@gmail.com wrote:
> Thanks,
>
> I was trying to stick with the base package and figure out how the base
> routines worked.

If you want to use base functions, then here's a solution with aggregate
(the Id column was removed first):

with(DF, aggregate(DF[, -2], list(Year = Year), FUN = mean, na.rm = TRUE))
  Year    D Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 1980 1.00 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
2 1981 0.50 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
3 1982 0.50 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
4 1983 0.50 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
5 1986 0.00 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
6 1987 1.33 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
7 1988 1.33 238 246 249 246 244 213 212 224 232 238 232 230
8 1989 1.33 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238

The problem with tapply() is that the function has to be called separately
on each column you want to summarize. You could do it in a loop:

res <- matrix(NA, 8, 14)
res[, 1] <- unique(DF$Year)
res[, 2] <- with(DF, tapply(D, Year, mean, na.rm = TRUE))
for(j in 3:14) res[, j] <- tapply(DF[, j], DF$Year, mean, na.rm = TRUE)
res
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 1980 1.00  NaN  NaN  NaN  NaN  NaN  212  203   209   228   237   NaN   NaN
[2,] 1981 0.50  NaN  251  243  246  241  NaN  NaN   NaN   230   NaN   231   245
[3,] 1982 0.50  236  237  242  240  242  205  199   NaN   NaN   NaN   NaN   NaN
[4,] 1983 0.50  NaN  247  NaN  NaN  NaN  NaN  NaN   205   NaN   225   NaN   NaN
[5,] 1986 0.00  NaN  NaN  NaN  240  NaN  NaN  NaN   213   NaN   NaN   NaN   NaN
[6,] 1987 1.33  241  NaN  NaN  NaN  NaN  218  NaN   NaN   235   243   240   NaN
[7,] 1988 1.33  238  246  249  246  244  213  212   224   232   238   232   230
[8,] 1989 1.33  232  233  238  239  231  NaN  215   NaN   NaN   NaN   NaN   238

but it's not the most efficient way to do things.

Essentially, this approach conforms to the 'split-apply-combine' strategy,
which is more efficiently implemented in functions like aggregate() or in
packages such as doBy, plyr, reshape and data.table, some of which were
mentioned earlier by Petr Pikal.
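
If one still wants to stay with tapply(), the explicit loop can be written
more compactly by letting sapply() walk over the columns. This is only a
sketch on a made-up data frame with the same column order (D, Year, then the
months); the values and the name `value_cols` are assumptions.

```r
# Made-up subset: D and Year first, then monthly columns as in the thread.
DF <- data.frame(D    = c(1, 0, 1, 0),
                 Year = c(1980, 1981, 1981, 1982),
                 Jan  = c(NA, NA, NA, 236),
                 Feb  = c(NA, NA, 251, 237))

value_cols <- setdiff(names(DF), "Year")

# One tapply() call per column; sapply() binds the per-year results into a
# matrix with one row per year and one column per summarized variable.
res <- sapply(DF[value_cols],
              function(col) tapply(col, DF$Year, mean, na.rm = TRUE))
```

This is still one tapply() call per column, so it is a tidier form of the
loop rather than a faster one.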

HTH,
Dennis


[R] Tapply.

2010-04-25 Thread steven mosher
Having some difficulties with understanding how tapply works and getting
return values I expect

Data: dataframe. DF  DF$Id $D $Year...

 Id          D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
 11264402000 1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
 11264402000 0 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
 11264402000 1 1981  NA 251  NA 248 241  NA  NA  NA 235  NA  NA 245
 11264402000 0 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
 11264402000 1 1982 236  NA  NA 240 242  NA  NA  NA  NA  NA  NA  NA
 11264402000 0 1983  NA 247  NA  NA  NA  NA  NA 205  NA  NA  NA  NA
 11264402000 1 1983  NA 247  NA  NA  NA  NA  NA  NA  NA 225  NA  NA
 11264402000 0 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
 11264402000 0 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
 11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
 11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
 11264402000 0 1988 238 246 249  NA 244 213 212 224 232 238 232 230
 11264402000 1 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
 11264402000 3 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
 11264402000 0 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
 11264402000 1 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
 11264402000 3 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238

and the result should be a dataframe of column means by year  with the
variable D dropped (or kept doesnt matter)

 11264402000 1    1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
 11264402000 .5   1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
 11264402000 .5   1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
 11264402000 .5   1983  NA 247  NA  NA  NA  NA  NA 205  NA 225  NA  NA
 11264402000 1    1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
 11264402000 2    1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
 11264402000 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
 11264402000 1.33 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238

 It would seem that tapply should work:
 result <- tapply( DF[,1:15], DF$Year, colMeans, na.rm=T)

 but I get errors about the length of arguments, which
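
A minimal sketch of one way to get per-year column means, not taken from the thread. tapply() expects an atomic vector as its first argument, and length() of a data frame is its number of columns, which is one plausible source of a length complaint here. The sketch applies tapply() one column at a time and, as an alternative, calls aggregate(). The small DF below is a stand-in built from four of the rows above with only three month columns; `month_cols`, `by_year` and `by_year_df` are names made up for illustration.

```r
# Stand-in for DF: four rows copied from the listing above, three month
# columns only (an assumption made to keep the example short).
DF <- data.frame(
  Id   = 11264402000,
  D    = c(0, 1, 0, 1),
  Year = c(1981, 1981, 1982, 1982),
  Jan  = c(NA, NA, 236, 236),
  Feb  = c(NA, 251, 237, NA),
  Mar  = c(243, NA, 242, NA)
)
month_cols <- c("Jan", "Feb", "Mar")

# tapply() splits one vector at a time, so loop over the month columns.
# The result is a matrix with one row per year and one column per month.
by_year <- sapply(DF[month_cols],
                  function(col) tapply(col, DF$Year, mean, na.rm = TRUE))

# aggregate() does the same split-and-summarise and returns a data frame
# with the grouping variable as its first column.
by_year_df <- aggregate(DF[month_cols], by = list(Year = DF$Year),
                        FUN = mean, na.rm = TRUE)
```

With na.rm = TRUE, a year in which every value of a month is NA comes out as NaN rather than NA.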

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tapply syntax

2010-03-28 Thread Jeff Brown

What is the function set()?  Is that a typo?  When I type ?set I get
nothing, and when I try to evaluate that code R tells me it can't find the
function.
-- 
View this message in context: 
http://n4.nabble.com/tapply-syntax-tp1692503p1694586.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] tapply syntax

2010-03-28 Thread Rolf Turner

On 29/03/2010, at 1:27 PM, Jeff Brown wrote:

 
 What is the function set()?  Is that a typo?  When I type ?set I get
 nothing, and when I try to evaluate that code R tells me it can't find the
 function.


Yeah, it's a typo.  (S)he meant ``subset''.

cheers,

Rolf Turner



Re: [R] tapply syntax

2010-03-28 Thread jim holtman
The message tells you everything - there is no function 'set' in the
workspace you are using.  Did you forget to load a library?  What is the
context in which you are trying to use it?

On Sun, Mar 28, 2010 at 8:27 PM, Jeff Brown dopethatwantsc...@yahoo.com wrote:


 What is the function set()?  Is that a typo?  When I type ?set I get
 nothing, and when I try to evaluate that code R tells me it can't find the
 function.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

[[alternative HTML version deleted]]



Re: [R] tapply syntax

2010-03-28 Thread David Freedman

Sorry - I use many abbreviations and I try to remove them before I post
questions/answers. 'set' is my abbreviation for subset.

david

On 3/28/2010 8:27 PM, Jeff Brown [via R] wrote:
 What is the function set()?  Is that a typo?  When I type ?set I get 
 nothing, and when I try to evaluate that code R tells me it can't find 
 the function.




-- 
View this message in context: 
http://n4.nabble.com/tapply-syntax-tp1692503p1694626.html
Sent from the R help mailing list archive at Nabble.com.

[[alternative HTML version deleted]]



[R] tapply syntax

2010-03-26 Thread Min-Han Tan
Dear R-help members,

Apologies for the trouble.

I have a question :

Essentially, I have a dataset which stores genetic variations for individual
patients. Each individual patient can have more than one variation, and each
new record corresponds to a new variation (thus, both individual patients
and variations are non-unique).

So the dataset looks something like this (letters = patients, numbers =
variation type):
Patient, Variation Type
A, 1
A, 2
A, 3
B, 1
C, 2
D, 2
D, 3
E, 2
E, 4
F, 4

My final desired output is a data.frame or a vector containing patients,
each with a count of records in a desired subset of variations. For example,
if I were only interested in variation types 2 and 3, my output would look
like this:

A, 2
B, 0
C, 1
D, 2
E, 1
F, 0.

I am trying to figure out how to use tapply to do this.

It would be something like tapply (Variation Type, Patient, ??? )

I am not sure about the function syntax of ??? to subselect only 2,3, and
have been looking at the r-help.

Sorry! Essentially, I am trying to avoid awkward loops in this whole
process.

Thanks very much for your advice!

Min-Han

[[alternative HTML version deleted]]



Re: [R] tapply syntax

2010-03-26 Thread Min-Han Tan
Hi,

I figured a workaround to my problem, but if anyone has any advice on how to
express a function in tapply to achieve the same outcome, that would be
awesome and I'd learn something about functions!

The workaround was
tapply ((data$Variation.Type %in% c(2,3)), data$Patient, sum)
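
A minimal sketch of the workaround on the ten sample records from the original question, together with one way to move the test into the function passed to tapply(). The data frame is rebuilt here by hand, and `counts` and `counts_fun` are names chosen for illustration only.

```r
# The ten sample records from the question (letters = patients).
data <- data.frame(
  Patient        = c("A", "A", "A", "B", "C", "D", "D", "E", "E", "F"),
  Variation.Type = c(1, 2, 3, 1, 2, 2, 3, 2, 4, 4)
)

# Workaround: turn the variation type into TRUE/FALSE first, then sum the
# TRUEs within each patient.
counts <- tapply(data$Variation.Type %in% c(2, 3), data$Patient, sum)

# Same result with the membership test inside an anonymous function, which
# receives the variation types of one patient at a time.
counts_fun <- tapply(data$Variation.Type, data$Patient,
                     function(v) sum(v %in% c(2, 3)))
```

Both versions keep patients with no matching record (B and F) and report 0 for them, which matches the desired output in the question.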

Thanks.

Min-Han

On Fri, Mar 26, 2010 at 12:40 PM, Min-Han Tan minhan.scie...@gmail.com wrote:

 Dear R-help members,

 Apologies for the trouble.

 I have a question :

 Essentially, I have a dataset which stores genetic variations for
 individual patients. Each individual patient can have more than one
 variation, and each new record corresponds to a new variation (thus, both
 individual patients and variations are non-unique).

 So the dataset looks something like this ((letters = patients, numbers =
 variation type).
 Patient, Variation Type
 A, 1
 A, 2
 A, 3
 B, 1
 C, 2
 D, 2
 D, 3
 E, 2
 E, 4
 F, 4

 My final desired output is a data.frame or a vector containing patients,
 each corresponding to a desired subset of variations. For e.g., if I only
 was interested in variation type 2,3, my output would look like this.

 A, 2
 B, 0
 C, 1
 D, 2
 E, 1
 F, 0.

 I am trying to figure out how to use tapply to do this.

 It would be something like tapply (Variation Type, Patient, ??? )

 I am not sure about the function syntax of ??? to subselect only 2,3, and
 have been looking at the r-help.

 Sorry! Essentially, I am trying to avoid awkward loops in this whole
 process.

 Thanks very much for your advice!

 Min-Han


[[alternative HTML version deleted]]



Re: [R] tapply syntax

2010-03-26 Thread David Freedman

how about:

d1=data.frame(pat=c(rep('a',3),'b','c',rep('d',2),rep('e',2),'f'),var=c(1,2,3,1,2,2,3,2,4,4))
ds=set(d1,var %in% c(2,3))
with(ds,tapply(var,pat,FUN=length))

hth,
David Freedman, CDC, Atlanta
-- 
View this message in context: 
http://n4.nabble.com/tapply-syntax-tp1692503p1692553.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] tapply for function taking of 1 argument?

2010-02-04 Thread J. R. M. Hosking

sjaffe wrote:

I'm sure I can put this together from the various 'apply's and split, but I
wonder if anyone has a quick incantation:

E.g. I can do tapply( data, groups, mean)

but how can I do something like:  tapply( list(data,weights), groups,
weighted.mean ) ?

(or: mapply is to sapply as ? is to tapply )

Thanks for your help.


  coef(lm(data ~ -1 + as.factor(groups), weights=weights))

Not the fastest, but IMO more comprehensible than the constructions
involving anonymous functions.
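
A minimal sketch checking that the regression form returns the same point estimates as a direct per-group weighted mean. The vectors `x`, `w` and `g` stand in for data, weights and groups, and the non-constant weights are made up for illustration. Only the coefficients are compared; standard errors depend on how the weights are interpreted, so they are left out.

```r
x <- 1:10                       # stands in for 'data'
w <- rep(c(1, 3), each = 5)     # hypothetical non-constant weights
g <- rep(c(1, 2), 5)            # stands in for 'groups'

# No-intercept weighted regression on the group factor: one coefficient
# per group, equal to that group's weighted mean.
fit <- coef(lm(x ~ -1 + as.factor(g), weights = w))

# Direct computation for comparison: split both vectors the same way.
direct <- mapply(weighted.mean, split(x, g), split(w, g))
```

The coefficient names carry the factor label (for example "as.factor(g)1"), so unname() is useful before comparing the two results.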


J. R. M. Hosking



Re: [R] tapply for function taking of 1 argument?

2010-02-04 Thread David Winsemius


On Feb 4, 2010, at 9:56 AM, J. R. M. Hosking wrote:


sjaffe wrote:
I'm sure I can put this together from the various 'apply's and split, but I
wonder if anyone has a quick incantation:

E.g. I can do tapply( data, groups, mean)

but how can I do something like:  tapply( list(data,weights), groups,
weighted.mean ) ?

(or: mapply is to sapply as ? is to tapply )

Thanks for your help.


 coef(lm(data ~ -1 + as.factor(groups), weights=weights))

Not the fastest, but IMO more comprehensible than the constructions
involving anonymous functions.


Are you sure?  (Am I sure?)  Thomas Lumley has corrected my  
misinterpretations on this point (and I apologize to him for the fact  
that he has had to do it more than once.)


https://stat.ethz.ch/pipermail/r-help/2010-February/226536.html

I am guessing that the OP was using either sampling weights or
replication weights (he did not say), so lm( , weights) might not be
the appropriate tool.




--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] tapply for function taking of 1 argument?

2010-02-03 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org wrote on 02.02.2010 22:16:06:

 
 'fraid not :-((
 
 tapply( data, groups, weighted.mean, weights) 

tapply(seq(along=lll), rrr, function(i, x, w) weighted.mean(x[i], w[i]),
   x=lll, w=ttt)
If you want to subset more than one thing, subset the index vector.
I obtained the above advice from Prof. Ripley several years ago, so (untested):

tapply( seq(along=data), groups, function (i, x, w) weighted.mean(x[i],
w[i]), x=data, w=weights)

I believe it should still work.
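
A minimal sketch of the index-vector idea: tapply() splits the positions 1..n by group, and the function uses each subset of positions to index both the values and the weights. The weights are made up for illustration, `wm` is a hypothetical name, and seq_along() is used in place of seq(along=).

```r
data    <- 1:10
weights <- rep(c(1, 3), each = 5)   # hypothetical non-constant weights
groups  <- rep(c(1, 2), 5)

# X is the vector of positions; x and w are passed through '...' whole,
# and are subset inside the function with the positions of one group.
wm <- tapply(seq_along(data), groups,
             function(i, x, w) weighted.mean(x[i], w[i]),
             x = data, w = weights)
```

The same pattern extends to any number of parallel vectors, since each one is indexed with the same `i`.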

Regards
Petr



 
 won't work because the *entire* weights vector is passed as the 2nd arg to
 weighted.mean. But weighted.mean needs 'weights' to be split in the same
 way as 'data' -- the first and 2nd args need to correspond.
 
 
 Jorge Ivan Velez wrote:
  
  Hi sjaffem,
  
  You were almost there:
  
  tapply( yourdata, groups, weighted.mean, weights)
  
  See ?tapply for more information.
  
  HTH,
  Jorge
  
  
  
 


Re: [R] tapply for function taking of 1 argument?

2010-02-03 Thread Steve Jaffe
Yes, this is clearly the key to working with subsets. Thanks

-Original Message-
From: Petr PIKAL [mailto:petr.pi...@precheza.cz] 
Sent: Wednesday, February 03, 2010 4:16 AM
To: Steve Jaffe
Cc: r-help@r-project.org
Subject: Re: [R] tapply for function taking of 1 argument?

Hi

r-help-boun...@r-project.org wrote on 02.02.2010 22:16:06:

 
 'fraid not :-((
 
 tapply( data, groups, weighted.mean, weights) 

tapply(seq(along=lll), rrr, function(i, x, w) weighted.mean(x[i], w[i]),
   x=lll, w=ttt)
If you want to subset more than one thing, subset the index vector.
I obtained the above advice from Prof. Ripley several years ago, so (untested):

tapply( seq(along=data), groups, function (i, x, w) weighted.mean(x[i],
w[i]), x=data, w=weights)

I believe it should still work.

Regards
Petr



 
 won't work because the *entire* weights vector is passed as the 2nd arg to
 weighted.mean. But weighted.mean needs 'weights' to be split in the same
 way as 'data' -- the first and 2nd args need to correspond.
 
 
 Jorge Ivan Velez wrote:
  
  Hi sjaffem,
  
  You were almost there:
  
  tapply( yourdata, groups, weighted.mean, weights)
  
  See ?tapply for more information.
  
  HTH,
  Jorge
  
  
  
 


Re: [R] tapply for function taking of 1 argument?

2010-02-03 Thread David Freedman

also, 

library(plyr)
ddply(d,~grp,function(df) weighted.mean(df$x,df$w))
-- 
View this message in context: 
http://n4.nabble.com/tapply-for-function-taking-of-1-argument-tp1460392p1461428.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] tapply for function taking of 1 argument?

2010-02-03 Thread hadley wickham
On Wed, Feb 3, 2010 at 11:06 AM, David Freedman 3.14da...@gmail.com wrote:

 also,

 library(plyr)
 ddply(d,~grp,function(df) weighted.mean(df$x,df$w))

Or

ddply(d, "grp", summarise, mean = weighted.mean(x, w))

which is convenient if you want more than one output

Hadley



-- 
http://had.co.nz/



Re: [R] tapply for function taking of 1 argument?

2010-02-03 Thread Gabor Grothendieck
Also try this:

 library(sqldf)
 DF <- data.frame(data = 1:10, groups = rep(1:2, 5), weights = 1)
 sqldf("select groups, sum(data * weights)/sum(weights) 'wtd mean' from DF group by groups")
   groups wtd mean
 1      1        5
 2      2        6

On Tue, Feb 2, 2010 at 5:06 PM, sjaffe sja...@riskspan.com wrote:

 Thanks! :-)

 I suppose it's obvious, but one will generally have to use a (anonymous)
 function to 'unpack' the data.frame into columns, unless the function
 already knows how to do this.

 I mention this because when I tested the solution on my example I got an
 unexpected result -- apparently weighted.mean will operate on a 2-column
 dataframe but not in the way one would expect.

 data = 1:10
 weights = rep(1,10)
 groups = rep(c(1,2),5)
  by( data.frame(data,weights), groups, weighted.mean)
 groups: 1
 [1] 15
 
 groups: 2
 [1] 17.5



 But

  by( data.frame(data,weights), groups, function(d) { weighted.mean(d[,1],
 d[,2]) } )

 does the right thing

 groups: 1
 [1] 5
 
 groups: 2
 [1] 6
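
A minimal sketch of the second by() call above on the same three vectors. The anonymous function receives one group's rows as a data frame and unpacks the two columns itself. For what it is worth, the 15 and 17.5 from the first call are consistent with the two columns having been pooled: (25 + 5)/2 = 15 and (30 + 5)/2 = 17.5. That is arithmetic on the posted numbers, not a statement about what any particular R version does. `res` is a name chosen for illustration.

```r
data    <- 1:10
weights <- rep(1, 10)
groups  <- rep(c(1, 2), 5)

# Each call of the function sees a 5-row data frame for one group;
# column 1 holds the values and column 2 the matching weights.
res <- by(data.frame(data, weights), groups,
          function(d) weighted.mean(d[, 1], d[, 2]))
```

Indexing by column name, for example d$data and d$weights, does the same job and is less sensitive to column order.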




 Bert Gunter wrote:

 ?by


 Bert Gunter
 Genentech Nonclinical Statistics



Re: [R] tapply for function taking of 1 argument?

2010-02-03 Thread sjaffe

Thanks, I’m actually more comfortable with vector-ish syntax than sql-ish but 
this is a good thing to keep in mind… I wonder how it compares in performance 
versus ‘by’ or ‘tapply’

From: Gabor Grothendieck [via R] 
[mailto:ml-node+1461531-1948782...@n4.nabble.com]
Sent: Wednesday, February 03, 2010 1:19 PM
To: Steve Jaffe
Subject: Re: tapply for function taking of 1 argument?

Also try this:

 library(sqldf)
 DF <- data.frame(data = 1:10, groups = rep(1:2, 5), weights = 1)
 sqldf("select groups, sum(data * weights)/sum(weights) 'wtd mean' from DF group by groups")
   groups wtd mean
 1      1        5
 2      2        6

On Tue, Feb 2, 2010 at 5:06 PM, sjaffe [hidden email] wrote:


 Thanks! :-)

 I suppose it's obvious, but one will generally have to use a (anonymous)
 function to 'unpack' the data.frame into columns, unless the function
 already knows how to do this.

 I mention this because when I tested the solution on my example I got an
 unexpected result -- apparently weighted.mean will operate on a 2-column
 dataframe but not in the way one would expect.

 data = 1:10
 weights = rep(1,10)
 groups = rep(c(1,2),5)
  by( data.frame(data,weights), groups, weighted.mean)
 groups: 1
 [1] 15
 
 groups: 2
 [1] 17.5



 But

  by( data.frame(data,weights), groups, function(d) { weighted.mean(d[,1],
 d[,2]) } )

 does the right thing

 groups: 1
 [1] 5
 
 groups: 2
 [1] 6




 Bert Gunter wrote:

 ?by


 Bert Gunter
 Genentech Nonclinical Statistics



-- 
View this message in context: 
http://n4.nabble.com/tapply-for-function-taking-of-1-argument-tp1460392p1461541.html
Sent from the R help mailing list archive at Nabble.com.

[[alternative HTML version deleted]]



Re: [R] tapply for function taking of 1 argument?

2010-02-03 Thread Bert Gunter
My editorial opinion only:

It will of necessity be slower (because there's more machinery underlying
the sqldf package); but I doubt whether it would be noticeably slower than
the native R solution in most practical situations. The same would be true
for plyR's implementation (it relies on the proto package, which slows
things down a bit).

The point is that the most important issue in almost all cases is the
programmer's time to create and debug correct code, especially as the native
machine speeds continue to increase. R gives you the option to choose
whatever idiom you prefer to minimize this. The software implementation
differences thereafter will rarely be important.

In other words, pick your poison.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
 
 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of sjaffe
Sent: Wednesday, February 03, 2010 10:25 AM
To: r-help@r-project.org
Subject: Re: [R] tapply for function taking of 1 argument?


Thanks, I'm actually more comfortable with vector-ish syntax than sql-ish
but this is a good thing to keep in mind... I wonder how it compares in
performance versus 'by' or 'tapply'

From: Gabor Grothendieck [via R]
[mailto:ml-node+1461531-1948782...@n4.nabble.com]
Sent: Wednesday, February 03, 2010 1:19 PM
To: Steve Jaffe
Subject: Re: tapply for function taking of 1 argument?

Also try this:

 library(sqldf)
 DF <- data.frame(data = 1:10, groups = rep(1:2, 5), weights = 1)
 sqldf("select groups, sum(data * weights)/sum(weights) 'wtd mean' from DF group by groups")
   groups wtd mean
 1      1        5
 2      2        6

On Tue, Feb 2, 2010 at 5:06 PM, sjaffe [hidden email] wrote:


 Thanks! :-)

 I suppose it's obvious, but one will generally have to use a (anonymous)
 function to 'unpack' the data.frame into columns, unless the function
 already knows how to do this.

 I mention this because when I tested the solution on my example I got an
 unexpected result -- apparently weighted.mean will operate on a 2-column
 dataframe but not in the way one would expect.

 data = 1:10
 weights = rep(1,10)
 groups = rep(c(1,2),5)
  by( data.frame(data,weights), groups, weighted.mean)
 groups: 1
 [1] 15
 
 groups: 2
 [1] 17.5



 But

  by( data.frame(data,weights), groups, function(d) { weighted.mean(d[,1],
 d[,2]) } )

 does the right thing

 groups: 1
 [1] 5
 
 groups: 2
 [1] 6




 Bert Gunter wrote:

 ?by


 Bert Gunter
 Genentech Nonclinical Statistics



Re: [R] tapply for function taking of 1 argument?

2010-02-03 Thread hadley wickham
 It will of necessity be slower (because there's more machinery underlying
 the sqldf package); but I doubt whether it would be noticeably slower than
 the native R solution in most practical situations. The same would be true
 for plyR's implementation (it relies on the proto package, which slows
 things down a bit).

Plyr doesn't use proto at all (that's ggplot2).  Plyr is generally
faster than split + lapply etc for large datasets with many splits,
but slower with smaller datasets/fewer splits.

 The point is that the most important issue in almost all cases is the
 programmer's time to create and debug correct code, especially as the native
 machine speeds continue to increase. R gives you the option to choose
 whatever idiom you prefer to minimize this. The software implementation
 differences thereafter will rarely be important.

Totally agreed!  In my mind the advantage of learning plyr, is that
you learn one set of methods that work for lists, data frames and
arrays. And because all of the functions are designed with consistency
in mind, it hopefully takes less time to learn them all.

Hadley

-- 
http://had.co.nz/



[R] tapply for function taking of 1 argument?

2010-02-02 Thread sjaffe

I'm sure I can put this together from the various 'apply's and split, but I
wonder if anyone has a quick incantation:

E.g. I can do tapply( data, groups, mean)

but how can I do something like:  tapply( list(data,weights), groups,
weighted.mean ) ?

(or: mapply is to sapply as ? is to tapply )

Thanks for your help.
-- 
View this message in context: 
http://n4.nabble.com/tapply-for-function-taking-of-1-argument-tp1460392p1460392.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] tapply for function taking of 1 argument?

2010-02-02 Thread Jorge Ivan Velez
Hi sjaffem,

You were almost there:

tapply( yourdata, groups, weighted.mean, weights)

See ?tapply for more information.

HTH,
Jorge


On Tue, Feb 2, 2010 at 3:58 PM, sjaffe  wrote:


 I'm sure I can put this together from the various 'apply's and split, but I
 wonder if anyone has a quick incantation:

 E.g. I can do tapply( data, groups, mean)

 but how can I do something like:  tapply( list(data,weights), groups,
 weighted.mean ) ?

 (or: mapply is to sapply as ? is to tapply )

 Thanks for your help.


[[alternative HTML version deleted]]



Re: [R] tapply for function taking of 1 argument?

2010-02-02 Thread sjaffe

'fraid not :-((

tapply( data, groups, weighted.mean, weights) 

won't work because the *entire* weights vector is passed as the 2nd arg to
weighted.mean. But weighted.mean needs 'weights' to be split in the same
way as 'data' -- the first and 2nd args need to correspond.
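
A minimal sketch of the mismatch described above, plus one way to keep the two vectors aligned by splitting both with the same grouping and pairing the pieces with mapply(). As far as I know, current versions of weighted.mean() stop with an error when x and w differ in length, so the failing call is wrapped in tryCatch(). `attempt` and `fixed` are names chosen for illustration.

```r
data    <- 1:10
weights <- rep(1, 10)
groups  <- rep(c(1, 2), 5)

# tapply() splits 'data' into groups of 5 but hands the full 10-element
# 'weights' to every call, so the lengths no longer match.
attempt <- tryCatch(tapply(data, groups, weighted.mean, weights),
                    error = function(e) e)

# Splitting both vectors by the same groups keeps value and weight paired;
# mapply() then walks the two lists in parallel.
fixed <- mapply(weighted.mean, split(data, groups), split(weights, groups))
```

This is one reading of the "mapply is to sapply as ? is to tapply" question: split() does the grouping and mapply() does the multi-argument apply.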


Jorge Ivan Velez wrote:
 
 Hi sjaffem,
 
 You were almost there:
 
 tapply( yourdata, groups, weighted.mean, weights)
 
 See ?tapply for more information.
 
 HTH,
 Jorge
 
 
 

-- 
View this message in context: 
http://n4.nabble.com/tapply-for-function-taking-of-1-argument-tp1460392p1460419.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] tapply for function taking of 1 argument?

2010-02-02 Thread Bert Gunter
?by

Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of sjaffe
Sent: Tuesday, February 02, 2010 1:16 PM
To: r-help@r-project.org
Subject: Re: [R] tapply for function taking of 1 argument?


'fraid not :-((

tapply( data, groups, weighted.mean, weights) 

won't work because the *entire* weights vector is passed as the 2nd arg to
weighted.mean. But weighted.mean needs 'weights' to be split in the same
way as 'data' -- the first and 2nd args need to correspond.


Jorge Ivan Velez wrote:
 
 Hi sjaffem,
 
 You were almost there:
 
 tapply( yourdata, groups, weighted.mean, weights)
 
 See ?tapply for more information.
 
 HTH,
 Jorge
 
 
 



Re: [R] tapply for function taking of 1 argument?

2010-02-02 Thread Charles C. Berry

On Tue, 2 Feb 2010, sjaffe wrote:



'fraid not :-((

tapply( data, groups, weighted.mean, weights)

won't work because the *entire* weights vector is passed as the 2nd arg to
weighted.mean. But weighted.mean needs 'weights' to be split in the same
way as 'data' -- the first and 2nd args need to correspond.




try

sapply( split( data.frame(x,w), grp) , do.call, what=weighted.mean )
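
A minimal sketch of why the do.call form works, using made-up vectors named `x`, `w` and `grp` as in the one-liner. split() cuts the data frame into one small data frame per group; a data frame is a list of columns, so do.call() passes the columns to weighted.mean() as named arguments. The column names therefore have to match the argument names x and w, which is an assumption about how the data frame is built.

```r
x   <- 1:10
w   <- rep(1, 10)
grp <- rep(c(1, 2), 5)

# For each per-group data frame 'd' this evaluates
#   do.call(what = weighted.mean, args = d)
# which is the same as weighted.mean(x = d$x, w = d$w).
res <- sapply(split(data.frame(x, w), grp), do.call, what = weighted.mean)
```

With columns named data and weights instead, the named arguments would not line up with x and w, so an anonymous function that unpacks the columns is the safer choice in that case.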


HTH,

Chuck



Jorge Ivan Velez wrote:


Hi sjaffem,

You were almost there:

tapply( yourdata, groups, weighted.mean, weights)

See ?tapply for more information.

HTH,
Jorge








Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] tapply for function taking of 1 argument?

2010-02-02 Thread Steve Jaffe
Excellent! I knew there would be a clever answer using 'do.call' :-)

-Original Message-
From: Charles C. Berry [mailto:cbe...@tajo.ucsd.edu] 
Sent: Tuesday, February 02, 2010 4:25 PM
To: Steve Jaffe
Cc: r-help@r-project.org
Subject: Re: [R] tapply for function taking of 1 argument?

On Tue, 2 Feb 2010, sjaffe wrote:


 'fraid not :-((

 tapply( data, groups, weighted.mean, weights)

 won't work because the *entire* weights vector is passed as the 2nd arg to
 weighted.mean. But weighted.mean needs 'weights' to be split in the same
 way as 'data' -- the first and 2nd args need to correspond.



try

sapply( split( data.frame(x,w), grp) , do.call, what=weighted.mean )


HTH,

Chuck


 Jorge Ivan Velez wrote:

 Hi sjaffem,

 You were almost there:

 tapply( yourdata, groups, weighted.mean, weights)

 See ?tapply for more information.

 HTH,
 Jorge




 -- 
 View this message in context: 
 http://n4.nabble.com/tapply-for-function-taking-of-1-argument-tp1460392p1460419.html
 Sent from the R help mailing list archive at Nabble.com.



Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] tapply for function taking of 1 argument?

2010-02-02 Thread sjaffe

Thanks! :-)

I suppose it's obvious, but one will generally have to use an (anonymous)
function to 'unpack' the data.frame into columns, unless the function
already knows how to do this.  

I mention this because when I tested the solution on my example I got an
unexpected result -- apparently weighted.mean will operate on a 2-column
dataframe but not in the way one would expect.

data = 1:10
weights = rep(1,10)
groups = rep(c(1,2),5)
  by( data.frame(data,weights), groups, weighted.mean)
groups: 1
[1] 15
 
groups: 2
[1] 17.5
 


But

  by( data.frame(data,weights), groups, function(d) { weighted.mean(d[,1],
d[,2]) } )

does the right thing

groups: 1
[1] 5
 
groups: 2
[1] 6
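
The difference comes down to how the per-group data frame reaches the function: passed whole, it becomes the x argument and no weights are supplied; unpacked, each column lands in the argument it belongs to. A small sketch of the unpacking step on the same toy data (the name df is introduced here purely for illustration):

```r
data    <- 1:10
weights <- rep(1, 10)
groups  <- rep(c(1, 2), 5)
df      <- data.frame(data, weights)

# Unpack the columns explicitly so the weights travel with their values
res <- sapply(split(df, groups),
              function(d) weighted.mean(d$data, d$weights))
print(res)

# by() accepts the same anonymous function
by(df, groups, function(d) weighted.mean(d$data, d$weights))
```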
 



Bert Gunter wrote:
 
 ?by
 

 Bert Gunter
 Genentech Nonclinical Statistics
   
-- 
View this message in context: 
http://n4.nabble.com/tapply-for-function-taking-of-1-argument-tp1460392p1460489.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] tapply on multiple groups

2010-01-28 Thread David Winsemius


On Jan 28, 2010, at 10:26 AM, GL wrote:



Can you make tapply break down groups similar to bwplot or such?
Example:

Data frame has one measure (Days) and two Dimensions (MM and Place). All
have the same length.

length(dbs.final$Days)
[1] 3306
length()
[1] 3306
length()
[1] 3306

Doing the following makes a nice table for one dimension and one measure:

   do.call(rbind,tapply(dbs.final$Days,dbs.final$Place, summary))

But, what I really need to do is break it down on two dimensions and one
measures - effectively equivalent to the following bwplot call:

   bwplot( Days ~ MM | Place, ,data=dbs.final)

Is there an equivalent to the | operation in tapply?


Please reread the help page for tapply.

Perhaps?:

tapply(dbs.final$Days, list(dbs.final$MM, dbs.final$Place), summary)

-- David
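
With a list of grouping vectors, tapply() returns one cell per combination of levels, which is probably the closest analogue to the | conditioning in bwplot. A rough sketch on invented data (the data frame d and its values are placeholders for dbs.final):

```r
d <- data.frame(
  Days  = c(1, 2, 3, 4, 5, 6, 7, 8),
  MM    = rep(c("Jan", "Feb"), each = 4),
  Place = rep(c("A", "B"), times = 4)
)

# A scalar summary such as mean gives a plain MM x Place matrix
m <- tapply(d$Days, list(d$MM, d$Place), mean)
print(m)

# With summary, each cell holds a whole summary object; index with [[ ]]
s <- tapply(d$Days, list(d$MM, d$Place), summary)
print(s[["Jan", "A"]])
```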



--
View this message in context: 
http://n4.nabble.com/tapply-on-multiple-groups-tp1380593p1380593.html
Sent from the R help mailing list archive at Nabble.com.



David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] tapply on multiple groups

2010-01-28 Thread Gigi Lipori
Thanks. My mistake was that I used c(dbs.final$Days,dbs.final$Place) instead of 
list(... when I tried to follow that part of the documentation. 
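
The distinction matters because c() glues the two grouping vectors into one long vector, whereas list() keeps them as two separate groupings of the right length. A short sketch on placeholder data (all names are illustrative):

```r
days  <- c(3, 5, 2, 8)
mm    <- c("Jan", "Jan", "Feb", "Feb")
place <- c("A", "B", "A", "B")

length(c(mm, place))     # 8: one vector, twice as long as days
length(list(mm, place))  # 2: two groupings, each of length 4

# c() fails the length check inside tapply(); list() cross-classifies
bad  <- try(tapply(days, c(mm, place), sum), silent = TRUE)
good <- tapply(days, list(mm, place), sum)
print(inherits(bad, "try-error"))
print(good)
```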

 David Winsemius dwinsem...@comcast.net 1/28/2010 11:49 AM 

On Jan 28, 2010, at 10:26 AM, GL wrote:


 Can you make tapply break down groups similar to bwplot or such?  
 Example:

 Data frame has one measure (Days) and two Dimensions (MM and  
 Place). All
 have the same length.

 length(dbs.final$Days)
 [1] 3306
 length()
 [1] 3306
 length()
 [1] 3306

 Doing the following makes a nice table for one dimension and one  
 measure:

do.call(rbind,tapply(dbs.final$Days,dbs.final$Place, summary))

 But, what I really need to do is break it down on two dimensions and  
 one
 measures - effectively equivalent to the following bwplot call:

bwplot( Days ~ MM | Place, ,data=dbs.final)

 Is there an equivalent to the | operation in tapply?

Please reread the help page for tapply.

Perhaps?:

tapply(dbs.final$Days, list(dbs.final$MM, dbs.final$Place), summary)

-- David


 -- 
 View this message in context: 
 http://n4.nabble.com/tapply-on-multiple-groups-tp1380593p1380593.html 
 Sent from the R help mailing list archive at Nabble.com.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] tapply and more than one function, with different arguments

2010-01-26 Thread RINNER Heinrich
Dear R-users,

I am working with R version 2.10.1.

Say I have a simple function like this:

 my.fun <- function(x, mult) mult*sum(x)

Now, I want to apply this function along with some other (say 'max') to a 
simple data.frame, like:

 dat <- data.frame(x = 1:4, grp = c("a","a","b","b"))

Ideally, the result would look something like this (if mult = 10):
  max my.fun
a   2     30
b   4     70

I have tried it that way:

apply.more.functions <- function(dat, FUN = c("max", "my.fun"), ...) {
  res <- NULL
  for(f in FUN) res[[f]] <- tapply(dat$x, dat$grp, FUN = f, ...)
  data.frame(res)
}

# let's test it:
 apply.more.functions(dat, FUN = c("max", "min"))
  max min
a   2   1
b   4   3
# perfect!

# now, with an additional argument:
 apply.more.functions(dat, FUN = c("max", "my.fun"), mult = 10)
  max my.fun
a  10     30
b  10     70
# uhuh!
Apparently, 'mult' has been used in the calculation of 'max' as well.
How can I modify apply.more.functions in order to avoid this?

Your advice would be appreciated;
Kind regards
Heinrich.



Re: [R] tapply and more than one function, with different arguments

2010-01-26 Thread Peter Ehlers

Try replacing 'max' with 'mean' and see what you get.
Then have a look at ?max and see what max() does with
extra arguments.

I'm not sure it's relevant, but it might be useful
to check what Hmisc::summarize does.

 -Peter Ehlers
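
Peter's hint can be checked in two lines: max() folds every extra argument into the values it compares, while mean() ignores arguments it does not recognise. One possible fix, sketched below, is to pass a named list of functions and bind mult inside a small wrapper so that it never reaches max(). The list-of-functions interface is an assumption made for this sketch, not the original poster's design:

```r
max(1:2, mult = 10)    # 10: mult is treated as just another value
mean(1:2, mult = 10)   # 1.5: the stray argument is ignored

my.fun <- function(x, mult) mult * sum(x)
dat    <- data.frame(x = 1:4, grp = c("a", "a", "b", "b"))

# Each list element is a one-argument function; extra arguments are
# fixed inside the wrapper instead of being sent through '...'
apply.more.functions <- function(dat, FUN) {
  sapply(FUN, function(f) tapply(dat$x, dat$grp, f))
}

res <- apply.more.functions(
  dat,
  FUN = list(max = max, my.fun = function(x) my.fun(x, mult = 10))
)
print(res)
```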

RINNER Heinrich wrote:

Dear R-users,

I am working with R version 2.10.1.

Say I have a simple function like this:

my.fun <- function(x, mult) mult*sum(x)

Now, I want to apply this function along with some other (say 'max') to a 
simple data.frame, like:

dat <- data.frame(x = 1:4, grp = c("a","a","b","b"))

Ideally, the result would look something like this (if mult = 10):
  max my.fun
a   2     30
b   4     70

I have tried it that way:

apply.more.functions <- function(dat, FUN = c("max", "my.fun"), ...) {
  res <- NULL
  for(f in FUN) res[[f]] <- tapply(dat$x, dat$grp, FUN = f, ...)
  data.frame(res)
}

# let's test it:
apply.more.functions(dat, FUN = c("max", "min"))
  max min
a   2   1
b   4   3
# perfect!

# now, with an additional argument:
apply.more.functions(dat, FUN = c("max", "my.fun"), mult = 10)
  max my.fun
a  10     30
b  10     70
# uhuh!
Apparently, 'mult' has been used in the calculation of 'max' as well.
How can I modify apply.more.functions in order to avoid this?

Your advice would be appreciated;
Kind regards
Heinrich.





--
Peter Ehlers
University of Calgary



Re: [R] tapply and more than one function, with different arguments

2010-01-26 Thread Dennis Murphy
Hi:

Using the plyr package, we can get the result as follows:

 library(plyr)
 my.fun <- function(x, mult) mult*sum(x)
 dat <- data.frame(x = 1:4, grp = c("a","a","b","b"))
 ddply(dat, .(grp), summarize, max = max(x), myfun = my.fun(x, 10))
  grp max myfun
1   a   2    30
2   b   4    70

HTH,
Dennis
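
For readers without plyr, roughly the same table can be assembled in base R by splitting the data frame and building one summary row per group. This is a sketch of an alternative, not a description of what ddply does internally:

```r
my.fun <- function(x, mult) mult * sum(x)
dat    <- data.frame(x = 1:4, grp = c("a", "a", "b", "b"))

# One-row data frame per group, then stack the rows
rows <- lapply(split(dat, dat$grp), function(d) {
  data.frame(grp = d$grp[1], max = max(d$x), myfun = my.fun(d$x, 10))
})
res <- do.call(rbind, rows)
print(res)
```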

On Tue, Jan 26, 2010 at 8:26 AM, RINNER Heinrich 
heinrich.rin...@tirol.gv.at wrote:

 Dear R-users,

 I am working with R version 2.10.1.

 Say I have a simple function like this:

  my.fun <- function(x, mult) mult*sum(x)

 Now, I want to apply this function along with some other (say 'max') to a
 simple data.frame, like:

  dat <- data.frame(x = 1:4, grp = c("a","a","b","b"))

 Ideally, the result would look something like this (if mult = 10):
   max my.fun
 a   2     30
 b   4     70

 I have tried it that way:

 apply.more.functions <- function(dat, FUN = c("max", "my.fun"), ...) {
   res <- NULL
   for(f in FUN) res[[f]] <- tapply(dat$x, dat$grp, FUN = f, ...)
   data.frame(res)
 }

 # let's test it:
  apply.more.functions(dat, FUN = c("max", "min"))
   max min
 a   2   1
 b   4   3
 # perfect!

 # now, with an additional argument:
  apply.more.functions(dat, FUN = c("max", "my.fun"), mult = 10)
   max my.fun
 a  10     30
 b  10     70
 # uhuh!
 Apparently, 'mult' has been used in the calculation of 'max' as well.
 How can I modify apply.more.functions in order to avoid this?

 Your advice would be appreciated;
 Kind regards
 Heinrich.





Re: [R] tapply and more than one function, with different arguments

2010-01-26 Thread RINNER Heinrich
Hi Dennis,
now that's a very nice function, and this seems to be just what I need!
Thanks a lot!
-Heinrich.

From: Dennis Murphy [djmu...@gmail.com]
Sent: Tuesday, 26 January 2010 19:44
To: RINNER Heinrich
Cc: r-help
Subject: Re: [R] tapply and more than one function, with different arguments

Hi:

Using the plyr package, we can get the result as follows:

 library(plyr)
 my.fun <- function(x, mult) mult*sum(x)
 dat <- data.frame(x = 1:4, grp = c("a","a","b","b"))
 ddply(dat, .(grp), summarize, max = max(x), myfun = my.fun(x, 10))
  grp max myfun
1   a   2    30
2   b   4    70

HTH,
Dennis

On Tue, Jan 26, 2010 at 8:26 AM, RINNER Heinrich 
heinrich.rin...@tirol.gv.at wrote:
Dear R-users,

I am working with R version 2.10.1.

Say I have a simple function like this:

 my.fun <- function(x, mult) mult*sum(x)

Now, I want to apply this function along with some other (say 'max') to a 
simple data.frame, like:

 dat <- data.frame(x = 1:4, grp = c("a","a","b","b"))

Ideally, the result would look something like this (if mult = 10):
  max my.fun
a   2     30
b   4     70

I have tried it that way:

apply.more.functions <- function(dat, FUN = c("max", "my.fun"), ...) {
  res <- NULL
  for(f in FUN) res[[f]] <- tapply(dat$x, dat$grp, FUN = f, ...)
  data.frame(res)
}

# let's test it:
 apply.more.functions(dat, FUN = c("max", "min"))
  max min
a   2   1
b   4   3
# perfect!

# now, with an additional argument:
 apply.more.functions(dat, FUN = c("max", "my.fun"), mult = 10)
  max my.fun
a  10     30
b  10     70
# uhuh!
Apparently, 'mult' has been used in the calculation of 'max' as well.
How can I modify apply.more.functions in order to avoid this?

Your advice would be appreciated;
Kind regards
Heinrich.



[R] tapply function

2009-11-03 Thread FMH
Hi,

I tried to use the tapply function to find the mean of the data in each group 
with the following command, but the results are NA, as there are several 
missing values in each group.

tapply(data,group,mean)

Could someone please advise me on how to ignore the missing data so that 
the function runs successfully?

Thanks

Fir






Re: [R] tapply function

2009-11-03 Thread Sundar Dorai-Raj
You must have missing values in data. Try

tapply(data, group, mean, na.rm = TRUE)

If that's not the case, read the bottom of this email about the posting guide.

HTH,

--sundar
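
The extra na.rm = TRUE is simply forwarded by tapply() to mean() for every group. A minimal sketch with invented values:

```r
data  <- c(1, NA, 3, 4, NA, 6)
group <- c("a", "a", "a", "b", "b", "b")

tapply(data, group, mean)                       # NA for both groups
res <- tapply(data, group, mean, na.rm = TRUE)  # NAs dropped per group
print(res)
```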

On Tue, Nov 3, 2009 at 5:28 AM, FMH kagba2...@yahoo.com wrote:
 Hi,

 I tried to use the tapply function to find the mean of the data in each group 
 with the following command, but the results are NA, as there are several 
 missing values in each group.

 tapply(data,group,mean)

 Could someone please advise me on how to ignore the missing data so that 
 the function runs successfully?

 Thanks

 Fir








[R] tapply with multiple arguments that are not part of the same data frame

2009-10-22 Thread Kavitha Venkatesan
Hi all,

I would like to invoke a function that takes multiple arguments (some of
which are specified columns in the data frame, and others that are
independent of the data frame) on split parts of a data frame, how do I do
this?

For example, let's say I have a data frame
fitness_data
name   height  weight  country
rob    5.8     200     usa
nancy  5.5     140     germany
jen    5.6     150     usa
clark  5.10    210     germany
matt   5.9     280     canada
ralph  6       270     canada
...
...

Now let us say I have a function,  my_func(h, w, noise, dir), which takes as
input:
(1) a vector of heights
(2) a vector of weights
(3) a user-input numeric noise value
(4) a user-input string dir for the directory to output the end result of
the function to

This function does some calculations on the input data and outputs a
dataframe that is then written to a file in the dir directory.

If I want to apply this function to data grouped by each country in the
fitness_data dataframe, how would I do this? I tried looking through the
mailing archives, but couldn't nail down the solution. I tried something
like

split(mapply( function(a,b,c,d) my_func(fitness_data$h, fitness_data$w, 2.5,
my_directory)), fitness_data$country)

but this considered fitness_data$h, and fitness_data$w in each single row
for a country, rather than a vector of heights or weights across all rows
corresponding to that country.

Thanks!
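
One way to approach this, sketched under assumptions: split the data frame by country, then call the function once per piece, pulling the height and weight columns out of each piece and passing noise and dir as constants. The body of my_func below is a hypothetical stand-in (the real one is not shown in the post), and my_directory is set to a temporary directory just for the sake of the example:

```r
fitness_data <- data.frame(
  name    = c("rob", "nancy", "jen", "clark", "matt", "ralph"),
  height  = c(5.8, 5.5, 5.6, 5.10, 5.9, 6),
  weight  = c(200, 140, 150, 210, 280, 270),
  country = c("usa", "germany", "usa", "germany", "canada", "canada")
)

# Hypothetical stand-in: summarises its inputs and writes a CSV into dir
my_func <- function(h, w, noise, dir) {
  out <- data.frame(mean_height = mean(h), mean_weight = mean(w) + noise)
  write.csv(out, tempfile("result_", tmpdir = dir, fileext = ".csv"),
            row.names = FALSE)
  out
}

my_directory <- tempdir()

# Each d is the sub-data-frame for one country, so d$height and
# d$weight are whole vectors for that country, not single rows
results <- lapply(split(fitness_data, fitness_data$country), function(d) {
  my_func(d$height, d$weight, noise = 2.5, dir = my_directory)
})
print(results)
```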



Re: [R] tapply with multiple arguments that are not part of the same data frame

2009-10-22 Thread Kavitha Venkatesan
I just realized my earlier post of my question below was not in
Plain Text mode, hence the repeat post...apologies!
Kavitha


On Thu, Oct 22, 2009 at 4:19 PM, Kavitha Venkatesan
kavitha.venkate...@gmail.com wrote:
 Hi all,

 I would like to invoke a function that takes multiple arguments (some of
 which are specified columns in the data frame, and others that are
 independent of the data frame) on split parts of a data frame, how do I do
 this?

 For example, let's say I have a data frame
fitness_data
 name   height  weight  country
 rob    5.8     200     usa
 nancy  5.5     140     germany
 jen    5.6     150     usa
 clark  5.10    210     germany
 matt   5.9     280     canada
 ralph  6       270     canada
 ...
 ...

 Now let us say I have a function,  my_func(h, w, noise, dir), which takes as
 input:
 (1) a vector of heights
 (2) a vector of weights
 (3) a user-input numeric noise value
 (4) a user-input string dir for the directory to output the end result of
 the function to

 This function does some calculations on the input data and outputs a
 dataframe that is then written to a file in the dir directory.

 If I want to apply this function to data grouped by each country in the
 fitness_data dataframe, how would I do this? I tried looking through the
 mailing archives, but couldn't nail down the solution. I tried something
 like

 split(mapply( function(a,b,c,d) my_func(fitness_data$h, fitness_data$w, 2.5,
 my_directory)), fitness_data$country)

 but this considered fitness_data$h, and fitness_data$w in each single row
 for a country, rather than a vector of heights or weights across all rows
 corresponding to that country.

 Thanks!





Re: [R] tapply() and using factor() on a factor

2009-10-16 Thread Alexander Peterhansl
Thank you Mohamed and Bill for your replies.  (I did not send the data
because it is unwieldy.)

Yes Bill, the issue arises directly from what you had guessed.  I was
working with a subset of the data (which implicitly had factors for the
complete data set).

On this, what is the best way to take a subset of the data which ignores
these extraneous factors?

 log <- data.frame(Flag=1:2,
                   RequestID=factor(letters[1:2],levels=letters[1:10]))
 log2 <- subset(log, RequestID=="a")

 levels(log2$RequestID)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

In other words, how do I take a subset which yields "a" as the only
level for log2?

Alex




-Original Message-
From: William Dunlap [mailto:wdun...@tibco.com] 
Sent: Thursday, October 15, 2009 11:59 PM
To: Alexander Peterhansl; r-help@r-project.org
Subject: RE: [R] tapply() and using factor() on a factor

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Alexander 
 Peterhansl
 Sent: Thursday, October 15, 2009 2:50 PM
 To: r-help@r-project.org
 Subject: [R] tapply() and using factor() on a factor
 
 Dear List,
 
  
 
 Shouldn't result1 and result2 be equal in the following case?
 
  
 
 Note that log$RequestID is a factor.  That is, 
 is.factor(log$RequestID)
 yields TRUE.
 
  
 
 result1 <- tapply(log$Flag,factor(log$RequestID),sum)

 result2 <- tapply(log$Flag,log$RequestID,sum)

Showing us the output of dput(log) (or str(log) and summary(log))
would let people discover the problem more readily.  Since you
didn't I'll guess what the dataset may contain.

If log$RequestID is a factor with lots of unused levels tapply
will output an NA for each unused level.  factor(log$RequestID)
will create a new set of levels, only those actually used,
so tapply will not be forced to fill those spots with NA's.  E.g.,

 log <- data.frame(Flag=1:2, RequestID=factor(letters[1:2],
                   levels=letters[1:10]))
 tapply(log$Flag, log$RequestID, sum)
  a  b  c  d  e  f  g  h  i  j
  1  2 NA NA NA NA NA NA NA NA
 tapply(log$Flag, factor(log$RequestID), sum)
 a b
 1 2

I suppose tapply(X,INDEX,FUN) could call FUN(X[0]) to see
how to fill the cells with no data behind them, but it doesn't.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 
  
 
 Yet, when I summarize the output, I get the following:
 
 summary(result1)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   11.00   11.00   11.00   26.06   11.00  101.00

 summary(result2)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
   11.00   11.00   11.00   26.06   11.00  101.00  978.00
 
  
 
 Why does result2 have 978 NA's?
 
  
 
 Any help on this would be appreciated.
 
  
 
 Alex
 
  
 
  
 
  
 
  
 
 
 


Re: [R] tapply() and using factor() on a factor

2009-10-16 Thread David Winsemius


On Oct 16, 2009, at 11:33 AM, Alexander Peterhansl wrote:


Thank you Mohamed and Bill for your replies.  (I did not send the data
because it is unwieldy.)

Yes Bill, the issue arises directly from what you had guessed.  I was
working with a subset of the data (which implicitly had factors for the
complete data set).

On this, what is the best way to take a subset of the data which ignores
these extraneous factors?

log <- data.frame(Flag=1:2,
                  RequestID=factor(letters[1:2],levels=letters[1:10]))
log2 <- subset(log, RequestID=="a")

levels(log2$RequestID)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"


log2$RequestID <- factor(log2$RequestID)

You might think that log2 <- subset(log, RequestID=="a", drop=TRUE)
might do that task, but it clearly doesn't.


--
DW


In other words, how do I take a subset which yields "a" as the only
level for log2?

Alex

-Original Message-
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Thursday, October 15, 2009 11:59 PM
To: Alexander Peterhansl; r-help@r-project.org
Subject: RE: [R] tapply() and using factor() on a factor


-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Alexander
Peterhansl
Sent: Thursday, October 15, 2009 2:50 PM
To: r-help@r-project.org
Subject: [R] tapply() and using factor() on a factor

Dear List,
Shouldn't result1 and result2 be equal in the following case?

Note that log$RequestID is a factor.  That is,
is.factor(log$RequestID)
yields TRUE.

result1 <- tapply(log$Flag,factor(log$RequestID),sum)

result2 <- tapply(log$Flag,log$RequestID,sum)


Showing us the output of dput(log) (or str(log) and summary(log))
would let people discover the problem more readily.  Since you
didn't I'll guess what the dataset may contain.

If log$RequestID is a factor with lots of unused levels tapply
will output an NA for each unused level.  factor(log$RequestID)
will create a new set of levels, only those actually used,
so tapply will not be forced to fill those spots with NA's.  E.g.,


log <- data.frame(Flag=1:2, RequestID=factor(letters[1:2],
                  levels=letters[1:10]))
tapply(log$Flag, log$RequestID, sum)
 a  b  c  d  e  f  g  h  i  j
 1  2 NA NA NA NA NA NA NA NA
tapply(log$Flag, factor(log$RequestID), sum)
a b
1 2

I suppose tapply(X,INDEX,FUN) could call FUN(X[0]) to see
how to fill the cells with no data behind them, but it doesn't.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com





Yet, when I summarize the output, I get the following:

summary(result1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  11.00   11.00   11.00   26.06   11.00  101.00

summary(result2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  11.00   11.00   11.00   26.06   11.00  101.00  978.00



Why does result2 have 978 NA's?



Any help on this would be appreciated.





David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] tapply() and using factor() on a factor

2009-10-15 Thread Alexander Peterhansl
Dear List,

 

Shouldn't result1 and result2 be equal in the following case?

 

Note that log$RequestID is a factor.  That is, is.factor(log$RequestID)
yields TRUE.

 

result1 <- tapply(log$Flag,factor(log$RequestID),sum)

result2 <- tapply(log$Flag,log$RequestID,sum)

 

Yet, when I summarize the output, I get the following:

summary(result1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  11.00   11.00   11.00   26.06   11.00  101.00

summary(result2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  11.00   11.00   11.00   26.06   11.00  101.00  978.00

 

Why does result2 have 978 NA's?

 

Any help on this would be appreciated.

 

Alex

 

 

 

 




Re: [R] tapply() and using factor() on a factor

2009-10-15 Thread William Dunlap
 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Alexander 
 Peterhansl
 Sent: Thursday, October 15, 2009 2:50 PM
 To: r-help@r-project.org
 Subject: [R] tapply() and using factor() on a factor
 
 Dear List,
 
  
 
 Shouldn't result1 and result2 be equal in the following case?
 
  
 
 Note that log$RequestID is a factor.  That is, 
 is.factor(log$RequestID)
 yields TRUE.
 
  
 
 result1 <- tapply(log$Flag,factor(log$RequestID),sum)

 result2 <- tapply(log$Flag,log$RequestID,sum)

Showing us the output of dput(log) (or str(log) and summary(log))
would let people discover the problem more readily.  Since you
didn't I'll guess what the dataset may contain.

If log$RequestID is a factor with lots of unused levels tapply
will output an NA for each unused level.  factor(log$RequestID)
will create a new set of levels, only those actually used,
so tapply will not be forced to fill those spots with NA's.  E.g.,

 log <- data.frame(Flag=1:2, RequestID=factor(letters[1:2],
                   levels=letters[1:10]))
 tapply(log$Flag, log$RequestID, sum)
  a  b  c  d  e  f  g  h  i  j
  1  2 NA NA NA NA NA NA NA NA
 tapply(log$Flag, factor(log$RequestID), sum)
 a b
 1 2

I suppose tapply(X,INDEX,FUN) could call FUN(X[0]) to see
how to fill the cells with no data behind them, but it doesn't.
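
Bill's point about FUN(X[0]) can be seen directly: sum() of an empty vector is 0, yet tapply() reports NA for the empty cells. Newer versions of R give tapply() a default argument for this case; the sketch below assumes such a version (R 3.4.0 or later, as far as I know) and uses log_df as an illustrative name:

```r
log_df <- data.frame(Flag = 1:2,
                     RequestID = factor(letters[1:2], levels = letters[1:10]))

sum(log_df$Flag[0])    # 0: what FUN would return for an empty cell

# Unused levels get NA by default ...
with_na <- tapply(log_df$Flag, log_df$RequestID, sum)

# ... unless a fill value is supplied via 'default'
filled <- tapply(log_df$Flag, log_df$RequestID, sum, default = 0)
print(filled)
```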

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 
  
 
 Yet, when I summarize the output, I get the following:
 
 summary(result1)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   11.00   11.00   11.00   26.06   11.00  101.00

 summary(result2)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
   11.00   11.00   11.00   26.06   11.00  101.00  978.00
 
  
 
 Why does result2 have 978 NA's?
 
  
 
 Any help on this would be appreciated.
 
  
 
 Alex
 
  
 
  
 
  
 
  
 
 
 

