[R] aggregate factor

2007-09-07 Thread Bill Szkotnicki
Hi,
I am using aggregate to compute means for later plotting.
There are two factors involved, and the problem is that the values of the 
second factor (Age) in the means are not in the right order because 
10 comes in between 1 and 2.
What I really want is the numeric value of Age, but as.numeric and 
as.integer return the level codes instead.
Is there a way to easily get the numeric value?
I am using Windows R 2.5.1

Thanks,

> str(fishdata)
'data.frame':   372 obs. of  6 variables:
 $ Lake: Factor w/ 3 levels "EVANS","JOLLIET",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Age : int  1 1 1 1 1 1 1 1 1 1 ...
 $ TL  : int  132 120 125 115 130 120 115 110 117 116 ...
 $ W   : int  10 10 10 10 10 10 10 10 10 20 ...
 $ Sex : Factor w/ 3 levels "F","I","M": 1 1 2 2 2 1 1 1 2 2 ...
 $ WT  : num  0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 ...
> fishdatameans=aggregate(fishdata$TL, list(Lake = fishdata$Lake, 
+ Age=fishdata$Age), mean)

#  Now Age is a Factor but 10 is in the wrong position.
> fishdatameans$Age
 [1] 0  1  1  1  2  2  2  3  3  3  4  4  4  5  5  6  6  6  7  8  9  10
Levels: 0 1 10 2 3 4 5 6 7 8 9

> as.numeric(fishdatameans$Age)
 [1]  1  2  2  2  4  4  4  5  5  5  6  6  6  7  7  8  8  8  9 10 11  3

# What I want is  0  1  1  1  2  2  2  3  3  3  4  4  4  5  5  6  6  6  7  8  9  10

Bill

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aggregate factor

2007-09-07 Thread ONKELINX, Thierry
Try this.

as.numeric(levels(fishdatameans$Age))[fishdatameans$Age]
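
A minimal sketch of why this idiom works, using a toy factor rather than the poster's data (the names age, age_num and alt are assumptions for illustration). Since str(fishdata) shows Age as an integer in the raw data, the conversion presumably belongs on the aggregated column, where Age has become a factor.

```r
# Toy stand-in for the aggregated Age column (illustrative values only).
age <- factor(c("0", "1", "2", "10"))

levels(age)        # levels are sorted as text: "0" "1" "10" "2"
as.numeric(age)    # internal level codes, not the values: 1 2 4 3

# Convert the levels once, then index them by the factor's codes.
age_num <- as.numeric(levels(age))[age]

# Equivalent, slightly slower on long vectors: go through character.
alt <- as.numeric(as.character(age))
```

Sorting or plotting on age_num then puts 10 after 9 rather than between 1 and 2.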

HTH,

Thierry



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
[EMAIL PROTECTED]
www.inbo.be 

Do not put your faith in what statistics say until you have carefully
considered what they do not say.  ~William W. Watt
A statistical analysis, properly conducted, is a delicate dissection of
uncertainties, a surgery of suppositions. ~M.J.Moroney

 



[R] Aggregate daily data into weekly sums

2007-07-23 Thread Jacques Wagnor
Dear List,

I have a two-variable data frame as follows (the time period of the
actual data set is 10 years):

Date Amount
1   6/1/2007  1
2   6/1/2007  1
3   6/4/2007  2
4   6/5/2007  2
5  6/11/2007  3
6  6/12/2007  3
7  6/12/2007  3
8  6/13/2007  3
9  6/13/2007  3
10 6/18/2007  4
11 6/18/2007  4
12 6/25/2007  5
13 6/28/2007  5


Basically, I would like to collapse the daily data into weekly sums
such that the result should look like the following:

  Date Amount
1  2007/6/Week1   2
2  2007/6/Week2   4
3  2007/6/Week3   15
4  2007/6/Week4   8
5  2007/6/Week5  10

Does there already exist a function that aggregates the data at
user-defined time frequency?

Any pointers would be greatly appreciated.

Jacques

> version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          5.0
year           2007
month          04
day            23
svn rev        41293
language       R
version.string R version 2.5.0 (2007-04-23)



Re: [R] Aggregate daily data into weekly sums

2007-07-23 Thread Henrique Dallazuanna
Hi,

Perhaps you can try:

> df
         Date Amount
1  2007-06-01  1
2  2007-06-01  1
3  2007-06-04  2
4  2007-06-05  2
5  2007-06-11  3
6  2007-06-12  3
7  2007-06-12  3
8  2007-06-13  3
9  2007-06-13  3
10 2007-06-18  4
11 2007-06-18  4
12 2007-06-25  5
13 2007-06-28  5

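# Note: grouping by df$Amount matches the weeks only in this sample,
# where Amount happens to be constant within each week.
df_ok <- aggregate(df$Amount, by=list(df$Amount), FUN=sum)
levels(df_ok$Group.1) <- paste("2007/06/Week", 1:5, sep="")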
-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] Aggregate daily data into weekly sums

2007-07-23 Thread antonio rodriguez
Or,

z <- mydata  # zoo object

# replace mean with sum for weekly totals
new.time <- as.Date(7 * floor(as.numeric(time(z))/7) + 7)
z2 <- aggregate(z, new.time, mean)






Re: [R] Aggregate daily data into weekly sums

2007-07-23 Thread Gabor Grothendieck
Try this.  I have changed the output format to yyyy/mm/Weekw so it's ordered.

Lines <- "Date Amount
 6/1/2007  1
 6/1/2007  1
 6/4/2007  2
 6/5/2007  2
6/11/2007  3
6/12/2007  3
6/12/2007  3
6/13/2007  3
6/13/2007  3
6/18/2007  4
6/18/2007  4
6/25/2007  5
6/28/2007  5
"

# replace next line with
# DF <- read.table("myfile.dat", header = TRUE)
DF <- read.table(textConnection(Lines), header = TRUE)
DF$Date <- as.Date(DF$Date, "%m/%d/%Y")

# weeks since first Sunday after Epoch
# assumes week starts on Sunday.  Change 3 to 4 for Monday.
fmt <- function(x) {
    weeks <- function(x) as.numeric(x + 3) %/% 7 + 1
    sprintf("%s%05d", format(x, "%Y/%m/Week"), weeks(x) - weeks(x[1]) + 1)
}

aggregate(DF$Amount, list(Date = fmt(DF$Date)), sum)

# alternative to above using zoo.  DF and fmt are from above.
# Returns a zoo object.
library(zoo)
aggregate(zoo(DF$Amount), fmt(DF$Date), sum)
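
For comparison, a base-R sketch of the same weekly grouping with cut() on the dates. This is not the method posted above, only an alternative under stated assumptions: weeks start on Monday (the cut.Date default) and are labelled by their first day rather than as yyyy/mm/Weekw. The names dates, amount, week and weekly are illustrative.

```r
# The thread's sample data, rebuilt by hand.
dates <- as.Date(c("6/1/2007", "6/1/2007", "6/4/2007", "6/5/2007",
                   "6/11/2007", "6/12/2007", "6/12/2007", "6/13/2007",
                   "6/13/2007", "6/18/2007", "6/18/2007", "6/25/2007",
                   "6/28/2007"), format = "%m/%d/%Y")
amount <- c(1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 5)

# Assign each date to the week it falls in (Monday start by default).
week <- cut(dates, breaks = "week")

# One row per week, with the summed Amount in column x.
weekly <- aggregate(amount, by = list(Week = week), FUN = sum)
weekly
```

On this sample the five weekly sums come out as 2, 4, 15, 8 and 10, matching the result the original poster asked for. Other frequencies can be requested with breaks = "month" or breaks = "year".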



[R] aggregate by two columns, sum not working while mean is

2007-06-07 Thread Guanrao Chen
Dear Fellow Rers,

I have a table that looks like this:

ca, la, 12
ca, sd, 22
ca, la, 33
nm, al, 9
ma, lx, 18
ma, bs, 90
ma, lx, 22

I want to sum the 3rd column grouped by the first and
the second column, so that the result looks like this table:

ca, la, 45
ca, sd, 22
nm, al, 9
ma, lx, 40
ma, bs, 90

The (ca, la) and (ma, lx) rows are sums.

I tried aggregate(table, list(table$V1, table$V2), FUN) with FUN set to
sum and to mean; sum did not work, while mean did.

Can anybody give a hint?

Thanks.
Guanrao



Re: [R] aggregate by two columns, sum not working while mean is

2007-06-07 Thread jim holtman
This seems to work fine:

> x <- "ca, la, 12
+ ca, sd, 22
+ ca, la, 33
+ nm, al, 9
+ ma, lx, 18
+ ma, bs, 90
+ ma, lx, 22
+ "
> table <- read.csv(textConnection(x), header=FALSE)
> aggregate(table$V3,list(table$V1,table$V2),mean)
  Group.1 Group.2    x
1      nm      al  9.0
2      ma      bs 90.0
3      ca      la 22.5
4      ma      lx 20.0
5      ca      sd 22.0
> aggregate(table$V3,list(table$V1,table$V2),sum)
  Group.1 Group.2  x
1      nm      al  9
2      ma      bs 90
3      ca      la 45
4      ma      lx 40
5      ca      sd 22
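
One plausible reason the original call failed with sum but appeared to work with mean: passing the whole table makes aggregate apply the function to the two text columns as well. sum() stops with an error on non-numeric input, whereas mean() only warns and returns NA. A minimal sketch, with the thread's data rebuilt by hand under the assumed name tab:

```r
# The thread's data; V1 and V2 are text, V3 is numeric.
tab <- data.frame(V1 = c("ca", "ca", "ca", "nm", "ma", "ma", "ma"),
                  V2 = c("la", "sd", "la", "al", "lx", "bs", "lx"),
                  V3 = c(12, 22, 33, 9, 18, 90, 22))

# mean() on a text column warns and yields NA; sum() raises an error.
mean_on_text <- suppressWarnings(mean(tab$V1))
sum_on_text  <- try(sum(tab$V1), silent = TRUE)

# So aggregating every column with sum fails as a whole ...
whole <- try(aggregate(tab, list(tab$V1, tab$V2), sum), silent = TRUE)

# ... while aggregating only the numeric column works.
sums <- aggregate(tab["V3"], list(V1 = tab$V1, V2 = tab$V2), sum)
sums
```

Restricting the first argument to the numeric column, as in the reply above, avoids the problem for both functions.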








-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] aggregate in zoo

2007-06-01 Thread Gabor Grothendieck

Can't tell what your Z really looks like;
try posting dput(Z) or explain how to create
Z from scratch. At any rate, your code has two
problems:

1. the result is not a zoo object (that may or may not be a problem)
2. you are combining the two columns altogether
and then taking the mean of that

Try copying and pasting this into your
session:

Lines <- "date dayofmonth open
2000-02-01 01 1636.10
2000-03-01 01 1596.75
2000-04-03 03 1737.70
2000-05-01 01 1695.65
2000-06-01 01 1651.90
2000-07-03 03 1669.20
2000-08-01 01 1628.35
2000-09-01 01 1717.35
2000-10-02 02 1614.55
2000-11-01 01 1587.10
2000-12-01 01 1475.60
2001-01-02 02 1450.65
2001-02-01 01 1503.60
2001-03-01 01 1351.95
2001-04-02 02 1268.10
2001-05-01 01 1369.20
2001-06-01 01 1362.75
2001-07-02 02 1331.55
2001-08-01 01 1309.70
2001-09-04 04 1235.55
2001-10-01 01 1109.20
2001-11-01 01 1155.55
2001-12-03 03 1207.30
"

library(zoo)
z <- read.zoo(textConnection(Lines), header = TRUE)
year <- function(x) as.numeric(format(x, "%Y"))

sapply(split(z[,2], year(index(z))), mean)

# last line could be replaced with just this
aggregate(z[,2], year, mean)
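
The warning "argument is not numeric or logical" suggests the values reaching mean() were not numeric, possibly because the leading zeros in dayofmonth caused the whole object to be stored as text. That is a guess, since the structure of Z was never posted. A base-R sketch of the yearly mean without zoo, using a few rows of the data above and assumed names (dates, open_chr, open_num, yearly):

```r
# A few rows of the thread's data; "open" is deliberately kept as text here.
dates    <- as.Date(c("2000-02-01", "2000-03-01", "2000-04-03",
                      "2001-01-02", "2001-02-01"))
open_chr <- c("1636.10", "1596.75", "1737.70", "1450.65", "1503.60")

# mean() on text only warns and returns NA, as in the reported output.
bad <- suppressWarnings(mean(open_chr))

# Converting to numeric first gives one mean per calendar year.
open_num <- as.numeric(open_chr)
yearly   <- tapply(open_num, format(dates, "%Y"), mean)
yearly
```

Checking str(Z) or mode(coredata(Z)) on the real object would confirm or rule out this explanation.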



[R] Aggregate to find majority level of a factor

2007-05-31 Thread Thompson, Jonathan
I want to use the aggregate function to summarize data by a factor (my
field plots), but I want the summary to be the majority level of another
factor.

 
For example, given the dataframe:

Plot1 big
Plot1 big
Plot1 small
Plot2 big
Plot2 small
Plot2 small
Plot3 small
Plot3 small
Plot3 small


My desired result would be:
Plot1 big
Plot2 small
Plot3 small


I can't seem to find a scalar function that will give me the majority
level. 

Thanks in advance,

Jonathan Thompson



Re: [R] Aggregate to find majority level of a factor

2007-05-31 Thread Martin Henry H. Stevens
How about tapply?

plot <- gl(2,3); plot
type <- letters[c(1,2,2,1,1,1)]; type
tapply(type, list(plot), function(x) {tabl <- table(x)
                 names(tabl[tabl==max(tabl)])})

Hank




Dr. Hank Stevens, Assistant Professor
338 Pearson Hall
Botany Department
Miami University
Oxford, OH 45056

Office: (513) 529-4206
Lab: (513) 529-4262
FAX: (513) 529-4243
http://www.cas.muohio.edu/~stevenmh/
http://www.muohio.edu/ecology/
http://www.muohio.edu/botany/

E Pluribus Unum



Re: [R] Aggregate to find majority level of a factor

2007-05-31 Thread Marc Schwartz

Jonathan,

Try this:

> DF
     V1    V2
1 Plot1   big
2 Plot1   big
3 Plot1 small
4 Plot2   big
5 Plot2 small
6 Plot2 small
7 Plot3 small
8 Plot3 small
9 Plot3 small


> with(DF, aggregate(V2, list(V1), function(x) names(which.max(table(x)))))
  Group.1     x
1   Plot1   big
2   Plot2 small
3   Plot3 small


See ?which.max, ?names and ?table.
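
One point worth knowing before relying on which.max here: it returns the first maximum, so a tie is resolved silently in favour of whichever level sorts first. A small sketch (the helper name majority is an assumption for illustration):

```r
# Majority level of a vector: the name of the largest table count.
majority <- function(x) names(which.max(table(x)))

clear <- majority(c("big", "big", "small"))   # a clear majority

# With a 1-1 tie, which.max picks the first level in sorted order.
tied <- majority(c("small", "big"))
```

Here clear is "big", and tied is also "big" even though neither level has a majority. If ties need to be flagged rather than resolved, the counts have to be compared explicitly.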

HTH,

Marc Schwartz



Re: [R] Aggregate to find majority level of a factor

2007-05-31 Thread Peter Alspach

Jon

One way:  assuming your data.frame is 'jon'

aggregate(jon[,2], list(jon[,1]), function(x)
levels(x)[which.max(table(x))])
  Group.1 x
1   Plot1   big
2   Plot2 small
3   Plot3 small 

HTH 

Peter Alspach



Re: [R] Aggregate to find majority level of a factor

2007-05-31 Thread Mike Lawrence
This should do the trick. Also labels ties with NA.

a=as.data.frame(cbind(c(1,1,1,2,2,2,3,3,3,4,4),
  c('big','big','small','big','small','small','small','small','small','big','small')))
a$V2=factor(a$V2)

maj=function(x){
  y=table(x)
  z=which.max(y)
  if(sum(y==max(y))==1){
    return(names(y)[z])
  }else{
    return(NA)
  }
}

aggregate(a$V2,list(a$V1),maj)



--
Mike Lawrence
Graduate Student, Department of Psychology, Dalhousie University

Website: http://myweb.dal.ca/mc973993
Public calendar: http://icalx.com/public/informavore/Public

The road to wisdom? Well, it's plain and simple to express:
Err and err and err again, but less and less and less.
- Piet Hein



[R] aggregate in zoo

2007-05-31 Thread Alfonso Sammassimo
Hi R-experts,

Thanks very much to Jim Holtman and Gabor on my previous question.

I am having another problem with data manipulation in zoo. The following is 
data (Z) for the first business day of every month in zoo format. I am trying to 
get the mean of open for each year. I subset Z <- Z[,2] then

> sapply(split(Z, format(index(Z), "%Y")),mean)

I get error message:

2000 2001 2002 2003 2004 2005 2006 2007
  NA   NA   NA   NA   NA   NA   NA   NA
Warning messages:
1: argument is not numeric or logical: returning NA in: mean.default(X[[1]], 
...)
2: argument is not numeric or logical: returning NA in: mean.default(X[[2]], 
...)
etc...

Any help on what I'm missing would be appreciated. I am particularly 
confused by the fact that the command used works fine on the original data 
file (i.e. before subsetting by first day of month). Sorry if I have 
overlooked something very simple.

> Z
           dayofmonth    open
2000-02-01         01 1636.10
2000-03-01         01 1596.75
2000-04-03         03 1737.70
2000-05-01         01 1695.65
2000-06-01         01 1651.90
2000-07-03         03 1669.20
2000-08-01         01 1628.35
2000-09-01         01 1717.35
2000-10-02         02 1614.55
2000-11-01         01 1587.10
2000-12-01         01 1475.60
2001-01-02         02 1450.65
2001-02-01         01 1503.60
2001-03-01         01 1351.95
2001-04-02         02 1268.10
2001-05-01         01 1369.20
2001-06-01         01 1362.75
2001-07-02         02 1331.55
2001-08-01         01 1309.70
2001-09-04         04 1235.55
2001-10-01         01 1109.20
2001-11-01         01 1155.55
2001-12-03         03 1207.30

Thank you,
Alfonso Sammassimo
Melbourne, Australia



[R] aggregate similar to SPSS

2007-04-25 Thread Natalie O'Toole
Hi,

Does anyone know whether R can take a set of numbers and aggregate
them the way SPSS can? For example, could you calculate the percentage
of people who smoke based on a dataset like the following:

smoke = 1
non-smoke = 2

variable
1
1
1
2
2
1
1
1
2
2
2
2
2
2


When aggregated, SPSS can tell you what percentage of persons are smokers
based on the frequency of 1's and 2's. Can R do a
similar thing?

Thanks,

Nat



Re: [R] aggregate similar to SPSS

2007-04-25 Thread Dylan Beaudette
?table
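
A short sketch of what ?table leads to for the data in the question. The vector name smoke and the labels are assumptions for illustration; the codes 1 = smoke and 2 = non-smoke are from the original post.

```r
# The fourteen observations from the question.
smoke <- c(1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2)

# Attach readable labels to the codes, then count each category.
status <- factor(smoke, levels = c(1, 2), labels = c("smoke", "non-smoke"))
counts <- table(status)

# Convert the counts to percentages of the total.
pct <- 100 * prop.table(counts)
round(pct, 1)

# The single figure asked for can also be computed directly.
pct_smokers <- 100 * mean(smoke == 1)
```

With 6 smokers out of 14 people, the smoker share is about 42.9 percent.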


-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341



Re: [R] aggregate similar to SPSS

2007-04-25 Thread Andrew Robinson
Hi Nat,

can I suggest, without offending, that you purchase and read Peter
Dalgaard's "Introductory Statistics with R" or Michael Crawley's
"Statistics: An Introduction using R" or Venables and Ripley's "Modern
Applied Statistics with S" or Maindonald and Braun's "Data Analysis
and Graphics Using R: An Example-based Approach",

or

download and read "An Introduction to R"

http://cran.r-project.org/doc/manuals/R-intro.pdf

or one of the numerous contributed documents at

http://cran.r-project.org/other-docs.html

?

I hope that this helps,

Andrew.


-- 
Andrew Robinson
Department of Mathematics and Statistics            Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/



Re: [R] aggregate similar to SPSS

2007-04-25 Thread Tim Churches

For Natalie, who is an SPSS user, may I strongly recommend "R FOR SAS AND SPSS 
USERS" by Bob Muenchen at http://oit.utk.edu/scc/RforSASSPSSusers.pdf

This is a really, really excellent document which has proven to be an 
invaluable resource in introducing my SAS- and SPSS-using colleagues to the 
delights of R.

And it is free (as in available at no cost).

Tim C



[R] aggregate function

2007-04-23 Thread Michel Schnitz
Hello,

is there a way to use the aggregate function to calculate monthly means 
when one column of the data frame holds the date in the form 
yyyy-mm-dd? i know that it works for daily means. i would also like to do it 
for monthly and yearly means. maybe there is something like aggregate(x, 
list(Date[%m]), mean)?
the data frame looks like:

Date        Time    z
2006-01-01  21:00   6,2
2006-01-01  22:00   5,7
2006-01-01  23:00   3,2
2006-01-02  00:00   7,8
2006-01-02  01:00   6,8
2006-01-02  02:00   5,6
.
.
.
2007-03-30  22:00   5,2
2007-03-30  23:00   8,3
2007-03-31  00:00   6,4
2007-03-31  01:00   7,4

thanks for help!
-- 
Michél Schnitz
[EMAIL PROTECTED]



Re: [R] aggregate function

2007-04-23 Thread Gabor Grothendieck
Try this.  The first group of lines recreates your data frame, DF, and
the last line is the aggregate:


Input <- "Date Time z
2006-01-01  21:00   6,2
2006-01-01  22:00   5,7
2006-01-01  23:00   3,2
2006-01-02  00:00   7,8
2006-01-02  01:00   6,8
2006-01-02  02:00   5,6
2007-03-30  22:00   5,2
2007-03-30  23:00   8,3
2007-03-31  00:00   6,4
2007-03-31  01:00   7,4"

DF <- read.table(textConnection(Input), header = TRUE, as.is = TRUE)
DF$z <- as.numeric(sub(",", ".", DF$z))
DF$Date <- as.Date(DF$Date)

aggregate(DF["z"], list(yearmon = format(DF$Date, "%Y-%m")), mean)





Re: [R] aggregate function

2007-04-23 Thread Michel Schnitz
it works. thanks a lot.


-- 
Michél Schnitz
[EMAIL PROTECTED]

Scharrenstrasse 07
06108 Halle-Saale
phone: +0049-(0)345- 290 85 24
mobile:+0049-(0)176- 239 000 64



Re: [R] aggregate function

2007-04-23 Thread Martin Becker
If "monthly" means aggregating per yyyy-mm combination, you could try 
something like

   aggregate(x$z, list(cut(as.Date(x$Date), "m")), mean)

for monthly aggregation and

   aggregate(x$z, list(cut(as.Date(x$Date), "y")), mean)

for yearly means.
If monthly aggregation should pool the same month across different years (and 
produce only 12 numbers), maybe

   aggregate(x$z, list(format(as.Date(x$Date), "%m")), mean)

works (everything untested).
Be sure to use R 2.4.1 patched or 2.5.0, since there was a bug in 
cut.Date which prevents the yearly aggregation from working properly 
before R 2.4.1 patched!

Regards,

  Martin
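
Since the suggestions above are marked as untested, here is a small self-contained sketch of the three groupings. It assumes a toy data frame x whose values are made up for illustration, with z already numeric (decimal commas converted) and Date stored as text in yyyy-mm-dd form. It groups on format() for the yearly and pooled-month cases and uses cut() with the spelled-out break "month" for comparison.

```r
# Toy data, invented for illustration only.
x <- data.frame(
  Date = c("2006-01-01", "2006-01-01", "2006-01-02", "2006-02-10",
           "2007-01-15", "2007-03-30", "2007-03-31"),
  z    = c(6.2, 5.7, 7.8, 4.0, 3.0, 5.2, 6.4),
  stringsAsFactors = FALSE
)
d <- as.Date(x$Date)

# One mean per year-month combination (2006-01, 2006-02, ...).
monthly <- aggregate(x$z, list(month = format(d, "%Y-%m")), mean)

# One mean per calendar year.
yearly <- aggregate(x$z, list(year = format(d, "%Y")), mean)

# One mean per calendar month, pooled across years (at most 12 rows).
by_calendar_month <- aggregate(x$z, list(month = format(d, "%m")), mean)

# The cut() variant: groups are labelled by the first day of each month,
# and aggregate() drops months that contain no observations.
monthly_cut <- aggregate(x$z, list(month = cut(d, "month")), mean)

print(monthly)
print(yearly)
print(by_calendar_month)
```

The format() and cut() variants give the same monthly means; they differ only in how the groups are labelled.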



[R] Aggregate with numerous factors

2006-12-18 Thread Joachim Claudet
Dear list members,

I am facing some problems using the aggregate() function.
I want to calculate a sum and a mean of one variable over the 
combination of 12 factors with the aggregate() function to avoid loops 
but it doesn't work (or the job is far too long, it exceeds 2 hours). It 
works with a fewer number of factors, so I constructed a factor being 
the levels combination of 7 factors (I need the other ones being on 
their own). I had then 6 factors, but it still doesn't work.
Could someone tell me how to fix the problem or know another function I 
could use ?
Thank you very much,
Joachim Claudet.

-- 
º)))  º)))  º)))  º)))  º)))  º)))  º)))  º)))

Joachim Claudet

PhD

EPHE - CNRS FRE 2935
52, avenue Paul Alduy
66860 Perpignan cedex
Tel : 33 4 68662055
Fax : 33 4 68503686


º)))  º)))  º)))  º)))  º)))  º)))  º)))  º)))



Re: [R] Aggregate with numerous factors

2006-12-18 Thread Peter Dalgaard
aggregate() is (currently) a wrapper for tapply(), so it generates a table
which is indexed by the Cartesian product of all the factors. If many cells
are empty, you can reduce the work by calculating the interaction factor up
front and removing levels that are not present in the data. This is pretty
much the idea you already had, unless you forgot the bit about removing unused
levels. You could potentially extend the idea to all 12 factors, and then
extract the ones you want on their own from the result.
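
The interaction-factor idea described above might look like the following sketch. It assumes three grouping factors and simulated data (the names d, f1, f2, f3, y and cell are invented for illustration); interaction(..., drop = TRUE) is what discards the combinations that never occur, so tapply() only sees the cells that are present.

```r
set.seed(1)
n <- 200
# Simulated data: three grouping factors and one response.
d <- data.frame(
  f1 = factor(sample(letters[1:5], n, replace = TRUE)),
  f2 = factor(sample(LETTERS[1:5], n, replace = TRUE)),
  f3 = factor(sample(1:5, n, replace = TRUE)),
  y  = rnorm(n)
)

# One combined grouping factor; drop = TRUE keeps only observed combinations.
cell <- interaction(d$f1, d$f2, d$f3, drop = TRUE)

# tapply() over a single factor avoids the full Cartesian table.
sums  <- tapply(d$y, cell, sum)
means <- tapply(d$y, cell, mean)

# Recover the original factor values from the first row of each cell.
idx  <- match(levels(cell), cell)
keys <- d[idx, c("f1", "f2", "f3")]

res <- data.frame(keys, sum = as.vector(sums), mean = as.vector(means),
                  row.names = NULL)
head(res)
```

With 12 factors the same pattern applies: pass all of them to interaction() and pick out the key columns you need from keys afterwards.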

Alternatively, rewrite aggregate() and send us a patch ;-)

It is not necessarily all that hard. Here's a rough idea

IX <- as.data.frame(by)
OO <- do.call(order,IX)
Y <- x[OO,]
g <- cumsum(!duplicated(IX))
FF <- unique(IX)
cbind(FF, sapply(split(x,g),FUN))

(completely untested, of course, and if it works, it works only for a
single-column x; otherwise, you need a loop over the columns somehow.)

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] Aggregate with numerous factors

2006-12-18 Thread Peter Dalgaard
Peter Dalgaard wrote:
 Alternatively, rewrite aggregate() and send us a patch ;-)

 It is not necessarily all that hard. Here's a rough idea

 IX <- as.data.frame(by)
 OO <- do.call(order,IX)
 Y <- x[OO,]
 g <- cumsum(!duplicated(IX))
 FF <- unique(IX)
 cbind(FF, sapply(split(x,g),FUN))

 (completely untested, of course, and if it works, it works only for a
 single-column x; otherwise, you need a loop over the columns somehow.)
   
I see two glaring blunders already...

You need IX[OO,] in two places, and split(Y, g) not x
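
With those two corrections applied, the rough idea might be written as the sketch below. This is one reading of the outline, not a tested replacement for aggregate(): the function name sorted_aggregate is invented here, x is assumed to be a plain vector (the single-column case), and FUN is assumed to return one value per group.

```r
# Hypothetical helper following the corrected outline: sort once, then split.
sorted_aggregate <- function(x, by, FUN, ...) {
  IX <- as.data.frame(by)
  # Order rows by all grouping columns; unname() avoids clashes between
  # column names and order()'s own argument names.
  OO <- do.call(order, unname(as.list(IX)))
  IX <- IX[OO, , drop = FALSE]       # the sorted keys are used in both places
  Y  <- x[OO]
  g  <- cumsum(!duplicated(IX))      # running group number over sorted rows
  FF <- unique(IX)                   # one row of keys per group
  data.frame(FF, x = as.vector(sapply(split(Y, g), FUN, ...)),
             row.names = NULL)
}

# Usage on simulated data, compared against aggregate().
set.seed(42)
by <- list(a = sample(letters[1:3], 50, replace = TRUE),
           b = sample(1:4, 50, replace = TRUE))
x   <- rnorm(50)
res <- sorted_aggregate(x, by, mean)
ref <- aggregate(x, by, mean)
ref <- ref[order(ref$a, ref$b), ]    # aggregate() orders its rows differently
```

Only combinations that actually occur are ever visited, which is the point of the approach when most cells of the Cartesian product are empty.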

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



[R] Aggregate?

2006-12-08 Thread Gustaf Rydevik
Hi All,

I think I'm failing to understand how aggregate() is supposed to work.

example:

> test1 <- sample(c(0,1),100,replace=T)
> test2 <- sample(letters,100,replace=T)
> aggregate(test1,list(test2),sum)
Error in data.frame(w, lapply(y, unlist, use.names = FALSE)) :
arguments imply differing number of rows: 26, 0

I thought this would give me a list containing the number of ones that
belong to each letter. What am I doing wrong?

Thanks in advance,

Gustaf

-- 
email:[EMAIL PROTECTED]
tel: +46(0)703051451
address: Kantorsgatan 50:190 75424 Uppsala Sweden



Re: [R] Aggregate?

2006-12-08 Thread Ingmar Visser
It does that for me without errors ...
(R 2.3.1 on Mac OSX 10.4.8)
Best, Ingmar




Re: [R] Aggregate?

2006-12-08 Thread Petr Pikal
Hi

Look at your workspace with ls(). I bet there is some mismatch in 
variables, as your example works for me without any error. You 
probably redefined the sum function.

> test1 <- sample(c(0,1),100,replace=T)
> test2 <- sample(letters,100,replace=T)
> aggregate(test1,list(test2),sum)
  Group.1 x
1       b 1
2       c 3
3       d 1
4       e 4

> sum <- 5
> aggregate(test1,list(test2),sum)
Error in FUN(X[[1]], ...) : argument INDEX is missing, with no 
default

HTH
Petr
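
The diagnosis above can be checked along these lines. This is a hedged sketch: whether a masking object actually breaks aggregate() depends on the R version, so the example does not reproduce the error and only shows how to detect the masking object, work around it, and remove it. The names masked, counts and restored are illustrative.

```r
set.seed(1)
test1 <- sample(c(0, 1), 100, replace = TRUE)
test2 <- sample(letters, 100, replace = TRUE)

sum <- 5                      # accidentally mask base::sum with a number

# ls() lists user-defined objects, so the stray 'sum' shows up here.
masked <- "sum" %in% ls()

# Naming the namespace explicitly sidesteps the masking object.
counts <- aggregate(test1, list(letter = test2), base::sum)

rm(sum)                       # remove the masking object
restored <- is.function(sum)  # 'sum' resolves to the base function again
```

After rm(sum) the plain call aggregate(test1, list(test2), sum) refers to the base function again.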






Petr Pikal
[EMAIL PROTECTED]



Re: [R] Aggregate with multiple statistics?

2006-10-24 Thread Alex Brown
Try the summary function, which pretty much does exactly that.

-Alex




[R] Aggregate with multiple statistics?

2006-10-20 Thread Jonathan Greenberg
Is there a way to calculate, say, the mean, min and max using aggregate
using one line of code?  Or do I need to call them separately (e.g.
aggregate(...,mean); aggregate(...,min)) and then merge the data back
together?

--j

-- 
Jonathan A. Greenberg, PhD
NRC Research Associate
NASA Ames Research Center
MS 242-4
Moffett Field, CA 94035-1000
Office: 650-604-5896
Cell: 415-794-5043
AIM: jgrn307
MSN: [EMAIL PROTECTED]



Re: [R] Aggregate with multiple statistics?

2006-10-20 Thread Gabor Grothendieck
Try summaryBy in package doBy. e.g. using the built in dataset CO2:

summaryBy(uptake ~ Plant, CO2, FUN = c(mean, min, max))



Re: [R] Aggregate with multiple statistics?

2006-10-20 Thread hadley wickham
 Try summaryBy in package doBy. e.g. using the built in dataset CO2:

 summaryBy(uptake ~ Plant, CO2, FUN = c(mean, min, max))

Or with reshape with a little more work:

cm <- melt(CO2, id=1:4)
cast(cm, Type ~ Treatment, c(min,mean,max))

but you get some extra flexibility:

cast(cm, result_variable + Type ~ Treatment, c(min,mean,max))
cast(cm, Type ~ Treatment ~ result_variable, c(min,mean,max))
cast(cm, Type + Treatment ~ result_variable, c(min,mean,max))

Regards,

Hadley
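
For comparison with the package-based answers, the same one-call summary is also possible with aggregate() alone by letting FUN return a named vector. This is a minimal sketch, assuming a reasonably recent R in which aggregate() has a formula interface (it was not available when this thread was written); the names stats and flat are illustrative.

```r
# FUN returns three named statistics per group; CO2 is a built-in dataset.
stats <- aggregate(uptake ~ Plant, data = CO2,
                   FUN = function(v) c(mean = mean(v), min = min(v), max = max(v)))

# The statistics arrive as a single matrix-valued column; do.call(data.frame, ...)
# spreads them into ordinary columns uptake.mean, uptake.min, uptake.max.
flat <- do.call(data.frame, stats)

head(flat)
```

The flattening step matters mainly when the result is to be merged or written out, since a matrix column prints like several columns but is stored as one.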



[R] Aggregate Values for All Levels of a Factor

2006-10-05 Thread Kaom Te
Hello,

I'm a novice user trying to figure out how to retain NA aggregate
values. For example, given a data frame with data for 3 of the 4
possible factor colors (orange is omitted from the data frame), I want
to calculate the average height by color, but I'd like to retain the
knowledge that orange is a possible level; it's just missing. Here is
the example code:

> data <- data.frame(color = factor(c("blue","red","red","green","blue"),
+                    levels = c("blue","red","green","orange")),
+                    height = c(2,8,4,4,5))
> aggregate(data$height, list(color = data$color), mean)
  color   x
1  blue 3.5
2   red 6.0
3 green 4.0


Instead I would like to get

   color   x
1   blue 3.5
2    red 6.0
3  green 4.0
4 orange  NA

Is this possible? I've read as much documentation as I can find, but am
unable to find the solution. It seems like something people would need
to do, so I would assume it must be built in somewhere. Or do I need to
write my own version of aggregate?

Thanks in advance,
Kaom


Re: [R] Aggregate Values for All Levels of a Factor

2006-10-05 Thread Marc Schwartz

If you review the Details section of ?aggregate, you will note:

  Empty subsets are removed, ...

Thus, one approach is:

tmp <- tapply(data$height, data$color, mean, na.rm = TRUE)

> tmp
  blue    red  green orange
   3.5    6.0    4.0     NA

DF <- data.frame(color = names(tmp), mean.height = tmp, 
                 row.names = seq(along = tmp))

> DF
   color mean.height
1   blue         3.5
2    red         6.0
3  green         4.0
4 orange          NA


HTH,

Marc Schwartz
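
An alternative to the tapply() route is to keep aggregate() and then merge its result against the full set of levels. This is a sketch under the assumption that the data frame from the question is rebuilt as below; all_levels, means and result are illustrative names, and merge() returns the rows sorted by the key rather than in level order.

```r
data <- data.frame(
  color  = factor(c("blue", "red", "red", "green", "blue"),
                  levels = c("blue", "red", "green", "orange")),
  height = c(2, 8, 4, 4, 5)
)

# aggregate() drops the empty 'orange' group ...
means <- aggregate(data$height, list(color = data$color), mean)

# ... so join back onto every level; all.x = TRUE keeps unmatched levels as NA.
all_levels <- data.frame(color = levels(data$color))
result <- merge(all_levels, means, by = "color", all.x = TRUE)

print(result)
```

This generalises to several grouping factors by building all_levels with expand.grid() over their levels.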



[R] aggregate function with 'NA'

2006-10-01 Thread Frank
Dear r-help reader,

I have some problems with the aggregate function.

My data frame looks like
> frame

  Day Time V1 V2
1   M    0  3 NA
2   M    0  4 NA
3   M    0  5  2
4   M    1 NA  4
5   M    1 10  6
6   T    0  4 45
7   T    1  4  3
8   T    1  3  2
9   T    1  6  1

I used the aggregate function to obtain the mean of V1 and V2 over the 
grouping variables
Time and Day

> aggregate(frame[,c(-1)],list(frame$Day,frame$Time),mean)
  Group.1 Group.2 Time   V1 V2
1       M       0    0 4.00 NA
2       T       0    0 4.00 45
3       M       1    1   NA  5
4       T       1    1 4.33  2
 

My problem is now that I do not obtain a 'mean' for Day=M/Time=0 and 
Day=M/Time=1,

because aggregate ignores all values for a grouping variable if NA 
occurs.

I'm now hoping for some help so that the mean is still calculated in 
this group.

My table should look like:

> aggregate(frame[,c(-1)],list(frame$Day,frame$Time),mean)
  Group.1 Group.2 Time    V1 V2
1       M       0    0  4.00  2
2       T       0    0  4.00 45
3       M       1    1 10.00  5
4       T       1    1  4.33  2

 

I hope my description makes sense and appreciate any help.

Yours
Frank



Re: [R] aggregate function with 'NA'

2006-10-01 Thread Peter Dalgaard
Frank [EMAIL PROTECTED] writes:

   aggregate(frame[,c(-1)],list(frame$Day,frame$Time),mean)
 
 My problem is now that I do not obtain a 'mean' for Day=M/Time=0 and 
 Day=M/Time=1,
 
 because aggregate ignores all values for a grouping variable if NA 
 occurs.

No. But mean() will give an NA result if any values are NA.
 
 I'm now hoping for some help so that the mean is still calculated in 
 this group.

add na.rm=TRUE
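
A self-contained check of that advice, assuming the data frame from the question is re-entered by hand as below (the result name res is illustrative). Extra arguments to aggregate() are passed on to FUN, so na.rm = TRUE reaches mean().

```r
# The nine rows from the question.
frame <- data.frame(
  Day  = c("M", "M", "M", "M", "M", "T", "T", "T", "T"),
  Time = c(0, 0, 0, 1, 1, 0, 1, 1, 1),
  V1   = c(3, 4, 5, NA, 10, 4, 4, 3, 6),
  V2   = c(NA, NA, 2, 4, 6, 45, 3, 2, 1)
)

# na.rm = TRUE is handed through to mean(), which then skips the NAs
# within each Day/Time group instead of returning NA.
res <- aggregate(frame[c("V1", "V2")],
                 list(Day = frame$Day, Time = frame$Time),
                 mean, na.rm = TRUE)
print(res)
```

The M/0 group now reports V2 = 2 and the M/1 group reports V1 = 10, matching the table asked for in the question.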

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] aggregate function with 'NA'

2006-10-01 Thread Johan Sandblom
aggregate(frame[,c(-1)],list(frame$Day,frame$Time),mean, na.rm=T)



-- 
Johan Sandblom  N8, MRC, Karolinska sjh
t +46851776108  17176 Stockholm
m +46735521477  Sweden
What is wanted is not the will to believe, but the
will to find out, which is the exact opposite
- Bertrand Russell



Re: [R] aggregate function with 'NA'

2006-10-01 Thread Gabor Grothendieck
See ?mean and note the na.rm= argument:

aggregate(frame[-1], frame[1:2], mean, na.rm = TRUE)




Re: [R] aggregate example : where is the state.region variable?

2006-08-22 Thread John Kane

--- MARK LEEDS [EMAIL PROTECTED] wrote:

 these people/experts provide all these packages and documentation as a
 FAVOR and for the fact that they enjoy spreading knowledge/statistical
 computing abilities etc.  It's not their job so I think criticism of the
 docs and the fact that they use a variable from another place is kind of
 harsh.
 
  Mark
 

I am very appreciative of the time, expertise and great
helpfulness that I have seen in the R community.  

If there is no criticism of R then how do we find out
about problems that may exist?

 - Original Message - 
 From: John Kane [EMAIL PROTECTED]
 To: Gabor Grothendieck [EMAIL PROTECTED]
 Cc: R R-help r-help@stat.math.ethz.ch
 Sent: Monday, August 21, 2006 6:59 PM
 Subject: Re: [R] aggregate example : where is the state.region variable?
 
 
  --- Gabor Grothendieck [EMAIL PROTECTED] wrote:
 
   It's not part of state.x77.  It's a completely separate variable.
   Try ls("package:datasets") and notice it's in the list,
   or try ?state.region and note that it's a variable in datasets.
 
  Thanks. I was wondering if it was going something like that.
 
  However, it is a bloody stupid example, at least to a newbie.  A call
  to another data set in what is supposed to be a simple example is very
  confusing.
 
  When someone is apparently illustrating a function with a simple
  one-line command I don't expect them to call another data set,
  apparently create a new variable (Region), and use that new variable
  as the grouping variable without a word of explanation of what the
  example is doing.
 
  If I sound a bit annoyed it is because I am. It might be nice to have
  an example illustrate the function, not do a couple of other
  undocumented things as well.
 
 
   On 8/21/06, John Kane [EMAIL PROTECTED] wrote:
    I was looking at ?aggregate and ran the first example
  
     aggregate(state.x77, list(Region = state.region), mean)
  
    The variables in state.x77 appear to be:
     state.x77
     Population Income Illiteracy Life Exp Murder HS Grad Frost Area
  
    Where is the state.region variable coming from?




Re: [R] aggregate example : where is the state.region variable?

2006-08-22 Thread Martin Maechler
 Gabor == Gabor Grothendieck [EMAIL PROTECTED]
 on Mon, 21 Aug 2006 21:03:49 -0400 writes:

Gabor It is worthwhile to note that what is being
Gabor illustrated here is aggregating a numeric matrix by a
Gabor factor using the aggregate.default method and, of
Gabor course, a factor can't be part of a numeric matrix.

Gabor Of course, that is not to say that the examples could
Gabor not be improved in terms of clarity, simplicity and
Gabor comprehensiveness (there is no example of
Gabor aggregate.data.frame).

yes, thank you, Gabor . 
and we (the R developers) have accepted and incorporated
quite a few constructive proposals for improvement.

Just offending the original authors (bloody ..) without adding
any constructive proposal for improvement doesn't really help.
You can always get the money back you paid for R.
You can also decide to leave this mailing list and get the money
back you paid for that service.  Unfortunately, we can't get the
time and energy back we've lost when dealing with such postings...

Martin Maechler, ETH Zurich

Gabor On 8/21/06, John Kane [EMAIL PROTECTED] wrote:
  --- Gabor Grothendieck [EMAIL PROTECTED] wrote:
 
  It's not part of state.x77.  It's a completely separate
 variable.  Try ls("package:datasets") and notice it's in
 the list, or try ?state.region and note that it's a
 variable in datasets.
 
 Thanks. I was wondering if it was going something like
 that.
 
 However, it is a bloody stupid example, at least to a
 newbie.  A call to another data.set in what is supposed
 to be a simple example is very confusing.
 
 When someone is apparently illustrating a function with a
 simple one line command I don't expect them to call
 another data set, apparently create a new variable
 (Region), and use that new variable as the grouping
 variable without a word of explanation of what the
 example is doing.
 
 If I sound a bit annoyed it is because I am. It might be
 nice to have an example illustrate the function, not do a
 couple of other undocumented things as well.
 
 


Re: [R] aggregate example : where is the state.region variable?

2006-08-22 Thread Richard M. Heiberger
 there is
 no factor in the dataset but why there is not one and
 why a call to another dataset is totally opaque.  

The reason is purely historical.  The state dataset is about
10 years older than the data.frame concept.  At the time the
state.* variables were constructed it was not possible to put
numeric data and factor data into the same rectangular structure.



Re: [R] aggregate example : where is the state.region variable?

2006-08-22 Thread John Kane

--- Richard M. Heiberger [EMAIL PROTECTED] wrote:

  there is
  no factor in the dataset but why there is not one
 and
  why a call to another dataset is totally opaque.  
 
 The reason is purely historical.  The state dataset
 is about
 10 years older than the data.frame concept.  At the
 time the
 state.* variables were constructed it was not
 possible to put
 numeric data and factor data into the same
 rectangular structure.

I see. So originally the example would have been more
obvious.  Thanks



[R] aggregate example : where is the state.region variable?

2006-08-21 Thread John Kane
I was looking at ?aggregate and ran the first example:

 aggregate(state.x77, list(Region = state.region), mean)

The variables in state.x77 appear to be:
 state.x77
 Population Income Illiteracy Life Exp Murder HS Grad Frost Area

Where is the state.region variable coming from?



Re: [R] aggregate example : where is the state.region variable?

2006-08-21 Thread Gabor Grothendieck
It's not part of state.x77.  It's a completely separate variable.
Try ls("package:datasets") and notice it's in the list,
or try ?state.region and note that it's a variable in datasets.
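
A minimal sketch of those checks, assuming a fresh R session with the default packages attached. The variable names where, in_datasets and is_column are only illustrative; find() and ls() are base R.

```r
# Which attached packages define an object called "state.region"?
where <- find("state.region")

# Is it one of the objects exported by the datasets package?
in_datasets <- "state.region" %in% ls("package:datasets")

# Is it a column of the state.x77 matrix? (It is a separate object instead.)
is_column <- "state.region" %in% colnames(state.x77)

print(where)
print(in_datasets)
print(is_column)
```

Note that the name has to be quoted for find() and for ls("package:datasets"); without the quotes R tries to evaluate the argument instead of looking it up by name.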




Re: [R] aggregate example : where is the state.region variable?

2006-08-21 Thread Prof Brian Ripley
On Mon, 21 Aug 2006, John Kane wrote:

 I was looking ?aggregate and ran the first example
 
  aggregate(state.x77, list(Region = state.region),
 mean)
 
 The variables in state.x77 appear to be :
  state.x77
 Population Income Illiteracy Life Exp Murder HS Grad
 Frost   Area
 
 Where is the state.region variable coming from?

> find("state.region")
[1] "package:datasets"

Try ?state.region for more info.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax:  +44 1865 272595



Re: [R] aggregate example : where is the state.region variable?

2006-08-21 Thread John Kane

--- Gabor Grothendieck [EMAIL PROTECTED]
wrote:

 It's not part of state.x77.  It's a completely
 separate variable.
 Try ls("package:datasets") and notice it's in the
 list,
 or try ?state.region and note that it's a variable in
 datasets.

Thanks. I was wondering if it was going something like
that.

However, it is a bloody stupid example, at least to a
newbie.  A call to another data.set in what is
supposed to be a simple example is very confusing.

When someone is apparently illustrating a function
with a simple one line command I don't expect them to
call another data set, apparently create a new
variable (Region), and use that new variable as the
grouping variable without a word of explanation of
what the example is doing. 

If I sound a bit annoyed it is because I am. It might
be nice to have an example illustrate the function, not
do a couple of other undocumented things as well.
 
 


Re: [R] aggregate example : where is the state.region variable?

2006-08-21 Thread MARK LEEDS
These people/experts provide all these packages and documentation as a FAVOR
and because they enjoy spreading knowledge/statistical computing
abilities etc.  It's not their job, so I think criticism of the docs and of the
fact that they use a variable from another place is kind of harsh.

 
  Mark







Re: [R] aggregate example : where is the state.region variable?

2006-08-21 Thread Gabor Grothendieck
It is worthwhile to note that what is being illustrated here is aggregating a
numeric matrix by a factor using the aggregate.default method and, of course,
a factor can't be part of a numeric matrix.

Of course, that is not to say that the examples could not be improved in
terms of clarity, simplicity and comprehensiveness (there is no
example of aggregate.data.frame).
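
As a rough sketch of the two styles, assuming a current version of R (the formula interface used in the second call is not part of the thread and was not available at the time of this post), the matrix-plus-factor call from ?aggregate can be put next to a data-frame version. The names by_matrix, states and by_frame are made up for illustration.

```r
# Matrix plus a separate grouping factor, as in the ?aggregate example.
by_matrix <- aggregate(state.x77, list(Region = state.region), mean)

# Hypothetical alternative: bind the factor into a data frame first.
# data.frame() renames "Life Exp" and "HS Grad" to Life.Exp and HS.Grad.
states <- data.frame(state.x77, Region = state.region)
by_frame <- aggregate(. ~ Region, data = states, FUN = mean)

print(by_matrix[, c("Region", "Income", "Murder")])
print(by_frame[, c("Region", "Income", "Murder")])
```

Both calls group by the same factor, so the regional means should agree; the only difference is whether the grouping variable travels inside the object being aggregated.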




[R] aggregate data.frame by one column

2006-06-29 Thread Guo Wei-Wei
Hi, everyone,

I have a data.frame named eva like this:

IND PARTNO VC1 EO1 EO2 EO3 EO4 EO5
114 114001   2   5   4   4   5   4
114 114001   2   4   4   4   4   4
114 114001   2   4  NA  NA  NA  NA
112 112002   2   3   3   6   2   6
112 112002   2   1   1   3   4   4
112 112003   2   6   6   6   5   6
112 112003   2   5   7   6   6   6
112 112003   2   6   6   6   4   5
114 114004   2   2   3   3   2   4
114 114004   2   5   3   4   4   2
114 114004   2  NA  NA  NA  NA  NA
113 113005   2   5   5   6   6   5
113 113005   2   7   7   4   7   6
111 111006   2   5   7   7   7   7
112 112007   2   7   7   7   2   2
112 112007   2   6   6   6   1   2
112 112007   2   7   6   6   2   2
111 111008   2   4   1   3   1   4
111 111008   2   3   1   5   3   2

This is only a small part of the whole data. PARTNO is a numeric ID
and I want to use it as a grouping variable to aggregate the other variables.
What I want to get looks like this:

IND PARTNO NUM VC1 EO1 EO2 EO3 EO4 EO5
114 114001   3   2 4.3   4   4 4.5   4
112 112002   2   2   2   2 4.5   3   5
112 112003   3   2 5.7 6.3   6   5 5.7
114 114004   3   2 3.5   3 3.5   3   3
113 113005   2   2   6   6   5 6.5 5.5
111 111006   1   2   5   7   7   7   7
112 112007   3   2 6.7 6.3 6.3 1.7   2
111 111008   2   2 3.5   1   4   2   3

NUM is a newly added variable that gives the number of cases
in each group defined by PARTNO.

I have two questions about this manipulation.

The first is how to get the new variable NUM. I have no idea
how to do this.

The second is how to average the other variables by group. If there are
NAs, I want the average to be computed over the remaining cases. For
example, the variable EO1 has values of 2, 5, and NA for PARTNO 114004.
What I have done is

 aggregate(eva[,-2], by=eva[,-2], mean)

But it seems that aggregate cannot process this because of the NAs.
Because the NA values are not a small part of the data, I cannot use
imputation methods. I'm not sure whether my call is right.

Does anyone have any suggestions on these two problems? Thanks in advance!



Re: [R] aggregate data.frame by one column

2006-06-29 Thread Andrew Robinson
Hi Wei-Wei,

try this:

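# group means, ignoring NAs within each PARTNO
eva.agg <- aggregate(x = list(
   VC1=eva$VC1,
   EO1=eva$EO1,
   EO2=eva$EO2,
   EO3=eva$EO3,
   EO4=eva$EO4,
   EO5=eva$EO5
   ),
 by = list(PARTNO=eva$PARTNO),
 FUN = mean, na.rm = TRUE)

# number of cases per PARTNO; $x pulls the counts out of aggregate()'s result
eva.agg$NUM <- aggregate(eva$PARTNO, list(eva$PARTNO), length)$x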
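
A small self-contained sketch of the same idea, using a made-up six-row subset of the data (two PARTNO groups and two of the EO columns only). The names means, counts and result are illustrative, and merging on PARTNO is just one way to combine the two pieces.

```r
# Toy subset: two groups, with missing values in the last row of each.
eva <- data.frame(
  PARTNO = c(114001, 114001, 114001, 114004, 114004, 114004),
  EO1    = c(5, 4, 4, 2, 5, NA),
  EO2    = c(4, 4, NA, 3, 3, NA)
)

# Group means; na.rm = TRUE is passed on to mean(), so NAs are skipped.
means <- aggregate(eva[c("EO1", "EO2")],
                   by = list(PARTNO = eva$PARTNO),
                   FUN = mean, na.rm = TRUE)

# Group sizes: length() counts every row, including those with NAs.
counts <- aggregate(list(NUM = eva$PARTNO),
                    by = list(PARTNO = eva$PARTNO),
                    FUN = length)

# Join on the key instead of relying on the row order of the two results.
result <- merge(counts, means, by = "PARTNO")
print(result)
```

NUM counts all rows in a group, so 114004 gets 3 even though only two of its EO1 values are observed, which matches the NUM column in the desired output.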


Cheers

Andrew



-- 
Andrew Robinson  
Department of Mathematics and Statistics    Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
Email: [EMAIL PROTECTED] http://www.ms.unimelb.edu.au



Re: [R] aggregate data.frame by one column

2006-06-29 Thread Guo Wei-Wei
Hi Andrew,

Thank you very much! It works even better than I expected.

All the best,
Wei-Wei



[R] Aggregate?

2006-05-03 Thread Guenther, Cameron
Hello,

I have a data set with a grouping variable (TRIPID) and  several other
variables.  TRIPID is repeated in some areas and I would like to use a
function like aggregate to sum the variable UNITS according to TRIPID.
However I would also like to retain the other variables as they are in
the data set with the new summed TRIPID.

So what I have is something like this:

YEAR MONTH DAY CONTINUE       SPL AREA COUNTY DEPTH DEPUNIT GEAR GEAR2 TRAPS SOAKTIME UNITS FACTOR DISPOSIT NUMSETS TRIPST      TRIPID
1992     1  26        1 SP0073928    8     25     4      NA  100    NA    NA       NA   161      1       NA      NA     NA 02163399054
1992     1  26        1 SP0073928    8     25     4      NA  100    NA    NA       NA     8      1       NA      NA     NA 02163399054
1992     1  26        2 SP0004228    8     25     4      NA  100    NA    NA       NA   161      1       NA      NA     NA 02163399054
1992     1  26        2 SP0004228    8     25     4      NA  100    NA    NA       NA     8      1       NA      NA     NA 02163399054
1992     1  25       NA SP0052652    8     25     4      NA  100    NA    NA       NA    85      1       NA      NA     NA 02163399057
1992     1  26       NA SP0037940    8     25     4      NA  100    NA    NA       NA    70      1       NA      NA     NA 02163399058
1992     1  27       NA SP0072357    8     25     4      NA  100    NA    NA       NA    15      1       NA      NA     NA 02163399059
1992     1  27       NA SP0072357    8     25     4      NA  100    NA    NA       NA    20      1       NA      NA     NA 02163399059
1992     1  27       NA SP0026324    8     25     4      NA  100    NA    NA       NA     8      1       NA      NA     NA 02163399060
1992     1  28        1 SP0072357    8     25     4      NA  100    NA    NA       NA   200      1       NA      NA     NA 02163399062

And what I want is this:

YEAR MONTH DAY CONTINUE       SPL AREA COUNTY DEPTH DEPUNIT GEAR GEAR2 TRAPS SOAKTIME UNITS FACTOR DISPOSIT NUMSETS TRIPST      TRIPID
1992     1  26        1 SP0073928    8     25     4      NA  100    NA    NA       NA   338      1       NA      NA     NA 02163399054
1992     1  25       NA SP0052652    8     25     4      NA  100    NA    NA       NA    85      1       NA      NA     NA 02163399057
1992     1  26       NA SP0037940    8     25     4      NA  100    NA    NA       NA    70      1       NA      NA     NA 02163399058
1992     1  27       NA SP0072357    8     25     4      NA  100    NA    NA       NA    35      1       NA      NA     NA 02163399059
1992     1  27       NA SP0026324    8     25     4      NA  100    NA    NA       NA     8      1       NA      NA     NA 02163399060
1992     1  28        1 SP0072357    8     25     4      NA  100    NA    NA       NA   200      1       NA      NA     NA 02163399062

 
Does anyone know how to do this?  The data file is attached.
Thanks in advance

Cameron Guenther, Ph.D. 
Associate Research Scientist
FWC/FWRI, Marine Fisheries Research
100 8th Avenue S.E.
St. Petersburg, FL 33701
(727)896-8626 Ext. 4305
[EMAIL PROTECTED]



Re: [R] Aggregate?

2006-05-03 Thread Gabor Grothendieck
Suppose we want to sum C over levels of A and that B is constant
within levels of A.  Then:

DF <- data.frame(A = gl(2,2), B = gl(2,2), C = 1:4)  # test data
# for each level of A, keep the first row and replace C with the group sum
do.call("rbind", by(DF, DF$A, function(x) replace(x[1,], "C", sum(x$C))))
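
The same pattern can be sketched on a made-up miniature of the trip data. The data frame trips, its three columns and the short TRIPID values are invented for illustration, and the sketch assumes, as above, that the retained columns are constant within a TRIPID (otherwise the first row's values win).

```r
# Hypothetical miniature of the trip data: one row per landing.
trips <- data.frame(
  TRIPID = c("A", "A", "B", "C", "C"),
  DAY    = c(26, 26, 25, 27, 27),
  UNITS  = c(161, 8, 85, 15, 20),
  stringsAsFactors = FALSE
)

# For each TRIPID keep the first row and overwrite UNITS with the group total.
collapsed <- do.call("rbind", by(trips, trips$TRIPID, function(g) {
  first <- g[1, ]
  first$UNITS <- sum(g$UNITS)
  first
}))

print(collapsed)
```

Keeping the first row is what preserves the non-summed columns; aggregate() alone would return only the grouping variable and the summed column.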




Re: [R] aggregate function....

2006-03-31 Thread Stephane CRUVEILLER
Nice trick, thx...

Stéphane.

-- 
==
Stephane CRUVEILLER Ph. D.
Genoscope - Centre National de Sequencage
Atelier de Genomique Comparative
2, Rue Gaston Cremieux   CP 5706
91057 Evry Cedex - France
Phone: +33 (0)1 60 87 84 58
Fax: +33 (0)1 60 87 25 14
EMails: [EMAIL PROTECTED] ,[EMAIL PROTECTED]



[R] aggregate function....

2006-03-29 Thread Stephane CRUVEILLER
Dear R users,

I have some trouble with the aggregate function. Here are my data

 daf
  S_id AF_Class count... R_gc_percent S_length
5  82644971   30 0.4835678
6  826449737 0.4835678
8  82645541   31 0.5138894
9  82645542   11 0.5138894
10 826455431 0.5138894

for a given S_id, I would like to select the line corresponding to the
max count. To perform this, I used:
  aggregate(daf,list(daf$S_id),max)
  Group.1S_id AF_Class count... R_gc_percent S_length
1 8264497 82644973   30 0.4835678
2 8264554 82645543   31 0.5138894

which is OK for the count. But I realized that the max function is also
applied to AF_Class (it should be 1 and 1 instead of 3 and 3), so it
seems that aggregate is not the appropriate function for what I want to
do. Is there any other function I could use instead?

Best wishes,


Stéphane.
-- 
==
Stephane CRUVEILLER Ph. D.
Genoscope - Centre National de Sequencage
Atelier de Genomique Comparative
2, Rue Gaston Cremieux   CP 5706
91057 Evry Cedex - France
Phone: +33 (0)1 60 87 84 58
Fax: +33 (0)1 60 87 25 14
EMails: [EMAIL PROTECTED] ,[EMAIL PROTECTED]



Re: [R] aggregate function....

2006-03-29 Thread jim holtman
try 'by':

 x
  S_id AF_Class count... R_gc_percent S_length
5  82644971   30 0.4835678
6  826449737 0.4835678
8  82645541   31 0.5138894
9  82645542   11 0.5138894
10 826455431 0.5138894
 do.call('rbind', by(x, x$S_id, function(y) y[which.max(y$AF_Class),]))
   S_id AF_Class count... R_gc_percent S_length
8264497 826449737 0.4835678
8264554 826455431 0.5138894
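
The call above keeps the row with the largest AF_Class in each group. If the row with the largest count is wanted, as in the original question, the same idiom can be pointed at the count column instead. This is only a sketch: the data frame below is invented, loosely modelled on the posted rows, and the column name count is an assumption because the posted header is truncated.

```r
# Made-up data: two S_id groups with several AF_Class rows each.
daf <- data.frame(
  S_id     = c(8264497, 8264497, 8264554, 8264554, 8264554),
  AF_Class = c(1, 3, 1, 2, 3),
  count    = c(30, 7, 31, 11, 1)
)

# Within each S_id keep the whole row whose count is largest.
best <- do.call("rbind", by(daf, daf$S_id, function(g) g[which.max(g$count), ]))

print(best)
```

Because whole rows are selected, AF_Class stays paired with its own count, which is what applying max() column by column in aggregate() cannot guarantee.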








--
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)

What is the problem you are trying to solve?


[R] aggregate data.frame using column-specific functions

2006-02-15 Thread Markus Preisetanz
Dear Colleagues,

does anybody know how to aggregate a data.frame using different functions for
different columns?

Sincerely

___
Markus Preisetanz
Consultant

Client Vela GmbH
Albert-Roßhaupter-Str. 32
81369 München
fon:  +49 (0) 89 742 17-113
fax:  +49 (0) 89 742 17-150
mailto:[EMAIL PROTECTED] mailto:[EMAIL PROTECTED]




Re: [R] aggregate data.frame using column-specific functions

2006-02-15 Thread Jacques VESLOT
you can use mapply()...

z <- as.data.frame(matrix(1:3, 3, 3, byrow = TRUE))
# apply the i-th function to the i-th column of z
mapply(function(x,y) x(y), c(sum,prod,sum), z)
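
The mapply() call summarises whole columns. If the summaries are also wanted per group, one possible sketch is to run aggregate() once per column and merge the pieces. Everything here (df, the grouping column g, the funs list) is made up for illustration.

```r
# Toy data: one grouping column and two value columns.
df <- data.frame(
  g = c("a", "a", "b", "b"),
  x = c(1, 2, 3, 4),
  y = c(10, 20, 30, 40)
)

# Which function to apply to which column (illustrative choice).
funs <- list(x = sum, y = mean)

# One aggregate() call per column, each grouped by g.
parts <- lapply(names(funs), function(col) {
  aggregate(df[col], by = list(g = df$g), FUN = funs[[col]])
})

# Merge the per-column results back together on the grouping key.
result <- Reduce(function(a, b) merge(a, b, by = "g"), parts)
print(result)
```

Merging on g keeps the columns aligned by group even if the individual results come back in a different row order.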




Re: [R] aggregate data.frame using column-specific functions

2006-02-15 Thread justin bem
Hi,

Have you tried
  ?aggregate

e.g.  df1 <- aggregate(mydata, list(mean1=x1, mean2=x2), mean)




Re: [R] aggregate vs tapply; is there a middle ground?

2006-02-12 Thread Hans Gardfjell
Thanks Peter!

I had a feeling that there must be a simpler, better, more elegant 
solution.

/Hans




-- 

*
Hans Gardfjell
Ecology and Environmental Science
Umeå University
90187 Umeå, Sweden
email: [EMAIL PROTECTED]
phone:  +46 907865267
mobile: +46 705984464



[R] aggregate vs tapply; is there a middle ground?

2006-02-11 Thread Joseph LeBouton
Dear all,

I'm wanting to do a series of comparisons among 4 categorical variables:

a <- aggregate(y, list(var1, var2, var3, var4), sum)

This gets me a very nice 2-dimensional data frame with one column per 
variable, BUT, as help for aggregate says, empty subsets are 
removed.  I don't see in help(aggregate) how I can change this.

In contrast,
a <- tapply(y, list(var1, var2, var3, var4), sum)

gives me results for everything including empty subsets, but in an 
awkward 4-dimensional array that takes me another 10 lines of 
inefficient code to turn into a 2D data.frame.

Is there a way to directly do this calculation INCLUDING results for 
empty subsets, and still obtain a 2D array, matrix, or data.frame?  OR 
alternatively is there a simple way to mush the 4D result from the 
tapply into a 2D matrix/data.frame?

thanks very much in advance for any help!

-jlb

-- 

Joseph P. LeBouton
Forest Ecology PhD Candidate
Department of Forestry
Michigan State University
East Lansing, Michigan 48824

Office phone: 517-355-7744
email: [EMAIL PROTECTED]



[R] aggregate vs tapply; is there a middle ground?

2006-02-11 Thread Hans Gardfjell
I faced a similar problem. Here's what I did

tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                  B = sample(letters[1:5], 10, replace = TRUE),
                  C = rnorm(10))
tmp1 <- with(tmp, aggregate(C, list(A = A, B = B), sum))
tmp2 <- expand.grid(A = sort(unique(tmp$A)), B = sort(unique(tmp$B)))
merge(tmp2, tmp1, all.x = TRUE)

At least fewer than 10 extra lines of code. Anyone with a simpler solution?

Cheers, Hans



-- 

*
Hans Gardfjell
Ecology and Environmental Science
Umeå University
90187 Umeå, Sweden
email: [EMAIL PROTECTED]
phone:  +46 907865267
mobile: +46 705984464



Re: [R] aggregate vs tapply; is there a middle ground?

2006-02-11 Thread hadley wickham
Well, you can almost do this with the reshape package:

tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                  B = sample(letters[1:5], 10, replace = TRUE),
                  C = rnorm(10))
a <- recast(tmp, A + B ~ ., sum)
# see also recast(tmp, A ~ B, sum)
add.all.combinations(a, row=A, cols = B)

Where add.all.combinations basically does what you outlined above --
it would be easy enough to generalise to multiple dimensions.

Hadley



Re: [R] aggregate vs tapply; is there a middle ground?

2006-02-11 Thread Joseph LeBouton
Thanks, Phil!  I've literally spent two hours on my own trying to find 
something that does exactly that.  Thanks for another pair of functions 
added to my (slowly!) growing R vocabulary.

-jlb

Phil Spector wrote:
 Joseph -
 I'm sure there are clearer and more efficient ways to do it, but
 here's something that seems to do what you want:

 z = tapply(y, list(var1, var2, var3, var4), sum)
 data.frame(do.call('expand.grid', dimnames(z)), y = do.call('rbind', as.list(z)))
 
 
- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  [EMAIL PROTECTED]
 
 

-- 

Joseph P. LeBouton
Forest Ecology PhD Candidate
Department of Forestry
Michigan State University
East Lansing, Michigan 48824

Office phone: 517-355-7744
email: [EMAIL PROTECTED]



Re: [R] aggregate vs tapply; is there a middle ground?

2006-02-11 Thread Peter Dalgaard

Anything wrong with

> as.data.frame(with(tmp, as.table(tapply(C, list(A = A, B = B), sum))))
   A B       Freq
1  A a         NA
2  B a -0.2524320
3  C a  3.8539264
4  D a         NA
5  A c  0.7227294
6  B c -0.2694669
7  C c  0.4760957
8  D c         NA
9  A e         NA
10 B e  0.1800500
11 C e         NA
12 D e -1.0350928

(except the silly colname, responseName="sum" should fix that).
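
The tapply/as.table route, including the responseName fix, can be sketched on a small fixed data set so that the empty cells are easy to see. The data values here are made up for illustration; only the column names A, B and C follow the thread.

```r
# Hypothetical data: combinations (B, b) and (C, a) never occur.
tmp <- data.frame(A = c("A", "A", "B", "C"),
                  B = c("a", "b", "a", "b"),
                  C = c(1, 2, 3, 4))

# tapply() keeps empty cells as NA; as.table() plus as.data.frame()
# flattens the array to one row per combination of levels.
full <- as.data.frame(as.table(with(tmp, tapply(C, list(A = A, B = B), sum))),
                      responseName = "sum")
full
```

The same call should work unchanged with more than two grouping variables, since as.table() accepts arrays of any dimension.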

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



[R] aggregate and ordered factors, feature?

2005-12-19 Thread David James
Hi,

aggregate() does not preserve the order of levels for
ordered factors, e.g.,

   levs <- c("Low", "Med", "Hi")
   d <- data.frame(x = 1:30, fac = ordered(rep(levs, 10), levels = levs))
   out <- aggregate(d[, "x"], by = list(fac = d$fac), FUN = mean)

   cat("Original ordered levels:", levels(d$fac), "\n")
   cat("Levels in aggregated output:", levels(out$fac), "\n")

Perhaps this is unintended?  If intended, a note in its documentation 
could be helpful to alert users.
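
Whatever aggregate() returns in a given R version, the level order can be reimposed afterwards. A minimal sketch of that workaround, reusing the names from the example above (the re-levelling step is a suggestion, not something from the original post):

```r
levs <- c("Low", "Med", "Hi")
d <- data.frame(x = 1:30, fac = ordered(rep(levs, 10), levels = levs))
out <- aggregate(d["x"], by = list(fac = d$fac), FUN = mean)

# Rebuild the grouping column as an ordered factor with the intended levels,
# then sort the rows by that order.
out$fac <- factor(as.character(out$fac), levels = levs, ordered = TRUE)
out <- out[order(out$fac), ]
out
```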

> version
         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status   beta
major    2
minor    2.1
year     2005
month    12
day      18
svn rev  36792
language R

--
David


Re: [R] aggregate slow with many rows - alternative?

2005-10-14 Thread TEMPL Matthias
Hi,

Yesterday I analysed data with 16 rows and 10 columns.
Aggregation would be impossible with a data frame, but after converting
it to a matrix with *numeric* entries (check that the variables are of class
numeric!) the computation needs only 7 seconds on a Pentium III. I'm sad to
say that this is still slow in comparison with proc summary in SAS (less
than one second), but the code is much more elegant in R!

Best,
Matthias




Re: [R] aggregate slow with many rows - alternative?

2005-10-14 Thread jim holtman
Here is the way that I would do it. Using 'lapply' to process the list and
create a matrix takes less than 1 second:

> dat <- data.frame(D=sample(32000:33000, 33000, T),
+ Fid=sample(1:10,33000,T), A=sample(1:5,33000,T))
> system.time({
+ result <- lapply(split(seq(nrow(dat)), dat$D), function(.d){ # split by first level
+ lapply(split(.d, dat$Fid[.d]), function(.f){ # now by the second
+ # create the sum and count
+ c(D=dat$D[.f[1]], Fid=dat$Fid[.f[1]], sum=sum(dat$A[.f]), cnt=length(.f))
+ })
+ })
+ mat <- do.call('rbind',lapply(result, function(x) do.call('rbind',x)))
+ })
[1] 0.66 0.00 0.73 NA NA

> mat[1:20,]
D Fid sum cnt
1 32000 1 8 3
2 32000 2 11 4
3 32000 3 11 3
4 32000 4 2 1
5 32000 5 8 2
6 32000 6 4 2
7 32000 7 21 6
8 32000 8 13 3
9 32000 9 12 4
10 32000 10 10 3
1 32001 1 12 4
2 32001 2 2 1
3 32001 3 10 4
4 32001 4 12 3
5 32001 5 10 3
6 32001 6 8 2
7 32001 7 22 7
8 32001 8 3 2
9 32001 9 7 3
10 32001 10 3 2
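
For comparison with the split/lapply approach above, the same sum and count per (Datum, FischerID) pair can also be sketched with two aggregate() calls and a merge. This uses the test data from the original question; it has not been timed on the 33'000-row data, so treat it as an illustration of the result rather than a speed claim.

```r
dat <- data.frame(
  Datum     = c(32586, 32587, 32587, 32625, 32656, 32656, 32656, 32672, 32672, 32699),
  FischerID = c(58395, 58395, 58395, 88434, 89953, 89953, 89953, 64395, 62896, 62870),
  Anzahl    = c(2, 2, 1, 1, 2, 1, 7, 1, 1, 2))

# Grouping key shared by both summaries.
key <- list(Datum = dat$Datum, FischerID = dat$FischerID)

sums <- aggregate(dat["Anzahl"], by = key, FUN = sum)               # total per group
cnts <- aggregate(list(Cnt = dat$Anzahl), by = key, FUN = length)   # rows per group

t.a <- merge(sums, cnts, by = c("Datum", "FischerID"))
t.a <- t.a[order(t.a$Datum, t.a$FischerID), ]
t.a
```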







--
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What the problem you are trying to solve?


Re: [R] aggregate slow with many rows - alternative?

2005-10-14 Thread Hans-Peter
Many thanks for all your answers. Converting to a matrix didn't help,
I tried with Hmisc but didn't get anywhere (different summary
functions, multiple levels).

2005/10/14, jim holtman [EMAIL PROTECTED]:
 Here is the way that I would do it.  Using 'lapply' to process the list and 
 create a matrix

[snip]

Wow! That's a wonderful suggestion. Your code works just fine with my
data (takes 11 seconds). Thanks a lot, I couldn't have written such
code (reading some help entries now...).

Hans-Peter



[R] aggregate slow with many rows - alternative?

2005-10-13 Thread Hans-Peter
Hi,

I use the code below to aggregate / cnt my test data. It works fine,
but the problem is with my real data (33'000 rows) where the function
is really slow (nothing happened in half an hour).

Does anybody know of other functions that I could use?

Thanks,
Hans-Peter

--
dat <- data.frame( Datum  = c( 32586, 32587, 32587, 32625, 32656,
32656, 32656, 32672, 32672, 32699 ),
  FischerID = c( 58395, 58395, 58395, 88434, 89953, 89953,
89953, 64395, 62896, 62870 ),
  Anzahl = c( 2, 2, 1, 1, 2, 1, 7, 1, 1, 2 ) )
f <- function(x) data.frame( Datum = x[1,1], FischerID = x[1,2],
Anzahl = sum( x[,3] ), Cnt = dim( x )[1] )
t.a <- do.call(rbind, by(dat, dat[,1:2], f))   # slow for 33'000 rows
t.a <- t.a[order( t.a[,1], t.a[,2] ),]

  # show data
dat
t.a



Re: [R] aggregate slow with many rows - alternative?

2005-10-13 Thread Gabor Grothendieck
Convert dat to a matrix and see if working with the
matrix instead of a data frame speeds things up
enough.



Re: [R] aggregate slow with many rows - alternative?

2005-10-13 Thread Frank E Harrell Jr
Gabor Grothendieck wrote:
 Convert dat to a matrix and see if working with the
 matrix instead of a data frame speeds things up
 enough.

In the Hmisc package the asNumericMatrix and matrix2dataFrame functions
facilitate this.

Also look at the summarize and mApply functions in Hmisc, which can be 
quite fast.

Frank Harrell



[R] aggregate

2005-08-30 Thread Omar Lakkis
How can I aggregate this data.frame to list the min and max date for
each unique id?

From this:
> r = data.frame(id=rep(seq(1:3), 3), date= as.Date(c(rep("2005-08-25",3),
+ rep("2005-08-26",3), rep("2005-08-29", 3)), "%Y-%m-%d"))
> r
 id       date
  1 2005-08-25
  2 2005-08-25
  3 2005-08-25
  1 2005-08-26
  2 2005-08-26
  3 2005-08-26
  1 2005-08-29
  2 2005-08-29
  3 2005-08-29

I want to get to this:

 id      start        end
  1 2005-08-25 2005-08-29
  2 2005-08-25 2005-08-29
  3 2005-08-25 2005-08-29

I tried aggregate and aggregate.data.frame but the date column keeps
getting converted into a number.



Re: [R] aggregate

2005-08-30 Thread Dimitris Rizopoulos
Maybe you could use something like this:

dat <- data.frame(id = rep(1:3, 3), date = as.Date(rep(c("2005-08-25",
"2005-08-26", "2005-08-29"), each = 3)))

do.call(rbind, lapply(split(dat, dat$id), function(x) data.frame(id
= x$id[1], start = min(x$date), end = max(x$date))))
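
Another way to look at the "date becomes a number" problem: a Date is stored as the number of days since 1970-01-01, so the numbers can be summarised and then converted back. A minimal sketch under that assumption, using the data from the question (the helper names `first`, `last` and `res` are made up here):

```r
r <- data.frame(id = rep(1:3, 3),
                date = as.Date(rep(c("2005-08-25", "2005-08-26", "2005-08-29"),
                                   each = 3)))

# Summarise the underlying day counts per id ...
first <- tapply(as.numeric(r$date), r$id, min)
last  <- tapply(as.numeric(r$date), r$id, max)

# ... and turn them back into Date objects afterwards.
res <- data.frame(id    = as.integer(names(first)),
                  start = as.Date(as.vector(first), origin = "1970-01-01"),
                  end   = as.Date(as.vector(last),  origin = "1970-01-01"))
res
```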


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm




[R] aggregate?

2005-06-17 Thread alex diaz
Dear all:

Here is my problem:

Example data:
dat <- data.frame(x=rep(c("a","b","c","d"),2), y=c(10:17))

If I wanted to aggregate each level of column dat$x I
could use:
aggregate(dat$y, list(x=dat$x), sum)

But I just want to aggregate two levels ("c" and "d")
to obtain a new level "e".
I am expecting something like:

  x  y
1 a 10
2 b 11
3 e 25
4 a 14
5 b 15
6 e 33


How can I do this?
Thanks in advance and best for all

A. Diaz




Re: [R] aggregate?

2005-06-17 Thread Gabor Grothendieck


In the example
- dat$y[3:4] are summed and
- dat$y[7:8] are summed
so we assume that what is being requested is that d is to
be replaced by c and runs of any level are to be summed.

To do that:
- create xx such that a, b, c and d in dat$x are replaced
  with 1, 2, 3 and 3 in xx (this relies on dat$x being a factor,
  so that ifelse() returns its integer codes).
- in the second statement calculate a running sum, except that if the
  last observation was the same as the current observation then
  the Last Observation is Carried Forward (locf), so that all entries
  in a run have the same number, e.g. in this case locf is
  c(1, 2, 3, 3, 4, 5, 6, 6)
- finally, 'by' collapses dat using locf, and do.call(rbind, ...) binds
  the resulting rows together to create a data frame.

xx <- ifelse(dat$x == "d", 3, dat$x)
locf <- cumsum(c(TRUE, xx[-1] != xx[-length(xx)]))
f <- function(x) data.frame(x=x[1,1], y=sum(x[,2]))
dat2 <- do.call(rbind, by(dat, locf, f))
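
A variant of the same run-based idea that does not depend on dat$x being a factor, and that labels the merged runs "e" as in the requested output, might look like this. It is only a sketch; the helper names `lab` and `run` are introduced here for illustration.

```r
dat <- data.frame(x = rep(c("a", "b", "c", "d"), 2), y = 10:17)

# Relabel c and d as the new level e, working on character values.
lab <- ifelse(dat$x %in% c("c", "d"), "e", as.character(dat$x))

# Run id: increases by one whenever the label changes from the previous row.
run <- cumsum(c(TRUE, lab[-1] != lab[-length(lab)]))

# One row per run: its label and the sum of y within the run.
dat2 <- data.frame(x = as.vector(tapply(lab, run, function(v) v[1])),
                   y = as.vector(tapply(dat$y, run, sum)))
dat2
```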



[R] aggregate and stack

2005-05-25 Thread Paulo Brando
Dear All,

I have tried to calculate mean tree growth, but I think the structure I used
below (growthresumo) is not the most elegant, even though it worked. The only
problem I had in this first part was that I cannot use 'summary', just 'mean'
(sorry, but R is pretty new to me).

growthresumo <-
aggregate(growth[,c(16,19,23,27,31,35,39,43,47,52,56,60,64,68,72,76,81,85,89,93,97,101,105,109,113,117,121,125,129,133,137,
141,145,149,153,157,161,165,169,173,177,181,185,189,194,197,201,205,209,213,217,221,225,229,233,237,241)],
by=(growth[,c(3,8)]),mean,na.rm=TRUE)

# After growth is calculated, I want to stack the results in just one column.

growthvertical <- c(growthresumo[,3],...,growthresumo[,50]) # this is very time consuming though

Parcel <- c("C9","S8",...,"C9","S8") # 50 items

Date <- c("DATE1",...,"DATE50")

growthpermonth <- data.frame(Parcel, Date, growthvertical)

Thank you very much!

Paulo

Paulo Brando
Inst. de Pesquisa Ambiental da Amazônia (IPAM)
Rua Rui Barbosa,136.
68.005.080 Santarém, PA, Brasil.
Fone/Fax ++ 55 93 522 5538
www.ipam.org.br


Re: [R] aggregate and stack

2005-05-25 Thread Stephen D. Weigand

Dear Paulo,

On May 25, 2005, at 8:01 PM, Paulo Brando wrote:


Dear All,

I have tried to calculate tree mean growth but I think the structure I
used below (growthresumo) is not the most elegant, even though it
worked. The only problem I had in this first part was that I cannot
use 'summary', just 'mean' (sorry but 'R' is pretty new for me).

In case you didn't notice, help(aggregate) indicates that 'FUN'
should be a scalar function, so summary won't work for that reason.

growthresumo <-
aggregate(growth[,c(16,19,23,27,31,35,39,43,47,52,56,60,64,68,72,76,81,85,89,93,97,101,105,109,113,117,121,125,129,133,137,
141,145,149,153,157,161,165,169,173,177,181,185,189,194,197,201,205,209,213,217,221,225,229,233,237,241)],
by=(growth[,c(3,8)]),mean,na.rm=TRUE)



It's hard to know where 'growth' came from. Is it your own data.frame,
or from a package? It's better to provide a reproducible or toy example
(as you'll often read here).

# After growth is calculated, I want to stack the results in just one column.

growthvertical <- c(growthresumo[,3],...,growthresumo[,50]) # this is very time consuming though




This comes to my mind:

as.vector(as.matrix(growthresumo[,3:50]))

but look up the help on stack() because it's a very powerful tool that
is aptly named (and might do everything you want).


Parcel <- c("C9","S8",...,"C9","S8") # 50 items

rep() could help with the above.

Date <- c("DATE1",...,"DATE50")

paste() will help with this.
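
The three hints above (stack(), rep() and paste()) can be put together roughly as follows. This is a sketch on a made-up two-parcel, two-date version of growthresumo; the column names DATE1 and DATE2 and all values are hypothetical stand-ins for the real data.

```r
# Hypothetical miniature of the aggregated result: one row per parcel,
# one column per measurement date.
growthresumo <- data.frame(Parcel = c("C9", "S8"),
                           DATE1 = c(1.2, 0.8),
                           DATE2 = c(1.5, 1.1))

# paste() builds the date column names: "DATE1" "DATE2".
date_cols <- paste("DATE", 1:2, sep = "")

# stack() puts the selected columns into one 'values' column and records
# the source column name in 'ind'.
long <- stack(growthresumo[, date_cols])

# rep() repeats the parcel labels once per stacked date column.
growthpermonth <- data.frame(
  Parcel = rep(growthresumo$Parcel, times = length(date_cols)),
  Date = long$ind,
  growthvertical = long$values)
growthpermonth
```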


growthpermonth <- data.frame(Parcel, Date, growthvertical)


Thank you very much!


Good luck with R!

Stephen



[R] aggregate

2005-05-11 Thread Omar Lakkis
I have a data frame of daily open, high, low and settle prices. How
can I aggregate this data weekly?
The data frame has five columns, the first is the date column and the
rest are the prices.



RE: [R] aggregate

2005-05-11 Thread bogdan romocea
Assuming dfr[day,o,h,l,c] and day like 2004-12-28:
dt <- strptime(as.character(dfr$day), format="%Y-%m-%d") + 0
wk <- format(dt, "%Yw%U")
aggr <- aggregate(list(dfr$o,dfr$h,dfr$l,dfr$c), list(wk), mean)
colnames(aggr) <- etc




Re: [R] aggregate

2005-05-11 Thread bogdan romocea
In fact since you have dates and not datetimes use as.Date() instead
of strptime().
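
For price bars, a weekly mean is not the only possible summary. A hedged sketch of an alternative, assuming the usual convention that a weekly bar takes the first open, the highest high, the lowest low and the last settle, and that rows are already sorted by day. The column names follow dfr[day,o,h,l,c] from the earlier reply; the price values are invented.

```r
# Two hypothetical trading weeks (Mon-Fri), with made-up prices.
dfr <- data.frame(day = as.Date("2005-05-02") + c(0:4, 7:11),
                  o = 10:19)
dfr$h <- dfr$o + 2
dfr$l <- dfr$o - 2
dfr$c <- dfr$o + 1

# Week label: year plus week-of-year (weeks starting on Sunday).
wk <- format(dfr$day, "%Yw%U")

# tapply() returns groups in sorted label order, matching sort(unique(wk)).
weekly <- data.frame(
  week = sort(unique(wk)),
  o = as.vector(tapply(dfr$o, wk, function(v) v[1])),          # first open
  h = as.vector(tapply(dfr$h, wk, max)),                       # highest high
  l = as.vector(tapply(dfr$l, wk, min)),                       # lowest low
  c = as.vector(tapply(dfr$c, wk, function(v) v[length(v)])))  # last settle
weekly
```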




[R] Aggregate lag

2005-05-10 Thread Matthieu Cornec
hello,

Does anybody know how to aggregate a lag series ?
when I try to use aggregate I get the following message

> try <- ts(1:100, start=c(1985,1), freq=12)
> aggregate(try, 4, mean, na.rm=T)
     Qtr1 Qtr2 Qtr3 Qtr4
1985    2    5    8   11
1986   14   17   20   23
1987   26   29   32   35
1988   38   41   44   47
1989   50   53   56   59
1990   62   65   68   71
1991   74   77   80   83
1992   86   89   92   95
1993   98
> aggregate(lag(try,-1), 4, mean, na.rm=T)
Error in rep.int("", start.pad) : invalid number of copies in rep()

Matthieu



Re: [R] Aggregate lag

2005-05-10 Thread Achim Zeileis

The ts method seems to expect full blocks of observations. Note that
the last observation (100, in April 1993) is also dropped by the
aggregate call above. I'm not sure what the recommended way to
circumvent this problem with ts is: probably, you have to do some padding
with NAs yourself.

Example:
R> x <- ts(1:20, start = c(1990, 1), freq = 12)
R> aggregate(window(x, start = c(1990, 1), end = c(1991, 9),
+    extend = TRUE), 4, mean, na.rm = TRUE)
     Qtr1 Qtr2 Qtr3 Qtr4
1990  2.0  5.0  8.0 11.0
1991 14.0 17.0 19.5
R> aggregate(window(lag(x, k = -1), start = c(1990, 1),
+    end = c(1991, 9), extend = TRUE), 4, mean, na.rm = TRUE)
     Qtr1 Qtr2 Qtr3 Qtr4
1990  1.5  4.0  7.0 10.0
1991 13.0 16.0 19.0

In zoo this can be done a bit more easily:
R> z <- zooreg(1:20, start = yearmon(1990), freq = 12)
R> aggregate(z, as.yearqtr(time(z)), mean)
1990 Q1 1990 Q2 1990 Q3 1990 Q4 1991 Q1 1991 Q2 1991 Q3
    2.0     5.0     8.0    11.0    14.0    17.0    19.5
R> aggregate(lag(z, k = -1), as.yearqtr(time(lag(z, -1))), mean)
1990 Q1 1990 Q2 1990 Q3 1990 Q4 1991 Q1 1991 Q2 1991 Q3
    1.5     4.0     7.0    10.0    13.0    16.0    19.0

hth,
Z
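
The NA-padding idea from this reply can be sketched as a small base-R helper.
This is only a sketch under assumptions: the helper name pad_to_quarters is
hypothetical (it is not part of R or zoo), and it assumes a monthly series
(frequency 12).

```r
# Hypothetical helper: extend a monthly ts with NAs so that it starts on
# the first month of a quarter and ends on the last month of a quarter.
pad_to_quarters <- function(x) {
  stopifnot(frequency(x) == 12)
  s <- start(x)
  e <- end(x)
  s_month <- 3 * ((s[2] - 1) %/% 3) + 1   # first month of the starting quarter
  e_month <- 3 * ((e[2] - 1) %/% 3) + 3   # last month of the ending quarter
  window(x, start = c(s[1], s_month), end = c(e[1], e_month), extend = TRUE)
}

x <- ts(1:100, start = c(1985, 1), frequency = 12)

# Lag by one month, pad to whole quarters, then take quarterly means,
# ignoring the padded NAs.
q_lag <- aggregate(pad_to_quarters(stats::lag(x, -1)),
                   nfrequency = 4, FUN = mean, na.rm = TRUE)
q_lag
```

With the padded series, the incomplete first and last quarters are averaged
over the months that are available instead of causing an error or being
dropped.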
 


[R] aggregate slow with variables of type 'dates' - how to solve

2005-04-15 Thread Christoph Lehmann
Dear all
I use aggregate with variables of type numeric and dates. For numeric
variables, functions such as sum() are very fast, but similarly simple
functions such as min() are much slower for variables of type 'dates'.
The difference gets bigger the more levels the 'id' variable has; see this
sample code:

dts <- dates(c("02/27/92", "02/27/92", "01/14/92",
               "02/28/92", "02/01/92"))
ntimes <- 70
dts <- data.frame(rep(c(1:40), ntimes/8),
                  chron(rep(dts, ntimes), format = c(dates = "m/d/y")),
                  rep(c(0.123, 0.245, 0.423, 0.634, 0.256), ntimes))
names(dts) <- c("id", "date", "tbs")


date()
dat.1st <- aggregate(dts$date, list(id = dts$id), min)$x
dat.1st <- chron(dat.1st, format = c(dates = "m/d/y"))
dat.1st
date() #82 seconds


date()
tbs.s <- aggregate(as.numeric(dts$tbs), list(id = dts$id), sum)
tbs.s
date() #17 seconds

Is this a problem with the data type 'dates'? If so, is there any way
to solve it? For huge data sets this can be a problem.

As I mentioned, if the variable 'id' has e.g. just 5 levels, the
two times are roughly the same, but with 40 different ids we get
this big difference.

thanks a lot

Christoph

--




Re: [R] aggregate slow with variables of type 'dates' - how to solve

2005-04-15 Thread Gabor Grothendieck

Just convert the dates to numeric first.  You are converting
them back anyway.

> system.time({
+ dat.1st <- chron(aggregate(dts$date, list(id = dts$id), min)$x)
+ }, TRUE)
[1] 0.86 0.00 0.86   NA   NA

> system.time({
+ dat.1st.2 <- chron(aggregate(as.numeric(dts$date), list(id = dts$id), min)$x)
+ }, TRUE)
[1] 0.12 0.00 0.12   NA   NA

> identical(dat.1st, dat.1st.2)
[1] TRUE
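
The same idea carries over to base R's Date class. A minimal sketch, assuming
Date objects rather than chron 'dates' and using made-up data: aggregate the
underlying numbers, then convert back once at the end.

```r
# Made-up data: 40 ids with 50 random dates each (Date class, not chron).
set.seed(1)
d <- data.frame(id   = rep(1:40, each = 50),
                date = as.Date("1992-01-01") + sample(0:59, 2000, replace = TRUE))

# Aggregate the numeric day counts, then restore the Date class once.
first <- aggregate(as.numeric(d$date), list(id = d$id), min)
first$x <- as.Date(first$x, origin = "1970-01-01")
head(first)
```

Whether this gives a speed-up comparable to the chron timings above depends
on the class and the R version, so it is worth timing with system.time() on
your own data.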




[R] aggregate question...

2005-03-31 Thread Jeff D. Hamann
R-folks,

Is there a function, like aggregate, that allows users to bin values?

I've got to break down a data frame into classes of 5cm (or something like
it), and I only know how to do it using code like,

signif <- symnum( stems$dbh,
                  corr = FALSE,
                  na = FALSE,
                  cutpoints = c(0,10,20,30,40,999),
                  symbols = c(0,10,20,30,40) )


rt <- data.frame( stems$expf,
                  signif = ordered( signif,
                                    levels = c(0,10,20,30,40) ) )

st <- aggregate( rt$stems.expf, by=list(signif), sum )

Is there a one line command to do this?


-- 
Jeff D. Hamann
Forest Informatics, Inc.
PO Box 1421
Corvallis, Oregon 97339-1421
phone 541-754-1428
fax 541-752-0288
[EMAIL PROTECTED]
http://www.forestinformatics.com



Re: [R] aggregate question...

2005-03-31 Thread Marc Schwartz

Jeff,

Sometimes the notion of a single line command is in the eye of the
beholder, since things can become easily obfuscated. However, something
like the following could work:

stems <- data.frame(expf = 1:100,
                    dbh = sample(1:500, 100, replace = TRUE))

st <- aggregate(stems$expf,
                by = list(cut(stems$dbh,
                              breaks = c(0, 10, 20, 30, 40, 999))),
                sum)


> st
   Group.1    x
1   (0,10]   69
2  (10,20]  172
3  (20,30]  181
4  (30,40]  131
5 (40,999] 4497


Note that cut() has additional arguments (right and include.lowest) that
control whether the left and/or right endpoints are included in the
respective intervals, and a labels argument that sets the interval
labels. See ?cut for more information.

HTH,

Marc Schwartz
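
As a sketch of the extra cut() arguments mentioned in this reply, here is a
small made-up example; the dbh and expf values are invented for
illustration. With right = FALSE the intervals are closed on the left, so a
value of exactly 40 falls in the last class.

```r
# Invented diameters (dbh) and expansion factors (expf).
dbh  <- c(0, 4.9, 5, 12, 40, 41)
expf <- c(1, 2, 3, 4, 5, 6)

# Left-closed classes [0,10), [10,20), ..., [40,999), labelled by lower bound.
bins <- cut(dbh, breaks = c(0, 10, 20, 30, 40, 999), right = FALSE,
            labels = c("0", "10", "20", "30", "40"))

# Sum expf within each class; classes with no observations are not listed.
st <- aggregate(expf, by = list(class = bins), FUN = sum)
st
```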



Re: [R] aggregate()

2005-01-06 Thread Petr Pikal


On 6 Jan 2005 at 16:55, Karla Meurk wrote:

 Hi, some time ago I asked R-help about aggregating data as a result I
 was able to put together some code which includes the line
 
 rain.ag <- aggregate(newdata, list(hod6=cut(mindata,"6 hours")), mean,
 na.rm=T)
 
 I also want to aggregate daily, and 30 minutely etc.
 
 My question is why is it that I get answers with list(.."hours") but
 R cannot cope with list(.."6 hours") or any other multiple.  I have
 tried overcoming this using nfrequency= but to no avail

Hi Karla

> aggregate(rnorm(100), list(weeks5 = cut(as.Date("2001/1/1") + 70*runif(100),
+ "5 weeks")), mean)
      weeks5         x
1 2001-01-01 0.1272008
2 2001-02-05 0.1808671

This works as expected, so there is probably some problem in your data.
Without more information about what mindata is, or what result you
actually got from the code above, nobody can help.

Cheers
Petr



 

Petr Pikal
[EMAIL PROTECTED]
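
For reference, cut() on date-times accepts interval strings with an integer
multiple, such as "6 hours" or "30 min". A minimal sketch with made-up
half-hourly data; the names tm and rain are hypothetical and are not Karla's
variables.

```r
# 48 half-hourly readings covering one day, all equal to 1 for easy checking.
tm   <- as.POSIXct("2005-01-01 00:00:00", tz = "UTC") + 1800 * (0:47)
rain <- rep(1, 48)

# Sum within 6-hour blocks; the same pattern works with "30 min" or "day".
six_hourly <- aggregate(rain, list(hod6 = cut(tm, "6 hours")), sum)
daily      <- aggregate(rain, list(day  = cut(tm, "day")), sum)
six_hourly
```

If a call like this fails on real data, checking class(mindata) is a
reasonable first step, since the interval strings are only understood for
date and date-time classes.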



[R] aggregate()

2005-01-05 Thread Karla Meurk
Hi, some time ago I asked R-help about aggregating data; as a result I
was able to put together some code which includes the line

rain.ag <- aggregate(newdata, list(hod6=cut(mindata,"6 hours")), mean,
na.rm=T)

I also want to aggregate daily, and 30 minutely etc.
My question is why is it that I get answers with list(.."hours") but R
cannot cope with list(.."6 hours") or any other multiple.  I have tried
overcoming this using nfrequency= but to no avail.

Can someone help?
Thanks
Carla


Re: [R] aggregate()

2005-01-05 Thread Gabor Grothendieck
Karla Meurk ksm32 at student.canterbury.ac.nz writes:

: 
: Hi, some time ago I asked R-help about aggregating data as a result I 
: was able to put together some code which includes the line
: 
: rain.ag <- aggregate(newdata, list(hod6=cut(mindata,"6 hours")), mean, 
: na.rm=T)
: 
: I also want to aggregate daily, and 30 minutely etc.
: 
: My question is why is it that I get answers with list(.."hours") but R 
: cannot cope with list(.."6 hours") or any other multiple.  I have tried 
: overcoming this using nfrequency= but to no avail
: 
: can someone help?

You need to provide a short reproducible example to illustrate
your problem with an explanation of what you expect from the
code.  That means that someone can just copy the code
from your posting and paste it into their session and see the
exact same incorrect output or error that you got.  If it's
not short you need to boil it down to something that is short
before posting it.



[R] aggregate and median

2004-12-21 Thread Philippe Hupé
I am trying to use the function aggregate with the median function but I 
get the following error:

Error in FUN(X[[1]], ...) : Argument INDEX
When I replace median by mean, it works perfectly
Can someone tell me where the problem comes from?
Thx
I am running R 2.0.0 on SunOS  5.9
--
Philippe Hupé
UMR 144 - Service Bioinformatique
Institut Curie
Laboratoire de Transfert (4ème étage)
26 rue d'Ulm
75005 Paris - France

Email :  [EMAIL PROTECTED]
Tél :+33 (0)1 44 32 42 75
Fax :+33 (0)1 42 34 65 28


Re: [R] aggregate and median

2004-12-21 Thread Petr Pikal


On 21 Dec 2004 at 12:46, Philippe Hupé wrote:

 I am trying to use the function aggregate with the median function but
 I get the following error:
 
 Error in FUN(X[[1]], ...) : Argument INDEX
 
 When I replace median by mean, it works perfectly

Hi Philippe

I suppose that you have a typo in your aggregate call
or that you have redefined median.

> aggregate(x, list(rrr), mean)
   Group.1           x
1        1 -0.19455580
2        2 -0.06877719
3        3 -0.47657192
4        4 -0.41082682
5        5  1.27739323
6        6  1.15004620
7        7 -0.40064292
8        8 -0.02360514
9        9 -0.24954037
10      10  0.13480356
11      11  0.24179472
> aggregate(x, list(rrr), median)
   Group.1           x
1        1 -0.19455580
2        2 -0.06877719
3        3 -0.47657192
4        4 -0.41082682
5        5  1.27739323
6        6  1.15004620
7        7 -0.40064292
8        8 -0.02360514
9        9 -0.24954037
10      10  0.13480356
11      11  0.24179472

Works for me as expected. Or are you doing something completely
different?

Cheers
Petr

 

Petr Pikal
[EMAIL PROTECTED]
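
One way to check the "redefined median" possibility is to ask R where it
finds median. A minimal sketch with made-up data; calling stats::median
explicitly sidesteps any masking object in the workspace.

```r
x <- c(1, 2, 10, 3, 4, 20)
g <- c("a", "a", "a", "b", "b", "b")

# Lists every attached place that defines 'median'; an entry such as
# ".GlobalEnv" ahead of "package:stats" would mean it has been masked.
find("median")

# Group medians, using the stats version of median explicitly.
res <- aggregate(x, list(g = g), stats::median)
res
```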



[R] aggregate function

2004-07-26 Thread Luis Rideau Cruz
Hi all,
I have the following frame (there are more columns than shown),
   1     2     3     4     5
Year Total   Tus   Whi  Norw
1994  1.00  1830     0   355
1995  1.00     0     0     0
1995  1.00     0     0     0
1995  1.00  4910  4280   695
1997  1.00     0     0   110
1997  0.58     0     0     0
1997  1.00     0     0     0
1994  1.00     0     0     0
1997  1.00     0    40    70
1998  1.00     0     0  1252
1999  1.04     0    74     0
1999  1.00     0     0     0
1999  1.02     0     0     0
1999  1.00     0     0     0
1999  1.00     0     0   171
1999  1.00  1794     0   229
1999  1.00     0  3525     0
1997  1.00  1335  1185   147
1997  1.00  4925  1057  4801
1997  1.00     0  6275  1773

I am trying to get sum(Total) by Year for the rows in which Tus>0,
sum(Total) by Year for the rows in which Whi>0, and so on.

I have done something like this:

a<-as.list(numeric(3))
for (i in 3:5)
{
a[[i]]<-aggregate(frame[,"Total"],list(Year=frame$Year,
   Tus=frame$i>0),sum)
}

But I get

 Error in FUN(X[[as.integer(1)]], ...) : arguments must have same length

Also, by doing them one by one

aggregate(frame[,"Total"],list(Year=frame$Year,
   Tus=frame$Tus>0),sum)


the result is something like:

 Year   Tus     x
 1994 FALSE 49.69
 1995 FALSE 49.35
 1996 FALSE 56.95
 1997 FALSE 57.00
 1998 FALSE 57.00
 1999 FALSE 58.09
 2000 FALSE 56.97
 2001 FALSE 57.95
 2002 FALSE 57.10
 2003 FALSE 56.16
 2000  TRUE  1.00
 2002  TRUE  1.00
 2003  TRUE  2.01

Help


Thank you

Luis Ridao Cruz
Fiskirannsóknarstovan
Nóatún 1
P.O. Box 3051
FR-110 Tórshavn
Faroe Islands
Phone: +298 353900
Phone(direct): +298 353912
Mobile: +298 580800
Fax: +298 353901
E-mail:  [EMAIL PROTECTED]
Web:www.frs.fo



Re: [R] aggregate function

2004-07-26 Thread TEMPL Matthias
Hi,

# x ... your frame
attach(x)
sum(Total[Year==1997 & Tus > 0])

I hope this helps

Best,
Matthias




RE: [R] aggregate function

2004-07-26 Thread Liaw, Andy
I would try something like:

> lapply(frame[3:5], function(i) tapply(frame$Total[i>0], frame$Year[i>0],
+ sum))
$Tus
1994 1995 1997 1999 
   1    1    2    1 

$Whi
1995 1997 1999 
1.00 4.00 2.04 

$Norw
1994 1995 1997 1998 1999 
   1    1    5    1    2 

HTH,
Andy
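
A note on why the loop in the question fails, sketched on a few made-up rows:
frame$i looks for a column literally named "i" and returns NULL, so the
grouping list has components of different lengths. Indexing with frame[[nm]]
uses the value of the variable instead. The small data frame below is
invented for illustration.

```r
# A few invented rows with the same column names as in the question.
frame <- data.frame(
  Year  = c(1994, 1995, 1995, 1997, 1997),
  Total = c(1.00, 1.00, 1.00, 1.00, 0.58),
  Tus   = c(1830, 0, 4910, 0, 0),
  Whi   = c(0, 0, 4280, 40, 0)
)

is.null(frame$i)   # TRUE: there is no column called "i"

# Loop over column names and index with [[ ]] so the name is looked up by value.
res <- lapply(c(Tus = "Tus", Whi = "Whi"), function(nm) {
  keep <- frame[[nm]] > 0
  aggregate(frame$Total[keep], list(Year = frame$Year[keep]), sum)
})
res
```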

  


RE: [R] aggregate function

2004-07-26 Thread Gabor Grothendieck

[Sorry if this gets posted twice.  I have been having some
problems with gmane posting.]

We can use rowsum like this:

> rowsum(frame$Total * (frame[,3:5]>0), frame$Year)

     Tus  Whi Norw
1994   1 0.00    1
1995   1 1.00    1
1997   2 4.00    5
1998   0 0.00    1
1999   1 2.04    2

Note that only years that are actually present will be in the
resulting matrix. 1996 is not in the sample data in your post
so there is no row for 1996. If that's not a problem or if your
real data covers all the years anyway we are done.

If missing years are a problem then merge in some zero rows with
the years first. The first two lines below do this and the third
line is the same as the line above:

> frame <- merge(frame, 1994:1999, by = 1, all = TRUE)
> frame[is.na(frame)] <- 0

> rowsum(frame$Total * (frame[,3:5]>0), frame$Year)

     Tus  Whi Norw
1994   1 0.00    1
1995   1 1.00    1
1996   0 0.00    0   <-- now we have a row for 1996
1997   2 4.00    5
1998   0 0.00    1
1999   1 2.04    2
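
A possible alternative for the missing-years case, sketched here as an
assumption rather than as part of the reply above: turn Year into a factor
with all wanted levels, so that tapply() reports every year, and replace the
empty cells by zero. The small data frame is made up.

```r
# A few invented rows with the same column names as in the question.
frame <- data.frame(
  Year  = c(1994, 1995, 1995, 1997, 1997),
  Total = c(1.00, 1.00, 1.00, 1.00, 0.58),
  Tus   = c(1830, 0, 4910, 0, 0),
  Whi   = c(0, 0, 4280, 40, 0)
)

# Every year from 1994 to 1999 becomes a level, present in the data or not.
yr <- factor(frame$Year, levels = 1994:1999)

# Per column: sum Total over the rows where that column is positive.
tab <- sapply(frame[3:4], function(col) tapply(frame$Total * (col > 0), yr, sum))
tab[is.na(tab)] <- 0   # years with no rows at all come back as NA
tab
```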




[R] Aggregate rows to see the number of occurences

2004-06-07 Thread Nicolas STRANSKY
Hi,
I have a set of data like the following:
      [,1] [,2]
 [1,]   10    2
 [2,]    7    0
 [3,]    1    0
 [4,]    1    0
 [5,]   15    0
 [6,]   17    4
 [7,]    4    0
 [8,]   19    8
 [9,]   10    2
[10,]   19    5
I'd like to aggregate it in order to obtain the frequency (the number of
occurrences) of each pair of values (e.g. (10,2) appears twice, (7,0)
appears once). It would be nice to have this value in a third
column...
I've been looking at aggregate() but either I couldn't get the right
parameters, or this is not the right tool to use...

Thanks for any help!
Thank's for any help !
--
Nicolas STRANSKY
Équipe Oncologie Moléculaire
Institut Curie - UMR 144 - CNRS                 Tel : +33 1 42 34 63 40
26, rue d'Ulm - 75248 Paris Cedex 5 - FRANCE    Fax : +33 1 42 34 63 49
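
One possible way to get such counts with aggregate() is to sum a column of
ones over both columns. This is only a sketch: the matrix values follow the
table in the question, and the names V1, V2 and freq are arbitrary choices
made for illustration.

```r
# The ten pairs from the question, as a two-column matrix.
m <- cbind(c(10, 7, 1, 1, 15, 17, 4, 19, 10, 19),
           c( 2, 0, 0, 0,  0,  4, 0,  8,  2,  5))

# One row per distinct pair, with the number of occurrences in 'freq'.
counts <- aggregate(data.frame(freq = rep(1, nrow(m))),
                    by = list(V1 = m[, 1], V2 = m[, 2]), FUN = sum)
counts
```

as.data.frame(table(m[, 1], m[, 2])) is another option, but it also lists
the combinations that never occur, with a count of zero.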

