
Re: [R] tapply grand mean

2007-08-08 Thread Chuck Cleland

Lauri Nikkinen wrote:
 Hi R-users,
 
 I have a data.frame like this (modified from
 https://stat.ethz.ch/pipermail/r-help/2007-August/138124.html).
 
 y1 <- rnorm(20) + 6.8
 y2 <- rnorm(20) + (1:20*1.7 + 1)
 y3 <- rnorm(20) + (1:20*6.7 + 3.7)
 y <- c(y1,y2,y3)
 x <- rep(1:5,12)
 f <- gl(3,20, labels=paste("lev", 1:3, sep=""))
 d <- data.frame(x=x,y=y, f=f)
 
 and this is how I can calculate mean of these levels.
 
 tapply(d$y, list(d$x, d$f), mean)
 
 But how can I calculate the mean of d$x 1 and 2 and the grand mean of d$x 1,
 2, 3, 4, 5 (within d$f) into a table?

  You might like the tables produced by summary.formula() in the Hmisc
package:

library(Hmisc)

summary(y ~ x + f, data = d, fun=mean, method="cross", overall=TRUE)

 UseMethod by x, f

+-+
|N|
|y|
+-+
+---+---------+---------+---------+---------+
| x |   lev1  |   lev2  |   lev3  |   ALL   |
+---+---------+---------+---------+---------+
|1  | 4       | 4       | 4       |12       |
|   | 6.452326|15.861256|61.393455|27.902346|
+---+---------+---------+---------+---------+
|2  | 4       | 4       | 4       |12       |
|   | 7.403041|17.296270|68.208299|30.969203|
+---+---------+---------+---------+---------+
|3  | 4       | 4       | 4       |12       |
|   | 6.117648|17.976864|73.479837|32.524783|
+---+---------+---------+---------+---------+
|4  | 4       | 4       | 4       |12       |
|   | 7.831390|19.696998|80.323382|35.950590|
+---+---------+---------+---------+---------+
|5  | 4       | 4       | 4       |12       |
|   | 6.746213|21.101952|87.430087|38.426084|
+---+---------+---------+---------+---------+
|ALL|20       |20       |20       |60       |
|   | 6.910124|18.386668|74.167012|33.154601|
+---+---------+---------+---------+---------+

summary(y ~ I(x %in% c(1,2)) + f, data = d, fun=mean, method="cross",
overall=TRUE)

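 UseMethod by I(x %in% c(1, 2)), f

+-+
|N|
|y|
+-+
+-----------------+---------+---------+---------+---------+
|I(x %in% c(1, 2))|   lev1  |   lev2  |   lev3  |   ALL   |
+-----------------+---------+---------+---------+---------+
|      FALSE      |12       |12       |12       |36       |
|                 | 6.898417|19.591938|80.411102|35.633819|
+-----------------+---------+---------+---------+---------+
|      TRUE       | 8       | 8       | 8       |24       |
|                 | 6.927684|16.578763|64.800877|29.435774|
+-----------------+---------+---------+---------+---------+
|       ALL       |20       |20       |20       |60       |
|                 | 6.910124|18.386668|74.167012|33.154601|
+-----------------+---------+---------+---------+---------+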

 Regards,
 Lauri
 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


Re: [R] tapply grand mean

2007-08-08 Thread Lauri Nikkinen

Thanks Chuck but I would fancy the output made by tapply because the idea is
to make a barplot based on those values.

-Lauri



Re: [R] tapply grand mean

2007-08-08 Thread Chuck Cleland

Lauri Nikkinen wrote:
 Thanks Chuck but I would fancy the output made by tapply because the
 idea is to make a barplot based on those values.
  
 -Lauri

sum1 <- summary(y ~ x + f, data = d, fun=mean,
method="cross", overall=TRUE)

df <- data.frame(x = sum1$x, f = sum1$f, y = sum1$S)

df
     x    f         y
1    1 lev1  6.452326
2    2 lev1  7.403041
3    3 lev1  6.117648
4    4 lev1  7.831390
5    5 lev1  6.746213
6  ALL lev1  6.910124
7    1 lev2 15.861256
8    2 lev2 17.296270
9    3 lev2 17.976864
10   4 lev2 19.696998
11   5 lev2 21.101952
12 ALL lev2 18.386668
13   1 lev3 61.393455
14   2 lev3 68.208299
15   3 lev3 73.479837
16   4 lev3 80.323382
17   5 lev3 87.430087
18 ALL lev3 74.167012
19   1  ALL 27.902346
20   2  ALL 30.969203
21   3  ALL 32.524783
22   4  ALL 35.950590
23   5  ALL 38.426084
24 ALL  ALL 33.154601

library(lattice)

barchart(y ~ x | f, data = df, layout=c(4,1,1))

OR

barchart(S ~ x | f, data = sum1, layout=c(4,1,1))
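
If the tapply() layout is preferred for barplot(), the extra rows can probably also be appended by hand in base R. This is only a sketch: set.seed() and the row label "1-2" are my own illustrative choices and not part of the thread, and the simulated values will differ from the tables above.

```r
set.seed(1)  # assumption: fixed seed only so the example is reproducible
y <- c(rnorm(20) + 6.8,
       rnorm(20) + (1:20 * 1.7 + 1),
       rnorm(20) + (1:20 * 6.7 + 3.7))
d <- data.frame(x = rep(1:5, 12), y = y,
                f = gl(3, 20, labels = paste("lev", 1:3, sep = "")))

# cell means: rows are the levels of x, columns are the levels of f
m <- tapply(d$y, list(d$x, d$f), mean)

# mean of x in 1:2 and mean over all x, each within f
sel   <- d$x %in% 1:2
m12   <- sapply(split(d$y[sel], d$f[sel]), mean)
grand <- sapply(split(d$y, d$f), mean)

# append both as extra rows; the result is still a plain matrix
res <- rbind(m, "1-2" = m12, ALL = grand)
res

barplot(res, beside = TRUE, legend.text = rownames(res))
```

Because res is an ordinary matrix, it can be passed to barplot() directly, which is what the tapply() output was wanted for.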

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


Re: [R] tapply

2007-07-19 Thread John Kane

I do not understand what you want.  If aps is constant
over each class then the mean for each class is equal
to any value of aps.  

Using your example you can do 

tapply(icu1$aps, icu1$d, mean)

but it does not give you anything new.  Can you
explain the problem a bit more? 


--- sigalit mangut-leiba [EMAIL PROTECTED] wrote:

 hello,
 I want to compute the mean of a variable (aps) for
 every class (1, 2, and 3).
 Every id has a few observations; aps and class are
 constant within each id, like this:
 id   aps   class
 1    11    2
 1    11    2
 1    11    2
 1    11    2
 1    11    2
 2    8     3
 2    8     3
 2    8     3
 3    12    2
 3    12    2
 .
 .
 
 i tried:
 
 tapply(icu1$aps_st, icu1$hidclass, function(z)
 mean(unique(z)))
 
 but it's counting every row and not every id.
 
 thank you,
 
 Sigalit.


Re: [R] tapply

2007-07-19 Thread Henrique Dallazuanna

I also don't understand, but perhaps:

with(df, tapply(aps, list(class, id), mean))


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] tapply

2007-07-19 Thread Peter Dalgaard

sigalit mangut-leiba wrote:
 I'm sorry for the unfocused questions, i'm new here...
 the output should be:
 class   aps_mean
 1       NA
 2       11.5
 3       8

 the mean aps of every class, when every id count *once*,  for example: class
 2, mean= (11+12)/2=11.5
 hope it's clearer.
   
Much... Get the first record for each individual with (e.g.)

icu1.redux <- subset(icu1, !duplicated(id))

then use tapply as before, using the variables from icu1.redux. Or in one go:

with(
  subset(icu1, !duplicated(id)),
  tapply(aps, class, mean, na.rm=TRUE)
)
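
To see the effect on a small case, here is a sketch with made-up data in the shape described in the question. It assumes the data frame is called icu1 with columns id, aps and class, and that class is a factor with levels 1 to 3, so that the empty class 1 still shows up in the result.

```r
# toy data in the shape described in the question (values are made up)
icu1 <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3),
  aps   = c(11, 11, 11, 11, 11, 8, 8, 8, 12, 12),
  class = factor(c(2, 2, 2, 2, 2, 3, 3, 3, 2, 2), levels = 1:3)
)

# keep one row per id, then average aps within class
first <- subset(icu1, !duplicated(id))
res <- with(first, tapply(aps, class, mean, na.rm = TRUE))

# class 1 has no ids, so its cell is NA;
# class 2 is (11 + 12)/2 = 11.5; class 3 is 8
res
```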


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907


Re: [R] tapply

2007-06-01 Thread Bojanowski, M.J. (Michal)

I'm not sure what the 'pvalue' function is (it is not found in the base or
stats packages), but
this should give you what you want:

# some example data
re <- rnorm(100)
reg <- rep(1:3, length=100)
ast <- rep(1:2, length=100)

tapply( re, list(reg, ast), function(v) shapiro.test(v)$p.value )

# or neater by defining a function
p.shapiro <- function(v) shapiro.test(v)$p.value
tapply( re, list(reg, ast), p.shapiro )



hth,

michal

 Hello, I want to conduct normality test to a series of data 
 and get the
 p-value for each subset. I am using the following codes, but 
 it does not
 work.
 
 tapply(re, list(reg, ast), pvalue(shapiro.test))
 
 Could anyone give me some advice? Many thanks.


Re: [R] tapply

2007-06-01 Thread Dimitris Rizopoulos

try this:

tapply(re, list(reg, ast), function(x) shapiro.test(x)$p.value)


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm



Re: [R] tapply histogram

2007-06-01 Thread Aimin Yan

use lattice graph


At 08:00 AM 6/1/2007, livia wrote:

Dear members,

I would like to pass the histogram settings to each subset of the dataframe,
and generate a multiple figures graph.

First, can anyone tell me how to generate a multiple figures environment? I
am trying

mfrow=c(2,4) and nothing appears.

Secondly, I want to pass the following function in tapply()

hist(x, freq=FALSE)
lines(density(x), col="red")
rug(x)

how can I manage it?

Many thanks


Re: [R] tapply histogram

2007-06-01 Thread Marc Schwartz


In this case, you would not want to use one of the *apply() family of
functions. First, it does not save you anything and second, these
functions are designed to return some type of R object, which you don't
want here.

Better to use a for() loop and if you wish, encapsulate the loop in a
function. Something along the lines of the following, which actually
defines a new 'formula' method for hist() (though not fully tested):


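hist.formula <- function(formula, data, cols, rows, ...)
{
  # Build a data frame from the formula terms:
  # response first, grouping factor second
  DF <- model.frame(formula, data = data, ...)
  DF.split <- split(DF[[1]], DF[[2]])

  # par(mfrow = ) expects c(number of rows, number of columns)
  par(mfrow = c(rows, cols))

  for (i in names(DF.split))
  {
    Col <- DF.split[[i]]
    hist(Col, freq = FALSE, main = i, ...)
    lines(density(Col), col = "red")
    rug(Col)
  }
}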



The function will take the formula, create a data frame comprised of the
formula terms and then loop over the list of vectors created by
split().

So we call it as follows:


  hist(Sepal.Length ~ Species, data = iris, 2, 2)


Based upon the formula specification, you will then get a matrix of
histograms, where each will be titled with the factor level used to
split the original data frame.

You could further consolidate the function by implementing an automated
means to determine the number of rows and columns required in the plot
matrix, but I'll leave that for you.
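
One possible way to automate the layout is n2mfrow() from grDevices, which suggests a c(rows, columns) grid for a given number of plots. The helper below is only a sketch of that idea; the name hist_by_group and its arguments are hypothetical and not from the thread.

```r
# hypothetical helper, not from the thread: one histogram per group,
# with the grid size chosen by grDevices::n2mfrow()
hist_by_group <- function(values, groups) {
  parts <- split(values, groups)
  grid <- grDevices::n2mfrow(length(parts))  # c(rows, columns)
  old <- par(mfrow = grid)
  on.exit(par(old))                          # restore the old layout
  for (nm in names(parts)) {
    hist(parts[[nm]], freq = FALSE, main = nm, xlab = "")
    lines(density(parts[[nm]]), col = "red")
    rug(parts[[nm]])
  }
  invisible(grid)
}

grid <- hist_by_group(iris$Sepal.Length, iris$Species)
```

Note that the layout has to be set through par(); a bare mfrow=c(2,4) at the prompt only creates a variable called mfrow, which would explain why nothing appeared.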

See ?model.frame and ?split

HTH,

Marc Schwartz


Re: [R] tapply

2007-04-10 Thread Petr PIKAL

Hello

Seems to me that you can make a summary table using

aggregate(RESPONSE, list(TREATMENT, MEASUREMENT, BLOCK, STUDY), mean)

and then if you want you can use reshape function or melt/cast function 
from reshape package to get wide form of your table.
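
A rough sketch of that route with base R only, on made-up data: the data frame dat, its REP column and the INC.* column names are assumptions for illustration, and aggregate() is called through its formula interface rather than with the bare vectors above.

```r
set.seed(1)  # assumption: made-up data, fixed seed for reproducibility
dat <- expand.grid(STUDY = c("A", "B"), BLOCK = 1:2,
                   TREATMENT = c("T-0", "T-1"),
                   MEASUREMENT = 1:3, REP = 1:2)
dat$RESPONSE <- round(runif(nrow(dat), 10, 70))

# long summary table: one mean per STUDY/BLOCK/TREATMENT/MEASUREMENT
means <- aggregate(RESPONSE ~ STUDY + BLOCK + TREATMENT + MEASUREMENT,
                   data = dat, FUN = mean)

# wide form: one column per measurement (RESPONSE.1, RESPONSE.2, ...)
wide <- reshape(means, idvar = c("STUDY", "BLOCK", "TREATMENT"),
                timevar = "MEASUREMENT", v.names = "RESPONSE",
                direction = "wide")

# increments between consecutive measurements
wide$INC.2 <- wide$RESPONSE.2 - wide$RESPONSE.1
wide$INC.3 <- wide$RESPONSE.3 - wide$RESPONSE.2
wide
```

The increment columns can then go into the ANOVA mentioned in the question.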

Regards

Petr Pikal
[EMAIL PROTECTED]

[EMAIL PROTECTED] wrote on 10.04.2007 00:14:15:

 Hi,
 
 I have a summary table for an experiment that looks like this
 
 STUDY BLOCK TREATMENT MEASUREMENT RESPONSE
 A     1     T-0       1           12
 A     1     T-1       1           52
 A     1     T-0       2           12
 A     1     T-1       2           65
 
 and so on...
 
 There are 10 studies, 4 blocks, 10 treatments and 5 measurements of
 the response value.
 
 I want to produce a table that looks like this:
 
 STUDY BLOCK TREATMENT MEAS.1 MEAS.2 MEAS.3
 A     1     T-1       15     54     65
 A     1     T-2       54     65     45
 A     2     T-1       12     12     23
 A     2     T-2       65     54     65
 
 and so on...
 
 With tapply(RESPONSE, list(TREATMENT, MEASUREMENT, BLOCK, STUDY), mean)
 
 I get very close; however, I get the results as a list!
 
 If instead I use
 
 ftable(tapply(RESPONSE, list(TREATMENT, MEASUREMENT, BLOCK, STUDY), mean))
 
 I get REALLY close, but then I get only one value for each class. However, I
 need the whole table, because in the end what I really need is the
 increment MEASUREMENT (n) - MEASUREMENT (n-1) for each TREATMENT,
 BLOCK and STUDY, to perform an ANOVA on the increment data.
 
 Essentially, I want to move away from running a pivot table in ACCESS.
 
 Any thoughts?
 
 Cristian Montes
 North Carolina State University
 

Re: [R] tapply, levelinformation

2007-02-16 Thread niederlein-rstat

Hi Jim,

jim holtman wrote:
 Here is one way:
  
 t <- split(mat, classes)
 for (i in names(t)) plotdensity(t[[i]], main=i)
 

But then I don't use the advantages of tapply anymore...

 What is the problem you are trying to solve?

I have a set of data (multiple files) that belong to different 
conditions (one or more files per condition). I wanted to read the data 
set and a description of the conditions and then automatically create 
plots for data of the same condition.

Maybe the way I am doing it is much too complicated...

Antje



Re: [R] tapply, levelinformation

2007-02-16 Thread jim holtman

But it does the same thing.  What 'advantage' of tapply do you think that
you are missing?  Performance is probably not impacted since most of the
time is in the plot.





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] tapply and data.frame?

2007-01-23 Thread jim holtman

Is this what you want:

> tst
   p1   p10  p100 p1000 p1001 p1002 p1003 p1004 p1005 p1006
    1     5     1     8     6     5     8     7     4     4
> data.frame(point=names(tst), ind=tst)
      point ind
p1       p1   1
p10     p10   5
p100   p100   1
p1000 p1000   8
p1001 p1001   6
p1002 p1002   5
p1003 p1003   8
p1004 p1004   7
p1005 p1005   4
p1006 p1006   4
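
If plain 1, 2, 3, ... row names are wanted, as in the question, dropping the names from the counts should do it. A small sketch with made-up data (pp and point here are stand-ins for the poster's vectors):

```r
pp <- c("p1", "p10", "p10", "p100", "p100", "p100")  # made-up grouping
point <- seq_along(pp)
tst <- tapply(point, pp, length)

# as.vector() drops the names, so the rows are numbered 1, 2, 3, ...
res <- data.frame(point = names(tst), ind = as.vector(tst))
res
```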



On 1/23/07, Zhang Jian [EMAIL PROTECTED] wrote:
 I want to turn the result of tapply into a data frame, but I cannot get it
 to work.
 For example:
  tst=tapply(point,pp,length)
  tst[1:10]
     p1   p10  p100 p1000 p1001 p1002 p1003 p1004 p1005 p1006
      1     5     1     8     6     5     8     7     4     4
  res=as.data.frame(tst)  # I try to transform it
  res[1:10,]
     p1   p10  p100 p1000 p1001 p1002 p1003 p1004 p1005 p1006
      1     5     1     8     6     5     8     7     4     4
 How can I transform it into the following?
 res
   point ind
 1    p1   1
 2   p10   5
 3  p100   1
 4 p1000   8
 5 p1001   6

 Thanks!




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] tapply, data.frame problem

2007-01-17 Thread Chuck Cleland

Lauri Nikkinen wrote:
 Hi R-users,
 
 I'm quite new to R and trying to learn the basics. I have the following
 problem concerning the conversion of an array object into a data frame. I have
 made the following data sets
 
 tmp1 <- rnorm(100)
 tmp2 <- gl(10,2,length=100)
 tmp3 <- as.data.frame(cbind(tmp1,tmp2))
 tmp3.sum <- tapply(tmp3$tmp1,tmp3$tmp2,sum)
 tmp3.sum <- as.data.frame(tapply(tmp1,tmp2,sum))
 and I want the levels from tmp2 to be shown as a column in the data.frame, not
 as row names as they are now. To put it another way, as a result, I want a
 data frame with two columns: levels and the sums of those levels. Row names
 can be, for example, numbers from 1 to 10.

aggregate(tmp3[1], tmp3[2], sum)
   tmp2        tmp1
1     1  8.41550650
2     2  3.65831086
3     3 -0.26296334
4     4  3.45368671
5     5 -4.64383794
6     6  0.25640949
7     7  0.02832348
8     8 -0.03811150
9     9  1.41724121
10   10 -1.06780900

?aggregate

 -Lauri Nikkinen
 Lahti, Finland
 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


Re: [R] tapply question

2006-07-06 Thread Jacques VESLOT

I think you can't have columns with the same name.

  data.frame(AAA=1:3, AAA=4:6)
  AAA AAA.1
1   1     4
2   2     5
3   3     6

but you could subset the data frame by names using substring():

sapply(unique(substring(names(data1), 1, 3)), function(x)
rowMeans(data1[, substring(names(data1), 1, 3) == x, drop = FALSE]))


---
Jacques VESLOT

CNRS UMR 8090
I.B.L (2ème étage)
1 rue du Professeur Calmette
B.P. 245
59019 Lille Cedex

Tel : 33 (0)3.20.87.10.44
Fax : 33 (0)3.20.87.10.31

http://www-good.ibl.fr
---


[EMAIL PROTECTED] wrote:
 I think I understand tapply but i still
 can't figure out how to do the following.
 
 I have a dataframe where some of the column names are the same
 and i want to make a new dataframe where columns
 that have the same name are averaged by row.
 
 so, if the data frame, DF, was 
 
 AAA  BBB  CCC  AAA  DDD
 1    0    7    11   13
 2    0    8    12   14
 3    0    6    0    15
 
 then the resulting data frame would be exactly the same except
 that the AAA column would be 
 
 6   comes from  (11 + 1)/2
 7   comes from  (12 + 2)/2
 3   stays 3 because the element in the other AAA is zero,
 so I don't want to average that one. It should just stay 3.
 
 So, I do 
 
 DF[DF == 0]<-NA
 rowaverage<-function(x) x[rowMeans(forecastDf[x],na.rm=TRUE)
 revisedDF<-tapply(seq(DF),names(DF),rowmeans)
 
 there are two problems with this :
 
 1) i need to go through the rows of the same name, not the columns
 so i don't think seq(DF) is right because that goes through 
 the columns but i want to go through rows.
 
 2) BBB will come back with ALL NA's (since
 it was unique and there was nothing else to average), and I don't know how to 
 transform that BBB column to all zeros.
 
 thanks and i'm sorry for so many questions. i'm getting better with this 
 stuff and my questions will decrease soon.
 
 my guess is that i no longer should be using tapply ?
 and should be using some other version of apply.
 thanks
  mark
 

Re: [R] tapply question

2006-07-06 Thread jim holtman

I think this does what you want:


> In <- "AAA    BBB  CCC   AAA DDD
+ 1   0    7 11  13
+ 2    0   8 12  14
+ 3    0   6  0  15"
> DF <- read.table(textConnection(In), header=TRUE, check.names=FALSE)
>
> DF[DF == 0]<-NA
> rowaverage<-function(x) rowMeans(DF[x],na.rm=TRUE)
> revisedDF<-tapply(seq(DF),names(DF),rowaverage)
> revisedDF
$AAA
1 2 3
6 7 3

$BBB
 1  2  3
NA NA NA

$CCC
1 2 3
7 8 6

$DDD
 1  2  3
13 14 15

> do.call('cbind', revisedDF)
  AAA BBB CCC DDD
1   6  NA   7  13
2   7  NA   8  14
3   3  NA   6  15
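
For the second point in the question (turning the all-NA BBB column back into zeros), one option is to replace the missing means at the end. The sketch below is a variation rather than the code above: it works on a matrix M instead of the data frame, which sidesteps the duplicated column names, and the names M, groups and res are illustrative.

```r
# same numbers as in the question, held in a matrix so that
# duplicated column names are not a problem
M <- matrix(c(1, 2, 3,  0, 0, 0,  7, 8, 6,  11, 12, 0,  13, 14, 15),
            nrow = 3,
            dimnames = list(NULL, c("AAA", "BBB", "CCC", "AAA", "DDD")))

M[M == 0] <- NA  # zeros should not enter the averages

# column positions grouped by column name
groups <- split(seq_len(ncol(M)), colnames(M))

# row means within each group of same-named columns
res <- sapply(groups, function(idx)
  rowMeans(M[, idx, drop = FALSE], na.rm = TRUE))

res[is.na(res)] <- 0  # rows with nothing to average go back to 0
res
```

is.na() is also TRUE for NaN, so this covers whichever of the two rowMeans() returns for a row with no values left.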









-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)

What is the problem you are trying to solve?


Re: [R] tapply with unequal length of arguments

2006-03-12 Thread ronggui

2006/3/12, Stefanie von Felten, IPWIfU [EMAIL PROTECTED]:
 Hi everyone,

 Is it possible to use tapply(x,y,mean) if not all groups of x by y are
 of the same length (for example if you have one missing observation)?

Yes, it works.

 I tried tapply(x,y,mean,na.omit=T) but it doesn't work!
What does "it doesn't work" mean exactly? Can you give an example and
the error message?

 Steffi
 --
 -
 Stefanie von Felten
 Doktorandin

 ETH Zürich
 Institut für Pflanzenwissenschaften
 ETH Zentrum, LFW A 2

 Telefon: 044 632 85 97
 Telefax: 044 632 11 53
 e-mail:  [EMAIL PROTECTED]
 http://www.ipw.agrl.ethz.ch/~svfelten/

 und:

 Universität Zürich
 Institut für Umweltwissenschaften
 Winterthurerstrasse 190
 8057 Zürich

 Telefon: 044 635 61 23
 Telefax: 044 635 57 11
 e-mail:  [EMAIL PROTECTED]
 http://www.unizh.ch/uwinst/homepages/steffi.html




--
黄荣贵
Department of Sociology
Fudan University


Re: [R] tapply with unequal length of arguments

2006-03-12 Thread Uwe Ligges

Stefanie von Felten, IPWIfU wrote:

 Hi everyone,
 
 Is it possible to use tapply(x,y,mean) if not all groups of x by y are 
 of the same length (for example if you have one missing observation)?
 
 I tried tapply(x,y,mean,na.omit=T) but it doesn't work!


See ?tapply which tells you that the argument ... is passed to FUN 
which is mean() in this case. mean() has an argument na.rm, see ?mean.
So we get:

  tapply(x, y, mean, na.rm = TRUE)
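
A small illustration of the point above, using made-up data (the vectors x and y below are assumptions, not the poster's data). Groups of unequal size are no problem for tapply; the missing value is what produces NA, and only an argument that mean() actually has, na.rm, changes that. An unknown argument such as na.omit is silently ignored by mean().

```r
# Made-up data: group "a" has three values (one missing), group "b" has two
x <- c(1, 2, NA, 4, 6)
y <- factor(c("a", "a", "a", "b", "b"))

plain <- tapply(x, y, mean)                  # group "a" is NA
bad   <- tapply(x, y, mean, na.omit = TRUE)  # na.omit is not an argument of mean(): still NA
ok    <- tapply(x, y, mean, na.rm = TRUE)    # NA removed before averaging

plain
bad
ok
```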

Please read the help pages more carefully.

Uwe Ligges

 Steffi


Re: [R] tapply and weighted means

2006-01-12 Thread Dimitris Rizopoulos

you need also to split the 'w' column, for each level of 'x'; you 
could use:

lapply(split(truc, truc$x), function(z) weighted.mean(z$y, z$w))
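
For reference, a self-contained version of the line above, run on the data from the original question. Using sapply instead of lapply is my own variation, chosen only so that the result comes back as a named vector rather than a list.

```r
# The data frame from the question
truc <- data.frame(
  x = factor(c(1, 1, 1, 1, 0, 0, 0, 0)),
  y = c(1, 2, 3, 4, 2, 3, 4, 5),
  w = c(1, 2, 1, 2, 1, 2, 1, 1)
)

# Split the whole data frame by x, so y and w are split together
wm <- sapply(split(truc, truc$x), function(z) weighted.mean(z$y, z$w))
wm   # group "0": 17/5 = 3.4, group "1": 16/6 = 2.666667
```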


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm



- Original Message - 
From: Florent Bresson [EMAIL PROTECTED]
To: R-help r-help@stat.math.ethz.ch
Sent: Thursday, January 12, 2006 3:44 PM
Subject: [R] tapply and weighted means


 I'm trying to compute weighted mean on different
 groups but it only returns NA. If I use the following
 data.frame truc:

 x  y  w
 1  1  1
 1  2  2
 1  3  1
 1  4  2
 0  2  1
 0  3  2
 0  4  1
 0  5  1

 where x is a factor, and then use the command :

 tapply(truc$y,list(truc$x),wtd.mean, weights=truc$w)

 I just get NA. What's the problem ? What can I do ?

 


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm


Re: [R] tapply and weighted means

2006-01-12 Thread Frank E Harrell Jr

Dimitris Rizopoulos wrote:
 you need also to split the 'w' column, for each level of 'x'; you 
 could use:
 
 lapply(split(truc, truc$x), function(z) weighted.mean(z$y, z$w))
 
 
 I hope it helps.
 
 Best,
 Dimitris

Or:
library(Hmisc)
?wtd.mean
The help file has a built-in example of this.
Frank

 
 
 


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University


Re: [R] tapply and weighted means

2006-01-12 Thread Gavin Simpson

On Thu, 2006-01-12 at 15:44 +0100, Florent Bresson wrote:
 I'm trying to compute weighted mean on different
 groups but it only returns NA. If I use the following
 data.frame truc:
 
 x  y  w
 1  1  1
 1  2  2
 1  3  1
 1  4  2
 0  2  1
 0  3  2
 0  4  1
 0  5  1
 
 where x is a factor, and then use the command :
 
 tapply(truc$y,list(truc$x),wtd.mean, weights=truc$w)
 
 I just get NA. What's the problem ? What can I do ?

Florent,

I guess you didn't read the help for tapply, which in the Value section
states:

 Note that optional arguments to 'FUN' supplied by the '...'
 argument are not divided into cells.  It is therefore
 inappropriate for 'FUN' to expect additional arguments with the
 same length as 'X'.

So tapply is not the right tool for this job. We can use by() instead (a
wrapper for tapply) as so:

dat <- matrix(scan(), byrow = TRUE, ncol = 3)
1  1  1
1  2  2
1  3  1
1  4  2
0  2  1
0  3  2
0  4  1
0  5  1

colnames(dat) <- c("x", "y", "w")
dat <- as.data.frame(dat)
dat
(res <- by(dat, dat$x, function(z) weighted.mean(z$y, z$w)))

but if you want to easily access the numbers you need to do a little
work, e.g.

as.vector(res)
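
If one prefers to stay with tapply, a common workaround (not part of the reply above, so treat it as a sketch) is to let tapply split the row indices instead of the values. Each cell then receives the indices of its own rows, and both y and w can be subset consistently inside the function.

```r
# Same data as in the question
truc <- data.frame(
  x = c(1, 1, 1, 1, 0, 0, 0, 0),
  y = c(1, 2, 3, 4, 2, 3, 4, 5),
  w = c(1, 2, 1, 2, 1, 2, 1, 1)
)

# tapply divides the row numbers into cells; the function looks up y and w itself
res2 <- tapply(seq_len(nrow(truc)), truc$x,
               function(i) weighted.mean(truc$y[i], truc$w[i]))
res2
```

This avoids passing a full-length weights vector through '...', which is exactly what the help page warns against.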

Also, I don't see a function wtd.mean in standard R and weighted.mean()
doesn't have a weights argument, so I guess you are using a function
from another package and did not tell us.

HTH,

Gav
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [T] +44 (0)20 7679 5522
ENSIS Research Fellow [F] +44 (0)20 7679 7565
ENSIS Ltd.  ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk
UCL Department of Geography   [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way[W] http://www.ucl.ac.uk/~ucfagls/
London.  WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


Re: [R] tapply question

2005-12-16 Thread Uwe Ligges

Frank Johannes wrote:

 HI,
 Suppose I have the following data structure.
LRT  tp
 1   1.50654010 522
 2   0.51793929 522
 3   0.90340299 522
 4   1.20293325 522
 5   1.05578774 523
 6   0.01617942 523
 7   0.68183543 523
 8   0.43820244 523
 9   1.14123995 524
 10  0.05809550 524
 11  0.93061597 524
 12  1.39739700 524
 13  1.05220953 525
 14  0.03471461 525
 15  0.63168798 525
 16  1.40592603 525
 17  1.41884492 526
 18  0.23388479 526
 19  0.21881064 526
 20  0.99710830 526
 21  2.02054187 527
 22  1.99872887 527
 23  1.04187450 527
 24  1.31556807 527
 25  2.5190 528
 26  2.94778561 528
 27  1.88800177 528
 28  2.08249941 528
 
 
 I have successfully used a command line such as the one below to get
 maxima for each tp-category:
 
 data.out <- data[tapply(LRT,tp, function(x) which(LRT==max(x))),]
 
 However, when I try it on the above data, it gives me the following
 error message:
 
Error in [.data.frame(data, tapply(LRT, tp, function(x) which(LRT ==  : 
 
 invalid subscript type


Works for me. Look at your data structures and check whether your data 
frame is OK.

Or, much better and easier:

   tapply(LRT, tp, max)
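
Note that tapply(LRT, tp, max) returns the maxima themselves, not the rows of the data frame. If the rows are wanted, the following sketch may help; it uses only the first eight rows of the posted data, and the names dat, idx and dat.out are illustrative. One possible cause of the reported error (an assumption, since the full data are not shown) is a tie: which(LRT == max(x)) can then return several indices for one group, tapply returns a list, and a list is not a valid subscript. which.max always returns a single position.

```r
# First two tp groups from the posted data
dat <- data.frame(
  LRT = c(1.50654010, 0.51793929, 0.90340299, 1.20293325,
          1.05578774, 0.01617942, 0.68183543, 0.43820244),
  tp  = rep(c(522, 523), each = 4)
)

# For each tp, pick the row number holding the (first) maximum of LRT
idx <- tapply(seq_len(nrow(dat)), dat$tp,
              function(i) i[which.max(dat$LRT[i])])

dat.out <- dat[idx, ]   # one row per tp category
dat.out
```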

Uwe Ligges




 I don't know what to do.
 Thanks for your help
 
 --
 

Re: [R] tapply huge speed difference if X has names

2005-08-08 Thread Prof Brian Ripley

Please use a current version of R!

This was fixed long ago, and you will find it in the NEWS file:

 split() now handles vectors with names internally and so is
 almost as fast as on vectors without names (and maybe 100x
 faster than before).
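
For anyone who wants to check this on their own installation, a small sketch follows. The vector length and group sizes are my own choice, not the figures from the original post, and no timings are claimed here. It confirms that the results agree with and without names and prints the two timings for comparison.

```r
# A named integer vector and a grouping vector with 10 values per group
X <- 1:1000
names(X) <- X
g <- rep(1:100, each = 10)

t_plain <- system.time(fast <- tapply(unname(X), g, mean))  # names dropped first
t_named <- system.time(slow <- tapply(X, g, mean))          # names kept

identical(fast, slow)
rbind(t_plain, t_named)
```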


On Mon, 8 Aug 2005, Matthew Dowle wrote:


 Hi all,

 Apologies if this has been raised before ... R's tapply is very fast, but if
 X has names in this example, there seems to be a huge slow down: under 1
 second compared to 151 seconds.  The following timings are repeatable and
 are timed properly on a single user machine :

 X = 1:10
 names(X) = X
 # as.vector() to drop the names
 system.time(fast <- tapply(as.vector(X), rep(1:1,each=10), mean))
 [1] 0.36 0.00 0.35 0.00 0.00
 system.time(slow <- tapply(X, rep(1:1,each=10), mean))
 [1] 149.95   1.83 151.79   0.00   0.00
 head(fast)
    1    2    3    4    5    6
  5.5 15.5 25.5 35.5 45.5 55.5
 head(slow)
    1    2    3    4    5    6
  5.5 15.5 25.5 35.5 45.5 55.5
 identical(fast,slow)
 [1] TRUE


 Looking inside tapply, which then calls split, it seems there is an
 is.null(names(x)) which prevents R's internal fast version from being
 called. Why is that there? Could it be removed?  I often do something like
 tapply(mat[,colname],...) where mat has rownames. Therefore the rownames
 of mat become the names of the vector mat[,colname], and this seems to
 slow down tapply a lot. Perhaps other functions which call split also suffer
 this problem?

 split.default
 function (x, f)
 {
     if (is.list(f))
         f <- interaction(f)
     f <- factor(f)
     if (is.null(attr(x, "class")) && is.null(names(x)))
         return(.Internal(split(x, f)))
     lf <- levels(f)
     y <- vector("list", length(lf))
     names(y) <- lf
     for (k in lf) y[[k]] <- x[f %in% k]
     y
 }
 <environment: namespace:base>


 version
          _
 platform x86_64-redhat-linux-gnu
 arch     x86_64
 os       linux-gnu
 system   x86_64, linux-gnu
 status
 major    2
 minor    0.1
 year     2004
 month    11
 day      15
 language R



 Thanks and regards,
 Matthew





-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595


Re: [R] tapply

2005-06-22 Thread Martin Maechler

 AndyL == Liaw, Andy [EMAIL PROTECTED]
 on Tue, 21 Jun 2005 13:30:54 -0400 writes:

AndyL Try:
 (x <- factor(1:2, levels=1:5))
AndyL [1] 1 2
AndyL Levels: 1 2 3 4 5
 (x <- x[, drop=TRUE])
AndyL [1] 1 2
AndyL Levels: 1 2

or  
(x <- factor(1:2, levels=1:5))
(x2 <- factor(x))

which also drops the unused levels
Martin

AndyL Andy


Re: [R] tapply

2005-06-21 Thread Weiwei Shi

hi
i tried all the methods suggested above:
ave, and rowsum used with the with() function, work for my situation. I think
the problem might not be due to tapply.
My data z comes from
z <- y[y[[1]] %in% x[[2]], c(1,9)]

while z is supposed to have no entries for those non-matched between x and y.

however, when I run tapply, the result also includes those
non-matched entries. I used the is.na function to remove those entries from z
first and then used tapply again, but the result is the same: those
NA's and those non-matched results are still there. That's what I mean
by "it doesn't work".

Is there something I missed here so that z implicitly has some
trace back to y dataset?

thanks,

 


-- 
Weiwei Shi, Ph.D

Did you always know?
No, I did not. But I believed...
---Matrix III


Re: [R] tapply

2005-06-21 Thread Liaw, Andy

What does str(z) say?  I suspect the second column is a factor, which, after
the subsetting, has some empty levels.  If so, just drop those levels.
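
To make the suspicion above concrete, here is a toy reproduction (the data frame and its column names id and val are invented for illustration). Subsetting rows keeps all the original factor levels, and tapply reports one cell per level, so the levels with no remaining rows show up as NA.

```r
# Toy data: three ids, then keep only the rows for "a" and "c"
y <- data.frame(id = factor(c("a", "a", "b", "c")), val = c(1, 2, 3, 4))
z <- y[y$id %in% c("a", "c"), ]

levels(z$id)                           # "b" is still a level although no row has it
before <- tapply(z$val, z$id, sum)     # cell "b" is NA

z$id <- factor(z$id)                   # re-create the factor: unused levels disappear
after <- tapply(z$val, z$id, sum)

before
after
```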

Andy

 
 



Re: [R] tapply

2005-06-21 Thread Weiwei Shi

Even before I tried, I already realized it must be true when I read
this reply! Great job! thanks, Andy.

> str(z)
`data.frame':   235 obs. of  2 variables:
 $ CLAIMNUM : Factor w/ 1907 levels "0","1001849",..: 1083 1083
1083 1582 1582 1084 1681 1681 1391 1391 ...
 $ SIU.SAVED: int  475 3000 3000 0 0 4352 0 0 4500 3000 ...

So, I have another general question: how can I avoid this when I do the matching?
In my case, claimnum does not have to be a factor.  I think I can use
as.integer on it to de-factor it. But I want to know how to do it while
keeping it as a factor. btw, what's your way to drop those levels?  :)

weiwei 




Re: [R] tapply

2005-06-21 Thread Liaw, Andy

Try:

> (x <- factor(1:2, levels=1:5))
[1] 1 2
Levels: 1 2 3 4 5
> (x <- x[, drop=TRUE])
[1] 1 2
Levels: 1 2
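
In later versions of R than the one used in this thread there is also droplevels(), which is not mentioned above and is offered here only as an additional option. It works on a single factor or on every factor column of a data frame at once. The data frame below is made up for illustration.

```r
# A data frame whose factor column carries an unused level "b"
z <- data.frame(id = factor(c("a", "c"), levels = c("a", "b", "c")),
                val = c(3, 4))

z2 <- droplevels(z)         # drops unused levels in all factor columns
levels(z$id)                # still includes "b"
levels(z2$id)               # only the levels that occur

f <- factor(1:2, levels = 1:5)
nlevels(droplevels(f))      # same effect on a bare factor
```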

Andy

 
 



Re: [R] tapply

2005-06-20 Thread Jim Brennan

This may help
R> wei
 V1   V2   V3
1  5540 389100307391 2600
2  5541 389100307391 2600
3  5542 389100307391 2600
4  5543 389100307391 2600
5  5544 389100307391 2600
6  5546 381300302513   NA
7  5547 387000307470   NA
8  5548 387000307470   NA
9  5549 387000307470   NA
10 5550 387000307470   NA
11 5551 387000307470   NA
12 5552 387000307470   NA
R> ave(wei[,3], wei[,2], FUN=sum)
 [1] 13000 13000 13000 13000 13000    NA    NA    NA    NA    NA    NA    NA
R>

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Weiwei Shi
Sent: June 20, 2005 7:16 PM
To: R-help@stat.math.ethz.ch
Subject: [R] tapply

hi,
i have another question on tapply:
i have a dataset z like this:
5540 389100307391  2600
5541 389100307391  2600
5542 389100307391  2600
5543 389100307391  2600
5544 389100307391  2600
5546 381300302513    NA
5547 387000307470    NA
5548 387000307470    NA
5549 387000307470    NA
5550 387000307470    NA
5551 387000307470    NA
5552 387000307470    NA

I want to sum the column 3 by column 2.
I removed NA by calling:
tapply(z[[3]], z[[2]], sum, na.rm=T)
but it does not work.

then, i used
z1 <- z[!is.na(z[[3]]),]
and repeat
still doesn't work.

please help.

-- 
Weiwei Shi, Ph.D

Did you always know?
No, I did not. But I believed...
---Matrix III


Re: [R] tapply

2005-06-20 Thread Marc Schwartz

On Mon, 2005-06-20 at 18:15 -0500, Weiwei Shi wrote:
 hi,
 i have another question on tapply:
 i have a dataset z like this:
 5540 389100307391  2600
 5541 389100307391  2600
 5542 389100307391  2600
 5543 389100307391  2600
 5544 389100307391  2600
 5546 381300302513    NA
 5547 387000307470    NA
 5548 387000307470    NA
 5549 387000307470    NA
 5550 387000307470    NA
 5551 387000307470    NA
 5552 387000307470    NA
 
 I want to sum the column 3 by column 2.
 I removed NA by calling:
 tapply(z[[3]], z[[2]], sum, na.rm=T)
 but it does not work.
 
 then, i used
 z1 <- z[!is.na(z[[3]]),]
 and repeat
 still doesn't work.
 
 please help.


The index vector(s) in tapply() need to be a list. See the description
of the INDEX argument in ?tapply:

> tapply(z[[3]],list(z[[2]]), sum, na.rm = TRUE)
381300302513 387000307470 389100307391 
           0            0        13000 


Note that the use of na.rm = TRUE here results in misleading values of 0
for the other two groups, which are all NA's and this is not
self-evident unless you know the data.

You may be better off with:

> tapply(z[[3]],list(z[[2]]), sum)
381300302513 387000307470 389100307391 
          NA           NA        13000 

unless your real data is a mix of NA's and measured values.
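
For data that really is a mix of NA's and measured values, one possible middle ground is a small helper that sums the non-missing values but still returns NA for a group with nothing measured. This is only a sketch: the helper name sum_or_na is hypothetical, and the data frame below rebuilds the posted values with the grouping column stored as character.

```r
# Rebuild the posted data (V2 kept as character so the labels stay exact)
z <- data.frame(
  V2 = c(rep("389100307391", 5), "381300302513", rep("387000307470", 6)),
  V3 = c(rep(2600, 5), rep(NA, 7))
)

# Hypothetical helper: NA when a group has no measured value, otherwise the sum
sum_or_na <- function(v) if (all(is.na(v))) NA_real_ else sum(v, na.rm = TRUE)

res <- tapply(z$V3, z$V2, sum_or_na)
res
```

With this, an all-NA group stays visibly NA instead of turning into a sum of 0.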

Also see ?complete.cases and ?na.omit for further approaches to dealing
with such data sets.

HTH,

Marc Schwartz


Re: [R] tapply

2005-06-20 Thread Douglas Bates

On 6/20/05, Weiwei Shi [EMAIL PROTECTED] wrote:
 hi,
 i have another question on tapply:
 i have a dataset z like this:
 5540 389100307391  2600
 5541 389100307391  2600
 5542 389100307391  2600
 5543 389100307391  2600
 5544 389100307391  2600
 5546 381300302513    NA
 5547 387000307470    NA
 5548 387000307470    NA
 5549 387000307470    NA
 5550 387000307470    NA
 5551 387000307470    NA
 5552 387000307470    NA
 
 I want to sum the column 3 by column 2.
 I removed NA by calling:
 tapply(z[[3]], z[[2]], sum, na.rm=T)
 but it does not work.
 
 then, i used
 z1 <- z[!is.na(z[[3]]),]
 and repeat
 still doesn't work.

Can you be more explicit about "doesn't work"?


Re: [R] tapply

2005-06-20 Thread Gabor Grothendieck

On 6/20/05, Weiwei Shi [EMAIL PROTECTED] wrote:
 hi,
 i have another question on tapply:
 i have a dataset z like this:
 5540 389100307391  2600
 5541 389100307391  2600
 5542 389100307391  2600
 5543 389100307391  2600
 5544 389100307391  2600
 5546 381300302513    NA
 5547 387000307470    NA
 5548 387000307470    NA
 5549 387000307470    NA
 5550 387000307470    NA
 5551 387000307470    NA
 5552 387000307470    NA
 
 I want to sum the column 3 by column 2.
 I removed NA by calling:
 tapply(z[[3]], z[[2]], sum, na.rm=T)
 but it does not work.
 
 then, i used
 z1 <- z[!is.na(z[[3]]),]
 and repeat
 still doesn't work.
 
 please help.
 

Depending on what you want you may be able to use rowsum:

- display only groups that have at least one non-NA with the sum
  being the sum of the non-NAs:

with(na.omit(z), rowsum(V3, V2))

- display all groups with the sum being NA if any member is NA:

rowsum(z$V3, z$V2)
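
A self-contained run of the two calls above, on a data frame rebuilt from the posted values (the column names V2 and V3 follow the reply; storing V2 as character is my own choice):

```r
# Rebuild the posted data
z <- data.frame(
  V2 = c(rep("389100307391", 5), "381300302513", rep("387000307470", 6)),
  V3 = c(rep(2600, 5), rep(NA, 7))
)

# Only groups with at least one non-NA value, summed over the non-NAs
r1 <- with(na.omit(z), rowsum(V3, V2))

# All groups; a group containing an NA sums to NA
r2 <- rowsum(z$V3, z$V2)

r1
r2
```

Both results are one-column matrices with the group labels as row names, so a single group can be picked out with, for example, r2["389100307391", 1].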


Re: [R] tapply and NA value

2005-03-25 Thread Dimitris Rizopoulos

you should look at the 'na.rm=FALSE' argument of '?mean()', i.e.,
x <- rnorm(100); x[sample(100, 10)] <- NA
f <- sample(letters[1:5], 100, TRUE)
###
tapply(x, f, mean)
tapply(x, f, mean, na.rm=TRUE)
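
Because the example above uses random data, the effect may be easier to see on a small fixed vector. The values here are made up for illustration.

```r
x <- c(1, 2, NA, 4, 5, 6)
f <- c("a", "a", "a", "b", "b", "b")

# Group "a" contains one NA, so its mean is NA.
tapply(x, f, mean)

# Extra arguments to tapply() are passed on to mean(), which then drops the NA.
tapply(x, f, mean, na.rm = TRUE)
```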
I hope it helps.
Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm
- Original Message - 
From: Leonardo Lami [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Friday, March 25, 2005 10:35 AM
Subject: [R] tapply and NA value


Hi,
I'm writing for a little help.
I have a data frame with some NA values and I'd like to obtain the 
means of the
values of a column grouped by the levels of a factor column of the 
data frame.
I'm using the function tapply but I see that if even a single NA value is 
present
the result is NA.
Is there an option to get the correct result, or must I use another 
function?

Thanks to all
Leonardo
--
Leonardo Lami
[EMAIL PROTECTED]www.faunalia.it
Via Colombo 3 - 51010 Massa e Cozzile (PT), Italy   Tel: 
(+39)349-1310164
GPG key @: hkp://wwwkeys.pgp.net http://www.pgp.net/wwwkeys.html
https://www.biglumber.com


Re: [R] tapply and NA value

2005-03-25 Thread Ales Ziberna

I am not really sure what you mean. If I understand you correctly, then all 
you have to do is pass an additional argument, na.rm=TRUE, to tapply:

tapply(..., na.rm=TRUE)
However, as I already said, I'm not sure what you did and what the 
problem is. Please provide the code that did not work, possibly with a workable 
example, as the posting guide suggests:
"PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html"

I hope this helps in any way,
Ales Ziberna

Re: [R] tapply and NA value

2005-03-25 Thread Leonardo Lami

Thanks very much!
Best of all,
Leonardo


-- 
Leonardo Lami
[EMAIL PROTECTED]www.faunalia.it
Via Colombo 3 - 51010 Massa e Cozzile (PT), Italy   Tel: (+39)349-1310164
GPG key @: hkp://wwwkeys.pgp.net http://www.pgp.net/wwwkeys.html
https://www.biglumber.com


RE: [R] tapply and names

2005-01-25 Thread Liaw, Andy

 From: Göran Broström
 
 I have a data frame containing children, with variables 'year' = birth
 year, and 'm.id' = mother's id number. Let's assume that all 
 the births of
 each mother is represented in the data frame. 
 
 Now I want to create a subset of this data frame containing 
 all children,
 whose mother's first birth was in the year 1816 or later. 
 This seems to
 work: 
 
 mid <- tapply(dat$year, dat$m.id, min)
 mid <- as.numeric(names(mid)[mid >= 1816])
 dat <- dat[dat$m.id %in% mid, ]
 
 but I'm worried about the second line, because the output 
 from 'tapply'
 isn't documented to have a 'dimnames' attribute (although it 
 has one, at
 least in R-2.1.0, 2005-01-19). Another aspect is that this 
 code relies on
 m.id being numeric; I would have to change it if the type of 
 m.id changes
 to, eg, character.
 
 So, question: Is there a better way of doing this?

Would this work?

  dat <- dat[ave(dat$year, dat$m.id, min) >= 1816, ]

Andy

 -- 
  Göran Broströmtel: +46 90 786 5223
  Department of Statistics  fax: +46 90 786 6614
  Umeå University   http://www.stat.umu.se/egna/gb/
  SE-90187 Umeå, Sweden e-mail: [EMAIL PROTECTED]
 

Re: [R] tapply and names

2005-01-25 Thread Dimitris Rizopoulos

your approach, after omitting the as.numeric() in the second line, 
seems to work even for `m.id' being factor, i.e.,

dat <- data.frame(m.id=rep(letters[1:10], 10), year=sample(1805:1950, 
100, TRUE))
###
mid <- tapply(dat$year, dat$m.id, min)
mid <- names(mid)[mid >= 1816]
dat. <- dat[dat$m.id %in% mid, ]
dat; dat.
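
The same idea on a small fixed data set, where the result can be checked by eye. The ids and years are made up; only the column names m.id and year come from the thread.

```r
dat <- data.frame(m.id = c("a", "a", "b", "b", "c"),
                  year = c(1810, 1820, 1816, 1830, 1900),
                  stringsAsFactors = FALSE)

first <- tapply(dat$year, dat$m.id, min)  # earliest birth year per mother
keep  <- names(first)[first >= 1816]      # ids stay character, no as.numeric()
dat[dat$m.id %in% keep, ]                 # all children of mothers "b" and "c"
```

Comparing the ids as character strings via %in% is what makes this independent of whether m.id is numeric, character or a factor.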

but maybe there is something better.
Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat
http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm

Re: [R] tapply and names

2005-01-25 Thread Göran Broström

On Tue, Jan 25, 2005 at 10:43:24AM -0500, Liaw, Andy wrote:
  From: Göran Broström
  
  I have a data frame containing children, with variables 'year' = birth
  year, and 'm.id' = mother's id number. Let's assume that all 
  the births of
  each mother is represented in the data frame. 
  
  Now I want to create a subset of this data frame containing 
  all children,
  whose mother's first birth was in the year 1816 or later. 
  This seems to
  work: 
  
  mid <- tapply(dat$year, dat$m.id, min)
  mid <- as.numeric(names(mid)[mid >= 1816])
  dat <- dat[dat$m.id %in% mid, ]
  
  but I'm worried about the second line, because the output 
  from 'tapply'
  isn't documented to have a 'dimnames' attribute (although it 
  has one, at
  least in R-2.1.0, 2005-01-19). Another aspect is that this 
  code relies on
  m.id being numeric; I would have to change it if the type of 
  m.id changes
  to, eg, character.
  
  So, question: Is there a better way of doing this?
 
 Would this work?
 
   dat <- dat[ave(dat$year, dat$m.id, min) >= 1816, ]

Yes, but you (or I) need

 dat <- dat[ave(dat$year, dat$m.id, FUN = min) >= 1816, ]
                                    ^^^
(took me some time to figure out), because

?ave

Usage:

 ave(x, ..., FUN = mean)
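
In that usage, every unnamed argument after x is collected by ... and treated as a grouping variable, which is why the function has to be passed by name. A minimal sketch on made-up data (the column names follow the thread, the values are invented):

```r
dat <- data.frame(m.id = c(1, 1, 2, 2, 3),
                  year = c(1810, 1820, 1816, 1830, 1900))

# ave() returns one value per row: here each mother's earliest birth year.
first <- ave(dat$year, dat$m.id, FUN = min)
first

# The result lines up with the rows of dat, so it can index them directly.
dat[first >= 1816, ]
```

Unlike the tapply() route, nothing here depends on names or on the type of m.id.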

Thanks Andy for giving me 'ave'! And thanks to Dimitris for his suggestion. 

Göran


Re: [R] tapply hist

2004-05-13 Thread Jason Turner

 ...
 # Histograms by technology
 par(mfrow=c(2,3))
 tapply(Pot,SGruppo,hist)
 detach(dati)

 It all works great but  tapply(Pot,SGruppo,hist) produces 6 histograms
 with
 the titles and the xlab labels in a generic form, something like
 integer[1],
 integer[2], ... while I'd like to have each graph indicating the

tapply takes atomic data (usually vectors).  You want to pass rows of a
data frame, so the Pot *and* SGruppo will be sent together; by() is very
good for this.  It might be possible (even easy?) to use tapply, but I
just use by for these things.

Since dati is your data frame, try this (untested!):

by(dati,dati$SGruppo, function(x,...){
  hist(x$Pot,main=as.character(x$SGruppo[1])) } )

Or, use Lattice:

library(lattice)
histogram( ~ Pot | SGruppo, data=dati)

Cheers

Jason


Re: [R] tapply hist

2004-05-13 Thread Gabor Grothendieck


As another respondent already mentioned, Lattice is probably the way to
go on this one but if you do want to use tapply try this:

names(Pot) <- SGruppo
dummy <- tapply(Pot,SGruppo,function(x)hist(x,main=names(x)[1],xlab=NULL))
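
Another way to get one title per group, sketched here as an alternative rather than as what either respondent proposed, is to split the values by group and loop over the group names. The data are made up; only the column names Pot and SGruppo come from the thread, and just two of the six technologies are used.

```r
# Made-up stand-ins for the Pot and SGruppo columns.
set.seed(1)
dati <- data.frame(Pot = c(rnorm(20, 100, 10), rnorm(20, 300, 30)),
                   SGruppo = rep(c("CCC", "TE"), each = 20),
                   stringsAsFactors = FALSE)

groups <- split(dati$Pot, dati$SGruppo)  # one numeric vector per technology

par(mfrow = c(1, 2))
# Looping over the names gives each call to hist() access to its own label.
h <- lapply(names(groups), function(g) {
  hist(groups[[g]], main = g, xlab = "Pot")
})
```

The function passed to tapply() only sees the values of a group, not its label, which is why the label has to be carried in some other way.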


Vittorio v.demartino2 at virgilio.it writes:

: 
: I'm learning how to use tapply. 
: Now I'm having a go at the following code in which dati contains almost 600 
: lines, Pot - numeric - are the capacities of power plants and SGruppo - text 
: - the corresponding six technologies (CCC, CIC, TGC, CSC, CPC, TE). 
: .
: 
: dati=sqlQuery(canale,"select Id,SGruppo,Classe, NGruppo,ProdNetta,Pot from SintesiQuery")
: attach(dati)
: # Grouping by technology
: tapply(Pot,SGruppo,sum)
: ...
: # Histograms by technology
: par(mfrow=c(2,3)) 
: tapply(Pot,SGruppo,hist)
: detach(dati)
: 
: It all works great but  tapply(Pot,SGruppo,hist) produces 6 histograms with 
: the titles and the xlab labels in a generic form, something like integer[1], 
: integer[2], ... while I'd like to have each graph indicating the 
: mentioned technologies.
: I've been trying issuing 
: tech=c("CCC", "CIC","TGC", "CSC","CPC", "TE")
: tapply(Pot,SGruppo,hist, main=tech)
: 
: but R prints in each histogram the six values in the title without cycling 
: among them.
: 
: How can I obtain what I want?
: 
: Ciao
: Vittorio


Re: [R] tapply() and barplot() help files for 1.8.1

2004-04-16 Thread David Whiting

Martin Maechler [EMAIL PROTECTED] writes:

 and I like to help you.
 As I keep installed `(almost) all released versions of R ever
 installed on our machines'
 I can easily run 1.8.1 (or 1.4.x or 1.0.x ...) for you.
 
 The only difference
  between the help page help(tapply)
 is an extra   require(stats) statement at the beginning of the
 `Examples' section in 1.9.0.
 
 and the only change to  tapply() is 
 group <- rep.int(one, nx)#- to contain the splitting vector
 instead of
 group <- rep(one, nx)#- to contain the splitting vector
 
 which hardly should have adverse results.
 
 In barplot, there's the new 'offset' option  --- not in NEWS ()
 
 and another change that may be a problem.
 
 Can you dig harder and if possible provide a reproducible (small..)
 example to make progress here...
 

Last night I found I had a backup of the source of 1.8.0, built that
and tested an example and it worked as in 1.9.0.  I then started to
question my sanity (or at least my competence).

The code that follows should be a reproducible example.  It creates a
data frame that has the same structure as the data I am working with
(with a number of other columns dropped) and is followed by the
function that creates the barplot.  The changes I have had to make to
make it work as I thought it was working with 1.8.1 have ## NEW BIT
after them, i.e. those lines were not there in the version I ran with
1.8.1.  The important new lines are:

 x <- matrix(x)  ## NEW BIT

and 

 beside = TRUE,  ## NEW BIT



--- EXAMPLE ---

## Create some fake data.
x <- c(rep("", 926), 
       rep("All Other Perinatal Causes", 46), 
       rep("Anaemia", 3), 
       rep("Congenital Abnormalities", 1), 
       rep("Unsp. Direct Maternal Causes", 24))
y <- runif(length(x))
tempdat <- data.frame(smi=x, yllperdth=y)



## Define the function to make my barplot
bodShare <- function(x, fld, main = "", userpar = 18, xlimMult=1.3 ) {
  ##############################################
  # A horizontal barchart to display BoD shares #
  ##############################################
  z <- subset(x, as.character(x[,fld]) != "")
  z[, fld] <- factor(z[, fld])

  ## We need to change the parameters of the chart.
  ## First save the old settings.
  oldpar <- par("mar")
  newpar <- par("mar")

  ## Increase the size of the margin on the left so there 
  ## is enough space for the long text labels (which will 
  ## be displayed horizontally on the y-axis).
  newpar[2] <- userpar

  
  ## Reduce the top margin because I will use a \caption in LaTeX 
  ## instead.
  newpar[3] <- 1


  ## Now apply the new settings.
  par(mar = newpar)

  ## Calculate the % of YLLs for each group in the cause classification.
  x <- tapply(z$yllperdth, z[, fld], sum)
  totalYLLs <- sum(x)
  x <- x / totalYLLs * 100
  x <- sort(x)

  causeNames <- names(x)  ## NEW BIT
  x <- matrix(x)  ## NEW BIT
  

  ## Plot the chart. horiz = TRUE makes it a bar instead of 
  ## column chart.  las = 1 prints the labels horizontally.
  xplot <- barplot(x, 
                   ##   main = main,
                   horiz = TRUE, 
                   beside = TRUE,            ## NEW BIT
                   names.arg = causeNames,   ## NEW BIT
                   xlab = "Percent of YLLs",
                   xlim = c(0, max(x) * xlimMult), 
                   las = 1)
  
  text(x + (max(x) * .15), xplot, formatC(x, digits=1, format='f'))

  ## Reset the old margin parameters.
  par(mar = oldpar)
  
  ## Write data to a table for export.
  # First we need to remove newlines from labels.
  names(x) <- sub("\n", "", names(x))
  write.table(as.table(x), file = paste("tables/", fld, ".csv", sep=""), col.names=NA, 
              sep="\t")
  names(x) <- causeNames
  x[length(x)]
}

## Create the barplot.
bodShare(tempdat, "smi")


-- 
David Whiting
Dar es Salaam, Tanzania


Re: [R] tapply() and barplot() help files for 1.8.1

2004-04-15 Thread Martin Maechler

 David == David Whiting [EMAIL PROTECTED]
 on 15 Apr 2004 11:42:18 + writes:

David Hi,

David I've just upgraded to 1.9.0 and one of my Sweave
David files that produces a number of barplots in a
David standard manner now produces them in a different way.
David I have made a couple of small changes to my code to
David get back the output I was getting before
David upgrading and now (mostly out of curiosity) would
David like to understand what has changed.

and I like to help you.
As I keep installed `(almost) all released versions of R ever
installed on our machines'
I can easily run 1.8.1 (or 1.4.x or 1.0.x ...) for you.

The only difference
 between the help page help(tapply)
is an extra   require(stats) statement at the beginning of the
`Examples' section in 1.9.0.

and the only change to  tapply() is 
group <- rep.int(one, nx)#- to contain the splitting vector
instead of
group <- rep(one, nx)#- to contain the splitting vector

which hardly should have adverse results.

In barplot, there's the new 'offset' option  --- not in NEWS ()

and another change that may be a problem.

Can you dig harder and if possible provide a reproducible (small..)
example to make progress here...


David I *think* I've tracked it down to tapply() and/or
David barplot() and have not seen anything in the NEWS file
David regarding changes to these functions (as far as I can
David see).  As part of doing my homework, I would like to
David read the version 1.8.1 help files for these two
David functions, but now that I've upgraded I'm not sure
David where I can find them.  Is there a simple way for me
David to get copies of these two help files to compare with
David the versions in 1.9.0?  As far as I can see,
David barplot() and tapply() in 1.9.0 work as described in
David their 1.9.0 help files (which does not surprise me).

David I've been lurking on this list long enough to know
David that if there has been a change it is documented, so
David it must be that I just haven't found it yet.  If
David there hasn't been a change, then I am totally
David perplexed, because I have been running this Sweave
David file several times a day for the last few weeks and
David have not changed that part of it (I've been changing
David the LaTeX parts).

David In the part of the code that has changed I use
David tapply() to summarise some data and then plot it with
David barplot().  I now have to use matrix() on the output
David of tapply() before using barplot() because tapply()
David produces a list and barplot() wants a vector or
David matrix.

David In the code below, z is a dataframe, yllperdth is a
David numeric and fld is the name of a factor, both in the
David dataframe.

David Old version (as used with R 1.8.1):

David   ## Calculate the % of YLLs for each group in the cause classification.
David   x <- tapply(z$yllperdth, z[, fld], sum)
David   totalYLLs <- sum(x)
David   x <- x / totalYLLs * 100
David   x <- sort(x)

David   ## Plot the chart. horiz = TRUE makes it a bar instead of
David   ## column chart.  las = 1 prints the labels horizontally.
David   xplot <- barplot(x,
David                    horiz = TRUE,
David                    xlab = "Percent of YLLs",
David                    las = 1)


David New Version (as used with R 1.9.0):

David   ## Calculate the % of YLLs for each group in the cause classification.
David   x <- tapply(z$yllperdth, z[, fld], sum)
David   totalYLLs <- sum(x)
David   x <- x / totalYLLs * 100
David   x <- sort(x)

David   causeNames <- names(x)  ## NEW BIT
David   x <- matrix(x)          ## NEW BIT

David   ## Plot the chart. horiz = TRUE makes it a bar instead of
David   ## column chart.  las = 1 prints the labels horizontally.
David   xplot <- barplot(x,
David                    beside = TRUE,           ## NEW BIT
David                    names.arg = causeNames,  ## NEW BIT
David                    horiz = TRUE,
David                    xlab = "Percent of YLLs",
David                    las = 1)




 version
David          _
David platform i686-pc-linux-gnu
David arch     i686
David os       linux-gnu
David system   i686, linux-gnu
David status
David major    1
David minor    9.0
David year     2004
David month    04
David day      12
David language R


David A little while before upgrading I noted my previous R
David version (for a post that I redrafted 7 times and
David never sent because I found the answer through
David refining my draft), and it was:

 version
David          _
David platform i686-pc-linux-gnu
David arch     i686
David os       linux-gnu
David system   i686, linux-gnu
David status   Patched
David major    1
David minor    8.1
David year     2004
David month    02
David day      16
David language R

David So, can I get the old help files?  Or is it easy to
David point me to a documented change?  Or is it clear from
David my code what has changed or what I am or was doing
David wrong?

David Thanks.

David Dave

David -- David Whiting Dar es Salaam, Tanzania


Re: [R] tapply() and barplot() help files for 1.8.1

2004-04-15 Thread Duncan Murdoch

On Thu, 15 Apr 2004 18:10:27 +0200, Martin Maechler
[EMAIL PROTECTED] wrote :

In barplot, there's the new 'offset' option  --- not in NEWS ()

and another change that may be a problem.

Here's a reproducible bug in barplot in 1.9.0 (based on an email I got
this morning from Richard Rowe):

x - table(rep(1:5,1:5))
barplot(x)

The problem is that table() produces a one dimensional array, and
barplot() doesn't handle those properly now.  The offending line is
this one:

$ cvs diff -r 1.3 barplot.R
[junk deleted] 
43c43
<   width <- rep(width, length.out = NR * NC)
---
>   width <- rep(width, length.out = NR)

In the example above, x gets turned into a matrix with NR=1 row and
NC=5 columns so only one bar width gets set.
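
The one-dimensional array structure can be inspected directly. The sketch below also shows that tapply() with a single grouping factor returns the same kind of object, and a possible workaround of passing barplot() a plain named vector. The workaround is a suggestion under that assumption, not a fix proposed in this thread, and the tapply() values are made up.

```r
x <- table(rep(1:5, 1:5))
dim(x)        # a single dimension of length 5
is.matrix(x)  # FALSE: a one-dimensional array is not a matrix

# tapply() with one grouping factor also returns a one-dimensional array.
y <- tapply(c(1, 2, 3), c("a", "b", "a"), sum)
dim(y)

# Possible workaround: drop the dim attribute but keep the labels.
v <- setNames(as.vector(x), names(x))
barplot(v)
```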

Duncan Murdoch


RE: [R] tapply

2004-03-18 Thread Gabor Grothendieck

Try this (untested):

aggregate( data[,6:8], list(date = as.matrix(data[,1:3]) %*% c(10000,100,1)), mean )
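
A variation on the same idea, sketched on made-up data: grouping by the three date columns themselves, instead of a combined date number, keeps year, month and day as separate columns in the result. The column names follow the original post; the values are invented.

```r
# Made-up hourly data: two days, two readings per day.
data <- data.frame(year  = c(2004, 2004, 2004, 2004),
                   month = c(3, 3, 3, 3),
                   day   = c(17, 17, 18, 18),
                   hour  = c(0, 12, 0, 12),
                   temp  = c(1, 3, 5, 7),
                   press = c(1000, 1010, 1020, 1030),
                   ozone = c(30, 40, 50, 60))

# One row per day, with the daily mean of each measured parameter.
daily <- aggregate(data[, c("temp", "press", "ozone")],
                   by = list(year = data$year, month = data$month, day = data$day),
                   FUN = mean)
daily
```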

---
Date:   Thu, 18 Mar 2004 09:39:02 +0100 
From:   [EMAIL PROTECTED]
To:   [EMAIL PROTECTED] 
Subject:   [R] tapply 

Dear all
I have a dataframe containing hourly data of 3 parameters. 
I would like to create a dataframe containing daily mean values of these 
parameters. Additionally I want to keep information about the time of 
measurement (year,month,day). 
With the function tapply I can average over one column of the dataframe. 
I can repeat the function twice more and merge the vectors. In this way I 
obtain my new dataframe (see below). If I want to add the columns day, 
month and year I can repeat tapply another three times. This system works. 

Question: is there a function that averages in a single step over the 3 
columns?

Thanks a lot for your answer!
Regards
Mike Campana 

### read the data
setwd("c:/R")
data <- NULL
data <- as.data.frame(read.table(file="Montreal.txt",header=F,skip=15))
colnames(data) <- c("year","month","day","hour","min","temp","press","ozone")
### create mean value
temp_daily <- 
tapply(data$temp,data$year*10000+data$month*100+data$day,FUN=mean)
press_daily <- 
tapply(data$press,data$year*10000+data$month*100+data$day,FUN=mean)
ozone_daily <- 
tapply(data$ozone,data$year*10000+data$month*100+data$day,FUN=mean)
### merge the data
newdata <- as.data.frame (cbind(temp_daily,press_daily,ozone_daily))


Re: [R] tapply

2004-03-18 Thread Thomas Petzoldt

[EMAIL PROTECTED] wrote:

 Question: is there a function that average in a single step over the 3 
 columns?

You may look for ?aggregate

Thomas P.
