Re: [R] tapply grand mean
Lauri Nikkinen wrote:
> Hi R-users,
>
> I have a data.frame like this (modified from
> https://stat.ethz.ch/pipermail/r-help/2007-August/138124.html).
>
>     y1 <- rnorm(20) + 6.8
>     y2 <- rnorm(20) + (1:20*1.7 + 1)
>     y3 <- rnorm(20) + (1:20*6.7 + 3.7)
>     y <- c(y1,y2,y3)
>     x <- rep(1:5,12)
>     f <- gl(3,20, labels=paste("lev", 1:3, sep=""))
>     d <- data.frame(x=x,y=y, f=f)
>
> and this is how I can calculate the mean of these levels.
>
>     tapply(d$y, list(d$x, d$f), mean)
>
> But how can I calculate the mean of d$x 1 and 2 and the grand mean of
> d$x 1, 2, 3, 4, 5 (within d$f) into a table?

You might like the tables produced by summary.formula() in the Hmisc package:

    library(Hmisc)

    summary(y ~ x + f, data = d, fun=mean, method="cross", overall=TRUE)

     UseMethod by x, f

    +-+
    |N|
    |y|
    +-+
    +---+---------+---------+---------+---------+
    | x |  lev1   |  lev2   |  lev3   |   ALL   |
    +---+---------+---------+---------+---------+
    |1  | 4       | 4       | 4       |12       |
    |   | 6.452326|15.861256|61.393455|27.902346|
    +---+---------+---------+---------+---------+
    |2  | 4       | 4       | 4       |12       |
    |   | 7.403041|17.296270|68.208299|30.969203|
    +---+---------+---------+---------+---------+
    |3  | 4       | 4       | 4       |12       |
    |   | 6.117648|17.976864|73.479837|32.524783|
    +---+---------+---------+---------+---------+
    |4  | 4       | 4       | 4       |12       |
    |   | 7.831390|19.696998|80.323382|35.950590|
    +---+---------+---------+---------+---------+
    |5  | 4       | 4       | 4       |12       |
    |   | 6.746213|21.101952|87.430087|38.426084|
    +---+---------+---------+---------+---------+
    |ALL|20       |20       |20       |60       |
    |   | 6.910124|18.386668|74.167012|33.154601|
    +---+---------+---------+---------+---------+

    summary(y ~ I(x %in% c(1,2)) + f, data = d, fun=mean, method="cross", overall=TRUE)

     UseMethod by I(x %in% c(1, 2)), f

    +-+
    |N|
    |y|
    +-+
    +-----------------+---------+---------+---------+---------+
    |I(x %in% c(1, 2))|  lev1   |  lev2   |  lev3   |   ALL   |
    +-----------------+---------+---------+---------+---------+
    |      FALSE      |12       |12       |12       |36       |
    |                 | 6.898417|19.591938|80.411102|35.633819|
    +-----------------+---------+---------+---------+---------+
    |      TRUE       | 8       | 8       | 8       |24       |
    |                 | 6.927684|16.578763|64.800877|29.435774|
    +-----------------+---------+---------+---------+---------+
    |       ALL       |20       |20       |20       |60       |
    |                 | 6.910124|18.386668|74.167012|33.154601|
    +-----------------+---------+---------+---------+---------+

> Regards,
> Lauri
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
Re: [R] tapply grand mean
Thanks Chuck but I would fancy the output made by tapply because the idea is to make a barplot based on those values.

-Lauri

2007/8/8, Chuck Cleland [EMAIL PROTECTED]:
> [...]
Re: [R] tapply grand mean
Lauri Nikkinen wrote:
> Thanks Chuck but I would fancy the output made by tapply because the
> idea is to make a barplot based on those values.
>
> -Lauri

    sum1 <- summary(y ~ x + f, data = d, fun=mean, method="cross", overall=TRUE)

    df <- data.frame(x = sum1$x, f = sum1$f, y = sum1$S)

    df
         x    f         y
    1    1 lev1  6.452326
    2    2 lev1  7.403041
    3    3 lev1  6.117648
    4    4 lev1  7.831390
    5    5 lev1  6.746213
    6  ALL lev1  6.910124
    7    1 lev2 15.861256
    8    2 lev2 17.296270
    9    3 lev2 17.976864
    10   4 lev2 19.696998
    11   5 lev2 21.101952
    12 ALL lev2 18.386668
    13   1 lev3 61.393455
    14   2 lev3 68.208299
    15   3 lev3 73.479837
    16   4 lev3 80.323382
    17   5 lev3 87.430087
    18 ALL lev3 74.167012
    19   1  ALL 27.902346
    20   2  ALL 30.969203
    21   3  ALL 32.524783
    22   4  ALL 35.950590
    23   5  ALL 38.426084
    24 ALL  ALL 33.154601

    library(lattice)

    barchart(y ~ x | f, data = df, layout=c(4,1,1))

OR

    barchart(S ~ x | f, data = sum1, layout=c(4,1,1))

--
Chuck Cleland, Ph.D.
NDRI, Inc.
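For readers who, like Lauri, would rather stay with tapply(), here is a minimal sketch (not taken from the thread) of one way to append the two extra rows to the tapply() table before plotting. The set.seed() call and the row labels "1-2" and "ALL" are assumptions made for illustration; the data are simulated as in the original question.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
y1 <- rnorm(20) + 6.8
y2 <- rnorm(20) + (1:20 * 1.7 + 1)
y3 <- rnorm(20) + (1:20 * 6.7 + 3.7)
d <- data.frame(x = rep(1:5, 12), y = c(y1, y2, y3),
                f = gl(3, 20, labels = paste("lev", 1:3, sep = "")))

# Cell means: one row per level of x, one column per level of f
cell <- tapply(d$y, list(d$x, d$f), mean)

# Grand mean over all x within each level of f
grand <- tapply(d$y, d$f, mean)

# Mean of the observations with x equal to 1 or 2, within each level of f
x12 <- with(subset(d, x %in% 1:2), tapply(y, f, mean))

# Stack everything into one matrix; row labels are illustrative choices
tab <- rbind(cell, "1-2" = x12, ALL = grand)
tab

# A matrix like this can be handed straight to barplot()
barplot(tab, beside = TRUE, legend.text = rownames(tab))
```

Because the result is an ordinary numeric matrix, it keeps the shape that barplot() expects, which seems to be what the tapply-based approach was after.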
Re: [R] tapply
I do not understand what you want. If aps is constant over each class, then the mean for each class is equal to any value of aps. Using your example you can do

    tapply(icu1$aps, icu1$d, mean)

but it does not give you anything new. Can you explain the problem a bit more?

--- sigalit mangut-leiba [EMAIL PROTECTED] wrote:
> hello,
> I want to compute the mean of a variable (aps) for every class (1, 2, and 3).
> Every id has a few observations; aps and class are constant within each id,
> like this:
>
>     id aps class
>      1  11     2
>      1  11     2
>      1  11     2
>      1  11     2
>      1  11     2
>      2   8     3
>      2   8     3
>      2   8     3
>      3  12     2
>      3  12     2
>      .
>      .
>
> I tried:
>
>     tapply(icu1$aps_st, icu1$hidclass, function(z) mean(unique(z)))
>
> but it's counting every row and not every id.
> thank you,
> Sigalit.
Re: [R] tapply
I also don't understand, but perhaps:

    with(df, tapply(aps, list(class, id), mean))

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

On 19/07/07, sigalit mangut-leiba [EMAIL PROTECTED] wrote:
> [...]
Re: [R] tapply
sigalit mangut-leiba wrote:
> I'm sorry for the unfocused questions, I'm new here...
> The output should be:
>
>     class aps_mean
>     1     na
>     2     11.5
>     3     8
>
> the mean aps of every class, when every id counts *once*,
> for example: class 2, mean = (11+12)/2 = 11.5
> Hope it's clearer.

Much...

Get the first record for each individual from (e.g.)

    icul.redux <- subset(icul, !duplicated(id))

then use tapply as before using variables from icul.redux. Or in one go

    with(
      subset(icul, !duplicated(id)),
      tapply(aps, class, mean, na.rm=TRUE)
    )

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918   FAX: (+45) 35327907
([EMAIL PROTECTED])
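The !duplicated(id) idea can be checked on the small example from the question. The sketch below rebuilds that example by hand; the data frame name icu1 follows the original post, and declaring class as a factor with levels 1 to 3 is an assumption made here so that the empty class 1 shows up in the result.

```r
# Example data typed in from the question (id 2 has aps 8 and class 3)
icu1 <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3),
  aps   = c(11, 11, 11, 11, 11, 8, 8, 8, 12, 12),
  # assumption: class stored as a factor so that the empty level 1 is kept
  class = factor(c(2, 2, 2, 2, 2, 3, 3, 3, 2, 2), levels = 1:3)
)

# Keep only the first row of each id, so every id counts once
first <- subset(icu1, !duplicated(id))

# Mean aps per class over ids rather than over rows
res <- with(first, tapply(aps, class, mean, na.rm = TRUE))
res
```

Class 1 has no ids, so tapply() reports NA for it; class 2 averages ids 1 and 3, and class 3 contains only id 2.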
Re: [R] tapply
I'm not sure what the 'pvalue' function is (it's not found in the base or stats packages), but this should give you what you want:

    # some example
    re <- rnorm(100)
    reg <- rep(1:3, length=100)
    ast <- rep(1:2, length=100)
    tapply( re, list(reg, ast), function(v) shapiro.test(v)$p.value )

    # or neater by defining a function
    p.shapiro <- function(v) shapiro.test(v)$p.value
    tapply( re, list(reg, ast), p.shapiro )

hth,
michal

> Hello, I want to conduct a normality test on a series of data and get the
> p-value for each subset. I am using the following code, but it does not
> work.
>
>     tapply(re, list(reg, ast), pvalue(shapiro.test))
>
> Could anyone give me some advice? Many thanks.
Re: [R] tapply
try this:

    tapply(re, list(reg, ast), function(x) shapiro.test(x)$p.value)

I hope it helps.

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

----- Original Message -----
From: livia [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Friday, June 01, 2007 1:00 PM
Subject: [R] tapply

> [...]
Re: [R] tapply histogram
Use lattice graphics.

At 08:00 AM 6/1/2007, livia wrote:
> Dear members,
>
> I would like to pass the histogram settings to each subset of the
> data frame and generate a multiple-figure graph.
>
> First, can anyone tell me how to generate a multiple-figure environment?
> I am trying mfrow=c(2,4) and nothing appears.
>
> Secondly, I want to pass the following function in tapply()
>
>     hist(x, freq=FALSE)
>     lines(density(x), col="red")
>     rug(x)
>
> how can I manage it? Many thanks
Re: [R] tapply histogram
On Fri, 2007-06-01 at 06:00 -0700, livia wrote:
> [...]

In this case, you would not want to use one of the *apply() family of functions. First, it does not save you anything, and second, these functions are designed to return some type of R object, which you don't want here.

Better to use a for() loop and, if you wish, encapsulate the loop in a function. Something along the lines of the following, which actually defines a new 'formula' method for hist() (though not fully tested):

    hist.formula <- function(formula, data, cols, rows, ...)
    {
      DF <- model.frame(formula, data = data, ...)
      DF.split <- split(DF[[1]], DF[[2]])

      # mfrow expects c(number of rows, number of columns)
      par(mfrow = c(rows, cols))

      for (i in names(DF.split))
      {
        Col <- DF.split[[i]]
        hist(Col, freq = FALSE, main = i, ...)
        lines(density(Col), col = "red")
        rug(Col)
      }
    }

The function will take the formula, create a data frame comprised of the formula terms and then loop over the list of vectors created by split().

So we call it as follows:

    hist(Sepal.Length ~ Species, data = iris, 2, 2)

Based upon the formula specification, you will then get a matrix of histograms, where each will be titled with the factor level used to split the original data frame.

You could further consolidate the function by implementing an automated means to determine the number of rows and columns required in the plot matrix, but I'll leave that for you.

See ?model.frame and ?split

HTH,

Marc Schwartz
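The automated layout that Marc leaves as an exercise could be sketched as follows. This is only one possible approach, not part of the thread: the helper name hist_by_group is hypothetical, and it relies on grDevices::n2mfrow() to pick a grid for a given number of panels.

```r
# Hypothetical helper (name invented here): one histogram per group,
# with the plot grid chosen automatically instead of passed by hand.
hist_by_group <- function(values, groups) {
  parts <- split(values, groups)

  # n2mfrow() returns c(nrow, ncol) suitable for this many panels
  old <- par(mfrow = grDevices::n2mfrow(length(parts)))
  on.exit(par(old))  # restore the previous layout when done

  for (nm in names(parts)) {
    v <- parts[[nm]]
    hist(v, freq = FALSE, main = nm, xlab = "")
    lines(density(v), col = "red")
    rug(v)
  }
  invisible(names(parts))
}

panels <- hist_by_group(iris$Sepal.Length, iris$Species)
```

Taking plain vectors rather than a formula keeps the sketch short; wrapping it in a formula interface as in the message above would work the same way.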
Re: [R] tapply
Hello

Seems to me that you can make a summary table using

    aggregate(RESPONSE, list(TREATMENT, MEASUREMENT, BLOCK, STUDY), mean)

and then, if you want, you can use the reshape function or the melt/cast functions from the reshape package to get the wide form of your table.

Regards
Petr Pikal
[EMAIL PROTECTED]

[EMAIL PROTECTED] wrote on 10.04.2007 00:14:15:
> Hi, I have a summary table for an experiment that looks like this
>
>     STUDY BLOCK TREATMENT MEASURMENT RESPONSE
>     A     1     T-0       1          12
>     A     1     T-1       1          52
>     A     1     T-0       2          12
>     A     1     T-1       2          65
>
> and so on... there are 10 studies, 4 blocks, 10 treatments, 5 measurements
> for the response value.
>
> I want to produce a table that looks like this:
>
>     STUDY BLOCK TREATMENT MEAS.1 MEAS.2 MEAS.3
>     A     1     T-1       15     54     65
>     A     1     T-2       54     65     45
>     A     2     T-1       12     12     23
>     A     2     T-2       65     54     65
>
> and so on...
>
> With
>
>     tapply(RESPONSE, list(TREATMENT, MEASUREMENT, BLOCK, STUDY), mean)
>
> I get very close; however, I get the results as a list! If instead I use
>
>     ftable(tapply(RESPONSE, list(TREATMENT, MEASUREMENT, BLOCK, STUDY), mean))
>
> I get REALLY close, but then I get only one value for each class. However,
> I need the whole table, because in the end what I really need is the
> increment MEASUREMENT (n) - MEASUREMENT (n-1) for each TREATMENT, BLOCK,
> STUDY, to perform an ANOVA on the increment data.
>
> Essentially, I want to move away from running a pivot table in ACCESS.
>
> Any thoughts?
>
> Cristian Montes
> North Carolina State University
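A rough sketch of the aggregate() plus reshape() route suggested above, using base R only. The toy data are invented for illustration (one study, two blocks, two treatments, three measurements, arbitrary responses), and the MEAS.n and INCR.n column names are choices made here, not something prescribed in the thread.

```r
# Toy data, invented for illustration only
dat <- expand.grid(STUDY = "A", BLOCK = 1:2, TREATMENT = c("T-1", "T-2"),
                   MEASUREMENT = 1:3, stringsAsFactors = FALSE)
dat$RESPONSE <- seq_len(nrow(dat))^2  # arbitrary made-up responses

# Cell means in long form: one row per STUDY/BLOCK/TREATMENT/MEASUREMENT
long <- aggregate(RESPONSE ~ STUDY + BLOCK + TREATMENT + MEASUREMENT,
                  data = dat, FUN = mean)

# Long to wide: one column per measurement occasion
wide <- reshape(long, direction = "wide",
                idvar = c("STUDY", "BLOCK", "TREATMENT"),
                timevar = "MEASUREMENT", v.names = "RESPONSE", sep = ".")
names(wide) <- sub("^RESPONSE\\.", "MEAS.", names(wide))

# Increments between consecutive measurements: MEAS.n - MEAS.(n-1)
meas <- as.matrix(wide[, grep("^MEAS\\.", names(wide))])
incr <- t(apply(meas, 1, diff))
colnames(incr) <- paste0("INCR.", seq_len(ncol(incr)) + 1)

cbind(wide[, c("STUDY", "BLOCK", "TREATMENT")], incr)
```

The increments end up as ordinary numeric columns next to the design variables, so they could be passed on to aov() or lm() in the usual way.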
Re: [R] tapply, levelinformation
Hi Jim,

jim holtman wrote:
> Here is one way:
>
>     t <- split(mat, classes)
>     for (i in names(t)) plotdensity(t[[i]], main=i)

But then I don't use the advantages of tapply anymore...

> What is the problem you are trying to solve?

I have a set of data (multiple files) which belong to different conditions (one or more files per condition). I wanted to read the data set and a description of the conditions and then automatically create plots for data of the same condition. Maybe the way I am doing it is much too complicated...

Antje
Re: [R] tapply, levelinformation
But it does the same thing. What 'advantage' of tapply do you think that you are missing? Performance is probably not impacted since most of the time is in the plot.

On 2/16/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
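The underlying difficulty is that the function given to tapply() only receives the values of each group, not the group's name. If a loop feels unsatisfying, one alternative is to iterate over the split data and its names in parallel with Map(). The sketch below is an illustration under assumptions: the data are simulated, and plot(density()) stands in for the plotdensity() function used in the thread, whose origin is not stated.

```r
set.seed(42)                  # assumption: simulated data for illustration
mat <- rnorm(30)
classes <- rep(c("ctrl", "treatA", "treatB"), each = 10)

groups <- split(mat, classes)

op <- par(mfrow = c(1, 3))
# Map() passes each group together with its name, so the name can be the title
titles <- Map(function(v, nm) {
  plot(density(v), main = nm)   # stand-in for plotdensity()
  nm
}, groups, names(groups))
par(op)
```

This does the same work as the for() loop; which form to prefer is mostly a matter of taste.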
Re: [R] tapply and data.frame?
Is this what you want:

    > tst
       p1   p10  p100 p1000 p1001 p1002 p1003 p1004 p1005 p1006
        1     5     1     8     6     5     8     7     4     4
    > data.frame(point=names(tst), ind=tst)
          point ind
    p1       p1   1
    p10     p10   5
    p100   p100   1
    p1000 p1000   8
    p1001 p1001   6
    p1002 p1002   5
    p1003 p1003   8
    p1004 p1004   7
    p1005 p1005   4
    p1006 p1006   4

On 1/23/07, Zhang Jian [EMAIL PROTECTED] wrote:
> I want to transform the data by tapply to one dataframe. But I can not get
> it.
> For example:
>
>     > tst=tapply(point,pp,length)
>     > tst[1:10]
>        p1   p10  p100 p1000 p1001 p1002 p1003 p1004 p1005 p1006
>         1     5     1     8     6     5     8     7     4     4
>     > res=as.data.frame(tst) # I try to transform it
>     > res[1:10,]
>        p1   p10  p100 p1000 p1001 p1002 p1003 p1004 p1005 p1006
>         1     5     1     8     6     5     8     7     4     4
>
> How to transform it like the following:
>
>     > res
>       point ind
>     1    p1   1
>     2   p10   5
>     3  p100   1
>     4 p1000   8
>     5 p1001   6
>
> Thanks!

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
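If plain row numbers are wanted instead of the point labels as row names, as in the desired output of the question, a small variation on the answer above should do. The toy vectors below are invented for illustration; as.vector() drops the names that tapply() attaches, and row.names = NULL asks for default row numbers.

```r
# Toy data, invented for illustration
pp    <- c("p1", "p10", "p10", "p100", "p10")
point <- seq_along(pp)

tst <- tapply(point, pp, length)   # named counts per label

# Labels become a column; values lose their names; rows are numbered 1..n
res <- data.frame(point = names(tst), ind = as.vector(tst), row.names = NULL)
res
```

For simple counts like these, as.data.frame(table(pp)) gives a similar two-column layout directly.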
Re: [R] tapply, data.frame problem
Lauri Nikkinen wrote:
> Hi R-users,
>
> I'm quite new to R and trying to learn the basics. I have the following
> problem concerning the conversion of an array object into a data frame. I
> have made the following data sets
>
>     tmp1 <- rnorm(100)
>     tmp2 <- gl(10,2,length=100)
>     tmp3 <- as.data.frame(cbind(tmp1,tmp2))
>     tmp3.sum <- tapply(tmp3$tmp1,tmp3$tmp2,sum)
>     tmp3.sum <- as.data.frame(tapply(tmp1,tmp2,sum))
>
> and I want the levels from tmp2 to be shown as a column in the data.frame,
> not as row names as they are now. To put it another way, as a result I
> want a data frame with two columns: levels and the sums of those levels.
> Row names can be, for example, numbers from 1 to 10.

    aggregate(tmp3[1], tmp3[2], sum)
       tmp2        tmp1
    1     1  8.41550650
    2     2  3.65831086
    3     3 -0.26296334
    4     4  3.45368671
    5     5 -4.64383794
    6     6  0.25640949
    7     7  0.02832348
    8     8 -0.03811150
    9     9  1.41724121
    10   10 -1.06780900

    ?aggregate

> -Lauri Nikkinen
> Lahti, Finland

--
Chuck Cleland, Ph.D.
NDRI, Inc.
Re: [R] tapply question
I think you can't have columns with the same name:

    > data.frame(AAA=1:3, AAA=4:6)
      AAA AAA.1
    1   1     4
    2   2     5
    3   3     6

but you could subset the data frame by names using substring():

    sapply(unique(substring(names(data1), 1, 3)), function(x)
        rowMeans(data1[, substring(names(data1), 1, 3) == x, drop = FALSE]))

-------------------------------------------------------------------
Jacques VESLOT

CNRS UMR 8090
I.B.L (2ème étage)
1 rue du Professeur Calmette
B.P. 245
59019 Lille Cedex

Tel : 33 (0)3.20.87.10.44
Fax : 33 (0)3.20.87.10.31

http://www-good.ibl.fr
-------------------------------------------------------------------

[EMAIL PROTECTED] wrote:
> I think I understand tapply but I still can't figure out how to do the
> following.
>
> I have a data frame where some of the column names are the same, and I
> want to make a new data frame where columns that have the same name are
> averaged by row.
>
> So, if the data frame, DF, was
>
>     AAA BBB CCC AAA DDD
>       1   0   7  11  13
>       2   0   8  12  14
>       3   0   6   0  15
>
> then the resulting data frame would be exactly the same except that the
> AAA column would be
>
>     6    comes from (11 + 1)/2
>     7    comes from (12 + 2)/2
>     3    stays 3 because the element in the other AAA is zero, so I
>          don't want to average that one. It should just stay 3.
>
> So, I do
>
>     DF[DF == 0]<-NA
>     rowaverage<-function(x) x[rowMeans(forecastDf[x],na.rm=TRUE)
>     revisedDF<-tapply(seq(DF),names(DF),rowmeans)
>
> There are two problems with this:
>
> 1) I need to go through the rows of the same name, not the columns, so I
>    don't think seq(DF) is right, because that goes through the columns
>    but I want to go through rows.
>
> 2) BBB will come back with ALL NA's (since it was unique and there was
>    nothing else to average), and I don't know how to transform that BBB
>    column to all zeros.
>
> Thanks, and I'm sorry for so many questions. I'm getting better with this
> stuff and my questions will decrease soon.
>
> My guess is that I should no longer be using tapply and should be using
> some other version of apply. thanks
>
> mark
Re: [R] tapply question
I think this does what you want:

    > In <- "AAA BBB CCC AAA DDD
    + 1 0 7 11 13
    + 2 0 8 12 14
    + 3 0 6 0 15"
    > DF <- read.table(textConnection(In), header=TRUE, check.names=FALSE)
    > DF[DF == 0]<-NA
    > rowaverage<-function(x) rowMeans(DF[x],na.rm=TRUE)
    > revisedDF<-tapply(seq(DF),names(DF),rowaverage)
    > revisedDF
    $AAA
    1 2 3
    6 7 3

    $BBB
     1  2  3
    NA NA NA

    $CCC
    1 2 3
    7 8 6

    $DDD
     1  2  3
    13 14 15

    > do.call('cbind', revisedDF)
      AAA BBB CCC DDD
    1   6  NA   7  13
    2   7  NA   8  14
    3   3  NA   6  15

On 7/6/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)

What is the problem you are trying to solve?
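The second part of the question, turning the all-missing BBB column back into zeros, is not covered by the replies. A possible sketch, offered as an assumption rather than as what the posters intended, is to average the same-named columns with sapply() over split() and then replace the missing results with 0.

```r
# Data from the question, with the duplicated column name AAA kept
DF <- read.table(text = "AAA BBB CCC AAA DDD
1 0 7 11 13
2 0 8 12 14
3 0 6 0 15", header = TRUE, check.names = FALSE)

DF[DF == 0] <- NA   # zeros are treated as missing, as in the thread

# Group column positions by column name, then average each group by row
avg <- sapply(split(seq_along(DF), names(DF)),
              function(idx) rowMeans(DF[idx], na.rm = TRUE))

# Rows with nothing to average come back as NaN/NA; put the zeros back
avg[is.na(avg)] <- 0
avg
```

The result is a numeric matrix with one column per distinct name; as.data.frame(avg) converts it if a data frame is needed.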
Re: [R] tapply with unequal length of arguments
2006/3/12, Stefanie von Felten, IPWIfU [EMAIL PROTECTED]:
> Hi everyone,
>
> Is it possible to use tapply(x,y,mean) if not all groups of x by y are
> of the same length (for example if you have one missing observation)?

Yes, it works.

> I tried tapply(x,y,mean,na.omit=T) but it doesn't work!

What does "it doesn't work" mean exactly? Can you give an example and the error message?

> Steffi

--
黄荣贵
Department of Sociology
Fudan University
Re: [R] tapply with unequal length of arguments
Stefanie von Felten, IPWIfU wrote:
> Hi everyone,
>
> Is it possible to use tapply(x,y,mean) if not all groups of x by y are
> of the same length (for example if you have one missing observation)?
>
> I tried tapply(x,y,mean,na.omit=T) but it doesn't work!

See ?tapply, which tells you that the argument ... is passed to FUN, which is mean() in this case. mean() has an argument na.rm, see ?mean. So we get:

    tapply(x, y, mean, na.rm = TRUE)

Please read the help pages more carefully.

Uwe Ligges

> Steffi
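A tiny example, with made-up numbers, may help show what happens in each case. Since mean() has no argument called na.omit, that argument is silently swallowed by the ... of mean() and the missing value still propagates; na.rm = TRUE is the argument that actually drops it.

```r
# Made-up data: group "a" contains one missing observation
x <- c(1, 2, NA, 4, 5, 6)
y <- factor(c("a", "a", "a", "b", "b", "b"))

plain <- tapply(x, y, mean)                  # "a" is NA
bad   <- tapply(x, y, mean, na.omit = TRUE)  # na.omit is ignored, "a" still NA
good  <- tapply(x, y, mean, na.rm = TRUE)    # "a" is the mean of 1 and 2

good
```

Unequal group sizes are not a problem for tapply() in themselves; only the NA needs handling.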
Re: [R] tapply and weighted means
you need also to split the 'w' column, for each level of 'x'; you could use:

    lapply(split(truc, truc$x), function(z) weighted.mean(z$y, z$w))

I hope it helps.

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

----- Original Message -----
From: Florent Bresson [EMAIL PROTECTED]
To: R-help r-help@stat.math.ethz.ch
Sent: Thursday, January 12, 2006 3:44 PM
Subject: [R] tapply and weighted means

> I'm trying to compute weighted means on different groups but it only
> returns NA. If I use the following data.frame truc:
>
>     x y w
>     1 1 1
>     1 2 2
>     1 3 1
>     1 4 2
>     0 2 1
>     0 3 2
>     0 4 1
>     0 5 1
>
> where x is a factor, and then use the command:
>
>     tapply(truc$y,list(truc$x),wtd.mean, weights=truc$w)
>
> I just get NA. What's the problem? What can I do?
Re: [R] tapply and weighted means
Dimitris Rizopoulos wrote:

> you need also to split the 'w' column, for each level of 'x'; you could use:
>
>     lapply(split(truc, truc$x), function(z) weighted.mean(z$y, z$w))
>
> I hope it helps.
>
> Best,
> Dimitris

Or:

    library(Hmisc)
    ?wtd.mean

The help file has a built-in example of this.

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
Re: [R] tapply and weighted means
On Thu, 2006-01-12 at 15:44 +0100, Florent Bresson wrote:

> I'm trying to compute weighted means on different groups but it only returns NA. If I use the following data.frame truc:
>
>     x y w
>     1 1 1
>     1 2 2
>     1 3 1
>     1 4 2
>     0 2 1
>     0 3 2
>     0 4 1
>     0 5 1
>
> where x is a factor, and then use the command:
>
>     tapply(truc$y, list(truc$x), wtd.mean, weights=truc$w)
>
> I just get NA. What's the problem? What can I do?

Florent,

I guess you didn't read the help for tapply, which in the Value section states:

    Note that optional arguments to 'FUN' supplied by the '...' argument
    are not divided into cells. It is therefore inappropriate for 'FUN'
    to expect additional arguments with the same length as 'X'.

So tapply is not the right tool for this job. We can use by() instead (a wrapper for tapply), like so:

    dat <- matrix(scan(), byrow = TRUE, ncol = 3)
    1 1 1
    1 2 2
    1 3 1
    1 4 2
    0 2 1
    0 3 2
    0 4 1
    0 5 1

    colnames(dat) <- c("x", "y", "w")
    dat <- as.data.frame(dat)
    dat
    (res <- by(dat, dat$x, function(z) weighted.mean(z$y, z$w)))

but if you want to easily access the numbers you need to do a little work, e.g.

    as.vector(res)

Also, I don't see a function wtd.mean in standard R, and weighted.mean() doesn't have a weights argument, so I guess you are using a function from another package and did not tell us.

HTH,

Gav

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                     [T] +44 (0)20 7679 5522
ENSIS Research Fellow             [F] +44 (0)20 7679 7565
ENSIS Ltd. ECRC                   [E] gavin.simpsonATNOSPAMucl.ac.uk
UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
London. WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
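Since the '...' arguments are not divided into cells, one possible workaround (a sketch, not something proposed in the thread) is to let tapply() group the row indices instead of the values, so that each call can pick out its own slice of both y and w. The data are those of the original post; the object names wm_idx and wm_split are made up for illustration:

```r
# Data from the original post.
truc <- data.frame(
  x = factor(c(1, 1, 1, 1, 0, 0, 0, 0)),
  y = c(1, 2, 3, 4, 2, 3, 4, 5),
  w = c(1, 2, 1, 2, 1, 2, 1, 1)
)

# Group the row indices; each call then sees only its own rows of y and w.
wm_idx <- tapply(seq_len(nrow(truc)), truc$x,
                 function(i) weighted.mean(truc$y[i], truc$w[i]))

# The split() approach from the thread, simplified to a named vector.
wm_split <- sapply(split(truc, truc$x),
                   function(z) weighted.mean(z$y, z$w))

print(wm_idx)
print(wm_split)
```

Both forms use weighted.mean() from base R, so no extra package is needed; the index form keeps the familiar tapply() result shape.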
Re: [R] tapply question
Frank Johannes wrote:

> Hi,
>
> Suppose I have the following data structure.
>
>               LRT  tp
>     1  1.50654010 522
>     2  0.51793929 522
>     3  0.90340299 522
>     4  1.20293325 522
>     5  1.05578774 523
>     6  0.01617942 523
>     7  0.68183543 523
>     8  0.43820244 523
>     9  1.14123995 524
>     10 0.05809550 524
>     11 0.93061597 524
>     12 1.39739700 524
>     13 1.05220953 525
>     14 0.03471461 525
>     15 0.63168798 525
>     16 1.40592603 525
>     17 1.41884492 526
>     18 0.23388479 526
>     19 0.21881064 526
>     20 0.99710830 526
>     21 2.02054187 527
>     22 1.99872887 527
>     23 1.04187450 527
>     24 1.31556807 527
>     25 2.5190     528
>     26 2.94778561 528
>     27 1.88800177 528
>     28 2.08249941 528
>
> I have successfully used a command line such as the one below to get maxima for each tp category:
>
>     data.out <- data[tapply(LRT, tp, function(x) which(LRT == max(x))), ]
>
> However, when I try it on the above data, it gives me the following error message:
>
>     Error in [.data.frame(data, tapply(LRT, tp, function(x) which(LRT == :
>         invalid subscript type
>
> I don't know what to do. Thanks for your help

Works for me. Look at your data structures and check whether your data frame is OK.

Or, much better and easier:

    tapply(LRT, tp, max)

Uwe Ligges
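If the goal is the full rows holding each group's maximum rather than just the maxima, one option not mentioned in the thread is ave(), which returns the group maximum aligned with every row. The sketch below uses only the first two rows of three tp groups from the posted data; the names is_max and maxima are made up for illustration:

```r
# A small subset of the posted data (two rows from each of three tp groups).
data <- data.frame(
  LRT = c(1.50654010, 0.51793929, 1.05578774,
          0.01617942, 1.14123995, 0.05809550),
  tp  = c(522, 522, 523, 523, 524, 524)
)

# ave() repeats the group maximum on every row of the group,
# so a plain comparison gives a logical row selector.
is_max <- data$LRT == ave(data$LRT, data$tp, FUN = max)
data.out <- data[is_max, ]
print(data.out)

# If only the maxima themselves are needed, tapply() is enough.
maxima <- tapply(data$LRT, data$tp, max)
print(maxima)
```

A possible reason for the reported error, which the thread does not confirm, is that which(LRT == max(x)) can return more than one index (ties, or the same value occurring in another group); tapply() then returns a list, and a list is not a valid row subscript. The logical selector above keeps all tied rows instead.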
Re: [R] tapply huge speed difference if X has names
Please use a current version of R! This was fixed long ago, and you will find it in the NEWS file:

    split() now handles vectors with names internally and so is
    almost as fast as on vectors without names (and maybe 100x
    faster than before).

On Mon, 8 Aug 2005, Matthew Dowle wrote:

> Hi all,
>
> Apologies if this has been raised before ... R's tapply is very fast, but if X has names in this example, there seems to be a huge slowdown: under 1 second compared to 151 seconds. The following timings are repeatable and are timed properly on a single-user machine:
>
>     X = 1:10
>     names(X) = X
>     system.time(fast <- tapply(as.vector(X), rep(1:1,each=10), mean))  # as.vector() to drop the names
>     [1] 0.36 0.00 0.35 0.00 0.00
>     system.time(slow <- tapply(X, rep(1:1,each=10), mean))
>     [1] 149.95   1.83 151.79   0.00   0.00
>     head(fast)
>        1    2    3    4    5    6
>      5.5 15.5 25.5 35.5 45.5 55.5
>     head(slow)
>        1    2    3    4    5    6
>      5.5 15.5 25.5 35.5 45.5 55.5
>     identical(fast,slow)
>     [1] TRUE
>
> Looking inside tapply, which then calls split, it seems there is an is.null(names(x)) which prevents R's internal fast version from being called. Why is that there? Could it be removed? I often do something like tapply(mat[,colname],...) where mat has rownames. Therefore the rownames of mat become the names of the vector mat[,colname], and this seems to slow down tapply a lot. Perhaps other functions which call split also suffer from this problem?
>
>     split.default
>     function (x, f)
>     {
>         if (is.list(f))
>             f <- interaction(f)
>         f <- factor(f)
>         if (is.null(attr(x, "class")) && is.null(names(x)))
>             return(.Internal(split(x, f)))
>         lf <- levels(f)
>         y <- vector("list", length(lf))
>         names(y) <- lf
>         for (k in lf) y[[k]] <- x[f %in% k]
>         y
>     }
>     <environment: namespace:base>
>
>     version
>              _
>     platform x86_64-redhat-linux-gnu
>     arch     x86_64
>     os       linux-gnu
>     system   x86_64, linux-gnu
>     status
>     major    2
>     minor    0.1
>     year     2004
>     month    11
>     day      15
>     language R
>
> Thanks and regards,
> Matthew

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
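For anyone stuck on an old R version, the workaround described in the question can be sketched as follows. The sizes are made up for illustration (the figures in the archived post are truncated), and unname() is used here in place of as.vector() purely as an assumption about taste; both drop the names before grouping:

```r
# Hypothetical sizes, chosen only for illustration.
X <- 1:1000
names(X) <- X
g <- rep(1:100, each = 10)

# Grouped means with and without names on the input vector.
with_names    <- tapply(X, g, mean)
without_names <- tapply(unname(X), g, mean)

# The names on X do not affect the result, only (in old versions) the speed.
print(identical(with_names, without_names))
print(head(without_names))
```

On a current R the two calls should take about the same time, as the NEWS entry quoted in the reply indicates.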
Re: [R] tapply
>>>>> "AndyL" == Liaw, Andy [EMAIL PROTECTED]
>>>>>     on Tue, 21 Jun 2005 13:30:54 -0400 writes:

    AndyL> Try:
    AndyL> > (x <- factor(1:2, levels=1:5))
    AndyL> [1] 1 2
    AndyL> Levels: 1 2 3 4 5
    AndyL> > (x <- x[, drop=TRUE])
    AndyL> [1] 1 2
    AndyL> Levels: 1 2

or

    (x <- factor(1:2, levels=1:5))
    (x2 <- factor(x))

which also drops the unused levels.

Martin
Re: [R] tapply
hi,

I tried all the methods suggested above: ave, and rowsum with the with function, work for my situation. I think the problem might not be due to tapply. My data z comes from

    z <- y[y[[1]] %in% x[[2]], c(1,9)]

so z is supposed to have no entries for those non-matched between x and y. However, when I run tapply, the result also includes those non-matched entries. I used the is.na function to remove those entries from z first and then ran tapply again, but the result is the same: those NAs and those non-matched results are still there. That's what I mean by "it doesn't work". Is there something I missed here, so that z implicitly has some trace back to the y dataset?

thanks,

Weiwei
Re: [R] tapply
What does str(z) say? I suspect the second column is a factor, which, after the subsetting, has some empty levels. If so, just drop those levels.

Andy
Re: [R] tapply
Even before I tried it, I already realized it must be true when I read this reply! Great job! Thanks, Andy.

    > str(z)
    `data.frame': 235 obs. of 2 variables:
     $ CLAIMNUM : Factor w/ 1907 levels 0,1001849,..: 1083 1083 1083 1582 1582 1084 1681 1681 1391 1391 ...
     $ SIU.SAVED: int 475 3000 3000 0 0 4352 0 0 4500 3000 ...

So, I have another general question: how do I avoid this when I do the matching? In my case, claimnum does not have to be a factor. I think I can do as.integer on it to de-factor it. But I want to know how to do it while keeping it as a factor. btw, what's your way to drop those levels? :)

weiwei
Re: [R] tapply
Try:

    > (x <- factor(1:2, levels=1:5))
    [1] 1 2
    Levels: 1 2 3 4 5
    > (x <- x[, drop=TRUE])
    [1] 1 2
    Levels: 1 2

Andy
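The effect discussed in this thread can be reproduced on a toy data frame. The sketch below uses made-up claim labels and amounts (y, claim and saved are assumptions, not the poster's data) and shows the empty levels surviving the subset, then being dropped by re-creating the factor:

```r
# Hypothetical data: five claims, of which only two survive the subset.
y <- data.frame(claim = factor(c("a", "b", "c", "d", "e")),
                saved = c(10, 20, 30, 40, 50))
z <- y[y$claim %in% c("a", "b"), ]

# The unused levels are still attached to z$claim,
# so tapply() reports a cell (NA) for each of them.
before <- tapply(z$saved, z$claim, sum)

# Re-creating the factor keeps only the levels that actually occur.
z$claim <- factor(z$claim)
after <- tapply(z$saved, z$claim, sum)

print(before)
print(after)
```

In current versions of R, droplevels(z) should achieve the same for every factor column of a data frame at once; that function did not exist when this thread was written.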
Re: [R] tapply
This may help

    R> wei
         V1           V2   V3
    1  5540 389100307391 2600
    2  5541 389100307391 2600
    3  5542 389100307391 2600
    4  5543 389100307391 2600
    5  5544 389100307391 2600
    6  5546 381300302513   NA
    7  5547 387000307470   NA
    8  5548 387000307470   NA
    9  5549 387000307470   NA
    10 5550 387000307470   NA
    11 5551 387000307470   NA
    12 5552 387000307470   NA
    R> ave(wei[,3],wei[,2],FUN=sum)
     [1] 13000 13000 13000 13000 13000    NA    NA    NA    NA    NA    NA    NA
    R>

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Weiwei Shi
Sent: June 20, 2005 7:16 PM
To: R-help@stat.math.ethz.ch
Subject: [R] tapply

> hi, i have another question on tapply: i have a dataset z like this:
>
>     5540 389100307391 2600
>     5541 389100307391 2600
>     5542 389100307391 2600
>     5543 389100307391 2600
>     5544 389100307391 2600
>     5546 381300302513   NA
>     5547 387000307470   NA
>     5548 387000307470   NA
>     5549 387000307470   NA
>     5550 387000307470   NA
>     5551 387000307470   NA
>     5552 387000307470   NA
>
> I want to sum the column 3 by column 2. I removed NA by calling:
>
>     tapply(z[[3]], z[[2]], sum, na.rm=T)
>
> but it does not work. then, i used
>
>     z1 <- z[!is.na(z[[3]]), ]
>
> and repeat still doesn't work. please help.
>
> --
> Weiwei Shi, Ph.D
> Did you always know? No, I did not. But I believed... ---Matrix III
Re: [R] tapply
On Mon, 2005-06-20 at 18:15 -0500, Weiwei Shi wrote:

> hi, i have another question on tapply: i have a dataset z like this:
>
>     5540 389100307391 2600
>     5541 389100307391 2600
>     5542 389100307391 2600
>     5543 389100307391 2600
>     5544 389100307391 2600
>     5546 381300302513   NA
>     5547 387000307470   NA
>     5548 387000307470   NA
>     5549 387000307470   NA
>     5550 387000307470   NA
>     5551 387000307470   NA
>     5552 387000307470   NA
>
> I want to sum the column 3 by column 2. I removed NA by calling:
>
>     tapply(z[[3]], z[[2]], sum, na.rm=T)
>
> but it does not work. then, i used
>
>     z1 <- z[!is.na(z[[3]]), ]
>
> and repeat still doesn't work. please help.

The index vector(s) in tapply() need to be a list. See the description of the INDEX argument in ?tapply:

    > tapply(z[[3]], list(z[[2]]), sum, na.rm = TRUE)
    381300302513 387000307470 389100307391
               0            0        13000

Note that the use of na.rm = TRUE here results in misleading values of 0 for the other two groups, which are all NAs, and this is not self-evident unless you know the data.

You may be better off with:

    > tapply(z[[3]], list(z[[2]]), sum)
    381300302513 387000307470 389100307391
              NA           NA        13000

unless your real data is a mix of NAs and measured values.

Also see ?complete.cases and ?na.omit for further approaches to dealing with such data sets.

HTH,

Marc Schwartz
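When the real data mix NAs and measured values, a middle ground between the two calls above is a small summary function that sums the observed values but still reports NA for a group with nothing observed. This is only a sketch: sum_or_na is a hypothetical helper, not a base R function, and the grouping column is stored as character here purely for readability:

```r
# Data from the thread (column names follow the ave() reply).
z <- data.frame(
  V2 = c(rep("389100307391", 5), "381300302513", rep("387000307470", 6)),
  V3 = c(rep(2600, 5), rep(NA, 7))
)

# sum(..., na.rm = TRUE) of an all-NA group is 0, which hides the missingness.
plain <- tapply(z$V3, z$V2, sum, na.rm = TRUE)

# Hypothetical helper: NA when a group has no observed values,
# otherwise the sum of the observed values.
sum_or_na <- function(v) if (all(is.na(v))) NA_real_ else sum(v, na.rm = TRUE)
careful <- tapply(z$V3, z$V2, sum_or_na)

print(plain)
print(careful)
```

Whether a partially missing group should count as known or unknown depends on the data, so the helper is a design choice rather than a general recommendation.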
Re: [R] tapply
On 6/20/05, Weiwei Shi [EMAIL PROTECTED] wrote:

> hi, i have another question on tapply: i have a dataset z like this:
>
>     5540 389100307391 2600
>     5541 389100307391 2600
>     5542 389100307391 2600
>     5543 389100307391 2600
>     5544 389100307391 2600
>     5546 381300302513   NA
>     5547 387000307470   NA
>     5548 387000307470   NA
>     5549 387000307470   NA
>     5550 387000307470   NA
>     5551 387000307470   NA
>     5552 387000307470   NA
>
> I want to sum the column 3 by column 2. I removed NA by calling:
>
>     tapply(z[[3]], z[[2]], sum, na.rm=T)
>
> but it does not work. then, i used
>
>     z1 <- z[!is.na(z[[3]]), ]
>
> and repeat still doesn't work.

Can you be more explicit about "doesn't work"?
Re: [R] tapply
On 6/20/05, Weiwei Shi [EMAIL PROTECTED] wrote:

> hi, i have another question on tapply: i have a dataset z like this:
>
>     5540 389100307391 2600
>     5541 389100307391 2600
>     5542 389100307391 2600
>     5543 389100307391 2600
>     5544 389100307391 2600
>     5546 381300302513   NA
>     5547 387000307470   NA
>     5548 387000307470   NA
>     5549 387000307470   NA
>     5550 387000307470   NA
>     5551 387000307470   NA
>     5552 387000307470   NA
>
> I want to sum the column 3 by column 2. I removed NA by calling:
>
>     tapply(z[[3]], z[[2]], sum, na.rm=T)
>
> but it does not work. then, i used
>
>     z1 <- z[!is.na(z[[3]]), ]
>
> and repeat still doesn't work. please help.

Depending on what you want, you may be able to use rowsum:

- display only groups that have at least one non-NA, with the sum being the sum of the non-NAs:

      with(na.omit(z), rowsum(V3, V2))

- display all groups, with the sum being NA if any member is NA:

      rowsum(z$V3, z$V2)
Re: [R] tapply and NA value
You should look at the 'na.rm=FALSE' argument of '?mean()', i.e.,

    x <- rnorm(100); x[sample(100, 10)] <- NA
    f <- sample(letters[1:5], 100, TRUE)
    ###
    tapply(x, f, mean)
    tapply(x, f, mean, na.rm=TRUE)

I hope it helps.

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm

----- Original Message -----
From: Leonardo Lami [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Friday, March 25, 2005 10:35 AM
Subject: [R] tapply and NA value

> Hi,
> I'm writing for a little help. I have a data frame with some NA values, and I'd like to obtain the means of the values of a column grouped by the levels of a factor column of the data frame. I'm using the function tapply, but I see that if even a single NA value is present the result is NA. Is there an option to get the correct result, or must I use another function?
>
> Thanks to all,
> Leonardo
>
> --
> Leonardo Lami
> [EMAIL PROTECTED] www.faunalia.it
> Via Colombo 3 - 51010 Massa e Cozzile (PT), Italy
> Tel: (+39)349-1310164
> GPG key @: hkp://wwwkeys.pgp.net http://www.pgp.net/wwwkeys.html https://www.biglumber.com
Re: [R] tapply and NA value
I am not really sure what you mean. If I understand you correctly, then all you have to do is to give an additional parameter to tapply, na.rm=TRUE:

    tapply(, na.rm=TRUE)

However, as I already said, I'm not sure what you did and what the problem is. Please provide the code that did not work, possibly with a workable example, as the posting guide suggests: "PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html"

I hope this helps in any way,
Ales Ziberna
Re: [R] tapply and NA value
Thanks very much!

Best of all,
Leonardo

On Friday 25 March 2005 at 10:52, Dimitris Rizopoulos wrote:

> you should look at the 'na.rm=FALSE' argument of '?mean()', i.e.,
>
>     x <- rnorm(100); x[sample(100, 10)] <- NA
>     f <- sample(letters[1:5], 100, TRUE)
>     ###
>     tapply(x, f, mean)
>     tapply(x, f, mean, na.rm=TRUE)
>
> I hope it helps.
>
> Best,
> Dimitris

--
Leonardo Lami
[EMAIL PROTECTED] www.faunalia.it
Via Colombo 3 - 51010 Massa e Cozzile (PT), Italy
Tel: (+39)349-1310164
GPG key @: hkp://wwwkeys.pgp.net http://www.pgp.net/wwwkeys.html https://www.biglumber.com
RE: [R] tapply and names
> From: Göran Broström
>
> I have a data frame containing children, with variables 'year' = birth year, and 'm.id' = mother's id number. Let's assume that all the births of each mother are represented in the data frame. Now I want to create a subset of this data frame containing all children whose mother's first birth was in the year 1816 or later. This seems to work:
>
>     mid <- tapply(dat$year, dat$m.id, min)
>     mid <- as.numeric(names(mid)[mid >= 1816])
>     dat <- dat[dat$m.id %in% mid, ]
>
> but I'm worried about the second line, because the output from 'tapply' isn't documented to have a 'dimnames' attribute (although it has one, at least in R-2.1.0, 2005-01-19). Another aspect is that this code relies on m.id being numeric; I would have to change it if the type of m.id changes to, e.g., character.
>
> So, question: Is there a better way of doing this?
>
> --
> Göran Broström                   tel: +46 90 786 5223
> Department of Statistics         fax: +46 90 786 6614
> Umeå University                  http://www.stat.umu.se/egna/gb/
> SE-90187 Umeå, Sweden            e-mail: [EMAIL PROTECTED]

Would this work?

    dat <- dat[ave(dat$year, dat$m.id, FUN = min) >= 1816, ]

Andy
Re: [R] tapply and names
your approach, after omitting the as.numeric() in the second line, seems to work even when `m.id' is a factor, i.e.,

    dat <- data.frame(m.id=rep(letters[1:10], 10), year=sample(1805:1950, 100, TRUE))
    ###
    mid <- tapply(dat$year, dat$m.id, min)
    mid <- names(mid)[mid >= 1816]
    dat. <- dat[dat$m.id %in% mid, ]
    dat; dat.

but maybe there is something better.

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm

----- Original Message -----
From: Göran Broström [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Tuesday, January 25, 2005 3:55 PM
Subject: [R] tapply and names

> I have a data frame containing children, with variables 'year' = birth year, and 'm.id' = mother's id number. Let's assume that all the births of each mother are represented in the data frame. Now I want to create a subset of this data frame containing all children whose mother's first birth was in the year 1816 or later. This seems to work:
>
>     mid <- tapply(dat$year, dat$m.id, min)
>     mid <- as.numeric(names(mid)[mid >= 1816])
>     dat <- dat[dat$m.id %in% mid, ]
>
> but I'm worried about the second line, because the output from 'tapply' isn't documented to have a 'dimnames' attribute (although it has one, at least in R-2.1.0, 2005-01-19). Another aspect is that this code relies on m.id being numeric; I would have to change it if the type of m.id changes to, e.g., character.
>
> So, question: Is there a better way of doing this?
>
> --
> Göran Broström                    tel: +46 90 786 5223
> Department of Statistics          fax: +46 90 786 6614
> Umeå University                   http://www.stat.umu.se/egna/gb/
> SE-90187 Umeå, Sweden             e-mail: [EMAIL PROTECTED]
Re: [R] tapply and names
On Tue, Jan 25, 2005 at 10:43:24AM -0500, Liaw, Andy wrote:
> > From: Göran Broström
> >
> > I have a data frame containing children, with variables 'year' = birth year, and 'm.id' = mother's id number. Let's assume that all the births of each mother are represented in the data frame. Now I want to create a subset of this data frame containing all children whose mother's first birth was in the year 1816 or later. This seems to work:
> >
> >     mid <- tapply(dat$year, dat$m.id, min)
> >     mid <- as.numeric(names(mid)[mid >= 1816])
> >     dat <- dat[dat$m.id %in% mid, ]
> >
> > but I'm worried about the second line, because the output from 'tapply' isn't documented to have a 'dimnames' attribute (although it has one, at least in R-2.1.0, 2005-01-19). Another aspect is that this code relies on m.id being numeric; I would have to change it if the type of m.id changes to, e.g., character.
> >
> > So, question: Is there a better way of doing this?
>
> Would this work?
>
>     dat <- dat[ave(dat$year, dat$m.id, min) >= 1816, ]

Yes, but you (or I) need

    dat <- dat[ave(dat$year, dat$m.id, FUN = min) >= 1816, ]
                                       ^^^

(took me some time to figure out), because ?ave says

    Usage:
        ave(x, ..., FUN = mean)

so FUN comes after '...' and has to be named.

Thanks Andy for giving me 'ave'! And thanks to Dimitris for his suggestion.

Göran
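A short sketch of the ave() approach settled on above, for later readers. The data frame here is invented for illustration (three mothers, five births); only the column names 'm.id' and 'year' come from the thread. Because FUN follows '...' in ave(x, ..., FUN = mean), an unnamed third argument is taken as another grouping variable, which is why FUN must be named.

```r
# Invented example data: mother "a" had her first child in 1810,
# mother "b" in 1816 and mother "c" in 1900.
dat <- data.frame(m.id = c("a", "a", "b", "b", "c"),
                  year = c(1810, 1820, 1816, 1830, 1900))

# ave() returns a vector as long as its input: each child's row gets
# the minimum birth year over all rows with the same m.id.
first_birth <- ave(dat$year, dat$m.id, FUN = min)

# Keep the children whose mother's first birth was in 1816 or later.
kept <- dat[first_birth >= 1816, ]

print(first_birth)
print(kept)
```

Since the result of ave() lines up row by row with the data frame, no names or dimnames of a tapply() result are needed, and the type of m.id does not matter.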
Re: [R] tapply hist
> ...
> # Histograms by technology
> par(mfrow=c(2,3))
> tapply(Pot,SGruppo,hist)
> detach(dati)
>
> It all works great but tapply(Pot,SGruppo,hist) produces 6 histograms with the titles and the xlab labels in a generic form, something like integer[1], integer[2], ... while I'd like to have each graph indicating the

tapply takes atomic data (usually vectors). You want to pass rows of a data frame, so that Pot *and* SGruppo are sent together; by() is very good for this. It might be possible (even easy?) to use tapply, but I just use by for these things.

Since dati is your data frame, try this (untested!):

    by(dati, dati$SGruppo, function(x, ...) {
        hist(x$Pot, main=as.character(x$SGruppo[1]))
    })

Or, use Lattice:

    library(lattice)
    histogram( ~ Pot | SGruppo, data=dati)

Cheers
Jason
Re: [R] tapply hist
As another respondent already mentioned, Lattice is probably the way to go on this one, but if you do want to use tapply try this:

    names(Pot) <- SGruppo
    dummy <- tapply(Pot, SGruppo, function(x) hist(x, main=names(x)[1], xlab=NULL))

Vittorio v.demartino2 at virgilio.it writes:

:
: I'm learning how to use tapply.
: Now I'm having a go at the following code in which dati contains almost 600
: lines, Pot - numeric - are the capacities of power plants and SGruppo - text
: - the corresponding six technologies (CCC, CIC, TGC, CSC, CPC, TE).
: .
:
: dati=sqlQuery(canale,"select Id,SGruppo,Classe, NGruppo,ProdNetta,Pot from
: SintesiQuery")
: attach(dati)
: # Grouping by technology
: tapply(Pot,SGruppo,sum)
: ...
: # Histograms by technology
: par(mfrow=c(2,3))
: tapply(Pot,SGruppo,hist)
: detach(dati)
:
: It all works great but tapply(Pot,SGruppo,hist) produces 6 histograms with
: the titles and the xlab labels in a generic form, something like integer[1],
: integer[2], ... while I'd like to have each graph indicating the
: mentioned technologies.
: I've been trying issuing
: tech=c("CCC", "CIC", "TGC", "CSC", "CPC", "TE")
: tapply(Pot,SGruppo,hist, main=tech)
:
: but R prints in each histogram the six values in the title without cycling
: among them.
:
: How can I obtain what I want?
:
: Ciao
: Vittorio
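As a further sketch along the same lines (not something proposed in the thread itself): splitting the vector first and then looping over the pieces together with their names gives each histogram its own title. The values of Pot and SGruppo below are invented, and only three of the six technologies are used to keep the example short.

```r
# Invented capacities for three of the technologies named in the thread.
Pot     <- c(10, 12, 15, 40, 42, 48, 90, 95)
SGruppo <- c("CCC", "CCC", "CCC", "CIC", "CIC", "CIC", "TGC", "TGC")

# When run as a script, draw on a null PDF device so no file is written.
if (!interactive()) pdf(NULL)

# split() returns a named list with one numeric vector per technology.
groups <- split(Pot, SGruppo)

op <- par(mfrow = c(1, 3))          # one panel per group
# Map() walks over the vectors and their names in parallel, so each
# call to hist() sees exactly one title.
h <- Map(function(v, g) hist(v, main = g, xlab = "Pot"),
         groups, names(groups))
par(op)                             # restore the previous layout
```

The point of the design is that main= receives a single string per call, which is what the attempt with main=tech could not do, since tapply() hands the whole tech vector to every call.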
Re: [R] tapply() and barplot() help files for 1.8.1
Martin Maechler [EMAIL PROTECTED] writes:

> and I like to help you. As I keep installed `(almost) all released versions of R ever installed on our machines' I can easily run 1.8.1 (or 1.4.x or 1.0.x ...) for you.
>
> The only difference between the help page help(tapply) is an extra require(stats) statement at the beginning of the `Examples' section in 1.9.0.
>
> and the only change to tapply() is
>
>     group <- rep.int(one, nx)#- to contain the splitting vector
>
> instead of
>
>     group <- rep(one, nx)#- to contain the splitting vector
>
> which hardly should have adverse results.
>
> In barplot, there's the new 'offset' option --- not in NEWS () and another change that may be a problem.
>
> Can you dig harder and if possible provide a reproducible (small..) example to make progress here...

Last night I found I had a backup of the source of 1.8.0, built that and tested an example and it worked as in 1.9.0. I then started to question my sanity (or at least my competence).

The code that follows should be a reproducible example. It creates a data frame that has the same structure as the data I am working with (with a number of other columns dropped) and is followed by the function that creates the barplot. The changes I have had to make to make it work as I thought it was working with 1.8.1 have ## NEW BIT after them, i.e. those lines were not there in the version I ran with 1.8.1. The important new lines are:

    x <- matrix(x) ## NEW BIT

and

    beside = TRUE, ## NEW BIT

--- EXAMPLE ---

## Create some fake data.
x <- c(rep("", 926),
       rep("All Other Perinatal Causes", 46),
       rep("Anaemia", 3),
       rep("Congenital Abnormalities", 1),
       rep("Unsp. Direct Maternal Causes", 24))
y <- runif(length(x))
tempdat <- data.frame(smi=x, yllperdth=y)

## Define the function to make my barplot
bodShare <- function(x, fld, main = "", userpar = 18, xlimMult=1.3) {
  ###
  # A horizontal barchart to display BoD shares
  #
  ###
  z <- subset(x, as.character(x[,fld]) != "")
  z[, fld] <- factor(z[, fld])

  ## We need to change the parameters of the chart.
  ## First save the old settings.
  oldpar <- par("mar")
  newpar <- par("mar")

  ## Increase the size of the margin on the left so there
  ## is enough space for the long text labels (which will
  ## be displayed horizontally on the y-axis).
  newpar[2] <- userpar

  ## Reduce the top margin because I will use a \caption in LaTeX
  ## instead.
  newpar[3] <- 1

  ## Now apply the new settings.
  par(mar = newpar)

  ## Calculate the % of YLLs for each group in the cause classification.
  x <- tapply(z$yllperdth, z[, fld], sum)
  totalYLLs <- sum(x)
  x <- x / totalYLLs * 100
  x <- sort(x)
  causeNames <- names(x) ## NEW BIT
  x <- matrix(x) ## NEW BIT

  ## Plot the chart. horiz = TRUE makes it a bar instead of
  ## column chart. las = 1 prints the labels horizontally.
  xplot <- barplot(x,
                   ## main = main,
                   horiz = TRUE,
                   beside = TRUE, ## NEW BIT
                   names.arg = causeNames, ## NEW BIT
                   xlab = "Percent of YLLs",
                   xlim = c(0, max(x) * xlimMult),
                   las = 1)
  text(x + (max(x) * .15), xplot, formatC(x, digits=1, format='f'))

  ## Reset the old margin parameters.
  par(mar = oldpar)

  ## Write data to a table for export.
  # First we need to remove newlines from labels.
  names(x) <- sub("\n", "", names(x))
  write.table(as.table(x),
              file = paste("tables/", fld, ".csv", sep=""),
              col.names=NA, sep="\t")
  names(x) <- causeNames
  x[length(x)]
}

## Create the barplot.
bodShare(tempdat, "smi")

--
David Whiting
Dar es Salaam, Tanzania
Re: [R] tapply() and barplot() help files for 1.8.1
David == David Whiting [EMAIL PROTECTED] on 15 Apr 2004 11:42:18 + writes:

    David> Hi,
    David> I've just upgraded to 1.9.0 and one of my Sweave files that produces a number of barplots in a standard manner now produces them in a different way. I have made a couple of small changes to my code to get back the output I was getting before upgrading and now (mostly out of curiosity) would like to understand what has changed.

and I like to help you. As I keep installed `(almost) all released versions of R ever installed on our machines' I can easily run 1.8.1 (or 1.4.x or 1.0.x ...) for you.

The only difference between the help page help(tapply) is an extra require(stats) statement at the beginning of the `Examples' section in 1.9.0.

and the only change to tapply() is

    group <- rep.int(one, nx)#- to contain the splitting vector

instead of

    group <- rep(one, nx)#- to contain the splitting vector

which hardly should have adverse results.

In barplot, there's the new 'offset' option --- not in NEWS () and another change that may be a problem.

Can you dig harder and if possible provide a reproducible (small..) example to make progress here...

    David> I *think* I've tracked it down to tapply() and/or barplot() and have not seen anything in the NEWS file regarding changes to these functions (as far as I can see). As part of doing my homework, I would like to read the version 1.8.1 help files for these two functions, but now that I've upgraded I'm not sure where I can find them. Is there a simple way for me to get copies of these two help files to compare with the versions in 1.9.0? As far as I can see, barplot() and tapply() in 1.9.0 work as described in their 1.9.0 help files (which does not surprise me).

    David> I've been lurking on this list long enough to know that if there has been a change it is documented, so it must be that I just haven't found it yet. If there hasn't been a change, then I am totally perplexed, because I have been running this Sweave file several times a day for the last few weeks and have not changed that part of it (I've been changing the LaTeX parts).

    David> In the part of the code that has changed I use tapply() to summarise some data and then plot it with barplot(). I now have to use matrix() on the output of tapply() before using barplot() because tapply() produces a list and barplot() wants a vector or matrix.

    David> In the code below, z is a dataframe, yllperdth is a numeric and fld is the name of a factor, both in the dataframe.

    David> Old version (as used with R 1.8.1):

    David> ## Calculate the % of YLLs for each group in the cause classification.
    David> x <- tapply(z$yllperdth, z[, fld], sum)
    David> totalYLLs <- sum(x)
    David> x <- x / totalYLLs * 100
    David> x <- sort(x)

    David> ## Plot the chart. horiz = TRUE makes it a bar instead of
    David> ## column chart. las = 1 prints the labels horizontally.
    David> xplot <- barplot(x, horiz = TRUE, xlab = "Percent of YLLs", las = 1)

    David> New Version (as used with R 1.9.0):

    David> ## Calculate the % of YLLs for each group in the cause classification.
    David> x <- tapply(z$yllperdth, z[, fld], sum)
    David> totalYLLs <- sum(x)
    David> x <- x / totalYLLs * 100
    David> x <- sort(x)
    David> causeNames <- names(x) ## NEW BIT
    David> x <- matrix(x) ## NEW BIT

    David> ## Plot the chart. horiz = TRUE makes it a bar instead of
    David> ## column chart. las = 1 prints the labels horizontally.
    David> xplot <- barplot(x,
    David>                  beside = TRUE, ## NEW BIT
    David>                  names.arg = causeNames, ## NEW BIT
    David>                  horiz = TRUE, xlab = "Percent of YLLs", las = 1)

    David> version
    David>          _
    David> platform i686-pc-linux-gnu
    David> arch     i686
    David> os       linux-gnu
    David> system   i686, linux-gnu
    David> status
    David> major    1
    David> minor    9.0
    David> year     2004
    David> month    04
    David> day      12
    David> language R

    David> A little while before upgrading I noted my previous R version (for a post that I redrafted 7 times and never sent because I found the answer through refining my draft), and it was:

    David> version
    David>          _
    David> platform i686-pc-linux-gnu
    David> arch     i686
    David> os       linux-gnu
    David> system   i686, linux-gnu
    David> status   Patched
    David> major    1
    David> minor    8.1
    David> year     2004
    David> month    02
    David> day      16
    David> language R

    David> So, can I get the old help files? Or is it easy to point me to a documented change? Or is it clear from my code what has changed or what I am or was doing wrong?

    David> Thanks.

    David> Dave

    David> --
    David> David Whiting
    David> Dar es Salaam, Tanzania
Re: [R] tapply() and barplot() help files for 1.8.1
On Thu, 15 Apr 2004 18:10:27 +0200, Martin Maechler [EMAIL PROTECTED] wrote:

> David == David Whiting [EMAIL PROTECTED] on 15 Apr 2004 11:42:18 + writes:
>
>     David> Hi,
>     David> I've just upgraded to 1.9.0 and one of my Sweave files that produces a number of barplots in a standard manner now produces them in a different way. I have made a couple of small changes to my code to get back the output I was getting before upgrading and now (mostly out of curiosity) would like to understand what has changed.
>
> and I like to help you. As I keep installed `(almost) all released versions of R ever installed on our machines' I can easily run 1.8.1 (or 1.4.x or 1.0.x ...) for you.
>
> The only difference between the help page help(tapply) is an extra require(stats) statement at the beginning of the `Examples' section in 1.9.0.
>
> and the only change to tapply() is
>
>     group <- rep.int(one, nx)#- to contain the splitting vector
>
> instead of
>
>     group <- rep(one, nx)#- to contain the splitting vector
>
> which hardly should have adverse results.
>
> In barplot, there's the new 'offset' option --- not in NEWS () and another change that may be a problem.

Here's a reproducible bug in barplot in 1.9.0 (based on an email I got this morning from Richard Rowe):

    x <- table(rep(1:5,1:5))
    barplot(x)

The problem is that table() produces a one dimensional array, and barplot() doesn't handle those properly now. The offending line is this one:

    $ cvs diff -r 1.3 barplot.R
    [junk deleted]
    43c43
    <     width <- rep(width, length.out = NR * NC)
    ---
    >     width <- rep(width, length.out = NR)

In the example above, x gets turned into a matrix with NR=1 row and NC=5 columns so only one bar width gets set.

Duncan Murdoch
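To make the "one dimensional array" point concrete, here is a small sketch that inspects what table() and a single-factor tapply() return, and shows one way to turn such a result into a plain named vector. The conversion via setNames() is just one possible workaround offered as an illustration; it is not the fix discussed in the thread.

```r
# The table from the bug report: value i occurs i times.
x <- table(rep(1:5, 1:5))

# A one-way table has a dim attribute of length 1, so it is an array
# but neither a matrix nor a plain vector.
print(dim(x))
print(is.matrix(x))

# tapply() with a single grouping factor returns the same kind of object.
t1 <- tapply(c(1, 2, 3), c("a", "a", "b"), sum)
print(length(dim(t1)))

# One possible workaround: strip the dim attribute but keep the labels.
v <- setNames(as.vector(x), names(x))
print(v)
```

A plain named vector like v avoids any special handling of arrays inside a plotting function, at the cost of losing the table class.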
RE: [R] tapply
Try this (untested):

    aggregate( data[,6:8], list(date = as.matrix(data[,1:3]) %*% c(10000,100,1)), mean )

---
Date: Thu, 18 Mar 2004 09:39:02 +0100
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [R] tapply

Dear all

I have a data frame containing hourly data of 3 parameters. I would like to create a data frame containing daily mean values of these parameters. Additionally I want to keep information about the time of measurement (year, month, day).

With the function tapply I can average over one column of the data frame. I can repeat the call for the other two columns and merge the vectors. In this way I obtain my new data frame (see below). If I want to add the columns day, month and year I can call tapply another three times. This system works.

Question: is there a function that averages over the 3 columns in a single step?

Thanks a lot for your answer!
Regards
Mike Campana

### read the data
setwd("c:/R")
data <- NULL
data <- as.data.frame(read.table(file="Montreal.txt",header=F,skip=15))
colnames(data) <- c("year","month","day","hour","min","temp","press","ozone")
### create mean value
temp_daily <- tapply(data$temp,data$year*10000+data$month*100+data$day,FUN=mean)
press_daily <- tapply(data$press,data$year*10000+data$month*100+data$day,FUN=mean)
ozone_daily <- tapply(data$ozone,data$year*10000+data$month*100+data$day,FUN=mean)
### merge the data
newdata <- as.data.frame(cbind(temp_daily,press_daily,ozone_daily))
Re: [R] tapply
[EMAIL PROTECTED] wrote:
> Question: is there a function that averages over the 3 columns in a single step?

You may look for ?aggregate

Thomas P.
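A minimal sketch of what the aggregate() suggestion might look like in practice. The column names follow Mike's message, but the values are invented (three hourly readings on each of two days), and grouping by the three date columns separately is one possible choice made here so that year, month and day are kept in the result.

```r
# Invented hourly data: two days in March 2004, three readings each.
data <- data.frame(
  year  = rep(2004, 6),
  month = rep(3, 6),
  day   = rep(c(17, 18), each = 3),
  hour  = rep(0:2, 2),
  temp  = c(1, 2, 3, 4, 5, 6),
  press = c(1000, 1001, 1002, 990, 991, 992),
  ozone = c(30, 40, 50, 20, 20, 20)
)

# aggregate() applies FUN to every selected column within each
# combination of the grouping variables, and returns the grouping
# variables as ordinary columns of the result.
daily <- aggregate(data[, c("temp", "press", "ozone")],
                   by  = list(year  = data$year,
                              month = data$month,
                              day   = data$day),
                   FUN = mean)

print(daily)
```

Compared with three separate tapply() calls, this keeps the date columns and the three daily means together in one data frame without a manual merge.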