Re: [R] Factor levels
Sébastien

Does the following work for you?

seb <- read.table(file='clipboard', header=TRUE)
seb$c
 [1] w k r s k p l u z s j j x r d j x w q f
Levels: d f j k l p q r s u w x z
seb$c <- factor(seb$c, levels=unique(seb$c))
seb$c
 [1] w k r s k p l u z s j j x r d j x w q f
Levels: w k r s p l u z j x d q f

Peter Alspach

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Sébastien
> Sent: Wednesday, 29 August 2007 9:00 a.m.
> To: Gabor Grothendieck
> Cc: R-help
> Subject: Re: [R] Factor levels
>
> Ok, I cannot send you one of my datasets since they are confidential,
> but I can produce a dummy "mini" dataset to illustrate my question.
> Let's say I have a csv file with 3 columns and 20 rows whose content
> is reproduced by the following line:
>
> mydata <- data.frame(a=1:20,
>     b=sample(100:200, 20, replace=TRUE),
>     c=sample(letters[1:26], 20, replace=TRUE))
> mydata
>     a   b c
> 1   1 176 w
> 2   2 141 k
> 3   3 172 r
> 4   4 182 s
> 5   5 123 k
> 6   6 153 p
> 7   7 176 l
> 8   8 170 u
> 9   9 140 z
> 10 10 194 s
> 11 11 164 j
> 12 12 100 j
> 13 13 127 x
> 14 14 137 r
> 15 15 198 d
> 16 16 173 j
> 17 17 113 x
> 18 18 144 w
> 19 19 198 q
> 20 20 122 f
>
> If I had to read the csv file, I would use something like:
> mydata <- data.frame(read.table(file="c:/test.csv", header=TRUE))
>
> Now, if you look at mydata$c, the levels are alphabetically ordered.
>
> mydata$c
>  [1] w k r s k p l u z s j j x r d j x w q f
> Levels: d f j k l p q r s u w x z
>
> What I am trying to do is to reorder the levels so as to have them in
> the order they appear in the table, i.e.
> Levels: w k r s p l u z j x d q f
>
> Again, keep in mind that my script should be used on datasets whose
> content is unknown to me.  In my example, I have used letters for
> mydata$c, but my code may have to handle factors of numeric or
> character values (I need to transform specific columns of my dataset
> into factors for plotting purposes).  My goal is to let the code scan
> the content of each factor of my data.frame during or after the
> read.table step and reorder their levels automatically, without
> having to ask the user to hard-code the level order.
>
> In a way, my problem is more related to the way the factor levels are
> ordered than to the read.table function, although I guess there is a
> link...
>
> Gabor Grothendieck wrote:
> > It's not clear from your description what you want

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
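Peter's levels=unique() trick generalizes to every factor column of a data frame, which is what Sébastien's "unknown dataset" requirement calls for.  A minimal sketch (the helper name reorderLevels is ours, not from the thread):

```r
## Hypothetical helper: re-level every factor column of a data frame in
## order of first appearance, so nothing is hard-coded per column.
reorderLevels <- function(df) {
  isFac <- sapply(df, is.factor)
  df[isFac] <- lapply(df[isFac], function(x)
      factor(x, levels=unique(as.character(x))))
  df
}

mydata <- data.frame(a=1:3, c=factor(c("w", "k", "w")))
levels(mydata$c)                  # "k" "w" (alphabetical default)
levels(reorderLevels(mydata)$c)   # "w" "k" (order of appearance)
```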
Re: [R] simple coding question
Kirsten

One way to do this:

kirsten <- c(123, 1234, 12345)
100*as.numeric(paste(substring(kirsten, 1, 3),
                     substring(kirsten, 4, 5), sep='.'))

HTH ....

Peter Alspach

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Kirsten Beyer
> Sent: Tuesday, 31 July 2007 9:31 a.m.
> To: r-help@stat.math.ethz.ch
> Subject: [R] simple coding question
>
> I have a list of ICD9 (disease) codes with various formats - 3 digit,
> 4 digit, 5 digit.  The first three digits of these codes are what I
> am most interested in.  I would like to either add zeros to the 3 and
> 4 digit codes to make them 5 digit codes, or add decimal points to
> put them all in the format ###.##.  I did not see a function that
> allows me to do this in the formatting commands.  This seems simple -
> can someone help?
>
> Thanks,
> K.Beyer
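Peter's paste()/substring() line can also be phrased as zero-padding; a sketch, under the assumption (as in the thread) that every code has at least three digits:

```r
## Pad each code on the right with zeros to five digits, then take the
## first three digits of interest.
codes <- c(123, 1234, 12345)          # example codes from the thread
padded <- gsub(" ", "0", sprintf("%-5s", codes))
padded                                # "12300" "12340" "12345"
substring(padded, 1, 3)               # "123" "123" "123"
```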
Re: [R] Aggregate to find majority level of a factor
Jon

One way, assuming your data.frame is 'jon':

aggregate(jon[,2], list(jon[,1]),
          function(x) levels(x)[which.max(table(x))])

  Group.1     x
1   Plot1   big
2   Plot2 small
3   Plot3 small

HTH ....

Peter Alspach

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Thompson, Jonathan
> Sent: Friday, 1 June 2007 7:26 a.m.
> To: r-help@stat.math.ethz.ch
> Subject: [R] Aggregate to find majority level of a factor
>
> I want to use the aggregate function to summarize data by a factor
> (my field plots), but I want the summary to be the majority level of
> another factor.
>
> For example, given the dataframe:
>
> Plot1 big
> Plot1 big
> Plot1 small
> Plot2 big
> Plot2 small
> Plot2 small
> Plot3 small
> Plot3 small
> Plot3 small
>
> My desired result would be:
> Plot1 big
> Plot2 small
> Plot3 small
>
> I can't seem to find a scalar function that will give me the majority
> level.
>
> Thanks in advance,
>
> Jonathan Thompson
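A closely related form uses names(which.max(table(x))), which returns the modal value directly and also works if the column is character rather than factor.  A sketch, with 'jon' rebuilt from the example data in the question (column names are our invention):

```r
jon <- data.frame(plot = rep(c("Plot1", "Plot2", "Plot3"), each=3),
                  size = c("big", "big", "small",
                           "big", "small", "small",
                           "small", "small", "small"))
## names(which.max(table(x))) is the modal value (first level on ties)
aggregate(jon$size, list(jon$plot),
          function(x) names(which.max(table(x))))
```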
Re: [R] A particular shuffling on a vector
Emmanuel

One option which appears to work:

emmanuel <- c(1,1,1,2,2,3,3,3)
#emmanuel <- c(1,1,2,3,4,5,6,6,6,6,6,6,6,6,7,8)
runs <- rle(emmanuel)[[1]]                 # run lengths
shuffle <- sample(1:length(runs))          # random order for the runs
newEmm <- rep(emmanuel[cumsum(runs)[shuffle]], runs[shuffle])
startPos <- sample(1:length(emmanuel), 1)  # random rotation point
if (startPos==1) newEmm else
    newEmm[c(startPos:length(newEmm), 1:(startPos-1))]

Peter Alspach

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Emmanuel Levy
> Sent: Friday, 20 April 2007 1:03 p.m.
> To: r-help@stat.math.ethz.ch
> Subject: [R] A particular shuffling on a vector
>
> Hello,
>
> I was wondering if anyone can think of a straightforward way (without
> loops) to do the following shuffling:
>
> Let's imagine a vector:
> c(1,1,1,2,2,3,3,3)
>
> I would like to derive shuffled vectors __where the same digits are
> never separated__, although they can be at both ends (periodicity).
>
> So the following shuffled vectors are possible:
>
> c(2,2,1,1,1,3,3,3)
> c(2,1,1,1,3,3,3,2)
> c(3,3,3,1,1,1,2,2)
> c(3,1,1,1,2,2,3,3)
> etc ...
>
> I should mention that there can be any number of different numbers,
> and any number of repetitions of each number.
>
> So the vectors I have to deal with could look like
> c(1,1,2,3,4,5,6,6,6,6,6,6,6,6,7,8) for example.
>
> Since I have to derive many shuffled versions of each vector, I am
> looking for an efficient way of doing it.  Can you think of a way
> without nested loops?
>
> Many thanks for your help,
>
> Best,
>
> Emmanuel
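The same idea reads a little more directly with rle()/inverse.rle(); a sketch wrapped as a function (shuffleRuns is our name, not from the thread) so that many shuffles can be drawn:

```r
shuffleRuns <- function(x) {
  r <- rle(x)
  k <- sample(length(r$lengths))        # random order for the runs
  out <- inverse.rle(list(lengths=r$lengths[k], values=r$values[k]))
  start <- sample(length(out), 1)       # random rotation point
  if (start == 1) out else out[c(start:length(out), 1:(start - 1))]
}

shuffleRuns(c(1,1,1,2,2,3,3,3))  # runs stay together, possibly wrapped
```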
Re: [R] Moving plot labels
Dean

Try:

plot(c(-0.25, 18), c(0, max(patient10)), type="n",
     ylab="SD of POST estimator", xlab="")
title(xlab='Scans\n(a)', line=3)

Peter Alspach

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
> Sent: Thursday, 5 April 2007 12:21 p.m.
> To: r-help@stat.math.ethz.ch
> Subject: [R] Moving plot labels
>
> Hi.
>
> I'm sure this is a simple problem, yet I can't seem to get simple
> help for it.
>
> I am simply trying to move my xlab in plot().
>
> I am currently using the following commands:
>
> plot(c(-0.25,18), c(0, max(patient10)), type="n",
>      ylab="SD of POST estimator", xlab="Scans \n (a)")
>
> But when the plot prints, the xlab is printed over top of the x-axis.
> I tried adjusting mar(), but this just works with the margins outside
> the plotting area.  What is a simple fix?  Is there a single command
> I can add that adjusts the size of the plotting area within the area
> controlled by mar()?
>
> Thanks very much.
>
> Dean
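mtext() offers the same control as title(..., line=); line= counts outward from the axis, so larger values push the label further from the tick labels.  A sketch with dummy data (patient10 is not reproduced in the thread):

```r
plot(1:10, ylab="SD of POST estimator", xlab="")
mtext("Scans\n(a)", side=1, line=4)   # place the label below the ticks
```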
Re: [R] hwo can i get a vector that...
Check out which.max

Peter Alspach

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of bunny, lautloscrew.com
> Sent: Thursday, 8 March 2007 11:20 a.m.
> To: R-help@stat.math.ethz.ch
> Subject: Re: [R] hwo can i get a vector that...
>
> > dear all,
> >
> > how can i get a vector that shows the number of the column of a
> > matrix that contains the maximum of the row?  can't believe i need
> > a loop for this...
> >
> > i have a 100 x 3 matrix and want to get a 100 x 1 vector with
> > values 1, 2, 3.
> >
> > there must be a simple solution.  i just cannot find it.  i think i
> > am searching on the wrong end.
> >
> > thx for help in advance.
> >
> > m.
>
> EDIT: ok, i know the following by now :)
>
> apply(for18[,-1], 1, max, na.rm=TRUE)
>
> but this doesn't get me the number of the column - which is what i
> need...
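For the row-wise column index specifically, two standard forms (m is a stand-in for the poster's 100 x 3 matrix):

```r
m <- matrix(rnorm(300), ncol=3)
idx1 <- apply(m, 1, which.max)            # column of each row's maximum
idx2 <- max.col(m, ties.method="first")   # same, vectorized
```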
Re: [R] If you had just one book on R to buy...
Julien

This is quite a common question.  Within R, try

RSiteSearch('one good book for R')

and follow the threads ......

Peter Alspach

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Julien Barnier
> Sent: Monday, 26 February 2007 8:51 a.m.
> To: r-help@stat.math.ethz.ch
> Subject: [R] If you had just one book on R to buy...
>
> Hi,
>
> I am starting a new job as a study analyst for a social science
> research unit.  I would really like to use R as my main tool for data
> manipulation and analysis.  So I'd like to ask you: if you had just
> one book on R to buy (or to keep), which one would it be?  I already
> bought the Handbook of Statistical Analysis Using R, but I'd like to
> have something more complete, both from the statistical point of view
> and on R usage.
>
> I thought that "Modern Applied Statistics with S-PLUS" would be a
> good choice, but maybe some of you have interesting suggestions?
>
> Thanks in advance,
>
> --
> Julien
Re: [R] identify selected substances across individuals
Jenny

> Below is an example dataset, which contains three id numbers.  For
> each id there are three substances in each of three blocks.  Some
> substances are repeated twice.  The substances are the same for all
> ids.  The id number 3 is actually a control, so all responses (y)
> that are equal to or greater than 4 are supposed to be removed from
> this id.  This I can do easily in R, but what I need help with is
> this: I want those substances that are removed from id number 3 also
> removed from the other ids.  I could do an algorithm like: for id in
> 1:2, if substance = c("abc","dgf") then delete; but if the substances
> to be removed have long strings and number more than 2 (for example
> 20 substances), it would take a long time to list them manually.  Can
> you please show me a clever way to do what I described above?

Use subsetting to identify the substances, and then !(... %in% ...) to
remove records with these substances:

yourData[!(yourData$substance %in%
           yourData[yourData$id==3 & yourData$y>=4, 'substance']), ]

The above is untested and will need modification if yourData$id or
yourData$y contains missing values.

Peter Alspach
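A worked version of the same subsetting, with invented values (the column names follow the thread; the data are made up for illustration):

```r
yourData <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                       substance = c("abc","dgf","abc","dgf","abc","dgf"),
                       y = c(1, 2, 3, 1, 5, 2))
## substances removed from the control (id 3, y >= 4) ...
bad <- yourData[yourData$id==3 & yourData$y>=4, "substance"]
## ... are then dropped for every id
yourData[!(yourData$substance %in% bad), ]
```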
Re: [R] making a grid of points
Ross

I think you want ?expand.grid

BTW, help.search('grid') finds this.

Cheers ......

Peter Alspach

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Ross Boylan
> Sent: Friday, 8 December 2006 8:03 a.m.
> To: r-help
> Subject: [R] making a grid of points
>
> I'd like to evaluate a function at each point on a 2 or 3-D grid.  Is
> there some function that already does this, or generates the grid of
> points?
>
> My search has led me to the grid and lattice packages, and I found a
> reference to the sp package (e.g., SpatialGrid) for this.  There are
> things in there that might be relevant, but at first blush many of
> them are embedded in other concepts (grobs, shingles, rugs) and don't
> obviously solve the problem.
>
> I know this is not a hard thing to program, but I suspect someone has
> already done it.  Any pointers?
>
> Thanks.
> --
> Ross Boylan                                    wk: (415) 514-8146
> 185 Berry St #5700                             [EMAIL PROTECTED]
> Dept of Epidemiology and Biostatistics         fax: (415) 514-8150
> University of California, San Francisco
> San Francisco, CA 94107-1739                   hm: (415) 550-1062
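A sketch of what ?expand.grid gives for this problem, with an invented function f(x, y) = x^2 + y^2:

```r
g <- expand.grid(x = seq(0, 1, by=0.5), y = seq(0, 1, by=0.5))
g$z <- with(g, x^2 + y^2)   # evaluate f at every grid point
nrow(g)                     # 9 points: a 3 x 3 grid
```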
Re: [R] getting a title in a plot during an lapply
Mark

> In my code below, tempa and tempb are numeric vectors that I combined
> into a dataframe along with the deciles of tempa.  I have an lapply
> statement that goes through the dataframe and does ten plots
> according to the appropriate decile.  The code is below and it works
> fine.  There are no bugs, so I figure there was no need to include
> structure statements on the data.  Also, I don't want to use coplot
> because the axes get sort of messed up because of outliers.
>
> tempa.deciles <- cut(tempa,
>     breaks=quantile(tempa, probs=seq(0, 1, 0.1)),
>     include.lowest=TRUE)
> df <- data.frame(tempa, tempb, tempa.deciles)
> levels(df$tempa.deciles) <- paste("Decile ", 1:10, sep="")
> lapply(split(df, df$tempa.deciles), function(x) {
>     print(cor(x$tempa, x$tempb)); plot(x$tempa, x$tempb) })
>
> I would like to include a title on each plot so that I can recognize
> which plot goes with which decile, but I don't know how to do that,
> basically because I can't figure out what lapply is looping over by
> number.  I definitely looked around in various R books but I was not
> successful.  Thanks for your help.

Try

plot(x$tempa, x$tempb, main=x$tempa.deciles[1])

Peter Alspach
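Looping over names() of the split is another way to get each decile label into the title; dummy data stands in for the thread's tempa/tempb:

```r
tempa <- rnorm(100); tempb <- rnorm(100)
tempa.deciles <- cut(tempa, quantile(tempa, seq(0, 1, 0.1)),
                     include.lowest=TRUE, labels=paste("Decile", 1:10))
parts <- split(data.frame(tempa, tempb), tempa.deciles)
for (nm in names(parts))                 # nm carries the decile label
  plot(parts[[nm]]$tempa, parts[[nm]]$tempb, main=nm)
```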
Re: [R] Creating a table
Michael

One solution:

df <- data.frame(loc=c("A","B","A","A","A"),
                 year=c(1970, 1970, 1970, 1976, 1980))
df[,3] <- cut(df$year, c(1969.5, 1974.5, 1979.5, 1984.5),
              c('1970-74', '1975-79', '1980-85'))
with(df, addmargins(table(loc, V3)))

Peter Alspach

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Michael Graber
> Sent: Wednesday, 15 November 2006 9:21 a.m.
> To: R-Mailingliste
> Subject: [R] Creating a table
>
> Dear R List,
>
> I am new to R, so my question may be easy to answer for you:
>
> I have a dataframe, for example:
>
> df <- data.frame(loc=c("A","B","A","A","A"),
>     year=as.numeric(c("1970","1970","1970","1976","1980")))
>
> and I want to create the following table without using loops:
>
>         1970-74  1975-79  1980-85  rowsum
> A             2        1        1       4
> B             1        0        0       1
> colsum        3        1        1       5
>
> so that the frequencies of df$loc are shown in the table for
> different time intervals.
>
> Thanks in advance for any hint,
>
> Michael Graber
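xtabs() builds the same table from a formula, which some find clearer than indexing a new column; a sketch:

```r
df <- data.frame(loc = c("A","B","A","A","A"),
                 year = c(1970, 1970, 1970, 1976, 1980))
df$period <- cut(df$year, c(1969.5, 1974.5, 1979.5, 1984.5),
                 labels = c("1970-74", "1975-79", "1980-85"))
addmargins(xtabs(~ loc + period, df))   # row and column sums added
```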
Re: [R] summary linear regression
Gorka

See the message from Brian Ripley, which is the first item from

RSiteSearch('R-squared')

Peter Alspach

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Gorka Merino
> Sent: Friday, 10 November 2006 4:54 a.m.
> To: r-help@stat.math.ethz.ch
> Subject: [R] summary linear regression
>
> Good morning,
>
> I am a PhD student in Barcelona working with R.  My question is about
> the summary of linear regressions.  The output of that summary gives
> some statistical parameters of the regression.  One of them is the
> R-squared.  In the help I have read that the manner of calculating
> the R-squared is
>
> R^2 = 1 - Sum(R[i]^2) / Sum((y[i] - y*)^2)
>
> and when the linear regression is forced to start at the point (0,0),
> y* is 0.
>
> My question is whether the R-squared obtained when the linear
> regression crosses the origin has the same meaning as when it has an
> intercept.  I am faced with a theoretical model which gives some
> values for a variable that, compared to the observed ones, gives an
> R-squared of 0.32 when the regression has an intercept and 0.7 when
> it is forced to cross the origin.  To sum up, can it be stated that
> the theoretical methodology followed explains 70% of the observed
> variance?
>
> Thank you very much.
>
> Gorka Merino
> Institut de Ciències del Mar, CMIMA-CSIC
> Psg. Marítim de la Barceloneta 37-49
> 08003-BARCELONA (Spain)
> Tel.: (34) 932 30 95 48
Re: [R] barplot question
Mark

> I'm doing a bar plot and there are 16 column variables.  Is there a
> way to make the variable names go down instead of across when you do
> the barplot?  Because the names are so long, the barplot just shows 3
> names and leaves the rest out.  If I could rotate the names 90
> degrees, it would probably fit a lot more.

Is this the sort of thing you mean:

temp <- barplot(rnorm(16, 3))
text(temp, rep(-0.2, 16), paste('trt', 1:16), srt=90, adj=1)

Peter Alspach

> Or maybe I can use space to make the horizontal width longer?  I
> looked up ?barplot but I'm not sure.  When 1st and 2nd are on the
> bottom, things look fine, but I'm not as interested in those 2
> barplots.
>
> I didn't use any special options.  I just did
>
> barplot(probsignmatrix)
> barplot(t(probsignmatrix))
> barplot(probsignmatrix, beside=TRUE)
> barplot(t(probsignmatrix), beside=TRUE)
>
> I put probsignmatrix below in case someone wants to see what I mean,
> because it may not be clear.  I don't expect anyone to type it in,
> but rounding would still show what I mean.  Thanks a lot.
>
> Columns: pcount pmpppcount pmmppcount pmmmpcount pcount mcount
>     pppmmcount ppmmmcount ppmppcount ppmmpcount pppmpcount ppmpmcount
>     pmpmpcount pmpmmcount pmmpmcount pmppmcount
> 1st: 0.03477157 0.02842640 0.03157360 0.03365482 0.04010152
>     0.03553299 0.03989848 0.04182741 0.02817259 0.03203046 0.02781726
>     0.02218274 0.01771574 0.02289340 0.02583756 0.02390863
> 2nd: 0.04648895 0.02901495 0.03092490 0.03064044 0.04108420
>     0.03998700 0.03958062 0.04059655 0.03039662 0.03027471 0.02901495
>     0.02170026 0.01601105 0.02287874 0.02165962 0.02267555
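Two stock alternatives to the text() trick are worth noting (m is invented data standing in for probsignmatrix):

```r
m <- rnorm(16, 3)
names(m) <- paste("longname", 1:16, sep="")
barplot(m, las=2, cex.names=0.7)   # tick labels perpendicular, shrunk
barplot(m, horiz=TRUE, las=1)      # or horizontal bars instead
```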
Re: [R] confusing about contrasts concept [long]
Tian

It appears the attachment might not have worked, so I'll embed Bill's
message at the end.

Peter Alspach

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Peter Alspach
> Sent: Thursday, 17 August 2006 8:02 a.m.
> To: T Mu; R-Help
> Subject: Re: [R] confusing about contrasts concept
>
> Tian
>
> Bill Venables wrote an excellent explanation to the S list back in
> 1997.  I saved it as a pdf file and attach it herewith ...
>
> Peter Alspach
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of T Mu
> > Sent: Thursday, 17 August 2006 3:23 a.m.
> > To: R-Help
> > Subject: [R] confusing about contrasts concept
> >
> > Hi all,
> >
> > Where can I find a thorough explanation of Contrasts and Contrast
> > Matrices?  I read some help but am still confused.
> >
> > Thank you,
> > Tian

Date sent: Sat, 29 Mar 1997 17:23:31 +1030
From: Bill Venables <[EMAIL PROTECTED]>
To: Wei Qian <[EMAIL PROTECTED]>
Copies to: [EMAIL PROTECTED]
Subject: Re: test contrasts [long]

Wei Qian writes:

   I am new to Splus, so please don't be offended if my question
   is too naive.

We're all friends here, Wei.  It's not that kind of group ... mostly.

   Does anyone know how to test contrasts in Splus?  To be
   specific, suppose I have a one-way layout experiment with 3
   treatments, and I want to test the contrasts of treatment 1
   vs. treatment 2, treatment 1 vs. treatment 3, etc.  In SAS, I
   could use the following:

   proc glm;
     class trt;
     model y = trt;
     contrast "1 vs. 2" 1 -1 0;
     contrast "2 vs. 3" 1 0 -1;
   run;

   One way I can think of is to construct my design matrix and
   obtain the contrast sum of squares through a series of matrix
   operations, but is there any easy way to do it or any built-in
   function in Splus that can do it?

The answer is 'yes' but hardly anyone seems to know how to do it.  The
explanation in the 'white book', for example, seems a little
incomplete to me and not quite adequate to settle the case you raise.
(The explanation in the yellow book is also inadequate, but I hope
come July we will have fixed that.)  Since this is one of the most
frequent questions people ask me in direct email, too, let me try
(again) to sort it out in some detail.

A formula such as y ~ f, where f is a factor, in principle generates a
single classification model of the form

    y_ij = mu + phi_i + e_ij

Write the design matrix in the form X = [1 Xf] where, assuming f has p
levels, Xf is the n x p dummy variable (i.e. binary) matrix
corresponding to the phi_i's.  So in matrix terms the model is written
as

    y = 1 mu + Xf phi + e

(a) If you remove the intercept term, using y ~ f - 1, then Xf is the
design matrix you get and the coefficients correspond to the class
means;

(b) If the intercept term is left in, then the design matrix X does
not have full column rank, so an adjustment has to be made to Xf to
make it so.  The redundancy comes about because the columns of Xf add
to the 1-vector, that is Xf 1_p = 1_n.

So let C be any p x (p-1) matrix such that [1 C] is nonsingular.  It
can easily be seen that Xc = [1 (Xf C)] will have full column rank,
and that fitting the model using this model matrix is equivalent to
the original redundantly specified model.  The matrix C is called the
*coding matrix* for the factor f.  The linear model is actually fitted
in the form

    y = 1 mu + (Xf C) alpha + e

where alpha has (p-1) components, of course.  In order to make sense
of the alpha's we need to relate them back to the phi's.  For any such
C there will be a vector, v, such that v'C = 0' (using ' for
transpose).  (In fact v spans the orthogonal complement of the column
space of C.)  Clearly phi and alpha are related by

    C alpha = phi

but since v'C = 0', it follows that an identification constraint
applies, namely v'phi = 0.  By multiplying both sides by
(C'C)^{-1} C', it also follows that

    alpha = (C'C)^{-1} C' phi

which provides an interpretation for the alpha's in terms of the
(constrained) phi's.

For example, take the Helmert contrasts:

> contr.helmert(4)
  [,1] [,2] [,3]
1   -1   -1   -1
2    1   -1   -1
3    0    2   -1
4    0    0    3

The constraint vector is clearly v = (1,1,1,1), since the columns add
to zero.  In this case the columns are also mutually orthogonal, so
the matrix (C'C)^{-1} C' (the generalized inverse of C) has a similar
form apart from a few row divisors:

> fractions(ginverse(contr.helmert(4)))
      [,1]  [,2]  [,3] [,4]
[1,]  -1/2   1/2     0    0
[2,]  -1/6  -1/6   1/3    0
[3,] -1/12 -1/12 -1/12  1/4
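Bill's identity alpha = (C'C)^{-1} C' phi is easy to check numerically in base R (solve() here stands in for the S-PLUS ginverse of the post; phi is an invented effect vector obeying the Helmert constraint):

```r
C <- contr.helmert(4)
phi <- c(-3, -1, 1, 3)                 # satisfies v'phi = 0 with v = (1,1,1,1)
alpha <- drop(solve(t(C) %*% C) %*% t(C) %*% phi)
alpha                                  # 1 1 1 for this phi
all.equal(as.vector(C %*% alpha), phi) # TRUE: C alpha recovers phi
```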
Re: [R] confusing about contrasts concept
Tian

Bill Venables wrote an excellent explanation to the S list back in
1997.  I saved it as a pdf file and attach it herewith ...

Peter Alspach

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of T Mu
> Sent: Thursday, 17 August 2006 3:23 a.m.
> To: R-Help
> Subject: [R] confusing about contrasts concept
>
> Hi all,
>
> Where can I find a thorough explanation of Contrasts and Contrast
> Matrices?  I read some help but am still confused.
>
> Thank you,
> Tian
Re: [R] How to choose columns in data.frame by parts of columns' names?
Wei-wei

> I have a data.frame whose names are as follows:
>
> "XG1"     "YG1"     "XEST"    "YEST"
> "XNOEMP1" "XNOEMP2" "YNOEMP1" "YNOEMP2"
> "XBUS10"  "XBUS10A" "XBUS10B" "XBUS10C"
> "YBUS10"  "YBUS10A" "YBUS10B" "YBUS10C"
> "XOWNBUS" "XSELFEST" "YOWNBUS" "YSELFEST"
>
> Those columns have names beginning with "X" or "Y".  Each "X" is
> paired with a "Y", e.g. "XG1" and "YG1", but they are not in the
> order "X Y X Y ...".  I want to combine "X*" and "Y*" like this:
>
> data.new[,"G1"] <- (data.old[,"XG1"] + data.old[,"YG1"])/2
>
> How do I choose columns by parts of their names?  For example, I can
> pick out XG1 and YG1 because they have the common part "G1".

Not entirely sure what you mean, but one approach might be to re-order
the columns so that the pairs are adjacent:

yourNames
 [1] "XG1"      "YG1"      "XEST"     "YEST"     "XNOEMP1"  "XNOEMP2"
 [7] "YNOEMP1"  "YNOEMP2"  "XBUS10"   "XBUS10A"  "XBUS10B"  "XBUS10C"
[13] "YBUS10"   "YBUS10A"  "YBUS10B"  "YBUS10C"  "XOWNBUS"  "XSELFEST"
[19] "YOWNBUS"  "YSELFEST"

yourNames[order(substring(yourNames, 2), substring(yourNames, 1, 1))]
 [1] "XBUS10"   "YBUS10"   "XBUS10A"  "YBUS10A"  "XBUS10B"  "YBUS10B"
 [7] "XBUS10C"  "YBUS10C"  "XEST"     "YEST"     "XG1"      "YG1"
[13] "XNOEMP1"  "YNOEMP1"  "XNOEMP2"  "YNOEMP2"  "XOWNBUS"  "YOWNBUS"
[19] "XSELFEST" "YSELFEST"

gives an idea of what I mean ...

Peter Alspach
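To actually build the averaged columns (not just re-order them), the suffixes can be matched programmatically; a sketch with a small invented data.old (the names data.old/data.new come from the question):

```r
data.old <- data.frame(XG1=1:3, YG1=4:6, XEST=7:9, YEST=10:12)
suffixes <- unique(substring(names(data.old), 2))   # "G1" "EST"
data.new <- as.data.frame(sapply(suffixes, function(s)
    (data.old[[paste("X", s, sep="")]] +
     data.old[[paste("Y", s, sep="")]]) / 2))
data.new    # columns G1 and EST, each the X/Y average
```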
Re: [R] factors and mca
Carlos

?mca states that mca works on a data frame.  As you have written it,

is.data.frame(de)

returns FALSE.  Try

de <- data.frame(d, e)

instead of

de <- factor(c(d, e))

HTH

Peter Alspach

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Carlos Mauricio Cardeal Mendes
> Sent: Wednesday, 3 May 2006 8:23 a.m.
> To: 'R-Help help'
> Subject: [R] factors and mca
>
> Hi!  Suppose I have this code below:
>
> a <- c(1, 2, 3, 2, 1)
> b <- c(3, 2, 3, 1, 1)
> d <- as.factor(a)
> e <- as.factor(b)
> table(d, e)
> is.factor(a)
> is.factor(b)
> is.factor(d)
> is.factor(e)
> de <- factor(c(d, e))
> is.factor(de)
> require(MASS)
> mca(de)
>
> I'd like to perform a correspondence analysis, but I can't understand
> the error message after mca:
>
> Error in mca(de) : all variables must be factors
>
> Aren't those (d and e) factors, as is.factor shows?
>
> I'm running R 2.3.0 under Windows XP
>
> Thank you!
> Mauricio
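Putting Peter's fix together, a minimal sketch (mca is in MASS; the tiny example data come from the question):

```r
library(MASS)
d <- factor(c(1, 2, 3, 2, 1))
e <- factor(c(3, 2, 3, 1, 1))
de <- data.frame(d, e)
is.data.frame(de)   # TRUE, and both columns are factors
fit <- mca(de)      # the correspondence analysis now runs
```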
Re: [R] Special characters: plus/minus - a method that works
Gabor Grothendieck wrote:

> Another way that works on my Windows XP system is to use "\261".

Please note Windows escape codes are not portable and thus not
recommended, as Martin Maechler pointed out in a response to a
suggestion of mine:

On Tue, 14 Jun 2005, Martin Maechler wrote:

> >>>>> "Peter" == Peter Alspach
> >>>>>     on Tue, 14 Jun 2005 14:11:47 +1200 writes:
>
>     Peter> Ben
>
>     Peter> Others have pointed out plotmath.  However, for some
>     Peter> superscripts (including 2) it may be easier to use
>     Peter> the appropriate escape sequence (at least in Windows):
>
>     Peter> ylab = 'BA (m\262/ha)'
>
> but please refrain from doing it that way.
> You should write R code that is portable, and ``plotmath''
> has been designed to be portable.  Windows escape codes are not,
> and may not even work in future (or current?) versions of
> Windows with `unusual' locale settings {fonts, etc}.

Peter Alspach
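For reference, the portable plotmath form of a plus/minus label:

```r
## %+-% is the plotmath plus/minus operator
plot(1:10, ylab = expression(x %+-% s))
```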
Re: [R] superscript in figures - basic question
Ben

Others have pointed out plotmath.  However, for some superscripts
(including 2) it may be easier to use the appropriate escape sequence
(at least in Windows):

ylab = 'BA (m\262/ha)'

Cheers ...

Peter Alspach

>>> "Benjamin M. Osborne" <[EMAIL PROTECTED]> 14/06/05 13:42:53 >>>
Although I see similar, but more complex, questions addressed in the
help archive, I'm having trouble adding superscripted text to the
y-axis labels of some figures, and I can't find anything in the R
documentation on this.  I have:

ylab="BA (m2/ha)"

but I want the "2" to be superscripted.  Thanks in advance for the
help, or for pointing out the appropriate help file.

-Ben Osborne
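For reference, the portable plotmath form (which the follow-up in this thread recommends over the Windows escape sequence) can be sketched like this:

```r
# plotmath builds the superscript portably, on any platform:
ylab.expr <- expression(BA~(m^2/ha))

# used directly in a plotting call, e.g.:
# plot(1:10, (1:10)^2, ylab = expression(BA~(m^2/ha)))

is.expression(ylab.expr)   # TRUE
```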
Re: [R] simple predict question
Alternatively, one can convert dat to a data frame first (?lm states
that data is "an optional data frame containing the variables in the
model."):

dat <- matrix(c(0,0,10,20), 2, byrow=T)
dat <- as.data.frame(dat)
lm1 <- lm(V2~V1, dat)
predict(lm1, data.frame(V1=seq(0,10,1)))
 1  2  3  4  5  6  7  8  9 10 11
 0  2  4  6  8 10 12 14 16 18 20

Cheers ...

Peter Alspach

>>> Gabor Grothendieck <[EMAIL PROTECTED]> 01/06/05 09:00:15 >>>
On 5/31/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Excuse the simple question...
> I'm not sure what I'm doing wrong with predict, but let me use this
> example.  Suppose I do:
> dat <- matrix(c(0,0,10,20), 2, byrow=T)
> lm1 <- lm(dat[,2]~dat[,1])
>
> Suppose I want to generate the linearly-interpolated y-values between
> the points (0,0) and (10,20) at every unit interval.
> I thought I would just do:
> predict(lm1, data.frame(seq(0,10,1))) to get 0,2,4,...,18,20, but
> instead I just get:
>  1  2
>  0 20

I think the names are confusing it.  Try:

x <- dat[,1]; y <- dat[,2]
lm1 <- lm(y ~ x)
predict(lm1, data.frame(x = 1:10))
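Peter's fix runs end to end; the point is that fitting against data-frame columns named V1/V2 lets newdata supply a matching V1 column.  A sketch:

```r
# Two points (0,0) and (10,20); fit the line through them:
dat <- as.data.frame(matrix(c(0, 0, 10, 20), 2, byrow = TRUE))
lm1 <- lm(V2 ~ V1, dat)

# newdata has a column named V1, matching the fitted term, so
# predict() interpolates at every unit interval:
pred <- predict(lm1, data.frame(V1 = 0:10))
round(unname(pred))   # 0 2 4 6 8 10 12 14 16 18 20
```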
Re: [R] Adding up intervals by activity
Lorin

You could use rle().  Say your Activity and Interval data is in lorin,
then:

tmp.rle <- rle(as.vector(lorin[,1]))[[1]]
tapply(lorin[,2], rep(1:length(tmp.rle), tmp.rle), sum)
 1  2  3
16  8  6

HTH

Peter Alspach

>>> Lorin Hochstein <[EMAIL PROTECTED]> 24/02/05 16:18:43 >>>
Hello all,

I have a data frame with the following columns: timestamp, activity,
interval.  Each row contains information about an activity that
occurred over a time interval.  "Timestamp" is when the interval
ended, "Activity" is a factor that represents the type of activity,
and "Interval" is a double which represents an interval in time.  The
data frame is sorted by timestamp.  For example:

Timestamp            Activity  Interval
2004-02-12 08:59:08  A         5
2004-02-12 09:23:03  A         7
2004-02-12 09:56:04  A         4
2004-02-12 10:39:00  B         5
2004-02-12 11:23:06  B         3
2004-02-12 12:56:01  A         4
2004-02-12 12:59:01  A         2
...

I would like to know how much time was spent on an activity before the
activity changed (to compute time, I just care about the lengths of
the intervals, not the values of the timestamps).  For example, given
the table above, I would like to get an output that looks like:

Activity  TotalTime
A         16
B         8
A         6

Is this possible to do in R without resorting to a for loop or
recursion (i.e. what's the R-ish way to do this)?  I know I can use
the "diff" function to identify which rows represent a change in
activity, but is there anything I can do with that?

Lorin

--
Lorin Hochstein
Graduate Research Assistant
Experimental Software Engineering Group
Computer Science Dept., University of Maryland
email: [EMAIL PROTECTED]  tel: 301-405-2721
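Peter's rle() approach, run on a small reconstruction of Lorin's example (the Activity/Interval values are taken from the table in the question):

```r
lorin <- data.frame(
  Activity = c("A", "A", "A", "B", "B", "A", "A"),
  Interval = c(5, 7, 4, 5, 3, 4, 2)
)

# lengths of each run of consecutive identical activities:
run.len <- rle(as.vector(lorin$Activity))$lengths   # 3 2 2

# label each row with its run number, then sum Interval within runs:
grp <- rep(seq_along(run.len), run.len)
totals <- tapply(lorin$Interval, grp, sum)
totals
#  1  2  3
# 16  8  6
```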
[R] Positive log-likelihood in lme
Kia ora

I'm using lme (from the nlme package) with data similar to the
Orthodont dataset, and am getting positive log-likelihoods (>100).
This seems unusual, and I wondered if someone could offer a possible
explanation.  I can supply a sample dataset if requested, but I feel
almost certain that this question has been asked and answered
recently.  However, I can find no trace of it in the mail archives
(although I have spent several hours reading lots of other interesting
things :-)).

Thanks ...

Peter Alspach
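One general point worth noting (offered here as background, not as the list's answer): a log-likelihood is a sum of log densities, and densities exceed 1 whenever the residual standard deviation is small, so a positive log-likelihood is perfectly legitimate.  A sketch with a plain lm() and made-up data:

```r
set.seed(1)
# response measured on a scale where the residual sd is tiny:
y <- rnorm(50, mean = 10, sd = 0.01)

# density values far exceed 1, so the log-likelihood comes out positive:
ll <- logLik(lm(y ~ 1))
as.numeric(ll) > 0   # TRUE
```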
Re: [R] dropping rows
Tobias

I remember finding Patrick Burns' "S Poetry" (see
http://www.burns-stat.com/ ) worth reading - and it covers this sort
of thing nicely.

Peter Alspach

>>> Tobias Muhlhofer <[EMAIL PROTECTED]> 02/12/04 13:57:47 >>>
Thanks.  The problem is that there is extremely little on dataframes
or matrices in "An Intro to R", which I did read, and I frankly don't
know where else to go.  Once I know a function like subset() exists, I
can then read the help files on it and that's fine, but I would never
dream this function up myself...

As for indexing, I DID read "An Introduction to R" and I did NOT catch
the part where it says you can use any variable in the dataframe to
index it, nor would I have thought of it by myself.  From that
documentation, I only learned about using row-labels to index
things...

But I am definitely thankful for the quick help given to me by people
on this list, and so I guess being RTFM'ed is a small price to pay for
figuring out how to solve the problem I need to solve.

Toby

Jeff Laake wrote:
> Here's an example:
>
> earlydata = data[data$year < 1960, ]
>
> Look up help and read manuals on manipulating dataframes.
>
> Tobias Muhlhofer wrote:
>
>> Hi!
>>
>> Sorry for asking a trivial question, but I can't seem to figure
>> this out.
>>
>> I have a dataframe called master containing 30-odd variables.
>>
>> In this dataframe, I have observations across these 30 variables
>> from 1930 to 2003 (I've made a "year" variable).  How can I drop
>> all rows for which the year is less than 1960?  I'm assuming
>> something with ifelse() but I can't quite figure it out.
>>
>> I would appreciate a suggestion of some syntax.
>>
>> Thanks!
>> Toby

--
**
When Thomas Edison invented the light bulb he tried over 2000
experiments before he got it to work.  A young reporter asked him how
it felt to have failed so many times.  He said "I never failed once.
I invented the light bulb.  It just happened to be a 2000-step
process."
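Jeff's indexing one-liner and the subset() form Toby mentions both do the row-dropping; a small sketch with made-up data standing in for the real frame:

```r
master <- data.frame(year  = c(1930, 1945, 1960, 1987, 2003),
                     value = 1:5)   # stand-in for the 30-odd variables

# logical row indexing: keep rows with year >= 1960, dropping the rest
recent  <- master[master$year >= 1960, ]

# the same thing via subset(), where column names can be used directly:
recent2 <- subset(master, year >= 1960)

nrow(recent)   # 3
```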
Re: [R] unbalanced design
Damián

I asked a similar question a few months ago (3 August 2004):

> temp.aov <- aov(S~rep+trt1*trt2*trt3, data=dummy.data)
> model.tables(temp.aov, type='mean', se=T)
>
> Returns the means, but states "Design is unbalanced - use
> se.contrasts for se's", which is a little surprising since the design
> is balanced.

To which Prof Ripley replied:

If you used the default treatment contrasts, it is not.  Try Helmert
contrasts with aov().

If I recall correctly, following Prof Ripley's suggestion led aov() to
accept the design was balanced, but model.tables() still did not (but
that could have been my error).  However, se.contrast() worked.

Cheers

Peter Alspach

>>> Damián Cirelli <[EMAIL PROTECTED]> 02/12/04 09:50:13 >>>
Hi all,

I'm new to R and have the following problem: I have a 2-factor design
(a has 2 levels, b has 3 levels).  I have an object kidney.aov which
is an aov(y ~ a*b), and when I ask for model.tables(kidney.aov, se=T)
I get the following message along with the table of effects:

Design is unbalanced - use se.contrast() for se's

but the design is NOT unbalanced... each factor level combination has
the same n.

I'd appreciate any help.  Thanks.
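Prof Ripley's suggestion, quoted above, amounts to switching the contrasts option before calling aov().  A minimal sketch on a made-up balanced two-factor design (the response values are arbitrary):

```r
# Balanced 2x3 design, n = 2 per cell (made-up response):
dd <- expand.grid(a = factor(1:2), b = factor(1:3), rep = 1:2)
set.seed(42)
dd$y <- rnorm(nrow(dd))

# switch from the default treatment contrasts to Helmert contrasts;
# with these, aov() reports "Estimated effects are balanced"
# (as shown in the quoted exchange above):
old <- options(contrasts = c("contr.helmert", "contr.poly"))
fit <- aov(y ~ a * b, data = dd)
options(old)   # restore the previous contrasts setting
```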
Re: [R] as.list.matrix
Kjetil

Isn't a data.frame a special type of list, and thus one could use
as.data.frame()?

Peter Alspach

>>> Kjetil Brinchmann Halvorsen <[EMAIL PROTECTED]> 29/10/04 14:16:03 >>>
I found the need of converting a matrix into a list of its columns
(for use with do.call), and was surprised there was no method
as.list.matrix; it could easily be a part of as.list.default.  I
wrote:

my.as.list.matrix <- function(mat) {
    if (!is.matrix(mat)) stop("Argument must be a matrix")
    n <- NCOL(mat)
    res <- vector(mode="list", length=n)
    for (i in seq(along=res)) res[[i]] <- mat[,i]
    res
}

but still think there must be some in-built function doing this!

By the way, my use is

with(VRS, do.call("scatter.smooth",
     c(my.as.list.matrix(na.omit(cbind(Tmed,
       resid(VRS.mod1, type="response")))), xlab="", ylab="")))

there should be an easier way for such a daily task?

Kjetil

--
Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
   -- Mahdi Elmandjra
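Peter's suggestion in runnable form: a data frame is a list of its columns, so going matrix -> data frame -> list gives the column list directly.  A minimal sketch:

```r
m <- matrix(1:6, ncol = 2)

# as.data.frame() splits the matrix into columns; a data frame is
# already a list, so as.list() just drops the data-frame class:
cols <- as.list(as.data.frame(m))

length(cols)   # 2
cols[[1]]      # 1 2 3

# which is exactly the shape do.call() wants:
do.call(cbind, cols)
```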
Re: [R] ifelse() question
Francisco

Did you try changing the factors to character, with as.character()?
Also, you don't really need ifelse() for this.  Something like the
following (untested) should do it:

dat[,4] <- as.character(dat[,4])
dat[,5] <- as.character(dat[,5])
dat[dat[,4]=='POR', 4] <- dat[dat[,4]=='POR', 5]
dat[,4] <- as.factor(dat[,4])
dat[,5] <- as.factor(dat[,5])

Peter Alspach

>>> "F Z" <[EMAIL PROTECTED]> 29/10/04 12:48:54 >>>
Hi,

I have a data.frame with dim = 18638 (rows) 6 (cols)

names(dat)
[1] "id"      "long"    "lat"     "species" "type"    "size"

Variables "species" and "type" are factors.  Species has 5 levels:
"BOV" "CAP" "CER" "OVI" "POR".  Variable "type" has 11 levels: "BRD"
"CL" ... "OTHER".

I would like to replace the values in species by the values in type,
but only if species == "POR".  I tried:

x <- ifelse(dat$species %in% "POR", dat$type, dat$species)
dat[,4] <- x

but

levels(x)
[1] "1" "2" "3" "4" "5" "6" "8" "9" "10" "11" "12"

So x changes the factor names to numbers.  I can not use factor() to
recover the names, since the resulting factors in x are a mixture of
factors from species and type.  I also tried

x <- gsub(pattern="POR", replacement=factor(dat$type), dat$species)

with the same behavior.  Apparently I did not have my granola bar
today so I can't find a solution!  Any help is greatly appreciated.

Thanks!
Francisco
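Peter's character-based replacement, run on a small made-up version of Francisco's data (three rows standing in for the 18638, with species and type as the two factor columns):

```r
dat <- data.frame(species = factor(c("POR", "BOV", "POR")),
                  type    = factor(c("BRD", "CL",  "OTHER")))

# work on characters, so values rather than level codes are moved:
sp <- as.character(dat$species)
ty <- as.character(dat$type)

# where species is "POR", take the value from type instead:
sp[sp == "POR"] <- ty[sp == "POR"]
dat$species <- as.factor(sp)

as.character(dat$species)   # "BRD" "BOV" "OTHER"
```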
Re: [R] Standard errors from glm
Dear Prof Ripley

Thanks for your reply and clarification.  However:

1.  Regarding model.tables() returning "Design is unbalanced": setting
contrasts to Helmert does indeed make the design balanced, but
model.tables() still returns "Design is unbalanced":

> options()$contrasts
        unordered          ordered
"contr.treatment"     "contr.poly"
> aov(S~rep+trt1*trt2*trt3, data=dummy.data)
Call: ...
Residual standard error: 14.59899
Estimated effects may be unbalanced
> options(contrasts=c("contr.helmert", "contr.treatment"))
> aov(S~rep+trt1*trt2*trt3, data=dummy.data)
Call: ...
Residual standard error: 14.59899
Estimated effects are balanced
> model.tables(aov(S~rep+trt1*trt2*trt3, data=dummy.data), se=T)
Design is unbalanced - use se.contrasts for se's
Tables of effects
...

However, this is a relatively minor issue, and covered in
?model.tables, which clearly states that "The implementation is
incomplete, and only the simpler cases have been tested thoroughly."

2.  You point out that "In either case you can predict something you
want to estimate and use predict(, se=TRUE)."  Doesn't this give the
standard error of the predicted value, rather than the mean for, say,
trt1 level 0?  For example:

> predict(temp.lm, newdata=data.frame(rep='1', trt1='0', trt2='1',
  trt3='0'), se=T)
$fit
[1] 32
$se.fit
[1] 10.53591
$df
[1] 23
$residual.scale
[1] 14.59899

Whereas from the analysis of variance table we can get the standard
error of the mean for trt1 as sqrt(anova(temp.lm)[9,3]/12) = 4.214365.
It is the equivalent of this latter value that I'm after in the glm()
case.

>>> Prof Brian Ripley <[EMAIL PROTECTED]> 03/08/04 18:10:56 >>>
On Tue, 3 Aug 2004, Peter Alspach wrote:

[Lines wrapped for legibility.]

> I'm having a little difficulty getting the correct standard errors
> from a glm.object (R 1.9.0 under Windows XP 5.1).  predict() will
> give standard errors of the predicted values, but I am wanting the
> standard errors of the mean.
>
> To clarify:
>
> Assume I have a 4x3x2 factorial with 2 complete replications (i.e.
> 48 observations; I've appended a dummy set of data at the end of
> this message).  Call the treatments trt1 (4 levels), trt2 (3 levels)
> and trt3 (2 levels) and the replications rep - all are factors.  The
> observed data is S.  Then:
>
> temp.aov <- aov(S~rep+trt1*trt2*trt3, data=dummy.data)
> model.tables(temp.aov, type='mean', se=T)
>
> Returns the means, but states "Design is unbalanced - use
> se.contrasts for se's", which is a little surprising since the
> design is balanced.

If you used the default treatment contrasts, it is not.  Try Helmert
contrasts with aov().

> Nevertheless, se.contrast gives what I'd expect:
>
> se.contrast(temp.aov, list(trt1==0, trt1==1), data=dummy.data)
> [1] 5.960012
>
> i.e. the standard error of the mean is 5.960012/sqrt(2) = 4.214,
> which is sqrt(anova(temp.aov)[9,3]/12) as expected.  Similarly for
> interactions, e.g.:
>
> se.contrast(temp.aov, list(trt1==0 & trt2==0, trt1==1 & trt2==1),
> data=dummy.data)/sqrt(2)
> [1] 7.299494
>
> How do I get the equivalent of these standard errors if I have used
> lm(), and by extension glm()?  I think I should be able to get these
> using predict(..., type='terms', se=T) or coef(summary()) but can't
> quite see how.

In either case you can predict something you want to estimate and use
predict(, se=TRUE).

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,            Tel: +44 1865 272861 (self)
1 South Parks Road,                   +44 1865 272866 (PA)
Oxford OX1 3TG, UK               Fax: +44 1865 272595
Re: [R] Standard errors from glm
Roger

Thanks - I have tried that, but it doesn't give the standard errors I
required.  Using my example, coef(summary(temp.lm)) gives:

            Estimate Std. Error     t value     Pr(>|t|)
(Intercept)     44.5  10.535912  4.22364968 0.0003224616
rep2            -1.0   4.214365 -0.23728369 0.8145375583
trt11          -13.0  14.598987 -0.89047272 0.3824325293
trt12            3.0  14.598987  0.20549370 0.8389943428
trt13          -17.0  14.598987 -1.16446432 0.2561726432
... etc ...

that is, the standard error for the (intercept) is 10.54, rep 4.21,
main effects 14.60, 2-way interactions 20.65 and 3-way interaction
29.20.  These can also be obtained via sqrt(diag(vcov(temp.lm))).

It is not clear to me how to go from these estimates to those from the
aov() call.  I have tried pre- and post-multiplying vcov() by the
design matrix, but this gives the same standard errors as
predict(temp.lm, se=T); i.e. those of the predicted values.

Regards ...

Peter Alspach

>>> "Roger D. Peng" <[EMAIL PROTECTED]> 03/08/04 12:22:44 >>>
Try summary(glm.object)$coefficients.

-roger

Peter Alspach wrote:
> Kia ora list members:
>
> I'm having a little difficulty getting the correct standard errors
> from a glm.object (R 1.9.0 under Windows XP 5.1).  predict() will
> give standard errors of the predicted values, but I am wanting the
> standard errors of the mean.
>
[... remainder of the quoted message, including the dummy.data
listing, as in the original post below.]
[R] Standard errors from glm
Kia ora list members:

I'm having a little difficulty getting the correct standard errors
from a glm.object (R 1.9.0 under Windows XP 5.1).  predict() will give
standard errors of the predicted values, but I am wanting the standard
errors of the mean.

To clarify:

Assume I have a 4x3x2 factorial with 2 complete replications (i.e. 48
observations; I've appended a dummy set of data at the end of this
message).  Call the treatments trt1 (4 levels), trt2 (3 levels) and
trt3 (2 levels) and the replications rep - all are factors.  The
observed data is S.  Then:

temp.aov <- aov(S~rep+trt1*trt2*trt3, data=dummy.data)
model.tables(temp.aov, type='mean', se=T)

Returns the means, but states "Design is unbalanced - use se.contrasts
for se's", which is a little surprising since the design is balanced.
Nevertheless, se.contrast gives what I'd expect:

se.contrast(temp.aov, list(trt1==0, trt1==1), data=dummy.data)
[1] 5.960012

i.e. the standard error of the mean is 5.960012/sqrt(2) = 4.214, which
is sqrt(anova(temp.aov)[9,3]/12) as expected.  Similarly for
interactions, e.g.:

se.contrast(temp.aov, list(trt1==0 & trt2==0, trt1==1 & trt2==1),
data=dummy.data)/sqrt(2)
[1] 7.299494

How do I get the equivalent of these standard errors if I have used
lm(), and by extension glm()?  I think I should be able to get these
using predict(..., type='terms', se=T) or coef(summary()) but can't
quite see how.

predict(lm(S~rep+trt1*trt2*trt3, data=dummy.data), type='terms', se=T)

or

predict(glm(cbind(S, 100-S)~rep+trt1*trt2*trt3, data=dummy.data,
family='binomial'), type='terms', se=T)

or, as in my case,

predict(glm(cbind(S, 100-S)~rep+trt1*trt2*trt3, data=dummy.data,
family='quasibinomial'), type='terms', se=T)

Thanks

Peter Alspach
HortResearch

dummy.data
 trt1 trt2 trt3 rep  S
    0    0    0   1 33
    0    0    0   2 55
    0    0    1   1 18
    0    0    1   2 12
    0    1    0   1 47
    0    1    0   2 16
    0    1    1   1 22
    0    1    1   2 33
    0    2    0   1 22
    0    2    0   2 18
    0    2    1   1 60
    0    2    1   2 40
    1    0    0   1 38
    1    0    0   2 24
    1    0    1   1  8
    1    0    1   2 14
    1    1    0   1 69
    1    1    0   2 42
    1    1    1   1 42
    1    1    1   2 44
    1    2    0   1 48
    1    2    0   2 26
    1    2    1   1 46
    1    2    1   2 33
    2    0    0   1 48
    2    0    0   2 46
    2    0    1   1 24
    2    0    1   2  8
    2    1    0   1 69
    2    1    0   2 33
    2    1    1   1 22
    2    1    1   2 33
    2    2    0   1 33
    2    2    0   2 18
    2    2    1   1 26
    2    2    1   2 42
    3    0    0   1 12
    3    0    0   2 42
    3    0    1   1 16
    3    0    1   2 22
    3    1    0   1 14
    3    1    0   2 60
    3    1    1   1 40
    3    1    1   2 55
    3    2    0   1 47
    3    2    0   2 38
    3    2    1   1 18
    3    2    1   2 44
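The quantity Peter computes by hand, sqrt(anova()[9,3]/12), is just sqrt(MSerror / n), with n the number of observations contributing to each treatment mean.  A sketch on a made-up balanced 4x3x2 factorial with 2 reps (random response, so the numbers differ from the post; the structure matches):

```r
# Balanced 4x3x2 factorial, 2 complete replications (48 observations):
dd <- expand.grid(trt1 = factor(0:3), trt2 = factor(0:2),
                  trt3 = factor(0:1), rep = factor(1:2))
set.seed(7)
dd$S <- rnorm(nrow(dd), mean = 35, sd = 10)

fit    <- aov(S ~ rep + trt1 * trt2 * trt3, data = dd)
ms.err <- anova(fit)["Residuals", "Mean Sq"]

# each trt1 mean averages 3 (trt2) x 2 (trt3) x 2 (rep) = 12 values:
se.trt1.mean <- sqrt(ms.err / 12)
```

The residual degrees of freedom here are 23, matching the $df component in the predict() output quoted in the reply to Prof Ripley.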
[R] Scoping Rules: Summary
Thanks to Andy Liaw, Roger Peng, Thomas Lumley, Brian Ripley and Peter
Dalgaard, all of whom addressed my questions or threads arising from
them.  The full messages were posted to the list, so this is a brief
summary:

Andy Liaw explained the difference between lexical and dynamic
scoping, and the rationale behind the choice of lexical scoping for R.

Roger Peng showed how to modify fnB.

Brian Ripley suggested that it is generally better to pass functions
an object rather than just its name, and warned of the dangers of
using get() on the result of deparse(substitute()).

Thanks all

Peter Alspach

Original question below:

I'm using R 1.7.1 (Windows 2000) and having difficulty with scoping.
I've studied the FAQ and on-line manuals and think I have identified
the source of my difficulty, but cannot work out the solution.  For
the purposes of illustration, I have three functions as defined below:

fnA <- function(my.x) {
    list(first=as.character(substitute(my.x)), second=sqrt(my.x))
}

fnB <- function(AA) {
    tmp.x <- get(AA$first)
    tmp.x^2
}

fnC <- function() {
    x <- 1:2
    y <- fnA(x)
    z <- fnB(y)
    c(x,y,z)
}

fnA() has a vector as an argument and returns, in a list, the name of
the vector and the square root of its elements.  fnB() takes the
result of fnA() as its argument, gets the appropriate vector and
computes the square of its elements.  These work fine when called at
the command line.  fnC() defines a local vector x and calls fnA(),
which operates on this vector.  Then fnB() is called, but it operates
on a global vector x in GlobalEnv (or returns an error if x doesn't
exist there) - but I want it to operate on the local vector.

I think this is related to the enclosing environment of all three
functions being GlobalEnv (since they were created at the command
line), but the parent environment of fnB() being different when
invoked from within fnC().

My questions:

1  Have I correctly understood the issue?
2  How do I make fnB() operate on the local vector rather than the
   global one?
3  And, as an aside, I have used as.character(substitute(my.x)) to
   pass the name - but deparse(substitute(my.x)) also works.  Is there
   any reason to prefer one over the other?

Thank you ...
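Brian Ripley's suggestion (pass the object, not its name) removes the need for get() altogether; a sketch of the three functions rewritten along those lines:

```r
# fnA now carries the vector itself, not just its name:
fnA <- function(my.x) list(value = my.x, second = sqrt(my.x))

# fnB squares the stored vector -- no get(), so no scoping trouble:
fnB <- function(AA) AA$value^2

fnC <- function() {
  x <- 1:2
  y <- fnA(x)
  z <- fnB(y)
  c(x, y$second, z)
}

fnC()   # 1 2 1 1.414214 1 4
```

This works identically at the command line and inside fnC(), because fnB() never has to look a name up in any environment.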
[R] Scoping rules
Dear List members:

I'm using R 1.7.1 (Windows 2000) and having difficulty with scoping.
I've studied the FAQ and on-line manuals and think I have identified
the source of my difficulty, but cannot work out the solution.  For
the purposes of illustration, I have three functions as defined below:

fnA <- function(my.x) {
    list(first=as.character(substitute(my.x)), second=sqrt(my.x))
}

fnB <- function(AA) {
    tmp.x <- get(AA$first)
    tmp.x^2
}

fnC <- function() {
    x <- 1:2
    y <- fnA(x)
    z <- fnB(y)
    c(x,y,z)
}

fnA() has a vector as an argument and returns, in a list, the name of
the vector and the square root of its elements.  fnB() takes the
result of fnA() as its argument, gets the appropriate vector and
computes the square of its elements.  These work fine when called at
the command line.  fnC() defines a local vector x and calls fnA(),
which operates on this vector.  Then fnB() is called, but it operates
on a global vector x in GlobalEnv (or returns an error if x doesn't
exist there) - but I want it to operate on the local vector.

I think this is related to the enclosing environment of all three
functions being GlobalEnv (since they were created at the command
line), but the parent environment of fnB() being different when
invoked from within fnC().

My questions:

1  Have I correctly understood the issue?
2  How do I make fnB() operate on the local vector rather than the
   global one?
3  And, as an aside, I have used as.character(substitute(my.x)) to
   pass the name - but deparse(substitute(my.x)) also works.  Is there
   any reason to prefer one over the other?

Thank you ...

Peter Alspach

__
The contents of this e-mail are privileged and/or confidential to the
named recipient and are not to be used by any other person and/or
organisation.  If you have received this e-mail in error, please
notify the sender and delete all material pertaining to this e-mail.