[R] A printing macro
I am exploring the result of clustering a large multivariate data set into a number of groups, represented, say, by a factor G. I wrote a function to see how categorical variables vary between groups: ddisp - function(dvar) { + csqt - chisq.test(G,dvar) + print(csqt$statistic) + print(csqt$observed) + print(round(csqt$expected)) + round(csqt$residuals) + } x - ceiling(4*runif(100)) G - gl(4,1,100) ddisp(x) X-squared 6.235645 dvar G1 2 3 4 1 10 5 5 5 2 6 9 5 5 3 8 6 5 6 4 7 4 4 10 dvar G 1 2 3 4 1 8 6 5 6 2 8 6 5 6 3 8 6 5 6 4 8 6 5 6 dvar G1 2 3 4 1 1 0 0 -1 2 -1 1 0 -1 3 0 0 0 0 4 0 -1 0 1 Warning message: Chi-squared approximation may be incorrect in: chisq.test(G, dvar) As I need to apply this function to a large number of variables x it would be helpful if the function printed x rather than the formal argument dvar. I have a vague idea that things like deparse() and substitute() will come into the solution but I have not yet come up with the right incantation. Any help appreciated! Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] stepAIC for overdispersed Poisson
I am wondering if stepAIC in the MASS library may be used for model selection in an overdispersed Poisson situation. What I thought of doing was to get an estimate of the overdispersion parameter phi from fitting a model with all or most of the available predictors (we have a large number of observations so this should not be problematical) and then use stepAIC with scale = phi. Should this be OK? Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] A printing macro
Thanks for these suggestions, Professor Ripley. It's interesting that the function parameters in R are not truly dummy as they can effect the result of a function. Murray Prof Brian Ripley wrote: ddisp - function(dvar) { yn - substitute(dvar) csqt - eval.parent(substitute(chisq.test(G,dvar), list(dvar=yn))) } There are other ways, such as forming the cross-classification table, setting its dimnames and passing that to chisq.test. On Mon, 13 Nov 2006, Murray Jorgensen wrote: I am exploring the result of clustering a large multivariate data set into a number of groups, represented, say, by a factor G. I wrote a function to see how categorical variables vary between groups: ddisp - function(dvar) { + csqt - chisq.test(G,dvar) + print(csqt$statistic) + print(csqt$observed) + print(round(csqt$expected)) + round(csqt$residuals) + } x - ceiling(4*runif(100)) G - gl(4,1,100) ddisp(x) X-squared 6.235645 dvar G1 2 3 4 1 10 5 5 5 2 6 9 5 5 3 8 6 5 6 4 7 4 4 10 dvar G 1 2 3 4 1 8 6 5 6 2 8 6 5 6 3 8 6 5 6 4 8 6 5 6 dvar G1 2 3 4 1 1 0 0 -1 2 -1 1 0 -1 3 0 0 0 0 4 0 -1 0 1 Warning message: Chi-squared approximation may be incorrect in: chisq.test(G, dvar) As I need to apply this function to a large number of variables x it would be helpful if the function printed x rather than the formal argument dvar. I have a vague idea that things like deparse() and substitute() will come into the solution but I have not yet come up with the right incantation. Any help appreciated! Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Recoding categorical variables
I want to recode an integer-valued variable y so that its values become 1:length(y). I can do this using a loop but maybe someone can suggest code without a loop. My code is this: y - round(20*runif(25)) table(y) suy - sort(unique(y)) m - length(suy) z - y + max(suy) for(i in 1:m) z[y==suy[i]] - i rbind(y,z) (the recoded y is stored in z) Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Recoding categorical variables
Thanks John! Deepayan Sarkar also suggested this. I don't really expect to see any better solution. Murray John Fox wrote: Dear Murray, How about as.numeric(factor(y)) ? I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Murray Jorgensen Sent: Wednesday, October 25, 2006 7:13 PM To: r-help@stat.math.ethz.ch Subject: [R] Recoding categorical variables I want to recode an integer-valued variable y so that its values become 1:length(y). I can do this using a loop but maybe someone can suggest code without a loop. My code is this: y - round(20*runif(25)) table(y) suy - sort(unique(y)) m - length(suy) z - y + max(suy) for(i in 1:m) z[y==suy[i]] - i rbind(y,z) (the recoded y is stored in z) Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pooled Covariance Matrix
Thank you, Professor Ripley. Murray Jorgensen Prof Brian Ripley wrote: On Wed, 20 Sep 2006, Murray Jorgensen wrote: I am in a discriminant analysis situation with a frame containing several variables and a grouping factor, if you like: set.seed(200906) exampledf - as.data.frame(matrix(rnorm(50,5,2),nrow=10,ncol=5)) exampledf$Group - factor(rep(c(1,2,3),c(3,3,4))) exampledf I'm sure there must be a simple way to get the within group pooled covariance matrix but I haven't found it yet. There are two versions of this, weighted and unweighted, and the difference caused confusion in the early discriminant analysis literature. (See MASS4 p.333.) The weighted version is conventional. Suppose you have a matrix X and a grouping factor g. Then either of group.means - rowsum(X, g)/as.vector(table(g)) group.means - tapply(X, list(rep(g, ncol(X)), col(X)), mean) gives the group means, and var(X - group.means[g,]) seems to be what you want. I started thinking that one might begin by forming a frame with the same dimensions but containing the group means. But then I found a thread from two years back called Getting the groupmean for each person which seemed to imply that doing this was a bit subtle even for ncol=1. Hence I will risk a question to the list. That thread seems to be about efficiency for very large matrices on R of two years' ago. -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Pooled Covariance Matrix
I am in a discriminant analysis situation with a frame containing several variables and a grouping factor, if you like: set.seed(200906) exampledf - as.data.frame(matrix(rnorm(50,5,2),nrow=10,ncol=5)) exampledf$Group - factor(rep(c(1,2,3),c(3,3,4))) exampledf I'm sure there must be a simple way to get the within group pooled covariance matrix but I haven't found it yet. I started thinking that one might begin by forming a frame with the same dimensions but containing the group means. But then I found a thread from two years back called Getting the groupmean for each person which seemed to imply that doing this was a bit subtle even for ncol=1. Hence I will risk a question to the list. Thanks for any help, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] unquoting
I would like a function to strip quotes off character strings. I should work like this: A - matrix(1:6, nrow = 2, ncol=3) AF - as.data.frame(A) names(AF) - c(First,Second,Third) AF First Second Third 1 1 3 5 2 2 4 6 names(AF)[2] [1] Second attach(AF) unquote(names(AF)[2]) [1] 3 4 Of course what I actually get is Error: couldn't find function unquote The reason that I want to do this is that I have a frame with a rather large number of variables and I would like to loop over the names and print out various descriptive summaries of the variable's distribution. OK, OK, I could just go AF[,2] [1] 3 4 but once I thought of unquoting I have some sort of inner need to know how to do it! Cheers, Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unquoting
Thank you, Brian, Peter and Gabor Brian has what want. My heading was a bit misleading. I was looking for a function that would, in logicians' terms, convert 'mention' into 'use'. (This usually goes along with the story about the importance of knowing the difference between a lion and lion.) get() is just such a function. Murray Jorgensen Prof Brian Ripley wrote: ?get I really think this has nothing to do with `quoting', rather to do with evaluating variables from their names. At first I though you were looking for noquote(), which does unquote in the conventional sense. noquote(names(AF)[2]) [1] Second get(names(AF)[2]) [1] 3 4 On Sun, 20 Aug 2006, Murray Jorgensen wrote: I would like a function to strip quotes off character strings. I should work like this: A - matrix(1:6, nrow = 2, ncol=3) AF - as.data.frame(A) names(AF) - c(First,Second,Third) AF First Second Third 1 1 3 5 2 2 4 6 names(AF)[2] [1] Second attach(AF) unquote(names(AF)[2]) [1] 3 4 Of course what I actually get is Error: couldn't find function unquote The reason that I want to do this is that I have a frame with a rather large number of variables and I would like to loop over the names and print out various descriptive summaries of the variable's distribution. OK, OK, I could just go AF[,2] [1] 3 4 but once I thought of unquoting I have some sort of inner need to know how to do it! Cheers, Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fitting models in a loop
Thanks to all for their help. I am busy today but tomorrow I will have time to digest all the feedback and follow up if necessary Cheers, Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fitting models in a loop
Thanks to all who helped me with this problem, especially Bill Venables and Gabor Grothendieck. I hope one day to learn more about the advanced features of the language used by Bill. From a practical standpoint I think I will just avoid doing things like this in my teaching. It is hard enough just getting across the elementary ideas. Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fitting models in a loop
If I want to display a few polynomial regression fits I can do something like for (i in 1:6) { mod - lm(y ~ poly(x,i)) print(summary(mod)) } Suppose that I don't want to over-write the fitted model objects, though. How do I create a list of blank fitted model objects for later use in a loop? Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [Fwd: Re: Parameterization puzzle]
Bother! This cold has made me accident-prone. I meant to hit Reply-all. Clarification below. Original Message Subject: Re: [R] Parameterization puzzle Date: Fri, 21 Jul 2006 19:10:03 +1200 From: Murray Jorgensen [EMAIL PROTECTED] To: Prof Brian Ripley [EMAIL PROTECTED] References: [EMAIL PROTECTED] [EMAIL PROTECTED] Apologies for a non-selfcontained example. Here is what I should have sent: pyears - scan() 18793 52407 10673 43248 5710 28612 2585 12663 1462 5317 deaths - scan() 2 32 12 104 28 206 28 186 31 102 l - log(pyears) Smoke - gl(2,1,10,labels=c(No,Yes)) Age - gl(5,2,10,labels=c(35-44,45-54,55-64,65-74,75-84), ordered=TRUE) mod1.glm - glm(deaths ~ Age * Smoke + offset(l),family=poisson) summary(mod1.glm) age - as.numeric(Age) mod2.glm - aso1.glm - glm(deaths ~ poly(age,2) + Smoke + poly(age,1):Smoke + offset(l),family=poisson) summary(mod2.glm) Cheers, Murray Jorgensen Prof Brian Ripley wrote: R does not know that poly(age,2) and poly(age,1) are linearly dependent. (And indeed they only are for some functions 'poly'.) I cannot reproduce your example ('l' is missing), but perhaps glm(deaths ~ poly(age,2) + poly(age,1)*Smoke + offset(l), poisson) was your intention? On Fri, 21 Jul 2006, Murray Jorgensen wrote: Consider the following example (based on an example in Pat Altham's GLM notes) pyears - scan() 18793 52407 10673 43248 5710 28612 2585 12663 1462 5317 deaths - scan() 2 32 12 104 28 206 28 186 31 102 Smoke - gl(2,1,10,labels=c(No,Yes)) Age - gl(5,2,10,labels=c(35-44,45-54,55-64,65-74,75-84), ordered=TRUE) mod1.glm - glm(deaths ~ Age * Smoke + offset(l),family=poisson) summary(mod1.glm) age - as.numeric(Age) mod2.glm - aso1.glm - glm(deaths ~ poly(age,2) + Smoke + poly(age,1):Smoke + offset(l),family=poisson) summary(mod2.glm) The business part of the summary for the first model Estimate Std. Error z value Pr(|z|) (Intercept)-5.927060.16577 -35.754 2e-16 *** Age.L 4.064900.47414 8.573 2e-16 *** Age.Q -1.082930.41326 -2.620 0.008781 ** Age.C 0.241580.31756 0.761 0.446816 Age^4 0.042440.23061 0.184 0.853986 SmokeYes0.619160.17296 3.580 0.000344 *** Age.L:SmokeYes -1.312340.49267 -2.664 0.007729 ** Age.Q:SmokeYes 0.390430.43008 0.908 0.363976 Age.C:SmokeYes -0.295930.33309 -0.888 0.374298 Age^4:SmokeYes -0.036820.24432 -0.151 0.880218 inspires me to fit the second model that omits the nonsignificant terms, however this produces the summary Estimate Std. Error z value Pr(|z|) (Intercept)-5.8368 0.1213 -48.103 2e-16 *** poly(age, 2)1 3.9483 0.1755 22.497 2e-16 *** poly(age, 2)2 -1.0460 0.1448 -7.223 5.08e-13 *** SmokeYes0.5183 0.1262 4.106 4.02e-05 *** SmokeNo:poly(age, 1)1.3755 0.4340 3.169 0.00153 ** SmokeYes:poly(age, 1) NA NA NA NA Why do we get a SmokeNo:poly(age, 1) term? Can I re-express mod2.glm so that this term does not appear? Cheers, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Parameterization puzzle
Thanks to Brian and Berwin with there help. I faced a double problem in that I not only wanted to fit the model but I also wanted to do so in such a way that it would seem natural for a classroom example. The moral seems to be that I should do the orthogonal polynomial stuff outside the model formula. Here then is my solution: pyears - scan() 18793 52407 10673 43248 5710 28612 2585 12663 1462 5317 deaths - scan() 2 32 12 104 28 206 28 186 31 102 l - log(pyears) Smoke - gl(2,1,10,labels=c(No,Yes)) Age - gl(5,2,10,labels=c(35-44,45-54,55-64,65-74,75-84), ordered=TRUE) mod1.glm - glm(deaths ~ Age * Smoke + offset(l),family=poisson) summary(mod1.glm) age - as.numeric(Age) age1 - poly(age,2)[,1] age2 - poly(age,2)[,2] mod2.glm - aso1.glm - glm(deaths ~ age1 + age2 + Smoke + age1:Smoke + offset(l),family=poisson) summary(mod2.glm) The final summary then comes out looking like this: Estimate Std. Error z value Pr(|z|) (Intercept)-5.8368 0.1213 -48.103 2e-16 *** age15.3238 0.4129 12.893 2e-16 *** age2 -1.0460 0.1448 -7.223 5.08e-13 *** SmokeYes0.5183 0.1262 4.106 4.02e-05 *** age1:SmokeYes -1.3755 0.4340 -3.169 0.00153 ** which is just what I wanted. Cheers, Murray Jorgensen Prof Brian D Ripley wrote: On Fri, 21 Jul 2006, Berwin A Turlach wrote: G'day all, BDR == Prof Brian Ripley [EMAIL PROTECTED] writes: BDR R does not know that poly(age,2) and poly(age,1) are linearly BDR dependent. Indeed, I also thought that this is the reason of the problem. BDR (And indeed they only are for some functions 'poly'.) I am surprised about this. Should probably read the help page of 'poly' once more and more carefully. My point was perhaps simpler: if you or I or Murray had a function poly() in our workspace, that one would be found, I think. (I have not checked the ramifications of namespaces here, but I believe that would be the intention, that formulae are evaluated in their defining environment.) So omly when the model matrix is set up could the linear dependence be known (and there is nothing in the system poly() to tell model.matrix). What is sometimes called extrinsic aliasing is left to the fitting function, which seems to be to do a sensible job provided the main effect is in the model. Indeed, including interactions without main effects (as Murray did) often makes the results hard to interpret. BDR I cannot reproduce your example ('l' is missing), [...] My guess is that 'l' is 'pyears'. At least, I worked under that assumption. Apparently l = log(pyears), which was my uncertain guess. Interestingly, on my machine (using R 2.3.1, 2.2.1 and 2.1.1) I cannot fit any of the Poisson GLM that Murray tried. I always get the error message: Error: no valid set of coefficients has been found: please supply starting values Related to the offset, I believe. But I have to investigate this further. I can fit binomial models that give me similar answers. BDR [...] but perhaps BDR glm(deaths ~ poly(age,2) + poly(age,1)*Smoke + offset(l), BDR poisson) BDR was your intention? In this parameterisation a 'poly(age,1)' term will appear among the coefficients with an estimated value of NA since it is aliased with 'poly(age, 2)1'. So I don't believe that this was Murray's intention. The only suggestion I can come up with is: summary(glm(cbind(deaths, l-deaths) ~ age*Smoke+I(age^2), family=binomial)) [...] Coefficients: Estimate Std. Error z value Pr(|z|) (Intercept) -10.798950.45149 -23.918 2e-16 *** age2.378920.20877 11.395 2e-16 *** SmokeYes 1.445730.37347 3.871 0.000108 *** I(age^2) -0.197060.02749 -7.168 7.6e-13 *** age:SmokeYes -0.308500.09756 -3.162 0.001566 ** [...] Which doesn't use orthogonal polynomials anymore. But I don't see how you can fit the model that Murray want to fit using orthogonal polynomials given the way R's model language operates. So I guess the Poisson GLM that Murray wants to fit is: glm(deaths~ age*Smoke+I(age^2)+offset(l), family=poisson) Cheers, Berwin == Full address Berwin A Turlach Tel.: +61 (8) 6488 3338 (secr) School of Mathematics and Statistics+61 (8) 6488 3383 (self) The University of Western Australia FAX : +61 (8) 6488 1028 35 Stirling Highway Crawley WA 6009e-mail: [EMAIL PROTECTED] Australiahttp://www.maths.uwa.edu.au/~berwin -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862
[R] Parameterization puzzle
Consider the following example (based on an example in Pat Altham's GLM notes) pyears - scan() 18793 52407 10673 43248 5710 28612 2585 12663 1462 5317 deaths - scan() 2 32 12 104 28 206 28 186 31 102 Smoke - gl(2,1,10,labels=c(No,Yes)) Age - gl(5,2,10,labels=c(35-44,45-54,55-64,65-74,75-84), ordered=TRUE) mod1.glm - glm(deaths ~ Age * Smoke + offset(l),family=poisson) summary(mod1.glm) age - as.numeric(Age) mod2.glm - aso1.glm - glm(deaths ~ poly(age,2) + Smoke + poly(age,1):Smoke + offset(l),family=poisson) summary(mod2.glm) The business part of the summary for the first model Estimate Std. Error z value Pr(|z|) (Intercept)-5.927060.16577 -35.754 2e-16 *** Age.L 4.064900.47414 8.573 2e-16 *** Age.Q -1.082930.41326 -2.620 0.008781 ** Age.C 0.241580.31756 0.761 0.446816 Age^4 0.042440.23061 0.184 0.853986 SmokeYes0.619160.17296 3.580 0.000344 *** Age.L:SmokeYes -1.312340.49267 -2.664 0.007729 ** Age.Q:SmokeYes 0.390430.43008 0.908 0.363976 Age.C:SmokeYes -0.295930.33309 -0.888 0.374298 Age^4:SmokeYes -0.036820.24432 -0.151 0.880218 inspires me to fit the second model that omits the nonsignificant terms, however this produces the summary Estimate Std. Error z value Pr(|z|) (Intercept)-5.8368 0.1213 -48.103 2e-16 *** poly(age, 2)1 3.9483 0.1755 22.497 2e-16 *** poly(age, 2)2 -1.0460 0.1448 -7.223 5.08e-13 *** SmokeYes0.5183 0.1262 4.106 4.02e-05 *** SmokeNo:poly(age, 1)1.3755 0.4340 3.169 0.00153 ** SmokeYes:poly(age, 1) NA NA NA NA Why do we get a SmokeNo:poly(age, 1) term? Can I re-express mod2.glm so that this term does not appear? Cheers, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] princomp and eigen
Consider the following output [R2.2.0; Windows XP] set.seed(160706) X - matrix(rnorm(40),nrow=10,ncol=4) Xpc - princomp(X,cor=FALSE) summary(Xpc,loadings=TRUE, cutoff=0) Importance of components: Comp.1Comp.2Comp.3 Comp.4 Standard deviation 1.2268300 0.9690865 0.7918504 0.55295970 Proportion of Variance 0.4456907 0.2780929 0.1856740 0.09054235 Cumulative Proportion 0.4456907 0.7237836 0.9094576 1. Loadings: Comp.1 Comp.2 Comp.3 Comp.4 [1,] -0.405 -0.624 0.466 0.479 [2,] -0.199 -0.636 -0.346 -0.660 [3,] 0.884 -0.443 0.023 0.148 [4,] 0.122 0.099 0.814 -0.559 eigen(var(X)) $values [1] 1.6723465 1.0434763 0.6966967 0.3397382 $vectors [,1] [,2][,3] [,4] [1,] -0.4048158 0.6240510 0.46563382 0.4794473 [2,] -0.1994853 0.6361009 -0.34634256 -0.6600213 [3,] 0.8839775 0.4429553 0.02261302 0.1478618 [4,] 0.1221215 -0.0986234 0.81407655 -0.5591414 I would have expected the princomp component standard deviations to be the square roots of the eigen() $values and they clearly are not. Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Extracting Variance components
I can ask my question using and example from Chapter 1 of Pinheiro Bates. # 1.4 An Analysis of Covariance Model OrthoFem - Orthodont[ Orthodont$Sex == Female, ] fm1OrthF - + lme( distance ~ age, data = OrthoFem, random = ~ 1 | Subject ) summary( fm1OrthF ) Linear mixed-effects model fit by REML Data: OrthoFem AIC BIClogLik 149.2183 156.169 -70.60916 Random effects: Formula: ~1 | Subject (Intercept) Residual StdDev: 2.06847 0.7800331 [...etc...] I can extract the estimate of the variance component \sigma (0.7800331) via sigma - fm1OrthF$sigma How do I extract the other component \sigma_b (2.06847) ? Cheers, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Frequencies into Poisson responses
Actually I had just meant that because I had some factors I had a conceptual table. Using ftable() was just my way of getting the factors into data frame form. But thank you for showing me that as.data.frame() does exactly what I want. Murray Jorgensen Prof Brian Ripley wrote: In fact you have an ftable, not a multi-way table, but as.data.frame works for multi-way tables, and here as.data.frame(as.table(ftabc)) works: as.data.frame(as.table(ftabc)) A B CC Freq 1 1 1 1 20 2 2 1 1 38 3 3 1 1 20 4 1 2 1 22 5 2 2 1 25 6 3 2 1 23 ... It would be more usual to start with a table and use ftable to display it. It might just be worth adding as.data.frame.ftable. On Thu, 18 May 2006, Murray Jorgensen wrote: Suppose that one has several factors, all of the same length. These define a multi-way contingency table. Now suppose one wants to fit a Poisson GLM a.k.a. log-linear model to the frequencies in this table. How may we make the table into a data frame suitable for glm() ? I have an answer to my own question below, but surely more elegant solutions exist? set.seed(060518) na - nb - 3 nc - 4 n - na*nb*nc a - round(runif(1000,0.5,na+0.5)) b - round(runif(1000,0.5,nb+0.5)) cc - round(runif(1000,0.5,nc+0.5)) A - factor(a) B - factor(b) CC - factor(cc) ftabc - ftable(A,B,CC) freqs - as.vector(ftabc) A1 - gl(na,nb,n) B1 - gl(nb,1,n) C1 - gl(nc,na*nb,n) required - data.frame(A1,B1,C1,freqs) required Cheers, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Frequencies into Poisson responses
Suppose that one has several factors, all of the same length. These define a multi-way contingency table. Now suppose one wants to fit a Poisson GLM a.k.a. log-linear model to the frequencies in this table. How may we make the table into a data frame suitable for glm() ? I have an answer to my own question below, but surely more elegant solutions exist? set.seed(060518) na - nb - 3 nc - 4 n - na*nb*nc a - round(runif(1000,0.5,na+0.5)) b - round(runif(1000,0.5,nb+0.5)) cc - round(runif(1000,0.5,nc+0.5)) A - factor(a) B - factor(b) CC - factor(cc) ftabc - ftable(A,B,CC) freqs - as.vector(ftabc) A1 - gl(na,nb,n) B1 - gl(nb,1,n) C1 - gl(nc,na*nb,n) required - data.frame(A1,B1,C1,freqs) required Cheers, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Pasting data into scan()
I think that Mozilla Thunderbird replaces tabs with blanks routinely. I was actually pasting from Crimson Editor but results are identical with Notepad and MS Word. However there is no problem when pasting from WinEdt: strength - scan() 1: 0.023 0.032 0.054 0.069 0.081 0.094 7: 0.105 0.127 0.148 0.169 0.188 0.216 13: 0.255 0.277 0.311 0.361 0.376 0.395 19: 0.432 0.463 0.481 0.519 0.529 0.567 25: 0.642 0.674 0.752 0.823 0.887 0.926 31: Read 30 items and scanning the file directly works with or without ' sep=\t ': strength - scan(C:\\Files\\Teaching\\STAT321\\tensile.dat) Read 30 items strength - scan(C:\\Files\\Teaching\\STAT321\\tensile.dat,sep=\t) Read 30 items I will send the file to BR (but not to the list!) Murray Jorgensen Prof Brian Ripley wrote: No tabs are being echoed, so your cut/paste is removing the tabs. What are you pasting from? (Not an R problem, of course.) On Tue, 2 May 2006, Murray Jorgensen wrote: The file TENSILE.DAT from the Hand et al Handbook of Small Data Sets looks like this: 0.0230.0320.0540.0690.0810.094 0.1050.1270.1480.1690.1880.216 0.2550.2770.3110.3610.3760.395 0.4320.4630.4810.5190.5290.567 0.6420.6740.7520.8230.8870.926 except that my mail client has replaced the tab separators by blanks. If I paste this data into R 2.2.1 what I get is strength - scan() 1: 0.0230.0320.0540.0690.0810.094 1: 0.1050.1270.1480.1690.1880.216 Error in scan() : scan() expected 'a real', got '0.0230.0320.0540.0690.0810.094' 0.2550.2770.3110.3610.3760.395 Error: syntax error in 0.2550.2770 0.4320.4630.4810.5190.5290.567 Error: syntax error in 0.4320.4630 0.6420.6740.7520.8230.8870.926 Error: syntax error in 0.6420.6740 Aha! I thought, what I need is scan(sep = \t) but this generates the same error messages. Help! Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Pasting data into scan()
The file TENSILE.DAT from the Hand et al Handbook of Small Data Sets looks like this: 0.023 0.032 0.054 0.069 0.081 0.094 0.105 0.127 0.148 0.169 0.188 0.216 0.255 0.277 0.311 0.361 0.376 0.395 0.432 0.463 0.481 0.519 0.529 0.567 0.642 0.674 0.752 0.823 0.887 0.926 except that my mail client has replaced the tab separators by blanks. If I paste this data into R 2.2.1 what I get is strength - scan() 1: 0.0230.0320.0540.0690.0810.094 1: 0.1050.1270.1480.1690.1880.216 Error in scan() : scan() expected 'a real', got '0.0230.0320.0540.0690.0810.094' 0.2550.2770.3110.3610.3760.395 Error: syntax error in 0.2550.2770 0.4320.4630.4810.5190.5290.567 Error: syntax error in 0.4320.4630 0.6420.6740.7520.8230.8870.926 Error: syntax error in 0.6420.6740 Aha! I thought, what I need is scan(sep = \t) but this generates the same error messages. Help! Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Pasting data into scan() - oops!
I forgot to mention that I am using Windows XP. Original Message Subject: Pasting data into scan() Date: Tue, 02 May 2006 11:55:03 +1200 From: Murray Jorgensen [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch The file TENSILE.DAT from the Hand et al Handbook of Small Data Sets looks like this: [...] -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] vector-factor operation
Thanks to Berwin Turlach and Petr Pikal for tapply(vec, Fac, mean)[Fac] and Gabor Grothendieck Thomas Lumley for ave(vec,Fac) . Looking at the code for tapply and ave I guess that the latter is to be preferred. Murray Jorgensen Gabor Grothendieck wrote: Look at ?ave ave(vec, Fac) ave(vec, Fac, FUN = mean) # same ave(vec, Fac, FUN = sd) On 4/14/06, Murray Jorgensen [EMAIL PROTECTED] wrote: I found myself wanting to average a vector [vec] within each level of a factor [Fac], returning a vector of the same length as vec. After a while I realised that lm1 - lm(vec ~ Fac) fitted(lm1) did what I want. But there must be another way to do this, and it would be good to be able to apply other functions than mean() in this way. Cheers, Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] vector-factor operation
I found myself wanting to average a vector [vec] within each level of a factor [Fac], returning a vector of the same length as vec. After a while I realised that lm1 - lm(vec ~ Fac) fitted(lm1) did what I want. But there must be another way to do this, and it would be good to be able to apply other functions than mean() in this way. Cheers, Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Nested error structure in nonlinear model
I am trying to fit a nonlinear regression model to data. There are several predictor variables and 8 parameters. I will write the model as Y ~ Yhat(theta1,...,theta8) OK, I can do this using nls() - but only just as there are not as many observations as might be desired. Now the problem is that we have a factor Site and I want to include a corresponding error component. I tried something like (excuse the doctored R output): mod0.nlme - nlme(Y ~ Yhat(theta1,...,theta8) + site.err , +start = c(mod0.st,0), +groups = ~ Site, + fixed = theta1 + ... + theta8 ~ 1, + random = site.err ~ 1) Error in nlme.formula(Y ~ Yhat(theta1,...,theta8 : starting values for the fixed component are not the correct length and then mod0.nlme - nlme(Y ~ Yhat(theta1,...,theta8) + site.err , +start = mod0.st, +groups = ~ Site, + fixed = theta1 + ... + theta8 ~ 1, + random = site.err ~ 1) Error: Singularity in backsolve at level 0, block 1 The straightforward way would be to go mod0.nlme - nlme(Y ~ Yhat(theta1,...,theta8) , +start = mod0.st, +groups = ~ Site, + fixed = theta1 + ... + theta8 ~ 1) making all 8 parameters random, but there is little hope of getting that to work. (I did try it and it does not crash straight away but settles down to run forever.) Any comments welcome. I regret that I cannot go into much detail about the actual problem. -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] nonlinear model: pseudo-design matrix
Given a nonlinear model formula and a set of values for all the parameters defining a point in parameter space, is there a neat way to extract the pseudodesign matrix of the model at the point? That is the matrix of partial derivatives of the fitted values w.r.t. the parameters evaluated at the point. (I have figured out how to extract the gradient information from an nls fitted model using the nlsModel part, but I wish to implement a score test, so I need to be able to extract the information at points other than the mle.) Thanks, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Huber location estimate
Prof Brian Ripley wrote: On Thu, 22 Dec 2005, Murray Jorgensen wrote: We have a choice when calculating the Huber location estimate: set.seed(221205) y - 7 + 3*rt(30,1) That's Cauchy, BTW, a very extreme case. Sure, the sort of situation where one might want a robust estimator. [...] Note that the huber() scale estimate is the initial MAD, whereas rlm iterates. Because during iteration the 'center' for MAD is known to be zero, the results differ. For symmetric distributions there is little difference, but your sample is not close to symmetric. [Blush] yes I knew that and somehow I forgot. But leave rlm() alone for a while and do IRLS with fixed scale: th - median(y) s - mad(y) # paste this in a few times: w - ifelse((y-th 1.345*s y-th-1.345*s), 1, 1.345*s/abs(y-th)) th - weighted.mean(y,w) th We converge to th [1] 5.9203 close to the answer given by rlm() different from huber(y)$mu [1] 5.9117 So the variable scale does not account for the difference. Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Huber location estimate
D'oh! Apologies for wasting everybody's time! Murray Martin Maechler wrote: Murray == Murray Jorgensen [EMAIL PROTECTED] on Thu, 22 Dec 2005 22:13:45 +1300 writes: Murray Prof Brian Ripley wrote: On Thu, 22 Dec 2005, Murray Jorgensen wrote: We have a choice when calculating the Huber location estimate: set.seed(221205) y - 7 + 3*rt(30,1) That's Cauchy, BTW, a very extreme case. Murray Sure, the sort of situation where one might want a robust estimator. Murray [...] Note that the huber() scale estimate is the initial MAD, whereas rlm iterates. Because during iteration the 'center' for MAD is known to be zero, the results differ. For symmetric distributions there is little difference, but your sample is not close to symmetric. Murray [Blush] yes I knew that and somehow I forgot. But leave rlm() alone for Murray a while and do IRLS with fixed scale: Murray th - median(y) Murray s - mad(y) Murray # paste this in a few times: Murray w - ifelse((y-th 1.345*s y-th-1.345*s), 1, 1.345*s/abs(y-th)) Murray th - weighted.mean(y,w) Murray th Murray We converge to th Murray [1] 5.9203 Murray close to the answer given by rlm() different from huber(y)$mu Murray [1] 5.9117 Murray So the variable scale does not account for the difference. No, the main difference is the different default: huber() has k=1.5 and rlm() has k=1.345 : Try this set.seed(221205) y - 7 + 3*rt(30,1) str(huber(y, k = 1.345), digits = 5) ## List of 2 ## $ mu: num 5.9203 ## $ s : num 4.0967 str(rlm(y ~ 1)[c(coefficients, s)], digits = 5) # ## (edited to) ## $ coefficients: num 5.9204 ## $ s : num 3.7463 which gives 'mu' very close, even for the iterated vs. non-iterated scales. Martin Maechler, ETH Zurich -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Huber location estimate
We have a choice when calculating the Huber location estimate: set.seed(221205) y - 7 + 3*rt(30,1) library(MASS) huber(y)$mu [1] 5.9117 coefficients(rlm(y~1)) (Intercept) 5.9204 I was surprised to get two different results. The function huber() works directly with the definition whereas rlm() uses iteratively reweighted least squares. My surprise is because I vaguely remember @ARTICLE{hw77, author = {Holland, P. W. and Welsch, R. E.}, title = {Robust Regression using Iteratively Reweighted Least-Squares}, journal = {Communications in Statistics: Theory and Methods}, volume = {A6(9)}, number = {}, pages = {813-827}, year= {1977} } as saying that the two methods were equivalent. Obviously they aren't quite. Comments welcome. Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Simple introduction to Bayesian Statistics
Joshua N Pritikin wrote: I made it through the first chapter (Background) of Bayesian Data Analysis by Gelman, et al. Conceptually, the Bayesian approach appeals to me and my curiosity is piqued. However, the next chapter was much too terse. The math is daunting. Where can I find a gentle introduction? Or which math book(s) do I need to read first? On page 27, there is mention of introductory books including Bayesian Methods by Jeff Gill (2002). Just for fun, I took a look at this book to see whether I could get through it. Chapter 1 was inviting, but again, the math got too sophisticated starting from chapter 2. Try Introduction to Bayesian Statistics William M. Bolstad ISBN: 0-471-27020-2 by my colleague Bill Bolstad. This is written for a course targeting bright first or second year students and assumes very little background. Cheers, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Extracting variance components in lme
Consider the output for the inroductory Rail example in Mixed Effects Models in S and S-PLUS by Pinheiro and Bates: summary(fm1Rail.lme) Linear mixed-effects model fit by REML Data: Rail AIC BIC logLik 128.177 130.6766 -61.0885 Random effects: Formula: ~1 | Rail (Intercept) Residual StdDev:24.80547 4.020779 Fixed effects: travel ~ 1 Value Std.Error DF t-value p-value (Intercept) 66.5 10.17104 12 6.538173 0 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -1.61882658 -0.28217671 0.03569328 0.21955784 1.61437744 Number of Observations: 18 Number of Groups: 6 I want to extract the variance components sigma = 4.020779 (the within component) and sigma_b = 24.80547 (the between component). I can get sigma easily: sigma - fm1Rail.lme$sigma but how can I get sigma_b ? Cheers, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Extracting Variance Components fro lme
It seems that what I need to get the within group component is as.numeric(VarCorr(fm1Rail.lme)[2,2]) thanks to Bert Gunter and Peter Alspach. Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Extracting variance components in lme
Woops! I should have written: as.numeric(VarCorr(fm1Rail.lme)[1,2]) for the within component. -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Downloading zip files
I have not had a great amount of success installing/updating packages from the Packages menu of Rgui under Windows XL. (Except for installing from loacal zip files.) But I am not asking for help in using these facilities because I prefer to keep a folder of package zip files. On the other hand I do find it tedious having to right-click Save link as on every individual file from CRAN. I'm sure someone knows a faster way to do it. Cheers, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Error in integrate
I'm using R 2.0.1 under Windows XP. I get the following message when I run the code listed below. I don't seem to have any problems using the function slice outside integrate. Error in integrate(slice, 0, Inf, , , , , , , a, b) * delta : non-numeric argument to binary operator [ By the way, I'm trying to evaluate a two-dimensional integral by slicing it up into one-dimensional bits which I will loop over to evaluate.] Here's the code: mu - 0.3 sig2 - 0.01 alpha - 20 beta - 15.75 rho - 0.1 k - 1/(rho^2.5*gamma(rho)^2*sqrt(2*pi*sig2)) slice - function(w,a,b) { g - w^(1/rho) g1 - w1^(1/rho) h - g1^a*g^b E - -(y-rho*mu -g1/alpha + g/beta)^2/(2*sig2*rho) k*h*exp(E- g1 - g) } hi - 10 delta - 0.05 grid - seq(delta/2,hi,delta) y - -0.3 a - 0 b - 0 m - length(grid) A - rep(0,m) j - 0 for (w1 in grid) { j - j+1 A[j] - integrate(slice,0,Inf,,,a,b)*delta cat(A[j],\n) } -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] R for teaching multivariate statistics (Summary)
Greetings all I promised a summary of the responses that I got to my question: Next year I will be teaching a third year course in applied statistics about 1/3 of which is multivariate statistics. I would be interested in hearing experiences from those who have taught multivariate statistics using R. Especially I am interested in the textbook that you used or recommended. There were not many replies, so my task is easy! Peter Dunn mentioned the new book by Brian Everitt, An R and S-Plus Companion to Multivariate Analysis which he had yet to see. Someone else has it out on loan here so I have not seen it either. Peter has been using Bryan Manly's book but finds that it is expensive and describes different algorithms to those used in R. Brian Ripley drew attention to Chapters 11 and 12 of MASS. Pierre Bady pointed out the material on the website Enseignements de Statistique en Biologie http://pbil.univ-lyon1.fr/R/enseignement.html by A.B. Dufour, D. Chessel J.R. Lobry. I hope to explore this some more when I get back to higher bandwidth. Pat Altham has a wealth of material on her web site, especially http://www.statslab.cam.ac.uk/~pat/misc.ps and http://www.statslab.cam.ac.uk/~pat/AppMultNotes.ps.gz Many thanks to these respondants for their help. Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] R for teaching multivariate statistics
Greetings all - Next year I will be teaching a third year course in applied statistics about 1/3 of which is multivariate statistics. I would be interested in hearing experiences from those who have taught multivariate statistics using R. Especially I am interested in the textbook that you used or recommended. If you reply directly to me I will summarize to reduce list traffic. Regards, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Coarsening Factors
It is not uncommon to want to coarsen a factor by grouping levels together. I have found one way to do this in R: sites [1] F A A D A A B F C F A D E E D C F A E D F C E D E F F D B C Levels: A B C D E F regions - list(I = c(A,B,C), II = D, III = c(E,F)) library(Epi) region - Relevel(sites,regions) region [1] III I I II I I I III I III I II III III II I III I III [20] II III I III II III III III II I I Levels: I II III However this seems like using a sledgehammer to crack a nut. Can someone suggest a simpler way to do this task? Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Sorting Text Frames
Uwe Ligges wrote: I guess there is just too much space or some special characters in your variables that cause problems when printing ... Hence you have to debug your data yourself. Uwe Ligges However the problem persists when I don't try to print the fram thanks to a file. thanks[tord[1:74],] gives me row numbers, V1 and V2 to the console, but thanks[tord[1:75],] gives only the row numbes and V1. It is not a special problem with row tord[75]: thanks[tord[75],] V1 V2 1192 Votic (Russia) [may God give you health]Antagoo Jumal tervüt teilee Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Sorting Text Frames
[Using 2.0.1 under Windows XP] There are a few pages on the internet that list equivalents of thank you in many languages. I downloaded one from a Google search and I thought that it would be interesting and a good R exercise to sort the file into the order of the expressions, rather than the languages. I tidied up the web page and got it into the format that it was nearly in: Language Name in columns 1-43, the expression in the remaining columns. Then I read it in: thanks - read.fwf(C:\\Files\\Reading\\thankyou.txt, c(43,37)) thanks[1:4,] V1V2 1 Abenaki (Maine USA, Montreal Canada)Wliwni ni 2 Abenaki (Maine USA, Montreal Canada) Wliwni 3 Abenaki (Maine USA, Montreal Canada) Oliwni 4 Achà (Baja Verapaz Guatemala) Mantiox chawe dim(thanks) [1] 12542 Now I tried sorting the frame into the order of the second column: tord - order(thanks$V2) sink(C:\\Files\\Reading\\thanks.txt) thanks[tord[1:74],] sink() This gives more or less the expected output, the file thanks.txt beginning V1 V2 145 Cahuila (United States)'\301cha-ma 862 Paipai (Mexico, USA)'Ara'ya:ikm 863 Paipai (Mexico, USA)'Ara'yai:km 864 Paipai (Mexico, USA) 'Ara'ye:km 311 Eyak (Alaska)'Awa'ahdah [you may get a bit of wrapping there!] However I don't really want just 74 lines, I would like the whole file. But if I get rid of the [1:74] or replace 74 with any larger number I get output like this, with no second column: V1 145 Cahuila (United States) 862 Paipai (Mexico, USA) 863 Paipai (Mexico, USA) 864 Paipai (Mexico, USA) 311 Eyak (Alaska) Does anyone know what is going on? Tusen tak in advance, in fact 1254 tak in advance! Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] nlme SASmixed in 2.0.1
I assigned a class the first problem in Pinheiro Bates, which uses the data set PBIB from the SASmixed package. I have recently downloaded 2.0.1 and its associated packages. On trying library(SASmixed) data(PBIB) library(nlme) plot(PBIB) I get a warning message Warning message: replacing previous import: coef in: namespaceImportFrom(self, asNamespace(ns)) after library(nlme) and a pairs type plot of PBIB. Pressing on I get: lme1 - lme(response ~ Treatment, data=PBIB, random =~1| Block) summary(lme1) Error: No slot of name rep for this object of class lme Error in deviance([EMAIL PROTECTED], REML = REML) : Unable to find the argument object in selecting a method for function deviance Everything works fine under 1.8.1 and plot(PBIB) is of trellis style, which is what I think the authors intend. Cheers, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wkHome +64 7 825 0441Mobile 021 1395 862 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Order of messgaes/ missing messages
The TCP protocol sends a message as a stream of packets. When packets are lost over a particular link in the path from the sending machine at ETHZ to a recipient parts of the message will be re-transmitted. A message sent after another message can arrive before it if it experiences less packet loss on its path through the network. It's statistical, you see! Murray Jorgensen hadley wickham wrote: Note that if this bothers you then you could try a web-based email systems (e.g. yahoo, hotmail, etc.) to receive your r-help messages since the path to any of them would be independent of your particular location on the net. I use gmail and still often recieve replies before the original message. Why should using webmail make a difference? Surely it's the r mailing list server that does all the sending (apart from specific individual replies)? And the listserve must recieve the reply after the original, so it seems to me that it must be the time it takes to get between the server and you that is varying. Or is the listserve sitting on some emails before sending them? One test would be to see if we recieve emails in the same order. Hadley __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Association between discrete and continuous variable
I'm wondering if mutual information al la Cover Thomas (1991, Ch 2) is not the killer association measure for all types of random variables? Murray Jorgensen PS Yes, this is probably OT! Richard A. O'Keefe wrote: What's the reommended way, in R, to determine the strength of association between a discrete variable and a continuous variable? Yes, I have read the manuals, trawled the archives, c. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Nested source()s
I had an error message while running a macro from Yudi Pawitan's web site: source(ex2-13.r) Error in parse(file, n, text, prompt) : syntax error on line 2 Inspecting ex2-13.r I found that the error was generated by another source() command. Clearly R does not like nested source()s, which is fair enough when you think about it. Still it's something that you might want to do. Does anyone know how to get achieve the substance of what nested source() commands would give you? Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] text editor for R
I tried R-WinEdt a few years ago, but as I remember it interfered with my usual use of WinEdt which is as a front end to MiKTeX. Is there a way to use WinEdt both ways? Murray Jorgensen Marc Schwartz wrote: On Wed, 2004-07-07 at 17:47, Yi-Xiong Sean Zhou wrote: Hi, What is the best text editor for programming in R? I am using JEdit as the text editor, however, it does not have anything specific for R. It will be nice to have a developing environment where the keywords are highlighted, plus some other debugging functions. Yi-Xiong More information is available at: http://www.sciviews.org/_rgui/ Your e-mail headers suggest that you are using Windows. Thus, perhaps the two best choices (subject to challenge by others) would be: 1. R-WinEdt (Under IDE/Script Editors) 2. ESS for Windows The above two tools provide for a wide variety of functionality beyond syntax highlighting. There is a syntax highlighting file listed at the above site for jEdit. HTH, Marc Schwartz __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] text editor for R
Oh, yes. I think I did do something like this, but for some reason the two incarnations bothered me. Murray Jorgensen David Scott wrote: On Thu, 8 Jul 2004, Murray Jorgensen wrote: I tried R-WinEdt a few years ago, but as I remember it interfered with my usual use of WinEdt which is as a front end to MiKTeX. Is there a way to use WinEdt both ways? Murray Jorgensen This problem annoyed me for a while too. My solution (which is not perhaps ideal) is this. You want two different incarnations of WinEdt, one for TeX, the other for R. On the desktop I have a shortcut to WinEdt which is the one for TeX stuff. I open the other one with R syntax highlighting etc by starting R and using library(RWinEdt). To do this you have to install the RWinEdt package and SWinRegistry. This is all well explained in the ReadMe.txt for RWinEdt. I think with the right additions to the Target field in a shortcut to WinEdt you can call up the incarnation of WinEdt that is suitable for R. I haven't done that. You would then have two shortcuts to WinEdt, one for your TeX stuff, one for R. Uwe Ligges is the guru for this though. David Scott David Scott Department of Statistics, Tamaki Campus The University of Auckland, PB 92019 Auckland NEW ZEALAND Phone: +64 9 373 7599 ext 86830 Fax: +64 9 373 7000 Email: [EMAIL PROTECTED] Graduate Officer, Department of Statistics __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: Summary [R] R 1.9.0, special characters in variable names.
This is not what I would call a summary. A summary should: 1. State the original question. 2. Give a pre'cis of the responses. Murray Jorgensen Sixten Borg wrote: Summary: The locale setting in the operating system seems to be involved in what confused me a little bit. Thank you all for your help, especially the suggested work-around data.frame(..., check.names=F) which works very well. A mystery still to be solved is why two versions of R, running on the same machine on the same time, behaves differently. Please do not respond to this on the list. I very much welcome you not to respond at all. Sixten. Prof Brian Ripley [EMAIL PROTECTED] 2004-06-24 10:20:55 Can we stop blaming R for things which are not its fault, especially as that has already been pointed out twice this morning? __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] MCLUST Covariance Parameterization.
I'm not going to directly answer your question but it seems to me that you want to fit a completely unconstrained Gaussian mixture model. This may not be the best thing to do as without constraints on the Sigma_k the model may have more parameters than it is reasonable to try to estimate with the available data. A preliminary fit of a mixture model, even if it does not have quite the covariance structure that you want gives a clustering of the data that you can use to explore the correlation structure of the components empirically: then you can revise the covariance structure and re-fit. I think that this sort of approach is likely to be more effective than fitting the fully unstructured model directly. Murray Jorgensen [EMAIL PROTECTED] wrote: Hello all (especially MCLUS users). I'm trying to make use of the MCLUST package by C. Fraley and A. Raftery. My problem is trying to figure out how the (model) identifier (e.g, EII, VII, VVI, etc.) relates to the covariance matrix. The parameterization of the covariance matrix makes use of the method of decomposition in Banfield and Rraftery (1993) and Fraley and Raftery (2002) where Sigma_k = lambda_k*D_k*A_k*D_k^' where Sigma_k is the covariance matrix for the kth (k=1,...,G), lambda_k is the kth groups constant of proportionality, D_k is the orthogonal matrix of eigenvectors for the kth group, and A_k is a diagonal matrix whose elements are proportional to the eigenvalues. The parameterization of the covariance matrix Sigma_k depends on the distribution (whether spherical, diagonal, or ellipsoidal), volume (equal or variable), shape (equal or variable), and orientation (coordinate axes, equal, or variable). The distribution, volume, shape and orientation are a function of lambda_k, D_k, and A_k. Thus, depending on whether or not these values are constant across class defines Sigma_k. What I'm trying to figure out is how the distribution, volume, shape, and orientation relate to Sigma_k. As far as the parameterization of Sigma_k, what do distribution, volume, shape, and orientation even mean. Does a table exist of how these values relate to the Sigma_k? I know a table exists in the MCLUST software manual on the MCLUST website, but this table doesn't relate the values of distribution, volume, shape, and orientation to Sigma_k directly, only to how Sigma_k would be parameterized (this isnt helpful unless you know what distribution, volume, shape and orientation mean in terms of the within class covariance matrix) So, just what do the distribution, volume, shape, and orientation mean in the context of Sigma_k? What do the distribution, volume, shape, and orientation mean for a Sigma_k=sigma^2*I where I is a p by p covariance matrix, sigma^2 is the constant variance and Sigma_1=Sigma_2==Sigma_G. What about when a Sigma_k=sigma^2_k*I, or when Sigma_1=Sigma_2==Sigma_G in situations where each element of the (constant across class) covariance matrix is different? I would say I have a pretty good understanding of finite mixture modeling, but nothing I've read (expect the works cited in the 2002 JASA paper) talks about parameterizing the Sigma_k matrix in such a way. It would be nice to specify a structure directly for Sigma_k (as most books talk about). Any help on this issue would be greatly appreciated. Thanks, Ken. __ [[alternative HTML version deleted]] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Problem with logtrans, from library MASS
Thanks for the help. The variables do all come from a frame but with various transformations and manipulations. I prefer not to stick them all into a new frame just to call one function. Thanks again, Murray Jorgensen Prof Brian Ripley wrote: On 14 May 2004, Peter Dalgaard wrote: Murray Jorgensen [EMAIL PROTECTED] writes: Greetings all! This problem occurs using R 1.8.1 on Windows XP. I downloaded the binaries for R and all packages, including the VR bundle, in December 2003. The data consists of NZ$ prices and attributes for 643 cars. summary(price) Min. 1st Qu. MedianMean 3rd Qu.Max.NA's 14290 35800 48990 65400 79000 285000 2 library(MASS) boxcox(price ~ doors+CC+KW+KG+LENGTH, lambda=seq(-0.1,0.5,length=30)) boxcox(price ~ doors+CC+KW+KG+LENGTH, lambda=seq(-0.5,0.1,length=30)) This all works fine, I get a fairly sharp peak near lambda=-0.25, but then: logtrans(price ~ doors+CC+KW+KG+LENGTH, alpha=seq(-1,0,length=30)) Error in eval(expr, envir, enclos) : Object price not found The issue is that logtrans has a 'data' argument that defaults to NULL (the author(s) may have a reason for having this inconsistent with boxplot?). Adding data=.GlobalEnv seems to work. It's an R/S difference. data=NULL or list() in S does give you the normal search path. I guess this got noticed for boxplot and not logtrans. Using a data argument would in any case be a very good idea. -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Problem with logtrans, from library MASS
Greetings all! This problem occurs using R 1.8.1 on Windows XP. I downloaded the binaries for R and all packages, including the VR bundle, in December 2003. The data consists of NZ$ prices and attributes for 643 cars. summary(price) Min. 1st Qu. MedianMean 3rd Qu.Max.NA's 14290 35800 48990 65400 79000 285000 2 library(MASS) boxcox(price ~ doors+CC+KW+KG+LENGTH, lambda=seq(-0.1,0.5,length=30)) boxcox(price ~ doors+CC+KW+KG+LENGTH, lambda=seq(-0.5,0.1,length=30)) This all works fine, I get a fairly sharp peak near lambda=-0.25, but then: logtrans(price ~ doors+CC+KW+KG+LENGTH, alpha=seq(-1,0,length=30)) Error in eval(expr, envir, enclos) : Object price not found Comments welcome, Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Re: Mozilla Misbehavior
It was a proxy settings problem. All fixed now. Sorry to disturb the list. Murray Jorgensen [EMAIL PROTECTED] wrote: Hi Murray: Perhaps the errors are thrown by Mozilla because it encounters an encrypted/secure page and does not know what to do (..:)..)? Would it help if you went to Edit Preferences... Privacy and Security select SSL or Certificates drop down list and then once in SSL, or once in Certificates tab, make the necessary changes so that Mozilla can at least show you the certificates which you can then accept or reject before proceeding. And indeed, why are mail archives on secure pages? HTH, Arin Message: 9 Date: Mon, 10 May 2004 09:32:48 +1200 From: Murray Jorgensen [EMAIL PROTECTED] Subject: Re: [R] Modalwert To: [EMAIL PROTECTED] Message-ID: [EMAIL PROTECTED] Content-Type: text/plain; charset=us-ascii; format=flowed Off-topic, I know, but when I try to follow secure links like this in Mozilla, I always get a message The connection was refused when trying to contact www.stat.math.ethz.ch and I am forced to use Internet Explorer instead. This is under Windows XP. Is this because www.stat.math.ethz.ch is using some non-standard Microsoft extensions of html? Or is it because my Mozilla 1.6 browser is wrongly configured? Why are the mail archives on secure pages anyway? Murray Jorgensen __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Modalwert
Off-topic, I know, but when I try to follow secure links like this in Mozilla, I always get a message The connection was refused when trying to contact www.stat.math.ethz.ch and I am forced to use Internet Explorer instead. This is under Windows XP. Is this because www.stat.math.ethz.ch is using some non-standard Microsoft extensions of html? Or is it because my Mozilla 1.6 browser is wrongly configured? Why are the mail archives on secure pages anyway? Murray Jorgensen Thomas Petzoldt wrote: Sonja Dornieden wrote: Hai - kann mir jemand sagen, wie ich den Modalwert in R berechne?! IRgendwie finde ich den Befehl nicht greetz und herzlichen Dank Sonja Hi, there was already a thread in this list about this question with subject Computing the mode on 24.02.2004. You will find several answers in the R-Help archive: https://www.stat.math.ethz.ch/pipermail/r-help/2004-February/ Thomas P. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Changing Gui preferences
I want to modify my Gui preferences to get rid of the MDI toolbar and status bar. [ Under Windows XP and R 1.8.1 and 1.9.0.] When I uncheck the boxes and click apply I am told to save the preferences and the changes will take effect when restarting R. I click save and a box comes up to ask me to save Rconsole (I'm not sure where this should be saved but I navigate my way to where the old Rconsole was and save on top of it. Strangely I am not asked if I want to replace the old Rconsole. Anyway when I re-start there are no changes and I am still cluttered up by the MDI bars that I want to get rid of! Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Rgui front-end has encountered a problem and needs to close
Well I don't know if anyone can help with this but it will be interesting to know if others have had the same problem. I can't start R at home on my laptop [ I'm using 1.8.1 under Windows XP]. When I click on the shortcut I get the usual Windows box for when an application needs to close. A couple of clicks down it displays the following: Error signature AppName: rgui.exeAppVer: 1.81.31121.0ModName: msvcrt.dll ModVer: 7.0.2600.1106Offset: 0003213b as well as another uncopyable window containing a large amount of binary. I was going to re-install R at work today but the thing worked OK. But back home tonight I get the same problem! The only difference between home and work is that at work I'm connected to a LAN. Cheers, Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Scanning tab-separated numbers
I want to paste in the following numbers into a scan: 0.023 0.032 0.054 0.069 0.081 0.094 0.105 0.127 0.148 0.169 0.188 0.216 they are separated by tabs alone, unless my mailer has done something to the tabs. Now have a look at this: scan() 1: 0.0230.0320.0540.0690.0810.094 1: 0.1050.1270.1480.1690.1880.216 Error in scan() : scan expected a real, got 0.0230.0320.0540.0690.0810.094 scan(sep=\t) 1: 0.0230.0320.0540.0690.0810.094 1: 0.1050.1270.1480.1690.1880.216 Error in scan(sep =) : scan expected a real, got 0.0230.0320.0540.0690.0810.094 Platform: Windows XP Release 1.8.1 I can't seem to scan in tab-separated numbers even when I try to tell R to expect that. (this may be related to the Sweave problem I mentioned a few days ago.) Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Sweave and sep = \t
In my .Snw file: = fyle - choose.files() fyle f - count.fields(fyle, sep = \t) f @ and in the .tex file: \begin{Schunk} \begin{Sinput} fyle - choose.files() fyle \end{Sinput} \begin{Soutput} [1] C:\\Files\\Data\\Cars03\\Rover.txt \end{Soutput} \begin{Sinput} f - count.fields(fyle, sep =) f \end{Sinput} \begin{Soutput} [1] 8 8 8 8 8 \end{Soutput} \end{Schunk} But I want the sep = \t to be left as is. Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] references on cluster analysis
I don't really believe that there is any satisfactory definition of the true number of clusters let along a procedure that would reliably find it. Murray Jorgensen Martin Maechler wrote: Back from my vacation, I haven't seen an R-help answer on this (Christian, where have you been ? ;-) GiampS == Giampiero Salvi [EMAIL PROTECTED] on Sat, 7 Feb 2004 23:40:36 +0100 (CET) writes: GiampS Hi all, I'm doing a study on predicting the true GiampS number of clusters in a hierarchical clustering GiampS scheme. My main reference is at the moment GiampS Milligan GW and Cooper MC (1985) An examination of GiampS procedures for determining the number of clusters in GiampS a data set Psychometrika vol 50 no 2 pp 159-179 GiampS and all the references included in that paper. (not available to me) GiampS I'm planning to perform a similar comparison on a GiampS number of indexes, but on a much larger data set (in GiampS the order of 3000 points), and with a much higher GiampS true number of clusters (in the order of some GiampS hundreds), to see if the properties of the indexes GiampS scale accordingly. GiampS I was wondering if the set of indexes described in GiampS the reference are still state of the art (most of GiampS them were introduced in the '60s and '70s), or if GiampS there are new indexes and methods I could include in GiampS my study. I would really appreciate if you could GiampS point me to some newer references addressing this problem. Gordon's 2nd edition, author = {A. D. Gordon}, title ={Classification, 2nd Edition}, publisher ={Chappman \ Hall/CRC}, year = 1999, series = {Monographs on Statistics and Applied Probability 82}, edition = {2nd edition} has a whole chapter (one of the last ones in the book) on this. R's cluster package has a generic silhouette() function (with 2 methods), and plot.silhouette() method --- all are improvements from Kaufman Rousseeuw's original code. A recent research paper using CLEST (Fridyland Dudoit), mentioning GAP (Tibshirani) etc etc still find silhouette among the best indices for determining the number of clusters. A student's (master) thesis here seems to point in the same direction. GiampS I also read Milligan's chapter in the book GiampS Clustering and Classification from 1995, (which book? author?) GiampS but didn't find information on this subject that wasn't GiampS included in the previous paper. Regards, Martin Maechler [EMAIL PROTECTED] http://stat.ethz.ch/~maechler/ Seminar fuer Statistik, ETH-Zentrum LEO C16Leonhardstr. 27 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-1-632-3408 fax: ...-1228 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] In praise of -
This is not a problem, but I thought that I might say one or two things in favour of right pointing assignment before anybody influential gets the idea of scrapping it! Situation (a) You have just typed a long complcated expression into the console and you suddenly realise that you will need it later in the session. Just append - something to the end of the expression. You can do this with any recently evaluated expression with the help of the up-arrow key (in Windows at least). Situation (b) You have written a chunk of code and you want to see how it behaves for various values of the scalar fred. Just put - fred at the start of the code block, type any number, then paste the code into the console. Cheers, Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] In praise of -
Liaw, Andy wrote: This is also easily accomodated w/o the use of -: Instead of MyValue - fred [code that follows] use: fred - MyValue [code that follows] Andy The point is that - fred [code that follows] is sitting as one piece in the clipboard. So instead of 3 paste 8 paste etc you need to do fred - 3 paste fred - 8 paste etc Not a big saving, but some of us are lazy! -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Sweave problem
Sorry, in an effort to be minimal I cut away some of my surrounding text: The first code block is meant to be typed in by the user, except for the numbers which are to be thought of as pasted in. I certainly do not wish to supply a file name to scan() for this particular example. Murray At 07:44 5/02/2004 +, Prof Brian Ripley wrote: Note that ?scan says file: the name of a file to read data values from. If the specified file is '', then input is taken from the keyboard and you want it to come from the file. So I would not have expected it to work. However, the help file is not totally accurate, as this will work in a batch file in R (although it has not worked in some versions of S-PLUS). The problem is that Sweave does expect the input to all be S code, and does not get as far as running it. Far from being `rather simple', this is a rather sophisticated usage, intended for use only at a keyboard. On Thu, 5 Feb 2004, Murray Jorgensen wrote: Here is the file minimal.Snw: \documentclass[a4paper]{article} \title{R tips and tricks} \author{Murray Jorgensen} \usepackage{Sweave} \begin{document} \maketitle \section*{Entering data from a single variable} The following data are transformed tensile strength measurements on polyester fibres. They may be found on the file \texttt{TENSILE.DAT}. We may enter this data into R using the \texttt{scan} command. = strength - scan() 0.023 0.032 0.054 0.069 0.081 0.094 0.105 0.127 0.148 0.169 0.188 0.216 @ \end{document} and now I attempt to use SWeave to convert it to minimal.tex: Sweave(minimal.Snw) Writing to file minimal.tex Processing code chunks ... 1 : echo term verbatim Error: chunk 1 Error in parse(file, n, text, prompt) : parse error It seems a rather simple piece of code to generate such an error message! Murray [R 1.8.1 on Windows XP] -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Sweave problem
Here is the file minimal.Snw: \documentclass[a4paper]{article} \title{R tips and tricks} \author{Murray Jorgensen} \usepackage{Sweave} \begin{document} \maketitle \section*{Entering data from a single variable} The following data are transformed tensile strength measurements on polyester fibres. They may be found on the file \texttt{TENSILE.DAT}. We may enter this data into R using the \texttt{scan} command. = strength - scan() 0.023 0.032 0.054 0.069 0.081 0.094 0.105 0.127 0.148 0.169 0.188 0.216 @ \end{document} and now I attempt to use SWeave to convert it to minimal.tex: Sweave(minimal.Snw) Writing to file minimal.tex Processing code chunks ... 1 : echo term verbatim Error: chunk 1 Error in parse(file, n, text, prompt) : parse error It seems a rather simple piece of code to generate such an error message! Murray [R 1.8.1 on Windows XP] -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Finding Sweave.sty and other problems
Hmmm, I can certainly remove the path from the \usepackage command. (Now that I come to think of it, I have never seen that in LaTeX before.) I wonder why Sweave put it there in the first place? I thought that I was just running it straight out of the box. Don't tell me though: I will read the manual some more. Murray At 07:59 29/01/2004 +, Prof Brian Ripley wrote: ~ is an active character in TeX, so it assumes it is not in a filename. You will need to escape it. It would be better to have \usepackage{Sweave} there and the path in your TEXINPUTS. TeX is not really designed to work with file paths. On Thu, 29 Jan 2004, Murray Jorgensen wrote: Hi, I've just tried to run example-3 from Friedrich Leish. I'm using R 1.8.1 and MiKTeX 2.2 on Windows XP. I go === library(tools) Sweave(example-3.Snw) Writing to file example-3.tex Processing code chunks ... 1 : term hide 2 : echo term verbatim 3 : term tex 4 : term verbatim eps pdf You can now run LaTeX on example-3.tex === The file example-3.tex looks OK, it starts off === \documentclass[a4paper]{article} \usepackage{C:/PROGRA~1/R/rw1081/share/texmf/Sweave} \begin{document} \section*{The Cats Data} .. === but my LaTeX log file tells a sad story: Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Finding Sweave.sty and other problems
Hi, I've just tried to run example-3 from Friedrich Leish. I'm using R 1.8.1 and MiKTeX 2.2 on Windows XP. I go === library(tools) Sweave(example-3.Snw) Writing to file example-3.tex Processing code chunks ... 1 : term hide 2 : echo term verbatim 3 : term tex 4 : term verbatim eps pdf You can now run LaTeX on example-3.tex === The file example-3.tex looks OK, it starts off === \documentclass[a4paper]{article} \usepackage{C:/PROGRA~1/R/rw1081/share/texmf/Sweave} \begin{document} \section*{The Cats Data} .. === but my LaTeX log file tells a sad story: === This is TeX, Version 3.141592 (MiKTeX 2.2) (preloaded format=latex 2000.11.28) 29 JAN 2004 16:36 **example-3.tex (example-3.tex LaTeX2e 2001/06/01 Babel v3.7h and hyphenation patterns for english, french, german, ngerman, du mylang, nohyphenation, loaded. (C:\texmf\tex\latex\base\article.cls Document Class: article 2001/04/21 v1.4e Standard LaTeX document class (C:\texmf\tex\latex\base\size10.clo File: size10.clo 2001/04/21 v1.4e Standard LaTeX file (size option) ) [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] \abovecaptionskip=\skip41 \belowcaptionskip=\skip42 \bibindent=\dimen102 ) ! Missing \endcsname inserted. to be read again \protect l.4 \begin {document} ? s OK, entering \scrollmode... ! LaTeX Error: Missing \begin{document}. See the LaTeX manual or LaTeX Companion for explanation. Type H return for immediate help. ... l.4 \begin {document} You're in trouble here. Try typing return to proceed. If that doesn't work, type X return to quit. ! Extra \endcsname. [EMAIL PROTECTED] [EMAIL PROTECTED] -h@@k\endcsname [EMAIL PROTECTED] \let \CurrentOptio... l.4 \begin {document} I'm ignoring this, since I wasn't doing a \csname. ! Missing \endcsname inserted. to be read again \protect l.4 \begin {document} The control sequence marked to be read again should not appear between \csname and \endcsname. ! Extra \endcsname. [EMAIL PROTECTED]@aded ...er \ifx \csname [EMAIL PROTECTED] \relax \expandafter [EMAIL PROTECTED] l.4 \begin {document} I'm ignoring this, since I wasn't doing a \csname. ! Missing \endcsname inserted. to be read again \protect l.4 \begin {document} The control sequence marked to be read again should not appear between \csname and \endcsname. ! Extra \endcsname. [EMAIL PROTECTED]@ptions ...xdef \csname [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] l.4 \begin {document} I'm ignoring this, since I wasn't doing a \csname. ! Missing \endcsname inserted. to be read again \protect l.4 \begin {document} The control sequence marked to be read again should not appear between \csname and \endcsname. ! Missing \endcsname inserted. to be read again \protect l.4 \begin {document} The control sequence marked to be read again should not appear between \csname and \endcsname. ! Extra \endcsname. argument ...e/texmf/[EMAIL PROTECTED] \endcsname , l.4 \begin {document} I'm ignoring this, since I wasn't doing a \csname. ! Missing \endcsname inserted. to be read again \protect l.4 \begin {document} The control sequence marked to be read again should not appear between \csname and \endcsname. ! Extra \endcsname. argument [EMAIL PROTECTED]@currname [EMAIL PROTECTED] \endcsname [EMAIL PROTECTED] \InputIfFileExists... l.4 \begin {document} I'm ignoring this, since I wasn't doing a \csname. ! LaTeX Error: File `C:/[EMAIL PROTECTED] \penalty [EMAIL PROTECTED] \ {}1/R/rw1081/share /texmf/Sweave.sty' not found. Type X to quit or RETURN to proceed, or enter new name. (Default extension: sty) Enter file name: x Overfull \hbox (689.89102pt too wide) in paragraph at lines 4--4 [][] \OT1/cmr/m/n/10 1/R/rw1081/share/texmf/Sweave.sty-h@@k 1/R/rw1081/share/te xmf/Sweave.sty1/R/rw1081/share/texmf/Sweave.sty1/R/rw1081/share/texmf/Sweave.st y, 1/R/rw1081/share/texmf/Sweave.sty 1/R/rw1081/share/texmf/Sweave.sty [] ) (\end occurred when \ifx on line 4 was incomplete) (\end occurred when \ifx on line 4 was incomplete) Here is how much of TeX's memory you used: 205 strings out of 96052 1946 string characters out of 1197190 46593 words of memory out of 1050795 3221 multiletter control sequences out of 35000 3640 words of font info for 14 fonts, out of 50 for 1000 14 hyphenation exceptions out of 607 23i,1n,17p,117b,40s stack positions out of 1500i,500n,5000p,20b,32768s No pages of output. === Any comments welcome! Murray -- Dr Murray Jorgensen
[R] Assignments in loops
Greetings all. Any help with the following would be appreciated. I want to create a data frame for each file in a directory. The following code does not work but it may show what I am trying to do: carmakes - c('BMW','Chrysler','Citroen','Fiat','Ford','Holden','Honda', 'Mercedes','MG','Mitsubishi','Nissan','Peugeot','Renault','Subaru','Toyota', 'VW') for (brand in carmakes) { fyle - paste(c:/data/cars03/,brand,.txt,sep=) brand - read.table(fyle, header = TRUE, sep = \t,na.strings = c(-,POA), colClasses=c(character,rep(numeric,7)), comment.char = #) } I need something like an unquote() function that will allow the brand on the LHS of the assignment to be treated as an object name instead of a character string. Murray Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Assignments in loops
Thanks to Andy, Peter and Roger for drawing my attention to assign(), which is just what I needed and works fine. Murray At 14:11 30/12/2003 +1300, Murray Jorgensen wrote: Greetings all. Any help with the following would be appreciated. I want to create a data frame for each file in a directory. The following code does not work but it may show what I am trying to do: carmakes - c('BMW','Chrysler','Citroen','Fiat','Ford','Holden','Honda', 'Mercedes','MG','Mitsubishi','Nissan','Peugeot','Renault','Subaru','Toyota', 'VW') for (brand in carmakes) { fyle - paste(c:/data/cars03/,brand,.txt,sep=) brand - read.table(fyle, header = TRUE, sep = \t,na.strings = c(-,POA), colClasses=c(character,rep(numeric,7)), comment.char = #) } I need something like an unquote() function that will allow the brand on the LHS of the assignment to be treated as an object name instead of a character string. Murray Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] missing data and completed missing data
You might take a set of classified data, say Fisher's irises, and treat the classification as initially unknown. Murray Raphael Schoenle wrote: Hi, This is not exactly an R request, but does anyone know of a good dataset that contains missing and missing data that have been completed later (like from persistent in-person interview attempts)? (want it for some Bayesian regression analysis) Thanks!! -Raphael [[alternative HTML version deleted]] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] mclust - clustering by spatial patterns
Thomas W Blackwell wrote: [...] Why not simply use dist() and hclust() ? Starting with presence/absence data, what could mclust() possibly do that is different from hclust() ? Um, fit a statistical model. - tom blackwell - u michigan medical school - ann arbor - __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] reverse lexicographic order
Hi all, I have some email addresses that I would like to sort in reverse lexicographic order so that addresses from the same domain will be grouped together. How might that be done? Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] reverse lexicographic order
Hi Duncan, Hi Peter, thanks for those ideas! I'm sure I will learn a lot be fooling around with them. Cheers, Murray Duncan Murdoch wrote: On Mon, 15 Dec 2003 10:08:31 +1300, you wrote: Hi all, I have some email addresses that I would like to sort in reverse lexicographic order so that addresses from the same domain will be grouped together. How might that be done? I'm not sure this is what you want, but this function sorts a character vector by last letters, then 2nd last, 3rd last, and so on: revsort - function(x,...) { x[order(unlist(lapply(strsplit(x,''), function(x) paste(rev(x),collapse=''))),...)] } revsort(as.character(1:20)) [1] 10 20 1 11 2 12 3 13 4 14 5 15 6 16 7 [16] 17 8 18 9 19 The ... args are given to order(), so na.last=FALSE and decreasing=TRUE are possibilities. Duncan Murdoch -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] mode
Apologies for pursuing this increasingly off-topic thread, but I've just remembered that 'my' mode is not so hard to compute. Suppose a one-dimensional data set is in general position (all gaps unequal), find the smallest gap, then choose whichever endpoint has the closest neighbour. That's the mode. At 16:39 12/12/2003 +1300, Murray Jorgensen wrote: Opps! This is what I should have written: The mode of a data vector x might be defined as the limit of m_p as p tends to zero from above and where m_p is the m minimizing sum(abs(x - m)^p). I would not expect the mode so defined to be of much use in data analysis, nor would it be easy to compute. Murray [Thanks to Duncan Murdoch for noticing the missing ^p .] Douglas Bates wrote: Christian Mora [EMAIL PROTECTED] writes: How can I get the mode (most frequent value) from a dataset (continuos variables)? I can obtain it when the data is discrete (by making a table and looking at the higher frequency) but I don't know how obtain it from, for example, a density plot of the data. Does anyone know how to do it? Thanks I don't think the mode of a sample from a continuous random variable is well defined. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] mode
The mode of a data vector x might be defined as the limit of m_p as p tends to zero from above and where m_p is the m minimizing sum(abs(x - m)). I would not expect the mode so defined to be of much use in data analysis, nor would it be easy to compute. Murray Douglas Bates wrote: Christian Mora [EMAIL PROTECTED] writes: How can I get the mode (most frequent value) from a dataset (continuos variables)? I can obtain it when the data is discrete (by making a table and looking at the higher frequency) but I don't know how obtain it from, for example, a density plot of the data. Does anyone know how to do it? Thanks I don't think the mode of a sample from a continuous random variable is well defined. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] mode
Opps! This is what I should have written: The mode of a data vector x might be defined as the limit of m_p as p tends to zero from above and where m_p is the m minimizing sum(abs(x - m)^p). I would not expect the mode so defined to be of much use in data analysis, nor would it be easy to compute. Murray [Thanks to Duncan Murdoch for noticing the missing ^p .] Douglas Bates wrote: Christian Mora [EMAIL PROTECTED] writes: How can I get the mode (most frequent value) from a dataset (continuos variables)? I can obtain it when the data is discrete (by making a table and looking at the higher frequency) but I don't know how obtain it from, for example, a density plot of the data. Does anyone know how to do it? Thanks I don't think the mode of a sample from a continuous random variable is well defined. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] Character graphics
Does anyone else miss email-friendly character graphics such as the following example, produced using Minitab? Histogram of C6 N = 478 N* = 21 Each * represents 2 observation(s) MidpointCount -12 16 -11 53 *** -10 63 -9 83 ** -8 93 *** -7 74 * -6 45 *** -5 13 *** -46 *** -32 * -2 10 * -1 18 * 01 * BTW, Minitab 13 protests when you want to do this: MTB gstd * NOTE * Character graphs are obsolete. * NOTE * Standard Graphics are enabled. Professional Graphics are disabled. Use the GPRO command to enable Professional Graphics. Has anyone tried to get similar unprofessional displays out of R? Cheers, Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] add a point to regression line and cook's distance
Not a good idea, unless the regression function is *known* to be linear. More likely it is only approximately linear over small ranges. Murray Jorgensen Wiener, Matthew wrote: If you know that the line should pass through (0,0), would it make sense to do a regression without an intercept? You can do that by putting -1 in the formula, like: lm(y ~ x - 1). Hope this helps, Matt Matthew Wiener RY84-202 Applied Computer Science Mathematics Dept. Merck Research Labs 126 E. Lincoln Ave. Rahway, NJ 07065 732-594-5303 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Spencer Graves Sent: Wednesday, December 03, 2003 5:51 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: [R] add a point to regression line and cook's distance What is the context? What do the outliers represent? If you think carefully about the context, you may find the answer. hope this helps. spencer graves p.s. I know statisticians who worked for HP before the split and who still work for either HP or Agilent, I'm not certain which. If you want to contact me off-line, I can give you a couple of names if that might help. [EMAIL PROTECTED] wrote: Hi, This is more a statistics question rather than R question. But I thought people on this list may have some pointers. MY question is like the following: I would like to have a robust regression line. The data I have are mostly clustered around a small range. So the regression line tend to be influenced strongly by outlier points (with large cook's distance). From the application 's background, I know that the line should pass (0,0), which is far away from the data cloud. I would like to add this point to have a more robust line. The question is: does it make sense to do this? what are the negative impacts if any? thanks, jonathan __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] sampling without repetition
Really you just want a random permutation of r: rperm - sample(r) Then you can obtain a random partition of r into cells of any desired size simply by taking appropriately-sized consecutive blocks of rperm. Murray At 22:42 17/11/2003 -0500, Rajarshi Guha wrote: Sorry for not providing all the details. The 3 sets can be of any size (which will be specified by the user of the function) and cover all of r (ie, set1 + set2 + set3 == r) Thanks to everybody for all the solutions. --- Rajarshi Guha [EMAIL PROTECTED] http://jijo.cjb.net GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE --- I'd love to go out with you, but there are important world issues that need worrying about. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] Plotting lm() attributes
Suppose you fit a linear model model.1 ~ lm(v1 ~ ..., data=myframe) and v2 is some other column of myframe typically not in the model. You will often want to try plot(v2, model.1$residuals) but this will fail if there are NAs in the response v1 as model.1$residuals has length equal to the number of nonmissing values in v1. I suppose plot(v2[!is.na(v1)], model.1$residuals) does the job, but it seems irritating that model.1$residuals, does not have length agreeing with the number of rows in the data frame. It would be even more irritating for model.1$fitted.values, where the removed elements would often have nonmissing values. Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] ETA for 1.8.1 ?
When is version 1.8.1 likely to be released? Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] detecting singular matrices
My colleague runs R 1.7.1 under Windows XP. He remarks: A [,1] [,2] [,3] [1,]123 [2,]456 [3,]789 b [1] 1 2 3 solve(A,b) [1] -0.333 0.667 0.000 solve(A) [,1] [,2][,3] [1,] -4.5036e+15 9.00720e+15 -4.5036e+15 [2,] 9.0072e+15 -1.80144e+16 9.0072e+15 [3,] -4.5036e+15 9.00720e+15 -4.5036e+15 eigen(A) $values [1] 1.611684e+01 -1.116844e+00 -1.303678e-15 with a condition number of 1.236260e+16 I think I'd like it to tell me it's singular like it used to :) and I know it is singular Under 1.6.0 also on XP I get solve(A,1:3) Error in solve.default(A, 1:3) : singular matrix `a' in solve solve(A) Error in solve.default(A) : singular matrix `a' in solve eigen(A) $values [1] 1.611684e+01 -1.116844e+00 -1.120190e-16 $vectors [,1][,2] [,3] [1,] -0.2719964 -0.76169140 0.3734378 [2,] -0.6159646 -0.08408654 -0.7468756 [3,] -0.9599328 0.59351831 0.3734378 Which seems to be preferable output. Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] R-project [.com?] [.net?]
I got a shock a few days ago when I accidentally visited www.r-project.com . I thought that the r-project site had been hacked until I realised my mistake. There is also a site www.r-project.net. Both of these sites appear to be Japanese. Does anyone know anything about them? I suppose that it is not unusual for names close to those of popular sites to be used. It is good that they use a different language or there might well be confusion. Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] Suming logical vectors
Can anyone explain the following? [R 1.6.0 Windows XP, yes I will upgrade soon.] Murray sort(IATmedian)[0:50]==0 [1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [49] FALSE FALSE sum(sort(IATmedian)[0:50]==0) [1] 2 sum(sort(IATmedian)==0) [1] 2 sum(IATmedian==0) [1] NA -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Suming logical vectors
Blush! Not too hard to explain at all! Deepayan Sarkar wrote: Your IATmedian has some NAs (which are removed by sort) ? On Thu, 18 Sep 2003, Murray Jorgensen wrote: Can anyone explain the following? [R 1.6.0 Windows XP, yes I will upgrade soon.] Murray sort(IATmedian)[0:50]==0 [1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [49] FALSE FALSE sum(sort(IATmedian)[0:50]==0) [1] 2 sum(sort(IATmedian)==0) [1] 2 sum(IATmedian==0) [1] NA -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] Quit asking me if I want to save the workspace!
How do you stop R from putting up a dialog box when you quit Rgui? (I use Windows and I never save workspaces that way) Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Quit asking me if I want to save the workspace!
Rafael A. Irizarry wrote: you can type this: q(no) see the help file for q Still more work than two mouse clicks. -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Quit asking me if I want to save the workspace!
Ah! now that tells me what I want to know. I was trying to type C:\Program Files\R\rw1071\bin\Rgui.exe --no-save instead of C:\Program Files\R\rw1071\bin\Rgui.exe --no-save into the Target box. Silly me! Jason Turner wrote: On Wed, 2003-09-17 at 14:26, Murray Jorgensen wrote: Rafael A. Irizarry wrote: you can type this: q(no) see the help file for q Still more work than two mouse clicks. Two clicks! How awful! ;) Actually, it bugs me too, so my desktop shortcut (under Win XP) has this for Target. ### C:\Program Files\R\rw1071\bin\Rgui.exe --no-save ### (my mail client might've line-wrapped that by the time you see it. Everything between the ### marks is one line. *Include* the quotes. There is a space between Rgui.exe and --no-save) See Appendix B of An Introduction to R if you need more info. Hope that helps. Jason -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] R tools for large files
I would like to thank those who have responded and especially Brian Ripley for making his unix tools for Windows available. A colleague has also mentioned to me the set of unix tools called Cygwin. Two things that can be done with R alone are to read the first n lines of a file into n strings with readLines() and to scan in a block of the file after skipping a number of lines. I will probably use Fortran to extract subsets of the file as I need to use it for other things that I am planning to do with the file. I'll maybe also play a bit with readLines() and writeLines() inside loops to see if I can build up my random subsets of files this way. BTW, I now estimate the file at about 100,000 lines so indeed, it is not all that large! Murray Jorgensen Prof Brian Ripley wrote: On Mon, 25 Aug 2003, Murray Jorgensen wrote: At 08:12 25/08/2003 +0100, Prof Brian Ripley wrote: I think that is only a medium-sized file. Large for my purposes means more than I really want to read into memory which in turn means takes more than 30s. I'm at home now and the file isn't so I'm not sure if the file is large or not. More responses interspesed below. BTW, I forgot to mention that I'm using Windows and so do not have nice unix tools readily available. But you do, thanks to me, as you need them to installed R packages. On Mon, 25 Aug 2003, Murray Jorgensen wrote: I'm wondering if anyone has written some functions or code for handling very large files in R. I am working with a data file that is 41 variables times who knows how many observations making up 27MB altogether. The sort of thing that I am thinking of having R do is - count the number of lines in a file You can do that without reading the file into memory: use system(paste(wc -l, filename)) Don't think that I can do that in Windows XL. I presume you mean Windows XP? Of course you can, and wc.exe is in Rtools.zip! or read in blocks of lines via a connection But that does sound promising! - form a data frame by selecting all cases whose line numbers are in a supplied vector (which could be used to extract random subfiles of particular sizes) R should handle that easily in today's memory sizes. Buy some more RAM if you don't already have 1/2Gb. As others have said, for a real large file, use a RDBMS to do the selection for you. It's just that R is so good in reading in initial segments of a file that I can't believe that it can't be effective in reading more general (pre-specified) subsets. -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] R tools for large files
Hi Martin, I don't know much about the concept of connection but I had supposed it to at least include the concept of file and perhaps also input device and output device'. I guess the important point that you are making is that it is sequential in the sense that you describe. I suppose at the time that I wrote my emails I didn't *know* that this was the case but rather assumed that this must be so, since it would be tedious in the extreme to have to work with the access functions if they kept going back to the beginning of the connection. It may help to explain the application. The large files that I am working with are themselves statistical summaries of internet traffic flows (you will appreciate why they can be almost arbitrarily large!) I am interested in clustering these flows into different classes of traffic. I am using a model-based approach, so that the end-point will be statistical models for each cluster. Once these have been estimated they may be used in the classification of future traffic [including a residual class of traffic that does not fit any cluster well]. Based on experience with my clustering software (Multimix) I believe that it should work well on data sets of, say, 3000 observations. I plan to select a small number of random subsets of this size. The replication of these subsets should help me with model selection questions (How many Clusters? How complex should each cluster model be?) Tom Mulholland makes a good point when he notes that many R users (and other users) have very little control over their computing environment owing to somewhat arbitrary IT management decisions. For this reason it will be advantageous to have several solutions to large file problems. I'm pleased that you think that efficient R functions for manipulating numbered lines from files may be written. I'm going to have a go at it just as soon as I finish a big item of paperwork! BTW, I will be out of town and with much reduced email access over the next week or so, so if I don't reply to the list or individuals this should not be put down to laziness or rudeness! Cheers, Murray Jorgensen PS Give my regards to Chris Hennig. Martin Maechler wrote: Hi Murray, from reading your summarizing reply, I wonder if you missed the most important point about connections connection := generalization of file): Once you open() one, you can read it **sequentially**, e.g., in bunches of a few lines i.e., you don't re-start from the beginning each time. I think this will allow to devise a pretty efficient R function for reading (and returning as a vector of strings) line numbers (n1, n2,..., nm). Did you know this? If not, maybe you forward this answer (and your reaction to it) to R-help as well. Regards, Martin Maechler [EMAIL PROTECTED] http://stat.ethz.ch/~maechler/ Seminar fuer Statistik, ETH-Zentrum LEO C16Leonhardstr. 27 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-1-632-3408 fax: ...-1228 -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] R tools for large files
I'm wondering if anyone has written some functions or code for handling very large files in R. I am working with a data file that is 41 variables times who knows how many observations making up 27MB altogether. The sort of thing that I am thinking of having R do is - count the number of lines in a file - form a data frame by selecting all cases whose line numbers are in a supplied vector (which could be used to extract random subfiles of particular sizes) Does anyone know of a package that might be useful for this? Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] R tools for large files
Could you be more specific? Do you mean the chapter on connections? Ko-Kang Kevin Wang wrote: Hi, Have you looked at R Data Import/Export? On Mon, 25 Aug 2003, Murray Jorgensen wrote: __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] R tools for large files
Andrew, This is no doubt true, but some things in R work very well with big files without the need for any extra software: readLines(c:/data/perry/data.csv,n=12) # prints out the first 12 lines as strings flows - read.csv(c:/data/perry/data.csv,na.strings=?, header=F,nrows=1000) # makes a data frame from the first 1000 records I would like to get some solution where I don't find myself generating large numbers of derived files from the original data file. Murray Andrew C. Ward wrote: Dear Murray, One way that works very well for many people (including me) is to store the data in an external database, such as MySQL, and read in just the bits you want using the excellent package RODBC. Getting a database to do all the selecting is very fast and efficient, leaving R to concentrate on the analysis and visualisation. This is all described in the R Import/Export Manual. Regards, Andrew C. Ward CAPE Centre Department of Chemical Engineering The University of Queensland Brisbane Qld 4072 Australia [EMAIL PROTECTED] Quoting Murray Jorgensen [EMAIL PROTECTED]: I'm wondering if anyone has written some functions or code for handling very large files in R. I am working with a data file that is 41 variables times who knows how many observations making up 27MB altogether. The sort of thing that I am thinking of having R do is - count the number of lines in a file - form a data frame by selecting all cases whose line numbers are in a supplied vector (which could be used to extract random subfiles of particular sizes) Does anyone know of a package that might be useful for this? Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED] Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] R tools for large files
At 08:12 25/08/2003 +0100, Prof Brian Ripley wrote: I think that is only a medium-sized file. Large for my purposes means more than I really want to read into memory which in turn means takes more than 30s. I'm at home now and the file isn't so I'm not sure if the file is large or not. More responses interspesed below. BTW, I forgot to mention that I'm using Windows and so do not have nice unix tools readily available. On Mon, 25 Aug 2003, Murray Jorgensen wrote: I'm wondering if anyone has written some functions or code for handling very large files in R. I am working with a data file that is 41 variables times who knows how many observations making up 27MB altogether. The sort of thing that I am thinking of having R do is - count the number of lines in a file You can do that without reading the file into memory: use system(paste(wc -l, filename)) Don't think that I can do that in Windows XL. or read in blocks of lines via a connection But that does sound promising! - form a data frame by selecting all cases whose line numbers are in a supplied vector (which could be used to extract random subfiles of particular sizes) R should handle that easily in today's memory sizes. Buy some more RAM if you don't already have 1/2Gb. As others have said, for a real large file, use a RDBMS to do the selection for you. It's just that R is so good in reading in initial segments of a file that I can't believe that it can't be effective in reading more general (pre-specified) subsets. Murray -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] Interlacing two vectors
I want to interlace two vectors. This I can do: x - 1:4 z - x+0.5 as.vector(t(cbind(x,z))) [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 but this seems rather inelegant. Any suggestions? Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] r-help using random generating
Student Exercise?! Cheryl H. wrote: To whom it may concern: Given that my sample size is n, my mean is 100, and my sd is 10, I need to use a random number generator (which I believe is the function rnorm(5,100,10)), but I need to repeat it a large number of times, and then plot the sampling distributions of the sample means, sd's, and variances of those generated sets. I'm having a real hard time trying to figure out how to do this easily. Please help if possible! Thanks, Cheryl __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] help with likelihood contour plot
Can some kind person point out my error here? I'm trying to set up a grid for a countour plot of a likelihood function. u - rnorm(20,9.5,2.5) # sample of size 20 from N(9.5,2.5^2) loglik - function(th1,th2) { + n - length(u) + -(n/2)*log(2*pi*th2^2)-0.5*sum((u-th1)^2/th2^2) + } x - seq(4.5,14.5,len=50) y - seq(0.5,6,len=50) f - outer(x, y, loglik(x,y)) Error in match.fun(FUN) : not function, character, or symbol: loglik(x, y) In addition: Warning message: longer object length is not a multiple of shorter object length in: u - th1 loglik(9,2) [1] -44.56294 is.function(loglik) [1] TRUE Thanks, Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] lda on curves
I'm working on a rather interesting consulting problem with a client. A number of physical variables are measured on a number of cricket bowlers in the performance of a delivery. An example variable might be a directional component of angular momentum for a particular joint measured at a large number (101) of equally spaced timepoints. Each bowler generates a (fairly smooth) curve for each variable measured. I decided to represent each curve by a few orthogonal polynomial constrasts. There are 4 groups of bowlers corresponding to various speeds of delivery. I want to use canonical variant analysis to find linear combinations of my transformed variables discriminating well between the groups of bowlers. I used lda() from the MASS library to do this, but examining the output I notice that the higher-order orthogonal polynomials are getting larger coefficients than the more important lower-order ones. This is clearly because some scaling of the variables is being done by lda(), and because the higher-order polynomial vaiable values are smaller, they are scaled up. I would like to turn off this scaling as it is not what is needed in this problem and will cause the tail to wag the dog. There is no obvious parameter to do this in lda(x, grouping, prior = proportions, tol = 1.0e-4, subset, na.action = na.fail, method, CV = FALSE, nu) so I thought that I might try a hack. However: lda function (x, ...) { if (is.null(class(x))) class(x) - data.class(x) UseMethod(lda, x, ...) } which isn't very helpful. Any ideas about how to perform an unscaled canonical variates analysis? Cheers, Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] google
That's good. But it will probably not take you to other pages where people mention that they are using R for something, which is a bit of a pity. Rafael A. Irizarry wrote: fyi, I typed R in google and hit the I'm feeling lucky botton... it took me to http://www.r-project.org -rafael ps - toysrus.com is second. __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] order() on vector with Inf's
[EMAIL PROTECTED] wrote: On Thu, 23 Jan 2003, Murray Jorgensen wrote: [useful suggestions omitted] llbet[llbet==Inf] - -Inf should correct the problem, but why not correct the calculation? Not a bad idea, but the numerical difficulties always occur well away from the likelihood maximum, and I'm only looking for reasonable starting values. -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] order() on vector with Inf's
I have a parameter vector, ibeta, and a corresponding vector of loglikelihoods, llbet. If llbet contains no NAs or Inf's then I can extract the best parameter by index - order(-llbet)[1] beta - ibeta[index] or similar. The argument na.last of order() allows me to fix this up even if llbet contains some NAs which I wish to ignore. Unfortunately my llbet contains Inf's, which are not points of infinite likelihood, if anything they should be -Inf's. Anyway na.last does not seem to help me with these. What should I do? Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED]Fax 7 838 4155 Phone +64 7 838 4773 wk+64 7 849 6486 homeMobile 021 1395 862 __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help