Fwd: [R] Enduring LME confusion... or Psychologists and Mixed-Effects
I will follow the suggestion of John Maindonald and present the problem by example with some data. I also follow the advice to use mean scores, somewhat reluctantly though. I know it is common practice in psychology, but wouldn't it be more elegant if one could use all the data points in an analysis?

The data for 5 subjects (myData) are provided at the bottom of this message. It is a crossed within-subject design with three factors and reaction time (RT) as the dependent variable. An initial repeated-measures model would be:

  aov1 <- aov(RT ~ fact1*fact2*fact3 + Error(sub/(fact1*fact2*fact3)), data = myData)

aov complains that the effects involving fact3 are unbalanced:

  > aov1
  ...
  Stratum 4: sub:fact3
  Terms:
                        fact3   Residuals
  Sum of Squares  4.81639e-07 5.08419e-08
  Deg. of Freedom           2           8
  Residual standard error: 7.971972e-05
  6 out of 8 effects not estimable
  Estimated effects may be unbalanced
  ...

Presumably this is because fact3 has three levels and the other ones only two, making this a 'non-orthogonal' design.

I then fit an equivalent mixed-effects model:

  lme1 <- lme(RT ~ fact1*fact2*fact3, data = meanData2, random = ~1|sub)

Subsequently I test if my factors had any effect on RT:

  > anova(lme1)
                    numDF denDF   F-value p-value
  (Intercept)           1    44 105294024  <.0001
  fact1                 1    44        22  <.0001
  fact2                 1    44         7  0.0090
  fact3                 2    44        19  <.0001
  fact1:fact2           1    44         9  0.0047
  fact1:fact3           2    44         1  0.4436
  fact2:fact3           2    44         1  0.2458
  fact1:fact2:fact3     2    44         0  0.6660

Firstly, why are the F-values in the output whole numbers? The effects are estimated over the whole range of the dataset, and this is so because all three factors are nested under subjects, on the same level. This, however, seems to inflate the F-values compared to the anova(aov1) output, e.g.
  > anova(aov1)
  ...
  Error: sub:fact2
            Df     Sum Sq    Mean Sq F value Pr(>F)
  fact2      1 9.2289e-08 9.2289e-08  2.2524 0.2078
  Residuals  4 1.6390e-07 4.0974e-08
  ...

I realize that the (probably faulty) aov model may not be directly compared to the lme model, but my concern is whether the lme estimation of the effects is right, and if so, how can a naïve skeptic be convinced of this? The suggestion to use simulate.lme() to find this out seems good, but I can't seem to get it working (from "[R] lme: simulate.lme in R" it seems it may never work in R).

I have also followed the suggestion to fit the exact same model with lme4. However, the format of the anova output does not give me the estimation in the way nlme does. More importantly, the degrees of freedom in the denominator don't change; they're still large:

  library(lme4)
  lme4_1 <- lme(RT ~ fact1*fact2*fact3, random = ~1|sub, data = myData)
  > anova(lme4_1)
  Analysis of Variance Table
                       Df    Sum Sq   Mean Sq Denom F value    Pr(>F)
  fact1I                1 2.709e-07 2.709e-07    48 21.9205 2.360e-05 ***
  fact2I                1 9.229e-08 9.229e-08    48  7.4665  0.008772 **
  fact3L                1 4.906e-08 4.906e-08    48  3.9691  0.052047 .
  fact3M                1 4.326e-07 4.326e-07    48 34.9972 3.370e-07 ***
  fact1I:fact2I         1 1.095e-07 1.095e-07    48  8.8619  0.004552 **
  fact1I:fact3L         1 8.988e-10 8.988e-10    48  0.0727  0.788577
  fact1I:fact3M         1 1.957e-08 1.957e-08    48  1.5834  0.214351
  fact2I:fact3L         1 3.741e-09 3.741e-09    48  0.3027  0.584749
  fact2I:fact3M         1 3.207e-08 3.207e-08    48  2.5949  0.113767
  fact1I:fact2I:fact3L  1 2.785e-09 2.785e-09    48  0.2253  0.637162
  fact1I:fact2I:fact3M  1 7.357e-09 7.357e-09    48  0.5952  0.444206
  ---

I hope that by providing a sample of the data someone can help me out on the questions I asked in my previous mail:

1. When aov's assumptions are violated, can lme provide the right model for within-subjects designs where multiple fixed effects are NOT hierarchically ordered?
2. Are the degrees of freedom in anova(lme1) the right ones to report?
If so, how do I convince a reviewer that, despite the large number of degrees of freedom, lme does provide a conservative evaluation of the effects? If not, how does one get the right denDF in a way that can be easily understood?

If anyone thinks he can help me better by looking at the entire data set, I very much welcome them to email me for further discussion.

In great appreciation of your help and work for the R community,
Gijs Plomp
[EMAIL PROTECTED]

  > myData
     sub fact1 fact2 fact3        RT
  1   s1     C     C     G 0.9972709
  2   s2     C     C     G 0.9981664
  3   s3     C     C     G 0.9976909
  4   s4     C     C     G 0.9976047
  5   s5     C     C     G 0.9974346
  6   s1     I     C     G 0.9976206
  7   s2     I     C     G 0.9981980
  8   s3     I     C     G 0.9980503
  9   s4     I     C     G 0.9980620
  10  s5     I     C     G 0.9977682
  11  s1     C     I     G 0.9976633
  12  s2     C
[R] Help with generating data from a 'not quite' Normal distribution
I would be very grateful for any help from members of this list for what might be a simple problem...

We are trying to simulate the behaviour of a clinical measurement in a series of computer experiments. This is simple enough to do in R if we assume the measurements to be Gaussian, but their empirical distribution has a much higher peak at the mean and much longer tails. (The distribution is quite symmetrical.) Can anyone suggest any distributions I could fit to this data, and better still how I can then generate random data from this 'distribution' using R?

---
Dr. David Crabb
School of Science, The Nottingham Trent University,
Clifton Campus, Nottingham. NG11 8NS
Tel: 0115 848 3275  Fax: 0115 848 6690

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Help with generating data from a 'not quite' Normal distribution
Hi David,

you could try a Student's t distribution with appropriate degrees of freedom and an extra scale parameter, i.e., see ?rt and:

  rgt <- function(n, mu = 0, sigma = 1, df = stop("'df' must be given"))
      mu + sigma * rt(n, df = df)

I hope this helps.

Best,
Dimitris

Dimitris Rizopoulos
Doctoral Student, Biostatistical Centre
School of Public Health, Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887  Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm

----- Original Message -----
From: Crabb, David
Sent: Thursday, August 12, 2004 10:44 AM
Subject: [R] Help with generating data from a 'not quite' Normal distribution
[quoted question trimmed; see above]
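A quick usage sketch of the rgt() helper above. This is self-contained; the mu, sigma and df values are invented for illustration, not taken from David's data.

```r
# Scaled, shifted Student-t generator, as suggested above.
rgt <- function(n, mu = 0, sigma = 1, df = stop("'df' must be given"))
  mu + sigma * rt(n, df = df)

set.seed(3)
y <- rgt(500, mu = 10, sigma = 2, df = 3)  # heavy-tailed draws centred at 10
```

Compared with rnorm(500, 10, 2), a normal QQ-plot of y (qqnorm(y)) should show the longer tails David describes.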
[R] Help with generating data from a 'not quite' Normal distribution
Hi,

the Student's t distribution could be considered: it is symmetrical, but with a low number of degrees of freedom it differs from the Normal distribution in just the way you describe: a much higher peak at the mean and much longer tails. To simulate samples, use:

  rt(n, df)

where n is the number of observations and df the degrees of freedom.

Best,
Vito

=====
Become builders of solutions
Visit the portal http://www.modugno.it/ and in particular the section on Palese
http://www.modugno.it/archivio/cat_palese.shtml
[R] Help with generating data from a 'not quite' Normal distribution
Hi,

the Cauchy distribution could also be good:

  rcauchy(n, location = 0, scale = 1)

Best,
Vito
[R] Giving a first good impression of R to Social Scientists
Dear all,

in the coming Winter Semester, I will be a teaching assistant for a course in Survival Analysis. My job will be to do the lab sessions, and the software used for them will be R. Most of the students have a background in the social sciences and the only stats package they have used so far is most likely SPSS. So I assume they might be quite surprised the first time they see R ("where is my rectangular data window?", "where do I have to click to make a new variable?", ...).

That is why I would like to ask the experts on this list if any of you has encountered a similar experience, and what you would advise to persuade people quickly that it is worth learning new software. I imagined giving them a short presentation about the nice capabilities of R which would be impossible or troublesome with conventional software like SPSS.[1] The reason is that I want to create an atmosphere where people have a positive attitude towards learning new software right from the beginning. This would make it easier for me, and I guess the students would also learn more and faster if they have a positive first encounter with R. (Afterwards I plan to introduce them to the basics of R with the help of Venables/Smith/R Core Team "An Introduction to R", Dalgaard "Introductory Statistics with R" and Krause/Olson "The Basics of S and S-Plus", before doing any survival-analysis-relevant exercises.)

I would appreciate any suggestions!

Thanks for your help,
Roland

[1] I originally thought to show them how easy it is to estimate a Kaplan-Meier survival curve in R in the presence of left truncation, whereas I have seen no way so far to do that in SPSS (that could also be due to my lack of knowledge of SPSS). But the KM estimator is a topic during the course, so I cannot use this example.
[R] Giving a first good impression of R to Social Scientists
Hi,

do you know there are several GUIs for R? See:

  http://www.sciviews.org/_rgui/

and in particular R-Commander:

  http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/

R-Commander is quite like the GUIs of commercial software. Give it a look: it can help your pupils who are used to SPSS. Personally, though, I prefer the command line; for me it is simpler.

Best,
Vito
[R] RE: Giving a first good impression of R to Social Scientists
Hi,

-----Original Message-----
From: Vito Ricci
> do you know there are several GUIs for R? See: [...]
> R-Commander is quite like the GUI of commercial software.

Yes, I do know the R-Commander. But I did not want to give them a GUI, but rather expose them to the command line, after demonstrating that the steep learning curve in the beginning is worth the effort for the final results. That is why I wanted to ask the list if anyone has faced the same situation of persuading students to use R. Are social science students most impressed by some nice graphs (e.g. filled.contour), or will they get a more positive attitude if I use R as an overgrown calculator, as in Peter Dalgaard's book? Or should I write an SPSS script to perform a certain task and demonstrate how easy, compact and elegant it is to do the same job in R?

Just telling them "We will use R during our course" without any explanation would not be a good choice, in my opinion. As I have written before: I would like the students to trust me that it is worth investing some extra energy in the beginning. I do not expect to receive any prepared demonstration from any of you; I am more curious about your teaching experiences and how you got people enthusiastic about using this software.

Thanks,
Roland
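One compact example of the "overgrown calculator" kind Roland describes, simulated data only (the group names and values here are made up):

```r
# Simulated scores for two groups; a grouped summary is one call in R,
# versus several dialog boxes or a script in SPSS.
set.seed(1)
scores <- data.frame(group = rep(c("a", "b"), each = 50),
                     value = rnorm(100))
with(scores, tapply(value, group, mean))  # per-group means in one line
```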
Re: [R] Giving a first good impression of R to Social Scientists
Hi,

I have encountered big difficulties trying to persuade my undergraduate students, who have a very slight background in either statistics or computing, to use R instead of SPSS. I tried to start with a sort of very, very simple sample session, just to show that R is not as complicated as they think and that it is worth trying. The idea was to give them a simple lab guide they could follow at their own pace, and to stimulate them to search in the help files, etc. I think it is important that they get used to trying out the examples and documentation to solve specific problems.

Best,
Susana

--
Susana Barbosa
Departamento de Matematica Aplicada
Faculdade de Ciências, Universidade Porto
Rua do Campo Alegre, 687, 4169-007, Porto
Tel: 220 100 840  Fax: 220 100 809
[R] error using daisy() in library(cluster). Bug?
Hi,

I'm using the cluster library to examine multivariate data. The data come from a connection to a postgres database, and I wrote a short R script to do the analysis. With the cluster version included in R 1.8.0, daisy() worked well for my data, but now, when I call daisy(), I obtain the following messages:

  Error in if (any(sx == 0)) { : missing value where TRUE/FALSE needed
  In addition: Warning message:
  binary variable(s) 116 treated as interval scaled in:
  daisy(concentracion.data.frame, stand = TRUE)

All the variables in my data frame are numeric, although I've got NA values, and I've seen that if I do the analysis for a subset of the data frame, selecting just the columns with no NA, the result is good. Could this be a bug?

Thanks, and best regards,
Javier
RE: [R] Enduring LME confusion... or Psychologists and Mixed-Effects
Dear Gijs:

lme fits models using restricted maximum likelihood by default. So, I believe this is why you have different DF. If you include method = "ML" in the modeling function, the DF should be similar to aov.

I think your lme code is incorrect given my understanding of your problem. You can use all data points using a repeated measures design. Each subject is measured on multiple occasions, so observations are nested within individual. You need to add a time variable to indicate the measurement occasion for each subject (e.g., t = {0, ..., T}). You might try the following:

  fm1.lme <- lme(rt ~ time*fact1*fact2*fact3, data = mydata,
                 random = ~time|sub, method = "ML")

Use summary() to get the entire output.

Hope this helps,
Harold

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Gijs Plomp
Sent: Thu 8/12/2004 4:13 AM
Subject: Fwd: [R] Enduring LME confusion... or Psychologists and Mixed-Effects
[quoted message trimmed; see the start of this digest]
Re: [R] Help with generating data from a 'not quite' Normal distribution
>>>>> Vito Ricci, on Thu, 12 Aug 2004 10:59:23 +0200 (CEST), writes:

  Vito> Hi, also the Cauchy distribution could be good:
  Vito> rcauchy(n, location = 0, scale = 1)

That is an exaggeration, after you already told him to use the t-distribution family: Cauchy = t-Dist(*, df = 1)!

I'd first try the t distribution, using fitdistr() from package MASS, e.g.,

  x <- rt(1000, df = 1.5)
  library(MASS)
  fx <- fitdistr(x, densfun = "t")
  fx
              m              s             df
    -0.01396785     1.04338151     1.57749052
  ( 0.04426267)  ( 0.04766543)  ( 0.10809543)

(so it *does* estimate location and scale in addition to the df). If you read the help page ?fitdistr, you'll see in the example that estimating 'df' is said to be problematic. AFAIK it can be better to reparametrize, possibly using 1/df or log(df) as the new parameter {but then you can't use fitdistr(); rather use mle() with the log-likelihood, or optim() directly}.

Martin Maechler
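A sketch of the reparametrization Martin mentions, fitting a location-scale t by direct optimisation with log(scale) and log(df) as parameters. The data are simulated, negll is a made-up helper name, and the starting values are rough guesses.

```r
set.seed(1)
x <- 5 + 2 * rt(1000, df = 3)  # stand-in for the clinical measurements

# Negative log-likelihood of a location-scale t; scale and df are kept
# positive via a log-reparametrization, avoiding boundary problems.
negll <- function(par, x) {
  m  <- par[1]
  s  <- exp(par[2])
  df <- exp(par[3])
  -sum(dt((x - m) / s, df = df, log = TRUE) - log(s))
}

fit <- optim(c(median(x), log(mad(x)), log(5)), negll, x = x)
est <- c(m = fit$par[1], s = exp(fit$par[2]), df = exp(fit$par[3]))
```

With this seed, est should land near the true (5, 2, 3); transforming back with exp() gives the scale and df on their natural scale.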
Re: RE: [R] Enduring LME confusion... or Psychologists and Mixed-Effects
On Thu, 12 Aug 2004, Doran, Harold wrote:

> lme fits models using restricted maximum likelihood by default. So, I
> believe this is why you have different DF. If you include method="ML"
> in the modeling function the DF should be similar to aov.

It is REML and not ML that generalizes the classical multistratum AoV. In any case, REML and ML use the same subspaces, just different methods for estimating variances within them.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self)  +44 1865 272866 (PA)  Fax: +44 1865 272595
Re: [R] Help with generating data from a 'not quite' Normal distribution
On Thu, 12 Aug 2004, Martin Maechler wrote:

> I'd first try the t distribution, using fitdistr() from package MASS.
> [code and output trimmed; see Martin's message above] If you read the
> help page ?fitdistr you'll see in the example that estimating 'df' is
> said to be problematic. AFAIK it can be better to reparametrize,
> possibly using 1/df or log(df) as the new parameter.

It is the use of ML for the df that is *in theory* problematic, not the optimization per se. See the reference, p. 110, for some of the literature.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self)  +44 1865 272866 (PA)  Fax: +44 1865 272595
Re: [R] error using daisy() in library(cluster). Bug?
¡Hola Javier!

Since I am the maintainer of the cluster *package* (not library), I'm interested to find out more about this problem. I assume you now use R 1.9.1. Can you give us an example we can reproduce? Give the exact R commands you use, and maybe attach the save()d data file (*.rda) in a private e-mail? Or do this on R-help and give a URL where one can download the data (you can't attach such binary files on R-help).

Thank you,
Martin Maechler

>>>>> javier garcia - CEBAS, on Thu, 12 Aug 2004 12:53:28 +0200, writes:
[quoted message trimmed; see Javier's message above]
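A sketch of the kind of minimal reproducible example Martin asks for: a small all-numeric data frame with NAs, saved to an .rda file. The file name and column values are invented; only the daisy() call at the end (kept as a comment) needs the cluster package.

```r
# Tiny all-numeric frame with NAs, mimicking Javier's description.
d <- data.frame(a  = c(1.2, NA, 3.4, 5.6),
                b  = c(0, 1, 1, 0),          # 0/1 column of the kind the
                cc = c(2.1, 2.2, NA, 2.4))   #   warning message flags
save(d, file = "daisy-example.rda")          # file to attach privately
# Then report the exact failing call, e.g.:
# library(cluster); daisy(d, stand = TRUE)
```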
[R] Re: R-help Digest, Vol 18, Issue 12
The message for aov1 was Estimated effects may be unbalanced. The effects are not unbalanced. The design is 'orthogonal'. The problem is that there are not enough degrees of freedom to estimate all those error terms. If you change the model to: aov1 - aov(RT~fact1*fact2*fact3+Error(sub/(fact1+fact2+fact3)),data=myData) or to aov2 - aov(RT~fact1*fact2*fact3+Error(sub/ ((fact1+fact2+fact3)^2)),data=myData) all is well. This last model (aov2) seems to me to have an excessive number of error terms. The lme model lme(RT~fact1*fact2*fact3, random=~1|sub, data=myData) is equivalent to aov0 - aov(RT~fact1*fact2*fact3+Error(sub), data=myData) It can be verified that the estimated variance components can be used to reproduce the mean squares in the anova table. A conservative approach is to estimate e.g. contrasts involving fact1 for each subject separately, then comparing these against SE estimates that have 4df (5 subjects -1). This is the ultimate check -- are claimed effects consistent against the 5 subjects? The standard errors should though, probably be based on some variety of averaged variance. I do not know how to set up the equivalents of aov1 and aov2 (where the errors are indeed crossed) in lme4. John Maindonald. On 12 Aug 2004, at 8:03 PM, [EMAIL PROTECTED] wrote: I will follow the suggestion of John Maindonald and present the problem by example with some data. I also follow the advice to use mean scores, somewhat reluctantly though. I know it is common practice in psychology, but wouldnt it be more elegant if one could use all the data points in an analysis? The data for 5 subjects (myData) are provided at the bottom of this message. It is a crossed within-subject design with three factors and reaction time (RT) as the dependent variable. 
An initial repeated-measures model would be: aov1-aov(RT~fact1*fact2*fact3+Error(sub/ (fact1*fact2*fact3)),data=myData) Aov complains that the effects involving fact3 are unbalanced: aov1 Stratum 4: sub:fact3 Terms: fact3 Residuals Sum of Squares 4.81639e-07 5.08419e-08 Deg. of Freedom 2 8 Residual standard error: 7.971972e-05 6 out of 8 effects not estimable Estimated effects may be unbalanced Presumably this is because fact3 has three levels and the other ones only two, making this a non-orthogonal design. I then fit an equivalent mixed-effects model: lme1-lme(RT~fact1*fact2*fact3,data=meanData2,random=~1|sub) Subsequently I test if my factors had any effect on RT: anova(lme1) numDF denDF F-value p-value (Intercept) 144 105294024 .0001 fact1 14422 .0001 fact2 144 7 0.0090 fact3 24419 .0001 fact1:fact2 144 9 0.0047 fact1:fact3 244 1 0.4436 fact2:fact3 244 1 0.2458 fact1:fact2:fact3 244 0 0.6660 Firstly, why are the F-values in the output whole numbers? The effects are estimated over the whole range of the dataset and this is so because all three factors are nested under subjects, on the same level. This, however, seems to inflate the F-values compared to the anova(aov1) output, e.g. anova(aov1) Error: sub:fact2 Df Sum SqMean Sq F value Pr(F) fact2 1 9.2289e-08 9.2289e-08 2.2524 0.2078 Residuals 4 1.6390e-07 4.0974e-08 I realize that the (probably faulty) aov model may not be directly compared to the lme model, but my concern is if the lme estimation of the effects is right, and if so, how can a nave skeptic be convinced of this? The suggestion to use simulate.lme() to find this out seems good, but I cant seem to get it working (from [R] lme: simulate.lme in R it seems it may never work in R). I have also followed the suggestion to fit the exact same model with lme4. However, format of the anova output does not give me the estimation in the way nlme does. 
More importantly, the degrees of freedom in the denominator don't change; they're still large:

  library(lme4)
  lme4_1 <- lme(RT ~ fact1*fact2*fact3, random = ~1|sub, data = myData)
  > anova(lme4_1)
  Analysis of Variance Table
                       Df    Sum Sq   Mean Sq Denom F value    Pr(>F)
  fact1I                1 2.709e-07 2.709e-07    48 21.9205 2.360e-05 ***
  fact2I                1 9.229e-08 9.229e-08    48  7.4665  0.008772 **
  fact3L                1 4.906e-08 4.906e-08    48  3.9691  0.052047 .
  fact3M                1 4.326e-07 4.326e-07    48 34.9972 3.370e-07 ***
  fact1I:fact2I         1 1.095e-07 1.095e-07    48  8.8619  0.004552 **
  fact1I:fact3L         1 8.988e-10 8.988e-10    48  0.0727  0.788577
  fact1I:fact3M         1 1.957e-08 1.957e-08    48  1.5834  0.214351
  fact2I:fact3L         1 3.741e-09 3.741e-09    48  0.3027  0.584749
  fact2I:fact3M         1 3.207e-08 3.207e-08    48  2.5949  0.113767
  fact1I:fact2I:fact3L  1 2.785e-09 2.785e-09    48  0.2253  0.637162
  fact1I:fact2I:fact3M  1 7.357e-09 7.357e-09    48  0.5952  0.444206
  ---

I hope that by providing a sample of the data someone can help me out on the questions I asked in my previous mail: 1. When aov's assumptions are violated, can lme provide the right model for within-subjects designs where multiple fixed effects are NOT hierarchically ordered? 2. Are the degrees of freedom in anova(lme1) the right ones to report?
RE: [R] Help with generating data from a 'not quite' Normal distribution
Consider using the HyperbolicDist package. With the package you can both fit the hyperbolic distribution to your data and generate random numbers from the distribution. Hyperbolic distributions provide a good fit to financial returns, which commonly exhibit high peaks and heavy tails.

Hannu Kahra

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Crabb, David
Sent: Thursday, August 12, 2004 10:44 AM
To: [EMAIL PROTECTED]
Subject: [R] Help with generating data from a 'not quite' Normal distribution

I would be very grateful for any help from members of this list for what might be a simple problem... We are trying to simulate the behaviour of a clinical measurement in a series of computer experiments. This is simple enough to do in R if we assume the measurements to be Gaussian, but their empirical distribution has a much higher peak at the mean and much longer tails. (The distribution is quite symmetrical.) Can anyone suggest any distributions I could fit to this data, and better still, how I can then generate random data from this 'distribution' using R?

---
Dr. David Crabb
School of Science, The Nottingham Trent University,
Clifton Campus, Nottingham. NG11 8NS
Tel: 0115 848 3275 Fax: 0115 848 6690

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
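Another standard family with exactly those features (symmetric, sharper peak, heavier tails than the normal) is a location-scale Student t; this is a suggestion of mine, not part of the original exchange, and the parameter values below are invented for illustration:

```r
## Simulate from a location-scale t distribution; smaller df means
## heavier tails.  In practice mu, s and df would be fitted to the
## empirical data, e.g. by maximum likelihood (MASS::fitdistr(x, "t")).
mu <- 100; s <- 5; df <- 4
x <- mu + s * rt(1000, df = df)
qqnorm(x)   # tails visibly heavier than a normal Q-Q line
```

The hyperbolic distribution mentioned above is the other natural candidate; the t is simply the one available in base R without extra packages.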
Re: [R] Giving a first good impression of R to Social Scientists
On Thu, 12 Aug 2004, Rau, Roland wrote: That is why I would like to ask the experts on this list if anyone of you has encountered a similar experience and what you could advise to persuade people quickly that it is worth learning a new software? One problem is that it may not be true. Unless these people are going to be doing their own statistics in the future (which is probably true only for a minority), they might actually be better off with a point-and-click interface. I'm (obviously) not arguing that SPSS is a better statistical environment than R, but it is easier to learn, and in 10 or 15 weeks they may not get to see the benefits of R. -thomas
Re: [R] Giving a first good impression of R to Social Scientists
Thomas Lumley wrote: On Thu, 12 Aug 2004, Rau, Roland wrote: That is why I would like to ask the experts on this list if anyone of you has encountered a similar experience and what you could advise to persuade people quickly that it is worth learning a new software?

The usual way of teaching R seems to be bottom-up. Here's the command prompt, type some arithmetic, make some assignments, learn about function calls and arguments, write your own functions, write your own packages. Perhaps a top-down approach might help in certain cases. People using point-n-click packages tend to use a limited range of analyses. Write some functions that do these analyses, or give them wrappers, so that they get something like:

  myData = readDataFile("foo.dat")
  Read 4 variables: Z, Age, Sex, Disease
  analyseThis(myData, response=Z, covariate=Age)
  Z = 0.36 * Age, Significance level = 0.932

or whatever. Really spoon-feed the things they need to do. Make it really easy, foolproof. Then show them what's behind the analyseThis() function. How it's not even part of the R distribution. How easy you made it for a beginner to do a complex and novel analysis. Then maybe it'll click for them, and they'll see how having a programming language behind their statistics functions lets them explore in ways not thought possible with the point-n-click paradigm. Perhaps they'll start editing analyseThis() and write analyseThat(), start thinking for themselves. Or maybe they'll just stare at you blankly... Baz
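A toy version of such a wrapper might look like this (all names here are made up for illustration; Barry's mock session did not come with an implementation):

```r
## Hypothetical spoon-feeding wrapper around lm(), as described above.
## Takes column names as strings and prints a one-line summary.
analyseThis <- function(data, response, covariate) {
  fit <- lm(reformulate(covariate, response), data = data)
  slope <- coef(fit)[2]
  p <- summary(fit)$coefficients[2, 4]
  cat(response, "=", round(slope, 2), "*", covariate,
      ", significance level =", signif(p, 3), "\n")
  invisible(fit)   # the full fit is still there for the curious
}

analyseThis(cars, response = "dist", covariate = "speed")
```

Returning the fit invisibly is the hook for the second act: once the student asks "what else is in there?", summary(fit) opens the door to real R.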
[R] truly object oriented programming in R
Good morning! I recently implemented a KD tree in JAVA for faster kernel density estimation (part of the code follows). It went well. To hook it with R, however, has proved more difficult. My question is: is it possible to implement the algorithm in R? My impression seems to indicate no, as the code requires a complete class-object framework that R does not support. But is there an R package or something that may make it possible? Thanks in advance for your help.

Java implementation of KD tree:

public class Kdnode {
    private double[] center;   // center of the bounding box
    private double diameter;   // maximum distance from center to anywhere within the bounding box
    private int numOfPoints;   // number of source data points in the bounding box
    private Kdnode left, right;

    public Kdnode(double[][] points, int split_dim, int[][] sortedIndices, double[][] bBox) {
        // bBox: the bounding box, 1st row the lower bound, 2nd row the upper bound
        numOfPoints = points.length;
        int d = points[0].length;
        center = new double[d];
        for (int j = 0; j < d; j++) center[j] = (bBox[0][j] + bBox[1][j]) / 2.;
        diameter = get_diameter(bBox);
        if (numOfPoints == 1) {
            diameter = 0.;
            for (int j = 0; j < d; j++) center[j] = points[0][j];
            left = null;
            right = null;
        } else {
            int middlePoint = sortedIndices[split_dim][numOfPoints/2];
            double splitValue = points[middlePoint][split_dim];
            middlePoint = sortedIndices[split_dim][numOfPoints/2 - 1];
            double splitValue_small = points[middlePoint][split_dim];
            int left_size = numOfPoints/2;
            int right_size = numOfPoints - left_size;
            double[][] leftPoints = new double[left_size][d];
            double[][] rightPoints = new double[right_size][d];
            int[][] leftSortedIndices = new int[d][left_size];
            int[][] rightSortedIndices = new int[d][right_size];
            int left_counter = 0, right_counter = 0;
            int[] splitInfo = new int[numOfPoints];
            for (int i = 0; i < numOfPoints; i++) {
                if (points[i][split_dim] < splitValue) {
                    for (int j = 0; j < d; j++) leftPoints[left_counter][j] = points[i][j];
                    splitInfo[i] = right_counter;
                    left_counter++;
                } else {
                    for (int j = 0; j < d; j++) rightPoints[right_counter][j] = points[i][j];
                    splitInfo[i] = left_counter;
                    right_counter++;
                }
            }
            // modify appropriately the indices to correspond to the new lists
            for (int i = 0; i < d; i++) {
                int left_index = 0, right_index = 0;
                for (int j = 0; j < numOfPoints; j++) {
                    if (points[sortedIndices[i][j]][split_dim] < splitValue)
                        leftSortedIndices[i][left_index++] = sortedIndices[i][j] - splitInfo[sortedIndices[i][j]];
                    else
                        rightSortedIndices[i][right_index++] = sortedIndices[i][j] - splitInfo[sortedIndices[i][j]];
                }
            }
            // Recursively compute the kdnodes for the points in the two split spaces
            double[][] leftBBox = new double[2][];
            double[][] rightBBox = new double[2][];
            for (int i = 0; i < 2; i++) {
                leftBBox[i] = (double[]) bBox[i].clone();
                rightBBox[i] = (double[]) bBox[i].clone();
            }
            leftBBox[1][split_dim] = splitValue_small;
            rightBBox[0][split_dim] = splitValue;
            int next_dim = (split_dim + 1) % d;
            left = new Kdnode(leftPoints, next_dim, leftSortedIndices, leftBBox);
            right = new
Re: [R] truly object oriented programming in R
Jason Liao jg_liao at yahoo.com writes:
: Good morning! I recently implemented a KD tree in JAVA for faster
: kernel density estimation (part of the code follows). It went well. To
: hook it with R, however, has proved more difficult. My question is: is
: it possible to implement the algorithm in R? My impression seems to
: indicate no, as the code requires a complete class-object framework that
: R does not support. But is there an R package or something that may
: make it possible? Thanks in advance for your help.

R comes with the S3 and S4 object systems out of the box, and there is an add-on package oo.R available at: http://www.maths.lth.se/help/R/R.classes/ that provides a more conventional OO system. It's likely that one or more of these would satisfy your requirements.
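For a quick taste of how lightweight even the S3 system is, a class is just an attribute plus a method named print.<class> (example mine, not from the reply; the kdnode fields are invented):

```r
## Minimal S3 example: tag an object with a class and give it a print method.
node <- structure(list(center = c(0, 0), n = 1), class = "kdnode")
print.kdnode <- function(x, ...) cat("kdnode holding", x$n, "point(s)\n")
node   # auto-printing dispatches to print.kdnode
```

S4 (in the methods package) adds formal class definitions and multiple dispatch on top of this, which is closer to what a Java programmer would expect.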
RE: [R] linear constraint optim with bounds/reparametrization
From Spencer Graves: However, for an equality constraint, I've had good luck with an objective function that adds something like the following to my objective function: constraintViolationPenalty*(A%*%theta-c)^2, where constraintViolationPenalty is passed via ... in a call to optim. I applied Spencer's suggestion to a set of eight different constrained portfolio optimization problems. It seems to give a usable, practical way to solve the portfolio problem when the QP optimizer is not applicable. After all, practical portfolio management is more an art than a science. I may first run optim with a modest value for constraintViolationPenalty, then restart it with the output of the initial run as starting values and with a larger value for constraintViolationPenalty. I wrote a loop that starts with a small value for the penalty and stops when the change of the function value, when increasing the penalty, is less than epsilon. I found that epsilon = 1e-06 provides a reasonable accuracy with respect to computational time. Spencer, many thanks for your suggestion. Hannu Kahra
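The restart loop Hannu describes might be sketched as follows (this is my reconstruction, not his code; obj, A, c0 and theta0 are placeholders for the actual portfolio objective, constraint matrix, constraint vector and starting values):

```r
## Penalty method for the equality constraint A %*% theta == c0:
## increase the penalty, restarting optim() from the previous solution,
## until the objective value stabilises.
penalised <- function(theta, penalty)
  obj(theta) + penalty * sum((A %*% theta - c0)^2)

theta <- theta0; penalty <- 1; oldValue <- Inf; eps <- 1e-06
repeat {
  fit <- optim(theta, penalised, penalty = penalty)  # penalty passed via ...
  if (abs(oldValue - fit$value) < eps) break
  theta <- fit$par; oldValue <- fit$value
  penalty <- penalty * 10
}
```

Warm-starting each run from the previous solution is what keeps the large-penalty runs numerically tractable.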
Re: [R] error using daisy() in library(cluster). Bug?
[Reverted back to R-help, after private exchange]

>>> MM == Martin Maechler [EMAIL PROTECTED] on Thu, 12 Aug 2004 17:12:01 +0200 writes:
>>> javier == javier garcia - CEBAS [EMAIL PROTECTED] on Thu, 12 Aug 2004 16:28:27 +0200 writes:

javier> Martin; Yes, I know that there are variables with all five values 'NA'. I've left them as they are just because of saving a couple of lines in the script, and because I like to see that they are there, although all values are 'NA'. I don't expect they are used in the analysis, but are they the source of the problem?

MM> yes, but only because of stand = TRUE. Yes, one could imagine that it might be good if standardizing these all-NA variables would work. I'll think a bit more about it. Thank you for the example.

Ok. I've thought (and looked at the R code) a bit longer. Also considered the fact (you mentioned) that this worked in R 1.8.0. Hence, I'm considering the current behavior a bug. Here is the patch (apply to cluster/R/daisy.q in the *source*, or at the appropriate place in cluster_installed/R/cluster):

--- daisy.q	2004/06/25 16:17:47	1.17
+++ daisy.q	2004/08/12 15:23:26
@@ -78,8 +78,8 @@
     if(all(type2 == "I")) {
 	if(stand) {
 	    x <- scale(x, center = TRUE, scale = FALSE) #- 0-means
-	    sx <- colMeans(abs(x))
-	    if(any(sx == 0)) {
+	    sx <- colMeans(abs(x), na.rm = TRUE)# can still have NA's
+	    if(0 %in% sx) {
 		warning(sQuote("x"), " has constant columns ",
 			pColl(which(sx == 0)), "; these are standardized to 0")
 		sx[sx == 0] <- 1

Thank you for helping to find and fix this bug.

Martin Maechler, ETH Zurich, Switzerland

javier> On Thu, 12 Aug 2004 15:11, MM wrote: Javier, I could well read your .RData and try your script to produce the same error from daisy(). Your dataframe is of dimension 5 x 180 and has many variables that have all five values 'NA' (see below). You can't expect to use these, do you?
Martin
Re: [R] RE: Giving a first good impression of R to Social Scientists
Rau, Roland Rau at demogr.mpg.de writes: Yes, I do know the R Commander. But I did not want to give them a GUI, but rather expose them to the command line after I demonstrated that the steep learning curve in the beginning is worth the effort for the final results. Note that Rcmdr displays all the underlying generated R code that does the analysis as it runs, so you are exposed to the command line. This might pique the interest of students wishing to learn more, while giving an easy-to-use and immediately useful environment for those who just want to get results in the shortest, most direct fashion.
Re: [R] Approaches to using RUnit
On Tue, Aug 10, 2004 at 04:53:49PM +0200, Klaus Juenemann wrote: If you don't organize your code into packages but source individual R files, your approach of sourcing the code at the beginning of a test file looks like the right thing to do. Appears to be working pretty well for me too ;-) We mainly use packages, and the code we use to test packages A and B, say, looks like SNIP We use the tests subdirectory of a package to store our RUnit tests, even though this is not really according to R conventions. In an off-list exchange with A.J. Rossini, we discussed an alternative for using RUnit in a package. The idea was to put the runit_*.R files (containing test code) into somePackage/inst/runit/ and then put a script, say dorunit.R, inside somePackage/tests/ that would create the test suites, similar to the code you included in your mail. The advantage of this would be that the unit tests would run under R CMD check. In the next week or so I hope to package-ify some code and try this out. + seth
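Under that layout, the dorunit.R driver might look roughly like this (a sketch of my own; the package name and file regexp are illustrative, not from the thread):

```r
## tests/dorunit.R: run RUnit tests stored in somePackage/inst/runit/
## so that they execute during R CMD check.
library(RUnit)
suite <- defineTestSuite("somePackage unit tests",
                         dirs = system.file("runit", package = "somePackage"),
                         testFileRegexp = "^runit_.*\\.R$")
result <- runTestSuite(suite)
printTextProtocol(result)
```

system.file() resolves inst/runit/ to wherever the installed package lives, which is what makes the scheme work under R CMD check.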
RE: [R] Giving a first good impression of R to Social Scientists
From: Barry Rowlingson

Thomas Lumley wrote: On Thu, 12 Aug 2004, Rau, Roland wrote: That is why I would like to ask the experts on this list if anyone of you has encountered a similar experience and what you could advise to persuade people quickly that it is worth learning a new software?

The usual way of teaching R seems to be bottom-up. Here's the command prompt, type some arithmetic, make some assignments, learn about function calls and arguments, write your own functions, write your own packages. Perhaps a top-down approach might help in certain cases. People using point-n-click packages tend to use a limited range of analyses. Write some functions that do these analyses, or give them wrappers, so that they get something like:

  myData = readDataFile("foo.dat")
  Read 4 variables: Z, Age, Sex, Disease
  analyseThis(myData, response=Z, covariate=Age)
  Z = 0.36 * Age, Significance level = 0.932

or whatever. Really spoon-feed the things they need to do. Make it really easy, foolproof.

The problem is that the only `fool' that has been `proofed' against is the one that the developer(s) had imagined. One cannot underestimate users' ability to out-fool the developers' imagination...

Cheers,
Andy

Then show them what's behind the analyseThis() function. How it's not even part of the R distribution. How easy you made it for a beginner to do a complex and novel analysis. Then maybe it'll click for them, and they'll see how having a programming language behind their statistics functions lets them explore in ways not thought possible with the point-n-click paradigm. Perhaps they'll start editing analyseThis() and write analyseThat(), start thinking for themselves. Or maybe they'll just stare at you blankly... Baz
Re: [R] truly object oriented programming in R
For an overview of the OOP R package, see http://cran.r-project.org/doc/Rnews/Rnews_2001-3.pdf + seth
Re: [R] truly object oriented programming in R
On Thu, 12 Aug 2004, Jason Liao wrote: Good morning! I recently implemented a KD tree in JAVA for faster kernel density estimation (part of the code follows). It went well. To hook it with R, however, has proved more difficult. My question is: is it possible to implement the algorithm in R? My impression seems to indicate no as the code requires a complete class-object framework that R does not support. But is there an R package or something that may make it possible? Thanks in advance for your help. This code doesn't seem to have any requirement for a class-object framework. The methods can all be written as functions, there isn't any use of inheritance or polymorphism. The data structure can then be a list. Now, you might want to make this list an object, to improve printing or to make it easier to check that the functions don't get called with arguments that aren't really k-d trees. This is well within the facilities of even the S3 method system. AFAICS the only class/object facility that Java provides and the methods package doesn't is enforcement of private methods and data, which has absolutely no impact on the complexity of programs (it can affect how easy code *maintenance* is, because it forces you to decide what is and isn't in your API, but that's a separate issue). The old S3 class system is weaker, since it doesn't support function polymorphism based on more than one of the arguments and doesn't have reliable reflectance facilities. 
-thomas

 [quoted Java implementation of the KD tree snipped; it appears in full in the original message above]
Re: [R] truly object oriented programming in R
Dear Thomas, Thank you very much again for taking the time to answer my questions. I am sorry that my knowledge of R is limited, as I have only learned what is necessary to do my work. In the KD tree we have a recursive data structure in which each node has two child nodes, and this process continues until the data points are divided. Does R's list support this recursive data structure? If yes, can you give a sample program? Regards, Jason

--- Thomas Lumley [EMAIL PROTECTED] wrote: On Thu, 12 Aug 2004, Jason Liao wrote: Good morning! I recently implemented a KD tree in JAVA for faster kernel density estimation (part of the code follows). It went well. To hook it with R, however, has proved more difficult. My question is: is it possible to implement the algorithm in R? My impression seems to indicate no, as the code requires a complete class-object framework that R does not support. But is there an R package or something that may make it possible? Thanks in advance for your help.

This code doesn't seem to have any requirement for a class-object framework. The methods can all be written as functions; there isn't any use of inheritance or polymorphism. The data structure can then be a list. Now, you might want to make this list an object, to improve printing or to make it easier to check that the functions don't get called with arguments that aren't really k-d trees. This is well within the facilities of even the S3 method system. AFAICS the only class/object facility that Java provides and the methods package doesn't is enforcement of private methods and data, which has absolutely no impact on the complexity of programs (it can affect how easy code *maintenance* is, because it forces you to decide what is and isn't in your API, but that's a separate issue). The old S3 class system is weaker, since it doesn't support function polymorphism based on more than one of the arguments and doesn't have reliable reflectance facilities.
-thomas

 [quoted Java implementation of the KD tree snipped; it appears in full in the original message above]
[R] rgl snapshot command
Hi, I am using the rgl package for 3D display. Unfortunately, I am not able to get the snapshot command running. I tried the following:

  > example(rgl.surface)
  rgl.sr> data(volcano)
  rgl.sr> y <- 2 * volcano
  rgl.sr> x <- 10 * (1:nrow(y))
  rgl.sr> z <- 10 * (1:ncol(y))
  rgl.sr> ylim <- range(y)
  rgl.sr> ylen <- ylim[2] - ylim[1] + 1
  rgl.sr> colorlut <- terrain.colors(ylen)
  rgl.sr> col <- colorlut[y - ylim[1] + 1]
  rgl.sr> rgl.clear()
  rgl.sr> rgl.surface(x, z, y, color = col)
  > rgl.snapshot(filename = "./volcano.png", fmt = "png")
  [1] failed

Any help is highly appreciated
Joern

__
Joern Kamradt, MD
Cancer Genetic Branch
National Human Genome Research Institute
National Institutes of Health
50 South Drive, building 50, room 5147
Bethesda, MD 20892-8000, USA
Phone#: +1 (301) 496 5382 FAX#: +1 (301) 402 3241
Email: [EMAIL PROTECTED] Website: www.genome.gov
[R] [R-pkgs] new package fortunes 1.0-3
Dear useRs, I used the summer months to work on all of my packages, and so this is the first of a sequence of announcements of new or updated packages. The new packages are new in the sense that previous versions had been on CRAN for some months but hadn't been announced to the R community via this list until now. All packages are available from the CRAN master site in source form - binary versions should be available from the mirrors in the next days.

So the first announcement is for fortunes 1.0-3: The fortunes package provides simple infrastructure for reading fortunes from a .csv file and displaying them. Furthermore, it contains a growing list of fortunes related to R, mainly collected from the mailing lists but also from quotes at conferences. The author list contains me (as I've written the R code) as well as the people who contributed quotes by sending me a mail. The original authors of each quote are always given in the respective fortune. For those of you who want to see a quote each time they start up R: you can add to your .Rprofile something like

  if(interactive()) { library(fortunes); fortune() }

If you want to create your own list of fortunes, you can simply add another fortune collection in .csv format. Of course, it would also be great if you could contribute some quotes to the package... simply send me an e-mail. Enjoy! Z

-
Package: fortunes
Version: 1.0-3
Date: 2004-08-10
Title: R Fortunes
Author: Achim Zeileis, fortune contributions from Torsten Hothorn, Peter Dalgaard, Uwe Ligges, Kevin Wright, Martin Maechler
Maintainer: Achim Zeileis [EMAIL PROTECTED]
Description: R Fortunes
Depends: R (>= 1.4.0)
License: GPL

___
R-packages mailing list
[EMAIL PROTECTED]
https://stat.ethz.ch/mailman/listinfo/r-packages
[R] [R-pkgs] new package sandwich 0.1-3
Dear useRs, here is the announcement for the next new package: sandwich 0.1-3. sandwich provides heteroskedasticity (and autocorrelation) consistent covariance matrix estimators (also called HC and HAC estimators). The former are implemented in the function vcovHC() (which was available in strucchange before - and independently in hccm() in John Fox's car package). The latter are implemented in the function vcovHAC(). This implements sandwich-type estimators in a rather flexible way, allowing for user-defined weights or weight functions. It builds on some of the functionality which was available before in Thomas Lumley's weave package (not on CRAN). In particular, it makes available the class of WEAVE estimators introduced by Lumley & Heagerty (1999) in the function weave(), which is a convenience interface to vcovHAC(). Furthermore, it implements the class of kernel HAC estimators with automatic bandwidth selection of Andrews (1991) in the function kernHAC(), which is again a convenience interface to vcovHAC(). Best wishes, Z

-
Package: sandwich
Version: 0.1-3
Date: 2004-07-19
Title: Robust Covariance Matrix Estimators
Author: Thomas Lumley, Achim Zeileis
Maintainer: Achim Zeileis [EMAIL PROTECTED]
Description: Model-robust standard error estimators for time series and longitudinal data.
Depends: zoo, R (>= 1.5.0)
License: GPL version 2
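A minimal illustration of the vcovHC() function announced above (the model and data are chosen here only for demonstration, they are not from the announcement):

```r
## Heteroskedasticity-consistent covariance for an ordinary lm() fit.
library(sandwich)
fit <- lm(dist ~ speed, data = cars)
vcovHC(fit)               # HC covariance matrix estimate
sqrt(diag(vcovHC(fit)))   # robust standard errors for the coefficients
```

These robust standard errors can then replace the usual ones in Wald tests, e.g. via coeftest() in the lmtest package.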
[R] [R-pkgs] new package zoo 0.2-0
Dear useRs, yet another new package: zoo 0.2-0. zoo provides a simple S3 class and methods for totally ordered indexed observations such as irregular time series. Although there are other packages for irregular time series available on CRAN (Giles Heywood's its package and irts() in Adrian Trapletti's tseries package), I wrote this package because I needed something which provides simple infrastructure for observations with (almost) arbitrary indexes (and not only POSIXct time stamps as in its() and irts()). And it was at least also useful for Gabor Grothendieck, who provided most of the updates to this version. Best wishes, Z

Package: zoo
Version: 0.2-0
Date: 2004-08-12
Title: Z's ordered observations
Author: Achim Zeileis, Gabor Grothendieck
Maintainer: Achim Zeileis [EMAIL PROTECTED]
Description: A class with methods for totally ordered indexed observations such as irregular time series.
Depends: R (>= 1.7.0)
License: GPL
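A small illustration of the arbitrary-index series zoo allows (the example and numbers are mine, not from the announcement):

```r
## Observations indexed by arbitrary ordered values, not just time stamps.
library(zoo)
z <- zoo(c(2.1, 0.7, 1.3, 4.4), order.by = c(1.1, 2.7, 3.0, 5.2))
coredata(z)              # the observations
index(z)                 # the (irregular) ordering index
merge(z, lag(z, k = -1)) # align a series with its lagged version
```

The index can be any totally ordered type, which is the point of the package compared to the POSIXct-only alternatives mentioned above.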
[R] [R-pkgs] updated package strucchange 1.2-4
Dear useRs, the strucchange package for testing for structural change has been updated: the current version is 1.2-4. The most significant additions were two functions, gefp() and efpFunctional(). gefp() implements a class of generalized M-fluctuation tests for testing for parameter instability or structural change in general parametric models, including generalized linear models (GLMs). efpFunctional() provides infrastructure for inference based on functionals applied to empirical fluctuation processes, such as automatic tabulation of critical values and a choice of a suitable visualization method. The theory behind both functions is described in Zeileis & Hornik (2003); the implementation ideas are explained in Zeileis (2004). Links to both papers are available from my web page: http://www.ci.tuwien.ac.at/~zeileis/ Best wishes, Z

Package: strucchange
Version: 1.2-4
Date: 2004-08-10
Title: Testing for Structural Change
Author: Achim Zeileis, Friedrich Leisch, Bruce Hansen, Kurt Hornik, Christian Kleiber, Andrea Peters
Maintainer: Achim Zeileis [EMAIL PROTECTED]
Description: Testing, dating and monitoring of structural change in linear regression relationships. strucchange features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
Depends: sandwich, zoo, R (>= 1.5.0)
License: GPL
[R] [R-pkgs] updated package ineq 0.2-4
Dear useRs, my last announcement is an update of the ineq package for measuring inequality, concentration and poverty. The current version is now 0.2-4. Thanks to suggestions from Rein Halbersma, the Pen() function for plotting Pen's parade was improved and now allows for much more flexibility. See the help page for examples. Best wishes, Z

Package: ineq
Version: 0.2-4
Date: 2004-08-10
Title: Measuring inequality, concentration and poverty
Author: Achim Zeileis
Maintainer: Achim Zeileis [EMAIL PROTECTED]
Description: Inequality, concentration and poverty measures; Lorenz curves (empirical and theoretical)
License: GPL
Re: [R] truly object oriented programming in R
On Thu, 12 Aug 2004, Jason Liao wrote: Dear Thomas, Thank you very much again for taking time to answer my questions. I am sorry that my knowledge of R is limited, as I have only learned what is necessary to do my work. In the KD tree, we have a recursive data structure in which each node has two child nodes, and this process continues until the data points are divided. Does R's list support this recursive data structure? If yes, can you give a sample program?

Yes, the elements of a list can be lists. For example, a simple binary tree could have lists with elements left, right, key, and data:

## create a new single node
newtree <- function(key, data) {
  list(left = NULL, right = NULL, key = key, data = data)
}

## add a node to a sorted tree
addnode <- function(tree, key, data) {
  if (key <= tree$key) {
    if (is.null(tree$left)) tree$left <- newtree(key = key, data = data)
    else tree$left <- addnode(tree$left, key, data)
  } else {
    if (is.null(tree$right)) tree$right <- newtree(key = key, data = data)
    else tree$right <- addnode(tree$right, key, data)
  }
  return(tree)
}

## inorder traversal. action() is any function that takes key and data
## arguments
applyinorder <- function(tree, action) {
  c(if (!is.null(tree$left)) applyinorder(tree$left, action),
    action(tree$key, tree$data),
    if (!is.null(tree$right)) applyinorder(tree$right, action))
}

## an example
a <- newtree("R", "two optional method systems and first-class functions")
a <- addnode(a, "Java", "compulsory object system")
a <- addnode(a, "C", "No built-in support but that needn't stop you")
a <- addnode(a, "C++", "If C++ is your hammer, everything looks like a thumb")
applyinorder(a, function(key, data) paste(key, data, sep = ": "))
[1] "C: No built-in support but that needn't stop you"
[2] "C++: If C++ is your hammer, everything looks like a thumb"
[3] "Java: compulsory object system"
[4] "R: two optional method systems and first-class functions"
Re: [R] truly object oriented programming in R
On Thu, 12 Aug 2004, Seth Falcon wrote: The thing that's very different from, say, Java is that everything is an object in R --- there isn't a notion of a *reference* to an object, which is why in the above I had to say head <- insertNode(...), whereas in Java you could pass in a reference to head and have the method modify what it points to. I think there are some ways around this, at least syntactically, using various tricks with environment(), but I don't yet understand them well enough to comment further. Yes, and there is support in packages for other object systems in addition to the two built-in ones. Some of us feel that if you want Java you know where to find it... -thomas
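The environment() trick Seth alludes to can be made concrete: an environment is the one kind of R object that behaves like a mutable reference, so storing a node's fields in an environment lets a function modify the node in place. A minimal sketch (the names new_node and insert_after are invented for illustration):

```r
## each node is an environment, so it acts like a mutable reference
new_node <- function(value) {
  node <- new.env()
  node$value <- value
  node$nxt <- NULL
  node
}

## modifies 'node' in place: no need for head <- insert_after(...)
insert_after <- function(node, value) {
  newn <- new_node(value)
  newn$nxt <- node$nxt
  node$nxt <- newn
  invisible(newn)
}

head <- new_node(1)
insert_after(head, 2)
head$nxt$value  # 2: 'head' was updated without any reassignment
```

The price is that environments do not print or copy like lists, which is one reason many R programmers prefer the functional head <- insertNode(...) style.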
[R] .Random.seed error
I have this snippet of code from an example in Dr. Harrell's book Regression Modeling Strategies, p. 501:

n <- 2000
.Random.seed <- c(49,39,17,36,23,0,43,51,6,54,50,1)
age <- 50 + 12 * rnorm(n)
age

I get the error message: Error in rnorm(n) : .Random.seed[1] is NOT a valid RNG kind (code) I have tried this on Windows and Linux R versions 1.8.1, 1.9.0, and 1.9.1. If I comment out the .Random.seed line and call set.seed(49), set.seed(39) etc. before each call to a random generator function, everyone is HAPPY. Does anyone have a suggestion? Richard dot urbano At Vanderbilt dot edu
Re: [R] .Random.seed error
On Thu, 12 Aug 2004, Richard Urbano wrote: I have this snippet of code from an example in Dr. Harrell's book Regression Modeling Strategies p 501 That's in a section called `S-PLUS Examples'. ^^

n <- 2000
.Random.seed <- c(49,39,17,36,23,0,43,51,6,54,50,1)
age <- 50 + 12 * rnorm(n)
age

I get the error message: Error in rnorm(n) : .Random.seed[1] is NOT a valid RNG kind (code) I have tried this on Windows and Linux R versions 1.8.1, 1.9.0, and 1.9.1 But you did not try the S-PLUS examples in S-PLUS. If I comment out the .Random.seed line and call set.seed(49), set.seed(39) etc before each call to a random generator function, everyone is HAPPY. Does anyone have a suggestion? Don't confuse S-PLUS and R. Read the R documentation before posting, as the posting guide asks -- see ?.Random.seed for the explanation of the format of .Random.seed. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA), Fax: +44 1865 272595, 1 South Parks Road, Oxford OX1 3TG, UK
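For the record, the portable R idiom is to let set.seed() build a valid .Random.seed rather than assigning the S-PLUS seed vector directly:

```r
set.seed(49)             # any single integer; R constructs .Random.seed itself
n <- 2000
age <- 50 + 12 * rnorm(n)

set.seed(49)             # resetting the seed reproduces the same draws
age2 <- 50 + 12 * rnorm(n)
identical(age, age2)     # TRUE
```

This also survives changes of the RNG kind, which a hand-written .Random.seed does not.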
[R] Fwd: Timebased predictions in postgresql.
Hi r-people :) I'm sorry to disturb, but I must admit that I know amazingly little about R and similar statistics packages/languages, and I'm kind of lost on where to start. I'm currently working on a data-collection framework for PostgreSQL (the db doesn't really matter, except that I hope to use PL/R) and I would like to be able to predict future values, preferably 1 day or more ahead. The highest resolution on the historic data is 4 minutes, but I'm already resampling that to whatever I need, so if it would be better to use 30 min or another resolution (because of performance) that would be perfectly ok. The types of statistics in the database are typically network IO, CPU usage, temperature, etc., and I will rarely have holes in the history. Can anyone tell me how (or give me a hint) I can predict traffic or similar maybe one or more days ahead when I have the data xx days back? (And how many days / which interval would be optimal for best performance/precision?) You can also tell me it's impossible, but I think it could be really cool to present graphs of expected CPU load or network IO to our users. Can it be done in R (and PL/R)? Best regards, Nicolai Petri catpipe Systems Aps
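One starting point within base R is to treat the history as a seasonal time series and use Holt-Winters exponential smoothing, which captures the strong daily cycle typical of load data. A hedged sketch: the vector load below is simulated stand-in data; in practice it would be your half-hourly samples pulled from the database.

```r
## stand-in for 20 days of 30-minute samples with a daily cycle
load <- pmax(0, 50 + 20 * sin(2 * pi * (1:960) / 48) + rnorm(960, sd = 5))

load.ts <- ts(load, frequency = 48)   # 48 half-hour slots per day
fit <- HoltWinters(load.ts)           # level + trend + daily seasonal component
pred <- predict(fit, n.ahead = 48)    # forecast one full day ahead
plot(fit, pred)
```

How much history to keep is an empirical question; a few weeks is usually enough for a stable daily (and weekly) pattern, and more data mostly costs fitting time.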
RE: [R] hclust-segmentation fault
Thanks a lot for the assistance. It was an installation mistake - apparently we did not have BLAS libraries installed. The compiler did not seem to have been a problem in this case, although it is an older version (gcc 3.3.1). Mario.

-Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: Monday, August 09, 2004 11:51 AM To: Medvedovic, Mario (medvedm) Cc: '[EMAIL PROTECTED]' Subject: RE: [R] hclust-segmentation fault On Mon, 9 Aug 2004, Medvedovic, Mario (medvedm) wrote: Well, the use of a debugger will take some time, but here is a simple piece of code that invariably causes the fault. Mario.

indata <- matrix(rnorm(1000, 0, 1), ncol = 10)
ed <- dist(indata)
hc.e <- hclust(ed, "average")

Works fine on R 1.9.1 on our dual Opteron 248 under FC2. We know of some pertinent compiler bugs on x86_64, so is this gcc 3.3.3 or later?

-Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: Monday, August 09, 2004 11:14 AM To: Medvedovic, Mario (medvedm) Cc: '[EMAIL PROTECTED]' Subject: Re: [R] hclust-segmentation fault On Mon, 9 Aug 2004, Medvedovic, Mario (medvedm) wrote: I am getting a Segmentation fault when using hclust in R-1.9.1 running under SuSE 9.0 64-bit kernel on a dual Opteron system with 8G of RAM. I was wondering if anybody could offer any insight? Please try to use the debugger to supply more information, or give us some code we can run on a similar system to see if we can reproduce the segfault.
[R] Error Using pm.getabst()
R Users: After installing Bioconductor, RSXML and all the relevant Win32 DLLs (libxml2, zlib, iconv), I receive the following error message when using pm.getabst(): Error in xmlRoot(absts) : no applicable method for xmlRoot I receive this when using the example from help(pm.getabst). Downloading the target XML file, parsing it with xmlTreeParse and applying xmlRoot returns no error. Your thoughts/suggestions are appreciated. Mark Larsen
[R] correlation structures in NLME
I am using the latest version of R on a Windows machine and get the following error when I try to initialize a correlation structure with the function corAR1 in nlme. This example is taken from the book of Pinheiro and Bates, so it should work. What is going wrong?

library(nlme)
data(Orthodont)
cs1AR1 <- corAR1(0.8, form = ~ 1 | Subject)
cs1AR1 <- initialize(cs1AR1, data = Orthodont)
Error in methodsPackageMetaName(C, name) : The name of the object (e.g,. a class or generic function) to find in the meta-data must be a single string (got a character vector of length 2) In addition: Warning message: the condition has length > 1 and only the first element will be used in: if (!is.na(match(Class, .BasicClasses))) return(newBasic(Class,

Thank you! Michael Jerosch-Herold
Re: [R] correlation structures in NLME
On Thu, 12 Aug 2004, Michael Jerosch-Herold wrote: I am using the latest version of R on a Windows machine and get the following error when I try to initialize a correlation structure with the function corAR1 in NLME. This example is taken from the book of Pinheiro and Bates, so it should work. What is going wrong? That book is about S(-PLUS), so there is no reason why it `should work' in R.

library(nlme)
data(Orthodont)
cs1AR1 <- corAR1(0.8, form = ~ 1 | Subject)
cs1AR1 <- initialize(cs1AR1, data = Orthodont)
Error in methodsPackageMetaName(C, name) : The name of the object (e.g,. a class or generic function) to find in the meta-data must be a single string (got a character vector of length 2) In addition: Warning message: the condition has length > 1 and only the first element will be used in: if (!is.na(match(Class, .BasicClasses))) return(newBasic(Class,

Try
find("initialize")
[1] "package:methods"
so that's not right. Have you looked in the R scripts in package nlme? I think it is Initialize() in R.
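In other words, the call was dispatching to the S4 generic initialize() from the methods package rather than to nlme's own generic. A sketch of the book's example as it runs in R, assuming the R port of nlme exports Initialize() (capital I) as its initializer:

```r
library(nlme)
data(Orthodont)
cs1AR1 <- corAR1(0.8, form = ~ 1 | Subject)
cs1AR1 <- Initialize(cs1AR1, data = Orthodont)  # note the capital I
cs1AR1
```

The single-character change is the only translation needed for this example.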
Re: [R] Error Using pm.getabst()
You will almost surely do better to ask about Bioconductor packages on the Bioconductor mailing list. Next, it is helpful to know what versions of things you are using. As for your problem, did you look to see what kind of object absts is? There seems to be no default method for xmlRoot, and it is likely that the call to create the absts object failed (prior to this). You might want to try stepping through the commands, one at a time, and checking each step. Often, the problem arises because you have not properly set up your connection to the internet and so none of the querying software will work. Robert On Thu, Aug 12, 2004 at 08:20:10PM +, [EMAIL PROTECTED] wrote: R Users: After installing Bioconductor, RSXML and all the relevant Win32 DLLs (libxml2, zlib, iconv), I receive the following error message when using pm.getabst() Error in xmlRoot(absts) : no applicable method for xmlRoot I receive this when using the example from help(pm.getabst). Downloading the target XML file, parsing it with xmlTreeParse and applying xmlRoot returns no error. Your thoughts/suggestions are appreciated. Mark Larsen -- Robert Gentleman, Associate Professor, Department of Biostatistics, Harvard School of Public Health, office: M1B20, phone: (617) 632-5250, fax: (617) 632-2444, email: [EMAIL PROTECTED]
[R] How to use the whole dataset (including between events) in Cox model (time-varying covariates) ?
Hello, coxph does not use any information in the dataset between event times (or death times), since computation only occurs at event times. For instance, removing observations at times when there is no event anywhere in the dataset does not change the results:

set.seed(1)
data <- as.data.frame(cbind(start = c(1:5, 1:5, 1:4),
                            stop = c(2:6, 2:6, 2:5),
                            status = c(rep(0, 7), 1, rep(0, 5), 1),
                            id = c(rep(1, 5), rep(2, 5), rep(3, 4)),
                            x1 = rnorm(14)))
data
   start stop status id         x1
1      1    2      0  1 -0.6264538
2      2    3      0  1  0.1836433
3      3    4      0  1 -0.8356286
4      4    5      0  1  1.5952808
5      5    6      0  1  0.3295078
6      1    2      0  2 -0.8204684
7      2    3      0  2  0.4874291
8      3    4      1  2  0.7383247
9      4    5      0  2  0.5757814
10     5    6      0  2 -0.3053884
11     1    2      0  3  1.5117812
12     2    3      0  3  0.3898432
13     3    4      0  3 -0.6212406
14     4    5      1  3 -2.2146999

coxph(Surv(start, stop, status) ~ cluster(id) + x1, data = data, robust = T)
coxph(Surv(start, stop, status) ~ cluster(id) + x1,
      data = subset(data, stop %in% 4:5), robust = T)  # the same !!! (except n)

First, some data is lost. Second, this loss could be an important problem when there is a time-varying covariate that changes more quickly than the frequency of events. Specifically, I have a covariate which has low values most of the time. It sometimes jumps to high values, and that is hypothesized to greatly increase the risk of an event. With rare events, the effect of this covariate will only be measured at event times. Chances are that the only time such a covariate is recorded at a high level, the individual for which it is measured as being high is having an event. This may bias the estimated coefficient. Here is my question: how can one fully use the dataset? (That is: how to have truly _time-varying_ covariates (even if they change step by step, not continuously), not covariates whose changes are measured only at event times.) Ideally, the full dataset would be used to estimate the parameters, or at least to estimate the standard error of the estimated parameters. Any ideas?
A second best (which might require less work) would be to use the whole dataset to assess the predictive power of the model, maybe by using the expected number of events for an individual over the time interval during which they were observed to be at risk, predict(coxfit, type="expected"), and comparing it with the observed number of events. (Does that use all the data and take into account the whole baseline hazard, even between events?) Or, if not, following Brian D. Ripley's suggestion about the baseline hazard: As an approximation you can smooth the fitted baseline cumulative hazard (e.g. by package pspline) and ask for its derivative (https://stat.ethz.ch/pipermail/r-help/2004-July/052376.html) the following code could be used to estimate (and plot) a smooth baseline hazard:

t <- seq(min(data$start), max(data$stop), length = 100)
lines(t, predict(sm.spline(x = basehaz(coxfit)[, 2],
                           y = basehaz(coxfit)[, 1], norder = 2), t, 1))
# there is a problem with this code: one should add the constraint that
# the baseline hazard cannot be negative

The following computes the parametric part of the Cox model:

risk <- predict(coxfit, type = "risk")  # gives exp(X'b)

Something like baseline.hazard * risk would give the true risk at any time (but it would probably be much harder to compute), which could help assess the predictive power of the model. (Still a lot of work.) Thanks in advance for any help or comment. Mayeul KAUFFMANN Univ. Pierre Mendes France Grenoble - France
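The observed-versus-expected comparison can be sketched directly on the toy data above. This is only an illustration of predict(..., type = "expected"), assuming the survival package and the column names from the example; coxfit is the fit on the full counting-process data:

```r
library(survival)
coxfit <- coxph(Surv(start, stop, status) ~ x1 + cluster(id),
                data = data, robust = TRUE)

## expected number of events for each (start, stop] row, then per individual
expected <- predict(coxfit, type = "expected")
observed.by.id <- tapply(data$status, data$id, sum)
expected.by.id <- tapply(expected, data$id, sum)
cbind(observed = observed.by.id, expected = expected.by.id)
```

Because every row of the data contributes its own interval to the cumulative hazard, this comparison does use the between-event rows, which is exactly the property asked about.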
[R] coxph question
I have many variables to test using the Cox model (coxph), and I am only interested in those variables with a p-value less than 0.01. Is there a quick way to do this automatically instead of looking at the output for each variable? Chris I guess you need covariate selection. For a lengthy discussion of another method (AIC/BIC), look at last month's archive: https://www.stat.math.ethz.ch/pipermail/r-help/2004-July/subject.html#53519 and try using library(MASS); stepAIC(...). Most of the time, it starts by removing the covariate with the largest p-value. Mayeul KAUFFMANN Univ. Pierre Mendes France Grenoble - France
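For the literal question (screening many covariates one at a time), a loop over univariate fits is straightforward. A sketch, assuming a data frame mydata with columns time and status plus the candidate covariates named in vars (all names here are hypothetical):

```r
library(survival)
vars <- c("x1", "x2", "x3")   # candidate covariate names in 'mydata'
pvals <- sapply(vars, function(v) {
  f   <- as.formula(paste("Surv(time, status) ~", v))
  fit <- coxph(f, data = mydata)
  summary(fit)$coefficients[1, "Pr(>|z|)"]
})
names(pvals)[pvals < 0.01]    # covariates passing the screen
```

Keep in mind that screening many covariates at the 0.01 level this way is still multiple testing, and univariate p-values can differ sharply from those in a joint model.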
[R] pnorm, qnorm
Trenkler, Dietrich said: I found the following strange behavior using qnorm() and pnorm():

x <- 8.21; x - qnorm(pnorm(x))
[1] 0.0004638484
x <- 8.28; x - qnorm(pnorm(x))
[1] 0.07046385
x <- 8.29; x - qnorm(pnorm(x))
[1] 0.08046385
x <- 8.30; x - qnorm(pnorm(x))
[1] -Inf

qnorm(1 - .Machine$double.eps)
[1] 8.12589
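This is roundoff rather than a bug: for x this large, pnorm(x) is closer to 1 than the nearest representable double below 1, so it rounds to exactly 1 and qnorm(1) is Inf. Working in the upper tail keeps the tiny probability representable:

```r
x <- 8.30
pnorm(x)                            # rounds to exactly 1 in double precision
p <- pnorm(x, lower.tail = FALSE)   # a tiny but representable tail probability
qnorm(p, lower.tail = FALSE)        # recovers x to near machine accuracy
```

The lower.tail argument (together with log.p for even more extreme tails) exists in pnorm() and qnorm() precisely for this situation.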
Re: [R] Error Using pm.getabst()
Robert, Thank you for your reply. I thought RSXML was an R (CRAN) package? I realize your package is part of Bioconductor, so I'll try the Bioconductor mailing list as well. Also, and more importantly, I took your suggestion and stepped through the calls. It seems the failure in pm.getabst() occurs when creating absts with the pubmed function. When I'm back at work, I'll have to research further why it fails. Right now, while I'm at home, it works flawlessly. Any reason this would fail behind a corporate firewall as opposed to my home network? Oh, and my apologies for posting so hastily without the full information: R 1.9.1, annotate 1.4.0, RSXML 0.97, all on a Windows 2000 OS. Thanks, Mark Larsen

On Thu, 2004-08-12 at 17:28, Robert Gentleman wrote: You will almost surely do better to ask about Bioconductor packages on the Bioconductor mailing list. Next, it is helpful to know what versions of things you are using. As for your problem, did you look to see what kind of object absts is? There seems to be no default method for xmlRoot, and it is likely that the call to create the absts object failed (prior to this). You might want to try stepping through the commands, one at a time, and checking each step. Often, the problem arises because you have not properly set up your connection to the internet and so none of the querying software will work. Robert
[R] Object oriented programming resources
Hi, I'm looking for resources to read about the object-oriented features of R. I have looked through the Manuals page on r-project.org. The most useful of the documents seemed to be the draft of the R language definition; however, it had only about 6 pages on the topic. I have also used Google, but my problem here is that R appears in a *lot* of web pages! I tried limiting the search by using site:r-project.org, but didn't find anything very useful. Specifically, I'm trying to find information on member variables (I think that's the correct term), as I'd like to copy this concept from C++:

class a {
  ...
private:
  int x;  // I think the term for this is a member variable
};

Thanks for your thoughts, Matthew
RE: [R] Object oriented programming resources
The Bioconductor project posts a short tutorial, A guide to using S4 Objects, under the Developer Page frame. I've found it useful. Note that R's S4 class approach to OOP is very different from that of C++ or Java. You will still find member variables, though: they are called slots.
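A minimal S4 translation of the C++ snippet from the question, using the representation() style of class definition (class and slot names taken from the question, the value 42 invented for illustration):

```r
setClass("a", representation(x = "numeric"))

obj <- new("a", x = 42)
obj@x             # slot access: the S4 analogue of a member variable
slot(obj, "x")    # equivalent functional form
```

One caveat: unlike a C++ private member, a slot is not hidden; any code can read or set obj@x. Encapsulation in S4 is by convention, typically via accessor generics, rather than enforced by the language.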