[R] comparing two vectors
Suppose I have a vector A=c(1,2,3). Now I want to compare each element of A to another vector L=c(0.5, 1.2), and then compute sum(A > 0.5) and sum(A > 1.2) to get a result of (3, 2). How can I get this without writing a loop of sums? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] penalized cox regression
Hi, What is the function to calculate a penalized Cox regression? frailtyPenal in the frailtypack R package imposes a maximum of 2 strata. I want to use a function that reduces all my variables without stratifying them in advance. Looking forward to your reply, carol
Re: [R] comparing two vectors
On 6/10/07, gallon li [EMAIL PROTECTED] wrote: Suppose I have a vector A=c(1,2,3). Now I want to compare each element of A to another vector L=c(0.5, 1.2), and then compute sum(A > 0.5) and sum(A > 1.2) to get a result of (3, 2). How can I get this without writing a loop of sums? How about colSums(outer(A, L, ">")) Hadley
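Hadley's one-liner can be written out as a short runnable sketch (variable names taken from the original question; the function passed to outer() is the ">" comparison):

```r
A <- c(1, 2, 3)
L <- c(0.5, 1.2)

# outer(A, L, ">") builds a length(A) x length(L) logical matrix whose
# j-th column is A > L[j]; the column sums then count, for each
# threshold in L, how many elements of A exceed it.
counts <- colSums(outer(A, L, ">"))
counts  # 3 2

# An equivalent, sometimes clearer, formulation:
sapply(L, function(l) sum(A > l))
```

Both avoid an explicit loop over thresholds; outer() trades a little memory (an n x k matrix) for full vectorization.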
Re: [R] Tools For Preparing Data For Analysis
On 10-Jun-07 02:16:46, Gabor Grothendieck wrote: That can be elegantly handled in R through R's object oriented programming by defining a class for the fancy input. See this post: https://stat.ethz.ch/pipermail/r-help/2007-April/130912.html for a simple example of that style. On 6/9/07, Robert Wilkins [EMAIL PROTECTED] wrote: Here are some examples of the type of data crunching you might have to do. In response to the requests by Christophe Pallier and Martin Stevens. Before I started developing Vilno, some six years ago, I had been working in pharmaceuticals for eight years (it's not easy to show you actual data though, because it's all confidential of course). I hadn't heard of Vilno before (except as a variant of Vilnius). And it seems remarkably hard to find info about it from a Google search. The best I've come up with, searching on "vilno data", is at http://www.xanga.com/datahelper This is a blog site, apparently with postings by Robert Wilkins. At the end of the Sunday, September 17, 2006 posting "Tedious coding at the Pharmas" is a link: "I have created a new data crunching programming language." http://www.my.opera.com/datahelper which appears to be totally empty. In another blog article: "go to the www.my.opera.com/datahelper site, go to the August 31 blog article, and there you will find a tarball-file to download, called vilnoAUG2006package.tgz" so again inaccessible; and a Google search on vilnoAUG2006package.tgz gives a single hit which is simply the same article. In the Xanga blog there are a few examples of tasks which are no big deal in any programming language (and, relative to their simplicity, appear a bit cumbersome in Vilno). I've not seen in the blog any instance of data transformation which could not be quite easily done in any straightforward language (even awk). Lab data can be especially messy, especially if one clinical trial allows the physicians to use different labs. So let's consider lab data. [...] 
That's a fairly daunting description, though indeed not at all extreme for the sort of data that can arise in practice (and not just in pharmaceutical investigations). But the complexity is in the situation, and, whatever language you use, the writing of the program will involve the writer getting to grips with the complexity, and the complexity will be present in the code simply because of the need to accommodate all the special cases, exceptions and faults that have to be anticipated in feral data. Once these have been anticipated and incorporated in the code, the actual transformations are again no big deal. Frankly, I haven't yet seen anything in Vilno that couldn't be accommodated in an 'awk' program. Not that I'm advocating awk for universal use (I'm not that monolithic about it). But I'm using it as my favourite example of a flexible, capable, transparent and efficient data filtering language, as far as it goes. SO: where can one find out more about Vilno, to see what it may really be capable of that can not be done so easily in other ways? (As is implicit in many comments in Robert's blog, and indeed also from many postings to this list over time, and undoubtedly well known to many of us in practice, a lot of the problems with data files arise at the data gathering and entry stages, where people can behave as if stuffing unpaired socks and unattributed underwear randomly into a drawer, and then banging it shut). Best wishes to all, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 10-Jun-07 Time: 09:28:10 -- XFMail --
[R] Coding categorical variables in mixed environment
Hi R users, Suppose we have the following data for a regression model:

AGE: numerical
SEX: male/female, categorical
COLOR: {blue, green, pink}, categorical
RESPONSE: yes/no, categorical

AGE SEX COLOR RESPONSE
10  M   BLUE  Y
12  M   GREEN N
13  F   PINK  Y
11  M   BLUE  Y
13  M   GREEN N
09  F   GREEN N
15  F   BLUE  Y
11  F   PINK  Y
12  M   PINK  N
14  M   GREEN N

I want to code the categorical data as {male = 1, female = 2}, {blue = 1, green = 2, pink = 3}, {yes = 1, no = 0} and finally get the new table. How can I do this? Waiting for reply. Thanks in advance. bye
[R] find position
Find the position of the first value that equals a certain number in a vector: say a=c(0,0,0,0,0.2, 0.2, 0.4,0.4,0.5). I wish to return the index in a for which the value in the vector equals 0.4 for the first time. In this case, it is 7.
Re: [R] find position
which(a == .4)[1] b On Jun 10, 2007, at 4:45 AM, gallon li wrote: find the position of the first value who equals certain number in a vector: Say a=c(0,0,0,0,0.2, 0.2, 0.4,0.4,0.5) i wish to return the index value in a for which the value in the vector is equal to 0.4 for the first time. in this case, it is 7.
Re: [R] find position
try this: which(a == 0.4)[1] I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm Quoting gallon li [EMAIL PROTECTED]: find the position of the first value who equals certain number in a vector: Say a=c(0,0,0,0,0.2, 0.2, 0.4,0.4,0.5) i wish to return the index value in a for which the value in the vector is equal to 0.4 for the first time. in this case, it is 7.
Re: [R] Tools For Preparing Data For Analysis
Douglas Bates wrote: Frank Harrell indicated that it is possible to do a lot of difficult data transformation within R itself if you try hard enough but that sometimes means working against the S language and its whole object view to accomplish what you want and it can require knowledge of subtle aspects of the S language. Actually, I think Frank's point was subtly different: It is *because* of the differences in view that it sometimes seems difficult to find the way to do something in R that is apparently straightforward in SAS. I.e. the solutions exist and are often elegant, but may require some lateral thinking. Case in point: Finding the first or the last observation for each subject when there are multiple records for each subject. The SAS way would be a data step with IF-THEN-DELETE, and a RETAIN statement so that you can compare the subject ID with the one from the previous record, working with data that are sorted appropriately. You can do the same thing in R with a for loop, but there are better ways, e.g. subset(df, !duplicated(ID)), and subset(df, rev(!duplicated(rev(ID)))), or maybe do.call(rbind, lapply(split(df, df$ID), head, 1)), resp. tail. Or something involving aggregate(). (The latter approaches generalize better to other within-subject functionals like cumulative doses, etc.) The hardest cases that I know of are the ones where you need to turn one record into many, such as occurs in survival analysis with time-dependent, piecewise constant covariates. This may require transposing the problem, i.e. for each interval you find out which subjects contribute and with what, whereas the SAS way would be a within-subject loop over intervals containing an OUTPUT statement. Also, there are some really weird data formats, where e.g. the input format is different in different records. 
Back in the 80's, when punched-card input was still common, it was quite popular to have one card with background information on a patient plus several cards detailing visits, and you'd get a stack of cards containing both kinds. In R you would most likely split on the card type using grep() and then read the two kinds separately and merge() them later.
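The duplicated() idioms described above can be illustrated on a toy data frame (the data are invented for this example):

```r
df <- data.frame(ID  = c(1, 1, 1, 2, 2, 3),
                 val = c(10, 11, 12, 20, 21, 30))

# First record per subject: keep rows whose ID has not been seen before.
first <- subset(df, !duplicated(ID))

# Last record per subject: the same trick run back-to-front.
last <- subset(df, rev(!duplicated(rev(ID))))

# The split/lapply route generalizes to arbitrary within-subject summaries.
first2 <- do.call(rbind, lapply(split(df, df$ID), head, 1))
```

first and first2 both pick rows 1, 4, 6 (val 10, 20, 30); last picks rows 3, 5, 6 (val 12, 21, 30).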
Re: [R] Coding categorical variables in mixed environment
try ?recode in package:car --- spime [EMAIL PROTECTED] wrote: Hi R users, Suppose we have following data for a regression model: AGE: numerical SEX: male/female categorical COLOR: {blue, green, pink} categorical RESPONSE: yes/no categorical AGE SEX COLOR RESPONSE 10 M BLUE Y 12 M GREEN N 13 F PINK Y 11 M BLUE Y 13 M GREEN N 09 F GREEN N 15 F BLUE Y 11 F PINK Y 12 M PINK N 14 M GREEN N I want to code the categorical data as {male = 1, female = 2}, {blue = 1, green = 2, pink = 3}, {yes = 1, no = 0} and finally get the new table. how can i do this? waiting for reply. Thanks in advance. bye
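For completeness, a base-R alternative to car's recode(): converting each column to a factor with an explicit level order and taking as.integer() yields the requested codes. A minimal sketch on a few invented rows:

```r
dat <- data.frame(SEX      = c("M", "F", "M"),
                  COLOR    = c("BLUE", "GREEN", "PINK"),
                  RESPONSE = c("Y", "N", "Y"))

# The level order determines the integer code (first level -> 1, etc.).
dat$SEX.n   <- as.integer(factor(dat$SEX,   levels = c("M", "F")))
dat$COLOR.n <- as.integer(factor(dat$COLOR, levels = c("BLUE", "GREEN", "PINK")))

# yes = 1, no = 0: a simple logical comparison gives the 0/1 coding directly.
dat$RESP.n  <- as.integer(dat$RESPONSE == "Y")
```

Note that for modelling you usually do not need this step at all: lm()/glm() handle factors directly via contrasts.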
Re: [R] find position
Try match(0.4, a) Also see ?match and the nomatch= argument, in particular. If your numbers are only equal to within an absolute tolerance, tol, as discussed in the R FAQ http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f you may need: tol <- 1e-6; match(TRUE, abs(a - 0.4) < tol) or which(abs(a - 0.4) < tol)[1] # tol from above and analogously if a relative tolerance is required. On 6/10/07, gallon li [EMAIL PROTECTED] wrote: find the position of the first value who equals certain number in a vector: Say a=c(0,0,0,0,0.2, 0.2, 0.4,0.4,0.5) i wish to return the index value in a for which the value in the vector is equal to 0.4 for the first time. in this case, it is 7.
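A runnable side-by-side of the exact and tolerance-based approaches on the poster's vector (the tolerance value 1e-6 is illustrative):

```r
a <- c(0, 0, 0, 0, 0.2, 0.2, 0.4, 0.4, 0.5)

# Exact comparison works here because 0.4 appears literally in a:
which(a == 0.4)[1]   # 7
match(0.4, a)        # 7

# With computed values, compare within a tolerance instead:
tol <- 1e-6
which(abs(a - 0.4) < tol)[1]     # 7
match(TRUE, abs(a - 0.4) < tol)  # 7
```

match() stops at the first hit, so for very long vectors it can be cheaper than materialising the full which() result.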
Re: [R] Tools For Preparing Data For Analysis
On 6/10/07, Ted Harding [EMAIL PROTECTED] wrote: ... a lot of the problems with data files arise at the data gathering and entry stages, where people can behave as if stuffing unpaired socks and unattributed underwear randomly into a drawer, and then banging it shut. Not specifically R-related, but this would make a great fortune. Sarah -- Sarah Goslee http://www.functionaldiversity.org
[R] {nlme} Multilevel estimation heteroscedasticity
Dear All, I'm trying to model heteroscedasticity using a multilevel model. To do so, I make use of the nlme package and the weights parameter. Let's say that I hypothesize that the exam score of students (normexam) is influenced by their score on a standardized LR test (standLRT). Students are of course nested in schools. These variables are contained in the Exam data in the mlmRev package. library(nlme) library(mlmRev) lme(fixed = normexam ~ standLRT, data = Exam, random = ~ 1 | school) If I want to model only a few categories of variance, all works fine. For instance, should I (for whatever reason) hypothesize that the variance on the normexam scores is larger in mixed schools than in boys' schools, I'd use weights = varIdent(form = ~ 1 | type), leading to: heteroscedastic <- lme(fixed = normexam ~ standLRT, data = Exam, weights = varIdent(form = ~ 1 | type), random = ~ 1 | school) This gives me nice and clear output, part of which is shown below: Variance function: Structure: Different standard deviations per stratum Formula: ~1 | type Parameter estimates: Mxd Sngl 1.00 1.034607 Number of Observations: 4059 Number of Groups: 65 Though, should I hypothesize that the variance on the normexam variable is larger at schools that have a higher average score on intake exams (schavg), I run into trouble. I'd use weights = varIdent(form = ~ 1 | schavg), leading to: heteroscedastic <- lme(fixed = normexam ~ standLRT, data = Exam, weights = varIdent(form = ~ 1 | schavg), random = ~ 1 | school) This leads to estimation problems. R tells me: Error in lme.formula(fixed = normexam ~ standLRT, data = Exam, weights = varIdent(form = ~1 | : nlminb problem, convergence error code = 1; message = iteration limit reached without convergence (9) Fiddling with maxiter and setting an unreasonable tolerance doesn't help. I think the origin of this problem lies in the large number of categories of schavg (65), which may make estimation troublesome. 
This leads to my questions: - How to solve this estimation problem? - Is it possible that the varIdent (or more generally: the varFunc) of lme returns a single value, representing a coefficient along which variance is increasing / decreasing? - In general: how can a variance component / heteroscedasticity be made dependent on some level-2 variable (school level in my examples)? Many thanks in advance, Rense Nieuwenhuis
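One possible direction, sketched here as an untested suggestion rather than a verified answer: varIdent estimates one standard deviation per stratum, so a continuous covariate with 65 distinct values gives 64 free parameters. nlme's varFixed, varPower, and varExp instead take a variance covariate and estimate a single coefficient, which is exactly the "single value" asked about. Assuming the nlme and mlmRev packages:

```r
library(nlme)
library(mlmRev)

# varExp models Var(e_ij) = sigma^2 * exp(2 * delta * schavg):
# a single coefficient delta describes how the residual variance
# grows (or shrinks) with the school-level covariate schavg.
het <- lme(fixed   = normexam ~ standLRT,
           data    = Exam,
           random  = ~ 1 | school,
           weights = varExp(form = ~ schavg))

summary(het)$modelStruct$varStruct  # the estimated delta
```

varPower(form = ~ schavg) is the analogous power-of-covariate alternative; see ?varClasses for the full set.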
[R] format.dates, chron and Hmisc
Hello, I have some problems in using chron, Hmisc, and lattice. First, using both chron and Hmisc, I get an error message when describing data: df$Date <- chron(df$Date, format=c("d/m/y")) ll <- latex(describe(df), file="..//text//df.tex") Error in formatDateTime(dd, atx, !timeUsed) : could not find function "format.dates" Then, using a chron object and lattice, I get plot.a <- xyplot(theta ~ Date | team, data=op.df.long, + strip = function(bg, ...) strip.default(bg = 'transparent', ...), + panel=function(x,y,...){ + panel.xyplot(x,y,cex=0.4,col="black",...) + panel.loess(x,y,span=0.3,col="black",...) + panel.abline(h=0) + }) print(plot.a) Error in pretty(rng, ...) : unused argument(s) (format.posixt = NULL) In both cases, the chron objects have been created using the function chron(). Are lattice and Hmisc functions incompatible with chron, or am I doing something else that causes these problems? Thanks, Ruud sessionInfo() R version 2.5.0 (2007-04-23) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: lattice 0.15-4, MASS 7.2-33, chron 2.3-11, xlsReadWrite 1.3.2, Hmisc 3.3-2
[R] R logistic regression - comparison with SPSS
Dear R-list members, I have been a user of SPSS for a few years and am quite new to R. I read the documentation and tried samples, but I have some problems obtaining results for a logistic regression under R. The following SPSS script LOGISTIC REGRESSION vir /METHOD = FSTEP(LR) d007 d008 d009 d010 d011 d012 d013 d014 d015 d016 d017 d018 d069 d072 d073 /SAVE = PRED COOK SRESID /CLASSPLOT /PRINT = GOODFIT CI(95) /CRITERIA = PIN(.10) POUT(.10) ITERATE(40) CUT(.5) . predicts vir (value 0 or 1) according to my parameters d007 to d073. It gives me the parameters to retain in the logistic equation and the intercept. The calculation is made from a set of values of about 1,000 cases. I have been unable to translate it with success under R. I would like to check if I can obtain the same results as with SPSS. Can someone help me translate it under R? I would be most grateful. I thank you. Best regards. -- Alain Reymond CEIA Bd Saint-Michel 119 1040 Bruxelles Tel: +32 2 736 04 58 Fax: +32 2 736 58 02 PGPId : 0xEFB06E2E
[R] How to specify the start position using plot
Hi, How do I specify the start position of Y in the plot command? Hopefully I can specify the range of the X and Y axes. I checked ?plot, and it did not mention that I can set the range. Thanks Pat
Re: [R] How to specify the start position using plot
plot( x=rnorm(25, 0.5, 0.3), y=rnorm(25, 4, 1), xlim=c(0,1), ylim=c(2,7)) # ^^ for example Charles Annis, P.E. [EMAIL PROTECTED] phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Wang Sent: Sunday, June 10, 2007 12:25 PM To: r-help@stat.math.ethz.ch Subject: [R] How to specify the start position using plot Hi, How to specify the start position of Y in plot command, hopefully I can specify the range of X and Y axis. I checked the ?plot, it didnot mention I can setup the range. Thanks Pat
[R] Feature selection for Clustering
Hi, I was wondering whether there are any feature selection methods for clustering. Thanks chandra
Re: [R] Rdonlp2 - an extension library for constrained optimization
Ryuichi Tamura wrote: Please can you put your package on the CRAN server? Many thanks, Diethelm Wuertz Hello R-list, I have released an updated version (0.3-1) of Rdonlp2. Some (fatal) bugs which could kill the interpreter have been fixed. In addition, user-visible changes are: * *.mes, *.pro files are not created if name=NULL (this is the default) in donlp2(). * machine epsilons defined in R are used for internal calculations (step-size, etc.). * the numeric Hessian is now evaluated at the optimum and calculated with the algorithm specified in 'difftype' in donlp2.control(). Setting difftype=2 will produce (roughly) the same value as optim() does. I sincerely appreciate the users who sent me useful comments. Windows binary, OS X universal binary, and source file are available at: http://arumat.net/Rdonlp2/ Regards, TAMURA Ryuichi, mailto: [EMAIL PROTECTED]
[R] PCA for Binary data
Hi, I was wondering whether there is any package implementing Principal Component Analysis for binary data. Thanks chandra
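For a quick exploratory look, prcomp() runs mechanically on a 0/1 matrix, though the Euclidean geometry it assumes is debatable for binary data; multiple correspondence analysis (e.g. mca() in the MASS package, on a data frame of factors) is one more principled alternative. A minimal sketch on simulated binary data:

```r
set.seed(1)
# 100 observations of 5 binary (0/1) variables, invented for illustration.
X <- matrix(rbinom(100 * 5, size = 1, prob = 0.4), nrow = 100, ncol = 5)

# Plain PCA on the binary matrix; centering only, since scaling 0/1
# columns by their standard deviation is a judgment call.
p <- prcomp(X, center = TRUE, scale. = FALSE)
head(p$x[, 1:2])  # scores on the first two components
```

Whether plain PCA is adequate depends on what the components will be used for; for visualisation it is often acceptable, for inference less so.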
Re: [R] How to specify the start position using plot
plot(x=1:10, y=1:10, xlim=c(0,5), ylim=c(6,10)) A lot of the argument descriptions for plot() are contained in ?par --- Patrick Wang [EMAIL PROTECTED] wrote: Hi, How to specify the start position of Y in plot command, hopefully I can specify the range of X and Y axis. I checked the ?plot, it didnot mention I can setup the range. Thanks Pat
Re: [R] R logistic regression - comparison with SPSS
Alain Reymond wrote: Dear R-list members, I have been a user of SPSS for a few years and quite new to R. I read the documentation and tried samples but I have some problems to obtain results for a logistic regression under R. The following SPSS script LOGISTIC REGRESSION vir /METHOD = FSTEP(LR) d007 d008 d009 d010 d011 d012 d013 d014 d015 d016 d017 d018 d069 d072 d073 /SAVE = PRED COOK SRESID /CLASSPLOT /PRINT = GOODFIT CI(95) /CRITERIA = PIN(.10) POUT(.10) ITERATE(40) CUT(.5) . predicts vir (value 0 or 1) according to my parameters d007 to d073. It gives me the parameters to retain in the logistic equation and the intercept. The calculation is made from a set of values of about 1.000 cases. I have been unable to translate it with success under R. I would like to check if I can obtain the same results than with SPSS. Can someone help me translate it under R ? I would be most grateful. If all the variables you mention are available in a data frame, e.g. virdf, then you can fit a logistic regression model by mymodel <- glm(vir ~ d007 + d008 + d009 + d010 + d011 + d012 + d013 + d014 + d015 + d016 + d017 + d018 + d069 + d072 + d073, data = virdf, family = binomial) or mymodel <- glm(vir ~ ., data = virdf, family = binomial) if there are no variables other than those mentioned above in the virdf data frame. Contrary to SPSS, you need not specify in advance what you would like as output. Everything useful is stored in the model object (here: mymodel) which can then be used to further investigate the model in many ways: summary(mymodel) anova(mymodel, test = "Chisq") plot(mymodel) See ?summary.glm, ?anova.glm etc. For stepwise variable selection (not necessarily corresponding to FSTEP(LR)), see ?step or ?add1 to do it `by hand'. HTH, Tobias P.S. 
You can find an introduction to R specifically targeted at (SAS and) SPSS users here: http://oit.utk.edu/scc/RforSASSPSSusers.pdf -- Tobias Verbeke - Consultant Business & Decision Benelux Rue de la révolution 8 1000 Brussels - BELGIUM +32 499 36 33 15 [EMAIL PROTECTED]
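A self-contained sketch of the glm()/step() recipe above, run on simulated data (all variable names and coefficients here are invented; note that step() selects by AIC, which is not identical to SPSS's FSTEP(LR) likelihood-ratio criterion):

```r
set.seed(42)
n  <- 1000
d1 <- rnorm(n); d2 <- rnorm(n); d3 <- rnorm(n)
# A binary outcome that truly depends on d1 and d2 but not d3.
vir <- rbinom(n, 1, plogis(-0.5 + 1.2 * d1 - 0.8 * d2))
virdf <- data.frame(vir, d1, d2, d3)

# Fit the full logistic model, then let step() drop uninformative terms.
full <- glm(vir ~ d1 + d2 + d3, data = virdf, family = binomial)
sel  <- step(full, direction = "both", trace = 0)

summary(sel)
coef(sel)  # d1 and d2 should survive selection
```

For likelihood-ratio-based comparison of two specific nested models, anova(small, big, test = "Chisq") is the closer analogue of SPSS's LR tests.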
[R] Windows vista's early terminate Rgui execution
Hi, I have a frustrating problem with Vista that I wonder if anyone else has come across. I wrote a script that involves a long computation time (although, during the calculation, it periodically spits out text on the GUI to notify me of the progress of the calculation). Windows Vista always stops my calculation and claims that 'Rgui has stopped working. Windows is checking for a solution.' And when I look into Task Manager, Windows has already stopped my Rgui process. I am quite disappointed with this. I would really appreciate it if anyone has found a way to work around this Windows Vista problem. In particular, how do I turn off this feature in Vista? Any help would be really appreciated. Thank you! - adschai
Re: [R] Tools For Preparing Data For Analysis
Since R is supposed to be a complete programming language, I wonder why these tools couldn't be implemented in R (unless speed is the issue). Of course, it's a naive desire to have a single language that does everything, but it seems that R currently has most of the functions necessary to do the type of data cleaning described. For instance, Gabor and Peter showed some snippets of ways to do this elegantly; my [physical science] data is often not as horrendously structured, so usually I can get away with a program containing this type of code: txtin <- scan(filename, what = "", sep = "\n") filteredList <- lapply(strsplit(txtin, delimiter), FUN = filterfunction) # filterfunction() returns selected (and possibly transformed) # elements if present and NULL otherwise; # may include calls to grep(), regexpr(), gsub(), substring(), # nchar(), sscanf(), type.convert(), paste(), etc. mydataframe <- do.call(rbind, filteredList) # then match(), subset(), aggregate(), etc. In the case that the file is large, I open a file connection and scan a single line + apply filterfunction() successively in a for-loop instead of using lapply(). Of course, the devil is in the details of the filtering function, but I believe most of the required text processing facilities are already provided by R. I often have tasks that involve a combination of shell scripting and text processing to construct the data frame for analysis; I started out using Python+NumPy to do the front-end work but have been using R progressively more (frankly, all of it) to take over that portion since I generally prefer the data structures and methods in R. --- Peter Dalgaard [EMAIL PROTECTED] wrote: Douglas Bates wrote: Frank Harrell indicated that it is possible to do a lot of difficult data transformation within R itself if you try hard enough but that sometimes means working against the S language and its whole object view to accomplish what you want and it can require knowledge of subtle aspects of the S language. 
Actually, I think Frank's point was subtly different: It is *because* of the differences in view that it sometimes seems difficult to find the way to do something in R that is apparently straightforward in SAS. I.e. the solutions exist and are often elegant, but may require some lateral thinking. Case in point: Finding the first or the last observation for each subject when there are multiple records for each subject. The SAS way would be a datastep with IF-THEN-DELETE, and a RETAIN statement so that you can compare the subject ID with the one from the previous record, working with data that are sorted appropriately. You can do the same thing in R with a for loop, but there are better ways e.g. subset(df, !duplicated(ID)), and subset(df, rev(!duplicated(rev(ID)))), or maybe do.call(rbind, lapply(split(df, df$ID), head, 1)), resp. tail. Or something involving aggregate(). (The latter approaches generalize better to other within-subject functionals like cumulative doses, etc.). The hardest cases that I know of are the ones where you need to turn one record into many, such as occurs in survival analysis with time-dependent, piecewise constant covariates. This may require transposing the problem, i.e. for each interval you find out which subjects contribute and with what, whereas the SAS way would be a within-subject loop over intervals containing an OUTPUT statement. Also, there are some really weird data formats, where e.g. the input format is different in different records. Back in the 80's where punched-card input was still common, it was quite popular to have one card with background information on a patient plus several cards detailing visits, and you'd get a stack of cards containing both kinds. In R you would most likely split on the card type using grep() and then read the two kinds separately and merge() them later. 
__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Tools For Preparing Data For Analysis
On 10-Jun-07 14:04:44, Sarah Goslee wrote: On 6/10/07, Ted Harding [EMAIL PROTECTED] wrote: ... a lot of the problems with data files arise at the data gathering and entry stages, where people can behave as if stuffing unpaired socks and unattributed underwear randomly into a drawer, and then banging it shut. Not specifically R-related, but this would make a great fortune. Sarah -- Sarah Goslee http://www.functionaldiversity.org

I'm not going to object to that! Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 10-Jun-07 Time: 21:18:45 -- XFMail --
[R] Question on weighted Kaplan-Meier analysis of case-cohort design
I have a study best described as a retrospective case-cohort design: the cases were all the events in a given time span surveyed, and the controls (event-free during the follow-up period) were selected in a 2:1 ratio (2 controls per case). The sampling frequency for the controls was about 0.27, so I used a weight vector consisting of 1 for cases and 1/0.27 for controls in coxph to adjust for sampling bias. Using the same weights in a Kaplan-Meier analysis (survfit) gave very inaccurate survival curves (a much lower event rate than expected from the population). Is weighting handled differently between coxph and survfit? How should I conduct a weighted Kaplan-Meier analysis (given that survfit doesn't accept a weighted Cox model) for such a design? Any explanations or suggestions are highly appreciated, xiaojun
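As a way to experiment with how the two functions treat the same weight vector, here is a minimal self-contained sketch using the survival package; the data are simulated and all names are invented, mirroring the weighting scheme described above (a sketch for exploration, not a recommended case-cohort analysis — dedicated estimators exist for that design):

```r
library(survival)

## Simulated stand-in for the design described above: 'status' marks
## cases; controls were sampled with probability 0.27, so each control
## carries weight 1/0.27. (All names and numbers invented.)
set.seed(1)
n <- 300
d <- data.frame(time   = rexp(n, 0.1),
                status = rbinom(n, 1, 0.3),
                x      = rnorm(n))
d$w <- ifelse(d$status == 1, 1, 1 / 0.27)

## Weighted Cox model: the weights enter the partial likelihood.
fit.cox <- coxph(Surv(time, status) ~ x, data = d, weights = w)

## Weighted Kaplan-Meier: survfit() accepts the same case weights and
## applies them to both the event counts and the risk sets, so
## up-weighting the (event-free) controls necessarily lowers the
## apparent event rate -- consistent with what is observed above.
fit.km <- survfit(Surv(time, status) ~ 1, data = d, weights = w)
head(summary(fit.km)$surv)
```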
Re: [R] Windows vista's early terminate Rgui execution
At 03:28 PM 6/10/2007, [EMAIL PROTECTED] wrote: Hi, I have a frustrating problem from Vista that I wonder if anyone else has come across. I wrote a script that involves a long computation time (although, during the calculation, it spits out text on the GUI periodically to notify me of the progress of the calculation). Windows Vista always stopped my calculation and claimed that 'Rgui has stopped working. Windows is checking for a solution.' And when I looked into Task Manager, Windows had already stopped my Rgui process. I am quite disappointed with this. I would really appreciate it if anyone has found a way around this Windows Vista problem. Particularly, how do I turn off this feature in Vista? Any help would be really appreciated. Thank you! - adschai

You probably need to contact Vista periodically so it knows you are awake. Just include a line that makes a call to Vista but doesn't do output, such as

  useless <- dir()

placed in some outer loop that satisfies the drop-dead time between calls. Alternatively, you can attempt to find out how to change the registry entry corresponding to the wait time and increase it to a value you can live with.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 Vere scire est per causas scire
[R] initial value for optim in polr question
Hi, I have a problem with the initial value for optim in polr. After a call to polr, R complains that:

  Error in optim(start, fmin, gmin, method = "BFGS", hessian = Hess, ...) :
    initial value in 'vmin' is not finite

Would you please suggest a way around this problem? Thank you so much in advance. Rgds, - adschai
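One thing worth trying is supplying explicit starting values through polr's 'start' argument (the regression coefficients followed by the ordered intercepts). This is a sketch on invented toy data, not a guaranteed fix: the same error can also point to a data problem, such as an unused response category.

```r
library(MASS)

## Toy ordinal data (all names invented for illustration).
set.seed(42)
n <- 200
x <- rnorm(n)
y <- cut(x + rnorm(n), breaks = c(-Inf, -1, 1, Inf),
         labels = c("low", "mid", "high"), ordered_result = TRUE)
d <- data.frame(y = y, x = x)

## 'start' = c(coefficients, intercepts): here one slope for x,
## then two increasing cutpoints for the three response levels.
fit <- polr(y ~ x, data = d, Hess = TRUE, start = c(0, -1, 1))
coef(fit)
fit$zeta
```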
Re: [R] Windows vista's early terminate Rgui execution
That's really helpful, Robert! I was thinking of writing my output to a file periodically, but that would make my runtime longer. I think this way is better: running dir() contacts Windows periodically and takes much less time than writing to a file. Thank you. - adschai

- Original Message - From: Robert A LaBudde Date: Sunday, June 10, 2007 3:32 pm Subject: Re: [R] Windows vista's early terminate Rgui execution To: R-help@stat.math.ethz.ch [...]
Re: [R] {nlme} Multilevel estimation heteroscedasticity
Rense, how about

  weights = varPower(form = ~ schavg)

or

  weights = varConstPower(form = ~ schavg)

or even

  weights = varPower(form = ~ schavg | type)

You might find Pinheiro and Bates (2000) to be a valuable investment. I hope that this helps, Andrew

On Sun, Jun 10, 2007 at 04:35:58PM +0200, Rense Nieuwenhuis wrote: Dear All, I'm trying to model heteroscedasticity using a multilevel model. To do so, I make use of the nlme package and the weights parameter. Let's say that I hypothesize that the exam score of students (normexam) is influenced by their score on a standardized LR test (standLRT). Students are of course nested in schools. These variables are contained in the Exam data in the mlmRev package.

  library(nlme)
  library(mlmRev)
  lme(fixed = normexam ~ standLRT, data = Exam, random = ~ 1 | school)

If I want to model only a few categories of variance, all works fine. For instance, should I (for whatever reason) hypothesize that the variance of the normexam scores is larger in mixed schools than in boys' schools, I'd use weights = varIdent(form = ~ 1 | type), leading to:

  heteroscedastic <- lme(fixed = normexam ~ standLRT, data = Exam,
                         weights = varIdent(form = ~ 1 | type),
                         random = ~ 1 | school)

This gives me nice and clear output, part of which is shown below:

  Variance function:
   Structure: Different standard deviations per stratum
   Formula: ~1 | type
   Parameter estimates:
        Mxd     Sngl
   1.000000 1.034607
  Number of Observations: 4059
  Number of Groups: 65

Though, should I hypothesize that the variance of the normexam variable is larger in schools that have a higher average score on intake exams (schavg), I run into trouble. I'd use weights = varIdent(form = ~ 1 | schavg), leading to:

  heteroscedastic <- lme(fixed = normexam ~ standLRT, data = Exam,
                         weights = varIdent(form = ~ 1 | schavg),
                         random = ~ 1 | school)

This leads to estimation problems.
R tells me:

  Error in lme.formula(fixed = normexam ~ standLRT, data = Exam,
      weights = varIdent(form = ~1 | :
    nlminb problem, convergence error code = 1;
    message = iteration limit reached without convergence (9)

Fiddling with maxiter and setting an unreasonable tolerance doesn't help. I think the origin of this problem lies in the large number of categories of schavg (65), which may make estimation troublesome. This leads to my questions: - How can this estimation problem be solved? - Is it possible that the varIdent (or more generally: varFunc) of lme returns a single value, representing a coefficient along which variance is increasing / decreasing? - In general: how can a variance component / heteroscedasticity be made dependent on some level-2 variable (the school level in my examples)? Many thanks in advance, Rense Nieuwenhuis

-- Andrew Robinson Department of Mathematics and Statistics Tel: +61-3-8344-9763 University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599 http://www.ms.unimelb.edu.au/~andrewpr http://blogs.mbs.edu/fishing-in-the-bay/
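Andrew's varPower suggestion speaks directly to the single-coefficient question: rather than one stratum per distinct schavg value (varIdent), varPower estimates one power coefficient relating the variance to the covariate. A self-contained sketch on simulated data (all names invented, standing in for the Exam variables):

```r
library(nlme)

## Simulated two-level data in which the residual SD grows with a
## group-level covariate 'gavg' (a stand-in for schavg).
set.seed(7)
ngrp <- 30; m <- 20
g    <- rep(1:ngrp, each = m)
gavg <- rep(runif(ngrp, 1, 3), each = m)
x    <- rnorm(ngrp * m)
y    <- x + rep(rnorm(ngrp, sd = 0.5), each = m) +
        rnorm(ngrp * m, sd = gavg)          # residual SD = gavg^1
d    <- data.frame(y, x, g, gavg)

## varPower models Var(e_ij) = sigma^2 * |gavg|^(2*delta): a single
## estimated coefficient delta relates variance to the covariate,
## unlike varIdent, which fits one stratum per distinct value of it.
fit <- lme(y ~ x, random = ~ 1 | g, data = d,
           weights = varPower(form = ~ gavg))
summary(fit$modelStruct$varStruct)
```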
Re: [R] Tools For Preparing Data For Analysis
On 10-Jun-07 19:27:50, Stephen Tucker wrote: Since R is supposed to be a complete programming language, I wonder why these tools couldn't be implemented in R (unless speed is the issue). Of course, it's a naive desire to have a single language that does everything, but it seems that R currently has most of the functions necessary to do the type of data cleaning described.

In principle that is certainly true. A couple of comments, though.

1. R's rich data structures are likely to be superfluous. Mostly, at the sanitisation stage, one is working with flat files (rows by columns). This straightforward format is often easier to handle using simple programs for the kind of basic filtering needed, rather than getting into the heavier programming constructs of R.

2. As follow-on and contrast at the same time, very often what should be a nice flat file with no rough edges is not. If there are variable numbers of fields per line, R will not handle it straightforwardly (you can force it in, but it's more elaborate). There are related issues as well.

a) If someone entering data into an Excel table lets their cursor wander outside the row/col range of the table, this can cause invisible entities to be planted in the extraneous cells. When saved as a CSV, this file then has variable numbers of fields per line, and possibly also extra lines with arbitrary blank fields.

  cat datafile.csv | awk 'BEGIN{FS=","}{n=NF; print n}'

will give you the numbers of fields in each line. If you further pipe it into

  | sort -nu

you will get the distinct field-numbers. If you know (by now) how many fields there should be (e.g. 10), then

  cat datafile.csv | awk 'BEGIN{FS=","} (NF != 10){print NR, NF}'

will tell you which lines have the wrong number of fields, and how many fields they have. You can similarly count how many lines there are (e.g. pipe into wc -l).

b) People sometimes randomly use a blank space or a "." in a cell to denote a missing value.
Consistent use of either is OK: ",," in a CSV will be treated as NA by R. The use of "." can be more problematic. If for instance you try to read the following CSV into R as a dataframe:

  1,2,.,4
  2,.,4,5
  3,4,.,6

the "." in cols 2 and 3 is treated as the character ".", with the result that something complicated happens to the typing of the items. typeof(D[i,j]) is always integer. sum(D[1,1]) gives 1, but sum(D[1,2]) gives a type error, even though the entry is in fact 2. And so on, in various combinations. And as.matrix(D) is of course a matrix of characters. In fact, columns 2 and 3 of D are treated as factors!

  for(i in (1:3)){ for(j in (1:4)){ print(D[i,j]) }}
  [1] 1
  [1] 2
  Levels: . 2 4
  [1] .
  Levels: . 4
  [1] 4
  [1] 2
  [1] .
  Levels: . 2 4
  [1] 4
  Levels: . 4
  [1] 5
  [1] 3
  [1] 4
  Levels: . 2 4
  [1] .
  Levels: . 4
  [1] 6

This is getting altogether too complicated for the job one wants to do! And it gets worse when people mix ",," and ",.,"! On the other hand, a simple brush with awk (or sed in this case) can sort it once and for all, without waking the sleeping dogs in R. I could go on. R undoubtedly has the power, but it can very quickly get over-complicated for simple jobs.

Best wishes to all, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 10-Jun-07 Time: 22:14:35 -- XFMail --
Re: [R] Tools For Preparing Data For Analysis
An important potential benefit of R solutions shared by awk, sed, ... is that they provide a reproducible way to document exactly how one got from one version of the data to the next. This seems to be the main problem with handicraft methods like editing Excel files: it is too easy to introduce new errors that can't be tracked down at later stages of the analysis.

url: www.econ.uiuc.edu/~roger Roger Koenker email: [EMAIL PROTECTED] Department of Economics vox: 217-333-4558 University of Illinois fax: 217-244-6678 Champaign, IL 61820

On Jun 10, 2007, at 4:14 PM, (Ted Harding) wrote: On 10-Jun-07 19:27:50, Stephen Tucker wrote: [...]
Re: [R] Nonlinear Regression
Have you worked through the examples in the 'nls' help file, especially the following?

  DNase1 <- subset(DNase, Run == 1)
  fm3DNase1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
                   data = DNase1,
                   start = list(Asym = 3, xmid = 0, scal = 1),
                   trace = TRUE)

  Treated <- Puromycin[Puromycin$state == "treated", ]
  weighted.MM <- function(resp, conc, Vm, K) {
      ## Purpose: exactly as white book p. 451 -- RHS for nls()
      ## Weighted version of Michaelis-Menten model
      ## --
      ## Arguments: 'y', 'x' and the two parameters (see book)
      ## --
      ## Author: Martin Maechler, Date: 23 Mar 2001
      pred <- (Vm * conc)/(K + conc)
      (resp - pred) / sqrt(pred)
  }
  Pur.wt <- nls( ~ weighted.MM(rate, conc, Vm, K), data = Treated,
                 start = list(Vm = 200, K = 0.1), trace = TRUE)
  112.5978 :  200.0    0.1
  17.33824 :  205.67588840   0.04692873
  14.6097  :  206.33087396   0.05387279
  14.59694 :  206.79883508   0.05457132
  14.59690 :  206.83291286   0.05460917
  14.59690 :  206.83468191   0.05461109

# In the call to 'nls' here, 'Vm' and 'K' are in 'start' and must therefore be parameters to be estimated.
# The other names passed to the global 'weighted.MM' must be columns of 'data = Treated'.
# To get the residual sum of squares, first note that it is printed as the first column in the trace output.
# To get that from Pur.wt, I first tried 'class(Pur.wt)'.
# This told me it was of class 'nls'.
# I then tried methods(class='nls').
# One of the functions listed was 'residuals.nls'. That gave me the residuals.
# I then tried 'sum(residuals(Pur.wt)^2)', which returned 14.59690.

Hope this helps. Spencer Graves

p.s. Did this answer your question? Your example did not seem to me to be self-contained, which makes it more difficult for me to know if I'm misinterpreting your question. If the example had been self-contained, I might have replied a couple of days ago.

tronter wrote: Hello, I followed the example on page 59, chapter 11 of the 'Introduction to R' manual. I entered my own x,y data. I used least squares.
My function has 5 parameters: p[1], p[2], p[3], p[4], p[5]. I plotted the x-y data. Then I used lines(spline(xfit, yfit)) to overlay best curves on the data while changing the parameters. My question is how do I calculate the residual sum of squares? In the example they have the following:

  df <- data.frame(x = x, y = y)
  fit <- nls(y ~ SSmicmen(x, Vm, K), df)
  fit

In the second line, how would I input my function? Would it be:

  fit <- nls(y ~ myfunction(p[1], p[2], p[3], p[4], p[5]), df)

where myfunction is the actual function? My function doesn't have a name, so should I just enter it? Thanks
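Spencer's points can be put together in one self-contained sketch: nls() estimates named scalar parameters rather than a vector p, so give the model a name via a wrapper function. Everything below is invented for illustration — a hypothetical four-parameter curve and simulated data stand in for the poster's unnamed five-parameter function:

```r
## A named wrapper makes an "anonymous" model usable in nls().
myfunction <- function(x, p1, p2, p3, p4) {
  p1 + (p2 - p1) / (1 + exp((p3 - x) / p4))   # logistic-type curve
}

set.seed(3)
x <- seq(0, 10, length = 50)
y <- myfunction(x, 1, 5, 4, 1.5) + rnorm(50, sd = 0.1)
df <- data.frame(x = x, y = y)

## Names listed in 'start' are treated as parameters; every other
## name on the right-hand side must be a column of 'data'.
fit <- nls(y ~ myfunction(x, p1, p2, p3, p4), data = df,
           start = list(p1 = 0.5, p2 = 4, p3 = 3, p4 = 1))

## Residual sum of squares, as in Spencer's walk-through:
rss <- sum(residuals(fit)^2)
rss
```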
[R] How do I obtain standard error of each estimated coefficients in polr
Hi, I obtained all the coefficients that I need from polr. However, I'm wondering how I can obtain the standard error of each estimated coefficient. I saved the Hessian, and when I do something like summary(polrObj), I don't see any standard errors like when doing regression using lm. Any help would be really appreciated. Thank you! - adschai
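Assuming polr from MASS is the function in use, here is a minimal sketch on invented toy data: fitting with Hess = TRUE stores the Hessian, after which summary() reports standard errors, which also equal the square roots of the diagonal of vcov():

```r
library(MASS)

## Toy ordered-response data (invented for illustration).
set.seed(9)
n <- 300
x <- rnorm(n)
y <- cut(2 * x + rnorm(n), breaks = c(-Inf, -1, 1, Inf),
         labels = c("lo", "mid", "hi"), ordered_result = TRUE)

## Hess = TRUE stores the Hessian so summary() can invert it
## without refitting.
fit <- polr(y ~ x, Hess = TRUE)

## Standard errors in the coefficient table...
ctable <- coef(summary(fit))
ctable

## ...or directly from the estimated covariance matrix.
se <- sqrt(diag(vcov(fit)))
se
```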
Re: [R] Tools For Preparing Data For Analysis
Embarrassingly, I don't know awk or sed, but R's code seems to be shorter for most tasks than Python, which is my basis for comparison. It's true that R's more powerful data structures usually aren't necessary for the data cleaning, but sometimes in the filtering process I will pick out lines that contain certain data, in which case I have to convert text to numbers and perform operations like which.min(), order(), etc., so in that sense I like to have R's vectorized notation and the objects/functions that support it. As far as some of the tasks you described, I've tried transcribing them to R. I know you provided only the simplest examples, but even in these cases I think R's functions for handling these situations exemplify their usefulness in this step of the analysis. But perhaps you would argue that this code is too long... In any event it will still save the trouble of keeping track of an extra (intermediate) file passed between awk and R.

(1) The numbers of fields in each line; equivalent to cat datafile.csv | awk 'BEGIN{FS=","}{n=NF; print n}' in awk:

  # R equivalent:
  nFields <- count.fields("datafile.csv", sep=",")
  # or
  nFields <- sapply(strsplit(readLines("datafile.csv"), ","), length)

(2) Which lines have the wrong number of fields, and how many fields they have. You can similarly count how many lines there are (e.g. pipe into wc -l).
  # number of lines with wrong number of fields
  nWrongFields <- length(nFields[nFields != 10])

  # select only first ten fields from each line
  # and return a matrix
  firstTenFields <- do.call(rbind,
      lapply(strsplit(readLines("datafile.csv"), ","),
             function(x) x[1:10]))

  # select only those lines which contain ten fields
  # and return a matrix
  onlyTenFields <- do.call(rbind,
      lapply(strsplit(readLines("datafile.csv"), ","),
             function(x) if (length(x) == 10) x else NULL))

(3) If for instance you try to read the following CSV into R as a dataframe:

  1,2,.,4
  2,.,4,5
  3,4,.,6

then:

  txtC <- textConnection("1,2,.,4\n2,.,4,5\n3,4,.,6")

  # using read.csv() specifying the na.strings argument:
  read.csv(txtC, header=FALSE, na.strings=".")
    V1 V2 V3 V4
  1  1  2 NA  4
  2  2 NA  4  5
  3  3  4 NA  6

  # Of course, read.csv will work only if the data is formatted correctly.
  # More generally, using readLines(), strsplit(), etc., which are more
  # flexible (re-creating txtC, since the connection has been read to its end):
  txtC <- textConnection("1,2,.,4\n2,.,4,5\n3,4,.,6")
  do.call(rbind,
      lapply(strsplit(readLines(txtC), ","),
             type.convert, na.strings="."))
       [,1] [,2] [,3] [,4]
  [1,]    1    2   NA    4
  [2,]    2   NA    4    5
  [3,]    3    4   NA    6

(4) Situations where people mix ",," and ",.,":

  # type.convert (and read.csv) will still work when missing values are ,,
  # and ,., (it automatically recognizes "" as NA and, through the
  # 'na.strings' argument, can recognize "." as NA as well).
  # If it is desired to convert "." to "" first, this is simple too:
  txtC <- textConnection("1,2,.,4\n2,.,4,5\n3,4,.,6")
  m <- do.call(rbind,
      lapply(strsplit(readLines(txtC), ","),
             function(x) gsub("^\\.$", "", x)))
  m
       [,1] [,2] [,3] [,4]
  [1,] "1"  "2"  ""   "4"
  [2,] "2"  ""   "4"  "5"
  [3,] "3"  "4"  ""   "6"

  # then
  mode(m) <- "numeric"
  # or
  m <- apply(m, 2, type.convert)
  # will give
  m
       [,1] [,2] [,3] [,4]
  [1,]    1    2   NA    4
  [2,]    2   NA    4    5
  [3,]    3    4   NA    6

--- [EMAIL PROTECTED] wrote: On 10-Jun-07 19:27:50, Stephen Tucker wrote: [...]
[R] Determination of % of misclassification
Hi R-users, Suppose I have a two-class discrimination problem and I am using logistic regression for the classification.

  model.logit <- glm(formula = RES ~ NUM01+NUM02+NUM03+NUM04,
                     family = binomial(link="logit"), data = train.data)
  predict.logit <- predict.glm(model.logit, newdata = test.data,
                               type = 'response', se.fit = FALSE)
  predict.logit

I have two questions: 1. Suppose our training data consist of 700 observations and the testing set of 300. How can I determine the number of misclassifications from the predicted and fitted values? 2. How do I determine the AUC from the ROC curve, and also the threshold value? Waiting for reply, Thanks in advance, bye
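A sketch of both computations on simulated data (all names invented; 'pred' plays the role of predict.logit above). The confusion matrix gives the misclassification count at a chosen threshold, and the AUC can be computed directly from the Wilcoxon identity without extra packages:

```r
## Simulated stand-in for the test-set predictions described above.
set.seed(5)
n    <- 300
x    <- rnorm(n)
RES  <- rbinom(n, 1, plogis(1.5 * x))
fit  <- glm(RES ~ x, family = binomial)
pred <- predict(fit, type = "response")

## 1. Misclassifications at a 0.5 threshold: cross-tabulate observed
##    vs. predicted class and count the off-diagonal cells.
pred.class    <- as.integer(pred > 0.5)
tab           <- table(observed = RES, predicted = pred.class)
misclassified <- sum(RES != pred.class)

## 2. AUC without extra packages, via the Wilcoxon/Mann-Whitney
##    identity: the probability that a randomly chosen event gets a
##    higher score than a randomly chosen non-event (+ 1/2 for ties).
ev  <- pred[RES == 1]; nev <- pred[RES == 0]
auc <- mean(outer(ev, nev, ">") + 0.5 * outer(ev, nev, "=="))
c(misclassified = misclassified, auc = auc)
```

For full ROC curves and data-driven threshold selection, a dedicated package such as ROCR offers these computations.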
[R] lm for matrix of response...
Dear All, 1) Can I use lm() to fit more than one response in a single expression? E.g. the data is a matrix of these variables:

  R1 R2 R3  X  Y  Z
   1  2  1  1  2  3

Now I want to fit R1:R3 ~ X+Y+Z. 2) How can I use the singular value decomposition (SVD) as an alternative to least squares? Regards,
[R] [R-pkgs] Updated ggplot2 package (beta version)
ggplot2
===

ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics. Find out more at http://had.co.nz/ggplot2

Changes in version 0.5.1
--

* new chapter in book and changes to package to make it possible to customise every aspect of ggplot display using grid
* a new economic data set to help demonstrate line, path and area plots
* many bug fixes reported by beta testers

Hadley

___ R-packages mailing list [EMAIL PROTECTED] https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] lm for matrix of response...
On Sun, 10 Jun 2007, vinod gullu wrote: Dear All, 1) Can I use lm() to fit more than one response in a single expression? [...]

?lm says: "If 'response' is a matrix a linear model is fitted separately by least-squares to each column of the matrix." So:

  cbind(R1, R2, R3) ~ X+Y+Z

2) How can I use Singular Value Decomposition (SVD) as an alternative to least squares?

See ?svd. Note that SVD is not a model-fitting criterion, but it can be used to fit by least squares. If you mean something else, please study the posting guide and tell us precisely what you mean, with references.

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
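Both halves of the answer can be illustrated together (toy data, invented for the sketch): a matrix response via cbind(), and the same least-squares solution obtained from the SVD pseudoinverse:

```r
## Toy data: two responses, two predictors (all values invented).
set.seed(11)
n <- 50
X <- cbind(1, x1 = rnorm(n), x2 = rnorm(n))
B <- matrix(c(1, 2, -1,  0.5, 0, 3), ncol = 2)      # true coefficients
Y <- X %*% B + matrix(rnorm(2 * n, sd = 0.1), ncol = 2)
d <- data.frame(y1 = Y[, 1], y2 = Y[, 2],
                x1 = X[, "x1"], x2 = X[, "x2"])

## A matrix response: lm() fits each column by least squares.
fit <- lm(cbind(y1, y2) ~ x1 + x2, data = d)
coef(fit)                      # 3 x 2 coefficient matrix

## The same fit via the SVD: X = U D V', so the least-squares
## solution is B.hat = V D^{-1} U' Y (the pseudoinverse applied to Y).
s     <- svd(X)
B.hat <- s$v %*% diag(1 / s$d) %*% t(s$u) %*% Y
all.equal(unname(coef(fit)), B.hat)   # TRUE (up to numerical error)
```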