[R] GLARMA
Hello, I am a new R user and I need R code for GLARMA. I will be really thankful if you help me. Yours sincerely,

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] problem with dse package (was KALMAN FILTER HELP)
This has come up before: it needs a bug fix which Paul Gilbert has already implemented (but not yet released).

Please use an informative subject line, and don't SHOUT at us. (All caps is regarded as shouting, and BTW the package bundle is dse, not DSE.)

On Tue, 3 Jan 2006, Sumanta Basak wrote:

> Currently I'm using the dse package for Kalman filtering. I have a dataset of one dependent variable and seven other independent variables. I'm confused at one point: how do I declare the input-output series using the TSdata command? The example given at page 37 shows an error:
>
>   rain <- matrix(rnorm(86*17), 86, 17)
>   radar <- matrix(rnorm(86*5), 86, 5)
>   mydata <- TSdata(input=radar, output=rain)
>   input data:
>   Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
>
> Can anyone explain to me what's going wrong here? In my data set, I have "Change in Exchange Rate" as my dependent variable and seven other economic variables as independent variables. I'm trying to forecast "Change in Exchange Rate" using an available dataset of 244 points. How can I declare the input and output datasets in this framework? I hope I'm right to explain in this way what I'm ultimately going to do. After having a TSdata object, I want to use toSS to convert the TS model into a state-space model, and then use l.SS. Am I right in my thinking? Please advise, and many thanks in advance.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
Re: [R] GLARMA
On 12/30/05, [EMAIL PROTECTED] wrote:

> Hello, I am a new R user and I need R code for GLARMA. I will be really thankful if you help me.

You should really spell out an acronym like GLARMA. RSiteSearch("GLARMA") doesn't give anything. You could have a look at package sspir.

Kjetil
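GLARMA is often used to mean generalized linear autoregressive moving average models for count time series. In case it helps, here is a very rough sketch of the idea using only base R: fit a Poisson GLM, then refit with a lagged Pearson residual as an extra regressor. This is a crude observation-driven approximation, not a real GLARMA maximum-likelihood fit, and all names and data below are invented:

```r
## Crude GLARMA-flavoured fit with base R only (illustrative sketch, not a
## proper GLARMA estimator): Poisson regression plus a lagged Pearson residual.
set.seed(1)
n <- 200
x <- rnorm(n)                                # one covariate
y <- rpois(n, exp(0.5 + 0.3 * x))            # simulated count series

fit0 <- glm(y ~ x, family = poisson)         # static GLM
r    <- residuals(fit0, type = "pearson")
rlag <- c(0, head(r, -1))                    # residual lagged by one step

fit1 <- glm(y ~ x + rlag, family = poisson)  # "ARMA-like" feedback term
coef(fit1)
```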
[R] Extending a data frame with S4
I'm trying to create an extension to data.frame with more complex row and column names, and have run into some problems:

  setClass("new-data.frame", representation("data.frame"))
  [1] "new-data.frame"
  Warning message:
  old-style ('S3') class "data.frame" supplied as a superclass of "new-data.frame", but no automatic conversion will be peformed for S3 classes in: .validDataPartClass(clDef, name)

Do I need to be worried about this?

  new("new-data.frame", data.frame())
  Error in initialize(value, ...) : initialize method returned an object of class "data.frame" instead of the required class "new-data.frame"

I guess this is related to the warning above. I presume I can fix this with an initialize function, but I'm not sure how to go about referring to the data frame that is the object. Is there a way to extend a data.frame, or do I need to create an object that contains the data frame in a slot?

Thanks for your help,
Hadley
Re: [R] A comment about R:
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck
Sent: Monday, January 02, 2006 4:59 PM
To: Philippe Grosjean
Cc: Kort, Eric; Kjetil Halvorsen; R-help@stat.math.ethz.ch
Subject: Re: [R] A comment about R:

> Probably what is needed is for someone familiar with both Stata and R to create a lexicon in the vein of the Octave to R lexicon http://cran.r-project.org/doc/contrib/R-and-octave-2.txt to make it easier for Stata users to understand R. Ditto for SAS and SPSS.

IMO this is a very good proposal, but I think that the main problem is not the translation of one function in SPSS/Stata/SAS to the equivalent in R. My memory of my first contact with R, after using SPSS for some years (and having some experience with Stata and SAS), is that your mental framework is different. You think in SPSS terms (i.e. you expect that data are automatically a rectangular matrix, functions operate on columns of this matrix, you always have only one dataset available, ...). This is why jumping from SPSS to Stata is relatively easy, but jumping from any of the three to R is much more difficult. This mental barrier is also the main obstacle for me now when I try to encourage the use of R by other people who have a background similar to mine.

What can be done about it? I guess the only answer is investing time from the user, which implies that R will probably never become the language of choice for casual users. But popularity is probably not the main goal of the R-Project (it would rather be a nice side-effect).

Just a few thoughts ...

Best,
Roland
Re: [R] A comment about R:
Roland,

Yes, indeed, you are perfectly right. The problem is that R's richness means R complexity: many different data types, sub-languages like regexps or the formula interface, S3/S4 objects, classical versus lattice (versus RGL versus iplots) graphics, etc.

During the translation of R into French, I was thinking of a subset of one or two hundred functions that would be enough for beginners to start with, and of proposing a translation of that small subset of the online help into French. This is still on my todo list, but I must admit it is not an easy task to decide which functions should be kept in the subset and which should not!

In fact, that idea could perhaps be generalized to the whole online help. It would be sufficient to add a flag somewhere (perhaps a keyword) indicating that a page is fundamental, and to allow filtering of the index and pages (fundamental only, or full help). Even for advanced users, it would be nice to have such a filter to display only the two or three most important functions in a new package that offers perhaps a hundred online help pages.

Using R Commander is also an interesting experiment. R Commander simplifies the use of R down to the manipulation of a single data frame (the so-called active dataset), plus optionally one or two model objects. Just look at all you can do with one active data frame in R Commander, and you will realize that it is perfectly manageable to learn R that way.

Best,
Philippe Grosjean

Rau, Roland wrote:

> IMO this is a very good proposal, but I think that the main problem is not the translation of one function in SPSS/Stata/SAS to the equivalent in R. [...]
[R] lmer error message
Dear All,

I get the following error message when I fit lmer to binary data with the AGQ option:

  Error in family$mu.eta(eta) : NAs are not allowed in subscripted assignments
  In addition: Warning message:
  IRLS iterations for PQL did not converge

Any help? Thanks in advance,

Abderrahim
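No reply appears in this thread; one common first check (a generic sketch, not specific to lmer — the data frame and column names below are invented) is to look for missing values and degenerate groups in the model variables before fitting:

```r
## Generic pre-flight checks before fitting a binary mixed model
## (illustrative data; 'resp', 'x', 'grp' are made-up names).
d <- data.frame(resp = c(0, 1, NA, 1, 0, 1),
                x    = c(1.2, NA, 0.5, 2.0, 0.3, 1.1),
                grp  = c("a", "a", "b", "b", "c", "c"))

colSums(is.na(d))          # how many NAs per column?
d2 <- na.omit(d)           # drop incomplete rows before fitting
table(d2$resp, d2$grp)     # check each group has observations for both outcomes
```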
Re: [R] A comment about R:
Rau, Roland [EMAIL PROTECTED] wrote:

> IMO this is a very good proposal, but I think that the main problem is not the translation of one function in SPSS/Stata/SAS to the equivalent in R. You think in SPSS terms (i.e. you expect that data are automatically a rectangular matrix, functions operate on columns of this matrix, you always have only one dataset available, ...). [...] But popularity is probably not the main goal of the R-Project (it would rather be a nice side-effect).

As someone who uses SAS quite a bit and R somewhat less, I think Roland makes some excellent points. Going from SPSS to SAS (which I once did) is like going from Spanish to French. Going from SAS to R (which I am trying to do) is like going from English to Chinese.

But it's more than that. Beyond the obvious differences in the languages is a difference in how they are written about, and how they are improved. SAS documentation is much lengthier than R's. Some people like the terseness of R's help; some like the verboseness of SAS's. Some of this difference is doubtless due to the fact that SAS is commercial and pays people to write the documentation. I have tremendous appreciation for the unpaid effort that goes into R, and nothing I say here should be seen as detracting from that.
As to how they are improved, the fact that R is extended (in part) by packages written by many, many different people is good, because it means that the latest techniques can be written up, often by the people who invent the techniques (and, again, I appreciate this tremendously). But it does mean that a) it is hard to know what is out there at any given time; and b) the styles of packages differ somewhat.

In addition, I think the distinction between 'casual user' and serious user is something of a false dichotomy. It's really a continuum, or, probably, several continua, that make R harder or easier for people to learn.

I like R. I like it a lot. I like that it's free. I like that it's cutting edge. I like that it can do amazing graphics. I like that the code is open. I like that I can write my own functions in the same language. And, again, I am amazed at the amount of time and effort people put into it. But I do think that the link in the original post made some good points, and the writer of that post is not the only one who has found R difficult to learn.

Peter
Re: [R] Extending a data frame with S4
The help page on setOldClass might help you, in particular the section "Register or Convert?".

Matthias

hadley wickham wrote:

> I'm trying to create an extension to data.frame with more complex row and column names, and have run into some problems: [...] Is there a way to extend a data.frame, or do I need to create an object that contains the data frame in a slot?

--
StaMatS - Statistik + Mathematik Service
Dipl.Math.(Univ.) Matthias Kohl
www.stamats.de
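For what it's worth, here is a minimal sketch of the "data frame in a slot" alternative Hadley mentions (the class and slot names are invented for illustration):

```r
## Wrap a data.frame in a slot instead of subclassing it directly.
setClass("wrappedDF",
         representation(df = "data.frame", note = "character"))

obj <- new("wrappedDF",
           df   = data.frame(x = 1:3, y = letters[1:3]),
           note = "toy example")

## Delegate a method to the wrapped data frame.
setMethod("dim", "wrappedDF", function(x) dim(x@df))
dim(obj)
```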
Re: [R] A comment about R:
On 1/3/06, Peter Flom [EMAIL PROTECTED] wrote:

> SAS documentation is much lengthier than R's. Some people like the terseness of R's help; some like the verboseness of SAS's. Some of this difference is doubtless due to the fact that SAS is commercial, and pays people to write the documentation. [...]

Note that at least some packages do have vignettes, which are lengthier discussions of the package than the help files, e.g.

  library(zoo)
  vignette("zoo")
> I have tremendous appreciation for the unpaid effort that goes into R, and nothing I say here should be seen as detracting from that. As to how they are improved, the fact that R is extended (in part) by packages written by many, many different people is good, because it means that the latest techniques can be written up, often by the people who invent the techniques (and, again, I appreciate this tremendously), but it does mean that a) it is hard to know what is out there at any given time; b) the styles of packages differ somewhat.

Regarding (a), note that for certain areas the CRAN Task Views address this, at least in part. See: http://cran.r-project.org/src/contrib/Views/

R-News also has a section on changes on CRAN which lists all new packages since the prior issue. See: http://cran.r-project.org/doc/Rnews
[R] need to know some basic functionality features of R-Proj
Hi, I am a newcomer to statistics and R. I would like to know if these features can be attained in R. Please help.

1) beta 1 and beta 2, or gamma one and gamma two, for skewness and kurtosis, respectively, including standard errors and tests for significance (relative to values for a Gaussian distribution)
2) linear correlation
3) quadratic regression
4) polynomial regression
5) moving averages
6) chi-square for a two-by-two table and for an n-by-m contingency table
7) moving averages with various (e.g. exponential) weighting
8) cubic splines (smoothing, not interpolating)
9) other types of splines, e.g. 'linear' splines
10) erfc^-1, the inverse error function complement (i.e. tables of integrals of the normal (Gaussian) curve, or mathematical approximations)
11) erfc, the error function complement
12) table of significant values for the t test at P < 0.01, one sided or two sided, or a polynomial approximation
13) table of significance levels for the chi-square test
14) table of significance levels for the F distribution as arising in ANOVA
15) confidence limits for binomial variables; possibly for multinomial variables

Thanks and Regards,
Asif
Re: [R] S3 vs. S4
On 1/1/06 2:07 PM, Erin Hodgess [EMAIL PROTECTED] wrote:

> Dear R People: Could someone direct me to some documentation on the difference between S3 and S4 classes, please? For example, why would a person use one as opposed to another? Maybe pros and cons of each?

The Bioconductor project has encouraged my use of S4 classes. S4 allows the creation of data structures that have methods associated with them, so for data-structure-heavy programming I think S4 might have some advantages, but I am NOT an expert in the field. Just one other link that I have found quite useful: http://www.stat.auckland.ac.nz/S-Workshop/Gentleman/S4Objects.pdf

Sean
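As a quick illustration of the difference in flavour (all class and function names below are invented):

```r
## S3: a class is just an attribute; dispatch works by naming convention.
s3obj <- structure(list(val = 1), class = "myS3")
print.myS3 <- function(x, ...) cat("myS3 with val", x$val, "\n")
print(s3obj)

## S4: classes and methods are declared formally and validated.
setClass("myS4", representation(val = "numeric"))
setMethod("show", "myS4",
          function(object) cat("myS4 with val", object@val, "\n"))
new("myS4", val = 1)
```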
Re: [R] need to know some basic functionality features of R-Proj
On 1/3/06 6:46 AM, Mohammed Asifulla - CTD, Chennai [EMAIL PROTECTED] wrote:

> Hi, I am a newcomer to statistics and R. I would like to know if these features can be attained in R. Please help. 1) beta 1 and beta 2, or gamma one and gamma two, for skewness and kurtosis [...] 15) confidence limits for binomial variables; possibly for multinomial variables

Asif,

It is highly likely that all of these can be attained using R. I think most (if not all) of the items on your list can be done with existing packages; for those that can't, R is also a full-featured programming language, so you can write functions to do what you like.

I would suggest starting with the "Introduction to R" manual to learn what R can do. It can be obtained via the Manuals link at the left side of the R home page: http://www.r-project.org

Also, if you are posting to the email list, it is quite helpful to read the posting guide, available as a link at the bottom of all emails from this list.

Sean
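To make Sean's point concrete, most items on the list map onto base R one-liners. A hedged sketch on simulated data (variable names invented; the numbered comments refer to the items in the original post):

```r
## Base-R counterparts for several of the requested features (simulated data).
set.seed(42)
x <- rnorm(50)
y <- 2 * x + x^2 + rnorm(50)

cor(x, y)                                # 2) linear correlation
fit <- lm(y ~ poly(x, 2))                # 3)/4) quadratic / polynomial regression
ma <- filter(y, rep(1/5, 5), sides = 2)  # 5)/7) (weighted) moving average
a <- sample(c("u", "v"), 50, replace = TRUE)
b <- sample(c("p", "q"), 50, replace = TRUE)
chisq.test(table(a, b))                  # 6) chi-square on a contingency table
sp <- smooth.spline(x, y)                # 8) smoothing cubic spline
qt(0.995, df = 10)                       # 12) t critical value (P < 0.01, two-sided)
qchisq(0.95, df = 3)                     # 13) chi-square critical value
qf(0.95, 2, 10)                          # 14) F critical value
binom.test(8, 20)$conf.int               # 15) binomial confidence limits
erfc     <- function(z) 2 * pnorm(-sqrt(2) * z)   # 11) via the normal CDF
erfc_inv <- function(p) -qnorm(p / 2) / sqrt(2)   # 10) its inverse
```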
Re: [R] Q about RSQLite
Hello Liu,

this might be caused by NA entries in your SQLite table. Have a look at the following code:

  library(RSQLite)
  (test <- data.frame(matrix(c(1:10, NA, NA), ncol=2, nrow=6)))
  con <- dbConnect(SQLite(), dbname = "test.db")
  dbWriteTable(con, "test", test, type="BLOB", overwrite=TRUE)
  d1 <- dbReadTable(con, "test")
  dbDisconnect(con)
  d1

HTH,
Bernhard

-----Original Message-----
From: Wensui Liu [mailto:[EMAIL PROTECTED]
Sent: Saturday, December 31, 2005 07:09
To: r-help@stat.math.ethz.ch
Subject: [R] Q about RSQLite

> Happy new year, dear listers, I have a question about RSQLite. When I fetch the data out of the SQLite database, there is something like '\r\n' at the end of the last column. Here is the example:
>
>      Sepal_Length Sepal_Width Petal_Length Petal_Width     Species
>   1           5.1         3.5          1.4         0.2  setosa\r\n
>   2           4.9         3.0          1.4         0.2  setosa\r\n
>   3           4.7         3.2          1.3         0.2  setosa\r\n
>   4           4.6         3.1          1.5         0.2  setosa\r\n
>   5           5.0         3.6          1.4         0.2  setosa\r\n
>   6           5.4         3.9          1.7         0.4  setosa\r\n
>   7           4.6         3.4          1.4         0.3  setosa\r\n
>   8           5.0         3.4          1.5         0.2  setosa\r\n
>   9           4.4         2.9          1.4         0.2  setosa\r\n
>   10          4.9         3.1          1.5         0.1  setosa\r\n
>
> Any idea? Thank you so much
Re: [R] Q about RSQLite
Check the way you imported the data, and the SQLite documentation. The \r\n that you see (you're on Windows, right?) is used to indicate the end of the data lines in the source file: \r is a carriage return, and \n is a newline character.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Wensui Liu
Sent: Saturday, December 31, 2005 1:09 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Q about RSQLite

> Happy new year, dear listers, I have a question about RSQLite. When I fetch the data out of the SQLite database, there is something like '\r\n' at the end of the last column. [...] Any idea? Thank you so much
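Neither reply shows the cleanup step; a small sketch of stripping such trailing line endings after the fact (the data below are invented):

```r
## Strip trailing carriage-return/newline characters from a character column.
species <- c("setosa\r\n", "versicolor\r\n", "virginica")
clean   <- sub("[\r\n]+$", "", species)
clean   # "setosa" "versicolor" "virginica"
```

In practice the same `sub()` call would be applied to the offending column of the fetched data frame, e.g. `d1$Species <- sub("[\r\n]+$", "", d1$Species)`.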
Re: [R] Bookmarking a page inside r-project.org
On Tue, 3 Jan 2006 07:29:27, hadley wickham (hw) wrote:

> A solution would be a content-management system that produced the HTML of the site from some other form of input. Only the output HTML would need to be mirrored.

Care to put together such a thing, and import all the existing pages into it?

> One way to get around the offline problem is to have a dynamic copy somewhere and then spider and save it (e.g. with wget -r). This would (obviously) require a server somewhere, but with a post-commit svn hook it could be kept up to date easily. However, it is still difficult to view changes to the page immediately.
>
> What assumptions can I make about what tools are available to the editors? Can I assume the standard unix tool chain?

Yes.

> What assumptions can I make about the people doing the editing? How many people edit the pages?

For www.R-project.org all of R core have write access, but only a few actually do it ;-)

> How familiar with html are they?

Hard to tell; let's assume at least basic familiarity with HTML (but very good familiarity with the concept of markup languages per se).

> You say many of the pages are manually edited; which ones aren't? How are they generated?

Under www.r-project.org I think all are manual. On CRAN all package listings are of course auto-generated (mostly using perl scripts); the mirror list is created using R.

> Are all the pages under https://svn.r-project.org/R-project-web/trunk/ ?

No, CRAN is not, as it is pulled together from various sites where the maintainers of binary distributions etc. create their parts. The CRAN master itself is a mirror for the bits and pieces (e.g., Windows R base binaries are mirrored from Duncan Murdoch, Windows packages from Uwe Ligges, etc.).
Best,
--
Friedrich Leisch
Institut für Statistik                Tel: (+43 1) 58801 10715
Technische Universität Wien           Fax: (+43 1) 58801 10798
Wiedner Hauptstraße 8-10/1071
A-1040 Wien, Austria                  http://www.ci.tuwien.ac.at/~leisch
Re: [R] r: RODBC QUESTION
Clark,

I agree with Vitor that working in R might be easier, but it seems that you are working in the Excel VBA environment, and there may be a good reason for that which we don't know about. If so, why use an RExcel function to read the file into Excel when you can use VBA code to open the file in Excel, and then send the data you need to analyse to R using RExcel? And of course the great thing about VBA is that if you don't know how to code what you want to do, you can always record it as a macro and then view the code (a neat feature that S-PLUS has too). Good luck, and please follow up with more questions if our suggestions are of no help to you.

Thanks,
Roger

On 12/31/05, Vitor Chagas [EMAIL PROTECTED] wrote:

> Hello Allan,
>
> You can work in two different ways: from Excel using RExcel, or from R with RODBC. Personally I prefer working from R. You can start by giving names to the Excel ranges (remember to put the variable names in the first line), then run the following code to select the Excel spreadsheet:
>
>   library(RODBC)
>   # Select XLS file
>   xls.file = choose.files(filters = "*.xls")
>   workdir = unlist(strsplit(xls.file, ))
>   workdir = paste(workdir[-length(workdir)], collapse="/")
>   setwd(workdir)
>   # Connect to XLS data
>   channel <- odbcConnectExcel(xls.file)
>
> After this you can check what tables (ranges) are available to work with:
>
>   sqlTables(channel)
>
> If you have a range named "Claims", use something like this to load it into R:
>
>   xlClaims = sqlQuery(channel, paste("SELECT * FROM Claims"))
>
> and close the connection with:
>
>   close(channel)
>
> Hope it helps, and sorry for my poor English. Best regards, Vitor
>
> --- Clark Allan [EMAIL PROTECTED] wrote:
>> hello all, i have a quick question. i have been using the RODBC library (trying to read Excel data into R, but i am doing this by using RExcel. this is probably not the correct forum - sorry for this).
my code is shown below:

  Sub A()
      'start the connection to R
      Call RInterface.StartRServer
      RInterface.RRun "library(RODBC)"
      RInterface.RRun "A = odbcConnectExcel('c:/TRY.xls')"
      RInterface.RRun "q1 = sqlFetch(A, 'Sheet1')"
      RInterface.RRun "odbcClose(A)"
      Worksheets("out").Activate
      Call RInterface.GetArray("q1", Range("A1"))
      Call RInterface.StopRServer
  End Sub

i have included four examples below. on the left hand side we have the data as it appears in Excel, and on the right hand side we have the output from the code (outputted to the 'out' sheet in Excel). in the first example the code works, while in the other three examples it does not. ('a' is some character.) when i use the commands in R directly, everything works correctly (i.e. missing values are treated as NA; characters are treated similarly). can anyone show me how to solve this?

ANOTHER QUESTION: am i allowed to have numeric and character values in the same column when importing from Excel to R? (seems like i can't)

thanking you in advance! wishing you all a happy new year (in advance)

/ allan

  Y X1 X2 1 6 3 1 6 3 2 6 2 2 6 2 3 5 2 3 5 2
  Y X1 X2 0 1 6 3 2 6 2 3 a 2
  Y X1 X2 0 1 6 3 2 6 2 3 a 2
  Y X1 X2 0 1 3 2 6 2 3 5 2
[R] Labels exceed the plot area
If I use cex.lab = 2 and cex.axis = 2, the y-axis label in a plot exceeds the plot area. How do I make the plot itself smaller to get space for the label, so I can still use cex.lab = 2 and cex.axis = 2?

Kare
--
###
Kare Edvardsen [EMAIL PROTECTED]
Norwegian Institute for Air Research (NILU)
Polarmiljosenteret, NO-9296 Tromso
http://www.nilu.no
Swb. +47 77 75 03 75  Dir. +47 77 75 03 90
Fax. +47 77 75 03 76  Mob. +47 90 74 60 69
###
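No answer appears in this thread; the usual base-graphics approach (a hedged sketch, assuming base graphics are in use) is to widen the figure margins with par(mar=) and, if needed, push the axis title out with par(mgp=):

```r
## Enlarge the left margin so a double-size y-axis label fits
## (drawn to a temporary PDF here so the sketch runs anywhere).
pdf(tempfile(fileext = ".pdf"))
par(mar = c(5, 6, 4, 2) + 0.1,   # default left margin is 4.1 lines; use 6.1
    mgp = c(4, 1, 0))            # move the axis title out to margin line 4
plot(1:10, (1:10)^2,
     xlab = "x", ylab = "y squared",
     cex.lab = 2, cex.axis = 2)
dev.off()
```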
[R] Package for multiple membership model?
Hello all:

I am interested in computing what the multilevel modeling literature calls a multiple membership model. More specifically, I am working with a data set involving clients and providers. The clients are the lower-level units who are nested within providers (higher-level). However, this is not nesting in the usual sense, as clients can belong to multiple providers, which I understand makes this a multiple membership model. Right now, I would like to keep this simple, using only a continuous dependent variable, but would like to also extend this to a repeated-measures design. This doesn't seem to be possible with the lme package. Is there something else I could consider?

Thanks,
Brian

NIMH Training Fellow
GWB School of Social Work, PhD Program
Washington University in St. Louis
One Brookings Drive
St. Louis, MO 63130
Re: [R] KALMAN FILTER HELP
Is this happening with the example as you show it, or are you trying to print mydata? There is a bug in the print method for TSdata objects, which I have fixed and was intending to put on CRAN in a few days. This bug does give the infinite recursion error, but would only happen when you print the data by typing mydata or print(mydata). I don't think the assignment you show would produce this problem, but please send me more details if it does. The problem, which will be fixed in the next release, is only with the print method. Other things are working and you should be able to do model estimation, conversion, and plot the data, just not print it. Paul Gilbert

Sumanta Basak wrote: Hi All, Currently I'm using the dse package for Kalman filtering. I have a dataset of one dependent variable and seven other independent variables. I'm confused at one point: how to declare the input-output series using the TSdata command, because the given example at page 37 shows an error:

rain <- matrix(rnorm(86*17), 86, 17)
radar <- matrix(rnorm(86*5), 86, 5)
mydata <- TSdata(input=radar, output=rain)
input data:
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

Can anyone explain to me what's going wrong in this? In my data set, I have "Change in Exchange Rate" as my dependent variable and seven other economic variables as independent variables. I'm trying to forecast "Change in Exchange Rate" using an available dataset of 244 points. How can I declare the input and output datasets in this framework? I hope I'm right to explain in this way what ultimately I'm going to do. After having a TSdata object, I want to use toSS to convert the TS model into a state space model, and then use l.SS. Am I right in my thinking? Please advise, and many thanks in advance. -- SUMANTA BASAK. Analyst. Phone No. - 080 - 41989937 (O) 09886047620 (M) Amba Research (India) Pvt Ltd. G02 Prestige Loka. 7/1, Brunton Road. Bangalore - 560025. India.
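Paul's reply confirms that estimation works even while printing is broken. As an interim illustration of state-space fitting (a sketch on simulated data, not the dse workflow): base R's StructTS() fits a univariate structural model, though unlike dse's TSdata/toSS/l.SS it cannot use the seven input regressors.

```r
# A minimal state-space fit and forecast in base R (stats::StructTS), as a
# stand-in illustration; the series is invented, with 244 points like the
# exchange-rate dataset in the question.
set.seed(1)
y   <- ts(cumsum(rnorm(244)))        # simulated random-walk series
fit <- StructTS(y, type = "level")   # local-level (random walk + noise) model
fc  <- predict(fit, n.ahead = 12)    # 12-step-ahead forecast with std. errors
```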
[R] how to work on multiple R objects?...
Hello, Happy New Year!... I am encountering a problem trying to work on the data that I load in R. I have loaded into R a series of stock data files (csv files named e.g. IBM.R) using:

length.R <- length(list.files(".", pattern=".R"))  # the number of files with one
                                                   # column in the directory "." ending in .R
for (i in 1:length.R) {
  assign(list.files(".", pattern=".R")[i], read.csv(list.files(".", pattern=".R")[i]))
}

I would like to perform various tasks on all these objects, but I cannot, because ls(pattern=".R")[1] is not a list but a character string!!!:

> typeof(ls(pattern=".R")[1])
[1] "character"

Exempli gratia:

> typeof(ls(pattern=".R")[45])
[1] "character"
> ls(pattern=".R")[45]
[1] "wmd.txt.R"

I do not want "character" there; I want typeof(wmd.txt.R) to give "list", so that I can find the mean of the series in all of these files/loaded objects with a loop, rather than typing mean(wmd.txt.R) for each object by name. Could you help me, please, or propose another way to achieve the same result? Thank you very much for your assistance, Tsardounis Constantine
Re: [R] how to work on multiple R objects?...
On 1/3/06 10:24 AM, Constantine Tsardounis [EMAIL PROTECTED] wrote: Hello, Happy New Year!... I am encountering a problem trying to work on the data that I load in R. I have loaded into R a series of stock data files (csv files named e.g. IBM.R) using assign() in a loop over list.files(".", pattern=".R").

mylist <- list()
for (i in list.files(".", pattern=".R")) {
  mylist[[i]] <- read.csv(i)
}

Sean
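Once the files live in a named list like Sean's mylist, per-series statistics no longer need get() or loops over object names. A sketch with invented in-memory data standing in for the CSV files:

```r
# With data frames collected in a named list, sapply() maps a summary over
# every series at once; the list names carry through to the result.
mylist <- list(IBM = data.frame(close = c(80, 82, 81)),
               GE  = data.frame(close = c(34, 35, 33)))
means <- sapply(mylist, function(d) mean(d[[1]]))   # mean of first column each
```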
[R] How to set the size of a rgl window, par3d() ?
Dear R- Users, is there a way to determine the size of an rgl window (rgl.open()) either in advance or afterwards, (without using the mouse, of course) ? Intuitively, one would assume to set the size by: library(rgl); par3d(viewport=c(0,0,500,500)); #rgl.open(); for example. As the parameter 'viewport' is 'readonly' this results in an error message: Error in par3d(viewport = c(0, 0, 500, 500)) : invalid value specified for rgl parameter viewport In addition: Warning message: parameter viewport cannot be set. Any possible workarounds ? Thanks Bjoern
Re: [R] A comment about R:
I have had an email conversation with the author of the technical report from which the quote was taken. I am formulating a comment to the report that will be posted with the technical report. I would be pleased if this thread continued, so I will know better what I want to say. Plus I should be able to reference this thread in the comment. Regards, Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User) Rau, Roland wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck Sent: Monday, January 02, 2006 4:59 PM To: Philippe Grosjean Cc: Kort, Eric; Kjetil Halvorsen; R-help@stat.math.ethz.ch Subject: Re: [R] A comment about R: Probably what is needed is for someone familiar with both Stata and R to create a lexicon in the vein of the Octave to R lexicon http://cran.r-project.org/doc/contrib/R-and-octave-2.txt to make it easier for Stata users to understand R. Ditto for SAS and SPSS. IMO this is a very good proposal but I think that the main problem is not the translation of one function in SPSS/Stata/SAS to the equivalent in R. Remembering my first contact with R after using SPSS for some years (and having some experience with Stata and SAS) was that your mental framework is different. You think in SPSS-terms (i.e. you expect that data are automatically a rectangular matrix, functions operate on columns of this matrix, you have always only one dataset available, ...). This is why jumping from SPSS to Stata is relatively easy. But to jump from any of the three to R is much more difficult. This mental barrier is also the main obstacle for me now when I try to encourage the use of R to other people who have a similar background as I had. What can be done about it? I guess the only answer is investing time from the user which implies that R will probably never become the language of choice for casual users. 
But popularity is probably not the main goal of the R-Project (it would be rather a nice side-effect). Just a few thoughts ... Best, Roland
Re: [R] A comment about R:
On Mon, 2 Jan 2006, Philippe Grosjean wrote: That said, I think one should interpret Mitchell's paper in a different way. Obviously, he is an unconditional and happy Stata user (he even wrote a book about graphs programming in Stata). His claim in favor of Stata (versus SAS and SPSS, and also, indirectly, versus R) is to be interpreted the same way as unconditional lovers of Macintoshes or PCs would argue against the other clan. Both architectures are good and have strengths and weaknesses. Real arguments are more sentimental, and could be summed up as: The more I use it, the more I like it,... and the aliens are bad, ugly and stupid! Would this apply to Stata versus R? I don't know Stata at all, but I imagine it could be the case from what I read in Mitchell's paper... I think there are good reasons why Stata is becoming much more popular in epidemiology and biostatistics [and I'm not particularly prejudiced against R]. In my experience people who like R also like Stata, though clearly the reverse is not necessarily true. Stata, like R, is readily programmable. Users can -- and do -- write and distribute programs that look just like the built-in routines. There is an active and helpful mailing list. However, Stata programming is very different from R programming, since it is macro-based (think Tcl/Tk) rather than function-based. Stata is also easier to learn: it has a very consistent syntax and even better documentation than R. We use Stata for all our service course teaching, and despite the fact that it is command-line based rather than GUI the students were no more unhappy than when SPSS was used for the lowest-level courses and Egret for the higher-level service courses. [Stata now has a GUI but it is awful and quite a lot of students prefer the command-line] -thomas
Re: [R] bookmarking a page inside r-project.org
In fact it's just as easy in Internet Explorer: right-click + Open in New Window, or Shift-Click, followed by Ctrl+D. Or, right-click + Add to Favorites. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Charles Annis, P.E. Sent: Monday, January 02, 2006 8:15 PM To: 'Jonathan Baron'; r-help@stat.math.ethz.ch Subject: Re: [R] bookmarking a page inside r-project.org You can do something similar with Microsoft's browser but it isn't quite as easy as Firefox: Right-click on the frame and choose Properties. Then highlight and copy the URL and paste into the address window and click Go. Then save the page. Charles Annis, P.E. [EMAIL PROTECTED] phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jonathan Baron Sent: Monday, January 02, 2006 7:45 PM To: r-help@stat.math.ethz.ch Subject: [R] bookmarking a page inside r-project.org I'm replying to: https://stat.ethz.ch/pipermail/r-help/2006-January/083823.html In Firefox (a browser), right click on the frame. Then you get a menu that has bookmark as one of the options. Firefox is available from http://www.mozilla.org. Jon -- Jonathan Baron, Professor of Psychology, University of Pennsylvania Home page: http://www.sas.upenn.edu/~baron
Re: [R] Package for multiple membership model?
Sounds like a model with cross-classified random effects. lme4 can handle this easily. Shige On 1/3/06, Brian Perron [EMAIL PROTECTED] wrote: Hello all: I am interested in computing what the multilevel modeling literature calls a multiple membership model. More specifically, I am working with a data set involving clients and providers. The clients are the lower-level units who are nested within providers (higher-level). However, this is not nesting in the usual sense, as clients can belong to multiple providers, which I understand makes this a multiple membership model. Right now, I would like to keep this simple, using only a continuous dependent variable, but would like to also extend this to a repeated measures design. This doesn't seem to be possible with the lme package. Is there something else I could consider? Thanks, Brian NIMH Training Fellow GWB School of Social Work, PhD Program Washington University in St. Louis One Brookings Drive St. Louis, MO 63130
Re: [R] Package for multiple membership model?
On Tue, 3 Jan 2006, Brian Perron wrote: Hello all: I am interested in computing what the multilevel modeling literature calls a multiple membership model. More specifically, I am working with a data set involving clients and providers. The clients are the lower-level units who are nested within providers (higher-level). However, this is not nesting in the usual sense, as clients can belong to multiple providers, which I understand makes this a multiple membership model. Right now, I would like to keep this simple, using only a continuous dependent variable, but would like to also extend this to a repeated measures design. This doesn't seem to be possible with the lme package. Is there something else I could consider? Thanks, I think you want lmer() in the lme4 and Matrix packages. It allows crossed random effects. -thomas
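A sketch of the crossed-random-effects formulation Thomas describes, on invented data (names and sizes hypothetical). Note that treating client and provider as crossed random effects approximates, but is not identical to, a weighted multiple-membership model.

```r
# Crossed (non-nested) random effects in lmer(): clients appear under
# several providers, so both grouping factors enter as separate (1 | .)
# terms rather than a nesting.
library(lme4)
set.seed(42)
d <- data.frame(score    = rnorm(200),
                client   = factor(sample(40, 200, replace = TRUE)),
                provider = factor(sample(10, 200, replace = TRUE)))
fit <- lmer(score ~ 1 + (1 | client) + (1 | provider), data = d)
```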
[R] under (and over) dispersion in Poisson regression
I am trying to use Poisson regression to model count data. My results are suggestive of underdispersion (0.79). How close to one does one want the measure of dispersion to be before one accepts the results of the analysis? I know that there is no definitive answer to my question, but I would like to get some sense of general practice. Thanks, John John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics Baltimore VA Medical Center GRECC and University of Maryland School of Medicine Claude Pepper OAIC University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 410-605-7119 - NOTE NEW EMAIL ADDRESS: [EMAIL PROTECTED]
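As a practical companion to the question (not an answer on how close to 1 is acceptable): the dispersion can be estimated, and the standard errors adjusted for it, by refitting with family = quasipoisson. Data simulated for illustration.

```r
# summary() of a quasipoisson fit reports the estimated dispersion and
# scales the standard errors by its square root; with truly Poisson data
# the estimate sits near 1, and values well below 1 indicate
# underdispersion as in the question.
set.seed(1)
d <- data.frame(x = rnorm(200))
d$y  <- rpois(200, lambda = exp(0.5 + 0.3 * d$x))
fit  <- glm(y ~ x, family = quasipoisson, data = d)
disp <- summary(fit)$dispersion
```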
Re: [R] Package for multiple membership model?
Brian Perron [EMAIL PROTECTED] writes: Hello all: I am interested in computing what the multilevel modeling literature calls a multiple membership model. More specifically, I am working with a data set involving clients and providers. The clients are the lower-level units who are nested within providers (higher-level). However, this is not nesting in the usual sense, as clients can belong to multiple providers, which I understand makes this a multiple membership model. Right now, I would like to keep this simple, using only a continuous dependent variable, but would like to also extend this to a repeated measures design. This doesn't seem to be possible with the lme package. Is there something else I could consider? Thanks, Brian

You could take a look at the lmer() function in the lme4/Matrix packages - see the R News 2005/1 article. One potential problem is that for repeated measurements, it is not (currently?) as strong on correlation structure as lme(). You can actually deal with crossed random effects in lme() too, it just gets a little more complicated, involving things like

library(nlme)
data(Assay)
as1 <- lme(logDens ~ sample*dilut, data=Assay,
           random=pdBlocked(list(pdIdent(~1), pdIdent(~sample-1), pdIdent(~dilut-1))))
as2 <- lme(logDens ~ sample*dilut, data=Assay,
           random=list(Block=pdBlocked(list(pdIdent(~1), pdIdent(~sample-1))), dilut=~1))
as3 <- lme(logDens ~ sample*dilut, data=Assay,
           random=list(Block=~1, Block=pdIdent(~sample-1), dilut=~1))

which all fit the same model (but get the DF wrong in three different ways...). This is slightly different from your example because the crossed factors are nested in Block, but you can always fake a nesting using

one <- rep(1, length(logDens))  # or whatever
lme(..., random=list(one=~...))

-- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, Entr. B, PO Box 2099, 1014 Cph. K, Denmark. Ph: (+45) 35327918, FAX: (+45) 35327907, ([EMAIL PROTECTED])
Re: [R] A comment about R:
Dear Peter et al., It's not reasonable to argue with someone's experience -- that is, if people tell me that they found R harder to learn than SAS, say, then I believe them -- but that's not my experience in teaching relatively inexperienced students to use statistical software. A few points: (1) Casual and initial use of statistical software is easier through a GUI, so it's not reasonable, for example, to compare learning to use SPSS via its GUI to learning R via commands. (2) I don't believe that it's hard to teach a useful initial subset of R commands. Which commands are in the subset will depend somewhat on what one is trying to do. I believe that there are several examples of this approach, including my R and S-PLUS Companion to Applied Regression. Likewise, starting with a simple modus operandi, such as working with a single attached data frame, can cut through a lot of the complexity. Once someone is comfortable with basic use of R, expanding knowledge of functions, packages, and other ways of handling data comes naturally. (3) I don't find R less uniform than SAS or SPSS, particularly in the way that statistical models are handled. Moreover, trying to do something innovative or non-standard in SAS is relatively difficult (in my experience), and even harder in SPSS. I'm less familiar with Stata, but uniformity seems one of its strengths. (The Stata scripting language puts me off, however.) (4) Not everyone has the same experience and thinks in the same way. I've used many different statistical packages and computing environments, and have learned quite a few programming languages (most of which I can no longer use). Of these, I found APL and R the easiest to learn, and Lisp (Lisp-Stat) the hardest. Sometimes, though, it's worth expending the effort to learn something that's difficult -- I feel that I got a lot out of learning to program in Lisp, for example. 
(5) The essential point is that how hard one finds it to learn something is a function of the intrinsic difficulty of the thing, the person's previous experience, preferred modes of thinking, etc., and how learning is approached. Regards, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter Flom Sent: Tuesday, January 03, 2006 6:28 AM To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Cc: R-help@stat.math.ethz.ch Subject: Re: [R] A comment about R: Rau, Roland [EMAIL PROTECTED] wrote IMO this is a very good proposal but I think that the main problem is not the translation of one function in SPSS/Stata/SAS to the equivalent in R. Remembering my first contact with R after using SPSS for some years (and having some experience with Stata and SAS) was that your mental framework is different. You think in SPSS-terms (i.e. you expect that data are automatically a rectangular matrix, functions operate on columns of this matrix, you have always only one dataset available, ...). This is why jumping from SPSS to Stata is relatively easy. But to jump from any of the three to R is much more difficult. This mental barrier is also the main obstacle for me now when I try to encourage the use of R to other people who have a similar background as I had. What can be done about it? I guess the only answer is investing time from the user which implies that R will probably never become the language of choice for casual users. But popularity is probably not the main goal of the R-Project (it would be rather a nice side-effect). As someone who uses SAS quite a bit and R somewhat less, I think Roland makes some excellent points. Going from SPSS to SAS (which I once did) is like going from Spanish to French. Going from SAS to R (which I am trying to do) is like going from English to Chinese. But it's more than that.
Beyond the obvious differences in the languages is a difference in how they are written about, and how they are improved. SAS documentation is much lengthier than R's. Some people like the terseness of R's help; some like the verboseness of SAS's. Some of this difference is doubtless due to the fact that SAS is commercial, and pays people to write the documentation. I have tremendous appreciation for the unpaid effort that goes into R, and nothing I say here should be seen as detracting from that. As to how they are improved, the fact that R is extended (in part) by packages written by many, many different people is good, because it means that the latest techniques can be written up, often by the people who invent the techniques (and, again, I appreciate this tremendously), but it does mean that a) it is hard to know what is out there at any given time; b) the styles of packages differ somewhat.
Re: [R] A comment about R:
John Fox [EMAIL PROTECTED] 1/3/2006 9:35 am as always, raises some excellent points. I have some responses, interspersed. It's not reasonable to argue with someone's experience -- that is, if people tell me that they found R harder to learn than SAS, say, then I believe them -- but that's not my experience in teaching relatively inexperienced students to use statistical software. A few points: A lot of this probably has to do with what you learned first. I learned SAS long before I learned R. Had it been reversed, I would probably find SAS hard. (1) Casual and initial use of statistical software is easier through a GUI, so it's not reasonable, for example, to compare learning to use SPSS via its GUI to learning R via commands. True, but I was comparing SAS and R, and this originally started with Stata and R, and all 3 of those are command driven. (4) Not everyone has the same experience and thinks in the same way. I've used many different statistical packages and computing environments, and have learned quite a few programming languages (most of which I can no longer use). Of these, I found APL and R the easiest to learn, and Lisp (Lisp-Stat) the hardest. Sometimes, though, it's worth expending the effort to learn something that's difficult -- I feel that I got a lot out of learning to program in Lisp, for example. This is, I think, a big part of it. I think that R would be a lot easier to learn for someone who has learned some other computer language. I have not. I agree that learning something difficult can often be worth it. Peter
[R] cox model
I'm a French medicine student and I work in oncology, on the treatment of breast cancer. I have 3 subgroups of patients. I made some Kaplan-Meier survival curves, and I fitted a Cox model. On the survival curves, among the last observations, the different survival curves cross at 150 months. I checked the validity of my model by studying the residuals, to test the proportional hazards assumption. I used the cox.zph function, but the global test gives p < 0.05, so my model does not satisfy proportional hazards. Do you know how I can cut the Cox model analysis before 150 months, which is the time where the curves cross? Thank you for your help. Dr Billemont
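A sketch of one way to do the truncation being asked about, using the survival package (the data here are invented; whether truncating at the crossing point is statistically defensible for these data is a separate question worth discussing with a statistician):

```r
# Administrative censoring at 150 months: cap follow-up times at 150,
# recode any event occurring later as censored, then refit the Cox model
# on the truncated data.
library(survival)
d <- data.frame(time   = c(30, 80, 120, 200, 260, 90, 140, 210),
                status = c( 1,  1,   0,   1,   0,  1,   1,   1),
                group  = factor(c("A", "B", "A", "B", "A", "B", "A", "B")))
d$status150 <- ifelse(d$time > 150, 0, d$status)  # events past 150 -> censored
d$time150   <- pmin(d$time, 150)                  # follow-up capped at 150
fit <- coxph(Surv(time150, status150) ~ group, data = d)
```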
Re: [R] How to set the size of a rgl window, par3d() ?
On 1/3/2006 10:30 AM, begert wrote: Dear R- Users, is there a way to determine the size of an rgl window (rgl.open()) either in advance or afterwards, (without using the mouse, of course) ? Intuitively, one would assume to set the size by: library(rgl); par3d(viewport=c(0,0,500,500)); #rgl.open(); for example. As the parameter 'viewport' is 'readonly' this results in an error message: Error in par3d(viewport = c(0, 0, 500, 500)) : invalid value specified for rgl parameter viewport In addition: Warning message: parameter viewport cannot be set. Any possible workarounds ? Not that I know of. This is handled by OpenGL and the windowing system; rgl just queries OpenGL to give the par3d(viewport) response. It would take a bit of time to add this, because it would need to be added for all 3 output devices (Windows, X11, OSX). Duncan Murdoch
[R] Summary functions to dataframe
I have written a few different summary functions. I want to calculate the statistics by groups, and I am having trouble getting the output as a dataframe. I have attached one example with a small dataset that calculates summary stats and percentiles; I have others that calculate upper confidence limits etc. I would like the output to be converted to a dataframe with one of the columns as the grouping variable. This seems simple, but my attempts with do.call(cbind, ...) and rbind have not worked, so I have concluded I am missing something obvious. Any help is appreciated. Thanks, Mike

areas <- structure(list(N_Type = structure(c(4, 1, 4, 1, 1, 4, 1, 4, 4, 1, 4, 1, 4, 1, 4, 1, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 4, 1, 4, 1, 2, 1, 2, 1, 4, 1, 4, 1, 4, 1, 4, 1, 1, 4, 1, 4, 1, 4, 1, 4, 4, 1, 4, 1, 2, 1, 2, 1, 1, 4, 1, 4, 4, 1, 4, 1, 4, 1, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 1, 4, 1, 4, 2, 1, 2, 1, 1, 4, 1, 4, 1, 4, 4, 1, 4, 1), .Label = c("All", "Inside 370", "Not Applicable", "Outside 370"), class = "factor"), AdRes = c(23.7, 23.7, 42.4, 42.4, 630, 630, 990, 990, 72.85, 72.85, 70.6, 70.6, 10, 10, 21.7, 21.7, 171.66, 171.66, 306, 306, 62.1, 62.1, 53.25, 53.25, 208, 208, 64.8, 64.8, 87.3, 87.3, 356, 356, 25.8, 25.8, 156, 156, 166, 166, 135.5, 135.5, 170.5, 170.5, 203, 203, 227.5, 227.5, 224, 224, 123, 123, 140.66, 140.66, 142.5, 142.5, 44.65, 44.65, 50.3, 50.3, 1320, 1320, 577, 577, 71.1, 71.1, 411, 411, 104, 104, 122, 122, 201, 201, 230, 230, 192, 192, 304, 304, 184.5, 184.5, 350, 350, 536, 536, 470.5, 470.5, 172, 172, 166, 166, 205, 205, 595, 595, 227.5, 227.5, 9.1, 9.1, 14.6, 14.6, 10.9, 10.9, 11.1, 11.1, 313.5, 313.5, 53.8, 53.8, 29.8, 29.8, 29.5, 29.5, 34.05, 34.05, 21.8, 21.8, 385.5, 385.5, 541, 541, 168, 168, 119, 119, 376, 376, 91.9, 91.9, 97.76, 97.76, 164, 164, 244, 244, 303.5, 303.5, 388, 388, 59.8, 59.8,
227.5, 227.5, 165, 165, 19.15, 19.15, 651, 651, 195, 195, 190, 190, 164, 164, 190, 190, 334, 334)), .Names = c("N_Type", "AdRes"), row.names = c(8956, 8957, 8972, 8973, 8974, 8975, 8976, 8977, 8978, 8979, 8980, 8981, 8982, 8983, 8984, 8985, 9159, 9160, 9175, 9176, 9177, 9178, 9185, 9186, 9201, 9202, 9203, 9204, 9205, 9206, 9207, 9208, 9209, 9210, 9217, 9218, 9233, 9234, 9241, 9242, 9261, 9262, 9277, 9278, 9285, 9286, 9301, 9302, 9309, 9310, 9329, 9330, 9345, 9346, 9353, 9354, 9369, 9370, 9371, 9372, 9373, 9374, 9410, 9411, 9412, 9413, 9414, 9415, 9422, 9423, 9424, 9425, 9426, 9427, 9428, 9429, 9430, 9431, 9432, 9433, 9434, 9435, 9436, 9437, 9444, 9445, 9452, 9453, 9454, 9455, 9456, 9457, 9458, 9459, 9460, 9461, 9468, 9469, 9470, 9471, 9472, 9473, 9474, 9475, 9476, 9477, 9478, 9479, 9480, 9481, 9488, 9489, 9496, 9497, 9498, 9499, 9720, 9721, 9722, 9723, 9724, 9725, 9726, 9727, 9728, 9729, 9730, 9731, 9732, 9733, 9734, 9735, 9736, 9737, 9738, 9739, 9740, 9741, 9742, 9743, 9744, 9745, 9746, 9747, 9748, 9749, 9750, 9751, 9752, 9753, 9754, 9755, 9756, 9757, 9758, 9759, 9760, 9761), class = "data.frame")

Pstats <- function(x) {
    Max = max(x)
    Min = min(x)
    AMean = mean(x)
    AStdev = sd(x)
    Samples <- length(x)
    p10 <- quantile(x, 0.1, na.rm = TRUE, names = FALSE)
    p20 <- quantile(x, 0.2, na.rm = TRUE, names = FALSE)
    p30 <- quantile(x, 0.3, na.rm = TRUE, names = FALSE)
    p40 <- quantile(x, 0.4, na.rm = TRUE, names = FALSE)
    p50 <- quantile(x, 0.5, na.rm = TRUE, names = FALSE)
    p60 <- quantile(x, 0.6, na.rm = TRUE, names = FALSE)
    p70 <- quantile(x, 0.7, na.rm = TRUE, names = FALSE)
    p80 <- quantile(x, 0.8, na.rm = TRUE, names = FALSE)
    p90 <- quantile(x, 0.9, na.rm = TRUE, names = FALSE)
    Result <- data.frame(Samples, AMean, AStdev, Min, Max, p10, p20, p30, p40, p50, p60, p70, p80, p90)
    return(Result)
    # write.table(Result, file = "Results.csv", sep = ",", row.names = FALSE)
}

res <- by(areas, areas$N_Type, function(x) Pstats(x$AdRes))
# need to convert res to a dataframe

Michael Bock, PhD ENVIRON International Corporation 136 Commercial Street, Suite 402 Portland, ME 04101 phone: 207.347.4413 fax: 207.347.4384
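A sketch of the missing conversion step, shown on a built-in dataset (warpbreaks) in place of areas: by() returns a list of one-row data frames, which do.call(rbind, ...) stacks into a single data frame, and the group labels can then be copied from the row names into a proper column.

```r
# Stack the per-group data frames returned by by() into one data frame,
# recovering the grouping level from the row names.
stats <- by(warpbreaks$breaks, warpbreaks$wool, function(x)
    data.frame(Samples = length(x), AMean = mean(x),
               p90 = quantile(x, 0.9, na.rm = TRUE, names = FALSE)))
res <- do.call(rbind, stats)   # row names become the group levels
res$group <- rownames(res)     # copy them into an ordinary column
```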
Re: [R] A comment about R:
On 3 Jan 2006 at 7:35, Thomas Lumley wrote: On Mon, 2 Jan 2006, Philippe Grosjean wrote: That said, I think one should interpret Mitchell's paper in a different way. Obviously, he is an unconditional and happy Stata user (he even wrote a book about graphs programming in Stata). His claim in favor of Stata (versus SAS and SPSS, and also, indirectly, versus R) is to be interpreted the same way as unconditional lovers of Macintoshes or PCs would argue against the other clan. Both architectures are good and have strengths and weaknesses. Real arguments are more sentimental, and could resume in: The more I use it, the more I like it,... and the aliens are bad, ugly and stupid! Would this apply to Stata versus R? I don't know Stata at all, but I imagine it could be the case from what I read in Mitchell's paper... I think there are good reasons why Stata is becoming much more popular in epidemiology and biostatistics [and I'm not particularly prejudiced against R]. In my experience people who like R also like Stata, though clearly the reverse is not necessarily true. Stata, like R, is readily programmable. Users can -- and do -- write and distribute programs that look just like the built-in routines. There is an active and helpful mailing list. However, Stata programming is very different from R programming, since it is macro-based (think Tcl/Tk) rather than function-based. Stata is also easier to learn: it has a very consistent syntax and even better documentation than R. We use Stata for all our service course teaching, and despite the fact that it is command-line based rather than GUI the students were no more unhappy than when SPSS was used for the lowest-level courses and Egret for the higher-level service courses. [Stata now has a GUI but it is awful and quite a lot of students prefer the command-line] -thomas I'll offer a Second to Thomas's motion. I like R but I find Stata much easier to teach in service courses. 
For most of my students, the Stata learning curve is much more tolerable than that of R (at a reduction in capability, of course). I state on Day 1 that I think R is the world's best package, and that Stata is my choice for a very acceptable compromise --- for most purposes. A few students go on to write their own Stata programs, and a few go on to learn R and love it. But the vast majority of my students learn enough Stata to get through the courses, and afterward they do whatever their advisor wants them to do (the First Law of Graduate School). For a sizable fraction (maybe 25%), that also proves to be Stata, as there is a solid core of Stata users among the faculty here. I'll also agree that Stata's GUI is ghastly; most of my students (both during courses and any later use) quickly adapt to using Stata's command line, and they use it quite effectively. ---JRG John R. Gleason Associate Professor Syracuse University 430 Huntington Hall Voice: 315-443-3107 Syracuse, NY 13244-2340 USA FAX: 315-443-4085 PGP public key at keyservers
Re: [R] A comment about R:
Patrick Burns [EMAIL PROTECTED] writes: I have had an email conversation with the author of the technical report from which the quote was taken. I am formulating a comment to the report that will be posted with the technical report. I would be pleased if this thread continued, so I will know better what I want to say. Plus I should be able to reference this thread in the comment. One thing that is often overlooked, and hasn't yet been mentioned in the thread, is how much *simpler* R can be for certain completely basic tasks of practical or pedagogical relevance: calculate a simple derived statistic, confidence intervals from estimate and SE, percentage points of the binomial distribution - using dbinom or from the formula, take the sum of each of 10 random samples from a set of numbers, etc. This is where other packages get stuck in the procedure+dataset mindset. For much the same reason, those packages make you tend to treat practical data analysis as something distinct from theoretical understanding of the methods: You just don't use SAS or SPSS or Stata to illustrate the concept of a random sample by setting up a small simulation study as the first thing you do in a statistics class, whereas you could quite conceivably do it in R. (What *is* the equivalent of rnorm(25) in those languages, actually?) Even when using SAS in teaching, I sometimes fire up R just to calculate simple things like

pbar <- (p1 + p2)/2
sqrt(pbar * (1 - pbar))

which you need to cheat SAS Analyst's sample size calculator to work with proportions rather than means. SAS leaves you no way to do this short of setting up a new data set. The Windows calculator will do it, of course, but the students can't see what you are doing then. -- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, Entr. B, PO Box 2099, 1014 Cph. K, Denmark. Ph: (+45) 35327918, FAX: (+45) 35327907, ([EMAIL PROTECTED])
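The two throwaway calculations above run as-is in R. As a sketch of the "sum of each of 10 random samples" example (the set of numbers and the proportions 0.3 and 0.5 below are invented for illustration, not taken from the message):

```r
# Sum of each of 10 random samples (size 5, drawn without replacement) from
# a set of numbers -- the kind of one-liner simulation described above.
x <- c(2, 3, 5, 7, 11, 13, 17, 19)   # made-up set of numbers
set.seed(1)                          # for reproducibility
sums <- replicate(10, sum(sample(x, 5)))

# The pooled-proportion fragment from the message, with illustrative values:
p1 <- 0.3; p2 <- 0.5
pbar <- (p1 + p2)/2
se <- sqrt(pbar * (1 - pbar))
```

The whole thing fits on a classroom screen, which is the point being made.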
[R] For loop gets exponentially slower as dataset gets larger...
I am running R 2.1.1 in a Microsoft Windows XP environment. I have a matrix with three vectors (columns) and ~2 million rows. The three vectors are date_, id, and price. The data is ordered (sorted) by code and date_. (The matrix contains daily prices for several thousand stocks, and has ~2 million rows. If a stock did not trade on a particular date, its price is set to NA) I wish to add a fourth vector that is next_price. (Next price is the current price as long as the current price is not NA. If the current price is NA, the next_price is the next price that the security with this same ID trades. If the stock does not trade again, next_price is set to NA.) I wrote the following loop to calculate next_price. It works as intended, but I have one problem. When I have only 10,000 rows of data, the calculations are very fast. However, when I run the loop on the full 2 million rows, it seems to take ~ 1 second per row. Why is this happening? What can I do to speed the calculations when running the loop on the full 2 million rows? 
(I am not running low on memory, but I am maxing out my CPU at 100%) Here is my code and some sample data:

data <- data[order(data$code, data$date_), ]
l <- dim(data)[1]
w <- 3
data[l, w+1] <- NA
for (i in (l-1):1) {
  data[i, w+1] <- ifelse(is.na(data[i, w]) == FALSE, data[i, w],
                         ifelse(data[i, 2] == data[i+1, 2], data[i+1, w+1], NA))
}

date        id    price     next_price
6/24/2005   1635  444.7838  444.7838
6/27/2005   1635  448.4756  448.4756
6/28/2005   1635  455.4161  455.4161
6/29/2005   1635  454.6658  454.6658
6/30/2005   1635  453.9155  453.9155
7/1/2005    1635  453.3153  453.3153
7/4/2005    1635  NA        453.9155
7/5/2005    1635  453.9155  453.9155
7/6/2005    1635  453.0152  453.0152
7/7/2005    1635  452.8651  452.8651
7/8/2005    1635  456.0163  456.0163
12/19/2005  1635  442.6982  442.6982
12/20/2005  1635  446.5159  446.5159
12/21/2005  1635  452.4714  452.4714
12/22/2005  1635  451.074   451.074
12/23/2005  1635  454.6453  454.6453
12/27/2005  1635  NA        NA
12/28/2005  1635  NA        NA
12/1/2003   1881  66.1562   66.1562
12/2/2003   1881  64.9192   64.9192
12/3/2003   1881  66.0078   66.0078
12/4/2003   1881  65.8098   65.8098
12/5/2003   1881  64.1275   64.1275
12/8/2003   1881  64.8697   64.8697
12/9/2003   1881  63.5337   63.5337
12/10/2003  1881  62.9399   62.9399
Re: [R] under (and over) dispersion in Poisson regression
This most often indicates a problem with the dispersion estimate. See the cautionary tale in MASS4 chapter 7. If you have a reliable dispersion estimate that low for genuine counts, they are either not independent or not Poisson (for example, limited), and one would want to find out what is going on. On Tue, 3 Jan 2006, John Sorkin wrote: I am trying to use Poisson regression to model count data. My results are suggestive of under-dispersion (0.79). How close to one does one want the measure of dispersion to be before one accepts the results of the analysis? I know that there is no definitive answer to my question, but I would like to get some sense of general practice. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK. Fax: +44 1865 272595
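For concreteness, one common way to obtain the dispersion estimate being discussed is the Pearson chi-square statistic divided by the residual degrees of freedom. A minimal sketch with simulated, invented data (not the poster's):

```r
# Hypothetical data: genuinely Poisson counts, so the estimate should be near 1.
set.seed(1)
x <- runif(100)
counts <- rpois(100, lambda = exp(0.5 + x))

fit <- glm(counts ~ x, family = poisson)

# Pearson-based dispersion estimate: sum of squared Pearson residuals / df
dispersion <- sum(residuals(fit, type = "pearson")^2) / fit$df.residual
dispersion
```

Values well below 1 (such as the 0.79 in the question) or well above 1 suggest the counts are not independent Poisson, which is the reply's caution.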
[R] Bootstrap w/ Clustered Data
Looks like I may have found a function that addresses my needs. Bootcov in Design handles bootstrapping from clustered data and will save the coefficients. I'm not entirely sure it handles clusters the way I'd like, but I'm going through the code. If it doesn't, it looks easily re-writeable. As far as I can tell, boot in package boot would do clusters only if the estimation function passed to it pastes together data based on the clusters boot selects.
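The approach described in the last sentence - resampling cluster ids and having the statistic paste together the rows for the selected clusters - can be sketched as follows. The data, the model, and all names here are invented for illustration; this is not the poster's analysis:

```r
library(boot)  # ships with R

# Hypothetical clustered data: 20 clusters of 5 observations each,
# with a cluster-level random effect.
set.seed(1)
d <- data.frame(cluster = rep(1:20, each = 5), x = rnorm(100))
d$y <- 1 + 2 * d$x + rnorm(20)[d$cluster] + rnorm(100)

ids <- unique(d$cluster)

# The statistic resamples whole clusters: 'i' indexes the bootstrap draw of
# cluster ids, and we paste together the corresponding rows before fitting.
stat <- function(ids, i) {
  db <- do.call(rbind, lapply(ids[i], function(cl) d[d$cluster == cl, ]))
  coef(lm(y ~ x, data = db))
}

b <- boot(ids, stat, R = 200)  # bootstrap distribution of the coefficients
```

Resampling ids rather than rows keeps within-cluster dependence intact in each bootstrap replicate, which is what the cluster bootstrap is for.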
Re: [R] Summary functions to dataframe
Try this:

Pstats <- function(x)
  c(Max = max(x), Min = min(x), AMean = mean(x), AStdev = sd(x),
    Samples = length(x), quantile(x, 1:9/10, na.rm = TRUE))
res <- with(areas, by(AdRes, N_Type, Pstats))
do.call(rbind, res)

Also, check out summaryBy in the doBy package at http://genetics.agrsci.dk/~sorenh/misc/index.html On 1/3/06, Mike Bock [EMAIL PROTECTED] wrote: I have written a few different summary functions. I want to calculate the statistics by groups and I am having trouble getting the output as a dataframe. I have attached one example with a small dataset that calculates summary stats and percentiles; I have others that calculate upper confidence limits etc. I would like the output to be converted to a dataframe with one of the columns as the grouping variable. This seems simple, but my attempts with do.call(cbind) and rbind have not worked, so I have concluded I am missing something obvious. Any help is appreciated. Thanks, Mike

areas <- structure(list(N_Type = structure(c(4, 1, 4, 1, 1, 4, 1, 4, 4, 1, 4, 1, 4, 1, 4, 1, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 4, 1, 4, 1, 2, 1, 2, 1, 4, 1, 4, 1, 4, 1, 4, 1, 1, 4, 1, 4, 1, 4, 1, 4, 4, 1, 4, 1, 2, 1, 2, 1, 1, 4, 1, 4, 4, 1, 4, 1, 4, 1, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 1, 4, 1, 4, 2, 1, 2, 1, 1, 4, 1, 4, 1, 4, 4, 1, 4, 1), .Label = c("All", "Inside 370", "Not Applicable", "Outside 370"), class = "factor"), AdRes = c(23.7, 23.7, 42.4, 42.4, 630, 630, 990, 990, 72.85, 72.85, 70.6, 70.6, 10, 10, 21.7, 21.7, 171.66, 171.66, 306, 306, 62.1, 62.1, 53.25, 53.25, 208, 208, 64.8, 64.8, 87.3, 87.3, 356, 356, 25.8, 25.8, 156, 156, 166, 166, 135.5, 135.5, 170.5, 170.5, 203, 203, 227.5, 227.5, 224, 224, 123, 123, 140.66, 140.66, 142.5, 142.5, 44.65, 44.65, 50.3, 50.3, 1320, 1320, 577, 577, 71.1, 71.1, 411, 411, 104, 104, 122, 122, 201, 201, 230, 230, 192, 192, 304, 304, 184.5, 184.5, 350, 350, 536, 536, 470.5, 470.5, 172, 172, 166, 166, 205, 205, 595, 595, 227.5, 227.5, 9.1, 9.1, 14.6, 14.6, 10.9, 10.9, 11.1, 11.1, 313.5, 313.5, 53.8, 53.8, 29.8, 29.8, 29.5, 29.5, 34.05, 34.05, 21.8, 21.8, 385.5, 385.5, 541, 541, 168, 168, 119, 119, 376, 376, 91.9, 91.9, 97.76, 97.76, 164, 164, 244, 244, 303.5, 303.5, 388, 388, 59.8, 59.8, 227.5, 227.5, 165, 165, 19.15, 19.15, 651, 651, 195, 195, 190, 190, 164, 164, 190, 190, 334, 334)), .Names = c("N_Type", "AdRes"), row.names = c(8956, 8957, 8972, 8973, 8974, 8975, 8976, 8977, 8978, 8979, 8980, 8981, 8982, 8983, 8984, 8985, 9159, 9160, 9175, 9176, 9177, 9178, 9185, 9186, 9201, 9202, 9203, 9204, 9205, 9206, 9207, 9208, 9209, 9210, 9217, 9218, 9233, 9234, 9241, 9242, 9261, 9262, 9277, 9278, 9285, 9286, 9301, 9302, 9309, 9310, 9329, 9330, 9345, 9346, 9353, 9354, 9369, 9370, 9371, 9372, 9373, 9374, 9410, 9411, 9412, 9413, 9414, 9415, 9422, 9423, 9424, 9425, 9426, 9427, 9428, 9429, 9430, 9431, 9432, 9433, 9434, 9435, 9436, 9437, 9444, 9445, 9452, 9453, 9454, 9455, 9456, 9457, 9458, 9459, 9460, 9461, 9468, 9469, 9470, 9471, 9472, 9473, 9474, 9475, 9476, 9477, 9478, 9479, 9480, 9481, 9488, 9489, 9496, 9497, 9498, 9499, 9720, 9721, 9722, 9723, 9724, 9725, 9726, 9727, 9728, 9729, 9730, 9731, 9732, 9733, 9734, 9735, 9736, 9737, 9738, 9739, 9740, 9741, 9742, 9743, 9744, 9745, 9746, 9747, 9748, 9749, 9750, 9751, 9752, 9753, 9754, 9755, 9756, 9757, 9758, 9759, 9760, 9761), class = "data.frame")

Pstats <- function(x) {
  Max = max(x)
  Min = min(x)
  AMean = mean(x)
  AStdev = sd(x)
  Samples <- length(x)
  p10 <- quantile(x, 0.1, na.rm = TRUE, names = FALSE)
  p20 <- quantile(x, 0.2, na.rm = TRUE, names = FALSE)
  p30 <- quantile(x, 0.3, na.rm = TRUE, names = FALSE)
  p40 <- quantile(x, 0.4, na.rm = TRUE, names = FALSE)
  p50 <- quantile(x, 0.5, na.rm = TRUE, names = FALSE)
  p60 <- quantile(x, 0.6, na.rm = TRUE, names = FALSE)
  p70 <- quantile(x, 0.7, na.rm = TRUE, names = FALSE)
  p80 <- quantile(x, 0.8, na.rm = TRUE, names = FALSE)
  p90 <- quantile(x, 0.9, na.rm = TRUE, names = FALSE)
  Result <- data.frame(Samples, AMean, AStdev, Min, Max, p10, p20, p30, p40, p50, p60, p70, p80, p90)
  return(Result)
  #write.table(Result, file = "Results.csv", sep = ",", row.names = FALSE)
}
attach(areas)
res <- by(areas, N_Type, function(x) (Pstats(AdRes)))
#need to convert res to a dataframe

Michael Bock, PhD ENVIRON International Corporation 136 Commercial Street, Suite 402 Portland, ME 04101 phone: 207.347.4413 fax: 207.347.4384 This message contains information that may be confidential, ...{{dropped}}
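The do.call(rbind, ...) idiom suggested in the reply can be seen end-to-end on a built-in dataset (iris stands in here for the poster's areas data, and the statistics are a shortened version of Pstats):

```r
# Summary function returning a named vector -- one row of the final table.
Pstats <- function(x)
  c(Min = min(x), Max = max(x), AMean = mean(x), AStdev = sd(x),
    Samples = length(x), quantile(x, c(0.1, 0.5, 0.9), na.rm = TRUE))

# Apply by group, then bind the per-group vectors into a matrix:
res <- by(iris$Sepal.Length, iris$Species, Pstats)
tab <- do.call(rbind, res)   # one row per group, group names as row names
tab
```

Returning a named vector (rather than a one-row data frame) from the summary function is what makes the rbind step come out clean.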
Re: [R] A comment about R:
Another big difference between R and other computing languages such as SPSS/SAS/Stata: you can easily get a job using SPSS/SAS/Stata, but it is extremely difficult to find a job using R. ^_^. On 03 Jan 2006 17:53:40 +0100, Peter Dalgaard [EMAIL PROTECTED] wrote: snip -- WenSui Liu (http://statcompute.blogspot.com) Senior Decision Support Analyst Health Policy and Clinical Effectiveness Cincinnati Children Hospital Medical Center
Re: [R] A comment about R:
One implicit point in Kjetil's message is the difficulty of learning enough of R to make its use a natural and desired first choice alternative, which I see as the point at which real progress and learning commence with any new language. I agree that the long learning curve is a serious problem, and in the past I have discussed, off list, with one of the very senior contributors to this list the possibility of splitting the list into sections for newcomers and for advanced users. He gave some very cogent reasons for not splitting, such as the possibility of newcomers' getting bad advice from others only slightly more advanced than themselves. And yet I suspect that a newcomers' section would encourage the kind of mutually helpful collegiality among newcomers that now characterizes the exchanges of the more experienced users on this list. I know that I have occasionally been reluctant to post issues that seem too elementary or trivial to vex the others on the list with and so have stumbled around for an hour or so seeking the solution to a simple problem. Had I the counsel of others similarly situated, progress might have been far faster. Have other newcomers or occasional users had the same experience? Is it time to reconsider splitting this list into two sections? Certainly the volume of traffic could justify it. Ben Fairbank -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kjetil Halvorsen Sent: Sunday, January 01, 2006 8:37 AM To: R-help@stat.math.ethz.ch Subject: [R] A comment about R: Readers of this list might be interested in the following comment about R. In a recent report, Michael N. Mitchell http://www.ats.ucla.edu/stat/technicalreports/ says about R: Perhaps the most notable exception to this discussion is R, a language for statistical computing and graphics. R is free to download under the terms of the GNU General Public License (see http://www.r-project.org/).
Our web site has resources on R and I have tried, sometimes in great earnest, to learn and understand R. I have learned and used a number of statistical packages (well over 10) and a number of programming languages (over 5), and I regret to say that I have had enormous difficulties learning and using R. I know that R has a great fan base composed of skilled and excellent statisticians, and that includes many people from the UCLA statistics department. However, I feel like R is not so much of a statistical package as much as it is a statistical programming environment that has many new and cutting edge features. For me learning R has been very difficult and I have had a very hard time finding answers to many questions about using it. Since the R community tends to be composed of experts deeply enmeshed in R, I often felt that I was missing half of the pieces of the puzzle when reading information about the use of R - it often feels like there is an assumption that readers are also experts in R. I often found the documentation for R quite sparse and many essential terms or constructs were used but not defined or cross-referenced. While there are mailing lists regarding R where people can ask questions, there is no official technical support. Because R is free and is based on the contributions of the R community, it is extremely extensible and programmable and I have been told that it has many cutting edge features, some not available anywhere else. Although R is free, it may be more costly in terms of your time to learn, use, and obtain support for it. My feeling is that R is much more suited to the sort of statistician who is oriented towards working very deeply with it. I think R is the kind of package that you really need to become immersed in (like a foreign language) and then need to use on a regular basis. I think that it is much more difficult to use it casually as compared to SAS, Stata or SPSS.
But devoting time and effort to it would give you access to a programming environment where you can write R programs and collaborate with others who are also using R. Those who are able to access its power, even at an applied level, would be able to access tools that may not be found in other packages, but this might come with a serious investment of time to sufficiently use R and maintain your skills with R. Kjetil
Re: [R] A comment about R:
Ben Fairbank [EMAIL PROTECTED] 1/3/2006 12:42 pm wrote: snip Have other newcomers or occasional users had the same experience? I, for one, have had this experience. I am usually hesitant to post elementary questions here. However, I think that the 'cogent reasons' given by 'one of the very senior contributors' are valid. I think that a 'newcomers list' would only really be useful if it included some experts who could respond, out of generosity. I don't think the R community lacks generosity - obviously not, given all the thousands of hours people have spent writing the language and all the packages and so on. But these generous people have different abilities and get pleasure in different ways. Some people get a thrill out of answering complex questions that require them to come up with novel solutions involving complex code.
Some people get a thrill out of helping newbies over the humps. Dividing the lists might help the experts, as much as it helps the beginners. Peter
Re: [R] A comment about R:
U I cannot say how easy or hard R is to learn, but in response to the UCLA commentary: However, I feel like R is not so much of a statistical package as much as it is a statistical programming environment that has many new and cutting edge features. Please note: the first sentence of the Preface of THE Green Book (PROGRAMMING WITH DATA: A GUIDE TO THE S LANGUAGE) by John Chambers, the inventor of the S language, explicitly states: "S is a programming language and environment for all kinds of computing involving data." I think this says that R is **not** meant to be a statistical package in the conventional sense and should not be considered one. As computing involving data is a complex and frequently messy business on technical (statistics), practical (messy data), and aesthetic (graphics, tables) levels, it is perhaps to be expected that a programming language and environment for all kinds of computing involving data is complex. Personally, I find R's ability (Chambers's next sentence) "to turn ideas into software, quickly and faithfully" to be a boon. But, then again, I'm a statistical professional and not a casual user. Cheers, -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA The business of the statistician is to catalyze the scientific learning process. - George E. P. Box
Re: [R] A comment about R:
Wensui Liu wrote: Another big difference between R and other computing language such as SPSS/SAS/STATA. You can easily get a job using SPSS/SAS/STATA. But it is extremely difficult to find a job using R. ^_^. Actually in finance it is getting easier all the time for knowledge of R to be a significant benefit. Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User)
Re: [R] A comment about R:
Berton Gunter writes: snip Right. So in 2 months I will finish my MD program here in the U.S. I also have a master's degree in Epidemiology (in which we used SAS) - but that hardly qualifies me as a statistics expert. Nonetheless, I have learned to use R out of necessity without undue difficulty. So have many of my colleagues around me with MDs, PhDs, and master's degrees. We do mainly microarray analysis, so the availability of a rapidly developing and customizable toolset (BioC packages) is essential to our work. And, in the same vein as others' comments, R's nuts-and-bolts character makes me think, learn, and improve. And the fear of getting Ripleyed on the mailing list also makes me think, read, and improve before submitting half baked questions to the list. So in sum, I use R because it encourages thoughtful analysis, it is flexible and extensible, and it is free.
I feel that these are strengths of the environment, not weaknesses. So if an individual finds another tool better suited for their work, that is obviously just fine, but I hardly think these characteristics of R are grounds for criticism, excellent proposals for evolution of documentation and mailing lists notwithstanding. -Eric
Re: [R] For loop gets exponentially slower as dataset gets larger...
Your 2-million loop is overkill, because apparently in the (vast) majority of cases you don't need to loop at all. You could try something like this: 1. Split the price by id, e.g.

price.list <- split(price, id)

For each id: 2a. When price is not NA, assign it to next price _without_ using a for loop, e.g.

next.price[!is.na(price)] <- price[!is.na(price)]

2b. Use a for loop only when price is NA, but even then work with vectors as much as you can, for example (untested):

for (i in setdiff(which(is.na(price)), length(price))) {
  remaining.prices <- price[(i+1):length(price)]
  of.interest <- head(remaining.prices[!is.na(remaining.prices)], 1)
  if (length(of.interest) == 0) next.price[i] <- NA
  else next.price[i] <- of.interest
}

To run (2a) and (2b) you could use lapply(); to paste the bits together try do.call(rbind, price.list). You might also want to take a look at ?Rprof and check the archives for efficiency suggestions. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of r user Sent: Tuesday, January 03, 2006 11:59 AM To: rhelp Subject: [R] For loop gets exponentially slower as dataset gets larger... snip
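A fully vectorized base-R alternative in the same spirit (no per-row loop at all) is sketched below; the helper names are invented, and the fill is the next-observation-carried-backward that the original poster described:

```r
# Fill each NA with the next non-NA value in the vector (NA if none follows).
# locf() is last-observation-carried-forward; reversing twice turns it into
# next-observation-carried-backward.
locf <- function(x) {
  ind <- which(!is.na(x))
  if (length(ind) == 0) return(x)
  pos <- cumsum(!is.na(x))          # index of most recent non-NA (0 if none yet)
  out <- rep(NA, length(x))
  out[pos > 0] <- x[ind][pos[pos > 0]]
  out
}
next_price_fill <- function(p) rev(locf(rev(p)))

# Applied per id without an explicit loop over rows, e.g.:
# data$next_price <- ave(data$price, data$id, FUN = next_price_fill)
```

Avoiding element-by-element assignment into the data frame is also what sidesteps the slowdown: each `data[i, w+1] <-` in the original loop copies the whole data frame.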
Re: [R] A comment about R:
On Tue, 3 Jan 2006, Peter Dalgaard wrote: One thing that is often overlooked, and hasn't yet been mentioned in the thread, is how much *simpler* R can be for certain completely basic tasks of practical or pedagogical relevance: Calculate a simple derived statistic, confidence intervals from estimate and SE, percentage points of the binomial distribution - using dbinom or from the formula, take the sum of each of 10 random samples from a set of numbers, etc. This is where other packages get stuck in the procedure+dataset mindset. Some of these things are actually fairly straightforward in Stata. For example, Stata will give confidence intervals and tests for linear combinations of coefficients and even (using symbolic differentiation and the delta method) for nonlinear combinations. The first is available in packages in R; the second is in S Programming but doesn't seem to be packaged.

. di Binomial(10,4,0.2)
.12087388

Taking the sum of each of ten random samples, or other things of that sort, obviously requires creating a new data set, but again there are facilities to automate this. I have, for example, computed bootstrap confidence intervals for the ratio or difference of medians in a service course using Stata. It would be easier in R, but not that much easier. For much the same reason, those packages make you tend to treat practical data analysis as something distinct from theoretical understanding of the methods: You just don't use SAS or SPSS or Stata to illustrate the concept of a random sample by setting up a small simulation study as the first thing you do in a statistics class, whereas you could quite conceivably do it in R. (What *is* the equivalent of rnorm(25) in those languages, actually?)

set obs 25
gen x = invnorm(uniform())

[This does create a new data set, of course] Even when using SAS in teaching, I sometimes fire up R just to calculate simple things like pbar <- (p1+p2)/2; sqrt(pbar*(1-pbar))

local pbar = (0.3+0.5)/2
display sqrt(`pbar'*(1-`pbar'))

Now, I still prefer R both for data analysis and (even more so) for programming. There are some things that it is genuinely difficult to program in Stata -- and as evidence that this isn't just my ignorance of the best approaches, the language was substantially reworked in both versions 8 and 9 to allow the vendor to implement better graphics and linear mixed models. On the question of which system really is easier to learn, I can only comment that this isn't the only question where education, as a field, would benefit from some good randomized controlled trials. -thomas Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle
[R] R fortunes candidate? (was A comment about R)
A candidate for the fortunes package? (Perhaps the highest honor one can receive: being verbified :-) ) And the fear of getting Ripleyed on the mailing list also makes me think, read, and improve before submitting half baked questions to the list. -- Eric Kort Cheers, Bert
Re: [R] For loop gets exponentially slower as dataset gets larger...
Accepting this stacked representation for the moment, try this. When reordering the dates, do it in reverse order. Then loop over all codes, applying the zoo function na.locf to the prices for that code. locf stands for "last observation carried forward"; since our dates are reversed, it will bring the next one backwards. Finally, sort back into ascending order.

library(zoo)  # needed for na.locf, which also works for non-zoo objects
data <- data[order(data$code, -as.numeric(data$date_)), ]
attach(data)
next_price <- price
for (i in unique(code)) next_price[code == i] <- na.locf(price[code == i], na.rm = FALSE)
data$next_price <- next_price
data <- data[order(data$code, data$date_), ]
detach()

Here it is again, but this time we represent the data as a list of zoo objects with one component per code. In the code below we split the data on code and apply f to each piece. Note that na.locf replaces NAs with the last observation carried forward, so by reversing the data, using na.locf, and reversing again we get the desired effect.

library(zoo)
f <- function(x) {
  z <- zoo(x$price, x$date_)
  next_price <- rev(na.locf(rev(coredata(z)), na.rm = FALSE))
  merge(z, next_price)
}
z <- lapply(split(data, data$code), f)

On 1/3/06, r user [EMAIL PROTECTED] wrote: I am running R 2.1.1 in a Microsoft Windows XP environment. I have a matrix with three vectors (columns) and ~2 million rows. The three vectors are date_, id, and price. The data is ordered (sorted) by code and date_. (The matrix contains daily prices for several thousand stocks and has ~2 million rows. If a stock did not trade on a particular date, its price is set to NA.) I wish to add a fourth vector, next_price. (Next price is the current price as long as the current price is not NA. If the current price is NA, next_price is the next price at which the security with this same id trades. If the stock does not trade again, next_price is set to NA.) I wrote the following loop to calculate next_price. It works as intended, but I have one problem.
When I have only 10,000 rows of data, the calculations are very fast. However, when I run the loop on the full 2 million rows, it seems to take ~1 second per row. Why is this happening? What can I do to speed the calculations when running the loop on the full 2 million rows? (I am not running low on memory, but I am maxing out my CPU at 100%.) Here is my code and some sample data:

data <- data[order(data$code, data$date_), ]
l <- dim(data)[1]
w <- 3
data[l, w+1] <- NA
for (i in (l-1):1) {
  data[i, w+1] <- ifelse(!is.na(data[i, w]), data[i, w],
                         ifelse(data[i, 2] == data[i+1, 2], data[i+1, w+1], NA))
}

date        id    price     next_price
6/24/2005   1635  444.7838  444.7838
6/27/2005   1635  448.4756  448.4756
6/28/2005   1635  455.4161  455.4161
6/29/2005   1635  454.6658  454.6658
6/30/2005   1635  453.9155  453.9155
7/1/2005    1635  453.3153  453.3153
7/4/2005    1635  NA        453.9155
7/5/2005    1635  453.9155  453.9155
7/6/2005    1635  453.0152  453.0152
7/7/2005    1635  452.8651  452.8651
7/8/2005    1635  456.0163  456.0163
12/19/2005  1635  442.6982  442.6982
12/20/2005  1635  446.5159  446.5159
12/21/2005  1635  452.4714  452.4714
12/22/2005  1635  451.074   451.074
12/23/2005  1635  454.6453  454.6453
12/27/2005  1635  NA        NA
12/28/2005  1635  NA        NA
12/1/2003   1881  66.1562   66.1562
12/2/2003   1881  64.9192   64.9192
12/3/2003   1881  66.0078   66.0078
12/4/2003   1881  65.8098   65.8098
12/5/2003   1881  64.1275   64.1275
12/8/2003   1881  64.8697   64.8697
12/9/2003   1881  63.5337   63.5337
12/10/2003  1881  62.9399   62.9399
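[Editor's sketch] The slowdown comes from assigning into a data frame cell by cell inside the loop, which copies the whole (2-million-row) object on every iteration. Besides the zoo approach above, the same backward fill can be vectorized in base R; `fill_next` is an illustrative helper (not a standard function), shown on a tiny invented data set:

```r
# Backward-fill: for each row, the next non-NA price within the same code.
data <- data.frame(
  code  = c(1635, 1635, 1635, 188, 188),
  price = c(444.78, NA, 453.92, 66.16, NA)
)
fill_next <- function(p) {
  n <- length(p)
  r <- cummax(seq_len(n) * !is.na(rev(p)))  # reversed position of most recent non-NA
  j <- rev(ifelse(r > 0, n + 1 - r, NA))    # index of next non-NA, in original order
  p[j]                                      # NA index yields NA (trailing NAs stay NA)
}
data$next_price <- ave(data$price, data$code, FUN = fill_next)
data$next_price   # 444.78 453.92 453.92 66.16 NA
```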
Re: [R] A comment about R:
As others have pointed out, since R is more of a programming language than a statistical package, yes, it is _harder_ to learn. I would say it's easier to learn than C++, harder to learn than VBA, and on par with learning Java, but that's all debatable. One thing that makes R slightly more intimidating than it has to be is that once a noob decides to download R, install it and open it, he gets a semi-blank screen and that's it. Eventually he may or may not find out that what he needs to do next is to decide which text editor he wants to use. They all have their pluses and minuses. Some can be as intimidating as R itself. God help him if he tries to learn XEmacs at the same time as learning R. I learned C++ and other languages before/concurrently with learning R (actually S+), but I have to admit it was still not easy. It's been a long road for me, but I hardly ever use spreadsheets anymore. However, getting the casual users to do things in R instead of a spreadsheet is not going to be easy, and I am not sure that that is the goal. I am not sure how relevant this comment is, but there is something about a product being free that makes it appear less valuable. At the company I used to work for, a group of people tried to persuade the managers to buy S+ licenses for them all. Whenever I would tell them that they could download R right now for _free_ I would just get blank stares. Thanks, Roger On 1/3/06, Kort, Eric [EMAIL PROTECTED] wrote: Berton Gunter writes: I cannot say how easy or hard R is to learn, but in response to the UCLA commentary: "However, I feel like R is not so much of a statistical package as much as it is a statistical programming environment that has many new and cutting edge features." Please note: the first sentence of the Preface of THE Green Book (PROGRAMMING WITH DATA: A GUIDE TO THE S LANGUAGE) by John Chambers, the inventor of the S Language, explicitly states: "S is a programming language and environment for all kinds of computing involving data."
I think this says that R is **not** meant to be a statistical package in the conventional sense and should not be considered one. As computing involving data is a complex and frequently messy business on technical (statistics), practical (messy data), and aesthetic (graphics, tables) levels, it is perhaps to be expected that a programming language and environment for all kinds of computing involving data is complex. Personally, I find R's ability (Chambers's next sentence) "to turn ideas into software, quickly and faithfully" to be a boon. [snip] Right. So in 2 months I will finish my MD program here in the U.S. I also have a master's degree in Epidemiology (in which we used SAS) -- but that hardly qualifies me as a statistics expert. Nonetheless, I have learned to use R out of necessity without undue difficulty. So have many of my colleagues around me with MDs, PhDs, and master's degrees. We do mainly microarray analysis, so the availability of a rapidly developing and customizable toolset (BioC packages) is essential to our work. And, in the same vein as others' comments, R's nuts-and-bolts character makes me think, learn, and improve. And the fear of getting Ripleyed on the mailing list also makes me think, read, and improve before submitting half-baked questions to the list. So in sum, I use R because it encourages thoughtful analysis, it is flexible and extensible, and it is free. I feel that these are strengths of the environment, not weaknesses. So if an individual finds another tool better suited for their work, that is obviously just fine, but I hardly think these characteristics of R are grounds for criticism, excellent proposals for evolution of documentation and mailing lists notwithstanding. -Eric
Re: [R] A comment about R:
On 1/3/06, Thomas Lumley [EMAIL PROTECTED] wrote: On Tue, 3 Jan 2006, Peter Dalgaard wrote: One thing that is often overlooked, and hasn't yet been mentioned in the thread, is how much *simpler* R can be for certain completely basic tasks of practical or pedagogical relevance: Calculate a simple derived statistic, confidence intervals from estimate and SE, percentage points of the binomial distribution - using dbinom or from the formula, take the sum of each of 10 random samples from a set of numbers, etc. This is where other packages get stuck in the procedure+dataset mindset. Some of these things are actually fairly straightforward in Stata. For [...] In fact there are some things that are very easy to do in Stata and can be done in R, but only with more difficulty. For example, consider this introductory session in Stata: http://www.stata.com/capabilities/session.html Looking at the first few queries, see how easy it is to take the top few in Stata, whereas in R one would need a complex use of order(). It's not hard in R to write a function that would make it just as easy, but it's not available off the top of one's head, though RSiteSearch("sort.data.frame") will find one if one knew what to search for.
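[Editor's sketch] For completeness, the "top few" idiom in base R is short once you know it; `top_n` is just an illustrative name and the data are invented:

```r
# Top-n rows of a data frame by a column, base R:
df <- data.frame(make = c("VW", "Ford", "Fiat", "BMW"),
                 mpg  = c(25, 17, 30, 22))
top_n <- function(d, col, n = 3) head(d[order(-d[[col]]), ], n)
top_n(df, "mpg", 2)    # Fiat (30), then VW (25)
```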
[R] randomForest - classifier switch
Hi, I am trying to use randomForest for classification. I am using this code:

set.seed(71)
rf.model <- randomForest(similarity ~ ., data=set1[1:100,], importance=TRUE, proximity=TRUE)
Warning message: The response has five or fewer unique values. Are you sure you want to do regression? in: randomForest.default(m, y, ...)
rf.model
Call: randomForest(x = similarity ~ ., data = set1[1:100, ], importance = TRUE, proximity = TRUE)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 10
Mean of squared residuals: 0.1159130
% Var explained: 50.8

As you can see, I get a regression model. How can I make sure I get a classification model? Thanks. Stephen -- 2/01/2006
Re: [R] randomForest - classifier switch
From: Stephen Choularton Hi, I am trying to use randomForest for classification. I am using this code: set.seed(71); rf.model <- randomForest(similarity ~ ., data=set1[1:100,], importance=TRUE, proximity=TRUE) Warning message: The response has five or fewer unique values. Are you sure you want to do regression? in: randomForest.default(m, y, ...) [...] As you can see, I get a regression model. How can I make sure I get a classification model? By making sure your response variable is a factor, e.g., set1$similarity <- as.factor(set1$similarity) Andy
[R] newbie R question
I'm sorry to bother everyone with a stupid question, but when I am at an R prompt in Windows, is there a way to see what packages you already have installed from the R site, so that you can just do library(name_of_package) and it will work? I've looked at help etc. but I can't find a command like this. Maybe there isn't one, which is fine. Mark
[R] p-value of Logrank-Test
Hello! I want to compare two Kaplan-Meier curves by using the log-rank test: logrank(Surv(time[b], status[b]) ~ group[b]) This way I only get the value of the test statistic, but not the p-value. Does anybody know how I can get the p-value? Thanks in advance! Verena Hoffmann
Re: [R] newbie R question
Mark Leeds [EMAIL PROTECTED] writes: I'm sorry to bother everyone with a stupid question but, when I am at an R prompt in Windows, is there a way to see what packages you already have installed from the R site so that you can just do library(name_of_package) and it will work. I've looked at help etc but I can't find a command like this. Maybe there isn't one which is fine. Just library() (w/no arguments) -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
Re: [R] newbie R question
On Tue, 2006-01-03 at 16:07 -0500, Mark Leeds wrote: I'm sorry to bother everyone with a stupid question but, when I am at an R prompt in Windows, is there a way to see what packages you already have installed from the R site so that you can just do library(name_of_package) and it will work. I've looked at help etc but I can't find a command like this. Maybe there isn't one which is fine. library() HTH, Eric
Re: [R] newbie R question
On Tue, 3 Jan 2006, Mark Leeds wrote: I'm sorry to bother everyone with a stupid question but, when I am at an R prompt in Windows, is there a way to see what packages you already have installed from the R site so that you can just do library(name_of_package) and it will work. I've looked at help etc but I can't find a command like this. Maybe there isn't one which is fine. library() Mark -- Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: [EMAIL PROTECTED]
Re: [R] newbie R question
On Tue, 3 Jan 2006, Mark Leeds wrote: I'm sorry to bother everyone with a stupid question but, when I am at an R prompt in Windows, is there a way to see what packages you already have installed from the R site so that you can just do library(name_of_package) and it will work. I've looked at help etc but I can't find a command like this. Maybe there isn't one which is fine. library() (no arguments) lists all the installed packages (by library). -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
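[Editor's note] For completeness, the same information is available programmatically, which is handier in scripts than the interactive listing:

```r
# library() with no arguments lists installed packages per library tree;
# installed.packages() returns the same information as a matrix:
pkgs <- rownames(installed.packages())
"stats" %in% pkgs    # base packages such as stats are always present
```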
Re: [R] newbie R question
Thanks to all. I didn't realize that you got so many packages automatically. I've used S+ for roughly 10 years on and off and I am starting to switch over (I was finally forced to, because my new company preferred me to use R for cost reasons; I am the only user), and it's unbelievable what has been done in R by everyone. Truly amazing. You should all be quite proud of what you have created. Mark -Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: Tuesday, January 03, 2006 4:16 PM To: Mark Leeds Cc: R-Stat Help Subject: Re: [R] newbie R question On Tue, 3 Jan 2006, Mark Leeds wrote: I'm sorry to bother everyone with a stupid question but, when I am at an R prompt in Windows, is there a way to see what packages you already have installed from the R site so that you can just do library(name_of_package) and it will work. I've looked at help etc but I can't find a command like this. Maybe there isn't one which is fine. library() (no arguments) lists all the installed packages (by library).
Re: [R] p-value of Logrank-Test
On Tue, 3 Jan 2006, Verena Hoffmann wrote: Hello! I want to compare two Kaplan-Meier curves by using the log-rank test: logrank(Surv(time[b], status[b]) ~ group[b]) This way I only get the value of the test statistic, but not the p-value. Does anybody know how I can get the p-value? You don't say where you found the logrank() function, but: a) The survdiff() function in the survival package gives p-values as well as the test statistic for the log-rank test. b) The test statistic presumably has a chi-squared null distribution, so pchisq() would turn it into a p-value. -thomas
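[Editor's sketch] Option (a) looks like this on the aml data shipped with the survival package; the poster's time/status/group variables would substitute directly:

```r
library(survival)
fit <- survdiff(Surv(time, status) ~ x, data = aml)  # log-rank test
fit$chisq                                            # the test statistic
pchisq(fit$chisq, df = 1, lower.tail = FALSE)        # its p-value, as in (b)
```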
Re: [R] mixed effects models - negative binomial family?
Have you tried nlme? I tried something similar. Here is the code that I used for a negative binomial random-effects model:

library(nlme)
mydata <- read.table("C:\\Plx\\plx.all\\plxall.txt", header=TRUE)
loglike <- function(PLX_NRX, PD4_42D, GAT_34D, VIS_42D, MSL_42D, SPE_ROL, XM2_DUM, THX_DUM,
                    b0, b1, b2, b3, b4, b5, b6, b7, alpha) {
  lambda <- exp(b0 + b1*GAT_34D + b2*VIS_42D + b3*MSL_42D + b4*PD4_42D +
                b5*SPE_ROL + b6*XM2_DUM + b7*THX_DUM)
  y <- round(PLX_NRX)
  y <- table(y)
  freq <- as.vector(y)
  count <- as.numeric(names(y))
  count <- count[!(freq < 1)]
  freq <- freq[!(freq < 1)]
  n <- length(count)
  df <- n - 1
  df <- df - 2
  xbar <- weighted.mean(count, freq)
  s2 <- var(rep(count, freq))
  p <- xbar/s2
  alpha <- xbar^2/(s2 - xbar)
  dnbinom(y, alpha, (alpha/(alpha+lambda)))
}
plx.nlme <- nlme(PLX_NRX ~ loglike(PLX_NRX, PD4_42D, GAT_34D, VIS_42D, MSL_42D,
                                   SPE_ROL, XM2_DUM, THX_DUM,
                                   b0, b1, b2, b3, b4, b5, b6, b7, alpha),
                 data=mydata,
                 fixed=list(b0 + b1 + b2 + b3 + b4 + b5 + b6 + b7 + alpha ~ 1),
                 random=b0 ~ 1 | menum,
                 start=c(b0=0, b1=0, b2=0, b3=0, b4=0, b5=0, b6=0, b7=0, alpha=5))

I am not sure that this is what you are looking for, but I hope this helps! Elizabeth Lawson

Constantinos Antoniou [EMAIL PROTECTED] wrote: Hello all, I would like to fit a mixed-effects model, but my response is of the negative binomial (or overdispersed Poisson) family. The only (?) package that looks like it can do this is glmm.ADMB (but it cannot run on Mac OS X - please correct me if I am wrong!) [1] I think that glmmML {glmmML}, lmer {Matrix}, and glmmPQL {MASS} do not provide this family (i.e. nbinom, or overdispersed Poisson). Is there any other package that offers this functionality? Thanking you in advance, Costas [1] Yes, I know I can use this on another OS. But it is kind of a nuisance, as I have my whole workflow set up on a Mac, including emacs+ess, the data, etc. It will be non-trivial to start moving/syncing files between two computers in order to use this package... -- Constantinos Antoniou, Ph.D.
Department of Transportation Planning and Engineering National Technical University of Athens 5, Iroon Polytechniou str. GR-15773, Athens, Greece
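[Editor's sketch] For reference, the fixed-effects-only negative binomial is available out of the box via glm.nb() in MASS (which ships with R); the packages discussed above are needed only to add the random effects. A sketch on simulated data:

```r
library(MASS)
set.seed(1)
d <- data.frame(x = rnorm(200))
d$y <- rnbinom(200, mu = exp(1 + 0.5 * d$x), size = 2)  # NB counts, log link
fit <- glm.nb(y ~ x, data = d)
coef(fit)     # intercept and slope estimates (true values 1 and 0.5)
fit$theta     # estimated NB shape/dispersion parameter (true value 2)
```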
[R] All possible subsets model selection using AICc
Hello List, I was wondering if a package or piece of code exists that will allow all-possible-subsets regression model selection within R. I have already looked at step(AIC), which does not test differing combinations of variables within a model as far as I can tell. In addition, I tried to use the leaps command, but that does not use the criterion I am looking for. Any help or advice would be greatly appreciated. Thanks, Matt Williamson Matthew Williamson Graduate Research Assistant Department of Fishery and Wildlife Biology Colorado State University, Fort Collins, CO 80523 Office: (970)491-5790 Cell: (970)412-0442 "We are now confronted by the fact...that wars are no longer won;...all wars are lost by all who wage them; the only difference between participants is the degree and kind of losses they sustain. ...Science has so sharpened the fighter's sword that it is impossible for him to cut his enemy without cutting himself." --Aldo Leopold
Re: [R] All possible subsets model selection using AICc
On Tue, 3 Jan 2006, Matt Williamson wrote: [...] I tried to use the leaps command, but that does not use the criterion I am looking for. leaps() or regsubsets() in the leaps package almost certainly do use the criterion you are looking for (even though you don't tell us what that criterion is). These functions produce one or more best models of each size, and for models of the same size all the commonly used criteria reduce to ranking by residual sum of squares, which is what leaps() and regsubsets() do. -thomas
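[Editor's sketch] For a small number of predictors, all-subsets selection by AICc can also be done directly in base R. The AICc used here is the common small-sample form, AIC + 2k(k+1)/(n-k-1), with k the number of estimated parameters including the error variance; the data are simulated:

```r
set.seed(42)
n <- 50
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 + rnorm(n)           # only x1 truly matters
vars <- c("x1", "x2", "x3")
subsets <- unlist(lapply(1:3, function(m) combn(vars, m, simplify = FALSE)),
                  recursive = FALSE)      # all 7 non-empty subsets
aicc <- sapply(subsets, function(v) {
  fit <- lm(reformulate(v, "y"), data = d)
  k <- length(coef(fit)) + 1              # + 1 for the error variance
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
})
subsets[[which.min(aicc)]]                # the chosen subset contains x1
```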
[R] Including random effects in logistic regression.
I'm trying to analyse some data using logistic regression in R, but I want to include random effects in the model. The glm function appears not to have options for including random effects, and the lme and nlme documentation indicates that these functions are for continuous, not dichotomous, response variables. Are there options in R for this type of analysis? Jason Marshal Bariloche, Argentina
[R] abline in log-log plot
I'm working with a scatterplot of data, using plot() with log="xy" to get log-log axes. But I can't get the regression line to plot correctly. I use abline(lm(log(Y)~log(X))) and get a line that looks like the correct slope, but the Y-intercept is messed up. I haven't changed the y-axis other than to use the log transformation. I can get the non-log regression line to plot as a curve on the log-log axes by using abline(lm(Y~X), untf=TRUE), but I just want to plot the straight regression line of log(Y)~log(X).
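[Editor's sketch] The symptom matches the coordinate system: with log="xy" the user coordinates are base-10 logs, so abline() needs coefficients fitted on the log10 scale. lm(log(Y)~log(X)) uses natural logs; the slope is the same, but the intercept is off by a factor of log(10), which is exactly the "right slope, wrong intercept" described. Simulated data:

```r
set.seed(1)
X <- exp(rnorm(100))
Y <- X^1.5 * exp(rnorm(100, sd = 0.1))   # power law with noise
fit10 <- lm(log10(Y) ~ log10(X))         # fit on the axes' own (log10) scale
# plot(X, Y, log = "xy"); abline(fit10)  # the line now lands on the points
coef(fit10)                              # intercept near 0, slope near 1.5
```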
[R] Connectivity across a grid above a variable surface
Hi, I'm looking for ideas or packages with relevant algorithms for calculating the connectivity across a grid, where connectivity is defined as the minimum amount of cross-sectional area along a continuous path. The upper boundary of the cross-sectional area is a fixed elevation, and the lower boundary is a gridded surface of variable elevation. My variable elevation surface represents the top of an impermeable geologic layer. I would like to represent the degree to which a fluid could flow from one end of my grid to another, above the surface and below the fixed level. I don't need to derive information about path lengths and hydraulic gradient, but if I could, that would be a plus. A groundwater flow model would provide the exact answer, but I'm looking for something more approximate and faster. My grids are such that there are many dead-end flow paths, where the bottom boundary rises to meet the top boundary and the cross-sectional area available for flow pinches out. In plan view, fluid can enter all along one boundary and leave all along the opposite boundary, but flow connectivity across the grid varies between bottom boundary scenarios. Scott Waichler Pacific Northwest National Laboratory scott.waichler _at_ pnl.gov
[R] all possible combinations of list elements
I have a list as follows:

P <- list(A = c("CS", "CX"), B = 1:4, Y = c(4, 9))

I now would like to prepare a new list where the rows of the new list provide all possible combinations of the elements in the original list. Thus, the result should be the following:

CS 1 4
CS 1 9
CS 2 4
CS 2 9
CS 3 4
CS 3 9
CS 4 4
CS 4 9
CX 1 4
CX 1 9
CX 2 4
CX 2 9
CX 3 4
CX 3 9
CX 4 4
CX 4 9

Is there a simple routine in R to create this list of all possible combinations? The routine will be part of a function with the list P as an input. P will not always have the same number of elements, and each element in the list P may have different numbers of values. Thanks, Eberhard Morgenroth Eberhard Morgenroth, Assistant Professor of Environmental Engineering University of Illinois at Urbana-Champaign 3219 Newmark Civil Engineering Laboratory, MC-250 205 North Mathews Avenue, Urbana, IL 61801, USA Email: [EMAIL PROTECTED] http://cee.uiuc.edu/research/morgenroth
Re: [R] all possible combinations of list elements
On Tue, 2006-01-03 at 18:57 -0600, Eberhard F Morgenroth wrote: I have a list as follows: P <- list(A = c("CS", "CX"), B = 1:4, Y = c(4, 9)) I now would like to prepare a new list where the rows of the new list provide all possible combinations of the elements in the original list. [...] P will not always have the same number of elements, and each element in the list P may have different numbers of values. See ?expand.grid

expand.grid(P)
    A B Y
1  CS 1 4
2  CX 1 4
3  CS 2 4
4  CX 2 4
5  CS 3 4
6  CX 3 4
7  CS 4 4
8  CX 4 4
9  CS 1 9
10 CX 1 9
11 CS 2 9
12 CX 2 9
13 CS 3 9
14 CX 3 9
15 CS 4 9
16 CX 4 9

HTH, Marc Schwartz
Re: [R] A comment about R:
Hello, Unlike most posts on the R mailing list, I feel qualified to comment on this one. For about 3 months I have been trying to learn to use R, after having used various versions of SPSS for about 10 years. I think it is far too simplistic to ascribe non-use of R to laziness. This may well be the case for some; however, I have read 5-6 books on R, waded through on-line resources, read the documentation and asked multiple questions via e-mail - and still find even some of the basics very difficult. There are several reasons for this: 1. For some tasks R is extremely user-unfriendly. Some comparative examples: (a) In running a chi-square analysis in SPSS the following syntax is included: /STATISTIC=CHISQ /CELLS= COUNT EXPECTED ROW COLUMN TOTAL RESID . This produces expected and observed counts, row and column percentages, residuals, chi-square, Fisher's exact test + other output. In R, it is a herculean task to produce similar output. It certainly can't be produced in 2 lines as far as I can tell. (b) In SPSS, if I want to compare multiple variables by a single dependent variable this is readily performed: CROSSTABS /TABLES=baserdis baserenh basersoc baseradd socbest disbest entbest addbest worsdis worsphy by group I used the chi-square example again, but the same applies for a t-test. I started looking into how to do something similar in R, with the t.test command, but gave up. R does force the user to take a more considered approach to analysis. (c) To obtain a correlation matrix in R with the correlation p-values is no simple task. In SPSS this is obtained via: GET FILE='D:\a study\data\dat\key data\master data.sav'. NONPAR CORR /VARIABLES= goodnum badnum good5 bad5 avfreq avdayamt /PRINT=KENDALL TWOTAIL /MISSING=PAIRWISE .
In R something like this is required:

by(mydat, mydat$group, function(x) {
  nm <- names(x)
  rho <- matrix(, 6, 2)
  rho.nm <- matrix(, 6, 2)
  k <- 1
  for (i in 2:4) {
    for (j in (i + 1):5) {
      x.i <- x[, i]
      x.j <- x[, j]
      ct <- cor.test(x.i, x.j, method = "kendall", alternative = "two.sided")
      rho[k, 1] <- ct$estimate
      rho[k, 2] <- round(ct$p.value, 3)
      rho.nm[k, ] <- c(nm[i], nm[j])
      k <- k + 1
    }
  }
  rho <- cbind(as.data.frame(rho.nm), as.data.frame(rho))
  names(rho) <- c("freq.i", "freq.j", "cor", "p-value")
  rho
})

2) It is not always clear what the output produced by R is. The Mann-Whitney U-test is a good example. In R, it seems a standardised value is obtained. I was advised that it is easy enough to check this as R is open source, but at least for me, I don't believe I would understand this code anyway. It is confusing when comparable programs such as R and SPSS produce dissimilar results. For the user it is important to be able to fairly easily reconcile such differences, to engender confidence in results. 3) I find the help files in R quite difficult to understand. For example, see help(t.test). It is almost assumed by the examples that you know what to do. Personally, I would find some form of simple decision tree easier - e.g. if you want to perform a t-test with the dependent variable in one column and the grouping variable in another, use t.test(AVFREQ~GROUP); if you want to perform a t-test with the dependent variable in separate columns (each column representing a different group), use t.test(AVFREQ1, AVFREQ2). 4) My initial approach to using R was to run commands I had used commonly in SPSS and compare the results. I have only got as far as basic ANOVA. This has been time-consuming, and at times it has been difficult to obtain advice. Some people on the R list have been extremely generous with their time and knowledge, and I have much appreciated this assistance. At other times I see responses met with something like arrogance.
With the sophistication of R, there is also an elitism. This is a barrier to R being more widely accepted and used.

5) Differences in terminology - this is just part of the learning process, but I still found it took quite some time to work out simple commands and what different analyses were called.

6) System administrators may be wary of freeware.

No doubt to the sophisticated user my comments may seem trite and easily resolved; however, I believe they have some relevance as to why R is not more readily used or accepted.

Bob Green

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
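[Editor's note: as a hedged, base-R sketch of the chi-square output discussed in point 1(a) above - observed and expected counts, residuals, row/column percentages and Fisher's exact test - something like the following works in a handful of lines. The data here are invented purely for illustration.]

```r
# Toy 2x2 table (invented data, not from the post)
tab <- matrix(c(12, 5, 8, 15), nrow = 2,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))

cs <- chisq.test(tab)  # chi-square test (Yates-corrected for 2x2)
cs$observed            # observed counts
cs$expected            # expected counts under independence
cs$residuals           # Pearson residuals
prop.table(tab, 1)     # row percentages
prop.table(tab, 2)     # column percentages
fisher.test(tab)       # Fisher's exact test
```

The pieces are spread over a few accessor calls rather than one SPSS-style subcommand, but each is one line.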
Re: [R] Including random effects in logistic regression.
Jason, the lmer() function in the Matrix package is what you will need.

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Jason Marshal
Sent: Tue 1/3/2006 8:10 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Including random effects in logistic regression.

I'm trying to analyse some data using logistic regression in R, but I want to include random effects in the model. The glm function appears not to have options for including random effects, and the lme and nlme documentation indicates that these functions are for continuous, not dichotomous, response variables. Are there options in R for this type of analysis?

Jason Marshal
Bariloche, Argentina
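[Editor's note: another route available at the time, sketched here with simulated data, is penalized quasi-likelihood via glmmPQL() from MASS (which drives nlme's lme machinery). This is a hedged illustration of a logistic model with a random intercept, not the only or definitive approach.]

```r
# Logistic regression with a random intercept per site via glmmPQL()
library(MASS)   # glmmPQL
library(nlme)   # lme machinery used internally
set.seed(1)
d <- data.frame(y    = rbinom(200, 1, 0.4),   # dichotomous response
                x    = rnorm(200),            # fixed-effect covariate
                site = gl(10, 20))            # grouping factor
fit <- glmmPQL(y ~ x, random = ~ 1 | site,
               family = binomial, data = d, verbose = FALSE)
summary(fit)$tTable   # fixed-effect estimates and tests
```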
Re: [R] Glimmix and glm
I'm not certain what you are asking. I just got 10 hits for RSiteSearch("Glimmix"). Seven of them mentioned SAS PROC GLIMMIX:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/65945.html
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/65954.html
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/53310.html
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/53311.html
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/65935.html
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/65949.html
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/57731.html

If you'd like more help from this group, PLEASE do read the posting guide! www.R-project.org/posting-guide.html. Anecdotal evidence suggests that posts conforming more closely to the suggestions there tend to get quicker, more useful replies.

Best Wishes,
Spencer Graves

[EMAIL PROTECTED] wrote: Hello. Some months ago an e-mail was posted in which a comparison between Glimmix and glm was discussed. I have not been able to find that e-mail in the R archive. Does anyone recall the date of the above e-mail? Thank you very much.

Antonio Paredes
USDA Center for Veterinary Biologics, Biometrics Unit
510 South 17th Street, Suite 104, Ames, IA 50010
(515) 232-5785
Re: [R] A comment about R: (sort.data.frame)
Gabor Grothendieck wrote on 1/3/2006 2:37 PM: Looking at the first few queries, see how easy it is to take the top few in Stata, whereas in R one would have a complex use of order(). It's not hard in R to write a function that would make it just as easy, but it's not available off the top of one's head, though RSiteSearch("sort.data.frame") will find one if one knew what to search for.

Yes, R has a few peculiar gaps. As to sort.data.frame(), it should be added to R base, in my opinion. It is silly to make people download code for such a basic operation.

MHP
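[Editor's note: for readers following along, the "complex use of order" mentioned above usually amounts to a one-liner of row indexing; a hedged sketch with toy data:]

```r
# Ordering data-frame rows by a column with order(); toy data for illustration
df <- data.frame(name = c("b", "c", "a"), score = c(2, 3, 1))

df[order(df$score, decreasing = TRUE), ]           # all rows, highest first
head(df[order(df$score, decreasing = TRUE), ], 2)  # the "top few", Stata-style
```

The idiom generalizes: pass several columns to order() for multi-key sorts.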
[R] Looking for packages to do Feature Selection and Classification
Hi All,

Sorry if this is a repost (a quick browse didn't give me the answer). I wonder if there are packages that can do feature selection and classification at the same time. For instance, I am using SVM to classify my samples, but it's easy to overfit when using all of the features. Thus, it is necessary to select good features to build an optimal hyperplane (?).

Here is a simple example: suppose I have 100 useful features and 100 useless features (or noise features). I want the SVM to give me the same results when 1) using only the 100 useful features or 2) using all 200 features.

Any suggestions, or pointers to a reference? Thanks in advance!

Frank
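[Editor's note: a hedged, base-R sketch of the simplest alternative to joint selection-and-classification: a univariate filter that ranks features by a t-statistic against the class label and keeps the top k before any classifier (SVM or otherwise) is fitted. The data and the choice of k are invented for illustration; this is a filter method, not the embedded selection the poster asks about.]

```r
# Filter-style feature selection: rank features by |t| against the label
set.seed(42)
n <- 60; p <- 20
X <- matrix(rnorm(n * p), n, p)        # 20 candidate features
y <- rep(c(0, 1), each = n / 2)        # two classes
X[y == 1, 1:5] <- X[y == 1, 1:5] + 2   # only features 1-5 carry signal

tstat <- apply(X, 2, function(col) abs(t.test(col ~ y)$statistic))
keep  <- order(tstat, decreasing = TRUE)[1:5]  # indices of selected features
keep
# X[, keep] would then be passed to the classifier of choice
```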
[R] newbie where to look question
I don't want to bother anyone with specific questions because I am an R newbie, and I see that there is a TON (emphasis on TON) of documentation out there, but could someone just tell me the best places to look/read for learning about (for R 2.2.1 on Windows):

.Rprofile
.Renviron
.RData
.First function (analogous to the one in S-PLUS)
Analog of an S-PLUS chapter

Basically, I want to learn how to start R so that my own source code and various packages are already available when I start up, and how to make separate .Data directories (I used to do this in S-PLUS with chapters), etc. I am willing to fight through it and try to figure it out myself, but there's so much stuff on the net in terms of threads etc. that I might be helped by knowing the best place to start. Thanks.

Mark
Re: [R] newbie where to look question
On 3 January 2006 at 22:56, Mark Leeds wrote:
| out there but could someone just tell me the best places to look/read
| for learning about ( for R 2.2.1 in Windows )
|
| .Rprofile
| .Renviron
| .RData

?Startup

| .First function ( analogous to the one in S-PLUS ).

?.First

| Analog of S-PLUS chapter

Not sure. Running RSiteSearch("S-Plus Chapter") leads to the R Data Import/Export manual, and RSiteSearch("SPlus Chapter") has some hits too.

| Basically, I want to learn how to start R so that my own source code
| and various packages are already available when I start up, and how to
| make separate .Data ( I used to do this in S-PLUS with chapters )
| directories etc.

That's done a little differently here, but I do not know of a good migration guide for users with prior S-Plus experience.

Hth, Dirk

--
Hell, there are no rules here - we're trying to accomplish something.
-- Thomas A. Edison
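[Editor's note: the ?Startup mechanism pointed to above boils down to defining .First in a .Rprofile file. A hedged sketch; the package names and the sourced path are placeholders, not from the post.]

```r
# Contents one might put in ~/.Rprofile (see ?Startup); .First runs at startup
.First <- function() {
  # attach packages you always want available (placeholder choices)
  for (pkg in c("MASS", "lattice"))
    suppressMessages(require(pkg, character.only = TRUE))
  # source your own utility code (placeholder path)
  # source("~/R/my-utils.R")
  cat("Startup complete at", date(), "\n")
}
```

Per-project .RData files in separate working directories give a rough analog of S-PLUS chapters: R loads the .RData found in the directory it starts in.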
Re: [R] unexpected false convergence
I replicated your 'false convergence' using R 2.2.0:

sessionInfo()
R version 2.2.0, 2005-10-06, i386-pc-mingw32

attached base packages:
[1] methods stats graphics grDevices utils datasets
[7] base

other attached packages:
  nlme   MASS
3.1-66 7.2-23

Since the error message said 'Error in lme.formula', I listed the code for lme.formula and traced it using debug(lme.formula). The function glmmPQL calls lme.formula repeatedly. The function lme.formula in turn calls nlminb when it's available, though it used to call optim. The fifth time lme.formula was called, nlminb returned the error message 'false convergence (8)'. Under R 2.2, nlminb is part of the base package. I'm not certain, but I don't think it was available in base under R 2.1.1. I think this explains the problem, but not how to fix it. I tried modifying the code of lme.formula to force it to call optim, but this generated a different error. I am therefore copying Professors Bates and Ripley in case one of them might want to look at this.

hope this helps.
spencer graves

Jack Tanner wrote: I've come into some code that produces different results under R 2.1.1 and R 2.2.1. I'm really unfamiliar with the libraries in question (MASS and nlme), so I don't know if this is a bug in my code or a regression in R. If it's a bug on my end, I'd appreciate any advice on potential causes and relevant documentation.
The code:

score <- c(1,8,1,3,4,4,2,5,3,6,0,3,1,5,0,5,1,11,1,2,4,5,2,4,1,6,1,2,8,16,5,16,3,15,3,12,4,9,2,4,1,8,2,6,4,11,2,9,3,17,2,6)
id <- rep(1:13, rep(4, 13))
test <- gl(2, 1, 52, labels = c("pre", "post"))
coder <- gl(2, 2, 52, labels = c("two", "three"))
il <- data.frame(id, score, test, coder)
attach(il)
cs1 <- corSymm(value = c(.396, .786, .718, .639, .665, .849), form = ~ 1 | id)
cs1 <- Initialize(cs1, data = il)
run <- glmmPQL(score ~ test + coder, random = ~ 1 | id,
               family = poisson, data = il, correlation = cs1)

The output under R 2.2.1, which leaves the run object (last line of the code) undefined:

iteration 1
iteration 2
iteration 3
iteration 4
Error in lme.formula(fixed = zz ~ test + coder, random = ~1 | id, data = list( :
        false convergence (8)

Under R 2.1.1, I get exactly 4 iterations as well, but no false convergence message, and run is defined.
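[Editor's note: since the failure traced above comes from nlminb(), it may help to see where its status codes live. nlminb() returns a list whose $convergence and $message fields carry the optimizer's verdict; strings like "false convergence (8)" appear in $message. A toy minimisation, invented for illustration:]

```r
# Minimise a simple quadratic with nlminb() and inspect its status fields
obj <- function(par) sum((par - c(1, 2))^2)   # minimum at (1, 2)
res <- nlminb(start = c(0, 0), objective = obj)

res$convergence   # 0 indicates successful convergence
res$message       # textual status; codes such as "false convergence (8)"
                  # show up here when the optimizer is unhappy
res$par           # estimated minimum
```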
Re: [R] newbie where to look question
On Tue, 2006-01-03 at 22:56 -0500, Mark Leeds wrote: I don't want to bother anyone with specific questions because I am an R newbie, and I see that there is a TON (emphasis on TON) of documentation out there, but could someone just tell me the best places to look/read for learning about (for R 2.2.1 on Windows): .Rprofile, .Renviron, .RData, the .First function (analogous to the one in S-PLUS), and an analog of an S-PLUS chapter. Basically, I want to learn how to start R so that my own source code and various packages are already available when I start up, and how to make separate .Data directories (I used to do this in S-PLUS with chapters), etc. I am willing to fight through it and try to figure it out myself, but there's so much stuff on the net in terms of threads etc. that I might be helped by knowing the best place to start. Thanks.

Mark,

One of the best places to start looking is actually the R e-mail list Posting Guide, for which there is a link at the bottom of every e-mail that comes through the list:

http://www.r-project.org/posting-guide.html

Much of what you want to cover is in An Introduction to R, which is available from the menus in the Windows version or online at the main R web site under Manuals. Additional information on your specific questions is available using ?Startup and ?.First from within an R session. For chapters, see ?save and ?load, which I believe will provide parallel functionality, in a fashion.

The main R FAQ:

http://cran.r-project.org/doc/FAQ/R-FAQ.html

and the R for Windows FAQ:

http://cran.r-project.org/bin/windows/base/rw-FAQ.html

are good resources as well. If you are transitioning from S-PLUS, you might want to pay particular attention to section 3.3 of the main R FAQ on the differences between R and S-PLUS.

Finally, thanks to Andy Liaw and Jon Baron, there is the RSiteSearch() function, which will enable you to search the e-mail list archives and documentation online from within an R session. See ?RSiteSearch.
HTH,

Marc Schwartz
[R] Questions about cbind
Dear R-helpers,

I have a stupid question about the cbind function. Suppose I have a data frame like this:

Frame:
A 10
C 20
B 40

and a numeric matrix like this:

Matrix:
A 1
B 2
C 3

cbind(Frame[,2], Matrix[,1]) simply binds these two columns without checking the order; I mean, the result will be

A 10 1
B 20 2
C 30 3

rather than

A 10 1
B 30 2
C 20 3

So my problem is: is there any solution in R to bind the columns in the correct order? Many thanks
Re: [R] Questions about cbind
On Wed, 2006-01-04 at 13:28 +0800, Vincent Deng wrote: [question about binding a column of a data frame (Frame: A 10, C 20, B 40) to a column of a matrix (Matrix: A 1, B 2, C 3) so that the rows match by name, since cbind() does not check the order]

I presume that either the '40' in the first expression of Frame or the '30's in the second and third outputs are typos?

See ?merge, which will perform SQL-like 'join' operations using a primary key:

Frame
  V1 V2
1  A 10
2  C 20
3  B 40

Note that despite the name, this is not a matrix but also a data frame. A matrix can only have one data type, while a data frame can have more than one.

Matrix
  V1 V2
1  A  1
2  B  2
3  C  3

merge(Frame, Matrix, by = "V1")
  V1 V2.x V2.y
1  A   10    1
2  B   40    2
3  C   20    3

HTH, Marc Schwartz
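[Editor's note: a runnable version of the merge() approach above, reconstructing the poster's two objects as data frames for illustration:]

```r
# Reconstruct the two objects and join them on the key column V1
Frame  <- data.frame(V1 = c("A", "C", "B"), V2 = c(10, 20, 40))
Matrix <- data.frame(V1 = c("A", "B", "C"), V2 = c(1, 2, 3))

merge(Frame, Matrix, by = "V1")
#   V1 V2.x V2.y
# 1  A   10    1
# 2  B   40    2
# 3  C   20    3
```

By default merge() sorts the result by the key, so the rows line up regardless of the original order of either input.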
[R] silly, extracting the value of C from the results of somers2
Sorry, I have a very simple question. I used the somers2 function from the Design package:

z <- somers2(x, y, weights = w)

The results are:

z
   C   Dxy    n Missing
0.88  0.76  500    0.00

Now I want to extract only the value of C to be used in further analyses, but I fail to do it. I have tried:

z$C
NULL

z[, "C"]
Error in z[, "C"] : incorrect number of dimensions

and some other silly things. If I do list(z):

[[1]]
   C   Dxy    n Missing
0.88  0.76  500    0.00

Can somebody tell me how I can obtain just the value of C?

Thank you useRs, very gratefully,
Ahimsa

--
Ahimsa Campos Arceiz
The University Museum, The University of Tokyo
Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033
phone +81-(0)3-5841-2824
Re: [R] silly, extracting the value of C from the results of somers2
ahimsa campos arceiz wrote: Sorry, I have a very simple question. I used the somers2 function from the Design package: z <- somers2(x, y, weights = w). Now I want to extract only the value of C to be used in further analyses, but I fail to do it. I have tried z$C (NULL), z[, "C"] (Error: incorrect number of dimensions), and some other silly things. Can somebody tell me how I can obtain just the value of C?

(I think that somers2() is in package Hmisc.)

The help page clearly says that somers2 returns a vector, and there's an example on the help page that does _exactly_ what you ask!

z["C"]

or

z[1]

Peter Ehlers
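[Editor's note: for readers newer to R than the poster, a hedged sketch of the named-vector indexing Peter points to, using a hand-made vector rather than Hmisc's somers2() itself:]

```r
# A named numeric vector, shaped like the somers2() result in the thread
z <- c(C = 0.88, Dxy = 0.76, n = 500, Missing = 0)

z["C"]          # single-bracket: keeps the name
z[["C"]]        # double-bracket: drops the name, returns the bare value
unname(z["C"])  # same idea via unname()
```

z$C fails because $ only works on lists and data frames, and z[, "C"] fails because a vector has no second dimension; single or double brackets are the right tools here.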