Re: [R] Monotonic interpolation
RSiteSearch("monotone", restrict = "functions") will give you several packages and functions for monotone smoothing, including the isoreg() function in the standard stats package. You can determine if any of these does what you want.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of excalibur
Sent: Thursday, September 06, 2007 8:04 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Monotonic interpolation

On Thu, 6 Sep 2007, at 09:45, excalibur wrote:
>> Hello everybody, has anyone got a function for smooth monotonic
>> interpolation (splines ...) of a univariate function (like a
>> distribution function, for example)?
>
> approxfun() might be what you're looking for.

Is the result of approxfun() inevitably monotonic?

--
View this message in context: http://www.nabble.com/Monotonic-interpolation-tf4392288.html#a12524568
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
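For the archives, a sketch of the two options discussed above; the data points are invented. approxfun() does linear interpolation, which preserves the monotonicity of monotone data, while splinefun() has monotone spline methods ("monoH.FC", "hyman") in more recent R versions than the one in this thread:

```r
## Monotone interpolation of increasing points, e.g. a distribution function.
x <- c(0, 1, 2, 3, 4)
y <- c(0.0, 0.1, 0.6, 0.9, 1.0)

## Monotone Hermite spline (stats package; assumes a version of R that
## has the "monoH.FC" method of splinefun()).
f <- splinefun(x, y, method = "monoH.FC")

## Linear interpolation: also monotone whenever the data are monotone.
g <- approxfun(x, y)

xx <- seq(0, 4, length.out = 401)
all(diff(f(xx)) >= 0)   # nondecreasing by construction
all(diff(g(xx)) >= 0)
```

Note that an ordinary interpolating spline (the default splinefun() method) can overshoot and need not be monotone even through monotone data.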
Re: [R] computing distance in miles or km between 2 street addresses
There is a well-known (greedy) algorithm due to Dijkstra for choosing the shortest path, i.e. the minimum-weight path on a weighted digraph between two vertices. I'm sure numerous open source versions of this are available. optim() is not relevant.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ted Harding
Sent: Thursday, September 06, 2007 3:18 PM
To: Philip James Smith; r-help@stat.math.ethz.ch
Subject: Re: [R] computing distance in miles or km between 2 street addresses

On 06-Sep-07 18:42:32, Philip James Smith wrote:
> Hi R-ers:
> I need to compute the distance between 2 street addresses in either km
> or miles. I do not care if the distance is a shortest driving route or
> if it is as the crow flies. Does anybody know how to do this? Can it
> be done in R? I have thousands of addresses, so I think that Mapquest
> is out of the question!
> Please reply to: [EMAIL PROTECTED]
> Thank you!
> Phil Smith

That's a somewhat ill-posed question! You will for a start need a database of some kind, either of geographical locations (coordinates) of street addresses, or of the metric of the road network, with the capability to identify the street addresses in the database.

If it's just as the crow flies, then it can be straightforwardly computed in R, either by Pythagoras (when they are not too far apart) or using a function which takes account of the shape of the Earth. There are many R packages which have to do with mapping data. Search for "map" through the list of R packages at http://finzi.psych.upenn.edu/R/library/maptools/html/00Index.html -- maptools in particular. Also look at (for instance) aspace.

For a shortest driving route you need to find the shortest distance through a network. You may find some hints in the package optim -- but there must be some R experts out there on this sort of thing! However, the primary need is for the database which gives the distance information in one form or another.
What were you proposing to use for this? As far as I know, R has no database relevant to street addresses!

Best wishes,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 06-Sep-07  Time: 23:17:57
-- XFMail --
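Once coordinates are in hand, the as-the-crow-flies part is a few lines of base R. A minimal sketch of the great-circle (haversine) distance; the test coordinates and the conventional mean Earth radius of 6371 km are illustrative assumptions, not values from this thread:

```r
## Great-circle distance between two points given as (lat, lon) in degrees.
## R = 6371 km is a conventional mean Earth radius (an approximation).
haversine <- function(lat1, lon1, lat2, lon2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(sqrt(a))   # distance in km
}

## Sanity check: one degree of latitude is roughly 111 km.
haversine(0, 0, 1, 0)
```

Shortest driving routes are a different problem entirely (Dijkstra on the road network, as noted above) and need a road database that base R does not provide.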
Re: [R] Robust linear models and unequal variance
Let me try a reply, although I wish others wiser than I had responded.

1. How do you know the variances are unequal?

2. If you somehow know what the variances are (or at least their relative sizes), you can use the weights arguments of the functions you mention to weight inversely proportional to variance (except not for the MM method in rlm(), according to the docs).

3. That ranked regression is robust is a myth. It also does not deal with the unequal variance situation. It is not a panacea for anything. If you need robust regression, use robust regression.

4. If group sizes are not too dissimilar, then whether you case-weight or not may not make much difference (alas, hard to tell a priori), especially to estimation.

The fundamental issue is that outliers and unequal variances must be operationalized, otherwise they are confounded: "outlier" only has meaning compared to what is expected from a specified distribution. Outliers are no longer "out" when the variance is large.

Also look at glm() with the quasi option if you wish to consider fitting a heterogeneous variance structure to initialize a robust method (which could, of course, be distorted by your outliers).

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Geertje Van der Heijden
Sent: Tuesday, September 04, 2007 10:55 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Robust linear models and unequal variance

Hi all,

I have probably a basic question, but I can't seem to find the answer in the literature or in the R archives. I would like to do a robust ANCOVA (using either rlm or lmRob of the MASS and robust packages): my response variable deviates slightly from normal and I have some outliers. The data consist of 2 factor variables and 3-5 covariates (depending on the model).
However, the variance between my groups is not equal, and I am not sure if it is therefore appropriate to use a robust statistical method, or if a non-parametric analysis (i.e. ranked regression) might be better. If I can still use a robust statistical method, which estimator is best to use to deal with unequal variance? And if it is better to use a non-parametric analysis, could anyone point me in the direction of the right non-parametric method to use (the relationship between my response variable and the covariates is linear)?

Any help on this would be greatly appreciated!

Many thanks,
Geertje

Geertje van der Heijden
PhD student, Tropical Ecology
School of Geography
University of Leeds
Leeds LS2 9JT
Tel: (+44)(0)113 3433345
Email: [EMAIL PROTECTED]

[[alternative HTML version deleted]]
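A sketch of the inverse-variance case weighting mentioned in point 2 of the reply above; the data are simulated for illustration, and the weights are assumed known (relative sizes only matter), which is rarely the case in practice:

```r
library(MASS)   # rlm()

set.seed(42)
## Two groups with unequal error variance (sd 1 vs sd 3).
g <- rep(c("A", "B"), each = 50)
x <- runif(100)
y <- 2 + 3 * x + rnorm(100, sd = ifelse(g == "A", 1, 3))

## Weights inversely proportional to the (here, known) variances;
## per the docs this is not supported for method = "MM".
w   <- ifelse(g == "A", 1, 1 / 9)
fit <- rlm(y ~ x, weights = w, method = "M")
coef(fit)
```

With weights in hand, the noisier group contributes proportionally less to the fit, which is exactly the heteroscedasticity adjustment the poster is after.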
Re: [R] Saving plot into file
?Devices
e.g. ?pdf

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of uv
Sent: Friday, August 31, 2007 5:41 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Saving plot into file

Hello. I am using R with Mac X11. I am looping through a few hundred text lines, making a plot() for each of them. I would like to save these plots' graphical images into separate graphical files, but I haven't succeeded in doing that. I would be grateful for any suggestion.

--
View this message in context: http://www.nabble.com/Saving-plot-into-file-tf4359947.html#a12425669
Sent from the R help mailing list archive at Nabble.com.
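To spell out the ?Devices hint: open a file device per iteration, plot, and close it. The data, file names, and temporary directory below are invented for illustration:

```r
## Write each plot to its own file; pdf() shown here, but png(), jpeg(),
## etc. work the same way (see ?Devices).
dat   <- split(rnorm(30), rep(1:3, each = 10))   # stand-in for the text lines
files <- file.path(tempdir(), sprintf("plot_%03d.pdf", seq_along(dat)))

for (i in seq_along(dat)) {
  pdf(files[i], width = 6, height = 4)   # open one file per plot
  plot(dat[[i]], main = paste("Series", i))
  dev.off()                              # close the device, flushing the file
}

file.exists(files)
```

The common mistake in a loop like this is forgetting dev.off(), which leaves the last file incomplete.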
Re: [R] retrieve p-value from a cox.obj
str() is your friend. It tells you about the structure of any R object, from which you can usually glean what you need to know to get what you want. It is often useful to use it on summary(object) rather than on the object itself, as the summary method for an (S3) classed object often contains what you're looking for. Less generally, names() and as.list() can sometimes get you what you want also.

Alternatively, check the summary.coxph() code (survival:::summary.coxph, as it's hidden in the namespace). It is clear there how to get what you want, either directly from the fitted object or from the summary.coxph object.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of clearsky
Sent: Wednesday, August 29, 2007 8:41 AM
To: r-help@stat.math.ethz.ch
Subject: [R] retrieve p-value from a cox.obj

I have a cox.obj named obj:

obj <- coxph(Surv(time, status) ~ group, surv.data)

Now I want to retrieve the p-value from obj, so that I can run this hundreds of times and plot the distribution of the p-values. Could anyone tell me how to get the p-value from obj?

thanks,

--
View this message in context: http://www.nabble.com/retrieve-p-value-from-a-cox.obj-tf4348520.html#a12389652
Sent from the R help mailing list archive at Nabble.com.
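To make the str()/summary() route concrete, a sketch using the survival package's built-in aml data (the poster's surv.data is not available, so the model formula is illustrative):

```r
library(survival)

fit <- coxph(Surv(time, status) ~ x, data = aml)

## str(summary(fit)) reveals a "coefficients" matrix; its last column
## holds the per-coefficient Wald p-values.
p <- summary(fit)$coefficients[, "Pr(>|z|)"]
p
```

Wrapping the fit and the extraction in a function then lets you collect p-values over hundreds of simulated data sets, as the poster intends.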
Re: [R] Excel
Erich:

This is not a comment either for or against the use of Excel. I only wish to point out that, AFAICS, Hadley Wickham's reshape package offers all the pivot table functionality and more. If I am wrong about this, please let me and everyone else know.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Erich Neuwirth
Sent: Wednesday, August 29, 2007 11:43 AM
To: r-help
Subject: Re: [R] Excel

Excel bashing can be fun, but it can also be dangerous, because you are making your life harder than necessary. Statisticians know by now that the numerics of Excel's statistical computations can be quite bad, and therefore one should not use them. But using our (we = Thomas Baier + Erich Neuwirth) RExcel add-in, either with the R(D)COM server or with rcom (package on CRAN), allows you to use all the nice features of Excel (yes, there are quite a few) and use R as the computational engine within Excel. The formula

=RApply("var", A1:A1000)

in an Excel cell, for example, will use R to compute the variance of the data in column A in Excel. If you change any of the values in the range A1:A1000, it will automatically recompute the variance.

There is one feature in Excel which is extremely convenient: pivot tables. Anybody doing any work as a statistical consultant really ought to know about pivot tables, and I am still surprised how many statisticians do not. Neither Gnumeric nor OpenOffice Calc offers comparably convenient ways of working with multidimensional tables.

I think the answer to the question "Excel or R?" of course is "Excel and R".
--
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464  Fax: +43-1-4277-39459
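For the R side of "Excel and R": a pivot-table-style cross-tabulated summary is available in base R with tapply() or xtabs(); the reshape package's melt()/cast() pair generalizes this. A sketch on a built-in data set:

```r
## Pivot-table-like summary: mean mpg cross-tabulated by cylinders x gears.
tab <- with(mtcars, tapply(mpg, list(cyl = cyl, gear = gear), mean))
tab

## Counts, pivot-table style:
xtabs(~ cyl + gear, data = mtcars)
```

Cells with no observations come back as NA, just as an Excel pivot table leaves them blank.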
Re: [R] Experimental Design with R
Please use R's search tools.

RSiteSearch("experimental design", restrict = "functions")

finds optBlock() in the AlgDesign package as the 10th hit. Whether this package has what you want is another issue.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc BERVEILLER
Sent: Tuesday, August 28, 2007 6:12 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Experimental Design with R

Dear R users,

I want to know if there is a package that allows one to define different experimental designs (factorial, orthogonal, Taguchi) and to compare them. I didn't find one on the R web site, but it is possible I missed it!

Thank you in advance.

Sincerely,
Marc
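Independent of any design package, a full factorial design can be laid out in base R with expand.grid(); fractional, orthogonal, or Taguchi designs do need a contributed package such as AlgDesign. The factors below are invented for illustration:

```r
## 2 x 3 x 2 full factorial: one row per combination of factor levels.
des <- expand.grid(temp  = c("low", "high"),
                   speed = c(100, 150, 200),
                   oper  = c("A", "B"))
nrow(des)   # 2 * 3 * 2 = 12 runs
```

Randomizing the run order is then just `des[sample(nrow(des)), ]`.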
Re: [R] subset using noncontiguous variables by name (not index)
The problem is that x3:x5 does not mean what you think it means. The only reason it does the right thing in subset() is because a clever trick is used there (read the code -- it's not hard to understand) to ensure that it does. Gabor has essentially mimicked that trick in his solution.

However, it is not necessary to do this. You can construct the call directly, as you tried to do. Using the anscombe example, here's how:

chooz <- "c(x1, x3:x4, y2)"  ## enclose the desired expression in quotes
do.call("subset", list(x = anscombe, select = parse(text = chooz)))

--
Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck
Sent: Sunday, August 26, 2007 2:10 PM
To: Muenchen, Robert A (Bob)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] subset using noncontiguous variables by name (not index)

Using the builtin data frame anscombe, try this. First we set up a data frame anscombe.seq which has one row containing 1, 2, 3, ... . Then select out from that data frame and unlist it to get the desired index vector.

anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
anscombe[idx]

   x1 x3 x4   y2
1  10 10  8 9.14
2   8  8  8 8.14
3  13 13  8 8.74
4   9  9  8 8.77
5  11 11  8 9.26
6  14 14  8 8.10
7   6  6  8 6.13
8   4  4 19 3.10
9  12 12  8 9.13
10  7  7  8 7.26
11  5  5  8 4.74

On 8/26/07, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:

Hi All,

I'm using the subset function to select a list of variables, some of which are contiguous in the data frame, and others of which are not. It works fine when I use the form:

subset(mydata, select = c(x1, x3:x5, x7))

In reality, my list is far more complex. So I would like to store it in a variable to substitute in for c(x1,x3:x5,x7), but cannot get it to work.
That use of the c function seems to violate R rules, so I'm not sure how it works at all. A small simulation of the problem is below. If the variable name order were really this simple, I could use indices like

summary( mydata[ , c(1, 3:5, 7) ] )

but alas, they are not. How does the c function work this way in the first place, and how can I make this substitution?

Thanks, Bob

mydata <- data.frame( x1=c(1,2,3,4,5), x2=c(1,2,3,4,5), x3=c(1,2,3,4,5),
                      x4=c(1,2,3,4,5), x5=c(1,2,3,4,5), x6=c(1,2,3,4,5),
                      x7=c(1,2,3,4,5) )
mydata

# This does what I want.
summary( subset(mydata, select = c(x1, x3:x5, x7) ) )

# Can I substitute myVars?
attach(mydata)
myVars1 <- c(x1, x3:x5, x7)
# Not looking good!
myVars1
# This doesn't do the right thing.
summary( subset(mydata, select = myVars1 ) )

# Total desperation on this attempt:
myVars2 <- "x1,x3:x5,x7"
myVars2
# This doesn't work either.
summary( subset(mydata, select = myVars2 ) )

=
Bob Muenchen (pronounced Min'-chen), Manager
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230  FAX: (865) 974-4810
Email: [EMAIL PROTECTED]
Web: http://oit.utk.edu/scc
News: http://listserv.utk.edu/archives/statnews.html
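Putting Bert's parse() trick next to the poster's toy data, for the archives (column names follow the poster's example; subset() evaluates the parsed expression against a list mapping column names to their positions, which is why the x3:x5 range notation works):

```r
mydata <- data.frame(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5,
                     x5 = 1:5, x6 = 1:5, x7 = 1:5)

## The whole selection expression lives in one character string:
myVars <- "c(x1, x3:x5, x7)"
sel <- do.call("subset", list(x = mydata, select = parse(text = myVars)))
names(sel)   # x1, x3, x4, x5, x7
```

Because the string is data rather than code, it can be built programmatically or read from a file, which is what the original poster was after.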
Re: [R] uneven list to matrix
It can be done straightforwardly -- I don't know about efficiency -- without recourse to zoo or merge():

## example data
x <- 1:5
names(x) <- letters[1:5]
alph <- list(x[1:4], x[c(1,3,4)], x[c(1,4,5)])

## Solution
rn <- unique(unlist(sapply(alph, names)))
mx <- matrix(nr = length(rn), nc = length(alph), dimnames = list(rn, NULL))
## use sort(rn) in dimnames if you want the rows sorted
for(i in seq(length(alph))) { y <- alph[[i]]; mx[names(y), i] <- y }

Bert Gunter
Nonclinical Statistics
7-7374

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Christopher Marcum
Sent: Thursday, August 23, 2007 10:27 PM
To: Gabor Grothendieck
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] uneven list to matrix

Hi Gabor,

My apologies. Both solutions work just fine on large lists (n=1000, n[[i]]=500). A memory problem on my machine caused the error and the failure to sort. Thank you!

PS - The zoo method is slightly faster.

Best,
Chris

Gabor Grothendieck wrote:

On 8/24/07, Christopher Marcum [EMAIL PROTECTED] wrote:

Hi Gabor,
Thank you. The native solution works just fine, though there is an interesting side effect, namely, that with very large lists the rows of the output become scrambled though the corresponding columns are correctly sorted. The zoo package solution does not work on large lists; there is an error:
Error in order(na.last, decreasing, ...) : argument 1 is not a vector
They both work on the example data.

Please provide reproducible examples to illustrate your comments if you would like a response.

Gabor Grothendieck wrote:

Here are two solutions. The first repeatedly uses merge and the second creates a zoo object from each alph component whose time index consists of the row labels and uses zoo's multiway merge to merge them.
# test data
m <- matrix(1:5, 5, dimnames = list(LETTERS[1:5], NULL))
alph <- list(m[1:4,,drop=FALSE], m[c(1,3,4),,drop=FALSE], m[c(1,4,5),,drop=FALSE])
alph

# solution 1
out <- alph[[1]]
for(i in 2:length(alph)) {
  out <- merge(out, alph[[i]], by = 0, all = TRUE)
  row.names(out) <- out[[1]]
  out <- out[-1]
}
matrix(as.matrix(out), nrow(out), dimnames = list(rownames(out), NULL))

# solution 2
library(zoo)
z <- do.call(merge, lapply(alph, function(x) zoo(c(x), rownames(x))))
matrix(coredata(z), nrow(z), dimnames = list(time(z), NULL))

On 8/23/07, Christopher Marcum [EMAIL PROTECTED] wrote:

Hello,

I am sure I am not the only person with this problem. I have a list with n elements, each consisting of a single-column matrix with different row lengths. Each row has a name ranging from A to E. Here is an example:

alph[[1]]
A 1
B 2
C 3
D 4

alph[[2]]
A 1
C 3
D 4

alph[[3]]
A 1
D 4
E 5

I would like to create a matrix from the elements in the list with n columns such that the row names are preserved and NAs are inserted into the cells where the uneven lists do not match up based on their row names. Here is an example of the desired output:

newmatrix
  [,1] [,2] [,3]
A    1    1    1
B    2   NA   NA
C    3    3   NA
D    4    4    4
E   NA   NA    5

Any suggestions? I have tried

do.call(cbind, list)

I also thought I was on the right track when I tried converting each element into a vector and then running this loop (which ultimately failed):

newmat <- matrix(NA, ncol=3, nrow=5)
colnames(newmatrix) <- c(A:E)
for(j in 1:3){
  for(i in 1:5){
    for(k in 1:length(list[[i]])){
      if(is.na(match(colnames(newmatrix), names(alph[[i]])))[j] == TRUE){
        newmatrix[i,j] <- NA
      } else newmatrix[i,j] <- alph[[i]][k]}}}

Thanks,
Chris
UCI Sociology
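A third, compact variant of the same idea as Bert's loop above, relying on the fact that indexing a named vector by an absent name yields NA (base R only; data as in Bert's example):

```r
x <- 1:5
names(x) <- letters[1:5]
alph <- list(x[1:4], x[c(1, 3, 4)], x[c(1, 4, 5)])

## Index every element by the union of all row names; missing names -> NA.
rn  <- sort(unique(unlist(lapply(alph, names))))
out <- sapply(alph, function(y) y[rn])
rownames(out) <- rn
out
```

Each column of `out` is one list element, aligned on the full set of names, which is exactly the desired matrix with NA holes.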
Re: [R] Regulatory Compliance and Validation Issues
FWIW: The few companies with which I'm familiar have significant resources -- i.e. software QC departments -- devoted to validating that internally developed code (e.g. SAS macros) used in submissions does what it claims to do. Extensive documentation of the code and the validation process is required. All changes to such code must of course be documented and validated. I believe this is all part of CFR Part 11 requirements.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr
Sent: Monday, August 20, 2007 12:17 PM
To: Cody Hamilton
Cc: Thomas Lumley; r-help@stat.math.ethz.ch
Subject: Re: [R] Regulatory Compliance and Validation Issues

Cody Hamilton wrote:

Dear Thomas,

Thank you for your reply. You are of course quite right - the R Foundation couldn't be responsible for any individually contributed package. I am curious as to how an organization operating in a regulated environment could safely use a contributed package. What if the author/maintainer retires or loses interest in maintaining the package? The organization would then find itself in the awkward position of being reliant on software for which there is no technical support and which may not be compatible with future versions of the base R software. I suppose the organization could take responsibility for maintaining the individual functions within a package on its own (one option made possible by the open source nature of R), but this would require outstanding programming resources which the company may not have (Thomas Lumleys are sadly rare). In addition, as the organization is claiming the functions as their own (and not as out-of-the-box software), the level of required validation would be truly extraordinary. I also wonder if an everyone-maintain-their-own-copy approach could lead to multiple mutated versions of a package's functions across the R universe (e.g. Edwards' version of sas.get() vs.
Company X's version of sas.get(), etc.).

Regards,
-Cody

Cody,

I think of this issue as not unlike an organization using its own code written by its own analysts or SAS programmers. Code is reused all the time.

Frank

As always, I am speaking for myself and not necessarily for Edwards Lifesciences.

-----Original Message-----
From: Thomas Lumley [mailto:[EMAIL PROTECTED]
Sent: Sunday, August 19, 2007 8:50 AM
To: Cody Hamilton
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Regulatory Compliance and Validation Issues

On Fri, 17 Aug 2007, Cody Hamilton wrote:

[snip]

I have a few specific comments/questions that I would like to present to the R help list.

[snip]

2. While the document's scope is limited to base R plus recommended packages, I believe most companies will need access to functionality provided by packages not included in the base or recommended packages. (For example, I don't think I could survive without the sas.get() function from the Design library.) How can a company address the issues covered in the document for packages outside its scope? For example, what if a package's author does not maintain historical archive versions of the package? What if the author no longer maintains the package? Is the solution to add more packages to the recommended list (I'm fairly certain that this would not be a simple process) or is there another solution?

This will have to be taken up with the package maintainer. The R Foundation doesn't have any definitive knowledge about, e.g., Frank Harrell's development practices, and I don't think the FDA would regard our opinions as relevant. Archiving, at least, is addressed by CRAN: all the previously released versions of packages are available.

3. At least at my company, each new version must undergo basically the same IQ/OQ/PQ as the first installation. As new versions of R seem to come at least once a year, the ongoing validation effort would be painful if the most up-to-date version of R is to be maintained within the company.
Is there any danger in delaying the updates (say, updating R within the company every two years or so)?

It's worse than that: there are typically 4 releases of R per year (the document you are commenting on actually gives dates). The ongoing validation effort may indeed be painful, and this was mentioned as an issue in the talk by David James & Tony Rossini. The question of what is missed by delaying updates can be answered by looking at the NEWS file. The question of whether it is dangerous is really an internal risk management issue for you.

-thomas

--
Frank E Harrell Jr
Professor and Chair
Re: [R] Convert factor to numeric vector of labels
Matt:

I believe you have confused issues. Setting stringsAsFactors = FALSE would dramatically **increase** the amount of memory used for storing character vectors, which is what factors are for. So your proposed solution does exactly the opposite of what you want.

The issue you are worried about is when numeric fields are somehow interpreted as non-numeric. This can happen for a variety of reasons (stray characters in numeric fields, quotes around numbers, ...). The solution is not to set a global default that does the opposite of what you want in its intended use, but to read the documentation and either set the appropriate arguments (perhaps colClasses of read.table) or fix the original data before R reads it (e.g. remove quotes and stray characters). Failing that, the one-off solutions given are the correct way to handle what is a data problem, not an R problem.

However, I should add that there are arguments for making stringsAsFactors = FALSE; search the archives for discussions why. The memory penalty will have to be paid, of course.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Matthew Keller
Sent: Tuesday, August 14, 2007 12:48 PM
To: John Kane
Cc: Falk Lieder; r-help@stat.math.ethz.ch
Subject: Re: [R] Convert factor to numeric vector of labels

Hi all,

If we, the R community, are endeavoring to make R user friendly (gasp!), I think that one of the first places to start would be in setting stringsAsFactors = FALSE. Several times I've run into instances of folks decrying R's ridiculous usage of memory in reading data, only to find out that these folks were unknowingly importing certain columns as factors. The fix is easy once you know it, but it isn't obvious to new users, and I'd bet that it turns some percentage of people off of the program. Factors are not used often enough to justify this default behavior, in my opinion.
When factors are used, the user knows to treat the variable as a factor, and so it can be done on a case-by-case (or should I say variable-by-variable?) basis. Is this a default that should be changed?

Matt

On 8/13/07, John Kane [EMAIL PROTECTED] wrote:

This is one of R's rather _endearing_ little idiosyncrasies. I ran into it a while ago.
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/98090.html

For some reason, possibly historical, the option stringsAsFactors is set to TRUE. As Prof Ripley says, FAQ 7.10 will tell you:

as.numeric(as.character(f))  # for a one-off conversion

From Gabor Grothendieck, a one-off solution for a complete data.frame:

DF <- data.frame(let = letters[1:3], num = 1:3, stringsAsFactors = FALSE)
str(DF)  # to see what has happened

You can reset the option globally; see below. However, you might want to read Gabor Grothendieck's comment about this in the thread referenced above, since it could cause problems if you transfer files a lot. Personally I went with the global option, since I don't tend to transfer programs to other people and I was getting tired of tracking down errors in my programs caused by numeric and character variables suddenly deciding to become factors.

From Steven Tucker: You can also set this option globally with

options(stringsAsFactors = FALSE)  # in \library\base\R\Rprofile

--- Falk Lieder [EMAIL PROTECTED] wrote:

Hi,

I have imported a data file to R. Unfortunately R has interpreted some numeric variables as factors. Therefore I want to reconvert these to numeric vectors whose values are the factor levels' labels. I tried as.numeric(factor), but it returns a vector of factor level codes (i.e. 1, 2, 3, ...) instead of labels (i.e. 0.71, 1.34, 2.61, ...). What can I do instead?

Best wishes, Falk
--
Matthew C Keller
Postdoctoral Fellow
Virginia Institute for Psychiatric and Behavioral Genetics
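The one-off conversion from FAQ 7.10, spelled out on a small example so the difference between level codes and level labels is visible:

```r
## A numeric column accidentally read as a factor:
f <- factor(c("0.71", "1.34", "2.61"))

as.numeric(f)                 # wrong: the internal level codes 1 2 3
as.numeric(as.character(f))   # right: 0.71 1.34 2.61

## Equivalent, and faster for long factors with few levels (see ?factor):
as.numeric(levels(f))[f]
```

The second form works because indexing the (character) level labels by the factor's integer codes recovers the original strings before conversion.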
Re: [R] Legend on graph
You can get the legend outside the plot region by:

1. First changing the clipping region via par(xpd = TRUE) (or xpd = NA); see ?par.

2. Specifying x and y coordinates for legend placement outside the limits of the plot region.

This allows you to include a legend without adding a bunch of useless whitespace to the plot region, or to add a grid to the plot without it interfering with the legend.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nguyen Dinh Nguyen
Sent: Monday, August 13, 2007 3:42 PM
To: [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Subject: [R] Legend on graph

Hi Akki,

Then you may need to increase the y-axis scale with ylim = c(min, max).

Cheers,
Nguyen

On 8/12/07, akki [EMAIL PROTECTED] wrote:

Hi,
I have a problem when I want to put a legend on the graph. I do:

legend("topright", names(o), cex = 0.9, col = plot_colors, lty = 1:5, bty = "n")

but the legend is written inside the graph (at the graph's top, but inside the plot region), because I have values at that position. How can I write the legend above the graph without it overwriting the graph's values?
Thanks.
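The two steps above, in a minimal sketch with invented data (written to a temporary PDF so it runs non-interactively):

```r
f <- tempfile(fileext = ".pdf")
pdf(f, width = 6, height = 4)

## 1. Turn off clipping at the plot region; leave some top margin.
par(xpd = NA, mar = c(4, 4, 4, 2))
matplot(matrix(rnorm(50), 10), type = "l", lty = 1:5, col = 1:5,
        xlab = "index", ylab = "value")

## 2. Place the legend at user coordinates just above the plot box.
usr <- par("usr")
legend(x = usr[1], y = usr[4] + 0.15 * diff(usr[3:4]),
       legend = paste("series", 1:5), lty = 1:5, col = 1:5,
       horiz = TRUE, bty = "n", cex = 0.8)

dev.off()
```

With xpd = NA the legend is drawn even though its coordinates lie outside the plot region, so no data are covered and no extra ylim padding is needed.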
Re: [R] Mixture of Normals with Large Data
Why would anyone want to fit a mixture of normals with 110 million observations?? Any questions about the distribution that you would care to ask can be answered directly from the data. Of course, any test of normality (or anything else) would be rejected. More to the point, the data are certainly not a random sample of anything. There will be all kinds of systematic nonrandom structure in them. This is clearly a situation where the researcher needs to think more carefully about the substantive questions of interest and how the data may shed light on them, instead of arbitrarily and perhaps reflexively throwing some silly statistical methodology at them. Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tim Victor Sent: Tuesday, August 07, 2007 3:02 PM To: r-help@stat.math.ethz.ch Subject: Re: [R] Mixture of Normals with Large Data I wasn't aware of this literature, thanks for the references. On 8/5/07, RAVI VARADHAN [EMAIL PROTECTED] wrote: Another possibility is to use data squashing methods. Relevant papers are: (1) DuMouchel et al. (1999), (2) Madigan et al. (2002), and (3) Owen (1999). Ravi. Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: [EMAIL PROTECTED] - Original Message - From: Charles C. Berry [EMAIL PROTECTED] Date: Saturday, August 4, 2007 8:01 pm Subject: Re: [R] Mixture of Normals with Large Data To: [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch On Sat, 4 Aug 2007, Tim Victor wrote: All: I am trying to fit a mixture of 2 normals with 110 million observations. I am running R 2.5.1 on a box with 1gb RAM running 32-bit windows and I continue to run out of memory. Does anyone have any suggestions. If the first few million observations can be regarded as a SRS of the rest, then just use them. 
Or read in blocks of a convenient size and sample some observations from each block. You can repeat this process a few times to see if the results are sufficiently accurate. Otherwise, read in blocks of a convenient size (perhaps 1 million observations at a time), quantize the data to a manageable number of intervals - maybe a few thousand - and tabulate it. Add the counts over all the blocks. Then use mle() to fit a multinomial likelihood whose probabilities are the masses associated with each bin under a mixture of normals law. Chuck Thanks so much, Tim Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine, UC San Diego, La Jolla, San Diego 92093-0901
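Chuck's bin-and-tabulate recipe can be sketched as below; optim() stands in for mle() to keep it short, and the 110 million observations are replaced by a simulated mixture (all names, bin counts, and starting values here are illustrative):

```r
set.seed(42)
## Stand-in for data read in blocks: a 50/50 mix of N(0,1) and N(4,1)
x <- c(rnorm(5e4, 0, 1), rnorm(5e4, 4, 1))

## Quantize into a few thousand intervals and keep only the counts;
## with real blocks you would add the per-block counts together.
breaks <- seq(min(x), max(x), length.out = 2001)
counts <- tabulate(findInterval(x, breaks, all.inside = TRUE), nbins = 2000)

## Multinomial negative log-likelihood: bin probabilities are the masses
## of a 2-component normal mixture (params: logit p, mu1, mu2, log sd1, log sd2)
nll <- function(par) {
  p   <- plogis(par[1])
  cdf <- p       * pnorm(breaks, par[2], exp(par[4])) +
         (1 - p) * pnorm(breaks, par[3], exp(par[5]))
  prob <- pmax(diff(cdf), 1e-12)   # guard against log(0) in empty tails
  -sum(counts * log(prob))
}
fit <- optim(c(0, -1, 5, 0, 0), nll, method = "BFGS",
             control = list(maxit = 500))
fit$par[2:3]   # component mean estimates
```

The bin probabilities are not renormalized to sum to one over the observed range; for a sketch this slight misnormalization is harmless, but a careful fit would divide by their sum.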
Re: [R] Mixture of Normals with Large Data
Have you considered the situation of wanting to characterize probability densities of prevalence estimates based on a complex random sample of some large population? No -- and I stand by my statement. The empirical distribution of the data themselves is the best characterization of the density. You and others are free to disagree. -- Bert On 8/7/07, Bert Gunter [EMAIL PROTECTED] wrote: Why would anyone want to fit a mixture of normals with 110 million observations?? Any questions about the distribution that you would care to ask can be answered directly from the data. Of course, any test of normality (or anything else) would be rejected. More to the point, the data are certainly not a random sample of anything. There will be all kinds of systematic nonrandom structure in them. This is clearly a situation where the researcher needs to think more carefully about the substantive questions of interest and how the data may shed light on them, instead of arbitrarily and perhaps reflexively throwing some silly statistical methodology at them. Bert Gunter Genentech Nonclinical Statistics
Re: [R] Create vectors from matrices
The poster asked for row-major representation, not column-major representation. Matrices **are** vectors -- stored in column-major order. Try: cat(x, "\n") ## versus... cat(t(x), "\n") The tabular printout occurs because the print() method for a matrix object (more generally, any array) prints the matrix (a vector with a dim attribute) in an appropriate way. However you can manipulate the matrix **as** a vector, and in most circumstances the dim attribute will be preserved, so it will remain a matrix object. Please read An Introduction to R, ?methods and ?print (at least) for details. R will always be arcane to those who do not make a serious effort to learn it. It is **not** meant to be intuitive and easy for casual users to just plunge into. It is far too complex and powerful for that. But the rewards are great for serious data analysts who put in the effort. Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Henrique Dallazuanna Sent: Monday, August 06, 2007 7:33 AM To: Niccolò Bassani Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Create vectors from matrices Try: dim(matrix) <- NULL -- Henrique Dallazuanna Curitiba-Parana-Brasil 25° 25' 40" S 49° 16' 22" O On 06/08/07, Niccolò Bassani [EMAIL PROTECTED] wrote: Hi, dear R users. I've a kind of stupid question, I hope you can provide some help! The topic here's really simple: vectors and matrices. I have a matrix (616 rows x 22 cols) filled with numbers and NAs; something like this: 1 2 3 4 5 6 NA NA NA NA 1 2 3 4 NA NA NA NA NA ... What I'm trying to do is to put all the rows on a unique row, so as to have something like this: 1 2 3 4 5 6 NA NA NA NA 1 2 3 4 NA NA NA NA NA ... and so on. The matter is that whatever I try, I just get something like this: 1 1 1 1 1 1 1 1 ... 2 2 2 2 2 2 2 2 2 ... Obviously, this is not what is required. I've tried to concatenate, I've built a for loop, but nothing seems to produce what I want.
Sorry for the dumb question, but I'm almost sure I need holidays... Thanks in advance! Niccolò
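Since matrices are stored column-major, transposing first gives the row-by-row vector the poster wants:

```r
x <- rbind(c(1, 2, 3, NA),
           c(4, 5, NA, NA))
as.vector(x)     # column-major: 1 4 2 5 3 NA NA NA
as.vector(t(x))  # row-major:    1 2 3 NA 4 5 NA NA
## c(t(x)) is an equivalent idiom; NAs stay in place either way
```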
Re: [R] FW: Selecting undefined column of a data frame (was [BioC]read.phenoData vs read.AnnotatedDataFrame)
I suspect you'll get some creative answers, but if all you're worried about is whether a column exists before you do something with it, what's wrong with: nm <- ... ## a character vector of names if (!all(nm %in% names(yourdata))) ## complain else ## do something I think this is called defensive programming. Bert Gunter Genentech -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Steven McKinney Sent: Friday, August 03, 2007 10:38 AM To: r-help@stat.math.ethz.ch Subject: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame) Hi all, What are current methods people use in R to identify mis-spelled column names when selecting columns from a data frame? Alice Johnson recently tackled this issue (see [BioC] posting below). Due to a mis-spelled column name (FileName instead of Filename) which produced no warning, Alice spent a fair amount of time tracking down this bug. With my fumbling fingers I'll be tracking down such a bug soon too. Is there any options() setting, or debug technique, that will flag data frame column extractions that reference a non-existent column? It seems to me that the [.data.frame extractor used to throw an error if given a mis-spelled variable name, and I still see lines of code in [.data.frame such as if (any(is.na(cols))) stop("undefined columns selected") In R 2.5.1 a NULL is silently returned. foo <- data.frame(Filename = c("a", "b")) foo[, "FileName"] NULL Has something changed so that the code lines if (any(is.na(cols))) stop("undefined columns selected") in [.data.frame no longer work properly (if I am understanding the intention properly)? If not, could [.data.frame check an options() variable setting (say warn.undefined.colnames) and throw a warning if a non-existent column name is referenced?
sessionInfo() R version 2.5.1 (2007-06-27) powerpc-apple-darwin8.9.1 locale: en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: plotrix lme4 Matrix lattice 2.2-3 0.99875-4 0.999375-0 0.16-2 Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Johnstone, Alice Sent: Wed 8/1/2007 7:20 PM To: [EMAIL PROTECTED] Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame For interest's sake, I have found out why I wasn't getting my expected results when using read.AnnotatedDataFrame. Turns out the error was made in the ReadAffy command, where I specified the filenames to be read from my AnnotatedDataFrame object. There was a typo error with a capital N ($FileName) rather than lowercase n ($Filename) as in my target file... whoops. However this meant the filename argument was ignored without an error message(!) and instead of using the information in the AnnotatedDataFrame object (which included filenames, but not alphabetically) it read the .cel files in alphabetical order from the working directory - hence the wrong file was given the wrong label (given by the order of the Annotated object) and my comparisons were confused, without it being obvious as to why or where. Our solution: specify that filename is as.character so assignment of file to target is correct (after correcting $Filename), now that we are using read.AnnotatedDataFrame rather than read.phenoData. Data <- ReadAffy(filenames = as.character(pData(pd)$Filename), phenoData = pd) Hurrah! It may be beneficial to others that if the filename argument isn't specified, filenames are read from the phenoData object if included there. Thanks!
-Original Message- From: Martin Morgan [mailto:[EMAIL PROTECTED] Sent: Thursday, 26 July 2007 11:49 a.m. To: Johnstone, Alice Cc: [EMAIL PROTECTED] Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame Hi Alice -- Johnstone, Alice [EMAIL PROTECTED] writes: Using R2.5.0 and Bioconductor I have been following code to analysis Affymetrix expression data: 2 treatments vs control. The original code was run last year and used the read.phenoData command, however with the newer version I get the error message Warning messages: read.phenoData is deprecated, use read.AnnotatedDataFrame instead The phenoData class is deprecated, use AnnotatedDataFrame (with ExpressionSet) instead I use the read.AnnotatedDataFrame command, but when it comes to the end of the analysis the comparison of the treatment to the controls gets mixed up compared to what you get using the original read.phenoData ie it looks like the 3 groups get labelled wrong and so the comparisons are different (but they can still be matched up
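The defensive check Bert suggests at the top of this thread, wrapped into a small helper (the function name getCols is mine, not part of any package):

```r
## Hypothetical helper: column extraction that fails loudly on typos
getCols <- function(df, nm) {
  bad <- setdiff(nm, names(df))
  if (length(bad))
    stop("undefined columns selected: ", paste(bad, collapse = ", "))
  df[, nm, drop = FALSE]
}

foo <- data.frame(Filename = c("a", "b"))
getCols(foo, "Filename")       # fine
try(getCols(foo, "FileName"))  # error, instead of a silent NULL
```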
Re: [R] Extracting a website text content using R
Yes, there are. (Please see and follow the posting guide if you wish to obtain something more specific.) Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Am Stat Sent: Wednesday, August 01, 2007 2:19 PM To: r-help@stat.math.ethz.ch Subject: [R] Extracting a website text content using R Dear useR, Just wondering whether it is possible that there is any function in R that could let me get the text contents of a certain website. Thanks a lot! Best, Leon
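For a quick-and-dirty version of what Leon asks, base R can fetch a page with readLines() and strip the markup with regexes (a crude sketch, not a real HTML parser; the inline HTML below stands in for a live page):

```r
## Stand-in for: html <- readLines("http://example.com")
html <- c("<html><body><h1>Title</h1>",
          "<p>Some text content.</p></body></html>")
txt <- gsub("<[^>]+>", " ", paste(html, collapse = " "))  # drop tags
txt <- trimws(gsub("\\s+", " ", txt))                     # tidy whitespace
txt  # "Title Some text content."
```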
Re: [R] Slightly OT - use of R
Why? You might receive more useful replies from a relevant subset of users if you specify the purpose you have in mind. Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Logsdon Sent: Monday, July 30, 2007 1:28 AM To: r-help@stat.math.ethz.ch Subject: [R] Slightly OT - use of R I am trying to get a measure of how R compares in usage as a statistical platform compared to other software. I would guess it is the most widely used among statisticians at least by virtue of it being open source. But is there any study to which I can refer? By asking this list I am not exactly adopting a rigorous approach! Best wishes John John Logsdon Try to make things as simple Quantex Research Ltd, Manchester UK as possible but not simpler [EMAIL PROTECTED] [EMAIL PROTECTED] +44(0)161 445 4951/G:+44(0)7717758675 www.quantex-research.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] generating symmetric matrices
See ?dist for an object-oriented approach that may be better. Directly, you can do something like (see ?row, ?col): x <- matrix(NA, 10, 10) ## Lower triangular: x[row(x) >= col(x)] <- rnorm(55) x[row(x) < col(x)] <- t(x)[row(x) < col(x)] ## or you could have saved the random vector and re-used it. Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gregory Gentlemen Sent: Friday, July 27, 2007 8:28 PM To: r-help@stat.math.ethz.ch Subject: [R] generating symmetric matrices Greetings, I have a seemingly simple task which I have not been able to solve today. I want to construct a symmetric matrix of arbitrary size w/o using loops. The following I thought would do it: p <- 6 Rmat <- diag(p) dat.cor <- rnorm(p*(p-1)/2) Rmat[outer(1:p, 1:p, "<")] <- Rmat[outer(1:p, 1:p, ">")] <- dat.cor However, the problem is that the matrix is filled by column and so the resulting matrix is not symmetric. I'd be grateful for any advice and/or solutions. Gregory
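The same idea with lower.tri()/upper.tri() (equivalent to the row()/col() comparisons above), applied to Gregory's p = 6 case:

```r
set.seed(1)
p <- 6
Rmat <- diag(p)
Rmat[lower.tri(Rmat)] <- rnorm(p * (p - 1) / 2)    # fill strict lower triangle
Rmat[upper.tri(Rmat)] <- t(Rmat)[upper.tri(Rmat)]  # mirror to upper triangle
isSymmetric(Rmat)  # TRUE
```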
Re: [R] Constructing correlation matrices
See ?dist for an object-oriented approach that may be better. Directly, you can do something like (see ?row, ?col): x <- matrix(NA, 10, 10) ## Lower triangular: x[row(x) >= col(x)] <- rnorm(55) x[row(x) < col(x)] <- t(x)[row(x) < col(x)] ## or you could have saved the random vector and re-used it. Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gregory Gentlemen Sent: Sunday, July 29, 2007 7:32 PM To: r-help@stat.math.ethz.ch Subject: [R] Constructing correlation matrices Greetings, I have a seemingly simple task which I have not been able to solve today, and I checked all of the help archives on this and have been unable to find anything useful. I want to construct a symmetric matrix of arbitrary size w/o using loops. The following I thought would do it: p <- 6 Rmat <- diag(p) dat.cor <- rnorm(p*(p-1)/2) Rmat[outer(1:p, 1:p, "<")] <- Rmat[outer(1:p, 1:p, ">")] <- dat.cor However, the problem is that the matrix is filled by column and so the resulting matrix is not symmetric. I'd be grateful for any advice and/or solutions. Gregory
Re: [R] doubt about options(graphics.record=T)
Below is an explicit excerpt from the help file. How, please, is this not clear enough? Bert Gunter Genentech Nonclinical Statistics Recorded plot histories are of class SavedPlots. They have a print method, and a subset method. As the individual plots are of class recordedplot they can be replayed by printing them: see recordPlot. The active plot history is stored in variable .SavedPlots in the workspace. [emphasis added] -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of terra Sent: Monday, July 23, 2007 11:30 AM To: Prof Brian Ripley; r-help@stat.math.ethz.ch Subject: Re: [R] doubt about options(graphics.record=T) Hi all, I've been using R under Windows XP. So, where does R store the graphics (not saved) if I use options(graphics.record=T) inside the Rprofile.site file? The relevant help file (?windows) does tell you: please read it. Dear Prof. Ripley, I read the recommended (?windows) and it was not clear enough! BTW, I just found a discussion from ([R] RGui: windows-record and command history, Thomas Steiner (23 Mar 2006)) where Duncan wrote: - The graphics history is stored in your current workspace in memory, and it can get big. I think it is the answer I was searching for. Do you agree? Regards, /\/\/\/\ Jose Claudio Faria Brasil/Bahia/UESC/DCET Estatistica Experimental/Prof. Titular [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] Tels: 73-3634.2779 (res - Ilheus/BA) 19-3435.1536 (res - Piracicaba/SP) * 19-9144.8979 (cel - Piracicaba/SP) * /\/\/\/\
Re: [R] Drawing rectangles in multiple panels
Deepayan et al.: A question/comment: I have usually found that the subscripts argument is what I need when passing *external* information into the panel function, for example, when I wish to add results from a fit done external to the trellis call. Fits[subscripts] gives me the fits (or whatever) I want to plot for each panel. It is not clear to me how the panel layout information from panel.number(), etc. would be helpful here instead. Am I correct? -- or is there a smarter way to do this that I've missed? Cheers, Bert Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Wednesday, July 11, 2007 10:04 AM To: Jonathan Williams Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Drawing rectangles in multiple panels On 7/11/07, Jonathan Williams [EMAIL PROTECTED] wrote: Hi folks, I'm having some trouble understanding the intricacies of panel functions. I wish to create three side-by-side graphs, each with different data-- so far, so good: I rbind() the data, add a column of subscripts as a conditioning variable, load up the lattice package, specify either a c(3,1) 'layout' or work through 'allow.multiple' and 'outer' and I'm good to go. But now I wish to add three rectangles to each plot, which will be in different places on each panel, and I'm terribly stuck. I can guess this requires defining a panel function on the fly, but none of my attempts are working. Suggestions? You haven't told us what determines the rectangles (only that they are different in each panel). If they are completely driven by panel data, here's an example: panel.qrect <- function(x, y, ...) { xq <- quantile(x, c(0.1, 0.9)) yq <- quantile(y, c(0.1, 0.9)) panel.rect(xq[1], yq[1], xq[2], yq[2], col = "grey86", border = NA) panel.xyplot(x, y, ...)
} xyplot(Sepal.Length ~ Sepal.Width | Species, iris, panel = panel.qrect) If the rectangles are somehow determined externally, you probably want to use one of the accessor functions described in help(panel.number). There are good and bad (i.e. less robust) ways to use these, but we need to know your use case before recommending one. -Deepayan
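Bert's Fits[subscripts] pattern, made concrete (the external fit here is an illustrative loess computed outside the trellis call):

```r
library(lattice)
## External information: one fitted value per row of the data
fits <- fitted(loess(Sepal.Length ~ Sepal.Width, iris))
p <- xyplot(Sepal.Length ~ Sepal.Width | Species, iris,
            panel = function(x, y, subscripts, ...) {
              panel.xyplot(x, y, ...)
              ## subscripts selects this panel's rows of the external vector
              panel.points(x, fits[subscripts], pch = 3, col = "red")
            })
print(p)  # trellis objects draw when printed
```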
Re: [R] why doesn't as.character of this factor create a vector ofcharacters?
Andrew: As you haven't received a reply yet ... ?factor, ?UseMethod, and An Introduction to R may help. But it's a bit subtle. Factors are objects that are integer vectors (codes) with a levels attribute that associates the codes with levels as character names. So df[df$a == "Abraham", ] is a data.frame in which the columns are still factors. as.character() is an S3 generic function that calls the (internal) default method on a data.frame. This obviously just turns the vector of integers into characters and ignores the levels attribute. t() is also an S3 generic with a data.frame method. This merely converts the data.frame to a matrix via as.matrix() and then applies t() to the matrix. The as.matrix() method for data.frames captures the levels and converts the data.frame to a character matrix with the level names, not their numeric codes. So another, perhaps more intuitive but also more storage-intensive, way (I think) of doing what you want that avoids the transpose and as.vector() conversion would be: mx <- as.matrix(df) mx[mx[, "a"] == "Abraham", , drop = TRUE] HTH. Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Yee Sent: Tuesday, July 10, 2007 8:57 AM To: r-help@stat.math.ethz.ch Subject: [R] why doesn't as.character of this factor create a vector of characters? I'm trying to figure out why when I use as.character() on one row of a data.frame, I get factor numbers instead of a character vector. Any suggestions? See the following code: a <- c("Abraham", "Jonah", "Moses") b <- c("Sarah", "Hannah", "Mary") c <- c("Billy", "Joe", "Bob") df <- data.frame(a = a, b = b, c = c) # Suppose I'm interested in one line of this data frame, but as a vector one.line <- df[df$a == "Abraham", ] # However the following illustrates the problem I'm having one.line <- as.vector(df[df$a == "Abraham", ]) # Creates a one-row data.frame instead of a vector! # compare above to one.line <- as.character(df[df$a == "Abraham", ]) # Creates a vector of "1", "3", "1"!
# In the end, this creates the output that I'd like: one.line <- as.vector(t(df[df$a == "Abraham", ])) # but it seems like a lot of work!
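Both routes in one snippet; stringsAsFactors = TRUE forces the old default so the example behaves the same on current R:

```r
df <- data.frame(a = c("Abraham", "Jonah", "Moses"),
                 b = c("Sarah", "Hannah", "Mary"),
                 c = c("Billy", "Joe", "Bob"),
                 stringsAsFactors = TRUE)  # force factors, as in the question
## The poster's eventual solution: transpose, then flatten
as.vector(t(df[df$a == "Abraham", ]))      # "Abraham" "Sarah" "Billy"
## The as.matrix() route suggested above: same labels, names retained
mx <- as.matrix(df)
mx[mx[, "a"] == "Abraham", , drop = TRUE]
```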
Re: [R] Salient feature selection
Andy: See e.g. the pls package. However, be forewarned: this is a vague problem (what kind of predictors/responses do you want? -- linear combinations? nonlinear combinations? ...). The problem is also NP-Hard I believe, so solutions are very algorithm (and even starting value)-dependent. For these reasons, statistical inference is difficult, at best, and probably not even meaningful in your context, as I doubt that you have a random sample of anything. A personal recommendation (with which many disagree, I know): seek extreme parsimony in both predictors and responses for results to be replicable/scientifically meaningful. Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andy Weller Sent: Monday, July 02, 2007 8:17 AM To: R-help@stat.math.ethz.ch Subject: [R] Salient feature selection I am relatively new to R. I am hoping that someone will be able to point me in the right direction and/or suggest a technique/package/reference that will help me with the following. I have: a) Some explanatory variables (integers, real) - these are real world physical descriptions, i.e. counts of features, etc b) Some response variables (integers, real) - these are image analysis measurements (gray-value distributions, textural descriptors, etc) of the same things represented in a and I want to find out which between the two correlate best - i.e. the salient features from BOTH sets (i.e. not for classification purposes). For example, if a has 10 explanatory variables and b has 10 response variables, I want to test the complete set of explanatory variables with each individual response (or vice versa). So, explanatory 1-10 with response 1, explanatory 1-10 with response 2, explanatory 1-10 with response 3, etc... This should ultimately tell me which real world physical features are related best with the image analysis measurements (with the confidence level between them). I hope this makes sense? 
I have used SPSS AnswerTree's Exhaustive CHAID before to select a subset of input features for a complete set of output features to aid the creation of artificial neural networks. I want to do a similar thing, but it is not important for ALL explanatory and response variables are used/selected. I hope that I have been clear in my intentions and I look forward to your replies, Andy __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
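As a first, crude pass at "which responses relate best to the full explanatory set", one can regress each response on all explanatory variables and compare the fits (illustrative simulated data; this is not the pls approach mentioned above):

```r
set.seed(7)
n <- 100
expl <- as.data.frame(matrix(rnorm(n * 10), n, 10))    # 10 explanatory vars
resp <- data.frame(r1 = expl$V1 + rnorm(n, sd = 0.2),  # strongly related
                   r2 = rnorm(n))                      # unrelated
## R^2 of each response regressed on all explanatory variables at once
r2 <- sapply(resp, function(y)
  summary(lm(y ~ ., data = cbind(y = y, expl)))$r.squared)
round(r2, 2)  # r1 high, r2 near the chance level for 10 noise predictors
```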
Re: [R] applying max elementwise to two vectors
Please... use and **read** the docs: ?max -- the same help page (Extremes) also documents pmax. Bert Gunter -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Afshartous, David Sent: Thursday, June 28, 2007 1:20 PM To: r-help@stat.math.ethz.ch Subject: [R] applying max elementwise to two vectors All, Is there a one-liner way to obtain the max per observation for two vectors? I looked at apply and lapply but it seems that groundwork would have to be done before applying either of those. The code below does it but seems like overkill. Thanks! Dave x = rnorm(10) y = rnorm(10) ind = which(x < y) z = x z[ind] <- y[ind] ## z now contains the maxes
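The one-liner in question:

```r
x <- c(1, 5, 2)
y <- c(3, 4, 6)
pmax(x, y)  # 3 5 6 -- "parallel" (elementwise) maximum
max(x, y)   # 6     -- a single overall maximum, hence the original detour
```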
Re: [R] lme correlation structures
Please read ?lme carefully -- the info you seek is there. In particular, the weights argument for changing variance weighting by covariates and the correlation argument for specifying correlation structures. Pinheiro and Bates's Mixed-Effects Models in S and S-PLUS is the canonical reference (which you should get if you want to use R as you said) that expounds the ideas at greater length. Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gareth Hughes Sent: Wednesday, June 27, 2007 7:50 AM To: r-help@stat.math.ethz.ch Subject: [R] lme correlation structures Hi all, I've been using SAS proc mixed to fit linear mixed models and would like to be able to fit the same models in R. Two things in particular: 1) I have longitudinal data and wish to allow for different repeated measures covariance parameter estimates for different groups (men and women), each covariance matrix having the same structure. In proc mixed this would be done by specifying group= in the REPEATED statement. Is this simple to do in R? (I've tried form=~time|indv/sex for example but this doesn't seem to do the job). 2) I've read that other correlation structures can be specified. Does anyone have any examples of how toeplitz or (first-order) ante-dependence structures can be specified? Many thanks, Gareth __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
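To make those pointers concrete, here is a hedged sketch on made-up data (the names `indv`, `sex`, `time`, `y` are hypothetical) of how the weights and correlation arguments look in practice. Note that varIdent only lets the residual *variance* differ by sex, which is part of what SAS's GROUP= on the REPEATED statement does; fully separate correlation parameters per group are less direct in nlme:

```r
library(nlme)  # ships with R
set.seed(1)
# Hypothetical longitudinal data: 20 individuals, 4 time points,
# sex constant within individual
dat <- data.frame(indv = factor(rep(1:20, each = 4)),
                  sex  = rep(c("M", "F"), each = 40),
                  time = rep(1:4, times = 20))
dat$y <- 2 + 0.5 * dat$time + rep(rnorm(20), each = 4) + rnorm(80)

fit <- lme(y ~ time, random = ~ 1 | indv, data = dat,
           weights     = varIdent(form = ~ 1 | sex),    # separate variance per sex
           correlation = corAR1(form = ~ time | indv))  # AR(1) within individual
summary(fit)
```

Other corStruct classes (corSymm, corCompSymm, corARMA, ...) are listed under ?corClasses; Toeplitz and ante-dependence structures have no ready-made class and would need a custom corStruct.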
Re: [R] moving-window (neighborhood) analysis
See the Spatial section under CRAN's Task Views Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Carlos Guano Grohmann Sent: Wednesday, June 27, 2007 8:27 AM To: r-help@stat.math.ethz.ch Subject: [R] moving-window (neighborhood) analysis Hello all I was wondering what would be the best way to do a moving-window analysis of a matrix? By moving-window I mean that kind of analysis common in GIS, where each pixel (matrix element) of the resulting map is a function of its neighbors, and the neighborhood is a square matrix. I was hoping there was some function in R that could do that, where I could define the size of the neighborhood, and then apply some function to the values, some function I don't have in GIS packages (like circular statistics). thanks all. Carlos -- +---+ Carlos Henrique Grohmann - Guano Visiting Researcher at Kingston University London - UK Geologist M.Sc - Doctorate Student at IGc-USP - Brazil Linux User #89721 - carlos dot grohmann at gmail dot com +---+ _ Good morning, doctors. I have taken the liberty of removing Windows 95 from my hard drive. --The winning entry in a "What were HAL's first words" contest judged by 2001: A SPACE ODYSSEY creator Arthur C. Clarke __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
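Absent a GIS package, a bare-bones version of such a filter can also be written directly in base R. This is an illustrative sketch (the `moving_window` name is made up), with edges handled by shrinking the window:

```r
# Apply `fun` over the (2k+1) x (2k+1) neighborhood of every matrix element
moving_window <- function(m, fun = mean, k = 1) {
  nr <- nrow(m); nc <- ncol(m)
  out <- matrix(NA_real_, nr, nc)
  for (i in seq_len(nr)) for (j in seq_len(nc)) {
    rows <- max(1, i - k):min(nr, i + k)   # clip window at the edges
    cols <- max(1, j - k):min(nc, j + k)
    out[i, j] <- fun(m[rows, cols])
  }
  out
}
m <- matrix(1:9, 3, 3)
moving_window(m)[2, 2]  # mean of all nine values: 5
```

Any summary function can be plugged in for fun, including a circular statistic. For large rasters this double loop will be slow; the focal-style functions in the Spatial task view packages scale far better.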
Re: [R] exaustive subgrouping or combination
Do you realize that for n items, there are 2^(n-1) such groups -- since you essentially want all possible subsets divided by 2: all possible subsets and their complements repeats each split twice, backwards and forwards. So this will quickly become ummm... rather large. If you really want to do this, one lazy but inefficient way I can think of is to use expand.grid() to generate your subsets. Here's a toy example that shows you how with n = 4. ## generate a list with four components each of which ## is c(TRUE,FALSE) -- note that a data.frame is a list z <- data.frame(matrix(rep(c(TRUE,FALSE),4),nrow=2)) ## Now use expand.grid to get all 2^4 possible 4-vectors as rows ix <- do.call(expand.grid,z) ## This is essentially what you want. apply(ix[1:8,],1,function(x)(1:4)[x]) ## gives you the list of first splits, while apply(ix[16:9,],... gives the complements (note reversal of order). Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Waverley Sent: Wednesday, June 27, 2007 1:57 PM To: r-help@stat.math.ethz.ch Subject: [R] exaustive subgrouping or combination Dear Colleagues, I am looking for a package or previously implemented R code to subgroup and exhaustively divide a vector (sequence) into 2 groups. For example: 1, 2, 3, 4 I want to have a group of 1, (2,3,4) (1,2), (3,4) (1,3), (2,4) (1,4), (2,3) (1,2,3), 4 (2,3), (1,4) ... Can someone help me as to how to implement this? I run into problems when the sequence becomes large. Thanks much in advance. -- Waverley @ Palo Alto [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] open .r files with double-click
However, do note (on Windows) that you can use an external text/programming editor (see CRAN's listings) and can register .r / .R files to open automatically in the chosen editor when clicked on. At least some of these editors (e.g. Tinn-R) can be configured to automatically and simultaneously open the RGUI, too, I believe -- but someone may correct me on this. Bert Gunter Nonclinical Statistics 7-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Duncan Murdoch Sent: Saturday, June 09, 2007 4:29 AM To: [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Subject: Re: [R] open .r files with double-click On 08/06/2007 2:52 PM, [EMAIL PROTECTED] wrote: Hi Folks, On Windows XP, R 2.5.0. After reading the Installation for Windows and Windows FAQs, I cannot resolve this. I set file types so that Rgui.exe will open .r files. When I try to open a .r file by double-clicking, R begins to launch, but I get an error message saying Argument 'C:\Documents and Settings\Zoology\My Documents\trial.r' _ignored_ I click OK, and then R GUI opens, but not the script file. Is there a way to change this? Not currently. See the appendix "Invoking R" of the "An Introduction to R" manual for the current command line parameters, which don't include "open a script". This would be a reasonable addition, and I'll add it at some point, sooner if someone else comes up with a convincing argument for the right command line parameter to do this. It would be better if clicking on a second script opened a new window in the same session, but that takes more work; not sure I'll get to this. Duncan Murdoch __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] rlm results on trellis plot
I don't think the code below does what's requested, as it assumes a single overall fit for all panels, and I think the requester wanted separate fits by panel. This can be easily done, of course, by a minor modification: xyplot( y ~ x | z, panel = function(x,y,...){ panel.xyplot(x,y,...) panel.abline(lm(y~x),col="blue",lwd=2) panel.abline(rlm(y~x),col="red",lwd=2) }) Note that the coefficients do not need to be explicitly extracted by coef(), as panel.abline will do this automatically. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 Alan S Barnett wrote: How do I add to a trellis plot the best fit line from a robust fit? I can use panel.lm to add a least squares fit, but there is no panel.rlm function. How about using panel.abline() instead of panel.lmline()? fit1 <- coef(lm(stack.loss ~ Air.Flow, data = stackloss)) fit2 <- coef(rlm(stack.loss ~ Air.Flow, data = stackloss)) xyplot(stack.loss ~ Air.Flow, data=stackloss, panel = function(x, y, ...){ panel.xyplot(x, y, ...) panel.abline(fit1, type="l", col="blue") panel.abline(fit2, type="l", col="red") }, aspect=1) -- Chuck Cleland, Ph.D. NDRI, Inc. 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R is not a validated software package..
Frank et al.: I believe this is a bit too facile. 21 CFR Part 11 does necessitate a software validation **process** -- but this process does not require any particular software. Rather, it requires that those using whatever software demonstrate to the FDA's satisfaction that the software does what it's supposed to do appropriately. This includes a lot more than assuring, say, the numerical accuracy of computations; I think it also requires demonstration that the data are secure, that they are properly transferred from one source to another, etc. I assume that the statistical validation of R would be relatively simple, as R already has an extensive test suite, and it would simply be a matter of providing that test suite info. A bit more might be required, but I don't think it's such a big deal. I think Wensui Liu's characterization of clinical statisticians as having a mentality related to job security is a canard. Although I work in nonclinical, my observation is that clinical statistics is complex and difficult, not only because of many challenging statistical issues, but also because of the labyrinthine complexities of the regulated and extremely costly environment in which they work. It is certainly a job that I could not do. That said, probably the greatest obstacle to change from SAS is neither obstinacy nor ignorance, but rather inertia: pharmaceutical companies have over the decades made a huge investment in SAS infrastructure to support the collection, organization, analysis, and submission of data for clinical trials. To convert this to anything else would be a herculean task involving huge expense, risk, and resources. R, S-Plus (and much else -- e.g. numerous unvalidated data mining software packages) are routinely used by clinical statisticians to better understand their data and for exploratory analyses that are used to supplement official analyses (e.g. for trying to justify collection of tissue samples or a pivotal study in a patient subpopulation). 
But it is difficult for me to see how one could make a business case to change clinical trial analysis software infrastructure from SAS to S-Plus, SPSS, or anything else. **DISCLAIMER** My opinions only. They do not in any way represent the view of my company or its employees. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr Sent: Friday, June 08, 2007 7:45 AM To: Giovanni Parrinello Cc: r-help@stat.math.ethz.ch Subject: Re: [R] R is not a validated software package.. Giovanni Parrinello wrote: Dear All, discussing with a statistician of a pharmaceutical company I received this answer about the statistical package that I have planned to use: "As R is not a validated software package, we would like to ask if it would rather be possible for you to use SAS, SPSS or another approved statistical software system." Could someone suggest me a 'polite' answer? TIA Giovanni Search the archives and you'll find a LOT of responses. Briefly, in my view there are no requirements, just some pharma companies that think there are. FDA is required to accept all submissions, and they get some where only Excel was used, or Minitab, and lots more. There is a session on this at the upcoming R International Users Meeting in Iowa in August. The session will include discussions of federal regulation compliance for R, for those users who feel that such compliance is actually needed. Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] how to find how many modes in 2 dimensions case
Note that the number of modes (local maxima??) is a function of the bandwidth, so I'm not sure your question is even meaningful. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Wang Sent: Friday, June 08, 2007 11:54 AM To: R-help@stat.math.ethz.ch Subject: [R] how to find how many modes in 2 dimensions case Hi, Does anyone know how to count the number of modes in 2 dimensions using the kde2d function? Thanks Pat __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
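To see the bandwidth dependence concretely, here is a sketch (the `count_modes` helper is made up) that counts strict local maxima of the kde2d grid at a small and a large bandwidth h:

```r
library(MASS)  # for kde2d

# Count interior grid cells that strictly exceed all 8 neighbours
count_modes <- function(z) {
  n <- 0
  for (i in 2:(nrow(z) - 1)) for (j in 2:(ncol(z) - 1)) {
    nb <- z[(i - 1):(i + 1), (j - 1):(j + 1)]
    if (z[i, j] == max(nb) && sum(nb == max(nb)) == 1) n <- n + 1
  }
  n
}

set.seed(1)
x <- c(rnorm(100), rnorm(100, mean = 4))  # two true clusters
y <- c(rnorm(100), rnorm(100, mean = 4))
narrow <- kde2d(x, y, h = 0.5)  # small bandwidth: bumpy surface, many "modes"
wide   <- kde2d(x, y, h = 8)    # large bandwidth: smooths toward a single mode
c(count_modes(narrow$z), count_modes(wide$z))
```

The counts differ purely because of h, which is Bert's point: "how many modes" is only answerable relative to a chosen bandwidth.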
Re: [R] how to input data from the keyboard
Please do your homework: help.search("input") Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Miguel Caro Sent: Thursday, June 07, 2007 11:01 AM To: r-help@stat.math.ethz.ch Subject: [R] how to input data from the keyboard Hello everybody, I wish to input data from the keyboard. In C++ it would seem like this: printf("Input parameter Alpha= "); scanf("%d", &alpha); how would it be in R? Thanks for your help. Bye Miguel. -- View this message in context: http://www.nabble.com/how-to-input-data-from-the-keyboard-tf3885387.html#a11013164 Sent from the R help mailing list archive at Nabble.com. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
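help.search() leads to readline() and scan(); a minimal sketch of the R equivalent of the C snippet (readline() only prompts in interactive sessions, so a hard-coded batch fallback is shown for illustration):

```r
if (interactive()) {
  # prompt and read one line from the keyboard, then convert to numeric
  alpha <- as.numeric(readline("Input parameter Alpha= "))
} else {
  alpha <- as.numeric("42")  # the same string-to-number step, sans keyboard
}
alpha
```

scan(n = 1) is an alternative that reads and converts in one step; in a non-interactive script, readLines(file("stdin"), n = 1) reads from standard input.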
Re: [R] Comparing multiple distributions
While Ravi's suggestion of the compositions package is certainly appropriate, I suspect that the complex and extensive statistical homework you would need to do to use it might be overwhelming (the geometry of compositions is a simplex, and this makes things hard). As a simple and perhaps useful alternative, use pairs() or splom() to plot your 5-D data, distinguishing the different treatments via color and/or symbol. In addition, it might be useful to do the same sort of plot on the first two principal components (?prcomp) of the first 4 dimensions of your 5-component vectors (since the 5th is determined by the first 4). Because of the simplicial geometry, this PCA approach is not right, but it may nevertheless be revealing. The same plotting ideas are in the compositions package done properly (in the correct geometry), so if you are motivated to do so, you can do these things there. Even if you don't dig into the details, using the compositions package version of the plots may be relatively easy to do, interpretable, and revealing -- more so than my simple but wrong suggestions. You can decide. I would not trust inference using ad hoc approaches in the untransformed data. That's what the package is for. But plotting the data should always be at least the first thing you do anyway. I often find it to be sufficient, too. Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of jiho Sent: Thursday, May 31, 2007 8:37 AM To: R-help Subject: Re: [R] Comparing multiple distributions Nobody answered my first request. I am sorry if I did not explain my problem clearly. English is not my native language and statistical English is even more difficult. I'll try to summarize my issue in more appropriate statistical terms: Each of my observations is not a single number but a vector of 5 proportions (which add up to 1 for each observation). 
I want to compare the shape of those vectors between two treatments (i.e. how the quantities are distributed between the 5 values in treatment A with respect to treatment B). I was pointed to Hotelling T-squared. Does it seem appropriate? Are there other possibilities (I read many discussions about hotelling vs. manova but I could not see how any of those related to my particular case)? Thank you very much in advance for your insights. See below for my earlier, more detailed, e-mail. On 2007-May-21 , at 19:26 , jiho wrote: I am studying the vertical distribution of plankton and want to study its variations relatively to several factors (time of day, species, water column structure etc.). So my data is special in that, at each sampling site (each observation), I don't have *one* number, I have *several* numbers (abundance of organisms in each depth bin, I sample 5 depth bins) which describe a vertical distribution. Then let say I want to compare speciesA with speciesB, I would end up trying to compare a group of several distributions with another group of several distributions (where a distribution is a vector of 5 numbers: an abundance for each depth bin). Does anyone know how I could do this (with R obviously ;) )? Currently I kind of get around the problem and: - compute mean abundance per depth bin within each group and compare the two mean distributions with a ks.test but this obviously diminishes the power of the test (I only compare 5*2 observations) - restrict the information at each sampling site to the mean depth weighted by the abundance of the species of interest. This way I have one observation per station but I reduce the information to the mean depths while the actual repartition is important also. I know this is probably not directly R related but I have already searched around for solutions and solicited my local statistics expert... to no avail. So I hope that the stats' experts on this list will help me. Thank you very much in advance. 
JiHO --- http://jo.irisson.free.fr/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
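A sketch of the "simple but wrong" plotting approach Bert describes, on made-up compositional data (each row a 5-part proportion vector summing to 1, two treatments; all names are illustrative):

```r
set.seed(1)
n <- 30                                     # observations per treatment
raw  <- matrix(rexp(5 * 2 * n), ncol = 5)
comp <- raw / rowSums(raw)                  # each row now sums to 1
treat <- factor(rep(c("A", "B"), each = n))

# 5-D scatterplot matrix, treatments distinguished by color
pairs(comp, col = c("blue", "red")[treat])

# PCA on the first 4 parts (the 5th is determined by the rest),
# then plot the first two principal components
pc <- prcomp(comp[, 1:4], scale. = TRUE)
plot(pc$x[, 1:2], col = c("blue", "red")[treat])
```

As the reply stresses, this ignores the simplex geometry and is for looking, not for inference; the compositions package does the log-ratio transforms that make the geometry right.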
Re: [R] runif with weights
You did not explicitly say it, but your example indicates that you want to sample from integers only (else what would weights mean?). So... ?sample -- in particular note the prob argument and read the help docs carefully, e.g. sample(100,25,prob=c(0,rep.int(.4,9),rep.int(.6,90))) ## without replacement Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ken Knoblauch Sent: Wednesday, May 30, 2007 5:59 PM To: [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Subject: [R] runif with weights Not sure why you have set the probability of a 1 to 0 but maybe something like this might be what you want: round( ifelse( rbinom(25, 1, 0.4), runif(25, 2, 10), runif(25, 11, 100) ) ) [1] 2 6 34 90 79 71 83 8 47 36 21 32 17 71 3 16 9 65 94 6 30 5 7 10 13 I would like to generate 25 numbers from 1 to 100 but I would like to have some numbers that could be more probable to come out. I was thinking of the function runif: runif(25, 1, 100), but I don't know how to give more weight to some numbers. Example: each number from 2 to 10 has the probability of 40% to come out but the probability of each number from 11 to 100 to come out is 60%. -- Ken Knoblauch Inserm U846 Institut Cellule Souche et Cerveau Département Neurosciences Intégratives 18 avenue du Doyen Lépine 69500 Bron France tel: +33 (0)4 72 91 34 77 fax: +33 (0)4 72 91 34 61 portable: +33 (0)6 84 10 64 10 http://www.pizzerialesgemeaux.com/u846/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] normality tests [Broadcast]
False. Box proved, circa 1952, that standard inferences in the linear regression model are robust to nonnormality, at least for (nearly) balanced designs. The **crucial** assumption is independence, which I suspect partially motivated his time series work on arima modeling. More recently, work on hierarchical models (e.g. repeated measures/mixed effect models) has also dealt with lack of independence. Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of wssecn Sent: Friday, May 25, 2007 2:59 PM To: r-help Subject: Re: [R] normality tests [Broadcast] The normality of the residuals is important in the inference procedures for the classical linear regression model, and normality is very important in correlation analysis (second moment)... Washington S. Silva Thank you all for your replies, they have been most useful... well in my case I have chosen to do some parametric tests (more precisely correlation and linear regressions among some variables)... so it would be nice if I had an extra bit of support on my decisions... If I understood well from all your replies... I shouldn't pay as much attention to the normality tests, so it wouldn't matter which one/ones I use to report... but rather focus on issues such as the power of the test... Thanks again. On 25/05/07, Lucke, Joseph F [EMAIL PROTECTED] wrote: Most standard tests, such as t-tests and ANOVA, are fairly resistant to non-normality for significance testing. It's the sample means that have to be normal, not the data. The CLT kicks in fairly quickly. Testing for normality prior to choosing a test statistic is generally not a good idea. 
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Liaw, Andy Sent: Friday, May 25, 2007 12:04 PM To: [EMAIL PROTECTED]; Frank E Harrell Jr Cc: r-help Subject: Re: [R] normality tests [Broadcast] From: [EMAIL PROTECTED] On 25/05/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote: Hi all, apologies for seeking advice on a general stats question. I ve run normality tests using 8 different methods: - Lilliefors - Shapiro-Wilk - Robust Jarque Bera - Jarque Bera - Anderson-Darling - Pearson chi-square - Cramer-von Mises - Shapiro-Francia All show that the null hypothesis that the data come from a normal distro cannot be rejected. Great. However, I don't think it looks nice to report the values of 8 different tests on a report. One note is that my sample size is really tiny (less than 20 independent cases). Without wanting to start a flame war, are there any advices of which one/ones would be more appropriate and should be reported (along with a Q-Q plot). Thank you. Regards, Wow - I have so many concerns with that approach that it's hard to know where to begin. But first of all, why care about normality? Why not use distribution-free methods? You should examine the power of the tests for n=20. You'll probably find it's not good enough to reach a reliable conclusion. And wouldn't it be even worse if I used non-parametric tests? I believe what Frank meant was that it's probably better to use a distribution-free procedure to do the real test of interest (if there is one) instead of testing for normality, and then use a test that assumes normality. I guess the question is, what exactly do you want to do with the outcome of the normality tests? If those are going to be used as basis for deciding which test(s) to do next, then I concur with Frank's reservation. 
Generally speaking, I do not find goodness-of-fit for distributions very useful, mostly for the reason that failure to reject the null is no evidence in favor of the null. It's difficult for me to imagine why "there's insufficient evidence to show that the data did not come from a normal distribution" would be interesting. Andy Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University -- yianni __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] trouble understanding why ...==NaN isn't true
1. "NaN" is a character string, **not** NaN; hence is.nan("NaN") yields FALSE. 2. Please read the docs! ?NaN explicitly says: "Do not test equality to NaN, or even use identical, since systems typically have many different NaN values." Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Yee Sent: Tuesday, May 29, 2007 3:33 PM To: r-help@stat.math.ethz.ch Subject: [R] trouble understanding why ...==NaN isn't true I have the following data: dataset[2,"Sample.227"] [1] NaN 1558 Levels: -0.000 -0.001 -0.002 -0.003 -0.004 -0.005 -0.006 -0.007 -0.008 -0.009 ... 2.000 However, I'm not sure why this expression is coming back as FALSE: dataset[2,"Sample.227"]=="NaN" [1] FALSE Similarly: dataset[2,"Sample.227"]==NaN [1] NA It seems that since NaN is represented as a character, this expression =="NaN" should be TRUE, but it's returning as FALSE. Thanks, Andrew [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
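A few lines that illustrate the two points (the string "NaN" versus the numeric NaN, and why == comparisons mislead):

```r
NaN == NaN                    # NA -- equality tests on NaN never return TRUE
is.nan(NaN)                   # TRUE -- this is the supported test

f <- factor(c("NaN", "-0.001"))           # a factor level "NaN" is just text
as.character(f) == "NaN"                  # TRUE FALSE -- compare as strings
is.nan(as.numeric(as.character(f)))       # TRUE FALSE -- or convert, then test
```

For a factor holding numeric-looking labels, as.numeric(as.character(...)) is the standard route back to numbers; "NaN" parses to the numeric NaN along the way.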
Re: [R] Is it possible to print a data.frame without the row names?
?write.table Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bos, Roger Sent: Thursday, May 24, 2007 7:17 AM To: r-help@stat.math.ethz.ch Subject: [R] Is it possible to print a data.frame without the row names? Is it possible to print a data.frame without the row names? I checked ?data.frame, ?print, ?format and didn't see anything that helped. In the example below, I would just like to show the two columns of data and not the row.names 1:10. a <- data.frame(1:10, 21:30) a X1.10 X21.30 1 1 21 2 2 22 3 3 23 4 4 24 5 5 25 6 6 26 7 7 27 8 8 28 9 9 29 10 10 30 row.names(a) <- NULL a X1.10 X21.30 1 1 21 2 2 22 3 3 23 4 4 24 5 5 25 6 6 26 7 7 27 8 8 28 9 9 29 10 10 30 Thanks, Roger J. Bos, CFA [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
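A sketch of the write.table() route; print() also accepts row.names = FALSE in recent R versions:

```r
a <- data.frame(x = 1:10, y = 21:30)
# write.table to stdout with row.names = FALSE drops the 1:10 row labels
write.table(a, row.names = FALSE, quote = FALSE)
# same idea via print()
print(a, row.names = FALSE)
```

write.table is also the right tool when the row-name-free output should go to a file: pass a filename as the second argument.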
Re: [R] R-help with apply and ccf
I understand you to want correlations of corresponding rows (**not** ccf, which returns a vector ccf for each pair of rows). If that is so: 1) ... in theory, diag(cor(t(A), t(B))) would work without apply, except 196,000 rows is probably too large, and it is probably too inefficient to compute and then throw away all the off-diagonals anyway. 2) ## Use a 3d array. ar <- array(c(A,B), dim=c(dim(A),2)) ## this can also be done by abind() in the abind package apply(ar, 1, function(x) cor(x[,1], x[,2])) ## Value is a vector 3) ## probably simplest and best sapply(seq_len(nrow(A)), function(i) cor(A[i,], B[i,])) ## Note: value is a vector, not an array Bert Gunter Genentech Nonclinical Statistics -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Andric Sent: Tuesday, May 22, 2007 8:35 AM To: r-help@stat.math.ethz.ch Subject: [R] R-help with apply and ccf Dear R gurus, I would like to use the ccf function on two matrices that are each 196000 x 12. Ideally, I want to be able to go row by row for the two matrices using apply for the ccf function and get one 196000 x 1 array output. The apply function though wants only one array, no? Basically, is there a way to use apply when there are two arrays in order to do something like correlation on a row by row basis? Thanks for your help Michael [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simple programming question
?cut

This would recode to a factor with numeric labels for its levels; as.numeric(as.character(...)) would then convert the labels to numeric values that you can manipulate. This presumes that the variable you are recoding is numeric and that you want to recode by binning the values into ordered bins.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Lauri Nikkinen
Sent: Friday, May 18, 2007 8:02 AM
To: Gabor Grothendieck
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Simple programming question

Thank you all for your answers. Actually Gabor's first post was right in the sense that I wanted to have "low" for all cases which are lower than the second highest. But how about if I want to convert/recode those "high", "mid" and "low" to numeric to make some calculations, e.g. 3, 1, 0 respectively? How do I have to modify your solutions? I would also like to apply this solution to many kinds of recoding situations. -Lauri

2007/5/18, Gabor Grothendieck: There was a problem in the first line in the case that the highest number is not unique within a category. In this example it's not apparent since that never occurs. At any rate, it should be:

f <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))
factor(ave(dfr$var3, dfr$categ, FUN = f), lab = c("low", "mid", "high"))

Also note that the factor labels were arranged so that "low", "mid" and "high" correspond to levels 1, 2 and 3 respectively.

On 5/18/07, Gabor Grothendieck wrote: Try this. f assigns 1, 2 and 3 to the highest, second highest and third highest within a category. ave applies f to each category. Finally we convert it to a factor.

f <- function(x) 4 - pmin(3, match(x, sort(x, decreasing = TRUE)))
factor(ave(dfr$var3, dfr$categ, FUN = f), lab = c("low", "mid", "high"))

On 5/18/07, Lauri Nikkinen wrote: Hi R-users, I have a simple question for R heavy users.
If I have a data frame like this

dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
dfr <- dfr[order(dfr$categ), ]

and I want to score values or points in the variable named var3 following this kind of logic:

1. the highest value of var3 within category (variable named categ) -> "high"
2. the second highest value -> "mid"
3. lowest value -> "low"

This would be the output of this reasoning:

dfr$score <- factor(c("high","mid","low","low","high","mid","mid","low",
                      "high","mid","low","low","high","mid","low","low"))
dfr

The question is how I do this programmatically in R (i.e. if I have 2000 rows in my dfr)? I appreciate your help! Cheers, Lauri

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
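Combining the two answers -- Gabor's within-category ranking plus the 3/1/0 numeric scoring Lauri asked about in the follow-up -- one hedged sketch is to index a named lookup vector by the factor's labels:

```r
dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))

## rank within category: highest -> 3, second -> 2, everything else -> 1
f  <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))
sc <- factor(ave(dfr$var3, dfr$categ, FUN = f),
             labels = c("low", "mid", "high"))

## map labels to the numeric scores from the follow-up question (3, 1, 0)
dfr$points <- c(low = 0, mid = 1, high = 3)[as.character(sc)]
```

The named-vector lookup generalizes to "many kinds of recoding situations": change the names/values in the lookup and the rest stays the same.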
Re: [R] using lm() with variable formula
... and note that if a matrix of responses is on the left of ~, separate regressions will be simultaneously fit to each of the columns of the matrix. Note that this **is** in TFM -- see ?lm.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck
Sent: Thursday, May 17, 2007 8:22 AM
To: Chris Elsaesser
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] using lm() with variable formula

Try this:

lm(Sepal.Length ~ ., iris[1:3])
# or
cn <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
lm(Sepal.Length ~ ., iris[cn])

On 5/17/07, Chris Elsaesser wrote: New to R; please excuse me if this is a dumb question. I tried to RTFM; didn't help. I want to do a series of regressions over the columns in a data.frame, systematically varying the response variable and the terms, and not necessarily including all the non-response columns. In my case, the columns are time series. I don't know if that makes a difference; it does mean I have to call lag() to offset non-response terms. I cannot assume a specific number of columns in the data.frame; might be 3, might be 20. My central problem is that the formula given to lm() is different each time. For example, say a data.frame had columns with the following headings: height, weight, BP (blood pressure), and Cals (calorie intake per time frame). In that case, I'd need something like the following:

lm(height ~ weight + BP + Cals)
lm(height ~ weight + BP)
lm(height ~ weight + Cals)
lm(height ~ BP + Cals)
lm(weight ~ height + BP)
lm(weight ~ height + Cals)
etc.

In general, I'll have to read the header to get the argument labels. Do I have to write several functions, each taking a different number of arguments? I'd like to construct a string or list representing the variables in the formula and apply lm() to it. [I'm mainly a Lisp programmer, where that part would be very simple. Anyone have a Lisp API for R?
:-}] Thanks, chris

Chris Elsaesser, PhD
Principal Scientist, Machine Learning
SPADAC Inc.
7921 Jones Branch Dr. Suite 600
McLean, VA 22102
703.371.7301 (m) 703.637.9421 (o)

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
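For the poster's actual need -- building a different formula from column names on each iteration -- reformulate() (or as.formula(paste(...))) is the usual idiom. A sketch, using iris as a stand-in for the poster's data frame:

```r
vars <- names(iris)[1:4]   # stand-in for column names read from the header

## fit one lm per variable: that variable as response, all the others as terms
fits <- lapply(vars, function(resp) {
  fml <- reformulate(setdiff(vars, resp), response = resp)
  lm(fml, data = iris)
})
names(fits) <- vars
```

Because reformulate() takes character vectors, any subset of the non-response columns can be passed as the term labels, so no separate functions per column count are needed.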
Re: [R] repeated measures regression
You need to gain some background. Mixed-Effects Models in S and S-PLUS by Pinheiro and Bates is the canonical reference for how to do this with R. Chapter 10 of Venables and Ripley's MASS (4th ed.) contains a more compact but very informative overview that may suffice. Other useful references can also be found on CRAN.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Christie
Sent: Thursday, May 17, 2007 10:06 AM
To: R-help@stat.math.ethz.ch
Subject: [R] repeated measures regression

How does one go about doing a repeated-measures regression? The documentation I have on it (Lorch & Myers 1990) says to use linear / (subj x linear) to get your F. However, if I put subject into glm or lm I can't get back a straight error term because it assumes (rightly) that subject is a nominal predictor of some sort. In looking at lme it seems like it just does the right thing here if I enter the random effect the same as when looking for ANOVA-like results out of it. But part of the reason I'm asking is that I wanted to compare the two methods. I suppose I could get it out of aov, but isn't that built on lm? I guess what I'm asking is how to calculate the error terms easily with lm.

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
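As a minimal sketch of the lme() route the poster alludes to (nlme package; the Orthodont data shipped with nlme stands in for the poster's repeated-measures data):

```r
library(nlme)

## repeated measures: distance measured at several ages per Subject;
## random intercept and slope for age within each Subject
fit <- lme(distance ~ age, random = ~ age | Subject, data = Orthodont)
summary(fit)

## compare with the aov() error-stratum formulation mentioned in the thread
fit.aov <- aov(distance ~ age + Error(Subject/age), data = Orthodont)
summary(fit.aov)
```

Pinheiro and Bates discuss exactly when and how these two formulations agree, which is the comparison the poster wanted to make.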
Re: [R] bug or feature?
... but it **is** explicitly documented in ?subset: "For data frames, the subset argument works on the rows. Note that subset will be evaluated in the data frame, so columns can be referred to (by name) as variables in the expression (see the examples)."

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of ivo welch
Sent: Thursday, May 17, 2007 11:53 AM
To: jim holtman
Cc: r-help
Subject: Re: [R] bug or feature?

ahh... it is the silent substitution of the data frame in the subset statement. I should have known this. (PS: this may not be desirable behavior; maybe it would be useful to issue a warning if the same name is defined in an upper data frame. just an opinion...) mea misunderstanding. /iaw

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
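A tiny illustration of the documented behavior -- a column of the data frame silently masks a same-named object in the enclosing environment inside subset():

```r
x <- 100                           # object in the global environment
d <- data.frame(x = 1:5, y = 6:10)

## inside subset(), 'x' refers to d$x, not to the global x == 100
subset(d, x > 3)                   # rows where d$x is 4 or 5
```

If the column were instead named, say, z, then `subset(d, x > 3)` would fall back to the global x and return all or none of the rows, which is exactly the surprise the poster ran into.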
Re: [R] Testing for existence inside a function
I think parent.frame() is what is wanted, not the parent.env(environment()) in your suggested solution. Consider this (which does **not**, however, handle the arbitrary-expressions-as-argument issue):

foo1 <- function(z){
  cat(exists(deparse(substitute(z)), parent.frame()),
      exists(deparse(substitute(z)), parent.env(environment())),
      exists(deparse(substitute(z))), "\n")
  invisible()
}
foo <- function(x){
  y <- x
  foo1(y)
}
x <- 1
## Then ...
foo(x)
TRUE FALSE FALSE

Note that parent.env() is the **enclosing environment**, i.e. the environment in which foo1 is defined (lexical scoping); while parent.frame() is the frame of the caller of foo1, which is what is wanted if foo1 is to work when called within a function. Note that parent.frame() would also work when foo1 is called at the command line. Further corrections/clarifications welcome, of course.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck
Sent: Tuesday, May 15, 2007 10:06 AM
To: Liaw, Andy
Cc: r-help@stat.math.ethz.ch; Talbot Katz
Subject: Re: [R] Testing for existence inside a function

Maybe this:

chk2 <- function(x) {
  chr <- deparse(substitute(x))
  e <- parse(text = chr)
  structure(exists(chr, parent.env(environment())),
            is.name = length(e) == 1 && is.name(e[[1]]))
}
chk2(1)     # structure(FALSE, is.name = FALSE)
ab <- 1
chk2(ab+1)  # structure(FALSE, is.name = FALSE)
chk2(ab)    # structure(TRUE, is.name = TRUE)
exists("x") # FALSE
chk2(x)     # structure(FALSE, is.name = TRUE)
chk2(x+1)   # structure(FALSE, is.name = FALSE)

On 5/15/07, Liaw, Andy wrote: Another thing to watch out for is that an argument to a function can be an expression (or even a literal constant), instead of just the name of an object. exists() wouldn't really do the right thing. I'm not sure how to properly do the exhaustive check.
Andy

From: Gabor Grothendieck
Try this modification:

chk <- function(x) exists(deparse(substitute(x)), parent.env(environment()))
ab <- 1
chk(ab)
[1] TRUE
exists("x")
[1] FALSE
chk(x)
[1] FALSE

On 5/15/07, Talbot Katz wrote: Hi. Thanks once more for the swift response. This solution works pretty well. The only small glitch is if I pass the function an argument with the same name as the function argument. That is, suppose objn is the argument name in my user-defined function, and that an object named objn does not exist. If I call the function f(objn), i.e., using the non-existent object objn as the argument value, then the function says that objn exists. Here is my example log:

chkex5 <- function(objn){
+   c(exob = exists(deparse(substitute(objn))))
+ }
exists("objn")
[1] FALSE
chkex5(objn)
exob
TRUE

But I suppose I can live with this. Thanks again!

-- TMK --
212-460-5430 home
917-656-5351 cell

From: Liaw, Andy
To: Talbot Katz, r-help@stat.math.ethz.ch
Subject: RE: [R] Testing for existence inside a function
Date: Tue, 15 May 2007 11:41:17 -0400

Just need a bit more work:

R> f <- function(x) exists(deparse(substitute(x)))
R> f(y)
[1] FALSE
R> y <- 1
R> f(y)
[1] TRUE
R> f(z)
[1] FALSE

Andy

From: Talbot Katz
Hi, Andy. Thank you for the quick response! Unfortunately, none of these is exactly what I'm looking for. I'm looking for the following: suppose object y exists and object z does not exist. If I pass y as the value of the argument to my function, I want to be able to verify, inside my function, the existence of y; similarly, if I pass z as the value of the argument, I want to be able to see, inside the function, that z doesn't exist. The missing function just checks whether the argument is missing; in my case, the argument is not missing, but the object may not exist. And the way you use the exists function inside the user-defined function doesn't test the argument to the user-defined function; it's just hard-coded for the object y.
So I'm sorry if I wasn't clear before, and I hope this is clear now. Perhaps what I'm attempting to do is unavailable because it's a bad programming paradigm. But even an explanation, if that's the case, would be appreciated.

-- TMK --
212-460-5430 home
917-656-5351 cell

From: Liaw, Andy
To: Talbot Katz, r-help@stat.math.ethz.ch
Subject: RE: [R] Testing for existence inside a function [Broadcast]
Date: Tue, 15 May 2007 11:03:12 -0400

Not sure which one you want, but the following should cover it:

R> f <- function(x) c(x = missing(x), y = exists("y"))
R> f(1)
x y
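Putting the thread's pieces together, a hedged sketch of a checker that uses parent.frame() (per Bert's note) and also reports whether the argument was a simple name (per Andy's caveat about expressions); the function name chk3 is mine, not from the thread:

```r
chk3 <- function(x) {
  chr    <- deparse(substitute(x))
  e      <- parse(text = chr)
  simple <- length(e) == 1 && is.name(e[[1]])
  c(exists  = simple && exists(chr, envir = parent.frame()),
    is.name = simple)
}

f <- function(a) { b <- a; chk3(b) }
f(1)                      # exists = TRUE:  b lives in the caller's frame
chk3(no.such.object)      # exists = FALSE, is.name = TRUE
chk3(1 + 2)               # exists = FALSE, is.name = FALSE (an expression)
```

Because lookup goes through parent.frame(), this works both at top level and when called from inside another function, which is where the parent.env() version failed.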
Re: [R] confidence intervals on multiple comparisons
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Tuesday, May 15, 2007 12:52 PM
To: Salvatore Enrico Indiogine
Cc: R-help@stat.math.ethz.ch; [EMAIL PROTECTED]
Subject: Re: [R] confidence intervals on multiple comparisons

Enrico, prop.test is for testing proportions two at a time. If you want to test for differences between 4 proportions simultaneously (rather than two at a time), try a logistic regression model (from which you can get confidence intervals for each of your groups). Cody Hamilton, PhD, Staff Biostatistician, Edwards Lifesciences

Yes, but beware: with the default contr.treatment coding for contrasts, you get estimates and confidence intervals for the first group and for the **differences** between the first group and the others. As you said, it's easy to get what you want from this, but you must pay attention to the details here.

Bert Gunter
Genentech Nonclinical Statistics

From: Salvatore Enrico Indiogine
To: R-help@stat.math.ethz.ch
Sent: 05/13/2007 10:51 AM
Subject: [R] confidence intervals on multiple comparisons

Greetings! I am using prop.test to compare 4 proportions to find out whether they are equal. According to the help function you cannot have confidence intervals if you compare more than 2 proportions. I need to find an effect size or confidence interval for these proportions. Any suggestions? Enrico

-- Enrico Indiogine, Mathematics Education, Texas A&M University, [EMAIL PROTECTED]

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
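A sketch of the logistic-regression route, addressing Bert's caveat by dropping the intercept so each coefficient is a group's own log-odds rather than a difference from the first group (counts here are made up for illustration):

```r
## successes and sample sizes for 4 groups (illustrative data)
grp  <- factor(c("A", "B", "C", "D"))
succ <- c(30, 45, 28, 50)
n    <- c(100, 100, 90, 120)

## "~ grp - 1" removes the intercept: one coefficient per group,
## each the log-odds of success in that group
fit <- glm(cbind(succ, n - succ) ~ grp - 1, family = binomial)

## Wald confidence intervals, back-transformed to the probability scale
plogis(confint.default(fit))
```

With the default contr.treatment coding (no `- 1`), the same model gives intervals for group A and for the B-A, C-A, D-A differences instead.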
Re: [R] Looking for a cleaner way to implement a setting certain indices of a matrix to 1 function
Suggestion: you might make it easier for folks to help if you explained in clear and simple terms what you are trying to do. Code is hard to deconstruct.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Leeds, Mark (IED)
Sent: Tuesday, May 08, 2007 2:22 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Looking for a cleaner way to implement a setting certain indices of a matrix to 1 function

I wrote an ugly algorithm to set certain elements of a matrix to 1 without looping, and below it works; you can see the output below the code.

K <- 6
lagnum <- 2
restrictmat <- matrix(0, nrow = K, ncol = K*3)
restrictmat[(col(restrictmat) - row(restrictmat) >= 0) &
            ((col(restrictmat) - row(restrictmat)) %% K == 0)] <- 1
restrictmat[, (lagnum*K + 1):ncol(restrictmat)] <- 0

restrictmat
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,]    1    0    0    0    0    0    1    0    0     0     0     0     0     0     0     0     0     0
[2,]    0    1    0    0    0    0    0    1    0     0     0     0     0     0     0     0     0     0
[3,]    0    0    1    0    0    0    0    0    1     0     0     0     0     0     0     0     0     0
[4,]    0    0    0    1    0    0    0    0    0     1     0     0     0     0     0     0     0     0
[5,]    0    0    0    0    1    0    0    0    0     0     1     0     0     0     0     0     0     0
[6,]    0    0    0    0    0    1    0    0    0     0     0     1     0     0     0     0     0     0

For lagnum equal to 1, it also works:

restrictmat
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,]    1    0    0    0    0    0    0    0    0     0     0     0     0     0     0     0     0     0
[2,]    0    1    0    0    0    0    0    0    0     0     0     0     0     0     0     0     0     0
[3,]    0    0    1    0    0    0    0    0    0     0     0     0     0     0     0     0     0     0
[4,]    0    0    0    1    0    0    0    0    0     0     0     0     0     0     0     0     0     0
[5,]    0    0    0    0    1    0    0    0    0     0     0     0     0     0     0     0     0     0
[6,]    0    0    0    0    0    1    0    0    0     0     0     0     0     0     0     0     0     0

But I am thinking that there has to be a better way, particularly because I'll get an error if I set lagnum to 3. Any improvements or total revampings are appreciated. The number of columns will always be a multiple of the number of rows, so K doesn't have to be 6; that was just to show what the commands do. thanks.
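For what it's worth, the 1s form lagnum copies of the K x K identity followed by zero blocks, so the matrix can also be built directly with kronecker(); this avoids the out-of-range column index when lagnum equals the number of blocks (a sketch; nblock = 3 as in the post, and the name nblock is mine):

```r
K <- 6; lagnum <- 2; nblock <- 3

## row vector of block indicators: 1 for the first lagnum blocks, 0 after
pattern <- rep(c(1, 0), c(lagnum, nblock - lagnum))

## each 1 in the pattern becomes a K x K identity block, each 0 a zero block
restrictmat <- kronecker(matrix(pattern, nrow = 1), diag(K))
dim(restrictmat)   # K rows, nblock*K columns
```

With lagnum <- 3 the pattern is c(1, 1, 1) and the result is [I I I] with no error, unlike the `(lagnum*K + 1):ncol(...)` zeroing step in the original.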
__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Neural Nets (nnet) - evaluating success rate of predictions
Folks: If I understand correctly, the following may be pertinent. Note that the procedure

min.nnet = nnet[k] such that error rate of nnet[k] = min[i] {error rate(nnet(training data) from ith random start)}

does not guarantee a classifier with a lower error rate on **new** data than any single one of the random starts. That is because you are using the same training set to choose the model (= nnet parameters) as you are using to determine the error rate. I know it's tempting to think that choosing the best among many random starts always gets you a better classifier, but it need not. The error rate on the training set for any classifier -- be it a single one or one derived in some way from many -- is a biased estimate of the true error rate, so choosing a classifier on this basis does not assure better performance on future data. In particular, I would guess that choosing the best among many (hundreds/thousands of) random starts is probably almost guaranteed to produce a poor predictor (ergo the importance of parsimony/penalization). I would appreciate comments from anyone, pro or con, with knowledge and experience of these things, however, as I'm rather limited on both. The simple answer to the question of obtaining the error rate using validation data is: do whatever you like to choose/fit a classifier on the training set. **Once you are done,** the estimate of your error rate is the error rate you get on applying that classifier to the validation set. But you can do this only once! If you don't like that error rate and go back to finding a better predictor in some way, then your validation data have now been used to derive the classifier and thus have become part of the training data, so any further assessment of the error rate of a new classifier on them is also biased. You need yet more new validation data for that.
Of course, there are all sorts of cross-validation schemes one can use to avoid -- or at least mitigate -- these issues: most books on statistical classification/machine learning discuss this in detail.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of hadley wickham
Sent: Monday, May 07, 2007 5:26 AM
To: Wensui Liu
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Neural Nets (nnet) - evaluating success rate of predictions

Pick the one with the lowest error rate on your training data? Hadley

On 5/7/07, Wensui Liu wrote: well, how do you know which ones are the best out of several hundred? I will average all results out of several hundred.

On 5/7/07, hadley wickham wrote: On 5/6/07, nathaniel Grey wrote: Hello R-Users, I have been using nnet (by Ripley) to train a neural net on a test dataset, and I have obtained predictions for a validation dataset using:

PP <- predict(nnetobject, validationdata)

Using PP I can find the -2 log likelihood for the validation dataset. However, what I really want to know is how well my neural net is doing at classifying my binary output variable. I am new to R and I can't figure out how to assess the success rates of predictions.

table(PP, binaryvariable) should get you started. Also, if you're using nnet with random starts, I strongly suggest taking the best out of several hundred (or maybe thousand) trials -- it makes a big difference! Hadley

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- WenSui Liu A lousy statistician who happens to know a little programming (http://spaces.msn.com/statcompute/blog)

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
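A sketch of the holdout procedure Bert describes -- fit on the training rows only, then compute the error rate once on the set-aside validation rows (uses the nnet package; the data are simulated purely for illustration):

```r
library(nnet)
set.seed(42)

## simulated binary classification problem (illustrative only)
n <- 400
x <- matrix(rnorm(n * 2), ncol = 2)
y <- as.integer(x[, 1] + x[, 2] + rnorm(n, sd = 0.5) > 0)

train <- sample(n, 300)            # one split; the holdout is used exactly once

fit  <- nnet(x[train, ], y[train], size = 3, trace = FALSE)
pred <- as.integer(predict(fit, x[-train, ]) > 0.5)

err <- mean(pred != y[-train])     # holdout estimate of the error rate
```

Per the discussion above, if this error rate is then used to pick among refitted models, the holdout has effectively become training data and a fresh validation set is needed for an honest estimate.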
Re: [R] Multiple scatterplots
Please note: in R you can specify (some of the) graphics parameters as vectors of the appropriate length. So your plot example below can also be done as, for example:

plot(rep.int(aa, 3), c(cc, bb, dd),
     col = rep(c("red", "blue", "green"), each = length(aa)))

However, this doesn't seem to fit the posted request, where maybe something like a trellis plot of the different distributions is what is wanted? -- but I may well misunderstand.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Kane
Sent: Wednesday, May 02, 2007 11:06 AM
To: Kostadin Cholakov; r-help@stat.math.ethz.ch
Subject: Re: [R] Multiple scatterplots

Your title and your posting do not say the same thing. Assuming you want all three distributions on one scatter plot, does this help?

aa <- 1:10
bb <- 11:2
cc <- bb^2
dd <- c(3, 4, 7, 9, 11, 32, 11, 14, 5, 9)
plot(aa, cc, col = "red")
points(aa, bb, col = "blue")
points(aa, dd, col = "green")

Also, in plotting it is a good idea to look at all the variations etc. that you can get with par(). Type ?par

--- Kostadin Cholakov wrote: Hi, I have to plot three Zipf distributions for three languages, where the x value represents the rank of a given word and the y value represents the relative frequency of this word in the corpus. Is there some way so that I can plot all three distributions on a single scatterplot, preferably with different colours :) I tried to find something in the R manual but there are no such examples :( Thank you!

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
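matplot() is another compact way to overlay the three series, since it cycles col and pch across the columns of its y matrix (same toy data as John Kane's example):

```r
aa <- 1:10
bb <- 11:2
cc <- bb^2
dd <- c(3, 4, 7, 9, 11, 32, 11, 14, 5, 9)

## one call: each column of cbind() is drawn in its own color
matplot(aa, cbind(cc, bb, dd), type = "p", pch = 1,
        col = c("red", "blue", "green"),
        xlab = "rank", ylab = "relative frequency")
```

For rank-frequency (Zipf) data, adding log = "xy" to the matplot() call gives the usual log-log view.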
Re: [R] thousand separator (was RE: weight)
Except this doesn't work for "1,123,456.789", Marc. I hesitate to suggest it, but gregexpr() will do it, as it captures the position of **every** match to ",". This could then be used to process the vector via some sort of loop/apply statement. But I think there **must** be a more elegant way using regular expressions alone, so I, too, await a clever reply. -- Bert

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz
Sent: Monday, April 30, 2007 10:02 AM
To: Liaw, Andy
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] thousand separator (was RE: weight)

One possibility would be to use something like the following post-import:

WTPP
[1] 1,106.8250 1,336.5138
str(WTPP)
 Factor w/ 2 levels "1,106.8250","1,336.5138": 1 2
as.numeric(gsub(",", "", WTPP))
[1] 1106.825 1336.514

Essentially strip the ',' characters from the factors and then coerce the resultant character vector to numeric. HTH, Marc Schwartz

On Mon, 2007-04-30 at 12:26 -0400, Liaw, Andy wrote: I've run into this occasionally. My current solution is simply to read it into Excel, re-format the offending column(s) by unchecking the thousand-separator box, and write it back out. Not exactly ideal, to say the least. If anyone can provide a better solution in R, I'm all ears... Andy

From: Natalie O'Toole
Hi, These are the variables in my file. I think the variable I'm having problems with is WTPP, which is of the Factor type. Does anyone know how to fix this, please? Thanks, Nat

'data.frame': 290 obs. of 5 variables:
 $ PROV  : num 48 48 48 48 48 48 48 48 48 48 ...
 $ REGION: num 4 4 4 4 4 4 4 4 4 4 ...
 $ GRADE : num 7 7 7 7 7 7 7 7 7 7 ...
 $ Y_Q10A: num 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
 $ WTPP  : Factor w/ 1884 levels "1,106.8250","1,336.5138",..: 1544 67 1568 40 221 1702 1702 1434 310 310 ...

__ --- Douglas Bates wrote: On 4/28/07, John Kane wrote: IIRC you have a yes/no smoking variable scored 1/2?
It is possibly being read in as a factor, not as an integer. Try class(df$smoking.variable) to see. Good point. In general I would recommend using str(df) to check on the class or storage type of all variables in a data frame if you are getting unexpected results when manipulating it. That function is carefully written to provide a maximum of information in a minimum of space. Yes, but I'm a relative newbie at R and didn't realise that str() would do that; I always thought it was some kind of string function. Thanks, it makes life much easier.

--- Natalie O'Toole wrote: Hi, I'm getting an error message:

Error in df[, 1:4] * df[, 5] : non-numeric argument to binary operator
In addition: Warning message:
Incompatible methods ("Ops.data.frame", "Ops.factor") for "*"

here is my code:

##reading in the file
happyguys <- read.table("c:/test4.dat", header = TRUE, row.names = 1)
##subset the file based on Select If
test <- subset(happyguys, PROV == 48 & GRADE == 7 & Y_Q10A < 9)
##sorting the file
mydata <- test
mydataSorted <- mydata[order(mydata$Y_Q10A), ]
print(mydataSorted)
##assigning a different name to file
happyguys <- mydataSorted
##trying to weight my data
data.frame <- happyguys
df <- data.frame
df1 <- df[, 1:4] * df[, 5]  ##getting error message here??

Error in df[, 1:4] * df[, 5] : non-numeric argument to binary operator
In addition: Warning message:
Incompatible methods ("Ops.data.frame", "Ops.factor") for "*"

Does anyone know what this error message means? I've been reviewing R code all day getting more familiar with it. Thanks, Nat

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] thousand separator (was RE: weight)
Nothing! My mistake! gsub -- not sub -- is what you want to get 'em all. -- Bert

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz
Sent: Monday, April 30, 2007 10:18 AM
To: Bert Gunter
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] thousand separator (was RE: weight)

Bert, What am I missing?

print(as.numeric(gsub(",", "", "1,123,456.789")), 10)
[1] 1123456.789

FWIW, this is using: R version 2.5.0 Patched (2007-04-27 r41355). Marc

On Mon, 2007-04-30 at 10:13 -0700, Bert Gunter wrote: Except this doesn't work for "1,123,456.789", Marc. I hesitate to suggest it, but gregexpr() will do it, as it captures the position of **every** match to ",". This could then be used to process the vector via some sort of loop/apply statement. But I think there **must** be a more elegant way using regular expressions alone, so I, too, await a clever reply. -- Bert

On Mon, 2007-04-30 at 10:02 -0700, Marc Schwartz wrote: One possibility would be to use something like the following post-import:

WTPP
[1] 1,106.8250 1,336.5138
str(WTPP)
 Factor w/ 2 levels "1,106.8250","1,336.5138": 1 2
as.numeric(gsub(",", "", WTPP))
[1] 1106.825 1336.514

Essentially strip the ',' characters from the factors and then coerce the resultant character vector to numeric. HTH, Marc Schwartz

On Mon, 2007-04-30 at 12:26 -0400, Liaw, Andy wrote: I've run into this occasionally. My current solution is simply to read it into Excel, re-format the offending column(s) by unchecking the thousand-separator box, and write it back out. Not exactly ideal, to say the least. If anyone can provide a better solution in R, I'm all ears... Andy

From: Natalie O'Toole
Hi, These are the variables in my file.
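For completeness, the whole-column cleanup from this thread in one place -- gsub (not sub) removes every comma, after which the character vector coerces cleanly to numeric (the WTPP values are taken from the original post, plus a multi-comma case):

```r
WTPP <- factor(c("1,106.8250", "1,336.5138", "1,123,456.789"))

## gsub replaces ALL matches; sub would leave "1,123,456.789" with one comma
x <- as.numeric(gsub(",", "", as.character(WTPP)))
x
```

Converting the factor through as.character() first is the safe general pattern for factors whose labels are really numbers; indexing by the underlying integer codes is the classic pitfall.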
I think the variable I'm having problems with is WTPP, which is of the Factor type. Does anyone know how to fix this, please? Thanks, Nat

'data.frame': 290 obs. of 5 variables:
 $ PROV  : num 48 48 48 48 48 48 48 48 48 48 ...
 $ REGION: num 4 4 4 4 4 4 4 4 4 4 ...
 $ GRADE : num 7 7 7 7 7 7 7 7 7 7 ...
 $ Y_Q10A: num 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
 $ WTPP  : Factor w/ 1884 levels "1,106.8250","1,336.5138",..: 1544 67 1568 40 221 1702 1702 1434 310 310 ...

__ --- Douglas Bates wrote: On 4/28/07, John Kane wrote: IIRC you have a yes/no smoking variable scored 1/2? It is possibly being read in as a factor, not as an integer. Try class(df$smoking.variable) to see. Good point. In general I would recommend using str(df) to check on the class or storage type of all variables in a data frame if you are getting unexpected results when manipulating it. That function is carefully written to provide a maximum of information in a minimum of space. Yes, but I'm a relative newbie at R and didn't realise that str() would do that; I always thought it was some kind of string function. Thanks, it makes life much easier.

--- Natalie O'Toole wrote: Hi, I'm getting an error message:

Error in df[, 1:4] * df[, 5] : non-numeric argument to binary operator
In addition: Warning message:
Incompatible methods ("Ops.data.frame", "Ops.factor") for "*"

here is my code:

##reading in the file
happyguys <- read.table("c:/test4.dat", header = TRUE, row.names = 1)
##subset the file based on Select If
test <- subset(happyguys, PROV == 48 & GRADE == 7 & Y_Q10A < 9)
##sorting the file
mydata <- test
mydataSorted <- mydata[order(mydata$Y_Q10A), ]
print(mydataSorted)
##assigning a different name to file
happyguys <- mydataSorted
##trying to weight my data
data.frame <- happyguys
df <- data.frame
df1 <- df[, 1:4] * df[, 5]  ##getting error message here??
Error in df[, 1:4] * df[, 5] : non-numeric argument to binary operator
In addition: Warning message:
Incompatible methods ("Ops.data.frame", "Ops.factor") for "*"

Does anyone know what this error message means? I've been reviewing R code all day getting more familiar with it.

Thanks, Nat

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
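A minimal sketch (not from the thread) of what that warning means and the usual fix: a factor column does not support arithmetic, so strip the thousand separators and coerce to numeric before multiplying. The column names here are made up for illustration.

```r
# Toy data frame: a numeric column and a weight column read in as a factor
df <- data.frame(x = 1:3,
                 w = factor(c("1,106.8250", "1,336.5138", "1,106.8250")))

# Strip the thousand separators, then coerce the character values to numeric
df$w <- as.numeric(gsub(",", "", as.character(df$w)))

weighted <- df$x * df$w   # arithmetic now works; no Ops.factor warning
```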
Re: [R] exclude the unfit data from the iteration
?try

Wrap each iteration in a try() call. Also see ?tryCatch if you want to get fancy -- and can understand the rather arcane docs.

Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mohammad Ehsanul Karim
Sent: Tuesday, April 24, 2007 3:33 PM
To: r-help@stat.math.ethz.ch
Subject: [R] exclude the unfit data from the iteration

Dear List,

Trying to explain my situation as simply as possible: I am running a series of iterations of a coxph model on simulated data (newly generated data on each iteration to run under coxph; in my example below, sim.fr is the generated data). However, sometimes I get warning messages like "Ran out of iterations and did not converge" or "Error in var(x, na.rm = na.rm) : missing observations in cov/cor", because in some cases one of my covariates (say, var5 or var6, or both, which are binary variables) becomes all 0's!

How do I exclude the unfit data (data that does not fit/converge and produces warning messages) that may be generated in any iteration, and still continue by replacing it with the next iteration's data (until it generates acceptable data that does not give any trouble like not converging)? Is there any provision in R?

sim.result <- function(...){
  ...
  fit.gm.em <- coxph(Surv(times, censored) ~ var1+var2+var3+var4+var5+var6 +
                       frailty(id, dist='gamma', method='em'),
                     data = sim.fr)
  ...
}

I know options(warn=-1) can hide warning messages, but I need not hide the problem; all I need to do is to tell the program to continue until it has iterated a fixed number of times (say, 100) with good data.

Thank you for your time.

Mohammad Ehsanul Karim (R 2.3.1 on Windows)
Institute of Statistical Research and Training
University of Dhaka
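A hedged sketch of the ?try advice above (lm() stands in for the poster's coxph() call, which needs the survival package and simulated survival data): keep simulating until a fixed number of fits succeed, discarding any iteration whose fit fails.

```r
# Collect exactly 10 successful fits, regenerating data whenever one errors out
good_fits <- list()
while (length(good_fits) < 10) {
  x <- rnorm(50)
  y <- 1 + 2 * x + rnorm(50)
  fit <- try(lm(y ~ x), silent = TRUE)       # coxph(...) in the real problem
  if (!inherits(fit, "try-error")) {
    good_fits[[length(good_fits) + 1]] <- fit
  }
}
```

To also treat non-convergence warnings as failures, tryCatch(..., warning = function(w) NULL) can be used in place of try().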
Re: [R] fitting mixed models to censored data?
Douglas: AFAIK, this is a subject area of active current research. Diggle, Heagerty, Liang, and Zeger (2002, Analysis of Longitudinal Data) say on p. 316: "An emerging consensus is that analysis of data with potentially informative dropouts necessarily involves assumptions which are difficult, or even impossible, to check from the observed data."

This was ca. 1994, I believe, so I don't know whether this view is still held among experts (which I am not). But if it is, you may do well to be careful of whatever SAS does, even if you do have to go running off to it.

Cheers,
Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Douglas Grove
Sent: Monday, April 23, 2007 10:58 AM
To: r-help@stat.math.ethz.ch
Subject: [R] fitting mixed models to censored data?

Hi,

I'm trying to figure out if there are any packages allowing one to fit mixed models (or non-linear mixed models) to data that includes censoring. I've done some searching already on CRAN and through the mailing list archives, but haven't discovered anything. Since I may well have done a poor job searching, I thought I'd ask here prior to giving up.

I understand that SAS's proc nlmixed can accommodate censoring (though proc mixed apparently can't), so if I can't find something available in R, I'll have to break down and use that. Please, save me from having to use SAS!

Thanks much,
Doug
Re: [R] summary and min max
I believe it is fair to say that this is where (S3, to keep it simple) classes come in handy: class the sorts of objects you're working with, say "MyClass", and then write your own summary.MyClass() method.

Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Robert Duval
Sent: Monday, April 23, 2007 4:16 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] summary and min max

Has anyone created an alternative summary method where the rounding is applied only to digits to the right of the decimal point? I personally don't like the way summary() works on this particular issue, but I'm not sure how to modify it generically... (of course one can always set digits=something_big, but this is not elegant, and impractical when one doesn't know in advance the magnitude of a number)

robert

On 4/23/07, Mike Prager [EMAIL PROTECTED] wrote:
> Sebastian P. Luque [EMAIL PROTECTED] wrote:
>> I came across a case where there's a discrepancy between minimum and maximum values reported by 'summary' and the 'min' and 'max' functions:
>
> summary() rounds by default. Thus its reporting oddball values is considered a feature, not a bug.
>
> -- Mike Prager, NOAA, Beaufort, NC
> * Opinions expressed are personal and not represented otherwise.
> * Any use of tradenames does not constitute a NOAA endorsement.
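A hedged sketch of the S3 approach Bert describes: class the object and supply a summary method for it. "MyClass" and the method body here are illustrative, not from any package; this one just reports unrounded extremes.

```r
# An object carrying an S3 class tag
x <- structure(c(1.0000001, 2.5, 5.9999999), class = "MyClass")

# summary() dispatches on class, so defining summary.MyClass() is enough
summary.MyClass <- function(object, ...) {
  v <- unclass(object)
  c(min = min(v), max = max(v))   # unrounded min and max
}

s <- summary(x)   # calls summary.MyClass()
```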
Re: [R] inconsistent output using 'round'
It has nothing to do with round() -- it's the digits argument of the print method that controls the number of digits in the output, print.default in this case. The documentation for print.default says of the digits argument:

digits: a non-null value for 'digits' specifies the minimum number of significant digits to be printed in values. The default, NULL, uses getOption("digits").

And, lo and behold, your output shows a minimum of 3 **significant** digits, with more being used in tables to line up values that are both greater and less than 1. -- Bert

Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bob Green
Sent: Thursday, April 19, 2007 2:05 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] inconsistent output using 'round'

Peter,

Many thanks. I have never seen a confidence interval from 0.000 to 6265941604681544800.000 - this is a worry. I am also still puzzled why use of digits = 3 produced output which includes 2, 3 and 4 decimal places, as per below. The two-decimal-place values for the coef should have been 2.479, 1.027, 1.614.

regards

Bob

> print(exp(coef(mod.multacute)), digits = 3)
         (Intercept) in.acute.dangery violent.convictionsy
GBH.UW         0.233             3.90                0.714
homicide       0.183             2.48                0.682
         in.acute.dangery:violent.convictionsy
GBH.UW                                    1.03
homicide                                  1.61

> print(exp(confint(mod.multacute)), digits = 3)
, , GBH.UW
                                       2.5 % 97.5 %
(Intercept)                            0.130  0.417
in.acute.dangery                       1.384 10.970
violent.convictionsy                   0.213  2.390
in.acute.dangery:violent.convictionsy  0.146  7.200

, , homicide
                                       2.5 % 97.5 %
(Intercept)                           0.0964  0.349
in.acute.dangery                      0.7194  8.543
violent.convictionsy                  0.1747  2.660
in.acute.dangery:violent.convictionsy 0.1767 14.738
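A small illustration (not from the thread) of the "minimum number of significant digits" point: with digits = 3, the smallest element of a vector still gets 3 significant digits, so a mixed-magnitude vector is shown with more decimal places than a naive reading of digits = 3 would suggest.

```r
# The element below 1 needs 3 decimal places for 3 significant digits,
# so the larger element is formatted with 3 decimal places as well
out <- format(c(0.123456, 123.456), digits = 3)
```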
[R] bquote in plot.default vs plot.formula ?
Folks: If it's not too technical, could someone explain the following:

x <- 1:5; y <- x

## The following 3 all work as expected:
plot(x, y, main = expression(sin(x+1)))
plot(y ~ x, main = expression(sin(x+2)))
plot(x, y, main = bquote(sin(x+3)))

## The following does not:
plot(y ~ x, main = bquote(sin(x+4)))

Perhaps more interesting results occur if log[10] is substituted for sin in these expressions. The last plot command then produces the error message:

Error in log[10] : object is not subsettable

Feel free to reply offline if you think that's more appropriate. Version info below.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

Version info:

> R.Version()
$platform [1] "i386-pc-mingw32"
$arch [1] "i386"
$os [1] "mingw32"
$system [1] "i386, mingw32"
$status [1] ""
$major [1] "2"
$minor [1] "4.1"
$year [1] "2006"
$month [1] "12"
$day [1] "18"
$`svn rev` [1] "40228"
$language [1] "R"
$version.string [1] "R version 2.4.1 (2006-12-18)"
Re: [R] nlm() and optim()
Numerical optimization is sensitive to (at least) the method chosen, the control/convergence specifications, and the parameterization of the function being optimized (all of this is well known). Defining what you mean by "reproduce" in a precise, operational way is therefore essential. You have not done so.

For example, if it is the negative (log-)likelihood of a statistical model that is being minimized, and the model is overparameterized so that there are near-identifiability issues, the confidence region for the parameters will essentially be a (possibly quite irregular) lower-dimensional subspace (submanifold) of the full parameter space. Would you say that results "reproduce" if they fall within this confidence region, even though they may be quite different from the estimated minima? Issues with possibly multiple local minima also complicate matters.

Bottom line: determining when you have "reproduced" results from complex modelling that relies on numerical optimization for model fitting can be difficult. Careful and parsimonious modelling is vital.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Silvia Lucato
Sent: Tuesday, April 10, 2007 7:32 AM
To: r-help@stat.math.ethz.ch
Subject: [R] nlm() and optim()

Dear R-users,

I have just joined the list and much appreciate any thoughts on 2 issues.

Firstly, I want to reproduce some minimization results conducted in MATLAB. I have succeeded with nlm() and optim() method "CG". I have been told that I should also get them with the other optim methods. Actually, I found the same results when testing a very straightforward equation; however, with a more complicated model this was not true. Is that really possible? Have I got it by chance in the simple case?
Secondly, in order to check which optimization is more suitable for our study, I would like to have the values of the minimized parameters on each iteration, to later plot a likelihood surface. However, for both nlm and optim, I could only keep the last iteration's results. Is there a way to store/record the minimized values for each iteration?

Sorry if these questions are recurring. I have been searching for hints but did not get too far, and I am fairly new to R. Comments and examples are most welcome.

Silvia Hadeler
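A hedged sketch (not from the thread) of one common answer to Silvia's second question: record every parameter vector that optim() evaluates by wrapping the objective so it logs into an environment. The toy objective and names here are illustrative.

```r
# Environment to accumulate the evaluation history
trace_env <- new.env()
trace_env$path <- list()

f <- function(p) {
  trace_env$path[[length(trace_env$path) + 1]] <- p   # log this evaluation
  sum((p - c(1, 2))^2)                                # toy objective, minimum at (1, 2)
}

fit <- optim(c(0, 0), f)                 # default Nelder-Mead
path <- do.call(rbind, trace_env$path)   # one row per function evaluation
```

Note this records function evaluations rather than "iterations" in the algorithmic sense, but it is enough to plot the search path over a likelihood surface.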
Re: [R] creating a data frame from a list
Dimitri: As you apparently have not received a reply: IMHO, one of the glories of R is the ease with which you can create de novo solutions for little problems like this yourself. While there may be more efficient, robust, and elegant solutions already available, it can frequently be considerably more time-consuming to find and figure them out, as you appear to have experienced. (And once outside base R and the standard packages, documentation can be problematic.) Anyway, whether you agree with that propaganda or not, here is a little function (no claim for elegance or efficiency!) that does what you want, I think:

makeFrame <- function(xlist) {
  allnames <- sort(unique(unlist(sapply(xlist, names))))
  data.frame(lapply(xlist,
                    function(y, an) structure(y[match(an, names(y))], names = NULL),
                    an = allnames),
             row.names = allnames)
}

## test it
> lst
$a
A B
1 8

$b
A B C
2 3 0

$c
B D
2 0

> makeFrame(lst)
   a  b  c
A  1  2 NA
B  8  3  2
C NA  0 NA
D NA NA  0

Cheers,
Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dimitri Szerman
Sent: Thursday, April 05, 2007 11:58 AM
To: R-Help
Subject: [R] creating a data frame from a list

Dear all,

A few months ago, I asked for your help on the following problem: I have a list with three (named) numeric vectors:

lst <- list(a=c(A=1,B=8), b=c(A=2,B=3,C=0), c=c(B=2,D=0))

Now, I'd love to use this list to create the following data frame:

dtf <- data.frame(a=c(A=1,B=8,C=NA,D=NA),
                  b=c(A=2,B=3,C=0,D=NA),
                  c=c(A=NA,B=2,C=NA,D=0))

> dtf
    a  b  c
A   1  2 NA
B   8  3  2
C  NA  0 NA
D  NA NA  0

That is, I wish to merge the three vectors in the list into a data frame by their (row) names. And I got the following answer:

library(zoo)
z <- do.call(merge, lapply(lst, function(x) zoo(x, names(x))))
rownames(z) <- time(z)
coredata(z)

However, it does not seem to be working.
Here's what I get when I try it:

lst <- list(a=c(A=1,B=8), b=c(A=2,B=3,C=0), c=c(B=2,D=0))
library(zoo)
z <- do.call(merge, lapply(lst, function(x) zoo(x, names(x))))
Error in if (freq > 1 && identical(all.equal(freq, round(freq)), TRUE)) freq <- round(freq) :
  missing value where TRUE/FALSE needed
In addition: Warning message:
NAs introduced by coercion

and z was not created. Any ideas on what is going on here?

Thank you,
Dimitri
Re: [R] Wikibooks
Question: Many (perhaps most?) questions on the list are easily answerable simply by checking the existing R docs (help files/man pages, "An Introduction to R", etc.). Why would a wiki be more effective in deflecting such questions from the mailing list than those docs are? Why would "too helpful" R experts be more inclined to refer people to the wiki than to the existing docs? Bottom line: it's psychology at issue here, I think, not the form of the docs.

Disclaimer 1: None of this is meant to reflect one way or the other on the usefulness of wikis as a documentation format -- only on their ability to change the Help-list culture.

Disclaimer 2: Others have repeatedly made similar comments (asking us to refer people to the docs rather than providing explicit answers, I mean).

Cheers,
Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr
Sent: Thursday, March 29, 2007 3:32 PM
To: Ben Bolker
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Wikibooks

Ben Bolker wrote:
> Alberto Monteiro (albmont at centroin.com.br) writes:
>> As a big fan of Wikipedia, it's frustrating to see how little there is about R in the correlated project, the Wikibooks: http://en.wikibooks.org/wiki/R_Programming
>> Alberto Monteiro
>
> Well, we do have an R wiki -- http://wiki.r-project.org/rwiki/doku.php -- although it is not as active as I'd like. (We got stuck halfway through porting Paul Johnson's "R Tips" to it ...) Please contribute! Most of the (considerable) effort people expend in answering questions about R goes to the mailing lists -- I personally would like it if some tiny fraction of that energy could be redirected toward the wiki, where information can be presented in a nicer format and (ideally) polished over time -- rather than having to dig back through multiple threads on the mailing lists to get answers. (After that we have to get people to look for the answers on the wiki.)
I would like to strongly second Ben. In some ways, R experts are too nice. Continuing to answer the same questions over and over does not lead to a better R wiki. I would rather see the work go into enhancing the wiki and refactoring information, and the response to many r-help pleas for help be "see wiki topic x". While doing this, let's consider putting a little more burden on new users to look for good answers already provided.

Frank

> Just my two cents -- and I've been delinquent in my wiki'ing recently too ...
> Ben Bolker

--
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics
Vanderbilt University
[R] Completely off topic, but amusing?
Folks: Thought that many on this list might find this amusing, perhaps even a bit relevant. Hope it's OK:

WASHINGTON - The government's estimate of the number of Americans without health insurance fell by nearly 2 million Friday, but not because anyone got health coverage. The Census Bureau said it has been overstating the number of people without health insurance since 1995. The bureau blamed the inflated numbers on a **12-year-old computer programming error**. [emphasis added -- BG]

So what does "validated software" really mean? (Rhetorical question -- no reply sought.)

Cheers to all,
Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
Re: [R] Colored boxes with values in the box
Sounds like ?image is what you are looking for, perhaps?

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Pappu, Kartik
Sent: Thursday, March 22, 2007 3:42 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Colored boxes with values in the box

Hi all,

I have an x-by-y matrix of numbers (usually ranging from 0 to 40). I need to group these numbers and assign a color to each group (for example 0 to 15 - blue, 16 to 30 - yellow, and 31 to 40 - red). Then I need to draw a rectangular matrix which contains x * y boxes, where each box has the corresponding value from the input matrix and is also colored according to which group (i.e. red, yellow, or blue) that value falls into.

I have used the color2D.matplot function from the plotrix package, but I can't quite figure out how to group the values to represent red, blue, and yellow colors.

Thanks
Kartik
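A hedged sketch of the ?image suggestion (not a definitive answer; color2D.matplot from plotrix may still be preferable): bin the matrix into three color groups via breaks, draw the grid, then write each value into its box with text(). The toy matrix is made up.

```r
# Toy 2 x 3 matrix of values in 0-40
m <- matrix(c(3, 12, 25, 18, 33, 40), nrow = 2)

# image() bins z-values into intervals defined by 'breaks', one color each:
# (0,15] blue, (15,30] yellow, (30,40] red
image(1:ncol(m), 1:nrow(m), t(m),
      breaks = c(0, 15, 30, 40), col = c("blue", "yellow", "red"),
      axes = FALSE, xlab = "", ylab = "")

# Write each cell's value at the center of its box
xy <- expand.grid(x = 1:ncol(m), y = 1:nrow(m))
text(xy$x, xy$y, labels = m[cbind(xy$y, xy$x)])
```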
Re: [R] Bad points in regression
(mount soapbox...)

While I know the prior discussion represents common practice, I would argue -- perhaps even plead -- that the modern (?? 30 years old now) alternative of robust/resistant estimation be used, especially in the readily available situation of least-squares regression. RSiteSearch("robust") will bring up numerous possibilities. rrcov and robustbase are at least two packages devoted to this, but the functionality is available in many others (e.g. rlm() in MASS).

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ted Harding
Sent: Friday, March 16, 2007 6:44 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Bad points in regression

On 16-Mar-07 12:41:50, Alberto Monteiro wrote:
> Ted Harding wrote:
>> alpha <- 0.3
>> beta <- 0.4
>> sigma <- 0.5
>> err <- rnorm(100)
>> err[15] <- 5; err[25] <- -4; err[50] <- 10
>> x <- 1:100
>> y <- alpha + beta * x + sigma * err
>> ll <- lm(y ~ x)
>> plot(ll)
>>
>> ll is the output of a linear model fitted by lm(), and so has several components (see ?lm in the section "Value"), one of which is "residuals" (which can be abbreviated to "res"). So, in the case of your example,
>>
>> which(abs(ll$res) > 2)
>> 15 25 50
>>
>> extracts the information you want (and the "2" was inspired by looking at the residuals plot from your plot(ll)).
>
> OK, but how can I grab those points _in general_? What is the criterion that plot used to mark those points as "bad points"?

Ahh ...! I see what you're after. OK, look at the plot method for lm(): ?plot.lm

## S3 method for class 'lm':
plot(x, which = 1:4,
     caption = c("Residuals vs Fitted", "Normal Q-Q plot",
                 "Scale-Location plot", "Cook's distance plot"),
     panel = points, sub.caption = deparse(x$call), main = "",
     ask = prod(par("mfcol")) < length(which) && dev.interactive(),
     ...,
     id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75)

where (see further down):

id.n: number of points to be labelled in each plot, starting with the most extreme.
and note, in the default parameter-values listing above: id.n = 3. Hence, the 3 most extreme points (according to the criterion being plotted in each plot) are marked in each plot. So, for instance, try plot(ll, id.n=5) and you will get points 10, 15, 25, 28, 50. And so on. But that pre-supposes that you know how many points are exceptional.

What is meant by "extreme" is not stated in the help page ?plot.lm, but can be identified by inspecting the code for plot.lm(), which you can see by entering plot.lm

In your example, if you omit the line which assigns anomalous values to err[15], err[25] and err[50], then you are likely to observe that different points get identified on different plots. For instance, I just got the following results for the default id.n=3:

[1] Residuals vs Fitted: 41, 53, 59
[2] Standardised Residuals: 41, 53, 59
[3] sqrt(Stand Res) vs Fitted: 41, 53, 59
[4] Cook's Distance: 59, 96, 97

There are several approaches (with somewhat different outcomes) to identifying outliers. If you apply one of these, you will probably get the identities of the points anyway. Again in the context of your example (where in fact you deliberately set 3 points to have exceptional errors, thus coincidentally the same as the default value 3 of id.n), you could try different values for id.n and inspect the graphs to see whether a given value of id.n marks some points that do not look exceptional relative to the mass of the other points. So, the above plot(ll, id.n=5) gave me one point, 10, on the residuals plot, which apparently belonged to the general distribution of residuals.

Hoping this helps,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 16-Mar-07 Time: 13:43:54
-- XFMail --
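A hedged sketch of Bert's rlm() suggestion, reusing the thread's simulation: M-estimation downweights the planted outliers automatically, and the final robust weights themselves point at the suspect observations.

```r
library(MASS)  # for rlm()

set.seed(1)
x <- 1:100
err <- rnorm(100)
err[15] <- 5; err[25] <- -4; err[50] <- 10   # plant three bad points
y <- 0.3 + 0.4 * x + 0.5 * err

fit <- rlm(y ~ x)                # robust (M-estimation) fit
suspects <- which(fit$w < 0.5)   # observations given low robust weight
```

The weight cutoff 0.5 is an illustrative choice, not a rule; the planted points 15, 25 and 50 should be among those flagged.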
Re: [R] Connecting R-help and Google Groups?
I know nothing about Google Groups, but FWIW, I think it would be most unwise for R/CRAN to hook up to **any** commercially sponsored web portals. Future changes in their policies, interfaces, or access conditions may make them inaccessible or unfriendly to R users. So long as we have folks willing and able to host and maintain our lists as part of the CRAN infrastructure, CRAN maintains control. I think this is wise and prudent. I am happy to be educated to the contrary if I misunderstand how this would work.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paul Lynch
Sent: Wednesday, March 14, 2007 8:48 AM
To: R-help@stat.math.ethz.ch
Subject: [R] Connecting R-help and Google Groups?

This morning I tried to see if I could find the r-help mailing list on Google Groups, which has an interface that I like. I found three Google Groups ("The R Project for Statistical Computing", "rproject", and "rhelp") but none of them are connected to the r-help list.

Is there perhaps some reason why it wouldn't be a good thing for there to be a connected Google Group? I think it should be possible to set things up so that a post to the Google Group goes to the r-help mailing list, and vice-versa. Also, does anyone know why the three existing R Google Groups failed to get connected to r-help? It might require some action on the part of the r-help list administrator.

Thanks,
--Paul

--
Paul Lynch
Aquilent, Inc.
National Library of Medicine (Contractor)
Re: [R] How to modify a column of a matrix
?cut ## if you have several bins, where ifelse becomes messy

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz
Sent: Monday, March 12, 2007 11:25 AM
To: Sergio Della Franca
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] How to modify a column of a matrix

On Mon, 2007-03-12 at 18:55 +0100, Sergio Della Franca wrote:
> Dear R-helpers,
>
> I'm trying to create a string-code to modify the contents of a column of a matrix. For example, I have this dataset:
>
> YEAR PRODUCTS
> 1992 3253
> 1993 4144
> 1994 3246
> 1996 4144
> 1997 4087
> 1998 3836
> 1999 4379
> 2000 4072
> 2001 4202
> 2002 4554
> 2003 4456
> 2004 4738
> 2005 4144
>
> I want to convert/update the values of the column PRODUCTS under some condition (i.e. when the value of PRODUCTS is greater than 4000, replace it with 0, else replace it with 1). My question is the following: is there a function or a methodology that allows this operation?
>
> Thank you in advance,
> Sergio

If the data above is a matrix (MAT) and not a data frame:

# See ?cbind and ?ifelse
MAT <- cbind(MAT, NewCol = ifelse(MAT[, "PRODUCTS"] > 4000, 0, 1))

> MAT
   YEAR PRODUCTS NewCol
1  1992     3253      1
2  1993     4144      0
3  1994     3246      1
4  1996     4144      0
5  1997     4087      0
6  1998     3836      1
7  1999     4379      0
8  2000     4072      0
9  2001     4202      0
10 2002     4554      0
11 2003     4456      0
12 2004     4738      0
13 2005     4144      0

If it is a data frame:

DF$NewCol <- ifelse(DF$PRODUCTS > 4000, 0, 1)

> DF
   YEAR PRODUCTS NewCol
1  1992     3253      1
2  1993     4144      0
3  1994     3246      1
4  1996     4144      0
5  1997     4087      0
6  1998     3836      1
7  1999     4379      0
8  2000     4072      0
9  2001     4202      0
10 2002     4554      0
11 2003     4456      0
12 2004     4738      0
13 2005     4144      0

HTH,
Marc Schwartz
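A hedged sketch of Bert's ?cut pointer: with more than two groups, cut() replaces nested ifelse() calls. The break points and labels below are illustrative, not from the thread.

```r
products <- c(3253, 4144, 3246, 4087, 3836)

# One call handles any number of bins; intervals are (lower, upper]
grp <- cut(products,
           breaks = c(0, 3500, 4000, Inf),
           labels = c("low", "mid", "high"))
```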
Re: [R] hwo can i get a vector that...
apply(yourMatrix, 1, which.max)

Bert Gunter
Nonclinical Statistics
7-7374

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of bunny, lautloscrew.com
Sent: Wednesday, March 07, 2007 2:12 PM
To: r-help@stat.math.ethz.ch
Subject: [R] how can i get a vector that...

dear all,

how can I get a vector that shows the number of the column of a matrix that contains the maximum of the row? can't believe I need a loop for this... I have a 100 x 3 matrix and want to get a 100 x 1 vector with values 1, 2, 3. there must be a simple solution. I just cannot find it. I think I am searching on the wrong end.

thx for help in advance.

m.
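Bert's one-liner, spelled out on a 100 x 3 matrix of made-up data: apply() over rows (MARGIN = 1) with which.max() gives, for each row, the column holding that row's maximum.

```r
set.seed(42)
m <- matrix(rnorm(300), nrow = 100, ncol = 3)

idx <- apply(m, 1, which.max)   # 100 values, each 1, 2 or 3
```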
Re: [R] how to edit my R codes into a efficient way
Have you read "An Introduction to R"? If not, do so before posting any further questions. Once you have read it, pay attention to what it says about lists, which are a very general data structure (indeed, **the** most general) that is very convenient for this sort of task. The general approach one uses is something like:

ContentsOfFiles <- lapply(filenameVector, functionThatReadsFile, additionalParametersToFunction)

More specifically,

ContentsOfFiles <- lapply(filenameVector, read.csv, header=TRUE, quote="\"", fill=TRUE)

see ?lapply

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Xuhong Zhu
Sent: Tuesday, March 06, 2007 7:19 AM
To: r-help@stat.math.ethz.ch
Subject: [R] how to edit my R codes into a efficient way

Hello, Everyone,

I am a student and a new learner of R, and I am trying to do my homework in R. I have 10 files that need to be read and processed separately. I really want to write the code as something like a macro, to save the lines instead of repeating 10 times of similar work. The following is part of my code, and I only extracted three lines for each repeating section.

data.1 <- read.csv("http://pegasus.cc.ucf.edu/~xsu/CLASS/STA6704/pat1.csv",
                   header = TRUE, sep = ",", quote = "\"", fill = TRUE)
data.2 <- read.csv("http://pegasus.cc.ucf.edu/~xsu/CLASS/STA6704/pat3.csv",
                   header = TRUE, sep = ",", quote = "\"", fill = TRUE)
data.3 <- read.csv("http://pegasus.cc.ucf.edu/~xsu/CLASS/STA6704/pat4.csv",
                   header = TRUE, sep = ",", quote = "\"", fill = TRUE)

baby.1 <- data.frame(cuff=data.1$avg_value, time=seq(1,dim(data.1)[1]), patient=rep(1, dim(data.1)[1]))
baby.2 <- data.frame(cuff=data.2$avg_value, time=seq(1,dim(data.2)[1]), patient=rep(3, dim(data.2)[1]))
baby.3 <- data.frame(cuff=data.3$avg_value, time=seq(1,dim(data.3)[1]), patient=rep(4, dim(data.3)[1]))

I also tried the code below, but it doesn't work.
for(n in 1:10){ mm <- data.frame(cuff=paste("data", n, sep=".")$avg_value, time=seq(1, dim(paste("data", n, sep="."))[1]), patient=rep(1, paste("data", n, sep="."))[1])) assign(paste("baby", n, sep="."), mm)} I am looking forward to your help and thanks very much! Xuhong __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
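A runnable version of the list-based approach Bert describes might look like the following. The pat*.csv names and the avg_value column come from the original post, but the files are faked in a temp directory here so the sketch is self-contained and needs no network access:

```r
# Fake three small CSV files standing in for the remote pat1/pat3/pat4.csv.
files <- file.path(tempdir(), paste0("pat", c(1, 3, 4), ".csv"))
for (f in files)
  write.csv(data.frame(avg_value = rnorm(5)), f, row.names = FALSE)

# One lapply() to read everything into a list...
datasets <- lapply(files, read.csv, header = TRUE)

# ...and another to build the per-patient data frames (ids from the post).
ids <- c(1, 3, 4)
babies <- lapply(seq_along(datasets), function(i)
  data.frame(cuff    = datasets[[i]]$avg_value,
             time    = seq_len(nrow(datasets[[i]])),
             patient = ids[i]))
```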
[R] Off topic:Spam on R-help increase?
Folks: In the past 2 days I have seen a large increase of spam getting into R-help. Are others experiencing this problem? If so, has there been some change to the spam filters on the R-servers? If not, is the problem on my end? Feel free to reply privately. Thanks. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Recalling and printing multiple graphs. Is there something in the HISTORY menu that will help?
See FAQ for Windows 5.2 and the referenced README. ?win.metafile and ?replayPlot might allow you to replay the saved plot history (by default in .SavedPlots) into a file in emf or wmf format, I think, but I haven't actually tried this -- I don't know if it will work for multiple graphs. Let us know if this approach works if you don't get a definitive answer elsewhere. Cheers, Bert Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Sorkin Sent: Tuesday, March 06, 2007 9:44 AM To: r-help@stat.math.ethz.ch Subject: [R] Recalling and printing multiple graphs. Is there something in the HISTORY menu that will help? I have written an R function that produces multiple graphs. I use par(ask=TRUE) to allow for the inspection of each graph before the next graph is drawn. I am looking for a way to recall all graphs drawn in an R session, and a method that can be used to print all the graphs at one time. I know that I could simply print each graph after I inspect it, but this gets tiresome if one's function produces tens of graphs. I suspect that if I knew more about the HISTORY menu (which currently has an entry RECORDING) I could get the graphs to be replayed and printed, but alas I have not been able to find instructions for using the HISTORY menu. Please take pity on me when you let me know that some easy search or command could have gotten me the information I needed. I have looked, but clearly in the wrong places. John John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics Baltimore VA Medical Center GRECC, University of Maryland School of Medicine Claude D.
Pepper OAIC, University of Maryland Clinical Nutrition Research Unit, and Baltimore VA Center Stroke of Excellence University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) [EMAIL PROTECTED] Confidentiality Statement: This email message, including any attachments, is for the so...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
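Beyond the Windows-only plot history, the replayPlot() route Bert mentions can be sketched portably with recordPlot() (both live in grDevices). This is a hedged sketch, not the Windows HISTORY mechanism itself; the output file name is illustrative:

```r
# Record each graph as it is drawn, then replay them all into one PDF.
pdf(NULL)                              # an off-screen device for recording
dev.control(displaylist = "enable")    # recording needs the display list on
plots <- lapply(1:3, function(i) {
  plot(1:10, (1:10)^i, main = paste("graph", i))
  recordPlot()                         # snapshot the current page
})
dev.off()

out <- tempfile(fileext = ".pdf")      # illustrative destination
pdf(out)
for (p in plots) replayPlot(p)         # print all recorded graphs at once
dev.off()
```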
Re: [R] from function to its name?
Seth is, of course, correct, but perhaps the following may help: ## function that takes a function as an argument foo <- function(f, x) list(deparse(substitute(f)), f(x)) ## Value is a list of length 2; the first component is a character string giving the name of the function; the second component is the result of applying the function to the x argument. ## pass in the name (UNquoted) of the function as the first argument ## This works because the evaluator looks up the function that the symbol is bound to in the usual way foo(mean, 1:5) [[1]] [1] "mean" [[2]] [1] 3 ## pass in an unnamed function as the first argument foo(function(y) sum(y)/length(y), 1:5) [[1]] [1] "function(y) sum(y)/length(y)" [[2]] [1] 3 ## the following gives an error, since the first argument is a character string, not a name/symbol: foo(f="mean", 1:5) Error in foo(f = "mean", 1:5) : could not find function "f" Cheers, Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Seth Falcon Sent: Friday, March 02, 2007 9:18 AM To: r-help@stat.math.ethz.ch Subject: Re: [R] from function to its name? Ido M. Tamir [EMAIL PROTECTED] writes: I wanted to pass a vector of functions as an argument to a function to do some calculations and put the results in a list where each list entry has the name of the function. I thought I could either pass a vector of function names as character, then retrieve the functions etc... Or do the opposite, pass the functions and then retrieve the names, but this seems not to be possible, hence my question. Functions don't have to have names, by which I mean that the definition doesn't have to be bound to a symbol. If your function takes a list of functions then: yourFunc(theFuncs=list(function(x) x + 1)) You could force the list to have names and use them. Or you could force function names to be passed in (your other idea).
+ seth __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
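Seth's named-list suggestion in practice: carry the names in the list itself, and the "vector of functions" problem disappears (a minimal sketch; the function names are illustrative):

```r
# A named list of functions -- one named binding, one anonymous definition.
funcs <- list(mean   = mean,
              mymean = function(y) sum(y)/length(y))

# lapply() preserves the names, so the result list is already labelled.
results <- lapply(funcs, function(f) f(1:5))
```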
Re: [R] R code for Statistical Models in S ?
The White Book provides the original S language specification. This was what existed at Bell Labs way back then. Subsequent implementations, both S-Plus and R, differ on details. Also, a lot of development effort has flowed over the dam since publication, so both implementations contain lots of stuff not even mentioned there. See also the Green Book. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Charilaos Skiadas Sent: Thursday, March 01, 2007 12:56 PM To: R-Mailingliste Subject: [R] R code for Statistical Models in S ? I just acquired a copy of Statistical Models in S, I guess most commonly known as the white book, and realized to my dismay that most of the code is not directly executable in R, and I was wondering if there was a source discussing the things that are different and what the new ways of calling things are. For instance, the first obstacle was the solder.balance data set. I found a solder data set in rpart, which is very close to it except for the fact that the Panel variable is not a factor, but that's easily fixed. The first problem is the next two calls, on pages 2 and 3. One is plot(solder.balance), which is supposed to produce a very different plot than it does in R (I actually don't know the name of the plot, which is part of the problem I guess). Then one is supposed to call plot.factor(skips ~ Opening + Mask), which I took to mean plot(skips ~ Opening + Mask, data=solder), and that worked, though I still haven't been able to make a direct call to plot.factor work (I keep getting a "could not find function plot.factor" error). Anyway, I just wondered whether there is some page somewhere that discusses these little differences here and there, as I am sure there will be a number of other problems such as these along the way.
Haris Skiadas Department of Mathematics and Computer Science Hanover College __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Default par() options
Thomas: I am not sure exactly what you are asking for below, but I wonder if your query could be satisfied by the judicious use of the ... argument in a wrapper function to par(), like myPar <- function(bg="lightgray", pch=19, ...) par(bg=bg, pch=pch, ...) or perhaps myX11 <- function(width=10, bg="lightgray", pch=19, ...) { X11(width=width); par(bg=bg, pch=pch, ...) } This would use the existing user-chosen defaults for the respective devices if no other values were provided, and would allow the user to explicitly specify different values for them or additional arguments to par() if needed. I agree that it ain't elegant, though, so I'd welcome better alternatives, too. Of course, one can explicitly use formals() and the construction dots <- as.list(substitute(list(...)))[-1] ## V&R: S PROGRAMMING p. 46 to obtain all the arguments and their names and appropriately stuff them into either par() or X11() using do.call() or something similar; but that seems like more than you need here. Anyway, HTH. -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA "The business of the statistician is to catalyze the scientific learning process." - George E. P. Box -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Petr Klasterecky Sent: Thursday, March 01, 2007 12:51 PM To: Thomas Friedrichsmeier Cc: [EMAIL PROTECTED] Subject: Re: [R] Default par() options I am no expert on these topics but currently I am solving a similar issue using the .Rprofile file and the .First function. So maybe it's enough to put .First <- function(){ par(whatever you want); further instructions if necessary } Petr Thomas Friedrichsmeier napsal(a): The following question/idea came up on the RKWard development mailing list, but might be of general interest: Is there a nice way to customize the default look of all graphs on all devices? I.e.
a way to - for instance - set the following options before each plot: par(bg="light gray", las=2, pch=19) As far as I have found, there would currently be two ways to do this: 1) Adding the above statement manually after opening the device, and before starting the plot. It could of course be wrapped inside a custom function to save some typing, but you'd still need to make sure to always add the command. 2) Overriding all device functions with something like: X11 <- function (...) { grDevices::X11(...); par([custom options]) } This would be feasible, but feels rather dirty. Also, something substantially more elaborate would be needed to honor e.g. fonts and bg arguments, if explicitly specified in the call to X11. It would have to be done for each device separately. Does a third, more elegant solution exist? If not, would the following idea have any chance of being added to R? Create a new options("par.default"), similar to the already existing options("par.ask.default"). This would take a list of par() options to set default values for, like e.g.: options(par.default=list(bg="light gray", las=2, pch=19)) Only those options would need to be specified in the list for which you actually want to set a default different from the built-in one. Options explicitly specified in X11(), plot(), additional calls to par(), etc. would take precedence over options("par.default"). Regards Thomas Friedrichsmeier -- -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Petr Klasterecky Dept. of Probability and Statistics Charles University in Prague Czech Republic __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
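For reference, a runnable version of the myPar() wrapper from Bert's reply; pdf(NULL) is used here only so the sketch works off-screen on any platform:

```r
# Wrapper around par(): chosen defaults apply unless explicitly overridden.
myPar <- function(bg = "lightgray", pch = 19, ...) par(bg = bg, pch = pch, ...)

pdf(NULL)                        # off-screen device for a portable example
myPar()                          # applies the wrapper's defaults
defaults <- par(c("bg", "pch"))  # query what was set
myPar(pch = 1)                   # explicitly supplied arguments still win
override <- par("pch")
dev.off()
```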
Re: [R] Packages in R for least median squares regression and computing outliers (Thompson tau technique etc.)
Packages MASS and robustbase both have this functionality. There may also be others. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of lalitha viswanath Sent: Wednesday, February 28, 2007 10:04 AM To: r-help@stat.math.ethz.ch Subject: [R] Packages in R for least median squares regression and computing outliers (Thompson tau technique etc.) Hi I am looking for suitable packages in R that do regression analyses using the least median squares method (or better). Additionally, I am also looking for packages that implement algorithms/methods for detecting outliers that can be discarded before doing the regression analyses. Although some websites refer to an lms method under a package lps in R, I am unable to find such a package on CRAN. I would greatly appreciate any pointers to suitable functions/packages for doing the above analyses. Thanks Lalitha __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
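As one concrete route, MASS fits least median of squares via lqs() with method = "lms"; a minimal sketch on the built-in stackloss data (the data set choice is illustrative, not from the thread):

```r
library(MASS)  # ships with every standard R installation

# Least median of squares regression: resistant to outlying observations.
fit <- lqs(stack.loss ~ ., data = stackloss, method = "lms")
cf <- coef(fit)  # intercept plus one coefficient per predictor
```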
Re: [R] What is a expression good for?
See V&R's S PROGRAMMING, esp. section 3.5; and section 6.1 and subsequent of the R Language Definition. An expression object is the output of parse(), and so is R's representation of a parsed expression. It is a type of list -- a parse tree for the expression. This means that you can actually find the sorts of things you mention by taking it apart as a list: ex <- parse(text = "x + y") ex expression(x + y) class(ex) [1] "expression" ex[[1]] x + y ex[[c(1,1)]] `+` ex[[c(1,2)]] x ex[[c(1,3)]] y There are few if any circumstances in which one should do this: it is the job of the evaluator. There are also special tools available for when you really might want to do this sort of thing -- e.g. ?formula and ?terms for altering model specifications. But it is tricky to do right and in full generality -- see e.g. ?eval and the above references for some of the issues. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alberto Monteiro Sent: Wednesday, February 28, 2007 1:03 PM To: r-help@stat.math.ethz.ch Subject: [R] What is a expression good for? I mean, I can generate an expression, for example, with: z <- expression(x+y) But then how can I _use_ it? Is it possible to retrieve information from it, for example, that z is a sum, its first argument is x (or expression(x)) and its second argument is y? Alberto Monteiro __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
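The transcript above, condensed into a runnable sketch (the x and y values are made up for illustration):

```r
ex <- parse(text = "x + y")  # an expression object: a list of parsed calls
x <- 1; y <- 2

op  <- ex[[c(1, 1)]]         # the function being called: the symbol `+`
lhs <- ex[[c(1, 2)]]         # first argument: the symbol x
val <- eval(ex[[1]])         # ...but evaluation is the evaluator's job
```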
Re: [R] fitting of all possible models
... Below -- Bert Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr Sent: Tuesday, February 27, 2007 5:14 AM To: Indermaur Lukas Cc: r-help@stat.math.ethz.ch Subject: Re: [R] fitting of all possible models Indermaur Lukas wrote: Hi, Fitting all possible models (GLM) with 10 predictors will result in loads of (2^10 - 1) models. I want to do that in order to get the importance of variables (having an unbalanced variable design) by summing up the AIC weights of models including the same variable, separately for every variable. It's time-consuming and annoying to define all possible models by hand. Is there a command, or an easy solution, to let R define the set of all possible models itself? I defined models in the following way to process them with a batch job: # e.g. model 1 preference <- formula(Y~Lwd + N + Sex + YY) # e.g. model 2 preference_heterogeneity <- formula(Y~Ri + Lwd + N + Sex + YY) etc. etc. I appreciate any hint Cheers Lukas If you choose the model from among the 2^10 - 1 having the best AIC, that model will be badly biased. Why look at so many? Pre-specification of models, or fitting full models with penalization, --- ... the rub being how much to penalize. My impression from what I've read is that, for prediction, the more penalization, the better the predictor... Nature rewards parsimony. Cheers, Bert Frank °°° Lukas Indermaur, PhD student eawag / Swiss Federal Institute of Aquatic Science and Technology ECO - Department of Aquatic Ecology Überlandstrasse 133 CH-8600 Dübendorf Switzerland Phone: +41 (0) 71 220 38 25 Fax: +41 (0) 44 823 53 15 Email: [EMAIL PROTECTED] www.lukasindermaur.ch __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
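For the mechanical part of the question -- letting R enumerate the 2^k - 1 right-hand sides instead of typing them -- a sketch using combn() and reformulate() (the predictor names are taken from Lukas's examples; this says nothing about whether fitting them all is a good idea):

```r
predictors <- c("Ri", "Lwd", "N", "Sex", "YY")

# All non-empty subsets of the predictors: choose(k, 1) + ... + choose(k, k).
subsets <- unlist(lapply(seq_along(predictors),
                         function(k) combn(predictors, k, simplify = FALSE)),
                  recursive = FALSE)

# Turn each subset into a model formula with response Y.
formulas <- lapply(subsets, reformulate, response = "Y")
n <- length(formulas)  # 2^5 - 1 = 31 candidate models for 5 predictors
```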
Re: [R] looping
You do not say -- and I am unable to divine -- whether you wish to sample with or without replacement, each time or as a whole. In general, when you want to do this sort of thing, the fastest way is to draw everything you need at once and then form it into a list or matrix or whatever. For example, for sampling 100 at a time, 200 times, with replacement: mySamples <- matrix(sample(yourDatavector, 100*200, replace=TRUE), ncol=200) will give you a 100 row by 200 column matrix of samples with replacement from yourDatavector. I hope that you can adapt this to suit your needs. Bert Gunter Nonclinical Statistics 7-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Neil Hepburn Sent: Monday, February 26, 2007 4:11 PM To: r-help@stat.math.ethz.ch Subject: [R] looping Greetings: I am looking for some help (probably really basic) with looping. What I want to do is repeatedly sample observations (about 100 per sample) from a large dataset (100,000 observations). I would like the samples labelled sample.1, sample.2, and so on (or some other suitably simple naming scheme). To do this manually I would do smp.1 <- sample(10^5, 100) sample.1 <- dataset[smp.1,] smp.2 <- sample(10^5, 100) sample.2 <- dataset[smp.2,] . . . smp.50 <- sample(10^5, 100) sample.50 <- dataset[smp.50,] and so on. I tried the following loop code to generate 50 samples: for (i in 1:50){ smp.[i] <- sample(10^5, 100) sample.[i] <- dataset[smp.[i],]} Unfortunately, that does not work -- specifying the looping variable i in the way that I have does not work, since R uses that syntax to reference places in a vector (x[i] would be the ith element of the vector x). Is it possible to use the value of the looping variable in a name within the loop structure?
Cheers, Neil Hepburn === Neil Hepburn, Economics Instructor Social Sciences Department, The University of Alberta Augustana Campus 4901 - 46 Avenue Camrose, Alberta T4V 2R3 Phone (780) 697-1588 email [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
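A list-based version of the loop the poster wanted, along the lines of Bert's advice; the stand-in data frame below is illustrative (1,000 rows instead of 100,000), and names like sample.1 come out of the list's names rather than assign():

```r
set.seed(1)
dataset <- data.frame(x = rnorm(1000))  # stand-in for the 100,000-row data

# 50 samples of 100 rows each, kept together in one named list.
samples <- lapply(1:50, function(i)
  dataset[sample(nrow(dataset), 100), , drop = FALSE])
names(samples) <- paste("sample", 1:50, sep = ".")
```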
Re: [R] Repeated measures in Classification and Regresssion Trees
Andrew: Good question! AFAIK most of the so-called machine learning machinery -- regression and classification trees, SVMs, neural nets, random forests, and other more chic methods (I make no attempt to keep up with all of them) -- ignores error structure; that is, it assumes the data are at least independent (not necessarily identically distributed). I don't think merely exchangeable is good enough either, though I may be wrong about this. But I believe you have put your finger on a key issue: although all this cool methodology is usually not terribly concerned with inference (cross-validation and bootstrapping being the usual methodology rather than, say, asymptotics), one wonders how biased the estimators are when there are various correlations in the data. I suspect a lot, depending on the nature of the correlations and the methods. I think the moral is: thermodynamics still rules -- there's no free lunch. You are just as likely to produce nonsense using all this nonparametric methodology as you are using parametric methods if you ignore the error structure of the data. Incidentally, I should point out that George Box fulminated on this very issue about 50 years ago. In his statistics classes he always used to say that all the fuss (then) about using nonparametric rank-based methods (e.g. Mann-Whitney-Wilcoxon) rather than parametric t-statistics was silly, since the t-statistics were relatively insensitive to departures from normality anyway, and it was lack of independence, not exact normality, that was the key practical issue -- and both approaches are sensitive to that. He published several papers to this effect, of course. Needless to say, I would welcome other -- especially better informed and contrary -- views on these issues, either on or off list.
Cheers, Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Park Sent: Friday, February 23, 2007 7:51 AM To: r-help@stat.math.ethz.ch Subject: [R] Repeated measures in Classification and Regresssion Trees Dear R members, I have been trying to find out whether one can use multivariate regression trees (for example mvpart) to analyze repeated measures data. As a non-parametric technique, CART is insensitive to most of the assumptions of parametric regression, but repeated measures data raises the issue of the independence of several data points measured on the same subject, or from the same plot over time. Any perspectives will be welcome, Andy Park (Assistant Professor) Centre for Forest Interdisciplinary Research (CFIR), Department of Biology, University of Winnipeg, 515 Portage Avenue, Winnipeg, Manitoba, R3B 2E9, Canada Phone: (204) 786-9407 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] investigating interactions with mixed models
?interaction.plot should help you. This works on the data, not the model. A 3-way interaction just means that the 2-way interaction differs among the various levels of the 3rd factor. Clever use of trellis plots (?xyplot -- especially ?panel.linejoin) gives greater flexibility, but it requires that a steeper learning curve be climbed. In general, the presence of interactions is just another manifestation of the response varying nonlinearly in the factors (**not** in the parameters, of course -- it's a linear model after all). This is essentially always the case; it's just a question of whether the signal/noise ratio (which depends on sample size) is large enough to see it via P-values. So by all means look at the plots and try to understand and interpret what's going on; but by no means assume that p-values above and below a threshold of .05 are a clear guide to determining this. As usual, statistical significance and scientific relevance are not equivalent, and the degree of overlap between the two is often difficult to judge. Cheers, Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Robinson Sent: Thursday, February 22, 2007 2:32 PM To: R. Baker Cc: r-help@stat.math.ethz.ch Subject: Re: [R] investigating interactions with mixed models Hello Rachel, I don't think that there is any infrastructure for these procedures on lmer objects, yet. If you are willing to use lme instead, then the multcomp package seems to provide post-hoc tests. It is worth noting that there is some doubt as to the validity of the reference distributions for tests of fixed effects in the presence of random effects. http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-are-p_002dvalues-not-displayed-when-using-lmer_0028_0029_003f Cheers Andrew On Thu, Feb 22, 2007 at 12:32:44PM +, R.
Baker wrote: I'm investigating a number of dependent variables using mixed models, e.g. data.lmer45 = lmer(ampStopB ~ (type + stress + MorD)^3 + (1|speaker) + (1|word), data=data) The p-values for some of the 2-way and 3-way interactions are significant at a 0.05 level and I have been trying to find out how to understand the exact nature of the interactions. Does anyone know if it is possible to run post-hoc tests on mixed model (lmer) objects? I have read about TukeyHSD but it seems that this can only be run on anova (aov) objects. Any suggestions would be gratefully appreciated! Rachel Baker -- -- PhD student Dept of Linguistics Sidgwick Avenue University of Cambridge Cambridge __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Department of Mathematics and StatisticsTel: +61-3-8344-9763 University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599 http://www.ms.unimelb.edu.au/~andrewpr http://blogs.mbs.edu/fishing-in-the-bay/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
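A minimal sketch of the ?interaction.plot suggestion on a built-in data set (ToothGrowth here is illustrative, not the poster's data); non-parallel traces in the plot suggest an interaction in the raw cell means:

```r
pdf(tempfile(fileext = ".pdf"))   # any graphics device will do
with(ToothGrowth,
     interaction.plot(x.factor = dose, trace.factor = supp, response = len))
dev.off()

# The same cell means the plot is drawn from, for inspection:
cellmeans <- with(ToothGrowth, tapply(len, list(dose, supp), mean))
```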
Re: [R] convert to binary to decimal
why not simply: sum(x * 2^(rev(seq_along(x)) - 1)) ? Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Roland Rau Sent: Thursday, February 15, 2007 8:22 AM To: [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Subject: Re: [R] convert to binary to decimal That was a nice quick distraction. Unfortunately, I am not the first to answer. :-( Anyway, I offer two solutions (which are different from the one of Marc Schwartz); I wrote them quickly but I hope they are correct. Enjoy and thanks, Roland a <- c(TRUE, FALSE, TRUE) b <- c(TRUE, FALSE, TRUE, TRUE) bin2dec.easy <- function(binaryvector) { sum(2^(which(rev(binaryvector)==TRUE)-1)) } bin2dec.recursive <- function(binaryvector) { reversed.input <- rev(binaryvector) binaryhelper(reversed.input, 0, 0) } binaryhelper <- function(binvector, currentpower, currentresult) { if (length(binvector) < 1) { currentresult } else { if (binvector[1]) { binaryhelper(binvector[-1], currentpower+1, currentresult+2^currentpower) } else { binaryhelper(binvector[-1], currentpower+1, currentresult) } } } bin2dec.easy(a) bin2dec.recursive(a) bin2dec.easy(b) bin2dec.recursive(b) On 2/15/07, Marc Schwartz [EMAIL PROTECTED] wrote: On Thu, 2007-02-15 at 16:38 +0100, Martin Feldkircher wrote: Hello, we need to convert a logical vector to a (decimal) integer. Example: a = c(TRUE, FALSE, TRUE) (binary number 101) the function we are looking for should return dec2bin(a) = 5 Is there a package for such a function, or is it even implemented in the base package? We found the hexmode and octmode commands, but not a binmode. We know how to program it ourselves; however, we are looking for a computationally efficient algorithm. Martin and Stefan This is a modification of a function that I had posted a while back, so that it handles 'x' as a logical vector.
I added the first line in the function to convert the logical vector to its numeric equivalent and then coerce to character: bin2dec <- function(x) { x <- as.character(as.numeric(x)) b <- as.numeric(unlist(strsplit(x, ""))) pow <- 2 ^ ((length(b) - 1):0) sum(pow[b == 1]) } a <- c(TRUE, FALSE, TRUE) bin2dec(a) [1] 5 HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
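A quick cross-check that Bert's one-liner at the top of the thread agrees with Marc's bin2dec() on the thread's own examples (101 = 5, 1011 = 11):

```r
# Marc's function, restated verbatim for the comparison.
bin2dec <- function(x) {
  x <- as.character(as.numeric(x))
  b <- as.numeric(unlist(strsplit(x, "")))
  pow <- 2 ^ ((length(b) - 1):0)
  sum(pow[b == 1])
}

a <- c(TRUE, FALSE, TRUE)        # binary 101
b <- c(TRUE, FALSE, TRUE, TRUE)  # binary 1011
ra <- sum(a * 2^(rev(seq_along(a)) - 1))  # Bert's one-liner
rb <- sum(b * 2^(rev(seq_along(b)) - 1))
```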
Re: [R] font size in plots
In general, most methods for R's generic plot command (try: getAnywhere(plot.hclust)) in R's base graphics system accept further arguments in the (...) portion that provide these sorts of capabilities. ?par will tell you about these further graphical parameters. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Federico Abascal Sent: Wednesday, February 14, 2007 7:42 AM To: r-help@stat.math.ethz.ch Subject: [R] font size in plots Dear members of the list, it is likely a stupid question but I cannot find the information neither in R manuals nor in google. I am generating a plot (from hclust results) but I cannot see properly the labels because the default font size is too large. How can I change it? Thanks! Federico __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
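For the hclust case specifically, the cex graphical parameter (see ?par) passed through plot's ... argument shrinks the leaf labels; a minimal sketch on built-in data (the data set choice is illustrative):

```r
hc <- hclust(dist(USArrests[1:10, ]))  # any small data set will do
pdf(tempfile(fileext = ".pdf"))        # off-screen device for portability
plot(hc, cex = 0.6)                    # smaller dendrogram labels
dev.off()
```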
Re: [R] Putting splom in a function
Roberto: You need to do what ?xyplot says. as.symbol(groups) is not a variable and is certainly not subsettable. If you have a variable named groups in your data.frame, then ... groups = groups gives the grouping according to the levels of that variable. If you do not, then it may be picking up a variable named groups from somewhere else, probably your global workspace, which may be producing your unexpected results. Or perhaps there is a variable named groups in your data frame which is not what you think it is. Have you checked? In any case, please examine or run the examples in ?xyplot, especially those that use the groups = argument. One note: I do grant you that the phrase "variable or expression" may be confusing in this context. But do note that ?as.expression explicitly says: "'Expression' here is not being used in its colloquial sense, that of mathematical expressions. Those are calls (see call) in R, and an R expression vector is a list of calls etc, typically as returned by parse." What is meant by the phrase in the xyplot help is "expression" in its colloquial sense of a math (or more generally, any R) expression, not a formal expression object, which is what the cast as.expression() gives. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Roberto Perdisci Sent: Wednesday, February 14, 2007 1:05 PM To: r-help@stat.math.ethz.ch Subject: [R] Putting splom in a function Hello R list, I have a little problem with splom.
I'd like to wrap it in a function, for example:

multi.scatterplot <- function(data, groups, cols, colors) {
    splom(~data[, cols], groups = as.symbol(groups), data = data,
          panel = panel.superpose, col = colors)
}

and then call it like in

multi.scatterplot(iris, Species, 1:4, c("green", "blue", "red"))

but the problem is: Error in form$groups[form$subscr] : object is not subsettable. If I use groups = groups instead of groups = as.symbol(groups), something is plotted, but not the correct scatterplot. I think the problem is that I don't cast the 'groups' variable to the correct type. Besides as.symbol() I tried also as.expression(), because ?xyplot says: groups: a variable or expression to be evaluated in the data frame specified by 'data'. What is the correct type? What as.* should I use? thank you, regards, Roberto
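One way to make such a wrapper work — a sketch, not from the original thread — is to pass the grouping column's name as a character string and look it up in the data frame yourself; an actual vector supplied to groups simply evaluates to itself:

```r
library(lattice)

# Wrapper that takes the grouping column by name (a character string)
# and extracts it from the data frame before handing it to splom().
multi.scatterplot <- function(data, groups, cols, colors) {
  splom(~data[, cols], data = data,
        groups = data[[groups]],   # the vector itself, not a symbol
        col = colors)
}

p <- multi.scatterplot(iris, "Species", 1:4, c("green", "blue", "red"))
print(p)   # lattice objects must be printed to draw
```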
Re: [R] Problem with subsets and xyplot
?aggregate says: "... the result is reformatted into a data frame containing the variables in by and x. The ones arising from by contain the unique combinations of grouping values used for determining the subsets, and the ones arising from x the corresponding summary statistics for the subset of the respective variables in x."

So meanbymsa does not have the same number of rows as your original data frame, which it must for subsetting to work properly (meanbymsa[,2] was recycled to be of the right length by default, which produces the nonsense you got; see ?xyplot).

Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter Flom Sent: Wednesday, February 07, 2007 12:10 PM To: [EMAIL PROTECTED] Subject: [R] Problem with subsets and xyplot

Hello

I have a dataframe that looks like this:

   MSA           CITY  HIVEST YEAR   YR CAT
1 0200    Albuquerque    0.50 1996 1996   5
2 0520        Atlanta   13.00 1997 1997   5
3 0720      Baltimore   29.10 1994 1994   1
4 0720      Baltimore   13.00 1995 1995   5
5 0720      Baltimore    3.68 1996 1996   3
6 0720      Baltimore    9.00 1997 1997   5
7 0720      Baltimore   11.00 1998 1998   5
8 0875 Bergen-Passaic   51.80 1990 1990   5
(many more rows)

I would like to create some xyplots, but separately for MSAs that are high, moderate or low on HIVEST. Here's what I tried:

## READ IN DATA AND RECODE SOME VARIABLES
attach(hivest)
cat <- CAT
cat[cat > 5] <- 6
msa <- as.numeric(MSA)
msa[msa == 7361] <- 7360
msa[msa == 7362] <- 7360
msa[msa == 7363] <- 7360
msa[msa == 5601] <- 5600
msa[msa == 5602] <- 5600
msa[msa == 6484] <- 6483

## FIND MEANS FOR EACH MSA, FOR SUBSETTING LATER
meanbymsa <- aggregate(HIVEST, by = list(msa), FUN = mean, na.rm = TRUE)

meanbymsa[,2] gives me the column I want; the 25th percentile of this column is about 3.1. But when I try

plot1 <- xyplot(HIVEST ~ YEAR | as.factor(msa), pch = LETTERS[cat],
                subset = (meanbymsa[,2] < 3.1))
plot1

I don't get what I expect.
No errors, and it is a subset, but the subset is NOT the MSAs with low values of HIVEST. Any help appreciated.

Peter

Peter L. Flom, PhD Assistant Director, Statistics and Data Analysis Core Center for Drug Use and HIV Research National Development and Research Institutes 71 W. 23rd St, New York, NY 10010 http://cduhr.ndri.org www.peterflom.com (212) 845-4485 (voice) (917) 438-0894 (fax)
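One way to fix the length mismatch — a sketch assuming aggregate() returns its default column names Group.1 and x — is to map each row's MSA back to its group mean with match(), so the logical subset has one entry per original row:

```r
# Toy stand-in for the poster's data (hypothetical values).
msa    <- c(200, 520, 720, 720, 720)
HIVEST <- c(0.5, 13, 29.1, 13, 3.68)

# Per-MSA means: one row per unique MSA, columns Group.1 and x.
meanbymsa <- aggregate(HIVEST, by = list(msa), FUN = mean, na.rm = TRUE)

# Expand the group means back to the length of the original data,
# then build a row-wise logical vector usable as subset = in xyplot().
keep <- meanbymsa$x[match(msa, meanbymsa$Group.1)] < 3.1
length(keep) == length(HIVEST)   # TRUE: one entry per original row
```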
Re: [R] R in Industry
... two main drawbacks of R at our firm (as viewed by our IT dept) are lack of guaranteed support as well as the difficulty in finding candidates. --

Just an aside: lack of guaranteed support -- absolutely true in theory, absolutely false in practice. I doubt that the voluntary support found on r-help and other R lists can be matched by the guaranteed support of any commercial software product. Not that this makes a difference to the IT group's requirements, of course...

Cheers, Bert
Re: [R] lme in R and Splus-7
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of yyan liu Sent: Monday, February 05, 2007 11:25 AM To: r-help@stat.math.ethz.ch Subject: [R] lme in R and Splus-7

Hi: I used the function lme in R and Splus-7. With the same dataset and the same arguments for the function, I got quite different estimation results from these two software packages. Has anyone had this experience before?

Why don't you try searching the archives yourself to see?

Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374
Re: [R] strange error in robust package
Probably not worth the effort to try and figure out. Try reinstalling the latest version of the package and repeating. Maybe something got corrupted. Also, while you're at it, make sure you have the latest version of R installed and that all your other packages are up to date (robust uses some of them).

Bert Gunter Nonclinical Statistics 7-7374

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Monica Pisica Sent: Monday, February 05, 2007 2:34 PM To: r-help@stat.math.ethz.ch Subject: [R] strange error in robust package Importance: High

Hi everybody, I am using the robust package quite frequently and until now I never had any problems. Actually the last time I used it was last Friday, very successfully. Anyway, today any time I want to use the function fit.models I get the following error, even if I use the example from the help file:

data(woodmod.dat)
woodmod.fm <- fit.models(list(Robust = covRob, Classical = cov),
                         data = woodmod.dat)
Error in donostah(data, control) : object ".Random.seed" not found
Error in model.list[[i]] : subscript out of bounds

Does anybody know what is wrong? Thanks, Monica Palaseanu-Lovejoy USGS / ETI Pro St. Petersburg, FL
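A possible workaround (my assumption, not from the original thread): the error suggests the package reads .Random.seed directly, and that object only exists in the workspace once the random number generator has been used. Seeding the generator first creates it:

```r
# .Random.seed is only created in the global environment after the RNG
# has been touched; in a fresh workspace it may be absent.
set.seed(123)            # initializes the RNG state
exists(".Random.seed")   # TRUE after seeding
```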
Re: [R] Loop with string variable AND customizable summary output
Prior answers are certainly correct, but this is where lists and lapply shine:

result <- lapply(list(UK, USA), function(z) summary(lm(y ~ x, data = z)))

As in (nearly) all else, simplicity is a virtue. If you prefer to keep the data sources as a character vector, dataNames,

result <- lapply(dataNames, function(z) summary(lm(y ~ x, data = get(z))))

should work. Note: both of these are untested for the general case where they might be used within a function and may not find the right z unless you pay attention to scope, especially in the get() construction.

Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Monday, January 29, 2007 8:23 AM To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Loop with string variable AND customizable summary output

Dear All, Thank you very much for your help! Carlo

-Original Message- From: Wensui Liu [mailto:[EMAIL PROTECTED] Sent: Mon 29/01/2007 15:39 To: Rosa,C Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Loop with string variable AND customizable summary output

Carlo, try something like:

for (i in c("UK", "USA")) {
    summ <- summary(lm(y ~ x, subset = (country == i)))
    assign(paste('output', i, sep = ''), summ)
}

(note: it is untested, sorry).

On 1/29/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Dear All, I am using R for my research and I have two questions about it: 1) is it possible to create a loop using a string, instead of a numeric vector?
I have in mind a specific problem: Suppose you have 2 countries, UK and USA, one dependent (y) and one independent (x) variable for each country (that is: yUK, xUK, yUSA, xUSA), and you want to run the following regressions automatically:

for (i in c("UK", "USA")) output{i} <- summary(lm(y{i} ~ x{i}))

In other words, at the end I would like to have two objects as output, outputUK and outputUSA, which contain respectively the results of the first and second regression (yUK on xUK and yUSA on xUSA).

2) in STATA there is a very nice command (outreg) to display your regression results nicely (and as the user wants to). Is there anything similar in R or its contributed packages? More precisely, I am thinking of something that is close in spirit to summary but is also customizable. For example, suppose you want different Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1, or a different display format (i.e. without the t value column), implemented automatically (without manually editing it every time). Alternatively, if I were able to see it, I could modify the source code of the function summary, but I am not able to see its (line by line) code. Any idea? Or maybe a customizable regression output already exists? Thanks really a lot! Carlo

-- WenSui Liu A lousy statistician who happens to know a little programming (http://spaces.msn.com/statcompute/blog)
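A runnable sketch of the lapply approach from the reply, using a built-in dataset (mtcars standing in for the per-country data frames):

```r
# Split a built-in data frame into groups, fit the same model to each
# piece, and keep the summaries in a named list -- no assign() needed.
dat  <- split(mtcars, mtcars$am)          # one data frame per group
fits <- lapply(dat, function(z) summary(lm(mpg ~ wt, data = z)))

names(fits)            # "0" "1" -- one summary per group
fits[["0"]]$r.squared  # R^2 for the am == 0 subset
```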
Re: [R] strange behaviour with equality after simple subtraction
R FAQ 7.31. ?all.equal ?identical. Have you read these?

Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mike Prager Sent: Friday, January 26, 2007 8:41 AM To: r-help@stat.math.ethz.ch Subject: Re: [R] strange behaviour with equality after simple subtraction

martin sikora [EMAIL PROTECTED] wrote:

today while trying to extract data from a list for subsequent analysis, i stumbled upon this funny behavior on my system:

x <- c(0.1, 0.9)
1 - x[2]
[1] 0.1
x[1]
[1] 0.1
x[1] == 1 - x[2]
[1] FALSE
x[1] > 1 - x[2]
[1] TRUE

Not at all strange; an expected property of floating-point arithmetic and one of the most frequently asked questions here.

print(0.1, digits = 17)
[1] 0.10000000000000001
print(1 - 0.9, digits = 17)
[1] 0.099999999999999978

A simple description of the issue is at http://docs.python.org/tut/node16.html

In most cases, it suffices to test for approximate difference or relative difference. The former would look like this:

if (abs(x[1] - x[2]) < eps) ...

with eps set to something you think is an insignificant difference, say 1.0e-10.

-- Mike Prager, NOAA, Beaufort, NC * Opinions expressed are personal and not represented otherwise. * Any use of tradenames does not constitute a NOAA endorsement.
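The comparison the reply points to, sketched with all.equal(), which tests equality up to a numeric tolerance rather than bit-for-bit:

```r
x <- c(0.1, 0.9)

x[1] == 1 - x[2]                   # FALSE: exact bit-for-bit comparison
isTRUE(all.equal(x[1], 1 - x[2]))  # TRUE: equal within default tolerance
abs(x[1] - (1 - x[2])) < 1e-10     # TRUE: explicit epsilon test
```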
Re: [R] Robust PCA?
You seem not to have received a reply. You can use cov.rob in MASS or covMcd in robustbase (or undoubtedly others) to obtain a robust covariance matrix and then use that for PCA.

-- Bert

Bert Gunter Nonclinical Statistics 7-7374

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Talbot Katz Sent: Thursday, January 18, 2007 11:44 AM To: r-help@stat.math.ethz.ch Subject: [R] Robust PCA?

Hi. I'm checking into robust methods for principal components analysis. There seem to be several floating around. I'm currently focusing my attention on a method of Hubert, Rousseeuw, and Vanden Branden (http://wis.kuleuven.be/stat/Papers/robpca.pdf), mainly because I'm familiar with other work by Rousseeuw and Hubert in robust methodologies. Of course, I'd like to obtain code for this method, or another good robust PCA method, if there's one out there. I haven't noticed the existence on CRAN of a package for robust PCA (the authors of the ROBPCA method do provide MATLAB code).

-- TMK -- 212-460-5430 home 917-656-5351 cell
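A minimal sketch of the suggestion, using cov.rob() from MASS on a built-in dataset; princomp() accepts a covariance list of the kind cov.rob() returns via its covmat argument:

```r
library(MASS)

# Robust covariance estimate, then PCA on that estimate instead of
# the classical covariance matrix.
rc <- cov.rob(stackloss)             # MVE-based robust covariance by default
pc <- princomp(stackloss, covmat = rc)
summary(pc)                          # variance explained per component
```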
Re: [R] Effect size in GLIM models
Folks: I think this and several other recent posts on ranking predictors are nice illustrations of a fundamental conundrum: Empirical models are fit as good *predictors*; meaningful interpretation of separate parameters/components of the predictors may well be difficult or impossible, especially in complex models. All that the fitting process guarantees if it works well is a good overall predictor to data sampled from the same process. Unfortunately, most/much of the time, those who apply the procedures are interested in interpretation, not prediction. Addendum: Interpretation is helped by well-designed studies and experiments, hindered by data mining of observational data. I don't think any of this is profound, just sometimes forgotten; however, I would welcome public or private reaction to this comment, and especially refinement/corrections. Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Prof Brian Ripley Sent: Wednesday, January 17, 2007 6:02 AM To: Behnke Jerzy Cc: Reader Tom; r-help@stat.math.ethz.ch Subject: Re: [R] Effect size in GLIM models On Wed, 17 Jan 2007, Behnke Jerzy wrote: Dear All, I wonder if anyone can advise me as to whether there is a consensus as to how the effect size should be calculated from GLIM models in R for any specified significant main effect or interaction. I think there is consensus that effect sizes are not measured by significance tests. If you have a log link (you did not say), the model coefficients have a direct interpretation via multiplicative increases in rates. In investigating the causes of variation in infection in wild animals, we have fitted 4-way GLIM models in R with negative binomial errors. What exactly do you mean by 'GLIM models in R with negative binomial errors'? Negative binomial regression is within the GLM framework only for fixed shape theta. 
Package MASS has glm.nb() which extends the framework and you may be using without telling us. (AFAIK GLIM is a software package, not a class of models.) I suspect you are using the code from MASS without reference to the book it supports, which has a worked example of model selection. These are then simplified using the step procedure, and finally each of the remaining terms is deleted in turn, and the model without that term compared to a model with that term to estimate probability. 'probability' of what? An ANOVA of each model gives the deviance explained by each interaction and main effect, and the percentage deviance attributable to each factor can be calculated from the NULL deviance. If theta is not held fixed, anova() is probably not appropriate: see the help for anova.negbin. However, we estimate probabilities by subsequent deletion of terms, and this gives the LR statistic. Expressing the value of the LR statistic as a percentage of 2 x log-likelihood in a model without any factors gives lower values than the former procedure. I don't know anything to suggest percentages of LR statistics are reasonable summary measures. There are extensions of R^2 to these models, but AFAIK they share the well-attested drawbacks of R^2. Are either of these appropriate? If so which is best, or alternatively how can % deviance be calculated? We require % deviance explained by each factor or interaction, because we need to compare individual factors (say host age) across a range of infections. Any advice will be most gratefully appreciated. I can send you a worked example if you require more information. We do ask for more information in the posting guide and the footer of every message. I have had to guess uncomfortably much in formulating my answers. Jerzy M.
Behnke, The School of Biology, The University of Nottingham, University Park, NOTTINGHAM, NG7 2RD

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK Tel: +44 1865 272861 (self), +44 1865 272866 (PA) Fax: +44 1865 272595
Re: [R] R editor vs. Tinn-R
Thierry: "Instead of discussing this odd behaviour of TINN-R, I would prefer a discussion on importing data through the clipboard. In my opinion it isn't a good idea to import data with the clipboard. I know that it's a quick and dirty way to get your data into R fast. But I see two major drawbacks. First of all you have no chance of checking what data you imported. This is important when you need to check your results a few days (weeks, months or even years) later. A second drawback is that you won't feel the need to store your data in an orderly fashion. Which often leads to a huge pile of junk, instead of a valuable dataset..."

- I do not understand this. I do this all the time, easily check the data in R (which has all sorts of powerful capabilities to do this), and easily store the data as part of the .Rdata file that also contains the functions, transformations, analyses, etc. that I have used on the data. I do not know what is more orderly and useful than that! So would you care to elaborate?

Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404
Re: [R] eval(parse(text vs. get when accessing a function
?? Or to add to what Peter Dalgaard said... (perhaps for the case of many more functions): Why eval(parse())? What's wrong with if-else?

g <- function(fpost, x) { if (fpost == 1) f.1 else f.2 }(x)

or switch() if you have more than 2 possible arguments? I think your remarks reinforce the wisdom of Thomas's axiom.

Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ramon Diaz-Uriarte Sent: Friday, January 05, 2007 10:02 AM To: r-help; [EMAIL PROTECTED] Subject: [R] eval(parse(text vs. get when accessing a function

Dear All, I've read Thomas Lumley's fortune "If the answer is parse() you should usually rethink the question." But I am not sure if that also applies (and why) to other situations (Lumley's comment http://tolstoy.newcastle.edu.au/R/help/05/02/12204.html was in reply to accessing a list). Suppose I have similarly called functions, except for a postfix. E.g.

f.1 <- function(x) { x + 1 }
f.2 <- function(x) { x + 2 }

And sometimes I want to call f.1 and some other times f.2 inside another function. I can either do:

g <- function(x, fpost) {
    calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
    calledf(x)
    ## do more stuff
}

Or:

h <- function(x, fpost) {
    calledf <- get(paste("f.", fpost, sep = ""))
    calledf(x)
    ## do more stuff
}

Two questions: 1) Why is the second better? 2) By changing g or h I could use do.call instead; why would that be better? Because I can handle differences in argument lists? Thanks, R.
-- Ramón Díaz-Uriarte Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://ligarto.org/rdiaz PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc) **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}}
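For completeness, a small runnable sketch of the get()-style lookup and the switch() alternative (function names as in the thread; tested only at the top level, so the scoping caveat from the reply still applies):

```r
f.1 <- function(x) x + 1
f.2 <- function(x) x + 2

# match.fun()/get() look the function up by its constructed name...
h <- function(x, fpost) {
  calledf <- match.fun(paste("f.", fpost, sep = ""))
  calledf(x)
}

# ...while switch() avoids name construction altogether.
g <- function(x, fpost) switch(as.character(fpost), "1" = f.1, "2" = f.2)(x)

h(10, 1)   # 11
g(10, 2)   # 12
```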
[R] Some Windows code for GUI-izing workspace loading
Folks: Motivated by the recent thread on setting working directories, below are a couple of functions for GUI-izing saving and loading files **in Windows only** that sort of take care of this automatically. The simple strategy is just to maintain a file consisting of the filenames of recently saved workspace (.Rdata, etc.) files. Whenever I save a workspace via the function mySave() below, the filename is chosen via a standard Windows file browser, and the filename where the workspace was saved is added to the list if it isn't already there. The recent() function then reads this file and brings up a standard Windows GUI list box (via select.list()) of the first k filenames (default k = 10) to load into the workspace **and** sets the working directory to that of the first file loaded (several can be brought in at once). I offer these functions with some trepidation: they are extremely simple and unsophisticated, and you definitely use them at your own risk. There is no checking nor warning for whether object names in one loaded file duplicate and hence overwrite those in another when more than one is loaded, for example. Nevertheless, I have found the functions handy, as I use the "recently used files" options in all my software all the time and wanted to emulate this for R. Suggestions for improvement (or better yet, code!) or information about bugs or other stupidities gratefully appreciated. Cheers,

Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404

#### Code Follows ####

mySave <- function(recentlistFile = paste("c:/Program Files/R", "recentFiles.txt", sep = "/"),
                   savePlots = FALSE)
{
    ## DESCRIPTION:
    ## Use a Windows GUI to save the current workspace.
    ## ARGUMENTS:
    ## recentlistFile: a quoted character string giving the full pathname/filename of
    ## the file containing the list of recent files.
    ## This must be the same as the filename argument of recent().
    ## The default saves the file in the global R program directory, which means it does not
    ## have to be changed when updating to new versions of R, which I store under
    ## the global R directory. You may need to change this if you have a different
    ## way of doing things.
    ##
    ## savePlots: logical. Should the .SavedPlots plot history be saved? This object can
    ## be quite large, and not saving it often makes saving and loading much faster,
    ## as well as avoiding memory problems. The default is not to save.
    if(!savePlots)
        if(exists(".SavedPlots", where = 1)) rm(".SavedPlots", pos = 1)
    fname <- choose.files(caption = 'Save As...', filters = Filters['RData', ], multi = FALSE)
    if(fname != "") {
        save.image(fname)
        if(!file.exists(recentlistFile)) write(fname, recentlistFile, ncol = 1)
        else {
            nm <- scan(recentlistFile, what = "", quiet = TRUE, sep = "\n")
            ## remove duplicate filenames and list in LIFO order
            write(unique(c(fname, nm)), recentlistFile, ncol = 1)
        }
    }
    else cat('\nWorkspace not saved\n')
}

recent <- function(filename = paste("c:/Program Files/R", "recentFiles.txt", sep = "/"),
                   nshow = 10, setwork = TRUE)
{
    ## DESCRIPTION:
    ## GUI-izes workspace loading by bringing up a select box of files containing
    ## recently saved workspaces to load into R.
    ## ARGUMENTS:
    ## filename: character. The full path name to the file containing the file list,
    ## which is a text file with the filenames, one per line.
    ##
    ## nshow: the maximum number of paths to show in the list.
    ##
    ## setwork: logical. Should the working directory be set to that of the first file
    ## loaded?
    ## find the file containing the filenames if it exists
    if(!file.exists(filename))
        stop("File containing recent files list cannot be found.")
    filelist <- scan(filename, what = character(), quiet = TRUE, sep = '\n')
    len <- length(filelist)
    if(!len) stop("No recent files")
    recentFiles <- select.list(filelist[1:min(nshow, len)], multiple = TRUE)
    if(!length(recentFiles)) stop("No files selected")
    i <- 0
    for(nm in recentFiles) {
        if(file.exists(nm)) {
            load(nm, envir = .GlobalEnv)
            i <- i + 1
            if(i == 1 && setwork) setwd(dirname(nm))
        }
        else cat('\nFile', nm, 'not found.\n')
    }
    cat('\n\n', i, paste(' file', ifelse(i == 1, '', 's'), ' loaded\n', sep = ""))
}
Re: [R] na.action and simultaneous regressions
Ravi: You misinterpreted my reply -- perhaps I was unclear. I did **not** say that lm() with a matrix response would do it, but that the apply construction or an explicit loop would. As you and the poster noted, lm() produces a separate fit to each column of only the rowwise complete data.

Bert Gunter

-Original Message- From: Ravi Varadhan [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 03, 2007 2:15 PM To: 'Bert Gunter'; 'Talbot Katz'; r-help@stat.math.ethz.ch Subject: RE: [R] na.action and simultaneous regressions

No, Bert, lm doesn't produce a list each of whose components is a separate fit using all the nonmissing data in the column. It is true that the regressions are independently performed, but when the response matrix is passed from lm on to lm.fit, only the complete rows are passed, i.e. rows with no missing values. I looked at the lm function, but it was not obvious to me how to fix it. In the following toy example, the degrees of freedom for the y1 regression should be 18 and that for y2 should be 15, but both are only 15.

y1 <- runif(20)
y2 <- c(runif(17), rep(NA, 3))
x <- rnorm(20)
summary(lm(cbind(y1, y2) ~ x))

Response y1 :

Call: lm(formula = y1 ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52592 -0.22632 -0.00964  0.25117  0.31227

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.56989    0.06902   8.257 5.82e-07 ***
x           -0.12325    0.06516  -1.891    0.078 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2798 on 15 degrees of freedom
Multiple R-Squared: 0.1926, Adjusted R-squared: 0.1387
F-statistic: 3.577 on 1 and 15 DF, p-value: 0.07804

Response y2 :

Call: lm(formula = y2 ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-0.48880 -0.28552 -0.06022  0.23167  0.54425

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.43712    0.07686   5.687 4.31e-05 ***
x            0.10278    0.07257   1.416    0.177
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
0.1 ' ' 1

Residual standard error: 0.3115 on 15 degrees of freedom
Multiple R-Squared: 0.118, Adjusted R-squared: 0.05915
F-statistic: 2.006 on 1 and 15 DF, p-value: 0.1771

Ravi.

--- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health, Division of Geriatric Medicine and Gerontology, Johns Hopkins University. Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bert Gunter Sent: Wednesday, January 03, 2007 4:46 PM To: 'Talbot Katz'; r-help@stat.math.ethz.ch Subject: Re: [R] na.action and simultaneous regressions

As the Help page says: "If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix." So there's nothing hidden going on behind the scenes, and

apply(cbind(y1, y2), 2, function(z) lm(z ~ x))

(or an explicit loop, of course) will produce a list each of whose components is a separate fit using all the nonmissing data in the column.

Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Talbot Katz Sent: Wednesday, January 03, 2007 11:56 AM To: r-help@stat.math.ethz.ch Subject: [R] na.action and simultaneous regressions

Hi. I am running regressions of several dependent variables using the same set of independent variables. The independent variable values are complete, but each dependent variable has some missing values for some observations; by default, lm(y1~x) will carry out the regressions using only the observations without missing values of y1. If I do lm(cbind(y1,y2)~x), the default will be to use only the observations for which neither y1 nor y2 is missing. I'd like to have the regression for each separate dependent variable use all the non-missing cases for that variable.
I would think that there should be a way to do that using the na.action option, but I haven't seen this in the documentation or figured out how to do it on my own. Can it be done this way, or do I have to code the regressions in a loop? (By the way, since it restricts to non-missing values in all the variables simultaneously, is this because it's doing some sort of SUR or other simultaneous-equation estimation behind the scenes?)

Thanks!

--  TMK  --
212-460-5430  home
917-656-5351  cell

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
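For the archive, the apply/loop construction Bert describes can be sketched as follows (toy data mirroring Ravi's example; variable names are illustrative). Each fit then keeps all non-missing rows of its own response:

```r
# Fit each response separately so every regression uses all of the
# non-missing cases for that response.
set.seed(1)
x  <- rnorm(20)
y1 <- runif(20)
y2 <- c(runif(17), rep(NA, 3))

fits <- lapply(list(y1 = y1, y2 = y2), function(y) lm(y ~ x))
sapply(fits, df.residual)
# y1 has 18 residual df, y2 has 15 -- unlike lm(cbind(y1, y2) ~ x),
# which drops rows with any missing response and gives 15 for both.
```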
Re: [R] sorting by name
This is trivial. help("[") and "An Introduction to R" will tell you how.

P.S. As earlier posts today have mentioned, stepwise variable selection is generally a bad idea.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Brooke LaFlamme
Sent: Thursday, December 14, 2006 4:34 PM
To: r-help@stat.math.ethz.ch
Subject: [R] sorting by name

Hi all, I'm not sure that there is really a way to do this, but I thought I'd see if anyone knew. I have a file with 1 to n columns all named something like X1, X2, X3, ..., Xn. I have another file that has in one column n number of rows. Each row has a number in it (not in order; the ordering of the numbers is important but it isn't in count order). Basically, I would like to order the columns in the first file by the numbers in the rows of the second file. So, if file #2 has these numbers in rows 1-4:

     [,1]
[1,]    2
[2,]    3
[3,]    1
[4,]    4

I would like the first file to look like this:

X2 X3 X1 X4
1

instead of the original order:

X1 X2 X3 X4
1

Is this possible? The point of this all is to run a stepwise linear regression that first regresses on X2, then adds in X3, X1, X4 in that order, stopping at each step to assess whether to drop one or more of the previously added variables.

Thank you in advance for any suggestions!

Brooke LaFlamme

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
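To spell out the "[" indexing Bert points to, a minimal sketch (object names hypothetical):

```r
# Reorder the columns of the first file by the index column read
# from the second file -- plain "[" indexing does the whole job.
dat <- data.frame(X1 = 1:2, X2 = 3:4, X3 = 5:6, X4 = 7:8)
ord <- c(2, 3, 1, 4)      # numbers read from the second file
dat2 <- dat[, ord]
names(dat2)               # "X2" "X3" "X1" "X4"
```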
Re: [R] 2 questions
Warning: Something of a personal rant, clearly reflecting my own hangups, and having nothing to do with my company or anyone else! A good reason for most to stop reading now ...

While you will find respondents on this list on the whole quite gracious in their willingness to help newbies learn R, there are limits to this patience. In particular, questions of the sort below seem to me, at least, to be clear announcements that the original poster has **not** read the posting guide, nor made an honest attempt to learn R by studying what I think is quite good basic documentation (see, for example, "An Introduction to R"; CRAN lists many more similar resources). While I grant that sometimes the online help is a bit terse, I don't think that anyone who has made an honest attempt to read the basic docs would ask such questions. If I am wrong in this, I apologize. But, if not, then I consider the questions unworthy of my time to respond to. Whether right or wrong, queries posted in a way that conveys that impression are less likely to elicit good replies. I guess the moral is that on this list anyway, good behavior is rewarded, and bad behavior is ignored.

Cheers, Bert Gunter

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Van Campenhout Bjorn
Sent: Tuesday, December 12, 2006 7:51 AM
To: r-help
Cc: [EMAIL PROTECTED]
Subject: Re: [R] 2 questions

Hi! I'm new here. Want to ask two possibly quite basic questions:

1. How can I clear all objects in one stroke? How about rm(ls())?

Try rm(list = ls()).

Bjorn

2. How can I perform a regression with independent variables specified by an object?

Hm, no spontaneous idea.

Greetings, Sebastian

Thanks, Tim

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
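For completeness, both questions have short answers; a sketch (the variable and data-set names in Q2 are hypothetical):

```r
# Q1: rm() quotes its "..." arguments, so rm(ls()) fails; pass the
# character vector via list= instead. Shown in a scratch environment
# so nothing else in the workspace is touched.
e <- new.env()
assign("a", 1, envir = e)
assign("b", 2, envir = e)
rm(list = ls(envir = e), envir = e)
ls(envir = e)                      # character(0)

# Q2: build a formula from a character vector of predictor names.
vars <- c("wt", "disp")            # predictors held in an object
f <- reformulate(vars, response = "mpg")
coef(lm(f, data = mtcars))         # fits mpg ~ wt + disp
```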
Re: [R] combinations of m objects into r groups
This issue has come up before: RSiteSearch("nkpartitions") will find references for you on CRAN. You might also try http://ranau.cs.ui.ac.id/book/AlgDesignManual/BOOK/BOOK4/NODE153.HTM for some background, or google on "set partitions". Bottom line: it ain't trivial.

Cheers, Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Maria Montez
Sent: Tuesday, December 12, 2006 4:07 PM
To: r-help@stat.math.ethz.ch
Subject: [R] combinations of m objects into r groups

Hi! Suppose I have m objects. I need to find out what are all possible ways I can group those m objects into r groups. Moreover, I need to create a matrix that contains what those arrangements are. I've created code for when r=2 but I've come to a halt when trying to generalize it into r groups. For example, if I have m=6 objects and I want to arrange them into groups of r=2, there are a total of 41 possible arrangements. I would like a matrix of the form (showing only 9 possible arrangements):

  c1 c2 c3 c4 c5 c6 c7 c8 c9
1  1  2  2  2  2  2  1  1  1
2  2  1  2  2  2  2  1  2  2
3  2  2  1  2  2  2  2  1  2
4  2  2  2  1  2  2  2  2  1
5  2  2  2  2  1  2  2  2  2
6  2  2  2  2  2  1  2  2  2

This means that arrangement c1 puts object 1 into group 1 and all other objects into group 2. I've created code for this particular example with two groups. I'm using the subsets function which I've found posted online, in a post that references page 149 of Venables and Ripley (2nd ed).
# subsets function computes all possible combinations of n objects,
# r at a time
subsets <- function(r, n, v = 1:n) {
  if (r <= 0) NULL
  else if (r >= n) v[1:n]
  else rbind(cbind(v[1], Recall(r - 1, n - 1, v[-1])),
             Recall(r, n - 1, v[-1]))
}

# labels for objects
r <- c("1100", "1010", "1001", "0110", "0101", "0011")
m <- length(r)
for (k in 1:trunc(m/2)) {
  a <- subsets(k, m)
  for (i in 1:dim(a)[1]) {
    sub <- rep(2, m)
    b <- a[i, ]
    for (j in 1:length(b)) {
      sub[b[j]] <- 1
    }
    r <- data.frame(r, sub)
  }
}
names <- c("xcomb")
for (i in 1:(dim(r)[2] - 1)) {
  names <- c(names, paste("c", i, sep = ""))
}
names(r) <- names

Any suggestions?

Thanks, Maria

After searching for help I found a

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
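One base-R sketch of the idea for the r = 2 case (the hard part, as Bert notes, is doing this efficiently for general r):

```r
# Enumerate partitions of m objects into two non-empty groups by the
# group label of each object; fixing object 1 in group 1 avoids
# counting relabelled duplicates twice.
m <- 6
grid <- as.matrix(expand.grid(rep(list(1:2), m - 1)))
part <- cbind(1, grid)                  # one row per candidate split
part <- part[rowSums(part == 2) > 0, ]  # drop "everything in group 1"
nrow(part)                              # 31 = Stirling number S(6, 2)
```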
Re: [R] Remove from a string
I second Marc's comments below, but for amusement, another alternative to the (undesirable) eval(call()) construction is:

foo <- function(x) x^2
get("foo")(1:5)
[1]  1  4  9 16 25

I believe this is equally undesirable, however, and as Marc said, making your function a function of two arguments or something similar would be the better approach.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz
Sent: Friday, December 08, 2006 6:14 AM
To: Katharina Vedovelli
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Remove from a string

On Fri, 2006-12-08 at 14:57 +0100, Katharina Vedovelli wrote:

Hi all! I have lots of functions called in the following pattern 'NameOfFunctionNumber', where the name always stays the same and the number varies from 1 to 98. Another function which I run in advance returns the number of the function which has to be called next. Now I want to combine 'NameOfFunction' with the 'Number' returned so that I can call the desired function. I do this by:

x <- c("NameOfFunction", Number)
z <- paste(x, collapse = "")
z

which returns "NameOfFunctionNumber". My problem is that R doesn't recognise this as the name of my function because of the quotation marks at the beginning and the end. Is there a way of getting rid of those? Or does anybody know another way of solving this problem? Thanks a lot for your help!

Cheers, Katharina

It is not entirely clear what your ultimate goal is, thus there may be a (much) better approach than calling functions in this manner. What do the functions actually do, and does the output vary based upon some attribute (i.e. the class) of the argument, such that using R's typical function dispatch method would be more suitable?
However, to address the specific question, at least two options:

NameOfFunction21 <- function(x) x^2

eval(call(paste("NameOfFunction", 21, sep = ""), 21))
[1] 441

do.call(paste("NameOfFunction", 21, sep = ""), list(21))
[1] 441

In both cases, the result is to evaluate the function call, with 21 as the argument. See ?call, ?eval and ?do.call for more information.

HTH, Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
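A design that avoids name-pasting altogether is often cleaner still: keep the 98 functions in a list and index by number (the functions below are hypothetical stand-ins):

```r
# Store the functions in a list; the "next function" number returned
# by the earlier step is then just a list index.
funs <- list(function(x) x^2,
             function(x) x + 1)
k <- 1               # number returned by the preceding function
funs[[k]](21)        # 441
```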
Re: [R] Summary shows wrong maximum
Folks: Is

"So this is at best a matter of opinion, and credentials do matter for opinions." -- Brian Ripley

an R fortunes candidate?

-- Bert Gunter

On Tue, 5 Dec 2006, Oliver Czoske wrote:

On Mon, 4 Dec 2006, Uwe Ligges wrote:

Sebastian Spaeth wrote:

Hi all, I have a list with a numerical column cum_hardreuses. By coincidence I discovered this:

max(libs[, "cum_hardreuses"])
[1] 1793
summary(libs[, "cum_hardreuses"])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1       2       4      36      14    1790

(note the max value of 1790) Ouch, this is bad! Anything I can do to remedy this? Known bug?

No, it's a feature! See ?summary: printing is done up to 3 significant digits by default.

Unfortunately, '1790' is printed with *four* significant digits, not three. The correct representation with three significant digits would have to employ scientific notation, 1.79e3.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA)
Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Usage of apply
But do note -- again! -- that the apply family of functions do their magic **internally through looping**, so that they are generally not much faster -- and sometimes a bit slower -- than explicit loops. Their chief advantage (IMO, of course) is in code clarity and correctness, which is why I prefer them. (They are also written to do their looping as efficiently as possible, which explicit looping in user code may not.) Of course, vectorized calculations (colMeans() in the example below) **are** much faster and usually clearer than explicit loops.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chuck Cleland
Sent: Wednesday, December 06, 2006 6:54 AM
To: R Help
Subject: Re: [R] Usage of apply

Jin Shusong wrote:

Dear R Users, Are there any documents on the usage of apply, tapply, sapply so that I avoid explicit loops? I found that these three functions were quite hard to understand. Thank you in advance.

If you have read the help pages for each and possibly even consulted the references on those help pages, you may need to elaborate on what parts of these functions you don't understand. You might also describe a loop you are contemplating and ask how it might be replaced by one of these functions. Here is a very simple example of a loop that could be avoided with one of these functions:

for(i in 1:4){print(mean(iris[,i]))}
[1] 5.843333
[1] 3.057333
[1] 3.758
[1] 1.199333

Here is how you would do that with apply():

apply(iris[,1:4], 2, mean)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333

Even better in this particular case would be:

colMeans(iris[,1:4])
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333

but you don't always want mean() or sum() as the function, so the functions you mention above are more general than colMeans() and similar functions.
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
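Bert's claim about apply() versus loops versus vectorized code is easy to check directly (absolute timings vary by machine; the point is the relative order):

```r
# apply() loops internally, so it is comparable to an explicit loop;
# colMeans() is compiled, vectorized code and much faster.
x <- matrix(rnorm(1e6), ncol = 100)
system.time(for (j in 1:ncol(x)) mean(x[, j]))  # explicit loop
system.time(apply(x, 2, mean))                  # looping inside apply
system.time(colMeans(x))                        # vectorized
all.equal(colMeans(x), apply(x, 2, mean))       # same answer: TRUE
```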
Re: [R] Summary shows wrong maximum
Mike: I offered no opinion -- and really didn't have any -- about the worthiness of any of the comments that were made. I just liked Brian's little quotable aside. But since you bait me a bit ...

In general, I believe that showing the 2-3 most important -- **not significant** -- digits **and no more** is desirable. By "most important" I mean the leftmost digits which are changing in the data (there are some caveats in the presence of extreme outliers). Printing more digits merely obfuscates the ability of the eye/brain to perceive the patterns of change in the data, the presumed intent of displaying it (not of storing it, of course). Displaying excessive digits to demonstrate (usually falsely) one's precision is evil. Clarity of communication is the standard we should aspire to. These views have been more eloquently expressed by A. S. C. Ehrenberg and Howard Wainer, among others...

-- Bert

Bert Gunter
Nonclinical Statistics
7-7374

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mike Prager
Sent: Wednesday, December 06, 2006 11:46 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Summary shows wrong maximum

I don't know about candidacy, and I'm not going to argue about correctness, but it seems to me that the only valid reasons to limit precision of printing in a statistics program are (1) to save space and (2) to allow for machine limitations. This is neither. To chop off information and replace it with zeroes is just plain nasty.

Bert Gunter [EMAIL PROTECTED] wrote:

Folks: Is

"So this is at best a matter of opinion, and credentials do matter for opinions." -- Brian Ripley

an R fortunes candidate?

-- Bert Gunter

On Tue, 5 Dec 2006, Oliver Czoske wrote:

On Mon, 4 Dec 2006, Uwe Ligges wrote:

Sebastian Spaeth wrote:

Hi all, I have a list with a numerical column cum_hardreuses. By coincidence I discovered this:

max(libs[, "cum_hardreuses"])
[1] 1793
summary(libs[, "cum_hardreuses"])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1       2       4      36      14    1790

(note the max value of 1790) Ouch, this is bad! Anything I can do to remedy this? Known bug?

No, it's a feature! See ?summary: printing is done up to 3 significant digits by default.

Unfortunately, '1790' is printed with *four* significant digits, not three. The correct representation with three significant digits would have to employ scientific notation, 1.79e3.

--
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of trade names does not constitute a NOAA endorsement.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
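The rounding at issue in this thread can be reproduced in isolation; a sketch (how many digits summary() actually uses depends on its digits argument and your options):

```r
# The "wrong" maximum is just rounding to a fixed number of
# significant digits before printing:
signif(1793, 3)   # 1790 -- four printed digits, three significant
signif(1793, 4)   # 1793
```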
Re: [R] stat question - not R question so ignore if not interested
... But of course this is always the question underlying all empirical -- or maybe even scientific -- analysis: is there some other, perhaps more fundamental, variable out there that I'm missing that would explain what's really going on? I clearly remember George Box commenting on this in his Monday night beer and statistics sessions: after you're done and perhaps have written up and presented your (intricate!) analysis, you're always worried that someone might come along and say, "Well, did you consider ...?"

Cheers, Bert

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jonathan Baron
Sent: Tuesday, December 05, 2006 1:45 PM
To: Richard M. Heiberger
Cc: r-help@stat.math.ethz.ch; C. Park; Leeds, Mark (IED)
Subject: Re: [R] stat question - not R question so ignore if not interested

A classic example used by my colleague Paul Rozin (when he teaches Psych 1) is to compute the correlation between height and number of shoes owned, in the class. Shorter students own more shoes. But ...

On 12/05/06 16:34, Richard M. Heiberger wrote:

The missing piece is why there are two clusters. There is most likely a two-level factor distinguishing the groups that was not included in the model. It might not even have been measured, and now you need to find it.

Rich

--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] tests for NULL objects
Merely convention. NULL == 2 evaluates to logical(0), that is, a logical vector of length 0. It makes sense (at least to me) that any(logical(0)) is FALSE, since no elements of the vector are TRUE. all(logical(0)) is TRUE since no elements of the vector are FALSE. I think these are reasonable and fairly standard conventions, but even if you disagree, they are certainly not worth making a fuss over and certainly cannot be changed without breaking a lot of code, I'm sure.

Bert Gunter
Nonclinical Statistics
7-7374

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Benilton Carvalho
Sent: Wednesday, November 29, 2006 2:21 PM
To: R-Mailingliste
Subject: [R] tests for NULL objects

Hi Everyone, After searching the subject and not being successful, I was wondering if any of you could explain me the idea behind the following fact:

all(NULL == 2)  ## TRUE
any(NULL == 2)  ## FALSE

Thanks a lot, Benilton

--
Benilton Carvalho
PhD Candidate
Department of Biostatistics
Johns Hopkins University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
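The conventions Bert describes, verified at the prompt:

```r
# Comparison with NULL yields a zero-length logical vector, and
# any()/all() follow the usual vacuous-truth conventions on it.
identical(NULL == 2, logical(0))   # TRUE
any(logical(0))                    # FALSE: no element is TRUE
all(logical(0))                    # TRUE:  no element is FALSE
```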