First, a big thanks to all of the developers and users that have worked to make R such useful software. It is only because I find the software so useful that I have the following opinions.
A recent post to R-devel listed the 'Top 10 Features' for one person.
I found it to be quite an interesting read. Over the past couple of
years I have assembled my own lists.

Retrospective.

Some of my favorite things about R (vs. S-Plus):

1. Integration with emacs
2. Nice color handling
3. Wealth of packages, easy package updates
4. HTML help
5. More answers on R-help than S-news
6. Active developer community
7. Package creation tools
8. Functions: setwd, with, apropos

Prospective.

Periodically David Smith at Insightful asks users, "If you had $100,
how would you allocate that money to development?" Without listing
dollar amounts, these are my personal choices for R.

1. Add "head" and "tail" to R base.

   Patrick Burns has these:
   http://www.burns-stat.com/pages/public.html#genutil
   They are very handy functions for checking data manipulation.

2. Strive for self-contained examples in all .Rd files (as far as
   possible).

   The examples are generally quite good, but there's always room for
   improvement. For R base, if I create examples, to whom should I send
   them (R-devel?) and how (as a request for change?)

   Here's one example (by P. Dalgaard) for the function 'replace':

      # Replace NAs in a data frame with -1
      dd <- data.frame(a=c(1,2,NA,4), b=c(NA,2,3,4))
      dd[] <- lapply(dd, function(x) replace(x, is.na(x), -1))

3. Encourage (more) standards for function names.

   A prominent link on CRAN to the coding conventions would be good.
   Here is a draft of coding conventions:
   http://www.maths.lth.se/help/R/RCC/

   Partly as a result of the community development of R, the names of
   functions lack consistency. Consider the following examples:

      row.names, rownames
      browseURL, contrib.url, fixup.package.URLs
      package.contents, packageStatus
      mahalanobis, TukeyHSD
      getMethod, getS3method

   The sooner that conventions are encouraged, the more consistent
   future function names will be.

4. Increased integration of text and graphics output (for PDF, in
   particular).

   Sweave is fantastic for quality reporting, but can be a lot of work
   when a quick analysis is all that is needed. Often I would like to
   do something like print a box plot and include an anova table, for
   example:

      pdf("file")
      boxplot(y~x)
      frame()
      sink.to.pdf()    # wished-for function; no such tool exists
      frame()
      anova(lm(y~x))
      sink()
      dev.off()

   I know of no such (simple) tools. Ben Bolker has an idea here:
   http://maths.newcastle.edu.au/~rking/R/help/02b/4179.html

5. Drop unused factor levels by default. (At least as a settable
   option.)

   This issue has been debated before--I'm just adding my vote and
   justification. The proportion of time I want data to include unused
   factor levels is close to zero. The amount of time I spend cleaning
   data to get rid of unused factor levels is quite substantial.

6. Expanded font control for graphics devices.

   This is already being considered, so again I'm just adding my vote.
   See: http://www.stat.auckland.ac.nz/~paul/R/fonts.html

7. Clean up namespace implementation

   The introduction of namespaces has (for me) been a nuisance without
   any benefits that I am aware of. I speak as a user, not a package
   maintainer. I would like to see (1) more education about namespace
   benefits, (2) more discussion about the appropriate role for
   namespaces, and (3) improvements to the documentation, which is now
   often less correct (if not broken) due to namespaces. For example,
   help(is.function) doesn't say how functions hidden behind namespaces
   will be treated. Most help files completely ignore issues with
   namespaces.

   Some people will say, "Of course namespaces are working exactly as
   expected!" But that is only true if you expect functions to be
   hidden; quite a few versions of R trained users to expect otherwise.
   The quiet introduction of namespaces has broken my modus operandi
   for:

      args(predict.lme)
      is.function(predict.lme)
      predict.lme(object)
      exists("predict.lme")

   Namespaces may be neat/right from a language-design perspective, but
   they have made it more frustrating for me to actually use the
   software.

8. More consistency in the use of na.action and na.rm.

   Compare:

      mean(..., na.rm= ...)
      lme(..., na.action=... )

   Maybe na.action could be added to 'mean' and other functions. There
   are issues of compatibility with S-Plus here...

9. Add 'substitute' to getAnywhere

   Actually, the code for getAnywhere already contains 'substitute', so
   it looks like the author intended for the function to work without a
   quoted argument. That would be wonderful. Why, then, does
   getAnywhere("predict.lme") work but getAnywhere(predict.lme) does
   not? (Yet another namespace issue.)

   I'm not the first person to ask this question. Obviously I'm a
   member of the "blind" population that can't read help files:
   http://maths.newcastle.edu.au/~rking/R/help/03b/0760.html

   Another possibility is that the help file could be clearer for us
   blind folk who interpret "x: a character string or name" to mean
   that x might not be a character string. (See the help page.)

10. More uniformity in quoting arguments.

    Uniformity outweighs cleverness/exceptions ("The Art of Unix
    Programming").

    Functions accepting non-quoted arguments:

       is.function(obj)
       args(predict)
       rm(a)
       help(help)
       find(replace) or find("replace")

    Functions requiring quoted arguments:

       get("help")
       exists("predict.lme")

    Some people have claimed "the designers of S knew what they were
    doing" because you can do clever things like this:

       i="help"
       exists(i)

    But we could just as easily be doing other clever things and have
    more uniform quoting rules. S is probably too mature for this to
    really be considered.

11. Have 'aggregate' add logical/default names to its value.

    I'm basically echoing this thread:
    http://maths.newcastle.edu.au/~rking/R/help/03b/7517.html

    Using aggregate(x,by,FUN), I would find it very useful if the
    factor names in the "by" list carried through to the final
    aggregate data.frame. Also, when 'x' is a vector (and maybe for
    other data structures), it would be nice to have the original names
    included in the result.

12. Wanted: General-purpose mixed-models function/package

    The nlme library is very nice for mixed-effects models with nested
    effects, but it is not very general-purpose. Even Bates/Pinheiro
    have said several times in posts to R-help/S-news that nlme was
    designed for nested models and that using it for other models can
    be hard.

       Bates: "highly unintuitive" (crossed effects model)
       Bates: "algorithms for lme are tuned for nested random effects"

    For example, in nlme:

    - The syntax for crossed random effects is quite intimidating.
    - Try removing the variance component for Rep in:
      random=~1|Rep/WholePlot.
    - Try changing a nested effect from random to fixed (or
      vice-versa).
    - Try to extract lsmeans for fixed effects in a model.
    - Try to do a multiple comparison of fixed-effects estimates.
    - Try using an AR1xAR1 error structure. The nlme library appears to
      have tools for this, but again it is syntactically difficult. I
      can find no examples.

    Most of these tasks would ideally be straightforward in a
    general-purpose mixed-models function (as they are in SAS, Genstat,
    etc.).

    The ASREML software is available in S-Plus (and soon R, I'm told)
    via the proprietary 'samm' library. Whereas lme seems excellent for
    basic nested-effects models and difficult for other models, samm
    excels at crossed-effects models, but doesn't have the plethora of
    useful print, plot, extractor, and summary methods that are found
    in nlme.

13. The fantasy list. Go ahead and tell me, "In your dreams!"

    - Deprecate 'update'. Cute, but makes session transcripts hard to
      read.
    - Remove implicit intercepts in models. Require y~1+x. Force
      thinking about intercepts.
    - Lattice colors could be more saturated for printing and
      projecting.
    - Rename 'prompt' to something closer to its purpose, like
      makeSkeletonHelp.

The humble opinion of one devoted user,

Kevin Wright

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-devel