One of the recurring themes at the recent useR! conference was that many people find it difficult to find the functions they need for a particular task. Sandy Weisberg suggested a small idea he would like to see: a hints function that, given an object, lists likely operations. I've done my best to implement this function using the tools currently available in R, and my code is included at the bottom of this email (I hope that I haven't just duplicated something already present in R). I think Sandy's idea is genuinely useful, even in the limited form provided by my implementation, and I have already discovered a few useful functions that I was unaware of.
While developing and testing this function, I ran into a few problems which, I think, reflect underlying problems with the current documentation system. These are typified by the results of running hints on an object produced by glm (having class c("glm", "lm")). I have outlined (very tersely) some possible solutions. Please note that while these solutions are largely technological, the problem is at heart sociological: writing documentation is no easier (and perhaps much harder) than writing a scientific publication, but the rewards are fewer.

Problems:

* Many functions share the same description (e.g. head, tail).
  Solution: each Rd file should describe only one method.
  Problem: writing Rd files is tedious. A lot of information is duplicated between the code and the documentation (e.g. the usage statement), and some functions share a lot of similar information.
  Solution: make it easier to write documentation (e.g. documentation inline with code), and easier to include certain common descriptions in multiple methods (e.g. a new include command).

* It is difficult to tell which functions are commonly used/important.
  Solution: break down by keywords.
  Problem: keywords are not useful at the moment.
  Solution: make a better list of keywords available and encourage people to use it.
  Problem: people won't unless there is a strong incentive, plus good keywording requires considerable expertise (especially in building up the list). This is probably insoluble unless one person systematically keywords all of the base packages.

* Some functions aren't documented (e.g. simulate.lm, formula.glm). Typically, these are methods where the documentation is in the generic.
  Solution: these methods should all be aliased to the generic (by default?), and R CMD check should be amended to check for this situation. You could also argue that this is a deficiency in my function, easily fixed by automatically referring to the generic if the specific method isn't documented.
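The fallback mentioned in the last point could look something like the sketch below. It is only an illustration: doc_topic and its documented argument are hypothetical names, not part of the code at the bottom of this email, and the set of documented topics is passed in by hand rather than read from the help database. Treating everything after the last dot as the class is a heuristic, so names such as t.test or print.data.frame would be split incorrectly.

```r
# Hypothetical helper: choose the help topic to consult for an S3 method.
# `documented` is an assumed character vector of topics that have a help page.
doc_topic <- function(method, documented) {
  if (method %in% documented) return(method)
  # Drop the last dot-separated component (the class) to get the generic
  generic <- sub("\\.[^.]*$", "", method)
  if (generic != method && generic %in% documented) generic else NA_character_
}

# Toy data: only the generics simulate and formula have pages of their own
documented <- c("simulate", "formula", "summary.glm")

doc_topic("summary.glm", documented)  # documented directly
doc_topic("simulate.lm", documented)  # falls back to the generic
doc_topic("foo.bar", documented)      # neither is documented, so NA
```

Passing the documented topics in as an argument keeps the sketch independent of how the help database happens to be stored.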
* It can't supply suggestions when there isn't an explicit method (i.e. .default is used), which makes it pretty useless for basic vectors. This may not really be a problem, as all possible operations are probably too numerous to list.

* It provides the full name of each function, when best practice is to use only the generic part when calling the function. However, getting precise documentation may require the full name. I do the best I can (returning the generic if the specific method is an alias to a documentation file with the same method name), but this reflects a deeper problem: the name you should use when calling a function may differ from the name you use to get documentation.

* It can only display methods from currently loaded packages. This is a shortcoming of the methods function, but I suspect it is difficult to find S3 methods without loading a package.

Relatively trivial problems:

* It needs a wide display to be effective. This could be dealt with by breaking the description in a sensible manner (there may already be R code to do this; please let me know if you know of any).

* It doesn't currently include S4 methods.
  Solution: add some more code to wrap showMethods.

* Personally, I think sentence case is more aesthetically pleasing (and more flexible) than title case.
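On the wide-display point, base R's strwrap() and format() may be enough to break the description column sensibly. The sketch below is an assumption about how that could be done, not part of the function as posted: wrap_hints and its width argument are made-up names, and the two-row matrix is toy data rather than real hints output. Note that strwrap() does not break a single word that is longer than the available width.

```r
# Hypothetical helper: print a two-column (function, task) character matrix,
# wrapping the task text so that lines aim to stay under `width` characters.
wrap_hints <- function(h, width = 60) {
  name_w <- max(nchar(h[, 1]))
  lines <- unlist(lapply(seq_len(nrow(h)), function(i) {
    # Wrap the description to the space left after the name column
    task <- strwrap(h[i, 2], width = width - name_w - 2)
    if (length(task) == 0) task <- ""
    # Show the name on the first line only; continuation lines get blanks
    label <- c(h[i, 1], rep("", length(task) - 1))
    paste0(format(label, width = name_w), "  ", task)
  }))
  cat(lines, sep = "\n")
  invisible(lines)
}

# Toy input standing in for the matrix that hints builds
h <- cbind(
  c("anova", "summary"),
  c("Compute analysis of variance (or deviance) tables for one or more fitted model objects",
    "Summarise a fitted model")
)
out <- wrap_hints(h, width = 40)
```

Returning the lines invisibly as well as printing them makes it possible to reuse the wrapped text, for example inside a print method.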
Hadley

hints <- function(x) {
  # Fetch the help.search() database, building it first if it isn't cached yet
  db <- eval(utils:::.hsearch_db())
  if (is.null(db)) {
    help.search("abcd!", rebuild=TRUE, agrep=FALSE)
    db <- eval(utils:::.hsearch_db())
  }

  base <- db$Base
  alias <- db$Aliases
  key <- db$Keywords

  # Methods for each class of x, matched to their documentation entries
  m <- all.methods(classes=class(x))
  m_id <- alias[match(m, alias[,1]), 2]
  keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])

  # Use the documented name when it shares the method's generic part
  f.names <- cbind(m, base[match(m_id, base[,3]), 4])
  f.names <- unlist(lapply(1:nrow(f.names), function(i) {
    if (is.na(f.names[i, 2])) return(f.names[i, 1])
    a <- methodsplit(f.names[i, 1])
    b <- methodsplit(f.names[i, 2])

    if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]
  }))

  # Pair each name with its title and sort alphabetically, ignoring case
  hints <- cbind(f.names, base[match(m_id, base[,3]), 5])
  hints <- hints[order(tolower(hints[,1])), , drop=FALSE]
  hints <- rbind(c("--------", "---------------"), hints)
  rownames(hints) <- rep("", nrow(hints))
  colnames(hints) <- c("Function", "Task")
  hints[is.na(hints)] <- "(Unknown)"

  class(hints) <- "hints"
  hints
}

print.hints <- function(x, ...) print(unclass(x), quote=FALSE)

# Names of all S3 methods for the given classes, keeping only the first
# method found for each generic
all.methods <- function(classes) {
  methods <- do.call(rbind, lapply(classes, function(x) {
    m <- methods(class=x)
    t(sapply(as.vector(m), methodsplit))
    #m[attr(m, "info")$visible]
  }))
  rownames(methods[!duplicated(methods[,1]), , drop=FALSE])
}

# Split "generic.class" at the last dot into its name and class parts
methodsplit <- function(m) {
  parts <- strsplit(m, "\\.")[[1]]
  if (length(parts) == 1) {
    c(name=m, class="")
  } else {
    c(name=paste(parts[-length(parts)], collapse="."), class=parts[length(parts)])
  }
}

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html