On Sat, Oct 1, 2011 at 1:08 PM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote: > On 11-08-23 2:23 PM, Janko Thyson wrote: >> >> aDear list, >> >> I'm aware of the fact that I posted on something related a while ago, >> but I just can't sweat this off and would like to ask your for an opinion: >> >> The problem: >> Namespaces are great, but they don't resolve certain conflicts regarding >> name clashes. There are more and more people out there trying to come up >> with their own R packages, which is great also! Yet, it becomes more and >> more likely that programmers will choose identical names for their >> exported functions and/or that they add functionality to existing >> function (i.e. overwriting existing functions). >> The whole process of which packages overwrite which functions is >> somewhat obscure and in addition depends on their order in the search >> path. On the other hand, it is not possible to use "namespace" >> functionality (i.e. 'namespace::fun()'; also less efficient than direct >> call; see illustration below) during early stages of the development >> process (i.e. the package is not finished yet) as there is no namespace >> available yet. >> > > I agree there can be a problem, but I don't think it is necessarily as > serious as you suggest. Even though there are more and more packages > available, most people will still use roughly the same number of them. Just > because CRAN has thousands of packages doesn't mean I use all of them at the > same time. > > >> I know of at least two cases where such overwrites (I think it's called >> masking, right?) led to some confusion at our chair: >> 1) loading package forecast overwrites certain functions in stats which >> made some code refactoring necessary > > If your code had been in a package with a NAMESPACE, it would not have been > affected by a user loading forecast. (If you start importing it, then of > course it could cause masking problems.) > > You suggest above that users only put code into a package very late in the > development process. The solution is, don't do that. Create a package > early on, and use it through the majority of development time. > > You can leave the choice of exports until late by exporting everything; > you'll still get the benefit of the more controlled name search from the > beginning. > > You say you can't use "namespace::call()" until the namespace package has > been written. But why would you want to? If the call is coming from the > new package, objects in it will be used with first priority in resolving the > call. You only need the :: notation when there are ambiguities in calls to > external packages. > > >> 2) loading package 'R.utils' followed by package 'roxygen' overwrites >> 'parse.default()' which results in errors for something like >> 'eval(parse(text="a<- 1"))' ; see illustration below) >> And I'm sure the community could come up with lots more of such scenarios. >> >> Suggestions: >> 1) In order to avoid name clashes/unintended overwrites, how about >> switching to a coding paradigm that explicitly (and automatically) >> includes a package's name in all its functions' names once code is >> turned into a real package? E.g., getting used to "preemptively" type >> 'package_fun()' or 'package.fun()' instead of just 'fun()'. Better to be >> save than sorry, right? This could be realized pretty easily (see >> example below) and, IMHO, would significantly increase transparency. > > I think long names with consistent prefixes are harder to read than short > descriptive names. I think this would make code harder to read. For > example, the first few lines of mean.default would change from > > if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) { > warning("argument is not numeric or logical: returning NA") > return(NA_real_) > } > > to > > if (!base_is.numeric(x) && !base_is.complex(x) && > !base_is.logical(x)) { > base_warning("argument is not numeric or logical: returning NA") > return(base_NA_real_) > } > > >> 2) In order to avoid intended (but for the user often pretty obscure) >> overwrites of existing functions, we could use the same mechanism >> together with the "rule": just don't provide any functions that >> overwrite existing ones, rather prepend your version of that function >> with your package name and leave it up to the user which version he >> wants to call. > > That seems like good advice. > > Duncan Murdoch
Except that namespace::foo should be assigned to another local variable instead of using package::foo in a tight loop, because repeated calls to "::" can introduce a significant performance penalty. (This has been discussed in another thread.) >> >> At the moment, all of this is probably not that big of a deal yet, but >> my suggestion has more of a mid-term/long-term character. >> >> Below you find a little illustration. I'm probably asking too much, but >> it'd be great if we could get a little discussion going on how to >> improve the way of loading packages! >> >> Best regards and thanks for R and all it's packages! >> Janko >> >> >> ################################################################################ >> # PROOF OF CONCEPT >> >> ################################################################################ >> >> # 1) PROBLEM >> # IMHO, with the number of packages submitted to CRAN constantly >> increasing, >> # over time we will be likely to see problems with respect to name >> clashes. >> # The main reasons I see for this are the following: >> # a) package developers picking identical names for their exported >> functions >> # b) package developers overwriting base functions in order to add >> functionality >> # to existing functions >> # c) ... >> # >> # This can create scenarios in which the user might not exactly know that >> # he/she is using a 'modified' version of a specific function. More so, >> the user >> # needs to carefully read the description of each new package he plans >> # to use in order to find out which functions are exported and which >> existing >> # functions might be overwritten. This in turn might imply that the user's >> # existing code needs to be refactored (i.e. instead of using 'fun()' it >> # might now be necessary to type 'namespace::fun()' to be sure that the >> desired >> # function is called). >> >> # 2) SUGGESTED SOLUTION >> # That being said, why don't we switch to a 'preemptive' coding paradigm >> # where the default way of calling functions includes the specification of >> # its namespace? In principle, the functionality offered by >> 'namespace::fun()' >> # gets the job done. >> # BUT: >> # a) it is slower compared to the direct way of calling a function. >> # (see illustration below). >> # b) this option is not available througout the development process of a >> package >> # as there is no namespace yet and there's no way to emulate one. >> This in >> # turn means that even though a package developer would buy into >> strictly >> # using 'mypkg::fun()' throughout his package code, he can only do so >> at the >> # very final stage of the process RIGHT before turning his code into a >> # working package (when he's absolutely sure everything is working as >> planned). >> # For debugging he would need to go back to using 'fun()'. Pretty >> cumbersome. >> >> # So how about simply automatically prepending a given function's name >> with >> # the package's name for each package that is build (e.g. 'pkg.fun()' or >> # 'pkg_fun()')? In the end, this would just be a small change for new >> packages >> # without a significant decrease of performance and it could also be >> realized >> # at early stages of the development process (see illustration below). >> >> # 3) ILLUSTRATION >> >> # Example case where base function 'parse.default' is overwritten: >> parse(text="a<- 5") # Works >> require(R.utils) >> require(roxygen) >> parse(text="a<- 5") # Does not work anymore >> >> ################# START A NEW R SESSION BEFORE YOU CONTINUE >> #################### >> >> # Inefficiency of 'namespace::fun()': >> require(microbenchmark) >> res.a<- microbenchmark(eval(parse(text="a<- 5"))) >> res.b<- microbenchmark(eval(base::parse(text="a<- 5"))) >> median(res.a$time)/median(res.b$time) >> >> # Can be made up by explicit assignment: >> foo<- base::parse >> res.a<- microbenchmark(eval(parse(text="a<- 5"))) >> res.b<- microbenchmark(eval(foo(text="a<- 5"))) >> median(res.a$time)/median(res.b$time) >> >> # Automatically prepend function names: >> processNamespaces<- function( >> do.global=FALSE, >> do.verbose=FALSE, >> .delim.name="_", >> ... >> ){ >> srch.list.0<- search() >> srch.list<- gsub("package:", "", srch.list.0) >> if(!do.global){ >> assign(".NS", new.env(), envir=.GlobalEnv) >> } >> out<- lapply(1:length(srch.list), function(x.pkg){ >> pkg<- srch.list[x.pkg] >> >> # SKIP LIST >> if(pkg %in% c(".GlobalEnv", "Autoloads")){ >> return(NULL) >> } >> # / >> >> # TARGET ENVIR >> if(!do.global){ >> # ADD PACKAGE TO .NS ENVIRONMENT >> envir<- eval(substitute( >> assign(PKG, new.env(), envir=.NS), >> list(PKG=pkg) >> )) >> # / >> # envir<- get(pkg, envir=.NS, inherits=FALSE) >> envir.msg<- paste(".NS$", pkg, sep="") >> } else { >> envir<- .GlobalEnv >> envir.msg<- ".GlobalEnv" >> } >> # / >> >> # PROCESS FUNCTIONS >> cnt<- ls(pos=x.pkg) >> out<- unlist(sapply(cnt, function(x.cnt){ >> value<- get(x.cnt, pos=x.pkg, inherits=FALSE) >> obj.mod<- paste(pkg, x.cnt, sep=.delim.name) >> if(!is.function(value)){ >> return(NULL) >> } >> if(do.verbose){ >> cat(paste("Assigning '", obj.mod, "' to '", envir.msg, >> "'", sep=""), sep="\n") >> } >> eval(substitute( >> assign(OBJ.MOD, value, envir=ENVIR), >> list( >> OBJ.MOD=obj.mod, >> ENVIR=envir >> ) >> )) >> return(obj.mod) >> })) >> names(out)<- NULL >> # / >> return(out) >> }) >> names(out)<- srch.list >> return(out) >> } >> >> # +++++ >> >> funs<- processNamespaces(do.verbose=TRUE) >> ls(.NS) >> ls(.NS$base) >> .NS$base$base_parse >> >> res.a<- microbenchmark(eval(parse(text="a<- 5"))) >> res.b<- microbenchmark(eval(.NS$base$base_parse(text="a<- 5"))) >> median(res.a$time)/median(res.b$time) >> >> #+++++ >> >> funs<- processNamespaces(do.global=TRUE, do.verbose=TRUE) >> base_parse >> >> res.a<- microbenchmark(eval(parse(text="a<- 5"))) >> res.b<- microbenchmark(eval(base_parse(text="a<- 5"))) >> median(res.a$time)/median(res.b$time) >> >> ______________________________________________ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel