On Oct 1, 2011, at 6:14 PM, Duncan Murdoch wrote: > On 11-10-01 5:14 PM, Dominick Samperi wrote: >> On Sat, Oct 1, 2011 at 1:08 PM, Duncan Murdoch<murdoch.dun...@gmail.com> >> wrote: >>> On 11-08-23 2:23 PM, Janko Thyson wrote: >>>> >>>> aDear list, >>>> >>>> I'm aware of the fact that I posted on something related a while ago, >>>> but I just can't sweat this off and would like to ask your for an opinion: >>>> >>>> The problem: >>>> Namespaces are great, but they don't resolve certain conflicts regarding >>>> name clashes. There are more and more people out there trying to come up >>>> with their own R packages, which is great also! Yet, it becomes more and >>>> more likely that programmers will choose identical names for their >>>> exported functions and/or that they add functionality to existing >>>> function (i.e. overwriting existing functions). >>>> The whole process of which packages overwrite which functions is >>>> somewhat obscure and in addition depends on their order in the search >>>> path. On the other hand, it is not possible to use "namespace" >>>> functionality (i.e. 'namespace::fun()'; also less efficient than direct >>>> call; see illustration below) during early stages of the development >>>> process (i.e. the package is not finished yet) as there is no namespace >>>> available yet. >>>> >>> >>> I agree there can be a problem, but I don't think it is necessarily as >>> serious as you suggest. Even though there are more and more packages >>> available, most people will still use roughly the same number of them. Just >>> because CRAN has thousands of packages doesn't mean I use all of them at the >>> same time. >>> >>> >>>> I know of at least two cases where such overwrites (I think it's called >>>> masking, right?) led to some confusion at our chair: >>>> 1) loading package forecast overwrites certain functions in stats which >>>> made some code refactoring necessary >>> >>> If your code had been in a package with a NAMESPACE, it would not have been >>> affected by a user loading forecast. (If you start importing it, then of >>> course it could cause masking problems.) >>> >>> You suggest above that users only put code into a package very late in the >>> development process. The solution is, don't do that. Create a package >>> early on, and use it through the majority of development time. >>> >>> You can leave the choice of exports until late by exporting everything; >>> you'll still get the benefit of the more controlled name search from the >>> beginning. >>> >>> You say you can't use "namespace::call()" until the namespace package has >>> been written. But why would you want to? If the call is coming from the >>> new package, objects in it will be used with first priority in resolving the >>> call. You only need the :: notation when there are ambiguities in calls to >>> external packages. >>> >>> >>>> 2) loading package 'R.utils' followed by package 'roxygen' overwrites >>>> 'parse.default()' which results in errors for something like >>>> 'eval(parse(text="a<- 1"))' ; see illustration below) >>>> And I'm sure the community could come up with lots more of such scenarios. >>>> >>>> Suggestions: >>>> 1) In order to avoid name clashes/unintended overwrites, how about >>>> switching to a coding paradigm that explicitly (and automatically) >>>> includes a package's name in all its functions' names once code is >>>> turned into a real package? E.g., getting used to "preemptively" type >>>> 'package_fun()' or 'package.fun()' instead of just 'fun()'. Better to be >>>> save than sorry, right? This could be realized pretty easily (see >>>> example below) and, IMHO, would significantly increase transparency. >>> >>> I think long names with consistent prefixes are harder to read than short >>> descriptive names. I think this would make code harder to read. For >>> example, the first few lines of mean.default would change from >>> >>> if (!is.numeric(x)&& !is.complex(x)&& !is.logical(x)) { >>> warning("argument is not numeric or logical: returning NA") >>> return(NA_real_) >>> } >>> >>> to >>> >>> if (!base_is.numeric(x)&& !base_is.complex(x)&& >>> !base_is.logical(x)) { >>> base_warning("argument is not numeric or logical: returning NA") >>> return(base_NA_real_) >>> } >>> >>> >>>> 2) In order to avoid intended (but for the user often pretty obscure) >>>> overwrites of existing functions, we could use the same mechanism >>>> together with the "rule": just don't provide any functions that >>>> overwrite existing ones, rather prepend your version of that function >>>> with your package name and leave it up to the user which version he >>>> wants to call. >>> >>> That seems like good advice. >>> >>> Duncan Murdoch >> >> Except that namespace::foo should be assigned to another local >> variable instead of using package::foo in a tight loop, because >> repeated calls to "::" can introduce a significant performance >> penalty. (This has been discussed in another thread.) > > That's good advice too. >
Well, if this idiom is used more often, we may re-consider the implementation of `::` - it could definitely be sped up if needed, but I think so far the view was that it's not worth the hassle (aka increased complexity and relative ugliness of making it a special form) ... Cheers, Simon > Duncan Murdoch > >> >>>> >>>> At the moment, all of this is probably not that big of a deal yet, but >>>> my suggestion has more of a mid-term/long-term character. >>>> >>>> Below you find a little illustration. I'm probably asking too much, but >>>> it'd be great if we could get a little discussion going on how to >>>> improve the way of loading packages! >>>> >>>> Best regards and thanks for R and all it's packages! >>>> Janko >>>> >>>> >>>> ################################################################################ >>>> # PROOF OF CONCEPT >>>> >>>> ################################################################################ >>>> >>>> # 1) PROBLEM >>>> # IMHO, with the number of packages submitted to CRAN constantly >>>> increasing, >>>> # over time we will be likely to see problems with respect to name >>>> clashes. >>>> # The main reasons I see for this are the following: >>>> # a) package developers picking identical names for their exported >>>> functions >>>> # b) package developers overwriting base functions in order to add >>>> functionality >>>> # to existing functions >>>> # c) ... >>>> # >>>> # This can create scenarios in which the user might not exactly know that >>>> # he/she is using a 'modified' version of a specific function. More so, >>>> the user >>>> # needs to carefully read the description of each new package he plans >>>> # to use in order to find out which functions are exported and which >>>> existing >>>> # functions might be overwritten. This in turn might imply that the user's >>>> # existing code needs to be refactored (i.e. instead of using 'fun()' it >>>> # might now be necessary to type 'namespace::fun()' to be sure that the >>>> desired >>>> # function is called). >>>> >>>> # 2) SUGGESTED SOLUTION >>>> # That being said, why don't we switch to a 'preemptive' coding paradigm >>>> # where the default way of calling functions includes the specification of >>>> # its namespace? In principle, the functionality offered by >>>> 'namespace::fun()' >>>> # gets the job done. >>>> # BUT: >>>> # a) it is slower compared to the direct way of calling a function. >>>> # (see illustration below). >>>> # b) this option is not available througout the development process of a >>>> package >>>> # as there is no namespace yet and there's no way to emulate one. >>>> This in >>>> # turn means that even though a package developer would buy into >>>> strictly >>>> # using 'mypkg::fun()' throughout his package code, he can only do so >>>> at the >>>> # very final stage of the process RIGHT before turning his code into a >>>> # working package (when he's absolutely sure everything is working as >>>> planned). >>>> # For debugging he would need to go back to using 'fun()'. Pretty >>>> cumbersome. >>>> >>>> # So how about simply automatically prepending a given function's name >>>> with >>>> # the package's name for each package that is build (e.g. 'pkg.fun()' or >>>> # 'pkg_fun()')? In the end, this would just be a small change for new >>>> packages >>>> # without a significant decrease of performance and it could also be >>>> realized >>>> # at early stages of the development process (see illustration below). >>>> >>>> # 3) ILLUSTRATION >>>> >>>> # Example case where base function 'parse.default' is overwritten: >>>> parse(text="a<- 5") # Works >>>> require(R.utils) >>>> require(roxygen) >>>> parse(text="a<- 5") # Does not work anymore >>>> >>>> ################# START A NEW R SESSION BEFORE YOU CONTINUE >>>> #################### >>>> >>>> # Inefficiency of 'namespace::fun()': >>>> require(microbenchmark) >>>> res.a<- microbenchmark(eval(parse(text="a<- 5"))) >>>> res.b<- microbenchmark(eval(base::parse(text="a<- 5"))) >>>> median(res.a$time)/median(res.b$time) >>>> >>>> # Can be made up by explicit assignment: >>>> foo<- base::parse >>>> res.a<- microbenchmark(eval(parse(text="a<- 5"))) >>>> res.b<- microbenchmark(eval(foo(text="a<- 5"))) >>>> median(res.a$time)/median(res.b$time) >>>> >>>> # Automatically prepend function names: >>>> processNamespaces<- function( >>>> do.global=FALSE, >>>> do.verbose=FALSE, >>>> .delim.name="_", >>>> ... >>>> ){ >>>> srch.list.0<- search() >>>> srch.list<- gsub("package:", "", srch.list.0) >>>> if(!do.global){ >>>> assign(".NS", new.env(), envir=.GlobalEnv) >>>> } >>>> out<- lapply(1:length(srch.list), function(x.pkg){ >>>> pkg<- srch.list[x.pkg] >>>> >>>> # SKIP LIST >>>> if(pkg %in% c(".GlobalEnv", "Autoloads")){ >>>> return(NULL) >>>> } >>>> # / >>>> >>>> # TARGET ENVIR >>>> if(!do.global){ >>>> # ADD PACKAGE TO .NS ENVIRONMENT >>>> envir<- eval(substitute( >>>> assign(PKG, new.env(), envir=.NS), >>>> list(PKG=pkg) >>>> )) >>>> # / >>>> # envir<- get(pkg, envir=.NS, inherits=FALSE) >>>> envir.msg<- paste(".NS$", pkg, sep="") >>>> } else { >>>> envir<- .GlobalEnv >>>> envir.msg<- ".GlobalEnv" >>>> } >>>> # / >>>> >>>> # PROCESS FUNCTIONS >>>> cnt<- ls(pos=x.pkg) >>>> out<- unlist(sapply(cnt, function(x.cnt){ >>>> value<- get(x.cnt, pos=x.pkg, inherits=FALSE) >>>> obj.mod<- paste(pkg, x.cnt, sep=.delim.name) >>>> if(!is.function(value)){ >>>> return(NULL) >>>> } >>>> if(do.verbose){ >>>> cat(paste("Assigning '", obj.mod, "' to '", envir.msg, >>>> "'", sep=""), sep="\n") >>>> } >>>> eval(substitute( >>>> assign(OBJ.MOD, value, envir=ENVIR), >>>> list( >>>> OBJ.MOD=obj.mod, >>>> ENVIR=envir >>>> ) >>>> )) >>>> return(obj.mod) >>>> })) >>>> names(out)<- NULL >>>> # / >>>> return(out) >>>> }) >>>> names(out)<- srch.list >>>> return(out) >>>> } >>>> >>>> # +++++ >>>> >>>> funs<- processNamespaces(do.verbose=TRUE) >>>> ls(.NS) >>>> ls(.NS$base) >>>> .NS$base$base_parse >>>> >>>> res.a<- microbenchmark(eval(parse(text="a<- 5"))) >>>> res.b<- microbenchmark(eval(.NS$base$base_parse(text="a<- 5"))) >>>> median(res.a$time)/median(res.b$time) >>>> >>>> #+++++ >>>> >>>> funs<- processNamespaces(do.global=TRUE, do.verbose=TRUE) >>>> base_parse >>>> >>>> res.a<- microbenchmark(eval(parse(text="a<- 5"))) >>>> res.b<- microbenchmark(eval(base_parse(text="a<- 5"))) >>>> median(res.a$time)/median(res.b$time) >>>> >>>> ______________________________________________ >>>> R-devel@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> >>> ______________________________________________ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > > ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel