Gabor Grothendieck wrote:
> Certainly this has been recognized as a potential problem:
>
>   http://developer.r-project.org/nonstandard-eval.pdf
>
> however, it is convenient when you are performing
> an analysis and entering commands directly as opposed
> to writing a program although possibly the potential ambiguities
> overshadow the convenience.
>

in most cases, i do not see why one could not use a string literal passed by value instead of having an expression deparsed within the function, which may lead to confusing behaviour. this would give much more consistent and predictable code. this has nothing to do with the evaluation mechanism, which can still be lazy.

in the case of subset, i do not really see how this design might be helpful, but it's easy to see how it can be harmful; examples have just been given. the convenience here amounts at most to being able to omit quotes, at the risk of having columns selected where they should not be, and vice versa. the worst thing is that it destroys the benefit of lexical scoping:

    subset(d, select=group)

did the programmer intend to select the column named 'group'? or the columns whose names appear in the vector group? is d supposed not to have a column named 'group'? should one change the identifier if d does have such a column, to avoid selecting that column instead of whatever else would be selected? etc.

could this not be written as

    subset(d, select="group")

(two extra characters), and have it cleanly and always mean 'pick the one column named "group"'?

so there are actually three problems here:

- one that a programmer may be unaware that her own code does not do what she wants;
- another that a user may be unaware that the code she uses behaves this way;
- another that a user may not be sure whether the code may be reused as is, or must be modified so as not to interfere with the particular data.

the dependence of subset's behaviour on the particular data it is applied to is confusing. and here's an example of how it breaks its own smart semantics:

    d = data.frame(a=1)
    d$`c(a,b)` = 2
    d                         # no problem, two columns
    names(d)                  # one named 'c(a,b)'
    subset(d, select=c(a,b))  # so what?

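to make both points concrete, here is a minimal sketch that can be pasted into a fresh session. the data frames d1 and d2 and the character vector group are made up for illustration, and the second half assumes that no object named b exists in the workspace. tryCatch is used only so that the failing call does not abort the script.

```r
# hypothetical data: 'group' is a vector in the workspace and,
# in d2 only, also the name of a column
group <- c("x", "y")
d1 <- data.frame(x = 1:2, y = 3:4, z = 5:6)
d2 <- data.frame(x = 1:2, y = 3:4, group = 5:6)

# unquoted: the meaning depends on which columns the data frame has
by_vector <- subset(d1, select = group)  # no column 'group': the vector is used
by_column <- subset(d2, select = group)  # column 'group' shadows the vector

# quoted: the column with that name, whatever else is in scope
by_string <- subset(d2, select = "group")

print(names(by_vector))
print(names(by_column))
print(names(by_string))

# the example from above (assumes no object 'b' in the workspace)
d <- data.frame(a = 1)
d$`c(a,b)` <- 2

# unquoted: c(a,b) is evaluated as a call, so b is looked up and not found
failed <- tryCatch(subset(d, select = c(a,b)), error = function(e) e)
print(inherits(failed, "error"))

# quoted: the string names the column directly
picked <- subset(d, select = "c(a,b)")
print(names(picked))
```
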
the expression given to select certainly is a valid and actual name of a column in d, but subset complains that there's no such column. (well, it actually says object "b" not found, by which it probably means that object b, i.e., the object named 'b', has not been found. this is not only uninformative as a message in this situation, but it also reveals the pervasive confusion of the name and the named: the object "b", a one-character string, has not been mentioned here at all. what a mess.)

this can't possibly be considered good design, can it? the dubious benefit is heavily outweighed by the drawbacks.

vQ