Well, that's embarrassing. Sorry for the noise on that front, everyone. I misunderstood something from the aforementioned unrelated conversation I was having, but not double checking is on me (I rarely use if else and when I do I avoid that situation in my own code, which is why I didn't already know this)
I'd still argue that situation should at least warn, possibly error, as it seems indicative of a bug in the user's code. On Mon, Nov 28, 2016 at 7:00 AM, Martin Maechler <maech...@stat.math.ethz.ch > wrote: > >>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org> > >>>>> on Sat, 26 Nov 2016 17:14:01 +0000 writes: > > > Just stating, in 'ifelse', 'test' is not recycled. As I said in > "R-intro: length of 'ifelse' result" (https://stat.ethz.ch/ > pipermail/r-devel/2016-September/073136.html), ifelse(condition, a, b) > returns a vector of the length of 'condition', even if 'a' or 'b' is longer. > > yes and ?ifelse (the help page) also does not say that test is > recycled, rather > > >> If \code{yes} or \code{no} are too short, their elements are > recycled. > > (*and* the problem you wrote the above has been corrected in the > R-intro manual shortly after). > > > > On current 'ifelse' code in R: > > > * The part > > ans[nas] <- NA > > could be omitted because NA's are already in place. > > If the part is removed, variable 'nas' is no longer used. > > I agree that this seems logical. If I apply the change, R's own > full checks do not seem affected, and I may try to commit that > change and "wait and see". > > > > * The any(*) part actually checks the thing that is used as the > index vector. The index vector could be stored and then repeatedly used, > like the following. > > > if (any(sel <- test & ok)) > > ans[sel] <- rep(yes, length.out = length(ans))[sel] > > yes, I know, and have had similar thoughts in the past. > However note (I know you that) the current code is > > if (any(test[ok])) > ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok] > > and any(test[ok]) may be considerably faster than > any(sel <- test & ok) > > OTOH I think the current code would only be faster (for the > above) when any(.) returned FALSE ... > I think it may depend on the typical use cases which of the two > versions is more efficient. > > > > * If 'test' is a factor, doing > > storage.mode(test) <- "logical" > > is not appropriate, but is.atomic(test) returns TRUE. Maybe use > > if(!is.object(test)) > > instead of > > if(is.atomic(test)) . > > This would be a considerable change I think... > Note that I'm currently really proposing to introduce an *additional* > ifelse function with different "more reasonable" semantic, and > your last change would do that too. > > My alternative should really work > - for factors > - for "array"s including "matrix" (as the current ifelse() does!) > - for "Date", "POSIXct", "ts"(timeseries), "zoo", > "sparseVector", "sparseMatrix" (*), or "mpfr", > without any special code, but rather by design. > > *) Currently needs the R-forge version of Matrix, version 1.2-8. > > A bit less than an hour ago, I have updated the gist with an updated > proposal ifelse2() {and the current alternatives that I know}, > modified so it *does* keep more, e.g. dim() attributes in > reasonable cases. > > https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d878 > 94#file-ifelse-def-r-L168 > > Hence my ifelse2() became even a bit longer (but not slower) > working for even more classes of "yes" and "no". > > > > On ifelse-checks.R: > > * In function 'chkIfelse', if the fourth function argument names is > not "NA.", the argument name is changed, but the function body still uses > the old name. That makes error in chkIfelse(ifelseHW) . > > A fix: > > if(names(formals(FUN))[[4]] != "NA.") { > > body(FUN) <- do.call(substitute, list(body(FUN), > > setNames(list(quote(NA.)), > names(formals(FUN))[[4]]))) > > names(formals(FUN))[[4]] <- "NA." > > } > > yes, thank you! (a bit embarrassing for me ..) > > > After fixing, chkIfelse(ifelseHW) just fails at identical(iflt, > as.POSIXlt(ifct)) . > > 'iflt' has NA as 'tzone' and 'isdst' components. > > * Because function 'chkIfelse' continues checking after failure, > > as.POSIXlt(ifct) > > may give error. The error happens, for example, in > chkIfelse(ifelseR) . Maybe place it inside try(...). > > * If 'lt' is a "POSIXlt" object, (lt-100) is a "POSIXct" object. > > So, > > FUN(c(TRUE, FALSE, NA, TRUE), lt, lt-100) > > is an example of mixed class. > > good; thank you for the hint. > > > * The part of function 'chkIfelse' in > > for(i in seq_len(nFact)) > > uses 'NA.' function argument. That makes error when 'chkIfelse' is > applied to function without fourth argument. > > The part should be wrapped in > > if(has.4th) . > yes of course > > > * Function 'ifelseJH' has fourth argument, but the argument is not > for value if NA. So, instead of > > chkIfelse(ifelseJH) , > > maybe call > > chkIfelse(function(test, yes, no) ifelseJH(test, yes, no)) . > You are right; > I've decided to solve this differently. > > I'm looking at these suggestions now, notably also your proposals below; > thank you, Suharto! > > (I wanted to put my improved 'ifelse2' out first, quickly). > Martin > > > > A concrete version of 'ifelse2' that starts the result from 'yes': > > function(test, yes, no, NA. = NA) { > > if(!is.logical(test)) > > test <- if(isS4(test)) methods::as(test, "logical") else > as.logical(test) > > n <- length(test) > > ans <- rep(yes, length.out = n) > > ans[!test & !is.na(test)] <- rep(no, length.out = n)[!test & ! > is.na(test)] > > ans[is.na(test)] <- rep(NA., length.out = n)[is.na(test)] > > ans > > } > > > It requires 'rep' method that is compatible with subsetting. It also > works with "POSIXlt" in R 2.7.2, when 'length' gives 9, and gives an > appropriate result if time zones are the same. > > For coercion of 'test', there is no need of keeping attributes. So, > it doesn't do > > storage.mode(test) <- "logical" > > and goes directly to 'as.logical'. > > It relies on subassignment for silent coercions of > > logical < integer < double < complex . > > Unlike 'ifelse', it never skips any subassignment. So, phenomenon as > in "example of different return modes" in ?ifelse doesn't happen. > > > Another version, for keeping attributes as pointed out by Duncan > Murdoch: > > function(test, yes, no, NA. = NA) { > > if(!is.logical(test)) > > test <- if(isS4(test)) methods::as(test, "logical") else > as.logical(test) > > n <- length(test) > > n.yes <- length(yes); n.no <- length(no) > > if (n.yes != n) { > > if (n.no == n) { # swap yes <-> no > > test <- !test > > ans <- yes; yes <- no; no <- ans > > n.no <- n.yes > > } else yes <- yes[rep_len(seq_len(n.yes), n)] > > } > > ans <- yes > > if (n.no == 1L) > > ans[!test] <- no > > else > > ans[!test & !is.na(test)] <- no[ > > if (n.no == n) !test & !is.na(test) > > else rep_len(seq_len(n.no), n)[!test & !is.na(test)]] > > stopifnot(length(NA.) == 1L) > > ans[is.na(test)] <- NA. > > ans > > } > > > Note argument evaluation order: 'test', 'yes', 'no', 'NA.'. > > First, it chooses the first of 'yes' and 'no' that has the same > length as the result. If none of 'yes' and 'no' matches the length of the > result, it chooses recycled (or truncated) 'yes'. > > It uses 'rep' on the index and subsetting as a substitute for 'rep' > on the value. > > It requires 'length' method that is compatible with subsetting. > > Additionally, it uses the same idea as dplyr::if_else, or more > precisely the helper function 'replace_with'. It doesn't use 'rep' if the > length of 'no' is 1 or is the same as the length of the result. For > subassignment with value of length 1, recycling happens by itself and NA in > index is OK. > > It limits 'NA.' to be of length 1, considering 'NA.' just as a label > for NA. > > > Cases where the last version above or 'ifelse2 or 'ifelseHW' in > ifelse-def.R gives inappropriate answers: > > - 'yes' and 'no' are "difftime" objects with different "units" > attribute > > - 'yes' and 'no' are "POSIXlt" objects with different time zone > > Example: 'yes' in "UTC" and 'no' in "EST5EDT". The reverse, 'yes' in > "EST5EDT" and 'no' in "UTC" gives error. > > > For the cases, c(yes, no) helps. Function 'ifelseJH' in ifelse-def.R > gives a right answer for "POSIXlt" case. > > --------------------- > > Martin et al., > > > > > > On Tue, Nov 22, 2016 at 2:12 AM, Martin Maechler <maechler at > stat.math.ethz.ch > >> wrote: > > >> > >> Note that my premise was really to get *away* from inheriting > >> too much from 'test'. > >> Hence, I have *not* been talking about replacing ifelse() but > >> rather of providing a new ifelse2() > >> > >> [ or if_else() if Hadley was willing to ditch the dplyr one > >> in favor of a base one] > >> > >> > Specifically, based on an unrelated discussion with Henrik > Bengtsson > >> on > >> > Twitter, I wonder if preserving the recycling behavior test > is > >> longer than > >> > yes, no, but making the case where > >> > >> > length( test ) < max(length( yes ), length( no )) > >> > >> > would simplify usage for userRs in a useful way. > >> > > > That was a copyediting bug on my part, it seems I hit send with my > message > > only half-edited/proofread. Apologies. > > > That should have said that making the case where test is the one > that will > > be recycled (because it is shorter than either yes or no) an error. > My > > claim is that the fact that test itself can be recycled, rather than > just > > yes or no, is confusing to many R users. If we are writing an > ifelse2 we > > might want to drop that feature and just throw an error in that case. > > (Users could still use the original ifelse if they understand and > > specifically want that behavior). > > > Does that make more sense? > > > > >> > >> > Also, If we combine a stricter contract that the output will > always > >> be of > >> > length with the suggestion of a specified output class > >> > >> > > Here, again, I was talking about the restriction that the output be > > guaranteed to be the length of test, regardless of the length of yes > and > > no. That, combined with a specific, guaranteed output class would > make a > > much narrower/more restricted but also (I argue) much easier to > understand > > function. Particularly for beginning and intermediate users. > > > I do hear what you're saying about silent conversion, though, so > what I'm > > describing might be a third function (ifelse3 for lack of a better > name for > > now), as you pointed out. > > > >> that was not my intent here.... but would be another interesting > >> extension. However, I would like to keep R-semantic silent > coercions > >> such as > >> logical < integer < double < complex > >> > >> and your pseudo code below would not work so easily I think. > >> > >> > the pseudo code could be > >> > >> (I'm changing assignment '=' to '<-' ... [please!] ) > >> > >> > ifelse2 <- function(test, yes, no, outclass) { > >> > lenout <- length(test) > >> > out <- as( rep(yes, length.out <- lenout), outclass) > >> > out[!test] <- as(rep(no, length.out = lenout)[!test], > outclass) > >> > # handle NA stuff > >> > out > >> > } > >> > >> > >> > NAs could be tricky if outclass were allowed to be completely > >> general, but > >> > doable, I think? Another approach if we ARE fast-passing > while > >> leaving > >> > ifelse intact is that maybe NA's in test just aren't allowed > in > >> ifelse2. > >> > I'm not saying we should definitely do that, but it's > possible and > >> would > >> > make things faster. > >> > >> > Finally, In terms of efficiency, with the stuff that Luke > and I are > >> working > >> > on, the NA detection could be virtually free in certain > cases, which > >> could > >> > give a nice boost for long vectors that don't have any NAs > (and > >> 'know' > >> > that they don't). > >> > >> That *is* indeed a very promising prospect! > >> Thank you in advance! > >> > >> > Best, > >> > ~G > >> > >> I still am bit disappointed by the fact that it seems nobody has > >> taken a good look at my ifelse2() proposal. > >> > > > I plan to look at it soon. Thanks again for all your work. > > > ~G > > > >> > >> I really would like an alternative to ifelse() in *addition* to > >> the current ifelse(), but hopefully in the future being used in > >> quite a few places instead of ifelse() > >> efficiency but for changed semantics, namely working for > considerably > >> more "vector like" classes of 'yes' and 'no' than the current > >> ifelse(). > >> > >> As I said, the current proposal works for objects of class > >> "Date", "POSIXct", "POSIXlt", "factor", "mpfr" (pkg 'Rmpfr') > >> and hopefully for "sparseVector" (in a next version of the 'Matrix' > pkg). > >> > >> Martin > >> > >> > On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler < > >> maechler at stat.math.ethz.ch > >> >> wrote: > >> > >> >> Finally getting back to this : > >> >> > >> >> >>>>> Hadley Wickham <h.wickham at gmail.com> > >> >> >>>>> on Mon, 15 Aug 2016 07:51:35 -0500 writes: > >> >> > >> >> > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham > >> >> > <h.wickham at gmail.com> wrote: > >> >> >>> >> One possibility would also be to consider a > >> >> >>> "numbers-only" or >> rather "same type"-only {e.g., > >> >> >>> would also work for characters} >> version. > >> >> >>> > >> >> >>> > I don't know what you mean by these. > >> >> >>> > >> >> >>> In the mean time, Bob Rudis mentioned dplyr::if_else(), > >> >> >>> which is very relevant, thank you Bob! > >> >> >>> > >> >> >>> As I have found, that actually works in such a "same > >> >> >>> type"-only way: It does not try to coerce, but gives an > >> >> >>> error when the classes differ, even in this somewhat > >> >> >>> debatable case : > >> >> >>> > >> >> >>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11) Error: > >> >> >>> `false` has type 'double' not 'integer' > >> >> >>> > > >> >> >>> > >> >> >>> As documented, if_else() is clearly stricter than > >> >> >>> ifelse() and e.g., also does no recycling (but of > >> >> >>> length() 1). > >> >> >> > >> >> >> I agree that if_else() is currently too strict - it's > >> >> >> particularly annoying if you want to replace some values > >> >> >> with a missing: > >> >> >> > >> >> >> x <- sample(10) if_else(x > 5, NA, x) # Error: `false` > >> >> >> has type 'integer' not 'logical' > >> >> >> > >> >> >> But I would like to make sure that this remains an error: > >> >> >> > >> >> >> if_else(x > 5, x, "BLAH") > >> >> >> > >> >> >> Because that seems more likely to be a user error (but > >> >> >> reasonable people might certainly believe that it should > >> >> >> just work) > >> >> >> > >> >> >> dplyr is more accommodating in other places (i.e. in > >> >> >> bind_rows(), collapse() and the joins) but it's > >> >> >> surprisingly hard to get all the details right. For > >> >> >> example, what should the result of this call be? > >> >> >> > >> >> >> if_else(c(TRUE, FALSE), factor(c("a", "b")), > >> >> >> factor(c("c", "b")) > >> >> >> > >> >> >> Strictly speaking I think you could argue it's an error, > >> >> >> but that's not very user-friendly. Should it be a factor > >> >> >> with the union of the levels? Should it be a character > >> >> >> vector + warning? Should the behaviour change if one set > >> >> >> of levels is a subset of the other set? > >> >> >> > >> >> >> There are similar issues for POSIXct (if the time zones > >> >> >> are different, which should win?), and difftimes > >> >> >> (similarly for units). Ideally you'd like the behaviour > >> >> >> to be extensible for new S3 classes, which suggests it > >> >> >> should be a generic (and for the most general case, it > >> >> >> would need to dispatch on both arguments). > >> >> > >> >> > One possible principle would be to use c() - > >> >> > i.e. construct out as > >> >> > >> >> > out <- c(yes[0], no[0] > >> >> > length(out) <- max(length(yes), length(no)) > >> >> > >> >> yes; this would require that a `length<-` method works for > the > >> >> class of the result. > >> >> > >> >> Duncan Murdoch mentioned a version of this, in his very > >> >> first reply: > >> >> > >> >> ans <- c(yes, no)[seq_along(test)] > >> >> ans <- ans[seq_along(test)] > >> >> > >> >> which is less efficient for atomic vectors, but requires > >> >> less from the class: it "only" needs `c` and `[` to work > >> >> > >> >> and a mixture of your two proposals would be possible too: > >> >> > >> >> ans <- c(yes[0], no[0]) > >> >> ans <- ans[seq_along(test)] > >> >> > >> >> which does *not* work for my "mpfr" numbers (CRAN package > 'Rmpfr'), > >> >> but that's a buglet in the c.mpfr() implementation of my > Rmpfr > >> >> package... (which has already been fixed in the development > version > >> on > >> >> R-forge, > >> >> https://r-forge.r-project.org/R/?group_id=386) > >> >> > >> >> > But of course that wouldn't help with factor responses. > >> >> > >> >> Yes. However, a version of Duncan's suggestion -- of > treating > >> 'yes' first > >> >> -- does help in that case. > >> >> > >> >> For once, mainly as "feasability experiment", > >> >> I have created a github gist to make my current ifelse2() > proposal > >> >> available > >> >> for commenting, cloning, pullrequesting, etc: > >> >> > >> >> Consisting of 2 files > >> >> - ifelse-def.R : Functions definitions only, basically all > the > >> current > >> >> proposals, called ifelse*() > >> >> - ifelse-checks.R : A simplistic checking function > >> >> and examples calling it, notably demonstrating that my > >> >> ifelse2() does work with > >> >> "Date", <dateTime> (i.e. "POSIXct" and "POSIXlt"), factors, > >> >> and "mpfr" (the arbitrary-precision numbers in my package > "Rmpfr") > >> >> > >> >> Also if you are not on github, you can quickly get to the > ifelse2() > >> >> definition : > >> >> > >> >> https://gist.github.com/mmaechler/ > 9cfc3219c4b89649313bfe6853d878 > >> >> 94#file-ifelse-def-r-L168 > >> >> > >> >> > Also, if you're considering an improved ifelse(), I'd > >> >> > strongly urge you to consider adding an `na` argument, > >> >> > >> >> I now did (called it 'NA.'). > >> >> > >> >> > so that you can use ifelse() to transform all three > >> >> > possible values in a logical vector. > >> >> > >> >> > Hadley > >> >> > -- http://hadley.nz > >> >> > >> >> For those who really hate GH (and don't want or cannot > easily > >> follow the > >> >> above URL), here's my current definition: > >> >> > >> >> > >> >> ##' Martin Maechler, 14. Nov 2016 --- taking into account > Duncan M. > >> and > >> >> Hadley's > >> >> ##' ideas in the R-devel thread starting at (my mom's 86th > >> birthday): > >> >> ##' https://stat.ethz.ch/pipermail/r-devel/2016-August/ > 072970.html > >> >> ifelse2 <- function (test, yes, no, NA. = NA) { > >> >> if(!is.logical(test)) { > >> >> if(is.atomic(test)) > >> >> storage.mode(test) <- "logical" > >> >> else ## typically a "class"; storage.mode<-() typically > fails > >> >> test <- if(isS4(test)) methods::as(test, "logical") else > >> >> as.logical(test) > >> >> } > >> >> > >> >> ## No longer optimize the "if (a) x else y" cases: > >> >> ## Only "non-good" R users use ifelse(.) instead of if(.) > in these > >> >> cases. > >> >> > >> >> ans <- > >> >> tryCatch(rep(if(is.object(yes) && identical(class(yes), > class(no))) > >> >> ## as c(o) or o[0] may not work for the class > >> >> yes else c(yes[0], no[0]), length.out = > >> >> length(test)), > >> >> error = function(e) { ## try asymmetric, yes-leaning > >> >> r <- yes > >> >> r[!test] <- no[!test] > >> >> r > >> >> }) > >> >> ok <- !(nas <- is.na(test)) > >> >> if (any(test[ok])) > >> >> ans[test & ok] <- rep(yes, length.out = length(ans))[test & > ok] > >> >> if (any(!test[ok])) > >> >> ans[!test & ok] <- rep(no, length.out = length(ans))[!test > & ok] > >> >> ans[nas] <- NA. # possibly coerced to class(ans) > >> >> ans > >> >> } > >> >> > >> >> ______________________________________________ > >> >> R-devel at r-project.org mailing list > >> >> https://stat.ethz.ch/mailman/listinfo/r-devel > >> >> > >> > >> > >> > >> > -- > >> > Gabriel Becker, PhD > >> > Associate Scientist (Bioinformatics) > >> > Genentech Research > >> > >> > [[alternative HTML version deleted]] > >> > >> > > > > -- > > Gabriel Becker, PhD > > Associate Scientist (Bioinformatics) > > Genentech Research > > > [[alternative HTML version deleted]] > > > ______________________________________________ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > -- Gabriel Becker, PhD Associate Scientist (Bioinformatics) Genentech Research [[alternative HTML version deleted]] ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel