>>>>> "WK" == Wacek Kusnierczyk <waclaw.marcin.kusnierc...@idi.ntnu.no>
>>>>>     on Mon, 30 Mar 2009 14:26:24 +0200 writes:

    WK> Michael Dewey wrote:
    >> At 05:07 30/03/2009, Aaron M. Swoboda wrote:
    >>> I would like to know which rows are duplicates of each other, not
    >>> simply that a row is duplicate of another row. In the following
    >>> example rows 1 and 3 are duplicates.
    >>>
    >>> > x <- c(1,3,1)
    >>> > y <- c(2,4,2)
    >>> > z <- c(3,4,3)
    >>> > data <- data.frame(x,y,z)
    >>>   x y z
    >>> 1 1 2 3
    >>> 2 3 4 4
    >>> 3 1 2 3
    >>

    WK> i don't have any solution significantly better than what you have
    WK> already been given. but i have a warning instead.

    WK> in the below, you use both 'duplicated' and 'unique' on data frames, and
    WK> the proposed solution relies on the latter. you may want to try to
    WK> avoid both when working with data frames; this is because of how they
    WK> do (or don't) work.

    WK> duplicated (and unique, which calls duplicated) simply pastes the
    WK> content of each row into a *string*, and then works on the strings.
    WK> this means that NAs in the data frame are converted to "NA"s, and "NA"
    WK> == "NA", obviously, so that rows that include NAs and are otherwise
    WK> identical will be considered *identical*.

    WK> that's not bad (yet), but you should be aware. however, duplicated has
    WK> a parameter named 'incomparables', explained in ?duplicated as follows:

    WK> "
    WK> incomparables: a vector of values that cannot be compared. 'FALSE' is a
    WK> special value, meaning that all values can be compared, and
    WK> may be the only value accepted for methods other than the
    WK> default. It will be coerced internally to the same type as
    WK> 'x'.
    WK> "

    WK> and also

    WK> "
    WK> Values in 'incomparables' will never be marked as duplicated. This
    WK> is intended to be used for a fairly small set of values and will
    WK> not be efficient for a very large set.
    WK> "

    WK> that is, for example:

    WK> vector = c(NA, NA)
    WK> duplicated(vector)
    WK> # [1] FALSE TRUE
    WK> duplicated(vector, incomparables=NA)
    WK> # [1] FALSE FALSE

    WK> list = list(NA, NA)
    WK> duplicated(list)
    WK> # [1] FALSE TRUE
    WK> duplicated(list, incomparables=NA)
    WK> # [1] FALSE FALSE

    WK> what the documentation *fails* to tell you is that the parameter
    WK> 'incomparables' is defunct

No, not "defunct", but the contrary of it,
"not yet implemented" !

    WK> in duplicated.data.frame, which you can see in its
    WK> source code (below), or in the following example:

    WK> # data as above, or any data frame
    WK> duplicated(data, incomparables=NA)
    WK> # Error in if (!is.logical(incomparables) || incomparables)
    WK> .NotYetUsed("incomparables != FALSE") :
    WK> # missing value where TRUE/FALSE needed

    WK> the error message here is *confusing*.

yes!

    WK> the error is raised because the
    WK> author of the code made a mistake and apparently haven't carefully

((plural or singular ??))

    WK> examined and tested his product; the code goes:

((aah, ... "singular" ...))

    WK> duplicated.data.frame
    WK> # function (x, incomparables = FALSE, fromLast = FALSE, ...)
    WK> # {
    WK> #     if (!is.logical(incomparables) || incomparables)
    WK> #         .NotYetUsed("incomparables != FALSE")
    WK> #     duplicated(do.call("paste", c(x, sep = "\r")), fromLast = fromLast)
    WK> # }
    WK> # <environment: namespace:base>

    WK> clearly, the intention here is to raise an error with a (still hardly
    WK> clear) message as in:

    WK> .NotYetUsed("incomparables != FALSE")
    WK> # Error: argument 'incomparables != FALSE' is not used (yet)

    WK> but instead, if(NA) is evaluated (because '!is.logical(NA) || NA'
    WK> evaluates, *obviously*, to NA) and hence the uninformative error message.

    WK> take home point: rtfm, *but* don't believe it.

and then be helpful to the R community and send a bug report
*with* a patch if {as in this case} you are able to...

Well, that's no longer needed here, I'll fix that easily myself.
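
For illustration only, here is a minimal sketch of how the guard could be made NA-safe. The function name dup_df is hypothetical, and the use of identical() is an assumption, not necessarily the change that ends up in R:

```r
# Example data from the original question
data <- data.frame(x = c(1, 3, 1), y = c(2, 4, 2), z = c(3, 4, 3))

# The condition currently used in duplicated.data.frame, for incomparables = NA:
incomparables <- NA
cond <- !is.logical(incomparables) || incomparables
is.na(cond)   # TRUE: if (cond) then fails with "missing value where TRUE/FALSE needed"

# Hypothetical variant (dup_df is not part of base R): identical() always
# returns a single TRUE or FALSE, so the if() condition can never be NA.
dup_df <- function(x, incomparables = FALSE, fromLast = FALSE, ...) {
    if (!identical(incomparables, FALSE))
        .NotYetUsed("incomparables != FALSE")
    duplicated(do.call("paste", c(x, sep = "\r")), fromLast = fromLast)
}

dup_df(data)   # FALSE FALSE TRUE

# With incomparables = NA the intended "not used (yet)" error is now raised
msg <- tryCatch(dup_df(data, incomparables = NA),
                error = function(e) conditionMessage(e))
msg
```

With such a guard, any value of 'incomparables' other than a plain FALSE reaches .NotYetUsed() instead of tripping over if(NA).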
Martin

    WK> vQ

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel