On Mon, 10 Dec 2007, Charles C. Berry wrote:

> On Mon, 10 Dec 2007, G. Jay Kerns wrote:
>
>> Hello,
>>
>> I have been interested in setdiff() for data frames that operates
>> row-wise. I looked in the documentation, mailing lists, etc., and
>> didn't find exactly the right thing. Given data frames A, B with the
>> same columns, the goal is to extract the rows that are in A, but not
>> in B. Of course, one can usually do setdiff(rownames(A), rownames(B))
>> but that is cheating. :-)
>>
>> I played around a little bit and came up with
>>
>> setdiff.data.frame = function(A, B){
>>     g <- function( y, B){
>>         any( apply(B, 1, FUN = function(x)
>>             identical(all.equal(x, y), TRUE) ) ) }
>>     unique( A[ !apply(A, 1, FUN = function(t) g(t, B) ), ] )
>> }
>>
>> I am sure that somebody can do this a better/faster way... any ideas?
>
> setdiff.data.frame <-
>     function(A,B) A[ !duplicated( rbind(B,A) )[ -seq_len(nrow(B))] , ]
>
> This ignores rownames(A) which may not be what is wanted in every case.

I was about to suggest using the approach taken by
duplicated.data.frame, (which is to 'hash' the rows to a character
vector) then call setdiff.  E.g.

a <- do.call("paste", c(A, sep = "\r"))
b <- do.call("paste", c(B, sep = "\r"))
A[match(setdiff(a, b),a), ]

Note that apply() is intended for matrices (not data frames) and the
version given can do a horrendous amount of coercion, whereas the above
does it only once.

>
> HTH,
>
> Chuck
>
>> Any chance we could get a data.frame method for set.diff in future R
>> versions?  (The notion of "set" is somewhat ambiguous with respect to
>> rows, columns, and entries in the data frame case.)

No chance: if you have not found it in the archives, it is too rare a
request.

>> Jay
>>
>> P.S. You can see what I'm looking for with
>>
>> A <- expand.grid( 1:3, 1:3 )
>> B <- A[ 2:5, ]
>> setdiff.data.frame(A,B)

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
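
For the archives, a minimal sketch of the row-hashing idea from the reply, wrapped as a function. The name setdiff_rows and the drop = FALSE argument are additions for illustration only and do not come from the thread. The sketch assumes A and B have the same columns in the same order, and that no field contains "\r". Because paste() compares character representations, the number 1 and the string "1" would be treated as equal.

```r
# Hypothetical helper; the name is an assumption, not an existing R function.
setdiff_rows <- function(A, B) {
    # Collapse each row into a single string key, one paste() call per
    # data frame. "\r" is assumed not to occur in the data.
    a <- do.call("paste", c(A, sep = "\r"))
    b <- do.call("paste", c(B, sep = "\r"))
    # setdiff() returns the unique keys of A absent from B; match() maps
    # each key back to the first row of A that carries it.
    A[match(setdiff(a, b), a), , drop = FALSE]
}

# The example from the P.S. in the original question.
A <- expand.grid(1:3, 1:3)
B <- A[2:5, ]
res <- setdiff_rows(A, B)
```

With this example, res holds rows 1 and 6 through 9 of A, with their original row names. drop = FALSE is there so that a one-column data frame stays a data frame instead of collapsing to a vector.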