I appreciate the compliment from Ivan and still share the puzzlement at the empty return.
What is the policy for changing something that is wrong? There is a trade-off between breaking old code that worked around a problem and breaking new code written by people who make reasonable assumptions. Mathematically, it seems obvious to me that duplicated.matrix(A) should do something like this:

    nr <- nrow(A)
    v <- matrix(FALSE, nrow = nr, ncol = 1L) # or an ordinary vector?
    if (nr > 1L) # Check because 2:0 & 2:1 do not do what we want.
    {
      for (i in 2:nr)
      {
        for (j in 1:(i-1))
          if (identical(A[i,], A[j,])) # or something more complicated to handle incomparables
          { v[i] <- TRUE; break }
      }
    }
    v

Of course my code is horribly inefficient, but the difference should be just in computing the same result faster. An empty vector of some type is identical to an empty vector of the same type, so this computes

         [,1]
    [1,] FALSE
    [2,]  TRUE
    [3,]  TRUE
    [4,]  TRUE
    [5,]  TRUE

and I argue that that is correct.

A gap in documentation makes a change to the correct behaviour easier. (If the current behaviour were documented then the first step in changing the behaviour would be to issue a warning that the change is coming in a future version.) The protection for old code could be just a warning that can be turned off with a call to options. The new documentation should be more explicit.

Regards,
Jorgen.

From: Mark Webster <markwebster...@yahoo.co.uk>
To: Jorgen Harmse <jhar...@roku.com>, Ivan Krylov <ikry...@disroot.org>
Cc: "r-help@r-project.org" <r-help@r-project.org>
Subject: Re: [R] duplicated() on zero-column data frames returns empty
Message-ID: <603481690.9150754.1712522666...@mail.yahoo.com>
Content-Type: text/plain; charset="utf-8"

duplicated.matrix is an interesting one. I think a similar change would make sense, because it would have the dimensions that people would expect when using the default MARGIN = 1. However, it could be argued that it's not a needed change, because the Value section of its documentation only guarantees the dimensions of the output when using MARGIN = 0.
In that case, duplicated.matrix does indeed return the expected 5x0 matrix for your example:

    str(duplicated(matrix(0, 5, 0), MARGIN = 0))
    # logi[1:5, 0 ]

Best Regards,
Mark Webster

From: Mark Webster <markwebster...@yahoo.co.uk>
To: Ivan Krylov <ikry...@disroot.org>, "r-help@r-project.org" <r-help@r-project.org>
Subject: Re: [R] duplicated() on zero-column data frames returns empty vector
Message-ID: <1379736116.7985600.1712306452...@mail.yahoo.com>
Content-Type: text/plain; charset="utf-8"

Do you mean the row names should mean all the rows should be counted as non-duplicates? Yes, I can see the argument for that, thanks. I must say I'm still puzzled at what interpretation would motivate the current behaviour of returning a logical(0), however.

Date: Sun, 7 Apr 2024 11:00:51 +0300
From: Ivan Krylov <ikry...@disroot.org>
To: Jorgen Harmse <jhar...@roku.com>
Cc: "r-help@r-project.org" <r-help@r-project.org>, "markwebster...@yahoo.co.uk" <markwebster...@yahoo.co.uk>
Subject: Re: [R] duplicated() on zero-column data frames returns empty
Message-ID: <20240407110051.7924c03c@Tarkus>
Content-Type: text/plain; charset="utf-8"

On Fri, 5 Apr 2024 16:08:13 +0000, Jorgen Harmse <jhar...@roku.com> wrote:

> if duplicated really treated a row name as part of the row then
> any(duplicated(data.frame(…))) would always be FALSE. My expectation
> is that if key1 is a subset of key2 then all(duplicated(df[key1]) >=
> duplicated(df[key2])) should always be TRUE.

That's a good argument, thank you!
Would you suggest similar changes to duplicated.matrix too? Currently it too returns 0-length output for 0-column inputs:

    # 0-column matrix for 0-column input
    str(duplicated(matrix(0, 5, 0)))
    # logi[1:5, 0 ]

    # 1-column matrix for 1-column input
    str(duplicated(matrix(0, 5, 1)))
    # logi [1:5, 1] FALSE TRUE TRUE TRUE TRUE

    # a dim-1 array for >1-column input
    str(duplicated(matrix(0, 5, 10)))
    # logi [1:5(1d)] FALSE TRUE TRUE TRUE TRUE

--
Best regards,
Ivan
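
A minimal sketch of the behaviour argued for in this thread, written as a standalone function so that it can be run against the examples above. The names dup_rows and dup_rows.warn_zero_col are hypothetical and exist only for this illustration; neither is part of base R. The sketch handles matrices only (not data frames, and not the incomparables argument), and the warning is just one assumed way the options() opt-out could look, not a description of any planned change.

```r
# Hypothetical helper, not part of base R: TRUE for each row of the
# matrix A that is identical to some earlier row.
dup_rows <- function(A) {
  stopifnot(is.matrix(A))
  # Hypothetical opt-out warning for the zero-column case.
  if (ncol(A) == 0L && isTRUE(getOption("dup_rows.warn_zero_col", TRUE))) {
    warning("zero-column input: every row after the first counts as a duplicate",
            call. = FALSE)
  }
  v <- logical(nrow(A))             # starts as all FALSE
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(i - 1L)) {    # empty for i == 1, so no guard is needed
      if (identical(A[i, ], A[j, ])) {
        v[i] <- TRUE
        break
      }
    }
  }
  v
}

options(dup_rows.warn_zero_col = FALSE)  # silence the illustrative warning

dup_rows(matrix(0, 5, 0))
# [1] FALSE  TRUE  TRUE  TRUE  TRUE

# The subset-of-keys expectation, including the empty key.
m <- cbind(a = c(1, 1, 2, 2, 1), b = c(1, 2, 1, 1, 1))
keys <- list(character(0), "a", c("a", "b"))
d <- lapply(keys, function(k) dup_rows(m[, k, drop = FALSE]))
all(d[[1]] >= d[[2]]) && all(d[[2]] >= d[[3]])
# [1] TRUE
```

Using seq_len() instead of 2:nr removes the need for the nr > 1L check, and returning one value per row for zero-column input is what lets the subset-of-keys inequality hold all the way down to the empty key.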
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.