Re: [R] duplicated() on zero-column data frames returns empty

Jorgen Harmse via R-help Mon, 08 Apr 2024 10:03:40 -0700

I appreciate the compliment from Ivan and still share the puzzlement at the 
empty return.


What is the policy for changing something that is wrong? There is a trade-off 
between breaking old code that worked around a problem and breaking new code 
written by people who make reasonable assumptions. Mathematically, it seems 
obvious to me that duplicated.matrix(A) should do something like this:

nr <- nrow(A)
v <- matrix(FALSE, nrow = nr, ncol = 1L) # or an ordinary vector?
if (nr > 1L) { # Check because 2:0 & 2:1 do not do what we want.
  for (i in 2:nr) {
    for (j in 1:(i-1)) {
      # or something more complicated to handle incomparables
      if (identical(A[i,], A[j,])) { v[i] <- TRUE; break }
    }
  }
}
v

Of course my code is horribly inefficient, but the difference should be just in 
computing the same result faster. An empty vector of some type is identical to 
an empty vector of the same type, so this computes

      [,1]
[1,] FALSE
[2,]  TRUE
[3,]  TRUE
[4,]  TRUE
[5,]  TRUE

and I argue that that is correct.
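
For anyone who wants to try the argument above, here is a minimal sketch that wraps the same double loop in a function. The name dup_rows_sketch is made up for illustration and is not part of base R; it returns an ordinary logical vector rather than a one-column matrix, which is one of the two options the comment in the loop leaves open.

```r
# Hypothetical helper: a direct wrapping of the nested loop, not base R code.
dup_rows_sketch <- function(A) {
  nr <- nrow(A)
  v <- logical(nr)                  # plain vector, one entry per row
  if (nr > 1L) {                    # guard: 2:0 and 2:1 would count downwards
    for (i in 2:nr) {
      for (j in seq_len(i - 1L)) {
        # A[i, ] drops to a vector; for a zero-column matrix it is empty
        if (identical(A[i, ], A[j, ])) { v[i] <- TRUE; break }
      }
    }
  }
  v
}

dup_rows_sketch(matrix(0, 5, 0))                        # five zero-column rows
dup_rows_sketch(matrix(c(1, 1, 2, 1, 1, 2), ncol = 2))  # rows (1,1), (1,1), (2,2)
```

The sketch is quadratic in the number of rows and ignores incomparables, so it is only meant as a reference against which a faster implementation could be compared.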

A gap in documentation makes a change to the correct behaviour easier. (If the 
current behaviour were documented then the first step in changing the behaviour 
would be to issue a warning that the change is coming in a future version.) The 
protection for old code could be just a warning that can be turned off with a 
call to options. The new documentation should be more explicit.
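
As a rough illustration of the "warning that can be turned off with a call to options" idea, the sketch below wraps duplicated() and warns on zero-column two-dimensional input. Both the wrapper name duplicated_rows_compat and the option name duplicated.zero_col.warn are assumptions invented for this example; nothing like them exists in base R, and the returned value simply follows the argument made above that all zero-column rows are identical.

```r
# Hypothetical wrapper and option name; neither exists in base R.
duplicated_rows_compat <- function(x, ...) {
  if (length(dim(x)) == 2L && ncol(x) == 0L) {
    # Warn by default; options(duplicated.zero_col.warn = FALSE) silences it.
    if (isTRUE(getOption("duplicated.zero_col.warn", TRUE))) {
      warning("zero-column input: returning one value per row",
              call. = FALSE)
    }
    # Every zero-column row equals the first, so only row 1 is not a duplicate.
    return(seq_len(nrow(x)) > 1L)
  }
  duplicated(x, ...)                # all other input goes to the usual method
}

old <- options(duplicated.zero_col.warn = FALSE)  # opt out of the warning
duplicated_rows_compat(matrix(0, 5, 0))
options(old)                                      # restore the previous setting
```

Defaulting the option to TRUE means old code gets the warning unless its author opts out, which is the kind of protection described above.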

Regards,
Jorgen.

From: Mark Webster <markwebster...@yahoo.co.uk>
To: Jorgen Harmse <jhar...@roku.com>, Ivan Krylov
        <ikry...@disroot.org>
Cc: "r-help@r-project.org" <r-help@r-project.org>
Subject: Re: [R] duplicated() on zero-column data frames returns empty
Message-ID: <603481690.9150754.1712522666...@mail.yahoo.com>
Content-Type: text/plain; charset="utf-8"

 duplicated.matrix is an interesting one. I think a similar change would make 
sense, because it would have the dimensions that people would expect when using 
the default MARGIN = 1. However, it could be argued that it's not a needed 
change, because the Value section of its documentation only guarantees the 
dimensions of the output when using MARGIN = 0. In that case, duplicated.matrix 
does indeed return the expected 5x0 matrix for your example:

str(duplicated(matrix(0, 5, 0), MARGIN = 0))
# logi[1:5, 0 ]

Best Regards,
Mark Webster

From: Mark Webster <markwebster...@yahoo.co.uk>
To: Ivan Krylov <ikry...@disroot.org>,
        "r-help@r-project.org" <r-help@r-project.org>
Subject: Re: [R] duplicated() on zero-column data frames returns
        empty vector
Message-ID: <1379736116.7985600.1712306452...@mail.yahoo.com>
Content-Type: text/plain; charset="utf-8"

Do you mean the row names should mean all the rows should be counted as 
non-duplicates? Yes, I can see the argument for that, thanks. I must say I'm 
still puzzled at what interpretation would motivate the current behaviour of 
returning a logical(0), however.

Date: Sun, 7 Apr 2024 11:00:51 +0300
From: Ivan Krylov <ikry...@disroot.org>
To: Jorgen Harmse <jhar...@roku.com>
Cc: "r-help@r-project.org" <r-help@r-project.org>,
        "markwebster...@yahoo.co.uk" <markwebster...@yahoo.co.uk>
Subject: Re: [R] duplicated() on zero-column data frames returns empty
Message-ID: <20240407110051.7924c03c@Tarkus>
Content-Type: text/plain; charset="utf-8"

On Fri, 5 Apr 2024 16:08:13 +0000
Jorgen Harmse <jhar...@roku.com> wrote:

> if duplicated really treated a row name as part of the row then
> any(duplicated(data.frame(...))) would always be FALSE. My expectation
> is that if key1 is a subset of key2 then all(duplicated(df[key1]) >=
> duplicated(df[key2])) should always be TRUE.

That's a good argument, thank you!
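
The quoted expectation can be checked on a small example. This is only a sketch: the helper name check_key_monotone and the data are made up for illustration, and the check is run on non-empty keys only, because the zero-column case is exactly what this thread is debating.

```r
# Hypothetical helper: TRUE when both results have one value per row and a row
# flagged under the larger key is also flagged under the smaller key.
check_key_monotone <- function(df, key1, key2) {
  stopifnot(all(key1 %in% key2))    # key1 must be a subset of key2
  d1 <- duplicated(df[key1])
  d2 <- duplicated(df[key2])
  length(d1) == nrow(df) && length(d2) == nrow(df) && all(d1 >= d2)
}

# Illustrative data; the column names are arbitrary.
df <- data.frame(a = c(1, 1, 2, 2, 1),
                 b = c("x", "x", "y", "z", "x"))

check_key_monotone(df, "a", c("a", "b"))
check_key_monotone(df, "b", c("a", "b"))
```

The explicit length checks matter because all() applied to a zero-length comparison is vacuously TRUE, so a zero-length result from duplicated() would otherwise pass unnoticed.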

Would you suggest similar changes to duplicated.matrix too? Currently
it too returns 0-length output for 0-column inputs:

# 0-column matrix for 0-column input
str(duplicated(matrix(0, 5, 0)))
# logi[1:5, 0 ]

# 1-column matrix for 1-column input
str(duplicated(matrix(0, 5, 1)))
# logi [1:5, 1] FALSE TRUE TRUE TRUE TRUE

# a dim-1 array for >1-column input
str(duplicated(matrix(0, 5, 10)))
# logi [1:5(1d)] FALSE TRUE TRUE TRUE TRUE

--
Best regards,
Ivan





