Re: [Rd] Inquiry about the behaviour of subsetting and names in matrices

Karolis Koncevičius Wed, 03 May 2023 02:09:03 -0700

Thank you for such a quick reply. Here are some points that I think might have 
been missed:


> I would state the question the other way: why are NA integer indices 
> allowed?
> In my experience, they are sometimes useful but they often delay the 
> detection of bugs. However, due to backward compatibility, this feature 
> cannot be removed. Adding this feature to character indices would worsen the 
> problem.

But please also note that character indices with NA are allowed for vectors. 
So this is more of an inconsistency between vectors and matrices. In vectors, 
both numeric and character subsetting work with NAs. In matrices, only numeric 
and not character subsetting works with NAs. This in itself can also be a 
source of bugs or, at least, surprises.
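
To make the inconsistency concrete, here is a minimal sketch that reuses the x 
and X objects from my original message. The result names (v_num, v_chr, m_num, 
m_chr) are only illustrative, and the error in the last case is what I see in 
current R releases, so treat it as an observation rather than documented 
behaviour:

```r
# Same objects as in the original message.
x <- setNames(1:10, letters[1:10])
X <- matrix(x, nrow = 2, dimnames = list(letters[1:2], LETTERS[1:5]))

v_num <- x[c(1, NA, 3)]      # vector, numeric index with NA: 2nd element is NA
v_chr <- x[c("a", NA, "c")]  # vector, character index with NA: 2nd element is NA
m_num <- X[, c(1, NA, 3)]    # matrix, numeric index with NA: 2nd column is all NA

# Matrix, character index with NA: this signals an error, so capture the
# condition instead of letting it stop the script.
m_chr <- tryCatch(X[, c("A", NA, "C")], error = function(e) e)
print(inherits(m_chr, "error"))
```

Three of the four combinations return a result containing NA; only the 
matrix/character combination stops with an error.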

> Setting names() on a matrix is a rarely used feature that has practically no 
> positive and no negative consequences. I see no incentive to change the 
> behavior and break existing code.

When writing this message I had the opposite opinion: that this 2nd point is 
the most bug-prone of all 3, as I would assume most users setting names() on 
a matrix would only do it by accident.
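
A small sketch of why I consider this bug-prone: the names are accepted 
silently and then vanish after an ordinary matrix operation. X is rebuilt here 
so that the snippet runs on its own, and the variable names (had_names, e4, Y, 
kept_names) are only illustrative:

```r
X <- matrix(1:10, nrow = 2, dimnames = list(letters[1:2], LETTERS[1:5]))

# names() is accepted on a matrix and stored next to dim and dimnames.
names(X) <- paste0("e", seq_along(X))
had_names <- !is.null(names(X))

# Vector-style lookup by name works while the names are present.
e4 <- unname(X["e4"])

# Ordinary matrix subsetting keeps dim and dimnames but drops names.
Y <- X[, -1]
kept_names <- !is.null(names(Y))

print(c(had_names = had_names, kept_names = kept_names))
```

Nothing warns the user at either step, which is why I would expect an 
accidental names(X) <- ... (instead of colnames or rownames) to go unnoticed.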

> In my opinion adding these features would improve the consistency of R but 
> would add more sources of bugs in an already unsafe language.

I think this may be the crux of the matter.

My original impression was that R was “clever” about the usage of NAs by 
design, i.e. when you choose an unknown object from a set of objects, the 
result is an object, but nobody knows which one, hence NA. Is it really 
accepted now that such a decision was a mistake and led to bugs in user code?
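
As a minimal illustration of what I mean by "clever" (the vector v and the 
name picked are made up for this example):

```r
v <- c(10, 20, 30)

# The index is an unknown integer, so exactly one element is selected,
# but its value cannot be known: the result is a single NA.
picked <- v[NA_integer_]

print(length(picked))
print(is.na(picked))
```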

Kind regards,
Karolis K.

> On May 3, 2023, at 11:15 AM, GILLIBERT, Andre <andre.gillib...@chu-rouen.fr> 
> wrote:
> 
> 
> Karolis wrote:
>> Hello,
> 
>> I have stumbled upon a few cases where the behaviour of naming and 
>> subsetting in matrices seems unintuitive.
>> All those look related so wanted to put everything in one message.
> 
> 
>> 1. Why row/col selection by names with NAs is not allowed?
> 
>>  x <- setNames(1:10, letters[1:10])
>>  X <- matrix(x, nrow=2, dimnames = list(letters[1:2], LETTERS[1:5]))
> 
>>  x[c(1, NA, 3)]       # vector: works and adds "NA"
>>  x[c("a", NA, "c")]   # vector: works and adds "NA"
>>  X[,c(1, NA, 3)]      # works and selects "NA" column
>>  X[,c("A", NA, "C")]  # <error>
> 
> I would state the question the other way: why are NA integer indices 
> allowed?
> In my experience, they are sometimes useful but they often delay the 
> detection of bugs. However, due to backward compatibility, this feature 
> cannot be removed. Adding this feature to character indices would worsen the 
> problem.
> 
> I see another reason to keep the behavior as it currently is: character 
> indices are most often used with column names in contexts where they are 
> unlikely to be NAs except as a consequence of a bug. In other words, I fear 
> that the valid-use-case/bug ratio would be quite poor with this feature.
> 
>> 2. Should setting names() for a matrix be allowed?
>> 
>>  names(X) <- paste0("e", 1:length(X))
>>  X["e4"]  # works
>> 
>>  # but any operation on a matrix drops the names
>>  X <- X[,-1]  # all names are gone
>>  X["e4"]      # <error>
>> 
>>  Maybe names() should not be allowed on a matrix?
> 
> Setting names() on a matrix is a rarely used feature that has practically no 
> positive and no negative consequences. I see no incentive to change the 
> behavior and break existing code.
> 
>> 3. Should selection of non-existent dimension names really be an error?
>> 
>>  x[22]   # works on a vector - gives "NA"
>>  X[,22]  # <error>
> 
> This is very often a bug on vectors and should not have been allowed on 
> vectors in the first place... But for backwards compatibility, it is hard to 
> remove. Adding this unsafe feature to matrices is a poor idea in my opinion.
> 
>>  A potential useful use-case is matching a smaller matrix to a larger one:
> 
> This is a valid use-case, but in my opinion, it adds more problems than it 
> solves.
> 
>> These also don't seem to be documented in '[', 'names', 'rownames'.
> 
> Indeed, the documentation of '[' seems to be unclear on indices out of range. 
> It can be improved.
> 
>> Interested if there are specific reasons for this behaviour, or could these 
>> potentially be adjusted?
> 
> In my opinion adding these features would improve the consistency of R but 
> would add more sources of bugs in an already unsafe language.
> 
> Sincerely
> André GILLIBERT



______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
