Re: [Rd] confusing all.equal output

2023-03-03 Thread Martin Maechler
>>>>> peter dalgaard 
>>>>> on Thu, 2 Mar 2023 19:47:59 +0100 writes:

> I believe the wording goes back to Martin Maechler many
> moons ago (AFAICT towards the end of the last millennium.)
> We might leave it to him to change it?
> - Peter D.

Thank you, Peter.

Yes, this is *very* old.  I could claim that R users seem to get
more and more confused over time, because nobody had ever
complained for a quarter of a century .. (;-) ;-)

I know I had been inspired by the all.equal() implementation of
S-PLUS version 3.x (x = 4, IIRC) at the time, but then I also think
that I have to take the "full blame" on this :

Trying to think like myself "yesterday, when I was young ..",
I guess the argumentation for using  is.NA  was what I
considered helpful to the non experienced S / R user at the time:
Everybody has seen 'NA' before (and they see it in their objects
in this case) but only somewhat more experienced useRs would
know about is.na(). .. and it may be that at the time I found it
"slick" to combine the "NA" and "is.na" into  "is.NA" ...

About the other wording and how the mismatches should be counted, I
have no recollection.

But indeed, already in 1999, i.e., before R 1.0.0 existed,
that part of the code was

    out <- is.na(target)
    if(any(out != is.na(current)))
        return(paste("`is.NA' value mismatches:", sum(is.na(current)),
                     "in current,", sum(out), " in target"))

- - - 
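
As an aside, the 1999 check quoted above can be run on its own with a
small wrapper. This is only a sketch: the function name na_check_1999 is
made up for illustration and is not part of R; the body is the snippet
as quoted, plus an explicit NULL for the case where the NA patterns agree.

```r
# The 1999 check quoted above, wrapped in a hypothetical function
# (the name is made up for illustration) so it can be run on its own.
na_check_1999 <- function(target, current) {
  out <- is.na(target)
  if (any(out != is.na(current)))
    return(paste("`is.NA' value mismatches:", sum(is.na(current)),
                 "in current,", sum(out), " in target"))
  NULL  # NA patterns agree; nothing to report
}

# The message reports the total number of NAs in each argument,
# not the number of positions at which the two arguments disagree.
res_a <- na_check_1999(c(1, NA, NA), c(1, NA, 3))
res_b <- na_check_1999(c(1, NA, NA), c(NA, NA, 3))
print(res_a)
print(res_b)
```

In the second call both totals are 2 even though the NA patterns differ,
which is the case the thread calls confusing.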

Ok, now I need to work to commit a (completely orthogonal) change to
all.equal.numeric()  which had been lying around with me for
about a year at least... so I can start looking at your proposed
changes ...

Martin



Re: [Rd] confusing all.equal output

2023-03-02 Thread peter dalgaard
I believe the wording goes back to Martin Maechler many moons ago (AFAICT 
towards the end of the last millennium.)

We might leave it to him to change it?

- Peter D.


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] confusing all.equal output

2023-03-02 Thread avi.e.gross
I think if you step back, you can ask what the purpose of an error message
is and who designs it.

Is the message for the developer or others on their team, or is it something an
end-user knowing nothing about R will see?

This reminds me a bit of legal mumbo jumbo that turns off many readers because
it keeps talking about the party of the first part or the plaintiff instead of
using somewhat straighter talk.

The scenario is that you are comparing two things. Their names are not
things like "target" or "current" so even other programmers not involved in
your code will pause and wonder.

One view is to use phrases like first and second arguments/lists/whatever.
You might talk about the one on the left (but using LHS is a bit opaque)
versus the one on the right. 

But sometimes it can be too verbose. Sometimes the error message is being
generated not where everything is clear.

So ideally you could say:

WARNING Danger Will Robinson.
Comparing two things for equality.
Result finds mismatches.
There were NA found on the (left or right) that were not matched on the
other side.
Number of such found: 2

If you had a Systems Engineer write detailed requirements that included
something a bit better than the example and the programmer was able to
supply the data using the words and guidelines, it might fit some needs but
maybe not satisfy other programmers. But there are human factors people
whose job it is to help choose among alternatives and although they may not
choose well, letting a programmer come up with whatever they feel like is
generally worse. 

Yes, in their microcosm centered on a dozen lines of code, "current" and
"target" may have meaning. But are they the intended user of the product?

-----Original Message-----
From: R-devel On Behalf Of Antoine Fabri
Sent: Thursday, March 2, 2023 12:23 PM
To: peter dalgaard
Cc: R-devel
Subject: Re: [Rd] confusing all.equal output

Good points. I don't mind the terminology since target and current are the
names of the arguments. As the function is already designed to stop at the
first failing check, we might not need to enumerate or count the mismatches;
instead we could have "`NA` found in `target` but not in `current` at
position "



Re: [Rd] confusing all.equal output

2023-03-02 Thread Antoine Fabri
Good points. I don't mind the terminology since target and current are the
names of the arguments. As the function is already designed to stop at the
first failing check, we might not need to enumerate or count the mismatches;
instead we could have "`NA` found in `target` but not in `current` at
position "
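
A minimal sketch of what such a position-based message could look like.
The helper first_na_mismatch() is hypothetical and not part of base R;
it only illustrates the idea and says nothing about how all.equal() is
or would be implemented.

```r
# Hypothetical helper (not part of base R): report the first position
# at which the NA patterns of `target` and `current` differ.
first_na_mismatch <- function(target, current) {
  stopifnot(length(target) == length(current))
  na_t <- is.na(target)
  na_c <- is.na(current)
  pos <- which(na_t != na_c)
  if (length(pos) == 0L) return(NULL)   # NA patterns agree
  i <- pos[1L]
  if (na_t[i]) {
    sprintf("`NA` found in `target` but not in `current` at position %d", i)
  } else {
    sprintf("`NA` found in `current` but not in `target` at position %d", i)
  }
}

msg <- first_na_mismatch(c(1, NA, NA), c(1, NA, 3))
print(msg)
```

For the example from the original post, the first differing position is
3, where only `target` is NA.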



Re: [Rd] confusing all.equal output

2023-03-02 Thread peter dalgaard
Yes... Also, of course, the sentence after the colon does not describe the
cause of the mismatch, e.g.

> all.equal(c(1,NA,NA), c(NA,NA,3))
[1] "'is.NA' value mismatch: 2 in current 2 in target"

could be confusing. 

Perhaps "is.na() mismatch (2 positions)", with the count calculated as 
sum(is.na(current) != is.na(target)) instead? 

Or you could give both off-diagonal elements of the confusion matrix:

"target-only: 1, current-only: 1"

but actually, the whole current/target terminology is somewhat unclear.
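
To make the two alternatives above concrete, here is a minimal sketch.
The helper na_mismatch_summary() is hypothetical and not part of base R;
it only shows how the counts could be computed, not how all.equal() is
implemented.

```r
# Hypothetical helper (not part of base R): summarise where the NA
# patterns of two equal-length vectors disagree.
na_mismatch_summary <- function(target, current) {
  stopifnot(length(target) == length(current))
  na_t <- is.na(target)
  na_c <- is.na(current)
  list(
    # number of positions where exactly one of the two is NA
    positions    = sum(na_t != na_c),
    # off-diagonal cells of the 2x2 table of NA-ness
    target_only  = sum(na_t & !na_c),
    current_only = sum(!na_t & na_c)
  )
}

s <- na_mismatch_summary(c(1, NA, NA), c(NA, NA, 3))
print(sprintf("is.na() mismatch (%d positions)", s$positions))
print(sprintf("target-only: %d, current-only: %d", s$target_only, s$current_only))
```

For the example above this gives 2 mismatching positions, with one NA
only in target and one only in current.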

-pd

> On 1 Mar 2023, at 13:53 , Antoine Fabri  wrote:
> 
> Dear r-devel,
> 
> This has probably been like this forever, but is it satisfactory?
> 
> all.equal(c(1,NA,NA), c(1,NA,3))
> #> [1] "'is.NA' value mismatch: 1 in current 2 in target"
> 
> is.NA() doesn't exist (is.na() does), and is.na() is never 1 or 2.
> 
> In this example it's obvious that we're counting missing values; in a
> general situation I believe it isn't (we might understand it as the
> position of the first NA, for instance).
> 
> I would expect something like "amount of missing values mismatch: 1 in
> current 2 in target"
> 
> Thanks,
> 
> Antoine
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



[Rd] confusing all.equal output

2023-03-01 Thread Antoine Fabri
Dear r-devel,

This has probably been like this forever, but is it satisfactory?

all.equal(c(1,NA,NA), c(1,NA,3))
#> [1] "'is.NA' value mismatch: 1 in current 2 in target"

is.NA() doesn't exist (is.na() does), and is.na() is never 1 or 2.

In this example it's obvious that we're counting missing values; in a
general situation I believe it isn't (we might understand it as the
position of the first NA, for instance).

I would expect something like "amount of missing values mismatch: 1 in
current 2 in target"

Thanks,

Antoine
