Re: [Rd] Should 0L * NA_integer_ be 0L?

2020-05-23 Thread Michael Chirico
OK, so maybe one way to paraphrase:

For R, the boundedness of integer vectors is an implementation detail,
rather than a deeper mathematical fact that can be exploited for this
case.

One might also expect then that overflow wouldn't result in NA, but
rather automatically cast up to numeric? But that this doesn't happen
for efficiency reasons?
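
A minimal sketch of the overflow behaviour referred to above, assuming a standard R build with 32-bit integers; the names imax, overflowed and promoted are only illustrative:

```r
# Largest representable integer in R (2^31 - 1).
imax <- .Machine$integer.max

# integer + integer: the result overflows and becomes NA_integer_
# (R also emits a warning, suppressed here).
overflowed <- suppressWarnings(imax + 1L)

# integer + double: computed in double precision, so nothing overflows.
promoted <- imax + 1

is.na(overflowed)   # TRUE
typeof(overflowed)  # "integer"
typeof(promoted)    # "double"
```

So the cast up to numeric only happens when one operand is already a double; pure integer arithmetic stays integer and signals overflow with NA.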

Would it make any sense to have a different carveout for the logical
case? For logical, storage as integer might be seen as a similar type
of implementation detail (though if we're being this strict, the
question arises of what multiplication of logical values even means).

FALSE * NA = 0L
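
For comparison, a short sketch of what R currently returns in the logical case (the line above states the proposed result, not the current one); the variable names are illustrative only:

```r
# Arithmetic coerces logical operands to integer, so the product is an
# integer NA rather than 0L.
prod_lgl <- FALSE * NA
typeof(prod_lgl)   # "integer"
is.na(prod_lgl)    # TRUE

# The logical operator, by contrast, already short-circuits the NA.
and_lgl <- FALSE & NA
and_lgl            # FALSE

# Both values a logical NA could stand for give 0L when multiplied by FALSE.
both_cases <- c(TRUE, FALSE) * FALSE
both_cases         # 0 0
```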


On Sat, May 23, 2020 at 6:49 PM Martin Maechler wrote:
>
> > Michael Chirico
> > on Sat, 23 May 2020 18:08:22 +0800 writes:
>
> > I don't see this specific case documented anywhere (I also tried to search
> > the r-devel archives, as well as I could); the only close reference
> > mentions NA & FALSE = FALSE, NA | TRUE = TRUE. And there's also this
> > snippet from R-lang:
>
> > > In cases where the result of the operation would be the same for all
> > > possible values the NA could take, the operation may return this value.
>
> > This begs the question -- shouldn't 0L * NA_integer_ be 0L?
>
> > Because this is an integer operation, and according to this definition of
> > NA:
>
> > > Missing values in the statistical sense, that is, variables whose value
> > > is not known, have the value NA
>
> > NA_integer_ should be an unknown integer value between -2^31+1 and 2^31-1.
> > Multiplying any of these values by 0 results in 0 -- that is, the result of
> > the operation would be 0 for all possible values the NA could take.
>
>
> > This came up from what seems like an inconsistency to me:
>
> > all(NA, FALSE)
> > # [1] FALSE
> > NA * FALSE
> > # [1] NA
>
> > I agree with all(NA, FALSE) being FALSE because we know for sure that all
> > cannot be true. The same can be said of the multiplication -- whether NA
> > represents TRUE or FALSE, the resulting value is 0 (FALSE).
>
> > I also agree with the numeric case, FWIW: NA_real_ * 0 has to be NA_real_,
> > because NA_real_ could be Inf or NaN, for both of which multiplication by 0
> > gives NaN, hence 0 * NA_real_ is either 0 or NaN, hence it must be NA_real_.
>
> I agree about almost everything you say above. ...
> but possibly the main conclusion.
>
> The problem with your proposed change would be that  integer
> arithmetic gives a different result than the corresponding
> "numeric" computation.
> (I don't remember other such cases in R, at least as long as the
>  integer arithmetic does not overflow.)
>
> One principle to decide such problems in S and R has been that
> the user should typically *not* have to know if their data is
> stored in float/double or in integer, and the results should be the same
> (possibly apart from staying integer for some operations).
>
>
> {{Note that there are also situations where it's really
>   undesirable that 0 * NA does *not* give 0 (but NA);
>   notably in sparse matrix operations where you very often can
>   know that NA was not Inf (or NaN) and you really would like to
>   preserve sparseness ...}}
>
>
> > [[alternative HTML version deleted]]
>
> (as you did not use plain text ..)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Should 0L * NA_integer_ be 0L?

2020-05-23 Thread Martin Maechler
> Michael Chirico 
> on Sat, 23 May 2020 18:08:22 +0800 writes:

> I don't see this specific case documented anywhere (I also tried to search
> the r-devel archives, as well as I could); the only close reference
> mentions NA & FALSE = FALSE, NA | TRUE = TRUE. And there's also this
> snippet from R-lang:

> > In cases where the result of the operation would be the same for all
> > possible values the NA could take, the operation may return this value.

> This begs the question -- shouldn't 0L * NA_integer_ be 0L?

> Because this is an integer operation, and according to this definition of
> NA:

> > Missing values in the statistical sense, that is, variables whose value
> > is not known, have the value NA

> NA_integer_ should be an unknown integer value between -2^31+1 and 2^31-1.
> Multiplying any of these values by 0 results in 0 -- that is, the result of
> the operation would be 0 for all possible values the NA could take.


> This came up from what seems like an inconsistency to me:

> all(NA, FALSE)
> # [1] FALSE
> NA * FALSE
> # [1] NA

> I agree with all(NA, FALSE) being FALSE because we know for sure that all
> cannot be true. The same can be said of the multiplication -- whether NA
> represents TRUE or FALSE, the resulting value is 0 (FALSE).

> I also agree with the numeric case, FWIW: NA_real_ * 0 has to be NA_real_,
> because NA_real_ could be Inf or NaN, for both of which multiplication by 0
> gives NaN, hence 0 * NA_real_ is either 0 or NaN, hence it must be NA_real_.

I agree about almost everything you say above. ...
but possibly the main conclusion.

The problem with your proposed change would be that  integer
arithmetic gives a different result than the corresponding
"numeric" computation.
(I don't remember other such cases in R, at least as long as the
 integer arithmetic does not overflow.)

One principle to decide such problems in S and R has been that
the user should typically *not* have to know if their data is
stored in float/double or in integer, and the results should be the same
(possibly apart from staying integer for some operations).
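
A small sketch of the principle as R behaves today, with illustrative variable names: the same data stored as integer and as double gives matching missingness under multiplication by zero.

```r
# The same values, once stored as integer and once as double.
x_int <- c(0L, 1L, NA_integer_)
x_dbl <- as.double(x_int)

# Multiply each by a zero of the matching type.
res_int <- 0L * x_int
res_dbl <- 0 * x_dbl

typeof(res_int)   # "integer"
typeof(res_dbl)   # "double"

# The missing entries line up, so the storage mode does not change the answer.
identical(is.na(res_int), is.na(res_dbl))   # TRUE
```

Under the proposed change, res_int would have no missing entry while res_dbl still would, which is the discrepancy described above.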


{{Note that there are also situations where it's really
  undesirable that 0 * NA does *not* give 0 (but NA);
  notably in sparse matrix operations where you very often can
  know that NA was not Inf (or NaN) and you really would like to
  preserve sparseness ...}}
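
The loss of sparseness can be sketched in base R with a plain vector standing in for a sparse structure (an assumption made for brevity; no sparse matrix class is used here, and the names are illustrative):

```r
# A mostly-zero vector: only one structurally nonzero entry.
x <- c(0, 0, 3, 0, 0)

# Scaling by a missing value turns every entry into a missing value,
# including the zeros, so the result is no longer sparse.
y <- x * NA_real_

n_nonzero_before <- sum(x != 0)     # 1
n_missing_after  <- sum(is.na(y))   # 5
```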


> [[alternative HTML version deleted]]

(as you did not use plain text ..)



[Rd] Should 0L * NA_integer_ be 0L?

2020-05-23 Thread Michael Chirico
I don't see this specific case documented anywhere (I also tried to search
the r-devel archives, as well as I could); the only close reference
mentions NA & FALSE = FALSE, NA | TRUE = TRUE. And there's also this
snippet from R-lang:

> In cases where the result of the operation would be the same for all
> possible values the NA could take, the operation may return this value.

This begs the question -- shouldn't 0L * NA_integer_ be 0L?

Because this is an integer operation, and according to this definition of
NA:

> Missing values in the statistical sense, that is, variables whose value
> is not known, have the value NA

NA_integer_ should be an unknown integer value between -2^31+1 and 2^31-1.
Multiplying any of these values by 0 results in 0 -- that is, the result of
the operation would be 0 for all possible values the NA could take.

This came up from what seems like an inconsistency to me:

all(NA, FALSE)
# [1] FALSE
NA * FALSE
# [1] NA

I agree with all(NA, FALSE) being FALSE because we know for sure that all
cannot be true. The same can be said of the multiplication -- whether NA
represents TRUE or FALSE, the resulting value is 0 (FALSE).

I also agree with the numeric case, FWIW: NA_real_ * 0 has to be NA_real_,
because NA_real_ could be Inf or NaN, for both of which multiplication by 0
gives NaN, hence 0 * NA_real_ is either 0 or NaN, hence it must be NA_real_.
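
The cases discussed above can be collected into one short sketch of current R behaviour; the variable names are illustrative only:

```r
# Logical operators return a definite value when the NA cannot matter.
na_and_false <- NA & FALSE        # FALSE
na_or_true   <- NA | TRUE         # TRUE
all_na_false <- all(NA, FALSE)    # FALSE

# Integer multiplication does not: the result is still missing.
int_prod <- 0L * NA_integer_
is.na(int_prod)                   # TRUE
typeof(int_prod)                  # "integer"

# In the double case, zero times a non-finite value is NaN, so the
# answer really does depend on what the NA stands for.
inf_prod <- Inf * 0               # NaN
nan_prod <- NaN * 0               # NaN
real_prod <- NA_real_ * 0
is.na(real_prod)                  # TRUE
```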
