Re: [Rd] 1954 from NA

2021-05-24 Thread Avi Gross via R-devel
ake it doable in the above example. From: Adrian Dușa <dusa.adr...@unibuc.ro> Sent: Monday, May 24, 2021 8:18 AM To: Greg Minshall <minsh...@umich.edu> Cc: Avi Gross <avigr...@verizon.net>; r-devel <r-devel@r-project.org> Subject: Re: [

Re: [Rd] 1954 from NA

2021-05-24 Thread Nicholas Tierney
Hi all, When first hearing about ALTREP I've wondered how it might be able to be used to store special missing value information - how can we learn more about implementing ALTREP classes? The idea of carrying around a "meaning of my NAs" vector, as Gabe said, would be very interesting! I've done

Re: [Rd] 1954 from NA

2021-05-24 Thread Gabriel Becker
Hi All, So there is a not particularly active, but closely curated (i.e., everything on there should be good in terms of principled examples) GitHub organization of ALTREP examples: https://github.com/ALTREP-examples. Currently there are two examples by Luke (including a package version of the memory

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Mon, May 24, 2021 at 5:47 PM Gabriel Becker wrote: > Hi Adrian, > > I had the same thought as Luke. It is possible that you can develop an > ALTREP that carries around the tagging information you're looking for in a > way that is more persistent (in some cases) than R-level attributes and > mo

Re: [Rd] 1954 from NA

2021-05-24 Thread Gabriel Becker
Hi Adrian, I had the same thought as Luke. It is possible that you can develop an ALTREP that carries around the tagging information you're looking for in a way that is more persistent (in some cases) than R-level attributes and more hidden than additional user-visible columns. The downsides to t

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Mon, May 24, 2021 at 4:40 PM Bertram, Alexander via R-devel < r-devel@r-project.org> wrote: > Dear Adrian, > SPSS and other packages handle this problem in a very similar way to what I > described: they store additional metadata for each variable. You can see > this in the way that SPSS organiz

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
Hi Taras, On Mon, May 24, 2021 at 4:20 PM Taras Zakharko wrote: > Hi Adrian, > > Have a look at vctrs package — they have low-level primitives that might > simplify your life a bit. I think you can get quite far by creating a > custom type that stores NAs in an attribute and utilizes vctrs proxy

Re: [Rd] 1954 from NA

2021-05-24 Thread Bertram, Alexander via R-devel
Dear Adrian, SPSS and other packages handle this problem in a very similar way to what I described: they store additional metadata for each variable. You can see this in the way that SPSS organizes its file format: each "variable" has additional metadata that indicate how specific values of the va

Re: [Rd] 1954 from NA

2021-05-24 Thread Taras Zakharko
Hi Adrian, Have a look at vctrs package — they have low-level primitives that might simplify your life a bit. I think you can get quite far by creating a custom type that stores NAs in an attribute and utilizes vctrs proxy functionality to preserve these attributes across different operations.

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
Dear Alex, Thanks for piping in, I am learning with each new message. The problem is clear, the solution escapes me though. I've already tried the attributes route: it is going to triple the data size: along with the additional (logical) variable that specifies which level is missing, one also nee

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Mon, May 24, 2021 at 2:11 PM Greg Minshall wrote: > [...] > if you have 500 columns of possibly-NA'd variables, you could have one > column of 500 "bits", where each bit has one of N values, N being the > number of explanations the corresponding column has for why the NA > exists. > The mere

Re: [Rd] 1954 from NA

2021-05-24 Thread Bertram, Alexander via R-devel
Dear Adrian, I just wanted to pipe in and underscore Tomas' point: the payload bits of IEEE 754 floating point values are no place to store data that you care about or need to keep. That is not only related to the R APIs, but also how processors handle floating point values and signaling and non-s
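
Editor's sketch of the point above, not code from the thread: a minimal C illustration of why a NaN payload is a fragile place for data. The helper names (`tagged_nan`, `tag_of`, `through_float`) are hypothetical. The sketch stores a tag in the low 32 mantissa bits of a quiet NaN, then narrows the value to single precision and widens it again. A `float` has only 23 mantissa bits, so the low 32 bits of a double's payload cannot survive that round trip; what exactly comes back is platform-dependent, which is why no expected value is shown for it.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Build a quiet NaN whose low 32 mantissa bits carry a tag (illustrative only). */
double tagged_nan(uint32_t tag)
{
    uint64_t bits = UINT64_C(0x7ff8000000000000) | tag;
    double x;
    memcpy(&x, &bits, sizeof x);   /* well-defined way to reinterpret the bits */
    assert(isnan(x));              /* exponent all ones, nonzero mantissa */
    return x;
}

/* Read the low 32 bits of the double back out. */
uint32_t tag_of(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits & UINT64_C(0xffffffff));
}

/* Narrow to float and widen again: the value stays NaN,
   but the low payload bits have nowhere to live in a float. */
double through_float(double x)
{
    volatile float narrow = (float)x;
    return (double)narrow;
}
```

Comparing `tag_of(x)` with `tag_of(through_float(x))` for `x = tagged_nan(1954u)` shows whether the tag survived on a given machine; the result is still a NaN either way, so nothing signals the loss.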

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Mon, May 24, 2021 at 1:31 PM Tomas Kalibera wrote: > [...] > > For the reasons I explained, I would be against such a change. Keeping the > data on the side, as also recommended by others on this list, would allow > you for a reliable implementation. I don't want to support fragile package > c
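
Editor's sketch of "keeping the data on the side", under stated assumptions: the struct layout, the function names, and the reason codes (`REFUSED`, `DONT_KNOW`, `NOT_APPLICABLE`) are all hypothetical and are not taken from the thread or from any R package. The missing value itself is an ordinary NaN (in R it would be the regular `NA_real_`), and the explanation for the missingness lives in a parallel array of one byte per observation.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reason codes for a missing observation. */
enum na_reason { NOT_MISSING = 0, REFUSED = 1, DONT_KNOW = 2, NOT_APPLICABLE = 3 };

/* A numeric column plus a side array explaining each missing entry. */
typedef struct {
    double  *values;   /* NaN wherever the observation is missing */
    uint8_t *reasons;  /* one enum na_reason code per observation */
    size_t   n;
} tagged_column;

/* Mark observation i as missing and record why, without touching NaN payloads. */
void set_missing(tagged_column *col, size_t i, enum na_reason why)
{
    assert(i < col->n && why != NOT_MISSING);
    col->values[i]  = NAN;
    col->reasons[i] = (uint8_t)why;
}

/* Count the observations that are missing for a given reason. */
size_t count_reason(const tagged_column *col, enum na_reason why)
{
    size_t k = 0;
    for (size_t i = 0; i < col->n; i++)
        if (isnan(col->values[i]) && col->reasons[i] == (uint8_t)why)
            k++;
    return k;
}
```

The cost is one extra byte per observation, compared with eight bytes for the double itself, and the reason codes survive any arithmetic performed on `values` because they are never passed through the floating point unit.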

Re: [Rd] 1954 from NA

2021-05-24 Thread Greg Minshall
Adrian, > If it was only one column then your solution is neat. But with 5-600 > variables, each of which can contain multiple missing values, to > double this number of variables just to describe NA values seems to me > excessive. Not to mention we should be able to quickly convert / > import /

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
Hmm... If it was only one column then your solution is neat. But with 5-600 variables, each of which can contain multiple missing values, to double this number of variables just to describe NA values seems to me excessive. Not to mention we should be able to quickly convert / import / export from o

Re: [Rd] 1954 from NA

2021-05-24 Thread Tomas Kalibera
On 5/24/21 11:46 AM, Adrian Dușa wrote: > On Sun, May 23, 2021 at 10:14 PM Tomas Kalibera > <tomas.kalib...@gmail.com> wrote: > > [...] > > Good, but unfortunately the delineation between computation and > non-computation is not always transparent. Even if an operation > d

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Sun, May 23, 2021 at 10:14 PM Tomas Kalibera wrote: > [...] > > Good, but unfortunately the delineation between computation and > non-computation is not always transparent. Even if an operation doesn't > look like "computation" on the high-level, it may internally involve > computation - so, r

Re: [Rd] 1954 from NA

2021-05-23 Thread Greg Minshall
s fine for that calculation too. > > > -Original Message- > From: R-devel On Behalf Of Adrian Dușa > Sent: Sunday, May 23, 2021 2:04 PM > To: Tomas Kalibera > Cc: r-devel > Subject: Re: [Rd] 1954 from NA > > Dear Tomas, > > I understand that perfectly, b

Re: [Rd] 1954 from NA

2021-05-23 Thread Avi Gross via R-devel
alues are due to reasons A/B/C then the added columns works fine for that calculation too. -Original Message- From: R-devel On Behalf Of Adrian Dușa Sent: Sunday, May 23, 2021 2:04 PM To: Tomas Kalibera Cc: r-devel Subject: Re: [Rd] 1954 from NA Dear Tomas, I understand that perfec

Re: [Rd] 1954 from NA

2021-05-23 Thread Tomas Kalibera
On 5/23/21 8:04 PM, Adrian Dușa wrote: > Dear Tomas, > > I understand that perfectly, but that is fine. > The payload is not going to be used in any computations anyways, it is > strictly an information carrier that differentiates between different > types of (tagged) NA values. Good, but unfor

Re: [Rd] 1954 from NA

2021-05-23 Thread Adrian Dușa
Dear Tomas, I understand that perfectly, but that is fine. The payload is not going to be used in any computations anyways, it is strictly an information carrier that differentiates between different types of (tagged) NA values. Having only one NA value in R is extremely limiting for the social s

Re: [Rd] 1954 from NA

2021-05-23 Thread Tomas Kalibera
TLDR: tagging R NAs is not possible. External software should not depend on how R currently implements NA; this may change at any time. Tagging of NA is not supported in R (if it were, it would have been documented). It would not be possible to implement such tagging reliably with the current

Re: [Rd] 1954 from NA

2021-05-23 Thread brodie gaslam via R-devel
> On Sunday, May 23, 2021, 10:45:22 AM EDT, Adrian Dușa > wrote: > > On Sun, May 23, 2021 at 4:33 PM brodie gaslam via R-devel > wrote: > > I should add, I don't know that you can rely on this > > particular encoding of R's NA.  If I were trying to restore > > an NA from some external format, I

Re: [Rd] 1954 from NA

2021-05-23 Thread Adrian Dușa
On Sun, May 23, 2021 at 4:33 PM brodie gaslam via R-devel < r-devel@r-project.org> wrote: > I should add, I don't know that you can rely on this > particular encoding of R's NA. If I were trying to restore > an NA from some external format, I would just generate an > R NA via e.g. NA_real_ in the

Re: [Rd] 1954 from NA

2021-05-23 Thread Mark van der Loo
I wrote about this once over here: http://www.markvanderloo.eu/yaRb/2012/07/08/representation-of-numerical-nas-in-r-and-the-1954-enigma/ -M Op zo 23 mei 2021 15:33 schreef brodie gaslam via R-devel < r-devel@r-project.org>: > I should add, I don't know that you can rely on this > particular en

Re: [Rd] 1954 from NA

2021-05-23 Thread brodie gaslam via R-devel
I should add, I don't know that you can rely on this particular encoding of R's NA. If I were trying to restore an NA from some external format, I would just generate an R NA via e.g. NA_real_ in the R session I'm restoring the external data into, and not try to hand assemble one. Best, B. On

Re: [Rd] 1954 from NA

2021-05-23 Thread brodie gaslam via R-devel
This is because the NA in question is NA_real_, which is encoded in double precision IEEE-754, which uses 64 bits.  The "1954" is just part of the NA.  The NA must also conform to the NaN encoding for double precision numbers, which requires that the "beginning" portion of the number be "0x7ff0" (w
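
Editor's sketch of the encoding described above, under stated assumptions: the helper names (`from_words`, `to_bits`, `make_na_like`, `low_word`) are hypothetical, and the code only reproduces the bit pattern discussed in the thread (high word `0x7ff00000`, low word 1954). It is not R's own source and, as noted elsewhere in the thread, should not be used to manufacture NA values for R. The 64-bit pattern is assembled through `uint64_t` and `memcpy`, so the result does not depend on byte order.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Assemble a double from a high and a low 32-bit word. */
double from_words(uint32_t high, uint32_t low)
{
    uint64_t bits = ((uint64_t)high << 32) | low;
    double x;
    memcpy(&x, &bits, sizeof x);   /* well-defined way to reinterpret the bits */
    return x;
}

/* Raw 64-bit pattern of a double. */
uint64_t to_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* The pattern from the thread: NaN exponent in the high word, 1954 in the low word. */
double make_na_like(void)
{
    double x = from_words(UINT32_C(0x7ff00000), UINT32_C(1954));
    assert(isnan(x));   /* exponent all ones and a nonzero mantissa make it a NaN */
    return x;
}

/* Low 32 bits, which is where the 1954 lives. */
uint32_t low_word(double x)
{
    return (uint32_t)(to_bits(x) & UINT64_C(0xffffffff));
}
```

The 1954 occupies the low 32-bit word only because the 64-bit double is being viewed as two 32-bit halves; the value itself would fit in 16 bits. Note that the top mantissa bit is clear in this pattern, so some platforms may set it (quieting the NaN) when the value passes through the floating point unit, which changes the high word to `0x7ff80000` while leaving the low word alone.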

[Rd] 1954 from NA

2021-05-23 Thread Adrian Dușa
Dear R devs, I am probably missing something obvious, but still trying to understand why the 1954 from the definition of an NA has to fill 32 bits when it normally doesn't need more than 16. Wouldn't the code below achieve exactly the same thing? typedef union { double value; unsigned sh