Re: [Rd] [External] Re: 1954 from NA

2021-05-25 Thread Avi Gross via R-devel
Greg,

I am curious what they suggest you use multiple NaN values for. Or is it 
simply like how text messaging on phones started: standard-size packets were 
bigger than some uses required, so messages were piggy-backed on the "empty" 
space.

If by NaN you include the various flavors of NA such as NA_integer_ and 
NA_character_, I have sometimes wondered whether they are slightly different 
bit patterns or all the same but interpreted by programs as being the right 
kind for their context. It sounds like they are different, and there is one for 
pretty much each basic type except raw.

But if you add more, in that case, will each be seen as the right NA for the 
environment it is in? Heck, if R adds yet another basic type (like a 
quaternion) or a nibble, could it claim the same bits your application took 
without asking?

It does sound like some suggest you use a method with existing facilities and 
tightly control that all functions used to manipulate the data behave and 
preserve those attributes. I am not so sure the clients using it will obey. I 
have seen plenty of people use tidyverse functions for various purposes and 
then reach for something more base-R, like complete.cases() or rbind(), that 
may, but also may not, preserve what they want. And once lost, ...

Now, of course, you could write wrapper functions that take the data, copy the 
attributes, allow whatever changes, and carefully put them back before 
returning. This may not be trivial, though: if you want to do something like 
delete lots of rows, you might need to first identify which rows will be kept, 
then adjust the vector of attributes accordingly before returning it. Sorting 
is another such annoyance. Many operations perform conversions, such as making 
copies or converting a copy to a factor, that may mess things up. If it has 
already been done and people have experience, great. If not, good luck.
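In code, such a wrapper might look roughly like this (the function and attribute names here are invented for illustration; this is a minimal sketch, not a complete solution):

```r
# Hypothetical sketch: keep a per-element "tag" attribute aligned with the
# data while filtering. Plain subsetting drops custom attributes, so the
# wrapper subsets the attribute itself and reattaches it.
keep_elements <- function(x, keep) {
  tags <- attr(x, "tag")
  out <- x[keep]                  # this drops the "tag" attribute...
  attr(out, "tag") <- tags[keep]  # ...so realign and restore it
  out
}

x <- structure(c(10, 20, 30), tag = c("a", "b", "c"))
y <- keep_elements(x, c(TRUE, FALSE, TRUE))
attr(y, "tag")  # "a" "c"
```

Every operation that reorders or drops elements (sorting, merging, coercion) would need a wrapper of this kind, which is exactly the maintenance burden described above.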

-Original Message-
From: Gregory Warnes  
Sent: Tuesday, May 25, 2021 9:13 PM
To: Avi Gross 
Cc: r-devel 
Subject: Re: [Rd] [External] Re: 1954 from NA

As a side note, for floating point values, the IEEE 754 standard provides for a 
large set of NaN values, making it possible to have multiple types of NAs for 
floating point values...
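R's own NA_real_ is exactly such a NaN: on typical builds its low word carries the payload 1954. A quick way to peek at the bits (this assumes IEEE 754 doubles and little-endian byte order conventions handled by writeBin; the payload value is an implementation detail, not a documented guarantee):

```r
# Inspect the bit pattern of NA_real_, which is a quiet NaN whose
# payload distinguishes it from an ordinary NaN.
bytes <- writeBin(NA_real_, raw(), endian = "little")
payload <- readBin(bytes[1:4], "integer", endian = "little")
payload           # typically 1954 on current R builds
is.nan(NA_real_)  # FALSE: R distinguishes NA_real_ from an ordinary NaN
is.na(NaN)        # TRUE: but NaN still counts as missing
```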

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: 1954 from NA

2021-05-25 Thread Gregory Warnes
As a side note, for floating point values, the IEEE 754 standard provides for a 
large set of NaN values, making it possible to have multiple types of NAs for 
floating point values...

Sent from my iPad

> On May 25, 2021, at 3:03 PM, Avi Gross via R-devel  
> wrote:
> 
> [quoted text omitted]

Re: [Rd] [External] Re: 1954 from NA

2021-05-25 Thread Duncan Murdoch
You've already been told how to solve this:  just add attributes to the 
objects. Use the standard NA to indicate that there is some kind of 
missingness, and the attribute to describe exactly what it is.  Stick a 
class on those objects and define methods so that subsetting and 
arithmetic preserves the extra info you've added. If you do some 
operation that turns those NAs into NaNs, big deal:  the attribute will 
still be there, and is.na(NaN) still returns TRUE.


Base R doesn't need anything else.

You complained that users shouldn't need to know about attributes, and 
they won't:  you, as the author of the package that does this, will 
handle all those details.  Working in your subject area you know all the 
different kinds of NAs that people care about, and how they code them in 
input data, so you can make it all totally transparent.  If you do it 
well, someone in some other subject area with a completely different set 
of kinds of missingness will be able to adapt your code to their use.


I imagine this has all been done in one of the thousands of packages on 
CRAN, but if it hasn't been done well enough for you, do it better.
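In code, that recipe might look roughly like this (the class name "miss" and the attribute name "reason" are made up for illustration; a real package would need methods for many more generics):

```r
# Standard NAs carry the missingness; a parallel attribute carries the why.
miss <- function(x, reason = character(length(x))) {
  stopifnot(length(reason) == length(x))
  structure(x, reason = reason, class = "miss")
}

# Subsetting keeps the per-element reasons aligned with the data.
`[.miss` <- function(x, i) {
  miss(unclass(x)[i], attr(x, "reason")[i])
}

x <- miss(c(1.5, NA, 3), c("", "refused to answer", ""))
y <- x[2:3]
is.na(y)           # TRUE FALSE -- base R semantics untouched
attr(y, "reason")  # "refused to answer" ""
```

Because the data itself holds ordinary NAs, everything in base R that understands NA keeps working; only the bookkeeping of the "reason" attribute needs package support.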


Duncan Murdoch

On 25/05/2021 7:01 p.m., Adrian Dușa wrote:

Dear Avi,

That was quite a lengthy email...
What you write makes sense of course. I try hard not to deviate from the
base R, and thought my solution does just that but apparently no such luck.

I suspect, however, that something will have to eventually change: since
one of the R building blocks (such as an NA) is questioned by compilers, it
is serious enough to attract attention from the R core and maintainers.
And if that happens, my fingers are crossed the solution would allow users
to declare existing values as missing.

The importance of that, for the social sciences, cannot be stressed enough.

Best wishes, thanks once again to everyone,
Adrian

On Tue, May 25, 2021 at 10:03 PM Avi Gross via R-devel <
r-devel@r-project.org> wrote:


[quoted text omitted]

Re: [Rd] [External] Re: 1954 from NA

2021-05-25 Thread Adrian Dușa
Dear Avi,

That was quite a lengthy email...
What you write makes sense of course. I try hard not to deviate from the
base R, and thought my solution does just that but apparently no such luck.

I suspect, however, that something will have to eventually change: since
one of the R building blocks (such as an NA) is questioned by compilers, it
is serious enough to attract attention from the R core and maintainers.
And if that happens, my fingers are crossed the solution would allow users
to declare existing values as missing.

The importance of that, for the social sciences, cannot be stressed enough.

Best wishes, thanks once again to everyone,
Adrian

On Tue, May 25, 2021 at 10:03 PM Avi Gross via R-devel <
r-devel@r-project.org> wrote:

> [quoted text omitted]

Re: [Rd] [External] Re: 1954 from NA

2021-05-25 Thread Avi Gross via R-devel
That helps get more understanding of what you want to do, Adrian. Getting 
anyone to switch is always a challenge but changing R enough to tempt them may 
be a bigger challenge. This is an old story. I was the first adopter for C++ in 
my area and at first had to have my code be built with an all C project making 
me reinvent some wheels so the same “make” system knew how to build the two 
compatibly and link them. Of course, they all eventually had to join me in a 
later release but I had moved forward by then.

 

I have changed (or more accurately added) lots of languages in my life and 
continue to do so. The biggest challenge is not to just adapt and use it 
similarly to the previous ones already mastered but to understand WHY someone 
designed the language this way and what kind of idioms are common and useful 
even if that means a new way of thinking. But, of course, any “older” language 
has evolved and often drifted in multiple directions. Many now borrow heavily 
from others even when the philosophy is different and often the results are not 
pretty. Making major changes in R might have serious impacts on existing 
programs including just by making them fail as they run out of memory.

 

If you look at R, there is plenty you can do in base R, sometimes by standing 
on your head. Yet you see package after package coming along that offers not 
just new things but sometimes a reworking and even remodeling of old things. R 
has a base graphics system I now rarely use and another called lattice I have 
no reason to use again because I can do so much quite easily in ggplot. 
Similarly, the evolving tidyverse group of packages approaches things from an 
interesting direction to the point where many people mainly use it and not base 
R. So if they were to teach a class in how to gather your data and analyze it 
and draw pretty pictures, the students might walk away thinking they had 
learned R but actually have learned these packages.

 

Your scenario seems related to a common scenario of how we can have values that 
signal beyond some range in an out-of-band manner. Years ago we had functions 
in languages like C that would return a -1 on failure when only non-negative 
results were otherwise possible. That can work fine but fails in cases when any 
possible value in the range can be returned. We have languages that deal with 
this kind of thing using error handling constructs like exceptions.  Sometimes 
you bundle up multiple items into a structure and return that with one element 
of the structure holding some kind of return status and another holding the 
payload. A variation on this theme, as in languages like Go, is to have functions 
that return multiple values, one of them containing nil on success and an 
error structure on failure.

 

The situation we have here that seems to be of concern to you is that you would 
like each item in a structure to have attributes that are recognized and 
propagated as it is being processed. Older languages tended not to even have a 
concept so basic types simply existed and two instances of the number 5 might 
even be the same underlying one or two strings with the same contents and so 
on. You could of course play the game of making a struct, as mentioned above, 
but then you needed your own code to do all the handling as nothing else knew 
it contained multiple items and which ones had which purpose.

 

R did add generalized attributes and some are fairly well integrated or at 
least partially. “Names” were discussed as not being easy to keep around. 
Factors used their own tagging method that seems to work fairly well but 
probably not everywhere. But what you want may be more general and not built on 
similar foundations.

 

I look at languages like Python that are arguably more object-oriented now than 
R is and in some ways can be extended better, albeit not in others. If I wanted 
to create an object to hold the number 5 and I add methods to the object that 
allow it to participate in various ways with other objects using the main 
payload but also sometimes the hidden payload, then I might pair it with 
the string “five” but also with dozens of other strings for the word 
representing 5 in many languages. So I might have it act like a number in 
numerical situations and like text when someone is using it in writing a novel 
in any of many languages.

 

You seem to want to have the original text visible that gives a reason 
something is missing (or something like that) but have the software TREAT it 
like it is missing in calculations. In effect, you want is.na() to be a bit 
more like is.numeric() or is.character() and care more about the TYPE of what 
is being stored. An item may contain a 999 and yet not be seen as a number but 
as an NA. The problem I see is that you also may want the item to be a string 
like “DELETED” and yet include it in the vector that R insists can only hold 
integers. R does have a built-in data structure called

[Rd] Should all.equal.POSIXt respect check.attributes?

2021-05-25 Thread Jonathan Keane
Hello,

Since bugzilla #17277
(https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17277) was
resolved all.equal.POSIXt now compares timezone attributes. Comment 4
(https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17277#c4) in that
ticket indicates that both arguments check.tz (which appears to have actually
been implemented as check.tzone) and check.attributes should disable this
checking. However, looking at the implementation (and behavior with a devel
version of R) I'm finding that check.attributes is not disabling the
timezone checks.

Should the more general check.attributes disable this check (as well
as being able to specifically disable only timezone checks with
check.tzone)?
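For reference, the behaviour in question can be reproduced along these lines (assuming an R version in which all.equal.POSIXt has the check.tzone argument; exact results depend on the R build):

```r
# Two POSIXct values representing the same instant, differing only in
# the tzone attribute.
t1 <- as.POSIXct("2021-05-25 12:00:00", tz = "UTC")
t2 <- t1
attr(t2, "tzone") <- "America/Chicago"

all.equal(t1, t2)                            # reports a tzone mismatch
all.equal(t1, t2, check.tzone = FALSE)       # TRUE: tz check disabled
all.equal(t1, t2, check.attributes = FALSE)  # the question: should this
                                             # also disable the tz check?
```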

-Jon



Re: [Rd] [External] Re: 1954 from NA

2021-05-25 Thread Adrian Dușa
On Tue, May 25, 2021 at 4:14 PM  wrote:

> [...]
>
> Yes, it should be discarded.
>
> You can of course do what you like in code you keep to yourself. But
> please do not distribute code that does this. via CRAN or any other
> means. It will only create problems for those maintaining R.
>
> > After all, the NA is nothing but a tagged NaN.
>
> And we are now paying a price for what was, in hindsight, an
> unfortunate decision.
>

I (only now) understand that. That code is based on the R sources and (mind
you) on almost identical code from the haven package.

Regardless, it was not the code I was trying to show, but the vignette: the
end result, the functionality of the software.
That is, automatically treat declared missing values as NAs, without users
being required to explicitly deal with attributes.

Now that I think about it, there might be a way to do this without tagging
NAs, so back to square one.

Best wishes,
Adrian




Re: [Rd] [External] Re: 1954 from NA

2021-05-25 Thread luke-tierney

On Tue, 25 May 2021, Adrian Dușa wrote:


Dear Avi,

Thank you so much for the extended messages, I read them carefully.
While partially offering a solution (I've already been there), it creates
additional work for the user, and some of that is unnecessary.

What I am trying to achieve is best described in this draft vignette:

devtools::install_github("dusadrian/mixed")
vignette("mixed")

Once a value is declared to be missing, the user should not do anything
else about it. Despite being present, the value should automatically be
treated as missing by the software. That is the way it's done in all major
statistical packages like SAS, Stata and even SPSS.

My end goal is to make R attractive for my faculty peers (and beyond),
almost all of whom are massively using SPSS and sometimes Stata. But in
order to convince them to (finally) make the switch, I need to provide
similar functionality, not additional work.

Re. your first part of the message, I am definitely not trying to change
the R internals. The NA will still be NA, exactly as currently defined.
My initial proposal was based on the observation that the 1954 payload was
stored as an unsigned int (thus occupying 32 bits) when it is obvious it
doesn't need more than 16. That was the only proposed modification, and
everything else stays the same.

I now learned, thanks to all contributors in this list, that building
something around that payload is risky because we do not know exactly what
the compilers will do. One possible solution that I can think of, while
(still) maintaining the current functionality around the NA, is to use a
different high word for the NA that would not trigger compilation issues.
But I have absolutely no idea what that implies for the other inner
workings of R.

I very much trust the R core will eventually find a robust solution,
they've solved much more complicated problems than this. I just hope the
current thread will push the idea of tagged NAs on the table, for when they
will discuss this.

Once that will be solved, and despite the current advice discouraging this
route, I believe tagging NAs is a valuable idea that should not be
discarded.


Yes, it should be discarded.

You can of course do what you like in code you keep to yourself. But
please do not distribute code that does this. via CRAN or any other
means. It will only create problems for those maintaining R.


After all, the NA is nothing but a tagged NaN.


And we are now paying a price for what was, in hindsight, an
unfortunate decision.

Best,

luke


All the best,
Adrian


On Tue, May 25, 2021 at 7:05 AM Avi Gross via R-devel 
wrote:


[quoted text omitted]

Re: [Rd] [External] Re: 1954 from NA

2021-05-25 Thread Adrian Dușa
Dear Avi,

Thank you so much for the extended messages, I read them carefully.
While partially offering a solution (I've already been there), it creates
additional work for the user, and some of that is unnecessary.

What I am trying to achieve is best described in this draft vignette:

devtools::install_github("dusadrian/mixed")
vignette("mixed")

Once a value is declared to be missing, the user should not do anything
else about it. Despite being present, the value should automatically be
treated as missing by the software. That is the way it's done in all major
statistical packages like SAS, Stata and even SPSS.

My end goal is to make R attractive for my faculty peers (and beyond),
almost all of whom are massively using SPSS and sometimes Stata. But in
order to convince them to (finally) make the switch, I need to provide
similar functionality, not additional work.
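For what it's worth, the SPSS-style behaviour can be approximated without tagged NAs (the function name declare_missing and the attribute name miss_code below are invented for this sketch; they are not part of any existing package API):

```r
# Convert declared missing codes to real NAs, remembering the original
# code per element so no information is lost.
declare_missing <- function(x, codes) {
  idx <- x %in% codes
  structure(replace(x, idx, NA),
            miss_code = ifelse(idx, x, NA))
}

v <- declare_missing(c(1, 2, 999, 3, 998), codes = c(998, 999))
mean(v, na.rm = TRUE)  # 2 -- declared values are excluded automatically
attr(v, "miss_code")   # NA NA 999 NA 998
```

The user-facing behaviour is the same as with tagged NAs: once declared, the values are treated as missing by any function that understands ordinary NAs, and the original codes remain available for inspection.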

Re. your first part of the message, I am definitely not trying to change
the R internals. The NA will still be NA, exactly as currently defined.
My initial proposal was based on the observation that the 1954 payload was
stored as an unsigned int (thus occupying 32 bits) when it is obvious it
doesn't need more than 16. That was the only proposed modification, and
everything else stays the same.

I now learned, thanks to all contributors in this list, that building
something around that payload is risky because we do not know exactly what
the compilers will do. One possible solution that I can think of, while
(still) maintaining the current functionality around the NA, is to use a
different high word for the NA that would not trigger compilation issues.
But I have absolutely no idea what that implies for the other inner
workings of R.

I very much trust the R core will eventually find a robust solution,
they've solved much more complicated problems than this. I just hope the
current thread will push the idea of tagged NAs on the table, for when they
will discuss this.

Once that will be solved, and despite the current advice discouraging this
route, I believe tagging NAs is a valuable idea that should not be
discarded.
After all, the NA is nothing but a tagged NaN.

All the best,
Adrian


On Tue, May 25, 2021 at 7:05 AM Avi Gross via R-devel 
wrote:

> I was thinking about how one does things in a language that is properly
> object-oriented versus R that makes various half-assed attempts at being
> such.
>
> Clearly in some such languages you can make an object that is a wrapper
> that allows you to save an item that is the main payload as well as
> anything else you want. You might need a way to convince everything else to
> allow you to make things like lists and vectors and other collections of
> the objects and perhaps automatically unbox them for many purposes. As an
> example in a language like Python, you might provide methods so that adding
> A and B actually gets the value out of A and/or B and adds them properly.
> But there may be too many edge cases to handle and some software may not
> pay attention to what you want including some libraries written in other
> languages.
>
> I mention Python for the odd reason that it is now possible to combine
> Python and R in the same program and sort of switch back and forth between
> data representations. This may provide some openings for preserving and
> accessing metadata when needed.
>
> Realistically, if R was being designed from scratch TODAY, many things
> might be done differently. But I recall it being developed at Bell Labs for
> purposes where it was sort of revolutionary at the time (back when it was
> S) and designed to do things in a vectorized way and probably primarily for
> the kinds of scientific and mathematical operations where a single NA (of
> several types depending on the data) was enough when augmented by a few
> things like a NaN and Inf and -Inf. I doubt they seriously saw a need for
> an unlimited number of NA that were all the same AND also all different
> that they felt had to be built-in. As noted, had they had a reason to make
> it fully object-oriented too and made the base types such as integer into
> full-fledged objects with room for additional metadata, then things may be
> different. I note I have seen languages which have both a data type called
> integer as lower case and Integer as upper case. One of them is regularly
> boxed and unboxed automagically when used in a context that needs the
> other. As far as efficiency goes, this invisibly adds many steps. So do
> languages that sometimes take a variable that is a pointer and invisibly
> reference it to provide the underlying field rather than make you do extra
> typing and so on.
>
> So is there any reason only an NA should have such meta-data? Why not have
> reasons associated with Inf stating it was an Inf because you asked for one
> or the result of a calculation such as dividing by Zero (albeit maybe that
> might be a NaN) and so on. Maybe I could annotate integers with whether
> they are prime or even