Re: [Rd] string concatenation operator (revisited)

2021-12-12 Thread Bob Rudis
FWIW {stringi} has %+% for this functionality (and I occasionally use
it), tho I do enough processing of quite ugly string content that I
pretty much always have {stringi} loaded. That may not be true for
many other folks.

On Fri, Dec 10, 2021 at 2:07 PM Grant McDermott  wrote:
>
> Sorry I haven't had a chance to reply to anyone. I feel like I dropped a 
> grenade in a room and promptly bolted...
>
> Just to say, then, that I really appreciate everyone's comments and 
> suggestions. While I'm tempted to push back on some points, I don't think 
> it's worth it on balance, or will add much beyond what's already been said.
>
> It's interesting to see that there appears to be at least some appetite for 
> additional (f)string operators in base R... notwithstanding the valid 
> objections and the difficulties raised in this thread.
>
> Cheers,
> Grant
>
> 
> From: Grant McDermott
> Sent: Saturday, December 4, 2021 12:36:44 PM
> To: r-devel@r-project.org 
> Subject: string concatenation operator (revisited)
>
> Hi all,
>
> I wonder if the R Core team might reconsider an old feature request, as 
> detailed in this 2005 thread: 
> https://stat.ethz.ch/pipermail/r-help/2005-February/thread.html#66698
>
> The TL;DR version is base R support for a `+.character` method. This would 
> essentially provide a shortcut to `paste0`, in much the same way that `\(x)` 
> now provides a shortcut to `function(x)`.
>
> > a = "hello "; b = "world"
> > a + b
> > [1] "hello world"
>
> I appreciate some of the original concerns raised against a native "string1 + 
> string2" implementation. The above thread also provides several 
> use-at-your-own-risk workarounds. But sixteen years is a long time in 
> software development and R now stands as something of an exception on this 
> score. Python, Julia, Stata, and SQL (among various others) all support 
> native string concatenation/interpolation using binary/arithmetic operators. 
> It's been a surprising source of frustration for students in some of the 
> classes I teach, particularly those coming from another language.
>
> Many thanks for considering.
>
> PS. I hope I didn't miss any additional discussion of this issue beyond the 
> original 2005 thread. My search efforts didn't turn anything else up, except 
> this popular Stackoverflow question: 
> https://stackoverflow.com/questions/4730551/making-a-string-concatenation-operator-in-r
>
> Grant McDermott
> Assistant Professor
> Department of Economics
> University of Oregon
> www.grantmcdermott.com
>
>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] string concatenation operator (revisited)

2021-12-10 Thread Grant McDermott
Sorry I haven't had a chance to reply to anyone. I feel like I dropped a 
grenade in a room and promptly bolted...

Just to say, then, that I really appreciate everyone's comments and 
suggestions. While I'm tempted to push back on some points, I don't think it's 
worth it on balance, or will add much beyond what's already been said.

It's interesting to see that there appears to be at least some appetite for 
additional (f)string operators in base R... notwithstanding the valid 
objections and the difficulties raised in this thread.

Cheers,
Grant


From: Grant McDermott
Sent: Saturday, December 4, 2021 12:36:44 PM
To: r-devel@r-project.org 
Subject: string concatenation operator (revisited)

Hi all,

I wonder if the R Core team might reconsider an old feature request, as 
detailed in this 2005 thread: 
https://stat.ethz.ch/pipermail/r-help/2005-February/thread.html#66698

The TL;DR version is base R support for a `+.character` method. This would 
essentially provide a shortcut to `paste0`, in much the same way that `\(x)` 
now provides a shortcut to `function(x)`.

> a = "hello "; b = "world"
> a + b
> [1] "hello world"

I appreciate some of the original concerns raised against a native "string1 + 
string2" implementation. The above thread also provides several 
use-at-your-own-risk workarounds. But sixteen years is a long time in software 
development and R now stands as something of an exception on this score. 
Python, Julia, Stata, and SQL (among various others) all support native string 
concatenation/interpolation using binary/arithmetic operators. It's been a 
surprising source of frustration for students in some of the classes I teach, 
particularly those coming from another language.
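
For readers who have not seen the workarounds mentioned above, here is a minimal sketch of the user-defined operator approach. The operator name `%+%` is an assumption chosen for illustration (base R does not define it), and this is not a statement of how a built-in version would behave.

```r
# Sketch only: a user-defined infix operator that forwards to paste0().
# The name `%+%` is an illustrative assumption, not part of base R.
`%+%` <- function(lhs, rhs) paste0(lhs, rhs)

a <- "hello "
b <- "world"
greeting <- a %+% b
print(greeting)

# Base R's `+` is arithmetic only, so using it on strings signals an error.
plus_result <- tryCatch(a + b, error = function(e) "error")
print(plus_result)
```

Because the operator simply forwards to paste0(), it inherits paste0()'s coercion and NA handling, which is the behavior debated later in the thread.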

Many thanks for considering.

PS. I hope I didn't miss any additional discussion of this issue beyond the 
original 2005 thread. My search efforts didn't turn anything else up, except 
this popular Stackoverflow question: 
https://stackoverflow.com/questions/4730551/making-a-string-concatenation-operator-in-r

Grant McDermott
Assistant Professor
Department of Economics
University of Oregon
www.grantmcdermott.com





Re: [Rd] string concatenation operator (revisited)

2021-12-08 Thread Duncan Murdoch
so qualify as worth including in base R but let me 
clarify. There is a difference between being in the minimal core of a language 
and being in a list of packages that are by default included when R is built. 
Even if you include a package by default, it should not be an error to say 
library(name) if it is already loaded on your machine. So even after you make 
something part of the base distribution, people may continue to invoke it as if 
it was not there, lest the code be run on an older version.

The reality is that there can be significant costs in a tradeoff between ease 
of use with many choices and in the expense of running a bloated application 
that takes longer to load and more memory and spends more time searching 
namespaces and so on.

Does adding a properly designed "+" cause much bloat? Maybe not. But the 
guardians of the language get so many requests, that realistically they can only approve 
a small number for each release and often then have to spend more time fixing bugs after 
getting complaints about code that does not work the same anymore!


-Original Message-
From: R-devel  On Behalf Of Taras Zakharko
Sent: Tuesday, December 7, 2021 4:09 AM
To: r-devel 
Subject: Re: [Rd] string concatenation operator (revisited)

Great summary, Avi.

String concatenation could be trivially added to R, but it probably should not 
be. You will notice that modern languages tend not to use “+” to do string 
concatenation (they either have a custom operator or a special kind of pattern 
to do it) due to practical issues such an approach brings (implicit type 
casting, lack of commutativity, performance etc.). These issues will be felt 
even more so in R with its weak typing, idiosyncratic casting behavior and NAs.

As others have pointed out, any kind of behavior one wants from string 
concatenation can be implemented by custom operators as needed. This is not 
something that needs to be in base R. I would rather like the efforts to be 
directed at improving string formatting (such as glue-style built-in string 
interpolation).

— Taras



On 7 Dec 2021, at 02:27, Avi Gross via R-devel  wrote:

After seeing what others are saying, it is clear that you need to
carefully think things out before designing any implementation of a
more native concatenation operator whether it is called "+" or
anything else. There may not be any ONE right solution but unlike a
function version like paste() there is nowhere to place any options that 
specify what you mean.

You can obviously expand paste() to accept arguments like
replace.NA="" or replace.NA="" and similar arguments on what to do
if you see a NaN, an Inf or -Inf, a NULL or even an NA_character_ and
so on. Heck, you might tell it to make other substitutions as in
substitute=list(100=99, D=F) or any other nonsense you can come up with.

But you have nowhere to put options when saying:

c <- a + b

Sure, you could set various global options before the addition and
maybe reset them after, but that is not a way I like to go for
something this basic.

And enough such tinkering makes me wonder if it is easier to ask a
user to use a slightly different function like this:

paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na),
list(...)))

The above one-line function removes any NA from the argument list to
make a potentially shorter list before calling the real paste() using it.

Variations can, of course, be made that allow functionality as above.
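
As a rough illustration of the helper above and of one possible variation, the sketch below first calls paste.no.na() with scalar arguments (the case the one-liner appears to assume, since Filter() tests one value per argument), then defines a second helper that blanks out NA elements inside vectors. The name paste0_na_blank is hypothetical and introduced only for this example.

```r
# The one-liner from the message above, used with scalar arguments.
paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na), list(...)))
kept <- paste.no.na("a", NA, "b")
print(kept)

# Hypothetical variation (name is an assumption): treat NA elements as ""
# element by element, so vector arguments are handled too.
paste0_na_blank <- function(...) {
  args <- lapply(list(...), function(x) {
    x <- as.character(x)
    x[is.na(x)] <- ""
    x
  })
  do.call(paste0, args)
}
blanked <- paste0_na_blank(c("a", NA), c("x", "y"))
print(blanked)
```

Both helpers take their policy from the function body rather than from arguments, which is the point being made: a function has room for such choices, an infix operator does not.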

If R was a true object-oriented language in the same sense as others
like Python, operator overloading of "+" might be doable in more
complex ways but we can only work with what we have. I tend to agree
with others that in some places R is so lenient that all kinds of
errors can happen because it makes a guess on how to correct it.
Generally, if you really want to mix numeric and character, many
languages require you to transform any arguments to make all of
compatible types. The paste() function is clearly stated to coerce all
arguments to be of type character for you. Whereas a+b makes no such
promises and also is not properly defined even if a and b are both of
type character. Sure, we can expand the language but it may still do
things some find not to be quite what they wanted as in "2"+"3"
becoming "23" rather than 5. Right now, I can use
as.numeric("2")+as.numeric("3") and get the intended result after making very 
clear to anyone reading the code that I wanted strings converted to floating point before the 
addition.
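
The distinction drawn above can be shown in a few lines of base R. This is only a sketch of current behavior: explicit conversion followed by addition, versus explicit concatenation with paste0(), which coerces its arguments to character.

```r
# Explicit numeric conversion, then arithmetic addition.
num_sum <- as.numeric("2") + as.numeric("3")

# Explicit concatenation; both arguments are already character.
joined <- paste0("2", "3")

# paste0() coerces non-character arguments to character for you.
mixed <- paste0("x", 1)

print(num_sum)
print(joined)
print(mixed)
```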

As has been pointed out, the plus operator if used to concatenate does
not have a cognate for other operations like -*/ and R has used most
other special symbols for other purposes. So, sure, we can use something like 
.... (4 periods) if it is not already being used for something but using +
here is a tad confusing. Having said that, the makers of Python did
make that choice.

-----Origi

Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Avi Gross via R-devel
sion.

The reality is that there can be significant costs in a tradeoff between ease 
of use with many choices and in the expense of running a bloated application 
that takes longer to load and more memory and spends more time searching 
namespaces and so on. 

Does adding a properly designed "+" cause much bloat? Maybe not. But the 
guardians of the language get so many requests, that realistically they can 
only approve a small number for each release and often then have to spend more 
time fixing bugs after getting complaints about code that does not work the 
same anymore!


-Original Message-
From: R-devel  On Behalf Of Taras Zakharko
Sent: Tuesday, December 7, 2021 4:09 AM
To: r-devel 
Subject: Re: [Rd] string concatenation operator (revisited)

Great summary, Avi. 

String concatenation could be trivially added to R, but it probably should not 
be. You will notice that modern languages tend not to use “+” to do string 
concatenation (they either have a custom operator or a special kind of pattern 
to do it) due to practical issues such an approach brings (implicit type 
casting, lack of commutativity, performance etc.). These issues will be felt 
even more so in R with its weak typing, idiosyncratic casting behavior and 
NAs. 

As others have pointed out, any kind of behavior one wants from string 
concatenation can be implemented by custom operators as needed. This is not 
something that needs to be in base R. I would rather like the efforts to be 
directed at improving string formatting (such as glue-style built-in string 
interpolation).

— Taras


> On 7 Dec 2021, at 02:27, Avi Gross via R-devel  wrote:
> 
> After seeing what others are saying, it is clear that you need to 
> carefully think things out before designing any implementation of a 
> more native concatenation operator whether it is called "+" or 
> anything else. There may not be any ONE right solution but unlike a 
> function version like paste() there is nowhere to place any options that 
> specify what you mean.
> 
> You can obviously expand paste() to accept arguments like 
> replace.NA="" or replace.NA="" and similar arguments on what to do 
> if you see a NaN, an Inf or -Inf, a NULL or even an NA_character_ and 
> so on. Heck, you might tell it to make other substitutions as in 
> substitute=list(100=99, D=F) or any other nonsense you can come up with.
> 
> But you have nowhere to put options when saying:
> 
> c <- a + b
> 
> Sure, you could set various global options before the addition and 
> maybe reset them after, but that is not a way I like to go for 
> something this basic.
> 
> And enough such tinkering makes me wonder if it is easier to ask a 
> user to use a slightly different function like this:
> 
> paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na),
> list(...)))
> 
> The above one-line function removes any NA from the argument list to 
> make a potentially shorter list before calling the real paste() using it.
> 
> Variations can, of course, be made that allow functionality as above. 
> 
> If R was a true object-oriented language in the same sense as others 
> like Python, operator overloading of "+" might be doable in more 
> complex ways but we can only work with what we have. I tend to agree 
> with others that in some places R is so lenient that all kinds of 
> errors can happen because it makes a guess on how to correct it.
> Generally, if you really want to mix numeric and character, many 
> languages require you to transform any arguments to make all of 
> compatible types. The paste() function is clearly stated to coerce all 
> arguments to be of type character for you. Whereas a+b makes no such 
> promises and also is not properly defined even if a and b are both of 
> type character. Sure, we can expand the language but it may still do 
> things some find not to be quite what they wanted as in "2"+"3"
> becoming "23" rather than 5. Right now, I can use
> as.numeric("2")+as.numeric("3") and get the intended result after making very 
> clear to anyone reading the code that I wanted strings converted to floating 
> point before the addition.
> 
> As has been pointed out, the plus operator if used to concatenate does 
> not have a cognate for other operations like -*/ and R has used most 
> other special symbols for other purposes. So, sure, we can use something like 
> .... (4 periods) if it is not already being used for something but using + 
> here is a tad confusing. Having said that, the makers of Python did 
> make that choice.
> 
> -Original Message-
> From: R-devel  On Behalf Of Gabriel 
> Becker
> Sent: Monday, December 6, 2021 7:21 PM
> To: Bill Dunlap 
> Cc: Radford N

Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Martin Maechler
> Martin Maechler 
> on Tue, 7 Dec 2021 18:35:00 +0100 writes:

> Taras Zakharko 
> on Tue, 7 Dec 2021 12:56:30 +0100 writes:

>> I fully agree! General string interpolation opens a gaping security hole 
and is accompanied by all kinds of problems and decisions. What I envision 
instead is something like this:
>> f”hello {name}” 

>> Which gets parsed by R to this:

>> (STRINTERPSXP (CHARSXP (PROMISE nil)))

>> Basically, a new type of R language construct that still can be 
processed by packages (for customized interpolation like in cli etc.), with a 
default eval which is basically paste0(). The benefit here would be that this 
is eagerly parsed and syntactically checked, and that the promise code could 
carry a srcref. And of course, that you could pass an interpolated string 
expression lazily between frames without losing the environment etc… For more 
advanced applications, a low level string interpolation expression constructor 
could be provided (that could either parse a general string — at the user’s 
risk, or build it directly from expressions). 

>> — Taras

> Well, many months ago, R's  NEWS (for R-devel, then became R 4.0.0)
> contained

> * There is a new syntax for specifying _raw_ character constants
> similar to the one used in C++: r"(...)" with ... any character
> sequence not containing the sequence )".  This makes it easier to
> write strings that contain backslashes or both single and double
> quotes.  For more details see ?Quotes.

> This should be pretty close to what you propose above
> (well, you need to replace your UTF-8 forward double quotes by
> ASCII ones),
> no ?

No it is not; sorry I'm not at full strength..
Martin


>>> On 7 Dec 2021, at 12:06, Simon Urbanek  
wrote:
>>> 
>>> 
>>> 
>>>> On Dec 7, 2021, at 22:09, Taras Zakharko <taras.zakha...@uzh.ch> wrote:
>>>> 
>>>> Great summary, Avi. 
>>>> 
>>>> String concatenation could be trivially added to R, but it probably 
should not be. You will notice that modern languages tend not to use “+” to do 
string concatenation (they either have 
>>>> a custom operator or a special kind of pattern to do it) due to 
practical issues such an approach brings (implicit type casting, lack of 
commutativity, performance etc.). These issues will be felt even more so in R 
with its weak typing, idiosyncratic casting behavior and NAs. 
>>>> 
>>>> As others have pointed out, any kind of behavior one wants from 
string concatenation can be implemented by custom operators as needed. This is 
not something that needs to be in base R. I would rather like the efforts 
to be directed at improving string formatting (such as glue-style built-in 
string interpolation).
>>>> 
>>> 
>>> This is getting OT, but there is a very good reason why string 
interpolation is not in core R. As I recall it has been considered some time 
ago, but it is very dangerous as it implies evaluation on constants which opens 
a huge security hole and has questionable semantics (where you evaluate etc). 
Hence it's much easier to ban a package than to hack it out of R ;).
>>> 
>>> Cheers,
>>> Simon
>>> 
>>>> — Taras

> []




Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Martin Maechler
> Taras Zakharko 
> on Tue, 7 Dec 2021 12:56:30 +0100 writes:

> I fully agree! General string interpolation opens a gaping security hole 
and is accompanied by all kinds of problems and decisions. What I envision 
instead is something like this:
> f”hello {name}” 

> Which gets parsed by R to this:

> (STRINTERPSXP (CHARSXP (PROMISE nil)))

> Basically, a new type of R language construct that still can be processed 
by packages (for customized interpolation like in cli etc.), with a default 
eval which is basically paste0(). The benefit here would be that this is 
eagerly parsed and syntactically checked, and that the promise code could carry 
a srcref. And of course, that you could pass an interpolated string expression 
lazily between frames without losing the environment etc… For more advanced 
applications, a low level string interpolation expression constructor could be 
provided (that could either parse a general string — at the user’s risk, or 
build it directly from expressions). 

> — Taras

Well, many months ago, R's  NEWS (for R-devel, then became R 4.0.0)
contained

* There is a new syntax for specifying _raw_ character constants
  similar to the one used in C++: r"(...)" with ... any character
  sequence not containing the sequence )".  This makes it easier to
  write strings that contain backslashes or both single and double
  quotes.  For more details see ?Quotes.

This should be pretty close to what you propose above
(well, you need to replace your UTF-8 forward double quotes by
ASCII ones),
no ?
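
For reference, a small sketch of what the raw-string syntax in the NEWS item does and does not do (it requires R 4.0.0 or later). The variable names and sample strings are made up for illustration. Note that a raw string takes its contents literally, so braces are not interpolated.

```r
# Raw strings (R >= 4.0.0): backslashes and quotes are taken literally.
path <- r"(C:\Users\me\data.csv)"
quoted <- r"(She said "it's fine")"

# A raw string is not an interpolated string: {name} stays as written.
name <- "world"
literal <- r"(hello {name})"

print(path)
print(quoted)
print(literal)
```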

>> On 7 Dec 2021, at 12:06, Simon Urbanek  
wrote:
>> 
>> 
>> 
>>> On Dec 7, 2021, at 22:09, Taras Zakharko <taras.zakha...@uzh.ch> wrote:
>>> 
>>> Great summary, Avi. 
>>> 
>>> String concatenation could be trivially added to R, but it probably 
should not be. You will notice that modern languages tend not to use “+” to do 
string concatenation (they either have 
>>> a custom operator or a special kind of pattern to do it) due to 
practical issues such an approach brings (implicit type casting, lack of 
commutativity, performance etc.). These issues will be felt even more so in R 
with its weak typing, idiosyncratic casting behavior and NAs. 
>>> 
>>> As others have pointed out, any kind of behavior one wants from string 
concatenation can be implemented by custom operators as needed. This is not 
something that needs to be in base R. I would rather like the efforts to be 
directed at improving string formatting (such as glue-style built-in string 
interpolation).
>>> 
>> 
>> This is getting OT, but there is a very good reason why string 
interpolation is not in core R. As I recall it has been considered some time 
ago, but it is very dangerous as it implies evaluation on constants which opens 
a huge security hole and has questionable semantics (where you evaluate etc). 
Hence it's much easier to ban a package than to hack it out of R ;).
>> 
>> Cheers,
>> Simon
>> 
>>> — Taras

 []



Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Dirk Eddelbuettel


On 8 December 2021 at 00:06, Simon Urbanek wrote:
| Hence it's much easier to ban a package than to hack it out of R ;).

Paging Achim for suggested `fortunes` inclusion.

Dirk

-- 
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Taras Zakharko
l of compatible types. The paste() function is clearly stated to coerce
>>> all arguments to be of type character for you. Whereas a+b makes no such
>>> promises and also is not properly defined even if a and b are both of type
>>> character. Sure, we can expand the language but it may still do things some
>>> find not to be quite what they wanted as in "2"+"3" becoming "23" rather
>>> than 5. Right now, I can use as.numeric("2")+as.numeric("3") and get the
>>> intended result after making very clear to anyone reading the code that I
>>> wanted strings converted to floating point before the addition.
>>> 
>>> As has been pointed out, the plus operator if used to concatenate does not
>>> have a cognate for other operations like -*/ and R has used most other
>>> special symbols for other purposes. So, sure, we can use something like ....
>>> (4 periods) if it is not already being used for something but using + here
>>> is a tad confusing. Having said that, the makers of Python did make that
>>> choice.
>>> 
>>> -Original Message-
>>> From: R-devel  On Behalf Of Gabriel Becker
>>> Sent: Monday, December 6, 2021 7:21 PM
>>> To: Bill Dunlap 
>>> Cc: Radford Neal ; r-devel 
>>> Subject: Re: [Rd] string concatenation operator (revisited)
>>> 
>>> As I recall, there was a large discussion related to that which resulted in
>>> the recycle0 argument being added (but defaulting to FALSE) for
>>> paste/paste0.
>>> 
>>> I think a lot of these things ultimately mean that if there were to be a
>>> string concatenation operator, it probably shouldn't have behavior identical
>>> to paste0. Was that what you were getting at as well, Bill?
>>> 
>>> ~G
>>> 
>>> On Mon, Dec 6, 2021 at 4:11 PM Bill Dunlap  wrote:
>>> 
>>>> Should paste0(character(0), c("a","b")) give character(0)?
>>>> There is a fair bit of code that assumes that paste("X",NULL) gives "X"
>>>> but c(1,2)+NULL gives numeric(0).
>>>> 
>>>> -Bill
>>>> 
>>>> On Mon, Dec 6, 2021 at 1:32 PM Duncan Murdoch 
>>>> 
>>>> wrote:
>>>> 
>>>>> On 06/12/2021 4:21 p.m., Avraham Adler wrote:
>>>>>> Gabe, I agree that missingness is important to factor in. To 
>>>>>> somewhat
>>>>> abuse
>>>>>> the terminology, NA is often used to represent missingness. Perhaps 
>>>>>> concatenating character something with character something missing
>>>>> should
>>>>>> result in the original character?
>>>>> 
>>>>> I think that's a bad idea.  If you wanted to represent an empty 
>>>>> string, you should use "" or NULL, not NA.
>>>>> 
>>>>> I'd agree with Gabe, paste0("abc", NA) shouldn't give "abcNA", it 
>>>>> should give NA.
>>>>> 
>>>>> Duncan Murdoch
>>>>> 
>>>>>> 
>>>>>> Avi
>>>>>> 
>>>>>> On Mon, Dec 6, 2021 at 3:35 PM Gabriel Becker 
>>>>>> 
>>>>> wrote:
>>>>>> 
>>>>>>> Hi All,
>>>>>>> 
>>>>>>> Seeing this and the other thread (and admittedly not having 
>>>>>>> clicked
>>>>> through
>>>>>>> to the linked r-help thread), I wonder about NAs.
>>>>>>> 
>>>>>>> Should NA  "hi there"  not result in NA_character_? This 
>>>>>>> is not what any of the paste functions do, but in my opinion, NA +
>>>>> 
>>>>>>> seems like it should be NA  (not "NA"), particularly if we are 
>>>>>>> talking about `+` overloading, but potentially even in the case of 
>>>>>>> a distinct concatenation operator?
>>>>>>> 
>>>>>>> I guess what I'm saying is that in my head missingness propagation
>>>>> rules
>>>>>>> should take priority in such an operator (ie NA +  
>>>>>>> should *always * be NA).
>>>>>>> 
>>>>>>> Is that something others disagree with, or has it just not come up 
>>>>>>> yet
>>>>> in
>>>&
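
The NA and zero-length questions raised in the quoted exchange above can be made concrete with a short sketch. The operator name `%+%` and its NA-propagating semantics are assumptions for illustration, showing one of the behaviors argued for, not an agreed design. The recycle0 argument of paste0() is the one mentioned in the quoted text and is available in recent R versions.

```r
# Current behavior: paste0() turns NA into the string "NA".
plain <- paste0("abc", NA)

# Hypothetical operator (name and semantics are assumptions): concatenate,
# then propagate missingness so that anything combined with NA is NA.
`%+%` <- function(lhs, rhs) {
  out <- paste0(lhs, rhs)
  out[is.na(lhs) | is.na(rhs)] <- NA_character_
  out
}
strict <- "abc" %+% NA
vec <- c("a", NA) %+% "x"

# Zero-length operands: paste0() by default treats them as "",
# while arithmetic returns a zero-length result.
zero_default <- paste0(character(0), c("a", "b"))
zero_recycle <- paste0(character(0), c("a", "b"), recycle0 = TRUE)
zero_arith <- c(1, 2) + NULL

print(plain)
print(strict)
print(vec)
```

Which of these conventions a concatenation operator should follow is exactly the open design question in the thread; the sketch only shows that the choices differ.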

Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Simon Urbanek



> On Dec 7, 2021, at 22:09, Taras Zakharko  wrote:
> 
> Great summary, Avi. 
> 
> String concatenation could be trivially added to R, but it probably should not 
> be. You will notice that modern languages tend not to use “+” to do string 
> concatenation (they either have 
> a custom operator or a special kind of pattern to do it) due to practical 
> issues such an approach brings (implicit type casting, lack of commutativity, 
> performance etc.). These issues will be felt even more so in R with its weak 
> typing, idiosyncratic casting behavior and NAs. 
> 
> As others have pointed out, any kind of behavior one wants from string 
> concatenation can be implemented by custom operators as needed. This is not 
> something that needs to be in base R. I would rather like the efforts to 
> be directed at improving string formatting (such as glue-style built-in 
> string interpolation).
> 

This is getting OT, but there is a very good reason why string interpolation is 
not in core R. As I recall it has been considered some time ago, but it is very 
dangerous as it implies evaluation on constants which opens a huge security 
hole and has questionable semantics (where you evaluate etc). Hence it's much 
easier to ban a package than to hack it out of R ;).

Cheers,
Simon
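
To make the concern above concrete, here is a deliberately naive sketch of string interpolation written in base R. The function name interp and its brace syntax are hypothetical and are not the API of glue or of any base R feature. The key line is the eval(parse(...)) call: whatever text appears between braces is run as R code, which is why interpolating strings from untrusted sources is risky.

```r
# Hypothetical helper for illustration only; not a base R or glue function.
interp <- function(template, env = parent.frame()) {
  force(env)
  # Locate every {...} field in the template.
  m <- gregexpr("[{][^{}]*[}]", template)
  fields <- regmatches(template, m)
  regmatches(template, m) <- lapply(fields, function(f) {
    vapply(f, function(field) {
      code <- substr(field, 2L, nchar(field) - 1L)
      # The text between the braces is parsed and evaluated as R code.
      paste(format(eval(parse(text = code), envir = env)), collapse = " ")
    }, character(1), USE.NAMES = FALSE)
  })
  template
}

name <- "world"
hello <- interp("hello {name}")
arith <- interp("1 + 1 = {1 + 1}")
print(hello)
print(arith)
```

Any expression would be evaluated the same way as `1 + 1` here, including calls with side effects, so a template that arrives as data effectively becomes code.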


> — Taras
> 
> 
>> On 7 Dec 2021, at 02:27, Avi Gross via R-devel  wrote:
>> 
>> After seeing what others are saying, it is clear that you need to carefully
>> think things out before designing any implementation of a more native
>> concatenation operator whether it is called "+" or anything else. There may
>> not be any ONE right solution but unlike a function version like paste()
>> there is nowhere to place any options that specify what you mean.
>> 
>> You can obviously expand paste() to accept arguments like replace.NA="" or
>> replace.NA="" and similar arguments on what to do if you see a NaN, an
>> Inf or -Inf, a NULL or even an NA_character_ and so on. Heck, you might tell
>> it to make other substitutions as in substitute=list(100=99, D=F) or any other
>> nonsense you can come up with.
>> 
>> But you have nowhere to put options when saying:
>> 
>> c <- a + b
>> 
>> Sure, you could set various global options before the addition and maybe
>> reset them after, but that is not a way I like to go for something this
>> basic.
>> 
>> And enough such tinkering makes me wonder if it is easier to ask a user to
>> use a slightly different function like this:
>> 
>> paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na),
>> list(...)))
>> 
>> The above one-line function removes any NA from the argument list to make a
>> potentially shorter list before calling the real paste() using it.
>> 
>> Variations can, of course, be made that allow functionality as above. 
>> 
>> If R was a true object-oriented language in the same sense as others like
>> Python, operator overloading of "+" might be doable in more complex ways but
>> we can only work with what we have. I tend to agree with others that in some
>> places R is so lenient that all kinds of errors can happen because it makes
>> a guess on how to correct it. Generally, if you really want to mix numeric
>> and character, many languages require you to transform any arguments to make
>> all of compatible types. The paste() function is clearly stated to coerce
>> all arguments to be of type character for you. Whereas a+b makes no such
>> promises and also is not properly defined even if a and b are both of type
>> character. Sure, we can expand the language but it may still do things some
>> find not to be quite what they wanted as in "2"+"3" becoming "23" rather
>> than 5. Right now, I can use as.numeric("2")+as.numeric("3") and get the
>> intended result after making very clear to anyone reading the code that I
>> wanted strings converted to floating point before the addition.
>> 
>> As has been pointed out, the plus operator if used to concatenate does not
>> have a cognate for other operations like -*/ and R has used most other
>> special symbols for other purposes. So, sure, we can use something like ....
>> (4 periods) if it is not already being used for something but using + here
>> is a tad confusing. Having said that, the makers of Python did make that
>> choice.
>> 
>> -Original Message-
>> From: R-devel  On Behalf Of Gabriel Becker
>> Sent: Monday, December 6, 2021 7:21 PM
>> To: Bill Dunlap 
>> Cc: Radford Neal ; r-devel 
>> Subject: Re: [Rd] string conc

Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Duncan Murdoch

On 07/12/2021 4:09 a.m., Taras Zakharko wrote:

Great summary, Avi.

String concatenation could be trivially added to R, but it probably should not 
be. You will notice that modern languages tend not to use “+” to do string 
concatenation (they either have
a custom operator or a special kind of pattern to do it) due to practical 
issues such an approach brings (implicit type casting, lack of commutativity, 
performance etc.). These issues will be felt even more so in R with its weak 
typing, idiosyncratic casting behavior and NAs.

As others have pointed out, any kind of behavior one wants from string concatenation can be implemented by custom operators as needed. 



This is not something that needs to be in the base R. I would rather like the 
efforts to be directed on improving string formatting (such as glue-style 
built-in string interpolation).


R already has that in the glue package and elsewhere in other packages 
(e.g. I wrote a simple version for rgl). What would be the benefit of 
having it built in?


Duncan Murdoch



— Taras



On 7 Dec 2021, at 02:27, Avi Gross via R-devel  wrote:

After seeing what others are saying, it is clear that you need to carefully
think things out before designing any implementation of a more native
concatenation operator whether it is called "+" or anything else. There may
not be any ONE right solution but unlike a function version like paste()
there is nowhere to place any options that specify what you mean.

You can obviously expand paste() to accept arguments like replace.NA="" or
replace.NA="" and similar arguments on what to do if you see a NaN, an
Inf or -Inf, a NULL or even an NA_character_ and so on. Heck, you might tell
it to make other substitutions as in substitute=list(100=99, D=F) or any other
nonsense you can come up with.

But you have nowhere to put options when saying:

c <- a + b

Sure, you could set various global options before the addition and maybe
reset them after, but that is not a way I like to go for something this
basic.

And enough such tinkering makes me wonder if it is easier to ask a user to
use a slightly different function like this:

paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na),
list(...)))

The above one-line function removes any NA from the argument list to make a
potentially shorter list before calling the real paste() using it.

Variations can, of course, be made that allow functionality as above.

If R was a true object-oriented language in the same sense as others like
Python, operator overloading of "+" might be doable in more complex ways but
we can only work with what we have. I tend to agree with others that in some
places R is so lenient that all kinds of errors can happen because it makes
a guess on how to correct it. Generally, if you really want to mix numeric
and character, many languages require you to transform any arguments to make
all of compatible types. The paste() function is clearly stated to coerce
all arguments to be of type character for you. Whereas a+b makes no such
promises and also is not properly defined even if a and b are both of type
character. Sure, we can expand the language but it may still do things some
find not to be quite what they wanted as in "2"+"3" becoming "23" rather
than 5. Right now, I can use as.numeric("2")+as.numeric("3") and get the
intended result after making very clear to anyone reading the code that I
wanted strings converted to floating point before the addition.

As has been pointed out, the plus operator if used to concatenate does not
have a cognate for other operations like -*/ and R has used most other
special symbols for other purposes. So, sure, we can use something like ....
(4 periods) if it is not already being used for something but using + here
is a tad confusing. Having said that, the makers of Python did make that
choice.

-Original Message-
From: R-devel  On Behalf Of Gabriel Becker
Sent: Monday, December 6, 2021 7:21 PM
To: Bill Dunlap 
Cc: Radford Neal ; r-devel 
Subject: Re: [Rd] string concatenation operator (revisited)

As I recall, there was a large discussion related to that which resulted in
the recycle0 argument being added (but defaulting to FALSE) for
paste/paste0.

I think a lot of these things ultimately mean that if there were to be a
string concatenation operator, it probably shouldn't have behavior identical
to paste0. Was that what you were getting at as well, Bill?

~G

On Mon, Dec 6, 2021 at 4:11 PM Bill Dunlap  wrote:


Should paste0(character(0), c("a","b")) give character(0)?
There is a fair bit of code that assumes that paste("X",NULL) gives "X"
but c(1,2)+NULL gives numeric(0).

-Bill

On Mon, Dec 6, 2021 at 1:32 PM Duncan Murdoch

wrote:


On 06/12/2021 4:21 p.m., Avraham Adler wrote:

Gabe, I agree that missingness is important to factor in. To
somewhat

abuse

the termino

Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Taras Zakharko
Great summary, Avi. 

String concatenation could be trivially added to R, but it probably should not
be. You will notice that modern languages tend not to use “+” to do string
concatenation (they either have a custom operator or a special kind of pattern
to do it) due to practical issues such an approach brings (implicit type
casting, lack of commutativity, performance etc.). These issues will be felt
even more so in R with its weak typing, idiosyncratic casting behavior and NAs.

As others have pointed out, any kind of behavior one wants from string
concatenation can be implemented by custom operators as needed. This is not
something that needs to be in the base R. I would rather like the efforts to be
directed on improving string formatting (such as glue-style built-in string 
interpolation).
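
A minimal sketch of the custom-operator route mentioned above. The operator
name %+% is only an illustrative choice, defined locally here rather than taken
from any package, and it simply forwards to paste0(), so it inherits paste0()'s
coercion and NA behavior.

```r
# Hypothetical user-defined operator; the name %+% is chosen for illustration
# and is defined locally here (it is not loaded from any package).
`%+%` <- function(lhs, rhs) paste0(lhs, rhs)

a <- "hello "
b <- "world"
greeting <- a %+% b      # "hello world"
ids <- "id_" %+% 1:3     # "id_1" "id_2" "id_3" (numbers coerced, scalar recycled)
```

Because the operator lives in user code, each project can decide for itself
how NAs, zero-length inputs and non-character operands should be treated.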

— Taras


> On 7 Dec 2021, at 02:27, Avi Gross via R-devel  wrote:
> 
> After seeing what others are saying, it is clear that you need to carefully
> think things out before designing any implementation of a more native
> concatenation operator whether it is called "+" or anything else. There may
> not be any ONE right solution but unlike a function version like paste()
> there is nowhere to place any options that specify what you mean.
> 
> You can obviously expand paste() to accept arguments like replace.NA="" or
> replace.NA="" and similar arguments on what to do if you see a NaN, an
> Inf or -Inf, a NULL or even an NA_character_ and so on. Heck, you might tell
> it to make other substitutions as in substitute=list(100=99, D=F) or any other
> nonsense you can come up with.
> 
> But you have nowhere to put options when saying:
> 
> c <- a + b
> 
> Sure, you could set various global options before the addition and maybe
> reset them after, but that is not a way I like to go for something this
> basic.
> 
> And enough such tinkering makes me wonder if it is easier to ask a user to
> use a slightly different function like this:
> 
> paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na),
> list(...)))
> 
> The above one-line function removes any NA from the argument list to make a
> potentially shorter list before calling the real paste() using it.
> 
> Variations can, of course, be made that allow functionality as above. 
> 
> If R was a true object-oriented language in the same sense as others like
> Python, operator overloading of "+" might be doable in more complex ways but
> we can only work with what we have. I tend to agree with others that in some
> places R is so lenient that all kinds of errors can happen because it makes
> a guess on how to correct it. Generally, if you really want to mix numeric
> and character, many languages require you to transform any arguments to make
> all of compatible types. The paste() function is clearly stated to coerce
> all arguments to be of type character for you. Whereas a+b makes no such
> promises and also is not properly defined even if a and b are both of type
> character. Sure, we can expand the language but it may still do things some
> find not to be quite what they wanted as in "2"+"3" becoming "23" rather
> than 5. Right now, I can use as.numeric("2")+as.numeric("3") and get the
> intended result after making very clear to anyone reading the code that I
> wanted strings converted to floating point before the addition.
> 
> As has been pointed out, the plus operator if used to concatenate does not
> have a cognate for other operations like -*/ and R has used most other
> special symbols for other purposes. So, sure, we can use something like ....
> (4 periods) if it is not already being used for something but using + here
> is a tad confusing. Having said that, the makers of Python did make that
> choice.
> 
> -Original Message-
> From: R-devel  On Behalf Of Gabriel Becker
> Sent: Monday, December 6, 2021 7:21 PM
> To: Bill Dunlap 
> Cc: Radford Neal ; r-devel 
> Subject: Re: [Rd] string concatenation operator (revisited)
> 
> As I recall, there was a large discussion related to that which resulted in
> the recycle0 argument being added (but defaulting to FALSE) for
> paste/paste0.
> 
> I think a lot of these things ultimately mean that if there were to be a
> string concatenation operator, it probably shouldn't have behavior identical
> to paste0. Was that what you were getting at as well, Bill?
> 
> ~G
> 
> On Mon, Dec 6, 2021 at 4:11 PM Bill Dunlap  wrote:
> 
>> Should paste0(character(0), c("a","b")) give character(0)?
>> There is a fair bit of code that assumes that paste("X",NULL) gives "X"
>> but c(1,2)+NULL gives numeric(0).
>> 
>> -Bill
>> 
>

Re: [Rd] string concatenation operator (revisited)

2021-12-06 Thread Avi Gross via R-devel
After seeing what others are saying, it is clear that you need to carefully
think things out before designing any implementation of a more native
concatenation operator whether it is called "+" or anything else. There may
not be any ONE right solution but unlike a function version like paste()
there is nowhere to place any options that specify what you mean.

You can obviously expand paste() to accept arguments like replace.NA="" or
replace.NA="" and similar arguments on what to do if you see a NaN, an
Inf or -Inf, a NULL or even an NA_character_ and so on. Heck, you might tell
it to make other substitutions as in substitute=list(100=99, D=F) or any other
nonsense you can come up with.

But you have nowhere to put options when saying:

c <- a + b

Sure, you could set various global options before the addition and maybe
reset them after, but that is not a way I like to go for something this
basic.

And enough such tinkering makes me wonder if it is easier to ask a user to
use a slightly different function like this:

paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na),
list(...)))

The above one-line function removes any NA from the argument list to make a
potentially shorter list before calling the real paste() using it.

Variations can, of course, be made that allow functionality as above. 
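
For concreteness, here is how the one-liner above behaves next to plain
paste(). This is only a sketch: it assumes every argument has length one,
since is.na() on a longer vector returns several values per argument and the
filter would no longer line up with the argument list.

```r
# The helper from the message above, reformatted with braces.
paste.no.na <- function(...) {
  do.call(paste, Filter(Negate(is.na), list(...)))
}

with_na    <- paste("a", NA, "b")                   # "a NA b": NA becomes the string "NA"
without_na <- paste.no.na("a", NA, "b")             # "a b": the NA argument is dropped first
with_sep   <- paste.no.na("a", NA, "b", sep = "-")  # "a-b": named options still reach paste()
```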

If R was a true object-oriented language in the same sense as others like
Python, operator overloading of "+" might be doable in more complex ways but
we can only work with what we have. I tend to agree with others that in some
places R is so lenient that all kinds of errors can happen because it makes
a guess on how to correct it. Generally, if you really want to mix numeric
and character, many languages require you to transform any arguments to make
all of compatible types. The paste() function is clearly stated to coerce
all arguments to be of type character for you. Whereas a+b makes no such
promises and also is not properly defined even if a and b are both of type
character. Sure, we can expand the language but it may still do things some
find not to be quite what they wanted as in "2"+"3" becoming "23" rather
than 5. Right now, I can use as.numeric("2")+as.numeric("3") and get the
intended result after making very clear to anyone reading the code that I
wanted strings converted to floating point before the addition.
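
The contrast described above can be checked directly in current base R, where
`+` is not defined for character operands. A small sketch; the variable names
are illustrative only.

```r
# In base R, "2" + "3" signals an error rather than guessing an intent.
plus_result <- tryCatch("2" + "3", error = function(e) "error")

numeric_sum <- as.numeric("2") + as.numeric("3")  # 5: explicit conversion, then addition
pasted      <- paste0("2", "3")                   # "23": explicit concatenation
```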

As has been pointed out, the plus operator if used to concatenate does not
have a cognate for other operations like -*/ and R has used most other
special symbols for other purposes. So, sure, we can use something like ....
(4 periods) if it is not already being used for something but using + here
is a tad confusing. Having said that, the makers of Python did make that
choice.

-Original Message-
From: R-devel  On Behalf Of Gabriel Becker
Sent: Monday, December 6, 2021 7:21 PM
To: Bill Dunlap 
Cc: Radford Neal ; r-devel 
Subject: Re: [Rd] string concatenation operator (revisited)

As I recall, there was a large discussion related to that which resulted in
the recycle0 argument being added (but defaulting to FALSE) for
paste/paste0.

I think a lot of these things ultimately mean that if there were to be a
string concatenation operator, it probably shouldn't have behavior identical
to paste0. Was that what you were getting at as well, Bill?

~G

On Mon, Dec 6, 2021 at 4:11 PM Bill Dunlap  wrote:

> Should paste0(character(0), c("a","b")) give character(0)?
> There is a fair bit of code that assumes that paste("X",NULL) gives "X"
> but c(1,2)+NULL gives numeric(0).
>
> -Bill
>
> On Mon, Dec 6, 2021 at 1:32 PM Duncan Murdoch 
> 
> wrote:
>
>> On 06/12/2021 4:21 p.m., Avraham Adler wrote:
>> > Gabe, I agree that missingness is important to factor in. To 
>> > somewhat
>> abuse
>> > the terminology, NA is often used to represent missingness. Perhaps 
>> > concatenating character something with character something missing
>> should
>> > result in the original character?
>>
>> I think that's a bad idea.  If you wanted to represent an empty 
>> string, you should use "" or NULL, not NA.
>>
>> I'd agree with Gabe, paste0("abc", NA) shouldn't give "abcNA", it 
>> should give NA.
>>
>> Duncan Murdoch
>>
>> >
>> > Avi
>> >
>> > On Mon, Dec 6, 2021 at 3:35 PM Gabriel Becker 
>> > 
>> wrote:
>> >
>> >> Hi All,
>> >>
>> >> Seeing this and the other thread (and admittedly not having 
>> >> clicked
>> through
>> >> to the linked r-help thread), I wonder about NAs.
>> >>
>> >> Should NA  "hi there"  not result in NA_character_? This 
>>

Re: [Rd] string concatenation operator (revisited)

2021-12-06 Thread Bill Dunlap
> I think a lot of these things ultimately mean that if there were to be a
> string concatenation operator, it probably shouldn't have behavior
> identical to paste0. Was that what you were getting at as well, Bill?

Yes.

On Mon, Dec 6, 2021 at 4:21 PM Gabriel Becker  wrote:

> As I recall, there was a large discussion related to that which resulted
> in the recycle0 argument being added (but defaulting to FALSE) for
> paste/paste0.
>
> I think a lot of these things ultimately mean that if there were to be a
> string concatenation operator, it probably shouldn't have behavior
> identical to paste0. Was that what you were getting at as well, Bill?
>
> ~G
>
> On Mon, Dec 6, 2021 at 4:11 PM Bill Dunlap 
> wrote:
>
>> Should paste0(character(0), c("a","b")) give character(0)?
>> There is a fair bit of code that assumes that paste("X",NULL) gives "X"
>> but c(1,2)+NULL gives numeric(0).
>>
>> -Bill
>>
>> On Mon, Dec 6, 2021 at 1:32 PM Duncan Murdoch 
>> wrote:
>>
>>> On 06/12/2021 4:21 p.m., Avraham Adler wrote:
>>> > Gabe, I agree that missingness is important to factor in. To somewhat
>>> abuse
>>> > the terminology, NA is often used to represent missingness. Perhaps
>>> > concatenating character something with character something missing
>>> should
>>> > result in the original character?
>>>
>>> I think that's a bad idea.  If you wanted to represent an empty string,
>>> you should use "" or NULL, not NA.
>>>
>>> I'd agree with Gabe, paste0("abc", NA) shouldn't give "abcNA", it should
>>> give NA.
>>>
>>> Duncan Murdoch
>>>
>>> >
>>> > Avi
>>> >
>>> > On Mon, Dec 6, 2021 at 3:35 PM Gabriel Becker 
>>> wrote:
>>> >
>>> >> Hi All,
>>> >>
>>> >> Seeing this and the other thread (and admittedly not having clicked
>>> through
>>> >> to the linked r-help thread), I wonder about NAs.
>>> >>
>>> >> Should NA  "hi there"  not result in NA_character_? This is
>>> not
>>> >> what any of the paste functions do, but in my opinion, NA +
>>> 
>>> >> seems like it should be NA  (not "NA"), particularly if we are talking
>>> >> about `+` overloading, but potentially even in the case of a distinct
>>> >> concatenation operator?
>>> >>
>>> >> I guess what I'm saying is that in my head missingness propagation
>>> rules
>>> >> should take priority in such an operator (ie NA +  should
>>> >> *always* be NA).
>>> >>
>>> >> Is that something others disagree with, or has it just not come up
>>> yet in
>>> >> (the parts I have read) of this discussion?
>>> >>
>>> >> Best,
>>> >> ~G
>>> >>
>>> >> On Mon, Dec 6, 2021 at 10:03 AM Radford Neal 
>>> >> wrote:
>>> >>
>>> > In pqR (see pqR-project.org), I have implemented ! and !! as binary
>>> > string concatenation operators, equivalent to paste0 and paste,
>>> > respectively.
>>> >
>>> > For instance,
>>> >
>>> >   > "hello" ! "world"
>>> >   [1] "helloworld"
>>> >   > "hello" !! "world"
>>> >   [1] "hello world"
>>> >   > "hello" !! 1:4
>>> >   [1] "hello 1" "hello 2" "hello 3" "hello 4"
>>> 
>>>  I'm curious about the details:
>>> 
>>>  Would `1 ! 2` convert both to strings?
>>> >>>
>>> >>> They're equivalent to paste0 and paste, so 1 ! 2 produces "12", just
>>> >>> like paste0(1,2) does.  Of course, they wouldn't have to be exactly
>>> >>> equivalent to paste0 and paste - one could impose stricter
>>> >>> requirements if that seemed better for error detection.  Off hand,
>>> >>> though, I think automatically converting is more in keeping with the
>>> >>> rest of R.  Explicitly converting with as.character could be tedious.
>>> >>>
>>> >>> I suppose disallowing logical arguments might make sense to guard
>>> >>> against typos where ! was meant to be the unary-not operator, but
>>> >>> ended up being a binary operator, after some sort of typo.  I doubt
>>> >>> that this would be a common error, though.
>>> >>>
>>> >>> (Note that there's no ambiguity when there are no typos, except that
>>> >>> when negation is involved a space may be needed - so, for example,
>>> >>> "x" !  !TRUE is "xFALSE", but "x"!!TRUE is "x TRUE".  Existing uses
>>> of
>>> >>> double negation are still fine - eg, a <- !!TRUE still sets a to
>>> TRUE.
>>> >>> Parsing of operators is greedy, so "x"!!!TRUE is "x FALSE", not
>>> "xTRUE".)
>>> >>>
>>>  Where does the binary ! fit in the operator priority?  E.g. how is
>>> 
>>> a ! b > c
>>> 
>>>  parsed?
>>> >>>
>>> >>> As (a ! b) > c.
>>> >>>
>>> >>> Their precedence is between that of + and - and that of < and >.
>>> >>> So "x" ! 1+2 evaluates to "x3" and "x" ! 1+2 < "x4" is TRUE.
>>> >>>
>>> >>> (Actually, pqR also has a .. operator that fixes the problems with
>>> >>> generating sequences with the : operator, and it has precedence lower
>>> >>> than + and - and higher than ! and !!, but that's not relevant if you
>>> >>> don't have the .. operator.)
>>> >>>
>>> >>> Radford Neal
>>> >>>
>>> >>> __
>>> >>> 

Re: [Rd] string concatenation operator (revisited)

2021-12-06 Thread David Scott
I am surprised nobody so far has mentioned glue, which is an 
implementation in R of a Python idiom.

It is a reverse import in a great number of R packages on CRAN. It 
specifies how some of the special cases so far considered are treated 
which seems an advantage:

 > library(glue)
 > glue(NA, 2)
NA2
 > glue(NA, 2, .sep = " ")
NA 2
 > glue(NA, 2, .na = NULL)
NA

David Scott

On 7/12/2021 1:20 pm, Gabriel Becker wrote:
> As I recall, there was a large discussion related to that which 
> resulted in
> the recycle0 argument being added (but defaulting to FALSE) for
> paste/paste0.
>
> I think a lot of these things ultimately mean that if there were to be a
> string concatenation operator, it probably shouldn't have behavior
> identical to paste0. Was that what you were getting at as well, Bill?
>
> ~G
>
> On Mon, Dec 6, 2021 at 4:11 PM Bill Dunlap  
> wrote:
>
> > Should paste0(character(0), c("a","b")) give character(0)?
> > There is a fair bit of code that assumes that paste("X",NULL) gives "X"
> > but c(1,2)+NULL gives numeric(0).
> >
> > -Bill
> >
> > On Mon, Dec 6, 2021 at 1:32 PM Duncan Murdoch 
> > wrote:
> >
> >> On 06/12/2021 4:21 p.m., Avraham Adler wrote:
> >> > Gabe, I agree that missingness is important to factor in. To somewhat
> >> abuse
> >> > the terminology, NA is often used to represent missingness. Perhaps
> >> > concatenating character something with character something missing
> >> should
> >> > result in the original character?
> >>
> >> I think that's a bad idea. If you wanted to represent an empty string,
> >> you should use "" or NULL, not NA.
> >>
> >> I'd agree with Gabe, paste0("abc", NA) shouldn't give "abcNA", it 
> should
> >> give NA.
> >>
> >> Duncan Murdoch
> >>
> >> >
> >> > Avi
> >> >
> >> > On Mon, Dec 6, 2021 at 3:35 PM Gabriel Becker 
> >> wrote:
> >> >
> >> >> Hi All,
> >> >>
> >> >> Seeing this and the other thread (and admittedly not having clicked
> >> through
> >> >> to the linked r-help thread), I wonder about NAs.
> >> >>
> >> >> Should NA  "hi there" not result in NA_character_? This 
> is not
> >> >> what any of the paste functions do, but in my opinion, NA +
> >> 
> >> >> seems like it should be NA (not "NA"), particularly if we are 
> talking
> >> >> about `+` overloading, but potentially even in the case of a 
> distinct
> >> >> concatenation operator?
> >> >>
> >> >> I guess what I'm saying is that in my head missingness propagation
> >> rules
> >> >> should take priority in such an operator (ie NA +  should
> >> >> *always* be NA).
> >> >>
> >> >> Is that something others disagree with, or has it just not come 
> up yet
> >> in
> >> >> (the parts I have read) of this discussion?
> >> >>
> >> >> Best,
> >> >> ~G
> >> >>
> >> >> On Mon, Dec 6, 2021 at 10:03 AM Radford Neal 
> 
> >> >> wrote:
> >> >>
> >> > In pqR (see pqR-project.org), I have implemented ! and !! as 
> binary
> >> > string concatenation operators, equivalent to paste0 and paste,
> >> > respectively.
> >> >
> >> > For instance,
> >> >
> >> > > "hello" ! "world"
> >> > [1] "helloworld"
> >> > > "hello" !! "world"
> >> > [1] "hello world"
> >> > > "hello" !! 1:4
> >> > [1] "hello 1" "hello 2" "hello 3" "hello 4"
> >> 
> >>  I'm curious about the details:
> >> 
> >>  Would `1 ! 2` convert both to strings?
> >> >>>
> >> >>> They're equivalent to paste0 and paste, so 1 ! 2 produces "12", 
> just
> >> >>> like paste0(1,2) does. Of course, they wouldn't have to be exactly
> >> >>> equivalent to paste0 and paste - one could impose stricter
> >> >>> requirements if that seemed better for error detection. Off hand,
> >> >>> though, I think automatically converting is more in keeping 
> with the
> >> >>> rest of R. Explicitly converting with as.character could be 
> tedious.
> >> >>>
> >> >>> I suppose disallowing logical arguments might make sense to guard
> >> >>> against typos where ! was meant to be the unary-not operator, but
> >> >>> ended up being a binary operator, after some sort of typo. I doubt
> >> >>> that this would be a common error, though.
> >> >>>
> >> >>> (Note that there's no ambiguity when there are no typos, except 
> that
> >> >>> when negation is involved a space may be needed - so, for example,
> >> >>> "x" ! !TRUE is "xFALSE", but "x"!!TRUE is "x TRUE". Existing 
> uses of
> >> >>> double negation are still fine - eg, a <- !!TRUE still sets a 
> to TRUE.
> >> >>> Parsing of operators is greedy, so "x"!!!TRUE is "x FALSE", not
> >> "xTRUE".)
> >> >>>
> >>  Where does the binary ! fit in the operator priority? E.g. how is
> >> 
> >>  a ! b > c
> >> 
> >>  parsed?
> >> >>>
> >> >>> As (a ! b) > c.
> >> >>>
> >> >>> Their precedence is between that of + and - and that of < and >.
> >> >>> So "x" ! 1+2 evaluates to "x3" and "x" ! 1+2 < "x4" is TRUE.
> >> >>>
> >> >>> (Actually, pqR also has a .. operator that fixes the problems with
> >> >>> generating sequences with the : operator, and it has 

Re: [Rd] string concatenation operator (revisited)

2021-12-06 Thread Gabriel Becker
As I recall, there was a large discussion related to that which resulted in
the recycle0 argument being added (but defaulting to FALSE) for
paste/paste0.

I think a lot of these things ultimately mean that if there were to be a
string concatenation operator, it probably shouldn't have behavior
identical to paste0. Was that what you were getting at as well, Bill?

~G

On Mon, Dec 6, 2021 at 4:11 PM Bill Dunlap  wrote:

> Should paste0(character(0), c("a","b")) give character(0)?
> There is a fair bit of code that assumes that paste("X",NULL) gives "X"
> but c(1,2)+NULL gives numeric(0).
>
> -Bill
>
> On Mon, Dec 6, 2021 at 1:32 PM Duncan Murdoch 
> wrote:
>
>> On 06/12/2021 4:21 p.m., Avraham Adler wrote:
>> > Gabe, I agree that missingness is important to factor in. To somewhat
>> abuse
>> > the terminology, NA is often used to represent missingness. Perhaps
>> > concatenating character something with character something missing
>> should
>> > result in the original character?
>>
>> I think that's a bad idea.  If you wanted to represent an empty string,
>> you should use "" or NULL, not NA.
>>
>> I'd agree with Gabe, paste0("abc", NA) shouldn't give "abcNA", it should
>> give NA.
>>
>> Duncan Murdoch
>>
>> >
>> > Avi
>> >
>> > On Mon, Dec 6, 2021 at 3:35 PM Gabriel Becker 
>> wrote:
>> >
>> >> Hi All,
>> >>
>> >> Seeing this and the other thread (and admittedly not having clicked
>> through
>> >> to the linked r-help thread), I wonder about NAs.
>> >>
>> >> Should NA  "hi there"  not result in NA_character_? This is not
>> >> what any of the paste functions do, but in my opinion, NA +
>> 
>> >> seems like it should be NA  (not "NA"), particularly if we are talking
>> >> about `+` overloading, but potentially even in the case of a distinct
>> >> concatenation operator?
>> >>
>> >> I guess what I'm saying is that in my head missingness propagation
>> rules
>> >> should take priority in such an operator (ie NA +  should
>> >> *always* be NA).
>> >>
>> >> Is that something others disagree with, or has it just not come up yet
>> in
>> >> (the parts I have read) of this discussion?
>> >>
>> >> Best,
>> >> ~G
>> >>
>> >> On Mon, Dec 6, 2021 at 10:03 AM Radford Neal 
>> >> wrote:
>> >>
>> > In pqR (see pqR-project.org), I have implemented ! and !! as binary
>> > string concatenation operators, equivalent to paste0 and paste,
>> > respectively.
>> >
>> > For instance,
>> >
>> >   > "hello" ! "world"
>> >   [1] "helloworld"
>> >   > "hello" !! "world"
>> >   [1] "hello world"
>> >   > "hello" !! 1:4
>> >   [1] "hello 1" "hello 2" "hello 3" "hello 4"
>> 
>>  I'm curious about the details:
>> 
>>  Would `1 ! 2` convert both to strings?
>> >>>
>> >>> They're equivalent to paste0 and paste, so 1 ! 2 produces "12", just
>> >>> like paste0(1,2) does.  Of course, they wouldn't have to be exactly
>> >>> equivalent to paste0 and paste - one could impose stricter
>> >>> requirements if that seemed better for error detection.  Off hand,
>> >>> though, I think automatically converting is more in keeping with the
>> >>> rest of R.  Explicitly converting with as.character could be tedious.
>> >>>
>> >>> I suppose disallowing logical arguments might make sense to guard
>> >>> against typos where ! was meant to be the unary-not operator, but
>> >>> ended up being a binary operator, after some sort of typo.  I doubt
>> >>> that this would be a common error, though.
>> >>>
>> >>> (Note that there's no ambiguity when there are no typos, except that
>> >>> when negation is involved a space may be needed - so, for example,
>> >>> "x" !  !TRUE is "xFALSE", but "x"!!TRUE is "x TRUE".  Existing uses of
>> >>> double negation are still fine - eg, a <- !!TRUE still sets a to TRUE.
>> >>> Parsing of operators is greedy, so "x"!!!TRUE is "x FALSE", not
>> "xTRUE".)
>> >>>
>>  Where does the binary ! fit in the operator priority?  E.g. how is
>> 
>> a ! b > c
>> 
>>  parsed?
>> >>>
>> >>> As (a ! b) > c.
>> >>>
>> >>> Their precedence is between that of + and - and that of < and >.
>> >>> So "x" ! 1+2 evaluates to "x3" and "x" ! 1+2 < "x4" is TRUE.
>> >>>
>> >>> (Actually, pqR also has a .. operator that fixes the problems with
>> >>> generating sequences with the : operator, and it has precedence lower
>> >>> than + and - and higher than ! and !!, but that's not relevant if you
>> >>> don't have the .. operator.)
>> >>>
>> >>> Radford Neal
>> >>>
>> >>> __
>> >>> R-devel@r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> >>>
>> >>

Re: [Rd] string concatenation operator (revisited)

2021-12-06 Thread Bill Dunlap
Should paste0(character(0), c("a","b")) give character(0)?
There is a fair bit of code that assumes that paste("X",NULL) gives "X" but
c(1,2)+NULL gives numeric(0).
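
The asymmetry in question can be sketched as follows. The recycle0 argument
mentioned elsewhere in this thread is used for the last call; it requires a
reasonably recent R (it was added in the 4.0.x series), and the variable names
are illustrative only.

```r
arith         <- c(1, 2) + NULL                     # numeric(0): zero length wins in arithmetic
default_paste <- paste0(character(0), c("a", "b"))  # "a" "b": zero-length input acts like ""
strict_paste  <- paste0(character(0), c("a", "b"),
                        recycle0 = TRUE)            # character(0): arithmetic-style recycling
```

An operator that followed the arithmetic rule would therefore behave like the
third line, not like the paste0() default that existing code relies on.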

-Bill

On Mon, Dec 6, 2021 at 1:32 PM Duncan Murdoch 
wrote:

> On 06/12/2021 4:21 p.m., Avraham Adler wrote:
> > Gabe, I agree that missingness is important to factor in. To somewhat
> abuse
> > the terminology, NA is often used to represent missingness. Perhaps
> > concatenating character something with character something missing should
> > result in the original character?
>
> I think that's a bad idea.  If you wanted to represent an empty string,
> you should use "" or NULL, not NA.
>
> I'd agree with Gabe, paste0("abc", NA) shouldn't give "abcNA", it should
> give NA.
>
> Duncan Murdoch
>
> >
> > Avi
> >
> > On Mon, Dec 6, 2021 at 3:35 PM Gabriel Becker 
> wrote:
> >
> >> Hi All,
> >>
> >> Seeing this and the other thread (and admittedly not having clicked
> through
> >> to the linked r-help thread), I wonder about NAs.
> >>
> >> Should NA  "hi there"  not result in NA_character_? This is not
> >> what any of the paste functions do, but in my opinion, NA +
> 
> >> seems like it should be NA  (not "NA"), particularly if we are talking
> >> about `+` overloading, but potentially even in the case of a distinct
> >> concatenation operator?
> >>
> >> I guess what I'm saying is that in my head missingness propagation rules
> >> should take priority in such an operator (ie NA +  should
> >> *always* be NA).
> >>
> >> Is that something others disagree with, or has it just not come up yet
> in
> >> (the parts I have read) of this discussion?
> >>
> >> Best,
> >> ~G
> >>
> >> On Mon, Dec 6, 2021 at 10:03 AM Radford Neal 
> >> wrote:
> >>
> > In pqR (see pqR-project.org), I have implemented ! and !! as binary
> > string concatenation operators, equivalent to paste0 and paste,
> > respectively.
> >
> > For instance,
> >
> >   > "hello" ! "world"
> >   [1] "helloworld"
> >   > "hello" !! "world"
> >   [1] "hello world"
> >   > "hello" !! 1:4
> >   [1] "hello 1" "hello 2" "hello 3" "hello 4"
> 
>  I'm curious about the details:
> 
>  Would `1 ! 2` convert both to strings?
> >>>
> >>> They're equivalent to paste0 and paste, so 1 ! 2 produces "12", just
> >>> like paste0(1,2) does.  Of course, they wouldn't have to be exactly
> >>> equivalent to paste0 and paste - one could impose stricter
> >>> requirements if that seemed better for error detection.  Off hand,
> >>> though, I think automatically converting is more in keeping with the
> >>> rest of R.  Explicitly converting with as.character could be tedious.
> >>>
> >>> I suppose disallowing logical arguments might make sense to guard
> >>> against typos where ! was meant to be the unary-not operator, but
> >>> ended up being a binary operator, after some sort of typo.  I doubt
> >>> that this would be a common error, though.
> >>>
> >>> (Note that there's no ambiguity when there are no typos, except that
> >>> when negation is involved a space may be needed - so, for example,
> >>> "x" !  !TRUE is "xFALSE", but "x"!!TRUE is "x TRUE".  Existing uses of
> >>> double negation are still fine - eg, a <- !!TRUE still sets a to TRUE.
> >>> Parsing of operators is greedy, so "x"!!!TRUE is "x FALSE", not
> "xTRUE".)
> >>>
>  Where does the binary ! fit in the operator priority?  E.g. how is
> 
> a ! b > c
> 
>  parsed?
> >>>
> >>> As (a ! b) > c.
> >>>
> >>> Their precedence is between that of + and - and that of < and >.
> >>> So "x" ! 1+2 evaluates to "x3" and "x" ! 1+2 < "x4" is TRUE.
> >>>
> >>> (Actually, pqR also has a .. operator that fixes the problems with
> >>> generating sequences with the : operator, and it has precedence lower
> >>> than + and - and higher than ! and !!, but that's not relevant if you
> >>> don't have the .. operator.)
> >>>
> >>> Radford Neal
> >>>


Re: [Rd] string concatenation operator (revisited)

2021-12-06 Thread Duncan Murdoch

On 06/12/2021 4:21 p.m., Avraham Adler wrote:

Gabe, I agree that missingness is important to factor in. To somewhat abuse
the terminology, NA is often used to represent missingness. Perhaps
concatenating character something with character something missing should
result in the original character?


I think that's a bad idea.  If you wanted to represent an empty string, 
you should use "" or NULL, not NA.


I'd agree with Gabe, paste0("abc", NA) shouldn't give "abcNA", it should 
give NA.
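
A sketch of what NA-propagating concatenation could look like as a
user-defined operator. The name %na+% is hypothetical and nothing like it
exists in base R; it is built on top of paste0() purely to illustrate the
rule being argued for here.

```r
# Hypothetical operator: concatenate like paste0(), but let NA propagate.
`%na+%` <- function(lhs, rhs) {
  out <- paste0(lhs, rhs)
  # Wherever either operand is missing, the result is missing too.
  out[is.na(lhs) | is.na(rhs)] <- NA_character_
  out
}

current  <- paste0("abc", NA)      # "abcNA": what paste0() does today
proposed <- "abc" %na+% NA         # NA: missingness takes priority
mixed    <- c("a", NA) %na+% "b"   # "ab" NA: applied element by element
```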


Duncan Murdoch




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] string concatenation operator (revisited)

2021-12-06 Thread Avraham Adler
Gabe, I agree that missingness is important to factor in. To somewhat abuse
the terminology, NA is often used to represent missingness. Perhaps
concatenating character something with character something missing should
result in the original character?

Avi


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] string concatenation operator (revisited)

2021-12-06 Thread Gabriel Becker
Hi All,

Seeing this and the other thread (and admittedly not having clicked through
to the linked r-help thread), I wonder about NAs.

Should concatenating NA with "hi there" not result in NA_character_? This is
not what any of the paste functions do, but in my opinion, NA + a string
seems like it should be NA (not "NA"), particularly if we are talking
about `+` overloading, but potentially even in the case of a distinct
concatenation operator.

I guess what I'm saying is that in my head missingness propagation rules
should take priority in such an operator (i.e., NA + anything should
*always* be NA).
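
A minimal sketch of the propagation rule described above, assuming a user-defined operator. The name `%+%` is hypothetical here and is not part of base R; the body just post-processes paste0.

```r
# Hypothetical NA-propagating concatenation operator (not in base R).
`%+%` <- function(x, y) {
  out <- paste0(x, y)                        # vectorised, recycles like paste0
  out[is.na(x) | is.na(y)] <- NA_character_  # any missing operand gives NA
  out
}

na_left  <- NA %+% "hi there"
na_mixed <- c("a", NA) %+% "b"
print(na_left)
print(na_mixed)
```

With this definition a missing operand yields NA_character_ instead of the text "NA", while non-missing elements are concatenated as usual.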

Is that something others disagree with, or has it just not come up yet in
(the parts I have read) of this discussion?

Best,
~G


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] string concatenation operator (revisited)

2021-12-06 Thread Radford Neal
> > In pqR (see pqR-project.org), I have implemented ! and !! as binary
> > string concatenation operators, equivalent to paste0 and paste,
> > respectively.
> > 
> > For instance,
> > 
> >  > "hello" ! "world"
> >  [1] "helloworld"
> >  > "hello" !! "world"
> >  [1] "hello world"
> >  > "hello" !! 1:4
> >  [1] "hello 1" "hello 2" "hello 3" "hello 4"
> 
> I'm curious about the details:
> 
> Would `1 ! 2` convert both to strings?

They're equivalent to paste0 and paste, so 1 ! 2 produces "12", just
like paste0(1,2) does.  Of course, they wouldn't have to be exactly
equivalent to paste0 and paste - one could impose stricter
requirements if that seemed better for error detection.  Off hand,
though, I think automatically converting is more in keeping with the
rest of R.  Explicitly converting with as.character could be tedious.

I suppose disallowing logical arguments might make sense to guard
against typos where ! was meant to be the unary-not operator, but
ended up being a binary operator, after some sort of typo.  I doubt
that this would be a common error, though.

(Note that there's no ambiguity when there are no typos, except that
when negation is involved a space may be needed - so, for example, 
"x" !  !TRUE is "xFALSE", but "x"!!TRUE is "x TRUE".  Existing uses of
double negation are still fine - eg, a <- !!TRUE still sets a to TRUE.
Parsing of operators is greedy, so "x"!!!TRUE is "x FALSE", not "xTRUE".)

> Where does the binary ! fit in the operator priority?  E.g. how is
> 
>   a ! b > c
> 
> parsed?

As (a ! b) > c.

Their precedence is between that of + and - and that of < and >.
So "x" ! 1+2 evaluates to "x3" and "x" ! 1+2 < "x4" is TRUE.
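
For readers without pqR, the operators can be approximated in base R with user-defined infix operators. This is only a sketch: the names `%!%` and `%!!%` are made up for illustration, and the precedence differs from what is described above, because every %any% operator in base R binds more tightly than + and -.

```r
# Hypothetical base-R stand-ins for pqR's binary ! and !! operators.
`%!%`  <- function(x, y) paste0(x, y)
`%!!%` <- function(x, y) paste(x, y)

joined  <- "hello" %!% "world"
spaced  <- "hello" %!!% 1:4
grouped <- "x" %!% (1 + 2)   # parentheses are needed to paste the sum

# Without parentheses this parses as ("x" %!% 1) + 2, which is an error.
ungrouped <- tryCatch("x" %!% 1 + 2, error = function(e) "error")

print(list(joined, spaced, grouped, ungrouped))
```

The last line shows why a dedicated operator with its own precedence level, as in pqR, behaves differently from a %any% workaround.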

(Actually, pqR also has a .. operator that fixes the problems with
generating sequences with the : operator, and it has precedence lower
than + and - and higher than ! and !!, but that's not relevant if you
don't have the .. operator.)

   Radford Neal

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] string concatenation operator (revisited)

2021-12-06 Thread Duncan Murdoch

On 06/12/2021 1:14 a.m., Radford Neal wrote:

> > The TL;DR version is base R support for a `+.character` method. This
> > would essentially provide a shortcut to `paste0`...
>
> In pqR (see pqR-project.org), I have implemented ! and !! as binary
> string concatenation operators, equivalent to paste0 and paste,
> respectively.
>
> For instance,
>
>  > "hello" ! "world"
>  [1] "helloworld"
>  > "hello" !! "world"
>  [1] "hello world"
>  > "hello" !! 1:4
>  [1] "hello 1" "hello 2" "hello 3" "hello 4"


I'm curious about the details:

Would `1 ! 2` convert both to strings?

Where does the binary ! fit in the operator priority?  E.g. how is

  a ! b > c

parsed?

Duncan Murdoch


 



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] string concatenation operator (revisited)

2021-12-05 Thread Radford Neal
> The TL;DR version is base R support for a `+.character` method. This
> would essentially provide a shortcut to `paste0`...

In pqR (see pqR-project.org), I have implemented ! and !! as binary
string concatenation operators, equivalent to paste0 and paste,
respectively.  

For instance,

> "hello" ! "world"
[1] "helloworld"
> "hello" !! "world"
[1] "hello world"
> "hello" !! 1:4
[1] "hello 1" "hello 2" "hello 3" "hello 4"

This seems preferable to overloading the + operator, which would lead
to people reading code wondering whether a+b is doing an addition or a
string concatenation.  There are very few circumstances in which one
would want to write code where a+b might be either of these.  So it's
better to make clear what is going on by having a different operator
for string concatenation.  

Plus ! and !! seem natural for representing paste0 and paste, whereas
using ++ for paste (with + for paste0) would look rather strange.

   Radford Neal

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] string concatenation operator (revisited)

2021-12-05 Thread Duncan Murdoch

On 05/12/2021 7:22 a.m., Ivan Krylov wrote:

> On Sat, 4 Dec 2021 21:26:05 -0500
> Avi Gross via R-devel wrote:
>
> > In many languages, like PERL, this results in implicit conversion
> > to make "text1" the result.
>
> FWIW, Perl5 has a separate string concatenation operator (".") in order
> to avoid potential confusion with addition. So do Lua (".."), SQL
> ("||", only some of the dialects) and Raku ("~", former Perl6). Some of
> the potential concerns with string concatenation as an operator in R
> could be alleviated by introducing a separate operator, just like matrix
> multiplication ("%*%") is separate from elementwise multiplication
> ("*"), nowadays even in Python ("@" and "*", respectively).



People seem to handle the automatic conversion done by comparison operators.
Occasionally someone is surprised that


  123 < "5"

is TRUE, but mostly people muddle along.
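
A short sketch of the coercion at work: when a number is compared with a string, the number is converted to a string and the two are compared as text. The variable names are only for illustration.

```r
# Comparison operators coerce a mixed number/string pair to character.
mixed   <- 123 < "5"     # 123 becomes "123"; "1" sorts before "5"
strings <- "123" < "5"   # the comparison that is actually performed
numbers <- 123 < 5       # purely numeric comparison, for contrast

print(c(mixed = mixed, strings = strings, numbers = numbers))
```

The first two results agree with each other and disagree with the numeric comparison, which is the source of the occasional surprise.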

One possible issue is that for some things (e.g. S3 Arith group 
generic), "+" is grouped with the other arithmetic operators, "-", "*", 
"^", "%%", "%/%", "/".  I don't think it would make sense for any of 
them to work on strings.  But there are exceptions listed for number of 
arguments among the Math group, so an exception in the Arith group 
wouldn't be the end of the world.
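
To illustrate the grouping: in S3 the arithmetic operators are dispatched together through the Ops group generic (Arith is the name of the corresponding S4 subgroup), so a single group method sees all of them and has to pick out "+" by hand. The sketch below uses a made-up class "str"; it is an assumption for illustration, not a proposal for base R.

```r
# Hypothetical S3 class "str": "+" concatenates, comparisons work on the
# underlying strings, and every other Ops operator is rejected.
Ops.str <- function(e1, e2) {
  op <- .Generic
  if (op == "+") {
    return(structure(paste0(unclass(e1), unclass(e2)), class = "str"))
  }
  if (op %in% c("==", "!=", "<", ">", "<=", ">=")) {
    return(get(op)(unclass(e1), unclass(e2)))
  }
  stop("operator '", op, "' is not defined for 'str' objects")
}

s <- structure("hello ", class = "str")
concatenated <- unclass(s + "world")
same         <- s == "hello "
times_result <- tryCatch(s * 2, error = function(e) "error")

print(list(concatenated, same, times_result))
```

Here "+" is the exception inside the group, and "*" (like "-", "^", and the rest) is explicitly refused.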


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] string concatenation operator (revisited)

2021-12-05 Thread GILLIBERT, Andre
Ivan Krylov  wrote:
> FWIW, Perl5 has a separate string concatenation operator (".") in order
> to avoid potential confusion with addition. So do Lua (".."), SQL
> ("||", only some of the dialects) and Raku ("~", former Perl6).


Indeed, using the same operator '+' for addition and string concatenation is 
not a great idea in my opinion.
Accidental character arguments to a '+' that was meant to be a numerical addition 
would go undetected. Bug tracking would be harder in that case.

R is already too permissive: it finds some interpretation for code that is most 
probably buggy, such as ifelse() on vectors of unequal length or the '[' operator 
with only one argument used to index a matrix. I would not want to add new 
permissive behaviors.
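
A small sketch of the two permissive behaviors mentioned above, as they work in current base R. Neither call produces a warning; the variable names are only for illustration.

```r
# ifelse(): `yes` is shorter than `test`, so it is silently recycled.
recycled <- ifelse(c(TRUE, TRUE, TRUE), c(1, 2), 0)

# '[' with a single index treats the matrix as a plain vector,
# counting down the columns.
m <- matrix(1:6, nrow = 2)
fifth <- m[5]   # the same element as m[1, 3]

print(recycled)
print(fifth)
```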

A new operator, dedicated to string concatenation, such as %+% or %.%, would be 
better, in my opinion.

--
Sincerely
André GILLIBERT


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] string concatenation operator (revisited)

2021-12-05 Thread Ivan Krylov
On Sat, 4 Dec 2021 21:26:05 -0500
Avi Gross via R-devel  wrote:

> In many languages, like PERL, this results in implicit conversion
> to make "text1" the result.

FWIW, Perl5 has a separate string concatenation operator (".") in order
to avoid potential confusion with addition. So do Lua (".."), SQL
("||", only some of the dialects) and Raku ("~", former Perl6). Some of
the potential concerns with string concatenation as an operator in R
could be alleviated by introducing a separate operator, just like matrix
multiplication ("%*%") is separate from elementwise multiplication
("*"), nowadays even in Python ("@" and "*", respectively).
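
As a reminder of the precedent in R itself, a minimal sketch of the two multiplication operators applied to the same matrix (variable names are only for illustration):

```r
# "*" multiplies element by element; "%*%" is matrix multiplication.
m <- matrix(1:4, nrow = 2)
elementwise <- m * m
product     <- m %*% m

print(elementwise)
print(product)
```

The two results differ, and the reader can tell from the operator alone which operation was intended.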

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] string concatenation operator (revisited)

2021-12-04 Thread Rui Barradas

Hello,

Bert Gunter started a very recent R-Help thread [1] about the following 
method not working.


`+.character` <- function(x, y) paste0(x, y)


The discussion is worth reading and at least partly explains why the 
feature request has never made it into base R.
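
A minimal sketch of the behavior in question, assuming a fresh session. Plain character vectors carry no class attribute, so the internal "+" never looks for an S3 method and the definition above is ignored. The class name "chr" below is made up for illustration.

```r
# Defining the method has no effect on plain strings.
`+.character` <- function(x, y) paste0(x, y)
plain_result <- tryCatch("a" + "b", error = function(e) "error")

# Dispatch does happen once an operand carries a class attribute.
`+.chr` <- function(e1, e2) {
  structure(paste0(unclass(e1), unclass(e2)), class = "chr")
}
s <- structure("hello ", class = "chr")
classed_result <- unclass(s + "world")

print(plain_result)
print(classed_result)
```

The first call still fails as ordinary arithmetic on strings, while the second works because `s` is a classed object.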


It goes without saying that I do not speak for the R Core team.


[1] https://stat.ethz.ch/pipermail/r-help/2021-December/473163.html


Hope this helps,

Rui Barradas




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] string concatenation operator (revisited)

2021-12-04 Thread Avi Gross via R-devel
Grant,

One nit to consider is that the default behavior of paste() (unlike paste0()) 
of including a space as a separator would not be a perfect choice for the usual 
meaning of plus. 

I would prefer a+b to be "helloworld" in your example, and to get what you show 
you would write 

a + " " + b

Which I assume would put in a space where you want it and not where you don't.

As I am sure you have been told, you already can make an operator like this:

`%+%` <- function(x, y) paste0(x, y)

And then use:

a %+% b

And to do it this way, you might have two such functions, where %+% does NOT add 
a space but a variant, the odd-looking % +% (with a space in it) or %++%, does!

`%+%` <- function(x, y) paste0(x, y)
`%++%` <- function(x, y) paste0(x, " ",  y)
`% +%` <- function(x, y) paste0(x, " ",  y)

Now testing it with:

a = "hello"; b = "world" # NOTE I removed the trailing space you had in "a".

> a %+% b
[1] "helloworld"
> a %++% b
[1] "hello world"
> a % +% b
[1] "hello world"

It also seems to work with multiple units mixed in a row as shown below:

> a %+% b % +% a %++% b
[1] "helloworld hello world"

And it sort of works with vectors of strings or numbers using string 
concatenation:

> a <- letters[1:3]
> b <- seq(from=101, to = 301, by = 100)
> a %+% b %+% a
[1] "a101a" "b201b" "c301c"

But are you asking for a naked "+" sign to be vectorized like that?

And what if someone accidentally types something like:

a = "text"
a = a + 1

The addition now looks like adding an integer to a text string. In many 
languages, like Perl, this results in implicit conversion to make "text1" the 
result. My work-around does that:

> a = a %+% 1
> a
[1] "text1"

BUT what you are asking for is for R to do normal addition if a and b are both 
numeric and presumably do (as languages like Python do) text concatenation when 
they are both text. What do you suggest happen if one is numeric and the other 
is text or perhaps some arbitrary data type? 
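
One possible answer, sketched under the assumption that mixed types should be an error rather than silently converted. The operator name `%+%` is reused from above, and the error message is made up for illustration.

```r
# Hypothetical strict variant: concatenate only when both sides are character.
`%+%` <- function(x, y) {
  if (!is.character(x) || !is.character(y)) {
    stop("both operands must be character vectors")
  }
  paste0(x, y)
}

both_text <- "5" %+% "4"
mixed     <- tryCatch("5" %+% 4, error = function(e) conditionMessage(e))

print(both_text)
print(mixed)
```

This trades the convenience of automatic conversion for earlier detection of accidental numeric operands.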

I checked to see what Python version 3.9 does:

>>> 5 + 4
9
>>> "5" + "4"
'54'
>>> "5" + 4
Traceback (most recent call last):
  File "", line 1, in <module>
"5" + 4
TypeError: can only concatenate str (not "int") to str

It is clear it does not normally support such mixed methods, albeit I can 
probably easily create an object sub-class where I create a dunder method that 
perhaps checks if one of the two things being added can be coerced into a 
string or into a number as needed to convert so the two types match.

But this is about R.

As others have said, the early philosophy behind R did not head in the same 
direction as some other languages, and R is mostly not object-oriented in the 
same way as they are. Thus some things are not trivially done but can be done 
in other ways, like the %+% technique above.

But R also allows weird things like this: 
# VERY CAREFULLY as overwriting "+" means you cannot use it in your other ...
# So not a suggested idea, but if done you must preserve the original meaning
# of plus elsewhere like I do.

flexible_plus <- function(first, second) {
  # Call base::`+` explicitly: once `+` is rebound to flexible_plus below,
  # a bare first + second here would call this function again, recursively.
  if (all(is.numeric(first), is.numeric(second)))
    return(base::`+`(first, second))
  if (all(is.character(first), is.character(second)))
    return(paste0(first, second))
  # If you reach here, there is an error
  print("ERROR: both arguments must be numeric or both character")
  return(NULL)
}

Now define things carefully so that the function flexible_plus I created 
becomes the MEANING of a naked plus sign.  But note that it will now be used 
everywhere in any code that does addition, so it is not an ideal solution. 
It does sort of work, FWIW.

`%+++%` <- `+`
`+` <- flexible_plus

Finally some testing:

> 5 %+++% 3
[1] 8
> flexible_plus(5, 3)
[1] 8
> 5 + 3
[1] 8
> "hello" + "world"
[1] "helloworld"
> "hello" + 5
[1] "ERROR: both arguments must be numeric or both character"
NULL

It does seem to do approximately what I said it would do but also does some 
vectorized things as well as long as all are the same type:

> c(1,2,3) + 4
[1] 5 6 7
> c(1,2,3) + c(4,5,6)
[1] 5 7 9
> c("word1", "word2", "word3") + "more"
[1] "word1more" "word2more" "word3more"
> c("word1", "word2", "word3") + c("more", "snore")
[1] "word1more"  "word2snore" "word3more"

Again, the above code is for illustration purposes only. I would be beyond 
shocked if the above did not break something somewhere and it certainly is not 
as efficient as the built-in adder. As an exercise, it looks reasonable. LOL!


-Original Message-
From: R-devel  On Behalf Of Grant McDermott
Sent: Saturday, December 4, 2021 5:37 PM
To: r-devel@r-project.org
Subject: [Rd] string concatenation operator (revisited)

Hi all,

I wonder if the R Core team might reconsider an old feature request, as 
detailed in this 2005 thread: 
https://stat.ethz.ch/pipermail/r-help/2005-February/thread.html#66698

The TL;DR version is base R support for a `+.character` method. This would 
essentially provide a shortcut to `paste0`, in much the same