Re: [Rd] String interpolation [Was: string concatenation operator (revisited)]

2021-12-07 Thread Taras Zakharko
> I don't think a custom type alone would work, because users would expect to 
> use such a string anywhere a regular string can be used, and that's where the 
> problems start - the evaluation would have to happen at a point where it is 
> not expected since we can assume today that CHAR() doesn't evaluate. If it's 
> just a construct that needs some function call to turn it into a real string, 
> then that's (from the user's perspective) no different than glue() 
> 

Oh, it would still be evaluated as expected. It would just be a new type of 
language expression, like byte code, a call, or a promise. You just need a new 
case in the switch statement of eval(). The rest is lazy evaluation as usual; 
no change of rules is needed. Of course, some rules need to be established on 
when exactly the evaluation kicks in (and this can be a bit tricky), but I am 
sure one can figure out a sane approach; my intuition would be to evaluate a 
format string any time one evaluates a promise. In fact, it could probably be 
treated as a special type of promise itself, with value caching and all. Under 
this approach the end user would never see the special type: every time you 
assign a formatted string somewhere, it would be evaluated to a plain old 
character vector. But when it is passed as an argument, you get the benefits 
of lazy evaluation.

What functions could do is suspend the evaluation to check whether an argument 
is a (processed) format string and apply custom formatting to it. Again, this is 
no different from today’s R, where you can capture the lazy expression and apply 
transformations to it. The R parser just does some basic preprocessing for you.
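
A minimal sketch of that idea using only existing base R: a function captures its argument unevaluated with substitute(), checks whether it looks like a format expression, and only then decides how to evaluate it. fmt() is a hypothetical marker standing in for a processed format string (it is never defined), and show_arg() is likewise just an illustrative name.

```r
# Sketch: suspend evaluation of an argument and inspect it before forcing it.
# `fmt` is a hypothetical marker for a format string; it is never defined.
show_arg <- function(x) {
  expr <- substitute(x)   # the expression the caller wrote, unevaluated
  env <- parent.frame()   # the caller's environment, for evaluating the pieces
  if (is.call(expr) && identical(expr[[1]], as.name("fmt"))) {
    # Custom handling: evaluate each piece in the caller's frame and join them
    pieces <- lapply(as.list(expr)[-1], eval, envir = env)
    return(paste0("<", do.call(paste0, pieces), ">"))
  }
  x                       # anything else: force the promise as usual
}

name <- "world"
show_arg(fmt("hello ", name))  # "<hello world>"
show_arg("plain")              # "plain"
```

Because fmt() is intercepted before it is ever called, the function can apply its own formatting. Any other argument is simply forced as usual.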

> admittedly, you could do a lot more with such an internal type, but not sure if 
> the complexity is worth it

That’s the question :) I am not sure either. It was just a spontaneous idea I 
threw out there, not the result of careful deliberation. Still, I believe it can 
be useful to think about things like that; it just might give the right person 
the right idea.


> For what it's worth, you can also get 90% of the way there with:
> 
> f <- glue::glue
> f("if you squint, this is a Python f-string")
> 
> ...
> 
> That said, if something like this were to happen in R, my vote would
> be an implementation in the parser that transformed f"string" into
> something like 'interpolate("string")', so that f"string" would just
> become syntactic sugar for already-existing code

Not really. With this approach, expression parsing would still be done at 
evaluation time, so you don’t get any of the potential benefits that come from 
my suggestion (expression parsing at parse time, higher runtime performance, 
correctly captured expression promises).

One quick note about parser transformations: lazy evaluation with expression 
capturing (substitution) is one of the unique strengths of R, as it allows one 
to trivially implement powerful DSLs on top of the language (as demonstrated by 
the “tidy evaluation” implementation in the tidyverse). Parser transformations 
might make the implementation simpler, but they remove information from the 
parse tree and reduce those opportunities.
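
To illustrate the point about substitution-based DSLs, here is a toy data-masking helper in the spirit of base::subset(). It shows only the base R mechanism underneath that style, not tidy evaluation itself. The name where() is illustrative. The condition is captured unevaluated, then evaluated with the data frame's columns in scope and the caller's environment as the fallback.

```r
# Toy data-masking helper (hypothetical name `where`), modelled on base::subset().
where <- function(df, cond) {
  cond_expr <- substitute(cond)                  # keep the caller's expression
  # Columns of df are visible first; other names resolve in the caller's frame
  keep <- eval(cond_expr, envir = df, enclos = parent.frame())
  df[!is.na(keep) & keep, , drop = FALSE]
}

threshold <- 2
d <- data.frame(x = 1:4, y = c("a", "b", "c", "d"))
where(d, x > threshold)   # the rows with x = 3 and x = 4
```

This only works because `x > threshold` reaches where() as an expression rather than as an already-computed value.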

— Taras


> On 8 Dec 2021, at 00:13, Kevin Ushey  wrote:
> 
> For what it's worth, you can also get 90% of the way there with:
> 
>f <- glue::glue
>f("if you squint, this is a Python f-string")
> 
> Having this in an add-on package also makes it much easier to change
> in response to user feedback; R packages have more freedom to make
> backwards-incompatible changes.
> 
> That said, if something like this were to happen in R, my vote would
> be an implementation in the parser that transformed f"string" into
> something like 'interpolate("string")', so that f"string" would just
> become syntactic sugar for already-existing code (and so such code
> could remain debuggable, easy to reason about, etc without any changes
> to R internals)
> 
> Thanks,
> Kevin
> 
> On Tue, Dec 7, 2021 at 2:06 PM Simon Urbanek
>  wrote:
>> 
>> I don't think a custom type alone would work, because users would expect to 
>> use such a string anywhere a regular string can be used, and that's where the 
>> problems start - the evaluation would have to happen at a point where it is 
>> not expected since we can assume today that CHAR() doesn't evaluate. If it's 
>> just a construct that needs some function call to turn it into a real string, 
>> then that's (from the user's perspective) no different than glue() so I don't 
>> think the users would see the benefit (admittedly, you could do a lot more 
>> with such an internal type, but not sure if the complexity is worth it).
>> 
>> Cheers,
>> Simon
>> 
>> 
>> 
>>> On Dec 8, 2021, at 12:56 AM, Taras Zakharko  wrote:
>>> 
>>> I fully agree! General string interpolation opens a gaping security hole 
>>> and is accompanied by all kinds of problems and decisions. What I envision 
>>> instead is something like this:
>>> 
>>>  f”hello {name}”
>>> 
>>> Which gets parsed by R to this:
>>> 
>>>  

Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Avi Gross via R-devel
Taras and Duncan and others do make a point about things not needing to be 
built into the base R distribution if something similar can already be found 
elsewhere.

To an extent, that is quite true. But what exactly should be in the core of a 
language that has this kind of extensibility? 

I note how annoying it can be to load a package that then loads all kinds of 
other packages it depends on, often ones you personally know nothing about and 
will mostly never use directly. If core R were minimal, this could get worse, 
and there could be serious overhead.

Obviously some code belongs there that directly interacts with the operating 
system or that implements major parts of the language. But clearly more than 
the minimum was put into S/R even in the early days, based on how the language 
was expected to be used. And it has grown further over the years: the recent 
additions of a modified form of a pipe operator and of a new way to declare a 
function so it can be added into a pipeline are examples. Ideally, any heavily 
used feature that already lives in a package, let alone a package with many 
such useful features, can be a candidate for inclusion, either directly or by 
emulation.

Back to string concatenation: I think it is fair to suggest that S began as a 
statistical language of sorts, with a heavy emphasis on numeric and vectorized 
data that led to vectors and data.frames being "built-in". Doing much more with 
text was a secondary consideration, one that functions like paste() could not 
only handle easily but also handle with vectorized input. It works pretty well, 
and arguably overloading '+' is not needed. And note that, underneath it all, R 
programs can largely be written using functions rather than operators. 
You can type:

`+`(5, `*`(2, 3))

and it evaluates to 11, meaning 5+(2*3).
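
Since operators are ordinary functions, string concatenation can be added at user level without touching the parser. Here is a minimal sketch. The operator name %+% is an assumption chosen for illustration; it is not part of base R. By contrast, "2" + "3" in base R stops with the error "non-numeric argument to binary operator".

```r
# Operators are ordinary functions, so these two calls are equivalent:
5 + (2 * 3)         # 11
`+`(5, `*`(2, 3))   # 11

# A user-defined infix operator for string concatenation.
# The name %+% is only an illustration; it is not part of base R.
`%+%` <- function(a, b) paste0(a, b)

"Avi" %+% " " %+% "Gross"   # "Avi Gross"
"2" %+% "3"                 # "23"
c("a", "b") %+% "_x"        # vectorized like paste0(): "a_x" "b_x"
```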

And paste() is not the only function you can use to do string concatenation. 
Consider one trivial use of sprintf(), which also does much more:

> first <- "Avi"
> last <- "Gross"
> combined <- sprintf("%s%s", first, last)
> print(combined)
[1] "AviGross"

Obviously this also supports including a space between the %s copies and so on.
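
Both paste() and sprintf() are vectorized, which is much of what makes them a good fit for R. A short illustration with placeholder names:

```r
first <- c("Avi", "Kevin")
last  <- c("Gross", "Ushey")

# paste() works elementwise over its arguments and inserts a separator.
paste(first, last)                   # "Avi Gross"  "Kevin Ushey"
paste0(first, last)                  # "AviGross"   "KevinUshey"

# sprintf() is vectorized too; the format string controls the spacing.
sprintf("%s %s", first, last)        # "Avi Gross"  "Kevin Ushey"

# collapse= joins the elementwise results into a single string.
paste(first, last, collapse = ", ")  # "Avi Gross, Kevin Ushey"
```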


I note that other languages also keep trying to expand to be everything for 
everybody. I could use examples from many, but Python is easy to see in many 
ways and is a bit of a competitor to R for some purposes. Python, too, has 
packages called modules that extend the interpreted language, and tons of 
modules have been added over the years, including some to deal with items not 
included when the language was created. One reason R has done so well is that 
Python had things like lists but no vectorized methods and other components 
like R had, so lots of programs must first import modules like numpy and pandas 
to be able to create Series and DataFrames and manipulate them efficiently. But 
many modules have now been built on top of these extensions for various kinds 
of scientific programming, and at some point you wonder why this is not built 
into the language to fill the gap. Lists are slow and dictionaries have limited 
use for many things. Tasks like machine learning can use huge amounts of data 
and do complex calculations repeatedly, so Python has had to be extended. Yet 
there, too, most things have to be imported at runtime.

I am not a fanatic in R about the tidyverse set of packages, and I often do some 
things using the built-in ways, or use the tidyverse, or mix and match. Both 
have value for me, and some things remain easier than others depending on 
circumstances. Of course, packages that use the same function names as other 
packages are harder to incorporate. But I don't think it would be hard to 
create a base R that includes a subset of the tidyverse as part of the base and 
leaves other parts to be brought in only as needed.

The talk about string concatenation also mentions the use of the glue package, 
which I also sometimes use. The concatenation of strings and other types into a 
bigger string is common in many languages, and I note I have used five 
different built-in methods in Python, as people keep wanting to bring in the 
way it is already done in some other language they like. I am talking less 
about concatenation than about variants on the printf() family that format a 
string from many components, and some of them look a bit like glue. 
Potentially, a package like glue could also qualify as worth including in base 
R, but let me clarify. There is a difference between being in the minimal core 
of a language and being in a list of packages that are included by default when 
R is built. Even if you include a package by default, it should not be an error 
to say library(name) if it is already loaded on your machine. So even after you 
make something part of the base distribution, people may continue to invoke it 
as if it were not there, lest the code be run on an older version.

The reality is that there can be significant costs in a tradeoff be

Re: [Rd] String interpolation [Was: string concatenation operator (revisited)]

2021-12-07 Thread Kevin Ushey
For what it's worth, you can also get 90% of the way there with:

f <- glue::glue
f("if you squint, this is a Python f-string")

Having this in an add-on package also makes it much easier to change
in response to user feedback; R packages have more freedom to make
backwards-incompatible changes.

That said, if something like this were to happen in R, my vote would
be an implementation in the parser that transformed f"string" into
something like 'interpolate("string")', so that f"string" would just
become syntactic sugar for already-existing code (and so such code
could remain debuggable, easy to reason about, etc without any changes
to R internals)
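
For concreteness, here is a minimal base R sketch of what a runtime interpolate("string") might look like. The name interpolate() comes from the message above, but no such function exists in base R; this version is an assumption for illustration and handles a single template string only. It also makes two points from elsewhere in the thread concrete. First, the {...} expressions are only parsed when the call runs. Second, eval(parse(text = ...)) will execute whatever code the template contains, which is the kind of security concern raised about evaluating constants.

```r
# Hypothetical interpolate(): parses "{expr}" placeholders at call time.
# WARNING: evaluates arbitrary code found in the template string.
interpolate <- function(template, env = parent.frame()) {
  force(env)                                  # capture the caller's environment now
  m <- gregexpr("\\{[^{}]*\\}", template)      # locate {...} placeholders
  placeholders <- regmatches(template, m)[[1]]
  values <- vapply(placeholders, function(p) {
    code <- substr(p, 2, nchar(p) - 1)         # strip the braces
    paste(eval(parse(text = code), envir = env), collapse = " ")
  }, character(1), USE.NAMES = FALSE)
  regmatches(template, m) <- list(values)      # splice the results back in
  template
}

name <- "world"
interpolate("hello {name}")      # "hello world"
interpolate("1 + 1 = {1 + 1}")   # "1 + 1 = 2"
```

Nothing inside the braces is checked until the call runs, so a syntax error in a placeholder only surfaces at that point.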

Thanks,
Kevin

On Tue, Dec 7, 2021 at 2:06 PM Simon Urbanek
 wrote:
>
> I don't think a custom type alone would work, because users would expect to 
> use such a string anywhere a regular string can be used, and that's where the 
> problems start - the evaluation would have to happen at a point where it is 
> not expected since we can assume today that CHAR() doesn't evaluate. If it's 
> just a construct that needs some function call to turn it into a real string, 
> then that's (from the user's perspective) no different than glue() so I don't 
> think the users would see the benefit (admittedly, you could do a lot more 
> with such an internal type, but not sure if the complexity is worth it).
>
> Cheers,
> Simon
>
>
>
> > On Dec 8, 2021, at 12:56 AM, Taras Zakharko  wrote:
> >
> > I fully agree! General string interpolation opens a gaping security hole 
> > and is accompanied by all kinds of problems and decisions. What I envision 
> > instead is something like this:
> >
> >   f”hello {name}”
> >
> > Which gets parsed by R to this:
> >
> >   (STRINTERPSXP (CHARSXP (PROMISE nil)))
> >
> > Basically, a new type of R language construct that still can be processed 
> > by packages (for customized interpolation like in cli etc.), with a default 
> > eval which is basically paste0(). The benefit here would be that this is 
> > eagerly parsed and syntactically checked, and that the promise code could 
> > carry a srcref. And of course, that you could pass an interpolated string 
> > expression lazily between frames without losing the environment etc… For 
> > more advanced applications, a low level string interpolation expression 
> > constructor could be provided (that could either parse a general string — 
> > at the user’s risk, or build it directly from expressions).
> >
> > — Taras
> >
> >

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] String interpolation [Was: string concatenation operator (revisited)]

2021-12-07 Thread Simon Urbanek
I don't think a custom type alone would work, because users would expect to use 
such a string anywhere a regular string can be used, and that's where the 
problems start - the evaluation would have to happen at a point where it is not 
expected since we can assume today that CHAR() doesn't evaluate. If it's just a 
construct that needs some function call to turn it into a real string, then 
that's (from the user's perspective) no different than glue() so I don't think 
the users would see the benefit (admittedly, you could do a lot more with such 
an internal type, but not sure if the complexity is worth it).

Cheers,
Simon



> On Dec 8, 2021, at 12:56 AM, Taras Zakharko  wrote:
> 
> I fully agree! General string interpolation opens a gaping security hole and 
> is accompanied by all kinds of problems and decisions. What I envision 
> instead is something like this:
> 
>   f”hello {name}” 
> 
> Which gets parsed by R to this:
> 
>   (STRINTERPSXP (CHARSXP (PROMISE nil)))
> 
> Basically, a new type of R language construct that still can be processed by 
> packages (for customized interpolation like in cli etc.), with a default eval 
> which is basically paste0(). The benefit here would be that this is eagerly 
> parsed and syntactically checked, and that the promise code could carry a 
> srcref. And of course, that you could pass an interpolated string expression 
> lazily between frames without losing the environment etc… For more advanced 
> applications, a low level string interpolation expression constructor could 
> be provided (that could either parse a general string — at the user’s risk, 
> or build it directly from expressions). 
> 
> — Taras
> 
> 
>> On 7 Dec 2021, at 12:06, Simon Urbanek  wrote:
>> 
>> 
>> 
>>> On Dec 7, 2021, at 22:09, Taras Zakharko wrote:
>>> 
>>> Great summary, Avi. 
>>> 
>>> String concatenation could be trivially added to R, but it probably should 
>>> not be. You will notice that modern languages tend not to use “+” to do 
>>> string concatenation (they either have a custom operator or a special kind 
>>> of pattern to do it) due to the practical issues such an approach brings 
>>> (implicit type casting, lack of commutativity, performance etc.). These 
>>> issues will be felt even more so in R with its weak typing, idiosyncratic 
>>> casting behavior and NAs. 
>>> 
>>> As others have pointed out, any kind of behavior one wants from string 
>>> concatenation can be implemented by custom operators as needed. This is not 
>>> something that needs to be in the base R. I would rather see the efforts 
>>> directed toward improving string formatting (such as glue-style built-in 
>>> string interpolation).
>>> 
>> 
>> This is getting OT, but there is a very good reason why string interpolation 
>> is not in core R. As I recall, it was considered some time ago, but it is 
>> very dangerous, as it implies evaluation of constants, which opens a huge 
>> security hole and has questionable semantics (where you evaluate, etc.). 
>> Hence it's much easier to ban a package than to hack it out of R ;).
>> 
>> Cheers,
>> Simon
>> 
>> 
>>> — Taras
>>> 
>>> 

Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Martin Maechler
> Martin Maechler 
> on Tue, 7 Dec 2021 18:35:00 +0100 writes:

> Taras Zakharko 
> on Tue, 7 Dec 2021 12:56:30 +0100 writes:

>> I fully agree! General string interpolation opens a gaping security hole 
and is accompanied by all kinds of problems and decisions. What I envision 
instead is something like this:
>> f”hello {name}” 

>> Which gets parsed by R to this:

>> (STRINTERPSXP (CHARSXP (PROMISE nil)))

>> Basically, a new type of R language construct that still can be 
processed by packages (for customized interpolation like in cli etc.), with a 
default eval which is basically paste0(). The benefit here would be that this 
is eagerly parsed and syntactically checked, and that the promise code could 
carry a srcref. And of course, that you could pass an interpolated string 
expression lazily between frames without losing the environment etc… For more 
advanced applications, a low level string interpolation expression constructor 
could be provided (that could either parse a general string — at the user’s 
risk, or build it directly from expressions). 

>> — Taras

> Well, many months ago, R's  NEWS (for R-devel, then became R 4.0.0)
> contained

> * There is a new syntax for specifying _raw_ character constants
> similar to the one used in C++: r"(...)" with ... any character
> sequence not containing the sequence )".  This makes it easier to
> write strings that contain backslashes or both single and double
> quotes.  For more details see ?Quotes.

> This should be pretty close to what you propose above
> (well, you need to replace your UTF-8 forward double quotes by
> ASCII ones),
> no ?

No, it is not; sorry, I'm not at full strength.
Martin




Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Martin Maechler
> Taras Zakharko 
> on Tue, 7 Dec 2021 12:56:30 +0100 writes:

> I fully agree! General string interpolation opens a gaping security hole 
and is accompanied by all kinds of problems and decisions. What I envision 
instead is something like this:
> f”hello {name}” 

> Which gets parsed by R to this:

> (STRINTERPSXP (CHARSXP (PROMISE nil)))

> Basically, a new type of R language construct that still can be processed 
by packages (for customized interpolation like in cli etc.), with a default 
eval which is basically paste0(). The benefit here would be that this is 
eagerly parsed and syntactically checked, and that the promise code could carry 
a srcref. And of course, that you could pass an interpolated string expression 
lazily between frames without losing the environment etc… For more advanced 
applications, a low level string interpolation expression constructor could be 
provided (that could either parse a general string — at the user’s risk, or 
build it directly from expressions). 

> — Taras

Well, many months ago, R's  NEWS (for R-devel, then became R 4.0.0)
contained

* There is a new syntax for specifying _raw_ character constants
  similar to the one used in C++: r"(...)" with ... any character
  sequence not containing the sequence )".  This makes it easier to
  write strings that contain backslashes or both single and double
  quotes.  For more details see ?Quotes.

This should be pretty close to what you propose above
(well, you need to replace your UTF-8 forward double quotes by
ASCII ones),
no ?
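
Here is a small illustration of the raw-string syntax described in that NEWS entry (it requires R >= 4.0.0). Raw strings only change how backslashes and quotes are read. Braces are not interpolated, and that is the difference from the f"..." proposal.

```r
path   <- r"(C:\Users\me\file.txt)"   # backslashes are kept literally
quoted <- r"(She said "it's fine")"   # both quote characters, no escaping
tmpl   <- r"(hello {name})"           # braces are plain text: no interpolation

identical(path, "C:\\Users\\me\\file.txt")   # TRUE
identical(tmpl, "hello {name}")              # TRUE
```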



Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Dirk Eddelbuettel


On 8 December 2021 at 00:06, Simon Urbanek wrote:
| Hence it's much easier to ban a package than to hack it out of R ;).

Paging Achim for suggested `fortunes` inclusion.

Dirk

-- 
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [Rd] Documentation of addmargins

2021-12-07 Thread SOEIRO Thomas
Yes, it is!

There is only a small typo (missing punctuation, which would make it easier to read).

Sorry for the misunderstanding; it may not have been clear enough in my previous mail.

-----Original Message-----
From: GILLIBERT, Andre [mailto:andre.gillib...@chu-rouen.fr] 
Sent: Tuesday, December 7, 2021 16:59
To: SOEIRO Thomas; R Development List
Subject: RE: Documentation of addmargins


Thomas SOEIRO wrote:
> Dear list,

> There is a minor typo in addmargins (section Details):

> - If the functions used to form margins are not commutative the result 
> depends on the order in which margins are computed. Annotation of margins is 
> done via naming the FUN list.
> + If the functions used to form margins are not commutative**add ':' or ', 
> i.e.' here** the result depends on the order in which margins are computed. 
> Annotation of margins is done via naming the FUN list.
>
>
> I'm not sure if such minor things really need to be reported when they are 
> noticed... Please let me know if not. Of course this is minor, but imho one 
> of the strengths of R is also its documentation!
>

The documentation looks correct to me.
If the function FUN is not commutative (i.e. the result depends on the order of 
the vector passed to it), then the result of addmargins() will depend on the 
order of the 'margin' argument to the addmargins() function.

For instance:
mat <- rbind(c(1,10),c(100,1000))
fun <- function(x) {x[1]-x[2]-x[1]*x[2]} # non-commutative function
a <- addmargins(mat, margin=c(1,2), FUN=fun)
b <- addmargins(mat, margin=c(2,1), FUN=fun)

a and b are different, because the fun function is not commutative.

--
Sincerely
André GILLIBERT


Re: [Rd] Documentation of addmargins

2021-12-07 Thread GILLIBERT, Andre

Thomas SOEIRO wrote:
> Dear list,

> There is a minor typo in addmargins (section Details):

> - If the functions used to form margins are not commutative the result 
> depends on the order in which margins are computed. Annotation of margins is 
> done via naming the FUN list.
> + If the functions used to form margins are not commutative**add ':' or ', 
> i.e.' here** the result depends on the order in which margins are computed. 
> Annotation of margins is done via naming the FUN list.
>
>
> I'm not sure if such minor things really need to be reported when they are 
> noticed... Please let me know if not. Of course this is minor, but imho one 
> of the strengths of R is also its documentation!
>

The documentation looks correct to me.
If the function FUN is not commutative (i.e. the result depends on the order of 
the vector passed to it), then the result of addmargins() will depend on the 
order of the 'margin' argument to the addmargins() function.

For instance:
mat <- rbind(c(1,10),c(100,1000))
fun <- function(x) {x[1]-x[2]-x[1]*x[2]} # non-commutative function
a <- addmargins(mat, margin=c(1,2), FUN=fun)
b <- addmargins(mat, margin=c(2,1), FUN=fun)

a and b are different, because the fun function is not commutative.

-- 
Sincerely
André GILLIBERT


[Rd] Documentation of addmargins

2021-12-07 Thread SOEIRO Thomas
Dear list,

There is a minor typo in addmargins (section Details):

- If the functions used to form margins are not commutative the result depends 
on the order in which margins are computed. Annotation of margins is done via 
naming the FUN list.
+ If the functions used to form margins are not commutative**add ':' or ', 
i.e.' here** the result depends on the order in which margins are computed. 
Annotation of margins is done via naming the FUN list.


I'm not sure if such minor things really need to be reported when they are 
noticed... Please let me know if not. Of course this is minor, but imho one of 
the strengths of R is also its documentation!

Best,

Thomas



Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Taras Zakharko
I fully agree! General string interpolation opens a gaping security hole and is 
accompanied by all kinds of problems and decisions. What I envision instead is 
something like this:

   f”hello {name}” 

Which gets parsed by R to this:

   (STRINTERPSXP (CHARSXP (PROMISE nil)))

Basically, a new type of R language construct that still can be processed by 
packages (for customized interpolation like in cli etc.), with a default eval 
which is basically paste0(). The benefit here would be that this is eagerly 
parsed and syntactically checked, and that the promise code could carry a 
srcref. And of course, that you could pass an interpolated string expression 
lazily between frames without losing the environment etc… For more advanced 
applications, a low level string interpolation expression constructor could be 
provided (that could either parse a general string — at the user’s risk, or 
build it directly from expressions). 
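
The nearest user-level approximation in today's R is probably to capture the pieces as unevaluated expressions together with the caller's environment. They can then be paste0()-ed only when the value is needed. Here is a rough sketch under that assumption; lazy_interp() and force_interp() are hypothetical names. Unlike the proposed STRINTERPSXP, this gets no parse-time checking of a single string literal, because the pieces are ordinary R arguments. It also does no value caching, so every force re-evaluates the pieces.

```r
# Hypothetical emulation: store unevaluated pieces plus the caller's environment.
lazy_interp <- function(...) {
  pieces <- as.list(substitute(list(...)))[-1]   # drop the `list` symbol
  structure(list(pieces = pieces, env = parent.frame()), class = "lazy_interp")
}

# The default "eval" is essentially paste0() over the evaluated pieces.
force_interp <- function(x) {
  values <- lapply(x$pieces, eval, envir = x$env)
  do.call(paste0, values)
}

make_greeting <- function() {
  who <- "frame"
  lazy_interp("hello ", who)   # `who` is not evaluated here
}
s <- make_greeting()
force_interp(s)                # "hello frame": evaluated later, in the original frame
```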

— Taras


> On 7 Dec 2021, at 12:06, Simon Urbanek  wrote:
> 
> 
> 
>> On Dec 7, 2021, at 22:09, Taras Zakharko wrote:
>> 
>> Great summary, Avi. 
>> 
>> String concatenation could be trivially added to R, but it probably should 
>> not be. You will notice that modern languages tend not to use “+” to do 
>> string concatenation (they either have a custom operator or a special kind 
>> of pattern to do it) due to the practical issues such an approach brings 
>> (implicit type casting, lack of commutativity, performance etc.). These 
>> issues will be felt even more so in R with its weak typing, idiosyncratic 
>> casting behavior and NAs. 
>> 
>> As others have pointed out, any kind of behavior one wants from string 
>> concatenation can be implemented by custom operators as needed. This is not 
>> something that needs to be in the base R. I would rather see the efforts 
>> directed toward improving string formatting (such as glue-style built-in 
>> string interpolation).
>> 
> 
> This is getting OT, but there is a very good reason why string interpolation 
> is not in core R. As I recall, it was considered some time ago, but it is 
> very dangerous, as it implies evaluation of constants, which opens a huge 
> security hole and has questionable semantics (where you evaluate, etc.). 
> Hence it's much easier to ban a package than to hack it out of R ;).
> 
> Cheers,
> Simon
> 
> 
>> — Taras
>> 
>> 
>>> On 7 Dec 2021, at 02:27, Avi Gross via R-devel  
>>> wrote:
>>> 
>>> After seeing what others are saying, it is clear that you need to carefully
>>> think things out before designing any implementation of a more native
>>> concatenation operator, whether it is called "+" or anything else. There may
>>> not be any ONE right solution, but unlike a function version like paste()
>>> there is nowhere to place any options that specify what you mean.
>>> 
>>> You can obviously expand paste() to accept arguments like replace.NA="" or
>>> replace.NA="" and similar arguments on what to do if you see a NaN, an
>>> Inf or -Inf, a NULL or even an NA_character_ and so on. Heck, you might tell
>>> it to make other substitutions as in substitute=list(100=99, D=F) or any
>>> other nonsense you can come up with.
>>> 
>>> But you have nowhere to put options when saying:
>>> 
>>> c <- a + b
>>> 
>>> Sure, you could set various global options before the addition and maybe
>>> reset them after, but that is not a way I like to go for something this
>>> basic.
>>> 
>>> And enough such tinkering makes me wonder if it is easier to ask a user to
>>> use a slightly different function like this:
>>> 
>>> paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na),
>>> list(...)))
>>> 
>>> The above one-line function removes any NA from the argument list to make a
>>> potentially shorter list before calling the real paste() using it.
>>> 
>>> Variations can, of course, be made that allow functionality as above. 
>>> 
>>> If R were a true object-oriented language in the same sense as others like
>>> Python, operator overloading of "+" might be doable in more complex ways but
>>> we can only work with what we have. I tend to agree with others that in some
>>> places R is so lenient that all kinds of errors can happen because it makes
>>> a guess on how to correct it. Generally, if you really want to mix numeric
>>> and character, many languages require you to transform the arguments so
>>> they are all of compatible types. The paste() function is clearly stated to coerce
>>> all arguments to be of type character for you. Whereas a+b makes no such
>>> promises and also is not properly defined even if a and b are both of type
>>> character. Sure, we can expand the language but it may still do things some
>>> find not to be quite what they wanted as in "2"+"3" becoming "23" rather
>>> than 5. Right now, I can use as.numeric("2")+as.numeric("3") and get the
>>> intended result after making very clear to anyone reading the code that I
>>> wanted strings converted to floating point before the addit

Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Simon Urbanek



> On Dec 7, 2021, at 22:09, Taras Zakharko  wrote:
> 
> Great summary, Avi. 
> 
> String concatenation could be trivially added to R, but it probably should not 
> be. You will notice that modern languages tend not to use “+” to do string 
> concatenation (they either have a custom operator or a special kind of pattern 
> to do it) due to the practical issues such an approach brings (implicit type 
> casting, lack of commutativity, performance, etc.). These issues will be felt 
> even more in R, with its weak typing, idiosyncratic casting behavior and NAs. 
> 
> As others have pointed out, any kind of behavior one wants from string 
> concatenation can be implemented by custom operators as needed. This is not 
> something that needs to be in base R. I would rather see the effort directed 
> toward improving string formatting (such as glue-style built-in string 
> interpolation).
> 

This is getting OT, but there is a very good reason why string interpolation is 
not in core R. As I recall it has been considered some time ago, but it is very 
dangerous as it implies evaluation on constants which opens a huge security 
hole and has questionable semantics (where you evaluate etc). Hence it's much 
easier to ban a package than to hack it out of R ;).

Cheers,
Simon
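To make the evaluation concern concrete, here is a minimal sketch of glue-style interpolation in base R. The helper name `interp` is hypothetical (it is not the glue package's API): it finds `{...}` fields with a regular expression and runs each one through `eval(parse(text = ...))` in the caller's environment. Whatever sits inside the braces executes as R code, so interpolating a template from an untrusted source amounts to executing untrusted code.

```r
# Hypothetical helper, not part of base R or the glue package.
# Replaces each {expr} field in `template` with the value of expr,
# evaluated in `envir` (the caller's environment by default).
interp <- function(template, envir = parent.frame()) {
  stopifnot(is.character(template), length(template) == 1L)
  pattern <- "\\{[^{}]*\\}"
  fields <- regmatches(template, gregexpr(pattern, template))[[1]]
  values <- vapply(fields, function(f) {
    code <- substr(f, 2L, nchar(f) - 1L)            # strip the braces
    val <- eval(parse(text = code), envir = envir)  # arbitrary code runs here
    paste(as.character(val), collapse = ", ")
  }, character(1), USE.NAMES = FALSE)
  # Put the evaluated values back into the template, in order.
  regmatches(template, gregexpr(pattern, template)) <- list(values)
  template
}

x <- 2
interp("x + 1 is {x + 1}")   # "x + 1 is 3"
# An untrusted template would be evaluated just the same, e.g.
# interp("{file.remove('important.csv')}") would delete that file.
```

Deciding where `envir` should point (caller, global, a sandbox) is exactly the "where do you evaluate" question raised above.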


> — Taras
> 
> 
>> On 7 Dec 2021, at 02:27, Avi Gross via R-devel  wrote:
>> 
>> After seeing what others are saying, it is clear that you need to carefully
>> think things out before designing any implementation of a more native
>> concatenation operator whether it is called "+" or anything else. There may
>> not be any ONE right solution but unlike a function version like paste()
>> there is nowhere to place any options that specify what you mean.
>> 
>> You can obviously expand paste() to accept arguments like replace.NA="" or
>> replace.NA="" and similar arguments on what to do if you see a NaN, and
>> Inf or -Inf, a NULL or even an NA_character_ and so on. Heck, you might tell
>> it to make other substitutions as in substitute=list(100=99, D=F) or any other
>> nonsense you can come up with.
>> 
>> But you have nowhere to put options when saying:
>> 
>> c <- a + b
>> 
>> Sure, you could set various global options before the addition and maybe
>> reset them after, but that is not a way I like to go for something this
>> basic.
>> 
>> And enough such tinkering makes me wonder if it is easier to ask a user to
>> use a slightly different function like this:
>> 
>> paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na),
>> list(...)))
>> 
>> The above one-line function removes any NA from the argument list to make a
>> potentially shorter list before calling the real paste() using it.
>> 
>> Variations can, of course, be made that allow functionality as above. 
>> 
>> If R were a true object-oriented language in the same sense as others like
>> Python, operator overloading of "+" might be doable in more complex ways but
>> we can only work with what we have. I tend to agree with others that in some
>> places R is so lenient that all kinds of errors can happen because it makes
>> a guess on how to correct it. Generally, if you really want to mix numeric
>> and character, many languages require you to transform the arguments so
>> they are all of compatible types. The paste() function is clearly stated to coerce
>> all arguments to be of type character for you. Whereas a+b makes no such
>> promises and also is not properly defined even if a and b are both of type
>> character. Sure, we can expand the language but it may still do things some
>> find not to be quite what they wanted as in "2"+"3" becoming "23" rather
>> than 5. Right now, I can use as.numeric("2")+as.numeric("3") and get the
>> intended result after making very clear to anyone reading the code that I
>> wanted strings converted to floating point before the addition.
>> 
>> As has been pointed out, the plus operator if used to concatenate does not
>> have a cognate for other operations like -*/ and R has used most other
>> special symbols for other purposes. So, sure, we can use something like 
>> (4 periods) if it is not already being used for something but using + here
>> is a tad confusing. Having said that, the makers of Python did make that
>> choice.
>> 
>> -Original Message-
>> From: R-devel  On Behalf Of Gabriel Becker
>> Sent: Monday, December 6, 2021 7:21 PM
>> To: Bill Dunlap 
>> Cc: Radford Neal ; r-devel 
>> Subject: Re: [Rd] string concatenation operator (revisited)
>> 
>> As I recall, there was a large discussion related to that which resulted in
>> the recycle0 argument being added (but defaulting to FALSE) for
>> paste/paste0.
>> 
>> I think a lot of these things ultimately mean that if there were to be a
>> string concatenation operator, it probably shouldn't have behavior identical
>> to paste0. Was that what you were getting at as well, Bill?
>> 
>> ~G
>> 
>> On Mon, Dec 6, 2021 at 4:11 PM Bill Dunlap  wrote:
>> 
>>> Should paste0(character(0

Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Duncan Murdoch

On 07/12/2021 4:09 a.m., Taras Zakharko wrote:

> Great summary, Avi.
> 
> String concatenation could be trivially added to R, but it probably should not 
> be. You will notice that modern languages tend not to use “+” to do string 
> concatenation (they either have a custom operator or a special kind of pattern 
> to do it) due to the practical issues such an approach brings (implicit type 
> casting, lack of commutativity, performance, etc.). These issues will be felt 
> even more in R, with its weak typing, idiosyncratic casting behavior and NAs.
> 
> As others have pointed out, any kind of behavior one wants from string 
> concatenation can be implemented by custom operators as needed. This is not 
> something that needs to be in base R. I would rather see the effort directed 
> toward improving string formatting (such as glue-style built-in string 
> interpolation).


R already has that in the glue package and elsewhere in other packages 
(e.g. I wrote a simple version for rgl). What would be the benefit of 
having it built in?


Duncan Murdoch
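For comparison with evaluation-based interpolation, base R's `sprintf()` (and `gettextf()`, which wraps it for translation) fills `%` placeholders from arguments supplied at the call site; the template string itself is never parsed as code. A short sketch, using a throwaway data frame as an assumed example value:

```r
df <- data.frame(x = 1:3)
# Values come from ordinary arguments evaluated at the call site;
# the template only holds % placeholders.
msg <- sprintf("%s has %d rows", "df", nrow(df))
msg      # "df has 3 rows"
# Braces in a sprintf() template are plain text: nothing is evaluated.
literal <- sprintf("{Sys.time()}")
literal  # "{Sys.time()}"
```

The trade-off is readability: the values sit apart from the text, which is what glue-style syntax tries to avoid.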



— Taras



On 7 Dec 2021, at 02:27, Avi Gross via R-devel  wrote:

After seeing what others are saying, it is clear that you need to carefully
think things out before designing any implementation of a more native
concatenation operator whether it is called "+" or anything else. There may
not be any ONE right solution but unlike a function version like paste()
there is nowhere to place any options that specify what you mean.

You can obviously expand paste() to accept arguments like replace.NA="" or
replace.NA="" and similar arguments on what to do if you see a NaN, and
Inf or -Inf, a NULL or even an NA_character_ and so on. Heck, you might tell
it to make other substitutions as in substitute=list(100=99, D=F) or any other
nonsense you can come up with.

But you have nowhere to put options when saying:

c <- a + b

Sure, you could set various global options before the addition and maybe
reset them after, but that is not a way I like to go for something this
basic.

And enough such tinkering makes me wonder if it is easier to ask a user to
use a slightly different function like this:

paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na),
list(...)))

The above one-line function removes any NA from the argument list to make a
potentially shorter list before calling the real paste() using it.

Variations can, of course, be made that allow functionality as above.
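A quick check of the one-liner, plus one possible variation. `Filter()` expects its predicate to return one logical per element, but `is.na()` on a vector argument returns one logical per entry, so for vector arguments the mask no longer lines up with the argument list. The hypothetical `paste_no_na()` below drops only arguments that are a single `NA`; that reading of "remove any NA" is an assumption.

```r
# Avi's one-liner, unchanged apart from the line break.
paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na), list(...)))

paste("a", NA, "b")        # "a NA b": paste() coerces NA to the string "NA"
paste.no.na("a", NA, "b")  # "a b": the NA argument is dropped first

# Hypothetical variation: drop only arguments that are a single NA,
# so vector arguments pass through untouched.
paste_no_na <- function(...) {
  is_scalar_na <- function(x) length(x) == 1L && is.na(x)
  do.call(paste, Filter(Negate(is_scalar_na), list(...)))
}

paste_no_na(c("x", "y"), NA, "z")  # "x z" "y z"
```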

If R were a true object-oriented language in the same sense as others like
Python, operator overloading of "+" might be doable in more complex ways but
we can only work with what we have. I tend to agree with others that in some
places R is so lenient that all kinds of errors can happen because it makes
a guess on how to correct it. Generally, if you really want to mix numeric
and character, many languages require you to transform the arguments so
they are all of compatible types. The paste() function is clearly stated to coerce
all arguments to be of type character for you. Whereas a+b makes no such
promises and also is not properly defined even if a and b are both of type
character. Sure, we can expand the language but it may still do things some
find not to be quite what they wanted as in "2"+"3" becoming "23" rather
than 5. Right now, I can use as.numeric("2")+as.numeric("3") and get the
intended result after making very clear to anyone reading the code that I
wanted strings converted to floating point before the addition.
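The contrast can be checked directly. In current R, `+` on two character vectors signals an error rather than concatenating, `paste0()` coerces and joins, and explicit `as.numeric()` makes the intended arithmetic visible:

```r
# `+` does not coerce character operands; it signals an error instead.
res <- tryCatch("2" + "3", error = function(e) conditionMessage(e))
res  # e.g. "non-numeric argument to binary operator" (text may vary by locale)

paste0("2", "3")                   # "23": paste0() coerces and concatenates
as.numeric("2") + as.numeric("3")  # 5: explicit conversion, then addition
```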

As has been pointed out, the plus operator if used to concatenate does not
have a cognate for other operations like -*/ and R has used most other
special symbols for other purposes. So, sure, we can use something like 
(4 periods) if it is not already being used for something but using + here
is a tad confusing. Having said that, the makers of Python did make that
choice.
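As a point of comparison, R's S3 dispatch already lets `+` be overloaded for objects that carry a class attribute, though not for plain character vectors, since internal dispatch only happens for classed objects. The class name `concat_str` and its constructor are hypothetical; this is a sketch of an opt-in concatenating `+` that leaves base behavior alone.

```r
# Hypothetical constructor: a character vector tagged with an S3 class.
concat_str <- function(x) structure(as.character(x), class = "concat_str")

# S3 method for `+`; R dispatches to it when either operand has this class.
`+.concat_str` <- function(e1, e2) {
  concat_str(paste0(unclass(e1), unclass(e2)))
}

# Print as a plain character vector.
print.concat_str <- function(x, ...) {
  print(unclass(x), ...)
  invisible(x)
}

a <- concat_str("foo")
a + "bar"   # "foobar" (still of class concat_str)
"pre-" + a  # "pre-foo": dispatch also works with the classed operand second
# "2" + "3" is unaffected and remains an error, because neither operand
# has a class attribute.
```

Note that this does not settle any of the NA or zero-length questions discussed in this thread; `paste0()` semantics are simply inherited.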

-Original Message-
From: R-devel  On Behalf Of Gabriel Becker
Sent: Monday, December 6, 2021 7:21 PM
To: Bill Dunlap 
Cc: Radford Neal ; r-devel 
Subject: Re: [Rd] string concatenation operator (revisited)

As I recall, there was a large discussion related to that which resulted in
the recycle0 argument being added (but defaulting to FALSE) for
paste/paste0.

I think a lot of these things ultimately mean that if there were to be a
string concatenation operator, it probably shouldn't have behavior identical
to paste0. Was that what you were getting at as well, Bill?

~G

On Mon, Dec 6, 2021 at 4:11 PM Bill Dunlap  wrote:


Should paste0(character(0), c("a","b")) give character(0)?
There is a fair bit of code that assumes that paste("X",NULL) gives "X"
but c(1,2)+NULL gives numeric(0).

-Bill
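Bill's contrast, and the `recycle0` argument Gabriel mentions (added to paste()/paste0() in R 4.0.1), can be seen side by side. Arithmetic with a zero-length operand yields a zero-length result, whereas `paste0()` by default lets zero-length arguments contribute nothing; `recycle0 = TRUE` switches to the arithmetic-style rule. This sketch assumes R 4.0.1 or later.

```r
c(1, 2) + NULL                                      # numeric(0)
paste0("X", NULL)                                   # "X"
paste0(character(0), c("a", "b"))                   # "a" "b"
paste0(character(0), c("a", "b"), recycle0 = TRUE)  # character(0)
```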

On Mon, Dec 6, 2021 at 1:32 PM Duncan Murdoch wrote:


On 06/12/2021 4:21 p.m., Avraham Adler wrote:

Gabe, I agree that missingness is important to factor in. To somewhat abuse
the terminology, NA is often used to represent missingness. Perhaps
concatenating character something with character something missing should
resu

Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Taras Zakharko
Great summary, Avi. 

String concatenation could be trivially added to R, but it probably should not 
be. You will notice that modern languages tend not to use “+” to do string 
concatenation (they either have a custom operator or a special kind of pattern 
to do it) due to the practical issues such an approach brings (implicit type 
casting, lack of commutativity, performance, etc.). These issues will be felt 
even more in R, with its weak typing, idiosyncratic casting behavior and NAs. 

As others have pointed out, any kind of behavior one wants from string 
concatenation can be implemented by custom operators as needed. This is not 
something that needs to be in base R. I would rather see the effort directed 
toward improving string formatting (such as glue-style built-in string 
interpolation).
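As a sketch of the custom-operator route, user code can define an infix `%...%` operator with whatever semantics it chooses. The name `%+%` and the NA rule below (an NA operand yields NA, as in arithmetic, instead of the string "NA" that `paste0()` produces) are assumptions for illustration; packages such as ggplot2 also export a `%+%`, so the name can clash.

```r
# Hypothetical user-defined concatenation operator; not part of base R.
`%+%` <- function(e1, e2) {
  out <- paste0(e1, e2)  # coerces and recycles like paste0()
  # Propagate missingness instead of producing the string "NA".
  out[is.na(e1) | is.na(e2)] <- NA_character_
  out
}

"foo" %+% "bar"  # "foobar"
"id_" %+% 1:3    # "id_1" "id_2" "id_3"
"x" %+% NA       # NA
```

Because the semantics live in one small function, a different project can pick a different NA or recycling rule without anything changing in base R.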

— Taras


> On 7 Dec 2021, at 02:27, Avi Gross via R-devel  wrote:
> 
> After seeing what others are saying, it is clear that you need to carefully
> think things out before designing any implementation of a more native
> concatenation operator whether it is called "+" or anything else. There may
> not be any ONE right solution but unlike a function version like paste()
> there is nowhere to place any options that specify what you mean.
> 
> You can obviously expand paste() to accept arguments like replace.NA="" or
> replace.NA="" and similar arguments on what to do if you see a NaN, and
> Inf or -Inf, a NULL or even an NA_character_ and so on. Heck, you might tell
> it to make other substitutions as in substitute=list(100=99, D=F) or any other
> nonsense you can come up with.
> 
> But you have nowhere to put options when saying:
> 
> c <- a + b
> 
> Sure, you could set various global options before the addition and maybe
> reset them after, but that is not a way I like to go for something this
> basic.
> 
> And enough such tinkering makes me wonder if it is easier to ask a user to
> use a slightly different function like this:
> 
> paste.no.na <- function(...) do.call(paste, Filter(Negate(is.na),
> list(...)))
> 
> The above one-line function removes any NA from the argument list to make a
> potentially shorter list before calling the real paste() using it.
> 
> Variations can, of course, be made that allow functionality as above. 
> 
> If R were a true object-oriented language in the same sense as others like
> Python, operator overloading of "+" might be doable in more complex ways but
> we can only work with what we have. I tend to agree with others that in some
> places R is so lenient that all kinds of errors can happen because it makes
> a guess on how to correct it. Generally, if you really want to mix numeric
> and character, many languages require you to transform the arguments so
> they are all of compatible types. The paste() function is clearly stated to coerce
> all arguments to be of type character for you. Whereas a+b makes no such
> promises and also is not properly defined even if a and b are both of type
> character. Sure, we can expand the language but it may still do things some
> find not to be quite what they wanted as in "2"+"3" becoming "23" rather
> than 5. Right now, I can use as.numeric("2")+as.numeric("3") and get the
> intended result after making very clear to anyone reading the code that I
> wanted strings converted to floating point before the addition.
> 
> As has been pointed out, the plus operator if used to concatenate does not
> have a cognate for other operations like -*/ and R has used most other
> special symbols for other purposes. So, sure, we can use something like 
> (4 periods) if it is not already being used for something but using + here
> is a tad confusing. Having said that, the makers of Python did make that
> choice.
> 
> -Original Message-
> From: R-devel  On Behalf Of Gabriel Becker
> Sent: Monday, December 6, 2021 7:21 PM
> To: Bill Dunlap 
> Cc: Radford Neal ; r-devel 
> Subject: Re: [Rd] string concatenation operator (revisited)
> 
> As I recall, there was a large discussion related to that which resulted in
> the recycle0 argument being added (but defaulting to FALSE) for
> paste/paste0.
> 
> I think a lot of these things ultimately mean that if there were to be a
> string concatenation operator, it probably shouldn't have behavior identical
> to paste0. Was that what you were getting at as well, Bill?
> 
> ~G
> 
> On Mon, Dec 6, 2021 at 4:11 PM Bill Dunlap  wrote:
> 
>> Should paste0(character(0), c("a","b")) give character(0)?
>> There is a fair bit of code that assumes that paste("X",NULL) gives "X"
>> but c(1,2)+NULL gives numeric(0).
>> 
>> -Bill
>> 
>> On Mon, Dec 6, 2021 at 1:32 PM Duncan Murdoch wrote:
>> 
>>> On 06/12/2021 4:21 p.m., Avraham Adler wrote:
>>>> Gabe, I agree that missingness is important to factor in. To somewhat
>>>> abuse the terminology, NA is often used to represent missingness. Perhaps
>>>> concatenating character something with character something missing
>>>> should result in the original ch