Re: [Rd] stringsAsFactors and type.convert()

2020-04-20 Thread Martin Maechler
> Arni Magnusson 
> on Mon, 20 Apr 2020 16:50:16 + writes:

> Dear Martin,
> Thank you for the well-reasoned response. I realized I was rather late to
> make this suggestion for 4.0.0, changing a somewhat low-level function that
> can indeed affect packages.

> I was just reviewing some R user scripts that were using type.convert(),
> mainly on data frames. In all cases, people were passing as.is=TRUE, so I was
> reminded that I would not be the only user who would appreciate it if
> as.is=TRUE became the default at some point.

> So I am happy to hear that the help page now mentions that as.is=TRUE
> is planned to become the default at some point in the future. Looking
> forward to the 4.0.0 official release - all positive changes!

Thank you, Arni.

Well, I did not reveal everything:

My current suggestion is to *change* the default,
but not to TRUE; rather, if 'as.is' is not specified, to give a *warning*
which says it will use 'TRUE', but still a warning ...
along the lines of the help page statement I mentioned (cited at the end below)
that callers really should always specify the 'as.is' argument
... which may be a good idea anyway, alerting the user when changing
default behavior.
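A minimal sketch of the behaviour described above, assuming it is done with a wrapper rather than by changing type.convert() itself: the name `type_convert_warn` and the warning text are hypothetical, and this is not the code that was committed to R.

```r
## Hypothetical wrapper (not part of base R): warn when the caller omits
## 'as.is', then proceed as if as.is = TRUE had been given.
type_convert_warn <- function(x, ..., as.is) {
  if (missing(as.is)) {
    warning("'as.is' should be specified by the caller; using as.is = TRUE")
    as.is <- TRUE
  }
  type.convert(x, ..., as.is = as.is)
}

type_convert_warn(c("a", "b"))                 # warns, returns character
type_convert_warn(c("a", "b"), as.is = FALSE)  # no warning, returns a factor
```

Callers who already pass 'as.is' explicitly never see the warning, which is the point of the help page's advice.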

> All the best,
> Arni

thank you, the same to you,
Martin



> 
> From: Martin Maechler 
> Sent: Monday, April 20, 2020 6:23:31 PM
> To: Arni Magnusson
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] stringsAsFactors and type.convert()

> Arni Magnusson
> on Mon, 13 Apr 2020 22:20:19 + writes:

>> If read.table() is defaulting to "character" instead of "factor" data
>> type, shouldn't type.convert() also default to "character" in R 4.0.0?
>> This would seem like a good time to change the default to
>> type.convert(as.is=TRUE), to align it with the new default in read.table
>> and data.frame. I think many R >=4.0.0 users would be happy with as.is=TRUE
>> as the default in type.convert.

>> I'm happy to work on the patch and run tests if that is helpful.

>> Cheers,
>> Arni

> Dear Arni,
> thank you for the notice, which unfortunately wasn't noticed
> (Easter break etc.) and was too late in any case to fulfill the
> criterion of a small trivial bug fix for R 4.0.0 beta (very close
> to becoming RC (= "Release Candidate")).

> Even though type.convert() may not be used much directly (but
> rather indirectly via read.table(), where there's no problem), we
> found it too risky to destabilize the R 4.0.0 prereleases.
> As you all know there were ( / are?) still package changes
> needed and a few other important "todo"s, so we had to decide to
> postpone this (even for R-devel) until after releasing R 4.0.0
> this coming Friday.

> I've committed a change to the help page which does mention that
> the default for 'as.is' is planned to be changed.

> Also, the "Details" section of the help page has for a long time
> ended with

> Since this is a helper function, the caller should always pass an
> appropriate value of 'as.is'.

> If useRs and package authors have followed this advice, they
> won't be bitten at all.

> Best regards,
> Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] stringsAsFactors and type.convert()

2020-04-20 Thread Arni Magnusson
Dear Martin,

Thank you for the well-reasoned response. I realized I was rather late to make 
this suggestion for 4.0.0, changing a somewhat low-level function that can 
indeed affect packages.

I was just reviewing some R user scripts that were using type.convert(), mainly
on data frames. In all cases, people were passing as.is=TRUE, so I was reminded
that I would not be the only user who would appreciate it if as.is=TRUE became
the default at some point.

So I am happy to hear that the help page now mentions that as.is=TRUE is
planned to become the default at some point in the future. Looking forward to
the 4.0.0 official release - all positive changes!

All the best,
Arni


From: Martin Maechler 
Sent: Monday, April 20, 2020 6:23:31 PM
To: Arni Magnusson
Cc: r-devel@r-project.org
Subject: Re: [Rd] stringsAsFactors and type.convert()

> Arni Magnusson
> on Mon, 13 Apr 2020 22:20:19 + writes:

> If read.table() is defaulting to "character" instead of "factor" data
> type, shouldn't type.convert() also default to "character" in R 4.0.0?
> This would seem like a good time to change the default to
> type.convert(as.is=TRUE), to align it with the new default in read.table
> and data.frame. I think many R >=4.0.0 users would be happy with as.is=TRUE
> as the default in type.convert.

> I'm happy to work on the patch and run tests if that is helpful.

> Cheers,
> Arni

Dear Arni,
thank you for the notice, which unfortunately wasn't noticed
(Easter break etc.) and was too late in any case to fulfill the
criterion of a small trivial bug fix for R 4.0.0 beta (very close
to becoming RC (= "Release Candidate")).

Even though type.convert() may not be used much directly (but
rather indirectly via read.table(), where there's no problem), we
found it too risky to destabilize the R 4.0.0 prereleases.
As you all know there were ( / are?) still package changes
needed and a few other important "todo"s, so we had to decide to
postpone this (even for R-devel) until after releasing R 4.0.0
this coming Friday.

I've committed a change to the help page which does mention that
the default for 'as.is' is planned to be changed.

Also, the "Details" section of the help page has for a long time
ended with

 Since this is a helper function, the caller should always pass an
 appropriate value of 'as.is'.

If useRs and package authors have followed this advice, they
won't be bitten at all.

Best regards,
Martin



Re: [Rd] stringsAsFactors and type.convert()

2020-04-20 Thread Martin Maechler
> Arni Magnusson 
> on Mon, 13 Apr 2020 22:20:19 + writes:

> If read.table() is defaulting to "character" instead of "factor" data
> type, shouldn't type.convert() also default to "character" in R 4.0.0?
> This would seem like a good time to change the default to
> type.convert(as.is=TRUE), to align it with the new default in read.table
> and data.frame. I think many R >=4.0.0 users would be happy with as.is=TRUE
> as the default in type.convert.

> I'm happy to work on the patch and run tests if that is helpful.

> Cheers,
> Arni

Dear Arni,
thank you for the notice, which unfortunately wasn't noticed
(Easter break etc.) and was too late in any case to fulfill the
criterion of a small trivial bug fix for R 4.0.0 beta (very close
to becoming RC (= "Release Candidate")).

Even though type.convert() may not be used much directly (but
rather indirectly via read.table(), where there's no problem), we
found it too risky to destabilize the R 4.0.0 prereleases.
As you all know there were ( / are?) still package changes
needed and a few other important "todo"s, so we had to decide to
postpone this (even for R-devel) until after releasing R 4.0.0
this coming Friday.

I've committed a change to the help page which does mention that
the default for 'as.is' is planned to be changed.

Also, the "Details" section of the help page has for a long time
ended with

 Since this is a helper function, the caller should always pass an
 appropriate value of 'as.is'.

If useRs and package authors have followed this advice, they
won't be bitten at all.
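Following that advice simply means spelling out 'as.is' on every call. The small example below uses made-up data to show both settings on a character vector and on a data frame, where type.convert() is applied column by column; `stringsAsFactors = FALSE` is passed so that the input columns are character whatever the R version's data.frame() default.

```r
## Always pass 'as.is' explicitly, as the help page recommends.
x <- c("a", "b", "a")
chr <- type.convert(x, as.is = TRUE)   # stays character
fac <- type.convert(x, as.is = FALSE)  # becomes a factor
str(chr)
str(fac)

## The same applies column-wise to a data frame (made-up example data).
df <- data.frame(n = c("1", "2"), s = c("x", "y"), stringsAsFactors = FALSE)
str(type.convert(df, as.is = TRUE))    # n: int, s: chr
```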

Best regards,
Martin



Re: [Rd] suggestion: "." in [lsv]apply()

2020-04-20 Thread Gabor Grothendieck
I wouldn't drive my choices using unlikely edge cases
but rather focus on the vast majority of practical cases.

The popularity of tidyverse shows that this philosophy
works well from a user's perspective.

For the vast majority of practical cases it works well, and for the
others you can either use function() as usual or do it like this:

lapply(quote(a + b), fn$identity(~ as.character(x)))

or (we used dot here but you can use any name you like)

. <- fn$identity
lapply(quote(a + b), .(~ as.character(x)))

For the vast majority of practical cases it has the advantages that the function
can be represented more naturally, using whatever argument names are most
convenient rather than being forced to use reserved names, and that it supports
multiple arguments and dot dot dot.
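To make the "arguments from the formula" idea concrete, here is a rough base-R sketch: an explicit left-hand side names the arguments, otherwise every variable in the right-hand side becomes an argument. The helper `formula_to_fun` is hypothetical and is not gsubfn's implementation of fn$; because it treats every symbol in the body as an argument, it only suits simple bodies such as `~ sum(dim(x))`, and richer bodies need the explicit left-hand side.

```r
## Hypothetical helper: build a function from a formula.
##   x ~ body  -> arguments taken from the left-hand side
##   ~ body    -> arguments are all variables appearing in the body
formula_to_fun <- function(fo) {
  rhs <- fo[[length(fo)]]
  args <- if (length(fo) == 3L) all.vars(fo[[2L]]) else all.vars(rhs)
  f <- function() NULL
  ## one formal argument (without a default) per name
  formals(f) <- setNames(rep(list(quote(expr = )), length(args)), args)
  body(f) <- rhs
  environment(f) <- environment(fo)  # evaluate in the formula's environment
  f
}

by_cyl <- split(mtcars, mtcars$cyl)
sapply(by_cyl, formula_to_fun(~ sum(dim(x))))
sapply(by_cyl, formula_to_fun(x ~ summary(lm(mpg ~ wt, x))$r.squared))
```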

Also it does more than just represent functions. It also interpolates strings
so it can be used for multiple purposes.

library(sqldf)
mytime <- 4
fn$sqldf("select * from BOD where Time < $mytime")
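The string interpolation (`$mytime` replaced by the value of `mytime`) can be imitated in base R with regmatches(). This is a hypothetical helper, `interp`, not how gsubfn does it: it handles a single string and only simple `$name` references, and looks names up in the caller's environment.

```r
## Hypothetical helper: replace each $name in a single string by the value
## of the variable 'name', looked up from the caller's environment.
interp <- function(s, env = parent.frame()) {
  force(env)
  m <- gregexpr("\\$[A-Za-z.][A-Za-z0-9._]*", s)
  vars <- substring(regmatches(s, m)[[1]], 2)  # drop the leading '$'
  vals <- vapply(vars, function(v) format(get(v, envir = env)),
                 character(1), USE.NAMES = FALSE)
  regmatches(s, m) <- list(vals)
  s
}

mytime <- 4
interp("select * from BOD where Time < $mytime")
```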


On Mon, Apr 20, 2020 at 9:32 AM Sokol Serguei  wrote:
>
> On 19/04/2020 at 20:46, Gabor Grothendieck wrote:
> > You can get pretty close to that already using fn$ in the gsubfn package:
> >> library(gsubfn)
> >> fn$sapply(split(mtcars, mtcars$cyl), x ~ summary(lm(mpg ~ wt, x))$r.squared)
> >         4         6         8
> > 0.5086326 0.4645102 0.4229655
> Right, I thought about a similar syntax, but this implementation has
> flaws similar to those pointed out by Simon, i.e. it reduces the domain of
> valid inputs (though not on the same parameters). Take an example:
>
> library(gsubfn)
> fn$sapply(quote(x+y), as.character)
> #Error in lapply(X = X, FUN = FUN, ...) : object 'x' not found
>
> while
>
> sapply(quote(x+y), as.character)
> #[1] "+" "x" "y"
>
> This makes me think that it could be advantageous to replace
> match.fun(FUN) in the *apply() family by as.function(FUN), with the obvious
> additional methods:
> as.function.character <- function(x) match.fun(x)
> as.function.name <- function(x) match.fun(x)
>
> Such a replacement would leave the current usage of *apply() as is, but at
> the same time would leave enough room for users who want to adapt *apply()
> to their objects, like formula or whatever class is currently not
> convertible to functions by match.fun()
>
> Would it be possible?
>
> Best,
> Serguei.
>
> > It is not specific to sapply but rather fn$ can preface most
> > functions. If the only free variables are the arguments to the
> > function then you can omit the left hand side of the formula, i.e. the
> > arguments to the function are implied by the free variables in the
> > right hand side. Here x is the implied argument to the function
> > because it is a free variable. We did not have to use the name x. Any
> > name could be used. It is the fact that it is a free variable, not its
> > name, that matters.
> >> fn$sapply(split(mtcars, mtcars$cyl), ~ sum(dim(x)))
> >  4  6  8
> > 22 18 25
> > On Fri, Apr 17, 2020 at 4:11 AM Sokol Serguei
> >  wrote:
> >> Thanks Simon, now I see your argument better.
> >> On 16/04/2020 at 22:48, Simon Urbanek wrote:
> >>> ... I'm not arguing against the principle, I'm arguing about your
> >>> particular proposal as it is inconsistent and not general.
> >> This sounds promising to me. Maybe in a (new?) future, R core will
> >> come up with a correct proposal for this principle? Meanwhile, to avoid
> >> substitute(), I'll look into a formula-syntax variant, as
> >> your example x ~> i + x suggested. Best, Serguei.
> >>> Personally, I find the current syntax much clearer and more readable
> >>> (defining anything by convention like . being the function variable
> >>> seems arbitrary and "dirty" to me), but if you wanted to define a
> >>> shorter syntax, you could use something like x ~> i + x. That said,
> >>> I really don't see the value of not using function(x) [especially
> >>> these days when people are arguing for long variable names with the
> >>> justification that IDEs do all the work anyway], but as I said, my
> >>> argument was against the actual proposal, not general ideas about
> >>> syntax improvement. Cheers, Simon
>  On 17/04/2020, at 3:53 AM, Sokol Serguei
>  wrote: Simon, Thanks for replying. In what follows I won't try to
>  argue (I understood that you find this a bad idea) but I would like
>  to make some of your points clearer for me (and maybe for others).
>  On 16/04/2020 at 16:48, Simon Urbanek wrote:
> > Serguei,
> >> On 17/04/2020, at 2:24 AM, Sokol Serguei 
> >> wrote: Hi, I would like to make a suggestion for a small
> >> syntactic modification of the FUN argument in the family of functions
> >> [lsv]apply(). The idea is to allow one-liner expressions without
> >> typing "function(item) {...}" to surround them. The argument to
> >> the anonymous function is simply referred to as ".". Let's take an
> >> example. With this new feature, the following call
> >> sapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~
> >> wt, d))$r.squared) # 4 6 8 #0.5086326 0.4645102 0.422

Re: [Rd] suggestion: "." in [lsv]apply()

2020-04-20 Thread Sokol Serguei

On 19/04/2020 at 20:46, Gabor Grothendieck wrote:

You can get pretty close to that already using fn$ in the gsubfn package:
library(gsubfn)
fn$sapply(split(mtcars, mtcars$cyl), x ~ summary(lm(mpg ~ wt, x))$r.squared)
        4         6         8
0.5086326 0.4645102 0.4229655

Right, I thought about a similar syntax, but this implementation has
flaws similar to those pointed out by Simon, i.e. it reduces the domain of
valid inputs (though not on the same parameters). Take an example:


library(gsubfn)
fn$sapply(quote(x+y), as.character)
#Error in lapply(X = X, FUN = FUN, ...) : object 'x' not found

while

sapply(quote(x+y), as.character)
#[1] "+" "x" "y"

This makes me think that it could be advantageous to replace
match.fun(FUN) in the *apply() family by as.function(FUN), with the obvious
additional methods:

as.function.character <- function(x) match.fun(x)
as.function.name <- function(x) match.fun(x)

Such a replacement would leave the current usage of *apply() as is, but at
the same time would leave enough room for users who want to adapt *apply()
to their objects, like formula or whatever class is currently not
convertible to functions by match.fun()


Would it be possible?

Best,
Serguei.
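The as.function() idea can be tried without touching base R by writing a wrapper that converts FUN through the as.function() S3 generic before calling sapply(). A minimal sketch under stated assumptions: `sapply2` is a hypothetical name; the character and name methods are the ones proposed above, with `...` added to match the generic's signature; the formula method (one-sided formula whose argument is called `.`) is an assumption included only to show how users could plug in their own classes.

```r
## Methods proposed above; as.function() is an S3 generic in base R.
as.function.character <- function(x, ...) match.fun(x)
as.function.name      <- function(x, ...) match.fun(x)

## Assumed extra method: '~ body' becomes function(.) body, evaluated in the
## formula's environment (for a two-sided formula the left side is ignored).
as.function.formula <- function(x, ...) {
  f <- function(.) NULL
  body(f) <- x[[length(x)]]
  environment(f) <- environment(x)
  f
}

## Hypothetical sapply() variant using as.function() instead of match.fun().
sapply2 <- function(X, FUN, ...) sapply(X, as.function(FUN), ...)

sapply2(split(mtcars, mtcars$cyl), ~ summary(lm(mpg ~ wt, .))$r.squared)
sapply2(quote(x + y), as.character)  # plain functions still work
sapply2(c("a", "b"), "toupper")      # and so do function names
```

Functions fall through to as.function.default(), which returns them unchanged, so existing calls keep working, which is the compatibility property the proposal relies on.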

It is not specific to sapply but rather fn$ can preface most
functions. If the only free variables are the arguments to the
function then you can omit the left hand side of the formula, i.e. the
arguments to the function are implied by the free variables in the
right hand side. Here x is the implied argument to the function
because it is a free variable. We did not have to use the name x. Any
name could be used. It is the fact that it is a free variable, not its
name, that matters.
fn$sapply(split(mtcars, mtcars$cyl), ~ sum(dim(x)))
 4  6  8
22 18 25
On Fri, Apr 17, 2020 at 4:11 AM Sokol Serguei
 wrote:
Thanks Simon, now I see your argument better.
On 16/04/2020 at 22:48, Simon Urbanek wrote:
... I'm not arguing against the principle, I'm arguing about your
particular proposal as it is inconsistent and not general.
This sounds promising to me. Maybe in a (new?) future, R core will
come up with a correct proposal for this principle? Meanwhile, to avoid
substitute(), I'll look into a formula-syntax variant, as
your example x ~> i + x suggested. Best, Serguei.
Personally, I find the current syntax much clearer and more readable
(defining anything by convention like . being the function variable 
seems arbitrary and "dirty" to me), but if you wanted to define a 
shorter syntax, you could use something like x ~> i + x. That said, 
I really don't see the value of not using function(x) [especially 
these days when people are arguing for long variable names with the 
justification that IDEs do all the work anyway], but as I said, my 
argument was against the actual proposal, not general ideas about 
syntax improvement. Cheers, Simon
On 17/04/2020, at 3:53 AM, Sokol Serguei
wrote: Simon, Thanks for replying. In what follows I won't try to
argue (I understood that you find this a bad idea) but I would like
to make some of your points clearer for me (and maybe for others).
On 16/04/2020 at 16:48, Simon Urbanek wrote:

Serguei,
On 17/04/2020, at 2:24 AM, Sokol Serguei
wrote: Hi, I would like to make a suggestion for a small
syntactic modification of the FUN argument in the family of functions
[lsv]apply(). The idea is to allow one-liner expressions without
typing "function(item) {...}" to surround them. The argument to
the anonymous function is simply referred to as ".". Let's take an
example. With this new feature, the following call
sapply(split(mtcars, mtcars$cyl), function(d) summary(lm(mpg ~ wt, d))$r.squared)
#         4         6         8
# 0.5086326 0.4645102 0.4229655
could be rewritten as
sapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)
"Not a big saving in typing", you may say, but multiplied by the
number of [lsv]apply() usages, and with a neater look, I think
the idea merits consideration.
It's not in any way "neater", not only is it less readable, it's
just plain wrong. What if the expression returned a function?
Do you mean like in
l=sapply(1:3, function(i) function(x) i+x)
l[[1]](3) # 4
l[[2]](3) # 5
This is indeed a corner case, but a pair of () or {} can keep
wsapply() on course:
l=wsapply(1:3, (function(x) .+x))
l[[1]](3) # 4
l[[2]](3) # 5
How do you know that you don't want to apply the result of the call? 
A small example (if it is significantly different from the one 
above) would be very helpful for me to understand this point.
For the same reason the implementation below won't work - very 
often you just pass a symbol that evaluates to a function and 
always an expression that returns a function, and there is no way
to distinguish that from your new proposed syntax. 

Even with () or {} around such "dotted" expression? Best, Serguei.
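Simon's objection can be seen with a deliberately naive implementation of the "." idea, in which FUN is captured unevaluated with substitute() and evaluated once per element with `.` bound to that element. This `wsapply` is a hypothetical sketch, not Serguei's implementation (which is not part of this excerpt): dotted expressions and the closure example behave as described, but a bare function name is no longer applied, because the sketch cannot tell a symbol naming a function from a "dotted" expression.

```r
## Hypothetical, deliberately naive "dot" sapply: FUN is an unevaluated
## expression, evaluated for each element with '.' bound to that element.
wsapply <- function(X, FUN) {
  expr <- substitute(FUN)
  env <- parent.frame()
  sapply(X, function(.) eval(expr, list(. = .), env))
}

## Dotted expressions work:
wsapply(split(mtcars, mtcars$cyl), summary(lm(mpg ~ wt, .))$r.squared)

## Closures capture '.', as in the example above:
l <- wsapply(1:3, (function(x) . + x))
l[[1]](3)  # 4
l[[2]](3)  # 5

## ...but a plain function name is evaluated, not applied, so each element
## of the result is the function toupper itself rather than "A" or "B":
wsapply(c("a", "b"), toupper)
```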
When you feel compelled to use substitute() you should hear alarm 
bells that something is wrong ;). You can certainly write a new 
function that uses a different syntax (and I'm sure som