Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Bill Dunlap
When I am debugging a function with code like
x <- f1(x)
x <- f2(x)
result <- f3(x)
I will often slip a line like '.GlobalEnv$tmp1 <- x' between the first two
lines and '.GlobalEnv$tmp2 <- x' between the last two lines and look at the
intermediate results, 'tmp1' and 'tmp2' in the global environment, later to
see what is going on.

The equivalent expression using pipes is
   x |>
      f1() |>
      f2() |>
      f3() -> result
You can slip lines like 'print() |>' between the pipe parts because
print(x) returns x, but it is more tedious to add assignment lines.  One
could define a function like
   pipe_save <- function(x, name, envir=.GlobalEnv) {
      envir[[name]] <- x
      x
   }
and then put lines like 'pipe_save("tmp1") |>' into the pipe sequence to
save intermediate results.
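
A minimal sketch of how pipe_save() might be used, assuming R 4.1 or later for the |> syntax.  The input vector, the names "tmp1" and "tmp2", and the environment called scratch are only illustrative; scratch stands in for .GlobalEnv so the example leaves no stray variables behind.

```r
# pipe_save() as defined above: store the piped value, then pass it on.
pipe_save <- function(x, name, envir = .GlobalEnv) {
  envir[[name]] <- x   # stash the intermediate value under `name`
  x                    # return the input unchanged so the pipeline continues
}

scratch <- new.env()   # illustrative stand-in for .GlobalEnv

result <- c(4, 9, 16) |>
  sqrt() |>
  pipe_save("tmp1", envir = scratch) |>
  rev() |>
  pipe_save("tmp2", envir = scratch) |>
  sum()

scratch$tmp1   # the value that left sqrt()
scratch$tmp2   # the value that left rev()
```

Because pipe_save() returns its input, the saving lines can be deleted later without touching the rest of the pipeline.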

A function like
   pipe_eval <- function(x, expr) {
      eval(substitute(expr), list(x=x))
      x
   }
would make it easy to call plot() or summary(), etc., on the piped data
with lines like
   'pipe_eval(print(summary(x))) |>'
inserted into the pipe sequence.

E.g.,

> 1/(1:10) |>
+    pipe_eval(print(summary(x))) |>
+    range() |>
+    pipe_eval(print(x)) |>
+    sum()
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1000  0.1295  0.1833  0.2929  0.3125  1.0000
[1] 0.1 1.0
[1] 1.1

You could even add if(isTRUE(getOption("debug"))) before the eval() or
assignment, so that these helpers do nothing unless the option is set,
which makes it easy to turn debugging on
and off with options(debug=TRUE/FALSE).
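
A sketch of that toggle applied to pipe_eval(), assuming R 4.1 or later.  The option name "debug" is the one suggested above; the parentheses around 1 / (1:10) are an addition here to make the grouping explicit, since in released versions of R the |> pipe binds more tightly than /.

```r
pipe_eval <- function(x, expr) {
  # Only run the side-effect expression when options(debug = TRUE) is set.
  if (isTRUE(getOption("debug"))) {
    eval(substitute(expr), list(x = x))
  }
  x   # always pass the data through unchanged
}

options(debug = TRUE)    # the summary is printed as a side effect
total <- (1 / (1:10)) |>
  pipe_eval(print(summary(x))) |>
  range() |>
  sum()

options(debug = FALSE)   # same pipeline, but pipe_eval() is now silent
quiet_total <- (1 / (1:10)) |>
  pipe_eval(print(summary(x))) |>
  range() |>
  sum()
```

Both runs compute the same value; only the printing differs.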

-Bill


On Wed, Dec 9, 2020 at 1:58 PM Timothy Goodman wrote:
>
> On Wed, Dec 9, 2020 at 1:03 PM Duncan Murdoch wrote:
>
> > > Then I could run any number of lines with pipes at the
> > > start and no special character at the end, and have it treated as a
> > > single pipeline.  I suppose that'd need to be a feature offered by the
> > > environment (RStudio's RNotebooks in my case).  I could wrap my
> > > pipelines in parentheses (to make the "pipes at start of line" syntax
> > > valid R code), and then could use the hypothetical "submit selected code
> > > ignoring line-breaks" feature when running just the first part of the
> > > pipeline -- i.e., selecting full lines, but starting after the opening
> > > paren so as not to need to insert a closing paren.
> >
> > I think I don't understand your workflow enough to comment on this.
> >
> > Duncan
> >
> >
> >
> What I mean is, I could add parentheses as suggested to let me put the
> pipes at the start of the line, like this:
>
> (  # Line 1
> my_data_frame  # Line 2
> |> filter(some_condition)  # Line 3
> |> group_by(some_column)   # Line 4
> |> summarize(some_functions)   # Line 5
> )  # Line 6
>
> If this gives me an unexpected result, I might want to re-run just up
> through line 3 and check the output, to see if something is wrong with the
> "filter" (e.g., my condition matched less data than expected).  Ideally, I
> could do this without changing the code, by just selecting lines 2 and 3
> and pressing Ctrl+Enter (my environment's shortcut for "run selected
> code").  But it wouldn't work, because without including the parentheses
> these lines would be treated as two separate expressions, the second of
> which is invalid since it starts with a pipe.  Alternatively, I could
> include line 1 in my selection (along with lines 2 and 3), but it wouldn't
> work without having to type a new closing parenthesis after line 3, and
> then delete it afterwards.  Or, I could select and comment out lines 4 and
> 5, and then select and run all 6 lines.  But none of those are as
> convenient as just being able to select and run lines 2 and 3 (which is
> what I'm used to being able to do in several other languages which support
> pipelines).  And though it may seem a minor annoyance, when I'm working a
> lot with dplyr code I find myself wanting to do something like this many
> times per day.
>
> What *would* work well would be if I could write the code as above, but
> then when I want to select and re-run just lines 2 and 3, I would use some
> keyboard shortcut that meant "pass this code to the parser as a single
> line, with line breaks (and comments) removed".  Then it would be run like
> my_data_frame |> filter(some_condition)
> instead of producing an error.  That'd require the environment I'm using --
> RStudio -- to support this feature, but wouldn't require any change to how
> R is parsed.  From the replies here, I'm coming around to thinking that'd
> be the better option.
>
> - Tim
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Timothy Goodman
I'm thrilled to hear it!  Thank you!

- Tim

P.S. I re-added the r-devel list, since Kevin's reply was sent just to me,
but I thought there might be others interested in knowing about those work
items.  (I hope that's OK, email-etiquette-wise.)

On Wed, Dec 9, 2020 at 1:10 PM Kevin Ushey  wrote:

> You might be surprised to learn that the RStudio IDE engineers might
> be receptive to such a feature request. :-)
>
> https://github.com/rstudio/rstudio/issues/8589
> https://github.com/rstudio/rstudio/issues/8590
>
> (Spoiler alert: I am one of the RStudio IDE engineers, and I think
> this would be worth doing.)
>
> Best,
> Kevin
>
> On Wed, Dec 9, 2020 at 12:16 PM Timothy Goodman wrote:
> >
> > Since my larger concern is being able to conveniently select and re-run
> > part of a multiline pipeline, I don't think wrapping in parentheses will
> > help.  I'd have to add a closing paren at the end of the selection, which
> > is no more convenient than having to highlight all but the last pipe.
> > (Admittedly, wrapping in parens would allow my preferred syntax of having
> > pipes at the start of the line, but I don't think that's worth the cost of
> > having to constantly move the trailing paren around.)
> >
> > My back-up plan if I fail to persuade you all is indeed to beg the
> > developers of RStudio to add an option to do the transformation I would
> > want when executing notebook code, but I'm anticipating the objection of "R
> > Notebooks shouldn't transform invalid R code into valid R code."  I was
> > hoping "Let's make this new pipe |> work differently in a case that's
> > currently an error" would be an easier sell.
> >
> > Also, just to reiterate: Only one of my two suggestions really requires
> > caring about newlines.  (That's my preferred solution, but I understand
> > it'd be the bigger change.)  The other suggestion just amounts to ignoring
> > a final |> when code is submitted for execution.
> >
> >  -Tim
> >
> > On Wed, Dec 9, 2020 at 11:58 AM Kevin Ushey wrote:
> >>
> >> I agree with Duncan that the right solution is to wrap the pipe
> >> expression with parentheses. Having the parser treat newlines
> >> differently based on whether the session is interactive, or on what
> >> type of operator happens to follow a newline, feels like a pretty big
> >> can of worms.
> >>
> >> I think this (or something similar) would accomplish what you want
> >> while still retaining the nice aesthetics of the pipe expression, with
> >> a minimal amount of syntax "noise":
> >>
> >> result <- (
> >>   data
> >> |> op1()
> >> |> op2()
> >> )
> >>
> >> For interactive sessions where you wanted to execute only parts of the
> >> pipeline at a time, I could see that being accomplished by the editor
> >> -- it could transform the expression so that it could be handled by R,
> >> either by hoisting the pipe operator(s) up a line, or by wrapping the
> >> to-be-executed expression in parentheses for you. If such a style of
> >> coding became popular enough, I'm sure the developers of such editors
> >> would be interested and willing to support this ...
> >>
> >> Perhaps more importantly, it would be much easier to accomplish than a
> >> change to the behavior of the R parser, and it would be work that
> >> wouldn't have to be maintained by the R Core team.
> >>
> >> Best,
> >> Kevin
> >>
> >> On Wed, Dec 9, 2020 at 11:34 AM Timothy Goodman wrote:
> >> >
> >> > If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
> >> > command in the Notebook environment I'm using) I certainly *would* expect R
> >> > to treat it as a complete statement.
> >> >
> >> > But what I'm talking about is a different case, where I highlight a
> >> > multi-line statement in my notebook:
> >> >
> >> > my_data_frame1
> >> > |> filter(some_conditions_1)
> >> >
> >> > and then press Ctrl+Enter.  Or, I suppose the equivalent would be to run an
> >> > R script containing those two lines of code, or to run a multi-line
> >> > statement like that from the console (which in RStudio I can do by pressing
> >> > Shift+Enter between the lines.)
> >> >
> >> > In those cases, R could either (1) Give an error message [the current
> >> > behavior], or (2) understand that the first line is meant to be piped to
> >> > the second.  The second option would be significantly more useful, and is
> >> > almost certainly what the user intended.
> >> >
> >> > (For what it's worth, there are some languages, such as Javascript, that
> >> > consider the first token of the next line when determining if the previous
> >> > line was complete.  JavaScript's rules around this are overly complicated,
> >> > but a rule like "a pipe following a line break is treated as continuing the
> >> > previous line" would be much simpler.  And while it might be objectionable
> >> > to treat the operator %>% different from other operators, the addition of
> >> > |>, which isn't truly an operator at all, seems like the right time to
> >> > consider it.)
> 

Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Timothy Goodman
On Wed, Dec 9, 2020 at 1:03 PM Duncan Murdoch wrote:

> > Then I could run any number of lines with pipes at the
> > start and no special character at the end, and have it treated as a
> > single pipeline.  I suppose that'd need to be a feature offered by the
> > environment (RStudio's RNotebooks in my case).  I could wrap my
> > pipelines in parentheses (to make the "pipes at start of line" syntax
> > valid R code), and then could use the hypothetical "submit selected code
> > ignoring line-breaks" feature when running just the first part of the
> > pipeline -- i.e., selecting full lines, but starting after the opening
> > paren so as not to need to insert a closing paren.
>
> I think I don't understand your workflow enough to comment on this.
>
> Duncan
>
>
>
What I mean is, I could add parentheses as suggested to let me put the
pipes at the start of the line, like this:

(  # Line 1
my_data_frame  # Line 2
|> filter(some_condition)  # Line 3
|> group_by(some_column)   # Line 4
|> summarize(some_functions)   # Line 5
)  # Line 6

If this gives me an unexpected result, I might want to re-run just up
through line 3 and check the output, to see if something is wrong with the
"filter" (e.g., my condition matched less data than expected).  Ideally, I
could do this without changing the code, by just selecting lines 2 and 3
and pressing Ctrl+Enter (my environment's shortcut for "run selected
code").  But it wouldn't work, because without including the parentheses
these lines would be treated as two separate expressions, the second of
which is invalid since it starts with a pipe.  Alternatively, I could
include line 1 in my selection (along with lines 2 and 3), but it wouldn't
work without having to type a new closing parenthesis after line 3, and
then delete it afterwards.  Or, I could select and comment out lines 4 and
5, and then select and run all 6 lines.  But none of those are as
convenient as just being able to select and run lines 2 and 3 (which is
what I'm used to being able to do in several other languages which support
pipelines).  And though it may seem a minor annoyance, when I'm working a
lot with dplyr code I find myself wanting to do something like this many
times per day.

What *would* work well would be if I could write the code as above, but
then when I want to select and re-run just lines 2 and 3, I would use some
keyboard shortcut that meant "pass this code to the parser as a single
line, with line breaks (and comments) removed".  Then it would be run like
my_data_frame |> filter(some_condition)
instead of producing an error.  That'd require the environment I'm using --
RStudio -- to support this feature, but wouldn't require any change to how
R is parsed.  From the replies here, I'm coming around to thinking that'd
be the better option.
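
To make the idea concrete, here is a rough sketch in R of what such a "run as a single line" command might do with the selected text, assuming R 4.1 or later.  The function name run_as_single_line() is made up for illustration and is not an RStudio feature; the comment stripping is deliberately naive and would break on a '#' inside a string literal; and base R functions stand in for the dplyr verbs.

```r
# Hypothetical helper: strip comments, join the lines, parse and evaluate.
run_as_single_line <- function(lines, envir = parent.frame()) {
  no_comments <- sub("#.*$", "", lines)   # naive: assumes no '#' inside strings
  one_line <- paste(trimws(no_comments), collapse = " ")
  eval(parse(text = one_line), envir = envir)
}

# "Lines 2 and 3" of a pipeline written with the pipes at the start of the line
selected <- c(
  "c(3, 1, 2)   # Line 2",
  "|> sort()    # Line 3"
)

run_as_single_line(selected)   # evaluated as: c(3, 1, 2) |> sort()
```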

- Tim


Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Gregory Warnes
Many languages allow a final backslash (“\”) character to allow an
expression to span multiple lines, and I’ve often wished for this in R,
particularly to allow me to put  `else` on a separate line at the
top-level. It would also allow alignment of infix operators like the new
pipe operator `|>` at the start of a line, which I would heartily endorse.

On Wed, Dec 9, 2020 at 3:58 PM Ben Bolker  wrote:

> Definitely support the idea that if this kind of trickery is going to
> happen that it be confined to some particular IDE/environment or some
> particular submission protocol. I don't want it to happen in my ESS
> session please ... I'd rather deal with the parentheses.
>
> On 12/9/20 3:45 PM, Timothy Goodman wrote:
> > Regarding special treatment for |>, isn't it getting special treatment
> > anyway, because it's implemented as a syntax transformation from x |> f(y)
> > to f(x, y), rather than as an operator?
> >
> > That said, the point about wanting a block of code submitted line-by-line
> > to work the same as a block of code submitted all at once is a fair one.
> > Maybe the better solution would be if there were a way to say "Submit the
> > selected code as a single expression, ignoring line-breaks".  Then I could
> > run any number of lines with pipes at the start and no special character at
> > the end, and have it treated as a single pipeline.  I suppose that'd need
> > to be a feature offered by the environment (RStudio's RNotebooks in my
> > case).  I could wrap my pipelines in parentheses (to make the "pipes at
> > start of line" syntax valid R code), and then could use the hypothetical
> > "submit selected code ignoring line-breaks" feature when running just the
> > first part of the pipeline -- i.e., selecting full lines, but starting
> > after the opening paren so as not to need to insert a closing paren.
> >
> > - Tim
> >
> > On Wed, Dec 9, 2020 at 12:12 PM Duncan Murdoch wrote:
> >
> >> On 09/12/2020 2:33 p.m., Timothy Goodman wrote:
> >>> If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
> >>> command in the Notebook environment I'm using) I certainly *would*
> >>> expect R to treat it as a complete statement.
> >>>
> >>> But what I'm talking about is a different case, where I highlight a
> >>> multi-line statement in my notebook:
> >>>
> >>>   my_data_frame1
> >>>   |> filter(some_conditions_1)
> >>>
> >>> and then press Ctrl+Enter.
> >>
> >> I don't think I'd like it if parsing changed between passing one line at
> >> a time and passing a block of lines.  I'd like to be able to highlight a
> >> few lines and pass those, then type one, then highlight some more and
> >> pass those:  and have it act as though I just passed the whole combined
> >> block, or typed everything one line at a time.
> >>
> >>
> >>> Or, I suppose the equivalent would be to run
> >>> an R script containing those two lines of code, or to run a multi-line
> >>> statement like that from the console (which in RStudio I can do by
> >>> pressing Shift+Enter between the lines.)
> >>>
> >>> In those cases, R could either (1) Give an error message [the current
> >>> behavior], or (2) understand that the first line is meant to be piped to
> >>> the second.  The second option would be significantly more useful, and
> >>> is almost certainly what the user intended.
> >>>
> >>> (For what it's worth, there are some languages, such as Javascript, that
> >>> consider the first token of the next line when determining if the
> >>> previous line was complete.  JavaScript's rules around this are overly
> >>> complicated, but a rule like "a pipe following a line break is treated
> >>> as continuing the previous line" would be much simpler.  And while it
> >>> might be objectionable to treat the operator %>% different from other
> >>> operators, the addition of |>, which isn't truly an operator at all,
> >>> seems like the right time to consider it.)
> >>
> >> I think this would be hard to implement with R's current parser, but
> >> possible.  I think it could be done by distinguishing between EOL
> >> markers within a block of text and "end of block" marks.  If it applied
> >> only to the |> operator it would be *really* ugly.
> >>
> >> My strongest objection to it is the one at the top, though.  If I have a
> >> block of lines sitting in my editor that I just finished executing, with
> >> the cursor pointing at the next line, I'd like to know that it didn't
> >> matter whether the lines were passed one at a time, as a block, or some
> >> combination of those.
> >>
> >> Duncan Murdoch
> >>
> >>>
> >>> -Tim
> >>>
> >>> On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch
> >>> <murdoch.dun...@gmail.com> wrote:
> >>>
> >>>  The requirement for operators at the end of the line comes from the
> >>>  interactive nature of R.  If you type
> >>>
> >>>    my_data_frame_1
> >>>
> >>>  how could R know that you are not done, a

Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Duncan Murdoch

On 09/12/2020 3:45 p.m., Timothy Goodman wrote:
> Regarding special treatment for |>, isn't it getting special treatment
> anyway, because it's implemented as a syntax transformation from x |>
> f(y) to f(x, y), rather than as an operator?

That's different.  Currently |> is parsed just like any other binary
operator; it's the code emitted after parsing that is different from
most other cases.  I think your suggestion would need changes in the 
parsing itself.
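
A small illustration of that distinction, assuming R 4.1 or later, where the native pipe is available.  Quoting a pipe expression shows what the parser emits; the symbols x, y and f are never evaluated, so they need not exist.

```r
# The parser rewrites the pipe into an ordinary call: no `|>` function remains.
piped <- quote(x |> f(y))
print(piped)

# An ordinary binary operator, by contrast, stays a call to that operator.
plus_call <- quote(x + y)
print(plus_call[[1]])   # the function being called is the symbol `+`

same_as_direct_call <- identical(piped, quote(f(x, y)))
same_as_direct_call
```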


It's a few years since I worked with Bison (the parser generator that R 
uses), but I recall that handling inconsistencies was always tricky.



> That said, the point about wanting a block of code submitted
> line-by-line to work the same as a block of code submitted all at once
> is a fair one.  Maybe the better solution would be if there were a way
> to say "Submit the selected code as a single expression, ignoring
> line-breaks".


The way to do that is to replace some of the line breaks with 
semicolons, which act as statement separators.  The tricky bit is to 
figure out which ones to replace.  So if your block is


  x +
  y
  z

you'd glue it together as "x + y; z".  RStudio appears to know enough 
about R parsing to do that, and presumably if it was allowed to look at 
the start of the next line could handle things like


  x
  |> f()
  z

and rewrite them as "x |> f(); z".  It would mess up debugging a little 
(z is now on line 1, not line 3), but maybe it could undo the 
transformation if R told it there was a problem at line 1, column 11.
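
A rough sketch of that gluing step, written in R for illustration only, assuming R 4.1 or later.  The name glue_block() is hypothetical and this is not how RStudio does it.  It uses parse() to decide whether the text collected so far is a complete statement, peeks at the next line for a leading |>, and assumes the lines contain no comments.

```r
# TRUE if `code` parses as one or more complete statements.
is_complete <- function(code) {
  tryCatch({ parse(text = code); TRUE }, error = function(e) FALSE)
}

# Hypothetical helper: glue continuation lines together, and separate
# complete statements with semicolons.
glue_block <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  statements <- character()
  current <- ""
  for (i in seq_along(lines)) {
    current <- if (nzchar(current)) paste(current, lines[i]) else lines[i]
    next_is_pipe <- i < length(lines) && startsWith(lines[i + 1], "|>")
    if (is_complete(current) && !next_is_pipe) {
      statements <- c(statements, current)   # statement finished
      current <- ""
    }
  }
  if (nzchar(current)) statements <- c(statements, current)  # leftover text
  paste(statements, collapse = "; ")
}

glue_block(c("x +", "y", "z"))      # trailing operator: "x + y; z"
glue_block(c("x", "|> f()", "z"))   # leading pipe:      "x |> f(); z"
```

In the second result z does land at column 11 of line 1, which is the position an editor would have to map back to line 3 of the original block.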



> Then I could run any number of lines with pipes at the
> start and no special character at the end, and have it treated as a
> single pipeline.  I suppose that'd need to be a feature offered by the
> environment (RStudio's RNotebooks in my case).  I could wrap my
> pipelines in parentheses (to make the "pipes at start of line" syntax
> valid R code), and then could use the hypothetical "submit selected code
> ignoring line-breaks" feature when running just the first part of the
> pipeline -- i.e., selecting full lines, but starting after the opening
> paren so as not to need to insert a closing paren.


I think I don't understand your workflow enough to comment on this.

Duncan




> - Tim
>
> On Wed, Dec 9, 2020 at 12:12 PM Duncan Murdoch wrote:
>
>> On 09/12/2020 2:33 p.m., Timothy Goodman wrote:
>>> If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
>>> command in the Notebook environment I'm using) I certainly *would*
>>> expect R to treat it as a complete statement.
>>>
>>> But what I'm talking about is a different case, where I highlight a
>>> multi-line statement in my notebook:
>>>
>>>      my_data_frame1
>>>          |> filter(some_conditions_1)
>>>
>>> and then press Ctrl+Enter.
>>
>> I don't think I'd like it if parsing changed between passing one line at
>> a time and passing a block of lines.  I'd like to be able to highlight a
>> few lines and pass those, then type one, then highlight some more and
>> pass those:  and have it act as though I just passed the whole combined
>> block, or typed everything one line at a time.
>>
>>> Or, I suppose the equivalent would be to run
>>> an R script containing those two lines of code, or to run a multi-line
>>> statement like that from the console (which in RStudio I can do by
>>> pressing Shift+Enter between the lines.)
>>>
>>> In those cases, R could either (1) Give an error message [the current
>>> behavior], or (2) understand that the first line is meant to be piped to
>>> the second.  The second option would be significantly more useful, and
>>> is almost certainly what the user intended.
>>>
>>> (For what it's worth, there are some languages, such as Javascript, that
>>> consider the first token of the next line when determining if the
>>> previous line was complete.  JavaScript's rules around this are overly
>>> complicated, but a rule like "a pipe following a line break is treated
>>> as continuing the previous line" would be much simpler.  And while it
>>> might be objectionable to treat the operator %>% different from other
>>> operators, the addition of |>, which isn't truly an operator at all,
>>> seems like the right time to consider it.)
>>
>> I think this would be hard to implement with R's current parser, but
>> possible.  I think it could be done by distinguishing between EOL
>> markers within a block of text and "end of block" marks.  If it applied
>> only to the |> operator it would be *really* ugly.
>>
>> My strongest objection to it is the one at the top, though.  If I have a
>> block of lines sitting in my editor that I just finished executing, with
>> the cursor pointing at the next line, I'd like to know that it didn't
>> matter whether the lines were passed one at a time, as a block, or some
>> combination of those.
>>
>> Duncan Murdoch

Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Ben Bolker
  Definitely support the idea that if this kind of trickery is going to 
happen that it be confined to some particular IDE/environment or some 
particular submission protocol. I don't want it to happen in my ESS 
session please ... I'd rather deal with the parentheses.


On 12/9/20 3:45 PM, Timothy Goodman wrote:
> Regarding special treatment for |>, isn't it getting special treatment
> anyway, because it's implemented as a syntax transformation from x |> f(y)
> to f(x, y), rather than as an operator?
>
> That said, the point about wanting a block of code submitted line-by-line
> to work the same as a block of code submitted all at once is a fair one.
> Maybe the better solution would be if there were a way to say "Submit the
> selected code as a single expression, ignoring line-breaks".  Then I could
> run any number of lines with pipes at the start and no special character at
> the end, and have it treated as a single pipeline.  I suppose that'd need
> to be a feature offered by the environment (RStudio's RNotebooks in my
> case).  I could wrap my pipelines in parentheses (to make the "pipes at
> start of line" syntax valid R code), and then could use the hypothetical
> "submit selected code ignoring line-breaks" feature when running just the
> first part of the pipeline -- i.e., selecting full lines, but starting
> after the opening paren so as not to need to insert a closing paren.
>
> - Tim
>
> On Wed, Dec 9, 2020 at 12:12 PM Duncan Murdoch wrote:
>
>> On 09/12/2020 2:33 p.m., Timothy Goodman wrote:
>>> If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
>>> command in the Notebook environment I'm using) I certainly *would*
>>> expect R to treat it as a complete statement.
>>>
>>> But what I'm talking about is a different case, where I highlight a
>>> multi-line statement in my notebook:
>>>
>>>   my_data_frame1
>>>   |> filter(some_conditions_1)
>>>
>>> and then press Ctrl+Enter.
>>
>> I don't think I'd like it if parsing changed between passing one line at
>> a time and passing a block of lines.  I'd like to be able to highlight a
>> few lines and pass those, then type one, then highlight some more and
>> pass those:  and have it act as though I just passed the whole combined
>> block, or typed everything one line at a time.
>>
>>> Or, I suppose the equivalent would be to run
>>> an R script containing those two lines of code, or to run a multi-line
>>> statement like that from the console (which in RStudio I can do by
>>> pressing Shift+Enter between the lines.)
>>>
>>> In those cases, R could either (1) Give an error message [the current
>>> behavior], or (2) understand that the first line is meant to be piped to
>>> the second.  The second option would be significantly more useful, and
>>> is almost certainly what the user intended.
>>>
>>> (For what it's worth, there are some languages, such as Javascript, that
>>> consider the first token of the next line when determining if the
>>> previous line was complete.  JavaScript's rules around this are overly
>>> complicated, but a rule like "a pipe following a line break is treated
>>> as continuing the previous line" would be much simpler.  And while it
>>> might be objectionable to treat the operator %>% different from other
>>> operators, the addition of |>, which isn't truly an operator at all,
>>> seems like the right time to consider it.)
>>
>> I think this would be hard to implement with R's current parser, but
>> possible.  I think it could be done by distinguishing between EOL
>> markers within a block of text and "end of block" marks.  If it applied
>> only to the |> operator it would be *really* ugly.
>>
>> My strongest objection to it is the one at the top, though.  If I have a
>> block of lines sitting in my editor that I just finished executing, with
>> the cursor pointing at the next line, I'd like to know that it didn't
>> matter whether the lines were passed one at a time, as a block, or some
>> combination of those.
>>
>> Duncan Murdoch
>>
>>> -Tim
>>>
>>> On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch
>>> <murdoch.dun...@gmail.com> wrote:
>>>
>>>  The requirement for operators at the end of the line comes from the
>>>  interactive nature of R.  If you type
>>>
>>>    my_data_frame_1
>>>
>>>  how could R know that you are not done, and are planning to type the
>>>  rest of the expression
>>>
>>>  %>% filter(some_conditions_1)
>>>  ...
>>>
>>>  before it should consider the expression complete?  The way languages
>>>  like C do this is by requiring a statement terminator at the end.  You
>>>  can also do it by wrapping the entire thing in parentheses ().
>>>
>>>  However, be careful: Don't use braces:  they don't work.  And parens
>>>  have the side effect of removing invisibility from the result (which is
>>>  a design flaw or bonus, depending on your point of view).  So I
>>>  actually wouldn't advise this workaround.
>>>
>>>  Duncan Murdoch
>>>
>>>
>>>  On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
>>>   > Hi,
>>>   >
>>>   > I'm a data scientist who routinely uses R in my day-to-day work, for tasks
>>>   > such as cleaning and transforming data, exploratory data analysis, etc.
>>>   >

Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Timothy Goodman
Regarding special treatment for |>, isn't it getting special treatment
anyway, because it's implemented as a syntax transformation from x |> f(y)
to f(x, y), rather than as an operator?

That said, the point about wanting a block of code submitted line-by-line
to work the same as a block of code submitted all at once is a fair one.
Maybe the better solution would be if there were a way to say "Submit the
selected code as a single expression, ignoring line-breaks".  Then I could
run any number of lines with pipes at the start and no special character at
the end, and have it treated as a single pipeline.  I suppose that'd need
to be a feature offered by the environment (RStudio's RNotebooks in my
case).  I could wrap my pipelines in parentheses (to make the "pipes at
start of line" syntax valid R code), and then could use the hypothetical
"submit selected code ignoring line-breaks" feature when running just the
first part of the pipeline -- i.e., selecting full lines, but starting
after the opening paren so as not to need to insert a closing paren.

- Tim

On Wed, Dec 9, 2020 at 12:12 PM Duncan Murdoch wrote:

> On 09/12/2020 2:33 p.m., Timothy Goodman wrote:
> > If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
> > command in the Notebook environment I'm using) I certainly *would*
> > expect R to treat it as a complete statement.
> >
> > But what I'm talking about is a different case, where I highlight a
> > multi-line statement in my notebook:
> >
> >  my_data_frame1
> >  |> filter(some_conditions_1)
> >
> > and then press Ctrl+Enter.
>
> I don't think I'd like it if parsing changed between passing one line at
> a time and passing a block of lines.  I'd like to be able to highlight a
> few lines and pass those, then type one, then highlight some more and
> pass those:  and have it act as though I just passed the whole combined
> block, or typed everything one line at a time.
>
>
> > Or, I suppose the equivalent would be to run
> > an R script containing those two lines of code, or to run a multi-line
> > statement like that from the console (which in RStudio I can do by
> > pressing Shift+Enter between the lines.)
> >
> > In those cases, R could either (1) Give an error message [the current
> > behavior], or (2) understand that the first line is meant to be piped to
> > the second.  The second option would be significantly more useful, and
> > is almost certainly what the user intended.
> >
> > (For what it's worth, there are some languages, such as Javascript, that
> > consider the first token of the next line when determining if the
> > previous line was complete.  JavaScript's rules around this are overly
> > complicated, but a rule like "a pipe following a line break is treated
> > as continuing the previous line" would be much simpler.  And while it
> > might be objectionable to treat the operator %>% different from other
> > operators, the addition of |>, which isn't truly an operator at all,
> > seems like the right time to consider it.)
>
> I think this would be hard to implement with R's current parser, but
> possible.  I think it could be done by distinguishing between EOL
> markers within a block of text and "end of block" marks.  If it applied
> only to the |> operator it would be *really* ugly.
>
> My strongest objection to it is the one at the top, though.  If I have a
> block of lines sitting in my editor that I just finished executing, with
> the cursor pointing at the next line, I'd like to know that it didn't
> matter whether the lines were passed one at a time, as a block, or some
> combination of those.
>
> Duncan Murdoch
>
> >
> > -Tim
> >
> > On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch  > > wrote:
> >
> > The requirement for operators at the end of the line comes from the
> > interactive nature of R.  If you type
> >
> >   my_data_frame_1
> >
> > how could R know that you are not done, and are planning to type the
> > rest of the expression
> >
> > %>% filter(some_conditions_1)
> > ...
> >
> > before it should consider the expression complete?  The way languages
> > like C do this is by requiring a statement terminator at the end.  You
> > can also do it by wrapping the entire thing in parentheses ().
> >
> > However, be careful: Don't use braces:  they don't work.  And parens
> > have the side effect of removing invisibility from the result (which is
> > a design flaw or bonus, depending on your point of view).  So I actually
> > wouldn't advise this workaround.
> >
> > Duncan Murdoch
> >
> >
> > On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
> >  > Hi,
> >  >
> >  > I'm a data scientist who routinely uses R in my day-to-day work, for tasks
> >  > such as cleaning and transforming data, exploratory data analysis, etc.
> >  > This includes frequent use of the pipe operator from the ma

Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Ben Bolker

  FWIW there is previous discussion of this in a Twitter thread from May:

https://twitter.com/bolkerb/status/1258542150620332039

At the end I suggested defining something like .__END <- identity as a
pipe-ender.
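
A minimal sketch of the pipe-ender idea, assuming R 4.1 or later for the native pipe. The name .__END is taken from the message above; the data and the steps are made up for illustration. Because every real step ends with a pipe, any of them (including the last) can be commented out without leaving a dangling operator.

```r
# Pipe-ender: an identity function whose only job is to terminate the pipeline.
.__END <- identity

result <- c(3, 1, 2) |>
  sort() |>
  rev() |>
  # cumsum() |>   # a disabled step; the pipeline is still complete
  .__END()

print(result)
```

The cost is one extra function call per pipeline and an unusual-looking last line.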


On 12/9/20 2:58 PM, Kevin Ushey wrote:

I agree with Duncan that the right solution is to wrap the pipe
expression with parentheses. Having the parser treat newlines
differently based on whether the session is interactive, or on what
type of operator happens to follow a newline, feels like a pretty big
can of worms.

I think this (or something similar) would accomplish what you want
while still retaining the nice aesthetics of the pipe expression, with
a minimal amount of syntax "noise":

result <- (
   data
 |> op1()
 |> op2()
)

For interactive sessions where you wanted to execute only parts of the
pipeline at a time, I could see that being accomplished by the editor
-- it could transform the expression so that it could be handled by R,
either by hoisting the pipe operator(s) up a line, or by wrapping the
to-be-executed expression in parentheses for you. If such a style of
coding became popular enough, I'm sure the developers of such editors
would be interested and willing to support this ...

Perhaps more importantly, it would be much easier to accomplish than a
change to the behavior of the R parser, and it would be work that
wouldn't have to be maintained by the R Core team.

Best,
Kevin

On Wed, Dec 9, 2020 at 11:34 AM Timothy Goodman  wrote:


If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
command in the Notebook environment I'm using) I certainly *would* expect R
to treat it as a complete statement.

But what I'm talking about is a different case, where I highlight a
multi-line statement in my notebook:

 my_data_frame1
 |> filter(some_conditions_1)

and then press Ctrl+Enter.  Or, I suppose the equivalent would be to run an
R script containing those two lines of code, or to run a multi-line
statement like that from the console (which in RStudio I can do by pressing
Shift+Enter between the lines.)

In those cases, R could either (1) Give an error message [the current
behavior], or (2) understand that the first line is meant to be piped to
the second.  The second option would be significantly more useful, and is
almost certainly what the user intended.

(For what it's worth, there are some languages, such as Javascript, that
consider the first token of the next line when determining if the previous
line was complete.  JavaScript's rules around this are overly complicated,
but a rule like "a pipe following a line break is treated as continuing the
previous line" would be much simpler.  And while it might be objectionable
to treat the operator %>% different from other operators, the addition of
|>, which isn't truly an operator at all, seems like the right time to
consider it.)

-Tim

On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch 
wrote:


The requirement for operators at the end of the line comes from the
interactive nature of R.  If you type

  my_data_frame_1

how could R know that you are not done, and are planning to type the
rest of the expression

%>% filter(some_conditions_1)
...

before it should consider the expression complete?  The way languages
like C do this is by requiring a statement terminator at the end.  You
can also do it by wrapping the entire thing in parentheses ().

However, be careful: Don't use braces:  they don't work.  And parens
have the side effect of removing invisibility from the result (which is
a design flaw or bonus, depending on your point of view).  So I actually
wouldn't advise this workaround.

Duncan Murdoch


On 09/12/2020 12:45 a.m., Timothy Goodman wrote:

Hi,

I'm a data scientist who routinely uses R in my day-to-day work, for tasks
such as cleaning and transforming data, exploratory data analysis, etc.
This includes frequent use of the pipe operator from the magrittr and dplyr
libraries, %>%.  So, I was pleased to hear about the recent work on a
native pipe operator, |>.

This seems like a good time to bring up the main pain point I encounter
when using pipes in R, and some suggestions on what could be done about
it.  The issue is that the pipe operator can't be placed at the start of a
line of code (except in parentheses).  That's no different than any binary
operator in R, but I find it's a source of difficulty for the pipe because
of how pipes are often used.

[I'm assuming here that my usage is fairly typical of a lot of users; at
any rate, I don't think I'm *too* unusual.]

=== Why this is a problem ===

It's very common (for me, and I suspect for many users of dplyr) to write
multi-step pipelines and put each step on its own line for readability.
Something like this:

### Example 1 ###
my_data_frame_1 %>%
  filter(some_conditions_1) %>%
  inner_join(my_data_frame_2, by = some_columns_1) %>%
  group_by(some_columns_2) %>%
  summarize(some

Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Duncan Murdoch

On 09/12/2020 2:33 p.m., Timothy Goodman wrote:
> If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
> command in the Notebook environment I'm using) I certainly *would*
> expect R to treat it as a complete statement.
>
> But what I'm talking about is a different case, where I highlight a
> multi-line statement in my notebook:
>
>      my_data_frame1
>          |> filter(some_conditions_1)
>
> and then press Ctrl+Enter.


I don't think I'd like it if parsing changed between passing one line at 
a time and passing a block of lines.  I'd like to be able to highlight a 
few lines and pass those, then type one, then highlight some more and 
pass those:  and have it act as though I just passed the whole combined 
block, or typed everything one line at a time.



> Or, I suppose the equivalent would be to run
> an R script containing those two lines of code, or to run a multi-line
> statement like that from the console (which in RStudio I can do by
> pressing Shift+Enter between the lines.)
>
> In those cases, R could either (1) Give an error message [the current
> behavior], or (2) understand that the first line is meant to be piped to
> the second.  The second option would be significantly more useful, and
> is almost certainly what the user intended.
>
> (For what it's worth, there are some languages, such as Javascript, that
> consider the first token of the next line when determining if the
> previous line was complete.  JavaScript's rules around this are overly
> complicated, but a rule like "a pipe following a line break is treated
> as continuing the previous line" would be much simpler.  And while it
> might be objectionable to treat the operator %>% different from other
> operators, the addition of |>, which isn't truly an operator at all,
> seems like the right time to consider it.)


I think this would be hard to implement with R's current parser, but 
possible.  I think it could be done by distinguishing between EOL 
markers within a block of text and "end of block" marks.  If it applied 
only to the |> operator it would be *really* ugly.


My strongest objection to it is the one at the top, though.  If I have a 
block of lines sitting in my editor that I just finished executing, with 
the cursor pointing at the next line, I'd like to know that it didn't 
matter whether the lines were passed one at a time, as a block, or some 
combination of those.
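
The current behavior under discussion can be checked with parse(). This is only a sketch, assuming R 4.1 or later so that |> is recognized; the helper name n_exprs is made up for illustration. It counts the top-level expressions found in a chunk of text, and returns NA when the text is not valid R.

```r
# Count the top-level expressions the parser finds in `code`,
# or return NA if `code` is not syntactically valid R.
n_exprs <- function(code) {
  tryCatch(length(parse(text = code)), error = function(e) NA_integer_)
}

trailing <- "x |>\n  sort()"   # pipe at the end of a line
leading  <- "x\n  |> sort()"   # pipe at the start of a line
plus     <- "1\n+ 2"           # another binary operator at the start of a line

n_exprs(trailing)  # one expression
n_exprs(leading)   # NA: `x` is complete, so the pipe is unexpected
n_exprs(plus)      # two expressions: `1` and the unary `+ 2`
```

The last case shows why a look-ahead rule could not simply be extended to every operator: a line starting with + is already valid on its own.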


Duncan Murdoch



-Tim

On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch wrote:


The requirement for operators at the end of the line comes from the
interactive nature of R.  If you type

      my_data_frame_1

how could R know that you are not done, and are planning to type the
rest of the expression

        %>% filter(some_conditions_1)
        ...

before it should consider the expression complete?  The way languages
like C do this is by requiring a statement terminator at the end.  You
can also do it by wrapping the entire thing in parentheses ().

However, be careful: Don't use braces:  they don't work.  And parens
have the side effect of removing invisibility from the result (which is
a design flaw or bonus, depending on your point of view).  So I actually
wouldn't advise this workaround.

Duncan Murdoch


On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
 > Hi,
 >
 > I'm a data scientist who routinely uses R in my day-to-day work, for tasks
 > such as cleaning and transforming data, exploratory data analysis, etc.
 > This includes frequent use of the pipe operator from the magrittr and dplyr
 > libraries, %>%.  So, I was pleased to hear about the recent work on a
 > native pipe operator, |>.
 >
 > This seems like a good time to bring up the main pain point I encounter
 > when using pipes in R, and some suggestions on what could be done about
 > it.  The issue is that the pipe operator can't be placed at the start of a
 > line of code (except in parentheses).  That's no different than any binary
 > operator in R, but I find it's a source of difficulty for the pipe because
 > of how pipes are often used.
 >
 > [I'm assuming here that my usage is fairly typical of a lot of users; at
 > any rate, I don't think I'm *too* unusual.]
 >
 > === Why this is a problem ===
 >
 > It's very common (for me, and I suspect for many users of dplyr) to write
 > multi-step pipelines and put each step on its own line for readability.
 > Something like this:
 >
 >    ### Example 1 ###
 >    my_data_frame_1 %>%
 >      filter(some_conditions_1) %>%
 >      inner_join(my_data_frame_2, by = some_columns_1) %>%
 >      group_by(some_columns_2) %>%
 >      summarize(some_aggregate_functions_1) %>%
 >      filter(some_conditions_2) %>%
 >      left_join(my_data_frame_3, by = some_columns_3) %>%
 >      group_by(

Re: [Rd] New pipe operator and gg plotz

2020-12-09 Thread Hadley Wickham
Another option is https://github.com/hadley/ggplot1 🤣
Hadley

On Wed, Dec 9, 2020 at 1:24 PM Duncan Murdoch  wrote:
>
> Looks like Sergio Oller took your ambitious approach:
> https://github.com/zeehio/ggpipe.  It hasn't been updated since 2017, so
> there may be some new things in ggplot2 that aren't there yet.
>
> Duncan Murdoch
>
> On 09/12/2020 2:16 p.m., Greg Snow wrote:
> > Since `+` is already a function we could do regular piping to change this
> > code:
> >
> > mtcars %>%
> >   ggplot(aes(x=wt, y=mpg)) +
> >   geom_point()
> >
> > to this:
> >
> > mtcars %>%
> >   ggplot(aes(x=wt, y=mpg)) %>%
> >   `+`(geom_point())
> >
> > Further we can write wrapper functions like:
> >
> > p_geom_point <- function(x,...) {
> >   x + geom_point(...)
> > }
> >
> > Then run the code like:
> >
> > mtcars %>%
> >   ggplot(aes(x=wt, y=mpg)) %>%
> >   p_geom_point()
> >
> > All three of the above give the same plot from what I can see, but I
> > have not tested it with very many options beyond the above.
> >
> > A really ambitious person could create a new package with wrappers for
> > all the ggplot2 functions that can come after the plus sign, then we
> > could use pipes for everything.  I don't know if there are any strange
> > circumstances that would make this cause problems (it probably will
> > slow things down slightly, but probably not enough for people to
> > notice).
> >
> > On Sun, Dec 6, 2020 at 7:18 PM Avi Gross via R-devel
> >  wrote:
> >>
> >> Thanks, Duncan. That answers my question fairly definitively.
> >>
> >> Although it can be DONE it likely won't be for the reasons Hadley 
> >> mentioned until we get some other product that replaces it entirely. There 
> >> are some interesting work-arounds mentioned.
> >>
> >> I was thinking of one that has overhead but might be a pain. Hadley 
> >> mentioned a slight variant. The first argument to a function now is 
> >> expected to be the data argument. The second might be the mapping. Now if 
> >> the function is called with a new first argument that is a ggplot object, 
> >> it could be possible to test the type and if it is a ggplot object then
> >> slide over carefully any additional matched arguments that were not 
> >> explicitly named. Not sure that is at all easy to do.
> >>
> >> Alternately, you can ask that when used in such a pipeline that the user 
> >> call all other arguments using names like data=whatever, 
> >> mapping=aes(whatever) so no other args need to be adjusted by position.
> >>
> >> But all this is academic and I concede will likely not be done. I can live 
> >> with the plus signs.
> >>
> >>
> >> -----Original Message-----
> >> From: Duncan Murdoch 
> >> Sent: Sunday, December 6, 2020 2:50 PM
> >> To: Avi Gross ; 'r-devel' 
> >> Subject: Re: [Rd] New pipe operator and gg plotz
> >>
> >> Hadley's answer (#7 here:
> >> https://community.rstudio.com/t/why-cant-ggplot2-use/4372) makes it pretty 
> >> clear that he thinks it would have been nice now if he had made that 
> >> choice when ggplot2 came out, but it's not worth the effort now to change 
> >> it.
> >>
> >> Duncan Murdoch
> >>
> >> On 06/12/2020 2:34 p.m., Avi Gross via R-devel wrote:
> >>> As someone who switches back and forth between using standard R methods 
> >>> and those of the tidyverse, depending on the problem, my mood and whether 
> >>> Jupiter aligns with Saturn in the new age of Aquarius, I have a question 
> >>> about the forthcoming built-in pipe. Will it motivate anyone to 
> >>> eventually change or enhance the ggplot functionality to have a version 
> >>> that gets rid of the odd use of the addition symbol?
> >>>
> >>> I mean I now sometimes have a pipeline that looks like:
> >>>
> >>> Data %>%
> >>>    Do_this %>%
> >>>    Do_that(whatever) %>%
> >>>    ggplot(...) +
> >>>    geom_whatever(...) +
> >>>    ...
> >>>
> >>> My understanding is this is a bit of a historical anomaly that might 
> >>> someday be modified back.
> >>>
> >>> As I understand it, the call to ggplot() creates a partially filled-in 
> >>> object that holds all kinds of useful info. The additional calls to 
> >>> geom_point() and so on will add/change that hidden object. Nothing much 
> >>> happens till the object is implicitly or explicitly given to print() 
> >>> which switches to the print function for objects of that type and creates 
> >>> a graph based on the contents of the object at that time. So, in theory, 
> >>> you could have a pipelined version of ggplot where the first function 
> >>> accepts something like a  data.frame or tibble as the default first 
> >>> argument and at the end returns the object we have been describing. All 
> >>> additional functions would then accept such an object as the (hidden?) 
> >>> first argument and return the modified object. The final function in the 
> >>> pipe would either have the value captured in a variable for later use or 
> >>> print implicitly generating a graph.
> >>>
> >>> So the above sil

Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Kevin Ushey
I agree with Duncan that the right solution is to wrap the pipe
expression with parentheses. Having the parser treat newlines
differently based on whether the session is interactive, or on what
type of operator happens to follow a newline, feels like a pretty big
can of worms.

I think this (or something similar) would accomplish what you want
while still retaining the nice aesthetics of the pipe expression, with
a minimal amount of syntax "noise":

result <- (
  data
|> op1()
|> op2()
)

For interactive sessions where you wanted to execute only parts of the
pipeline at a time, I could see that being accomplished by the editor
-- it could transform the expression so that it could be handled by R,
either by hoisting the pipe operator(s) up a line, or by wrapping the
to-be-executed expression in parentheses for you. If such a style of
coding became popular enough, I'm sure the developers of such editors
would be interested and willing to support this ...

Perhaps more importantly, it would be much easier to accomplish than a
change to the behavior of the R parser, and it would be work that
wouldn't have to be maintained by the R Core team.
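
The parenthesized form, and the two caveats raised earlier in the thread (braces do not help, and parentheses make invisible results visible), can be checked as follows. This is a sketch assuming R 4.1 or later; the variable names are made up for illustration.

```r
# The same leading-pipe pipeline, wrapped in parentheses and in braces.
in_parens <- "(\n  c(3, 1, 2)\n  |> sort()\n  |> rev()\n)"
in_braces <- "{\n  c(3, 1, 2)\n  |> sort()\n  |> rev()\n}"

# TRUE if the text parses, FALSE if the parser rejects it.
parses <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

parens_ok <- parses(in_parens)  # newlines inside ( ) do not end an expression
braces_ok <- parses(in_braces)  # inside { } each newline can end an expression

value <- eval(parse(text = in_parens))

# Side effect of parentheses: an invisible result becomes visible.
vis_plain  <- withVisible(invisible(1))$visible
vis_parens <- withVisible((invisible(1)))$visible
```

An editor that wraps a selection in parentheses on the user's behalf would therefore also print results that would otherwise stay silent, such as the value of an assignment.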

Best,
Kevin

On Wed, Dec 9, 2020 at 11:34 AM Timothy Goodman  wrote:
>
> If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
> command in the Notebook environment I'm using) I certainly *would* expect R
> to treat it as a complete statement.
>
> But what I'm talking about is a different case, where I highlight a
> multi-line statement in my notebook:
>
> my_data_frame1
> |> filter(some_conditions_1)
>
> and then press Ctrl+Enter.  Or, I suppose the equivalent would be to run an
> R script containing those two lines of code, or to run a multi-line
> statement like that from the console (which in RStudio I can do by pressing
> Shift+Enter between the lines.)
>
> In those cases, R could either (1) Give an error message [the current
> behavior], or (2) understand that the first line is meant to be piped to
> the second.  The second option would be significantly more useful, and is
> almost certainly what the user intended.
>
> (For what it's worth, there are some languages, such as Javascript, that
> consider the first token of the next line when determining if the previous
> line was complete.  JavaScript's rules around this are overly complicated,
> but a rule like "a pipe following a line break is treated as continuing the
> previous line" would be much simpler.  And while it might be objectionable
> to treat the operator %>% different from other operators, the addition of
> |>, which isn't truly an operator at all, seems like the right time to
> consider it.)
>
> -Tim
>
> On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch 
> wrote:
>
> > The requirement for operators at the end of the line comes from the
> > interactive nature of R.  If you type
> >
> >  my_data_frame_1
> >
> > how could R know that you are not done, and are planning to type the
> > rest of the expression
> >
> >         %>% filter(some_conditions_1)
> >         ...
> >
> > before it should consider the expression complete?  The way languages
> > like C do this is by requiring a statement terminator at the end.  You
> > can also do it by wrapping the entire thing in parentheses ().
> >
> > However, be careful: Don't use braces:  they don't work.  And parens
> > have the side effect of removing invisibility from the result (which is
> > a design flaw or bonus, depending on your point of view).  So I actually
> > wouldn't advise this workaround.
> >
> > Duncan Murdoch
> >
> >
> > On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
> > > Hi,
> > >
> > > I'm a data scientist who routinely uses R in my day-to-day work, for tasks
> > > such as cleaning and transforming data, exploratory data analysis, etc.
> > > This includes frequent use of the pipe operator from the magrittr and dplyr
> > > libraries, %>%.  So, I was pleased to hear about the recent work on a
> > > native pipe operator, |>.
> > >
> > > This seems like a good time to bring up the main pain point I encounter
> > > when using pipes in R, and some suggestions on what could be done about
> > > it.  The issue is that the pipe operator can't be placed at the start of a
> > > line of code (except in parentheses).  That's no different than any binary
> > > operator in R, but I find it's a source of difficulty for the pipe because
> > > of how pipes are often used.
> > >
> > > [I'm assuming here that my usage is fairly typical of a lot of users; at
> > > any rate, I don't think I'm *too* unusual.]
> > >
> > > === Why this is a problem ===
> > >
> > > It's very common (for me, and I suspect for many users of dplyr) to write
> > > multi-step pipelines and put each step on its own line for readability.
> > > Something like this:
> > >
> > >    ### Example 1 ###
> > >    my_data_frame_1 %>%
> > >      filter(some_conditions_1) %>%
> > >      inner_join(my_data_frame_2, by = some_columns_1) %>%
> > >

Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Timothy Goodman
If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
command in the Notebook environment I'm using) I certainly *would* expect R
to treat it as a complete statement.

But what I'm talking about is a different case, where I highlight a
multi-line statement in my notebook:

my_data_frame1
|> filter(some_conditions_1)

and then press Ctrl+Enter.  Or, I suppose the equivalent would be to run an
R script containing those two lines of code, or to run a multi-line
statement like that from the console (which in RStudio I can do by pressing
Shift+Enter between the lines.)

In those cases, R could either (1) Give an error message [the current
behavior], or (2) understand that the first line is meant to be piped to
the second.  The second option would be significantly more useful, and is
almost certainly what the user intended.

(For what it's worth, there are some languages, such as JavaScript, that
consider the first token of the next line when determining if the previous
line was complete.  JavaScript's rules around this are overly complicated,
but a rule like "a pipe following a line break is treated as continuing the
previous line" would be much simpler.  And while it might be objectionable
to treat the operator %>% differently from other operators, the addition of
|>, which isn't truly an operator at all, seems like the right time to
consider it.)
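
The rule "a pipe following a line break continues the previous line" can be emulated outside the parser by rewriting the text before it is evaluated. The sketch below is not a parser change; hoist_pipes is a hypothetical helper, it assumes R 4.1 or later for |>, and it is purely textual, so it does not understand comments or string literals.

```r
# Hypothetical helper: move a pipe that starts a line to the end of the
# previous line, so the text is valid under R's current parsing rules.
hoist_pipes <- function(code) {
  gsub("[ \t]*\n[ \t]*(\\|>|%>%)[ \t]*", " \\1\n  ", code)
}

selection <- "c(3, 1, 2)\n  |> sort()\n  |> rev()"
hoisted <- hoist_pipes(selection)

cat(hoisted, "\n")
eval(parse(text = hoisted))
```

A line that ends in a comment just before a leading pipe would be broken by this rewrite, which is one reason a real implementation would need to work on tokens and not on raw text.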

-Tim

On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch 
wrote:

> The requirement for operators at the end of the line comes from the
> interactive nature of R.  If you type
>
>  my_data_frame_1
>
> how could R know that you are not done, and are planning to type the
> rest of the expression
>
>         %>% filter(some_conditions_1)
>         ...
>
> before it should consider the expression complete?  The way languages
> like C do this is by requiring a statement terminator at the end.  You
> can also do it by wrapping the entire thing in parentheses ().
>
> However, be careful: Don't use braces:  they don't work.  And parens
> have the side effect of removing invisibility from the result (which is
> a design flaw or bonus, depending on your point of view).  So I actually
> wouldn't advise this workaround.
>
> Duncan Murdoch
>
>
> On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
> > Hi,
> >
> > I'm a data scientist who routinely uses R in my day-to-day work, for tasks
> > such as cleaning and transforming data, exploratory data analysis, etc.
> > This includes frequent use of the pipe operator from the magrittr and dplyr
> > libraries, %>%.  So, I was pleased to hear about the recent work on a
> > native pipe operator, |>.
> >
> > This seems like a good time to bring up the main pain point I encounter
> > when using pipes in R, and some suggestions on what could be done about
> > it.  The issue is that the pipe operator can't be placed at the start of a
> > line of code (except in parentheses).  That's no different than any binary
> > operator in R, but I find it's a source of difficulty for the pipe because
> > of how pipes are often used.
> >
> > [I'm assuming here that my usage is fairly typical of a lot of users; at
> > any rate, I don't think I'm *too* unusual.]
> >
> > === Why this is a problem ===
> >
> > It's very common (for me, and I suspect for many users of dplyr) to write
> > multi-step pipelines and put each step on its own line for readability.
> > Something like this:
> >
> >    ### Example 1 ###
> >    my_data_frame_1 %>%
> >      filter(some_conditions_1) %>%
> >      inner_join(my_data_frame_2, by = some_columns_1) %>%
> >      group_by(some_columns_2) %>%
> >      summarize(some_aggregate_functions_1) %>%
> >      filter(some_conditions_2) %>%
> >      left_join(my_data_frame_3, by = some_columns_3) %>%
> >      group_by(some_columns_4) %>%
> >      summarize(some_aggregate_functions_2) %>%
> >      arrange(some_columns_5)
> >
> > [I guess some might consider this an overly long pipeline; for me it's
> > pretty typical.  I *could* split it up by assigning intermediate results to
> > variables, but much of the value I get from the pipe is that it lets my
> > code communicate which results are temporary, and which will be used again
> > later.  Assigning variables for single-use results would remove that
> > expressiveness.]
> >
> > I would prefer (for reasons I'll explain) to be able to write the above
> > example like this, which isn't valid R:
> >
> >    ### Example 2 (not valid R) ###
> >    my_data_frame_1
> >      %>% filter(some_conditions_1)
> >      %>% inner_join(my_data_frame_2, by = some_columns_1)
> >      %>% group_by(some_columns_2)
> >      %>% summarize(some_aggregate_functions_1)
> >      %>% filter(some_conditions_2)
> >      %>% left_join(my_data_frame_3, by = some_columns_3)
> >      %>% group_by(some_columns_4)
> >      %>% summarize(some_aggregate_functions_2)
> >      %>% arrange(some_columns_5)
> >
> > One (minor) advantage is obvious: It lets you easily line up the pipes,
> > which 

Re: [Rd] New pipe operator and gg plotz

2020-12-09 Thread Duncan Murdoch
Looks like Sergio Oller took your ambitious approach: 
https://github.com/zeehio/ggpipe.  It hasn't been updated since 2017, so 
there may be some new things in ggplot2 that aren't there yet.


Duncan Murdoch

On 09/12/2020 2:16 p.m., Greg Snow wrote:

Since `+` is already a function we could do regular piping to change this code:

mtcars %>%
   ggplot(aes(x=wt, y=mpg)) +
   geom_point()

to this:

mtcars %>%
   ggplot(aes(x=wt, y=mpg)) %>%
   `+`(geom_point())

Further we can write wrapper functions like:

p_geom_point <- function(x,...) {
   x + geom_point(...)
}

Then run the code like:

mtcars %>%
   ggplot(aes(x=wt, y=mpg)) %>%
   p_geom_point()

All three of the above give the same plot from what I can see, but I
have not tested it with very many options beyond the above.

A really ambitious person could create a new package with wrappers for
all the ggplot2 functions that can come after the plus sign, then we
could use pipes for everything.  I don't know if there are any strange
circumstances that would make this cause problems (it probably will
slow things down slightly, but probably not enough for people to
notice).

On Sun, Dec 6, 2020 at 7:18 PM Avi Gross via R-devel
 wrote:


Thanks, Duncan. That answers my question fairly definitively.

Although it can be DONE it likely won't be for the reasons Hadley mentioned 
until we get some other product that replaces it entirely. There are some 
interesting work-arounds mentioned.

I was thinking of one that has overhead but might be a pain. Hadley mentioned a 
slight variant. The first argument to a function now is expected to be the data 
argument. The second might be the mapping. Now if the function is called with a 
new first argument that is a ggplot object, it could be possible to test the 
type and if it is a ggplot object then slide over carefully any additional
matched arguments that were not explicitly named. Not sure that is at all easy 
to do.

Alternately, you can ask that when used in such a pipeline that the user call 
all other arguments using names like data=whatever, mapping=aes(whatever) so no 
other args need to be adjusted by position.

But all this is academic and I concede will likely not be done. I can live with 
the plus signs.


-----Original Message-----
From: Duncan Murdoch 
Sent: Sunday, December 6, 2020 2:50 PM
To: Avi Gross ; 'r-devel' 
Subject: Re: [Rd] New pipe operator and gg plotz

Hadley's answer (#7 here:
https://community.rstudio.com/t/why-cant-ggplot2-use/4372) makes it pretty 
clear that he thinks it would have been nice now if he had made that choice 
when ggplot2 came out, but it's not worth the effort now to change it.

Duncan Murdoch

On 06/12/2020 2:34 p.m., Avi Gross via R-devel wrote:

As someone who switches back and forth between using standard R methods and 
those of the tidyverse, depending on the problem, my mood and whether Jupiter 
aligns with Saturn in the new age of Aquarius, I have a question about the 
forthcoming built-in pipe. Will it motivate anyone to eventually change or 
enhance the ggplot functionality to have a version that gets rid of the odd use 
of the addition symbol?

I mean I now sometimes have a pipeline that looks like:

Data %>%
   Do_this %>%
   Do_that(whatever) %>%
   ggplot(...) +
   geom_whatever(...) +
   ...

My understanding is this is a bit of a historical anomaly that might someday be 
modified back.

As I understand it, the call to ggplot() creates a partially filled-in object 
that holds all kinds of useful info. The additional calls to geom_point() and 
so on will add/change that hidden object. Nothing much happens till the object 
is implicitly or explicitly given to print() which switches to the print 
function for objects of that type and creates a graph based on the contents of 
the object at that time. So, in theory, you could have a pipelined version of 
ggplot where the first function accepts something like a  data.frame or tibble 
as the default first argument and at the end returns the object we have been 
describing. All additional functions would then accept such an object as the 
(hidden?) first argument and return the modified object. The final function in 
the pipe would either have the value captured in a variable for later use or 
print implicitly generating a graph.

So the above silly example might become:

Data %>%
   Do_this %>%
   Do_that(whatever) %>%
   ggplot(...) %>%
   geom_whatever(...) %>%
   ...

Or, am I missing something here?

The language and extensions such as are now in the tidyverse might be more 
streamlined and easier to read when using consistent notation. If we now build 
a reasonable version of the pipeline in, might we encourage other uses to 
gradually migrate back closer to the mainstream?

-----Original Message-----
From: R-devel  On Behalf Of Rui
Barradas
Sent: Sunday, December 6, 2020 2:51 AM
To: Gregory Warnes ; Abby Spurdle

Cc: r-devel 
Subject: Re: [Rd] New pipe

Re: [Rd] New pipe operator and gg plotz

2020-12-09 Thread Greg Snow
Since `+` is already a function we could do regular piping to change this code:

mtcars %>%
  ggplot(aes(x=wt, y=mpg)) +
  geom_point()

to this:

mtcars %>%
  ggplot(aes(x=wt, y=mpg)) %>%
  `+`(geom_point())

Further we can write wrapper functions like:

p_geom_point <- function(x,...) {
  x + geom_point(...)
}

Then run the code like:

mtcars %>%
  ggplot(aes(x=wt, y=mpg)) %>%
  p_geom_point()

All three of the above give the same plot from what I can see, but I
have not tested it with very many options beyond the above.

A really ambitious person could create a new package with wrappers for
all the ggplot2 functions that can come after the plus sign, then we
could use pipes for everything.  I don't know if there are any strange
circumstances that would make this cause problems (it probably will
slow things down slightly, but probably not enough for people to
notice).
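
The wrapper idea can be generalized with a small function factory, so that one helper produces a pipe-friendly version of any layer constructor. The sketch below uses a toy plot class and a toy layer (toy_plot, toy_layer, and pipeable are all made-up names) so that it runs without ggplot2 installed; it assumes R 4.1 or later for |>, and it is not how ggplot2 or ggpipe is implemented.

```r
# Toy stand-ins for a plot object and a layer, so the sketch needs no packages.
toy_plot <- function(data) {
  structure(list(data = data, layers = character()), class = "toyplot")
}
toy_layer <- function(name) {
  structure(list(name = name), class = "toylayer")
}

# `+` method for the toy class: adding a layer returns a modified copy.
"+.toyplot" <- function(e1, e2) {
  e1$layers <- c(e1$layers, e2$name)
  e1
}

# Factory: turn a layer constructor into a function that takes the plot
# as its first argument, which is the shape a pipe needs.
pipeable <- function(layer_fun) {
  function(plot, ...) plot + layer_fun(...)
}

p_toy_layer <- pipeable(toy_layer)

p <- toy_plot(mtcars) |>
  p_toy_layer("points") |>
  p_toy_layer("smooth")

p$layers
```

With ggplot2 loaded, the same factory could presumably be applied as pipeable(geom_point) to obtain the p_geom_point wrapper shown above, though that has not been tested here.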

On Sun, Dec 6, 2020 at 7:18 PM Avi Gross via R-devel
 wrote:
>
> Thanks, Duncan. That answers my question fairly definitively.
>
> Although it can be DONE it likely won't be for the reasons Hadley mentioned 
> until we get some other product that replaces it entirely. There are some 
> interesting work-arounds mentioned.
>
> I was thinking of one that has overhead but might be a pain. Hadley mentioned 
> a slight variant. The first argument to a function now is expected to be the 
> data argument. The second might be the mapping. Now if the function is called 
> with a new first argument that is a ggplot object, it could be possible to 
> test the type and if it is a ggplot object then slide over carefully any
> additional matched arguments that were not explicitly named. Not sure that is 
> at all easy to do.
>
> Alternately, you can ask that when used in such a pipeline that the user call 
> all other arguments using names like data=whatever, mapping=aes(whatever) so 
> no other args need to be adjusted by position.
>
> But all this is academic and I concede will likely not be done. I can live 
> with the plus signs.
>
>
> -Original Message-
> From: Duncan Murdoch 
> Sent: Sunday, December 6, 2020 2:50 PM
> To: Avi Gross ; 'r-devel' 
> Subject: Re: [Rd] New pipe operator and gg plotz
>
> Hadley's answer (#7 here:
> https://community.rstudio.com/t/why-cant-ggplot2-use/4372) makes it pretty 
> clear that he thinks it would have been nice now if he had made that choice 
> when ggplot2 came out, but it's not worth the effort now to change it.
>
> Duncan Murdoch
>
> On 06/12/2020 2:34 p.m., Avi Gross via R-devel wrote:
> > As someone who switches back and forth between using standard R methods and 
> > those of the tidyverse, depending on the problem, my mood and whether 
> > Jupiter aligns with Saturn in the new age of Aquarius, I have a question 
> > about the forthcoming built-in pipe. Will it motivate anyone to eventually 
> > change or enhance the ggplot functionality to have a version that gets rid 
> > of the odd use of the addition symbol?
> >
> > I mean I now sometimes have a pipeline that looks like:
> >
> > Data %>%
> >   Do_this %>%
> >   Do_that(whatever) %>%
> >   ggplot(...) +
> >   geom_whatever(...) +
> >   ...
> >
> > My understanding is this is a bit of a historical anomaly that might 
> > someday be modified back.
> >
> > As I understand it, the call to ggplot() creates a partially filled-in 
> > object that holds all kinds of useful info. The additional calls to 
> > geom_point() and so on will add/change that hidden object. Nothing much 
> > happens till the object is implicitly or explicitly given to print() which 
> > switches to the print function for objects of that type and creates a graph 
> > based on the contents of the object at that time. So, in theory, you could 
> > have a pipelined version of ggplot where the first function accepts 
> > something like a  data.frame or tibble as the default first argument and at 
> > the end returns the object we have been describing. All additional 
> > functions would then accept such an object as the (hidden?) first argument 
> > and return the modified object. The final function in the pipe would either 
> > have the value captured in a variable for later use or print implicitly 
> > generating a graph.
> >
> > So the above silly example might become:
> >
> > Data %>%
> >   Do_this %>%
> >   Do_that(whatever) %>%
> >   ggplot(...) %>%
> >   geom_whatever(...) %>%
> >   ...
> >
> > Or, am I missing something here?
> >
> > The language and extensions such as are now in the tidyverse might be more 
> > streamlined and easier to read when using consistent notation. If we now 
> > build a reasonable version of the pipeline in, might we encourage other 
> > uses to gradually migrate back closer to the mainstream?
> >
> > -Original Message-
> > From: R-devel  On Behalf Of Rui
> > Barradas
> > Sent: Sunday, December 6, 2020 2:51 AM
> > To: Gregory Warnes ; Abby Spurdle
> > 
> > Cc: r-devel 
> > Subject: Re: [

Re: [Rd] Ignore Sites Option For libPaths

2020-12-09 Thread Dirk Eddelbuettel


On 9 December 2020 at 09:49, Martin Maechler wrote:
| Also, R allows the user to remove their own home directory, it
| should also allow to get a .libPaths() which contains nothing compulsory
| but R's own .Library {as only that can contain 'base' !}

That would be a very nice-to-have feature! But right now, .libPaths() does
not allow this per my reading of the help page:

 ‘.libPaths’ is used for getting or setting the library trees that
 R knows about (and hence uses when looking for packages).  If
 called with argument ‘new’, the library search path is set to the
 existing directories in ‘unique(c(new, .Library.site, .Library))’
 and this is returned.  If given no argument, a character vector
 with the currently active library trees is returned.

Hence I was trying to help the OP approximate the behaviour via the command line,
but count me in as supporting this in R itself if you want to
make such a change.
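
A small sketch of the behaviour the help text describes, assuming a throwaway directory from tempfile(); the variable names are illustrative only:

```r
# Remember the current library trees so they can be restored afterwards.
old <- .libPaths()

# A throwaway, empty library directory (hypothetical, for illustration).
tmp <- tempfile("lib")
dir.create(tmp)

# Ask for *only* the new directory ...
.libPaths(tmp)
current <- .libPaths()

# ... but .Library is still part of the result, as the help text says.
library_kept <- normalizePath(.Library, "/") %in% current

# Restore the original search path.
.libPaths(old)
```

So `library_kept` should come out TRUE even though only `tmp` was requested.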

Dirk

-- 
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] New pipe operator

2020-12-09 Thread Peter Dalgaard



> On 9 Dec 2020, at 16:20 , Duncan Murdoch  wrote:
> 
> To me curry(mean, na.rm = TRUE)(x) looks a lot more complicated than mean(x, 
> na.rm = TRUE), especially since it has the additional risk that users can 
> define their own function called "curry".

Not to mention that it would make people's data handling scripts look like the 
menu at an Indian restaurant ;-)

-pd

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] [External] Re: New pipe operator

2020-12-09 Thread Gabor Grothendieck
On Wed, Dec 9, 2020 at 12:36 PM Gabriel Becker  wrote:
> I mean, I think the bizarro pipe was a pretty clever piece of work. I was 
> impressed by what John did there, but I don't really know what you're 
> suggesting here. As you say, the bizarro pipe works now without any changes 
> and you're welcome to use it if you prefer it to base's (proposed/likely) |> 
> and magrittr's %>%.
>

If |> exists then it will be impossible to avoid it unless the only
software you ever use is your own.  It's about the entire R ecosystem
and what gets used because it is in the base.

It would still be possible to implement \(x)... without |> so I would
go with that and rethink the pipe situation.



Re: [Rd] [External] Re: New pipe operator

2020-12-09 Thread Gabriel Becker
On Wed, Dec 9, 2020 at 8:26 AM Gabor Grothendieck 
wrote:

> On Wed, Dec 9, 2020 at 10:08 AM Duncan Murdoch 
> wrote:
> >
> > You might be interested in this blog post by Michael Barrowman:
> >
> > https://michaelbarrowman.co.uk/post/the-new-base-pipe/
> >
> > He does some timing comparisons, and the current R-devel implementations
> > of |> and \() do quite well.
>
> It does bring out that the requirement of using functions to get around the
> lack of placeholders is not free but exacts a small penalty in
> terms of performance (in addition to verbosity).
>

I mean, technically, yes, but even with that overhead it's 2 *orders of
magnitude* faster than the magrittr you're used to, and by the look of it
~3x faster than the new magrittr. And, those base pipe speeds are in
microseconds. You'd have to be running that pipeline thousands of times -
which people don't generally do with pipelines in the first place -  to see
a *5 millisecond* slowdown, which you would then happily fail to notice
completely because what your pipeline is actually doing takes so much
longer than those microseconds of the extra function call that it's unlikely
to be detectable at all.
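
A rough way to see the size of that overhead on one's own machine, using only base R (needs R >= 4.1 for |> and \()). The repetition count and the test vector are arbitrary choices, and the numbers will differ between machines:

```r
x <- runif(100)
n <- 1e5  # number of repetitions; an arbitrary choice

# Pipe straight into a call.
direct <- system.time(for (i in seq_len(n)) x |> sum())[["elapsed"]]

# Pipe through an extra anonymous function, as a placeholder workaround would.
wrapped <- system.time(for (i in seq_len(n)) x |> (\(v) sum(v))())[["elapsed"]]

# Rough extra cost per call in microseconds; noisy, and may even be negative.
extra_us <- (wrapped - direct) / n * 1e6
```

This is only a crude loop timing, not a replacement for the benchmarks in the blog post.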



> The bizarro pipe supports placeholders and so doesn't require functions
> as a workaround and thus would presumably be even faster.  It is also
> perfectly consistent with the rest of R and requires no new syntax.
> You have to explicitly add a dot as the first argument but this seems a
> better compromise to me than those involved with |> .
>

I mean, I think the bizarro pipe was a pretty clever piece of work. I was
impressed by what John did there, but I don't really know what you're
suggesting here. As you say, the bizarro pipe works now without any changes
and you're welcome to use it if you prefer it to base's (proposed/likely)
|> and magrittr's %>%.

~G



Re: [Rd] [External] Re: New pipe operator

2020-12-09 Thread Gabor Grothendieck
On Wed, Dec 9, 2020 at 10:08 AM Duncan Murdoch  wrote:
>
> You might be interested in this blog post by Michael Barrowman:
>
> https://michaelbarrowman.co.uk/post/the-new-base-pipe/
>
> He does some timing comparisons, and the current R-devel implementations
> of |> and \() do quite well.

It does bring out that the requirement of using functions to get around the
lack of placeholders is not free but exacts a small penalty in
terms of performance (in addition to verbosity).

The bizarro pipe supports placeholders and so doesn't require functions
as a workaround and thus would presumably be even faster.  It is also
perfectly consistent with the rest of R and requires no new syntax.
You have to explicitly add a dot as the first argument but this seems a better
compromise to me than those involved with |> .
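
For readers who have not seen it, the bizarro pipe is nothing more than right assignment into a variable named `.` followed by a semicolon. A small sketch with made-up data:

```r
# Each step stores its value in `.`, and the next step reads `.` explicitly.
c(1, 3, NA) ->.;
mean(., na.rm = TRUE) ->.;
. + 1 -> result
```

Note that this leaves a variable called `.` behind in the calling environment.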



Re: [Rd] New pipe operator

2020-12-09 Thread Duncan Murdoch

On 09/12/2020 10:42 a.m., Jan van der Laan wrote:





On 09-12-2020 16:20, Duncan Murdoch wrote:

On 09/12/2020 9:55 a.m., Jan van der Laan wrote:



I think only allowing functions on the right hand side (e.g. only the |>
operator and not the |:>) would be enough to handle most cases and seems
easier to reason about. The limitations of that can easily be worked
around using existing functionality in the language.


I agree that would be sufficient, but I don't see how it makes reasoning
easier.  The transformation is trivial, so I'll assume that doesn't
consume any mental energy compared to understanding what the final
expression actually does.  Using your currying example, the choice is
between

   x |> mean(na.rm = TRUE)

which transforms to mean(x, na.rm = TRUE), or your proposed

   x |> curry(mean, na.rm = TRUE)

which transforms to

   curry(mean, na.rm = TRUE)(x)

To me curry(mean, na.rm = TRUE)(x) looks a lot more complicated than
mean(x, na.rm = TRUE), especially since it has the additional risk that
users can define their own function called "curry".



First, I do agree that

x |> mean(na.rm = TRUE)

is cleaner and this covers most of the use cases of users and many users
are used to the syntax from the magrittr pipes.

However, for programmers (there is no distinct line between users and
programmers), it is simpler to reason in the sense that lhs |> rhs
always means rhs(lhs); this does not depend on whether rhs is a call or
an (anonymous) function (not sure what is called what; which perhaps
illustrates the difficulty).


I think your proposed rule is pretty simple, with just one case:

lhs |> rhs

would transform to rhs(lhs).  Yes, that's simple.

The current rule is not as simple as yours, but it only has two cases 
instead of 1.  Both involve the rhs being a call, nothing else.


Case 1, the common one:  rhs is a call to a function using regular 
syntax, e.g. f(args) where args might be empty.  Then it is transformed 
to f(lhs, args).


Case 2:  rhs is a call to `function`, which we normally write as 
"function(args) body", which is transformed to (function(args) body)(lhs).


That's it!  Nothing else is allowed.  Not as simple as yours, but simple 
enough to be trivial to reason about.  Most of the effort would be spent 
in figuring out how the transformed expression would evaluate, and since 
your transformed expression is more complicated in the common case where 
currying is needed, I prefer the current proposal.
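
The two cases can be checked directly, since the transformation happens at parse time. This sketch assumes a released R >= 4.1, where (as far as I know) the function case has to be written in the parenthesised, called form rather than as a bare `function(args) body` on the right-hand side:

```r
# Case 1: the lhs becomes the first argument of the rhs call.
case1 <- quote(x |> mean(na.rm = TRUE))
same_call <- identical(case1, quote(mean(x, na.rm = TRUE)))

# Case 2, in the parenthesised form: the anonymous function is
# called with the lhs as its argument.
case2 <- c(1, 3, NA) |> (\(v) mean(v, na.rm = TRUE))()
```

Here `same_call` shows that nothing of the pipe survives parsing; only the ordinary call remains.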





As soon as you start to have functions returning functions, you have to
think about how many brackets you have to place where. Being able to use
functions returning functions does open up possibilities for
programmers, as illustrated for example in my example using expressions.
This would have been much less clear.


I think your examples would work in the current system, too, with a 
small change to fexpr.  A corresponding change to curry could be made, 
but then it wouldn't be doing currying, so I won't do that.  Here's your 
example rewritten in the R-devel system:


fexpr <- function(x, expr){
  expr <- substitute(expr)
  f <- function(.) {}
  body(f) <- expr
  f(x)
}
. <- fexpr


1:10 |> mean()
c(1,3,NA) |> mean(na.rm = TRUE)
c(1,3,NA) |> .( mean(., na.rm = TRUE) ) |> identity()
c(1,3,NA) |> .( . + 4)
c(1,3,NA) |> fexpr( . + 4)
c(1,3,NA) |> function(x) mean(x, na.rm = TRUE) |> fexpr(. + 1)

That produces the same outputs as your code.

Duncan Murdoch



As for the argument about users being able to redefine curry: yes, they can, and
this is perhaps a good thing. They can also redefine a lot of other
stuff. And I am not suggesting that curry or fexpr or . are good names.
You could even have a curry operator.

Best,
Jan







Duncan Murdoch



The problem with only allowing

x |> mean

and not

x |> mean()

is with additional arguments. However, this can be solved with a
currying function, for example:

x |> curry(mean, na.rm = TRUE)

The cost is a few additional characters.

In the same way it is possible to write a function that accepts an
expression and returns a function containing that expression. This can
be used to have expressions on the right-hand side and reduces the need
for anonymous functions.

x |> fexpr(. + 10)
dta |> fexpr(lm(y ~ x, data = .))

You could call this function .:

x |> .(. + 10)
dta |> .(lm(y ~ x, data = .))


Dummy example code (thanks to  a colleague of mine)


fexpr <- function(expr){
     expr <- substitute(expr)
     f <- function(.) {}
     body(f) <- expr
     f
}
. <- fexpr

curry <- function(fun,...){
     L <- list(...)
     function(...){
   do.call(fun, c(list(...),L))
     }
}

`%|>%` <- function(e1, e2) {
     e2(e1)
}


1:10 %|>% mean
c(1,3,NA) %|>% curry(mean, na.rm = TRUE)
c(1,3,NA) %|>% .( mean(., na.rm = TRUE) ) %|>% identity
c(1,3,NA) %|>% .( . + 4)
c(1,3,NA) %|>% fexpr( . + 4)
c(1,3,NA) %|>% function(x) mean(x, na.rm = TRUE) %|>% fexpr(. + 1)

--
Jan


Re: [Rd] New pipe operator

2020-12-09 Thread Jan van der Laan






On 09-12-2020 16:20, Duncan Murdoch wrote:

On 09/12/2020 9:55 a.m., Jan van der Laan wrote:



I think only allowing functions on the right hand side (e.g. only the |>
operator and not the |:>) would be enough to handle most cases and seems
easier to reason about. The limitations of that can easily be worked
around using existing functionality in the language.


I agree that would be sufficient, but I don't see how it makes reasoning 
easier.  The transformation is trivial, so I'll assume that doesn't 
consume any mental energy compared to understanding what the final 
expression actually does.  Using your currying example, the choice is 
between


  x |> mean(na.rm = TRUE)

which transforms to mean(x, na.rm = TRUE), or your proposed

  x |> curry(mean, na.rm = TRUE)

which transforms to

  curry(mean, na.rm = TRUE)(x)

To me curry(mean, na.rm = TRUE)(x) looks a lot more complicated than 
mean(x, na.rm = TRUE), especially since it has the additional risk that 
users can define their own function called "curry".



First, I do agree that

x |> mean(na.rm = TRUE)

is cleaner and this covers most of the use cases of users and many users 
are used to the syntax from the magrittr pipes.


However, for programmers (there is no distinct line between users and 
programmers), it is simpler to reason in the sense that lhs |> rhs 
always means rhs(lhs); this does not depend on whether rhs is a call or 
an (anonymous) function (not sure what is called what; which perhaps 
illustrates the difficulty).


As soon as you start to have functions returning functions, you have to 
think about how many brackets you have to place where. Being able to use 
functions returning functions does open up possibilities for 
programmers, as illustrated for example in my example using expressions. 
This would have been much less clear.


As for the argument about users being able to redefine curry: yes, they can, and 
this is perhaps a good thing. They can also redefine a lot of other 
stuff. And I am not suggesting that curry or fexpr or . are good names. 
You could even have a curry operator.


Best,
Jan







Duncan Murdoch



The problem with only allowing

x |> mean

and not

x |> mean()

is with additional arguments. However, this can be solved with a
currying function, for example:

x |> curry(mean, na.rm = TRUE)

The cost is a few additional characters.

In the same way it is possible to write a function that accepts an
expression and returns a function containing that expression. This can
be used to have expressions on the right-hand side and reduces the need
for anonymous functions.

x |> fexpr(. + 10)
dta |> fexpr(lm(y ~ x, data = .))

You could call this function .:

x |> .(. + 10)
dta |> .(lm(y ~ x, data = .))


Dummy example code (thanks to  a colleague of mine)


fexpr <- function(expr){
    expr <- substitute(expr)
    f <- function(.) {}
    body(f) <- expr
    f
}
. <- fexpr

curry <- function(fun,...){
    L <- list(...)
    function(...){
  do.call(fun, c(list(...),L))
    }
}

`%|>%` <- function(e1, e2) {
    e2(e1)
}


1:10 %|>% mean
c(1,3,NA) %|>% curry(mean, na.rm = TRUE)
c(1,3,NA) %|>% .( mean(., na.rm = TRUE) ) %|>% identity
c(1,3,NA) %|>% .( . + 4)
c(1,3,NA) %|>% fexpr( . + 4)
c(1,3,NA) %|>% function(x) mean(x, na.rm = TRUE) %|>% fexpr(. + 1)

--
Jan



Re: [Rd] New pipe operator

2020-12-09 Thread Duncan Murdoch

On 09/12/2020 9:55 a.m., Jan van der Laan wrote:



I think only allowing functions on the right hand side (e.g. only the |>
operator and not the |:>) would be enough to handle most cases and seems
easier to reason about. The limitations of that can easily be worked
around using existing functionality in the language.


I agree that would be sufficient, but I don't see how it makes reasoning 
easier.  The transformation is trivial, so I'll assume that doesn't 
consume any mental energy compared to understanding what the final 
expression actually does.  Using your currying example, the choice is 
between


 x |> mean(na.rm = TRUE)

which transforms to mean(x, na.rm = TRUE), or your proposed

 x |> curry(mean, na.rm = TRUE)

which transforms to

 curry(mean, na.rm = TRUE)(x)

To me curry(mean, na.rm = TRUE)(x) looks a lot more complicated than 
mean(x, na.rm = TRUE), especially since it has the additional risk that 
users can define their own function called "curry".


Duncan Murdoch



The problem with only allowing

x |> mean

and not

x |> mean()

is with additional arguments. However, this can be solved with a
currying function, for example:

x |> curry(mean, na.rm = TRUE)

The cost is a few additional characters.

In the same way it is possible to write a function that accepts an
expression and returns a function containing that expression. This can
be used to have expressions on the right-hand side and reduces the need
for anonymous functions.

x |> fexpr(. + 10)
dta |> fexpr(lm(y ~ x, data = .))

You could call this function .:

x |> .(. + 10)
dta |> .(lm(y ~ x, data = .))


Dummy example code (thanks to  a colleague of mine)


fexpr <- function(expr){
expr <- substitute(expr)
f <- function(.) {}
body(f) <- expr
f
}
. <- fexpr

curry <- function(fun,...){
L <- list(...)
function(...){
  do.call(fun, c(list(...),L))
}
}

`%|>%` <- function(e1, e2) {
e2(e1)
}


1:10 %|>% mean
c(1,3,NA) %|>% curry(mean, na.rm = TRUE)
c(1,3,NA) %|>% .( mean(., na.rm = TRUE) ) %|>% identity
c(1,3,NA) %|>% .( . + 4)
c(1,3,NA) %|>% fexpr( . + 4)
c(1,3,NA) %|>% function(x) mean(x, na.rm = TRUE) %|>% fexpr(. + 1)

--
Jan



Re: [Rd] [External] Re: New pipe operator

2020-12-09 Thread Duncan Murdoch

You might be interested in this blog post by Michael Barrowman:

https://michaelbarrowman.co.uk/post/the-new-base-pipe/

He does some timing comparisons, and the current R-devel implementations 
of |> and \() do quite well.


Duncan Murdoch


On 06/12/2020 4:42 a.m., Jan Gorecki wrote:

Luke,
When writing a blog post on that, could you please describe
performance implications that this new feature will carry?
AFAIU, compared to the standard way of using temporary variables, pipes
will avoid incrementing the REFCNT of the objects being piped.
Therefore peak memory usage could be lower in some cases.

As for brackets required on RHS, I think it makes sense to be
consistent and either require brackets for anonymous functions the
same way we require for function name, or not require brackets for
both of them.

Best,
Jan

On Sat, Dec 5, 2020 at 8:10 PM  wrote:


We went back and forth on this several times. The key advantage of
requiring parentheses is to keep things simple and consistent.  Let's
get some experience with that. If experience shows requiring
parentheses creates too many issues then we can add the option of
dropping them later (with special handling of :: and :::). It's easier
to add flexibility and complexity than to restrict it after the fact.
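
A short sketch of what the parentheses rule means in practice, assuming a released R >= 4.1. parse() is used for the failing case so that the error can be inspected instead of stopping the script:

```r
# With parentheses (and an explicit namespace) the native pipe parses fine.
first_rows <- mtcars |> utils::head(3)

# Without parentheses the parser rejects the code.
bare <- try(parse(text = "mtcars |> nrow"), silent = TRUE)
bare_rejected <- inherits(bare, "try-error")
```

The namespace-qualified call works because the right-hand side is still a call to head(), not a call to `::`.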

Best,

luke

On Sat, 5 Dec 2020, Hugh Parsonage wrote:


I'm surprised by the aversion to

mtcars |> nrow

over

mtcars |> nrow()

and I think the decision to disallow the former should be
reconsidered.  The pipe operator is only going to be used when the rhs
is a function, so there is no ambiguity with omitting the parentheses.
If it's disallowed, it becomes inconsistent with other treatments like
sapply(mtcars, typeof) where sapply(mtcars, typeof()) would just be
noise.  I'm not sure why this decision was taken

If the only issue is with the double (and triple) colon operator, then
ideally `mtcars |> base::head` should resolve to `base::head(mtcars)`
-- in other words, demote the precedence of |>

Obviously (looking at the R-Syntax branch) this decision was
considered, put into place, then dropped, but I can't see why
precisely.

Best,


Hugh.







On Sat, 5 Dec 2020 at 04:07, Deepayan Sarkar  wrote:


On Fri, Dec 4, 2020 at 7:35 PM Duncan Murdoch  wrote:


On 04/12/2020 8:13 a.m., Hiroaki Yutani wrote:

   Error: function '::' not supported in RHS call of a pipe


To me, this error looks much more friendly than magrittr's error.
Some users have gotten too used to specifying functions without (). This
is OK until they use `::`, but when they need to use it, it takes
hours to figure out why

mtcars %>% base::head
#> Error in .::base : unused argument (head)

won't work but

mtcars %>% head

works. I think it is too harsh a lesson for ordinary R users to
learn that `::` is a function. I've been wanting magrittr to drop the
support for a function name without () to avoid this confusion,
so I would very much welcome the new pipe operator's behavior.
Thank you to all the developers who implemented this!


I agree, it's an improvement on the corresponding magrittr error.

I think the semantics of not evaluating the RHS, but treating the pipe
as purely syntactical is a good decision.

I'm not sure I like the recommended way to pipe into a particular argument:

mtcars |> subset(cyl == 4) |> \(d) lm(mpg ~ disp, data = d)

or

mtcars |> subset(cyl == 4) |> function(d) lm(mpg ~ disp, data = d)

both of which are equivalent to

mtcars |> subset(cyl == 4) |> (function(d) lm(mpg ~ disp, data = d))()

It's tempting to suggest it should allow something like

mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .)


Which is really not that far off from

mtcars |> subset(cyl == 4) |> \(.) lm(mpg ~ disp, data = .)

once you get used to it.
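
For reference, a runnable version of piping into a named argument, written in the parenthesised form that (as far as I know) released R >= 4.1 requires; the variable name `fit` is only for illustration:

```r
# Pipe the subset into lm()'s data argument via an anonymous function.
fit <- mtcars |>
  subset(cyl == 4) |>
  (\(d) lm(mpg ~ disp, data = d))()

coef(fit)
```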

One consequence of the implementation is that it's not clear how
multiple occurrences of the placeholder would be interpreted. With
magrittr,

sort(runif(10)) %>% ecdf(.)(.)
## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

This is probably what you would expect, if you expect it to work at all, and not

ecdf(sort(runif(10)))(sort(runif(10)))

There would be no such ambiguity with anonymous functions

sort(runif(10)) |> \(.) ecdf(.)(.)
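
A runnable check of that claim, again using the parenthesised form for released R >= 4.1; the seed is arbitrary. Because the argument is bound once, the ECDF is evaluated at the same sorted sample it was built from:

```r
set.seed(1)

# `.` is an ordinary parameter here, bound once to the sorted sample.
via_lambda <- sort(runif(10)) |> (\(.) ecdf(.)(.))()
```

With ten distinct sorted values, `via_lambda` should be 0.1, 0.2, ..., 1.0.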

-Deepayan


which would be expanded to something equivalent to the other versions:
but that makes it quite a bit more complicated.  (Maybe _ or \. should
be used instead of ., since those are not legal variable names.)

I don't think there should be an attempt to copy magrittr's special
casing of how . is used in determining whether to also include the
previous value as first argument.

Duncan Murdoch




Best,
Hiroaki Yutani

On Fri, Dec 4, 2020 at 20:51, Duncan Murdoch wrote:


Just saw this on the R-devel news:


R now provides a simple native pipe syntax ‘|>’ as well as a shorthand
notation for creating functions, e.g. ‘\(x) x + 1’ is parsed as
‘function(x) x + 1’. The pipe implementation as a syntax transformation
was motivated by suggestions from Jim Hester and Lionel Henry. These
features are experimental 

Re: [Rd] New pipe operator

2020-12-09 Thread Jan van der Laan




On 08-12-2020 12:46, Gabor Grothendieck wrote:

Duncan Murdoch:

I agree it's all about call expressions, but they aren't all being
treated equally:

x |> f(...)

expands to f(x, ...), while

x |> `function`(...)

expands to `function`(...)(x).  This is an exception to the rule for


Yes, this is the problem.  It is trying to handle two different sorts of right
hand sides, calls and functions, using only syntax level operations and
it really needs to either make use of deeper information or have some
method that is available at the syntax level for identifying whether the
right hand side is a call or function.  In the latter case having two
operators would be one way to do it.

   f <- \(x) x + 1
   x |> f()  # call
   x |:> f  # function
   x |:> \(x) x + 1  # function

In the other case where deeper information is used there would only be one
operator and it would handle all cases but would use more than just syntax
level knowledge.

R solved these sorts of problems long ago using S3 and other object oriented
systems which dispatch different methods based on what the right hand side is.
The attempt to avoid using the existing or equivalent mechanisms seems to have
led to this problem.
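
To make the "one operator, more than syntax-level knowledge" option concrete, here is a rough sketch of an ordinary R function that looks at its unevaluated right-hand side at run time. It is not S3 dispatch and not how |> is implemented; the operator name `%then%` is made up, and \() needs R >= 4.1:

```r
# Hypothetical operator: decide at run time how to treat the right-hand side.
`%then%` <- function(lhs, rhs) {
  rhs_expr <- substitute(rhs)
  is_plain_call <- is.call(rhs_expr) &&
    !identical(rhs_expr[[1]], as.name("function"))
  if (is_plain_call) {
    # rhs is written as a call, e.g. mean(na.rm = TRUE):
    # splice the lhs value in as its first argument and evaluate.
    new_call <- as.call(c(list(rhs_expr[[1]], lhs), as.list(rhs_expr)[-1]))
    eval(new_call, parent.frame())
  } else {
    # rhs is a function name or an anonymous function: apply it.
    rhs(lhs)
  }
}

r_call <- c(1, 3, NA) %then% mean(na.rm = TRUE)         # call
r_name <- c(1, 3, NA) %then% length                     # function name
r_anon <- c(1, 3, NA) %then% \(v) sum(v, na.rm = TRUE)  # anonymous function
```

A parenthesised function such as `(function(v) v)` would be misread as a call by this sketch, which hints at why a purely run-time rule is harder to specify than it looks.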





I think only allowing functions on the right hand side (e.g. only the |> 
operator and not the |:>) would be enough to handle most cases and seems 
easier to reason about. The limitations of that can easily be worked 
around using existing functionality in the language.


The problem with only allowing

x |> mean

and not

x |> mean()

is with additional arguments. However, this can be solved with a 
currying function, for example:


x |> curry(mean, na.rm = TRUE)

The cost is a few additional characters.

In the same way it is possible to write a function that accepts an 
expression and returns a function containing that expression. This can 
be used to have expressions on the right-hand side and reduces the need 
for anonymous functions.


x |> fexpr(. + 10)
dta |> fexpr(lm(y ~ x, data = .))

You could call this function .:

x |> .(. + 10)
dta |> .(lm(y ~ x, data = .))


Dummy example code (thanks to  a colleague of mine)


# Turn an expression in `.` into a function of `.`
fexpr <- function(expr){
  expr <- substitute(expr)
  f <- function(.) {}
  body(f) <- expr
  f
}
. <- fexpr

# Fix some arguments of fun; the piped value is passed first
curry <- function(fun,...){
  L <- list(...)
  function(...){
    do.call(fun, c(list(...),L))
  }
}

# Stand-in for the proposed pipe: always calls rhs(lhs)
`%|>%` <- function(e1, e2) {
  e2(e1)
}


1:10 %|>% mean
c(1,3,NA) %|>% curry(mean, na.rm = TRUE)
c(1,3,NA) %|>% .( mean(., na.rm = TRUE) ) %|>% identity
c(1,3,NA) %|>% .( . + 4)
c(1,3,NA) %|>% fexpr( . + 4)
c(1,3,NA) %|>% function(x) mean(x, na.rm = TRUE) %|>% fexpr(. + 1)

--
Jan



Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Gabor Grothendieck
On Wed, Dec 9, 2020 at 4:03 AM Timothy Goodman  wrote:
> But the bigger issue happens when I want to re-run just *part* of the
> pipeline.

Insert one of the following into the pipeline. It does not require that you
edit any lines.   It only involves inserting a line.

  print %>%
  { str(.); . } %>%
  { . ->> .save } %>%



Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Duncan Murdoch
The requirement for operators at the end of the line comes from the 
interactive nature of R.  If you type


my_data_frame_1

how could R know that you are not done, and are planning to type the 
rest of the expression


  %>% filter(some_conditions_1)
  ...

before it should consider the expression complete?  The way languages 
like C do this is by requiring a statement terminator at the end.  You 
can also do it by wrapping the entire thing in parentheses ().


However, be careful: Don't use braces:  they don't work.  And parens 
have the side effect of removing invisibility from the result (which is 
a design flaw or bonus, depending on your point of view).  So I actually 
wouldn't advise this workaround.
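
A small sketch of the three points above (parentheses work, braces do not, and parentheses make results visible), assuming R >= 4.1 for the native pipe; the data are made up:

```r
# Inside parentheses the parser keeps reading, so pipes may start a line.
leading <- (
  c(3, 1, 2)
  |> sort()
  |> rev()
)

# Inside braces each line is a complete expression, so the same layout fails.
braced <- try(parse(text = "{\n c(3, 1, 2)\n |> sort()\n}"), silent = TRUE)
braces_fail <- inherits(braced, "try-error")

# Parentheses also make an invisible value visible again.
visible_again <- withVisible((invisible(5)))$visible
```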


Duncan Murdoch


On 09/12/2020 12:45 a.m., Timothy Goodman wrote:

Hi,

I'm a data scientist who routinely uses R in my day-to-day work, for tasks
such as cleaning and transforming data, exploratory data analysis, etc.
This includes frequent use of the pipe operator from the magrittr and dplyr
libraries, %>%.  So, I was pleased to hear about the recent work on a
native pipe operator, |>.

This seems like a good time to bring up the main pain point I encounter
when using pipes in R, and some suggestions on what could be done about
it.  The issue is that the pipe operator can't be placed at the start of a
line of code (except in parentheses).  That's no different than any binary
operator in R, but I find it's a source of difficulty for the pipe because
of how pipes are often used.

[I'm assuming here that my usage is fairly typical of a lot of users; at
any rate, I don't think I'm *too* unusual.]

=== Why this is a problem ===

It's very common (for me, and I suspect for many users of dplyr) to write
multi-step pipelines and put each step on its own line for readability.
Something like this:

   ### Example 1 ###
   my_data_frame_1 %>%
 filter(some_conditions_1) %>%
 inner_join(my_data_frame_2, by = some_columns_1) %>%
 group_by(some_columns_2) %>%
 summarize(some_aggregate_functions_1) %>%
 filter(some_conditions_2) %>%
 left_join(my_data_frame_3, by = some_columns_3) %>%
 group_by(some_columns_4) %>%
 summarize(some_aggregate_functions_2) %>%
 arrange(some_columns_5)

[I guess some might consider this an overly long pipeline; for me it's
pretty typical.  I *could* split it up by assigning intermediate results to
variables, but much of the value I get from the pipe is that it lets my
code communicate which results are temporary, and which will be used again
later.  Assigning variables for single-use results would remove that
expressiveness.]

I would prefer (for reasons I'll explain) to be able to write the above
example like this, which isn't valid R:

   ### Example 2 (not valid R) ###
   my_data_frame_1
 %>% filter(some_conditions_1)
 %>% inner_join(my_data_frame_2, by = some_columns_1)
 %>% group_by(some_columns_2)
 %>% summarize(some_aggregate_functions_1)
 %>% filter(some_conditions_2)
 %>% left_join(my_data_frame_3, by = some_columns_3)
 %>% group_by(some_columns_4)
 %>% summarize(some_aggregate_functions_2)
 %>% arrange(some_columns_5)

One (minor) advantage is obvious: It lets you easily line up the pipes,
which means that you can see at a glance that the whole block is a single
pipeline, and you'd immediately notice if you inadvertently omitted a pipe,
which otherwise can lead to confusing output.  [It's also aesthetically
pleasing, especially when %>% is replaced with |>, but that's subjective.]

But the bigger issue happens when I want to re-run just *part* of the
pipeline.  I do this often when debugging: if the output of the pipeline
seems wrong, I re-run the first few steps and check the output, then
include a little more and re-run again, etc., until I locate my mistake.
Working in an interactive notebook environment, this involves using the
cursor to select just the part of the code I want to re-run.

It's fast and easy to select *entire* lines of code, but unfortunately with
the pipes placed at the end of the line I must instead select everything
*except* the last three characters of the line (the last two characters for
the new pipe).  Then when I want to re-run the same partial pipeline with
the next line of code included, I can't just press SHIFT+Down to select it
as I otherwise would, but instead must move the cursor horizontally to a
position three characters before the end of *that* line (which is generally
different due to varying line lengths).  And so forth each time I want to
include an additional line.

Moreover, with the staggered positions of the pipes at the end of each
line, it's very easy to accidentally select the final pipe on a line, and
then sit there for a moment wondering if the environment has stopped
responding before realizing it's just waiting for further input (i.e., for
the right-hand side).  These small delays and disruptions add up over the
course of a day.

This desire t

Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Stefan Evert
I'm not a pipe user, so I may be overlooking some issue, but wouldn't simply 
putting identity() on the last line solve your main problem?

### Example 1 ###
 my_data_frame_1 %>%
   filter(some_conditions_1) %>%
   inner_join(my_data_frame_2, by = some_columns_1) %>%
   group_by(some_columns_2) %>%
   summarize(some_aggregate_functions_1) %>%
   filter(some_conditions_2) %>%
   left_join(my_data_frame_3, by = some_columns_3) %>%
   group_by(some_columns_4) %>%
   summarize(some_aggregate_functions_2) %>%
   arrange(some_columns_5) %>%
   identity()
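
A minimal sketch of why the trailing identity() is harmless, assuming R >= 4.1 for the native pipe |> and using base functions (sort, rev) as stand-ins for the dplyr verbs above. Because every real step ends in a pipe, a step can be commented out without editing the line before it:

```r
x <- c(3, 1, 2)

# Full pipeline: identity() returns its argument unchanged, so it only
# serves as a fixed last line.
full <- x |>
  sort() |>
  rev() |>
  identity()

# Same pipeline with the rev() step commented out. No other line changes,
# because the remaining steps still end in a pipe.
partial <- x |>
  sort() |>
  # rev() |>
  identity()

print(full)     # 3 2 1
print(partial)  # 1 2 3
```

This only helps with dropping later steps; selecting a range of whole lines that stops before identity() still ends on a pipe.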

I agree that it would be visually more pleasing to have the pipe symbols lined 
up at the start of each line, but I don't think it's worth breaking R's 
principle of evaluating any line with a complete expression.

With your solution 1, R wouldn't be able to execute any complete command 
because it would have to wait and see if the next line happens to start with 
%>%.

With your solution 2, 
  
  my_data_frame_1 %>%

would be a complete expression (because an extra trailing %>% is allowed on the 
last line of a pipe) and hence execute immediately rather than wait for the 
next line.
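
The parsing behaviour behind both objections can be checked with parse() alone. This is only a sketch: n_exprs() is a helper name made up for the illustration, and x, f are never evaluated, so nothing needs to be loaded.

```r
# Count the top-level expressions the parser finds in a piece of source text.
n_exprs <- function(src) length(parse(text = src, keep.source = FALSE))

n_exprs("x +\n  1")     # 1: the trailing operator keeps the expression open
n_exprs("x\n  + 1")     # 2: "x" is complete, "+ 1" is a separate unary plus
n_exprs("(x\n  + 1)")   # 1: inside parentheses the line break is ignored

# A line starting with %>% is not silently treated as a second expression;
# it is a syntax error, since %>% has no unary form.
leading_pipe <- tryCatch(
  parse(text = "x\n  %>% f()", keep.source = FALSE),
  error = function(e) "syntax error"
)
print(leading_pipe)     # "syntax error"
```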

Best,
Stefan


> On 9 Dec 2020, at 06:45, Timothy Goodman  wrote:
> 
> Hi,
> 
> I'm a data scientist who routinely uses R in my day-to-day work, for tasks
> such as cleaning and transforming data, exploratory data analysis, etc.
> This includes frequent use of the pipe operator from the magrittr and dplyr
> libraries, %>%.  So, I was pleased to hear about the recent work on a
> native pipe operator, |>.
> 
> This seems like a good time to bring up the main pain point I encounter
> when using pipes in R, and some suggestions on what could be done about
> it.  The issue is that the pipe operator can't be placed at the start of a
> line of code (except in parentheses).  That's no different than any binary
> operator in R, but I find it's a source of difficulty for the pipe because
> of how pipes are often used.
> 
> [I'm assuming here that my usage is fairly typical of a lot of users; at
> any rate, I don't think I'm *too* unusual.]
> 
> === Why this is a problem ===
> 
> It's very common (for me, and I suspect for many users of dplyr) to write
> multi-step pipelines and put each step on its own line for readability.
> Something like this:
> 
>  ### Example 1 ###
>  my_data_frame_1 %>%
>    filter(some_conditions_1) %>%
>    inner_join(my_data_frame_2, by = some_columns_1) %>%
>    group_by(some_columns_2) %>%
>    summarize(some_aggregate_functions_1) %>%
>    filter(some_conditions_2) %>%
>    left_join(my_data_frame_3, by = some_columns_3) %>%
>    group_by(some_columns_4) %>%
>    summarize(some_aggregate_functions_2) %>%
>    arrange(some_columns_5)
> 
> [I guess some might consider this an overly long pipeline; for me it's
> pretty typical.  I *could* split it up by assigning intermediate results to
> variables, but much of the value I get from the pipe is that it lets my
> code communicate which results are temporary, and which will be used again
> later.  Assigning variables for single-use results would remove that
> expressiveness.]
> 
> I would prefer (for reasons I'll explain) to be able to write the above
> example like this, which isn't valid R:
> 
>  ### Example 2 (not valid R) ###
>  my_data_frame_1
>    %>% filter(some_conditions_1)
>    %>% inner_join(my_data_frame_2, by = some_columns_1)
>    %>% group_by(some_columns_2)
>    %>% summarize(some_aggregate_functions_1)
>    %>% filter(some_conditions_2)
>    %>% left_join(my_data_frame_3, by = some_columns_3)
>    %>% group_by(some_columns_4)
>    %>% summarize(some_aggregate_functions_2)
>    %>% arrange(some_columns_5)
> 
> One (minor) advantage is obvious: It lets you easily line up the pipes,
> which means that you can see at a glance that the whole block is a single
> pipeline, and you'd immediately notice if you inadvertently omitted a pipe,
> which otherwise can lead to confusing output.  [It's also aesthetically
> pleasing, especially when %>% is replaced with |>, but that's subjective.]
> 
> But the bigger issue happens when I want to re-run just *part* of the
> pipeline.  I do this often when debugging: if the output of the pipeline
> seems wrong, I re-run the first few steps and check the output, then
> include a little more and re-run again, etc., until I locate my mistake.
> Working in an interactive notebook environment, this involves using the
> cursor to select just the part of the code I want to re-run.
> 
> It's fast and easy to select *entire* lines of code, but unfortunately with
> the pipes placed at the end of the line I must instead select everything
> *except* the last three characters of the line (the last two characters for
> the new pipe).  Then when I want to re-run the same partial pipeline with
> the next line of code included, I can't just press SHIFT+Down to select it
> as I otherwise would, but instead must move the cursor horizontally to a
> position three characters before the end of

[Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Timothy Goodman
Hi,

I'm a data scientist who routinely uses R in my day-to-day work, for tasks
such as cleaning and transforming data, exploratory data analysis, etc.
This includes frequent use of the pipe operator from the magrittr and dplyr
libraries, %>%.  So, I was pleased to hear about the recent work on a
native pipe operator, |>.

This seems like a good time to bring up the main pain point I encounter
when using pipes in R, and some suggestions on what could be done about
it.  The issue is that the pipe operator can't be placed at the start of a
line of code (except in parentheses).  That's no different than any binary
operator in R, but I find it's a source of difficulty for the pipe because
of how pipes are often used.

[I'm assuming here that my usage is fairly typical of a lot of users; at
any rate, I don't think I'm *too* unusual.]

=== Why this is a problem ===

It's very common (for me, and I suspect for many users of dplyr) to write
multi-step pipelines and put each step on its own line for readability.
Something like this:

  ### Example 1 ###
  my_data_frame_1 %>%
    filter(some_conditions_1) %>%
    inner_join(my_data_frame_2, by = some_columns_1) %>%
    group_by(some_columns_2) %>%
    summarize(some_aggregate_functions_1) %>%
    filter(some_conditions_2) %>%
    left_join(my_data_frame_3, by = some_columns_3) %>%
    group_by(some_columns_4) %>%
    summarize(some_aggregate_functions_2) %>%
    arrange(some_columns_5)

[I guess some might consider this an overly long pipeline; for me it's
pretty typical.  I *could* split it up by assigning intermediate results to
variables, but much of the value I get from the pipe is that it lets my
code communicate which results are temporary, and which will be used again
later.  Assigning variables for single-use results would remove that
expressiveness.]

I would prefer (for reasons I'll explain) to be able to write the above
example like this, which isn't valid R:

  ### Example 2 (not valid R) ###
  my_data_frame_1
    %>% filter(some_conditions_1)
    %>% inner_join(my_data_frame_2, by = some_columns_1)
    %>% group_by(some_columns_2)
    %>% summarize(some_aggregate_functions_1)
    %>% filter(some_conditions_2)
    %>% left_join(my_data_frame_3, by = some_columns_3)
    %>% group_by(some_columns_4)
    %>% summarize(some_aggregate_functions_2)
    %>% arrange(some_columns_5)

One (minor) advantage is obvious: It lets you easily line up the pipes,
which means that you can see at a glance that the whole block is a single
pipeline, and you'd immediately notice if you inadvertently omitted a pipe,
which otherwise can lead to confusing output.  [It's also aesthetically
pleasing, especially when %>% is replaced with |>, but that's subjective.]

But the bigger issue happens when I want to re-run just *part* of the
pipeline.  I do this often when debugging: if the output of the pipeline
seems wrong, I re-run the first few steps and check the output, then
include a little more and re-run again, etc., until I locate my mistake.
Working in an interactive notebook environment, this involves using the
cursor to select just the part of the code I want to re-run.

It's fast and easy to select *entire* lines of code, but unfortunately with
the pipes placed at the end of the line I must instead select everything
*except* the last three characters of the line (the last two characters for
the new pipe).  Then when I want to re-run the same partial pipeline with
the next line of code included, I can't just press SHIFT+Down to select it
as I otherwise would, but instead must move the cursor horizontally to a
position three characters before the end of *that* line (which is generally
different due to varying line lengths).  And so forth each time I want to
include an additional line.

Moreover, with the staggered positions of the pipes at the end of each
line, it's very easy to accidentally select the final pipe on a line, and
then sit there for a moment wondering if the environment has stopped
responding before realizing it's just waiting for further input (i.e., for
the right-hand side).  These small delays and disruptions add up over the
course of a day.

This desire to select and re-run the first part of a pipeline is also the
reason why it doesn't suffice to achieve syntax like my "Example 2" by
wrapping the entire pipeline in parentheses.  That's of no use if I want to
re-run a selection that doesn't include the final close-paren.
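
To make the selection problem concrete, here is a rough sketch that asks the parser whether a given selection is a complete expression. The helper parses() and the names x, f, g are placeholders for this illustration only; nothing is evaluated, so no packages are needed.

```r
# TRUE if the text is syntactically complete R code, FALSE otherwise.
parses <- function(src) {
  res <- try(parse(text = src, keep.source = FALSE), silent = TRUE)
  !inherits(res, "try-error")
}

whole    <- "(x\n  %>% f()\n  %>% g())"  # the whole parenthesized pipeline
partial  <- "(x\n  %>% f()"              # first two lines only, no close-paren
trailing <- "x %>%\n  f() %>%"           # whole lines, pipes at end of line

parses(whole)     # TRUE
parses(partial)   # FALSE: the open parenthesis is never closed
parses(trailing)  # FALSE: the last pipe is still waiting for a right-hand side
```

In both failing cases an interactive session does not report an error straight away; it keeps waiting for more input, which is the pause described above.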

=== Possible Solutions ===

I can think of two, but maybe there are others.  The first would make
"Example 2" into valid code, and the second would allow you to run a
selection that included a trailing pipe.

  Solution 1: Add a special case to how R is parsed, so if the first
(non-whitespace) token after an end-line is a pipe, that pipe gets moved to
before the end-line.
- Argument for: This lets you write code like example 2, which
addresses the pain point around re-running part of a pipeline, and has
advantages for readability.  Also, s

Re: [Rd] Ignore Sites Option For libPaths

2020-12-09 Thread Martin Maechler
> Gabriel Becker 
> on Tue, 8 Dec 2020 17:09:57 -0800 writes:

> Of course you can, but the ability to do something via R
> code and the ability to do it by wrapping the invocation
> of R are not similar in terms of convenience, IMO.

> I say that as someone who routinely does both types of
> thing.

> ~G

I agree with Gabe here.
Also, R allows users to remove their own home directory; it
should also allow them to get a .libPaths() which contains nothing
compulsory but R's own .Library {as only that can contain 'base'!}
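
For reference, the current behaviour under discussion can be observed with a short sketch like the following. It assumes nothing beyond base R; the temporary directory is a made-up stand-in for a user library, and the original search path is restored at the end.

```r
old <- .libPaths()

# A throwaway directory standing in for a user-specified library.
extra <- tempfile("lib")
dir.create(extra)

# Setting a new path does not replace the search path: .Library (and any
# existing .Library.site entries) are appended again.
.libPaths(extra)
paths <- .libPaths()
base_lib_kept <- normalizePath(.Library, "/") %in% paths
print(paths)
print(base_lib_kept)  # TRUE

# Restore the previous search path and clean up.
.libPaths(old)
unlink(extra, recursive = TRUE)
```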

Martin


> On Tue, Dec 8, 2020 at 4:07 PM Dirk Eddelbuettel
>  wrote:

>> 
>> On 8 December 2020 at 23:00, Dario Strbenac wrote:
>> | Could .libPaths gain an option to ignore all values other
>> | than the user-specified new parameter? Currently, it
>> | takes the union of new and .Library and .Library.site and
>> | there is no way to turn it off.
>>
>> Are you sure? It is constructed from looking at
>> environment variables you could set.
>>
>> edd@rob:~$ R_LIBS="/tmp" R_LIBS_SITE="/var" Rscript -e 'print(.libPaths())'
>> [1] "/tmp" "/var" "/usr/lib/R/library"
>> edd@rob:~$
>> 
>> Dirk

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel