Re: [Rd] Improved Data Aggregation and Summary Statistics in R

2019-02-27 Thread Sebastian Martin Krantz
Dear Iñaki and Joris,

thank you for the positive feedback! I had attached a code file to the
post, but apparently it was removed.
I will attach it again to this e-mail, otherwise both vignette and code can
be downloaded from the following link:
https://www.dropbox.com/sh/s0k1tiz7el55g1q/AACpri-nruXjcMwUnNcHoycKa?dl=0
Best,
Sebastian

On Wed, 27 Feb 2019 at 11:14, Joris Meys  wrote:

> Dear Sebastian,
>
> Initially I was a bit hesitant to think about yet another way to summarize
> data, but your illustrations convinced me this is actually a great addition
> to the toolset currently available in different R packages. Many of us have
> written custom functions to get the required tables for specific data sets,
> but this would reduce that effort to simply using the right collap() call.
>
> Like Inaki, I'm very interested in trying it out if you have the code
> available somewhere.
>
> Cheers
> Joris
>
>
>
>
>
> On Wed, Feb 27, 2019 at 9:01 AM Sebastian Martin Krantz <
> sebastian.kra...@graduateinstitute.ch> wrote:
>
>> Dear Developers,
>>
>> Having spent time developing and thinking about how data aggregation and
>> summary statistics can be enhanced in R, I would like to present my
>> ideas/efforts in the form of two commands:
>>
>> The first, which for now I called 'collap', is an upgrade of aggregate
>> that
>> accommodates and extends the functionality of aggregate in various
>> respects, most importantly to work with multilevel and multi-type data,
>> multiple function calls, highly customized aggregation tasks, a much
>> greater flexibility in the passing of inputs and tidy output.
>>
>> The second function, 'qsu', is an advanced and flexible summary command
>> for
>> cross-sectional and multilevel (panel) data (i.e. it can provide overall,
>> between and within entities statistics, and allows for grouping, custom
>> functions and transformations). It also provides a quick method to compute
>> and output within-transformed data.
>>
>> Both commands are efficiently built from core R, but provide for optional
>> integration with data.table, which renders them extremely fast on large
>> datasets. An explanation of the syntax, a demonstration and benchmark
>> results are provided in the attached vignette.
>>
>> Since both commands accommodate existing functionality while adding
>> significant new functionality, I thought that their addition to the stats
>> package would be a worthwhile consideration. I am happy for your feedback.
>>
>> Best regards,
>>
>> Sebastian Krantz
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Department of Data Analysis and Mathematical Modelling
> Ghent University
> Coupure Links 653, B-9000 Gent (Belgium)
>
>
> ---
> Biowiskundedagen 2018-2019
> http://www.biowiskundedagen.ugent.be/
>
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>


Re: [Rd] Improved Data Aggregation and Summary Statistics in R

2019-02-28 Thread Sebastian Martin Krantz
Thanks to all who gave feedback so far. There is now a version of the
package on GitHub; it can be installed with

remotes::install_github("SebKrantz/collapse")

Further feedback is still very welcome!


On Wed, 27 Feb 2019 at 12:48, Duncan Murdoch 
wrote:

> On 26/02/2019 8:25 a.m., Sebastian Martin Krantz wrote:
> > Dear Developers,
> >
> > Having spent time developing and thinking about how data aggregation and
> > summary statistics can be enhanced in R, I would like to present my
> > ideas/efforts in the form of two commands:
> >
> > The first, which for now I called 'collap', is an upgrade of aggregate
> that
> > accommodates and extends the functionality of aggregate in various
> > respects, most importantly to work with multilevel and multi-type data,
> > multiple function calls, highly customized aggregation tasks, a much
> > greater flexibility in the passing of inputs and tidy output.
> >
> > The second function, 'qsu', is an advanced and flexible summary command
> for
> > cross-sectional and multilevel (panel) data (i.e. it can provide overall,
> > between and within entities statistics, and allows for grouping, custom
> > functions and transformations). It also provides a quick method to
> compute
> > and output within-transformed data.
> >
> > Both commands are efficiently built from core R, but provide for optional
> > integration with data.table, which renders them extremely fast on large
> > datasets. An explanation of the syntax, a demonstration and benchmark
> > results are provided in the attached vignette.
> >
> > Since both commands accommodate existing functionality while adding
> > significant new functionality, I thought that their addition to the
> > stats package would be a worthwhile consideration. I am happy for
> > your feedback.
>
> Generally the R Core group is reluctant to incorporate new functions
> into the base packages.  Each function that is added adds to their work,
> and they already have too much to do.  (I am no longer a member of R
> Core, but I don't think things have changed since I retired.)
>
> It is much easier for them if volunteers publish functions themselves,
> via contributed packages.
>
> Nowadays Github provides a very convenient platform on which you can
> develop a package containing your functions.  If other users find bugs
> or have suggested improvements, it's very easy for them to send those to
> you, and you can make the fixes available immediately.  Once you are
> satisfied that it is stable, you can submit it to CRAN, and anyone using
> R can easily install it.
>
> If you find the prospect of writing a package daunting, you shouldn't.
> It's actually quite easy, especially if you are using RStudio or ESS (or
> some other helpful front-end.)  Hadley Wickham's book
> <http://r-pkgs.had.co.nz/> is a pretty accessible description of a
> development strategy.  (It's not the only strategy, but lots of people
> use it.)
>
> Duncan Murdoch
>

[[alternative HTML version deleted]]



[Rd] base::order making available retGrp and sortStr options for radix method?

2020-05-08 Thread Sebastian Martin Krantz
Hi all,

A bit more than a month ago I released the 'collapse' package for
advanced and fast data transformation in R, with an array of fast grouped
and weighted functions and facilities for efficient grouped programming.

As I am preparing the next update of this package, I have come across the
following: for grouping, 'collapse' uses the function 'GRP', an efficient
wrapper around data.table:::forderv for fast radix-sort-based grouping. To
do this, the source code for forderv was copied and deparallelized. I have
now realized that an earlier deparallelized version of forderv is already
fully available in base R:
https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/main/radixsort.c

This function is called in base::order(..., method = "radix"). I was mildly
aware that data.table's ordering had made it into base R, but I first
thought the grouping feature of forder had been removed. In fact it is
there, but disabled. Lines 31-35 of base::order read:

  if (method == "radix") {
    decreasing <- rep_len(as.logical(decreasing), length(z))
    return(.Internal(radixsort(na.last, decreasing, FALSE,
                               TRUE, ...)))
  }

which is essentially return(.Internal(radixsort(na.last, decreasing, retGrp,
sortStr, ...))), with the retGrp argument (which returns the group starts
and the maximum group size) hard-coded to FALSE and sortStr hard-coded to
TRUE. Setting sortStr = FALSE would allow unordered groupings.

My request is to make these features available to the user. This would give
all developers access to extremely fast ordered-grouping facilities and
remove the need for people like myself to copy this source code. In R it
could be exposed through a simple function like:

radixorder <- function(..., na.last = TRUE, decreasing = FALSE,
                       retGrp = FALSE, sortStr = TRUE) {
  z <- list(...)
  decreasing <- rep_len(as.logical(decreasing), length(z))
  return(.Internal(radixsort(na.last, decreasing, retGrp, sortStr, ...)))
}

Alternatively, a macro in the C API analogous to R_orderVector, i.e. an
R_orderVectorRadix, would be great.
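For illustration, here is a self-contained version of the sketch with a
quick consistency check against base::order. The name radixorder and the
direct .Internal call are for demonstration only (packages cannot ship
.Internal calls); the argument order is taken from the base::order source
quoted above.

```r
# Demonstration only: calls the unexported .Internal(radixsort(...)) entry
# point directly, mirroring the base::order() source.
radixorder <- function(..., na.last = TRUE, decreasing = FALSE,
                       retGrp = FALSE, sortStr = TRUE) {
  z <- list(...)
  decreasing <- rep_len(as.logical(decreasing), length(z))
  .Internal(radixsort(na.last, decreasing, retGrp, sortStr, ...))
}

x <- c(2L, 2L, 1L, 3L, 1L)
o <- radixorder(x)                 # same ordering as order(x, method = "radix")
g <- radixorder(x, retGrp = TRUE)  # same ordering, plus group info as attributes
```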

Best regards,

Sebastian








[Rd] ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior

2021-06-29 Thread Sebastian Martin Krantz
Hello all, I'm working on some custom (grouped, weighted) sum, min and
max functions, and I want them to support the special case of plain integer
sequences using ALTREP. In doing so I encountered some behavior I cannot
explain. The head of my fsum C function looks like this (g is an
optional grouping vector, w an optional weights vector):

SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
  int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
      narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
  if (ALTREP(x) && ng == 0 && nwl) {
    switch(tx) {
      case INTSXP:  return ALTINTEGER_SUM(x, (Rboolean)narm);
      case LGLSXP:  return ALTLOGICAL_SUM(x, (Rboolean)narm);
      case REALSXP: return ALTREAL_SUM(x, (Rboolean)narm);
      default: error("ALTREP object must be integer or real typed");
    }
  }
  // ...
}

When I let x <- 1:1e8, fsum(x) works fine and returns the correct value. If
I now make this a matrix via dim(x) <- c(1e2, 1e6) and subsequently turn it
into a vector again via dim(x) <- NULL, fsum(x) gives NULL and a warning
message 'converting NULL pointer to R NULL'. For the functions fmin and fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this right away, e.g.
fmin(1:1e8) gives NULL and the warning 'converting NULL pointer to R NULL'.
So what is going on here? What do these functions return, and how do I make
this a robust implementation?

Best regards,

Sebastian Krantz




Re: [Rd] ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior

2021-06-29 Thread Sebastian Martin Krantz
Thanks both. Is there a suggested way I can get this speedup in a package?
Or just leave it for now?

Thanks also for the clarification Bill. The issue I have with that is that
in my C code ALTREP(x) evaluates to true even after adding and removing
dimensions (otherwise it would be handled by the normal sum method and I’d
be fine). Also .Internal(inspect(x)) still shows the compact
representation.

-Sebastian

On Tue 29. Jun 2021 at 19:43, Bill Dunlap  wrote:

> Adding the dimensions attribute takes away the altrep-ness.  Removing
> dimensions
> does not make it altrep.  E.g.,
>
> > a <- 1:10
> > am <- a ; dim(am) <- c(2L,5L)
> > amn <- am ; dim(amn) <- NULL
> > .Call("is_altrep", a)
> [1] TRUE
> > .Call("is_altrep", am)
> [1] FALSE
> > .Call("is_altrep", amn)
> [1] FALSE
>
> where is_altrep() is defined by the following C code:
>
> #include <R.h>
> #include <Rinternals.h>
>
> SEXP is_altrep(SEXP x)
> {
> return Rf_ScalarLogical(ALTREP(x));
> }
>
>
> -Bill
>
> On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
> sebastian.kra...@graduateinstitute.ch> wrote:
>
>> Hello together, I'm working on some custom (grouped, weighted) sum, min
>> and
>> max functions and I want them to support the special case of plain integer
>> sequences using ALTREP. I thereby encountered some behavior I cannot
>> explain to myself. The head of my fsum C function looks like this (g is
>> optional grouping vector, w is optional weights vector):
>>
>> SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
>>   int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
>> narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
>>   if(ALTREP(x) && ng == 0 && nwl) {
>> switch(tx) {
>> case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
>> case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
>> case REALSXP: return ALTREAL_SUM(x, (Rboolean)narm);
>> default: error("ALTREP object must be integer or real typed");
>> }
>>   }
>> // ...
>> }
>>
>> when I let x <- 1:1e8, fsum(x) works fine and returns the correct value.
>> If
>> I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this
>> into a vector again, dim(x) <- NULL, fsum(x) gives  NULL and a warning
>> message 'converting NULL pointer to R NULL'. For functions fmin and fmax
>> (similarly defined using ALTINTEGER_MIN/MAX), I get this error right away
>> e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
>> NULL'. So what is going on here? What do these functions return? And how
>> do
>> I make this a robust implementation?
>>
>> Best regards,
>>
>> Sebastian Krantz
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>




Re: [Rd] ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior

2021-06-29 Thread Sebastian Martin Krantz
Thanks Gabriel and Luke,

I understand now that the functions return NULL if no method is applicable.
I wonder, though, why ALTINTEGER_MIN and ALTINTEGER_MAX return NULL on a
plain integer sequence? I also see that min() and max() are not optimized,
i.e. min(1:1e8) appears to materialize the vector.

In general I expect my functions to mostly be applied to real data, so this
is not a huge issue for me (I'd rather remove the optimization again than
call sum() or risk that the macros are removed from the API), but it would
be nice to have this speedup available to packages. If these macros have
matured, and it can be made explicit that they return NULL if no method is
applicable, or, better, if they internally dispatch to a normal sum method
in that case, they could become very manageable and useful.

Best,

Sebastian



On Tue 29. Jun 2021 at 21:09, Gabriel Becker  wrote:

> Also, @Luke Tierney I can prepare a patch that has wrappers delegate to
> the payload's ALTREP class methods for things like sum, min, max, etc.
> once conference season calms down a bit.
>
> Best,
> ~G
>
> On Tue, Jun 29, 2021 at 11:07 AM Gabriel Becker 
> wrote:
>
>> Hi Sebastian,
>>
>> So the way it is currently factored, there isn't a good way of
>> getting what you want under the constraints of what Luke said
>> (ALTINTEGER_SUM is not part of the API).
>>
>> I don't know his reasons for saying that per se and would not want to
>> speak for him, but off the top of my head, I suspect it is because
>> ALTREP sum methods are allowed to return NULL (the C version) to say "I
>> don't have a sum method that is applicable here, please continue with the
>> normal code". So, just as an example, your exact code is likely to
>> segfault, I think, if you hit an ALTREP class that chooses not to
>> implement a sum method, because you'll be running around with a SEXP that
>> has the value NULL (the C one, not the R one).
>>
>> One thing you could do is check for altrep-ness and then construct and
>> evaluate a call to the R sum function in that case, but that probably
>> isn't quite what you want either, as this will hit the code you're trying
>> to bypass/speed up in the case where the ALTREP class doesn't implement a
>> sum method. I see that Luke just mentioned this as well, but I'll leave
>> it in since I had already typed it.
>>
>> I hope that helps clarify some things.
>>
>> Best,
>> ~G
>>
>>
>> On Tue, Jun 29, 2021 at 10:13 AM Sebastian Martin Krantz <
>> sebastian.kra...@graduateinstitute.ch> wrote:
>>
>>> Thanks both. Is there a suggested way I can get this speedup in a
>>> package?
>>> Or just leave it for now?
>>>
>>> Thanks also for the clarification Bill. The issue I have with that is
>>> that
>>> in my C code ALTREP(x) evaluates to true even after adding and removing
>>> dimensions (otherwise it would be handled by the normal sum method and
>>> I’d
>>> be fine). Also .Internal(inspect(x)) still shows the compact
>>> representation.
>>>
>>> -Sebastian
>>>
>>> On Tue 29. Jun 2021 at 19:43, Bill Dunlap 
>>> wrote:
>>>
>>> > Adding the dimensions attribute takes away the altrep-ness.  Removing
>>> > dimensions
>>> > does not make it altrep.  E.g.,
>>> >
>>> > > a <- 1:10
>>> > > am <- a ; dim(am) <- c(2L,5L)
>>> > > amn <- am ; dim(amn) <- NULL
>>> > > .Call("is_altrep", a)
>>> > [1] TRUE
>>> > > .Call("is_altrep", am)
>>> > [1] FALSE
>>> > > .Call("is_altrep", amn)
>>> > [1] FALSE
>>> >
>>> > where is_altrep() is defined by the following C code:
>>> >
>>> > #include <R.h>
>>> > #include <Rinternals.h>
>>> >
>>> > SEXP is_altrep(SEXP x)
>>> > {
>>> > return Rf_ScalarLogical(ALTREP(x));
>>> > }
>>> >
>>> >
>>> > -Bill
>>> >
>>> > On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
>>> > sebastian.kra...@graduateinstitute.ch> wrote:
>>> >
>>> >> Hello together, I'm working on some custom (grouped, weighted) sum,
>>> min
>>> >> and
>>> >> max functions and I want them to support the special case of plain
>>> integer
>>> >> sequences using ALTREP. I thereby encountered some behavior I cannot
>>> >> explain to myself. The head of my fsum C function looks like this (g

[Rd] pmin() and pmax() should process a single list of vectors, rather than returning it

2022-10-28 Thread Sebastian Martin Krantz
Dear R Core,

The {kit} package has a nice set of parallel statistical functions
complementing base R's pmin() and pmax(): psum(), pprod(), pmean(), etc.
These can be called on a set of vectors like pmin() and pmax(), e.g.
with(mtcars, psum(mpg, carb, wt)), or on a single list of vectors, e.g.
psum(mtcars). In contrast, pmin() and pmax() only allow the former. Calling
pmax(mtcars) oddly returns mtcars as is, without giving any error or
warning. I think this behavior should be changed to bring it in line with
the kit versions.

kit::psum is defined as:

psum <- function(..., na.rm = FALSE)
  .Call(CpsumR, na.rm,
        if (...length() == 1L && is.list(..1)) ..1 else list(...))

The first line of pmin() and pmax() is elts <- list(...). I propose
changing that first line to:

elts <- if (...length() == 1L && is.list(..1)) unclass(..1) else list(...)

This will provide convenient functionality (do.call(pmax, mtcars) is
inconvenient) and guard against the odd behavior of simply returning a
list passed to these functions.
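In the meantime, the proposed semantics can be emulated with a small
wrapper. The name pmax2 below is hypothetical; the first expression just
documents the current pass-through behavior described above:

```r
# Current behavior: a single data.frame is passed through unchanged
identical(pmax(mtcars), mtcars)  # TRUE, no error or warning

# Hypothetical wrapper implementing the proposed first line: a single
# list argument is unclassed and treated as the set of parallel vectors
pmax2 <- function(..., na.rm = FALSE) {
  if (...length() == 1L && is.list(..1))
    do.call(pmax, c(unclass(..1), na.rm = na.rm))
  else pmax(..., na.rm = na.rm)
}

res <- pmax2(mtcars)  # row-wise maxima across all columns of mtcars
```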

Best regards,

Sebastian Krantz




Re: [Rd] Multiple Assignment built into the R Interpreter?

2023-03-11 Thread Sebastian Martin Krantz
Thanks Duncan and Ivan for the careful thoughts. I'm not sure I can follow
all the aspects you raised, but to give my limited take on a few:

> your proposal violates a very basic property of the language, i.e. that
> all statements are expressions and have a value.
> What's the value of 1 + (A, C = init_matrices()).

I'm not sure I see the point here. I evaluated  1 + (d = dim(mtcars);
nr = d[1]; nc = d[2]; rm(d)), which simply gives a syntax error, as
the above expression should. `%=%` assigns to
environments, so 1 + (c("A", "C") %=% init_matrices()) returns
numeric(0), with A and C having their values assigned.

> suppose f() returns list(A = 1, B = 2) and I do
>  B, A <- f()
> Should assignment be by position or by name?

In other languages this is by position. The feature is not meant to
replace list2env(), and being able to rename objects in the assignment
is a vital feature of code using multi-input and multi-output functions,
e.g. in Matlab or Julia.

> Honestly, given that this is simply syntactic sugar, I don't think I would 
> support it.

You can call it that, but it would be used by almost every R user
almost every day, for simple things like nr, nc = dim(x) or values,
vectors = eigen(x), where the creation of intermediate objects
is cumbersome and redundant.
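For readers without the package at hand, the semantics of collapse::`%=%`
can be sketched in a few lines of base R. This is a simplified illustration
only: the real operator is implemented in C, and (as noted above) returns
numeric(0) rather than an invisible NULL.

```r
# Simplified sketch of a positional multiple-assignment operator in the
# spirit of collapse::`%=%`: assigns values[[i]] to nms[i] in the caller
`%=%` <- function(nms, values) {
  stopifnot(is.character(nms), length(nms) == length(values))
  env <- parent.frame()
  for (i in seq_along(nms)) assign(nms[i], values[[i]], envir = env)
  invisible(NULL)
}

c("nr", "nc") %=% dim(mtcars)
nr  # 32
nc  # 11
```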

> I see you've already mentioned it ("JavaScript-like"). I think it would  
> fulfil Sebastian's requirements too, as long as it is considered "true 
> assignment" by the rest of the language.

I don't have strong opinions about how the issue is phrased or
implemented. Something like [t, n] = dim(x) might even be clearer.
It is important, though, that assignment remains by position,
so even if some output gets thrown away, that should also be positional.

>  A <- 0
>  [A, B = A + 10] <- list(1, A = 2)

I also fail to see the use of allowing this. Something like this is an error:

> A = 2
> (B = A + 1) <- 1
Error in (B = A + 1) <- 1 : could not find function "(<-"

Regarding the practical implementation, I think `collapse::%=%` is a
good starting point. It could be introduced in R as a separate
function, or `=` could be modified to accommodate its capability. It
should be clear that with more than one LHS variable the assignment is an
environment-level operation, and the results can only be used in
computations once assigned to the environment; e.g. in
1 + (c("A", "C") %=% init_matrices()), A and C are not available for the
addition in this statement. The interpreter would then need to be modified
to read something like nr, nc = dim(x) or [nr, nc] = dim(x) as an
environment-level multiple assignment operation with no immediate value.
This appears very feasible to my limited understanding, but I guess there
are other things to consider still. I definitely appreciate the responses
so far though.

Best regards,

Sebastian





On Sat, 11 Mar 2023 at 20:38, Duncan Murdoch 
wrote:

> On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
> > On Sat, 11 Mar 2023 11:11:06 -0500
> > Duncan Murdoch  wrote:
> >
> >> That's clear, but your proposal violates a very basic property of the
> >> language, i.e. that all statements are expressions and have a value.
> >
> > How about reframing this feature request from multiple assignment
> > (which does go contrary to "everything has only one value, even if it's
> > sometimes invisible(NULL)") to "structured binding" / "destructuring
> > assignment" [*], which takes this single single value returned by the
> > expression and subsets it subject to certain rules? It may be easier to
> > make a decision on the semantics for destructuring assignment (e.g.
> > languages which have this feature typically allow throwing unneeded
> > parts of the return value away), and it doesn't seem to break as much
> > of the rest of the language if implemented.
> >
> > I see you've already mentioned it ("JavaScript-like"). I think it would
> > fulfil Sebastian's requirements too, as long as it is considered "true
> > assignment" by the rest of the language.
> >
> > The hard part is to propose the actual grammar of the new feature (in
> > terms of src/main/gram.y, preferably without introducing conflicts) and
> > its semantics (including the corner cases, some of which you have
> > already mentioned). I'm not sure I'm up to the task.
> >
>
> If I were doing it, here's what I'd propose:
>
>'[' formlist ']' LEFT_ASSIGN expr
>'[' formlist ']' EQ_ASSIGN expr
>expr RIGHT_ASSIGN  '[' formlist ']'
>
> where `formlist` has the syntax of the formals list for a function
> definition.  This would have the following semantics:
>
> {
>   *tmp* <- expr
>
>   # For arguments with no "default" expression,
>
>   argname1 <- *tmp*[[1]]
>   argname2 <- *tmp*[[2]]
>   ...
>
>   # For arguments with a default listed
>
>   argname3 <- with(*tmp*, default3)
> }
>
>
> The value of the whole thing would therefore be (invisibly) the value of
> the last item in the assignment.
>
> Two examples:
>
>[A, B, C] <- expr   # assign the 

Re: [Rd] Multiple Assignment built into the R Interpreter?

2023-03-12 Thread Sebastian Martin Krantz
Kevin's package is very nice as a proof of concept, no doubt about that,
but it is not at the level of performance or convenience that a native R
implementation would offer. I would probably not use it to translate Matlab
routines into R packages placed on CRAN, because it is an additional
dependency, it imposes a performance burden in every iteration, and
utils::globalVariables() is anything but elegant. From that perspective
it would be more convenient for me right now to stick with collapse::%=%,
which is already written in C, and also call
utils::globalVariables().

But again, my hope in starting this was that R Core might see that the
addition of multiple assignment would be a significant enhancement to the
language, of the same order as the base pipe |>, in my opinion.

I think the discussion so far has at least brought forth a way to implement
this in a way that does not violate fundamental principles of the language.
Which could form a basis for thinking about an actual addition to the
language.

Best regards,

Sebastian


On Sun 12. Mar 2023 at 13:18, Duncan Murdoch 
wrote:

> On 12/03/2023 6:07 a.m., Sebastian Martin Krantz wrote:
> > Thinking more about this, and seeing Kevin's examples at
> > https://github.com/kevinushey/dotty, I think this is the most R-like
> > way of doing it,
> > with an additional benefit as it would allow to introduce the useful
> > data.table semantics DT[, .(a = b, c, d)] to more general R. So I would
> > propose to
> > introduce a new primitive function . <- function(...) .Primitive(".") in
> > R with an assignment method and the following features:
>
> I think that proposal is very unlikely to be accepted.  If it was a
> primitive function, it could only be maintained by R Core.  They are
> justifiably very reluctant to take on extra work for themselves.
>
> Kevin's package demonstrates that this can be done entirely in a
> contributed package, which means there's no need for R Core to be
> involved.  I don't know if he has plans to turn his prototype into a
> CRAN package.  If he doesn't, then it will be up to some other
> interested maintainer to step up and take on the task, or it will just
> fade away.
>
> I haven't checked whether your proposals below represent changes from
> the current version of dotty, but if they do, the way to proceed is to
> fork that project, implement your changes, and offer to contribute them
> back to the main branch.
>
> Duncan Murdoch
>
>
>
> >
> >   * Positional assignment e.g. .[nr, nc] <- dim(x), and named assignment
> > e.g. .[new = carb] <- mtcars or .[new = log(carb)] <- mtcars. All
> > the functionality proposed by Kevin at
> > https://github.com/kevinushey/dotty is useful, unambiguous and
> > feasible.
> >   * Silent dropping of RHS values e.g. .[mpg_new, cyl_new] <- mtcars.
> >   * Mixing of positional and named assignment e.g .[mpg_new, carb_new =
> > carb, cyl_new] <- mtcars. The inputs not assigned by name are simply
> > the elements of RHS in the order they occur, regardless of whether
> > they have been used previously e.g. .[mpg_new, cyl_new = cyl,
> > log_cyl = log(cyl), cyl_new2] <- mtcars is feasible. RHS here could
> > be any named vector type.
> >   * Conventional use of the function as a lazy version of list(), as in
> > data.table: .(A = B, C, D) is the same as list(A = B, C = C, D = D).
> > This would also be useful, allowing more parsimonious code, and
> > avoid the need to assign names to all return values in a function
> > return, e.g. if I already have matrices A, C, Q and R as internal
> > objects in my function, I can simply end by return(.(A, C, Q, R))
> > instead of return(list(A = A, C = C, Q = Q, R = R)) if I wanted the
> > list to be named with the object names.
> >
> > The implementation of this in R and C should be pretty straightforward.
> > It would just require a modification to R CMD Check to recognize .[<- as
> > assignment.
> >
> > Best regards,
> >
> > Sebastian
> > -
> > 2.)
> >
> > On Sun, 12 Mar 2023 at 09:42, Sebastian Martin Krantz
> >  > <mailto:sebastian.kra...@graduateinstitute.ch>> wrote:
> >
> > Thanks Gabriel and Kevin for your inputs,
> >
> > regarding your points Gabriel, I think Python and Julia do allow
> > multiple sub-assignment, but in-line with my earlier suggestion in
> > response to Duncan to make multiple assignment an environment-level
> > operation (like collapse::%=% curren

Re: [Rd] Multiple Assignment built into the R Interpreter?

2023-03-12 Thread Sebastian Martin Krantz
Thinking more about this, and seeing Kevin's examples at
https://github.com/kevinushey/dotty, I think this is the most R-like way of
doing it, with an additional benefit, as it would allow introducing the
useful data.table semantics DT[, .(a = b, c, d)] to R more generally. So I
would propose to introduce a new primitive function
. <- function(...) .Primitive(".") in R, with an assignment method and the
following features:

   - Positional assignment e.g. .[nr, nc] <- dim(x), and named assignment
   e.g. .[new = carb] <- mtcars or .[new = log(carb)] <- mtcars. All the
   functionality proposed by Kevin at https://github.com/kevinushey/dotty
   is useful, unambiguous and feasible.
   - Silent dropping of RHS values e.g. .[mpg_new, cyl_new] <- mtcars.
   - Mixing of positional and named assignment e.g .[mpg_new, carb_new =
   carb, cyl_new] <- mtcars. The inputs not assigned by name are simply the
   elements of RHS in the order they occur, regardless of whether they have
   been used previously e.g. .[mpg_new, cyl_new = cyl, log_cyl = log(cyl),
   cyl_new2] <- mtcars is feasible. RHS here could be any named vector type.
   - Conventional use of the function as a lazy version of list(), as in
   data.table: .(A = B, C, D) is the same as list(A = B, C = C, D = D). This
   would also be useful, allowing more parsimonious code, and avoid the need
   to assign names to all return values in a function return, e.g. if I
   already have matrices A, C, Q and R as internal objects in my function, I
   can simply end by return(.(A, C, Q, R)) instead of return(list(A = A, C =
   C, Q = Q, R = R)) if I wanted the list to be named with the object names.
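The last bullet can be prototyped in plain R today; dotlist below is a
hypothetical stand-in for the proposed `.`:

```r
# Sketch of the proposed lazy list() semantics: unnamed arguments are
# auto-named by deparsing their own expressions, as in data.table's .()
dotlist <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1L]  # unevaluated expressions
  vals  <- list(...)
  nms   <- names(exprs)
  if (is.null(nms)) nms <- character(length(exprs))
  auto  <- !nzchar(nms)                         # arguments passed without a name
  nms[auto] <- vapply(exprs[auto], deparse, character(1))
  names(vals) <- nms
  vals
}

B <- 2; C <- 3; D <- 4
out <- dotlist(A = B, C, D)  # same as list(A = B, C = C, D = D)
```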

The implementation of this in R and C should be pretty straightforward. It
would just require a modification to R CMD check to recognize .[<- as
assignment.
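A minimal sketch of the destructuring part is in fact possible in base R
today. This mirrors what the dotty prototype does, but the implementation
below is illustrative, not Kevin's actual code, and handles simple names
only:

```r
# An empty object of class "dot" purely for method dispatch
. <- structure(list(), class = "dot")

# .[a, b, ...] <- value assigns value[[1]], value[[2]], ... to a, b, ...
# in the caller's environment; the names are captured unevaluated, so the
# variables need not exist beforehand
`[<-.dot` <- function(x, ..., value) {
  syms <- as.list(substitute(c(...)))[-1L]
  nms  <- vapply(syms, as.character, character(1))
  env  <- parent.frame()
  for (i in seq_along(nms)) assign(nms[i], value[[i]], envir = env)
  x  # `.` itself is returned unchanged by the replacement
}

.[nr, nc] <- dim(mtcars)
c(nr, nc)  # 32 11
```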

Best regards,

Sebastian
-
2.)

On Sun, 12 Mar 2023 at 09:42, Sebastian Martin Krantz <
sebastian.kra...@graduateinstitute.ch> wrote:

> Thanks Gabriel and Kevin for your inputs,
>
> regarding your points Gabriel, I think Python and Julia do allow multiple
> sub-assignment, but in-line with my earlier suggestion in response to
> Duncan to make multiple assignment an environment-level operation (like
> collapse::%=% currently works),  this would not be possible in R.
>
> Regarding the [a] <- coolest_function() syntax, yeah, it would mean doing
> multiple assignment and setting a equal to the first element, dropping all
> other elements. Multiple assignment should be positional like in other
> languages, enabling flexible renaming of objects on the fly. So it should
> be irrelevant whether the function returns a named or unnamed list or
> vector.
>
> Thanks also Kevin for this contribution. I think it’s a remarkable effort,
> and I wouldn’t mind such semantics e.g. making it a function call to ‘.[‘
> or any other one-letter function, as long as it’s coded in C and recognized
> by the interpreter as an assignment operation.
>
> Best regards,
>
> Sebastian
>
>
>
>
>
> On Sun 12. Mar 2023 at 01:00, Kevin Ushey  wrote:
>
>> FWIW, it's possible to get fairly close to your proposed semantics
>> using the existing metaprogramming facilities in R. I put together a
>> prototype package here to demonstrate:
>>
>> https://github.com/kevinushey/dotty
>>
>> The package exports an object called `.`, with a special `[<-.dot` S3
>> method which enables destructuring assignments. This means you can
>> write code like:
>>
>> .[nr, nc] <- dim(mtcars)
>>
>> and that will define 'nr' and 'nc' as you expect.
>>
>> As for R CMD check warnings, you can suppress those through the use of
>> globalVariables(), and that can also be automated within the package.
>> The 'dotty' package includes a function 'dotify()' which automates
>> looking for such usages in your package, and calling globalVariables()
>> so that R CMD check doesn't warn. In theory, a similar technique would
>> be applicable to other packages defining similar operators (zeallot,
>> collapse).
>>
>> Obviously, globalVariables() is a very heavy hammer to swing for this
>> issue, but you might consider the benefits worth the tradeoffs.
>>
>> Best,
>> Kevin
>>
>> On Sat, Mar 11, 2023 at 2:53 PM Duncan Murdoch 
>> wrote:
>> >
>> > On 11/03/2023 4:42 p.m., Sebastian Martin Krantz wrote:
>> > > Thanks Duncan and Ivan for the careful thoughts. I'm not sure I can
>> > > follow all aspects you raised, but to give my limited take on a few:
>> > >
>> > >> your proposal violates a very basic property of the  language, i.e.
>> that all statements are expressions and have a value.  > What's the value
>

Re: [Rd] Multiple Assignment built into the R Interpreter?

2023-03-12 Thread Sebastian Martin Krantz
Thanks Gabriel and Kevin for your inputs.

Regarding your points, Gabriel: I think Python and Julia do allow multiple
sub-assignment, but in line with my earlier suggestion in response to
Duncan to make multiple assignment an environment-level operation (like
collapse::%=% currently works), this would not be possible in R.

Regarding the [a] <- coolest_function() syntax: yes, it would perform
multiple assignment and set a equal to the first element, dropping all
other elements. Multiple assignment should be positional, like in other
languages, enabling flexible renaming of objects on the fly, so it should be
irrelevant whether the function returns a named or unnamed list or vector.
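
For reference, the environment-level, purely positional behaviour described
here can be sketched in a few lines of R (a simplified illustration in the
spirit of collapse::%=%, not its actual C implementation):

```r
# Simplified sketch of a positional multiple-assignment operator:
# assigns rhs[[i]] to the name lhs[i] in the calling environment,
# ignoring any names on the right-hand side.
`%=%` <- function(lhs, rhs) {
  stopifnot(length(lhs) == length(rhs))
  env <- parent.frame()
  for (i in seq_along(lhs)) assign(lhs[i], rhs[[i]], envir = env)
  invisible(NULL)
}

c("nr", "nc") %=% dim(mtcars)
# nr is now 32 and nc is 11, regardless of any names on the rhs
```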

Thanks also, Kevin, for this contribution. I think it's a remarkable
effort, and I wouldn't mind such semantics, e.g. making it a function call
to `.[` or any other one-letter function, as long as it's coded in C and
recognized by the interpreter as an assignment operation.
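
Such `.[` semantics can be sketched with an S3 replacement method; this is
a hypothetical simplification for illustration, not the actual dotty
implementation discussed below:

```r
# Sketch of destructuring assignment via a replacement method:
# .[nr, nc] <- dim(mtcars) dispatches to `[<-.dot`, which never evaluates
# nr and nc but deparses them and assigns positionally into the caller.
. <- structure(list(), class = "dot")

`[<-.dot` <- function(x, ..., value) {
  targets <- vapply(as.list(substitute(list(...)))[-1L],
                    deparse, character(1))
  env <- parent.frame()
  for (i in seq_along(targets)) assign(targets[i], value[[i]], envir = env)
  x  # `.` itself is returned unchanged by the assignment
}

.[nr, nc] <- dim(mtcars)
# nr and nc are now 32 and 11
```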

Best regards,

Sebastian





On Sun 12. Mar 2023 at 01:00, Kevin Ushey  wrote:

> FWIW, it's possible to get fairly close to your proposed semantics
> using the existing metaprogramming facilities in R. I put together a
> prototype package here to demonstrate:
>
> https://github.com/kevinushey/dotty
>
> The package exports an object called `.`, with a special `[<-.dot` S3
> method which enables destructuring assignments. This means you can
> write code like:
>
> .[nr, nc] <- dim(mtcars)
>
> and that will define 'nr' and 'nc' as you expect.
>
> As for R CMD check warnings, you can suppress those through the use of
> globalVariables(), and that can also be automated within the package.
> The 'dotty' package includes a function 'dotify()' which automates
> looking for such usages in your package, and calling globalVariables()
> so that R CMD check doesn't warn. In theory, a similar technique would
> be applicable to other packages defining similar operators (zeallot,
> collapse).
>
> Obviously, globalVariables() is a very heavy hammer to swing for this
> issue, but you might consider the benefits worth the tradeoffs.
>
> Best,
> Kevin
>
> On Sat, Mar 11, 2023 at 2:53 PM Duncan Murdoch 
> wrote:
> >
> > On 11/03/2023 4:42 p.m., Sebastian Martin Krantz wrote:
> > > Thanks Duncan and Ivan for the careful thoughts. I'm not sure I can
> > > follow all aspects you raised, but to give my limited take on a few:
> > >
> > >> your proposal violates a very basic property of the language, i.e.
> > >> that all statements are expressions and have a value.
> > >> What's the value of 1 + (A, C = init_matrices())?
> > >
> > > I'm not sure I see the point here. I evaluated 1 + (d = dim(mtcars); nr
> > > = d[1]; nc = d[2]; rm(d)), which simply gives a syntax error,
> >
> >
> >d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)
> >
> > is not a statement, it is a sequence of 4 statements.
> >
> > Duncan Murdoch
> >
> > > as the
> > > above expression should. `%=%` assigns to
> > > environments, so 1 + (c("A", "C") %=% init_matrices()) returns
> > > numeric(0), with A and C having their values assigned.
> > >
> > >> suppose f() returns list(A = 1, B = 2) and I do
> > >> B, A <- f()
> > >> Should assignment be by position or by name?
> > >
> > > In other languages this is by position. The feature is not meant to
> > > replace list2env(), and being able to rename objects in the assignment
> > > is a vital feature of code
> > > using multi-input and multi-output functions, e.g. in Matlab or Julia.
> > >
> > >> Honestly, given that this is simply syntactic sugar, I don't think I
> > >> would support it.
> > >
> > > You can call it that, but it would be used by almost every R user
> > > almost every day. Simple things like nr, nc = dim(x); values, vectors =
> > > eigen(x), etc., where the creation of intermediate objects
> > > is cumbersome and redundant.
> > >
> > >> I see you've already mentioned it ("JavaScript-like"). I think it
> > >> would fulfil Sebastian's requirements too, as long as it is considered
> > >> "true assignment" by the rest of the language.
> > >
> > > I don't have strong opinions about how the issue is phrased or
> > > implemented. Something like [t, n] = dim(x) might even be clearer.
> > > It's important, though, that assignment remains by position,
> > > so even if some output gets thrown away, that should also be positional.
> > >
> > >> A <- 0
> > >> [A, B = A + 10] <- list(1, A = 2)
> > >
> > > I also fail to see the use

[Rd] Multiple Assignment built into the R Interpreter?

2023-03-11 Thread Sebastian Martin Krantz
Dear R Core,

working on my dynamic factor modelling package, which requires several
subroutines to create and update several system matrices, I come back to
the issue of being annoyed that R does not support multiple assignment out
of the box like Matlab, Python and Julia do, e.g. something like

A, C, Q, R = init_matrices(X, Y, Z)

would be a great addition to the language. I know there are several
workarounds, such as the %<-% operator in the zeallot package or my own %=%
operator in collapse, but these don't work well for package development, as
R CMD check warns about missing global bindings for the created variables;
e.g. I would have to use

A <- C <- Q <- R <- NULL
.c(A, C, Q, R) %=% init_matrices(X, Y, Z)

in a package, which is simply annoying. Of course the standard way of

init <- init_matrices(X, Y, Z)
A <- init$A; C <- init$C; Q <- init$Q; R <- init$R
rm(init)

is also super cumbersome compared to Python or Julia. Another reason is of
course performance: even my %=% operator written in C has a non-negligible
performance cost in very tight loops, compared to a solution at the
interpreter level or in a primitive function such as `=`.
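
As an aside, base R's list2env() can shorten the standard workaround above,
though it assigns by name rather than by position and does not address the
performance point (the init_matrices() below is a hypothetical stand-in for
the real routine):

```r
# Base-R workaround: unpack a named return list directly into the calling
# environment; renaming on the fly is not possible.
init_matrices <- function() {  # hypothetical stand-in
  list(A = diag(2), C = diag(2), Q = diag(2), R = diag(2))
}
invisible(list2env(init_matrices(), envir = environment()))
# A, C, Q and R now exist as separate objects
```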

So my conclusion at this point is that it is just significantly easier to
implement such code in Julia, in addition to the greater performance it
offers. There are obvious reasons why I am still coding in R and C, thanks
to the robust API and great ecosystem of packages, but adding this could be
a presumably low-hanging fruit that would make my life a bit easier. Several
questions about this have been asked on Stack Overflow; the most popular one (
https://stackoverflow.com/questions/7519790/assign-multiple-new-variables-on-lhs-in-a-single-line)
has been viewed 77 thousand times.

But maybe this has already been discussed here and decided against. In that
case, a way to browse the R-devel archives to find out would be nice.

Best regards,

Sebastian


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel