Re: [Rd] Is ALTREP "non-API"?

2024-04-22 Thread Gabriel Becker
Hi Yutani,

The headers have been updated by Luke Tierney: ALTREP is an *experimental*
API, in that it is an official API that is legal for packages to use, but
it may change on short notice as the framework is further developed.

Hope that helps,
~G

On Mon, Apr 22, 2024 at 4:46 AM Hiroaki Yutani  wrote:

> Thanks for your convincing comment, but it seems the R core team has a
> different opinion...
> A few hours ago, src/include/R_ext/Altrep.h got this comment:
>
> /*
>Not part of the API, subject to change at any time.
> */
>
> commit:
> https://github.com/r-devel/r-svn/commit/2059bffde642f8426d1f39ab5dd995d19a575d4d
>
> While I'm glad to see their attempt to make it clear, I'm confused. That
> commit marks many other files as "not API," but I think it's a bit
> inconsistent with what Writing R Extensions says.
>
> For example, src/include/R_ext/Parse.h got a comment "So not API," but the
> entry point R_ParseVector is explained in Writing R Extensions[1]. So, I
> believe it's clearly an "API" both in the sense of WRE's dialect and in an
> ordinary sense. Which should I believe? WRE? The source code?
>
> It might be just a coincidence, but I'm sorry if my question drove the R
> core team to such a hasty clarification. I just wanted to discuss how
> to fix the current inconsistencies.
>
> I think the R core team needs a proper definition of "API" first. In my
> opinion, it makes little sense to call something "non-API" just to signal
> the possibility of future breaking changes. Whether you call it API or
> non-API, clever users will still accept breaking changes to it if they are
> reasonable. For example, how about "experimental API" or "unstable API"?
> They sound better to me.
>
> Best,
> Yutani
>
> [1]:
> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Parsing-R-code-from-C
>
>
> 2024年4月22日(月) 16:37 Gabriel Becker :
>
>> Hi Yutani,
>>
>> ALTREP is part of the official R API, as illustrated by the presence of
>> src/include/R_ext/Altrep.h. Everything declared in the header files in that
>> directory is official API AFAIK (and I believe that is more definitive than
>> the manuals).
>>
>> The documentation of ALTREP has unfortunately lagged behind its
>> implementation, which may be partially my fault for not submitting doc
>> patches for it against the manuals. Sorry for my contribution to that; I'll
>> see if I can loop back around to contributing documentation for ALTREP.
>>
>> Best,
>> ~G
>>
>> On Sun, Apr 21, 2024 at 6:36 PM Hiroaki Yutani 
>> wrote:
>>
>>> Thanks, Hernando,
>>>
>>> Sorry, "API" is a bit confusing term in this context, but what I want to
>>> discuss is the "API" that Writing R Extension defines as quoted in my
>>> previous email. It's probably different from an ordinary sense when we
>>> casually say "R C API".
>>>
>>> You might wonder why I care about such a difference. This is because
>>> calling a "non-API" entry point is considered a violation of the CRAN
>>> repository policy, which means CRAN will kick out the R package. I know
>>> many CRAN packages use ALTREP, but just being accepted by CRAN at the
>>> moment doesn't mean CRAN will keep accepting it. So, I want to clarify
>>> the current status of ALTREP.
>>>
>>> Best,
>>> Yutani
>>>
>>> 2024年4月22日(月) 10:17 :
>>>
>>> > Hello, I don't believe it is illegal, as ALTREP "implements an
>>> abstraction
>>> > underneath the C API". And is "compatible with all code which uses the
>>> > API".
>>> >
>>> > Please see slide deck by Gabriel Becker,  with L Tierney, M Lawrence
>>> and T
>>> > Kalibera.
>>> >
>>> >
>>> >
>>> > https://bioconductor.org/help/course-materials/2020/BiocDevelForum/16-ALTREP.pdf
>>> >
>>> > The ALTREP framework implements an abstraction underneath the
>>> > traditional R C API
>>> > - Generalizes what's underneath the API
>>> > - Without changing how data are accessed
>>> > - Compatible with all C code which uses the API
>>> > - Compatible with R internals
>>> >
>>> >
>>> > I hope this helps,
>>> > Hernando
>>> >
>>> >
>>> > -Original

Re: [Rd] Is ALTREP "non-API"?

2024-04-22 Thread Gabriel Becker
Hi Yutani,

ALTREP is part of the official R API, as illustrated by the presence of
src/include/R_ext/Altrep.h. Everything declared in the header files in that
directory is official API AFAIK (and I believe that is more definitive than
the manuals).

The documentation of ALTREP has unfortunately lagged behind its
implementation, which may be partially my fault for not submitting doc
patches for it against the manuals. Sorry for my contribution to that; I'll
see if I can loop back around to contributing documentation for ALTREP.

Best,
~G

On Sun, Apr 21, 2024 at 6:36 PM Hiroaki Yutani  wrote:

> Thanks, Hernando,
>
> Sorry, "API" is a bit confusing term in this context, but what I want to
> discuss is the "API" that Writing R Extension defines as quoted in my
> previous email. It's probably different from an ordinary sense when we
> casually say "R C API".
>
> You might wonder why I care about such a difference. This is because
> calling a "non-API" entry point is considered a violation of the CRAN
> repository policy, which means CRAN will kick out the R package. I know many
> CRAN packages use ALTREP, but just being accepted by CRAN at the moment
> doesn't mean CRAN will keep accepting it. So, I want to clarify the current
> status of ALTREP.
>
> Best,
> Yutani
>
> 2024年4月22日(月) 10:17 :
>
> > Hello, I don't believe it is illegal, as ALTREP "implements an
> abstraction
> > underneath the C API". And is "compatible with all code which uses the
> > API".
> >
> > Please see slide deck by Gabriel Becker,  with L Tierney, M Lawrence and
> T
> > Kalibera.
> >
> >
> >
> > https://bioconductor.org/help/course-materials/2020/BiocDevelForum/16-ALTREP.pdf
> >
> > The ALTREP framework implements an abstraction underneath the traditional
> > R C API
> > - Generalizes what's underneath the API
> > - Without changing how data are accessed
> > - Compatible with all C code which uses the API
> > - Compatible with R internals
> >
> >
> > I hope this helps,
> > Hernando
> >
> >
> > -Original Message-
> > From: R-devel  On Behalf Of Hiroaki
> Yutani
> > Sent: Sunday, April 21, 2024 8:48 PM
> > To: r-devel 
> > Subject: [Rd] Is ALTREP "non-API"?
> >
> > Writing R Extensions[1] defines "API" as:
> >
> > Entry points which are documented in this manual and declared in an
> > installed header file. These can be used in distributed packages and will
> > only be changed after deprecation.
> >
> > But the document (WRE) doesn't contain a single mention of ALTREP: neither
> > the term "ALTREP" itself nor any entry points related to it. Does this
> > mean that, despite its widespread use in R packages, including CRAN ones,
> > ALTREP is not part of the API, and that accordingly using it in
> > distributed packages is considered illegal?
> >
> > Best,
> > Yutani
> >
> > [1]:
> > https://cran.r-project.org/doc/manuals/r-release/R-exts.html#The-R-API
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Improving user-friendliness of S4 dispatch failure when mis-naming arguments?

2023-08-10 Thread Gabriel Becker
I just want to add my 2 cents that I think it would be very useful and
beneficial to improve S4 to surface that information as well.

More information about the way that the dispatch failed would be of great
help in situations like the one Michael pointed out.
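
For reference, here is a minimal way to reproduce this class of failure (a
sketch; the generic name and signature are made up, and the exact wording of
the error varies slightly across R versions):

library(methods)
setGeneric("getq", function(conn, statement, ...) standardGeneric("getq"))
setMethod("getq", signature("character", "character"),
          function(conn, statement, ...) paste(conn, statement))
## mis-named arguments are silently absorbed by '...', so dispatch sees
## both formal arguments as missing:
getq(connection = "db", query = "SELECT 1")
#> Error in (function (classes, fdef, mtable) : unable to find an inherited
#> method for function 'getq' for signature '"missing", "missing"'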

~G

On Thu, Aug 10, 2023 at 9:59 AM Michael Chirico via R-devel <
r-devel@r-project.org> wrote:

> I forwarded that along to the original reporter with positive feedback
> -- including the argument names is definitely a big help for cuing
> what exactly is missing.
>
> Would a patch to do something similar for S4 be useful?
>
> On Thu, Aug 10, 2023 at 6:46 AM Hadley Wickham 
> wrote:
> >
> > Hi Michael,
> >
> > I can't help with S4, but I can help to make sure this isn't a problem
> > with S7. What do you think of the current error message? Do you see
> > anything obvious we could do to improve?
> >
> > library(S7)
> >
> > dbGetQuery <- new_generic("dbGetQuery", c("conn", "statement"))
> > dbGetQuery(connection = NULL, query = NULL)
> > #> Error: Can't find method for generic `dbGetQuery(conn, statement)`:
> > #> - conn : MISSING
> > #> - statement: MISSING
> >
> > Hadley
> >
> > On Wed, Aug 9, 2023 at 10:02 PM Michael Chirico via R-devel
> >  wrote:
> > >
> > > I fielded a debugging request from a non-expert user today. At root
> > > was running the following:
> > >
> > > dbGetQuery(connection = conn, query = query)
> > >
> > > The problem is that they've named the arguments incorrectly -- it
> > > should have been [1]:
> > >
> > > dbGetQuery(conn = conn, statement = query)
> > >
> > > The problem is that the error message "looks" highly confusing to the
> > > untrained eye:
> > >
> > > Error in (function (classes, fdef, mtable)  :   unable to find an
> > > inherited method for function ‘dbGetQuery’ for signature ‘"missing",
> > > "missing"’
> > >
> > > In retrospect, of course, this makes sense -- the mis-named arguments
> > > are getting picked up by '...', leaving the required arguments
> > > missing.
> > >
> > > But I was left wondering how we could help users right their own ship
> here.
> > >
> > > Would it help to mention the argument names? To include some code
> > > checking for weird combinations of missing arguments? Any other
> > > suggestions?
> > >
> > > Mike C
> > >
> > > [1]
> https://github.com/r-dbi/DBI/blob/97934c885749dd87a6beb10e8ccb6a5ebea3675e/R/dbGetQuery.R#L62-L64
> > >
> > > __
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >
> >
> > --
> > http://hadley.nz
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Vectorize library() to improve handling of versioned deps [sprint project proposal]

2023-08-07 Thread Gabriel Becker
Hi All,

This is a proposal for a project which could be worked on during the R
development Sprint at the end of this month; it was requested that we start
a discussion here to see what R-core's thoughts on it were before we
officially add it to the docket.


AFAIK, R officially supports both versioned dependencies (almost
exclusively of the >= version variety) and library paths with more than one
directory. Further, I believe it at least de facto supports the same
package being installed in different directories along the lib path. The
most common of these, I'd bet, would be different versions of the same
package being installed in a site library and in a user's personal library,
though that's not the only way this can happen.

The combination of these two features, however, can give rise to
packages which are all correctly installed and all loadable individually,
but which must be loaded in a particular order when used together, or the
loading of some of them will fail.

Consider the following dependency structure between packages

pkgA: pkgB (>= 0.5.0)

pkgC: pkgB (>= 0.6.0)

Consider the following multi-libpath setup:

~/pth1/: pkgA, pkgB [0.5.1]
~/pth2/: pkgC, pkgB [0.6.5]

And consider that we have the libpath c("~/pth1/", "~/pth2/").

If we do

library(pkgA)

Things will work great.

Same if we do

library(pkgC)

BUT, if we do

library(pkgA)
library(pkgC)

pkgC will not be able to be loaded, because an insufficient version of
pkgB will already be loaded.

I propose that library be modified to accept a character vector of
package names; when given one, it would perform the dependency calculations
to determine how all packages in the vector can be loaded (in the order they
appear). In the example above, this would mean that if we did

library(c("pkgA", "pkgC"))

it would determine that pkgB version 0.6.5 was needed (or alternatively,
that version 0.5.1 was insufficient) and use that *when loading the
dependencies of pkgA*.
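
A rough sketch of the resolution step (the helper name is hypothetical; real
code would also need to walk Depends/Imports recursively and check the
versioned requirements recorded in each DESCRIPTION):

resolve_lib_locations <- function(pkgs, lib.loc = .libPaths()) {
  ## all installed copies of each package across the library path
  found <- lapply(pkgs, function(p) {
    paths <- file.path(lib.loc, p)
    paths[dir.exists(paths)]
  })
  ## for each package, pick the highest installed version, so that every
  ## '>=' constraint imposed by any of 'pkgs' can be satisfied at once
  vapply(found, function(paths) {
    vers <- vapply(paths, function(pth)
      as.character(read.dcf(file.path(pth, "DESCRIPTION"), "Version")),
      character(1))
    paths[order(as.package_version(vers), decreasing = TRUE)[1L]]
  }, character(1))
}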

The proposal issue for the sprint itself is here:
https://github.com/r-devel/r-project-sprint-2023/discussions/15

Thoughts?

~G

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] codetools wrongly complains about lazy evaluation in S4 methods

2023-06-07 Thread Gabriel Becker
The API-supported workaround is to call globalVariables(), which,
essentially, declares the variables without defining them (a distinction R
does not usually make).

The issue with this approach, of course, is that it's a very blunt
instrument. It will cause false negatives if you accidentally use the same
symbol in a standard evaluation context elsewhere in your code.
Nonetheless, that's the intended approach as far as I know.
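
For instance, a minimal sketch of that declaration, placed anywhere in the
package's R code (the file name below is just a convention):

## R/zzz.R
if (getRversion() >= "2.15.1") utils::globalVariables("R")

After that, the code-usage check should no longer flag 'R' in the qr.X
method, at the cost of also silencing any genuine misuse of a global 'R'
elsewhere in the package.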

Best,
~G



On Wed, Jun 7, 2023 at 1:07 AM Serguei Sokol via R-devel <
r-devel@r-project.org> wrote:

> Le 03/06/2023 à 17:50, Mikael Jagan a écrit :
> > In a package, I define a method for not-yet-generic function 'qr.X'
> > like so:
> >
> > > setOldClass("qr")
> > > setMethod("qr.X", signature(qr = "qr"), function(qr, complete,
> > ncol) NULL)
> >
> > The formals of the newly generic 'qr.X' are inherited from the
> > non-generic
> > function in the base namespace.  Notably, the inherited default value of
> > formal argument 'ncol' relies on lazy evaluation:
> >
> > > formals(qr.X)[["ncol"]]
> > if (complete) nrow(R) else min(dim(R))
> >
> > where 'R' must be defined in the body of any method that might
> > evaluate 'ncol'.
> > To my surprise, tools:::.check_code_usage_in_package() complains about
> > the
> > undefined symbol:
> >
> > qr.X: no visible binding for global variable 'R'
> > qr.X,qr: no visible binding for global variable 'R'
> > Undefined global functions or variables:
> >   R
> I think this issue is similar to the complaints about non-defined
> variables in expressions involving non-standard evaluation, e.g. column
> names in a data frame which are used as unquoted symbols. One of the
> workarounds is simply to declare them somewhere in your code. In your
> case, it could be something as simple as:
>
>    R = NULL
>
> Best,
> Serguei.
>
> >
> > I claim that it should _not_ complain, given that lazy evaluation is
> > really a feature of the language _and_ given that it already does not
> > complain about the formals of functions that are not S4 methods.
> >
> > Having said that, it is not obvious to me what in codetools would need
> > to change here.  Any ideas?
> >
> > I've attached a script that creates and installs a test package and
> > reproduces
> > the check output by calling tools:::.check_code_usage_in_package().
> > Hope it
> > gets through.
> >
> > Mikael
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] range() for Date and POSIXct could respect `finite = TRUE`

2023-05-19 Thread Gabriel Becker
Hi All,

I think there may be some possible confusion about what allowsInf would be
reporting (or maybe it's just me :) ) if we did this.

Consider a class "myclass", S3, for starters,

with

setMethod("allowsInf", "myclass", function(obj) FALSE)

Then, what would

myclassthing <- structure(1.5, class = "myclass")
myclassthing[1] <- Inf

do? Presumably it would happily complete without complaint, right, even
though allowsInf(myclassthing) would return FALSE? Thus an infinite value
was allowed. This seems very misleading/counter-intuitive to me.
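
A quick runnable version of that point, done in plain S3 (a sketch;
'allowsInf' is of course hypothetical here):

allowsInf <- function(x) UseMethod("allowsInf")
allowsInf.default <- function(x) is.numeric(x) || is.complex(x)
allowsInf.myclass <- function(x) FALSE

myclassthing <- structure(1.5, class = "myclass")
myclassthing[1] <- Inf   # completes silently; nothing consults allowsInf()
allowsInf(myclassthing)  # FALSE, despite the Inf we just stored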

Perhaps this is just an issue with the proposed naming, though I'm not
certain that's the case.

I guess what I'm saying is that allowsInf at its core is a validation
criterion for objects of a particular class, and under that paradigm, having
that validation not be enforced (which it would not be, at least for
S3-classed objects, I imagine) seems like it would muddy the waters further
rather than making things clearer.

Put another way, and as pointed out by Bill above, the result of allowsInf
is really an attribute of a *class*, not of an object. allowsInf(x) is
really just a proxy for allowsInf(class(x)), right? The problem here is
that S3 doesn't *have* classes in a sense that makes the latter coherent.

It's notable here that developers could also get around this by implementing
methods for the Summary group generic that either implement the finite
argument or not, as appropriate for their class, right? And that would be
true whether or not the defaults for, e.g., min and max were altered to have
the finite argument to match range.

Best,
~G


On Fri, May 19, 2023 at 8:30 AM Martin Maechler 
wrote:

> > Bill Dunlap
> > on Thu, 11 May 2023 10:42:48 -0700 writes:
>
> >> What do others think?
>
> > I can imagine a class, "TemperatureKelvins", that wraps a
> > double but would have a range of 0 to Inf or one called
> > "GymnasticsScore" with a range of 0 to 10.  For those
> > sorts of things it would be nice to have a generic that
> > gave the possible min and max for the class instead of one
> > that just said they were -Inf and Inf or not.
>
> > -Bill
>
> yeah.. I agree that a general concept of such an interval class
> is even more flexible and generally useful.
> OTOH, people have already introduced such classes where they
> were really needed, and here it's really about
> *if*
> is.finite() and is.infinite() are also available and working
> but not always FALSE (which they are for logical, integer,
> character *and* raw, the latter really debatable - but *not* in this
> thread).
>
> So, allows.infinite(x)  would *not* vectorize but return TRUE or
> FALSE (and typically not NA ..), in some sense being a property
> of class(x) only.
>
>
> > On Thu, May 11, 2023 at 1:49 AM Martin Maechler
> >  wrote:
>
> >> > Davis Vaughan
> >> > on Tue, 9 May 2023 09:49:41 -0400 writes:
> >>
> >> > It seems like the main problem is that `is.numeric(x)`
> >> > isn't fully indicative of whether or not `is.finite(x)`
> >> > makes sense for `x` (i.e.  Date isn't numeric but does
> >> > allow infinite dates).
> >>
> >> > So I could also imagine a new `allows.infinite()` S3
> >> > generic that would return a single TRUE/FALSE for whether
> >> > or not the type allows infinite values, this would also be
> >> > indicative of whether or not `is.finite()` and
> >> > `is.infinite()` make sense on that type. I imagine it
> >> > being used like:
>
> >> > ```
> >> >   allows.infinite <- function(x) {
> >> > UseMethod("allows.infinite")
> >> >   }
> >> >   allows.infinite.default <- function(x) {
> >> > is.numeric(x) # For backwards compatibility, maybe? Not sure.
> >> >   }
>
> it would have to include  is.complex() as well *and*
> in principle I'd want to *exclude* integers as they really
> cannot be +/- Inf
> ... but then you did say "not sure" ..
>
> I'm still somewhat favoring this proposal,
> because it would be a bit more generally applicable
> but still very simple.
>
> Personally, I'd go for the shorter allowsInf()  name,
> not adding another  .()  generic function,
> but that's less important and should not determine decisions I think.
>
> Martin
>
> >> >   allows.infinite.Date <- function(x) {
> >> > TRUE
> >> >   }
> >> >   allows.infinite.POSIXct <- function(x) {
> >> > TRUE
> >> >   }
> >> >
> >> >   range.default <- function (..., na.rm = FALSE, finite = FALSE) {
> >> > x <- c(..., recursive = TRUE)
> >> > if (allows.infinite(x)) { # changed from `is.numeric()`
> >> >   if (finite)
> >> > x <- x[is.finite(x)]
> >> >   else if (na.rm)
> >> > x <- x[!is.na(x)]
> >> >   c(min(x), max(x))
> >> > }
> >> > else {
> >> >   if (finite)
> >> > na.rm <- TRUE
> >> >   c(min(x, na.rm = na.rm), max(x, 

Re: [Rd] Should '@" now be listed in tools:::.get_internal_S3_generics() ?

2023-04-28 Thread Gabriel Becker
Karolis,

It seems likely, without having looked myself, that you could be correct
about the issue, but it does seem worth noting that both of the functions
you have mentioned are not exported, and thus not part of the API that
extension packages are allowed to use and rely on.

If retrieving the list of "internal S3 generics" is something package and
user code is allowed to do, the real fix seems to go beyond what you're
suggesting, to actually providing an API entry point that gives the
relevant information (maybe in an identical form to how those internal
functions do so, maybe not). If it's not, for some principled reason,
something R-core wants to support package and user code doing, then the
fact that the new thing doesn't work automatically with roxygen2 would
become the roxygen maintainers' job to fix or document.
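
To reproduce the observation concretely (introspection via an unexported
internal, so this is itself not supported API and results may differ across
R versions):

"@<-" %in% tools:::.get_internal_S3_generics()  # TRUE
"@"   %in% tools:::.get_internal_S3_generics()  # FALSE, hence the report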

I do not know whether R-core feels this is something packages/users should
be able to do; both decisions strike me as possible, to be honest,
depending on details I don't know and/or am not privy to.

Best,
~G

On Fri, Apr 28, 2023 at 1:49 PM Karolis Koncevičius <
karolis.koncevic...@gmail.com> wrote:

> This issue might go deeper - I was not successful in passing R CMD check
> for the usage files. R CMD check kept showing errors for `@` declarations,
> even though they were identical to `$` declarations (which passed fine).
>
> Seems like the usage check functions are not prepared for `@` - also in
> tools:::.S3_method_markup_regexp
>
> > On Apr 28, 2023, at 10:34 PM, Karolis Koncevičius <
> karolis.koncevic...@gmail.com> wrote:
> >
> > I was building a package that uses the new generic @ and kept having
> > errors with "roxygen2" documentation. The "roxygen2"-generated NAMESPACE
> > added `@.newclass` as a newly exported function, not as an S3method.
> >
> > At first I thought this must be a bug in roxygen2 and that they lag
> > behind the new developments in R. But after some investigation I found
> > that "roxygen2" is using tools:::.get_internal_S3_generics() to decide if
> > the method should be exported as an S3method or not. For some reason
> > "@<-" is listed in there, but "@" is not.
> >
> > Am I missing some context, or is this an oversight?
> >
> > Kind regards,
> > Karolis Koncevicius
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Autocompletion for the new S3 generic @ method?

2023-04-02 Thread Gabriel Becker
Hi Tomasz,

I haven't had a chance to look at your patch yet (also I can't accept it as
I'm not on R-core), but patches for consideration should be submitted to
bugzilla (https://bugs.r-project.org), not the unofficial github mirror of
the the SVN repo.

Best,
~G

On Sun, Apr 2, 2023 at 6:09 AM Tomasz Kalinowski 
wrote:

> I agree, this is a good idea and would be very helpful in interactive
> contexts.
>
> I have a draft patch implementing this feature here:
> https://github.com/r-devel/r-svn/pull/122
> (Append  “.patch” to the URL to get a raw patch.)
>
> Regards,
> Tomasz
>
> > On Mar 31, 2023, at 2:11 PM, Karolis K 
> wrote:
> >
> > Hello,
> >
> > In the current R-devel, @ is an S3 generic, so we can do things like -
> > for example - use it to extract matrix rows by name:
> >
> >.S3method("@", "mm", function(object, name) object[name,])
> >m <- structure(matrix(rnorm(20), ncol=2), dimnames=list(paste0("row",
> 1:10), paste("col", 1:2)), class="mm")
> >
> >m@row1
> >
> > However, seems like currently it does not support autocompletion.
> >
> > Wouldn't it make sense to add a method like .EtaNames() which would
> > provide tab autocompletion for x@ in the same way the current
> > .DollarNames() does for x$?
> >
> > Regards,
> > Karolis K.
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] write.csv performance improvements?

2023-03-30 Thread Gabriel Becker
Hi Toby et al,



On Wed, Mar 29, 2023 at 10:24 PM Toby Hocking  wrote:

> Dear R-devel,
> I did a systematic comparison of write.csv with similar functions, and
> observed two asymptotic inefficiencies that could be improved.
>
> 1. write.csv is quadratic time (N^2) in the number of columns N.
> Can write.csv be improved to use a linear time algorithm, so it can handle
> CSV files with larger numbers of columns?
>

Yes, I think there is a narrow fix and a wider discussion to be had.

I've posted a discussion and the narrow fix at:
https://bugs.r-project.org/show_bug.cgi?id=18500

For "normal data", ie data that doesn't have classed object columns, the
narrow change I propose in the patch us the performance we might expect
(see the attached, admittedly very ugly plots).

The fact remains though, that with the patch, write.table is still
quadratic in the number of *object-classed *columns.

It doesn't seem like it should be, but I haven't (yet) had a chance to dig
deeper to attack that.  Might be a good subject for the R developer sprint,
if R-core agrees.
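
For anyone who wants to eyeball the scaling themselves, a quick sketch (the
column counts are arbitrary, and elapsed times will vary by machine):

timings <- sapply(c(1000, 2000, 4000), function(nc) {
  df <- data.frame(matrix(rnorm(10 * nc), ncol = nc))
  system.time(write.csv(df, tempfile()))[["elapsed"]]
})
timings  # times roughly quadrupling as columns double suggests quadratic cost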

~G

> For more details including figures and session info, please see
> https://github.com/tdhock/atime/issues/9
>
> 2. write.csv uses memory that is linear in the number of rows, whereas
> similar R functions for writing CSV use only constant memory. This is not
> as important of an issue to fix, because anyway linear memory is used to
> store the data in R. But since the other functions use constant memory,
> could write.csv also? Is there some copying happening that could be
> avoided? (this memory measurement uses bench::mark, which in turn uses
> utils::Rprofmem)
> https://github.com/tdhock/atime/issues/10
>
> Sincerely,
> Toby Dylan Hocking
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Request: better default R_LIBS_USER

2023-03-23 Thread Gabriel Becker
Small but crucial typo correction:

 Perhaps this could be put on the list of possible changes for R 5.0, to be
> bundled with other as-yet undecided breaking changes, members of R-core
> feel the same way.
>

 *if* members of R-core feel the same way.

~G

>
> Best,
> ~G
>
>
>> Cheers.
>>
>> --
>> Felipe Contreras
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Request: better default R_LIBS_USER

2023-03-23 Thread Gabriel Becker
Felipe,

Thanks for being interested in making R better. It's great to see engagement
from a new "virtual face", so to speak. That said, without speaking for
R-core, my experience is that the R-project and R-core team place a very
high premium on backwards compatibility. They will make breaking changes
on occasion, but it's rare and extreme for them to do so. I personally
don't think the amount of convenience/modern-best-practice adherence
we're talking about here rises to the level of justifying a breaking change
put in quickly.

As for why I'm calling this a breaking change, see inline:

>

On Thu, Mar 23, 2023 at 12:49 PM Felipe Contreras <
felipe.contre...@gmail.com> wrote:

> > Just expand %U to both:
> >
> > paste(c(
> > file.path(home, ".local", "lib", "r", x.y),
> > file.path(home, "R", paste0(R.version$platform, "-library"),
> x.y)
> > ), collapse = ":")
> >
> > Then R would install packages to the new location in new installations
> > by default, but still use packages from the old location if present.
>
> This would work, would it not?
>

So, off the top of my head, this would mean that people who have versions of
R from before and after that change installed simultaneously would, *by
default*, have package libraries that live in completely different places on
the drive.
Does that preclude a change like this from ever being made? No, but it does
seem more like a major version change than a minor version change to me.

For an example of why this could be a breaking change, I'd be willing to
bet there are shell scripts out there which assume the user libraries
live at ~user/R/<platform>-library/<version>/ and detect them as such. Do I
think it's common? No. Do I think it's the right way to do whatever that
shell script is doing? No, but I'd be surprised if there wasn't code that
does it somewhere. And that code will/would break under your proposed change.
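
For context, the current default can already be inspected, and overridden per
user, without any change to R; a sketch (the exact expansion is platform- and
version-dependent):

Sys.getenv("R_LIBS_USER")
#> e.g. "~/R/x86_64-pc-linux-gnu-library/4.2" on a Unix-alike
## opting in to an XDG-style location today, via ~/.Renviron:
##   R_LIBS_USER=~/.local/lib/R/%p-library/%v
.libPaths()  # the active library search path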


> > Moreover, this location is going to change the next time the minor
> > version is bumped anyway.
>
> Or just change it for the next version of R.
>

All of the above said, I do actually agree that it would be nice to change
the default in the long term (including for mac builds from source).
Perhaps this could be put on the list of possible changes for R 5.0, to be
bundled with other as-yet undecided breaking changes, members of R-core
feel the same way.

Best,
~G


> Cheers.
>
> --
> Felipe Contreras
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] WISH: Optional mechanism preventing var <<- value from assigning non-existing variable

2023-03-19 Thread Gabriel Becker
I have to say <<- is a core debugging tool when assigning into the global
environment. I suppose I could use assign() but that would be somewhat
annoying.

That said, I'm still for this change; in the vast, overwhelming majority of
times that <<- is in my package code - already rare, but it does happen - it
would absolutely be a bug (a typo, most likely) for it to get to the global
environment and assign into it. Assigning into the global environment from
package code is a serious anti-pattern anyway.

To be honest, from the developer perspective what I'd personally actually
want is an assigner that was willing to go up exactly one frame from the
current one to find its binding. That is how I essentially always am using
<<- myself.
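
Something like this sketch, say (a hypothetical helper, not existing R API):

assign_up <- function(var, value) {
  target <- parent.frame(2)  # exactly one frame above the caller
  if (!exists(var, envir = target, inherits = FALSE))
    stop("no binding '", var, "' exactly one frame up")
  assign(var, value, envir = target)
}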

~G

On Sun, Mar 19, 2023, 11:16 AM Bill Dunlap  wrote:

> Why should it make an exception for cases where the about-to-be-assigned-to
> name is present in the global environment?  I think it should warn or give
> an error if the altered variable is in any environment on the search list.
>
> -Bill
>
> On Sun, Mar 19, 2023 at 10:54 AM Duncan Murdoch 
> wrote:
>
> > I think that should be the default behaviour. It's pretty late to get
> > that into R 4.3.0, but I think your proposal (with check.superassignment
> > = FALSE being the default) could make it in, and 4.4.0 could change the
> > default to TRUE.
> >
> > Duncan
> >
> >
> >
> > On 19/03/2023 12:08 p.m., Henrik Bengtsson wrote:
> > > I'd like to be able to prevent the <<- assignment operator ("super
> > > assignment") from assigning to the global environment unless the
> > > variable already exists and is not locked.  If it does not exist or is
> > > locked, I'd like an error to be produced.  This would allow me to
> > > evaluate expressions with this temporarily set to protect against
> > > mistakes.
> > >
> > > For example, I'd like to do something like:
> > >
> > > $ R --vanilla
> > >> exists("a")
> > > [1] FALSE
> > >
> > >> options(check.superassignment = TRUE)
> > >> local({ a <<- 1 })
> > > Error: object 'a' not found
> > >
> > >> a <- 0
> > >> local({ a <<- 1 })
> > >> a
> > > [1] 1
> > >
> > >> rm("a")
> > >> options(check.superassignment = FALSE)
> > >> local({ a <<- 1 })
> > >> exists("a")
> > > [1] TRUE
> > >
> > >
> > > BACKGROUND:
> > >
> > >  From help("<<-") we have:
> > >
> > > "The operators <<- and ->> are normally only used in functions, and
> > > cause a search to be made through parent environments for an existing
> > > definition of the variable being assigned. If such a variable is found
> > > (and its binding is not locked) then its value is redefined, otherwise
> > > assignment takes place in the global environment."
> > >
> > > I argue that it's unfortunate that <<- falls back to assigning to
> > > the global environment if the variable does not already exist.
> > > Unfortunately, it has become a "go to" solution for many to use it
> > > that way.  Sometimes it is intended, sometimes it's a mistake.  We
> > > find it also in R packages on CRAN, even if 'R CMD check' tries to
> > > detect when it happens (but it's limited to do so from run-time
> > > examples and tests).
> > >
> > > It's probably too widely used for us to change to a more strict
> > > behavior permanent.  The proposed R option allows me, as a developer,
> > > to evaluate an R expression with the strict behavior, especially if I
> > > don't trust the code.
> > >
> > > With 'check.superassignment = TRUE' set, a developer would have to
> > > first declare the variable in the global environment for <<- to assign
> > > there.  This would remove the fallback "If such a variable is found
> > > (and its binding is not locked) then its value is redefined, otherwise
> > > assignment takes place in the global environment" in the current
> > > design.  For those who truly intends to assign to the global, could
> > > use assign(var, value, envir = globalenv()) or globalenv()[[var]] <-
> > > value.
> > >
> > > 'R CMD check' could temporarily set 'check.superassignment = TRUE'
> > > during checks.  If we let environment variable
> > > 'R_CHECK_SUPERASSIGNMENT' set the default value of option
> > > 'check.superassignment' on R startup, it would be possible to check
> > > packages optionally this way, but also to run any "non-trusted" R
> > > script in the "strict" mode.
> > >
> > >
> > > TEASER:
> > >
> > > Here's an example of why using <<- for assigning to the global
> > > environment is a bad idea:
> > >
> > > This works:
> > >
> > > $ R --vanilla
> > >> y <- lapply(1:3, function(x) { if (x > 2) keep <<- x; x^2 })
> > >> keep
> > >> [1] 3
> > >
> > >
> > > This doesn't work:
> > >
> > > $ R --vanilla
> > >> library(purrr)
> > >> y <- lapply(1:3, function(x) { if (x > 2) keep <<- x; x^2 })
> > > Error in keep <<- x : cannot change value of locked binding for 'keep'
> > >
> > >
> > > But, if we "declare" the variable first, it works:
> > >
> > > $ R --vanilla
> > >> library(purrr)
> > >> keep <- 0
> > >> y <- lapply(1:3, function(x) { if (x > 2) keep <<- x; 

Re: [Rd] Multiple Assignment built into the R Interpreter?

2023-03-11 Thread Gabriel Becker
There are some other considerations too (apologies if these were mentioned
above and I missed them). Also, below are initial thoughts, so apologies for
any mistakes or oversights.

For example, if

[a, b] <- my2valuefun()

works the same as

local({
    tmp <- my2valuefun()
    stopifnot(is.list(tmp) && length(tmp) == 2)
    a <<- tmp[[1]]
    b <<- tmp[[2]]
})

Do we expect

[a[1], b[3]] <- my2valuefun()

to also work? That doesn't sound very fun to me, personally, but obviously
the "single value return" versions of these do work and have for a long
time, i.e.

a[1] <- my2valuefun()[[1]]
b[3] <- my2valuefun()[[2]]

is perfectly valid R code (though it does call the function twice, which is
"silly" in some sense).

Another thing which arises from the Julia API specifically, and which I think
is problematic, is the ambiguity stemming from R's atomic "types" being
vectors. Consider the following

coolest_function <- function() c(a = 15, b = 65, c = 275)
a <- coolest_function()

That obviously makes a vector of length 3. Anything else would break, like,
*all the R code*.

But now, what does

[a] <- coolest_function()

do? Does it assign 15 to a, because b and c aren't being assigned to?

Does this mean the variables being assigned to actually need to *match the
names within the return object*? I don't think that would work at all in
general...

Alternatively, is the second one an error, because the function isn't
returning a list? This doesn't really fix the problem either, though,
because a single list of length > 1 *is a valid thing to return from an R
function*. I think, like in Julia, you'd need to declare the set of things
being returned, and perhaps map them to the variables you want assigned:

crazy_notworking_fun <- function() {
  return(a = 5, b = 65, c = 275)
}

[a_val = a, b_val = b] <- crazy_notworking_fun()

Or even,

[a_val <- a, b_val <-b] <- crazy_notworking_fun()


In that case, however, it becomes somewhat unclear (to me at least) what

only_val <- crazy_notworking_fun()

would do. Throw an error because multivalued functions are fundamentally
different and we can't pretend they aren't? This would disallow all of the
things you think "most R users would use every day" (a claim I'm somewhat
skeptical of, to be honest). If that's not it, though, what? I don't think
it can/should return the full list of results, because that introduces the
ambiguity this is trying to avoid right back in.  Perhaps just the first
thing returned? That is internally consistent, but somewhat strange
behavior...
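
For what it's worth, one can get a fair approximation of the positional
semantics with a plain function today; a minimal sketch in the spirit of
collapse::`%=%` (mentioned below; the operator name here is made up, and the
RHS must be a list of at least the right length):

`%<-%` <- function(lhs, value) {
  nms <- as.character(substitute(lhs))[-1L]  # e.g. c("nr", "nc")
  stopifnot(is.list(value), length(value) >= length(nms))
  for (i in seq_along(nms))
    assign(nms[i], value[[i]], envir = parent.frame())
  invisible(value)
}

c(nr, nc) %<-% as.list(dim(mtcars))  # nr == 32, nc == 11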

Best,
~G




On Sat, Mar 11, 2023 at 2:15 PM Sebastian Martin Krantz <
sebastian.kra...@graduateinstitute.ch> wrote:

> Thanks Duncan and Ivan for the careful thoughts. I'm not sure I can follow
> all aspects you raised, but to give my limited take on a few:
>
> > your proposal violates a very basic property of the  language, i.e. that
> all statements are expressions and have a value.
> > What's the value of 1 + (A, C = init_matrices()).
>
> I'm not sure I see the point here. I evaluated  1 + (d = dim(mtcars);
> nr = d[1]; nc = d[2]; rm(d)), which simply gives a syntax error, as
> the above expression should. `%=%` assigns to
> environments, so 1 + (c("A", "C") %=% init_matrices()) returns
> numeric(0), with A and C having their values assigned.
>
> > suppose f() returns list(A = 1, B = 2) and I do
> >  B, A <- f()
> > Should assignment be by position or by name?
>
> In other languages this is by position. The feature is not meant to
> replace list2env(), and being able to rename objects in the assignment
> is a vital feature of code using multi-input-and-output functions, e.g.
> in Matlab or Julia.
>
> > Honestly, given that this is simply syntactic sugar, I don't think I
> would support it.
>
> You can call it that, but it would be used by almost every R user
> almost every day. Simple things like nr, nc = dim(x); values, vectors
> = eigen(x) etc., where the creation of intermediate objects
> is cumbersome and redundant.
>
> > I see you've already mentioned it ("JavaScript-like"). I think it would
> fulfil Sebastian's requirements too, as long as it is considered "true
> assignment" by the rest of the language.
>
> I don't have strong opinions about how the issue is phrased or
> implemented. Something like [t, n] = dim(x) might even be more clear.
> It's important though that assignment remains by position,
> so even if some output gets thrown away that should also be positional.
>
> >  A <- 0
> >  [A, B = A + 10] <- list(1, A = 2)
>
> I also fail to see the use of allowing this. Something like this is an
> error:
>
> > A = 2
> > (B = A + 1) <- 1
> Error in (B = A + 1) <- 1 : could not find function "(<-"
>
> Regarding the practical implementation, I think `collapse::%=%` is a
> good starting point. It could be introduced in R as a separate
> function, or `=` could be modified to accommodate its capability. It
> should be clear that
> with more than one LHS variable the assignment is an environment-level
> operation and the results can only be used 

Re: [Rd] transform.data.frame() ignores unnamed arguments when no named argument is provided

2023-03-04 Thread Gabriel Becker
Hi Avi,

On Fri, Mar 3, 2023 at 9:07 PM  wrote:

> I am probably mistaken but it looks to me like the design of much of the
> data.frame infrastructure not only does not insist you give columns names,
> but even has all kinds of options such as check.names and fix.empty.names
>
>
> https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/data.frame
>
>
I think this is true, but that's for the *construction* of a data.frame,
whereas, in my opinion from what I can tell, transform is for operating on
a data.frame that has already been constructed. I'm not personally
convinced the same allowances should be made at this conceptually later
stage in data processing.


> During the lifetime of a column, it can get removed, renamed, transformed
> in many ways and so on. A data.frame read in from a file such as a .CSV
> often begins with temporary, generated names.
>
> It is so common that sometimes not giving a name is a choice and not in
> any way an error. I have seen some rather odd names in backticks that
> include spaces, and seen duplicate names. The reality is you can index by
> column number, too, and maybe no actual name was needed by the one creating
> or modifying the data.
>

You can, but this creates brittle, difficult-to-maintain code, to the extent
that I consider this an anti-pattern, and I don't believe I'm alone in that.


>
> Some well-placed warnings are welcome as they tend to reflect a possibly
> serious error.  But that error may not easily be found at this point versus later
> in the game.  If later the program tries to access the misnamed column,
> then an error makes sense. Warnings, if overused, get old quickly and you
> regularly see code written to suppress startup messages or warnings because
> the same message shown every day becomes something you ignore mentally even
> if not suppressed. How many times has loading the tidyverse reminded me it
> is shadowing a few base R functions? How many times have I really cared?
>

I think this is a bad example to make your case on, because symbol masking
is actually *really* important. In bioinformatics, Bioconductor is the
flagship (which sails upon the sea that R provides), but guess what: dplyr
and Bioconductor both define filter, and they do so meaning completely
different, incompatible things.

I have seen code that wanted one version and got the other in both
directions, and in neither case is it fun, but without that warning it
would be a dystopian nightmarescape that scarcely bears thinking about.


> What makes some sense to me is to add an argument to some functions
> BEGGING to be shown the errors of your ways and turn that on as you wish,
> often after something has gone wrong.
>


Flipping this on its head, I wonder, alternatively, if there might be a
"strict" mode for transform which errors out on unnamed arguments, instead
of providing the current undefined behavior.
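
As a sketch of what that might look like (a hypothetical wrapper, not an
existing base R option; the match.call() dance preserves transform()'s
non-standard evaluation by rebuilding and re-evaluating the call in the
caller's frame):

transform_strict <- function(`_data`, ...) {
  cl <- match.call(expand.dots = TRUE)
  argnms <- names(cl)[-(1:2)]  # drop the call name and `_data`
  if (length(argnms) && any(!nzchar(argnms)))
    stop("all transform() arguments must be named")
  cl[[1L]] <- quote(base::transform)
  eval(cl, parent.frame())
}

transform_strict(data.frame(value1 = 5), value2 = value1 * 3)  # works
transform_strict(data.frame(value1 = 5), 3)  # errors instead of guessing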

Best,
~G


>
> -Original Message-
> From: R-devel  On Behalf Of Martin Maechler
> Sent: Friday, March 3, 2023 10:26 AM
> To: Gabriel Becker 
> Cc: Antoine Fabri ; R-devel <
> r-devel@r-project.org>
> Subject: Re: [Rd] transform.data.frame() ignores unnamed arguments when no
> named argument is provided
>
> >>>>> Gabriel Becker
> >>>>> on Thu, 2 Mar 2023 14:37:18 -0800 writes:
>
> > On Thu, Mar 2, 2023 at 2:02 PM Antoine Fabri
> >  wrote:
>
> >> Thanks and good point about unspecified behavior. The way
> >> it behaves now (when it doesn't ignore) is more
> >> consistent with data.frame() though so I prefer that to a
> >> "warn and ignore" behaviour:
> >>
> >> data.frame(a = 1, b = 2, 3)
> >>
> >> #> a b X3
> >>
> >> #> 1 1 2 3
> >>
> >>
> >> data.frame(a = 1, 2, 3)
> >>
> >> #> a X2 X3
> >>
> >> #> 1 1 2 3
> >>
> >>
> >> (and in general warnings make for unpleasant debugging so
> >> I prefer when we don't add new ones if avoidable)
> >>
>
> > I find silence to be much more unpleasant in practice when
> > debugging, myself, but that may be a personal preference.
>
> +1
>
> I also *strongly* disagree with the claim
>
>" in general warnings make for unpleasant debugging "
>
> That may be true for beginners (for whom debugging is often not really
> feasible anyway ..), but somewhat experienced useRs should know about
>
> options(warn = 1) # or
> options(warn = 2) # plus options(error = recover)
> # or
> tryCatch( ...,  warning = ..)
>
> or  {even more}
>

Re: [Rd] transform.data.frame() ignores unnamed arguments when no named argument is provided

2023-03-02 Thread Gabriel Becker
On Thu, Mar 2, 2023 at 2:02 PM Antoine Fabri 
wrote:

> Thanks and good point about unspecified behavior. The way it behaves now
> (when it doesn't ignore) is more consistent with data.frame() though so I
> prefer that to a "warn and ignore" behaviour:
>
> data.frame(a = 1, b = 2, 3)
>
> #>   a b X3
>
> #> 1 1 2  3
>
>
> data.frame(a = 1, 2, 3)
>
> #>   a X2 X3
>
> #> 1 1  2  3
>
>
> (and in general warnings make for unpleasant debugging so I prefer when we
> don't add new ones if avoidable)
>

I find silence to be much more unpleasant in practice when debugging,
myself, but that may be a personal preference.


>
>
> playing a bit more with it, it would make sense to me that the following
> have the same output:
>
>
> coefficient <- 3
>
>
> data.frame(value1 = 5) |> transform(coefficient, value2 = coefficient *
> value1)
>
> #>   value1 X3 value2
>
> #> 1  5  3 15
>
>
> data.frame(value1 = 5, coefficient) |> transform(value2 = coefficient *
> value1)
>
> #>   value1 coefficient value2
>
> #> 1  5   3 15
>
>
I'm not so sure. data.frame() is doing some substitute magic to get the
column name coefficient there.

> coefficient = 3

> data.frame(value1 = 5, coefficient)

  value1 coefficient

1      5           3

Beyond that, these two pieces of code are doing subtly but crucially
different things; in the latter, coefficient is a variable in the
data.frame, and when transform resolves that symbol during the calculation
of value2, it *gets the column in the incoming data.frame*.

In the former case, coefficient does not exist in the data.frame, so the
symbol is resolved somewhere else in the scope chain (in this case, the
global environment).

These happen to be the same, except for the column name, but we can see
the difference if we change the code to

> coefficient <- 3

> data.frame(value1 = 5, coefficient = 4)  |> transform(value2 = value1 *
coefficient)

  value1 coefficient value2

1      5           4     20

> data.frame(value1 = 5) |> transform(coefficient = 4, value2 = value1 *
coefficient)

  value1 coefficient *value2*

1      5           4    *15*

Please note that another way this difference could rear its head is if
these aren't directly one after each other in a pipe:

> coefficient <- 3

> df1 <- data.frame(value1 = 5, coefficient)

> coefficient <- 4

> df2 <- data.frame(value1 = 5)

> df1 |> transform(value2 = value1 * coefficient)

  value1 coefficient value2

1      5           3     15

> df2 |> transform(coefficient, value2 = value1 * coefficient)

  value1 X4 value2

1      5  4     20


'Cause, you know, someday the place where you do that transform and the place
where coefficient is initially set are gonna be far away from each other, so
whether you put coefficient into the incoming data, or don't, will matter.


Best,
~G

[[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] nightly r-devel.pkg builds failing since Jan 15

2023-02-26 Thread Gabriel Becker
Hi all,

It looks like for Intel Macs (i.e. the High Sierra builds) the nightly build
of R-devel has been failing continuously since Jan 16th:

https://mac.r-project.org/high-sierra/last-success/

Is this a known issue? I didn't see any way to get at the relevant logs (of
the .pkg creation step), as the .tar.gz step succeeded.

Also, the framework (at least the non-pkg'ed one that's in the .tar.gz file)
is unsigned, meaning the OS gives you grief about opening it.

Finally, it seems now that even the 4.2 branch is failing in the make stage.

Best,
~G

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] unlist preserve common class?

2022-12-09 Thread Gabriel Becker
Hi Spencer,

Another, potentially somewhat less disruptive/more general option would be
to add a stop.at.object or stop.at.nonlist (or alternatively list.only)
argument, which would basically translate to "collapse the list structure
to flat, but don't try to combine the leaf elements within the list". You
could then do whatever you wanted to said now-flat list as a second call.

i.e.,

flatlist <- unlist(structured_list, list.only = TRUE)
final_res <- cool_combiner_fun(flatlist)
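
A minimal sketch of that flattening step as a standalone helper (the name is
hypothetical; note it would recurse into anything that is a list, including
data frames):

flatten_list <- function(x) {
  if (!is.list(x)) return(list(x))
  do.call(c, lapply(x, flatten_list))  # c() on lists concatenates them
}
flatten_list(list(Sys.Date(), list(factor("a"), 1:3)))
#> a flat 3-element list whose leaves keep their classes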

I had to do something similar years ago when I was implementing XPath for
arbitrary R objects, because you can, e.g., always get x[1] out of x
infinitely many times, so I defined "stopping functions". The fully general
case would be to do the same here and accept, e.g., a stopping.cond
argument, but that is probably too complex for unlist and might simply
belong in a completely separate function.

Best,
~G

On Thu, Dec 8, 2022 at 8:21 PM Spencer Graves <
spencer.gra...@effectivedefense.org> wrote:

> Hi, Gabriel:
>
>
> On 12/8/22 8:20 PM, Gabriel Becker wrote:
> > Hi Spencer,
> >
> > My 2c.
> >
> > According to the docs, factors are special-cased. Other S3 'classes'
> > could be special-cased, such as Date in your example, I suppose, but it
> > is not clear how what you're describing could be implemented for the
> > general case.
> >
> > Suppose I define an S3 "class" called my_awesome_class, and have a list
> > of 3 of them in it, and no other guarantees are provided. What should,
> > or even could, R do in the case of unlist(list_of_awesomes)?
> >
> > There is no guarantee that I as an S3 developer have provided a c method
> > for my class such that we could say the unlist call above is equivalent
> > (roughly) to do.call(c, list_of_awesomes), nor that I provided any other
> > particular "mash this set of my_awesome_class objects into one". Nor is
> > it even guaranteed that the concept of combining my_awesome_class objects
> > is even coherent, or would produce a new my_awesome_class object when
> > performed if it is.
>
>
>   What about adding another argument to create, e.g.,
>
>
> unlist(x, recursive = TRUE, use.names = TRUE, attributeFunction=NULL)
>
>
>   Then assign the results of the current "unlist(x, ...)" to, say,
> "ux", and follow that by
>
>
>
> if(!is.null(attributeFunction))attributes(ux) <- attributeFunction(x)
>
>
> return(ux)
>
>
>   An alternative could be to have a default attributeFunction,
> that
> computes the attributes of each component of x and keeps only the ones
> that are shared by all components of x.  This would be the same as the
> current behavior for factors IF each component had the same factor
> levels and would drop attributes that are different between components.
> For S4 classes, if the attributes were not ALL identical, then all the
> attributes would be dropped, as with the current behavior.  This should
> not be a problem for S3 generics, because they should always check to
> make sure all the required attributes are available.
>
> >
> > That said, your example was of length one,
>
>
>   My example was of length one to provide a minimal,
> self-contained
> example.  That was motivated by a more complicated example, which took
> me a couple of hours to understand why it wasn't working as I expected ;-)
>
>
>   Thanks for your reply.
>
>
>   Spencer Graves
>
>
> we could special case (the
> > default method of) unlist so that for x /not a list/, we're guaranteed
> that
> >
> > identical(unlist(list(x)), x) == TRUE
> >
> > This would simplify certain code, such as the one from your motivating
> > example, but at the cost of making the output of unlist across inputs
> > less consistent and less easy to reason about and predict. In other
> > words the answer to the question "what class is
> > unlist(list_of_awesomes)? " would become "it depends on how many of them
> > are in the list"... That wouldn't be a good thing on balance, imho.
> >
> > Best,
> > ~G
> >
> > On Thu, Dec 8, 2022 at 5:44 PM Spencer Graves
> >  > <mailto:spencer.gra...@effectivedefense.org>> wrote:
> >
> > Consider:
> >
> >
> >   > str(unlist(list(Sys.Date())))
> >num 19334
> >
> >
> >   > str(unlist(list(factor('a'))))
> >Factor w/ 1 level "a": 1
> >
> >
> >I naively expected "str(unlist(list(Sys.Date())))" to return an
> > return an
> > objec

Re: [Rd] unlist preserve common class?

2022-12-08 Thread Gabriel Becker
Hi Spencer,

My 2c.

According to the docs, factors are special-cased. Other S3 'classes' could
be special-cased, such as Date in your example, I suppose, but it is not
clear how what you're describing could be implemented for the general case.

Suppose I define an S3 "class" called my_awesome_class, and have a list of
3 of them in it, and no other guarantees are provided. What should, or even
could, R do in the case of unlist(list_of_awesomes)?

There is no guarantee that I as an S3 developer have provided a c method
for my class such that we could say the unlist call above is equivalent
(roughly) to do.call(c, list_of_awesomes), nor that I provided any other
particular "mash this set of my_awesome_class objects into one". Nor is it
even guaranteed that the concept of combining my_awesome_class objects is
even coherent, or would produce a new my_awesome_class object when
performed if it is.

That said, your example was of length one; we could special-case (the
default method of) unlist so that for x *not a list*, we're guaranteed that

identical(unlist(list(x)), x) == TRUE

This would simplify certain code, such as the one from your motivating
example, but at the cost of making the output of unlist across inputs less
consistent and less easy to reason about and predict. In other words the
answer to the question "what class is unlist(list_of_awesomes)? " would
become "it depends on how many of them are in the list"... That wouldn't be
a good thing on balance, imho.

Best,
~G

On Thu, Dec 8, 2022 at 5:44 PM Spencer Graves <
spencer.gra...@effectivedefense.org> wrote:

> Consider:
>
>
>  > str(unlist(list(Sys.Date())))
>   num 19334
>
>
>  > str(unlist(list(factor('a'))))
>   Factor w/ 1 level "a": 1
>
>
>   I naively expected "str(unlist(list(Sys.Date())))" to return an
> object of class 'Date'.  After some thought, I felt a need to ask this
> list if they think that the core R language might benefit from modifying
> the language so "str(unlist(list(Sys.Date())))" was of class 'Date', at
> least as an option.
>
>
>   Comments?
>   Thanks,
>   Spencer Graves
>
>
>  > sessionInfo()
> R version 4.2.2 (2022-10-31)
> Platform: x86_64-apple-darwin17.0 (64-bit)
> Running under: macOS Big Sur 11.7.1
>
> Matrix products: default
> LAPACK:
> /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_4.2.2 tools_4.2.2
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] tools:: extracting pkg dependencies from DCF

2022-10-28 Thread Gabriel Becker
Hi Jan,


On Fri, Oct 28, 2022 at 1:57 PM Jan Gorecki  wrote:

> Gabriel,
>
> It is the most basic CI use case. One wants to install only
> dependencies only of the package, and run R CMD check on the package.


Really what you're looking for, though, is to install all the dependencies
which aren't already present, right? Excluding base packages is just a
particular way to do that under certain assumptions about the CI environment.

So


needed_pkgs <- setdiff(package_dependencies(...),
                       installed.packages()[, "Package"])
install.packages(needed_pkgs, repos = fancyrepos)


will do what you want without installing the package itself, if that is
important. This will filter out base and recommended packages (which will
be already installed in your CI container, since R is).


Now, this does not take into account versioned dependencies, so it's not
actually fully correct (whereas installing the package is), but it gets you
where you're trying to go. And in a clean CI container without cached
package installations for the deps, it's equivalent.


Also, as an aside, if you need to get the base packages, you can do

installed.packages(priority="base")[,"Package"]

        base    compiler    datasets    graphics   grDevices        grid
      "base"  "compiler"  "datasets"  "graphics" "grDevices"      "grid"
     methods    parallel     splines       stats      stats4       tcltk
   "methods"  "parallel"   "splines"     "stats"    "stats4"     "tcltk"
       tools       utils
     "tools"     "utils"

(to get base and recommended packages use 'high' instead of 'base')

No need to be reaching down into unexported functions. So if you *really*
only want to exclude base packages (which likely will give you some
protection from versioned dep issues), you can change the code above to

needed_pkgs <- setdiff(package_dependencies(...),
installed.packages(priority = "high")[,"Package"])
install.packages(needed_pkgs, repos = fancyrepos)

Best,
~G


> On Fri, Oct 28, 2022 at 8:42 PM Gabriel Becker 
> wrote:
> >
> > Hi Jan,
> >
> > The reason, I suspect (without speaking for R-core), is that by design you
> should not be specifying package dependencies as additional packages to
> install. install.packages already does this for you, as it did in the
> "construct a repository" code that I provided previously in the thread.
> You should be *only* doing
> >
> > install.packages(<pkg>, repos = *)
> >
> > Then everything happens automatically via extremely well-tested, very
> mature code.
> >
> > I (still) don't understand why you'd need to pass install.packages the
> vector of dependencies yourself, as that is counter to install.packages'
> core design.
> >
> > Does that make sense?
> >
> > Best,
> > ~G
> >
> > On Fri, Oct 28, 2022 at 12:18 PM Jan Gorecki 
> wrote:
> >>
> >> Gabriel,
> >>
> >> I am trying to design generic solution that could be applied to
> >> arbitrary package. Therefore I went with the latter solution you
> >> proposed.
> >> If we didn't have to exclude base packages, then it's a 3-liner
> >>
> >> file.copy("DESCRIPTION", file.path(tdir<-tempdir(), "PACKAGES"));
> >> db<-available.packages(paste0("file://", tdir));
> >> utils::install.packages(tools::package_dependencies("pkgname", db,
> >> which="most")[[1L]])
> >>
> >> As you noticed, we still have to filter out base packages. Otherwise
> >> it won't be a robust utility that can be used in CI. Therefore we have
> >> to add a call to tools:::.get_standard_package_names(), which is an
> >> internal function (as of now), not only complicating the call but also
> >> putting the functionality outside of safe use.
> >>
> >> Considering above, don't you agree that the following one liner could
> >> nicely address the problem? The problem that hundreds/thousands of
> >> packages are now addressing in their CI scripts by using a third party
> >> packages.
> >>
> >> utils::install.packages(packages.dcf("DESCRIPTION", which="most"))
> >>
> >> It is hard for me to understand why R core members don't consider this
> >> basic functionality to be part of base R. Possibly they just don't need
> >> it themselves. Yet isn't it sufficient that hundreds/thousands of
> >> packages do need this functionality?
> >>
> >> Best regards,
> >> Jan
> >>
> >> On Mon, Oct 17, 2022 at 8:3

Re: [Rd] Lazy-evaluate elements wrapped with invisible

2022-10-28 Thread Gabriel Becker
Hi Dipterix,


On Fri, Oct 28, 2022 at 1:10 PM Dipterix Wang 
wrote:

> Hi,
>
> I was wondering if it is a good idea to delay the evaluation of expression
> within invisible(), just like data()/delayedAssign()?
>
> The idea is that a function might return an invisible object. This object
> might not be used at all if the function's return value is neither
> assigned nor passed to another function call. For example,
>
> f <- function() {
>   # do something eagerly
>
>   return(invisible({
> # calculate message that might take long/extra memory, but only useful
> if printed out
>   }))
> }
>
> If `f()` is not immediately assigned to a variable, then there is no
> reason to evaluate invisible(…).
>

This is not quite true. The value, even when invisible, is captured by
.Last.value, and

> f <- function() invisible(5)

> f()

> .Last.value

[1] 5


Now that doesn't actually preclude what you're suggesting (you'd just have
to wait for .Last.value to be populated by something else), but it does
complicate it to the extent that I'm not sure the benefit we'd get would
be worth it.

Also, in the case you're describing, you'd be pushing the computational
cost into printing, which, imo, is not where it should live. Printing a
value, generally speaking, should just print things.

That said, if you really wanted to do this, you could approach the
behavior you want, I believe (but again, I think this is a bad idea), by
returning a custom class that wraps a formula (or, I imagine, a
tidyverse-style quosure) that reaches back into the call frame you return
it from, and evaluating it only on demand.
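
A rough sketch of that general idea, using an explicit expression plus
environment rather than a formula (all names here are hypothetical, and
again, I think this is a bad idea):

f <- function() {
  ## eager work happens here ...
  res <- structure(
    list(expr = quote(paste("expensive message built at", Sys.time())),
         env = environment()),
    class = "lazyvalue")
  invisible(res)
}

## the expensive part is only paid if/when the value is actually printed
print.lazyvalue <- function(x, ...) {
  print(eval(x$expr, x$env), ...)
  invisible(x)
}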

Best,
~G


> This idea is somewhere between `delayedAssign` and eager evaluation. Maybe
> we could call it delayedInvisible()?
>
> Best,
> - Zhengjia
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] tools:: extracting pkg dependencies from DCF

2022-10-28 Thread Gabriel Becker
 > What about repos.dcf? Maybe additional repositories could be an
> attribute
> >> > attached to returned character vector.
> >> >
> >> > The use case is to, for a given package sources, obtain its
> dependencies,
> >> > so one can use that for installing them/mirroring CRAN subset, or
> whatever.
> >> > The later is especially important for a production environment where
> one
> >> > wants to have fixed version of packages, and mirroring relevant
> subset of
> >> > CRAN is the most simple, and IMO reliable, way to manage such
> environment.
> >> >
> >> > Regards
> >> > Jan
> >> >
> >> > On Fri, Oct 14, 2022, 23:34 Gabriel Becker 
> wrote:
> >> >
> >> >> Hi Jan and Jan,
> >> >>
> >> >> Can you explain a little more what exactly you want the
> non-recursive,
> >> >> non-version aware dependencies from an individual package for?
> >> >>
> >> >> Either way package_dependencies will do this for you* with a little
> >> >> "aggressive convincing". It wants output from available.packages,
> but who
> >> >> really cares what it wants? It's a function and we are people :)
> >> >>
> >> >>> library(tools)
> >> >>> db <- read.dcf("~/gabe/checkedout/rtables_clean/DESCRIPTION")
> >> >>> package_dependencies("rtables", db, which = intersect(c("Depends",
> >> >> "Suggests", "Imports", "LinkingTo"), colnames(db)))
> >> >> $rtables
> >> >> [1] "methods""magrittr"   "formatters" "dplyr"  "tibble"
> >> >> [6] "tidyr"  "testthat"   "xml2"   "knitr"  "rmarkdown"
> >> >> [11] "flextable"  "officer""stats"  "htmltools"  "grid"
> >> >>
> >> >>
> >> >> The only gotcha that I see immediately is that "LinkingTo" isn't
> always
> >> >> there (whereas it is with real output from available.packages). If
> you
> >> >> know your package doesn't have that (or that it does) at call time ,
> this
> >> >> becomes a one-liner:
> >> >>
> >> >> package_dependencies("rtables", db =
> >> >> read.dcf("~/gabe/checkedout/rtables_clean/DESCRIPTION"), which =
> >> >> c("Depends", "Suggests", "Imports"))
> >> >> $rtables
> >> >> [1] "methods""magrittr"   "formatters" "dplyr"  "tibble"
> >> >> [6] "tidyr"  "testthat"   "xml2"   "knitr"  "rmarkdown"
> >> >> [11] "flextable"  "officer""stats"  "htmltools"  "grid"
> >> >>
> >> >> You can also trick it a slightly different way by giving it what it
> >> >> actually wants
> >> >>
> >> >>> tdir <- tempdir()
> >> >>> file.copy("~/gabe/checkedout/rtables_clean/DESCRIPTION",
> file.path(tdir,
> >> >> "PACKAGES"))
> >> >> [1] TRUE
> >> >>> avl <- available.packages(paste0("file://", tdir))
> >> >>> library(tools)
> >> >>> package_dependencies("rtables", avl)
> >> >> $rtables
> >> >> [1] "methods""magrittr"   "formatters" "stats"  "htmltools"
> >> >> [6] "grid"
> >> >>
> >> >>> package_dependencies("rtables", avl, which = "all")
> >> >> $rtables
> >> >> [1] "methods""magrittr"   "formatters" "stats"  "htmltools"
> >> >> [6] "grid"   "dplyr"  "tibble" "tidyr"  "testthat"
> >> >> [11] "xml2"   "knitr"  "rmarkdown"  "flextable"  "officer"
> >> >>
> >> >> So the only real benefits I see that we'd be picking up here is
> automatic
> >> >> filtering by priority, and automatic extraction of the package name
> from
> &

Re: [Rd] tools:: extracting pkg dependencies from DCF

2022-10-15 Thread Gabriel Becker
Rlib/syswide-4.1.2"
base64enc "file:///Users/gabrielbecker/Rlib/syswide-4.1.2"
> package_dependencies("rtables", avl, recursive = TRUE)
$rtables
 [1] "methods""magrittr"   "formatters" "stats"  "htmltools"
 [6] "grid"   "utils"  "digest" "grDevices"  "base64enc"
[11] "rlang"  "fastmap"

> package_dependencies("rtables", avl, which = "all", recursive = TRUE)
$rtables
  [1] "methods"
  [2] "magrittr"
  [3] "formatters"
  [4] "stats"
  [5] "htmltools"
  [6] "grid"
  [7] "dplyr"
  [8] "tibble"

 
[653] "rjson"
[654] "rsolr"
[655] "rlecuyer"
[656] "filelock"

Now you should probably move the PACKAGES file somewhere else and not leave
it in your package library, but I trust this illustrated my point. Most of
the exported machinery is based on available.packages output, but that's
not really a meaningful blocker for this type of work. We can get
available.packages output if we need to. Remember the PACKAGES file is just
a bunch of DESCRIPTION files slightly trimmed and then appended one after
the other.

This also shows why recursive = TRUE and which = "all" don't really go
together. In my opinion (and thus switchr's), the correct thing to do is
use "all" for the package in question and then only hard dependencies of
those packages, recursively. That will let you build the package's
vignettes (if you care about such things), but won't pull in hundreds or
thousands of reverse deps.
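
Sketched in code, using the avl object constructed above (a sketch of the
approach, not switchr's actual implementation):

## "all"-type dependencies for the target package itself...
direct <- tools::package_dependencies("rtables", avl, which = "all")[[1L]]

## ...then only hard deps (Depends/Imports/LinkingTo) of those, recursively:
hard <- tools::package_dependencies(direct, avl,
                                    which = c("Depends", "Imports", "LinkingTo"),
                                    recursive = TRUE)
alldeps <- unique(c(direct, unlist(hard)))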

Best,
~G



> Regards
> Jan
>
> On Fri, Oct 14, 2022, 23:34 Gabriel Becker  wrote:
>
>> Hi Jan and Jan,
>>
>> Can you explain a little more what exactly you want the non-recursive,
>> non-version aware dependencies from an individual package for?
>>
>> Either way package_dependencies will do this for you* with a little
>> "aggressive convincing". It wants output from available.packages, but who
>> really cares what it wants? It's a function and we are people :)
>>
>> > library(tools)
>> > db <- read.dcf("~/gabe/checkedout/rtables_clean/DESCRIPTION")
>> > package_dependencies("rtables", db, which = intersect(c("Depends",
>> "Suggests", "Imports", "LinkingTo"), colnames(db)))
>> $rtables
>>  [1] "methods""magrittr"   "formatters" "dplyr"  "tibble"
>>  [6] "tidyr"  "testthat"   "xml2"   "knitr"  "rmarkdown"
>> [11] "flextable"  "officer""stats"  "htmltools"  "grid"
>>
>>
>> The only gotcha that I see immediately is that "LinkingTo" isn't always
>> there (whereas it is with real output from available.packages). If you
>> know your package doesn't have that (or that it does) at call time , this
>> becomes a one-liner:
>>
>> package_dependencies("rtables", db =
>> read.dcf("~/gabe/checkedout/rtables_clean/DESCRIPTION"), which =
>> c("Depends", "Suggests", "Imports"))
>> $rtables
>>  [1] "methods""magrittr"   "formatters" "dplyr"  "tibble"
>>  [6] "tidyr"  "testthat"   "xml2"   "knitr"  "rmarkdown"
>> [11] "flextable"  "officer""stats"  "htmltools"  "grid"
>>
>> You can also trick it a slightly different way by giving it what it
>> actually wants
>>
>> > tdir <- tempdir()
>> > file.copy("~/gabe/checkedout/rtables_clean/DESCRIPTION",
>> file.path(tdir, "PACKAGES"))
>> [1] TRUE
>> > avl <- available.packages(paste0("file://", tdir))
>> > library(tools)
>> > package_dependencies("rtables", avl)
>> $rtables
>> [1] "methods""magrittr"   "formatters" "stats"  "htmltools"
>> [6] "grid"
>>
>> > package_dependencies("rtables", avl, which = "all")
>> $rtables
>>  [1] "methods""magrittr"   "formatters" "stats"  "htmltools"
>>  [6] "grid"   "dplyr"  "tibble" "tidyr"  "testthat"
>> [11] "xml2"   "knitr"  "rmarkdown"  "flextable"  "

Re: [Rd] tools:: extracting pkg dependencies from DCF

2022-10-14 Thread Gabriel Becker
Hi Jan and Jan,

Can you explain a little more what exactly you want the non-recursive,
non-version aware dependencies from an individual package for?

Either way package_dependencies will do this for you* with a little
"aggressive convincing". It wants output from available.packages, but who
really cares what it wants? It's a function and we are people :)

> library(tools)
> db <- read.dcf("~/gabe/checkedout/rtables_clean/DESCRIPTION")
> package_dependencies("rtables", db, which = intersect(c("Depends",
"Suggests", "Imports", "LinkingTo"), colnames(db)))
$rtables
 [1] "methods""magrittr"   "formatters" "dplyr"  "tibble"
 [6] "tidyr"  "testthat"   "xml2"   "knitr"  "rmarkdown"
[11] "flextable"  "officer""stats"  "htmltools"  "grid"


The only gotcha that I see immediately is that "LinkingTo" isn't always
there (whereas it is with real output from available.packages). If you know
your package doesn't have that (or that it does) at call time , this
becomes a one-liner:

package_dependencies("rtables", db =
read.dcf("~/gabe/checkedout/rtables_clean/DESCRIPTION"), which =
c("Depends", "Suggests", "Imports"))
$rtables
 [1] "methods""magrittr"   "formatters" "dplyr"  "tibble"
 [6] "tidyr"  "testthat"   "xml2"   "knitr"  "rmarkdown"
[11] "flextable"  "officer""stats"  "htmltools"  "grid"

You can also trick it a slightly different way by giving it what it
actually wants

> tdir <- tempdir()
> file.copy("~/gabe/checkedout/rtables_clean/DESCRIPTION", file.path(tdir,
"PACKAGES"))
[1] TRUE
> avl <- available.packages(paste0("file://", tdir))
> library(tools)
> package_dependencies("rtables", avl)
$rtables
[1] "methods""magrittr"   "formatters" "stats"  "htmltools"
[6] "grid"

> package_dependencies("rtables", avl, which = "all")
$rtables
 [1] "methods""magrittr"   "formatters" "stats"  "htmltools"
 [6] "grid"   "dplyr"  "tibble" "tidyr"  "testthat"
[11] "xml2"   "knitr"  "rmarkdown"  "flextable"  "officer"

So the only real benefits I see that we'd be picking up here are automatic
filtering by priority and automatic extraction of the package name from
the DESCRIPTION file. I'm not sure either of those warrants a new exported
function that R-core has to maintain forever.

Best,
~G

* I haven't tested this across all OSes, but I don't know of any reason it
wouldn't work generally.

On Fri, Oct 14, 2022 at 2:33 PM Jan Gorecki  wrote:

> Hello Jan,
>
> Thanks for confirming about many packages reinventing this missing
> functionality.
> packages.dcf was not meant to handle versions. It just extracts the names
> of dependencies... Yes, such a simple thing, yet missing in base R.
>
> Versions of packages can be controlled when setting up an R package repo.
> This is how I used to handle it: making a CRAN subset mirror of
> fixed-version pkgs. BTW, a function for that is also included in the
> mentioned branch. I am just not proposing it, to increase the chance of
> having at least this simple, missing functionality merged.
>
> Best
> Jan
>
> On Fri, Oct 14, 2022, 15:14 Jan Netík  wrote:
>
> > Hello Jan,
> >
> > I have seen many packages that implemented dependencies "extraction" on
> > their own for internal purposes and today I was doing exactly that for
> > mine. It's not a big deal using read.dcf on DESCRIPTION. It was
> sufficient
> > for me, but I had to take care of some \n chars (the overall returned
> value
> > has some rough edges, in my opinion). However, the function from the
> branch
> > seems to not care about version requirements, which are crucial for me.
> > Maybe that is something to reconsider before merging.
> >
> > Best,
> > Jan
> >
> > pá 14. 10. 2022 v 2:27 odesílatel Jan Gorecki 
> > napsal:
> >
> >> Dear R devs,
> >>
> >> I would like to raise a request for a simple helper function.
> >> Utility function to extract package dependencies from DESCRIPTION file.
> >>
> >> I do think that tools package is better place, for such a fundamental
> >> functionality, than community packages.
> >>
> >> tools pkg seems perfect fit (having already great function
> >> write_PACKAGES).
> >>
> >> Functionality I am asking for is already in R svn repository since 2016,
> >> in
> >> a branch tools4pkgs. Function is called 'packages.dcf'.
> >> Another one 'repos.dcf' would be a good functional complementary to it.
> >>
> >> Those two simple helper functions really makes it easier for
> organizations
> >> to glue together usage of their own R packages repos and CRAN repo in a
> >> smooth way. That could possibly help to offload CRAN from new
> submissions.
> >>
> >> gh mirror link for easy preview:
> >>
> >>
> https://github.com/wch/r-source/blob/tools4pkgs/src/library/tools/R/packages.R#L419
> >>
> >> Regards
> >> Jan Gorecki
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> __
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >
>
>  

Re: [Rd] Proposal to limit Internet access during package load

2022-09-26 Thread Gabriel Becker
Ah, that's embarrassing. That's a bug in how/where I handle lack of
connectivity, rather than me not handling it. I've just pushed a fix to
the github repo that now cleanly passes check with no internet
connectivity (a much more stringent test).

Using a canned file is a bit odd, because in the case where there's no
connectivity, the package won't work anyway (the canned file would just
set the repositories to URLs that R still won't be able to reach).

Anyway,
Thanks
~G

On Mon, Sep 26, 2022 at 3:11 PM Simon Urbanek 
wrote:

>
>
> > On 27/09/2022, at 11:02 AM, Gabriel Becker 
> wrote:
> >
> > For the record, the only things switchr (my package) is doing internet
> wise should be hitting the bioconductor config file (
> http://bioconductor.org/config.yaml) so that it knows the things it needs
> to know about Bioc repos/versions/etc (at load time, actually, not install
> time, but since install does a test load, those are essentially the same).
> >
> > I have fallback behavior for when the file can't be read, so there
> shouldn't be any actual build breakages/install breakages I don't think,
> but the check does happen.
> >
>
> $ sandbox-exec -n no-network R CMD INSTALL switchr_0.14.5.tar.gz
> [...]
> ** testing if installed package can be loaded from final location
> Error in readLines(con) :
>   cannot open the connection to 'http://bioconductor.org/config.yaml'
> Calls:  ... getBiocDevelVr -> getBiocYaml -> inet_handlers ->
> readLines
> Execution halted
> ERROR: loading failed
>
> So, yes, it does break. You should recover from the error and use a
> fall-back file that you ship.
>
> Cheers,
> Simon
>
>
> > Advice on what to do for the above use case that is better practice is
> welcome.
> >
> > ~G
> >
> > On Mon, Sep 26, 2022 at 2:40 PM Simon Urbanek <
> simon.urba...@r-project.org> wrote:
> >
> >
> > > On 27/09/2022, at 10:21 AM, Iñaki Ucar 
> wrote:
> > >
> > > On Mon, 26 Sept 2022 at 23:07, Simon Urbanek
> > >  wrote:
> > >>
> > >> Iñaki,
> > >>
> > >> I'm not sure I understand - system dependencies are an entirely
> different topic and I would argue a far more important one (very happy to
> start a discussion about that), but that has nothing to do with declaring
> downloads. I assumed your question was about large files in packages which
> packages avoid to ship and download instead so declaring them would be
> useful.
> > >
> > > Exactly. Maybe there's a misunderstanding, because I didn't talk about
> system dependencies (alas there are packages that try to download things
> that are declared as system dependencies, as Gabe noted). :)
> > >
> >
> >
> > Ok, understood. I would like to tackle those as well, but let's start
> that conversation in a few weeks when I have a lot more time.
> >
> >
> > >> And for that, the obvious answer is they shouldn't do that - if a
> package needs a file to run, it should include it. So an easy solution is
> to disallow it.
> > >
> > > Then we completely agree. My proposal about declaring additional
> sources was because, given that so many packages do this, I thought that I
> would find a strong opposition to this. But if R Core / CRAN is ok with
> just limiting net access at install time, then that's perfect to me. :)
> > >
> >
> > Yes we do agree :). I started looking at your list, and so far those
> seem simply bugs or design deficiencies in the packages (and outright
> policy violations). I think the only reason they exist is that it doesn't
> get detected in CRAN incoming, it's certainly not intentional.
> >
> > Cheers,
> > Simon
> >
> >
> > > Iñaki
> > >
> > >> But so far all examples were just (ab)use of downloads for binary
> dependencies which is an entirely different issue that needs a different
> solution (in a naive way declaring such dependencies, but we know it's not
> that simple - and download URLs don't help there).
> > >>
> > >> Cheers,
> > >> Simon
> > >>
> > >>
> > >>> On 27/09/2022, at 8:25 AM,  Ucar  wrote:
> > >>>
> > >>> On Sat, 24 Sept 2022 at 01:55, Simon Urbanek
> > >>>  wrote:
> > >>>>
> > >>>> Iñaki,
> > >>>>
> > >>>> I fully agree, this a very common issue since vast majority of
> server deployments I have encountered don't allow internet access. In
> practice this means that such packages are effectively banned.
> > >>>>
> > >>>> I would

Re: [Rd] Proposal to limit Internet access during package load

2022-09-26 Thread Gabriel Becker
For the record, the only things switchr (my package) is doing
internet-wise should be hitting the bioconductor config file
(http://bioconductor.org/config.yaml) so that it knows the things it
needs to know about Bioc repos/versions/etc (at load time, actually, not
install time, but since install does a test load, those are essentially
the same).

I have fallback behavior for when the file can't be read, so there
shouldn't be any actual build/install breakages, I don't think, but the
check does happen.
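
For the curious, the shape of that fallback is roughly the following (a
simplified sketch, not switchr's actual code; the function name and the
empty-default behavior are made up):

getBiocYamlSafely <- function(url = "http://bioconductor.org/config.yaml") {
  tryCatch(readLines(url),
           error = function(e) {
             ## no connectivity: fall back to conservative built-in defaults
             ## (or to a copy of the file shipped inside the package)
             character(0)
           })
}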

Advice on what to do for the above use case that is better practice is
welcome.

~G

On Mon, Sep 26, 2022 at 2:40 PM Simon Urbanek 
wrote:

>
>
> > On 27/09/2022, at 10:21 AM, Iñaki Ucar  wrote:
> >
> > On Mon, 26 Sept 2022 at 23:07, Simon Urbanek
> >  wrote:
> >>
> >> Iñaki,
> >>
> >> I'm not sure I understand - system dependencies are an entirely
> different topic and I would argue a far more important one (very happy to
> start a discussion about that), but that has nothing to do with declaring
> downloads. I assumed your question was about large files in packages which
> packages avoid to ship and download instead so declaring them would be
> useful.
> >
> > Exactly. Maybe there's a misunderstanding, because I didn't talk about
> system dependencies (alas there are packages that try to download things
> that are declared as system dependencies, as Gabe noted). :)
> >
>
>
> Ok, understood. I would like to tackle those as well, but let's start that
> conversation in a few weeks when I have a lot more time.
>
>
> >> And for that, the obvious answer is they shouldn't do that - if a
> package needs a file to run, it should include it. So an easy solution is
> to disallow it.
> >
> > Then we completely agree. My proposal about declaring additional sources
> was because, given that so many packages do this, I thought that I would
> find a strong opposition to this. But if R Core / CRAN is ok with just
> limiting net access at install time, then that's perfect to me. :)
> >
>
> Yes we do agree :). I started looking at your list, and so far those seem
> simply bugs or design deficiencies in the packages (and outright policy
> violations). I think the only reason they exist is that it doesn't get
> detected in CRAN incoming, it's certainly not intentional.
>
> Cheers,
> Simon
>
>
> > Iñaki
> >
> >> But so far all examples were just (ab)use of downloads for binary
> dependencies which is an entirely different issue that needs a different
> solution (in a naive way declaring such dependencies, but we know it's not
> that simple - and download URLs don't help there).
> >>
> >> Cheers,
> >> Simon
> >>
> >>
> >>> On 27/09/2022, at 8:25 AM,  Ucar  wrote:
> >>>
> >>> On Sat, 24 Sept 2022 at 01:55, Simon Urbanek
> >>>  wrote:
> 
>  Iñaki,
> 
>  I fully agree, this a very common issue since vast majority of server
> deployments I have encountered don't allow internet access. In practice
> this means that such packages are effectively banned.
> 
>  I would argue that not even (1) or (2) are really an issue, because
> in fact the CRAN policy doesn't impose any absolute limits on size, it only
> states that the package should be "of minimum necessary size" which means
> it shouldn't waste space. If there is no way to reduce the size without
> impacting functionality, it's perfectly fine.
> >>>
> >>> "Packages should be of the minimum necessary size" is subject to
> >>> interpretation. And in practice, there is an issue with e.g. packages
> >>> that "bundle" big third-party libraries. There are also packages that
> >>> require downloading precompiled code, JARs... at installation time.
> >>>
>  That said, there are exceptions such as very large datasets (e.g., as
> distributed by Bioconductor) which are orders of magnitude larger than what
> is sustainable. I agree that it would be nice to have a mechanism for
> specifying such sources. So yes, I like the idea, but I'd like to see more
> real use cases to justify the effort.
> >>>
> >>> "More real use cases" like in "more use cases" or like in "the
> >>> previous ones are not real ones"? :)
> >>>
>  The issue with any online downloads, though, is that there is no
> guarantee of availability - which is real issue for reproducibility. So one
> could argue that if such external sources are required then they should be
> on a well-defined, independent, permanent storage such as Zenodo. This
> could be a matter of policy as opposed to the technical side above which
> would be adding such support to R CMD INSTALL.
> >>>
> >>> Not necessarily. If the package declares the additional sources in the
> >>> DESCRIPTION (probably with hashes), that's a big improvement over the
> >>> current state of things, in which basically we don't know what the
> >>> package tries download, then it may fail, and finally there's no
> >>> guarantee that it's what the author intended in the first place.
> >>>
> >>> But on top of this, R could add a CMD to 

Re: [Rd] Proposal to limit Internet access during package load

2022-09-26 Thread Gabriel Becker
Hi Simon,

The example of this I'm aware of that is most popular and widely used "in
the wild" is the stringi package (which is a dep of the widely used stringr
pkg) whose configure file downloads the ICU Data Library (icudt).

See https://github.com/gagolews/stringi/blob/master/configure#L5412

Note it does have some sort of workaround in place for
non-internet-capable build machines, but it is external (the build in
question fails unless the workaround has already been explicitly
performed).

Best,
~G



On Mon, Sep 26, 2022 at 12:50 PM Simon Urbanek 
wrote:

>
>
> > On Sep 27, 2022, at 8:25 AM, Iñaki Ucar  wrote:
> >
> > On Sat, 24 Sept 2022 at 01:55, Simon Urbanek
> >  wrote:
> >>
> >> Iñaki,
> >>
> >> I fully agree, this a very common issue since vast majority of server
> deployments I have encountered don't allow internet access. In practice
> this means that such packages are effectively banned.
> >>
> >> I would argue that not even (1) or (2) are really an issue, because in
> fact the CRAN policy doesn't impose any absolute limits on size, it only
> states that the package should be "of minimum necessary size" which means
> it shouldn't waste space. If there is no way to reduce the size without
> impacting functionality, it's perfectly fine.
> >
> > "Packages should be of the minimum necessary size" is subject to
> > interpretation. And in practice, there is an issue with e.g. packages
> > that "bundle" big third-party libraries. There are also packages that
> > require downloading precompiled code, JARs... at installation time.
> >
>
> JARs are part of the package, so that's a valid use, no question there,
> that's how Java packages do this already.
>
> Downloading pre-compiled binaries is something that shouldn't be done and
> a whole can of worms (since those are not sources and it *is* specific to
> the platform, os etc.) that is entirely separate, but worth a separate
> discussion. So I still don't see any use cases for actual sources. I do see
> a need for better specification of external dependencies which are not part
> of the package such that those can be satisfied automatically - but that's
> not the problem you asked about.
>
>
> >> That said, there are exceptions such as very large datasets (e.g., as
> distributed by Bioconductor) which are orders of magnitude larger than what
> is sustainable. I agree that it would be nice to have a mechanism for
> specifying such sources. So yes, I like the idea, but I'd like to see more
> real use cases to justify the effort.
> >
> > "More real use cases" like in "more use cases" or like in "the
> > previous ones are not real ones"? :)
> >
> >> The issue with any online downloads, though, is that there is no
> guarantee of availability - which is real issue for reproducibility. So one
> could argue that if such external sources are required then they should be
> on a well-defined, independent, permanent storage such as Zenodo. This
> could be a matter of policy as opposed to the technical side above which
> would be adding such support to R CMD INSTALL.
> >
> > Not necessarily. If the package declares the additional sources in the
> > DESCRIPTION (probably with hashes), that's a big improvement over the
> > current state of things, in which basically we don't know what the
> > package tries download, then it may fail, and finally there's no
> > guarantee that it's what the author intended in the first place.
> >
> > But on top of this, R could add a CMD to download those, and then some
> > lookaside storage could be used on CRAN. This is e.g. how RPM
> > packaging works: the spec declares all the sources, they are
> > downloaded once, hashed and stored in a lookaside cache. Then package
> > building doesn't need general Internet connectivity, just access to
> > the cache.
> >
>
> Sure, I fully agree that it would be a good first step, but I'm still
> waiting for examples ;).
>
> Cheers,
> Simon
>
>
> > Iñaki
> >
> >>
> >> Cheers,
> >> Simon
> >>
> >>
> >>> On Sep 24, 2022, at 3:22 AM, Iñaki Ucar 
> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I'd like to open this debate here, because IMO this is a big issue.
> >>> Many packages do this for various reasons, some more legitimate than
> >>> others, but I think that this shouldn't be allowed, because it
> >>> basically means that installation fails in a machine without Internet
> >>> access (which happens e.g. in Linux distro builders for security
> >>> reasons).
> >>>
> >>> Now, what if connection is suppressed during package load? There are
> >>> basically three use cases out there:
> >>>
> >>> (1) The package requires additional files for the installation (e.g.
> >>> the source code of an external library) that cannot be bundled into
> >>> the package due to CRAN restrictions (size).
> >>> (2) The package requires additional files for using it (e.g.,
> >>> datasets, a JAR...) that cannot be bundled into the package due to
> >>> CRAN restrictions (size).
> >>> (3) Other spurious reasons (e.g. the maintainer 

Re: [Rd] Respecting custom repositories files in interactive/batch R sessions

2022-09-15 Thread Gabriel Becker
Hi Dirk,

So there are a couple of things going on. First off, you're correct that
that works generally. There were a couple of reasons it did not here. The
first is a bug/design error in RStudio which causes R_PROFILE not to be
adhered to when you build there. I will be filing a bug regarding that
with them, as I know it is irrelevant to this list. There was some
indication that even raw R CMD check run via an RStudio Server
installation was missing the profile, but that ended up being spurious
upon deeper testing.

That said, I do think there is a case to be made for the ability to
modify what repositories R knows about at a more fundamental level than
setting options in a site profile, and that is, ostensibly, what the
repositories file machinery does. I understand it was intended initially
and is currently only (?) used for the Windows repository GUI menu and the
related setRepositories function, but I still think there is some value in
extending it in the ways I described.

One major difference is that in this case, even when run with --vanilla,
administrators would still be in control of which repositories users hit
(by default only, of course, but there is still value in that).

Best,
~G

On Thu, Sep 15, 2022 at 11:31 AM Dirk Eddelbuettel  wrote:

>
> I may be missing something here but aren't you overcomplicating things?
> One
> can avoid the repetitive dialog by setting   options(repos)   accordingly,
> and I have long done so.  The Debian (and hence Ubuntu and other
> derivatives)
> package does so via the Rprofile.site I ship.  See e.g. here
>
>  https://sources.debian.org/src/r-base/4.2.1-2/debian/Rprofile.site/
>
> I have used the same mechanism to point to intra-company repositories,
> easily
> a decade or so ago. I had no problems with R CMD check of in-house packages
> using this.
>
> Dirk
>
> --
> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Respecting custom repositories files in interactive/batch R sessions

2022-09-15 Thread Gabriel Becker
Hi all,

A company I work with mirrors CRAN internally behind its firewall for
security (and reproducibility/consistency/etc) reasons. In that case, we
would like all R processes (across all the R CMD *, as well as interactive
and batch sessions) to automatically hit our CRAN mirror instead of
prompting the user to select a mirror or failing to contact CRAN at all
(during check).

I recently found out about the ${R_HOME}/etc/repositories file (after
multiple years owning the R installations of a sizable corporate research
outfit in my previous job).

Contrary to my expectations, however, the CRAN entry found in the
repositories file is not respected in interactive or batch sessions.

With the value "https://fakeyfakeyfake; for the CRAN URL, I get this
behavior in an interactive session in Rdevel built from trunk:

R Under development (unstable) (2022-09-14 r82853) -- "Unsuffered
Consequences"





> available.packages()

--- Please select a CRAN mirror for use in this session ---

Secure CRAN mirrors


 


Selection: 0

Error in contrib.url(repos, type) :

  trying to use CRAN without setting a mirror

> readLines(file.path(R.home(), "etc", "repositories"))

 

[11] "menu_name\tURL\tdefault\tsource\twin.binary\tmac.binary"


[12] "CRAN\tCRAN\t\*"https://fakeyfakeyfake
\"*\tTRUE\tTRUE\tTRUE\tTRUE"
  


R CMD check, on the other hand, *does* use the entry in repositories out
of the box:


gabrielbecker$ Rdevel CMD check switchr_0.14.5.tar.gz

[1]
"/Users/gabrielbecker/local/Rdevelraw/R.framework/Versions/4.3/Resources/library"

* using log directory ‘/Users/gabrielbecker/gabe/checkedout/switchr.Rcheck’

* using R Under development (unstable) (2022-09-14 r82853)

* using platform: x86_64-apple-darwin21.5.0 (64-bit)

* using session charset: UTF-8

* checking for file ‘switchr/DESCRIPTION’ ... OK

* checking extension type ... Package

* this is package ‘switchr’ version ‘0.14.5’

* checking package namespace information ... OK

* checking package dependencies ...Warning: unable to access index for
repository https://fakeyfakeyfake/src/contrib:

  cannot open URL 'https://fakeyfakeyfake/src/contrib/PACKAGES'


This behavior is coming from the fact that the repos option is
unilaterally set to c(CRAN = "@CRAN@") in utils::.onLoad.


I propose instead that this should be set to either a) the CRAN entry of
the repositories file, or, even better imho, b) the set of all repos
marked as default in the repositories file, with the caveat that it is set
to @CRAN@ in the case where there is no CRAN entry, though comments around
the source code in tools suggest other things will break in that case
anyway.
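
For option (b), the logic would be roughly the following (a sketch only;
the real patch lives in utils::.onLoad, and the parsing would mirror what
setRepositories() already does internally):

repos_path <- file.path(R.home("etc"), "repositories")
tab <- utils::read.delim(repos_path, header = TRUE, comment.char = "#")
def <- tab[as.logical(tab$default), , drop = FALSE]
repos <- structure(as.character(def$URL), names = rownames(def))
if (!"CRAN" %in% names(repos))
    repos <- c(CRAN = "@CRAN@", repos)
options(repos = repos)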

The default value of the repositories file has @CRAN@ for the CRAN entry,
and CRAN is the only repo marked as default, so this preserves the
existing behavior in what I assume to be the overwhelming majority of
cases where the repositories file is either not customized, or is only
appended to.

I have a patch which does option (b) (and can easily be adapted to option
(a)) that I will submit to bugzilla after any discussion here.

Also, as a separate issue, I strongly feel that the R administration
manual section about repositories should be updated to more clearly
describe the behavior and best practices around setting the repos R will
look in. I will develop a patch for that separately once I see whether one
of the above changes is likely to go in or not (as I don't want to write
it twice).

For completeness, I know that we could put a setRepositories call in the
site Rprofile, but I have to admit I don't really understand why this
should be necessary.

Thoughts?
~G

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: svd() results should have a class

2022-06-23 Thread Gabriel Becker
This does make sense to me, though I admit to not feeling super strongly
about it, particularly in light of the precedent from qr().

It would also be "mostly" backwards compatible, as there would not be
methods for the new class for existing code to get hung up on out of the
gate. Particularly, if class(svd(...)) was c("svd", "list") to ensure
existing methods were always hit. I don't have a great sense of the
difference in end behavior between doing that and just having the class e
"svd", though.

Should we add this as a very late addition to the Bug BBQ list for further
discussion?

~G

On Thu, Jun 23, 2022 at 5:08 PM Lenth, Russell V 
wrote:

> Bob,
>
> I'm not talking about using svd as a generic method. I am talking about a
> method FOR svd results, e.g. an S3 method like foo.svd(), for which there
> already exist other methods, say foo.default and foo.qr. Currently if I
> wanted to do
>
> svdobj <- svd(x)
> foo(svdobj)
>
> it would not dispatch correctly because there is no svd class. Instead, it
> would be handled by foo.list if it exists, and it is certainly not clear
> that foo.list would do the right thing.
>
> Russ
>
> Sent from my iPad
>
> On Jun 23, 2022, at 6:53 PM, Robert Harlow  wrote:
>
> 
> Don't have a view on whether it makes sense in base R or not, but WRE
> section 7.1 may be helpful to you:
> https://cran.r-project.org/doc/manuals/R-exts.html#Adding-new-generics.
>
> It's not uncommon for packages to want to make base methods generic and
> the above link provides advice on how to do so.
>
> Bob
>
> On Thu, Jun 23, 2022 at 12:07 PM Lenth, Russell V  > wrote:
> Dear R-Devel,
>
> I noticed that if we run base::svd(x), we obtain an object of class
> "list". Shouldn't there be an "svd" class, in case someone (e.g., me) wants
> to write methods for singular value decompositions? Note that other
> matrix-decomposition routines like qr() and eigen() each return objects
> having those names.
>
> Thanks
>
> Russ Lenth
> russell-le...@uiowa.edu
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] string concatenation operator (revisited)

2021-12-06 Thread Gabriel Becker
As I recall, there was a large discussion related to that which resulted in
the recycle0 argument being added (but defaulting to FALSE) for
paste/paste0.

I think a lot of these things ultimately mean that if there were to be a
string concatenation operator, it probably shouldn't have behavior
identical to paste0. Was that what you were getting at as well, Bill?
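
For concreteness, one possible user-level sketch of such semantics (the
operator name %+% is hypothetical, and recycle0 requires R >= 4.0.1):

`%+%` <- function(e1, e2) {
  res <- paste0(e1, e2, recycle0 = TRUE)        # zero-length in => zero-length out
  res[is.na(e1) | is.na(e2)] <- NA_character_   # propagate missingness
  res
}

"hello" %+% "world"           # "helloworld"
"hello" %+% NA                # NA, not "helloNA"
character(0) %+% c("a", "b")  # character(0)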

~G

On Mon, Dec 6, 2021 at 4:11 PM Bill Dunlap  wrote:

> Should paste0(character(0), c("a","b")) give character(0)?
> There is a fair bit of code that assumes that paste("X",NULL) gives "X"
> but c(1,2)+NULL gives numeric(0).
>
> -Bill
>
> On Mon, Dec 6, 2021 at 1:32 PM Duncan Murdoch 
> wrote:
>
>> On 06/12/2021 4:21 p.m., Avraham Adler wrote:
>> > Gabe, I agree that missingness is important to factor in. To somewhat
>> abuse
>> > the terminology, NA is often used to represent missingness. Perhaps
>> > concatenating character something with character something missing
>> should
>> > result in the original character?
>>
>> I think that's a bad idea.  If you wanted to represent an empty string,
>> you should use "" or NULL, not NA.
>>
>> I'd agree with Gabe, paste0("abc", NA) shouldn't give "abcNA", it should
>> give NA.
>>
>> Duncan Murdoch
>>
>> >
>> > Avi
>> >
>> > On Mon, Dec 6, 2021 at 3:35 PM Gabriel Becker 
>> wrote:
>> >
>> >> Hi All,
>> >>
>> >> Seeing this and the other thread (and admittedly not having clicked
>> through
>> >> to the linked r-help thread), I wonder about NAs.
>> >>
>> >> Should NA <concat op> "hi there" not result in NA_character_? This is
>> >> not what any of the paste functions do, but in my opinion, NA + <string>
>> >> seems like it should be NA (not "NA"), particularly if we are talking
>> >> about `+` overloading, but potentially even in the case of a distinct
>> >> concatenation operator?
>> >>
>> >> I guess what I'm saying is that in my head missingness propagation rules
>> >> should take priority in such an operator (ie NA + <string> should
>> >> *always* be NA).
>> >>
>> >> Is that something others disagree with, or has it just not come up yet
>> in
>> >> (the parts I have read) of this discussion?
>> >>
>> >> Best,
>> >> ~G
>> >>
>> >> On Mon, Dec 6, 2021 at 10:03 AM Radford Neal 
>> >> wrote:
>> >>
>> >>>>> In pqR (see pqR-project.org), I have implemented ! and !! as binary
>> >>>>> string concatenation operators, equivalent to paste0 and paste,
>> >>>>> respectively.
>> >>>>>
>> >>>>> For instance,
>> >>>>>
>> >>>>>   > "hello" ! "world"
>> >>>>>   [1] "helloworld"
>> >>>>>   > "hello" !! "world"
>> >>>>>   [1] "hello world"
>> >>>>>   > "hello" !! 1:4
>> >>>>>   [1] "hello 1" "hello 2" "hello 3" "hello 4"
>> >>>>
>> >>>> I'm curious about the details:
>> >>>>
>> >>>> Would `1 ! 2` convert both to strings?
>> >>>
>> >>> They're equivalent to paste0 and paste, so 1 ! 2 produces "12", just
>> >>> like paste0(1,2) does.  Of course, they wouldn't have to be exactly
>> >>> equivalent to paste0 and paste - one could impose stricter
>> >>> requirements if that seemed better for error detection.  Off hand,
>> >>> though, I think automatically converting is more in keeping with the
>> >>> rest of R.  Explicitly converting with as.character could be tedious.
>> >>>
>> >>> I suppose disallowing logical arguments might make sense to guard
>> >>> against typos where ! was meant to be the unary-not operator, but
>> >>> ended up being a binary operator, after some sort of typo.  I doubt
>> >>> that this would be a common error, though.
>> >>>
>> >>> (Note that there's no ambiguity when there are no typos, except that
>> >>> when negation is involved a space may be needed - so, for example,
>> >>> "x" !  !TRUE is "xFALSE", but "x&

Re: [Rd] string concatenation operator (revisited)

2021-12-06 Thread Gabriel Becker
Hi All,

Seeing this and the other thread (and admittedly not having clicked through
to the linked r-help thread), I wonder about NAs.

Should NA  "hi there"  not result in NA_character_? This is not
what any of the paste functions do, but in my opinoin, NA + 
seems like it should be NA  (not "NA"), particularly if we are talking
about `+` overloading, but potentially even in the case of a distinct
concatenation operator?

I guess what I'm saying is that in my head missingness propagation rules
should take priority in such an operator (ie NA +  should
*always * be NA).

Is that something others disagree with, or has it just not come up yet in
(the parts I have read) of this discussion?

Best,
~G

On Mon, Dec 6, 2021 at 10:03 AM Radford Neal  wrote:

> > > In pqR (see pqR-project.org), I have implemented ! and !! as binary
> > > string concatenation operators, equivalent to paste0 and paste,
> > > respectively.
> > >
> > > For instance,
> > >
> > >  > "hello" ! "world"
> > >  [1] "helloworld"
> > >  > "hello" !! "world"
> > >  [1] "hello world"
> > >  > "hello" !! 1:4
> > >  [1] "hello 1" "hello 2" "hello 3" "hello 4"
> >
> > I'm curious about the details:
> >
> > Would `1 ! 2` convert both to strings?
>
> They're equivalent to paste0 and paste, so 1 ! 2 produces "12", just
> like paste0(1,2) does.  Of course, they wouldn't have to be exactly
> equivalent to paste0 and paste - one could impose stricter
> requirements if that seemed better for error detection.  Off hand,
> though, I think automatically converting is more in keeping with the
> rest of R.  Explicitly converting with as.character could be tedious.
>
> I suppose disallowing logical arguments might make sense to guard
> against typos where ! was meant to be the unary-not operator, but
> ended up being a binary operator, after some sort of typo.  I doubt
> that this would be a common error, though.
>
> (Note that there's no ambiguity when there are no typos, except that
> when negation is involved a space may be needed - so, for example,
> "x" !  !TRUE is "xFALSE", but "x"!!TRUE is "x TRUE".  Existing uses of
> double negation are still fine - eg, a <- !!TRUE still sets a to TRUE.
> Parsing of operators is greedy, so "x"!!!TRUE is "x FALSE", not "xTRUE".)
>
> > Where does the binary ! fit in the operator priority?  E.g. how is
> >
> >   a ! b > c
> >
> > parsed?
>
> As (a ! b) > c.
>
> Their precedence is between that of + and - and that of < and >.
> So "x" ! 1+2 evalates to "x3" and "x" ! 1+2 < "x4" is TRUE.
>
> (Actually, pqR also has a .. operator that fixes the problems with
> generating sequences with the : operator, and it has precedence lower
> than + and - and higher than ! and !!, but that's not relevant if you
> don't have the .. operator.)
>
>Radford Neal
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] .onLoad, packageStartupMessage, and R CMD check

2021-11-04 Thread Gabriel Becker
Hi Michael,

Indeed, just to elaborate further on what I believe Duncan's point is, can
you give any examples, "dire" or not, that are appropriate when the package
is loaded but not attached (ie none of its symbols are visible to the user
without using :::)?
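
(For anyone following along, the mechanics under discussion look roughly
like this; "somepkg" and the message text are made-up placeholders:)

.onLoad <- function(libname, pkgname) {
  ## an "essential" load-time message, per ?.onLoad
  packageStartupMessage("somepkg: remote config unreachable, using defaults")
}

## which a user can still silence with:
suppressPackageStartupMessages(loadNamespace("somepkg"))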

The only example I can think of is a package that changes the behavior of
other, attached packages' code, such as conflicted. Doing so is very much
an anti-pattern imo generally, with something like conflicted being an
(arguable) exception. And that's assuming conflicted even works/does
anything when loaded but not attached (I have not confirmed whether that's
the case or not). That, or a package that is at end-of-life and is, or
soon will be, unsupported entirely.

The examples don't need to be yours, per se, if you know what those pushing
back against your linter were using messages from .onLoad for...

Best,
~G



On Thu, Nov 4, 2021 at 12:37 PM Duncan Murdoch 
wrote:

> On 04/11/2021 2:50 p.m., Michael Chirico via R-devel wrote:
> > I wrote a linter to stop users from using packageStartupMessage() in
> > their .onLoad() hook because of the R CMD check warning it triggers:
> >
> >
> https://github.com/wch/r-source/blob/8b6625e39cd62424dc23399dade37f20fa8afa91/src/library/tools/R/QC.R#L5167
> >
> > However, this received some pushback which I ultimately agree with,
> > and moreover ?.onLoad seems to agree as well:
> >
> >> Loading a namespace should where possible be silent, with startup
> > messages given by \code{.onAttach}. These messages (**and any essential
> > ones from \code{.onLoad}**) should use
> \code{\link{packageStartupMessage}}
> > so they can be silenced where they would be a distraction.
> >
> > **emphasis** mine. That is, if we think some message is _essential_ to
> > print during loadNamespace(), we are told to use
> > packageStartupMessage().
> >
> > Should we remove this R CMD check warning?
>
> The help page doesn't define what an "essential" message would be, but I
> would assume it's a message about some dire condition, not just "Hi! I
> just got loaded!".  So I think a note or warning would be appropriate,
> but not an error.
>
> Do you have an example of something that should routinely print, but
> that triggers a warning when checked?
>
> Duncan Murdoch
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] parallel PSOCK connection latency is greater on Linux?

2021-11-01 Thread Gabriel Becker
Hi all,

Please disregard my previous email as I misread the pasted output. Sorry
for the noise.

Best,
~G

On Mon, Nov 1, 2021 at 6:45 PM Jeff  wrote:

> Hi Gabriel,
>
> Yes, 40 milliseconds (ms) == 40,000 microseconds (us). My benchmarking
> output is reporting the latter, which is considerably higher than the 40us
> you are seeing. If I benchmark just the serialization round trip as you
> did, I get comparable results: 14us median on my Linux system. So at least
> on Linux, there is something else contributing the remaining 39,986us. The
> conclusion from earlier in this thread was that the culprit was TCP
> behavior unique to the Linux network stack.
>
> Jeff
>
> On Mon, Nov 1 2021 at 05:55:45 PM -0700, Gabriel Becker <
> gabembec...@gmail.com> wrote:
>
> Jeff,
>
> Perhaps I'm just missing something here, but ms is generally milliseconds,
> not microseconds (which are much smaller), right?
>
> Also, this seems to just be how long it takes to roundtrip serialize iris
> (in 4.1.0  on mac osx, as thats what I have handy right this moment):
>
> > microbenchmark({x <- unserialize(serialize(iris, connection = NULL))})
>
> Unit: microseconds
>
>                                                       expr    min      lq
>  { x <- unserialize(serialize(iris, connection = NULL)) } 35.378 36.0085
>
>      mean  median     uq   max neval
>  40.26888 36.4345 43.641 80.39   100
>
>
>
> > res <- system.time(replicate(1, {x <- unserialize(serialize(iris,
> connection = NULL))}))
>
> > res/1
>
> user   system  elapsed
>
> 4.58e-05 2.90e-06 4.88e-05
>
>
> Thus the overhead appears to be extremely minimal in your results above,
> right? In fact it seems to be comparable to or lower than the overhead of
> replicate itself.
>
> ~G
>
>
>
>
>
> On Mon, Nov 1, 2021 at 5:20 PM Jeff Keller  wrote:
>
>> Hi Simon,
>>
>> I see there may have been some changes to address the TCP_NODELAY issue
>> on Linux in
>> https://github.com/wch/r-source/commit/82369f73fc297981e64cac8c9a696d05116f0797
>> .
>>
>> I gave this a try with R 4.1.1, but I still see a 40ms compute floor. Am
>> I misunderstanding these changes or how socketOptions is intended to be
>> used?
>>
>> -Jeff
>>
>> library(parallel)
>> library(microbenchmark)
>> options(socketOptions = "no-delay")
>> cl <- makeCluster(1)
>> (x <- microbenchmark(clusterEvalQ(cl, iris), times = 100, unit = "us"))
>> # Unit: microseconds
>> #                    expr  min       lq     mean   median       uq     max neval
>> #  clusterEvalQ(cl, iris) 96.9 43986.73 40535.93 43999.59 44012.79 48046.6   100
>>
>> > On 11/04/2020 5:41 AM Iñaki Ucar  wrote:
>> >
>> >
>> > Please, check a tcpdump session on localhost while running the
>> following script:
>> >
>> > library(parallel)
>> > library(tictoc)
>> > cl <- makeCluster(1)
>> > Sys.sleep(1)
>> >
>> > for (i in 1:10) {
>> >   tic()
>> >   x <- clusterEvalQ(cl, iris)
>> >   toc()
>> > }
>> >
>> > The initialization phase comprises 7 packets. Then, the 1-second sleep
>> > will help you see where the evaluation starts. Each clusterEvalQ
>> > generates 6 packets:
>> >
>> > 1. main -> worker PSH, ACK 1026 bytes
>> > 2. worker -> main ACK 66 bytes
>> > 3. worker -> main PSH, ACK 3758 bytes
>> > 4. main -> worker ACK 66 bytes
>> > 5. worker -> main PSH, ACK 2484 bytes
>> > 6. main -> worker ACK 66 bytes
>> >
>> > The first two are the command and its ACK, the following are the data
>> > back and their ACKs. In the first 4-5 iterations, I see no delay at
>> > all. Then, in the following iterations, a 40 ms delay starts to happen
>> > between packets 3 and 4, that is: the main process delays the ACK to
>> > the first packet of the incoming result.
>> >
>> > So I'd say Nagle is hardly to blame for this. It would be interesting
>> > to see how many packets are generated with TCP_NODELAY on. If there
>> > are still 6 packets, then we are fine. If we suddenly see a gazillion
>> > packets, then TCP_NODELAY does more harm than good. On the other hand,
>> > TCP_QUICKACK would surely solve the issue without any drawback. As
>> > Nagle himself put it once, "set TCP_QUICKACK. If you find a case where
>> > that makes things worse, let me know."
>> >
>> > Iñaki
>> >
>> > On

Re: [Rd] parallel PSOCK connection latency is greater on Linux?

2021-11-01 Thread Gabriel Becker
Jeff,

Perhaps I'm just missing something here, but ms is generally milliseconds,
not microseconds (which are much smaller), right?

Also, this seems to just be how long it takes to roundtrip serialize iris
(in 4.1.0  on mac osx, as thats what I have handy right this moment):

> microbenchmark({x <- unserialize(serialize(iris, connection = NULL))})

Unit: microseconds

                                                      expr    min      lq
 { x <- unserialize(serialize(iris, connection = NULL)) } 35.378 36.0085

     mean  median     uq   max neval
 40.26888 36.4345 43.641 80.39   100



> res <- system.time(replicate(1, {x <- unserialize(serialize(iris,
connection = NULL))}))

> res/1

user   system  elapsed

4.58e-05 2.90e-06 4.88e-05


Thus the overhead appears to be extremely minimal in your results above,
right? In fact it seems to be comparable to or lower than the overhead of
replicate itself.

~G





On Mon, Nov 1, 2021 at 5:20 PM Jeff Keller  wrote:

> Hi Simon,
>
> I see there may have been some changes to address the TCP_NODELAY issue on
> Linux in
> https://github.com/wch/r-source/commit/82369f73fc297981e64cac8c9a696d05116f0797
> .
>
> I gave this a try with R 4.1.1, but I still see a 40ms compute floor. Am I
> misunderstanding these changes or how socketOptions is intended to be used?
>
> -Jeff
>
> library(parallel)
> library(microbenchmark)
> options(socketOptions = "no-delay")
> cl <- makeCluster(1)
> (x <- microbenchmark(clusterEvalQ(cl, iris), times = 100, unit = "us"))
> # Unit: microseconds
> #                    expr  min       lq     mean   median       uq     max neval
> #  clusterEvalQ(cl, iris) 96.9 43986.73 40535.93 43999.59 44012.79 48046.6   100
>
> > On 11/04/2020 5:41 AM Iñaki Ucar  wrote:
> >
> >
> > Please, check a tcpdump session on localhost while running the following
> script:
> >
> > library(parallel)
> > library(tictoc)
> > cl <- makeCluster(1)
> > Sys.sleep(1)
> >
> > for (i in 1:10) {
> >   tic()
> >   x <- clusterEvalQ(cl, iris)
> >   toc()
> > }
> >
> > The initialization phase comprises 7 packets. Then, the 1-second sleep
> > will help you see where the evaluation starts. Each clusterEvalQ
> > generates 6 packets:
> >
> > 1. main -> worker PSH, ACK 1026 bytes
> > 2. worker -> main ACK 66 bytes
> > 3. worker -> main PSH, ACK 3758 bytes
> > 4. main -> worker ACK 66 bytes
> > 5. worker -> main PSH, ACK 2484 bytes
> > 6. main -> worker ACK 66 bytes
> >
> > The first two are the command and its ACK, the following are the data
> > back and their ACKs. In the first 4-5 iterations, I see no delay at
> > all. Then, in the following iterations, a 40 ms delay starts to happen
> > between packets 3 and 4, that is: the main process delays the ACK to
> > the first packet of the incoming result.
> >
> > So I'd say Nagle is hardly to blame for this. It would be interesting
> > to see how many packets are generated with TCP_NODELAY on. If there
> > are still 6 packets, then we are fine. If we suddenly see a gazillion
> > packets, then TCP_NODELAY does more harm than good. On the other hand,
> > TCP_QUICKACK would surely solve the issue without any drawback. As
> > Nagle himself put it once, "set TCP_QUICKACK. If you find a case where
> > that makes things worse, let me know."
> >
> > Iñaki
> >
> > On Wed, 4 Nov 2020 at 04:34, Simon Urbanek 
> wrote:
> > >
> > > I'm not sure the user would know ;). This is very system-specific
> issue just because the Linux network stack behaves so differently from
> other OSes (for purely historical reasons). That makes it hard to abstract
> as a "feature" for the R sockets that are supposed to be
> platform-independent. At least TCP_NODELAY is actually part of POSIX so it
> is on better footing, and disabling delayed ACK is practically only useful
> to work around the other side having Nagle on, so I would expect it to be
> rarely used.
> > >
> > > This is essentially an RFC since we don't have a mechanism for socket
> options (well, almost, there is timeout and blocking already...) and I
> don't think we want to expose low-level details so perhaps one idea would
> be to add something like delay=NA to socketConnection() in order to not
> touch (NA), enable (TRUE) or disable (FALSE) TCP_NODELAY. I wonder if there
> is any other way we could infer the intention of the user to try to choose
> the right approach...
> > >
> > > Cheers,
> > > Simon
> > >
> > >
> > > > On Nov 3, 2020, at 02:28, Jeff  wrote:
> > > >
> > > > Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that
> they might determine what is best for their potentially latency- or
> throughput-sensitive application?
> > > >
> > > > Best,
> > > > Jeff
> > > >
> > > > On Mon, Nov 2, 2020 at 14:05, Iñaki Ucar 
> wrote:
> > > >> On Mon, 2 Nov 2020 at 02:22, Simon Urbanek <
> simon.urba...@r-project.org> wrote:
> > > >>> It looks like R sockets on Linux could do with TCP_NODELAY --
> without (status quo):
> > > >> How many network packets are generated with and without it? If there
> 

Re: [Rd] [External] Re: Workaround very slow NAN/Infinities arithmetic?

2021-09-30 Thread Gabriel Becker
Mildly related (?) to this discussion, if you happen to be in a situation
where you know something is a C NAN but need to check whether it is a proper
R NA, the R_IsNA function is surprisingly (to me, at least) expensive to
call in a tight loop, because it calls the (again, surprisingly expensive to
me) isnan function. This can happen in known-sorted ALTREP REALSXPs, where
you can easily determine the C-NAN status of all elements in the vector with
a binary search for the edge of the NANs, i.e., in O(log n) calls to isnan.
You could notably also determine finiteness of all elements this way with a
couple more O(log n) passes if you needed to in the sorted case.

This came up when I was developing the patch for the unique/duplicated
fastpass for known-sorted vectors (thanks to Michael for working with me on
that and putting it in); I ended up writing an NAN_IS_R_NA macro to avoid
that isnan call, since the value is already known to be a NAN. This was
necessary (well, helpful at least) because unique/duplicated care about the
difference between NA and NaN, while sorting and REAL_NO_NA (because ALTREP
metadata/behavior is closely linked to sort behavior) do not. In the case
where you have a lot of NAN values of solely one type or the other (by far
most often because they are all NAs and none are NaNs), the speedup was
noticeably significant, as I recall. I don't have the numbers handy but I
could run them again if desired.
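
(The trick behind such a macro: R encodes NA_real_ as a NAN whose low 32
payload bits equal 1954, quiet or signaling, so once a value is already
known to be a C NAN, the payload check is all that remains. A minimal
sketch of the idea, not the actual patch code, with a hypothetical name:)

#include <stdint.h>
#include <string.h>

/* x must already be known to be a C NaN; this tests whether it is
   specifically R's NA_real_ by its payload bits, skipping the
   redundant isnan() call that R_IsNA would make. */
static inline int nan_is_r_na(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);  /* portable bit-level view */
    return (uint32_t)bits == 1954u;  /* low word carries the NA payload */
}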

~G

On Thu, Sep 30, 2021 at 10:25 AM  wrote:

> On Thu, 30 Sep 2021, brodie gaslam via R-devel wrote:
>
> >
> > André,
> >
> > I'm not an R core member, but happen to have looked a little bit at this
> > issue myself.  I've seen similar things on Skylake and Coffee Lake 2
> > (9700, one generation past your latest) too.  I think it would make sense
> > to have some handling of this, although I would want to show the
> trade-off
> > with performance impacts on CPUs that are not affected by this, and on
> > vectors that don't actually have NAs and similar.  I think the
> performance
> > impact is likely to be small so long as branch prediction is active, but
> > since branch prediction is involved you might need to check with
> different
> > ratios of NAs (not for your NA bailout branch, but for e.g. interaction
> > of what you add and the existing `na.rm=TRUE` logic).
>
> I would want to see realistic examples where this matters, not
> microbenchmarks, before thinking about complicating the code. Not all
> but most cases where sum(x) returns NaN/NA would eventually result in
> an error; getting to the error faster is not likely to be useful.
>
> My understanding is that arm64 does not support proper long doubles
> (they are the same as regular doubles). So code using long doubles
> isn't getting the hoped-for improved precision. Since that
> architecture is becoming more common we should probably be looking at
> replacing uses of long doubles with better algorithms that can work
> with regular doubles, e.g. Kahan summation or variants for sum.
>
> > You'll also need to think of cases such as c(Inf, NA), c(NaN, NA), etc.,
> > which might complicate the logic a fair bit.
> >
> > Presumably the x87 FPU will remain common for a long time, but if there
> > was reason to think otherwise, then the value of this becomes
> > questionable.
> >
> > Either way, I would probably wait to see what R Core says.
> >
> > For reference this 2012 blog post[1] discusses some aspects of the issue,
> > including that at least "historically" AMD was not affected.
> >
> > Since we're on the topic I want to point out that the default NA in R
> > starts off as a signaling NA:
> >
> > example(numToBits)   # for `bitC`
> > bitC(NA_real_)
> > ## [1] 0 11111111111 | 0000000000000000000000000000000000000000011110100010
> > bitC(NA_real_ + 0)
> > ## [1] 0 11111111111 | 10000000000000000000000000000000000000000011110100010
> >
> > Notice the leading bit of the significand starts off as zero, which marks
> > it as a signaling NA, but becomes 1, i.e. non-signaling, after any
> > operation[2].
> >
> > This is meaningful because the mere act of loading a signaling NA into
> the
> > x87 FPU is sufficient to trigger the slowdowns, even if the NA is not
> > actually used in arithmetic operations.  This happens sometimes under
> some
> > optimization levels.  I don't know of any benefit of starting off with a
> > signaling NA, especially since the encoding is lost pretty much as soon
> as
> > it is used.  If folks are interested I can provide patch to turn the NA
> > quiet by default.
>
> In principle this might be a good idea, but the current bit pattern is
> unfortunately baked into a number of packages and documents on
> internals, as well as serialized objects. The work needed to sort that
> out is probably not worth the effort.
>
> It also doesn't seem to affect the performance issue here since
> setting b[1] <- NA_real_ + 0 produces the same slowdown (at least on
> my current Intel machine).
>
> Best,
>
> luke
>
> >
> > Best,
> >
> 

Re: [Rd] [External] Re: Update on rtools4 and ucrt support

2021-08-23 Thread Gabriel Becker
Hi all,

I will preface this with the fact that I don't do work on Windows, and the
following is based on remembered conversations/talks/etc. from a while ago,
so it may be either incorrect or out of date. But I recall that one of the
major things Jeroen was targeting was use of/integration with a meaningful
package manager for external library dependencies in Windows from-source
package builds, and that *I think* this was a part of his (then explicitly
experimental) Rtools4 setup (?)

Is the above correct, and if so, is there also package manager
integration/usage in Tomas' official R-core UCRT toolchain? If not, could
it be (perhaps, as Duncan suggested, via collaborative effort involving
Jeroen as well)?

I admit, both Windows toolchains/builds and non-Latin encodings are things
I have so far stayed away from, so I can't really contribute beyond that,
other than to say, as others have, that I do think both are impressive
pieces of work and that both Jeroen and Tomas should have our thanks thanks
for this and a lot of other work they put into R and the R community.

Best,
~G

On Mon, Aug 23, 2021 at 4:02 PM Dirk Eddelbuettel  wrote:

>
> As I type this, we are eight messages into this thread -- but I am not sure
> it has been made clear what the actual contentious issues are.
>
> There appear to be two toolchains, and they appear to interoperate
> (though
> Duncan stated he had issues with an (arguably demanding) package).  Now, I
> have the opposite (hence positive) experience.  For one package I look
> after,
> a colleague took care of the (complicated in that case) 'needed to build
> the
> package' pre-requirements by ensuring we have a UCRT variant.  Jeroen then
> (unprompted) supplied a two-line/two-file PR to enable a Windows UCRT build
> (piggy-backing on the existing Windows build), and with that the 'ERROR' I
> had at CRAN reports under Tomas UCRT entry is gone. Net-net, this looks
> like
> a working setup to me which combines both toolchains without issues.
>
> And I was able to repeat this with a few more packages of mine for which
> Jeroen's winlibs factory has libraries---these now build under Tomas's
> builder at CRAN.  So maybe this is not an either-or discussion?  So if
> there
> are issues, could we be told what they are, and could we possibly help
> Jeroen
> and Tomas to iron them out?
>
> Dirk
>
> --
> https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] na.omit inconsistent with is.na on list

2021-08-16 Thread Gabriel Becker
Hi Toby,

Right, my point is that is.na being equivalent to "is an incomplete case"
is really only true for atomic vectors. I don't see it being the case for
lists, given what is.na does for lists. This is all just my opinion, but
that's my take: vec[!is.na(vec)] happens to be the same as na.omit(vec) for
atomics, but in general the operations are not equivalent and I wouldn't
expect them to be.
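
(A quick illustration; the as.vector is only there because na.omit attaches
an na.action attribute to its atomic result:)

v <- c(1, NA, 3)
identical(v[!is.na(v)], as.vector(na.omit(v)))  # TRUE for atomic vectors
L <- list(1, NA, 3)
identical(L[!is.na(L)], na.omit(L))             # FALSE: na.omit returns the
                                                # list unchanged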

Best,
~G

On Mon, Aug 16, 2021 at 10:54 AM Toby Hocking  wrote:

> To clarify, ?is.na docs say that 'na.omit' returns the object with
> incomplete cases removed.
> If we take is.na to be the definition of "incomplete cases" then a list
> element with scalar NA is incomplete.
> About the data.frame method, in my opinion it is highly
> confusing/inconsistent for na.omit to keep rows with incomplete cases in
> list columns, but not in columns which are atomic vectors,
>
> > (f.num <- data.frame(num=c(1,NA,2)))
>   num
> 1   1
> 2  NA
> 3   2
> > is.na(f.num)
>num
> [1,] FALSE
> [2,]  TRUE
> [3,] FALSE
> > na.omit(f.num)
>   num
> 1   1
> 3   2
>
> (f.list <- data.frame(list=I(list(1,NA,2))))
>   list
> 11
> 2   NA
> 32
> > is.na(f.list)
>   list
> [1,] FALSE
> [2,]  TRUE
> [3,] FALSE
> > na.omit(f.list)
>   list
> 11
> 2   NA
> 32
>
> On Sat, Aug 14, 2021 at 5:15 PM Gabriel Becker 
> wrote:
>
> > I understand what is.na does; the issue I have is that its task is not
> > equivalent to the conceptual task na.omit is doing, in my opinion, as
> > illustrated by what the data.frame method does.
> >
> > Thus what I was getting at above about it not being clear that
> > lst[!is.na(lst)] is the correct thing for na.omit to do
> >
> > ~G
> >
> > On Sat, Aug 14, 2021, 1:49 PM Toby Hocking  wrote:
> >
> >> Some relevant information from ?is.na: the behavior for lists is
> >> documented,
> >>
> >>  For is.na, elementwise the result is false unless that element
> >>  is a length-one atomic vector and the single element of that
> >>  vector is regarded as NA or NaN (note that any is.na method
> >>  for the class of the element is ignored).
> >>
> >> Also there are other functions anyNA and is.na<- which are consistent
> >> with
> >> is.na. That is, anyNA only returns TRUE if the list has an element
> which
> >> is
> >> a scalar NA. And is.na<- sets list elements to logical NA to indicate
> >> missingness.
> >>
> >> On Fri, Aug 13, 2021 at 1:10 AM Hugh Parsonage <
> hugh.parson...@gmail.com>
> >> wrote:
> >>
> >> > The data.frame method deliberately skips non-atomic columns before
> >> > invoking is.na(x) so I think it is fair to assume this behaviour is
> >> > intentional and assumed.
> >> >
> >> > Not so clear to me that there is a sensible answer for list columns.
> >> > (List columns seem to collide with the expectation that in each
> >> > variable every observation will be of the same type)
> >> >
> >> > Consider your list L as
> >> >
> >> > L <- list(NULL, NA, c(NA, NA))
> >> >
> >> > Seems like every observation could have a claim to be 'missing' here.
> >> > Concretely, if a data.frame had a list column representing the lat-lon
> >> > of an observation, we might only be able to represent missing values
> >> > like c(NA, NA).
> >> >
> >> > On Fri, 13 Aug 2021 at 17:27, Iñaki Ucar 
> >> wrote:
> >> > >
> >> > > On Thu, 12 Aug 2021 at 22:20, Gabriel Becker  >
> >> > wrote:
> >> > > >
> >> > > > Hi Toby,
> >> > > >
> >> > > > This definitely appears intentional, the first  expression of
> >> > > > stats:::na.omit.default is
> >> > > >
> >> > > >if (!is.atomic(object))
> >> > > >
> >> > > > return(object)
> >> > >
> >> > > I don't follow your point. This only means that the *default* method
> >> > > is not intended for non-atomic cases, but it doesn't mean there
> shouldn't
> >> > > exist a method for lists.
> >> > >
> >> > > > So it is explicitly just returning the object in non-atomic cases,
> >> > which
> >> > > > includes lists. I was not involved in this decision (obviously)
>

Re: [Rd] na.omit inconsistent with is.na on list

2021-08-14 Thread Gabriel Becker
I understand what is.na does; the issue I have is that its task is not
equivalent to the conceptual task na.omit is doing, in my opinion, as
illustrated by what the data.frame method does.

Thus what I was getting at above about it not being clear that
lst[!is.na(lst)] is the correct thing for na.omit to do

~G

On Sat, Aug 14, 2021, 1:49 PM Toby Hocking  wrote:

> Some relevant information from ?is.na: the behavior for lists is
> documented,
>
>  For is.na, elementwise the result is false unless that element
>  is a length-one atomic vector and the single element of that
>  vector is regarded as NA or NaN (note that any is.na method
>  for the class of the element is ignored).
>
> Also there are other functions anyNA and is.na<- which are consistent with
> is.na. That is, anyNA only returns TRUE if the list has an element which
> is
> a scalar NA. And is.na<- sets list elements to logical NA to indicate
> missingness.
>
> On Fri, Aug 13, 2021 at 1:10 AM Hugh Parsonage 
> wrote:
>
> > The data.frame method deliberately skips non-atomic columns before
> > invoking is.na(x) so I think it is fair to assume this behaviour is
> > intentional and assumed.
> >
> > Not so clear to me that there is a sensible answer for list columns.
> > (List columns seem to collide with the expectation that in each
> > variable every observation will be of the same type)
> >
> > Consider your list L as
> >
> > L <- list(NULL, NA, c(NA, NA))
> >
> > Seems like every observation could have a claim to be 'missing' here.
> > Concretely, if a data.frame had a list column representing the lat-lon
> > of an observation, we might only be able to represent missing values
> > like c(NA, NA).
> >
> > On Fri, 13 Aug 2021 at 17:27, Iñaki Ucar 
> wrote:
> > >
> > > On Thu, 12 Aug 2021 at 22:20, Gabriel Becker 
> > wrote:
> > > >
> > > > Hi Toby,
> > > >
> > > > This definitely appears intentional, the first  expression of
> > > > stats:::na.omit.default is
> > > >
> > > >if (!is.atomic(object))
> > > >
> > > > return(object)
> > >
> > > I don't follow your point. This only means that the *default* method
> > > is not intended for non-atomic cases, but it doesn't mean there shouldn't
> > > exist a method for lists.
> > >
> > > > So it is explicitly just returning the object in non-atomic cases,
> > which
> > > > includes lists. I was not involved in this decision (obviously) but
> my
> > > > guess is that it is due to the fact that what constitutes an
> > observation
> > > > "being complete" in unclear in the list case. What should
> > > >
> > > > na.omit(list(5, NA, c(NA, 5)))
> > > >
> > > > return? Just the first element, or the first and the last? It seems,
> at
> > > > least to me, unclear. A small change to the documentation to add
> > "atomic
> > >
> > > > is.na(list(5, NA, c(NA, 5)))
> > > [1] FALSE  TRUE FALSE
> > >
> > > Following Toby's argument, it's clear to me: the first and the last.
> > >
> > > Iñaki
> > >
> > > > (in the sense of is.atomic returning \code{TRUE})" in front of
> > "vectors"
> > > > or similar where what types of objects are supported seems
> justified,
> > > > though, imho, as the current documentation is either ambiguous or
> > > > technically incorrect, depending on what we take "vector" to mean.
> > > >
> > > > Best,
> > > > ~G
> > > >
> > > > On Wed, Aug 11, 2021 at 10:16 PM Toby Hocking 
> > wrote:
> > > >
> > > > > Also, the na.omit method for data.frame with list column seems to
> be
> > > > > inconsistent with is.na,
> > > > >
> > > > > > L <- list(NULL, NA, 0)
> > > > > > str(f <- data.frame(I(L)))
> > > > > 'data.frame': 3 obs. of  1 variable:
> > > > >  $ L:List of 3
> > > > >   ..$ : NULL
> > > > >   ..$ : logi NA
> > > > >   ..$ : num 0
> > > > >   ..- attr(*, "class")= chr "AsIs"
> > > > > > is.na(f)
> > > > >  L
> > > > > [1,] FALSE
> > > > > [2,]  TRUE
> > > > > [3,] FALSE
> > > > > > na.omit(f)
> > > > >L
> > > &

Re: [Rd] na.omit inconsistent with is.na on list

2021-08-13 Thread Gabriel Becker
On Thu, Aug 12, 2021 at 4:30 PM Toby Hocking  wrote:

> Hi Gabe thanks for the feedback.
>
> On Thu, Aug 12, 2021 at 1:19 PM Gabriel Becker 
> wrote:
>
>> Hi Toby,
>>
>> This definitely appears intentional, the first  expression of
>> stats:::na.omit.default is
>>
>>if (!is.atomic(object))
>>
>> return(object)
>>
>> Based on this code it does seem that the documentation could be clarified
> to say atomic vectors.
>
>>
>> So it is explicitly just returning the object in non-atomic cases, which
>> includes lists. I was not involved in this decision (obviously) but my
>> guess is that it is due to the fact that what constitutes an observation
>> "being complete" in unclear in the list case. What should
>>
>> na.omit(list(5, NA, c(NA, 5)))
>>
>> return? Just the first element, or the first and the last? It seems, at
>> least to me, unclear.
>>
> I agree in principle/theory that it is unclear, but in practice is.na has
> an unambiguous answer (if a list element is scalar NA then it is considered
> missing, otherwise not).
>

Well, yes, it's unambiguous, but I would argue it is less likely than the
other option to be correct. Remember what na.omit is supposed to do: "remove
observations which are not complete".

Now for data.frames, this means it removes any row (i.e. observation,
despite the internal structure) where *any* column contains an NA. The most
analogous interpretation of na.omit on a list, in the well-behaved (i.e. list
of atomic vectors) case, I think, is that we consider it a ragged
collection of "observations", in which case x[!is.na(x)] with x a list
would do the wrong thing, because it is not checking these "observations"
for completeness.

Perhaps others disagree with me about that, and anyway, this only works
when you can check the elements of the list for "completeness" at all; a
list can have anything for elements, and then checking for completeness
becomes impossible...
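
(To make the two candidate semantics concrete, a sketch using anyNA as the
completeness check:)

L <- list(5, NA, c(NA, 5))
L[!is.na(L)]                        # "element is scalar NA" rule:
                                    #   keeps elements 1 and 3
Filter(function(el) !anyNA(el), L)  # "observation is complete" rule:
                                    #   keeps only element 1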

As is, I do also wonder if a warning should be thrown letting the user know
that their call isn't doing ANY of the possible things it could mean...

Best,
~G


> A small change to the documentation to add "atomic (in the sense of
>> is.atomic returning \code{TRUE})" in front of "vectors" or similar where
>> what types of objects are supported seems justified, though, imho, as the
>> current documentation is either ambiguous or technically incorrect,
>> depending on what we take "vector" to mean.
>>
>> Best,
>> ~G
>>
>> On Wed, Aug 11, 2021 at 10:16 PM Toby Hocking  wrote:
>>
>>> Also, the na.omit method for data.frame with list column seems to be
>>> inconsistent with is.na,
>>>
>>> > L <- list(NULL, NA, 0)
>>> > str(f <- data.frame(I(L)))
>>> 'data.frame': 3 obs. of  1 variable:
>>>  $ L:List of 3
>>>   ..$ : NULL
>>>   ..$ : logi NA
>>>   ..$ : num 0
>>>   ..- attr(*, "class")= chr "AsIs"
>>> > is.na(f)
>>>  L
>>> [1,] FALSE
>>> [2,]  TRUE
>>> [3,] FALSE
>>> > na.omit(f)
>>>L
>>> 1
>>> 2 NA
>>> 3  0
>>>
>>> On Wed, Aug 11, 2021 at 9:58 PM Toby Hocking  wrote:
>>>
>>> > na.omit is documented as "na.omit returns the object with incomplete
>>> cases
>>> > removed." and "At present these will handle vectors," so I expected
>>> that
>>> > when it is used on a list, it should return the same thing as if we
>>> subset
>>> > via is.na; however I observed the following,
>>> >
>>> > > L <- list(NULL, NA, 0)
>>> > > str(L[!is.na(L)])
>>> > List of 2
>>> >  $ : NULL
>>> >  $ : num 0
>>> > > str(na.omit(L))
>>> > List of 3
>>> >  $ : NULL
>>> >  $ : logi NA
>>> >  $ : num 0
>>> >
>>> > Should na.omit be fixed so that it returns a result that is consistent
>>> > with is.na? I assume that is.na is the canonical definition of what
>>> > should be considered a missing value in R.
>>> >
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] na.omit inconsistent with is.na on list

2021-08-12 Thread Gabriel Becker
Hi Toby,

This definitely appears intentional, the first  expression of
stats:::na.omit.default is

    if (!is.atomic(object))
        return(object)


So it is explicitly just returning the object in non-atomic cases, which
includes lists. I was not involved in this decision (obviously) but my
guess is that it is due to the fact that what constitutes an observation
"being complete" in unclear in the list case. What should

na.omit(list(5, NA, c(NA, 5)))

return? Just the first element, or the first and the last? It seems, at
least to me, unclear. A small change to the documentation to add "atomic
(in the sense of is.atomic returning \code{TRUE})" in front of "vectors",
or similar, wherever the supported types of objects are described, seems
justified, though, imho, as the current documentation is either ambiguous or
technically incorrect, depending on what we take "vector" to mean.

Best,
~G

On Wed, Aug 11, 2021 at 10:16 PM Toby Hocking  wrote:

> Also, the na.omit method for data.frame with list column seems to be
> inconsistent with is.na,
>
> > L <- list(NULL, NA, 0)
> > str(f <- data.frame(I(L)))
> 'data.frame': 3 obs. of  1 variable:
>  $ L:List of 3
>   ..$ : NULL
>   ..$ : logi NA
>   ..$ : num 0
>   ..- attr(*, "class")= chr "AsIs"
> > is.na(f)
>  L
> [1,] FALSE
> [2,]  TRUE
> [3,] FALSE
> > na.omit(f)
>L
> 1
> 2 NA
> 3  0
>
> On Wed, Aug 11, 2021 at 9:58 PM Toby Hocking  wrote:
>
> > na.omit is documented as "na.omit returns the object with incomplete
> cases
> > removed." and "At present these will handle vectors," so I expected that
> > when it is used on a list, it should return the same thing as if we
> subset
> > via is.na; however I observed the following,
> >
> > > L <- list(NULL, NA, 0)
> > > str(L[!is.na(L)])
> > List of 2
> >  $ : NULL
> >  $ : num 0
> > > str(na.omit(L))
> > List of 3
> >  $ : NULL
> >  $ : logi NA
> >  $ : num 0
> >
> > Should na.omit be fixed so that it returns a result that is consistent
> > with is.na? I assume that is.na is the canonical definition of what
> > should be considered a missing value in R.
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Rprofile.site function or variable definitions break with R 4.1

2021-08-12 Thread Gabriel Becker
Hi Andrew and Dirk,

The other question to think about is what was your Rprofile.site doing
before. We can infer from this error that apparently it was defining things
*in the namespace for the base package*. How often is that actually what
you wanted it to do/a good idea?

I haven't played around with it, as I don't use Rprofile.site to actually
create/assign objects; like Dirk, I only set options or option-adjacent
things (such as .libPaths). But I imagine you could get it to put things
into the global environment, or attach a special "local config" entry to the
search path and put things there, if you so desired.
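
(A minimal sketch of that second idea, untested and with hypothetical names;
attach(NULL, ...) adds a fresh empty environment to the search path:)

## In Rprofile.site: attach an empty environment and populate it, so the
## definitions are visible in every session without touching base's namespace.
local({
    env <- attach(NULL, name = "local:config")
    assign("sq", function(x) x * x, envir = env)
})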

Best,
~G

On Thu, Aug 12, 2021 at 12:41 PM Dirk Eddelbuettel  wrote:

>
> On 12 August 2021 at 15:19, Andrew Piskorski wrote:
> | Ok, but what's the recommended way to actually USE Rprofile.site now?
> | Should I move all my local configuration into a special package, and
> | do nothing in Rprofile.site except require() that package?
>
> Exactly as before. I set my mirror as I have before and nothing changes
>
>   ## We set the cloud mirror, which is 'network-close' to everybody, as
> default
>   local({
>   r <- getOption("repos")
>   r["CRAN"] <- "https://cloud.r-project.org;
>   options(repos = r)
>   })
>
> I cannot help but think that you are shooting the messenger (here
> Rprofile.site) for an actual behaviour change in R itself ?
>
> Dirk
>
> --
> https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Feature request: Change default library path on Windows

2021-07-25 Thread Gabriel Becker
On Sun, Jul 25, 2021, 6:54 AM Steve Haroz  wrote:

> > Shouldn't it be in one of the AppData directories?
>
> I asked that same question on twitter. Here was a response
> (https://twitter.com/bmwiernik/status/1419033079495147522):
> * But it's not for files that should be user-accessible, like a
> library (cf. Zotero has preferences in AppData , but library files in
> %USERPROFILE%/Zotero)
> * So, for example, in R's case it could make sense for the core
> packages to be installed in %APPDATA%/R/R-4.1.0/library" rather than
> "C:/Program Files/R/R-4.1.0/library" (either is fairly common), but
> user packages should be somewhere more accessible.
>
> Here is a quote from
>
> https://docs.microsoft.com/en-us/windows/apps/design/app-settings/store-and-retrieve-app-data
> :
> "App data is mutable data that is created and managed by a specific
> app. It includes runtime state, app settings, user preferences,
> reference content (such as the dictionary definitions in a dictionary
> app), and other settings"
> I don't think libraries fall into the categories of state or settings.
>

Well, no, but installed extension packages, being available for use by
scripts, are somewhat more comparable to dictionary definitions in a
dictionary app.

They seem fairly analogous, in fact. Packages are essentially dictionaries
of available functions that scripts (equivalent roughly to text documents
here) can call.

That said, I don't have a super strong opinion and don't use Windows; I'm
just pointing out that it's not clear this would violate the intent of the
cited guidance.

Another option would be to allow users to set the default library location
from within the Windows installer (if you can't already).

~G




> -Steve Haroz
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Clearing attributes returns ALTREP, serialize still saves them

2021-07-03 Thread Gabriel Becker
Ok, a bit more:

The relevant bit in serialize.c that I can see is:


if (ALTREP(s) && stream->version >= 3) {
    SEXP info = ALTREP_SERIALIZED_CLASS(s);
    SEXP state = ALTREP_SERIALIZED_STATE(s);
    if (info != NULL && state != NULL) {
        int flags = PackFlags(ALTREP_SXP, LEVELS(s), OBJECT(s), 0, 0);
        PROTECT(state);
        PROTECT(info);
        OutInteger(stream, flags);
        WriteItem(info, ref_table, stream);
        WriteItem(state, ref_table, stream);   /* <-- emphasis mine */
        WriteItem(ATTRIB(s), ref_table, stream);
        UNPROTECT(2); /* state, info */
        return;
    }
    /* else fall through to standard processing */
}

And in the wrapper altclass, we have:

static SEXP wrapper_Serialized_state(SEXP x)
{
    return CONS(WRAPPER_WRAPPED(x), WRAPPER_METADATA(x));
}

So what's happening is that the attribute data isn't being written out during
the WriteItem(ATTRIB(s)) call, which actually has the correct attribute
value. It's being written out in the marked line above that, the state, which
holds the wrapped SEXP; that SEXP ITSELF still has the attributes on it, but
is not an ALTREP, so it goes through standard processing, which writes out
the attributes as normal.

So that, I believe, is what needs to change. One possibility is that
wrapper_Serialized_state can be made smarter, so that the inner attributes
are duplicated and then wiped clean for any that are overridden by the
attributes on the wrapper. Another option is that the ALTREP WriteItem
section could be made smarter, but that seems less robust.

Finally, the wrapper might be able to be modified in such a way that
setting an attribute on the wrapper clears that attribute on the wrapped
value, if present.

I think making wrapper_Serialized_state smarter is the right way to attack
this, and that's the first thing I'll try when I get to it, but if someone
tackles it before me, hopefully this digging helped some.
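
(For concreteness, a rough, untested sketch of that first option, written
against the internal names quoted above; a real fix would presumably want a
shallow copy rather than a full duplicate() of the payload:)

static SEXP wrapper_Serialized_state(SEXP x)
{
    SEXP wrapped = WRAPPER_WRAPPED(x);
    SEXP ans;
    if (ATTRIB(x) != R_NilValue && ATTRIB(wrapped) != R_NilValue) {
        /* copy the payload, then drop any inner attribute that the
           wrapper overrides, so it is not serialized redundantly */
        PROTECT(wrapped = duplicate(wrapped));
        for (SEXP a = ATTRIB(x); a != R_NilValue; a = CDR(a))
            setAttrib(wrapped, TAG(a), R_NilValue);
        ans = CONS(wrapped, WRAPPER_METADATA(x));
        UNPROTECT(1);
    } else
        ans = CONS(wrapped, WRAPPER_METADATA(x));
    return ans;
}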

Best,
~G

On Fri, Jul 2, 2021 at 10:18 PM Gabriel Becker 
wrote:

> Hi all,
>
> I don't have a solution yet, but a bit more here:
>
> > .Internal(inspect(x2b))
>
> @7f913826d590 14 REALSXP g0c0 [REF(1)]  wrapper [srt=-2147483648,no_na=0]
>
>   @7f9137500320 14 REALSXP g0c7 [REF(2),ATT] (len=100, tl=0)
> 0.45384,0.926371,0.838637,-1.71485,-0.719073,...
>
>   ATTRIB:
>
> @7f913826dc20 02 LISTSXP g0c0 [REF(1)]
>
>   TAG: @7f91378538d0 01 SYMSXP g0c0 [MARK,REF(460)] "data"
>
>   @7f911831 14 REALSXP g0c7 [REF(2)] (len=100, tl=0)
> 0.66682,0.480576,-1.13229,0.453313,-0.819498,...
>
> > attr(x2b, "data") <- "small"
>
> > .Internal(inspect(x2b))
>
> @7f913826d590 14 REALSXP g0c0 [REF(1),ATT]  wrapper
> [srt=-2147483648,no_na=0]
>
>   @7f9137500320 14 REALSXP g0c7 [REF(2),ATT] (len=100, tl=0)
> 0.45384,0.926371,0.838637,-1.71485,-0.719073,...
>
>   ATTRIB:
>
> @7f913826dc20 02 LISTSXP g0c0 [REF(1)]
>
>   TAG: @7f91378538d0 01 SYMSXP g0c0 [MARK,REF(461)] "data"
>
>   @7f911831 14 REALSXP g0c7 [REF(2)] (len=100, tl=0)
> 0.66682,0.480576,-1.13229,0.453313,-0.819498,...
>
> ATTRIB:
>
>   @7f913826c870 02 LISTSXP g0c0 [REF(1)]
>
> TAG: @7f91378538d0 01 SYMSXP g0c0 [MARK,REF(461)] "data"
>
> @7f9120580850 16 STRSXP g0c1 [REF(3)] (len=1, tl=0)
>
>   @7f91205808c0 09 CHARSXP g0c1 [REF(3),gp=0x60] [ASCII] [cached]
> "small"
>
>
> So we can see that the assignment of attr(x2b, "data") IS doing something,
> but it isn't doing the right thing. The fact that the above code assigned
> null instead of a value was hiding this.
>
>
> I will dig into this more if someone doesn't get it fixed before me, but
> it won't be until after useR, because I'm preparing multiple talks for that
> and it is this coming week.
>
>
> Best,
>
> ~G
>
> On Fri, Jul 2, 2021 at 9:15 PM Zafer Barutcuoglu <
> zafer.barutcuo...@gmail.com> wrote:
>
>> Hi all,
>>
>> Setting names/dimnames on vectors/matrices of length>=64 returns an
>> ALTREP wrapper which internally still contains the names/dimnames, and
>> calling base::serialize on the result writes them out. They are
>> unserialized in the same way, with the names/dimnames hidden in the ALTREP
>> wrapper, so the problem is not obvious except in wasted time, bandwidth, or
>> disk space.
>>
>> Example:
>>v1 <- setNames(rnorm(64), paste("element name", 1:64))
>>v2 <- unname(v1)
>>names(v2)
>># NULL
>>length(serialize(v1, NULL))
>># [1] 2039
>>length(serialize(v2, NULL))
>># [1] 2132
>>length(serialize(v2[TRUE], NULL))
>># [1] 543
>>
>>con <- rawConnection(raw(), "w")
>>serialize(v2, con)
>>v3 <- unseria

Re: [Rd] Clearing attributes returns ALTREP, serialize still saves them

2021-07-02 Thread Gabriel Becker
Hi all,

I don't have a solution yet, but a bit more here:

> .Internal(inspect(x2b))

@7f913826d590 14 REALSXP g0c0 [REF(1)]  wrapper [srt=-2147483648,no_na=0]

  @7f9137500320 14 REALSXP g0c7 [REF(2),ATT] (len=100, tl=0)
0.45384,0.926371,0.838637,-1.71485,-0.719073,...

  ATTRIB:

@7f913826dc20 02 LISTSXP g0c0 [REF(1)]

  TAG: @7f91378538d0 01 SYMSXP g0c0 [MARK,REF(460)] "data"

  @7f911831 14 REALSXP g0c7 [REF(2)] (len=100, tl=0)
0.66682,0.480576,-1.13229,0.453313,-0.819498,...

> attr(x2b, "data") <- "small"

> .Internal(inspect(x2b))

@7f913826d590 14 REALSXP g0c0 [REF(1),ATT]  wrapper
[srt=-2147483648,no_na=0]

  @7f9137500320 14 REALSXP g0c7 [REF(2),ATT] (len=100, tl=0)
0.45384,0.926371,0.838637,-1.71485,-0.719073,...

  ATTRIB:

@7f913826dc20 02 LISTSXP g0c0 [REF(1)]

  TAG: @7f91378538d0 01 SYMSXP g0c0 [MARK,REF(461)] "data"

  @7f911831 14 REALSXP g0c7 [REF(2)] (len=100, tl=0)
0.66682,0.480576,-1.13229,0.453313,-0.819498,...

ATTRIB:

  @7f913826c870 02 LISTSXP g0c0 [REF(1)]

TAG: @7f91378538d0 01 SYMSXP g0c0 [MARK,REF(461)] "data"

@7f9120580850 16 STRSXP g0c1 [REF(3)] (len=1, tl=0)

  @7f91205808c0 09 CHARSXP g0c1 [REF(3),gp=0x60] [ASCII] [cached]
"small"


So we can see that the assignment of attr(x2b, "data") IS doing something,
but it isn't doing the right thing. The fact that the above code assigned
null instead of a value was hiding this.


I will dig into this more if someone doesn't get it fixed before me, but it
won't be until after useR, because I'm preparing multiple talks for that
and it is this coming week.


Best,

~G

On Fri, Jul 2, 2021 at 9:15 PM Zafer Barutcuoglu <
zafer.barutcuo...@gmail.com> wrote:

> Hi all,
>
> Setting names/dimnames on vectors/matrices of length>=64 returns an ALTREP
> wrapper which internally still contains the names/dimnames, and calling
> base::serialize on the result writes them out. They are unserialized in the
> same way, with the names/dimnames hidden in the ALTREP wrapper, so the
> problem is not obvious except in wasted time, bandwidth, or disk space.
>
> Example:
>v1 <- setNames(rnorm(64), paste("element name", 1:64))
>v2 <- unname(v1)
>names(v2)
># NULL
>length(serialize(v1, NULL))
># [1] 2039
>length(serialize(v2, NULL))
># [1] 2132
>length(serialize(v2[TRUE], NULL))
># [1] 543
>
>con <- rawConnection(raw(), "w")
>serialize(v2, con)
>v3 <- unserialize(rawConnectionValue(con))
>names(v3)
># NULL
>length(serialize(v3, NULL))
># 2132
>
># Similarly for matrices:
>m1 <- matrix(rnorm(64), 8, 8, dimnames=list(paste("row name", 1:8),
> paste("col name", 1:8)))
>m2 <- unname(m1)
>dimnames(m2)
># NULL
>length(serialize(m1, NULL))
># [1] 918
>length(serialize(m2, NULL))
># [1] 1035
>length(serialize(m2[TRUE, TRUE], NULL))
># 582
>
> Previously discussed here, too:
> https://r.789695.n4.nabble.com/Invisible-names-problem-td4764688.html
>
> This happens with other attributes as well, but less predictably:
>x1 <- structure(rnorm(100), data=rnorm(100))
>x2 <- structure(x1, data=NULL)
>length(serialize(x1, NULL))
># [1] 8000952
>length(serialize(x2, NULL))
># [1] 924
>
>x1b <- rnorm(100)
>attr(x1b, "data") <- rnorm(100)
>x2b <- x1b
>attr(x2b, "data") <- NULL
>length(serialize(x1b, NULL))
># [1] 8000863
>length(serialize(x2b, NULL))
># [1] 8000956
>
> This is pretty severe: trying to track down why serializing a small object
> kills the network means working out which large attributes it may have once
> had during its lifetime around the codebase and that are still secretly
> tagging along.
>
> Is there a plan to resolve this? Any suggestions for maybe a C++
> workaround until then? Or an alternative performant serialization solution?
>
> Best,
> --
> Zafer
>
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] On read.csv and write.csv

2021-07-01 Thread Gabriel Becker
On Thu, Jul 1, 2021 at 1:46 PM Stephen Ellison 
wrote:

>
> Please run the reproducible example provided.
> When you do, you will see that write.csv writes an unnecessary empty
> header field ("") over the row names column. This makes the number of
> header fields equal to the number of columns _including_ row names. That
> causes the original row names to be read as data by read.csv, following the
> rule that the number of header fields determines whether row names are
> present. read.csv  accordingly assumes that the former row names are
> unnamed data, calls the unnamed row names column "X" (or X.1 etc if X
> exists) and then adds new, default, row names _instead of the original row
> names written by write.csv_.
> That's not helpful.
>

This depends on whether you are reading the CSV via R or something else, I
would imagine. It not being "valid" CSV at all would likely cause some
programs to choke entirely, I expect. I admit that's conjecture though; I
don't have data on that one way or another.

~G

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] On read.csv and write.csv

2021-06-30 Thread Gabriel Becker
Hi Stephen,

Personally, I don't have super strong feelings about this, but
https://datatracker.ietf.org/doc/html/rfc4180#section-2 does say that the
optional header line should have the same number of fields as the data
records, so inasmuch as that is the "CSV specification", R's read.csv
behavior is supporting an extension, whereas its write.csv is
outputting standard-compliant CSV.

It is possible that one or a few of the multitude of independent specs
mentioned there do specify that the header can have one fewer field, I don't
know, but if so, according to the IETF, it's not overly common.

I can't even speak to whether that is why the behavior is as it is, but I
figured it was worth mentioning.

~G

On Wed, Jun 30, 2021 at 2:15 PM Stephen Ellison 
wrote:

> Apologies if this is a well-worn question; I haven’t found it so far but
> there's a lot of r-dev and I may have missed it in the archives. In the
> mean time:
>
> I've managed to avoid writing csv files with R for a couple of decades but
> we're swopping data with a collaborator and I've tripped over an
> inconsistency between read.csv and write.csv that seems less than helpful.
> The default row-name behaviour for read.csv is to assume that, when the
> number of items in the first row is one less than the number in the second,
> the first column contains row names. write.csv, however, includes an
> empty string ("") as the first header entry over row names when writing. On
> rereading, the original row names are then treated as data with unknown
> name, replaced by "X".
>
> That means that, unlike read.table and write.table, something written
> with write.csv is not read back correctly by read.csv.
>
> Is that intentional?
> And whether it is intentional or not, is it wise?
>
> Example:
>
> ( D1 <- data.frame(A=letters[1:5], N=1:5, Y=rnorm(5) ) )
> write.csv(D1, "temp.csv")
>
> ( D1w <- read.csv("temp.csv") )
>
> # Note the unnecessary new X column ...
> #Tidy up
> unlink("temp.csv")
>
> This differs from the parent .table defaults; write.table doesn’t add the
> extra "" column label, so the object read back with read.table does not
> contain an unwanted extra column.
>
> Wouldn’t it be more sensible if write.csv() and read.csv() were consistent
> in the same sense as read.table and write.table?
> Or at least if there were a switch (as.read.csv=TRUE ?) to tell write.csv
> to omit the initial "", or vice versa?
>
> Currently using R version 4.1.0 on Windows, but this reproduces at least
> as far back as 3.6
>
> Steve E
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior

2021-06-29 Thread Gabriel Becker
Hi Sebastian,

min/max do not materialize the vector; you will see it as compact afterwards,
same as before. It *does*, however, do a pass over the data chunked by
region, which is much more expensive than it needs to be for compact
sequences, that is true.

I think in some version of code that never made it out of the branch, I had
default min/max methods which took sortedness into account if it was known.
One thing that significantly complicated that code was that you have to
find the edge of the NAs (/NaNs for the real case) if narm is TRUE, which
involves a binary search using ELT (or a linear one using
ITERATE_BY_REGION, I suppose).
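
(A sketch of that edge search for the real case, as a hypothetical helper
and assuming the NA/NaN values are grouped at the front of the sorted
vector; it costs O(log n) REAL_ELT/isnan calls:)

#include <math.h>
#include <Rinternals.h>

/* Return the index of the first non-NaN-class element (NA and NaN both
   report isnan() true), or XLENGTH(x) if every element is one. */
static R_xlen_t first_non_nan(SEXP x)
{
    R_xlen_t lo = 0, hi = XLENGTH(x);  /* answer lies in [lo, hi] */
    while (lo < hi) {
        R_xlen_t mid = lo + (hi - lo) / 2;
        if (isnan(REAL_ELT(x, mid)))
            lo = mid + 1;   /* NaN-class values extend past mid */
        else
            hi = mid;       /* first non-NaN is at or before mid */
    }
    return lo;
}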

That said, a newer version of the count-NAs code did get in from a later
patch, so it is available in r-devel and could be used to revisit that
approach.

That aside, it is true that compact sequences in particular never have NAs,
so the min and max ALTREP methods for those classes would be trivial. I
kind of doubt people are creating compact sequences and then asking for the
min/max/mean of them very often in practice.
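
(Schematically: since a compact integer sequence is fully described by its
length, start, and increment, and can never contain an NA, the trivial
methods are closed-form. A sketch, with the info passed as plain arguments
rather than pulled from the class's state:)

static int compact_intseq_min(R_xlen_t n, int start, int inc)
{
    /* no NAs possible, so no data pass and no narm handling needed */
    return inc > 0 ? start : start + (int)((n - 1) * inc);
}

static int compact_intseq_max(R_xlen_t n, int start, int inc)
{
    return inc > 0 ? start + (int)((n - 1) * inc) : start;
}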

Best,
~G

On Tue, Jun 29, 2021 at 11:26 AM Sebastian Martin Krantz <
sebastian.kra...@graduateinstitute.ch> wrote:

> Thanks Gabriel and Luke,
>
> I understand now the functions return NULL if no method is applicable. I
> wonder though why do ALTINTEGER_MIN and MAX return NULL on a plain integer
> sequence? I also see that min() and max() are not optimized i.e. min(1:1e8)
> appears to materialize the vector.
>
> In general I expect my functions to mostly be applied to real data so this
> is not a huge issue for me (I’d rather get rid of it again than calling
> sum() or risking that the macros are removed from the API), but it could be
> nice to have this speedup available to packages. If these macros have
> matured and it can be made explicit that they return NULL if no method is
> applicable, or, better, they internally dispatch to a normal sum method if
> this is the case, they could become very manageable and useful.
>
> Best,
>
> Sebastian
>
>
>
> On Tue 29. Jun 2021 at 21:09, Gabriel Becker 
> wrote:
>
>> Also, @Luke Tierney   I can prepare a patch that
>> has wrappers delegate to payload's ALTREP class methods for things like
>> sum, min, max, etc once conference season calms down a bit.
>>
>> Best,
>> ~G
>>
>> On Tue, Jun 29, 2021 at 11:07 AM Gabriel Becker 
>> wrote:
>>
>>> Hi Sebastian,
>>>
>>> So the way that it is currently factored, there isn't a good way of
>>> getting what you want under the constraints of what Luke said 
>>> (ALTINTEGER_SUM
>>> is not part of the API).
>>>
>>> I don't know what his reason are for saying that per say and would not
>>> want to speak for him, but of the top of my head, I suspect it is because
>>> ALTREP sum methods are allowed to return NULL (the C version) to say "I
>>> don't have a sum method that is applicable here, please continue with the
>>> normal code". So, just as an example, your exact code is likely to
>>> segfault, I think, if you hit an ALTREP that chooses not to implement a sum
>>> method because you'll be running around with a SEXP that has the value NULL
>>> (the C one, not the R one).
>>>
>>> One thing you could do, is check for altrepness and then construct and
>>> evaluate a call to the R sum function in that case, but that probably isn't
>>> quite what you want either, as this will hit the code you're trying to
>>> bypass/speedup  in the case where the ALTREP class doesn't implement a sum
>>> methods. I see that Luke just mentioned this as well but I'll leave it in
>>> since I had already typed it.
>>>
>>> I hope that helps clarify some things.
>>>
>>> Best,
>>> ~G
>>>
>>>
>>> On Tue, Jun 29, 2021 at 10:13 AM Sebastian Martin Krantz <
>>> sebastian.kra...@graduateinstitute.ch> wrote:
>>>
>>>> Thanks both. Is there a suggested way I can get this speedup in a
>>>> package?
>>>> Or just leave it for now?
>>>>
>>>> Thanks also for the clarification Bill. The issue I have with that is
>>>> that
>>>> in my C code ALTREP(x) evaluates to true even after adding and removing
>>>> dimensions (otherwise it would be handled by the normal sum method and
>>>> I’d
>>>> be fine). Also .Internal(inspect(x)) still shows the compact
>>>> representation.
>>>>
>>>> -Sebastian
>>>>
>>>> On Tue 29. Jun 2021 at 19:43, Bill Dunlap 
>>>> wrote:
>>>>
>>>> > Adding the dime

Re: [Rd] ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior

2021-06-29 Thread Gabriel Becker
Also, @Luke Tierney   I can prepare a patch that
has wrappers delegate to payload's ALTREP class methods for things like
sum, min, max, etc once conference season calms down a bit.

Best,
~G

On Tue, Jun 29, 2021 at 11:07 AM Gabriel Becker 
wrote:

> Hi Sebastian,
>
> So the way that it is currently factored, there isn't a good way of
> getting what you want under the constraints of what Luke said (ALTINTEGER_SUM
> is not part of the API).
>
> I don't know what his reasons are for saying that per se and would not
> want to speak for him, but off the top of my head, I suspect it is because
> ALTREP sum methods are allowed to return NULL (the C version) to say "I
> don't have a sum method that is applicable here, please continue with the
> normal code". So, just as an example, your exact code is likely to
> segfault, I think, if you hit an ALTREP that chooses not to implement a sum
> method because you'll be running around with a SEXP that has the value NULL
> (the C one, not the R one).
>
> One thing you could do, is check for altrepness and then construct and
> evaluate a call to the R sum function in that case, but that probably isn't
> quite what you want either, as this will hit the code you're trying to
> bypass/speed up in the case where the ALTREP class doesn't implement a sum
> method. I see that Luke just mentioned this as well but I'll leave it in
> since I had already typed it.
>
> I hope that helps clarify some things.
>
> Best,
> ~G
>
>
> On Tue, Jun 29, 2021 at 10:13 AM Sebastian Martin Krantz <
> sebastian.kra...@graduateinstitute.ch> wrote:
>
>> Thanks both. Is there a suggested way I can get this speedup in a package?
>> Or just leave it for now?
>>
>> Thanks also for the clarification Bill. The issue I have with that is that
>> in my C code ALTREP(x) evaluates to true even after adding and removing
>> dimensions (otherwise it would be handled by the normal sum method and I’d
>> be fine). Also .Internal(inspect(x)) still shows the compact
>> representation.
>>
>> -Sebastian
>>
>> On Tue 29. Jun 2021 at 19:43, Bill Dunlap 
>> wrote:
>>
>> > Adding the dimensions attribute takes away the altrep-ness.  Removing
>> > dimensions
>> > does not make it altrep.  E.g.,
>> >
>> > > a <- 1:10
>> > > am <- a ; dim(am) <- c(2L,5L)
>> > > amn <- am ; dim(amn) <- NULL
>> > > .Call("is_altrep", a)
>> > [1] TRUE
>> > > .Call("is_altrep", am)
>> > [1] FALSE
>> > > .Call("is_altrep", amn)
>> > [1] FALSE
>> >
>> > where is_altrep() is defined by the following C code:
>> >
>> > #include 
>> > #include 
>> >
>> > SEXP is_altrep(SEXP x)
>> > {
>> > return Rf_ScalarLogical(ALTREP(x));
>> > }
>> >
>> >
>> > -Bill
>> >
>> > On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
>> > sebastian.kra...@graduateinstitute.ch> wrote:
>> >
>> >> Hello together, I'm working on some custom (grouped, weighted) sum, min
>> >> and
>> >> max functions and I want them to support the special case of plain
>> integer
>> >> sequences using ALTREP. I thereby encountered some behavior I cannot
>> >> explain to myself. The head of my fsum C function looks like this (g is
>> >> optional grouping vector, w is optional weights vector):
>> >>
>> >> SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
>> >>   int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
>> >> narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
>> >>   if(ALTREP(x) && ng == 0 && nwl) {
>> >> switch(tx) {
>> >> case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
>> >> case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
>> >> case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
>> >> default: error("ALTREP object must be integer or real typed");
>> >> }
>> >>   }
>> >> // ...
>> >> }
>> >>
>> >> when I let x <- 1:1e8, fsum(x) works fine and returns the correct
>> value.
>> >> If
>> >> I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn
>> this
>> >> into a vector again, dim(x) <- NULL, fsum(x) gives  NULL and a warning
>> >> message 'converting NULL pointer to R NULL'. For functions fmin and
>> fmax
>> &

Re: [Rd] ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior

2021-06-29 Thread Gabriel Becker
Hi Sebastian,

So the way that it is currently factored, there isn't a good way of getting
what you want under the constraints of what Luke said (ALTINTEGER_SUM is
not part of the API).

I don't know what his reasons are for saying that per se and would not want
to speak for him, but off the top of my head, I suspect it is because ALTREP
sum methods are allowed to return NULL (the C version) to say "I don't have
a sum method that is applicable here, please continue with the normal
code". So, just as an example, your exact code is likely to segfault, I
think, if you hit an ALTREP that chooses not to implement a sum method
because you'll be running around with a SEXP that has the value NULL (the C
one, not the R one).

One thing you could do is check for altrepness and then construct and
evaluate a call to the R sum function in that case, but that probably isn't
quite what you want either, as this will hit the code you're trying to
bypass/speed up in the case where the ALTREP class doesn't implement a sum
method. I see that Luke just mentioned this as well but I'll leave it in
since I had already typed it.
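
(A rough, untested sketch of that fallback, with a hypothetical helper name;
note that ALTINTEGER_SUM signals "no method" with a C NULL, not an R NULL:)

#include <Rinternals.h>

static SEXP altrep_sum_or_fallback(SEXP x, Rboolean narm)
{
    SEXP s = ALTINTEGER_SUM(x, narm);
    if (s != NULL)                     /* the class provided a sum method */
        return s;
    /* otherwise build and evaluate sum(x, na.rm = narm) in base */
    SEXP call = PROTECT(Rf_lang3(Rf_install("sum"), x,
                                 Rf_ScalarLogical(narm)));
    SET_TAG(CDDR(call), Rf_install("na.rm"));
    SEXP ans = Rf_eval(call, R_BaseEnv);
    UNPROTECT(1);
    return ans;
}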

I hope that helps clarify some things.

Best,
~G


On Tue, Jun 29, 2021 at 10:13 AM Sebastian Martin Krantz <
sebastian.kra...@graduateinstitute.ch> wrote:

> Thanks both. Is there a suggested way I can get this speedup in a package?
> Or just leave it for now?
>
> Thanks also for the clarification Bill. The issue I have with that is that
> in my C code ALTREP(x) evaluates to true even after adding and removing
> dimensions (otherwise it would be handled by the normal sum method and I’d
> be fine). Also .Internal(inspect(x)) still shows the compact
> representation.
>
> -Sebastian
>
> On Tue 29. Jun 2021 at 19:43, Bill Dunlap 
> wrote:
>
> > Adding the dimensions attribute takes away the altrep-ness.  Removing
> > dimensions
> > does not make it altrep.  E.g.,
> >
> > > a <- 1:10
> > > am <- a ; dim(am) <- c(2L,5L)
> > > amn <- am ; dim(amn) <- NULL
> > > .Call("is_altrep", a)
> > [1] TRUE
> > > .Call("is_altrep", am)
> > [1] FALSE
> > > .Call("is_altrep", amn)
> > [1] FALSE
> >
> > where is_altrep() is defined by the following C code:
> >
> > #include 
> > #include 
> >
> > SEXP is_altrep(SEXP x)
> > {
> > return Rf_ScalarLogical(ALTREP(x));
> > }
> >
> >
> > -Bill
> >
> > On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
> > sebastian.kra...@graduateinstitute.ch> wrote:
> >
> >> Hello together, I'm working on some custom (grouped, weighted) sum, min
> >> and
> >> max functions and I want them to support the special case of plain
> integer
> >> sequences using ALTREP. I thereby encountered some behavior I cannot
> >> explain to myself. The head of my fsum C function looks like this (g is
> >> optional grouping vector, w is optional weights vector):
> >>
> >> SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
> >>   int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
> >> narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
> >>   if(ALTREP(x) && ng == 0 && nwl) {
> >> switch(tx) {
> >> case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
> >> case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
> >> case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
> >> default: error("ALTREP object must be integer or real typed");
> >> }
> >>   }
> >> // ...
> >> }
> >>
> >> when I let x <- 1:1e8, fsum(x) works fine and returns the correct value.
> >> If
> >> I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn
> this
> >> into a vector again, dim(x) <- NULL, fsum(x) gives  NULL and a warning
> >> message 'converting NULL pointer to R NULL'. For functions fmin and fmax
> >> (similarly defined using ALTINTEGER_MIN/MAX), I get this error right
> away
> >> e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
> >> NULL'. So what is going on here? What do these functions return? And how
> >> do
> >> I make this a robust implementation?
> >>
> >> Best regards,
> >>
> >> Sebastian Krantz
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> __
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] S3 weirdness

2021-06-25 Thread Gabriel Becker
On Thu, Jun 24, 2021 at 4:48 PM Gabor Grothendieck 
wrote:

> The fact that zoo:: in one part of the code has a side effect in
> another seems not to be in the spirit of functional programming or
> modularity.
>

While this is true, there is no way I know of for a package function
to...well, function in the general case without its namespace loaded, and
as has been brought up many times on this list, unloading namespaces fully
also doesn't work in the fully general case. Given those facts, it seems
the current behavior is essentially all that can be done, right?
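
(You can watch the side effect happen, assuming zoo is installed; merely
evaluating a :: reference loads the namespace, and loading registers its S3
methods:)

loadedNamespaces()             # baseline: no zoo entry
invisible(zoo::read.zoo)       # merely touching zoo:: loads the namespace
"zoo" %in% loadedNamespaces()  # TRUE: zoo's as.ts methods are now registered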

Also even if a namespace could be unloaded, can you imagine the penalty if
the namespace was loaded and then unloaded after every :: call? Some
scripts would just never complete at all. I kid, of course, but not by that
much I think...

~G


>
> On Thu, Jun 24, 2021 at 6:51 PM Simon Urbanek
>  wrote:
> >
> > Gabor,
> >
> > just by using zoo::read.zoo() you *do* load the namespace:
> >
> > > args(zoo::read.zoo)
> > function (file, format = "", tz = "", FUN = NULL, regular = FALSE,
> > index.column = 1, drop = TRUE, FUN2 = NULL, split = NULL,
> > aggregate = FALSE, ..., text, read = read.table)
> > NULL
> > > sessionInfo()
> > R Under development (unstable) (2021-06-23 r80548)
> > Platform: x86_64-apple-darwin19.6.0 (64-bit)
> > Running under: macOS Catalina 10.15.7
> >
> > Matrix products: default
> > BLAS:   /Volumes/Builds/R/build/lib/libRblas.dylib
> > LAPACK: /Volumes/Builds/R/build/lib/libRlapack.dylib
> >
> > locale:
> > [1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
> >
> > attached base packages:
> > [1] stats graphics  grDevices utils datasets  methods   base
> >
> > loaded via a namespace (and not attached):
> > [1] zoo_1.8-9   compiler_4.2.0  grid_4.2.0  lattice_0.20-44
> >
> > which includes S3 method dispatch tables:
> >
> > > methods(as.ts)
> > [1] as.ts.default* as.ts.zoo* as.ts.zooreg*
> > see '?methods' for accessing help and source code
> >
> > so the behavior is as expected.
> >
> > Cheers,
> > Simon
> >
> >
> > > On 25/06/2021, at 9:56 AM, Gabor Grothendieck 
> wrote:
> > >
> > > If we start up a vanilla session of R with no packages loaded and
> > > type the single line of code below as the first line entered then
> > > we get the output shown below.  The NA in the output and the length
> > > of 7 indicate that as.ts dispatched as.ts.zoo since as.ts.default
> > > would have resulted in a length of 6 with no NA's. It should not have
> > > known about as.ts.zoo since we never  explicitly loaded the zoo
> > > package using library or require.
> > > zoo:: was only used to refer to read.zoo.  This seems to be a bug in
> > > the way R is currently working.
> > >
> > >  as.ts(zoo::read.zoo(BOD))
> > >  ## Time Series:
> > >  ## Start = 1
> > >  ## End = 7
> > >  ## Frequency = 1
> > >  ## [1]  8.3 10.3 19.0 16.0 15.6   NA 19.8
> > >
> > >  R.version.string
> > >  ## [1] "R version 4.1.0 RC (2021-05-16 r80303)"
> > >
> > > --
> > > Statistics & Software Consulting
> > > GKX Group, GKX Associates Inc.
> > > tel: 1-877-GKX-GROUP
> > > email: ggrothendieck at gmail.com
> > >
> > > __
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> >
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Possible ALTREP bug

2021-06-17 Thread Gabriel Becker
> ...whole object you
> may want to get data in chunks. There are iteration macros that
> help. Some examples are in src/main/summary.c.
>
> Best,
>
> luke
>
> >
> > On Wed, Jun 16, 2021 at 4:29 PM Simon Urbanek <
> simon.urba...@r-project.org>
> > wrote:
> >   The usual quote applies: "use the source, Luke":
> >
> >   $ grep _ELT *.h | sort
> >   Rdefines.h:#define SET_ELEMENT(x, i, val)
> >SET_VECTOR_ELT(x, i, val)
> >   Rinternals.h:   The function STRING_ELT is used as an argument
> >   to arrayAssign even
> >   Rinternals.h:#define VECTOR_ELT(x,i)((SEXP *) DATAPTR(x))[i]
> >   Rinternals.h://SEXP (STRING_ELT)(SEXP x, R_xlen_t i);
> >   Rinternals.h:Rbyte (RAW_ELT)(SEXP x, R_xlen_t i);
> >   Rinternals.h:Rbyte ALTRAW_ELT(SEXP x, R_xlen_t i);
> >   Rinternals.h:Rcomplex (COMPLEX_ELT)(SEXP x, R_xlen_t i);
> >   Rinternals.h:Rcomplex ALTCOMPLEX_ELT(SEXP x, R_xlen_t i);
> >   Rinternals.h:SEXP (STRING_ELT)(SEXP x, R_xlen_t i);
> >   Rinternals.h:SEXP (VECTOR_ELT)(SEXP x, R_xlen_t i);
> >   Rinternals.h:SEXP ALTSTRING_ELT(SEXP, R_xlen_t);
> >   Rinternals.h:SEXP SET_VECTOR_ELT(SEXP x, R_xlen_t i, SEXP v);
> >   Rinternals.h:double (REAL_ELT)(SEXP x, R_xlen_t i);
> >   Rinternals.h:double ALTREAL_ELT(SEXP x, R_xlen_t i);
> >   Rinternals.h:int (INTEGER_ELT)(SEXP x, R_xlen_t i);
> >   Rinternals.h:int (LOGICAL_ELT)(SEXP x, R_xlen_t i);
> >   Rinternals.h:int ALTINTEGER_ELT(SEXP x, R_xlen_t i);
> >   Rinternals.h:int ALTLOGICAL_ELT(SEXP x, R_xlen_t i);
> >   Rinternals.h:void ALTCOMPLEX_SET_ELT(SEXP x, R_xlen_t i,
> >   Rcomplex v);
> >   Rinternals.h:void ALTINTEGER_SET_ELT(SEXP x, R_xlen_t i, int v);
> >   Rinternals.h:void ALTLOGICAL_SET_ELT(SEXP x, R_xlen_t i, int v);
> >   Rinternals.h:void ALTRAW_SET_ELT(SEXP x, R_xlen_t i, Rbyte v);
> >   Rinternals.h:void ALTREAL_SET_ELT(SEXP x, R_xlen_t i, double v);
> >   Rinternals.h:void ALTSTRING_SET_ELT(SEXP, R_xlen_t, SEXP);
> >   Rinternals.h:void SET_INTEGER_ELT(SEXP x, R_xlen_t i, int v);
> >   Rinternals.h:void SET_LOGICAL_ELT(SEXP x, R_xlen_t i, int v);
> >   Rinternals.h:void SET_REAL_ELT(SEXP x, R_xlen_t i, double v);
> >   Rinternals.h:void SET_STRING_ELT(SEXP x, R_xlen_t i, SEXP v);
> >
> >   So the indexing is with R_xlen_t and they return the value
> >   itself as one would expect.
> >
> >   Cheers,
> >   Simon
> >
> >
> >   > On Jun 17, 2021, at 2:22 AM, Toby Hocking 
> >   wrote:
> >   >
> >   > By the way, where is the documentation for INTEGER_ELT,
> >   REAL_ELT, etc? I
> >   > looked in Writing R Extensions and R Internals but I did not
> >   see any
> >   > mention.
> >   > REAL_ELT is briefly mentioned on
> >   > https://svn.r-project.org/R/branches/ALTREP/ALTREP.html
> >   > Would it be possible to please add some mention of them to
> >   Writing R
> >   > Extensions?
> >   > - how many of these _ELT functions are there? INTEGER, REAL,
> >   ... ?
> >   > - in what version of R were they introduced?
> >   > - I guess input types are always SEXP and int?
> >   > - What are the output types for each?
> >   >
> >   > On Fri, May 28, 2021 at 5:16 PM 
> >   wrote:
> >   >
> >   >> Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly
> >   new it may
> >   >> be possible to check that places where they are used allow
> >   for them to
> >   >> allocate. I have fixed the one that got caught by Gabor's
> >   example, and
> >   >> a rchk run might be able to pick up others if rchk knows
> >   these could
> >   >> allocate. (I may also be forgetting other places where the
> >   _ELT
> >   >> methods are used.)  Fixing all call sites for REAL, INTEGER,
> >   etc, was
> >   >> never realistic so there GC has to be suspended during the
> >   method
> >   >> call, and that is done in the dispatch mechanism.
> >   >>
> >   >> The bigger problem is jumps from inside things that existing
> >   code
> >   >> assumes will not do that. Catching those jumps is possible
> >   but
> >   >> expensive; doing anything sensible if one is caught is really

Re: [Rd] [External] Possible ALTREP bug

2021-05-28 Thread Gabriel Becker
Hi Jim et al,

Just to hopefully add a bit to what Luke already answered: from what I
recall looking back at that Bioconductor thread, Elt methods are used in
places where there are hard implicit assumptions that no garbage collection
will occur (i.e., they are called on things that aren't PROTECTed), and
beyond that, in places where there are hard assumptions that no error
(longjmp) will occur. I could be wrong, but I don't know that suspending
garbage collection would protect from the second one. I.e., it is possible
that an error *ever* being raised from R code that implements an Elt method
could cause all hell to break loose.

Luke or Tomas Kalibera would know more.

I was disappointed that implementing ALTREPs in R code was not in the cards
(it was in my original proposal back in 2016 to the DSC) but I trust Luke
that there are important reasons we can't safely allow that.

Best,
~G

On Fri, May 28, 2021 at 8:31 AM Jim Hester  wrote:

> From reading the discussion on the Bioconductor issue tracker it seems like
> the reason the GC is not suspended for the non-string ALTREP Elt methods is
> primarily due to performance concerns.
>
> If this is the case perhaps an additional flag could be added to the
> `R_set_altrep_*()` functions so ALTREP authors could indicate if GC should
> be halted when that particular method is called for that particular ALTREP
> class.
>
> This would avoid the performance hit (other than a boolean check) for the
> standard case when no allocations are expected, but allow authors to
> indicate that R should pause GC if needed for methods in their class.
>
> On Fri, May 28, 2021 at 9:42 AM  wrote:
>
> > integer and real Elt methods are not expected to allocate. You would
> > have to suspend GC to be able to do that. This currently can't be done
> > from package code.
> >
> > Best,
> >
> > luke
> >
> > On Fri, 28 May 2021, Gábor Csárdi wrote:
> >
> > > I have found some weird SEXP corruption behavior with ALTREP, which
> > > could be a bug. (Or I could be doing something wrong.)
> > >
> > > I have an integer ALTREP vector that calls back to R from the Elt
> > > method. When this vector is indexed in a lapply(), its first element
> > > gets corrupted. Sometimes it's just a type change to logical, but
> > > sometimes the corruption causes a crash.
> > >
> > > I saw this on macOS from R 3.5.3 to 4.2.0. I created a small package
> > > that demonstrates this: https://github.com/gaborcsardi/redfish
> > >
> > > The R callback in this package calls `loadNamespace("Matrix")`, but
> > > the same crash happens for other packages as well, and sometimes it
> > > also happens if I don't load any packages at all. (But that example
> > > was much more complicated, so I went with the package loading.)
> > >
> > > It is somewhat random, and sometimes turning off the JIT avoids the
> > > crash, but not always.
> > >
> > > Hopefully I am just doing something wrong in the ALTREP code (see
> > > https://github.com/gaborcsardi/redfish/blob/main/src/test.c), and it
> > > is not actually a bug.
> > >
> > > Thanks,
> > > Gabor
> > >
> > > __
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> >
> > --
> > Luke Tierney
> > Ralph E. Wareham Professor of Mathematical Sciences
> > University of Iowa  Phone: 319-335-3386
> > Department of Statistics andFax:   319-335-3017
> > Actuarial Science
> > 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
> > Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] 1954 from NA

2021-05-24 Thread Gabriel Becker
Hi All,

So there is a not particularly active, but closely curated (ie everything
on there should be good in terms of principled examples) github
organization of ALTREP examples: https://github.com/ALTREP-examples.
Currently there are two examples by Luke (including a package version of
the memory map ALTREP he wrote) and one by me.

To elaborate a bit more, it looks like you could have read-only vectors with
tagged NAs, because despite my incorrect recollection, it looks like
Extract_subset IS hooked up, so subsetting an ALTREP can, depending on the
altrep class, give you another ALTREP.

They would effectively be subsettable but not mutable, though,
because setting elements in an ALTREP vector still wipes its altrepness.
This is unfortunate but an intentional design decision that itself
currently appears immutable, if you'll excuse the pun, last I heard.
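
A quick way to see that wiping at the prompt (a sketch using the internal
inspect helper; addresses elided, exact output varies by R version):

> x <- 1:10                 # ALTREP compact integer sequence
> .Internal(inspect(x))
@... 13 INTSXP g0c0 [REF(65535)]  1 : 10 (compact)
> x[1] <- 1L                # setting an element materializes it
> .Internal(inspect(x))
@... 13 INTSXP g0c4 [REF(1)] (len=10, tl=0) 1,2,3,4,5,...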

I understand that that is a relatively sizable caveat, but c'est la vie.

Assuming things would be useful with that caveat, I can try to put a
proof-of-concept example into that organization that could work as the
starting point for a deeper collaboration soon. I think I have in my head a
way to approach it.

~G

On Mon, May 24, 2021 at 3:00 PM Nicholas Tierney 
wrote:

> Hi all,
>
> When first hearing about ALTREP I've wondered how it might be able to be
> used to store special missing value information - how can we learn more
> about implementing ALTREP classes? The idea of carrying around a "meaning
> of my NAs" vector, as Gabe said, would be very interesting!
>
> I've done a bit on creating "special missing values", as done in SPSS,
> SAS, and STATA, here:
> http://naniar.njtierney.com/articles/special-missing-values.html  (Note
> this approach requires carrying a duplicated dataframe of missing data
> around with the data - which I argue makes it easier to reason with, at the
> cost of storage. However this is just my approach, and there are others out
> there).
>
> Best,
>
> Nick
>
> On Tue, 25 May 2021 at 01:16, Adrian Dușa  wrote:
>
>> On Mon, May 24, 2021 at 5:47 PM Gabriel Becker 
>> wrote:
>>
>> > Hi Adrian,
>> >
>> > I had the same thought as Luke. It is possible that you can develop an
>> > ALTREP that carries around the tagging information you're looking for
>> in a
>> > way that is more persistent (in some cases) than R-level attributes and
>> > more hidden than additional user-visible columns.
>> >
>> > The downsides to this, of course, is that you'll in some sense be doing
>> > the same "extra vector for each vector you want tagged NA-s within"
>> under
>> > the hood, and that only custom machinery you write will recognize
>> things as
>> > something other than bog-standard NAs/NaNs.  You'll also have some
>> problems
>> > with the fact that data in ALTREPs isn't currently modifiable without
>> > losing ALTREPness. That said, ALTREPs are allowed to carry around
>> arbitrary
>> > persistent information with them, so from that perspective making an
>> ALTREP
>> > that carries around a "meaning of my NAs" vector of tags in its metadata
>> > would be pretty straightforward.
>> >
>>
>> Oh... now that is extremely interesting.
>> It is the first time I came across the ALTREP concept, so I need to study
>> the way it works before saying anything, but definitely something to
>> consider.
>>
>> Thanks so much for the pointer,
>> Adrian
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] 1954 from NA

2021-05-24 Thread Gabriel Becker
Hi Adrian,

I had the same thought as Luke. It is possible that you can develop an
ALTREP that carries around the tagging information you're looking for in a
way that is more persistent (in some cases) than R-level attributes and
more hidden than additional user-visible columns.

The downsides to this, of course, is that you'll in some sense be doing the
same "extra vector for each vector you want tagged NA-s within" under the
hood, and that only custom machinery you write will recognize things as
something other than bog-standard NAs/NaNs.  You'll also have some problems
with the fact that data in ALTREPs isn't currently modifiable without
losing ALTREPness. That said, ALTREPs are allowed to carry around arbitrary
persistent information with them, so from that perspective making an ALTREP
that carries around a "meaning of my NAs" vector of tags in its metadata
would be pretty straightforward.

Best,
~G

On Mon, May 24, 2021 at 7:30 AM Adrian Dușa  wrote:

> Hi Taras,
>
> On Mon, May 24, 2021 at 4:20 PM Taras Zakharko 
> wrote:
>
> > Hi Adrian,
> >
> > Have a look at vctrs package — they have low-level primitives that might
> > simplify your life a bit. I think you can get quite far by creating a
> > custom type that stores NAs in an attribute and utilizes vctrs proxy
> > functionality to preserve these attributes across different operations.
> > Going that route will likely to give you a much more flexible and robust
> > solution.
> >
>
> Yes I am well aware of the primitives from package vctrs, since package
> haven itself uses the vctrs_vctr class.
> They're doing a very interesting work, albeit not a solution for this
> particular problem.
>
> A.
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Testing R build when using --without-recommended-packages?

2021-05-04 Thread Gabriel Becker
Hmm, that's fair enough Ben, I stand corrected. I will say that this seems
to be a pretty "soft" recommendation, as these things go, given that it
isn't tested for by R CMD check, including with the --as-cran extensions. In
principle, it seems like it could be; similar checks are made on package
code for inappropriate external-package-symbol usage.

Either way, though, I suppose I have a number of packages which have been
invisibly non-best-practices compliant for their entire lifetimes (or at
least, the portion of that where they had tests/vignettes...).

Best,
~G

On Tue, May 4, 2021 at 2:22 PM Ben Bolker  wrote:

>
>Sorry if this has been pointed out already, but some relevant text
> from
>
> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Suggested-packages
>
>  > Note that someone wanting to run the examples/tests/vignettes may not
> have a suggested package available (and it may not even be possible to
> install it for that platform). The recommendation used to be to make
> their use conditional via if(require("pkgname")): this is OK if that
> conditioning is done in examples/tests/vignettes, although using
> if(requireNamespace("pkgname")) is preferred, if possible.
>
> ...
>
>  > Some people have assumed that a ‘recommended’ package in ‘Suggests’
> can safely be used unconditionally, but this is not so. (R can be
> installed without recommended packages, and which packages are
> ‘recommended’ may change.)
>
>
>
> On 5/4/21 5:10 PM, Gabriel Becker wrote:
> > Hi Henrik,
> >
> > A couple of things. Firstly, so far as I have ever heard, it's valid that
> a
> > package have hard dependencies in its tests for packages listed only in
> > Suggests.  In fact, that is one of the stated purposes of Suggests. An
> > argument could be made, I suppose, that the base packages should be under
> > stricter guidelines, but stats isn't violating the letter or intention of
> > Suggests by doing this.
> >
> >
> > Secondly, I don't have time to dig through the make files/administration
> > docs, but I do know that R CMD check has --no-stop-on-error, so you can
> > either separately or as part of make check, use that option for stats
> (and
> > elsewhere as needed?) and just know that the stats tests that depend on
> > MASS are "false positive" (or, more accurately, missing value) test
> > results, rather than real positives, and go from there.
> >
> > You could also "patch" the tests as part of your build process.
> Somewhere I
> > worked had to do that for parts of the internet tests that were unable to
> > get through the firewall.
> >
> > Best,
> > ~G
> >
> >
> >
> > On Tue, May 4, 2021 at 1:04 PM Henrik Bengtsson <
> henrik.bengts...@gmail.com>
> > wrote:
> >
> >> Two questions to R Core:
> >>
> >> 1. Is R designed so that 'recommended' packages are optional, or
> >> should that be considered uncharted territories?
> >>
> >> 2. Can such an R build/installation be validated using existing check
> >> methods?
> >>
> >>
> >> --
> >>
> >> Dirk, it's not clear to me whether you know for sure, or you draw
> >> conclusions based your long experience and reading. I think it's very
> >> important that others don't find this thread later on and read your
> >> comments as if they're the "truth" (unless they are).  I haven't
> >> re-read it from start to finish, but there are passages in 'R
> >> Installation and Administration' suggesting you can build and install
> >> R without 'recommended' packages.  For example, post-installation,
> >> Section 'Testing an Installation' suggests you can run (after making
> >> sure `make install-tests`):
> >>
> >> cd tests
> >> ../bin/R CMD make check
> >>
> >> but they fail the same way.  The passage continues "... and other
> >> useful targets are test-BasePackages and test-Recommended to run tests
> >> of the standard and recommended packages (if installed) respectively."
> >> (*).  So, to me that hints at 'recommended' packages are optional just
> >> as they're "Priority: recommended".  Further down, there's also a
> >> mentioning of:
> >>
> >> $ R_LIBS_USER="" R --vanilla
> >>> Sys.setenv(LC_COLLATE = "C", LC_TIME = "C", LANGUAGE = "en")
> >>> tools::testInstalledPackages(scope = "base")
> >>
> >> which also produces errors when 'recommended' pa

Re: [Rd] Testing R build when using --without-recommended-packages?

2021-05-04 Thread Gabriel Becker
Hi Henrik,

A couple of things. Firstly, so far as I have ever heard, it's valid that a
package have hard dependencies in its tests for packages listed only in
Suggests.  In fact, that is one of the stated purposes of Suggests. An
argument could be made, I suppose, that the base packages should be under
stricter guidelines, but stats isn't violating the letter or intention of
Suggests by doing this.


Secondly, I don't have time to dig through the make files/administration
docs, but I do know that R CMD check has --no-stop-on-error, so you can
either separately or as part of make check, use that option for stats (and
elsewhere as needed?) and just know that the stats tests that depend on
MASS are "false positive" (or, more accurately, missing value) test
results, rather than real positives, and go from there.

You could also "patch" the tests as part of your build process. Somewhere I
worked had to do that for parts of the internet tests that were unable to
get through the firewall.

Best,
~G



On Tue, May 4, 2021 at 1:04 PM Henrik Bengtsson 
wrote:

> Two questions to R Core:
>
> 1. Is R designed so that 'recommended' packages are optional, or
> should that be considered uncharted territories?
>
> 2. Can such an R build/installation be validated using existing check
> methods?
>
>
> --
>
> Dirk, it's not clear to me whether you know for sure, or you draw
> conclusions based your long experience and reading. I think it's very
> important that others don't find this thread later on and read your
> comments as if they're the "truth" (unless they are).  I haven't
> re-read it from start to finish, but there are passages in 'R
> Installation and Administration' suggesting you can build and install
> R without 'recommended' packages.  For example, post-installation,
> Section 'Testing an Installation' suggests you can run (after making
> sure `make install-tests`):
>
> cd tests
> ../bin/R CMD make check
>
> but they fail the same way.  The passage continues "... and other
> useful targets are test-BasePackages and test-Recommended to run tests
> of the standard and recommended packages (if installed) respectively."
> (*).  So, to me that hints at 'recommended' packages are optional just
> as they're "Priority: recommended".  Further down, there's also a
> mentioning of:
>
> $ R_LIBS_USER="" R --vanilla
> > Sys.setenv(LC_COLLATE = "C", LC_TIME = "C", LANGUAGE = "en")
> > tools::testInstalledPackages(scope = "base")
>
> which also produces errors when 'recommended' packages are missing,
> e.g. "Failed with error:  'there is no package called 'nlme'".
>
> (*) BTW, '../bin/R CMD make test-BasePackages' gives "make: *** No
> rule to make target 'test-BasePackages'.  Stop."
>
> Thanks,
>
> /Henrik
>
> On Tue, May 4, 2021 at 12:22 PM Dirk Eddelbuettel  wrote:
> >
> >
> > On 4 May 2021 at 11:25, Henrik Bengtsson wrote:
> > | FWIW,
> > |
> > | $ ./configure --help
> > | ...
> > |   --with-recommended-packages
> > |   use/install recommended R packages [yes]
> >
> > Of course. But look at the verb in your Subject: no optionality _in
> testing_ there.
> >
> > You obviously need to be able to build R itself to then build the
> recommended
> > packages you need for testing.
> >
> > Dirk
> >
> > --
> > https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] as.list fails on functions with S3 classes

2021-04-28 Thread Gabriel Becker
On Wed, Apr 28, 2021 at 6:04 PM brodie gaslam 
wrote:

>
> > On Wednesday, April 28, 2021, 5:16:20 PM EDT, Gabriel Becker <
> gabembec...@gmail.com> wrote:
> >
>
> > The analogous case for non-closures to what you are describing would be
> for
> > S3 to check mode(x) after striking out with class(x) to find relevant
> > methods. I don't think that would be appropriate.
>
> I would think of the general case to be to check `class(unclass(x))` on
> strike-out.


To me the general case is writing a robust default method that covers
whatever class(unclass(x)) would be. When you give an object a new
S3 class, you have the option of extending (c("newclass", "oldclass")) and
"not extending" (just "newclass"), and it certainly doesn't seem to me that
these two should behave the same. Perhaps others disagree.


>   This would then include things such as "matrix", etc.
> Dispatching on the implicit class as fallback seems like a natural thing
> to do in a language that dispatches on implicit class when there is none.
> After all, once you've struck out of your explicit classes, you have
> none left!
>
> This does happen naturally in some places (e.g. interacting with a
> data.frame as a list), and is quite delightful (usually).


So I don't know of any places where this happens *in the S3 dispatch sense*.
There are certainly places where the default method supports lists, and a
data.frame doesn't have a method, so it hits the default method, which
handles lists. Am I missing somewhere where dispatch gives a data.frame
to a list method (in S3 space)?


> I won't get
> into an argument of what the documentation states or whether any changes
> should be made, but to me that dispatch doesn't end with the implicit
> class seems feels like a logical wrinkle.  Yes, I can twist my brain to
> see how it can be made to make sense, but I don't like it.
>

I suppose it depends on how you view S3 dispatch. To me, it is purely
labeling: S3 dispatch has literally nothing to do with the content of the
object. What you're describing would make that not the case. (Or if I'm
wrong about what is happening, then I'm incorrect about that too.)

Best,
~G


>
> A fun past conversation on this very topic:
>
> https://stat.ethz.ch/pipermail/r-devel/2019-March/077457.html
>
> Best,
>
> B.
>
> > Also, as an aside, if you want your class to override methods that exist
> > for function you would want to set the class to c("foo", "function"), not
> > c("function", "foo"), as you had it in your example.
> >
> > Best,
> > ~G
> >
> > On Wed, Apr 28, 2021 at 1:45 PM Antoine Fabri 
> > wrote:
> >
> >> Dear R devel,
> >>
> >> as.list() can be used on functions, but not if they have a S3 class that
> >> doesn't include "function".
> >>
> >> See below :
> >>
> >> ```r
> >> add1 <- function(x) x+1
> >>
> >> as.list(add1)
> >> #> $x
> >> #>
> >> #>
> >> #> [[2]]
> >> #> x + 1
> >>
> >> class(add1) <- c("function", "foo")
> >>
> >> as.list(add1)
> >> #> $x
> >> #>
> >> #>
> >> #> [[2]]
> >> #> x + 1
> >>
> >> class(add1) <- "foo"
> >>
> >> as.list(add1)
> >> #> Error in as.vector(x, "list"): cannot coerce type 'closure' to
> vector of
> >> type 'list'
> >>
> >> as.list.function(add1)
> >> #> $x
> >> #>
> >> #>
> >> #> [[2]]
> >> #> x + 1
> >> ```
> >>
> >> In failing case the argument is dispatched to as.list.default instead of
> >> as.list.function.
> >>
> >> (1) Shouldn't it be dispatched to as.list.function ?
> >>
> >> (2) Shouldn't all generics when applied on an object of type closure
> fall
> >> back to the `fun.function` method  before falling back to the
> `fun.default`
> >> method ?
> >>
> >> Best regards,
> >>
> >> Antoine
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] as.list fails on functions with S3 classes

2021-04-28 Thread Gabriel Becker
Hi Antoine,

I would say this is the correct behavior. S3 dispatch is solely (so far as
I know?) concerned with the "actual classes" on the object. This is because
S3 classes act as labels that inform dispatch what, and in what order,
methods should be applied. You took the function class (ie label) off of
your object, which means that in the S3 sense, that object is no longer a
function and dispatching to function methods for it would be incorrect.
This is independent of whether the object is still callable "as a function".

The analogous case for non-closures to what you are describing would be for
S3 to check mode(x) after striking out with class(x) to find relevant
methods. I don't think that would be appropriate.

Also, as an aside, if you want your class to override methods that exist
for function you would want to set the class to c("foo", "function"), not
c("function", "foo"), as you had it in your example.
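
For example (a sketch building on Antoine's add1):

> as.list.foo <- function(x, ...) { cat("foo method first\n"); NextMethod() }
> class(add1) <- c("foo", "function")
> as.list(add1)
foo method first
$x


[[2]]
x + 1

With "foo" ahead of "function", dispatch tries as.list.foo first, and
NextMethod() falls through to as.list.function.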

Best,
~G



On Wed, Apr 28, 2021 at 1:45 PM Antoine Fabri 
wrote:

> Dear R devel,
>
> as.list() can be used on functions, but not if they have a S3 class that
> doesn't include "function".
>
> See below :
>
> ```r
> add1 <- function(x) x+1
>
> as.list(add1)
> #> $x
> #>
> #>
> #> [[2]]
> #> x + 1
>
> class(add1) <- c("function", "foo")
>
> as.list(add1)
> #> $x
> #>
> #>
> #> [[2]]
> #> x + 1
>
> class(add1) <- "foo"
>
> as.list(add1)
> #> Error in as.vector(x, "list"): cannot coerce type 'closure' to vector of
> type 'list'
>
> as.list.function(add1)
> #> $x
> #>
> #>
> #> [[2]]
> #> x + 1
> ```
>
> In failing case the argument is dispatched to as.list.default instead of
> as.list.function.
>
> (1) Shouldn't it be dispatched to as.list.function ?
>
> (2) Shouldn't all generics when applied on an object of type closure fall
> back to the `fun.function` method  before falling back to the `fun.default`
> method ?
>
> Best regards,
>
> Antoine
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Potential improvements of ave?

2021-03-16 Thread Gabriel Becker
Hi Abby,

I actually have a patch submitted that does this for unique/duplicated
(only numeric cases I think) but it is, as patches from external
contributors go, quite sizable which means it requires a correspondingly
large amount of an R-core member's time and energy to vet and consider. It
is in the queue, and so, I expect (/hope, provided I didn't make a mistake)
it will be incorporated at some point. (
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17993)

You are correct that the speedups are quite significant for calling
unique/duplicated on large vectors that know they are sorted: Speedup on my
machine for a fairly sizable vector (length 1e7) ranges from about ~10x in
the densely duplicated case up to ~60-70x in the sparsely duplicated case
for duplicated(). For unique() it seems to range from ~10x in the densely
duplicated case to ~15x in the sparse case.

I had thought that min and max already did this, but looking now, they
don't seem to by default, though ALTREP classes themselves do have the
option of setting a min/max method, which would be hit. That does seem like
low-hanging fruit, I agree, though in many cases the slowdown from a
single pass over the data to get a min probably isn't earth-shattering.

The others do seem like they could benefit as well.

Best,
~G

On Tue, Mar 16, 2021 at 2:54 PM Abby Spurdle  wrote:

> There are some relatively obvious examples:
> unique, which.min/which.max/etc, range/min/max, quantile, aggregate/split
>
> Also, many timeseries, graphics and spline functions are dependent on the
> order.
>
> In the case of data.frame(s), a boolean flag would probably need to be
> extended to allow for multiple column sorting, and
> ascending/descending options.
>
> On Tue, Mar 16, 2021 at 11:08 AM Gabriel Becker 
> wrote:
> >
> > Abby,
> >
> > Vectors do have an internal mechanism for knowing that they are sorted
> via ALTREP (it was one of 2 core motivating features for 'smart vectors'
> the other being knowledge about presence of NAs).
> >
> > Currently I don't think we expose it at the R level, though it is part
> of the official C API. I don't know of any plans for this to change, but I
> suppose it could. Plus for functions in R itself, we could even use it
> without exposing it more widely. A number of functions, including sort
> itself, already do this in fact, but more could. I'd be interested in
> hearing which functions you think would particularly benefit from this.
> >
> > ~G
> >
> > On Mon, Mar 15, 2021 at 12:01 PM SOEIRO Thomas 
> wrote:
> >>
> >> Hi Abby,
> >>
> >> Thank you for your positive feedback.
> >>
> >> I agree for your general comment about sorting.
> >>
> >> For ave specifically, ordering may not help because the output must
> maintain the order of the input (as ave returns only x and not the entiere
> data.frame).
> >>
> >> Thanks,
> >>
> >> Thomas
> >> 
> >> De : Abby Spurdle 
> >> Envoyé : lundi 15 mars 2021 10:22
> >> À : SOEIRO Thomas
> >> Cc : r-devel@r-project.org
> >> Objet : Re: [Rd] Potential improvements of ave?
> >>
> >> EMAIL EXTERNE - TRAITER AVEC PRÉCAUTION LIENS ET FICHIERS
> >>
> >> Hi Thomas,
> >>
> >> These are some great suggestions.
> >> But I can't help but feel there's a much bigger problem here.
> >>
> >> Intuitively, the ave function could (or should) sort the data.
> >> Then the indexing step becomes almost trivial, in terms of both time
> >> and space complexity.
> >> And the ave function is not the only example of where a problem
> >> becomes much simpler, if the data is sorted.
> >>
> >> Historically, I've never found base R functions user-friendly for
> >> aggregation purposes, or for sorting.
> >> (At least, not by comparison to SQL).
> >>
> >> But that's not the main problem.
> >> It would seem preferable to sort the data, only once.
> >> (Rather than sorting it repeatedly, or not at all).
> >>
> >> Perhaps, objects such as vectors and data.frame(s) could have a
> >> boolean attribute, to indicate if they're sorted.
> >> Or functions such as ave could have a sorted argument.
> >> In either case, if true, the function assumes the data is sorted and
> >> applies a more efficient algorithm.
> >>
> >>
> >> B.
> >>
> >>
> >> On Sat, Mar 13, 2021 at 1:07 PM SOEIRO Thomas 
> wrote:
> >> >
> >> > Dear all,
> >> >
> >> > I have two questions/suggestions abou

[Rd] Undefined (so far as I can tell?) behavior of browser when called at top level of sourced script?

2021-03-16 Thread Gabriel Becker
Hi all,

I was asked a question about why browser() was behaving a specific way, and
it turned out that it was being called in a script (rather than in a
function).

Putting aside the design considerations that led to that, the behavior is
actually a bit puzzling, and so far as I have been able to see, completely
undocumented. My suspicion is that this behavior should be considered
undefined, but I wanted to make sure I wasn't missing something. (To be
perfectly honest I was a bit surprised it wasn't an error).

Some experimentation (done in 4.0.1 because that is what I have available;
R script attached) has led me to conclude that if browser is called at the
top level, 'n' will just continue to the end, *except* in the case where
the next expression is a conditional **which has a consequent that is
evaluated** or a loop, in which case it walks through the consequent/loop
body however many times and then the 'n' that steps out of that
continues on.
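
For concreteness, a tiny script of the kind I experimented with (my sketch
here, not the attached file), run via source("demo.R", echo = TRUE):

browser()         # called at top level, not inside a function
x <- 1            # 'n' from here continues to the end of the script...
if (x > 0) {      # ...except when the next expression is a conditional with
  y <- x + 1      #    an evaluated consequent (or a loop): 'n' then steps
}                 #    through the body before stepping back out
z <- y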

Should something be added to the documentation that either describes this
behavior, declares explicitly that using browser at the top level leads to
undefined behavior, or both?

I can prepare a patch to that effect if desired.

~G
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Potential improvements of ave?

2021-03-15 Thread Gabriel Becker
Abby,

Vectors do have an internal mechanism for knowing that they are sorted via
ALTREP (it was one of the two core motivating features for 'smart vectors',
the other being knowledge about the presence of NAs).

Currently I don't think we expose it at the R level, though it is part of
the official C API. I don't know of any plans for this to change, but I
suppose it could. Plus for functions in R itself, we could even use it
without exposing it more widely. A number of functions, including sort
itself, already do this in fact, but more could. I'd be interested in
hearing which functions you think would particularly benefit from this.
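
For instance, sort() results get wrapped with the sortedness recorded, which
you can peek at with the internal inspect helper (a sketch; addresses
elided, exact output varies by R version):

> .Internal(inspect(sort(runif(5))))
@... 14 REALSXP g0c0 [REF(1)] wrapper [srt=1,no_na=1]
  @... 14 REALSXP g0c4 [REF(1)] (len=5, tl=0) 0.0462,0.313,0.482,...

The srt=1 flag is what a sortedness-aware unique()/duplicated() would test
before taking a fast path.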

~G

On Mon, Mar 15, 2021 at 12:01 PM SOEIRO Thomas 
wrote:

> Hi Abby,
>
> Thank you for your positive feedback.
>
> I agree for your general comment about sorting.
>
> For ave specifically, ordering may not help because the output must
> maintain the order of the input (as ave returns only x and not the entiere
> data.frame).
>
> Thanks,
>
> Thomas
> 
> De : Abby Spurdle 
> Envoyé : lundi 15 mars 2021 10:22
> À : SOEIRO Thomas
> Cc : r-devel@r-project.org
> Objet : Re: [Rd] Potential improvements of ave?
>
> EMAIL EXTERNE - TRAITER AVEC PRÉCAUTION LIENS ET FICHIERS
>
> Hi Thomas,
>
> These are some great suggestions.
> But I can't help but feel there's a much bigger problem here.
>
> Intuitively, the ave function could (or should) sort the data.
> Then the indexing step becomes almost trivial, in terms of both time
> and space complexity.
> And the ave function is not the only example of where a problem
> becomes much simpler, if the data is sorted.
>
> Historically, I've never found base R functions user-friendly for
> aggregation purposes, or for sorting.
> (At least, not by comparison to SQL).
>
> But that's not the main problem.
> It would seem preferable to sort the data, only once.
> (Rather than sorting it repeatedly, or not at all).
>
> Perhaps, objects such as vectors and data.frame(s) could have a
> boolean attribute, to indicate if they're sorted.
> Or functions such as ave could have a sorted argument.
> In either case, if true, the function assumes the data is sorted and
> applies a more efficient algorithm.
>
>
> B.
>
>
> On Sat, Mar 13, 2021 at 1:07 PM SOEIRO Thomas 
> wrote:
> >
> > Dear all,
> >
> > I have two questions/suggestions about ave, but I am not sure if it's
> relevant for bug reports.
> >
> >
> >
> > 1) I have performance issues with ave in a case where I didn't expect
> it. The following code runs as expected:
> >
> > set.seed(1)
> >
> > df1 <- data.frame(id1 = sample(1:1e2, 5e2, TRUE),
> >   id2 = sample(1:3, 5e2, TRUE),
> >   id3 = sample(1:5, 5e2, TRUE),
> >   val = sample(1:300, 5e2, TRUE))
> >
> > df1$diff <- ave(df1$val,
> > df1$id1,
> > df1$id2,
> > df1$id3,
> > FUN = function(i) c(diff(i), 0))
> >
> > head(df1[order(df1$id1,
> >df1$id2,
> >df1$id3), ])
> >
> > But when expanding the data.frame (* 1e4), ave fails (Error: cannot
> allocate vector of size 1110.0 Gb):
> >
> > df2 <- data.frame(id1 = sample(1:(1e2 * 1e4), 5e2 * 1e4, TRUE),
> >   id2 = sample(1:3, 5e2 * 1e4, TRUE),
> >   id3 = sample(1:(5 * 1e4), 5e2 * 1e4, TRUE),
> >   val = sample(1:300, 5e2 * 1e4, TRUE))
> >
> > df2$diff <- ave(df2$val,
> > df2$id1,
> > df2$id2,
> > df2$id3,
> > FUN = function(i) c(diff(i), 0))
> >
> > This use case does not seem extreme to me (e.g. aggregate et al work
> perfectly on this data.frame).
> > So my question is: Is this expected/intended/reasonable? i.e. Does ave
> need to be optimized?
> >
> >
> >
> > 2) Gabor Grothendieck pointed out in 2011 that drop = TRUE is needed to
> avoid warnings in case of unused levels (
> https://urldefense.com/v3/__https://stat.ethz.ch/pipermail/r-devel/2011-February/059947.html__;!!JQ5agg!J2AUFbQr31F2c6LUpTnyc5TX2Kh1bJ-VqhMND1c0N5axWO_tQl0pCJhtucPfjU7NXrBO$
> ).
> > Is it relevant/possible to expose the drop argument explicitly?
> >
> >
> >
> > Thanks,
> >
> > Thomas
> > __
> > R-devel@r-project.org mailing list
> >
> https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-devel__;!!JQ5agg!J2AUFbQr31F2c6LUpTnyc5TX2Kh1bJ-VqhMND1c0N5axWO_tQl0pCJhtucPfjUzdLFM1$
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] NCOL, as.matrix, cbind and NULL

2021-02-16 Thread Gabriel Becker
Hi all,

so I've known for a while that NROW(NULL) gives 0, where nrow(NULL) gives
an error, so I naively expected NCOL to do the same.

Of course, it does not, and is documented* (more on this in a bit) as not
doing so. For those reading without the documentation open, it gives 1.

The relevant doc states:

‘nrow’ and ‘ncol’ return the number of rows or columns present in ‘x’.
 ‘NCOL’ and ‘NROW’ do the same treating a vector as 1-column matrix, even a
0-length vector, compatibly with ‘as.matrix()’ or ‘cbind()’, see the
example.

But there are a couple of fiddly bits here. First is that it says "even a
0-length *vector*" (emphasis mine), but we have

> is.vector(NULL)
[1] FALSE

As opposed, of course, to, e.g., numeric(0).

Next is the claim of compatibility with as.matrix and cbind, but in both my
released version of R (4.0.2) and devel that I just built from trunk, we
have

> NCOL(NULL)
[1] 1
> cbind(NULL)
NULL
> as.matrix(NULL)
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  :
  'data' must be of a vector type, was 'NULL'


So in fact each function is treating NULL completely differently.


The fix (to change the behavior or to add a mention in the documentation
that NULL is treated as a 0-length vector) would be easy to do; should I
file a bug with a patch for this?


Best,

~G

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Allowing S3 methods of rounding functions to take `...`

2021-01-28 Thread Gabriel Becker
Out of my naive curiosity, what arguments are you hoping a method for t()
will take?

I mean, honestly, an argument could be made that all S3 generics should take
`...`. I don't think it's an overwhelmingly compelling one, but I do see some
merit to it given what an S3 generic is at its core.

~G
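
To make the arity check discussed below concrete (a sketch with a
hypothetical class):

> floor.myclass <- function(x, unit = "secs") x   # extra arg beyond x
> x <- structure(1, class = "myclass")
> floor(x, unit = "mins")
Error in floor(x, unit = "mins") : 2 arguments passed to 'floor' which requires 1

The count is checked before dispatch, so the method never gets a chance to
receive the extra argument.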

On Thu, Jan 28, 2021 at 5:27 PM Abby Spurdle  wrote:

> That's a great suggestion Davis.
>
> While, we're on the topic...
> Could we have a "dots" argument in base::t, the transpose function?
>
>
> On Fri, Jan 29, 2021 at 4:48 AM Davis Vaughan  wrote:
> >
> > I should also say that I would be willing to attempt a patch for this, if
> > others agree that this would be useful.
> >
> > - Davis
> >
> > On Thu, Jan 28, 2021 at 9:14 AM Davis Vaughan  wrote:
> >
> > > Hi all,
> > >
> > > I would like to propose adding `...` to the signatures of the following
> > > rounding functions:
> > >
> > > - floor(x)
> > > - ceiling(x)
> > > - round(x, digits = 0)
> > > - And possibly signif(x, digits = 6)
> > >
> > > The purpose would be to allow S3 methods to add additional arguments as
> > > required.
> > >
> > > A few arguments in favor of this change:
> > >
> > > `trunc(x, ...)` already takes dots, which sets a precedent for the
> others
> > > to do so as well. It is documented in the same help file as the other
> > > rounding functions.
> > >
> > > Internally at the C level, a check is done to ensure that there is
> exactly
> > > 1 arg for floor() and ceiling(), and either 1 or 2 args for round().
> The
> > > actual names of those arguments are not checked, however, and I believe
> > > this is what allows `round.Date(x, ...)` and `round.POSIXt(x, unit)` to
> > > exist, solely because they have 2 arguments. It seems like this is a
> bit of
> > > a hack, since you couldn't create something similar for floor, like
> > > `floor.POSIXt(x, unit)` (not saying this should exist, it is just for
> > > argument's sake), because the 1 argument check would error on this. I
> think
> > > adding `...` to the signature of the generics would better support
> what is
> > > being done here.
> > >
> > > Additionally, I have a custom date-like S3 class of my own that I would
> > > like to write floor(), ceiling(), and round() methods for, and they
> would
> > > require passing additional arguments.
> > >
> > > If R core would like to make this change, they could probably tweak
> > > `do_trunc()` to be a bit more general, and use it for floor() and
> > > ceiling(), since it already allows `...`.
> > >
> > > A few references:
> > >
> > > Check for 1 arg in do_math1(), used by floor() and ceiling()
> > >
> > >
> https://github.com/wch/r-source/blob/fe82da3baf849fcd3cc7dbc31c6abc72b57aa083/src/main/arithmetic.c#L1270
> > >
> > > Check for 2 args in do_Math2(), used by round()
> > >
> > >
> https://github.com/wch/r-source/blob/fe82da3baf849fcd3cc7dbc31c6abc72b57aa083/src/main/arithmetic.c#L1655
> > >
> > > do_trunc() definition that allows `...`
> > >
> > >
> https://github.com/wch/r-source/blob/fe82da3baf849fcd3cc7dbc31c6abc72b57aa083/src/main/arithmetic.c#L1329-L1340
> > >
> > > - Davis
> > >
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Inconsistency of c.Date: non-commutativity and non-integer Value

2021-01-22 Thread Gabriel Becker
Hi Jens,

On Fri, Jan 22, 2021 at 1:18 PM Dirk Eddelbuettel  wrote:

>
> On 22 January 2021 at 21:35, Jens Heumann wrote:
> | Dear r-devel,
> |
> | Today I came across what I would call inconsistencies in the `c.Date`
> | method compared to what happens when concatenating other classes: 1.
> | Non-commutativity: The type in the arrangements of the elements does
> | matter (first element is critical), 2. the resulting value is numeric
> | instead of expected integer (as in the case with factors).
> |
> |  > ## Examples
> |  > ## 1. Non-commutativity:
> |  > c(.1, Sys.Date())
> | [1] 0.1 18649.0
> |  > c(as.integer(.1), Sys.Date())
> | [1] 0 18649
> |  > ## whereas:
> |  > c(Sys.Date(), .1)
> | Error in as.Date.numeric(e) : 'origin' must be supplied
> |  > c(Sys.Date(), as.integer(.1))
> | Error in as.Date.numeric(e) : 'origin' must be supplied
> |  >
> |  > ## 2. Numeric instead of integer value
> |  > str(c(as.integer(.1), Sys.Date()))
> |   num [1:2] 0 18649  ## not integer
> |  >
> |
> 
>
> |
> |
> | I'm not sure if `c.Date` should be redefined, since there would probably be
> | many more classes to consider. However, the error message "'origin' must
> | be supplied" cannot be served by the user and appears to me like an
> | imperfection in the design.
> |
> | It would be desirable if `c.Date` harmonizes with the hierarchy stated
> | in `?c`: "NULL < raw < logical < integer < double < complex < character
> | < list < expression. [...] factors are treated only via their internal
> | integer codes" and behaves best like a factor (and also throws integer
> | values as in 2. above).
>

So I think the issue here is twofold. The first (fairly subtle) issue is
that "Date" is not a type, it's an (S3) class (which, in turn, is just
a labeling attribute, as illustrated by the fact that it's removed by
c.default).

> typeof(Sys.Date())

[1] "double"


So Date cannot appear anywhere in that hierarchy, because it is a hierarchy
of types.

The reason c(as.integer(1), Sys.Date()) gives a numeric is in fact because
of that type hierarchy: one of the elements is a "double" (the Date),
which precludes returning an integer.

The second issue is that S3 dispatch occurs only on the first element,
so you're hitting completely different methods with c(as.integer(1),
Sys.Date()) and c(Sys.Date(), as.integer(1)). So that is where your
non-commutativity is coming from. Personally, I think the case for
c(<Date>, stuff) to give you a Date object back is stronger than that of
commutativity, given the caveat that it would only be commutative if
c(<Date>, <character>) gave you a character back and
c(<Date>, <numeric>) gave you a numeric back, but I can see there's space
to disagree about that and argue for commutative absolutism.

Note though that there is unlikely to be a way to get c(<character>,
<Date>) to give you a Date back, because, again, S3 dispatch only
sees the first argument, so you would *at best* be in c.character (but,
as it stands now, in c.default, which explicitly strips classes).
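
A compact session sketch of that first-argument dispatch (values assume the
same 2021-01-22 session as your example):

> d <- Sys.Date()
> typeof(d)        # Date is a class label on a double
[1] "double"
> c(d, d + 1)      # Date first: c.Date keeps the class
[1] "2021-01-22" "2021-01-23"
> c(1, d)          # numeric first: c.default strips it
[1]     1 18649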

I do agree though that imho, the error from the latter could at least be
improved. Unfortunately, Date objects do not carry around their origin, so
we cannot do something such as "use the same origin as the Date object
which caused dispatch into this method for any non-Date objects", which was
going to be my suggestion here until I checked its feasibility.

Best,
~G

|
> | Or maybe disabling non-dates at all `if (!all(sapply(list(Sys.Date(),
> | .1), "class") == "Date")) stop...`, but this is a little beyond my
> | knowledge.
> |
> | Anyway, I hope my remark is of relevance and contributes to the
> | continuous development of our great programming language R!
>
> Nice analysis, well done.  Sadly it is also a "known feature" of the c()
> operator and documented as such -- S3 class attributes drop. C'est la vie.
> From ?c
>
>  ‘c’ is sometimes used for its side effect of removing attributes
>  except names, for example to turn an array into a vector.
>  ‘as.vector’ is a more intuitive way to do this, but also drops
>  names.  Note that methods other than the default are not required
>  to do this (and they will almost certainly preserve a class
>  attribute).
>
> I have into that trap approximately 4.56e8 times in this idiom
>
> > for (d in Sys.Date() + 0:2) print(d)
> [1] 18649
> [1] 18650
> [1] 18651
> >
>
> Eventually one learns to switch to an iterator, and to pick the dates from
> a
> vector preserving their class.
>
> Dirk
>
> --
> https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list

Re: [Rd] `merge()` not consistent in how it treats list columns

2021-01-02 Thread Gabriel Becker
Hi Antoine,


On Sat, Jan 2, 2021 at 11:16 AM Antoine Fabri 
wrote:

> Dear R-devel,
>
> When trying to merge 2 data frames by an "id" column, with this column a
> character in one of them, and a list of character in the other, merge
> behaves differently depending which is given first.
>
> Example :
>
> ```
> df1 <- data.frame(a=1)
> df2 <- data.frame(b=2)
> df1$id <- "ID"
> df2$id <- list("ID")
>
> # these print in a similar way, so the upcoming error will be hard to
> diagnose
> df1
> #>   a id
> #> 1 1 ID
> df2
> #>   b id
> #> 1 2 ID
>
> # especially as this works well, df2$id is treated as an atomic vector
> merge(df1, df2)
> #>   id a b
> #> 1 ID 1 2
>

Well, sure but that is because it happens to be a list with each element
having length one. In which case, it really should not have been a list at
all, and the fact that it was seems a deeper problem that should likely be
resolved instead of treating the symptom, in my opinion.

> df1 <- data.frame(a=1)
> df2 <- data.frame(b=2)
> df1$id <- "ID"
> df2$id <- list(c("ID", "ID2"))
> merge(df1, df2)
[1] id a  b
<0 rows> (or 0-length row.names)


That's probably not what you wanted it to do, right? Or maybe it is; it
depends. And therein lies the rub.

I have to be honest, as a developer, I really wish this, even in your
example case, threw an error. Anything else just looks to me like a
debugging nightmare looming in the wings waiting to strike.





> # But this fails with a cryptic error message
> merge(df2, df1)
> #> Error in sort.list(bx[m$xi]): 'x' must be atomic for 'sort.list', method
> "shell" and "quick"
> #> Have you called 'sort' on a list?
> ```
>
> I believe that if we let it work one way it should work the other, and that
> if it works neither an explicit error  mentioning how we can't join by list
> column would be helpful.
>

There's no reason (in principle) you wouldn't be able to join by a list
column; they should just both have to be list columns, in my ideal (but
admittedly unlikely) world. I'd rather the atomic-vector/list mismatch case
throw an error, myself.


Now I kind of doubt we can change the behavior that works now, but as Avi
points out, I think this is something that is complicated and case specific
enough that it really ought to be your job as the coder to take care of
what should happen when you try to merge on columns that are fundamentally
different types.


Plus, having an id column as a list, unless it was really explicitly
intentional, seems very likely to be a bug to me. (I mean an id column in
the sense that you want to use it to merge things, not the fact that it was
called "id", though admittedly those are likely to go together...

Best,
~G


>
> Many thanks and happy new year to all the R community,
>
> Antoine
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Ignore Sites Option For libPaths

2020-12-20 Thread Gabriel Becker
Hi all,

I had intended to do this sooner, but I have filed a wishlist entry, with
patch, for supporting this on bugzilla.
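
For the curious, the requested usage would look something like this (a
sketch; the argument name is illustrative, not necessarily what the patch
uses):

.libPaths(.libPaths()[1], include.site = FALSE)
.libPaths()   # user library plus R's own .Library only; .Library itself
              # can't be dropped, since that is where base lives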

Best,
~G

On Wed, Dec 9, 2020 at 10:48 AM Dirk Eddelbuettel  wrote:

>
> On 9 December 2020 at 09:49, Martin Maechler wrote:
> | Also, R allows the user to remove their own home directory, it
> | should also allow to get a .libPaths() which contains nothing compulsory
> | but R's own .Library {as only that can contain 'base' !}
>
> That would be a very nice-to-have feature! But right now, .libPaths() does
> now allow this per my reading of the help page:
>
>  ‘.libPaths’ is used for getting or setting the library trees that
>  R knows about (and hence uses when looking for packages).  If
>  called with argument ‘new’, the library search path is set to the
>  existing directories in ‘unique(c(new, .Library.site, .Library))’
>  and this is returned.  If given no argument, a character vector
>  with the currently active library trees is returned.
>
> Hence I was trying to help OP approximate the behaviour via the
> command-line
> but count me in as in terms of supporting this in R itself if you want to
> make such a change.
>
> Dirk
>
> --
> https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] quantile() names

2020-12-14 Thread Gabriel Becker
Hi Edgar,

I certainly don't think quantile(x, .975) should return 980, as that is
a completely wrong answer.

I do agree that the name is a bit off-putting. I'm not sure how deep in
the machinery you'd have to go to keep digits from affecting the names (I
don't have time to dig in right this second).

On the other hand, though, if we're going to make the names not respect
digits entirely, what do we do when someone does quantile(x, 1/3)? That'd
be a bad time had by all without digits coming to the rescue, I think.

Best,
~G

On Mon, Dec 14, 2020 at 11:55 AM Merkle, Edgar C. 
wrote:

> All,
>
> Consider the code below
>
> options(digits=2)
> x <- 1:1000
> quantile(x, .975)
>
> The value returned is 975 (the 97.5th percentile), but the name has been
> shortened to "98%" due to the digits option. Is this intended? I would have
> expected the name to also be "97.5%" here. Alternatively, the returned
> value might be 980 in order to match the name of "98%".
>
> Best,
> Ed
>
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: New pipe operator

2020-12-09 Thread Gabriel Becker
On Wed, Dec 9, 2020 at 8:26 AM Gabor Grothendieck 
wrote:

> On Wed, Dec 9, 2020 at 10:08 AM Duncan Murdoch 
> wrote:
> >
> > You might be interested in this blog post by Michael Barrowman:
> >
> > https://michaelbarrowman.co.uk/post/the-new-base-pipe/
> >
> > He does some timing comparisons, and the current R-devel implementations
> > of |> and \() do quite well.
>
> It does bring out that the requirement of using functions to get around the
> lack of placeholders is not free but exacts a small penalty in
> terms of performance (in addition to verbosity).
>

I mean, technically, yes, but even with that overhead it's 2 *orders of
magnitude* faster than the magrittr you're used to, and by the look of it
~3x faster than the new magrittr. And, those base pipe speeds are in
microseconds. You'd have to be running that pipeline thousands of times -
which people don't generally do with pipelines in the first place -  to see
a *5 millisecond* slowdown, which you would then happily fail to notice
completely because what your pipeline is actually doing takes so much
longer than those microseconds of the extra function call that it's unlikely
to be detectable at all.



The bizarro pipe supports placeholders and so doesn't require functions
> as a workaround and thus would presumably be even faster.  It is also
> perfectly consistent with the rest of R and requires no new syntax.
> You have to explicitly add a dot as the first argument but this seems a
> better
> compromise to me than those involved with |> .
>

I mean, I think the bizarro pipe was a pretty clever piece of work. I was
impressed by what John did there, but I don't really know what you're
suggesting here. As you say, the bizarro pipe works now without any changes
and you're welcome to use it if you prefer it to base's (proposed/likely)
|> and magrittr's %>%.
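
For anyone who hasn't seen it, the bizarro pipe is plain assignment syntax
chained with semicolons -- no new operator at all. A sketch:

> mtcars ->.; subset(., cyl == 4) ->.; nrow(.)
[1] 11

Each step right-assigns into a variable literally named '.', which then
serves as the placeholder in the next call.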

~G

>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Ignore Sites Option For libPaths

2020-12-08 Thread Gabriel Becker
Of course you can, but the ability to do something via R code and the
ability to do it by wrapping the invocation of R are not similar in terms
of convenience, IMO.

I say that as someone who routinely does both type of thing.

~G

On Tue, Dec 8, 2020 at 4:07 PM Dirk Eddelbuettel  wrote:

>
> On 8 December 2020 at 23:00, Dario Strbenac wrote:
> | Could .libPaths gain an option to ignore all values other than the
> user-specified new parameter? Currently, it takes the union of new and
> .Library and .Library.site and there is no way to turn it off.
>
> Are you sure? It is constructed from looking at environment variables you
> could set.
>
>   edd@rob:~$ R_LIBS="/tmp" R_LIBS_SITE="/var" Rscript -e
> 'print(.libPaths())'
>   [1] "/tmp"   "/var"   "/usr/lib/R/library"
>   edd@rob:~$
>
> Dirk
>
> --
> https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>


Re: [Rd] Ignore Sites Option For libPaths

2020-12-08 Thread Gabriel Becker
Hi Dario,

My switchr package is designed specifically to do what you're describing,
and does support excluding site libraries. So clearly I agree it would be
useful, but also, it does go against the "concept" of site libraries
somewhat.

Personally, I agree it would be a useful addition, with the default being
to include them. And the patch would be pretty easy to put together. I can
write one and add it to Bugzilla as a wishlist item, and we'll see what the
thoughts are (unless I hear an emphatic no here, in which case I won't
bother, or unless you'd like to take a crack at it yourself).

~G

On Tue, Dec 8, 2020 at 3:00 PM Dario Strbenac 
wrote:

> Good day,
>
> Could .libPaths gain an option to ignore all values other than the
> user-specified new parameter? Currently, it takes the union of new and
> .Library and .Library.site and there is no way to turn it off. For quick
> and convenient troubleshooting that doesn't involve requiring the editing
> of configuration files, it would be nice to be able to run
> .libPaths(.libPaths()[1], ignoreSiteFiles = TRUE) to limit to only one
> folder of R packages.
>
> > .libPaths()
> [1] "/dskh/nobackup/biostat/Rpackages/v4"
> "/usr/users/course/splus/library/R"
> [3] "/usr/lib/R/site-library" "/usr/lib/R/library"
> > .libPaths(.libPaths()[1]) # No option to ignore system-wide folders.
> > .libPaths() # Paths are same as before.
> [1] "/dskh/nobackup/biostat/Rpackages/v4"
> "/usr/users/course/splus/library/R"
> [3] "/usr/lib/R/site-library" "/usr/lib/R/library"
>
> --
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia


Re: [Rd] New pipe operator

2020-12-07 Thread Gabriel Becker
Hi Dénes,

On Mon, Dec 7, 2020 at 2:52 PM Dénes Tóth  wrote:

>
>
> This gave me the idea that naming the arguments can be used to skip the
> placeholder issue:
>
> "funny" |> sub(pattern = "f", replacement = "b")
>
> Of course this breaks if the maintainer changes the order of the
> function arguments (which is not a nice practice but happens).
>

This is true, but only if you explicitly name all the arguments that appear
before the one you want. In practice that may often be the case, but I
don't really have a strong intuition about it as a non-pipe user. It would
require zero changes to the pipe from the R core team, though, so in that
sense it could be a solution in the cases where it works. It does make the
code subtler to read, which is a pretty big downside, imho.
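
To make the mechanics concrete, the insertion is observable at parse time
(in any R that has |>):

quote("funny" |> sub(pattern = "f", replacement = "b"))
#> sub("funny", pattern = "f", replacement = "b")

The LHS lands in the first *positional* slot; because pattern and
replacement are matched by name, standard argument matching then assigns
"funny" to x, sub()'s third formal argument.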


> An option could be to allow for missing argument in the first position,
> but this might add further undesired complexity, so probably not worth
> the effort:
>
> "funny" |> sub(x =, "f", "b")
>
> So basically the parsing rule would be:
>
> LHS |> RHS(arg=, ...) -> RHS(arg=LHS, ...)
>

The problem here is that it's ambiguous: myfun(x, y=, z) is already
syntactically valid, so this would silently change the meaning of code that
parses today, and would break existing, syntactically valid (though
hopefully quite rare) code in the pipe context.
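
A quick way to see that ambiguity, since empty arguments already parse
today (myfun here is just an arbitrary symbol inside quote()):

## legal syntax right now: y gets the "missing" value, the same mechanism
## behind m[, 1] and switch() fall-through
e <- quote(myfun(x, y = , z))
length(e)
#> [1] 4
e
#> myfun(x, y = , z)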

~G


>
> >
> > (Assuming we could get the parser to handle |^bla^> correctly)
> >
> > For argument position issues that would be sufficient. For more complicated
> > expressions, e.g., those that would use the placeholder multiple times or
> > inside compound expressions, requiring anonymous functions seems quite
> > reasonable to me. And honestly, while I kind of like it, I'm not sure if
> > that "stuffed pipe" expression (assuming we could get the parser to
> capture
> > it correctly) reads to me as nicer than the following, anyway.
> >
> > LHS |> \(x) RHS(arg1 = 5, pipearg = x, arg3 = 7)
> >
> > ~G
> >
> >>
> >> I also agree usages of the `.` placeholder can make the code more
> >> challenging to read, since understanding the behavior of a piped
> >> expression then requires scouring the RHS for usages of `.`, which can
> >> be challenging in dense code. Piping to an anonymous function makes
> >> the intent clear to the reader: the programmer is likely piping to an
> >> anonymous function because they care where the argument is used in the
> >> call, and so the reader of code should be aware of that.
> >>
> >> Best,
> >> Kevin
> >>
> >>
> >>
> >> On Mon, Dec 7, 2020 at 10:35 AM Gabor Grothendieck
> >>  wrote:
> >>>
> >>> On Mon, Dec 7, 2020 at 12:54 PM Duncan Murdoch <
> murdoch.dun...@gmail.com>
> >> wrote:
>  An advantage of the current implementation is that it's simple and
> easy
>  to understand.  Once you make it a user-modifiable binary operator,
>  things will go kind of nuts.
> 
>  For example, I doubt if there are many users of magrittr's pipe who
>  really understand its subtleties, e.g. the example in Luke's paper
> >> where
>  1 %>% c(., 2) gives c(1,2), but 1 %>% c(c(.), 2) gives c(1, 1, 2).
> (And
>  I could add 1 %>% c(c(.), 2, .) and  1 %>% c(c(.), 2, . + 2)  to
>  continue the fun.)
> >>>
> >>> The rule is not so complicated.  Automatic insertion is done unless
> >>> you use dot in the top level function or if you surround it with
> >>> {...}.  It really makes sense since if you use gsub(pattern,
> >>> replacement, .) then surely you don't want automatic insertion and if
> >>> you surround it with { ... } then you are explicitly telling it not
> >>> to.
> >>>
> >>> Assuming the existence of placeholders a possible simplification would
> >>> be to NOT do automatic insertion if { ... } is used and to use it
> >>> otherwise although personally having used it for some time I find the
> >>> existing rule in magrittr generally does what you want.
> >>>


Re: [Rd] New pipe operator

2020-12-07 Thread Gabriel Becker
On Mon, Dec 7, 2020 at 11:05 AM Kevin Ushey  wrote:

> IMHO the use of anonymous functions is a very clean solution to the
> placeholder problem, and the shorthand lambda syntax makes it much
> more ergonomic to use. Pipe implementations that crawl the RHS for
> usages of `.` are going to be more expensive than the alternatives. It
> is nice that the `|>` operator is effectively the same as a regular R
> function call, and given the identical semantics could then also be
> reasoned about the same way regular R function calls are.
>

I agree. That said, one thing that could maybe be done, though I'm not
super convinced it's needed, is to make a "curry-stuffed pipe", where
something like

LHS |^pipearg^> RHS(arg1 = 5, arg3 = 7)

Would parse to

RHS(pipearg = LHS, arg1 = 5, arg3 = 7)


(Assuming we could get the parser to handle |^bla^> correctly)

For argument position issues that would be sufficient. For more complicated
expressions, e.g. those that would use the placeholder multiple times or
inside compound expressions, requiring anonymous functions seems quite
reasonable to me. And honestly, while I kind of like it, I'm not sure that
"stuffed pipe" expression (assuming we could get the parser to capture it
correctly) reads as nicer than the following, anyway.

LHS |> \(x) RHS(arg1 = 5, pipearg = x, arg3 = 7)

~G

>
> I also agree usages of the `.` placeholder can make the code more
> challenging to read, since understanding the behavior of a piped
> expression then requires scouring the RHS for usages of `.`, which can
> be challenging in dense code. Piping to an anonymous function makes
> the intent clear to the reader: the programmer is likely piping to an
> anonymous function because they care where the argument is used in the
> call, and so the reader of code should be aware of that.
>
> Best,
> Kevin
>
>
>
> On Mon, Dec 7, 2020 at 10:35 AM Gabor Grothendieck
>  wrote:
> >
> > On Mon, Dec 7, 2020 at 12:54 PM Duncan Murdoch 
> wrote:
> > > An advantage of the current implementation is that it's simple and easy
> > > to understand.  Once you make it a user-modifiable binary operator,
> > > things will go kind of nuts.
> > >
> > > For example, I doubt if there are many users of magrittr's pipe who
> > > really understand its subtleties, e.g. the example in Luke's paper
> where
> > > 1 %>% c(., 2) gives c(1,2), but 1 %>% c(c(.), 2) gives c(1, 1, 2). (And
> > > I could add 1 %>% c(c(.), 2, .) and  1 %>% c(c(.), 2, . + 2)  to
> > > continue the fun.)
> >
> > The rule is not so complicated.  Automatic insertion is done unless
> > you use dot in the top level function or if you surround it with
> > {...}.  It really makes sense since if you use gsub(pattern,
> > replacement, .) then surely you don't want automatic insertion and if
> > you surround it with { ... } then you are explicitly telling it not
> > to.
> >
> > Assuming the existence of placeholders a possible simplification would
> > be to NOT do automatic insertion if { ... } is used and to use it
> > otherwise although personally having used it for some time I find the
> > existing rule in magrittr generally does what you want.
> >


Re: [Rd] New pipe operator

2020-12-07 Thread Gabriel Becker
On Mon, Dec 7, 2020 at 10:35 AM Gabor Grothendieck 
wrote:

> On Mon, Dec 7, 2020 at 12:54 PM Duncan Murdoch 
> wrote:
> > An advantage of the current implementation is that it's simple and easy
> > to understand.  Once you make it a user-modifiable binary operator,
> > things will go kind of nuts.
> >
> > For example, I doubt if there are many users of magrittr's pipe who
> > really understand its subtleties, e.g. the example in Luke's paper where
> > 1 %>% c(., 2) gives c(1,2), but 1 %>% c(c(.), 2) gives c(1, 1, 2). (And
> > I could add 1 %>% c(c(.), 2, .) and  1 %>% c(c(.), 2, . + 2)  to
> > continue the fun.)
>
> The rule is not so complicated.  Automatic insertion is done unless
> you use dot in the top level function or if you surround it with
> {...}.  It really makes sense since if you use gsub(pattern,
> replacement, .) then surely you don't want automatic insertion and if
> you surround it with { ... } then you are explicitly telling it not
> to.
>
>
This is the point that I believe Duncan is trying to make (and I agree
with) though. Consider the question "after piping LHS into RHS, what is the
first argument in the resulting call?".

For the base pipe, the answer, completely unambiguously, is LHS. Full stop.
That is easy to understand.

For magrittr the answer is "Well, it depends, let me see your RHS
expression, is it wrapped in braces? If not, are you using the placeholder?
If you are using the placeholder, where/how are you using it?".

That is inherently much more complicated. Yes, you understand how the
magrittr pipe behaves, and yes, you find it very convenient. That's great,
but neither of those things equates to simplicity. They just mean that you,
a very experienced pipe user, carry around the cognitive load necessary to
have that understanding.

More concretely, the current base pipe is extremely simple; all it does is:

   1. Figure out the RHS expression call.
      a. If the RHS is an anonymous function declaration, construct a call
         to it to use as the new RHS.
   2. Insert the LHS expression into the first argument position of the RHS
      call expression.

Done. And step (1a) would go away if anonymous functions required () after
them, which would be consistent, and even simpler, but kind of annoying. I
think it is a good compromise which is guaranteed to be safe, because
anonymous functions are something the parser recognizes. Either way, if
that were dropped, what |> does would be *entirely* trivial to understand
and explain, with a single sentence.
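
You can watch that transformation happen, with no evaluation at all, in any
R that has |>:

quote(mtcars |> subset(cyl == 4) |> head(2))
#> head(subset(mtcars, cyl == 4), 2)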

I had the equivalent pseudocode for the magrittr pipe written out here, but
it honestly felt like overkill that came across as mean, so I'll leave that
as an exercise for interested readers.

~G

> Assuming the existence of placeholders a possible simplification would
> be to NOT do automatic insertion if { ... } is used and to use it
> otherwise although personally having used it for some time I find the
> existing rule in magrittr generally does what you want.
>


Re: [Rd] New pipe operator

2020-12-06 Thread Gabriel Becker
Hi Gabor,

On Sun, Dec 6, 2020 at 3:22 PM Gabor Grothendieck 
wrote:

> I understand very well that it is implemented at the syntax level;
> however, in any case the implementation is irrelevant to the principles.
>
> Here a similar example to the one I gave before but this time written out:
>
> This works:
>
>   3 |> function(x) x + 1
>
> but this does not:
>
>   foo <- function(x) x + 1
>   3 |> foo
>
> so it breaks the principle of functions being first class objects.  foo
> and its
> definition are not interchangeable.


I understood what you meant as well.

The issue is that neither foo nor its definition is being operated on, or
even exists within the scope of what |> is defined to do. You are used to
magrittr's %>%, where arguably what you are saying would be true. But it's
not here, in my view.

Again, I think the issue is that |>, in as much as it "operates" on
anything at all (it not being a function, regardless of appearances),
operates on call expression objects, NOT on functions, ever.

function(x) x *parses to a call expression*, as does RHSfun(), while RHSfun
does not; it parses to a name, *regardless of whether that symbol will
eventually evaluate to a closure or not*.

So in fact, it seems to me that, technically, all name symbols are being
treated exactly the same (none are allowed, including those which will look
up to functions during evaluation), while all* call expressions are also
being treated the same. And again, there are no functions anywhere in
either case.

* except those that the parser flags as syntactically special.
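
The distinction is easy to check from a session, since quote() hands back
exactly what the parser built:

class(quote(RHSfun))        #> "name" -- a bare symbol
class(quote(RHSfun()))      #> "call"
class(quote(function(x) x)) #> "call" -- a call to `function`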


> You have
> to write 3 |> foo() but don't have to write 3 |> (function(x) x + 1)().
>

I think you should probably be careful what you wish for here. I'm not
involved with this work and do not speak for any of those who were, but the
principled way to make that consistent while remaining entirely in the
parser seems very likely to be to require the latter, rather than not
require the former.


> This isn't just a matter of notation, i.e. foo vs foo(), but is a
> matter of breaking
> the way R works as a functional language with first class functions.
>

I don't agree. Consider `+`.

Having

foo <- get("+") ## note: no backticks here
foo(x, y)

parse and work correctly, while

+(x, y)

does not, does not mean + isn't a function or that it is a "second class
citizen"; it simply means that the parser has constraints on the syntax for
writing code that calls it, constraints that calls to other functions are
not subject to. The fact that such *syntactic* constraints can exist proves
that there is not some overarching inviolable principle being violated
here, I think. Now you may say "well that's just the parser, it has to
parse + specially because it's an operator with specific precedence etc".
Well, exactly the same thing is true of |>, I think.
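
The backtick escape hatch shows this is purely a parser-level constraint:

foo <- get("+")
foo(1, 2)  #> 3
`+`(1, 2)  #> 3, backticks satisfy the parser
## +(1, 2) is a syntax error: "(1, 2)" cannot be parsed as a call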

Best,
~G

>
> On Sun, Dec 6, 2020 at 4:06 PM Gabriel Becker 
> wrote:
> >
> > Hi Gabor,
> >
> > On Sun, Dec 6, 2020 at 12:52 PM Gabor Grothendieck <
> ggrothendi...@gmail.com> wrote:
> >>
> >> I think the real issue here is that functions are supposed to be
> >> first class objects in R
> >> or are supposed to be and |> would break that if it is possible
> >> to write function(x) x + 1 on the RHS but not foo (assuming foo
> >> was defined as that function).
> >>
> >> I don't think getting experience with using it can change that
> >> inconsistency which seems serious to me and needs to
> >> be addressed even if it complicates the implementation
> >> since it drives to the heart of what R is.
> >>
> >
> > With respect I think this is a misunderstanding of what is happening
> here.
> >
> > Functions are first class citizens. |> is, for all intents and purposes,
> a macro.
> >
> > LHS |> RHS(arg2=5)
> >
> > parses to
> >
> > RHS(LHS, arg2 = 5)
> >
> > There are no functions at the point in time when the pipe transformation
> happens, because no code has been evaluated. To know if a symbol is going
> to evaluate to a function requires evaluation which is a step entirely
> after the one where the |> pipe is implemented.
> >
> > Another way to think about it is that
> >
> > LHS |> RHS(arg2 = 5)
> >
> > is another way of writing RHS(LHS, arg2 = 5), NOT R code that is (or
> even can be) evaluated.
> >
> >
> > Now this is a subtle point that only really has implications in as much
> as it is not the case for magrittr pipes, but it's relevant for discussions
> like this, I think.
> >
> > ~G
> >
> >> On Sat, Dec 5, 2020 at 1:08 PM Gabor Grothendieck
> >>  wrote:
> >

Re: [Rd] New pipe operator

2020-12-06 Thread Gabriel Becker
Hi Gabor,

On Sun, Dec 6, 2020 at 12:52 PM Gabor Grothendieck 
wrote:

> I think the real issue here is that functions are supposed to be
> first class objects in R
> or are supposed to be and |> would break that if it is possible
> to write function(x) x + 1 on the RHS but not foo (assuming foo
> was defined as that function).
>
> I don't think getting experience with using it can change that
> inconsistency which seems serious to me and needs to
> be addressed even if it complicates the implementation
> since it drives to the heart of what R is.
>
>
With respect I think this is a misunderstanding of what is happening here.

Functions are first class citizens. |> is, for all intents and purposes, a
*macro*.

LHS |> RHS(arg2=5)

*parses to*

RHS(LHS, arg2 = 5)

There are no functions at the point in time when the pipe transformation
happens, because no code has been evaluated. To know if a symbol is going
to evaluate to a function requires evaluation which is a step entirely
after the one where the |> pipe is implemented.

Another way to think about it is that

LHS |> RHS(arg2 = 5)

is another way of *writing* RHS(LHS, arg2 = 5), NOT R code that is (or even
can be) evaluated.


Now this is a subtle point that only really has implications in as much as
it is not the case for magrittr pipes, but it's relevant for discussions
like this, I think.

~G

> On Sat, Dec 5, 2020 at 1:08 PM Gabor Grothendieck  wrote:
> >
> > The construct utils::head  is not that common but bare functions are
> > very common and to make it harder to use the common case so that
> > the uncommon case is slightly easier is not desirable.
> >
> > Also it is trivial to write this which does work:
> >
> > mtcars %>% (utils::head)
> >
> > On Sat, Dec 5, 2020 at 11:59 AM Hugh Parsonage 
> wrote:
> > >
> > > I'm surprised by the aversion to
> > >
> > > mtcars |> nrow
> > >
> > > over
> > >
> > > mtcars |> nrow()
> > >
> > > and I think the decision to disallow the former should be
> > > reconsidered.  The pipe operator is only going to be used when the rhs
> > > is a function, so there is no ambiguity with omitting the parentheses.
> > > If it's disallowed, it becomes inconsistent with other treatments like
> > > sapply(mtcars, typeof) where sapply(mtcars, typeof()) would just be
> > > noise.  I'm not sure why this decision was taken
> > >
> > > If the only issue is with the double (and triple) colon operator, then
> > > ideally `mtcars |> base::head` should resolve to `base::head(mtcars)`
> > > -- in other words, demote the precedence of |>
> > >
> > > Obviously (looking at the R-Syntax branch) this decision was
> > > considered, put into place, then dropped, but I can't see why
> > > precisely.
> > >
> > > Best,
> > >
> > >
> > > Hugh.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Sat, 5 Dec 2020 at 04:07, Deepayan Sarkar <
> deepayan.sar...@gmail.com> wrote:
> > > >
> > > > On Fri, Dec 4, 2020 at 7:35 PM Duncan Murdoch <
> murdoch.dun...@gmail.com> wrote:
> > > > >
> > > > > On 04/12/2020 8:13 a.m., Hiroaki Yutani wrote:
> > > > > >>   Error: function '::' not supported in RHS call of a pipe
> > > > > >
> > > > > > To me, this error looks much more friendly than magrittr's error.
> > > > > > Some of them got too used to specify functions without (). This
> > > > > > is OK until they use `::`, but when they need to use it, it takes
> > > > > > hours to figure out why
> > > > > >
> > > > > > mtcars %>% base::head
> > > > > > #> Error in .::base : unused argument (head)
> > > > > >
> > > > > > won't work but
> > > > > >
> > > > > > mtcars %>% head
> > > > > >
> > > > > > works. I think this is a too harsh lesson for ordinary R users to
> > > > > > learn `::` is a function. I've been wanting for magrittr to drop
> the
> > > > > > support for a function name without () to avoid this confusion,
> > > > > > so I would very much welcome the new pipe operator's behavior.
> > > > > > Thank you all the developers who implemented this!
> > > > >
> > > > > I agree, it's an improvement on the corresponding magrittr error.
> > > > >
> > > > > I think the semantics of not evaluating the RHS, but treating the
> pipe
> > > > > as purely syntactical is a good decision.
> > > > >
> > > > > I'm not sure I like the recommended way to pipe into a particular
> argument:
> > > > >
> > > > >mtcars |> subset(cyl == 4) |> \(d) lm(mpg ~ disp, data = d)
> > > > >
> > > > > or
> > > > >
> > > > >mtcars |> subset(cyl == 4) |> function(d) lm(mpg ~ disp, data =
> d)
> > > > >
> > > > > both of which are equivalent to
> > > > >
> > > > >mtcars |> subset(cyl == 4) |> (function(d) lm(mpg ~ disp, data
> = d))()
> > > > >
> > > > > It's tempting to suggest it should allow something like
> > > > >
> > > > >mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = .)
> > > >
> > > > Which is really not that far off from
> > > >
> > > > mtcars |> subset(cyl == 4) |> \(.) lm(mpg ~ disp, data = .)
> > > >
> > > > once you get used to it.
> > > >
> > > > One 

Re: [Rd] [External] Re: New pipe operator

2020-12-06 Thread Gabriel Becker
Hi Dénes,

On Sun, Dec 6, 2020 at 6:43 AM Dénes Tóth  wrote:

> Dear Luke,
>
> In the meantime I checked the R-syntax branch and the docs; they are
> very helpful. I would also like to thank you for putting effort into
> this feature. Keeping it at the syntax level is also a very smart
> decision. However, the current API might not exploit the full power of
> the basic idea.
>
> 1) Requiring either an anonymous function or a function call, but not
> allowing for symbols which point to functions is inconsistent and will
> be misleading for non-experts.
>
> foo <- function(x) x
> identical(foo, function(x) x)
>
> mtcars |> foo   #bang!
> mtcars |> function(x) x #fine?
>
> You stated in :
> "
> Another variation supported by the implementation is that a symbol on
> the RHS is interpreted as the name of a function to call with the LHS
> as argument:
>
> ```r
>  > quote(x |> f)
> f(x)
> ```
> "
>
> So clearly this is not an implementation issue but a design decision.
>
> As a remedy, two different pipe operators could be introduced:
>
> LHS |> RHS-> RHS is treated as a function call
> LHS |>> RHS   -> RHS is treated as a function
>
> If |>> is used, it would not matter which notation is used for the RHS
> expression; the parser would assume it evaluates to a function.
>

I think multiplying the operators would not be a net positive. You'd then
have to remember and mix them whenever you mix anonymous functions and
non-anonymous functions. It would result in

LHS |> RHS1() |>> \(x,y) blablabla |> RHS3()

I think that's too much intricacy. Better to be a little more restrictive,
in a way that (honestly doesn't really hurt anything afaics, and)
guarantees consistency.

>
> 2) Simplified lambda expression:
> IMHO in the vast majority of use cases, this is used for single-argument
> functions, so parenthesis would not be required. Hence, both forms would
> be valid and equivalent:
>
> \x x + 1
> \(x) x + 1
>
>
Why special-case something here when sometimes you'll want more than one
argument? The parentheses really seem like not a big deal, so I don't
understand the motivation here, if I'm being honest.


>
> 3) Function composition:
> Allowing for concise composition of functions would be a great feature.
> E.g., instead of
>
> foo <- function(x) print(mean(sqrt(x), na.rm = TRUE), digits = 2)
>
> or
>
> foo <- \x {x |> sqrt() |> mean(na.rm = TRUE) |> print(digits = 2)}
>
> one could write
>
> foo <- \x |> sqrt() |> mean(na.rm = TRUE) |> print(digits = 2)
>
> So basically if the lambda argument is followed by a pipe operator, the
> pipe chain is transformed to a function body where the first lambda
> argument is inserted into the first position of the pipeline.
>

This one I disagree with very strongly. Reading pipelines would suddenly
require a *much* higher cognitive load than before, because you would have
to model that complexity just to read one and know what it says. The
brackets there seem like an extremely low price to pay to avoid that.
Operator precedence should be simple and easily predictable.


>
>
> Best,
> Denes
>
>
> On 12/5/20 7:10 PM, luke-tier...@uiowa.edu wrote:
> > We went back and forth on this several times. The key advantage of
> > requiring parentheses is to keep things simple and consistent.  Let's
> > get some experience with that. If experience shows requiring
> > parentheses creates too many issues then we can add the option of
> > dropping them later (with special handling of :: and :::). It's easier
> > to add flexibility and complexity than to restrict it after the fact.
> >
> > Best,
> >
> > luke
> >
> > On Sat, 5 Dec 2020, Hugh Parsonage wrote:
> >
> >> I'm surprised by the aversion to
> >>
> >> mtcars |> nrow
> >>
> >> over
> >>
> >> mtcars |> nrow()
> >>
> >> and I think the decision to disallow the former should be
> >> reconsidered.  The pipe operator is only going to be used when the rhs
> >> is a function, so there is no ambiguity with omitting the parentheses.
> >> If it's disallowed, it becomes inconsistent with other treatments like
> >> sapply(mtcars, typeof) where sapply(mtcars, typeof()) would just be
> >> noise.  I'm not sure why this decision was taken
> >>
> >> If the only issue is with the double (and triple) colon operator, then
> >> ideally `mtcars |> base::head` should resolve to `base::head(mtcars)`
> >> -- in other words, demote the precedence of |>
> >>
> >> Obviously (looking at the R-Syntax branch) this decision was
> >> considered, put into place, then dropped, but I can't see why
> >> precisely.
> >>
> >> Best,
> >>
> >>
> >> Hugh.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Sat, 5 Dec 2020 at 04:07, Deepayan Sarkar
> >>  wrote:
> >>>
> >>> On Fri, Dec 4, 2020 at 7:35 PM Duncan Murdoch
> >>>  wrote:
> 
>  On 04/12/2020 8:13 a.m., Hiroaki Yutani wrote:
> >>   Error: function '::' not supported in RHS call of a pipe
> >
> > To me, this error looks much more friendly than magrittr's error.
> > Some of them got too used to specify 

Re: [Rd] return (x+1) * 1000

2020-11-20 Thread Gabriel Becker
And the related:

> f = function() stop(return("lol"))
> f()
[1] "lol"

I have a feeling all of this is just return() performing correctly, though.
If there are already R CMD check checks for this kind of thing (I wasn't
sure, but I'm hearing from others that there may be/are), those may be
(and/or may need to be) sufficient.

~G

On Fri, Nov 20, 2020 at 3:27 PM Dénes Tóth  wrote:

> Or even more illustratively:
>
> uneval_after_return <- function(x) {
>return(x) * stop("Not evaluated")
> }
> uneval_after_return(1)
> # [1] 1
>
> On 11/20/20 10:12 PM, Mateo Obregón wrote:
> > Dear r-developers-
> >
> > After many years of using and coding in R and other languages, I came
> across
> > something that I think should be flagged by the parser:
> >
> > bug <- function (x) {
> >   return (x + 1) * 1000
> > }
> >> bug(1)
> > [1] 2
> >
> > The return() call is not like any other function call that returns a
> value to
> > the point where it was called from. I think this should
> straightforwardly be
> > handled in the parser by flagging it as a syntactic error.
> >
> > Thoughts?
> >
> > Mateo.
> > --
> > Mateo Obregón.
> >


Re: [Rd] return (x+1) * 1000

2020-11-20 Thread Gabriel Becker
Hi all,

I can confirm this occurs for me as well.

The one thing that comes to mind is that there are certain larger
expressions that contain calls to return() which we absolutely don't want
to be an error, e.g.

if (somestuff)
    return(TRUE)


That said, the actual expression Mateo pointed out certainly does look like
an error (it definitely isn't going to do what the developer intended).

I haven't looked at the parser much, to be honest. I assume there is
perhaps enough differentiation of if/else that return() could be allowed
within that but not inside a larger expression without it?

There would be things that are legal (though horrifying) now that would
stop working though, such as:

f = function(a) {
    ret = switch(a,
                 "1" = return("haha got 1!"),
                 "2" = "regular ole 2")
    ret
}


Whether it would be a problem or not that such insanity wouldn't work is
less clear. Are there valid non-if embedded return() cases that are
important to allow? If so (and if they're not differentiated by the parser,
which I somewhat doubt switch is, for example, though I'm not certain), I'm
skeptical we'd be able to do as he suggests.

It does seem worth considering, though. If it can't be a hard parse error
but we agree many/most cases are problematic, perhaps adding detection of
this to the static checks that R CMD check performs is another way forward.
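
For concreteness, here is a rough sketch of what such a static check could
look like: walk parsed code and flag return() calls whose immediate parent
is not a control-flow or grouping construct. This is purely an illustration
of the idea, not how R CMD check actually implements anything:

flag_return <- function(expr) {
    ## heads under which a return() call is an ordinary statement
    safe_heads <- c("{", "(", "if", "for", "while", "repeat",
                    "function", "<-", "=")
    walk <- function(e, parent_head) {
        if (!is.call(e)) return(invisible(NULL))
        head <- deparse1(e[[1L]])
        if (head == "return" && !(parent_head %in% safe_heads))
            message("return() embedded in a `", parent_head, "` call: ",
                    deparse1(e))
        for (i in seq_along(e)[-1L]) walk(e[[i]], head)
        invisible(NULL)
    }
    walk(expr, "{")
}

flag_return(quote(function(x) { return(x + 1) * 1000 }))
#> return() embedded in a `*` call: return(x + 1)

A real check would need its whitelist tuned against false positives, with
switch() being exactly the hard case mentioned above.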

Best,
~G

On Fri, Nov 20, 2020 at 1:34 PM Mateo Obregón 
wrote:

> Dear r-developers-
>
> After many years of using and coding in R and other languages, I came
> across
> something that I think should be flagged by the parser:
>
> bug <- function (x) {
>  return (x + 1) * 1000
> }
> > bug(1)
> [1] 2
>
> The return() call is not like any other function call that returns a value
> to
> the point where it was called from. I think this should straightforwardly
> be
> handled in the parser by flagging it as a syntactic error.
>
> Thoughts?
>
> Mateo.
> --
> Mateo Obregón.
>


Re: [Rd] [External] exists, get and get0 accept silently inputs of length > 1

2020-11-16 Thread Gabriel Becker
Hi Luke et al.,

Apologies. I knew there was no NSE before but incorrectly inferred from the
previous message that some had been added. Should have looked at the commit
myself before chiming in. Sorry for the noise.

~G

On Mon, Nov 16, 2020 at 8:39 PM  wrote:

> Come on, folks. There is no NSE involved in calls to get(): it's
> standard evaluation all the way into the C code. Prior to the change a
> first argument that is anything other than a character vector would
> produce an error. After the change, passing in a symbol will do the
> obvious thing. Code that worked previously without error (i.e. called
> get() with string values) will continue to work exactly as it did
> before.
>
> It's a little more convenient and a little more efficient for some
> computations on the language not to have to call as.character on
> symbols before passing them to get(). Hence the change expanding the
> domain of get().
>
> luke
>
> On Tue, 17 Nov 2020, Gabriel Becker wrote:
>
> > Hi all,
> > I have used variable values in get() as well, including, I think, in
> > package code (though pretty infrequently).
> > Perhaps a character.only argument similar to library?
> >
> > ~G
> >
> > On Mon, Nov 16, 2020 at 5:31 PM Hugh Parsonage  >
> > wrote:
> >   I noticed the recent commit to R-dev (r79434).  Is this wise?
> >   I've
> >   often used get() in constructions like
> >
> >   for (j in ls()) if (is.numeric(x <- get(j))) ...
> >
> >   (and often interactively, rather than in a package)
> >
> >   Am I to understand that get(j) will now be equivalent to `j`
> >   even if j
> >   is a string referring putatively to another object?
> >
> >   On Sat, 14 Nov 2020 at 01:34,  wrote:
> >   >
> >   > Worth looking into. It would probably cause some check
> >   failures, so
> >   > would probably be a good idea to run a check across
> >   BIOC/CRAN.  At the
> >   > same time it would be worth allowing name objects (type
> >   "symbol") so
> >   > thee don't have to be converted to character for the call and
> >   then
> >   > back to names internally for the environment lookup.
> >   >
> >   > Best,
> >   >
> >   > luke
> >   >
> >   > On Fri, 13 Nov 2020, Antoine Fabri wrote:
> >   >
> >   > > Dear R-devel,
> >   > >
> >   > > The doc of exists, get and get0 is unambiguous, x should be
> >   an object given
> >   > > as a character string. However these accept longer inputs.
> >   It can lead an
> >   > > uncareful user to think these functions are vectorized when
> >   they're not,
> >   > > and generally lets through bugs that one might have
> >   preferred to trigger
> >   > > earlier failure.
> >   > >
> >   > > ``` r
> >   > > exists("d")
> >   > > #> [1] FALSE
> >   > > exists(c("c", "d"))
> >   > > #> [1] TRUE
> >   > > get(c("c", "d"))
> >   > > #> function (...)  .Primitive("c")
> >   > > get0(c("c", "d"))
> >   > > #> function (...)  .Primitive("c")
> >   > > ```
> >   > >
> >   > > I believe these should either fail, or be vectorized,
> >   probably the former.
> >   > >
> >   > > Thanks,
> >   > >
> >   > > Antoine
> >   > >
> >   >
> >   > --
> >   > Luke Tierney
> >   > Ralph E. Wareham Professor of Mathematical Sciences
> >   > University of Iowa  Phone:
> >319-335-3386
> >   > Department of Statistics andFax:
> >319-335-3017
> >   > Actuarial Science
> >   > 241 Schaeffer Hall  email:
> >luke-tier...@uiowa.edu
> >   > Iowa City, IA 52242 WWW:
> >   http://www.stat.uiowa.edu
> >   >
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa  Phone: 319-335-3386
> Department of Statistics andFax:   319-335-3017
> Actuarial Science
> 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
> Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu



Re: [Rd] [External] exists, get and get0 accept silently inputs of length > 1

2020-11-16 Thread Gabriel Becker
Hi all,

I have used variable values in get() as well, including, I think, in
package code (though pretty infrequently).

Perhaps a character.only argument similar to library?
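
Something along these lines, sketched as user code purely to illustrate the
shape of the idea (get_strict and its semantics are hypothetical, not a
real or proposed API):

## hypothetical illustration only
get_strict <- function(x, envir = parent.frame(), character.only = TRUE) {
    if (!character.only)
        x <- deparse1(substitute(x))  # accept a bare symbol
    stopifnot(is.character(x), length(x) == 1L)
    get(x, envir = envir)
}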

~G

On Mon, Nov 16, 2020 at 5:31 PM Hugh Parsonage 
wrote:

> I noticed the recent commit to R-dev (r79434).  Is this wise? I've
> often used get() in constructions like
>
> for (j in ls()) if (is.numeric(x <- get(j))) ...
>
> (and often interactively, rather than in a package)
>
> Am I to understand that get(j) will now be equivalent to `j` even if j
> is a string referring putatively to another object?
>
> On Sat, 14 Nov 2020 at 01:34,  wrote:
> >
> > Worth looking into. It would probably cause some check failures, so
> > would probably be a good idea to run a check across BIOC/CRAN.  At the
> > same time it would be worth allowing name objects (type "symbol") so
> > thee don't have to be converted to character for the call and then
> > back to names internally for the environment lookup.
> >
> > Best,
> >
> > luke
> >
> > On Fri, 13 Nov 2020, Antoine Fabri wrote:
> >
> > > Dear R-devel,
> > >
> > > The doc of exists, get and get0 is unambiguous, x should be an object
> given
> > > as a character string. However these accept longer inputs. It can lead
> an
> > > uncareful user to think these functions are vectorized when they're
> not,
> > > and generally lets through bugs that one might have preferred to
> trigger
> > > earlier failure.
> > >
> > > ``` r
> > > exists("d")
> > > #> [1] FALSE
> > > exists(c("c", "d"))
> > > #> [1] TRUE
> > > get(c("c", "d"))
> > > #> function (...)  .Primitive("c")
> > > get0(c("c", "d"))
> > > #> function (...)  .Primitive("c")
> > > ```
> > >
> > > I believe these should either fail, or be vectorized, probably the
> former.
> > >
> > > Thanks,
> > >
> > > Antoine
> > >
> >
> > --
> > Luke Tierney
> > Ralph E. Wareham Professor of Mathematical Sciences
> > University of Iowa  Phone: 319-335-3386
> > Department of Statistics andFax:   319-335-3017
> > Actuarial Science
> > 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
> > Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
> >


Re: [Rd] sum() (and similar methods) should work for zero row data.frames

2020-10-18 Thread Gabriel Becker
Peter et al,

I had the same thought, in particular for any() and all(), which, in as
much as they should work on data.frames in the first place (and to be
perfectly honest I do find that quite debatable myself), should certainly
work on "logical" data.frames if they are going to work on "numeric" ones.

I can volunteer to prepare a patch if Martin (the reporter) does not want
to take a crack at it, and further if it is not already being done within
R-core.

Best,
~G

On Sun, Oct 18, 2020 at 12:19 AM peter dalgaard  wrote:

> Hmm, yes, this is probably wrong. E.g., we are likely to get
> inconsistencies out of boundary cases like this
>
> > a <- na.omit(airquality)
> > sum(a)
> [1] 37495.3
> > sum(a[FALSE,])
> Error in FUN(X[[i]], ...) :
>   only defined on a data frame with all numeric variables
>
> Or, closer to an actual use case:
>
> > sum(subset(a, Ozone>100))
> [1] 3330.5
> > sum(subset(a, Ozone>200))
> Error in FUN(X[[i]], ...) :
>   only defined on a data frame with all numeric variables
>
>
> However, given that numeric summaries generally treat logicals as 0/1,
> wouldn't it be easiest just to extend the check inside Summary.data.frame
> with "&& !is.logical(x)"?
>
> > sum(as.matrix(a[FALSE,]))
> [1] 0
>
> -pd
>
> > On 17 Oct 2020, at 21:18 , Martin  wrote:
> >
> > The "Summary" group generics always throw errors for a data.frame with
> zero rows, for example:
> >> sum(data.frame(x = numeric(0)))
> > #> Error in FUN(X[[i]], ...) :
> > #>   only defined on a data frame with all numeric variables
> > Same behaviour for min, max, any, all, ... . I believe this is
> inconsistent with what these methods do for other empty objects (vectors,
> matrices), where the return value is chosen to ensure transitivity:
> sum(numeric(0)) == 0.
> >
> > The reason for this is that the return type of as.matrix() for empty (no
> rows or no columns) data.frame objects is always a matrix of type
> "logical". The Summary method for data.frame, in turn, throws an error when
> the data.frame, converted to a matrix, is not of numeric type.
> >
> > I suggest two ways that make sum, min, max, ... more consistent. IMHO it
> would be fitting to implement both of these fixes, because they also make
> other things more consistent.
> >
> > 1. Make the return type of as.matrix() for zero-row data.frames
> consistent with the type that would have been returned, had the data.frame
> had more than zero rows. "as.matrix(data.frame(x = numeric(0)))" should
> then be numeric, if there is an empty "character" column the return matrix
> should be a character etc. This would make subsetting by row and conversion
> to matrix commute (except for row names sometimes):
> >> all.equal(as.matrix(df[rows, , drop = FALSE]), as.matrix(df)[rows, ,
> drop = FALSE])
> > Furthermore, this change would make as.matrix.data.frame obey the
> documentation, which indicates that the coercion hierarchy is used for the
> return type.
> >
> > 2. Make the Summary.data.frame method accept data.frames that produce
> non-numeric matrices. Next to the main focus of this message, I believe it
> would e.g. be fitting to have any() and all() work on logical data.frame
> objects. The current behaviour is such that
> >> any(data.frame(x = 1))
> > #> [1] TRUE
> > #> Warning message:
> > #> In any(1, na.rm = FALSE) : coercing argument of type 'double' to
> logical
> > and
> >> any(data.frame(x = TRUE))
> > #> Error in FUN(X[[i]], ...) :
> > #>   only defined on a data frame with all numeric variables
> > So a numeric data.frame warns about implicit coercion, while a logical
> data.frame (which would not need coercion) does not work at all.
> >
> > (I feel more strongly about fixing 1. than 2., because I don't know the
> discussion that lead to the behaviour described in 2.)
> >
> > Best,
> > Martin
> >
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd@cbs.dk  Priv: pda...@gmail.com
>


Re: [Rd] tools::package_dependencies problems

2020-10-16 Thread Gabriel Becker
Hi Spencer,

You just need an available.packages matrix which reflects the reality you
want to test against. There are probably various ways of getting one of
these, but switchr allows you to build repositories off of many things
including local directories, so you could do something like

> setwd('/Users/gabrielbecker/gabe/checkedout/rtables/')
> library(switchr)
> chooseCRANmirror(ind = 1L)
> crancontribs = contrib.url(options()$repos)
> man = PkgManifest(name = "switchr", url = "~/gabe/checkedout/switchr",
type = "local")
> loccontrib <- lazyRepo(man)

> avl = available.packages(c(loccontrib, crancontribs))
> head(avl[, c("Package", "Version", "Repository")])
         Package    Version
switchr  "switchr"  "0.14.3"
A3       "A3"       "1.0.0"
aaSEA    "aaSEA"    "1.1.0"
AATtools "AATtools" "0.0.1"
ABACUS   "ABACUS"   "1.0.0"
abbyyR   "abbyyR"   "0.5.5"
         Repository
switchr  "file:///var/folders/14/z0rjkn8j0n5dj1lkdd4ng160gn/T/Rtmpe1zsSL/repo/src/contrib"
A3       "https://cloud.r-project.org/src/contrib"
aaSEA    "https://cloud.r-project.org/src/contrib"
AATtools "https://cloud.r-project.org/src/contrib"
ABACUS   "https://cloud.r-project.org/src/contrib"
abbyyR   "https://cloud.r-project.org/src/contrib"
And pass avl directly to package_dependencies


> tools::package_dependencies("switchr", db = avl)
$switchr
[1] "methods" "tools"   "RJSONIO" "RCurl"

The benefit here, beyond just constructing the PACKAGES file for your one
package (which, if you want to get clever, is all available.packages
actually needs), is that switchr will also let you install it if you want.
switchr additionally lets you install from your local working copies of
multiple inter-related packages simultaneously, by having a manifest that
contains those working copies, if you are doing development that crosses
package boundaries.


This reminds me that I need to polish and push some utilities related to
local checkouts of packages; I'll try to do that soon.

Anyway, hope that helps as things stand now.

Best,
~G

On Fri, Oct 16, 2020 at 3:28 PM Spencer Graves 
wrote:

> Hello, All:
>
>
>   tools::package_dependencies('Ecfun') failed to find how my
> development version of Ecfun was using rJava, which generated errors in
> "R CMD build Ecfun".  This is because package_dependencies by default
> uses CRAN and ignores locally installed packages.
>
>
>   What do you think about having this function check both locally
> installed and CRAN versions?
>
>
>   It can probably be done, but I don't see how at the moment.
>
>
>   Also, the traditional interpretation of a help file with Usage
> including an argument 'which = c("Depends", "Imports", "LinkingTo")' is
> that specifying nothing defaults to "Depends".  In this case, it
> defaults to "Imports".  Moreover, I don't see a way to trace "Suggests".
>
>
>   ???
>   Thanks,
>   Spencer Graves
>


Re: [Rd] Coercion function does not work for the ALTREP object

2020-10-07 Thread Gabriel Becker
Jiefei,

Where does the code for your altrep class live?

Thanks,
~G

On Wed, Oct 7, 2020 at 4:25 AM Jiefei Wang  wrote:

> Hi all,
>
> The coercion function defined for the ALTREP object will not be called by R
> when an assignment operation implicitly introduces coercion for a large
> ALTREP object.
>
> For example, If I create a vector of length 10, the ALTREP coercion
> function seems to work fine.
> ```
> > x <- 1:10
> > y <- wrap_altrep(x)
> > .Internal(inspect(y))
> @0x1f9271c0 13 INTSXP g0c0 [REF(2)] I am altrep
> > y[1] <- 1.0
> Duplicating object
> Coercing object
> > .Internal(inspect(y))
> @0x1f927c08 14 REALSXP g0c0 [REF(1)] I am altrep
> ```
>
> However, if I create a vector of length 1024, R will give me a normal
> real-type vector
> ```
> > x <- 1:1024
> > y <- wrap_altrep(x)
> > .Internal(inspect(y))
> @0x1f8ddb20 13 INTSXP g0c0 [REF(2)] I am altrep
> > y[1] <- 1.0
> > .Internal(inspect(y))
> @0x1f0d72a0 14 REALSXP g0c7 [REF(1)] (len=1024, tl=0) 1,2,3,4,5,...
> ```
>
> Note that the duplicate function is also called for the first example. It
> seems like R completely ignores my ALTREP functions in the second example.
> I feel this might be designed on purpose, but I do not understand the
> reason behind it. Is there any reason why we are not consistent here? Here
> is my session info
>
> sessionInfo()
> R Under development (unstable) (2020-09-03 r79126)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 10 x64 (build 18362)
>
> Best,
> Jiefei
>


Re: [Rd] S4 - inheritance changed by order of setClassUnion and setAs()

2020-10-05 Thread Gabriel Becker
Andreas,

As far as I can tell (/conjecture), this is because the list of classes a
particular class inherits from directly is appended to as needed, and so
the order in which a class extends others is determined by the order in
which those connections are defined.

We can see this with two setClassUnion calls, rather than requiring setAs:

> setClass("grandma", slots = c(a = "character"))
> setClass("mother", slots = c(b = "matrix"), contains = "grandma")
> setClass("daughter", slots = c(c = "list"), contains = "mother")
> setClassUnion(name = "mr_x", members = c("daughter", "mother"))
> setClassUnion(name = "mr_y", members = c("daughter", "mother"))
> getClass("daughter")
Class "daughter" [in ".GlobalEnv"]

Slots:

Name:       c      b         a
Class:   list matrix character

Extends:
Class "mother", directly
Class "mr_x", directly
Class "mr_y", directly
Class "grandma", by class "mother", distance 2

> setClass("grandma2", slots = c(a = "character"))
> setClass("mother2", slots = c(b = "matrix"), contains = "grandma2")
> setClass("daughter2", slots = c(c = "list"), contains = "mother2")
> setClassUnion(name = "mr_y2", members = c("daughter2", "mother2"))
> setClassUnion(name = "mr_x2", members = c("daughter2", "mother2"))
> getClass("daughter2")
Class "daughter2" [in ".GlobalEnv"]

Slots:

Name:       c      b         a
Class:   list matrix character

Extends:
Class "mother2", directly
Class "mr_y2", directly
Class "mr_x2", directly
Class "grandma2", by class "mother2", distance 2


Note that mr_y2 appears in the list before mr_x2 in the second block. The
same thing is happening with setAs, which (somewhat contrary to my
expectations, admittedly) causes extends() to consider "daughter" to
inherit *directly* from "grandma" in your example (though it does note that
this is via explicit coercion).

I think the take-away here is that when modifying the class inheritance
structure explicitly, via setClassUnion or setAs (or, I assume, setIs),
order inherently matters.

In fact, order also matters for multiple inheritance via the normal
contains mechanism. In practice, how could it not?

Multiple inheritance is very powerful but dangerous.

> setClass("person1", slots = c(f = "character"))
> setClass("person2", slots = c(g = "character"))
> setClass("people1", contains = c("person1", "person2"))
> getClass("people1")
Class "people1" [in ".GlobalEnv"]

Slots:

Name:          f         g
Class: character character

Extends: "person1", "person2"

> setClass("people2", contains = c("person2", "person1"))
> getClass("people2")
Class "people2" [in ".GlobalEnv"]

Slots:

Name:          g         f
Class: character character

Extends: "person2", "person1"

> setGeneric("ohno", function(obj) standardGeneric("ohno"))
[1] "ohno"
> setMethod("ohno", "person1", function(obj) "person1!")
> setMethod("ohno", "person2", function(obj) "person2! Oh No!")
> ohno(new("people1"))
[1] "person1!"
> ohno(new("people2"))
[1] "person2! Oh No!"


Not sure if that helps or not, but that's what I see here. And again, if I
got anything wrong here, someone please correct me :)

Best,
~G

On Mon, Oct 5, 2020 at 1:47 PM Blätte, Andreas 
wrote:

> Dear colleagues,
>
> there is a behaviour with S4 (virtual) classes that I find  very hard to
> understand: Depending on the position
> of setAs(), the tree of inheritance changes.
>
> This is my baseline example that defines the classes "grandma", "mother",
> "daughter" and a virtual
> class "mr_x". For a new instance if "daughter", "mr_x" is betweeen
> "mother" and "grandma".
>
> setClass("grandma", slots = c(a = "character"))
> setClass("mother", slots = c(b = "matrix"), contains = "grandma")
> setClass("daughter", slots = c(c = "list"), contains = "mother")
> setClassUnion(name = "mr_x", members = c("daughter", "mother"))
> setAs(from = "daughter", to = "grandma", def = function(from)
> new("grandma"))
> is(new("daughter"))
>
> [1] "daughter" "mother"   "mr_x" "grandma"
>
> Yet if I change the order of setAs() and setClassUnion(), this alters the
> pattern of inheritance.
>
> setClass("grandma", slots = c(a = "character"))
> setClass("mother", slots = c(b = "matrix"), contains = "grandma")
> setClass("daughter", slots = c(c = "list"), contains = "mother")
> setAs(from = "daughter", to = "grandma", def = function(from)
> new("grandma"))
> setClassUnion(name = "mr_x", members = c("daughter", "mother"))
> is(new("daughter"))
>
> [1] "daughter" "mother"   "grandma"  "mr_x"
>
> Is there a reasonable explanation for this behavior? I could not find any
> and I would appreciate
> your help. If it is not an unintended behavior, I find it very confusing
> and hard to anticipate.
>
> Kind regads
> Andreas
>
> --
> Prof. Dr. Andreas Blätte
> Professor of Public Policy and Regional Politics
> University of Duisburg-Essen
>

Re: [Rd] Internet access and R CMD make check-devel

2020-10-05 Thread Gabriel Becker
Thomas,

In my experience, as Gabor also pointed out, it's often part of the devops
build process to remove/comment out these tests, or otherwise modify them
so that they will pass (if they SHOULD pass) in your environment.

That said, a quick look at the Makefile does suggest that failing on the
internet tests should be "allowed" and not cause the whole process to
return a non-zero value.

I don't have time right this second to test this though. Is that not the
behavior you're seeing in practice? Or do the tests hang so the process
never completes, or...?

Best,
~G

On Mon, Oct 5, 2020 at 9:49 AM Thomas J. Leeper 
wrote:

> I am trying to install R on CentOS (either 7 or 8, behavior is the
> same) in an environment behind a firewall and while I am able to run:
>
> R CMD make check
>
> I am unable to run:
>
> R CMD make check-devel
>
> These latter tests fail. The failure occurs in the internet access
> if() conditional statement in these two tests:
>
> https://svn.r-project.org/R/trunk/tests/internet.R
> https://svn.r-project.org/R/trunk/tests/internet2.R
>
> In my environment, nsl("cran.r-project.org") returns a valid, non-null
> value but subsequent commands in those test files do not successfully
> access the internet.
>
> I'd like to be able to run the full test suite given I am building
> from source. I'm wondering if it's possible to make these conditionals
> more strict so that the conditional tests internet access in a manner
> more similar to how internet access is used in the tests. Would this
> be possible? Or, make tests that require internet access into a
> distinct `check-internet` or similar?
>
> As an additional reference, the same conditional statement appears to
> also be used in these other tests:
>
> https://svn.r-project.org/R/trunk/tests/CRANtools.R
> https://svn.r-project.org/R/trunk/tests/libcurl.R
>
> Thanks,
> -Thomas
>
> Thomas J. Leeper
>


Re: [Rd] [External] Thread-safe R functions

2020-09-13 Thread Gabriel Becker
Jiefei,

Beyond the general response that Luke gave, and to be a bit more specific
to what you said: DATAPTR and INTEGER_GET_REGION involve ALTREP method
execution (for ALTREP objects, obviously), so even they are not as simple
and straightforward as they were a couple of years ago. They should not
(any longer) be thought of as guaranteed to be essentially bare-metal data
retrieval from memory.

Best,
~G

On Sun, Sep 13, 2020 at 6:49 AM  wrote:

> You should assume that NO functions or macros in the R API are
> thread-safe.  If some happen to be now, on some platforms, they are
> not guaranteed to be in the future. Even if you use a global lock you
> need to keep in mind that any function in the R API can signal an
> error and execute a longjmp, so you need to make sure you have set a
> top level context in your thread.
>
> Best,
>
> luke
>
> On Sun, 13 Sep 2020, Jiefei Wang wrote:
>
> > Hi,
> >
> > I am curious about whether there exist thread-safe functions in
> > `Rinternals.h`.  I know that R is single-threaded designed, but for the
> > simple and straightforward functions like `DATAPTR` and
> `INTEGER_GET_REGION`,
> > are these functions safe to call in a multi-thread environment?

>
> > Best,
> > Jiefei
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa  Phone: 319-335-3386
> Department of Statistics andFax:   319-335-3017
> Actuarial Science
> 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
> Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Gabriel Becker
Dan,

Sounds like a cool project! Response to one of your questions inline

On Mon, Sep 7, 2020 at 4:24 AM Dan Kortschak via R-devel <
r-devel@r-project.org> wrote:

>
> The last question is more a question of interest in design strategy,
> and the answer may have been lost to time. In order to reduce the need
> to go through Go's interface assertions in a number of cases I have
> decided to reinterpret R_NilValue to an untyped Go nil (this is
> important for example in list traversal where the CDR can (hopefully)
> be only one of two types LISTSXP or NILSXP; in Go this would require a
> generalised SEXP return, but by doing this reinterpretation I can
> return a *List pointer which may be nil, greatly simplifying the code
> and improving the performance). My question her is why a singleton null
> value was chosen to be represented as a fully allocated SEXP value
> rather than just a C NULL. Also, whether C NULL is used to any great
> extent within the internal code.


I cannot speak to initial intent, perhaps others can. I can say that there
is at least one place where the difference between R_NilValue and NULL is
very important as of right now. The current design of the ALTREP framework
contract expects ALTREP methods that return a SEXP to return C NULL when
they fail (or decline) to do the requested computation and the
non-altclass-specific machinery should be run as a fallback. The places
where ALTREP methods are plugged into the existing, general internals then
check for C-NULL after attempting to fast-path the computation via ALTREP.
Any non-C-NULL SEXP, including R_NilValue, will be taken as an indication
that the altrep-method succeeded and that SEXP is the resulting value,
causing the fall-back machinery to be skipped.

If I understand the system you described correctly, this means it would be
impossible to implement (a fully general) ALTREP class in Go using your
framework (at least for the method types that return SEXP and for which
R_NilValue is a valid return value), because your code is unable to
distinguish safely between the two. In practice, in most currently existing
methods, you wouldn't ever need to return R_NilValue, I wouldn't think.

The problem that jumps out at me is Extract_subset. I'd need to do some
digging to be certain, but there, for some types in some situations, it
DOES *seem* like you might need to return the R NULL (R_NilValue) and would
find yourself unable to do so.

It's also possible more methods will be added to the table in the future
that would be problematic in light of that restriction.

In particular, if ALTREP list/environment implementations were ever to be
supported, I would expect you to be dead in the water entirely in terms of
building those, as you'd find yourself unable to implement the basic
single-element getter machinery, I think.

Beyond that, a quick grep of the sources tells me there are definitely a
few places where SEXP objects are tested with == NULL, though not
overwhelmingly many. Most such tests are for non-SEXP pointers.

Best,
~G



> Note that the Go API provides a
> mechanism to easily reconvert the nil's used back to a R_NilValue when
> returning from a Go function[3].
>
> thanks
> Dan Kortschak
>
> [1]https://github.com/rgonomic/rgo
> [2]https://github.com/rgonomic/rgo/issues/1
> [3]https://pkg.go.dev/github.com/rgonomic/rgo/sexp?tab=doc#Value.Export
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] utils::isS3stdGeneric chokes on primitives and identity

2020-08-30 Thread Gabriel Becker
Submitted to bugzilla here:
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17901

On Sat, Aug 29, 2020 at 1:57 PM Gabriel Becker 
wrote:

> Hi all,
>
> I have a patch that fixes this and also fixes/improves debugcall so that
> it supports pkg::fun(obj) and pkg:::fun(obj) style calls. I'm going to test
> it a bit more and add a regression test for isS3stdGeneric and then I will
> submit it to bugzilla tonight or tomorrow morning.
>
> Best,
> ~G
>
> On Thu, Aug 27, 2020 at 5:28 PM Gabriel Becker 
> wrote:
>
>> Trace adds something to the body of the function, so it does make sense
>> that it doesn't work on traced functions. Whether traced functions still
>> technically meet the definition of a standard S3 generic is, I suppose,
>> up for debate, but I would say that they should.
>>
>> As before, I can work on a patch for this if desired, or someone on
>> R-core can just take care of it if that is easier.
>>
>> Best,
>> ~G
>>
>> On Thu, Aug 27, 2020 at 11:22 AM Antoine Fabri 
>> wrote:
>>
>>> Should it work on traced functions ?
>>>
>>> As it is now it doesn't.
>>>
>>> Best,
>>>
>>> Antoine
>>>
>>> On Thu, Aug 20, 2020 at 9:58 AM, Kurt Hornik  wrote:
>>>
>>>> >>>>> Gabriel Becker writes:
>>>>
>>>> > I added that so I can look at the proposed fix and put it or something
>>>> > similar in bugzilla for final review.
>>>>
>>>> > Apologies for the oversight.
>>>>
>>>> Fixed now with
>>>>
>>>> -while(as.character(bdexpr[[1L]]) == "{")
>>>> +while(is.call(bdexpr) && (as.character(bdexpr[[1L]]) == "{"))
>>>>
>>>> (the suggested fix does not work on things like
>>>> foo <- function(x) {{ x }}
>>>> ...)
>>>>
>>>> Best
>>>> -k
>>>>
>>>> > ~G
>>>>
>>>> > On Wed, Aug 19, 2020 at 3:40 PM Antoine Fabri <
>>>> antoine.fa...@gmail.com>
>>>> > wrote:
>>>>
>>>> >> Dear R-devel,
>>>> >>
>>>> >> utils::isS3stdGeneric tries to subset the body of the function it's
>>>> fed,
>>>> >> primitives don't like that because they don't have a body, identity
>>>> doesn't
>>>> >> like it either because its body is a symbol.
>>>> >>
>>>> >> According to the doc, any function is a legal input.
>>>> >>
>>>> >> See below:
>>>> >>
>>>> >> identity
>>>> >> #> function (x)
>>>> >> #> x
>>>> >> #> <bytecode: ...>
>>>> >> #> <environment: namespace:base>
>>>> >>
>>>> >> max
>>>> >> #> function (..., na.rm = FALSE)  .Primitive("max")
>>>> >>
>>>> >> isS3stdGeneric(identity)
>>>> >> #> Error in bdexpr[[1L]]: objet de type 'symbol' non indiçable
>>>> >>
>>>> >> isS3stdGeneric(max)
>>>> >> #> Error in while (as.character(bdexpr[[1L]]) == "{") bdexpr <-
>>>> >> bdexpr[[2L]]: l'argument est de longueur nulle
>>>> >>
>>>> >> Here is a simple fix :
>>>> >>
>>>> >> isS3stdGeneric <- function(f) {
>>>> >> {
>>>> >> bdexpr <- body(f)
>>>> >> if(is.null(bdexpr) || !is.call(bdexpr)) return(FALSE)
>>>> >> while (as.character(bdexpr[[1L]]) == "{") bdexpr <- bdexpr[[2L]]
>>>> >> ret <- is.call(bdexpr) && identical(bdexpr[[1L]], as.name
>>>> >> ("UseMethod"))
>>>> >> if (ret)
>>>> >> names(ret) <- bdexpr[[2L]]
>>>> >> ret
>>>> >> }
>>>> >> }
>>>> >>
>>>> >> isS3stdGeneric(identity)
>>>> >> #> [1] FALSE
>>>> >> isS3stdGeneric(max)
>>>> >> #> [1] FALSE
>>>> >>
>>>> >> Best,
>>>> >>
>>>> >> Antoine
>>>> >>
>>>> >> [[alternative HTML version deleted]]
>>>> >>
>>>> >> __
>>>> >> R-devel@r-project.org mailing list
>>>> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>> >>
>>>>
>>>> >   [[alternative HTML version deleted]]
>>>>
>>>> > __
>>>> > R-devel@r-project.org mailing list
>>>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] utils::isS3stdGeneric chokes on primitives and identity

2020-08-29 Thread Gabriel Becker
Hi all,

I have a patch that fixes this and also fixes/improves debugcall so that it
supports pkg::fun(obj) and pkg:::fun(obj) style calls. I'm going to test it
a bit more and add a regression test for isS3stdGeneric and then I will
submit it to bugzilla tonight or tomorrow morning.

Best,
~G

On Thu, Aug 27, 2020 at 5:28 PM Gabriel Becker 
wrote:

> Trace adds something to the body of the function, so it does make sense
> that it doesn't work on traced functions. Whether traced functions still
> technically meet the definition of a standard S3 generic is, I suppose,
> up for debate, but I would say that they should.
>
> As before, I can work on a patch for this if desired, or someone on
> R-core can just take care of it if that is easier.
>
> Best,
> ~G
>
> On Thu, Aug 27, 2020 at 11:22 AM Antoine Fabri 
> wrote:
>
>> Should it work on traced functions ?
>>
>> As it is now it doesn't.
>>
>> Best,
>>
>> Antoine
>>
>> On Thu, Aug 20, 2020 at 9:58 AM, Kurt Hornik  wrote:
>>
>>> >>>>> Gabriel Becker writes:
>>>
>>> > I added that so I can look at the proposed fix and put it or something
>>> > similar in bugzilla for final review.
>>>
>>> > Apologies for the oversight.
>>>
>>> Fixed now with
>>>
>>> -while(as.character(bdexpr[[1L]]) == "{")
>>> +while(is.call(bdexpr) && (as.character(bdexpr[[1L]]) == "{"))
>>>
>>> (the suggested fix does not work on things like
>>> foo <- function(x) {{ x }}
>>> ...)
>>>
>>> Best
>>> -k
>>>
>>> > ~G
>>>
>>> > On Wed, Aug 19, 2020 at 3:40 PM Antoine Fabri
>>> > wrote:
>>>
>>> >> Dear R-devel,
>>> >>
>>> >> utils::isS3stdGeneric tries to subset the body of the function it's
>>> fed,
>>> >> primitives don't like that because they don't have a body, identity
>>> doesn't
>>> >> like it either because its body is a symbol.
>>> >>
>>> >> According to the doc, any function is a legal input.
>>> >>
>>> >> See below:
>>> >>
>>> >> identity
>>> >> #> function (x)
>>> >> #> x
>>> >> #> <bytecode: ...>
>>> >> #> <environment: namespace:base>
>>> >>
>>> >> max
>>> >> #> function (..., na.rm = FALSE)  .Primitive("max")
>>> >>
>>> >> isS3stdGeneric(identity)
>>> >> #> Error in bdexpr[[1L]]: objet de type 'symbol' non indiçable
>>> >>
>>> >> isS3stdGeneric(max)
>>> >> #> Error in while (as.character(bdexpr[[1L]]) == "{") bdexpr <-
>>> >> bdexpr[[2L]]: l'argument est de longueur nulle
>>> >>
>>> >> Here is a simple fix :
>>> >>
>>> >> isS3stdGeneric <- function(f) {
>>> >> {
>>> >> bdexpr <- body(f)
>>> >> if(is.null(bdexpr) || !is.call(bdexpr)) return(FALSE)
>>> >> while (as.character(bdexpr[[1L]]) == "{") bdexpr <- bdexpr[[2L]]
>>> >> ret <- is.call(bdexpr) && identical(bdexpr[[1L]], as.name
>>> >> ("UseMethod"))
>>> >> if (ret)
>>> >> names(ret) <- bdexpr[[2L]]
>>> >> ret
>>> >> }
>>> >> }
>>> >>
>>> >> isS3stdGeneric(identity)
>>> >> #> [1] FALSE
>>> >> isS3stdGeneric(max)
>>> >> #> [1] FALSE
>>> >>
>>> >> Best,
>>> >>
>>> >> Antoine
>>> >>
>>> >> [[alternative HTML version deleted]]
>>> >>
>>> >> __
>>> >> R-devel@r-project.org mailing list
>>> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> >>
>>>
>>> >   [[alternative HTML version deleted]]
>>>
>>> > __
>>> > R-devel@r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] utils::isS3stdGeneric chokes on primitives and identity

2020-08-27 Thread Gabriel Becker
Trace adds something to the body of the function, so it does make sense
that it doesn't work on traced functions. Whether traced functions still
technically meet the definition of a standard S3 generic is, I suppose, up
for debate, but I would say that they should.

As before, I can work on a patch for this if desired, or someone on R-core
can just take care of it if that is easier.

Best,
~G

On Thu, Aug 27, 2020 at 11:22 AM Antoine Fabri 
wrote:

> Should it work on traced functions ?
>
> As it is now it doesn't.
>
> Best,
>
> Antoine
>
> On Thu, Aug 20, 2020 at 9:58 AM, Kurt Hornik  wrote:
>
>> >>>>> Gabriel Becker writes:
>>
>> > I added that so I can look at the proposed fix and put it or something
>> > similar in bugzilla for final review.
>>
>> > Apologies for the oversight.
>>
>> Fixed now with
>>
>> -while(as.character(bdexpr[[1L]]) == "{")
>> +while(is.call(bdexpr) && (as.character(bdexpr[[1L]]) == "{"))
>>
>> (the suggested fix does not work on things like
>> foo <- function(x) {{ x }}
>> ...)
>>
>> Best
>> -k
>>
>> > ~G
>>
>> > On Wed, Aug 19, 2020 at 3:40 PM Antoine Fabri 
>> > wrote:
>>
>> >> Dear R-devel,
>> >>
>> >> utils::isS3stdGeneric tries to subset the body of the function it's
>> fed,
>> >> primitives don't like that because they don't have a body, identity
>> doesn't
>> >> like it either because its body is a symbol.
>> >>
>> >> According to the doc, any function is a legal input.
>> >>
>> >> See below:
>> >>
>> >> identity
>> >> #> function (x)
>> >> #> x
>> >> #> <bytecode: ...>
>> >> #> <environment: namespace:base>
>> >>
>> >> max
>> >> #> function (..., na.rm = FALSE)  .Primitive("max")
>> >>
>> >> isS3stdGeneric(identity)
>> >> #> Error in bdexpr[[1L]]: objet de type 'symbol' non indiçable
>> >>
>> >> isS3stdGeneric(max)
>> >> #> Error in while (as.character(bdexpr[[1L]]) == "{") bdexpr <-
>> >> bdexpr[[2L]]: l'argument est de longueur nulle
>> >>
>> >> Here is a simple fix :
>> >>
>> >> isS3stdGeneric <- function(f) {
>> >> {
>> >> bdexpr <- body(f)
>> >> if(is.null(bdexpr) || !is.call(bdexpr)) return(FALSE)
>> >> while (as.character(bdexpr[[1L]]) == "{") bdexpr <- bdexpr[[2L]]
>> >> ret <- is.call(bdexpr) && identical(bdexpr[[1L]], as.name
>> >> ("UseMethod"))
>> >> if (ret)
>> >> names(ret) <- bdexpr[[2L]]
>> >> ret
>> >> }
>> >> }
>> >>
>> >> isS3stdGeneric(identity)
>> >> #> [1] FALSE
>> >> isS3stdGeneric(max)
>> >> #> [1] FALSE
>> >>
>> >> Best,
>> >>
>> >> Antoine
>> >>
>> >> [[alternative HTML version deleted]]
>> >>
>> >> __
>> >> R-devel@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>> >>
>>
>> >   [[alternative HTML version deleted]]
>>
>> > __
>> > R-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] NAs and rle

2020-08-25 Thread Gabriel Becker
Hi All,

A twitter user, Mike fc (@coolbutuseless) mentioned today that he was
surprised that repeated NAs weren't treated as a run by the rle function.

Now I know why they are not. NAs represent values which could be the same
or different from each other if they were known, so from a purely conceptual
standpoint there is no way to tell whether they are the same and thus
constitute a run or not.

This conceptual strictness isn't universally observed, though, because we
get the following:

> unique(c(1, 2, 3, NA, NA, NA))

[1]  1  2  3 NA


This means that rle(sort(x))$values is not guaranteed to be the same as
unique(x), which is a little strange (though likely of little practical
impact).


Personally, to me it also seems that, from a purely data-compression
standpoint, it would be valid to collapse those missing values into a run
of missing, as it reduces size in-memory/on disk without losing any
information.

Now, none of this is to say that I suggest the default behavior be changed
(that would surely disrupt some non-trivial amount of existing code), but
what do people think of a group.nas argument, defaulting to FALSE, that
controls the behavior?

As a final point, there is some precedent here (though obviously not at all
binding), as Bioconductor's Rle functionality does group NAs.
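
To make the idea concrete, here is a minimal sketch of what grouping NAs
could look like. This is my own illustration, not a worked-out patch, and
rle2 is just a placeholder name:

rle2 <- function(x, group.nas = FALSE) {
    if (!group.nas) return(rle(x))
    n <- length(x)
    if (n == 0L)
        return(structure(list(lengths = integer(), values = x),
                         class = "rle"))
    ## unlike rle(), treat adjacent NAs as equal so they form runs
    same <- (x[-1L] == x[-n]) | (is.na(x[-1L]) & is.na(x[-n]))
    same[is.na(same)] <- FALSE
    ends <- c(which(!same), n)
    structure(list(lengths = diff(c(0L, ends)), values = x[ends]),
              class = "rle")
}

rle2(c(1, 1, NA, NA, NA, 2), group.nas = TRUE)
## Run Length Encoding
##   lengths: int [1:3] 2 3 1
##   values : num [1:3] 1 NA 2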

Best,
~G

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] utils::isS3stdGeneric chokes on primitives and identity

2020-08-19 Thread Gabriel Becker
I added that so I can look at the proposed fix and put it or something
similar in bugzilla for final review.

Apologies for the oversight.

~G

On Wed, Aug 19, 2020 at 3:40 PM Antoine Fabri 
wrote:

> Dear R-devel,
>
> utils::isS3stdGeneric tries to subset the body of the function it's fed,
> primitives don't like that because they don't have a body, identity doesn't
> like it either because its body is a symbol.
>
> According to the doc, any function is a legal input.
>
> See below:
>
> identity
> #> function (x)
> #> x
> #> <bytecode: ...>
> #> <environment: namespace:base>
>
> max
> #> function (..., na.rm = FALSE)  .Primitive("max")
>
> isS3stdGeneric(identity)
> #> Error in bdexpr[[1L]]: objet de type 'symbol' non indiçable
>
> isS3stdGeneric(max)
> #> Error in while (as.character(bdexpr[[1L]]) == "{") bdexpr <-
> bdexpr[[2L]]: l'argument est de longueur nulle
>
> Here is a simple fix :
>
> isS3stdGeneric <- function(f) {
>   {
> bdexpr <- body(f)
> if(is.null(bdexpr) || !is.call(bdexpr)) return(FALSE)
> while (as.character(bdexpr[[1L]]) == "{") bdexpr <- bdexpr[[2L]]
> ret <- is.call(bdexpr) && identical(bdexpr[[1L]], as.name
> ("UseMethod"))
> if (ret)
>   names(ret) <- bdexpr[[2L]]
> ret
>   }
> }
>
> isS3stdGeneric(identity)
> #> [1] FALSE
> isS3stdGeneric(max)
> #> [1] FALSE
>
> Best,
>
> Antoine
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Seeding non-R RNG with numbers from R's RNG stream

2020-07-30 Thread Gabriel Becker
Tommy,

I'm not Duncan (and am not, nor claim to be, an RNG expert), but I believe
RNG streams are designed, and thus tested, to be used as streams. Repeatedly
setting the seed after drawing small numbers of samples does not fit the
designed use case (and also doesn't match the test criteria by which the
generators are evaluated/validated, which is what I believe Duncan was
saying).

(Anything Duncan or another RNG expert says that contradicts the above
should be taken as correct instead of what I said.)
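
To be concrete about the "use them as streams" point, the safer pattern
would be something like the sketch below, where the external generator is
seeded via a single draw from R's stream rather than by reading
.Random.seed directly (sample_cpp stands in for Tommy's C++ entry point;
this is a sketch, not a validated recipe):

sample_wrapper <- function() {
    init_var <- runif(1)
    ## one draw from R's stream becomes the external RNG's seed, and
    ## advances R's stream as a side effect (no throwaway runif() needed)
    seed <- sample.int(.Machine$integer.max, 1L)
    sample_cpp(init_var = init_var, seed = seed)
}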

Best,
~G

On Thu, Jul 30, 2020 at 1:30 PM Tommy Jones  wrote:

> Thank you for this. I'd like to be sure I understand the
> intuition correctly. Is the following true from what you said?
>
> I can just fix the seed at the C++ level and the results will still be
> (pseudo) random because the initialization at the R level is (pseudo)
> random.
>
> On Thu, Jul 30, 2020 at 3:36 PM Duncan Murdoch 
> wrote:
>
> > I wouldn't trust the C++ generator to be as good if you seed it this way
> > as if you just seeded it once with your phone number (or any other fixed
> > value) and let it run, because it's probably never been tested to be
> > good when run this way.  Is it good enough for the way you plan to use
> > it?  Maybe.
> >
> > Duncan Murdoch
> >
> > On 30/07/2020 3:05 p.m., Tommy Jones wrote:
> > > Hi,
> > >
> > > I am constructing a function that does sampling in C++ using a non-R
> RNG
> > > stream for thread safety reasons. This C++ function is wrapped by an R
> > > function, which is user facing. The R wrapper does some sampling itself
> > to
> > > initialize some variables before passing them off to C++. So that my
> > users
> > > do not have to manage two mechanisms to set random seeds, I've
> > constructed
> > > a solution (shown below) that allows both RNGs to be seeded with
> set.seed
> > > and respond to the state of R's RNG stream.
> > >
> > > I believe the below works. However, I am hoping to get feedback from
> more
> > > experienced useRs as to whether or not the below approach is unsafe in
> > ways
> > > that may affect reproducibility, modify global variables in bad ways,
> or
> > > have other unintended consequences I have not anticipated.
> > >
> > > Could I trouble one or more folks on this list to weigh in on the
> safety
> > > (or perceived wisdom) of using R's internal RNG stream to seed an RNG
> > > external to R? Many thanks in advance.
> > >
> > > This relates to a Stackoverflow question here:
> > >
> >
> https://stackoverflow.com/questions/63165955/is-there-a-best-practice-for-using-non-r-rngs-in-rcpp-code
> > >
> > > Pseudocode of a trivial facsimile of my current approach is below.
> > >
> > > --Tommy
> > >
> > > sample_wrapper <- function() {
> > ># initialize a variable to pass to C++
> > >init_var <- runif(1)
> > >
> > ># get current state of RNG stream
> > ># first entry of .Random.seed is an integer representing the
> > algorithm used
> > ># second entry is current position in RNG stream
> > ># subsequent entries are pseudorandom numbers
> > >seed_pos <- .Random.seed[2]
> > >
> > >seed <- .Random.seed[seed_pos + 2]
> > >
> > >out <- sample_cpp(init_var = init_var, seed = seed)
> > >
> > ># move R's position in the RNG stream forward by 1 with a throw away
> > sample
> > >runif(1)
> > >
> > ># return the output
> > >out
> > > }
> > >
> > >   [[alternative HTML version deleted]]
> > >
> > > __
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> >
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Speed-up/Cache loadNamespace()

2020-07-20 Thread Gabriel Becker
Mario, Abby, et al.

Note that there is no fully safe way of unloading packages which register
methods (as answered by Luke Tierney here:
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16644 ), which makes
running arbitrary different scripts in a single long-lived R session pretty
iffy over the long term. Even switchr (which tries hard to support
something based on this) only gets "pretty close".

If the scripts are always the same (up to bugfixes, etc) and most
importantly require the same loaded packages then the above won't be an
issue, of course. Just something to be aware of when planning something
like this.

Best,
~G

On Mon, Jul 20, 2020 at 2:59 PM Abby Spurdle  wrote:

> Thank you Serguei and Gabor.
> Great suggestions.
>
> > If your R scripts contain "stop()" or "q('yes')" or any other error, it
> > will end the Rscript process. Kind of watch-dog can be set for automatic
> > relaunching if needed.
>
> It should be possible to change the error handling behavior.
> From within R:
>
> options (error = function () NULL)
>
> Or something better...
>
> Also, it may be desirable to wipe the global environment (or parts of
> it), after each script:
>
> remove (list = ls (envir=.GlobalEnv, all.names=TRUE) )
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] mget(missingArgument)?

2020-06-22 Thread Gabriel Becker
At first I thought this was more or less correct, because

> f = function(x) { y <- mget("x")[[1]]; missing(y)}

> f()

[1] TRUE


reflects the actual "value" of x, but then at the very least this


> f = function(x) { y <- mget("x")[[1]]; y}

> f()

Error in f() : argument "y" is missing, with no default


is a problem, because of course y was not an argument of f, so talking
about its default is nonsensical, and the actual argument which was missing
is not named.
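
For anyone bitten by this, one way to detect the marker mget() returns here
is to compare against the empty symbol. A small workaround sketch (the
names are mine, and this does not fix the misleading error message itself):

## quote(expr = ) evaluates to the empty symbol (R_MissingArg)
is_missing_arg <- function(v) identical(v, quote(expr = ))

f <- function(x) {
    y <- mget("x")[[1L]]
    if (is_missing_arg(y)) "x was missing" else y
}
f()    # "x was missing"
f(42)  # 42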


~G

On Mon, Jun 22, 2020 at 5:06 PM William Dunlap via R-devel <
r-devel@r-project.org> wrote:

> Currently, when mget() is used to get the value of a function's argument
> with no default value and no value in the call it returns the empty name
> (R_MissingArg).  Is that the right thing to do or should it return
> 'ifnotfound' or give an error?
>
> E.g.,
> > a <- (function(x) { y <- "y from function's environment";
> mget(c("x","y","z"), envir=environment(), ifnotfound=666)})()
> > str(a)
> List of 3
>  $ x: symbol
>  $ y: chr "y from function's environment"
>  $ z: num 666
>
> The similar function get0() gives an error in that case.
> > b <- (function(x) get0("x", envir=environment(), ifnotfound=666))()
> Error in get0("x", envir = environment(), ifnotfound = 666) :
>   argument "x" is missing, with no default
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-24 Thread Gabriel Becker
On Sat, May 23, 2020 at 9:59 PM Hervé Pagès  wrote:

> On 5/23/20 17:45, Gabriel Becker wrote:
> > Maybe my intuition is just
> > different but when I collapse multiple character vectors together, I
> > expect all the characters from each of those vectors to be in the
> > resulting collapsed one.
>
> Yes I'd expect that too. But the **collapse** operation in paste() has
> never been about collapsing **multiple** character vectors together.
> What it does is collapse the **single** character vector that comes out
> of the 'sep' operation.
>

I understand what it does; I broke it down the same way in my post earlier
in the thread. The fact remains that it is a single function, which
significantly muddies the waters. So you can say

paste0(x,y, collapse=",", recycle0=TRUE)

is not a collapse operation on multiple vectors, and of course there's a
sense in which you're not wrong (again I understand what these functions
do), but it sure looks like one in the invocation, doesn't it?

Honestly, the thing this whole discussion has shown me most clearly is
that, imho, collapse (accepting ONLY one data vector) and paste (accepting
multiple) should never have been a single function to begin with. But that
ship sailed long, long ago.
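
To make the gotcha concrete, here is the behavior as of the R 4.0.0
implementation being discussed in this thread (x and y are my own toy
inputs):

x <- c("a", "b")
y <- character(0)
paste0(x, y, collapse = ",")                   # "a,b" (y recycled to "")
paste0(x, y, collapse = ",", recycle0 = TRUE)  # character(0)

The second call reads like a collapse over x and y jointly, but collapse
only ever sees the single vector left over after the sep step, which
recycle0 has already emptied.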




> So
>
>paste(x, y, z, sep="", collapse=",")
>
> is analogous to
>
>sum(x + y + z)
>

Honestly, I'd be significantly more comfortable if

1:10 + integer(0) + 5

were an error too.

At least I'm consistent, right?

~G

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-23 Thread Gabriel Becker
Brodie,

A good point, but more analogous to what I'm concerned with is

> sum(5, numeric(0))

[1] 5


Not 0 (the analogue of Hervé's desired behavior).

Best,
~G

PS Brodie sorry for the double.

On Fri, May 22, 2020 at 6:12 PM brodie gaslam 
wrote:

> > On Friday, May 22, 2020, 6:16:45 PM EDT, Hervé Pagès <
> hpa...@fredhutch.org> wrote:
> >
> > Gabe,
> >
> > It's the current behavior of paste() that is a major source of bugs:
> >
> >   ## Add "rs" prefix to SNP ids and collapse them in a
> >   ## comma-separated string.
> >   collapse_snp_ids <- function(snp_ids)
> >   paste("rs", snp_ids, sep="", collapse=",")
> >
> >   snp_groups <- list(
> > group1=c(55, 22, 200),
> > group2=integer(0),
> > group3=c(99, 550)
> >   )
> >
> >   vapply(snp_groups, collapse_snp_ids, character(1))
> >   #            group1            group2            group3
> >   # "rs55,rs22,rs200"  "rs"  "rs99,rs550"
> >
> > This has hit me so many times!
> >
> > Now with 'collapse0=TRUE', we finally have the opportunity to make it do
> > the right thing. Let's not miss that opportunity.
> >
> > Cheers,
> > H.
>
> FWIW what convinces me is consistency with other aggregating functions
> applied
> to zero length inputs:
>
> sum(numeric(0))
> ## [1] 0
>
> >
> >
> > On 5/22/20 11:26, Gabriel Becker wrote:
> > > I understand that this is consistent but it also strikes me as an
> > > enormous 'gotcha' of a magnitude that 'we' are trying to avoid/smooth
> > > over at this point in user-facing R space.
> > >
> > > For the record I'm not suggesting it should return something other than
> > > "", and in particular I'm not arguing that any call to paste /that does
> > > not return an error/ with non-NULL collapse should return a character
> > > vector of length one.
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-23 Thread Gabriel Becker
Herve (et al.),

On Fri, May 22, 2020 at 3:16 PM Hervé Pagès  wrote:

> Gabe,
>
> It's the current behavior of paste() that is a major source of bugs:
>
>## Add "rs" prefix to SNP ids and collapse them in a
>## comma-separated string.
>collapse_snp_ids <- function(snp_ids)
>paste("rs", snp_ids, sep="", collapse=",")
>
>snp_groups <- list(
>  group1=c(55, 22, 200),
>  group2=integer(0),
>  group3=c(99, 550)
>)
>
>vapply(snp_groups, collapse_snp_ids, character(1))
>#            group1            group2            group3
># "rs55,rs22,rs200"  "rs"  "rs99,rs550"
>
> This has hit me so many times!
>
> Now with 'collapse0=TRUE', we finally have the opportunity to make it do
> the right thing. Let's not miss that opportunity.
>

I see what you're saying, but I don't know. Maybe my intuition is just
different but when I collapse multiple character vectors together, I
expect all the characters from each of those vectors to be in the resulting
collapsed one. In your example it's a string literal to be added
elementwise to the prefix, but what if it is another vector of length > 1?
Wouldn't it be strange that all those values are wiped and absent from the
resulting string? Maybe it's just me, but for paste(x, y, z, sep = "",
collapse = ", ", recycle0 = TRUE), if length(y) is 0 it literally makes no
difference what x and z are, as the example below shows.
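
x <- c("a", "b")
z <- c("c", "d")
## as of the R 4.0.0 behavior discussed in this thread:
paste(x, NULL, z, sep = "", collapse = ", ", recycle0 = TRUE)
## character(0)  (the contents of x and z never reach the output)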

I seem to be largely outvoted anyway, though, so we will see what Martin
and others who may pop up think. I raised the points I wanted to raise, so
we'll see where things ultimately fall.

~G



>
> Cheers,
> H.
>
>
> On 5/22/20 11:26, Gabriel Becker wrote:
> > I understand that this is consistent but it also strikes me as an
> > enormous 'gotcha' of a magnitude that 'we' are trying to avoid/smooth
> > over at this point in user-facing R space.
> >
> > For the record I'm not suggesting it should return something other than
> > "", and in particular I'm not arguing that any call to paste /that does
> > not return an error/ with non-NULL collapse should return a character
> > vector of length one.
> >
> > Rather I'm pointing out that it could (perhaps should, imo) simply be an
> > error, which is also consistent, in the strict sense, with
> > previous behavior in that it is the developer simply declining to extend
> > the recycle0 argument to the full parameter space (there is no rule that
> > says we must do so, arguments whose use is incompatible with other
> > arguments can be reasonable and called for).
> >
> > I don't feel super strongly that returning "" in this and similar
> > cases is horrible and should never happen, but I'd bet dollars to donuts
> > that to the extent that behavior occurs it will be a disproportionately
> > major source of bugs, and I think that's at least worth considering in
> > addition to pure consistency.
> >
> > ~G
> >
> > On Fri, May 22, 2020 at 9:50 AM William Dunlap <wdun...@tibco.com> wrote:
> >
> > I agree with Herve, processing collapse happens last so
> > collapse=non-NULL always leads to a single character string being
> > returned, the same as paste(collapse="").  See the altPaste function
> > I posted yesterday.
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> >
> > On Fri, May 22, 2020 at 9:12 AM Hervé Pagès <hpa...@fredhutch.org> wrote:
> >
> > I think that
> >
> >  paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ", collapse
> > = ",",
> > recycle0=TRUE)
> >
> > should just return an empty string and don't see why it needs to
> > emit a
> > warning or raise an error. To me it does exactly what the user
> > is asking
> > for, which is to change how the 3 arguments are recycled
> > **before** the
> > 'sep' operation.
> >
> > The 'recycle0' argument has no business in the 'collapse'
> operation
> > (which comes after the 'sep' operation): this operation still
> > behaves
> > like it always had.

Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-22 Thread Gabriel Becker
I understand that this is consistent but it also strikes me as an enormous
'gotcha' of a magnitude that 'we' are trying to avoid/smooth over at this
point in user-facing R space.

For the record I'm not suggesting it should return something other than "",
and in particular I'm not arguing that any call to paste *that does not
return an error* with non-NULL collapse should return a character vector of
length one.

Rather I'm pointing out that it could (perhaps should, imo) simply be an
error, which is also consistent, in the strict sense, with
previous behavior in that it is the developer simply declining to extend
the recycle0 argument to the full parameter space (there is no rule that
says we must do so, arguments whose use is incompatible with other
arguments can be reasonable and called for).

I don't feel super strongly that returning "" in this and similar cases is
horrible and should never happen, but I'd bet dollars to donuts that to the
extent that behavior occurs it will be a disproportionately major source of
bugs, and I think that's at least worth considering in addition to pure
consistency.

~G

On Fri, May 22, 2020 at 9:50 AM William Dunlap  wrote:

> I agree with Herve, processing collapse happens last so collapse=non-NULL
> always leads to a single character string being returned, the same as
> paste(collapse="").  See the altPaste function I posted yesterday.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Fri, May 22, 2020 at 9:12 AM Hervé Pagès  wrote:
>
>> I think that
>>
>> paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ", collapse = ",",
>> recycle0=TRUE)
>>
>> should just return an empty string and don't see why it needs to emit a
>> warning or raise an error. To me it does exactly what the user is asking
>> for, which is to change how the 3 arguments are recycled **before** the
>> 'sep' operation.
>>
>> The 'recycle0' argument has no business in the 'collapse' operation
>> (which comes after the 'sep' operation): this operation still behaves
>> like it always had.
>>
>> That's all there is to it.
>>
>> H.
>>
>>
>> On 5/22/20 03:00, Gabriel Becker wrote:
>> > Hi Martin et al,
>> >
>> >
>> >
>> > On Thu, May 21, 2020 at 9:42 AM Martin Maechler
>> > <maech...@stat.math.ethz.ch> wrote:
>> >
>> >  >>>>> Hervé Pagès
>> >  >>>>> on Fri, 15 May 2020 13:44:28 -0700 writes:
>> >
>> >  > There is still the situation where **both** 'sep' and
>> > 'collapse' are
>> >  > specified:
>> >
>> >  >> paste(integer(0), "nth", sep="", collapse=",")
>> >  > [1] "nth"
>> >
>> >  > In that case 'recycle0' should **not** be ignored i.e.
>> >
>> >  > paste(integer(0), "nth", sep="", collapse=",", recycle0=TRUE)
>> >
>> >  > should return the empty string (and not character(0) like it
>> > does at the
>> >  > moment).
>> >
>> >  > In other words, 'recycle0' should only control the first
>> > operation (the
>> >  > operation controlled by 'sep'). Which makes plenty of sense:
>> > the 1st
>> >  > operation is binary (or n-ary) while the collapse operation
>> > is unary.
>> >  > There is no concept of recycling in the context of unary
>> > operations.
>> >
>> > Interesting, ..., and sounding somewhat convincing.
>> >
>> >  > On 5/15/20 11:25, Gabriel Becker wrote:
>> >  >> Hi all,
>> >  >>
>> >  >> This makes sense to me, but I would think that recycle0 and
>> > collapse
>> >  >> should actually be incompatible and paste should throw an
>> > error if
>> >  >> recycle0 were TRUE and collapse were declared in the same
>> > call. I don't
>> >  >> think the value of recycle0 should be silently ignored if it
>> > is actively
>> >  >> specified.
>> >  >>
>> >  >> ~G
>> >
>> > Just to summarize what I think we should know and agree (or be
>> > be "disproven") and where this comes from ...
>> >
>> > 1) recycle0 is a new R 4.0.0 option in paste() / paste0() which by
>> > default (recycle0 = FALSE) should (and *does* AFAIK) not change
>> > anything, hence paste() / paste0() behave completely back-compatibly
>> > if recycle0 is kept to FALSE.

Re: [Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

2020-05-22 Thread Gabriel Becker
Hi Martin et al,



On Thu, May 21, 2020 at 9:42 AM Martin Maechler 
wrote:

> >>>>> Hervé Pagès
> >>>>> on Fri, 15 May 2020 13:44:28 -0700 writes:
>
> > There is still the situation where **both** 'sep' and 'collapse' are
> > specified:
>
> >> paste(integer(0), "nth", sep="", collapse=",")
> > [1] "nth"
>
> > In that case 'recycle0' should **not** be ignored i.e.
>
> > paste(integer(0), "nth", sep="", collapse=",", recycle0=TRUE)
>
> > should return the empty string (and not character(0) like it does at
> the
> > moment).
>
> > In other words, 'recycle0' should only control the first operation
> (the
> > operation controlled by 'sep'). Which makes plenty of sense: the 1st
> > operation is binary (or n-ary) while the collapse operation is
> unary.
> > There is no concept of recycling in the context of unary operations.
>
> Interesting, ..., and sounding somewhat convincing.
>
> > On 5/15/20 11:25, Gabriel Becker wrote:
> >> Hi all,
> >>
> >> This makes sense to me, but I would think that recycle0 and
> collapse
> >> should actually be incompatible and paste should throw an error if
> >> recycle0 were TRUE and collapse were declared in the same call. I
> don't
> >> think the value of recycle0 should be silently ignored if it is
> actively
> >> specified.
> >>
> >> ~G
>
> Just to summarize what I think we should know and agree (or be
> be "disproven") and where this comes from ...
>
> 1) recycle0 is a new R 4.0.0 option in paste() / paste0() which by default
>(recycle0 = FALSE) should (and *does* AFAIK) not change anything,
>hence  paste() / paste0() behave completely back-compatible
>if recycle0 is kept to FALSE.
>
> 2) recycle0 = TRUE is meant to give different behavior, notably
>0-length arguments (among '...') should result in 0-length results.
>
>The above does not specify what this means in detail, see 3)
>
> 3) The current R 4.0.0 implementation (for which I'm primarily responsible)
>and help(paste)  are in accordance.
>Notably the help page (Arguments -> 'recycle0' ; Details 1st para ;
> Examples)
>says and shows how the 4.0.0 implementation has been meant to work.
>
> 4) Several provenly smart members of the R community argue that
>both the implementation and the documentation of 'recycle0 =
>TRUE'  should be changed to be more logical / coherent / sensical ..
>
> Is the above all correct in your view?
>
> Assuming yes,  I read basically two proposals, both agreeing
> that  recycle0 = TRUE  should only ever apply to the action of 'sep'
> but not the action of 'collapse'.
>
> 1) Bill and Hervé (I think) propose that 'recycle0' should have
>no effect whenever 'collapse = <non-NULL>'
>
> 2) Gabe proposes that 'collapse = <non-NULL>' and 'recycle0 = TRUE'
>should be declared incompatible and error. If going in that
>direction, I could also see them to give a warning (and
>continue as if recycle = FALSE).
>

Herve makes a good point about when sep and collapse are both set. That
said, if the user explicitly sets recycle0, Personally, I don't think it
should be silently ignored under any configuration of other arguments.

If all of the arguments are to go into effect, the question then becomes
one of ordering, I think.

Consider

paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ", collapse = ",",
recycle0=TRUE)

Currently that returns character(0), becuase the logic is essenttially (in
pseudo-code)

collapse(paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ",
recycle0=TRUE), collapse = ", ", recycle0=TRUE)

 -> collapse(character(0), collapse = ", " recycle0=TRUE)

-> character(0)

Now Bill Dunlap argued, fairly convincingly I think, that paste(...,
collapse=<non-NULL>) should *always* return a character vector of length
exactly one. With recycle0, though, it will return "" via the progression

paste(c("a", "b"), NULL, c("c",  "d"),  sep = " ", collapse = ",",
recycle0=TRUE)

 -> collapse(character(0), collapse = ", ")

-> ""


because recycle0 is still applied to the sep-based operation which occurs
before collapse, thus leaving a vector of length 0 to collapse.

That is consistent but seems unlikely to be what the user wanted, imho. I
think if it does this there should be at least a warning when paste
collapses to "" in this way.
