Re: [Rd] Is ALTREP "non-API"?

2024-04-24 Thread Hadley Wickham
A few more thoughts based on a simple question: how do you determine the
length of a vector?

Rf_length() is used in example code in R-exts, but I don't think it's
formally documented anywhere (although it's possible I missed it). Is being used
in an example sufficient to consider a function part of the public
API? If so, SET_TYPEOF() is used in a number of examples, and hence used by
CRAN packages, but is no longer considered part of the public API.

Rf_xlength() doesn't appear to be mentioned anywhere in R-exts. Does this
imply that long vectors are not part of the exported API? Or is there some
other way we should be determining the length of such vectors?

Are the macro variants LENGTH and XLENGTH part of the exported API? Are we
supposed to use them or avoid them?

Relatedly, I presume that LOGICAL() is the way we're supposed to extract
logical values from a vector, but it isn't documented in R-exts, suggesting
that it's not part of the public API?
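To make the question concrete, here's a sketch of the kind of package C code
these questions are about — assuming (as most packages currently do) that
Rf_xlength() and LOGICAL() are safe to use, which is exactly what's unclear:

```c
#include <R.h>
#include <Rinternals.h>

/* Sketch: count the TRUE values in a logical vector.
   Rf_xlength() returns R_xlen_t, so long vectors are handled;
   LOGICAL() exposes the underlying int buffer (NA is an int sentinel). */
SEXP count_true(SEXP x)
{
    R_xlen_t n = Rf_xlength(x);
    const int *p = LOGICAL(x);
    R_xlen_t ntrue = 0;
    for (R_xlen_t i = 0; i < n; i++) {
        if (p[i] == 1) ntrue++;   /* skips FALSE (0) and NA_LOGICAL */
    }
    return Rf_ScalarReal((double) ntrue);
}
```

Every call here is used in CRAN packages today, but as far as I can tell only
some of them are documented in R-exts.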

---

It's also worth pointing out where R-exts does well, with the documentation
of utility functions (
https://cran.r-project.org/doc/manuals/R-exts.html#Utility-functions). I
think this is what most people would consider documentation to imply, i.e.
a list of input arguments/types, the output type, and basic notes on their
operation.
---

Finally, it's worth noting that there are some lingering ill feelings over
how the connections API was treated. It was documented in R-exts only to be
later removed, including expunging mentions of it in the news. That's
obviously water under the bridge, but I do believe that there is
the potential for the R core team to build goodwill with the community if
they are willing to engage a bit more with the users of their APIs.

Hadley


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Is ALTREP "non-API"?

2024-04-24 Thread Hadley Wickham
>
>
>
> >>> That is not true at all - the presence of header does not constitute
> >> declaration of something as the R API. There are cases where internal
> >> functions are in the headers for historical or other reasons since the
> >> headers are used both for the internal implementation and packages.
> That's
> >> why this is in R-exts under "The R API: entry points for C code":
> >>>
> >>> If I understand your point correctly, does this mean that
> >> Rf_allocVector() is not part of the "official" R API? It does not
> appear to
> >> be documented in the "The R API: entry points for C code" section.
> >>>
> >>
> >> It does, obviously:
> >> https://cran.r-project.org/doc/manuals/R-exts.html#Allocating-storage-1
> >
> >
> > I'm just trying to understand the precise definition of the official API
> > here. So it's any function mentioned in R-exts, regardless of which
> section
> > it appears in?
> >
> > Does this sentence imply that all functions starting with alloc* are part
> > of the official API?
> >
>
> Again, I can only quote the R-exts (few lines below the previous "The R
> API" quote):
>
>
> We can classify the entry points as
> API
> Entry points which are documented in this manual and declared in an
> installed header file. These can be used in distributed packages and will
> only be changed after deprecation.
>
>
> It says "in this manual" - I don't see any restriction anywhere to a
> particular section of the manual, so I really don't see why you would think
> that allocation is not part of the API.
>

Because you mentioned that section explicitly earlier in the thread. This
obviously seems clear to you, but it's not at all clear to me, nor, I suspect,
to much of the wider community. It's frustrating because we are trying
our best to do what y'all want us to do, but it feels like we keep getting
the rug pulled out from under us with very little notice, and then have to
spend a large amount of time figuring out workarounds. That is at least
feasible for my team since we have multiple talented folks who are paid
full-time to work on R, but it's a huge struggle for most people who are
generally maintaining packages in their spare time.

For the purposes of this discussion, could you please clarify what "documented
in the manual" means? For example, this line mentions allocXxx functions: "There
are quite a few allocXxx functions defined in Rinternals.h—you may want to
explore them." Does that imply that they are documented and free to use?

And in general, I'd urge R Core to make an explicit list of functions that
you consider to be part of the exported API, and then grandfather in
packages that used those functions prior to learning that we weren't
supposed to.

Hadley


-- 
http://hadley.nz




Re: [Rd] Is ALTREP "non-API"?

2024-04-23 Thread Hadley Wickham
>
>
>
> > > ALTREP is part of the official R api, as illustrated by the presence of
> > > src/include/R_ext/Altrep.h. Everything declared in the header files in
> that
> > > directory is official API AFAIK (and I believe that is more definitive
> than
> > > the manuals).
> > >
> >
> > That is not true at all - the presence of header does not constitute
> declaration of something as the R API. There are cases where internal
> functions are in the headers for historical or other reasons since the
> headers are used both for the internal implementation and packages. That's
> why this is in R-exts under "The R API: entry points for C code":
> >
> > If I understand your point correctly, does this mean that
> Rf_allocVector() is not part of the "official" R API? It does not appear to
> be documented in the "The R API: entry points for C code" section.
> >
>
> It does, obviously:
> https://cran.r-project.org/doc/manuals/R-exts.html#Allocating-storage-1


I'm just trying to understand the precise definition of the official API
here. So it's any function mentioned in R-exts, regardless of which section
it appears in?

Does this sentence imply that all functions starting with alloc* are part
of the official API?

> For many purposes it is sufficient to allocate R objects and manipulate
those. There are quite a
> few allocXxx functions defined in Rinternals.h—you may want to explore
them.

Generally, things in a file with "internal" in its name are internal.

Hadley

-- 
http://hadley.nz




Re: [Rd] Is ALTREP "non-API"?

2024-04-22 Thread Hadley Wickham
On Mon, Apr 22, 2024 at 5:14 PM Simon Urbanek 
wrote:

>
>
> > On Apr 22, 2024, at 7:37 PM, Gabriel Becker 
> wrote:
> >
> > Hi Yutani,
> >
> > ALTREP is part of the official R api, as illustrated by the presence of
> > src/include/R_ext/Altrep.h. Everything declared in the header files in
> that
> > directory is official API AFAIK (and I believe that is more definitive
> than
> > the manuals).
> >
>
> That is not true at all - the presence of a header does not constitute a
> declaration of something as the R API. There are cases where internal
> functions are in the headers for historical or other reasons since the
> headers are used both for the internal implementation and packages. That's
> why this is in R-exts under "The R API: entry points for C code":
>

If I understand your point correctly, does this mean that Rf_allocVector()
is not part of the "official" R API? It does not appear to be documented in
the "The R API: entry points for C code" section.

Hadley

-- 
http://hadley.nz




Re: [R-pkg-devel] @doctype is deprecated. need help for r package documentation

2024-03-07 Thread Hadley Wickham
Do you have a pointer to the roxygen2 comments that you're using?
Hadley

On Thu, Mar 7, 2024 at 5:38 AM Ruff, Sergej 
wrote:

> Hello,
>
> I need help with a package I am currently developing called bootGSEA.
>  I noticed that when I try ‘?bootGSEA’ it goes to the help page in R
> itself but not to the html page (we had this issue last time as well but we
> solved it by adding a documentation to the package itself to the R file)
> like from the before version of the package.
> I tried “_PACKAGE” in the documentation of the package section as @doctype
> is depreceated, but it still doesn’t seem to solve the issue. Do you have
> any idea on this?
>
>
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>


-- 
http://hadley.nz


__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] how to use pkgdown::build_site() with a project using S7 with a specialized plot()?

2024-01-03 Thread Hadley Wickham
2.png, reference/figures/force_diagram.R,
> and
> reference/figures/force_diagram.png
> Writing 404.html
> ── Building function reference ──
> ───
> ---
> Backtrace:
> 1. pkgdown::build_site()
> 2. pkgdown:::build_site_external(pkg = pkg, examples = examples,
> run_dont_run = run_don…
> 3. callr::r(function(..., cli_colors, pkgdown_internet) { …
> 4. callr:::get_result(output = out, options)
> 5. callr:::throw(callr_remote_error(remerr, output), parent =
> fix_msg(remerr[[3]]))
> ---
> Subprocess backtrace:
>  1. pkgdown::build_site(...)
>  2. pkgdown:::build_site_local(pkg = pkg, examples = examples,
> run_dont_run = run_dont_r…
>  3. pkgdown::build_reference(pkg, lazy = lazy, examples = examples,
> run_dont_run = run_…
>  4. pkgdown::build_reference_index(pkg)
>  5. pkgdown::render_page(pkg, "reference-index", data =
> data_reference_index(pkg), …
>  6. pkgdown:::render_page_html(pkg, name = name, data = data, depth =
> depth)
>  7. utils::modifyList(data_template(pkg, depth = depth), data)
>  8. base::stopifnot(is.list(x), is.list(val))
>  9. pkgdown:::data_reference_index(pkg)
> 10. meta %>% purrr::imap(data_reference_index_rows, pkg = pkg) %>% …
> 11. base::unlist(., recursive = FALSE)
> 12. purrr::compact(.)
> 13. purrr::discard(.x, function(x) is_empty(.f(x)))
> 14. purrr:::where_if(.x, .p, ...)
> 15. purrr:::map_(.x, .p, ..., .type = "logical", .purrr_error_call =
> .purrr_error_call)
> 16. purrr:::vctrs_vec_compat(.x, .purrr_user_env)
> 17. purrr::imap(., data_reference_index_rows, pkg = pkg)
> 18. purrr::map2(.x, vec_index(.x), .f, ...)
> 19. purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
> 20. purrr:::with_indexed_errors(i = i, names = names, error_call =
> .purrr_error_call, …
> 21. base::withCallingHandlers(expr, error = function(cnd) { …
> 22. purrr:::call_with_cleanup(map2_impl, environment(), .type, .progress,
> …
> 23. local .f(.x[[i]], .y[[i]], ...)
> 24. pkgdown:::section_topics(section$contents, pkg$topics, pkg$src_path)
> 25. base::rbind(topics, ext_topics(ext_strings))
> 26. base::rbind(deparse.level, ...)
> 27. pkgdown:::ext_topics(ext_strings)
> 28. purrr::map2(pkg, fun, get_rd_from_help)
> 29. purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
> 30. purrr:::with_indexed_errors(i = i, names = names, error_call =
> .purrr_error_call, …
> 31. base::withCallingHandlers(expr, error = function(cnd) { …
> 32. purrr:::call_with_cleanup(map2_impl, environment(), .type, .progress,
> …
> 33. local .f(.x[[i]], .y[[i]], ...)
> 34. rlang::check_installed(package, "as it's used in the reference
> index.")
> 35. base::stop(cnd)
> 36. (function (cnd) …
> 37. cli::cli_abort(message, location = i, name = name, parent = cnd, …
> 38. | rlang::abort(message, ..., call = call, use_cli_format = TRUE, …
> 39. | rlang:::signal_abort(cnd, .file)
> 40. | base::signalCondition(cnd)
> 41. (function (cnd) …
> 42. cli::cli_abort(message, location = i, name = name, parent = cnd, …
> 43. | rlang::abort(message, ..., call = call, use_cli_format = TRUE, …
> 44. | rlang:::signal_abort(cnd, .file)
> 45. | base::signalCondition(cnd)
> 46. global (function (e) …
> Execution halted
>
>
>
> On Jan 3, 2024, at 1:06 PM, Hadley Wickham  wrote:
>
> CAUTION: The Sender of this email is not from within Dalhousie.
> This bug is fixed in the dev version (I don’t remember off the top of my
> head in which of pkgdown and roxygen2 you need but it might be both). I’m
> planning CRAN updates for both in the near future.
>
> Hadley
>
> On Thursday, January 4, 2024, Daniel Kelley  wrote:
>
>> # Question
>>
>> Is there an online example of specializing `plot()` for S7
>> objects, such that `pkgdown::build_site()` will produce webpages?  I ask
>> because I find lots of users (of other packages) tend to consult websites
>> made with pkgdown, rather than using the online help within R.  I think the
>> problem I am having (discussed in the following sections) has to do with my
>> specialization of plot().  I say that because when I was using S3 objects
>> in an earlier version of my package, `pkgdown::build_site()` worked as
>> intended.
>>
>> # Background
>>
>> In my 'mooring' package (https://github.com/dankelley/mooring/tree/S7),
>> I am writing code like (https://github.com/dankelley/
>> mooring/blob/f70b53ca12e88968f65710c205b50a64f750a99d/R/plot.R#L69)
>>
>> ```R
>> #' @aliases plot.mooring
>> #' ETC
>> `plot.mooring::mooring` <- plot(ETC) ETC
>> ```
>>
>> to handle objects made with (https://github.com/dank

Re: [R-pkg-devel] how to use pkgdown::build_site() with a project using S7 with a specialized plot()?

2024-01-03 Thread Hadley Wickham
This bug is fixed in the dev version (I don’t remember off the top of my
head in which of pkgdown and roxygen2 you need but it might be both). I’m
planning CRAN updates for both in the near future.
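
In the meantime, a possible workaround sketch: in S7 the usual way to provide a
plot() method is to register it with method<- rather than defining a
backtick-quoted `plot.mooring::mooring` function by hand. Assuming the class
definition from your post:

```R
library(S7)

mooringS7 <- S7::new_class("mooring", package = "mooring")

# S7 can register methods on existing S3 generics such as plot();
# this avoids the backtick-quoted `plot.mooring::mooring` name that
# pkgdown's usage parser was choking on.
S7::method(plot, mooringS7) <- function(x, which = "shape", ...) {
  # plotting code goes here
}
```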

Hadley

On Thursday, January 4, 2024, Daniel Kelley  wrote:

> # Question
>
> Is there an online example of specializing `plot()` for S7 objects,
> such that `pkgdown::build_site()` will produce webpages?  I ask because I
> find lots of users (of other packages) tend to consult websites made with
> pkgdown, rather than using the online help within R.  I think the problem I
> am having (discussed in the following sections) has to do with my
> specialization of plot().  I say that because when I was using S3 objects
> in an earlier version of my package, `pkgdown::build_site()` worked as
> intended.
>
> # Background
>
> In my 'mooring' package (https://github.com/dankelley/mooring/tree/S7), I
> am writing code like (https://github.com/dankelley/mooring/blob/
> f70b53ca12e88968f65710c205b50a64f750a99d/R/plot.R#L69)
>
> ```R
> #' @aliases plot.mooring
> #' ETC
> `plot.mooring::mooring` <- plot(ETC) ETC
> ```
>
> to handle objects made with (https://github.com/dankelley/mooring/blob/
> f70b53ca12e88968f65710c205b50a64f750a99d/R/oo.R#L2)
>
> ```R
> mooringS7 <- S7::new_class("mooring",
> package = "mooring",
> ETC
> ```
>
> Built up in Rstudio, with Roxygen2 being used to create documentation,
> things seem to work, e.g.
>
> ```R
> m <- mooring(anchor(), wire(length = 80), float(), waterDepth = 100)
> plot(m)
> ```
>
> produces a plot as intended, and
>
> ```R
> ?plot.mooring
> ```
>
> produces documentation as intended.
>
> *However*, I encounter a problem when I try building a website with
>
> ```R
> pkgdown::build_site()
> ```
>
> This yields results as in the next section.  (I apologize for the length.
> I'm including the whole thing because I thought that would be less
> bothersome than writing another email to the list.)
>
> I am not sure how to find the problem, and so I hope that someone on this
> list can point out an example of how to set up `plot()` to work with S7
> objects, in such a way that documentation can be created with Roxygen2 and
> websites can be made with `pkgdown::build_site()`.
>
> # What pkgdown::build_site() gives
>
> ```
> > library(pkgdown)
> > build_site()
> Warning: Failed to parse usage:
>
> S3method(`plot`, ``mooring::mooring``)(
>   x,
>   which = "shape",
>   showInterfaces = TRUE,
>   showDepths = FALSE,
>   showLabels = TRUE,
>   showDetails = FALSE,
>   fancy = FALSE,
>   title = "",
>   mar = c(1.5, 3.5, 3.5, 1),
>   mgp = c(2, 0.7, 0),
>   xlim = NULL,
>   xaxs = "r",
>   yaxs = "r",
>   type = "l",
>   debug = 0,
>   ...
> )
>
> -- Installing package into temporary library 
> == Building pkgdown site ==
> =
> Reading from: '/Users/kelley/git/mooring'
> Writing to:   '/Users/kelley/git/mooring/docs'
> -- Initialising site --
> -
> -- Building home --
> -
> Writing '404.html'
> -- Building function reference --
> ---
> Error:
> ! in callr subprocess.
> Caused by error in `map2(.x, vec_index(.x), .f, ...)`:
> ! In index: 1.
> ℹ See `$stdout` for standard output.
> Type .Last.error to see the more details.
> > .Last.error
> 
> Error:
> ! in callr subprocess.
> Caused by error in `map2(.x, vec_index(.x), .f, ...)`:
> ! In index: 1.
> ℹ See `$stdout` for standard output.
> ---
> Backtrace:
> 1. pkgdown::build_site()
> 2. pkgdown:::build_site_external(pkg = pkg, examples = examples, run…
> 3. callr::r(function(..., cli_colors, pkgdown_internet) { …
> 4. callr:::get_result(output = out, options)
> 5. callr:::throw(callr_remote_error(remerr, output), parent = fix_…
> ---
> Subprocess backtrace:
>  1. pkgdown::build_site(...)
>  2. pkgdown:::build_site_local(pkg = pkg, examples = examples, run_do…
>  3. pkgdown::build_reference(pkg, lazy = lazy, examples = examples, …
>  4. pkgdown::build_reference_index(pkg)
>  5. pkgdown::render_page(pkg, "reference-index", data = data_referen…
>  6. pkgdown:::render_page_html(pkg, name = name, data = data, depth =…
>  7. utils::modifyList(data_template(pkg, depth = depth), da…
>  8. base::stopifnot(is.list(x), is.list(val))
>  9. pkgdown:::data_reference_index(pkg)
> 10. meta %>% purrr::imap(data_reference_index_rows, pkg = p…
> 11. base::unlist(., recursive = FALSE)
> 12. purrr::compact(.)
> 13. purrr::discard(.x, function(x) is_empty(.f(x)))
> 14. purrr:::where_if(.x, .p, ...)
> 15. purrr:::map_(.x, .p, ..., .type = "logical", .purrr_error_call …
> 16. purrr:::vctrs_vec_compat(.x, .purrr_user_env)
> 17. purrr::imap(., data_reference_index_rows, pkg = pkg)
> 18. purrr::map2(.x, vec_index(.x), .f, ...)
> 19. purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
> 20. purrr:::with_indexed_errors(i = i, 

Re: [R-pkg-devel] Discrepancy between R CMD check results and usethis::use_cran_comments

2023-09-26 Thread Hadley Wickham
On Tue, Sep 26, 2023 at 2:01 AM Leonard Mada via R-package-devel
 wrote:
>
> Dear List-Members,
>
> There are no errors/warnings/notes when I run the check:
>
> ── R CMD check results
>  Rpdb 2.3.3 
> Duration: 2m 50.1s
>
> 0 errors ✔ | 0 warnings ✔ | 0 notes ✔
>
> However, there is a discrepancy when I run:
> usethis::use_cran_comments(open = rlang::is_interactive())
> =>
> 0 errors ✔ | 0 warnings ✔ | 1 note
>
> For some reason, the file is saved with 1 note. The discrepancy remains
> even if I restart R, delete the old cran-comments.md file, and re-run
> the check.

This is just a template; you need to fill it out with the actual
results. (There's usually one note on first submission which is why we
use this as the default.)

Hadley

-- 
http://hadley.nz


Re: [R-pkg-devel] How to fix Archived Package Rpdb?

2023-09-08 Thread Hadley Wickham
On Fri, Sep 8, 2023 at 6:02 AM Leonard Mada via R-package-devel
 wrote:
>
> Dear Members,
>
> I would like to reanimate the archived package Rpdb:
> https://cran.r-project.org/web/packages/Rpdb/index.html
>
> 1.) I have tried to contact the original author by email, but got no
> response.
>
> 2.) New Repository on GitHub
> I have copied the existing code to a new repository on GitHub:
> https://github.com/discoleo/Rpdb
>
> - fixed the use of deprecated functions (rgl);
> - fixed some bug with Roxygen2;
> - I hope that all errors are now fixed;
>
> 2.b.) Description file
> - I left the original author as the author (with the provided e-mail
> address): should I delete this email?

It probably doesn't matter that much either way, but since the author
doesn't appear to respond to emails to that address, I personally
would lean towards deleting it.

> - I have added myself as maintainer;
> - I have increased the last digit of the version number;
> - I have added links to this new GitHub repository: I did not find any
> other links in the previous version (except to the pdb-format);
> - updated the licence to GPL v3: the original does not specify any
> version number;
>
>
> Is there anything else that needs to be done?

There are at least three R CMD check failures you need to address:

* The Authors@R field in DESCRIPTION is incorrectly formed, you need
something like this:
c(
  person("Leonard", "Mada", email = "leo.m...@syonic.eu", role = c("cre")),
  person("Julien", "Idé", role = c("aut"))
)

* You need to add LICENSE to .Rbuildignore or, IMO better, delete
that file and use usethis::use_gpl3_license(), which generates the license in
markdown form, correctly ignored for CRAN submission

* Many examples use `\%in\%` instead of `%in%`.

To make these sorts of problems easier to spot in the future I'd
suggest setting up a GitHub action to automatically run R CMD check
every time you push to GitHub. One easy way to do that is to run
usethis::use_github_action("check-standard").

Hadley

-- 
http://hadley.nz



Re: [Rd] Improving user-friendliness of S4 dispatch failure when mis-naming arguments?

2023-08-10 Thread Hadley Wickham
Hi Michael,

I can't help with S4, but I can help to make sure this isn't a problem
with S7. What do you think of the current error message? Do you see
anything obvious we could do to improve?

library(S7)

dbGetQuery <- new_generic("dbGetQuery", c("conn", "statement"))
dbGetQuery(connection = NULL, query = NULL)
#> Error: Can't find method for generic `dbGetQuery(conn, statement)`:
#> - conn : MISSING
#> - statement: MISSING

Hadley

On Wed, Aug 9, 2023 at 10:02 PM Michael Chirico via R-devel
 wrote:
>
> I fielded a debugging request from a non-expert user today. At root
> was running the following:
>
> dbGetQuery(connection = conn, query = query)
>
> The problem is that they've named the arguments incorrectly -- it
> should have been [1]:
>
> dbGetQuery(conn = conn, statement = query)
>
> The problem is that the error message "looks" highly confusing to the
> untrained eye:
>
> Error in (function (classes, fdef, mtable)  :   unable to find an
> inherited method for function ‘dbGetQuery’ for signature ‘"missing",
> "missing"’
>
> In retrospect, of course, this makes sense -- the mis-named arguments
> are getting picked up by '...', leaving the required arguments
> missing.
>
> But I was left wondering how we could help users right their own ship here.
>
> Would it help to mention the argument names? To include some code
> checking for weird combinations of missing arguments? Any other
> suggestions?
>
> Mike C
>
> [1] 
> https://github.com/r-dbi/DBI/blob/97934c885749dd87a6beb10e8ccb6a5ebea3675e/R/dbGetQuery.R#L62-L64
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
http://hadley.nz



Re: [R-pkg-devel] LICENSE file in an R package for CRAN submission

2023-08-09 Thread Hadley Wickham
If you're using one of the licenses supported by usethis
(https://usethis.r-lib.org/reference/licenses.html), you can just call
the appropriate function and it will do all the setup required to be
both CRAN and GitHub compatible.

Hadley

On Wed, Aug 9, 2023 at 10:10 AM Emanuele Cordano
 wrote:
>
> Dear list,
>
> is there a way to put the LICENSE file within an R package like on GitHub?
> I have an R package on GitHub with a LICENSE file compliant with GitHub and
> containing the text of the license cited in the DESCRIPTION file. But when
> I check the package, I obtained the following output:
>
> * checking top-level files ... NOTE
> File
>   LICENSE
>
> is not mentioned in the DESCRIPTION file.
>
> How can I solve this?
> Thank you
> best
> Emanuele Cordano
> --
> Emanuele Cordano, PhD
> Environmental Engineer / Ingegnere per l' Ambiente e il territorio nr.
> 3587 (Albo A - Provincia di Trento)
> cell: +39 3282818564
> email: emanuele.cord...@gmail.com,emanuele.cord...@rendena100.eu,
> emanuele.cord...@eurac.edu
> PEC: emanuele.cord...@ingpec.eu
> URL: www.rendena100.eu
> LinkedIn: https://www.linkedin.com/in/emanuele-cordano-31995333
> GitHub: https://github.com/ecor
>
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



-- 
http://hadley.nz



Re: [Rd] A demonstrated shortcoming of the R package management system

2023-08-08 Thread Hadley Wickham
Hi Dirk,

Do you think it's worth also/instead considering a fix to S4 to avoid
this caching issue in future R versions?

(This is top of my for me as we consider the design of S7, and I
recently made a note to ensure we avoid similar problems there:
https://github.com/RConsortium/OOP-WG/issues/317)

Hadley

On Sun, Aug 6, 2023 at 4:05 PM Dirk Eddelbuettel  wrote:
>
>
> CRAN, by relying on the powerful package management system that is part of R,
> provides an unparalleled framework for extending R with nearly 20k packages.
>
> We recently encountered an issue that highlights a missing element in the
> otherwise outstanding package management system. So we would like to start a
> discussion about enhancing its feature set. As shown below, a mechanism to
> force reinstallation of packages may be needed.
>
> A demo is included below, it is reproducible in a container. We find the
> easiest/fastest reproduction is by saving the code snippet below in the
> current directory as eg 'matrixIssue.R' and have it run in a container as
>
>docker run --rm -ti -v `pwd`:/mnt rocker/r2u Rscript /mnt/matrixIssue.R
>
> This runs in under two minutes, first installing the older Matrix, next
> installs SeuratObject, and then by removing the older Matrix making the
> (already installed) current Matrix version the default. This simulates a
> package update for Matrix. Which, as the final snippet demonstrates, silently
> breaks SeuratObject as the cached S4 method Csparse_validate is now missing.
> So when SeuratObject was installed under Matrix 1.5.1, it becomes unuseable
> under Matrix 1.6.0.
>
> What this shows is that a call to update.packages() will silently corrupt an
> existing installation.  We understand that this was known and addressed at
> CRAN by rebuilding all binary packages (for macOS and Windows).
>
> But it leaves both users relying on source installation as well as
> distributors of source packages in a dire situation. It hurt me three times:
> my default R installation was affected with unit tests (involving
> SeuratObject) silently failing. It similarly broke our CI setup at work.  And
> it created a fairly bad headache for the Debian packaging I am involved with
> (and I surmise it affects other distro similarly).
>
> It would be good to have a mechanism where a package, when being upgraded,
> could flag that 'more actions are required' by the system (administrator).
> We think this example demonstrates that we need such a mechanism to avoid
> (silently !!) breaking existing installations, possibly by forcing
> reinstallation of other packages.  R knows the package dependency graph and
> could trigger this, possibly after an 'opt-in' variable the user / admin
> sets.
>
> One possibility may be to add a new (versioned) field 'Breaks:'. Matrix could
> then have added 'Breaks: SeuratObject (<= 4.1.3)' preventing an installation
> of Matrix 1.6.0 when SeuratObject 4.1.3 (or earlier) is present, but
> permitting an update to Matrix 1.6.0 alongside a new version, say, 4.1.4 of
> SeuratObject which could itself have a versioned Depends: Matrix (>= 1.6.0).
>
> Regards,  Dirk
>
>
> ## Code example follows. Recommended to run the rocker/r2u container.
> ## Could also run 'apt update -qq; apt upgrade -y' but not required
> ## Thanks to my colleague Paul Hoffman for the core of this example
>
> ## now have Matrix 1.6.0 because r2u and CRAN remain current but we can install an older Matrix
> remotes::install_version('Matrix', '1.5.1')
>
> ## we can confirm that we have Matrix 1.5.1
> packageVersion("Matrix")
>
> ## we now install SeuratObject from source and to speed things up we first install the binary
> install.packages("SeuratObject")   # in this container via bspm/r2u as binary
> ## and then force a source installation (turning bspm off) _while Matrix is at 1.5.1_
> if (requireNamespace("bspm", quietly=TRUE)) bspm::disable()
> Sys.setenv(PKG_CXXFLAGS='-Wno-ignored-attributes')  # Eigen compilation noise silencer
> install.packages('SeuratObject')
>
> ## we now remove the Matrix package version 1.5.1 we installed into /usr/local leaving 1.6.0
> remove.packages("Matrix")
> packageVersion("Matrix")
>
> ## and we now run a bit of SeuratObject code that is now broken as Csparse_validate is gone
> suppressMessages(library(SeuratObject))
> data('pbmc_small')
> graph <- pbmc_small[['RNA_snn']]
> class(graph)
> getClass('Graph')
> show(graph) # this fails
>
>
> --
> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
http://hadley.nz



Re: [R-pkg-devel] Feedback on "Using Rust in CRAN packages"

2023-07-13 Thread Hadley Wickham
> > If CRAN cannot trust even the official one of Rust, why does CRAN have Rust 
> > at all?
> >
>
> I don't see the connection - if you downloaded something in the past it 
> doesn't mean you will be able to do so in the future. And CRAN has Rust 
> because it sounded like a good idea to allow packages to use it, but I can 
> see that it opened a can of worms that we are trying to tame here.

Can you give a bit more detail about your concerns here? Obviously
crates.io isn't some random site on the internet, it's the official
repository of the Rust language, supported by the corresponding
foundation for the language. To me that makes it feel very much like
CRAN, where we can assume if you downloaded something in the past, you
can download something in the future.

Hadley

-- 
http://hadley.nz



Re: [R-pkg-devel] Unfortunate function name generic.something

2023-05-08 Thread Hadley Wickham
If it's internal only, you could change the name to levels_no()?
Hadley

On Mon, May 8, 2023 at 7:28 AM Ulrike Groemping
 wrote:
>
> Thanks, Duncan. I appreciate the view that levels.no acts as an S3
> method for the generic levels, if an object of class "no" is handed to
> it. However, as the function is not intended as an S3 method, it does
> not make sense to document it as such. As the function is internal only,
> which makes the scenario that it causes trouble extremely unlikely, I
> will simply comment out the usage line for the function in order to get
> rid of the note but keep the usage visible. I hope that this is OK.
>
> Best, Ulrike
>
> Am 08.05.2023 um 13:58 schrieb Duncan Murdoch:
> > There really isn't such a thing as "a function that looks like an S3
> > method, but isn't".  If it looks like an S3 method, then in the proper
> > circumstances, it will be called as one.
> >
> > In your case the function name is levels.no, and it isn't exported.
> > So if you happen to have an object with a class inheriting from "no",
> > and you call levels() on it, levels.no might be called.
> >
> > This will only affect users of your package indirectly.  If they have
> > objects inheriting from "no" and call levels() on them, levels.no will
> > not be called.  But if they pass such an object to one of your package
> > functions, and that function calls levels() on it, they could end up
> > calling levels.no().  It all depends on what other classes that object
> > inherits from.
> >
> > You can test this yourself.  Set debugging on any one of your
> > functions, then call it in the normal way.  Then while still in the
> > debugger set debugging on levels.no, and create an object using
> >
> >   x <- structure(1, class = "no")
> >
> > and call levels(x).  You should break to the code of levels.no.
> >
> > That is why the WRE manual says "First, a caveat: a function named
> > gen.cl will be invoked by the generic gen for class cl, so do not name
> > functions in this style unless they are intended to be methods."
> >
> > So probably the best solution (even if inconvenient) is to rename
> > levels.no to something that doesn't look like an S3 method.
> >
> > Duncan Murdoch
> >
> > On 08/05/2023 5:50 a.m., Ulrike Groemping wrote:
> >> Thank your for the solution attempt. However, using the keyword internal
> >> does not solve the problem, the note is still there. Any other proposals
> >> for properly documenting a function that looks like an S3 method, but
> >> isn't?
> >>
> >> Best, Ulrike
> >>
> >> Am 05.05.2023 um 12:56 schrieb Iris Simmons:
> >>> You can add
> >>>
> >>> \keyword{internal}
> >>>
> >>> to the Rd file. Your documentation won't show up the in the pdf
> >>> manual, it won't show up in the package index, but you'll still be
> >>> able to access the doc page with ?levels.no or
> >>> help("levels.no").
> >>>
> >>> This is usually used in a package's deprecated and defunct doc pages,
> >>> but you can use it anywhere.
> >>>
> >>> On Fri, May 5, 2023, 06:49 Ulrike Groemping
> >>>  wrote:
> >>>
> >>>  Dear package developeRs,
> >>>
> >>>  I am working on fixing some notes regarding package DoE.base.
> One note refers to the function levels.no and
> >>>  complains that the
> >>>  function is not documented as a method for the generic function
> >>>  levels.
> >>>  Actually, it is not a method for the generic levels, but a
> >>> standalone
> >>>  internal function that I like to have documented.
> >>>
> >>>  Is there a way to document the function without renaming it and
> >>>  without
> >>>  triggering a note about method documentation?
> >>>
> >>>  Best, Ulrike
> >>>
> >>>  --
> >>>  ##
> >>>  ## Prof. Ulrike Groemping
> >>>  ## FB II
> >>>  ## Berliner Hochschule für Technik (BHT)
> >>>  ##
> >>>  ## prof.bht-berlin.de/groemping
> >>> 
> >>>  ## Phone: +49(0)30 4504 5127
> >>>  ## Fax:   +49(0)30 4504 66 5127
> >>>  ## Home office: +49(0)30 394 04 863
> >>>  ##
> >>>
> >>>  __
> >>>  R-package-devel@r-project.org mailing list
> >>>  https://stat.ethz.ch/mailman/listinfo/r-package-devel
> >>>
> >>
> >> __
> >> R-package-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-package-devel
> >
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



-- 
http://hadley.nz



Re: [R-pkg-devel] Acknowledging small functions from another package

2023-05-04 Thread Hadley Wickham
IMO those functions are so small that you don't need to call them out
in your DESCRIPTION. Just note in a nearby comment where they came
from.

Hadley
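For context, the sort of helper under discussion is tiny; a sketch of ggplot2's null-coalescing operator (MIT licensed), where a source comment really is credit enough:

```r
# %||%: return b when a is NULL, otherwise a (copied from ggplot2, MIT)
`%||%` <- function(a, b) if (is.null(a)) b else a

NULL %||% "default"   #> [1] "default"
"x"  %||% "default"   #> [1] "x"
```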

On Thu, May 4, 2023 at 3:21 AM David Hugh-Jones
 wrote:
>
> Hi,
>
> One of my packages copy-pasted some small functions (stuff like `%||%` for
> is.null) from ggplot2. (Both packages are MIT-licensed.)
>
> What is an appropriate way to acknowledge this in the DESCRIPTION Author:
> or Authors@R section? (Note that the list of ggplot2 authors is long and
> changing.)
>
> Cheers,
> David
>
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



-- 
http://hadley.nz



Re: [R-pkg-devel] Changing R Package Maintainer

2023-04-07 Thread Hadley Wickham
Just submit your package, and you'll get an automated email to the old address.

Hadley

On Sat, Apr 8, 2023 at 7:14 AM Andrew Simmons  wrote:
>
> Hi,
>
>
> I'm changing my name and my email address. I've got an update I'd like to
> submit to CRAN, I've changed my name and email in my DESCRIPTION.
>
> I couldn't find any details about changing maintainers in the R manuals
> unfortunately. Someone online said to just submit the update, CRAN will
> send one email to the new address confirming the submission, and another to
> the old address confirming the new maintainer. Someone else said to email
> CRAN from the old address about the new maintainer and their address, and
> wait for a response of approval before submission. It was unclear if that
> would be cran-submissi...@r-project.org or c...@r-project.org, but I'd
> guess the first.
>
> Has anyone else done this before or does anyone know the best procedure?
> Also, given that this isn't a transfer of ownership, I'm still the same
> person with a different name, would that make this process easier?
>
>
> Thank you!
>
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



-- 
http://hadley.nz



Re: [R-pkg-devel] Best way forward on a CRAN archived package

2022-10-16 Thread Hadley Wickham
I’d suggest resubmitting, after ensuring that R CMD check runs without any
notes, warnings, or errors.

Hadley

On Monday, October 10, 2022, Diego Hernangómez Herrero <
diego.hernangomezherr...@gmail.com> wrote:

> Hi:
>
> I have some doubts on how to proceed in this case. I am the developer of
> tidyterra, and I received an email from CRAN on 23Sep2022 about an issue on
> the package, setting a deadline on 07Oct2022 to correct it.
>
> I sent a patch that was accepted on CRAN on 29Sep2022, that fixed the issue
> (or at least I am pretty sure I solved it). I received no further feedback
> by CRAN, so I assumed the package was safe. However it was finally archived
> on 07Oct2022.
>
> I have already sent an email to CRAN in order to check if they think the
> issues still persist (or maybe they missed the patch submission?), but I am
> in a rush since there are other packages that depend on tidyterra and they
> may be in risk of being archived on CRAN as well.
>
> So my question is: What is the best way forward at this point? Should I
> wait to get some feedback from CRAN or is it best to resubmit the package
> (I already have a new patch prepared)? I acknowledge that  "The time of the
> volunteers is CRAN’s most precious resource" so my goal is to reduce their
> burden as much as possible.
>
> Kind regards
>
> --
>
>
>
> Have a nice day!
>
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>


-- 
http://hadley.nz




Re: [R-pkg-devel] Issue handling datetimes: possible differences between computers

2022-10-10 Thread Hadley Wickham
On Sun, Oct 9, 2022 at 9:31 PM Jeff Newmiller  wrote:
>
> ... which is why tidyverse functions and Python datetime handling irk me so 
> much.
>
> Is tidyverse time handling intrinsically broken? They have a standard 
> practice of reading time as UTC and then using force_tz to fix the "mistake". 
> Same as Python.

Can you point to any docs that lead you to this conclusion so we can
get them fixed? I strongly encourage people to parse date-times in the
correct time zone; this is why lubridate::ymd_hms() and friends have a
tz argument.
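
To make the contrast concrete, a small sketch (not from the original message; assumes the lubridate package is installed):

```r
library(lubridate)

# Recommended: parse directly in the intended zone via the tz argument
ymd_hms("2022-10-09 12:00:00", tz = "America/Chicago")

# The pattern being criticised: parse as UTC, then re-stamp the zone
force_tz(ymd_hms("2022-10-09 12:00:00"), tzone = "America/Chicago")
```

Both yield the same wall-clock time here, but the one-step form never mislabels the intermediate result as UTC.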

Hadley

-- 
http://hadley.nz



Re: [R-pkg-devel] CRAN package isoband and its reverse dependencies

2022-10-05 Thread Hadley Wickham
Yes, we will make sure that this is fixed ASAP. There is no need to worry.

Hadley

On Wed, Oct 5, 2022 at 7:32 AM John Harrold  wrote:
>
> Howdy Folks,
>
> I got a message from CRAN today telling me that I have a strong reverse
> dependency on the isoband package. But I'm not alone! It look like more
> than 4700 other packages also have a strong dependency on this. Is there
> some organized effort to deal with this?
>
> Thanks
> John
>
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



-- 
http://hadley.nz



Re: [R-pkg-devel] Non-ASCII and CRAN Checks

2022-09-20 Thread Hadley Wickham
In my experience this NOTE does not interfere with CRAN submission and you
can ignore it.

Hadley
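
(For anyone who wants to locate the offending strings rather than ignore the NOTE, a hypothetical snippet using base R's tools package:)

```r
# Print any lines containing non-ASCII characters in the package's R/ sources:
for (f in list.files("R", full.names = TRUE)) {
  tools::showNonASCIIfile(f)
}
```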

On Monday, September 19, 2022, Igor L  wrote:

> Hello everybody,
>
> I'm testing my package with the devtools::check() function and I got a
> warning about found non-ASCII strings.
>
> These characters are in a dataframe and, as they are names of institutions
> used to filter databases, it makes no sense to translate them.
>
> Is there any way to make the check accept these characters?
>
> They are in latin1 encoding.
>
> Thanks in advance!
>
> --
> *Igor Laltuf Marques*
> Economist (UFF)
> Master in urban and regional planning (IPPUR-UFRJ)
> Researcher at ETTERN e CiDMob
> https://igorlaltuf.github.io/
>
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>


-- 
http://hadley.nz




Re: [R-pkg-devel] [External] Re: What is a "retired"package?

2021-09-21 Thread Hadley Wickham
Yes, as Lionel said that is why we have changed our terminology to
superseded — we wanted to imply that a retired package was still a
useful member of society, if not working full-time anymore, but most
people seem to think that retired means that we took the package out
behind the shed and put it out of its misery.

Hadley

On Tue, Sep 21, 2021 at 1:43 PM Lenth, Russell V
 wrote:
>
> Hadley,
>
> As I suspected, and a good point. But please note that the term "retired" 
> causes angst, and it may be good to change that to "superseded" or something 
> else.
>
> As a side note, I'll mention that I myself am retired, and I'll claim that 
> that does not make me less dependable. But one difference in retirement is 
> that I now care less about public embarrassment, such as not knowing that all 
> along, I could have used base::apply instead of plyr::aaply.
>
> -Original Message-
> From: Hadley Wickham 
> Sent: Tuesday, September 21, 2021 11:48 AM
> To: Lenth, Russell V 
> Cc: Jeff Newmiller ; r-package-devel@r-project.org
> Subject: Re: [R-pkg-devel] [External] Re: What is a "retired"package?
>
> > But for the broader question, Jeff is saying that there really are 700 
> > packages that are in potential trouble!
>
> I think that's rather an overstatement of the problem — there's nothing wrong 
> with plyr; it's just no longer under active development.
> If anything, plyr is one of the safest packages to depend upon because you 
> can know it will never change :)
>
> Hadley
>
> --
> http://hadley.nz



-- 
http://hadley.nz



Re: [R-pkg-devel] [External] Re: What is a "retired"package?

2021-09-21 Thread Hadley Wickham
> But for the broader question, Jeff is saying that there really are 700 
> packages that are in potential trouble!

I think that's rather an overstatement of the problem — there's
nothing wrong with plyr; it's just no longer under active development.
If anything, plyr is one of the safest packages to depend upon because
you can know it will never change :)

Hadley

-- 
http://hadley.nz



Re: [R-pkg-devel] best LICENSE practices: AGPL-3 + LaTeX Project Public License

2021-07-15 Thread Hadley Wickham
On Wed, Jul 14, 2021 at 9:11 AM Ben Bolker  wrote:
>
>
>In the process of trying to get a package to build successfully on
> r-hub's Fedora platform, I had to add a whole bunch of LaTeX .sty files
> to the vignette directory.  One of these was collectbox.sty, which
> triggers the NOTE
>
> ---
> NOTE
> The following files contain a license that requires
> distribution of original sources:
>‘collectbox.sty’
> ---
>
>The licensing/copyright information in collectbox.sty is as follows:
>
>
> %% The original source files were:
> %%
> %% collectbox.dtx  (with options: `collectbox.sty')
> %%
> %% IMPORTANT NOTICE:
> %%
> %% For the copyright see the source file.
> %%
> %% Any modified versions of this file must be renamed
> %% with new filenames distinct from collectbox.sty.
> %%
> %% For distribution of the original source see the terms
> %% for copying and modification in the file collectbox.dtx.
> %%
> %% This generated file may be distributed as long as the
> %% original source files, as listed above, are part of the
> %% same distribution. (The sources need not necessarily be
> %% in the same archive or directory.)
> %% Copyright (C) 2012 by Martin Scharrer 
> %% 
> %% This work may be distributed and/or modified under the
> %% conditions of the LaTeX Project Public License, either version 1.3
> %% of this license or (at your option) any later version.
> %% The latest version of this license is in
> %%   http://www.latex-project.org/lppl.txt
> %% and version 1.3 or later is part of all distributions of LaTeX
> %% version 2005/12/01 or later.
>
> So I put collectbox.dtx into the inst/misc directory in the package.
> Fine.
>
>   Now, what do I need to do to (1) make sure that my DESCRIPTION file is
> correct and (2) hopefully, suppress the NOTE so I don't have to explain
> it to the CRAN maintainers every time?
>
> * Do I change the LICENCE line (which is currently AGPL-3)? According to
> https://cran.r-project.org/doc/manuals/R-exts.html#Licensing it would
> seem I would have to switch to "file LICENCE" (adding a
> "Licence_is_FOSS: yes"), where "LICENCE" contains something like
>
> package code licensed under AGPL-3; file vignettes/collectbox.sty is
> under the LaTeX Project Public License (source provided in
> misc/collectbox.dtx)
>
> ? Should it say "file LICENCE" or "AGPL-3 + file LICENCE" ?
>
> * Do I just include the files without comment, since I have complied (as
> far as I can tell) with the terms of the LPPL?

It's my understanding that the goal of the license field is to list
one license that the entire package can be distributed under (i.e. is
compatible with all licenses in the package). As long as you believe
that LPPL is compatible with the AGPL-3, then it's fine to keep the
license as AGPL-3.

I don't believe it would be correct to use "AGPL-3 + file LICENSE` as
R-exts only lists three uses of file LICENSE, none of which apply to
your case:

> If a package license restricts a base license (where permitted, e.g., using 
> GPL-3 or AGPL-3 with an
> attribution clause), the additional terms should be placed in file LICENSE 
> (or LICENCE), and the
> string ‘+ file LICENSE’ (or ‘+ file LICENCE’, respectively) should be 
> appended to the corresponding
> individual license specification.

> The optional file LICENSE/LICENCE contains a copy of the license of the 
> package...
> Whereas you should feel free to include a license file in your source 
> distribution, please do not arrange to
install yet another copy of the GNU COPYING or COPYING.LIB files ...
> Since files named LICENSE or LICENCE will be installed, do not use these 
> names for standard license files.

> A few “standard” licenses are rather license templates which need additional 
> information to be
> completed via ‘+ file LICENSE’.

I also recommend two additional changes:

* Include a LICENSE.note field that describes any parts of the package
that are available under other licenses.

* Add the authors of the included files to Authors@R

See https://r-pkgs.org/license.html#how-to-include for more details. I
haven't had any explicit feedback on these recommendations from CRAN
but they have worked for me in package submissions and align with my
(possibly flawed) understanding of CRAN policies and beliefs around
licensing.

Hadley

-- 
http://hadley.nz



Re: [R-pkg-devel] winUCRT failures

2021-04-25 Thread Hadley Wickham
Who is responsible for the winUCRT checks? Perhaps that person could
provide us with a list of root causes behind the testthat failures,
and we could look into resolving them.

Hadley

On Sun, Apr 25, 2021 at 7:50 AM Duncan Murdoch  wrote:
>
> The current CRAN release of rgl fails on winUCRT because of missing
> dependencies:
>
> 'htmlwidgets', 'htmltools', 'knitr', 'jsonlite', 'shiny', 'magrittr',
> 'crosstalk', 'manipulateWidget'.
>
> Tracing `htmlwidgets` shows it also fails because of missing dependencies:
>
> 'htmltools', 'jsonlite', 'yaml'
>
> and 'htmltools' fails because of missing dependencies
>
> 'digest', 'base64enc', 'rlang'
>
> but 'digest' only gets a warning (congratulations, Dirk!), 'base64enc'
> gets a NOTE (hurray Simon!).  'rlang' is failing a test because of a
> missing suggested dependency on 'glue'.  At that point I stopped searching.
>
> Does anyone have a list of packages that actually need fixes?  I'd like
> to help those maintainers with the necessary updates.
>
> Duncan Murdoch
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



-- 
http://hadley.nz



Re: [R-pkg-devel] winUCRT failures

2021-04-25 Thread Hadley Wickham
> One additional thought:
>
> If the testing package (i.e. testthat in this case) had been available
> but other suggested packages were not, it would be worth running tests
> with just testthat present:  that might be why you called the decision
> defensible.  I'd agree with that.
>
> However, it's still true that the fact that testthat has to be present
> to make magrittr available is a pretty serious flaw in magrittr and/or
> the CRAN processes.  Hopefully magrittr's authors are less stubborn than
> R Core/CRAN, and will make their package more resilient.

Isn't this a 0/0 problem? If there are zero failures from zero tests,
do we really want to declare that the package is ok?

I'm not interested in participating in another debate about whether or
not one should assume that suggested packages are available when
checking a package. Some time ago, we decided to install all suggested
packages when running reverse dependency checks and it has caused us
few problems (especially since linux binaries for all CRAN packages
are now readily available). Either way, isn't it easier for the
handful of experienced developers who perform many R CMD check runs to
install all suggested packages, rather than trying to get thousands of
individual package maintainers to change their behaviour?

Hadley



-- 
http://hadley.nz



Re: [Rd] replicate evaluates its second argument in wrong environment

2021-02-15 Thread Hadley Wickham
On Monday, February 15, 2021, David Winsemius 
wrote:

>
> On 2/15/21 1:10 PM, Hadley Wickham wrote:
>
>> This is a nice example of the motivation for tidy evaluation — since
>> enquo() captures the environment in which the promise should be
>> evaluated, there's no need for an additional explicit argument.
>>
>> library(rlang)
>>
>> replicate2 <- function (n, expr, simplify = "array") {
>>exnr <- enquo(expr)
>>
>
> It does not appear that the line above would accomplish anything given the
> succeeding line. Or am I missing something? Taking it out doesn't seem to
> affect results. Whatever magic there is seems to be in the `eval_tidy`
> function, whose mechanism or rules seem opaque. Was "exnr" supposed to be
> passed to `eval_tidy`?
>
>
Oops, yes, obviously that was supposed to be expr. It doesn’t matter in
Gabor’s example because it evaluates to a constant but obviously would
matter in other cases.

Hadley


> --
>
> David.
>
>>   sapply(integer(n), function(i) eval_tidy(expr), simplify = simplify)
>> }
>>
>> doRep2 <- function(a, b) sapply(a, replicate2, b)
>> doRep2(3, 2)
>> #>  [,1]
>> #> [1,]2
>> #> [2,]2
>> #> [3,]2
>>
>> Hadley
>>
>> On Sat, Feb 13, 2021 at 7:09 AM Gabor Grothendieck
>>  wrote:
>>
>>> Currently replicate used within sapply within a function can fail
>>> because it gets the environment for its second argument, which is
>>> currently hard coded to be the parent frame, wrong.  See this link for
>>> a full example of how it goes wrong and how it could be made to work
>>> if it were possible to pass an envir argument to it.
>>>
> >>> https://stackoverflow.com/questions/66184446/sapplya-replicate-b-expression-no-longer-works-inside-a-function/66185079#66185079
>>>
>>> --
>>> Statistics & Software Consulting
>>> GKX Group, GKX Associates Inc.
>>> tel: 1-877-GKX-GROUP
>>> email: ggrothendieck at gmail.com
>>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>>
>>

-- 
http://hadley.nz


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] replicate evaluates its second argument in wrong environment

2021-02-15 Thread Hadley Wickham
This is a nice example of the motivation for tidy evaluation — since
enquo() captures the environment in which the promise should be
evaluated, there's no need for an additional explicit argument.

library(rlang)

replicate2 <- function (n, expr, simplify = "array") {
  exnr <- enquo(expr)
  sapply(integer(n), function(i) eval_tidy(expr), simplify = simplify)
}

doRep2 <- function(a, b) sapply(a, replicate2, b)
doRep2(3, 2)
#>      [,1]
#> [1,]    2
#> [2,]    2
#> [3,]    2

Hadley
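
As Hadley acknowledges elsewhere in the thread, eval_tidy() was meant to receive the captured quosure; a corrected sketch:

```r
library(rlang)

replicate2 <- function(n, expr, simplify = "array") {
  expr <- enquo(expr)  # capture the expression together with its environment
  sapply(integer(n), function(i) eval_tidy(expr), simplify = simplify)
}

doRep2 <- function(a, b) sapply(a, replicate2, b)
doRep2(3, 2)  # a 3 x 1 matrix of 2s, as in the original example
```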

On Sat, Feb 13, 2021 at 7:09 AM Gabor Grothendieck
 wrote:
>
> Currently replicate used within sapply within a function can fail
> because it gets the environment for its second argument, which is
> currently hard coded to be the parent frame, wrong.  See this link for
> a full example of how it goes wrong and how it could be made to work
> if it were possible to pass an envir argument to it.
>
> https://stackoverflow.com/questions/66184446/sapplya-replicate-b-expression-no-longer-works-inside-a-function/66185079#66185079
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
http://hadley.nz



Re: [R-pkg-devel] Testing on old R versions

2021-01-31 Thread Hadley Wickham
On Sun, Jan 31, 2021 at 4:32 AM Duncan Murdoch  wrote:
>
> I am trying out a modified version of the tidyverse actions, and it does
> seem to be going well.  Just one question:
>
> rgl has a soft dependency on alphashape3d, and alphashape3d has a hard
> dependency on rgl.  This means that I need to install in the order
>
>rgl hard dependencies
>rgl
>rgl soft dependencies
>
> Currently I'm using this code to do that:
>
># First install rgl with minimal deps then the rest
>devtools::install()
>remotes::install_deps(dependencies = TRUE)
>
> but devtools seems unnecessarily heavy for this.  Does remotes have a
> way to specify the install in the right order?

I forget all the details here but instead of `devtools::install()`,
I'm pretty sure you can just use `remotes::install_local()`.

We also have an experimental workflow that uses pak
(https://github.com/r-lib/actions/blob/master/examples/check-pak.yaml);
and pak should know enough about the full dependency graph to install
all dependencies in the correct order without additional work. This
workflow (and pak itself) is still a bit experimental, but we've
started to move the tidyverse workflows to use pak, and haven't had
many problems recently.

Hadley

-- 
http://hadley.nz



Re: [Rd] New pipe operator and gg plotz

2020-12-09 Thread Hadley Wickham
Another option is https://github.com/hadley/ggplot1
Hadley

On Wed, Dec 9, 2020 at 1:24 PM Duncan Murdoch  wrote:
>
> Looks like Sergio Oller took your ambitious approach:
> https://github.com/zeehio/ggpipe.  It hasn't been updated since 2017, so
> there may be some new things in ggplot2 that aren't there yet.
>
> Duncan Murdoch
>
> On 09/12/2020 2:16 p.m., Greg Snow wrote:
> > Since `+` is already a function we could do regular piping to change this 
> > code:
> >
> > mtcars %>%
> >ggplot(aes(x=wt, y=mpg)) +
> >geom_point()
> >
> > to this:
> >
> > mtcars %>%
> >ggplot(aes(x=wt, y=mpg)) %>%
> >`+`(geom_point())
> >
> > Further we can write wrapper functions like:
> >
> > p_geom_point <- function(x,...) {
> >x + geom_point(...)
> > }
> >
> > The run the code like:
> >
> > mtcars %>%
> >ggplot(aes(x=wt, y=mpg)) %>%
> >p_geom_point()
> >
> > All three of the above give the same plot from what I can see, but I
> > have not tested it with very many options beyond the above.
> >
> > A really ambitious person could create a new package with wrappers for
> > all the ggplot2 functions that can come after the plus sign, then we
> > could use pipes for everything.  I don't know if there are any strange
> > circumstances that would make this cause problems (it probably will
> > slow things down slightly, but probably not enough for people to
> > notice).
> >
> > On Sun, Dec 6, 2020 at 7:18 PM Avi Gross via R-devel
> >  wrote:
> >>
> >> Thanks, Duncan. That answers my question fairly definitively.
> >>
> >> Although it can be DONE it likely won't be for the reasons Hadley 
> >> mentioned until we get some other product that replaces it entirely. There 
> >> are some interesting work-arounds mentioned.
> >>
> >> I was thinking of one that has overhead but might be a pain. Hadley 
> >> mentioned a slight variant. The first argument to a function now is 
> >> expected to be the data argument. The second might be the mapping. Now if 
> >> the function is called with a new first argument that is a ggplot object, 
> >> it could be possible to test the type and if it is a ggplot object than 
> >> slide over carefully any additional matched arguments that were not 
> >> explicitly named. Not sure that is at all easy to do.
> >>
> >> Alternately, you can ask that when used in such a pipeline that the user 
> >> call all other arguments using names like data=whatever, 
> >> mapping=aes(whatever) so no other args need to be adjusted by position.
> >>
> >> But all this is academic and I concede will likely not be done. I can live 
> >> with the plus signs.
> >>
> >>
> >> -Original Message-
> >> From: Duncan Murdoch 
> >> Sent: Sunday, December 6, 2020 2:50 PM
> >> To: Avi Gross ; 'r-devel' 
> >> Subject: Re: [Rd] New pipe operator and gg plotz
> >>
> >> Hadley's answer (#7 here:
> >> https://community.rstudio.com/t/why-cant-ggplot2-use/4372) makes it pretty 
> >> clear that he thinks it would have been nice now if he had made that 
> >> choice when ggplot2 came out, but it's not worth the effort now to change 
> >> it.
> >>
> >> Duncan Murdoch
> >>
> >> On 06/12/2020 2:34 p.m., Avi Gross via R-devel wrote:
> >>> As someone who switches back and forth between using standard R methods 
> >>> and those of the tidyverse, depending on the problem, my mood and whether 
> >>> Jupiter aligns with Saturn in the new age of Aquarius, I have a question 
> >>> about the forthcoming built-in pipe. Will it motivate anyone to 
> >>> eventually change or enhance the ggplot functionality to have a version 
> >>> that gets rid of the odd use of the addition symbol?
> >>>
> >>> I mean I now sometimes have a pipeline that looks like:
> >>>
> >>> Data %>%
> >>>Do_this %>%
> >>>Do_that(whatever) %>%
> >>>ggplot(...) +
> >>>geom_whatever(...) +
> >>>...
> >>>
> >>> My understanding is this is a bit of a historical anomaly that might 
> >>> someday be modified back.
> >>>
> >>> As I understand it, the call to ggplot() creates a partially filled-in 
> >>> object that holds all kinds of useful info. The additional calls to 
> >>> geom_point() and so on will add/change that hidden object. Nothing much 
> >>> happens till the object is implicitly or explicitly given to print() 
> >>> which switches to the print function for objects of that type and creates 
> >>> a graph based on the contents of the object at that time. So, in theory, 
> >>> you could have a pipelined version of ggplot where the first function 
> >>> accepts something like a  data.frame or tibble as the default first 
> >>> argument and at the end returns the object we have been describing. All 
> >>> additional functions would then accept such an object as the (hidden?) 
> >>> first argument and return the modified object. The final function in the 
> >>> pipe would either have the value captured in a variable for later use or 
> >>> print implicitly generating a graph.
> >>>
> >>> So the above 

Re: [Rd] New pipe operator

2020-12-08 Thread Hadley Wickham
I just wanted to pipe in here (HA HA) to say that I agree with Kevin.
I've never loved the complicated magrittr rule (which has personally
tripped me up a couple of times) and I think the compact inline
function syntax provides a more general solution. It is a bit more
typing, and it will require a little time for your eyes to get used to
the new syntax, but overall I think it's a better solution.

In general, I think the base pipe does an excellent job of taking what
we've learned from 6 years of magrittr, keeping what has been most
successful while discarding complications around the edges.

Hadley
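
A brief illustration of the two styles under discussion (hypothetical example; the lambda shorthand needs R >= 4.1):

```r
library(magrittr)

# magrittr: the placeholder rules decide where the LHS lands
mtcars %>% lm(mpg ~ wt, data = .)

# base pipe: no placeholder magic; an explicit inline lambda instead
mtcars |> (\(d) lm(mpg ~ wt, data = d))()
```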

On Mon, Dec 7, 2020 at 1:05 PM Kevin Ushey  wrote:
>
> IMHO the use of anonymous functions is a very clean solution to the
> placeholder problem, and the shorthand lambda syntax makes it much
> more ergonomic to use. Pipe implementations that crawl the RHS for
> usages of `.` are going to be more expensive than the alternatives. It
> is nice that the `|>` operator is effectively the same as a regular R
> function call, and given the identical semantics could then also be
> reasoned about the same way regular R function calls are.
>
> I also agree usages of the `.` placeholder can make the code more
> challenging to read, since understanding the behavior of a piped
> expression then requires scouring the RHS for usages of `.`, which can
> be challenging in dense code. Piping to an anonymous function makes
> the intent clear to the reader: the programmer is likely piping to an
> anonymous function because they care where the argument is used in the
> call, and so the reader of code should be aware of that.
>
> Best,
> Kevin
>
>
>
> On Mon, Dec 7, 2020 at 10:35 AM Gabor Grothendieck
>  wrote:
> >
> > On Mon, Dec 7, 2020 at 12:54 PM Duncan Murdoch  
> > wrote:
> > > An advantage of the current implementation is that it's simple and easy
> > > to understand.  Once you make it a user-modifiable binary operator,
> > > things will go kind of nuts.
> > >
> > > For example, I doubt if there are many users of magrittr's pipe who
> > > really understand its subtleties, e.g. the example in Luke's paper where
> > > 1 %>% c(., 2) gives c(1,2), but 1 %>% c(c(.), 2) gives c(1, 1, 2). (And
> > > I could add 1 %>% c(c(.), 2, .) and  1 %>% c(c(.), 2, . + 2)  to
> > > continue the fun.)
> >
> > The rule is not so complicated.  Automatic insertion is done unless
> > you use dot in the top level function or if you surround it with
> > {...}.  It really makes sense since if you use gsub(pattern,
> > replacement, .) then surely you don't want automatic insertion and if
> > you surround it with { ... } then you are explicitly telling it not
> > to.
> >
> > Assuming the existence of placeholders a possible simplification would
> > be to NOT do automatic insertion if { ... } is used and to use it
> > otherwise although personally having used it for some time I find the
> > existing rule in magrittr generally does what you want.
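
The insertion rule Gabor describes, illustrated (a sketch using magrittr, not part of the original message):

```r
library(magrittr)

1 %>% c(2)            # no dot used: auto-inserted as first argument -> c(1, 2)
1 %>% c(., 2)         # dot in the top-level call: no auto-insertion -> c(1, 2)
1 %>% c(c(.), 2)      # dot only nested: 1 is also auto-inserted    -> c(1, 1, 2)
1 %>% { c(c(.), 2) }  # braces suppress auto-insertion              -> c(1, 2)
```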
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
http://hadley.nz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] best practices for handling a mixed-licensed package

2020-10-03 Thread Hadley Wickham
This is why I recommend that if you copy an entire directory of code you
include the LICENSE file for that directory; if you copy a single file,
make the license clear in a comment at the top of the file. This is
standard practice in most open source communities.
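
For a single copied file, such a notice might look like this (a hypothetical sketch; the file name, project, and URL are invented):

```r
# hsv2rgb.R -- adapted from the hypothetical 'colourtools' project
# (https://example.org/colourtools), used under the MIT license.
# Copyright (c) 2020 colourtools authors; see LICENSE.note for the full text.
```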

If you’re writing open source code, I don’t think it’s necessary to retain
a lawyer in order to handle these commonplace issues. (OTOH if you’re
building a business on top of open source code, hiring a lawyer is
absolutely essential).

Hadley

On Saturday, October 3, 2020, Jeff Newmiller 
wrote:

> You are addressing interpretation of "a license", while my concern is not
> with the licenses themselves but with the identification of which code goes
> with which license. Assuming that you will need to retain lawyers to decide
> how to handle a license in a particular use case may be reasonable, but
> assuming you will also use them to parse the files in the package  seems
> rather less reasonable IMO when you have such a clear alternative
> (packaging).
>
> On October 3, 2020 9:02:02 AM PDT, Dirk Eddelbuettel 
> wrote:
> >
> >On 3 October 2020 at 09:54, Hadley Wickham wrote:
> >| I think this is a bit of an oversimplification, especially given that
> >| "compatibility" is not symmetric. For example, you can include MIT
> >license
> >| code in a GPL licensed package; you can not include GPL licensed code
> >| inside an MIT licensed package. There are some rough guidelines at
> >| https://r-pkgs.org/license.html#license-compatibility.
> >
> >One approach for issues such as legal matters is to consult
> >subject-matter
> >experts which is why I pointed (in a prior private message spawned by
> >this
> >same thread) to sites such as
> >
> >  https://tldrlegal.com/
> >  https://choosealicense.com/
> >
> >Dirk
>
> --
> Sent from my phone. Please excuse my brevity.
>


-- 
http://hadley.nz


__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] best practices for handling a mixed-licensed package

2020-10-03 Thread Hadley Wickham
On Fri, Oct 2, 2020 at 5:26 PM Dirk Eddelbuettel  wrote:

>
> On 2 October 2020 at 14:44, Jeff Newmiller wrote:
> | if you want clarity in the minds of _users_ I would beg you to split the
> code into two packages. People will likely either be afraid of the GPL
> bogey man and refrain from utilizing your MIT code as permitted or fail to
> honor the GPL terms correctly if both are in the same package.
>
> Have you read R's own doc/COPYRIGHTS ?
>
>https://github.com/wch/r-source/blob/trunk/doc/COPYRIGHTS
>
> In short the opposite of what you just suggest.
>
> Also, labels such as "more liberal" or "permissive" or "bogey man" are not
> exactly unambiguous.  Different people can and do have different views
> here.
> I would suggest using simpler terms such as "different". What matters is
> that
> the licenses permit open source use while ensuring they are compatible
> which
> is generally the case these days.
>

I think this is a bit of an oversimplification, especially given that
"compatibility" is not symmetric. For example, you can include MIT license
code in a GPL licensed package; you can not include GPL licensed code
inside an MIT licensed package. There are some rough guidelines at
https://r-pkgs.org/license.html#license-compatibility.

Hadley

-- 
http://hadley.nz


__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Including full text of open source licenses in a package

2020-09-12 Thread Hadley Wickham
On Saturday, September 12, 2020, Hugh Parsonage 
wrote:

> Perhaps I have misread that excerpt from WRE, but my read is that
> package authors should not duplicate GNU COPYING, since it is present
> in all R distributions already when using GPL-2 and friends.  It
> doesn't apply to packages distributed with other licenses.
>
>
The directory to which it refers, https://www.r-project.org/Licenses/,
includes many open source licenses, not just those used for R. I’m also
pretty sure I’ve had a package fail CRAN submission for this problem in the
past.


> It should be noted that in GPL FAQ just below the part you quoted it says
> > A clear statement in the program's README file is legally sufficient as
> long as that accompanies the code, but it is easy for them to get separated.
>

That question (https://www.gnu.org/licenses/gpl-faq.en.html#LicenseCopyOnly) is
about whether a copy of the license in a file is sufficient, or whether you
must also include a statement at the top of every source file.

Hadley


-- 
http://hadley.nz


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Including full text of open source licenses in a package

2020-09-11 Thread Hadley Wickham
Hi all,

R-exts currently requests that package authors don't include copies of
standard licenses:

> Whereas you should feel free to include a license file in your source 
> distribution, please do
> not arrange to install yet another copy of the GNU COPYING or COPYING.LIB 
> files but
> refer to the copies on https://www.R-project.org/Licenses/ and included in 
> the R distribution
> (in directory share/licenses). Since files named LICENSE or LICENCE will be 
> installed,
> do not use these names for standard license files.

I'd like to request that this condition be removed because it makes it
overly difficult to ensure that every version of your package (source,
tar.gz, binary, and installed) includes the full text of the license.
This is important because most open source licenses explicitly require
that you include the full text of the license. For example, the GPL
faq (http://www.gnu.org/licenses/gpl-faq.html#WhyMustIInclude) states:

> Why does the GPL require including a copy of the GPL with every copy of the 
> program?
> (#WhyMustIInclude)
>
> Including a copy of the license with the work is vital so that everyone who 
> gets a copy of
> the program can know what their rights are.
>
> It might be tempting to include a URL that refers to the license, instead of 
> the license
> itself. But you cannot be sure that the URL will still be valid, five years 
> or ten years from
> now. Twenty years from now, URLs as we know them today may no longer exist.
>
> The only way to make sure that people who have copies of the program will 
> continue
> to be able to see the license, despite all the changes that will happen in 
> the network,
> is to include a copy of the license in the program.

This analysis by an open source lawyer,
https://writing.kemitchell.com/2016/09/21/MIT-License-Line-by-Line.html#notice-condition,
reinforces the same message for the MIT license.

Currently we've been working around this limitation by putting a
markdown version of the license in LICENSE.md and then adding that to
.Rbuildignore (this ensures that the source version on GitHub includes
the license even if the CRAN version does not). Ideally, as well as
allowing us to include full text of licenses in LICENSE or
LICENSE.txt, a LICENSE.md at the top-level of the package would also
be explicitly permitted.
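
Concretely, the workaround amounts to keeping the full license text in LICENSE.md and excluding it from the CRAN tarball (a sketch; `usethis::use_mit_license()` automates this pattern):

```
# .Rbuildignore -- keep LICENSE.md in the Git repo but out of the tarball
^LICENSE\.md$
```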

Hadley

-- 
http://hadley.nz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Hadley Wickham
On Tue, Sep 8, 2020 at 4:12 AM Tomas Kalibera  wrote:
>
>
> The general principle is that R packages are only allowed to use what is
> documented in the R help (? command) and in Writing R Extensions. The
> former covers what is allowed from R code in extensions, the latter
> mostly what is allowed from C code in extensions (with some references
> to Fortran).

Could you clarify what you mean by "documented"? For example,
Rf_allocVector() is mentioned several times in R-exts, but I don't see
anywhere where the inputs and output are precisely described (which is
what I would consider to be documented). Is Rf_allocVector() part of
the API?

Hadley

-- 
http://hadley.nz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] object.size vs lobstr::obj_size

2020-03-27 Thread Hadley Wickham
On Fri, Mar 27, 2020 at 4:01 PM Hervé Pagès  wrote:

>
>
> On 3/27/20 12:00, Hadley Wickham wrote:
> >
> >
> > On Fri, Mar 27, 2020 at 10:39 AM Hervé Pagès  > <mailto:hpa...@fredhutch.org>> wrote:
> >
> > Hi Tomas,
> >
> > On 3/27/20 07:01, Tomas Kalibera wrote:
> >  > they provide an over-approximation
> >
> > They can also provide an "under-approximation" (to say the least)
> e.g.
> > on reference objects where the entire substance of the object is
> > ignored
> > which makes object.size() completely meaningless in that case:
> >
> > setRefClass("A", fields=c(stuff="ANY"))
> > object.size(new("A", stuff=raw(0)))  # 680 bytes
> > object.size(new("A", stuff=runif(1e8)))  # 680 bytes
> >
> > Why wouldn't object.size() look at the content of environments?
> >
> >
> > As the author, I'm obviously biased, but I do like lobstr::obj_sizes()
> > which allows you to see the additional size occupied by one object given
> > any number of other objects. This is particularly important for
> > reference classes since individual objects appear quite large:
> >
> > A <- setRefClass("A", fields=c(stuff="ANY"))
> > lobstr::obj_size(new("A", stuff=raw(0)))
> > #> 567,056 B
> >
> > But the vast majority is shared across all instances of that class:
> >
> > lobstr::obj_size(A)
> > #> 719,232 B
> > lobstr::obj_sizes(A, new("A", stuff=raw(0)))
> > #> * 719,232 B
> > #> * 720 B
> > lobstr::obj_sizes(A, new("A", stuff=runif(1e8)))
> > #> * 719,232 B
> > #> * 800,000,720 B
>
> Nice. Can you clarify the situation with lobstr::obj_size vs
> pryr::object_size? I've heard of the latter before and use it sometimes
> but never heard of the former before seeing Stefan's post. Then I
> checked the authors of both and thought maybe they should talk to each
> other ;-)
>

pryr is basically retired :) TBH I don't know why I gave up on it, except
lobstr is a cooler name. That's where all active development is
happening. (The underlying code is substantially similar although
lobstr includes bug fixes not present in pryr)

Hadley

-- 
http://hadley.nz


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] object.size vs lobstr::obj_size

2020-03-27 Thread Hadley Wickham
On Fri, Mar 27, 2020 at 11:08 AM Tomas Kalibera 
wrote:

> On 3/27/20 4:39 PM, Hervé Pagès wrote:
> > Hi Tomas,
> >
> > On 3/27/20 07:01, Tomas Kalibera wrote:
> >> they provide an over-approximation
> >
> > They can also provide an "under-approximation" (to say the least) e.g.
> > on reference objects where the entire substance of the object is
> > ignored which makes object.size() completely meaningless in that case:
> >
> >   setRefClass("A", fields=c(stuff="ANY"))
> >   object.size(new("A", stuff=raw(0)))  # 680 bytes
> >   object.size(new("A", stuff=runif(1e8)))  # 680 bytes
> >
> > Why wouldn't object.size() look at the content of environments?
>
> Yes, the treatment of environments is not "over-approximative". It has
> to be bounded somewhere, you can't traverse all captured environments,
> getting to say package namespaces, global environment, code of all
> functions, that would be too over-approximating. For environments used
> as hash maps that contain data, such as in reference classes, it would
> of course be much better to include them, but you can't differentiate
> programmatically. In principle the same environment can be used for both
> things, say a namespace environment can contain data (not clearly
> related to any user-level R object) as well as code. Not mentioning
> things like source references and parse data.
>
>
I think the heuristic used in lobstr works well in practice: don't traverse
further than the current environment (supplied as an argument so you can
override), and don't ever traverse past the global or base environments.
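
A sketch of that boundary behaviour using lobstr (sizes are illustrative, not exact):

```r
library(lobstr)

e <- new.env(parent = emptyenv())
e$x <- runif(1e6)

# An ordinary environment is traversed, so its contents are counted
obj_size(e)  # roughly 8 MB, dominated by e$x

# A captured global environment is treated as a sentinel and not followed,
# so a function's size does not swallow everything in .GlobalEnv
f <- function() NULL
environment(f) <- globalenv()
obj_size(f)
```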

Hadley

-- 
http://hadley.nz


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] object.size vs lobstr::obj_size

2020-03-27 Thread Hadley Wickham
On Fri, Mar 27, 2020 at 10:39 AM Hervé Pagès  wrote:

> Hi Tomas,
>
> On 3/27/20 07:01, Tomas Kalibera wrote:
> > they provide an over-approximation
>
> They can also provide an "under-approximation" (to say the least) e.g.
> on reference objects where the entire substance of the object is ignored
> which makes object.size() completely meaningless in that case:
>
>setRefClass("A", fields=c(stuff="ANY"))
>object.size(new("A", stuff=raw(0)))  # 680 bytes
>object.size(new("A", stuff=runif(1e8)))  # 680 bytes
>
> Why wouldn't object.size() look at the content of environments?
>

As the author, I'm obviously biased, but I do like lobstr::obj_sizes()
which allows you to see the additional size occupied by one object given
any number of other objects. This is particularly important for reference
classes since individual objects appear quite large:

A <- setRefClass("A", fields=c(stuff="ANY"))
lobstr::obj_size(new("A", stuff=raw(0)))
#> 567,056 B

But the vast majority is shared across all instances of that class:

lobstr::obj_size(A)
#> 719,232 B
lobstr::obj_sizes(A, new("A", stuff=raw(0)))
#> * 719,232 B
#> * 720 B
lobstr::obj_sizes(A, new("A", stuff=runif(1e8)))
#> * 719,232 B
#> * 800,000,720 B

Hadley
-- 
http://hadley.nz


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Rebuilding and re-checking of downstream dependencies on CRAN Mac build machines

2020-03-26 Thread Hadley Wickham
If I do install.packages("dplyr", type = "source"), I see:

Installing package into ‘/Users/hadley/R’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/dplyr_0.8.5.tar.gz'
Content type 'application/x-gzip' length 1378766 bytes (1.3 MB)
==
downloaded 1.3 MB

* installing *source* package ‘dplyr’ ...
** package ‘dplyr’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
ccache clang++ -Qunused-arguments
 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG
-I../inst/include -DRCPP_DEFAULT_INCLUDE_CALL=false -DCOMPILING_DPLYR
-DRCPP_USING_UTF8_ERROR_STRING -DRCPP_USE_UNWIND_PROTECT
-DBOOST_NO_AUTO_PTR  -I"/Users/hadley/R/BH/include"
-I"/Users/hadley/R/plogr/include" -I"/Users/hadley/R/Rcpp/include"
-isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
-I/usr/local/include  -fPIC  -Wall -g -O2  -c RcppExports.cpp -o
RcppExports.o
In file included from RcppExports.cpp:4:
In file included from ./../inst/include/dplyr.h:4:
In file included from ../inst/include/dplyr/main.h:6:
In file included from ../inst/include/dplyr/workarounds/static_assert.h:17:
In file included from /Users/hadley/R/BH/include/boost/config.hpp:57:
In file included from
/Users/hadley/R/BH/include/boost/config/platform/macos.hpp:28:
In file included from
/Users/hadley/R/BH/include/boost/config/detail/posix_features.hpp:18:
In file included from
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:655:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/gethostuuid.h:39:17:
error: unknown type name 'uuid_t'
int gethostuuid(uuid_t, const struct timespec *)
__OSX_AVAILABLE_STARTING(__MAC_10_5, __IPHONE_NA);
^
In file included from RcppExports.cpp:4:
In file included from ./../inst/include/dplyr.h:4:
In file included from ../inst/include/dplyr/main.h:6:
In file included from ../inst/include/dplyr/workarounds/static_assert.h:17:
In file included from /Users/hadley/R/BH/include/boost/config.hpp:57:
In file included from
/Users/hadley/R/BH/include/boost/config/platform/macos.hpp:28:
In file included from
/Users/hadley/R/BH/include/boost/config/detail/posix_features.hpp:18:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:662:27:
error: unknown type name 'uuid_t'; did you mean 'uid_t'?
int  getsgroups_np(int *, uuid_t);
  ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types/_uid_t.h:31:31:
note: 'uid_t' declared here
typedef __darwin_uid_t        uid_t;
  ^
In file included from RcppExports.cpp:4:
In file included from ./../inst/include/dplyr.h:4:
In file included from ../inst/include/dplyr/main.h:6:
In file included from ../inst/include/dplyr/workarounds/static_assert.h:17:
In file included from /Users/hadley/R/BH/include/boost/config.hpp:57:
In file included from
/Users/hadley/R/BH/include/boost/config/platform/macos.hpp:28:
In file included from
/Users/hadley/R/BH/include/boost/config/detail/posix_features.hpp:18:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:664:27:
error: unknown type name 'uuid_t'; did you mean 'uid_t'?
int  getwgroups_np(int *, uuid_t);
  ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types/_uid_t.h:31:31:
note: 'uid_t' declared here
typedef __darwin_uid_t        uid_t;
  ^
In file included from RcppExports.cpp:4:
In file included from ./../inst/include/dplyr.h:4:
In file included from ../inst/include/dplyr/main.h:6:
In file included from ../inst/include/dplyr/workarounds/static_assert.h:17:
In file included from /Users/hadley/R/BH/include/boost/config.hpp:57:
In file included from
/Users/hadley/R/BH/include/boost/config/platform/macos.hpp:28:
In file included from
/Users/hadley/R/BH/include/boost/config/detail/posix_features.hpp:18:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:727:31:
error: unknown type name 'uuid_t'; did you mean 'uid_t'?
int  setsgroups_np(int, const uuid_t);
  ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/_types/_uid_t.h:31:31:
note: 'uid_t' declared here
typedef __darwin_uid_t        uid_t;
  ^
In file included from RcppExports.cpp:4:
In file included from ./../inst/include/dplyr.h:4:
In file included from ../inst/include/dplyr/main.h:6:
In file included from ../inst/include/dplyr/workarounds/static_assert.h:17:
In file included from /Users/hadley/R/BH/include/boost/config.hpp:57:
In file included from
/Users/hadley/R/BH/include/boost/config/platform/macos.hpp:28:
In file included from
/Users/hadley/R/BH/include/boost/config/detail/posix_features.hpp:18:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:729:31:
error: unknown type name 'uuid_t'; did you mean 'uid_t'?
int  

[R-pkg-devel] win-builder down?

2020-02-22 Thread Hadley Wickham
Hi all,

Is win-builder down? I submitted a couple of packages >24 hours ago,
and haven't heard back.

Hadley

-- 
http://hadley.nz

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] class() |--> c("matrix", "array") [was "head.matrix ..."]

2019-11-14 Thread Hadley Wickham
On Sun, Nov 10, 2019 at 2:37 AM Martin Maechler
 wrote:
>
> > Gabriel Becker
> > on Sat, 2 Nov 2019 12:37:08 -0700 writes:
>
> > I agree that we can be careful and narrow and still see a
> > nice improvement in behavior. While Herve's point is valid
> > and I understand his frustration, I think staying within
> > the matrix vs c(matrix, array) space is the right scope
> > for this work in terms of fiddling with inheritance.
>
>  [.]
>
>
> > > Also, we seem to have a rule that inherits(x, c)  iff  c %in% class(x),
> >
> > good point, and that's why my usage of  inherits(.,.) was not
> > quite to the point.  [OTOH, it was to the point, as indeed from
> >   the ?class / ?inherits docu, S3 method dispatch and inherits
> >   must be consistent ]
> >
> > > which would break -- unless we change class(x) to return the whole
> > set of inherited classes, which I sense that we'd rather not do
>
>   []
>
> > Note again that both "matrix" and "array" are special [see ?class] as
> > being of  __implicit class__  and I am considering that this
> > implicit class behavior for these two should be slightly
> > changed 
> >
> > And indeed I think you are right on spot and this would mean
> > that indeed the implicit class
> > "matrix" should rather become c("matrix", "array").
>
> I've made up my mind (and not been contradicted by my fellow R
> corers) to try go there for  R 4.0.0   next April.

I can't seem to find the previous thread, so would you mind being a
bit more explicit here? Do you mean adding "array" to the implicit
class? Or adding it to the explicit class? Or adding it to inherits?
i.e. which of the following results are you proposing to change?

is_array <- function(x) UseMethod("is_array")
is_array.array <- function(x) TRUE
is_array.default <- function(x) FALSE

x <- matrix()
is_array(x)
#> [1] FALSE
x <- matrix()
inherits(x, "array")
#> [1] FALSE
class(x)
#> [1] "matrix"

It would be nice to make sure this is consistent with the behaviour of
integers, which have an implicit parent class of numeric:

is_numeric <- function(x) UseMethod("is_numeric")
is_numeric.numeric <- function(x) TRUE
is_numeric.default <- function(x) FALSE

x <- 1L
is_numeric(x)
#> [1] TRUE
inherits(x, "numeric")
#> [1] FALSE
class(x)
#> [1] "integer"
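
For reference, in R >= 4.0.0 (where the change under discussion shipped) the matrix results become:

```r
x <- matrix()
class(x)
#> [1] "matrix" "array"
inherits(x, "array")
#> [1] TRUE
```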

Hadley

-- 
http://hadley.nz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] improving the performance of install.packages

2019-11-09 Thread Hadley Wickham
If this is the behaviour you are looking for, you might like to try
pak (https://pak.r-lib.org)

# Create a temporary library
path <- tempfile()
dir.create(path)
.libPaths(path)

pak::pkg_install("scales")
#> → Will install 8 packages:
#>   colorspace (1.4-1), labeling (0.3), munsell (0.5.0), R6 (2.4.0),
RColorBrewer
#>   (1.1-2), Rcpp (1.0.2), scales (1.0.0), viridisLite (0.3.0)
#>
#> → Will download 2 CRAN packages (4.7 MB), cached: 6 (3.69 MB).
#>
#> ✔ Installed colorspace 1.4-1 [139ms]
#> ✔ Installed labeling 0.3 [206ms]
#> ✔ Installed munsell 0.5.0 [288ms]
#> ✔ Installed R6 2.4.0 [375ms]
#> ✔ Installed RColorBrewer 1.1-2 [423ms]
#> ✔ Installed Rcpp 1.0.2 [472ms]
#> ✔ Installed scales 1.0.0 [511ms]
#> ✔ Installed viridisLite 0.3.0 [569ms]
#> ✔ 1 + 7 pkgs | kept 0, updated 0, new 8 | downloaded 2 (4.7 MB) [2.8s]

pak::pkg_install("scales")
#> ✔ No changes needed
#> ✔ 1 + 7 pkgs | kept 7, updated 0, new 0 | downloaded 0 (0 B) [855ms]

remove.packages(c("Rcpp", "munsell"))
pak::pkg_install("scales")
#> → Will install 2 packages:
#>   munsell (0.5.0), Rcpp (1.0.2)
#>
#> → All 2 packages (4.88 MB) are cached.
#>
#> ✔ Installed munsell 0.5.0 [75ms]
#> ✔ Installed Rcpp 1.0.2 [242ms]
#> ✔ 1 + 7 pkgs | kept 6, updated 0, new 2 | downloaded 0 (0 B) [1.5s]

On Fri, Nov 8, 2019 at 1:07 AM Joshua Bradley  wrote:
>
> Hello,
>
> Currently if you install a package twice:
>
> install.packages("testit")
> install.packages("testit")
>
> R will build the package from source (depending on what OS you're using)
> twice by default. This becomes especially burdensome when people are using
> big packages (i.e. lots of depends) and someone has a script with:
>
> install.packages("tidyverse")
> ...
> ... later on down the script
> ...
> install.packages("dplyr")
>
> In this case, "dplyr" is part of the tidyverse and will install twice. As
> the primary "package manager" for R, it should not install a package twice
> (by default) when it can be so easily checked. Indeed, many people resort
> to writing a few lines of code to filter out already-installed packages An
> r-help post from 2010 proposed a solution to improving the default
> behavior, by adding "force=FALSE" as a api addition to install.packages.(
> https://stat.ethz.ch/pipermail/r-help/2010-May/239492.html)
>
> Would the R-core devs still consider this proposal?
>
> Josh Bradley
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
http://hadley.nz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Puzzled about a new method for "[".

2019-11-04 Thread Hadley Wickham
For what it's worth, I don't think this strategy can work in general,
because a class might have attributes that depend on its data/contents
(e.g. https://vctrs.r-lib.org/articles/s3-vector.html#cached-sum). I
don't think these are particularly common in practice, but it's
dangerous to assume that you can restore a class simply by restoring
its attributes after subsetting.
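
A toy sketch of the failure mode (the `cached_sum` attribute here is hypothetical):

```r
x <- structure(1:4, cached_sum = sum(1:4), class = "cached")

y <- unclass(x)[1:2]             # subsetting drops the attributes...
attributes(y) <- attributes(x)   # ...and naively restoring them
attr(y, "cached_sum")            # still 10, but sum(unclass(y)) is now 3
```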

Hadley

On Sun, Nov 3, 2019 at 3:11 PM Rolf Turner  wrote:
>
>
> I recently tried to write a new method for "[", to be applied to data
> frames, so that the object returned would retain (all) attributes of the
> columns, including attributes that my code had created.
>
> I thrashed around for quite a while, and then got some help from Rui
> Barradas who showed me how to do it, in the following manner:
>
> `[.myclass` <- function(x, i, j, drop = if (missing(i)) TRUE else
> length(cols) == 1) {
> SaveAt <- lapply(x, attributes)
> x <- NextMethod()
> lX <- lapply(names(x),function(nm, x, Sat){
>   attributes(x[[nm]]) <- Sat[[nm]]
>   x[[nm]]}, x = x, Sat = SaveAt)
> names(lX) <- names(x)
> x <- as.data.frame(lX)
> x
> }
>
> If I set class(X) <- c("myclass",class(X)) and apply "[" to X (e.g.
> something like X[1:42,]) the attributes are retained as desired.
>
> OK.  All good.  Now we finally come to my question!  I want to put this
> new method into a package that I am building.  When I build the package
> and run R CMD check I get a complaint:
>
> ... no visible binding for global variable ‘cols’
>
> And indeed, there is no such variable.  At first I thought that maybe
> the code should be
>
> `[.myclass` <- function(x, i, j, drop = if (missing(i)) TRUE else
>    length(j) == 1) {
>
> But I looked at "[.data.frame" and it has "cols" too; not "j".
>
> So why doesn't "[.data.frame" throw a warning when R gets built?
>
> Can someone please explain to me what's going on here?
>
> cheers,
>
> Rolf
>
> P. S. I amended the code for my method, replacing "cols" by "j", and it
> *seems* to run, and deliver the desired results.  (And the package
> checks, without complaint.) I am nervous, however, that there may be
> some Trap for Young Players that I don't perceive, lurking about and
> waiting to cause problems for me.
>
> R.
>
> --
> Honorary Research Fellow
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
http://hadley.nz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Require -package.Rd?

2019-09-30 Thread Hadley Wickham
On Tue, Sep 24, 2019 at 8:07 AM Georgi Boshnakov
 wrote:
>
> It is worth noting that
>
> help(package="<pkgname>")
>
> shows file <pkgname>-package.Rd, while
>
> help(<pkgname>)
>
> shows topic "<pkgname>-package".
>
> Topic <pkgname>-package.Rd is also printed at the top of the pdf manual,
> while <pkgname>.Rd follows the alphabetical ordering of the remaining topics.
> It is unfortunate that Hadley Wickham's tools (at least 'pkgdown') recommend
> and use <pkgname>.Rd, instead of <pkgname>-package.Rd, as the overall package
> description.

I'm not sure what led you to that belief, but we definitely support
(and have supported for years) package?<pkgname>. See, e.g.
https://usethis.r-lib.org/reference/use_package_doc.html

Hadley

-- 
http://hadley.nz

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] OFFICIAL: R-devel check error: package or NAMESPACE load failed, there is no package called broom

2019-09-10 Thread Hadley Wickham
This works for me locally too, so I'd recommend trying win-devel
again. Sometimes you catch it in an inconsistent state and your check
fails for reasons unrelated to your package.

Hadley

On Sat, Sep 7, 2019 at 3:50 PM Georgina Anderson
 wrote:
>
> OFFICIAL
>
> Hi
>
> Any help with the following update to my package PHEindicatormethods would be 
> appreciated.
>
> I have made a very minor change to the package to fix dependencies on the 
> tidyr:nest() function as tidyr v 1.0.0 is due to be released with breaking 
> changes on 9th September.
> The version I am trying to upload to CRAN is 1.1.5 available here: 
> https://github.com/PublicHealthEngland/PHEindicatormethods
>
>
> When I run devtools::check_win_devel locally (Windows 10 laptop, R 3.6.1, 
> rStudio 1.2.1335) it passes with no NOTES, ERRORS or WARNINGS (locally it 
> also passes devtools::check_win on release and oldrelease and 
> devtools::check_rhub)
>
> When I submit to CRAN I have been notified that package 
> PHEindicatormethods_1.1.5.tar.gz does not pass the incoming checks 
> automatically, signposting failing pre-tests on Windows:
>
> * using R Under development (unstable) (2019-09-02 r77130)
>
> * using platform: x86_64-w64-mingw32 (64-bit)
>
> * using session charset: ISO8859-1
>
>
>
> This is the log file available for 7 days: 
> https://win-builder.r-project.org/incoming_pretest/PHEindicatormethods_1.1.5_20190905_224346/Windows/00check.log
>
> Below are two sections from the log that show the errors:
>
>
> > library('PHEindicatormethods')
>
> Error: package or namespace load failed for 'PHEindicatormethods' in 
> loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
>
>  there is no package called 'broom'
>
> Execution halted
>
> ** running examples for arch 'x64' ... ERROR
>
> Running examples in 'PHEindicatormethods-Ex.R' failed
>
> The error occurred in:
>
> R Under development (unstable) (2019-09-02 r77130) -- "Unsuffered 
> Consequences"
>
> Copyright (C) 2019 The R Foundation for Statistical Computing
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>
> Error: processing vignette 'WorkedExamples_phe_sii.Rmd' failed with 
> diagnostics:
>
> package or namespace load failed for 'PHEindicatormethods' in 
> loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
>
>  there is no package called 'broom'
>
> --- failed re-building 'WorkedExamples_phe_sii.Rmd'
>
> SUMMARY: processing the following file failed:
>
>   'WorkedExamples_phe_sii.Rmd'
>
> Error: Vignette re-building failed.
>
> Execution halted
>
> I have checked the broom package and its upstream dependency on generics - I 
> notice the tidy() function was moved from broom to generics a while back but 
> this was before I submitted the current version (1.1.3) of my package to CRAN 
> so not sure this can be the problem, although one of my functions does 
> reference broom::tidy().  The evidence seems to point to changes in R-devel 
> causing something in my package to break but without being able to reproduce 
> the errors seen by CRAN I'm finding it difficult to pinpoint the problem.
>
> Thanks
> Georgie
>
>
> [[alternative HTML version deleted]]
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



-- 
http://hadley.nz

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] (Not) Reporting minimum R version in DESCRIPTION Depends Field

2019-07-30 Thread Hadley Wickham
On Fri, Jul 26, 2019 at 8:46 AM Ben Bolker  wrote:
>
>I'd add: as far as I know CRAN policy would only require you to
> report the minimum version if your package would fail on one of the
> older R versions that CRAN still tests (at present I think this goes
> back to 3.5.2).  For regular package developers (i.e.  without the
> resources of RStudio) it's a nuisance to install lots of old versions of
> R just to find out how far back your package actually works.

If you already use travis, this is pretty easy. All you need is something like:

matrix:
  include:
  - r: release
  - r: devel
  - r: oldrel
  - r: 3.4
  - r: 3.3
  - r: 3.2
  - r: 3.1

Obviously the challenge is making things work on older versions of R —
in our experience this is either really easy (i.e. you just
accidentally used a function that was added to R recently) or
rather hard (your code only works in recent R because of a bug that R
core fixed).

Hadley

> IMO it's a
> little bit obnoxious (albeit maybe unintentionally so) to prevent users
> who are stuck with earlier versions from using your package without a
> good reason.
>   (For this reason I try to save RData files that are included in my
> packages with version=2 so that users with R <3.5.0 aren't left out ...)
>
>   Unless you know that your package depends on features introduced in a
> particular version of R, I'd leave it out.  If you find out later from
> users that it breaks on earlier versions, and you can't find a way to
> enable its use on earlier versions, you can add the dependency in a
> later release.
>
>  Ben Bolker
>
> On 2019-07-26 10:20 a.m., Hadley Wickham wrote:
> > I no longer believe this to be good advice - I think you should only
> > declare a specific dependency if you want to strongly assert that your
> > package works with those versions. For example, all tidyverse versions
> > depend on R 3.2 and later, because we test on all those versions.
> >
> > Hadley
> >
> > On Friday, July 26, 2019, Jarrett Phillips 
> > wrote:
> >
> >> Hello,
> >>
> >> Numerous CRAN packages report minimum R versions within the Depends field
> >> of the DESCRIPTION file.
> >>
> >> Is this reporting always necessary?
> >>
> >> Hadley Wickham's book "R Packages" states:
> >>
> >> "You can also use Depends to require a specific version of R, e.g.
> >> Depends: R
> >> (>= 3.0.1) . As with packages, it’s a good idea to play it safe and require
> >> a version greater than or equal to the version you’re currently using.
> >> devtools::create()  will do this for you."
> >>
> >> devtools::check() doesn't warn about not specifying R version, so I am just
> >> curious.
> >>
> >>
> >> Thanks.
> >>
> >> Cheers,
> >>
> >> Jarrett
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> __
> >> R-package-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-package-devel
> >>
> >
> >
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



-- 
http://hadley.nz

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] (Not) Reporting minimum R version in DESCRIPTION Depends Field

2019-07-26 Thread Hadley Wickham
I no longer believe this to be good advice - I think you should only
declare a specific dependency if you want to strongly assert that your
package works with those versions. For example, all tidyverse versions
depend on R 3.2 and later, because we test on all those versions.
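
In DESCRIPTION this is a one-field declaration; the version shown here
just mirrors the tidyverse example above:

```
Depends:
    R (>= 3.2)
```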

Hadley

On Friday, July 26, 2019, Jarrett Phillips 
wrote:

> Hello,
>
> Numerous CRAN packages report minimum R versions within the Depends field
> of the DESCRIPTION file.
>
> Is this reporting always necessary?
>
> Hadley Wickham's book "R Packages" states:
>
> "You can also use Depends to require a specific version of R, e.g.
> Depends: R
> (>= 3.0.1) . As with packages, it’s a good idea to play it safe and require
> a version greater than or equal to the version you’re currently using.
> devtools::create()  will do this for you."
>
> devtools::check() doesn't warn about not specifying R version, so I am just
> curious.
>
>
> Thanks.
>
> Cheers,
>
> Jarrett
>
> [[alternative HTML version deleted]]
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>


-- 
http://hadley.nz

[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

2019-05-16 Thread Hadley Wickham
The existing behaviour seems intuitive to me. I would consider these
invariants for n vectors x_i, each of length m:

* nrow(rbind(x_1, x_2, ..., x_n)) equals n
* ncol(rbind(x_1, x_2, ..., x_n)) equals m

Additionally, wouldn't you expect rbind(x_1[i], x_2[i]) to equal
rbind(x_1, x_2)[, i, drop = FALSE] ?
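
A quick console sketch of those invariants, including the zero-length
case that started the thread:

```r
x1 <- c(a = 1, b = 2)
x2 <- c(a = 3, b = 4)
m  <- rbind(x1, x2)

nrow(m)   # 2: one row per vector bound (n = 2)
ncol(m)   # 2: the common length of the vectors (m = 2)

# The same invariants applied to zero-length vectors give a 2 x 0 matrix:
dim(rbind(character(), character()))   # 2 0
```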

Hadley

On Thu, May 16, 2019 at 3:26 PM Gabriel Becker  wrote:
>
> Hi all,
>
> Apologies if this has been asked before (a quick google didn't find it for
> me), and I know this is a case of behaving as documented, but it's so
> unintuitive (to me at least) that I figured I'd bring it up here anyway. I
> figure it's probably not going to be changed, but I'm happy to submit a
> patch if this is something R-core feels can/should change.
>
> So I recently got bitten by the fact that
>
> > nrow(rbind(character(), character()))
>
> [1] 2
>
>
> I was checking whether the result of an rbind call had more than one row,
> and that unexpectedly returned true, causing all sorts of shenanigans
> downstream as I'm sure you can imagine.
>
> Now I know that from ?rbind
>
> For ‘cbind’ (‘rbind’), vectors of zero length (including ‘NULL’)
> >
> >  are ignored unless the result would have zero rows (columns), for
> >
> >  S compatibility.  (Zero-extent matrices do not occur in S3 and are
> >
> >  not ignored in R.)
> >
>
> But there's a couple of things here. First, for the rbind case this
> reads as "if there would be zero columns, the vectors will not be
> ignored". This wording implies to me that not ignoring the vectors is a
> remedy to the "problem" of the potential for a zero-column return, but
> that's not the case. The result still has 0 columns, it just does not also
> have zero rows. So even if the behavior is not changed, perhaps this
> wording can be massaged for clarity?
>
> The other issue, which I admit is likely a problem with my intuition, but
> which I don't think I'm alone in having, is that even if I can't have a 0x0
> matrix (which is what I'd prefer) I would have expected/preferred a 1x0
> matrix, the reasoning being that if we must avoid a 0x0 return value, we
> would do the minimum required to avoid it, which is to not ignore the first
> length 0 vector, to ensure a non-zero-extent matrix, but then ignore the
> remaining ones as they contain information for 0 new rows.
>
> Of course I can program around this now that I know the behavior, but
> again, it's so unintuitive (even for someone with a fairly well developed
> intuition for R's sometimes "quirky" behavior) that I figured I'd bring it
> up.
>
> Thoughts?
>
> Best,
> ~G
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
http://hadley.nz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] CRAN student assistants

2019-05-16 Thread Hadley Wickham
Hi all,

The thread seems to have drifted off topic. I really didn't want this
to devolve into a discussion about when cat() or message() is more
appropriate — I have complete faith in Jenny Bryan's ability to
understand technical tradeoffs and pick the most appropriate given the
constraints. I am most interested in understanding what level of
discretion CRAN's "Studentischer administrativer Mitarbeiter" have to
critique the implementation of R packages, particularly for packages
that do not yield R CMD check problems or otherwise violate CRAN
policies.

I mean no disrespect towards the CRAN maintainers (whose tireless
efforts are a crucial part of making R the success that it is), but I
don't think it's unreasonable to enquire as to who is involved in a
crucial piece of open source community infrastructure, and, if
students are involved, what their scope of work is and how they are
trained and supervised.

I do recognise that my question "Who are they?" may have been
perceived as overly intrusive. To clarify: I don't want to know names
or other personally identifying information, but I would like to know
in general terms how many there are, and what backgrounds they have.
Similarly, I don't want to know how much they are paid, just whether
or not they are volunteers or employees.

Hadley

On Tue, May 14, 2019 at 10:23 AM Hadley Wickham  wrote:
>
> Hi all,
>
> Several people on my team have received responses to their CRAN
> submissions from new members of the CRAN team who appear to be student
> assistants (judging from their job titles: "Studentischer
> administrativer Mitarbeiter"). From the outside, they appear to be
> exercising editorial control[^1] and conducting design reviews[^2].
>
> CRAN is a critical piece of R community infrastructure, and I am sure
> these students have been surrounded by the proper checks and balances,
> but it's not obvious what their role is from the outside. I'd really
> appreciate knowing a little more about them:
>
> * Who are they?
>
> * Are they paid employees or volunteers?
>
> * What is their scope of work?
>
> * How are they trained?
>
> * If we believe that they have made a mistake, how do we request
>   review from a senior CRAN member?
>
> * They appear to be able to apply additional discretionary criteria
>   that are not included in R CMD check or documented in the CRAN policies.
>   Is this true? If so, what is the scope of these additional checks?
>
> Hadley
>
> [^1]: The devoid package was rejected because the student assistant
> did not understand the purpose of the package.
>
> [^2]: The gargle package was rejected because the student assistant
> believed that the use of cat() was incorrect. It was not.
>
> --
> http://hadley.nz



-- 
http://hadley.nz

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[R-pkg-devel] CRAN student assistants

2019-05-14 Thread Hadley Wickham
Hi all,

Several people on my team have received responses to their CRAN
submissions from new members of the CRAN team who appear to be student
assistants (judging from their job titles: "Studentischer
administrativer Mitarbeiter"). From the outside, they appear to be
exercising editorial control[^1] and conducting design reviews[^2].

CRAN is a critical piece of R community infrastructure, and I am sure
these students have been surrounded by the proper checks and balances,
but it's not obvious what their role is from the outside. I'd really
appreciate knowing a little more about them:

* Who are they?

* Are they paid employees or volunteers?

* What is their scope of work?

* How are they trained?

* If we believe that they have made a mistake, how do we request
  review from a senior CRAN member?

* They appear to be able to apply additional discretionary criteria
  that are not included in R CMD check or documented in the CRAN policies.
  Is this true? If so, what is the scope of these additional checks?

Hadley

[^1]: The devoid package was rejected because the student assistant
did not understand the purpose of the package.

[^2]: The gargle package was rejected because the student assistant
believed that the use of cat() was incorrect. It was not.

-- 
http://hadley.nz

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Discrepancy between is.list() and is(x, "list")

2019-03-28 Thread Hadley Wickham
On Wed, Mar 27, 2019 at 6:27 PM Abs Spurdle  wrote:
>
> > the prison made by ancient design choices
>
> That prison of ancient design choices isn't so bad.
>
> I have no further comments on object oriented semantics.
> However, I'm planning to follow the following design pattern.
>
> If I set the class of an object, I will append the new class to the
> existing class.
>
> #good
> class (object) = c ("something", class (object) )
>
> #bad
> class (object) = "something"
>
> I encourage others to do the same.

I don't think this is a good pattern. It's better to clearly define a
constructor function that checks that `object` is the correct
underlying base type for your class -
https://adv-r.hadley.nz/s3.html#s3-classes.
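
A minimal sketch of that pattern — the class name and constructor here are
illustrative only, not from any package:

```r
# Constructor: validate the underlying base type, then stamp the class.
new_something <- function(x = list()) {
  stopifnot(is.list(x))                 # wrong base type fails loudly, up front
  structure(x, class = "something")
}

s <- new_something(list(a = 1))
class(s)                                # "something"
# new_something(1:3) errors immediately, instead of producing a malformed
# object that only fails much later, far from the cause.
```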

Hadley

-- 
http://hadley.nz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Discrepancy between is.list() and is(x, "list")

2019-03-27 Thread Hadley Wickham
I would recommend reading https://adv-r.hadley.nz/base-types.html and
https://adv-r.hadley.nz/s3.html. Understanding the distinction between
base types and S3 classes is very important to make this sort of
question precise, and in my experience, you'll find R easier to
understand if you carefully distinguish between them. (And hence you
shouldn't expect is.x(), inherits(, "x") and is(, "x") to always
return the same results)

Also note that many of is.*() functions are not testing for types or
classes, but instead often have more complex semantics. For example,
is.vector() tests for objects with an underlying base vector type that
have no attributes (apart from names). is.numeric() tests for objects
with base type integer or double, and that have the same algebraic
properties as numbers.
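
A few console lines illustrating those semantics:

```r
x <- 1:3
is.vector(x)             # TRUE: base vector type, no attributes
attr(x, "foo") <- "bar"
is.vector(x)             # FALSE: any attribute other than names disqualifies it

f <- factor(c("a", "b"))
typeof(f)                # "integer": the underlying base type
is.numeric(f)            # FALSE: factors lack the algebraic properties of numbers
```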

Hadley

On Mon, Mar 25, 2019 at 10:28 PM Abs Spurdle  wrote:
>
> > I have noticed a discrepancy between is.list() and is(x, “list”)
>
> There's a similar problem with inherits().
>
> On R 3.5.3:
>
> > f = function () 1
> > class (f) = "f"
>
> > is.function (f)
> [1] TRUE
> > inherits (f, "function")
> [1] FALSE
>
> I didn't check what happens with:
> > class (f) = c ("f", "function")
>
> However, they should have the same result, regardless.
>
> > Is this discrepancy intentional?
>
> I hope not.
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
http://hadley.nz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Checking for future file timestamps - warning with worldclockapi HTTP status 403 Site Disabled

2019-03-07 Thread Hadley Wickham
As of ~7 hours ago, the warning is suppressed:
https://github.com/wch/r-source/commit/31ee14c620eb1b939acd322f3b5617f998aab8e8

(But the service still doesn't work)

Hadley

On Thu, Mar 7, 2019 at 11:03 AM Hadley Wickham  wrote:
>
> It appears that the code was added by BDR on 2 Sep 2018:
> https://github.com/wch/r-source/commit/d839b1e04e173f90b51ad809ef0bdb18095abe6f
>
> I assume we are seeing failing R CMD check results because
> http://worldclockapi.com/api/json/utc/now has recently died.
>
> It would be appreciated if someone from R-core could look into this as
> it's currently causing all R-devel builds on travis to fail.
>
> Hadley
>
> On Thu, Mar 7, 2019 at 9:32 AM Bob Rudis  wrote:
> >
> > (a) that's gd news (#ty)
> > (b) genuine apologies for my confusion
> > (c) why was the introduction of reliance on a third-party site even under 
> > consideration?
> >
> > > On Mar 7, 2019, at 09:32, peter dalgaard  wrote:
> > >
> > > It's not "stealth fixed"! It was never there... (on the release branch)
> > >
> > > The timestamp checking code is still present in R-devel. I presume 
> > > something needs to be done about the breakage.
> > >
> > > - pd
> > >
> > >> On 7 Mar 2019, at 14:38 , Bob Rudis  wrote:
> > >>
> > >> It's fixed in the RC that's GA on the 11th.
> > >>
> > >> I think perhaps "stealth fixed" may be more apropos since it's not in 
> > >> SVN logs, Bugzilla nor noted prominently in any of the various NEWS* 
> > >> files.
> > >>
> > >> Then there's the "why was the core R installation using a third party, 
> > >> non-HTTPS site for this to begin with".
> > >>
> > >> And, in other news, there are tests in the R source that rely on a check 
> > >> of `foo.bar` for connectivity. `.bar` is a valid domain and `foo.bar` is 
> > >> registered. Thankfully there's no current IP address associated with it. 
> > >> Anything under `*.invalid` (https://en.wikipedia.org/wiki/.invalid) 
> > >> might be a better choice as well since that won't break the reason for 
> > >> the connectivity checks and won't arbitrarily send telemetry pings to 
> > >> third parties in the even anyone outside of R Core decides to run the 
> > >> tests (say, when patching something in R).
> > >>
> > >> -boB
> > >>
> > >>> On Mar 7, 2019, at 07:54, Rainer M Krug  wrote:
> > >>>
> > >>> I can confirm the same when checking on travis with r-devel.
> > >>>
> > >>> And thanks for the tip with
> > >>>
> > >>> env:
> > >>> - _R_CHECK_SYSTEM_CLOCK_=0
> > >>>
> > >>> In .travis.yml
> > >>>
> > >>> Seems to be working now
> > >>>
> > >>> Rainer
> > >>>
> > >>>
> > >>>
> > >>>> On 7 Mar 2019, at 12:48, Ralf Herold  wrote:
> > >>>>
> > >>>> Dear All,
> > >>>>
> > >>>> Checking a new package under development produces a warning in a local 
> > >>>> R-devel MS Windows environment (output below).
> > >>>>
> > >>>> Building it with R-devel on Travis fails (because warnings are changed 
> > >>>> to errors), but is successful when setting environment variable 
> > >>>> _R_CHECK_SYSTEM_CLOCK_ to zero.
> > >>>>
> > >>>> No issue occurs when checking and building with R-stable and R-oldrel 
> > >>>> on Travis, or with any R version on win-builder.r-project.org.
> > >>>>
> > >>>> The warning concerns using http://worldclockapi.com/, which however 
> > >>>> seems out of service ("The web app you have attempted to reach is 
> > >>>> currently stopped and does not accept any requests."). This is 
> > >>>> referenced in the main function for R CMD check 
> > >>>> (https://svn.r-project.org/R/trunk/src/library/tools/R/check.R) and 
> > >>>> may concern more R-devel than R-package-devel. I am posting here to 
> > >>>> check if the issue was noticed by other package developers and to 
> > >>>> check the impact.
> > >>>>
> > >>>> Thanks for any suggestions.
> > >>>> Best regards,
> > >>>> Ralf
>

Re: [R-pkg-devel] Checking for future file timestamps - warning with worldclockapi HTTP status 403 Site Disabled

2019-03-07 Thread Hadley Wickham
It appears that the code was added by BDR on 2 Sep 2018:
https://github.com/wch/r-source/commit/d839b1e04e173f90b51ad809ef0bdb18095abe6f

I assume we are seeing failing R CMD check results because
http://worldclockapi.com/api/json/utc/now has recently died.

It would be appreciated if someone from R-core could look into this as
it's currently causing all R-devel builds on travis to fail.

Hadley

On Thu, Mar 7, 2019 at 9:32 AM Bob Rudis  wrote:
>
> (a) that's gd news (#ty)
> (b) genuine apologies for my confusion
> (c) why was the introduction of reliance on a third-party site even under 
> consideration?
>
> > On Mar 7, 2019, at 09:32, peter dalgaard  wrote:
> >
> > It's not "stealth fixed"! It was never there... (on the release branch)
> >
> > The timestamp checking code is still present in R-devel. I presume 
> > something needs to be done about the breakage.
> >
> > - pd
> >
> >> On 7 Mar 2019, at 14:38 , Bob Rudis  wrote:
> >>
> >> It's fixed in the RC that's GA on the 11th.
> >>
> >> I think perhaps "stealth fixed" may be more apropos since it's not in SVN 
> >> logs, Bugzilla nor noted prominently in any of the various NEWS* files.
> >>
> >> Then there's the "why was the core R installation using a third party, 
> >> non-HTTPS site for this to begin with".
> >>
> >> And, in other news, there are tests in the R source that rely on a check 
> >> of `foo.bar` for connectivity. `.bar` is a valid domain and `foo.bar` is 
> >> registered. Thankfully there's no current IP address associated with it. 
> >> Anything under `*.invalid` (https://en.wikipedia.org/wiki/.invalid) might 
> >> be a better choice as well since that won't break the reason for the 
> >> connectivity checks and won't arbitrarily send telemetry pings to third 
> >> parties in the event anyone outside of R Core decides to run the tests 
> >> (say, when patching something in R).
> >>
> >> -boB
> >>
> >>> On Mar 7, 2019, at 07:54, Rainer M Krug  wrote:
> >>>
> >>> I can confirm the same when checking on travis with r-devel.
> >>>
> >>> And thanks for the tip with
> >>>
> >>> env:
> >>> - _R_CHECK_SYSTEM_CLOCK_=0
> >>>
> >>> In .travis.yml
> >>>
> >>> Seems to be working now
> >>>
> >>> Rainer
> >>>
> >>>
> >>>
>  On 7 Mar 2019, at 12:48, Ralf Herold  wrote:
> 
>  Dear All,
> 
>  Checking a new package under development produces a warning in a local 
>  R-devel MS Windows environment (output below).
> 
>  Building it with R-devel on Travis fails (because warnings are changed 
>  to errors), but is successful when setting environment variable 
>  _R_CHECK_SYSTEM_CLOCK_ to zero.
> 
>  No issue occurs when checking and building with R-stable and R-oldrel on 
>  Travis, or with any R version on win-builder.r-project.org.
> 
>  The warning concerns using http://worldclockapi.com/, which however 
>  seems out of service ("The web app you have attempted to reach is 
>  currently stopped and does not accept any requests."). This is 
>  referenced in the main function for R CMD check 
>  (https://svn.r-project.org/R/trunk/src/library/tools/R/check.R) and may 
>  concern more R-devel than R-package-devel. I am posting here to check if 
>  the issue was noticed by other package developers and to check the 
>  impact.
> 
>  Thanks for any suggestions.
>  Best regards,
>  Ralf
> 
> 
>  PS C:\Users\username> & 'C:\Program Files\R\R-devel\bin\R.exe' CMD check 
>  E:\mypackage_0.1.2.3.tar.gz --as-cran
>  * using log directory 'C:/Users/username/ctrdata.Rcheck'
>  * using R Under development (unstable) (2019-03-05 r76200)
>  * using platform: x86_64-w64-mingw32 (64-bit)
>  * using session charset: ISO8859-1
>  * using option '--as-cran'
>  [...]
>  * checking package directory ... OK
>  * checking for future file timestamps ...Warning in file(con, "r") :
>  cannot open URL 'http://worldclockapi.com/api/json/utc/now': HTTP status 
>  was '403 Site Disabled'
>  WARNING
>  unable to verify current time
>  * checking 'build' directory … OK
>  [...]
> 
> 
> 
>  ## Ralf Herold
>  ## mailto: ralf.her...@mailbox.org [S/MIME]
>  ## https://paediatricdata.eu/
> 
>  __
>  R-package-devel@r-project.org mailing list
>  https://stat.ethz.ch/mailman/listinfo/r-package-devel
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Rainer M. Krug
> >>> email: Rainerkrugsde
> >>> PGP: 0x0F52F982
> >>>
> >>>
> >>> [[alternative HTML version deleted]]
> >>>
> >>> __
> >>> R-package-devel@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
> >>
> >> __
> >> R-package-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-package-devel
> >
> > --
> > Peter Dalgaard, Professor,
> > Center for 

Re: [R-pkg-devel] submitting to github

2019-01-25 Thread Hadley Wickham
No one else has mentioned it on the thread, so I'd highly recommend
https://happygitwithr.com — it's a guide to git + github specifically
written for R users, and covers many of the common problems people
have when getting set up.

Hadley

On Fri, Jan 25, 2019 at 2:51 AM Troels Ring  wrote:
>
> Dear friends - I'm sorry to bother but seem to be unable to interact
> constructively with github.
>
>
>
> I try to follow the instructions from Hadley (thanks) - i.e. I have a
> small trial-project which functions well. Since I have tried many times I
> start from shell with
>
> rm -rf .git
>
> and then select version control using git (tools, project options,git/svn) -
> and origin is still marked as "none" after restarting RStudio.
>
> Then from shell again: git init
>
> Yielding
>
> Initialized empty Git repository in
> C:/Users/Troels/Dropbox/Rown/ABCharge/.git/
>
> Rstudio restarted, package reopened: origin still "none"
>
> Git panel appears OK.
>
>
>
> Now from github: add new repository (non present after prior deletions!)
>
> Named as package name - repeated in description - repository created
>
>
>
> Shell opened from RStudio
>
> git remote add origin https://github.com/troelsring/ABCharge.git  - works
> without problems - an origin seems correctly accepted in RStudio - but then:
>
> git push -u origin master  - results in:
>
>
>
> error: src refspec master does not match any.
>
> error: failed to push some refs to
> 'https://github.com/troelsring/ABCharge.git' below in red
>
>
>
> I seem (also!) to have problems with the SSH keys - Rstudio marks that I
> have a key in c:/Users/Troels/.ssh/id_rsa -
>
> but when I run file.exists("~/.ssh/id_rsa.pub")
> [1] FALSE -  Is returned - but that is not the issue I guess? I have anyway
> made a public key as suggested.
>
> I have spent hours seeking on the many pages for explanation for this
> probably simple problem.
> All best
> Troels
>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



-- 
http://hadley.nz

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] New CRAN internet policy

2018-12-21 Thread Hadley Wickham
On Fri, Dec 7, 2018 at 2:48 AM Martin Maechler
 wrote:
>
> >>>>> Hadley Wickham
> >>>>> on Thu, 6 Dec 2018 10:22:47 -0600 writes:
>
> > Hi all,
> > I'd love to get some clarification on what the new internet policy
> > means for packages like httr:
>
> >> Packages which use Internet resources should fail gracefully with an 
> informative
> >> message if the resource is not available (and not give a check warning 
> nor error).
>
> > It's not clear what "internet resource" means here? If it means
> > dataset, then I think the httr tests and examples are ok. If it means
> > any use of the internet, I'm not sure what do - httr critically
> > depends on internet access, so I can't see any way to make it fail
> > gracefully.
>
> > Hadley
>
> I cannot answer your question, notably as I'm not part of the CRAN
> team, but as R Core developer, I've encountered the problem
> many times which this policy tries to mitigate
> (but I also think we should consider to go further than the
>  above "policy") :
>
> As R developer, I'd like to see the effect of a change to the
> sources of base R, and so eventually, I may want to run the
> equivalent of 'R CMD check' on all existing CRAN and
> Bioconductor packages. If I have access to a server with many
> cores and very fast hard disks, I can hope to finish running the
> tests within 1--2 days.
> But then I have to deal with the result.  The few times I've
> done this, the result has been "a mess" because many many
> packages  nowadays assume in their examples and their regression/unit
> tests that internet access to some resources works, ... which it
> "often" does not, and so  download.file(),
> read.table("http://.;) etc result in errors sooner or later.
>
> Because of that some packages fail their checks "randomly" (in
> the sense that internet resources are not available "randomly").
> Ideally we'd find a very good way that these failures are
> communicated back to the person / process running (a version of)
> 'R CMD check', because in the above scenario, I'd like to weed
> out the 300 packages that just failed because of internet
> resource access failures,  and only look at the other packages
> that got a change in their 'R CMD check' results.

We now have decent tooling for this in revdepcheck
(https://www.github.com/r-lib/revdepcheck, planning for CRAN
submission next year). After performing all the revdepchecks, you can
run revdep_add_broken() to recheck packages that failed in the
previous round - in my experience testing httr (whose revdeps
obviously use the internet a lot) this resolves most of the randomness
(since it's fairly unlikely to get two random failures in a row).
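
A sketch of that workflow, assuming the revdepcheck API as described
above (not run here, since it needs an installed package plus its
reverse dependencies):

```r
# Sketch only: assumes the revdepcheck package from
# https://www.github.com/r-lib/revdepcheck is installed.
library(revdepcheck)

revdep_check(".", num_workers = 4)  # first full pass over reverse dependencies
revdep_add_broken()                 # queue just the packages that failed
revdep_check(".")                   # second pass: re-check only those
```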

My main concern about making the checking in examples and tests
stricter is that I think the primary result is that people will simply
do less testing and write fewer realistic examples, which is a net
negative for the community. When you want people to do the right
thing, I think you have to provide a carrot along with the stick.

> The recent introduction in R-devel of classed error conditions
> (in some cases), e.g., 
> https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2018/10/04#n2018-10-04,
>   and the similar and somewhat earlier
> effort of Lionel Henry to use classed error conditions (in
> rlang only, unfortunately, rather than as a patch proposal to R ..)
> maybe one step towards a nice solution here.

We'd be happy to propose a patch to base R, but it's not yet clear to
us exactly how things should work so I think it makes the most sense
to first prototype in a package.

Hadley

-- 
http://hadley.nz

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Package update submission to CRAN fails on pretest

2018-12-08 Thread Hadley Wickham
You might try reinstalling devtools and dependencies - there was
unfortunately a brief combination of versions that led to build()
failing to overwrite existing files.

Hadley
On Fri, Dec 7, 2018 at 10:46 AM Wolfgang Lenhard
 wrote:
>
> Many thanks for the remark. It seems, it has something to do with
> submitting via the devtools (maybe I did something wrong with tagging
> the prior release). Submitting it manually at least does not result in
> pretest rejection. I guess there is something out of sync which resulted
> in rechecking and old version.
>
> Am 07.12.2018 um 13:46 schrieb Georgi Boshnakov:
> > The link you gave, 
> > https://cran.r-project.org/web/checks/check_results_cNORM.html,
> > is to the check results for the package currently on CRAN and it is indeed 
> > v. 1.0.1
> > (it is linked from https://CRAN.R-project.org/package=cNORM).
> >
> > Was this really the link you got from CRAN's pretest? Sometimes they ask if 
> > care has been taken about NOTEs/WARNINGs for the current CRAN version, 
> > since some tests are not done during submission, but this doesn't seem the 
> > case here.
> >
> > Georgi Boshnakov
> >
> >
> > -Original Message-
> > From: R-package-devel [mailto:r-package-devel-boun...@r-project.org] On 
> > Behalf Of Wolfgang Lenhard
> > Sent: 07 December 2018 07:55
> > To: r-package-devel@r-project.org
> > Subject: [R-pkg-devel] Package update submission to CRAN fails on pretest
> >
> > Dear list,
> > I am getting problems when trying to submit an update of the package
> > cNORM to CRAN. I am developing the package with RStudio and devtools and
> > I am using Travis for automatic testing. The package is tested locally
> > on Win10 and Mac OS X and on Travis with Ubuntu and Mac both for
> > development and release versions of R. All local tests and tests on
> > Travis work flawlessly - no errors, warning or notes. When submitting to
> > CRAN, a note and an error show up on some of the Linux OS (Fedora &
> > Solaris) and Mac OS X, while others display an 'OK' (Win, Debian). The
> > results: https://cran.r-project.org/web/checks/check_results_cNORM.html
> >
> > - error: This seems to be related to the vignette with the following
> > message:
> >> * checking examples ... ERROR
> >> Running examples in ‘cNORM-Ex.R’ failed
> > I can however not identify the location of the error
> >
> > - Note: Check: data for non-ASCII characters
> >
> > The strange thing is: I checked all data files multiple times. They
> > mainly consist of data.frames with numerics and all colnames  are ASCII.
> > I am not able to replicate the issue. The same is true for the error,
> > which does not show up on Travis and as well locally. And finally, the
> > results state, that the version of the package is 1.0.1, which had been
> > the first submission to CRAN a month ago. The current version of the
> > package is 1.1.1. Could this be the reason for the problem?
> >
> > Do you have an idea how to progress with the testing or how to locate
> > the errors? Any help is welcome.
> >
> > Best regards,
> >   Wolfgang Lenhard
> >
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-package-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> --
> Prof. Dr. Wolfgang Lenhard
> Lehrstuhl Psychologie IV
> Raum 02.130
> Wittelsbacherplatz 1
> D-97070 Würzburg
>
> Tel.: 0931 3189791
> FAX:  0931 3184891
> URL:  https://go.uniwue.de/lenhard
> Map:  https://go.uni-wuerzburg.de/3b
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



-- 
http://hadley.nz



Re: [R-pkg-devel] Suggested package relies on recent R

2018-12-08 Thread Hadley Wickham
Can you just set _R_CHECK_FORCE_SUGGESTS_=false?

env:
  global:
  # don't treat missing suggested packages as error
  - _R_CHECK_FORCE_SUGGESTS_=false

I am reasonably certain that is what CRAN uses.

Hadley

On Fri, Dec 7, 2018 at 9:11 AM David Hugh-Jones
 wrote:
>
> Hi,
>
> My package Suggests a package that relies on R >= 3.5.0. My package works
> fine with earlier R, though. When travis runs R CMD check on R-oldrel,
> therefore, it fails because it can't install the suggested package.
>
> Is this going to stop me submitting to CRAN? I don't really want to require
> R 3.5.0 just to satisfy an optional dependency.
>
> Cheers,
> David
>
> [[alternative HTML version deleted]]
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



-- 
http://hadley.nz



Re: [R-pkg-devel] New CRAN internet policy

2018-12-06 Thread Hadley Wickham
Policies at https://cran.r-project.org/web/packages/policies.html

Dirk Eddelbuettel has a CRAN policy watch where you can see the
changes: https://github.com/eddelbuettel/crp

Hadley

On Thu, Dec 6, 2018 at 10:42 AM Roy Mendelssohn - NOAA Federal
 wrote:
>
> Hi All:
>
> Can someone point me to where all the policies such as this one are posted.  
> This may affect a package I have,  and one problem I have is different people 
> have differing ideas of what defines a "graceful" exit.
>
> Thanks,
>
> Roy
>
>
> > On Dec 6, 2018, at 8:22 AM, Hadley Wickham  wrote:
> >
> > Hi all,
> >
> > I'd love to get some clarification on what the new internet policy
> > means for packages like httr:
> >
> >> Packages which use Internet resources should fail gracefully with an 
> >> informative
> >> message if the resource is not available (and not give a check warning nor 
> >> error).
> >
> > It's not clear what "internet resource" means here? If it means
> > dataset, then I think the httr tests and examples are ok. If it means
> > any use of the internet, I'm not sure what do - httr critically
> > depends on internet access, so I can't see any way to make it fail
> > gracefully.
> >
> > Hadley
> >
> > --
> > http://hadley.nz
> >
> > __
> > R-package-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> **
> "The contents of this message do not reflect any position of the U.S. 
> Government or NOAA."
> **
> Roy Mendelssohn
> Supervisory Operations Research Analyst
> NOAA/NMFS
> Environmental Research Division
> Southwest Fisheries Science Center
> ***Note new street address***
> 110 McAllister Way
> Santa Cruz, CA 95060
> Phone: (831)-420-3666
> Fax: (831) 420-3980
> e-mail: roy.mendelss...@noaa.gov www: http://www.pfeg.noaa.gov/
>
> "Old age and treachery will overcome youth and skill."
> "From those who have been given much, much will be expected"
> "the arc of the moral universe is long, but it bends toward justice" -MLK Jr.
>


-- 
http://hadley.nz



Re: [R-pkg-devel] creating a link to a vignette in a .Rd file

2018-11-20 Thread Hadley Wickham
> None of these solutions seem perfect to me. I think that my suggestion is the 
> most natural, but as you point out it won’t work in all contexts. Perhaps the 
> safest approach is to give the vignette() command in the text of the help 
> file, one of your suggestions.

If you do that, and you use pkgdown (https://pkgdown.r-lib.org), that
code chunk will automatically be linked to the vignette when you build
the website.

Hadley

-- 
http://hadley.nz



Re: [R-pkg-devel] creating a link to a vignette in a .Rd file

2018-11-19 Thread Hadley Wickham
On Mon, Nov 19, 2018 at 4:49 PM Fox, John  wrote:
>
> Dear r-package-devel list members,
>
> I'd like to create a link to a package vignette from a help file in the same 
> package, for example to the "partial-residuals" vignette in the effects 
> package from effect.Rd. I'm able to generate a URL for the vignette as 
> follows:
>
> \Sexpr[results=text]{paste0("file://", system.file("doc", 
> "partial-residuals.pdf", package="effects"))}
>
> but I'm unable to link to the resulting text string using href{}{} or url{} 
> because \Sexpr[etc]{etc.} is treated as verbatim text rather than evaluated.
>
> Is there a way around this problem or another approach that works?

Have you confirmed that a raw file:// url works?  I would be mildly
surprised if it did, given my understanding of how web browser
security works (which is patchy, but you should still check before
going too far down this path).

Don't you just need `results=rd` ?

\Sexpr[results=rd]{paste0("\\url{file://", system.file("doc",
"partial-residuals.pdf", package="effects"), "}")}

Hadley

-- 
http://hadley.nz



[Rd] An update on the vctrs package

2018-11-05 Thread Hadley Wickham
Hi all,

I wanted to give you an update on vctrs ()
since I last brought it up here in August. The biggest change is that I now
have a much clearer idea of what vctrs is! I’ll summarise that here,
and point you to the documentation if you’re interested in learning
more. I’m planning on submitting vctrs to CRAN in the near future, but
it’s very much a 0.1.0 release and I expect it to continue to evolve as
more people try it out and give me feedback. I’d love to hear your
thoughts!

vctrs has three main goals:

  - To define and motivate `vec_size()` and `vec_type()` as alternatives
to `length()` and `class()`.

  - To define type- and size-stability, useful tools for analysing
function interfaces.

  - To make it easier to create new S3 vector classes.

## Size and prototype

`vec_size()` was motivated by my desire to have a function that captures
the number of “observations” in a vector. This is particularly important
for data frames because it’s useful to have a function such that
`f(data.frame(x))` equals `f(x)`. No base function has this property:
`NROW()` comes closest, but because it’s defined in terms of `length()`
for dimensionless objects, it always returns a number, even for types
that can’t go in a data frame, e.g. `data.frame(mean)` errors even
though `NROW(mean)` is `1`.

``` r
vec_size(1:10)
#> [1] 10
vec_size(as.POSIXlt(Sys.time() + 1:10))
#> [1] 10
vec_size(data.frame(x = 1:10))
#> [1] 10
vec_size(array(dim = c(10, 4, 1)))
#> [1] 10
vec_size(mean)
#> Error: `x` is a not a vector
```

`vec_size()` is paired with `vec_slice()` for subsetting, i.e.
`vec_slice()` is to `vec_size()` as `[` is to `length()`;
`vec_slice(data.frame(x), i)` equals `data.frame(vec_slice(x, i))`
(modulo variable/row names).

(I plan to make `vec_size()` and `vec_slice()` generic in the next
release, providing another point of differentiation from `NROW()`.)
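
To make that invariant concrete, here is a small sketch (assuming the
vctrs API exactly as described above):

``` r
library(vctrs)

df <- data.frame(x = 1:10)
i <- c(2, 4, 6)

# vec_slice() subsets observations, so slicing the data frame and
# slicing its column give the same rows (modulo row names):
vec_slice(df, i)
data.frame(x = vec_slice(df$x, i))
```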

Complementary to the size of a vector is its prototype, a
zero-observation slice of the vector. You can compute this using
`vec_type()`, but because many classes don’t have an informative print
method for a zero-length vector, I also provide `vec_ptype()` which
prints a brief summary. As well as the class, the prototype also
captures important attributes:

``` r
vec_ptype(1:10)
#> Prototype: integer
vec_ptype(array(1:40, dim = c(10, 4, 1)))
#> Prototype: integer[,4,1]
vec_ptype(Sys.time())
#> Prototype: datetime
vec_ptype(data.frame(x = 1:10, y = letters[1:10]))
#> Prototype: data.frame<
#>   x: integer
#>   y: factor<5e105>
#> >
```

`vec_size()` and `vec_type()` are accompanied by functions that either
find or enforce a common size (using modified recycling rules) or common
type (by reducing a double-dispatching `vec_type2()` that determines the
common type from a pair of types).

You can read more about `vec_size()` and `vec_type()` at
.

## Stability

The definitions of size and prototype are motivated by my experiences
doing code review. I find that I can often spot problems by running R
code in my head. Obviously my mental R interpreter is much simpler than
the real interpreter, but it seems to focus on prototypes and sizes, and
I’m suspicious of code where I can’t easily predict the class of every
new variable.

This leads me to two definitions. A function is **type-stable** iff:

  - You can predict the output type knowing only the input types.
  - The order of arguments in … does not affect the output type.

Similarly, a function is **size-stable** iff:

  - You can predict the output size knowing only the input sizes, or
there is a single numeric input that specifies the output size.

For example, `ifelse()` is type-unstable because the output type can be
different even when the input types are the same:

``` r
vec_ptype(ifelse(NA, 1L, 1L))
#> Prototype: logical
vec_ptype(ifelse(FALSE, 1L, 1L))
#> Prototype: integer
```

Size-stability is generally not useful for analysing base R functions
because the definition is a bit too far away from base conventions. The
analogously defined length-stability is a bit better, but the definition
of length for non-vectors means that complete length-stability is rare.
For example, while `length(c(x, y))` usually equals `length(x) +
length(y)`, it does not hold for all possible inputs:

``` r
length(globalenv())
#> [1] 0
length(mean)
#> [1] 1
length(c(mean, globalenv()))
#> [1] 2
```

(I don’t mean to pick on base here; the tidyverse also has many
functions that violate these principles, but I wanted to stick to
functions that all readers would be familiar with.)

Type- and size-stable functions are desirable because they make it
possible to reason about code without knowing the precise values
involved. Of course, not all functions should be type- or size-stable: R
would be incredibly limited if you could predict the type or size of
`[[` and `read.csv()` without knowing the specific inputs! But where
possible, I think using type- and 

Re: [R-pkg-devel] Extending/adding to an R6 class from another package: qns

2018-10-19 Thread Hadley Wickham
> AzureRMR: the "base" package, provides a number of R6 classes
> AzureVM: a "child" package that extends classes from AzureRMR with extra 
> functionality related to virtual machines
> AzureStor: another child package that extends classes from AzureRMR, this 
> time for storage accounts
> Etc.
>
> For example, AzureRMR defines a class called "az_resource_group" that 
> represents an Azure resource group. Within this class, I have convenience 
> functions to manage individual Azure resources: 
> az_resource_group$get_resource(), az_resource_group$create_resource(), etc. 
> One benefit of this approach is that method chaining works: I can do 
> something like
>
>az_subscription("xxx")$get_resource_group("yyy")$get_resource("zzz").
>
> In my child packages, I then define further classes and methods for dealing 
> with specific services. For consistency, I also add convenience functions to 
> the base AzureRMR::az_resource_group class to work with these new classes. 
> For example, AzureVM defines a new class az_vm_template, and also adds a 
> $get_vm() method to AzureRMR::az_resource_group.
>
> Running devtools::check() however brings up a note and warning for the child 
> packages. For example, with AzureVM:
>
> * checking R code for possible problems ... NOTE
> File 'AzureVM/R/add_methods.R':
>   .onLoad calls:
> message("Creating resource group '", resource_group, "'")
>
> Package startup functions should use 'packageStartupMessage' to  generate 
> messages.
> See section 'Good practice' in '?.onAttach'.
>
> . . .
>
> * checking for code/documentation mismatches ... WARNING
> Functions or methods with usage in documentation object 'get_vm' but not in 
> code:
>   get_vm get_vm_cluster list_vms
>
>
> The reason for the note is because modifying R6 classes from another package 
> has to be done at runtime, ie, in the .onLoad function. The message() call 
> referred to is inside one of the new methods that I define for an AzureRMR 
> class, hence it never actually appears at package startup. I assume it's okay 
> to ignore this note?

I think monkey-patching classes on load is an extremely bad idea. You
would be better off subclassing, or if the classes are so closely
inter-related, you should put them in a single package. Or re-design
your interface to use the pipe instead of method chaining so this
isn't a problem (brief discussion at
https://adv-r.hadley.nz/oo-tradeoffs.html#tradeoffs-pipe)
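
For comparison, a minimal sketch of the subclassing approach (the class
and method names here are hypothetical, not taken from the actual Azure
packages):

``` r
library(R6)

ResourceGroup <- R6Class("ResourceGroup",
  public = list(
    name = NULL,
    initialize = function(name) {
      self$name <- name
    }
  )
)

# The child package subclasses rather than patching the parent
# class at load time:
VMResourceGroup <- R6Class("VMResourceGroup",
  inherit = ResourceGroup,
  public = list(
    get_vm = function(vm_name) {
      paste0("VM '", vm_name, "' in resource group '", self$name, "'")
    }
  )
)

VMResourceGroup$new("prod")$get_vm("web-01")
```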

> The reason for the warning is because writing documentation for R6 methods is 
> rather awkward, even/especially with Roxygen. This goes doubly so when the 
> method in question is for a class from a different package. What I've done is 
> to write a Roxygen block for the method as if it was a standalone function; 
> for example, the documentation for az_resource_group$get_vm() is like this:
>
> #' Get existing virtual machine(s)
> #'
> #' Method for the [AzureRMR::az_subscription] and 
> [AzureRMR::az_resource_group] classes.
> #'
> #' @rdname get_vm
> #' @name get_vm
> #' @usage
> #' get_vm(name)
> #' get_vm(name, resource_group = name)
> #'
> #' @param name The name of the VM or cluster.
> #' @param resource_group For the `az_subscription` method, the resource group 
> in which `get_vm()` will look for the VM. Defaults to the VM name.
> #'
> #' @details
> #' ...
> NULL
>
> This way, typing ?get_vm will bring up the relevant page, which seems to me 
> to be the best compromise in terms of the end-user experience. Is this an 
> acceptable way of doing the documentation for CRAN?

I think the usage should be consistent with how people actually call
the function, i.e. x$get_vm(name). I'm not sure if R CMD check will
like this, but I suspect it will silence the warning.

Hadley

-- 
http://hadley.nz



Re: [R-pkg-devel] stringi update

2018-09-15 Thread Hadley Wickham
Looking at the primary CRAN site:
https://cran.r-project.org/web/packages/stringi/index.html, you can
see that the windows binary is still at 1.1.7, suggesting that there's
some build failure. You can see exactly what that is on the CRAN check
page: https://cran.r-project.org/web/checks/check_results_stringi.html

Hadley
On Sat, Sep 15, 2018 at 1:24 AM Patrick Giraudoux
 wrote:
>
> Thanks for the hint, but I have tried several mirrors and still get the
> same trouble:
>
>  > update.packages(ask='graphics',checkBuilt=TRUE)
>
>There are binary versions available but the source versions are later:
>  binary source needs_compilation
> stringi  1.1.7  1.2.4  TRUE
>
> Do you want to install from sources the packages which need compilation?
> y/n:
>
> Which re-installs the same version at each update...
>
>
>
>
> Le 10/09/2018 à 17:17, R. Mark Sharp a écrit :
> > Patrick,
> >
> > It looks like the CRAN mirror that you are using has not been updated with 
> > the binary versions, which are readily available on other mirrors (e.g., 
> > https://cran.revolutionanalytics.com).
> >
> > Mark
> > R. Mark Sharp, Ph.D.
> > Data Scientist and Biomedical Statistical Consultant
> > 7526 Meadow Green St.
> > San Antonio, TX 78251
> > mobile: 210-218-2868
> > rmsh...@me.com
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >> On Sep 10, 2018, at 12:23 AM, Patrick Giraudoux 
> >>  wrote:
> >>
> >> Since weeks the package stringi stays on the following versions:
> >>
> >>> update.packages(ask='graphics',checkBuilt=TRUE)
> >>There is a binary version available but the source version is later:
> >>  binary source needs_compilation
> >> stringi  1.1.7  1.2.4  TRUE
> >>
> >> Do you want to install from sources the package which needs compilation?
> >> y/n: n
> >>
> >> However updated,  I am asked to update it again with the same version at 
> >> any next update of R.
> >>
> >> Has somebody an idea about what is happening?
> >>
> >> Patrick
> >>
> >> __
> >> R-package-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-package-devel
> >
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



-- 
http://hadley.nz



Re: [R-pkg-devel] Submission to CRAN when package needs personal data (API key)

2018-09-07 Thread Hadley Wickham
On Fri, Sep 7, 2018 at 9:13 AM Iñaki Ucar  wrote:
>
> El vie., 7 sept. 2018 a las 16:03, Ralf Stubner
> () escribió:
> >
> > On 07.09.2018 15:52, Iñaki Ucar wrote:
> > > For the record, this is what the testthat paper in the R Journal says:
> > >
> > > "[...] I recommend storing your tests in inst/tests/ (so users also
> > > have access to them), then including one file in tests/ that runs all
> > > of the package tests. The test_package(package_name) function makes
> > > this easy. [...] This setup has the additional advantage that users
> > > can make sure your package works correctly in their run-time
> > > environment."
> >
> > Tests in 'inst/test' got deprecated later on:
> >
> > https://github.com/r-lib/testthat/commit/0a7d27bb9ea545be7da1a10e511962928d888302
>
> Yeap, and I was pointing out the *old* (2011) practice and
> recommendation. The reason why this evolved and changed, that I don't
> know.

Because CRAN specifically asked me to put testthat tests in tests/
(R's standard testing directory), rather than somewhere non-standard.

Hadley

-- 
http://hadley.nz



Re: [R-pkg-devel] Submission to CRAN when package needs personal data (API key)

2018-09-06 Thread Hadley Wickham
On Wed, Sep 5, 2018 at 3:03 PM Duncan Murdoch  wrote:
>
> On 05/09/2018 2:20 PM, Henrik Bengtsson wrote:
> > I take a complementary approach; I condition on, my home-made,
> > R_TEST_ALL variable.  Effectively, I do:
> >
> > if (as.logical(Sys.getenv("R_TEST_ALL", "FALSE"))) {
> > ...
> > }
> >
> > and set R_TEST_ALL=TRUE when I want to run that part of the test.  You
> > can also imagine refined versions of this, e.g. R_TEST_SETS=foo,bar
> > and test scripts with:
> >
> > if ("foo" %in% strsplit(Sys.getenv("R_TEST_SETS"), split="[, ]+")[[1]]) {
> > ...makes no assumption
> > }
> >
> > That avoids making assumptions on where the tests are submitted/run,
> > may it be CRAN, Bioconductor, Travis CI, ...
>
> This is the right way to do it.

I would like to gently push back on this assertion: if CRAN set an
environment variable we would have one single convention that all
packages could rely on. The current system relies on each package
author evolving their own solution. This makes life difficult when you
are running local reverse dependency checks: there is no way to
systematically assert that you want to run tests in a way as similar
as possible to CRAN.

I know that the CRAN maintainers already have a very large workload,
and I hate to add to it, but setting CRAN=1 in a few profile files
doesn't seem excessively burdensome.

> This discussion has come up before.  If you want to submit to CRAN, you
> should include tests that satisfy their requests.  If you want even more
> tests, there are several ways to add them in addition to the CRAN tests.
>   Henrik's is one, "R CMD check --test-dir=myCustomTests" is another.
>
> Rainer's package is unusual, in that from his description it can't
> really work unless the user obtains an API key.  There are other
> packages like that, and those cases need manual handling by CRAN:  they
> don't really run full tests by default.  But the vast majority of
> packages should be able to live within the CRAN guidelines.

10 years ago, I would have definitely supported this statement. But I
am not sure it is still correct today, as there are now many packages
that require a connection to web API to work (or depend on a package
that uses an API). Additionally, CRAN only allows a limited amount of
compute time for each check, so there are often longer tests that you
want to run locally but not on CRAN. CRAN is a specialised testing
service and it does have different constraints to your local machine,
travis, and bioconductor.

A quick search of the CRAN mirror on github
(https://github.com/search?q=org%3Acran+skip_on_cran=Code)
reveals that there are ~2700 tests that use testthat::skip_on_cran().
This is obviously an underestimate of the total number of tests
skipped on CRAN, as many packages don't use testthat, or use an
alternative technique to avoid running code on CRAN.
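
For readers unfamiliar with the pattern, a sketch of the convention
testthat and devtools use (devtools sets NOT_CRAN="true" locally;
the test function names below are made up for illustration):

``` r
# In a plain tests/ script: only run expensive or networked code when
# NOT_CRAN is set, which happens locally but not on CRAN machines.
if (identical(Sys.getenv("NOT_CRAN"), "true")) {
  test_api_endpoints()  # hypothetical long-running test
}

# Inside a testthat test, the same check is a one-liner:
# test_that("API works", {
#   skip_on_cran()
#   expect_equal(query_api("status"), "ok")  # query_api() is made up
# })
```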

Hadley

-- 
http://hadley.nz



Re: [Rd] True length - length(unclass(x)) - without having to call unclass()?

2018-09-02 Thread Hadley Wickham
For the new vctrs::records class, I implemented length, names, [[, and
[[<- myself in https://github.com/r-lib/vctrs/blob/master/src/fields.c.
That lets me override the default S3 methods while still being able to
access the underlying data that I'm interested in.

Another option that avoids a copy (and that you should never discuss in
public) is temporarily setting the object bit to FALSE.

In the long run, I think an ALTREP vector that exposes the underlying
data of an S3 object (i.e. sans attributes apart from names) is
probably the way forward.

Hadley
On Fri, Aug 24, 2018 at 1:03 PM Henrik Bengtsson
 wrote:
>
> Is there a low-level function that returns the length of an object 'x'
> - the length that for instance .subset(x) and .subset2(x) see? An
> obvious candidate would be to use:
>
> .length <- function(x) length(unclass(x))
>
> However, I'm concerned that calling unclass(x) may trigger an
> expensive copy internally in some cases.  Is that concern unfounded?
>
> Thxs,
>
> Henrik
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
http://hadley.nz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] build package with unicode (farsi) strings

2018-08-30 Thread Hadley Wickham
On Thu, Aug 30, 2018 at 2:11 AM Thierry Onkelinx
 wrote:
>
> Dear Farid,
>
> Try using the ASCII notation. letters_fa <- c("\u0627", "\u0641"). The full
> code table is available at https://www.utf8-chartable.de

It's a little easier to do this with code:

letters_fa <- c('الف','ب','پ','ت','ث','ج','چ','ح','خ','ر','ز','د')
writeLines(stringi::stri_escape_unicode(letters_fa))
#> \u0627\u0644\u0641
#> \u0628
#> \u067e
#> \u062a
#> \u062b
#> \u062c
#> \u0686
#> \u062d
#> \u062e
#> \u0631
#> \u0632
#> \u062f

Hadley

-- 
http://hadley.nz



Re: [Rd] ROBUSTNESS: x || y and x && y to give warning/error if length(x) != 1 or length(y) != 1

2018-08-30 Thread Hadley Wickham
On Thu, Aug 30, 2018 at 10:58 AM Martin Maechler
 wrote:
>
> > Joris Meys
> > on Thu, 30 Aug 2018 14:48:01 +0200 writes:
>
> > On Thu, Aug 30, 2018 at 2:09 PM Dénes Tóth
> >  wrote:
> >> Note that `||` and `&&` have never been symmetric:
> >>
> >> TRUE || stop() # returns TRUE stop() || TRUE # returns an
> >> error
> >>
> >>
> > Fair point. So the suggestion would be to check whether x
> > is of length 1 and whether y is of length 1 only when
> > needed. I.e.
>
> > c(TRUE,FALSE) || TRUE
>
> > would give an error and
>
> > TRUE || c(TRUE, FALSE)
>
> > would pass.
>
> > Thought about it a bit more, and I can't come up with a
> > use case where the first line must pass. So if the short
> > circuiting remains and the extra check only gives a small
> > performance penalty, adding the error could indeed make
> > some bugs more obvious.
>
> I agree "in theory".
> Thank you, Henrik, for bringing it up!
>
> In practice I think we should start having a warning signalled.
> I have checked the source code in the meantime, and the check
> is really very cheap
> { because it can/should be done after checking isNumber(): so
>   then we know we have an atomic and can use XLENGTH() }
>
>
> The 0-length case I don't think we should change as I do find
> NA (which is logical!) to be an appropriate logical answer.

Can you explain your reasoning a bit more here? I'd like to understand
the general principle, because from my perspective it's more
parsimonious to say that the inputs to || and && must be length 1,
rather than to say that inputs could be length 0 or length 1, and in
the length 0 case they are replaced with NA.

Hadley

-- 
http://hadley.nz



Re: [Rd] ROBUSTNESS: x || y and x && y to give warning/error if length(x) != 1 or length(y) != 1

2018-08-30 Thread Hadley Wickham
I think this is an excellent idea as it eliminates a situation which
is almost certainly user error. Making it an error would break a small
amount of existing code (even if for the better), so perhaps it should
start as a warning, but be optionally upgraded to an error. It would
be nice to have a fixed date (R version) in the future when the
default will change to error.

In an ideal world, I think the following four cases should all return
the same error:

if (logical()) 1
#> Error in if (logical()) 1: argument is of length zero
if (c(TRUE, TRUE)) 1
#> Warning in if (c(TRUE, TRUE)) 1: the condition has length > 1 and only the
#> first element will be used
#> [1] 1

logical() || TRUE
#> [1] TRUE
c(TRUE, TRUE) || TRUE
#> [1] TRUE

i.e. I think that `if`, `&&`, and `||` should all check that their
input is a logical (or numeric) vector of length 1.
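
A sketch of such a check, written as a standalone helper (the helper
name is hypothetical, not a proposed base R API):

``` r
check_scalar_condition <- function(x, arg = deparse(substitute(x))) {
  if (!is.logical(x) && !is.numeric(x)) {
    stop("`", arg, "` must be logical or numeric", call. = FALSE)
  }
  if (length(x) != 1) {
    stop("`", arg, "` must have length 1, not ", length(x), call. = FALSE)
  }
  invisible(x)
}

check_scalar_condition(TRUE)                  # ok, returns invisibly
try(check_scalar_condition(logical()))        # error: length 0
try(check_scalar_condition(c(TRUE, TRUE)))    # error: length 2
```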

Hadley

On Tue, Aug 28, 2018 at 10:03 PM Henrik Bengtsson
 wrote:
>
> # Issue
>
> 'x || y' performs 'x[1] || y' for length(x) > 1.  For instance (here
> using R 3.5.1),
>
> > c(TRUE, TRUE) || FALSE
> [1] TRUE
> > c(TRUE, FALSE) || FALSE
> [1] TRUE
> > c(TRUE, NA) || FALSE
> [1] TRUE
> > c(FALSE, TRUE) || FALSE
> [1] FALSE
>
> This property is symmetric in LHS and RHS (i.e. 'y || x' behaves the
> same) and it also applies to 'x && y'.
>
> Note also how the above truncation of 'x' is completely silent -
> there's neither an error nor a warning being produced.
>
>
> # Discussion/Suggestion
>
> Using 'x || y' and 'x && y' with a non-scalar 'x' or 'y' is likely a
> mistake.  Either the code is written assuming 'x' and 'y' are scalars,
> or there is a coding error and vectorized versions 'x | y' and 'x & y'
> were intended.  Should 'x || y' always be considered a mistake if
> 'length(x) != 1' or 'length(y) != 1'?  If so, should it be a warning
> or an error?  For instance,
> '''r
> > x <- c(TRUE, TRUE)
> > y <- FALSE
> > x || y
>
> Error in x || y : applying scalar operator || to non-scalar elements
> Execution halted
>
> What about the case where 'length(x) == 0' or 'length(y) == 0'?  Today
> 'x || y' returns 'NA' in such cases, e.g.
>
> > logical(0) || c(FALSE, NA)
> [1] NA
> > logical(0) || logical(0)
> [1] NA
> > logical(0) && logical(0)
> [1] NA
>
> I don't know the background for this behavior, but I'm sure there is
> an argument behind that one.  Maybe it's simply that '||' and '&&'
> should always return a scalar logical and neither TRUE nor FALSE can
> be returned.
>
> /Henrik
>
> PS. This is in the same vein as
> https://mailman.stat.ethz.ch/pipermail/r-devel/2017-March/073817.html
> - in R (>=3.4.0) we now get that if (1:2 == 1) ... is an error if
> _R_CHECK_LENGTH_1_CONDITION_=true
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
http://hadley.nz



Re: [Rd] Where does L come from?

2018-08-29 Thread Hadley Wickham
Thanks for the great discussion everyone!
Hadley
On Sat, Aug 25, 2018 at 8:26 AM Hadley Wickham  wrote:
>
> Hi all,
>
> Would someone mind pointing to me to the inspiration for the use of
> the L suffix to mean "integer"?  This is obviously hard to google for,
> and the R language definition
> (https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Constants)
> is silent.
>
> Hadley
>
> --
> http://hadley.nz



-- 
http://hadley.nz



Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-29 Thread Hadley Wickham
>> conflicted applies a few heuristics to minimise false positives (at the
>> cost of introducing a few false negatives). The overarching goal is to
>> ensure that code behaves identically regardless of the order in which
>> packages are attached.
>>
>> -   A number of packages provide a function that appears to conflict
>> with a function in a base package, but they follow the superset
>> principle (i.e. they only extend the API, as explained to me by
>> Hervè Pages).
>>
>> conflicted assumes that packages adhere to the superset principle,
>> which appears to be true in most of the cases that I’ve seen.
>
>
> It seems that you may be able to strengthen this heuristic from a blanket 
> assumption to something more narrowly targeted by looking for one or more of 
> the following to confirm likely-superset adherence:
>
> - matching or purely extending formals (i.e. all the named arguments of
>   base::fun match including order, and there are new arguments in pkg::fun
>   only if base::fun takes ...)
> - an explicit call to base::fun in the body of pkg::fun
> - UseMethod(funname) where at least one provided S3 method calls base::fun
> - S4 generic creation using fun or base::fun as the seeding/default method
>   body, or called from at least one method

Oooh nice, idea I'll definitely try it out.

>> For
>> example, the lubridate package provides `as.difftime()` and `date()`
>> which extend the behaviour of base functions, and provides S4
>> generics for the set operators.
>>
>> conflict_scout(c("lubridate", "base"))
>> #> 5 conflicts:
>> #> * `as.difftime`: [lubridate]
>> #> * `date`   : [lubridate]
>> #> * `intersect`  : [lubridate]
>> #> * `setdiff`: [lubridate]
>> #> * `union`  : [lubridate]
>>
>> There are two popular functions that don’t adhere to this principle:
>> `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
>> special cases so they correctly generate conflicts. (I sure wish I’d
>> known about the superset principle when creating dplyr!)
>>
>> conflict_scout(c("dplyr", "stats"))
>> #> 2 conflicts:
>> #> * `filter`: dplyr, stats
>> #> * `lag`   : dplyr, stats
>>
>> -   Deprecated functions should never win a conflict, so conflicted
>> checks for use of `.Deprecated()`. This rule is very useful when
>> moving functions from one package to another. For example, many
>> devtools functions were moved to usethis, and conflicted ensures
>> that you always get the non-deprecated version, regardless of package
>> attach order:
>
>
> I would completely believe this rule is useful for refactoring as you 
> describe, but that is the "same function" case. For an end-user in the 
> "different function same symbol" case it's not at all clear to me that the 
> deprecated function should always win.
>
> People sometimes use deprecated functions. It's not great, and eventually 
> they'll need to fix that for any given case, but imagine if you deprecated 
> the filter verb in dplyr (I know this will never happen, but I think it's 
> illustrative none the less).
>
> Consider a piece of code someone wrote before this hypothetical deprecation 
> of filter. The fact that it's now deprecated certainly doesn't mean that they 
> secretly wanted stats::filter all along, right? Conflicted acting as if it 
> does will lead to them getting the exact kind of error you're looking to 
> protect them from, and with even less ability to understand why because they 
> are already doing "The right thing" to protect themselves by using conflicted 
> in the first place...

Ah yes, good point. I'll add a heuristic to check that the function
name appears in the first argument of the `.Deprecated()` call (assuming
that the call looks something like `.Deprecated("pkg::foo")`).
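That heuristic might look something like this (a rough sketch with a hypothetical helper name; a real implementation would need to handle more call shapes than a literal string in the first argument):

```r
# Rough sketch: does `fun`'s body call .Deprecated(), and does the first
# argument of that call mention the expected replacement name?
finds_deprecation <- function(fun, replacement) {
  found <- FALSE
  walk <- function(expr) {
    if (is.call(expr)) {
      if (identical(expr[[1]], as.name(".Deprecated")) &&
          length(expr) >= 2 && is.character(expr[[2]]) &&
          grepl(replacement, expr[[2]], fixed = TRUE)) {
        found <<- TRUE
      }
      # recurse into every component of the call
      for (i in seq_along(expr)) walk(expr[[i]])
    }
  }
  walk(body(fun))
  found
}

f <- function(x) {
  .Deprecated("usethis::use_coverage")
  x
}
finds_deprecation(f, "use_coverage")  # TRUE
```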

>> Finally, as mentioned above, the user can declare preferences:
>>
>> conflict_prefer("select", "MASS")
>> #> [conflicted] Will prefer MASS::select over any other package
>> conflict_scout(c("dplyr", "MASS"))
>> #> 1 conflict:
>> #> * `select`: [MASS]
>>
>
> I deeply worry about people putting this kind of thing, or even just 
> library(conflicted), in their .Rprofile and thus making their scripts 
> substantially less reproducible. Is that a consequence you have thought about 
> to this kind of functionality?

Yes, and I've already recommended against it in two places :)  I'm not
sure if there's any more I can do - people already put (e.g.)
`library(ggplot2)` in their .Rprofile, which is just as bad from a
reproducibility standpoint.

Thanks for the thoughtful feedback!

Hadley

-- 
http://hadley.nz



[Rd] Where does L come from?

2018-08-25 Thread Hadley Wickham
Hi all,

Would someone mind pointing to me to the inspiration for the use of
the L suffix to mean "integer"?  This is obviously hard to google for,
and the R language definition
(https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Constants)
is silent.

Hadley

-- 
http://hadley.nz



Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-24 Thread Hadley Wickham
On Fri, Aug 24, 2018 at 4:28 AM Joris Meys  wrote:
>
> Dear Hadley,
>
> There's been some mails from you lately about packages on R-devel. I would 
> argue that the appropriate list for that is R-pkg-devel, as I've been told 
> myself not too long ago. People might get confused and think this is about a 
> change to R itself, which it obviously is not.

The description for R-pkg-devel states:

> This list is to get help about package development in R. The goal of the list 
> is to provide a forum for learning about the package development process. We 
> hope to build a community of R package developers who can help each other 
> solve problems, and reduce some of the burden on the CRAN maintainers. If you 
> are having problems developing a package or passing R CMD check, this is the 
> place to ask!

The description for R-devel states:

> This list is intended for questions and discussion about code development in 
> R. Questions likely to prompt discussion unintelligible to non-programmers or 
> topics that are too technical for R-help's audience should go to R-devel, 
> unless they are specifically about problems in R package development where 
> the R-package-devel list is rather appropriate, see the posting guide 
> section. The main R mailing list is R-help.

My questions are not about how to develop a package, R CMD check, or
how to get it on CRAN, but instead about the semantics of the packages
I am working on. My opinion is supported by the fact that a number of
members of the R core team have responded (both on list and off) and
have not expressed concern about my choice of venue.

That said, I am happy to change venues (or simply not email at all) if
there is widespread concern that my emails are inappropriate.

Hadley

-- 
http://hadley.nz



Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-24 Thread Hadley Wickham
On Thu, Aug 23, 2018 at 3:46 PM Duncan Murdoch  wrote:
>
> First, some general comments:
>
> This sounds like a useful package.
>
> I would guess it has very little impact on runtime efficiency except
> when attaching a new package; have you checked that?

It adds one extra element to the search path, so the impact on speed
should be equivalent to loading one additional package (i.e.
negligible).

I've also done some benchmarking to see the impact on calls to
library(). These are now a little outdated (because I've added more
heuristics so I should re-do), but previously conflicted added about
100 ms overhead to a library() call when I had ~170 packages loaded
(the most I could load without running out of dlls).

> I am not so sure about your heuristics.  Can they be disabled, so the
> user is always forced to make the choice?  Even when a function is
> intended to adhere to the superset principle, they don't always get it
> right, so a really careful user should always do explicit disambiguation.

That is a good question - my intuition is always to start with less
user control as it makes it easier to get the core ideas right, and
it's easy to add more control later (whereas if you later take it
away, people get unhappy). Maybe it's natural to have a function that
does the opposite of conflict_prefer(), and declare that something
that doesn't appear to be a conflict actually is?

I don't think that an option to suppress the superset principle
altogether will work - my sense is that it will generate too many
false positives, to the point where you'll get frustrated and stop
using conflicted.

> And of course, if users wrote most of their long scripts as packages
> instead of as long scripts, the ambiguity issue would arise far less
> often, because namespaces in packages are intended to solve the same
> problem as your package does.

Agreed.

> One more comment inline about a typo, possibly in an error message.

Thanks for spotting; fixed in devel now.

Hadley


-- 
http://hadley.nz



[Rd] conflicted: an alternative conflict resolution strategy

2018-08-23 Thread Hadley Wickham
Hi all,

I’d love to get your feedback on the conflicted package, which provides an
alternative strategy for resolving ambiguous function names (i.e. when
multiple packages provide identically named functions). conflicted 0.1.0
is already on CRAN, but I’m currently preparing a revision
(), and looking for feedback.

As you are no doubt aware, R’s default approach means that the most
recently loaded package “wins” any conflicts. You do get a message about
conflicts on load, but I see a lot of newer R users experiencing problems
caused by function conflicts. I think there are three primary reasons:

-   People don’t read messages about conflicts. Even if you are
conscientious and do read the messages, it’s hard to notice a single
new conflict caused by a package upgrade.

-   The warning and the problem may be quite far apart. If you load all
your packages at the top of the script, it may potentially be 100s
of lines before you encounter a conflict.

-   The error messages caused by conflicts are cryptic because you end
up calling a function with utterly unexpected arguments.

For these reasons, conflicted takes an alternative approach, forcing the
user to explicitly disambiguate any conflicts:

library(conflicted)
library(dplyr)
library(MASS)

select
#> Error: [conflicted] `select` found in 2 packages.
#> Either pick the one you want with `::`
#> * MASS::select
#> * dplyr::select
#> Or declare a preference with `conflicted_prefer()`
#> * conflict_prefer("select", "MASS")
#> * conflict_prefer("select", "dplyr")

conflicted works by attaching a new “conflicted” environment just after
the global environment. This environment contains an active binding for
each ambiguous name. The conflicted environment also contains
bindings for `library()` and `require()` that rebuild the conflicted
environment and suppress default reporting (but are otherwise thin
wrappers around the base equivalents).
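To make the mechanism concrete, here is a minimal illustration of the idea (a sketch only, not conflicted's actual code; the environment and its name are made up):

```r
# Attach an environment just behind the global environment whose active
# binding intercepts an ambiguous name and signals an informative error.
shims <- new.env()
makeActiveBinding("select", function() {
  stop("[demo] `select` found in 2 packages; pick one with `::`",
       call. = FALSE)
}, shims)
attach(shims, name = "conflicted_demo", warn.conflicts = FALSE)

# Evaluating the bare name now hits the active binding:
try(select)  # errors with the message above

detach("conflicted_demo")
```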

conflicted also provides a `conflict_scout()` helper which you can use
to see what’s going on:

conflict_scout(c("dplyr", "MASS"))
#> 1 conflict:
#> * `select`: dplyr, MASS

conflicted applies a few heuristics to minimise false positives (at the
cost of introducing a few false negatives). The overarching goal is to
ensure that code behaves identically regardless of the order in which
packages are attached.

-   A number of packages provide a function that appears to conflict
with a function in a base package, but they follow the superset
principle (i.e. they only extend the API, as explained to me by
Hervè Pages).

conflicted assumes that packages adhere to the superset principle,
which appears to be true in most of the cases that I’ve seen. For
example, the lubridate package provides `as.difftime()` and `date()`
which extend the behaviour of base functions, and provides S4
generics for the set operators.

conflict_scout(c("lubridate", "base"))
#> 5 conflicts:
#> * `as.difftime`: [lubridate]
#> * `date`   : [lubridate]
#> * `intersect`  : [lubridate]
#> * `setdiff`: [lubridate]
#> * `union`  : [lubridate]

There are two popular functions that don’t adhere to this principle:
`dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
special cases so they correctly generate conflicts. (I sure wish I’d
known about the superset principle when creating dplyr!)

conflict_scout(c("dplyr", "stats"))
#> 2 conflicts:
#> * `filter`: dplyr, stats
#> * `lag`   : dplyr, stats

-   Deprecated functions should never win a conflict, so conflicted
checks for use of `.Deprecated()`. This rule is very useful when
moving functions from one package to another. For example, many
devtools functions were moved to usethis, and conflicted ensures
that you always get the non-deprecated version, regardless of package
attach order:

head(conflict_scout(c("devtools", "usethis")))
#> 26 conflicts:
#> * `use_appveyor`   : [usethis]
#> * `use_build_ignore`   : [usethis]
#> * `use_code_of_conduct`: [usethis]
#> * `use_coverage`   : [usethis]
#> * `use_cran_badge` : [usethis]
#> * `use_cran_comments`  : [usethis]
#> ...

Finally, as mentioned above, the user can declare preferences:

conflict_prefer("select", "MASS")
#> [conflicted] Will prefer MASS::select over any other package
conflict_scout(c("dplyr", "MASS"))
#> 1 conflict:
#> * `select`: [MASS]

I’d love to hear what people think about the general idea, and if there
are any obviously missing pieces.

Thanks!

Hadley


-- 
http://hadley.nz



Re: [R-pkg-devel] CRAN incoming queue closed from Sep 1 to Sep 9

2018-08-14 Thread Hadley Wickham
Does this include automatically (bot) accepted submissions?
Hadley
On Tue, Aug 14, 2018 at 8:07 AM Uwe Ligges
 wrote:
>
> Dear package developers,
>
> the CRAN incoming queue will be closed from Sep 1 to Sep 9. Hence
> package submissions are only possible before and after that period.
>
> Best,
> Uwe Ligges
> (for the CRAN team)
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



-- 
http://hadley.nz



Re: [Rd] substitute() on arguments in ellipsis ("dot dot dot")?

2018-08-13 Thread Hadley Wickham
Since you're already using bang-bang ;)

library(rlang)

dots1 <- function(...) as.list(substitute(list(...)))[-1L]
dots2 <- function(...) as.list(substitute(...()))
dots3 <- function(...) match.call(expand.dots = FALSE)[["..."]]
dots4 <- function(...) exprs(...)

bench::mark(
  dots1(1+2, "a", rnorm(3), stop("bang!")),
  dots2(1+2, "a", rnorm(3), stop("bang!")),
  dots3(1+2, "a", rnorm(3), stop("bang!")),
  dots4(1+2, "a", rnorm(3), stop("bang!")),
  check = FALSE
)[1:4]
#> # A tibble: 4 x 4
#>   expression  min mean  median
#>   <bch:expr>                                  <bch:tm> <bch:tm> <bch:tm>
#> 1 "dots1(1 + 2, \"a\", rnorm(3), stop(\"bang!\"…   3.23µs   4.15µs  3.81µs
#> 2 "dots2(1 + 2, \"a\", rnorm(3), stop(\"bang!\"…   2.72µs   4.48µs  3.37µs
#> 3 "dots3(1 + 2, \"a\", rnorm(3), stop(\"bang!\"…   4.06µs   4.94µs  4.69µs
#> 4 "dots4(1 + 2, \"a\", rnorm(3), stop(\"bang!\"…   3.92µs4.9µs  4.46µs


On Mon, Aug 13, 2018 at 4:19 AM Henrik Bengtsson
 wrote:
>
> Thanks all, this was very helpful.  Peter's finding - dots2() below -
> is indeed interesting - I'd be curious to learn what goes on there.
>
> The different alternatives perform approximately the same;
>
> dots1 <- function(...) as.list(substitute(list(...)))[-1L]
> dots2 <- function(...) as.list(substitute(...()))
> dots3 <- function(...) match.call(expand.dots = FALSE)[["..."]]
>
> stats <- microbenchmark::microbenchmark(
>   dots1(1+2, "a", rnorm(3), stop("bang!")),
>   dots2(1+2, "a", rnorm(3), stop("bang!")),
>   dots3(1+2, "a", rnorm(3), stop("bang!")),
>   times = 10e3
> )
> print(stats)
> # Unit: microseconds
> #                                        expr  min   lq mean median   uq  max neval
> #  dots1(1 + 2, "a", rnorm(3), stop("bang!")) 2.14 2.45 3.04   2.58 2.73 1110 1
> #  dots2(1 + 2, "a", rnorm(3), stop("bang!")) 1.81 2.10 2.47   2.21 2.34 1626 1
> #  dots3(1 + 2, "a", rnorm(3), stop("bang!")) 2.59 2.98 3.36   3.15 3.31 1037 1
>
> /Henrik
>
> On Mon, Aug 13, 2018 at 7:10 AM Peter Meilstrup
>  wrote:
> >
> > Interestingly,
> >
> >as.list(substitute(...()))
> >
> > also works.
> >
> > On Sun, Aug 12, 2018 at 1:16 PM, Duncan Murdoch
> >  wrote:
> > > On 12/08/2018 4:00 PM, Henrik Bengtsson wrote:
> > >>
> > >> Hi. For any number of *known* arguments, we can do:
> > >>
> > >> one <- function(a) list(a = substitute(a))
> > >> two <- function(a, b) list(a = substitute(a), b = substitute(b))
> > >>
> > >> and so on. But how do I achieve the same when I have:
> > >>
> > >> dots <- function(...) list(???)
> > >>
> > >> I want to implement this such that I can do:
> > >>
> > >>> exprs <- dots(1+2)
> > >>> str(exprs)
> > >>
> > >> List of 1
> > >>   $ : language 1 + 2
> > >>
> > >> as well as:
> > >>
> > >>> exprs <- dots(1+2, "a", rnorm(3))
> > >>> str(exprs)
> > >>
> > >> List of 3
> > >>   $ : language 1 + 2
> > >>   $ : chr "a"
> > >>   $ : language rnorm(3)
> > >>
> > >> Is this possible to achieve using plain R code?
> > >
> > >
> > > I think so.  substitute(list(...)) gives you a single expression 
> > > containing
> > > a call to list() with the unevaluated arguments; you can convert that to
> > > what you want using something like
> > >
> > > dots <- function (...) {
> > >   exprs <- substitute(list(...))
> > >   as.list(exprs[-1])
> > > }
> > >
> > > Duncan Murdoch
> > >
> > >
>



-- 
http://hadley.nz



Re: [Rd] vctrs: a type system for the tidyverse

2018-08-09 Thread Hadley Wickham
On Thu, Aug 9, 2018 at 4:26 PM jan Vitek  wrote:
>
> > I'm now confident that I
> > can avoid using "type" by itself, and instead always use it in a
> > compound phrase (like type system) to avoid confusion. That leaves the
> > `.type` argument to many vctrs functions. I'm considering change it to
> > .prototype, because what you actually give it is a zero-length vector
> > of the class you want, i.e. a prototype of the desired output. What do
> > you think of prototype as a name?
>
>
> The term “type system” in computer science is used in very different ways.
> What the note describes is not a type system, but rather a set of
> coercions used by a small number of functions in one package.
>
> Typically it refers to a set of rules (either statically enforced
> by the compiler or dynamically enforced by the runtime) that ensure
> that some particular category of errors can be caught by the
> language.
>
> There is none of that here.

I think there's a bit of that flavour here:

vec_c(factor("a"), Sys.Date())
#> Error: No common type for factor and date

This isn't a type system imposed by the language, but I don't think
that's a reason not to call it a type system.

That said, I agree that calling it a type system is currently
overselling it, and I have made your proposed change to the README
(and added a very-long term goal of making a type system that could be
applied using (e.g.) annotations).

> "The short-term goal of vctrs is to develop a type system for vectors which 
> will help reason about functions that combine different types of input (e.g. 
> c(), ifelse(), rbind()). The vctrs type system encompasses base vectors (e.g. 
> logical, numeric, character, list), S3 vectors (e.g. factor, ordered, Date, 
> POSIXct), and data frames; and can be extended to deal with S3 vectors 
> defined in other packages, as described in vignette("extending-vctrs”).”
>
> ==>
>
> The short-term goal of vctrs is to specify the behavior of functions that 
> combine different types of vectors (e.g. c(), ifelse(), rbind()). The 
> specification encompasses base vectors (e.g. logical, numeric, character, 
> list), S3 vectors (e.g. factor, ordered, Date, POSIXct), and data frames; and 
> can be extended to deal with S3 vectors defined in other packages, as 
> described in vignette("extending-vctrs”).

Thanks for the nice wording!

Hadley


-- 
http://hadley.nz



Re: [Rd] vctrs: a type system for the tidyverse

2018-08-09 Thread Hadley Wickham
> > As Gabe mentioned (and you've explained about) the term "type"
> > is really confusing here.  As you know, the R internals are all
> > about SEXPs, TYPEOF(), etc, and that's what the R level
> > typeof(.) also returns.  As you want to use something slightly
> > different, it should be different naming, ideally something not
> > existing yet in the R / S world, maybe 'kind' ?
>
> Agreed - I've been using type in the sense of "type system"
> (particularly as it related to algebraic data types), but that's not
> obvious from the current presentation, and as you note, is confusing
> with existing notions of type in R. I like your suggestion of kind,
> but I think it might be possible to just talk about classes, and
> instead emphasise that while the components of the system are classes
> (and indeed it's implemented using S3), the coercion/casting
> relationship do not strictly follow the subclass/superclass
> relationships.

I've taken another pass through (the first part of) the readme
(https://github.com/r-lib/vctrs#vctrs), and I'm now confident that I
can avoid using "type" by itself, and instead always use it in a
compound phrase (like type system) to avoid confusion. That leaves the
`.type` argument to many vctrs functions. I'm considering change it to
.prototype, because what you actually give it is a zero-length vector
of the class you want, i.e. a prototype of the desired output. What do
you think of prototype as a name?

Do you have any thoughts on good names for distinguishing vectors
without a class (i.e. logical, integer, double, ...) from vectors with
a class (e.g. factors, dates, etc.)? I've been thinking bare vector and
S3 vector (leaving room to later think about S4 vectors). Do those
sound reasonable to you?
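For illustration, the distinction could be expressed along these lines (a sketch only; `is_bare_vector()` is a made-up name, and vctrs' eventual definition may differ):

```r
# Sketch: a "bare" vector is an atomic vector or list with no class
# attribute; an "S3" vector carries a class attribute.
is_bare_vector <- function(x) {
  (is.atomic(x) || is.list(x)) && is.null(attr(x, "class"))
}

is_bare_vector(1:3)          # TRUE  (bare integer)
is_bare_vector(list(1, 2))   # TRUE  (bare list)
is_bare_vector(factor("a"))  # FALSE (S3 vector)
is_bare_vector(Sys.Date())   # FALSE (S3 vector)
```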

Hadley

-- 
http://hadley.nz



Re: [Rd] vctrs: a type system for the tidyverse

2018-08-09 Thread Hadley Wickham
On Thu, Aug 9, 2018 at 7:54 AM Joris Meys  wrote:
>
> Hi Hadley,
>
> my point actually came from a data analyst point of view. A character 
> variable is something used for extra information, eg the "any other ideas?" 
> field of a questionnaire. A categorical variable is a variable describing 
> categories defined by the researcher. If it is made clear that a factor is 
> the object type needed for a categorical variable, there is no confusion. All 
> my students get it. But I agree that in many cases people are taught that a 
> factor is somehow related to character variables. And that does not make 
> sense from a data analyst point of view if you think about variables as 
> continuous, ordinal and nominal in a model context.
>
> So I don't think adding more confusing behaviour and pitfalls is a solution 
> to something that's essentially a misunderstanding. It's something that's 
> only solved by explaining it correctly imho.

I agree with your definition of character and factor variables. It's
an important distinction, and I agree that the blurring of factors and
characters is generally undesirable. However, the merits of respecting
R's existing behaviour, and Martin Mächler's support, means that I'm
not going to change vctr's approach at this point in time. However, I
hear from you and Gabe that this is an important issue, and I'll
definitely keep it in mind as I solicit further feedback from users.

Hadley

-- 
http://hadley.nz



Re: [Rd] vctrs: a type system for the tidyverse

2018-08-09 Thread Hadley Wickham
On Thu, Aug 9, 2018 at 3:57 AM Joris Meys  wrote:
>
>  I sent this to  Iñaki personally by mistake. Thank you for notifying me.
>
> On Wed, Aug 8, 2018 at 7:53 PM Iñaki Úcar  wrote:
>
> >
> > For what it's worth, I always thought about factors as fundamentally
> > characters, but with restrictions: a subspace of all possible strings.
> > And I'd say that a non-negligible number of R users may think about
> > them in a similar way.
> >
>
> That idea has been a common source of bugs and the most important reason
> why I always explain my students that factors are a special kind of
> numeric(integer), not character. Especially people coming from SPSS see
> immediately the link with categorical variables in that way, and understand
> that a factor is a modeling aid rather than an alternative for characters.
> It is a categorical variable and a more readable way of representing a set
> of dummy variables.
>
> I do agree that some of the factor behaviour is confusing at best, but that
> doesn't change the appropriate use and meaning of factors as categorical
> variables.
>
> Even more, I oppose the ideas that :
>
> 1) factors with different levels should be concatenated.
>
> 2) when combining factors, the union of the levels would somehow be a good
> choice.
>
> Factors with different levels are variables with different information, not
> more or less information. If one factor codes low and high and another
> codes low, mid and high, you can't say whether mid in one factor would be
> low or high in the first one. The second has a higher resolution, and
> that's exactly the reason why they should NOT be combined. Different levels
> indicate a different grouping, and hence that data should never be used as
> one set of dummy variables in any model.
>
> Even when combining factors, the union of levels only makes sense to me if
> there's no overlap between levels of both factors. In all other cases, a
> researcher will need to determine whether levels with the same label do
> mean the same thing in both factors, and that's not guaranteed. And when
> we're talking a factor with a higher resolution and a lower resolution, the
> correct thing to do modelwise is to recode one of the factors so they have
> the same resolution and every level the same definition before you merge
> that data.
>
> So imho the combination of two factors with different levels (or even
> levels in a different order) should give an error. Which R currently
> doesn't throw, so I get there's room for improvement.

I 100% agree with you, and this is the behaviour that vctrs used to
have and that dplyr currently has (at least in bind_rows()). But
pragmatically, my experience with dplyr is that people find this
behaviour confusing and unhelpful. And when I played out the full
expression of this behaviour in vctrs, I found that it forced me to
think about the levels of factors more than I'd otherwise like to: it
made me think like a programmer, not like a data analyst. So in an
ideal world, yes, I think factors would have stricter behaviour, but
my sense is that imposing this strictness now will be onerous to most
analysts.

Hadley

-- 
http://hadley.nz



Re: [Rd] vctrs: a type system for the tidyverse

2018-08-08 Thread Hadley Wickham
>> So we say that a
>> factor `x` has finer resolution than factor `y` if the levels of `y`
>> are contained in `x`. So to find the common type of two factors, we
>> take the union of the levels of each factor, given a factor that has
>> finer resolution than both.
>
> I'm not so sure. I think a more useful definition of resolution may be
> that it is about increasing the precision of information. In that case,
> a factor with 4 levels each of which is present has a higher resolution
> than the same data with additional-but-absent levels on the factor object.
> Now that may be different when the the new levels are not absent, but
> my point is that its not clear to me that resolution is a useful way of
> talking about factors.

An alternative way of framing factors is that they're about tracking
possible values, particular possible values that don't exist in the
data that you have. Thinking about factors in that way, makes unioning
the levels more natural.

> If users want unrestricted character type behavior, then IMHO they should
> just be using characters, and it's quite easy for them to do so in any case
> I can easily think of where they have somehow gotten their hands on a factor.
> If, however, they want a factor, it must be - I imagine - because they 
> actually
> want the the semantics and behavior specific to factors.

I think this is true in the tidyverse, which will never give you a
factor unless you explicitly ask for one, but the default in base R
(at least as soon as a data frame is involved) is to turn character
vectors into factors.

Hadley

-- 
http://hadley.nz



Re: [Rd] vctrs: a type system for the tidyverse

2018-08-08 Thread Hadley Wickham
> > I now have a better argument, I think:
>
> > If you squint your brain a little, I think you can see
> > that each set of automatic coercions is about increasing
> > resolution. Integers are low resolution versions of
> > doubles, and dates are low resolution versions of
> > date-times. Logicals are low resolution version of
> > integers because there's a strong convention that `TRUE`
> > and `FALSE` can be used interchangeably with `1` and `0`.
>
> > But what is the resolution of a factor? We must take a
> > somewhat pragmatic approach because base R often converts
> > character vectors to factors, and we don't want to be
> > burdensome to users. So we say that a factor `x` has finer
> > resolution than factor `y` if the levels of `y` are
> > contained in `x`. So to find the common type of two
> > factors, we take the union of the levels of each factor,
> > given a factor that has finer resolution than
> > both. Finally, you can think of a character vector as a
> > factor with every possible level, so factors and character
> > vectors are coercible.
>
> > (extracted from the in-progress vignette explaining how to
> > extend vctrs to work with your own vctrs, now that vctrs
> > has been rewritten to use double dispatch)
>
> I like this argumentation, and find it very nice indeed!
> It confirms my own gut feeling which had lead me to agreeing
> with you, Hadley, that taking the union of all factor levels
> should be done here.

That's great to hear :)

> As Gabe mentioned (and you've explained about) the term "type"
> is really confusing here.  As you know, the R internals are all
> about SEXPs, TYPEOF(), etc, and that's what the R level
> typeof(.) also returns.  As you want to use something slightly
> different, it should be different naming, ideally something not
> existing yet in the R / S world, maybe 'kind' ?

Agreed - I've been using type in the sense of "type system"
(particularly as it related to algebraic data types), but that's not
obvious from the current presentation, and as you note, is confusing
with existing notions of type in R. I like your suggestion of kind,
but I think it might be possible to just talk about classes, and
instead emphasise that while the components of the system are classes
(and indeed it's implemented using S3), the coercion/casting
relationship do not strictly follow the subclass/superclass
relationships.

A good motivating example is now ordered vs factor - I don't think you
can say that either ordered or factor has greater resolution than the
other, so:

vec_c(factor("a"), ordered("a"))
#> Error: No common type for factor and ordered

This is not what you'd expect from an _object_ system since ordered is
a subclass of factor.

Hadley

-- 
http://hadley.nz



Re: [Rd] vctrs: a type system for the tidyverse

2018-08-08 Thread Hadley Wickham
>>> Method dispatch for `vec_c()` is quite simple because associativity and
>>> commutativity mean that we can determine the output type only by
>>> considering a pair of inputs at a time. To this end, vctrs provides
>>> `vec_type2()` which takes two inputs and returns their common type
>>> (represented as zero length vector):
>>>
>>> str(vec_type2(integer(), double()))
>>> #>  num(0)
>>>
>>> str(vec_type2(factor("a"), factor("b")))
>>> #>  Factor w/ 2 levels "a","b":
>>
>>
>> What is the reasoning behind taking the union of the levels here? I'm not
>> sure that is actually the behavior I would want if I have a vector of
>> factors and I try to append some new data to it. I might want/ expect to
>> retain the existing levels and get either NAs or an error if the new data
>> has (present) levels not in the first data. The behavior as above doesn't
>> seem in-line with what I understand the purpose of factors to be (explicit
>> restriction of possible values).
>
> Originally (like a week ago), we threw an error if the factors
> didn't have the same levels, and provided an optional coercion to
> character. I decided that while correct (the factor levels are a
> parameter of the type, and hence factors with different levels aren't
> comparable), this fights too much against how people actually use
> factors in practice. It also seems like base R is moving more in this
> direction, i.e. in R 3.4 factor("a") == factor("b") is an error, whereas
> in R 3.5 it returns FALSE.

I now have a better argument, I think:

If you squint your brain a little, I think you can see that each set
of automatic coercions is about increasing resolution. Integers are
low resolution versions of doubles, and dates are low resolution
versions of date-times. Logicals are low resolution version of
integers because there's a strong convention that `TRUE` and `FALSE`
can be used interchangeably with `1` and `0`.

But what is the resolution of a factor? We must take a somewhat
pragmatic approach because base R often converts character vectors to
factors, and we don't want to be burdensome to users. So we say that a
factor `x` has finer resolution than factor `y` if the levels of `y`
are contained in `x`. So to find the common type of two factors, we
take the union of the levels of each factor, giving a factor that has
finer resolution than both. Finally, you can think of a character
vector as a factor with every possible level, so factors and character
vectors are coercible.
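
The "union of levels" rule above can be sketched in base R (illustrative
only; this is not the actual vctrs implementation):

```r
# Base-R sketch of the "union of levels" rule described above.
# The common type is an empty factor whose levels contain both inputs'.
common_factor_type <- function(x, y) {
  factor(character(), levels = union(levels(x), levels(y)))
}

levels(common_factor_type(factor("a"), factor(c("b", "c"))))
#> [1] "a" "b" "c"
```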

(extracted from the in-progress vignette explaining how to extend
vctrs to work with your own vctrs, now that vctrs has been rewritten
to use double dispatch)

Hadley

-- 
http://hadley.nz



Re: [Rd] vctrs: a type system for the tidyverse

2018-08-06 Thread Hadley Wickham
> First off, you are using the word "type" throughout this email; You seem to
> mean class (judging by your Date and factor examples, and the fact you
> mention S3 dispatch) as opposed to type in the sense of what is returned by
> R's  typeof() function. I think it would be clearer if you called it class
> throughout unless that isn't actually what you mean (in which case I would
> have other questions...)

I used "type" to hand wave away the precise definition - it's not S3
class or base type (i.e. typeof()) but some hybrid of the two. I do
want to emphasise that it's a type system, not an OO system, in that
coercions are not defined by superclass/subclass relationships.

> More thoughts inline.
>
> On Mon, Aug 6, 2018 at 9:21 AM, Hadley Wickham  wrote:
>>
>> Hi all,
>>
>> I wanted to share with you an experimental package that I’m currently
>> working on: vctrs, <https://github.com/r-lib/vctrs>. The motivation for
>> vctrs is to think deeply about the output “type” of functions like
>> `c()`, `ifelse()`, and `rbind()`, with an eye to implementing one
>> strategy throughout the tidyverse (i.e. all the functions listed at
>> <https://github.com/r-lib/vctrs#tidyverse-functions>). Because this is
>> going to be a big change, I thought it would be very useful to get
>> comments from a wide audience, so I’m reaching out to R-devel to get
>> your thoughts.
>>
>> There is quite a lot already in the readme
>> (<https://github.com/r-lib/vctrs#vctrs>), so here I’ll try to motivate
>> vctrs as succinctly as possible by comparing `base::c()` to its
>> equivalent `vctrs::vec_c()`. I think the drawbacks of `c()` are well
>> known, but to refresh your memory, I’ve highlighted a few at
>> <https://github.com/r-lib/vctrs#compared-to-base-r>. I think they arise
>> because of two main challenges: `c()` has to both combine vectors *and*
>> strip attributes, and it only dispatches on the first argument.
>>
>> The design of vctrs is largely driven by a pair of principles:
>>
>> -   The type of `vec_c(x, y)` should be the same as `vec_c(y, x)`
>>
>> -   The type of `vec_c(x, vec_c(y, z))` should be the same as
>> `vec_c(vec_c(x, y), z)`
>>
>> i.e. the type should be associative and commutative. I think these are
>> good principles because they make types simpler to understand and to
>> implement.
>>
>> Method dispatch for `vec_c()` is quite simple because associativity and
>> commutativity mean that we can determine the output type only by
>> considering a pair of inputs at a time. To this end, vctrs provides
>> `vec_type2()` which takes two inputs and returns their common type
>> (represented as zero length vector):
>>
>> str(vec_type2(integer(), double()))
>> #>  num(0)
>>
>> str(vec_type2(factor("a"), factor("b")))
>> #>  Factor w/ 2 levels "a","b":
>
>
> What is the reasoning behind taking the union of the levels here? I'm not
> sure that is actually the behavior I would want if I have a vector of
> factors and I try to append some new data to it. I might want/ expect to
> retain the existing levels and get either NAs or an error if the new data
> has (present) levels not in the first data. The behavior as above doesn't
> seem in-line with what I understand the purpose of factors to be (explicit
> restriction of possible values).

Originally (like a week ago), we threw an error if the factors
didn't have the same levels, and provided an optional coercion to
character. I decided that while correct (the factor levels are a
parameter of the type, and hence factors with different levels aren't
comparable), this fights too much against how people actually use
factors in practice. It also seems like base R is moving more in this
direction, i.e. in R 3.4 factor("a") == factor("b") is an error, whereas
in R 3.5 it returns FALSE.

I'm not wedded to the current approach, but it feels like the same
principle should apply in comparisons like x == y (even though == is
outside the scope of vctrs, ideally the underlying principles would be
robust enough to suggest what should happen).

> I guess what I'm saying is that while I agree associativity is good for most
> things, it doesn't seem like the right behavior to me in the case of
> factors.

I think associativity is such a strong and useful principle that it
may be worth making some sacrifices for factors. That said, my claim
of associativity is only on the type, not the values of the type:
vec_c(fa, fb) and vec_c(fb, fa) both return factors, but the levels
are in different orders.
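
That last point can be seen directly in base R - the union of the level
sets is the same in either order, but the ordering differs (illustrative;
not vctrs code):

```r
# Same set of levels either way, but differently ordered depending on
# which factor comes first
fa <- factor("a")
fb <- factor("b")

union(levels(fa), levels(fb))
#> [1] "a" "b"
union(levels(fb), levels(fa))
#> [1] "b" "a"
```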

> Also, while we're on factors, what does
>

[Rd] vctrs: a type system for the tidyverse

2018-08-06 Thread Hadley Wickham
Hi all,

I wanted to share with you an experimental package that I’m currently
working on: vctrs, <https://github.com/r-lib/vctrs>. The motivation for
vctrs is to think deeply about the output “type” of functions like
`c()`, `ifelse()`, and `rbind()`, with an eye to implementing one
strategy throughout the tidyverse (i.e. all the functions listed at
<https://github.com/r-lib/vctrs#tidyverse-functions>). Because this is
going to be a big change, I thought it would be very useful to get
comments from a wide audience, so I’m reaching out to R-devel to get
your thoughts.

There is quite a lot already in the readme
(<https://github.com/r-lib/vctrs#vctrs>), so here I’ll try to motivate
vctrs as succinctly as possible by comparing `base::c()` to its
equivalent `vctrs::vec_c()`. I think the drawbacks of `c()` are well
known, but to refresh your memory, I’ve highlighted a few at
<https://github.com/r-lib/vctrs#compared-to-base-r>. I think they arise
because of two main challenges: `c()` has to both combine vectors *and*
strip attributes, and it only dispatches on the first argument.

The design of vctrs is largely driven by a pair of principles:

-   The type of `vec_c(x, y)` should be the same as `vec_c(y, x)`

-   The type of `vec_c(x, vec_c(y, z))` should be the same as
`vec_c(vec_c(x, y), z)`

i.e. the type should be associative and commutative. I think these are
good principles because they make types simpler to understand and to
implement.

Method dispatch for `vec_c()` is quite simple because associativity and
commutativity mean that we can determine the output type only by
considering a pair of inputs at a time. To this end, vctrs provides
`vec_type2()` which takes two inputs and returns their common type
(represented as zero length vector):

str(vec_type2(integer(), double()))
#>  num(0)

str(vec_type2(factor("a"), factor("b")))
#>  Factor w/ 2 levels "a","b":

# NB: not all types have a common/unifying type
str(vec_type2(Sys.Date(), factor("a")))
#> Error: No common type for date and factor

(`vec_type2()` currently implements double dispatch through a combination
of S3 dispatch and if-else blocks, but this will change to a pure S3
approach in the near future.)

To find the common type of multiple vectors, we can use `Reduce()`:

vecs <- list(TRUE, 1:10, 1.5)

type <- Reduce(vec_type2, vecs)
str(type)
#>  num(0)

There’s one other piece of the puzzle: casting one vector to another
type. That’s implemented by `vec_cast()` (which also uses double
dispatch):

str(lapply(vecs, vec_cast, to = type))
#> List of 3
#>  $ : num 1
#>  $ : num [1:10] 1 2 3 4 5 6 7 8 9 10
#>  $ : num 1.5

All up, this means that we can implement the essence of `vec_c()` in
only a few lines:

vec_c2 <- function(...) {
  args <- list(...)
  type <- Reduce(vec_type2, args)

  cast <- lapply(args, vec_cast, to = type)
  unlist(cast, recursive = FALSE)
}

vec_c(factor("a"), factor("b"))
#> [1] a b
#> Levels: a b

vec_c(Sys.Date(), Sys.time())
#> [1] "2018-08-06 00:00:00 CDT" "2018-08-06 11:20:32 CDT"

(The real implementation is a little more complex:
)

On top of this foundation, vctrs expands in a few different ways:

-   To consider the “type” of a data frame, and what the common type of
two data frames should be. This leads to a natural implementation of
`vec_rbind()` which includes all columns that appear in any input.

-   To create a new “list_of” type, a list where every element is of
fixed type (enforced by `[<-`, `[[<-`, and `$<-`)

-   To think a little about the “shape” of a vector, and to consider
recycling as part of the type system. (This thinking is not yet
fully fleshed out)

Thanks for making it to the bottom of this long email :) I would love to
hear your thoughts on vctrs. It’s something that I’ve been having a lot
of fun exploring, and I’d like to make sure it is as robust as possible
(and the motivations are as clear as possible) before we start using it
in other packages.

Hadley


-- 
http://hadley.nz



Re: [Rd] Is NULL a vector?

2018-07-23 Thread Hadley Wickham
On Mon, Jul 23, 2018 at 2:17 PM, Duncan Murdoch
 wrote:
> On 23/07/2018 3:03 PM, Hadley Wickham wrote:
>>
>> Hi all,
>>
>> Would you generally consider NULL to be a vector?
>
>
> According to the language definition (in the doc directory), it is not:
> "Vectors can be thought of as contiguous cells containing data. Cells are
> accessed through indexing operations such as x[5]. More details are given in
> Indexing.
>
> R has six basic (‘atomic’) vector types: logical, integer, real, complex,
> string (or character) and raw. The modes and storage modes for the different
> vector types are listed in the following table."
>
> and later
>
> "There is a special object called NULL. It is used whenever there is a need
> to indicate or specify that an object is absent. It should not be confused
> with a vector or list of zero length."

Perfect, thanks!

Also available online at
https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Vector-objects

Hadley

-- 
http://hadley.nz



[Rd] Is NULL a vector?

2018-07-23 Thread Hadley Wickham
Hi all,

Would you generally consider NULL to be a vector? Base R functions are
a little inconsistent:

## In favour

``` r
identical(as.vector(NULL), NULL)
#> [1] TRUE

identical(as(NULL, "vector"), NULL)
#> [1] TRUE

# supports key vector generics
length(NULL)
#> [1] 0
NULL[c(3, 4, 5)]
#> NULL
NULL[[1]]
#> NULL
```

## Against

``` r
is.vector(NULL)
#> [1] FALSE

is(NULL, "vector")
#> [1] FALSE
```

## Abstentions

``` r
is.atomic(NULL)
#> [1] TRUE
# documentation states "returns TRUE if x is of an atomic type (or NULL)"
# is "or" exclusive or inclusive?
```

Hadley

-- 
http://hadley.nz



Re: [Rd] Testing for vectors

2018-07-08 Thread Hadley Wickham
On Sat, Jul 7, 2018 at 11:19 PM, Gabe Becker  wrote:
> Hadley,
>
>
> On Sat, Jul 7, 2018 at 1:32 PM, Hadley Wickham  wrote:
>>
>> On Sat, Jul 7, 2018 at 1:50 PM, Gabe Becker  wrote:
>> > Hadley,
>> >
>> >>
>> >> I was thinking primarily of completing the set of is.matrix() and
>> >> is.array(), or generally, how do you say: is `x` a 1d dimensional
>> >> thing?
>> >
>> >
>> > Can you clarify what you mean by dimensionality sense and specifically
>> > 1d
>> > here?
>>
>> What do we call a vector that is not an array? (or matrix)
>>
>> What do we call an object that acts 1-dimensional? (i.e. has
>> length(dim()) %in% c(0, 1)) ?
>
>
>
> Right, or even (length(dim()) == 0 || sum(dim() > 1) <= 1)
>
>  but that is exactly my point, those two(/three) sets of things are not the
> same. 1d arrays meet the second definition but not the first. Matrices and
> arrays that don't meet either of yours would still meet mine. Which
> definition are you proposing strictly define what a vector is?

I am not proposing any definition. I am enquiring if there is a
definition in base R. The answer appears to be no.

> Another completely unrelated way to define vector, btw, is via the vector
> interface (from what I recall this is roughly [, [[, length, and format
> methods, though I'm probably forgetting some). This is (more or less)
> equivalent to defining a vector as "a thing that can be the column of a
> data.frame and have all the base-provided machinery work".

I don't know if that definition is adequate because a call would be a
vector by that definition. I'm pretty sure a call does not make sense
as a data frame column.

Also technically data frames don't require their columns to have equal
length(), but equal NROW(). So the spirit of that definition would
imply that matrices and arrays are also vectors, which seems like it
might be undesirable.
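
The NROW() vs length() distinction above is easy to check in base R
(a quick sketch of the point, not anyone's library code):

```r
# Data frame columns are validated by NROW(), not length()
m <- matrix(1:6, nrow = 3)
NROW(m)    # 3: rows, not cells
length(m)  # 6: all cells

df <- data.frame(x = 1:3)
df$m <- m  # allowed: a matrix column with 3 rows
nrow(df)   # 3
ncol(df)   # 2 (the whole matrix counts as one column)
```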

Hadley

-- 
http://hadley.nz



Re: [Rd] Testing for vectors

2018-07-08 Thread Hadley Wickham
On Sat, Jul 7, 2018 at 3:48 PM, Ott Toomet  wrote:
> Thanks, Hadley for bringing this up:-)
>
> I am teaching R and I can suggest 5 different definitions of 'vector':
>
> a) vector as a collection of homogeneous objects, indexed by [ ] (more
> precisely atomic vector).  Sometimes you hear that in R, "everything is a
> vector", but this is only true for atomic objects.
> b) vector as a collection of objects, indexed by either [ ] and [[ ]].  This
> includes atomic vectors and lists.
> c) vector versus scalar.  It pops up when teaching math and stats, and is
> somewhat confusing, in particular if my previous claim was that "R does not
> have scalars".
> d) vector versus matrix (or other arrays).  Again, it only matters when
> doing matrix operations where 'vectors', i.e. objects with NULL dimension,
> behave their own way.
> e) finally, 'is.vector' has its own understanding of what constitutes a
> vector.

Yes!

And to add to the confusion there are three meanings to numeric vector:

* As an alias for double (i.e. numeric() and as.numeric())
* To refer to integer and double types jointly (as in the S3 and S4 class)
* A vector that behaves as if it is a number (e.g. is.numeric(), which
excludes factors)
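
A quick base-R sketch of those three senses:

```r
# 1. "numeric" as an alias for double
typeof(numeric(1))
#> [1] "double"

# 2. "numeric" covering integer and double jointly
is.numeric(1L)
#> [1] TRUE

# 3. "behaves like a number": factors are excluded despite integer storage
is.numeric(factor("a"))
#> [1] FALSE
```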

Hadley

-- 
http://hadley.nz



Re: [Rd] Testing for vectors

2018-07-07 Thread Hadley Wickham
On Sat, Jul 7, 2018 at 1:50 PM, Gabe Becker  wrote:
> Hadley,
>
>>
>> I was thinking primarily of completing the set of is.matrix() and
>> is.array(), or generally, how do you say: is `x` a 1d dimensional
>> thing?
>
>
> Can you clarify what you mean by dimensionality sense and specifically 1d
> here?

What do we call a vector that is not an array? (or matrix)

What do we call an object that acts 1-dimensional? (i.e. has
length(dim()) %in% c(0, 1)) ?

> You can also have an n x 1 matrix, which technically has 2 dimensions but
> conceptually is equivalent to a 1d array and/or a vector.

Yes. You can also have an array that's n x 1 x 1.

> Also, are you including lists in your conceptions of 1d vector here? I'm
> with Duncan here, in that i'm having trouble understanding exactly what you
> want to do without a bit more context.

Isn't it standard terminology that a vector is the set of atomic vectors + list?

Hadley

-- 
http://hadley.nz



Re: [Rd] Testing for vectors

2018-07-07 Thread Hadley Wickham
On Sat, Jul 7, 2018 at 12:54 PM, Duncan Murdoch
 wrote:
> On 07/07/2018 1:20 PM, Hadley Wickham wrote:
>>
>> Hi all,
>>
>> Is there a base function that I've missed that tests if an object is
>> a vector in the dimensionality sense, rather than the data structure
>> sense? i.e. something that checks is.null(dim(x)) ?
>>
>> is.vector() is trivially disqualified since it also checks for the
>> presence of non-names attributes:
>>
>> x <- factor(c("a", "a", "b"))
>> is.vector(x)
>> #> [1] FALSE
>>
>> is.null(dim(x))
>> #> [1] TRUE
>>
>
> I don't know of one.  I can't think of nontrivial cases where that
> distinction matters; do you know of any where base functions act differently
> on vectors and 1D arrays?  (A trivial example is that dimnames(x) gives
> different results for a named vector and an array with dimnames.)

I was thinking primarily of completing the set of is.matrix() and
is.array(), or generally, how do you say: is `x` a 1-dimensional
thing?

(I don't have any feel for whether the check should be is.null(dim(x))
vs. length(dim(x)) <= 1)
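
For what it's worth, the two candidate checks disagree exactly on 1-d
arrays (a quick base-R sketch):

```r
x <- array(1:3)       # a 1-d array

is.null(dim(x))       # FALSE: dim(x) is 3L
length(dim(x)) <= 1   # TRUE
```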

Hadley
-- 
http://hadley.nz



[Rd] Testing for vectors

2018-07-07 Thread Hadley Wickham
Hi all,

Is there a base function that I've missed that tests if an object is
a vector in the dimensionality sense, rather than the data structure
sense? i.e. something that checks is.null(dim(x)) ?

is.vector() is trivially disqualified since it also checks for the
presence of non-names attributes:

x <- factor(c("a", "a", "b"))
is.vector(x)
#> [1] FALSE

is.null(dim(x))
#> [1] TRUE

Hadley

-- 
http://hadley.nz



Re: [R-pkg-devel] Weird error on CRAN linux check

2018-07-04 Thread Hadley Wickham
I don't think it's related to the error, but you shouldn't be exporting this:

export("align<-.huxtable")

You should generally only export the generic.
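
In NAMESPACE terms, that advice looks like this (a sketch using the
huxtable names from this thread; illustrative only):

```r
# NAMESPACE sketch: register the method for dispatch, export only the
# generic - no export() line for the method itself
S3method("align<-", huxtable)
export("align<-")
```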

Hadley

On Wed, Jul 4, 2018 at 9:00 AM, David Hugh-Jones
 wrote:
> Hi all,
>
> The following shows an error for my package:
> https://www.r-project.org/nosvn/R.check/r-release-linux-x86_64/huxtable-00check.html
>
> Here's an excerpt:
>
>> ### ** Examples
>>
>>
>> ht <- huxtable(a = 1:3, b = 1:3)
>> align(ht) <- 'right'
> Error in UseMethod("align<-") :
>   no applicable method for 'align<-' applied to an object of class
> "c('huxtable', 'data.frame')"
> Calls: align<-
>
>
> The error didn't show up on win-builder, travis, appveyor or my own
> computer (a mac). The package defines an `align<-.huxtable` method which is
> correctly loaded on my computer, and the NAMESPACE file contains these
> lines:
>
> S3method("align<-",huxtable)
> export("align<-")
> export("align<-.huxtable")
>
> Has anyone got any ideas?
>
> David
>
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



-- 
http://hadley.nz

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Errors from Suggests or Enhances not in mainstream repositories

2018-07-03 Thread Hadley Wickham
On Tue, Jul 3, 2018 at 5:57 AM, Duncan Murdoch  wrote:
> On 02/07/2018 6:13 PM, Ben Bolker wrote:
>>
>> I got something similar.  I have a few thoughts:
>>
>> (1) you should use  "if (require(citrus)) { ... }" in your examples;
>> "Suggests" and "Enhances" packages are supposed to be *optional*, i.e.
>> examples and tests should be able to run even if they're not installed
>
>
> Nowadays 'if (requireNamespace("citrus")) { ... }' would be preferred in
> tests and examples, along with a 'citrus::' prefix on the objects from that
> package that are needed.  This has milder side effects than `require()`.

And requireNamespace("citrus", quietly = TRUE) is even slightly better
since it avoids one more side-effect ;)
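
Putting the thread's advice together, a guarded example block would look
something like this ("citrus" is the hypothetical suggested package from
this thread, and `make_juice()` is an invented function name):

```r
# Run the example only when the optional package is installed;
# quietly = TRUE also suppresses the startup-message side effect
if (requireNamespace("citrus", quietly = TRUE)) {
  citrus::make_juice()  # hypothetical function, for illustration
}
```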

Hadley

-- 
http://hadley.nz



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hadley Wickham
On Fri, Jun 8, 2018 at 2:09 PM, Berry, Charles  wrote:
>
>
>> On Jun 8, 2018, at 1:49 PM, Hadley Wickham  wrote:
>>
>> Hmmm, yes, there must be some special case in the C code to avoid
>> recycling a length-1 logical vector:
>
>
> Here is a version that (I think) handles Herve's issue of arrays having one 
> or more 0 dimensions.
>
> subset_ROW <-
> function(x,i)
> {
> dims <- dim(x)
> index_list <- which(dims[-1] != 0L) + 3
> mc <- quote(x[i])
> nd <- max(1L, length(dims))
> mc[ index_list ] <- list(TRUE)
> mc[[ nd + 3L ]] <- FALSE
> names( mc )[ nd+3L ] <- "drop"
> eval(mc)
> }
>
> Curiously enough the timing is *much* better for this implementation than for 
> the first version I sent.
>
> Constructing a version of `mc' that looks like `x[i, , , , drop=FALSE]' can be 
> done with `alist(a=)' in place of `list(TRUE)' in the earlier version but 
> seems to slow things down noticeably. It requires almost twice (!!) as much 
> time as the version above.

I think that's probably because alist() is a slow way to generate a
missing symbol:

bench::mark(
  alist(x = ),
  list(x = quote(expr = )),
  check = FALSE
)[1:5]
#> # A tibble: 2 x 5
#>   expression                    min     mean   median      max
#> 1 alist(x = )                 2.8µs   3.54µs   3.29µs   34.9µs
#> 2 list(x = quote(expr = ))    169ns 219.38ns    181ns   24.2µs

(note the units)

Hadley


-- 
http://hadley.nz



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hadley Wickham
Hmmm, yes, there must be some special case in the C code to avoid
recycling a length-1 logical vector:

dims <- c(4, 4, 4, 1e5)

arr <- array(rnorm(prod(dims)), dims)
dim(arr)
#> [1]      4      4      4 100000
i <- c(1, 3)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)[c("expression", "min", "mean", "max")]
#> # A tibble: 2 x 4
#>   expression                    min    mean     max
#> 1 arr[i, TRUE, TRUE, TRUE]   41.8ms  43.6ms  46.5ms
#> 2 arr[i, , , ]               41.7ms  43.1ms  46.3ms


On Fri, Jun 8, 2018 at 12:31 PM, Berry, Charles  wrote:
>
>
>> On Jun 8, 2018, at 11:52 AM, Hadley Wickham  wrote:
>>
>> On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles  wrote:
>>>
>>>
>>>> On Jun 8, 2018, at 10:37 AM, Hervé Pagès  wrote:
>>>>
>>>> Also the TRUEs cause problems if some dimensions are 0:
>>>>
>>>>> matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
>>>> Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>>>>   (subscript) logical subscript too long
>>>
>>> OK. But this is easy enough to handle.
>>>
>>>>
>>>> H.
>>>>
>>>> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>>>>> I suspect this will have suboptimal performance since the TRUEs will
>>>>> get recycled. (Maybe there is, or could be, ALTREP, support for
>>>>> recycling)
>>>>> Hadley
>>>
>>>
>>> AFAICS, it is not an issue. Taking
>>>
>>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>>>
>>> as a test case
>>>
>>> and using a function that will either use the literal code 
>>> `x[i, , , , drop=FALSE]' or `eval(mc)':
>>>
>>> subset_ROW4 <-
>>> function(x, i, useLiteral=FALSE)
>>> {
>>>literal <- quote(x[i, , , , drop=FALSE])
>>>mc <- quote(x[i])
>>>nd <- max(1L, length(dim(x)))
>>>mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
>>>mc[["drop"]] <- FALSE
>>>if (useLiteral)
>>>eval(literal)
>>>else
>>>eval(mc)
>>> }
>>>
>>> I get identical times with
>>>
>>> system.time(for (i in 1:1) 
>>> subset_ROW4(arr,seq(1,length=10,by=100),TRUE))
>>>
>>> and with
>>>
>>> system.time(for (i in 1:1) 
>>> subset_ROW4(arr,seq(1,length=10,by=100),FALSE))
>>
>> I think that's because you used a relatively low precision timing
>> mechanism, and included the index generation in the timing. I see:
>>
>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>> i <- seq(1,length = 10, by = 100)
>>
>> bench::mark(
>>  arr[i, TRUE, TRUE, TRUE],
>>  arr[i, , , ]
>> )
>> #> # A tibble: 2 x 6
>> #>   expression        min    mean   median      max  n_gc
>> #> 1 arr[i, TRUE,…   7.4µs  10.9µs  10.66µs   1.22ms     2
>> #> 2 arr[i, , , ]   7.06µs   8.8µs   7.85µs 538.09µs     2
>>
>> So not a huge difference, but it's there.
>
>
> Funny. I get similar results to yours above albeit with smaller differences. 
> Usually < 5 percent.
>
> But with subset_ROW4 I see no consistent difference.
>
> In this example, it runs faster on average using `eval(mc)' to return the 
> result:
>
>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>> i <- seq(1,length=10,by=100)
>> bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8]
> # A tibble: 2 x 8
>   expression                    min    mean  median      max `itr/sec` mem_alloc  n_gc
> 1 subset_ROW4(arr, i, FALSE) 28.9µs  34.9µs  32.1µs   1.36ms    28686.    5.05KB     5
> 2 subset_ROW4(arr, i, TRUE)  28.9µs    35µs  32.4µs 875.11µs    28572.    5.05KB     5
>>
>
> And on subsequent reps the lead switches back and forth.
>
>
> Chuck
>



-- 
http://hadley.nz



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hadley Wickham
On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles  wrote:
>
>
>> On Jun 8, 2018, at 10:37 AM, Hervé Pagès  wrote:
>>
>> Also the TRUEs cause problems if some dimensions are 0:
>>
>>  > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
>>  Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>>(subscript) logical subscript too long
>
> OK. But this is easy enough to handle.
>
>>
>> H.
>>
>> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>>> I suspect this will have suboptimal performance since the TRUEs will
>>> get recycled. (Maybe there is, or could be, ALTREP, support for
>>> recycling)
>>> Hadley
>
>
> AFAICS, it is not an issue. Taking
>
> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>
> as a test case
>
> and using a function that will either use the literal code 
> `x[i, , , , drop=FALSE]' or `eval(mc)':
>
> subset_ROW4 <-
>  function(x, i, useLiteral=FALSE)
> {
> literal <- quote(x[i, , , , drop=FALSE])
> mc <- quote(x[i])
> nd <- max(1L, length(dim(x)))
> mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
> mc[["drop"]] <- FALSE
> if (useLiteral)
> eval(literal)
> else
> eval(mc)
>  }
>
> I get identical times with
>
> system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))
>
> and with
>
> system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))

I think that's because you used a relatively low precision timing
mechanism, and included the index generation in the timing. I see:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length = 10, by = 100)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)
#> # A tibble: 2 x 6
#>   expression        min    mean   median      max  n_gc
#> 1 arr[i, TRUE,…   7.4µs  10.9µs  10.66µs   1.22ms     2
#> 2 arr[i, , , ]   7.06µs   8.8µs   7.85µs 538.09µs     2

So not a huge difference, but it's there.

Hadley


-- 
http://hadley.nz



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hadley Wickham
I suspect this will have suboptimal performance since the TRUEs will
get recycled. (Maybe there is, or could be, ALTREP support for
recycling.)
Hadley

On Fri, Jun 8, 2018 at 10:16 AM, Berry, Charles  wrote:
>
>
>> On Jun 8, 2018, at 8:45 AM, Hadley Wickham  wrote:
>>
>> Hi all,
>>
>> Is there a better to way to subset the ROWs (in the sense of NROW) of
>> an vector, matrix, data frame or array than this?
>
>
> You can use TRUE to fill the subscripts for dimensions 2:nd
>
>>
>> subset_ROW <- function(x, i) {
>>  nd <- length(dim(x))
>>  if (nd <= 1L) {
>>x[i]
>>  } else {
>>dims <- rep(list(quote(expr = )), nd - 1L)
>>do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
>>  }
>> }
>
>
> subset_ROW <-
> function(x,i)
> {
> mc <- quote(x[i])
> nd <- max(1L, length(dim(x)))
> mc[seq(4, length=nd-1L)] <- rep(list(TRUE), nd - 1L)
> mc[["drop"]] <- FALSE
> eval(mc)
>
> }
>
>>
>> subset_ROW(1:10, 4:6)
>> #> [1] 4 5 6
>>
>> str(subset_ROW(array(1:10, c(10)), 2:4))
>> #>  int [1:3(1d)] 2 3 4
>> str(subset_ROW(array(1:10, c(10, 1)), 2:4))
>> #>  int [1:3, 1] 2 3 4
>> str(subset_ROW(array(1:10, c(5, 2)), 2:4))
>> #>  int [1:3, 1:2] 2 3 4 7 8 9
>> str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
>> #>  int [1:3, 1, 1] 2 3 4
>>
>> subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
>> #>   x y
>> #> 2 2 9
>> #> 3 3 8
>> #> 4 4 7
>>
>
> HTH,
>
> Chuck
>



-- 
http://hadley.nz


