Re: [Rd] Apply .Rbuildignore before copying files in R CMD build

2024-08-29 Thread Duncan Murdoch

On 2024-08-29 3:34 p.m., Gábor Csárdi wrote:

On Thu, Aug 29, 2024 at 12:12 AM Duncan Murdoch
 wrote:
[...]

I think the reason is simplicity.  The build process can add, delete or
modify files.  You wouldn't want that to happen on the original source
files, so R copies the files to a temporary location to run things.

If it applied .Rbuildignore first, then important files for the build
might not be available, and the build could fail.


AFAICT the ignored files are deleted right after the copy, so they
are not present during the build process. (But FIXME.)


I think some builds do that, but builds of packages with vignettes 
generally do an install of the package, and that might need the ignored 
files.  There could be other situations too.


You probably know this, but for the benefit of those who don't:  you can 
read the build operations in the 1100 line function 
tools:::.build_packages, which starts here: 
https://github.com/wch/r-source/blob/1bdf2503322b43ce8698008eb5bc1f55bc8a58c2/src/library/tools/R/build.R#L93


The prepare_pkg() function is run between the copy and the cleanup, and 
it might do a package install.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Apply .Rbuildignore before copying files in R CMD build

2024-08-28 Thread Duncan Murdoch

On 2024-08-28 5:59 p.m., Alexey Sergushichev wrote:

Hi,

Is there any reason why .Rbuildignore is not used before copying package
files in R CMD build?

For some of the packages I develop I have rather large directories with
miscellaneous files for testing and other purposes. They are in my
.Rbuildignore (and .gitignore) file, but that doesn't prevent R CMD build
from trying to copy them on the build process. Having them copied either
breaks the build completely because /tmp directory gets out of space, or
just slows it down a lot. So I wonder if there is a specific reason for
this behavior and whether it could be changed or controlled by some
parameter.

There is some discussion in the context of pkgbuild package:
https://github.com/r-lib/pkgbuild/issues/59 It provides a hackish
workaround for that, which also does not work on Windows.


I think the reason is simplicity.  The build process can add, delete or 
modify files.  You wouldn't want that to happen on the original source 
files, so R copies the files to a temporary location to run things.


If it applied .Rbuildignore first, then important files for the build 
might not be available, and the build could fail.


Having an R package that needs so much data that you can't fit two 
copies of it on your disk is a really unusual situation.  I think it 
will have to be up to you to fix it (by increasing your temp space, or 
decreasing the size of some of those files, or something else).
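
For what it's worth, one workaround is to stage a pruned copy of the
source yourself and build that, so the large ignored directories are
never copied.  A rough, untested sketch (it only approximates what R CMD
build does: real .Rbuildignore handling also matches directory names and
applies built-in exclusions, and stageAndBuild is a made-up name):

stageAndBuild <- function(pkg) {
  ignore <- readLines(file.path(pkg, ".Rbuildignore"))
  ignore <- ignore[nzchar(ignore)]               # drop blank lines
  files  <- list.files(pkg, recursive = TRUE, all.files = TRUE, no.. = TRUE)
  drop   <- rep(FALSE, length(files))
  for (pat in ignore)                            # each line is a Perl regexp
    drop <- drop | grepl(pat, files, perl = TRUE)
  stage <- file.path(tempdir(), basename(pkg))
  for (f in files[!drop]) {
    dest <- file.path(stage, f)
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    file.copy(file.path(pkg, f), dest)
  }
  system2(file.path(R.home("bin"), "R"), c("CMD", "build", stage))
}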


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] specials and ::

2024-08-27 Thread Duncan Murdoch

On 2024-08-27 9:43 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:

You are right of course, Peter, but I can see where some will get
confused.  In a formula some symbols and functions are special
operators, and others are simple functions.  That is the reason one
needs I(events/time) to put a rate in as a variable.  Someone who types
'offset' at the command line will see that there actually IS a function
behind the scenes.

Does anyone see a downside to Bill Dunlap's suggestion, where the first
step of my formula processing would be to "clean off" any survival::
modifiers?  That is, is there something that would break?  After all,
the code already has a lot of  "if () "  lines for other common user
errors.  I could view it as just saving me the time to deal with the 'we
found an error' emails.  I would output the corrected version as the
"call" component.
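
A bare-bones version of that cleaning step might look like this (an
untested sketch that ignores corner cases such as missing arguments):

stripSurvival <- function(e) {
  # Recursively rewrite a call or formula, replacing survival::f and
  # survival:::f by plain f.
  if (is.call(e)) {
    if ((identical(e[[1]], as.name("::")) || identical(e[[1]], as.name(":::"))) &&
        identical(e[[2]], as.name("survival")))
      return(stripSurvival(e[[3]]))
    for (i in seq_along(e))
      if (!is.null(e[[i]])) e[[i]] <- stripSurvival(e[[i]])
  }
  e
}

f <- survival::Surv(time, status) ~ ph.karno + survival::strata(inst)
stripSurvival(f)
#> Surv(time, status) ~ ph.karno + strata(inst)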


I don't know if you have any data vectors that someone might use in a 
fit, but conceivably


  survdiff( Surv(time, status) ~ survival::datavector +
strata(inst),  data=lung)

would mean something different than

  survdiff( Surv(time, status) ~ datavector +
strata(inst),  data=lung)

if a user had a vector named datavector.

Duncan Murdoch



Terry

On 8/27/24 03:38, peter dalgaard wrote:

In my view, that's just plain wrong, because strata() is not a function but a 
special operator in a model formula. Wouldn't it also blow up on 
stats::offset()?

Oh, yes it would:


lm(y~x+offset(z))

Call:
lm(formula = y ~ x + offset(z))

Coefficients:
(Intercept)            x
     0.7350       0.0719


lm(y~x+stats::offset(z))

Call:
lm(formula = y ~ x + stats::offset(z))

Coefficients:
     (Intercept)                x  stats::offset(z)
          0.6457           0.1078           0.8521


Or, to be facetious:


lm(y~base::"+"(x,z))

Call:
lm(formula = y ~ base::"+"(x, z))

Coefficients:
    (Intercept)  base::"+"(x, z)
         0.4516           0.4383



-pd


On 26 Aug 2024, at 16:42 , Therneau, Terry M., Ph.D. via 
R-devel  wrote:

The survival package makes significant use of the "specials" argument
of terms(), before calling model.frame; it is part of nearly every
modeling function.  The reason is that strata arguments simply have to
be handled differently than other things on the right hand side.
Likewise for tt() and cluster(), though those are much less frequent.

I now get "bug reports" from the growing segment that believes one
should put packagename:: in front of every single instance.  For instance

fit <- survival::survdiff( survival::Surv(time, status) ~ ph.karno +
survival::strata(inst),  data= survival::lung)

This fails to give the correct answer because it fools terms(formula,
specials="strata").  I've stood firm in my response of "that's your bug,
not mine", but I begin to believe I am swimming uphill.  One person
responded that it was company policy to qualify everything.
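
To see concretely what goes wrong, here is a standalone sketch (survival
need not be attached, since the specials matching is purely syntactic;
the indices refer to the "variables" attribute of the terms object):

f1 <- y ~ x + strata(g)
f2 <- y ~ x + survival::strata(g)
attr(terms(f1, specials = "strata"), "specials")$strata  # recognised
#> [1] 3
attr(terms(f2, specials = "strata"), "specials")$strata  # ordinary call
#> NULL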

I don't see an easy way to fix survival, and even if I did it would be a
tremendous amount of work.  What are others' thoughts?

Terry



--

Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"

[[alternative HTML version deleted]]

__
R-devel@r-project.org  mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] specials and ::

2024-08-26 Thread Duncan Murdoch

On 2024-08-26 12:34 p.m., Duncan Murdoch wrote:

On 2024-08-26 10:42 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:

The survival package makes significant use of the "specials" argument
of terms(), before calling model.frame; it is part of nearly every
modeling function.  The reason is that strata arguments simply have to
be handled differently than other things on the right hand side.
Likewise for tt() and cluster(), though those are much less frequent.

I now get "bug reports" from the growing segment that believes one
should put packagename:: in front of every single instance.  For instance

     fit <- survival::survdiff( survival::Surv(time, status) ~ ph.karno +
survival::strata(inst),  data= survival::lung)

This fails to give the correct answer because it fools terms(formula,
specials="strata").  I've stood firm in my response of "that's your bug,
not mine", but I begin to believe I am swimming uphill.  One person
responded that it was company policy to qualify everything.

I don't see an easy way to fix survival, and even if I did it would be a
tremendous amount of work.  What are others' thoughts?


I received a similar complaint about the tables package, which had
assumed during argument processing that it was on the search list in
order to find a function (see
https://github.com/dmurdoch/tables/issues/30 if you want the details).
In my case there's only one function exported by tables that wasn't
being found, "labelSubset".

I don't know any of the details of the survival problems.  When I try
your example code above without attaching survival, it appears to work.
So my solution might be irrelevant to you.

The way I found to work around this was to use this code early in the
processing, when it is trying to turn the data argument into an environment:

parent <- if (is.environment(data)) data else environment(table)
if (!exists("labelSubset", envir = parent)) {
  withTableFns <- new.env(parent = parent)
  withTableFns$labelSubset <- labelSubset
} else
  withTableFns <- parent

if (is.null(data))
  data <- withTableFns
else if (is.list(data))
  data <- list2env(data, parent = withTableFns)
else if (!is.environment(data))
  stop("'data' must be a dataframe, list or environment")


Of course, posting this meant I discovered a bug in it:  if 
is.environment(data) was TRUE, the modification was ignored.  Line 8 
should be


   if (is.null(data) || is.environment(data))

to handle that case.
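
With that fix applied the fragment reads (unchanged except for line 8;
still a fragment from inside the function, as before):

parent <- if (is.environment(data)) data else environment(table)
if (!exists("labelSubset", envir = parent)) {
  withTableFns <- new.env(parent = parent)
  withTableFns$labelSubset <- labelSubset
} else
  withTableFns <- parent

if (is.null(data) || is.environment(data))
  data <- withTableFns
else if (is.list(data))
  data <- list2env(data, parent = withTableFns)
else if (!is.environment(data))
  stop("'data' must be a dataframe, list or environment")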

Duncan Murdoch



This inserts a new environment containing just that one tables function.

One issue is if a user has "labelSubset" already in the environment; I
decided to use that one on the assumption that the user did it
intentionally.  It would have been better to use a name that was less
likely to show up in another package, but it's old code.

This isn't on CRAN yet, so I'd be interested in hearing about problems
with this approach, or better solutions.

Duncan Murdoch


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] specials and ::

2024-08-26 Thread Duncan Murdoch

On 2024-08-26 10:42 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:

The survival package makes significant use of the "specials" argument
of terms(), before calling model.frame; it is part of nearly every
modeling function.  The reason is that strata arguments simply have to
be handled differently than other things on the right hand side.
Likewise for tt() and cluster(), though those are much less frequent.

I now get "bug reports" from the growing segment that believes one
should put packagename:: in front of every single instance.  For instance

    fit <- survival::survdiff( survival::Surv(time, status) ~ ph.karno +
survival::strata(inst),  data= survival::lung)

This fails to give the correct answer because it fools terms(formula,
specials="strata").  I've stood firm in my response of "that's your bug,
not mine", but I begin to believe I am swimming uphill.  One person
responded that it was company policy to qualify everything.

I don't see an easy way to fix survival, and even if I did it would be a
tremendous amount of work.  What are others' thoughts?


I received a similar complaint about the tables package, which had 
assumed during argument processing that it was on the search list in 
order to find a function (see 
https://github.com/dmurdoch/tables/issues/30 if you want the details). 
In my case there's only one function exported by tables that wasn't 
being found, "labelSubset".


I don't know any of the details of the survival problems.  When I try 
your example code above without attaching survival, it appears to work. 
So my solution might be irrelevant to you.


The way I found to work around this was to use this code early in the 
processing, when it is trying to turn the data argument into an environment:


parent <- if (is.environment(data)) data else environment(table)
if (!exists("labelSubset", envir = parent)) {
  withTableFns <- new.env(parent = parent)
  withTableFns$labelSubset <- labelSubset
} else
  withTableFns <- parent

if (is.null(data))
  data <- withTableFns
else if (is.list(data))
  data <- list2env(data, parent = withTableFns)
else if (!is.environment(data))
  stop("'data' must be a dataframe, list or environment")

This inserts a new environment containing just that one tables function.

One issue is if a user has "labelSubset" already in the environment; I 
decided to use that one on the assumption that the user did it 
intentionally.  It would have been better to use a name that was less 
likely to show up in another package, but it's old code.


This isn't on CRAN yet, so I'd be interested in hearing about problems 
with this approach, or better solutions.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] specials and ::

2024-08-26 Thread Duncan Murdoch

On 2024-08-26 12:26 p.m., Chris Black wrote:

It’s completely reasonable to decline to do extra work to support it, but at 
the same time: Qualified calls are widely used and recommended, and users are 
also being completely reasonable when they try to use them (probably without 
checking the manual!) and expect them to work.


If the issues in survival are the same as the issues I saw in the tables 
package, the way to issue such a message would be to put code like


 if (! ("package:survival" %in% search()))
   stop("'survival' needs to be attached using library() or require()")

in functions that could trigger the problems.

Duncan Murdoch



Would there be a tolerably easy way to make the fit fail loudly on 
`survival::strata(…)` rather than return the wrong result?
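
One sketch of such a loud failure: scan the formula for qualified
specials before terms() ever sees it (hasQualifiedSpecial is a
hypothetical helper, not survival code):

hasQualifiedSpecial <- function(formula, specials) {
  found <- FALSE
  walk <- function(e) {
    if (is.call(e)) {
      if (identical(e[[1]], as.name("::")) || identical(e[[1]], as.name(":::")))
        if (as.character(e[[3]]) %in% specials) found <<- TRUE
      for (i in seq_along(e)) walk(e[[i]])
    }
  }
  walk(formula)
  found
}

hasQualifiedSpecial(y ~ x + survival::strata(g), c("strata", "cluster", "tt"))
#> [1] TRUE
# so a fitting function could stop() early with a clear message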




On Aug 26, 2024, at 7:42 AM, Therneau, Terry M., Ph.D. via R-devel 
 wrote:

The survival package makes significant use of the "specials" argument
of terms(), before calling model.frame; it is part of nearly every
modeling function.  The reason is that strata arguments simply have to
be handled differently than other things on the right hand side.
Likewise for tt() and cluster(), though those are much less frequent.

I now get "bug reports" from the growing segment that believes one
should put packagename:: in front of every single instance.  For instance

   fit <- survival::survdiff( survival::Surv(time, status) ~ ph.karno +
survival::strata(inst),  data= survival::lung)

This fails to give the correct answer because it fools terms(formula,
specials="strata").  I've stood firm in my response of "that's your bug,
not mine", but I begin to believe I am swimming uphill.  One person
responded that it was company policy to qualify everything.

I don't see an easy way to fix survival, and even if I did it would be a
tremendous amount of work.  What are others' thoughts?

Terry



--

Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Question about regexp edge case

2024-08-09 Thread Duncan Murdoch

Thanks!  I think your suggested additions to the docs are perfect.

Duncan Murdoch

On 2024-08-09 5:01 a.m., Tomas Kalibera wrote:


On 8/1/24 20:55, Duncan Murdoch wrote:

Thanks Tomas.  Do note that my original post also mentioned a bug or
doc error in the PCRE docs for this regexp:


   - perl = TRUE does *not* give the documented result on at least one
system (which is "123456789", because "{,5}" is documented to not be
a quantifier, so it should only match the literal string "{,5}").


This is a change in documented behavior in PCRE. PCRE2 10.43
(share/man/man3/pcre2pattern.3) says:

"If the first number is omitted, the lower limit is taken as zero; in
this case the upper limit must be present. X{,4} is interpreted as
X{0,4}. In earlier versions such a sequence was not interpreted as a
quantifier. Other regular expression engines may behave either way."

And the changelog:

"29. Perl 5.34.0 changed the meaning of (for example) {,3} which did not
used to be treated as a quantifier. Now it is interpreted as {0,3} and
PCRE2 has changed to match. Note that {,} is still not a quantifier."

Sadly the previous behavior was also documented in pcre2pattern.3:

"For example, {,6} is not a quantifier, but a literal string of four
characters"

I've confirmed with R built with PCRE2 10.42, 10.43 and 10.44. In
practice, users would most likely see the new behavior on Windows, where
Rtools44 has PCRE2 10.43.

The R documentation (?regex) refers to the PCRE2 documentation for
"complete details", mentioning how to find out which version of PCRE(2)
is used.  I've now added a warning that PCRE behavior may change
between versions, with {,m} as an example. I don't think we can do much
more - I don't think we should be replicating the PCRE
documentation/changelog - but we could add more examples if important
ones appear. Also, we don't want to write R programs that depend on
concrete versions of PCRE.

It is a good thing that ?regex doesn't document "{,m}", because it
cannot be used reliably/portably. One should use one of the documented
forms instead, i.e. "{0,m}". Indeed there is the problem of how to use
only the documented subset of behavior (in ?regex), because one also
needs to avoid accidentally running into undocumented expressions with
special meaning, like in this case. But perhaps authors could still try
to defensively avoid risky expressions in literals in patterns, such as
those involving "{}" or anything similar to documented expressions with
a special meaning.
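
Concretely, spelling out the lower bound sidesteps the issue in both
engines (a quick sketch; output as seen with current TRE and PCRE2):

gsub("^([0-9]{0,5}).*", "\\1", "123456789")               # TRE
#> [1] "12345"
gsub("^([0-9]{0,5}).*", "\\1", "123456789", perl = TRUE)  # PCRE2
#> [1] "12345"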

Best
Tomas




Duncan

On 2024-08-01 6:49 a.m., Tomas Kalibera wrote:


On 7/29/24 09:37, Ivan Krylov via R-devel wrote:

On Sun, 28 Jul 2024 20:02:21 -0400, Duncan Murdoch wrote:


gsub("^([0-9]{,5}).*","\\1","123456789")
[1] "123456"

This is in TRE itself: for "^([0-9]{,1})" tre_regexecb returns {.rm_so
= 0, .rm_eo = 1}, matching "1", but for "^([0-9]{,2})" and above it
returns an off-by-one result, {.rm_so = 0, .rm_eo = 3}.
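
The same off-by-one, seen from the R prompt (default TRE engine):

gsub("^([0-9]{,1}).*", "\\1", "123456789")   # behaves like {0,1}
#> [1] "1"
gsub("^([0-9]{,2}).*", "\\1", "123456789")   # one digit too many
#> [1] "123"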

Compiling with TRE_DEBUG, I see it parsed correctly:

catenation, sub 0, 0 tags
  assertions: bol
  iteration {-1, 2}, sub -1, 0 tags, greedy
    literal (0, 9) (48, 57), pos 0, sub -1, 0 tags

...but after tre_expand_ast I see

catenation, sub 0, 1 tags
  assertions: bol
  catenation, sub -1, 1 tags
    tag 0
    union, sub -1, 0 tags
      literal empty
      catenation, sub -1, 0 tags
        literal (0, 9) (48, 57), pos 2, sub -1, 0 tags
        union, sub -1, 0 tags
          literal empty
          catenation, sub -1, 0 tags
            literal (0, 9) (48, 57), pos 1, sub -1, 0 tags
            union, sub -1, 0 tags
              literal empty
              literal (0, 9) (48, 57), pos 0, sub -1, 0 tags

...which has one too many copies of "literal (0,9)". I think it's due
to the expansion loop on line 942 of src/extra/tre/tre-compile.c being

for (j = iter->min; j < iter->max; j++)

...where 'min' is -1 to denote no minimum. This is further confirmed by
"{0,3}", "{1,3}", "{2,3}", "{3,3}" all working correctly.

Neither TRE documentation [1] nor POSIX [2] specify the {,n} syntax:
from my reading, it looks like if the upper boundary is specified, the
lower boundary must be specified too. But if we do want to fix this, it
will have to be a special case for iter->min == -1.


Thanks. It seems that TRE is now maintained again upstream, so it would
be best to discuss this with TRE maintainers directly (if not already
solved by https://github.com/laurikari/tre/pull/98).

The same applies to any other open TRE issues.

Best Tomas





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Question about regexp edge case

2024-08-01 Thread Duncan Murdoch
Thanks Tomas.  Do note that my original post also mentioned a bug or doc 
error in the PCRE docs for this regexp:


  - perl = TRUE does *not* give the documented result on at least one 
system (which is "123456789", because "{,5}" is documented to not be a 
quantifier, so it should only match the literal string "{,5}").


Duncan

On 2024-08-01 6:49 a.m., Tomas Kalibera wrote:


On 7/29/24 09:37, Ivan Krylov via R-devel wrote:

On Sun, 28 Jul 2024 20:02:21 -0400, Duncan Murdoch wrote:


gsub("^([0-9]{,5}).*","\\1","123456789")
[1] "123456"

This is in TRE itself: for "^([0-9]{,1})" tre_regexecb returns {.rm_so
= 0, .rm_eo = 1}, matching "1", but for "^([0-9]{,2})" and above it
returns an off-by-one result, {.rm_so = 0, .rm_eo = 3}.

Compiling with TRE_DEBUG, I see it parsed correctly:

catenation, sub 0, 0 tags
  assertions: bol
  iteration {-1, 2}, sub -1, 0 tags, greedy
    literal (0, 9) (48, 57), pos 0, sub -1, 0 tags

...but after tre_expand_ast I see

catenation, sub 0, 1 tags
  assertions: bol
  catenation, sub -1, 1 tags
    tag 0
    union, sub -1, 0 tags
      literal empty
      catenation, sub -1, 0 tags
        literal (0, 9) (48, 57), pos 2, sub -1, 0 tags
        union, sub -1, 0 tags
          literal empty
          catenation, sub -1, 0 tags
            literal (0, 9) (48, 57), pos 1, sub -1, 0 tags
            union, sub -1, 0 tags
              literal empty
              literal (0, 9) (48, 57), pos 0, sub -1, 0 tags

...which has one too many copies of "literal (0,9)". I think it's due
to the expansion loop on line 942 of src/extra/tre/tre-compile.c being

for (j = iter->min; j < iter->max; j++)

...where 'min' is -1 to denote no minimum. This is further confirmed by
"{0,3}", "{1,3}", "{2,3}", "{3,3}" all working correctly.

Neither TRE documentation [1] nor POSIX [2] specify the {,n} syntax:
from my reading, it looks like if the upper boundary is specified, the
lower boundary must be specified too. But if we do want to fix this, it
will have to be a special case for iter->min == -1.


Thanks. It seems that TRE is now maintained again upstream, so it would
be best to discuss this with TRE maintainers directly (if not already
solved by https://github.com/laurikari/tre/pull/98).

The same applies to any other open TRE issues.

Best Tomas



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Question about regexp edge case

2024-07-28 Thread Duncan Murdoch
On StackOverflow (here: 
https://stackoverflow.com/questions/78803652/why-does-gsub-in-r-match-one-character-too-many) 
there was a question about this result:


> gsub("^([0-9]{,5}).*","\\1","123456789")
[1] "123456"

The OP expected "12345" as the result.  Several points were raised:

 - The R docs don't mention the case of {,5} for the default perl = 
FALSE which uses TRE.

 - perl = TRUE gives the OP's expected result of "12345".
 - perl = TRUE does *not* give the documented result on at least one 
system (which is "123456789", because "{,5}" is documented to not be a 
quantifier, so it should only match the literal string "{,5}").
 - Some regexp engines (including Perl and Awk) document that "12345" 
is correct.


Is any of this worth fixing?

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] \>

2024-06-29 Thread Duncan Murdoch
I agree with you (I think we may be similarly aged), but there is the 
`magrittr::debug_pipe()` function, which can be inserted anywhere into 
either kind of pipe.  It will call `debug()` at that point, and let you 
examine the current value, before passing it on to the next entry.


You can't single step through a pipe (as far as I know), but with that 
modification, you can see what you've got at any point.


Duncan Murdoch


On 2024-06-29 6:57 p.m., Spencer Graves wrote:

Hi, Duncan:


On 6/29/24 17:24, Duncan Murdoch wrote:



   Yes. I'm not yet facile with "|>", but I'm learning.


   Spencer Graves


There's very little to know.  This:

   x |> f() |> g()

is just a different way of writing

      g(f(x))

If f() or g() have extra arguments, just add them afterwards:

      x |> f(a = 1) |> g(b = 2)

is just

      g(f(x, a = 1), b = 2)



  Agreed. If I understand correctly, the supporters of the former think
it's easier to highlight and execute a subset of the earlier character
string, e.g., "x |> f(a = 1)" than the corresponding subset of the
latter, "f(x, a = 1)". I remain unconvinced.


  For debugging, I prefer the following:


  fx1 <- f(x, a = 1)
  g(fx1, b=2)


  Yes, "fx1" occupies storage space that the other two do not. Ir you
are writing code for an 8086, the difference in important. However, for
my work, ease of debugging is important, which is why I prefer, "fx1 <-
f(x, a = 1); g(fx1, b=2)".


  Thanks, again, for the reply.
  Spencer Graves



This isn't quite true of the magrittr pipe, but it is exactly true of
the base pipe.

Duncan Murdoch



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] \>

2024-06-29 Thread Duncan Murdoch




  Yes. I'm not yet facile with "|>", but I'm learning.


  Spencer Graves


There's very little to know.  This:

 x |> f() |> g()

is just a different way of writing

g(f(x))

If f() or g() have extra arguments, just add them afterwards:

x |> f(a = 1) |> g(b = 2)

is just

g(f(x, a = 1), b = 2)

This isn't quite true of the magrittr pipe, but it is exactly true of 
the base pipe.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] rbind() on zero row matrices is inconsistent

2024-06-26 Thread Duncan Murdoch

The help for cbind() and rbind() says

"For cbind (rbind), vectors of zero length (including NULL) are ignored 
unless the result would have zero rows (columns), for S compatibility. 
(Zero-extent matrices do not occur in S3 and are not ignored in R.)"


This leads to an inconsistency.


  M <- matrix(NA, 0, 0)  # Make a 0x0 matrix
  N <- matrix(NA, 0, 1)  # Make a 0x1 matrix


  dim(rbind(M, NULL, NULL)) # adds 2 rows to M
  #> [1] 2 0
  dim(rbind(N, NULL, NULL)) # leaves N unchanged
  #> [1] 0 1


You get an extra row on the 0x0 matrix for each NULL value that is bound 
to it, but the 0xn matrix is unchanged for n > 0.


Clearly from the help this is intentional, but is it desirable? 
Wouldn't it make more sense for NULL to be ignored by rbind() and cbind()?


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] SET_TYPEOF no longer allowed, how should I call R from within C?

2024-06-25 Thread Duncan Murdoch

On 2024-06-25 5:25 a.m., Matthew Fidler wrote:

Hi,

I have adapted code to run R from within C from the writing R extensions
here

https://colinfay.me/writing-r-extensions/system-and-foreign-language-interfaces.html


That was written in 2017.  You should use the one that came with R, or 
even better, the one that comes with the development version of R.
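
For reference, the same call can be constructed without SET_TYPEOF using
only long-standing entry points; a sketch (I believe recent versions of
R also provide allocLang() as a direct replacement for allocList() plus
SET_TYPEOF(), so check the manual that ships with your R):

/* Build print(CAR(a), digits = digits) from language cons cells
   directly; LCONS creates LANGSXP cells, so no SET_TYPEOF is needed. */
SEXP call = PROTECT(LCONS(install("print"),
                    LCONS(CAR(a),
                    LCONS(ScalarInteger(digits), R_NilValue))));
SET_TAG(CDDR(call), install("digits"));   /* name the second argument */
eval(call, env);
UNPROTECT(1);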


Duncan Murdoch



As a more comprehensive example of constructing an R call in C code and
evaluating, consider the following fragment of printAttributes in
src/main/print.c.

 /* Need to construct a call to
        print(CAR(a), digits=digits)
    based on the R_print structure, then eval(call, env).
    See do_docall for the template for this sort of thing. */

 SEXP s, t;
 t = s = PROTECT(allocList(3));
 SET_TYPEOF(s, LANGSXP);
 SETCAR(t, install("print")); t = CDR(t);
 SETCAR(t, CAR(a)); t = CDR(t);
 SETCAR(t, ScalarInteger(digits));
 SET_TAG(t, install("digits"));
 eval(s, env);
 UNPROTECT(1);

At this point CAR(a) is the R object to be printed, the current attribute.
There are three steps: the call is constructed as a pairlist of length 3,
the list is filled in, and the expression represented by the pairlist is
evaluated.

A pairlist is quite distinct from a generic vector list, the only
user-visible form of list in R. A pairlist is a linked list (with CDR(t)
computing the next entry), with items (accessed by CAR(t)) and names or
tags (set by SET_TAG). In this call there are to be three items, a symbol
(pointing to the function to be called) and two argument values, the first
unnamed and the second named. Setting the type to LANGSXP makes this a call
which can be evaluated.



New checks tell me that this is no longer allowed, since it is not part
of the public API any longer.


So, how does one call R from C then?

Also should the writing R extensions be updated with the new approved
approach?


Thanks in advance.

Matt

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Segfault when parsing UTF-8 text with srcrefs

2024-05-28 Thread Duncan Murdoch

On 2024-05-28 1:35 p.m., Hadley Wickham wrote:

Hi all,

When I run the following code, R segfaults:

text <- "×"
srcfile <- srcfilecopy("test.r", text)
parse(textConnection(text), srcfile = srcfile)

It doesn't segfault if text is ASCII, or it's not wrapped in
textConnection, or srcfile isn't set.


I also see the segfault on

  R version 4.4.0 (2024-04-24) -- "Puppy Cup"
  Copyright (C) 2024 The R Foundation for Statistical Computing
  Platform: aarch64-apple-darwin20

Apple shows me this stack trace:

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_platform.dylib   0x189364904 _platform_strlen + 4
1   libR.dylib                 0x10380a954 Rf_mkChar + 20 (envir.c:4076)
2   libR.dylib                 0x10385e3ac finalizeData + 1516
3   libR.dylib                 0x10385d6dc R_Parse + 924 (gram.c:4215)
4   libR.dylib                 0x1038f4a6c do_parse + 1260 (source.c:294)
5   libR.dylib                 0x10383ac4c bcEval_loop + 40204 (eval.c:8141)
6   libR.dylib                 0x10382356c bcEval + 684 (eval.c:7524)
7   libR.dylib                 0x103822c6c Rf_eval + 556 (eval.c:1167)
8   libR.dylib                 0x10382582c R_execClosure + 812 (eval.c:2398)
9   libR.dylib                 0x103824924 applyClosure_core + 164 (eval.c:2311)
10  libR.dylib                 0x103822f08 Rf_applyClosure + 20 (eval.c:2333) [inlined]
11  libR.dylib                 0x103822f08 Rf_eval + 1224 (eval.c:1285)
12  libR.dylib                 0x10387f8f8 R_ReplDLLdo1 + 440 (main.c:398)
13  R                          0x102d22fa0 run_REngineRmainloop + 260
14  R                          0x102d1a64c -[REngine runREPL] + 124
15  R                          0x102d0dd90 main + 588
16  dyld                       0x188fae0e0 start + 2360

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Keep class attribute when applying c() to hexmodes

2024-05-27 Thread Duncan Murdoch

On 2024-05-27 11:49 a.m., Schuhmacher, Dominic wrote:

Dear list,

The following behavior in base R is unexpected to me:

a <- as.hexmode("99ac")
b <- as.hexmode("9ce5")
v <- c(a,b)
v
#> [1] 39340 40165
class(v)
#> [1] "integer"

Is there a good reason why v should not be of class "hexmode"?

I can see that this is exactly as documented. The help for `c()` only says that the 
arguments are coerced to a common type (which is integer anyway) and that all attributes 
except names are removed. On the other hand, it says further down that "c methods 
other than the default are not required to remove attributes (and they will almost 
certainly preserve a class attribute)".

So couldn't (or even shouldn't) there be a c.hexmode that keeps the class 
attribute?


I believe there could.  If you think there should, then you should 
submit a patch containing it to bugs.r-project.org.  Based on c.Date, 
here's a first attempt:


  c.hexmode <- function(...)
    as.hexmode(c(unlist(lapply(list(...), function(e) unclass(as.hexmode(e))))))


If you want this to be incorporated into R, you should test it, document 
it, and submit a patch containing it.
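
A quick check with that definition (a sketch; output as I would expect
it):

  a <- as.hexmode("99ac")
  b <- as.hexmode("9ce5")
  v <- c(a, b)     # now dispatches to c.hexmode
  class(v)
  #> [1] "hexmode"
  v
  #> [1] "99ac" "9ce5"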


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] config.site settings for M3 Mac typo in manual?

2024-05-20 Thread Duncan Murdoch
I've just upgraded to an M3 Mac laptop, and I'm working through getting 
the right configure settings to build R.


The Installation and Administration manual says to have this in config.site:

FFLAGS="-g -O2 -mmacos-version-min=11.0"
FCFLAGS="-g -O2 -mmacos-version-min=11.0"

but those give an error on my system, which suggests that the version
option in FFLAGS and FCFLAGS should be -mmacosx-version-min=11.0.  I
don't know if that's a typo or a change.


Besides [...]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] FR: Customize background colour of row and column headers for the View output

2024-05-15 Thread Duncan Murdoch
A criticism of your suggestion is that it is not backwards compatible. 
Does that matter?  I don't know, but probably not.  The X11 version of 
the viewer does what you suggest.


Duncan Murdoch

On 2024-05-15 2:20 a.m., Iago Giné Vázquez wrote:

About the decisions:

Actually, the same way dataedittext modifies the text colour not only
of data, but also of row and column names, and dataedituser modifies
the colour of all the borders, I think dataeditbg should modify the
background of the whole window, at least the part "inside" those
borders. Otherwise, any other option for the background colour of row
and column headers would allow the same set of colours, with the
same names, which are those commented in Rconsole:

## Colours for console and pager(s)
# (see rw/etc/rgb.txt for the known colours).

Regarding the code, looking to the Windows data editor [1], and taking
into account what you and lastly Ivan told previously

     This is entirely correct: the dialog uses the colour returned by
     dialog_bg(), which is GetSysColor(COLOR_BTNFACE).

(maybe a naive question from someone who does not know the code): could
`guiColors[dataeditbg]` be used instead of `dialog_bg()` in the single
place where it appears? So, instead of

     bbg = dialog_bg()

it would read

     bbg = guiColors[dataeditbg];


Thanks!
Iago


[1] https://svn.r-project.org/R/trunk/src/library/utils/src/windows/dataentry.c

--------
*From:* Duncan Murdoch
*Sent:* Tuesday, 14 May 2024 14:22
*To:* Iago Giné Vázquez ; r-devel@R-project.org
*Cc:* Ivan Krylov
*Subject:* Re: FR: Customize background colour of row and column headers
for the View output

This seems like something that should be fairly easily doable.  Why
don't you work out a patch?

Some decisions to make:

- What colours are we talking about?  Would you want the labels to have
their colour set independent of the dialog colours?  If so, would you
also want to configure the dialog colours?

- What names should be used for the colours?

- Where should all of these definitions be documented?

If you don't feel comfortable with the coding, perhaps you could answer
these questions, and someone else may code it for you.  I won't (I no
longer have easy access to Windows), but I could help with the design.

Duncan Murdoch

On 2024-05-14 5:25 a.m., Iago Giné Vázquez wrote:

Thanks again Duncan and Ivan,

I forward then the email to R-devel.

Summarizing, the dataedit options (in RGui preferences or RConsole) for
colouring the View output have no effect on the background of the row
and column names (see https://ibb.co/Dkn2pVs).


Could this be implemented?

Thank you!

Best regards,
Iago

*From:* Ivan Krylov
*Sent:* Monday, 13 May 2024 14:34
*To:* Duncan Murdoch
*Cc:* Iago Giné Vázquez ; r-h...@r-project.org
*Subject:* Re: [R] Is there some way to customize colours for the View output?
On Mon, 13 May 2024 06:08:22 -0400, Duncan Murdoch wrote:


The row and column names don't appear to be controllable from that
menu, they seem (on my machine) to be displayed in the same colour as
the background of a dialog box, i.e. some kind of gray.  I don't
think R tries to control that colour, but perhaps some Windows
setting would change it.


This is entirely correct: the dialog uses the colour returned by
dialog_bg(), which is GetSysColor(COLOR_BTNFACE).

I think it could be a reasonable feature request to use an adjustable
colour for the row and column headers.

--
Best regards,
Ivan




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] FR: Customize background colour of row and column headers for the View output

2024-05-14 Thread Duncan Murdoch
This seems like something that should be fairly easily doable.  Why 
don't you work out a patch?


Some decisions to make:

- What colours are we talking about?  Would you want the labels to have 
their colour set independent of the dialog colours?  If so, would you 
also want to configure the dialog colours?


- What names should be used for the colours?

- Where should all of these definitions be documented?

If you don't feel comfortable with the coding, perhaps you could answer 
these questions, and someone else may code it for you.  I won't (I no 
longer have easy access to Windows), but I could help with the design.


Duncan Murdoch

On 2024-05-14 5:25 a.m., Iago Giné Vázquez wrote:

Thanks again Duncan and Ivan,

I forward then the email to R-devel.

Summarizing, the dataedit options (in RGui preferences or RConsole) for
colouring the View output have no effect on the background of the row
and column names (see https://ibb.co/Dkn2pVs).

Could this be implemented?

Thank you!

Best regards,
Iago

*From:* Ivan Krylov
*Sent:* Monday, 13 May 2024 14:34
*To:* Duncan Murdoch
*Cc:* Iago Giné Vázquez ; r-h...@r-project.org
*Subject:* Re: [R] Is there some way to customize colours for the View output?
On Mon, 13 May 2024 06:08:22 -0400, Duncan Murdoch wrote:


The row and column names don't appear to be controllable from that
menu, they seem (on my machine) to be displayed in the same colour as
the background of a dialog box, i.e. some kind of gray.  I don't
think R tries to control that colour, but perhaps some Windows
setting would change it.


This is entirely correct: the dialog uses the colour returned by
dialog_bg(), which is GetSysColor(COLOR_BTNFACE).

I think it could be a reasonable feature request to use an adjustable
colour for the row and column headers.

--
Best regards,
Ivan


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] read.csv

2024-04-16 Thread Duncan Murdoch

On 16/04/2024 7:36 a.m., Rui Barradas wrote:

Às 11:46 de 16/04/2024, jing hua zhao escreveu:

Dear R-developers,

I came to a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile 
to note -- my data involves a protein named "1433E" but to save space I drop 
the quote so it becomes,

Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35
YWHAE,4:72617557_T_TA,1433E,7.73

Both read.csv() and readr::read_csv() read the prot(ein) name as the
number 1433 (possibly confused by scientific notation), which only
alerted me when I tried to combine data,

library(dplyr)  # for bind_rows()
all_data <- data.frame()
for (protein in proteins[1:7])
{
  cat(protein, ":\n")
  f <- paste0(protein, ".csv")
  if (file.exists(f))
  {
    p <- read.csv(f)
    print(p)
    if (nrow(p) > 0) all_data <- bind_rows(all_data, p)
  }
}

proteins[1:7]
[1] "1433B" "1433E" "1433F" "1433G" "1433S" "1433T" "1433Z"

dplyr::bind_rows() failed to work due to incompatible types nevertheless 
rbind() went ahead without warnings.

Best wishes,


Jing Hua

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Hello,

I wrote a file with that content and read it back with


read.csv("filename.csv", as.is = TRUE)


There were no problems, it all worked as expected.


What platform are you on?  I got the same output as Jing Hua:

Input filename.csv:

Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35
YWHAE,4:72617557_T_TA,1433E,7.73

Output:

> read.csv("filename.csv")
   Gene SNP prot log10p
1 YWHAE 13:62129097_C_T 1433   7.35
2 YWHAE 4:72617557_T_TA 1433   7.73
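
For what it's worth, forcing the column type avoids the mis-parse (same
file assumed):

> p <- read.csv("filename.csv", colClasses = c(prot = "character"))
> p$prot
[1] "1433E" "1433E"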

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Repeated library() of one package with different include.only= entries

2024-04-11 Thread Duncan Murdoch

On 11/04/2024 7:04 a.m., Martin Maechler wrote:

Michael Chirico
 on Mon, 8 Apr 2024 10:19:29 -0700 writes:


 > Right now, attaching the same package with different include.only= has no
 > effect:

 > library(Matrix, include.only="fac2sparse")
 > library(Matrix)
 > ls("package:Matrix")
 > # [1] "fac2sparse"

 > ?library does not cover this case -- what is covered is the _loading_
 > behavior of repeated calls:

 >> [library and require] check and update the list of currently attached
 > packages and do not reload a namespace which is already loaded

 > But here we're looking at the _attach_ behavior of repeated calls.

 > I am particularly interested in allowing the exports of a package to be
 > built up gradually:

 > library(Matrix, include.only="fac2sparse")
 > library(Matrix, include.only="isDiagonal") # want: ls("package:Matrix") 
-->
 > c("fac2sparse", "isDiagonal")
 > ...

 > It seems quite hard to accomplish this at the moment. Is the behavior to
 > ignore new inclusions intentional? Could there be an argument to get
 > different behavior?

As you did not get an answer yet, ..., some remarks by an
R-corer who has tweaked library() behavior in the past :

- The `include.only = *` argument to library() has been a
   *relatively* recent addition {given the 25+ years of R history}:

   It was part of the extensive new features by Luke Tierney for
   R 3.6.0  [r76248 | luke | 2019-03-18 17:29:35 +0100], with NEWS entry

 • library() and require() now allow more control over handling
   search path conflicts when packages are attached. The policy is
   controlled by the new conflicts.policy option.

- I haven't seen these (then) new features been used much, unfortunately,
   also not from R-core members, but I'd be happy to be told a different story.
   


For the above reasons, it could well be that the current
implementation {of these features} has not been exercised a lot
yet, and limitations as you found them haven't been noticed yet,
or at least not noticed on the public R mailing lists, nor
otherwise by R-core (?).

Your implicitly proposed new feature (or even *changed*
default behavior) seems to make sense to me -- but as alluded
to, above, I haven't been a conscious user of any
'library(.., include.only = *)' till now.


I don't think it makes sense.  I would assume that

  library(Matrix, include.only="isDiagonal")

implies that only `isDiagonal` ends up on the search path, i.e. 
"include.only" means "include only", not "include in addition to 
whatever else has already been attached".


I think a far better approach to solve Michael's problem is simply to use

  fac2sparse <- Matrix::fac2sparse
  isDiagonal <- Matrix::isDiagonal

instead of messing around with the user's search list, which may have 
been intentionally set to include only one of those.


So I'd suggest changing the docs to say

"[library and require] check and update the list of currently attached
packages and do not reload a namespace which is already loaded.  If a 
package is already attached, no change will be made."


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug in out-of-bounds assignment of list object to expression() vector

2024-04-05 Thread Duncan Murdoch

Yes, definitely looks like a bug.

Are you able to submit it to bugs.r-project.org?

Duncan Murdoch

On 05/04/2024 8:15 a.m., June Choe wrote:

There seems to be a bug in out-of-bounds assignment of list objects to an
expression() vector. Tested on release and devel. (Many thanks to folks
over at Mastodon for the help narrowing down this bug)

When assigning a list into an existing index, it correctly errors on
incompatible type, and the expression vector is unchanged:

```
x <- expression(a,b,c)
x[[3]] <- list() # Error
x
#> expression(a, b, c)
```

When assigning a list to an out of bounds index (ex: the next, n+1 index),
it errors the same but now changes the values of the vector to NULL:

```
x <- expression(a,b,c)
x[[4]] <- list() # Error
x
#> expression(NULL, NULL, NULL)
```

Curiously, this behavior disappears if a prior attempt is made at assigning
to the same index, using a different incompatible object that does not
share this bug (like a function):

```
x <- expression(a,b,c)
x[[4]] <- base::sum # Error
x[[4]] <- list() # Error
x
#> expression(a, b, c)
```

That "protection" persists until x[[4]] is evaluated, at which point the
bug can be produced again:

```
x[[4]] # Error
x[[4]] <- list() # Error
x
#> expression(NULL, NULL, NULL)
```

Note that `x` has remained a 3-length vector throughout.

Best,
June

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RSS Feed of NEWS needs a hand

2024-04-02 Thread Duncan Murdoch

On 02/04/2024 8:50 a.m., Dirk Eddelbuettel wrote:


On 2 April 2024 at 07:37, Dirk Eddelbuettel wrote:
|
| On 2 April 2024 at 08:21, Duncan Murdoch wrote:
| | I have just added R-4-4-branch to the feeds.  I think I've also fixed
| | the \I issue, so today's news includes a long list of old changes.
|
| These feeds can be fussy: looks like you triggered many updates. Feedly
| currently greets me with 569 new posts (!!) in that channel.

Now 745 -- and the bigger issue seems to be that the 'posted at' timestamp is
wrong and 'current', so all the old posts are now seen as 'fresh'. Hence the
flood ... of unsorted posts.

blosxom, simple as it is, takes (IIRC) filesystem ctime as the posting
timestamp, so it would be best if you had a backup with the old timestamps.



Looks like those dates are gone -- the switch from svn to git involved 
some copying, and I didn't preserve timestamps.


I'll see about regenerating the more recent ones.  I don't think there's 
much historical interest in the pre-4.0 versions, so maybe I'll just 
nuke those.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RSS Feed of NEWS needs a hand

2024-04-02 Thread Duncan Murdoch
I have just added R-4-4-branch to the feeds.  I think I've also fixed 
the \I issue, so today's news includes a long list of old changes.


Duncan Murdoch

On 16/03/2024 8:47 a.m., Duncan Murdoch wrote:

I have now put the files online at https://github.com/dmurdoch/diffnews
.  It seemed like too much trouble to include the SVN history, so this
is just a copy of the current version of the files.

Duncan Murdoch

On 15/03/2024 12:04 p.m., Lluís Revilla wrote:

Hi!

Thanks for this service! It is very helpful to know what is being developed.

I distribute the content to other venues and I noticed some times that the
updates are duplicated.
For example, the sentence "‘is.R()’ is deprecated as no other S dialect is
known to be in use (and this could only identify historical dialects, not
future ones)." is duplicated in different days:
Day 1:
https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2024/03/07#n2024-03-07
Day 2:
https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2024/03/09#n2024-03-09

I tried to look up how to avoid duplications with Blosxom
<http://blosxom.sourceforge.net/> but I didn't find a way.
It would be great if this could be further improved to avoid this
duplication.

Thanks!

Lluís

On Fri, 15 Mar 2024 at 13:50, Dirk Eddelbuettel  wrote:



Years ago Duncan set up a nightly job to feed RSS based off changes to
NEWS,
borrowing some setup parts from CRANberries as for example the RSS
'compiler'.

That job is currently showing the new \I{...} curly protection in an
unfavourable light. Copying from the RSS reader I had pointed at this since
the start [1], for today I see (indented by four spaces)

  CHANGES IN R-devel INSTALLATION on WINDOWS

  The makefiles and installer scripts for Windows have been tailored to
  \IRtools44, an update of the \IRtools43 toolchain. It is based on GCC
13
  and newer versions of \IMinGW-W64, \Ibinutils and libraries (targeting
  64-bit Intel CPUs). R-devel can no longer be built using \IRtools43
  without changes.

  \IRtools44 has experimental suport for 64-bit ARM (aarch64) CPUs via
LLVM
  17 toolchain using lld, clang/flang-new and libc++.

Can some kind soul put a filter over it to remove the \I ?

Thanks,  Dirk

[1] Feedly. Unless we set this up so early that I once used Google
Reader. It's been a while...

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] declare and validate options

2024-03-29 Thread Duncan Murdoch

On 29/03/2024 11:59 a.m., Antoine Fabri wrote:

I think there are too many packages that would need changes under this
scheme.


There would be zero if the registration of options is not required for
packages first uploaded to CRAN before the feature is implemented.
If an option is not registered, no validation is triggered and nothing
breaks even if we opt in to the behavior.


Sorry, I missed that.  Then the objection is that this would require 
CRAN to apply two different sets of rules on submissions. When a 
resubmission arrived, they'd need to look in the archive to find out 
which set of rules applied to it.  They do a bit of that now 
(determining if a submission is a resubmission, for example), but this 
would be a bigger change.  I don't think date of first submission is 
ever currently used.



If those functions could be made simple enough and bulletproof and were
widely adopted, maybe they'd be copied into one of the base packages,

Sure but realistically few maintainers will opt-in for more restrictions.


If this is something that you want CRAN to force on package authors, 
then you need to give some hard evidence that it will fix things that 
cause trouble.  But if you only apply the rule to new packages, not 
updates to old ones, it's hard to believe that it will really make much 
difference, though it will still be extra work for CRAN and R Core.


if Posit did something along those lines maybe it would have a chance, but
otherwise I don't see an optional feature like this spreading very far.
Or we'd need this package to make working with options really much
easier for developers themselves, not just beneficial for users in
the long run.


That should be a goal regardless of who does it.

Think about the development of the pipe operator:  it was in magrittr 
(and I think another package, but I forget the name) first, was widely 
adopted, then a simpler version was brought into base R.


Duncan Murdoch




On Fri, 29 Mar 2024 at 16:25, Duncan Murdoch <mailto:murdoch.dun...@gmail.com>> wrote:


On 29/03/2024 10:52 a.m., Antoine Fabri wrote:
 > Dear r-devel,
 >
 > options() are basically global variables and they come with
several issues:
 > * they're not really truly owned by a package aside from loose naming
 > conventions
 > * they're not validated
 > * their documentation is not standard, and they're often not
documented at
 > all, it's hard to know what options exist
 > * in practice they're sometimes used for internal purposes, which
is at
 > odds with their global nature and contribute to the mess, I think
they can
 > almost always be replaced by objects under a `globals`
environment in the
 > namespace, it's just a bit more work
 >
 > I tried to do as much as possible with static analysis using my
package opt
 > but it can only go so far :
https://github.com/moodymudskipper/opt
<https://github.com/moodymudskipper/opt>
 >
 > I think we can do a bit better and that it's not necessarily so
complex,
 > here's a draft of possible design :
 >
 > We could have something like this in a package to register
options along
 > with an optional validator, triggered on `options(..)` (or a new
function).
 >
 > # similar to registerS3method() :
 > registerOption("mypkg.my_option1")
 > registerOption("mypkg.my_option2", function(x)
stopifnot(is.numeric(x))
 > # maybe a `default` arg too to avoid the .onLoad() gymnastics and
invisible
 > NULL options
 >
 > * validation is a breaking change so we'd have an environment
variable to
 > opt in
 > * validation occurs when an option is set AND the namespace is
already
 > loaded (so we can still set options without loading a namespace)
OR it
 > occurs later when an applicable namespace is loaded
 > * if we register an option that has already been registered by
another
 > package, we get a message, the validator of the last loaded
namespace is
 > used, in practice due to naming conventions it doesn't really
happen, CRAN
 > could also enforce naming conventions for new packages
 > * New packages must use registerOption() if they define options,
and there
 > must be a standard documentation page for those, separately or
together
 > (with aliases), accessible with `?mypkg.my_option1` etc...
 >
 > This could certainly be done in different ways and I'd love to
hear about
 > other ideas or obstacles to improvements in this area.
 >

I think there are too many packages that would need changes under this
scheme.

A more easily achievable improvement would be to provide functions to
support registration, validation and documentation, and leave it up to
the package author to call those. [...]

Re: [Rd] declare and validate options

2024-03-29 Thread Duncan Murdoch

On 29/03/2024 10:52 a.m., Antoine Fabri wrote:

Dear r-devel,

options() are basically global variables and they come with several issues:
* they're not really truly owned by a package aside from loose naming
conventions
* they're not validated
* their documentation is not standard, and they're often not documented at
all, it's hard to know what options exist
* in practice they're sometimes used for internal purposes, which is at
odds with their global nature and contribute to the mess, I think they can
almost always be replaced by objects under a `globals` environment in the
namespace, it's just a bit more work

I tried to do as much as possible with static analysis using my package opt
but it can only go so far : https://github.com/moodymudskipper/opt

I think we can do a bit better and that it's not necessarily so complex,
here's a draft of possible design :

We could have something like this in a package to register options along
with an optional validator, triggered on `options(..)` (or a new function).

# similar to registerS3method() :
registerOption("mypkg.my_option1")
registerOption("mypkg.my_option2", function(x) stopifnot(is.numeric(x))
# maybe a `default` arg too to avoid the .onLoad() gymnastics and invisible
NULL options

* validation is a breaking change so we'd have an environment variable to
opt in
* validation occurs when an option is set AND the namespace is already
loaded (so we can still set options without loading a namespace) OR it
occurs later when an applicable namespace is loaded
* if we register an option that has already been registered by another
package, we get a message, the validator of the last loaded namespace is
used, in practice due to naming conventions it doesn't really happen, CRAN
could also enforce naming conventions for new packages
* New packages must use registerOption() if they define options, and there
must be a standard documentation page for those, separately or together
(with aliases), accessible with `?mypkg.my_option1` etc...

This could certainly be done in different ways and I'd love to hear about
other ideas or obstacles to improvements in this area.



I think there are too many packages that would need changes under this 
scheme.


A more easily achievable improvement would be to provide functions to 
support registration, validation and documentation, and leave it up to 
the package author to call those.  This wouldn't give you validation at 
the time a user set an option, but could make it easier to validate when 
the package retrieved the value:  specify rules in one place, then 
retrieve from multiple places, without needing to duplicate the rules.
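
A minimal sketch of such helpers (all names hypothetical, not an
existing API):

.option_rules <- new.env(parent = emptyenv())

registerOptionRule <- function(name, validator = NULL, default = NULL)
  assign(name, list(validator = validator, default = default),
         envir = .option_rules)

getValidatedOption <- function(name) {
  rule  <- get0(name, envir = .option_rules)
  value <- getOption(name, default = rule$default)
  if (!is.null(rule$validator)) rule$validator(value)  # validate on retrieval
  value
}

# declared once, e.g. in a package's .onLoad():
registerOptionRule("mypkg.digits",
                   validator = function(x) stopifnot(is.numeric(x)),
                   default = 7)
getValidatedOption("mypkg.digits")
#> [1] 7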


If those functions could be made simple enough and bulletproof and were 
widely adopted, maybe they'd be copied into one of the base packages, 
but really the only need for that would be to support validation on 
setting, rather than validation on retrieval.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RSS Feed of NEWS needs a hand

2024-03-16 Thread Duncan Murdoch
I have now put the files online at https://github.com/dmurdoch/diffnews 
.  It seemed like too much trouble to include the SVN history, so this 
is just a copy of the current version of the files.


Duncan Murdoch

On 15/03/2024 12:04 p.m., Lluís Revilla wrote:

Hi!

Thanks for this service! It is very helpful to know what is being developed.

I distribute the content to other venues and I noticed some times that the
updates are duplicated.
For example, the sentence "‘is.R()’ is deprecated as no other S dialect is
known to be in use (and this could only identify historical dialects, not
future ones)." is duplicated in different days:
Day 1:
https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2024/03/07#n2024-03-07
Day 2:
https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2024/03/09#n2024-03-09

I tried to look up how to avoid duplications with Blosxom
<http://blosxom.sourceforge.net/> but I didn't find a way.
It would be great if this could be further improved to avoid this
duplication.

Thanks!

Lluís

On Fri, 15 Mar 2024 at 13:50, Dirk Eddelbuettel  wrote:



Years ago Duncan set up a nightly job to feed RSS based off changes to
NEWS,
borrowing some setup parts from CRANberries as for example the RSS
'compiler'.

That job is currently showing the new \I{...} curly protection in an
unfavourable light. Copying from the RSS reader I had pointed at this since
the start [1], for today I see (indented by four spaces)

 CHANGES IN R-devel INSTALLATION on WINDOWS

 The makefiles and installer scripts for Windows have been tailored to
 \IRtools44, an update of the \IRtools43 toolchain. It is based on GCC
13
 and newer versions of \IMinGW-W64, \Ibinutils and libraries (targeting
 64-bit Intel CPUs). R-devel can no longer be built using \IRtools43
 without changes.

 \IRtools44 has experimental support for 64-bit ARM (aarch64) CPUs via
LLVM
 17 toolchain using lld, clang/flang-new and libc++.

Can some kind soul put a filter over it to remove the \I ?

Thanks,  Dirk

[1] Feedly. Unless we set this up so early that I once used Google
Reader. It's been a while...

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RSS Feed of NEWS needs a hand

2024-03-15 Thread Duncan Murdoch
Usually the duplication happens because there's a small change between 
the entries, e.g. a spelling correction.  Other times it's not 
duplication, but a substantive change to the NEWS entry, so those really 
should be duplicated.


I can't spot a change in the example you gave, so it's probably a change 
to the white space.  The comparison code tries to ignore those changes, 
but it doesn't always get it right.
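
For what it's worth, a crude whitespace-insensitive comparison is only a
couple of lines of R (just a sketch, not the actual feed code, which also
has to deal with Rd markup):

  normalize <- function(x) gsub("[[:space:]]+", " ", trimws(x))
  normalize("is.R() is\n  deprecated") == normalize("is.R() is deprecated")
  #> [1] TRUE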


Sometime I should put the code on Github so others can fix bugs like 
this.  It's currently in a private SVN repository (and has been since I 
wrote it about 16 years ago).


Duncan Murdoch

On 15/03/2024 12:04 p.m., Lluís Revilla wrote:

Hi!

Thanks for this service! It is very helpful to know what is being developed.

I distribute the content to other venues and I noticed some times that the
updates are duplicated.
For example, the sentence "‘is.R()’ is deprecated as no other S dialect is
known to be in use (and this could only identify historical dialects, not
future ones)." is duplicated in different days:
Day 1:
https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2024/03/07#n2024-03-07
Day 2:
https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2024/03/09#n2024-03-09

I tried to look up how to avoid duplications with Blosxom
<http://blosxom.sourceforge.net/> but I didn't find a way.
It would be great if this could be further improved to avoid this
duplication.

Thanks!

Lluís

On Fri, 15 Mar 2024 at 13:50, Dirk Eddelbuettel  wrote:



Years ago Duncan set up a nightly job to feed RSS based off changes to
NEWS,
borrowing some setup parts from CRANberries as for example the RSS
'compiler'.

That job is currently showing the new \I{...} curly protection in an
unfavourable light. Copying from the RSS reader I had pointed at this since
the start [1], for today I see (indented by four spaces)

 CHANGES IN R-devel INSTALLATION on WINDOWS

 The makefiles and installer scripts for Windows have been tailored to
 \IRtools44, an update of the \IRtools43 toolchain. It is based on GCC
13
 and newer versions of \IMinGW-W64, \Ibinutils and libraries (targeting
 64-bit Intel CPUs). R-devel can no longer be built using \IRtools43
 without changes.

 \IRtools44 has experimental support for 64-bit ARM (aarch64) CPUs via
LLVM
 17 toolchain using lld, clang/flang-new and libc++.

Can some kind soul put a filter over it to remove the \I ?

Thanks,  Dirk

[1] Feedly. Unless we set this up so early that I once used Google
Reader. It's been a while...

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Vignettes with long compute time

2024-03-11 Thread Duncan Murdoch

On 11/03/2024 11:43 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:

Is there a way to include the compiled version of a vignette in the doc 
directory but mark
it to NOT be rerun by CRAN?   I think I remember that this is possible, but 
have forgotten
how.   (It might even be a false memory.)


You could use a method similar to the testthat::skip_on_cran() approach. 
 Have the long-running chunks run only when a special 
environment variable is present.  This would be a little easier with knitr 
than with Sweave, since there you can use expressions for the chunk 
options, but you could always write the code something like this:


  if (Sys.getenv("RUN_SLOW_CHUNKS", 0)) {

... the slow code goes here ...

  } else
cat("This chunk takes several hours to compute.  If you want to run
 it, set the environment variable RUN_SLOW_CHUNKS to 1.\n")
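
With knitr you could instead put the test straight into the chunk header
(a sketch, assuming an Rmd vignette, with the same made-up environment
variable as above):

```{r slow-chunk, eval = Sys.getenv("RUN_SLOW_CHUNKS", "0") == "1"}
# ... the slow code goes here ...
```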

Duncan Murdoch



Terry T.

Background:  Beth Atkinson and I are splitting out many of the vignettes from 
the survival
package into a separate package survivalVignettes.  There are a few reasons

   1. Some vignettes use packages outside of the base + recommended set; 
pseudovalues, for
instance, are normally used as input to a subsequent GEE model.    Since 
survival is itself
a recommended package, it can't legally host the pseudo.Rnw vignette.
   2. The set of vignettes for survival is large, and likely to get larger.    
It makes
sense to slim down the size of the package itself.
   3. It allows us to use Rmd.  (Again, survival can't use anything outside of 
base +
recommended).
   4. We have a couple of 'optional' vignettes that talk about edge cases, 
useful to some
people but not worth the size cost of cluttering up the main package.

The current submission fails due to one vignette in group 4 which takes a 
looong time to
run.  This vignette in particular is talking about compute time, and 
illustrates a case
where O(n^2) behaviour arises.   A sentence that warns the user "if you do this 
it will take
hours to run" is a perfect case for a pdf that should not be recreated by R CMD 
check.



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R6 "classname" and generator name

2024-03-11 Thread Duncan Murdoch

On 11/03/2024 8:15 a.m., Barry Rowlingson wrote:

I'm writing some code that does a bit of introspection of R6 classes and am
wondering about the "classname" parameter. Its the first parameter to the
"R6Class" class generator generator function, and the few examples I've
looked at on CRAN set it the same as the name of the generator function,
for example, from the docs:

Queue <- R6Class("Queue", .)

but this isn't mandatory, it can be anything. Or NULL. (side quest: do
linters exist that flag this as bad style?).

Does anyone have an example of a CRAN package where this isn't the case? Or
even where an R6 class generator uses the default "NULL" for its classname
parameter? My introspection code is in two minds whether to use the
classname to label diagrams of classes, or to use the names of the actual
generator functions (which are what the package users should be using), or
show both if different, or flag up NULL values etc...

Never should have opened this can of worms. I don't even like worms.


Here's an example:

https://github.com/saraswatmks/superml/blob/0d7f6aea09968267a11612475424d4635d57877c/R/RandomSearch.R#L11-L12

I don't have any idea if this is intentional or not.
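
For the introspection itself, the generator records the value, so it can be
read directly (a quick sketch):

  library(R6)
  Queue <- R6Class("Queue")
  Queue$classname          # "Queue"
  Anon <- R6Class()        # classname defaults to NULL
  is.null(Anon$classname)  # TRUE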

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: capture "->"

2024-03-04 Thread Duncan Murdoch
That's a good suggestion, but if the function accepts strings, the 
problem is fairly easy using the parser.  E.g. compare


> getParseData( parse(text="x1 + x2 -> a3") )
   line1 col1 line2 col2 id parent        token terminal text
11     1    1     1   13 11      0         expr    FALSE
7      1    1     1    7  7     11         expr    FALSE
1      1    1     1    2  1      3       SYMBOL     TRUE   x1
3      1    1     1    2  3      7         expr    FALSE
2      1    4     1    4  2      7          '+'     TRUE    +
4      1    6     1    7  4      6       SYMBOL     TRUE   x2
6      1    6     1    7  6      7         expr    FALSE
5      1    9     1   10  5     11 RIGHT_ASSIGN     TRUE   ->
8      1   12     1   13  8     10       SYMBOL     TRUE   a3
10     1   12     1   13 10     11         expr    FALSE

> getParseData( parse(text="a3 <- x1 + x2") )
   line1 col1 line2 col2 id parent       token terminal text
11     1    1     1   13 11      0        expr    FALSE
1      1    1     1    2  1      3      SYMBOL     TRUE   a3
3      1    1     1    2  3     11        expr    FALSE
2      1    4     1    5  2     11 LEFT_ASSIGN     TRUE   <-
10     1    7     1   13 10     11        expr    FALSE
4      1    7     1    8  4      6      SYMBOL     TRUE   x1
6      1    7     1    8  6     10        expr    FALSE
5      1   10     1   10  5     10         '+'     TRUE    +
7      1   12     1   13  7      9      SYMBOL     TRUE   x2
9      1   12     1   13  9     10        expr    FALSE

The expressions produced are the same, but the parse data is different.
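
So a helper that detects the right assignment can be as simple as this
sketch:

  usedRightAssign <- function(code) {
    pd <- getParseData(parse(text = code, keep.source = TRUE))
    "RIGHT_ASSIGN" %in% pd$token
  }
  usedRightAssign("x1 + x2 -> a3")  # TRUE
  usedRightAssign("a3 <- x1 + x2")  # FALSE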

Duncan Murdoch

On 04/03/2024 11:51 a.m., Bill Dunlap wrote:

Maybe someone has already suggested this, but if your functions accepted
strings you could use sub or gsub to replace the -> with a symbol that
parsed at the same precedence as <-,
say <<-.  Then parse it and deal with it.  When it is time to display the
parsed and perhaps manipulated formulae to the user, deparse it and do the
reverse replacement.


encode <- function(string)gsub(perl=TRUE, "->", "<<-", x=string)
decode <- function(string)gsub(perl=TRUE, "<<-", "->", x=string)
rightArrow <- as.name("<<-")
leftArrow <- as.name("<-")
ast1 <- parse(text=encode("x1 + x2 -> a3"))[[1]]
ast2 <- parse(text=encode("y4 <- b5 + (b6 / b7)"))[[1]]
identical(ast1[[1]], rightArrow)

[1] TRUE

identical(ast2[[1]], leftArrow)

[1] TRUE

ast1[[3]] <- as.name("new_a3")
decode(deparse(ast1))

[1] "x1 + x2 -> new_a3"

-Bill

On Mon, Mar 4, 2024 at 1:59 AM Dmitri Popavenko 
wrote:


Dear Barry,

In general, I believe users are already accustomed with the classical
arrows "->" and "<-" which are used as such in quoted expressions.
But I agree that "-.>" is a very neat trick, thanks a lot. A small dot,
what a difference.

All the best,
Dmitri

On Mon, Mar 4, 2024 at 11:40 AM Barry Rowlingson <
b.rowling...@lancaster.ac.uk> wrote:


It seems like you want to use -> and <- as arrows with different meanings
to "A gets the value of B" in your package, as a means of writing
expressions in your package language.

Another possibility would be to use different symbols instead of the
problematic -> and <-, for example you could use <.~ and ~.> which are

not

at all flipped or changed before you get a chance to parse your

expression.

It might make your language parser a bit trickier though. Let's see how
these things turn into R's AST using `lobstr`:

  > library(lobstr)
  > ast(A ~.> B)
█─`~`
├─A
└─█─`>`
   ├─.
   └─B
  > ast(A <.~ B)
█─`~`
├─█─`<`
│ ├─A
│ └─.
└─B

You'd have to unpick that tree to figure out you've got A and B on either
side of your expression, and that the direction of the expression is L-R

or

R-L.

You could also use -.> and <.- symbols, leading to a different tree

  > ast(A -.> B)
█─`>`
├─█─`-`
│ ├─A
│ └─.
└─B
  > ast(A <.- B)
█─`<`
├─A
└─█─`-`
   ├─.
   └─B

Without knowing the complexity of your language expressions (especially

if

it allows dots and minus signs with special meanings) I'm not sure if A)
this will work or B) this will bend your brain in horrible directions in
order to make it work... Although you don't need to parse the AST as

above,

you can always deparse to get the text version of it:

  > textex = function(x){deparse(substitute(x))}
  > textex(A <.~ B)
[1] "A < . ~ B"

The <.~ form has an advantage over the <.- form if you want to do complex
expressions with more than one arrow, since the ~ form is syntactically
correct but the - form isnt:

  > textex(A <.~ B ~.> C)
[1] "A < . ~ B ~ . > C"
  > textex(A <.- B -.> C)
Error: unexp

Re: [Rd] capture "->"

2024-03-02 Thread Duncan Murdoch
You can't change the parser.  Changes like `+` <- `-` change the 
function that is called when the expression contains a function call to 
`+`; this happens in `eval()`, not in `parse()`.  There are never any 
function calls to `->`, because the parser outputs a call to `<-` with 
the operands reversed when it sees that token.
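
You can see this directly: the parser produces identical objects for the
two spellings, so there is never any call to `->` to dispatch on.

  identical(quote(A -> B), quote(B <- A))
  #> [1] TRUE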


Duncan Murdoch

On 02/03/2024 6:06 a.m., Adrian Dușa wrote:

That would have been an elegant solution, but it doesn't seem to work:


`->` <- `+`
1 -> 3 # expecting 4

Error in 3 <- 1 : invalid (do_set) left-hand side to assignment

It is possible to reassign other multiple character operators:

`%%` <- `+`
1 %% 3

[1] 4

The assignment operator `->` is so special for the R parser, that it seems
impossible to change.

On Fri, Mar 1, 2024 at 11:30 PM  wrote:


Adrian,

That is indeed a specialized need albeit not necessarily one that cannot
be done by requiring an alternate way of typing a formula that avoids being
something the parser sees as needed to do at that level.

In this case, my other questions become moot as I assume the global
assignment operator and somethings like assign(“xyz”, 5) will not be in the
way.

What I was wondering about is what happens if you temporarily disable the
meaning of the assignment operator <- and turn it back on after.

In the following code, for no reason, I redefine + to mean – and then undo
it:



temp <- `+`
`+` <- `-`
5 + 3

[1] 2

`+` <- temp
5 + 3

[1] 8



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] capture "->"

2024-03-01 Thread Duncan Murdoch

On 01/03/2024 8:51 a.m., Dmitri Popavenko wrote:
On Fri, Mar 1, 2024 at 1:00 PM Duncan Murdoch <mailto:murdoch.dun...@gmail.com>> wrote:


...
I was thinking more of you doing something like

   parse(text = "A -> B", keep.source = TRUE)

I forget what the exact rules are for attaching srcrefs to arguments of
functions, but I do remember they are a little strange, because not
every possible argument can accept a srcref attribute.  For example,
you
can't attach one to NULL, or to a name.

Srcrefs are also fairly big and building them is slow, so I think we
tried to limit them to where they were needed, we didn't try to attach
them to every subexpression, just one per statement.  Each expression
within {} is a separate statement, so we get srcrefs attached to the {.
But in "foo(A -> B)" probably you only get one on the foo call.

In some circumstances you could get the srcref on that call by looking
at sys.call().  But then things are complicated again, because R
doesn't
attach srcrefs to things typed at the console, only to things that are
sourced from files or text strings (and parsed with keep.source=TRUE).

So I think you should probably require input from a string or a
file, or
not expect foo(A -> B) to work without some decoration.


Indeed, the more challenging task is to identify "->" at the console
(from a script or a string, seems trivial now).

I would be willing to decorate as much as it takes to make this work, I 
am just empty on more ideas how to persuade the parser.


By "decorate", I meant putting it in quotes and parsing it using 
parse(text=...), or putting it in braces as you found.  I think parsing 
a string is most likely to be reliable because someone might turn off 
`keep.source` and then the braced approach would fail.  But you have 
control over it when you call parse() yourself.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] capture "->"

2024-03-01 Thread Duncan Murdoch

On 01/03/2024 5:25 a.m., Dmitri Popavenko wrote:

Dear Duncan,

On Fri, Mar 1, 2024 at 11:30 AM Duncan Murdoch <mailto:murdoch.dun...@gmail.com>> wrote:


...
If you parse it with srcrefs, you could look at the source.  The parser
doesn't record whether it was A -> B or B <- A anywhere else.


Thank you, this gets me closer but it still needs a little push:

 > foo <- function(x) {
   x <- substitute(x)
   return(attr(x, "srcref")[[2]])
}

 > foo(A -> B)
NULL

This seems to work, however:
 > foo({A -> B})
A -> B

Is there a way to treat the formula as if it was enclosed between the 
curly brackets?

Dmitri


I was thinking more of you doing something like

 parse(text = "A -> B", keep.source = TRUE)

I forget what the exact rules are for attaching srcrefs to arguments of 
functions, but I do remember they are a little strange, because not 
every possible argument can accept a srcref attribute.  For example, you 
can't attach one to NULL, or to a name.


Srcrefs are also fairly big and building them is slow, so I think we 
tried to limit them to where they were needed, we didn't try to attach 
them to every subexpression, just one per statement.  Each expression 
within {} is a separate statement, so we get srcrefs attached to the {. 
But in "foo(A -> B)" probably you only get one on the foo call.


In some circumstances you could get the srcref on that call by looking 
at sys.call().  But then things are complicated again, because R doesn't 
attach srcrefs to things typed at the console, only to things that are 
sourced from files or text strings (and parsed with keep.source=TRUE).


So I think you should probably require input from a string or a file, or 
not expect foo(A -> B) to work without some decoration.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] capture "->"

2024-03-01 Thread Duncan Murdoch

On 01/03/2024 4:17 a.m., Dmitri Popavenko wrote:

Hi everyone,

I am aware this is a parser issue, but is there any possibility to capture
the use of the inverse assignment operator into a formula?

Something like:


foo <- function(x) substitute(x)


gives:


foo(A -> B)

B <- A

I wonder if there is any possibility whatsoever to signal the use of ->
instead of <-


If you parse it with srcrefs, you could look at the source.  The parser 
doesn't record whether it was A -> B or B <- A anywhere else.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug in comparison of language objects?

2024-02-20 Thread Duncan Murdoch

On 20/02/2024 8:03 a.m., Duncan Murdoch wrote:

I noticed the following odd behaviour today:

exprs <- expression( mean(a), mean(b), { a }, { b } )

exprs[[1]] == exprs[[2]]
#> [1] FALSE

exprs[[3]] == exprs[[4]]
#> [1] TRUE

Does it make sense to anyone that the argument passed to `mean` matters,
but the expression contained in braces doesn't?


I have done some debugging, and found the cause:  for the comparison of 
language objects, R deparses them to strings using C function 
deparse1(), and looks at only the first line.  "mean(a)" deparses as is, 
but "{ a }" deparses to 3 lines


{
  a
}

and the first line is the same as for "{ b }", so they compare equal.

I think it would make more sense to deparse them to one long string, and 
compare those, i.e. to replace deparse1() with deparse1line() (which may 
have been the intention).
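
The effect is easy to reproduce at the R level:

  deparse(quote({ a }))
  #> [1] "{"     "    a" "}"
  # the comparison effectively sees only the first line, and "{" == "{":
  deparse(quote({ a }))[1] == deparse(quote({ b }))[1]
  #> [1] TRUE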


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Bug in comparison of language objects?

2024-02-20 Thread Duncan Murdoch

I noticed the following odd behaviour today:

  exprs <- expression( mean(a), mean(b), { a }, { b } )

  exprs[[1]] == exprs[[2]]
  #> [1] FALSE

  exprs[[3]] == exprs[[4]]
  #> [1] TRUE

Does it make sense to anyone that the argument passed to `mean` matters, 
but the expression contained in braces doesn't?


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Suggestion: simplify trace() interface

2024-02-19 Thread Duncan Murdoch
The trace() function is very nice for setting breakpoints or other 
debugging code in functions and methods, but its interface is 
confusingly complicated.


For example, there was a question on StackOverflow recently that led to 
this observation:


   trace(stats::predict.lm, edit = TRUE)

will allow breakpoints to be set in stats::predict.lm, but they will 
only be seen if that method is called directly, not indirectly via 
stats::predict on an lm object. If stats is on the search list,


   trace(predict.lm, edit = TRUE)

does the same thing as you would expect.

On the other hand,

   trace(stats:::predict.lm, edit = TRUE)

sets the breakpoint so it works in predict() calls, but *not* on direct 
calls.


I can see that sometimes you would want to differentiate between those 
two ways of calling predict.lm, but I would think that normally you'd 
want both kinds to be debugged.


There's also an argument "where" that allows you to limit the tracing, 
e.g. an example allows you to trace calls to lm() coming from the nlme 
package (presumably by tracing only the import, but I haven't debugged 
it carefully).
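
Something like this is what I mean (an untested sketch):

  # trace lm() only as it is seen from the nlme namespace
  trace("lm", tracer = quote(message("lm() called from nlme")),
        where = asNamespace("nlme"))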


Wouldn't it make sense for "where" to be the *only* way to limit tracing 
to some copies of the function, and if "where" is omitted, trace() 
should attempt to modify all copies?


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] round.Date and trunc.Date not working / implemented

2024-02-09 Thread Duncan Murdoch

On 08/02/2024 7:58 p.m., Jiří Moravec wrote:

  > This is a workaround, and could be the basis for a round.Date improvement:
  >   date <- Sys.Date()
  >   as.Date(round(as.POSIXct(date), "years"))
  >   as.Date(round(as.POSIXct(Sys.Date() + 180), "years"))
  > Duncan Murdoch

That would work, perhaps structured similarly to how `trunc.Date` is.
The only issue might be that `trunc.Date` is currently using `round.Date`
in its numeric form, likely to prevent an 'expensive' conversion to POSIXt
when it is not required.

  > trunc.Date
  > function (x, units = c("secs", "mins", "hours", "days", "months",
  >     "years"), ...)
  > {
  >    units <- match.arg(units)
  >    if (units == "months" || units == "years")
  >    as.Date(trunc.POSIXt(x, units, ...))
  >    else round(x - 0.499)
  > }

Perhaps the working version of `round.Date` could be:

    round.Date = function(x, units = c("secs", "mins", "hours", "days",
                                       "months", "years"), ...) {
      units = match.arg(units)

      if (units == "months" || units == "years")
        as.Date(round.POSIXt(x, units, ...))
      else .Date(round(as.numeric(x)))
    }


If I were writing round.Date, I wouldn't offer the user an explicit 
option to round to seconds, minutes or hours.  So the header could be


round.Date = function(x, units = c("days", "months", "years"))

Whether the function would complain if given other units like "secs" 
would need to be decided.


Like Henrik, I don't really like direct calls to methods such as your 
round.POSIXt call.  Those make assumptions that may not be true for 
weird corner cases where the class is not just "Date", but something 
more complicated that happens to have "Date" as one of the components of 
the class.  However, the related functions use that writing style, so I 
shouldn't complain too much.


Duncan Murdoch



Or perhaps `unclass` instead of `as.numeric`. Since the default `units`
for round(x) evaluates
to `sec`, this should correctly skip the first condition in `round` and
get to the correct numeric
rounding.

Perhaps `trunc.Date` should be modified as well, so that the call to
`round.Date` is skipped in favour of the internal `round.numeric`, saving
a few cycles.

-- Jirka

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] round.Date and trunc.Date not working / implemented

2024-02-08 Thread Duncan Murdoch

This is a workaround, and could be the basis for a round.Date improvement:

  date <- Sys.Date()

  as.Date(round(as.POSIXct(date), "years"))

  as.Date(round(as.POSIXct(Sys.Date() + 180), "years"))

Duncan Murdoch

On 08/02/2024 12:23 p.m., Henrik Bengtsson wrote:

Technically, there is a round() for 'Date' objects, but it doesn't
seem very useful, because it basically just falls back to the default
round() method, which only takes the 'digits' argument.

Here's an example:


date <- Sys.Date()
class(date)

[1] "Date"

We see that there are only two round() methods in addition to the
implicit built-in one;


methods("round")

[1] round.Date   round.POSIXt
see '?methods' for accessing help and source code

Looking at round() for 'Date';


round.Date

function (x, ...)
{
 .Date(NextMethod(), oldClass(x))
}


we see that it defers to the next method here, which is the built-in
one. The built-in one only accepts 'digits', which does nothing for
digits >= 0.  For digits < 0, it rounds to power of ten, e.g.


date

[1] "2024-02-08"

round(date, digits = 0)

[1] "2024-02-08"

round(date, digits = 1)

[1] "2024-02-08"

round(date, digits = 2)

[1] "2024-02-08"

round(date, digits = -1)

[1] "2024-02-07"

round(date, digits = -2)

[1] "2024-03-18"

round(date, digits = -3)

[1] "2024-10-04"

round(date, digits = -4)

[1] "2024-10-04"

round(date, digits = -5)

[1] "1970-01-01"

So, although technically invalid, OPs remark is a valid one. I'd also
expect `round()` for Date to support 'units' similar to timestamps,
e.g.


time <- Sys.time()
class(time)

[1] "POSIXct" "POSIXt"

time

[1] "2024-02-08 09:17:02 PST"

round(time, units = "days")

[1] "2024-02-08 PST"

round(time, units = "months")

[1] "2024-02-01 PST"

round(time, units = "years")

[1] "2024-01-01 PST"

So, I agree with OP that one would expect:


round(date, units = "days")

[1] "2024-02-08"

round(date, units = "months")

[1] "2024-02-01"

round(date, units = "years")

[1] "2024-01-01"

to also work here.

FWIW, I don't think we want to encourage circumventing the S3 generic
and calling S3 methods directly, i.e. I don't recommend doing things
like round.POSIXt(...). Ideally, all S3 methods in R would be
non-exported, but some remain exported for legacy reasons. But I think
we should treat them as if they in the future will become
non-exported.

/Henrik

On Thu, Feb 8, 2024 at 8:18 AM Olivier Benz via R-devel
 wrote:



On 8 Feb 2024, at 15:15, Martin Maechler  wrote:


Jiří Moravec
on Wed, 7 Feb 2024 10:23:15 +1300 writes:



This is my first time working with dates, so if the answer is "Duh, work
with POSIXt", please ignore it.



Why is not `round.Date` and `trunc.Date` "implemented" for `Date`?



Is this because `Date` is (mostly) a virtual class setup for a better
inheritance or is that something that is just missing? (like
`sort.data.frame`). Would R core welcome a patch?



I decided to convert some dates to date using `as.Date` function, which
converts to a plain `Date` class, because that felt natural.



But then when trying to round to closest year, I have realized that the
`round` and `trunc` for `Date` do not behave as for `POSIXt`.



I would assume that these will have equivalent output:



Sys.time() |> round("years") # 2024-01-01 NZDT



Sys.Date() |> round("years") # Error in round.default(...): non-numeric
argument to mathematical function




Looking at the code (and reading the documentation more carefully) shows
the issue, but this looks like an omission that should be patched.



-- Jirka


You are wrong:  They *are* implemented,
both even visible since they are in the 'base' package!

==> they have help pages you can read 

Here are examples:


trunc(Sys.Date())

[1] "2024-02-08"

trunc(Sys.Date(), "month")

[1] "2024-02-01"

trunc(Sys.Date(), "year")

[1] "2024-01-01"






Maybe he meant

r$> Sys.time() |> round.POSIXt("years")
[1] "2024-01-01 CET"

r$> Sys.Date() |> round.POSIXt("years")
[1] "2024-01-01 UTC"

The only difference is the timezone


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


On Thu, Feb 8, 2024 at 9:06 AM Rui Barradas  wrote:


Às 14:36 de 08/02/2024, Olivier Benz via R-devel escreveu:

On 8 Feb 2024, at 15:15, Martin Maechler  wrote:


Jiří Moravec
  

Re: [Rd] [EXTERNAL] Re: NOTE: multiple local function definitions for ?fun? with different formal arguments

2024-02-07 Thread Duncan Murdoch
I put the idea below into a function that gives nicer looking results. 
Here's the new code:


dupnames <- function(path = ".") {

  Rfiles <- pkgload:::find_code(path)
  allnames <- data.frame(names=character(), filename=character(), line 
= numeric())

  result <- NULL
  for (f in Rfiles) {
exprs <- parse(f, keep.source = TRUE)
locs <- getSrcLocation(exprs)
names <- character(length(exprs))
lines <- numeric(length(exprs))
for (i in seq_along(exprs)) {
  expr <- exprs[[i]]
  if (is.name(expr[[1]]) &&
  deparse(expr[[1]]) %in% c("<-", "=") &&
  is.name(expr[[2]])) {
names[i] <- deparse(expr[[2]])
lines[i] <- locs[i]
  }
}
keep <- names != ""
if (any(keep)) {
  names <- names[keep]
  lines <- lines[keep]

  prev <- nrow(allnames)
  allnames <- rbind(allnames, data.frame(name = names, filename = 
basename(f), line = lines))

  dups <- which(duplicated(allnames$name))
  dups <- dups[dups > prev]
  if (any(dups)) {
origfile <- character(length(dups))
origline <- numeric(length(dups))
for (i in seq_along(dups)) {
  prev <- which(allnames$name == allnames$name[dups[i]])[1]
  origfile[i] <- allnames$filename[prev]
  origline[i] <- allnames$line[prev]
}

result <- rbind(result,
data.frame(name = allnames$name[dups],
   first = paste(origfile, origline, 
sep=":"),
   dup = paste(allnames$filename[dups], 
allnames$line[dups], sep = ":")))

  }
}
  }
  result
}


And here's what I get when I run it on rgl:

dupnames("rgl")
  name  first  dup
1  fns knitr.R:12  knitr.R:165
2  fns knitr.R:12 pkgdown.R:14
3  fns knitr.R:12shiny.R:8

Those are okay; the fns object is a temporary that is later removed in 
each case.


Duncan Murdoch


On 07/02/2024 9:05 a.m., Duncan Murdoch wrote:

I agree a note about this sort of change might be good.

I think it wouldn't be too hard to write such a check to detect simple
assignments using <- or =.  If you also wanted to detect method
redefinitions, or redefinitions of functions stored in lists, etc., it
would be harder.

There's unexported code in the pkgload package that will get you the
list of R files in the correct collation order:  pkgload:::find_code .
I don't know of such a function exported by some other package, but
there might be one.  Once you have that list, you could parse each file
and look for top level assignments to a name, then look for duplicates
in the vector of names.

Here's a little script that finds cases where an R source file makes an
assignment to a variable with the same name as one that was used earlier:

# Assume we are in the top level directory of a package.
Rfiles <- pkgload:::find_code()

allnames <- character()
for (f in Rfiles) {
exprs <- parse(f)
names <- character(length(exprs))
for (i in seq_along(exprs)) {
  expr <- exprs[[i]]
  if (is.name(expr[[1]]) &&
  deparse(expr[[1]]) %in% c("<-", "=") &&
  is.name(expr[[2]])) {
names[i] <- deparse(expr[[2]])
  }
}
names <- names[names != ""]
prev <- length(allnames)
allnames <- c(allnames, names)
dups <- which(duplicated(allnames))
    dups <- dups[dups > prev]
if (any(dups)) {
  cat("Duplicated names in ", basename(f), ":\n")
  cat(paste(allnames[dups], collapse = ", "), "\n")
}
}

It could be made more fancy to report the locations of both the original
and the dup if you feel motivated.

Duncan Murdoch

On 06/02/2024 8:09 p.m., Chris Black wrote:

Hopefully not too much of a tangent: A related problem this check doesn’t catch 
is accidental top-level redefinitions in package code, such as

## a.R:
helper <- function() 1

f <- function() {
helper()
}
# “cool, f() must return 1"

## b.R:
helper <- function(x) 2

g <- function() {
helper()
}
# “cool, g() must return 2"

## Runtime:
# > c(pkg::f(), pkg::g())
# [1] 2 2
# “oh right, only the last definition of helper() is used”
   
I’ve seen several variants of this issue in code from folks who are new to package development, especially if they're naively refactoring something that started out as an interactively-run analysis. Collaborators who are puzzled by it get my “packages are collections of objects not sequences of expressions, yes that needs to be in your mental model, here’s the link to RWE again” talk, but I would be happy to be able to point them to a check result to go along with it.


I don’t think this is grounds on its own to change a 20-year

Re: [Rd] [External] Get list of active calling handlers?

2024-02-07 Thread Duncan Murdoch

On 07/02/2024 8:36 a.m., luke-tier...@uiowa.edu wrote:

On Tue, 6 Feb 2024, Duncan Murdoch wrote:


The SO post https://stackoverflow.com/q/77943180 tried to call
globalCallingHandlers() from a function, and it failed with the error message
"should not be called with handlers on the stack".  A much simpler
illustration of the same error comes from this line:

  try(globalCallingHandlers(warning = function(e) e))

The problem here is that try() sets an error handler, and
globalCallingHandlers() sees it and aborts.

If I call globalCallingHandlers() with no arguments, I get a list of
currently active global handlers.  Is there also a way to get a list of
active handlers, including non-global ones (like the one try() added in the
line above)?


There is not. The internal stack is not safe to allow to escape to the
R level.  It would be possible to write a reflection function to
provide some information, but it would be a fair bit of work to design
and I don't think would be of enough value to justify that.

The original SO question would be better addressed to
Posit/RStudio. Someone with enough motivation might also be able to
figure out an answer by looking at the source code at
https://github.com/rstudio/rstudio.


Thanks!

Duncan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [EXTERNAL] Re: NOTE: multiple local function definitions for ?fun? with different formal arguments

2024-02-07 Thread Duncan Murdoch

I agree a note about this sort of change might be good.

I think it wouldn't be too hard to write such a check to detect simple 
assignments using <- or =.  If you also wanted to detect method 
redefinitions, or redefinitions of functions stored in lists, etc., it 
would be harder.


There's unexported code in the pkgload package that will get you the 
list of R files in the correct collation order:  pkgload:::find_code . 
I don't know of such a function exported by some other package, but 
there might be one.  Once you have that list, you could parse each file 
and look for top level assignments to a name, then look for duplicates 
in the vector of names.


Here's a little script that finds cases where an R source file makes an 
assignment to a variable with the same name as one that was used earlier:


# Assume we are in the top level directory of a package.
Rfiles <- pkgload:::find_code()

allnames <- character()
for (f in Rfiles) {
  exprs <- parse(f)
  names <- character(length(exprs))
  for (i in seq_along(exprs)) {
expr <- exprs[[i]]
if (is.name(expr[[1]]) &&
deparse(expr[[1]]) %in% c("<-", "=") &&
is.name(expr[[2]])) {
  names[i] <- deparse(expr[[2]])
}
  }
  names <- names[names != ""]
  prev <- length(allnames)
  allnames <- c(allnames, names)
  dups <- which(duplicated(allnames))
  dups <- dups[dups > prev]
  if (any(dups)) {
cat("Duplicated names in ", basename(f), ":\n")
cat(paste(allnames[dups], collapse = ", "), "\n")
  }
}

It could be made more fancy to report the locations of both the original 
and the dup if you feel motivated.


Duncan Murdoch

On 06/02/2024 8:09 p.m., Chris Black wrote:

Hopefully not too much of a tangent: A related problem this check doesn’t catch 
is accidental top-level redefinitions in package code, such as

## a.R:
helper <- function() 1

f <- function() {
helper()
}
# “cool, f() must return 1"

## b.R:
helper <- function(x) 2

g <- function() {
helper()
}
# “cool, g() must return 2"

## Runtime:
# > c(pkg::f(), pkg::g())
# [1] 2 2
# “oh right, only the last definition of helper() is used”
  
I’ve seen several variants of this issue in code from folks who are new to package development, especially if they're naively refactoring something that started out as an interactively-run analysis. Collaborators who are puzzled by it get my “packages are collections of objects not sequences of expressions, yes that needs to be in your mental model, here’s the link to RWE again” talk, but I would be happy to be able to point them to a check result to go along with it.


I don’t think this is grounds on its own to change a 20-year precedent, but in 
case anyone is collecting wishlist reasons to make the check look harder...

Thanks,
Chris


On Feb 6, 2024, at 3:17 PM, Martin Morgan  wrote:

I went looking and found this in codetools, where it's been for 20 years

https://gitlab.com/luke-tierney/codetools/-/blame/master/R/codetools.R?ref_type=heads#L951

I think the call stack in codetools is checkUsagePackage -> checkUsageEnv -> 
checkUsage, and these are similarly established. The call from the tools package 
https://github.com/wch/r-source/blame/95146f0f366a36899e4277a6a722964a51b93603/src/library/tools/R/QC.R#L4585
 is also quite old.

I'm not sure this had been said explicitly, but perhaps the original intent was 
to protect against accidentally redefining a local function. Obviously one 
could do this with a local variable too, though that might less often be an 
error…

toto <- function(mode) {
tata <- function(a, b) a * b  # intended
tata <- function(a, b) a / b  # oops
…
}

Another workaround is to actually name the local functions

toto <- function(mode) {
tata <- function(a, b) a * b
titi <- function(u, v, w) (u + v) / w
if (mode == 1)
tata
else
titi
}

… or to use a switch statement

toto <- function(mode) {
## fun <- switch(…) for use of `fun()` in toto
switch(
mode,
tata = function(a, b) a * b,
titi = function(u, v, w) (u + v) / w,
stop("unknown `mode = '", mode, "'`")
)
}

… or similarly to write `fun <- if … else …`, assigning the result of the `if` 
to `fun`. I guess this last formulation points to the fact that a more careful 
analysis of Hervé's original code means that `fun` can only take one value (only 
one branch of the `if` can be taken) so there can only be one version of `fun` in 
any invocation of `toto()`.

Perhaps the local names (and string-valued 'mode') are suggestive of special 
case, so serve as implicit documentation?

Adding `…` to `tata` doesn't seem like a good idea; toto(1)(3, 5, 7) no longer 
signals an error.

There seems to be a lot in commo

[Rd] Get list of active calling handlers?

2024-02-06 Thread Duncan Murdoch
The SO post https://stackoverflow.com/q/77943180 tried to call 
globalCallingHandlers() from a function, and it failed with the error 
message "should not be called with handlers on the stack".  A much 
simpler illustration of the same error comes from this line:


  try(globalCallingHandlers(warning = function(e) e))

The problem here is that try() sets an error handler, and 
globalCallingHandlers() sees it and aborts.


If I call globalCallingHandlers() with no arguments, I get a list of 
currently active global handlers.  Is there also a way to get a list of 
active handlers, including non-global ones (like the one try() added in 
the line above)?


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] NOTE: multiple local function definitions for ?fun? with different formal arguments

2024-02-06 Thread Duncan Murdoch

On 06/02/2024 2:17 p.m., Hervé Pagès wrote:
Thanks. Workarounds are interesting but... what's the point of the NOTE 
in the first place?


Creating a function that can't be called could be an error.  Presumably 
you are careful and never try to call it with the wrong signature, but 
the check code isn't smart enough to follow every code path, so it gives 
the note to warn you that you might have something wrong.


You still have the same issue with my workaround, but the check code 
isn't smart enough to notice that.


Duncan Murdoch



H.

On 2/4/24 09:07, Duncan Murdoch wrote:
On 04/02/2024 10:55 a.m., Izmirlian, Grant (NIH/NCI) [E] via R-devel 
wrote:
Well you can see that yeast is exactly weekday you have.  The way out 
is to just not name the result


I think something happened to your explanation...



toto <- function(mode)
{
 ifelse(mode == 1,
 function(a,b) a*b,
 function(u, v, w) (u + v) / w)
}


It's a bad idea to use ifelse() when you really want if() ... else ... 
.  In this case it works, but it doesn't always.  So the workaround 
should be



toto <- function(mode)
{
    if(mode == 1)
    function(a,b) a*b
    else
    function(u, v, w) (u + v) / w
}






From: Grant Izmirlian 
Date: Sun, Feb 4, 2024, 10:44 AM
To: "Izmirlian, Grant (NIH/NCI) [E]" 
Subject: Fwd: [EXTERNAL] R-devel Digest, Vol 252, Issue 2

Hi,

I just ran into this 'R CMD check' NOTE for the first time:

* checking R code for possible problems ... NOTE
toto: multiple local function definitions for 'fun' with different
   formal arguments

The "offending" code is something like this (simplified from the real 
code):


toto <- function(mode)
{
 if (mode == 1)
 fun <- function(a, b) a*b
 else
 fun <- function(u, v, w) (u + v) / w
 fun
}

Is that NOTE really intended? Hard to see why this code would be
considered "wrong".

I know it's just a NOTE but still...


I agree it's a false positive, but the issue is that you have a 
function object in your function which can't be called 
unconditionally.  The workaround doesn't create such an object.


Recognizing that your function never tries to call fun requires global 
inspection of toto(), and most of the checks are based on local 
inspection.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Advice debugging M1Mac check errors

2024-02-04 Thread Duncan Murdoch

Hi John.

I don't think the 80 bit format was part of IEEE 754; I think it was an 
Intel invention for the 8087 chip (which I believe preceded that 
standard), and didn't make it into the standard.


The standard does talk about 64 bit and 128 bit floating point formats, 
but not 80 bit.


Duncan Murdoch

On 04/02/2024 4:47 p.m., J C Nash wrote:

Slightly tangential: I had some woes with some vignettes in my
optimx and nlsr packages (actually in examples comparing to OTHER
packages) because the M? processors don't have 80 bit registers of
the old IEEE 754 arithmetic, so some existing "tolerances" are too
small when looking to see if is small enough to "converge", and one
gets "did not converge" type errors. There are workarounds,
but the discussion is beyond this post. However, worth awareness that
the code may be mostly correct except for appropriate tests of
smallness for these processors.

JN




On 2024-02-04 11:51, Dirk Eddelbuettel wrote:


On 4 February 2024 at 20:41, Holger Hoefling wrote:
| I wanted to ask if people have good advice on how to debug M1Mac package
| check errors when you don´t have a Mac? Is a cloud machine the best option
| or is there something else?

a) Use the 'mac builder' CRAN offers:
 https://mac.r-project.org/macbuilder/submit.html

b) Use the newly added M1 runners at GitHub Actions,
 
https://github.blog/changelog/2024-01-30-github-actions-introducing-the-new-m1-macos-runner-available-to-open-source/

Option a) is pretty good as the machine is set up for CRAN and builds
fast. Option b) gives you more control should you need it.

Dirk



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] NOTE: multiple local function definitions for ?fun? with different formal arguments

2024-02-04 Thread Duncan Murdoch

On 04/02/2024 10:55 a.m., Izmirlian, Grant (NIH/NCI) [E] via R-devel wrote:

Well you can see that yeast is exactly weekday you have.  The way out is to 
just not name the result


I think something happened to your explanation...



toto <- function(mode)
{
 ifelse(mode == 1,
 function(a,b) a*b,
 function(u, v, w) (u + v) / w)
}


It's a bad idea to use ifelse() when you really want if() ... else ... . 
 In this case it works, but it doesn't always.  So the workaround should be



toto <- function(mode)
{
if(mode == 1)
function(a,b) a*b
else
function(u, v, w) (u + v) / w
}






From: Grant Izmirlian 
Date: Sun, Feb 4, 2024, 10:44 AM
To: "Izmirlian, Grant (NIH/NCI) [E]" 
Subject: Fwd: [EXTERNAL] R-devel Digest, Vol 252, Issue 2

Hi,

I just ran into this 'R CMD check' NOTE for the first time:

* checking R code for possible problems ... NOTE
toto: multiple local function definitions for 'fun' with different
   formal arguments

The "offending" code is something like this (simplified from the real code):

toto <- function(mode)
{
 if (mode == 1)
 fun <- function(a, b) a*b
 else
 fun <- function(u, v, w) (u + v) / w
 fun
}

Is that NOTE really intended? Hard to see why this code would be
considered "wrong".

I know it's just a NOTE but still...


I agree it's a false positive, but the issue is that you have a function 
object in your function which can't be called unconditionally.  The 
workaround doesn't create such an object.


Recognizing that your function never tries to call fun requires global 
inspection of toto(), and most of the checks are based on local inspection.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [Feature Request] Hide API Key in download.file() / R's libcurl

2024-02-01 Thread Duncan Murdoch
I've just been reading 
https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication, and it 
states that putting userid:password in the URL is deprecated, but it 
does make sense that R should protect users who still use that scheme.
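
In the meantime, code that wraps download.file() could mask credentials
before any URL is printed.  A sketch (maskURL() is a made-up helper, not
what libcurl.c does):

  maskURL <- function(url)
    sub("(?<=://)[^/@]+(?=@)", "<credentials>", url, perl = TRUE)

  maskURL("https://user:apikey@repo.example.com:4443/src/contrib/zoo.tar.gz")
  #> [1] "https://<credentials>@repo.example.com:4443/src/contrib/zoo.tar.gz"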


Duncan Murdoch

On 01/02/2024 11:28 a.m., Xinyi wrote:

Hi all,

When trying to install a package from R using install.packages(), it will
print out the full url address (of the remote repository) it was trying to
access. A bit further digging shows it is from the in_do_curlDownload
method from R's libcurl
<https://github.com/wch/r-source/blob/trunk/src/modules/internet/libcurl.c>:
install.packages() calls download.packages(), and download.packages() calls
download.file(), which uses "libcurl" as its default method.

This line from R mirror
<https://github.com/wch/r-source/blob/trunk/src/modules/internet/libcurl.c#L772>
("if (!quiet) REprintf(_("trying URL '%s'\n"), url);")  prints the full url
it is trying to access.

This is totally fine for public urls without credentials, but in the case
that a given url contains an API key, it poses security issues. For
example, if the getOption("repos") has been overridden to a
customized repository (protected by API keys), then

install.packages("zoo")

Installing packages into '--removed local directory path--'
trying URL 'https://--removed userid--:--removed
api-ke...@repository-addresss.com:4443/.../src/contrib/zoo_1.8-12.tar.gz  '
Content type 'application/x-gzip' length 782344 bytes (764 KB)
===
downloaded 764 KB

* installing *source* package 'zoo' ...
-- further logs removed --




I also tried several other options:

1. quiet=1

install.packages("zoo", quite=1)

It did hide the url, but it also hid all other useful information.
2. method="curl"

install.packages("zoo", method="curl")

This does not print the url when the download is successful, but if there
were any errors, it still prints the url with API key in it.
3. method="wget"

install.packages("zoo", method="wget")

This hides API key by *password*, but I wasn't able to install packages
with this method even with public repos, with the error "Warning: unable to
access index for repository https://cloud.r-project.org/src/contrib/4.3:
'wget' call had nonzero exit status"


In other dynamic languages' package managers like Python's pip, API keys
are hidden by default since pip 18.x in 2018, and masked by "****" from pip
19.x in 2019, see below examples. Can we get a similar default behaviour in
R?

1. with pip 10.x
$ pip install numpy -v # API key was not hided
Looking in indexes:  https://--removed userid--:--removed
api-ke...@repository-addresss.com:4443/.../pypi/simple
2. with pip 18.x # All credentials are removed by pip
$ pip install numpy -v
Looking in indexes:  https://repository-addresss.com:4443/
.../pypi/simple
3. with pip 19.x onwards # userid is kept, API key is replaced by ****
$ pip install numpy -v
Looking in indexes:  https://userid:****@
repository-addresss.com:4443/.../pypi/simple


I was instructed by https://www.r-project.org/bugs.html that I should get
some discussion on r-devel before filing a feature request. So looking
forward to comments/suggestions.

Thanks,
Xinyi

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] readChar() could read the whole file by default?

2024-01-29 Thread Duncan Murdoch

On 29/01/2024 1:09 p.m., Toby Hocking wrote:

My opinion is that the proposed feature would be greatly appreciated by users.
I had always wondered if I was the only one doing paste(readLines(f),
collapse="\n") all the time.
It would be great to have the proposed, more straightforward way to
read the whole file as a string: readChar("my_file.txt", -1) or even
better readChar("my_file.txt")
Thanks for your detailed analysis Michael.


These two things aren't the same:

  paste(readLines(f), collapse = "\n")

is not the same as

  readChar(f, file.size(f))

in cases where the file has Windows-style newlines and you're reading it 
on Unix, because the first one converts the CR LF newlines into \n, 
while the second would give \r\n.  (I think they would match for reading 
Unix-style files on Windows.)


Does this ever matter?  I don't know, but I think usually people would 
want the behaviour of paste(readLines(f), collapse = "\n").
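
It's easy enough to check on any given platform (a quick sketch):

  f <- tempfile()
  writeBin(charToRaw("a\r\nb\r\n"), f)
  paste(readLines(f), collapse = "\n")  # line endings normalized
  readChar(f, file.size(f))             # bytes as stored, \r\n kept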


Duncan Murdoch




On Fri, Jan 26, 2024 at 2:05 PM luke-tierney--- via R-devel
 wrote:


On Fri, 26 Jan 2024, Michael Chirico wrote:


I am curious why readLines() has a default (n=-1L) to read the full
file while readChar() has no default for nchars= (i.e., readChar(file)
is an error). Is there a technical reason for this?

I often[1] see code like paste(readLines(f), collapse="\n") which
would be better served by readChar(), especially given issues with the
global string cache I've come across[2]. But lacking the default, the
replacement might come across less clean.


The string cache seems like a very dark pink herring to me. The fact
that the lines are allocated on the heap might create an issue; the
cache isn't likely to add much to that. In any case I would need to
see a realistic example to convince me this is worth addressing on
performance grounds.

I don't see any reason in principle not to have readChar and readBin
read the entire file if n = -1 (others might) but someone would need
to write a patch to implement that.

Best,

luke


For my own purposes the incantation readChar(file, file.size(file)) is
ubiquitous. Taking CRAN code[3] as a sample[4], 41% of readChar()
calls use either readChar(f, file.info(f)$size) or readChar(f,
file.size(f))[5].

Thanks for the consideration and feedback,
Mike C

[1] e.g. a quick search shows O(100) usages in CRAN packages:
https://github.com/search?q=org%3Acran+%2Fpaste%5B%28%5D%5Cs*readLines%5B%28%5D.*%5B%29%5D%2C%5Cs*collapse%5Cs*%3D%5Cs*%5B%27%22%5D%5B%5C%5C%5D%2F+lang%3AR&type=code,
and O(1000) usages generally on GitHub:
https://github.com/search?q=lang%3AR+%2Fpaste%5B%28%5D%5Cs*readLines%5B%28%5D.*%5B%29%5D%2C%5Cs*collapse%5Cs*%3D%5Cs*%5B%27%22%5D%5B%5C%5C%5D%2F+lang%3AR&type=code
[2] AIUI the readLines() approach "pollutes" the global string cache
with potentially 1000s/1s of strings for each line, only to get
them gc()'d after combining everything with paste(collapse="\n")
[3] The mirror on GitHub, which includes archived packages as well as
current (well, eventually-consistent) versions.
[4] Note that usage in packages is likely not representative of usage
in scripts, e.g. I often saw readChar(f, 1), or eol-finders like
readChar(f, 500) + grep("[\n\r]"), which makes more sense to me as
something to find in package internals than in analysis scripts. FWIW
I searched an internal codebase (scripts and packages) and found 70%
of usages reading the full file.
[5] repro: 
https://gist.github.com/MichaelChirico/247ea9500460dca239f031e74bdcf76b
requires GitHub PAT in env GITHUB_PAT for API permissions.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
 Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] eval(parse()) within mutate() returning same value for all rows

2023-12-29 Thread Duncan Murdoch

On 29/12/2023 9:13 a.m., Mateo Obregón wrote:

Hi all-

Looking through stackoverflow for R string combining examples, I found the
following from 3 years ago:

<https://stackoverflow.com/questions/63881854/how-to-format-strings-using-values-from-other-column-in-r>

The top answer suggests using eval(parse(sprintf())). I tried the suggestion
and it did not return the expected combined strings. I thought that this might
be an issue with some leftover values being reused, so I explicitly called
eval() with a new.env():


library(dplyr)
df <- tibble(words = c("%s plus %s equals %s"),
             args  = c("1,1,2", "2,2,4", "3,3,6"))

df |> mutate(combined = eval(parse(text = sprintf("sprintf('%s', %s)",
                                                  words, args)),
                             envir = new.env()))

# A tibble: 3 × 3
  words                args  combined
  <chr>                <chr> <chr>
1 %s plus %s equals %s 1,1,2 3 plus 3 equals 6
2 %s plus %s equals %s 2,2,4 3 plus 3 equals 6
3 %s plus %s equals %s 3,3,6 3 plus 3 equals 6

The `combined` column is not what I was expecting, as the value of the last
eval() is returned for all three rows.
  
Am I missing something? What has changed in the past three years?




I don't know if this is a change, but when `eval()` is passed an 
expression vector, it evaluates the elements in order and returns the 
value of the last one.  This is only partially documented:


"Value:  The result of evaluating the object: for an expression vector 
this is the result of evaluating the last element."



That text has been unchanged in the help page for 13 years.
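
The behaviour is easy to demonstrate outside of dplyr, along with the
elementwise fix:

  exprs <- parse(text = c("1 + 1", "2 + 2", "3 + 3"))
  eval(exprs)          # 6: only the last element's value is returned
  sapply(exprs, eval)  # 2 4 6: evaluate each element separately

So inside mutate() the parsed expressions need to be evaluated row by row,
e.g. with sapply() or rowwise().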

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] zapsmall(x) for scalar x

2023-12-17 Thread Duncan Murdoch
I'm really confused.  Steve's example wasn't a scalar x, it was a 
vector.  Your zapsmall() proposal wouldn't zap it to zero, and I don't 
see why summary() would if it was using your proposal.


Duncan Murdoch

On 17/12/2023 8:43 a.m., Gregory R. Warnes wrote:

Isn’t that the correct outcome?  The user can change the number of digits if 
they want to see small values…


--
Change your thoughts and you change the world.
--Dr. Norman Vincent Peale


On Dec 17, 2023, at 12:11 AM, Steve Martin  wrote:

Zapping a vector of small numbers to zero would cause problems when
printing the results of summary(). For example, if
zapsmall(c(2.220446e-16, ..., 2.220446e-16)) == c(0, ..., 0) then
print(summary(2.220446e-16), digits = 7) would print
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0       0       0       0       0       0

The same problem can also appear when printing the results of
summary.glm() with show.residuals = TRUE if there's little dispersion
in the residuals.

Steve


On Sat, 16 Dec 2023 at 17:34, Gregory Warnes  wrote:

I was quite surprised to discover that applying `zapsmall` to a scalar value has
no apparent effect.  For example:

y <- 2.220446e-16
zapsmall(y)

[1] 2.2204e-16

I was expecting `zapsmall(x)` to act like


round(y, digits=getOption('digits'))

[1] 0

Looking at the current source code, indicates that `zapsmall` is expecting a 
vector:

zapsmall <- function (x, digits = getOption("digits"))
{
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    mx <- max(abs(x[!ina]))
    round(x, digits = if (mx > 0) max(0L, digits - as.numeric(log10(mx))) else digits)
}

If `x` is a non-zero scalar, zapsmall will never perform rounding.

The man page simply states:
zapsmall determines a digits argument dr for calling round(x, digits = dr) such 
that values close to zero (compared with the maximal absolute value) are 
‘zapped’, i.e., replaced by 0.

and doesn’t provide any details about how ‘close to zero’ is defined.
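A small illustration of that rule, assuming the default digits = 7:

  x <- c(1e6, 1e-10)
  zapsmall(x)   # mx = 1e6, so dr = max(0, 7 - log10(1e6)) = 1
  [1] 1e+06 0e+00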

Perhaps handling the special case when `x` is a scalar (or only contains a single
non-NA value) would make sense:

zapsmall <- function (x, digits = getOption("digits"))
{
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    mx <- max(abs(x[!ina]))
    round(x, digits = if (mx > 0 && (length(x) - sum(ina)) > 1) max(0L, digits - as.numeric(log10(mx))) else digits)
}

Yielding:


y <- 2.220446e-16
zapsmall(y)

[1] 0

Another edge case would be when all of the non-na values are the same:


y <- 2.220446e-16
zapsmall(c(y,y))

[1] 2.220446e-16 2.220446e-16

Thoughts?


Gregory R. Warnes, Ph.D.
g...@warnes.net
Eternity is a long time, take a friend!



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Request: documenting more specifically language objects in the R Language Definition document

2023-12-13 Thread Duncan Murdoch
I doubt if anyone will take you up on this request.  Only R Core members 
can change those manuals, and it's hard work to write clear and correct 
documentation.  This probably won't make it high enough on their lists 
of priorities to actually be addressed.


What you could do is try to write it yourself.  Find some helpers who 
really know the details (not necessarily R Core members) to review your 
proposal.  Once you have it written and everyone agrees it is correct, 
either publish it as a blog entry somewhere, or submit it to R Core for 
inclusion in the manual.  I don't recommend posting early drafts to this 
mailing list, though you could post near-final ones here:  you're only 
going to get a few comments before people lose interest.


This would be a lot of work for you.  Besides the work of writing 
clearly and correctly, you need to learn the material.  But that's a big 
benefit for you if you are really interested in working with this kind 
of thing.


Duncan Murdoch

On 13/12/2023 4:19 a.m., Iago Giné Vázquez wrote:

Dear  all,


This is a request to get language objects more documented in the R Language Definition 
document (CRAN version<https://cran.r-project.org/doc/manuals/r-release/R-lang.html>, 
ETHZ R-devel version<https://stat.ethz.ch/R-manual/R-devel/doc/manual/R-lang.html>).

Section '2.1.3 Language objects' claims
There are three types of objects that constitute the R language. They are 
calls, expressions, and names.
But then there is only a subsection '2.1.3.1 Symbol objects' which, if I do not
misunderstand, corresponds to the names subtype of language objects. It would
be great if the calls and expressions subtypes were specified in more detail as
well, and also the calls subtype 'formula'.

I came to this question because, when looking up help for formula, it documents the stats
function formula (Model Formula), and it just says that it produces an object of class
'"formula"' [...] and that a formula object has an associated environment
[...]. Maybe this, together with saying that the mode of a formula is a call, is enough
to describe a formula?

Same section 2.1.3 also claims

They can be [...] converted to and from lists by the as.list and as.call 
functions

A description could also be added of how these lists are structured (their
components, names, etc.) for the different language objects, that is, for
names, expressions, calls, formulas and so on.
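For instance (an illustrative sketch, base R only):

  cl <- quote(f(x, y = 1))
  typeof(cl)            # "language"
  as.list(cl)           # [[1]] the symbol f, [[2]] the symbol x, $y 1
  as.call(as.list(cl))  # round-trips back to f(x, y = 1)

  f <- y ~ x
  class(f)              # "formula"
  mode(f)               # "call": a formula is a call to `~` with a class attribute
  as.list(f)            # [[1]] `~`, [[2]] y, [[3]] x
  environment(f)        # the associated environment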

Thank you.

Best wishes,
Iago




[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] system()/system2() using short paths of commands on Windows?

2023-10-31 Thread Duncan Murdoch

On 31/10/2023 4:32 a.m., Tomas Kalibera wrote:


On 10/30/23 19:07, Yihui Xie wrote:

Sure. I'm not sure if it's possible to make it easier to reproduce,
but for now the example would require installing TinyTeX (via
tinytex::install_tinytex(), which can be later uninstalled cleanly via
tinytex::uninstall_tinytex() after you finish the investigation). Then
run:

   system2('fmtutil-sys', '--all')
   # or tinytex:::fmtutil() if fmtutil-sys.exe is not on PATH

and TeX Live would throw an error like this:

...\username\AppData\Roaming\TinyTeX\bin\windows\runscript.tlu:864: no
appropriate script or program found: fmtuti~1

The command "fmtutil-sys" is longer than 8 characters and hence
shortened to "fmtuti~1". Yes, in principle, TeX Live should work with
short path names, but it doesn't at the moment. I haven't figured out
if it was a recent breakage in TeX Live or not (I've tried to contact
TeX Live developers).

BTW, shell('fmtutil-sys --all') works fine.


I can reproduce the problem, also separately from R. It is not an R problem

./fmtutil-sys.exe --version
works

./fmtuti~1 --version
doesn't work

The problem is in runscript.tlu, when it looks at "progname", it parses
it assuming it is the full name, looking for "-sys" suffix, which won't
be in the short name:

progname, substcount = string.gsub(progname, '%-sys$', '')
local sysprog = (substcount > 0) -- true if there was a -sys suffix removed

and it does further processing using the program name.

This has to be fixed on the luatex end, it must be able to work with
short paths (e.g. expand it appropriately). You could probably work
around the installation before it gets fixed, e.g. by creating another
wrapper which would expand to long names, delete the short name, patch
the script, etc. After all, if it works via a shell, then probably the
shell is expanding to the long names and you have a work-around (I don't
know how reliable).
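For reference, a Windows-only illustration of 8.3 short names (exact output
varies by system); utils::shortPathName() goes long-to-short, and the reverse
expansion is what runscript.tlu would need:

  utils::shortPathName("C:/Program Files/R")
  [1] "C:\\PROGRA~1\\R"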

Adding an option to R's system*() functions to use only long names
doesn't make sense.


On the other hand, not modifying the executable name would make a lot of 
sense, wouldn't it?  I'm pretty sure all supported versions of Windows 
can handle long filenames.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wayland Display Support in R Plot

2023-10-29 Thread Duncan Murdoch

On 29/10/2023 4:20 p.m., Simon Urbanek wrote:




On 30/10/2023, at 8:38 AM, Dirk Eddelbuettel  wrote:


On 30 October 2023 at 07:54, Paul Murrell wrote:
| I am unaware of any Wayland display support.
|
| One useful way forward would be an R package that provides such a device
| (along the lines of 'Cairo', 'tikzDevice', et al)

As I understand it, it is a protocol, and not a device.



Well, X11 is a protocol, not a device, either.

Wayland is a lot worse, since it doesn't really do much at all - the clients 
are fully responsible for drawing (doesn't even support remote connections).

Given that Wayland is essentially a "dumb" framebuffer, probably the easiest 
way would be to take Cairo and add a libwayland back-end. Cairo is already modular so 
it's relatively straight-forward to add a new back-end to it (I'd probably just copy 
xlib-backend.c and replace X11 calls with libwayland calls since the low-level design is 
the same).

However, that is limited only to devices, so you would still run R code in the 
shell (or other GUI that may or may not by Wayland-based). Given that Wayland 
is so minimal, you'd need some GUI library for anything beyond that - so you 
may was well just run a Wayland-based browser and be done with it saving you 
all the bother (oh, right, that's called RStudio ;)).

One package that may be worth adding Wayland backend to is rgl so you get 
OpenGL on Wayland - I'd simply re-write it to use GLFW so it works across all 
platforms and including Wayland.


I looked into using GLFW a while ago, but it seemed too hard to do 
without other really major changes to rgl, so that's not going to happen 
soon (unless someone else does it).


I think the issue was that it was hard to get it to work with the 
ancient OpenGL 1.2 that rgl uses.  I forget whether it was just hard or 
actually impossible.


I am slowly working towards having rgl use newer OpenGL versions, but I 
don't expect this to be done for quite a while.


Duncan Murdoch


Cheers,
Simon




Several Linux distributions have long defaulted to it, so we already should
have thousands of users. While 'not X11' it provides a compatibility layer
and should be seamless.

I think I needed to fall back to X11 for a particular application (likely
OBS) so my session tells me (under Settings -> About -> Windowing System) I
am still running X11. I'll check again once I upgrade from Ubuntu 23.04 to
Ubuntu 23.10

See https://en.wikipedia.org/wiki/Wayland_(protocol) for more.

Dirk

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] FR: valid_regex() to test string validity as a regular expression

2023-10-09 Thread Duncan Murdoch

On 09/10/2023 7:57 p.m., Michael Chirico via R-devel wrote:

It will be useful to package authors trying to validate input which is
supposed to be a valid regular expression.

As near as I can tell, the only way we can do so now is to run any
regex function and check for the warning and/or condition to bubble
up:

valid_regex <- function(str) {
   stopifnot(is.character(str), length(str) == 1L)
   !inherits(tryCatch(grepl(str, ""), condition = identity), "condition")
}

That's pretty hefty/inscrutable for such a simple validation. I see a
variety of similar approaches in CRAN packages [1], all slightly
different. It would be good for R to expose a "canonical" way to run
this validation.


I think currently we do as.character(str) (or some equivalent), so the 
test shouldn't require str to be a character to start.  For example, 
this is currently valid code:


  grepl(1, "abc123")

It's not great style, but shouldn't generate an error.
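So a relaxed sketch of the same validator, coercing as Duncan suggests the
regex functions themselves do (the name is illustrative):

  valid_regex <- function(pattern) {
    pattern <- as.character(pattern)
    stopifnot(length(pattern) == 1L)
    !inherits(tryCatch(grepl(pattern, ""), condition = identity), "condition")
  }

  valid_regex("[a-z]+")  # TRUE
  valid_regex("(")       # FALSE: unmatched parenthesis
  valid_regex(1)         # TRUE, after coercion to "1"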

Duncan Murdoch



At root, the problem is that R does not expose the regex compilation
routines like 'tre_regcomp', so from the R side we have to resort to
hacky approaches.

Things get slightly complicated by encoding/useBytes modes
(tre_regwcomp, tre_regncomp, tre_regwncomp, tre_regcompb,
tre_regncompb; all in tre.h), but all are already present in other
regex routines, so this is doable.

Exposing a function to compile regular expressions is common in other
languages, e.g. Go [2], Python [3], JavaScript [4].

[1] 
https://github.com/search?q=lang%3AR+%2Fis%5Ba-zA-Z0-9._%5D*reg%5Ba-zA-Z0-9._%5D*ex.*%28%3C-%7C%3D%29%5Cs*function%2F+org%3Acran&type=code
[2] https://pkg.go.dev/regexp#Compile
[3] https://docs.python.org/3/library/re.html#re.compile
[4] 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] as(, "dgTMatrix")' is deprecated.

2023-10-03 Thread Duncan Murdoch

On 03/10/2023 12:50 p.m., Koenker, Roger W wrote:

I’ve been getting this warning for a while now (about five years if memory 
serves) and I’m finally tired of it, but also too tired to track it down in 
Matrix.  As far as I can grep  I have no reference to either deprecated object, 
only the apparently innocuous  Matrix::Matrix(A, sparse = TRUE).  Can someone 
advise, Martin perhaps?  I thought it might come from Rmosek, but mosek folks 
don’t think so.
https://groups.google.com/g/mosek/c/yEwXmMfHBbg/m/l_mkeM4vAAAJ


A quick scan of that discussion didn't turn up anything relevant, e.g. a 
script to produce the warning.  Could you be more specific, or just post 
the script here?


In general, a good way to locate the source of a warning is to set 
options(warn=2) to turn it into an error, and then trigger it.  The 
traceback from the error will include a bunch of junk from the code that 
catches the warning, but it will also include the context where it was 
triggered.
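A minimal sketch of that recipe, to be run interactively:

  options(warn = 2)   # promote warnings to errors
  ## ... run the code that produces the warning ...
  traceback()         # the context where it was raised is in here
  options(warn = 0)   # restore the default afterwards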


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Problems caused by dev.off() behaviour

2023-10-03 Thread Duncan Murdoch

On 02/10/2023 10:17 p.m., Trevor Davis wrote:

 > Thanks!  However, isn't length(dev.list()) == 0 when there are no
devices?  That's what I'm seeing on MacOS.

If there is only one graphics device then R should automatically set it 
as the active graphics device, so it isn't really necessary to set it 
manually.  Although there wouldn't be any harm in manually setting it, you 
only really need to worry about setting the previous graphics device 
when there are two or more devices open.


Right, I see.  With some more fiddling, I've decided that I don't like 
the error you get if you try to close device 1, so here's the current 
version:


safe.dev.off <- function(which = dev.cur(), prev = dev.prev()) {
  if (which != 1) {
    force(prev)
    grDevices::dev.off(which)
  }
  if (length(dev.list()))
    dev.set(prev)
  else
    c("null device" = 1)
}

This does the dev.set even if there's only one device so it can return 
the resulting device number.


Duncan Murdoch






__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Problems caused by dev.off() behaviour

2023-10-02 Thread Duncan Murdoch
Thanks!  However, isn't length(dev.list()) == 0 when there are no 
devices?  That's what I'm seeing on MacOS.


Duncan Murdoch

On 02/10/2023 4:21 p.m., Trevor Davis wrote:

 > Use it just like dev.off(), but it *will* restore the previous device.

I'm observing that if there were no previously open graphics devices 
then your `safe.dev.off()` opens up a new graphics device which may be 
an undesired side effect (because "surprisingly" `dev.set()` on the null 
graphics device opens up a new graphics device).  To avoid that you 
could check if `dev.list()` is greater than length 1L:


    safe.dev.off <- function(which = dev.cur(), prev = dev.prev()) {
      force(prev)
      dev.off(which)
      if (length(dev.list()) > 1L) {
        dev.set(prev)
      }
    }

Trevor




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Problems caused by dev.off() behaviour

2023-10-02 Thread Duncan Murdoch

I found some weird behaviour and reported it as
https://bugs.r-project.org/show_bug.cgi?id=18604 and
https://github.com/yihui/knitr/issues/2297, but it turns out it was user 
error.


The dev.off() function was behaving as documented, but it behaves in an 
unexpected (by me) way, and that caused the "bugs".


The issue is that

   dev.off()

doesn't always result in the previous graphics device being made 
current.  If there are two or more other open graphics devices, it won't 
choose the previous one, it will choose the next one.


I'm letting people know because this might affect other people too.  If 
you use dev.off(), don't assume it restores the previous device!


Here's my little workaround alternative:

  safe.dev.off  <- function(which = dev.cur(), prev = dev.prev()) {
force(prev)
dev.off(which)
dev.set(prev)
  }

Use it just like dev.off(), but it *will* restore the previous device.
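An illustrative session (pdf devices used just for the demo):

  pdf(); pdf(); pdf()   # open devices 2, 3 and 4; 4 is current
  dev.set(3)            # make 3 current, so dev.prev() is 2
  safe.dev.off()        # closes 3 and returns to 2
  dev.cur()             # 2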

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Tight bounding box around text in graphics?

2023-09-26 Thread Duncan Murdoch
I think this is a `ragg` device bug, so I've posted an issue there: 
https://github.com/r-lib/ragg/issues/143 .  That issue contains slightly 
more detail than I included in the earlier message here.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Tight bounding box around text in graphics?

2023-09-26 Thread Duncan Murdoch
I've done some exploring, and things aren't working out as I would have 
expected.  Here's an example:


  library(grid)
  library(ragg)

  agg_png("test.png")
  pushViewport(viewport(gp = gpar(cex = 5)))
  y <-  c(0.2, 0.4, 0.6)

  texts <- c("東京", "Tokyo", "Tokyo 東京")

  convertHeight(stringDescent(texts), "npc")

  grid.text(texts, x = 0, y = y, just = c(0, 0))
  grid.segments(x0 = 0, x1 = 1, y0 = y, y1 = y)

  popViewport()
  dev.off()

(In case it doesn't make it through in email, the texts are Tokyo 
written in kanji, then in roman letters, then in both.)


What I see is that the kanji string is being reported as having zero 
descent, but in the resulting test.png file, it's clear that it does 
descend below the baseline.


I can think of lots of reasons for this discrepancy; can you suggest 
which is likeliest?


 - I have some misconception about how this is supposed to work
 - The Kanji font on my system misreports some measurements
 - The ragg::agg_png() device misreports measurements
 - Some other bug somewhere.

Duncan


On 26/09/2023 5:57 a.m., Duncan Murdoch wrote:

Thanks!  I said "base graphics" in my question, but I really have no
objection to using grid graphics, so I'll explore those grid functions.

Duncan Murdoch





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Tight bounding box around text in graphics?

2023-09-26 Thread Duncan Murdoch
Thanks!  I said "base graphics" in my question, but I really have no 
objection to using grid graphics, so I'll explore those grid functions.


Duncan Murdoch

On 25/09/2023 3:53 p.m., Paul Murrell wrote:

Hi

strheight(), which is based GEStrHeight(), is pretty crude, not only
ignoring descenders, but also only considering the ascent of the overall
font (capital "M").

There is a GEStrMetric(), which returns character-specific ascent and
descent, but that is only currently exposed via grid::stringAscent() and
grid::stringDescent().  There is also grid::stringHeight(), which is as
unsubtle as strheight().

For example, these are all the same (just font ascent) ...

  > strheight("y", "in")
[1] 0.1248031
  > strheight("x", "in")
[1] 0.1248031
  > strheight("M", "in")
[1] 0.1248031

... and these are all the same ...

  > convertHeight(stringHeight("y"), "in")
[1] 0.124803149606299inches
  > convertHeight(stringHeight("x"), "in")
[1] 0.124803149606299inches
  > convertHeight(stringAscent("M"), "in")
[1] 0.124803149606299inches

... but these have more detail ...

  > convertHeight(stringAscent("y"), "in")
[1] 0.0936023622047244inches
  > convertHeight(stringDescent("y"), "in")
[1] 0.0416010498687664inches
  > convertHeight(stringAscent("x"), "in")
[1] 0.0936023622047244inches
  > convertHeight(stringDescent("x"), "in")
[1] 0inches
  > convertHeight(stringHeight("M"), "in")
[1] 0.124803149606299inches
  > convertHeight(stringDescent("M"), "in")
[1] 0inches

In theory, it should not be difficult to add a graphics::strascent() and
graphics::strdescent() if that would help.

Paul





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Tight bounding box around text in graphics?

2023-09-25 Thread Duncan Murdoch

I've mentioned in previous messages that I'm trying to redo rgl text.

Part of what I need is to measure the size of strings in pixels when 
they are drawn by base graphics.


It appears that

  strwidth(texts, "user", cex = cex, font = font, family = family)

gives accurate measurements of the width in user coordinates.  I've got 
those set up to match pixels, so I'm fine here.


However, the equivalent call for strheight() only measures height above 
the baseline according to the docs, and indeed the number is smaller 
than the size of what's displayed.  Descenders (e.g. the tail of "y") 
aren't counted.


Is there a way to measure how far a character might descend?  Is it 
valid to assume it won't descend more than a line height below the top 
of the char?


I have a partial solution -- textshaping::shape_text gives a "height" 
value that includes lots of space below the character, and a 
"top_border" value that measures from the top of the textbox to the 
baseline.  So I think `height - top_border` would give me what I'm 
asking for.  But this only works with graphics devices in the ragg 
package.  Is there a general solution?
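A sketch of that partial solution, using the metrics columns described above
(units depend on the resolution passed to shape_text):

  library(textshaping)
  m <- shape_text("type y", size = 12)$metrics
  m$height - m$top_border   # distance the text descends below the baseline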


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Help requested: writing text to a raster in memory

2023-09-24 Thread Duncan Murdoch
I'm somewhat aware of how tricky it all is.  For now I'm going to do it 
in R (using textshaping for layout and base graphics on the 
ragg::agg_capture device to draw to the bitmap).  I'll avoid allowing 
changes to happen in the C++ code.


Eventually I'll see if I can translate the code into C++.  I know 
textshaping has a C interface, but for the actual drawing I'll have to 
work something else out.  Or maybe just leave it in R, and only try to 
write a new bitmap when it's safe.
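A rough sketch of that R-side pipeline (details are illustrative, not rgl's
actual code):

  library(ragg)
  cap <- agg_capture(width = 256, height = 64, background = NA)
  par(mar = rep(0, 4))
  plot.new()
  plot.window(0:1, 0:1)
  text(0.5, 0.5, "axis label", cex = 2)
  bitmap <- cap()   # a matrix of colours: the in-memory raster
  dev.off()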


For future reference, will the measurements reported by 
textshaping::shape_text() match the values used by your Cairo package, 
or are equivalent measurements available elsewhere?


Duncan Murdoch

On 24/09/2023 6:55 p.m., Simon Urbanek wrote:

Duncan,

drawing text is one of the most complicated things you can do, so it really 
depends how for you want to go. You can do it badly with a simple cairo 
show_text API. The steps involved in doing it properly are detecting the 
direction of the language, finding fonts, finding glyphs (resolving ligatures), 
applying hints, drawing glyphs etc. Fortunately there are libraries that help 
with that, but even then it's non-trivial. Probably the most modern pipeline is 
icu + harfbuzz + freetype + fontconfig + cairo. This is implemented, e.g in 
https://github.com/s-u/Cairo/blob/master/src/cairotalk.c (the meat is in  
L608-) and for all but the drawing part there is an entire R package (in C++) 
devoted to this: https://github.com/r-lib/textshaping/tree/main/src -- Thomas 
Lin Pedersen is probably THE expert on this.

Cheers,
Simon








__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] NROW and NCOL on NULL

2023-09-23 Thread Duncan Murdoch

On 23/09/2023 3:41 p.m., Simone Giannerini wrote:

I know it's documented and I know there are other ways to guard
against this behaviour, once you know about this.
The point is whether it might be worth it to make NCOL and NROW return
the same value on NULL and make R more consistent/intuitive and
possibly less error prone.


If you don't list any examples of problems, then the only possible 
conclusion is that there aren't any except obscure ones, so the answer 
is clearly that it is not worth it to make this change.


Duncan Murdoch



Regards,

Simone









__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Help requested: writing text to a raster in memory

2023-09-23 Thread Duncan Murdoch
I am in the process of updating the rgl package.  One thing I'd like to 
do is to change text support in it when using OpenGL to display to be 
more like the way text is drawn in WebGL displays (i.e. the ones 
rglwidget() produces).


Currently in R, rgl uses the FTGL library to draw text.  That library is 
unsupported these days, and uses the old fixed pipeline in OpenGL.


In WebGL, text is displayed by "shaders", programs that run on the GPU. 
Javascript code prepares bitmap images of the text to display, then the 
shader transfers parts of that bitmap to the output display.


I'd like to duplicate the WebGL process in the C++ code running the 
OpenGL display in R.  The first step in this is to render a character 
vector full of text into an in-memory raster, taking account of font, 
cex, etc.  (I want one raster for the whole vector, with a recording of 
locations from which the shader should get each component of it.)


It looks to me as though I could do this using the ragg::agg_capture 
device in R code, but I'd prefer to do it all in C++ code because I may 
need to make changes to the raster at times when it's not safe to call 
back to R, e.g. if some user interaction requires the axis labels to be 
recomputed and redrawn.


Does anyone with experience doing this kind of thing know of examples I 
can follow, or have advice on how to proceed?  Or want to volunteer to 
help with this?


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] NROW and NCOL on NULL

2023-09-23 Thread Duncan Murdoch
It's been documented for a long time that NCOL(NULL) is 1.  What 
particular problems did you have in mind?  There might be other ways to 
guard against them.
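One such guard is simply to normalize NULL first (an illustrative sketch):

  n_col <- function(x) if (is.null(x)) 0L else NCOL(x)
  n_col(NULL)   # 0
  NCOL(NULL)    # 1, as documented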


Duncan Murdoch

On 23/09/2023 1:43 p.m., Simone Giannerini wrote:

Dear list,

I do not know what would be the 'correct' answer to the following but
I think that they should return the same value to avoid potential
problems and hard to debug errors.

Regards,

Simone
---


NCOL(NULL)

[1] 1


NROW(NULL)

[1] 0


sessionInfo()

R version 4.3.1 RC (2023-06-08 r84523 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default


locale:
[1] LC_COLLATE=Italian_Italy.utf8  LC_CTYPE=Italian_Italy.utf8
[3] LC_MONETARY=Italian_Italy.utf8 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.utf8

time zone: Europe/Rome
tzcode source: internal

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.3.1



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Recent changes to as.complex(NA_real_)

2023-09-22 Thread Duncan Murdoch
Since the result of is.na(x) is the same on each of those, I don't see a 
problem, as long as that stays consistent.  You shouldn't be using any 
other test for NA-ness, and you should never expect identical() to treat 
different types as the same (e.g. identical(NA, NA_real_) is FALSE, as it 
should be).  If you are using a different test, that's user error.
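A quick consistency check of the values listed below (illustrative):

  vals <- list(NA_real_ + 0i,
               complex(r = NA_real_, i = Inf),
               complex(r = 2, i = NA_real_),
               complex(r = NaN, i = NA_real_))
  vapply(vals, is.na, logical(1))   # TRUE TRUE TRUE TRUE
  identical(NA, NA_real_)           # FALSE, as it should be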


Duncan Murdoch

On 22/09/2023 2:41 p.m., Hervé Pagès wrote:

We could also question the value of having an infinite number of NA
representations in the complex space. For example all these complex
values are displayed the same way (as NA), are considered NAs by
is.na(), but are not identical or semantically equivalent (from an Re()
or Im() point of view):

      NA_real_ + 0i

      complex(r=NA_real_, i=Inf)

      complex(r=2, i=NA_real_)

      complex(r=NaN, i=NA_real_)

In other words, using a single representation for complex NA (i.e.
complex(r=NA_real_, i=NA_real_)) would avoid a lot of unnecessary
complications and surprises.

Once you do that, whether as.complex(NA_real_) should return
complex(r=NA_real_, i=0) or complex(r=NA_real_, i=NA_real_) becomes a
moot point.

Best,

H.

On 9/22/23 03:38, Martin Maechler wrote:

Mikael Jagan
  on Thu, 21 Sep 2023 00:47:39 -0400 writes:

  > Revisiting this thread from April:

  >https://stat.ethz.ch/pipermail/r-devel/2023-April/082545.html

  > where the decision (not yet backported) was made for
  > as.complex(NA_real_) to give NA_complex_ instead of
  > complex(r=NA_real_, i=0), to be consistent with
  > help("as.complex") and as.complex(NA) and as.complex(NA_integer_).

  > Was any consideration given to the alternative?
  > That is, to changing as.complex(NA) and as.complex(NA_integer_) to
  > give complex(r=NA_real_, i=0), consistent with
  > as.complex(NA_real_), then amending help("as.complex")
  > accordingly?

Hmm, as the one from R-core who was mostly involved, I have to say "no":
to my knowledge the (above) alternative wasn't considered.

> The principle that
> Im(as.complex()) should be zero
> is quite fundamental, in my view, hence the "new" behaviour
> seems to really violate the principle of least surprise ...

of course "least surprise"  is somewhat subjective.  Still,
I clearly agree that the above would be one desirable property.

I think that any solution will lead to *some* surprise for some
cases, I think primarily because there are *many* different
values z  for which  is.na(z)  is true,  and in any case
NA_complex_  is only one of the many.

I also agree with Mikael that we should reconsider the issue
that was raised by Davis Vaughan here ("on R-devel") last April.

  > Another (but maybe weaker) argument is that
  > double->complex coercions happen more often than
  > logical->complex and integer->complex ones.  Changing the
  > behaviour of the more frequently performed coercion is
  > more likely to affect code "out there".

  > Yet another argument is that one expects

  >  identical(as.complex(NA_real_), NA_real_ + (0+0i))

  > to be TRUE, i.e., that coercing from double to complex is
  > equivalent to adding a complex zero.  The new behaviour
  > makes the above FALSE, since NA_real_ + (0+0i) gives
  > complex(r=NA_real_, i=0).

No!  --- To my own surprise (!) --- in current R-devel the above is TRUE,
and
NA_real_ + (0+0i)  , the same as
NA_real_ + 0i  , really gives  complex(r=NA, i=NA) :

Using showC() from ?complex

showC <- function(z) noquote(sprintf("(R = %g, I = %g)", Re(z), Im(z)))

we see (in R-devel) quite consistently


showC(NA_real_ + 0i)

[1] (R = NA, I = NA)

showC(NA   + 0i)  # NA is 'logical'

[1] (R = NA, I = NA)
where as in R 4.3.1 and "R-patched" -- *in*consistently


showC(NA_real_ + 0i)

[1] (R = NA, I = 0)

showC(NA + 0i)

[1] (R = NA, I = NA)
 and honestly, I do not see *where* (and when) we changed
the underlying code (in arithmetic.c !?)  in R-devel to *also*
produce  NA_complex_  in such complex *arithmetic*


  > Having said that, one might also (but more naively) expect

  > identical(as.complex(as.double(NA_complex_)), NA_complex_)

  > to be TRUE.

as in current R-devel

  > Under my proposal it continues to be FALSE.

as in "R-release"

  > Well, I'd prefer if it gave FALSE with a warning
  > "imaginary parts discarded in coercion", but it seems that
  > as.double(complex(r=a, i=b)) never warns when either of
  > 'a' and 'b' is NA_real_ or NaN, even where "information"
  > {nonzero 'b'} is clearly lost ...

The question of *warning* here ...

Re: [Rd] Strange behaviour of do.call()

2023-09-19 Thread Duncan Murdoch
Sorry, it's a silly thinko.  I misspelled the vline argument.  Thanks 
Ivan for the gentle nudge!
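
A minimal illustration: a misspelled argument name is not matched and simply 
stays in `...`:

  f <- function(x, vline = "|", ...) list(vline = vline, dots = list(...))
  do.call(f, list(x = 1, vlign = ""))
  $vline
  [1] "|"

  $dots
  $dots$vlign
  [1] ""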


Duncan Murdoch


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Strange behaviour of do.call()

2023-09-19 Thread Duncan Murdoch
The knitr::kable() function does some internal setup, including 
determining the target format, and then calls an internal function using


  do.call(paste("kable", format, sep = "_"), list(x = x,
caption = caption, escape = escape, ...))

I was interested in setting the `vlign` argument to knitr:::kable_latex, 
using this code:


  knitr::kable(head(mtcars), format="latex", align = "c", vlign="")

If I debug knitr::kable, I can see that `vlign = ""` is part of 
list(...).  However, if I debug knitr:::kable_latex, I get weird results:


  > debug(knitr:::kable_latex)
  > knitr::kable(head(mtcars), format="latex", align = "c", vlign="")
  debugging in: kable_latex(x = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
  "Hornet 4 Drive", "Hornet Sportabout", "Valiant", "21.0", "21.0",
  "22.8", "21.4", "18.7", "18.1", "6", "6", "4", "6", "8", "6",
  "160", "160", "108", "258", "360", "225", "110", "110", "93",
  "110", "175", "105", "3.90", "3.90", "3.85", "3.08", "3.15",
  "2.76", "2.620", "2.875", "2.320", "3.215", "3.440", "3.460",
  "16.46", "17.02", "18.61", "19.44", "17.02", "20.22", "0", "0",
  "1", "1", "0", "1", "1", "1", "1", "0", "0", "0", "4", "4", "4",
  "3", "3", "3", "4", "4", "1", "1", "2", "1"), caption = NULL,
  escape = TRUE, vlign = "")
debug: {

  [rest of function display omitted]

I see here that vlign = "" is being shown as an argument.  However, when 
I print vlign, sometimes I get "object not found", and sometimes I get


  Browse[2]> vline
  debug: [1] "|"

(which is what the default value would be).  In the latter case, I also see

  Browse[2]> list(...)
  $vlign
  [1] ""

i.e. vlign remains part of the ... list, it wasn't bound to the argument 
named vlign.


I can't spot anything particularly strange in the way knitr is handling 
this; can anyone else?  My sessionInfo() is below.


Duncan Murdoch

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6.9

Matrix products: default
BLAS: 
/Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 

LAPACK: 
/Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; 
 LAPACK version 3.11.0


locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.3.1 tools_4.3.1    knitr_1.44     xfun_0.40

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Bug in PCRE interface code

2023-09-04 Thread Duncan Murdoch
This Stackoverflow question https://stackoverflow.com/q/77036362 turned 
up a bug in the R PCRE interface.


The example (currently in an edit to the original question) tried to use 
named capture with more than 127 named groups.  Here's the code:


append_unique_id <- function(x) {
  for (i in seq_along(x)) {
    x[i] <- paste0("<", paste(sample(letters, 10), collapse = ""), ">", x[i])
  }
  x
}

list_regexes <- sample(letters, 128, TRUE)  # <<< change this to 127 and it works
regex2 <- append_unique_id(list_regexes)
regex2 <- paste0("(?", regex2, ")")
regex2 <- paste(regex2, collapse = "|")

out <- gregexpr(regex2, "Cyprus", perl = TRUE, ignore.case = TRUE)
#> Error in gregexpr(regex2, "Cyprus", perl = TRUE, ignore.case = TRUE):
#>   attempt to set index -129/128 in SET_STRING_ELT


I think the bug is in R, here: 
https://github.com/wch/r-source/blob/57d15d68235dd9bcfaa51fce83aaa71163a020e1/src/main/grep.c#L3079


This is the line

int capture_num = (entry[0]<<8) + entry[1] - 1;

where entry is declared as a pointer to a char.  What this is doing is 
extracting a 16-bit number from the first two bytes of a character 
string holding the name of the capture group.  Since char is a signed 
type on most platforms, a byte value of 128 or more sign-extends to a 
negative int, so capture numbers above 127 come out wrong: here 
(0 << 8) + (-128) - 1 = -129, matching the error message.  Casting each 
byte to unsigned char before the arithmetic would avoid this.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Calling a replacement function in a custom environment

2023-08-27 Thread Duncan Murdoch
I think there isn't a way to make this work other than calling `is.na<-` 
explicitly:


  x <- b$`is.na<-`(x, TRUE)
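
Spelled out as a runnable sketch (replacement functions are ordinary 
functions whose names end in `<-`):

  b <- baseenv()
  x <- c(1, 2, 3)
  x <- b$`is.na<-`(x, 2)   # same effect as is.na(x) <- 2
  x
  [1]  1 NA  3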

It seems like a reasonable suggestion to make

  b$is.na(x) <- TRUE

work as long as b is an environment.

If you wanted it to work when b was a list, it would be more problematic 
because of partial name matching.  E.g. suppose b was a list containing 
functions partial(), partial<-(), and part<-(), and I call


  b$part(x) <- 1

what would be called?

Duncan Murdoch

On 27/08/2023 10:59 a.m., Konrad Rudolph wrote:

Hello all,

I am wondering whether it’s at all possible to call a replacement function
in a custom environment. From my experiments this appears not to be the
case, and I am wondering whether that restriction is intentional.

To wit, the following works:

x = 1
base::is.na(x) = TRUE

However, the following fails:

x = 1
b = baseenv()
b$is.na(x) = TRUE

The error message is "invalid function in complex assignment". Grepping the
R code for this error message reveals that this behaviour seems to be
hard-coded in function `applydefine` in src/main/eval.c: the function
explicitly checks for `::` and :::` and permits those assignments, but has
no equivalent treatment for `$`.

Am I overlooking something to make this work? And if not — unless there’s a
concrete reason against it, could it be considered to add support for this
syntax, i.e. for calling a replacement function by `$`-subsetting the
defining environment, as shown above?

Cheers,
Konrad



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Concerns with SVD -- and the Matrix Exponential

2023-08-16 Thread Duncan Murdoch
Dear Durga, I think you have a basic misunderstanding of this mailing 
list.  The responses you have received are from users and volunteer 
developers.  There are no "officials of R-Software".  R is an open 
source project containing contributions from hundreds (maybe thousands) 
of people.


It's only natural that there will be some contradictions in the 
responses from those people.  It's up to you to read the responses and 
find the parts of them that are useful to you.  It's rude of you to ask 
one particular respondent to do that work for you.


Duncan Murdoch

On 16/08/2023 4:06 a.m., Durga Prasad G me14d059 wrote:

Dear Martin, I am getting different responses from different officials of
R-Software, but those statements contradict the statements
discussed in your email. Kindly go through the previous files and emails,
and respond. I personally think, together we can fix the issue which is
observed in SVD.

Thanks and regards
Durga Prasad

On Tue, Aug 1, 2023 at 4:51 PM Lakshman, Aidan H  wrote:


Hi Durga,

There’s an error in your calculations here. You mention that for the SVD
of a symmetric matrix, we must have U=V, but this is not a correct
statement. The unitary matrices are only equivalent if the matrix A is
positive semidefinite.

In your example, you provide the matrix {{1,4},{4,1}}, which has
eigenvalues 5 and -3. This is not positive semidefinite, and thus there's
no requirement that the unitary matrices be equivalent.

If you verify your example with something like Wolfram Alpha, you’ll find
that R’s solution is correct.

-Aidan

---

Aidan Lakshman (he/him) <https://www.ahl27.com/>

Doctoral Fellow, Wright Lab <https://www.wrightlabscience.com/>

University of Pittsburgh School of Medicine

Department of Biomedical Informatics

ah...@pitt.edu

(724) 612-9940



--
From: R-devel on behalf of Durga Prasad G me14d059
Sent: Tuesday, August 1, 2023 4:18:20 AM
To: Martin Maechler; r-devel@r-project.org; profjcn...@gmail.com
Subject: Re: [Rd] Concerns with SVD -- and the Matrix Exponential

Hi Martin, Thank you for your reply. The response and the links provided by
you helped to learn more. But I am not able to obtain the simple even
powers of a matrix: one simple case is the square of a matrix. The squares
of the matrix computed using direct matrix multiplication and using the svd (A = U
D V') are different. Kindly check the attached file for the complete
explanation. I want to know which technique was used in building the svd in
R-Software. I want to discuss about svd if you schedule a meeting.

Thanks and Regards
Durga Prasad


On Mon, Jul 17, 2023 at 2:13 PM Martin Maechler <
maech...@stat.math.ethz.ch>
wrote:


J C Nash
 on Sun, 16 Jul 2023 13:30:57 -0400 writes:


 > Better check your definitions of SVD -- there are several
 > forms, but all I am aware of (and I wrote a couple of the
 > codes in the early 1970s for the SVD) have positive
 > singular values.

 > JN

Indeed.

More generally, the decomposition A = U D V'
(with diagonal D and orthogonal U,V)
is not at all unique.

There are not only many possible different choices of the sign
of the diagonal entries, but also the *ordering* of the singular values
is not unique.
That's why R and 'Lapack', the world standard for
computer/numerical linear algebra, and others I think,
make the decomposition unique by requiring
non-negative entries in D and having them *sorted* decreasingly.

The latter is what the help page   help(svd)  always said
(and you should have studied that before raising such concerns).
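For instance, with the matrix from this thread (symmetric but indefinite):

  A <- matrix(c(1, 4, 4, 1), 2)
  s <- svd(A)
  s$d                                        # 5 3: non-negative, decreasing
  all.equal(s$u %*% diag(s$d) %*% t(s$v), A) # TRUE
  eigen(A)$values                            # 5 -3: the sign is absorbed into U or V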

-

To your second point (in the document), the matrix exponential:
It is less known, but still has been known among experts for
many years (and I think even among students of a class on
numerical linear algebra), that there are quite a
few mathematically equivalent ways to compute the matrix exponential,
*BUT* that most of these may be numerically disastrous, for several
different reasons depending on the case.

This has been known for close to 50 years now:

  Cleve Moler and Charles Van Loan  (1978)
  Nineteen Dubious Ways to Compute the Exponential of a Matrix
  SIAM Review Vol. 20(4)


https://doi.org/10.1137/1020098


Whereas that publication had been important and much cited at
the time, the same authors (known world experts in the field)
wrote a review of that review 25 years later ...

Re: [Rd] feature request: optim() iteration of functions that return multiple values

2023-08-04 Thread Duncan Murdoch

Enrico gave you a workaround that stores the extra values in an environment.

Another possible workaround is an optional argument to myfun() that asks 
it to return more information, e.g.


fr <- function(x, data, extraInfo = FALSE) {   ## Rosenbrock Banana function
  x1 <- x[1]
  x2 <- x[2]
  ans <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
  if (extraInfo)
    list(ans = ans, extras = ...)   # '...' stands in for the extra values
  else
    ans
}

Then after optim() finishes, call fr() again with parameters as returned 
by optim, and extraInfo = TRUE.
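
A concrete sketch of that pattern (the extras here are invented for 
illustration):

  fr2 <- function(x, extraInfo = FALSE) {
    x1 <- x[1]; x2 <- x[2]
    ans <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
    if (extraInfo)
      list(ans = ans, resid = c(x2 - x1 * x1, 1 - x1))
    else
      ans
  }
  opt <- optim(c(-1.2, 1), fr2)
  opt$aux <- fr2(opt$par, extraInfo = TRUE)[-1]   # append extras afterwards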


Duncan Murdoch

On 03/08/2023 4:21 p.m., Sami Tuomivaara wrote:

Dear all,

I have used optim a lot in contexts where it would be useful to be able to iterate a 
function myfun that, in addition to the primary objective to be minimized 
('minimize.me'), could return other values such as alternative metrics of the 
minimization, informative intermediate values from the calculations, etc.

myfun  <- function()
{
...
return(list(minimize.me = minimize.me, R2 = R2, pval = pval, etc.))
}

During the iteration, optim could utilize just the first value from the myfun 
return list; all the other values calculated and returned by myfun could be 
ignored by optim.
After convergence, the other return values of myfun could be finally extracted and 
appended into the optim return value (which is a list) as additional entry e.g.: 
$aux <- list(R2, pval, etc.), (without 'minimize.me' as it is already returned 
as $value).

The usual ways for accessing optim return values, e.g., $par, $value, etc. are 
not affected.  Computational cost may not be prohibitive either.  Is this 
feasible to consider?


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug in perl=TRUE regexp matching?

2023-07-24 Thread Duncan Murdoch

On 23/07/2023 9:01 p.m., Brodie Gaslam wrote:



On 7/23/23 4:29 PM, Duncan Murdoch wrote:

The help page for `?gsub` says (in the context of performance
considerations):


"... just one UTF-8 string will force all the matching to be done in
Unicode"


It's been a little while since I looked at the code but IIRC this just
means that strings are converted to UTF-8 before matching.  The problem
here seems to be more about the interpretation of the "\\w+" token by
PCRE.  I think this makes it a little clearer what's going on:

  gsub("\\w", "a", "Γ", perl=TRUE)
  [1] "Γ"

So no match.  The PCRE docs
https://www.pcre.org/original/doc/html/pcrepattern.html (this might be
the old docs, but it works for our purposes here) mention we can turn on
unicode property matching with the "(*UCP)" token:

   gsub("(*UCP)\\w", "a", "Γ", perl=TRUE)
   [1] "a"

So there are two layers at play here.  The first one is whether R
converts strings to UTF-8, which I think is what the documentation is
about.  The other is whether the PCRE engine is configured to recognize
Unicode properties, which at least in both of our configurations for
this specific case it appears like it is not.
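
Applied to the SO example below, prepending the token gives the
Unicode-aware result (a workaround sketch, not a claim about what the
default should be):

    strings <- c("89 562", "John Smith", "Γιάννης Παπαδόπουλος",
                 "Jean-François Dupuis")
    gsub("(*UCP)\\B\\w+| +", "", strings, perl = TRUE)
    ## [1] "85"   "JS"   "ΓΠ"   "J-FD"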


From the surrounding context, I think the docs are talking about more 
than just conversion to UTF-8.  The full paragraph reads like this:


"If you are working in a single-byte locale (though not common since R 
4.2) and have marked UTF-8 strings that are representable in that 
locale, convert them first as just one UTF-8 string will force all the 
matching to be done in Unicode, which attracts a penalty of around

3× for the default POSIX 1003.2 mode."

i.e. it says the presence of UTF-8 strings slows things down by a factor 
of 3, so it's faster to convert everything to the local encoding.  If it 
was just conversion, I don't think that would be true.


But maybe "for the default POSIX 1003.2 mode" applies to the whole 
paragraph, not just to the penalty, so this is intentional.


Duncan Murdoch


Best,

B.





However, this thread on SO:  https://stackoverflow.com/q/76749529 gives
some indication that this is not true for `perl = TRUE`.  Specifically:

  > strings <- c("89 562", "John Smith", "Γιάννης Παπαδόπουλος",
"Jean-François Dupuis")
  > Encoding(strings)
[1] "unknown" "unknown" "UTF-8"   "UTF-8"
  > regex <- "\\B\\w+| +"
  > gsub(regex, "", strings)
[1] "85"   "JS"   "ΓΠ"   "J-FD"

  > gsub(regex, "", strings, perl = TRUE)
[1] "85"  "JS"  "ΓιάννηςΠαπαδόπουλος"
"J-FçoD"

and the website https://regex101.com/r/QDFrOE/1 gives the first answer
when the regex option /u ("match with full Unicode") is specified, but
the second answer when it is not.

Now I'm not at all sure that that website is authoritative, but this
looks like a flag may have been missed in the `perl = TRUE` case.

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Bug in perl=TRUE regexp matching?

2023-07-23 Thread Duncan Murdoch
The help page for `?gsub` says (in the context of performance 
considerations):



"... just one UTF-8 string will force all the matching to be done in 
Unicode"



However, this thread on SO:  https://stackoverflow.com/q/76749529 gives 
some indication that this is not true for `perl = TRUE`.  Specifically:


> strings <- c("89 562", "John Smith", "Γιάννης Παπαδόπουλος", 
"Jean-François Dupuis")

> Encoding(strings)
[1] "unknown" "unknown" "UTF-8"   "UTF-8"
> regex <- "\\B\\w+| +"
> gsub(regex, "", strings)
[1] "85"   "JS"   "ΓΠ"   "J-FD"

> gsub(regex, "", strings, perl = TRUE)
[1] "85"  "JS"  "ΓιάννηςΠαπαδόπουλος" 
"J-FçoD"


and the website https://regex101.com/r/QDFrOE/1 gives the first answer 
when the regex option /u ("match with full Unicode") is specified, but 
the second answer when it is not.


Now I'm not at all sure that that website is authoritative, but this 
looks like a flag may have been missed in the `perl = TRUE` case.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] tools::parseLatex() crashes on "\\verb{}"

2023-07-21 Thread Duncan Murdoch

On 21/07/2023 11:34 a.m., Antoine Fabri wrote:

Do I understand correctly that we don't want Rd files to be valid latex ?


Yes, it needs to be valid Rd format, which is "a simple markup language 
much of which closely resembles (La)TeX".  For more details see section 
2.1 of Writing R Extensions, which includes links to even more detail.


Duncan Murdoch



This seems odd to me.
I see that `tools::parse_Rd()` doesn't like `\verb!foo!` so maybe roxygen2
is actually doing the right thing (as opposed to just trying to) ?

`parse_Rd() ` is probably what I need indeed, for some reason I hadn't
found it, so that should fix my own issue here thanks a lot.

On Fri, Jul 21, 2023 at 16:18, Ivan Krylov  wrote:


On Fri, 21 Jul 2023 15:14:09 +0200
Antoine Fabri  writes:


On a closer look it seems like roxygen2 introduces those, when using
markdown backtick quoting, if the quoted content is not syntactic. For
instance:

#' `c(c(1)`
#' `c(c(1))`

Will convert the first line to `\verb{c(c(1)}` and the second to
`\code{c(c(1))}`.


roxygen2 tries to do the right thing here. As defined in "Parsing Rd
files" [*], \code{} blocks are supposed to contain syntactically valid
R code. When something that is not valid R is given in a Markdown code
block, roxygen2 should not output \code{}, so it outputs \verb{}.

Also, unlike in LaTeX as understood by tools::parseLatex(), \verb{}
blocks use the {} braces in R documentation, and are understood
correctly by tools::parse_Rd(). Perhaps you also need tools::parse_Rd()?

--
Best regards,
Ivan

[*] https://developer.r-project.org/parseRd.pdf



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] question about an R idiom: eval()ing a quoted block

2023-07-11 Thread Duncan Murdoch

On 11/07/2023 6:01 p.m., Ben Bolker wrote:

In a few places in the R source code, such as the $initialize element
of `family` objects, and in the body of power.t.test() (possibly other
power.* functions), sets of instructions that will need to be run later
are encapsulated by saving them as an expression and later applying
eval(), rather than as a function. This seems weird to me; the only
reason I can think of for doing it this way is to avoid having to pass
back multiple objects and assign them in the calling environment (since
R doesn't have a particularly nice form of Python's tuple-unpacking idiom).

Am I missing something?

   cheers
 Ben


https://github.com/r-devel/r-svn/blob/eac72e66a4d2c2aba50867bd80643b978febf5a3/src/library/stats/R/power.R#L38-L52

https://github.com/r-devel/r-svn/blob/master/src/library/stats/R/family.R#L166-L171


Those examples are very old (the second is at least 20 years old).  It 
may be they were written by someone who was thinking in S rather than in 
R.


As far as I recall (but I might be wrong), S didn't have the same 
scoping rules for accessing and modifying local variables in a function 
from a nested function.
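
For what it's worth, a toy contrast of the two styles (my sketch, not
the actual power.t.test() code):

    ## Old idiom: keep the instructions in an expression and eval() it
    ## where the variables live:
    setup <- quote({ tside <- 1; tol <- 1; n <- 2 * tside / tol })
    f1 <- function() { eval(setup); n }

    ## With R's scoping, a nested function can modify the same locals:
    f2 <- function() {
        tside <- tol <- n <- NULL
        init <- function() { tside <<- 1; tol <<- 1; n <<- 2 * tside / tol }
        init()
        n
    }
    f1(); f2()   # both return 2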


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Strange error in R CMD check --timings

2023-07-02 Thread Duncan Murdoch

On 02/07/2023 2:34 p.m., Sebastian Meyer wrote:

Am 02.07.23 um 18:01 schrieb Duncan Murdoch:

This SO post: https://stackoverflow.com/q/76583828 describes a strange R
CMD check error.  Depending on the contents of a comment in one of the
examples sections of a help page, an error like this could be triggered:

 > base::assign(".dptime", (proc.time() - get(".ptime", pos =
"CheckExEnv")), pos = "CheckExEnv")
 > base::cat("read_net", base::get(".format_ptime", pos =
'CheckExEnv')(get(".dptime", pos = "CheckExEnv")), "\n",
file=base::get(".ExTimings", pos = 'CheckExEnv'), append=TRUE, sep="\t")
 > ### * 
 > ###
 > cleanEx()
 > options(digits = 7L)
 > base::cat("Time elapsed: ", proc.time() - base::g
 + Error: unexpected end of input
 Execution halted

The code without the offending comment is available here:

 https://github.com/rob-ward-psych/iac

at revision c2f3529.  To add the offending comment, change line 318 of
R/iac_networks.R to

 #' # Ken is a burglar in the Sharks, what is retrieved from his name

and run roxygen on the package, so the long comment ends up in the
examples section of man/read_net.Rd instead of the empty comment that is
there on Github.

At first it appeared to require devtools::check(), but in fact the error
comes from R CMD check --timings .  One thing that may be related is
that an earlier example had this code:

 file.edit(iac_example("what_where.yaml"))


I could reproduce the check error on Ubuntu for some settings of EDITOR.

- For EDITOR="nano", the output below file.edit() in iac-Ex.Rout showed
content from the first few lines of the yaml file and then
Too many errors from stdin

- For EDITOR="vi", iac-Ex.Rout showed
Vim: Warning: Output is not to a terminal
Vim: Warning: Input is not from a terminal
Press ENTER or type command to continue
... some content from the yaml file ...
Vim: Error reading input, exiting...

OTOH,
EDITOR="nonexistent" resulted in a successful check run with file.edit()
output
sh: 1: nonexistent: not found
Warning: error in running command

whereas EDITOR="emacs" would open the GUI while "* checking examples
...", waiting for me to finish editing.

I agree that the package should conditionalize a [file.]edit() example
on the R session being interactive(). I'm wondering, however, whether R
CMD check should itself generally set the "editor" option to a read-only
variant, e.g., a function that just calls file.show() with a warning
when it runs the (massaged) examples. For related reasons, I guess, it
already sets the "pager" option on Windows to "console". Alternatively,
if the massaged "editor" option called stop() (similar to T and F
producing errors), such examples would really need to be conditioned on
interactive().


Given how variable the results are, I think the last option (calling 
file.edit() non-interactively should be an error) would be my choice.
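
For the package at hand the immediate fix is the usual guard
(iac_example() being the package's own helper):

    if (interactive()) {
        file.edit(iac_example("what_where.yaml"))
    }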


Duncan Murdoch



Sebastian Meyer



If that line is skipped (by conditioning on interactive()), the error
goes away.  But this might be unrelated, since deleting that comment
also makes the error go away.

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Strange error in R CMD check --timings

2023-07-02 Thread Duncan Murdoch
This SO post: https://stackoverflow.com/q/76583828 describes a strange R 
CMD check error.  Depending on the contents of a comment in one of the 
examples sections of a help page, an error like this could be triggered:


  > base::assign(".dptime", (proc.time() - get(".ptime", pos = 
"CheckExEnv")), pos = "CheckExEnv")
  > base::cat("read_net", base::get(".format_ptime", pos = 
'CheckExEnv')(get(".dptime", pos = "CheckExEnv")), "\n", 
file=base::get(".ExTimings", pos = 'CheckExEnv'), append=TRUE, sep="\t")

  > ### * 
  > ###
  > cleanEx()
  > options(digits = 7L)
  > base::cat("Time elapsed: ", proc.time() - base::g
  + Error: unexpected end of input
  Execution halted

The code without the offending comment is available here:

  https://github.com/rob-ward-psych/iac

at revision c2f3529.  To add the offending comment, change line 318 of 
R/iac_networks.R to


  #' # Ken is a burglar in the Sharks, what is retrieved from his name

and run roxygen on the package, so the long comment ends up in the 
examples section of man/read_net.Rd instead of the empty comment that is 
there on Github.


At first it appeared to require devtools::check(), but in fact the error 
comes from R CMD check --timings .  One thing that may be related is 
that an earlier example had this code:


  file.edit(iac_example("what_where.yaml"))

If that line is skipped (by conditioning on interactive()), the error 
goes away.  But this might be unrelated, since deleting that comment 
also makes the error go away.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] codetools wrongly complains about lazy evaluation in S4 methods

2023-06-12 Thread Duncan Murdoch
Most of the errors, warnings and notes generated by R CMD check are 
generated by code in the tools package, usually in the tools/R/QC.R 
source file.  Search that file for the error message, then backtrack to 
find the code that causes it to be triggered.


If I recall correctly, it works on the evaluated source rather than the 
actual source, so it will only see the result of evaluating `setMethod` 
in your example.  I don't know the methods package well enough to know 
exactly what that does, but presumably it produces a function and hides 
it somewhere so that the S4 dispatch can find it when it needs to.


Duncan Murdoch

On 12/06/2023 2:03 p.m., Mikael Jagan wrote:

Thanks both.  Yes, I was aware of globalVariables, etc.  I guess I was hoping
to be pointed to the right place in the source code, in case the issue could
be addressed properly, notably as it seems to have already been addressed for
functions that are not S4 methods, i.e., codetools is apparently not bothered
by

  def <- function(x = y) { y <- 0; x }

but still complains about

  setMethod("someGeneric", "someClass", def)

...

Mikael

On 2023-06-07 5:13 am, Gabriel Becker wrote:

The API supported workaround is to call globalVariables, which,
essentially, declares the variables without defining them (a distinction R
does not usually make).
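
For the example in this thread that would be, somewhere in the package's
R/ sources:

    ## declares 'R' for the checker without creating a binding
    utils::globalVariables("R")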

The issue with this approach, of course, is that it's a very blunt
instrument. It will cause false negatives if you accidentally use the same
symbol in a standard evaluation context elsewhere in your code.
Nonetheless, that's the intended approach as far as I know.

Best,
~G



On Wed, Jun 7, 2023 at 1:07 AM Serguei Sokol via R-devel <
r-devel@r-project.org> wrote:


On 03/06/2023 at 17:50, Mikael Jagan wrote:

In a package, I define a method for not-yet-generic function 'qr.X'
like so:

  > setOldClass("qr")
  > setMethod("qr.X", signature(qr = "qr"), function(qr, complete,
ncol) NULL)

The formals of the newly generic 'qr.X' are inherited from the
non-generic
function in the base namespace.  Notably, the inherited default value of
formal argument 'ncol' relies on lazy evaluation:

  > formals(qr.X)[["ncol"]]
  if (complete) nrow(R) else min(dim(R))

where 'R' must be defined in the body of any method that might
evaluate 'ncol'.
To my surprise, tools:::.check_code_usage_in_package() complains about
the
undefined symbol:

  qr.X: no visible binding for global variable 'R'
  qr.X,qr: no visible binding for global variable 'R'
  Undefined global functions or variables:
R

I think this issue is similar to the complaints about undefined
variables in expressions involving non-standard evaluation, e.g. column
names in a data frame which are used as unquoted symbols. One of
workarounds is simply to declare them somewhere in your code. In your
case, it could be something as simple as:

 R=NULL

Best,
Serguei.



I claim that it should _not_ complain, given that lazy evaluation is a
really
a feature of the language _and_ given that it already does not
complain about
the formals of functions that are not S4 methods.

Having said that, it is not obvious to me what in codetools would need
to change
here.  Any ideas?

I've attached a script that creates and installs a test package and
reproduces
the check output by calling tools:::.check_code_usage_in_package().
Hope it
gets through.

Mikael

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Rd macros are not expanded inside of \eqn{} or \deqn{}

2023-06-12 Thread Duncan Murdoch

A description of the format is given in this document:

  https://developer.r-project.org/parseRd.pdf

As far as I know that document is still up to date.  As it says in Table 
3, \eqn and \deqn take "Verbatim" arguments.  That mode is described in 
the introduction to Section 2; it contains text and comments, so by 
design no macros are expanded.


I think it's unlikely that this would change.  The problem is that the 
equation markup can contain LaTeX macros.  So the parser would have to 
have a new mode where it distinguished between LaTeX macros and Rd 
macros.  But then how would you write true verbatim text, where you're 
trying to discuss the macros?  It gets complicated very quickly.


What you could conceivably do is write your own macro that passed its 
content to R code that expanded your user-defined macros. It sounds 
complicated, and would probably be hard to get right.
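
An untested, simpler dodge that sidesteps the verbatim mode entirely:
since user macros do get expanded in ordinary text, let the macro emit
the whole equation, \eqn{} included:

    %% zzz.Rd (untested sketch)
    \newcommand{\zzzeqn}{\eqn{whatever}}
    %% then write \zzzeqn in \description{} instead of \eqn{\zzz{}}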


Duncan Murdoch

On 12/06/2023 1:55 p.m., Mikael Jagan wrote:

I was a bit surprised to learn that, if one has an Rd file as below:

  %% zzz.Rd
  \newcommand{\zzz}{whatever}
  \name{zzz}
  \title{zzz}
  \description{ \zzz{} \eqn{\zzz{}} \deqn{\zzz{}} }

then the macro is _not_ expanded inside of \eqn{} or \deqn{} when parsed to text
or HTML.  Is this behaviour intentional?  Could it be changed?  Inside of \eqn{}
and \deqn{} is where I am _most_ likely to want to use macros, at least since
R 4.2.0, which added KaTeX support ...

See output pasted below.

Mikael

  > tools::Rd2txt(tools::parse_Rd("zzz.Rd"))
zzz

Description:

   whatever \zzz{}

 \zzz{}

  > tools::Rd2HTML(tools::parse_Rd("zzz.Rd"))
[HTML output mangled by the mail archive: the page pulls in KaTeX from
cdn.jsdelivr.net and defines macros for \R and \code, and the rendered
body again reads "whatever" for the \zzz{} in running text, while the
\eqn{} and \deqn{} blocks still contain the literal "\zzz{}".]






__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to locate references to an environment?

2023-05-23 Thread Duncan Murdoch

On 23/05/2023 2:29 p.m., Peter Meilstrup wrote:

R developers,

I am trying to track down a memory leak in my R package.

I have a complex object O which comprises a lot of closures and such.
Among which, the object uses an environment E to perform computations
and keep intermediate values in. When O closes/finishes with its task
it nulls out its reference to E so that that intermediate data can be
garbage collected; I've verified that it does null the reference.

However, it seems there is another reference to E floating around. I
can tell because I can ask O to put a large array in E, then tell O to
close, which nulls the reference to E, but then if I serialize(O,
ascii=TRUE) I can still see the array in the output.

Dangling references to E could come from a closure created in E, or an
unforced promise from a function call evaluated in E that created a
closure I still have a reference to, or, ... my question is how do I
locate the reference?

Is there a way to scan the workspace for objects that refer to a given object?

Or is there a tool that will unpack/explain serialize()'s .rds format
in a more human-readable way so that I can tell where the reference to
E occurs?


I don't know of such a tool.  You can generate a lot of data about the 
internals of an object by using


 .Internal(inspect(O))

If your O is complex as you say, you probably won't want to read through 
all of that output, but you can save it to a file and search for ENVSXP 
for an environment, or if you have printed E and it shows up as 
something like




you can search for that address in the output, e.g. in an example I just 
ran, an environment and a closure that uses that environment were printed as


  @7fb3761d1d98 04 ENVSXP g1c0 [MARK,REF(4)] <0x7fb3761d1d98>

and

  @7fb3762befd0 03 CLOSXP g1c0 [MARK,REF(2),ATT]
  FORMALS:
@7fb3c00f7ee0 00 NILSXP g1c0 [MARK,REF(65535)]
  BODY:
@7fb376e5bc88 14 REALSXP g1c1 [MARK,REF(2)] (len=1, tl=0) 123
  CLOENV:
@7fb3761d1d98 04 ENVSXP g1c0 [MARK,REF(4)] <0x7fb3761d1d98>


  The hard part might be to identify what you've found once you find 
it, because names of objects aren't printed with the object, they come 
later when the "names" attribute gets printed.


So it might be easier to do it by trial and error:

 rm(O) and save the workspace.  Does your test array get saved?  If so, 
it's referenced from something outside of O.  If not, remove the 
elements of O one by one, until saving it doesn't save the array.  The 
last thing removed is a culprit. (There may be others...)
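
A sketch of that loop, assuming O behaves like a named list:

    sizes <- vapply(names(O), function(nm) {
        O2 <- O
        O2[[nm]] <- NULL
        length(serialize(O2, NULL))   # bytes with this element dropped
    }, numeric(1))
    sort(sizes)   # elements whose removal shrinks this point at the culprit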


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Change DEFAULTDEPARSE to DEFAULTDEPARSE | SHOWATTRIBUTES ?

2023-05-06 Thread Duncan Murdoch

On 06/05/2023 12:26 p.m., Mikael Jagan wrote:

The deparse options used by default by 'deparse' and 'dput' are

  c("keepNA", "keepInteger", "niceNames", "showAttributes")

but Defn.h still has

  #define DEFAULTDEPARSE 1089 /* KEEPINTEGER | KEEPNA | NICE_NAMES,
used for calls */

i.e., with the SHOWATTRIBUTES bit turned off.  Is that on purpose?
Note that this leads to weird things like:

  > (expr <- call("is.matrix", matrix(1:4, 2L, 2L)))
  is.matrix(1:4)
  > eval(expr)
  [1] TRUE

which can confuse anyone not paying close attention ...


I agree that deparse does a better job in this case, but I'm not sure 
I'd recommend the change.  You should try it, and see if any tests are 
broken.  I'd guess there will be some in base R, but I might be wrong.


Contributed packages are another issue.  Lots of packages test for 
changes in their output; this change could break those.


I think the underlying issue is that call("is.matrix", matrix(1:4, 2L, 
2L)) produces something that would never be produced by the parser, so 
"deparsing" isn't really well defined.


For example deparse(expr) also gets it wrong:

  [1] "is.matrix(structure(1:4, dim = c(2L, 2L)))"

Even though that evaluates to the same result, it isn't the expression I 
put into expr.  There are also many examples where you don't get the 
right answer from either version.  A simple one is this:


   > (expr <- call("identity", pi))
   identity(3.14159265358979)
   > eval(expr)
   [1] 3.141593
   > eval(expr) == identity(3.14159265358979)
   [1] FALSE

Here the issue is that the deparsed expression doesn't include the full 
precision for pi that is stored in expr.


(Aside:  This is one reason why it's such a bad idea to use the common 
pattern:


   deparse expression
   modify part of it
   parse the result

You can often get more changes than you intended.  It's better to work 
on the expression directly.)


BTW, I just noticed something else in deparse() that's probably a bug.

   > deparse(expr, control = "exact")
   [1] "quote(I(0x1.921fb54442d18p+1))"

I don't know why "quote" is now showing up; it means that

   parse(text = deparse(expr, control = "exact"))

produces something that's really quite different from expr.

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Generalised piping into operators

2023-04-21 Thread Duncan Murdoch

On 21/04/2023 12:16 p.m., Michael Milton wrote:
I'm afraid I don't understand. I know that parsing `+`(1, 1) returns a 
result equivalent to `1 + 1`, but why does that impose a restriction on 
parsing the pipe operator? What is the downside of allowing arbitrary 
RHS functions?


I thought the decision to exclude "_ + 1" happens after enough parsing 
has happened so that the code making the decision can't tell the 
difference between "_ + 1" and "`+`(_, 1)".  I might be wrong about 
that, but this suggests it:


  > quote(_ + 1)
  Error in quote("_" + 1) : invalid use of pipe placeholder (:1:0)
  > quote(`+`(_,  1))
  Error in quote("_" + 1) : invalid use of pipe placeholder (:1:0)

On the other hand, this works:

  > quote(x |> `+`(e1 = _, 1))
  x + 1

So maybe `+`() is fine after all.

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Generalised piping into operators

2023-04-21 Thread Duncan Murdoch

On 21/04/2023 11:33 a.m., Michael Milton wrote:
Thanks, this makes sense. Is there a similar precedence reasoning behind 
why operator functions (`+` etc) can't be piped into?


Yes:

> identical(quote(1 + 1), quote(`+`(1, 1)))
[1] TRUE

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Generalised piping into operators

2023-04-21 Thread Duncan Murdoch

On 21/04/2023 4:35 a.m., Michael Milton wrote:

I just checked out R-devel and noticed that the new "pipe extractor"
capability coming in 4.3 only works for the 4 extractor operators, but no
other standard operators like +, *, %*% etc, meaning that e.g. mtcars |>
as.matrix() |> _ + 1 |> colMeans() is a syntax error. In addition, we are
still subject to the restriction that the functions on the RHS of a pipe
can't have special names, so mtcars |> as.matrix() |> `+`(1) |> colMeans() is
also a syntax error.

Either option would be great, as I find it much cleaner to coordinate a
sequence of function calls using pipes rather than nested function calls or
using temporary variables.

May I enquire why both of these expressions are disallowed, and if it might
be possible to help remove one or both of these restrictions? There is some
discussion at
https://stat.ethz.ch/pipermail/r-devel/2020-December/080210.html but the
thread is mostly concerned with other things like the placeholder and
whether or not parentheses can be omitted. My naive view is that piping
into a special operator function like `+` would be the least ambiguous: `+`
presumably parses to the same type of token as `colMeans` does, so the
function parse tree seems like it would work fine if this was allowed in a
pipe.


If it were allowed, people would expect expressions like yours to work, 
but as ?Syntax says, your pipe would actually be parsed as something like


  (mtcars |> as.matrix() |> _) + (1 |> colMeans())

because the |> operator has higher precedence than +.
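
The grouping is easy to see without evaluating anything:

    quote(a |> f() + b |> g())
    ## f(a) + g(b)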

Since parens would be needed somewhere to override this, you may as well 
type


  (as.matrix(mtcars) + 1) |> colMeans()

which is both shorter and arguably clearer than

  mtcars |> as.matrix() |> (_ + 1) |> colMeans()

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Unique ID for conditions to supress/rethrow selected conditions?

2023-04-16 Thread Duncan Murdoch

On 16/04/2023 10:49 a.m., nos...@altfeld-im.de wrote:

On Sun, 2023-04-16 at 13:52 +0200, Iñaki Ucar wrote:


I agree that something like this would be a nice addition. With the
current condition system, it would be certainly easy (but quite a lot
of work) to define a hierarchy of built-in conditions, and then use
them consistently throughout base R.


Yes, a typed condition system would be great.


Why don't you contribute some patches to the sources to implement it in 
some particular area?  You can follow the pattern that is used in the 
gram.y file:  identify a class of messages, and what information is 
useful in them.  Write a function or functions to generate errors or 
warnings with that class, and possibly a subclass.  Then replace each 
error() and warning() call in some group of functions with a call to 
your function(s).
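
A minimal sketch of such a constructor, using base errorCondition() (the
class names here are made up):

    myError <- function(msg, class, call = sys.call(-1)) {
        stop(errorCondition(msg, class = c(class, "myError"), call = call))
    }
    ## at a call site:
    ## myError("non-numeric argument", class = "invalidArgument")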


Even better, since that's a lot of work:  propose it as a project to be 
funded by the R Foundation, or as a shorter project in some other way. 
There are now quite a number of ways to contribute to R, e.g. see


  https://github.com/r-devel/rdevguide

Duncan Murdoch


I have two other ideas:



By reading the "R messages" and "preparing translactions" sections of the "R 
extensions manual"

https://cran.r-project.org/doc/manuals/r-release/R-exts.html#R-messages

I was thinking about using the "unique" R message texts (which are the msgid in 
the *.po files,
see e.g. 
https://github.com/r-devel/r-svn/blob/60a4db2171835067999e96fd2751b6b42c6a6ebc/src/library/base/po/de.po#L892)
to maintain a unique ID (not dependent on the actual translation into the 
current language).

A "simple" solution could be to pre- or postfix each message text with an ID, 
for example this code here

  else errorcall(call, _("non-numeric argument to function"));
  # 
https://github.com/r-devel/r-svn/blob/49597237842697595755415cf9147da26c8d1088/src/main/complex.c#L347

would become

  else errorcall(call, _("non-numeric argument to function [47]"));
or
  else errorcall(call, _("[47] non-numeric argument to function"));

Now the ID could be extracted more easily (at least for base R condition 
messages)...

This would even be back-portable to older R versions to make condition IDs broadly 
available "in the wild".
Another way to introduce an ID for each condition in base R would be ("the hard 
way")

1) by refactoring each and every code location with an embedded message string 
to use a centralized
key/msg_text data structure to "look up" the appropriate message text and

2) use the key to enrich the condition as unique ID (e.g. as an attribute in 
the condition object).

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Unique ID for conditions to supress/rethrow selected conditions?

2023-04-16 Thread Duncan Murdoch

As far as I know there are no reliable unique identifiers.

There appears to be a general drift towards classed conditions, but so 
far only a small subset of conditions are generated that way (and to 
tell you the truth, I forget how to detect one, even though I 
contributed some of them to the parser.  One example (I forget whether I 
wrote this one or not) is


https://github.com/r-devel/r-svn/blob/cf233857df61549b71eb466ceab7d081424833d6/src/main/gram.y#L1328

which raises an error with class c("pipebindDisabled", "parseError", 
"error", "condition").


For some conditions, it might be sufficient to generate the condition 
and then use (part of) the generated message as the identifier.  But 
that's not going to always be possible.
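
For instance, suppressing one specific base warning by matching its
message (fragile across translations, as the original post notes):

    withCallingHandlers(
        as.numeric("x"),
        warning = function(w) {
            if (grepl("NAs introduced by coercion", conditionMessage(w),
                      fixed = TRUE))
                invokeRestart("muffleWarning")
        }
    )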


Duncan Murdoch



On 16/04/2023 6:56 a.m., nos...@altfeld-im.de wrote:

I am the author of the *tryCatchLog* package and want to

- suppress selected conditions (warnings and messages)
- rethrow  selected conditions (e.g a specific warning as a message or to 
"rename" the condition text).

I could not find any reliable unique identifier for each possible condition

- that (base) R throws
- that 3rd-party packages can throw (out of scope here).



Is there any reliable way to identify each possible condition of base R?

Are there plans to implement such an identifier ("errno")?



PS: Things that do not work good enough IMHO:

 1. Just use the condition classes (not really unique to distiguish between 
each and every condition))

 2. Try to match the condition text
(it depends on the active language setting in R which cannot be switched "on 
the fly" on each platform
 and wordings or translations may even change in the future)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Error message for infinite probability parameters in rbinom() and rmultinom()

2023-04-08 Thread Duncan Murdoch

On 08/04/2023 5:53 p.m., Martin Maechler wrote:

Christophe Dutang
 on Sat, 8 Apr 2023 14:21:53 +0200 writes:


 > Dear all,

 > Using rmultinom() in a stochastic model, I found this function returns 
an error message 'NA in probability' for an infinite probability.

 > Maybe, a more precise message will be helpful when debugging.

 >> rmultinom(1, 3:5, c(1/2, 1/3, Inf))
 > Error in rmultinom(1, 3:5, c(1/2, 1/3, Inf)) : NA in probability vector
 >> rmultinom(1, 3:5, c(1/2, 1/3, NA))
 > Error in rmultinom(1, 3:5, c(1/2, 1/3, NA)) : NA in probability vector

Thank you.

I agree the first ('Inf') should not do what it currently does,
and probably the 2nd one should neither give an error.


Note that in rmultinom,  the 'prob' is allowed to be *NOT*
scaled to sum(.) = 1.

Therefore 'Inf' makes sense as the limit (of a sequence) of (a)
very large number(s).

I claim that

   rmultinom(1, 3, c(1/2, 1/3, Inf))

should give the same as

   rmultinom(1, 3, c(1/2, 1/3, 1e300))

even without a warning,


That case makes sense, but is it worth the effort?  Certainly

rmultinom(1, 3, c(1/2, Inf, Inf))

can't give a useful answer because we don't know the relative size of 
the two infinities.  I imagine the first NA comes from computing 
prob/sum(prob), which is c(0, 0, NaN).
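
Easily checked at the prompt:

    c(1/2, 1/3, Inf) / sum(c(1/2, 1/3, Inf))
    ## [1]   0   0 NaN
    c(1/2, Inf, Inf) / sum(c(1/2, Inf, Inf))
    ## [1]   0 NaN NaN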


Duncan Murdoch

> and OTOH,  an NA in prob may return NA (and signal a warning)
> instead of an error.



 > For rgeom() or rbinom(), we got a warning for infinite probability :

Yes, but there, prob must be in [0,1] ... so that's somewhat differnt.

 >> rbinom(1, 3, Inf)
 > [1] NA
 > Warning message:
 > In rbinom(1, 3, Inf) : NAs produced
 >> rbinom(1, 3, NA)
 > [1] NA
 > Warning message:
 > In rbinom(1, 3, NA) : NAs produced
 >> rgeom(1, Inf)
 > [1] NA
 > Warning message:
 > In rgeom(1, Inf) : NAs produced
 >> rgeom(1, NA)
 > [1] NA
 > Warning message:
 > In rgeom(1, NA) : NAs produced


 > Maybe, it could be better to harmonize the behavior for infinite 
probability.

 > Kind regards, Christophe


 >> sessionInfo()
 > R version 4.2.3 (2023-03-15)
 > Platform: aarch64-apple-darwin20 (64-bit)
 > Running under: macOS Ventura 13.2.1

 > Matrix products: default
 > BLAS:   
/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
 > LAPACK: 
/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

 > locale:
 > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 > attached base packages:
 > [1] stats graphics  grDevices utils datasets  methods   base

 > loaded via a namespace (and not attached):
 > [1] compiler_4.2.3 tools_4.2.3

 > -
 > Christophe DUTANG
 > LJK, Ensimag, Grenoble INP, UGA, France
 > Web: http://dutangc.free.fr

 > __
 > R-devel@r-project.org mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] removeSource() vs. function literals

2023-03-30 Thread Duncan Murdoch

On 30/03/2023 10:32 a.m., Ivan Krylov wrote:

Dear R-devel,

In a package of mine, I use removeSource on expression objects in order
to make expressions that are semantically the same serialize to the
same byte sequences:
https://github.com/cran/depcache/blob/854d68a/R/fixup.R#L8-L34

Today I learned that expressions containing function definitions also
contain the source references for the functions, not as an attribute,
but as a separate argument to the `function` call:

str(quote(function() NULL)[[4]])
# 'srcref' int [1:8] 1 11 1 25 11 25 1 1
# - attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'
#   

This means that removeSource() on an expression that would define a
function when evaluated doesn't actually remove the source reference
from the object.

Do you think it would be appropriate to teach removeSource() to remove
such source references? What could be a good way to implement that?
if (is.call(fn) && identical(fn[[1]], as.name("function"))) fn[[4]] <- NULL
sounds too arbitrary. if (inherits(fn, "srcref")) return(NULL) sounds
too broad.



I don't think there's a simple way to do that.  Functions can define 
functions within themselves.  If you're talking about code that was 
constructed by messing with language objects, it could contain both 
function objects and calls to `function` to construct them.  You'd need 
to recurse through all expressions in the object.  Some of those 
expressions might be environments, so your changes could leak out of the 
function you're working on.


Things are simpler if you know the expression is the unmodified result 
of parsing source code, but if you know that, wouldn't you usually be 
able to control things by setting keep.source = FALSE?


Maybe a workable solution is something like parse(text = deparse(expr, 
control = "exact"), keep.source = FALSE).  Wouldn't work on environments 
or various exotic types, but would probably warn you if it wasn't working.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] subfolders in the R folder

2023-03-28 Thread Duncan Murdoch

On 28/03/2023 2:00 p.m., Henrik Bengtsson wrote:

A quick drive-by-comment: What if 'R CMD build' would have an option
to flatten R/ subfolders when building the tarball, e.g.

R/unix/a.R
R/windows/a.R
R/a.R

becomes:

R/00__unix__a.R
R/00__windows__a.R
R/a.R

?  Maybe that would be sufficient for most use cases.  The only thing
I can imagine is that source file references (e.g. in check NOTEs)
will be toward the latter and not the former.


If you are renaming a file (or merging multiple files, etc.) you can use 
#line directives so that diagnostic messages refer to the original file.
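
That is, a flattening step could emit, before the copied lines of each
source file:

    #line 1 "R/unix/a.R"

and the parser would then attribute subsequent diagnostics to the
original path.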


Duncan Murdoch


Of course, one could write a 'build2' shell script locally that wraps
all this internally, so that one can call 'R CMD build2 mypkg', which
then creates a flattened copy of the package folder, and runs 'R CMD
build' on that. Prototyping that could be a good start to see what
such a solution will bring and what it breaks.





/Henrik

On Tue, Mar 28, 2023 at 6:24 PM Barry Rowlingson
 wrote:


The "good reason" is all the tooling in R doesn't work with subfolders and
would have to be rewritten. All the package check and build stuff. And
that's assuming you don't want to change the basic flat package structure -
for example to allow something like `library(foo)` to attach a package and
`library(foo.bar)` to attach some subset of package `foo`. That would
require more changes of core R package and namespace code.

As a workaround, you could implement a hierarchical structure in your file
*names*. That's what `ggplot2` does with its (...downloads tarball...) 192
files in its R folder. Well mostly, there's a load of files called
annotation- and geom- and plot- and position- and stat- etc etc. No reason
why you can't have multiple "levels" separated with "-" as you would have
multiple folder levels separated with "/". You can then do `ls geom-*` to
see the `geom` "folder" and so on (on a unix shell).

And then when R Core receive a patch that implements subfolders, a quick
shell script will be able to create the hierarchy for you and drop all the
files in the right place.

One reason for the flat folder structure may be that R's packages
themselves have no structure to the functions - compare with Python where
modules can have subfolders and functions in subfolders can be access with
module.subfolder.subsub.foo(x), and module subfolders can be imported etc.
The whole module ecosystem was designed with structure in mind.

I don't think there's any restriction on subfolders in the "inst" folder of
a package so if you have scripts you can arrange them there.

Given that most of my students seem to keep all their 23,420 files in one
folder called "Stuff" I think we can manage like this for a bit longer.

B



On Tue, Mar 28, 2023 at 4:43 PM Antoine Fabri 
wrote:



Dear R-devel,

Packages don't allow for subfolders in R with a couple exceptions. We find
in "Writing R extensions" :


The R and man subdirectories may contain OS-specific subdirectories named

unix or windows.

This is something I've seen discussed outside of the mailing list numerous
times, and thanks to this SO question

https://stackoverflow.com/questions/14902199/using-source-subdirectories-within-r-packages-with-roxygen2
I could find a couple instances where this was discussed here as well,
apologies if I missed later discussions :

* https://stat.ethz.ch/pipermail/r-devel/2009-December/056022.html
* https://stat.ethz.ch/pipermail/r-devel/2010-February/056513.html

I don't see a very compelling conclusion, nor a justification for the
behavior, and I see that it makes some users snarky (second link is an
example), so let me make a case.

This limitation is an annoyance for bigger projects where we must choose
between having fewer files with too many objects defined (less structure,
more scrolling), or to have too many scripts, often with long prefixed
names to emulate essentially what folders would do. In my experience this
creates confusion, slows down the workflow, makes onboarding or open source
contributions on a new project harder (where do we start ?), makes dead
code easier to happen, makes it harder to test the rights things etc...

It would seem to me, but I might be naive, that it'd be a quick enough fix
to flatten the R folders not named "unix" or "windows"  when building the
package. Is there a good reason why we can't do that ?

Thanks,

Antoine

PS:
Other SO Q&As:
https://stackoverflow.com/questions/33776643/subdirectory-in-r-package

https://stackoverflow.com/questions/18584807/code-organisation-in-r-package-development

 [[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Query: Could documentation include modernized references?

2023-03-26 Thread Duncan Murdoch

On 26/03/2023 11:54 a.m., J C Nash wrote:

A tangential email discussion with Simon U. has highlighted a long-standing
matter that some tools in the base R distribution are outdated, but that
so many examples and other tools may use them that they cannot be deprecated.

The examples that I am most familiar with concern optimization and nonlinear
least squares, but other workers will surely be able to suggest cases elsewhere.
I was the source (in Pascal) of Nelder-Mead, BFGS and CG algorithms in optim().
BFGS is still mostly competitive, and Nelder-Mead is useful for initial 
exploration
of an optimization problem, but CG was never very good, right from the mid-1970s
well before it was interfaced to R. By contrast Rcgmin works rather well
considering how similar it is in nature to CG. Yet I continue to see use and
even recommendations of these tools in inappropriate circumstances.

Given that it would break too many other packages and examples to drop the
existing tools, should we at least add short notes in the man (.Rd) pages?
I'm thinking of something like

 optim() has methods that are dated. Users are urged to consider suggestions
 from ...

and point to references and/or an appropriate Task View, which could, of course,
be in the references.

I have no idea what steps are needed to make such edits to the man pages. Would
R-core need to be directly involved, or could one or two trusted R developers
be given privileges to seek advice on and implement such modest documentation
additions?  FWIW, I'm willing to participate in such an effort, which I believe
would help users to use appropriate and up-to-date tools.


I can answer your final paragraph:

Currently R-core would need to be directly involved, in that they are 
the only ones with write permission on the R sources.


However, they don't need to do the work, they just need to approve of it 
and commit it.  So I would suggest one way forward is the following:


- You fork one of the mirrors of the R sources from Github, and (perhaps 
with help from others) edit one or two of the pages in the way you're 
describing.  Once you think they are ready, make them available online 
for others to review (Github or Gitlab would help doing this), and then 
submit the changes as a patch against the svn sources on the R Bugzilla 
site.


- Another way could be that you copy the help page sources to a dummy 
package, instead of checking out the whole of the R sources.  You'll 
need to be careful not to miss other changes to the originals between 
the time you make your copy and the time you submit the patches.


Don't do too many pages, because you're probably going to have to work 
out the details of the workflow as you go, and earn R Core's trust by 
submitting good changes and responding to their requests.  And maybe 
don't do any until you hear from a member of R Core that they're willing 
to participate in this, because they certainly don't accept all suggestions.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] WISH: Optional mechanism preventing var <<- value from assigning non-existing variable

2023-03-19 Thread Duncan Murdoch

On 19/03/2023 2:43 p.m., Gabriel Becker wrote:
I have to say <<- is a core debugging tool when assigning into the 
global environment. I suppose I could use assign but that would be 
somewhat annoying.


That said I'm still for this change, the vast overwhelming number of 
times that <<- is in my package code - already rare but it does happen - 
it would absolutely be a bug (typo most likely) for it to get to the 
global environment and assign into it. Assigning into the global 
environment from package code is a serious anti-pattern anyway.


To be honest, from the developer perspective what I'd personally actually 
want is an assigner that is willing to go up exactly one frame from the 
current one to find its binding. That is how I essentially always use 
<<- myself.


This sounds like a linter would be appropriate:  any time you make an 
assignment that goes more than one level up, it warns you about it.


Other linter rules could limit the destination in other ways, e.g. 
assigning to globalenv() or things in the search list could be disallowed.


Another error I've made a few times is to use "<-" by mistake when "<<-" 
was intended.  A linter could detect this by seeing both `x <- value1` 
and `x <<- value2` in the same context.  That's legal, but (for me at 
least) it usually indicates that one of them is a typo.
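
A sketch of such a check by walking the parsed code (no srcref
bookkeeping, so it only names the suspects):

    collectAssigns <- function(e, env = new.env()) {
        if (is.call(e)) {
            op <- e[[1L]]
            if ((identical(op, quote(`<-`)) || identical(op, quote(`<<-`))) &&
                is.name(e[[2L]])) {
                key <- if (identical(op, quote(`<-`))) "local" else "super"
                env[[key]] <- c(get0(key, envir = env, inherits = FALSE,
                                     ifnotfound = character()),
                                as.character(e[[2L]]))
            }
            for (a in as.list(e)[-1L]) collectAssigns(a, env)
        }
        env
    }
    found <- collectAssigns(quote({ x <- 1; f <- function() x <<- 2 }))
    intersect(found$local, found$super)   # "x": both forms used, flag it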


Duncan Murdoch



~G

On Sun, Mar 19, 2023, 11:16 AM Bill Dunlap <mailto:williamwdun...@gmail.com>> wrote:


Why should it make an exception for cases where the
about-to-be-assigned-to
name is present in the global environment?  I think it should warn
or give
an error if the altered variable is in any environment on the search
list.

-Bill

On Sun, Mar 19, 2023 at 10:54 AM Duncan Murdoch
mailto:murdoch.dun...@gmail.com>>
wrote:

 > I think that should be the default behaviour. It's pretty late to get
 > that into R 4.3.0, but I think your proposal (with
check.superassignment
 > = FALSE being the default) could make it in, and 4.4.0 could
change the
 > default to TRUE.
 >
 > Duncan
 >
 >
 >
 > On 19/03/2023 12:08 p.m., Henrik Bengtsson wrote:
 > > I'd like to be able to prevent the <<- assignment operator ("super
 > > assignment") from assigning to the global environment unless the
 > > variable already exists and is not locked.  If it does not
exist or is
 > > locked, I'd like an error to be produced.  This would allow me to
 > > evaluate expressions with this temporarily set to protect against
 > > mistakes.
 > >
 > > For example, I'd like to do something like:
 > >
 > > $ R --vanilla
 > >> exists("a")
 > > [1] FALSE
 > >
 > >> options(check.superassignment = TRUE)
 > >> local({ a <<- 1 })
 > > Error: object 'a' not found
 > >
 > >> a <- 0
 > >> local({ a <<- 1 })
 > >> a
 > > [1] 1
 > >
 > >> rm("a")
 > >> options(check.superassignment = FALSE)
 > >> local({ a <<- 1 })
 > >> exists("a")
 > > [1] TRUE
 > >
 > >
 > > BACKGROUND:
 > >
 > >  From help("<<-") we have:
 > >
 > > "The operators <<- and ->> are normally only used in functions, and
 > > cause a search to be made through parent environments for an
existing
 > > definition of the variable being assigned. If such a variable
is found
 > > (and its binding is not locked) then its value is redefined,
otherwise
 > > assignment takes place in the global environment."
 > >
 > > I argue that it's unfortunate that <<- fallbacks back to
assigning to
 > > the global environment if the variable does not already exist.
 > > Unfortunately, it has become a "go to" solution for many to use it
 > > that way.  Sometimes it is intended, sometimes it's a mistake.  We
 > > find it also in R packages on CRAN, even if 'R CMD check' tries to
 > > detect when it happens (but it's limited to do so from run-time
 > > examples and tests).
 > >
 > > It's probably too widely used for us to change to a more strict
 > > behavior permanent.  The proposed R option allows me, as a
developer,
 > > to evaluate an R expression with the strict behavior,
especially if I
 > > don't trust the code.
 > >
 >

Re: [Rd] WISH: Optional mechanism preventing var <<- value from assigning non-existing variable

2023-03-19 Thread Duncan Murdoch

On 19/03/2023 2:15 p.m., Bill Dunlap wrote:
Why should it make an exception for cases where the 
about-to-be-assigned-to name is present in the global environment?  I 
think it should warn or give an error if the altered variable is in any 
environment on the search list.


I'd say code like this should work:

  x <- NULL
  f <- function() x <<- 123
  f()

and then x should be changed to 123 unless the binding to x is locked. 
I don't see why it should matter if this code is in local() or in a 
function, or if it is run at the top level.


For most things on the search list, the binding would be locked, but we 
do allow people to attach environments, and then they'd be on the search 
list, so this code should work too:


  g <- function() {

attach(environment())

x <- NULL
f <- function() x <<- 123
f()

  }

What shouldn't work would be something like

  mean <<- 3

but it already doesn't work (contrary to the documentation), giving

  Error: cannot change value of locked binding for 'mean'

(which makes sense; what if I locked a binding in the global 
environment?  Then we'd go to the fallback, but the fallback can't work, 
because the binding is already there but locked...)


Duncan Murdoch



-Bill

On Sun, Mar 19, 2023 at 10:54 AM Duncan Murdoch 
mailto:murdoch.dun...@gmail.com>> wrote:


I think that should be the default behaviour. It's pretty late to get
that into R 4.3.0, but I think your proposal (with
check.superassignment
= FALSE being the default) could make it in, and 4.4.0 could change the
default to TRUE.

Duncan



On 19/03/2023 12:08 p.m., Henrik Bengtsson wrote:
 > I'd like to be able to prevent the <<- assignment operator ("super
 > assignment") from assigning to the global environment unless the
 > variable already exists and is not locked.  If it does not exist
or is
 > locked, I'd like an error to be produced.  This would allow me to
 > evaluate expressions with this temporarily set to protect against
 > mistakes.
 >
 > For example, I'd like to do something like:
 >
 > $ R --vanilla
 >> exists("a")
 > [1] FALSE
 >
 >> options(check.superassignment = TRUE)
 >> local({ a <<- 1 })
 > Error: object 'a' not found
 >
 >> a <- 0
 >> local({ a <<- 1 })
 >> a
 > [1] 1
 >
 >> rm("a")
 >> options(check.superassignment = FALSE)
 >> local({ a <<- 1 })
 >> exists("a")
 > [1] TRUE
 >
 >
 > BACKGROUND:
 >
 >  From help("<<-") we have:
 >
 > "The operators <<- and ->> are normally only used in functions, and
 > cause a search to be made through parent environments for an existing
 > definition of the variable being assigned. If such a variable is
found
 > (and its binding is not locked) then its value is redefined,
otherwise
 > assignment takes place in the global environment."
 >
 > I argue that it's unfortunate that <<- fallbacks back to assigning to
 > the global environment if the variable does not already exist.
 > Unfortunately, it has become a "go to" solution for many to use it
 > that way.  Sometimes it is intended, sometimes it's a mistake.  We
 > find it also in R packages on CRAN, even if 'R CMD check' tries to
 > detect when it happens (but it's limited to do so from run-time
 > examples and tests).
 >
 > It's probably too widely used for us to change to a more strict
 > behavior permanent.  The proposed R option allows me, as a developer,
 > to evaluate an R expression with the strict behavior, especially if I
 > don't trust the code.
 >
 > With 'check.superassignment = TRUE' set, a developer would have to
 > first declare the variable in the global environment for <<- to
assign
 > there.  This would remove the fallback "If such a variable is found
 > (and its binding is not locked) then its value is redefined,
otherwise
 > assignment takes place in the global environment" in the current
 > design.  For those who truly intends to assign to the global, could
 > use assign(var, value, envir = globalenv()) or globalenv()[[var]] <-
 > value.
 >
 > 'R CMD check' could temporarily set 'check.superassignment = TRUE'
 > during checks.  If we let environment variable
 > 'R_CHECK_SUPERASSIGNMENT' set the default value of option
 >

Re: [Rd] WISH: Optional mechanism preventing var <<- value from assigning non-existing variable

2023-03-19 Thread Duncan Murdoch
I think that should be the default behaviour. It's pretty late to get 
that into R 4.3.0, but I think your proposal (with check.superassignment 
= FALSE being the default) could make it in, and 4.4.0 could change the 
default to TRUE.


Duncan



On 19/03/2023 12:08 p.m., Henrik Bengtsson wrote:

I'd like to be able to prevent the <<- assignment operator ("super
assignment") from assigning to the global environment unless the
variable already exists and is not locked.  If it does not exist or is
locked, I'd like an error to be produced.  This would allow me to
evaluate expressions with this temporarily set to protect against
mistakes.

For example, I'd like to do something like:

$ R --vanilla

exists("a")

[1] FALSE


options(check.superassignment = TRUE)
local({ a <<- 1 })

Error: object 'a' not found


a <- 0
local({ a <<- 1 })
a

[1] 1


rm("a")
options(check.superassignment = FALSE)
local({ a <<- 1 })
exists("a")

[1] TRUE


BACKGROUND:

 From help("<<-") we have:

"The operators <<- and ->> are normally only used in functions, and
cause a search to be made through parent environments for an existing
definition of the variable being assigned. If such a variable is found
(and its binding is not locked) then its value is redefined, otherwise
assignment takes place in the global environment."

I argue that it's unfortunate that <<- fallbacks back to assigning to
the global environment if the variable does not already exist.
Unfortunately, it has become a "go to" solution for many to use it
that way.  Sometimes it is intended, sometimes it's a mistake.  We
find it also in R packages on CRAN, even if 'R CMD check' tries to
detect when it happens (but it's limited to do so from run-time
examples and tests).

It's probably too widely used for us to change to a more strict
behavior permanent.  The proposed R option allows me, as a developer,
to evaluate an R expression with the strict behavior, especially if I
don't trust the code.

With 'check.superassignment = TRUE' set, a developer would have to
first declare the variable in the global environment for <<- to assign
there.  This would remove the fallback "If such a variable is found
(and its binding is not locked) then its value is redefined, otherwise
assignment takes place in the global environment" in the current
design.  For those who truly intends to assign to the global, could
use assign(var, value, envir = globalenv()) or globalenv()[[var]] <-
value.

'R CMD check' could temporarily set 'check.superassignment = TRUE'
during checks.  If we let environment variable
'R_CHECK_SUPERASSIGNMENT' set the default value of option
'check.superassignment' on R startup, it would be possible to check
packages optionally this way, but also to run any "non-trusted" R
script in the "strict" mode.
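
(As an aside, one can crudely approximate the strict behavior today by
checking, after the fact, whether an expression created new globals. The
helper below is only an illustration of the idea -- the name and details
are made up, and it detects rather than prevents:)

with_strict_globals <- function(expr) {
  before <- ls(globalenv())
  result <- eval(expr, envir = new.env(parent = globalenv()))
  created <- setdiff(ls(globalenv()), before)
  if (length(created)) {
    rm(list = created, envir = globalenv())  # undo the stray assignment
    stop("superassignment created global variable(s): ",
         paste(created, collapse = ", "))
  }
  result
}

with_strict_globals(quote({ a <<- 1 }))  # errors: 'a' was never declared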


TEASER:

Here's an example why using <<- for assigning to the global
environment is a bad idea:

This works:

$ R --vanilla

y <- lapply(1:3, function(x) { if (x > 2) keep <<- x; x^2 })
keep
[1] 3



This doesn't work:

$ R --vanilla

library(purrr)
y <- lapply(1:3, function(x) { if (x > 2) keep <<- x; x^2 })

Error in keep <<- x : cannot change value of locked binding for 'keep'

(The reason: with no 'keep' in the global environment, <<- continues its
search up through the attached packages and finds purrr's exported 'keep',
whose binding is locked.)


But, if we "declare" the variable first, it works:

$ R --vanilla

library(purrr)
keep <- 0
y <- lapply(1:3, function(x) { if (x > 2) keep <<- x; x^2 })
keep
[1] 3


/Henrik

PS. Does the <<- operator have an official name? Hadley calls it
"super assignment" in 'Advanced R'
(https://adv-r.hadley.nz/environments.html), which is where I got it
from.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




Re: [Rd] Multiple Assignment built into the R Interpreter?

2023-03-14 Thread Duncan Murdoch

On 13/03/2023 6:01 a.m., Duncan Murdoch wrote:

Yes, this is really a problem with the checks, not with the language.

A simpler approach than your alternativeAssignment function would be
simply to allow globalVariables() to be limited to a single function as
the note in its help page says.


I just took a look, and this would be quite easy to do.  It would 
require changes to codetools and to utils, but probably just a few dozen 
lines.
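
For concreteness, a sketch of what the call site might look like. Only
the package-wide form exists today; the per-function argument shown in
the comment is hypothetical, just the shape of the change being discussed:

# existing, package-wide declaration (real API):
utils::globalVariables(c("nr", "nc"), package = "mypkg")

# hypothetical per-function variant:
# utils::globalVariables(c("nr", "nc"), package = "mypkg",
#                        funcs = "my_destructuring_helper")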


Duncan Murdoch



This might be tedious to write by hand, but could be automated using
methods like "dotify" in dotty.

Duncan Murdoch


On 12/03/2023 10:36 p.m., Pavel Krivitsky wrote:

Dear All,

As a maintainer of large, complex packages, I can think of many places
in which deconstructing assignment would simplify the code, as well as
facilitate readability by breaking up larger functions into helpers, so
I would be very glad to see this incorporated somehow.

I think the crux of the matter is that while there is a number of ways
to implement deconstructing assignment within R, there is no mechanism
to tell R CMD check about it without also suppressing checks for every
other instance of that variable name. This is particularly problematic
because those variable names are likely to be used elsewhere in the
package.

Workarounds that have been suggested all defeat the conciseness and
clarity of the deconstructing assignment and introduce potential for
subtle bugs.

The check warnings are something that can only be addressed in
'codetools', with a finer API than what utils::globalVariables()
provides. Perhaps this would have a lower hurdle than modifying R
language itself?

  From skimming through the relevant 'codetools' code, one idea for such
an API would be a function, along the lines of

utils::alternativeAssignment(op, assigned)

that sets up a callback assigned = function(op, e) that given the
operator (as string) and the expression it's embedded in, returns a
list of three elements:
   * a character vector containing a list of variables assigned to that
 might not otherwise be detected
   * a character vector containing a list of variables referenced that
 might not otherwise be detected
   * expression e with potentially "offending" elements removed, which
 will then be processed by the rest of the checking code

Then, say, 'zeallot' could implement zeallot::zeallot_assign_detect(),
and a package developer using it could put

utils::alternativeAssignment("%<-%", zeallot::zeallot_assign_detect)

in their .onLoad() function. Similarly, users of 'dotty' could set up
callbacks for all standard assignment operators to inform the code
about the nonstandard assignment.
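
To make the shape concrete, a hypothetical sketch of such a callback --
none of these functions exist today; the body just follows the
three-element contract described above:

zeallot_assign_detect <- function(op, e) {
  # 'e' is the unevaluated call, e.g. quote(c(a, b) %<-% f())
  list(
    assigned   = all.vars(e[[2]]),  # variables bound by the assignment
    referenced = all.vars(e[[3]]),  # variables used on the right-hand side
    expression = e[[3]]             # what the remaining checks should see
  )
}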

Best Regards,
Pavel

On Sun, 2023-03-12 at 14:05 +0200, Sebastian Martin Krantz wrote:

Kevin's package is very nice as a proof of concept, no doubt about
that, but
it is not at the level of performance or convenience that a native R
implementation would offer. I would probably not use it to translate
Matlab
routines into R packages placed on CRAN, because it’s an additional
dependency, I have a performance burden in every iteration, and
utils::globalVariables() is anything but elegant. From that
perspective
it would be more convenient for me right now to stick with
collapse::%=%,
which is already written in C, and also call
utils::globalVariables().

But again my hope in starting this was that R Core might see that the
addition of multiple assignment would be a significant enhancement to
the
language, of the same order as the base pipe |> in my opinion.

I think the discussion so far has at least brought forth a way to
implement
this in a way that does not violate fundamental principles of the
language.
Which could form a basis for thinking about an actual addition to the
language.

Best regards,

Sebastian


On Sun 12. Mar 2023 at 13:18, Duncan Murdoch

wrote:


On 12/03/2023 6:07 a.m., Sebastian Martin Krantz wrote:

Thinking more about this, and seeing Kevin's examples at
https://github.com/kevinushey/dotty, I think this is the most
R-like
way of doing it,
with an additional benefit as it would allow to introduce the
useful
data.table semantics DT[, .(a = b, c, d)] to more general R. So I
would
propose to
introduce a new primitive function . <- function(...)
.Primitive(".") in
R with an assignment method and the following features:


I think that proposal is very unlikely to be accepted.  If it was a
primitive function, it could only be maintained by R Core.  They
are
justifiably very reluctant to take on extra work for themselves.

Kevin's package demonstrates that this can be done entirely in a
contributed package, which means there's no need for R Core to be
involved.  I don't know if he has plans to turn his prototype into
a
CRAN package.  If he doesn't, then it will be up to some other
interested maintainer to step up and

Re: [Rd] Multiple Assignment built into the R Interpreter?

2023-03-13 Thread Duncan Murdoch

Yes, this is really a problem with the checks, not with the language.

A simpler approach than your alternativeAssignment function would be 
simply to allow globalVariables() to be limited to a single function as 
the note in its help page says.


This might be tedious to write by hand, but could be automated using 
methods like "dotify" in dotty.


Duncan Murdoch


On 12/03/2023 10:36 p.m., Pavel Krivitsky wrote:

Dear All,

As a maintainer of large, complex packages, I can think of many places
in which deconstructing assignment would simplify the code, as well as
facilitate readability by breaking up larger functions into helpers, so
I would be very glad to see this incorporated somehow.

I think the crux of the matter is that while there is a number of ways
to implement deconstructing assignment within R, there is no mechanism
to tell R CMD check about it without also suppressing checks for every
other instance of that variable name. This is particularly problematic
because those variable names are likely to be used elsewhere in the
package.

Workarounds that have been suggested all defeat the conciseness and
clarity of the deconstructing assignment and introduce potential for
subtle bugs.

The check warnings are something that can only be addressed in
'codetools', with a finer API than what utils::globalVariables()
provides. Perhaps this would have a lower hurdle than modifying R
language itself?

 From skimming through the relevant 'codetools' code, one idea for such
an API would be a function, along the lines of

utils::alternativeAssignment(op, assigned)

that sets up a callback assigned = function(op, e) that given the
operator (as string) and the expression it's embedded in, returns a
list of three elements:
  * a character vector containing a list of variables assigned to that
might not otherwise be detected
  * a character vector containing a list of variables referenced that
might not otherwise be detected
  * expression e with potentially "offending" elements removed, which
will then be processed by the rest of the checking code

Then, say, 'zeallot' could implement zeallot::zeallot_assign_detect(),
and a package developer using it could put

utils::alternativeAssignment("%<-%", zeallot::zeallot_assign_detect)

in their .onLoad() function. Similarly, users of 'dotty' could set up
callbacks for all standard assignment operators to inform the code
about the nonstandard assignment.

Best Regards,
Pavel

On Sun, 2023-03-12 at 14:05 +0200, Sebastian Martin Krantz wrote:

Kevin's package is very nice as a proof of concept, no doubt about
that, but
it is not at the level of performance or convenience that a native R
implementation would offer. I would probably not use it to translate
Matlab
routines into R packages placed on CRAN, because it’s an additional
dependency, I have a performance burden in every iteration, and
utils::globalVariables() is anything but elegant. From that
perspective
it would be more convenient for me right now to stick with
collapse::%=%,
which is already written in C, and also call
utils::globalVariables().

But again my hope in starting this was that R Core might see that the
addition of multiple assignment would be a significant enhancement to
the
language, of the same order as the base pipe |> in my opinion.

I think the discussion so far has at least brought forth a way to
implement
this in a way that does not violate fundamental principles of the
language.
Which could form a basis for thinking about an actual addition to the
language.

Best regards,

Sebastian


On Sun 12. Mar 2023 at 13:18, Duncan Murdoch

wrote:


On 12/03/2023 6:07 a.m., Sebastian Martin Krantz wrote:

Thinking more about this, and seeing Kevin's examples at
https://github.com/kevinushey/dotty, I think this is the most
R-like
way of doing it,
with an additional benefit as it would allow to introduce the
useful
data.table semantics DT[, .(a = b, c, d)] to more general R. So I
would
propose to
introduce a new primitive function . <- function(...)
.Primitive(".") in
R with an assignment method and the following features:


I think that proposal is very unlikely to be accepted.  If it was a
primitive function, it could only be maintained by R Core.  They
are
justifiably very reluctant to take on extra work for themselves.

Kevin's package demonstrates that this can be done entirely in a
contributed package, which means there's no need for R Core to be
involved.  I don't know if he has plans to turn his prototype into
a
CRAN package.  If he doesn't, then it will be up to some other
interested maintainer to step up and take on the task, or it will
just
fade away.

I haven't checked whether your proposals below represent changes
from
the current version of dotty, but if they do, the way to proceed is
to
fork that project, implement 

Re: [Rd] Multiple Assignment built into the R Interpreter?

2023-03-12 Thread Duncan Murdoch

On 12/03/2023 6:07 a.m., Sebastian Martin Krantz wrote:
Thinking more about this, and seeing Kevin's examples at
https://github.com/kevinushey/dotty, I think this is the most R-like
way of doing it,
with an additional benefit as it would allow to introduce the useful 
data.table semantics DT[, .(a = b, c, d)] to more general R. So I would 
propose to
introduce a new primitive function . <- function(...) .Primitive(".") in 
R with an assignment method and the following features:


I think that proposal is very unlikely to be accepted.  If it was a 
primitive function, it could only be maintained by R Core.  They are 
justifiably very reluctant to take on extra work for themselves.


Kevin's package demonstrates that this can be done entirely in a 
contributed package, which means there's no need for R Core to be 
involved.  I don't know if he has plans to turn his prototype into a 
CRAN package.  If he doesn't, then it will be up to some other 
interested maintainer to step up and take on the task, or it will just 
fade away.


I haven't checked whether your proposals below represent changes from 
the current version of dotty, but if they do, the way to proceed is to 
fork that project, implement your changes, and offer to contribute them 
back to the main branch.


Duncan Murdoch





  * Positional assignment e.g. .[nr, nc] <- dim(x), and named assignment
e.g. .[new = carb] <- mtcars or .[new = log(carb)] <- mtcars. All
the functionality proposed by Kevin at
https://github.com/kevinushey/dotty is useful, unambiguous and
feasible.
  * Silent dropping of RHS values e.g. .[mpg_new, cyl_new] <- mtcars.
  * Mixing of positional and named assignment e.g .[mpg_new, carb_new =
carb, cyl_new] <- mtcars. The inputs not assigned by name are simply
the elements of RHS in the order they occur, regardless of whether
they have been used previously e.g. .[mpg_new, cyl_new = cyl,
log_cyl = log(cyl), cyl_new2] <- mtcars is feasible. RHS here could
be any named vector type.
  * Conventional use of the function as lazy version of of list(), as in
data.table: .(A = B, C, D) is the same as list(A = B, C = C, D = D).
This would also be useful, allowing more parsimonious code, and
avoid the need to assign names to all return values in a function
return, e.g. if I already have matrices A, C, Q and R as internal
objects in my function, I can simply end by return(.(A, C, Q, R))
instead of return(list(A = A, C = C, Q = Q, R = R)) if I wanted the
list to be named with the object names.

The implementation of this in R and C should be pretty straightforward. 
It would just require a modification to R CMD check to recognize .[<- as
assignment.


Best regards,

Sebastian

On Sun, 12 Mar 2023 at 09:42, Sebastian Martin Krantz wrote:


Thanks Gabriel and Kevin for your inputs,

regarding your points Gabriel, I think Python and Julia do allow
multiple sub-assignment, but in line with my earlier suggestion in
response to Duncan to make multiple assignment an environment-level
operation (like collapse::%=% currently works), this would not be
possible in R.

Regarding the [a] <- coolest_function() syntax, yes, it would mean
doing multiple assignment and setting a equal to the first element,
dropping all other elements. Multiple assignment should be positional
like in other languages, enabling flexible renaming of objects on the fly.
So it should be irrelevant whether the function returns a named or
unnamed list or vector.

Thanks also Kevin for this contribution. I think it’s a remarkable
effort, and I wouldn’t mind such semantics, e.g. making it a function
call to ‘.[‘ or any other one-letter function, as long as it’s coded
in C and recognized by the interpreter as an assignment operation.

Best regards,

Sebastian





On Sun 12. Mar 2023 at 01:00, Kevin Ushey wrote:

FWIW, it's possible to get fairly close to your proposed semantics
using the existing metaprogramming facilities in R. I put together a
prototype package here to demonstrate:

https://github.com/kevinushey/dotty

The package exports an object called `.`, with a special
`[<-.dot` S3
method which enables destructuring assignments. This means you can
write code like:

     .[nr, nc] <- dim(mtcars)

and that will define 'nr' and 'nc' as you expect.

As for R CMD check warnings, you can suppress those through the
use of
globalVariables(), and that can also be automated within the
package.
T

Re: [Rd] Multiple Assignment built into the R Interpreter?

2023-03-12 Thread Duncan Murdoch

I really like it!  Nicely done.

Duncan Murdoch


On 11/03/2023 6:00 p.m., Kevin Ushey wrote:

FWIW, it's possible to get fairly close to your proposed semantics
using the existing metaprogramming facilities in R. I put together a
prototype package here to demonstrate:

 https://github.com/kevinushey/dotty

The package exports an object called `.`, with a special `[<-.dot` S3
method which enables destructuring assignments. This means you can
write code like:

 .[nr, nc] <- dim(mtcars)

and that will define 'nr' and 'nc' as you expect.
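
For the curious, a stripped-down sketch of how such a method can work.
This is not the dotty source -- just an illustration, and it handles
plain positional symbols only:

. <- structure(list(), class = "dot")

`[<-.dot` <- function(x, ..., value) {
  targets <- as.list(substitute(list(...)))[-1]  # unevaluated index symbols
  env <- parent.frame()                          # the caller's environment
  for (i in seq_along(targets))
    assign(deparse(targets[[i]]), value[[i]], envir = env)
  x  # hand `.` back unchanged, so the enclosing assignment is a no-op
}

.[nr, nc] <- dim(mtcars)
c(nr, nc)  # [1] 32 11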

As for R CMD check warnings, you can suppress those through the use of
globalVariables(), and that can also be automated within the package.
The 'dotty' package includes a function 'dotify()' which automates
looking for such usages in your package, and calling globalVariables()
so that R CMD check doesn't warn. In theory, a similar technique would
be applicable to other packages defining similar operators (zeallot,
collapse).

Obviously, globalVariables() is a very heavy hammer to swing for this
issue, but you might consider the benefits worth the tradeoffs.

Best,
Kevin

On Sat, Mar 11, 2023 at 2:53 PM Duncan Murdoch wrote:


On 11/03/2023 4:42 p.m., Sebastian Martin Krantz wrote:

Thanks Duncan and Ivan for the careful thoughts. I'm not sure I can
follow all aspects you raised, but to give my limited take on a few:


> your proposal violates a very basic property of the language, i.e. that all
> statements are expressions and have a value. What's the value of
> 1 + (A, C = init_matrices()).


I'm not sure I see the point here. I evaluated 1 + (d = dim(mtcars); nr
= d[1]; nc = d[2]; rm(d)), which simply gives a syntax error,



d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)

is not a statement, it is a sequence of 4 statements.

Duncan Murdoch

   as the

above expression should. `%=%` assigns to
environments, so 1 + (c("A", "C") %=% init_matrices()) returns
numeric(0), with A and C having their values assigned.


> suppose f() returns list(A = 1, B = 2) and I do B, A <- f()
> Should assignment be by position or by name?


In other languages this is by position. The feature is not meant to
replace list2env(), and being able to rename objects in the assignment
is a vital feature of codes
using multi input and output functions e.g. in Matlab or Julia.


Honestly, given that this is simply syntactic sugar, I don't think I would 
support it.


You can call it that, but it would be used by almost every R user almost
every day. Simple things like nr, nc = dim(x); values, vectors =
eigen(x) etc. where the creation of intermediate objects
is cumbersome and redundant.


I see you've already mentioned it ("JavaScript-like"). I think it would  fulfil 
Sebastian's requirements too, as long as it is considered "true assignment" by the rest 
of the language.


I don't have strong opinions about how the issue is phrased or
implemented. Something like [t, n] = dim(x) might even be more clear.
It's important though that assignment remains by position,
so even if some output gets thrown away that should also be positional.


> A <- 0
> [A, B = A + 10] <- list(1, A = 2)


I also fail to see the use of allowing this; something like this is an
error.


A = 2
(B = A + 1) <- 1

Error in (B = A + 1) <- 1 : could not find function "(<-"

Regarding the practical implementation, I think `collapse::%=%` is a
good starting point. It could be introduced in R as a separate function,
or `=` could be modified to accommodate its capability. It should be
clear that
with more than one LHS variables the assignment is an environment level
operation and the results can only be used in computations once assigned
to the environment, e.g. as in 1 + (c("A", "C") %=% init_matrices()),
A and C are not available for the addition in this statement. The
interpreter then needs to be modified to read something like nr, nc =
dim(x) or [nr, nc] = dim(x) as an environment-level multiple assignment
operation with no
immediate value. Appears very feasible to my limited understanding, but
I guess there are other things to consider still. Definitely appreciate
the responses so far though.
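
(For reference, the collapse semantics mentioned above look like this in
use; this assumes the collapse package is installed, and %=% takes a
character vector of names on its left-hand side:)

library(collapse)
c("nr", "nc") %=% dim(mtcars)
nr  # [1] 32
nc  # [1] 11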

Best regards,

Sebastian





On Sat, 11 Mar 2023 at 20:38, Duncan Murdoch wrote:

 On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
  > On Sat, 11 Mar 2023 11:11:06 -0500
  > Duncan Murdoch wrote:
  >
  >> That's clear, but your proposal violates a very basic property
 of the
  >> language, i.e. that all statements are expressions and have a value.
  >
  > How about reframing this feature request from multiple assignment
  > (which does go contrary to "everything has only one v

Re: [Rd] Multiple Assignment built into the R Interpreter?

2023-03-11 Thread Duncan Murdoch

On 11/03/2023 4:42 p.m., Sebastian Martin Krantz wrote:
Thanks Duncan and Ivan for the careful thoughts. I'm not sure I can 
follow all aspects you raised, but to give my limited take on a few:



> your proposal violates a very basic property of the language, i.e. that all
> statements are expressions and have a value. What's the value of
> 1 + (A, C = init_matrices()).


I'm not sure I see the point here. I evaluated 1 + (d = dim(mtcars); nr 
= d[1]; nc = d[2]; rm(d)), which simply gives a syntax error,



  d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)

is not a statement, it is a sequence of 4 statements.

Duncan Murdoch

 as the

above expression should. `%=%` assigns to
environments, so 1 + (c("A", "C") %=% init_matrices()) returns 
numeric(0), with A and C having their values assigned.



> suppose f() returns list(A = 1, B = 2) and I do B, A <- f()
> Should assignment be by position or by name?


In other languages this is by position. The feature is not meant to 
replace list2env(), and being able to rename objects in the assignment 
is a vital feature of codes

using multi input and output functions e.g. in Matlab or Julia.


Honestly, given that this is simply syntactic sugar, I don't think I would 
support it.


You can call it that, but it would be used by almost every R user almost 
every day. Simple things like nr, nc = dim(x); values, vectors = 
eigen(x) etc. where the creation of intermediate objects

is cumbersome and redundant.


I see you've already mentioned it ("JavaScript-like"). I think it would  fulfil 
Sebastian's requirements too, as long as it is considered "true assignment" by the rest 
of the language.


I don't have strong opinions about how the issue is phrased or 
implemented. Something like [t, n] = dim(x) might even be more clear. 
It's important though that assignment remains by position,

so even if some output gets thrown away that should also be positional.


> A <- 0
> [A, B = A + 10] <- list(1, A = 2)


I also fail to see the use of allowing this; something like this is an
error.



A = 2
(B = A + 1) <- 1

Error in (B = A + 1) <- 1 : could not find function "(<-"

Regarding the practical implementation, I think `collapse::%=%` is a 
good starting point. It could be introduced in R as a separate function, 
or `=` could be modified to accommodate its capability. It should be 
clear that
with more than one LHS variables the assignment is an environment level 
operation and the results can only be used in computations once assigned 
to the environment, e.g. as in 1 + (c("A", "C") %=% init_matrices()),
A and C are not available for the addition in this statement. The 
interpreter then needs to be modified to read something like nr, nc =
dim(x) or [nr, nc] = dim(x) as an environment-level multiple assignment
operation with no
immediate value. Appears very feasible to my limited understanding, but 
I guess there are other things to consider still. Definitely appreciate 
the responses so far though.


Best regards,

Sebastian





On Sat, 11 Mar 2023 at 20:38, Duncan Murdoch wrote:


On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
 > On Sat, 11 Mar 2023 11:11:06 -0500
 > Duncan Murdoch wrote:
 >
 >> That's clear, but your proposal violates a very basic property
of the
 >> language, i.e. that all statements are expressions and have a value.
 >
 > How about reframing this feature request from multiple assignment
 > (which does go contrary to "everything has only one value, even
if it's
 > sometimes invisible(NULL)") to "structured binding" / "destructuring
 > assignment" [*], which takes this single value returned by the
 > expression and subsets it subject to certain rules? It may be
easier to
 > make a decision on the semantics for destructuring assignment (e.g.
 > languages which have this feature typically allow throwing unneeded
 > parts of the return value away), and it doesn't seem to break as much
 > of the rest of the language if implemented.
 >
 > I see you've already mentioned it ("JavaScript-like"). I think it
would
 > fulfil Sebastian's requirements too, as long as it is considered
"true
 > assignment" by the rest of the language.
 >
 > The hard part is to propose the actual grammar of the new feature (in
 > terms of src/main/gram.y, preferably without introducing
conflicts) and
 > its semantics (including the corner cases, some of which you have
 > already mentioned). I'm not sure I'm up to the task.
 >

If I were doing it, here's wha
