Re: [Rd] numerical issue with t.test

2024-09-16 Thread Ben Bolker
  To be more specific, this replicates the computations that t.test is 
doing (stripped of all the different cases that stats:::t.test.default 
handles):


z <- err1 - err2                  # paired differences
se <- sqrt(var(z)/length(z))      # standard error of the mean difference
mz <- mean(z)
tstat <- mz/se
2*pt(tstat, df = length(z)-1, lower.tail = FALSE)  # two-sided p-value (tstat > 0 here)
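The root cause is visible if you inspect the paired differences directly. This is a minimal sketch using the err1/err2 vectors from Toby's post quoted below; the differences are all either exactly zero or a single unit in the last place (~2e-16), and all of the same sign, so the paired t-statistic is large even though the differences are numerically negligible:

```r
# err1/err2 reproduced from the post below
err1 <- c(-1.6076199373862132, -1.658521185520103, -1.6549424312339873,
          -1.5887767975086149, -1.634129577540383, -1.7442711937982249)
err2 <- c(-1.6076199373862132, -1.6585211855201032, -1.6549424312339875,
          -1.5887767975086149, -1.6341295775403832, -1.7442711937982252)
z <- err1 - err2
print(z)      # zeros and tiny positive rounding residues (~2e-16)
all(z >= 0)   # TRUE: the rounding error is systematically one-sided,
              # so mean(z)/se(z) is large despite z being negligible
```
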

On 2024-09-16 10:54 a.m., Michael Dewey wrote:

Dear Toby

I see no problem there. If you compute the mean and variance of err1 - 
err2, which is what the paired test is working on, then that might help 
you see what is going on.


Michael

On 16/09/2024 15:47, Toby Hocking wrote:

Hi! I expected that t.test should report a very large p-value (close
to 1), even when using paired=TRUE, for the data below (which are very
similar). However, I observe p-value = 0.02503 which indicates a
significant difference, even though there is none. Can this be fixed
please? This is with R-4.4.1. For reference below I use paired=FALSE
with the same data, and I get p-value = 1 as expected.

err1 = c(-1.6076199373862132, -1.658521185520103, 
-1.6549424312339873, -1.5887767975086149, -1.634129577540383, 
-1.7442711937982249)
err2 = c(-1.6076199373862132, -1.6585211855201032, 
-1.6549424312339875, -1.5887767975086149, -1.6341295775403832, 
-1.7442711937982252)

t.test(err1,err2,paired=TRUE)


 Paired t-test

data:  err1 and err2
t = 3.1623, df = 5, p-value = 0.02503
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
  2.769794e-17 2.683615e-16
sample estimates:
mean difference
    1.480297e-16


t.test(err1,err2,paired=FALSE)


 Welch Two Sample t-test

data:  err1 and err2
t = 0, df = 10, p-value = 1
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  -0.06988771  0.06988771
sample estimates:
mean of x mean of y
-1.648044 -1.648044

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
> E-mail is sent at my convenience; I don't expect replies outside of 
working hours.




Re: [Rd] specials and ::

2024-08-27 Thread Ben Bolker
  I don't see a big downside, but I will say that there's always a bit 
of a tradeoff between "train the users to do it right" (by writing clear 
documentation and informative error messages) and "make things easy for 
the user" (by making the code more complicated to handle things for them 
automatically).


   For example, part of me wishes that (1) there were only one way to 
provide a response variable for a binomial variable with N>1 (preferably 
by specifying proportions and a weights argument) and (2) grouping 
variables in lme4/nlme/et al always had to be specified as factors 
(rather than automatically being coerced to factors). Making those 
decisions would avoid so much code complexity ... (and eliminate one 
class of errors, i.e. people including a continuous covariate as a 
random-effect grouping variable because they think of 'random effect' 
and 'nuisance variable' as synonyms ...)


  But taking the "train the users to do it right" path does also 
involve more discussion with users ("if your software knows what I 
should be doing why can't it just do it for me?")


  cheers
   Ben Bolker

On 2024-08-27 9:43 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:

You are right of course, Peter, but I can see where some will get confused.   
In a formula
some symbols and functions are special operators, and others are simple 
functions.   That
is the reason one needs I(events/time) to put a rate in as a variable.    
Someone who
types 'offset' at the command line will see that there actually IS a function 
behind the
scenes.

Does anyone see a downside to Bill Dunlap's suggestion, where the first step of
my formula processing would be to "clean off" any survival:: modifiers?   That
is, is there something that will break?  After all, the code already has a lot
of  "if () "  lines for other common user errors.   I could view it as just
saving me the time to deal with the 'we found an error' emails.   I would
output the corrected version as the "call" component.

Terry

On 8/27/24 03:38, peter dalgaard wrote:

In my view, that's just plain wrong, because strata() is not a function but a 
special operator in a model formula. Wouldn't it also blow up on 
stats::offset()?

Oh, yes it would:


lm(y~x+offset(z))

Call:
lm(formula = y ~ x + offset(z))

Coefficients:
(Intercept)            x
     0.7350       0.0719


lm(y~x+stats::offset(z))

Call:
lm(formula = y ~ x + stats::offset(z))

Coefficients:
     (Intercept)                x  stats::offset(z)
          0.6457           0.1078            0.8521


Or, to be facetious:


lm(y~base::"+"(x,z))

Call:
lm(formula = y ~ base::"+"(x, z))

Coefficients:
  (Intercept)  base::"+"(x, z)
       0.4516           0.4383



-pd


On 26 Aug 2024, at 16:42 , Therneau, Terry M., Ph.D. via 
R-devel  wrote:

The survival package makes significant use of the "specials" argument of 
terms(), before
calling model.frame; it is part of nearly every modeling function. The reason 
is that strata arguments simply have to be handled differently from other 
things on the right-hand side. Likewise for tt() and cluster(), though those 
are much less frequent.

I now get "bug reports" from the growing segment that believes one should put
packagename:: in front of every single instance.   For instance
fit <- survival::survdiff( survival::Surv(time, status) ~ ph.karno +
survival::strata(inst),  data= survival::lung)

This fails to give the correct answer because it fools terms(formula, specials =
"strata").   I've stood firm in my response of "that's your bug, not mine", 
but I begin
to believe I am swimming uphill.   One person responded that it was company 
policy to
qualify everything.
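The failure mode can be reproduced with terms() alone. This is a minimal sketch (the variables y, x, and g are hypothetical placeholders): terms() matches specials by the bare function name, so a package-qualified call is silently not flagged as special.

```r
f1 <- y ~ x + strata(g)
f2 <- y ~ x + survival::strata(g)   # formula is symbolic; survival need not be attached

# The "specials" attribute records the position of the strata() term ...
attr(terms(f1, specials = "strata"), "specials")$strata  # position of the strata term

# ... but the pkg-qualified call is not recognized as a special at all:
attr(terms(f2, specials = "strata"), "specials")$strata  # NULL
```
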

I don't see an easy way to fix survival, and even if I did it would be a 
tremendous amount of work.   What are others' thoughts?

Terry



--

Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"





--
Dr. Benjamin Bolker
Professor, Mathematics & Stat

Re: [Rd] CRAN package submission

2024-08-26 Thread Ben Bolker
Try the foghorn package for checking the status of your submission in the
CRAN queue?

On Mon, Aug 26, 2024, 4:46 AM jing hua zhao  wrote:

> Dear CRAN / All,
>
> I appeared not to receive any email notification after uploading a package
> update (to furnish the confirmation) -- is the system down?
>
> Many thanks,
>
>
>
> Jing Hua
>
>



Re: [Rd] Strange Behavior in RNG

2024-08-17 Thread Ben Bolker
You could argue that the 'n' argument should be rounded rather than
truncated, but this form of coercion from float to integer is
common/standard (in C, for example). In any case, it's a long-standing part
of R and is very unlikely to be changed ...
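The truncation can be demonstrated directly; a short sketch of the behavior discussed in the thread below:

```r
n <- seq(0, 1, 0.2)[4]    # stored value is one ULP above 0.6
n - 0.6                   # ~1.1e-16: not exactly zero
1000 - 1000*n             # prints as 400, but is actually slightly below 400
trunc(1000 - 1000*n)      # 399: the float-to-integer coercion rnorm() applies
length(rnorm(round(1000 - 1000*n)))  # 400: round() explicitly instead of
                                     # relying on the implicit truncation
```
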

On Sat, Aug 17, 2024, 12:11 AM Jiefei Wang  wrote:

> Hi Rui and John,
>
> Thanks for your reply. I'm not sure if this is a question for R-help, as I
> think the behavior of the RNG is weird, but I will be happy to move this
> discussion if the admins think it is not on topic.
>
> I was a C/C++ developer, so I understand that double-type numbers can
> sometimes generate surprising results, but what is unexpected here is that
> even though the number is super close to 400, 'rnorm' still rounds it down
> to 399. Shouldn't it be rounded up in this case? Probably the underlying
> code just converts the number to an int type, but I was expecting the
> function to tolerate a certain degree of error. Maybe I have too high
> expectations for it...
>
> Best,
> Jiefei
>
>
>
> On Fri, Aug 16, 2024, 22:19 Rui Barradas  wrote:
>
> > Às 01:45 de 17/08/2024, Jiefei Wang escreveu:
> > > Hi,
> > >
> > > I just observed a strange behavior in R. The rnorm function does not
> > > give me a vector of the requested length. I think it is somehow related
> > > to the internal representation of double-type numbers, but I am not
> > > sure if this is supposed to happen. Below is a reproducible example
> > >
> > > ```
> > > ## Create a list, we will only take the fourth value, which is 0.6
> > > nList <- seq(0,1,0.2)
> > > n <- nList[4]
> > > n
> > > # [1] 0.6
> > > length(rnorm(1000*n))
> > > # [1] 600
> > > length(rnorm(1000-1000*n))
> > > # [1] 399 <--- What happened here?
> > > length(rnorm(1000-1000*0.6))
> > > # [1] 400
> > > 1000-1000*n
> > > # [1] 400 <- this looks good to me...
> > > 1000-1000*0.6
> > > # [1] 400
> > > identical(n, 0.6)
> > > # [1] FALSE
> > > .Internal(inspect(n))
> > > # @0x0217c75d79d0 14 REALSXP g0c1 [REF(1)] (len=1, tl=0) 0.6
> > > .Internal(inspect(0.6))
> > > # @0x0217c791e0c8 14 REALSXP g0c1 [REF(2)] (len=1, tl=0) 0.6
> > > ```
> > >
> > > As you can see, length(rnorm(1000-1000*n)) does not really give me the
> > > result I want. This is somewhat surprising because it is hard to
> > > imagine that a manually-typed 0.6 can behave differently from 0.6 taken
> > > from a sequence. Furthermore, 0.6 is the only problematic number in
> > > `nList`; the rest of the numbers work fine. I can guess it is due to the
> > > rounding mechanism, but I think this should be treated as a bug: if
> > > the print function can show the result of 1000-1000*n correctly, it
> > > will be strange that rnorm behaves differently. Below is my session
> > > info
> > >
> > > R version 4.3.0 (2023-04-21 ucrt)
> > > Platform: x86_64-w64-mingw32/x64 (64-bit)
> > > Running under: Windows 10 x64 (build 19045)
> > >
> > > Matrix products: default
> > >
> > > locale:
> > > [1] LC_COLLATE=English_United States.utf8
> > > [2] LC_CTYPE=English_United States.utf8
> > > [3] LC_MONETARY=English_United States.utf8
> > > [4] LC_NUMERIC=C
> > > [5] LC_TIME=English_United States.utf8
> > >
> > > time zone: America/Chicago
> > > tzcode source: internal
> > >
> > Hello,
> >
> > This is R FAQ 7.31.
> > In fact, the sequences
> >
> > seq(0, 1, 0.1)
> > seq(0, 1, 0.2)
> >
> > should probably be a FAQ 7.31 example.
> > If you print the numbers with more decimals you will see the reason for the error.
> >
> >
> >
> > # generate the list
> > nList <- seq(0,1,0.2)
> > # compare the list with manually typed numbers
> > nList != c(0, 0.2, 0.4, 0.6, 0.8, 1)
> > #> [1] FALSE FALSE FALSE  TRUE FALSE FALSE
> >
> > # note the value of 0.6
> > print(nList, digits = 16L)
> > #> [1] 0.0000000000000000 0.2000000000000000 0.4000000000000000 0.6000000000000001
> > #> [5] 0.8000000000000000 1.0000000000000000
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> >
> >
> >
>
>



Re: [Rd] changes in R-devel and zero-extent objects in Rcpp

2024-06-10 Thread Ben Bolker

  Thanks, that's very useful.

  AFAICT, in the problematic case we are doing some linear algebra with 
zero-column matrices that are mathematically well-defined (and whose 
base-R equivalents work correctly). It's maybe not surprising that 
Eigen/RcppEigen would do some weird stuff in this edge case.  I'll see 
if I can come up with some pure RcppEigen/Eigen examples to illustrate 
the problem ...
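For reference, the zero-extent linear algebra in question is fully supported in base R; a minimal sketch:

```r
# Zero-extent objects are legal and arithmetically well-defined in base R:
m0 <- matrix(numeric(0), nrow = 0, ncol = 0)
dim(m0 %*% m0)     # c(0, 0): matrix product of 0x0 matrices works

x0 <- matrix(numeric(0), nrow = 0, ncol = 3)
crossprod(x0)      # 3 x 3 matrix of zeros: t(x0) %*% x0 with zero rows
```
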


  cheers
   Ben



On 2024-06-10 10:12 a.m., Mikael Jagan wrote:


The ASan output is:

     > reference binding to misaligned address 0x0001 for type 
'const double', which requires 8 byte alignment


That there is a "reference" to 0x1 means that there really _is_ an attempt
to access memory there.  The stack trace provided by ASan tells you exactly
where it happens: line 100 of
RcppEigen/inst/include/Eigen/src/Core/products/GeneralMatrixMatrixTriangular.h:

    for(Index k2=0; k2<...

where 'rhs' is an object wrapping the pointer with a method getSubMapper(i, j)
for accessing the data like a matrix.  In the first loop iteration, you access
rhs[0]; there is no defensive test for 'rhs' of positive length.

So ASan _is_ revealing an illegal access, complaining only now (since r86629)
because _now_ the address that you access illegally is misaligned.


--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
> E-mail is sent at my convenience; I don't expect replies outside of 
working hours.




Re: [Rd] changes in R-devel and zero-extent objects in Rcpp

2024-06-08 Thread Ben Bolker
  The ASAN errors occur *even if the zero-length object is not actually 
accessed*/is used in a perfectly correct manner, i.e. it's perfectly 
legal in base R to define `m <- numeric(0)` or `m <- matrix(nrow = 0, 
ncol = 0)`, whereas doing the equivalent in Rcpp will (now) lead to an 
ASAN error.


  i.e., these are *not* previously cryptic out-of-bounds accesses that 
are now being revealed, but instead sensible and previously legal 
definitions of zero-length objects that are now causing problems.


   I'm pretty sure I'm right about this, but it's absolutely possible 
that I'm just confused at this point; I don't have a super-simple 
example to show you at the moment. The closest is this example by Mikael 
Jagan: https://github.com/lme4/lme4/issues/794#issuecomment-2155093049


  which shows that if x is a pointer to a zero-length vector (in plain 
C++ for R, no Rcpp is involved), DATAPTR(x) and REAL(x) evaluate to 
different values.


  Mikael further points out that "Rcpp seems to cast a (void *) 
returned by DATAPTR to (double *) when constructing a Vector 
from a SEXP, rather than using the (double *) returned by REAL." So 
perhaps R-core doesn't want to guarantee that these operations give 
identical answers, in which case Rcpp will have to change the way it 
does things ...


  cheers
   Ben



On 2024-06-08 6:39 p.m., Kevin Ushey wrote:

IMHO, this should be changed in both Rcpp and downstream packages:

1. Rcpp could check for out-of-bounds accesses in cases like these, and 
emit an R warning / error when such an access is detected;


2. The downstream packages unintentionally making these out-of-bounds 
accesses should be fixed to avoid doing that.


That is, I think this is ultimately a bug in the affected packages, but 
Rcpp could do better in detecting and handling this for client packages 
(avoiding a segfault).


Best,
Kevin


On Sat, Jun 8, 2024, 3:06 PM Ben Bolker  wrote:



     A change to R-devel (SVN r86629 or
https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250
has changed the handling of pointers to zero-length objects, leading to
ASAN issues with a number of Rcpp-based packages (the commit message
reads, in part, "Also define STRICT_TYPECHECK when compiling
inlined.c.")

    I'm interested in discussion from the community.

    Details/diagnosis for the issues in the lme4 package are here:
https://github.com/lme4/lme4/issues/794, with a bit more discussion
about how zero-length objects should be handled.

    The short(ish) version is that r86629 enables the
CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro

<https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104>,
which returns a trivial pointer (rather than the data pointer that would
be returned in the normal control flow) if an object has length 0:

/* Attempts to read or write elements of a zero length vector will
     result in a segfault, rather than read and write random memory.
     Returning NULL would be more natural, but Matrix seems to assume
     that even zero-length vectors have non-NULL data pointers, so
     return (void *) 1 instead. Zero-length CHARSXP objects still have a
     trailing zero byte so they are not handled. */

    In the Rcpp context this leads to an inconsistency, where `REAL(x)`
is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn
leads to ASAN warnings like

runtime error: reference binding to misaligned address 0x0001
for type 'const double', which requires 8 byte alignment
0x0001: note: pointer points here

     I'm in over my head and hoping for insight into whether this
problem
should be resolved by changing R, Rcpp, or downstream Rcpp packages ...

    cheers
     Ben Bolker




--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
> E-mail is sent at my convenience; I don't expect replies outside of 
working hours.




[Rd] changes in R-devel and zero-extent objects in Rcpp

2024-06-08 Thread Ben Bolker



   A change to R-devel (SVN r86629 or 
https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250 
has changed the handling of pointers to zero-length objects, leading to 
ASAN issues with a number of Rcpp-based packages (the commit message 
reads, in part, "Also define STRICT_TYPECHECK when compiling inlined.c.")


  I'm interested in discussion from the community.

  Details/diagnosis for the issues in the lme4 package are here: 
https://github.com/lme4/lme4/issues/794, with a bit more discussion 
about how zero-length objects should be handled.


  The short(ish) version is that r86629 enables the 
CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro 
<https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104>, 
which returns a trivial pointer (rather than the data pointer that would 
be returned in the normal control flow) if an object has length 0:


/* Attempts to read or write elements of a zero length vector will
   result in a segfault, rather than read and write random memory.
   Returning NULL would be more natural, but Matrix seems to assume
   that even zero-length vectors have non-NULL data pointers, so
   return (void *) 1 instead. Zero-length CHARSXP objects still have a
   trailing zero byte so they are not handled. */

  In the Rcpp context this leads to an inconsistency, where `REAL(x)` 
is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn 
leads to ASAN warnings like


runtime error: reference binding to misaligned address 0x0001 
for type 'const double', which requires 8 byte alignment

0x0001: note: pointer points here

   I'm in over my head and hoping for insight into whether this problem 
should be resolved by changing R, Rcpp, or downstream Rcpp packages ...


  cheers
   Ben Bolker



Re: [Rd] confint Attempts to Use All Server CPUs by Default

2024-05-22 Thread Ben Bolker
  Following up on this -- on my system, I have 69 packages installed 
that appear to provide something like a confint() method:


h <-  help.search("confint", agrep = FALSE)
p <- sort(unique(h$matches$Package))
length(p)
## [1] 69

p

 [1] "bamlss"          "bbmle"           "binom"           "brglm2"
 [5] "broom"           "caper"           "car"             "CDM"
 [9] "CLME"            "coin"            "crosstable"      "dclone"
[13] "doBy""drc" "Ecfun"   "emmeans"
[17] "epigrowthfit""evd" "Exact"   "fitode"
[21] "fixest"  "ggfortify"   "ggplot2" "GLMMadaptive"
[25] "glmmTMB" "gratia"  "hdm" "JMbayes"
[29] "JointAI" "lava""lme4""lmeresampler"
[33] "lmtest"  "logistf" "MASS""maxLik"
[37] "metafor" "mitml"   "MKmisc"  "mmrm"
[41] "mosaic"  "MplusAutomation" "multcomp""ordinal"
[45] "papaja"  "parsnip" "prodlim" "R2admb"
[49] "rethinking"  "riskRegression"  "RSA" "rstanarm"
[53] "rxode2"  "segmented"   "simr""sirt"
[57] "sn"  "spaMM"   "stats"   "stats4"
[61] "strucchange" "survey"  "survival""systemfit"
[65] "TMB" "unmarked""vegan"   "VGAM"
[69] "zipfR"

 If a confint() method were actually included in the package test 
suite, I would expect this kind of problem to be caught by CRAN checks 
(which look for code that is being greedy with parallelization). But 
it's perfectly possible that a package maintainer neglected to include 
such tests ...
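If the BLAS is suspected, its thread count can also be inspected and capped from within R. This is a sketch assuming the RhpcBLASctl package is available; not every BLAS honors these calls, and with OpenBLAS the OPENBLAS_NUM_THREADS variable must be set before the library initializes.

```r
# Cap BLAS threading from within the session; useful when a confint()
# method does heavy linear algebra and the BLAS defaults to all cores.
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  RhpcBLASctl::blas_get_num_procs()     # threads the BLAS is set up to use
  RhpcBLASctl::blas_set_num_threads(1)  # restrict to a single core
}
```
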



On 2024-05-21 6:00 a.m., Ivan Krylov via R-devel wrote:

On Tue, 21 May 2024 08:00:11 +,
Dario Strbenac via R-devel wrote:


Would a less resource-intensive value, such as 1, be a safer default
CPU value for confint?


Which confint() method do you have in mind? There is at least four of
them by default in R, and many additional classes could make use of
stats:::confint.default by implementing vcov().


Also, there is no mention of such parallel processing in ?confint, so
it was not clear at first where to look for performance degradation.
It could at least be described in the manual page so that users would
know that export OPENBLAS_NUM_THREADS=1 is a solution.


There isn't much R can do about the behaviour of the BLAS, because
there is no standard interface to set the number of threads. Some BLASes
(like ATLAS) don't even offer it as a tunable number at all [*].

A system administrator could link the installation of R against
FlexiBLAS [**], provide safe defaults in the environment variables and
educate the users about its tunables [***], but that's a choice just
like it had been a choice to link R against a parallel variant of
OpenBLAS on a shared computer. This is described in R Installation and
Administration, section A.3.1 [].





Re: [Rd] [Non-DoD Source] Re: R for the US Air Force

2024-05-16 Thread Ben Bolker

  For what it's worth

https://www.r-project.org/foundation/board.html

 says

The R Foundation is seated in Vienna, Austria and currently hosted by 
the Vienna University of Economics and Business. It is a registered 
association under Austrian law and active worldwide. The R Foundation 
can be contacted by e-mail to R-foundation at r-project.org



  The statutes here <https://www.r-project.org/foundation/> probably 
won't be useful, but you might be able to get the Austrian registration 
documents from someone at the R foundation (see e-mail above); it seems 
as though that might qualify for lists A and B in your documentation 
specs, and they might be able to provide you with an Austrian tax 
receipt for list C ...


  cheers
Ben Bolker

On 2024-05-16 4:03 p.m., ADAMS, DOUGLAS L CIV USAF AFMC OO-ALC/OBWA via 
R-devel wrote:

You described it well; I’m afraid that’s what it is.  They want documents of an 
organization that doesn’t fit the description, and only then will they test it.

  


That sounds great.  I’ll reach out to Jeremy and see if he can help me get the 
IDs the USAF wants.  It sounds like Joshua predates my time here.  That’s a 
really good idea.

  


Thanks very much!

  


Doug

  

  


From: Josiah Parry 
Sent: Thursday, May 16, 2024 1:05 PM
To: ADAMS, DOUGLAS L CIV USAF AFMC OO-ALC/OBWA 
Cc: R-devel@R-project.org
Subject: [Non-DoD Source] Re: [Rd] R for the US Air Force

  






Hey Doug,

  


R is not a product that is provided by a company or any vendor that can be 
procured through a vendor e.g. something on a GSA schedule.

  


Seems like you're caught in the bureaucracy hell hole. I used to help the USAF, 
and other DoD members use R when I was at RStudio (now Posit).

  


I recommend you find someone in your organization who is doing Data Science. 
They'll likely have charted a path and you can follow in their footsteps.

For example https://www.linkedin.com/in/joshua-couse/.

  


Jeremy Allen has since taken over my role at Posit since I've left. I'm sure he 
knows a plethora of people in the USAF who are using R and can help you out.

  


https://www.linkedin.com/in/jeremy-allen-data/

  

  

  

  


On Thu, May 16, 2024 at 2:57 PM ADAMS, DOUGLAS L CIV USAF AFMC OO-ALC/OBWA via
R-devel  wrote:

Hello,



The US Air Force used to have R available on our main network, but now those 
who need to accept it back are
being very particular about what they're accepting in terms of official 
documentation.



Would you be able to help me with this endeavor?  I'm attaching a pdf that 
shows what documentation they'd
require for us to re-establish R as being acceptable on the network here.  I 
understand if you're too busy or
if it's a pain.  Haha



Thank you so much for your help and consideration!



Doug Adams

United States Air Force

Hill AFB, Utah





Re: [Rd] [External] View() segfaulting ...

2024-04-25 Thread Ben Bolker

  A clean build solves it for me too. Thank you!
  (I need to add this to my "have you tried turning it off and back on 
again?" list ...)


  Ben


On 2024-04-25 8:07 a.m., luke-tier...@uiowa.edu wrote:

I saw it also on some of my Ubuntu builds, but the issue went away
after a make clean/make, so maybe give that a try.

Best,

luke

On Wed, 24 Apr 2024, Ben Bolker wrote:

 I'm using bleeding-edge R-devel, so maybe my build is weird. Can 
anyone else reproduce this?


 View() seems to crash on just about anything.

View(1:3)
*** stack smashing detected ***: terminated
Aborted (core dumped)

 If I debug(View) I get to the last line of code with nothing 
obviously looking pathological:


Browse[1]>
debug: invisible(.External2(C_dataviewer, x, title))
Browse[1]> x
$x
[1] "1" "2" "3"

Browse[1]> title
[1] "Data: 1:3"
Browse[1]>
*** stack smashing detected ***: terminated
Aborted (core dumped)




R Under development (unstable) (2024-04-24 r86483)
Platform: x86_64-pc-linux-gnu
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS/LAPACK: 
/usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; 
LAPACK version 3.10.0


locale:
[1] LC_CTYPE=en_CA.UTF-8   LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8    LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8   LC_NAME=C
[9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

time zone: America/Toronto
tzcode source: system (glibc)

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.5.0



Re: [Rd] View() segfaulting ...

2024-04-24 Thread Ben Bolker
As suggested by Josh Ulrich, here's what I get when running under 
valgrind.



$ R -d valgrind
==218120== Memcheck, a memory error detector
==218120== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==218120== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright 
info

==218120== Command: /usr/local/lib/R/bin/exec/R
==218120==

R Under development (unstable) (2024-04-24 r86483) -- "Unsuffered 
Consequences"

Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> gctorture(TRUE)
> View(1:3)
*** stack smashing detected ***: terminated
==218120==
==218120== Process terminating with default action of signal 6 (SIGABRT)
==218120==at 0x4D619FC: __pthread_kill_implementation 
(pthread_kill.c:44)

==218120==by 0x4D619FC: __pthread_kill_internal (pthread_kill.c:78)
==218120==by 0x4D619FC: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==218120==by 0x4D0D475: raise (raise.c:26)
==218120==by 0x4CF37F2: abort (abort.c:79)
==218120==by 0x4D54675: __libc_message (libc_fatal.c:155)
==218120==by 0x4E01599: __fortify_fail (fortify_fail.c:26)
==218120==by 0x4E01565: __stack_chk_fail (stack_chk_fail.c:24)
==218120==by 0x27B686AD: in_R_X11_dataviewer (dataentry.c:540)
==218120==by 0x495C7C7: do_External (dotcode.c:573)
==218120==by 0x499A07F: bcEval_loop (eval.c:8141)
==218120==by 0x49B501C: bcEval (eval.c:7524)
==218120==by 0x49B501C: bcEval (eval.c:7509)
==218120==by 0x49B538A: Rf_eval (eval.c:1167)
==218120==by 0x49B755E: R_execClosure (eval.c:2398)
==218120==
==218120== HEAP SUMMARY:
==218120== in use at exit: 42,061,827 bytes in 9,305 blocks
==218120==   total heap usage: 23,905 allocs, 14,600 frees, 66,039,858 
bytes allocated

==218120==
==218120== LEAK SUMMARY:
==218120==definitely lost: 0 bytes in 0 blocks
==218120==indirectly lost: 0 bytes in 0 blocks
==218120==  possibly lost: 5,868 bytes in 14 blocks
==218120==still reachable: 42,055,959 bytes in 9,291 blocks
==218120==   of which reachable via heuristic:
==218120== newarray   : 4,264 bytes in 1 
blocks

==218120== suppressed: 0 bytes in 0 blocks
==218120== Rerun with --leak-check=full to see details of leaked memory
==218120==
==218120== For lists of detected and suppressed errors, rerun with: -s
==218120== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Aborted (core dumped)



[Rd] View() segfaulting ...

2024-04-24 Thread Ben Bolker
  I'm using bleeding-edge R-devel, so maybe my build is weird. Can 
anyone else reproduce this?


  View() seems to crash on just about anything.

View(1:3)
*** stack smashing detected ***: terminated
Aborted (core dumped)

  If I debug(View) I get to the last line of code with nothing 
obviously looking pathological:


Browse[1]>
debug: invisible(.External2(C_dataviewer, x, title))
Browse[1]> x
$x
[1] "1" "2" "3"

Browse[1]> title
[1] "Data: 1:3"
Browse[1]>
*** stack smashing detected ***: terminated
Aborted (core dumped)




R Under development (unstable) (2024-04-24 r86483)
Platform: x86_64-pc-linux-gnu
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS/LAPACK: 
/usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; 
LAPACK version 3.10.0


locale:
 [1] LC_CTYPE=en_CA.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=en_CA.UTF-8LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=en_CA.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

time zone: America/Toronto
tzcode source: system (glibc)

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.5.0

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] read.csv

2024-04-16 Thread Ben Bolker
  Tangentially, your code will be more efficient if you add the data 
files to a *list* one by one and then apply bind_rows or 
do.call(rbind,...) after you have accumulated all of the information 
(see chapter 2 of the _R Inferno_). This may or may not be practically 
important in your particular case.


Burns, Patrick. 2012. The R Inferno. Lulu.com. 
http://www.burns-stat.com/pages/Tutor/R_inferno.pdf.
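A minimal sketch of the list-accumulation pattern described above (the file names are hypothetical; assumes each CSV has the same columns):

```r
# grow a list, not a data frame: one element per file, bound once at the end
files <- c("1433B.csv", "1433E.csv")    # hypothetical file names
pieces <- vector("list", length(files))
for (i in seq_along(files)) {
  if (file.exists(files[i])) pieces[[i]] <- read.csv(files[i])
}
all_data <- do.call(rbind, pieces)      # single bind after accumulation
```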



On 2024-04-16 6:46 a.m., jing hua zhao wrote:

Dear R-developers,

I came across a somewhat unexpected behaviour of read.csv() which is trivial but 
worthwhile to note -- my data involve a protein named "1433E", but to save space 
I drop the quotes, so the file becomes:

Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35
YWHAE,4:72617557_T_TA,1433E,7.73

Both read.csv() and readr::read_csv() treat the prot(ein) name as the numeric 
1433 (possibly confused by scientific notation), which only alerted me when I 
tried to combine data:

all_data <- data.frame()
for (protein in proteins[1:7])
{
cat(protein,":\n")
f <- paste0(protein,".csv")
if(file.exists(f))
{
  p <- read.csv(f)
  print(p)
  if(nrow(p)>0) all_data  <- bind_rows(all_data,p)
}
}

proteins[1:7]
[1] "1433B" "1433E" "1433F" "1433G" "1433S" "1433T" "1433Z"

dplyr::bind_rows() failed to work due to the incompatible types; rbind(), 
nevertheless, went ahead without warnings.
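A hedged guard against this kind of silent coercion is to pin the column type up front via colClasses (a sketch, using a temporary copy of the two data lines above):

```r
tmp <- tempfile(fileext = ".csv")
writeLines(c("Gene,SNP,prot,log10p",
             "YWHAE,13:62129097_C_T,1433E,7.35",
             "YWHAE,4:72617557_T_TA,1433E,7.73"), tmp)
str(read.csv(tmp))  # prot may be silently read as numeric (the behaviour reported)
str(read.csv(tmp, colClasses = c(prot = "character")))  # prot stays "1433E"
unlink(tmp)
```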

Best wishes,


Jing Hua

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] VPAT OR Accessibility Conformance Report Request

2024-02-15 Thread Ben Bolker
  There was a recent thread here started by someone asking a similar 
question.


https://stat.ethz.ch/pipermail/r-devel/2024-January/083120.html

  You should probably start by going through all of that thread, but 
the bottom line is that:


 * R is in fact very accessible to people with a wide range of challenges
 * unfortunately you are very unlikely to be able to get such a report, 
unless someone volunteers to do one, as R is a volunteer-run project 
(and the developers are generally more interested in maintaining and 
improving the software than satisfying bureaucrats)


  A google search

https://www.google.com/search?q=R+CRAN+VPAT

  suggests that VPATs are available for various third-party 
extensions/packaging of R (Oracle R, RStudio desktop).


  This article is old, but if someone in your IT department read it, 
hopefully they would agree to download the software: 
https://journal.r-project.org/archive/2013-1/godfrey.pdf


On 2024-02-15 12:23 p.m., Zachary Benner wrote:

To Whom It May Concern,
My name is Zach Benner and I am the Accessibility (A.D.A.-Americans with
Disabilities Act) Coordinator here at University of Maine at Machias.  I am
reaching out on the behalf of our science professors here at UMM. The
professor is looking to utilize your software to implement within his
courses however, due to possible accessibility concerns our IT department
will not download the software onto school computers unless we have a copy
of a VPAT or Accessibility Conformance Information Report. I was wondering
if I could obtain this information to provide to the IT department?
Thank you for your time on this matter and I look forward to hearing from
you soon. Please let me know if you have any questions or concerns.
Sincerely,
Zach Benner
Accessibility Coordinator & Student Success Liaison
University of Maine Machias
Office: 229A  Torrey Hall
116 O'Brien Ave
Machias, ME, 04654
Office Number: 207-255-1228

Confidentiality Notice:  This e-mail and any attachments...{{dropped:12}}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ADA Compliance

2024-01-12 Thread Ben Bolker
 I would be very surprised if anyone had written up a VPAT 
<https://www.section508.gov/sell/vpat/> for R.


  It won't help you with the bureaucratic requirements, but R is in 
fact very accessible to visually impaired users: e.g. see



https://community.rstudio.com/t/accessibility-of-r-rstudio-compared-to-excel-for-student-that-is-legally-blind/103849/3

From https://github.com/ajrgodfrey/BrailleR

> R is perhaps the most blind-friendly statistical software option 
because all scripts can be written in plain text, using the text editor 
a user prefers, and all output can be saved in a wide range of file 
formats. The advent of R markdown and other reproducible research 
techniques can offer the blind user a degree of efficiency that is not 
offered in many other statistical software options. In addition, the 
processed Rmd files are usually HTML which are the best supported files 
in terms of screen reader development.


  (And there is continued attention to making sure R stays accessible 
in this way: 
https://stat.ethz.ch/pipermail/r-devel/2022-December/082180.html; 
https://stat.ethz.ch/pipermail/r-devel/2023-February/082313.html)


  R is also easy to use without a mouse, which should improve 
accessibility for users with neuromuscular conditions.


   cheers
Ben Bolker




On 2024-01-12 2:50 p.m., Hunter, Zayne via R-devel wrote:

Hello,


I am working with Ball State University to obtain a license of R. As part of 
our requirements for obtaining new software, we must review the VPAT for ADA 
compliance. Can you provide this information for me?

Thanks,


Zayne Hunter
Technology Advisor & Vendor Relations Manager
Ball State University
zayne.hun...@bsu.edu
(765)285-7853






[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Base R wilcox.test gives incorrect answers, has been fixed in DescTools, solution can likely be ported to Base R

2023-12-11 Thread Ben Bolker
  You could request a bugzilla account and post it to 
https://bugs.r-project.org/ yourself: from 
https://www.r-project.org/bugs.html,


> In order to get a bugzilla account (i.e., become “member”), please 
send an e-mail (from the address you want to use as your login) to 
bug-report-requ...@r-project.org briefly explaining why, and a volunteer 
will add you to R’s Bugzilla members.


  (On the other hand, I think that posting to this list was a good idea 
in any case, as it is more visible than the bugs list and may spark some 
useful discussion.)


   cheers
   Ben Bolker


On 2023-12-11 9:44 a.m., tkp...@gmail.com wrote:

While using the Hodges-Lehmann mean in DescTools (DescTools::HodgesLehmann),
I found that it generated incorrect answers (see
https://github.com/AndriSignorell/DescTools/issues/97). The error is driven
by the existence of tied values forcing wilcox.test in Base R to switch to
an approximate algorithm that returns incorrect results - see
https://aakinshin.net/posts/r-hodges-lehmann-problems/ for a detailed
exposition of the issue.

  


Andri Signorell and Cyril Moser have a new C++ implementation of
DescTools::HodgesLehmann using a O(N log(N)) algorithm due to Monahan, but
wilcox.test in Base R appears to be still broken. Will someone kindly bring
this observation, as well as the existence of a solution, to the attention
of the relevant person(s) in the Base R development team?

  


The paper by Monahan, as well as the original Fortran implementation of the
algorithm, are linked from
https://github.com/AndriSignorell/DescTools/issues/97. Inefficient O(N^2)
algorithms for the Hodges-Lehmann mean are known and are implemented in a
variety of packages. For example, the authors of rt.test
(https://cran.r-project.org/web/packages/rt.test) use the O(N^2) approach. I
suspect that Andri and Cyril will be more than happy to assist with fixing
wilcox.test in Base R with their implementation of Monahan's fast algorithm.

  


Sincerely

  


Thomas Philips

  



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] option to silence/quieten stats::confint.glm ?

2023-12-07 Thread Ben Bolker

   confint.glm prints a message "Waiting for profiling to be done..."

   I could have sworn that there used to be an option (quiet = TRUE?) 
to turn this message off without resorting to suppressMessages() 
(finer/more specific control is always preferable ...) -- but on the 
basis of looking back at archived versions of MASS, and at this Stack 
Overflow post:


https://stackoverflow.com/questions/43847705/how-do-i-silence-confint-in-r

 I think I was hallucinating.
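For the record, the workaround that Stack Overflow answer settles on is blanket suppression (a sketch, using a built-in dataset):

```r
# silences "Waiting for profiling to be done..." -- along with any other message
fit <- glm(cbind(ncases, ncontrols) ~ agegp, data = esoph, family = binomial)
ci  <- suppressMessages(confint(fit))
```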

 Do people think this would be a reasonable minor feature request/would 
a patch suggestion be considered? What would the  best name for the 
argument be? (scan() has "quiet")


pos <- tail(search(), 1)  ## base package
tt <- lapply(c(lsf.str(pos = pos)), \(x) names(formals(x))) |> unlist() 
|> table()

> tt[["quiet"]]
[1] 4
> tt[["silent"]]
[1] 1
> tt[["verbose"]]
[1] 9


   cheers
 Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Minor bug with stats::isoreg

2023-09-27 Thread Ben Bolker



  Thanks! Submitted as https://bugs.r-project.org/show_bug.cgi?id=18603
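Until a fix lands, callers can guard against the non-finite input described below (a sketch; `safe_isoreg` is a hypothetical name):

```r
# validate inputs before calling isoreg(), which can segfault on Inf/NaN
safe_isoreg <- function(x, y = NULL) {
  vals <- if (is.null(y)) x else y
  if (!all(is.finite(vals))) stop("isoreg() requires finite values")
  if (is.null(y)) isoreg(x) else isoreg(x, y)
}
```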


On 2023-09-27 4:49 p.m., Travers Ching wrote:

Hello, I'd like to file a small bug report. I searched and didn't find a
duplicate report.

Calling isoreg with an Inf value causes a segmentation fault, tested on R
4.3.1 and R 4.2. A reproducible example is: `isoreg(c(0,Inf))`

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] NROW and NCOL on NULL

2023-09-23 Thread Ben Bolker
   This is certainly worth discussing, but there's always a heavy 
burden of back-compatibility; how much better would it be for NCOL and 
NROW to both return zero, vs. the amount of old code that would be broken?


  Furthermore, this behaviour is justified as being consistent with the 
behaviour of as.matrix() and cbind() for zero-length vectors; from ?NCOL:


 ## as.matrix() produces 1-column matrices from 0-length vectors,
 ## and so does cbind() :

 (of course you could argue that this behaviour should be changed as 
well ...)
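The asymmetry, and the cbind()/as.matrix() behaviour it mirrors, in one sketch:

```r
NROW(NULL)                  # 0 -- NULL is treated as a length-0 vector
NCOL(NULL)                  # 1 -- consistent with the 1-column results below
dim(as.matrix(numeric(0)))  # a 1-column, 0-row matrix from a 0-length vector
dim(cbind(numeric(0)))      # likewise 1 column, 0 rows
```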



On 2023-09-23 3:41 p.m., Simone Giannerini wrote:

I know it's documented and I know there are other ways to guard
against this behaviour, once you know about this.
The point is whether it might be worth it to make NCOL and NROW return
the same value on NULL and make R more consistent/intuitive and
possibly less error prone.

Regards,

Simone

On Sat, Sep 23, 2023 at 7:50 PM Duncan Murdoch  wrote:


It's been documented for a long time that NCOL(NULL) is 1.  What
particular problems did you have in mind?  There might be other ways to
guard against them.

Duncan Murdoch

On 23/09/2023 1:43 p.m., Simone Giannerini wrote:

Dear list,

I do not know what would be the 'correct' answer to the following but
I think that they should return the same value to avoid potential
problems and hard to debug errors.

Regards,

Simone
---


NCOL(NULL)

[1] 1


NROW(NULL)

[1] 0


sessionInfo()

R version 4.3.1 RC (2023-06-08 r84523 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default


locale:
[1] LC_COLLATE=Italian_Italy.utf8  LC_CTYPE=Italian_Italy.utf8
[3] LC_MONETARY=Italian_Italy.utf8 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.utf8

time zone: Europe/Rome
tzcode source: internal

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.3.1








__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] A demonstrated shortcoming of the R package management system

2023-08-06 Thread Ben Bolker
  I would support this suggestion.  There is a similar binary 
dependency chain from Matrix → TMB → glmmTMB; we have implemented 
various checks to make users aware that they need to reinstall from 
source, and to some extent we've tried to push out synchronous updates 
(i.e., push an update of TMB to CRAN every time Matrix changes, and an 
update of glmmTMB after that), but centralized machinery for this would 
certainly be nice.
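The flavour of those checks, reduced to a sketch (the function name is hypothetical, and `built_version` stands in for a value cached at install time, as glmmTMB's machinery does):

```r
# warn when a downstream package was built against a different upstream version
check_dep_version <- function(upstream = "Matrix", built_version) {
  current <- utils::packageVersion(upstream)
  ok <- current == built_version
  if (!ok)
    warning(sprintf("built against %s %s but %s is installed; reinstall from source",
                    upstream, as.character(built_version), as.character(current)))
  invisible(ok)
}
```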


  FWIW some of the machinery is here: 
https://github.com/glmmTMB/glmmTMB/blob/d9ee7b043281341429381faa19b5e53cb5a378c3/glmmTMB/R/utils.R#L209-L295 
-- it relies on a Makefile rule that caches the current installed 
version of TMB: 
https://github.com/glmmTMB/glmmTMB/blob/d9ee7b043281341429381faa19b5e53cb5a378c3/glmmTMB/R/utils.R#L209-L295



  cheers
   Ben Bolker


On 2023-08-06 5:05 p.m., Dirk Eddelbuettel wrote:


CRAN, by relying on the powerful package management system that is part of R,
provides an unparalleled framework for extending R with nearly 20k packages.

We recently encountered an issue that highlights a missing element in the
otherwise outstanding package management system. So we would like to start a
discussion about enhancing its feature set. As shown below, a mechanism to
force reinstallation of packages may be needed.

A demo is included below, it is reproducible in a container. We find the
easiest/fastest reproduction is by saving the code snippet below in the
current directory as eg 'matrixIssue.R' and have it run in a container as

docker run --rm -ti -v `pwd`:/mnt rocker/r2u Rscript /mnt/matrixIssue.R
   
This runs in under two minutes, first installing the older Matrix, next
installing SeuratObject, and then removing the older Matrix, making the
(already installed) current Matrix version the default. This simulates a
package update for Matrix which, as the final snippet demonstrates, silently
breaks SeuratObject, as the cached S4 method Csparse_validate is now missing.
So SeuratObject, installed under Matrix 1.5.1, becomes unusable under Matrix 1.6.0.

What this shows is that a call to update.packages() will silently corrupt an
existing installation.  We understand that this was known and addressed at
CRAN by rebuilding all binary packages (for macOS and Windows).

But it leaves both users relying on source installation as well as
distributors of source packages in a dire situation. It hurt me three times:
my default R installation was affected with unit tests (involving
SeuratObject) silently failing. It similarly broke our CI setup at work.  And
it created a fairly bad headache for the Debian packaging I am involved with
(and I surmise it affects other distro similarly).

It would be good to have a mechanism where a package, when being upgraded,
could flag that 'more actions are required' by the system (administrator).
We think this example demonstrates that we need such a mechanism to avoid
(silently !!) breaking existing installations, possibly by forcing
reinstallation of other packages.  R knows the package dependency graph and
could trigger this, possibly after an 'opt-in' variable the user / admin
sets.
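R can already walk that dependency graph; a sketch of discovering which installed packages might need reinstalling after a Matrix upgrade (the reinstall step is the hypothetical part):

```r
# strong reverse dependencies of Matrix among installed packages
revdeps <- tools::dependsOnPkgs("Matrix",
                                installed = utils::installed.packages())
## a 'forced reinstall' mechanism might then do something like:
## install.packages(revdeps, type = "source")
```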

One possibility may be to add a new (versioned) field 'Breaks:'. Matrix could
then have added 'Breaks: SeuratObject (<= 4.1.3)' preventing an installation
of Matrix 1.6.0 when SeuratObject 4.1.3 (or earlier) is present, but
permitting an update to Matrix 1.6.0 alongside a new version, say, 4.1.4 of
SeuratObject which could itself have a versioned Depends: Matrix (>= 1.6.0).

Regards,  Dirk


## Code example follows. Recommended to run the rocker/r2u container.
## Could also run 'apt update -qq; apt upgrade -y' but not required
## Thanks to my colleague Paul Hoffman for the core of this example

## now have Matrix 1.6.0 because r2u and CRAN remain current but we can install 
an older Matrix
remotes::install_version('Matrix', '1.5.1')

## we can confirm that we have Matrix 1.5.1
packageVersion("Matrix")

## we now install SeuratObject from source and to speed things up we first 
install the binary
install.packages("SeuratObject")   # in this container via bspm/r2u as binary
## and then force a source installation (turning bspm off) _while Matrix is at 
1.5.1_
if (requireNamespace("bspm", quietly=TRUE)) bspm::disable()
Sys.setenv(PKG_CXXFLAGS='-Wno-ignored-attributes')  # Eigen compilation 
noise silencer
install.packages('SeuratObject')

## we now remove the Matrix package version 1.5.1 we installed into /usr/local 
leaving 1.6.0
remove.packages("Matrix")
packageVersion("Matrix")

## and we now run a bit of SeuratObject code that is now broken as 
Csparse_validate is gone
suppressMessages(library(SeuratObject))
data('pbmc_small')
graph <- pbmc_small[['RNA_snn']]
class(graph)
getClass('Graph')
show(graph) # this fails




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] question about an R idiom: eval()ing a quoted block

2023-07-11 Thread Ben Bolker
  In a few places in the R source code, such as the $initialize element 
of `family` objects, and in the body of power.t.test() (possibly other 
power.* functions), sets of instructions that will need to be run later 
are encapsulated by saving them as an expression and later applying 
eval(), rather than as a function. This seems weird to me; the only 
reason I can think of for doing it this way is to avoid having to pass 
back multiple objects and assign them in the calling environment (since 
R doesn't have a particularly nice form of Python's tuple-unpacking idiom).
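A stripped-down version of the idiom, showing why it sidesteps multiple-return plumbing: the eval()'d block assigns directly into the calling frame (variable names here are illustrative, loosely echoing power.t.test):

```r
setup <- quote({
  tside   <- 2        # the kind of values such a block precomputes
  tsample <- 2
})
f <- function() {
  eval(setup)         # runs the block *in f's frame*: tside, tsample now exist
  tside * tsample
}
f()                   # 4
```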


  Am I missing something?

 cheers
   Ben


https://github.com/r-devel/r-svn/blob/eac72e66a4d2c2aba50867bd80643b978febf5a3/src/library/stats/R/power.R#L38-L52

https://github.com/r-devel/r-svn/blob/master/src/library/stats/R/family.R#L166-L171

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] restoring LANGUAGE env variable within an R session

2023-06-26 Thread Ben Bolker
  Thanks, this is exactly PR#18055. Should have looked (but assumed I 
was probably just overlooking something ...)


On 2023-06-26 10:02 a.m., Sebastian Meyer wrote:

Translated strings are cached.
I'd recommend to use the

     • New partly experimental Sys.setLanguage() utility, solving the
   main problem of PR#18055.

introduced in R 4.2.0.

Best,

 Sebastian Meyer


Am 26.06.23 um 15:15 schrieb Ben Bolker:

    I was playing around with the setting of the LANGUAGE variable and am
wondering whether I'm missing something obvious about resetting the
value to its original state once it's been set.  I seem to be able to
reset the language for warnings/errors once, but not to change it a
second time (or reset it) once it's been set ... ??

## default (no LANGUAGE set, English locale)
  > sqrt(-1)
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
## no complaints, doesn't change (as expected)
  > Sys.setenv(LANGUAGE = "en")
  > sqrt(-1)
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced

## change to German
  > Sys.setenv(LANGUAGE = "de")
  > sqrt(-1)
[1] NaN
Warnmeldung:
In sqrt(-1) : NaNs wurden erzeugt

## try to change to Spanish - no luck
## (this does work in a clean session)

  > Sys.setenv(LANGUAGE = "es")
  > sqrt(-1)
[1] NaN
Warnmeldung:
In sqrt(-1) : NaNs wurden erzeugt

## try resetting to blank
  > Sys.setenv(LANGUAGE = "")
  > sqrt(-1)
[1] NaN
Warnmeldung:
In sqrt(-1) : NaNs wurden erzeugt

## or back to English explicitly?
  > Sys.setenv(LANGUAGE = "en")
  > sqrt(-1)
[1] NaN
Warnmeldung:
In sqrt(-1) : NaNs wurden erzeugt
  >

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
> E-mail is sent at my convenience; I don't expect replies outside of 
working hours.


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] restoring LANGUAGE env variable within an R session

2023-06-26 Thread Ben Bolker
  That's reasonable, but I'm wondering why it works the *first* time 
it's called in a session. Is this just undefined behaviour (so I 
shouldn't be surprised whatever happens)?  Again,


$ Rscript -e 'sqrt(-1); Sys.setenv(LANGUAGE="es"); sqrt(-1)'
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
[1] NaN
Warning message:
In sqrt(-1) : Se han producido NaNs

  I should clarify that this really isn't that important for my 
workflow, it just seemed like an odd loose end.


Weirdly, I just discovered Sys.setLanguage(). I don't know how it differs, 
but there's a bindtextdomain(NULL) call there which may be the magic 
sauce ... ???


 sqrt(-1)
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
> Sys.setLanguage("de")
> sqrt(-1)
[1] NaN
Warnmeldung:
In sqrt(-1) : NaNs wurden erzeugt
> Sys.setLanguage("en")
> sqrt(-1)
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced



On 2023-06-26 9:38 a.m., Dirk Eddelbuettel wrote:


Ben,

POSIX level / glibc level variables are set at process start and AFAIK cannot
really be altered after start.  They clearly work when set _before_ calling 
sqrt(-1):

 $ LANGUAGE=es Rscript -e 'sqrt(-1)'
 [1] NaN
 Warning message:
 In sqrt(-1) : Se han producido NaNs
 $ LANGUAGE=de Rscript -e 'sqrt(-1)'
 [1] NaN
 Warnmeldung:
 In sqrt(-1) : NaNs wurden erzeugt
 $

I think the `callr` package can help you with this use from with R by
effectively spawning a new process for you. Or, lower-level, you can call
`system()` or `system2()` yourself and take care of the setup.

Cheers, Dirk
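A sketch of the callr route mentioned above (assumes the callr package is installed; the child process starts with LANGUAGE already set, sidestepping the caching problem):

```r
# each call runs in a fresh R process, so LANGUAGE is in effect at process start
msg <- callr::r(function() tryCatch(sqrt(-1), warning = conditionMessage),
                env = c(callr::rcmd_safe_env(), LANGUAGE = "es"))
msg  # the warning text, localized in the child process
```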



--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
> E-mail is sent at my convenience; I don't expect replies outside of 
working hours.


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] restoring LANGUAGE env variable within an R session

2023-06-26 Thread Ben Bolker
  I was playing around with the setting of the LANGUAGE variable and am 
wondering whether I'm missing something obvious about resetting the 
value to its original state once it's been set.  I seem to be able to 
reset the language for warnings/errors once, but not to change it a 
second time (or reset it) once it's been set ... ??


## default (no LANGUAGE set, English locale)
> sqrt(-1)
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
## no complaints, doesn't change (as expected)
> Sys.setenv(LANGUAGE = "en")
> sqrt(-1)
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced

## change to German
> Sys.setenv(LANGUAGE = "de")
> sqrt(-1)
[1] NaN
Warnmeldung:
In sqrt(-1) : NaNs wurden erzeugt

## try to change to Spanish - no luck
## (this does work in a clean session)

> Sys.setenv(LANGUAGE = "es")
> sqrt(-1)
[1] NaN
Warnmeldung:
In sqrt(-1) : NaNs wurden erzeugt

## try resetting to blank
> Sys.setenv(LANGUAGE = "")
> sqrt(-1)
[1] NaN
Warnmeldung:
In sqrt(-1) : NaNs wurden erzeugt

## or back to English explicitly?
> Sys.setenv(LANGUAGE = "en")
> sqrt(-1)
[1] NaN
Warnmeldung:
In sqrt(-1) : NaNs wurden erzeugt
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] why does [A-Z] include 'T' in an Estonian locale?

2023-06-16 Thread Ben Bolker

  Yes.
  FWIW I submitted a request for a documentation fix to TRE (to 
document that it actually uses Unicode order, not collation order, to 
define ranges, just like most (but not all) other regex engines ...)


https://github.com/laurikari/tre/issues/88

On 2023-06-16 5:16 a.m., peter dalgaard wrote:

Just for amusement: Similar messups occur with Danish and its three extra 
letters:


Sys.setlocale("LC_ALL", "da_DK")

[1] "da_DK/da_DK/da_DK/C/da_DK/en_US.UTF-8"

sort(c(LETTERS,"Æ","Ø","Å"))

  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" 
"S"
[20] "T" "U" "V" "W" "X" "Y" "Z" "Æ" "Ø" "Å"


grepl("[A-Å]", "Ø")

[1] FALSE

grepl("[A-Å]", "Æ")

[1] FALSE

grepl("[A-Æ]", "Å")

[1] TRUE

grepl("[A-Æ]", "Ø")

[1] FALSE

grepl("[A-Ø]", "Å")

[1] TRUE

grepl("[A-Ø]", "Æ")

[1] TRUE

So for character ranges, the order is Å,Æ,Ø (which is how they'd collate in 
Swedish, except that Swedish uses diacriticals rather than Æ and Ø).


Sys.setlocale("LC_ALL", "sv_SE")

[1] "sv_SE/sv_SE/sv_SE/C/sv_SE/en_US.UTF-8"

sort(c(LETTERS,"Æ","Ø","Å"))

  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" 
"S"
[20] "T" "U" "V" "W" "X" "Y" "Z" "Å" "Æ" "Ø"

sort(c(LETTERS,"Ä","Ö","Å"))

  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" 
"S"
[20] "T" "U" "V" "W" "X" "Y" "Z" "Å" "Ä" "Ö"




On 30 May 2023, at 17:45 , Ben Bolker  wrote:

  Inspired by this old Stack Overflow question

https://stackoverflow.com/questions/19765610/when-does-locale-affect-rs-regular-expressions

I was wondering why this is TRUE:

Sys.setlocale("LC_ALL", "et_EE")
grepl("[A-Z]", "T")

TRE's documentation at <https://laurikari.net/tre/documentation/regex-syntax/> says that 
a range "is shorthand for the full range of characters between those two [endpoints] 
(inclusive) in the collating sequence".

Yet, T is *not* between A and Z in the Estonian collating sequence:

sort(LETTERS)
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "Z" "T" "U" "V" "W" "X" "Y"

  I realize that this may be a question about TRE rather than about R *per se* (FWIW the 
grepl() result is also TRUE with `perl = TRUE`, so the question also applies to PCRE), 
but I'm wondering if anyone has any insights ...  (and yes, I know that the correct 
answer is "use [:alpha:] and don't worry about it")

(In contrast, the ICU engine underlying stringi/stringr says "[t]he characters to 
include are determined by Unicode code point ordering" - see

https://stackoverflow.com/questions/76365426/does-stringrs-regex-engine-translate-a-z-into-abcdefghijklmnopqrstuvwyz/76366163#76366163

for links)
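The practical upshot, in code (the named classes are stable across locales; the explicit range is not):

```r
grepl("[[:upper:]]", "T")   # TRUE in any locale: a named class, not a range
grepl("[[:alpha:]]", "Ø")   # TRUE in a UTF-8 locale: covers letters beyond A-Z
grepl("[A-Z]", "T")         # engine- and locale-dependent, as discussed above
```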

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
> E-mail is sent at my convenience; I don't expect replies outside of 
working hours.


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] infelicity in `na.print = ""` for numeric columns of data frames/formatting numeric values

2023-06-05 Thread Ben Bolker




On 2023-06-05 9:27 a.m., Martin Maechler wrote:

Ben Bolker on Sat, 3 Jun 2023 13:06:41 -0400 writes:


 > format(c(1:2, NA)) gives the last value as "NA" rather than
 > preserving it as NA, even if na.encode = FALSE (which does the
 > 'expected' thing for character vectors, but not numeric vectors).

 > This was already brought up in 2008 in
 > https://bugs.r-project.org/show_bug.cgi?id=12318 where Gregor Gorjanc
 > pointed out the issue. Documentation was added and the bug closed as
 > invalid. GG ended with:

 >> IMHO it would be better that na.encode argument would also have an
 > effect for numeric like vectors. Nearly any function in R returns NA
 > values and I expected the same for format, at least when na.encode=FALSE.

 > I agree!

I do too, at least "in principle", keeping in mind that
backward compatibility is also an important principle ...

Not sure if the 'na.encode' argument should matter or possibly a
new optional argument, but "in principle" I think that

   format(c(1:2, NA, 4))

should preserve is.na(.) even by default.


   I would say it should preserve `is.na` *only* if na.encode = FALSE - 
that seems like the minimal appropriate change away from the current 
behaviour.




 > I encountered this in the context of printing a data frame with
 > na.print = "", which works as expected when printing the individual
 > columns but not when printing the whole data frame (because
 > print.data.frame calls format.data.frame, which calls format.default
 > ...).  Example below.

 > It's also different from what you would get if you converted to
 > character before formatting and printing:

 > print(format(as.character(c(1:2, NA)), na.encode=FALSE), na.print ="")

 > Everything about this is documented (if you look carefully enough),
 > but IMO it violates the principle of least surprise
 > https://en.wikipedia.org/wiki/Principle_of_least_astonishment , so I
 > would call it at least an 'infelicity' (sensu Bill Venables)

 > Is there any chance that this design decision could be revisited?

We'd have to hear other opinions / gut feelings.

Also, someone (not me) would ideally volunteer to run
'R CMD check ' for a few 1000 (not necessarily all) CRAN &
BioC packages with an accordingly patched version of R-devel
(I might volunteer to create such a branch, e.g., a bit before the R
  Sprint 2023 end of August).


  I might be willing to do that, although it would be nice if there 
were a pre-existing framework (analogous to r-lib/revdepcheck) for 
automating it and collecting the results ...






 > cheers
 > Ben Bolker


 > ---

The following issue you are raising
may really be a *different* one, as it involves format() and
print() methods for "data.frame", i.e.,

format.data.frame() vs
 print.data.frame()

which is quite a bit related, of course, to how 'numeric'
columns are formatted -- as you note yourself below;
I vaguely recall that the data.frame method could be an even
"harder problem" .. but I don't remember the details.

It may also be that there are no changes necessary to the
*.data.frame() methods, and only the documentation (you mention)
should be updated ...



  I *think* that if format.default() were changed so that 
na.encode=FALSE also applied to numeric types, then data frame printing 
would naturally work 'right' (since print.data.frame calls 
format.data.frame which calls format() for the individual columns 
specifying encode=FALSE ...)
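As a workaround under current R, coercing columns to character first lets na.encode = FALSE (and hence na.print) take effect when the whole frame is printed (a sketch):

```r
dd <- data.frame(f = factor(1:2), n = as.numeric(1:2))
dd[3, ] <- NA
dd2 <- dd
dd2[] <- lapply(dd2, as.character)  # character columns keep their NAs
print(dd2, na.print = "")           # NA cells now print as blanks
```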


Martin

 > Consider

 > dd <- data.frame(f = factor(1:2), c = as.character(1:2), n =
 > as.numeric(1:2), i = 1:2)
 > dd[3,] <- rep(NA, 4)
 > print(dd, na.print = "")


 > print(dd, na.print = "")
 >   f c  n  i
 > 1 1 1  1  1
 > 2 2 2  2  2
 > 3 NA NA

 > This is in fact as documented (see below), but seems suboptimal given
 > that printing the columns separately with na.print = "" would
 > successfully print the NA entries as blank even in the numeric columns:

 > invisible(lapply(dd, print, na.print = ""))
 > [1] 1 2
 > Levels: 1 2
 > [1] "1" "2"
 > [1] 1 2
 > [1] 1 2

 > * ?print.data.frame documents that it calls format() for each column
 > before printing
 > * the code of print.data.frame() shows that it calls format.data.frame()
 > with na.encode = FALSE
 > * ?format.data.frame specifically notes that na.encode "only applies to
 > elements of character vectors, not to numerical, complex nor logical

[Rd] infelicity in `na.print = ""` for numeric columns of data frames/formatting numeric values

2023-06-03 Thread Ben Bolker
  format(c(1:2, NA)) gives the last value as "NA" rather than 
preserving it as NA, even if na.encode = FALSE (which does the 
'expected' thing for character vectors, but not numeric vectors).


  This was already brought up in 2008 in 
https://bugs.r-project.org/show_bug.cgi?id=12318 where Gregor Gorjanc 
pointed out the issue. Documentation was added and the bug closed as 
invalid. GG ended with:


> IMHO it would be better that na.encode argument would also have an
effect for numeric like vectors. Nearly any function in R returns NA 
values and I expected the same for format, at least when na.encode=FALSE.


  I agree!

  I encountered this in the context of printing a data frame with 
na.print = "", which works as expected when printing the individual 
columns but not when printing the whole data frame (because 
print.data.frame calls format.data.frame, which calls format.default 
...).  Example below.


  It's also different from what you would get if you converted to 
character before formatting and printing:


print(format(as.character(c(1:2, NA)), na.encode = FALSE), na.print = "")

  Everything about this is documented (if you look carefully enough), 
but IMO it violates the principle of least surprise 
https://en.wikipedia.org/wiki/Principle_of_least_astonishment , so I 
would call it at least an 'infelicity' (sensu Bill Venables)


  Is there any chance that this design decision could be revisited?

  cheers
   Ben Bolker


---

  Consider

dd <- data.frame(f = factor(1:2), c = as.character(1:2), n = 
as.numeric(1:2), i = 1:2)

dd[3,] <- rep(NA, 4)
print(dd, na.print = "")


print(dd, na.print = "")
  f c  n  i
1 1 1  1  1
2 2 2  2  2
3 NA NA

This is in fact as documented (see below), but seems suboptimal given 
that printing the columns separately with na.print = "" would 
successfully print the NA entries as blank even in the numeric columns:


invisible(lapply(dd, print, na.print = ""))
[1] 1 2
Levels: 1 2
[1] "1" "2"
[1] 1 2
[1] 1 2

* ?print.data.frame documents that it calls format() for each column 
before printing
* the code of print.data.frame() shows that it calls format.data.frame() 
with na.encode = FALSE
* ?format.data.frame specifically notes that na.encode "only applies to 
elements of character vectors, not to numerical, complex nor logical 
‘NA’s, which are always encoded as ‘"NA"’."


   So the NA values in the numeric columns become "NA" rather than 
remaining as NA values, and are thus printed rather than being affected 
by the na.print argument.
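
One workaround in the meantime (a sketch, relying on the fact that 
na.encode = FALSE does take effect for character columns) is to convert 
the columns of the dd example above to character before printing:

```r
## convert all columns to character so the NAs stay NA through format()
ddc <- dd
ddc[] <- lapply(dd, as.character)
print(ddc, na.print = "")   # NA entries now print as blank in all columns
```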


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] why does [A-Z] include 'T' in an Estonian locale?

2023-06-03 Thread Ben Bolker
  Thanks, I do know about the docs you quoted.  Thanks for pointing me 
to the comment in the code.


 I've posted an issue (a request to make the documentation match the 
code) at the TRE repository:


https://github.com/laurikari/tre/issues/88


On 2023-06-01 5:53 a.m., Tomas Kalibera wrote:


On 5/30/23 17:45, Ben Bolker wrote:

Inspired by this old Stack Overflow question

https://stackoverflow.com/questions/19765610/when-does-locale-affect-rs-regular-expressions

I was wondering why this is TRUE:

Sys.setlocale("LC_ALL", "et_EE")
grepl("[A-Z]", "T")

TRE's documentation at 
<https://laurikari.net/tre/documentation/regex-syntax/> says that a 
range "is shorthand for the full range of characters between those two 
[endpoints] (inclusive) in the collating sequence".


Yet, T is *not* between A and Z in the Estonian collating sequence:

 sort(LETTERS)
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "Z" "T" "U" "V" "W" "X" "Y"

  I realize that this may be a question about TRE rather than about R 
*per se* (FWIW the grepl() result is also TRUE with `perl = TRUE`, so 
the question also applies to PCRE), but I'm wondering if anyone has 
any insights ...  (and yes, I know that the correct answer is "use 
[:alpha:] and don't worry about it")


The correct answer depends on what you want to do, but please see 
?regexp in R:


"Because their interpretation is locale- and implementation-dependent, 
character ranges are best avoided."


and

"The only portable way to specify all ASCII letters is to list them all 
as the character class

‘[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]’."

This is from POSIX specification:

"In the POSIX locale, a range expression represents the set of collating 
elements that fall between two elements in the collation sequence, 
inclusive. In other locales, a range expression has unspecified 
behavior: strictly conforming applications shall not rely on whether the 
range expression is valid, or on the set of collating elements matched. 
A range expression shall be expressed as the starting point and the 
ending point separated by a hyphen-minus ( '-' )."


If you really want to know why the current implementation of R, TRE and 
PCRE2 works in a certain way, you can check the code, but I don't think 
it would be a good use of the time given what is written above.


It may be that TRE has a bug, maybe it doesn't do what was intended (see 
comment "XXX - Should use collation order instead of encoding values in 
character ranges." in the code), but I didn't check the code thoroughly.


Best
Tomas



(In contrast, the ICU engine underlying stringi/stringr says "[t]he 
characters to include are determined by Unicode code point ordering" - 
see


https://stackoverflow.com/questions/76365426/does-stringrs-regex-engine-translate-a-z-into-abcdefghijklmnopqrstuvwyz/76366163#76366163

for links)
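
As the quoted documentation advises, a portable version of the original 
test avoids A-Z ranges entirely (a sketch; the et_EE locale must be 
installed on the system for the first line to succeed):

```r
## character ranges are locale-dependent; these alternatives are not
Sys.setlocale("LC_ALL", "et_EE")
grepl("[[:upper:]]", "T")                     # POSIX class, locale-aware
grepl("[ABCDEFGHIJKLMNOPQRSTUVWXYZ]", "T")    # explicit list, fully portable
```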

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] why does [A-Z] include 'T' in an Estonian locale?

2023-05-30 Thread Ben Bolker

  Inspired by this old Stack Overflow question

https://stackoverflow.com/questions/19765610/when-does-locale-affect-rs-regular-expressions

I was wondering why this is TRUE:

Sys.setlocale("LC_ALL", "et_EE")
grepl("[A-Z]", "T")

TRE's documentation at 
<https://laurikari.net/tre/documentation/regex-syntax/> says that a 
range "is shorthand for the full range of characters between those two 
[endpoints] (inclusive) in the collating sequence".


Yet, T is *not* between A and Z in the Estonian collating sequence:

 sort(LETTERS)
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "Z" "T" "U" "V" "W" "X" "Y"

  I realize that this may be a question about TRE rather than about R 
*per se* (FWIW the grepl() result is also TRUE with `perl = TRUE`, so 
the question also applies to PCRE), but I'm wondering if anyone has any 
insights ...  (and yes, I know that the correct answer is "use [:alpha:] 
and don't worry about it")


(In contrast, the ICU engine underlying stringi/stringr says "[t]he 
characters to include are determined by Unicode code point ordering" - see


https://stackoverflow.com/questions/76365426/does-stringrs-regex-engine-translate-a-z-into-abcdefghijklmnopqrstuvwyz/76366163#76366163

for links)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Query: Could documentation include modernized references?

2023-03-26 Thread Ben Bolker
  For one point of evidence about how much people pay attention to the 
documentation about what's outdated: Brian Ripley added a comment to 
nlminb.Rd in 2013 saying that the function was "for historical 
compatibility" 
 
but it's still widely used in new code ...


 But I agree that adding appropriate warnings/links to the 
documentation couldn't hurt.


  cheers
   Ben

On 2023-03-26 12:41 p.m., Duncan Murdoch wrote:

On 26/03/2023 11:54 a.m., J C Nash wrote:
A tangential email discussion with Simon U. has highlighted a 
long-standing matter that some tools in the base R distribution are 
outdated, but that so many examples and other tools may use them that 
they cannot be deprecated.


The examples that I am most familiar with concern optimization and 
nonlinear least squares, but other workers will surely be able to 
suggest cases elsewhere. I was the source (in Pascal) of Nelder-Mead, 
BFGS and CG algorithms in optim(). BFGS is still mostly competitive, 
and Nelder-Mead is useful for initial exploration of an optimization 
problem, but CG was never very good, right from the mid-1970s, well 
before it was interfaced to R. By contrast Rcgmin works rather well 
considering how similar it is in nature to CG. Yet I continue to see 
use and even recommendations of these tools in inappropriate 
circumstances.

Given that it would break too many other packages and examples to drop 
the
existing tools, should we at least add short notes in the man (.Rd) 
pages?

I'm thinking of something like

 optim() has methods that are dated. Users are urged to consider 
suggestions

 from ...

and point to references and/or an appropriate Task View, which could, 
of course, be in the references.

I have no idea what steps are needed to make such edits to the man 
pages. Would R-core need to be directly involved, or could one or two 
trusted R developers be given privileges to seek advice on and 
implement such modest documentation additions?  FWIW, I'm willing to 
participate in such an effort, which I believe would help users to use 
appropriate and up-to-date tools.


I can answer your final paragraph:

Currently R-core would need to be directly involved, in that they are 
the only ones with write permission on the R sources.


However, they don't need to do the work, they just need to approve of it 
and commit it.  So I would suggest one way forward is the following:


- You fork one of the mirrors of the R sources from Github, and (perhaps 
with help from others) edit one or two of the pages in the way you're 
describing.  Once you think they are ready, make them available online 
for others to review (Github or Gitlab would help doing this), and then 
submit the changes as a patch against the svn sources on the R Bugzilla 
site.


- Another way could be that you copy the help page sources to a dummy 
package, instead of checking out the whole of the R sources.  You'll 
need to be careful not to miss other changes to the originals between 
the time you make your copy and the time you submit the patches.


Don't do too many pages, because you're probably going to have to work 
out the details of the workflow as you go, and earn R Core's trust by 
submitting good changes and responding to their requests.  And maybe 
don't do any until you hear from a member of R Core that they're willing 
to participate in this, because they certainly don't accept all 
suggestions.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
> E-mail is sent at my convenience; I don't expect replies outside of 
working hours.


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] use Ctrl-W to close View() window?

2023-03-17 Thread Ben Bolker
  It does work, although it's very awkward with my current keyboard 
setup (I need to use Alt-Fn-F4).  However, knowing that there was *a* 
keyboard shortcut for it led me to figuring out how to add Ctrl-W as a 
synonym.  Thanks!  ("Super"-Q is also a synonym, where "Super" is the 
same as the Windows/MacOS command/system key ...)


On 2023-03-17 6:25 p.m., Johannes Ranke wrote:


Hi,

am I missing something or could you just use Alt-F4? This is pretty standard
for closing focussed windows on Windows and Linux at least. It just closed a
Window opened with View() on Debian Linux FWIW.

Cheers, Johannes

Am Freitag, 17. März 2023, 23:16:49 CET schrieb Ben Bolker:

I might be the last person in the world who's using View() outside of
RStudio, but does anyone have a sense of how hard it would be to enable
closing such a window (when in focus) with a standard keyboard shortcut
(e.g. Ctrl-W on Linux) ... ?  Where would I start looking in the code base?

Or maybe this can already be enabled somehow?

cheers

 Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] use Ctrl-W to close View() window?

2023-03-17 Thread Ben Bolker
  I might be the last person in the world who's using View() outside of 
RStudio, but does anyone have a sense of how hard it would be to enable 
closing such a window (when in focus) with a standard keyboard shortcut 
(e.g. Ctrl-W on Linux) ... ?  Where would I start looking in the code base?


  Or maybe this can already be enabled somehow?

  cheers

   Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] transform.data.frame() ignores unnamed arguments when no named argument is provided

2023-03-03 Thread Ben Bolker
   For what it's worth I think the increased emphasis on classed
errors should help with this (i.e., it will be easier to filter out
errors you know are false positives/irrelevant for your use case).
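
For example, with a classed warning a caller can muffle just that class 
and let everything else through (a sketch; the condition class 
"mypkg_coercion_warning" is hypothetical, and warningCondition() 
requires R >= 3.6.0):

```r
## build a classed warning condition, then selectively muffle that class
w <- warningCondition("NAs introduced by coercion",
                      class = "mypkg_coercion_warning")
res <- withCallingHandlers(
  {
    warning(w)   # would normally print; the handler below muffles it
    42
  },
  mypkg_coercion_warning = function(w) invokeRestart("muffleWarning")
)
res  # 42, with no warning printed; other warning classes still surface
```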

On Fri, Mar 3, 2023 at 12:17 PM Antoine Fabri  wrote:
>
> Let me expand a bit, I might have expressed myself poorly.
>
>  If there is a good reason for a warning I want a warning, and because I
> take them seriously I don't want my console cluttered with those that can
> be avoided. I strongly believe we should strive to make our code silent,
> and I like my console to tell me only what I need to know. In my opinion
> many warnings would be better designed as errors, sometimes with an
> argument to opt in the behaviour, or a documented way to work around. Some
> other warnings should just be documented behavior, because the behavior is
> not all that surprising.
>
> Some reasons why I find warnings hard to debug:
> - options(warn = 1) is not always enough to spot the source of the warning
> - options(warn = 2) fails at every warning, including the ones that are not
> interesting to the user and that they may not do anything about, in these
> cases you'll have to find a way to shut off the first to get to the second,
> and if it's packaged code that's not fun.
> - Unlike with errors, traceback() won't help.
> - tryCatch() will help you only if you call it at the right place, assuming
> you've found it.
> - We might also have many harmless warnings triggered through loops and
> hiding important ones.
> - When you are sure that you are OK with your code despite the warning, say
> `as.numeric(c("1", "2", "foo"))`, a workaround might be expensive (here we
> could use regex first to ditch the non numeric strings but who does that)
> so you're tempted to use `suppressWarnings()`, but then you might be
> suppressing other important warnings so you just made your code less safe
> because the developper wanted to make it safer (you might say it's on the
> user but still, we get suboptimal code that was avoidable).
>
> Of course I might miss some approaches that would make my experience of
> debugging warnings more pleasant.
>
> In our precise case I don't find the behavior surprising enough to warrant
> more precious red ink since it's close to what we get with data.frame(),
> and close to what we get with dplyr::mutate() FWIW, so I'd be personally
> happier to have this documented and work silently.
>
> Either way I appreciate you considering the problem.
>
> Thanks,
>
> Antoine
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] uniroot violates bounds?

2023-02-18 Thread Ben Bolker

c1 <- 4469.822
c2 <- 572.3413
f <- function(x) { c1/x - c2/(1-x) }
uniroot(f, c(1e-6, 1))


   provides a root at -6.00e-05, which is outside of the specified 
bounds.  The default value of the "extendInt" argument to uniroot() is 
"no", as far as I can see ...


$root
[1] -6.003516e-05

$f.root
[1] -74453981

$iter
[1] 1

$init.it
[1] NA

$estim.prec
[1] 6.103516e-05


  I suspect this fails because f(1) (value at the upper bound) is 
infinite, although setting interval to c(0.01, 1) does work/give a 
sensible answer ...  (works for a lower bound of 1e-4, fails for 1e-5 ...)


  Setting the upper bound < 1 appears to avoid the problem.
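
A sketch of that workaround, keeping both endpoints where f() is finite:

```r
c1 <- 4469.822
c2 <- 572.3413
f <- function(x) c1/x - c2/(1 - x)
## analytically the root is c1/(c1 + c2), about 0.8865, well inside (0, 1)
uniroot(f, c(1e-6, 1 - 1e-6))$root
```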

 For what it's worth, the result has an "init.it" component, but the 
only thing the documentation says about it is " component ‘init.it’ was 
added in R 3.1.0".


  And, I think (?) that the 'trace' argument only produces any output 
if the 'extendInt' option is enabled?


  Inspired by 
https://stackoverflow.com/questions/75494696/solving-a-system-of-non-linear-equations-with-only-one-unknown/75494955#75494955


  cheers
   Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Linking to Intel's MKL on Windows

2022-10-01 Thread Ben Bolker
   Maybe you can find out more about Microsoft's development/release 
process for MRO and why they're still on 4.0.2 (from June 2020)?  I 
followed the "user forum" link on their web page, but it appears to be a 
generic Windows forum ...


https://social.msdn.microsoft.com/Forums/en-US/home?forum=ropen

   I might tweet at @revodavid (David Smith) to see if there's any more 
information available about the MRO release schedule ...
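
For anyone comparing builds, R (>= 3.4.0) reports which BLAS/LAPACK 
libraries it is linked against, which makes it easy to confirm whether 
MKL is actually in use (a sketch):

```r
## paths of the BLAS and LAPACK shared libraries the running R uses
si <- sessionInfo()
si$BLAS
si$LAPACK
La_version()   # LAPACK version string
```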


  good luck,
   Ben Bolker




On 2022-10-01 12:00 p.m., Viechtbauer, Wolfgang (NP) wrote:

Hi Christine,

MKL is a closed-source commercial product (yes, one can get it for free, but it 
is not libre/open-source software).

Best,
Wolfgang


-Original Message-
From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Christine
Stawitz - NOAA Federal via R-devel
Sent: Friday, 30 September, 2022 18:46
To: r-devel@r-project.org
Subject: [Rd] Linking to Intel's MKL on Windows

Hi,

Recently I became aware that Microsoft R Open provides accelerated matrix
algebra computations through Intel's Math Kernel Libraries. However, the
version of R shipped with the Microsoft R Open is too out of date to be
able to use concurrently with other dependencies while developing our
package. This thread suggests a way to get the updated matrix libraries
with a more recent version of R, however it is unlikely to be approved by
our IT admin since those of us in government agencies aren't typically
given admin privileges: Linking Intel's Math Kernel Library (MKL) to R on
Windows - Stack Overflow
<https://stackoverflow.com/questions/38090206/linking-intels-math-kernel-library-mkl-to-r-on-windows>

Is there a reason why CRAN doesn't provide a version of R with the updated
libraries such that developers don't have to recompile R or copy .dlls
around as described above? It would help those of us running software with
slow-running matrix calculations in R.

Thanks,
Christine

--
Christine C. Stawitz, PhD. (pronouns: she/her)

National Stock Assessment Program Modeling Team

NOAA Fisheries Office of Science and Technology |  U.S. Department of
Commerce

Mobile: 206-617-2060

www.fisheries.noaa.gov


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
> E-mail is sent at my convenience; I don't expect replies outside of 
working hours.


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Problem with accessibility in R 4.2.0 and 4.2.1.

2022-09-22 Thread Ben Bolker

  There was a long apparently related thread back in May:

https://stat.ethz.ch/pipermail/r-devel/2022-May/081708.html

but that problem was supposedly patched in 4.2.1 ...


On 2022-09-22 9:48 a.m., Andrew Hart via R-devel wrote:
Hi. I'm having an issue with R 4.2.1 on Windows but I'm not sure if this 
is the right place to ask about it. If it's not, I'm hoping someone can 
point me in the right direction.


I'm blind and have been using R for about 11 years now. The base build 
available on CRAN is quite accessible and works pretty well with 
screen-reading software such as JAWS for Windows and NVDA. R-studio is 
not accessible which appears to have something to do with the version of 
QT it uses, but that's not relevant as I don't use it.


Recently I installed R 4.2.1 (I tend to upgrade two or three times a 
year and this time I was jumping from R 4.1.2 to 4.2.1).
However, I've encountered a serious problem which makes the latest 
version more or less unusable for doing any kind of serious work.
The issue is that the screen-reading software is unable to locate the R 
cursor and behaves as though the cursor is near the top left of the R 
application window. Practically, this means I can't tell what characters 
I'm passing over when cursoring left and right, nor can I hear what 
character is being deleted when the backspace is pressed. Most 
importantly, I can't tell where the insertion point is. This is a major 
regression in the ability to work with and edit the command line in the 
R console. There are ways of actually viewing the command line but the 
way I work is frequently calling up a previous command and making a 
change so as to not have to type the whole command again.


I Went and installed R 4.1.3 and R 4.2.0 in an attempt to find out 
exactly when things went awry and the issue first appeared in R 4.2.0.
Looking through the release notes, the only things mentioned that seem 
likely to be relevant are the following:


• R uses a new 64-bit Tcl/Tk bundle. The previous 32-bit/64-bit bundle 
had a different layout and can no longer be used.


and

• R uses UTF-8 as the native encoding on recent Windows systems (at 
least Windows 10 version 1903, Windows Server 2022 or Windows Server 
1903). As a part
of this change, R uses UCRT as the C runtime. UCRT should be installed 
manually on systems older than Windows 10 or Windows Server 2016 before 
installing

R.

I can't really see how changing to utf-8 as the native encoding would 
produce the behaviour I'm seeing, so I am guessing that the change in 
TCL/TK might be the culprit.


I'm hoping that someone will be able to help shed some light on what's 
going on here.


Thanks a lot,
Andrew.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
> E-mail is sent at my convenience; I don't expect replies outside of 
working hours.


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Writing in R // R 4.2.0 on Windows doesn't work with Dasher

2022-05-15 Thread Ben Bolker
 VScode is sorta-kinda open source 
https://opensource.com/article/20/6/open-source-alternatives-vs-code 
(that is, the default downloadable binaries are non-freely licensed). 
Presumably the open builds also work.


  On the other hand, it's also developed by Microsoft, so it's not much 
of a surprise that it works better on Windows than some of the alternatives.


On 2022-05-15 5:48 p.m., Duncan Murdoch wrote:

On 15/05/2022 5:01 p.m., Kasper Daniel Hansen wrote:
It is interesting that Paulo reports Rgui to behave differently from 
many (all?) other applications. However, I have no insight into Windows.


It's not a big surprise.  Rgui uses a UI library (Graphapp) that was 
written a very long time ago, and it hasn't adopted their updates in at 
least 15 years.  Additionally, Rgui hasn't really had any Windows users 
giving it the attention it needs.


And not nearly "all".  RStudio has different problems, which means 
everyone using the same UI library they use probably has them too.  I 
didn't see any open source projects in the list of things that work.


Duncan Murdoch



Best,
Kasper

On Sun, May 15, 2022 at 3:32 PM Duncan Murdoch 
mailto:murdoch.dun...@gmail.com>> wrote:


    On 15/05/2022 2:44 p.m., Ben Bolker wrote:
 >     I don't know if there's a good up-to-date list anywhere of
 > editors/IDEs that handle R nicely, but it would include at least:
 >
 >     Atom
 >     Sublime Text
 >     VS Code
 >     RStudio
 >     Jupyter notebooks
 >     vim
 >     emacs
 >     Tinn-R
 >
 >     It's worth being able to choose from such a list both for 
general
 > aesthetic preferences, and for those with accessibility 
challenges.


    One more that I should have mentioned:  StatET, a plug-in for the
    Eclipse IDE.

 >
 >    I do agree that it would be nice if there were a way to make the R
 > console work well with Dasher under Windows, but the technical details
 > are completely beyond me.

    A long time ago I used to know some of this stuff, but now I find
    working in Windows quite difficult.  I never knew it well enough to
    know
    the advantages and disadvantages of the approach RGui uses versus the
    one that dasher seems to be expecting.

    On the hopeful side, accessibility has always had a relatively high
    priority in the R Project, and there seems to be a recent push in
    that direction.  Perhaps there will be an opportunity for someone
    to bring up this issue at useR! 2022
    (https://user2022.r-project.org).

    Duncan Murdoch

    __
    R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list
    https://stat.ethz.ch/mailman/listinfo/r-devel
    <https://stat.ethz.ch/mailman/listinfo/r-devel>



--
Best,
Kasper




--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] model.weights and model.offset: request for adjustment

2022-02-01 Thread Ben Bolker
  The model.weights() and model.offset() functions from the 'stats' 
package index possibly-missing elements of a data frame via $, e.g.


x$"(offset)"
x$"(weights)"

This returns NULL without comment when x is a data frame:

x <- data.frame(a=1)
x$"(offset)"  ## NULL
x$"(weights)"  ## NULL

However, when x is a tibble we get a warning as well:

x <- tibble::as_tibble(x)
x$"(offset)"
## NULL
## Warning message:
## Unknown or uninitialised column: `(offset)`.

   I know it's not R-core's responsibility to manage forward 
compatibility with tibbles, but in this case [[-indexing would seem to 
be better practice in any case.
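
A sketch of the distinction (tibble's exact behaviour may vary by 
version, so the comments here are indicative rather than definitive):

```r
## $ warns on a missing tibble column; [[ returns NULL quietly
x <- tibble::as_tibble(data.frame(a = 1))
x$"(offset)"      # NULL, plus an "Unknown or uninitialised column" warning
x[["(offset)"]]   # NULL, silently -- same as for a plain data frame
```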


  Might a patch be accepted ... ?

  cheers
   Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] partial matching of row names in [-indexing

2022-01-14 Thread Ben Bolker
  Makes sense if you realize that ?"[" only applies to *vector*, 
*list*, and *matrix* indexing and that data frames follow their own 
rules that are documented elsewhere ...


  So yes, not a bug but I claim it's an infelicity. I might submit a 
doc patch.


 FWIW

b["A1",]
as.matrix(b)["A1",]

 illustrates the difference.

 thanks
   Ben


On 1/14/22 9:19 PM, Steve Martin wrote:

I don't think this is a bug in the documentation. The help page for
`?[.data.frame` has the following in the last paragraph of the
details:

Both [ and [[ extraction methods partially match row names. By default
neither partially match column names, but [[ will if exact = FALSE
(and with a warning if exact = NA). If you want to exact matching on
row names use match, as in the examples.

The example it refers to is

sw <- swiss[1:5, 1:4]  # select a manageable subset
sw["C", ] # partially matches
sw[match("C", row.names(sw)), ] # no exact match

Whether this is good behaviour or not is a different question, but the
documentation seems clear enough (to me, at least).

Best,
Steve

On Fri, 14 Jan 2022 at 20:40, Ben Bolker  wrote:



People are often surprised that row-indexing a data frame by [ +
character does partial matching (and annoyed that there is no way to
turn it off:

https://stackoverflow.com/questions/18033501/warning-when-partial-matching-rownames

https://stackoverflow.com/questions/34233235/r-returning-partial-matching-of-row-names

https://stackoverflow.com/questions/70716905/why-does-r-have-inconsistent-behaviors-when-a-non-existent-rowname-is-retrieved


?"[" says:

Character indices can in some circumstances be partially matched
   (see ‘pmatch’) to the names or dimnames of the object being
   subsetted (but never for subassignment).  UNLIKE S (Becker et al_
   p. 358), R NEVER USES PARTIAL MATCHING WHEN EXTRACTING BY ‘[’, and
   partial matching is not by default used by ‘[[’ (see argument
   ‘exact’).

(EMPHASIS ADDED).

Looking through the rest of that page, I don't see any other text that
modifies or supersedes that statement.

Is this a documentation bug?

The example given in one of the links above:

b <- as.data.frame(matrix(4:5, ncol = 1, nrow = 2, dimnames =
list(c("A10", "B"), "V1")))

b["A1",]  ## 4 (partial matching)
b[rownames(b) == "A1",]  ## logical(0)
b["A1", , exact=TRUE]## unused argument error
b$V1[["A1"]] ## subscript out of bounds error
b$V1["A1"]   ## NA

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] partial matching of row names in [-indexing

2022-01-14 Thread Ben Bolker



  People are often surprised that row-indexing a data frame by [ + 
character does partial matching (and annoyed that there is no way to 
turn it off:


https://stackoverflow.com/questions/18033501/warning-when-partial-matching-rownames

https://stackoverflow.com/questions/34233235/r-returning-partial-matching-of-row-names

https://stackoverflow.com/questions/70716905/why-does-r-have-inconsistent-behaviors-when-a-non-existent-rowname-is-retrieved


?"[" says:

Character indices can in some circumstances be partially matched
 (see ‘pmatch’) to the names or dimnames of the object being
 subsetted (but never for subassignment).  UNLIKE S (Becker et al_
 p. 358), R NEVER USES PARTIAL MATCHING WHEN EXTRACTING BY ‘[’, and
 partial matching is not by default used by ‘[[’ (see argument
 ‘exact’).

(EMPHASIS ADDED).

Looking through the rest of that page, I don't see any other text that 
modifies or supersedes that statement.


  Is this a documentation bug?

The example given in one of the links above:

b <- as.data.frame(matrix(4:5, ncol = 1, nrow = 2, dimnames = 
list(c("A10", "B"), "V1")))


b["A1",]  ## 4 (partial matching)
b[rownames(b) == "A1",]  ## logical(0)
b["A1", , exact=TRUE]## unused argument error
b$V1[["A1"]] ## subscript out of bounds error
b$V1["A1"]   ## NA
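
For reference, a sketch of an exact-matching alternative using match():

```r
## match() does exact matching, so there is no partial hit on "A1"
b[match("A1", rownames(b)), ]    # row of NAs (no exact match)
b[match("A10", rownames(b)), ]   # the actual "A10" row
```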

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] documentation patch for as.formula → reformulate

2022-01-09 Thread Ben Bolker
  There was some discussion on twitter about the fact that the manual 
page for as.formula() doesn't mention reformulate(), and indeed the last 
example is


## Create a formula for a model with a large number of variables:
 xnam <- paste0("x", 1:25)
 (fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))


which could arguably be better done as

  reformulate(xnam, response = "y")

  I've attached a documentation patch that adds the alternative version 
and a \seealso{} link.


  Happy to submit to r-bugzilla if requested.

  cheers
   Ben Bolker
Index: formula.Rd
===================================================================
--- formula.Rd  (revision 81462)
+++ formula.Rd  (working copy)
@@ -158,9 +158,10 @@
 \seealso{
   \code{\link{~}}, \code{\link{I}}, \code{\link{offset}}.
 
-  For formula manipulation: \code{\link{terms}}, and \code{\link{all.vars}};
-  for typical use: \code{\link{lm}}, \code{\link{glm}}, and
+  For formula manipulation: \code{\link{terms}}, and \code{\link{all.vars}}.
+  For typical use: \code{\link{lm}}, \code{\link{glm}}, and
   \code{\link{coplot}}.
+  For formula construction: \code{\link{reformulate}}.
 }
 \examples{
 class(fo <- y ~ x1*x2) # "formula"
@@ -176,5 +177,8 @@
 ## Create a formula for a model with a large number of variables:
 xnam <- paste0("x", 1:25)
 (fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))
+## Equivalent with reformulate():
+fmla2 <- reformulate(xnam, response = "y")
+identical(fmla, fmla2)
 }
 \keyword{models}
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] trivial typo in NEWS file

2022-01-03 Thread Ben Bolker



  Index: doc/NEWS.Rd
===================================================================
--- doc/NEWS.Rd (revision 81435)
+++ doc/NEWS.Rd (working copy)
@@ -425,7 +425,7 @@
   data frames with default row names (Thanks to Charlie Gao's
   \PR{18179}).

-  \item \code{txtProgresBar()} now enforces a non-zero width for
+  \item \code{txtProgressBar()} now enforces a non-zero width for
   \code{char}, without which no progress can be visible.

   \item \code{dimnames(table(d))} is more consistent in the case where

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Why does lm() with the subset argument give a different answer than subsetting in advance?

2021-12-27 Thread Ben Bolker
  I agree that it seems non-intuitive (I can't think of a design reason 
for it to look this way), but I'd like to stress that it's *not* an 
information leak; the predictions of the model are independent of the 
parameterization, which is all this issue affects. In a worst case there 
might be some unfortunate effects on numerical stability if the 
data-dependent bases are computed on a very different set of data than 
the model fitting actually uses.
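
  To make that concrete, a small sketch (simulated data, not from the 
original post): the two fits parameterize the polynomial differently, 
but their fitted values agree.

```r
set.seed(101)
d <- data.frame(x = runif(50))
d$y <- rnorm(50, mean = d$x)
keep <- 1:30

## basis computed on all 50 x values, rows dropped afterwards
fit_subset <- lm(y ~ poly(x, 2), data = d, subset = keep)
## basis computed on the 30 retained x values only
fit_presub <- lm(y ~ poly(x, 2), data = d[keep, ])

## coefficients differ (different orthogonal bases) ...
all.equal(unname(coef(fit_subset)), unname(coef(fit_presub)))
## ... but both parameterizations give the same predictions
all.equal(unname(fitted(fit_subset)), unname(fitted(fit_presub)))  # TRUE
```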


  I've attached a suggested documentation patch (I hope it makes it 
through to the list; if not, I can add it to the body of a message).




On 12/26/21 8:35 PM, Balise, Raymond R wrote:

Hello R folks,
Today I noticed that using the subset argument in lm() with a polynomial gives 
a different result than using the polynomial when the data has already been 
subsetted. This was not at all intuitive for me. You can see an example 
here: 
https://stackoverflow.com/questions/70490599/why-does-lm-with-the-subset-argument-give-a-different-answer-than-subsetting-i

 If this is a design feature that you don’t think should be 
fixed, can you please include it in the documentation and explain why it makes 
sense to figure out the orthogonal polynomials on the entire dataset?  This 
feels like a serious leak of information when evaluating train and test datasets 
in a statistical learning framework.

Ray

Raymond R. Balise, PhD
Assistant  Professor
Department of Public Health Sciences, Biostatistics

University of Miami, Miller School of Medicine
1120 N.W. 14th Street
Don Soffer Clinical Research Center - Room 1061
Miami, Florida 33136



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics
Index: lm.Rd
===================================================================
--- lm.Rd   (revision 81416)
+++ lm.Rd   (working copy)
@@ -33,7 +33,9 @@
 typically the environment from which \code{lm} is called.}
 
   \item{subset}{an optional vector specifying a subset of observations
-to be used in the fitting process.}
+to be used in the fitting process. (See additional details about how
+this argument interacts with data-dependent bases in the
+\sQuote{Details} section of the \code{\link{model.frame}} documentation.)
 
   \item{weights}{an optional vector of weights to be used in the fitting
 process.  Should be \code{NULL} or a numeric vector.
Index: model.frame.Rd
===================================================================
--- model.frame.Rd  (revision 81416)
+++ model.frame.Rd  (working copy)
@@ -38,7 +38,9 @@
   \item{subset}{a specification of the rows to be used: defaults to all
 rows. This can be any valid indexing vector (see
 \code{\link{[.data.frame}}) for the rows of \code{data} or if that is not
-supplied, a data frame made up of the variables used in \code{formula}.}
+supplied, a data frame made up of the variables used in
+\code{formula}. (See additional details about how this argument
+interacts with data-dependent bases under \sQuote{Details} below.)
 
   \item{na.action}{how \code{NA}s are treated.  The default is first,
 any \code{na.action} attribute of \code{data}, second
@@ -103,6 +105,12 @@
   character variable is found, it is converted to a factor (as from \R
   2.10.0).
 
+  Because variables in the formula are evaluated before rows are dropped
+  based on \code{subset}, the characteristics of data-dependent bases
+  such as orthogonal polynomials (i.e. from terms using
+  \code{\link{poly}}) or splines will be computed based on the full data
+  set rather than the subsetted data set.
+
   Unless \code{na.action = NULL}, time-series attributes will be removed
   from the variables found (since they will be wrong if \code{NA}s are
   removed).
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] plogis (and other p* functions), vectorized lower.tail

2021-12-09 Thread Ben Bolker




On 12/9/21 10:03 AM, Martin Maechler wrote:

Matthias Gondan on Wed, 8 Dec 2021 19:37:09 +0100 writes:


 > Dear R developers,
 > I have seen that plogis silently ignores vector elements of lower.tail,

and also of 'log'.
This is indeed the case for all d*, p*, q* functions.

Yes, this has been on purpose and therefore documented, in the
case of plogis, e.g. in the 'Value' section of ?plogis :

  The length of the result is determined by ‘n’ for ‘rlogis’, and is
  the maximum of the lengths of the numerical arguments for the
  other functions.

  (note: *numerical* arguments: the logical ones are not recycled)

  The numerical arguments other than ‘n’ are recycled to the length
  of the result.  Only the first elements of the logical arguments
  are used.

  (above, we even explicitly mention the logical arguments ..)


Recycling happens for the first argument (x,p,q) of these
functions and for "parameters" of the distribution, but not for
lower.tail, log.p (or 'log').


 >> plogis(q=0.5, location=1, lower.tail=TRUE)
 > [1] 0.3775407
 >> plogis(q=0.5, location=1, lower.tail=FALSE)
 > [1] 0.6224593
 >> plogis(q=c(0.5, 0.5), location=1, lower.tail=c(TRUE, FALSE))
 > [1] 0.3775407 0.3775407

 > For those familiar with psychological measurement: A use case of the 
above function is the so-called Rasch model, where the probability that a person 
with some specific ability (q) makes a correct (lower.tail=TRUE) or wrong response 
(lower.tail=FALSE) to an item with a specific difficulty (location). A vectorized 
version of plogis would enable to determine the likelihood of an entire response 
vector in a single call. My current workaround is an intermediate call to 
„Vectorize“.

 > I am wondering if the logical argument of lower.tail can be vectorized 
(?). I see that this may be a substantial change in many places (basically, all p 
and q functions of probability distributions), but in my understanding, it would 
not break existing code which assumes lower.tail to be a single element. If that’s 
not
 > possible/feasible, I suggest to issue a warning if a vector of length > 
1 is given in lower.tail. I am aware that the documentation clearly states that 
lower.tail is a single boolean.

aah ok, here you say you know that the current behavior is documented.

 > Thank you for your consideration.


As you mention, changing this would be quite a large endeavor.
I had thought about doing that many years ago, not remembering
details, but seeing that in almost all situations you really
only need one of the two tails  (for Gaussian- or t- based confidence
intervals you also only need one, for symmetry reason).

Allowing the recycling there would make the intermediate C code
(which does the recycling) larger and probably slightly
slower because of conceptually two more for loops which would in
99.9% only have one case ..

I'd have found that ugly to add. ... ...
... but of course, if you can prove that the code bloat would not be large
and not deteriorate speed in a measurable way and if you'd find
someone to produce a comprehensive and tested patch ...

Martin


 > With best wishes,
 > Matthias



 > [[alternative HTML version deleted]]

 > __
 > R-devel@r-project.org mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



  I agree with everything said above, but think that adding a warning 
when length(lower.tail) > 1 (rather than silently ignoring) might be 
helpful ...  ??


  As for the vectorization, it seems almost trivial to do at the user 
level when needed (albeit it's probably a little bit inefficient):


pv <- Vectorize(plogis, c("q", "location", "scale", "lower.tail"))
pv(q=c(0.5, 0.5), location=1, lower.tail=c(TRUE, FALSE))
[1] 0.3775407 0.6224593
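
  A slightly more efficient alternative sketch (hypothetical wrapper 
name), using the identity P(X > q) = 1 - P(X <= q) rather than 
Vectorize():

```r
plogis_vec <- function(q, location = 0, scale = 1, lower.tail = TRUE) {
  p <- plogis(q, location, scale)        # lower-tail probabilities
  lt <- rep_len(lower.tail, length(p))   # recycle the logical explicitly
  ifelse(lt, p, 1 - p)                   # flip where lower.tail is FALSE
}

plogis_vec(q = c(0.5, 0.5), location = 1, lower.tail = c(TRUE, FALSE))
## [1] 0.3775407 0.6224593
```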

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] stats::fft produces inconsistent results

2021-10-21 Thread Ben Bolker

  Nice!

On 10/21/21 4:26 PM, Dipterix Wang wrote:
Thank you for such a detailed and plain explanation. It is much clearer to 
me now w.r.t. the R internal memory management and how PROTECT should be 
used.


Also after diving into the documentation of FFTW3 library, I think I 
found why the data was centered.


https://www.fftw.org/fftw3_doc/Planner-Flags.html


Basically
1. FFTW3 modifies the input data by default
2. one has to initialize the data after planning fft (except for some 
special situations). This “subtle” detail is buried in their 
documentation and is very hard to debug once a mistake is made.


The second one actually causes CRAN package fftwtools to produce 
inconsistent results on osx 
(https://github.com/krahim/fftwtools/issues/15)


Best,
Dipterix

On Oct 21, 2021, at 6:32 AM, GILLIBERT, Andre 
<andre.gillib...@chu-rouen.fr> wrote:


> Haha, thanks : ) I guess I will probably be grouchy too if seeing so many 
people making the same mistakes again and again. It just happened to be me.

Fortunately, you did not get offended. :)

This is nice to have a large community of developers for R packages, 
even if, sometimes, buggy packages are annoying R developers because 
any small change in R may "break" them even though they were actually 
broken from the beginning.


>Indeed, I found myself often confused about when to PROTECT and when not.

A (relatively) quick explanation.
There are several “pools” of data objects that have different rules. 
The most common “pool” is the pool of garbage collectable R objects, 
that can be allocated with allocVector and is passed from R to C code 
and vice versa. Another pool is the malloc/free pool, that works with 
explicit allocation/deallocation. R does not modify the malloc/free 
implementation in any way, and memory leaks may happen. Operating 
systems may have other pools of memory (e.g. mmap'ed memory) that are 
not handled by R either. There is also a transient storage 
(R_alloc/vmaxset/vmaxget) that is automatically freed when returning 
from C to R, and should be used for temporary storage but not for 
objects returned to R code.


The PROTECT system is needed for garbage collectable objects.
The garbage collector may trigger whenever a R internal function is 
called. Typically, when some memory is internally allocated.
The garbage collector frees objects that are neither referenced 
directly nor indirectly from R code and from the PROTECT stack.
The PROTECT stack is used by C code to make sure objects that are not 
yet (or will never be) referenced by R code, are not destroyed when 
the garbage collector runs.


The functions allocating new R objects, such as allocVector(), but 
also coerceVector() and duplicate(), return unprotected objects that may 
be destroyed the next time an internal R function is called, unless they 
are explicitly PROTECT'ed before. Indeed, such objects would have no 
reference from R code and so would be deleted.


The PROTECT stack must be balanced on a call from R to a C function. 
There must be as many UNPROTECT'ions as PROTECT'ions.


The typical C code PROTECTs any object allocated as soon as it is 
allocated (e.g. call to allocVector or coerceVector). It UNPROTECTs 
temporary objects to "free" them (the actual memory release may be 
delayed to the next garbage collection). It UNPROTECTs the object it 
returns to R code. Indeed, in pure C code, there will be no garbage 
collection between the time the object is UNPROTECTed and the time R 
grabs the object. You must be very careful if you are using C++, 
because destructors must not call any R internal function that may 
trigger a garbage collection.
The arguments to the C code do not have to be PROTECT'ed, unless they 
are re-allocated. For instance, it is frequent to call coerceVector on 
arguments and re-assign the result to the C variable that represents the 
argument. The new object must be PROTECT'ed.


Actually, you do not need to *directly* PROTECT all objects that are 
allocated in the C function, but you must make sure that all objects 
are *indirectly* PROTECT'ed. For instance, you may allocate a VECSXP 
(a "list" in R) and fill the slots with newly allocated objects. You 
only need to PROTECT the VECSXP, since its slots are indirectly protected.
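
As a concrete illustration of the pattern described above, a minimal 
sketch of a C entry point (it assumes R's C API headers and a package 
build environment, so it is not compilable on its own):

```c
#include <R.h>
#include <Rinternals.h>

SEXP make_pair(SEXP x)
{
    /* coerceVector() reallocates, so the new object must be PROTECTed */
    x = PROTECT(coerceVector(x, REALSXP));

    /* PROTECT the container; its slots are then indirectly protected */
    SEXP out = PROTECT(allocVector(VECSXP, 2));
    SET_VECTOR_ELT(out, 0, x);
    SET_VECTOR_ELT(out, 1, duplicate(x)); /* assigned immediately: safe */

    UNPROTECT(2);  /* balance the stack: two PROTECTs, two UNPROTECTs */
    return out;    /* R grabs the result before the next GC can run */
}
```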


If you have any doubt, it is not a bug to over-PROTECT objects. It may 
slightly slow down garbage collection and use space on the PROTECTion 
stack, but that is rarely a big deal. You should only avoid that when 
that would lead to thousands or millions of protections.


As I said, the PROTECT stack must be balanced between the entry and 
exit of the C code. This is not a problem for 99% of functions that 
free all the memory they use internally except the object that is 
returned. Sometimes, some "background" memory, hidden to R code, may 
have to be allocated for more time. A call to R_PreserveObject 
protect

Re: [Rd] stats::fft produces inconsistent results

2021-10-19 Thread Ben Bolker

  This is a long shot, but here's a plausible scenario:

  as part of its pipeline, ravetools::mvfftw computes the mean of the 
input vector **and then centers it to a mean of zero** (intentionally or 
accidentally?)


  because variables are passed to compiled code by reference (someone 
can feel free to correct my terminology), this means that the original 
vector in R now has a mean of zero


  the first element of fft() is mean(x)*length(x), so if mean(x) has 
been forced to zero, that would explain your issue.
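
  That relationship is easy to check from R (a quick sketch):

```r
set.seed(1)
x <- rnorm(100)

## the first (DC) element of the DFT is the sum of the series:
## fft(x)[1] = sum(x) = mean(x) * length(x)
all.equal(fft(x)[1], sum(x) + 0i)   # TRUE

## so if x has been centered upstream, fft(x)[1] collapses to ~0
fft(x - mean(x))[1]                 # essentially zero
```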


  I don't know about the non-reproducibility part.

On 10/19/21 7:06 PM, Dipterix Wang wrote:

Dear R-devel Team,

I'm developing a neuroscience signal pipeline package in R 
(https://github.com/dipterix/ravetools) and I noticed a weird issue that failed 
my unit test.

Basically I was trying to use `fftw3` library to implement fast multivariate 
fft function in C++. When I tried to compare my results with stats::fft, the 
test result showed the first element of **expected** (which was produced by 
stats::fft) was zero, which, I am pretty sure, is wrong, and I can confirm that 
my function produces correct results.

However, somehow I couldn’t reproduce this issue on my personal computer (osx, 
M1, R4.1.1), the error simply went away.

The catch is my function produced consistent and correct results but stats::fft 
was not. This does not mean `stats::fft` has bugs. Instead, I suspect there 
could be some weird interactions between my code and stats::fft at C/C++ level, 
but I couldn’t figure it out why.

+++ Details:

Here’s the code I used for the test:

https://github.com/dipterix/ravetools/blob/4dc35d64763304aff869d92dddad38a7f2b30637/tests/testthat/test-fftw.R#L33-L41

Test code
set.seed(1)
x <- rnorm(1000)
dim(x) <- c(100,10)
a <- ravetools:::mvfftw_r2c(x, 0)
c <- apply(x, 2, stats::fft)[1:51,]
expect_equal(a, c)


Here are the tests that gave me the errors:

The test logs on win-builder
https://win-builder.r-project.org/07586ios8AbL/00check.log

Test logs on GitHub
https://github.com/dipterix/ravetools/runs/3944874310?check_suite_focus=true


—— Failed tests ——
  -- Failure (test-fftw.R:41:3): mvfftw_r2c 
--
  `a` (`actual`) not equal to `c` (`expected`).

  actual vs expected
  [,1][,2]  
[,3]  [,4]...
  - actual[1, ] 10.8887367+ 0.000i  -3.7808077+ 0.000i   2.967354+ 
0.00i   5.160186+ 0.00i ...
  + expected[1, ]0.000+ 0.000i  -3.7808077+ 0.000i   2.967354+ 
0.00i   5.160186+ 0.00i...



The first columns are different, `actual` is the results I produced via 
`ravetools:::mvfftw_r2c`, and `expected` was produced by `stats::fft`


Any help or attention is very much appreciated.
Thanks,
- Zhengjia
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Question about quantile fuzz and GPL license

2021-09-14 Thread Ben Bolker




On 9/14/21 9:22 AM, Abel AOUN wrote:

Hello,

I'm currently working on Python numpy package to develop linear interpolation 
methods for quantiles.
Currently, numpy only supports type 7 of Hyndman & Fan and I did the 
implementation for the 8 other methods to do as much as R's quantile.

As you may guess, I was inspired by R implementation as well as other sources, 
which lead to my questions:

About fuzz (see first reference below for the source code),
fuzz <- 4 * .Machine$double.eps
I think I understand why the machine epsilon is used to correct some edge cases 
where the float comparisons would fail.
However I don't get why epsilon is multiplied by 4 instead of simply using 
epsilon.
Is there someone who can explain this 4 ?


No, but doing a bit of archaeology

https://github.com/wch/r-source/blame/trunk/src/library/stats/R/quantile.R

  gives the commit message for these lines as "add (modified) version of 
quantile.default from Rob Hyndman (17 years ago)".  This commit was made 
by Brian Ripley.


  However, the code from Rob Hyndman here:

https://stat.ethz.ch/pipermail/r-devel/2004-July/030204.html

  does **not** have the lines with the fuzz.  So my guess would be that 
Brian Ripley is the author of that particular bit of code.


  I can't say, myself, what the logic behind 4 * .Machine$double.eps is ...
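
  For what it's worth, the general role of such a fuzz is easy to 
illustrate (this only demonstrates fuzzed comparison, not the exact 
quantile() internals):

```r
## classic floating-point surprise:
(0.1 + 0.2) == 0.3                # FALSE
(0.1 + 0.2) - 0.3                 # about 5.6e-17

## a fuzz of a few machine epsilons treats such values as equal:
fuzz <- 4 * .Machine$double.eps
abs((0.1 + 0.2) - 0.3) < fuzz     # TRUE
```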




About licence,
Numpy is under license BSD and R is on GPL.
The only thing I really cherry picked and rewrote for numpy is the fuzz part.
I'm quite new to open source development. We are wondering if doing this breaks 
the GPL license and if I can credit the original authors.
Plus, I'm not quite sure this is the right place to ask this, if not, sorry for 
the noise.
The relevant discussion on the numpy PR is here:
https://github.com/numpy/numpy/pull/19857#discussion_r706019184


Thank you for your time.

Regards,
Abel Aoun


References:
The source code for R::quantile (fuzz is at line 82):
https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/stats/R/quantile.R
R doc for quantile:
https://www.rdocumentation.org/packages/stats/versions/3.5.0/topics/quantile
The ongoing PR on numpy:
https://github.com/numpy/numpy/pull/19857



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Issues with drop.terms

2021-08-23 Thread Ben Bolker
  Small follow-up: (1) in order for lm() to actually work you need 
keep.response=TRUE in the drop.terms() call (I realize that this is 
*not* the problem in your example)


test4 <- terms(mpg ~ hp + I(cyl==4) + disp + wt )
check4 <- drop.terms(test4, 3, keep.response = TRUE)
formula(check4)
lm( check4, data=mtcars)

(2) I'm ambivalent about your "We can argue that the user should have 
used I(cyl==4), but very many won't." argument. This is the ever-present 
"document precisely and require users to know and follow the 
documentation" vs. "try to protect users from themselves" debate - 
taking either side to an extreme is (IMO) unproductive. I don't know how 
hard it would be to make drop.terms() **not** drop parentheses, but it 
seems like it may be very hard/low-level. My vote would be to see if 
there is a reasonably robust way to detect these constructions and 
**warn** about them.
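
  In the meantime, a sketch of the I() workaround applied to Terry's 
third example (following the drop.terms() pattern used above):

```r
test3b <- terms(mpg ~ hp + I(cyl == 4) + disp + wt)
check3b <- drop.terms(test3b, 3, keep.response = TRUE)  # drop disp
formula(check3b)            # I(cyl == 4) survives with its parentheses
lm(check3b, data = mtcars)  # fits, unlike the bare (cyl == 4) version
```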


  I have probably asked about this before, but if anyone knows of 
useful materials that go into more details about the definitions and 
implementation of model matrix/terms/etc. machinery, *beyond* the 
appropriate chapter of "Statistical Models in S" (Becker/Chambers white 
book), *or* the source code itself, I would love some pointers ...


 Ben Bolker


On 8/23/21 10:36 AM, Therneau, Terry M., Ph.D. via R-devel wrote:

This is a follow-up to my earlier note on [.terms.  Based on a couple days' 
work getting the survival package to work around the issues, this will 
hopefully be more concise and better expressed than the prior note.

1.
test1 <- terms( y ~ x1:x2 + x3)
check <- drop.terms(termobj =test1, dropx = 1)
formula(check)
## ~x1:x2

The documentation for the dropx argument is "vector of positions of variables 
to drop from the right hand side of the model", but it is not clear what 
"positions" means.  I originally assumed "the order in the formula as typed", 
but was wrong.  I suggest adding a line: "Position refers to the order of 
terms in the term.labels attribute of the terms object, which is also the 
order they will appear in a coefficient vector (not counting the intercept)."

2.
library(splines)
test2 <- terms(model.frame(mpg ~  offset(cyl) + ns(hp, df=3) + disp + wt, 
data=mtcars))
check2 <- drop.terms(test2,  dropx = 2)
formula(check2)
## ~ns(hp, df=3) + wt

One side effect of how drop.terms is implemented, and one that I suspect was 
not intended, is that offsets are completely ignored.  The above drops both 
the offset and the disp term from the formula.  The dataClasses and predvars 
attributes of the result are also incorrect: they have lost the ns() term 
rather than the disp term; the results of predict will be incorrect.

attr(check2, "predvars")
##    list(offset(cyl), disp, wt)

Question: should the function be updated to not drop offsets? If not, a line 
needs to be added to the help file.  The handling of predvars needs to be 
fixed regardless.

3.
test3 <- terms(mpg ~ hp + (cyl==4) + disp + wt )
check3 <- drop.terms(test3, 3)
formula(check3)
lm( check3, data=mtcars)   # fails

The drop.terms action has lost the () around the logical expression, which 
leads to an
invalid formula.  We can argue that the user should have used I(cyc==4), but 
very many won't.

4. As a footnote, more confusion (for me) is generated by the fact that the 
"specials"
attribute of a formula does not use the numbering discussed in 1 above.   I had 
solved
this issue long ago in the untangle.specials function; long enough ago that I 
forgot I had
solved it, and just wasted a day rediscovering that fact.

---

I can create a patch for 1 and 2 (once we answer my question), but a fix for 3 
is not
clear to me.  It currently leads to failure in a coxph call that includes a 
strata so I am
directly interested in a solution; e.g.,  coxph(Surv(time, status) ~ age + 
(ph.ecog==2) +
strata(inst), data=lung)

Terry T



--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Feature request: Change default library path on Windows

2021-07-25 Thread Ben Bolker
  Given (AFAICS) the absence of any actual R-core members 
<https://www.r-project.org/contributors.html> chiming in on the 
discussion here, and provided that you think that the issue has been 
sufficiently discussed, I would say the next step ("how to go about 
making the change") is to compose & submit a wishlist request to the R 
bug tracker ...
  It might also be worth reaching out to/double-checking with Jeroen 
Ooms (Rtools maintainer and Windows infrastructure expert; I don't know 
if he has any more formal association with R-core/CRAN ?)


  cheers
   Ben Bolker


On 7/25/21 6:35 PM, Steve Haroz wrote:

So I would say that I still believe Microsoft doesn't give clear
guidance for this.


Sure, there is some ambiguity on where MS would prefer these kinds of
files. But what is clear is that the current location USER/Documents
is causing a serious issue.

And while we can all understand frustration with Microsoft, Windows
users represent a major proportion of the R install base. So let's see
what we can do to help out those users. Changing the default location
to either USER/R or USER/AppData/Local/R would help a lot of users,
both beginners and those with moderate experience who switch to a new
cloud backup.

Microsoft is unlikely to put out new guidance any time soon. And the
current guidance doesn't seem opposed to putting R libraries in either
suggested location. So how about we just pick one (I suggest USER/R
for simplicity) and discuss how to go about making the change?

-Steve

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] undefined subclass warning

2021-06-30 Thread Ben Bolker



  A colleague recently submitted a paper to JSS and was advised to 
address the following warning which occurs when their package 
(https://CRAN.R-project.org/package=pcoxtime) is loaded:


Warning message:
In .recacheSubclasses(def@className, def, env) :
  undefined subclass "numericVector" of class "Mnumeric"; definition 
not updated


After much digging I *think* I've concluded that this comes from the 
following import chain:


pcoxtime -> riskRegression -> rms -> quantreg -> MatrixModels

  that is, loading any of these packages throws the warning. 
MatrixModels Imports: *only* {stats, methods, Matrix} and loading these 
by themselves is warning-less.


   I assume there is some mismatch/incompatibility between MatrixModels 
(which was last updated 2021-03-01) and Matrix (2021-05-24), which has 
this NEWS item in the most recent release, 1.3-3:


* removed the nowhere used (and unexported but still active) class union 
"Mnumeric" which actually trickled into many base classes properties. 
Notably would it break validity of factor with a proposed change in 
validity checking, as factors were also "Mnumeric" but did not fulfill 
its validity method. Similarly removed (disabled) unused class union 
"numericVector".


   It seems that REINSTALLING the package from source solves the 
problem, which is nice, but I don't fully understand why; I guess  
there are class structures that are evaluated at install time and stored 
in the package environment ...


  Any more explanations would be welcome.

  cheers
Ben

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] dgTMatrix Segmentation Fault

2021-06-09 Thread Ben Bolker

  Nice!

On 6/9/21 9:00 PM, Dario Strbenac via R-devel wrote:

Good day,

Thanks to handy hints from Martin Morgan, I ran R under gdb and checked for any 
numeric overflow. We pinpointed the cause:

(gdb) info locals
i = 0
j = 10738
m = 200000
n = 5
ans = 0x5b332790
aa = 0x5b3327c0

There is a line of C code in dgeMatrix.c: for (i = 0; i < m; i++) aa[i] += xx[i 
+ j * m];

i, j and m are all int, so i + j * m overflows:
(lldb) print 0 + 10738 * 200000
(int) $5 = -2147367296

So, either the code should check that this doesn't occur, or be adjusted to 
allow for large indexes.

If anyone is interested, this is in the context of single-cell ATAC-seq data, 
which typically has about 200,000 genomic regions (rows) and perhaps 100,000 
biological cells (columns).

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to get utf8 string using R externals

2021-06-02 Thread Ben Bolker

  Might the new UCRT build help?

https://developer.r-project.org/Blog/public/2021/03/12/windows/utf-8-toolchain-and-cran-package-checks/

On 6/2/21 5:36 PM, Ben Bolker wrote:



On 6/2/21 5:31 PM, Duncan Murdoch wrote:

On 02/06/2021 4:33 p.m., xiaoyan yu wrote:

I have an R script Predict.R:
 set.seed(42)
 C <- seq(1:1000)
 A <- rep(seq(1:200),5)
 E <- (seq(1:1000) * (0.8 + (0.4*runif(50, 0, 1))))
 L <- ifelse(runif(1000)>.5,1,0)
 df <- data.frame(cbind(C, A, E, L))
load("C:/Temp/tree.RData")    #  load the model for scoring

   P <- as.character(predict(tree_model_1,df,type='class'))

Then in a C++ program
I call eval to evaluate the script and then findVar the P variable.
After getting each class label from P using string_elt and then
Rf_translateChar, the characters are unicode escapes (<U+BD80><U+C2E4>)
instead of the utf8 encoding of the korean characters 부실.
Can I know how to get UTF8 by using R externals?

I also found the same script giving utf8 characters in RGui but 
unicode in

Rterm.
I tried to attach a screenshot but got message "The message's content 
type

was not explicitly allowed"
In RGui, I saw the output 부실, while in Rterm, <U+BD80><U+C2E4>.


Sounds like you're using Windows.  Stop doing that.

Duncan Murdoch



   Shouldn't there be a smiley there somewhere?



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to get utf8 string using R externals

2021-06-02 Thread Ben Bolker




On 6/2/21 5:31 PM, Duncan Murdoch wrote:

On 02/06/2021 4:33 p.m., xiaoyan yu wrote:

I have an R script Predict.R:
 set.seed(42)
 C <- seq(1:1000)
 A <- rep(seq(1:200),5)
 E <- (seq(1:1000) * (0.8 + (0.4*runif(50, 0, 1))))
 L <- ifelse(runif(1000)>.5,1,0)
 df <- data.frame(cbind(C, A, E, L))
load("C:/Temp/tree.RData")    #  load the model for scoring

   P <- as.character(predict(tree_model_1,df,type='class'))

Then in a C++ program
I call eval to evaluate the script and then findVar the P variable.
After getting each class label from P using string_elt and then
Rf_translateChar, the characters are unicode escapes (<U+BD80><U+C2E4>)
instead of the utf8 encoding of the korean characters 부실.
Can I know how to get UTF8 by using R externals?

I also found the same script giving utf8 characters in RGui but 
unicode in

Rterm.
I tried to attach a screenshot but got message "The message's content 
type

was not explicitly allowed"
In RGui, I saw the output 부실, while in Rterm, <U+BD80><U+C2E4>.


Sounds like you're using Windows.  Stop doing that.

Duncan Murdoch



  Shouldn't there be a smiley there somewhere?

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] {splines} package gone missing?

2021-05-20 Thread Ben Bolker
splines is 'recommended' (not sure about capitalization), not 'base'.

On Thu, May 20, 2021, 7:02 AM Randall Pruim  wrote:

> Thanks.  I actually sort of checked for this:
>
>
> row.names(installed.packages(priority = "base"))
>  [1] "base"  "compiler"  "datasets"  "graphics"  "grDevices" "grid"
>   "methods"   "parallel"  "stats" "stats4""tcltk"
> [12] "tools" "utils"
>
> But, of course, if the package is missing on my system (a newly installed
> 4.1 on an RStudio server), then it won’t be listed here.
>
> I’ll have to figure out what went wrong with the install.  I’ll probably
> start by having our sysadmin simply reinstall R 4.1 and hope that that
> takes care of the problem.
>
> Looks like profile is missing as well.
>
> —rjp
>
> On May 20, 2021, at 3:14 AM, peter dalgaard  pda...@gmail.com>> wrote:
>
> It is part of base R, so comes with the R sources:
>
> Peters-MacBook-Air:R pd$ ls src/library/
> Makefile.in   compiler/ grid/ splines/  tools/
> Makefile.win  datasets/ methods/  stats/
> translations/
> Recommended/  grDevices/parallel/ stats4/   utils/
> base/ graphics/ profile/  tcltk/
>
> - pd
>
> On 20 May 2021, at 06:02 , Randall Pruim  rpr...@calvin.edu>> wrote:
>
>
>
> https://cran.r-project.org/web/packages/splines/index.html
> claims that the {splines} package has been archived.  If I follow the link
> there to the archives, the newest version shown is from 1999.  It seems
> like something has gone wrong with this package.
>
> I checked on another mirror and {splines} is missing there as well.
>
> —rjp
>
> __
> R-devel@r-project.org mailing list
>
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd@cbs.dk  Priv: pda...@gmail.com pda...@gmail.com>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Add to Documentation of atan2.

2021-05-18 Thread Ben Bolker
  Can you dig into the code, see what's going on, and suggest a 
documentation patch?  To get you started, the code for the complex 
version of atan2 is in


https://svn.r-project.org/R/trunk/src/main/complex.c

z_atan2 is at line 669 (the first argument is a pointer to the result, 
args 2 [csn] and 3 [ccs] are pointers to the arguments of atan2())


 In generic cases the computation is

dr = catan(dcsn / dccs);
if(creal(dccs) < 0) dr += M_PI;
if(creal(dr) > M_PI) dr -= 2 * M_PI;

where dcsn, dccs are converted versions of the args.
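[Editorial sketch: the quoted C can be transliterated into R for experimentation. This is only a rough model of the generic case; R's `atan()` accepts complex arguments, and `csn`/`ccs` stand in for the converted `dcsn`/`dccs`.]

```r
## Rough R transliteration of the generic case in z_atan2() (sketch only):
z_atan2 <- function(csn, ccs) {
  dr <- atan(csn / ccs)
  if (Re(ccs) < 0) dr <- dr + pi
  if (Re(dr) > pi) dr <- dr - 2 * pi
  dr
}
z_atan2(1+0i, 1+0i)   # should agree with atan2(1, 1), i.e. pi/4
```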

catan() is *either* taken from system libraries or is defined at line 489.

  On my system (Ubuntu), 'man 3 catan' gives documentation on the 
function, and says "The real part of y is chosen in the interval 
[-pi/2,pi/2]" - but that _could_ be system-dependent.


   cheers
   Ben Bolker

On 5/18/21 10:39 AM, Jorgen Harmse via R-devel wrote:

The current documentation says that atan2(y,x) is the angle between the x-axis and 
the vector from the origin to (x,y), but what does this mean when x & y are 
complex? The function seems to pick theta with Re(theta) between -pi and pi and 
with tan(theta) (approximately) equal to y/x, but that leaves 2 (sometimes 3) 
options, and there must be a set (branch region with 3 real dimensions?) on which 
the function is discontinuous. Please add details.

Even for real inputs, it might help to spell out the behaviour on the negative 
x-axis. It mostly matches the branch-cut rules for the other functions, but 
atan2(0,0)==0 is unexpected.

I also suggest ‘See Also’ links from trigonometric functions to hyperbolic 
functions and from hyperbolic functions to exponential & logarithmic functions.

Regards,
Jorgen Harmse.




R.version.string


[1] "R version 4.0.4 (2021-02-15)"






[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] base R pipe documentation

2021-05-17 Thread Ben Bolker
  As of right now, as far as I can tell, the documentation for the new 
native |> pipe still says that it's experimental.


https://github.com/wch/r-source/blob/trunk/src/library/base/man/pipeOp.Rd#L45

 *Pipe support is experimental and may change prior to release.*

Also still in the 4-1 branch:

https://github.com/wch/r-source/blob/R-4-1-branch/src/library/base/man/pipeOp.Rd#L45

  (The corresponding comment in the NEWS file has been fixed in the 
last 24 hours, but hasn't propagated to the online/HTML version on the 
developer page yet ...)


  As a "wish list" item, if there are any particularly 
salient/important  differences between the |> pipe and the %>% magrittr 
pipe, it would be great to have those documented (I know that 
documenting the difference between a base-R operator and the one that's 
implemented in a non-Recommended package is a little weird, but it would 
be helpful in this case ...)  I know I could go back to the mailing list 
discussion at 
https://hypatia.math.ethz.ch/pipermail/r-devel/2020-December/080173.html 
and try to figure it out for myself ...
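[Editorial sketch, not part of the original message: a few of the differences as they stood at the time. Note that R 4.2 later added the `_` placeholder; in 4.1 there was none.]

```r
## |> requires an explicit function call on the right-hand side:
4 |> sqrt()          # works in R >= 4.1
# 4 |> sqrt          # parse error with |>; `4 %>% sqrt` works with magrittr
## |> had no argument placeholder in 4.1 (magrittr has `.`;
## R 4.2 later introduced `_`), and it is a purely syntactic
## transformation, so it adds no function-call overhead.
```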


  cheers
   Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Testing R build when using --without-recommended-packages?

2021-05-04 Thread Ben Bolker



  Sorry if this has been pointed out already, but some relevant text 
from 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Suggested-packages


> Note that someone wanting to run the examples/tests/vignettes may not 
have a suggested package available (and it may not even be possible to 
install it for that platform). The recommendation used to be to make 
their use conditional via if(require("pkgname")): this is OK if that 
conditioning is done in examples/tests/vignettes, although using 
if(requireNamespace("pkgname")) is preferred, if possible.


...

> Some people have assumed that a ‘recommended’ package in ‘Suggests’ 
can safely be used unconditionally, but this is not so. (R can be 
installed without recommended packages, and which packages are 
‘recommended’ may change.)
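[Editorial sketch of the conditional pattern the manual recommends, using MASS as an example of a recommended-but-not-guaranteed package:]

```r
## Guard use of a suggested/recommended package:
if (requireNamespace("MASS", quietly = TRUE)) {
  fit <- MASS::rlm(stack.loss ~ ., data = stackloss)
} else {
  message("MASS not available; skipping this example")
}
```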




On 5/4/21 5:10 PM, Gabriel Becker wrote:

Hi Henrik,

A couple of things. Firstly, so far as I have ever heard, it's valid that a
package have hard dependencies in its tests for packages listed only in
Suggests.  In fact, that is one of the stated purposes of Suggests. An
argument could be made, I suppose, that the base packages should be under
stricter guidelines, but stats isn't violating the letter or intention of
Suggests by doing this.


Secondly, I don't have time to dig through the make files/administration
docs, but I do know that R CMD check has --no-stop-on-error, so you can
either separately or as part of make check, use that option for stats (and
elsewhere as needed?) and just know that the stats tests that depend on
MASS are "false positive" (or, more accurately, missing value) test
results, rather than real positives, and go from there.

You could also "patch" the tests as part of your build process. Somewhere I
worked had to do that for parts of the internet tests that were unable to
get through the firewall.

Best,
~G



On Tue, May 4, 2021 at 1:04 PM Henrik Bengtsson 
wrote:


Two questions to R Core:

1. Is R designed so that 'recommended' packages are optional, or
should that be considered uncharted territories?

2. Can such an R build/installation be validated using existing check
methods?


--

Dirk, it's not clear to me whether you know for sure, or you draw
conclusions based your long experience and reading. I think it's very
important that others don't find this thread later on and read your
comments as if they're the "truth" (unless they are).  I haven't
re-read it from start to finish, but there are passages in 'R
Installation and Administration' suggesting you can build and install
R without 'recommended' packages.  For example, post-installation,
Section 'Testing an Installation' suggests you can run (after making
sure `make install-tests`):

cd tests
../bin/R CMD make check

but they fail the same way.  The passage continues "... and other
useful targets are test-BasePackages and test-Recommended to run tests
of the standard and recommended packages (if installed) respectively."
(*).  So, to me that hints at 'recommended' packages are optional just
as they're "Priority: recommended".  Further down, there's also a
mentioning of:

$ R_LIBS_USER="" R --vanilla

Sys.setenv(LC_COLLATE = "C", LC_TIME = "C", LANGUAGE = "en")
tools::testInstalledPackages(scope = "base")


which also produces errors when 'recommended' packages are missing,
e.g. "Failed with error:  'there is no package called 'nlme'".

(*) BTW, '../bin/R CMD make test-BasePackages' gives "make: *** No
rule to make target 'test-BasePackages'.  Stop."

Thanks,

/Henrik

On Tue, May 4, 2021 at 12:22 PM Dirk Eddelbuettel  wrote:



On 4 May 2021 at 11:25, Henrik Bengtsson wrote:
| FWIW,
|
| $ ./configure --help
| ...
|   --with-recommended-packages
|   use/install recommended R packages [yes]

Of course. But look at the verb in your Subject: no optionality _in

testing_ there.


You obviously need to be able to build R itself to then build the

recommended

packages you need for testing.

Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R compilation on old(ish) CentOS

2021-05-01 Thread Ben Bolker
  Thanks -- yes, I can confirm that it installs OK after erasing and 
checking out SVN from scratch.


On 4/30/21 9:40 PM, Henrik Bengtsson wrote:

Ben, it's most likely what Peter says.  I can confirm it works; I just
installed https://cran.r-project.org/src/base-prerelease/R-latest.tar.gz
on an up-to-date CentOS 7.9.2009 system using the vanilla gcc (GCC)
4.8.5 that comes with that version and R compiles just fine and it
passes 'make check' too.

Since R is trying to move toward C++14 support by default, I agree
with Iñaki, you might want to build and run R with a newer version of
gcc.  gcc 4.8.5 will only give you C++11 support.  RedHat's Software
Collections (SCL) devtoolsets are the easiest way to do this. I've
done this too and can confirm that gcc 7.3.1 that comes with SCL
devtoolset/7 is sufficient to get C++14 support.  I'm sharing my
installation with lots of users, so I make it all transparent to the
end-user with environment modules, i.e. 'module load r/4.1.0' is all
the user needs to know.

/Henrik

On Thu, Apr 29, 2021 at 7:28 AM Peter Dalgaard  wrote:


You may want to check out your checkout

I see:

Peter-Dalgaards-iMac:R pd$ grep newsock src/main/connections.c
 con = R_newsock(host, port, server, serverfd, open, timeout, options);

but your file seems to have lost the ", options" bit somehow. Also, mine is 
line 3488, not 3477.

Maybe you have an old file getting in the way?

- Peter


On 29 Apr 2021, at 15:58 , Ben Bolker  wrote:

  I probably don't want to go down this rabbit hole very far, but if anyone has 
any *quick* ideas ...

  Attempting to build R from scratch with a fresh SVN checkout on a somewhat 
out-of-date CentOS system (for which I don't have root access, although I can 
bug people if I care enough).

  ../r-devel/configure; make

ends with

gcc -std=gnu99 -I../../../r-devel/trunk/src/extra  -I. -I../../src/include 
-I../../../r-devel/trunk/src/include -I/usr/local/include 
-I../../../r-devel/trunk/src/nmath -DHAVE_CONFIG_H  -fopenmp  -g -O2  -c 
../../../r-devel/trunk/src/main/connections.c -o connections.o
../../../r-devel/trunk/src/main/connections.c: In function ‘do_sockconn’:
../../../r-devel/trunk/src/main/connections.c:3477:5: error: too few arguments 
to function ‘R_newsock’
 con = R_newsock(host, port, server, serverfd, open, timeout);
 ^
In file included from ../../../r-devel/trunk/src/main/connections.c:80:0:
../../../r-devel/trunk/src/include/Rconnections.h:83:13: note: declared here
Rconnection R_newsock(const char *host, int port, int server, int serverfd, 
const char * const mode, int timeout, int options);
 ^
make[3]: *** [connections.o] Error 1

  Any suggestions for a quick fix/diagnosis?

  cheers
Ben Bolker




$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)

$ lsb_release -a
LSB Version: 
:core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID:   CentOS
Description:  CentOS Linux release 7.8.2003 (Core)
Release:  7.8.2003
Codename: Core

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] R compilation on old(ish) CentOS

2021-04-29 Thread Ben Bolker
  I probably don't want to go down this rabbit hole very far, but if 
anyone has any *quick* ideas ...


  Attempting to build R from scratch with a fresh SVN checkout on a 
somewhat out-of-date CentOS system (for which I don't have root access, 
although I can bug people if I care enough).


  ../r-devel/configure; make

ends with

gcc -std=gnu99 -I../../../r-devel/trunk/src/extra  -I. 
-I../../src/include -I../../../r-devel/trunk/src/include 
-I/usr/local/include -I../../../r-devel/trunk/src/nmath -DHAVE_CONFIG_H 
 -fopenmp  -g -O2  -c ../../../r-devel/trunk/src/main/connections.c -o 
connections.o

../../../r-devel/trunk/src/main/connections.c: In function ‘do_sockconn’:
../../../r-devel/trunk/src/main/connections.c:3477:5: error: too few 
arguments to function ‘R_newsock’

 con = R_newsock(host, port, server, serverfd, open, timeout);
 ^
In file included from ../../../r-devel/trunk/src/main/connections.c:80:0:
../../../r-devel/trunk/src/include/Rconnections.h:83:13: note: declared here
 Rconnection R_newsock(const char *host, int port, int server, int 
serverfd, const char * const mode, int timeout, int options);

 ^
make[3]: *** [connections.o] Error 1

  Any suggestions for a quick fix/diagnosis?

  cheers
Ben Bolker




$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)

$ lsb_release -a
LSB Version: 
:core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch

Distributor ID: CentOS
Description:CentOS Linux release 7.8.2003 (Core)
Release:7.8.2003
Codename:   Core

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Possible bug in predict.lm when x is a poly

2021-03-29 Thread Ben Bolker
  Any particular reason you're using I() around your poly()?  That 
looks weird to me ... and it works fine if you don't do that ...  {AND, 
I think your result is *incorrect* when you have 3 observations in your 
response}.


  Basically, you have managed to short-circuit the (admittedly) obscure 
machinery that R uses to generate the correct bases when predicting from 
new data (see ?makepredictcall ...)
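[Editorial sketch illustrating the fix: without `I()`, `makepredictcall()` records the polynomial basis (the `coefs` attribute) from the fit, so `predict()` rebuilds the *same* basis for new data instead of recomputing `poly()` on it.]

```r
## Correct usage: no I() wrapper around poly()
mdl2 <- lm(mpg ~ poly(disp, 2), data = mtcars)
predict(mdl2, newdata = data.frame(disp = c(120, 120)))
## both predictions are identical, and no 'degree' error is raised
```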


On 3/29/21 6:04 PM, Kenny Bell wrote:

Hi all,

As always, thank you all for your incredible work maintaining and improving
R.

mdl <- lm(data = mtcars,
   mpg ~ I(poly(disp, 2)))

predict(mdl, newdata = data.frame(disp = c(120, 120)))
#> Error in poly(disp, 2): 'degree' must be less than number of unique
points

predict(mdl, newdata = data.frame(disp = c(120, 121, 122)))
#> 1 2 3
#> 43.937856 12.617762  3.716257

The predict function seems to require a sufficiently high number of unique
values in newdata when the RHS is a poly. Of course, I would have expected
the output here to be:

#> 1 2
#> 43.937856 43.937856

If people agree, I can submit this to bugzilla.

Kind regards,
Kenny

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] boneheaded BLAS questions

2021-03-18 Thread Ben Bolker
  For what it's worth I eventually got it to build in a hacky way (had 
to add -lopenblaslib manually).  FWIW I *did* RTFM, several times, but 
for whatever reason the standard recipes are not working for me ...


  thanks!
    Ben Bolker

On 3/18/21 7:52 AM, Dirk Eddelbuettel wrote:


On 18 March 2021 at 09:15, Tomas Kalibera wrote:
| This is documented in R Admin manual, section A.3, and there is also
| "configure --help".
|
| On my Ubuntu 20.04, using "--with-blas --with-lapack" when a BLAS/LAPACK
| implementation is installed via "apt" works for me:
|
| with libblas3, liblapack3 I get in R via sessionInfo()
|
| BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
| LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
|
| then I install libopenblas0 and get, after re-starting R (not rebuilding):
|
| BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
| LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

Yes, thank you, that is exactly what I use and recommend too. And ...

On 18 March 2021 at 09:34, Tomas Kalibera wrote:
| and to install say MKL, this works for me:
|
| apt-get install intel-mkl-full
|
| and then:
|
| env MKL_INTERFACE_LAYER=GNU,LP64 MKL_THREADING_LAYER=GNU R
|
| gives me:
|
| BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so
|
| This is following documentation in A.3.1.3 of R Admin manual.

... which the (surprisingly popular, 139 stars) simple two-year-old
script at GitHub has automated.

https://github.com/eddelbuettel/mkl4deb

I may need to update the recommendation for the two MKL_* variables.

Dirk



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] boneheaded BLAS questions

2021-03-17 Thread Ben Bolker
  Thanks.  I know it's supposed to Just Work (and I definitely 
appreciate all the work that's gone into making it Just Work 99% of the 
time!).


  I tried --with-lapack, no joy.
  Will try to decipher the rules file tomorrow ...

  cheers
   Ben


On 3/17/21 10:25 PM, Dirk Eddelbuettel wrote:


Ben,

This stuff has worked unchanged since the 1990s when we had a _really_ far
sighted fellow in Debian come up with the 'switch the links' scheme which was
(and is) subsequently deployed by many numerical applications within Debian,
R and e.g. Octave included.

And I used this ability to switch over a decade ago in a never-quite-finished
paper which resulted in a package as well as a vignette as paper draft on
CRAN: gcbd [1] It used the ability to switch between implementation to time
and compare and benchmark the various BLAS and LAPACK libraries -- which was
then motivated by a comparison with GPUs. (The actual code / package is
stale-ish as some of the underlying packages have gone away, e.g. the GPU one --
but the mechanics you are after still work the exact same way on Debian and
derivatives, including Ubuntu and PopOS.)

(As a complete aside, the state of the art here is now one level up in
libraries based on flame/blis (a riff on blas) which can do a similar logical
switch _at runtime_ (rather than by flipping softlinks and restarting the
app). Julia and some other languages uses that, I think Fedora may have it in
its R build as well. Inaki may know more...)

That said, off the top of my head, I think your error may just be with the
second R compilation -- I always (i.e. for the Debian package) use both
   --with-blas --with-lapack
and not just --with-blas. And what I do there is public: if you know where to look
you can see the exact invocation of the Debian build of the R package (which
Ubuntu and Pop and ... then shadow) [2]

Hth, Dirk

[1] https://cran.r-project.org/package=gcbd
[2] https://sources.debian.org/src/r-base/4.0.4-1/debian/rules/
 (and I apologise for how messy this still is)



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] boneheaded BLAS questions

2021-03-17 Thread Ben Bolker
  I've been going around in circles trying to get BLAS-switching 
working on a current r-devel, I'm sure I'm doing something dumb.  Any 
ideas about what I might be doing wrong, or suggestions for further 
diagnosis, would be welcome!


  tl;dr  I am compiling R-devel with (to the best of my knowledge) 
options set to allow BLAS-switching, but getting "undefined symbol" errors.


 

  Latest R-devel (via SVN), PopOS!/Ubuntu 20.10

  I have read Dirk E's post: https://github.com/eddelbuettel/mkl4deb
  I have attempted to read the relevant section of R Installation & 
Administration several times: 
https://cran.r-project.org/doc/manuals/r-release/R-admin.html#BLAS

  https://wiki.debian.org/DebianScience/LinearAlgebraLibraries


  I have installed MKL and OpenBLAS on my system via 'apt install' 
(libopenblas-dev, libopenblas-base, and TWO versions of intel-mkl-64bit)


  When I build R without BLAS everything is OK;

	rm -Rf r-build; mkdir r-build; cd r-build; ../r-devel/configure 
--without-blas --enable-R-shlib --enable-BLAS-shlib; make -j 6



Matrix products: default
BLAS:   /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so


   When I look at my BLAS alternatives I don't see anything obviously 
wrong:



sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu
There are 3 choices for the alternative libblas.so.3-x86_64-linux-gnu 
(providing /usr/lib/x86_64-linux-gnu/libblas.so.3).


  Selection    Path                                                      Priority   Status

* 0            /opt/intel/mkl/lib/intel64/libmkl_rt.so                   150        auto mode
  1            /opt/intel/mkl/lib/intel64/libmkl_rt.so                   150        manual mode
  2            /usr/lib/x86_64-linux-gnu/blas/libblas.so.3                10        manual mode
  3            /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3   100        manual mode



  When I rebuild R with --with-blas:

	rm -Rf r-build; mkdir r-build; cd r-build; ../r-devel/configure 
--with-blas --enable-R-shlib --enable-BLAS-shlib; make -j 6


 I end up with this:

gcc -I../../../r-devel/src/extra -I/usr/include/tirpc -I. 
-I../../src/include -I../../../r-devel/src/include  -I/usr/local/include 
-I../../../r-devel/src/nmath -DHAVE_CONFIG_H   -fopenmp -fpic  -g -O2 
-c ../../../r-devel/src/main/Rmain.c -o Rmain.o
gcc -Wl,--export-dynamic -fopenmp  -L"../../lib" -L/usr/local/lib -o 
R.bin Rmain.o  -lR -lRblas



/usr/bin/ld: ../../lib/libR.so: undefined reference to `zgemm_'
/usr/bin/ld: ../../lib/libR.so: undefined reference to `daxpy_'
/usr/bin/ld: ../../lib/libR.so: undefined reference to `dgemv_'
/usr/bin/ld: ../../lib/libR.so: undefined reference to `dscal_'


   If

===
intel-mkl-64bit-2018.2-046/all,now 2018.2-046 amd64 [installed]
intel-mkl-64bit-2020.4-912/all,now 2020.4-912 amd64 [installed]

<... lots more intel-mkl stuff>

libblas-dev/groovy,now 3.9.0-3ubuntu1 amd64 [installed,automatic]
libblas3/groovy,now 3.9.0-3ubuntu1 amd64 [installed,automatic]
libgraphblas3/groovy,now 1:5.8.1+dfsg-2 amd64 [installed,automatic]
libgslcblas0/groovy,now 2.6+dfsg-2 amd64 [installed,automatic]
libopenblas-base/groovy,now 0.3.10+ds-3ubuntu1 amd64 [installed]
libopenblas-dev/groovy,now 0.3.10+ds-3ubuntu1 amd64 [installed]
libopenblas-pthread-dev/groovy,now 0.3.10+ds-3ubuntu1 amd64 
[installed,automatic]
libopenblas0-pthread/groovy,now 0.3.10+ds-3ubuntu1 amd64 
[installed,automatic]

libopenblas0/groovy,now 0.3.10+ds-3ubuntu1 amd64 [installed]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] fragility of MASS::boxcox

2021-03-14 Thread Ben Bolker
  MASS::boxcox fails if (1) the data is a data frame called 'attr' (2) 
user doesn't specify y=TRUE, qr=TRUE in the initial lm() fit.


  boxcox.lm() calls update(), which apparently finds the built-in 
'attr' object instead of the data frame in the global environment.


  Is there anything to be done about this (other than the old "don't 
use names for your objects that are the same as built-in R functions") ?


  cheers
   Ben Bolker


library(MASS)
m1 <- lm(height~age, data=Loblolly)
boxcox(m1)
attr <- Loblolly
m3 <- update(m1, data=attr, y=TRUE, qr=TRUE)  ## fine
m2 <- update(m1, data=attr)
boxcox(m2)

Error in model.frame.default(formula = height ~ age, data = attr, 
drop.unused.levels = TRUE) :

  'data' must be a data.frame, environment, or list
> traceback()
12: stop("'data' must be a data.frame, environment, or list")
11: model.frame.default(formula = height ~ age, data = attr, 
drop.unused.levels = TRUE)
10: stats::model.frame(formula = height ~ age, data = attr, 
drop.unused.levels = TRUE)

9: eval(mf, parent.frame())
8: eval(mf, parent.frame())
7: lm(formula = height ~ age, data = attr, y = TRUE, qr = TRUE)
6: eval(call, parent.frame())
5: eval(call, parent.frame())
4: update.default(object, y = TRUE, qr = TRUE, ...)
3: update(object, y = TRUE, qr = TRUE, ...)
2: boxcox.lm(m2)
1: boxcox(m2)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] trivial typo in src/library/base/man/pmatch.Rd

2021-03-13 Thread Ben Bolker
  You're right.  But I guess my confusion is proof that it can be 
confusing.


What about "The value to be returned for positions where there are 
either no partial matches or multiple partial matches" ?


 ("positions where the number of partial matches is not exactly 1" :-))


On 3/13/21 3:44 AM, peter dalgaard wrote:

I suspect this is as meant, but it is "multiply", the adverb, not the verb. So it might be worth 
rephrasing, but "multiple" would be wrong (it is about cases where you at one position have several 
partial matches, not several positions where you have a partial match). "Non-uniquely partially 
matching", perhaps?

-pd


On 13 Mar 2021, at 01:50 , Ben Bolker  wrote:

ll. 17-18 of src/library/base/man/pmatch.Rd says "the value to be returned at 
non-matching or multiply partially matching positions".

  I think "multiply" should be "multiple" there?

  Can submit an actual patch if that would be more useful.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] trivial typo in src/library/base/man/pmatch.Rd

2021-03-12 Thread Ben Bolker
ll. 17-18 of src/library/base/man/pmatch.Rd says "the value to be 
returned at non-matching or multiply partially matching positions".


  I think "multiply" should be "multiple" there?

  Can submit an actual patch if that would be more useful.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] installing from source

2020-12-28 Thread Ben Bolker
  Kevin Ushey pointed out to me privately that he submitted a bug 
report and a patch for this about a month ago, which Kurt Hornik put in 
R-devel (c79477):


https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17973

On 12/28/20 4:35 AM, Martin Maechler wrote:

Ben Bolker
 on Sun, 27 Dec 2020 15:02:47 -0500 writes:


 > There is a recurring issue with installing from source into paths
 > that contain single quotes/apostrophes. "Why would anyone do that??" is
 > certainly a legitimate response to such a problem, but I would also say
 > this constitutes a legitimate bug.  Would replacing both single-quotes
 > below with \\' solve the problem?

Here, I'm mostly among the  "Why would anyone do that??" people,
but I agree that it's worth some effort to try fixing this.

To your question above: Why don't you create a repr.ex. (we'd
want anyway for R-bugzilla) and *see* if your proposition solves
it - or did I misinterpret the Q?

 > I'm happy to post this (with a patch if my fix seems appropriate) on
 > r-bugzilla.


 > cheers
 > Ben Bolker

 > line 1672 of src/library/tools/R/install.R :

 > cmd <- paste0("tools:::.test_load_package('", pkg_name, "', ",
 > quote_path(lib), ")")


 > 
https://github.com/wch/r-source/blob/2eade649c80725352256f16509f9ff6919fd079c/src/library/tools/R/install.R#L1672

 > 
https://stackoverflow.com/questions/15129888/r-cmd-install-error-unexpected-symbol-in-test-load-package-function

 > 
https://stackoverflow.com/questions/65462881/cannot-download-packages-from-github-from-unexpected-symbol

 > __
 > R-devel@r-project.org mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] installing from source

2020-12-27 Thread Ben Bolker
  There is a recurring issue with installing from source into paths 
that contain single quotes/apostrophes. "Why would anyone do that??" is 
certainly a legitimate response to such a problem, but I would also say 
this constitutes a legitimate bug.  Would replacing both single-quotes 
below with \\' solve the problem?
   I'm happy to post this (with a patch if my fix seems appropriate) on 
r-bugzilla.


  cheers
Ben Bolker

line 1672 of src/library/tools/R/install.R :

  cmd <- paste0("tools:::.test_load_package('", pkg_name, "', ", 
quote_path(lib), ")")
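[Editorial illustration with a hypothetical `pkg_name`: hand-pasting single quotes around the name breaks as soon as the name (or path) itself contains an apostrophe; `deparse()` is one way to produce a correctly escaped R string literal.]

```r
pkg_name <- "it's"   # hypothetical name containing an apostrophe
paste0("tools:::.test_load_package('", pkg_name, "', lib)")
## -> "tools:::.test_load_package('it's', lib)"  -- malformed R code
paste0("tools:::.test_load_package(", deparse(pkg_name), ", lib)")
## deparse() emits a double-quoted, properly escaped literal, which parses
```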



https://github.com/wch/r-source/blob/2eade649c80725352256f16509f9ff6919fd079c/src/library/tools/R/install.R#L1672

https://stackoverflow.com/questions/15129888/r-cmd-install-error-unexpected-symbol-in-test-load-package-function

https://stackoverflow.com/questions/65462881/cannot-download-packages-from-github-from-unexpected-symbol

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R crashes when using huge data sets with character string variables

2020-12-12 Thread Ben Bolker

  On Windows you can use memory.limit.

https://stackoverflow.com/questions/12582793/limiting-memory-usage-in-r-under-linux

   Not sure how much that helps.

On 12/12/20 6:19 PM, Arne Henningsen wrote:

When working with a huge data set with character string variables, I
experienced that various commands let R crash. When I run R in a
Linux/bash console, R terminates with the message "Killed". When I use
RStudio, I get the message "R Session Aborted. R encountered a fatal
error. The session was terminated. Start New Session". If an object in
the R workspace needs too much memory, I would expect that R would not
crash but issue an error message "Error: cannot allocate vector of
size ...".  A minimal reproducible example (at least on my computer)
is:

nObs <- 1e9

date <- paste( round( runif( nObs, 1981, 2015 ) ), round( runif( nObs,
1, 12 ) ), round( runif( nObs, 1, 31 ) ), sep = "-" )

Is this a bug or a feature of R?

Some information about my R version, OS, etc:

R> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=en_DK.UTF-8   LC_NUMERIC=C
[3] LC_TIME=en_DK.UTF-8LC_COLLATE=en_DK.UTF-8
[5] LC_MONETARY=en_DK.UTF-8LC_MESSAGES=en_DK.UTF-8
[7] LC_PAPER=en_DK.UTF-8   LC_NAME=C
[9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.3

/Arne



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Ben Bolker
  I definitely support the idea that, if this kind of trickery is going to 
happen, it should be confined to some particular IDE/environment or some 
particular submission protocol. I don't want it to happen in my ESS 
session please ... I'd rather deal with the parentheses.


On 12/9/20 3:45 PM, Timothy Goodman wrote:

Regarding special treatment for |>, isn't it getting special treatment
anyway, because it's implemented as a syntax transformation from x |> f(y)
to f(x, y), rather than as an operator?

That said, the point about wanting a block of code submitted line-by-line
to work the same as a block of code submitted all at once is a fair one.
Maybe the better solution would be if there were a way to say "Submit the
selected code as a single expression, ignoring line-breaks".  Then I could
run any number of lines with pipes at the start and no special character at
the end, and have it treated as a single pipeline.  I suppose that'd need
to be a feature offered by the environment (RStudio's RNotebooks in my
case).  I could wrap my pipelines in parentheses (to make the "pipes at
start of line" syntax valid R code), and then could use the hypothetical
"submit selected code ignoring line-breaks" feature when running just the
first part of the pipeline -- i.e., selecting full lines, but starting
after the opening paren so as not to need to insert a closing paren.

- Tim

On Wed, Dec 9, 2020 at 12:12 PM Duncan Murdoch 
wrote:


On 09/12/2020 2:33 p.m., Timothy Goodman wrote:

If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
command in the Notebook environment I'm using) I certainly *would*
expect R to treat it as a complete statement.

But what I'm talking about is a different case, where I highlight a
multi-line statement in my notebook:

  my_data_frame1
  |> filter(some_conditions_1)

and then press Ctrl+Enter.


I don't think I'd like it if parsing changed between passing one line at
a time and passing a block of lines.  I'd like to be able to highlight a
few lines and pass those, then type one, then highlight some more and
pass those:  and have it act as though I just passed the whole combined
block, or typed everything one line at a time.


Or, I suppose the equivalent would be to run

an R script containing those two lines of code, or to run a multi-line
statement like that from the console (which in RStudio I can do by
pressing Shift+Enter between the lines.)

In those cases, R could either (1) Give an error message [the current
behavior], or (2) understand that the first line is meant to be piped to
the second.  The second option would be significantly more useful, and
is almost certainly what the user intended.

(For what it's worth, there are some languages, such as Javascript, that
consider the first token of the next line when determining if the
previous line was complete.  JavaScript's rules around this are overly
complicated, but a rule like "a pipe following a line break is treated
as continuing the previous line" would be much simpler.  And while it
might be objectionable to treat the operator %>% different from other
operators, the addition of |>, which isn't truly an operator at all,
seems like the right time to consider it.)


I think this would be hard to implement with R's current parser, but
possible.  I think it could be done by distinguishing between EOL
markers within a block of text and "end of block" marks.  If it applied
only to the |> operator it would be *really* ugly.

My strongest objection to it is the one at the top, though.  If I have a
block of lines sitting in my editor that I just finished executing, with
the cursor pointing at the next line, I'd like to know that it didn't
matter whether the lines were passed one at a time, as a block, or some
combination of those.

Duncan Murdoch



-Tim

On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

 The requirement for operators at the end of the line comes from the
 interactive nature of R.  If you type

   my_data_frame_1

 how could R know that you are not done, and are planning to type the
 rest of the expression

 %>% filter(some_conditions_1)
 ...

 before it should consider the expression complete?  The way languages
 like C do this is by requiring a statement terminator at the end.  You
 can also do it by wrapping the entire thing in parentheses ().

 However, be careful: Don't use braces:  they don't work.  And parens
 have the side effect of removing invisibility from the result (which is
 a design flaw or bonus, depending on your point of view).  So I
 actually wouldn't advise this workaround.

 Duncan Murdoch


 On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
  > Hi,
  >
  > I'm a data scientist who routinely uses R in my day-to-day work,
 for tasks
  > such as cleaning and transforming data, exploratory data
 analysis, etc.
  >

Re: [Rd] the pipe |> and line breaks in pipelines

2020-12-09 Thread Ben Bolker

  FWIW there is previous discussion of this in a twitter thread from May:

https://twitter.com/bolkerb/status/1258542150620332039

at the end I suggested defining something like .__END <- identity (i.e. 
the identity function, not a call to it) as a pipe-ender.
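A minimal sketch of that idea (assuming R >= 4.1, where |> is available): every step line can end with |>, and the pass-through terminator closes the pipeline, so trailing steps can be commented out without editing the last operator.

```r
## Sketch of the pipe-ender idea: .__END is the identity function, so
## piping into .__END() returns the pipeline's value unchanged
.__END <- identity
result <-
  c(3, 1, 2) |>
  sort() |>
  rev() |>
  .__END()
result  # 3 2 1
```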


On 12/9/20 2:58 PM, Kevin Ushey wrote:

I agree with Duncan that the right solution is to wrap the pipe
expression with parentheses. Having the parser treat newlines
differently based on whether the session is interactive, or on what
type of operator happens to follow a newline, feels like a pretty big
can of worms.

I think this (or something similar) would accomplish what you want
while still retaining the nice aesthetics of the pipe expression, with
a minimal amount of syntax "noise":

result <- (
   data
 |> op1()
 |> op2()
)

For interactive sessions where you wanted to execute only parts of the
pipeline at a time, I could see that being accomplished by the editor
-- it could transform the expression so that it could be handled by R,
either by hoisting the pipe operator(s) up a line, or by wrapping the
to-be-executed expression in parentheses for you. If such a style of
coding became popular enough, I'm sure the developers of such editors
would be interested and willing to support this ...

Perhaps more importantly, it would be much easier to accomplish than a
change to the behavior of the R parser, and it would be work that
wouldn't have to be maintained by the R Core team.

Best,
Kevin

On Wed, Dec 9, 2020 at 11:34 AM Timothy Goodman  wrote:


If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
command in the Notebook environment I'm using) I certainly *would* expect R
to treat it as a complete statement.

But what I'm talking about is a different case, where I highlight a
multi-line statement in my notebook:

 my_data_frame1
 |> filter(some_conditions_1)

and then press Ctrl+Enter.  Or, I suppose the equivalent would be to run an
R script containing those two lines of code, or to run a multi-line
statement like that from the console (which in RStudio I can do by pressing
Shift+Enter between the lines.)

In those cases, R could either (1) Give an error message [the current
behavior], or (2) understand that the first line is meant to be piped to
the second.  The second option would be significantly more useful, and is
almost certainly what the user intended.

(For what it's worth, there are some languages, such as Javascript, that
consider the first token of the next line when determining if the previous
line was complete.  JavaScript's rules around this are overly complicated,
but a rule like "a pipe following a line break is treated as continuing the
previous line" would be much simpler.  And while it might be objectionable
to treat the operator %>% different from other operators, the addition of
|>, which isn't truly an operator at all, seems like the right time to
consider it.)

-Tim

On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch 
wrote:


The requirement for operators at the end of the line comes from the
interactive nature of R.  If you type

  my_data_frame_1

how could R know that you are not done, and are planning to type the
rest of the expression

%>% filter(some_conditions_1)
...

before it should consider the expression complete?  The way languages
like C do this is by requiring a statement terminator at the end.  You
can also do it by wrapping the entire thing in parentheses ().

However, be careful: Don't use braces:  they don't work.  And parens
have the side effect of removing invisibility from the result (which is
a design flaw or bonus, depending on your point of view).  So I actually
wouldn't advise this workaround.

Duncan Murdoch


On 09/12/2020 12:45 a.m., Timothy Goodman wrote:

Hi,

I'm a data scientist who routinely uses R in my day-to-day work, for tasks
such as cleaning and transforming data, exploratory data analysis, etc.
This includes frequent use of the pipe operator from the magrittr and dplyr
libraries, %>%.  So, I was pleased to hear about the recent work on a
native pipe operator, |>.

This seems like a good time to bring up the main pain point I encounter
when using pipes in R, and some suggestions on what could be done about
it.  The issue is that the pipe operator can't be placed at the start of a
line of code (except in parentheses).  That's no different than any binary
operator in R, but I find it's a source of difficulty for the pipe because
of how pipes are often used.

[I'm assuming here that my usage is fairly typical of a lot of users; at
any rate, I don't think I'm *too* unusual.]

=== Why this is a problem ===

It's very common (for me, and I suspect for many users of dplyr) to write
multi-step pipelines and put each step on its own line for readability.
Something like this:

### Example 1 ###
my_data_frame_1 %>%
  filter(some_conditions_1) %>%
  inner_join(my_data_frame_2, by = some_columns_1) %>%
  group_by(some_columns_2) %>%
  summarize(some

[Rd] undocumented 'offset' argument in src/library/grDevices/man/adjustcolor.Rd

2020-11-30 Thread Ben Bolker

  The 'offset' argument description is blank ...

  maybe 'additive adjustment to each of the (red, green, blue, alpha) 
values defining the colors, after adjustment by the corresponding 
\code{.f} factor' ...?


This is the relevant code:

x <- col2rgb(col, alpha = TRUE)/255
x[] <- pmax(0, pmin(1,
    transform %*% x +
        matrix(offset, nrow = 4L, ncol = ncol(x))))
rgb(x[1L,], x[2L,], x[3L,], x[4L,])
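Under that reading, offset shifts each channel additively after the multiplicative .f scaling, with the result clamped to [0, 1]. For example:

```r
## offset adds to each (red, green, blue, alpha) channel after scaling;
## out-of-range values are clamped to [0, 1]
adjustcolor("red", offset = c(0, 1, 0, 0))    # green maxed out: "#FFFF00FF"
adjustcolor("red", offset = c(0, 0, 0, -0.5)) # reduce alpha by half
```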

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] return (x+1) * 1000

2020-11-20 Thread Ben Bolker
  OK, you're way ahead of me.  If this is in the QA tools, I guess I
don't really see the need to change the parser and/or the language to
flag it immediately?

On Fri, Nov 20, 2020 at 7:43 PM Duncan Murdoch  wrote:
>
> On 20/11/2020 7:01 p.m., Ben Bolker wrote:
> > > I may be unusual, but I don't find these examples surprising at all.
> > > I don't think I would make these mistakes (maybe it's easier to make
> > > that mistake if you're used to a language where 'return' is a keyword
> > > rather than a function?).
> >
> > My two cents would be that it would make more sense to (1) write
> > code to detect these constructions in existing R code (I'm not good at
> > this, but presumably "return() as anything other than the head of an
> > element of the body of a function" would work?)
>
> No, it's commonly nested within an if() expression, and could appear
> anywhere else.
>
>   (2) apply it to some
> > corpus of R code to see whether it actually happens much;
>
> I did that, in the bug report #17180 I cited.  In 2016 it appeared to be
> misused in about 100 packages.
>
> (3) if so,
> > add the test you wrote in step 1 to the QA tools in the utils
> > package/CRAN checks.
>
> That was done this year.
>
> Duncan Murdoch
>
> >
> > On Fri, Nov 20, 2020 at 6:58 PM Henrik Bengtsson
> >  wrote:
> >>
> >> Without having dug into the details, it could be that one could update
> >> the parser by making a 'return' a keyword and require it to be
> >> followed by a parenthesis that optionally contains an expression
> >> followed by end of statement (newline or semicolon).  Such a
> >> "promotion" of the 'return' statement seems backward compatible and
> >> would end up throwing syntax errors on:
> >>
> >> function() return
> >> function() return 2*x
> >> function() return (2*x) + 1
> >>
> >> while still accepting:
> >>
> >> function() return()
> >> function() return(2*x)
> >> function() return((2*x) + 1)
> >>
> >> Just my two Friday cents
> >>
> >> /Henrik
> >>
> >> On Fri, Nov 20, 2020 at 3:37 PM Dénes Tóth  wrote:
> >>>
> >>> Yes, the behaviour of return() is absolutely consistent. I am wondering
> >>> though how many experienced R developers would predict the correct
> >>> return value just by looking at those code snippets.
> >>>
> >>> On 11/21/20 12:33 AM, Gabriel Becker wrote:
> >>>> And the related:
> >>>>
> >>>>  > f = function() stop(return("lol"))
> >>>>
> >>>>  > f()
> >>>>
> >>>>  [1] "lol"
> >>>>
> >>>>
> >>>> I have a feeling all of this is just return() performing correctly
> >>>> though. If there are already R CMD CHECK checks for this kind of thing
> >>>> (I wasnt sure but I'm hearing from others there may be/are) that may be
> >>>> (and/or may need to be) sufficient.
> >>>>
> >>>> ~G
> >>>>
> >>>>  On Fri, Nov 20, 2020 at 3:27 PM Dénes Tóth <toth.de...@kogentum.hu> wrote:
> >>>>
> >>>>  Or even more illustratively:
> >>>>
> >>>>  uneval_after_return <- function(x) {
> >>>>  return(x) * stop("Not evaluated")
> >>>>  }
> >>>>  uneval_after_return(1)
> >>>>  # [1] 1
> >>>>
> >>>>  On 11/20/20 10:12 PM, Mateo Obregón wrote:
> >>>>   > Dear r-developers-
> >>>>   >
> >>>>   > After many years of using and coding in R and other languages, I
> >>>>  came across
> >>>>   > something that I think should be flagged by the parser:
> >>>>   >
> >>>>   > bug <- function (x) {
> >>>>   >   return (x + 1) * 1000
> >>>>   > }
> >>>>   >> bug(1)
> >>>>   > [1] 2
> >>>>   >
> >>>>   > The return() call is not like any other function call that
> >>>>  returns a value to
> >>>>   > the point where it was called from. I think this should
> >>>>  straightforwardly be
> >>>>   > handled in the parser by flagging it as a syntactic error.
> >>>>   >
> >>>>   > Thoughts?
> >>>>   >
> >>>>   > Mateo.
> >>>>   > --
> >>>>   > Mateo Obregón.
> >>>>   >
> >>>>   > __
> >>>>   > R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list
> >>>>   > https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>   >
> >>>>
> >>>>  __
> >>>>  R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list
> >>>>  https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>
> >>>
> >>> __
> >>> R-devel@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >> __
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] return (x+1) * 1000

2020-11-20 Thread Ben Bolker
  I may be unusual, but I don't find these examples surprising at all.
I don't think I would make these mistakes (maybe it's easier to make
that mistake if you're used to a language where 'return' is a keyword
rather than a function?).

   My two cents would be that it would make more sense to (1) write
code to detect these constructions in existing R code (I'm not good at
this, but presumably "return() as anything other than the head of an
element of the body of a function" would work?) (2) apply it to some
corpus of R code to see whether it actually happens much; (3) if so,
add the test you wrote in step 1 to the QA tools in the utils
package/CRAN checks.
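A rough sketch of step (1), using a narrower heuristic than the one above (the function name and operator list here are illustrative only): flag return() appearing as a direct operand of an arithmetic or comparison operator, which catches the return(x + 1) * 1000 case while leaving the common if (ok) return(x) pattern alone.

```r
## Illustrative-only detector: flags return() used as an operand of an
## arithmetic/comparison operator, where part of the expression is discarded
find_return_operand <- function(e) {
  ops <- c("+", "-", "*", "/", "^", "==", "!=", "<", ">", "<=", ">=")
  hits <- list()
  walk <- function(e) {
    if (is.call(e)) {
      if (is.name(e[[1L]]) && as.character(e[[1L]]) %in% ops) {
        for (arg in as.list(e)[-1L]) {
          if (is.call(arg) && identical(arg[[1L]], as.name("return")))
            hits[[length(hits) + 1L]] <<- e
        }
      }
      lapply(as.list(e)[-1L], walk)
    }
    invisible(NULL)
  }
  walk(e)
  hits
}

length(find_return_operand(quote(return(x + 1) * 1000)))   # 1: suspicious
length(find_return_operand(quote(if (ok) return(x + 1))))  # 0: fine
```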

On Fri, Nov 20, 2020 at 6:58 PM Henrik Bengtsson
 wrote:
>
> Without having dug into the details, it could be that one could update
> the parser by making a 'return' a keyword and require it to be
> followed by a parenthesis that optionally contains an expression
> followed by end of statement (newline or semicolon).  Such a
> "promotion" of the 'return' statement seems backward compatible and
> would end up throwing syntax errors on:
>
> function() return
> function() return 2*x
> function() return (2*x) + 1
>
> while still accepting:
>
> function() return()
> function() return(2*x)
> function() return((2*x) + 1)
>
> Just my two Friday cents
>
> /Henrik
>
> On Fri, Nov 20, 2020 at 3:37 PM Dénes Tóth  wrote:
> >
> > Yes, the behaviour of return() is absolutely consistent. I am wondering
> > though how many experienced R developers would predict the correct
> > return value just by looking at those code snippets.
> >
> > On 11/21/20 12:33 AM, Gabriel Becker wrote:
> > > And the related:
> > >
> > > > f = function() stop(return("lol"))
> > >
> > > > f()
> > >
> > > [1] "lol"
> > >
> > >
> > > I have a feeling all of this is just return() performing correctly
> > > though. If there are already R CMD CHECK checks for this kind of thing
> > > (I wasnt sure but I'm hearing from others there may be/are) that may be
> > > (and/or may need to be) sufficient.
> > >
> > > ~G
> > >
> > > On Fri, Nov 20, 2020 at 3:27 PM Dénes Tóth wrote:
> > >
> > > Or even more illustratively:
> > >
> > > uneval_after_return <- function(x) {
> > > return(x) * stop("Not evaluated")
> > > }
> > > uneval_after_return(1)
> > > # [1] 1
> > >
> > > On 11/20/20 10:12 PM, Mateo Obregón wrote:
> > >  > Dear r-developers-
> > >  >
> > >  > After many years of using and coding in R and other languages, I
> > > came across
> > >  > something that I think should be flagged by the parser:
> > >  >
> > >  > bug <- function (x) {
> > >  >   return (x + 1) * 1000
> > >  > }
> > >  >> bug(1)
> > >  > [1] 2
> > >  >
> > >  > The return() call is not like any other function call that
> > > returns a value to
> > >  > the point where it was called from. I think this should
> > > straightforwardly be
> > >  > handled in the parser by flagging it as a syntactic error.
> > >  >
> > >  > Thoughts?
> > >  >
> > >  > Mateo.
> > >  > --
> > >  > Mateo Obregón.
> > >  >
> > >  > __
> > >  > R-devel@r-project.org  mailing list
> > >  > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >  >
> > >
> > > __
> > > R-devel@r-project.org  mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] vignettes present in 2 folders or won't work

2020-11-01 Thread Ben Bolker
  I take Duncan's point but would second the motion to have WRE clarify 
how static vignettes are supposed to work; it's a topic I am repeatedly 
confused about despite being an experienced package maintainer. If 
knowledgeable outsiders compiled a documentation patch would it be 
likely to be considered ...??


On 11/1/20 2:29 PM, Duncan Murdoch wrote:

On 01/11/2020 1:02 p.m., Alexandre Courtiol wrote:

Noted Duncan and TRUE...

I cannot do more immediately unfortunately, that is always the issue 
of asking a last minute panic attack question before teaching a course 
involving the package...
I do have /doc in my .Rbuildignore for reasons I can no longer 
remember... I will dig and create a MRE/reprex.
The students will download heavy packages, but they probably won't 
notice.

*Apologies*

In the meantime, perhaps my question was clear enough to get clarity on:
1) whether having vignettes twice in folders inst/doc and vignettes is 
normal or not when vignettes are static.
2) where could anyone find a complete documentation on R vignettes 
since it is a recurring issue in this list and elsewhere.


The Writing R Extensions manual describes vignette support in R, but R 
allows contributed packages (like knitr, rmarkdown, R.rsp) to handle 
vignettes.  WRE explains enough to write such a package, but it's up to 
their authors to document how to use them, so "complete documentation" 
is spread out all over the place.  As with any documentation, there are 
probably errors and omissions.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] more Matrix weirdness

2020-09-08 Thread Ben Bolker
  Am I being too optimistic in expecting this (mixing and matching 
matrices and Matrices) to work?  If x is a matrix and m is a Matrix, 
replacing a commensurately sized sub-matrix of x with m throws "number 
of items to replace is not a multiple of replacement length" ...


x <- matrix(0,nrow=3,ncol=10, dimnames=list(letters[1:3],LETTERS[1:10]))
rr <- c("a","b","c")
cc <- c("B","C","E")
m <- Matrix(matrix(1:9,3,3))
x[rr,cc] <- m
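One workaround that appears to behave as intended (a sketch, assuming the Matrix package is installed) is to coerce the Matrix back to a base matrix before the indexed assignment:

```r
## Sketch of a workaround: coerce the Matrix to a base matrix first,
## then the sub-matrix assignment proceeds normally
library(Matrix)
x <- matrix(0, nrow = 3, ncol = 10,
            dimnames = list(letters[1:3], LETTERS[1:10]))
m <- Matrix(matrix(1:9, 3, 3))
x[c("a", "b", "c"), c("B", "C", "E")] <- as.matrix(m)
x["a", "B"]  # 1
```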

   cheers
Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] M[cbind()] <- assignment with Matrix object loses attributes

2020-08-22 Thread Ben Bolker

  Thanks for taking a look!

   Hmm, really?  In `R Under development (unstable) (2020-08-14 
r79020)`, doing the indexed assignment with a regular matrix (as opposed 
to a Matrix) appears to preserve attributes.


m1 <- matrix(1:9,3,3)
attr(m1,"junk") <- 12
stopifnot(isTRUE(attr(m1,"junk")==12)) ## OK
m1[cbind(1:2,2:3)] <- 1
stopifnot(isTRUE(attr(m1,"junk")==12)) ## OK
attr(m1,"junk")  ## 12

   Do you lose attributes with this code? It would surprise me if this 
had changed in recent versions but I guess anything's possible ...


On 8/22/20 3:36 AM, Abby Spurdle wrote:

Hi Ben,

I had some problems reproducing this.
As far as I can tell *all* indexed assignments drop attributes.
(Maybe we have different versions).

I'm not an expert on S4, but I'm unenthusiastic about mixing slot (S4)
semantics with attribute (S3) semantics.
And str() excludes attributes, but attributes() includes slots.
Highlighting the problems here...

I think R should generate an error or a warning, if a user tries to
assign attributes to S4 objects.

In saying that, mixing OO design with numerical linear algebra is a gold mine...


On Tue, Aug 11, 2020 at 1:23 PM Ben Bolker  wrote:


Does this constitute a bug, or is there something I'm missing?
assigning sub-elements of a sparse Matrix via M[X]<-..., where X is a
2-column matrix, appears to drop user-assigned attributes. I dug around
in the R code for Matrix trying to find the relevant machinery but my
brain started to hurt too badly ...

 Will submit this as a bug if it seems warranted.

library(Matrix)
m1 <- matrix(1:9,3,3)
m1 <- Matrix(m1)
attr(m1,"junk") <- 12
stopifnot(isTRUE(attr(m1,"junk")==12))  ## OK
m1[cbind(1:2,2:3)] <- 1
stopifnot(isTRUE(attr(m1,"junk")==12)) ## not OK
attr(m1,"junk") ## NULL


## note I have to use the ugly stopifnot(isTRUE(...)) because a missing
attribute returns NULL, an assignment to NULL returns NULL, and
stopifnot(NULL) doesn't stop ...


 cheers

   Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] M[cbind()] <- assignment with Matrix object loses attributes

2020-08-10 Thread Ben Bolker
  Does this constitute a bug, or is there something I'm missing? 
assigning sub-elements of a sparse Matrix via M[X]<-..., where X is a 
2-column matrix, appears to drop user-assigned attributes. I dug around 
in the R code for Matrix trying to find the relevant machinery but my 
brain started to hurt too badly ...


   Will submit this as a bug if it seems warranted.

library(Matrix)
m1 <- matrix(1:9,3,3)
m1 <- Matrix(m1)
attr(m1,"junk") <- 12
stopifnot(isTRUE(attr(m1,"junk")==12))  ## OK
m1[cbind(1:2,2:3)] <- 1
stopifnot(isTRUE(attr(m1,"junk")==12)) ## not OK
attr(m1,"junk") ## NULL


## note I have to use the ugly stopifnot(isTRUE(...)) because a missing 
attribute returns NULL, an assignment to NULL returns NULL, and 
stopifnot(NULL) doesn't stop ...



   cheers

 Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] qnbinom with small size is slow

2020-08-07 Thread Ben Bolker

   I can reproduce this on

R Under development (unstable) (2020-07-24 r78910)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 18.04 LTS

  In my opinion this is worth reporting, but discussing it here first 
was a good idea.  Many more people read this list than watch the bug 
tracker, so it will get more attention here; once the excitement has 
died down here (which might be almost immediately!), if no-one has 
already volunteered to post it to the bug tracker, request an account 
(as specified at https://www.r-project.org/bugs.html )


  Thanks!

   Ben Bolker


For what it's worth it doesn't seem to be a threshold effect: approximately

log10(time [seconds]) ~ -8 - log10(size)

over the range from 1e-6 to 1e-9


ff <- function(x) {
   system.time(qnbinom(0.5, mu=3, size=10^x))[["elapsed"]]
}
svec <- seq(-5,-9,by=-0.2)
res <- lapply(svec, function(x) {
    cat(x,"\n")
    replicate(10,ff(x))
    })

dd <- data.frame(size=rep(svec,each=10),
 time=unlist(res))
boxplot(log10(time)~size, dd)
summary(lm(log10(time)~size, data=dd, subset=time>0))




On 8/7/20 2:01 PM, Constantin Ahlmann-Eltze via R-devel wrote:


Hi all,

I recently noticed that `qnbinom()` can take a long time to calculate
a result if the `size` argument is very small.
For example
qnbinom(0.5, mu = 3, size = 1e-10)
takes ~30 seconds on my computer.

I used gdb to step through the qnbinom.c implementation and noticed
that in line 106
(https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qnbinom.c#L106)
`y` becomes a very large negative number. Later in the function `y` is
(as far as I can see) only used as input for `pnbinom()` which is why
I would assume that it should be a non-negative integer.

I was wondering if this behavior could be considered a bug and should
be reported on the bugzilla? I read the instructions at
https://www.r-project.org/bugs.html and wasn't quite sure, so I
decided to ask here first :)

Best,
Constantin




PS: I tested the code with R 4.0.0 on macOS and the latest unstable
version using docker (https://github.com/wch/r-debug). The session
info is

sessionInfo()

R Under development (unstable) (2020-08-06 r78973)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS:   /usr/local/RD/lib/R/lib/libRblas.so
LAPACK: /usr/local/RD/lib/R/lib/libRlapack.so

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.1.0

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] trivial typo in ?Matrix::sparse.model.matrix.Rd

2020-07-20 Thread Ben Bolker

  "form" -> "from". Diff against latest SVN:

Index: sparse.model.matrix.Rd
===
--- sparse.model.matrix.Rd    (revision 3336)
+++ sparse.model.matrix.Rd    (working copy)
@@ -4,7 +4,7 @@
 \alias{fac2sparse}
 \alias{fac2Sparse}
 \description{Construct a sparse model or \dQuote{design} matrix,
-  form a formula and data frame (\code{sparse.model.matrix}) or a single
+  from a formula and data frame (\code{sparse.model.matrix}) or a single
   factor (\code{fac2sparse}).

   The \code{fac2[Ss]parse()} functions are utilities, also used

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] closing R graphics windows?

2020-05-26 Thread Ben Bolker



   Does anyone have any idea how hard it would be/where to start if one 
wanted to hack/patch R to allow X11 graphics windows that had keyboard 
focus to be closed with standard keyboard shortcuts (e.g. Ctrl-W to 
close on Linux)?  Has this been suggested/tried before?


   cheers

    Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] unstable corner of parameter space for qbeta?

2020-03-26 Thread Ben Bolker



On 2020-03-26 4:02 a.m., Martin Maechler wrote:
>>>>>> Ben Bolker 
>>>>>> on Wed, 25 Mar 2020 21:09:16 -0400 writes:
> 
> > I've discovered an infelicity (I guess) in qbeta(): it's not a bug,
> > since there's a clear warning about lack of convergence of the numerical
> > algorithm ("full precision may not have been achieved").  I can work
> > around this, but I'm curious why it happens and whether there's a better
> > workaround -- it doesn't seem to be in a particularly extreme corner of
> > parameter space. It happens, e.g., for  these parameters:
> 
> > phi <- 1.1
> > i <- 0.01
> > t <- 0.001
> > shape1 = i/phi  ##  0.009090909
> > shape2 = (1-i)/phi  ## 0.9
> > qbeta(t,shape1,shape2)  ##  5.562685e-309
> > ##  brute-force uniroot() version, see below
> > Qbeta0(t,shape1,shape2)  ## 0.9262824
> 
> > The qbeta code is pretty scary to read: the warning "full precision
> > may not have been achieved" is triggered here:
> 
> > 
> https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qbeta.c#L530
> 
> > Any thoughts?
> 
> Well,  qbeta() is mostly based on inverting pbeta()  and pbeta()
> has *several* "dangerous" corners in its parameter spaces
> {in some cases, it makes sense to look at the 4 different cases
>  log.p = TRUE/FALSE  //  lower.tail = TRUE/FALSE  separately ..}
> 
> pbeta() itself is based on the most complex numerical code in
> all of base R, i.e., src/nmath/toms708.c  and that algorithm
> (TOMS 708) had been sophisticated already when it was published,
> and it has been improved and tweaked several times since being
> part of R, notably for the log.p=TRUE case which had not been in
> the focus of the publication and its algorithm.
> [[ NB: part of this you can read when reading  help(pbeta)  to the end ! ]]
> 
> I've spent many "man weeks", or even "man months" on pbeta() and
> qbeta(), already and have dreamed to get a good student do a
> master's thesis about the problem and potential solutions I've
> looked into in the mean time.
> 
> My current gut feeling is that in some cases, new approximations
> are necessary (i.e. tweaking of current approximations is not
> going to help sufficiently).
> 
> Also not (in the R sources)  tests/p-qbeta-strict-tst.R
> a whole file of "regression tests" about  pbeta() and qbeta()
> {where part of the true values have been computed with my CRAN
> package Rmpfr (for high precision computation) with the
> Rmpfr::pbetaI() function which gives arbitrarily precise pbeta()
> values but only when  (a,b) are integers -- that's the "I" in pbetaI().
> 
> Yes, it's intriguing ... and I'll look into your special
> findings a bit later today.
> 
> 
>   > Should I report this on the bug list?
> 
> Yes, please.  Not all problems of pbeta() / qbeta() are yet part
> of R's bugzilla database, and maybe this will help draw
> more good applied mathematicians to look into it.

  Will report.

  I'm not at all surprised that this is a super-tough problem.  The only
part that was surprising to me was that my naive uniroot-based solution
worked (for this particular corner of parameter space where qbeta() has
trouble; it was terrible elsewhere), so now I'm using a hybrid solution
where I use my brute-force uniroot thing if I get a warning from qbeta().
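A minimal sketch of that kind of hybrid (the function name and structure are mine for illustration, not the actual code):

```r
## sketch: use qbeta() normally, but fall back to uniroot() if qbeta()
## emits its "full precision may not have been achieved" warning
qbeta_hybrid <- function(p, shape1, shape2, lower.tail = FALSE) {
  warned <- FALSE
  val <- withCallingHandlers(
    qbeta(p, shape1, shape2, lower.tail = lower.tail),
    warning = function(w) {
      warned <<- TRUE
      invokeRestart("muffleWarning")
    }
  )
  if (warned) {
    ## brute force: invert pbeta() numerically on (0, 1)
    val <- uniroot(function(x)
        pbeta(x, shape1, shape2, lower.tail = lower.tail) - p,
      interval = c(0, 1))$root
  }
  val
}
```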

I hesitated to even bring it up because I know you're really busy, but I
figured it was better to tag it now and let you deal with it some time
later.

Bugzilla report at https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17746

  cheers
   Ben Bolker


> 
> 
> 
> Martin Maechler
> ETH Zurich and R Core team
> (I'd call myself the "dpq-hacker" within R core -- related to
>  my CRAN package 'DPQ')
> 
> 
> > A more general illustration:
> > http://www.math.mcmaster.ca/bolker/misc/qbeta.png
> 
> > ===
> > fun <- function(phi,i=0.01,t=0.001, f=qbeta) {
> > f(t,shape1=i/phi,shape2=(1-i)/phi, lower.tail=FALSE)
> > }
> > ## brute-force beta quantile function
> > Qbeta0 <- function(t,shape1,shape2,lower.tail=FALSE) {
> > fn <- function(x) {pbeta(x,shape1,shape2,lower.tail=lower.tail)-t}
> > uniroot(fn,interval=c(0,1))$root
> > }
> > Qbeta <- Vectorize(Qbeta0,c("t","shape1","shape2"))
> > curve(fun,from=1,to=4)
> > curve(fun(x,f=Qbeta),add=TRUE,col=2)
> 
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] unstable corner of parameter space for qbeta?

2020-03-25 Thread Ben Bolker


  I've discovered an infelicity (I guess) in qbeta(): it's not a bug,
since there's a clear warning about lack of convergence of the numerical
algorithm ("full precision may not have been achieved").  I can work
around this, but I'm curious why it happens and whether there's a better
workaround -- it doesn't seem to be in a particularly extreme corner of
parameter space. It happens, e.g., for  these parameters:

phi <- 1.1
i <- 0.01
t <- 0.001
shape1 = i/phi  ##  0.009090909
shape2 = (1-i)/phi  ## 0.9
qbeta(t,shape1,shape2)  ##  5.562685e-309
##  brute-force uniroot() version, see below
Qbeta0(t,shape1,shape2)  ## 0.9262824

  The qbeta code is pretty scary to read: the warning "full precision
may not have been achieved" is triggered here:

https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qbeta.c#L530

  Any thoughts?  Should I report this on the bug list?


A more general illustration:
http://www.math.mcmaster.ca/bolker/misc/qbeta.png

===
fun <- function(phi,i=0.01,t=0.001, f=qbeta) {
  f(t,shape1=i/phi,shape2=(1-i)/phi, lower.tail=FALSE)
}
## brute-force beta quantile function
Qbeta0 <- function(t,shape1,shape2,lower.tail=FALSE) {
  fn <- function(x) {pbeta(x,shape1,shape2,lower.tail=lower.tail)-t}
  uniroot(fn,interval=c(0,1))$root
}
Qbeta <- Vectorize(Qbeta0,c("t","shape1","shape2"))
curve(fun,from=1,to=4)
curve(fun(x,f=Qbeta),add=TRUE,col=2)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] help with rchk warnings on Rf_eval(Rf_lang2(...))

2020-03-23 Thread Ben Bolker


 Thanks, that's really useful.  One more question for you, or someone
else here:

const ArrayXd glmLink::linkFun(const ArrayXd& mu) const {
    return as<ArrayXd>(
        ::Rf_eval(::Rf_lang2(as<SEXP>(d_linkFun),
                             as<SEXP>(Rcpp::NumericVector(mu.data(),
                                                          mu.data() + mu.size()))),
                  d_rho));
}


I guess I need that to read
PROTECT(::Rf_eval(PROTECT(::Rf_lang2(...)), ...)), but as written it
doesn't seem that I have anywhere to squeeze in an UNPROTECT(2).  Do I need
to define a temporary variable so I can UNPROTECT(2) before I return the
value?

Or is there a way I can use Shield() since this is an Rcpp-based project
anyway?
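For what it's worth, one shape that satisfies the protect-before-eval rule is a small helper with a temporary, so both objects can be unprotected before returning (a sketch against the plain C API with an invented name, not the actual lme4 code):

const SEXP eval_call2(SEXP fun, SEXP arg, SEXP rho)
{
    SEXP call, val;
    PROTECT(call = Rf_lang2(fun, arg)); /* protect before Rf_eval() can allocate */
    PROTECT(val  = Rf_eval(call, rho));
    UNPROTECT(2);                       /* safe: val is returned immediately */
    return val;
}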

  Sorry for all the very basic questions, but I'm flying nearly blind
here ...

  cheers
   Ben Bolker



On 2020-03-23 4:01 p.m., Tomas Kalibera wrote:
> On 3/23/20 8:39 PM, Ben Bolker wrote:
>> Dear r-devel folks,
>>
>>    [if this is more appropriate for r-pkg-devel please let me know and
>> I'll repost it over there ...]
>>
>> I'm writing to ask for help with some R/C++ integration idioms that are
>> used in a package I'm maintaining, that are unfamiliar to me, and that
>> are now being flagged as problematic by Tomas Kalibera's 'rchk'
>> machinery (https://github.com/kalibera/rchk); results are here
>> https://raw.githubusercontent.com/kalibera/cran-checks/master/rchk/results/lme4.out
>>
>>
>> The problem is with constructions like
>>
>> ::Rf_eval(::Rf_lang2(fun, arg), d_rho)
>>
>> I *think* this means "construct a two-element pairlist from fun and arg,
>> then evaluate it within expression d_rho"
>>
>> This leads to warnings like
>>
>> "calling allocating function Rf_eval with argument allocated using
>> Rf_lang2"
>>
>> Is this a false positive or ... ? Can anyone help interpret this?
> This is a true error. You need to protect the argument of eval() before
> calling eval, otherwise eval() could destroy it before using it. This is
> a common rule: whenever passing an argument to a function, that argument
> must be protected (directly or indirectly). Rchk tries to be smart and
> doesn't report a warning when it can be sure that in that particular
> case, for that particular function, it is safe. This is easy to fix,
> just protect the result of lang2() before the call and unprotect (some
> time) after.
>> Not sure why this idiom was used in the first place: speed? (e.g., see
>> https://stat.ethz.ch/pipermail/r-devel/2019-June/078020.html ) Should I
>> be rewriting to avoid Rf_eval entirely in favor of using a Function?
>> (i.e., as commented in
>> https://stackoverflow.com/questions/37845012/rcpp-function-slower-than-rf-eval
>>
>> : "Also, calling Rf_eval() directly from a C++ context is dangerous as R
>> errors (ie, C longjmps) will bypass the destructors of C++ objects and
>> leak memory / cause undefined behavior in general. Rcpp::Function tries
>> to make sure that doesn't happen.")
> 
> Yes, eval (as well as lang2) can throw an error, this error has to be
> caught via R API and handled (e.g. by throwing as exception or something
> else, indeed that exception then needs to be caught and possibly
> converted back when leaving again to C stack frames). An R/C API you can
> use here is R_UnwindProtect. This is of course a bit of a pain, and one
> does not have to worry when programming in plain C.
> 
> I suppose Rcpp provides some wrapper around R_UnwindProtect, that would
> be a question for Rcpp experts/maintainers.
> 
> Best
> Tomas
> 
>>
>>   Any tips, corrections, pointers to further documentation, etc. would be
>> most welcome ... Web searching for this stuff hasn't gotten me very far,
>> and it seems to be deeper than most of the introductory material I can
>> find (including the Rcpp vignettes) ...
>>
>>    cheers
>>     Ben Bolker
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] help with rchk warnings on Rf_eval(Rf_lang2(...))

2020-03-23 Thread Ben Bolker
Dear r-devel folks,

  [if this is more appropriate for r-pkg-devel please let me know and
I'll repost it over there ...]

I'm writing to ask for help with some R/C++ integration idioms that are
used in a package I'm maintaining, that are unfamiliar to me, and that
are now being flagged as problematic by Tomas Kalibera's 'rchk'
machinery (https://github.com/kalibera/rchk); results are here
https://raw.githubusercontent.com/kalibera/cran-checks/master/rchk/results/lme4.out

The problem is with constructions like

::Rf_eval(::Rf_lang2(fun, arg), d_rho)

I *think* this means "construct a two-element pairlist from fun and arg,
then evaluate it within expression d_rho"

This leads to warnings like

"calling allocating function Rf_eval with argument allocated using Rf_lang2"

Is this a false positive or ... ? Can anyone help interpret this?

Not sure why this idiom was used in the first place: speed? (e.g., see
https://stat.ethz.ch/pipermail/r-devel/2019-June/078020.html ) Should I
be rewriting to avoid Rf_eval entirely in favor of using a Function?
(i.e., as commented in
https://stackoverflow.com/questions/37845012/rcpp-function-slower-than-rf-eval
: "Also, calling Rf_eval() directly from a C++ context is dangerous as R
errors (ie, C longjmps) will bypass the destructors of C++ objects and
leak memory / cause undefined behavior in general. Rcpp::Function tries
to make sure that doesn't happen.")

 Any tips, corrections, pointers to further documentation, etc. would be
most welcome ... Web searching for this stuff hasn't gotten me very far,
and it seems to be deeper than most of the introductory material I can
find (including the Rcpp vignettes) ...

  cheers
   Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] survival bug?

2020-03-03 Thread Ben Bolker


  Microsoft offers fully-provisioned but time-limited developer images
for Windows 10 (I think they last for 3 months) for most major VM
platforms (including VirtualBox, which is the one I currently use).
There would certainly be a start-up cost in effort, but probably not any
financial cost.

   cheers
Ben Bolker

On 2020-03-03 4:02 p.m., Gabriel Becker wrote:
> Hi Terry,
> 
> http://win-builder.r-project.org/ and the rhub build service (which can be
> invoked by the rhub package) allow on demand checks in windows
> environments, though for active debugging the iteration time can be quite
> painful.
> 
> If you have access, e.g., through your employer, to a windows license you
> should also be able to do use VMWare or VirtualBox (I can never remember
> which one I like more) to run windows and test that way. This will have
> some start up cost in effort but allows active testing and iteration.
> 
> Hope that helps,
> ~G
> 
> On Tue, Mar 3, 2020 at 7:00 AM Therneau, Terry M., Ph.D. via R-devel <
> r-devel@r-project.org> wrote:
> 
>> My latest submission of survival3.1-10 to CRAN fails  a check, but only on
>> windows, which
>> I don't use.
>> How do I track this down?
>> The test in question works fine on my Linux box.
>>
>> Terry
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] dput()

2020-02-29 Thread Ben Bolker


 I think Robin knows about FAQ 7.31/floating point (author of
'Brobdingnag', among other numerical packages).  I agree that this is
surprising (to me).

  To reframe this question: is there way to get an *exact* ASCII
representation of a numeric value (i.e., guaranteeing the restored value
is identical() to the original) ?

 .deparseOpts has

‘"digits17"’: Real and finite complex numbers are output using
  format ‘"%.17g"’ which may give more precision than the
  default (but the output will depend on the platform and there
  may be loss of precision when read back).

  ... but this still doesn't guarantee that all precision is kept.

  Maybe

 saveRDS(x,textConnection("out","w"),ascii=TRUE)
identical(x,as.numeric(out[length(out)]))   ## TRUE

?
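(For what it's worth, on IEEE-754 platforms 17 significant digits are enough to round-trip a double exactly, so a sprintf()-based version should also work -- a sketch:)

```r
x <- sum(dbinom(0:20, 20, 0.35))   # dput(x) prints 1, but x != 1
s <- sprintf("%.17g", x)           # e.g. "0.99999999999999956"
identical(x, as.numeric(s))        # TRUE on IEEE-754 doubles
```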




On 2020-02-29 2:42 a.m., Rui Barradas wrote:
> Hello,
> 
> FAQ 7.31
> 
> See also this StackOverflow post:
> 
> https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal
> 
> Hope this helps,
> 
> Rui Barradas
> 
> Às 00:08 de 29/02/20, robin hankin escreveu:
>> My interpretation of dput.Rd is that dput() gives an exact ASCII form
>> of the internal representation of an R object.  But:
>>
>>   rhankin@cuttlefish:~ $ R --version
>> R version 3.6.2 (2019-12-12) -- "Dark and Stormy Night"
>> Copyright (C) 2019 The R Foundation for Statistical Computing
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> [snip]
>>
>> rhankin@cuttlefish:~ $ R --vanilla --quiet
>>> x <- sum(dbinom(0:20,20,0.35))
>>> dput(x)
>> 1
>>> x-1
>> [1] -4.440892e-16
>>>
>>> x==1
>> [1] FALSE
>>>
>>
>> So, dput(x) gives 1, but x is not equal to 1.  Can anyone advise?
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] specials issue, a heads up

2020-02-24 Thread Ben Bolker
In the long run, coming up with a way to parse specials in formulas
that is both clean and robust is a good idea - annoying users are a
little bit like CRAN maintainers in this respect. I think I would
probably do this by testing identical(eval(extracted_head),
survival::Surv) - but this has lots of potential annoyances (what if
extracted_head is a symbol that can't be found in any attached
environment?  Do we have to start with if
(length(find(deparse(extracted_head))) > 0) ?
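A rough sketch of the identical()-based test (hypothetical helper name; assumes the survival package is installed, and the tryCatch() guards against heads that can't be found):

```r
## sketch: does this formula term's head refer to survival::Surv?
is_surv_special <- function(term) {
  if (!is.call(term)) return(FALSE)
  tryCatch(identical(eval(term[[1]]), survival::Surv),
           error = function(e) FALSE)   # unresolvable head -> not a special
}
is_surv_special(quote(survival::Surv(time, status)))  # TRUE
is_surv_special(quote(log(time)))                     # FALSE
```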

In the short run, a clear note in the documentation seems entirely sufficient.

On Mon, Feb 24, 2020 at 12:01 PM Hugh Parsonage
 wrote:
>
> I mean if the person filing the bug regards style as more important than
> the truth of how R treats formulas then they’re literally talking in
> another language.
>
> I strongly recommend you do nothing or at most make a note in the
> documentation addressing this. Your time is too valuable.
>
> On Tue, 25 Feb 2020 at 12:56 am, Therneau, Terry M., Ph.D. via R-devel <
> r-devel@r-project.org> wrote:
>
> > I recently had a long argument wrt the survival package, namely that the
> > following code
> > didn't do what they expected, and so they reported it as a bug
> >
> >survival::coxph( survival::Surv(time, status) ~ age + sex +
> > survival::strata(inst),
> > data=lung)
> >
> > a. The Google R style guide  recommends that one put :: everywhere
> > b. This breaks the recognition of cluster as a "special" in the terms
> > function.
> >
> > I've been stubborn and said that their misunderstanding of how formulas
> > work is not my
> > problem.   But I'm sure that the issue will come up again, and multiple
> > other packages
> > will break.
> >
> > A big problem is that the code runs, it just gives the wrong answer.
> >
> > Suggestions?
> >
> > Terry T.
> >
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] trivial typo in man page Quote.Rd

2020-02-21 Thread Ben Bolker


  Attn: someone on R-core:

  "ran" should be "can".

  Also, thanks for this feature!

Index: Quotes.Rd
===
--- Quotes.Rd   (revision 77845)
+++ Quotes.Rd   (working copy)
@@ -74,7 +74,7 @@
   Raw character constants are also available using a syntax similar to
   the one used in C++: \code{r"(...)"} with \code{...} any character
   sequence, except that it must not contain the closing sequence
-  \samp{)"}. The delimiter pairs \code{[]} and \code{\{\}} ran also be
+  \samp{)"}. The delimiter pairs \code{[]} and \code{\{\}} can also be
   used. For  additional flexibility, a number of dashes can be placed
   between the opening quote and the opening delimiter, as long as the same
   number of dashes appear between the closing delimiter and the closing
quote.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R 3.6.3 scheduled for February 29

2020-02-06 Thread Ben Bolker
  Because it's the fifth recurrence of the date (29 February).
  https://en.wikipedia.org/wiki/The_Pirates_of_Penzance

On Thu, Feb 6, 2020 at 3:32 PM Abby Spurdle  wrote:
>
> Congratulations!
>
> > celebrate (beeR=TRUE, loud.music=FALSE,
> nbeeRs=2L,
> proportion.of.tech.talk=0.4)
>
> Why is it the 5th anniversary and the not the 20th anniversary?
>
>
> On Fri, Feb 7, 2020 at 4:58 AM Peter Dalgaard via R-devel
>  wrote:
> >
> > Full schedule is available on developer.r-project.org.
> >
> > (The date is chosen to celebrate the 5th anniversary of R 1.0.0. Some 
> > irregularity may occur on the release day, since this happens to be a 
> > Saturday and the release manager is speaking at the CelebRation2020 
> > event...)
> >
> > --
> > Peter Dalgaard, Professor,
> > Center for Statistics, Copenhagen Business School
> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > Phone: (+45)38153501
> > Office: A 4.23
> > Email: pd@cbs.dk  Priv: pda...@gmail.com
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-20 Thread Ben Bolker


 Ugh, sounds like competing priorities.

  * maintain type consistency
  * minimize storage (= current version, since 3.0.0)
  * maximize utility for large lambda (= proposed change)
  * keep user interface, and code, simple (e.g., it would be easy enough
to add a switch that provided user control of int vs double return value)
  * backward compatibility



On 2020-01-20 12:33 p.m., Martin Maechler wrote:
>> Benjamin Tyner 
>> on Mon, 20 Jan 2020 08:10:49 -0500 writes:
> 
> > On 1/20/20 4:26 AM, Martin Maechler wrote:
> >> Coming late here -- after enjoying a proper weekend ;-) --
> >> I have been agreeing (with Spencer, IIUC) on this for a long
> >> time (~ 3 yrs, or more?), namely that I've come to see it as a
> >> "design bug" that  rpois() {and similar} must return return typeof() 
> "integer".
> >> 
> >> More strongly, I'm actually pretty convinced they should return
> >> (integer-valued) double instead of NA_integer_   and for that
> >> reason should always return double:
> >> Even if we have (hopefully) a native 64bit integer in R,
> >> 2^64 is still teeny tiny compared .Machine$double.max
> >> 
> >> (and then maybe we'd have .Machine$longdouble.max  which would
> >> be considerably larger than double.max unless on Windows, where
> >> the wise men at Microsoft decided to keep their workload simple
> >> by defining "long double := double" - as 'long double'
> >> unfortunately is not well defined by C standards)
> >> 
> >> Martin
> >> 
> > Martin if you are in favor, then certainly no objection from me! ;-)
> 
> > So now what about other discrete distributions e.g. could a similar 
> > enhancement apply here?
> 
> 
> >> rgeom(10L, 1e-10)
> >  [1] NA 1503061294 NA NA 1122447583 NA
> >  [7] NA NA NA NA
> > Warning message:
> > In rgeom(10L, 1e-10) : NAs produced
> 
> yes, of course there are several such distributions.
> 
> It's really something that should be discussed (possibly not
> here, .. but then I've started it here ...).
> 
> The  NEWS  for R 3.0.0 contain (in NEW FEATURES) :
> 
> * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
>   rsignrank() and rwilcox() now return integer (not double)
>   vectors.  This halves the storage requirements for large
>   simulations.
> 
> and what I've been suggesting is to revert this change
> (svn rev r60225-6) which was purposefully and diligently done by
> a fellow R core member, so indeed must be debatable. 
> 
> Martin
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] as-cran issue

2020-01-13 Thread Ben Bolker
  From R NEWS (changes in 3.6.0)

Experimentally, setting environment variable _R_CHECK_LENGTH_1_LOGIC2_
will lead to warnings (or errors if the variable is set to a ‘true’
value) when && or || encounter and use arguments of length more than one.
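To reproduce outside of R CMD check, the variable can be set in the environment of the R process, e.g. (a sketch; the exact message varies by R version):

    env _R_CHECK_LENGTH_1_LOGIC2_=true \
        Rscript -e 'c(TRUE, FALSE) && TRUE'
    ## warns/errors (rather than silently using the first element)
    ## because the left-hand operand has length 2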

On 2020-01-13 11:46 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:
> Thanks for the feedback Dirk.   I sent my follow-up before I saw it.
> 
> Looking at the source code, it appears that there is no options() call
> to turn this on. Nor does "R --help" reveal a command line option.
> How then does a user turn this on outside of the R CMD check
> envirionment, so as to chase things like this down?
> 
> The fact that 1. renaming my function makes the error go away, 2. my
> function is just a wrapper to inherits(), and 3. its a new error in code
> that hasn't changed, all point me towards some oddity with the check
> function.
> 
> Terry
> 
> 
> On 1/13/20 10:22 AM, Dirk Eddelbuettel wrote:
>>
>> On 13 January 2020 at 10:02, Therneau, Terry M., Ph.D. via R-devel wrote:
>> | Where can I find out (and replicate) what options as-cran turns on?
>>
>> See the file src/library/tools/R/check.R in the R sources, and grep for
>> as_cran which is the internal variable controlled by the --as-cran option
>>
>> [...]
>>
>> | The check log contains multiple instances of the lines below:
>> |
>> | < Warning message:
>> | < In if (ismat(kmat)) { :
>> | <   the condition has length > 1 and only the first element will be
>> used
>> |
>> | I don't see how the error could arise, but if I know what as-cran is
>> doing perhaps I can
>> | replicate it.
>>
>> This was widely discussed on this list and should also be in the NEWS
>> file.
>>
>> The change is about what the message says: the if () tests a scalar
>> logical,
>> it appears that ismat(kmat) returns more than a scalar.
>>
>> There has always been an opt-in for this to error -- cf many messages
>> by Henrik
>> over the years as he tried to convince us all to use it more.
>>
>>
>> Dirk
>>
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Inconsistencies in wilcox.test

2019-12-07 Thread Ben Bolker


 Your second issue seems like a more or less unavoidable floating-point
computation issue.  The paired test operates by computing differences
between corresponding values of x and y.

  It's not impossible to try to detect "almost-ties" (by testing for
differences less than, say, sqrt(.Machine$double.eps)), but it's
delicate and somewhat subjective/problem-dependent.

  Example:

options(digits=20)
> unique(c(4,3,2)-c(3,2,1))
[1] 1
> unique(c(0.4,0.3,0.2)-c(0.3,0.2,0.1))
[1] 0.100000000000000033307 0.099999999999999977796 0.100000000000000005551
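A sketch of the kind of tolerance-based collapsing this would involve (the threshold choice is the subjective part):

```r
## sketch: collapse "almost-ties" among paired differences before ranking
d <- c(0.4, 0.3, 0.2) - c(0.3, 0.2, 0.1)
tol <- sqrt(.Machine$double.eps)
d_snap <- round(d / tol) * tol   # values equal up to tol become identical
length(unique(d))       # 3 -- spurious distinct values from floating point
length(unique(d_snap))  # 1 -- treated as ties
```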

On 2019-12-07 1:55 p.m., Karolis Koncevičius wrote:
> Hello,
> 
> Writing to share some things I've found about wilcox.test() that seem a
> a bit inconsistent.
> 
> 1. Inf values are not removed if paired=TRUE
> 
> # returns different results (Inf is removed):
> wilcox.test(c(1,2,3,4), c(0,9,8,7))
> wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
> 
> # returns the same result (Inf is left as value with highest rank):
> wilcox.test(c(1,2,3,4), c(0,9,8,7), paired=TRUE)
> wilcox.test(c(1,2,3,4), c(0,9,8,Inf), paired=TRUE)
> 
> 2. tolerance issues with paired=TRUE.
> 
> wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE)
> # ...
> # Warning:  cannot compute exact p-value with ties
> 
> wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1), paired=TRUE)
> # ...
> # no warning
> 
> 3. Always 'x observations are missing' when paired=TRUE
> 
> wilcox.test(c(1,2), c(NA_integer_,NA_integer_), paired=TRUE)
> # ...
> # Error:  not enough (finite) 'x' observations
> 
> 4. No indication if normal approximation was used:
> 
> # different numbers, but same "method" name
> wilcox.test(rnorm(10), exact=FALSE, correct=FALSE)
> wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
> 
> 
> From all of these I am pretty sure the 1st one is likely unintended,
> so attaching a small patch to adjust it. Can also try patching others if
> consensus is reached that the behavioiur has to be modified.
> 
> Kind regards,
> Karolis Koncevičius.
> 
> ---
> 
> Index: wilcox.test.R
> ===
> --- wilcox.test.R  (revision 77540)
> +++ wilcox.test.R  (working copy)
> @@ -42,7 +42,7 @@
>  if(paired) {
>  if(length(x) != length(y))
>  stop("'x' and 'y' must have the same length")
> -    OK <- complete.cases(x, y)
> +    OK <- is.finite(x) & is.finite(y)
>  x <- x[OK] - y[OK]
>  y <- NULL
>  }
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] depending on orphaned packages?

2019-09-29 Thread Ben Bolker



On 2019-09-25 3:26 a.m., Martin Maechler wrote:
>>>>>> Ben Bolker 
>>>>>> on Tue, 24 Sep 2019 20:09:55 -0400 writes:
> 
> > SuppDists is orphaned on CRAN (and has been since 2013).
> > https://cran.r-project.org/web/checks/check_results_.html
> 
> > Oddly, the simulate method for the inverse.gaussian family
> > [inverse.gaussian()$simulate] depends (in a loose sense) on SuppDists
> > (it fails if the SuppDists namespace is not available:
> 
> > if (!requireNamespace("SuppDists", quietly = TRUE))
> > stop("need CRAN package 'SuppDists' for simulation from the
> > 'inverse.gaussian' family")
> 
> 
> > The statmod package also implements inverse gaussian d/p/q/r functions
> > <https://journal.r-project.org/archive/2016-1/giner-smyth.pdf>.  It is
> > lightweight (depends on R >= 3.0.0, imports only base packages [stats
> > and graphics]) and has been around for a long time (archived versions on
> > CRAN go back to 2003).
> 
> > Would it make sense to replace the call to SuppDists::rinvGauss with a
> > corresponding call to statmod::rinvgauss ?  Would a patch be considered?
> 
> > Ben Bolker
> 
> I'd say "yes" & "yes".
> 
> "Base" code weekly depending on CRAN packages (apart from
> formally 'Recommended' ones)  is somewhat sub-optimal in any
> case, ((but possibly still the best thing, given reality
> [maintenance efforts, copyrights, ...])),
> but your proposal seems a  "uniformly not worse"  change
> ((and I have very much liked delving into parts of Gordon
>   Smyth's textbook on GLMs as a really nice mixture / in-between
>   of rigorous math and applied stats))

   I did actually think of a reason *not* to do this.

   The resulting random deviates generated by statmod::rinvgauss aren't
exactly the same as those from SuppDists::rinvGauss (same algorithm, but
I guess they use sufficiently different internal machinery), so this
could break exact backward compatibility for any code that uses
simulate() for inverse-Gaussian models.  Still might be worth doing, but
now the change is *not* "uniformly not worse".

An alternative (which would remove the dependency on a CRAN package)
would be to pull the code of statmod::rinvgauss into R (which would be
allowed - statmod is GPL 2/3 - but of course it would be polite to ask).
The downside to this solution would be adding the maintenance burden of
this code ...

> 
> Martin Maechler
> ETH Zurich and R Core
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] depending on orphaned packages?

2019-09-24 Thread Ben Bolker
SuppDists is orphaned on CRAN (and has been since 2013).

https://cran.r-project.org/web/checks/check_results_.html

 Oddly, the simulate method for the inverse.gaussian family
[inverse.gaussian()$simulate] depends (in a loose sense) on SuppDists
(it fails if the SuppDists namespace is not available:

if (!requireNamespace("SuppDists", quietly = TRUE))
stop("need CRAN package 'SuppDists' for simulation from the
'inverse.gaussian' family")


  The statmod package also implements inverse gaussian d/p/q/r functions
<https://journal.r-project.org/archive/2016-1/giner-smyth.pdf>.  It is
lightweight (depends on R >= 3.0.0, imports only base packages [stats
and graphics]) and has been around for a long time (archived versions on
CRAN go back to 2003).

  Would it make sense to replace the call to SuppDists::rinvGauss with a
corresponding call to statmod::rinvgauss ?  Would a patch be considered?

  Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Underscores in package names

2019-08-09 Thread Ben Bolker


 Ugh, but not *as* ambiguous as the proposed example (you can still
split unambiguously on "_"; yes, you could split on "last _" in
Gabriel's example, but ...)

On 2019-08-09 4:17 p.m., Duncan Murdoch wrote:
> On 09/08/2019 2:41 p.m., Gabriel Becker wrote:
>> Note that this proposal would make mypackage_2.3.1 a valid *package
>> name*,
>> whose corresponding tarball name might be mypackage_2.3.1_2.3.2 after a
>> patch. Yes its a silly example, but why allow that kind of ambiguity?
>>
> CRAN already has a package named "FuzzyNumbers.Ext.2", whose tarball is
> FuzzyNumbers.Ext.2_3.2.tar.gz, so I think we've already lost that game.
> 
> Duncan Murdoch
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Underscores in package names

2019-08-09 Thread Ben Bolker


  Creeping code complexity ...

  I like to think that the cuteR names will have a Darwinian
disadvantage in the long run. FWIW Hadley Wickham argues (rightly, I
think) against mixed-case names:
http://r-pkgs.had.co.nz/package.html#naming. I too am guilty of picking
mixed-case package names in the past.  Extra credit if the package name
and the standard function have different cases! e.g.
glmmADMB::glmmadmb(), although (a) that wasn't my choice and (b) at
least it was never on CRAN and (c) it wasn't one of the cuteR variety.

  Bonus points for the first analysis of case conventions in existing
CRAN package names ... I'll start.

> a1 <- rownames(available.packages())
> cute <- "[a-z]*R[a-z]*"
> table(grepl(cute,a1))

FALSE  TRUE
12565  2185


On 2019-08-09 2:00 p.m., neonira Arinoem wrote:
> Won't it be better to have a convention that allows lowercase, dash,
> underscore and dot as only valid characters for new package names and keep
> the ancient format validation scheme for older package names?
> 
> This could be implemented by a single function, taking a strictNaming_b_1
> parameter which defaults to true. Easy to use, and compliance results will
> vary according to the parameter value, allowing strict compliance for new
> package names and lazy compliance for older ones.
> 
> Doing so allows to enforce a new package name convention while also
> insuring continuity of compliance for already existing package names.
> 
> Fabien GELINEAU alias Neonira
> 
> Le ven. 9 août 2019 à 18:40, Kevin Wright  a écrit :
> 
>> Please, no.  I'd also like to disallow uppercase letters in package names.
>> For instance, the cuteness of using a capital "R" in package names is
>> outweighed by the annoyance of trying to remember which packages use an
>> upper-case letter.
>>
>> On Thu, Aug 8, 2019 at 9:32 AM Jim Hester 
>> wrote:
>>
>>> Are there technical reasons that package names cannot be snake case?
>>> This seems to be enforced by `.standard_regexps()$valid_package_name`
>>> which currently returns
>>>
>>>"[[:alpha:]][[:alnum:].]*[[:alnum:]]"
>>>
>>> Is there any technical reason this couldn't be altered to accept `_`
>>> as well, e.g.
>>>
>>>   "[[:alpha:]][[:alnum:]._]*[[:alnum:]]"
>>>
>>> I realize that historically `_` has not always been valid in variable
>>> names, but this has now been acceptable for 15+ years (since R 1.9.0 I
>>> believe). Might we also allow underscores for package names?
>>>
>>> Jim
>>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>>
>> --
>> Kevin Wright
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: quiet namespace load is noisy

2019-07-23 Thread Ben Bolker


  Does setting message=FALSE in the chunk options of the vignette help?

  Or, less preferably, using suppressMessages()?
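
For instance (a sketch; in a knitr vignette the chunk option does the silencing, while suppressMessages() works anywhere):

```r
## In an R Markdown vignette, set the chunk option message = FALSE on the
## chunk that loads the package, so load-time messages never reach the
## rendered output.  In plain R code, wrap the load instead:
suppressMessages(requireNamespace("ggplot2", quietly = TRUE))
```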

On 2019-07-23 9:36 a.m., Lenth, Russell V wrote:
> Lionel,
> 
> Thanks for your response. I understand that method overriding can be a 
> serious issue, but as you say, this is not something that the user can act 
> upon. Yet the message lands at the user’s feet. 
> 
> In my case, the messages are cluttering my package vignettes, and may or may 
> not represent what users see if they themselves run the vignette code, 
> depending on what version of ggplot2, etc. they have. I will certainly update 
> my ggplot2 installation and that will help. But basically I don’t ever want 
> these kinds of messages to appear in my vignettes, so I will seek some other 
> workaround. 
> 
> Russ
> 
> Sent from my iPhone
> 
>> On Jul 23, 2019, at 1:32 AM, Lionel Henry  wrote:
>>
>> Hello,
>>
>> I think `quietly` should only silence normal messages intended
>> for users that provide information about expected behaviour,
>> such as masking.  That is not the case here: the message is
>> about the overriding of S3 methods, which has global effect and
>> is rather problematic.  It may change the behaviour of package
>> and script code in unpredictable ways.
>>
>> This is not something that the user can act upon; the developers
>> of the parties involved need to be contacted by users so they can
>> fix it (the developers of the conflicting methods might not be
>> aware if the generic is from a third-party package, such as
>> base::print()). In the case of ggplot2 vs rlang, you can update
>> ggplot2 to the latest version to fix these messages.
>>
>>> After all, other package startup messages ARE suppressed, and
>>> even error messages are suppressed
>>
>> Note that `quietly = TRUE` does not really suppress error
>> messages for missing packages. The errors are converted to a
>> boolean return value, and thus become normal behaviour, for which
>> it makes sense to suppress the message. This does not imply the
>> S3 overriding message should be suppressed as well.
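
A sketch of that distinction ("definitelyNotAPackage" is a placeholder name assumed not to be installed):

```r
## With quietly = TRUE, a missing package is reported as a FALSE return
## value rather than an error, so suppressing its message is coherent.
ok <- requireNamespace("definitelyNotAPackage", quietly = TRUE)
ok  # FALSE, with no error signalled
```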
>>
>> Best,
>> Lionel
>>
>>
>>> On 23 Jul 2019, at 06:29, Lenth, Russell V  wrote:
>>>
>>> Dear R-devel,
>>>
>>> Consider the following clip (in R version 3.6.0, Windows):
>>>
 requireNamespace("ggplot2", quietly = TRUE)
>>>   Registered S3 methods overwritten by 'ggplot2':
>>> method from 
>>> [.quosures rlang
>>> c.quosures rlang
>>> print.quosures rlang
>>>
>>> It seems to me that if one specifies 'quietly = TRUE', then messages about 
>>> S3 method overrides should be quieted along with everything else. After 
>>> all, other package startup messages ARE suppressed, and even error messages 
>>> are suppressed:
>>>
 requireNamespace("xyz", quietly = TRUE)
 ## (it is silent even though there is no "xyz" package)
>>>
>>> Thanks
>>>
>>> Russ Lenth
>>> U of Iowa
>>>
>>
>



[Rd] trivial typos in man/switch.Rd

2019-07-02 Thread Ben Bolker

  My colleague points out that these typos are probably still present
because almost no-one has the stamina to read that far down in ?switch ...

  cheers
Ben Bolker
Index: switch.Rd
===
--- switch.Rd   (revision 76766)
+++ switch.Rd   (working copy)
@@ -39,7 +39,7 @@
   in which case the next non-missing element is evaluated, so for
   example \code{switch("cc", a = 1, cc =, cd =, d = 2)} evaluates to
   \code{2}.  If there is more than one match, the first matching element
-  is used.  In the case of no match, if there is a unnamed element of
+  is used.  In the case of no match, if there is an unnamed element of
   \code{\dots} its value is returned.  (If there is more than one such
   argument an error is signaled.)
 
@@ -46,7 +46,7 @@
   The first argument is always taken to be \code{EXPR}: if it is named
   its name must (partially) match.
 
-  A warning is signaled if no alternatives are provides, as this is
+  A warning is signaled if no alternatives are provided, as this is
   usually a coding error.
   
   This is implemented as a \link{primitive} function that only evaluates
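
For reference, the fall-through and default behaviour the patched text describes is easy to check:

```r
## Empty alternatives fall through to the next non-missing element;
## an unnamed element acts as the default when nothing matches.
switch("cc", a = 1, cc = , cd = , d = 2)   # 2
switch("zz", a = 1, 99)                    # 99
```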


Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z

2019-06-23 Thread Ben Bolker


  I agree with many of the sentiments about the wisdom of computing very
small p-values (although the example below may win some kind of a prize:
I've seen people talking about p-values of the order of 10^(-2000), but
never 10^(-(10^8)) !).  That said, there are several tricks for
getting more reasonable sums of very small probabilities.  The first is
to scale the p-values by dividing by the *largest* of the probabilities,
then do the (p/sum(p)) computation, then multiply back (I'm sure
this is described/documented somewhere).  More generally, there are
methods for computing sums on the log scale, e.g.

https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.misc.logsumexp.html

 I don't know where this has been implemented in the R ecosystem, but
this sort of computation is the basis of the "Brobdingnag" package for
operating on very large ("Brobdingnagian") and very small
("Lilliputian") numbers.


On 2019-06-21 6:58 p.m., jing hua zhao wrote:
> Hi Peter, Rui, Christophe and Gabriel,
> 
> Thanks for your inputs -- the use of qnorm(., log=TRUE) is a good point, in
> line with pnorm, with which we devised log(p) as
> 
> log(2) + pnorm(-abs(z), lower.tail = TRUE, log.p = TRUE)
> 
> that could do really really well for large z compared to Rmpfr. Maybe I am 
> asking too much since
> 
> z <-2
>> Rmpfr::format(2*pnorm(mpfr(-abs(z),100),lower.tail=TRUE,log.p=FALSE))
> [1] "1.660579603192917090365313727164e-86858901"
> 
> already gives a rarely seen small p-value. I gather I also need a
> multiple-precision exp() and sum, since exp(z^2/2) is also a Bayes factor, so
> I get log(x_i)/sum_i log(x_i) instead. To this point, I am obliged to clarify;
> see
> https://statgen.github.io/gwas-credible-sets/method/locuszoom-credible-sets.pdf.
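
As a sketch, the log(p) formula quoted above stays usable far beyond double range if the result is kept on the log10 scale:

```r
## Two-sided log p-value for a large z, reported as log10(p) so that
## p-values like 1e-545 remain representable.
z <- 50
log10p <- (log(2) + pnorm(-abs(z), log.p = TRUE)) / log(10)
log10p  # about -545, i.e. p ~ 1e-545, well below .Machine$double.xmin
```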
> 
> I agree many feel geneticists go too far with small p-values, which I would
> have difficulty arguing against; on the other hand, it is also expected to see
> these in a non-genetic context. For instance the Framingham study, established
> in 1948, just got $34m for six years of phenome-wide association work, which
> will be interesting to see.
> 
> Best wishes,
> 
> 
> Jing Hua
> 
> 
> 
> From: peter dalgaard 
> Sent: 21 June 2019 16:24
> To: jing hua zhao
> Cc: Rui Barradas; r-devel@r-project.org
> Subject: Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z
> 
> You may want to look into using the log option to qnorm
> 
> e.g., in round figures:
> 
>> log(1e-300)
> [1] -690.7755
>> qnorm(-691, log=TRUE)
> [1] -37.05315
>> exp(37^2/2)
> [1] 1.881797e+297
>> exp(-37^2/2)
> [1] 5.314068e-298
> 
> Notice that floating point representation cuts out at 1e+/-308 or so. If you 
> want to go outside that range, you may need explicit manipulation of the log 
> values. qnorm() itself seems quite happy with much smaller values:
> 
>> qnorm(-5000, log=TRUE)
> [1] -99.94475
> 
> -pd
> 
>> On 21 Jun 2019, at 17:11 , jing hua zhao  wrote:
>>
>> Dear Rui,
>>
>> Thanks for your quick reply -- this allows me to see the bottom of this. I 
>> was hoping we could have a handle of those p in genmoics such as 1e-300 or 
>> smaller.
>>
>> Best wishes,
>>
>>
>> Jing Hua
>>
>> 
>> From: Rui Barradas 
>> Sent: 21 June 2019 15:03
>> To: jing hua zhao; r-devel@r-project.org
>> Subject: Re: [Rd] Calculation of e^{z^2/2} for a normal deviate z
>>
>> Hello,
>>
>> Well, try it:
>>
>> p <- .Machine$double.eps^seq(0.5, 1, by = 0.05)
>> z <- qnorm(p/2)
>>
>> pnorm(z)
>> # [1] 7.450581e-09 1.22e-09 2.026908e-10 3.343152e-11 5.514145e-12
>> # [6] 9.094947e-13 1.500107e-13 2.474254e-14 4.080996e-15 6.731134e-16
>> #[11] 1.110223e-16
>> p/2
>> # [1] 7.450581e-09 1.22e-09 2.026908e-10 3.343152e-11 5.514145e-12
>> # [6] 9.094947e-13 1.500107e-13 2.474254e-14 4.080996e-15 6.731134e-16
>> #[11] 1.110223e-16
>>
>> exp(z*z/2)
>> # [1] 9.184907e+06 5.301421e+07 3.073154e+08 1.787931e+09 1.043417e+10
>> # [6] 6.105491e+10 3.580873e+11 2.104460e+12 1.239008e+13 7.306423e+13
>> #[11] 4.314798e+14
>>
>>
>> p is the smallest possible such that 1 + p != 1 and I couldn't find
>> anything to worry about.
>>
>>
>> R version 3.6.0 (2019-04-26)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 19.04
>>
>> Matrix products: default
>> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
>> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
>>
>> locale:
>>  [1] LC_CTYPE=pt_PT.UTF-8   LC_NUMERIC=C
>>  [3] LC_TIME=pt_PT.UTF-8LC_COLLATE=pt_PT.UTF-8
>>  [5] LC_MONETARY=pt_PT.UTF-8LC_MESSAGES=pt_PT.UTF-8
>>  [7] LC_PAPER=pt_PT.UTF-8   LC_NAME=C
>>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods
>> [7] base
>>
>> other attached packages:
>>
>> [many packages loaded]
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> At 15:24 on 21/06/19, jing hua zhao wrote:
