Re: [Rd] stats::poly() stopped working for Date input -- intentional?

2022-08-05 Thread Martin Maechler
>>>>> Martin Maechler 
>>>>> on Fri, 8 Jul 2022 16:34:43 +0200 writes:

>>>>> Michael Chirico via R-devel 
>>>>> on Thu, 7 Jul 2022 22:17:12 -0700 writes:

>> SVN#80126 added rep.difftime, which means
>> rep(as.difftime(0, units="secs")) retains the "difftime"
>> class.

> (yes, by me, March 2021), this was fixing PR#18066 ==>
> https://bugs.r-project.org/show_bug.cgi?id=18066 )

> Thank you, Michael, for the report!

>> A consequence of this is that stats::poly() no longer
>> accepts Date/POSIXct input (because poly() calls outer() on
>> the de-meaned input, which in turn calls rep()):

>> # works on R 3.6.3 (and probably everything < 4.1.0)
>> # but on R 4.1.3 (and probably everything >= 4.1.0):
>> stats::poly(Sys.Date() - 0:4, 3)
>> # Error in Ops.difftime(X, Y, ...) :
>> #   '^' not defined for "difftime" objects

>> Is this intentional?

> Well, actually I think it was not intentional that poly()
> worked at all with Date/POSIXct input, ..  OTOH you *did*
> encounter it.

> Note that

>> poly(as.Date("2020-2-2") - 0:3, 2, raw = TRUE)
>   Error in Ops.Date(X, Y, ...) : ^ not defined for "Date"
> objects
>> 

> happens (I think) in all versions of R, i.e., even before
> the rep() extension.

>> If not, a simple patch is to call 'x <- as.double(x)'
>> before de-meaning.

> well, yes, in that branch of the source code.  ... and a
> similar call for the raw = TRUE case.

> At first, this seems to make sense to me, but actually it
> will break when someone uses

>poly(, ..)
   
> [ Also: what about the "prediction" case (coef =
> ) ?  could you use prediction of an lm() for
> your real use case ? ]

> ---

> Maybe it makes most sense if you open an R bugzilla entry
> for this (including part of our current dialogue).

Even though there hasn't been any such formal bug report,
I've now committed a change (to R-devel only for the time being,
svn revision 82681) 
which re-enables the working of poly() in such cases and even 
for the  raw=TRUE  case where it had never worked.
Also, this is now documented.

The only change to the source was the insertion of

if(is.object(x) && mode(x) == "numeric") x <- as.numeric(x) 

into the body of poly().
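For R versions that predate this fix, a user-level workaround is to coerce the Date/POSIXct input to numeric before calling poly(); a sketch (variable names here are illustrative, not from the R sources):

```r
## Workaround sketch for R versions without the r82681 fix:
## coerce Date input to numeric before building the basis.
d  <- Sys.Date() - 0:4                     # five consecutive dates
p  <- poly(as.numeric(d), degree = 3)      # orthogonal basis, 5 x 3
pr <- poly(as.numeric(d), 3, raw = TRUE)   # raw basis also works
stopifnot(nrow(p) == 5L, ncol(p) == 3L)
```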


Thank you once more, Michael, for raising the issue.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] qt() returns Inf with certain negative ncp values

2022-06-14 Thread Martin Maechler
>>>>> GILLIBERT, Andre 
>>>>> on Tue, 14 Jun 2022 13:39:41 + writes:

> Hello,
>> I asked about the following observations on r-help and it
>> was suggested that they may indicate an algorithmic
>> problem with qt(), so I thought I should report them
>> here.

Which is fine.
Usually you should *CAREFULLY* read the corresponding reference
documentation before posting.

In this case, we have on R's help page {on non-central pt():}

This computes the lower tail only, so the upper tail suffers
from cancellation and a warning will be given when this is
likely to be significant. 

and (in ‘Note:’)

The code for non-zero ncp is principally intended to be used
for moderate values of ncp: it will not be highly accurate,
especially in the tails, for large values. 

and further also that a simple inversion is used for computing
the non-central qt().

> I explored numerical accuracy issues of pt and qt with
> non-central parameters.  There seems to be problems when
> probabilities are small (less than 10^-12 or 10^-14).

Yes, the help (above) says  "especially in the tails",
i.e., this is also well known.

> A few examples:
> pnorm(-30)                # equal to 4.9e-198, which looks fine
> pt(-30, df=1, ncp=0)      # equal to 1e-189, which looks fine too
> pt(-30, df=1, ncp=0.01)   # equal to 1.044e-14, which looks bad;
>                           # it should be closer to zero than the previous one
> pt(-300, df=1, ncp=0.01)  # equal to 1.044e-14, while it should be even closer to zero!
> pt(-3000, df=1, ncp=0.01) # still equal to 1.044e-14, while it should be even closer to zero!

> qnorm(1e-13)                  # equal to -7.349, which looks fine
> qt(1e-13, df=1, ncp=0)        # equal to -7.359, which looks fine
> qt(1e-13, df=1, ncp=0.01)     # equal to -7.364, which looks fine
> qt(1.044e-14, df=1, ncp=0.01) # equal to -8.28, which looks fine
> qt(1.043e-14, df=1, ncp=0.01) # equal to -Inf, which is far too negative...

> The source code shows that the non-central qt() works by
> inverting the non-central pt()
> https://github.com/wch/r-source/blob/trunk/src/nmath/qnt.c

exactly; as the help page also says ..

> Consequently, both problems are related.

Indeed, and known and documented for a long time..

Still, this lack of a better algorithm had bothered me (as R
Core member) in the past quite a bit, and I had implemented other
approximations for cases where the current algorithm is
deficient... but I had not been entirely satisfied, nor had I
finished exploring or finding solutions in all relevant cases.

In the meantime I had created the CRAN package 'DPQ' (Density,
Probability, Quantile computations), which also contains
quite a few functions related to better/alternative computations
of pt(*, ncp=*), which I call pnt(), not least because R's
implementation of the algorithm is in   /src/nmath/pnt.c
and the C function is called pnt().

Till now, I have not found a student or a collaborator to
finally get this project further  {{hint, hint!}}.

In DPQ, (download the *source* package if you are interested),
there's a help page listing the current approaches I have

  https://search.r-project.org/CRAN/refmans/DPQ/html/pnt.html
or
  https://rdrr.io/cran/DPQ/man/pnt.html

Additionally, in the source (man/pnt.Rd) there are comments about a not yet
implemented one, and there are even two R scripts exhibiting
bogus (and already fixed) behavior of the non-central t CDF:

 https://rdrr.io/rforge/DPQ/src/tests/t-nonc-tst.R   and
 https://rdrr.io/rforge/DPQ/src/tests/pnt-prec.R

Indeed, this situation *can* be improved, but it needs dedicated work
by people somewhat knowledgeable in applied math etc.

Would you (readers ..) be interested in helping?

Best,
Martin

Martin Maechler
ETH Zurich  and  R Core team


PS: I'm adding code to explore this specific issue (better
inversion for those cases where pnt() is not the problem)
to my DPQ package  just these hours, notably a simple function
qtU() which only uses pt() and uniroot() to compute
(non-central) t-quantiles.
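The qtU() idea can be sketched in a few lines; this is an illustration only, not DPQ's actual implementation, and the default search interval is an assumption:

```r
## Invert pt() numerically via uniroot() to get a (non-central) t quantile.
qtU1 <- function(p, df, ncp, interval = c(-1e4, 1e4))
    uniroot(function(q) pt(q, df = df, ncp = ncp) - p,
            interval = interval, tol = 1e-12)$root

qtU1(0.025, df = 5, ncp = 1)   # close to qt(0.025, df = 5, ncp = 1)
```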



Re: [R-pkg-devel] a problem with the underscore in a R function document

2022-06-11 Thread Martin Maechler
> Yaoyong Li 
> on Sat, 11 Jun 2022 12:29:25 +0100 writes:

> Hello,

> I just got a problem in a function document in a package I developed. The
> file containing the document is generatingCDSaaFile.Rd. The problem is
> related to the underscore I used in the following sentence

> and the last part is ‘\_AAseq.txt.gz’

> I got a 'note' message when the package was checked in the CRAN:

> checkRd: (-1) generatingCDSaaFile.Rd:72: Escaped LaTeX specials: \_

> I don't think I saw this kind of message in the past (say six months
> ago). 

Yes, you are right.. this is quite new.  Some of my
/man/*.Rd  files have been affected similarly.

AFAIK there is some history behind this: the Rd -> LaTeX
translation code was partly buggy for quite a long time, and hence
such `\`-escapes were necessary in *.Rd files so that the produced
*.tex was LaTeX-able.

However, these Rd2latex bugs/problems have been fixed, and now
almost all such \-escapes are not only unneeded but actually make
the `\` visible, hence ugly looking
==> hence the NOTE.

> A copy of the check result in CRAN is appended below. As you can
> see, I also got the same problem in other places in the document. I
> have been trying to fix the problem via searching Google. I have tried
> to replace "\_" with "\textunderscore " as some internet post
> suggested, but this did not solve the problem.

> So I just wonder if anyone can help me with the problem. Please let me
> know if any more information is needed.

In all cases in my packages,  just removing the `\` (or
sometimes `\\` ?)  was perfect, so I think you can and should do
just that.
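For the fragment quoted above, the change would look like this (a hypothetical before/after of the Rd source in question):

```
% before -- triggers  checkRd: (-1) ... Escaped LaTeX specials: \_
and the last part is ‘\_AAseq.txt.gz’

% after -- the underscore needs no escaping in current Rd
and the last part is ‘_AAseq.txt.gz’
```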

Best,
Martin


> Best regards,

> Yaoyong

> #

> CRAN Package Check Results for Package geno2proteo
> 

> Last updated on 2022-06-11 11:52:26 CEST.
> Flavor                             Version  Tinstall  Tcheck  Ttotal  Status
> r-devel-linux-x86_64-debian-clang  0.0.5    17.31     170.54  187.85  NOTE
> r-devel-linux-x86_64-debian-gcc    0.0.5    12.00     130.58  142.58  NOTE
> r-devel-linux-x86_64-fedora-clang  0.0.5                      220.04  NOTE
> r-devel-linux-x86_64-fedora-gcc    0.0.5                      229.54  NOTE
> r-devel-windows-x86_64             0.0.5    242.00    470.00  712.00  NOTE
> r-patched-linux-x86_64             0.0.5    12.58     164.99  177.57  OK
> r-release-linux-x86_64             0.0.5                              OK
> r-release-macos-arm64              0.0.5                      65.00   OK
> r-release-macos-x86_64             0.0.5                      89.00   OK
> r-release-windows-x86_64           0.0.5    195.00    363.00  558.00  OK
> r-oldrel-macos-arm64               0.0.5                      57.00   OK
> r-oldrel-macos-x86_64              0.0.5                      84.00   OK

Re: [Rd] How to access to internal header files

2022-05-24 Thread Martin Maechler
> Sebastian Fischer 
> on Tue, 24 May 2022 09:01:17 +0200 writes:

> Dear R-devel Mailing List,

> I would like to get a better understanding of R's internal structures by 
> using R's
> C API. For that I would like to have access to all the C header files 
> that are listed
> here: https://github.com/wch/r-source/tree/trunk/src/include. (i.e. I 
> want to e.g.
> #include 

> However when I install R, there is only a subset of those header files 
> available on my system. 

Of course, very much on purpose.
Headers such as Defn.h are *not* part of the API and hence
should not be visible.

> While I am aware that files like Defn.h are not intended  to be used by
> R Extensions, I assume there must be some way to configure R's 
> compilation to make
> these definitions available for my exploration of the language.

Well, if you want to explore how R is written ... and that's the
only good reason for looking into such private header files ...
then get the sources of R and explore...

The official sources (and even daily snapshots from both
"R-patched" and "R-devel")  are available e.g. from

 https://cran.r-project.org/sources.html


> I would appreciate any help.

> Best regards

> Sebastian Fischer



Re: [Rd] intersect() change of behavior in 4.2

2022-04-14 Thread Martin Maechler
> Lluís Revilla 
> on Tue, 12 Apr 2022 09:16:36 +0200 writes:

> Hi all,
> This change is documented on the man page so I think it is intentional.

yes, also if you look at the (svn) log messages of the code changes
(or its git mirrors).

>> From https://search.r-project.org/R/refmans/base/html/sets.html:

> For union, a vector of a common mode.
> For intersect, a vector of a common mode, or NULL if x or y is NULL.
> For setdiff, a vector of the same mode as x.

> Now the results are symmetrical to intersect( "foo", list())

indeed, and that *is* very desirable.
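A quick check of the now-symmetric behavior (in R >= 4.2.0; in earlier versions the second call returned character(0)):

```r
intersect("foo", list())   # list()
intersect(list(), "foo")   # list()  -- now symmetric with the line above
```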

> Probably it is worth mentioning in the NEWS as it was found to cause
> some test to fail on a Bioconductor package some months ago.
> This could affect other packages and analysis too.

> Best,
> Lluís

I agree and have added an entry there ... still to be
backported to R 4.2.0 beta.

Martin


> On Tue, 12 Apr 2022 at 01:22, Gábor Csárdi  wrote:
>> 
>> I wonder if this change is intentional, and if it is, whether it is
>> worth mentioning in the NEWS.
>> 
>> ❯ R-4.1 -q -e 'intersect(list(), "foo")'
>> > intersect(list(), "foo")
>> character(0)
>> 
>> ❯ R-4.2 -q -e 'intersect(list(), "foo")'
>> > intersect(list(), "foo")
>> list()
>> 


Re: [Rd] Floating Point with POSIXct

2022-03-03 Thread Martin Maechler
>>>>> John Muschelli 
>>>>> on Thu, 3 Mar 2022 11:04:05 -0500 writes:

> I see in ?POSIXct and I'm trying to understand the note:
>> Classes "POSIXct" and "POSIXlt" are able to express fractions of a 
second. (Conversion of fractions between the two forms may not be exact, but 
will have better than microsecond accuracy.)

> Mainly, I'm trying to understand printing of POSIXct with fractional
> seconds.  I see print.POSIXct calls format.POSIXct and eventually
> calls format.POSIXlt, which then takes into account `digits.secs` for
> printing. The format uses %OS3, which strptime indicates (* added):

>> Specific to R is %OSn, which for output gives the seconds *truncated* to 
0 <= n <= 6 decimal places (and if %OS is not followed by a digit, it uses the 
setting of getOption("digits.secs"), or if that is unset, n = 0).

> So I'm seeing it truncates the seconds to 3 digits, so I think that is
> why the below is printing 0.024.

> I think this is especially relevant even if you set
> `options(digits.secs = 6)`, then the code in
> format.POSIXlt would still return np=3 as the following condition
> would break at i = 3

> for (i in seq_len(np) - 1L)
>     if (all(abs(secs - round(secs, i)) < 1e-06)) {
>         np <- i
>         break
>     }

> as sub_seconds - round(sub_seconds,3) < 1e-06.   This seems to be
> expected behavior given the docs, but would any consider this a bug?


> Example:

> options(digits.secs = 4)
> x = structure(947016000.025, class = c("POSIXct", "POSIXt"), tzone = "UTC")

I think you've fallen into the R FAQ 7.31 trap :

> ct <- 947016000.025
> ct %% 1
[1] 0.0248
>

Of course, the issue may still be somewhat interesting, ...

Yes, POSIXct is of limited precision and I think the help page
you mentioned did document that that's one reason for using
POSIXlt instead, as there, sub second accuracy can be much better.

But recall FAQ 7.31: all numbers are stored in base 2, and in
base 2 the decimal 0.025 cannot be represented with full accuracy.

Also, as you've noticed the R POSIX[cl]t  code just truncates,
i.e. rounds towards 0 unconditionally, and I tend to agree that it
should rather round than truncate.

But we should carefully separate the issues here, from the
underlying inherent FAQ 7.31 truth that most decimal numbers in
a computer are not quite what they look like ...
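The FAQ 7.31 point can be made concrete; a short sketch (the key fact is that the stored sub-second part falls slightly *below* 0.025, so %OS3-style truncation displays "024"):

```r
## 0.025 has no exact binary representation:
print(0.025, digits = 17)                 # 0.025000000000000001
## and at POSIXct magnitudes the fractional part lands *below* 0.025:
print(947016000.025 %% 1, digits = 17)    # 0.0248..., i.e. < 0.025
(947016000.025 %% 1) < 0.025              # TRUE -- hence truncation to .024
```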

Martin Maechler
ETH Zurich and  R Core Team (also author of the CRAN package 'round')


> summary(x, digits = 20)
> #>                      Min.                   1st Qu.                    Median
> #> "2000-01-04 20:00:00.024" "2000-01-04 20:00:00.024" "2000-01-04 20:00:00.024"
> #>                      Mean                   3rd Qu.                      Max.
> #> "2000-01-04 20:00:00.024" "2000-01-04 20:00:00.024" "2000-01-04 20:00:00.024"
> x
> #> [1] "2000-01-04 20:00:00.024 UTC"
> format.POSIXct(x, format = "%Y-%m-%d %H:%M:%OS3")
> #> [1] "2000-01-04 20:00:00.024"
> format.POSIXct(x, format = "%Y-%m-%d %H:%M:%OS4")
> #> [1] "2000-01-04 20:00:00.0249"
> sub_seconds = as.numeric(x) %% 1
> sub_seconds
> #> [1] 0.0248
> round(sub_seconds, 3)
> #> [1] 0.025

> rounded = as.POSIXct(
> floor(as.numeric(x)) +
> round(as.numeric(x) %% 1, 3),
> origin = "1970-01-01")
> rounded
> #> [1] "2000-01-04 20:00:00.024 UTC"
> as.numeric(rounded) %% 1
> #> [1] 0.0248

> R.version
> _
> platform   x86_64-pc-linux-gnu
> arch   x86_64
> os linux-gnu
> system x86_64, linux-gnu
> status
> major  4
> minor  1.2
> year   2021
> month  11
> day01
> svn rev81115
> language   R
> version.string R version 4.1.2 (2021-11-01)
> nickname   Bird Hippie



> Best,
> John



Re: [Rd] Inconsistent behavior of stats::bw.nrd() and stats::bw.nrd0()

2022-02-24 Thread Martin Maechler
>>>>> Noah Greifer 
>>>>> on Wed, 23 Feb 2022 11:21:18 -0500 writes:

> Hello R-devel,

> I noticed an inconsistency in stats::bw.nrd() and stats::bw.nrd0, two
> functions used to compute the bandwidth for densities. According to the
> documentation,

> "bw.nrd0 implements a rule-of-thumb for choosing the bandwidth of a
> Gaussian kernel density estimator. It defaults to 0.9 times the minimum of
> the standard deviation and the interquartile range divided by 1.34 times
> the sample size to the negative one-fifth power (= Silverman's ‘rule of
> thumb’, Silverman (1986, page 48, eqn (3.31))) unless the quartiles
> coincide when a positive result will be guaranteed.

> bw.nrd is the more common variation given by Scott (1992), using factor
> 1.06."

> This implies the result of bw.nrd() should simply be 1.06/.9 times the
> result of bw.nrd0(). However, these functions are coded quite differently
> and, in particular, respond to situations where the data has an IQR of 0
> differently. The source of bw.nrd0 is

> function (x)
> {
> if (length(x) < 2L)
> stop("need at least 2 data points")
> hi <- sd(x)
> if (!(lo <- min(hi, IQR(x)/1.34)))
> (lo <- hi) || (lo <- abs(x[1L])) || (lo <- 1)
> 0.9 * lo * length(x)^(-0.2)
> }
> 
> and the source of bw.nrd is
> 
> function (x)
> {
> if (length(x) < 2L)
> stop("need at least 2 data points")
> r <- quantile(x, c(0.25, 0.75))
> h <- (r[2L] - r[1L])/1.34
> 1.06 * min(sqrt(var(x)), h) * length(x)^(-1/5)
> }
> 

> Importantly, when the IQR of the input is 0, bw.nrd0() falls back onto the
> standard deviation, guaranteeing the positive result described in the
> documentation. Whereas, bw.nrd() produces a result of 0. I am not sure
> which result is more desirable, but it would seem to me that they should at
> least be consistent. See examples below:

> > x <- c(1,1,1,1,1000)
> > stats::bw.nrd(x)
> [1] 0
> > stats::bw.nrd0(x)
> [1] 291.4265

> Noah Greifer

This is for historical (and copyright?) reasons only/mostly:

At the time  R 1.0.0 was released (on Feb. 29, 2000 -- a date not
   existing in MS Windows 3.11)
there were no bw.*() functions
and density() already contained

if (missing(bw))
  bw <-
if(missing(width)) {
hi <- sd(x)
if(!(lo <- min(hi, IQR(x)/1.34)))# qnorm(.75) - qnorm(.25) = 1.34898
(lo <- hi) || (lo <- abs(x[1])) || (lo <- 1.)
adjust * 0.9 * lo * N^(-0.2)
} else 0.25 * width

whereas in Nov 1999, it still was

if (missing(bw))
bw <-
if(missing(width))
adjust * 0.9 * min(sd (x), IQR(x)/1.34) * N^(-0.2)
else 0.25 * width

(which actually *was* Silverman's rule of thumb).  He, like Scott
 and the other mathematical statisticians who published on the
 problem, was never concerned with the fact that a good algorithm
 must also work in extreme cases.)

As a matter of fact, svn trunk rev 6994 (Dec 11, 1999) was a
large "branch update" which changed the 'bw' computation to the
more robust one (never giving 0, BTW, even when sd(x) == 0, not
just IQR(.) == 0).

So, we had already made sure that the default bandwidth 'bw' was
really numerically robust and would always give a positive result.

Then, about 2 years later, with svn r16938, 2001-11-28
by Prof Brian Ripley with message  'add bandwidth-selectors to density()'
he brought in all (I think) of the current bw.*() choices
actually from the MASS book's S/R code by Venables & Ripley,
including the most traditional  bw.nrd() one, as they were
already described in the book;
and kept the default bw for density() as previously,
now modularized in the bw.nrd0() function.

Also note that the documentation (the help page) is more or less
precise here since, as I said above, Scott never considered
dealing with the not-so-uncommon case of IQR(x) == 0.

But you are right that the help page could be more precise here
and make clear that the robustness applies only to nrd0, not
to nrd.
I'd think we'd accept a patch proposal for the
src/library/stats/man/bandwidth.Rd  help file, making this more
unambiguous.
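For illustration, here is one way a robustified bw.nrd() *could* mirror bw.nrd0()'s fallback chain (sd -> IQR -> |x[1]| -> 1) while keeping Scott's factor 1.06; a sketch only, not an accepted or proposed patch:

```r
bw.nrd.robust <- function(x) {
    if (length(x) < 2L) stop("need at least 2 data points")
    hi <- sd(x)
    ## same guaranteed-positive fallback chain as bw.nrd0():
    if (!(lo <- min(hi, IQR(x)/1.34)))
        (lo <- hi) || (lo <- abs(x[1L])) || (lo <- 1)
    1.06 * lo * length(x)^(-0.2)
}
x <- c(1, 1, 1, 1, 1000)    # IQR(x) == 0
bw.nrd.robust(x) > 0        # TRUE: falls back to sd(x), unlike bw.nrd()
```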

Best,
Martin


Martin Maechler
ETH Zurich  and  R Core team



Re: [Rd] str2lang throws error when the string is empty

2022-02-12 Thread Martin Maechler
> Dipterix Wang 
> on Fri, 11 Feb 2022 06:55:44 -0500 writes:

> Hi,

> str2lang("") raises an error in current version. 

on purpose.

> Would it
> be good if it returns a missing value expression? 

Well, others may be able to better explain why "the empty name"
aka your "missing value expression" or just "the missing"   is a
"dangerous" object and ideally would not be available at all
at the R level.  OTOH, it is available e.g. via alist(), maybe
slightly "less ugly" as   alist(.=)$.

but I don't think there are really good use cases.
(see below)

> One use-case would be to build an expression that subsets an
> array:

> # Expected: x[index1, ] 
> as.call(list(quote(`[`), quote(x), quote(index1), str2lang("")))

> Right now I'm using the following, which is ugly
> as.call(list(quote(`[`), quote(x), quote(index1), alist(x=)[[1]]))

Well, in such cases, much less ugly than both your versions is to
use  substitute(),
here e.g.,

  > substitute(x[I,], list(I = quote(index1)))
  x[index1, ]
  >
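Side by side, the constructions discussed here build the same language object (a sketch; the check at the end is the point):

```r
## Three ways to build the call  x[index1, ]  with an empty index slot:
e1 <- substitute(x[I, ], list(I = quote(index1)))
e2 <- as.call(list(quote(`[`), quote(x), quote(index1), alist(x = )[[1]]))
e3 <- as.call(list(quote(`[`), quote(x), quote(index1), quote(expr = )))
stopifnot(identical(e1, e2), identical(e1, e3))  # each prints as x[index1, ]
```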

> Thanks, - Dipterix

You are welcome,
Martin



{Please, for next time do use  text/plain  e-mail : }

> [[alternative HTML version deleted]]



Re: [Rd] Bug in rbind.data.frame?

2022-02-09 Thread Martin Maechler
> Kurt Hornik 
> on Mon, 31 Jan 2022 09:29:22 +0100 writes:

> Duncan Murdoch writes:
>> Okay, I spotted it.  This is intentional.  From ?rbind.data.frame:
>> "The rbind data frame method first drops all zero-column and zero-row 
>> arguments."

> Hmm.  "As documented", but still surprising to me as well ...

> We also say

> For ‘rbind’ column names are taken from the first argument with
> appropriate names: colnames for a matrix, or names for a vector of
> length the number of columns of the result.

> Of course, one could argue that "The rbind data frame method first drops
> all zero-column and zero-row arguments." implies that "first argument
> ..." should be taken after dropping, but then

R> m <- matrix(0, 0, 2, dimnames = list(NULL, c("a", "b")))
R> rbind(m, c(3, 4))
>  a b
> [1,] 3 4

> which is not consistent with the data frame case.

(I agree and I think we should even consider to change
 rbind.data.frame() there  ... )

> Btw, whereas

R> rbind(c(1, 2), c(3, 4, 5))
> Warning in rbind(c(1, 2), c(3, 4, 5)) :
> number of columns of result is not a multiple of vector length (arg 1)
>      [,1] [,2] [,3]
> [1,]    1    2    1
> [2,]    3    4    5

> "as documented", 

R> df <- data.frame(a = 1, b = 2)
> rbind(df, c(3, 4, 5))
>   a b
> 1 1 2
> 2 3 4

> which is a bit worrying (and not as documented)?

Kurt and I have continued to talk about this,
and a few minutes ago I committed a change to R-devel's
rbind.data.frame()

which now gives

> rbind(data.frame(a = 1, b = 2), c(3, 4, 5))
  a b
1 1 2
2 3 4
Warning message:
In rbind(deparse.level, ...) :
  number of columns of result, 2, is not a multiple of vector length 3 of arg 2
> 

i.e., the same result, but *with* an informative warning,
analogous to the warning that has been produced "forever" in
the matrix case.
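The inconsistency Kurt pointed out can be seen directly (a sketch contrasting the two methods):

```r
## The matrix method keeps the colnames of a zero-row argument ...
m <- matrix(0, 0, 2, dimnames = list(NULL, c("a", "b")))
rbind(m, c(3, 4))                # columns named  a b

## ... while the data frame method drops zero-row arguments first,
## so the names come from the next (non-dropped) argument:
df0 <- data.frame(a = numeric(0), b = numeric(0))
df1 <- data.frame(x = 3, y = 4)
rbind(df0, df1)                  # columns named  x y  -- df0 was dropped
```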

Martin



Re: [Rd] model.weights and model.offset: request for adjustment

2022-02-03 Thread Martin Maechler
> Ben Bolker 
> on Tue, 1 Feb 2022 21:21:46 -0500 writes:

> The model.weights() and model.offset() functions from the 'stats' 
> package index possibly-missing elements of a data frame via $, e.g.

> x$"(offset)"
> x$"(weights)"

> This returns NULL without comment when x is a data frame:

> x <- data.frame(a=1)
> x$"(offset)"  ## NULL
> x$"(weights)"  ## NULL

> However, when x is a tibble we get a warning as well:

> x <- tibble::as_tibble(x)
> x$"(offset)"
> ## NULL
> ## Warning message:
> ## Unknown or uninitialised column: `(offset)`.

> I know it's not R-core's responsibility to manage forward 
> compatibility with tibbles, but in this case [[-indexing would seem to 
> be better practice in any case.

Yes, I would agree:  we should use  [[ instead of $ here,
in order to force exact matching, simply as a matter of principle.

Importantly, because  mf[["(weights)"]]  also
returns NULL without a warning for a model/data frame, and
it seems it does so for tibbles as well.
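The exact-matching point can be demonstrated (a sketch; the column name is chosen purely to exhibit the partial match):

```r
## $ on a data frame does partial name matching; [[ matches exactly.
df <- data.frame(`(weights)extra` = 1:3, check.names = FALSE)
df$`(weights)`       # partially matches "(weights)extra" -- not what we want
df[["(weights)"]]    # NULL -- exact matching, as model.weights() needs
```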

> Might a patch be accepted ... ?

That would not be necessary.

There's one remaining problem however:
`$` access is clearly faster than `[[` for small data frames
(because `$` is a primitive function doing everything in C, 
 whereas `[[` calls the R level data frame method ).

Faster in both cases, i.e., when there *is* a column and when there
is none (and NULL is returned), e.g., for the first case

> system.time(for(i in 1:2) df[["a"]])
   user  system elapsed 
  0.064   0.000   0.065 
> system.time(for(i in 1:2) df$a)
   user  system elapsed 
  0.009   0.000   0.009 

So that's probably been the reason why  `$`  has been preferred?


Martin

> cheers
> Ben Bolker



Re: [Rd] documentation patch for as.formula → reformulate

2022-01-10 Thread Martin Maechler
> Ben Bolker   on Sun, 9 Jan 2022 16:39:43 -0500 writes:

>There was some discussion on twitter about the fact
> that the manual page for as.formula() doesn't mention
> reformulate(), and indeed the last example is

> ## Create a formula for a model with a large number of
> variables: xnam <- paste0("x", 1:25) (fmla <-
> as.formula(paste("y ~ ", paste(xnam, collapse= "+"


> which could arguably be better done as

>reformulate(xname, response = "y")

>I've attached a documentation patch that adds the
> alternative version and a \seealso{} link.

>Happy to submit to r-bugzilla if requested.

>cheers Ben Bolker

[DELETED ATTACHMENT external: reformulate_patch.txt, plain text]

Thanks a lot, Ben!

I've committed (+-) it to R-devel as svn rev 81464
Martin
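For reference, the two constructions discussed above can be put side by side (a quick sketch):

```r
xnam  <- paste0("x", 1:25)
fmla1 <- as.formula(paste("y ~ ", paste(xnam, collapse = "+")))
fmla2 <- reformulate(xnam, response = "y")
## both represent  y ~ x1 + x2 + ... + x25
```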



Re: [Rd] "getOption(max.print) omitted %d entries" may be negative

2022-01-08 Thread Martin Maechler
>>>>> Tomas Kalibera 
>>>>> on Mon, 3 Jan 2022 20:59:30 +0100 writes:

    > On 1/3/22 6:15 PM, Martin Maechler wrote:
>>>>>>> Hugh Parsonage on Wed, 29 Dec 2021 00:36:51 +1100
>>>>>>> writes:
>> > In src/main/printvector.c in the definition of
>> printVector and > printNamedVector (and elsewhere):
>> 
>> > Rprintf(" [ reached getOption(\"max.print\") -- omitted
>> %d entries ]\n", > n - n_pr);
>> 
>> > Though n - n_pr is of type R_xlen_t so may not be
>> representable as > int. In practice negative values may
>> be observed for long vectors.
>> 
>> > Rprintf(" [ reached getOption(\"max.print\") -- omitted
>> %lld entries ]\n", > n - n_pr);
>> 
>> 
>> Thank you Hugh, for finding and reporting this, including
>> a proposed remedy.
>> 
>> At some point in time, I think the %lld format specifier
>> was not portable enough to all versions of C compiler /
>> libraries that were considered valid for compiling R.
>> 
>> See e.g.,
>> 
>> https://stackoverflow.com/questions/462345/format-specifier-for-long-long
>> 
>> which says that "it" does not work on Windows.
>> 
>> Maybe this has changed now that we require C99 and also
>> that since R version 4.0.0 (or 4.0.1) we also use a
>> somewhat more recent version of gcc also on Windows?
>> 
>> ... ah, searching the R sources reveals uses of %lld
>> *plus*
>> 
>> #ifdef Win32 #include  /* for %lld */ #endif
>> 
>> so it seems we can and should probably change this ...

> UCRT on Windows supports the C99 format, so %lld works,
> but there is a bug in GCC which causes a compilation
> warning to appear for %lld.

> There is an open GCC bug report with a patch. It has not
> been adopted, yet, but I got reviews from two people and
> patched the build of GCC in Rtools42. So, %lld etc now
> works without a warning for us on Windows and certainly
> can be used in package code.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95130

> For base R, as we have been using the trio remap to get
> rid of the warning with %lld, it would make sense to keep
> doing this for consistency. Eventually we might be able to
> remove the dependency on trio, after checking that the
> other problems due to which we use it have been resolved
> in UCRT.

> Tomas

I have committed changes now (svn r81459), using %lld as mentioned above,
also including  trioremap.h  for Windows,
not just for printing long unnamed atomic vectors, but also in
the code for printing named vectors and "generic vectors" aka
lists, which previously did not allow long vectors at all,
using `length()` and `int` before.

I've set it to be back-ported to  "R 4.1.2 patched"  eventually.
Martin




>> 
>> [Please, C compiler / library standard experts, chime in
>> !]
>> 
>> Martin Maechler ETH Zurich and R core team



Re: [Rd] "getOption(max.print) omitted %d entries" may be negative

2022-01-03 Thread Martin Maechler
>>>>> Hugh Parsonage 
>>>>> on Wed, 29 Dec 2021 00:36:51 +1100 writes:

> In src/main/printvector.c in the definition of printVector and
> printNamedVector  (and elsewhere):

> Rprintf(" [ reached getOption(\"max.print\") -- omitted %d entries ]\n",
> n - n_pr);

> Though n - n_pr is of type R_xlen_t so may not be representable as
> int. In practice negative values may be observed for long vectors.

> Rprintf(" [ reached getOption(\"max.print\") -- omitted %lld entries ]\n",
> n - n_pr);


Thank you Hugh, for finding and reporting this,
including a proposed remedy. 

At some point in time, I think the   %lld   format specifier was
not portable enough to all versions of C compiler / libraries
that were considered valid for compiling R.

See e.g.,

   https://stackoverflow.com/questions/462345/format-specifier-for-long-long

which says that "it" does not work on Windows.

Maybe this has changed now that we require C99 and also that
since R version 4.0.0 (or 4.0.1) we also use a somewhat more
recent version of gcc also on Windows?

... ah, searching the R sources reveals uses of %lld
*plus*

#ifdef Win32
#include  /* for %lld */
#endif

so it seems we can and should probably change this ...

[Please, C  compiler / library standard experts, chime in !]

Martin Maechler
ETH Zurich  and  R core team



Re: [Rd] trivial typo in NEWS file

2022-01-03 Thread Martin Maechler
> Ben Bolker 
> on Mon, 3 Jan 2022 11:04:48 -0500 writes:

> Index: doc/NEWS.Rd
> ===
> --- doc/NEWS.Rd   (revision 81435)
> +++ doc/NEWS.Rd   (working copy)
> @@ -425,7 +425,7 @@
> data frames with default row names (Thanks to Charlie Gao's
> \PR{18179}).

> -  \item \code{txtProgresBar()} now enforces a non-zero width for
> +  \item \code{txtProgressBar()} now enforces a non-zero width for
> \code{char}, without which no progress can be visible.

> \item \code{dimnames(table(d))} is more consistent in the case where


Thank you, Ben!

I will take care of this with my next commit (dealing with R's
bugzilla PR#18272).

Martin



Re: [Rd] Why does lm() with the subset argument give a different answer than subsetting in advance?

2022-01-03 Thread Martin Maechler
> Ben Bolker 
> on Mon, 27 Dec 2021 09:43:42 -0500 writes:

>I agree that it seems non-intuitive (I can't think of a
> design reason for it to look this way), but I'd like to
> stress that it's *not* an information leak; the
> predictions of the model are independent of the
> parameterization, which is all this issue affects. In a
> worst case there might be some unfortunate effects on
> numerical stability if the data-dependent bases are
> computed on a very different set of data than the model
> fitting actually uses.

>I've attached a suggested documentation patch (I hope
> it makes it through to the list, if not I can add it to
> the body of a message.)

It did make it through;  thank you, Ben!
( After adding two forgotten '}' ) I've committed the help file
additions to the R sources (R-devel) in svn r81434 .

Thanks again and

   "Happy New Year"

to all readers,

Martin
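A small sketch of the behaviour under discussion (hypothetical data; the point being that the data-dependent basis, and hence the coefficients, differ, while the fitted values agree, as Ben notes):

```r
set.seed(1)
d <- data.frame(x = runif(20), y = runif(20))
s <- 1:10
## subset= is applied *after* the model-frame variables are evaluated,
## so poly()'s orthogonal basis is computed from all 20 x-values here:
f1 <- lm(y ~ poly(x, 2), data = d, subset = s)
## ... but from only the 10 retained x-values here:
f2 <- lm(y ~ poly(x, 2), data = d[s, ])
coef(f1) - coef(f2)     # generally nonzero: different parameterization
all.equal(unname(fitted(f1)), unname(fitted(f2)))  # TRUE: same predictions
```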




> On 12/26/21 8:35 PM, Balise, Raymond R wrote:
>> Hello R folks, Today I noticed that using the subset
>> argument in lm() with a polynomial gives a different
>> result than using the polynomial when the data has
>> already been subsetted. This was not at all intuitive for
>> me.  You can see an example here:
>> 
https://stackoverflow.com/questions/70490599/why-does-lm-with-the-subset-argument-give-a-different-answer-than-subsetting-i
>> 
>> If this is a design feature that you don’t think should
>> be fixed, can you please include it in the documentation
>> and explain why it makes sense to figure out the
>> orthogonal polynomials on the entire dataset?  This feels
>> like a serious leak of information when evaluating train
>> and test datasets in a statistical learning framework.
>> 
>> Ray
>> 
>> Raymond R. Balise, PhD Assistant Professor Department of
>> Public Health Sciences, Biostatistics
>> 
>> University of Miami, Miller School of Medicine 1120
>> N.W. 14th Street Don Soffer Clinical Research Center -
>> Room 1061 Miami, Florida 33136
>> 
>> 
>> 
>> [[alternative HTML version deleted]]
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 

> -- 
> Dr. Benjamin Bolker Professor, Mathematics & Statistics
> and Biology, McMaster University Director, School of
> Computational Science and Engineering Graduate chair,
> Mathematics & Statistics x[DELETED ATTACHMENT external:
> BenB_lm-subset.patch, plain text]
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] Feature request: compareVersion

2021-12-21 Thread Martin Maechler


> Hi,
> currently I have to use

> compareVersion(as.character(packageVersion("shiny")), "0.11")

> It would be nice if compareVersion() did the as.character conversion 
> internally, rather than forcing the user to do it.

> Thanks
> Sigbert

Well, if you follow the help page examples of packageVersion()
you would use

> packageVersion("Matrix") >= "1.4.0"
[1] TRUE
> packageVersion("shiny") >= "0.11"
[1] TRUE
>
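The reason the `>=` comparison just works (and a plain string comparison would not) is that packageVersion() returns a "package_version" object, which compares component-wise — a small illustration:

```r
## Lexicographic string comparison gets version numbers wrong:
"0.9" > "0.11"                                     # TRUE  -- wrong for versions
## numeric_version() compares component by component:
numeric_version("0.9") > numeric_version("0.11")   # FALSE -- correct
## packageVersion() already returns such an object, so `>=` coerces the
## right-hand character string itself -- no as.character() needed.
```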



Re: [Rd] meaning of browser(skipCalls=) [and multiple mouse buttons]

2021-12-16 Thread Martin Maechler
> Frederick Eaton 
> on Wed, 15 Dec 2021 20:09:46 -0800 writes:

> Just following up to check if anyone has had time to look over these 
patches.
> Frederick

I strongly guess that nobody has.

Let me give you my perception of what you have tried to
propose/use,  and why I hadn't thought I should put in time for it:

You had started the thread by proposing "to override stopifnot()",
something which I (even though principal author of the function)
don't find a good idea at all:

stopifnot() is just one important utility function that will
call stop() under some circumstances.
If you want to tweak  error handling / debugging / browser, ..
you need to work on the level of error conditions, their
handlers, etc. 
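A minimal sketch of that condition-level route (an illustration of the general mechanism, not a recommendation of specific options):

```r
## Instead of overriding stopifnot(), intercept the error condition:
withCallingHandlers(
  stopifnot(1 == 2),
  error = function(e) {
    ## runs in the context of the error, before the stack unwinds:
    cat("failed:", conditionMessage(e), "\n")
    print(sys.calls())   # inspect the full call stack
    ## or drop into the debugger here via browser() / recover()
  }
)
## or globally, for interactive sessions:
## options(error = recover)   # choose a frame to browse interactively
```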

Secondly, you've mixed this up with mouse button
action/interrupt/.. handling  which may be a cool and nice idea,
but then your  `xbindkey`-etc code is, I think, only/entirely
for X11-based R interfaces, and I think this would only be a
Linux console, possibly one from using ESS (Emacs Speaks Statistics),
but most probably (but I'm guessing here) not even relevant when
using Rstudio on Linux, and even less relevant for any of the
other ways R is used interactively on non-Linux platforms. Maybe
it would also apply to *some* uses of R on the Mac, but not even
the default R-Mac GUI..

Sorry that this is not as encouraging as it probably should
be, but I thought you'd rather have *some* feedback than none...

Best,
Martin



> On Wed, Dec 08, 2021 at 12:24:47AM -0800, Frederick Eaton wrote:
>> Dear R Core Team,
>> 
>> I'm attaching a proposed patch to hopefully address my confusions 
regarding the documentation of browser(). I'm not sure if all the material I 
added is correct, but I made experiments to confirm that the behavior is at 
least roughly as described.
>> 
>> patch ./src/library/base/man/browser.Rd < browser.patch
>> 
>> Also, here is a patch to support multiple mouse buttons in 
getGraphicsEvent(). This must be edited before it can be applied, I decided to 
keep the old code in an 'if(0)' to help make it clearer that my code is 
essentially doing the same thing.
>> 
>> 
https://github.com/navarum/tweaks/blob/master/r/patches/0001-Add-support-for-multiple-mouse-buttons.patch
>> wget -O - 
https://raw.githubusercontent.com/navarum/tweaks/master/r/patches/0001-Add-support-for-multiple-mouse-buttons.patch
 | patch -p1
>> 
>> It would be useful to have support in R for more than three mouse 
buttons because this enables the use of the mouse wheel (buttons 4 and 5), 
which can provide a more convenient interface when adjusting numbers and 
graphics and so on. I also have shift+wheel bound to buttons 6 and 7 via 
xbindkeys and xte, which I use for horizontal scrolling, via a trick from the 
web somewhere:
>> 
>> $ cat .xbindkeysrc.scm | grep xte
>> (xbindkey '(shift "b:4") "xte 'mouseclick 6'")
>> (xbindkey '(shift "b:5") "xte 'mouseclick 7'")
>> 
>> I hope that these contributions can be found acceptable.
>> 
>> Thank you in advance,
>> 
>> Frederick
>> 
>> 
>> 
>> On Mon, Nov 22, 2021 at 09:13:58AM -0800, Frederick Eaton wrote:
>>> Dear R Devel,
>>> 
>>> I have been advised to use "options(error=recover)" to enable
>>> debugging on errors. But sometimes it would seem more convenient to
>>> override "stopifnot", for example:
>>> 
>>> stopifnot = function(b) { if(!b) { browser(skipCalls=1); } }
>>> 
>>> However, this doesn't do what I expected. On looking closer I find
>>> that the "skipCalls" argument seems to be ignored except when printing
>>> the "Called from: " message; it does not affect the evaluation context
>>> or the output of 'where':
>>> 
>>> > var=2; f=function(){var=1; browser(skipCalls=0)}; f()
>>> Called from: f()
>>> Browse[1]> var
>>> [1] 1
>>> Browse[1]> where
>>> where 1: f()
>>> 
>>> Browse[1]> Q
>>> > var=2; f=function(){var=1; browser(skipCalls=1)}; f()
>>> Called from: top level Browse[1]> var
>>> [1] 1
>>> Browse[1]> where
>>> where 1: f()
>>> 
>>> Browse[1]> Q
>>> > var=2; f=function(){var=1; browser(skipCalls=2)}; f()
>>> Called from: top level Browse[1]> var
>>> [1] 1
>>> Browse[1]> where
>>> where 1: f()
>>> 
>>> Browse[1]> Q
>>> 
>>> So it appears that the "browser()" API does not actually make it
>>> possible to call this built-in function from within another R function
>>> and thereby emulate the same behavior as calling browser() directly.
>>> 
>>> If this is the case, it might be good to have it fixed or documented.
>>> I am aware of "browser(expr=)", but this requires editing the
>>> particular call that failed. The documentation for "browser()" led me
>>> to hope that my use case would be supported, if only because it admits
>>> that users might want to 

[Rd] Appropriate mailing list for " Improved LP/MIP solver "

2021-12-13 Thread Martin Maechler
>>>>> Avraham Adler 
>>>>> on Sun, 12 Dec 2021 16:27:02 + writes:

 []

Thank you, Julian and Avi, for asking and replying on the topic of
whether there's interest in getting improved LP / MIP (and QP, I
think, was mentioned too!) solver interfaces for R.

However, Avi writes

> Also, to be good R-citizens, this thread should probably be moved to
> R-package-devel [1].

> Thanks,
> Avi

and I'm of an entirely different opinion.

R-package-devel  was created, aeons after R-devel,  to
*help* R package developers get their packaging problems
solved, notably to get advice in making their packages ready for
CRAN.

Julian's question was really addressing the whole R developer
community asking if some functionality was desirable to be added
to the R-package space.

For me this is one of *the* appropriate topics for this R-devel
mailing list.

Best,
Martin

---
Martin Maechler
ETH Zurich  and  R Core
(and original creator of R-help, R-devel, .. lists)

> [1] https://stat.ethz.ch/mailman/listinfo/r-package-devel



Re: [Rd] plogis (and other p* functions), vectorized lower.tail

2021-12-09 Thread Martin Maechler
>>>>> Sokol Serguei on Thu, 9 Dec 2021 17:13:36 +0100 writes:

> On 09/12/2021 16:55, Ben Bolker wrote:
>> 
>> 
>> On 12/9/21 10:03 AM, Martin Maechler wrote:
>>>>>>>> Matthias Gondan
>>>>>>>>  on Wed, 8 Dec 2021 19:37:09 +0100 writes:
>>> 
>>>  > Dear R developers,
>>>  > I have seen that plogis silently ignores vector elements of 
>>> lower.tail,
>>> 
>>> and also of 'log'.
>>> This is indeed the case for all d*, p*, q* functions.
>>> 
>>> Yes, this has been on purpose and therefore documented, in the
>>> case of plogis, e.g. in the 'Value' section of ?plogis :
>>> 
>>>   The length of the result is determined by ‘n’ for ‘rlogis’, and is
>>>   the maximum of the lengths of the numerical arguments for the
>>>   other functions.
>>> 
>>>   (note: *numerical* arguments: the logical ones are not recycled)
>>> 
>>>   The numerical arguments other than ‘n’ are recycled to the length
>>>   of the result.  Only the first elements of the logical arguments
>>>   are used.
>>> 
>>>   (above, we even explicitly mention the logical arguments ..)
>>> 
>>> 
>>> Recycling happens for the first argument (x,p,q) of these
>>> functions and for "parameters" of the distribution, but not for
>>> lower.tail, log.p (or 'log').
>>> 
>>> 
>>>  >> plogis(q=0.5, location=1, lower.tail=TRUE)
>>>  > [1] 0.3775407
>>>  >> plogis(q=0.5, location=1, lower.tail=FALSE)
>>>  > [1] 0.6224593
>>>  >> plogis(q=c(0.5, 0.5), location=1, lower.tail=c(TRUE, FALSE))
>>>  > [1] 0.3775407 0.3775407
>>> 
>>>  > For those familiar with psychological measurement: A use case 
>>> of the above function is the so-called Rasch model, where the 
>>> probability that a person with some specific ability (q) makes a 
>>> correct (lower.tail=TRUE) or wrong response (lower.tail=FALSE) to an 
>>> item with a specific difficulty (location). A vectorized version of 
>>> plogis would enable to determine the likelihood of an entire response 
>>> vector in a single call. My current workaround is an intermediate 
>>> call to „Vectorize“.
>>> 
>>>  > I am wondering if the logical argument of lower.tail can be 
>>> vectorized (?). I see that this may be a substantial change in many 
>>> places (basically, all p and q functions of probability 
>>> distributions), but in my understanding, it would not break existing 
>>> code which assumes lower.tail to be a single element. If that’s not
>>>  > possible/feasible, I suggest to issue a warning if a vector of 
>>> length > 1 is given in lower.tail. I am aware that the documentation 
>>> clearly states that lower.tail is a single boolean.
>>> 
>>> aah ok, here you say you know that the current behavior is documented.
>>> 
>>>  > Thank you for your consideration.
>>> 
>>> 
>>> As you mention, changing this would be quite a large endeavor.
>>> I had thought about doing that many years ago, not remembering
>>> details, but seeing that in almost all situations you really
>>> only need one of the two tails  (for Gaussian- or t- based confidence
>>> intervals you also only need one, for symmetry reason).
>>> 
>>> Allowing the recycling there would make the intermediate C code
>>> (which does the recycling) larger and probably slightly
>>> slower because of conceptually two more for loops which would in
>>> 99.9% only have one case ..
>>> 
>>> I'd have found that ugly to add. ... ...
>>> ... but of course, if you can prove that the code bloat would not be 
>>> large
>>> and not deteriorate speed in a measurable way and if you'd find
>>> someone to produce a comprehensive and tested patch ...
>>> 
>>> Martin
>>> 
>>> 
>>>  > With best wishes,
>>>  > Matthias
>>> 
>>> 
>>> 
>>>  

Re: [Rd] plogis (and other p* functions), vectorized lower.tail

2021-12-09 Thread Martin Maechler
> Matthias Gondan 
> on Wed, 8 Dec 2021 19:37:09 +0100 writes:

> Dear R developers,
> I have seen that plogis silently ignores vector elements of lower.tail,

and also of 'log'.
This is indeed the case for all d*, p*, q* functions.

Yes, this has been on purpose and therefore documented, in the
case of plogis, e.g. in the 'Value' section of ?plogis :

 The length of the result is determined by ‘n’ for ‘rlogis’, and is
 the maximum of the lengths of the numerical arguments for the
 other functions.

 (note: *numerical* arguments: the logical ones are not recycled)

 The numerical arguments other than ‘n’ are recycled to the length
 of the result.  Only the first elements of the logical arguments
 are used.

 (above, we even explicitly mention the logical arguments ..)


Recycling happens for the first argument (x,p,q) of these
functions and for "parameters" of the distribution, but not for
lower.tail, log.p (or 'log').


>> plogis(q=0.5, location=1, lower.tail=TRUE) 
> [1] 0.3775407
>> plogis(q=0.5, location=1, lower.tail=FALSE) 
> [1] 0.6224593
>> plogis(q=c(0.5, 0.5), location=1, lower.tail=c(TRUE, FALSE)) 
> [1] 0.3775407 0.3775407

> For those familiar with psychological measurement: A use case of the 
above function is the so-called Rasch model, where the probability that a 
person with some specific ability (q) makes a correct (lower.tail=TRUE) or 
wrong response (lower.tail=FALSE) to an item with a specific difficulty 
(location). A vectorized version of plogis would enable to determine the 
likelihood of an entire response vector in a single call. My current workaround 
is an intermediate call to „Vectorize“.

> I am wondering if the logical argument of lower.tail can be vectorized 
(?). I see that this may be a substantial change in many places (basically, all 
p and q functions of probability distributions), but in my understanding, it 
would not break existing code which assumes lower.tail to be a single element. 
If that’s not
> possible/feasible, I suggest to issue a warning if a vector of length > 1 
is given in lower.tail. I am aware that the documentation clearly states that 
lower.tail is a single boolean.

aah ok, here you say you know that the current behavior is documented.

> Thank you for your consideration.


As you mention, changing this would be quite a large endeavor.
I had thought about doing that many years ago, not remembering
details, but seeing that in almost all situations you really
only need one of the two tails  (for Gaussian- or t- based confidence
intervals you also only need one, for symmetry reasons).

Allowing the recycling there would make the intermediate C code
(which does the recycling) larger and probably slightly
slower because of conceptually two more for loops which would in
99.9% only have one case ..

I'd have found that ugly to add. ... ...
... but of course, if you can prove that the code bloat would not be large
and not deteriorate speed in a measurable way and if you'd find
someone to produce a comprehensive and tested patch ...

Martin
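The Vectorize() workaround Matthias mentions can also be written out directly — a minimal sketch (hypothetical `plogis2`, handling only scalar location/scale), which recycles the logical at the R level with two calls:

```r
plogis2 <- function(q, location = 0, scale = 1, lower.tail = TRUE) {
  lt <- rep_len(lower.tail, length(q))   # recycle the logical ourselves
  p  <- numeric(length(q))
  p[ lt] <- plogis(q[ lt], location, scale, lower.tail = TRUE)
  p[!lt] <- plogis(q[!lt], location, scale, lower.tail = FALSE)
  p
}
plogis2(c(0.5, 0.5), location = 1, lower.tail = c(TRUE, FALSE))
## [1] 0.3775407 0.6224593   -- one call per response vector
```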


> With best wishes,
> Matthias



> [[alternative HTML version deleted]]

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Martin Maechler
>>>>> Martin Maechler 
>>>>> on Tue, 7 Dec 2021 18:35:00 +0100 writes:

>>>>> Taras Zakharko 
>>>>> on Tue, 7 Dec 2021 12:56:30 +0100 writes:

>> I fully agree! General string interpolation opens a gaping security hole 
and is accompanied by all kinds of problems and decisions. What I envision 
instead is something like this:
>> f”hello {name}” 

>> Which gets parsed by R to this:

>> (STRINTERPSXP (CHARSXP (PROMISE nil)))

>> Basically, a new type of R language construct that still can be 
processed by packages (for customized interpolation like in cli etc.), with a 
default eval which is basically paste0(). The benefit here would be that this 
is eagerly parsed and syntactically checked, and that the promise code could 
carry a srcref. And of course, that you could pass an interpolated string 
expression lazily between frames without losing the environment etc… For more 
advanced applications, a low level string interpolation expression constructor 
could be provided (that could either parse a general string — at the user’s 
risk, or build it directly from expressions). 

>> — Taras

> Well, many months ago, R's  NEWS (for R-devel, then became R 4.0.0)
> contained

> * There is a new syntax for specifying _raw_ character constants
> similar to the one used in C++: r"(...)" with ... any character
> sequence not containing the sequence )".  This makes it easier to
> write strings that contain backslashes or both single and double
> quotes.  For more details see ?Quotes.

> This should be pretty close to what you propose above
> (well, you need to replace your UTF-8 forward double quotes by
> ASCII ones),
> no ?

No it is not; sorry I'm not at full strength..
Martin


>>> On 7 Dec 2021, at 12:06, Simon Urbanek  
wrote:
>>> 
>>> 
>>> 
>>>> On Dec 7, 2021, at 22:09, Taras Zakharko <taras.zakha...@uzh.ch> wrote:
>>>> 
>>>> Great summary, Avi. 
>>>> 
>>>> String concatenation could be trivially added to R, but it probably 
should not be. You will notice that modern languages tend not to use “+” to do 
string concatenation (they either have 
>>>> a custom operator or a special kind of pattern to do it) due to 
practical issues such an approach brings (implicit type casting, lack of 
commutativity, performance etc.). These issues will be felt even more so in R 
with its weak typing, idiosyncratic casting behavior and NAs. 
>>>> 
>>>> As others have pointed out, any kind of behavior one wants from 
string concatenation can be implemented by custom operators as needed. This is 
not something that needs to be in the base R. I would rather like the efforts 
to be directed on improving string formatting (such as glue-style built-in 
string interpolation).
>>>> 
>>> 
>>> This is getting OT, but there is a very good reason why string 
interpolation is not in core R. As I recall it has been considered some time 
ago, but it is very dangerous as it implies evaluation on constants which opens 
a huge security hole and has questionable semantics (where you evaluate etc). 
Hence it's much easier to ban a package than to hack it out of R ;).
>>> 
>>> Cheers,
>>> Simon
>>> 
>>>> — Taras

> []

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] string concatenation operator (revisited)

2021-12-07 Thread Martin Maechler
> Taras Zakharko 
> on Tue, 7 Dec 2021 12:56:30 +0100 writes:

> I fully agree! General string interpolation opens a gaping security hole 
and is accompanied by all kinds of problems and decisions. What I envision 
instead is something like this:
> f”hello {name}” 

> Which gets parsed by R to this:

> (STRINTERPSXP (CHARSXP (PROMISE nil)))

> Basically, a new type of R language construct that still can be processed 
by packages (for customized interpolation like in cli etc.), with a default 
eval which is basically paste0(). The benefit here would be that this is 
eagerly parsed and syntactically checked, and that the promise code could carry 
a srcref. And of course, that you could pass an interpolated string expression 
lazily between frames without losing the environment etc… For more advanced 
applications, a low level string interpolation expression constructor could be 
provided (that could either parse a general string — at the user’s risk, or 
build it directly from expressions). 

> — Taras

Well, many months ago, R's  NEWS (for R-devel, then became R 4.0.0)
contained

* There is a new syntax for specifying _raw_ character constants
  similar to the one used in C++: r"(...)" with ... any character
  sequence not containing the sequence )".  This makes it easier to
  write strings that contain backslashes or both single and double
  quotes.  For more details see ?Quotes.

This should be pretty close to what you propose above
(well, you need to replace your UTF-8 forward double quotes by
ASCII ones),
no ?
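For reference, the raw-string syntax from that NEWS item in action — note that it only changes how a literal is *written*; no substitution takes place:

```r
## same string, two spellings:
identical(r"(C:\Users\me)", "C:\\Users\\me")   # TRUE
## both quote characters without escaping:
cat(r"(she said "hi" and 'bye')", "\n")
## but there is no interpolation -- {name} stays literal:
r"(hello {name})"                              # "hello {name}"
```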

>> On 7 Dec 2021, at 12:06, Simon Urbanek  
wrote:
>> 
>> 
>> 
>>> On Dec 7, 2021, at 22:09, Taras Zakharko <taras.zakha...@uzh.ch> wrote:
>>> 
>>> Great summary, Avi. 
>>> 
>>> String concatenation could be trivially added to R, but it probably 
should not be. You will notice that modern languages tend not to use “+” to do 
string concatenation (they either have 
>>> a custom operator or a special kind of pattern to do it) due to 
practical issues such an approach brings (implicit type casting, lack of 
commutativity, performance etc.). These issues will be felt even more so in R 
with its weak typing, idiosyncratic casting behavior and NAs. 
>>> 
>>> As others have pointed out, any kind of behavior one wants from string 
concatenation can be implemented by custom operators as needed. This is not 
something that needs to be in the base R. I would rather like the efforts to be 
directed on improving string formatting (such as glue-style built-in string 
interpolation).
>>> 
>> 
>> This is getting OT, but there is a very good reason why string 
interpolation is not in core R. As I recall it has been considered some time 
ago, but it is very dangerous as it implies evaluation on constants which opens 
a huge security hole and has questionable semantics (where you evaluate etc). 
Hence it's much easier to ban a package than to hack it out of R ;).
>> 
>> Cheers,
>> Simon
>> 
>>> — Taras

 []



Re: [Rd] R-devel (r81196) hanging at dchisq(large) (PR#13309)

2021-11-24 Thread Martin Maechler
>>>>> Avraham Adler  on Thu, 18 Nov 2021 02:18:54 + writes:

> Hello.  I have isolated the issue: it is the
> fused-multiply-add instruction set (FMA on Intel
> processors). Running -march=skylake -mno-fma not only does
> not hang, but passes make check-all (using R's native
> BLAS).  My intuition remains that something in the new
> more precise ebd0 code used in dpois_raw—called by dgamma,
> called by dchsq, called by dnchisq—is hanging when the
> assembler uses FMA. Unfortunately, I have come across
> other cases online where the extra precision and the
> different assembler code of FMA vs. non-FMA has caused
> bugs, such as [1]. Page 5 of this paper by Dr. William
> Kahan sheds some light on why this may be happening [2]
> (PDF).

> Martin & Morten, having written (PR#15628 [3]) and/or
> implemented the ebd0 code that is now being used, can
> either of you think of any reason why it would hang if
> compiled using FMA? 

I vaguely remember I had a version of ebd0(), either Morten
Welinder's original, or a slight modification of it that needed some
mending, because in some border case, there was an out of
array-boundary indexing... but that's just a vague recollection.

I had investigated  ebd0()'s behavior quite a bit, also notably
the version -- together with a pure R code version --
in my CRAN package DPQ, yesterday updated to version 0.5-0 on CRAN
{written in Summer, but published to CRAN only yesterday}
where I have  dpois_raw() optionally using several experimental versions of
bd0(), and both 'pure R' and a C version of ebd0(),
as DPQ::ebd0() and DPQ::ebd0C()
each with an option  'verbose' which shows you which branches are chosen
for the given arguments.

So, if you install this version (0.5-0 or newer) from the development
sources, using the *same* FMA configuration,
I hope you should see the same "hanging" but would be able to see some
more.. ?

Can you install it from R-forge

install.packages("DPQ", type = "source",
                 repos = "http://R-Forge.R-project.org")

and then experiment?
I'd be grateful  {and we maybe can move "off - mailing list"}

Thank you in advance,
Martin

Martin Maechler
ETH Zurich  and  R Core team


> Again, I'm not a professional, but
> line 325 of the ebd0 function in bd0.c [4] has "ADD1(-x *
> log1pmx ((M * fg - x) / x))" which looks like a
> Multiply-Add to me, at least in the inner parenthesis. Is
> there anything that can be, or should be, done with the
> code to prevent the hang, or must we forbid the use of FMA
> instructions (and I guess FMA4 on AMD processors) when
> compiling R?

> Also, What happens in the case where M/x neither over- nor
> under-flowed, M_LN2 * ((double) -e) <= 1. + DBL_MAX / x,
> fg != 1, and after 4 loops of lines 329 & 330, *yh is
> still finite? How does ebd0 exit in that case? There is no
> "return" after line 331. Am I missing something? Could
> that be related to this issue?

> As an aside, could ebd0 be enhanced by using FMA
> instructions on processors which support them?

> Thank you very much,

> Avi

> [1]
> 
https://flameeyes.blog/2014/10/27/the-subtlety-of-modern-cpus-or-the-search-for-the-phantom-bug/
> [2]
> https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
> [3] https://bugs.r-project.org/show_bug.cgi?id=15628 [4]
> https://github.com/wch/r-source/blob/trunk/src/nmath/bd0.c

> On Wed, Nov 17, 2021 at 3:55 PM Avraham Adler
>  wrote:
>> 
>> Hello, Martin et. al.
>> 
>> I apologize for top posting, but I believe I have tracked
>> down the difference why last time my build worked and now
>> it hangs on `dchisq(c(Inf, 1e80, 1e50, 1e40), df=10,
>> ncp=1)`. and it's NOT the BLAS. I built against both 3.15
>> AND R's vanilla and it hung both times. The issue was
>> passing "march=skylake". I own an i7-8700K which gcc
>> considers a skylake. When I pass mtune=skylake, it does
>> not hang and the make check-devel after the build
>> completes.
>> 
>> Below is a list of the different flags passed when using
>> mtune vs.  march. It stands to reason that at least one
>> of them contributed to the hanging issue which Martin
>> fixed in
>> https://bugs.r-project.org/show_bug.cgi?id=13309. While I
>> recognize the obvious ones, I'm not an expert and do not
>> understand which if any may be the culprit. For
>> reference, most of these flags are described here:
>> 
https://gcc.gnu.org/onlinedocs/g

Re: [Rd] Subsetting "dspMatrix" without coercion to "matrix"

2021-11-21 Thread Martin Maechler
> Mikael Jagan 
> on Wed, 17 Nov 2021 17:01:00 -0500 writes:

>> This seems entirely avoidable, given that there is a relatively simple 
>> formula for converting 2-ary indices [i,j] of S to 1-ary indices k of 
>> S[lower.tri(S, TRUE)]:
>> 
>> k <- i + round(0.5 * (2L * n - j) * (j - 1L)) # for i >= j

> I ought to be slightly more precise here: _coercion_ is avoidable, 
> because we can always map [i,j] to [k], but memory limits are not. 
> Certainly S@x[k] cannot be arbitrarily long...

> At the very least, it would be convenient if the subset were performed 
> efficiently whenever dimensions would be dropped anyway:

> * S[i, ] and S[, j] where i and j are vectors indexing exactly zero or 
> one rows/columns
> * S[i] where i is a matrix of the form cbind(i, j)
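The [i,j] -> k formula quoted earlier in the thread can be checked directly against R's lower.tri() ordering — a small sketch:

```r
## Verify the packed-storage index mapping for a symmetric n x n matrix
n <- 5
S <- matrix(0, n, n); S[lower.tri(S, diag = TRUE)] <- seq_len(n * (n + 1) / 2)
S <- S + t(S) - diag(diag(S))               # symmetric test matrix
xpacked <- S[lower.tri(S, diag = TRUE)]     # column-major packed storage
k <- function(i, j, n) i + round(0.5 * (2L * n - j) * (j - 1L))  # for i >= j
stopifnot(xpacked[k(4, 2, n)] == S[4, 2],
          xpacked[k(3, 3, n)] == S[3, 3],
          xpacked[k(5, 5, n)] == S[5, 5])
```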


I agree that this could be improved in the Matrix package;
One reason this never happened is probably that we (the Matrix
package authors) never had a relevant use case for speeding
these up.

Would you be interested in collaboration to improve the Matrix
package to achieve this?

Best regards,
Martin

> This would support, e.g., a memory-efficient 'apply' analogue without 
> any need for MARGIN...

> applySymmetric <- function(X, FUN, ..., simplify = TRUE, check = TRUE) {
>   if (check && !isSymmetric(X)) {
> stop("'X' is not a symmetric matrix.")
>   }
>   ## preprocessing, e.g.: n <- nrow(X)
>   ans <- vector("list", n)
>   for (i in seq_len(n)) {
> ans[[i]] <- forceAndCall(1L, FUN, X[i, ], ...)
>   }
>   ## postprocessing
>   ans
> }

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] substitute

2021-11-18 Thread Martin Maechler
> Duncan Murdoch 
> on Mon, 15 Nov 2021 13:06:23 -0500 writes:

> I'd recommend responding now with a pointer to that bug
> report: whoever at CRAN is dealing with your package
> doesn't necessarily know about the bug report.  You might
> or might not need to make a change in the end, but if you
> do, it could be hard to meet the two week deadline.

> Duncan Murdoch

With thanks to Duncan and Adrian:

Just in case Adrian hasn't been following R's bugzilla PR#18232,
i.e.  https://bugs.r-project.org/show_bug.cgi?id=18232

There have been extra patches to fix more cases of deparsing
while being more back compatible than what's been in R-devel for
a couple of days.

Notably the changes do revert to previous behavior for the
example you give;  and indeed QCA  passes its own checks again,
after applying the patches.

The changes are under review currently, but the plan is to
commit the changes within a few days.
(read on)

> On 15/11/2021 12:58 p.m., Adrian Dușa wrote:
>> Thank you, I was given a deadline of two weeks to
>> respond, hopefully this will be settled by then.  Best
>> wishes, Adrian
>> 
>> On Mon, 15 Nov 2021 at 19:28, Duncan Murdoch
>> > > wrote:
>> 
>> This looks as though it is related to the recent patch in
>> 
>> https://bugs.r-project.org/show_bug.cgi?id=18232
>> 
>> 
>> I think you should probably wait until that settles down
>> before worrying about it.
>> 
>> Duncan Murdoch
>> 
>> On 15/11/2021 12:18 p.m., Adrian Dușa wrote: > Dear R
>> wizards,
>> >
>> > I have recently been informed about some build errors
>> of my package QCA, > which I was able to trace down to
>> the base function substitute(), with the > following
>> replication example:

foo <- function(x) return(substitute(x))

## In the stable R version 4.0.5, I get the expected result:

foo(A + ~B + C~D)
## A + ~B + C ~ D

BTW: no need for foo()  {and even less for a return(.) in a 1-liner !}

Be assured that we agree that

quote(A + ~B + C~D)

should not "gain" any parentheses, indeed, and what you've been
seeing can well be considered an intermediate step in iterations
to get to improved deparsing in subtle situations.


Thank you for the report, and best regards,
Martin



Re: [Rd] R-patched tarball at https://stat.ethz.ch/R/daily/ outdated

2021-11-18 Thread Martin Maechler
>>>>> Gábor Csárdi 
>>>>> on Wed, 17 Nov 2021 16:11:43 +0100 writes:

> Hi all,

> AFAICT https://stat.ethz.ch/R/daily/R-patched.tar.gz is
> still R 4.0.5 patched.

> Probably needs a branch bump. FYI, Gabor

Yes; this has been an oversight by me (back in March).
It's amazing nobody has seen the issue till now.
I've fixed it a bit more than 3 hours ago.

Thank you, Gábor !

Martin

--
Martin Maechler, ETH Zurich and R Core team



Re: [Rd] R-devel (r81196) hanging at dchisq(large) (PR#13309)

2021-11-16 Thread Martin Maechler
> Avraham Adler 
> on Tue, 16 Nov 2021 02:35:56 + writes:

> I am building r-devel on Windows 10 64bit using Jeroen's mingw system,
> and I am finding that my make check-devel hangs on the above issue.
> Everything is vanila except that I am using OpenBLAS 0.3.18. I have
> been using OpenBLAS for over a decade and have not had this issue
> before. Is there anything I can do to dig deeper into this issue from
> my end? Could there be anything that changed in R-devel that may have
> triggered this? The bugzilla report doesn't have any code attached to
> it.

> Thank you,
> Avi

Hmm.. it would have been nice to tell a bit more, instead of having all
your readers search links, etc.

In the bugzilla bug report PR#13309
https://bugs.r-project.org/show_bug.cgi?id=13309 , the example was

 dchisq(x=Inf, df=10, ncp=1)

I had fixed the bug 13 years ago, in svn rev 47005
with regression test in /tests/d-p-q-r-tests.R :


## Non-central Chi^2 density for large x
stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))
## did hang in 2.8.0 and earlier (PR#13309).


and you are seeing your version of R hanging at exactly this
location?


I'd bet quite a bit that the underlying code in these
non-central chi square computations *never* calls BLAS and hence
I cannot imagine how openBLAS could play a role.

However, there must be something peculiar in your compiler setup,
compilation options, 
as of course the above regression test has been run 100s of
1000s of times also under Windows in the last 13 years ..

Last but not least (but really only vaguely related):
   There is still a FIXME in the source code (not about
hanging, but rather about losing some accuracy in border cases),
see e.g. https://svn.r-project.org/R/trunk/src/nmath/dnchisq.c
and for that reason I had written an R version of that C code
even back in 2008 which I've made available in  CRAN package
DPQ  a few years ago (together with many other D/P/Q
distribution computations/approximations).
 -> https://cran.r-project.org/package=DPQ

Best,
Martin


> sessionInfo:
> R Under development (unstable) (2021-11-15 r81196)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 10 x64 (build 19043)

> Matrix products: default

> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252

> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base

> loaded via a namespace (and not attached):
> [1] compiler_4.2.0 tools_4.2.0

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] match.arg With S4 Methods and Missing Inputs

2021-11-08 Thread Martin Maechler
>>>>> Georgi Boshnakov 
>>>>> on Mon, 8 Nov 2021 09:46:00 + writes:

> You need to define the generic with a default value for
> this parameter. Methods can then have a different default
> value for it. 
> I remember reading this in S4's documentation but don't remember where.

> Georgi Boshnakov

interesting ... and would make quite some sense.

Can others confirm / disprove ?

Even as co-author of the "using S4 almost everywhere" package 'Matrix'
I wouldn't have known this.

If this is seen to be true (I don't have time for checking just now),
I think it's something we really *should* add to one or more of
the related help pages.

Martin Maechler


> 

> Sent: Monday, November 8, 2021 5:37:18 AM
> To: Dario Strbenac 
> Cc: r-package-devel@r-project.org 
> Subject: Re: [R-pkg-devel] match.arg With S4 Methods and Missing Inputs

> From the line `function(A, B) standardGeneric("SetOfParams")`, A and B will
> always have default values of R_MissingArg.
> Providing default values within the methods does nothing since A and B have
> already been initialized before arriving at the method.
> You could do something like:


> if (missing(A))
> A <- ...
> if (missing(B))
> B <- ...


> within each method, and that would emulate having default values for A and
> B.
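Georgi's suggestion above (put the default on the *generic* itself) can be sketched on a smaller, self-contained example. The generic name `pickA` and its default vector are mine, purely for illustration:

```r
library(methods)

## The default lives on the generic, so a missing argument still has a
## value to fall back on when the method's match.arg() forces it.
setGeneric("pickA",
           function(A = c("L", "M", "N")) standardGeneric("pickA"))

setMethod("pickA", "ANY",
          function(A = c("L", "M", "N")) match.arg(A))

pickA()     # first permitted value, "L"
pickA("M")  # "M"
```

The key difference from the `SetOfParams` generic quoted below is that `function(A, B) standardGeneric(...)` carries no defaults, so the promise for a missing `A` has nothing to evaluate to.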

> On Mon, Nov 8, 2021 at 12:00 AM Dario Strbenac  wrote:

>> Good day,
>> 
>> How can a parameter take a default value from a vector of permitted ones,
>> if it is missing?
>> 
>> setClassUnion("characterOrMissing", c("character", "missing"))
>> setClassUnion("integerOrMissing", c("integer", "missing"))
>> setClass("SetOfParams", representation(A = "characterOrMissing", B =
>> "integer"))
>> setGeneric("SetOfParams", function(A, B) standardGeneric("SetOfParams"))
>> 
>> setMethod("SetOfParams", c("missing", "missing"), function() # Empty constructor
>> {
>> new("SetOfParams", A = "M", B = 100L)
>> })
>> 
>> setMethod("SetOfParams", c("characterOrMissing", "integerOrMissing"),
>> function(A = c("L", "M", "N"), B = 100L)
>> {
>> A <- match.arg(A)
>> new("SetOfParams", A = A, B = B)
>> })
>> 
>> SetOfParams(B = 500L)
>> Error in match.arg(A) : argument "A" is missing, with no default.
>> 
>> How can I avoid the error about A having no default? I thought I specified
>> it so that it does have one, which match.arg would set for me if the user
>> did not specify one.
>> 
>> --
>> Dario Strbenac
>> University of Sydney
>> Camperdown NSW 2050
>> Australia
>> __
>> R-package-devel@r-project.org mailing list

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] gettext(msgid, domain="R") doesn't work for some 'msgid':s

2021-11-06 Thread Martin Maechler
>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>> on Sat, 6 Nov 2021 08:07:58 + (UTC) writes:

> This issue has come up before:
> https://stat.ethz.ch/pipermail/r-help/2013-February/346721.html
> ("gettext wierdness") and
> https://stat.ethz.ch/pipermail/r-devel/2007-December/047893.html
> ("gettext() and messages in 'pkg' domain").
> Using 'ngettext' is a workaround, as in
> https://rdrr.io/cran/svMisc/src/R/svMisc-internal.R .

Thank you for the pointers!

> It is documented: "For 'gettext', leading and trailing whitespace is
> ignored when looking for the translation."

Indeed; and it *is* a feature  but really only valuable when the
msgid's (the original message strings) do *not* contain such
whitespace.
And, in fact, when xgettext() or xgettext2pot() from pkg 'tools'
are used to create the original *.pot files, they *also* trim
leading and trailing \n, \t and spaces.

So ideally there should not be any end- (or beginning-)of-line
"\n" in the R-base.pot (and hence the corresponding  -base.po )
and, as I mentioned, there *are* only a few, and
we could (should?) consider removing them from there.

A "problem" is still in the many C-code msgid's  where
end-of-line-"\n" are common.

Yes, indeed, one can use the workaround Suharto mentions,
ngettext()  even though users will typically only look at
ngettext() if they want / need to learn about plural/singular
messages ...

I.e. in our case, this works, and Henrik could get what he wants

> Sys.setenv(LANGUAGE = "de")
> ngettext(1,"Execution halted\n", "", domain="R")
[1] "Ausführung angehalten\n"

but it's still not so satisfactory, that you cannot use
gettext() itself to look at a considerable proportion of the
C/C++/.. level error messages just because they end with "\n".

One possibility would be to introduce an optional
`trim = TRUE` argument, so the above could be achieved (more
efficiently and naturally) by

   gettext("Execution halted\n", domain="R", trim=FALSE)

but in any case, to *not* do the trimming anymore in general,
as I proposed yesterday (see below) is not a good idea.
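Until something like that exists (the `trim` argument above is only a proposal, not an R feature), the ngettext() workaround mentioned earlier can be packaged up. The helper name `gettext_untrimmed` is mine, not part of R:

```r
## Look up a message *without* R's whitespace trimming: ngettext() with
## n = 1 returns the (possibly translated) singular form untrimmed.
gettext_untrimmed <- function(msg, domain = "R")
  ngettext(1, msg, "", domain = domain)

## With German catalogs installed and LANGUAGE=de this would return
## "Ausführung angehalten\n"; without a translation, the msgid comes
## back unchanged:
gettext_untrimmed("Execution halted\n")
```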

>>>>> Martin Maechler
>>>>> on Fri, 5 Nov 2021 17:55:24 +0100 writes:

>>>>> Tomas Kalibera
>>>>> on Fri, 5 Nov 2021 16:15:19 +0100 writes:

>>> On 11/5/21 4:12 PM, Duncan Murdoch wrote:
>>>> On 05/11/2021 10:51 a.m., Henrik Bengtsson wrote:
>>>>> I'm trying to reuse some of the translations available in base R by
>>>>> using:
>>>>> 
>>>>>    gettext(msgid, domain="R")
>>>>> 
>>>>> This works great for most 'msgid's, e.g.
>>>>> 
>>>>> $ LANGUAGE=de Rscript -e 'gettext("cannot get working directory",
>>>>> domain="R")'
>>>>> [1] "kann das Arbeitsverzeichnis nicht ermitteln"
>>>>> 
>>>>> However, it does not work for all.  For instance,
>>>>> 
>>>>> $ LANGUAGE=de Rscript -e 'gettext("Execution halted\n", domain="R")'
>>>>> [1] "Execution halted\n"
>>>>> 
>>>>> This despite that 'msgid' existing in:
>>>>> 
>>>>> $ grep -C 2 -F 'Execution halted\n' src/library/base/po/de.po
>>>>> 
>>>>> #: src/main/main.c:342
>>>>> msgid "Execution halted\n"
>>>>> msgstr "Ausführung angehalten\n"
>>>>> 
>>>>> It could be that the trailing newline causes problems, because the
>>>>> same happens also for:
>>>>> 
>>>>> $ LANGUAGE=de Rscript --vanilla -e 'gettext("error during cleanup\n",
>>>>> domain="R")'
>>>>> [1] "error during cleanup\n"
>>>>> 
>>>>> Is this meant to work, and if so, how do I get it to work, or is it a
>>>>> bug?
>>>> 
>>>> I don't know the solution, but I think the cause is different than you
>>>> think, because I also have the problem with other strings not
>>>> including "\n":
>>>> 
>>>> $ LANGUAGE=de Rscript -e 'gettext("malformed version string",
>>>> domain="R")'
>>>> [1] "malformed version string"

>> You need domain="R-base" for the "malformed version string"

Re: [Rd] gettext(msgid, domain="R") doesn't work for some 'msgid':s

2021-11-05 Thread Martin Maechler
>>>>> Martin Maechler 
>>>>> on Fri, 5 Nov 2021 17:55:24 +0100 writes:

>>>>> Tomas Kalibera 
>>>>> on Fri, 5 Nov 2021 16:15:19 +0100 writes:

>> On 11/5/21 4:12 PM, Duncan Murdoch wrote:
>>> On 05/11/2021 10:51 a.m., Henrik Bengtsson wrote:
>>>> I'm trying to reuse some of the translations available in base R by 
>>>> using:
>>>> 
>>>>    gettext(msgid, domain="R")
>>>> 
>>>> This works great for most 'msgid's, e.g.
>>>> 
>>>> $ LANGUAGE=de Rscript -e 'gettext("cannot get working directory", 
>>>> domain="R")'
>>>> [1] "kann das Arbeitsverzeichnis nicht ermitteln"
>>>> 
>>>> However, it does not work for all.  For instance,
>>>> 
>>>> $ LANGUAGE=de Rscript -e 'gettext("Execution halted\n", domain="R")'
>>>> [1] "Execution halted\n"
>>>> 
>>>> This despite that 'msgid' existing in:
>>>> 
>>>> $ grep -C 2 -F 'Execution halted\n' src/library/base/po/de.po
>>>> 
>>>> #: src/main/main.c:342
>>>> msgid "Execution halted\n"
>>>> msgstr "Ausführung angehalten\n"
>>>> 
>>>> It could be that the trailing newline causes problems, because the
>>>> same happens also for:
>>>> 
>>>> $ LANGUAGE=de Rscript --vanilla -e 'gettext("error during cleanup\n",
>>>> domain="R")'
>>>> [1] "error during cleanup\n"
>>>> 
>>>> Is this meant to work, and if so, how do I get it to work, or is it a 
>>>> bug?
>>> 
>>> I don't know the solution, but I think the cause is different than you 
>>> think, because I also have the problem with other strings not 
>>> including "\n":
>>> 
>>> $ LANGUAGE=de Rscript -e 'gettext("malformed version string", 
>>> domain="R")'
>>> [1] "malformed version string"

> You need domain="R-base" for the "malformed version string"


>> I can reproduce Henrik's report and the problem there is that the 
>> trailing \n is stripped by R before doing the lookup, in do_gettext


>>     /* strip leading and trailing white spaces and
>>    add back after translation */
>>     for(p = tmp;
>>     *p && (*p == ' ' || *p == '\t' || *p == '\n');
>>     p++, ihead++) ;

>> But, calling dgettext with the trailing \n does translate correctly for me.

>> I'd leave to translation experts how this should work (e.g. whether the 
>> .po files should have trailing newlines).

> Thanks a lot, Tomas.
> This is "interesting" .. and I think an R bug  one way or the
> other (and I also note that Henrik's guess was also right on !).

> We have the following:

> - New translation *.po source files are to be made from the original *.pot files.

> In our case it's our code that produces  R.pot and R-base.pot  
> (and more for the non-base packages, and more e.g. for
> Recommended packages 'Matrix' and 'cluster' I maintain).

> And notably the R.pot (from all the "base" C error/warn/.. messages)
> contains tons of msgid strings of the form  "...\n"
> i.e., ending in \n.
> From that, automatically, the translators'  *.po files should also
> end in \n.

> Additionally, the GNU gettext FAQ has
> (here :   https://www.gnu.org/software/gettext/FAQ.html#newline )

> 
> Q: What does this mean: “'msgid' and 'msgstr' entries do not both end with '\n'”

> A: It means that when the original string ends in a newline, your
> translation must also end in a newline. And if the original string does not
> end in a newline, then your translation should likewise not have a newline
> at the end.
> 
 
> From all that I'd conclude that we (R base code) are the source
> of the problem.
> Given the above FAQ, it seems common in other projects also to
> have such trailing \n  and so we should really change the C code
> you cite above.

> On the other hand, this is from almost the very beginning of
   

Re: [Rd] gettext(msgid, domain="R") doesn't work for some 'msgid':s

2021-11-05 Thread Martin Maechler
> Tomas Kalibera 
> on Fri, 5 Nov 2021 16:15:19 +0100 writes:

> On 11/5/21 4:12 PM, Duncan Murdoch wrote:
>> On 05/11/2021 10:51 a.m., Henrik Bengtsson wrote:
>>> I'm trying to reuse some of the translations available in base R by 
>>> using:
>>> 
>>>    gettext(msgid, domain="R")
>>> 
>>> This works great for most 'msgid's, e.g.
>>> 
>>> $ LANGUAGE=de Rscript -e 'gettext("cannot get working directory", 
>>> domain="R")'
>>> [1] "kann das Arbeitsverzeichnis nicht ermitteln"
>>> 
>>> However, it does not work for all.  For instance,
>>> 
>>> $ LANGUAGE=de Rscript -e 'gettext("Execution halted\n", domain="R")'
>>> [1] "Execution halted\n"
>>> 
>>> This despite that 'msgid' existing in:
>>> 
>>> $ grep -C 2 -F 'Execution halted\n' src/library/base/po/de.po
>>> 
>>> #: src/main/main.c:342
>>> msgid "Execution halted\n"
>>> msgstr "Ausführung angehalten\n"
>>> 
>>> It could be that the trailing newline causes problems, because the
>>> same happens also for:
>>> 
>>> $ LANGUAGE=de Rscript --vanilla -e 'gettext("error during cleanup\n",
>>> domain="R")'
>>> [1] "error during cleanup\n"
>>> 
>>> Is this meant to work, and if so, how do I get it to work, or is it a 
>>> bug?
>> 
>> I don't know the solution, but I think the cause is different than you 
>> think, because I also have the problem with other strings not 
>> including "\n":
>> 
>> $ LANGUAGE=de Rscript -e 'gettext("malformed version string", 
>> domain="R")'
>> [1] "malformed version string"

You need domain="R-base" for the "malformed version string"


> I can reproduce Henrik's report and the problem there is that the 
> trailing \n is stripped by R before doing the lookup, in do_gettext


>     /* strip leading and trailing white spaces and
>    add back after translation */
>     for(p = tmp;
>     *p && (*p == ' ' || *p == '\t' || *p == '\n');
>     p++, ihead++) ;

> But, calling dgettext with the trailing \n does translate correctly for me.

> I'd leave to translation experts how this should work (e.g. whether the 
> .po files should have trailing newlines).

Thanks a lot, Tomas.
This is "interesting" .. and I think an R bug  one way or the
other (and I also note that Henrik's guess was also right on !).

We have the following:

- New translation *.po source files are to be made from the original *.pot files.

  In our case it's our code that produces  R.pot and R-base.pot  
  (and more for the non-base packages, and more e.g. for
   Recommended packages 'Matrix' and 'cluster' I maintain).

And notably the R.pot (from all the "base" C error/warn/.. messages)
contains tons of msgid strings of the form  "...\n"
i.e., ending in \n.
From that, automatically, the translators'  *.po files should also
end in \n.

Additionally, the GNU gettext FAQ has
 (here :   https://www.gnu.org/software/gettext/FAQ.html#newline )


Q: What does this mean: “'msgid' and 'msgstr' entries do not both end with '\n'”

A: It means that when the original string ends in a newline, your translation 
must also end in a newline. And if the original string does not end in a 
newline, then your translation should likewise not have a newline at the end.

 
From all that I'd conclude that we (R base code) are the source
of the problem.
Given the above FAQ, it seems common in other projects also to
have such trailing \n  and so we should really change the C code
you cite above.

On the other hand, this is from almost the very beginning of
when Brian added translation to R,

r32938 | ripley | 2005-01-30 20:24:04 +0100 (Sun, 30 Jan 2005) | 2 lines

include \n in whitespace ignored for R-level gettext


I think this has been because, simultaneously, we had started to
emphasize to useRs that they should *not* end message/format strings
in stop() / warning() with a newline, but rather that stop() and
warning() would *add* the newline(s) themselves.

Still, currently we have a few such cases in  R-base.pot,
but just these few, and maybe they really are "in error", in the
sense that we could drop the ending '\n' (and do the same in all the *.po files!),
and newlines would be appended later {{not just by RStudio, which
   graciously adds final newlines in its R console, even for, say,
   cat("abc") }}

However, this is quite different for all the message strings from C, as
used there in  error() or warning() e.g., and so in   R.pot
we see many many msg strings ending in "\n" (which must then
also be in the *.po files).

My current conclusion is we should try simplifying the
do_gettext() code and 

Re: [Rd] Wrong number of names?

2021-11-01 Thread Martin Maechler
> Duncan Murdoch 
> on Mon, 1 Nov 2021 06:36:17 -0400 writes:

> The StackOverflow post
> https://stackoverflow.com/a/69767361/2554330 discusses a
> dataframe which has a named numeric column of length 1488
> that has 744 names. I don't think this is ever legal, but
> am I wrong about that?

> The `dat.rds` file mentioned in the post is temporarily
> available online in case anyone else wants to examine it.

> Assuming that the file contains a badly formed object, I
> wonder if readRDS() should do some sanity checks as it
> reads.

> Duncan Murdoch

Good question.

In the meantime, I've also added a bit on the SO page
above, e.g.

---

d <- readRDS("<.>dat.rds")
str(d)
## 'data.frame':1488 obs. of  4 variables:
##  $ facet_var: chr  "AUT" "AUT" "AUT" "AUT" ...
##  $ date : Date, format: "2020-04-26" "2020-04-27" ...
##  $ variable : Factor w/ 2 levels "arima","prophet": 1 1 1 1 1 1 1 1 1 1 ...
##  $ score: Named num  2.74e-06 2.41e-06 2.48e-06 2.39e-06 2.79e-06 ...
##   ..- attr(*, "names")= chr [1:744] "new_confirmed10" "new_confirmed10" 
"new_confirmed10" "new_confirmed10" ...

ds <- d$score
c(length(ds), length(names(ds)))
## 1488   744

dput(ds) # -> 

##  *** caught segfault ***
## address (nil), cause 'memory not mapped'

---

Hence "proving" that the dat.rds file really contains an invalid object,
when a simple  dput(.) directly gives a segmentation fault.

I think we are aware that using C code and say .Call(..)  one
can create all kinds of invalid objects "easily", and I think
it's clear that it's not feasible to check for the validity of such
objects "everywhere".

Your proposal to have at least our deserialization code used in
readRDS() do (at least *some*) validity checks seems good, but
maybe we should think of more cases, and / or  do such validity
checks already during serialization { <-> saveRDS() here } ?
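As a sketch of what such an (optional) check could look like at the R level -- this is *not* existing R functionality, just an illustration of the names/length invariant that the corrupt object violated:

```r
## Check the names/length invariant that the corrupt 'dat.rds' violated,
## descending into lists (and hence data frames).
checkNames <- function(x) {
  nm <- attr(x, "names")
  if (!is.null(nm) && length(nm) != length(x))
    stop(sprintf("invalid object: length %d, but %d names",
                 length(x), length(nm)))
  if (is.list(x)) lapply(unclass(x), checkNames)
  invisible(x)
}

## a hypothetical wrapper (name is mine):
## readRDS_checked <- function(file) checkNames(readRDS(file))

checkNames(c(a = 1, b = 2))  # a valid object passes silently
```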

.. Such questions then really are for those who understand more than
me about (de)serialization in R, its performance bottlenecks etc.
Given the speed impact we should probably have such checks *optional*
but have them *on* by default e.g., at least for saveRDS() ?

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug (?) in vignette handling

2021-10-29 Thread Martin Maechler
> Duncan Murdoch 
> on Thu, 28 Oct 2021 13:18:54 -0400 writes:

> This StackOverflow post:  https://stackoverflow.com/q/69756236/2554330 
> points out that objects created in one vignette are available in a later 
> vignette.  I don't think this should be happening:  vignettes should be 
> self-contained.

I strongly agree.

> The current answer there, https://stackoverflow.com/a/69758025/2554330, 
> suggests that "R CMD check" will detect this.  However, sometimes one 
> vignette can replace a standard function with a custom version, and then 
> both will work without generating an error, but the second vignette 
> won't do the same thing if run independently.

> For example, try these pure Sweave vignettes:

> -
> aaa3.Rnw:
> -
> \documentclass{article}
> %\VignetteIndexEntry{Sweave aaa3}
> \begin{document}

> <<>>=
> mean <- function(x) "I am the Sweave mean"
> @

> \end{document}

> 
> aaa4.Rnw:
> 

> \documentclass{article}
> %\VignetteIndexEntry{Sweave aaa4}
> \begin{document}

> <<>>=
> mean(1:5)
> @

> \end{document}

> Put these in a package, build and install the package, and you'll see 
> that the mean() function in aaa4.Rnw prints the result from the 
> redefined mean in aaa3.Rnw.

Is it because R is *not* run with  --no-save --no-restore
accidentally?
Without looking, I would not expect that the vignettes are run
inside the same running R (even though that may speed things up).

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Fwd: Using existing envars in Renviron on friendly Windows

2021-10-21 Thread Martin Maechler
> Michał Bojanowski 
> on Wed, 20 Oct 2021 16:31:08 +0200 writes:

> Hello Tomas,
> Yes, that's accurate although rather terse, which is perhaps the
> reason why I did not realize it applies to my case.

> How about adding something in the direction of:

> 1. Continuing the cited paragraph with:
> In particular, on Windows it may be necessary to quote references to
> existing environment variables, especially those containing file paths
> (which include backslashes). For example: `"${WINVAR}"`.

> 2. Add an example (not run):

> # On Windows do quote references to variables containing paths, e.g.:
> # If APPDATA=C:\Users\foobar\AppData\Roaming
> # to point to a library tree inside APPDATA in .Renviron use
> R_LIBS_USER="${APPDATA}"/R-library

> Incidentally the last example is on backslashes too.


> What do you think?

I agree that adding an example really helps a lot in such cases,
in my experience, notably if it's precise enough to be used +/- directly.



> On Mon, Oct 18, 2021 at 5:02 PM Tomas Kalibera  wrote:
>> 
>> 
>> On 10/15/21 6:44 PM, Michał Bojanowski wrote:
>> > Perhaps a small update to ?.Renviron would be in order to mention that...
>> 
>> Would you have a more specific suggestion how to update the
>> documentation? Please note that it already says
>> 
>> "‘value’ is then processed in a similar way to a Unix shell: in
>> particular the outermost level of (single or double) quotes is stripped,
>> and backslashes are removed except inside quotes."
>> 
>> Thanks,
>> Tomas
>> 
>> > On Fri, Oct 15, 2021 at 6:43 PM Michał Bojanowski  wrote:
>> >> Indeed quoting works! Kevin suggested the same, but he didn't reply to
>> >> the list.
>> >> Thank you all!
>> >> Michal
>> >>
>> >>> On Fri, Oct 15, 2021 at 6:40 PM Ivan Krylov  wrote:
>> >>> Sorry for the noise! I wasn't supposed to send my previous message.
>> >>>
>> >>> On Fri, 15 Oct 2021 16:44:28 +0200
>> >>> Michał Bojanowski  wrote:
>> >>>
>>  AVAR=${APPDATA}/foo/bar
>> 
>>  Which is a documented way of referring to existing environment
>>  variables. Now, with that in R I'm getting:
>> 
>>  Sys.getenv("APPDATA")# That works OK
>>  [1] "C:\\Users\\mbojanowski\\AppData\\Roaming"
>> 
>>  so OK, but:
>> 
>>  Sys.getenv("AVAR")
>>  [1] "C:UsersmbojanowskiAppDataRoaming/foo/bar"
>> >>> Hmm, a function called by readRenviron does seem to remove backslashes,
>> >>> but not if they are encountered inside quotes:
>> >>>
>> >>> https://github.com/r-devel/r-svn/blob/3f8b75857fb1397f9f3ceab6c75554e1a5386adc/src/main/Renviron.c#L149
>> >>>
>> >>> Would AVAR="${APPDATA}"/foo/bar work?
>> >>>
>> >>> --
>> >>> Best regards,
>> >>> Ivan
>> > __
>> > R-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Is there a better way ...?

2021-10-21 Thread Martin Maechler
>>>>> Duncan Murdoch 
>>>>> on Thu, 21 Oct 2021 08:09:02 -0400 writes:

> I agree with almost everything Deepayan said, but would add one thing:
> On 21/10/2021 3:41 a.m., Deepayan Sarkar wrote:
> ...

>> My suggestion is having a package-specific environment, and Duncan's
>> is to have a function-specific environment. If you only need this for
>> this one function, then that should be good enough. If you eventually
>> want to access the persistent information from multiple functions,
>> having a package-specific environment would be more useful.

> I agree with that statement, but those aren't the only two choices. 
> Your local() call can create several functions and return them in a 
> list; then just those functions have access to the local variables.  For 
> example,

> createFns <- local({

> .fooInfo <- NULL

> fn1 <- function (...) { ... }
> fn2 <- function (...) { ... }

> list(fn1 = fn1, fn2 = fn2)
> })

> fns <- createFns()
> fn1 <- fns$fn1
> fn2 <- fns$fn2

> Now fn1 and fn2 are functions that can see .fooInfo, and nobody else can 
> (without going through contortions).

> One other difference between this approach and the package-specific 
> environment:  there's only one package-specific environment in 
> Deepayan's formulation, but I could call createFns() several times, 
> creating several pairs of functions, each pair with its own independent 
> version of .fooInfo.

> I don't know if that's something that would be useful to you, but 
> conceivably you'd want to maintain partial plots in several different 
> windows, and that would allow you to do so.
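Duncan's sketch, filled in so that it actually runs (the factory-function form and the concrete bodies are mine, just to make the shared-state behaviour visible):

```r
## A factory: each call creates a *new* environment holding .fooInfo,
## shared only by the two closures it returns.
createFns <- function() local({
  .fooInfo <- NULL
  fn1 <- function(x) .fooInfo <<- x  # writes the shared state
  fn2 <- function() .fooInfo         # reads the shared state
  list(fn1 = fn1, fn2 = fn2)
})

a <- createFns()
b <- createFns()
a$fn1(42)
a$fn2()  # 42
b$fn2()  # NULL -- each pair has its own independent .fooInfo
```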

Note that the above approach is how  nls()  has been
implemented in R ... since a very long time ago {before R 1.0.0}.

e.g. from  example(nls) :

DNase1 <- subset(DNase, Run == 1)
fm1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1)
str(fm1 $ m)
> List of 16
>  $ resid :function ()  
>  $ fitted:function ()  
>  $ formula   :function ()  
>  $ deviance  :function ()  
>  $ lhs   :function ()  
>  $ gradient  :function ()  
>  $ conv  :function ()  
>  $ incr  :function ()  
>  $ setVarying:function (vary = rep_len(TRUE, np))  
>  $ setPars   :function (newPars)  
>  $ getPars   :function ()  
>  $ getAllPars:function ()  
>  $ getEnv:function ()  
>  $ trace :function ()  
>  $ Rmat  :function ()  
>  $ predict   :function (newdata = list(), qr = FALSE)  
>  - attr(*, "class")= chr "nlsModel"

## so 16 functions, all sharing the *same* environment very
## efficiently and nicely

## this is *the* environment for the fitted model :
fmE <- environment(fm1$m[[1]])
ls.str(fmE)
> convCrit : function ()  
> dev :  num 0.00479
> env :  
> form : Class 'formula'  language density ~ SSlogis(log(conc), Asym, xmid, 
> scal)
> getPars : function ()  
>  
>  
>  

so the environment "contains" the functions themselves (but quite
a few more things) and for an environment that means it only
has pointers to the same function objects which are *also* in  `fm1$m`.

So, there has been a nice, convincing and important example of
how to do this - inside R for more than two decades.

Martin Maechler

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] stats::fft produces inconsistent results

2021-10-20 Thread Martin Maechler
>>>>> Martin Maechler 
>>>>> on Wed, 20 Oct 2021 11:26:21 +0200 writes:

[]

> Thank you, André , that's very good.

> Just to state the obvious conclusion:

> If Ben's suggestion is correct (and André has explained *how*
> that could happen) this would mean  a
> SEVERE BUG in package ravetools's  mvfftw() function.

> and it would have been (yet another) case of gaining speed by
> killing correctness...

> ... but then ravetools  is not even a CRAN package, so why
> should you dare to use it for anything serious ?

> ... yes, being grouchy ..

which I should rather not be.

Dipterix Wang *did* say initially that he is currently
developing ravetools, so it's very reasonable that this is not yet a
CRAN package.

Best,
Martin

>> -Original Message-
>> From: R-devel  On behalf of Ben Bolker
>> Sent: Wednesday, 20 October 2021, 03:27
>> To: r-devel@r-project.org
>> Subject: Re: [Rd] stats::fft produces inconsistent results


>> This is a long shot, but here's a plausible scenario:

>> as part of its pipeline, ravetools::mvfftw computes the mean of the
>> input vector **and then centers it to a mean of zero** (intentionally or
>> accidentally?)

>> because variables are passed to compiled code by reference (someone
>> can feel free to correct my terminology), this means that the original
>> vector in R now has a mean of zero

>> the first element of fft() is mean(x)*length(x), so if mean(x) has
>> been forced to zero, that would explain your issue.

>> I don't know about the non-reproducibility part.
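Ben's observation about the first coefficient is easy to verify from the R level (base R only):

```r
## The DC term of the DFT is the plain sum of the input, i.e.
## mean(x) * length(x); it is zero only for mean-centered input.
set.seed(1)
x <- rnorm(100)
all.equal(Re(fft(x)[1]), mean(x) * length(x))  # TRUE
all.equal(Re(fft(x - mean(x))[1]), 0)          # TRUE: centered input
```

So a zero first coefficient for an input that was *not* mean-centered is indeed the signature of the input vector having been centered behind the caller's back.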

>> On 10/19/21 7:06 PM, Dipterix Wang wrote:
>>> Dear R-devel Team,
>>> 
>>> I'm developing a neuroscience signal pipeline package in R
>>> (https://github.com/dipterix/ravetools) and I noticed a weird issue that
>>> failed my unit test.
>>> 
>>> Basically I was trying to use the `fftw3` library to implement a fast
>>> multivariate fft function in C++. When I tried to compare my results with
>>> stats::fft, the test result showed the first element of **expected** (which
>>> was produced by stats::fft) was zero, which, I am pretty sure, is wrong,
>>> and I can confirm that my function produces correct results.
>>> 
>>> However, somehow I couldn’t reproduce this issue on my personal
>>> computer (osx, M1, R4.1.1), the error simply went away.
>>> 
>>> The catch is my function produced consistent and correct results but
>>> stats::fft was not. This does not mean `stats::fft` has bugs. Instead, I
>>> suspect there could be some weird interactions between my code and
>>> stats::fft at the C/C++ level, but I couldn’t figure out why.
>>> 
>>> +++ Details:
>>> 
>>> Here’s the code I used for the test:
>>> 
>>> https://github.com/dipterix/ravetools/blob/4dc35d64763304aff869d92dddad38a7f2b30637/tests/testthat/test-fftw.R#L33-L41
>>> 
>>> Test code
>>> set.seed(1)
>>> x <- rnorm(1000)
>>> dim(x) <- c(100,10)
>>> a <- ravetools:::mvfftw_r2c(x, 0)
>>> c <- apply(x, 2, stats::fft)[1:51,]
>>> expect_equal(a, c)
>>> 
>>> 
>>> Here are the tests that gave me the errors:
>>> 
>>> The test logs on win-builder
>>> https://win-builder.r-project.org/07586ios8AbL/00check.log
>>> 
>>> Test logs on GitHub
>>> https://github.com/dipterix/ravetools/runs/3944874310?check_suite_focus=true
>>> 
>>> 
>>> —— Failed tests ——
>>> -- Failure (test-fftw.R:41:3): mvfftw_r2c 
--
>>> `a` (`actual`) not equal to `c` (`expected`).
>>> 
>>> actual vs expected
>>> [,1][,2]  [,3]  
[,4]...
>>> - actual[1, ] 10.8887367+ 0.000i  -3.7808077+ 0.000i   
2.967354+ 0.00i   5.160186+ 0.00i ...
>>> + expected[1, ]0.000+ 0.000i  -3.7808077+ 0.000i   
2.967354+ 0.00i   5.160186+ 0.00i...
>>> 
>>> 
>>> 
>>> The first columns are different: `actual` is the result I produced via
>>> `ravetools:::mvfftw_r2c`, and `expected` was produced by `stats::fft`.
>>> 
>>> 
>>> Any help or attention is very much appreciated.
>>> Thanks,
>>> - Zhengjia

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] stats::fft produces inconsistent results

2021-10-20 Thread Martin Maechler
> GILLIBERT, Andre 
> on Wed, 20 Oct 2021 08:10:00 + writes:

> Hello,
> That sounds like a good diagnosis!
> Indeed, R vectors are passed "by reference" to C code, but the semantics
> must be "by value", i.e. the C function must NOT change the contents of the
> vector, except in very specific cases.

> A good program that has to work on a vector must first duplicate the
> vector, unless the only reference to the vector is the reference inside
> the C function.
> This can be tested by the MAYBE_REFERENCED() macro in Rinternals.h.

> A good example can be found in the fft() function in
> src/library/stats/src/fourier.c in the R source code:
> switch (TYPEOF(z)) {
> case INTSXP:
> case LGLSXP:
> case REALSXP:
> z = coerceVector(z, CPLXSXP);
> break;
> case CPLXSXP:
> if (MAYBE_REFERENCED(z)) z = duplicate(z);
> break;
> default:
> error(_("non-numeric argument"));
> }
> PROTECT(z);

> This code coerces non-complex vectors to complex. Since this makes a 
copy, there is no need to duplicate.
> Complex vectors are duplicated, unless they are not referenced by 
anything but the fft() function.

> Now, the z vector can be modified "in place" without inconsistency.

> Properly using R vectors in C code is tricky. You have to understand:
> 1) When you are allowed to modify vectors, and when not
> 2) When to PROTECT() vectors
> 3) How the garbage collector works and when it can trigger (answer : 
basically, when you call any internal R function)

> Chapter 5 of "Writing R Extensions" documentation is quite extensive:
> 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C

> -- 
> Sincerely
> André GILLIBERT

Thank you, André , that's very good.

Just to state the obvious conclusion:

If Ben's suggestion is correct (and André has explained *how*
that could happen) this would mean  a
SEVERE BUG in package ravetools's  mvfftw() function.

and it would have been (yet another) case of gaining speed by
killing correctness...

... but then ravetools  is not even a CRAN package, so why
should you dare to use it for anything serious ?

... yes, being grouchy ..

> -Message d'origine-
> De : R-devel  De la part de Ben Bolker
> Envoyé : mercredi 20 octobre 2021 03:27
> À : r-devel@r-project.org
> Objet : Re: [Rd] stats::fft produces inconsistent results


> This is a long shot, but here's a plausible scenario:

> as part of its pipeline, ravetools::mvfftw computes the mean of the
> input vector **and then centers it to a mean of zero** (intentionally or
> accidentally?)

> because variables are passed to compiled code by reference (someone
> can feel free to correct my terminology), this means that the original
> vector in R now has a mean of zero

> the first element of fft() is mean(x)*length(x), so if mean(x) has
> been forced to zero, that would explain your issue.

> I don't know about the non-reproducibility part.

> On 10/19/21 7:06 PM, Dipterix Wang wrote:
>> Dear R-devel Team,
>> 
>> I'm developing a neuroscience signal pipeline package in R 
(https://github.com/dipterix/ravetools) and I noticed a weird issue that failed 
my unit test.
>> 
>> Basically I was trying to use `fftw3` library to implement fast 
multivariate fft function in C++. When I tried to compare my results with 
stats::fft, the test result showed the first element of **expected** (which was 
produced by stats::fft) was zero, which, I am pretty sure, is wrong, and I can 
confirm that my function produces correct results.
>> 
>> However, somehow I couldn't reproduce this issue on my personal computer 
(macOS, M1, R 4.1.1); the error simply went away.
>> 
>> The catch is that my function produced consistent and correct results but 
stats::fft did not. This does not mean `stats::fft` has bugs. Instead, I 
suspect there could be some weird interaction between my code and stats::fft 
at the C/C++ level, but I couldn't figure out why.
>> 
>> +++ Details:
>> 
>> Here’s the code I used for the test:
>> 
>> 
https://github.com/dipterix/ravetools/blob/4dc35d64763304aff869d92dddad38a7f2b30637/tests/testthat/test-fftw.R#L33-L41
>> 
>> Test code
>> set.seed(1)
>> x <- rnorm(1000)
>> dim(x) <- c(100,10)
>> a <- ravetools:::mvfftw_r2c(x, 0)
>> c <- apply(x, 2, stats::fft)[1:51,]
>> expect_equal(a, c)
>> 
>> 
>> Here are the tests that gave me the errors:
>> 
>> The test logs on win-builder
>> https://win-builder.r-project.org/07586ios8AbL/00check.log
>> 
>> Test logs on GitHub
>> 
https://github.com/dipterix/ravetools/runs/3944874310?check_suite_focus=true
>> 
>> 
>> —— Failed tests ——
>> -- Failure (test-fftw.R:41:3): mvfftw_r2c 

Re: [Rd] Potential bugs in table dnn

2021-10-14 Thread Martin Maechler
Dear Thomas,

actually, I have in the mean time already applied the changes I
think are needed,
both in the code and in the documentation.

So, in this case, it may be a waste of time to still open a
bugzilla issue, I think.

Here are my current changes (not yet committed; of course I would also add
a NEWS entry, mentioning you):


Index: src/library/base/R/table.R
===
53c53
<   if (length(dnn) != length(args))
---
>   if(length(args) == 1L || length(dnn) != length(args))
Index: src/library/base/man/table.Rd
===
23c23
<   \code{table} uses the cross-classifying factors to build a contingency
---
>   \code{table} uses cross-classifying factors to build a contingency
41c41,42
< (including character strings), or a list (or data frame) whose
---
> (including numbers or character strings), or a \code{\link{list}} (such
> as a data frame) whose
67c68,69
<   If the argument \code{dnn} is not supplied, the internal function
---
>   If the argument \code{dnn} is not supplied \emph{and} if \code{\dots} is
>   not one \code{list} with its own \code{\link{names}()}, the internal function



With regards,
Martin



Re: [Rd] Potential bugs in table dnn

2021-10-14 Thread Martin Maechler
> SOEIRO Thomas 
> on Wed, 13 Oct 2021 11:12:09 + writes:

> Inline comments below in the previous message. I'm not 100%
> sure if the current behavior is intended or not. If not,
> here is a patch (which I can submit on R Bugzilla if
> appropriate):

Excuse us for not replying earlier, Thomas, but yes, there is a
buglet in generating dimnames when creating table() objects,
but I think *not* in the behaviour you want to change
because that *is* partly purposeful and not a bug (in code).

Rather it's incomplete documentation which currently does not
cover that case ... and I see your proposed patch also tries to
address the issue of "too terse" documentation.

The only bug I see is that here,

  R> table(warpbreaks[3])

   L  M  H 
  18 18 18 
  R> 

the automatic dnn's (= [d]im[n]ames' [n]ames) are not taken as
in the (>= 2)-dim case,

  R> table(warpbreaks[-1])
  tension
  wool L M H
 A 9 9 9
 B 9 9 9
  R>

However, I definitely would not want to see anything different
than what we see now for

  R> table(FOOBAR = warpbreaks[-1])
  tension
  wool L M H
 A 9 9 9
 B 9 9 9
  R>

where indeed, the FOOBAR should be *kept* disregarded
(as it should in  table(FOOBAR = warpbreaks[3])  once we fix the
 1D --- {1-argument with own dimnames} case)

and of course, this should also stay as is, undisputedly:

  R> table(POISSON_7 = rpois(100, 7))
  POISSON_7
   2  3  4  5  6  7  8  9 10 11 12 13 14 
   4  5 14 16 20  8  8 13  1  5  3  2  1 
  R>

I'm fine if you move this to R bugzilla  {where it remains more
easily findable in say 1 year's time}.

Thank you for the report and diagnosis so far!
Martin



Re: [Rd] Slow try in combination with do.call

2021-10-12 Thread Martin Maechler
Just in case, you hadn't noticed:

Since Sep.17, we have had the faster  try()  now in both R-devel
and "R 4.1.1 patched" which will be released as R 4.1.2  by the
end of this month, with NEWS entry

• try() is considerably faster in case of an error and long call,
  as e.g., from some do.call().  Thanks to Alexander Kaever's
  suggestion posted to R-devel.

Martin


>>>>> nospam@altfeld-im de 
>>>>> on Tue, 12 Oct 2021 12:11:10 +0200 writes:

> In fact an attentive user reported the same type of (slow due to deparse) 
problem in my tryCatchLog package recently when using a large sparse matrix
> https://github.com/aryoda/tryCatchLog/issues/68

> and I have fixed it by explicitly using the nlines arg of deparse() 
instead of using as.character()
> which implicitly calls deparse() for a call stack.

> Looking for a fix I think I may have found inconsistent deparse default 
arguments in base R between as.character() and deparse():

> A direct deparse call in R uses
> control = c("keepNA", "keepInteger", "niceNames", "showAttributes")
> as default (see ?.deparseOpts for details).

> The as.character() implementation in the C code of base R calls the 
internal deparse C function
> with another default for .deparseOpts:
> The SIMPLEDEPARSE C constant which corresponds to control = NULL.
> 
https://github.com/wch/r-source/blob/54f94f0433c487fe3b0df9bae477c9babdd1/src/main/deparse.c#L345

> This is clearly not a bug, but maybe the as.character() implementation should 
use the default args of deparse() for consistency (just a proposal!)...

> BTW: You can find my analysis result with the call path and links to the 
R source code in the github issue:
> https://github.com/aryoda/tryCatchLog/issues/68#issuecomment-930593002



> On Thu, 2021-09-16 at 18:04 +0200, Martin Maechler wrote:
>> > > > > > Martin Maechler 
>> > > > > > on Thu, 16 Sep 2021 17:48:41 +0200 writes:
>> > > > > > Alexander Kaever 
>> > > > > > on Thu, 16 Sep 2021 14:00:03 + writes:
>> 
>> >> Hi,
>> >> It seems like a try(do.call(f, args)) can be very slow on error 
depending on the args size. This is related to a complete deparse of the call
>> using deparse(call)[1L] within the try function. How about replacing 
deparse(call)[1L] by deparse(call, nlines = 1)?
>> 
>> >> Best,
>> >> Alex
>> 
>> > an *excellent* idea!
>> 
>> > I have checked that the resulting try() object continues to contain the
>> > long large call; indeed that is not the problem, but the
>> > deparse()ing  *is* as you say above.
>> 
>> > {The experts typically use  tryCatch() directly, instead of  try() ,
>> > which may be the reason other experienced R developers have not
>> > stumbled over this ...}
>> 
>> > Thanks a lot, notably also for the clear  repr.ex. below.
>> 
>> > Best regards,
>> > Martin
>> 
>> OTOH, I find so many cases  of   deparse(*)[1]  (or similar) in
>> R's own sources, I'm wondering
>> if I'm forgetting something ... and using nlines=* is not always
>> faster & equivalent and hence better ??
>> 
>> Martin
>> 
>> 
>> 
>> 
>> >> Example:
>> 
>> >> fun <- function(x) {
>> >> stop("testing")
>> >> }
>> >> d <- rep(list(mtcars), 1)
>> >> object.size(d)
>> >> # 72MB
>> 
>> >> system.time({
>> >> try(do.call(fun, args = list(x = d)))
>> >> })
>> >> # 8s
>> 



Re: [Rd] int overflow writing long vectors to socketConnection

2021-10-12 Thread Martin Maechler
>>>>> Zafer Barutcuoglu 
>>>>> on Tue, 12 Oct 2021 02:04:40 -0400 writes:

> Hi,

> Writing >=2GB to a socketConnection (e.g. via writeBin) does not work 
correctly, because of this int typecast in modules/internet/sockconn.c:
>> static size_t sock_write(const void *ptr, size_t size, size_t nitems,
>>  Rconnection con)
>> {
>> Rsockconn this = (Rsockconn)con->private;
>> ssize_t n = R_SockWrite(this->fd, ptr, (int)(size * nitems),
>>  this->timeout)/((ssize_t)size);
>> return n > 0 ? n : 0;
>> }
> which seems uncalled for, given:
>> ssize_t R_SockWrite(int sockp, const void *buf, size_t len, int timeout)


> Is there a rationale for it, or should it be fixed?

I've fixed it; it's clearly a "typo" introduced at the same
time the type of 'len' in the R_SockWrite() header was
changed from int to size_t .. and the intent must have been to
do the same inside sock_write().

> Best,
> --
> Zafer

Thank you for the report!

Martin Maechler
ETH Zurich  and  R Core



Re: [Rd] Crash/bug when calling match on latin1 strings

2021-10-11 Thread Martin Maechler
>>>>> Rui Barradas 
>>>>> on Mon, 11 Oct 2021 07:41:51 +0100 writes:

> Hello,

> R 4.1.1 on Ubuntu 20.04.

> I can reproduce this error but not ~90% of the time, only the 1st time I 
> run the script.
> If I run other (terminal) commands before rerunning the R script it 
> sometimes segfaults again but once again very far from 90% of the time.


> rui@rui:~/tmp$ R -q -f rhelp.R
>> sessionInfo()
> R version 4.1.1 (2021-08-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 20.04.3 LTS

> Matrix products: default
> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

> locale:
> [1] LC_CTYPE=pt_PT.UTF-8   LC_NUMERIC=C
> [3] LC_TIME=pt_PT.UTF-8LC_COLLATE=pt_PT.UTF-8
> [5] LC_MONETARY=pt_PT.UTF-8LC_MESSAGES=pt_PT.UTF-8
> [7] LC_PAPER=pt_PT.UTF-8   LC_NAME=C
> [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C

> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base

> loaded via a namespace (and not attached):
> [1] compiler_4.1.1
>> 
>> # A bunch of words in UTF8; replace *'s
>> words <- readLines("h://pastebin.c**/raw/MFCQfhpY", encoding = 
> "UTF-8")
>> words2 <- iconv(words, "utf-8", "latin1")
>> gctorture(TRUE)
>> y <- match(words2, words2)

> *** caught segfault ***
> address 0x10, cause 'memory not mapped'
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation
> *** recursive gc invocation

> Traceback:
> 1: match(words2, words2)
> An irrecoverable exception occurred. R is aborting now ...
> Falta de segmentação (núcleo despejado)



> This last line is Portuguese for

> Segmentation fault (core dumped)

> Hope this helps,

Yes, it does, thank you!

I can confirm the problem:  Only in R 4.1.0 and newer, and
including current "R-patched" and "R-devel" versions.

I've now turned this into a formal R bug report on R's bugzilla,
and (slightly) extended your (Travers') example into a
self-contained (no internet access) R script.

Bugzilla PR#18211 :" match() memory corruption "

  https://bugs.r-project.org/show_bug.cgi?id=18211

  with attachment 2929
  --> https://bugs.r-project.org/attachment.cgi?id=2929&action=edit

==> please if possible follow up on bugzilla

Thanks again to you both!
Martin Maechler


> Rui Barradas

> Às 06:05 de 11/10/21, Travers Ching escreveu:
>> Here's a brief example:
>> 
>> # A bunch of words in UTF8; replace *'s
>> words <- readLines("h://pastebin.c**/raw/MFCQfhpY", encoding = 
"UTF-8")
>> words2 <- iconv(words, "utf-8", "latin1")
>> gctorture(TRUE)
>> y <- match(words2, words2)
>> 
>> 
>> I searched bugzilla but didn't see anything. Apologies if this is already
>> reported.
>> 
>> The bug appears in both R-devel and the release, but doesn't seem to 
affect
>> R 4.0.5.



Re: [Rd] R 4.1.x make check fails, stats-Ex.R, step factor reduced below minFactor

2021-10-01 Thread Martin Maechler
> Andrew Piskorski 
> on Fri, 1 Oct 2021 05:01:39 -0400 writes:

> I recently built R 4.1.1 (Patched) from source, as I have many older
> versions over the years.  This version, on Ubuntu 18.04.4 LTS:

> R 4.1.1 (Patched), 2021-09-21, svn.rev 80946, x86_64-pc-linux-gnu

> Surprisingly, "make check" fails, which I don't recall seeing before.
> The error is in from stats-Ex.R, which unfortunately terminates all
> further testing!  This particular error, "step factor ... reduced
> below 'minFactor'" does not seem very serious, but I can't figure out
> why it's happening.

> I installed with "make install install-tests" as usual, which seemed
> to work fine.  Running the same tests after install, I'm able to get
> more coverage by using errorsAreFatal=FALSE.  However, it seems the
> rest of the 'stats' tests after the bad one still do not run.

> I'm confused about the intent of this particular test.  The comment
> above it seems to say that it's SUPPOSED to throw this error, yet
> getting the error still terminates further testing, which seems
> strange.  What's supposed to happen here?

> Any ideas on why this error might be occurring, and how I should debug
> it?  What's the right way for me to disable this one failing test, so
> the ones after it can run?

> Thanks for your help!


> ## "make check" output:
> make[1]: Entering directory 
'/home/nobackup/co/R/R-4-1-branch/Build-x86_64/tests'
> make[2]: Entering directory 
'/home/nobackup/co/R/R-4-1-branch/Build-x86_64/tests'
> make[3]: Entering directory 
'/home/nobackup/co/R/R-4-1-branch/Build-x86_64/tests/Examples'
> Testing examples for package 'base'
> Testing examples for package 'tools'
> comparing 'tools-Ex.Rout' to 'tools-Ex.Rout.save' ... OK
> Testing examples for package 'utils'
> Testing examples for package 'grDevices'
> comparing 'grDevices-Ex.Rout' to 'grDevices-Ex.Rout.save' ... OK
> Testing examples for package 'graphics'
> comparing 'graphics-Ex.Rout' to 'graphics-Ex.Rout.save' ... OK
> Testing examples for package 'stats'
> Error: testing 'stats' failed
> Execution halted
> Makefile:37: recipe for target 'test-Examples-Base' failed
> make[3]: *** [test-Examples-Base] Error 1
> make[3]: Leaving directory 
'/home/nobackup/co/R/R-4-1-branch/Build-x86_64/tests/Examples'
> ../../tests/Makefile.common:198: recipe for target 'test-Examples' failed
> make[2]: *** [test-Examples] Error 2
> make[2]: Leaving directory 
'/home/nobackup/co/R/R-4-1-branch/Build-x86_64/tests'
> ../../tests/Makefile.common:184: recipe for target 'test-all-basics' 
failed
> make[1]: *** [test-all-basics] Error 1
> make[1]: Leaving directory 
'/home/nobackup/co/R/R-4-1-branch/Build-x86_64/tests'
> Makefile:305: recipe for target 'check-all' failed
> make: *** [check-all] Error 2


> ## From file:  tests/Examples/stats-Ex.Rout.fail

>> ## Here, requiring close convergence, you need to use more accurate 
numerical
>> ##  differentiation; this gives Error: "step factor .. reduced below 
'minFactor' .."
>> options(digits = 10) # more accuracy for 'trace'
>> ## IGNORE_RDIFF_BEGIN
>> try(nlm1 <- update(nlmod, control = list(tol = 1e-7))) # where central 
diff. work here:
> Warning in nls(formula = y ~ Const + A * exp(B * x), algorithm = 
"default",  :
> No starting values specified for some parameters.
> Initializing 'Const', 'A', 'B' to '1.'.
> Consider specifying 'start' or using a selfStart model

So this did give an error we expected (on some platforms only),
hence we used try().

However, the next one "should work" (*)
and failing there, *does* fail the tests :

>> (nlm2 <- update(nlmod, control = list(tol = 8e-8, nDcentral=TRUE), 
trace=TRUE))
> Warning in nls(formula = y ~ Const + A * exp(B * x), algorithm = 
"default",  :
> No starting values specified for some parameters.
> Initializing 'Const', 'A', 'B' to '1.'.
> Consider specifying 'start' or using a selfStart model
> 1017460.306(4.15e+02): par = (1 1 1)
> 758164.7503(2.34e+02): par = (13.42031396 1.961485 0.05947543745)
> 269506.3538(3.23e+02): par = (51.75719816 -13.09155957 0.8428607709)
> 68969.21893(1.03e+02): par = (76.0006985 -1.935226745 1.0190858)
> 633.3672230(1.29e+00): par = (100.3761515 8.624648402 5.104490259)
> 151.4400218(9.39e+00): par = (100.6344391 4.913490985 0.2849209569)
> 53.08739850(7.24e+00): par = (100.6830407 6.899303317 0.4637755074)
> 1.344478640(5.97e-01): par = (100.0368306 9.897714142 0.5169294939)
> 0.9908415909   (1.55e-02): par = (100.0300625 9.9144191 0.5023516843)
> 0.9906046057   (1.84e-05): par = (100.0288724 9.916224018 0.5025207336)
> 0.9906046054   (9.95e-08): par = (100.028875 9.916228366 0.50252165)
> 0.9906046054   (9.93e-08): par = (100.028875 

Re: [Rd] translation domain is not inferred correctly from a package's print methods -- intended behavior?

2021-10-01 Thread Martin Maechler
> Michael Chirico 
> on Mon, 12 Jul 2021 14:21:14 -0700 writes:

> Here is a reprex:


> # initialize reprex package
> cd /tmp
> mkdir myPkg && cd myPkg
> echo "Package: myPkg" > DESCRIPTION
> echo "Version: 0.0.1" >> DESCRIPTION
> mkdir R
> echo "print.my_class = function(x, ...) { cat(gettext(\"'%s' is
> deprecated.\"), '\n', gettext(\"'%s' is deprecated.\",
> domain='R-myPkg'), '\n') }" > R/foo.R
> echo "S3method(print, my_class)" > NAMESPACE
> # extract string for translation
> Rscript -e "tools::update_pkg_po('.')"
> # add dummy translation
> msginit -i po/R-myPkg.pot -o po/R-ja.po -l ja --no-translator
> head -n -1 po/R-ja.po > tmp && mv tmp po/R-ja.po
> echo 'msgstr "%s successfully translated"' >> po/R-ja.po
> # install .mo translations
> Rscript -e "tools::update_pkg_po('.')"
> # install package & test
> R CMD INSTALL .
> LANGUAGE=ja Rscript -e "library(myPkg); print(structure(1, class = 
'my_class'))"
> #  '%s' は廃止予定です
> #  %s successfully translated

Trying to see if the current "R-devel trunk" would still suffer
from this, and prompted by Suharto Anggono's suggestion on R's
bugzilla,   https://bugs.r-project.org/show_bug.cgi?id=17998#c24


I've finally started looking at this ..
(Not having a Japanese locale installed though).

> Note that the first gettext() call, which doesn't supply domain=,
> returns the corresponding translation from base R (i.e., the output is
> the same as gettext("'%s' is deprecated.", domain="R-base")).

I don't see this (not having a Japanase locale?  should I try
with a locale I have installed?)

> The second gettext() call, where domain= is supplied, returns our
> dummy translation, which is what I would have expected from the first
> execution.

I can get the following which seems to say that everything is
fine and fixed now, right?

MM@lynne:myPkg$ LANGUAGE=ja R-devel -s --vanilla -e 
'library(myPkg,lib.loc="~/R/library/64-linux-MM-only");structure(1,class="my_class");R.version.string'
%s successfully translated 
 %s successfully translated 
[1] "R Under development (unstable) (2021-09-30 r80997)"


MM@lynne:myPkg$ LANGUAGE=ja `R-devel RHOME`/bin/Rscript --vanilla -e 
'library(myPkg,lib.loc="~/R/library/64-linux-MM-only");structure(1,class="my_class");R.version.string'
%s successfully translated 
 %s successfully translated 
[1] "R Under development (unstable) (2021-09-30 r80997)"


Note: During my experiments, I also do observe things confusing to me, when
using Rscript and R from the command line... in some cases
getting errors (in Japanese) ... but that may be just in those
cases I have left any space in the string
((in the case of 'R' which in my case suffers from quoting hell
  because I use wrapper  sh-scripts to call my versions of R ... ))


> Here is what's in ?gettext:

>> If domain is NULL or "", and gettext or ngettext is called from a 
function in the namespace of package pkg the domain is set to "R-pkg". 
Otherwise there is no default domain.


> Does that mean the S3 print method is not "in the namespace of myPkg"?

no.

> Or is there a bug here?

Yes, rather;  or there *was* one.

Thanks a lot, Michael!

Best,
Martin



Re: [Rd] trunc.Date and round.Date + documentation of DateTimeClasses

2021-09-30 Thread Martin Maechler
Excuse the exceptional top-reply:

Note that a very related issue has been raised not so long ago
by Dirk (in CC) on R's Bugzilla :

  trunc.Date should support months and years arguments as trunc.POSIXt does 
  https://bugs.r-project.org/show_bug.cgi?id=18099

which had some agreement (also with you: I agree we should
change something about this) but I also had proposed to approach
it more generally than in the PR .. which you already did by
mentioning trunc() and round() methods together.

Still, Dirk's proposal would try harder to remain backward
compatible in those cases where trunc.Date() currently does
"behave as it should".

Martin Maechler
ETH Zurich  and  R Core

>>>>> SOEIRO Thomas 
>>>>> on Thu, 30 Sep 2021 10:32:32 + writes:

> About fractional days, trunc.Date2 actually seems to have no regression 
and to be backward compatible compared to the original trunc.Date:

> frac <- as.Date("2020-01-01") + 0.5
> identical(trunc(frac), trunc.Date2(frac))

> (I may still be missing something since I do not understand how
> trunc.Date manages fractional days with round(x - 0.499).)

> -Message d'origine-
> De : SOEIRO Thomas 
> Envoyé : mercredi 29 septembre 2021 17:00
> À : 'r-devel@r-project.org'
> Objet : trunc.Date and round.Date + documentation of DateTimeClasses

> Dear All,

> 1) trunc.Date and round.Date:

> Currently, the help page for trunc.Date and round.Date
> says "The methods for class "Date" are of little use
> except to remove fractional days". However, e.g.,
> trunc.POSIXt(Sys.Date(), "years") and
> round.POSIXt(Sys.Date(), "years") work because the
> functions start with x <- as.POSIXlt(x).

> Would you consider a simple implementation of trunc.Date
> and round.Date based on trunc.POSIXt and round.POSIXt?
> This would enable to avoid coercion from Date to POSIXt
> and back to Date for these simple manipulations.

> For example:
> # (I do not have a clear understanding of what "remove fractional days" 
means, and I did not implement it.)

> trunc.Date2 <-
>   function(x, units = c("days", "months", "years"), ...)
>   {
> units <- match.arg(units)
> x <- as.POSIXlt(x)
> 
> switch(units,
>"days" = {
>  x$sec[] <- 0; x$min[] <- 0L; x$hour[] <- 0L;
>  x$isdst[] <- -1L
>},
>"months" = {
>  x$sec[] <- 0; x$min[] <- 0L; x$hour[] <- 0L;
>  x$mday[] <- 1L
>  x$isdst[] <- -1L
>},
>"years" = {
>  x$sec[] <- 0; x$min[] <- 0L; x$hour[] <- 0L;
>  x$mday[] <- 1L; x$mon[] <- 0L
>  x$isdst[] <- -1L
>}
> )
> as.Date(x)
>   }



> 2) documentation of DateTimeClasses:

> It may be useful to add in the documentation of
> DateTimeClasses that manipulating elements of POSIXlt
> objects may result in "invalid" entries (e.g., mon = 12
> or mday = 0), but that the object is nevertheless
> correctly printed/coerced.

> Is this behavior explicitly supported?

> d <- as.POSIXlt("2000-01-01")
> unclass(d)
> d$mon <- d$mon + 12
> d$mday <- d$mday - 1
> unclass(d)
> d
> d <- as.POSIXlt(as.POSIXct(d))
> dput(d)



> Best,
> Thomas



Re: [Rd] R-devel: as.character() for hexmode no longer pads with zeros

2021-09-23 Thread Martin Maechler
> Henrik Bengtsson 
> on Wed, 22 Sep 2021 20:48:05 -0700 writes:

> The update in rev 80946
> 
(https://github.com/wch/r-source/commit/d970867722e14811e8ba6b0ba8e0f478ff482f5e)
> caused as.character() on hexmode objects to no longer pad with zeros.

Yes -- very much on purpose; by me, after discussing a related issue
within R-core which showed "how wrong" the previous (current R)
behavior of the as.character() method is for
hexmode and octmode objects :

If you look at the whole rev 80946 , you also read NEWS

 * as.character() for "hexmode" or "octmode" objects now
   fulfills the important basic rule

  as.character(x)[j] === as.character(x[j]) 
  ^

rather than just calling format().

The format() generic (notably for "atomic-alike" objects) should indeed
return a character vector where each string has the same "width",
however, the result of  as.character(x) --- at least for all
"atomic-alike" / "vector-alike" objects --
for a single x[j] should not be influenced by other elements in x.




> Before:

>> x <- structure(as.integer(c(0,8,16,24,32)), class="hexmode")
>> x
> [1] "00" "08" "10" "18" "20"
>> as.character(x)
> [1] "00" "08" "10" "18" "20"

> After:

>> x <- structure(as.integer(c(0,8,16,24,32)), class="hexmode")
>> x
> [1] "00" "08" "10" "18" "20"
>> as.character(x)
> [1] "0"  "8"  "10" "18" "20"

> Was that intended?

Yes!
You have to explore your example a bit to notice how "illogical"
the behavior before was:

> as.character(as.hexmode(0:15))
 [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "a" "b" "c" "d" "e" "f"
> as.character(as.hexmode(0:16))
 [1] "00" "01" "02" "03" "04" "05" "06" "07" "08" "09" "0a" "0b" "0c" "0d" "0e"
[16] "0f" "10"

> as.character(as.hexmode(16^(0:2)))
[1] "001" "010" "100"
> as.character(as.hexmode(16^(0:3)))
[1] "0001" "0010" "0100" "1000"
> as.character(as.hexmode(16^(0:4)))
[1] "1" "00010" "00100" "01000" "1"

all breaking the rule stated in the NEWS entry above.

If you want format()  you should use format(),
but as.character() should never have used format() ..

Martin

> /Henrik

> PS. This breaks R.utils::intToHex()
> [https://cran.r-project.org/web/checks/check_results_R.utils.html]



Re: [Rd] formatC(character()) returns length 1 result, but is documented otherwise

2021-09-22 Thread Martin Maechler
> Davis Vaughan 
> on Mon, 13 Sep 2021 16:35:47 -0400 writes:

> Hi all,

> I believe I have either found a small bug, or a possible inconsistency in
> documentation. formatC() returns a length 1 result if given a length 0
> character() as input.

> formatC(character())
> #> [1] ""

> But the return value documentation states that it returns: "A character
> object of same size and attributes as x".

> I'd love for this to return a size 0 result here, consistent with the docs
> and my mental model of size stability for this function.

> Here is where this happens (it is explicitly hard coded):
> 
https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/format.R#L149

> Thanks,
> Davis Vaughan

> [[alternative HTML version deleted]]

This may well be a "historical artefact" - definitely
older than R 1.0.0 ... and yes, I knew much less about S and
its new dialect R back in 1998.  ;-)

You are right in all you say, and my mental model corresponds to
yours,  so we will almost surely change this (for R-devel;
possibly even consider "back" porting to R 4.1.1 patched).

Thank you for the report and suggestion,
Martin



Re: [R-pkg-devel] [External] Re: What is a "retired"package?

2021-09-22 Thread Martin Maechler
> Lenth, Russell V 
> on Tue, 21 Sep 2021 18:43:07 + writes:

> As I suspected, and a good point. But please note that the term "retired" 
causes angst, and it may be good to change it to "superseded" or something 
else.

well,  some of us will  become "retired" somewhere in the
future rather than "superseded" .. ;-)

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Slow try in combination with do.call

2021-09-16 Thread Martin Maechler
>>>>> Martin Maechler 
>>>>> on Thu, 16 Sep 2021 17:48:41 +0200 writes:

>>>>> Alexander Kaever 
>>>>> on Thu, 16 Sep 2021 14:00:03 + writes:

>> Hi,
>> It seems like a try(do.call(f, args)) can be very slow on error 
depending on the args size. This is related to a complete deparse of the call 
using deparse(call)[1L] within the try function. How about replacing 
deparse(call)[1L] by deparse(call, nlines = 1)?

>> Best,
>> Alex

> an *excellent* idea!

> I have checked that the resulting try() object continues to contain the
> long large call; indeed that is not the problem, but the
> deparse()ing  *is* as you say above.

> {The experts typically use  tryCatch() directly, instead of  try() ,
> which may be the reason other experienced R developers have not
> stumbled over this ...}

> Thanks a lot, notably also for the clear  repr.ex. below.

> Best regards,
> Martin

OTOH, I find so many cases  of   deparse(*)[1]  (or similar) in
R's own sources, I'm wondering
if I'm forgetting something ... and using nlines=* is not always
faster & equivalent and hence better ??

Martin




>> Example:

>> fun <- function(x) {
>> stop("testing")
>> }
>> d <- rep(list(mtcars), 1)
>> object.size(d)
>> # 72MB

>> system.time({
>> try(do.call(fun, args = list(x = d)))
>> })
>> # 8s



Re: [Rd] Slow try in combination with do.call

2021-09-16 Thread Martin Maechler
> Alexander Kaever 
> on Thu, 16 Sep 2021 14:00:03 + writes:

> Hi,
> It seems like a try(do.call(f, args)) can be very slow on error depending 
on the args size. This is related to a complete deparse of the call using 
deparse(call)[1L] within the try function. How about replacing 
deparse(call)[1L] by deparse(call, nlines = 1)?

> Best,
> Alex

an *excellent* idea!

I have checked that the resulting try() object continues to contain the
long large call; indeed that is not the problem, but the
deparse()ing  *is* as you say above.

{The experts typically use  tryCatch() directly, instead of  try() ,
 which may be the reason other experienced R developers have not
 stumbled over this ...}

Thanks a lot, notably also for the clear  repr.ex. below.

Best regards,
Martin


> Example:

> fun <- function(x) {
> stop("testing")
> }
> d <- rep(list(mtcars), 1)
> object.size(d)
> # 72MB

> system.time({
> try(do.call(fun, args = list(x = d)))
> })
> # 8s
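
As a concrete sketch (my own illustration, not code from the thread): do.call()
constructs a call with the evaluated arguments inlined, so the call object
itself carries the large data, and try()'s error path then deparses it.
Only the first line of the deparse is needed for the message:

```r
fun <- function(x) stop("testing")
d   <- rep(list(mtcars), 1e4)            # a deliberately large argument
cl  <- as.call(list(quote(fun), x = d))  # like the call do.call() builds:
                                         # the whole object is inlined

## try() only needs the first line of the deparsed call:
system.time(s1 <- deparse(cl)[1L])          # deparses *everything*, keeps line 1
system.time(s2 <- deparse(cl, nlines = 1L)) # stops after one line
```

On a large inlined argument the nlines = 1 variant avoids deparsing the bulk
of the object, which is the speed-up proposed above.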


> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Question about quantile fuzz and GPL license

2021-09-15 Thread Martin Maechler
> GILLIBERT, Andre 
> on Tue, 14 Sep 2021 16:13:05 + writes:

> On 9/14/21 9:22 AM, Abel AOUN wrote:
>> However I don't get why epsilon is multiplied by 4 instead of simply 
using epsilon.
>> Is there someone who can explain this 4 ?

> .Machine$double.eps is the "precision" of floating point values for 
values close to 1.0 (between 0.5 and 2.0).

> Using fuzz = .Machine$double.eps would have no effect if nppm is greater 
than or equal to 2.
> Using fuzz = 4 * .Machine$double.eps can fix rounding errors for nppm < 
8; for greater nppm, it has no effect.

> Indeed:
> 2 + .Machine$double.eps == 2
> 8 + 4*.Machine$double.eps == 8

> Since nppm is approximately equal to the quantile multiplied by the
sample size, it can be much greater than 2 or 8.

hmm: not "quantile":
 it is approximately equal to the *'prob'* multiplied by the sample size
 {the quantiles themselves can be on any scale anyway, but
  fortunately they don't matter yet in these parts of the calculations}

but you're right on the main point that it is
approx. proportional to  n.

> Maybe the rounding errors are only problematic for small nppm; or only
that case is taken into account.

> Moreover, if rounding errors are cumulative, they can be much greater 
than the precision of the floating point value. I do not know how this constant 
was chosen and what the use-cases were.

I vaguely remember I've been wondering about this also (back at the time).

Experiential wisdom would tell us to take such  fuzz values as
*relative* to the magnitude of the values they are added to,
here 'nppm' (which is always >= 0, hence no need for  abs(.) as usual).

So, instead of

j <- floor(nppm + fuzz)
h <- nppm - j
if(any(sml <- abs(h) < fuzz, na.rm = TRUE)) h[sml] <- 0

it would be (something like)

j <- floor(nppm*(1 + fuzz))
h <- nppm - j
if(any(sml <- abs(h) < fuzz*nppm, na.rm = TRUE)) h[sml] <- 0

or rather we would define fuzz as
   nppm * (k * .Machine$double.eps) 
for a small k.
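
A tiny illustration (my own sketch, not code from stats::quantile) of why an
absolute fuzz stops working for large nppm while a relative one does not:

```r
eps  <- .Machine$double.eps
nppm <- 0.25 * 1e6                 # 'prob' times a large sample size

nppm + 4 * eps        == nppm      # TRUE : the absolute fuzz is lost entirely
nppm + nppm * 4 * eps == nppm      # FALSE: the relative fuzz still has an effect
```

The spacing between representable doubles grows with the magnitude of nppm,
so only a fuzz scaled by nppm survives the addition.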

- - -

OTOH,  type=7 is the default, and I guess used in 99.9% of
all uses of quantile, *and* it never uses any fuzz 

Martin

> --
> Sincerely
> Andre GILLIBERT


> [[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Unneeded if statements in RealFromComplex C code

2021-09-10 Thread Martin Maechler
> Hervé Pagès 
> on Thu, 9 Sep 2021 17:54:06 -0700 writes:

> Hi,

> I just stumbled across these 2 lines in RealFromComplex (lines 208 & 209 
> in src/main/coerce.c):

> double attribute_hidden
> RealFromComplex(Rcomplex x, int *warn)
> {
>   if (ISNAN(x.r) || ISNAN(x.i))
>   return NA_REAL;
>   if (ISNAN(x.r)) return x.r;<- line 208
>   if (ISNAN(x.i)) return NA_REAL;<- line 209
>   if (x.i != 0)
>  *warn |= WARN_IMAG;
>   return x.r;
> }

> They were added in 2015 (revision 69410).

by me.  "Of course" the intent at the time was to  *replace* the
previous 2 lines and return NA/NaN of the "exact same kind"

but in the meantime, I have learned that trying to preserve
exact *kinds* of NaN / NA is typically not platform portable,
anyway because compiler/library optimizations and
implementations are pretty "free to do what they want" with these.

> They don't serve any purpose and might slow things down a little (unless 
> compiler optimization is able to ignore them). In any case they should 
> probably be removed.

I've cleaned up now, indeed back compatibly, i.e., removing both
lines as you suggested.

Thank you, Hervé!

Martin


> Cheers,
> H.

> -- 
> Hervé Pagès

> Bioconductor Core Team
> hpages.on.git...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] sep hard coded in write.ftable

2021-09-02 Thread Martin Maechler
> SOEIRO Thomas 
> on Wed, 1 Sep 2021 15:01:43 + writes:

> Dear all,

> (This is a follow up of a previous suggestion for ftable that was added 
in R 4.1.0: https://stat.ethz.ch/pipermail/r-devel/2020-May/079451.html)

> The sep argument is hard coded in write.ftable:

> write.ftable <- function(x, file = "", quote = TRUE, append = FALSE,
> digits = getOption("digits"), ...)
> {
> r <- format.ftable(x, quote = quote, digits = digits, ...)
> cat(t(r), file = file, append = append,
> sep = c(rep(" ", ncol(r) - 1), "\n"))
> invisible(x)
> }

> A minor change would allow users to modify it:

> write.ftable2 <- function(x, file = "", quote = TRUE, append = FALSE,
> digits = getOption("digits"), sep = " ", ...)
> {
> r <- stats:::format.ftable(x, quote = quote, digits = digits, ...)
> cat(t(r), file = file, append = append,
> sep = c(rep(sep, ncol(r) - 1), "\n"))
> invisible(x)
> }

I agree this sounds reasonable, and am currently running
'make check-devel' on sources modified accordingly ..

Martin


> This would allow to avoid a previous call to format.ftable (although 
write.ftable is significantly slower than write.table):

> ftable(formula = wool + tension ~ breaks, data = warpbreaks) |>
> format(quote = FALSE) |>
> write.table(sep = ";", row.names = FALSE, col.names = FALSE)

> ftable(formula = wool + tension ~ breaks, data = warpbreaks) |>
> write.ftable2(sep = ";")

> Best regards,
> Thomas

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Is it a good choice to increase the NCONNECTION value?

2021-08-24 Thread Martin Maechler
> GILLIBERT, Andre 
> on Tue, 24 Aug 2021 09:49:52 + writes:

  > RConnection is a pointer to a Rconn structure. The Rconn
  > structure must be allocated independently (e.g. by
  > malloc() in R_new_custom_connection).  Therefore,
  > increasing NCONNECTION to 1024 should only use 8
  > kilobytes on 64-bits platforms and 4 kilobytes on 32
  > bits platforms.

You are right indeed, and I was wrong.

  > Ideally, it should be dynamically allocated : either as
  > a linked list or as a dynamic array
  > (malloc/realloc). However, a simple change of
  > NCONNECTION to 1024 should be enough for most uses.

There is one other important problem I've been made aware of
(similar to the number of open DLLs, an issue 1-2
years ago):

The OS itself has limits on the number of open files
(yes, I know that there are other connections than files) and
these limits may quite differ from platform to platform.

On my Linux laptop, in a shell, I see

  $ ulimit -n
  1024

which is barely conformant with your proposed 1024 NCONNECTION.

Now if NCONNECTION is larger than the max allowed number of
open files and if R opens more files than the OS allowed, the
user may get quite unpleasant behavior, e.g. R being terminated brutally
(or behaving crazily) without good R-level warning / error messages.

It's also not at all sufficient to check for the open files
limit at compile time, but rather at R process startup time 

So this may need considerably more work than you / we have
hoped, and it's probably hard to find a safe number that is
considerably larger than 128  and less than the smallest of all
non-crazy platforms' {number of open files limit}.
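
For reference, the soft per-process file-descriptor limit can be queried from
within R itself (a sketch for POSIX-like systems, using the sh builtin ulimit
via system(); on "unlimited" systems this yields NA):

```r
## number of file descriptors this R process may open (POSIX sh builtin)
nofile <- suppressWarnings(
  as.integer(system("ulimit -n", intern = TRUE))
)
nofile   # e.g. 1024 on many Linux systems
```

A startup-time check like this is what would be needed to pick a safe
NCONNECTION value per platform, rather than a compile-time constant.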

  > Sincerely
  > André GILLIBERT

  []

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Is it a good choice to increase the NCONNECTION value?

2021-08-24 Thread Martin Maechler
> qweytr1--- via R-devel 
> on Tue, 24 Aug 2021 00:51:31 +0800 (GMT+08:00) writes:

> At least in 2015, a github user, tobigithub, submit an
> [issue](https://github.com/sneumann/xcms/issues/20) about
> the error "Error in file(con, "w") : all connections are
> in use" Nowadays, since AMD have really cool CPUs which
> increases the thread numbers to 128 or even 256 on a
> single server, we found that the NCONNECTION variable
> could prevent us from utilizing all the 128 threads.  It
> might be a good choice to increase its value.


> the variable is defined in
> `R-4.1.1/src/main/connections.c: 17` I have tested that,
> increase it to 1024 generates no error and all the
> clusters (I tried with 256 clusters on my 16 threads
> Laptop) works fine.

> Is it possible to increase the size of NCONNECTION?

Yes, of course, it is possible.
The question is how much it costs and to what number it should
be increased.

A quick look at the source connections.c --> src/R_ext/include/Connections.h
reveals that the Rconnection* <--> Rconn is a struct with about
200 chars and ca 30 int-like plus another 20 pointers .. which
would amount to roughly 400 bytes per connection.
Adding 1024-128 = 896 new ones  would then increase
the R executable by about 360 kB .. all the above being rough.
So personally, I guess that's  "about ok" --
are there other things to consider?

Ideally, of course, the number of possible connections could be
increased dynamically only when needed

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Seeking opinions on possible change to nls() code

2021-08-20 Thread Martin Maechler
> J C Nash 
> on Fri, 20 Aug 2021 11:41:26 -0400 writes:

> Thanks Martin. I'd missed the intention of that option,
> but re-reading it now it is obvious.

> FWIW, this problem is quite nasty, and so far I've found
> no method that reveals the underlying dangers well. And
> one of the issues with nonlinear models is that they
> reveal how slippery the concept of inference can be when
> applied to parameters in such models.

> JN

Indeed.

Just for the public (and those reading the archives in the future).

When Doug Bates and his PhD student José Pinheiro wrote
"the NLME book"  (<==> Recommended R package {nlme}
   https://cran.R-project.org/package=nlme )

José C. Pinheiro and  Douglas M. Bates
Mixed-Effects Models in S and S-PLUS
Springer-Verlag (January 2000)
DOI: 10.1007/b98882 --> https://link.springer.com/book/10.1007%2Fb98882

They teach quite a bit about non-linear regression, much of which
seems little known or taught nowadays.

NOTABLY they teach self-starting models, something fantastic,
available in R together with nls()  but unfortunately *also*
little known or taught!

I have improved the help pages, notably the examples for these,
in the distant past I vaguely remember.

Your present 9-point example can indeed also be solved beautifully
by R's builtin  SSbiexp()  [Self-starting bi-exponential model]:

NLSdata <- data.frame(
time = c(  1,   2,   3,   4,   6 ,  8,  10,  12,  16),
conc = c( 0.7, 1.2, 1.4, 1.4, 1.1, 0.8, 0.6, 0.5, 0.3))

## Once you realize that the above is the "simple"  bi-exponential model,
## you should remember  SSbiexp(),  and then

"everything is easy "

try4 <- nls(conc ~ SSbiexp(time, A1, lrc1, A2, lrc2), data = NLSdata,
trace=TRUE, control=nls.control(warnOnly=TRUE))
## --> converges nicely and starts much better anyway:
## 0.1369091   (2.52e+00): par = (-0.7623289 -2.116174 -2.339856 2.602446)
## 0.01160784  (4.97e-01): par = (-0.1988961 -1.974059 -3.523139 2.565856)
## 0.01016776  (1.35e-01): par = (-0.3653394 -1.897649 -3.547569 2.862685)
## 0.01005199  (3.22e-02): par = (-0.3253514 -1.909544 -3.55429 2.798951)
## 0.01004574  (8.13e-03): par = (-0.336659 -1.904219 -3.559615 2.821439)
## 0.01004534  (2.08e-03): par = (-0.3338447 -1.905399 -3.558815 2.816159)
## 0.01004532  (5.30e-04): par = (-0.3345701 -1.905083 -3.559067 2.817548)
## 0.01004531  (1.36e-04): par = (-0.3343852 -1.905162 -3.559006 2.817195)
## 0.01004531  (3.46e-05): par = (-0.3344325 -1.905142 -3.559022 2.817286)
## 0.01004531  (8.82e-06): par = (-0.3344204 -1.905147 -3.559018 2.817263)
## 0.01004531  (7.90e-06): par = (-3.559018 -0.3344204 2.817263 -1.905147)

## even adding central differences and  'scaleOffset' .. but that's not making 
big diff.:
try5 <- nls(conc ~ SSbiexp(time, A1, lrc1, A2, lrc2), data = NLSdata,
trace=TRUE, control=nls.control(warnOnly=TRUE, nDcentral=TRUE, 
scaleOffset = 1))
## 0.1369091 (1.43e-01): par = (-0.7623289 -2.116174 -2.339856 2.602446)
## 
## 0.01004531(5.43e-06): par = (-3.559006 -0.3343852 2.817195 -1.905162)
fitted(try5)
## [1] 0.6880142 1.2416734 1.3871354 1.3503718 1.1051246 0.8451185 0.6334280 
0.4717800
## [9] 0.2604932

all.equal(  coef(try4),   coef(try5)) # "Mean relative difference: 1.502088e-05"
all.equal(fitted(try4), fitted(try5)) # "Mean relative difference: 2.983784e-06"

## and a nice plot:
plot(NLSdata, ylim = c(0, 1.5), pch=21, bg="red")
abline(h=0, lty=3, col="gray")
lines(NLSdata$time, fitted(try5), lty=2, lwd=1/2, col="orange")
tt <- seq(0, 17, by=1/8)
str(pp <- predict(try5, newdata = list(time = tt)))
 ## num [1:137] -0.7418 -0.4891 -0.2615 -0.0569 0.1269 ...
 ## - attr(*, "gradient")= num [1:137, 1:4] 1 0.914 0.836 0.765 0.699 ...
 ##  ..- attr(*, "dimnames")=List of 2
 ##  .. ..$ : NULL
 ##  .. ..$ : chr [1:4] "A1" "lrc1" "A2" "lrc2"
lines(tt, pp, col=4)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Seeking opinions on possible change to nls() code

2021-08-20 Thread Martin Maechler
> J C Nash 
> on Fri, 20 Aug 2021 11:06:25 -0400 writes:

> In our work on a Google Summer of Code project
> "Improvements to nls()", the code has proved sufficiently
> entangled that we have found (so far!)  few
> straightforward changes that would not break legacy
> behaviour. One issue that might be fixable is that nls()
> returns no result if it encounters some computational
> blockage AFTER it has already found a much better "fit"
> i.e. set of parameters with smaller sum of squares.  Here
> is a version of the Tetra example:

time=c( 1,  2,  3,  4,  6 , 8, 10, 12, 16)
conc = c( 0.7, 1.2, 1.4, 1.4, 1.1, 0.8, 0.6, 0.5, 0.3)
NLSdata <- data.frame(time,conc)
NLSstart <-c(lrc1=-2,lrc2=0.25,A1=150,A2=50) # a starting vector (named!)
NLSformula <-conc ~ A1*exp(-exp(lrc1)*time)+A2*exp(-exp(lrc2)*time)
tryit <- try(nls(NLSformula, data=NLSdata, start=NLSstart, trace=TRUE))
print(tryit)

> If you run this, tryit does not give information that the
> sum of squares has been reduced from > 6 to < 2, as
> the trace shows.

> Should we propose that this be changed so the returned
> object gives the best fit so far, albeit with some form of
> message or return code to indicate that this is not
> necessarily a conventional solution? Our concern is that
> some examples might need to be adjusted slightly, or we
> might simply add the "try-error" class to the output
> information in such cases.

> Comments are welcome, as this is as much an infrastructure
> matter as a computational one.

Hmm...  many years ago, we had introduced the  'warnOnly=TRUE'
option to nls()  i.e., nls.control()  exactly for such cases,
where people would still like to see the solution:

So,

--
> try2 <- nls(NLSformula, data=NLSdata, start=NLSstart, trace=TRUE, 
  control = nls.control(warnOnly=TRUE))
61215.76(3.56e+03): par = (-2 0.25 150 50)
2.175672(2.23e+01): par = (-1.9991 0.3171134 2.618224 -1.366768)
1.621050(7.14e+00): par = (-1.960475 -2.620293 2.575261 -0.5559918)
Warning message:
In nls(NLSformula, data = NLSdata, start = NLSstart, trace = TRUE,  :
  singular gradient

> try2
Nonlinear regression model
  model: conc ~ A1 * exp(-exp(lrc1) * time) + A2 * exp(-exp(lrc2) * time)
   data: NLSdata
   lrc1lrc2  A1  A2 
 -22.89   96.43  156.70 -156.68 
 residual sum-of-squares: 218483

Number of iterations till stop: 2 
Achieved convergence tolerance: 7.138
Reason stopped: singular gradient

> coef(try2)
  lrc1   lrc2 A1 A2 
 -22.88540   96.42686  156.69547 -156.68461 


> summary(try2)
Error in chol2inv(object$m$Rmat()) : 
  element (3, 3) is zero, so the inverse cannot be computed
>
--

and similar for  vcov(), of course, where the above error
originates.

{ I think  GSoC (and other)  students should start by studying and
  exploring relevant help pages before drawing conclusions
  ..
  but yes, I've been born in the last millennium ...
}

;-)

Have a nice weekend!
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] difference of m1 <- lm(f, data) and update(m1, formula=f)

2021-08-11 Thread Martin Maechler
I'm diverting this from R-help to R-devel,

because I'm asking / musing if and if where we should / could
change R here (see below).

>>>>> Martin Maechler on 11 Aug 2021 11:51:25 +0200

>>>>> Tim Taylor .. on 08:45:48 + writes:

>> Manipulating formulas within different models I notice the following:

>> m1 <- lm(formula = hp ~ cyl, data = mtcars)
>> m2 <- update(m1, formula. = hp ~ cyl)
>> all.equal(m1, m2)
>> #> [1] TRUE
>> identical(m1, m2)
>> #> [1] FALSE
>> waldo::compare(m1, m2)
>> #> `old$call[[2]]` is a call
>> #> `new$call[[2]]` is an S3 object of class , a call

>> I'm aware formulas are a form of call but what I'm unsure
>> of is whether there is meaningful difference between the
>> two versions of the models? 

> A good question.
> In principle, the promise of an update()  method should be to
> produce the *same* result as calling the original model-creation
> (or more generally object-creation) function call.

> So, already with identical(), you've shown that this is not
> quite the case for simple lm(),
> and indeed that is a bit undesirable.

> To answer your question re "meaningful" difference,
> given what I say above is:
> No, there shouldn't be any relevant difference, and if there is,
> that may considered a bug in the respective update() method,
> here update.lm.

> More about this in the following  R code snippet :

Again, a repr.ex.:

---0<---0<---0<---0<---0<---0<---0<

m1 <- lm(formula = hp ~ cyl, data = mtcars)
m2  <- update(m1, formula. = hp ~ cyl)
m2a <- update(m1)
identical(m1, m2a)#>  TRUE !
## ==> calling update() & explicitly specifying the formula is "the problem"

identical(m1$call, m2$call) #> [1] FALSE
noCall <- function(x) x[setdiff(names(x), "call")]
identical(noCall(m1), noCall(m2))# TRUE!
## look closer:
c1 <- m1$call
c2 <- m2$call
str(as.list(c1))
## List of 3
##  $: symbol lm
##  $ formula: language hp ~ cyl
##  $ data   : symbol mtcars

str(as.list(c2))
## List of 3
##  $: symbol lm
##  $ formula:Class 'formula'  language hp ~ cyl
##   .. ..- attr(*, ".Environment")=
##  $ data   : symbol mtcars

identical(c1[-2], c2[-2]) # TRUE ==> so, indeed the difference is *only* in the 
formula ( = [2]) component
f1 <- c1$formula
f2 <- c2$formula
all.equal(f1,f2) # TRUE
identical(f1,f2) # FALSE

## Note that this is typically *not* visible if the user uses
## the accessor functions they should :
identical(formula(m1), formula(m2)) # TRUE !
## and indeed, the formula() method for 'lm'  does set the environment:
stats:::formula.lm

---0<---0<---0<---0<---0<---0<---0<

We know that it has been important in  R  that formulas have an
environment, and that's been the only R-core recommended way to
do non-standard evaluation (!! .. but let's skip that for now !!).

OTOH we have also kept the convention that a formula without
environment implicitly means its environment
is .GlobalEnv aka globalenv().
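
A quick sketch of that convention (my own illustration, not from the thread):

```r
f <- y ~ x                        # formulas capture their creation environment
environment(f)                    # <environment: R_GlobalEnv> at top level

f2 <- f
attr(f2, ".Environment") <- NULL  # strip the environment attribute
## by the convention above, f2 is treated as if its environment
## were globalenv(); print(f2) simply shows no environment line
f2
```

This is exactly the state of the formula stored in m1's call above, versus the
environment-carrying formula stored by update().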

Currently, I think formula() methods then *should* always return
a formula *with* an environment .. even though that's not
claimed in the reference, i.e., ?formula.

Also, the print() method for formulas by default does *not* show the
environment if it is .GlobalEnv, as you can see on that help
already in the "Usage" section:

 ## S3 method for class 'formula'
 print(x, showEnv = !identical(e, .GlobalEnv), ...)
 
Now, I've looked at the update.formula() involved here (called from
update.default()), and its source code currently is

update.formula <- function (old, new, ...)
{
tmp <- .Call(C_updateform, as.formula(old), as.formula(new))
## FIXME?: terms.formula() with "large" unneeded attributes:
formula(terms.formula(tmp, simplify = TRUE))
}

where the important part is the "FIXME" comment (seen in the R
sources, but no longer in the R function after installation).

My current "idea" is to formalize what we see working here:
namely allow  update.formula() to *not* set the environment of
its result *if* that environment would be .GlobalEnv ..

--> I'm starting to test my proposal
but would still be *very* glad for comments, also contradicting
ones!

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] problem with pipes, textConnection and read.dcf

2021-08-11 Thread Martin Maechler
> peter dalgaard 
> on Tue, 10 Aug 2021 22:00:16 +0200 writes:

> It's not a pipe issue:

>> textConnection(gsub(gsub(L, pattern = " ", replacement = ""), pattern = 
" ", replacement = ""))
> Error in textConnection(gsub(gsub(L, pattern = " ", replacement = ""),  : 
> argument 'object' must deparse to a single character string
>> textConnection(gsub(L, pattern = " ", replacement = ""))
> A connection with 
 
> description "gsub(L, pattern = \" \", replacement = \"\")"
> class   "textConnection"  
> mode"r"   
> text"text"
> opened  "opened"  
> can read"yes" 
> can write   "no"  

> I suppose the culprit is that the deparse(substitute(...)) construct in 
the definition of textConnection() can generate multiple lines if the object 
expression gets complicated.

>> textConnection
> function (object, open = "r", local = FALSE, name = 
deparse(substitute(object)), 
> encoding = c("", "bytes", "UTF-8")) 

> This also suggests that setting name=something might be a cure.

> -pd

Indeed.

In R 4.0.0, I had introduced the deparse1() short cut to be used
in place of  deparse() in such cases:

NEWS has said

• New function deparse1() produces one string, wrapping deparse(),
  to be used typically in deparse1(substitute(*)), e.g., to fix
  PR#17671.

and the definition is a simple but useful oneliner

  deparse1 <- function (expr, collapse = " ", width.cutoff = 500L, ...) 
  paste(deparse(expr, width.cutoff, ...), collapse = collapse)
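
For example (my own illustration), an expression whose deparse exceeds the
default width.cutoff yields several strings, while deparse1() always yields
exactly one:

```r
expr <- quote(gsub(gsub(L, pattern = " ", replacement = ""),
                   pattern = " ", replacement = ""))
length(deparse(expr))    # typically > 1: the deparse spans several lines
length(deparse1(expr))   # always exactly 1
```

This multi-line deparse is precisely what makes the default
name = deparse(substitute(object)) in textConnection() fail.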


So I'm almost sure we should use  deparse1() in textConnection
(and will make check and potentially commit that unless ...)

Martin


>> On 10 Aug 2021, at 21:33 , Gabor Grothendieck  
wrote:
>> 
>> This gives an error bit if the first gsub line is commented out then 
there is no
>> error even though it is equivalent code.
>> 
>> L <- c("Variable:id", "Length:112630 ")
>> 
>> L |>
>> gsub(pattern = " ", replacement = "") |>
>> gsub(pattern = " ", replacement = "") |>
>> textConnection() |>
>> read.dcf()
>> ## Error in textConnection(gsub(gsub(L, pattern = " ", replacement = 
""),  :
>> ##  argument 'object' must deparse to a single character string
>> 
>> That is this works:
>> 
>> L |>
>> # gsub(pattern = " ", replacement = "") |>
>> gsub(pattern = " ", replacement = "") |>
>> textConnection() |>
>> read.dcf()
>> ##  Variable Length
>> ## [1,] "id" "112630"
>> 
>> R.version.string
>> ## [1] "R version 4.1.0 RC (2021-05-16 r80303)"
>> win.version()
>> ## [1] "Windows 10 x64 (build 19042)"
>> 
>> -- 
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd@cbs.dk  Priv: pda...@gmail.com

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Workaround for code/documentation mismatch

2021-08-11 Thread Martin Maechler
> Hello,
> 
> I've written two functions to emulate do while/until loops seen in other
> languages, but I'm having trouble documenting its usage. The function is
> typically used like:
> 
> do ({
> expr1
> expr2
> ...
> }) %while% (cond)

I understand that you did *not* ask .. but really
why don't you want to use R's own
builtin, fast, well documented, present everywhere *and* simpler syntax

while(cond) {
  expr1
  expr2
  ...
}   

???

and also

   repeat {
 expr1
 expr2
 
 if(cond) break
   }

instead of your   %until%  below?


> so I want to document it something like:
> 
> do(expr) %while% (cond)
> do(expr) %until% (cond)
> 
> to look like the documentation for 'while' and 'if', but R CMD check
> produces a "Code/documentation mismatch" warning, complaining that the
> documentation should look like:
> 
> expr %while% cond
> expr %until% cond
> 
> So, my question is, is there a way to bypass the
> * checking for code/documentation mismatches
> portion of R CMD check, at least for one file? Some way to acknowledge that
> the code and documentation will mismatch, but that's okay.
> 
> 
> Thank you!

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] attach "warning" is a message

2021-08-10 Thread Martin Maechler
> Simon Urbanek 
> on Tue, 10 Aug 2021 11:28:19 +1200 writes:

> Barry,

> it is not a warning nor plain output, it is a message, so you can use 

>> d = data.frame(x=1:10)
>> x=1
>> suppressMessages(attach(d))
>> 

> Looking at the history, this used to be cat() but got changed to a 
message in R 3.2.0 (r65385, CCing Martin in case he remembers the rationale for 
warning vs message). I don't know for sure why it is not a warning, but I can 
see that it is more in line with informative messages (like package 
masking) as opposed to a warning - in fact the change suggests that the 
intention was to synchronize both. That said, I guess the two options are to 
clarify the documentation (also in library()) or change to a warning - not sure 
what the consequences of the latter would be.

> Cheers,
> Simon

Thank you, Simon.
One rationale back then was that  message() *is* in some sense
closer to cat()  than warning() is (and indeed, to synchronize
with the conflict messages from `library(.)` or `require(.)`).
Also, I would say that a message is more appropriate here than a
warning: when someone explicitly attach()es something to the
search() path, they may get a notice about masking, but to be
warned is too strong {warnings being turned into errors in some setups}.

So I'd propose to only update the documentation i.e.
help(attach).

Martin
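
Because it is a message (not a warning), it is trapped with message handlers
rather than options(warn=) — a small sketch of both options:

```r
d <- data.frame(x = 1:10); x <- 1

## suppress the masking notice entirely:
suppressMessages(attach(d)); detach(d)

## or inspect it and muffle it selectively:
withCallingHandlers(
  attach(d),
  message = function(m) {
    cat("caught:", conditionMessage(m))
    invokeRestart("muffleMessage")
  })
detach(d)
```

This is the behavior Barry stumbled over: an options(warn=2) setting has no
effect on these notices, since no warning condition is ever signalled.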


>> On 10/08/2021, at 2:06 AM, Barry Rowlingson 
 wrote:
>> 
>> If I mask something via `attach`:
>> 
>>> d = data.frame(x=1:10)
>>> x=1
>>> attach(d)
>> The following object is masked _by_ .GlobalEnv:
>> 
>> x
>>> 
>> 
>> I get that message. The documentation for `attach` uses the phrase
>> "warnings", although the message isn't coming from `warning()`:
>> 
>> warn.conflicts: logical.  If ‘TRUE’, warnings are printed about
>> ‘conflicts’ from attaching the database, unless that database
>> contains an object ‘.conflicts.OK’.  A conflict is a function
>> masking a function, or a non-function masking a non-function.
>> 
>> and so you can't trap them with options(warn=...) and so on. This sent me
>> briefly down the wrong track while trying to figure out why R was 
showing a
>> masking error in one context but not another - I wondered if I'd 
supressed
>> warning()s in the other context.
>> 
>> Personally I'd like these messages to be coming from warning() since that
>> seems the appropriate way to warn someone they've done something which
>> might have unwanted effects. But fixing the documentation to say "If
>> ‘TRUE’, *messages* are printed" is probably less likely to break existing
>> code.
>> 
>> Happy to add something to bugzilla if anyone thinks I'm not being overly
>> pedantic here.
>> 
>> Barry
>> 
>> [[alternative HTML version deleted]]
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Tools beyond roxygen2 to write Rd documentation

2021-08-05 Thread Martin Maechler
>>>>> Iago Giné-Vázquez 
>>>>> on Thu, 05 Aug 2021 08:09:48 + writes:

> Dear all, Are there any tools to edit .Rd files by hand
> (so avoiding roxygen2), as an editor (like could be
> *RdStudio*, à la RStudio or TeXstudio) or a plugin for ViM
> or another editor (like could be *vim-Rd*, à la
> vim-latex)?

> Thanks for your answers,

> Stay safe, Iago

Yes, of course:
"All good" R package authors edit *.Rd files by hand
((;-), well no, I know it's not true ..
 I personally use roxygen (in ESS, see below) only sometimes at the very
 beginning of writing a new function, or for functions that I
 do *not* export, because there's a nice key stroke to create
 the outline). ... and if I export that function, I use the
 Roxygen -> Rd conversion *once* only, and from then on, I edit the *.Rd nicely
 including using descriptions lists, some math, etc... all well
 indented, nicely human-readable etc, much nicer than hidden in
 roxygen R comments)

Rstudio has supported *.Rd in the past but as I'm not using it
regularly .. I don't know if they dropped it now... that would
honestly surprise me, though.

Rather, ESS (Emacs Speaks Statistics) has always supported
Rd editing well, and I thought the  vim plugin for R would also
support *.Rd  but probably not ??

Best,
Martin

--
Martin Maechler
ETH Zurich,  R Core Team  (*and* ESS Core team)

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] I changed my vignette's file name to lowercase, then realized the url was case-sensitive

2021-08-02 Thread Martin Maechler
Dear Dominic,

This is the wrong mailing list for such questions
Do use 'R-help' or 'R-package-devel' instead, please.

(and also please do use  __plain text__
 instead of  "formatted" / "rich text" / ... e-mail  )

Best,
Martin Maechler


>>>>> Dominic Comtois 
>>>>> on Mon, 2 Aug 2021 02:36:27 -0400 writes:

> I changed my "Introduction.html" vignette's name to
> "introduction.html", realizing only after the fact that
> CRAN's URLs are case sensitive.

> Would the solution of adding to my package's source a new
> Introduction.html file pointing to introduction.html using
> a  be a viable one? Or is
> there maybe another, better solution?

> Thanks in advance

> Dominic Comtois, summarytools author & maintainer

>   [[alternative HTML version deleted]]

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] \Sexpr[results=hide] produces \verb{ newlines }

2021-07-31 Thread Martin Maechler
> Ivan Krylov 
> on Thu, 29 Jul 2021 17:48:38 +0200 writes:

> Hello R-devel!
> 
> Here's an Rd file that produces a large empty area when converted to
> HTML:
> 
> \name{repro}
> \title{title}
> \description{description}
> \details{
>   Hello
>   \Sexpr[stage=build,results=hide]{
> invisible(NULL)
> invisible(NULL)
> invisible(NULL)
> invisible(NULL)
> invisible(NULL)
> invisible(NULL)
> invisible(NULL)
> invisible(NULL)
> invisible(NULL)
> invisible(NULL)
> invisible(NULL)
> "" # workaround: remove results=hide and use the return value
>   }
> }
> 
> This seems to happen because \Sexpr gets expanded to \verb{ as many
> newlines as there were code lines } by processRdChunk, by first storing
> a newline for each line of the code:
> 
> https://github.com/wch/r-source/blob/d7a4ed9aaeee1f57c3c165aefa08b8d69dfe59fa/src/library/tools/R/RdConv2.R#L257
> 
> ...and then the newlines get translated to \verb because res is
> not empty:
> 
> https://github.com/wch/r-source/blob/d7a4ed9aaeee1f57c3c165aefa08b8d69dfe59fa/src/library/tools/R/RdConv2.R#L332
> 
> As long as Rd above doesn't stem from my misuse of \Sexpr, I would like
> to propose the following patch, which seems to fix the problem:
> 
> Index: src/library/tools/R/RdConv2.R
> ===
> --- src/library/tools/R/RdConv2.R (revision 80675)
> +++ src/library/tools/R/RdConv2.R (working copy)
> @@ -329,6 +329,8 @@
>   }
>   } else if (options$results == "text")
>   res <- tagged(err, "TEXT")
> + else if (options$results == "hide")
> + res <- tagged("", "COMMENT")
>   else if (length(res)) {
>   res <- lapply(as.list(res), function(x) tagged(x, "VERB"))
>   res <- tagged(res, "\\verb")
> 
> There are probably other ways of fixing this problem, e.g. by only
> populating res if options$results != "hide" or only appending newlines
> if res is non-empty.

Thank you, Ivan, for the example and patch,

I have implemented a version of your patch in my local copy of
R-devel and tested your example, also with  Rd2latex() ..
interestingly   Rd2txt()  does not produce the extra new lines
even without your patch.

I plan to commit your proposal after the weekend unless someone has
reasons against that.

Best regards,
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Trivial typo in help("browser")

2021-07-29 Thread Martin Maechler
> Rui Barradas 
> on Thu, 29 Jul 2021 07:52:02 +0100 writes:

> Hello,

> R 4.1.0 on Ubuntu 20.04, session info below.

> I am not an English native speaker but in help("browser"),
> section Arguments, argument expr, there is a wrong verb
> tense:

> "invoked" should be "invoke", right?

> expr An expression, which if it evaluates to TRUE the
> debugger will invoked, otherwise control is returned
> directly.

> sessionInfo()   R version 4.1.0  .

Thank you, Rui.   Indeed, there's a typo there.
I claim that it is a missing 'be':  "be invoked" almost surely
was intended.

As we are on the R-devel mailing list, however, let's dig and learn a bit more:

Note that the *default*  is  `expr = TRUE`
which is already a bit "special" for an "expression"..

Let's try to understand what was meant --- NB a strategy I
strongly recommend even if you're somewhat experienced :

> ff <- function(x) { y <- x^2; browser("in ff():", expr = (y == 4)); y }
> ff(1)
[1] 1
> ff(2)
Called from: ff(2)
Browse[1]> debug at #1: y
Browse[2]> ls.str()
x :  num 2
y :  num 4
Browse[2]> c
[1] 4
> ff(3)
[1] 9
> 

So indeed, it does behave as I expected.
A further experiment, play with

   f2 <- function(x, e=1) { y <- x^2; browser("in ff():", expr = e); y }

shows that  'evaluates to TRUE'  is also
not as precise as it could be, and even "wrong":
 'expr = pi'  also behaves as TRUE,  and even
 'expr = NA'  behaves the same.


I don't know when/how  `expr` was introduced (probably taken
from 'S / S+' ..), but to me it seems actually somewhat a
misnomer because in that generalized sense, *every* 
argument passed to an R function is an "expression".
Instead, what counts is whether a low-level as.logical(expr) is FALSE or not.
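A small illustration of that coercion rule (my own sketch, not part of
the original mail): the debugger is triggered by anything that does not
coerce to FALSE.

```r
## only the coercions are shown; browser() itself is interactive
as.logical(pi)  # TRUE  -- any nonzero number coerces to TRUE
as.logical(0)   # FALSE -- the one kind of value that suppresses browser()
as.logical(NA)  # NA    -- "not FALSE", so the debugger is still entered
```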

So, yes, the documentation about `expr` definitely needs to be
changed.

Unless I get convincing suggestions for improvements, I'll commit

  \item{expr}{a \dQuote{condition}.  By default, and whenever not false
after being coerced to \code{\link{logical}}, the debugger will be
invoked, otherwise control is returned directly.}

(and also amend the formulation a bit later on the help page
 where expr is mentioned again).

Martin


> Thanks to all R Core Team members for their great work for
> all of us.

> Hope this helps,

> Rui Barradas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] package test returns error when R version 4.1.0

2021-07-06 Thread Martin Maechler
>>>>>   
>>>>> on Mon, 5 Jul 2021 17:46:54 -0400 writes:

> Gm:


> Did you try completely removing the LazyData line from the
> description file?


> David

Dear David,

where did you get this "idea" that 'LazyData'  is not good  for
 R 4.1.0 and newer ?

R's own  {datasets} package  *does* use lazyloading, and so do
most (formally) recommended packages, and I think most packages
I (co-)maintain, i.e., around two dozen CRAN packages do use
lazyloaded data.

---
*) The `Matrix` package is a big exception with 'Lazyload: No'
because its datasets are partly (S4-) classed objects from the package
itself, and -- as the "WRE" ('Writing R Extensions') manual
states -- package datasets must not *need* the package itself when
they should be lazy loaded.

Martin

--
Martin Maechler
ETH Zurich  and  R Core Team

 
> From: Gianmarco Alberti ..
> Sent: Monday, July 5, 2021 5:13 PM 
...

> Hello,

> Thank you all for the suggestions.

> I am starting being a bit worried because I seem not being
> able to fix the issue.

[..]

> I also tried to keep the new dependency and to drop the
> lazy download instead (in DESCRIPTION I have put LazyData:
> false). I got the same results as above.

> The package checks perfectly on my MAC, and checked
> perfectly when I asked a Win users to test the package on
> his PC (with the latest version of R).

> I am really scratching my head.

[]

> On 5 Jul 2021, 13:25 +0200, dbosa...@gmail.com wrote:


> For the lazy loading error, if you are not intentionally
> lazy loading data, you should remove the lazy loading
> entry from the description file. Previously this was not
> causing any problems with the CRAN checks, but now it is.

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] undefined subclass warning

2021-07-01 Thread Martin Maechler
> Ben Bolker 
> on Wed, 30 Jun 2021 20:23:27 -0400 writes:

>A colleague recently submitted a paper to JSS and was
> advised to address the following warning which occurs when
> their package
> (https://CRAN.R-project.org/package=pcoxtime) is loaded:

> Warning message: In .recacheSubclasses(def@className, def,
> env) : undefined subclass "numericVector" of class
> "Mnumeric"; definition not updated

> After much digging I *think* I've concluded that this
> comes from the following import chain:

> pcoxtime -> riskRegression -> rms -> quantreg ->
> MatrixModels

>that is, loading any of these packages throws the
> warning.  MatrixModels Imports: *only* {stats, methods,
> Matrix} and loading these by themselves is warning-less.

> I assume there is some mismatch/incompatibility
> between MatrixModels (which was last updated 2021-03-01)
> and Matrix (2021-05-24), which has this NEWS item in the
> most recent release 1.3-3:

> * removed the nowhere used (and unexported but still
> active) class union "Mnumeric" which actually trickled
> into many base classes properties.  Notably would it break
> validity of factor with a proposed change in validity
> checking, as factors were also "Mnumeric" but did not
> fulfill its validity method. Similarly removed (disabled)
> unused class union "numericVector".

> It seems that REINSTALLING the package from source
> solves the problem, which is nice, but I don't fully
> understand why; I guess   there are class structures
> that are evaluated at install time and stored in the
> package environment ...

>Any more explanations would be welcome.
>cheers Ben

Yes, Ben,
you are right on spot and very close with your final guess.

Installation and even building of packages using S4 classes (their own *or*
those of other packages they import from) does store the class
definitions already in the binary "dump" of all the R code.

So yes, the Matrix cleanup (dropping unused classes, actually
helping/improving the class hierarchy by making it slightly
simpler) does necessitate re-installation of direct Matrix
dependencies in order to avoid the above warning --- which
otherwise has zero consequences.
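As a practical aside (my sketch, not part of the reply above): refreshing
the stale install-time caches amounts to reinstalling the affected
packages from source, e.g.,

```r
## flag packages whose binaries were built under an older setup, then
## rebuild MatrixModels against the currently installed Matrix
update.packages(checkBuilt = TRUE)
install.packages("MatrixModels", type = "source")
```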

So the referees of your colleague's paper / package should really learn
that they are wrong in their requirement of getting rid of that
warning.

and as you've suggested in another thread, I should alleviate
the problem by uploading a new version of 'MatrixModels'
to CRAN {solving another small unrelated buglet} so the warnings
will go away for everyone who updates their installed packages.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] How to communicate WARNINGS fixed from *last* CRAN version of a package

2021-06-22 Thread Martin Maechler
> Alberto Garre 
> on Tue, 22 Jun 2021 10:52:26 +0200 writes:

> Thanks! It is the first time I got this message, so I was a bit puzzled
> about what to do. I will be patient, then :)

> Alberto

well, as Sebastian said

>> The auto-check e-mail said "Hence please reply-all and explain". If you
>> followed these instructions, you will just need some more patience.

so, did you "reply-all" to that automated e-mail and explain
that you fixed them?

You need to do this even when you had it in the "submitter's comments".

Martin



> El mar, 22 jun 2021 a las 10:50, Sebastian Meyer ()
> escribió:

>> Am 22.06.21 um 10:11 schrieb Alberto Garre:
>> > Hi,
>> >
>> > I submitted yesterday a new version of the biogrowth package (
>> > https://cran.r-project.org/package=biogrowth). In the automatic
>> response, I
>> > got the following message:
>> >
>> > The auto-check found additional issues for the *last* version released on CRAN:
>> > CRAN:
>> >   donttest 
>> >   M1mac 
>> > CRAN incoming checks do not test for these additional issues.
>> > Hence please reply-all and explain: Have these been fixed?
>> >
>> > I resubmitted mentioning in cran-comments.md that these problems had been
>> > resolved, but I got again the same automatic response. Then, I answered
>> > directly to the automatic email with no effect.
>> >
>> > How should I communicate these issues have been fixed? Is there any
>> > additional issue I am not seeing?
>> 
>> The auto-check e-mail said "Hence please reply-all and explain". If you
>> followed these instructions, you will just need some more patience. You
>> submitted only yesterday evening and these checks require manual
>> inspection by the CRAN team. I can see your submission in the "inspect"
>> folder of the incoming queue (https://CRAN.R-project.org/incoming/).
>> 
>> Best regards,
>> 
>> Sebastian Meyer
>> 
>> >
>> > Thank you,
>> > Alberto
>> >
>> >   [[alternative HTML version deleted]]
>> >
>> > __
>> > R-package-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
>> >
>> 

> [[alternative HTML version deleted]]

> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Should last default to .Machine$integer.max-1 for substring()

2021-06-21 Thread Martin Maechler
>>>>> Tomas Kalibera 
>>>>> on Mon, 21 Jun 2021 10:08:37 +0200 writes:

    > On 6/21/21 9:35 AM, Martin Maechler wrote:
>>>>>>> Michael Chirico
>>>>>>> on Sun, 20 Jun 2021 15:20:26 -0700 writes:
>> > Currently, substring defaults to last=1000000L, which
>> > strongly suggests the intent is to default to "nchar(x)"
>> > without having to compute/allocate that up front.
>> 
>> > Unfortunately, this default makes no sense for "very
>> > large" strings which may exceed 100L in "width".
>> 
>> Yes;  and I tend to agree with you that this default is outdated
>> (Remember :  R was written to work and run on 2 (or 4?) MB of RAM on the
>> student lab  Macs in Auckland in ca 1994).
>> 
>> > The max width of a string is .Machine$integer.max-1:
>> 
>> (which Brodie showed was only almost true)
>> 
>> > So it seems to me either .Machine$integer.max or
>> > .Machine$integer.max-1L would be a more sensible default. Am I missing
>> > something?
>> 
>> The "drawback" is of course that .Machine$integer.max  is still
>> a function call (as R beginners may forget) contrary to 1000000L,
>> but that may even be inlined by the byte compiler (? how would we check ?)
>> and even if it's not, it does more clearly convey the concept
>> and idea  *and* would probably even port automatically if ever
>> integer would be increased in R.

> We still have the problem that we need to count characters, not bytes, 
> if we want the default semantics of "until the end of the string".

> I think we would have to fix this either by really using 
> "nchar(type="c"))" or by using e.g. NULL and then treating this as a 
> special case, that would be probably faster.

> Tomas

You are right, as always, Tomas.
I agree that would be better and we should do it if/when we change
the default there.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Should last default to .Machine$integer.max-1 for substring()

2021-06-21 Thread Martin Maechler
> Michael Chirico 
> on Sun, 20 Jun 2021 15:20:26 -0700 writes:

> Currently, substring defaults to last=1000000L, which
> strongly suggests the intent is to default to "nchar(x)"
> without having to compute/allocate that up front.

> Unfortunately, this default makes no sense for "very
> large" strings which may exceed 100L in "width".

Yes;  and I tend to agree with you that this default is outdated
(Remember :  R was written to work and run on 2 (or 4?) MB of RAM on the
 student lab  Macs in Auckland in ca 1994).

> The max width of a string is .Machine$integer.max-1:

  (which Brodie showed was only almost true)

> So it seems to me either .Machine$integer.max or
> .Machine$integer.max-1L would be a more sensible default. Am I missing
> something?

The "drawback" is of course that .Machine$integer.max  is still
a function call (as R beginners may forget) contrary to 1000000L,
but that may even be inlined by the byte compiler (? how would we check ?)
and even if it's not, it does more clearly convey the concept
and idea  *and* would probably even port automatically if ever
integer would be increased in R.
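The truncation the default causes can be seen directly (a sketch of the
behaviour under discussion):

```r
x <- strrep("a", 2e6)                        # a 2-million-character string
nchar(substring(x, 1))                       # 1000000 -- cut at the default 'last'
nchar(substring(x, 1, .Machine$integer.max)) # 2000000 -- the whole string
```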

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] dgTMatrix Segmentation Fault

2021-06-10 Thread Martin Maechler
>>>>> Ben Bolker 
>>>>> on Wed, 9 Jun 2021 21:11:18 -0400 writes:

> Nice!

Indeed -- and thanks a lot, Dario (and Martin Morgan !) for
getting down to the root problem.

so, indeed a bug in Matrix (though "far away" from 'dgTMatrix').

Thank you once more!

Martin Maechler
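The int-index overflow pinpointed in the quoted analysis below can be
mimicked at the R level (a sketch; C's int silently wraps where R's
integer arithmetic warns and returns NA):

```r
.Machine$integer.max   # 2147483647, the largest 32-bit signed integer
2147483647L + 1L       # NA, with an "integer overflow" warning in R;
                       # the equivalent C 'int' expression wraps negative
```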

> On 6/9/21 9:00 PM, Dario Strbenac via R-devel wrote:
>> Good day,
>> 
>> Thanks to handy hints from Martin Morgan, I ran R under gdb and checked 
for any numeric overflow. We pinpointed the cause:
>> 
>> (gdb) info locals
>> i = 0
>> j = 10738
>> m = 20
>> n = 5
>> ans = 0x5b332790
>> aa = 0x5b3327c0
>> 
>> There is a line of C code in dgeMatrix.c:  for (i = 0; i < m; i++) aa[i] += xx[i + j * m];
>> 
>> i, j and m are all int, so i + j * m overflows:
>> (lldb) print 0 + 10738 * 20
>> (int) $5 = -2147367296
>> 
>> So, either the code should check that this doesn't occur, or be adjusted to allow for large indexes.
>> 
>> If anyone is interested, this is in the context of single-cell ATAC-seq data, which typically has about 20 genomic regions (rows) and perhaps 10 biological cells (columns).
>> 
>> --
>> Dario Strbenac
>> University of Sydney
>> Camperdown NSW 2050
>> Australia
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] dgTMatrix Segmentation Fault

2021-06-08 Thread Martin Maechler
>>>>> Dario Strbenac 
>>>>> on Tue, 8 Jun 2021 09:00:04 + writes:

> Good day, Indeed, changing the logical test is a
> workaround to the problem. However, a segmentation fault
> means that the software tried to access an invalid memory
> location, so I think the original problem should be
> addressed in Matrix package, regardless.

Hmm, you may be right, or not ..

Note we have the situation where you (via R) ask your computer
(i.e., the OS's memory allocation routines) to provide
memory.

In a reasonable setup, the OS routine returns, saying
"I cannot provide the memory you asked for",
and the R function stop() s. .. no segfault, all is fine.

The fact that this does not work on some platforms is a
relatively deep problem, and it has also happened in base R in some
cases on some platforms (possibly never on Linux-based ones
such as Ubuntu, Debian, Fedora, CentOS, ..., but maybe I'm too
optimistic there as well).

Note: I now also tried on our oldish Windows (Terminal) Server,
and it also just gave errors that it could not allocate so much
memory but did not produce a seg.fault.


Currently, I don't see what we should improve in the Matrix
package here.

Martin Maechler
(co-maintainer of 'Matrix')

> --
> Dario Strbenac University of Sydney Camperdown NSW 2050
> Australia

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Question about preventing CRAN package archival

2021-06-03 Thread Martin Maechler
>>>>> Avraham Adler 
>>>>> on Wed, 2 Jun 2021 15:28:25 -0400 writes:

> Exactly. is.square is just nrow == ncol; is.positive.definite can be
> implemented as a check that all eigenvalues > 0, which can be done with
> base; and is.symmetric can simply be brute-forced in a loop comparing i,j
> with j,i.

> The fewer dependencies a package has, the more robust it is. It’s a fine
> balance between not reinventing the wheel and ceding too much stability to
> other packages.

> Thanks,
> Avi

Indeed.  Further,

-  isSymmetric()  has been part of base for a long time
   so the definition of an alternative in matrixcalc  had been @_*#^$%
   It's also supported by methods in the Matrix package
   e.g. for sparse matrices etc  so definitely something you
   "should" use instead.

-  is.square() is trivial and I think an explicit check such as
   { d <- dim(x);  d[1] == d[2] }
   is often more sensical, notably as in many of your functions
   you'd need either nrow(.) or ncol(.) of your matrix anyway.

- A remark on Positive Definiteness (or also, often what you really want,
   "Positive Semi-definiteness", allowing 0 eigenvalues):
  The  Matrix  package has an (S4) class  "dpoMatrix"
  of dense positive-definite (actually 'positive semi-definite') matrices.
  In its validation method (yes, formal classes have validation!), we
  use a cheap test instead of an expensive test with eigenvalues
  (which is "too expensive": there are faster ones at least in theory,
   e.g., trying an  LDL' Cholesky decomposition and returning as soon as
   a non-positive / negative entry in D would appear).

  The really cheap "pre-test" you may want to use  before or instead
  of doing one of the more expensive ones is simply checking the diagonal:

   if(any(diag(.) <  0)) "not positive-semidefinite"
   if(any(diag(.) <= 0)) "not positive-definite"
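Put together as plain R, the scheme looks like this (my own sketch with an
explicit eigenvalue fallback, for a real symmetric matrix x; this is not
the dpoMatrix validity code):

```r
## cheap O(n) diagonal pre-test first, eigenvalues only if it passes
is_pos_def <- function(x, tol = 1e-8) {
    if(any(diag(x) <= 0)) return(FALSE)  # necessary condition
    all(eigen(x, symmetric = TRUE, only.values = TRUE)$values > tol)
}
is_pos_def(diag(2))                   # TRUE
is_pos_def(matrix(c(1, 2, 2, 1), 2)) # FALSE: eigenvalues are 3 and -1
```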

Martin Maechler
(Maintainer of 'Matrix').


> On Wed, Jun 2, 2021 at 3:15 PM John Harrold 
> wrote:

>> To add another option. In the past when this has happened to me I've 
found
>> other packages that provide similar functionality.
>> 
>> I'm assuming that is.square just checks the number of columns == number 
of
>> rows? And the others can probably be implemented pretty easily.
>> 
>> On Wed, Jun 2, 2021 at 10:41 AM Ben Staton  wrote:
>> 
>> > My package uses the MIT license, so would that not meet the 
compatibility
>> > requirements?
>> >
>> > I will attempt to reach out to the package author - thanks for your 
help!
>> >
>> > On Wed, Jun 2, 2021 at 10:31 AM Ben Bolker  wrote:
>> >
>> > > That all sounds exactly right.
>> > >GPL >= 2 allows you to use the material without asking permission 
as
>> > > long as your package is compatibly licensed (e.g. also GPL).
>> > >Under normal circumstances it would be polite to ask permission, 
but
>> > > if the reason for doing this is that the maintainer is unreachable in
>> > > the first place ...
>> > >
>> > >   If you want to try a little harder, it seems quite possible that 
you
>> > > can reach the matrixcalc maintainer at the (personal) e-mail address
>> > > shown in this page:
>> > >
>> > >
>> >
>> 
https://www.facebook.com/photo/?fbid=10208324530363130=ecnf.1000413042
>> > >
>> > >(Possibly an identity confusion, but I rate that as unlikely based
>> on
>> > > other facebook snooping)
>> > >
>> > >I don't think a short, polite e-mail request would be out of 
bounds,
>> > > they can always ignore it or tell you to go away.
>> > >
>> > >cheers
>> > > Ben Bolker
>> > >
>> > > On 6/2/21 1:15 PM, Ben Staton wrote:
>> > > > Hello,
>> > > >
>> > > > Thank you for your detailed list of solutions.
>> > > >
>> > > > I was initially tempted to go with option 1 (move matrixcalc to
>> > suggests
>> > > > and check for its existence before using functions that rely on 
it),
>> > but
>> > > as
>> > > > mentioned, this is not a long term fix.
>> > > >
>> > > > I unfortunately can't take on the responsibilities of option 2
>

Re: [Rd] `mode`

2021-05-18 Thread Martin Maechler
>>>>> Dmichael Parrish via R-devel 
>>>>> on Tue, 18 May 2021 02:05:04 + (UTC) writes:

> Hello, Kindly revise the documentation for `mode` to reflect:
>   foo <- function () {}
>   typeof(foo)  # [1] "closure"
>   mode(foo)    # [1] "function"


> `help(mode)` states: Modes have the same set of names as
> types (see typeof) except that

>     types "integer" and "double" are returned as
> "numeric".

>     types "special" and "builtin" are returned as
> "function".

>     type "symbol" is called mode "name".

>     type "language" is returned as "(" or "call".


Indeed, that help file is missing  "closure", ...
amazingly, for all the history of R (of 25+ years).

Thank you!

I've already fixed this in the sources' trunk (svn rev 80321) a
minute ago; of course this will not make it anymore into R 4.1.0,
but into its "patched" version, and then 4.1.1 and so on.

With thankful regards,
Martin

--
Martin Maechler
ETH Zurich  and  R Core team


> I am presently reading `help(mode)` on:

> write.dcf(R.Version()) # platform: x86_64-w64-mingw32 #
> arch: x86_64 # os: mingw32 # system: x86_64, mingw32 #
> status: # major: 4 # minor: 0.3 # year: 2020 # month: 10 #
> day: 10 # svn rev: 79318 # language: R # version.string: R
> version 4.0.3 (2020-10-10) # nickname: Bunny-Wunnies Freak
> Out

 
> __ Hmo < 0.1 L tanh kd ---Miche, 1951 / I have
> placed the sand for the bound of the sea... and though the
> waves thereof toss themselves... they cannot pass over it
> ---YHWH, ca. 600 B.C. (Jer. 5:22)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R-devel new warning: no longer be an S4 object

2021-05-10 Thread Martin Maechler
> Jan Gorecki 
> on Mon, 10 May 2021 12:42:09 +0200 writes:

> Hi R-devs,
> R 4.0.5 gives no warning. Is it expected? Searching the news for "I("
> doesn't give any info. Thanks

> z = I(getClass("MethodDefinition"))

Now what exactly did you intend with the above line ?

I'm bold and say (for the moment) that the above line has always
been very dubious if not misleading,
and this "fact" is now finally revealed by the warning

> Warning message:
> In `class<-`(x, unique.default(c("AsIs", oldClass(x :
> Setting class(x) to multiple strings ("AsIs", "classRepresentation",
> ...); result will no longer be an S4 object

So, yes, the change has been on purpose to warn about problems,
you'd get later when trying to work with 'z'.


> [[alternative HTML version deleted]]

   (your fault: do use plain text, aka  MIME type 'text/plain')

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Testing R build when using --without-recommended-packages?

2021-05-05 Thread Martin Maechler
> Gabriel Becker 
> on Tue, 4 May 2021 14:40:22 -0700 writes:

> Hmm, that's fair enough Ben, I stand corrected.  I will say that this seems
> to be a pretty "soft" recommendation, as these things go, given that it
> isn't tested for by R CMD check, including with the --as-cran extensions. In
> principle, it seems like it could be, similar checks are made in package
> code for inappropriate external-package-symbol usage.

> Either way, though, I suppose I have a number of packages which have been
> invisibly non-best-practices compliant for their entire lifetimes (or at
> least, the portion of that where they had tests/vignettes...).

> Best,
> ~G

> On Tue, May 4, 2021 at 2:22 PM Ben Bolker  wrote:

>> Sorry if this has been pointed out already, but some relevant text
>> from
>> 
>> 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Suggested-packages
>> 
>> > Note that someone wanting to run the examples/tests/vignettes may not
>> have a suggested package available (and it may not even be possible to
>> install it for that platform). The recommendation used to be to make
>> their use conditional via if(require("pkgname")): this is OK if that
>> conditioning is done in examples/tests/vignettes, although using
>> if(requireNamespace("pkgname")) is preferred, if possible.
>> 
>> ...
>> 
>> > Some people have assumed that a ‘recommended’ package in ‘Suggests’
>> can safely be used unconditionally, but this is not so. (R can be
>> installed without recommended packages, and which packages are
>> ‘recommended’ may change.)


Thank you all (Henrik, Gabe, Dirk & Ben) !

I think it would be a good community effort, and worth the time
also of R core, to move further in the right direction
as Dirk suggested.

I think we all agree it would be nice if Henrik (and anybody)
could use  'make check' on R's own sources after using
 --without-recommended-packages

Even one more piece of evidence is the   tests/README   file in
the R sources.  It has much more but simply starts with

---
There is a hierarchy of check targets:

 make check

for all builders.  If this works one can be reasonably happy R is working
and do `make install' (or the equivalent).

make check-devel

for people changing the code: this runs things like the demos and
no-segfault which might be broken by code changes, and checks on the
documentation (effectively R CMD check on each of the base packages).
This needs recommended packages installed.

make check-all

runs all the checks, those in check-devel plus tests of the recommended
packages.

Note that for complete testing you will need a number of other
..
..

---

So, our (R core) own intent has been that   'make check'  should
run w/o rec.packages  but further checking not.

So, yes, please, you are encouraged to send patches against the
R devel trunk  to fix such examples and tests.

Best,
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Inconsistency in median()

2021-05-04 Thread Martin Maechler
>>>>> Gustavo Zapata Wainberg 
>>>>> on Mon, 3 May 2021 20:48:49 +0200 writes:

> Hi!

> I'm wrinting this post because there is an inconsistency
> when median() is calculated for even or odd vectors. For
> odd vectors, attributes (such as labels added with Hmisc)
> are kept after running median(), but this is not the case
> if the vector is even, in this last case attributes are
> lost.

> I know that this is due to median() using mean() to obtain
> the result when the vector is even, and mean() always
> takes attributes off vectors.

Yes, and this has been the design of  median()  for ever :

If n := length(x)  is odd,  the median is "the middle" observation,
   and should be equal to x[j] for j = (n+1)/2
   and hence e.g., is well defined for an ordered factor.

When  n  is even
 however, median() must be the mean of "the two middle" observations,
   which is e.g., not even *defined* for an ordered factor.

We *could* talk of the so called lo-median  or hi-median
(terms probably coined by John W. Tukey) because (IIRC), these
are equal to each other and to the median for odd n, but
are equal to  x[j]  and  x[j+1],  j = n/2,  for even n *and* are
still "of the same kind" as x[]  itself.

Interestingly, for the mad() { = the median absolute deviation from the median}
we *do* allow to specify logical 'low' and 'high',
but only for the "outer" median in MAD's definition, not the
inner one.

## From /src/library/stats/R/mad.R :

mad <- function(x, center = median(x), constant = 1.4826,
na.rm = FALSE, low = FALSE, high = FALSE)
{
if(na.rm)
x <- x[!is.na(x)]
n <- length(x)
constant *
if((low || high) && n%%2 == 0) {
if(low && high) stop("'low' and 'high' cannot be both TRUE")
n2 <- n %/% 2 + as.integer(high)
sort(abs(x - center), partial = n2)[n2]
}
else median(abs(x - center))
}
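A short usage sketch of those 'low' / 'high' arguments:

```r
## with even n, 'low'/'high' make the outer median pick an attained
## absolute deviation instead of averaging the two middle ones
x <- c(1, 2, 4, 8)
mad(x)               # constant * mean of the two middle |x - median(x)|
mad(x, low  = TRUE)  # constant * the lower  of the two middle deviations
mad(x, high = TRUE)  # constant * the higher of the two middle deviations
```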




> Don't you think that attributes should be kept in both
> cases? 

well, not all attributes can be kept.
Note that for *named* vectors x,  x[j] can (and does) keep the name,
but there's definitely no sensible name to give to (x[j] + x[j+1])/2

I'm willing to collaborate with someone considering
to extend  median.default(),  making  hi-median and lo-median
available to the user.
Both of these will always return x[j] for some j and hence keep
all (sensible!) attributes (well, if the `[`-method for the
corresponding class has been defined correctly; I've encountered
quite a few cases where people created vector-like classes but
did not provide a "correct"  subsetting method (typically you
should make sure both a `[[` and a `[` method work!)).

Best regards,
Martin

Martin Maechler
ETH Zurich  and  R Core team

> And, going further, shouldn't mean() keep
> attributes as well? I have looked in R's Bugzilla and I
> didn't find an entry related to this issue.

> Please, let me know if you consider that this issue should
> be posted in R's bugzilla.

> Here is an example with code.

> rndvar <- rnorm(n = 100)

> Hmisc::label(rndvar) <- "A label for RNDVAR"

> str(median(rndvar[-c(1,2)]))

> Returns: "num 0.0368"

> str(median(rndvar[-1]))

> Returns: 'labelled' num 0.0322 - attr(*, "label")= chr "A
> label for RNDVAR"

> Thanks in advance!

> Gustavo Zapata-Wainberg

>   [[alternative HTML version deleted]]

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] as.list fails on functions with S3 classes

2021-04-29 Thread Martin Maechler
> brodie gaslam via R-devel 
> on Thu, 29 Apr 2021 01:04:01 + (UTC) writes:

>> On Wednesday, April 28, 2021, 5:16:20 PM EDT, Gabriel Becker wrote:
>> 
>> Hi Antoine,
>> 
>> I would say this is the correct behavior. S3 dispatch is solely (so far
>> as I know?) concerned with the "actual classes" on the object. This is
>> because S3 classes act as labels that inform dispatch what, and in what
>> order, methods should be applied. You took the function class (i.e. label)
>> off of your object, which means that in the S3 sense, that object is no
>> longer a function and dispatching to function methods for it would be
>> incorrect. This is independent of whether the object is still callable
>> "as a function".
>> 
>> The analogous case for non-closures to what you are describing would be
>> for S3 to check mode(x) after striking out with class(x) to find relevant
>> methods. I don't think that would be appropriate.

> I would think of the general case to be to check `class(unclass(x))` on
> strike-out.  This would then include things such as "matrix", etc.
> Dispatching on the implicit class as fallback seems like a natural thing
> to do in a language that dispatches on implicit class when there is none.
> After all, once you've struck out of your explicit classes, you have
> none left!

> This does happen naturally in some places (e.g. interacting with a
> data.frame as a list), and is quite delightful (usually).  I won't get
> into an argument of what the documentation states or whether any changes
> should be made, but to me that dispatch doesn't end with the implicit
> class feels like a logical wrinkle.  Yes, I can twist my brain to
> see how it can be made to make sense, but I don't like it.

> A fun past conversation on this very topic:

> https://stat.ethz.ch/pipermail/r-devel/2019-March/077457.html

Thank you, Gabe and Brodie.

To the OP,  Gabe's advice to *NOT* throw away an existing class
is really important,  and  good code -- several examples in base R --
would really *extend* a class in such cases, i.e.,

function(x, ...) {
 ## ...
 ans <- things.on(x, ...)          # (pseudo-code for computing the result)
 class(ans) <- c("foo", class(x))  # *extend* the class, don't replace it
 ans
}
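
A concrete, runnable instance of this pattern (the wrapper name as_foo is made up):

```r
## Because the wrapper *extends* the class, "function" stays in the
## class vector and function methods still apply after dispatch.
as_foo <- function(x) {
  class(x) <- c("foo", class(x))  # i.e. c("foo", "function")
  x
}
f <- as_foo(function(x) x + 1)
class(f)     # "foo" "function"
as.list(f)   # still dispatches to as.list.function
```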

I don't have time to go in-depth here (teaching and other duties),
but I want to point you to one important extra point,
of which I think you have not been aware:

S3 dispatch *does* look at what you see from class()  *but* has
always done some extra things, notably for atomic and other
*base* objects.  There has always been a dedicated function in R's C
code to do this, R_data_class2(), e.g., called from the C function
usemethod(), which is called from R's UseMethod().

Since R 4.0.0,  we have provided the R function .class2()  to give
the same result as the internal R_data_class2(),  and hence
show the classes (in the correct order!) which are really used for S3
dispatch.

The NEWS entry for that was

  \item New function \code{.class2()} provides the full character
  vector of class names used for S3 method dispatch.
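
For base objects without a class attribute, the kind of output to expect (values from memory, indicative only):

```r
## .class2() shows the classes S3 dispatch really uses, including the
## implicit ones appended for base objects:
.class2(matrix(1:4, 2))  # e.g. "matrix" "array" "integer" "numeric"
.class2(1L)              # e.g. "integer" "numeric"
.class2(sum)             # e.g. "function"
```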


Best,
Martin


> Best,

> B.

>> Also, as an aside, if you want your class to override methods that exist
>> for function you would want to set the class to c("foo", "function"), not
>> c("function", "foo"), as you had it in your example.
>> 
>> Best,
>> ~G
>> 
>> On Wed, Apr 28, 2021 at 1:45 PM Antoine Fabri 
>> wrote:
>> 
>>> Dear R devel,
>>> 
>>> as.list() can be used on functions, but not if they have a S3 class that
>>> doesn't include "function".
>>> 
>>> See below :
>>> 
>>> ```r
>>> add1 <- function(x) x+1
>>> 
>>> as.list(add1)
>>> #> $x
>>> #>
>>> #>
>>> #> [[2]]
>>> #> x + 1
>>> 
>>> class(add1) <- c("function", "foo")
>>> 
>>> as.list(add1)
>>> #> $x
>>> #>
>>> #>
>>> #> [[2]]
>>> #> x + 1
>>> 
>>> class(add1) <- "foo"
>>> 
>>> as.list(add1)
>>> #> Error in as.vector(x, "list"): cannot coerce type 'closure' to
>>> #> vector of type 'list'
>>> 
>>> as.list.function(add1)
>>> #> $x
>>> #>
>>> #>
>>> #> [[2]]
>>> #> x + 1
>>> ```
>>> 
>>> In failing case the argument is dispatched to as.list.default instead of
>>> as.list.function.
>>> 
>>> (1) Shouldn't it be dispatched to as.list.function ?
>>> 
>>> (2) Shouldn't all generics, when applied on an object of type closure,
>>> fall back to the `fun.function` method before falling back to the
>>> `fun.default` method?
>>> 
>>> Best regards,
>>> 
>>> Antoine



Re: [Rd] New post not readable

2021-04-28 Thread Martin Maechler
> Lluís Revilla 
> on Wed, 28 Apr 2021 15:19:53 +0200 writes:

> Hi all,

> It has come to my attention that there is a new post on The R blog: "R
> Can Use Your Help: Testing R Before Release".
> However, the link returns an error "Not found":
> https://developer.r-project.org/Blog/public/2021/04/28/r-can-use-your-help-testing-r-before-release/index.html
> Hope this mailing list is the right place to make it known to the authors.

yes

> Maybe this new content could be announced on the R-announcement
> mailing list?

> For others interested I created a Twitter account that
> uses The R blog's RSS feed to announce new entries: R_dev_news.

Well, there's  @_R_Foundation,  whose posts are
automatically embedded / mirrored on R's home page https://www.r-project.org/
and where you can see the previous Blog post still being
announced ...  but not the new one, probably because of the
error (some files not committed, I guess) that you mentioned
above.

So, maybe you should remove R_dev_news  or at least mention that
@_R_Foundation  is the official 'R Foundation' twitter account
*and* that it also uses the Blog feed ..

> Looking forward to reading the new post.

Me too,  thank you  Lluís , for the "heads up"!
Martin

> Cheers,
> Lluís



Re: [Rd] NEWS item for bugfix in normalizePath and file.exists?

2021-04-28 Thread Martin Maechler
> Toby Hocking 
> on Wed, 28 Apr 2021 07:21:05 -0700 writes:

> Hi Tomas, thanks for the thoughtful reply. That makes sense about the
> problems with C locale on windows. Actually I did not choose to use C
> locale, but instead it was invoked automatically during a package check.
> To be clear, I do NOT have a file with that name, but I do want
> file.exists to return a reasonable value, FALSE (with no error). If that behavior is
> unspecified, then should I use something like tryCatch(file.exists(x),
> error=function(e)FALSE) instead of assuming that file.exists will always
> return a logical vector without error? For my particular application that
> work-around should probably be sufficient, but one may imagine a situation
> where you want to do

> x <- "\360\237\247\222\n| \360\237\247\222\360\237\217\273\n|
> \360\237\247\222\360\237\217\274\n| \360\237\247\222\360\237\217\275\n|
> \360\237\247\222\360\237\217\276\n| \360\237\247\222\360\237\217\277\n"
> Encoding(x) <- "unknown"
> Sys.setlocale(locale="C")
> f <- tempfile()
> cat("", file = f)
> two <- c(x, f)
> file.exists(two)

> and in that case the correct response from R, in my opinion, would be
> c(FALSE, TRUE) -- not an error.
> Toby

Indeed, thanks a lot to Tomas!

# A remark
We *could* -- and according to my taste should -- try to have file.exists()
return a logical vector in almost all cases (while still giving an
error for non-character input, as for file.exists(pi)):
notably, if  `c(...)`  {for the  `...`  arguments of file.exists()}
is a character vector, always return a logical vector of the same
length.  *And* we could make use of the fact that R's
logical type is not binary but ternary: the return
value could contain values from {TRUE, NA, FALSE}, interpreting NA
as "don't know" in all cases where the corresponding string in
the input had an Encoding(.) that was "fishy" in some sense
given the "context" (OS, locale, OS version, ICU presence, ...).

In particular, when the underlying code sees encoding-translation issues
for a string,  NA  would be returned instead of an error.
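
The NA-for-"don't know" idea could be sketched at the R level as follows; file_exists3 is a hypothetical name, and the real change would live in the underlying C code:

```r
## Return TRUE/FALSE where the answer is known, NA where translation
## (or any other error) prevents an answer -- never an error:
file_exists3 <- function(paths) {
  vapply(paths, function(p)
           tryCatch(file.exists(p),
                    error = function(e) NA),  # NA = "don't know"
         NA, USE.NAMES = FALSE)
}
```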

Martin

> On Wed, Apr 28, 2021 at 3:10 AM Tomas Kalibera 
> wrote:

>> Hi Toby,
>> 
>> a defensive, portable approach would be to use only file names regarded
>> portable by POSIX, so characters including ASCII letters, digits,
>> underscore, dot, hyphen (but hyphen should not be the first character).
>> That would always work on all systems and this is what I would use.
>> 
>> Individual operating systems and file systems and their configurations
>> differ in which additional characters they support and how. On some,
>> file names are just sequences of bytes, on some, they have to be valid
>> strings in certain encoding (and then with certain exceptions).
>> 
>> On Windows, file names are at the lowest level in UTF-16LE encoding (and
>> admitting unpaired surrogates for historical reasons). R stores strings
>> in other encodings (UTF-8, native, Latin-1), so file names have to be
>> translated to/from UTF-16LE, either directly by R or by Windows.
>> 
>> But, there is no way to convert (non-ASCII) strings in "C" encoding to
>> UTF-16LE, so the examples cannot be made to work on Windows.
>> 
>> When the translation is left on Windows, it assumes the non-UTF-16LE
>> strings are in the Active Code Page encoding (shown as "system encoding"
>> in sessionInfo() in R, Latin-1 in your example) instead of the current C
>> library encoding ("C" in your example). So, file names coming from
>> Windows will be either the bytes of their UTF-16LE representation or the
>> bytes of their Latin-1 representation, but which one is subject to the
>> implementation details, so the result is really unusable.
>> 
>> I would say using "C" as encoding in R is not a good idea, and
>> particularly not on Windows.
>> 
>> I would say that what happens with such file names in "C" encoding is
>> unspecified behavior, which is subject to change at any time without
>> notice, and that both the R 4.0.5 and R-devel behavior you are observing
>> are acceptable. I don't think it should be mentioned in the NEWS.
>> Personally, I would prefer some stricter checks of strings validity and
>> perhaps disallowing the "C" encoding in R, so yet another behavior where
>> it would be clearer that this cannot really work, but that would require
>> more thought and effort.
>> 
>> Best
>> Tomas
>> 
>> 
>> On 4/27/21 9:53 PM, Toby Hocking wrote:
>> 
>> > Hi all, Today I noticed bug(s?) in R-4.0.5, which seem to be fixed in
>> > R-devel already. I checked on
>> > https://developer.r-project.org/blosxom.cgi/R-devel/NEWS and there is no
>> > mention of these changes, so I'm wondering if they are intentional? If
   

[Rd] Msg not getting posted (or much delayed (was "R check false positive ..")

2021-04-24 Thread Martin Maechler
> Dénes Tóth 
> on Wed, 21 Apr 2021 12:57:48 +0200 writes:

> 
> Disclaimer: I sent this report first to r-package-de...@r-project.org 
> but it seems it has not been delivered to the list - re-trying to r-devel
> 

Also, for R-devel, your msg  sat for  3  days in the spam filter
queue, and I as list co-moderator noticed it (among all the real
spam, so quite by coincidence) and released it...

Almost surely the R-package-devel moderators did *not* notice it
in the spam filter queue there...

NB: The spam symptoms were indicated as
  X-Spamc: is spam (7.0/5.0) position : 6, spam decisive
  X-MailCleaner-SpamCheck: spam, Newsl (score=0.0, required=5.0, NONE,
   position : 0, not decisive), NiceBayes (42.47%, position : 2,
   not decisive), Spamc (score=7.0, required=5.0, EthURLb 0.0,
   URIBL_BLOCKED 0.0, EZURL 0.0, MC_SPF_SOFTFAIL 7.0, position : 6,
   spam decisive),



Re: [Rd] R CMD INSTALL warning for S4 replacement functions on R 4.1.0-alpha

2021-04-24 Thread Martin Maechler
> Sebastian Meyer 
> on Fri, 23 Apr 2021 23:23:16 +0200 writes:

> I can confirm this Rd warning in R-devel (2021-04-23 r80216), but not in
> R 4.0.5. It happens when installing the static help (INSTALL option
> --html).

> The following R code reproduces the warning by creating a tiny test
> package and then calling relevant internal functions from 'tools':

[...]

Thanks a lot, Sebastian,
for some reason I did not see your reply when I wrote mine...

So at least you indirectly solved my puzzlement as to why this was
not visible in the CRAN checks using R-devel:  almost surely
they all do *not* create static HTML pages.

So this really is a bug in R > 4.0.z, and we would be happy if you
report it, just referring to your R-devel post ... and then we
will all be really grateful to whoever provides a (careful,
minimal, ..) patch, presumably to
src/library/tools/R/Rd2HTML.R

Martin



Re: [Rd] R CMD INSTALL warning for S4 replacement functions on R 4.1.0-alpha

2021-04-24 Thread Martin Maechler
using R-devel don't seem to show anything.
...
... hence my deep puzzlement.

Thank you once more for the report,
Martin

--
Martin Maechler
ETH Zurich and R Core team



Re: [R-pkg-devel] Using ggplot2 within another package

2021-04-24 Thread Martin Maechler
> Ben Bolker 
> on Thu, 22 Apr 2021 17:27:49 -0400 writes:

> For some reason that I don't remember, an R core member once told me 
> that they prefer x <- y <- NULL to utils::globalVariables(c("x","y")) - 

That could have been me.  Even though I think I still have some
globalVariables() statements in some of my package sources, I've
decided that it really *harms*, notably for relatively common variable
names such as "x":  it declares them "global"
{ for the purpose of codetools::globalVariables() } everywhere,
i.e. for all functions in the package namespace, and that
basically kills the reliability of  globalVariables() checking
for the whole package.
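
The two approaches under discussion, side by side (an illustrative sketch; myPlot and the column names are made up):

```r
## (a) Package-wide declaration: silences the codetools check for "x"
##     and "y" in *every* function of the package -- the harm noted above.
## utils::globalVariables(c("x", "y"))

## (b) Dummy NULL bindings placed next to the code that actually uses
##     NSE, which leaves the check effective elsewhere:
x <- y <- NULL   # quiet "no visible binding" NOTEs for the aes() below
myPlot <- function(myData) {
  ggplot2::ggplot(myData, ggplot2::aes(x = x, y = y))
}
```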


> although I have also encountered problems with that strategy in edge
> cases.

well, when?

> Here's an example from StackOverflow from today where for some reason 
> I don't understand, evaluation of function arguments interacts with 
> non-standard/lazy evaluation within a dplyr function such that 'foo' 
> works while 'x$foo' doesn't ... don't know if it's a similar case.

> https://stackoverflow.com/questions/67218258/getting-error-error-in-usemethodfilter-no-applicable-method-for-filter/67220198#67220198


{ ceterum censeo ... to use NSE (non-standard evaluation) for
  user convenience and to call this (together with really good
  ideas)  "tidy" has been one of the biggest euphemisms in the history of
  statistical computing ...  but yes, that's just my personal opinion  }

> On 4/22/21 5:19 PM, Kevin R. Coombes wrote:
>> Thanks.
>> 
>> Obviously, long, long ago (in a galaxy not far enough away), Paul's
>> suggestion of using "aes_string" was the correct one, since "aes" uses
>> non-standard evaluation. (And to quote somebody from an R fortune
>> cookie, "The problem with non-standard evaluation is that it is
>> non-standard.") But the documentation at the end of the link provided by
>> Robert explicitly tells you not to do that, since "aes_string is
>> deprecated".  And reading more carefully into the manual page for
>> aes_string, one does indeed find the statement that the function is
>> "soft deprecated". I'm not sure what that means, other than someone on
>> the development team doesn't like it.
>> 
>> Instead, the vignette says you should
>>    importFrom("rlang", ".data")
>> in your NAMESPACE, and write
>>    ggplot(myData, aes(x = .data$myX, y = .data$myY))
>> 
>> And now my dinosaur question: That looks like using one non-standard
>> hack to cover up the problems with another non-standard hack. Why the
>> heck  is that any better for the developer than writing
>>    ggplot(myData, aes(x = myData$myX, y = myData$myY))
>> 
>> or using Dirk Eddelbuettel's suggestion of calling
>> utils::globalVariables ??
>> 
>> It's time to tell those kids to get off of my lawn.
>>   Kevin
>> 
>> On 4/22/2021 4:45 PM, Robert M. Flight wrote:
>>> Kevin,
>>> 
>>> This vignette from ggplot2 itself gives the "officially recommended"
>>> ways to avoid the warnings from R CMD check
>>> 
>>> https://ggplot2.tidyverse.org/articles/ggplot2-in-packages.html
>>> 
>>> 
>>> Cheers,
>>> 
>>> -Robert
>>> 
>>> On Thu, Apr 22, 2021 at 4:39 PM Paul SAVARY
>>> wrote:
>>> 
>>> Hi Kevin,
>>> 
>>> I was faced to the same problem and I used 'aes_string()' instead
>>> of 'aes()'. You can then just write the name of the columns
>>> containing the data to plot as character strings.
>>> 
>>> Example:
>>> 
>>> myPlot <- function(myData, ...) {
>>>     # get ready
>>>     ggplot(myData, aes_string(x = "myX", y = "myY")) +
>>>        # add my decorations
>>>        theme_bw()
>>> }
>>> 
>>> It is probably already the case for your function but you need to
>>> include #' @import ggplot2 in your function preamble (if I am not
>>> wrong).
>>> 
>>> Kind regards
>>> Paul
>>> 
>>> - Original message -
>>> From: "Kevin R. Coombes"
>>> To: "r-package-devel"
>>> Sent: Thursday, 22 April 2021 22:28:55
>>> Subject: [R-pkg-devel] Using ggplot2 within another package
>>> 
>>> Hi,
>>> 
>>> I'm trying to help clean up an R package for someone else to
>>> submit to
>>> CRAN. He has used ggplot2 to implement a plotting function for the
>>> kinds
>>> of things that his packages generates. His plotting routine basically
>>> looks like (after changing names to protect the innocent):
>>> 
>>> myPlot <- function(myData, ...) {
>>>     # get ready
>>>     ggplot(myData, aes(x = myX, y = myY)) +
>>>    # add my decorations
>>>    theme_bw()
  

Re: [Rd] Sys.timezone() fails on Linux under Microsoft WSL

2021-04-14 Thread Martin Maechler
> Brenton Wiernik 
> on Tue, 13 Apr 2021 09:15:50 -0400 writes:

> In Microsoft’s Windows Subsystem for Linux (WSL or WSL2),
> there is no systemd framework, so utilities that depend on
> it fail. This includes timedatectl, which R uses in
> Sys.timezone(). The timedatectl utility is present on
> Linux systems installed under WSL/WSL2, but is
> non-functional. So, when Sys.timezone() checks
> Sys.which("timedatectl"), it receives a false
> positive. The subsequent methods after this if () do work,
> however.

> This can be fixed if line 42 of Sys.timezone() were changed from:

> if (nzchar(Sys.which("timedatectl"))) {

> to:

> if (nzchar(Sys.which("timedatectl")) &&
>     !grepl("microsoft", system("uname -r", intern = TRUE), ignore.case = TRUE)) {

> "uname -r" returns for example:  "5.4.72-microsoft-standard-WSL2"

> So checking for "microsoft" or "WSL" would probably work.

> Brenton Wiernik

Thank you.  This all makes sense.
However,  using system("uname -r")  creates another platform
dependency (it fails, i.e., signals an error, e.g., on our Windows Server).

Could  Sys.info()  be used instead?
What does it give on your platform?
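
A sketch of what such a Sys.info()-based check might look like (hypothetical, not a committed fix; on WSL, the "release" field should contain the same "...-microsoft-..." string that uname -r reports):

```r
## Detect WSL without spawning an external process:
is_wsl <- function() {
  si <- Sys.info()
  identical(si[["sysname"]], "Linux") &&
    grepl("microsoft", si[["release"]], ignore.case = TRUE)
}

## Only trust timedatectl when not running under WSL:
use_timedatectl <- nzchar(Sys.which("timedatectl")) && !is_wsl()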





Re: [Rd] [Solved] Possible x11 window manager window aggregation under one icon?

2021-03-26 Thread Martin Maechler
>>>>> Duncan Murdoch 
>>>>> on Thu, 25 Mar 2021 10:41:46 -0400 writes:

> On 25/03/2021 9:18 a.m., Dirk Eddelbuettel wrote:
>> 
    >> On 24 March 2021 at 10:30, Martin Maechler wrote:
>> | For this reason I've committed to R (the trunk, i.e., R-devel,
>> | for R 4.1.0 in a month or so)  in svn rev 80110.
>> 
>> I just saw that via the (still extremely helpful) RSS feed of SVN
>> changes and then pulled.
>> 
>> You may have missed that Ivan concluded, and I followed, that the _patch
>> is not needed_.  All one needs is to adjust the .desktop file. I posted
>> my full changelog from the Debian package (of which I currently run a
>> test build on two Ubuntu machines using the binaries from Launchpad I
>> pointed to).
>> 
>> So in that sense I think r80110 may wants to be reverted.

> I'm not sure either if Martin saw your conclusion.

I saw it, but as there were two patches of Ivan, I understood
that the 2nd one (which would even group X11 windows of
unrelated R instances) was unneeded.

I concluded I liked the first one because it would achieve
what's considered "uniformly better" in the sense that it makes
R graphics behave like "all other" desktop applications *and* it
would do so for all possible window manager schemes without any
need of some desktop setting (which a typical user would not
know about, nor know that s?he should/could change).


> I haven't tested R-devel with r80110 yet, but I did make the equivalent 
> change in rgl, and have been working with that.

> In Ubuntu, it makes no difference if the .desktop file is changed as you 
> describe, 
a big "iff" at least conceptually, when in the present case,
Dirk as Debian maintainer of the 'R debian pkg' can make it happen.

What about Redhat/Fedora etc, what about the next cool window manager
on Linux distribution Z?  They may change to do what .desktop
does in a different way, etc, or more typically not package R
that way and hence not have a *.desktop equivalent.

> but I think it's an improvement if you don't make that change 
> for the usual case.  You don't get a ton of icons, you get one rgl icon 
> per process.

> In macOS, it does affect the behaviour of windows.  During rgl testing, 
> I sometimes create 100+ windows.  Before the change, the window manager 
> put them all over the screen, trying to make the newest one visible. 
> After the change (now it knows they're all in the same group), it just 
> cascades them down the screen until it hits the bottom, then keeps 
> creating tiny windows crammed against the bottom of the screen.  I think 
> this is negative (the usual reason I create them all is to hope to spot 
> bad changes).

> So for a reasonable number of windows the change is an improvement:  the 
> windows appear grouped.  For a very large number of
> windows it's a negative..

> Duncan Murdoch

which probably also depends on your screen size and the
configuration of several tuning parameters of your window
manager etc..

This all started with Dirk saying that R behaves differently from
"all" other applications in this respect, and Ivan found a compact
way to change that .. window-manager-independently, which I
still think is a pro.
Given Duncan's use case, maybe this should become an argument for
x11() and X11.options(),  say  grouping = c("process", "none", "all")
with  match.arg(grouping)  used, so the default would be "process",
i.e. group things together that belong to the same "process"
(current R-devel),  "none" would correspond to the previous
default and "all" would correspond to what the 2nd patch of Ivan
aimed for.

?
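
The proposed interface could look like the following sketch; no such argument exists yet, and the function name here is only a stand-in:

```r
## Hypothetical signature for the suggested x11()/X11.options() argument:
x11_grouping <- function(grouping = c("process", "none", "all")) {
  match.arg(grouping)   # defaults to the first value, "process"
}
x11_grouping()          # "process" -- group windows of this R process
x11_grouping("none")    # previous default: no grouping at all
```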



Re: [Rd] [Solved] Possible x11 window manager window aggregation under one icon?

2021-03-24 Thread Martin Maechler
> Dirk Eddelbuettel 
> on Tue, 23 Mar 2021 12:36:47 -0500 writes:

> It all works now, thanks mostly to some very detailed reading of the
> specs by Ivan.  In short, I made the following changes:

> - add the missing WM hint to the .desktop file we install
> - add the svg logo as 'scalable'
> - create a new (square) 48x48 default png logo from the new one
> - deactivate yesterday's patch

> and it is all good now.  Duncan's rgl windows aggregate under the item,
> as do the standard R x11 devices.  I will try to attach a small
> screenshot, we'll see how mailman likes it.  Martin should still be able
> to get the old (and to me, buggy) behaviour back by removing the one key
> line from the .desktop file, if his Fedora environment ever updates as I
> plan.

> For anyone on Ubuntu 20.10, updated binaries are in my PPA, see
> https://launchpad.net/~edd/+archive/ubuntu/misc/+packages?field.name_filter=r-base_filter=published_filter=groovy

> Changelog entries below, these have not been committed to Debian's git yet
> but I think I will activate this for R 4.0.5 next week (and test it til
> then). Screenshot attached below too.

> -- changelog for these three test builds follows ---

> r-base (4.0.4-1.2010.3) groovy; urgency=medium

> * debian/r-base-core.dirs: Also create the directory
> usr/share/icons/hicolor/scalable/apps for the svg logo

> -- Dirk Eddelbuettel   Tue, 23 Mar 2021 11:05:17 -0500

> r-base (4.0.4-1.2010.2) groovy; urgency=medium

> * icon-class-patch/R.desktop: Add 'StartupWMClass=R_x11'
> * icon-class-patch/rlogo_icon.OLD.png.mpack: Renamed old icon
> * icon-class-patch/rlogo_icon.png.mpack: New 48x48 png from svg
> * icon-class-patch/Rlogo.svg: Copy of official logo
> * debian/rules: Also install Rlogo.svg in 'scalable' icons dir

> * debian/patches/series: Deactivate unneeded grouping patch 

> -- Dirk Eddelbuettel   Tue, 23 Mar 2021 10:27:43 -0500

> r-base (4.0.4-1.2010.1) groovy; urgency=medium

> * PPA build on Ubuntu 20.10 "groovy"
> * src/modules/X11/devX11.c: Apply patch by Ivan Krylov (posted to r-devel
>   on 2021-03-22) enabling grouping of x11 plot device windows

> -- Dirk Eddelbuettel   Mon, 22 Mar 2021 21:33:09 -0500

Thank you, Dirk, for raising the issue and providing the nice summary,
and again to Ivan for his patches and Duncan for testing and comments.

I've checked the first of Ivan's patches -- IIRC the one Dirk
is now also going to patch the Debian/Ubuntu/... R-base packages with.
The code is nice, short (but not too short), partly self-explanatory,
and it also works fine under the current Fedora 32 version of
Gnome.

For this reason I've committed it to R (the trunk, i.e., R-devel,
for R 4.1.0 in a month or so) in svn rev 80110.

However we definitely want a stable R 4.0.5 with basically only
a bug fix of the character/UTF-8/.. problem so R 4.0.x itself
will surely not get such a patch.

Best regards,
Martin



Re: [Rd] Possible x11 window manager window aggregation under one icon?

2021-03-23 Thread Martin Maechler
> Dirk Eddelbuettel 
> on Mon, 22 Mar 2021 22:23:47 -0500 writes:

> On 22 March 2021 at 16:57, Dirk Eddelbuettel wrote:
> | 
> | On 23 March 2021 at 00:01, Ivan Krylov wrote:
> | | The surrounding code and
> | | 

> | | proved to be enough of an example. The following patch makes it
> | | possible to group x11() windows on my PC with Xfce running:
> | 
> | [...]
> | 
> | | Some very limited testing didn't seem to uncover any problems.
> | 
> | Woot woot -- works here too under Ubuntu 20.10 / Gnome / Unity.
> | 
> | I applied the adorably small patch to the usual checkout I keep of
> | r-devel (as an incremental build is faster than a full package build
> | of r-release).
> | 
> | You are my hero. Next round of hot or cold beverages is on me. Already
> | looks so much better. I may put this into the next 4.0.5 (or 4.1.0 at
> | the latest) for Debian and Ubuntu (but will instrument a proper new
> | r-base package and hit it for a few days first).

> Close, close, close but no cigar yet: For a given R process, x11()
> windows group for that process. But we often run multiple R processes.
> Have you seen anything for grouping under the "program" (in some sense)
> but not the concrete process from it?

Hmm.. while I've been very happy with your (DE) original proposal and
the thread (with Ivan's nice small patch), I'm not sure I'd agree here.

Yes, you and I and a handful more people on the globe run
more than one *interactive* R process simultaneously.  But even
there, e.g., when I run  R-patched and R-devel, I'd sometimes rather keep
the two processes "separated", including their graphics windows,
because one important side condition of the workflow is to be
careful in comparing the two R versions.

And R is not firefox (where I really typically only want one
firefox running, already being a crazy process generator and
sometimes memory hog). 
The two (or more) different R processes are entirely autonomous
(in > 99.5% of cases), and I would rather have the current
proposal than a possibly quite a bit more complicated one which
I personally often would not even prefer...

With many thanks to Dirk, Naras, Ivan and Duncan for dealing
with the issue so nicely,

Martin

> ( If someone wants to play, Ubuntu binaries for groovy == 20.10 are at
> https://launchpad.net/~edd/+archive/ubuntu/misc/?field.series_filter=groovy )


> Dirk

> -- 
> https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [Rd] quantile() names

2021-03-17 Thread Martin Maechler
Getting back to this after 3 months :

>>>>> Martin Maechler 
>>>>> on Wed, 16 Dec 2020 11:13:32 +0100 writes:

>>>>> Gabriel Becker 
>>>>> on Mon, 14 Dec 2020 13:23:00 -0800 writes:

>> Hi Edgar, I certainly don't think quantile(x, .975) should
>> return 980, as that is a completely wrong answer.

>> I do agree that it seems like the name is a bit
>> offputting. I'm not sure how deep in the machinery you'd
>> have to go to get digits to no effect on the names (I
>> don't have time to dig in right this second).

>> On the other hand, though, if we're going to make the
>> names not respect digits entirely, what do we do when
>> someone does quantile(x, 1/3)? That'd be a bad time had by
>> all without digits coming to the rescue, I think.

>> Best, ~G

> and now we read more replies on this topic without anyone looking at
> the pure R source code, which is pretty simple and easy.
> Instead, people do experiments and take time to muse about their
> findings..

> Honestly, I'm disappointed: I've always thought that if you
> *write* on R-devel, you should be able to figure out a few
> things yourself before that..

> It's not rocket science to see/know that you need to quickly look at
> the quantile.default() method function and then to note 
> that it's  format_perc(.) which is used to create the names.

> Almost surely, I've been a bit involved in creating parts of
> this and probably am responsible for the current default
> behavior.

> 
> (sounds of digging) ...
> 
> 
> 
> 
> 
> 

--> Yes:

> 
> r837 | maechler | 1998-03-05 12:20:37 +0100 (Thu, 05 Mar 1998) | 2 lines
> Changed paths:
> M /trunk/src/library/base/R/quantile
> M /trunk/src/library/base/man/quantile.Rd

> fixed names(.) construction
> 

> With this diff  (my 'svn-diffB -c837 quantile') :
> Index: quantile
> ===
> 21c21,23
> < names(qs) <- paste(round(100 * probs), "%", sep = "")
> ---
>>names(qs) <- paste(formatC(100 * probs, format= "fg", wid=1,
>>  dig= max(2,.Options$digits)),
>> "%", sep = "")

> -
> so this was before this was modularized into the format_perc()
> utility and quite a while before R 1.0.0 

> Now, 22.8 years later, I do think that indeed it was not
> necessarily the best idea to make the names() construction depend  on the
> 'digits' option entirely and just protect it by using at least 2 digits.

> What I think is better is to

> 1) provide an optional argument   'digits = 7'
> back compatible w/ default getOption("digits")

> 2) when used, check that it is at least '1'

> But then some scripts / examples of some people *will* change
> ..., e.g., because they preferred to have a global setting of digits=5

> so I'm guessing it may make more people unhappy than other
> people happy if we change this now, after close to 23 years  .. ??

> Martin

I had more thoughts about this, and noticed that not one example
or test in base R  plus Recommended packages was changed, so
I've now committed the above change.

NEWS entry

• The names of quantile()'s result no longer depend on the global
  getOption("digits"), but quantile() gets a new optional argument
  digits = 7 instead.
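
With the change applied, the behaviour should look roughly like this (names are what I'd expect from format_perc() with the given digits):

```r
op <- options(digits = 2)
x <- 1:1000
quantile(x, 0.975)              # name is now "97.5%", independent of options
quantile(x, 0.975, digits = 2)  # opting in to shorter names gives "98%"
options(op)
```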

Martin


--
Martin Maechler
ETH Zurich  and  R Core team


>> On Mon, Dec 14, 2020 at 11:55 AM Merkle, Edgar
>> C.  wrote:

>>> All,
>>> 
>>> Consider the code below
>>> 
>>> options(digits=2)
>>> x <- 1:1000 
>>> quantile(x, .975)

>>> The value returned is 975 (the 97.5th percentile), but
>>> the name has been shortened to "98%" due to the digits
>>> option. Is this intended? I would have expected the name
>>> to also be "97.5%" here. Alternatively, the returned
>>> value might be 980 in order to match the name of "98%".
>>> 
>>> Best, Ed
>>>



Re: [Rd] Potential improvements of ave?

2021-03-16 Thread Martin Maechler
> Gabriel Becker 
> on Mon, 15 Mar 2021 15:08:44 -0700 writes:

> Abby,
> Vectors do have an internal mechanism for knowing that they are sorted via
> ALTREP (it was one of 2 core motivating features for 'smart vectors' the
> other being knowledge about presence of NAs).

> Currently I don't think we expose it at the R level, though it is part of
> the official C API. I don't know of any plans for this to change, but I
> suppose it could. Plus for functions in R itself, we could even use it
> without exposing it more widely. A number of functions, including sort
> itself, already do this in fact, but more could. I'd be interested in
> hearing which functions you think would particularly benefit from this.

Thank you Gabe.

> ~G

I vaguely remember (from Luke's docs/presentation on ALTREP)
that there are some "missing parts" here.
One of them is the non-existing R-level functionality; another may be
the C code below R's  is.unsorted().  Maybe  is.unsorted()
could get a new argument and/or be re-written, moving the NA
handling also to C, and have that happen *after* the C code checks
whether it's an ALTREP object that "knows it's sorted".
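As a footnote on the allocation failure reported further down in this thread: ave() crosses all grouping variables via interaction(), whose default drop = FALSE materializes the full product of levels. A sketch of a user-level workaround (not a change to ave() itself) is to collapse the groups beforehand with drop = TRUE:

```r
set.seed(1)
df1 <- data.frame(id1 = sample(1:100, 500, TRUE),
                  id2 = sample(1:3,   500, TRUE),
                  id3 = sample(1:5,   500, TRUE),
                  val = sample(1:300, 500, TRUE))
## one factor whose unused level combinations are dropped, so no
## product-of-all-levels vector is ever allocated
g <- interaction(df1$id1, df1$id2, df1$id3, drop = TRUE)
df1$diff <- ave(df1$val, g, FUN = function(i) c(diff(i), 0))
```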

Martin


> On Mon, Mar 15, 2021 at 12:01 PM SOEIRO Thomas 
> wrote:

>> Hi Abby,
>> 
>> Thank you for your positive feedback.
>> 
>> I agree for your general comment about sorting.
>> 
>> For ave specifically, ordering may not help because the output must
>> maintain the order of the input (as ave returns only x and not the
>> entire data.frame).
>> 
>> Thanks,
>> 
>> Thomas
>> 
>> From: Abby Spurdle 
>> Sent: Monday, 15 March 2021 10:22
>> To: SOEIRO Thomas
>> Cc: r-devel@r-project.org
>> Subject: Re: [Rd] Potential improvements of ave?
>> 
>> EXTERNAL EMAIL - TREAT LINKS AND FILES WITH CAUTION
>> 
>> Hi Thomas,
>> 
>> These are some great suggestions.
>> But I can't help but feel there's a much bigger problem here.
>> 
>> Intuitively, the ave function could (or should) sort the data.
>> Then the indexing step becomes almost trivial, in terms of both time
>> and space complexity.
>> And the ave function is not the only example of where a problem
>> becomes much simpler, if the data is sorted.
>> 
>> Historically, I've never found base R functions user-friendly for
>> aggregation purposes, or for sorting.
>> (At least, not by comparison to SQL).
>> 
>> But that's not the main problem.
>> It would seem preferable to sort the data, only once.
>> (Rather than sorting it repeatedly, or not at all).
>> 
>> Perhaps, objects such as vectors and data.frame(s) could have a
>> boolean attribute, to indicate if they're sorted.
>> Or functions such as ave could have a sorted argument.
>> In either case, if true, the function assumes the data is sorted and
>> applies a more efficient algorithm.
>> 
>> 
>> B.
>> 
>> 
>> On Sat, Mar 13, 2021 at 1:07 PM SOEIRO Thomas 
>> wrote:
>> >
>> > Dear all,
>> >
>> > I have two questions/suggestions about ave, but I am not sure if it's
>> relevant for bug reports.
>> >
>> >
>> >
>> > 1) I have performance issues with ave in a case where I didn't expect
>> it. The following code runs as expected:
>> >
>> > set.seed(1)
>> >
>> > df1 <- data.frame(id1 = sample(1:1e2, 5e2, TRUE),
>> >   id2 = sample(1:3, 5e2, TRUE),
>> >   id3 = sample(1:5, 5e2, TRUE),
>> >   val = sample(1:300, 5e2, TRUE))
>> >
>> > df1$diff <- ave(df1$val,
>> > df1$id1,
>> > df1$id2,
>> > df1$id3,
>> > FUN = function(i) c(diff(i), 0))
>> >
>> > head(df1[order(df1$id1,
>> >df1$id2,
>> >df1$id3), ])
>> >
>> > But when expanding the data.frame (* 1e4), ave fails (Error: cannot
>> allocate vector of size 1110.0 Gb):
>> >
>> > df2 <- data.frame(id1 = sample(1:(1e2 * 1e4), 5e2 * 1e4, TRUE),
>> >   id2 = sample(1:3, 5e2 * 1e4, TRUE),
>> >   id3 = sample(1:(5 * 1e4), 5e2 * 1e4, TRUE),
>> >   val = sample(1:300, 5e2 * 1e4, TRUE))
>> >
>> > df2$diff <- ave(df2$val,
>> > df2$id1,
>> > df2$id2,
>> > df2$id3,
>> > FUN = function(i) c(diff(i), 0))
>> >
>> > This use case does not seem extreme to me (e.g. aggregate et al work
>> perfectly on this data.frame).
>> > So my question is: Is this expected/intended/reasonable? i.e. Does ave
>> need to be optimized?
>> >
>> >
>> >
>> > 2) Gabor Grothendieck pointed out in 2011 that drop = 

Re: [Rd] Corrupt internal row names when creating a data.frame with `attributes<-`

2021-03-01 Thread Martin Maechler
> Davis Vaughan 
> on Tue, 16 Feb 2021 14:50:33 -0500 writes:

> This originally came up in this dplyr issue:
> https://github.com/tidyverse/dplyr/issues/5745

> Where `tibble::column_to_rownames()` failed because it
> eventually checks `.row_names_info(.data) > 0L` to see if
> there are automatic row names, which is in line with the
> documentation that Kevin pointed out: "type = 1 the latter
> with a negative sign for ‘automatic’ row names."

> Davis


> On Tue, Feb 16, 2021 at 2:29 PM Bill Dunlap
>  wrote:

>> as.matrix.data.frame does not take the absolute value of
>> that number:

slightly changed and extended by MM {and as R script} :



dPos <- structure(list(X=11:14, 1:4), class="data.frame", row.names=c(NA, +4L))
dNeg <- structure(list(X=11:14, 1:4), class="data.frame", row.names=c(NA, -4L))
##
all_rn_info <- function(x) lapply(setNames(,0:2),
   function(tp) .row_names_info(x, type=tp))
str(all_rn_info(dPos))
## List of 3
##  $ 0: int [1:2] NA 4
##  $ 1: int 4
##  $ 2: int 4
str(all_rn_info(dNeg))
## List of 3
##  $ 0: int [1:2] NA -4
##  $ 1: int -4
##  $ 2: int 4
stopifnot(exprs = {
identical(rownames(dPos), as.character(1:4))
identical(rownames(dPos), rownames(dNeg))
## using as.matrix.data.frame() which differentiates, too :
identical(rownames(as.matrix(dPos)), as.character(1:4))
is.null  (rownames(as.matrix(dNeg)))
## BTW, also :
identical(attributes(dPos), attributes(dNeg)) ## and hence also
identical(dPos, dNeg) # another case where identical() is possibly too 
"tolerant"
})

## and for your interest, these *also* have both 'c(NA, +|n|)'  ==> give '+4'
.row_names_info(dInt1 <- structure(list(X=11:14, 1:4), class="data.frame", 
row.names=1:4))
.row_names_info(dInt2 <- local({ dd <- data.frame(X=11:14, 1:4, fix.empty.names 
= FALSE)
 attr(dd, "row.names") <- 1:4; dd }))
stopifnot(exprs = {
identical(dInt1, dInt2)
identical(all_rn_info(dInt1),
  all_rn_info(dInt2))
identical(all_rn_info(dPos),
  all_rn_info(dInt1))
})



There never was a conclusion here
(and the above is not the full context of the thread) .. 
but if I understand Bill and his example (extended above) correctly,
he's indirectly hinting that there is **no bug** here :

1) You can use structure() well to get "truly automatic" row
   names by setting the row.names correctly to  c(NA, -3L)
   {yes, which is  c(NA_integer_, -3L) }

2) There's a subtle difference between *two* kinds of automatic
   row names, on purpose, notably used in  as.matrix.data.frame():
   c(NA, +3)  are automatic row names, too, but they also translate to
   matrix row names and hence are somewhat less automatic ... 

   Note that you may see this documented by careful reading of
   the 'Note' in  help(row.names) *and* the 'Examples' section
   of that help page
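A tiny sketch of the compact "automatic" form as created by data.frame() itself:

```r
d <- data.frame(X = 11:14)
.row_names_info(d)   # -4 : stored internally in the compact c(NA, -4L) form
rownames(d)          # expanded on demand: "1" "2" "3" "4"
```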

Last but not least:  We (R Core) did not document the nitty
gritty details here partly on purpose, because they should've
been subject to change, see e.g. the word "currently" in the
?row.names help page.

Notably with ALTREP objects, we could use "regular"  1:n
integer row names which would be ALTREP compacted automatically
for non-small 'n'.

Finally, the check in tibble that was mentioned in
this thread should almost surely be fixed if it gives a problem
for these examples, and I claim it is *that* code that has been
buggy rather than base R's.

Martin



Re: [Rd] surprised matrix (1:256, 8, 8) doesn't cause error/warning

2021-02-09 Thread Martin Maechler
 > - if (lendat > 1 && nrc % lendat != 0) {
> + if ((nrc % lendat) != 0) {
> if (((lendat > nr) && (lendat / nr) * nr != lendat) ||
> ((lendat < nr) && (nr / lendat) * lendat != nr))
> warning(_("data length [%d] is not a sub-multiple or multiple of the 
number of rows [%d]"), lendat, nr);
> else if (((lendat > nc) && (lendat / nc) * nc != lendat) ||
> ((lendat < nc) && (nc / lendat) * lendat != nc))
> - warning(_("data length [%d] is not a sub-multiple or multiple 
of the number of columns [%d]"), lendat, nc);
> - }
> - else if ((lendat > 1) && (nrc == 0)){
> + warning(_("data length [%d] is not a sub-multiple or 
multiple of the number of columns [%d]"), lendat, nc);
> + if (nrc == 0)
> warning(_("data length exceeds size of matrix"));
> +if (nrc != lendat)
> +warning(_("data length incompatible with size of matrix"));
> }
> }


> --
> // And here, for easy checking that part of the code in the new form:
> if (lendat > 1) {
> R_xlen_t nrc = (R_xlen_t) nr * nc;
> if ((nrc % lendat) != 0) {
> if (((lendat > nr) && (lendat / nr) * nr != lendat) ||
> ((lendat < nr) && (nr / lendat) * lendat != nr))
> warning(_("data length [%d] is not a sub-multiple or multiple of the 
number of rows [%d]"), lendat, nr);
> else if (((lendat > nc) && (lendat / nc) * nc != lendat) ||
> ((lendat < nc) && (nc / lendat) * lendat != nc))
> warning(_("data length [%d] is not a sub-multiple or multiple of the 
number of columns [%d]"), lendat, nc);
> if (nrc == 0)
> warning(_("data length exceeds size of matrix"));
> if (nrc != lendat)  
> warning(_("data length incompatible with size of matrix"));
> }
> }

>> On 2 Feb 2021, at 00:27, Abby Spurdle (/əˈbi/) 
 wrote:
>> 
>> So, does that mean that a clean result is contingent on the length of
>> the data being a multiple of both the number of rows and columns?
>> 
>> However, this rule is not straightforward.
>> 
>>> #EXAMPLE 1
>>> #what I would expect
>>> matrix (1:12, 0, 0)
>> <0 x 0 matrix>
>> Warning message:
>> In matrix(1:12, 0, 0) : data length exceeds size of matrix
>> 
>>> #EXAMPLE 2
>>> #don't like this
>>> matrix (numeric (), 2, 3)
>> [,1] [,2] [,3]
>> [1,]   NA   NA   NA
>> [2,]   NA   NA   NA
>> 
>> The first example is what I would expect, but is inconsistent with the
>> previous examples.
>> (Because zero is a valid multiple of twelve).
>> 
>> I dislike the second example with recycling of a zero-length vector.
>> This *is* covered in the help file, but also seems inconsistent with
>> the previous examples.
>> (Because two and three are not valid multiples of zero).
>> 
>> Also, I can't think of any reason why someone would want to construct
>> a matrix with extra data, and then discard part of it.
>> And even if there was, then why not allow an arbitrarily longer length?
>> 
>> 
>> On Mon, Feb 1, 2021 at 10:08 PM Martin Maechler
>>  wrote:
>>> 
>>>>>>>> Abby Spurdle (/əˈbi/)
>>>>>>>> on Mon, 1 Feb 2021 19:50:32 +1300 writes:
>>> 
>>>> I'm a little surprised that the following doesn't trigger an error or 
a warning.
>>>> matrix (1:256, 8, 8)
>>> 
>>>> The help file says that the main argument is recycled, if it's too 
short.
>>>> But doesn't say what happens if it's too long.
>>> 
>>> It's somewhat subtler than one may assume :
>>> 
>>>> matrix(1:9, 2,3)
>>>      [,1] [,2] [,3]
>>> [1,]    1    3    5
>>> [2,]    2    4    6
>>> Warning message:
>>> In matrix(1:9, 2, 3) :
>>> data length [9] is not a sub-multiple or multiple of the number of rows 
[2]
>>> 
>>>> matrix(1:8, 2,3)
>>>      [,1] [,2] [,3]
>>> [1,]    1    3    5
>>> [2,]    2    4    6
>>> Warning message:
>>> In matrix(1:8, 2, 3) :
>>> data length [8] is not a sub-multiple or multiple of the number of 
columns [3]
>>> 
>>>> matrix(1:12, 2,3)
>>>      [,1] [,2] [,3]
>>> [1,]    1    3    5
>>> [2,]    2    4    6
>>>> 
>>> 
>>> So it looks to me the current behavior is quite on purpose.
>>> Are you sure it's not documented at all when reading the docs
>>> carefully?  (I did *not*, just now).
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] surprised matrix (1:256, 8, 8) doesn't cause error/warning

2021-02-01 Thread Martin Maechler
> Abby Spurdle (/əˈbi/) 
> on Mon, 1 Feb 2021 19:50:32 +1300 writes:

> I'm a little surprised that the following doesn't trigger an error or a 
warning.
> matrix (1:256, 8, 8)

> The help file says that the main argument is recycled, if it's too short.
> But doesn't say what happens if it's too long.

It's somewhat subtler than one may assume :

> matrix(1:9, 2,3)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Warning message:
In matrix(1:9, 2, 3) :
  data length [9] is not a sub-multiple or multiple of the number of rows [2]

> matrix(1:8, 2,3)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Warning message:
In matrix(1:8, 2, 3) :
  data length [8] is not a sub-multiple or multiple of the number of columns [3]

> matrix(1:12, 2,3)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
>

So it looks to me the current behavior is quite on purpose.
Are you sure it's not documented at all when reading the docs
carefully?  (I did *not*, just now).
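In other words, a sketch of the rule as implemented at the time of this thread: the warning fires only when the data length does not fit the shape as an exact multiple or sub-multiple; when it does, the extra values are silently discarded (later versions warn here too):

```r
m <- suppressWarnings(matrix(1:12, 2, 3))  # 12 = 2*6 = 3*4: only the first
identical(m, matrix(1:6, 2, 3))            # 6 values are used -> TRUE
## length 9 doesn't fit the 2 x 3 shape evenly, so this one warns
w <- tryCatch(matrix(1:9, 2, 3), warning = conditionMessage)
```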



Re: [Rd] r79833 src/library/tools/R/Rd2HTML.R minor typo

2021-01-16 Thread Martin Maechler
Thank you, Ivan,

I've updated the source now,
Martin

> On line 105, "&\\hellip;" should probably be "&hellip;":
> Index: Rd2HTML.R
> ===
> --- Rd2HTML.R   (revision 79833)
> +++ Rd2HTML.R   (working copy)
> 
> 
> -x <- psub("(dots|ldots)", "&\\hellip;", x)
> +x <- psub("(dots|ldots)", "&hellip;", x)
> 
> 
> The backslash is ignored by gsub(), so no actual bug happens as a
> result of this.
> 
> []



Re: [Rd] URL checks

2021-01-11 Thread Martin Maechler
> Viechtbauer, Wolfgang (SP) 
> on Mon, 11 Jan 2021 10:41:03 + writes:

>>> Viechtbauer, Wolfgang (SP)
>>> on Fri, 8 Jan 2021 13:50:14 + writes:
>> 
>> > Instead of a separate file to store such a list, would it be an idea
>> to add versions of the \href{}{} and \url{} markup commands that are 
skipped
>> by the URL checks?
>> > Best,
>> > Wolfgang
>> 
>> I think John Nash and you misunderstood -- or then I
>> misunderstood -- the original proposal:
>> 
>> I've been understanding that there should be a  "central repository" of 
URL
>> exceptions that is maintained by volunteers.
>> 
>> And rather *not* that package authors should get ways to skip
>> URL checking..
>> 
>> Martin

> Hi Martin,

> Kirill suggested: "A file inst/URL that lists all URLs where failures are 
allowed -- possibly with a list of the HTTP codes accepted for that link."

> So, if it is a file in inst/, then this sounds to me like this is part of 
the package and not part of some central repository.

> Best,
> Wolfgang

Dear Wolfgang,
you are right and indeed it's *me* who misunderstood.

But then I don't think it's a particularly good idea: From a
CRAN point of view it is important that URLs in documents it
hosts do not raise errors (*), hence the validity checking of URLs.

So, CRAN (and other repository hosts) would need another option
to still check all URLs .. and definitely would want to do that before
accepting a package and also regularly do such checks on a per
package basis in a way that it is reported as part of the CRAN checks of
the respective package, right?

So this will get involved, ... and maybe it *is* a good idea for a
Google Summer of Code (GSoC) project ... well, *if* it is
supervised by someone who's in close contact with the CRAN or Bioc
maintainer teams.

Martin

--
*) Such URL errors then lead to e-mails or other reports from web
 site checking engines reporting that you are hosting (too many)
 web pages with invalid links.



Re: [Rd] URL checks

2021-01-11 Thread Martin Maechler
> Viechtbauer, Wolfgang (SP) 
> on Fri, 8 Jan 2021 13:50:14 + writes:

> Instead of a separate file to store such a list, would it be an idea to 
add versions of the \href{}{} and \url{} markup commands that are skipped by 
the URL checks?
> Best,
> Wolfgang

I think John Nash and you misunderstood -- or then I
misunderstood -- the original proposal:

I've been understanding that there should be a  "central repository" of URL
exceptions that is maintained by volunteers.

And rather *not* that package authors should get ways to skip
URL checking..

Martin


>> -Original Message-
>> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Spencer
>> Graves
>> Sent: Friday, 08 January, 2021 13:04
>> To: r-devel@r-project.org
>> Subject: Re: [Rd] URL checks
>> 
>> I also would be pleased to be allowed to provide "a list of known
>> false-positive/exceptions" to the URL tests.  I've been challenged
>> multiple times regarding URLs that worked fine when I checked them.  We
>> should not be required to do a partial lobotomy to pass R CMD check ;-)
>> 
>> Spencer Graves
>> 
>> On 2021-01-07 09:53, Hugo Gruson wrote:
>>> 
>>> I encountered the same issue today with 
https://astrostatistics.psu.edu/.
>>> 
>>> This is a trust chain issue, as explained here:
>>> https://whatsmychaincert.com/?astrostatistics.psu.edu.
>>> 
>>> I've worked for a couple of years on a project to increase HTTPS
>>> adoption on the web and we noticed that this type of error is very
>>> common, and that website maintainers are often unresponsive to requests
>>> to fix this issue.
>>> 
>>> Therefore, I totally agree with Kirill that a list of known
>>> false-positive/exceptions would be a great addition to save time to both
>>> the CRAN team and package developers.
>>> 
>>> Hugo
>>> 
>>> On 07/01/2021 15:45, Kirill Müller via R-devel wrote:
 One other failure mode: SSL certificates trusted by browsers that are
 not installed on the check machine, e.g. the "GEANT Vereniging"
 certificate from https://relational.fit.cvut.cz/ .
 
 K
 
 On 07.01.21 12:14, Kirill Müller via R-devel wrote:
> Hi
> 
> The URL checks in R CMD check test all links in the README and
> vignettes for broken or redirected links. In many cases this improves
> documentation, I see problems with this approach which I have
> detailed below.
> 
> I'm writing to this mailing list because I think the change needs to
> happen in R's check routines. I propose to introduce an "allow-list"
> for URLs, to reduce the burden on both CRAN and package maintainers.
> 
> Comments are greatly appreciated.
> 
> Best regards
> 
> Kirill
> 
> # Problems with the detection of broken/redirected URLs
> 
> ## 301 should often be 307, how to change?
> 
> Many web sites use a 301 redirection code that probably should be a
> 307. For example, https://www.oracle.com and https://www.oracle.com/
> both redirect to https://www.oracle.com/index.html with a 301. I
> suspect the company still wants oracle.com to be recognized as the
> primary entry point of their web presence (to reserve the right to
> move the redirection to a different location later), I haven't
> checked with their PR department though. If that's true, the redirect
> probably should be a 307, which should be fixed by their IT
> department which I haven't contacted yet either.
> 
> $ curl -i https://www.oracle.com
> HTTP/2 301
> server: AkamaiGHost
> content-length: 0
> location: https://www.oracle.com/index.html
> ...
> 
> ## User agent detection
> 
> twitter.com responds with a 400 error for requests without a user
> agent string hinting at an accepted browser.
> 
> $ curl -i https://twitter.com/
> HTTP/2 400
> ...
> ...Please switch to a supported browser..
> 
> $ curl -s -i https://twitter.com/ -A "Mozilla/5.0 (X11; Ubuntu; Linux
> x86_64; rv:84.0) Gecko/20100101 Firefox/84.0" | head -n 1
> HTTP/2 200
> 
> # Impact
> 
> While the latter problem *could* be fixed by supplying a browser-like
> user agent string, the former problem is virtually unfixable -- so
> many web sites should use 307 instead of 301 but don't. The above
> list is also incomplete -- think of unreliable links, HTTP links,
> other failure modes...
> 
> This affects me as a package maintainer, I have the choice to either
> change the links to 

Re: [Rd] Small bug in the documentation of `[.data.frame`

2020-12-29 Thread Martin Maechler
> Duncan Murdoch 
> on Tue, 29 Dec 2020 08:37:51 -0500 writes:

> On 29/12/2020 8:29 a.m., Rui Barradas wrote:
>> Hello,
>> 
>> R 4.0.3 on Ubuntu 20.10, session info at end.
>> 
>> Isn't the default value of argument drop missing in
>> 
>> ?`[.data.frame`
>> 
>> Usage:
>> 
>> ## S3 method for class 'data.frame'
>> x[i, j, drop = ]
>> 
>> 
>> I had the impression that it was TRUE (it is when running the function,
>> I'm talking about the docs).

> No, you can see it if you print `[.data.frame`:

>> `[.data.frame`
> function (x, i, j, drop = if (missing(i)) TRUE else length(cols) ==
> 1)

> So if you ask for specific rows and your dataframe has more than one 
> column, it defaults to FALSE.

> I think the Rd checks allow you to leave out defaults, but don't allow 
> you to state them incorrectly, so that's probably why it is left as 
> blank in the Usage section, and explained in the Arguments section.

> Duncan Murdoch

Yes, indeed, Duncan,  it is as you think (above).

It is "official" in the sense that we've used this for a long
time in order to keep the 'Usage' section cleaner, when some
defaults are sophisticated, and a help page reader should rather
read the corresponding argument description.
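Spelled out with a small sketch:

```r
df <- data.frame(a = 1:3, b = letters[1:3])
class(df[1, ])                   # "data.frame": i given, more than one column
class(df[, "a"])                 # "integer"   : i missing  => drop = TRUE
class(df[1, "a"])                # "integer"   : single column selected
class(df[1, "a", drop = FALSE])  # "data.frame": default overridden
```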

Martin


>> sessionInfo()
>> R version 4.0.3 (2020-10-10)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 20.04.1 LTS
>> 
>> Matrix products: default
>> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
>> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
>> 
>> locale:
>> [1] LC_CTYPE=pt_PT.UTF-8   LC_NUMERIC=C
>> [3] LC_TIME=pt_PT.UTF-8LC_COLLATE=pt_PT.UTF-8
>> [5] LC_MONETARY=pt_PT.UTF-8LC_MESSAGES=pt_PT.UTF-8
>> [7] LC_PAPER=pt_PT.UTF-8   LC_NAME=C
>> [9] LC_ADDRESS=C   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C
>> 
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods   base
>> 
>> loaded via a namespace (and not attached):
>> [1] compiler_4.0.3 tools_4.0.3
>> 
>> 
>> Happy holidays,
>> 
>> Rui Barradas
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] installing from source

2020-12-28 Thread Martin Maechler
> Ben Bolker 
> on Sun, 27 Dec 2020 15:02:47 -0500 writes:

> There is a recurring issue with installing from source into paths 
> that contain single quotes/apostrophes. "Why would anyone do that??" is 
> certainly a legitimate response to such a problem, but I would also say 
> this constitutes a legitimate bug.  Would replacing both single-quotes 
> below with \\' solve the problem?

Here, I'm mostly among the  "Why would anyone do that??" people,
but I agree that it's worth some effort to try fixing this.

To your question above: why don't you create a reproducible example
(which we'd want anyway for R's Bugzilla) and *see* if your proposition
solves it - or did I misinterpret the question?
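For reference, a reproducible sketch of the underlying quoting failure (the path and package name are hypothetical):

```r
lib <- "/home/bob/O'Brien/R/library"   # hypothetical path with an apostrophe
## pasting the path into single quotes yields an unparseable expression
cmd_bad  <- paste0("library('pkg', lib.loc = '", lib, "')")
inherits(try(parse(text = cmd_bad), silent = TRUE), "try-error")  # TRUE
## deparse() (or shQuote() for shell contexts) escapes it correctly
cmd_good <- paste0("library('pkg', lib.loc = ", deparse(lib), ")")
is.expression(parse(text = cmd_good))                             # TRUE
```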

> I'm happy to post this (with a patch if my fix seems appropriate) on 
> r-bugzilla.


> cheers
> Ben Bolker

> line 1672 of src/library/tools/R/install.R :

> cmd <- paste0("tools:::.test_load_package('", pkg_name, "', ", 
> quote_path(lib), ")")


> 
https://github.com/wch/r-source/blob/2eade649c80725352256f16509f9ff6919fd079c/src/library/tools/R/install.R#L1672

> 
https://stackoverflow.com/questions/15129888/r-cmd-install-error-unexpected-symbol-in-test-load-package-function

> 
https://stackoverflow.com/questions/65462881/cannot-download-packages-from-github-from-unexpected-symbol

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] Silent failure with NA results in fligner.test()

2020-12-24 Thread Martin Maechler
Not sure.
If all of the variances are zero, they are homogeneous in that sense,
and I would give a p-value of 1 ..
if only *some* of the variances are zero... it's less easy.

I still would try to *not* give an error in such cases, and would even
prefer an NA statistic and p-value .. because yes, these are "not
available" for such data.
But it is not strictly an error to try such a test on data of the
correct format...   Consequently, personally I would even try to not
give the current error ... but rather return NA values here:
>  if (all(x == 0))
>  stop("data are essentially constant")
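For reference, the silent failure under discussion is easy to reproduce (reusing the first example from the report):

```r
ft <- fligner.test(c(2, 3, 4, 5), gl(2, 2))
## residuals after centering are -0.5, 0.5, -0.5, 0.5, so all absolute
## values tie, var(a) is 0, and the statistic becomes 0/0
is.nan(ft$statistic)  # TRUE
is.na(ft$p.value)     # TRUE
```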

On Mon, Dec 21, 2020 at 12:22 PM Kurt Hornik  wrote:
>
> > Karolis K writes:
>
> Any preferences?
>
> Best
> -k
>
> > Hello,
> > In certain cases fligner.test() returns NaN statistic and NA p-value.
> > The issue happens when, after centering with the median, all absolute 
> > values become constant, which then leads to identical ranks.
>
> > Below are a few examples:
>
> > # 2 groups, 2 values each
> > # issue is caused by residual values after centering (-0.5, 0.5, -0.5, 0.5)
> > # then, after taking the absolute value, all the ranks become identical.
> >> fligner.test(c(2,3,4,5), gl(2,2))
>
> > Fligner-Killeen test of homogeneity of variances
>
> > data:  c(2, 3, 4, 5) and gl(2, 2)
> > Fligner-Killeen:med chi-squared = NaN, df = 1, p-value = NA
>
>
> > # similar situation with more observations and 3 groups
> >> fligner.test(c(2,3,2,3,4,4,5,5,8,9,9,8), gl(3,4))
>
> > Fligner-Killeen test of homogeneity of variances
>
> > data:  c(2, 3, 2, 3, 4, 4, 5, 5, 8, 9, 9, 8) and gl(3, 4)
> > Fligner-Killeen:med chi-squared = NaN, df = 2, p-value = NA
>
>
> > Two simple patches are proposed below. One returns an error, and another 
> > returns a p-value of 1.
> > Not sure which one is more appropriate, so submitting both.
>
> > Warm regards,
> > Karolis Koncevičius
>
> > ---
>
> > Index: fligner.test.R
> > ===
> > --- fligner.test.R(revision 79650)
> > +++ fligner.test.R(working copy)
> > @@ -59,8 +59,13 @@
> >  stop("data are essentially constant")
>
> >  a <- qnorm((1 + rank(abs(x)) / (n + 1)) / 2)
> > -STATISTIC <- sum(tapply(a, g, "sum")^2 / tapply(a, g, "length"))
> > -STATISTIC <- (STATISTIC - n * mean(a)^2) / var(a)
> > +if (var(a) > 0) {
> > +STATISTIC <- sum(tapply(a, g, "sum")^2 / tapply(a, g, "length"))
> > +STATISTIC <- (STATISTIC - n * mean(a)^2) / var(a)
> > +}
> > +else {
> > +STATISTIC <- 0
> > +}
> >  PARAMETER <- k - 1
> >  PVAL <- pchisq(STATISTIC, PARAMETER, lower.tail = FALSE)
> >  names(STATISTIC) <- "Fligner-Killeen:med chi-squared"
>
> > ---
>
> > Index: fligner.test.R
> > ===
> > --- fligner.test.R(revision 79650)
> > +++ fligner.test.R(working copy)
> > @@ -57,6 +57,8 @@
> >  x <- x - tapply(x,g,median)[g]
> >  if (all(x == 0))
> >  stop("data are essentially constant")
> > +if (var(abs(x)) == 0)
> > +stop("absolute residuals from the median are essentially constant")
>
> >  a <- qnorm((1 + rank(abs(x)) / (n + 1)) / 2)
> >  STATISTIC <- sum(tapply(a, g, "sum")^2 / tapply(a, g, "length"))
>
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


-- 
Martin   http://stat.ethz.ch/~maechler
Seminar für Statistik, ETH Zürich HG G 16   Rämistrasse 101
CH-8092 Zurich, SWITZERLAND   ☎ +41 44 632 3408



Re: [Rd] From .Fortran to .Call?

2020-12-23 Thread Martin Maechler
> Balasubramanian Narasimhan 
> on Wed, 23 Dec 2020 08:34:40 -0800 writes:

> I think it should be pretty easy to fix up SUtools to use the .Call 
> instead of .Fortran following along the lines of

> https://github.com/wrathematics/Romp

> I too deal with a lot of f77 and so I will most likely finish it before 
> the new year, if not earlier. (Would welcome testers besides myself.)

> Incidentally, any idea of what the performance hit is, quantitatively? I 
> confess I never paid attention to it myself as most Fortran code I use 
> seems pretty fast, i.e. glmnet.

> -Naras

well, glmnet's src/*.f  code seems closer to assembly than to
even old fortran 77 style ..
which would not change when calling it via .Call() ...
;-)

The performance "hit" of using .Fortran() is probably almost entirely
due to the fact that .C() and .Fortran() now compulsorily *copy* their
arguments, whereas with .Call() you are enabled to shoot
yourself in both feet .. ;-)

Martin



> On 12/23/20 3:57 AM, Koenker, Roger W wrote:
>> Thanks to all and best wishes for a better 2021.
>> 
>> Unfortunately I remain somewhat confused:
>> 
>> o  Bill reveals an elegant way to get from my rudimentary  registration 
setup to
>> one that would explicitly type the C interface functions,
>> 
>> o Ivan seems to suggest that there would be no performance gain from 
doing this.
>> 
>> o  Naras’s pcLasso package does use the explicit C typing, but then uses 
.Fortran
>> not .Call.
>> 
>> o  Avi uses .Call and cites the Romp package 
https://github.com/wrathematics/Romp
>> where it is asserted that "there is a (nearly) deprecated interface 
.Fortran() which you
>> should not use due to its large performance overhead.”
>> 
>> As the proverbial naive R (ab)user I’m left wondering:
>> 
>> o  if I updated my quantreg_init.c file in accordance with Bill’s 
suggestion could I
>> then simply change my .Fortran calls to .Call?
>> 
>> o  and if so, do I need to include ALL the fortran subroutines in my src 
directory
>> or only the ones called from R?
>> 
>> o  and in either case could I really expect to see a significant 
performance gain?
>> 
>> Finally, perhaps I should stipulate that my fortran is strictly f77, so 
no modern features
>> are in play, indeed most of the code is originally written in ratfor, 
Brian Kernighan’s
>> dialect from ancient times at Bell Labs.
>> 
>> Again,  thanks to all for any advice,
>> Roger
>> 
>> 
>>> On Dec 23, 2020, at 1:11 AM, Avraham Adler  
wrote:
>>> 
>>> Hello, Ivan.
>>> 
>>> I used .Call instead of .Fortran in the Delaporte package [1]. What
>>> helped me out a lot was Drew Schmidt's Romp examples and descriptions
>>> [2]. If you are more comfortable with the older Fortran interface,
>>> Tomasz Kalinowski has a package which uses Fortran 2018 more
>>> efficiently [3]. I haven't tried this last in practice, however.
>>> 
>>> Hope that helps,
>>> 
>>> Avi
>>> 
>>> [1] https://CRAN.R-project.org/package=Delaporte
>>> [2] https://github.com/wrathematics/Romp
>>> [3] https://github.com/t-kalinowski/RFI
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Dec 22, 2020 at 7:24 PM Balasubramanian Narasimhan
>>>  wrote:
 To deal with such Fortran issues in several packages I deal with, I
 wrote the SUtools package (https://github.com/bnaras/SUtools) that you
 can try.  The current version generates the registration assuming
 implicit Fortran naming conventions though. (I've been meaning to
 upgrade it to use the gfortran -fc-prototypes-external flag which 
should
 be easy; I might just finish that during these holidays.)
 
 There's a vignette as well:
 
 https://bnaras.github.io/SUtools/articles/SUtools.html
 
 -Naras
 
 
 On 12/19/20 9:53 AM, Ivan Krylov wrote:
> On Sat, 19 Dec 2020 17:04:59 +
> "Koenker, Roger W"  wrote:
> 
> There are comments in various places, including R-extensions §5.4
> suggesting that .Fortran is (nearly) deprecated and hinting that use
> of .Call is more efficient and now preferred for packages.
> My understanding of §5.4 and 5.5 is that 
