Re: [Rd] A demonstrated shortcoming of the R package management system

2023-08-06 Thread Ben Bolker
  I would support this suggestion.  There is a similar binary 
dependency chain from Matrix → TMB → glmmTMB; we have implemented 
various checks to make users aware that they need to reinstall from 
source, and to some extent we've tried to push out synchronous updates 
(i.e., push an update of TMB to CRAN every time Matrix changes, and an 
update of glmmTMB after that), but centralized machinery for this would 
certainly be nice.


  FWIW some of the machinery is here: 
https://github.com/glmmTMB/glmmTMB/blob/d9ee7b043281341429381faa19b5e53cb5a378c3/glmmTMB/R/utils.R#L209-L295 
-- it relies on a Makefile rule that caches the current installed 
version of TMB: 
https://github.com/glmmTMB/glmmTMB/blob/d9ee7b043281341429381faa19b5e53cb5a378c3/glmmTMB/R/utils.R#L209-L295
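
  For illustration only, the general idea behind such a check can be
sketched as below. This is not the glmmTMB code linked above; the helper
name check_built_against() and its arguments are hypothetical, and the
sketch assumes the dependency version seen at build time has been stored
somewhere as a plain string.

```r
## Hypothetical helper (not part of glmmTMB or TMB): compare the version of a
## dependency that was recorded at build time with the one installed now.
check_built_against <- function(dep, built_version) {
  current <- utils::packageVersion(dep)
  ok <- current == package_version(built_version)
  if (!ok) {
    ## Warn rather than stop, so the user can still decide what to do
    warning(sprintf(
      "Built against %s %s but %s %s is installed; consider reinstalling from source.",
      dep, built_version, dep, as.character(current)),
      call. = FALSE)
  }
  invisible(ok)
}
```

A package could call something like this from .onLoad(), with the cached
version string written at build time.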



  cheers
   Ben Bolker


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] A demonstrated shortcoming of the R package management system

2023-08-06 Thread Dirk Eddelbuettel


CRAN, by relying on the powerful package management system that is part of R,
provides an unparalleled framework for extending R with nearly 20k packages.

We recently encountered an issue that highlights a missing element in the
otherwise outstanding package management system. So we would like to start a
discussion about enhancing its feature set. As shown below, a mechanism to
force reinstallation of packages may be needed.

A demo is included below; it is reproducible in a container. We find the
easiest/fastest reproduction is to save the code snippet below in the
current directory as, e.g., 'matrixIssue.R' and run it in a container as

   docker run --rm -ti -v `pwd`:/mnt rocker/r2u Rscript /mnt/matrixIssue.R

This runs in under two minutes: it first installs the older Matrix, next
installs SeuratObject, and then removes the older Matrix, making the
(already installed) current Matrix version the default. This simulates a
package update for Matrix which, as the final snippet demonstrates, silently
breaks SeuratObject as the cached S4 method Csparse_validate is now missing.
So a SeuratObject installed under Matrix 1.5.1 becomes unusable
under Matrix 1.6.0.

What this shows is that a call to update.packages() will silently corrupt an
existing installation.  We understand that this was known and addressed at
CRAN by rebuilding all binary packages (for macOS and Windows).

But it leaves both users relying on source installation as well as
distributors of source packages in a dire situation. It hurt me three times:
my default R installation was affected with unit tests (involving
SeuratObject) silently failing. It similarly broke our CI setup at work.  And
it created a fairly bad headache for the Debian packaging I am involved with
(and I surmise it affects other distros similarly).

It would be good to have a mechanism where a package, when being upgraded,
could flag that 'more actions are required' by the system (administrator).
We think this example demonstrates that we need such a mechanism to avoid
(silently !!) breaking existing installations, possibly by forcing
reinstallation of other packages.  R knows the package dependency graph and
could trigger this, possibly after an 'opt-in' variable the user / admin
sets.
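
As a rough illustration of that last point, the set of installed packages
that might need rebuilding can already be computed from the dependency
graph with tools::dependsOnPkgs(). The wrapper name needs_rebuild() is made
up for this sketch, and whether 'Depends', 'Imports' and 'LinkingTo' are the
right edges to follow is an assumption.

```r
## Hypothetical wrapper: list installed packages that depend, directly or
## indirectly, on 'pkg' and so are candidates for reinstallation.
needs_rebuild <- function(pkg, lib.loc = NULL) {
  tools::dependsOnPkgs(pkg,
                       dependencies = c("Depends", "Imports", "LinkingTo"),
                       recursive = TRUE,
                       lib.loc = lib.loc)
}

## Not run here: reinstall those packages from source after an update
## install.packages(needs_rebuild("Matrix"), type = "source")
```

This only identifies candidates; deciding which of them are actually broken
by a given update is the part that needs new machinery.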

One possibility may be to add a new (versioned) field 'Breaks:'. Matrix could
then have added 'Breaks: SeuratObject (<= 4.1.3)' preventing an installation
of Matrix 1.6.0 when SeuratObject 4.1.3 (or earlier) is present, but
permitting an update to Matrix 1.6.0 alongside a new version, say, 4.1.4 of
SeuratObject which could itself have a versioned Depends: Matrix (>= 1.6.0).
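
To make the idea concrete, here is a minimal sketch of how one such entry
could be evaluated. No 'Breaks:' field exists in R today; parse_breaks() and
is_broken_by() are hypothetical names, and the entry syntax is simply
assumed to mirror the existing versioned 'Depends:' syntax.

```r
## Hypothetical: split an entry such as 'SeuratObject (<= 4.1.3)' into
## package name, comparison operator and version.
parse_breaks <- function(spec) {
  pattern <- paste0("^[[:space:]]*([[:alnum:].]+)[[:space:]]*",
                    "\\((<=|>=|==|<|>)[[:space:]]*([0-9.-]+)\\)[[:space:]]*$")
  m <- regmatches(spec, regexec(pattern, spec))[[1]]
  if (length(m) != 4L) stop("cannot parse Breaks entry: ", spec)
  list(package = m[2], op = m[3], version = package_version(m[4]))
}

## Hypothetical: would an installed version be flagged by the entry?
is_broken_by <- function(spec, installed_version) {
  b <- parse_breaks(spec)
  v <- package_version(installed_version)
  switch(b$op,
         "<=" = v <= b$version,
         ">=" = v >= b$version,
         "==" = v == b$version,
         "<"  = v <  b$version,
         ">"  = v >  b$version)
}

is_broken_by("SeuratObject (<= 4.1.3)", "4.1.3")  # flagged
is_broken_by("SeuratObject (<= 4.1.3)", "4.1.4")  # not flagged
```

An installer could run such a check against the installed packages before
accepting the Matrix update.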

Regards,  Dirk


## Code example follows. Recommended to run the rocker/r2u container.
## Could also run 'apt update -qq; apt upgrade -y' but not required
## Thanks to my colleague Paul Hoffman for the core of this example

## now have Matrix 1.6.0 because r2u and CRAN remain current but we can install
## an older Matrix
remotes::install_version('Matrix', '1.5.1')

## we can confirm that we have Matrix 1.5.1
packageVersion("Matrix")

## we now install SeuratObject from source and to speed things up we first
## install the binary
install.packages("SeuratObject")   # in this container via bspm/r2u as binary
## and then force a source installation (turning bspm off) _while Matrix is at
## 1.5.1_
if (requireNamespace("bspm", quietly=TRUE)) bspm::disable()
Sys.setenv(PKG_CXXFLAGS='-Wno-ignored-attributes')  # Eigen compilation noise silencer
install.packages('SeuratObject')

## we now remove the Matrix package version 1.5.1 we installed into /usr/local
## leaving 1.6.0
remove.packages("Matrix")
packageVersion("Matrix")

## and we now run a bit of SeuratObject code that is now broken as
## Csparse_validate is gone
suppressMessages(library(SeuratObject))
data('pbmc_small')
graph <- pbmc_small[['RNA_snn']]
class(graph)
getClass('Graph')
show(graph) # this fails


-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [Rd] hist(..., log="y")

2023-08-06 Thread David Winsemius
I guess my memory was off slightly. Densities are only plotted with freq=FALSE. 
Still, there is the ever-present conundrum that 0 counts cannot be sensibly 
represented on a log scale. 

Why not:
hist( log(x), …) #? In situations where it might make sense. 
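
If log-scaled counts are what is wanted, one workaround is to let hist()
do only the binning and draw the result separately. The sketch below does
that; the function names log_hist_counts() and plot_log_hist() are made up
here, and empty bins are simply dropped because they have no position on a
log axis.

```r
## Bin with hist() but do not plot; keep only non-empty bins.
log_hist_counts <- function(x, ...) {
  h <- hist(x, plot = FALSE, ...)
  keep <- h$counts > 0
  data.frame(mid = h$mids[keep], count = h$counts[keep])
}

## Draw the counts as vertical lines on a log-scaled y axis.
plot_log_hist <- function(x, ...) {
  d <- log_hist_counts(x, ...)
  plot(d$mid, d$count, type = "h", log = "y", lwd = 4,
       xlab = "x", ylab = "count (log scale)")
  invisible(d)
}
```

For example, plot_log_hist(rexp(1000, 1)) gives a log-count view of the
data from the original post.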



> On Aug 6, 2023, at 9:01 AM, David Winsemius  wrote:
> 
> hist() is designed so that the total area sums to 1. You should build your 
> desired behavior using a barchart. 
> 
> — 
> David
> 
> 
> 
>> On Aug 5, 2023, at 11:50 PM, Ott Toomet  wrote:
>> 
>> Sorry if this topic has been discussed earlier.
>> 
>> Currently, hist(..., log="y") fails with
>> 
>>> hist(rexp(1000, 1), log="y")
>> Warning messages:
>> 1: In plot.window(xlim, ylim, "", ...) :
>> nonfinite axis=2 limits [GScale(-inf,2.59218,..); log=TRUE] -- corrected
>> now
>> 2: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) :
>> "log" is not a graphical parameter
>> 3: In axis(1, ...) : "log" is not a graphical parameter
>> 4: In axis(2, at = yt, ...) : "log" is not a graphical parameter
>> 
>> The same applies for log="x"
>> 
>>> hist(rexp(1000, 1), log="x")
>> Warning messages:
>> 1: In plot.window(xlim, ylim, "", ...) :
>> nonfinite axis=1 limits [GScale(-inf,0.954243,..); log=TRUE] -- corrected
>> now
>> 2: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) :
>> "log" is not a graphical parameter
>> 3: In axis(1, ...) : "log" is not a graphical parameter
>> 4: In axis(2, at = yt, ...) : "log" is not a graphical parameter
>> 
>> This applies for the current svn version of R, and also a few recent
>> published versions.  This is unfortunate for two reasons:
>> 
>> * the error message is not quite correct--"log" is a graphical parameter,
>> but "hist" does not support it.
>> * for various kinds of data it is worthwhile to make histograms in log
>> scale.  "hist" is a very nice and convenient function and support for log
>> scale would be handy here.
>> 
>> I also played a little with the code, and it seems to be very easy to
>> implement.  I am happy to make a  patch if the team thinks it is worth
>> pursuing.
>> 
>> Cheers,
>> Ott
>> 


Re: [Rd] hist(..., log="y")

2023-08-06 Thread David Winsemius
hist() is designed so that the total area sums to 1. You should build your 
desired behavior using a barchart. 

— 
David



> On Aug 5, 2023, at 11:50 PM, Ott Toomet  wrote:
> 
> Sorry if this topic has been discussed earlier.
> 
> Currently, hist(..., log="y") fails with
> 
>> hist(rexp(1000, 1), log="y")
> Warning messages:
> 1: In plot.window(xlim, ylim, "", ...) :
>  nonfinite axis=2 limits [GScale(-inf,2.59218,..); log=TRUE] -- corrected
> now
> 2: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) :
>  "log" is not a graphical parameter
> 3: In axis(1, ...) : "log" is not a graphical parameter
> 4: In axis(2, at = yt, ...) : "log" is not a graphical parameter
> 
> The same applies for log="x"
> 
>> hist(rexp(1000, 1), log="x")
> Warning messages:
> 1: In plot.window(xlim, ylim, "", ...) :
>  nonfinite axis=1 limits [GScale(-inf,0.954243,..); log=TRUE] -- corrected
> now
> 2: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) :
>  "log" is not a graphical parameter
> 3: In axis(1, ...) : "log" is not a graphical parameter
> 4: In axis(2, at = yt, ...) : "log" is not a graphical parameter
> 
> This applies for the current svn version of R, and also a few recent
> published versions.  This is unfortunate for two reasons:
> 
> * the error message is not quite correct--"log" is a graphical parameter,
> but "hist" does not support it.
> * for various kinds of data it is worthwhile to make histograms in log
> scale.  "hist" is a very nice and convenient function and support for log
> scale would be handy here.
> 
> I also played a little with the code, and it seems to be very easy to
> implement.  I am happy to make a  patch if the team thinks it is worth
> pursuing.
> 
> Cheers,
> Ott
> 


Re: [Rd] HTML documentation check works best with Tidy >= 5.0.0

2023-08-06 Thread Ivan Krylov
On Sun, 6 Aug 2023 12:18:09 +0200,
Kurt Hornik wrote:

> IIRC all Linux versions advertise themselves as something like
> 
>   HTML Tidy for Linux version 5.8.0
> 
> What about Windows and macOS?

I've checked the "modern" Windows binaries of HTML Tidy, and they say
so too. Cannot check the macOS version easily.

I think that any released version older than 5.0.0 (before the
development moved from https://tidy.sf.net/ to
https://www.html-tidy.org/ in 2015) will only identify itself by the
release date. Judging by an R-SIG-Mac thread I've found, "Apple Inc.
build 2649" that may be bundled with some macOS versions must be of the
SourceForge vintage too, before they started using version numbers.

There are commits in the "modern" Tidy source tree using 4.x.x version
numbers, but I don't think they were considered to be formally
released. For example, Debian went from the SourceForge CVS snapshots
straight to 5.2.0 in 2016 without packaging the 4.x.x versions.
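
Under that assumption, a check could look for an x.y.z version number in
the banner and treat its absence as "older than 5.0.0". A minimal sketch
follows; tidy_is_modern() is a made-up name, and the date-only banner used
in the second call is a hypothetical example of the old format, not output
I have verified.

```r
## Hypothetical helper: TRUE if the banner carries an x.y.z version number
## that is at least 'minimum'; FALSE if it is lower or missing entirely.
tidy_is_modern <- function(banner, minimum = "5.0.0") {
  m <- regmatches(banner, regexpr("[0-9]+(\\.[0-9]+){2}", banner))
  length(m) == 1L && numeric_version(m) >= minimum
}

tidy_is_modern("HTML Tidy for Linux version 5.8.0")
tidy_is_modern("HTML Tidy released on 25 March 2009")  # hypothetical old banner
```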

-- 
Best regards,
Ivan



Re: [Rd] feature request: optim() iteration of functions that return multiple values

2023-08-06 Thread Ott Toomet
I have done this using attributes:

fr <- function(x) {   ## Rosenbrock Banana function
   x1 <- x[1]
   x2 <- x[2]
   ans <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
   attr(ans, "extra1") <- 1:10
   attr(ans, "extra2") <- letters
   ans
}

Not sure if this works in your case though.
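
For completeness, a sketch of how the extras can be picked up after
convergence: optim() only uses the numeric value, so the function is
evaluated once more at the optimum and its attributes are read off. Storing
them under the name 'aux' follows the original request and is otherwise an
arbitrary choice.

```r
fr <- function(x) {   ## Rosenbrock Banana function, extras as attributes
   x1 <- x[1]
   x2 <- x[2]
   ans <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
   attr(ans, "extra1") <- 1:10
   attr(ans, "extra2") <- letters
   ans
}

res <- optim(c(-1.2, 1), fr)
## One extra evaluation at the optimum recovers the additional values
res$aux <- attributes(fr(res$par))
```

This costs one additional function evaluation, which is usually negligible
next to the optimization itself.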

Cheers,
Ott

On Sat, Aug 5, 2023 at 1:13 AM Martin Becker <
martin.bec...@mx.uni-saarland.de> wrote:

> For a solution that does not require any change to the original function
> being optimized, the following one-liner could be used, which converts
> existing functions to functions that return only the first element:
>
> returnFirst <- function(fun) function(...) do.call(fun,list(...))[[1]]
>
> Example:
>
> fr <- function(x) {   ## Rosenbrock Banana function
>    x1 <- x[1]
>    x2 <- x[2]
>    ans <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
>    list(ans=ans, extra1 = 1:10, extra2 = letters)
> }
>
> fr2 <- returnFirst(fr)
> tmp <- optim(c(-1.2,1), fr2)
> fr(tmp$par)
>
>
> On 03.08.23 at 22:21, Sami Tuomivaara wrote:
> > Dear all,
> >
> > I have used optim a lot in contexts where it would be useful to be able to
> > iterate function myfun that, in addition to the primary objective to be
> > minimized ('minimize.me'), could return other values such as alternative
> > metrics of the minimization, informative intermediate values from the
> > calculations, etc.
> >
> > myfun  <- function()
> > {
> > ...
> > return(list(minimize.me = minimize.me, R2 = R2, pval = pval, etc.))
> > }
> >
> > During the iteration, optim could utilize just the first value from the
> > myfun return list; all the other values calculated and returned by myfun
> > could be ignored by optim.
> > After convergence, the other return values of myfun could be finally
> > extracted and appended into the optim return value (which is a list) as
> > an additional entry e.g.: $aux <- list(R2, pval, etc.), (without
> > 'minimize.me' as it is already returned as $value).
> >
> > The usual ways for accessing optim return values, e.g., $par, $value,
> > etc. are not affected.  Computational cost may not be prohibitive either.
> > Is this feasible to consider?
> >
> >


[Rd] hist(..., log="y")

2023-08-06 Thread Ott Toomet
Sorry if this topic has been discussed earlier.

Currently, hist(..., log="y") fails with

> hist(rexp(1000, 1), log="y")
Warning messages:
1: In plot.window(xlim, ylim, "", ...) :
  nonfinite axis=2 limits [GScale(-inf,2.59218,..); log=TRUE] -- corrected
now
2: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) :
  "log" is not a graphical parameter
3: In axis(1, ...) : "log" is not a graphical parameter
4: In axis(2, at = yt, ...) : "log" is not a graphical parameter

The same applies for log="x"

> hist(rexp(1000, 1), log="x")
Warning messages:
1: In plot.window(xlim, ylim, "", ...) :
  nonfinite axis=1 limits [GScale(-inf,0.954243,..); log=TRUE] -- corrected
now
2: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) :
  "log" is not a graphical parameter
3: In axis(1, ...) : "log" is not a graphical parameter
4: In axis(2, at = yt, ...) : "log" is not a graphical parameter

This applies for the current svn version of R, and also a few recent
published versions.  This is unfortunate for two reasons:

* the error message is not quite correct--"log" is a graphical parameter,
but "hist" does not support it.
* for various kinds of data it is worthwhile to make histograms in log
scale.  "hist" is a very nice and convenient function and support for log
scale would be handy here.

I also played a little with the code, and it seems to be very easy to
implement.  I am happy to make a  patch if the team thinks it is worth
pursuing.

Cheers,
Ott
