Re: [Rd] Is it possible to increase MAX_NUM_DLLS in future R releases?

2016-05-10 Thread Henrik Bengtsson
Isn't the problem in Qin's example that unloadNamespace("scde") only
unloads 'scde' but none of its package dependencies that were loaded
when 'scde' was loaded.  For example:

$ R --vanilla
> ns0 <- loadedNamespaces()
> dlls0 <- getLoadedDLLs()

> packageDescription("scde")[c("Depends", "Imports")]
$Depends
[1] "R (>= 3.0.0), flexmix"

$Imports
[1] "Rcpp (>= 0.10.4), RcppArmadillo (>= 0.5.400.2.0), mgcv, Rook, rjson, MASS,
 Cairo, RColorBrewer, edgeR, quantreg, methods, nnet, RMTstat, extRemes, pcaMet
hods, BiocParallel, parallel"

> loadNamespace("scde")
> ns1 <- loadedNamespaces()
> dlls1 <- getLoadedDLLs()

> nsAdded <- setdiff(ns1, ns0)
> nsAdded
 [1] "flexmix"       "Rcpp"          "edgeR"         "splines"
 [5] "BiocGenerics"  "MASS"          "BiocParallel"  "scde"
 [9] "lattice"       "rjson"         "brew"          "RcppArmadillo"
[13] "minqa"         "distillery"    "car"           "tools"
[17] "Rook"          "Lmoments"      "nnet"          "parallel"
[21] "pbkrtest"      "RMTstat"       "grid"          "Biobase"
[25] "nlme"          "mgcv"          "quantreg"      "modeltools"
[29] "MatrixModels"  "lme4"          "Matrix"        "nloptr"
[33] "RColorBrewer"  "extRemes"      "limma"         "pcaMethods"
[37] "stats4"        "SparseM"       "Cairo"

> dllsAdded <- setdiff(names(dlls1), names(dlls0))
> dllsAdded
 [1] "Cairo"         "parallel"      "limma"         "edgeR"
 [5] "MASS"          "rjson"         "Rcpp"          "grid"
 [9] "lattice"       "Matrix"        "SparseM"       "quantreg"
[13] "nnet"          "nlme"          "mgcv"          "Biobase"
[17] "pcaMethods"    "splines"       "minqa"         "nloptr"
[21] "lme4"          "extRemes"      "RcppArmadillo" "tools"
[25] "Rook"          "scde"


If you unload these namespaces, I think the DLLs will also be
unloaded; at least they should be, provided the packages implement an
.onUnload() that calls dyn.unload().  More on this below.
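
For reference, the conventional hook looks like the following sketch
(the package name 'mypkg' is a placeholder); library.dynam.unload()
undoes the library.dynam() call made when the namespace was loaded:

```r
## In a hypothetical package 'mypkg' (e.g. in R/zzz.R): unload the
## package's DLL when its namespace is unloaded, so the DLL slot
## counted against MAX_NUM_DLLS is freed again.
.onUnload <- function(libpath) {
  library.dynam.unload("mypkg", libpath)
}
```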


To unload these added namespaces (and their DLLs), they have to be
unloaded in an order that does not break the dependency graph of the
currently loaded packages; otherwise you'll get errors such as:

> unloadNamespace("quantreg")
Error in unloadNamespace("quantreg") :
  namespace 'quantreg' is imported by 'car', 'scde' so cannot be unloaded

I don't know if there exists a function that unloads namespaces in
the proper order, but here is a brute-force version:

unloadNamespaces <- function(ns, ...) {
  while (length(ns) > 0) {
    ns0 <- loadedNamespaces()
    for (name in ns) {
      try(unloadNamespace(name), silent = TRUE)
    }
    ns1 <- loadedNamespaces()
    ## No namespace was unloaded in this round?
    if (identical(ns1, ns0)) break
    ns <- intersect(ns, ns1)
  }
  if (length(ns) > 0) stop("Failed to unload namespace: ",
                           paste(sQuote(ns), collapse = ", "))
} # unloadNamespaces()


When I run the above on R 3.3.0 patched on Windows, I get:

> unloadNamespaces(nsAdded)
now dyn.unload("C:/Users/hb/R/win-library/3.3/scde/libs/x64/scde.dll") ...
> ns2 <- loadedNamespaces()
> dlls2 <- getLoadedDLLs()
> ns2
[1] "grDevices" "utils" "stats" "datasets"  "base"  "graphics"
[7] "methods"
> identical(sort(ns2), sort(ns0))
[1] TRUE


However, there are some namespaces for which the DLLs are still loaded:

> sort(setdiff(names(dlls2), names(dlls0)))
 [1] "Cairo"         "edgeR"         "extRemes"      "minqa"
 [5] "nloptr"        "pcaMethods"    "quantreg"      "Rcpp"
 [9] "RcppArmadillo" "rjson"         "Rook"          "SparseM"


If we look for .onUnload() in the packages that load DLLs, we find
that the following do not have an .onUnload() and therefore probably
never call dyn.unload() when the package is unloaded:

> sort(dllsAdded[!sapply(dllsAdded, FUN=function(pkg) {
+   ns <- getNamespace(pkg)
+   exists(".onUnload", envir=ns, inherits=FALSE)
+ })])
 [1] "Cairo"         "edgeR"         "extRemes"      "minqa"
 [5] "nloptr"        "pcaMethods"    "quantreg"      "Rcpp"
 [9] "RcppArmadillo" "rjson"         "Rook"          "SparseM"


That doesn't look like a coincidence to me.  Maybe `R CMD check`
should, in addition to checking that the namespace of a package can be
unloaded, also assert that unloading it unloads whatever DLLs the
package loaded.  Something like:

* checking whether the namespace can be unloaded cleanly ... WARNING
  Unloading the namespace does not unload DLL

At least I don't think this is tested for, e.g.
https://cran.r-project.org/web/checks/check_results_Cairo.html and
https://cran.r-project.org/web/checks/check_results_Rcpp.html.

/Henrik


On Mon, May 9, 2016 at 11:57 PM, Martin Maechler
 wrote:
>> Qin Zhu 
>> on Fri, 6 May 2016 11:33:37 -0400 writes:
>
> > Thanks for all your great answers.
> > The app I’m working on is indeed an exploratory data analysis tool for 
> gene expression, which requires a bunch of bioconductor packages.
>
> > I guess for now, my best solution is to divide my app into modules and 
> load/unload packages 

Re: [Rd] Is it possible to increase MAX_NUM_DLLS in future R releases?

2016-05-10 Thread Martin Maechler
> Qin Zhu 
> on Fri, 6 May 2016 11:33:37 -0400 writes:

> Thanks for all your great answers. 
> The app I’m working on is indeed an exploratory data analysis tool for 
gene expression, which requires a bunch of bioconductor packages. 

> I guess for now, my best solution is to divide my app into modules and 
> load/unload packages as the user switches from one module to another.

> This brought me another question: it seems that unloading a package with 
detach()/unloadNamespace() does not unload the DLLs, or, in the case of the 
"SCDE" package, not all dependent DLLs:

>> length(getLoadedDLLs())
> [1] 9
>> requireNamespace("scde")
> Loading required namespace: scde
>> length(getLoadedDLLs())
> [1] 34
>> unloadNamespace("scde")
> now 
dyn.unload("/Library/Frameworks/R.framework/Versions/3.3/Resources/library/scde/libs/scde.so")
 ...
>> length(getLoadedDLLs())
> [1] 33

> Does that mean I should use dyn.unload() to unload whatever I think is 
associated with that package when the user’s done using it? I’m a little 
nervous about this because it seems to be OS dependent, and previous versions 
of my app run on both Windows and Macs. 

Hmm, I thought that  dyn.unload() would typically work on all
platforms, but did not research the question now, and am happy
to learn more by being corrected.

Even if we increase MAX_NUM_DLLS in the future, a considerable
portion of your app's users will not be running that future version of
R yet, and so you should try to "fight" the problem now.

> Any suggestions would be appreciated, and I’d appreciate if the 
MAX_NUM_DLLS can be increased.

> Thanks,
> Qin


>> On May 4, 2016, at 9:17 AM, Martin Morgan 
 wrote:
>> 
>> 
>> 
>> On 05/04/2016 05:15 AM, Prof Brian Ripley wrote:
>>> On 04/05/2016 08:44, Martin Maechler wrote:
> Qin Zhu 
> on Mon, 2 May 2016 16:19:44 -0400 writes:
 
 > Hi,
 > I’m working on a Shiny app for statistical analysis. I ran into
 this "maximal number of DLLs reached" issue recently because my app
 requires importing many other packages.
 
 > I’ve posted my question on stackoverflow
 
(http://stackoverflow.com/questions/36974206/r-maximal-number-of-dlls-reached
 
).
 
 
 > I’m just wondering is there any reason to set the maximal
 number of DLLs to be 100, and is there any plan to increase it/not
 hardcoding it in the future? It seems many people are also running
 into this problem. I know I can work around this problem by modifying
 the source, but since my package is going to be used by other people,
 I don’t think this is a feasible solution.
 
 > Any suggestions would be appreciated. Thanks!
 > Qin
 
 Increasing that number is of course "possible"... but it also
 costs a bit (adding to the fixed memory footprint of R).
>>> 
>>> And not only that.  At the time this was done (and it was once 50) the
>>> main cost was searching DLLs for symbols.  That is still an issue, and
>>> few packages exclude their DLL from symbol search, so if symbols have to
>>> be searched for, a lot of DLLs will be searched.  (Registering all the
>>> symbols needed in a package avoids a search, and nowadays by default
>>> searches from a namespace are restricted to that namespace.)
>>> 
>>> See
>>> 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Registering-native-routines
>>> for some further details about the search mechanism.
>>> 
 I did not set that limit, but I'm pretty sure it was also meant
 as reminder for the useR to "clean up" a bit in her / his R
 session, i.e., not load package namespaces unnecessarily. I
 cannot yet imagine that you need > 100 packages | namespaces
 loaded in your R session. OTOH, some packages nowadays have a
 host of dependencies, so I agree that this at least may happen
 accidentally more frequently than in the past.
>>> 
>>> I am not convinced that it is needed.  The OP says he imports many
>>> packages, and I doubt that more than a few are required at any one time.
>>> Good practice is to load namespaces as required, using requireNamespace.
>> 
>> Extensive package dependencies in Bioconductor make it pretty easy to 
end up with dozens of packages attached or loaded. For instance
>> 
>> library(GenomicFeatures)
>> library(DESeq2)
>> 
>> > length(loadedNamespaces())
>> [1] 63
>> > length(getLoadedDLLs())
>> [1] 41
>> 
>> Qin's use case is a shiny app, presumably trying to provide relatively 
comprehensive access to a 

Re: [Rd] Is it possible to increase MAX_NUM_DLLS in future R releases?

2016-05-06 Thread Qin Zhu
Thanks for all your great answers. 

The app I’m working on is indeed an exploratory data analysis tool for gene 
expression, which requires a bunch of bioconductor packages. 

I guess for now, my best solution is to divide my app into modules and 
load/unload packages as the user switches from one module to another.

This brought me another question: it seems that unloading a package with 
detach()/unloadNamespace() does not unload the DLLs, or, in the case of the 
"SCDE" package, not all dependent DLLs:

> length(getLoadedDLLs())
[1] 9
> requireNamespace("scde")
Loading required namespace: scde
> length(getLoadedDLLs())
[1] 34
> unloadNamespace("scde")
now 
dyn.unload("/Library/Frameworks/R.framework/Versions/3.3/Resources/library/scde/libs/scde.so")
 ...
> length(getLoadedDLLs())
[1] 33
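
A brute-force cleanup along these lines could diff getLoadedDLLs()
snapshots and dyn.unload() the leftovers.  This is only a sketch of a
workaround, not an endorsed API, and forcibly unloading a DLL whose
code is still referenced can crash R:

```r
## Snapshot the loaded DLLs, load and unload 'scde', then force-unload
## whatever DLLs were pulled in but not released.  Last-resort only:
## unloading a DLL that is still in use can crash the R session.
dlls_before <- getLoadedDLLs()
requireNamespace("scde")
unloadNamespace("scde")
leftovers <- setdiff(names(getLoadedDLLs()), names(dlls_before))
for (nm in leftovers) {
  dyn.unload(getLoadedDLLs()[[nm]][["path"]])
}
```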

Does that mean I should use dyn.unload() to unload whatever I think is 
associated with that package when the user’s done using it? I’m a little 
nervous about this because it seems to be OS dependent, and previous versions 
of my app run on both Windows and Macs. 

Any suggestions would be appreciated, and I’d appreciate if the MAX_NUM_DLLS 
can be increased.

Thanks,
Qin


> On May 4, 2016, at 9:17 AM, Martin Morgan  
> wrote:
> 
> 
> 
> On 05/04/2016 05:15 AM, Prof Brian Ripley wrote:
>> On 04/05/2016 08:44, Martin Maechler wrote:
 Qin Zhu 
 on Mon, 2 May 2016 16:19:44 -0400 writes:
>>> 
>>> > Hi,
>>> > I’m working on a Shiny app for statistical analysis. I ran into
>>> this "maximal number of DLLs reached" issue recently because my app
>>> requires importing many other packages.
>>> 
>>> > I’ve posted my question on stackoverflow
>>> (http://stackoverflow.com/questions/36974206/r-maximal-number-of-dlls-reached
>>> ).
>>> 
>>> 
>>> > I’m just wondering is there any reason to set the maximal
>>> number of DLLs to be 100, and is there any plan to increase it/not
>>> hardcoding it in the future? It seems many people are also running
>>> into this problem. I know I can work around this problem by modifying
>>> the source, but since my package is going to be used by other people,
>>> I don’t think this is a feasible solution.
>>> 
>>> > Any suggestions would be appreciated. Thanks!
>>> > Qin
>>> 
>>> Increasing that number is of course "possible"... but it also
>>> costs a bit (adding to the fixed memory footprint of R).
>> 
>> And not only that.  At the time this was done (and it was once 50) the
>> main cost was searching DLLs for symbols.  That is still an issue, and
>> few packages exclude their DLL from symbol search, so if symbols have to
>> be searched for, a lot of DLLs will be searched.  (Registering all the
>> symbols needed in a package avoids a search, and nowadays by default
>> searches from a namespace are restricted to that namespace.)
>> 
>> See
>> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Registering-native-routines
>> for some further details about the search mechanism.
>> 
>>> I did not set that limit, but I'm pretty sure it was also meant
>>> as reminder for the useR to "clean up" a bit in her / his R
>>> session, i.e., not load package namespaces unnecessarily. I
>>> cannot yet imagine that you need > 100 packages | namespaces
>>> loaded in your R session. OTOH, some packages nowadays have a
>>> host of dependencies, so I agree that this at least may happen
>>> accidentally more frequently than in the past.
>> 
>> I am not convinced that it is needed.  The OP says he imports many
>> packages, and I doubt that more than a few are required at any one time.
>>  Good practice is to load namespaces as required, using requireNamespace.
> 
> Extensive package dependencies in Bioconductor make it pretty easy to end up 
> with dozens of packages attached or loaded. For instance
> 
>  library(GenomicFeatures)
>  library(DESeq2)
> 
> > length(loadedNamespaces())
> [1] 63
> > length(getLoadedDLLs())
> [1] 41
> 
> Qin's use case is a shiny app, presumably trying to provide relatively 
> comprehensive access to a particular domain. Even if the app were to load / 
> requireNamespace() (this requires considerable programming discipline to 
> ensure that the namespace is available on all programming paths where it is 
> used), it doesn't seem at all improbable that the user in an exploratory 
> analysis would end up accessing dozens of packages with orthogonal 
> dependencies. This is also the use case with Karl Forner's post 
> https://stat.ethz.ch/pipermail/r-devel/2015-May/071104.html 
>  (adding 
> library(crlmm) to the above gets us to 53 DLLs).
> 
>> 
>>> The real solution of course would be a code improvement that
>>> starts with a relatively small number of "DLLinfo" structures
>>> (say 32), and then allocates more batches (of size say 32) if
>>> needed.
>> 

Re: [Rd] Is it possible to increase MAX_NUM_DLLS in future R releases?

2016-05-04 Thread Martin Morgan



On 05/04/2016 05:15 AM, Prof Brian Ripley wrote:

On 04/05/2016 08:44, Martin Maechler wrote:

Qin Zhu 
 on Mon, 2 May 2016 16:19:44 -0400 writes:


 > Hi,
 > I’m working on a Shiny app for statistical analysis. I ran into
this "maximal number of DLLs reached" issue recently because my app
requires importing many other packages.

 > I’ve posted my question on stackoverflow
(http://stackoverflow.com/questions/36974206/r-maximal-number-of-dlls-reached
).


 > I’m just wondering is there any reason to set the maximal
number of DLLs to be 100, and is there any plan to increase it/not
hardcoding it in the future? It seems many people are also running
into this problem. I know I can work around this problem by modifying
the source, but since my package is going to be used by other people,
I don’t think this is a feasible solution.

 > Any suggestions would be appreciated. Thanks!
 > Qin

Increasing that number is of course "possible"... but it also
costs a bit (adding to the fixed memory footprint of R).


And not only that.  At the time this was done (and it was once 50) the
main cost was searching DLLs for symbols.  That is still an issue, and
few packages exclude their DLL from symbol search, so if symbols have to
be searched for, a lot of DLLs will be searched.  (Registering all the
symbols needed in a package avoids a search, and nowadays by default
searches from a namespace are restricted to that namespace.)

See
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Registering-native-routines
for some further details about the search mechanism.


I did not set that limit, but I'm pretty sure it was also meant
as reminder for the useR to "clean up" a bit in her / his R
session, i.e., not load package namespaces unnecessarily. I
cannot yet imagine that you need > 100 packages | namespaces
loaded in your R session. OTOH, some packages nowadays have a
host of dependencies, so I agree that this at least may happen
accidentally more frequently than in the past.


I am not convinced that it is needed.  The OP says he imports many
packages, and I doubt that more than a few are required at any one time.
  Good practice is to load namespaces as required, using requireNamespace.


Extensive package dependencies in Bioconductor make it pretty easy to 
end up with dozens of packages attached or loaded. For instance


  library(GenomicFeatures)
  library(DESeq2)

> length(loadedNamespaces())
[1] 63
> length(getLoadedDLLs())
[1] 41

Qin's use case is a shiny app, presumably trying to provide relatively 
comprehensive access to a particular domain. Even if the app were to 
load / requireNamespace() (this requires considerable programming 
discipline to ensure that the namespace is available on all programming 
paths where it is used), it doesn't seem at all improbable that the user 
in an exploratory analysis would end up accessing dozens of packages 
with orthogonal dependencies. This is also the use case with Karl 
Forner's post 
https://stat.ethz.ch/pipermail/r-devel/2015-May/071104.html (adding 
library(crlmm) to the above gets us to 53 DLLs).





The real solution of course would be a code improvement that
starts with a relatively small number of "DLLinfo" structures
(say 32), and then allocates more batches (of size say 32) if
needed.


The problem of course is that such code will rarely be exercised, and
people have made errors on the boundaries (here multiples of 32) many
times in the past.  (Note too that DLLs can be removed as well as added,
another point of coding errors.)


That argues for a simple increase in the maximum number of DLLs. This 
would enable some people to have very bulky applications that pay a 
performance cost (but the cost here is in small fractions of a 
second...) in terms of symbol look-up (and collision?), but would have 
no consequence for those of us with more sane use cases.


Martin Morgan




Patches to the R sources (development trunk in subversion at
https://svn.r-project.org/R/trunk/ ) are very welcome!

Martin Maechler
ETH Zurich  &  R Core Team








__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Is it possible to increase MAX_NUM_DLLS in future R releases?

2016-05-04 Thread Prof Brian Ripley

On 04/05/2016 08:44, Martin Maechler wrote:

Qin Zhu 
 on Mon, 2 May 2016 16:19:44 -0400 writes:


 > Hi,
 > I’m working on a Shiny app for statistical analysis. I ran into this "maximal 
number of DLLs reached" issue recently because my app requires importing many other 
packages.

 > I’ve posted my question on stackoverflow 
(http://stackoverflow.com/questions/36974206/r-maximal-number-of-dlls-reached 
).

 > I’m just wondering is there any reason to set the maximal number of DLLs 
to be 100, and is there any plan to increase it/not hardcoding it in the future? 
It seems many people are also running into this problem. I know I can work around 
this problem by modifying the source, but since my package is going to be used by 
other people, I don’t think this is a feasible solution.

 > Any suggestions would be appreciated. Thanks!
 > Qin

Increasing that number is of course "possible"... but it also
costs a bit (adding to the fixed memory footprint of R).


And not only that.  At the time this was done (and it was once 50) the 
main cost was searching DLLs for symbols.  That is still an issue, and 
few packages exclude their DLL from symbol search, so if symbols have to 
be searched for, a lot of DLLs will be searched.  (Registering all the 
symbols needed in a package avoids a search, and nowadays by default 
searches from a namespace are restricted to that namespace.)


See 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Registering-native-routines 
for some further details about the search mechanism.
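
As an aside, whether a loaded DLL is still open to this dynamic symbol
search can be inspected from R; the sketch below relies on the
`dynamicLookup` field of the DLLInfo objects returned by
getLoadedDLLs():

```r
## Report, for each loaded DLL, whether dynamic symbol lookup is
## enabled.  DLLs that register their routines and disable dynamic
## lookup are not visited during a global symbol search.
dlls <- getLoadedDLLs()
sapply(dlls, function(dll) dll[["dynamicLookup"]])
```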



I did not set that limit, but I'm pretty sure it was also meant
as reminder for the useR to "clean up" a bit in her / his R
session, i.e., not load package namespaces unnecessarily. I
cannot yet imagine that you need > 100 packages | namespaces
loaded in your R session. OTOH, some packages nowadays have a
host of dependencies, so I agree that this at least may happen
accidentally more frequently than in the past.


I am not convinced that it is needed.  The OP says he imports many 
packages, and I doubt that more than a few are required at any one time. 
 Good practice is to load namespaces as required, using requireNamespace.
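
As a minimal sketch of that practice (the package and function names
below are hypothetical placeholders), the namespace is loaded only on
the code path that actually needs it:

```r
## Load a heavy dependency lazily, only inside the function that uses
## it; 'somePkg' and fit_model() are made-up names for illustration.
fit_model <- function(x) {
  if (!requireNamespace("somePkg", quietly = TRUE)) {
    stop("Package 'somePkg' is required for fit_model(); please install it.")
  }
  somePkg::some_function(x)
}
```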



The real solution of course would be a code improvement that
starts with a relatively small number of "DLLinfo" structures
(say 32), and then allocates more batches (of size say 32) if
needed.


The problem of course is that such code will rarely be exercised, and 
people have made errors on the boundaries (here multiples of 32) many 
times in the past.  (Note too that DLLs can be removed as well as added, 
another point of coding errors.)



Patches to the R sources (development trunk in subversion at
https://svn.r-project.org/R/trunk/ ) are very welcome!

Martin Maechler
ETH Zurich  &  R Core Team




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford


Re: [Rd] Is it possible to increase MAX_NUM_DLLS in future R releases?

2016-05-04 Thread Martin Maechler
> Qin Zhu 
> on Mon, 2 May 2016 16:19:44 -0400 writes:

> Hi,
> I’m working on a Shiny app for statistical analysis. I ran into this 
"maximal number of DLLs reached" issue recently because my app requires 
importing many other packages.

> I’ve posted my question on stackoverflow 
(http://stackoverflow.com/questions/36974206/r-maximal-number-of-dlls-reached 
).
 

> I’m just wondering is there any reason to set the maximal number of DLLs 
to be 100, and is there any plan to increase it/not hardcoding it in the 
future? It seems many people are also running into this problem. I know I can 
work around this problem by modifying the source, but since my package is going 
to be used by other people, I don’t think this is a feasible solution. 

> Any suggestions would be appreciated. Thanks!
> Qin

Increasing that number is of course "possible"... but it also
costs a bit (adding to the fixed memory footprint of R). 

I did not set that limit, but I'm pretty sure it was also meant
as reminder for the useR to "clean up" a bit in her / his R
session, i.e., not load package namespaces unnecessarily. I
cannot yet imagine that you need > 100 packages | namespaces
loaded in your R session. OTOH, some packages nowadays have a
host of dependencies, so I agree that this at least may happen
accidentally more frequently than in the past. 

The real solution of course would be a code improvement that
starts with a relatively small number of "DLLinfo" structures
(say 32), and then allocates more batches (of size say 32) if
needed. 

Patches to the R sources (development trunk in subversion at
https://svn.r-project.org/R/trunk/ ) are very welcome! 

Martin Maechler
ETH Zurich  &  R Core Team
