Re: [Rd] Request: Increasing MAX_NUM_DLLS in Rdynload.c

2016-12-19 Thread Steve Bronder
Thanks Henrik, this is very helpful! I will try this out on our tests and
see whether gcDLLs() has a positive effect.

mlr currently has tests broken down by learner type, such as classification,
regression, forecasting, clustering, etc. There are 83 classifiers alone,
so even when loading and unloading across learner types we can still hit
the MAX_NUM_DLLS error, meaning we'll have to break them down further (or
maybe we can be clever with gcDLLs()?). I'm CC'ing Lars Kotthoff and Bernd
Bischl to make sure I am representing the issue well.

Regards,

Steve Bronder
Website: stevebronder.com
Phone: 412-719-1282
Email: sbron...@stevebronder.com




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Request: Increasing MAX_NUM_DLLS in Rdynload.c

2016-12-19 Thread Henrik Bengtsson
One reason for hitting the MAX_NUM_DLLS (= 100) limit is that some
packages don't unload their DLLs when they are being unloaded themselves.
In other words, there may be left-over DLLs just sitting there doing
nothing but occupying space.  You can remove these using:

   R.utils::gcDLLs()

Maybe that will help you get through your tests (as long as you're
unloading packages).  gcDLLs() will look at base::getLoadedDLLs() and
its content and compare to loadedNamespaces() and unregister any
"stray" DLLs that remain after corresponding packages have been
unloaded.
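
The comparison described above can be sketched with base R alone. This is
not the R.utils implementation: the helper name find_stray_dlls() is
hypothetical, it only lists candidates without unregistering anything, and it
assumes that each package's DLL has the same name as its namespace, which
need not hold for every package.

```r
# Number of DLLs currently registered in this R session.
n_dlls <- length(getLoadedDLLs())

# Hypothetical helper (not part of R.utils): names of registered DLLs that
# have no loaded namespace of the same name.
# Assumption: DLL name == package (namespace) name.
find_stray_dlls <- function() {
  dll_names <- names(getLoadedDLLs())
  # "(embedding)" is an entry that may appear when R is embedded in a
  # front end; it is not a package, so it is excluded here.
  setdiff(dll_names, c(loadedNamespaces(), "(embedding)"))
}

cat("DLLs loaded:", n_dlls, "\n")
print(find_stray_dlls())
```

In a fresh session the list of candidates would normally be empty; it should
only grow after packages that leave their DLLs behind have been unloaded.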

I think it would be useful if R CMD check would also check that DLLs
are unregistered when a package is unloaded
(https://github.com/HenrikBengtsson/Wishlist-for-R/issues/29), but of
course, someone needs to write the code / a patch for this to happen.

/Henrik



[Rd] Request: Increasing MAX_NUM_DLLS in Rdynload.c

2016-12-19 Thread Steve Bronder
This is a request to increase MAX_NUM_DLLS in Rdynload.c from 100 to 500.

On line 131 of Rdynload.c, changing

#define MAX_NUM_DLLS 100

 to

#define MAX_NUM_DLLS 500


In development of the mlr package, there have been several episodes in the
past where we have had to break up unit tests because of the "maximum
number of DLLs reached" error. This error has been an inconvenience that is
going to keep happening as the package continues to grow. Is there more
than meets the eye with this error or would everything be okay if the above
line changes? Would that have a larger effect in other parts of R?

As R grows, we are likely to see more 'meta-packages' such as the
Hadley-verse, caret, mlr, etc. that need an increasing number of DLLs loaded
at any point in time to conduct effective unit tests. If MAX_NUM_DLLS is set
to 100 for a very particular reason then I apologize, but if it is possible
to increase MAX_NUM_DLLS it would at least make the testing at mlr much
easier.

I understand you are all very busy and thank you for your time.


Regards,

Steve Bronder
Website: stevebronder.com
Phone: 412-719-1282
Email: sbron...@stevebronder.com



[Rd] colnames for data.frame could be greatly improved

2016-12-19 Thread Jan Gorecki
Hello,

colnames() does not seem to be well optimized for data.frame. It
short-circuits for a data.frame in

  if (is.data.frame(x) && do.NULL)
    return(names(x))

but only when do.NULL is TRUE. This makes a huge difference when do.NULL
is FALSE. A minimal edit to `colnames`:

  if (is.data.frame(x)) {
    nm <- names(x)
    if (do.NULL || !is.null(nm))
      return(nm)
    else
      return(paste0(prefix, seq_along(x)))
  }
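
To make the intended behaviour of this edit concrete, here is a small sketch
on a toy data frame. The wrapper name colnames_df() is hypothetical and is
used only to avoid masking base::colnames; anything that is not a data frame
is passed on to the existing function.

```r
# Hypothetical wrapper around the proposed data.frame branch.
colnames_df <- function(x, do.NULL = TRUE, prefix = "col") {
  if (is.data.frame(x)) {
    nm <- names(x)
    if (do.NULL || !is.null(nm))
      return(nm)
    return(paste0(prefix, seq_along(x)))
  }
  # Everything else goes through the existing implementation.
  colnames(x, do.NULL = do.NULL, prefix = prefix)
}

df <- data.frame(a = 1:3, b = letters[1:3])
named <- colnames_df(df)                        # existing names, returned as-is

names(df) <- NULL
dropped <- colnames_df(df)                      # NULL, because do.NULL = TRUE
generated <- colnames_df(df, do.NULL = FALSE)   # prefix plus column index

print(named)
print(dropped)
print(generated)
```

The three cases cover named columns, missing names with do.NULL = TRUE, and
missing names with do.NULL = FALSE, which is the case the timings below
exercise.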

Script and timings:

N=1e7; K=100
set.seed(1)
DF <- data.frame(
  id1 = sample(sprintf("id%03d",1:K), N, TRUE),      # large groups (char)
  id2 = sample(sprintf("id%03d",1:K), N, TRUE),      # large groups (char)
  id3 = sample(sprintf("id%010d",1:(N/K)), N, TRUE), # small groups (char)
  id4 = sample(K, N, TRUE),                          # large groups (int)
  id5 = sample(K, N, TRUE),                          # large groups (int)
  id6 = sample(N/K, N, TRUE),                        # small groups (int)
  v1 =  sample(5, N, TRUE),                          # int in range [1,5]
  v2 =  sample(5, N, TRUE),                          # int in range [1,5]
  v3 =  sample(round(runif(100,max=100),4), N, TRUE) # numeric e.g. 23.5749
)
cat("GB =", round(sum(gc()[,2])/1024, 3), "\n")
#GB = 0.397
colnames(DF) = NULL
system.time(nm1<-colnames(DF, FALSE))
#   user  system elapsed
# 22.158   0.299  22.498
print(nm1)
#[1] "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9"

### restart R

colnames <- function (x, do.NULL = TRUE, prefix = "col")
{
  if (is.data.frame(x)) {
    nm <- names(x)
    if (do.NULL || !is.null(nm))
      return(nm)
    else
      return(paste0(prefix, seq_along(x)))
  }
  dn <- dimnames(x)
  if (!is.null(dn[[2L]]))
    dn[[2L]]
  else {
    nc <- NCOL(x)
    if (do.NULL)
      NULL
    else if (nc > 0L)
      paste0(prefix, seq_len(nc))
    else character()
  }
}
N=1e7; K=100
set.seed(1)
DF <- data.frame(
  id1 = sample(sprintf("id%03d",1:K), N, TRUE),      # large groups (char)
  id2 = sample(sprintf("id%03d",1:K), N, TRUE),      # large groups (char)
  id3 = sample(sprintf("id%010d",1:(N/K)), N, TRUE), # small groups (char)
  id4 = sample(K, N, TRUE),                          # large groups (int)
  id5 = sample(K, N, TRUE),                          # large groups (int)
  id6 = sample(N/K, N, TRUE),                        # small groups (int)
  v1 =  sample(5, N, TRUE),                          # int in range [1,5]
  v2 =  sample(5, N, TRUE),                          # int in range [1,5]
  v3 =  sample(round(runif(100,max=100),4), N, TRUE) # numeric e.g. 23.5749
)
cat("GB =", round(sum(gc()[,2])/1024, 3), "\n")
#GB = 0.397
colnames(DF) = NULL
system.time(nm1<-colnames(DF, FALSE))
#   user  system elapsed
#  0.001   0.000   0.000
print(nm1)
#[1] "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9"

sessionInfo()
#R Under development (unstable) (2016-12-19 r71815)
#Platform: x86_64-pc-linux-gnu (64-bit)
#Running under: Debian GNU/Linux stretch/sid
#
#locale:
# [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
# [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
# [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
# [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
# [9] LC_ADDRESS=C               LC_TELEPHONE=C
#[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#
#attached base packages:
#[1] stats     graphics  grDevices utils     datasets  methods   base
#
#loaded via a namespace (and not attached):
#[1] compiler_3.4.0
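
One plausible reason for the gap between the two timings above, stated here
as an assumption rather than something the post verifies: the general code
path calls dimnames(x), and for a data frame that also returns the row names
as a character vector, so with N = 1e7 rows a large character vector is built
just to look up nine column names. A reduced-size sketch:

```r
# Much smaller than the N = 1e7 used in the timings, to keep this quick.
n <- 1e5
df <- data.frame(v = seq_len(n))

# dimnames() on a data frame returns both row names and column names;
# the row names come back as a character vector of length n.
dn <- dimnames(df)
str(dn)

# names() only touches the column names.
names(df)
```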



[Bioc-devel] Initial setup of git svn with existing GitHub repo - upstream working tree history error

2016-12-19 Thread Keegan Korthauer
Hi bioc-devel,

I am trying to set up git svn for the existing GitHub repo of my Bioconductor
package scDD.  I would like to keep all of the development history
in my preexisting GitHub repo, so I followed the instructions for Scenario 2 at
http://bioconductor.org/developers/how-to/git-mirror/.  However, when I try
the ‘git svn rebase’ command I get the ‘dreaded’ error mentioned in the
troubleshooting section (‘unable to determine upstream SVN information from
working tree history’).

I guessed that this error was the result of some commits made between
the package being accepted to Bioconductor and the current state of my GitHub
repo’s master branch.  I followed the directions in the post linked
as a reference in the troubleshooting section of the above page
(http://stackoverflow.com/questions/9805980/unable-to-determine-upstream-svn-information-from-working-tree-history),
including a hard reset of a copy of the master branch to an earlier commit
that reflects the parent state of the svn repo, then git svn fetch, but I still
get the same error message as above.

If anyone has any other suggestions on how I can resolve this conflict between
svn and git, please let me know.  Thanks!

Best,
Keegan


--
Keegan Korthauer, PhD
Postdoctoral Research Fellow
Department of Biostatistics & Computational Biology, Dana-Farber Cancer 
Institute
Department of Biostatistics, Harvard T. H. Chan School of Public Health
http://bcb.dfci.harvard.edu/~keegan




___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Rd] Startup process: Objects are automatically print():ed

2016-12-19 Thread Henrik Bengtsson
Consider a `~/.Rprofile` file containing:

    print("hello")
    "world"
    invisible("!")

This will output the following:

    [1] "hello"
    [1] "world"

when R is started.  Note that "world" is also print():ed.  In
contrast, if you'd source() the same file, then you'd need to use
argument print.eval = TRUE to get the same behavior:

    > source("~/.Rprofile")
    [1] "hello"
    > source("~/.Rprofile", print.eval = TRUE)
    [1] "hello"
    [1] "world"

I am aware that the R startup process is special in many ways, e.g. it
happens very early on and only the base package is loaded.  However,
is the automatic print():ing of objects by design, a side effect, or a
bug?  It appears to not be documented, at least not in help("Startup",
package = "base").
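
The source() half of this comparison can be reproduced without touching a
real profile. The sketch below writes the same three expressions to a
temporary file (an assumption made to keep the example self-contained) and
captures what each call prints; it does not reproduce the startup behaviour
itself.

```r
# Write the three example expressions to a temporary file, not ~/.Rprofile.
f <- tempfile(fileext = ".R")
writeLines(c('print("hello")', '"world"', 'invisible("!")'), f)

# Default: only the explicit print() call produces output.
out_default <- capture.output(source(f))

# print.eval = TRUE: visible top-level values are printed as well,
# while the invisible("!") value is still suppressed.
out_eval <- capture.output(source(f, print.eval = TRUE))

print(out_default)
print(out_eval)

unlink(f)
```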

/Henrik

PS. I've got a bit of a deja vu while writing this - I might have
already brought this one up many years ago.

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base



Re: [Bioc-devel] ‘gWidgets2RGtk2’ is not available in malbec1 Linux (Ubuntu 16.04.1 LTS) / x86_64

2016-12-19 Thread Obenchain, Valerie
It looks like this has cleared up. We couldn't find anything
reproducible so maybe it was a glitch.

Valerie



On 12/13/2016 07:00 AM, Obenchain, Valerie wrote:
> Hi Hemi,
>
> Thanks for the report. We're looking into this ...
>
> Valerie
>
> On 12/11/2016 05:31 PM, Luan Hemi wrote:
>> Dear sir or madam,
>>
>> The gWidgets2RGtk2 package does not seem to be available on malbec1
>> (Linux), which causes some problems. Could anyone help?
>>
>> http://bioconductor.org/checkResults/3.4/bioc-LATEST/statTarget/malbec1-buildsrc.html
>>
>>
>>
>> ##
>> ##
>> ###
>> ### Running command:
>> ###
>> ###   /home/biocbuild/bbs-3.4-bioc/R/bin/R CMD build --keep-empty-dirs
>> --no-resave-data statTarget
>> ###
>> ##
>> ##
>>
>>
>> * checking for file ‘statTarget/DESCRIPTION’ ... OK
>> * preparing ‘statTarget’:
>> * checking DESCRIPTION meta-information ... OK
>> * installing the package to build vignettes
>>   ---
>> ERROR: dependency ‘gWidgets2RGtk2’ is not available for package ‘statTarget’
>> * removing ‘/tmp/RtmpsXjnzW/Rinst7c64641de7dc/statTarget’
>>   ---
>> ERROR: package installation failed
>>
>> Thanks!
>>
>> Sincerely yours,
>> Hemi
>>
>
>
>
> This email message may contain legally privileged and/or confidential 
> information.  If you are not the intended recipient(s), or the employee or 
> agent responsible for the delivery of this message to the intended 
> recipient(s), you are hereby notified that any disclosure, copying, 
> distribution, or use of this email message is prohibited.  If you have 
> received this message in error, please notify the sender immediately by 
> e-mail and delete this email message from your computer. Thank you.
>


