Re: [Rd] Request for help with UBSAN and total absence of CRAN response

2015-01-17 Thread Jan van der Laan


... and they didn't make it through. I put the files in a gist:

https://gist.github.com/djvanderlaan/1e9beb75d2d595824efc

Jan



On 16-01-15 15:21, Jan van der Laan wrote:

Dirk,

The vagrant setup I use to test my packages with UBSAN also seems to
replicate the error reported by CRAN (together with some other
warnings). I have attached the files (I hope they get through the
filters). I suppose you know what to do with them.

Jan





Dirk Eddelbuettel e...@debian.org wrote:


CRAN has a package of mine in upload limbo because it failed UBSAN.

I am not entirely ignorant on the topic of sanitizers and SAN / ASAN / UBSAN;
we created not one but two Docker containers with ASAN and UBSAN:

   https://registry.hub.docker.com/u/rocker/r-devel-san/
   https://registry.hub.docker.com/u/rocker/r-devel-ubsan-clang/

as well as predecessors to them in earlier Docker repos.

Yet I fail to recreate the errors reported by CRAN:


http://www.stats.ox.ac.uk/pub/bdr/memtests/UBSAN-clang-trunk/RcppAnnoy/tests/runUnitTests.Rout


http://www.stats.ox.ac.uk/pub/bdr/memtests/UBSAN/RcppAnnoy/tests/runUnitTests.Rout


I asked politely (and twice) for help with the corresponding compiler
configuration(s).  But CRAN is of course way above communicating with mere
mortals such as yours truly.

So I have no recourse other than to spam all of you: does anybody here have
a working UBSAN setup which can replicate the issue seen in the (rather
small) RcppAnnoy package?

Erik (upstream for Annoy, CC'ed) and I would be most grateful.  We do not
like being held hostage on an error report we cannot replicate and for which
we do not receive any help (or even further communication) whatsoever.

Dirk
about to turn into yet another frustrated CRAN user

--
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





Re: [Rd] Is the tcltk failure in affylmGUI related to R bug 15957

2015-01-17 Thread Keith Satterley
Thanks Peter and Dan for your replies.
After learning a bit more about tcltk and environments etc. I have replaced

  Try(n <- evalq(TclVarCount <- TclVarCount + 1, .TkRoot$env))

with

  Try(n <- .TkRoot$env$TclVarCount <- .TkRoot$env$TclVarCount + 1L)

as you suggest.

It now works for both R-3.1.1 and R-3.1.2+

(My understanding is that the Try function is there to put a GUI box around the 
error messages.)

I shall update affylmGUI versions accordingly soon.

cheers
Keith
PS I have also changed the Depends in DESCRIPTION to Imports and added an
import statement to the NAMESPACE file, which is independent of this problem.
Consequently, I removed the Require(tkrplot) statements, as they are no
longer needed.

- peter dalgaard pda...@gmail.com wrote:
 Seems unlikely that that particular bug is involved. I seem to recall some 
 change related to inadvertent variable capture in .TkRoot$env (?). At any 
 rate, we currently have
 
 > parent.env(.TkRoot$env)
 <environment: R_EmptyEnv>
 
 which used to be
 
 > parent.env(.TkRoot$env)
 <environment: R_GlobalEnv>
 
 as a result, this won't work any more because R_EmptyEnv has no operators and 
 functions in it:
 
 > evalq(x <- 1, .TkRoot$env)
 Error in eval(substitute(expr), envir, enclos) : 
   could not find function "<-"
 
 and consequently, you conk out at
 
 Try(n <- evalq(TclVarCount <- TclVarCount + 1, .TkRoot$env))
 
 which presumably needs to be recoded in the same way as the current code in 
 tclVar():
 
 > tclVar
 function (init = "") 
 {
     n <- .TkRoot$env$TclVarCount <- .TkRoot$env$TclVarCount + 
         1L
     name <- paste0("::RTcl", n)
     l <- list(env = new.env())
     assign(name, NULL, envir = l$env)
     reg.finalizer(l$env, function(env) tcl("unset", ls(env)))
     class(l) <- "tclVar"
     tclvalue(l) <- init
     l
 }
 
 (The whole thing looks a bit odd: Your function clones a fair bit of tclVar, 
 wrapping each line in Try() for no apparent reason (or?), with the apparent 
 purpose of doing something that seems quite similar to what tclArray() 
 already does...)
 
 -pd
 
 
  On 14 Jan 2015, at 06:50 , Keith Satterley ke...@wehi.edu.au wrote:
  
  I maintain the package affylmGUI. It works when installed on many previous 
  versions of R. I have today tested exactly the same code under R-2.15.3, 
  R-3.0.2, R-3.1.0, R-3.1.1, R-3.1.2 and R-devel.
  
  I have also tested the versions of affylmGUI downloaded by biocLite for 
  each version of R and the same result applies.
  
  I have no errors under 2.15.3, 3.0.2, 3.1.0 and 3.1.1. The following error 
  occurs under 3.1.2 and R-devel.
  
  I run affylmGUI and read a targets file, which then causes affylmGUI to read
  the specified cel files. On attempting to display the RNA targets file in a
  Tk window using the "RNA Targets" option from the "RNA Targets" menu item,
  the following errors occur:
  
  Error text box 1: Error in eval(substitute(expr),enclos):could not find
  function "<-"   - pressed OK
  Following error text box: Error in paste("::RTcl",n,sep=""): object 'n' not
  found   - pressed OK
  Following error text box: Error in assign(name, NULL, environ = I$env):
  object 'name' not found   - pressed OK
  Following error text box: Error in paste("set",name, "(0,0)\"\"",sep=""
  ):object 'name' not found   - pressed OK
  
  This then results in an unfilled Tk window.
  
  I am testing on a Windows 7, 64 bit environment. My sessionInfo is:
  
  R version 3.1.2 (2014-10-31)
  Platform: x86_64-w64-mingw32/x64 (64-bit)

  locale:
  [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
      LC_MONETARY=English_Australia.1252
  [4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252

  attached base packages:
  [1] stats4    parallel  tcltk     stats     graphics  grDevices utils
      datasets  methods   base

  other attached packages:
   [1] affylmGUI_1.40.0      AnnotationDbi_1.28.1  GenomeInfoDb_1.2.4
       IRanges_2.0.1         S4Vectors_0.4.0
   [6] xtable_1.7-4          R2HTML_2.3.1          affyPLM_1.42.0
       preprocessCore_1.28.0 gcrma_2.38.0
  [11] tkrplot_0.0-23        affyio_1.34.0         BiocInstaller_1.16.1
       affy_1.44.0           Biobase_2.26.0
  [16] BiocGenerics_0.12.1   limma_3.22.3

  loaded via a namespace (and not attached):
  [1] Biostrings_2.34.1 DBI_0.3.1         RSQLite_1.0.0     splines_3.1.2
      XVector_0.6.0     zlibbioc_1.12.0
  
  I think the relevant code that is resulting in the error is generated by 
  this function in main.R:
  tclArrayVar <- function(){
     Try(n <- evalq(TclVarCount <- TclVarCount + 1, .TkRoot$env))
     Try(name <- paste("::RTcl", n,sep = ""))
     Try(l <- list(env = new.env()))
     Try(assign(name, NULL, envir = l$env))
     Try(reg.finalizer(l$env, function(env) tcl("unset", ls(env))))
     Try(class(l) <- "tclArrayVar")
     Try(.Tcl(paste("set ",name,"(0,0) \"\"",sep="")))
     l  ### Investigate this line KS
  } #end of tclArrayVar <- function()
  
  This code is lines 877-886 in main.R
  
  Despite the un-investigated last line in this function, it works fine in 
  earlier versions of R as described 

Re: [Rd] default min-v/nsize parameters

2015-01-17 Thread Nathan Kurz
On Thu, Jan 15, 2015 at 3:55 PM, Michael Lawrence
lawrence.mich...@gene.com wrote:
> Just wanted to start a discussion on whether R could ship with more
> appropriate GC parameters.

I've been doing a number of similar measurements, and have come to the
same conclusion.  R is currently very conservative about memory usage,
and this leads to unnecessarily poor performance on certain problems.
Changing the defaults to sizes that are more appropriate for modern
machines can often produce a 2x speedup.

On Sat, Jan 17, 2015 at 8:39 AM,  luke-tier...@uiowa.edu wrote:
> Martin Morgan discussed this a year or so ago and as I recall bumped
> up these values to the current defaults. I don't recall details about
> why we didn't go higher -- maybe Martin does.

I just checked, and it doesn't seem that any of the relevant values
have been increased in the last ten years.  Do you have a link to the
discussion you recall so we can see why the changes weren't made?

> I suspect the main concern would be with small memory machines in student labs
> and less developed countries.

While a reasonable concern, I'm doubtful there are many machines for
which the current numbers are optimal.  The current minimum size
increases for node and vector heaps are 40KB and 80KB respectively.
This grows as the heap grows (min + .05 * heap), but still means that
we do many more expensive garbage collections while growing than we
need to.  Paradoxically, the SMALL_MEMORY compile option (which is
suggested for computers with up to 32MB of RAM) has slightly larger
minimums, at 50KB and 100KB.
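
To make the effect of the growth rule above (min + .05 * heap) concrete,
here is a minimal C sketch that counts how many growth steps a heap needs
to reach a target size. It is a simplified model, not R's actual collector:
the function name count_growth_steps and its parameters are hypothetical,
sizes are in whatever unit the heap is counted in, and the occupancy
thresholds that decide when R grows the heap at all are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of growth steps needed to take a heap from
   `start` to at least `target` units when every step adds
   incr_min + incr_frac * current_size (truncated to a whole unit). */
unsigned count_growth_steps(size_t start, size_t target,
                            size_t incr_min, double incr_frac)
{
    /* A positive minimum increment guarantees that the loop terminates. */
    assert(incr_min > 0);
    assert(incr_frac >= 0.0);

    size_t size = start;
    unsigned steps = 0;

    while (size < target) {
        size_t change = (size_t)((double)incr_min + incr_frac * (double)size);
        size += change;
        steps++;
    }
    return steps;
}
```

Under this model, growing from 350000 to 1000000 units with a minimum of
40000 and a fraction of 0.05 takes 10 steps, while a tenfold minimum of
400000 takes 2. A minimum of 50000 with a fraction of 0.0 takes 13 steps,
so a slightly larger minimum without the proportional term can still mean
more steps.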

I think we'd get significant benefit for most users by being less
conservative about memory consumption.  The exact sizes should be
discussed, but with RAM costing about $10/GB it doesn't seem
unreasonable to assume most machines running R have multiple GB
installed, and those that don't will quite likely be running an OS
that needs a custom compiled binary anyway.

I could be way off, but a 10MB start with 1MB minimum increments for
SMALL_MEMORY, a 100MB start with 10MB increments for NORMAL_MEMORY,
and a 1GB start with 100MB increments for LARGE_MEMORY might be a
reasonable spread.

Or one could go even larger, noting that on most systems,
overcommitted memory is not a problem until it is used.  Until we
write to it, it doesn't actually use physical RAM, just virtual
address space.  Or we could stay small, but make it possible to
programmatically increase the granularity from within R.

For ease of reference, here are the relevant sections of code:

https://github.com/wch/r-source/blob/master/src/include/Defn.h#L217
(ripley last authored on Jan 26, 2000 / pd last authored on May 8, 1999)
#ifndef R_NSIZE
#define R_NSIZE 350000L
#endif
#ifndef R_VSIZE
#define R_VSIZE 6291456L
#endif

https://github.com/wch/r-source/blob/master/src/main/startup.c#L169
(ripley last authored on Jun 9, 2004)
Rp->vsize = R_VSIZE;
Rp->nsize = R_NSIZE;

#define Max_Nsize 50000000 /* about 1.4Gb 32-bit, 2.8Gb 64-bit */
#define Max_Vsize R_SIZE_T_MAX /* unlimited */

#define Min_Nsize 220000
#define Min_Vsize (1*Mega)

https://github.com/wch/r-source/blob/master/src/main/memory.c#L335
(luke last authored on Nov 1, 2000)
#ifdef SMALL_MEMORY
/* On machines with only 32M of memory (or on a classic Mac OS port)
   it might be a good idea to use settings like these that are more
   aggressive at keeping memory usage down. */
static double R_NGrowIncrFrac = 0.0, R_NShrinkIncrFrac = 0.2;
static int R_NGrowIncrMin = 50000, R_NShrinkIncrMin = 0;
static double R_VGrowIncrFrac = 0.0, R_VShrinkIncrFrac = 0.2;
static int R_VGrowIncrMin = 100000, R_VShrinkIncrMin = 0;
#else
static double R_NGrowIncrFrac = 0.05, R_NShrinkIncrFrac = 0.2;
static int R_NGrowIncrMin = 40000, R_NShrinkIncrMin = 0;
static double R_VGrowIncrFrac = 0.05, R_VShrinkIncrFrac = 0.2;
static int R_VGrowIncrMin = 80000, R_VShrinkIncrMin = 0;
#endif

static void AdjustHeapSize(R_size_t size_needed)
{
    R_size_t R_MinNFree = (R_size_t)(orig_R_NSize * R_MinFreeFrac);
    R_size_t R_MinVFree = (R_size_t)(orig_R_VSize * R_MinFreeFrac);
    R_size_t NNeeded = R_NodesInUse + R_MinNFree;
    R_size_t VNeeded = R_SmallVallocSize + R_LargeVallocSize +
        size_needed + R_MinVFree;
    double node_occup = ((double) NNeeded) / R_NSize;
    double vect_occup = ((double) VNeeded) / R_VSize;

    if (node_occup > R_NGrowFrac) {
        R_size_t change = (R_size_t)(R_NGrowIncrMin + R_NGrowIncrFrac
                                     * R_NSize);
        if (R_MaxNSize >= R_NSize + change)
            R_NSize += change;
    }
    else if (node_occup < R_NShrinkFrac) {
        R_NSize -= (R_NShrinkIncrMin + R_NShrinkIncrFrac * R_NSize);
        if (R_NSize < NNeeded)
            R_NSize = (NNeeded < R_MaxNSize) ? NNeeded: R_MaxNSize;
        if (R_NSize < orig_R_NSize)
            R_NSize = orig_R_NSize;
    }

    if (vect_occup > 1.0 && VNeeded < R_MaxVSize)
  

Re: [Rd] default min-v/nsize parameters

2015-01-17 Thread luke-tierney

Martin Morgan discussed this a year or so ago and as I recall bumped
up these values to the current defaults. I don't recall details about
why we didn't go higher -- maybe Martin does. I suspect the main
concern would be with small memory machines in student labs and less
developed countries. If there was a way on all platforms to identify
how much memory is available that might help to set a default, though
that isn't perfect since you want something different on a large
memory machine for one R process than for 16 R processes.

Best,

luke

On Thu, 15 Jan 2015, Michael Lawrence wrote:


Just wanted to start a discussion on whether R could ship with more
appropriate GC parameters. Right now, loading the recommended package
Matrix leads to:


> library(Matrix)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1076796 57.6    1368491 73.1  1198505 64.1
Vcells 1671329 12.8    2685683 20.5  1932418 14.8

Results may vary, but here R needed 64MB of N cells and 15MB of V cells to
load one of the most important packages.

Currently, the default GC triggers are ~20MB (64 bit systems) for N cells
and ~6MB of V cells. Martin Morgan found that this leads to a lot of GC
overhead during package loading and at least in our tests can significantly
increase the load time of complex packages.

If we set the triggers at the command line beyond the reach of
library(Matrix) (--min-vsize=2048M --min-nsize=45M), then we see:

 used (Mb) gc trigger (Mb) max used  (Mb)
Ncells 1076859 57.6   47185920 2520  6260069 334.4
Vcells 1671431 12.8  268435456 2048  9010303  68.8

So by effectively disabling the GC, we let R consume 335MB N + 70MB of V,
but loading goes a lot faster:

Loading Matrix with default settings:

system.time(library(Matrix))

  user  system elapsed
 1.600   0.011   1.610

With high GC triggers ():

system.time(library(Matrix))

  user  system elapsed
 0.983   0.097   1.079

Given modern hardware capabilities and the need to efficiently load
software for the user to be able to do something, perhaps we should bump
the default settings so that the GC is fired sparingly when loading a large
package.

For users of Bioconductor, we see this for library(GenomicRanges):

 used (Mb) gc trigger (Mb) max used  (Mb)
Ncells 1322124 70.7   47185920 2520 15591302 832.7
Vcells 1216015  9.3  268435456 2048 13992181 106.8

So perhaps that user would want 900 MB of N and 100 MB of V as the trigger
(corresponding to --min-vsize=100M --min-nsize=16M).
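
The conversion between cell counts and megabytes used in these tables can
be sketched in C as follows. This is only an illustration: cells_to_mb is a
hypothetical helper, and the sizes of 56 bytes per N cell and 8 bytes per
V cell on a 64-bit build, as well as M meaning 1024 * 1024, are assumptions
inferred from the gc() output above rather than taken from the R sources.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: "Mb" in the gc() tables means 1024 * 1024 bytes. */
#define BYTES_PER_MB (1024.0 * 1024.0)

/* Hypothetical helper: memory in Mb taken by `cells` cells of
   `bytes_per_cell` bytes each (assumed 56 for N cells and 8 for V cells
   on a 64-bit build). */
double cells_to_mb(size_t cells, size_t bytes_per_cell)
{
    assert(bytes_per_cell > 0);
    return (double)cells * (double)bytes_per_cell / BYTES_PER_MB;
}
```

For example, cells_to_mb(47185920, 56) gives 2520 and
cells_to_mb(268435456, 8) gives 2048, matching the gc trigger columns for
--min-nsize=45M and --min-vsize=2048M. Likewise, 16M N cells come to 896 Mb,
which is roughly the 900 MB of N suggested for GenomicRanges.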

Thoughts?





--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:   319-335-3386
Department of Statistics and        Fax:     319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:     http://www.stat.uiowa.edu
