Re: [Rd] proper use of reg.finalizer to close connections

2014-10-27 Thread Henrik Bengtsson
...and don't forget to make sure all the functions that .myFinalizer()
calls are also around. /Henrik

On Mon, Oct 27, 2014 at 10:10 AM, Murat Tasan mmu...@gmail.com wrote:
 Eh, after some flailing, I think I solved it.
 I _think_ this pattern should guarantee that the finalizer function is
 still present when needed:

 .STATE_CONTAINER <- new.env(parent = emptyenv())
 .STATE_CONTAINER$some_state_variable <- ## some code
 .STATE_CONTAINER$some_other_state_variable <- ## some code

 .myFinalizer <- function(name_of_state_variable_to_clean_up)

 .onLoad - function(libname, pkgname) {
 reg.finalizer(
 e = parent.env(environment()),
 f = function(env) sapply(ls(env$.STATE_CONTAINER), .myFinalizer),
 onexit = TRUE)
 }

 This way, the finalizer is registered on the enclosing environment of
 the .onLoad function, which should be the package environment itself.
 And that means .myFinalizer should still be around when it's called
 during q() or unload/gc().
 Effectively, the finalizer is tied to the entire package, rather than
 the state variable container(s), which might not be the most elegant
 solution, but it should work well enough for most purposes.
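[A self-contained sketch of that pattern; the names and the cleanup body are
illustrative stand-ins, not the poster's actual code:]

```r
## Hypothetical package code sketching the pattern described above.
.STATE_CONTAINER <- new.env(parent = emptyenv())

.myFinalizer <- function(name) {
  ## stand-in cleanup: drop the named state variable from the container
  rm(list = name, envir = .STATE_CONTAINER)
}

.onLoad <- function(libname, pkgname) {
  reg.finalizer(
    e = parent.env(environment()),   # the package namespace itself
    f = function(env) for (nm in ls(env$.STATE_CONTAINER)) .myFinalizer(nm),
    onexit = TRUE)
}
```

Because the finalizer is registered on the namespace, .myFinalizer is still
reachable when the finalizer eventually runs.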

 Cheers and thanks for the advice,

 -m

 On Mon, Oct 27, 2014 at 12:18 AM, Murat Tasan mmu...@gmail.com wrote:
 Ah, good point, I hadn't thought of that detail.
 Would moving reg.finalizer back outside of .onLoad and hooking it to the
 package's environment itself work (more safely)?
 Something like:
 finalizerFunction <- ## cleanup code
 reg.finalizer(parent.env(), finalizerFunction)

 -m

 On Oct 26, 2014 11:03 PM, Henrik Bengtsson h...@biostat.ucsf.edu wrote:

 On Sun, Oct 26, 2014 at 8:14 PM, Murat Tasan mmu...@gmail.com wrote:
  Ah (again)!
  Even with my fumbling presentation of the issue, you gave me the hint
  that solved it, thanks!
 
  Yes, the reg.finalizer call needs to be wrapped in an .onLoad hook so
  that it's not run just once during package installation and then never again.
  And once I switched to using ls() (instead of names()), everything
  works as expected.
 
  So, the package code effectively looks like so:
 
  .CONNS <- new.env(parent = emptyenv())
  .onLoad <- function(libname, pkgname) {
  reg.finalizer(.CONNS, function(x) sapply(ls(x), .disconnect))
  }
  .disconnect <- function(x) {
  ## handle disconnection of .CONNS[[x]] here
  }

 In your example above, I would be concerned about what happens if you
 detach/unload your package, because then your finalizer is still
 registered and will be called whenever '.CONNS' is garbage
 collected (or thereafter).  However, the finalizer function calls
 .disconnect(), which is no longer available.

 Finalizers should be used with great care, because you're not in
 control of the order in which things occur, what resources are
 around when the finalizer function is eventually called, or when it is
 called.  I've been bitten by this a few times and it can be very hard
 to reproduce and troubleshoot such bugs.  See also the 'Note' of
 ?reg.finalizer.
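[One defensive variant, sketched here with a stand-in `.disconnect()` helper as
in the example above: have the finalizer check that its helpers still exist
before calling them, so it degrades gracefully after an unload.]

```r
## Sketch: a finalizer body that does not crash if the package namespace
## (and thus .disconnect) is gone by the time the garbage collector runs.
.CONNS <- new.env(parent = emptyenv())

.disconnect <- function(name) {
  ## stand-in for real connection teardown
  rm(list = name, envir = .CONNS)
}

.guardedFinalizer <- function(e) {
  for (nm in ls(e)) {
    ## only call .disconnect() if it is still around
    if (exists(".disconnect", mode = "function"))
      .disconnect(nm)
  }
}
```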

 My $.02

 /Henrik

 
  Cheers and thanks!
 
  -m
 
 
 
 
  On Sun, Oct 26, 2014 at 8:53 PM, Gábor Csárdi csardi.ga...@gmail.com
  wrote:
  Well, to be honest I don't understand fully what you are trying to do.
  If you want to run code when the package is detached or when it is
  unloaded, then use a hook:
  http://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Load-hooks
 
  If you want to run code when an object is freed, then use a finalizer.
 
  Note that when you install a package, R runs all the code in the
  package and only stores the results of the code in the installed
  package. So if you create an object outside of a function in your
  package, then only the object will be stored in the package, but not
  the code that creates it. The object will be simply loaded when you
  load the package, but it will not be re-created.
 
  Now, I am not sure what happens if you set the finalizer on such an
  object in the package. I can imagine that the finalizer will not be
  saved into the package, and is only used once, when
  building/installing the package. In this case you'll need to set the
  finalizer in .onLoad().
 
  Gabor
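[A minimal sketch of the distinction Gabor draws, using hypothetical package
code: top-level objects are created once at install time and serialized,
while .onLoad() runs on every load, so that is where finalizers belong.]

```r
## Runs once, at install time; only the resulting object is stored
## in the installed package.
.cache <- new.env(parent = emptyenv())

## Runs every time the package is loaded; live setup such as
## finalizer registration goes here.
.onLoad <- function(libname, pkgname) {
  reg.finalizer(.cache, function(e) message("cleaning up .cache"),
                onexit = TRUE)
}
```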
 
  On Sun, Oct 26, 2014 at 10:35 PM, Murat Tasan mmu...@gmail.com wrote:
  Ah, thanks for the ls() vs names() tip!
  (But sadly, it didn't solve the issue... )
 
  So, after some more tinkering, I believe the finalizer is being called
  _sometimes_.
  I changed the reg.finalizer(...) call to just this:
 
   reg.finalizer(.CONNS, function(x) print("foo"), onexit = TRUE)
 
  Now, when I load the package and detach(..., unload = TRUE), nothing
  prints.
  And when I quit, nothing prints.
 
  If I, however, create an environment on the workspace, like so:
   e <- new.env(parent = emptyenv())
   reg.finalizer(e, function(x) print("bar"), onexit = TRUE)
   When I quit (or rm(e)), "bar" is printed.
   But no "foo" (corresponding to the same sequence of code, just in the
   package instead).
 
  BUT(!), when I

[Rd] OSX Yosemite (10.10): Are package binaries the same as for OSX Mavericks (10.9)?

2014-10-27 Thread Henrik Bengtsson
I'm trying to help someone to troubleshoot possible OSX Yosemite
issues, but I've only got access to OSX (< 10.9) so I cannot check
myself.

When building/installing binary R packages, there are different
binaries depending on OSX version.  For instance, CRAN provides
different binaries for 'OS X Snow Leopard' and 'OS X Mavericks', e.g.
http://cran.r-project.org/web/packages/matrixStats/index.html.

What about the new OSX Yosemite?  From
http://cran.r-project.org/doc/manuals/r-devel/R-admin.html#Yosemite it
looks like its binaries are the same/compatible with those of 'OS X
Mavericks' - can someone please confirm this?  Another way to put it,
if a repository provides OSX Mavericks binaries, will an OSX Yosemite
user install these or will s/he fall back to installing from source?

Thanks

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] OSX Yosemite (10.10): Are package binaries the same as for OSX Mavericks (10.9)?

2014-10-27 Thread Henrik Bengtsson
On Mon, Oct 27, 2014 at 11:23 AM, Dan Tenenbaum dtene...@fredhutch.org wrote:


 - Original Message -
 From: Dan Tenenbaum dtene...@fredhutch.org
 To: Henrik Bengtsson h...@biostat.ucsf.edu
 Cc: R-devel r-devel@r-project.org
 Sent: Monday, October 27, 2014 11:21:59 AM
 Subject: Re: [Rd] OSX Yosemite (10.10): Are package binaries the same as for 
 OSX Mavericks (10.9)?



 - Original Message -
  From: Henrik Bengtsson h...@biostat.ucsf.edu
  To: R-devel r-devel@r-project.org
  Sent: Monday, October 27, 2014 11:16:10 AM
  Subject: [Rd] OSX Yosemite (10.10): Are package binaries the same
  as for OSX Mavericks (10.9)?
 
  I'm trying to help someone to troubleshoot possible OSX Yosemite
   issues, but I've only got access to OSX (< 10.9) so I cannot check
  myself.
 
  When building/installing binary R packages, there are different
  binaries depending on OSX version.  For instance, CRAN provides
  different binaries for 'OS X Snow Leopard' and 'OS X Mavericks',
  e.g.
  http://cran.r-project.org/web/packages/matrixStats/index.html.
 
  What about the new OSX Yosemite?  From
  http://cran.r-project.org/doc/manuals/r-devel/R-admin.html#Yosemite
  it
  looks like its binaries are the same/compatible with those of 'OS X
  Mavericks' - can someone please confirm this?  Another way to put
  it,
  if a repository provides OSX Mavericks binaries will an OSX
  Yosemite
  user install these or we s/he fall back to installing from source?
 

 Yes, a Yosemite user will by default be installing packages built on
 Mavericks using the Mavericks build of R, and they should work.


 Provided of course that that Yosemite user is using the Mavericks build of R. 
 They could also be using the Snow Leopard build of R which should also work, 
 and would be installing by default packages build on Snow Leopard using the 
 Snow Leopard build of R.

Thanks for this Dan.

As far as I understand, for an OSX user to install binary packages
option 'pkgType' has to be set to either "mac.binary" or
"mac.binary.mavericks".  A few questions for clarification:

Q. Is it the default that 'pkgType' be set to "mac.binary" on OSX (<
10.9) and to "mac.binary.mavericks" on OSX (>= 10.9)?

Q. Are you saying that if an OSX (>= 10.9) user uses
options(pkgType="mac.binary"), then install.packages() will install
the OSX 10.6 (Snow Leopard) binaries *and* that these binaries are
backward compatible and should work equally well?

Q. In other words, if a user has problems with a particular OSX 10.9
(Mavericks) binary, would a first step of troubleshooting be to ask
that user to try the OSX 10.6 (Snow Leopard) build?

Q. If a user has options(pkgType="mac.binary.mavericks"), but the
repository does not provide such binaries, will install.packages()
fall back to "mac.binary", or will it go directly to source?
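[For reference, the relevant settings can be inspected directly from R; a
sketch of how one might check a user's configuration when troubleshooting.
The actual values depend on the build of R in use:]

```r
## What install.packages() will use by default in this session:
getOption("pkgType")   # e.g. "mac.binary", "mac.binary.mavericks", or "source"

## The compiled-in default for this build of R:
.Platform$pkgType

## Forcing a specific type for troubleshooting (package name is illustrative):
## install.packages("matrixStats", type = "mac.binary")
```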

/Henrik

PS. <rant>From a non-active OSX user, using names instead of numbers
to refer to versions is cute but insane. You need a very good memory
to keep track of the ordering of Snow Leopard, Leopard, Mavericks etc.
and it's not getting easier.</rant>  It would be great if R/BioC and
everyone else would always present the version number when talking
about OSX versions and only use the name for redundancy.


 Dan


 Dan


  Thanks
 
  Henrik
 
  __
  R-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-devel
 

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] OSX Yosemite (10.10): Are package binaries the same as for OSX Mavericks (10.9)?

2014-10-28 Thread Henrik Bengtsson
On Mon, Oct 27, 2014 at 12:33 PM, Dan Tenenbaum dtene...@fredhutch.org wrote:


 - Original Message -
 From: Henrik Bengtsson h...@biostat.ucsf.edu
 To: Dan Tenenbaum dtene...@fredhutch.org
 Cc: R-devel r-devel@r-project.org
 Sent: Monday, October 27, 2014 12:21:49 PM
 Subject: Re: [Rd] OSX Yosemite (10.10): Are package binaries the same as for 
 OSX Mavericks (10.9)?

 On Mon, Oct 27, 2014 at 11:23 AM, Dan Tenenbaum
 dtene...@fredhutch.org wrote:
 
 
  - Original Message -
  From: Dan Tenenbaum dtene...@fredhutch.org
  To: Henrik Bengtsson h...@biostat.ucsf.edu
  Cc: R-devel r-devel@r-project.org
  Sent: Monday, October 27, 2014 11:21:59 AM
  Subject: Re: [Rd] OSX Yosemite (10.10): Are package binaries the
  same as for OSX Mavericks (10.9)?
 
 
 
  - Original Message -
   From: Henrik Bengtsson h...@biostat.ucsf.edu
   To: R-devel r-devel@r-project.org
   Sent: Monday, October 27, 2014 11:16:10 AM
   Subject: [Rd] OSX Yosemite (10.10): Are package binaries the
   same
   as for OSX Mavericks (10.9)?
  
   I'm trying to help someone to troubleshoot possible OSX Yosemite
    issues, but I've only got access to OSX (< 10.9) so I cannot
   check
   myself.
  
   When building/installing binary R packages, there are different
   binaries depending on OSX version.  For instance, CRAN provides
   different binaries for 'OS X Snow Leopard' and 'OS X Mavericks',
   e.g.
   http://cran.r-project.org/web/packages/matrixStats/index.html.
  
   What about the new OSX Yosemite?  From
   http://cran.r-project.org/doc/manuals/r-devel/R-admin.html#Yosemite
   it
   looks like its binaries are the same/compatible with those of
   'OS X
   Mavericks' - can someone please confirm this?  Another way to
   put
   it,
   if a repository provides OSX Mavericks binaries will an OSX
   Yosemite
   user install these or we s/he fall back to installing from
   source?
  
 
  Yes, a Yosemite user will by default be installing packages built
  on
  Mavericks using the Mavericks build of R, and they should work.
 
 
  Provided of course that that Yosemite user is using the Mavericks
  build of R. They could also be using the Snow Leopard build of R
  which should also work, and would be installing by default
  packages build on Snow Leopard using the Snow Leopard build of R.

 Thanks for this Dan.

 As far as I understand, for an OSX user to install binary packages
 option 'pkgType' has to be set to either "mac.binary" or
 "mac.binary.mavericks".  A few questions for clarification:

 Q. Is it the default that 'pkgType' be set to "mac.binary" on OSX (<
 10.9) and to "mac.binary.mavericks" on OSX (>= 10.9)?


 Q. Are you saying that if an OSX (>= 10.9) user uses
 options(pkgType="mac.binary"), then install.packages() will install
 the OSX 10.6 (Snow Leopard) binaries *and* that these binaries are
 backward compatible and should work equally well?

 Q. In other words, if a user has problems with a particular OSX 10.9
 (Mavericks) binary, would a first step of troubleshooting be to ask
 that user to try the OSX 10.6 (Snow Leopard) build?

 Q. If a user has options(pkgType="mac.binary.mavericks"), but the
 repository does not provide such binaries, will install.packages()
 fall back to "mac.binary", or will it go directly to source?



 First of all, this should be on R-SIG-Mac.

I considered that, but I'm also asking this as a package developer who
wonders what happens if someone installs my packages incorrectly and I
need to troubleshoot what's reported as a bug but may not be one, so I
thought it would be more appropriate here.


 It all depends on what build of R you are using. You can be on Snow Leopard
 or later (including Mavericks and Yosemite) and use the Snow Leopard build.
 The default package type will be "mac.binary".

 You can be on Mavericks or later and using the Mavericks build of R, and your
 package type will by default be "mac.binary.mavericks".

Just for the record: I've verified that it is not possible to install
the Mavericks build of R on a pre-Mavericks OSX version by mistake; on
an OSX 10.6.8 machine I get:

$ wget 
http://r.research.att.com/mavericks/R-3.1-branch/R-3.1-branch-mavericks.pkg
$ sudo installer -pkg R-3.1-branch-mavericks.pkg -target /
...
installer: This build of R requires Mac OS X 10.9 or higher.
$


 The two types of binary packages are NOT binary compatible! You should not 
 mix and match them. (Technically, if a given package does not have native 
 code in it, it should work, but you don't really want to go there.)

I understand that packages without native code should work, but is
there a reason why R and install.packages() allow such mixing and
matching in the first place?  I've tested
install.packages("matrixStats", type="mac.binary.mavericks") on an OSX
10.6.8 machine and it installs the package without complaints.
Wouldn't it be better then if it gave an error:

> install.packages("matrixStats", type="mac.binary.mavericks")
Installing package into '/Users/hb/Library/R/3.1/library'
(as 'lib

[Rd] Milestone: 6000 packages on CRAN

2014-10-29 Thread Henrik Bengtsson
Another 1000 packages were added to CRAN, and this time in less than 12
months.  Today (2014-10-29) on The Comprehensive R Archive Network
(CRAN) [1]:

"Currently, the CRAN package repository features 6000 available packages."

Going from 5000 to 6000 packages took 355 days - which means that, on
average, only ~8.5 hours passed between each new package added.  It is
actually even more frequent, since dropped packages are not
accounted for.  The 6000 packages on CRAN are maintained by 3444
people [2].  Thanks to all package developers and to the CRAN Team for
handling all this!

You can give back by carefully reporting bugs to the maintainers and
properly citing any packages you use in your publications, cf.
citation("pkg name").

Milestones:

2014-10-29: 6000 packages [this post]
2013-11-08: 5000 packages [8]
2012-08-23: 4000 packages [7]
2011-05-12: 3000 packages [6]
2009-10-04: 2000 packages [5]
2007-04-12: 1000 packages [4]
2004-10-01: 500 packages [3,4]
2003-04-01: 250 packages [3,4]

[1] http://cran.r-project.org/web/packages/
[2] http://cran.r-project.org/web/checks/check_summary_by_maintainer.html
[3] Private data.
[4] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
[5] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
[6] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
[7] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html
[8] https://stat.ethz.ch/pipermail/r-devel/2013-November/067935.html

/Henrik

PS. These data are for CRAN only. There are many more packages
elsewhere, e.g. R-Forge, Bioconductor, Github etc.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Citation if copying R base code

2014-11-06 Thread Henrik Bengtsson
On Nov 6, 2014 3:36 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

 On 06/11/2014, 5:57 AM, Peter Meissner wrote:
  Dear Listeners,
 
   ... although I read the CRAN policies and tried to solve those questions
   myself, I feel very much in need of good advice ...
 
 
  I am currently finishing a package that -- to solve some nasty problems
  with dirty data -- uses its own as.Date() equivalent methods (i.e. its
  own generic and methods).
 
  Thereby, I shamelessly copied code from the as.Date() methods from the
  base package and only made some minor adjustments.

 There's no problem doing that, as long as you respect the license.  That
 includes keeping the copyright notices from the files where you found
 the code:  see the GPL

 
   Since my main achievement was copy-pasting, I feel obliged to cite the
   efforts made by the base package authors - do I, should I? Currently I only
   use the help files to mention that the generic and its methods are
   basically the same as as.Date(), except this and that.

 In your package help file it would be polite to describe the
 contributions from the R source code.  In the DESCRIPTION file, the rule
 is that all significant contributors must be included.  You'll need to
 judge that, but from your description, I'd guess this counts.

  And if yes how to do it best? What is the standard procedure here?
  Should I include base package authors as contributors in DESCRIPTION???
 
  Am I allowed to use MIT + file license with that or is it wrong to do
so?

 No, you must use the GPL, since the code you copied is licensed under
 the GPL.  You can choose to use version 2 or 3 (or both).  You do not
 have permission to re-license R code under a different license.

Theoretically you could ask the copyright holder of that piece of code
whether he/she/it allows you to use a different license. This brings up
another question: who is formally the copyright holder of the R source code
(and documentation)? The R Foundation, the individual who contributed the
code in the first place, or someone else? You could certainly imagine a
case where a piece of code was donated to R by someone, e.g. the code
originates from a user-contributed package and has not been modified since.
It may even be that that code was licensed under another license at the
time.

Henrik


 Duncan Murdoch

 
 
   I appreciate any advice on these (I think important) but very confusing
   matters of referencing and licensing.
 
 
  Best, Peter
 
 
  PS:
  - My current description:
  https://github.com/petermeissner/wikipediatrend/blob/master/DESCRIPTION
 
  - the package specific as.Date() implementation:
  https://github.com/petermeissner/wikipediatrend/blob/master/R/wp_date.R
 
  __
  R-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-devel
 

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to maintain memory in R extension

2014-11-12 Thread Henrik Bengtsson
On Wed, Nov 12, 2014 at 10:20 AM, Martin Morgan mtmor...@fredhutch.org wrote:
 On 11/12/2014 05:36 AM, Zheng Da wrote:

 Hello,

 I wrote a system to perform data analysis in C++. Now I am integrating
 it to R. I need to allocate memory for my own C++ data structures,
 which can't be represented by any R data structures. I create a global
 hashtable to keep a reference to the C++ data structures. Whenever I
 allocate one, I register it in the hashtable and return its key to the
 R code. So later on, the R code can access the C++ data structures
 with their keys.

 The problem is how to perform garbage collection on the C++ data
 structures. Once an R object that contains the key is garbage
 collected, the R code can no longer access the corresponding C++ data
 structure, so I need to deallocate it. Is there any way that the C++
 code can get notification when an R object gets garbage collected? If
 not, what is the usual way to manage memory in R extensions?


 register a finalizer that runs when there are no longer references to the R
 object, see ?reg.finalizer or the interface to R and C finalizers in
 Rinternals.h. If you return more than one reference to a key, then of course
 you'll have to manage these in your own C++ code.

A small but important addition: Make sure your registered finalizer
also works, or at least doesn't core dump R, if your package (or one of
its dependencies) happens to be unloaded by the time the garbage
collector runs.  This task seems easy but can be quite tricky, e.g.
should you reload your package temporarily, and what are the side
effects of doing that?
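[On the R side, one hedged way to express this is to wrap each C++-side key in
a small handle environment with its own guarded finalizer. `native_free()`
below is a hypothetical stand-in for the package's actual C++ entry point,
not a real API:]

```r
## Sketch: one handle environment per native resource, so the finalizer
## fires when the individual handle becomes unreachable.
make_handle <- function(key) {
  h <- new.env(parent = emptyenv())
  h$key <- key
  reg.finalizer(h, function(e) {
    ## Be defensive: the native routine may be gone if the package
    ## was unloaded before the garbage collector ran.
    if (exists("native_free", mode = "function"))
      native_free(e$key)
  }, onexit = TRUE)
  h
}
```

For native code proper, the C-level interface (R_RegisterCFinalizerEx on an
external pointer, as mentioned above) is the more direct route.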

/Henrik


 Martin Morgan


 Thanks,
 Da

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel



 --
 Computational Biology / Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N.
 PO Box 19024 Seattle, WA 98109

 Location: Arnold Building M1 B861
 Phone: (206) 667-2793


 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Error promise already under evaluation ... with function(x, dim=dim(x))

2014-11-14 Thread Henrik Bengtsson
I've meant to ask the following for several years now.  I understand why:

> foo <- function(x, dim=dim) { dim }
> foo(1)
Error in foo(1) :
  promise already under evaluation: recursive default argument
reference or earlier problems?

gives an error, but why wouldn't/couldn't the following work?

> foo <- function(x, dim=dim(x)) { dim }
> foo(1)
Error in foo(1) :
  promise already under evaluation: recursive default argument
reference or earlier problems?

As a workaround I also tried:

> foo <- function(x, dim) { if (missing(dim)) dim <- dim(x); dim }
> foo(1)
Error in foo(1) : argument "dim" is missing, with no default

which surprised me too.


For the first case, is the rationale related to:

> foo <- function(x, a=dim(x), dim) { a }
> foo(1)
Error in foo(1) : argument "dim" is missing, with no default

and

> foo <- function(x, a=dim(x), dim=a) { a }
> foo(1)
Error in foo(1) :
  promise already under evaluation: recursive default argument
reference or earlier problems?

[since here argument 'dim' could take a function, e.g. foo(1,
dim=length)], and that R treats

foo <- function(x, dim=dim(x)) { dim }

in a similar way?  That is, is R not clever enough to detect this as
a special case, but instead goes ahead and tries to evaluate the
default expression (=dim(x)) of argument 'dim' in order to get its
default value?  If so, is there anything preventing R from supporting
this special case, e.g. by evaluating the default expression without
argument/symbol 'dim' itself being in the picture, to avoid it finding
itself?  (Sorry if I'm using the incorrect words here.)


Yes, I understand that I can do:

> foo <- function(x, dim=base::dim(x)) { dim }
> foo(1)
NULL

> foo <- function(x, dim=NULL) { if (is.null(dim)) dim <- dim(x); dim }
> foo(1)
NULL

or

> foo <- function(x, dim.=dim(x)) { dim. }
> foo(1)
NULL

but I would prefer not to have to resort to those rather ad hoc solutions in my code.
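[For what it's worth, the workarounds that do evaluate cleanly can be checked
directly:]

```r
## Two of the working alternatives from above, exercised on a plain numeric
## (which has no dim attribute) and on a matrix (which does).
foo1 <- function(x, dim = base::dim(x)) dim
foo2 <- function(x, dim = NULL) { if (is.null(dim)) dim <- dim(x); dim }

foo1(1)                 # NULL: a plain numeric has no dim attribute
foo2(matrix(1:6, 2))    # c(2L, 3L): the matrix's dimensions
```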


Thanks,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Code tools for identifying which package A functions package B use?

2014-11-14 Thread Henrik Bengtsson
Hi,

I'd like to list all package PkgA functions that another package PkgB
uses via Depends or Imports (ignoring Suggests for simplicity).  As
long as PkgB uses importFrom(PkgA, ...) it's just a matter of
parsing the NAMESPACE file or inspecting
asNamespace("PkgB")$.__NAMESPACE__.$imports.  However, what can be
done in case PkgB uses import(PkgA)?  Is there a function/package
already available for this?

Thanks,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Code tools for identifying which package A functions package B use?

2014-11-14 Thread Henrik Bengtsson
Thanks Kasper, that seems to do it:

$ url=https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/codetoolsBioC
$ svn checkout --username readonly --password readonly $url
$ R CMD build codetoolsBioC
$ R CMD INSTALL codetoolsBioC
$ R

> library(codetoolsBioC)
> deps <- findExternalDeps("MASS")
attaching required packages 'MASS'
Loading required package: MASS
> str(deps)
List of 4
 $ S4Classes: list()
 $ S4Methods:List of 1
  ..$ methods: chr "body<-"
 $ functions:List of 5
  ..$ base : chr [1:248] "-" "!" "!=" "$" ...
  ..$ graphics : chr [1:16] "abline" "axis" "box" "frame" ...
  ..$ grDevices: chr [1:5] "dev.flush" "dev.hold" "nclass.FD" ...
  ..$ methods  : chr "new"
  ..$ stats: chr [1:97] ".checkMFClasses" ".getXlevels" ...
 $ variables:List of 1
  ..$ base: chr [1:4] ".GlobalEnv" ".Machine" ".Options" "pi"


Great!

/Henrik
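[Relatedly, the codetools package that ships with R can list the external
symbols a single function references; a small sketch, not a replacement for
findExternalDeps():]

```r
library(codetools)

f <- function(x) mean(x) + sd(x)

## findGlobals() returns the global functions and variables that f
## refers to, i.e. everything not bound locally within f.
findGlobals(f, merge = FALSE)$functions
```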

On Fri, Nov 14, 2014 at 8:41 PM, Kasper Daniel Hansen
kasperdanielhan...@gmail.com wrote:
 The best thing I have found is codetoolsBioC in the Bioconductor subversion
 repository.

 Best,
 Kasper

 On Fri, Nov 14, 2014 at 9:57 PM, Henrik Bengtsson h...@biostat.ucsf.edu
 wrote:

 Hi,

 I'd like to list all package PkgA functions that another package PkgB
 use via Depends or Imports (ignoring Suggests for simplicity).  As
 long as PkgB uses importFrom(PkgA, ...) it's just a matter of
 parsing the NAMESPACE file or inspecting
 asNamespace(PkgB)$.__NAMESPACE__.$imports.  However, what can be
 done in case PkgB uses import(PkgA)?  Is there a function/package
 already available for this?

 Thanks,

 Henrik

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Cursor not behaving properly

2014-11-19 Thread Henrik Bengtsson
FYI, it might be useful to check if the bug also appears on R-devel as
well as on earlier versions of R.  That might narrow down whether it
was introduced in a particular R version or not, which in turn would
be useful to whoever might try to tackle this problem.  It might not
even be an R problem in the end.

/Henrik

On Wed, Nov 19, 2014 at 1:14 PM, Scott Kostyshak skost...@princeton.edu wrote:
 On Tue, Nov 18, 2014 at 9:50 PM, Scott Kostyshak skost...@princeton.edu 
 wrote:
 On Mon, Nov 10, 2014 at 10:52 AM, Kaiyin Zhong (Victor Chung)
 kindlych...@gmail.com wrote:
 I found a strange bug in R recently (version 3.1.2):

  As you can see from the screenshots attached, when the cursor passes the
  right edge of the console, instead of starting on a new line, it goes back to
  the beginning of the same line and overwrites everything after it.

 This happens every time the size of the terminal is changed, for example,
 if you fit the terminal to the right half of the screen, start an R
 session, exec some commands, maximize the terminal, and type a long command
 into the session, then you will find the bug reproduced.

 I am on Ubuntu 14.04, and I have tested this in konsole, guake and
 gnome-terminal.

 I can reproduce this, also on Ubuntu 14.04, with gnome-terminal and
 xterm. If you don't get any response here, please file a bug report at
 bugs.r-project.org.

 For archival purposes, the OP reported the bug here:
 https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16077

 Scott


 --
 Scott Kostyshak
 Economics PhD Candidate
 Princeton University

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R string comparisons may vary with platform (plain text)

2014-11-22 Thread Henrik Bengtsson
On Sat, Nov 22, 2014 at 12:42 PM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
 On 22/11/2014, 2:59 PM, Stuart Ambler wrote:
 A colleague's R program behaved differently when I ran it, and we thought
 we traced it probably to different results from string comparisons as
 below, with different R versions.  However the platforms also differed.  A
 friend ran it on a few machines and found that the comparison behavior
 didn't correlate with R version, but rather with platform.

 I wonder if you've seen this.  If it's not some setting I'm unaware of,
 maybe someone should look into it.  Sorry I haven't taken the time to read
 the source code myself.

 Looks like a collation order issue.  See ?Comparison.

With the oddity that both platforms use what look like similar locales:

LC_COLLATE=en_US.UTF-8
LC_COLLATE=en_US.utf8

/Henrik
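[The locale dependence can be demonstrated deliberately by switching to the
"C" collation, where string comparison is plain byte-wise:]

```r
## Save the session's collation locale, compare under "C", then restore.
old <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")

res <- "-1" < "1"   # byte-wise: "-" (0x2d) sorts before "1" (0x31), so TRUE

Sys.setlocale("LC_COLLATE", old)  # restore the original collation
res
```

Under language collations such as en_US.UTF-8, the ICU/glibc rules (which may
treat "-" as ignorable) can give a different answer, which is presumably what
varied across the two platforms above.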


 Duncan Murdoch

 Thanks,
 Stuart

 R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
 Platform: x86_64-unknown-linux-gnu (64-bit)
 Sys.getlocale()
 [1]
 LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF
 -8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_
 NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICA
 TION=C

 -1  1
 [1] TRUE

 -1 1
 [1] FALSE

 1  -1
 [1] TRUE

 1  -
 [1] FALSE

 Vs.

 R version 3.1.1 (2014-07-10) -- "Sock it to Me"
 Platform: x86_64-redhat-linux-gnu (64-bit)
 Sys.getlocale()
 [1]
 LC_CTYPE=en_US.utf8;LC_NUMERIC=C;LC_TIME=en_US.utf8;LC_COLLATE=en_US.utf8
 ;LC_MONETARY=en_US.utf8;LC_MESSAGES=en_US.utf8;LC_PAPER=en_US.utf8;LC_NAME
 =C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.utf8;LC_IDENTIFICATION
 =C

 -1  1
 [1] FALSE

 -1 1
 [1] TRUE

 1  -1
 [1] FALSE

 1  -
 [1] FALSE

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Rprof(..., memory.profiling=TRUE) to profile C memory allocation?

2014-11-22 Thread Henrik Bengtsson
Could someone please confirm/refute that Rprof(...,
memory.profiling=TRUE) can also be used to profile memory allocation
done in a C function (src/*.c) that uses, e.g.

  allocVector(INTSXP, n)

but also allocations such as

 R_alloc(n, sizeof(int))

?

Modulo how R was built, does the answer depend on OS?  I'm interested
in all the major ones (Linux, OS X and Windows).

Thanks,

Henrik
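[As a starting point for experimentation (this does not by itself settle the
question about C-level allocations), the R-level machinery looks like this;
it requires a build of R with memory profiling enabled:]

```r
## Sketch: collect a memory profile and summarize it. Whether C-level
## allocVector()/R_alloc() calls are attributed here is exactly the
## open question posed above.
if (capabilities("profmem")) {
  tf <- tempfile()
  Rprof(tf, memory.profiling = TRUE)
  for (i in 1:200) x <- sum(rnorm(1e5))   # generate some allocations
  Rprof(NULL)
  ## summaryRprof(tf, memory = "both") reports memory alongside timings
}
```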

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Error promise already under evaluation ... with function(x, dim=dim(x))

2014-11-23 Thread Henrik Bengtsson
On Sat, Nov 15, 2014 at 1:47 AM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:

 On 14/11/2014, 9:06 PM, Henrik Bengtsson wrote:
  I've meant to ask the following for several years now.  I understand why:
 
   foo <- function(x, dim=dim) { dim }
  foo(1)
  Error in foo(1) :
promise already under evaluation: recursive default argument
  reference or earlier problems?
 
  gives an error, but why wouldn't/couldn't the following work?
 
   foo <- function(x, dim=dim(x)) { dim }
  foo(1)
  Error in foo(1) :
promise already under evaluation: recursive default argument
  reference or earlier problems?

  You refer to "dim".  There's a "dim" defined in the argument list, so R
  uses that definition of it.

  But you didn't supply any value, so it tries to evaluate the default
  value.  Default expressions are always evaluated in the evaluation frame
  of the function call, so it looks for a function named "dim" in the
  local frame.

 It finds the argument in the local frame, so it tries to figure out if
 it is a function or a value.  It needs to evaluate it to do that, and
 you get the recursion.

 
  As a workaround I also tried:
 
  foo <- function(x, dim) { if (missing(dim)) dim <- dim(x); dim }
  foo(1)
  Error in foo(1) : argument dim is missing, with no default
 
  which surprised me too.
 
 
  For the first case, is the rationale related to:
 
  foo <- function(x, a=dim(x), dim) { a }
  foo(1)
  Error in foo(1) : argument dim is missing, with no default
 
  and
 
  foo <- function(x, a=dim(x), dim=a) { a }
  foo(1)
  Error in foo(1) :
promise already under evaluation: recursive default argument
  reference or earlier problems?
 
  [since here argument 'dim' could take a function, e.g. foo(1,
  dim=length)], and that R treats
 
  foo <- function(x, dim=dim(x)) { dim }
 
  in a similar way?  That is, is R not clever enough to detect this as
  a special case, but instead goes ahead and tries to evaluate the
  default expression (=dim(x)) of argument 'dim' in order to get its
  default value?  If so, is there anything preventing R from supporting
  this special case, e.g. by evaluating the default expression without
  argument/symbol 'dim' itself being in the picture, to avoid it finding
  itself?  (Sorry if I'm using the incorrect words here.)

 No, it shouldn't do that.  It should use consistent rules for evaluation
 or there would be sure to be bugs.

 
  Yes, I understand that I can do:
 
  foo <- function(x, dim=base::dim(x)) { dim }

 This is what you should do.

  foo(1)
  NULL
 
  foo <- function(x, dim=NULL) { if (is.null(dim)) dim <- dim(x); dim }

 This works, because when R is looking up the function dim(), it can
 evaluate the local argument dim and see it is not a function, so it
 proceeds to the parent frame.

  foo(1)
  NULL
 
  or
 
  foo <- function(x, dim.=dim(x)) { dim. }
  foo(1)
  NULL

 This is another solution that works, but it has the ugly argument name
 now, so you'll get warnings during package checks from calls like

 foo(1, dim=2)

 
  but I would prefer not to have to turn those rather ad hoc solutions in my 
  code.

 Nothing ad hoc about the first one.

Thanks for the feedback.  I agree that base::dim(x) is clean and
clear, but unfortunately there is a ~500 times overhead in using '::'.
Since I went through the effort of doing the benchmarking and finding
faster solutions, I'm sharing the following:

> library(microbenchmark)

> x <- matrix(1:(80*80), nrow=80)

> # Not legal, because it calls .Primitive().
> dim_illegal <- base::dim

> dim_R <- function(x) {
+   ns <- getNamespace("base")
+   dim <- get("dim", envir=ns, inherits=FALSE, mode="function")
+   dim(x)
+ }

> dim_R_memoized <- local({
+   dim <- NULL
+   function(x) {
+     if (is.null(dim)) {
+       dim <<- get("dim", envir=getNamespace("base"), inherits=FALSE,
+                   mode="function")
+     }
+     dim(x)
+   }
+ })

> stats <- microbenchmark(
+   dim(x),
+   base::dim(x),
+   dim_R(x),
+   dim_R_memoized(x),
+   dim_illegal(x),
+   sum(x),
+   unit="ns",
+   times=10e3
+ )
Warning message:
In microbenchmark(dim(x), base::dim(x), dim_R(x), dim_R_memoized(x),  :
  Could not measure a positive execution time for 3859 evaluations.

> print(stats)
Unit: nanoseconds
  expr  min   lq   mean medianuq max neval   cld
dim(x)0025.2226  1 1   10780 1 a
  base::dim(x) 6545 7700 10429.0165   8470 12897 2678155 1 e
  dim_R(x) 3080 3851  5163.8612   4236  6545   55435 1   c
 dim_R_memoized(x)  385  771  1238.8292   1156  1541   44656 1  b
dim_illegal(x)0151.4421  1 15775 1 a
sum(x) 8085 8470  9590.9570   8470 10395   49660 1d

Yes, yes, the extra cost of using base::dim(x) is only ~10 us, but if
you do, say, a million bootstrap samples calling this function, that's
an extra unnecessary 10 seconds of processing time.  As a comparison,
the overhead is roughly the same as summing 6400 integers.

For workarounds, I considered:

(a) dim_illegal
(b) dim_R


[Rd] R CMD check --as-cran and (a)spell checking

2014-12-05 Thread Henrik Bengtsson
Does anyone know if it is possible to add a dictionary file of known
words that becomes part of the *built* package to tell 'R CMD check
--as-cran' not to report these words as misspelled.  I want this
dictionary to come with the *.tar.gz such that it will be available
regardless where the package is checked.  For instance, currently I
get:

* using log directory 'T:/R/_R-3.1.2patched/matrixStats.Rcheck'
* using R version 3.1.2 Patched (2014-12-03 r67101)
* using platform: x86_64-w64-mingw32 (64-bit)
* using session charset: ISO8859-1
* checking for file 'matrixStats/DESCRIPTION' ... OK
* this is package 'matrixStats' version '0.12.0'
* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Henrik Bengtsson henr...@braju.com'
Possibly mis-spelled words in DESCRIPTION:
  rowMedians (18:74)
  rowRanks (18:92)
  rowSds (18:111)
* checking package namespace information ... OK
...

Thanks

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R on the Cydia Store

2014-12-09 Thread Henrik Bengtsson
On Dec 9, 2014 6:38 AM, Apps Embedded apps.embed...@gmail.com wrote:

 Hi,

 We have published an Android app called "R Console" on the Play Store since
 December 2013.
 https://play.google.com/store/apps/details?id=com.appsopensource.R
 https://play.google.com/store/apps/details?id=com.appsopensource.Rpremium

 In the meantime, we have developed its equivalent app for the App Store.
 We released it in March 2014. We have been approved since that date by Apple
 to publish it worldwide.
 Recently, we learnt that GPL apps are not compatible with the App Store
 distribution licence.

What I would like to write here would fall under "this is certainly
not a topic for R-devel", so I refrain.

However, I can say that it's likely your problems wouldn't have stopped there;

[R] R on the iPhone/iPad? Not so much... a GPL violation
https://stat.ethz.ch/pipermail/r-help/2010-June/240901.html

/Henrik


 Thus we decided to remove the iOS app from the App Store several days ago.

 We are thinking of publishing the same app published under Cydia with a
 freemium model.
 Its licence would be GPL v3.

 What we would like to do under Cydia with R Console is to have the
 following behavior:
 - free version: able to run recommended packages; graphics are
 not enabled. A small ad banner is present on top of the app.
 - premium version: the same as the free version, except the ad banner
 will not be present anymore and 3 compilers will be integrated into the app
 in order to be able to compile and run most of the CRAN packages from
 source.
 - graphics may be added in a second step.

 The app will be considered as a bundle of open-source tools. This bundle
 will be under the GNU General Public Licence version 3. Each open-source
 tool which contributes to the overall bundle will stay under its original
 licence (R is GPL v2, for instance) but the bundle will be GPL v3.


 From your point of view, do you see any legal issue with this project under
 Cydia for jailbroken iOS devices?
 From a trademark point of view, are the names of the apps "R Console Free"
 and "R Console Premium" OK?

 Thanks for your help.

 Apps Embedded Team.

 [[alternative HTML version deleted]]

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?

2014-12-11 Thread Henrik Bengtsson
SUGGESTION:
Would it make sense if install.packages() and friends always use an
ascii(*) encoding when parse():ing R package source code files?

I believe this should be safe, because R code files should be in ASCII
[http://en.wikipedia.org/wiki/ASCII] and only in source-code comments
you may use other characters.  This is from Section 'Package
subdirectories' in 'Writing R Extensions':

Only ASCII characters (and the control characters tab, formfeed, LF
and CR) should be used in code files. Other characters are accepted in
comments, but then the comments may not be readable in e.g. a UTF-8
locale. Non-ASCII characters in object names will normally fail when
the package is installed. Any byte will be allowed in a quoted
character string but \u escapes should be used for non-ASCII
characters. However, non-ASCII character strings may not be usable in
some locales and may display incorrectly in others.

Since comments are dropped by parse(), their actual content does not
matter, and the rest of the code should be in ASCII.

(*) It could be that the specific encoding "ascii" is not
cross-platform. If so, is there another way to specify a pure ASCII
encoding?



BACKGROUND:
If a user/system sets the 'encoding' option at startup, it may break
package installations from source if the package has source code
comments with non-ASCII characters.  For example,

$ mkdir foo; cd foo
$ echo "options(encoding='UTF-8')" > .Rprofile
$ R --vanilla
> install.packages("R.oo", type="source")
Installing package into 'C:/Users/hb/R/win-library/3.2'
(as 'lib' is unspecified)
--- Please select a CRAN mirror for use in this session ---
trying URL 'http://cran.at.r-project.org/src/contrib/R.oo_1.18.0.tar.gz'
Content type 'application/x-gzip' length 394545 bytes (385 KB)
opened URL
downloaded 385 KB

* installing *source* package 'R.oo' ...
** package 'R.oo' successfully unpacked and MD5 sums checked
** R
Warning in parse(outFile) :
  invalid input found on input connection 'C:/Users/hb/R/win-library/3.2/R.oo/R/
R.oo'
** inst
** preparing package for lazy loading
Warning in parse(n = -1, file = file, srcfile = NULL, keep.source = FALSE) :
  invalid input found on input connection 'C:/Users/hb/R/win-library/3.2/R.oo/R/
R.oo'
** help
[...]

(This can be an extremely time-consuming task to troubleshoot,
particularly if reported to a package maintainer not having access to
the original system).

FYI, setting it only in the session is alright:

> options(encoding="UTF-8")
> install.packages("R.oo", type="source")

because install.packages() launches a separated R process for the
installation and it's only then the startup code becomes an issue.


TROUBLESHOOTING:
My understanding for the

Warning in parse(n = -1, file = file, srcfile = NULL, keep.source = FALSE) :
  invalid input found on input connection 'C:/Users/hb/R/win-library/3.2/R.oo/R/

is that this happens when there is a non-ASCII character in one of the
source-code comments (*) with a bit pattern matching a multi-byte
UTF-8 sequence [http://en.wikipedia.org/wiki/UTF-8#Description].  For
instance, consider a source code comment with an acute accent:

> raw <- as.raw(c(0x23, 0x20, 0xe9, 0x74, 0x75, 0x64, 0x69, 0x61, 0x6e, 0x74, 0x0a))
> writeBin(raw, con="foo.R")
> code <- readLines("foo.R")
> code
[1] "# étudiant"

> options(encoding="UTF-8")
> parse("foo.R")
Warning message:
In readLines(file, warn = FALSE) :
  invalid input found on input connection 'foo.R'

> options(encoding="ascii")
> parse("foo.R")
expression()

Reason for the invalid input: the bit pattern for raw[3:5] is:

> R.utils::intToBin(raw[3:5])
[1] "11101001" "01110100" "01110101"

The first byte (raw[3]) matched the special UTF-8 byte pattern 1110xxxx,
which according to UTF-8 should be followed by two more bytes with bit
patterns 10xxxxxx and 10xxxxxx
[http://en.wikipedia.org/wiki/UTF-8#Description].  Since raw[4:5] do
not match those, it's an invalid UTF-8 byte sequence.  So, technically
this does not happen for all comments using acute accents, but it's
very likely.  More generally, a multi-byte UTF-8 sequence is expected
when a byte pattern 11xxxxxx (>= 192 in decimal value) is encountered.
Looking at http://en.wikipedia.org/wiki/ISO/IEC_8859, there are several
characters with this bit pattern in many Latin-N encodings, which
I'd assume are still in dominant use by many developers.

So, since options(encoding=UTF-8) was set at startup, that is also
the encoding that R tries to follow.  My suggestion is that it seems
that R should be able to always use a pure-ASCII encoding when parsing
R code in packages, because that is what 'Writing R Extensions' says
we should use in the first place.
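
As an aside (not part of the proposal above), one way to spot such bytes up
front is tools::showNonASCIIfile(), which prints each line of a file that
contains non-ASCII bytes; "foo.R" here is the file from the étudiant example
above.

```r
# Recreate the comment with a Latin-1 acute accent and scan it for
# non-ASCII bytes before parse() ever sees it.
raw <- as.raw(c(0x23, 0x20, 0xe9, 0x74, 0x75, 0x64, 0x69, 0x61, 0x6e, 0x74, 0x0a))
writeBin(raw, con = "foo.R")
tools::showNonASCIIfile("foo.R")  # prints the offending line with bytes escaped
```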

/Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?

2014-12-11 Thread Henrik Bengtsson
On Thu, Dec 11, 2014 at 10:47 AM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
 On 11/12/2014 12:59 PM, Henrik Bengtsson wrote:

 SUGGESTION:
 Would it make sense if install.packages() and friends always use an
 ascii(*) encoding when parse():ing R package source code files?


 I think that would be a step backwards.  It would be better to accept other
 encodings.  As an English speaker this isn't a big deal to me, but users of
 other languages may want to have messages and variable names in their native
 language, and ASCII might not be enough for that.

Thanks for the feedback.  While I'll probably agree with you that R
packages should support other source code encodings than ASCII, that
would require a change in the specifications and design.  What I'm
proposing is (just) an adjustment to the implementation to meet the
current specs and design.


 On the other hand, I think it's quite reasonable to require a declared
 encoding if anything other than ASCII is used, and possibly to fail for some
 encodings.  It is probably also reasonable to at least warn when non-ASCII
 characters are used in strings in packages on CRAN, as many users can't
 display all characters.

That would be a reasonable extension of the design, which would be
backward compatible with the current design, i.e. if encoding for the
source code is not declared, then it is assumed to be ASCII.

Source code comments are special, because by the current design
('Writing R Extensions'), it somehow leaves it open to use any type of
encoding.  If I read it freely, it could even be that you can use
different encoding for different comments in the same file (which is
 not unlikely to occur considering cut'n'paste and open-source
licenses).  If other encodings are to be supported, then I see two
ways forward:

1. Have R completely ignore what's in the comments (what follows #
until the newline) such that encoding does not matter, or
2. require the same encoding for the source code comments as the rest
of the code.

As I see it, today's design falls (could fall?) under 1, but the
implementation does not go all the way to support it.

/Henrik

PS. It should be emphasized that this is about R packages. AFAIK, you
can already now source() code written in any encoding, e.g.
> raw <- as.raw(c(
+   0xcf, 0x80, 0x20, 0x3c, 0x2d, 0x20, 0x70, 0x69, 0x0a,
+   0x70, 0x72, 0x69, 0x6e, 0x74, 0x28, 0xcf, 0x80, 0x29, 0x0a
+ ))
> writeBin(raw, con="pi.R")
> source("pi.R", encoding="UTF-8")
[1] 3.141593


 Duncan Murdoch


 I believe this should be safe, because R code files should be in ASCII
 [http://en.wikipedia.org/wiki/ASCII] and only in source-code comments
 you may use other characters.  This is from Section 'Package
 subdirectories' in 'Writing R Extensions':

 Only ASCII characters (and the control characters tab, formfeed, LF
 and CR) should be used in code files. Other characters are accepted in
 comments, but then the comments may not be readable in e.g. a UTF-8
 locale. Non-ASCII characters in object names will normally fail when
 the package is installed. Any byte will be allowed in a quoted
 character string but \u escapes should be used for non-ASCII
 characters. However, non-ASCII character strings may not be usable in
 some locales and may display incorrectly in others.

 Since comments are dropped by parse(), their actual content does not
 matter, and the rest of the code should be in ASCII.

 (*) It could be that the specific encoding ascii is not cross
 platforms. If so, is there another way to specify a pure ASCII
 encoding?



 BACKGROUND:
 If a user/system sets the 'encoding' option at startup, it may break
 package installations from source if the package has source code
 comments with non-ASCII characters.  For example,

 $ mkdir foo; cd foo
 $ echo "options(encoding='UTF-8')" > .Rprofile
 $ R --vanilla
 > install.packages("R.oo", type="source")
 Installing package into 'C:/Users/hb/R/win-library/3.2'
 (as 'lib' is unspecified)
 --- Please select a CRAN mirror for use in this session ---
 trying URL 'http://cran.at.r-project.org/src/contrib/R.oo_1.18.0.tar.gz'
 Content type 'application/x-gzip' length 394545 bytes (385 KB)
 opened URL
 downloaded 385 KB

 * installing *source* package 'R.oo' ...
 ** package 'R.oo' successfully unpacked and MD5 sums checked
 ** R
 Warning in parse(outFile) :
invalid input found on input connection
 'C:/Users/hb/R/win-library/3.2/R.oo/R/
 R.oo'
 ** inst
 ** preparing package for lazy loading
 Warning in parse(n = -1, file = file, srcfile = NULL, keep.source = FALSE)
 :
invalid input found on input connection
 'C:/Users/hb/R/win-library/3.2/R.oo/R/
 R.oo'
 ** help
 [...]

 (This can be an extremely time consuming task to troubleshoot,
 particularly if reported to a package maintainer not having access to
 the original system).

 FYI, setting it only in the session is alright:

  > options(encoding="UTF-8")
  > install.packages("R.oo", type="source")

 because install.packages() launches

Re: [Rd] Benchmark code, but avoid printing

2015-01-02 Thread Henrik Bengtsson
On Fri, Jan 2, 2015 at 9:02 AM, Gábor Csárdi csardi.ga...@gmail.com wrote:
 Dear all,

 I am trying to benchmark code that occasionally prints on the screen
 and I want to
 suppress the printing. Is there an idiom for this?

 If I do

 sink(tempfile())
 microbenchmark(...)
 sink()

 then I'll be also measuring the costs of writing to tempfile. I could
 also sink to /dev/null, which is probably fast, but that is not
 portable.

Interesting problem.  On Windows, "NUL" corresponds to /dev/null, e.g.
con <- file("NUL", open="wb").  Not that it's cross platform, but it
at least allows you to cover one more OS.  Maybe R should have a
built-in null device.  An easier solution is probably to go back to
the maintainers of the functions outputting text and ask them for an
option to disable that.


 Is there a better solution? Is writing to a textConnection() better?

For large number of output *lines* (not characters), textConnection()
is exponentially slow (at least in R 3.1.0).  Use rawConnection()
instead, cf. http://www.jottr.org/2014/05/captureOutput.html
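
A minimal sketch of that rawConnection() approach (variable names are mine):

```r
# Capture printed output into an in-memory raw buffer instead of a file.
con <- rawConnection(raw(0L), open = "w")
sink(con)                  # divert standard output to the raw connection
print(1:3)
sink()                     # restore standard output
txt <- rawToChar(rawConnectionValue(con))
close(con)
cat(txt)
```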

/Henrik


 Thanks, Best,
 Gabor

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] WISH: eval() to preserve the visibility (now value is always visible)

2015-02-07 Thread Henrik Bengtsson
Would it be possible to have the value of eval() preserve the
visibility of the value of the expression?


PROBLEM:

# Invisible
> x <- 1

# Visible
> eval(x <- 2)
[1] 2


TROUBLESHOOTING:
> withVisible(x <- 1)
$value
[1] 1
$visible
[1] FALSE

> withVisible(eval(x <- 2))
$value
[1] 2
$visible
[1] TRUE


WORKAROUND:
eval2 <- function(expr, envir=parent.frame(), ...) {
  res <- eval(withVisible(expr), envir=envir, ...)
  value <- res$value
  if (res$visible) value else invisible(value)
}

> x <- 1
> eval(x <- 2)
[1] 2
> eval2(x <- 3)
> x
[1] 3

/Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] default min-v/nsize parameters

2015-01-20 Thread Henrik Bengtsson
Thanks for this.

Anyone know how I can find what those initial settings are from within
R?  Do I need to parse/look at both environment variables R_NSIZE and
R_VSIZE and then commandArgs()?
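
A sketch of such an inspection (an assumption on my part: this only reveals
values passed in via the environment or the command line, not any defaults
compiled into R itself):

```r
# The two user-visible sources of the initial GC sizes.
Sys.getenv(c("R_NSIZE", "R_VSIZE"), unset = NA)          # environment variables
grep("^--min-[nv]size=", commandArgs(), value = TRUE)    # command-line flags
```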

/Henrik

On Tue, Jan 20, 2015 at 1:42 AM, Martin Maechler
maech...@stat.math.ethz.ch wrote:
 Peter Haverty haverty.pe...@gene.com
 on Mon, 19 Jan 2015 08:50:08 -0800 writes:

  Hi All, This is a very important issue. It would be very
  sad to leave most users unaware of a free speedup of this
  size.  These options don't appear in the R --help
  output. They really should be added there.

 Indeed, I've found that myself and had added them there about
 24 hours ago.
 ((I think they were accidentally dropped a while ago))

  if the garbage collector is working very hard, might it
  emit a note about better setting for these variables?

  It's not really my place to comment on design philosophy,
  but if there is a configure option for small memory
  machines I would assume that would be sufficient for the
  folks that are not on fairly current hardware.

 There's quite a few more issues with this,
 notably how the growth *steps* are done.
 That has been somewhat experimental and for that reason is
 _currently_ quite configurable via R_GC_* environment variables,
 see the code in src/main/memory.c

 This is currently discussed privately within the R core.
 I'm somewhat confident that R 3.2.0 in April will have changes.

 And -- coming back to the beginning -- at least the R-devel version now 
 shows

 R --help | grep -e min-.size

   --min-nsize=N Set min number of fixed size obj's (cons cells) to N
   --min-vsize=N Set vector heap minimum to N bytes; '4M' = 4 MegaB

 --
 Martin Maechler, ETH Zurich

  On Sat, Jan 17, 2015 at 11:40 PM, Nathan Kurz n...@verse.com wrote:

  On Thu, Jan 15, 2015 at 3:55 PM, Michael Lawrence
  lawrence.mich...@gene.com wrote:
   Just wanted to start a discussion on whether R could ship with more
   appropriate GC parameters.
 
  I've been doing a number of similar measurements, and have come to the
  same conclusion.  R is currently very conservative about memory usage,
  and this leads to unnecessarily poor performance on certain problems.
  Changing the defaults to sizes that are more appropriate for modern
  machines can often produce a 2x speedup.
 
  On Sat, Jan 17, 2015 at 8:39 AM,  luke-tier...@uiowa.edu wrote:
   Martin Morgan discussed this a year or so ago and as I recall bumped
   up these values to the current defaults. I don't recall details about
   why we didn't go higher -- maybe Martin does.
 
  I just checked, and it doesn't seem that any of the relevant values
  have been increased in the last ten years.  Do you have a link to the
  discussion you recall so we can see why the changes weren't made?
 
   I suspect the main concern would be with small memory machines in
  student labs
   and less developed countries.
 
  While a reasonable concern, I'm doubtful there are many machines for
  which the current numbers are optimal.  The current minimum size
  increases for node and vector heaps are 40KB and 80KB respectively.
  This grows as the heap grows (min + .05 * heap), but still means that
  we do many more expensive garbage collections while growing than we
  need to.  Paradoxically, the SMALL_MEMORY compile option (which is
  suggested for computers with up to 32MB of RAM) has slightly larger
  values at 50KB and 100KB.
 
  I think we'd get significant benefit for most users by being less
  conservative about memory consumption.  The exact sizes should be
  discussed, but with RAM costing about $10/GB it doesn't seem
  unreasonable to assume most machines running R have multiple GB
  installed, and those that don't will quite likely be running an OS
  that needs a custom compiled binary anyway.
 
  I could be way off, but my suggestion might be a 10MB start with 1MB
  minimum increments for SMALL_MEMORY, 100MB start with 10MB increments
  for NORMAL_MEMORY, and 1GB start with 100MB increments for
  LARGE_MEMORY might be a reasonable spread.
 
  Or one could go even larger, noting that on most systems,
  overcommitted memory is not a problem until it is used.  Until we
  write to it, it doesn't actually use physical RAM, just virtual
  address space.  Or we could stay small, but make it possible to
  programmatically increase the granularity from within R.
 
  For ease of reference, here are the relevant sections of code:
 
  https://github.com/wch/r-source/blob/master/src/include/Defn.h#L217
  (ripley last authored on Jan 26, 2000 / pd last authored on May 8, 1999)
  217  #ifndef R_NSIZE
  218  #define R_NSIZE 350000L
  219  #endif
  220  #ifndef R_VSIZE
  221  #define 

Re: [Rd] :: and ::: as .Primitives?

2015-01-22 Thread Henrik Bengtsson
On Thu, Jan 22, 2015 at 11:44 AM,  luke-tier...@uiowa.edu wrote:
 I'm not convinced that how to make :: faster is the right question. If
 you are finding foo::bar being called often enough to matter to your
 overall performance then to me the question is: why are you calling
 foo::bar more than once? Making :: a bit faster by making it a
 primitive will remove some overhead, but your are still left with a
 lot of work that shouldn't need to happen more than once.

 For default methods there ought to be a way to create those so the
 default method is computed at creation or load time and stored in an
 environment. For other cases if I want to use foo::bar many times, say
 in a loop, I would do

 foo_bar <- foo::bar

 and use foo_bar, or something along those lines.
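
The hoisting described above can be sketched as follows (function and
variable names are mine, not from the thread):

```r
# Pay the '::' lookup cost once, outside the hot path.
has_dim_slow <- function(xs) {
  vapply(xs, function(x) !is.null(base::dim(x)), logical(1L))  # '::' per element
}
has_dim_fast <- function(xs) {
  dim_ <- base::dim                                  # one-time namespace lookup
  vapply(xs, function(x) !is.null(dim_(x)), logical(1L))
}
xs <- replicate(5L, matrix(0, 2, 2), simplify = FALSE)
stopifnot(identical(has_dim_slow(xs), has_dim_fast(xs)))
```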

While you're on the line: do you think this is an optimization that
the 'compiler' package and its cmpfun() byte compiler will be able to
do in the future?

/Henrik


 When :: and ::: were introduced they were intended primarily for
 reflection and debugging, so speed was not an issue. ::: is still
 really only reliably usable that way, and making it faster may just
 encourage bad practice. :: is different and there are good arguments
 for using it in code, but I'm not yet seeing good arguments for use in
 ways that would be performance-critical, but I'm happy to be convinced
 otherwise. If there is a need for a faster :: then going to a
 SPECIALSXP is fine; it would also be good to make the byte code
 compiler aware of it, and possibly to work on ways to improve the
 performance further e.g. through cacheing.

 Best,

 luke


 On Thu, 22 Jan 2015, Peter Haverty wrote:


 Hi all,

  When S4 methods are defined on a base function (say, match), the
  function becomes a method with the body base::match(x,y). A call to
 such a function often spends more time doing :: than in the function
 itself.  I always assumed that :: was a very low-level thing, but it
 turns out to be a plain old function defined in base/R/namespace.R.
 What would you all think about making :: and ::: .Primitives?  I
 have submitted some examples, timings, and a patch to the R bug
 tracker (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16134).
 I'd be very interested to hear your thoughts on the matter.

 Regards,
 Pete

 
 Peter M. Haverty, Ph.D.
 Genentech, Inc.
 phave...@gene.com

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


 --
 Luke Tierney
 Ralph E. Wareham Professor of Mathematical Sciences
 University of Iowa  Phone: 319-335-3386
 Department of Statistics and    Fax:   319-335-3017
Actuarial Science
 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
 Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu


 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Programming Tools CTV

2015-01-22 Thread Henrik Bengtsson
On Thu, Jan 22, 2015 at 7:20 AM, Max Kuhn mxk...@gmail.com wrote:
 I've had a lot of requests for additions to the reproducible research
 task view that fall into a grey area (to me at least).

 For example, roxygen2 is a tool that broadly enables reproducibility
 but I see it more as a tool for better programming. I'm about to check
 in a new version of the task view that includes packrat and
 checkpoint, as they seem closer to reproducible research, but also
 feel like coding tools.

 There are a few other packages that many would find useful for better
 coding: devtools, testthat, lintr, codetools, svTools, rbenchmark,
 pkgutils, etc.

 There might be some overlap with the HPC task view. I would think that
 rJava, Rcpp and the like are better suited there but this is arguable.

 The last time I proposed something like this, Martin deftly convinced
 me to be the maintainer. It is probably better for everyone if we
 avoid that on this occasion.

 * Does anyone else see the need for this?

 * What other packages fit into this bin?

 * Would anyone like to volunteer?

Thanks for your work on this.

May I suggest a Git/GitHub repository for this?  That lowers the
barriers for contributions substantially, e.g. via issues or, even
better, via pull requests (== point'n'click for you).  If you need
to mirror/push it to an SVN repository, I'm sure that's pretty easy to
do (and likely also to automate).

/Henrik

PS. Sorry, I'm not volunteering; too much on my plate.


 Thanks,

 Max


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] problem with update.packages() in R-Devel (3.2.0) on Windows

2015-01-27 Thread Henrik Bengtsson
It works again using:

% R --version
R Under development (unstable) (2015-01-26 r67627) -- Unsuffered
Consequences
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

/Henrik

On Mon, Jan 26, 2015 at 9:54 AM, John Fox j...@mcmaster.ca wrote:

 Dear all,

 I've noticed the following problem for the past several days:

  snip 
  update.packages(ask=FALSE)

 . . .

 trying URL 'http://cran.utstat.utoronto.ca/src/contrib/zoo_1.7-11.zip'
 Error in download.file(url, destfile, method, mode = "wb", ...) :
   cannot open URL
 'http://cran.utstat.utoronto.ca/src/contrib/zoo_1.7-11.zip'
 In addition: Warning message:
 In download.file(url, destfile, method, mode = "wb", ...) :
   cannot open: HTTP status was '404 Not Found'
 Warning in download.packages(pkgs, destdir = tmpd, available = available,
 :
   download of package 'zoo' failed

  snip 

 Apparently, the subdirectory for the version number (/3.2) is missing from
 the URL. OTOH, install.packages() works fine:

  snip 

  install.packages("zoo")
 trying URL
 'http://cran.utstat.utoronto.ca/bin/windows/contrib/3.2/zoo_1.7-11.zip'
 Content type 'application/zip' length 878614 bytes (858 KB)
 opened URL
 downloaded 858 KB

 package 'zoo' successfully unpacked and MD5 sums checked

 The downloaded binary packages are in
 C:\Users\John Fox\AppData\Local\Temp\RtmpuKqvB0\downloaded_packages

  snip 

 Session info:

  snip 

  sessionInfo()
 R Under development (unstable) (2015-01-25 r67615)
 Platform: x86_64-w64-mingw32/x64 (64-bit)
 Running under: Windows 7 x64 (build 7601) Service Pack 1

 locale:
 [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252
 [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
 [5] LC_TIME=English_Canada.1252

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 loaded via a namespace (and not attached):
 [1] tools_3.2.0

  snip 

 Best,
  John

 ---
 John Fox, Professor
 McMaster University
 Hamilton, Ontario, Canada
 http://socserv.mcmaster.ca/jfox/








__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Output to raw console rather than stdout/stderr?

2015-02-01 Thread Henrik Bengtsson
In R, there's readline(), which is great because you can prompt the user, e.g.

ans <- readline("Would you like to install Pandoc? [y/N]: ")

without having to worry that the message is intercepted by
capture.output(), sink() or similar (which are used by dynamic report
generators, among other things).  The message will always reach the
user.  (You can use sink(..., type="output") and sink(...,
type="message") to confirm this.)

Does anyone know of a similar function that outputs the message to
the raw console *without* pausing for user input?  Thus far I came
up with:

cmsg <- function(...) {
  if (.Platform$OS.type == "windows") {
    pager <- "console"
  } else {
    pager <- "cat"
  }

  ## Write output to a temporary file
  fh <- tempfile()
  on.exit(file.remove(fh))
  cat(..., file=fh)

  ## Display file such that it cannot be captured/intercepted by R.
  file.show(fh, pager=pager)
}

but if a less ad hoc approach exists, I'd like to hear about it.
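
A related trick that avoids the temporary file, sketched here as an assumption rather than an endorsement: on Unix-alikes one can write straight to the controlling terminal via /dev/tty, which sink()/capture.output() cannot intercept (there is no direct Windows equivalent, hence the fallback branch):

```
## Hypothetical alternative to cmsg(); /dev/tty bypasses R's stdout/stderr.
cmsg2 <- function(...) {
  if (.Platform$OS.type == "unix" && file.exists("/dev/tty")) {
    cat(..., file = "/dev/tty")
  } else {
    cat(...)  # fallback: this branch *can* be captured/sinked
  }
}
```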

Thank you,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Output to raw console rather than stdout/stderr?

2015-02-01 Thread Henrik Bengtsson
On Sun, Feb 1, 2015 at 1:39 PM, Jeroen Ooms jeroeno...@gmail.com wrote:
 Why do you need this? The sink system is often specifically needed to
 capture such messages and display them to the user, for example in an
 embedded environment. Many applications would not work when you bypass
 the stdout/stderr set by the system. For example tools like knitr or
 rapache need to capture stdout to get the output and insert it in a
 report or webpage.

It's mostly so I can send partial prompt messages to the user and at
the very end use readline() to query for a decision.   The strategy of
outputting messages along the way is used by install.packages() et
al., but
unfortunately it outputs to stdout, e.g.

 options(menu.graphics=FALSE)
 bfr <- capture.output(install.packages("R.methodsS3", repos="@CRAN@"))
Installing package into 'C:/Users/hb/R/win-library/3.2'
(as 'lib' is unspecified)
Selection: 1
Selection: 1
Selection: 1
trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.2/R.methodsS3_1.6.1.zip'
Content type 'application/zip' length 55873 bytes (54 KB)
opened URL
downloaded 54 KB

 str(bfr)
 chr [1:169] "--- Please select a CRAN mirror for use in this session ---" ...

Note how the user is prompted "Selection: " without any clue as to
what to answer.  It would of course be better if those messages had
been sent to stderr, but you can imagine that stderr is also captured
by report generators.  In that case, these prompt messages would be
hidden from the user.  These kinds of prompt messages are of no use to
the final report anyway.

To summarize: I would like a third stream, "prompt", dedicated to
messages for the interactive user, in addition to the current "output"
(stdout) and "message" (stderr) streams.  This "prompt" stream would
be to the terminal what the GUI prompts/dialogs are in windowed
environments.   I think of it as if readline() already sends its
prompt to this "prompt" stream (unless that is just a
bug/undocumented feature that I'm misinterpreting).
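
One can check interactively that readline() behaves as if such a stream existed; a quick sketch (my addition — run at a terminal, not in a batch session):

```
## With both stdout and stderr diverted, readline()'s prompt still
## reaches the terminal -- the hypothetical "prompt" stream.
con <- file(tempfile(), open = "wt")
sink(tempfile())             # divert stdout
sink(con, type = "message")  # divert stderr (requires a connection)
ans <- readline("Can you read this prompt? [y/N]: ")
sink(type = "message"); sink()
close(con)
```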


 If you really want to, perhaps you could use something like
 system("echo hello") to send a message to stdout via another process.

Yes, that's the spirit of my cmsg() below.

Hope this makes more sense now

Henrik



 On Sun, Feb 1, 2015 at 11:13 AM, Henrik Bengtsson h...@biostat.ucsf.edu 
 wrote:
 In R, there's readline(), which is great because you can prompt the user, 
 e.g.

 ans <- readline("Would you like to install Pandoc? [y/N]: ")

 without having to worry that the message is intercepted by
 capture.output(), sink() or similar (which are used by dynamic report
 generators, among other things).  The message will always reach the
 user.  (You can use sink(..., type="output") and sink(...,
 type="message") to confirm this.)

 Does anyone know of a similar function that outputs the message to
 the raw console *without* pausing for user input?  Thus far I came
 up with:

 cmsg <- function(...) {
   if (.Platform$OS.type == "windows") {
     pager <- "console"
   } else {
     pager <- "cat"
   }

   ## Write output to a temporary file
   fh <- tempfile()
   on.exit(file.remove(fh))
   cat(..., file=fh)

   ## Display file such that it cannot be captured/intercepted by R.
   file.show(fh, pager=pager)
 }

 but if a less ad hoc approach exists, I'd like to hear about it.

 Thank you,

 Henrik


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Inspect a delayed assigned whose value throws an error?

2015-01-26 Thread Henrik Bengtsson
On Mon, Jan 26, 2015 at 12:24 PM, Hadley Wickham h.wick...@gmail.com wrote:
 If it was any other environment than the global, you could use substitute:

 e <- new.env()
 delayedAssign("foo", stop("Hey!"), assign.env = e)
 substitute(foo, e)

 delayedAssign("foo", stop("Hey!"))
 substitute(foo)

Hmm... interesting and odd.

Unfortunately, this doesn't seem to help for reaching into the
namespace of hgu133a.db and inspecting 'hgu133aPFAM', e.g.

 library(hgu133a.db)

 substitute(hgu133aPFAM, env=ns)
Error: hgu133aPFAM is defunct. Please use select() if you need access to PFAM
  or PROSITE accessions.

 evalq(substitute(hgu133aPFAM), envir=ns)
Error: hgu133aPFAM is defunct. Please use select() if you need access to PFAM
  or PROSITE accessions.

 evalq(substitute(hgu133aPFAM, env=ns), envir=ns)
Error: hgu133aPFAM is defunct. Please use select() if you need access to PFAM
  or PROSITE accessions.

Thanks,

Henrik



 Hadley

 On Mon, Jan 26, 2015 at 12:53 PM, Henrik Bengtsson h...@biostat.ucsf.edu 
 wrote:
 Hi, I got an interesting programming challenge:

 How do you inspect an object which is assigned via delayedAssign() and
 that throws an error as soon as it is touched (=the value is
 evaluated)?  Is it possible?


 MINIMAL EXAMPLE:

 $ R --vanilla
 delayedAssign("foo", stop("Hey!"))

 (If you find this minimal example silly/obvious, please skip down to
 the real example at the end)

 foo
 Error: Hey!

 str(foo)
 Error in str(foo) : Hey!
 In addition: Warning message:
 In str(foo) : restarting interrupted promise evaluation

 mode(foo)
 Error in mode(foo) : Hey!
 In addition: Warning message:
 In mode(foo) : restarting interrupted promise evaluation

 .Internal(inspect(foo))
 Error: Hey!
 In addition: Warning message:
 restarting interrupted promise evaluation

 traceback()
 1: stop("Hey!")

 Is there any way I can inspect this object using the R API without
 evaluating the value in the delayed assignment?  Is it possible to
 test whether this is a delayed assignment or not?


 BACKGROUND:
 The background is that I have a function in the R.oo package
 that scans namespaces for functions with a certain class attribute.
 For this I use is.function() and inherits() to inspect each object.
 An aroma.affymetrix user reported a problem that boiled down to the
 following:

 # source("http://bioconductor.org/biocLite.R"); biocLite("hgu133a.db")
 library(hgu133a.db)
 is.function(hgu133aPFAM)
 Error: hgu133aPFAM is defunct. Please use select() if you need access to PFAM
   or PROSITE accessions.
 .Internal(inspect(hgu133aPFAM))

 traceback()
 3: stop(paste(msg, collapse = ""), call. = FALSE, domain = NA)
 2: .Defunct(msg = msg)
 1: (function ()
{
    if (grepl("PFAM", x)) {
        bimapName <- paste0(prefix, "PFAM")
    }
    else {
        bimapName <- paste0(prefix, "PROSITE")
    }
    x <- dc[[bimapName]]
    msg = wmsg(paste0(bimapName, " is defunct. ", "Please use select() if you
        need access to PFAM or PROSITE accessions. \n"))
    if (interactive()) {
        .Defunct(msg = msg)
    }
})()

 My immediate solution is to perform those tests using tryCatch(), but
 this is interesting, because this function is such that the error is
 only thrown in interactive() sessions, i.e. the following works:

 $ Rscript -e "hgu133a.db::hgu133aPFAM"
 [...]
 NULL
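
The tryCatch() guard mentioned above might look as follows (a sketch with a hypothetical helper name; not the actual R.oo code):

```
## Treat "merely touching the binding throws" as "not a function":
## get() forces the promise, and any error is mapped to FALSE.
isFunctionSafe <- function(name, envir) {
  tryCatch(is.function(get(name, envir = envir)),
           error = function(e) FALSE)
}
```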

 This is probably also why none of my aroma.affymetrix system tests
 caught this.  Without having traced through the underlying source
 code, which seems quite nested, the above is why I believe the
 assignment is delayed: traceback() shows the body's source code, and
 the object evaluates to different things depending on interactive().

 /Henrik




 --
 http://had.co.nz/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Shouldn't grDevices::quartz() give an error instead of a warning when not available?

2015-01-10 Thread Henrik Bengtsson
Compare:

 quartz(); cat("Should this have generated an error instead?\n")
Warning message:
In quartz() : Quartz device is not available on this platform
Should this have generated an error instead?

to:

 x11()
Error in .External2(C_X11, d$display, d$width, d$height, d$pointsize,  :
  unable to start device X11cairo
In addition: Warning message:
In x11() : unable to open connection to X11 display ''

Wouldn't it make sense that a failed call to quartz() would give an
error instead?

 sessionInfo()
R Under development (unstable) (2015-01-09 r67397)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

/Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] WISH: eval() to preserve the visibility (now value is always visible)

2015-02-10 Thread Henrik Bengtsson
On Sun, Feb 8, 2015 at 8:44 PM, Suharto Anggono Suharto Anggono via
R-devel r-devel@r-project.org wrote:
 Sorry to intervene.

No, I'm very happy you intervened.  Your comment is 100%
valid/correct, making my wish moot.

Your explanation is very clear and nails it; one should use
eval(substitute(expr)) or evalq(expr) for what I'm trying to do.

It all came from me trying to prevent

 withOptions({x <- 1}, foo=1)

from printing the value, where (somewhat simplified):

withOptions <- function(expr, ..., envir=parent.frame()) {
  oopts <- options(...)
  on.exit(options(oopts))
  eval(expr, envir=envir)
}
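
The fix can be sketched like this (my reconstruction, assuming the missing substitute() step Henrik describes in the next paragraph; not his actual code):

```
## Capturing the unevaluated expression lets eval() preserve the
## expression's own visibility, so an assignment stays invisible.
withOptions2 <- function(expr, ..., envir = parent.frame()) {
  oopts <- options(...)
  on.exit(options(oopts))
  eval(substitute(expr), envir = envir)
}
## withOptions2({x <- 1}, digits = 3)  # prints nothing, as desired
```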

I have a few of these withNnn() functions, but for this particular one
(*) I had forgotten an expr <- substitute(expr) in there, which caused
me to incorrectly blame eval().  <recursive mistake>This is very much
the same problem as you observed with my eval2() example.</recursive
mistake>

Thank you very much

Henrik
(*) Actually withSeeds(), which is too messy to use as an example.


 Argument passed to 'eval' is evaluated first.
 So,
 eval(x <- 2)
 is effectively like
 { x <- 2; eval(2) } ,
 which is effectively
 { x <- 2; 2 } .
 The result is visible.

 eval(expression(x <- 2))
 or
 eval(quote(x <- 2))
 or
 evalq(x <- 2)
 gives the same effect as
 x <- 2 .
 The result is invisible.

 In function 'eval2',
 res <- eval(withVisible(expr), envir=envir, ...)
 is effectively
 res <- withVisible(expr) .

 ---

 Would it be possible to have the value of eval() preserve the
 visibility of the value of the expression?


 PROBLEM:

 # Invisible
 x <- 1

 # Visible
 eval(x <- 2)
 [1] 2

 TROUBLESHOOTING:
 withVisible(x <- 1)
 $value
 [1] 1
 $visible
 [1] FALSE

 withVisible(eval(x <- 2))
 $value
 [1] 2
 $visible
 [1] TRUE


 WORKAROUND:
 eval2 <- function(expr, envir=parent.frame(), ...) {
   res <- eval(withVisible(expr), envir=envir, ...)
   value <- res$value
   if (res$visible) value else invisible(value)
 }

 x <- 1
 eval(x <- 2)
 [1] 2
 eval2(x <- 3)
 x
 [1] 3

 /Henrik


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Performance issue in stats:::weighted.mean.default method

2015-03-05 Thread Henrik Bengtsson
See weightedMean() in the matrixStats package.  It's optimized for
data type, speed and memory and implemented in native code so it can
avoid some of these intermediate copies.  It's a few times faster than
weighted.mean[.default]();

library(matrixStats)
library(microbenchmark)
n <- 5000
x <- sample(500, n, replace=TRUE)
w <- sample(1000, n, replace=TRUE)/1000 *
  ifelse((sample(10, n, replace=TRUE) - 1) > 0, 1, 0)
fun.new <- function(x,w) {sum(x*w)/sum(w)}
fun.orig  <- function(x,w) {sum((x*w)[w != 0])/sum(w)}
stats <- microbenchmark(
  weightedMean(x,w),
  weighted.mean(x,w),
  ORIGFN = fun.orig(x,w),
  NEWFN  = fun.new(x,w),
  times = 1000
)

 print(stats, digits=3)
Unit: microseconds
expr   minlq  mean medianuqmax neval
  weightedMean(x, w)  28.7  31.7  33.4   32.9  33.8   81.7  1000
 weighted.mean(x, w) 129.6 141.6 149.6  143.7 147.1 2332.9  1000
  ORIGFN 205.7 222.0 235.0  225.4 231.4 2655.8  1000
   NEWFN  38.9  42.3  44.3   42.8  43.6  385.8  1000

Relative performance will vary with n = length(x).

The weightedMean() function handles zero-weight Inf values:

 w <- c(0, 1)
 x <- c(Inf, 1)
 weighted.mean(x, w)
[1] 1
 fun.new(x, w)
[1] NaN
 weightedMean(x,w)
[1] 1

You'll find more benchmark results on weightedMean() vs
weighted.mean() on
https://github.com/HenrikBengtsson/matrixStats/wiki/weightedMean

/Henrik

On Thu, Mar 5, 2015 at 9:49 AM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
 On 05/03/2015 14:55, Tadeáš Palusga wrote:

 Hi,
I'm using this mailing list for the first time and I hope this is the
 right one. I don't think that the following is a bug but it can be a
 performance issue.

 In my opinion, there is no need to filter by [w != 0] in the last sum
 of the weighted.mean.default method defined in
 src/library/stats/R/weighted.mean.R. There is no need to do it because
 you can always sum zero numbers, and the filtering is too expensive
 (see the following benchmark snippet)


 But 0*x is not necessarily 0, so there is a need to do it ... see

 w <- c(0, 1)
 x <- c(Inf, 1)
 weighted.mean(x, w)
 [1] 1
 fun.new(x, w)
 [1] NaN
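
 The underlying reason is IEEE 754 arithmetic, in which 0 * Inf is
 NaN; a quick illustration (my addition, mirroring the example above):

```
0 * Inf                   # NaN under IEEE 754
w <- c(0, 1); x <- c(Inf, 1)
sum(x * w)                # NaN: the zero-weight Inf poisons the sum
sum((x * w)[w != 0])      # 1: the filtered form drops it
```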





 library(microbenchmark)
 x <- sample(500, 5000, replace=TRUE)
 w <- sample(1000, 5000, replace=TRUE)/1000 *
 ifelse((sample(10, 5000, replace=TRUE) - 1) > 0, 1, 0)
 fun.new <- function(x,w) {sum(x*w)/sum(w)}
 fun.orig  <- function(x,w) {sum((x*w)[w != 0])/sum(w)}
 print(microbenchmark(
ORIGFN = fun.orig(x,w),
NEWFN  = fun.new(x,w),
times = 1000))

 #results:
 #Unit: microseconds
 #   expr min   lq  mean  median  uq  max neval
 # ORIGFN 190.889 194.6590 210.08952 198.847 202.928 1779.789  1000
 #  NEWFN  20.857  21.7175  24.61149  22.080  22.594 1744.014  1000




 So my suggestion is to remove the w != 0 check




 Index: weighted.mean.R
 ===
 --- weighted.mean.R (revision 67941)
 +++ weighted.mean.R (working copy)
 @@ -29,7 +29,7 @@
   stop("'x' and 'w' must have the same length")
   w <- as.double(w) # avoid overflow in sum for integer weights.
   if (na.rm) { i <- !is.na(x); w <- w[i]; x <- x[i] }
 -sum((x*w)[w != 0])/sum(w) # --> NaN in empty case
 +sum(x*w)/sum(w) # --> NaN in empty case
   }

   ## see note for ?mean.Date


 I hope I'm not missing something - I really don't see the reason to
 have this filtering here.

 BR

 Tadeas 'donarus' Palusga




 --
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Emeritus Professor of Applied Statistics, University of Oxford
 1 South Parks Road, Oxford OX1 3TG, UK



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] static pdf vignette

2015-02-27 Thread Henrik Bengtsson
On Fri, Feb 27, 2015 at 4:05 AM, Kirill Müller
kirill.muel...@ivt.baug.ethz.ch wrote:
 Perhaps the R.rsp package by Henrik Bengtsson [1,2] is an option.


 Cheers

 Kirill


 [1] http://cran.r-project.org/web/packages/R.rsp/index.html
 [2] https://github.com/HenrikBengtsson/R.rsp

Yes, this use case is one of the rationales for providing the
R.rsp::asis vignette engine (and the R.rsp::tex one).  Just make sure
you try your best to provide the source in the *.tar.gz distribution,
which shouldn't be hard in this case since you're generating the PDF
from a (Sweave/knitr) vignette.  For instructions, see the R.rsp
vignette 'R packages: Static PDF and HTML vignettes'.

Also, if it's not already clear, users who install your package do
*not* have to install vignette engine packages (here R.rsp), i.e.
you're not adding any overhead for them; it's only when you as a
package developer run 'R CMD build' that the vignette engine machinery
is needed.

/Henrik
(author of R.rsp)



 On 27.02.2015 02:44, Wang, Zhu wrote:

 Dear all,

 In my package I have a computationally expensive Rnw file which can't pass R
 CMD check. Therefore I set eval=FALSE in the Rnw file. But I would like to
 have the pdf vignette generated by the Rnw file with eval=TRUE. It seems to
 me a static pdf vignette is an option.  Any suggestions on this?

 Thanks,

 Zhu Wang


 **Connecticut Children's Confidentiality Notice**

 This e-mail message, including any attachments, is for...{{dropped:6}}



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Title case in DESCRIPTION for package where a word is a function namei

2015-04-25 Thread Henrik Bengtsson
On Apr 25, 2015 05:07, Prof J C Nash (U30A) nas...@uottawa.ca wrote:

 How about allowing underscore? (I believe WRE is silent on this, and I
 have not tried submitting a package with underscore in the title.) As I
 pointed out in my OP, _optim()_ works. And we have the advantage that we
 can distinguish package from function.

Backticks also work (and also happen to be what Markdown uses for inline code):

 title <- "A Replacement and Extension of the `optim()` Function"
 title == tools::toTitleCase(title)
[1] TRUE

Henrik


 The purpose of consistent editing is surely to provide the affordances
 that save us from needing extra documentation, as per Donald Norman's
 excellent discussions in "The Design of Everyday Things" or "Turn
 Signals Are the Facial Expressions of Automobiles". Changing the name
 of a function
 in a case-sensitive computing language may not be a bug, but it is
 asking for trouble.

 JN

 On 15-04-25 07:57 AM, peter dalgaard wrote:
 
  On 25 Apr 2015, at 13:11 , Prof J C Nash (U30A) nas...@uottawa.ca wrote:
 
  Hendrik pointed out it was the parentheses that gave the complaint.
  Single quotes and no parentheses seem to satisfy R CMD check. Perhaps
  that needs to be in the WRE.
 
  Well, it is in ?toTitleCase:
 
   ...However, unknown
   technical terms will be capitalized unless they are single words
   enclosed in single quotes: names of packages and libraries should
   be quoted in titles.
 
  ..and it is the single word bit that gets you. AFAICT, the issue is that 
  it splits the text into words and then looks for words that begin and end 
  with a single quote. And parentheses count as word separators, so the 
  quotes of 'optim()' end up in two different words.
 
  It's one of those things that aren't easy to fix: Presumably you do want 
  capitalization within parentheses so we can't just not let them be 
  separators, and we can't just look for sets of single quotes with arbitrary 
  content because they get used inside ordinary text (e.g. the beginning of 
  this paragraph contains 's one of those things that aren'). So either we 
  need more heuristics, like only counting () as separators when preceded by 
  or preceding a space, or some sort of explicit escape mechanism, like 
  BibTeX's {foo}.
 
 
  However, I have for some time used the parentheses to distinguish
  functions from packages. optim() is a function, optimx a package.
  Is this something CRAN should be thinking about? I would argue greater
  benefit to users than title case.
 
  JN
 
 
  On 15-04-24 06:17 PM, Uwe Ligges wrote:
 
 
  On 24.04.2015 22:44, Ben Bolker wrote:
   Prof J C Nash (U30A) nashjc at uottawa.ca writes:
 
 
  I was preparing a fix for a minor glitch in my optimx package and R CMD
  check gave an error that the title was not in title case.
 
[snip] to make Gmane happy ...
 
  I have found
 
  A Replacement and Extension of the _optim()_ Function
 
  does not get the complaint, but I'm not sure the underscore is allowed.
 
  Given that I've obeyed the RTFM rule, I'm wondering what to do now.
 
Presumably you should ask the CRAN maintainers?  That seems to
  be the only possible answer -- I don't think anyone else can guess
  very accurately ...
 
  From WRE:
 
  Refer to other packages and external software in single quotes, and to
  book titles (and similar) in double quotes.
 
  Other non-English usage (as documented for the Description field; this
   includes function names) can also be used in single quotes.
 
  Best,
  Uwe Ligges
 
 
 
Ben Bolker
 
 
 
 

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Shouldn't vector indexing with negative out-of-range index give an error?

2015-05-04 Thread Henrik Bengtsson
In Section 'Indexing by vectors' of 'R Language Definition'
(http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Indexing-by-vectors)
it says:

Integer. All elements of i must have the same sign. If they are
positive, the elements of x with those index numbers are selected. If
i contains negative elements, all elements except those indicated are
selected.

If i is positive and exceeds length(x) then the corresponding
selection is NA. A negative out of bounds value for i causes an error.

A special case is the zero index, which has null effects: x[0] is an
empty vector and otherwise including zeros among positive or negative
indices has the same effect as if they were omitted.

However, the claim that "A negative out of bounds value for i causes
an error" in the second paragraph does not seem to apply.  Instead, R
silently ignores negative indices that are out of range.  For example:

 x <- 1:4
 x[-9L]
[1] 1 2 3 4
 x[-c(1:9)]
integer(0)
 x[-c(3:9)]
[1] 1 2

 y <- as.list(1:4)
 y[-c(1:9)]
list()

Is the observed non-error the correct behavior, and therefore the
documentation is incorrect, or is it vice versa?  (...or am I
missing something?)

I get the above on R devel, R 3.2.0, and as far back as R 2.11.0
(I haven't checked earlier versions).

Thank you,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] capabilities(X11): Force refresh from within R? (... and minor documentation issue)

2015-05-06 Thread Henrik Bengtsson
Is there a way to refresh capabilities("X11") without restarting R
such that it reflects the enabling/disabling of X11?


BACKGROUND:

If I launch R with the X11 server disabled (e.g. ssh -X / ssh -Y to a
remote Linux host, having forgotten to enable Xming on the local
Windows machine), then I get:

 capabilities("X11")
  X11
FALSE

 x11()
Error in .External2(C_X11, d$display, d$width, d$height, d$pointsize,  :
  unable to start device X11cairo
In addition: Warning message:
In x11() : unable to open connection to X11 display ''

So far so good.  However, if I then enable the X11 server (e.g. start
Xming on Windows), I still get:

 capabilities("X11")
  X11
FALSE

but

 x11()

successfully opens an X11 plot window.  In other words, the value of
capabilities("X11") does not reflect the availability of X11; from
?capabilities:

 X11: Are the 'X11' graphics device and the X11-based data editor
  available?  This loads the X11 module if not already loaded,
  and checks that the default display can be contacted unless a
  'X11' device has already been used.

Not sure what that last "...unless a 'X11' device has already been
used" actually means here; is it a disclaimer for the above behavior?

If I restart R, then I get:

 capabilities("X11")
  X11
TRUE

I came up with the following approach that launches another R session
querying the availability of X11:

capabilitiesX11 <- function() {
  bin <- file.path(R.home("bin"), "Rscript")
  cmd <- "cat(capabilities('X11'))"
  value <- system2(bin, args=c("-e", dQuote(cmd)), stdout=TRUE)
  as.logical(value)
}

 capabilitiesX11()
[1] TRUE

but it certainly feels like a hack.

Is there a way to force a refresh of capabilities("X11") without restarting R?

BTW, the description of ?capabilities says: "Report on the optional
features which have been compiled into this build of R."  The
"compiled into this build" part seems too specific; the above shows
that capabilities() also reflects run-time availability.  Other
properties do this as well.


Thanks,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Shouldn't vector indexing with negative out-of-range index give an error?

2015-05-06 Thread Henrik Bengtsson
On Wed, May 6, 2015 at 1:33 AM, Martin Maechler
maech...@lynne.stat.math.ethz.ch wrote:
 John Chambers j...@stat.stanford.edu
 on Tue, 5 May 2015 08:39:46 -0700 writes:

  When someone suggests that we might have had a reason for some 
 peculiarity in the original S, my usual reaction is Or else we never thought 
 of the problem.
  In this case, however, there is a relevant statement in the 1988 blue 
 book.  In the discussion of subscripting (p 358) the definition for negative 
 i says: the indices consist of the elements of seq(along=x) that do not 
 match any elements in -i.

  Suggesting that no bounds checking on -i takes place.

  John

 Indeed!
 Thanks a lot John, for the perspective and clarification!

 I'm committing a patch to the documentation now.

Thank you both, and credit also to Dongcan Jiang for pointing out to
me that errors were indeed not generated in this case.

I agree with the decision. It's interesting to notice that now the
only way an error is generated is when index-vector subsetting is done
using mixed positive and negative indices, e.g. x[c(-1,1)].
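
For completeness, the mixed-sign case signals an error along these lines (wording from recent R versions; it may differ across releases):

```
x <- 1:4
x[c(-1, 1)]
## Error in x[c(-1, 1)] : only 0's may be mixed with negative subscripts
```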

/Henrik

 Martin


  On May 5, 2015, at 7:01 AM, Martin Maechler 
 maech...@lynne.stat.math.ethz.ch wrote:

  Henrik Bengtsson henrik.bengts...@ucsf.edu
  on Mon, 4 May 2015 12:20:44 -0700 writes:
 
  In Section 'Indexing by vectors' of 'R Language Definition'
  
 (http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Indexing-by-vectors)
  it says:
 
  Integer. All elements of i must have the same sign. If they are
  positive, the elements of x with those index numbers are selected. If
  i contains negative elements, all elements except those indicated are
  selected.
 
  If i is positive and exceeds length(x) then the corresponding
  selection is NA. A negative out of bounds value for i causes an error.
 
  A special case is the zero index, which has null effects: x[0] is an
  empty vector and otherwise including zeros among positive or negative
  indices has the same effect as if they were omitted.
 
   However, the claim that "A negative out of bounds value for i causes an
   error" in the second paragraph does not seem to apply.  Instead, R silently
   ignores negative indices that are out of range.  For example:
 
   x <- 1:4
  x[-9L]
  [1] 1 2 3 4
  x[-c(1:9)]
  integer(0)
  x[-c(3:9)]
  [1] 1 2
 
   y <- as.list(1:4)
  y[-c(1:9)]
  list()
 
  Is the observed non-error the correct behavior and therefore the
   documentation is incorrect, or is it vice versa?  (...or am I
   missing something?)
 
  I get the above on R devel, R 3.2.0, and as far back as R 2.11.0
  (haven't check earlier versions).
 
  Thank you, Henrik!
 
  I've checked further back: The change happened between R 2.5.1 and R 
 2.6.0.
 
  The previous behavior was
 
  (1:3)[-(3:5)]
  Error: subscript out of bounds
 
  If you start reading NEWS.2, you see a *lot* of new features
  (and bug fixes) in the 2.6.0 news, but from my browsing, none of
  them mentioned the new behavior as feature.
 
  Let's -- for a moment -- declare it a bug in the code, i.e., not
  in the documentation:
 
  - As 2.6.0  happened quite a while ago (Oct. 2007),
  we could wonder how much R code will break if we fix the bug.
 
  - Is the R package authors' community willing to do the necessary
  cleanup in their packages ?
 
             
 
 
  Now, after reading the source code for a while, and looking at
  the changes, I've found the log entry
 
  
 
  r42123 | ihaka | 2007-07-05 02:00:05 +0200 (Thu, 05 Jul 2007) | 4 lines
 
  Changed the behaviour of out-of-bounds negative
  subscripts to match that of S.  Such values are
  now ignored rather than tripping an error.
 
  
 
 
  So, it was changed on purpose, by one of the true Rs, very
  much on purpose.
 
  Making it a *warning* instead of the original error
  may have been both more cautious and more helpful for
  detecting programming errors.
 
  OTOH, John Chambers, the father of S and hence grandfather of R,
  may have had good reasons why it seemed more logical to silently
  ignore such out of bound negative indices:
  One could argue that
 
  x[-5]  means  "leave away the 5-th element of x"
 
  and if there is no 5-th element of x, leaving it away should be a 
 no-op.
 
  After all this musing and history detection, my gut decision
  would be to only change the documentation which Ross forgot to change.
 
  But of course, it may be interesting
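
A small wrapper can illustrate what restoring the documented (pre-2.6.0) error would mean; this is an editor's sketch (the name `strict_neg` is made up), not anything in base R:

```r
# Hypothetical strict subsetter: signals the documented error for
# out-of-bounds negative indices instead of silently ignoring them.
strict_neg <- function(x, i) {
  stopifnot(all(i < 0))               # negative-only indexing
  if (any(-i > length(x)))
    stop("subscript out of bounds")   # pre-2.6.0 behaviour
  x[i]
}

strict_neg(1:4, -2L)       # [1] 1 3 4
## strict_neg(1:3, -(3:5)) # would stop with "subscript out of bounds"
```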

[Rd] PATCH: library(..., quietly=TRUE) still outputs "Loading required package: ..." (forgot to pass down 'quietly')

2015-05-09 Thread Henrik Bengtsson
Calling library(..., quietly=TRUE) may still output:

   Loading required package: <other pkg>

in some cases, e.g.

> library(R.utils, quietly=TRUE)
Loading required package: R.methodsS3
[...]

I traced this to base:::.getRequiredPackages2(), which forgets to pass
'quietly' to an internal library() call:

if (!attached) {
    if (!quietly)
        packageStartupMessage(gettextf("Loading required package: %s",
                              pkg), domain = NA)
    library(pkg, character.only = TRUE, logical.return = TRUE,
            lib.loc = lib.loc) ||
        stop(gettextf("package %s could not be loaded",
             sQuote(pkg)), call. = FALSE, domain = NA)
}

It's from that library() call the message is generated.


Here's a patch:

$ svn diff src\library\base\R\library.R
Index: src/library/base/R/library.R
===
--- src/library/base/R/library.R(revision 68345)
+++ src/library/base/R/library.R(working copy)
@@ -871,7 +871,7 @@
 packageStartupMessage(gettextf("Loading required package: %s",
pkg), domain = NA)
 library(pkg, character.only = TRUE, logical.return = TRUE,
-lib.loc = lib.loc) ||
+lib.loc = lib.loc, quietly = quietly) ||
 stop(gettextf("package %s could not be loaded", sQuote(pkg)),
  call. = FALSE, domain = NA)
 }

I can submit it via http://bugs.r-project.org/ if preferred.
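
Until such a patch is applied, a user-side workaround using only standard base R functions is to wrap the call:

```r
# Silences startup messages both from the package itself and from the
# dependencies it pulls in, regardless of whether 'quietly' is passed down.
suppressPackageStartupMessages(
  library(R.utils, quietly = TRUE)
)
```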


Thanks,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R Language Definition: Subsetting matrices with negative indices is *not* an error

2015-05-09 Thread Henrik Bengtsson
On Sat, May 9, 2015 at 12:55 AM, peter dalgaard pda...@gmail.com wrote:

 On 09 May 2015, at 02:53 , Henrik Bengtsson henrik.bengts...@ucsf.edu 
 wrote:

 Hi,

 I spotted what looks like another(*) mistake in 'R Language
 Definition' on how subsetting should work.  In Section 'Indexing
 matrices and arrays'
 [http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Indexing-matrices-and-arrays]
 one can read

   Negative indices are not allowed in indexing matrices.

 Parse error: I believe that this is intended to mean

 "Indexing matrices may not contain negative indices"

 not

 "You cannot use negative indices when indexing matrices."

 This is consistent with the help page:

 
  A third form of indexing is via a numeric matrix with the one
  column for each dimension: each row of the index matrix then
  selects a single element of the array, and the result is a vector.
  Negative indices are not allowed in the index matrix.
 

 Rephrasing would seem to be in order

Ah... definitely a parse error (I read it as a new paragraph).  I
second rephrasing this; your "Indexing matrices may not contain
negative indices" is unambiguous.

Thanks Peter

/Henrik


 -pd


 but this is not true, e.g.

 > x <- matrix(1:12, nrow=4)
 > x
      [,1] [,2] [,3]
 [1,]    1    5    9
 [2,]    2    6   10
 [3,]    3    7   11
 [4,]    4    8   12

 > x[c(-2,-4),]
      [,1] [,2] [,3]
 [1,]    1    5    9
 [2,]    3    7   11

 /Henrik

 (*) https://stat.ethz.ch/pipermail/r-devel/2015-May/071091.html [docs
 have been fixed]


 --
 Peter Dalgaard, Professor,
 Center for Statistics, Copenhagen Business School
 Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 Phone: (+45)38153501
 Email: pd@cbs.dk  Priv: pda...@gmail.com











[Rd] R Language Definition: Subsetting matrices with negative indices is *not* an error

2015-05-08 Thread Henrik Bengtsson
Hi,

I spotted what looks like another(*) mistake in 'R Language
Definition' on how subsetting should work.  In Section 'Indexing
matrices and arrays'
[http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Indexing-matrices-and-arrays]
one can read

   Negative indices are not allowed in indexing matrices.

but this is not true, e.g.

 > x <- matrix(1:12, nrow=4)
 > x
      [,1] [,2] [,3]
 [1,]    1    5    9
 [2,]    2    6   10
 [3,]    3    7   11
 [4,]    4    8   12

 > x[c(-2,-4),]
      [,1] [,2] [,3]
 [1,]    1    5    9
 [2,]    3    7   11

/Henrik

(*) https://stat.ethz.ch/pipermail/r-devel/2015-May/071091.html [docs
have been fixed]
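
For contrast, the help-page sentence discussed in the follow-ups concerns matrix-form indexing (a numeric matrix of (row, col) pairs), and there negative values really are rejected; a quick sketch:

```r
x <- matrix(1:12, nrow = 4)
i <- cbind(c(1, 3), c(2, 3))  # one (row, col) pair per row
x[i]                          # [1]  5 11 -- one element per row of i
## x[cbind(-1, 2)]            # errors: negative indices are not allowed here
```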



[Rd] WISH: A more informative abort message than "aborting ..."

2015-05-12 Thread Henrik Bengtsson
When R aborts (core dumps), it outputs:

  "aborting ..."

This message is rather generic and can be hard to track back to R
itself, i.e. it is not always clear whether it is R itself that
aborted or some other piece of code that caused the abort/core dump
and outputted that message.

May I suggest expanding the message to clarify that it is R that
aborts, and to make it explicit that this is the very last message
generated by R, e.g.

  "An exception occurred that R could not recover from. The R session
is now aborting ..."

The code that needs to be updated is in
https://svn.r-project.org/R/trunk/src/main/main.c.  Here's a patch for
the above suggestion:

$ svn diff src/main/main.c
Index: src/main/main.c
===
--- src/main/main.c (revision 68355)
+++ src/main/main.c (working copy)
@@ -594,7 +594,7 @@
}
}
 }
-REprintf("aborting ...\n");
+REprintf("An exception occurred that R could not recover from."
	 " The R session is now aborting ...\n");
 R_CleanTempDir();
 /* now do normal behaviour, e.g. core dump */
 signal(signum, SIG_DFL);

FYI, after signal(), raise(signum) is called and I think that's it from R.

Thanks,

Henrik



Re: [Rd] Alternative for wildcard gnu extension in Makevars

2015-05-16 Thread Henrik Bengtsson
On Fri, May 15, 2015 at 8:01 AM, Simon Urbanek
simon.urba...@r-project.org wrote:
 On May 13, 2015, at 2:28 PM, Henrik Bengtsson henrik.bengts...@ucsf.edu 
 wrote:

 While at it:  'Makevars' is an R invention (i.e. documentation of it
 is only available through the R docs), correct?  /Henrik


 Well, it's just a Makefile fragment that gets included along with the rest of 
 the Makefiles, so for all practical purposes it's just a Makefile which 
 implicitly includes R's makefile on top so you don't have to do that by hand.

Thanks for confirming.

/Henrik


 Cheers,
 Simon



 On Wed, May 13, 2015 at 10:10 AM, Kevin Ushey kevinus...@gmail.com wrote:
 One other solution that's only a little crazy: you could have an R
 function within your package that generates the appropriate (portable)
 Makevars, and within the package `configure` script call that
 function. For example

R --vanilla --slave -e "source('R/makevars.R'); makevars()"

 And that 'makevars()' function could generate portable
 'Makevars(.win)' files for your package.

 Kevin

 On Wed, May 13, 2015 at 9:08 AM, Gábor Csárdi csardi.ga...@gmail.com 
 wrote:
 On Wed, May 13, 2015 at 12:05 PM, Jan van der Laan rh...@eoos.dds.nl
 wrote:
 [...]

 Too bad. Since it is only a handful of files, I will probably move them
 directly into the src directory and prefix them. It would have been nice 
 to
 have been able to keep them separate.


 If it is a couple of files, then you can also just list them in SOURCES (or
 even just OBJECTS, with a .o suffix), and leave them where they are.

 Gabor

 [...]

[[alternative HTML version deleted]]







Re: [Rd] Alternative for wildcard gnu extension in Makevars

2015-05-13 Thread Henrik Bengtsson
While at it:  'Makevars' is an R invention (i.e. documentation of it
is only available through the R docs), correct?  /Henrik

On Wed, May 13, 2015 at 10:10 AM, Kevin Ushey kevinus...@gmail.com wrote:
 One other solution that's only a little crazy: you could have an R
 function within your package that generates the appropriate (portable)
 Makevars, and within the package `configure` script call that
 function. For example

 R --vanilla --slave -e "source('R/makevars.R'); makevars()"

 And that 'makevars()' function could generate portable
 'Makevars(.win)' files for your package.

 Kevin

 On Wed, May 13, 2015 at 9:08 AM, Gábor Csárdi csardi.ga...@gmail.com wrote:
 On Wed, May 13, 2015 at 12:05 PM, Jan van der Laan rh...@eoos.dds.nl
 wrote:
 [...]

 Too bad. Since it is only a handful of files, I will probably move them
 directly into the src directory and prefix them. It would have been nice to
 have been able to keep them separate.


 If it is a couple of files, then you can also just list them in SOURCES (or
 even just OBJECTS, with a .o suffix), and leave them where they are.

 Gabor

 [...]






Re: [Rd] Creating a vignette which depends on a non-distributable file

2015-05-14 Thread Henrik Bengtsson
On May 14, 2015 15:04, January Weiner january.wei...@gmail.com wrote:

 Dear all,

 I am writing a vignette that requires a file which I am not allowed to
 distribute, but which the user can easily download manually. Moreover, it
 is not possible to download this file automatically from R: downloading
 requires a (free) registration that seems to work only through a browser.
 (I'm talking here about the MSigDB from the Broad Institute,
 http://www.broadinstitute.org/gsea/msigdb/index.jsp).

 In the vignette, I tell the user to download the file and then show how it
 can be parsed and used in R. Thus, I can compile the vignette only if this
 file is present in the vignettes/ directory of the package. However, it
 would then get included in the package -- which I am not allowed to do.

 What should I do?

 (1) finding an alternative to MSigDB is not a solution -- there simply is
 no alternative.
 (2) I could enter the code (and the results) in a verbatim environment
 instead of using Sweave. This has obvious drawbacks (for one thing, it
 would look inconsistent).
 (3) I could build vignette outside of the package and put it into the
 inst/doc directory. This also has obvious drawbacks.
 (4) Leaving this example out defies the purpose of my package.

 I am tending towards solution (2). What do you think?

Not clear how big of a static piece you're talking about, but maybe you
could set it up such that you use (2) as a fallback, i.e. have the vignette
include a static/pre-generated piece (which is clearly marked as such) only
if the external dependency is not available.
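
That fallback could be wired up roughly like this in a knitr-based vignette (a sketch only; the file name msigdb.xml and the chunk logic are assumptions, not the poster's actual setup):

```r
# In the vignette's setup chunk: detect the user-supplied,
# non-distributable file once.
has_msigdb <- file.exists("msigdb.xml")

# Live chunks then carry eval=has_msigdb, while a clearly marked
# "pre-generated output" chunk carries eval=!has_msigdb, so the
# vignette builds whether or not the file is present.
```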

Just a thought

Henrik


 Kind regards,

 j.



 --
  January Weiner --






[Rd] install.packages() / update.packages() sometimes outputs to stdout and sometimes to stderr [and menu() readline()]

2015-05-17 Thread Henrik Bengtsson
I've noticed that install.packages()
[https://svn.r-project.org/R/trunk/src/library/utils/R/packages.R] and
update.packages()
[https://svn.r-project.org/R/trunk/src/library/utils/R/packages2.R]
sometimes output to stdout and sometimes to stderr.

It looks like stdout is used (e.g. via cat()) when the message is part
of querying the user, e.g.

update.packages <- function(lib.loc = NULL, repos = getOption("repos"),
[...]
cat(old[k, "Package"], ":\n",
    "Version", old[k, "Installed"],
    "installed in", old[k, "LibPath"],
    if(checkBuilt) paste("built under R", old[k, "Built"]),
    "\n",
    "Version", old[k, "ReposVer"], "available at",
    simplifyRepos(old[k, "Repository"], type))
cat("\n")
answer <- substr(readline("Update (y/N/c)?  "), 1L, 1L)
if(answer == "c" | answer == "C") {
    cat("cancelled by user\n")
    return(invisible())
}

but it is not consistently so, because some are sent to stderr (e.g.
via message()), e.g.

install.packages <-
function(pkgs, lib, repos = getOption("repos"),
[...]
if(action == "interactive" && interactive()) {
    msg <-
        ngettext(sum(later & hasSrc),
                 "Do you want to install from sources the package which needs compilation?",
                 "Do you want to install from sources the packages which need compilation?")
    message(msg, domain = NA)
    res <- readline("y/n: ")
    if(res != "y") later <- later & !hasSrc
} else if (action == "never") {
    cat("  Binaries will be installed\n")
    later <- later & !hasSrc
}

Also, as one can see in the latter example, it is not only interactive
user queries for which stdout is used.  It's simply not consistent -
at least I cannot see a pattern.

If these are not intended behaviors, I'm happy to provide patches.
I'd prefer stderr for all user queries (see below) - but I can also
see how this is something that needs careful consideration and an
official design policy across the R base distribution.
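
The split between the two streams is easy to demonstrate with capture.output(), which by default captures only stdout:

```r
# cat() writes to stdout and is captured; message() writes to stderr
# and passes through to the console instead of into 'out'.
out <- capture.output({
  cat("via cat\n")
  message("via message")
})
out  # [1] "via cat"
```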


[Related to the above but could deserve its own thread (feel free to
move the below to its own thread)]

utils::menu(..., graphics=FALSE)
[https://svn.r-project.org/R/trunk/src/library/utils/R/menu.R] queries
the user via standard output, which becomes an issue when running
interactive report generators, which mostly captures stdout and makes
it part of the produced artifact.  Personally, I'd argue that querying
the user via stderr would be a better choice in more cases.

Also, it's a bit weird that base::readline(), which is used for the
actual prompting of the user, is sent neither to R's stdout nor
stderr, e.g.

> zz <- file("all.Rout", open = "wt")
> sink(zz); sink(zz, type = "message")
> ans <- menu(letters[1:3], title="Select one:", graphics=FALSE)
Selection: 1

> sink(type = "message"); sink()
> ans
[1] 1

> cat(readLines("all.Rout"), sep="\n")
Select one:

1: a
2: b
3: c

Note the only thing displayed to the user is the prompt "Selection: ",
which is generated by readline().  It does however output to the
system's stdout (verified on Linux and Windows), e.g.

$ Rscript -e "readline('Press ENTER: ')" > stdout.log
$ cat stdout.log
Press ENTER:
[1] ""

Compare this to how it works in, for instance, bash:

$ read -p "Press ENTER: " ans > stdout.log
Press ENTER:
$ read -p "Press ENTER: " ans 2> stderr.log
$ cat stderr.log
Press ENTER:

My preference would be that menu() and readline() and other messages
for querying the user would output to the same connection/stream.
Again, I'd favor stderr over stdout, but possibly even a third
alternative designed specifically for user queries, cf. my R-devel
post '[Rd] Output to raw console rather than stdout/stderr?' on
2015-02-01 (https://stat.ethz.ch/pipermail/r-devel/2015-February/070578.html).
Just like when using menu(..., graphics=TRUE), this would not clutter
up the output to stdout (or stderr).  But even without this third
alternative, I argue that stderr is better than stdout. I'm happy to
provide patches for this as well.


Thanks,

Henrik



Re: [Rd] The function cummax() seems to have a bug.

2015-05-17 Thread Henrik Bengtsson
Below is some further troubleshooting on this:

From code inspection, this bug happens only:

* for integer values
* when the first element is NA_integer_ and the second is not.


Examples:

# Numeric/doubles work as expected
> cummax(c(NA_real_, 0, 1, 2, 3))
[1] NA NA NA NA NA

# It does not occur when the first value is non-NA
> cummax(c(0L, NA_integer_, 1L, 2L, 3L))
[1]  0 NA NA NA NA

# When the first value is NA, it is not remembered
# (because the internal for loop starts with the 2nd element)
> cummax(c(NA_integer_, 0L, 1L, 2L, 3L))
[1] NA  0  1  2  3

The problem is not there for cummin():

> cummin(c(0L, NA_integer_, 1L, 2L, 3L))
[1]  0 NA NA NA NA
> cummin(c(NA_integer_, 0L, 1L, 2L, 3L))
[1] NA NA NA NA NA

but that is just pure luck, due to how NA_integer_ is
internally represented as the smallest possible 4-byte signed integer,
i.e.

LibExtern int R_NaInt;  /* NA_INTEGER:= INT_MIN currently */
#define NA_INTEGER  R_NaInt

Note the comment, which implies that code should not rely on the
assumption that NA_integer_ == NA_INTEGER == R_NaInt == INT_MIN; it
could equally well have been INT_MAX, in which case cummin() would
return the wrong result whereas cummax() wouldn't. So, cummin() makes
the same mistake as cummax(), where the for-loop skips the test for NA
of the first element, cf.
https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L145-L148

The simple solution is probably to do (cf. native icumsum):

[HB-X201]{hb}: svn diff src/main/cum.c
Index: src/main/cum.c
===
--- src/main/cum.c  (revision 68378)
+++ src/main/cum.c  (working copy)
@@ -130,7 +130,7 @@
 int *ix = INTEGER(x), *is = INTEGER(s);
 int max = ix[0];
 is[0] = max;
-for (R_xlen_t i = 1 ; i < xlength(x) ; i++) {
+for (R_xlen_t i = 0 ; i < xlength(x) ; i++) {
if(ix[i] == NA_INTEGER) break;
is[i] = max = (max > ix[i]) ? max : ix[i];
 }
@@ -142,7 +142,7 @@
 int *ix = INTEGER(x), *is = INTEGER(s);
 int min = ix[0];
 is[0] = min;
-for (R_xlen_t i = 1 ; i < xlength(x) ; i++ ) {
+for (R_xlen_t i = 0 ; i < xlength(x) ; i++ ) {
if(ix[i] == NA_INTEGER) break;
is[i] = min = (min < ix[i]) ? min : ix[i];
 }
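
For reference, the intended NA semantics of the patched loop can be spelled out in pure R (an editor's sketch mirroring the C code, not something from the R sources):

```r
# NA-propagating cumulative max for integer vectors: once the first NA
# is seen, every remaining element stays NA, including element 1.
cummax_int <- function(x) {
  out <- rep(NA_integer_, length(x))
  for (i in seq_along(x)) {
    if (is.na(x[i])) break   # first NA: the rest stays NA
    out[i] <- if (i == 1L) x[i] else max(out[i - 1L], x[i])
  }
  out
}

cummax_int(c(NA_integer_, 0L, 1L))  # [1] NA NA NA (what cummax() should give)
cummax_int(c(0L, NA_integer_, 1L))  # [1]  0 NA NA
```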

/Henrik

On Sun, May 17, 2015 at 4:13 AM, Dongcan Jiang dongcan.ji...@gmail.com wrote:
 Hi,

 The function cummax() seems to have a bug.

 > x <- c(NA, 0)
 > storage.mode(x) <- "integer"
 > cummax(x)
 [1] NA  0

 The correct result of this case should be NA NA. The mistake in
 [https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L130-L136] may be
 the reason.

 Best Regards,
 Dongcan

 --
 Dongcan Jiang
 Team of Search Engine  Web Mining
 School of Electronic Engineering  Computer Science
 Peking University, Beijing, 100871, P.R.China



Re: [Rd] Why is the diag function so slow (for extraction)?

2015-05-13 Thread Henrik Bengtsson
As kindly pointed out to me (oh my decaying gray matter), is.object()
is better suited for this test;

$ svn diff src/library/base/R/diag.R
Index: src/library/base/R/diag.R
===
--- src/library/base/R/diag.R   (revision 68345)
+++ src/library/base/R/diag.R   (working copy)
@@ -23,9 +23,11 @@
 stop("'nrow' or 'ncol' cannot be specified when 'x' is a matrix")

 if((m <- min(dim(x))) == 0L) return(vector(typeof(x), 0L))
+nms <- dimnames(x)
+nrow <- dim(x)[1L]
 ## NB: need double index to avoid overflows.
-y <- c(x)[1 + 0L:(m - 1L) * (dim(x)[1L] + 1)]
-nms <- dimnames(x)
+if (is.object(x)) x <- c(x)

/Henrik
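
Stripped of diag()'s other cases, the proposed condition amounts to this standalone sketch (the name diag_get is illustrative; base R's diag() handles more cases):

```r
# Extract the diagonal by direct linear indexing; only classed matrices
# take the c() path, which pays for method dispatch and a copy.
diag_get <- function(x) {
  m  <- min(dim(x))
  nr <- dim(x)[1L]
  if (is.object(x)) x <- c(x)
  x[1 + 0L:(m - 1L) * (nr + 1)]
}

diag_get(matrix(1:9, nrow = 3))  # [1] 1 5 9
```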

On Tue, May 12, 2015 at 8:24 PM, Henrik Bengtsson
henrik.bengts...@ucsf.edu wrote:
 Along Luke's lines, would(n't) it be enough to look for existence of
 attribute 'class' to decide whether to dispatch or not, i.e. if c() is
 needed or not?  Even without .subset(), there is a remarkable
 improvement.  I think it's worth conditioning the code on dispatch or
 not.  For example:

 [HB-X201]{hb}: svn diff diag.R
 Index: diag.R
 ===
 --- diag.R  (revision 68345)
 +++ diag.R  (working copy)
 @@ -23,9 +23,11 @@
  stop("'nrow' or 'ncol' cannot be specified when 'x' is a matrix")

  if((m <- min(dim(x))) == 0L) return(vector(typeof(x), 0L))
 +nms <- dimnames(x)
 +nrow <- dim(x)[1L]
  ## NB: need double index to avoid overflows.
 -y <- c(x)[1 + 0L:(m - 1L) * (dim(x)[1L] + 1)]
 -nms <- dimnames(x)
 +if (!is.null(attr(x, "class"))) x <- c(x)
 +y <- x[1 + 0L:(m - 1L) * (nrow + 1)]
  if (is.list(nms) && !any(sapply(nms, is.null)) &&
  identical((nm <- nms[[1L]][seq_len(m)]), nms[[2L]][seq_len(m)]))
  names(y) <- nm

 ?

 /Henrik

 On Tue, May 12, 2015 at 5:33 AM, Martin Maechler
 maech...@lynne.stat.math.ethz.ch wrote:
 Steve Bronder sbron...@stevebronder.com
 on Thu, 7 May 2015 11:49:49 -0400 writes:

  Is it possible to replace c() with .subset()?

 It would be possible, but I think entirely wrong.

 .subset() is documented to be an internal function not to be
 used lightly and more to the point it is documented to *NOT*
 dispatch at all.

 If you read and understood what Peter and Luke wrote, you'd not
 special case here:

 diag() should not work only for pure matrices, but for all
 matrix-like objects for which ``the usual methods'' work, such
 as
as.vector(.), c(.)

 That's why there has been the c(.) in there.

 You can always make code faster if you write the code so it only
 has to work in one special case .. and work there very fast.


  Example below
  
  

  library(microbenchmark)




Re: [Rd] How best to get around shadowing of executables by system()'s prepending of directories to Windows' PATH?

2015-05-18 Thread Henrik Bengtsson
You probably already know, but you can at least work around it as:

Sys.which2 <- function(cmd) {
  stopifnot(length(cmd) == 1)
  if (.Platform$OS.type == "windows") {
    suppressWarnings({
      pathname <- shell(sprintf("where %s 2> NUL", cmd), intern=TRUE)[1]
    })
    if (!is.na(pathname)) return(setNames(pathname, cmd))
  }
  Sys.which(cmd)
}

(it falls back to Sys.which() if 'where %s' doesn't give anything)


> Sys.which2("convert")
                                                convert
"C:\\Program Files\\ImageMagick-6.8.3-Q16\\convert.exe"

> Sys.which("convert")
                             convert
"C:\\Windows\\system32\\convert.exe"

/Henrik

On Mon, May 18, 2015 at 11:08 AM, Yihui Xie x...@yihui.name wrote:
 +1 I have exactly the same problem.

 Regards,
 Yihui
 --
 Yihui Xie xieyi...@gmail.com
 Web: http://yihui.name


 On Mon, May 18, 2015 at 12:29 PM, Josh O'Brien joshmobr...@gmail.com wrote:
 My question:

 On Windows, R's system() command prepends several directories to those
 in the Windows Path variable.

 From ?system

  The search path for 'command' may be system-dependent: it will
  include the R 'bin' directory, the working directory and the
  Windows system directories before 'PATH'.

 This shadows any executables on the Path that share a name with, for
 example, one of the Windows commands.

 What should I do when I'd really like (the equivalent of) a call
 passed to system() that would be executed using the same Path that
 you'd get if working directly at the Windows command line? Is there a
 recommended workaround for situations like this? (It _seems_ like it
 would be handy if system() et al. included an additional argument that
 optionally disabled the prepending of those extra directories, to give
 Windows users full control of the path seen by system(). Would adding
 such an argument have undesirable ramifications?)


 Motivation and reproducible example:

 I'm motivated here by a desire to use the function plotdiff() from
 Paul Murrell's gridGraphics package on my Windows laptop.  Getting
 that to work will require a few code fixes, of which the masking of
 ImageMagick's convert.exe by that in the C:/Windows/System32 seems to
 be the most challenging. plotdiff() relies on system2() calls to
 ImageMagick's 'convert'  function, as well as a call to
 Sys.which(c("convert", "compare")) that tests for the presence of
 ImageMagick on the Path. Even  if ImageMagick is placed early on the
 Path, though, both calls to Sys.which() and system2() find Windows'
 convert command  (which Converts FAT volumes to NTFS) rather than
 ImageMagick's convert.


 Here's a reproducible example that shows what I'm seeing:

 ## In R, make a pdf
 pdf("a.pdf")
 plot(rnorm(99), col="red")
 dev.off()

 ## At Windows cmd command line
 where convert
 ## C:\Program Files\ImageMagick-6.8.8-Q16\convert.exe
 ## C:\Windows\System32\convert.exe
 convert -density 100x100 a.pdf a.png

 ## From R

 ## Unqualified references to convert find the 'wrong' one
 Sys.which("convert")
 ##   convert
 ## "C:\\Windows\\system32\\convert.exe"
 system2("convert", "-density 100x100 a.pdf b.png")
 ## Invalid Parameter - 100x100
 ## Warning message:
 ## running command 'convert -density 100x100 a.pdf b.png' had status 4

 ## A fully qualified reference does work
 system2("C:/Program Files/ImageMagick-6.8.8-Q16/convert",
         "-density 100x100 a.pdf b.png")





Re: [Rd] Why is the diag function so slow (for extraction)?

2015-05-12 Thread Henrik Bengtsson
Along Luke's lines, would(n't) it be enough to look for existence of
attribute 'class' to decide whether to dispatch or not, i.e. if c() is
needed or not?  Even without .subset(), there is a remarkable
improvement.  I think it's worth conditioning the code on dispatch or
not.  For example:

[HB-X201]{hb}: svn diff diag.R
Index: diag.R
===
--- diag.R  (revision 68345)
+++ diag.R  (working copy)
@@ -23,9 +23,11 @@
 stop("'nrow' or 'ncol' cannot be specified when 'x' is a matrix")

 if((m <- min(dim(x))) == 0L) return(vector(typeof(x), 0L))
+nms <- dimnames(x)
+nrow <- dim(x)[1L]
 ## NB: need double index to avoid overflows.
-y <- c(x)[1 + 0L:(m - 1L) * (dim(x)[1L] + 1)]
-nms <- dimnames(x)
+if (!is.null(attr(x, "class"))) x <- c(x)
+y <- x[1 + 0L:(m - 1L) * (nrow + 1)]
 if (is.list(nms) && !any(sapply(nms, is.null)) &&
 identical((nm <- nms[[1L]][seq_len(m)]), nms[[2L]][seq_len(m)]))
 names(y) <- nm

?

/Henrik

On Tue, May 12, 2015 at 5:33 AM, Martin Maechler
maech...@lynne.stat.math.ethz.ch wrote:
 Steve Bronder sbron...@stevebronder.com
 on Thu, 7 May 2015 11:49:49 -0400 writes:

  Is it possible to replace c() with .subset()?

 It would be possible, but I think entirely wrong.

 .subset() is documented to be an internal function not to be
 used lightly and more to the point it is documented to *NOT*
 dispatch at all.

 If you read and understood what Peter and Luke wrote, you'd not
 special case here:

 diag() should not work only for pure matrices, but for all
 matrix-like objects for which ``the usual methods'' work, such
 as
as.vector(.), c(.)

 That's why there has been the c(.) in there.

 You can always make code faster if you write the code so it only
 has to work in one special case .. and work there very fast.


  Example below
  
  

  library(microbenchmark)




Re: [Rd] Package compilation woes on submission.

2015-04-07 Thread Henrik Bengtsson
I've been there too.  The clang compiler does useful validation and
troubleshooting of your code in addition to what you get with gcc.  I
recommend installing/trying it as a complement to your default setup.

It has a -c option for "Only run preprocess, compile, and assemble
steps", allowing you to check your code as:

clang -c -pedantic -Wall -I$(R_HOME)/include/ src/*.c

I wouldn't be surprised if this would be enough for you to identify
and fix the problematic code.

Hope this helps

Henrik



On Tue, Apr 7, 2015 at 4:28 AM, Aravind Jayaraman
aravindjayaramana...@gmail.com wrote:
 Hello,

 I am trying to submit a new package to CRAN. I had checked the
 packages with R-devel and R-release in windows 7 local and ubuntu
 12.04 local. The package has a C file in src folder which successfully
 compiled in all the cases (there were few trivial warnings).

 I had submitted the package last week and was asked to resubmit with
 few changes in the description file.

 However upon resubmission today, the C code failed to compile in the
 reviewer's system throwing several errors.

 I rechecked the package with windows 7 and ubuntu as before and there
 were no errors. I tried winbuilder with R-release as well as R 3.2.0
 beta again without any errors.

 The only difference that I could understand was that the compiler used
 by me as well as winbuilder is gcc while that in case of the reviewer
 is clang.

 In such a case, how should I proceed with the submission?

 --
 J.Aravind
 Germplasm Conservation Division
 ICAR-National Bureau of Plant Genetic Resources
 New Delhi - 110 012




Re: [Rd] Listing all spawned jobs/processed after parallel::mcparallel()?

2015-06-23 Thread Henrik Bengtsson
On Sun, Jun 21, 2015 at 9:59 AM, Prof Brian Ripley
rip...@stats.ox.ac.uk wrote:
 On 20/06/2015 22:21, Henrik Bengtsson wrote:

 QUESTION:
 Is it possible to query number of active jobs running after launching
 them with parallel::mcparallel()?

 For example, if I launch 3 jobs using:

 library(parallel)
  f <- lapply(1:3, FUN=mcparallel)


 then I can inspect them as:

 str(f)

 List of 3
   $ :List of 2
..$ pid: int 142225
..$ fd : int [1:2] 8 13
..- attr(*, class)= chr [1:3] parallelJob childProcess process
   $ :List of 2
..$ pid: int 142226
..$ fd : int [1:2] 10 15
..- attr(*, class)= chr [1:3] parallelJob childProcess process
   $ :List of 2
..$ pid: int 142227
..$ fd : int [1:2] 12 17
..- attr(*, class)= chr [1:3] parallelJob childProcess process

 However, if I launch them without recording them, or equivalently if I
 do:

  f <- lapply(1:3, FUN=mcparallel)
  rm(list="f")


 is there a function/mechanism in R/the parallel package allowing me to
 find the currently active/running processes?  ... or at least query
 how many they are?  I'd like to use this to prevent spawning of more
  than a maximum number of parallel processes.  (Yes, I'm aware of
 mclapply() and friends, but I'm looking at using more low-level
 mcparallel()/mccollect()). I'm trying to decide whether I should
 implement my own mechanism for keeping track of jobs or not.


 Note that 'currently active/running' is a slippery concept and is not what
 the results above show.  But see ?children, which seems to be what you are
 looking for.  It is not exported and there is no more detailed explanation
 save the source code.  Also note that tells you about children and not
 grandchildren 

 You can find out about child processes (and their children) at OS level, for
 example via the 'ps' command, but doing so portably is not easy.

Thank you very much.  This was exactly what I was looking for.  I
appreciate the problem of identifying grandchildren, but with
children() I now at least have a chance to get a lower bound on the
number of active children (?children).

After some initial testing on Linux and OSX, I'm glad to see that
parallel:::children() seems to reflect what are actually active
processes, e.g. if I SIGTERM one of them externally, it is immediately
dropped from parallel:::children().  I also noticed that the process
remains active until it has been parallel:::mccollect():ed.

/Henrik


 --
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Emeritus Professor of Applied Statistics, University of Oxford
 1 South Parks Road, Oxford OX1 3TG, UK

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Listing all spawned jobs/processed after parallel::mcparallel()?

2015-06-20 Thread Henrik Bengtsson
QUESTION:
Is it possible to query number of active jobs running after launching
them with parallel::mcparallel()?

For example, if I launch 3 jobs using:

> library(parallel)
> f <- lapply(1:3, FUN=mcparallel)

then I can inspect them as:

> str(f)
List of 3
 $ :List of 2
  ..$ pid: int 142225
  ..$ fd : int [1:2] 8 13
  ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"
 $ :List of 2
  ..$ pid: int 142226
  ..$ fd : int [1:2] 10 15
  ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"
 $ :List of 2
  ..$ pid: int 142227
  ..$ fd : int [1:2] 12 17
  ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"

However, if I launch them without recording them, or equivalently if I do:

> f <- lapply(1:3, FUN=mcparallel)
> rm(list="f")

is there a function/mechanism in R/the parallel package allowing me to
find the currently active/running processes?  ... or at least query
how many they are?  I'd like to use this to prevent spawning of more
than a maximum number of parallel processes.  (Yes, I'm aware of
mclapply() and friends, but I'm looking at using the more low-level
mcparallel()/mccollect().) I'm trying to decide whether I should
implement my own mechanism for keeping track of jobs or not.

Thanks,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Code demos via HTML help gives an error

2015-06-16 Thread Henrik Bengtsson
PROBLEM:
I'm getting error:

Error in order(matches$Position) : argument 1 is not a vector

Whenever I try to access a package's Code demos page via the link on
the package HTML index page.


SOME TROUBLESHOOTING:
Looking at, for instance, the 'stats' package, the "Code demos" URL
takes the form

 http://127.0.0.1:30200/library/stats/demo

Clicking on this takes me to

 http://127.0.0.1:30200/doc/html/Search?package=stats&agrep=FALSE&types=demo


WHERE:
I see this on R devel (2015-06-15 r68521) and R 3.2.1 RC (2015-06-14
r68516) - both on Platform: x86_64-w64-mingw32/x64 (64-bit).  Haven't
checked other platforms.


Regarding the imminent release of R 3.2.1: sorry, I rarely use demos,
so it's only at this very moment that I noticed.

/Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: Declaring foo.bar as nonS3method() ?!

2015-06-12 Thread Henrik Bengtsson
Analogously to how S4 methods are declared in the code, cf.
methods::setMethod(), I'd find it more natural to also declare S3
methods in the code and note in the NAMESPACE.  For example:

# S3 method summary() for class 'aov':

summary.aov - function(x, ...) {
  # something
}
S3class(summary.aov) <- "aov"

with

`S3class<-` <- function(x, value) {
  attr(x, "S3class") <- value
  x
}

For backward compatibility, if 'S3class' is not set, the default
could/should be to infer it using the current strategy, i.e. the part
after the last period/dot plus other bells'n'whistles discussed.  If
all S3 methods had attribute 'S3class' set, there would be no need to
declare the non-S3 case.

Finally, to explicitly declare a function _not_ to be a S3 method, one
could allow for

S3class(all.effects) <- FALSE


At this point, I need to bring up the wish of having a core R function,
again cf. setMethod(), doing the above for us, e.g.

setMethodS3("summary", "aov", function(x, ...) {
  # something
})

It can be extremely light weight and would resemble what setMethod()
does for S4 methods.

Also, with the 'S3class' attribute set, one could imagine not having
to declare them as S3method(summary, aov) in the NAMESPACE.  This
could be fully automatic (and backward compatible for migration).
Absolutely not a rant, but from a developer's point of view I always
found it a bit ad hoc to have to declare S3 methods in the NAMESPACE
rather than in the code.  We're not doing it for S4 methods, so why
for S3 ones? (BTW, I think I understand why).  For the same reason,
I would think that adding a NAMESPACE declaration nonS3method() would
just add another workaround.

The above would be backward compatible, allow for a long-term
migration, while allowing folks to use periods/dots however they wish.
It would also allow code inspections such as 'R CMD check --as-cran'
to avoid false positives.

/Henrik


On Fri, Jun 12, 2015 at 8:30 AM,  luke-tier...@uiowa.edu wrote:
 The notes available off the developer page
 https://developer.r-project.org/ describe some of the rationale for
 the S3 method search design. One thing that has changed since then is
 that all packages now have name spaces. We could change the search
 algorithm to skip attached package exports (and package imports and
 base), which would require methods defined in packages that are to be
 accessible outside the package to be declared.  Methods defined inside
 a package for internal use or methods defined in scripts not in
 packages would still be found. Packages not currently registering
 their methods would have to do so -- not sure how many that would
 affect. Testing on CRAN/Bioc should show how much of an effect this
 would have and whether there are any other issues.

 Best,

 luke


 On Fri, 12 Jun 2015, Duncan Murdoch wrote:

 On 12/06/2015 10:53 AM, Hadley Wickham wrote:

 To me, it seems like there's actually two problems here:

 1) Preventing all() from dispatching to all.effects() for objects of
 class effects
 2) Eliminating the NOTE in R CMD check

 My impression is that 1) actually causes few problems, particularly
 since people are mostly now aware of the problem and avoid using `.`
 in function names unless they're S3 methods. Fixing this issue seems
 like it would be a lot of work for relatively little gain.

 However, I think we want to prevent people from writing new functions
 with this confusing naming scheme, but equally we want to grandfather
 in existing functions, because renaming them all would be a lot of
 work (I'm looking at you t.test()!).

 Could we have a system similar to globalVariables() where you could
 flag a function as definitely not being an S3 method? I'm not sure
 what R CMD check should do - ideally you wouldn't be allowed to use
 method.class for new functions, but would still be able to suppress the
 note for old functions that can't easily be changed.


 We have a mechanism for suppressing the warning for existing functions,
 it's just not available to users to modify.  So it would be possible to
 add effects::all.effects to the stop list, and this might be the easiest
 action here.

 This isn't perfect because all.effects() would still act as a method.
 However,  it does give the deprecated message if you ever call it, so
 nobody would do this unknowingly.  The only real risk is that if anyone
 ever wrote an all.effects function that *was* supposed to be an S3
 method, it might be masked by the one in effects.

 Duncan Murdoch


 Hadley

 On Fri, Jun 12, 2015 at 6:52 AM, Kurt Hornik kurt.hor...@wu.ac.at
 wrote:

 Duncan Murdoch writes:


 On 12/06/2015 7:16 AM, Kurt Hornik wrote:

 Duncan Murdoch writes:


 On 12/06/2015 4:12 AM, Martin Maechler wrote:

 This is a topic ' apparent S3 methods note in R CMD check '
 from R-package-devel
 https://stat.ethz.ch/pipermail/r-package-devel/2015q2/000126.html

 which is relevant to here because some of us have been thinking
 about extending R  because of the issue.

 John Fox, maintainer of the 'effects' package has 

Re: [Rd] Code demos via HTML help gives an error

2015-06-17 Thread Henrik Bengtsson
Went ahead and did it directly, cf. PR #16432
(https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16432).  /H

On Wed, Jun 17, 2015 at 4:02 AM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
 On 16/06/2015 4:20 PM, Henrik Bengtsson wrote:
 PROBLEM:
 I'm getting error:

 Error in order(matches$Position) : argument 1 is not a vector

 Whenever I try to access a package's Code demos page via the link on
 the package HTML index page.


 SOME TROUBLESHOOTING:
  Looking at, for instance, the 'stats' package, the "Code demos" URL
 takes the form

  http://127.0.0.1:30200/library/stats/demo

  Clicking on this takes me to

   http://127.0.0.1:30200/doc/html/Search?package=stats&agrep=FALSE&types=demo


 WHERE:
 I see this on R devel (2015-06-15 r68521) and R 3.2.1 RC (2015-06-14
 r68516) - both on Platform: x86_64-w64-mingw32/x64 (64-bit).  Haven't
 checked other platforms.


  Regarding the imminent release of R 3.2.1: sorry, I rarely use demos,
  so it's only at this very moment that I noticed.


 A fix won't make it into 3.2.1, but I will try to fix it in
 3.2.1-patched, which will become 3.2.2 eventually, unless I forget.
 Could I ask you to post a bug report about it if you don't hear that
 it's fixed in a few days?

 Duncan Murdoch


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Add-on argument in sample()

2015-06-15 Thread Henrik Bengtsson
You're not the first one, e.g.

https://stat.ethz.ch/pipermail/r-devel/2010-March/057029.html
https://stat.ethz.ch/pipermail/r-devel/2010-November/058981.html

(I was bitten by this in a resampling scheme where the set sampled
from was data driven).

Here's a simple solution - taken from R.utils::resample();

 resample <- function (x, ...) x[sample.int(length(x), ...)]

> resample(10, size = 1, replace = FALSE)
[1] 10
> resample(10, size = 3, replace = TRUE)
[1] 10 10 10
> resample(10, size = 3, replace = FALSE)
Error in sample.int(length(x), ...) :
  cannot take a sample larger than the population when 'replace = FALSE'

/Henrik

On Mon, Jun 15, 2015 at 5:55 AM, Millot Gael gael.mil...@curie.fr wrote:
 Hi.

 I have a problem with the default behavior of sample(), which performs
 sample(1:x) when x is a single value.
 This behavior is well explained in ?sample.
 However, this behavior is annoying when the number of values is not
 predictable. Would it be possible to add an argument
 that deactivates this and performs the sampling on the single value? Examples:
 sample(10, size = 1, replace = FALSE)
 10

 sample(10, size = 3, replace = TRUE)
 10 10 10

 sample(10, size = 3, replace = FALSE)
 Error

 Many thanks for your help.

 Best wishes,

 Gael Millot.


 Gael Millot
 UMR 3244 (IC-CNRS-UPMC) et Universite Pierre et Marie Curie
 Equipe Recombinaison et instabilite genetique
 Pav Trouillet Rossignol 5eme etage
 Institut Curie
 26 rue d'Ulm
 75248 Paris Cedex 05
 FRANCE
 tel : 33 1 56 24 66 34
 fax : 33 1 56 24 66 44
 Email : gael.mil...@curie.fr
 http://perso.curie.fr/Gael.Millot/index.html


 [[alternative HTML version deleted]]

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] sum(..., na.rm=FALSE): Summing over NA_real_ values much more expensive than non-NAs for na.rm=FALSE? Hmm...

2015-05-31 Thread Henrik Bengtsson
This is a great example of how you cannot figure it out after spending
two hours troubleshooting, but a few minutes after you post to
R-devel, it just jumps out at you (is there a word for this other than
impatience?);

Let me answer my own question.  The discrepancy between my sum2() code
and the internal code for base::sum() is that the latter uses LDOUBLE
= long double (on some systems it's only double, cf.
https://github.com/wch/r-source/blob/trunk/src/nmath/nmath.h#L28-L33),
whereas my sum2() code uses double.  So using long double, I can
reproduce the penalty of having NA_real_ with na.rm=FALSE;

sum3 <- inline::cfunction(sig=c(x="double", narm="logical"), body='
#define LDOUBLE long double
  double *x_ = REAL(x);
  int narm_ = asLogical(narm);
  int n = length(x);
  LDOUBLE sum = 0.0;
  for (R_xlen_t i = 0; i < n; i++) {
    if (!narm_ || !ISNAN(x_[i])) sum += x_[i];
  }
  return ScalarReal((double)sum);
')

> x <- rep(0, 1e8)
> stopifnot(typeof(x) == "double")
> system.time(sum3(x, narm=FALSE))
   user  system elapsed
   0.40    0.00    0.44
> y <- rep(NA_real_, 1e8)
> stopifnot(typeof(y) == "double")
> system.time(sum3(y, narm=FALSE))
   user  system elapsed
   9.80    0.00    9.84
> z <- x; z[length(z)/2] <- NA_real_
> stopifnot(typeof(z) == "double")
> system.time(sum3(z, narm=FALSE))
   user  system elapsed
   4.49    0.00    4.50

This might even be what the following comment refers to:

/* Required by C99 but might be slow */
#ifdef HAVE_LONG_DOUBLE
# define LDOUBLE long double
#else
# define LDOUBLE double
#endif

So now I should rephrase my question: Is there a way to avoid this
penalty when using 'long double'?  Is this something the compiler can
be clever about, or is the only solution not to use 'long double'?

/Henrik

On Sun, May 31, 2015 at 5:02 PM, Henrik Bengtsson
henrik.bengts...@ucsf.edu wrote:
 I'm observing that base::sum(x, na.rm=FALSE) for typeof(x) == "double"
 is much more time consuming when there are missing values versus when
 there are not.  I'm observing this on both Windows and Linux, and it's
 quite surprising to me.  Currently, my main suspect is settings in
 how R was built.  The second suspect is my brain.  I hope that someone
 can clarify the below results and confirm or not whether they see the
 same.  Note, this is for doubles, so I'm not expecting
 "early-stopping" as for integers (where testing for NA is cheap).

 On R 3.2.0, on Windows (using the official CRAN builds), on Linux
 (locally built), and on OS X (official AT&T builds), I get:

 > x <- rep(0, 1e8)
 > stopifnot(typeof(x) == "double")
 > system.time(sum(x, na.rm=FALSE))
    user  system elapsed
    0.19    0.01    0.20

 > y <- rep(NA_real_, 1e8)
 > stopifnot(typeof(y) == "double")
 > system.time(sum(y, na.rm=FALSE))
    user  system elapsed
    9.54    0.00    9.55

 > z <- x; z[length(z)/2] <- NA_real_
 > stopifnot(typeof(z) == "double")
 > system.time(sum(z, na.rm=FALSE))
    user  system elapsed
    4.49    0.00    4.51

 Following the source code, I'm pretty sure the code
 (https://github.com/wch/r-source/blob/trunk/src/main/summary.c#L112-L128)
 performing the calculation is:

 static Rboolean rsum(double *x, R_xlen_t n, double *value, Rboolean narm)
 {
   LDOUBLE s = 0.0;
   Rboolean updated = FALSE;
   for (R_xlen_t i = 0; i < n; i++) {
     if (!narm || !ISNAN(x[i])) {
       if (!updated) updated = TRUE;
       s += x[i];
     }
   }
   if (s > DBL_MAX) *value = R_PosInf;
   else if (s < -DBL_MAX) *value = R_NegInf;
   else *value = (double) s;
   return updated;
 }

 In other words, when na.rm=FALSE, that inner for loop:

   for (R_xlen_t i = 0; i < n; i++) {
     if (!narm || !ISNAN(x[i])) {
       if (!updated) updated = TRUE;
       s += x[i];
     }
   }

 should effectively become (because !ISNAN(x[i]) does not make a difference):

   for (R_xlen_t i = 0; i < n; i++) {
     if (!narm) {
       if (!updated) updated = TRUE;
       s += x[i];
     }
   }

 That is, sum(x, na.rm=FALSE) basically spends its time on `s += x[i]`.
 Now, I have always been under the impression that summing with NAs is
 *not* more expensive than summing over regular (double) values, which
 is confirmed by the below example, but the above benchmarking
 disagrees.  It looks like there is a big overhead keeping track of the
 sum `s` being NA, which is supported by the fact that summing over 'z'
 costs half of what 'y' does.

 Now, I *cannot* reproduce the above using the following 'inline' example:

 sum2 <- inline::cfunction(sig=c(x="double", narm="logical"), body='
   double *x_ = REAL(x);
   int narm_ = asLogical(narm);
   int n = length(x);
   double sum = 0;
   for (R_xlen_t i = 0; i < n; i++) {
     if (!narm_ || !ISNAN(x_[i])) sum += x_[i];
   }
   return ScalarReal(sum);
 ')

 > x <- rep(0, 1e8)
 > stopifnot(typeof(x) == "double")
 > system.time(sum2(x, narm=FALSE))
    user  system elapsed
    0.16    0.00    0.16

 > y <- rep(NA_real_, 1e8)
 > stopifnot(typeof(y) == "double")
 > system.time(sum2(y, narm=FALSE))
    user  system elapsed
    0.16    0.00    0.15

 > z <- x; z[length(z)/2] <- NA_real_
 > stopifnot(typeof(z

[Rd] sum(..., na.rm=FALSE): Summing over NA_real_ values much more expensive than non-NAs for na.rm=FALSE? Hmm...

2015-05-31 Thread Henrik Bengtsson
I'm observing that base::sum(x, na.rm=FALSE) for typeof(x) == "double"
is much more time consuming when there are missing values versus when
there are not.  I'm observing this on both Windows and Linux, and it's
quite surprising to me.  Currently, my main suspect is settings in
how R was built.  The second suspect is my brain.  I hope that someone
can clarify the below results and confirm or not whether they see the
same.  Note, this is for doubles, so I'm not expecting
"early-stopping" as for integers (where testing for NA is cheap).

On R 3.2.0, on Windows (using the official CRAN builds), on Linux
(locally built), and on OS X (official AT&T builds), I get:

> x <- rep(0, 1e8)
> stopifnot(typeof(x) == "double")
> system.time(sum(x, na.rm=FALSE))
   user  system elapsed
   0.19    0.01    0.20

> y <- rep(NA_real_, 1e8)
> stopifnot(typeof(y) == "double")
> system.time(sum(y, na.rm=FALSE))
   user  system elapsed
   9.54    0.00    9.55

> z <- x; z[length(z)/2] <- NA_real_
> stopifnot(typeof(z) == "double")
> system.time(sum(z, na.rm=FALSE))
   user  system elapsed
   4.49    0.00    4.51

Following the source code, I'm pretty sure the code
(https://github.com/wch/r-source/blob/trunk/src/main/summary.c#L112-L128)
performing the calculation is:

static Rboolean rsum(double *x, R_xlen_t n, double *value, Rboolean narm)
{
  LDOUBLE s = 0.0;
  Rboolean updated = FALSE;
  for (R_xlen_t i = 0; i < n; i++) {
    if (!narm || !ISNAN(x[i])) {
      if (!updated) updated = TRUE;
      s += x[i];
    }
  }
  if (s > DBL_MAX) *value = R_PosInf;
  else if (s < -DBL_MAX) *value = R_NegInf;
  else *value = (double) s;
  return updated;
}

In other words, when na.rm=FALSE, that inner for loop:

  for (R_xlen_t i = 0; i < n; i++) {
    if (!narm || !ISNAN(x[i])) {
      if (!updated) updated = TRUE;
      s += x[i];
    }
  }

should effectively become (because !ISNAN(x[i]) does not make a difference):

  for (R_xlen_t i = 0; i < n; i++) {
    if (!narm) {
      if (!updated) updated = TRUE;
      s += x[i];
    }
  }

That is, sum(x, na.rm=FALSE) basically spends its time on `s += x[i]`.
Now, I have always been under the impression that summing with NAs is
*not* more expensive than summing over regular (double) values, which
is confirmed by the below example, but the above benchmarking
disagrees.  It looks like there is a big overhead keeping track of the
sum `s` being NA, which is supported by the fact that summing over 'z'
costs half of what 'y' does.

Now, I *cannot* reproduce the above using the following 'inline' example:

sum2 <- inline::cfunction(sig=c(x="double", narm="logical"), body='
  double *x_ = REAL(x);
  int narm_ = asLogical(narm);
  int n = length(x);
  double sum = 0;
  for (R_xlen_t i = 0; i < n; i++) {
    if (!narm_ || !ISNAN(x_[i])) sum += x_[i];
  }
  return ScalarReal(sum);
')

> x <- rep(0, 1e8)
> stopifnot(typeof(x) == "double")
> system.time(sum2(x, narm=FALSE))
   user  system elapsed
   0.16    0.00    0.16

> y <- rep(NA_real_, 1e8)
> stopifnot(typeof(y) == "double")
> system.time(sum2(y, narm=FALSE))
   user  system elapsed
   0.16    0.00    0.15

> z <- x; z[length(z)/2] <- NA_real_
> stopifnot(typeof(z) == "double")
> system.time(sum2(z, narm=FALSE))
   user  system elapsed
   0.16    0.00    0.15

This is why I suspect it's related to how R was configured when it was
built. What's going on? Can someone please shed some light on this?

Thanks

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] MetaCran website v1.0.0-alpha

2015-05-24 Thread Henrik Bengtsson
On May 24, 2015 2:44 AM, Rainer M Krug rai...@krugs.de wrote:

 Gábor Csárdi csardi.ga...@gmail.com writes:

  Dear All,
 
  [ I was wondering if this should have gone to the new mailing list.
Maybe. ]
 
  As some of you maybe know from my earlier posts, I am building a simple
  search engine for R packages. Now the search engine has a proper web
site,
  where you can also browse CRAN packages.
 
  http://www.r-pkg.org/
 
  As I see the value is in
  1. package search (search box on top right)
  2. APIs, see http://www.r-pkg.org/services
 
  It is in alpha version, meaning that things seem to work, some pages
are a
  bit slow and there are a lot of glitches to fix.

 I had a quick peek, and it looks really nice! I particularly think the
 github integration for diff-ing versions can be very useful!

 It might be an idea, to also add R itself to the github repo for
 diff-ing?

You'll find that at:

https://github.com/wch/r-source

Henrik


 Thanks a lot,

 Rainer

 
  Please tell me what you think.
 
  Best,
  Gabor
 
[[alternative HTML version deleted]]
 

 --
 Rainer M. Krug
 email: Raineratkrugsdotde
 PGP: 0x0F52F982

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] MetaCran website v1.0.0-alpha

2015-05-26 Thread Henrik Bengtsson
On Tue, May 26, 2015 at 12:45 AM, Gábor Csárdi csardi.ga...@gmail.com wrote:
 On Mon, May 25, 2015 at 8:28 PM, Simon Urbanek simon.urba...@r-project.org
 wrote:

 One issue I have with this is that it doesn't point to the original GitHub
 repositories of the packages, so you end up with additional repositories on
 Github in Gabor's name that have nothing to do with the actual Github
 repositories of the packages. I understand that it's technically necessary,
 but I fear it will lead to a lot of confusion...


 Well, we point to the original GitHub repo if that is given in the URL
 field. It would be nice to have an official field for source code
 repository in DESCRIPTION.

 But I agree with you that this has great potential for confusion. Several
 people have been sending pull requests to github.com/cran repos, most of
 them not realizing that they are not the right repos to fork. (Although
 many packages are not on GH or any other similar service, and then these
 are kind of the places to fork.)

 I could have a large warning popup on the link from r-pkg.org, with red
 flags, and you would see this before the actual repo. But this has its own
 problems, like being annoying after a while, how to turn it off with
 browser cookies, etc.

 The best would be to somehow have a warning on the GitHub repo pages, but
 there isn't a lot I can modify there if I don't want to change/add the
 README file, which would effectively change the package. I could probably
 add 'WARNING: this is a read-only mirror, and not the original package
 repository' to the one-line description on the top.

 If you have other ideas, please let me know.

If people send pull requests, maybe adding a generic open pull request
to each repository with the title "MIRROR ONLY: Do not send pull requests
here" would help.  The fancy version would be to say "MIRROR ONLY: All
patches/pull requests should be sent to <URL>", where <URL> is from
the DESCRIPTION field 'URL'.  That might prevent a few more.
<blink>You can boost it with lots of colorful WARNING, NO
TRESPASSING, ... labels as well</blink>.  The next level up is to
have a service that automatically rejects pull requests with an
informative error message.

/Henrik



 Gabor


 On May 24, 2015, at 5:44 AM, Rainer M Krug rai...@krugs.de wrote:

  Gábor Csárdi csardi.ga...@gmail.com writes:
 
  Dear All,
 
  [ I was wondering if this should have gone to the new mailing list.
 Maybe. ]
 
  As some of you maybe know from my earlier posts, I am building a simple
  search engine for R packages. Now the search engine has a proper web
 site,
  where you can also browse CRAN packages.
 
  http://www.r-pkg.org/
 
  As I see the value is in
  1. package search (search box on top right)
  2. APIs, see http://www.r-pkg.org/services
 
  It is in alpha version, meaning that things seem to work, some pages
 are a
  bit slow and there are a lot of glitches to fix.
 
  I had a quick peek, and it looks really nice! I particularly think the
  github integration for diff-ing versions can be very useful!
 
  It might be an idea, to also add R itself to the github repo for
  diff-ing?
 
  Thanks a lot,
 
  Rainer
 
 
  Please tell me what you think.
 
  Best,
  Gabor
 
   [[alternative HTML version deleted]]
 
 
  --
  Rainer M. Krug
  email: Raineratkrugsdotde
  PGP: 0x0F52F982
  __
  R-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-devel



 [[alternative HTML version deleted]]

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] download.file() on ftp URL fails in windows with default download method

2015-08-08 Thread Henrik Bengtsson
Works for me on Windows 7.  Also when I explicitly set 'method' to
"internal", "libcurl", "curl", "wininet" and "wget".

 sessionInfo()
R version 3.2.2 beta (2015-08-04 r68843)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.2.2

/Henrik

On Sat, Aug 8, 2015 at 1:11 AM, Dan Tenenbaum dtene...@fredhutch.org wrote:
 Hi,

 url <- "ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/All/GCF_01405.13.assembly.txt"
 download.file(url, tempfile())
 trying URL 
 'ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/All/GCF_01405.13.assembly.txt'
 Error in download.file(url, tempfile()) :
   cannot open URL 
 'ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/All/GCF_01405.13.assembly.txt'
 In addition: Warning message:
 In download.file(url, tempfile()) : InternetOpenUrl failed: ''

 If I set method="curl" it works fine. This was on R 3.2.2 beta (sessionInfo()
 below) but I got the same results in R 3.2.1 and R-devel.

 This does not happen on Windows Server 2008 but it happens on Windows Server 
 2012.

 Dan

 sessionInfo()
 R version 3.2.2 beta (2015-08-05 r68859)
 Platform: x86_64-w64-mingw32/x64 (64-bit)
 Running under: Windows Server 2012 x64 (build 9200)

 locale:
 [1] LC_COLLATE=English_United States.1252
 [2] LC_CTYPE=English_United States.1252
 [3] LC_MONETARY=English_United States.1252
 [4] LC_NUMERIC=C
 [5] LC_TIME=English_United States.1252

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] unset() function?

2015-08-22 Thread Henrik Bengtsson
Hi,

I was playing around with this idea earlier this year. This would
allow you to remove a variable with NAMED=2 while still passing its
value, e.g.

x1 <- log(r(x1))

where the returned value/variable has NAMED=1.  At first I was quite
excited about the results, but it turned out that it only worked for a
few functions.  If you want to play around with it, I've created the
'recycle' package:

https://github.com/HenrikBengtsson/recycle

Have a look at the package tests for examples and what works and what
doesn't work:

https://github.com/HenrikBengtsson/recycle/tree/master/tests

However, basically due to what Luke says, I've decided not to pursue
this any further for now.

But, I certainly agree that if the internals of R could be made less
conservative (not force NAMED=2), this idea would certainly be worth
pursuing and could save quite a bit of memory.  The downside would be
that code would be cluttered up with lots of explicit r() statements.
On the other hand, maybe those could be added automatically by code
compilers, e.g.

x1 <- log(x1)

would become

x1 <- log(r(x1))

/Henrik

On Sat, Aug 22, 2015 at 4:50 PM,  luke-tier...@uiowa.edu wrote:
 This wouldn't actually work at present as evaluating a promise always
 sets NAMED to 2. With reference counting it would work so might be
 worth considering when we switch.

 Going forward it would be best to use MAYBE_REFERENCED to test whether
 a duplicate is needed -- this macro is defined appropriately whether R
 is compiled to use NAMED or reference counting.

 Best,

 luke


 On Fri, 21 Aug 2015, William Dunlap wrote:

 Does R have a function like the S/S++ unset() function?
 unset(name) would remove 'name' from the current evaluation
 frame and return its value.  It allowed you to safely avoid
 some memory copying when calling .C or .Call.

 E.g., suppose you had C code like
  #include <R.h>
  #include <Rinternals.h>
  SEXP add1(SEXP pX)
  {
      int nProtected = 0;
      int n = Rf_length(pX);
      int i;
      double* x;
      Rprintf("NAMED(pX)=%d: ", NAMED(pX));
      if (NAMED(pX)) {
          Rprintf("Copying pX before adding 1\n");
          PROTECT(pX = duplicate(pX)); nProtected++;
      } else {
          Rprintf("Changing pX in place\n");
      }
      x = REAL(pX);
      for (i = 0; i < n; i++) {
          x[i] = x[i] + 1.0;
      }
      UNPROTECT(nProtected);
      return pX;
  }

 If I call this from an R function
  add1 <- function(x) {
      stopifnot(inherits(x, "numeric"))
      .Call("add1", x)
  }
 it will always copy 'x', even though not copying would
 be safe (since add1 doesn't use 'x' after calling .Call()).
   > add1(c(1.2, 3.4))
  NAMED(pX)=2: Copying pX before adding 1
  [1] 2.2 4.4
 If I make the .Call directly, without a nice R function around it,
 then I can avoid the copy
   > .Call("add1", c(1.2, 3.4))
  NAMED(pX)=0: Changing pX in place
  [1] 2.2 4.4

 If something like S's unset() were available I could avoid the copy,
 when safe to do so, by making the .Call in add1
  .Call("add1", unset(x))

 If you called this new add1 with a named variable from another
 function the copying would be done, since NAMED(x) would be
 2 even after the local binding was removed.  It actually requires some
 care to eliminate the copying, as all the functions in the call
 chain would have to use unset() when possible.

 I ask this because I ran across a function in the 'bit' package that
 does not have its C code call duplicate but instead assumes that
 x[1] <- x[1] will force x to be copied:
  `!.bit` <- function(x){
    if (length(x)){
      ret <- x
      ret[1] <- ret[1]  # force duplication
      .Call("R_bit_not", ret, PACKAGE="bit")
    }else{
      x
    }
  }
 If you optimize things so that 'ret[1] <- ret[1]' does not copy 'ret',
 then this function alters its input.  If a function like unset()
 were there, then the .Call could be
  .Call("R_bit_not", unset(x))

 I suppose the compiler could analyze the code and see that
 x was not used after the .Call and thus feel free to avoid the
 copy.

 In any case bit's maintainer should add something like
     if (NAMED(x)) {
         PROTECT(x = duplicate(x));
         nProtect++;
     }
     ...
     UNPROTECT(nProtect);
 in the C code, but unset() would help avoid unneeded duplications.


 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com

 [[alternative HTML version deleted]]

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


 --
 Luke Tierney
 Ralph E. Wareham Professor of Mathematical Sciences
 University of Iowa  Phone: 319-335-3386
 Department of Statistics andFax:   319-335-3017
Actuarial Science
 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
 Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu


 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] capture.output() duplicates last line unless newline (R-devel bug)

2015-08-14 Thread Henrik Bengtsson
In R-devel (2015-08-12 r69024), capture.output() incorrectly
duplicates the last line unless it ends with a newline.  I don't see
this in R 3.2.2 RC (2015-08-13 r69049).  It seems to have started
fairly recently; I spotted this yesterday after starting to get
errors in my R.utils checks that use capture.output(), cf.
https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-clang/R.utils-00check.html

Examples:

> x <- "a"
> cat(x)
a
> capture.output(cat(x))
[1] "a" "a"

> x <- "a\n"
> cat(x)
a
> capture.output(cat(x))
[1] "a"

> x <- "a\nb"
> cat(x)
a
b
> capture.output(cat(x))
[1] "a" "b" "b"

> x <- "a\nb\n"
> cat(x)
a
b
> capture.output(cat(x))
[1] "a" "b"

> x <- c("a", "b")
> cat(x)
a b
> capture.output(cat(x))
[1] "a b" "a b"

> x <- c("a", "b\n")
> cat(x)
a b
> capture.output(cat(x))
[1] "a b"


 sessionInfo()
R Under development (unstable) (2015-08-12 r69024)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] R.utils_2.1.0-9000 R.oo_1.19.0-9000   R.methodsS3_1.7.0-9000

loaded via a namespace (and not attached):
[1] tools_3.3.0

/Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Why not pthreads on Windows in 'parallel' package?

2015-08-14 Thread Henrik Bengtsson
Aaaah ...

and argh - I should have known better than to post R questions at midnight,
especially when I know mclapply() forks the process and doesn't use threads.
Brain meltdown. (So, we'll proceed with trying to use pthreads in matrixStats
also on Windows.) Sorry for the noise and thanks Kasper.

Henrik
On Aug 15, 2015 02:52, Kasper Daniel Hansen kasperdanielhan...@gmail.com
wrote:

 mclapply uses fork which is different from pthreads.  As I understand it,
 pthreads requires you to rewrite code, fork is a system call which takes
 care of completely replicating the current state of the process.

 Kasper

 On Fri, Aug 14, 2015 at 5:00 PM, Henrik Bengtsson 
 henrik.bengts...@ucsf.edu wrote:

 On Windows there are a few 'pthreads' implementations, e.g.
 pthreads-w32 and winpthreads
 [
 https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Using-pthreads
 ].
 We're thinking of giving them a try for the matrixStats package, and
 basic tests indicate it works, but since Windows pthreads are not
 used by core R (or are they?), I've gotten a little bit worried that we
 will face overwhelming problems.

 So, why are the above Windows implementations not used in the
 'parallel' package in order to add multicore support for mclapply() on
 Windows?  Was it tried but found to be unreliable?  Was it that no one
 had the time to do it?  License issues?  Are there any pointers to old
 R-devel threads discussing this?

 Thanks

 Henrik

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Best way to implement optional functions?

2015-10-25 Thread Henrik Bengtsson
On Thu, Oct 22, 2015 at 3:48 PM, Paul Gilbert  wrote:
>
>
> On 10/22/2015 03:55 PM, Duncan Murdoch wrote:
>>
>> I'm planning on adding some new WebGL functionality to the rgl package,
>> but it will pull in a very large number of dependencies. Since many
>> people won't need it, I'd like to make the new parts optional.
>>
>> The general idea I'm thinking of is to put the new stuff into a separate
>> package, and have rgl "Suggest" it.  But I'm not sure whether these
>> functions  should only be available in the new package (so users would
>> have to attach it to use them), or whether they should be in rgl, but
>> fail if the new package is not available for loading.
>>
>> Can people suggest other packages that solve this kind of problem in a
>> good way?
>
>
> I do something similar in several packages. I would distinguish between the
> situation where the new functions have some functionality without all the
> extra dependencies, and the case where they really do not. In the former
> case it makes sense to put the functions in rgl and then fail when the extra
> functionality is demanded and not available. In the latter case, it "feels
> like" you are trying to defeat Depends: or Imports:. That route has usually
> gotten me in trouble.
>
> Another thing you might want to consider is that, at least for awhile, the
> new functions in rglPlus will probably be less stable then those in rgl.
> Being able to change those and update rglPlus without needing to update rgl
> can be a real advantage (i.e. if the API for the new functions is in rgl,
> and you need to change it, then you are required to notify all the package
> maintainers that depend on rgl, do reverse testing, and you have to explain
> that your update of rgl is going to break rglPlus and you have a new version
> of that but you cannot submit that yet because it will not work until the
> new rgl is in place.)

I favor the latter solution; keep rgl as a core package and add bells
and whistles to rglPlus and have rglPlus attach rgl so that people
using the extra functions can just do library(rglPlus).  This way rgl
does not have to "know about" rglPlus.  Not sure if this design is
possible, or whether rgl does need to know about rglPlus.
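For the former case (functions living in rgl but failing gracefully), the usual idiom is to guard each entry point with requireNamespace(); a minimal sketch, where 'rglPlus' and fancyWebGL() are hypothetical names:

```r
fancyWebGL <- function(...) {
  # fail with an informative message if the Suggests-ed package is absent
  # (package and function names here are made up for illustration)
  if (!requireNamespace("rglPlus", quietly = TRUE)) {
    stop("The 'rglPlus' package is required for this feature; ",
         "install it with install.packages(\"rglPlus\")")
  }
  rglPlus::fancyWebGL(...)
}
```

The `::` call keeps rglPlus out of rgl's Imports, so users who never touch the WebGL extras never pull in its dependency tree.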

/Henrik

>
> Paul
>
>>
>> Duncan Murdoch
>>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] PDFs and SVGs containing rasterGrobs don't display correctly in some other software

2015-11-05 Thread Henrik Bengtsson
On Nov 5, 2015 03:45, "Richard Cotton"  wrote:
>
> I've just been trying to post-process some R-created heatmaps using
> Inkscape, but I can't get them to display correctly in that software.
>
> To reproduce:
>
> library(grid)
> r <- as.raster(matrix(runif(25), 5, 5))
> pdf("test.pdf")
> grid.newpage()
> grid.raster(r, interpolate = FALSE)
> dev.off()
>
> This figure should be a five by five block of grey squares.  This is
> what I see in the R GUI device window, and when I open test.pdf in
> Abode Reader or SumatraPDF.
>
> However, when I open the file in Inkscape or Firefox, each of the
> squares is blurred.

Not sure if it's related to how PNGs are rendered in Firefox, but it sounds
similar to the anti-aliasing used by Firefox when scaling up PNGs:

http://stackoverflow.com/questions/388492/firefox-blurs-an-image-when-scaled-through-external-css-or-inline-style

Firefox made anti-aliasing ("blurring") the new default many years ago.
After complaints they provided a way to control this behavior via (a
Mozilla-specific) CSS specification.  See the above thread (which is quite
old; there might be more up-to-date references out there).

If you output a PNG and insert it in a HTML img where you scale up the
width and height, do you see the same problem in Inkscape as in Firefox?

>
> I tried swapping grDevices::pdf for Cairo::CairoPDF and got the same
> result.  I also tried generating SVGs using both grDevices::svg and
> Cairo::CairoSVG, and also got the same result.

If you inspect your generated SVG, is the raster image embedded as a PNG?
If so, SVG is unlikely to change the rendering compared to PNGs.
However, you might be able to control the anti-aliasing property
of embedded raster images in SVGs using embedded CSS if you want a
self-contained SVG file that renders properly everywhere. You can also do a
self-contained HTML in a similar way.
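For the embedded-CSS route, the relevant property is image-rendering; a sketch of what could go inside the SVG (exact property support varies by renderer, and the -moz- form is the Mozilla-specific variant mentioned above):

```css
/* ask renderers not to smooth (anti-alias) embedded raster images */
image {
  image-rendering: pixelated;        /* modern browsers */
  image-rendering: -moz-crisp-edges; /* older Firefox */
}
```

Whether Inkscape honors these is an open question; this only addresses the Firefox half of the problem.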

Don't know if there exists a workaround for PDFs (didn't even know about
the issue until you mentioned it).

My $.02

Henrik
(from smartphone so sorry for typos etc)

>
> I see the same thing using R-devel and R3.2.2 under Windows, and (with
> an older version of R) under Linux.
>
> I don't know whether the problem is with grid's rasterGrobs, or how R
> writes PDF and SVG files, or with Inkscape and Firefox's method of
> rendering those files, or with me.  Please can you help me narrow it
> down.
>
> - Can you reproduce my problem?  That is, when you run the above code,
> does the file look OK in a PDF reader but blurry in Inkscape?
> - Do you know of any issues with using rasterGrobs in PDFs or SVGs?
>
> --
> Regards,
> Richie
>
> Learning R
> 4dpiecharts.com
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] A where() functions that does what exists() does but return the environment when object lives?

2015-10-13 Thread Henrik Bengtsson
Thanks Uwe and thanks Hadley.  I ended up implementing:

## Emulates R internal findVar1mode() function
## https://svn.r-project.org/R/trunk/src/main/envir.c
where <- function(x, where=-1,
                  envir=if (missing(frame)) {
                    if (where < 0) parent.frame(-where) else as.environment(where)
                  } else sys.frame(frame),
                  frame, mode="any", inherits=TRUE) {
  tt <- 1
  ## Validate arguments
  stopifnot(is.environment(envir))
  stopifnot(is.character(mode), length(mode) == 1L)
  inherits <- as.logical(inherits)
  stopifnot(inherits %in% c(FALSE, TRUE))

  ## Search
  while (!identical(envir, emptyenv())) {
if (exists(x, envir=envir, mode=mode, inherits=FALSE)) return(envir)
if (!inherits) return(NULL)
envir <- parent.env(envir)
  }

  NULL
}

Here where() provides the same arguments as exists() and get().  It
turns out one needs to tweak the default value for the 'envir' argument
in order for it to work the same.

One could argue that where() should always return an environment, i.e.
it should return emptyenv() instead of NULL if the object was not
found.  On the other hand, it's easier to test for is.null(env) than
identical(env, emptyenv()).

/Henrik


On Tue, Oct 13, 2015 at 2:44 PM, Hadley Wickham  wrote:
> On Tue, Oct 13, 2015 at 4:43 PM, Hadley Wickham  wrote:
>> Seems easy enough to write yourself:
>>
>> where <- function(x, env = parent.frame()) {
>> if (identical(env, emptyenv()))
>> return(NULL)
>> if (exists(x, envir = env, inherits = FALSE))
>> return(env)
>> where(x, parent.env(env))
>> }
>>
>> sample2 <- base::sample
>> where("sample2")
>> #> 
>
> And that returns a random environment because I ran it with
> reprex::reprex().  In interactive use it will return <environment: R_GlobalEnv>.
>
> Hadley
>
> --
> http://had.co.nz/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] A where() functions that does what exists() does but return the environment when object lives?

2015-10-09 Thread Henrik Bengtsson
Hi,

exists("foo", inherits=TRUE) checks whether an object named "foo"
exists, and get("foo", inherits=TRUE) retrieves it.  I'm looking for a
function similar to exists() that returns the environment where the
object "foo" exists, if at all.  If not found, NULL is returned.
Does that exist?

EXAMPLE #1:

> sample2 <- base::sample
> env <- where("sample2", inherits=TRUE)
> env
<environment: R_GlobalEnv>

Note the difference to:

> obj <- get("sample2", inherits=TRUE)
> environment(obj)
<environment: namespace:base>


EXAMPLE #2:

> a <- 1
> foo <- function() { b <- 2; list(a=where("a", inherits=TRUE), b=where("b", 
> inherits=TRUE)) }
> foo()
$a

$b


> foo()
$a

$b



I do understand that I can implement such a function myself, but I
prefer not to.

Thanks,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] A where() functions that does what exists() does but return the environment when object lives?

2015-10-13 Thread Henrik Bengtsson
On Sat, Oct 10, 2015 at 1:24 AM, Uwe Ligges
<lig...@statistik.tu-dortmund.de> wrote:
> I'd start looking at getAnywhere().

Thanks Uwe, that does indeed provide "where" information.
Unfortunately, I don't see how it will allow me to search environments
similarly/in the same order as exists/get(..., envir, inherits=TRUE)
do.  getAnywhere() will search everything in any order.  It's a
start though.

Cheers,

Henrik

>
> Best,
> Uwe
>
>
> On 10.10.2015 01:18, Henrik Bengtsson wrote:
>>
>> Hi,
>>
>> exists("foo", inherits=TRUE) check whether an object named "foo"
>> exists, and get("foo", inherits=TRUE) retrieves it.  I'm looking for a
>> similar function to exists() that returns the environment where the
>> object "foo" exists, iff at all.  If not found, NULL is returned.
>> Does that exist?
>>
>> EXAMPLE #1:
>>
>>> sample2 <- base::sample
>>> env <- where("sample2", inherits=TRUE)
>>> env
>>
>> 
>>
>> Note the difference to:
>>
>>> obj <- get("sample2", inherits=TRUE)
>>> environment(obj)
>>
>> 
>>
>>
>> EXAMPLE #2:
>>
>>> a <- 1
>>> foo <- function() { b <- 2; list(a=where("a", inherits=TRUE),
>>> b=where("b", inherits=TRUE)) }
>>> foo()
>>
>> $a
>> 
>> $b
>> 
>>
>>> foo()
>>
>> $a
>> 
>> $b
>> 
>>
>>
>> I do understand that I can implement such a function myself, but I
>> prefer not to.
>>
>> Thanks,
>>
>> Henrik
>>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Rscript --args -e / R --vanilla --slave --args -e opens an interactive R session

2015-09-22 Thread Henrik Bengtsson
When using Rscript -e , (mistakenly) putting --args in front of
-e causes an interactive R session to start that does not quit
automatically and that does not display a prompt. For example,

{hb}: Rscript --vanilla -e "0"
[1] 0

{hb}: Rscript --vanilla -e "0" --args
[1] 0

{hb}: Rscript --vanilla --args -e "0"
1
[1] 1
2
[1] 2
quit("no")

Further troubleshooting narrows this down to:

{hb}: R --vanilla --slave --args -e 0
1
[1] 1
quit('no')

having this problem.
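For comparison, the documented ordering puts the -e expression before --args, after which everything is passed untouched to the script; a sketch (assumes Rscript is on the PATH):

```shell
# expression first, then --args: "foo" and "bar" end up in
# commandArgs(trailingOnly=TRUE) instead of being parsed as R options
Rscript --vanilla -e 'print(commandArgs(trailingOnly = TRUE))' --args foo bar
```

The bug above is that reversing the order silently swallows -e and drops into a prompt-less interactive session rather than reporting a usage error.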

/Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Long vectors: Missing values and R_xlen_t?

2015-09-21 Thread Henrik Bengtsson
On Mon, Sep 21, 2015 at 11:20 AM, Simon Urbanek
<simon.urba...@r-project.org> wrote:
>
> On Sep 20, 2015, at 3:06 PM, Henrik Bengtsson <henrik.bengts...@ucsf.edu> 
> wrote:
>
>> Is there a missing value constant defined for R_xlen_t, cf. NA_INTEGER
>> (== R_NaInt == INT_MIN) for int(eger)?  If not, is it correct to
>> assume that missing values should be taken care/tested for before
>> coercing from int or double?
>>
>
> R_xlen_t is type of the vector length (see XLENGTH()) and as such never holds 
> a missing value (since there is no such thing as a missing length). It is 
> *not* a native type for R vectors and therefore there is no official 
> representation of NAs in R_xlen_t.
>
> Although native R vectors can be used as indices, the way it typically works 
> is that the code first checks for NAs in the R vector and only then converts 
> to R_xlen_t, so the NA value is never stored in R_xlen_t even for indexing.
>
> --- cut here, content below is less relevant ---
>
> That said, when converting packages from "legacy" .Call code before long 
> vector support which used asInteger() to convert an index I tend to use this 
> utility for convenience:
>
> static R_INLINE R_xlen_t asLength(SEXP x, R_xlen_t NA) {
> double d;
> if (TYPEOF(x) == INTSXP && LENGTH(x) > 0) {
> int res = INTEGER(x)[0];
> return (res == NA_INTEGER) ? NA : ((R_xlen_t) res);
> }
> d = asReal(x);
> return (R_finite(d)) ? ((R_xlen_t) d) : NA;
> }
>
> Note that this explicitly allows the caller to specify NA representation 
> since it depends on the use - often it's simply 0, other times -1 will do 
> since typically anything negative is equally bad. As noted above, this is not 
> what R itself does, so it's more of a convenience to simplify conversion of 
> legacy code.

Thank you Simon,

all this helped clarify it for me.  It's in line with what I
suspected, but it is really useful to hear it from the "officials".

Cheers,

Henrik

>
> Cheers,
> Simon
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Long vectors: Missing values and R_xlen_t?

2015-09-21 Thread Henrik Bengtsson
Is there a missing value constant defined for R_xlen_t, cf. NA_INTEGER
(== R_NaInt == INT_MIN) for int(eger)?  If not, is it correct to
assume that missing values should be taken care/tested for before
coercing from int or double?

Thank you,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RProfmem output format

2016-06-04 Thread Henrik Bengtsson
I'm picking up this 5-year old thread.

1. About the four memory allocations without a stacktrace

I think the four memory allocations without a stacktrace reported by Rprofmem():

> Rprofmem(); x <- raw(2000); Rprofmem("")
> cat(readLines("Rprofmem.out", n=5, warn=FALSE), sep="\n")
192 :360 :360 :1064 :2040 :"raw"

are due to some initialization of R that is independent of Rprofmem(),
because they can be avoided if one allocates some memory before (in a
fresh R session):

> z <- raw(1000); dummy <- gc()
> Rprofmem(); x <- raw(2000); Rprofmem("")
> cat(readLines("Rprofmem.out", n=5, warn=FALSE), sep="\n")
2040 :"raw"


2. About missing newlines when stacktrace is empty

As a refresher, the problem is that memory allocations with an empty
stacktrace are reported without newlines, i.e.

192 :360 :360 :1064 :2040 :"raw"

The question is why this is not reported as:

192 :
360 :
360 :
1064 :
2040 :"raw"

This was/is because C function R_OutputStackTrace() - part of
src/main/memory.c  - looks like:

static void R_OutputStackTrace(FILE *file)
{
    int newline = 0;
    RCNTXT *cptr;

    for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
        if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
            && TYPEOF(cptr->call) == LANGSXP) {
            SEXP fun = CAR(cptr->call);
            if (!newline) newline = 1;
            fprintf(file, "\"%s\" ",
                    TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
                    "<Anonymous>");
        }
    }
    if (newline) fprintf(file, "\n");
}


Thomas, your last comment was:

> Yes. It's obviously better to always print a newline, and so clearly
> deliberate not to, that I suspect there may have been a good reason.
> If I can't work it out (after my grant deadline this week) I will just
> assume it's wrong.

When I search the code and the commit history
(https://github.com/wch/r-source/commit/3d5eb2a09f2d75893efdc8bbf1c72d17603886a0),
it appears that this was there from the very first commit.  Also,
searching the code for usages of R_OutputStackTrace(), I only find
R_ReportAllocation() and R_ReportNewPage(), both part of of
src/main/memory.c (see below).

static void R_ReportAllocation(R_size_t size)
{
    if (R_IsMemReporting) {
        if (size > R_MemReportingThreshold) {
            fprintf(R_MemReportingOutfile, "%lu :", (unsigned long) size);
            R_OutputStackTrace(R_MemReportingOutfile);
        }
    }
    return;
}

static void R_ReportNewPage(void)
{
    if (R_IsMemReporting) {
        fprintf(R_MemReportingOutfile, "new page:");
        R_OutputStackTrace(R_MemReportingOutfile);
    }
    return;
}


Could it be that when you wrote it you had another usage for
R_OutputStackTrace() in mind as well?  If so, it makes sense that
R_OutputStackTrace() shouldn't output a newline if the stack trace was
empty.  But if the above is the only usage, to me it looks pretty safe
to always add a newline.
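Concretely, assuming the two callers above really are the only users, the change would be to drop the flag and terminate the record unconditionally. A sketch (not compilable standalone, since it uses R's internal RCNTXT machinery; it is the function above with one change):

```c
static void R_OutputStackTrace(FILE *file)
{
    RCNTXT *cptr;

    for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
        if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
            && TYPEOF(cptr->call) == LANGSXP) {
            SEXP fun = CAR(cptr->call);
            fprintf(file, "\"%s\" ",
                    TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) : "<Anonymous>");
        }
    }
    fprintf(file, "\n");  /* always end the line, even for an empty stacktrace */
}
```

With this, the four initialization allocations would each appear on their own line, making the Rprofmem.out format line-oriented and trivially parseable.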

> sessionInfo()
R version 3.3.0 Patched (2016-05-26 r70682)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

/Henrik


On Sun, May 15, 2011 at 1:16 PM, Thomas Lumley  wrote:
> On Mon, May 16, 2011 at 1:02 AM, Hadley Wickham  wrote:
>> So what causes allocations when the call stack is empty?  Something
>> internal?  Does the garbage collector trigger allocations (i.e. could
>> it be caused by moving data to contiguous memory)?
>
> The garbage collector doesn't move anything, it just swaps pointers in
> a linked list.
>
> The lexer, parser, and evaluator all have  to do some work before a
> function context is set up for the top-level function, so I assume
> that's where it is happening.
>
>> Any ideas what the correct thing to do with these memory allocations?
>> Ignore them because they're not really related to the function they're
>> attributed to?  Sum them up?
>>
>>> I don't see why this is done, and I may well be the person who did it
>>> (I don't have svn on this computer to check), but it is clearly
>>> deliberate.
>>
>> It seems like it would be more consistent to always print a newline,
>> and then it would obvious those allocations occurred when the call
>> stack was empty.  This would make parsing the file a little bit
>> easier.
>
> Yes. It's obviously better to always print a newline, and so clearly
> deliberate not to, that I suspect there may have been a good reason.
> If I can't work it out (after my grant deadline this week) I will just
> assume it's wrong.
>
>
>-thomas
>
> --
> Thomas Lumley
> Professor of Biostatistics
> University of Auckland
>


Re: [Rd] Multiple cores are used in simple for loop

2016-01-15 Thread Henrik Bengtsson
On Fri, Jan 15, 2016 at 10:15 AM, Daniel Kaschek
 wrote:
> Dear Martyn,
>
>
> On Fr, Jan 15, 2016 at 4:01 , Martyn Plummer  wrote:
>>
>>
>> Alternatively, you may be able to control the maximum number of threads
>> by setting and exporting an appropriate environment variable depending
>> on what backend you are using, e.g. OPENBLAS_NUM_THREADS or
>> MKL_NUM_THREADS.
>
>
>
> Thanks a lot. Running
>
> export OPENBLAS_NUM_THREADS=1
>
> in the bash before starting R solves both problems!

I don't have such builds, so I can't try it myself, but as an alternative,
is it possible to set this environment variable in ~/.Renviron, or is that
too late in the R startup process?  What about
Sys.setenv(OPENBLAS_NUM_THREADS=1) in ~/.Rprofile?
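For the record, the ~/.Renviron route takes plain key=value lines, read at R startup; a sketch of what that file could contain (assumes an OpenBLAS or MKL build — whether the library reads the variable early enough is exactly the open question above):

```
## in ~/.Renviron -- one variable per line, no 'export', no spaces around '='
OPENBLAS_NUM_THREADS=1
MKL_NUM_THREADS=1
```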

/Henrik

>
>
>
> Cheers,
> Daniel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Best way for rgl's .onLoad to fail?

2016-02-01 Thread Henrik Bengtsson
If I recall it correctly, at least on Linux, rgl only needs X11 when
rgl.useNULL(FALSE).  Is that correct?  If so, I would say dependency
on X11 is optional and therefore you should be able to load the
package even without X11.  Or is it that it still requires X11 libs
but not an X11 server?

My $.02

/Henrik

On Mon, Feb 1, 2016 at 1:44 AM, Duncan Murdoch  wrote:
> On 01/02/2016 4:26 AM, Martin Maechler wrote:
>>>
>>> "BH" == Bryan Hanson 
>>>  on Sun, 31 Jan 2016 09:50:46 -0500 writes:
>>
>>
>>  BH> I think the 2nd option will be more palatable to
>>  BH> inexperienced users, but both do state the important
>>  BH> detail.  Bryan
>>
>>  >> On Jan 30, 2016, at 4:11 PM, Duncan Murdoch
>>  wrote:
>>  >>
>>  >> On OSX and Linux, the rgl package currently requires X11
>>  >> libs to be available for linking.  Recent versions of OSX
>>  >> don't include them by default, so I'd like rgl to fail
>>  >> nicely.
>>  >>
>>  >> Ideally, it will load a library that doesn't need to link
>>  >> to the X11 libs but will still allow WebGL code to work,
>>  >> but that's complicated, so I'd like a stopgap.
>>  >>
>>  >> I can detect that the failure is about to happen, and
>>  >> call stop() in the .onLoad hook, but that gives an ugly
>>  >> message:
>>  >>
>>  >> > library(rgl)
>>  >> Error : .onLoad failed in loadNamespace() for 'rgl', details:
>>  >> call: NULL
>>  >> error: X11 not found; XQuartz (from www.xquartz.org) is required
>> to run rgl.
>>  >> Error: package or namespace load failed for ‘rgl’
>>
>> I agree that the error message is a bit messy or even ugly,
>> however, other than Bryan, I would want  library(.)  to signal
>> an error when it cannot provide a working package, loaded and
>> attached to search().
>>
>> Other functions, such as  require(.)  do rely on this behavior of
>> library(.),
>> e.g., the much used idiom
>>
>>if(require()) {
>>
>>
>>
>>}
>>
>> needs library() to signal an error on  non-success.
>
>
> Yes, that's a good point.  That's what the version on R-forge currently
> does.
>
> Duncan Murdoch
>
>
>>
>> Martin
>>
>>  >> Alternatively, I can just give a warning and not attempt to load
>> the rgl lib:
>>  >>
>>  >> > library(rgl)
>>  >> Warning message:
>>  >> X11 not found; XQuartz (from www.xquartz.org) is required to run
>> rgl.
>>  >>
>>  >> rgl is now loaded, but it doesn't work; just about any function
>> call will give an error, e.g.
>>  >>
>>  >> > plot3d(1,2,3)
>>  >> Error in rgl.cur() : object 'rgl_dev_getcurrent' not found
>>  >>
>>  >> Do people have opinions about this?
>>  >>
>>  >> One comparable package is RGtk2:  if Gtk2 isn't install, it offers
>> to install it.  I could probably do that for XQuartz.  If the user says no,
>> RGtk2 gives really ugly error messages.  rgl can work without XQuartz, but
>> as I already mentioned, making this work is complicated, so I'd like
>> something simple for now.
>>  >>
>>  >> Duncan Murdoch
>>  >>
>>
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Milestone: 8000 packages on CRAN

2016-02-29 Thread Henrik Bengtsson
Another 1000 packages were added to CRAN, which took less than 7
months. Today (February 29, 2016), the Comprehensive R Archive Network
(CRAN) [1] reports:

“Currently, the CRAN package repository features 8002 available packages.”

The rate with which new packages are added to CRAN is increasing.  In
2014-2015 we had 1000 packages added to CRAN in 355 days (2.8 per
day), the following 1000 packages took 287 days (3.5 per day) and now
the most recent 1000 packages clocked in at an impressive 201 days
(5.0 per day).  Since the start of CRAN 18.9 years ago on April 23,
1997 [2], there has been on average one new package appearing on CRAN
every 20.6 hours - it is actually more frequent than that because
dropped/archived packages are not accounted for. The 8000 packages on
CRAN are maintained by ~4279 people [3].

Thanks to the CRAN team and to all package developers. You can give
back by carefully reporting bugs to the maintainers, properly citing
any packages you use in your publications, cf. citation("pkg name"),
and by helping others use R.

Milestones:

2016-02-29: 8000 packages [this post]
2015-08-12: 7000 packages [11]
2014-10-29: 6000 packages [10]
2013-11-08: 5000 packages [9]
2012-08-23: 4000 packages [8]
2011-05-12: 3000 packages [7]
2009-10-04: 2000 packages [6]
2007-04-12: 1000 packages [5]
2004-10-01: 500 packages [4]
2003-04-01: 250 packages [4]

These data are for CRAN only. There are many more packages elsewhere,
e.g. R-Forge, Bioconductor, Github etc.

[1] http://cran.r-project.org/web/packages/
[2] https://en.wikipedia.org/wiki/R_(programming_language)#Milestones
[3] http://www.r-pkg.org/
[4] Private data
[5] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
[6] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
[7] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
[8] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html
[9] https://stat.ethz.ch/pipermail/r-devel/2013-November/067935.html
[10] https://stat.ethz.ch/pipermail/r-devel/2014-October/069997.html
[11] https://stat.ethz.ch/pipermail/r-package-devel/2015q3/000393.html

Thanks

Henrik
(a long-term fan)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] formals(x)<- drops attributes including class

2016-03-13 Thread Henrik Bengtsson
Just checking in to see whether it is intended or not that assigning
new formals to a function/closure causes any attributes to be dropped:

EXAMPLE:
> fcn <- structure(function() {}, foo="foo", class=c("foo"))
> str(fcn)
function ()
 - attr(*, "srcref")=Class 'srcref'  atomic [1:8] 1 18 1 30 18 30 1 1
  .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' 
 - attr(*, "foo")= chr "foo"
 - attr(*, "class")= chr "foo"

> formals(fcn) <- list(a=1)
> str(fcn)
function (a = 1)
> attributes(fcn)
NULL


TROUBLESHOOTING:
From the definition of formals()<-, it's quite clear why this happens:

> `formals<-`
function (fun, envir = environment(fun), value)
{
bd <- body(fun)
as.function(c(value, if (is.null(bd) || is.list(bd)) list(bd) else bd),
envir)
}




I'm fine with this, but I just wanted to make sure it's not overlooked.

/Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Under Windows, Rgui and Rterm crash if one tries to close the graphic device while identify or locator are running

2016-04-05 Thread Henrik Bengtsson
If of any help,

I can reproduce this (on Windows 7) back to at least R 3.0.3 but it's
not there in R 3.0.0.  (I have *not* checked with R 3.0.1 and 3.0.2
which I don't have installed).

/Henrik


On Tue, Apr 5, 2016 at 8:23 AM, Duncan Murdoch  wrote:
> Thanks, I'll track this down.
>
> Duncan Murdoch
>
>
> On 05/04/2016 9:35 AM, Simone Giannerini wrote:
>>
>> minimal reproducible example
>>
>> plot(1,1)
>> identify(1,1) # or locator()
>>
>> now, trying to close the window by clicking on the cross of the upper
>> right corner causes Rgui (and Rterm) to crash.
>>
>> I see the same behaviour on 2 different Windows PC (one with Win 8.1
>> and one with Win 10).
>> I did not see the problem in linux (see below)
>>
>> WINDOWS **
>> > sessionInfo()
>> R version 3.3.0 beta (2016-04-04 r70420)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> Running under: Windows 10 x64 (build 10586)
>>
>> locale:
>> [1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252
>> [3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
>> [5] LC_TIME=Italian_Italy.1252
>>
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods   base
>>
>> > sessionInfo()
>> R version 3.2.2 Patched (2015-09-29 r69441)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> Running under: Windows 10 x64 (build 10240)
>>
>> locale:
>> [1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252
>> LC_MONETARY=Italian_Italy.1252
>> [4] LC_NUMERIC=C   LC_TIME=Italian_Italy.1252
>>
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods   base
>>
>> loaded via a namespace (and not attached):
>> [1] tools_3.2.2
>>
>>
>> ** LINUX *
>>
>> > sessionInfo()
>> R version 3.2.0 Patched (2015-04-21 r68221)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>> Running under: openSUSE 13.2 (Harlequin) (x86_64)
>>
>> locale:
>>   [1] LC_CTYPE=it_IT.UTF-8   LC_NUMERIC=C
>>   [3] LC_TIME=it_IT.UTF-8LC_COLLATE=it_IT.UTF-8
>>   [5] LC_MONETARY=it_IT.UTF-8LC_MESSAGES=it_IT.UTF-8
>>   [7] LC_PAPER=it_IT.UTF-8   LC_NAME=C
>>   [9] LC_ADDRESS=C   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods   base
>>
>> --
>> __
>>
>> Simone Giannerini
>> Dipartimento di Scienze Statistiche "Paolo Fortunati"
>> Universita' di Bologna
>> Via delle belle arti 41 - 40126  Bologna,  ITALY
>> Tel: +39 051 2098262  Fax: +39 051 232153
>> http://www2.stat.unibo.it/giannerini/
>>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R not responding (must force quit) when saving graphic to PDF (bug?)

2016-04-11 Thread Henrik Bengtsson
I'm not an OS X user, but here are two things you might look into to help you troubleshoot:

1. Because you said it only happens when you try to overwrite an
existing PDF, could it be that there is another process holding onto
(=locking) the PDF file that you're trying to write to?  For instance,
are you viewing it at the same time?

2. What happens if you create a dummy file, dummy.txt, and then try to
overwrite that (with the same name - so no *.pdf)?  Does that work or
not?  If it works, it means there's something special with *.pdf
files.  If not, then ... ?!?.

My $.02

/Henrik

PS. I think Wolfgang's point is that few people will spend time
troubleshooting obsolete versions of R (unless they're on the same
version), meaning the chances of feedback/troubleshooting on this are
much smaller than if the problem could be confirmed with a more
up-to-date version of R.  And if it doesn't give a problem in a new
version, there is nothing really to fix.  Having said this, any
feedback that might help >= 1 person is always useful.


On Mon, Apr 11, 2016 at 2:35 PM, Neil French Collier
 wrote:
> Dear Wolfgang,
>
> Thanks for your response.
>
> No, I haven't tried doing the complete reinstallation as you suggest. If I
> recall correctly, this has happened on previous occasions but like I said
> in my email it's not all that disruptive to my workflow. It's more like a
> 'first world problem'.
>
> I was encouraged to submit the query because I had correspondence with
> someone involved with R development: "this shouldn't be happening" was the
> advice I received. I use R a lot but I'm not deeply knowledgeable about its
> inner workings.
>
> To be frank, I really don't care if this is resolved. I was just trying to
> help the R development group after receiving advice that it might be
> important.
>
> Again, thanks for your help.
>
> Cheers
>
> Neil
>
> On Mon, Apr 11, 2016 at 6:41 PM, Wolfgang Huber  wrote:
>
>> Dear Neil
>>
>> Have you tried completely uninstalling your R, and reinstalling a
>> more recent version (e.g. 3.2.4 right now)?
>> And then the same with your operating system - OS X is currently at
>> 10.11.4.
>> There is likely not much interest (or benefit) in chasing such things in
>> obsolete versions.
>>
>> And I think R-help would be the more appropriate place for this kind of
>> question.
>> Wolfgang
>>
>> > On Apr 11, 2016, at 18:37 GMT+2, Neil French Collier <
>> neilander...@gmail.com> wrote:
>> >
>> > Dear colleagues,
>> >
>> > I wish to report a problem I encounter when trying to save a graphic to
>> > file. When I produce a graphic and try to save it R becomes unresponsive
>> > and I must force quit, and then restart R. The problem occurs when I try
>> to
>> > overwrite an existing graphic: for example when I made changes to the
>> > graphic and want to save the graphic using the original file name. It
>> only
>> > happens when I use the menus to save files, not using script. It doesn't
>> > happen when I save it as a new file. So, like this:
>> >
>> > 1. Make the graphic
>> > 2. Click File -> Save -> click on existing file name
>> > 3. Colour wheel appears and R is unresponsive.
>> >
>> > Reproducible example:
>> >
>> > x <- seq(1,10)
>> > y <- x^2
>> > plot(x,y, type="l") # Save as new file, all fine.
>> >
>> > #Change plot and save as old plot file name:
>> >
>> > plot(x,y, type="l", col=2)
>> >
>> > # Click File -> Save
>> > # 'Save quartz to PDF file' box opens
>> > # Click on existing file name
>> > # Colour wheel
>> >
>> > Here is the sessionInfo():
>> >
>> > R version 3.2.2 (2015-08-14)
>> > Platform: x86_64-apple-darwin13.4.0 (64-bit)
>> > Running under: OS X 10.11.1 (El Capitan)
>> >
>> > locale:
>> > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>> >
>> > attached base packages:
>> > [1] stats graphics  grDevices utils datasets  methods   base
>> >
>> > I'm not sure if this is worthy of reporting but I was encouraged to
>> submit
>> > a report. It doesn't really make a huge impact on my workflow, but it can
>> > be annoying.
>> >
>> > --
>> > Cheers,
>> >
>> > Neil
>> >
>> > Neil French Collier, PhD
>> > Faculty of Sustainability
>> > Leuphana University Lüneburg
>> > Rotenbleicher Weg 67
>> > 21335 Lueneburg
>> > Germany
>> >
>> > Twitter: @foodsecbio
>> > email: coll...@leuphana.de
>> > Google Scholar
>> > <
>> https://scholar.google.com.au/citations?hl=en=xVdc-dsJ_op=list_works=pubdate
>> >
>> > Ideas for Sustainability 
>> >
>> >   [[alternative HTML version deleted]]
>> >
>> > __
>> > R-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> Wolfgang
>>
>> Wolfgang Huber
>> Principal Investigator, EMBL Senior Scientist
>> Genome Biology Unit
>> European Molecular Biology Laboratory (EMBL)
>> Heidelberg, Germany
>>
>> wolfgang.hu...@embl.de
>> http://www.huber.embl.de
>>

Re: [Rd] Milestone: 8000 packages on CRAN

2016-03-01 Thread Henrik Bengtsson
Thank you Gergely - great work.

So on planet CRAN we have a large living population of packages and
some deceased packages of the past ... and then there are packages
that reincarnate one or more times.

I wonder how long it will be before someone makes this into an R data
package? ;)

Cheers,

Henrik

On Tue, Mar 1, 2016 at 12:11 AM, Gergely Daróczi <daroc...@rapporter.net> wrote:
> Thank you very much, Henrik, for maintaining this list -- it's always
> a pleasure to see the ever growing number of useful R packages!
>
> I decided a few times in the past to extend your research with the
> list of archived packages, but did not actually start coding -- until
> tonight: https://gist.github.com/daroczig/3cf06d6db4be2bbe3368 (this
> includes a CSV with 9K rows, so it might be slow to load -- but it's
> worth waiting, as you get a searchable list of package names, dates &
> index)
>
> In short, combining the list of current CRAN packages + the list of
> archived packages results in a list with more than 9,000 R packages by
> now with the following milestones (using the numbers from your
> analysis):
>
> ##  date  index   name
> ##  1: 2016-01-12   9000   dChipIO
> ##  2: 2015-06-30   8000   gkmSVM
> ##  3: 2014-10-22   7000   glmvsd
> ##  4: 2014-02-05   6000   bilan
> ##  5: 2013-03-20   5000   Rgnuplot
> ##  6: 2012-06-21   4000   HIBAG
> ##  7: 2011-04-24   3000   SPECIES
> ##  8: 2009-09-10   2000   maticce
> ##  9: 2007-03-11   1000   cairoDevice
> ## 10: 2005-02-21500   micEcon
> ## 11: 2003-03-19250   polspline
>
> So including the archived packages in this report, 8K was actually
> reached at the time of useR! 2015 :)
>
> Best,
> Gergely
>
> On Mon, Feb 29, 2016 at 12:54 PM, Henrik Bengtsson
> <henrik.bengts...@gmail.com> wrote:
>> Another 1000 packages were added to CRAN, which took less than 7
>> months. Today (February 29, 2016), the Comprehensive R Archive Network
>> (CRAN) [1] reports:
>>
>> “Currently, the CRAN package repository features 8002 available packages.”
>>
>> The rate with which new packages are added to CRAN is increasing.  In
>> 2014-2015 we had 1000 packages added to CRAN in 355 days (2.8 per
>> day), the following 1000 packages took 287 days (3.5 per day) and now
>> the most recent 1000 packages clocked in at an impressive 201 days
>> (5.0 per day).  Since the start of CRAN 18.9 years ago on April 23,
>> 1997 [2], there has been on average one new package appearing on CRAN
>> every 20.6 hours - it is actually more frequent than that because
>> dropped/archived packages are not accounted for. The 8000 packages on
>> CRAN are maintained by ~4279 people [3].
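The day counts and per-day rates quoted above can be double-checked with a short R sketch (milestone dates taken from the list later in this message):

```r
## Sketch: verify the quoted growth rates from the CRAN milestone dates.
milestones <- as.Date(c("2013-11-08",   # 5000 packages
                        "2014-10-29",   # 6000 packages
                        "2015-08-12",   # 7000 packages
                        "2016-02-29"))  # 8000 packages
days <- as.numeric(diff(milestones))
days                   # 355 287 201
round(1000 / days, 1)  # 2.8 3.5 5.0 new packages per day
```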
>>
>> Thanks to the CRAN team and to all package developers. You can give
>> back by carefully reporting bugs to the maintainers, properly citing
>> any packages you use in your publications, cf. citation("pkg name")
>> and by helping out others using R.
>>
>> Milestones:
>>
>> 2016-02-29: 8000 packages [this post]
>> 2015-08-12: 7000 packages [11]
>> 2014-10-29: 6000 packages [10]
>> 2013-11-08: 5000 packages [9]
>> 2012-08-23: 4000 packages [8]
>> 2011-05-12: 3000 packages [7]
>> 2009-10-04: 2000 packages [6]
>> 2007-04-12: 1000 packages [5]
>> 2004-10-01: 500 packages [4]
>> 2003-04-01: 250 packages [4]
>>
>> These data are for CRAN only. There are many more packages elsewhere,
>> e.g. R-Forge, Bioconductor, Github etc.
>>
>> [1] http://cran.r-project.org/web/packages/
>> [2] https://en.wikipedia.org/wiki/R_(programming_language)#Milestones
>> [3] http://www.r-pkg.org/
>> [4] Private data
>> [5] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
>> [6] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
>> [7] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
>> [8] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html
>> [9] https://stat.ethz.ch/pipermail/r-devel/2013-November/067935.html
>> [10] https://stat.ethz.ch/pipermail/r-devel/2014-October/069997.html
>> [11] https://stat.ethz.ch/pipermail/r-package-devel/2015q3/000393.html
>>
>> Thanks
>>
>> Henrik
>> (a long-term fan)
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Is it possible to increase MAX_NUM_DLLS in future R releases?

2016-05-10 Thread Henrik Bengtsson
Isn't the problem in Qin's example that unloadNamespace("scde") only
unloads 'scde' but none of its package dependencies that were loaded
when 'scde' was loaded?  For example:

$ R --vanilla
> ns0 <- loadedNamespaces()
> dlls0 <- getLoadedDLLs()

> packageDescription("scde")[c("Depends", "Imports")]
$Depends
[1] "R (>= 3.0.0), flexmix"

$Imports
[1] "Rcpp (>= 0.10.4), RcppArmadillo (>= 0.5.400.2.0), mgcv, Rook, rjson, MASS,
 Cairo, RColorBrewer, edgeR, quantreg, methods, nnet, RMTstat, extRemes, pcaMet
hods, BiocParallel, parallel"

> loadNamespace("scde")
> ns1 <- loadedNamespaces()
> dlls1 <- getLoadedDLLs()

> nsAdded <- setdiff(ns1, ns0)
> nsAdded
 [1] "flexmix"   "Rcpp"  "edgeR" "splines"
 [5] "BiocGenerics"  "MASS"  "BiocParallel"  "scde"
 [9] "lattice"   "rjson" "brew"  "RcppArmadillo"
[13] "minqa" "distillery""car"   "tools"
[17] "Rook"  "Lmoments"  "nnet"  "parallel"
[21] "pbkrtest"  "RMTstat"   "grid"  "Biobase"
[25] "nlme"  "mgcv"  "quantreg"  "modeltools"
[29] "MatrixModels"  "lme4"  "Matrix""nloptr"
[33] "RColorBrewer"  "extRemes"  "limma" "pcaMethods"
[37] "stats4""SparseM"   "Cairo"

> dllsAdded <- setdiff(names(dlls1), names(dlls0))
> dllsAdded
 [1] "Cairo" "parallel"  "limma" "edgeR"
 [5] "MASS"  "rjson" "Rcpp"  "grid"
 [9] "lattice"   "Matrix""SparseM"   "quantreg"
[13] "nnet"  "nlme"  "mgcv"  "Biobase"
[17] "pcaMethods""splines"   "minqa" "nloptr"
[21] "lme4"  "extRemes"  "RcppArmadillo" "tools"
[25] "Rook"  "scde"


If you unload these namespaces, I think the DLLs will also be
detached; or at least they should be if packages implement an .onUnload()
with a dyn.unload().  More on this below.
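As a minimal sketch of that pattern (the package name 'mypkg' is hypothetical), a package that loads a DLL can release it from its .onUnload() hook:

```r
## Hypothetical package 'mypkg': unload its DLL when the namespace is
## unloaded, so that unloadNamespace("mypkg") also frees the DLL slot.
.onUnload <- function(libpath) {
  library.dynam.unload("mypkg", libpath)
}
```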


To unload these added namespaces (with DLLs), they have to be
unloaded in an order that does not break the dependency graph of the
currently loaded packages, because otherwise you'll get errors such
as:

> unloadNamespace("quantreg")
Error in unloadNamespace("quantreg") :
  namespace 'quantreg' is imported by 'car', 'scde' so cannot be unloaded

I don't know if there exists a function that unloads the namespaces in
the proper order, but here is a brute-force version:

unloadNamespaces <- function(ns, ...) {
  while (length(ns) > 0) {
ns0 <- loadedNamespaces()
for (name in ns) {
  try(unloadNamespace(name), silent=TRUE)
}
ns1 <- loadedNamespaces()
## No namespace was unloaded?
if (identical(ns1, ns0)) break
ns <- intersect(ns, ns1)
  }
  if (length(ns) > 0) stop("Failed to unload namespace: ",
paste(sQuote(ns), collapse=", "))
} # unloadNamespaces()


When I run the above on R 3.3.0 patched on Windows, I get:

> unloadNamespaces(nsAdded)
now dyn.unload("C:/Users/hb/R/win-library/3.3/scde/libs/x64/scde.dll") ...
> ns2 <- loadedNamespaces()
> dlls2 <- getLoadedDLLs()
> ns2
[1] "grDevices" "utils" "stats" "datasets"  "base"  "graphics"
[7] "methods"
> identical(sort(ns2), sort(ns0))
[1] TRUE


However, there are some namespaces for which the DLLs are still loaded:

> sort(setdiff(names(dlls2), names(dlls0)))
 [1] "Cairo" "edgeR" "extRemes"  "minqa"
 [5] "nloptr""pcaMethods""quantreg"  "Rcpp"
 [9] "RcppArmadillo" "rjson" "Rook"  "SparseM"


If we look for .onUnload() in packages that load DLLs, we find that
the following packages do not have an .onUnload() and therefore
probably never call dyn.unload() when the package is unloaded:

> sort(dllsAdded[!sapply(dllsAdded, FUN=function(pkg) {
+   ns <- getNamespace(pkg)
+   exists(".onUnload", envir=ns, inherits=FALSE)
+ })])
 [1] "Cairo" "edgeR" "extRemes"  "minqa"
 [5] "nloptr""pcaMethods""quantreg"  "Rcpp"
 [9] "RcppArmadillo" "rjson" "Rook"  "SparseM"


That doesn't look like a coincidence to me.  Maybe `R CMD check` should,
in addition to checking that the namespace of a package can be
unloaded, also assert that it unloads whatever DLLs the package loads.
Something like:

* checking whether the namespace can be unloaded cleanly ... WARNING
  Unloading the namespace does not unload DLL

At least I don't think this is tested for, e.g.
https://cran.r-project.org/web/checks/check_results_Cairo.html and
https://cran.r-project.org/web/checks/check_results_Rcpp.html.

/Henrik


On Mon, May 9, 2016 at 11:57 PM, Martin Maechler
 wrote:
>> Qin Zhu 
>> on Fri, 6 May 2016 11:33:37 -0400 writes:
>
> > Thanks for all your great answers.
> > The app I’m working on is indeed an exploratory data analysis tool for 
> gene expression, which requires a bunch of bioconductor packages.
>
> > I guess for now, my best solution is to divide my app into modules and 
> load/unload packages 

[Rd] R devel: install.packages(..., type="both") not supported on Windows

2016-05-14 Thread Henrik Bengtsson
Is the following intentional or something that has been overlooked?

[HB-X201]{hb}: R --vanilla

R Under development (unstable) (2016-05-13 r70616) -- "Unsuffered Consequences"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

[...]

## Note that "source" is the built-in default
> getOption("pkgType")
[1] "source"

## Trying with 'both'
> install.packages("MASS", type="both")
Installing package into 'C:/Users/hb/R/win-library/3.4'
(as 'lib' is unspecified)
Error in install.packages("MASS") :
  type == "both" can only be used on Windows or a CRAN build for Mac OS X

## But 'win.binary' works
> install.packages("MASS", type="win.binary")
Installing package into 'C:/Users/hb/R/win-library/3.4'
(as 'lib' is unspecified)
trying URL 'https://cran.r-project.org/bin/windows/contrib/3.4/MASS_7.3-45.zip'
Content type 'application/zip' length 1088567 bytes (1.0 MB)
downloaded 1.0 MB

package 'MASS' successfully unpacked and MD5 sums checked


> sessionInfo()
R Under development (unstable) (2016-05-13 r70616)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

/Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Is it possible to retrieve the last error? (not error *message*)

2016-05-05 Thread Henrik Bengtsson
Thanks.

As mentioned in that Stackoverflow thread, this requires re-evaluation
of the problematic code, which may or may not work (in addition to
taking time).

The closest I get to a solution, but which also requires being
proactive, is to use options(error=...) to record the condition
signaled by stop().  However, contrary to try()/tryCatch(), this is an
option that can be on all the time.  It can be automatically enabled
by setting it in for instance .Rprofile.

## This can be placed in .Rprofile
local({
  recordStop <- function(...) {
## Find the stop() frame
frames <- sys.frames()
args <- names(formals(base::stop))
isStop <- lapply(frames, FUN=function(f) all(args %in% names(f)))
idx <- which(unlist(isStop))[1]
frame <- frames[[idx]]

## Was stop() called with a condition or a message?
vars <- names(frame)
if ("cond" %in% vars) {
  .Last.error <- frame$cond
} else {
  msg <- eval(quote(.makeMessage(..., domain=domain)), envir=frame)
  call <- if (frame$call.) sys.calls()[[1]] else NULL
  .Last.error <- simpleError(msg, call=call)
}

assign(".Last.error", .Last.error, envir=.GlobalEnv)
  } ## recordStop()

  options(error=recordStop)
})


Then it can be used as:

## Requires options(error=recordStop)

## stop() at the prompt
stop("Hello")
## Error: Hello
print(.Last.error)
## <simpleError: Hello>
str(.Last.error)
# List of 3
#  $ message: chr "woops"
#  $ call   : NULL
#  $ value  : num 2
#  - attr(*, "class")= chr [1:4] "MyError" "simpleError" "error" "condition"


## stop() in a function
foo <- function() stop("woops")
ex <- tryCatch(foo(), error = function(ex) ex)
print(ex)
## <simpleError in foo(): woops>
foo()
## Error in foo() : woops
print(.Last.error)
## <simpleError in foo(): woops>
## Assert identical results
stopifnot(all.equal(.Last.error, ex))


## stop() in a nested call
bar <- function() foo()
ex <- tryCatch(bar(), error = function(ex) ex)
# <simpleError in foo(): woops>
bar()
# Error in foo() : woops
print(.Last.error)
# <simpleError in foo(): woops>
## Assert identical results
stopifnot(all.equal(.Last.error, ex))


## A custom error class
MyError <- function(..., value=0) {
  ex <- simpleError(...)
  ex$value <- value
  class(ex) <- c("MyError", class(ex))
  ex
}


## stop() from prompt
err <- MyError("woops", value=1L)
ex <- tryCatch(stop(err), error = function(ex) ex)
print(ex)
# <MyError: woops>
stop(err)
## Error: woops
print(.Last.error)
# <MyError: woops>
## Assert identical results
stopifnot(all.equal(.Last.error, ex))

## stop() in a function
yo <- function(value=1) stop(MyError("woops", value=value))
ex <- tryCatch(yo(), error = function(ex) ex)
print(ex)
# <MyError: woops>
yo()
# Error: woops
print(.Last.error)
# <MyError: woops>
## Assert identical results
stopifnot(all.equal(.Last.error, ex))

## stop() in a nested call
yeah <- function(value=2) yo(value=value)
ex <- tryCatch(yeah(), error = function(ex) ex)
print(ex)
# <MyError: woops>
yeah()
# Error: woops
print(.Last.error)
# <MyError: woops>
stopifnot(all.equal(.Last.error, ex))
str(.Last.error)
# List of 3
#  $ message: chr "woops"
#  $ call   : NULL
#  $ value  : num 2
#  - attr(*, "class")= chr [1:4] "MyError" "simpleError" "error" "condition"


/Henrik


On Wed, May 4, 2016 at 11:59 PM, Richard Cotton <richiero...@gmail.com> wrote:
> I wondered the same thing a few days ago.
>
> https://stackoverflow.com/questions/36966036/how-to-get-the-last-error
>
> The here's the solution from that discussion:
>
> get_last_error <- function()
> {
>   tr <- .traceback()
>   if(length(tr) == 0)
>   {
> return(NULL)
>   }
>   tryCatch(eval(parse(text = tr[[1]])), error = identity)
> }
>
> Note that it uses .traceback() from R 3.3.0; you'll have to use
> baseenv()$.Traceback with earlier version of R.
>
> On 4 May 2016 at 22:41, Henrik Bengtsson <henrik.bengts...@gmail.com> wrote:
>> Hi,
>>
>> at the R prompt, is it possible to retrieve the last error (as in
>> condition object of class "error")?
>>
>> I'm not asking for geterrmessage(), which only returns the error
>> message (as a character string).  I'm basically looking for a
>> .Last.error or .Last.condition, analogously to .Last.value for values,
>> which can be used when it is "too late" (not possible) to go back and
>> use try()/tryCatch().
>>
>> Thanks,
>>
>> Henrik
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
> --
> Regards,
> Richie
>
> Learning R
> 4dpiecharts.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Is it possible to retrieve the last error? (not error *message*)

2016-05-04 Thread Henrik Bengtsson
On Wed, May 4, 2016 at 1:27 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On May 4, 2016, at 12:41 PM, Henrik Bengtsson <henrik.bengts...@gmail.com> 
>> wrote:
>>
>> Hi,
>>
>> at the R prompt, is it possible to retrieve the last error (as in
>> condition object of class "error")?
>>
>> I'm not asking for geterrmessage(), which only returns the error
>> message (as a character string).  I'm basically looking for a
>> .Last.error or .Last.condition, analogously to .Last.value for values,
>> which can be used when it is "too late" (not possible) to go back and
>> use try()/tryCatch().
>
> After looking at the code for the exposed `traceback`
>>  I'm wondering if this delivers what you expect:
>
> .traceback()[1]

Thanks, but unfortunately not:

> stop("Hello")
Error: Hello
> ex <- .traceback()[1]
> str(ex)
List of 1
 $ : chr "stop(\"Hello\")"
> inherits(ex, "condition")
[1] FALSE


I'm looking for something that returns the object of class condition, cf.

> ex <- attr(try(stop("Hello")), "condition")
Error in try(stop("Hello")) : Hello
> str(ex)
List of 2
 $ message: chr "Hello"
 $ call   : language doTryCatch(return(expr), name, parentenv, handler)
 - attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
> inherits(ex, "condition")
[1] TRUE


The reason is that this object may contain additional information that
is not available in the message nor the call.  For instance, a package
may define a richer error class that captures/carries more information
about an error (e.g. a time stamp, a remote session information, ...)
and which is therefore not available via neither geterrmessage() nor
traceback().

I am aware that I might be asking for something that is not supported
and requires that the default signal handlers be modified.

/Henrik

>
>
>> Thanks,
>>
>> Henrik
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> David Winsemius
> Alameda, CA, USA
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Is it possible to retrieve the last error? (not error *message*)

2016-05-04 Thread Henrik Bengtsson
Hi,

at the R prompt, is it possible to retrieve the last error (as in
condition object of class "error")?

I'm not asking for geterrmessage(), which only returns the error
message (as a character string).  I'm basically looking for a
.Last.error or .Last.condition, analogously to .Last.value for values,
which can be used when it is "too late" (not possible) to go back and
use try()/tryCatch().

Thanks,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Closing a fifo() on Windows core dumps R (3.2.5, 3.3.0 RC and devel)

2016-05-01 Thread Henrik Bengtsson
The following core dumps R 3.2.5, R 3.3.0 RC and R devel on Windows.
I have tried to use a minimal setup (for all versions tested), i.e.

C:\> cd C:\
C:\> set PATH=C:\PROGRA~1\R\R-33~1.0RC\bin
C:\> set R_DEFAULT_PACKAGES=base

C:\> R --quiet --vanilla
> close(fifo("foo.tmp", open="wb"))
[core dump]


C:\> R --quiet --vanilla
> con <- fifo('foo.tmp', open='wb')
> print(con)
description   classmodetext  openedcan read
   "\v"  "fifo""wb"  "text""opened""no"
  can write
  "yes"
> close(con)


C:\>R --quiet --vanilla -e "close(fifo('foo.tmp', open='wb'))"
> close(fifo('foo.tmp', open='wb'))
[core dump]


C:\>R --quiet --vanilla -e "fh <- fifo('foo.tmp', open='wb');
print(fh); close(fh)"
> fh <- fifo('foo.tmp', open='wb'); print(fh); close(fh)
description   classmodetext  openedcan read
 "\016"  "fifo""wb"  "text""opened""no"
  can write
  "yes"
[core dump]

C:\>R --quiet --vanilla -e "fifo('foo.tmp', open='wb'); closeAllConnections()"
> fifo('foo.tmp', open='wb'); closeAllConnections()
description   classmodetext  openedcan read
 "\016"  "fifo""wb"  "text""opened""no"
  can write
  "yes"
[core dump]

C:\>Rscript --vanilla -e "close(fifo('foo.tmp', open='wb'))"
[core dump]

C:\> Rscript --vanilla -e "con <- fifo('foo.tmp', open='wb');
print(con); close(con)"
description   classmodetext  openedcan read
   "\f"  "fifo""wb"  "text""opened""no"
  can write
  "yes"
[core dump]


It doesn't core dump every time, but quite often. When I get it to core
dump once I can often repeat it several times.  And when it doesn't
core dump, it seems to work for quite a while.

Note the random values of the `description` when it core dumps
(spurious memory mapping?).  These are reproducible between core dumps
and I've even seen them across different cmd.exe sessions.  When it does
*not* core dump, I typically see "foo.tmp" (as expected), but I've
also seen "con" (sic!).


I've got it to core dump with the following versions of R:

C:\>Rscript -e "utils::sessionInfo()"
R version 3.2.5 (2016-04-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

C:\>R --version
R version 3.3.0 RC (2016-04-28 r70564) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

all with session info:

C:\>Rscript --vanilla -e "utils::sessionInfo()"
R version { version }
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] base

loaded via a namespace (and not attached):
[1] utils_3.3.0



Can someone else reproduce this?  I'd be happy to file a formal bug report.


Thanks

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Closing a fifo() on Windows core dumps R (3.2.5, 3.3.0 RC and devel)

2016-05-01 Thread Henrik Bengtsson
Sorry for the version mess; here are the versions I've got installed
and on which I can reproduce the core dump:

C:\>R --version
R version 3.2.5 (2016-04-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

C:\>R --version
R version 3.3.0 RC (2016-04-28 r70564) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

C:\>R --version
R Under development (unstable) (2016-04-29 r70564) -- "Unsuffered Consequences"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

/Henrik

On Sun, May 1, 2016 at 7:08 PM, Henrik Bengtsson
<henrik.bengts...@gmail.com> wrote:
> The following core dumps R 3.2.5, R 3.3.0 RC and R devel on Windows.
> I have tried to use a minimal setup (for all versions tested), i.e.
>
> C:\> cd C:\
> C:\> set PATH=C:\PROGRA~1\R\R-33~1.0RC\bin
> C:\> set R_DEFAULT_PACKAGES=base
>
> C:\> R --quiet --vanilla
>> close(fifo("foo.tmp", open="wb"))
> [core dump]
>
>
> C:\> R --quiet --vanilla
>> con <- fifo('foo.tmp', open='wb')
>> print(con)
> description   classmodetext  openedcan read
>"\v"  "fifo""wb"  "text""opened""no"
>   can write
>   "yes"
>> close(con)
>
>
> C:\>R --quiet --vanilla -e "close(fifo('foo.tmp', open='wb'))"
>> close(fifo('foo.tmp', open='wb'))
> [core dump]
>
>
> C:\>R --quiet --vanilla -e "fh <- fifo('foo.tmp', open='wb');
> print(fh); close(fh)"
>> fh <- fifo('foo.tmp', open='wb'); print(fh); close(fh)
> description   classmodetext  openedcan read
>  "\016"  "fifo""wb"  "text""opened""no"
>   can write
>   "yes"
> [core dump]
>
> C:\>R --quiet --vanilla -e "fifo('foo.tmp', open='wb'); closeAllConnections()"
>> fifo('foo.tmp', open='wb'); closeAllConnections()
> description   classmodetext  openedcan read
>  "\016"  "fifo""wb"  "text""opened""no"
>   can write
>   "yes"
> [core dump]
>
> C:\>Rscript --vanilla -e "close(fifo('foo.tmp', open='wb'))"
> [core dump]
>
> C:\> Rscript --vanilla -e "con <- fifo('foo.tmp', open='wb');
> print(con); close(con)"
> description   classmodetext  openedcan read
>"\f"  "fifo""wb"  "text""opened""no"
>   can write
>   "yes"
> [core dump]
>
>
> It doesn't core dump every time, but quite often. When I get it to core
> dump once I can often repeat it several times.  And when it doesn't
> core dump, it seems to work for quite a while.
>
> Note the random values of the `description` when it core dumps
> (spurious memory mapping?).  These are reproducible between core dumps
> and I've even seen them across different cmd.exe sessions.  When it does
> *not* core dump, I typically see "foo.tmp" (as expected), but I've
> also seen "con" (sic!).
>
>
> I've got it to core dump with the following versions of R:
>
> C:\>Rscript -e "utils::sessionInfo()"
> R version 3.2.5 (2016-04-14)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> C:\>R --version
> R version 3.3.0 RC (2016-04-28 r70564) -- "Supposedly Educational"
> Copyright (C) 2016 The R Foundation for Statistical Computing
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> all with session info:
>
> C:\>Rscript --vanilla -e "utils::sessionInfo()"
> R version { version }
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] base
>
> loaded via a namespace (and not attached):
> [1] utils_3.3.0
>
>
>
> Can someone else reproduce this?  I'd be happy to file a formal bug report.
>
>
> Thanks
>
> Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] S3 dispatch for S4 subclasses only works if variable "extends" is accessible from global environment

2016-04-19 Thread Henrik Bengtsson
On Tue, Apr 19, 2016 at 9:21 AM, Hadley Wickham  wrote:
>
> This might be too big a change - but is it worth reconsidering the
> behaviour of Rscript? Maybe the simplest fix would be simply to always
> load the methods package.  (I think historically it didn't because
> loading methods took a long time, but that is no longer true)

Slightly weaker version of this wish (that would also remove
confusion): At least make R and Rscript load the same set of packages
by default.


More clarification (in case someone is new to this topic):

The packages loaded by default when R or Rscript starts can be
controlled by environment variable 'R_DEFAULT_PACKAGES' and/or option
'defaultPackages', cf. help("Startup").  When this is empty or
undefined, the built-in defaults kick in, and it's these built-in
defaults that differ between the R and the Rscript executable:

$ R --quiet --vanilla -e "getOption('defaultPackages')"
> getOption('defaultPackages')
[1] "datasets"  "utils" "grDevices" "graphics"  "stats" "methods"

$ Rscript --vanilla -e "getOption('defaultPackages')"
[1] "datasets"  "utils" "grDevices" "graphics"  "stats"

Thus, a user can enforce the same set of default packages by using:

$ export R_DEFAULT_PACKAGES=datasets,utils,grDevices,graphics,stats,methods

$ R --quiet --vanilla -e "getOption('defaultPackages')"
> getOption('defaultPackages')
[1] "datasets"  "utils" "grDevices" "graphics"  "stats" "methods"

$ Rscript --vanilla -e "getOption('defaultPackages')"
[1] "datasets"  "utils" "grDevices" "graphics"  "stats" "methods"
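Equivalently, the option can be set from R code, e.g. in a .Rprofile (a sketch; the option is only consulted during startup, cf. help("Startup")):

```r
## Sketch for a .Rprofile: make R and Rscript load the same default
## packages by setting the full set explicitly.
pkgs <- c("datasets", "utils", "grDevices", "graphics", "stats", "methods")
options(defaultPackages = pkgs)
```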

/Henrik

>
> Hadley
>
> On Tue, Apr 19, 2016 at 10:37 AM, Gabriel Becker  wrote:
> > Does it make sense to be able to load an S4 object without the methods
> > package being attached? I'm not sure implementation-wise how easy this
> > would be, but it seems like any time there is an S4 object around, the
> > methods package should be available to deal with it.
> >
> > ~G
> >
> > On Tue, Apr 19, 2016 at 7:34 AM, Michael Lawrence wrote:
> >
> >> Right, R_has_methods_attached() uses that. Probably not the right
> >> check, since it refers to S4 dispatch, while S4_extends() is used by
> >> S3 dispatch.
> >>
> >> Perhaps S4_extends() should force load the methods package? The above
> >> example works after fixing the check to ensure that R_MethodsNamespace
> >> is not R_GlobalEnv, but one could load a serialized S4 object and
> >> expect S3 dispatch to work with Rscript.
> >>
> >> On Tue, Apr 19, 2016 at 6:51 AM, Gabriel Becker 
> >> wrote:
> >> > See also .isMethodsDispatchOn, which is what trace uses to decide if the
> >> > methods package needs to be loaded.
> >> >
> >> > ~G
> >> >
> >> > On Tue, Apr 19, 2016 at 5:34 AM, Michael Lawrence
> >> >  wrote:
> >> >>
> >> >> Not sure why R_has_methods_attached() exists. Maybe Martin could shed
> >> >> some light on that.
> >> >>
> >> >> On Mon, Apr 18, 2016 at 11:50 PM, Kirill Müller
> >> >>  wrote:
> >> >> > Thanks for looking into it, your approach sounds good to me. See also
> >> >> > R_has_methods_attached()
> >> >> >
> >> >> > (
> >> https://github.com/wch/r-source/blob/42ecf5f492a005f5398cbb4c9becd4aa5af9d05c/src/main/objects.c#L258-L265
> >> ).
> >> >> >
> >> >> > I'm fine with Rscript not loading "methods", as long as everything
> >> works
> >> >> > properly with "methods" loaded but not attached.
> >> >> >
> >> >> >
> >> >> > -Kirill
> >> >> >
> >> >> >
> >> >> >
> >> >> > On 19.04.2016 04:10, Michael Lawrence wrote:
> >> >> >>
> >> >> >> Right, the methods package is not attached by default when running R
> >> >> >> with Rscript. We should probably remove that special case, as it
> >> >> >> mostly just leads to confusion, but that won't happen immediately.
> >> >> >>
> >> >> >> For now, the S4_extends() should probably throw an error when the
> >> >> >> methods namespace is not loaded. And the check should be changed to
> >> >> >> directly check whether R_MethodsNamespace has been set to something
> >> >> >> other than the default (R_GlobalEnv). Agreed?
> >> >> >>
> >> >> >> On Mon, Apr 18, 2016 at 4:35 PM, Kirill Müller
> >> >> >>  wrote:
> >> >> >>>
> >> >> >>> Scenario: An S3 method is declared for an S4 base class but called
> >> for
> >> >> >>> an
> >> >> >>> instance of a derived class.
> >> >> >>>
> >> >> >>> Steps to reproduce:
> >> >> >>>
> >> >>  Rscript -e "test <- function(x) UseMethod('test', x); test.Matrix
> >> <-
> >> >>  function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())"
> >> >> >>>
> >> >> >>> Error in UseMethod("test", x) :
> >> >> >>>no applicable method for 'test' applied to an object of class
> >> >> >>> "lsyMatrix"
> >> >> >>> Calls: 
> >> >> >>> 1: MatrixDispatchTest::test(Matrix::Matrix())
> >> >> >>>
> >> >>  Rscript -e "extends <- 42; test <- function(x) 

[Rd] Is .packageName part of the official API?

2016-07-12 Thread Henrik Bengtsson
Hi, I've seen that some packages use .packageName internally to infer
their own name.  Is that officially supported?  I could not find it
documented anywhere.

There's utils::packageName(), which internally looks for .packageName.
However, if the latter is not found, it may return NULL whereas an
error would be more appropriate if a package name is expected.  Using
.packageName would give an error if it does not exist.  Also, which is
minor, using packageName() would add an explicit dependency on the utils
package whereas .packageName doesn't.
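A middle ground is a small wrapper around the documented
utils::packageName() that fails loudly instead of returning NULL; a
hedged sketch (pkg_name is a hypothetical helper, not an official API):

```r
## Sketch: resolve the calling package's name, erroring rather than
## returning NULL when it cannot be inferred.
pkg_name <- function(env = parent.frame()) {
  name <- utils::packageName(env)
  if (is.null(name))
    stop("unable to infer a package name from this environment")
  name
}
```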

So, should I use .packageName or utils::packageName() for this?

Thanks

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-26 Thread Henrik Bengtsson
On a related note, the storage mode should try to match ans[[1]] (or
unlist:ed ans) when allocating 'ansmat' to avoid coercion and hence a full
copy.

Henrik


On Jan 26, 2017 07:50, "William Dunlap via R-devel" 
wrote:

It would be cool if the default for tapply's init.value could be
FUN(X[0]), so it would be 0 for FUN=sum or FUN=length, TRUE for
FUN=all, -Inf for FUN=max, etc.  But that would take time and would
break code for which FUN did not work on length-0 objects.
Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Jan 26, 2017 at 2:42 AM, Martin Maechler
 wrote:
> Last week, we've talked here about "xtabs(), factors and NAs",
>  ->  https://stat.ethz.ch/pipermail/r-devel/2017-January/073621.html
>
> In the mean time, I've spent several hours on the issue
> and also committed changes to R-devel "in two iterations".
>
> In the case there is a *Left* hand side part to xtabs() formula,
> see the help page example using 'esoph',
> it uses  tapply(...,  FUN = sum)   and
> I now think there is a missing feature in tapply() there, which
> I am proposing to change.
>
> Look at a small example:
>
>> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]),
N=3)[-c(1,5), ]; xtabs(~., D2)
> , , N = 3
>
>L
> n   A B C D E F
>   1 1 2 0 0 0 0
>   2 0 0 1 2 0 0
>   3 0 0 0 0 2 2
>
>> DN <- D2; DN[1,"N"] <- NA; DN
>n L  N
> 2  1 A NA
> 3  1 B  3
> 4  1 B  3
> 6  2 C  3
> 7  2 D  3
> 8  2 D  3
> 9  3 E  3
> 10 3 E  3
> 11 3 F  3
> 12 3 F  3
>> with(DN, tapply(N, list(n,L), FUN=sum))
>A  B  C  D  E  F
> 1 NA  6 NA NA NA NA
> 2 NA NA  3  6 NA NA
> 3 NA NA NA NA  6  6
>>
>
> and as you can see, the resulting matrix has NAs, all the same
> NA_real_, but semantically of two different kinds:
>
> 1) at ["1", "A"], the  NA  comes from the NA in 'N'
> 2) all other NAs come from the fact that there is no such factor
combination
>*and* from the fact that tapply() uses
>
>array(dim = .., dimnames = ...)
>
> i.e., initializes the array with NAs  (see definition of 'array').
>
> My proposition is the following patch to  tapply(), adding a new
> option 'init.value':
>
> --------------------------------------------------------------------
>
> -tapply <- function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
> +tapply <- function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify
= TRUE)
>  {
>  FUN <- if (!is.null(FUN)) match.fun(FUN)
>  if (!is.list(INDEX)) INDEX <- list(INDEX)
> @@ -44,7 +44,7 @@
>  index <- as.logical(lengths(ans))  # equivalently, lengths(ans) > 0L
>  ans <- lapply(X = ans[index], FUN = FUN, ...)
>  if (simplify && all(lengths(ans) == 1L)) {
> -   ansmat <- array(dim = extent, dimnames = namelist)
> +   ansmat <- array(init.value, dim = extent, dimnames = namelist)
> ans <- unlist(ans, recursive = FALSE)
>  } else {
> ansmat <- array(vector("list", prod(extent)),
>
> --------------------------------------------------------------------
>
> With that, I can set the initial value to '0' instead of array's
> default of NA :
>
>> with(DN, tapply(N, list(n,L), FUN=sum, init.value=0))
>A B C D E F
> 1 NA 6 0 0 0 0
> 2  0 0 3 6 0 0
> 3  0 0 0 0 6 6
>>
>
> which now has 0 counts and NA  as is desirable to be used inside
> xtabs().
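For comparison, in an unpatched R a similar result can be emulated
after the fact by overwriting only the structural NAs, i.e. the cells
whose factor combination has zero observations; a sketch using the
'DN' example above:

```r
## Emulate 'init.value = 0' without patching tapply(): genuine NAs
## (from NA data) are kept, empty-cell NAs are replaced by 0.
DN <- data.frame(n = gl(3, 4), L = gl(6, 2, labels = LETTERS[1:6]),
                 N = 3)[-c(1, 5), ]
DN[1, "N"] <- NA
m <- with(DN, tapply(N, list(n, L), FUN = sum))
counts <- with(DN, table(n, L))
m[is.na(m) & counts == 0] <- 0
m  # ["1","A"] stays NA; all empty combinations become 0
```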
>
> All fine... and would not be worth a posting to R-devel,
> except for this:
>
> The change will not be 100% back compatible -- by necessity: any new
argument for
> tapply() will make that argument name not available to be
> specified (via '...') for 'FUN'.  The new function would be
>
>> str(tapply)
> function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)
>
> where the '...' are passed to FUN(), and with the new signature,
> 'init.value' then won't be passed to FUN  "anymore" (compared to
> R <= 3.3.x).
>
> For that reason, we could use   'INIT.VALUE' instead (possibly decreasing
> the probability the arg name is used in other functions).
>
>
> Opinions?
>
> Thank you in advance,
> Martin
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] cross-platform portable code in CRAN Repository Policy

2017-01-27 Thread Henrik Bengtsson
Second this.  As the CRAN Repository Policy suggests, there's also the very
handy winbuilder service (https://win-builder.r-project.org/) you can
use to check your package on Windows.  This service has been a
valuable workhorse for years.

We should also mention the continuous integration (CI) services
provided for free by Travis (Linux and macOS) and AppVeyor (Windows)
in combination with GitHub (or GitLab, ...).  By adding simple
.travis.yml and appveyor.yml to your Git repos (e.g.
https://github.com/HenrikBengtsson/globals), they run R CMD check
--as-cran and covr::package_coverage() etc for you more or less on the
fly, e.g.

* https://travis-ci.org/HenrikBengtsson/globals
* https://ci.appveyor.com/project/HenrikBengtsson/globals

/Henrik

PS. Thanks to everyone who made all of the above possible.

On Fri, Jan 27, 2017 at 2:17 PM, Dirk Eddelbuettel  wrote:
>
> On 27 January 2017 at 21:54, Gábor Csárdi wrote:
> | On Fri, Jan 27, 2017 at 9:28 PM, Da Zheng  wrote:
> | > What major R platforms does this policy refer to?
> | >
> |
> | Linux, macOS, Windows.
> |
> |
> | > Currently, my package runs in Ubuntu. If it works on both Ubuntu and
> | > Redhat, does it count as two platforms?
> | >
> |
> | I think that Linux is just one. Is it hard to make it work on macOS?
> |
> | I am not saying that if it is Linux-only then it definitely cannot make it
> | to CRAN.
> | A CRAN maintainer will decide that.
>
> Gabor is *way* too modest here to not mention the *fabulous* tool he has
> written (with the [financial] support of the R Consortium):  R Hub.
>
> These days I just do 'rhub::check_for_cran()' and four checks launch,
> covering the three required OSs as well as the required r-devel and r-release
> versions.  Results trickle in within minutes by mail; the Windows one (which
> is slowest) is also displayed.  You need a one-time token handshake.
>
> I strongly recommend the service.
>
> Dirk
>
> --
> http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Subject: Milestone: 10000 packages on CRAN

2017-01-27 Thread Henrik Bengtsson
Continuing the tradition to post millennia milestones on CRAN:

So, it happened. Today (January 27, 2017 PCT) CRAN reached 10,000 packages [1].

Needless to say, the rate with which new packages are added to CRAN
keeps increasing and so does the number of contributors (maintainers).
Somewhere out there, there are ~3 persons who are about to submit
their first packages to CRAN today and ~3 persons who will submit
another package of theirs. And by the amazing work of the CRAN team,
these packages are inspected and quality controlled before going live
- which often happens within a day or so.

As usual and it can't be said too many times: A big thank you to the
CRAN team, to the R core, to all package developers, to our friendly
community, to everyone out there helping others, and to various online
services that simplify package development. We can all give back by
carefully reporting bugs to the maintainers, properly citing packages
we use in publications (see citation("pkg")), and help newcomers to
use R.


Milestones:

2017-01-27 10000 pkgs (+6.3/day over 158 days) 5845 mnts (+3.5/day)
2016-08-22 9000 pkgs (+5.7/day over 175 days) 5289 mnts (+5.8/day)
2016-02-29 8000 pkgs (+5.0/day over 201 days) 4279 mnts (+0.7/day)
2015-08-12 7000 pkgs (+3.4/day over 287 days) 4130 mnts (+2.4/day)
2014-10-29 6000 pkgs (+3.0/day over 335 days) 3444 mnts (+1.6/day)
2013-11-08 5000 pkgs (+2.7/day over 442 days) 2900 mnts (+1.2/day)
2012-08-23 4000 pkgs (+2.1/day over 469 days) 2350 mnts
2011-05-12 3000 pkgs (+1.7/day over 585 days)
2009-10-04 2000 pkgs (+1.1/day over 906 days)
2007-04-12 1000 pkgs
2004-10-01 500 pkgs
2003-04-01 250 pkgs
2002-09-17 68 pkgs
1997-04-23 12 pkgs

These data are for CRAN only [1-13]. There are many more packages
elsewhere, e.g. Bioconductor, GitHub, R-Forge etc.

[1] http://cran.r-project.org/web/packages/
[2] https://en.wikipedia.org/wiki/R_(programming_language)#Milestones
[3] http://www.r-pkg.org/
[4] Private data
[5] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
[6] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
[7] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
[8] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html
[9] https://stat.ethz.ch/pipermail/r-devel/2013-November/067935.html
[10] https://stat.ethz.ch/pipermail/r-devel/2014-October/069997.html
[11] https://stat.ethz.ch/pipermail/r-package-devel/2015q3/000393.html
[12] https://stat.ethz.ch/pipermail/r-devel/2016-February/072388.html
[13] https://stat.ethz.ch/pipermail/r-devel/2016-August/073011.html

All the best,

Henrik
(just a user)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-27 Thread Henrik Bengtsson
On Fri, Jan 27, 2017 at 12:34 AM, Martin Maechler
<maech...@stat.math.ethz.ch> wrote:
>
> > On Jan 26, 2017 07:50, "William Dunlap via R-devel" 
> <r-devel@r-project.org>
> > wrote:
>
> > It would be cool if the default for tapply's init.value could be
> > FUN(X[0]), so it would be 0 for FUN=sum or FUN=length, TRUE for
> > FUN=all, -Inf for FUN=max, etc.  But that would take time and would
> > break code for which FUN did not work on length-0 objects.
>
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
>
> I had the same idea (after my first post), so I agree that would
> be nice. One could argue it would take time only if the user is too lazy
> to specify the value,  and we could use
>tryCatch(FUN(X[0]), error = NA)
> to safeguard against those functions that fail for 0 length arg.
>
> But I think the main reason for _not_ setting such a default is
> back-compatibility.  In my proposal, the new argument would not
> be any change by default and so all current uses of tapply()
> would remain unchanged.
>
>>>>>> Henrik Bengtsson <henrik.bengts...@gmail.com>
>>>>>> on Thu, 26 Jan 2017 07:57:08 -0800 writes:
>
> > On a related note, the storage mode should try to match ans[[1]] (or
> > unlist:ed ans) when allocating 'ansmat' to avoid coercion and hence a 
> full
> > copy.
>
> Yes, related indeed; and would fall "in line" with Bill's idea.
> OTOH, it could be implemented independently,
> by something like
>
>if(missing(init.value))
>  init.value <-
>if(length(ans)) as.vector(NA, mode=storage.mode(ans[[1]]))
>else NA

I would probably do something like:

  ans <- unlist(ans, recursive = FALSE, use.names = FALSE)
  if (length(ans)) storage.mode(init.value) <- storage.mode(ans[[1]])
  ansmat <- array(init.value, dim = extent, dimnames = namelist)

instead.  That completely avoids having to use missing() and the value
of 'init.value' will be coerced later if not done upfront.  use.names
= FALSE speeds up unlist().

/Henrik

>
> .
>
> A colleague proposed to use the shorter argument name 'default'
> instead of 'init.value', which indeed may be more natural and
> still not too often used as "non-first" argument in  FUN(.).
>
> Thank you for the constructive feedback!
> Martin
>
> > On Thu, Jan 26, 2017 at 2:42 AM, Martin Maechler
> > <maech...@stat.math.ethz.ch> wrote:
> >> Last week, we've talked here about "xtabs(), factors and NAs",
> -> https://stat.ethz.ch/pipermail/r-devel/2017-January/073621.html
> >>
> >> In the mean time, I've spent several hours on the issue
> >> and also committed changes to R-devel "in two iterations".
> >>
> >> In the case there is a *Left* hand side part to xtabs() formula,
> >> see the help page example using 'esoph',
> >> it uses  tapply(...,  FUN = sum)   and
> >> I now think there is a missing feature in tapply() there, which
> >> I am proposing to change.
> >>
> >> Look at a small example:
> >>
> >>> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]),
> > N=3)[-c(1,5), ]; xtabs(~., D2)
> >> , , N = 3
> >>
> >> L
> >> n   A B C D E F
> >> 1 1 2 0 0 0 0
> >> 2 0 0 1 2 0 0
> >> 3 0 0 0 0 2 2
> >>
> >>> DN <- D2; DN[1,"N"] <- NA; DN
> >> n L  N
> >> 2  1 A NA
> >> 3  1 B  3
> >> 4  1 B  3
> >> 6  2 C  3
> >> 7  2 D  3
> >> 8  2 D  3
> >> 9  3 E  3
> >> 10 3 E  3
> >> 11 3 F  3
> >> 12 3 F  3
> >>> with(DN, tapply(N, list(n,L), FUN=sum))
> >> A  B  C  D  E  F
> >> 1 NA  6 NA NA NA NA
> >> 2 NA NA  3  6 NA NA
> >> 3 NA NA NA NA  6  6
> >>>
> >>
> >> and as you can see, the resulting matrix has NAs, all the same
> >> NA_real_, but semantically of two different kinds:
> >>
> >> 1) at ["1", "A"], the  NA  comes from the NA in 'N'
> >> 2) all other NAs come from the fact that there is no such factor
> > combination
> >> *and* from the fact that tapply() uses
> >>
> >> array(dim = .., dimnames = ...)
> >>
> >> i.e., initializ

[Rd] parallel::mc*: Is it possible for a child process to know it is a fork?

2017-01-24 Thread Henrik Bengtsson
When using multicore-forking of the parallel package, is it possible
for a child process to know that it is a fork?  Something like:

  parallel::mclapply(1:10, FUN = function(i) { test_if_running_in_a_fork() })

I'm looking into ways to protect against further parallel processes
(including threads), which are not necessarily created via the
parallel::mc* API, from being spawned off recursively.  For instance,
there are several packages that by default perform multi-threaded
processing using native code, but I'm not sure there's a way for such
package to avoid running in multi-threaded mode if running in a forked
child R processes.  Imagine

  y <- parallel::mclapply(1:10, FUN = function(i) {
 somepkg::threaded_calculation_using_all_cores()
  })

where the developer of `somepkg` has no control over whether the user
calls it via mclapply() or via lapply().  I can see how the user of
mclapply() / lapply() can pass on this information, but that's not
safe, and the user might not be aware that deep down in the
dependency hierarchy there are one or more functions that do
multi-threaded / multi-process processing.
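One heuristic workaround, assuming the forks of interest are created
by this very R session, is to record the top-level PID up front and
compare inside the workers; a sketch (this only detects forks made
after 'main_pid' was recorded, not forks in general):

```r
## Sketch: heuristically detect running inside a forked child of this
## R session by comparing process IDs.
main_pid <- Sys.getpid()
is_fork <- function() !identical(Sys.getpid(), main_pid)

res <- parallel::mclapply(1:2, function(i) is_fork(), mc.cores = 2L)
## On Unix with mc.cores = 2, each element is typically TRUE
```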

Thanks,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] parallel::mc*: Is it possible for a child process to know it is a fork?

2017-01-25 Thread Henrik Bengtsson
On Tue, Jan 24, 2017 at 8:10 PM, Jeroen Ooms <jeroeno...@gmail.com> wrote:
> On Tue, Jan 24, 2017 at 7:06 PM, Henrik Bengtsson
> <henrik.bengts...@gmail.com> wrote:
>> When using multicore-forking of the parallel package, is it possible
>> for a child process to know that it is a fork?
>
> R internally uses R_isForkedChild to prevent certain operations within
> the fork. However I don't think this is exported anywhere. You could
> do something like:
>
>   extern Rboolean R_isForkedChild;
>   SEXP is_forked(){
> return ScalarLogical(R_isForkedChild);
>   }
>
> But that won't be allowed on CRAN:
>
> * checking compiled code ... NOTE
>   Found non-API call to R: ‘R_isForkedChild’
>   Compiled code should not call non-API entry points in R.

Yes, that's a bummer.  It could be useful to have this exposed.  It's
used by several core packages, not just 'parallel' itself;

$ grep -F R_isForkedChild -r --include="*.h"
src/include/Defn.h:extern Rboolean R_isForkedChild INI_as(FALSE); /*
was this forked? */

$ grep -F R_isForkedChild -r --include="*.c"
src/library/tcltk/src/tcltk_unix.c://extern Rboolean R_isForkedChild;
src/library/tcltk/src/tcltk_unix.c:if (!R_isForkedChild && !Tcl_lock
src/library/parallel/src/fork.c:#include  // for R_isForkedChild
src/library/parallel/src/fork.c: R_isForkedChild = 1;
src/modules/X11/devX11.c:while (!R_isForkedChild && displayOpen &&
XPending(display)) {
src/modules/X11/devX11.c:if(R_isForkedChild)
src/unix/sys-unix.c:if (ptr_R_ProcessEvents && !R_isForkedChild)
ptr_R_ProcessEvents();

>
> Another method would be to look at getppid(2) and getpgid(2) to lookup
> the parent-id and group-id of the current process and test if it
> matches that of the (parent) R process.

I'm not 100% sure I follow.  Is the idea similar to the following in R?

ppid <- Sys.getpid()
is_child <- parallel::mclapply(1:10, FUN = function(i) { Sys.getpid() != ppid })

How can the child process know 'ppid'?  getppid would give the parent
PID for any process, which could be a non-R process.

>
> If you are only interested in limiting further parallelization within
> the fork, perhaps you can simply use parallel::mcaffinity to restrict
> the forked process to a single core.

This is tied to parallelization via parallel::mc*, correct?  That is,
is it only parallel:::mcfork() that respects those settings or does
this go down deeper in the OS such that it affects forking / threading
on a more general level?

Thanks for your pointers and suggestions,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] How to build R without support for translations?

2017-02-21 Thread Henrik Bengtsson
In Section 'Localization of messages' of R Installation and
Administration (R 3.3.2), it says:

   "R can be built without support for translations, but it is enabled
by default."

How can this be done?  Is this an option to 'configure', which I then
failed to identify, or via some environment variable setting?

My objective is to get an R installation (on Linux) that is as small
as possible.

Thanks,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to build R without support for translations?

2017-02-21 Thread Henrik Bengtsson
On Tue, Feb 21, 2017 at 7:00 PM, Dirk Eddelbuettel <e...@debian.org> wrote:
>
> On 21 February 2017 at 18:45, Henrik Bengtsson wrote:
> | In Section 'Localization of messages' of R Installation and
> | Administration (R 3.3.2), it says:
> |
> |"R can be built without support for translations, but it is enabled
> | by default."
> |
> | How can this be done?  Is this an option to 'configure', which I then
> | failed to identify, or via some environment variable setting?
>
> To a first approximation:  ensure configure fails those sub-tests by not
> having the corresponding -dev package.  More elaborately, turn the
> corresponding configure variable to 'no'.

To identify and manually disable / fail all relevant configure tests
was the answer I feared.

>
> | My objective is to get an R installation (on Linux) that is as small
> | as possible.
>
> I considered playing that game a couple of years ago and decided that it is
> more or less a waste of time: as good as 'R the interpreter' is, the real
> added value (at least to me) comes from the *incredible* power supplied by
> the *massive* number of *perfectly well working* add-on packages from CRAN.
>
> Which nixes the idea of a minimal size. R really is /usr/bin/R plus whatever
> you want from CRAN.  So for you, what use is reducing R by 10% if you can't
> add the 'future' package?  Not to mention that many packages may need a
> compiler, or a beast like BH, or ...

I'm aware this question comes up once in a while.  One immediate
interest is running R on Amazon Lambda, which only allows for
deploying a 50 MB ZIP file / 250 MB uncompressed
(http://docs.aws.amazon.com/lambda/latest/dg/limits.html). So, an
obvious ~7 MB reduction can be valuable / critical there.

Thanks,

Henrik

>
> Dirk
>
> --
> http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Pressing either Ctrl-\ or Ctrl-4 core dumps R

2017-02-10 Thread Henrik Bengtsson
When running R from the terminal on Linux (Ubuntu 16.04), it core
dumps whenever / wherever I press Ctrl-4 or Ctrl-\.  You get thrown
back to the terminal with "Quit (core dump)" being the only message.
Grepping the R source code, it doesn't look like that message is
generated by R itself.  Over on Twitter, it has been confirmed to also
happen on macOS.

$ R -d valgrind --vanilla --quiet
==979== Memcheck, a memory error detector
==979== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==979== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==979== Command: /usr/lib/R/bin/exec/R --vanilla --quiet
==979==
> 1+2
[1] 3

# At next prompt I press Ctrl-\. The same happens also when done in
the middle of an entry.

> ==979==
==979== Process terminating with default action of signal 3 (SIGQUIT)
==979==at 0x576C9C3: __select_nocancel (syscall-template.S:84)
==979==by 0x502EABE: R_SelectEx (in /usr/lib/R/lib/libR.so)
==979==by 0x502EDDF: R_checkActivityEx (in /usr/lib/R/lib/libR.so)
==979==by 0x502F32B: ??? (in /usr/lib/R/lib/libR.so)
==979==by 0x4F6988B: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
==979==by 0x4F69CF0: ??? (in /usr/lib/R/lib/libR.so)
==979==by 0x4F69DA7: run_Rmainloop (in /usr/lib/R/lib/libR.so)
==979==by 0x4007CA: main (in /usr/lib/R/bin/exec/R)
==979==
==979== HEAP SUMMARY:
==979== in use at exit: 28,981,596 bytes in 13,313 blocks
==979==   total heap usage: 27,002 allocs, 13,689 frees, 49,025,684
bytes allocated
==979==
==979== LEAK SUMMARY:
==979==definitely lost: 0 bytes in 0 blocks
==979==indirectly lost: 0 bytes in 0 blocks
==979==  possibly lost: 0 bytes in 0 blocks
==979==still reachable: 28,981,596 bytes in 13,313 blocks
==979== suppressed: 0 bytes in 0 blocks
==979== Rerun with --leak-check=full to see details of leaked memory
==979==
==979== For counts of detected and suppressed errors, rerun with: -v
==979== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Quit (core dumped)

$ R --version
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

/Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Pressing either Ctrl-\ or Ctrl-4 core dumps R

2017-02-12 Thread Henrik Bengtsson
Thanks for these explanations - it all makes sense, that is, the
default behavior for a process that does not capture SIGQUIT is to
quit and perform a core dump
(https://en.wikipedia.org/wiki/Unix_signal#SIGQUIT).

Then the remaining question, as Luke says, is: should R handle this
signal?  For instance, in interactive mode, SIGQUIT could maybe bring
up:

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
5: ignore SIGQUIT (continue evaluation)
Selection:

giving the option to ignore a SIGQUIT sent by mistake.  Not sure how
big of a problem this is (I'm surprised I never hit Ctrl+\ by mistake
previously).


Also, I'm sharing my notes about stty and SIGQUIT in case someone else
finds them useful:

My terminal (Linux / Ubuntu 16.04) settings are the same as Bill's
(stty --all). Thus, pressing Ctrl+\ causes the terminal to signal
SIGQUIT to the running process (= R).  Since R does not handle /
capture this specifically, this results in the process quitting and
performing a core dump.  Pressing Ctrl+\ is effectively the same as
calling 'kill -s QUIT '.

One can disable the QUIT signal sent by the terminal by:

$ stty quit ''

such that one gets:

$ stty --all
speed 38400 baud; rows 33; columns 80; line = 0;
intr = ^C; quit = ; erase = ^?; kill = ^U; eof = ^D; eol = ;
eol2 = ; swtch = ; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; discard = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff
-iuclc -ixany -imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke -flusho -extproc

This will prevent QUIT from being signalled when pressing Ctrl+\, and
thereby R (Python, ...) won't core dump.  One can of course still use
kill -s QUIT .

To reset the above, either restart the terminal or use either of:

$ stty quit ^\\   ## caret notation (escaped ^\)
$ stty quit 0x1c   ## hexadecimal notation
$ stty quit 034## octal notation
$ stty quit 28 ## decimal notation

I still don't understand why the terminal treats keypress Ctrl+4 the
same as Ctrl+\, but at least I'm not alone;
https://catern.com/posts/terminal_quirks.html#fn.3.

Thanks

Henrik

On Fri, Feb 10, 2017 at 11:00 AM,  <luke-tier...@uiowa.edu> wrote:
> So do a number of other interactive programs when working in a
> terminal (e.g. python) since it looks like your terminal is configured
> for those two actions to send the SIGQUIT signal. Whether R should
> ignore that signal, under some circumstances at least, is another
> question.
>
> Best,
>
> luke
>
>
> On Fri, 10 Feb 2017, Henrik Bengtsson wrote:
>
>> When running R from the terminal on Linux (Ubuntu 16.04), it core
>> dumps whenever / wherever I press Ctrl-4 or Ctrl-\.  You get thrown
>> back to the terminal with "Quit (core dump)" being the only message.
>> Grepping the R source code, it doesn't look like that message is
>> generated by R itself.  Over on Twitter, it has been confirmed to also
>> happen on macOS.
>>
>> $ R -d valgrind --vanilla --quiet
>> ==979== Memcheck, a memory error detector
>> ==979== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
>> ==979== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
>> ==979== Command: /usr/lib/R/bin/exec/R --vanilla --quiet
>> ==979==
>>>
>>> 1+2
>>
>> [1] 3
>>
>> # At next prompt I press Ctrl-\. The same happens also when done in
>> the middle of an entry.
>>
>>> ==979==
>>
>> ==979== Process terminating with default action of signal 3 (SIGQUIT)
>> ==979==at 0x576C9C3: __select_nocancel (syscall-template.S:84)
>> ==979==by 0x502EABE: R_SelectEx (in /usr/lib/R/lib/libR.so)
>> ==979==by 0x502EDDF: R_checkActivityEx (in /usr/lib/R/lib/libR.so)
>> ==979==by 0x502F32B: ??? (in /usr/lib/R/lib/libR.so)
>> ==979==by 0x4F6988B: Rf_ReplIteration (in /usr/lib/R/lib/libR.so)
>> ==979==by 0x4F69CF0: ??? (in /usr/lib/R/lib/libR.so)
>> ==979==by 0x4F69DA7: run_Rmainloop (in /usr/lib/R/lib/libR.so)
>> ==979==by 0x4007CA: main (in /usr/lib/R/bin/exec/R)
>> ==979==
>> ==979== HEAP SUMMARY:
>> ==979== in use at exit: 28,981,596 bytes in 13,313 blocks
>> ==979==   total heap usage: 27,002 allocs, 13,689 frees, 49,025,684
>> bytes allocated
>> ==979==
>> ==979== LEAK SUMMARY:
>> ==979==definitely lost: 0 bytes in 0 blocks
>> ==979==indirectly lost: 0 bytes in 0 blocks
>> ==979==  possibly lost: 0 bytes in 0 blocks
>> ==979==still

Re: [Rd] Bug with zlib version checking for zlib >= 1.2.10, R version 3.3.2

2017-02-12 Thread Henrik Bengtsson
This has been fixed (https://cran.r-project.org/doc/manuals/r-devel/NEWS.html):

CHANGES IN R 3.3.2 patched:

INSTALLATION on a UNIX-ALIKE

* The configure check for the zlib version is now robust to versions
longer than 5 characters, including 1.2.10.

in SVN r71889 (2017-01-03):

https://github.com/wch/r-source/commit/a0fe05ce9d0937ad2334bb370785cb22c71e592b

/Henrik


On Sun, Feb 12, 2017 at 3:51 PM, Justin Bedő  wrote:
>
> Hi,
>
> Posting here as bugzilla is closed to registration.
> The zlib version checking code does not handle double digits for the
> patch version in the semantic versioning scheme. Consequently, a
> ./configure fails when using a zlib version ≥ 1.5.10. I suggest
> something like the following patch:
>
> --- a/m4/R.m4
> +++ b/m4/R.m4
> @@ -3116,7 +3116,7 @@ int main() {
>  #ifdef ZLIB_VERSION
>  /* Work around Debian bug: it uses 1.2.3.4 even though there was no such
> version on the master site zlib.net */
> -  exit(strncmp(ZLIB_VERSION, "1.2.5", 5) < 0);
> +  exit(strncmp(ZLIB_VERSION, "1.2.5", 5) < 0 && (strlen(ZLIB_VERSION) < 6 || 
> strncmp(ZLIB_VERSION, "1.2.10", 6) < 0));
>  #else
>exit(1);
>  #endif
>
> This could of course be improved to properly parse the string.
>
> Cheers,
>
> Justin
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Milestone: 9000 packages on CRAN

2016-08-22 Thread Henrik Bengtsson
An additional 1000 packages have been added to CRAN.  This time, it
took less than 6 months. Today (August 22, 2016), the Comprehensive R
Archive Network (CRAN) [1] reports:

“Currently, the CRAN package repository features 9004 available packages.”

The rate at which new packages are added to CRAN is increasing.
During 2007-2009 we went from 1000 to 2000 packages in 906 days (1.1
per day), and in 2014-2015 we went from 6000 to 7000 packages in 287
days (3.4 per day). The next 1000 packages took 201 days (5.0 per
day), and these most recent 1000 packages took only 175 days (5.7 per
day). At this pace, we should hit 10,000 packages on CRAN in early 2017.

Since the start of CRAN on April 23, 1997 [2], a new package has
appeared on CRAN every 18.8 hours on average. The true rate is even
higher, because dropped/archived packages are not accounted for. The
9000 packages on CRAN are maintained by 5289 people [3].

A big thank you to the R core team, to the CRAN team (!), to all
package developers, to our friendly community, to everyone out there
helping others, and to the various online services that simplify
package development. We can all give back by carefully reporting bugs
to maintainers, properly citing the packages we use in publications
(see citation("pkg name")), and helping newcomers learn R.

Milestones:

2016-08-22: 9000 packages [this post]
2016-02-29: 8000 packages [12]
2015-08-12: 7000 packages [11]
2014-10-29: 6000 packages [10]
2013-11-08: 5000 packages [9]
2012-08-23: 4000 packages [8]
2011-05-12: 3000 packages [7]
2009-10-04: 2000 packages [6]
2007-04-12: 1000 packages [5]
2004-10-01: 500 packages [4]
2003-04-01: 250 packages [4]

These data are for CRAN only. There are many more packages elsewhere,
e.g. R-Forge, Bioconductor, GitHub, etc.

[1] http://cran.r-project.org/web/packages/
[2] https://en.wikipedia.org/wiki/R_(programming_language)#Milestones
[3] http://www.r-pkg.org/
[4] Private data
[5] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
[6] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
[7] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
[8] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html
[9] https://stat.ethz.ch/pipermail/r-devel/2013-November/067935.html
[10] https://stat.ethz.ch/pipermail/r-devel/2014-October/069997.html
[11] https://stat.ethz.ch/pipermail/r-package-devel/2015q3/000393.html
[12] https://stat.ethz.ch/pipermail/r-devel/2016-February/072388.html

All the best,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
