[Rd] withAutoprint({ .... }) ?

2016-09-02 Thread Martin Maechler
On R-help, with subject
   '[R] source() does not include added code'

> Joshua Ulrich 
> on Wed, 31 Aug 2016 10:35:01 -0500 writes:

> I have quantstrat installed and it works fine for me.  If you're
> asking why the output of t(tradeStats('macross')) isn't being printed,
> that's because of what's described in the first paragraph in the
> *Details* section of help("source"):

> Note that running code via ‘source’ differs in a few respects from
> entering it at the R command line.  Since expressions are not
> executed at the top level, auto-printing is not done.  So you will
> need to include explicit ‘print’ calls for things you want to be
> printed (and remember that this includes plotting by ‘lattice’,
> FAQ Q7.22).



> So you need:

> print(t(tradeStats('macross')))

> if you want the output printed to the console.

indeed, and "of course"" ;-)

As my subject indicates, this is another case where it would be
very convenient to have a function

   withAutoprint()

so the OP could (hopefully) have used
   withAutoprint(source(..))
though that would have been equivalent to the already nicely existing

   source(.., print.eval = TRUE)

which works via the  withVisible(.)  utility that reports, for each
expression, whether it would auto-print, and then prints (or not)
accordingly.
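For reference, withVisible() (in base R) reports the value together
with its would-be visibility:

withVisible(x <- 1)  # $value 1, $visible FALSE : would not auto-print
withVisible(1 + 1)   # $value 2, $visible TRUE  : would auto-print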

My own use cases for such a withAutoprint({...})
are demos and examples, and sometimes even package tests whose output
I want printed:

Assume I have a nice demo / example on a help page/ ...

foo(..)
(z <- bar(..))
summary(z)


where I carefully print some parts (and not others),
and suddenly I find I want to run that part of the demo /
example / test only in some circumstances, e.g., only when
interactive but not in BATCH, or only if it is me, the package maintainer:

if( identical(Sys.getenv("USER"), "maechler") ) {
  foo(..)
  (z <- bar(..))
  summary(z)
  
}

Now all the auto-printing is gone, and

1) I have to find out which of these function calls do autoprint and wrap
   a print(..) around these, and

2) the result is quite ugly (for an example on a help page etc.)

What I would like in a future R is to be able to simply wrap the
"{ .. }" above with a  withAutoprint(.) :

if( identical(Sys.getenv("USER"), "maechler") ) withAutoprint({
  foo(..)
  (z <- bar(..))
  summary(z)
  
})

Conceptually, such a function could be written similarly to source(),
with an R-level for loop treating each expression separately, calling
eval(.), etc.  That may cost too much performance, ... still, having it
would be better than not having the possibility.
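For concreteness, a minimal sketch of such an R-level loop (illustration
only, with a made-up name withAutoprint0; the real thing would also have
to deal with srcrefs, deparse cutoffs, etc.):

withAutoprint0 <- function(expr, envir = parent.frame()) {
    ex <- substitute(expr)
    ## treat a braced block as a sequence of expressions:
    exprs <- if (is.call(ex) && identical(ex[[1L]], as.name("{")))
                 as.list(ex)[-1L] else list(ex)
    for (e in exprs) {
        cat("> ", deparse(e), "\n", sep = "")      # echo the expression
        res <- withVisible(eval(e, envir = envir))
        if (res$visible) print(res$value)          # auto-print if visible
    }
    invisible()
}
withAutoprint0({ x <- 1:3; sum(x) })
## > x <- 1:3
## > sum(x)
## [1] 6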



If you have read this far, you'll probably agree that such a function
could be a nice asset in R,
notably if it were possible to do this at the fast C level of R's main
REPL.

Have any of you looked into how this could be provided in R ?
If you know the source a little, you will remember that there's
the global variable  R_Visible  which is crucial here.
The problem is that it *is* global, and only available as that;
the auto-printing "concept" is closely tied to the "toplevel context",
which is not easy to emulate and, AFAIK, not centralized in one place
in the source.  Consequently, all kinds of (very) low-level functions
manipulate R_Visible temporarily, and so a C-level implementation of
withAutoprint() may need considerably more changes than just setting
R_Visible to TRUE in one place.

Have any efforts / experiments already happened towards providing such
functionality ?


Re: [Rd] R (development) changes in arith, logic, relop with (0-extent) arrays

2016-09-07 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Tue, 6 Sep 2016 22:26:31 +0200 writes:

> Yesterday, changes to R's development version were committed, relating
> to arithmetic, logic ('&' and '|') and
> comparison/relational ('<', '==') binary operators
> which in NEWS are described as

> SIGNIFICANT USER-VISIBLE CHANGES:

> [.]

> • Arithmetic, logic (‘&’, ‘|’) and comparison (aka
> ‘relational’, e.g., ‘<’, ‘==’) operations with arrays now
> behave consistently, notably for arrays of length zero.

> Arithmetic between length-1 arrays and longer non-arrays had
> silently dropped the array attributes and recycled.  This
> now gives a warning and will signal an error in the future,
> as it has always for logic and comparison operations in
> these cases (e.g., compare ‘matrix(1,1) + 2:3’ and
> ‘matrix(1,1) < 2:3’).

> As the above "visually suggests", one could think of the changes
> as falling mainly into two groups:
> 1) <0-extent array>  (op)  <vector or NULL>
> 2) <1-extent array>  (arith)  <vector of length >= 2>

> These changes are partly non-back compatible and may break
> existing code.  We believe that the internal consistency gained
> from the changes is worth the few places with problems.

> We expect some package maintainers (10-20, or even more?) will need
> to adapt their code.

> Case '2)' above mainly results in a new warning, e.g.,

>> matrix(1,1) + 1:2
> [1] 2 3
> Warning message:
> In matrix(1, 1) + 1:2 :
> dropping dim() of array of length one.  Will become ERROR
>> 

> whereas '1)' now gives errors in cases where the result silently was a
> vector of length zero, or keeps the array (dim & dimnames) in
> cases where these were silently dropped.

> The following is a "heavily" commented  R script showing (all ?)
> the important cases with changes :

> 


> (m <- cbind(a=1[0], b=2[0]))
> Lm <- m; storage.mode(Lm) <- "logical"
> Im <- m; storage.mode(Im) <- "integer"

> ## 1. -
> try( m & NULL ) # in R <= 3.3.x :
> ## Error in m & NULL :
> ##  operations are possible only for numeric, logical or complex types
> ##
> ## gives 'Lm' in R >= 3.4.0

> ## 2. -
> m + 2:3 ## gave numeric(0), now remains matrix identical to  m
> Im + 2:3 ## gave integer(0), now remains matrix identical to Im (integer)

> m > 1  ## gave logical(0), now remains matrix identical to Lm (logical)
> m > 0.1[0] ##  ditto
> m > NULL   ##  ditto

> ## 3. -
> mm <- m[,c(1:2,2:1,2)]
> try( m == mm ) ## now gives error   "non-conformable arrays",
> ## but gave logical(0) in R <= 3.3.x

> ## 4. -
> str( Im + NULL)  ## gave "num", now gives "int"

> ## 5. -
> ## special case for arithmetic w/ length-1 array
> (m1 <- matrix(1,1,1, dimnames=list("Ro","col")))
> (m2 <- matrix(1,2,1, dimnames=list(c("A","B"),"col")))

> m1 + 1:2  # ->  2:3  but now with warning to  "become ERROR"
> tools::assertError(m1 & 1:2)# ERR: dims [product 1] do not match the length of object [2]
> tools::assertError(m1 < 1:2)# ERR:  (ditto)
> ##
> ## non-0-length arrays combined with {NULL or double() or ...} *fail*

> ### Length-1 arrays:  Arithmetic with |vectors| > 1  treated array as scalar
> m1 + NULL # gave  numeric(0) in R <= 3.3.x --- still, *but* w/ warning to "be ERROR"
> try(m1 > NULL)# gave  logical(0) in R <= 3.3.x --- an *error* now in R >= 3.4.0
> tools::assertError(m1 & NULL)# gave and gives error
> tools::assertError(m1 | double())# ditto
> ## m2 was slightly different:
> tools::assertError(m2 + NULL)
> tools::assertError(m2 & NULL)
> try(m2 == NULL) ## was logical(0) in R <= 3.3.x; now error as above!

> 



> Note that in R's own  'nls'  sources, there was one case of
> situation '2)' above, i.e. a  1x1-matrix was used as a "scalar".

> In such cases, you should explicitly coerce it to a vector,
> either ("self-explainingly") by  as.vector(.), or as I did in
> the nls case  by  c(.) :  The latt
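(For illustration, the two coercions just mentioned, on a hypothetical
1x1 matrix m1; both give a plain vector and hence no warning:)

m1 <- matrix(1, 1, 1)   # a 1x1 array, as in the nls case
as.vector(m1) + 1:2     # explicit, self-explaining coercion
c(m1) + 1:2             # shorter: c() also drops the dim attribute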

Re: [Rd] R (development) changes in arith, logic, relop with (0-extent) arrays

2016-09-08 Thread Martin Maechler
>>>>> robin hankin <hankin.ro...@gmail.com>
>>>>> on Thu, 8 Sep 2016 10:05:21 +1200 writes:

> Martin I'd like to make a comment; I think that R's
> behaviour on 'edge' cases like this is an important thing
> and it's great that you are working on it.

> I make heavy use of zero-extent arrays, chiefly because
> the dimnames are an efficient and logical way to keep
> track of certain types of information.

> If I have, for example,

> a <- array(0,c(2,0,2))
> dimnames(a) <- list(name=c('Mike','Kevin'),NULL,item=c("hat","scarf"))


> Then in R-3.3.1, 70800 I get

> > a > 0
> logical(0)
> >

> But in 71219 I get

> > a > 0
> , , item = hat


> name
> Mike
> Kevin

> , , item = scarf


> name
> Mike
> Kevin

> (which is an empty logical array that holds the names of the people and
> their clothes). I find the behaviour of 71219 very much preferable because
> there is no reason to discard the information in the dimnames.

Thanks a lot, Robin, (and Oliver) !

Yes, the above is such a case where the new behavior makes much sense.
And this behavior remains identical after the 71222 amendment.

Martin

    > Best wishes
> Robin




> On Wed, Sep 7, 2016 at 9:49 PM, Martin Maechler 
<maech...@stat.math.ethz.ch>
> wrote:

>> >>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>> >>>>> on Tue, 6 Sep 2016 22:26:31 +0200 writes:
>> 
>> > Yesterday, changes to R's development version were committed,
>> relating
>> > to arithmetic, logic ('&' and '|') and
>> > comparison/relational ('<', '==') binary operators
>> > which in NEWS are described as
>> 
>> > SIGNIFICANT USER-VISIBLE CHANGES:
>> 
>> > [.]
>> 
>> > • Arithmetic, logic (‘&’, ‘|’) and comparison (aka
>> > ‘relational’, e.g., ‘<’, ‘==’) operations with arrays now
>> > behave consistently, notably for arrays of length zero.
>> 
>> > Arithmetic between length-1 arrays and longer non-arrays had
>> > silently dropped the array attributes and recycled.  This
>> > now gives a warning and will signal an error in the future,
>> > as it has always for logic and comparison operations in
>> > these cases (e.g., compare ‘matrix(1,1) + 2:3’ and
>> > ‘matrix(1,1) < 2:3’).
>> 
>> > As the above "visually suggests", one could think of the changes
>> > as falling mainly into two groups:
>> > 1) <0-extent array>  (op)  <vector or NULL>
>> > 2) <1-extent array>  (arith)  <vector of length >= 2>
>> 
>> > These changes are partly non-back compatible and may break
>> > existing code.  We believe that the internal consistency gained
>> > from the changes is worth the few places with problems.
>> 
>> > We expect some package maintainers (10-20, or even more?) will need
>> > to adapt their code.
>> 
>> > Case '2)' above mainly results in a new warning, e.g.,
>> 
>> >> matrix(1,1) + 1:2
>> > [1] 2 3
>> > Warning message:
>> > In matrix(1, 1) + 1:2 :
>> > dropping dim() of array of length one.  Will become ERROR
>> >>
>> 
>> > whereas '1)' now gives errors in cases where the result silently was a
>> > vector of length zero, or keeps the array (dim & dimnames) in
>> > cases where these were silently dropped.
>> 
>> > The following is a "heavily" commented  R script showing (all ?)
>> > the important cases with changes :
>> 
>> > 
>> 
>> 
>> > (m <- cbind(a=1[0], b=2[0]))
>> > Lm <- m; storage.mode(Lm) <- "logical"
>> > Im <- m; storage.mode(Im) <- "integer"
>> 
>> > ## 1. -
>> > try( m & NULL ) # in R <= 3.3.x :
>> > ## Error in m & NULL :
>> > ##  operations are possible only for numeric, logical or complex
>> types
>> > ##
>> > ## gives 'Lm' in R >= 3.4.0
>> 
>> > ## 2. -
>> > m + 2:3 ## gave numeric(0), now remains matrix identical to  m
>> > Im + 2:3 ## gave integer(0),

Re: [Rd] R (development) changes in arith, logic, relop with (0-extent) arrays

2016-09-07 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Wed, 7 Sep 2016 11:49:11 +0200 writes:

>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Tue, 6 Sep 2016 22:26:31 +0200 writes:

>> Yesterday, changes to R's development version were committed, relating
>> to arithmetic, logic ('&' and '|') and
>> comparison/relational ('<', '==') binary operators
>> which in NEWS are described as

>> SIGNIFICANT USER-VISIBLE CHANGES:

>> [.]

>> • Arithmetic, logic (‘&’, ‘|’) and comparison (aka
>> ‘relational’, e.g., ‘<’, ‘==’) operations with arrays now
>> behave consistently, notably for arrays of length zero.

>> Arithmetic between length-1 arrays and longer non-arrays had
>> silently dropped the array attributes and recycled.  This
>> now gives a warning and will signal an error in the future,
>> as it has always for logic and comparison operations in
>> these cases (e.g., compare ‘matrix(1,1) + 2:3’ and
>> ‘matrix(1,1) < 2:3’).

>> As the above "visually suggests", one could think of the changes
>> as falling mainly into two groups:
>> 1) <0-extent array>  (op)  <vector or NULL>
>> 2) <1-extent array>  (arith)  <vector of length >= 2>

>> These changes are partly non-back compatible and may break
>> existing code.  We believe that the internal consistency gained
>> from the changes is worth the few places with problems.

>> We expect some package maintainers (10-20, or even more?) will need
>> to adapt their code.

>> Case '2)' above mainly results in a new warning, e.g.,

>>> matrix(1,1) + 1:2
>> [1] 2 3
>> Warning message:
>> In matrix(1, 1) + 1:2 :
>> dropping dim() of array of length one.  Will become ERROR
>>> 

>> whereas '1)' now gives errors in cases where the result silently was a
>> vector of length zero, or keeps the array (dim & dimnames) in
>> cases where these were silently dropped.

>> The following is a "heavily" commented  R script showing (all ?)
>> the important cases with changes :

>> 


>> (m <- cbind(a=1[0], b=2[0]))
>> Lm <- m; storage.mode(Lm) <- "logical"
>> Im <- m; storage.mode(Im) <- "integer"

>> ## 1. -
>> try( m & NULL ) # in R <= 3.3.x :
>> ## Error in m & NULL :
>> ##  operations are possible only for numeric, logical or complex types
>> ##
>> ## gives 'Lm' in R >= 3.4.0

>> ## 2. -
>> m + 2:3 ## gave numeric(0), now remains matrix identical to  m
>> Im + 2:3 ## gave integer(0), now remains matrix identical to Im (integer)

>> m > 1  ## gave logical(0), now remains matrix identical to Lm (logical)
>> m > 0.1[0] ##  ditto
>> m > NULL   ##  ditto

>> ## 3. -
>> mm <- m[,c(1:2,2:1,2)]
>> try( m == mm ) ## now gives error   "non-conformable arrays",
>> ## but gave logical(0) in R <= 3.3.x

>> ## 4. -
>> str( Im + NULL)  ## gave "num", now gives "int"

>> ## 5. -
>> ## special case for arithmetic w/ length-1 array
>> (m1 <- matrix(1,1,1, dimnames=list("Ro","col")))
>> (m2 <- matrix(1,2,1, dimnames=list(c("A","B"),"col")))

>> m1 + 1:2  # ->  2:3  but now with warning to  "become ERROR"
>> tools::assertError(m1 & 1:2)# ERR: dims [product 1] do not match the length of object [2]
>> tools::assertError(m1 < 1:2)# ERR:  (ditto)
>> ##
>> ## non-0-length arrays combined with {NULL or double() or ...} *fail*

>> ### Length-1 arrays:  Arithmetic with |vectors| > 1  treated array as scalar
>> m1 + NULL # gave  numeric(0) in R <= 3.3.x --- still, *but* w/ warning to "be ERROR"
>> try(m1 > NULL)# gave  logical(0) in R <= 3.3.x --- an *error* now in R >= 3.4.0
>> tools::assertError(m1 & NULL)# gave and gives error
>> tools::assertError(m1 | double())# ditto
>> ## m2 was slightly different:
>> tools::assertError(m2 + NULL)
>> tools::assertError(m2 & NULL)
>> try(m2 == NULL) ## was logical(0) in R <= 3.3.x; now error as above

Re: [Rd] 'droplevels' inappropriate change

2016-08-31 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Sat, 27 Aug 2016 18:55:37 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>>>>> on Sat, 27 Aug 2016 03:17:32 + writes:

>> In R devel r71157, 'droplevels' documentation, in "Arguments" section,
>> says this about argument 'exclude'.
>> passed to factor(); factor levels which should be excluded from the
>> result even if present.  Note that this was implicitly NA in R <= 3.3.1
>> which did drop NA levels even when present in x, contrary to the
>> documentation.  The current default is compatible with x[ , drop=FALSE].

>> The part
>> x[ , drop=FALSE]
>> should be
>> x[ , drop=TRUE]

> Yes, definitely, thank you!
> a "typo" by me. .. fixed now.

>> Saying that 'exclude' is factor levels is not quite true for the NA
>> element. NA may not be an original level, but NA in 'exclude' affects
>> the result.

>> For a factor 'x', factor(x, exclude = exclude) doesn't really work for
>> excluding in general. See, for example,
>> https://stat.ethz.ch/pipermail/r-help/2005-September/079336.html .
>> factor(factor(c("a","b","c")), exclude="c")

>> However, this excludes "2":
>> factor(factor(2:3), exclude=2)

>> Rather unexpectedly, this excludes NA:
>> factor(factor(c("a",NA), exclude=NULL), exclude="c")

>> For a factor 'x', factor(x, exclude = exclude) can only exclude
>> integer-like or NA levels. An explanation is in
>> https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html .

> Well, Peter Dalgaard (in that R-devel e-mail, a bit more than 5
> years ago) is confirming the problem there,  and suggesting (as
> you, right?) that actually   `factor()` is not behaving
> correctly here.

> And your persistence is finally getting close to convincing me
> that it is not just droplevels(), but  factor() itself which
> needs care here.

> Interestingly, the following patch *does* pass 'make check-all'
> (after small change in tests/reg-tests-1b.R which is ok),
> and leads to behavior which is much closer to the documentation,
> notably for your two examples above would give what one would
> expect.

> (( If R-Hub supported experiments with branches of R-devel
> from R-core members,  I could just create such a branch and R-Hub
> would run 'R CMD check <pkg>'  for thousands of CRAN packages
> and provide a web page with the *differences* in the package
> check results ... so we could see ... ))

> I do agree that we should strongly consider such a change.

as nobody has commented, I've been liberal and have taken the
lack of comments as consent.

Hence I have committed


r71178 | maechler | 2016-08-31 09:45:40 +0200 (Wed, 31 Aug 2016) | 1 line
Changed paths:
   M /trunk/doc/NEWS.Rd
   M /trunk/src/library/base/R/factor.R
   M /trunk/src/library/base/man/factor.Rd
   M /trunk/tests/reg-tests-1b.R
   M /trunk/tests/reg-tests-1c.R

factor(x, exclude) more "rational" when x or exclude are character


which apart from documentation, examples, and regression tests
is just the patch below.

Martin Maechler
ETH Zurich


> --- factor.R  (revision 71157)
> +++ factor.R  (working copy)
> @@ -28,8 +28,12 @@
>      levels <- unique(y[ind])
>      }
>      force(ordered) # check if original x is an ordered factor
> -    exclude <- as.vector(exclude, typeof(x)) # may result in NA
> -    x <- as.character(x)
> +    if(!is.character(x)) {
> +        if(!is.character(exclude))
> +            exclude <- as.vector(exclude, typeof(x)) # may result in NA
> +        x <- as.character(x)
> +    } else
> +        exclude <- as.vector(exclude, typeof(x)) # may result in NA
>      ## levels could be a long vectors, but match will not handle that.
>      levels <- levels[is.na(match(levels, exclude))]
>      f <- match(x, levels)
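(A sketch of the behavior the patch aims for, using Suharto's examples
from above:)

## with the patched factor(), a character 'exclude' is no longer coerced
## via typeof(x), so excluding character levels works as documented:
factor(factor(c("a","b","c")), exclude = "c")  # levels a b : "c" is dropped
factor(factor(2:3), exclude = 2)               # still excludes "2"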



Re: [Rd] R (development) changes in arith, logic, relop with (0-extent) arrays

2016-09-12 Thread Martin Maechler
>>>>> Radford Neal <radf...@cs.toronto.edu>
>>>>> on Fri, 9 Sep 2016 10:29:14 -0400 writes:

>> Radford Neal:
>> > So it may make more sense to move towards consistency in the
>> > permissive direction, rather than the restrictive direction.
>> 
>> > That would mean allowing matrix(1,1,1) < (1:2), and maybe also things
>> > like matrix(1,2,2)+(1:8).
>> 
>> Martin Maechler:
>> That is an interesting idea.  Yes, in my view that would
>> definitely also have to allow the latter, by the above argument
>> of not treating the dim/dimnames attributes special.  For
>> non-arrays length-1 is not treated much special apart from the
>> fact that length-1 can always be recycled (without warning).

> I think one could argue for allowing matrix(1,1,1)+(1:8) but not
> matrix(1,2,2)+(1:8).  Length-1 vectors certainly are special in some
> circumstances, being R's only way of representing a scalar.  For
> instance, if (c(T,F)) gives a warning.

well, the if(.)  situation is very special and does not weigh
much for me, here.

> This really goes back to what I think may have been a basic mistake in
> the design of S, in deciding that everything is a vector, then halfway
> modifying this with dim attributes, but it's too late to totally undo
> that (though allowing a 0-length dim attribute to explicitly mark a
> length-1 vector as a scalar might help).

(yes; I think there are also other ideas of adding true small
 scalars to R... I am not familiar with those, and in any case
 that should be a completely different thread and not be
 discussed in this one)

>> > And I think there would be some significant problems. In addition to
>> > the 10-20+ packages that Martin expects to break, there could be quite
>> > a bit of user code that would no longer work - scripts for analysing
>> > data sets that used to work, but now don't with the latest version.
>> 
>> That's not true (at least for the cases above): They would give
>> a strong warning

> But isn't the intent to make it an error later?  So I assume we're
> debating making it an error, not just a warning.  

Yes, that's correct.
But if we have a longish deprecation period (i.e., one where there's
only a warning), all important code should have been adapted
before it turns into an error
 (( except for those people who are careless enough to "graciously"
use something like suppressWarnings(...) in too many places )).

> (Though I'm
> generally opposed to such warnings anyway, unless they could somehow
> be restricted to come up only for interactive uses, not from deep in a
> program the user didn't write, making them totally mysterious...)

>> *and* the  logic and relop versions of this, e.g.,
>> matrix(TRUE,1) | c(TRUE,FALSE) ;  matrix(1,1) > 1:2,
>> have always been an  error; so nothing would break there.

> Yes, that wouldn't change the behaviour of old code, but if we're
> aiming for consistency, it might make sense to get rid of that error,
> allowing code like sum(a%*%b<c(10,20,30)) with a and b being vectors,
> rather than forcing the programmer to write sum(c(a%*%b)<c(10,20,30)).

Yes, that would be another way to get consistency... leading to
fewer problems in existing code.  As said earlier, getting
consistency by becoming "more lenient" instead of "more restrictive"
is a good option in my view.

We would however keep this somewhat special length-1-array
exception in how arrays behave in binary OPs, with both the underlying C
code and the full documentation becoming slightly more complicated
rather than simpler.

OTOH, we would remain back compatible (*) with S, or at least S-plus
(as far as I know), and with all earlier versions of R here,
and that is valuable too, I agree.

Nobody else has commented yet on this sub-thread ... not even
privately to me.  If that status does not change quite a bit,
I don't see enough incentive for changing (the current R-devel code).

Martin

--
(*) "back-compatible" in the sense that old code which "worked"
would continue to work the same
(but some old code that gave an error would no longer do so)



>> Of course; that *was* the reason the very special treatment for
>> arithmetic length-1 arrays had been introduced.  It is convenient.
>> 
>> However, *some* of the conveniences in S (and hence R) functions
>> have been dangerous {and much more used, hence close to
>> impossible to abolish, e.g., sample(x) when x  is numeric of length 1,

> There's a difference between these two

Re: [Rd] R-intro: function 'stderr' and 'sd'

2016-09-13 Thread Martin Maechler
>>>>> William Dunlap <wdun...@tibco.com>
>>>>> on Tue, 13 Sep 2016 09:06:00 -0700 writes:

> While you are editing that, you might change its name from 'stderr'
> to standardError (or standard_error, etc.) so as not to conflict with
> base::stderr().

oh yes.. blush! ..
Martin

> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com

> On Tue, Sep 13, 2016 at 8:55 AM, Martin Maechler 
<maech...@stat.math.ethz.ch
>> wrote:

>> >>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>> >>>>> on Fri, 9 Sep 2016 16:52:01 + writes:
>> 
>> > In "An Introduction to R" Version 3.3.1, in "4.2 The function
>> tapply() and ragged arrays", after
>> > stderr <- function(x) sqrt(var(x)/length(x))  ,
>> > there is a note in brackets:
>> > Writing functions will be considered later in [Writing your own
>> functions], and in this case was unnecessary as R also has a builtin
>> function sd().
>> 
>> > The part "in this case was unnecessary as R also has a builtin
>> function sd()" is misleading. The builtin function sd() doesn't calculate
>> standard error of the mean. It calculates standard deviation. The 
function
>> 'stderr' can use 'sd':
>> > function(x) sd(x)/sqrt(length(x))
>> 
>> You are right; thank you Suharto.
>> It now says
>> 
>> (Writing functions will be considered later in @ref{Writing your own
>> functions}.  Note that @R{}'s builtin function @code{sd()} is something
>> different.)
>> 




Re: [Rd] R-intro: function 'stderr' and 'sd'

2016-09-13 Thread Martin Maechler
> Suharto Anggono Suharto Anggono via R-devel 
> on Fri, 9 Sep 2016 16:52:01 + writes:

> In "An Introduction to R" Version 3.3.1, in "4.2 The function tapply() 
and ragged arrays", after
> stderr <- function(x) sqrt(var(x)/length(x))  ,
> there is a note in brackets:
> Writing functions will be considered later in [Writing your own 
functions], and in this case was unnecessary as R also has a builtin function 
sd().

> The part "in this case was unnecessary as R also has a builtin function
> sd()" is misleading. The builtin function sd() doesn't calculate standard
> error of the mean. It calculates standard deviation. The function
> 'stderr' can use 'sd':
> function(x) sd(x)/sqrt(length(x))

You are right; thank you Suharto.
It now says

(Writing functions will be considered later in @ref{Writing your own
functions}.  Note that @R{}'s builtin function @code{sd()} is something
different.)
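(For concreteness, the distinction in code:)

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
sd(x)                    # standard deviation
sd(x) / sqrt(length(x))  # standard error of the mean, as in 'stderr' above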



Re: [Rd] c(<Matrix>, <matrix>) / help(dotsMethods) etc

2016-09-13 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Sat, 10 Sep 2016 21:49:37 +0200 writes:

>>>>> John Chambers <j...@r-project.org>
>>>>> on Sat, 10 Sep 2016 09:16:38 -0700 writes:

>> (Brief reply, I'm traveling but as per below, this is on my radar
>> right now so wanted to comment.)
>> Two points regarding "dotsMethods".

>> 1.  To clarify the limitation.  It's not that all the arguments have to
>> be of the same class, but there must be one class that they belong to or
>> subclass.  (So, as in the example in the documentation, the method could
>> be defined for a class union or other virtual class that all the actual
>> arguments inherit from.)

> Thank you for the clarification.
> I knew that the limitation "the same class" has not been a big
> one, that's why I did use a class union in my example (below).
> I thought there were other limitations.. never mind.

>> 2.  The current documentation is a mess.  In common with lots of other
>> very old documentation.  I'm in the process of rewriting a large chunk
>> of the documentation, including that for dotsMethods.  Sometime in the
>> next few weeks, I hope to have it coherent enough to commit.

> That's great!

>> So far, I'm not trying to change any significant aspects of the code,
>> including for "..." methods, which seem to do roughly what was intended.

> Yes, I'm sorry if I sounded like saying something different.

> That I think this [getting c() to work for a collection of objects,
> some S4] needs changes in R is because it seems that do_c()
> fails to dispatch here, and hence the problem was with our C
> function that has carried the comment

> | * To call this an ugly hack would be to insult all existing ugly hacks
> | * at large in the world.

> and I don't think I would be able to correctly patch that
> infamous function (in src/main/eval.c) ...

> Martin

On the other hand,
with the following patch

--- bind.c  (revision 71239)
+++ bind.c  (working copy)
@@ -732,7 +732,8 @@
 
     /* Attempt method dispatch. */
 
-    if (DispatchOrEval(call, op, "c", args, env, &ans, 1, 1))
+    if (DispatchAnyOrEval(call, op, "c", args, env, &ans, 1, 1))
+       //  ^^^ "Any" => all args are eval()ed and checked => correct multi-arg dispatch
        return(ans);
     PROTECT(ans);
     SEXP res = do_c_dflt(call, op, ans, env);

the problem is basically solved.

Yes it does cost a tiny bit: according to minimal example testing (and
microbenchmark):

d4 <- diag(4); microbenchmark(c(), c(1), c(2,3), c(d4,3:1), times=2^12)

it costs 10-20 nanoseconds per call .. and possibly slightly
more after attaching a version of 'Matrix' with new 'c' methods,
where all versions of

   c(..., <Matrix>, ...)

would work.
OTOH, it seems very natural to me to allow proper dispatch once
a
   setMethod("c", "numMatrixLike", function(x, ..., recursive) { ...})
or even a
   setMethod("c", "ANY", function(x, ..., recursive) { ...})

method is defined.




>> On Sep 10, 2016, at 8:27 AM, Martin Maechler 
<maech...@stat.math.ethz.ch> wrote:

>>> I have been asked  (by Roger; thank you for the good question,
>>> and I hope it's fine to answer to the public) :
>>> 
>>>> with Pi a sparse matrix and x,y, and ones
>>>> compatible n-vectors — when I do:
>>> 
>>>>> c(t(x) %*% Pi %*% ones, t(ones) %*% Pi %*% y )
>>>> [[1]]
>>>> 1 x 1 Matrix of class "dgeMatrix"
>>>>           [,1]
>>>> [1,] 0.1338527
>>>>
>>>> [[2]]
>>>> 1 x 1 Matrix of class "dgeMatrix"
>>>>           [,1]
>>>> [1,] 0.7810341
>>> 
>>>> I get a list whereas if Pi is an ordinary matrix I get a
>>>> vector.  Is this intentional?
>>> 
>>> Well, no.  But it has been "unavoidable" in the sense that it had not
>>> been possible to provide S4 methods for '...' in the "remote"
>>> past, when  Matrix was created.
>>> 
>>> Later ... also quite a few years ago, John Chambers had added
>>> that possibility, with still some limitation (all '...' must be
>>> of the same class), and also plans to remove some of the
>>> limitations, see   ?dotsMethods  in R.
>>> 
>>> I honestly have forgotten the history of my trying to provide 'c'
>>> methods for our "Matrix" objects after the  'dotsMethods'
>>> possibility had emerged,  but I know I tried and had not seen a

Re: [Rd] Coercion of 'exclude' in function 'factor' (was 'droplevels' inappropriate change)

2016-09-13 Thread Martin Maechler
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>>>>> on Fri, 2 Sep 2016 16:10:00 + writes:

> I am basically fine with the change.
> How about using just the following?
> if(!is.character(exclude))
>   exclude <- as.vector(exclude, typeof(x)) # may result in NA
> x <- as.character(x)

> It looks simpler and is, more or less, equivalent.

yes, but the current code should be slightly faster..

> In factor.Rd, in description of argument 'exclude', "(when \code{x} is
> a \code{factor} already)" can be removed.


> A larger change that, I think, is reasonable is entirely removing the code
> exclude <- as.vector(exclude, typeof(x)) # may result in NA

> The explicit coercion of 'exclude' is not necessary. 
> Function 'factor' works without it.

> The coercion of 'exclude' may lead to a surprise because it "may result
> in NA".
> Example from https://stat.ethz.ch/pipermail/r-help/2005-April/069053.html :
>    factor(as.integer(c(1,2,3,3,NA)), exclude=NaN)
> excludes NA.

> As a bonus, without the coercion of 'exclude', 'exclude' can be a factor
> if 'x' is a factor. This part of an example in
> https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html works.
> cc <- c("x","y","NA")
> ff <- factor(cc)
> factor(ff,exclude=ff[3])

Yes, two good reasons for a change.  factor() would finally
behave according to the documentation which has been mentioning
that 'exclude' could be a factor: ((Until my R-devel changes of a
few weeks ago, i.e. at least in all recent released versions of R)),
the help page for factor has said

|| If 'exclude' is used it should either be a factor with the same
|| level set as 'x' or a set of codes for the levels to be excluded.

  > However, the coercion of 'exclude' has been in function 'factor' in R
  > "forever".

Indeed: On March 6, 1998, svn rev. 845, when the R source files got a
'.R' appended, and quite a long time before  R 1.0.0,
the factor function was as short as (but using an .Internal(.) !)

function (x, levels = sort(unique(x), na.last = TRUE), labels,
          exclude = NA, ordered = FALSE)
{
    if (length(x) == 0)
        return(character(0))
    exclude <- as.vector(exclude, typeof(x))
    levels <- levels[is.na(match(levels, exclude))]
    x <- .Internal(factor(match(x, levels), length(levels), ordered))
    if (missing(labels))
        levels(x) <- levels
    else levels(x) <- labels
    x
}

and already contained that line.

Nevertheless, simplifying factor() by removing that line (or those
2 lines, now!) seems to have only advantages.

I'm not yet committing to anything, but currently would strongly
consider it .. though *after* the
   '<0-extent array> OP <vector>'
issue has settled a bit.

Martin

> --------
> On Wed, 31/8/16, Martin Maechler <maech...@stat.math.ethz.ch> wrote:

> Subject: Re: [Rd] 'droplevels' inappropriate change

> Cc: "Martin Maechler" <maech...@stat.math.ethz.ch>
> Date: Wednesday, 31 August, 2016, 2:51 PM
 
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Sat, 27 Aug 2016 18:55:37 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>>>>> on Sat, 27 Aug 2016 03:17:32 + writes:

>>> In R devel r71157, 'droplevels' documentation, in "Arguments" section, 
says this about argument 'exclude'.
>>> passed to factor(); factor levels which should be excluded from the 
result even if present.  Note that this was implicitly NA in R <= 3.3.1 which 
did drop NA levels even when present in x, contrary to the documentation.  The 
current default is compatible with x[ , drop=FALSE].

>>> The part
>>> x[ , drop=FALSE]
>>> should be
>>> x[ , drop=TRUE]

> Yes, definitely, thank you!
>> a "typo" by me. .. fixed now.

>>> Saying that 'exclude' is factor levels is not quite true for the NA
>>> element. NA may not be an original level, but NA in 'exclude' affects
>>> the result.

>>> For a factor 'x', factor(x, exclude = exclude) doesn't really work for
>>> excluding in general. See, for example,
>>> https://stat.ethz.ch/pipermail/r-help/2005-September/079336.html .
>>> factor(factor(c("a","b","c")), exclude="c")

>>> However, this excludes "2":
>>> factor(factor(2:3), exclude=2)

>>> Rather unexpectedly, this excludes NA:
>>> factor(factor(c("a",NA), exclude=NULL), exclude="c")

Re: [Rd] c(<Matrix>, <matrix>) / help(dotsMethods) etc

2016-09-10 Thread Martin Maechler
I have been asked  (by Roger; thank you for the good question,
and I hope it's fine to answer to the public) :
   
> with Pi a sparse matrix and x,y, and ones
> compatible n-vectors — when I do:

>> c(t(x) %*% Pi %*% ones, t(ones) %*% Pi %*% y )
> [[1]]
> 1 x 1 Matrix of class "dgeMatrix"
>           [,1]
> [1,] 0.1338527
>
> [[2]]
> 1 x 1 Matrix of class "dgeMatrix"
>           [,1]
> [1,] 0.7810341

> I get a list whereas if Pi is an ordinary matrix I get a
> vector.  Is this intentional?

Well, no.  But it has been "unavoidable" in the sense that it had not
been possible to provide S4 methods for '...' in the "remote"
past, when  Matrix was created.

Later ... also quite a few years ago, John Chambers had added
that possibility, with still some limitation (all '...' must be
of the same class), and also plans to remove some of the
limitations, see   ?dotsMethods  in R.

I honestly have forgotten the history of my trying to provide 'c'
methods for our "Matrix" objects after the  'dotsMethods'
possibility had emerged,  but I know I tried and had not seen a
way to succeed "satisfactorily";  now I think I should maybe try again.
I currently think this needs changes to R before it can be done
satisfactorily, and this is the main reason why this is a public
answer to R-devel@..., but I'm happy if I am wrong.

The real challenge here is that I think that if it should "work",
it should do so in all cases, e.g., also for

c(NA, 3:2, Matrix(2:1), matrix(10:11))

and that's not so easy, e.g., the following class and method
definitions do *not* achieve the desired result:

## "mMatrix" is already hidden in Matrix pkg:
setClassUnion("mMatrix", members = c("matrix", "Matrix"))
setClassUnion("numMatrixLike", members =
c("logical", "integer","numeric", "mMatrix"))

c.Matrix <- function(...) unlist(lapply(list(...), as.vector))
## NB: Must use  signature  '(x, ..., recursive = FALSE)' :
setMethod("c", "Matrix", function(x, ..., recursive) c.Matrix(x, ...))
## The above is not sufficient for
##   c(NA, 3:2, <Matrix>, <matrix>) :
setMethod("c", "numMatrixLike", function(x, ..., recursive)
    c.Matrix(x, ...))

## but the above does not really help:

> c(Diagonal(3), NA, Matrix(10:11))   ## works fine,
 [1]  1  0  0  0  1  0  0  0  1 NA 10 11

> c(NA, Diagonal(3)) ## R's lowlevel c() already decided to use list():
[[1]]
[1] NA

[[2]]
     [,1] [,2] [,3]
[1,]    1    .    .
[2,]    .    1    .
[3,]    .    .    1

>
--

BTW, I (and the package users) suffer from exactly the same
problem with the "MPFR" (multi precision numbers) provided by my
package Rmpfr:

> require(Rmpfr)
> c(mpfr(3,100), 1/mpfr(7, 80)) ## works fine
2 'mpfr' numbers of precision  80 .. 100  bits
[1]3 0.14285714285714285714285708

> c(pi, 1/mpfr(7, 80)) ## "fails" even worse than in 'Matrix' case
[[1]]
[1] 3.141593

[[2]]
'mpfr1' 0.14285714285714285714285708

> 


Yes, it would be very nice  if  c(.)  could be used to
concatenate quite arbitrary  R objects into one long atomic
vector, but I don't see how to achieve this easily.

The fact that  c()  just builds a list of its arguments if it
("thinks" it) cannot dispatch to a method is a good strategy,
but I'd hope it should be possible to have c() try to do better
(and hence work for this case) without a noticeable performance
penalty.
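(In the meantime, a user-level workaround in the spirit of c.Matrix
above, here as a hypothetical helper c_flat(), is to coerce explicitly:)

c_flat <- function(...) unlist(lapply(list(...), as.vector))
c_flat(NA, 3:2, matrix(10:11))  # NA  3  2 10 11 : one atomic vector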

Suggestions are very welcome.
Martin


Re: [Rd] table(exclude = NULL) always includes NA

2016-09-10 Thread Martin Maechler
> Suharto Anggono Suharto Anggono 
> on Sat, 10 Sep 2016 02:36:54 + writes:

> Looking at the code of function 'table' in R devel r71227, I see that
> the part "remove NA level if it was added only for excluded in
> factor(a, exclude=.)" is not quite right.
> In
> is.na(a) <- match(a0, c(exclude,NA), nomatch=0L)   ,
> I think that what is intended is
> a[a0 %in% c(exclude,NA)] <- NA  .
yes.
> So, it should be
>   is.na(a) <- match(a0, c(exclude,NA), nomatch=0L) > 0L
> or
>   is.na(a) <- as.logical(match(a0, c(exclude,NA), nomatch=0L))  .
> The parallel code
>is.na(a) <- match(a0,   exclude, nomatch=0L)
> is to be treated similarly.

indeed.  I may have been  very wrongly thinking that `is.na<-`
coerced its value to logical... or otherwise not thinking at all ;-)
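For illustration, a small example of the difference (with made-up a0 and
exclude):

a0 <- 3:1 ; a <- factor(a0) ; exclude <- 1
(m <- match(a0, c(exclude, NA), nomatch = 0L))  # 0 0 1 : indices into the
                                                # table, not positions in 'a'
## is.na(a) <- m        would wrongly set a[1] to NA, while
## is.na(a) <- m > 0L   sets a[3] (the value 1) to NA, as intended.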


> Example that gives wrong result in R devel r71225:
> table(3:1, exclude = 1)
> table(3:1, exclude = 1, useNA = "always")
> 

Thanks a lot, Suharto.   You are entirely correct.

I'm amazed that  table(*, exclude = *)  seems so rarely used / tested,
that this has gone undetected for almost four weeks.
It is fixed now with svn r71230.

Martin



Re: [Rd] c(<Matrix>, <matrix>) / help(dotsMethods) etc

2016-09-10 Thread Martin Maechler
>>>>> John Chambers <j...@r-project.org>
>>>>> on Sat, 10 Sep 2016 09:16:38 -0700 writes:

> (Brief reply, I'm traveling but as per below, this is on my radar
> right now so wanted to comment.)
> Two points regarding "dotsMethods".

> 1.  To clarify the limitation.  It's not that all the arguments have to
> be of the same class, but there must be one class that they belong to or
> subclass.  (So, as in the example in the documentation, the method could
> be defined for a class union or other virtual class that all the actual
> arguments inherit from.)

Thank you for the clarification.
I knew that the limitation "the same class" has not been a big
one, that's why I did use a class union in my example (below).
I thought there were other limitations.. never mind.
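(A tiny sketch of John's point 1, with made-up names csum / numLike:)

library(methods)
setClassUnion("numLike", members = c("numeric", "logical"))
setGeneric("csum", function(...) standardGeneric("csum"))
setMethod("csum", "numLike", function(...) sum(...))
csum(c(1, 2), TRUE, 3.5)  # 7.5 : all arguments inherit from the union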

> 2.  The current documentation is a mess.  In common with lots of other
> very old documentation.  I'm in the process of rewriting a large chunk
> of the documentation, including that for dotsMethods.  Sometime in the
> next few weeks, I hope to have it coherent enough to commit.

That's great!

> So far, I'm not trying to change any significant aspects of the code,
> including for "..." methods, which seem to do roughly what was intended.

Yes, I'm sorry if I sounded like saying something different.

That I think this [getting c() to work for a collection of objects,
some S4] needs changes in R is because it seems that do_c()
fails to dispatch here, and hence the problem was with our C
function that has carried the comment

| * To call this an ugly hack would be to insult all existing ugly hacks
| * at large in the world.

and I don't think I would be able to correctly patch that
infamous function (in src/main/eval.c) ...

Martin

> John


> On Sep 10, 2016, at 8:27 AM, Martin Maechler <maech...@stat.math.ethz.ch> 
wrote:

>> I have been asked  (by Roger; thank you for the good question,
>> and I hope it's fine to answer to the public) :
>> 
>>> with Pi a sparse matrix and x,y, and ones
>>> compatible n-vectors — when I do:
>> 
>>>> c(t(x) %*% Pi %*% ones, t(ones) %*% Pi %*% y )
>>> [[1]]
>>> 1 x 1 Matrix of class "dgeMatrix"
>>>           [,1]
>>> [1,] 0.1338527
>>>
>>> [[2]]
>>> 1 x 1 Matrix of class "dgeMatrix"
>>>           [,1]
>>> [1,] 0.7810341
>> 
>>> I get a list whereas if Pi is an ordinary matrix I get a
>>> vector.  Is this intentional?
>> 
>> Well, no.  But it has been "unavoidable" in the sense that it had not
>> been possible to provide S4 methods for '...' in the "remote"
>> past, when  Matrix was created.
>> 
>> Later ... also quite a few years ago, John Chambers had added
>> that possibility, with still some limitation (all '...' must be
>> of the same class), and also plans to remove some of the
>> limitations, see   ?dotsMethods  in R.
>> 
>> I honestly have forgotten the history of my trying to provide 'c'
>> methods for our "Matrix" objects after the  'dotsMethods'
>> possibility had emerged,  but I know I tried and had not seen a
>> way to succeed "satisfactorily";  now I think I should maybe try again.
>> I currently think this needs changes to R before it can be done
>> satisfactorily, and this is the main reason why this is a public
>> answer to R-devel@..., but I'm happy if I am wrong.
>> 
>> The real challenge here is that I think that if it should "work",
>> it should do so in all cases, e.g., also for
>> 
>> c(NA, 3:2, Matrix(2:1), matrix(10:11))
>> 
>> and that's not so easy, e.g., the following class and method
>> definitions do *not* achieve the desired result:
>> 
>> ## "mMatrix" is already hidden in Matrix pkg:
>> setClassUnion("mMatrix", members = c("matrix", "Matrix"))
>> setClassUnion("numMatrixLike", members =
>> c("logical", "integer","numeric", "mMatrix"))
>> 
>> c.Matrix <- function(...) unlist(lapply(list(...), as.vector))
>> ## NB: Must use  signature  '(x, ..., recursive = FALSE)' :
>> setMethod("c", "Matrix", function(x, ..., recursive) c.Matrix(x, ...))
>> ## The above is not sufficient for
>> ##   c(NA, 3:2, <Matrix>, <matrix>) :
>> setMethod("c", "numMatrixLike", function(x, ..., recursive)
>>     c.Matrix(x, ...))
>> 

Re: [Rd] R (development) changes in arith, logic, relop with (0-extent) arrays

2016-09-09 Thread Martin Maechler
>> 
>>>> Returning a logical of length 0 is more backwards compatible, but is
>>>> it ever what the author actually intended? I have trouble thinking of
>>>> a case where that less-than didn't carry an implicit assumption that
>>>> y was non-NULL.  I can say that in my own code, I've never hit that
>>>> behavior in a case that wasn't an error.
>>>> 
>>>> My vote (unless someone else points out a compelling use for the
>>>> behavior) is for this to throw an error. As a developer, I'd rather
>>>> things like this break so the bug in my logic is visible, rather than
>>>> propagating as the 0-length logical is &'ed or |'ed with other logical
>>>> vectors, or used to subset, or (in the case it should be length 1)
>>>> passed to if() (if throws an error now, but the rest would silently
>>>> "work").
>>>> 
>>>> Best,
>>>> ~G
>>>> 
>>>> On Thu, Sep 8, 2016 at 3:49 AM, Martin Maechler <
>>>> maech...@stat.math.ethz.ch>
>>>> wrote:
>>>> 
>>>> > >>>>> robin hankin <hankin.ro...@gmail.com>
>>>> > >>>>> on Thu, 8 Sep 2016 10:05:21 +1200 writes:
>>>> >
>>>> > > Martin I'd like to make a comment; I think that R's
>>>> > > behaviour on 'edge' cases like this is an important thing
>>>> > > and it's great that you are working on it.
>>>> >
>>>> > > I make heavy use of zero-extent arrays, chiefly because
>>>> > > the dimnames are an efficient and logical way to keep
>>>> > > track of certain types of information.
>>>> >
>>>> > > If I have, for example,
>>>> >
>>>> > > a <- array(0,c(2,0,2))
>>>> >     > dimnames(a) <- list(name=c('Mike','Kevin'),
>>>> > NULL,item=c("hat","scarf"))
>>>> >
>>>> >
>>>> > > Then in R-3.3.1, 70800 I get
>>>> >
>>>> > > a > 0
>>>> > > logical(0)
>>>> > >>
>>>> >
>>>> > > But in 71219 I get
>>>> >
>>>> > > a > 0
>>>> > > , , item = hat
>>>> >
>>>> >
>>>> > > name
>>>> > > Mike
>>>> > > Kevin
>>>> >
>>>> > > , , item = scarf
>>>> >
>>>> >
>>>> > > name
>>>> > > Mike
>>>> > > Kevin
>>>> >
>>>> > > (which is an empty logical array that holds the names of the
>>>> > > people and their clothes). I find the behaviour of 71219 very
>>>> > > much preferable because there is no reason to discard the
>>>> > > information in the dimnames.
>>>> >
>>>> > Thanks a lot, Robin, (and Oliver) !
>>>> >
>>>> > Yes, the above is such a case where the new behavior makes much sense.
>>>> > And this behavior remains identical after the 71222 amendment.
>>>> >
>>>> > Martin
>>>> >
>>>> > > Best wishes
>>>> > > Robin
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > > On Wed, Sep 7, 2016 at 9:49 PM, Martin Maechler <
>>>> > maech...@stat.math.ethz.ch>
>>>> > > wrote:
>>>> >
>>>> > >> >>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>> > >> >>>>> on Tue, 6 Sep 2016 22:26:31 +0200 writes:
>>>> > >>
>>>> > >> > Yesterday, changes to R's development version were 
committed,
>>>> > >> relating
>>>> > >> > to arithmetic,

Re: [Rd] R (development) changes in arith, logic, relop with (0-extent) arrays

2016-09-09 Thread Martin Maechler
> Radford Neal 
> on Thu, 8 Sep 2016 17:11:18 -0400 writes:

> Regarding Martin Maechler's proposal:
> Arithmetic between length-1 arrays and longer non-arrays had
> silently dropped the array attributes and recycled.  This now gives
> a warning and will signal an error in the future, as it has always
> for logic and comparison operations

> For example, matrix(1,1,1) + (1:2) would give a warning/error.

> I think this might be a mistake.

> The potential benefits of this would be detection of some programming
> errors, and increased consistency.  The downsides are breaking
> existing working programs, and decreased consistency.

> Regarding consistency, the overall R philosophy is that attaching an
> attribute to a vector doesn't change what you can do with it, or what
> the result is, except that the result (often) gets the attributes
> carried forward.  By this logic, adding a 'dim' attribute shouldn't
> stop you from doing arithmetic (or comparisons) that you otherwise
> could.

Thank you, Radford, for joining in.
The above is a good line of reasoning.

> But maybe 'dim' attributes are special?  Well, they are in some
> circumstances, and have to be when they are intended to change the
> behaviour, such as when a matrix is used as an index with [.

indeed.

> But in many cases at present, 'dim' attributes DON'T stop you from
> treating the object as a plain vector - for example, one is allowed 
> to do matrix(1:4,2,2)[3], and a<-numeric(10); a[2:5]<-matrix(1,2,2).

agreed.

> So it may make more sense to move towards consistency in the
> permissive direction, rather than the restrictive direction.

> That would mean allowing matrix(1,1,1) < (1:2), and maybe also things
> like matrix(1,2,2)+(1:8).

That is an interesting idea.  Yes, in my view that would
definitely also have to allow the latter, by the above argument
of not treating the dim/dimnames attributes special.  For
non-arrays length-1 is not treated much special apart from the
fact that length-1 can always be recycled (without warning).


> Obviously, a change that removes error conditions is much less likely
> to produce backwards-compatibility problems than a change that gives
> errors for previously-allowed operations.

Of course that is true... and that has also been the reason for
my amendment

> And I think there would be some significant problems. In addition to
> the 10-20+ packages that Martin expects to break, there could be quite
> a bit of user code that would no longer work - scripts for analysing
> data sets that used to work, but now don't with the latest version.

That's not true (at least for the cases above): They would give
a strong warning, "strong" because it is

   > matrix(1,1) + 1:2
   [1] 2 3
   Warning message:
   In matrix(1, 1) + 1:2 :
 dropping dim() of array of length one.  Will become ERROR
   > 

*and* the  logic and relop versions of this, e.g.,
   matrix(TRUE,1) | c(TRUE,FALSE) ;  matrix(1,1) > 1:2,
have always been an  error; so nothing would break there.


> There are reasons to expect such problems.  Some operations such as
> vector dot products using %*% produce results that are always scalar,
> but are formed as 1x1 matrices.

Of course; that *was* the reason the very special treatment for arithmetic
length-1 arrays had been introduced.  It is convenient.

However, *some* of the conveniences in S (and hence R) functions
have been dangerous {and much more used, hence close to
impossible to abolish, e.g., sample(x) when x  is numeric of length 1,
and several others, you'll find in the "R Inferno"}, or at least
quirky for *programming* with R (as opposed to pure interactive use).
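(The sample() convenience referred to, for readers who don't know it:)

sample(c(3.5, 7))  # permutes the two given values
sample(7)          # but a single number n means: a permutation of 1:n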

> One can anticipate that many people
> have not been getting rid of the 'dim' attribute in such cases, when
> doing so hasn't been necessary in the past.

If it remains at 10-20 CRAN packages (out of 9000), each with
just very few instances, that would indicate, I think, not so
widespread use.
Note that they only did not have to get rid of the dim() in the
length-1 case (and only for arithmetic): as soon as they had
another dimension, they would have got an error.

Still, I agree about the validity of your line of thought, and
that in order to get consistency we also could go into the
direction of being more permissive rather than restrictive.

I'm interested to hear other opinions, notably as in recent years
some famous R teachers have typically criticized R as being
not strict enough ...

> Regarding the 0-length vector issue, I agree with other posters that
> after a<-numeric(0), it has to be allowable to write a<1.  To not
> allow this would be highly destructive of code reliability.  And for
> similar reason, after a<-c(), a<1 needs to be allowed, which means
> NULL<1 should be allowed (giving logical(0)), since 

Re: [Rd] Undocumented 'use.names' argument to c()

2016-09-24 Thread Martin Maechler
> Karl Millar via R-devel 
> on Fri, 23 Sep 2016 11:12:49 -0700 writes:

> I'd expect that a lot of the performance overhead could be eliminated
> by simply improving the underlying code.  IMHO, we should ignore it in
> deciding the API that we want here.

I agree partially.  Even if the underlying code can be made
faster, the 'use.names = FALSE' version will still be faster
than the default, notably in some "long" cases.

More further down.

> On Fri, Sep 23, 2016 at 10:54 AM, Henrik Bengtsson
>  wrote:
>> I'd vote for it to stay.  It could of course surprise someone who'd
>> expect c(list(a=1), b=2, use.names = FALSE) to generate list(a=1, b=2,
>> use.names=FALSE).   On the upside, is the performance gain from using
>> use.names=FALSE.  Below benchmarks show that the combining of the
>> names attributes themselves takes ~20-25 times longer than the
>> combining of the integers themselves.  Also, at no surprise,
>> use.names=FALSE avoids some memory allocations.
>> 
>>> options(digits = 2)
>>> 
>>> a <- b <- c <- d <- 1:1e4
>>> names(c) <- c
>>> names(d) <- d
>>> 
>>> stats <- microbenchmark::microbenchmark(
>> +   c(a, b, use.names=FALSE),
>> +   c(c, d, use.names=FALSE),
>> +   c(a, d, use.names=FALSE),
>> +   c(a, b, use.names=TRUE),
>> +   c(a, d, use.names=TRUE),
>> +   c(c, d, use.names=TRUE),
>> +   unit = "ms"
>> + )
>>> 
>>> stats
>> Unit: milliseconds
>>                        expr   min    lq  mean median    uq   max neval
>>  c(a, b, use.names = FALSE) 0.031 0.032 0.049  0.034 0.036 1.474   100
>>  c(c, d, use.names = FALSE) 0.031 0.031 0.035  0.034 0.035 0.064   100
>>  c(a, d, use.names = FALSE) 0.031 0.031 0.049  0.034 0.035 1.452   100
>>   c(a, b, use.names = TRUE) 0.031 0.031 0.055  0.034 0.036 2.094   100
>>   c(a, d, use.names = TRUE) 0.510 0.526 0.588  0.549 0.617 1.998   100
>>   c(c, d, use.names = TRUE) 0.780 0.815 0.886  0.841 0.944 1.430   100
>> 
>>> profmem::profmem(c(c, d, use.names=FALSE))
>> Rprofmem memory profiling of:
>> c(c, d, use.names = FALSE)
>> 
>> Memory allocations:
>>        bytes      calls
>> 1      80040 <internal>
>> total  80040
>> 
>>> profmem::profmem(c(c, d, use.names=TRUE))
>> Rprofmem memory profiling of:
>> c(c, d, use.names = TRUE)
>> 
>> Memory allocations:
>>        bytes      calls
>> 1      80040 <internal>
>> 2     160040 <internal>
>> total 240080
>> 
>> /Henrik
>> 
>> On Fri, Sep 23, 2016 at 10:25 AM, William Dunlap via R-devel
>>  wrote:
>>> In Splus c() and unlist() called the same C code, but with a different
>>> 'sys_index' code (the last argument to .Internal), and c() did not
>>> consider an argument named 'use.names' special.

Thank you, Bill, very much, for making the historical context
clear, and giving us the facts, there.

OTOH, it is also true in R, that  c() and unlist() share code
.. quite a bit less though .. but more importantly, the very
original C code of Ross Ihaka (and possibly Robert Gentleman)
had explicitly considered both extra arguments 'recursive' and
'use.names', and not just the first.

The fact that c() has always been a .Primitive function, and that
these have no formals(), had contributed to what I think was a
documentation glitch early on; when, quite a bit later, we
added a fake argument list for printing, the then-current
documentation was used.

This was the reason for declaring it a documentation "hole"
rather than something we do not want.

(read on)

>>> > c
>>> function(..., recursive = F)
>>> .Internal(c(..., recursive = recursive), "S_unlist", TRUE, 1)
>>> > unlist
>>> function(data, recursive = T, use.names = T)
>>> .Internal(unlist(data, recursive = recursive, use.names = use.names),
>>>           "S_unlist", TRUE, 2)
>>> > c(A=1,B=2,use.names=FALSE)
>>> A B use.names
>>> 1 2         0
>>> 
>>> The C code used sys_index==2 to mean 'the last argument is the
>>> use.names argument'; if sys_index==1 only the recursive argument was
>>> considered special.
>>> 
>>> Sys.funs.c:
>>> 405 S_unlist(vector *ent, vector *arglist, s_evaluator *S_evaluator)
>>> 406 {
>>> 407 int which = sys_index; boolean named, recursive, names;
>>> ...
>>> 419 args = arglist->value.tree; n = arglist->length;
>>> ...
>>> 424 names = which==2 ? logical_value(args[--n], ent, S_evaluator)
>>>                      : (which == 1);
>>> 
>>> Thus there is no historical reason for giving c() the use.names 
argument.
>>> 
>>> 
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>> 
>>> On Fri, Sep 23, 2016 at 9:37 AM, Suharto Anggono Suharto Anggono via
>>> R-devel  wrote:
>>> 
 In S-PLUS 3.4 help on 

Re: [Rd] withAutoprint({ .... }) ?

2016-09-24 Thread Martin Maechler
>>>>> William Dunlap <wdun...@tibco.com>
>>>>> on Fri, 2 Sep 2016 08:33:47 -0700 writes:

> Re withAutoprint(), Splus's source() function could take an expression
> (literal or not) in place of a file name or text, so it could support
> withAutoprint-like functionality in its GUI.  E.g.,

>> source(auto.print=TRUE, exprs.literal= { x <- 3:7 ; sum(x) ; y <- log(x)
>>        ; x - 100}, prompt="--> ")
--> x <- 3:7
--> sum(x)
> [1] 25
--> y <- log(x)
--> x - 100
> [1] -97 -96 -95 -94 -93

> or

>> expr <- quote({ x <- 3:7 ; sum(x) ; y <- log(x) ; x - 100})
>> source(auto.print=TRUE, exprs = expr, prompt="--> ")
--> x <- 3:7
--> sum(x)
> [1] 25
--> y <- log(x)
--> x - 100
> [1] -97 -96 -95 -94 -93

> It was easy to implement, since exprs's default value is parse(file) or
> parse(text=text), which source is calculating anyway.


> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com

Thank you, Bill  (and the other correspondents); that's indeed a
very good suggestion :

I've come to the conclusion that Duncan and Bill are right:  One
should do this in R (not C) and as Bill hinted, one should use
source().  I first tried to do it separately, just "like source()",
but a considerable part of the source of source()  {:-)} is
about using src attributes instead of deparse() when the former
are present,  and it does make sense to generalize
withAutoprint() to have the same feature, so after all, have it
call source().

I've spent a few hours now trying things and variants, and also
found I needed to enhance source() very slightly in a few other
details; now (in my uncommitted version of R-devel),

  withAutoprint({ x <- 1:12; x-1; (y <- (x-5)^2); z <- y; z - 10 })

produces

> withAutoprint({ x <- 1:12; x-1; (y <- (x-5)^2); z <- y; z - 10 })
> x <- 1:12
> x - 1
 [1]  0  1  2  3  4  5  6  7  8  9 10 11
> (y <- (x - 5)^2)
 [1] 16  9  4  1  0  1  4  9 16 25 36 49
> z <- y
> z - 10
 [1]   6  -1  -6  -9 -10  -9  -6  -1   6  15  26  39
> 

and is equivalent to 

   withAutoprint(expression(x <- 1:12, x-1, (y <- (x-5)^2), z <- y, z - 10 ))

I don't see any way around the "mis-feature" that all "input"
expressions are in the end shown twice in the "output" (the
first time by showing the withAutoprint(...) call itself).

The function *name* is "not bad" but also a bit longish;
maybe there are better ideas?  (not longer, no "_" - I know this
is a matter of taste only)

Martin

> On Fri, Sep 2, 2016 at 4:56 AM, Martin Maechler 
<maech...@stat.math.ethz.ch>
> wrote:

>> On R-help, with subject
>> '[R] source() does not include added code'
>> 
>> >>>>> Joshua Ulrich <josh.m.ulr...@gmail.com>
>> >>>>> on Wed, 31 Aug 2016 10:35:01 -0500 writes:
>> 
>> > I have quantstrat installed and it works fine for me.  If you're
>> > asking why the output of t(tradeStats('macross')) isn't being
>> printed,
>> > that's because of what's described in the first paragraph in the
>> > *Details* section of help("source"):
>> 
>> > Note that running code via ‘source’ differs in a few respects from
>> > entering it at the R command line.  Since expressions are not
>> > executed at the top level, auto-printing is not done.  So you will
>> > need to include explicit ‘print’ calls for things you want to be
>> > printed (and remember that this includes plotting by ‘lattice’,
>> > FAQ Q7.22).
>> 
>> 
>> 
>> > So you need:
>> 
>> > print(t(tradeStats('macross')))
>> 
>> > if you want the output printed to the console.
>> 
>> indeed, and "of course"" ;-)
>> 
>> As my subject indicates, this is another case, where it would be
>> very convenient to have a function
>> 
>> withAutoprint()
>> 
>> so the OP could have (hopefully) have used
>> withAutoprint(source(..))
>> though that would have been equivalent to the already nicely existing
>> 
>> source(.., print.eval = TRUE)
>> 
>> which works via the  withVisible(.) utility that returns for each
>> 'expression' if it would auto print or not, and then does print (or
>> not) accordingly.
>> 
>> My own u

Re: [Rd] Undocumented 'use.names' argument to c()

2016-09-26 Thread Martin Maechler
>>>>> Suharto Anggono Suharto Anggono <suharto_angg...@yahoo.com>
>>>>> on Mon, 26 Sep 2016 14:51:11 + writes:

> By "an argument named 'use.names' is included for concatenation", I meant 
something like this, that someone might try.
>> c(as.Date("2016-01-01"), use.names=FALSE)
> use.names 
> "2016-01-01" "1970-01-01" 

> See, 'use.names' is in the output. That's precisely because 'c.Date' 
doesn't have 'use.names', so that 'use.names' is absorbed into '...'.

Yes, of course.
Thank you for the explanation; now I understand what you meant.

Indeed, the situation is not entirely satisfactory:

Ideally, *both* the  'recursive' and 'use.names' arguments of
c() should be considered arguments of only the *default* method of c(),
not the generic.

OTOH, c() being .Primitive() the implementation is in C only,
and (in some sense) of *both* the generic function and the
default method.
The C code clearly treats  'recursive' and 'use.names' "the
same", and has been part of R "forever".

I think that ideally, we should aim for

1) The generic function  c()  only has arguments "..." (or possibly
   ---  because of history of the S4 part ---  "x, ...").

2) The default method has additional arguments
   'recursive = FALSE, use.names = TRUE'
   and other methods of c() can choose if they want to also
   support one or two or none of these extras.
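
For illustration, a method under 2) could look like this (a sketch
only, for a hypothetical class "myclass", not code from R itself):

  c.myclass <- function(..., use.names = TRUE) {
      ## concatenate the underlying data, honouring 'use.names';
      ## 'recursive' is deliberately not supported by this method
      ans <- unlist(lapply(list(...), unclass), use.names = use.names)
      structure(ans, class = "myclass")
  }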

Somewhat related, but in principle independent of '1)'
and '2)' above  -- I think, because of the ".Primitive"-ness of c() --
is the question how 'c' should print in R.
Currently it prints like what I say should just be the default
method.

Honestly, I'm not sure if it would be straightforward or even
just relatively painless to go to  '1) + 2)' ... my change
r71349 (to the S4 generic definition of "c") had dramatic
effects in "package land" and hence reversion of that (with
r71354) was necessary, for the time being.

Martin


> 
> On Sun, 25/9/16, Martin Maechler <maech...@stat.math.ethz.ch> wrote:

> Subject: Re: [Rd] Undocumented 'use.names' argument to c()
> To: "Suharto Anggono Suharto Anggono" <suharto_angg...@yahoo.com>
> Cc: "R-devel" <R-devel@r-project.org>
> Date: Sunday, 25 September, 2016, 10:14 PM
 
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>>>>> on Sun, 25 Sep 2016 14:12:10 + writes:

>>> From comments in
>>> 
http://stackoverflow.com/questions/24815572/why-does-function-c-accept-an-undocumented-argument/24815653
>>> : The code of c() and unlist() was formerly shared but
>>> has been (long time passing) separated. From July 30,
>>> 1998, is where do_c got split into do_c and do_unlist.
>> With the implementation of 'c.Date' in R devel r71350, an
>> argument named 'use.names' is included for
>> concatenation. So, it doesn't follow the documented
>> 'c'. But, 'c.Date' is not explicitly documented in
>> Dates.Rd, that has 'c.Date' as an alias.

> I do not see any  c.Date  in R-devel with a 'use.names'; it's a
> base function, hence not hidden ..

> As mentioned before, 'use.names' is used in unlist() in quite a
> few places, and such an argument also exists for

> lengths()   and
> all.equal.list()

> and now c()

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Coercion of 'exclude' in function 'factor' (was 'droplevels' inappropriate change)

2016-09-30 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Tue, 13 Sep 2016 18:33:35 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>>>>> on Fri, 2 Sep 2016 16:10:00 + writes:

>> I am basically fine with the change.
>> How about using just the following?
>> if(!is.character(exclude))
>> exclude <- as.vector(exclude, typeof(x)) # may result in NA
>> x <- as.character(x)

>> It looks simpler and is, more or less, equivalent.

> yes, but the current code should be slightly faster..

>> In factor.Rd, in description of argument 'exclude', "(when \code{x} is a 
\code{factor} already)" can be removed.


>> A larger change that, I think, is reasonable is entirely removing the 
code
>> exclude <- as.vector(exclude, typeof(x)) # may result in NA

>> The explicit coercion of 'exclude' is not necessary. 
>> Function 'factor' works without it.

>> The coercion of 'exclude' may lead to a surprise because it "may result 
in NA". 
>> Example from 
https://stat.ethz.ch/pipermail/r-help/2005-April/069053.html :
>> factor(as.integer(c(1,2,3,3,NA)), exclude=NaN)
>> excludes NA.

>> As a bonus, without the coercion of 'exclude', 'exclude' can be a factor 
if 'x' is a factor. This part of an example in 
https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html works.
>> cc <- c("x","y","NA")
>> ff <- factor(cc)
>> factor(ff,exclude=ff[3])

> Yes, two good reasons for a change.  factor() would finally
> behave according to the documentation which has been mentioning
> that 'exclude' could be a factor: ((Until my R-devel changes of a
> few weeks ago, i.e. at least in all recent released versions of R)),
> the help page for factor has said

> || If 'exclude' is used it should either be a factor with the same
> || level set as 'x' or a set of codes for the levels to be excluded.

>> However, the coercion of 'exclude' has been in function 'factor' in R 
"forever".

> Indeed: On March 6, 1998, svn rev. 845, when the R source files got a
> '.R' appended, and quite a long time before  R 1.0.0,
> the factor function was as short as (but using an .Internal(.) !)

> function (x, levels = sort(unique(x), na.last = TRUE), labels, exclude = 
NA, 
>   ordered = FALSE) 
> {
>   if (length(x) == 0) 
>   return(character(0))
>   exclude <- as.vector(exclude, typeof(x))
>   levels <- levels[is.na(match(levels, exclude))]
>   x <- .Internal(factor(match(x, levels), length(levels), 
>   ordered))
>   if (missing(labels)) 
>   levels(x) <- levels
>   else levels(x) <- labels
>   x
> }

> and already contained that line.

> Nevertheless, simplifying factor() by removing that line (or those
> 2 lines, now!) seems to only have advantages

> I'm not yet committing to anything, but currently would strongly
> consider it .. though *after* the
> '  OP  '
> issue has settled a bit.

  (Which it has;  the decision has been to keep it.)

I have now committed Suharto's proposal above, to entirely drop the
exclude <- as.vector(exclude, typeof(x))
parts
in the factor() function...  which has the two advantages
mentioned above and simplifies the code (and documentation).


r71424 | maechler | 2016-09-30 14:38:43 +0200 (Fri, 30 Sep 2016) | 1 line
Changed paths:
   M /trunk/doc/NEWS.Rd
   M /trunk/src/library/base/R/factor.R
   M /trunk/src/library/base/man/factor.Rd
   M /trunk/tests/reg-tests-1c.R

simplify factor(), allowing 'exclude= ' as documented in R <= 3.3.x


I do expect some "reaction" in CRAN/Bioconductor package space,
so the final word has not been spoken on this, but the new code
is more aesthetic to me.

Thank you for the suggestion,
Martin 


>> 
>> On Wed, 31/8/16, Martin Maechler <maech...@stat.math.ethz.ch> wrote:

>> Subject: Re: [Rd] 'droplevels' inappropriate change

>> Cc: "Martin Maechler" <maech...@stat.math.ethz.ch>
>> Date: Wednesday, 31 August, 2016, 2:51 PM
 
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Sat, 27 Aug 2016 18:55:37 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>

Re: [Rd] Numerical accuracy of matrix multiplication

2016-09-20 Thread Martin Maechler
>>>>> peter dalgaard <pda...@gmail.com>
>>>>> on Fri, 16 Sep 2016 13:33:11 +0200 writes:

> On 16 Sep 2016, at 12:41 , Alexis Sarda <alexis.sa...@gmail.com> wrote:

>> Hello,
>> 
>> while testing the crossprod() function under Linux, I noticed the 
following:
>> 
>> set.seed(883)
>> x <- rnorm(100)
>> x %*% x - sum(x^2) # equal to 1.421085e-14
>> 
>> Is this difference normal? It seems to be rather large for double 
precision.
>> 

> It's less than .Machine$double.eps, relative (!) to x  %*% x ~= 100.

indeed!

Still, it gives exactly 0 on my platform(s), where I'm using R's
own version of BLAS / Lapack.

Are you perhaps using an "optimized" BLAS / LAPACK, i.e., one
that is fast but slightly less accurate?

Martin Maechler,
ETH Zurich


> -pd

>> Regards,
>> Alexis.
>> 
>> [[alternative HTML version deleted]]
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd@cbs.dk  Priv: pda...@gmail.com

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Numerical accuracy of matrix multiplication

2016-09-20 Thread Martin Maechler
>>>>> Alexis Sarda <alexis.sa...@gmail.com>
>>>>> on Tue, 20 Sep 2016 17:33:49 +0200 writes:

> I just realized that I was actually using a different random number
> generator, could that be a valid reason for the discrepancy?

> The code should be:

> RNGkind("L'Ecuyer")
> set.seed(883)
> x <- rnorm(100)
> x %*% x - sum(x^2) # equal to 1.421085e-14


Yes, now I get the same result so my story on  "BLAS / LAPACK"
is not relevant here.

But do note the main point from Peter Dalgaard that this is well
within Machine epsilon precision.

More precisely, here it is really a one-bit difference in the
least significant bit:

> print(rbind( x%*%x, crossprod(x), sum(x^2)), digits= 19)
 [,1]
[1,] 103.5096830356289814
[2,] 103.5096830356289814
[3,] 103.5096830356289672
> cbind(sprintf("%a", c(x%*%x, crossprod(x), sum(x^2))))
 [,1]  
[1,] "0x1.9e09ea598568fp+6"
[2,] "0x1.9e09ea598568fp+6"
[3,] "0x1.9e09ea598568ep+6"
> 


> Regards,
> Alexis Sarda.



> On Tue, Sep 20, 2016 at 5:27 PM, Martin Maechler 
<maech...@stat.math.ethz.ch
>> wrote:

>> >>>>> peter dalgaard <pda...@gmail.com>
>> >>>>> on Fri, 16 Sep 2016 13:33:11 +0200 writes:
>> 
>> > On 16 Sep 2016, at 12:41 , Alexis Sarda <alexis.sa...@gmail.com>
>> wrote:
>> 
>> >> Hello,
>> >>
>> >> while testing the crossprod() function under Linux, I noticed the
>> following:
>> >>
>> >> set.seed(883)
>> >> x <- rnorm(100)
>> >> x %*% x - sum(x^2) # equal to 1.421085e-14
>> >>
>> >> Is this difference normal? It seems to be rather large for double
>> precision.
>> >>
>> 
>> > It's less than .Machine$double.eps, relative (!) to x  %*% x ~= 100.
>> 
>> indeed!
>> 
>> Still, it gives exactly 0 on my platform(s), where I'm using R's
>> own version of BLAS / Lapack.
>> 
>> Are you perhaps using an "optimized" BLAS / LAPACK, i.e., one
>> that is fast but slightly less accurate?
>> 
>> Martin Maechler,
>> ETH Zurich
>> 
>> 
>> > -pd
>> 
>> >> Regards,
>> >> Alexis.
>> >>
>> >> [[alternative HTML version deleted]]
>> >>
>> >> __
>> >> R-devel@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> > --
>> > Peter Dalgaard, Professor,
>> > Center for Statistics, Copenhagen Business School
>> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> > Phone: (+45)38153501
>> > Office: A 4.23
>> > Email: pd@cbs.dk  Priv: pda...@gmail.com
>> 
>> > __
>> > R-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>> 

> [[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Undocumented 'use.names' argument to c()

2016-09-21 Thread Martin Maechler
> David Winsemius 
> on Tue, 20 Sep 2016 23:46:48 -0700 writes:

>> On Sep 20, 2016, at 7:18 PM, Karl Millar via R-devel 
 wrote:
>> 
>> 'c' has an undocumented 'use.names' argument.  I'm not sure if this is
>> a documentation or implementation bug.

> It came up on stackoverflow a couple of years ago:

> 
http://stackoverflow.com/questions/24815572/why-does-function-c-accept-an-undocumented-argument/24815653#24815653

> At the time it appeared to me to be a documentation lag.

Thank you, Karl and David,
yes it is a documentation glitch ... and a bit more:  Experts know that
print()ing of primitive functions is, eehm, "special".

I've committed a change to R-devel ... (with the intent to port
to R-patched).

Martin

>> 
>>> c(a = 1)
>> a
>> 1
>>> c(a = 1, use.names = F)
>> [1] 1
>> 
>> Karl

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] 'droplevels' inappropriate change

2016-08-27 Thread Martin Maechler
> Suharto Anggono Suharto Anggono via R-devel 
> on Sat, 27 Aug 2016 03:17:32 + writes:

> In R devel r71157, 'droplevels' documentation, in "Arguments" section, 
says this about argument 'exclude'.
> passed to factor(); factor levels which should be excluded from the 
result even if present.  Note that this was implicitly NA in R <= 3.3.1 which 
did drop NA levels even when present in x, contrary to the documentation.  The 
current default is compatible with x[ , drop=FALSE].

> The part
> x[ , drop=FALSE]
> should be
> x[ , drop=TRUE]

Yes, definitely, thank you!
a "typo" by me. .. fixed now.

> Saying that 'exclude' is factor levels is not quite true for the NA element. 
NA may not be an original level, but NA in 'exclude' affects the result.

> For a factor 'x', factor(x, exclude = exclude) doesn't really work for 
excluding in general. See, for example, 
https://stat.ethz.ch/pipermail/r-help/2005-September/079336.html .
> factor(factor(c("a","b","c")), exclude="c")

> However, this excludes "2":
> factor(factor(2:3), exclude=2)

> Rather unexpectedly, this excludes NA:
> factor(factor(c("a",NA), exclude=NULL), exclude="c")

> For a factor 'x', factor(x, exclude = exclude) can only exclude 
integer-like or NA levels. An explanation is in 
https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html .

Well, Peter Dalgaard (in that R-devel e-mail, a bit more than 5
years ago) is confirming the problem there,  and suggesting (as
you, right?) that actually   `factor()` is not behaving
correctly here.

And your persistence is finally getting close to convince me
that it is not just droplevels(), but  factor() itself which
needs care here.

Interestingly, the following patch *does* pass 'make check-all'
(after small change in tests/reg-tests-1b.R which is ok),
and leads to behavior which is much closer to the documentation,
notably, your two examples above would give what one would
expect.

(( If the R-Hub would support experiments with branches of R-devel 
   from R-core members,  I could just create such a branch and R Hub
   would run 'R CMD check '  for thousands of CRAN packages
   and provide a web page with the *differences* in the package
   check results ... so we could see ... ))

I do agree that we should strongly consider such a change.

Martin


--- factor.R	(revision 71157)
+++ factor.R	(working copy)
@@ -28,8 +28,12 @@
 	levels <- unique(y[ind])
 }
 force(ordered) # check if original x is an ordered factor
-exclude <- as.vector(exclude, typeof(x)) # may result in NA
-x <- as.character(x)
+if(!is.character(x)) {
+	if(!is.character(exclude))
+	exclude <- as.vector(exclude, typeof(x)) # may result in NA
+	x <- as.character(x)
+} else
+	exclude <- as.vector(exclude, typeof(x)) # may result in NA
 ## levels could be a long vectors, but match will not handle that.
 levels <- levels[is.na(match(levels, exclude))]
 f <- match(x, levels)
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] "plot.ts" doesn't respect the value of "pch" (+ blocked from Bugzilla signups)

2016-08-26 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Fri, 26 Aug 2016 09:31:41 +0200 writes:

>>>>> Gregory Werbin <greg.wer...@libertymail.net>
>>>>> on Thu, 25 Aug 2016 15:21:05 -0400 writes:

>> I've had a chance to read the source more thoroughly. The chain of 
>> events is as follows:

>> 1. Local function `plotts()` is defined with argument `cex` that 
>> defaults to `par("cex")`
>> 2. `...` is passed to `plotts()`. If "cex" is an element in `...`, 
>> inside `plotts()` the variable `cex` is assigned thereby (overriding the 
>> default arg). Importantly, this means that the element "cex" is captured 
>> and _removed_ from `...`. `...` is eventually passed to `plot.window()`.
>> 3.
>> - In the univariate case (NCOL(x) == 1): When `lines.default()` is 
>> called to actually plot the data 
>> 
(https://github.com/wch/r-source/blob/trunk/src/library/stats/R/ts.R#L588 
>> and 
>> https://github.com/wch/r-source/blob/trunk/src/library/stats/R/ts.R#L597 
>> for reference), `cex` is not included in the call.
>> - In the bivariate case (NCOL(x) > 1): Because "cex" was captured and 
>> removed from `...`, it is not passed to `plot.default` when it is called 
>> 
(https://github.com/wch/r-source/blob/trunk/src/library/stats/R/ts.R#L548).

>> It turns out that the "eating" is not being done by `...` but by the 
>> signature of `plotts`.

>> The documentation currently reads:

>>> ...: additional graphical arguments, see 'plot', 'plot.default' and 
>>> 'par'.

>> This, to me, suggests parity with the 'plot' function in how the 
>> arguments in '...' are handled. Therefore either the code is wrong or 
>> the documentation is incomplete and misleading.

> the code is not perfect aka "wrong" .. so the bug is there.
> Making the minimal reproducible example more concise,

> plot(as.ts((-10:12)^3), type="b", cex=.5)
> plot( ((-10:12)^3), type="b", cex=.5)

> should plot identically ... but currently don't

And there are more (such) problems:
E.g., lty and lwd are not propagated in the (x,y) case,

 plot.ts(as.ts(1:300), cumsum(rnorm(300)), type = "b", cex = 0.5, lwd = 2)

and also not in the "multiple" / matrix case.
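
The capturing mechanism itself is easy to demonstrate in isolation
(a sketch with toy functions f() and g(), mimicking plotts() and
what it calls):

  g <- function(...) names(list(...))          # stands in for plot.window() etc
  f <- function(..., cex = par("cex")) g(...)  # like plotts(): 'cex' is a formal
  f(pch = 16, cex = 0.5) # "pch" only -- 'cex' was absorbed by f()'s formal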

I will commit a fix to  R-devel in a moment... but would be glad
for a careful review.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] summary.default rounding on numeric seems inconsistent with other R behaviors

2016-08-23 Thread Martin Maechler
> Dirk Eddelbuettel 
> on Fri, 19 Aug 2016 11:40:05 -0500 writes:

> It is the old story of defined behaviour and expected outcomes. Hard to
> change now.

yes...  not impossible though... see below

> So I would suggest you do something like this in your ~/.Rprofile:

R> smry <- function(...) summary(..., digits=6)
R> smry(15L)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
>      15      15      15      15      15      15 
R> 

> Maybe call it Summary() instead.

yes, do use a different name.   There are other such functions, e.g. 'summarize()'.

Simone wrote

> I had raised the matter ten years ago, and I was told that the topic was
> already very^3 old
> 
> https://stat.ethz.ch/pipermail/r-devel/2006-September/042684.html
> 
> there is some discussion on its origin and also a declaration of intents to
> change the default behaviour, which, unfortunately, remained a declaration.
> I agree that R could do better here, let's hope in less than ten years
> though. ;-)

and the 2006 thread he mentions is basically a similar question
and a reply by me that I agreed to some extent that a change was
desirable ... originally we had adhered to the S "standard"
which became the S+ one and at that time I did still have access
to a running instance of S-PLUS 6.2 where I had seen that
Insightful (the company curating and selling S-PLUS)
also had decided to change the ~15 year old S "standard"... and
indeed I was implicitly *asking* for proposals of such a change,
but I think I never saw a (careful) proposal.

In the spirit of probably 99% of other "base R" code, a change
should really *not* round __at all__ in the summary() methods,
but *only* in the print() methods of such summary() results.

OTOH, for back compatibility, if a user does use  summary(.., digits=.)
explicitly, these digits should be 'obeyed' of course.

I think summary(<1-variable>)  could easily, and relatively "back-compatibly"
be changed in the above vein.
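
In code, the intended pattern is roughly this (hypothetical names
summary2 / print.summary2, purely illustrating the principle, *not*
the actual patch):

  summary2 <- function(object) {
      qq <- stats::quantile(object)
      structure(c(qq[1:3], Mean = mean(object), qq[4:5]),
                class = "summary2")  # full precision kept here
  }
  print.summary2 <- function(x, digits = max(3L, getOption("digits") - 3L), ...) {
      print(signif(unclass(x), digits), ...)  # round only when printing
      invisible(x)
  }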

One "real problem" is the wrong decision (also from S and S-PLUS
times IIRC) to return a "character" matrix for
   summary(, ..)
or summary(, ..)
(For a data frame, I think it should return a list() of
 single-variable summary()es, or then a numeric matrix .. in
 both cases have a good print() method)

because when you return a character matrix, all the numbers are
already rounded, ... and if we follow the above approach they 
would have to be rounded further... ``the horror''

I wonder how much code out there is relying on the internal
structure of  summary().. because that is the one
part I'd definitely want to change, too.


Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] summary.default rounding on numeric seems inconsistent with other R behaviors

2016-08-25 Thread Martin Maechler
>>>>> John Mount <jmo...@win-vector.com>
>>>>> on Wed, 24 Aug 2016 07:25:50 -0700 writes:

>> On Aug 24, 2016, at 2:36 AM, Martin Maechler
>> <maech...@stat.math.ethz.ch> wrote:
>> 
>>>>>>> 
>> 
>> [Talking to myself .. ;-)] Yes, but that's the tough part
>> to change.
>> 
>> This thread's topic is really only about changing
>> summary.default(), and I have started testing such a
>> change now, and that does seem very sensible:
>> 
>> - No rounding in summary.default(), but - (almost)
>> back-compatible rounding in its print() method.
>> 
>> My current plan is to commit this to R-devel in a day or
>> so, unless unforeseen issues emerge.
>> 
>> Martin
>> 

> That is potentially a very good outcome.  Thank you so
> much for producing and testing a patch.

I have now committed such a change to R-devel:


r71150 | maechler | 2016-08-25 21:57:19 +0200 (Thu, 25 Aug 2016) | 1 line
Changed paths:
   M /trunk/doc/NEWS.Rd
   M /trunk/src/library/base/R/summary.R
   M /trunk/src/library/base/man/summary.Rd
   M /trunk/src/library/stats/R/ecdf.R
   M /trunk/tests/Examples/stats-Ex.Rout.save
   M /trunk/tests/reg-tests-2.Rout.save

summary.default() no longer rounds by default; just *prints* rounded



I do expect quite a few packages giving slightly changed output,
typically a uniformly not-worse one, but just "typically".

Note that I did also have to patch   stats:::print.summary.ecdf()
because that had relied on the fact that summary() did
round itself already.
Other useR's code may need similar changes... and so this *is* a
user visible change, listed accordingly in NEWS (the above doc/NEWS.Rd in
the sources).
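
E.g., user code that relied on summary() returning already-rounded
numbers would now have to round explicitly, along these lines (a sketch):

  s <- summary(exp(rnorm(100)))
  signif(s, 4)          # round the values themselves, or
  print(s, digits = 4)  # let the print() method do the rounding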

I hope very much that the overall and longer term benefit will
vastly outweigh the nuisance (to people publishing, e.g.) that
quite a few "basic" outputs will slightly change.

The benefit for maintainers and old timers like me will be that
we will not need to answer this (non-official) FAQ nor excuse a
peculiar behavior in the future.
But yes, I expect a flurry of questions starting in April 2017,
and hope that the smart readers of this list will share the load
answering them .. ;-)


Martin Maechler
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] "plot.ts" doesn't respect the value of "pch" (+ blocked from Bugzilla signups)

2016-08-26 Thread Martin Maechler
> Gregory Werbin 
> on Thu, 25 Aug 2016 15:21:05 -0400 writes:

> I've had a chance to read the source more thoroughly. The chain of 
> events is as follows:

> 1. Local function `plotts()` is defined with argument `cex` that 
> defaults to `par("cex")`
> 2. `...` is passed to `plotts()`. If "cex" is an element in `...`, 
> inside `plotts()` the variable `cex` is assigned thereby (overriding the 
> default arg). Importantly, this means that the element "cex" is captured 
> and _removed_ from `...`. `...` is eventually passed to `plot.window()`.
> 3.
> - In the univariate case (NCOL(x) == 1): When `lines.default()` is 
> called to actually plot the data 
> (https://github.com/wch/r-source/blob/trunk/src/library/stats/R/ts.R#L588 
> and 
> https://github.com/wch/r-source/blob/trunk/src/library/stats/R/ts.R#L597 
> for reference), `cex` is not included in the call.
> - In the bivariate case (NCOL(x) > 1): Because "cex" was captured and 
> removed from `...`, it is not passed to `plot.default` when it is called 
> 
(https://github.com/wch/r-source/blob/trunk/src/library/stats/R/ts.R#L548).

> It turns out that the "eating" is not being done by `...` but by the 
> signature of `plotts`.

> The documentation currently reads:

>> ...: additional graphical arguments, see 'plot', 'plot.default' and 
>> 'par'.

> This, to me, suggests parity with the 'plot' function in how the 
> arguments in '...' are handled. Therefore either the code is wrong or 
> the documentation is incomplete and misleading.

the code is not perfect aka "wrong" .. so the bug is there.
Making the minimal reproducible example more concise,

 plot(as.ts((-10:12)^3), type="b", cex=.5)
 plot( ((-10:12)^3), type="b", cex=.5)

should plot identically ... but currently don't


> I filed this as a bug because it's undocumented, and inconsistent 
> with how other arguments typically passed through `plot.default` are 
> handled.

> I'll be happy to do the patch myself -- I just need to know which thing 
> to patch (the source or the docs).

[yes... and please subscribe to bugzilla which I told you
 yesterday I had explicitly opened for you !]

Martin
 
> Greg


> On 2016-08-25 03:00, David Winsemius wrote:

>>> On Aug 24, 2016, at 5:59 PM, Gregory Werbin 
>>>  wrote:
>>> 
>>> I did a search on Bugzilla for "plot.ts" and didn't find anything on 
>>> this issue. I tried to sign up for Bugzilla to report it, but my 
>>> e-mail address didn't pass your "syntax checking" for a legal e-mail 
>>> address.
>>> 
>>> The bug is easily reproducible on my machine as follows:
>>> 
>>> ## start
>>> 
>>> # generate some data
>>> y <- arima.sim(list(), 150)
>>> 
>>> # this will definitely dispatch to a ".ts" method
>>> class(y)[1] == 'ts'
>>> 
>>> # compare and note that `cex = 0.5` has no effect
>>> plot(y, type = 'b', pch = 16)
>>> plot(y, type = 'b', pch = 16, cex = 0.5)
>>> 
>>> # it works if `y` is coerced back to a regular vector
>>> plot(as.numeric(y), type = 'b', pch = 16, cex = 0.5)
>>> 
>>> # another way to see the issue
>>> plot.ts(y, type = 'b', pch = 16, cex = 0.5)
>>> plot.default(y, type = 'b', pch = 16, cex = 0.5)
>>> 
>>> ## end
>>> 
>>> Skimming through source code for `plot.ts`, it seems like the `cex` 
>>> argument is being "eaten" by a `...` somewhere without being properly 
>>> passed to `plot.default`.
>> 
>> '...' does not "eat" parameters, it passes them on.
>> 
>> Looking at the very top of the body we see this in the definition of 
>> the internal `plotts` function:
>> 
>> cex = par("cex"), lty = par("lty"), lwd = par("lwd"),
>> axes = TRUE, frame.plot = axes, ann = par("ann"), cex.lab = 
>> par("cex.lab"),
>> col.lab = par("col.lab"), font.lab = par("font.lab"),
>> cex.axis = par("cex.axis"), col.axis = par("col.axis"),
>> 
>> And at the end of the body we see the call to plotts (including the 
>> "dots")
>> 
>> So I would suggest using par-settings.
>> 
>> par(cex=0.5)
>> plot(y, type = 'b', pch = 16)
>> 
>> (Question seems more appropriate for r-help.)
>> 
>> --
>> David.
>> 
>>> The output of `R.version` is:
>>> platform   x86_64-apple-darwin15.5.0
>>> arch   x86_64
>>> os darwin15.5.0
>>> system x86_64, darwin15.5.0
>>> status
>>> major  3
>>> minor  3.1
>>> year   2016
>>> month  06
>>> day21
>>> svn rev70800
>>> language   R
>>> version.string R version 3.3.1 (2016-06-21)
>>> nickname   Bug in Your Hair
>>> 
>>> Greg
>>> 
>>> 

Re: [Rd] Undocumented 'use.names' argument to c()

2016-09-29 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Mon, 26 Sep 2016 18:26:25 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono <suharto_angg...@yahoo.com>
>>>>> on Mon, 26 Sep 2016 14:51:11 + writes:

>> By "an argument named 'use.names' is included for concatenation", I 
meant something like this, that someone might try.
>>> c(as.Date("2016-01-01"), use.names=FALSE)
>> use.names 
>> "2016-01-01" "1970-01-01" 

>> See, 'use.names' is in the output. That's precisely because 'c.Date' 
doesn't have 'use.names', so that 'use.names' is absorbed into '...'.

> Yes, of course.
> Thank you for the explanation; now I understand what you meant.

> Indeed, the situation is not entirely satisfactory:

> Ideally, *both* the  'recursive' and 'use.names' arguments of
> c() should be considered arguments of only the *default* method of c(),
> not the generic.

> OTOH, c() being .Primitive() the implementation is in C only,
> and (in some sense) of *both* the generic function and the
> default method.
> The C code clearly treats  'recursive' and 'use.names' "the
> same", and has been part of R "forever".

> I think that ideally, we should aim for

> 1) The generic function  c()  only has arguments "..." (or possibly
> ---  because of history of the S4 part ---  "x, ...").

> 2) The default method has additional arguments
> 'recursive = FALSE, use.names = TRUE'
> and other methods of c() can choose if they want to also
> support one or two or none of these extras.

> Somewhat related, but in principle independent of '1)'
> and '2)' above  -- I think, because of the ".Primitive"-ness of c() --
> is the question how 'c' should print in R.
> Currently it prints like what I say should just be the default
> method.

> Honestly, I'm not sure if it would be straightforward or even
> just relatively painless to go to  '1) + 2)' ... my change
> r71349 (to the S4 generic definition of "c") had dramatic
> effects in "package land" and hence reversion of that (with
> r71354) was necessary, for the time being.

I have just now committed a change to R-devel which on the   ?c
help page gives

| Usage:
| 
|  ## S3 Generic function
|  c(...)
|  
|  ## Default S3 method:
|  c(..., recursive = FALSE, use.names = TRUE)

and in the console, simply

| > c
| function (...)  .Primitive("c")
| > 

and am considering committing a similar change to the place
where S4 generic c() is set up in the 'methods' package.

If this persists,  methods for c(), S3 or S4, will have the
freedom to carry none, one, or both of  'recursive' and
'use.names' arguments.

  > methods(c)
  [1] c.bibentry*       c.Date            c.difftime        c.noquote        
  [5] c.numeric_version c.person*         c.POSIXct         c.POSIXlt        
  [9] c.warnings       

Currently, most existing c() methods have a 'recursive = FALSE'
*and* ignore a 'recursive' specification completely .. and as
Suharto has noted already, of course they do not have a
'use.names' argument yet and so do not ignore it, but treat it
as a regular argument (to be concatenated).

One consequence of this change (the above commit) is that in
principle all c() methods which have more than the '...'
arguments should be documented as "they have surprising
arguments":  They have a 'recursive' argument which is not part
of the generic.

I would say that this "should be documented" is rather a good
thing, because indeed they do silently ignore any 'recursive = foobar()'
and that should be documented, e.g., in current R (and R-devel):

  > c(Sys.Date(), recursive=quote(foobar_nonsense()))
  [1] "2016-09-29"
  > 

which is not well documented, I'd say

Martin



>> 
>> On Sun, 25/9/16, Martin Maechler <maech...@stat.math.ethz.ch> wrote:

>> Subject: Re: [Rd] Undocumented 'use.names' argument to c()
>> To: "Suharto Anggono Suharto Anggono" <suharto_angg...@yahoo.com>
>> Cc: "R-devel" <R-devel@r-project.org>
>> Date: Sunday, 25 September, 2016, 10:14 PM
 
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>>>>> on Sun, 25 Sep 2016 14:12:10 + writes:

>>>> From comments in
>>>> 
http://stackoverflow.com/questions/24815572/why-does-function-c-accept-an-undocumented-argument/24815653
>>>> : The code of c() and unlist() was formerly share

Re: [Rd] withAutoprint({ .... }) ?

2016-09-27 Thread Martin Maechler
>>>>> Henrik Bengtsson <henrik.bengts...@gmail.com>
>>>>> on Sun, 25 Sep 2016 12:38:27 -0700 writes:

    > On Sun, Sep 25, 2016 at 9:29 AM, Martin Maechler
> <maech...@stat.math.ethz.ch> wrote:
>>>>>>> Henrik Bengtsson <henrik.bengts...@gmail.com> on
>>>>>>> Sat, 24 Sep 2016 11:31:49 -0700 writes:
>> 
>> > Martin, did you post your code for withAutoprint()
>> anywhere?  > Building withAutoprint() on top of source()
>> definitely makes sense, > unless, as Bill says, source()
>> itself could provide the same feature.
>> 
>> I was really mainly asking for advice about the function
>> name .. and got none.

> I missed that part.  I think the name is good.  A shorter
> alternative would be withEcho(), but could be a little bit
> misleading since it doesn't reflect 'print=TRUE' to
> source().

>> 
>> I'm now committing my version (including (somewhat incomplete)
>> documentation, so you (all) can look at it and try / test it further.
>> 
>> > To differentiate between withAutoprint({ x <- 1 }) and
>> > withAutoprint(expr) where expr is an expression / language object, one
>> > could have an optional argument `substitute=TRUE`, e.g.
>> 
>> > withAutoprint <- function(expr, substitute = TRUE, ...) {
>> >if (substitute) expr <- substitute(expr)
>> >[...]
>> > }
>> 
>> I think my approach is nicer insofar as it does not seem to need
>> such an argument.  I'm sure you'll try to disprove that ;-)

> Nah, I like that you've extended source() with the 'exprs' argument.

> May I suggest to add:

> svn diff src/library/base/R/
> Index: src/library/base/R/source.R
> ===
> --- src/library/base/R/source.R (revision 71357)
> +++ src/library/base/R/source.R (working copy)
> @@ -198,7 +198,7 @@
> if (!tail) {
> # Deparse.  Must drop "expression(...)"
>          dep <- substr(paste(deparse(ei, width.cutoff = width.cutoff,
> -                                    control = "showAttributes"),
> +                                    control = c("keepInteger", "showAttributes")),
>                        collapse = "\n"), 12L, 1e+06L)
>          dep <- paste0(prompt.echo,
>                        gsub("\n", paste0("\n", continue.echo), dep))

> such that you get:

>> withAutoprint(x <- c(1L, NA_integer_, NA))
>> x <- c(1L, NA_integer_, NA)

> because without it, you get:

>> withAutoprint(x <- c(1L, NA_integer_, NA))
>> x <- c(1, NA, NA)

That's a very good consideration.
However, your change would change the semantics of source(),
not just those of withAutoprint(), and I would not want to do
that ... at least not at the moment. 

What I've done instead is to make this yet another new
argument of both source() and withAutoprint(),
called 'deparseCtrl', with different defaults (currently)
for the 2 functions.
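
So, with that argument, Henrik's example becomes something like
(sketch against my uncommitted version; names as described above):

  withAutoprint(x <- c(1L, NA_integer_, NA),
                deparseCtrl = c("keepInteger", "showAttributes"))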

Thank you for the feedback!
Martin




> Thanks,
> Henrik


>> 
>> Martin
>> 
>> > Just some thoughts
>> > /Henrik
>> 
>> 
>> > On Sat, Sep 24, 2016 at 6:37 AM, Martin Maechler
>> > <maech...@stat.math.ethz.ch> wrote:
>> >>>>>>> William Dunlap <wdun...@tibco.com>
>> >>>>>>> on Fri, 2 Sep 2016 08:33:47 -0700 writes:
>> >>
>> >> > Re withAutoprint(), Splus's source() function could take a 
expression
>> >> > (literal or not) in place of a file name or text so it could support
>> >> > withAutoprint-like functionality in its GUI.  E.g.,
>> >>
>> >> >> source(auto.print=TRUE, exprs.literal= { x <- 3:7 ; sum(x) ; y <- 
log(x)
>> >> > ; x - 100}, prompt="--> ")
--> x <- 3:7
--> sum(x)
>> >> > [1] 25
--> y <- log(x)
--> x - 100
>> >> > [1] -97 -96 -95 -94 -93
>> >>
>> >> > or
>> >>
>> >> >> expr <- quote({ x <- 3:7 ; sum(x) ; y <- log(x) ; x - 100})
>> >> >> source(auto.print=TRUE, exprs = expr, prompt="--> ")
--> x <- 3:7
--> sum(x)
>> >> > [1] 25
--> y <- log(x)
--> x - 100
>> >> > [1] -97 -96 

Re: [Rd] withAutoprint({ .... }) ?

2016-09-25 Thread Martin Maechler
>>>>> Henrik Bengtsson <henrik.bengts...@gmail.com>
>>>>> on Sat, 24 Sep 2016 11:31:49 -0700 writes:

> Martin, did you post your code for withAutoprint() anywhere?
> Building withAutoprint() on top of source() definitely makes sense,
> unless, as Bill says, source() itself could provide the same feature.

I was really mainly asking for advice about the function name
.. and got none.

I'm now committing my version (including (somewhat incomplete)
documentation, so you (all) can look at it and try / test it further.

> To differentiate between withAutoprint({ x <- 1 }) and
> withAutoprint(expr) where expr is an expression / language object, one
> could have an optional argument `substitute=TRUE`, e.g.

> withAutoprint <- function(expr, substitute = TRUE, ...) {
>if (substitute) expr <- substitute(expr)
>[...]
> }

I think my approach is nicer insofar as it does not seem to need
such an argument.  I'm sure you'll try to disprove that ;-)

Martin

> Just some thoughts
> /Henrik


> On Sat, Sep 24, 2016 at 6:37 AM, Martin Maechler
> <maech...@stat.math.ethz.ch> wrote:
>>>>>>> William Dunlap <wdun...@tibco.com>
>>>>>>> on Fri, 2 Sep 2016 08:33:47 -0700 writes:
>> 
>> > Re withAutoprint(), Splus's source() function could take a expression
>> > (literal or not) in place of a file name or text so it could support
>> > withAutoprint-like functionality in its GUI.  E.g.,
>> 
>> >> source(auto.print=TRUE, exprs.literal= { x <- 3:7 ; sum(x) ; y <- 
log(x)
>> > ; x - 100}, prompt="--> ")
--> x <- 3:7
--> sum(x)
>> > [1] 25
--> y <- log(x)
--> x - 100
>> > [1] -97 -96 -95 -94 -93
>> 
>> > or
>> 
>> >> expr <- quote({ x <- 3:7 ; sum(x) ; y <- log(x) ; x - 100})
>> >> source(auto.print=TRUE, exprs = expr, prompt="--> ")
--> x <- 3:7
--> sum(x)
>> > [1] 25
--> y <- log(x)
--> x - 100
>> > [1] -97 -96 -95 -94 -93
>> 
>> > It was easy to implement, since exprs's default value is parse(file) or
>> > parse(text=text), which source is calculating anyway.
>> 
>> 
>> > Bill Dunlap
>> > TIBCO Software
>> > wdunlap tibco.com
>> 
>> Thank you, Bill  (and the other correspondents); that's indeed a
>> very good suggestion :
>> 
>> I've come to the conclusion that Duncan and Bill are right:  One
>> should do this in R (not C) and as Bill hinted, one should use
>> source().  I first tried to do it separately, just "like source()",
>> but a considerable part of the source of source()  {:-)} is
>> about using src attributes instead of deparse() when the former
>> are present,  and it does make sense to generalize
>> withAutoprint() to have the same feature, so after all, have it
>> call source().
>> 
>> I've spent a few hours now trying things and variants, also
>> found I needed to enhance source()  very slightly also in a few
>> other details, and now (in my uncommitted version of R-devel),
>> 
>> withAutoprint({ x <- 1:12; x-1; (y <- (x-5)^2); z <- y; z - 10 })
>> 
>> produces
>> 
>>> withAutoprint({ x <- 1:12; x-1; (y <- (x-5)^2); z <- y; z - 10 })
>>> x <- 1:12
>>> x - 1
>> [1]  0  1  2  3  4  5  6  7  8  9 10 11
>>> (y <- (x - 5)^2)
>> [1] 16  9  4  1  0  1  4  9 16 25 36 49
>>> z <- y
>>> z - 10
    >> [1]   6  -1  -6  -9 -10  -9  -6  -1   6  15  26  39
>>> 
>> 
>> and is equivalent to
>> 
>> withAutoprint(expression(x <- 1:12, x-1, (y <- (x-5)^2), z <- y, z - 10 
))
>> 
>> I don't see any way around the "mis-feature" that all "input"
>> expressions are in the end shown twice in the "output" (the
>> first time by showing the withAutoprint(...) call itself).
>> 
>> The function *name* is "not bad" but also a bit longish;
>> maybe there are better ideas?  (not longer, no "_" - I know this
>> is a matter of taste only)
>> 
>> Martin
>> 
>> > On Fri, Sep 2, 2016 at 4:56 AM, Martin Maechler 
<maech...@stat.math.ethz.ch>
>> > wrote

Re: [Rd] Undocumented 'use.names' argument to c()

2016-09-25 Thread Martin Maechler
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>>>>> on Sun, 25 Sep 2016 14:12:10 + writes:

>> From comments in
>> 
http://stackoverflow.com/questions/24815572/why-does-function-c-accept-an-undocumented-argument/24815653
>> : The code of c() and unlist() was formerly shared but
>> has been (long time passing) separated. From July 30,
>> 1998, is where do_c got split into do_c and do_unlist.
> With the implementation of 'c.Date' in R devel r71350, an
> argument named 'use.names' is included for
> concatenation. So, it doesn't follow the documented
> 'c'. But, 'c.Date' is not explicitly documented in
> Dates.Rd, that has 'c.Date' as an alias.

I do not see any  c.Date  in R-devel with a 'use.names'; it's a
base function, hence not hidden ..

As mentioned before, 'use.names' is used in unlist() in quite a
few places, and such an argument also exists for

lengths()   and
all.equal.list()

and now c() 
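
A small illustration:

  x <- list(a = 1:2, b = 3)
  unlist(x, use.names = FALSE)  # 1 2 3, names dropped
  lengths(x, use.names = FALSE) # 2 1,   names dropped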

> 
> On Sat, 24/9/16, Martin Maechler
> <maech...@stat.math.ethz.ch> wrote:

>  Subject: Re: [Rd] Undocumented 'use.names' argument to
> c() To: "Karl Millar" <kmil...@google.com>

>  Date: Saturday, 24 September, 2016, 9:12 PM
 
>>>>>> Karl Millar via R-devel <r-devel@r-project.org>
>>>>> on Fri, 23 Sep 2016 11:12:49 -0700 writes:

>> I'd expect that a lot of the performance overhead could
>> be eliminated by simply improving the underlying code.
>> IMHO, we should ignore it in deciding the API that we
>> want here.

> I agree partially.  Even if the underlying code can be
> made faster, the 'use.names = FALSE' version will still be
> faster than the default, notably in some "long" cases.

> More further down.

>> On Fri, Sep 23, 2016 at 10:54 AM, Henrik Bengtsson
>> <henrik.bengts...@gmail.com> wrote:
>>> I'd vote for it to stay.  It could of course suprise
>>> someone who'd expect c(list(a=1), b=2, use.names =
>>> FALSE) to generate list(a=1, b=2, use.names=FALSE).  On
>>> the upside, is the performance gain from using
>>> use.names=FALSE.  Below benchmarks show that the
>>> combining of the names attributes themselves takes
>>> ~20-25 times longer than the combining of the integers
>>> themselves.  Also, at no surprise, use.names=FALSE
>>> avoids some memory allocations.
>>> 
>>>> options(digits = 2)
>>>> 
>>>> a <- b <- c <- d <- 1:1e4
>>>> names(c) <- c
>>>> names(d) <- d
>>>> 
>>>> stats <- microbenchmark::microbenchmark(
>>> +   c(a, b, use.names=FALSE),
>>> +   c(c, d, use.names=FALSE),
>>> +   c(a, d, use.names=FALSE),
>>> +   c(a, b, use.names=TRUE),
>>> +   c(a, d, use.names=TRUE),
>>> +   c(c, d, use.names=TRUE),
>>> +   unit = "ms"
>>> + )
>>>> 
>>>> stats
>>> Unit: milliseconds
>>>                        expr   min    lq  mean median    uq   max neval
>>>  c(a, b, use.names = FALSE) 0.031 0.032 0.049  0.034 0.036 1.474   100
>>>  c(c, d, use.names = FALSE) 0.031 0.031 0.035  0.034 0.035 0.064   100
>>>  c(a, d, use.names = FALSE) 0.031 0.031 0.049  0.034 0.035 1.452   100
>>>   c(a, b, use.names = TRUE) 0.031 0.031 0.055  0.034 0.036 2.094   100
>>>   c(a, d, use.names = TRUE) 0.510 0.526 0.588  0.549 0.617 1.998   100
>>>   c(c, d, use.names = TRUE) 0.780 0.815 0.886  0.841 0.944 1.430   100
>>> 
>>>> profmem::profmem(c(c, d, use.names=FALSE))
>>> Rprofmem memory profiling of: c(c, d, use.names = FALSE)
>>> 
>>> Memory allocations:
>>>        bytes calls
>>> 1      80040 
>>> total  80040 
>>> 
>>>> profmem::profmem(c(c, d, use.names=TRUE))
>>> Rprofmem memory profiling of: c(c, d, use.names = TRUE)
>>> 
>>> Memory allocations:
>>>        bytes calls
>>> 1      80040 
>>> 2     160040 
>>> total 240080 
>>> 
>>> /Henrik
>>> 
>>> On Fri, Sep 23, 2016 at 10:25 AM, William Dunlap via
>>> R-devel <r-devel@r-project.org> wrote:
>>>> In Splus c() and unlist() called the same C code, but
>>>> with a different 'sys_index' code (the last argument to
>>>> .Internal) and c() did not consider an argument named
>>>> 'use.names' special.

> Thank you, Bill, very much, for making the historical

Re: [Rd] Running package tests and not stop on first fail

2016-11-08 Thread Martin Maechler
>>>>> Hervé Pagès <hpa...@fredhutch.org>
>>>>> on Mon, 7 Nov 2016 14:37:15 -0800 writes:

    > On 11/05/2016 01:53 PM, Martin Maechler wrote:
>>>>>>> Oliver Keyes <ironho...@gmail.com>
>>>>>>> on Fri, 4 Nov 2016 12:42:54 -0400 writes:
>> 
>> > On Friday, 4 November 2016, Martin Maechler
>> > <maech...@stat.math.ethz.ch> wrote:
>> 
>> >> >>>>> Dirk Eddelbuettel <e...@debian.org>
>> >> >>>>> on Fri, 4 Nov 2016 10:36:52 -0500 writes:
>> >>
>> >> > On 4 November 2016 at 16:24, Martin Maechler wrote:
>> >> > | My proposed name '--no-stop-on-error' was a quick shot; if
>> >> > | somebody has a more concise or better "English style" wording
>> >> > | (which is somewhat compatible with all the other options you
>> >> > | see from 'R CMD check --help'), please speak up.
>> >>
>> >> > Why not keep it simple?  The similar feature this most
>> >> > resembles is 'make -k' and its help page has
>> >>
>> >> > -k, --keep-going
>> >>
>> >> >   Continue as much as possible after an error.  While the
>> >> >   target that failed, and those that depend on it, cannot be
>> >> >   remade, the other dependencies of these targets can be
>> >> >   processed all the same.
>> >>
>> >> Yes, that would be quite a bit simpler and nice in my
>> >> view.  One may think it to be too vague,
>> 
>> > Mmn, I would agree on vagueness (and it breaks the pattern
>> > set by other flags of human-readability). Deep familiarity
>> > with make is probably not something we should ask of
>> > everyone who needs to test a package, too.
>> 
>> > I quite like stop-on-error=true (exactly the same as the
>> > previous suggestion but shaves off some characters by
>> > inverting the Boolean)
>> 
>> Thank you, Brian, Dirk and Oliver for these (and some offline)
>> thoughts and suggestions!
>> 
>> My current summary:
>> 
>> 1) I really don't want a  --=value
>> but rather stay with logical/binary variables that "express
>> themselves"... in the same way I strongly prefer
>> 
>> if (A_is_special)   
>> to
>> if (A_is_special == TRUE)  
>> 
>> for a logical variable A_* .   Yes, this is mostly a matter
>> of taste,.. but related to how R style itself "works"
>> 
>> 2) Brian mentioned that this is only about ./tests/ tests which
>> are continued, not about the Examples which are treated separately.
>> That's why we had contemplated additionally using 'tests' (because that's
>> the directory name used for unit/regression/.. tests) in the option
>> name.
>> 
>> Even though Brian is correct, ideally we *would* want to also influence 
the
>> examples' running to *not* stop on a first error..   However that would
>> need more work, reorganizing how the examples are run and that may not be
>> worth the pain.   However it should be considered a goal in the long run.

> My name is Hervé, and I was not suggesting that what happens with the
> examples should be changed. I was just preaching consistency (again
> sorry) between what happens with the examples and what happens with
> the tests. 

Thank you, Hervé and excuse me for not answering more focused on
what you said.
I think I do understand what you say (at least by now :-)) and
agree that consistency is something important and to be strived for,
also with these options.

> Why not simply change the latter?
> Do we really need an option to control this? 

Very good questions.  If the change could be made much better,
I'd agree we'd not need a new option because the change could be
considered uniformly better than the current (R 3.3.2, say) behavior.
However the change, as it currently is, is not good enough to be
the only option (see below). 

> The behavior was changed for the examples a couple of
> years ago and nobody felt the need to introduce an option
> to control this at the time.

Yes, that change was made very nicely (not by me) and I'd say
the result *was* uniformly better than the previous behavior, so
there did not seem much of a reason to 

Re: [Rd] Memory leak with tons of closed connections

2016-11-11 Thread Martin Maechler
> Gergely Daróczi 
> on Thu, 10 Nov 2016 16:48:12 +0100 writes:

> Dear All,
> I'm developing an R application running inside of a Java daemon on
> multiple threads, and interacting with the parent daemon via stdin and
> stdout.

> Everything works perfectly fine except for having some memory leaks
> somewhere. Simplified version of the R app:

> while (TRUE) {
> con <- file('stdin', open = 'r', blocking = TRUE)
> line <- scan(con, what = character(0), nlines = 1, quiet = TRUE)
> close(con)
> }

> This loop uses more and more RAM as time passes (see more on this
> below), not sure why, and I have no idea currently on how to debug
> this further. Can someone please try to reproduce it and give me some
> hints on what is the problem?

> Sample bash script to trigger an R process with such memory leak:

> Rscript --vanilla -e "while(TRUE)cat(runif(1),'\n')" | Rscript
> --vanilla -e 
"cat(Sys.getpid(),'\n');while(TRUE){con<-file('stdin',open='r',blocking=TRUE);line<-scan(con,what=character(0),nlines=1,quiet=TRUE);close(con);rm(con);gc()}"

> Maybe you have to escape '\n' depending on your shell.

> Thanks for reading this and any hints would be highly appreciated!

I have no hints, sorry... but I can give some more "data":

I've changed the above to *print* the gc() result every 1000th
iteration, and after 100'000 iterations, there is still no
memory increase from the point of view of R itself.

However, monitoring the process (via 'htop', e.g.) shows about
1 MB per second increase in the memory footprint of the process.

One could argue that the error is with the OS / pipe / bash
rather than with R itself... but I'm not expert enough to
argue here at all.

Here's my version of your sample bash script and its output:

$  Rscript --vanilla -e "while(TRUE)cat(runif(1),'\n')" | Rscript --vanilla -e 
"cat(Sys.getpid(),'\n');i <- 0; 
while(TRUE){con<-file('stdin',open='r',blocking=TRUE);line<-scan(con,what=character(0),nlines=1,quiet=TRUE);close(con);rm(con);a
 <- gc(); i <- i+1; if(i %% 1000 == 1) {cat('i=',i,'\\n'); print(a)} }"

11059 
i= 1 
 used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83216  4.5   1000 534.1   213529 11.5
Vcells 172923  1.4   16777216 128.0   562476  4.3
i= 1001 
 used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   1000 534.1   213529 11.5
Vcells 172958  1.4   16777216 128.0   562476  4.3
...
...
...
...
i= 80001 
 used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   1000 534.1   213529 11.5
Vcells 172958  1.4   16777216 128.0   562476  4.3
i= 81001 
 used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   1000 534.1   213529 11.5
Vcells 172959  1.4   16777216 128.0   562476  4.3
i= 82001 
 used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   1000 534.1   213529 11.5
Vcells 172959  1.4   16777216 128.0   562476  4.3
i= 83001 
 used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   1000 534.1   213529 11.5
Vcells 172958  1.4   16777216 128.0   562476  4.3
i= 84001 
 used (Mb) gc trigger  (Mb) max used (Mb)
Ncells  83255  4.5   1000 534.1   213529 11.5
Vcells 172958  1.4   16777216 128.0   562476  4.3


> Best,
> Gergely

> PS1 see the image posted at
> 
http://stackoverflow.com/questions/40522584/memory-leak-with-closed-connections
> on memory usage over time
> PS2 the issue doesn't seem to be due to writing more data in the first
> R app compared to what the second R app can handle, as I tried the
> same with adding a Sys.sleep(0.01) in the first app and that's not an
> issue at all in the real application
> PS3 I also tried using stdin() instead of file('stdin'), but that did
> not work well for the stream running on multiple threads started by
> the same parent Java daemon
> PS4 I've tried this on Linux using R 3.2.3 and 3.3.2

For me, it's Linux, too (Fedora 24), using  'R 3.3.2 patched'..

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to assign NULL value to pairlist element while keeping it a pairlist?

2016-10-15 Thread Martin Maechler
> Michael Lawrence 
> on Wed, 12 Oct 2016 15:21:13 -0700 writes:

> Thanks, this was what I expected. There is a desire to
> eliminate the usage of pairlist from user code, which
> suggests the alternative of allowing for function
> arguments to be stored in lists. That's a much deeper
> change though.

and I hope we would not go there just for the purpose of
eliminating pairlists from user code, would we ?

As nobody else has mentioned it, I'd really  like to mention the
two (actually 3) functions important for dealing with function
argument lists much more transparently than the
as.list() things below:

  formals()
  formals() <-   #  and
  alist()

for creating / modifying function argument lists (which are
pairlists, but the user does not need to know really).
Or did you imply, Henrik, that what you want is not achievable
with these?
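
A small illustration (with a toy function f):

  f <- function(x = 1) x
  formals(f) <- alist(x = NULL) # set the default of 'x' to NULL
  f                             # function (x = NULL) x
  typeof(formals(f))            # "pairlist" -- but we never had to care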

Martin

> On Wed, Oct 12, 2016 at 12:31 PM, Henrik Bengtsson
>  wrote:
>> Michael, thanks for this info.
>> 
>> I've stumbled upon this in a case where I walk an R expression (the
>> AST) and (optionally) modifies it (part of the globals package).  In R
>> expressions, a function definition uses a pairlist to represent the
>> arguments.  For example,
>> 
>>> expr <- quote(function(x = 1) x)
>>> str(as.list(expr))
>> List of 4
>> $ : symbol function
>> $ :Dotted pair list of 1
>> ..$ x: num 1
>> $ : symbol x
>> $ :Class 'srcref'  atomic [1:8] 1 15 1 29 15 29 1 1
>> .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'
>> 
>> 
>> Here the 2nd element is a pairlist:
>> 
>>> str(expr[[2]])
>> Dotted pair list of 1
>> $ x: num 1
>>> typeof(expr[[2]])
>> [1] "pairlist"
>> 
>> Now say that I want to update the default value of argument 'x', which
>> is currently 1, to NULL.  Then I do:
>> 
>>> expr[[2]][1] <- list(x = NULL)
>> 
>> At this step, I end up with an expression 'expr' where the arguments
>> are no longer represented by a pairlist:
>> 
>>> str(expr[[2]])
>> List of 1
>> $ x: NULL
>>> typeof(expr[[2]])
>> [1] "list"
>> 
>> More importantly, at this point 'expr' no longer holds a valid R 
expression:
>> 
>>> expr
>> Error: badly formed function expression
>> 
>> The solution is to make sure we have a pairlist:
>> 
>>> expr[[2]] <- as.pairlist(expr[[2]])
>>> expr
>> function(x = NULL) x
>> 
>> 
>> I agree it would be nice to fix this for consistency, but if you bump
>> into major issues, at least I can live with having to use an explicit
>> as.pairlist().
>> 
>> Thanks
>> 
>> Henrik
>> 
>> On Wed, Oct 12, 2016 at 10:53 AM, Michael Lawrence
>>  wrote:
>>> Hi Henrik,
>>> 
>>> It would help to understand your use case for pairlists.
>>> 
>>> Thanks,
>>> Michael
>>> 
>>> On Wed, Oct 12, 2016 at 9:40 AM, Michael Lawrence wrote:
 The coercion is probably the most viable workaround for now, as it's
 consistent with what happens internally for calls. All pairlists/calls
 are converted to list for subassignment, but only calls are converted
 back. My guess is that the intent was for users to move from using a
 pairlist to the "new" (almost 20 years ago) list. In my opinion,
 consistency trumps "convenience" in this case. If others agree, I'll
 change it to also coerce back to pairlist.
 
 Michael
 
 On Wed, Oct 12, 2016 at 9:20 AM, Henrik Bengtsson
  wrote:
> Hi, I seem to not be able to assign NULL to an element of a pairlist
> without causing it to be coerced to a plain list.  For example:
> 
> x <- pairlist(1, 2)
> class(x)
> [1] "pairlist"
> 
> x[1] <- list(NULL)
> class(x)
> [1] "list"
> 
> This is actually true for all [<- assignments regardless of list
> value, e.g.
> 
> x <- pairlist(1, 2)
> x[1] <- list(0)
> class(x)
> [1] "list"
> 
> I also tried assigning a pairlist(), but still the same problem:
> 
> x <- pairlist(1, 2)
> x[1] <- pairlist(0)
> class(x)
> [1] "list"
> 
> The only workaround I'm aware of is to:
> 
> x <- as.pairlist(x)
> 
> at the end.  Any other suggestions?
> 
> Thanks,
> 
> Henrik
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] typo or stale info in qr man

2016-10-25 Thread Martin Maechler
>>>>> Wojciech Musial (Voitek) <wojciech.mus...@gmail.com>
>>>>> on Mon, 24 Oct 2016 15:07:55 -0700 writes:

> man for `qr` says that the function uses LINPACK's DQRDC, while it in
> fact uses DQRDC2.

which is a modification of LINPACK's DQRDC.

But you are right, and I have added to the help file (and a tiny
bit to the comments in the Fortran source).

When this change was done > 20 years ago, it was still hoped 
that the numerical linear algebra community or more specifically
those behind LAPACK would eventually provide this functionality
with LAPACK (and we would then use that),
but to my knowledge that has never happened.
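
For reference, both code paths are reachable from R; the LAPACK route is
available via the 'LAPACK' argument:

  m <- matrix(rnorm(12), 4, 3)
  str(qr(m))                 # default: the modified LINPACK routine, dqrdc2
  str(qr(m, LAPACK = TRUE))  # LAPACK's DGEQP3 instead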

Thank you for the 'heads up'.

Martin Maechler
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Missing objects using dump.frames for post-mortem debugging of crashed batch jobs. Bug or gap in documentation?

2016-11-14 Thread Martin Maechler
> nospam@altfeld-im de 
> on Sun, 13 Nov 2016 13:11:38 +0100 writes:

> Dear R friends, to allow post-mortem debugging In my
> Rscript based batch jobs I use

>tryCatch( , error = function(e) {
> dump.frames(to.file = TRUE) })

> to write the called frames into a dump file.

> This is similar to the method recommended in the "Writing
> R extensions" manual in section 4.2 Debugging R code (page
> 96):

> https://cran.r-project.org/doc/manuals/R-exts.pdf

>> options(error = quote({dump.frames(to.file=TRUE); q()}))



> When I load the dump later in a new R session to examine
> the error I use

> load(file = "last.dump.rda")
> debugger(last.dump)

> My problem is that the global objects in the workspace are
> NOT contained in the dump since "dump.frames" does not
> save the workspace.

> This makes debugging difficult.



> For more details see the stackoverflow question + answer
> in:
> 
https://stackoverflow.com/questions/40421552/r-how-make-dump-frames-include-all-variables-for-later-post-mortem-debugging/40431711#40431711



> I think the reason of the problem is:
> 

> If you use dump.frames(to.file = FALSE) in an interactive
> session debugging works as expected because it creates a
> global variable called "last.dump" and the workspace is
> still loaded.

> In the batch job scenario however the workspace is NOT
> saved in the dump and therefore lost if you debug the dump
> in a new session.


> Options to solve the issue:
> --

> 1. Improve the documentation of the R help for
> "dump.frames" and the R_exts manual to propose another
> code snippet for batch job scenarios:

>   dump.frames()
>   save.image(file = "last.dump.rda")

> 2. Change the semantics of "dump.frames(to.file = TRUE)"
> to include the workspace in the dump.  This would change
> the semantics implied by the function name but makes the
> semantics consistent for both "to.file" param values.

There is a third option, already in place for three months now:
Andreas Kersting did propose it (nicely, as a wish),
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17116
and I had added it to the development version of R back then :


r71102 | maechler | 2016-08-16 17:36:10 +0200 (Tue, 16 Aug 2016) | 1 line

dump.frames(*, include.GlobalEnv)


So, if you (or others) want to use this before next spring,
you should install a version of R-devel
and use that, with

  tryCatch( ,
   error = function(e)
   dump.frames(to.file = TRUE, include.GlobalEnv = TRUE))
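
and then, in a fresh R session, the usual post-mortem steps:

   load("last.dump.rda")
   debugger(last.dump)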

Using R-devel is nice and helpful for the R community, as you
will help finding bugs/problems in the new features (and
possibly changed features) we've introduced there. 


Best regards,
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Memory leak with tons of closed connections

2016-11-14 Thread Martin Maechler
> Gábor Csárdi 
> on Sun, 13 Nov 2016 20:49:57 + writes:

> Using dup() before fdopen() (and calling fclose() on the connection
> when it is closed) indeed fixes the memory leak.
> 

Thank you, Gábor!
Yes I can confirm that this fixes the memory leak.

I'm testing ('make check-all') currently and then (probably) will
commit the patch R-devel only for the time being.

Martin

> FYI,
> Gabor
> 
> Index: src/main/connections.c
> ===
> --- src/main/connections.c (revision 71653)
> +++ src/main/connections.c (working copy)
> @@ -576,7 +576,7 @@
>      fp = R_fopen(name, con->mode);
>      } else {  /* use file("stdin") to refer to the file and not the console */
>  #ifdef HAVE_FDOPEN
> -    fp = fdopen(0, con->mode);
> +    fp = fdopen(dup(0), con->mode);
>  #else
>      warning(_("cannot open file '%s': %s"), name,
>              "fdopen is not supported on this platform");
> @@ -633,8 +633,7 @@
>  static void file_close(Rconnection con)
>  {
>      Rfileconn this = con->private;
> -    if(con->isopen && strcmp(con->description, "stdin"))
> -        con->status = fclose(this->fp);
> +    con->status = fclose(this->fp);
>      con->isopen = FALSE;
>  #ifdef Win32
>      if(this->anon_file) unlink(this->name);
> 
> On Fri, Nov 11, 2016 at 1:12 PM, Gábor Csárdi  wrote:
> > On Fri, Nov 11, 2016 at 12:46 PM, Gergely Daróczi
> >  wrote:
> > [...]
> >>> I've changed the above to *print* the gc() result every 1000th
> >>> iteration, and after 100'000 iterations, there is still no
> >>> memory increase from the point of view of R itself.
> >
> > Yes, R does not know about it, it does not manage this memory (any
> > more), but the R process requested this memory from the OS, and never
> > gave it back, which is basically the definition of a memory leak. No?
> >
> > I think the leak is because 'stdin' is special and R opens it with fdopen():
> > https://github.com/wch/r-source/blob/f8cdadb769561970cc42776f563043ea5e12fe05/src/main/connections.c#L561-L579
> >
> > and then it does not close it:
> > https://github.com/wch/r-source/blob/f8cdadb769561970cc42776f563043ea5e12fe05/src/main/connections.c#L636
> >
> > I understand that R cannot fclose the FILE*, because that would also
> > close the file descriptor, but anyway, this causes a memory leak. I
> > think.
> >
> > It seems that you cannot close the FILE* without closing the
> > descriptor, so maybe a workaround would be to keep one FILE* open,
> > instead of calling fdopen() to create new ones every time. Another
> > possible workaround is to use dup(), but I don't know enough about the
> > details to be sure.
> >
> > Gabor
> >
> > [...]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Missing objects using dump.frames for post-mortem debugging of crashed batch jobs. Bug or gap in documentation?

2016-11-15 Thread Martin Maechler
>>>>> nospam@altfeld-im de <nos...@altfeld-im.de>
>>>>> on Tue, 15 Nov 2016 01:15:46 +0100 writes:

> Martin, thanks for the good news, and sorry for wasting your (and
> others') time by not doing my homework and querying bugzilla first
> (lesson learned!).
> 
> I have tested the new implementation from R-devel and observe a semantic
> difference when playing with the parameters:
> 
>   # Test script 1
>   g <- "global"
>   f <- function(p) {
> l <- "local"
> dump.frames()
>   }
>   f("parameter")
> 
> results in
>   # > debugger()
>   # Message:  object 'server' not found
>   # Available environments had calls:
>   # 1: source("~/.active-rstudio-document", echo = TRUE)
>   # 2: withVisible(eval(ei, envir))
>   # 3: eval(ei, envir)
>   # 4: eval(expr, envir, enclos)
>   # 5: .active-rstudio-document#9: f("parameter")
>   # 
>   # Enter an environment number, or 0 to exit  
>   # Selection: 5
>   # Browsing in the environment with call:
>   #   .active-rstudio-document#9: f("parameter")
>   # Called from: debugger.look(ind)
>   # Browse[1]> g
>   # [1] "global"
>   # Browse[1]> 
> 
> while dumping to a file
> 
>   # Test script 2
>   g <- "global"
>   f <- function(p) {
> l <- "local"
> dump.frames(to.file = TRUE, include.GlobalEnv = TRUE)
>   }
>   f("parameter")
> 
> results in
>   # > load("last.dump.rda")
>   # > debugger()

>   # Message:  object 'server' not found
>   # Available environments had calls:
>   # 1: .GlobalEnv
>   # 2: source("~/.active-rstudio-document", echo = TRUE)
>   # 3: withVisible(eval(ei, envir))
>   # 4: eval(ei, envir)
>   # 5: eval(expr, envir, enclos)
>   # 6: .active-rstudio-document#11: f("parameter")
>   # 
>   # Enter an environment number, or 0 to exit  
>   # Selection: 6
>   # Browsing in the environment with call:
>   #   .active-rstudio-document#11: f("parameter")
>   # Called from: debugger.look(ind)
>   # Browse[1]> g
>   # Error: object 'g' not found
>   # Browse[1]> 

Your call to f() and the corresponding dump is heavily
obfuscated by all the wrapping paper that RStudio seems to put around a
simple function call (or just around using debugger() ?).

All this was to get the correct environments when things are run
in a batch job... and there's no Rstudio gift wrapping in that case.

In my simple use of the above, "g" is clearly available in the .GlobalEnv
component of last.dump :

> exists("g", last.dump$.GlobalEnv)
[1] TRUE
> get("g", last.dump$.GlobalEnv)
[1] "global"
> 

and that's all that's promised, right?
In such a post mortem debugging, notably from a batch job (!),
you don't want your .GlobalEnv to be *replaced* by the
.GlobalEnv from 'last.dump', do you?

In the end, I think you are indirectly asking for new features to be
added to debugger(), namely that it work more seamlessly with
a last.dump object that has been created via 'include.GlobalEnv = TRUE'.

This wish for a new feature may be a very sensible wish.
I think it's fine if you add it as wish (for a new feature to
debugger()) to the R bugzilla site
( https://bugs.r-project.org/ -- after asking one of R core to
  add you to the list of "registered ones" there, see the
  boldface note in https://www.r-project.org/bugs.html )

Personally, I would only look into this issue if we also get a patch
proposal (see also https://www.r-project.org/bugs.html), because
already now you can easily get to "g" in your example.
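
E.g., a sketch (attach() accepts an environment and copies its objects
onto the search path):

  attach(last.dump$.GlobalEnv, name = "last.dump:globals")
  g   ## [1] "global"  -- now found
  detach("last.dump:globals")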

Martin

> The semantic difference is that the global variable "g" is visible
> within the function "f" in the first version, but not in the second
> version.
> 
> If I dump to a file and load and debug it, then the search path through
> the frames is not the same during run time vs. debug time.
> 
> An implementation with the same semantics could be achieved
> by applying this workaround currently:
> 
>   dump.frames()
>   save.image(file = "last.dump.rda")
> 
> Does it possibly make sense to unify the semantics?
> 
> THX!
> 
> 
> On Mon, 2016-11-14 at 11:34 +0100, Martin Maechler wrote:
> > >>>>> nospam@altfeld-im de <nos...@altfeld-im.de>
> > >>>>> on Sun, 13 Nov 2016 13:11:38 +0100 writes:
> > 
> > > Dear R friends, to allow post-mortem debugging In my
> > > Rscript based batch jobs I use
> > 
> > >tryCatch( , error = function(e) {
> > > dump.frames(to.file = TRUE) })
> > 
> > >

Re: [Rd] shared libraries: missing soname

2016-11-23 Thread Martin Maechler
>>>>> Joseph Mingrone <j...@ftfl.ca>
>>>>> on Tue, 22 Nov 2016 22:21:49 -0400 writes:

> Dirk Eddelbuettel <e...@debian.org> writes:
>> On 22 November 2016 at 00:02, Joseph Mingrone wrote:
>> | These are also not fatal errors on FreeBSD, where everything, for now,
>> | also just works.  ...until a library's interface changes.  You seem to
>> | be arguing that sonames are pointless.  We disagree.

>> You are putting words in my mouth. In my very first reply to you, I
>> pointed out that (for non-BSD systems at least) the sonames do not
>> matter as R loads the libraries itself, rather than via ldd.  No more,
>> no less.

> Let me restate.  You seem to be arguing that, because R itself doesn't
> consume its shared libraries via ldd(), sonames serve no purpose in this
> case.  Please correct me if I'm putting words in your mouth.

>> | I can't say for certain (I'm not an rkward user), but looking at the
>> | build

>> Why did _you_ then bring up rkward as an example? That was your
>> suggestion.

> Because you asked, "Yes, well, but are there other customers?"  Also,
> I'm trying to put myself in the perspective of package users.

> Is this a more appropriate example?

> # ldd /usr/local/lib/R/library/tseries/libs/tseries.so | grep libR
> libRblas.so => /usr/local/lib/R/lib/libRblas.so (0x80120c000)
> libR.so => /usr/local/lib/R/lib/libR.so (0x801c0)

Well, Dirk has said he has given his last reply on this thread.
I (as a member of R-core) am glad about people like Dirk who
take some of our load and helpfully answer such
questions/reports on R-devel.

To the issue:  I also don't see what your point is.
R works with these shared libraries as intended in all cases as far
as we know, and so I don't understand why anything needs to be changed.
All these libraries "belong to R", are tied to a specific
version of R, and are not to be used outside of R.

Best regards,

Martin Maechler
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?

2016-11-22 Thread Martin Maechler
>>>>> Gabriel Becker <gmbec...@ucdavis.edu>
>>>>> on Tue, 15 Nov 2016 11:56:04 -0800 writes:

> All,
> Martin: Thanks for this and all the other things you are doing to both
> drive R forward and engage more with the community about things like this.

> Apologies for missing this discussion the first time it came around and if
> anything here has already been brought up, but I wonder what exactly you
> mean when you want recycling behavior.

Thank you, Gabe.

Note that my premise was really to get *away* from inheriting
too much from 'test'.
Hence, I have *not* been talking about replacing ifelse() but
rather of providing a new  ifelse2()

   [ or if_else()  if Hadley was willing to ditch the dplyr one
   in favor of a base one]

> Specifically, based on an unrelated discussion with Henrik Bengtsson on
> Twitter, I wonder if preserving the recycling behavior test is longer than
> yes, no, but making the case where

> length( test ) < max(length( yes ), length( no ))

> would simplify usage for userRs in a useful way.

I'm sorry I don't understand the sentence above.

> I suspect it's easy to
> forget that the result is not guaranteed to be the length of  test, even
> for intermediate and advanced users familiar with ifelse and it's
> strengths/weaknesses.

> I certainly agree (for what that's worth...) that

> x = rnorm(100)

> y = ifelse2(x > 0, 1L, 2L)

> should continue to work.

(and give a length-100 result).
Also
ifelse2(x > 0, sqrt(x), 0L)

should work even though  class(sqrt(x)) is "numeric" and the one
of 0L is "integer", and I'd argue

ifelse2(x < 0, sqrt(x + 0i), sqrt(x))

should also continue to work as with ifelse().

> Also, if we combine a stricter contract that the output will always be
> of length(test) with the suggestion of a specified output class

that was not my intent here but would be another interesting
extension. However, I would like to keep  R-semantic silent coercions
such as
  logical < integer < double < complex

and your pseudo code below would not work so easily I think.
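
For illustration (base R's subassignment already does these promotions):

  ans <- rep(0L, 3)   # integer
  ans[2] <- 1.5       # 'ans' is silently promoted
  typeof(ans)         ## [1] "double"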

> the pseudo code could be

(I'm changing assignment '=' to  '<-' ...  [please!] )

> ifelse2 <- function(test, yes, no, outclass) {
>   lenout <- length(test)
>   out <- as(rep(yes, length.out = lenout), outclass)
>   out[!test] <- as(rep(no, length.out = lenout)[!test], outclass)
>   # handle NA stuff
>   out
> }


> NAs could be tricky if outclass were allowed to be completely general, but
> doable, I think? Another approach  if we ARE fast-passing while leaving
> ifelse intact is that maybe NA's in test just aren't allowed in ifelse2.
> I'm not saying we should definitely do that, but it's possible and would
> make things faster.

> Finally, In terms of efficiency, with the stuff that Luke and I are 
working
> on, the NA detection could be virtually free in certain cases, which could
> give a nice boost for long vectors  that don't have any NAs (and 'know'
> that they don't).

That *is* indeed a very promising prospect!
Thank you in advance! 

> Best,
> ~G

I still am a bit disappointed by the fact that it seems nobody has
taken a good look at my ifelse2() proposal.

I really would like an alternative to ifelse() in *addition* to
the current ifelse(), but hopefully in the future being used in
quite a few places instead of ifelse() -- not for
efficiency but for changed semantics, namely working for considerably
more "vector like" classes of 'yes' and 'no' than the current
ifelse().

As I said, the current proposal works for objects of class
   "Date", "POSIXct", "POSIXlt", "factor",  "mpfr" (pkg 'Rmpfr')
and hopefully for "sparseVector" (in a next version of the 'Matrix' pkg).

Martin

> On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler 
<maech...@stat.math.ethz.ch
>> wrote:

>> Finally getting back to this :
>> 
>> >>>>> Hadley Wickham <h.wick...@gmail.com>
>> >>>>> on Mon, 15 Aug 2016 07:51:35 -0500 writes:
>> 
>> > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
>> > <h.wick...@gmail.com> wrote:
>> >>> >> One possibility would also be to consider a
>> >>> "numbers-only" or >> rather "same type"-only {e.g.,
>> >>> would also work for characters} >> version.
>> >>>
>> >>> > I don't know what you mean by these.
>> >>>
>

Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?

2016-11-28 Thread Martin Maechler
ded to solve this differently.

I'm looking at these suggestions now, notably also your proposals below;
thank you, Suharto!

(I wanted to put my improved 'ifelse2' out first, quickly).
Martin


> A concrete version of 'ifelse2' that starts the result from 'yes':
> function(test, yes, no, NA. = NA) {
>     if(!is.logical(test))
>         test <- if(isS4(test)) methods::as(test, "logical") else as.logical(test)
>     n <- length(test)
>     ans <- rep(yes, length.out = n)
>     ans[!test & !is.na(test)] <- rep(no, length.out = n)[!test & !is.na(test)]
>     ans[is.na(test)] <- rep(NA., length.out = n)[is.na(test)]
>     ans
> }

> It requires a 'rep' method that is compatible with subsetting. It also
> works with "POSIXlt" in R 2.7.2, when 'length' gives 9, and gives an
> appropriate result if time zones are the same.
> For coercion of 'test', there is no need to keep attributes. So, it
> doesn't do
> storage.mode(test) <- "logical"
> and goes directly to 'as.logical'.
> It relies on subassignment for silent coercions of
> logical < integer < double < complex .
> Unlike 'ifelse', it never skips any subassignment. So, the phenomenon as
> in "example of different return modes" in ?ifelse doesn't happen.

> Another version, for keeping attributes as pointed out by Duncan Murdoch:
> function(test, yes, no, NA. = NA) {
>     if(!is.logical(test))
>         test <- if(isS4(test)) methods::as(test, "logical") else as.logical(test)
>     n <- length(test)
>     n.yes <- length(yes); n.no <- length(no)
>     if (n.yes != n) {
>         if (n.no == n) {  # swap yes <-> no
>             test <- !test
>             ans <- yes; yes <- no; no <- ans
>             n.no <- n.yes
>         } else yes <- yes[rep_len(seq_len(n.yes), n)]
>     }
>     ans <- yes
>     if (n.no == 1L)
>         ans[!test] <- no
>     else
>         ans[!test & !is.na(test)] <- no[
>             if (n.no == n) !test & !is.na(test)
>             else rep_len(seq_len(n.no), n)[!test & !is.na(test)]]
>     stopifnot(length(NA.) == 1L)
>     ans[is.na(test)] <- NA.
>     ans
> }

> Note argument evaluation order: 'test', 'yes', 'no', 'NA.'.
> First, it chooses the first of 'yes' and 'no' that has the same length
> as the result. If none of 'yes' and 'no' matches the length of the
> result, it chooses recycled (or truncated) 'yes'.
> It uses 'rep' on the index and subsetting as a substitute for 'rep' on
> the value.
> It requires a 'length' method that is compatible with subsetting.
> Additionally, it uses the same idea as dplyr::if_else, or more precisely
> the helper function 'replace_with'. It doesn't use 'rep' if the length
> of 'no' is 1 or is the same as the length of the result. For
> subassignment with a value of length 1, recycling happens by itself and
> NA in the index is OK.
> It limits 'NA.' to be of length 1, considering 'NA.' just as a label
> for NA.

> Cases where the last version above or 'ifelse2' or 'ifelseHW' in
> ifelse-def.R gives inappropriate answers:
> - 'yes' and 'no' are "difftime" objects with different "units" attribute
> - 'yes' and 'no' are "POSIXlt" objects with different time zones
> Example: 'yes' in "UTC" and 'no' in "EST5EDT". The reverse, 'yes' in
> "EST5EDT" and 'no' in "UTC", gives an error.

> For these cases, c(yes, no) helps. Function 'ifelseJH' in ifelse-def.R
> gives a right answer for the "POSIXlt" case.
> -
> Martin et al.,




> On Tue, Nov 22, 2016 at 2:12 AM, Martin Maechler wrote:

>> 
>> Note that my premise was really to get *away* from inheriting
>> too much from 'test'.
>> Hence, I have *not* been talking about replacing ifelse() but
>> rather of providing a new  ifelse2()
>> 
>>         [ or if_else()  if Hadley was willing to ditch the dplyr one
>>                         in favor of a base one]
>> 
>>      > Specifically, based on an unrelated discussion with Henrik 
Bengtsson
>> on
>>      > Twitter, I wonder if preserving the recycling behavior test is
>> longer than
>>      > yes, no, but making the case where
>> 
>>      > length( test ) < max(length( yes ), length( no ))
>> 
>>      > would simplify usage for userRs in a useful way.
>> 

> That was a copyediting bug on my part, it seems.

Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?

2016-11-28 Thread Martin Maechler

> Related to the length of 'ifelse' result, I want to say that "example of 
> different return modes" in ?ifelse led me to perceive a wrong thing in the 
> past.
>  ## example of different return modes:
>  yes <- 1:3
>  no <- pi^(0:3)
>  typeof(ifelse(NA,yes, no)) # logical
>  typeof(ifelse(TRUE,  yes, no)) # integer
>  typeof(ifelse(FALSE, yes, no)) # double
> 
> As the result of each 'ifelse' call is not printed, I thought that the length 
> of the result is 3. In fact, the length of the result is 1.

"of course"... (;-)

But this indeed proves that the example is too sophisticated and
not helpful/clear enough.
Is this better?

## example of different return modes (and 'test' alone determining length):
yes <- 1:3
no  <- pi^(1:4)
utils::str( ifelse(NA,    yes, no) ) # logical, length 1
utils::str( ifelse(TRUE,  yes, no) ) # integer, length 1
utils::str( ifelse(FALSE, yes, no) ) # double,  length 1



> I realize just now that the length of 'no' is different from 'yes'. The 
> length of 'yes' is 3, the length of 'no' is 4.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] problem with normalizePath()

2016-11-18 Thread Martin Maechler
>>>>> Evan Cortens <ecort...@mtroyal.ca>
>>>>> on Thu, 17 Nov 2016 15:51:03 -0700 writes:

> I wonder if this could be related to the issue that I
> submitted to bugzilla about two months ago? (
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17159)

> That is to say, could it be that it's treating the first
> path after the single backslash as an actual directory,
> rather than as the name of the share?

> -- 
> Evan Cortens, PhD
> Institutional Analyst - Office of Institutional Analysis
> Mount Royal University
> 403-440-6529

Could well be.  Thank you, Evan, also for your bug report
including patch proposal.

In such situations we (R core) would be really happy if
Microsoft showed another facet of their investment into R:
ideally there would be enough staff who could judge and test such
bugs and bug fixes.

--> I'm BCC'ing this to one place at least.

Best,
Martin Maechler  ETH Zurich

> On Thu, Nov 17, 2016 at 2:28 PM, Laviolette, Michael <
> michael.laviole...@dhhs.nh.gov> wrote:

>> The packages "readxl" and "haven" (and possibly others)
>> no longer access files on shared network drives. The
>> problem appears to be in the normalizePath()
>> function. The file can be read from a local drive or by
>> functions that don't call normalizePath(). The error
>> thrown is
>> 
>> Error:
>> path[1]="\\Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/17.xls":
>> The system cannot find the file specified
>> 
>> Here's my session:
>> 
>> library(readxl)
>> library(XLConnect)
>> 
>> # attempting to read file from network drive
>> df1 <- read_excel("//Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/17.xls")
>> # pathname is fully qualified, but error thrown as above
>> 
>> cat(normalizePath("//Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/17.xls"))
>> # throws same error
>> 
>> # reading same file with different function
>> df2 <- readWorksheetFromFile("//Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/17.xls", 1)
>> # completes successfully
>> 
>> # reading same file from local drive
>> df3 <- read_excel("C:/17.xls") # completes successfully
>> 
>> sessionInfo()
>> R version 3.3.2 (2016-10-31)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> Running under: Windows 7 x64 (build 7601) Service Pack 1
>> 
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>> 
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>> 
>> other attached packages:
>> [1] readxl_0.1.1 dplyr_0.5.0 XLConnect_0.2-12
>> [4] XLConnectJars_0.2-12 ROracle_1.2-1 DBI_0.5-1
>> 
>> loaded via a namespace (and not attached):
>> [1] magrittr_1.5 R6_2.2.0 assertthat_0.1 tools_3.3.2 haven_1.0.0
>> [6] tibble_1.2 Rcpp_0.12.7 rJava_0.9-8
>> 
>> Please advise.  Thanks,
>> 
>> Michael Laviolette PhD MPH
>> Public Health Statistician
>> Bureau of Public Health Statistics and Informatics
>> New Hampshire Division of Public Health Services
>> 29 Hazen Drive, Concord, NH 03301-6504
>> Phone: 603-271-5688  Fax: 603-271-7623
>> Email: michael.laviole...@dhhs.nh.gov
>> 
>> 
>> 
>> [[alternative HTML version deleted]]
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 

>   [[alternative HTML version deleted]]

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?

2016-11-15 Thread Martin Maechler
Finally getting back to this :

>>>>> Hadley Wickham <h.wick...@gmail.com>
>>>>> on Mon, 15 Aug 2016 07:51:35 -0500 writes:

> On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
> <h.wick...@gmail.com> wrote:
>>> >> One possibility would also be to consider a
>>> "numbers-only" or >> rather "same type"-only {e.g.,
>>> would also work for characters} >> version.
>>>
>>> > I don't know what you mean by these.
>>>
>>> In the mean time, Bob Rudis mentioned dplyr::if_else(),
>>> which is very relevant, thank you Bob!
>>>
>>> As I have found, that actually works in such a "same
>>> type"-only way: It does not try to coerce, but gives an
>>> error when the classes differ, even in this somewhat
>>> debatable case :
>>>
>>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11) Error:
>>> `false` has type 'double' not 'integer'
>>> >
>>>
>>> As documented, if_else() is clearly stricter than
>>> ifelse() and e.g., also does no recycling (but of
>>> length() 1).
>>
>> I agree that if_else() is currently too strict - it's
>> particularly annoying if you want to replace some values
>> with a missing:
>>
>> x <- sample(10) if_else(x > 5, NA, x) # Error: `false`
>> has type 'integer' not 'logical'
>>
>> But I would like to make sure that this remains an error:
>>
>> if_else(x > 5, x, "BLAH")
>>
>> Because that seems more likely to be a user error (but
>> reasonable people might certainly believe that it should
>> just work)
>>
>> dplyr is more accommodating in other places (i.e. in
>> bind_rows(), collapse() and the joins) but it's
>> surprisingly hard to get all the details right. For
>> example, what should the result of this call be?
>>
>> if_else(c(TRUE, FALSE), factor(c("a", "b")),
>> factor(c("c", "b"))
>>
>> Strictly speaking I think you could argue it's an error,
>> but that's not very user-friendly. Should it be a factor
>> with the union of the levels? Should it be a character
>> vector + warning? Should the behaviour change if one set
>> of levels is a subset of the other set?
>>
>> There are similar issues for POSIXct (if the time zones
>> are different, which should win?), and difftimes
>> (similarly for units).  Ideally you'd like the behaviour
>> to be extensible for new S3 classes, which suggests it
>> should be a generic (and for the most general case, it
>> would need to dispatch on both arguments).

> One possible principle would be to use c() -
> i.e. construct out as

> out <- c(yes[0], no[0])
> length(out) <- max(length(yes), length(no))

yes; this would require that a  `length<-` method works for the
class of the result.

Duncan Murdoch mentioned a version of this, in his very
first reply:

ans <- c(yes, no)
ans <- ans[seq_along(test)]

which is less efficient for atomic vectors, but requires
less from the class: it "only" needs `c` and `[` to work

and a mixture of your two proposals would be possible too:

ans <- c(yes[0], no[0])
ans <- ans[seq_along(test)]

which does *not* work for my "mpfr" numbers (CRAN package 'Rmpfr'),
but that's a buglet in the  c.mpfr() implementation of my Rmpfr
package... (which has already been fixed in the development version on R-forge,
https://r-forge.r-project.org/R/?group_id=386)

> But of course that wouldn't help with factor responses.

Yes.  However, a version of Duncan's suggestion -- of treating 'yes' first
-- does help in that case.
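
For instance, for factors (a small sketch of "treating 'yes' first"):

  test <- c(TRUE, FALSE)
  yes  <- factor(c("a", "b"))
  no   <- factor(c("b", "b"))
  ans  <- rep(yes, length.out = length(test))
  ans[!test] <- no[!test]  # assigns by level *label*
  ans  ## [1] a b   -- a factor with the levels of 'yes'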

For once, mainly as a "feasibility experiment",
I have created a github gist to make my current ifelse2() proposal available
for commenting, cloning, pullrequesting, etc:

Consisting of 2 files
- ifelse-def.R :  Functions definitions only, basically all the current
proposals, called  ifelse*()
- ifelse-checks.R : A simplistic checking function
and examples calling it, notably demonstrating that my
ifelse2()  does work with
"Date",  (i.e. "POSIXct" and "POSIXlt"), factors,
    and "mpfr" (the arbitrary-precision numbers in my package "Rmpfr")

Also if you are not on github, you can quickly get to the ifelse2()
definition :

htt

Re: [Rd] BUG?: On Linux setTimeLimit() fails to propagate timeout error when it occurs (works on Windows)

2016-10-26 Thread Martin Maechler
> Spencer Graves 
> on Tue, 25 Oct 2016 22:02:29 -0500 writes:

> On 10/25/2016 9:44 PM, Henrik Bengtsson wrote:
>> setTimeLimit(elapsed=1) causes a timeout error whenever a call takes
>> more than one second.  For instance, this is how it works on Windows
>> (R 3.3.1):
>> 
>>> setTimeLimit(elapsed=1)
>>> Sys.sleep(10); message("done")
>> Error in Sys.sleep(10) : reached elapsed time limit
>> 
>> Also, the error propagates immediately and causes an interrupt after ~1 
second;
>> 
>>> system.time({ Sys.sleep(10); message("done") })
>> Error in Sys.sleep(10) : reached elapsed time limit
>> Timing stopped at: 0.01 0 1.02
>> 
>> This works as expected.  However, on Linux (R 3.3.1 but also e.g.
>> 2.11.0, 2.15.3) I get:
>> 
>>> setTimeLimit(elapsed=1)
>>> system.time({ Sys.sleep(10); message("done") })
>> Error in Sys.sleep(10) : reached elapsed time limit
>> Timing stopped at: 0 0 10.01
>> 
>> Note how the timeout error is signaled, but for some reason, it does
>> not interrupt the Sys.sleep(10) call until after it finishes after 10
>> seconds.  If you change to Sys.sleep(60) it will take 1 minute. Note
>> that the following print("done") is not called, so the timeout error
>> does propagate immediately after Sys.sleep() but not before / during.
>> 
>> This looks like a bug to me.  Can anyone on macOS confirm whether this
>> is also a problem there or not?


>> setTimeLimit(elapsed=1)
>> system.time({ Sys.sleep(10); message("done") })
> Error in Sys.sleep(10) : reached elapsed time limit
> Timing stopped at: 0.003 0.004 0.978
>> 
>> sessionInfo()
> R version 3.3.1 (2016-06-21)
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
> Running under: OS X 10.11.6 (El Capitan)

Thank you, Spencer.

Indeed, confirmed here (Linux Fedora 24) for the most current
'R-devel' and "R 3.3.2 RC".

Also, this "not quite terminating" on Linux is not limited to
Sys.sleep() in case someone was wondering:

> setTimeLimit(elapsed=0.5) ; system.time(P <- sfsmisc::primes(1e7))
   user  system elapsed 
  0.227   0.055   0.281 
> str(P)
 int [1:664579] 2 3 5 7 11 13 17 19 23 29 ...
> setTimeLimit(elapsed=0.5) ; system.time(P <- sfsmisc::primes(3e7))
Error in sfsmisc::primes(3e+07) : reached elapsed time limit
Timing stopped at: 0.538 0.132 0.671
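
To remove the limit again in the same session (Inf, the default, means
"no limit"):

   setTimeLimit(elapsed = Inf)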


This *is* a bit embarrassing; .. probably too late to be fixed
for 3.3.2 .. (and something I'd rather leave to others to fix).

It may be that this has never worked on Linux, or then worked in
Linuxen where some interrupt behavior was different.
At least on my current Linux it does not work, all the way back
to R 2.11.1 .. and setTimeLimit() has not existed for much longer...

Martin


> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

> attached base packages:
> [1] stats graphics  grDevices utils
> [5] datasets  methods   base

> loaded via a namespace (and not attached):
> [1] rsconnect_0.5 tools_3.3.1
> Error: reached elapsed time limit

>> /Henrik
>> 
>>> sessionInfo()
>> R version 3.3.1 (2016-06-21)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> Running under: Windows XP x64 (build 2600) Service Pack 3
>> 
>>> sessionInfo()
>> R version 3.3.1 (2016-06-21)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 16.04.1 LTS
>> 
>>> sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> x86_64-unknown-linux-gnu
>> 
>> sessionInfo()
>> R version 2.15.3 (2013-03-01)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] improve 'package not installed' load errors?

2016-10-26 Thread Martin Maechler
> Duncan Murdoch 
> on Mon, 24 Oct 2016 14:54:16 -0400 writes:

> On 24/10/2016 1:51 PM, Kevin Ushey wrote:
>> Hi R-devel,
>> 
>> One of the more common issues that new R users see, and become stumped
>> by, is error messages during package load of the form:
>> 
>> > library(ggplot2)
>> Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()),
>> versionCheck = vI[[j]]) :
>> there is no package called 'Rcpp'
>> Error: package or namespace load failed for 'ggplot2'
>> 
>> Typically, error messages of this form are caused simply by one or
>> more dependent packages (in this case, 'Rcpp') not being installed or
>> available on the current library paths. (A side question, which I do
>> not know the answer to, is how users get themselves into this state.)

> I think one way to get here is to be running with several libraries.  
> You install ggplot2 while Rcpp is available, but in a different part of 
> the .libPaths list, then in a later session try to use it with a 
> different .libPaths setting.
>> 
>> I believe it would be helpful for new users if the error message
>> reported here was a bit more direct, e.g.
>> 
>> > library(ggplot2)
>> Error: 'ggplot2' depends on package 'Rcpp', but 'Rcpp' is not installed
>> consider installing 'Rcpp' with install.packages("Rcpp")

> The risk with this message is that Rcpp may really be installed, but 
> it's just not currently on .libPaths.  Detecting that situation and 
> reporting on it looks like it would be relatively hard:  it would mean 
> the ggplot2 installation needs to record where it found all 
> dependencies, and if at some later time it doesn't find one, see if that 
> location still exists and would still work (in which case the message 
> should suggest modifying .libPaths).  I think that's too much work.

> Even a simple change like

> Error: 'ggplot2' depends on package 'Rcpp', but 'Rcpp' was not found


> might not be easy (which function knows both names?)  

> However, if you want to suggest a patch to implement this,
> I would take a look. 

I would want to take a look, even before that. Our current error
handling here should be revised, I think:

For library() the user sees *two* error messages: In my "setup"
((where I did fiddle with .libPaths() to provoke the error,
  exactly as Duncan mentioned)), I have

>> > library(ggplot2)

1. >> Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = 
vI[[i]]) : 
>>   there is no package called ‘gtable’

2. >> Error: package or namespace load failed for ‘ggplot2’

and together they at least give a good clue to the user (yes,
not easy enough for the beginner, I agree).

However, because the above is already a kludge (only one of the
two error messages is part of the error that is signalled !!!),
the situation is even worse if the user (or her code) uses require():

>> > require(ggplot2)
>> Loading required package: ggplot2
>> Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : 
>>   there is no package called ‘gtable’
>> > 

Only the 2nd of  library()'s "Error" messages is transferred to require()
[or any other caller of library() !]
and that is in itself very unsatisfactory.


>> In other words, it might be helpful to avoid printing the
>> 'loadNamespace()' call on error (since it's mostly just scary /
>> uninformative), and check up-front that the package is installed
>> before attempting to call 'loadNamespace()'.

well, yes, one should not use try() there, but tryCatch() anyway:
try() is a wrapper around tryCatch(), and I agree the error
message should not be printed (which try() *does* by default), but
should be *combined* with the "2nd one" into one error .. which
then also is automatically "transferred" to require() or another caller.
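
A sketch of what I mean (hypothetical; 'pkg' stands for the package
name -- this is not the actual library() code):

  tryCatch(loadNamespace(pkg),
           error = function(e)
               stop(gettextf("package or namespace load failed for %s: %s",
                             sQuote(pkg), conditionMessage(e)),
                    call. = FALSE))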

There is a small problem for producing a really nice error
message : It is *wrong* to assume we can easily use  sub() or
similar to get the dependency package name ('gtable' or 'Rcpp' in
the above examples) from the error message :

The error message may be, and often is, translated {{apart from the
 "Error in " of the first error message which is never
 translated, it seems, but that is a different issue (buglet) }} :

a) French:

> Sys.setenv("LANGUAGE"="fr"); Sys.setlocale("LC_MESSAGES", "fr_FR.UTF-8")
> library(ggplot2)
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : 
  aucun package nommé ‘gtable’ n'est trouvé
Erreur : le chargement du package ou de l'espace de noms a échoué pour ‘ggplot2’

b) German:

> Sys.setenv("LANGUAGE"="de"); Sys.setlocale("LC_MESSAGES", "de_CH.UTF-8")
[1] "de_CH.UTF-8"
> library(ggplot2)
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : 
  es gibt kein Paket namens ‘gtable’
Fehler: Laden von Paket oder Namensraum für ‘ggplot2’ fehlgeschlagen
> 

c) 

Re: [Rd] Running package tests and not stop on first fail

2016-11-03 Thread Martin Maechler
>>>>> Jan Gorecki <j.gore...@wit.edu.pl>
>>>>> on Tue, 1 Nov 2016 22:51:28 + writes:

> Hello community/devs, Is there an option to run package
> tests during R CMD check and not stop on first error? I
> know that testing frameworks (testhat and others) can do
> that but asking about just R and base packages. Currently
> when package check runs test scripts in ./tests directory
> it will stop after first fail.  Do you think it could be
> optionally available to continue to run tests after
> failures?  Regards, Jan Gorecki

I agree that this would be a useful option sometimes.

So I would be supportive to get such an option, say,

   R CMD check --no-stop-on-error  

into R if someone provided (relatively small) patches to the R
sources (i.e. subversion repos at https://svn.r-project.org/R/trunk/ ).
The relevant source code should basically all be in
src/library/tools/R/testing.R

Note that this may be complicated, also because "parallel"
checking is available in parts, via the TEST_MC_CORES
environment variable ((which is currently only quickly
documented in the 'R Administration ..' manual))


Martin Maechler
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Running package tests and not stop on first fail

2016-11-04 Thread Martin Maechler
>>>>> Jan Gorecki <j.gore...@wit.edu.pl>
>>>>> on Fri, 4 Nov 2016 11:20:37 + writes:

> Martin, I submitted very simple patch on
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17176

> Herve, While I like your idea, I prefer to keep my patch
> simple, it is now exactly what Martin mentions. I think it
> is a good start that can eventually be extended later for
> what you are asking.

I tend to agree; this seems indeed much easier than I
anticipated.  Thank you, Jan!

I'm testing a version which uses the logical variable
'stop_on_error' rather than 'no_stop_on_error' (because
!no_stop_on_error is hard to mentally parse quickly).

My proposed name  '--no-stop-on-error'  was a quick shot; if
somebody has a more concise or better "English style" wording
(which is somewhat compatible with all the other options you see
from 'R CMD check --help'),
please speak up.

Martin

> Regards, Jan

> On 3 November 2016 at 17:25, Hervé Pagès
> <hpa...@fredhutch.org> wrote:
>> 
>> Hi Martin, Jan,
>> 
>> On 11/03/2016 03:45 AM, Martin Maechler wrote:
>>>>>>>> 
>>>>>>>> Jan Gorecki <j.gore...@wit.edu.pl> on Tue, 1 Nov
>>>>>>>> 2016 22:51:28 + writes:
>>> 
>>> 
>>> > Hello community/devs, Is there an option to run
>>> package > tests during R CMD check and not stop on first
>>> error? I > know that testing frameworks (testhat and
>>> others) can do > that but asking about just R and base
>>> packages. Currently > when package check runs test
>>> scripts in ./tests directory > it will stop after first
>>> fail.  Do you think it could be > optionally available
>>> to continue to run tests after > failures?  Regards, Jan
>>> Gorecki
>>> 
>>> I agree that this would be a useful option sometimes.
>>> 
>>> So I would be supportive to get such an option, say,
>>> 
>>> R CMD check --no-stop-on-error 
>> 
>> 
>> A couple of years ago the behavior of 'R CMD check' was
>> changed to continue checking (e.g. the examples) after
>> many types of errors, and to output a summary count of
>> errors at the end if any have occurred.  So
>> --no-stop-on-error could easily be interpreted as an
>> option that controls this behavior (and would also
>> suggest that the default has been reverted back to what
>> it was prior to R 3.2.0), rather than an option that
>> specifically controls what should happen while running
>> the tests.
>> 
>> Cheers, H.
>> 
>>> 
>>> into R if someone provided (relatively small) patches to
>>> the R sources (i.e. subversion repos at
>>> https://svn.r-project.org/R/trunk/ ).  The relevant
>>> source code should basically all be in
>>> src/library/tools/R/testing.R
>>> 
>>> Note that this may be complicated, also because
>>> "parallel" checking is available in parts, via the
>>> TEST_MC_CORES environment variable ((which is currently
>>> only quickly documented in the 'R Administration ..'
>>> manual))
>>> 
>>> 
>>> Martin Maechler ETH Zurich
>>> 
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>> 
>> --
>> Hervé Pagès
>> 
>> Program in Computational Biology Division of Public
>> Health Sciences Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA
>> 98109-1024
>> 
>> E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:
>> (206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Running package tests and not stop on first fail

2016-11-04 Thread Martin Maechler
>>>>> Brian G Peterson <br...@braverock.com>
>>>>> on Fri, 4 Nov 2016 10:37:18 -0500 writes:

> On Fri, 2016-11-04 at 16:24 +0100, Martin Maechler wrote:
>> >>>>> Jan Gorecki <j.gore...@wit.edu.pl> >>>>> on Fri, 4
>> Nov 2016 11:20:37 + writes:
>> 
>> > Martin, I submitted very simple patch on >
>> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17176
>> 
>> > Herve, While I like your idea, I prefer to keep my
>> patch > simple, it is now exactly what Martin mentions. I
>> think it > is a good start that can eventually be
>> extended later for > what you are asking.
>> 
>> I tend to agree; this seems indeed much easier than I
>> anticipated.  Thank you, Jan!
>> 
>> I'm testing a version which uses the logical variable
>> 'stop_on_error' rather than 'no_stop_on_error' (because
>> !no_stop_on_error is hard to mentally parse quickly).
>> 
>> My proposed name '--no-stop-on-error' was a quick shot;
>> if somebody has a more concise or better "English style"
>> wording (which is somewhat compatible with all the other
>> options you see from 'R CMD check --help'), please speak
>> up.

> I might suggest

> --stop-tests-on-error

> with default=TRUE to match current functionality.

Thank you, Brian.

though that would be less concise and I think less matching the
'R CMD check' philosophy with many '--no-*' options to turn
*off* defaults. Note that most options have no " = " part, because
they are binary and I think that's easiest for use (when the 'binary' case 
is general enough). Also   R CMD check --help  
ends saying  "By default, all test sections are turned on."
which does fit the use of all those '--no-*' options.

OTOH, we also have  '--ignore-vignettes'
so we could consider

   --ignore-tests-errors

?


> This might avoid any confusion related to the behavior of
> continuing to run examples on error in R CMD check.

You are quite right on that, indeed.
Martin

> Regards,
> Brian

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Running package tests and not stop on first fail

2016-11-04 Thread Martin Maechler
>>>>> Dirk Eddelbuettel <e...@debian.org>
>>>>> on Fri, 4 Nov 2016 10:36:52 -0500 writes:

> On 4 November 2016 at 16:24, Martin Maechler wrote: | My
> proposed name '--no-stop-on-error' was a quick shot; if |
> somebody has a more concise or better "English style"
> wording | (which is somewhat compatible with all the other
> options you see | from 'R CMD check --help'), | please
> speak up.

> Why not keep it simple?  The similar feature this most
> resembles is 'make -k' and its help page has

>    -k, --keep-going
>
>        Continue as much as possible after an error.  While the target
>        that failed, and those that depend on it, cannot be remade, the
>        other dependencies of these targets can be processed all the same.

Yes, that would be quite a bit simpler and nice in my view.
One may think it too vague,
notably given Brian Peterson's remark that the examples are
already continued in any case if they lead to an error.

Other opinions?

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] optim(..., method='L-BFGS-B') stops with an error message while violating the lower bound

2016-10-10 Thread Martin Maechler
>>>>> Spencer Graves <spencer.gra...@prodsyse.com>
>>>>> on Sat, 8 Oct 2016 18:03:43 -0500 writes:

[.]

>  2.  It would be interesting to know if the
> current algorithm behind optim and optimx with
> method='L-BFGS-B' incorporates Morales and Nocedal (2011)
> 'Remark on “Algorithm 778: L-BFGS-B: Fortran Subroutines
> for Large-Scale Bound Constrained Optimization”'.  I
> created this vignette and started this threat hoping that
> someone on the R Core team might decide it's worth
> checking things like that.

well I hope you mean "thread" rather than "threat"  ;-)

I've now looked at the reference above, which is indeed quite
interesting.
doi 10.1145/2049662.2049669
--> http://dl.acm.org/citation.cfm?doid=2049662.2049669
A "free" (pre-publication I assume) version of the manuscript is
  http://www.eecs.northwestern.edu/~morales/PSfiles/acm-remark.pdf

The authors, Morales and Nocedal -- the 2nd being an author of the
original L-BFGS-B (1997) paper -- make two remarks, the 2nd one
about the "machine epsilon" used, and I can assure you that R's
optim() version never suffered from that; we've always been
using a C translation of the fortran code, and then used DBL_EPSILON.
R's (main) source file for that is in .../src/appl/lbfgsb.c, e.g., here
https://svn.r-project.org/R/trunk/src/appl/lbfgsb.c

OTOH, their remark 1 is very relevant and promises faster /
more reliable convergence.
I'd be "happy" if optim() could gain a new option, say, "L-BFGS-B-2011"
which would incorporate what they call "modified L-BFGS-B".
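
For context, such an option would slot in next to the existing one, e.g.
(a sketch; "L-BFGS-B-2011" does not exist):

   fr <- function(x)  ## Rosenbrock
       100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
   optim(c(-1.2, 1), fr, method = "L-BFGS-B", lower = c(-2, 0))
   ## optim(c(-1.2, 1), fr, method = "L-BFGS-B-2011", ...)  # hypothetical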

However, I did not find published code to go together with their
remark.
Ideally, some of you interested in this would provide a patch
against the above  lbfgsb.c  file.

Martin Maechler,
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] New leap second end of 2016 / beginning 2017 (depending on TZ)

2016-12-14 Thread Martin Maechler
As R is sophisticated enough to track leap seconds,

   ?.leap.seconds

we'd need to update our codes real soon now again:

https://en.wikipedia.org/wiki/Leap_second

(and those of you who want second precision in R in 2017 need to start
working with 'R patched' or 'R devel' ...)
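
E.g., to inspect the leap seconds R currently knows about:

   tail(.leap.seconds)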

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] New leap second end of 2016 / beginning 2017 (depending on TZ)

2016-12-15 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Wed, 14 Dec 2016 17:04:22 +0100 writes:

> As R is sophisticated enough to track leap seconds,
> ?.leap.seconds

> we'd need to update our codes real soon now again:

> https://en.wikipedia.org/wiki/Leap_second

> (and those of you who want second precision in R in 2017 need to start
> working with 'R patched' or 'R devel' ...)

I've been told offline that the above could be considered FUD ..
which I hope nobody read into it.

Furthermore, there seems to be wide disagreement about the
usefulness of leap seconds, and how computers (and OSs) should
deal with them.
One recent approach (e.g. by Google) is to "smear the leap
second" into the system (by somehow "throttling" time servers ;-)..

(and no, I even less would want this to become a long thread, so
 please refrain if you can ...)

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] print.POSIXct doesn't seem to use tz argument, as per its example

2016-12-16 Thread Martin Maechler
> Jennifer Lyon 
> on Thu, 15 Dec 2016 09:33:30 -0700 writes:

> On the documentation page for DateTimeClasses, in the Examples section,
> there are the following two lines:
> 
> format(.leap.seconds) # the leap seconds in your time zone
> print(.leap.seconds, tz = "PST8PDT")  # and in Seattle's
> 
> The second line (using print) seems to ignore the tz argument, and prints
> the dates in my time zone, while:
> 
> format(.leap.seconds, tz = "PST8PDT")
> 
> does print the dates in PST. The code in
> https://github.com/wch/r-source/blob/trunk/src/library/base/R/datetime.R
> around line 234 looks like the ... argument is passed to print, not to
> format.
> 
> print.POSIXct <-
> print.POSIXlt <- function(x, ...)
> {
> max.print <- getOption("max.print", 9999L)
> if(max.print < length(x)) {
> print(format(x[seq_len(max.print)], usetz = TRUE), ...)
> cat(' [ reached getOption("max.print") -- omitted',
> length(x) - max.print, 'entries ]\n')
> } else print(if(length(x)) format(x, usetz = TRUE)
>  else paste(class(x)[1L], "of length 0"), ...)
> invisible(x)
> }
> 
> The documentation for print() on this page seems to be silent on tz as an
> argument, but I do believe the example using print() does not work as
> advertised.

> Thanks.
> 
> Jen

Thank you, Jen!
Indeed,  both your observation and your diagnosis are correct:
This has been a misleading example and needs amending (or the
code is changed, see below).

The simplest fix would be to replace  'print('  by
'format('; then the example would work as advertised.
That change has two drawbacks still:

1) it would put format(.) examples on the help page of print.POSIXct(),
   where format.POSIXct() is *not* documented

2) It *would* make sense that print.POSIXct() allowed for a 'tz' argument
   (and maybe 'usetz' too).  This/these would be (an) extra
   argument(s) rather than passing '...' not just to print() but
   also to format()

My personal preference would tend to add both
 tz = ""
and  usetz = TRUE
to the formal arguments of print.POSIXct and pass them to the
format(.) calls.
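
A minimal sketch of that change (only illustrating the signature; not a
final patch):

  print.POSIXct <- function(x, tz = "", usetz = TRUE, ...)
  {
      max.print <- getOption("max.print", 9999L)
      if(max.print < length(x)) {
          print(format(x[seq_len(max.print)], tz = tz, usetz = usetz), ...)
          cat(' [ reached getOption("max.print") -- omitted',
              length(x) - max.print, 'entries ]\n')
      } else print(if(length(x)) format(x, tz = tz, usetz = usetz)
                   else paste(class(x)[1L], "of length 0"), ...)
      invisible(x)
  }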

Martin


> sessionInfo()
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 14.04.5 LTS
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] segfault with POSIXlt zone=NULL zone=""

2016-12-06 Thread Martin Maechler
> Joshua Ulrich 
> on Tue, 6 Dec 2016 09:51:16 -0600 writes:

> On Tue, Dec 6, 2016 at 6:37 AM,   wrote:
>> Hi all,
>> 
>> I ran into a segfault while playing with dates.
>> 
>> $ R --no-init-file
>> ...
>> > library(lubridate); d=as.POSIXlt(floor_date(Sys.time(),"year")); d$zone=NULL; d$zone=""; d
>> 
> If you're asking about a bug in R, you should provide a *minimal*
> reproducible example (i.e. one without any package dependencies).
> This has nothing to do with lubridate, so you can reproduce the
> behavior with:

> d <- as.POSIXlt(Sys.time())
> d$zone <- NULL
> d$zone <- ""
> d

[..]

>> Hope I'm not doing something illegal...
>> 
> You are.  You're changing the internal structure of a POSIXlt object
> by re-ordering the list elements.  You should not expect a malformed
> POSIXlt object to behave as if it's correctly formed.  You can see
> it's malformed by comparing it's unclass()'d output.

> d <- as.POSIXlt(Sys.time())
> unclass(d)  # valid POSIXlt object
> d$zone <- NULL
> d$zone <- ""
> unclass(d)  # your malformed POSIXlt object

Indeed, really illegal, i.e. "against the law" ... ;-)

Thank you, Joshua!

Still, if R segfaults without the user explicitly
calling .Call(), .Internal()  or similar -- as here --
we usually acknowledge there *is* a bug in R .. even if it is
only triggered by a user's "illegal" messing around.

an MRE for the above, where I really only re-order the "internal" list:

d <- as.POSIXlt("2016-12-06"); dz <- d$zone; d$zone <- NULL; d$zone <- dz; f <- 
format(d)

>  *** caught segfault ***
> address 0x8020, cause 'memory not mapped'

> Traceback:
>  1: format.POSIXlt(d)
>  2: format(d)

The current code is "optimized for speed" (not perfectly), and
a patch should hopefully address the C code.

Note that a smaller MRE -- which does *not* re-order, but just
invalidates the time zone -- is

  d <- as.POSIXlt("2016-12-06"); d$zone <- 1; f <- format(d)

--

I have now committed a "minimal" patch (to the C code) which for
the above two cases gives a sensible error rather than a
seg.fault :

  > d <- as.POSIXlt("2016-12-06"); d$zone <- 1 ; f <- format(d)
  Error in format.POSIXlt(d) : 
invalid 'zone' component in "POSIXlt" structure

  > d <- as.POSIXlt("2016-12-06"); dz <- d$zone; d$zone <- NULL; d$zone <- dz; f <- format(d)
  Error in format.POSIXlt(d) : 
invalid 'zone' component in "POSIXlt" structure
  > 

I guess that it should still be possible to produce a segfault
with invalid 'POSIXlt' structures though.
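
For package code that builds or modifies POSIXlt objects by hand, a
rough R-level sanity check could look as follows (a sketch only, using
the component names from ?DateTimeClasses; the real fix is in the C code):

  validPOSIXlt <- function(x) {
      comps <- c("sec", "min", "hour", "mday", "mon", "year",
                 "wday", "yday", "isdst")
      lx <- unclass(x)
      identical(names(lx)[seq_along(comps)], comps)
  }
  validPOSIXlt(as.POSIXlt("2016-12-06"))  # TRUE for a well-formed object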

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Strange behavior when using progress bar (Fwd: Re: [R] The code itself disappears after starting to execute the for loop)

2016-12-07 Thread Martin Maechler
>>>>> Jon Skoien <jon.sko...@jrc.ec.europa.eu>
>>>>> on Wed, 7 Dec 2016 11:04:04 +0100 writes:

> I would like to ask once more if this is reproducible also for others? 
> If yes, should I submit it as a bug-report?

> Best,
> Jon

Please, Windows users: this is possibly only for you!

Note that I do *not* see problems on Linux (in ESS; did not try RStudio).

Please also indicate in which form you are running R.
Behavior here may well depend on whether this is inside RStudio, ESS,
the "Windows GUI", the "Windows terminal", ...

Martin Maechler,
ETH Zurich


> On 11/28/2016 11:26 AM, Jon Skoien wrote:
>> I first answered to the email below in r-help, but as I did not see 
>> any response, and it looks like a bug/unwanted behavior, I am also 
>> posting here. I have observed this in RGui, whereas it seems not to 
>> happen in RStudio.
>> 
>> Similar to OP, I sometimes have a problem with functions using the 
>> progress bar. Frequently, the console is cleared after x iterations 
>> when the progress bar is called in a function which is wrapped in a 
>> loop. In the example below, this happened for me every ~44th 
>> iteration. Interestingly, it seems that reduction of the sleep times 
>> in this function increases the number of iterations before clearing. 
>> In my real application, where the progress bar is used in a much 
>> slower function, the console is cleared every 2-3 iteration, which 
>> means that I cannot scroll back to check the output.

 testit <- function(x = sort(runif(20)), ...) {
   pb <- txtProgressBar(...)
   for(i in c(0, x, 1)) {Sys.sleep(0.2); setTxtProgressBar(pb, i)}
   Sys.sleep(1)
   close(pb)
 }
 
 it <- 0
 while (TRUE) {testit(style = 3); it <- it + 1; print(paste("done", it))}

>> Is this only a problem for a few, or is it reproducible? Any hints to
>> what the problem could be, or if it can be fixed? I have seen this in 
>> some versions of R, and could also reproduce in 3.3.2.

"some versions of R" ... all on Windows ?

>> 
>> Best wishes,
>> Jon
>> 
>> R version 3.3.2 (2016-10-31)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> Running under: Windows 8.1 x64 (build 9600)
>> 
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>> 
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods base

[.]

> Jon Olav Skøien
> Joint Research Centre - European Commission
> Institute for Space, Security & Migration
> Disaster Risk Management Unit

> Via E. Fermi 2749, TP 122,  I-21027 Ispra (VA), ITALY

> jon.sko...@jrc.ec.europa.eu
> Tel:  +39 0332 789205

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] accelerating matrix multiply

2017-01-10 Thread Martin Maechler
>>>>> Cohn, Robert S <robert.s.c...@intel.com>
>>>>> on Sat, 7 Jan 2017 16:41:42 + writes:

> I am using R to multiply some large (30k x 30k double)
> matrices on a 64 core machine (xeon phi).  I added some timers
> to src/main/array.c to see where the time is going. All of the
> time is being spent in the matprod function, most of that time
> is spent in dgemm. 15 seconds is in matprod in some code that
> is checking if there are NaNs.

> > system.time (C <- B %*% A)
> nancheck: wall time 15.240282s
>dgemm: wall time 43.111064s
>  matprod: wall time 58.351572s
> user   system  elapsed 
> 2710.154   20.999   58.398
> 
> The NaN checking code is not being vectorized because of the
> early exit when NaN is detected:
> 
>   /* Don't trust the BLAS to handle NA/NaNs correctly: PR#4582
>* The test is only O(n) here.
>*/
>   for (R_xlen_t i = 0; i < NRX*ncx; i++)
>   if (ISNAN(x[i])) {have_na = TRUE; break;}
>   if (!have_na)
>   for (R_xlen_t i = 0; i < NRY*ncy; i++)
>   if (ISNAN(y[i])) {have_na = TRUE; break;}
> 
> I tried deleting the 'break'. By inspecting the asm code, I
> verified that the loop was not being vectorized before, but
> now is vectorized. Total time goes down:
> 
> system.time (C <- B %*% A)
> nancheck: wall time  1.898667s
>dgemm: wall time 43.913621s
>  matprod: wall time 45.812468s
> user   system  elapsed 
> 2727.877   20.723   45.859
> 
> The break accelerates the case when there is a NaN, at the
> expense of the much more common case when there isn't a
> NaN. If a NaN is detected, it doesn't call dgemm and calls its
> own matrix multiply, which makes the NaN check time
> insignificant so I doubt the early exit provides any benefit.
> 
> I was a little surprised that the O(n) NaN check is costly
> compared to the O(n**2) dgemm that follows. I think the reason
> is that nan check is single thread and not vectorized, and my
> machine can do 2048 floating point ops/cycle when you consider
> the cores/dual issue/8 way SIMD/muladd, and the constant
> factor will be significant for even large matrices.
> 
> Would you consider deleting the breaks? I can submit a patch
> if that will help. Thanks.
> 
> Robert

Thank you Robert for bringing the issue up ("again", possibly).
Within R core, some have seen somewhat similar timing on some
platforms (gcc) .. but much less dramatic differences, e.g., on
macOS with clang.

As seen in the source code you cite above, the current
implementation was triggered by a nasty BLAS bug .. actually
also showing up only on some platforms, possibly depending on
runtime libraries in addition to the compilers used.

Do you have R code (including set.seed(.) if relevant) showing how to
generate the large square matrices you mentioned in the beginning, so
we can get to some reproducible benchmarks?
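
Something like the following, perhaps (a minimal sketch, assuming plain
random matrices suffice; note a 30k x 30k double matrix needs ~7.2 GB):

  set.seed(17)
  n <- 30000L                      # reduce on smaller machines
  A <- matrix(rnorm(n * n), n, n)
  B <- matrix(rnorm(n * n), n, n)
  system.time(C <- B %*% A)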

With best regards,
Martin Maechler

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Different results for cos,sin,tan and cospi,sinpi,tanpi

2016-12-01 Thread Martin Maechler
>>>>> Ei-ji Nakama <nak...@ki.rim.or.jp>
>>>>> on Thu, 1 Dec 2016 14:39:55 +0900 writes:

> Hi,
> i try sin, cos, and tan.

>> sapply(c(cos,sin,tan),function(x,y)x(y),1.23e45*pi)
> [1] 0.5444181 0.8388140 1.5407532

> However, *pi results the following

>> sapply(c(cospi,sinpi,tanpi),function(x,y)x(y),1.23e45)
> [1] 1 0 0

> Please try whether the following becomes all right.

[..]

Yes, it does  -- the fix will be in all future versions of R.

Thank you very much Ei-ji Nakama, for this valuable contribution
to make R better!

Martin Maechler,
ETH Zurich


> -- 
> Best Regards,
> --
> Eiji NAKAMA 
> "\u4e2d\u9593\u6804\u6cbb"  

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Different results for cos,sin,tan and cospi,sinpi,tanpi

2016-12-01 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Thu, 1 Dec 2016 09:36:10 +0100 writes:

>>>>> Ei-ji Nakama <nak...@ki.rim.or.jp>
>>>>> on Thu, 1 Dec 2016 14:39:55 +0900 writes:

>> Hi,
>> i try sin, cos, and tan.

>>> sapply(c(cos,sin,tan),function(x,y)x(y),1.23e45*pi)
>> [1] 0.5444181 0.8388140 1.5407532

>> However, *pi results the following

>>> sapply(c(cospi,sinpi,tanpi),function(x,y)x(y),1.23e45)
>> [1] 1 0 0

>> Please try whether the following becomes all right.

> [..]

> Yes, it does  -- the fix will be in all future versions of R.

oops not so quickly, Martin!

Of course, the results then coincide,  by sheer implementation.

*BUT* it is not at all clear which of the two results is better;
e.g., if you replace '1.23' by '1' in the above examples, the
result of the unchanged  *pi() functions is 100% accurate,
whereas

 R> sapply(c(cos,sin,tan), function(Fn) Fn(1e45*pi))
 [1] -0.8847035 -0.4661541  0.5269043

is "garbage".  After all,  1e45 is an even integer and so, the
(2pi)-periodic functions should give the same as for 0  which
*is*  (1, 0, 0).

For such very large arguments, the results of all of sin() ,
cos() and tan()  are in some sense "random garbage" by
necessity:
Such large numbers carry zero information about their value modulo
[0, 2pi)  or (-pi, pi]  and hence any (non-trivial) periodic
function with such a "small" period can only return "random noise".
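
To see why, compare the gap between adjacent representable doubles
near 1e45 with the length of the period (a small illustration):

  .Machine$double.eps * 1e45  # ~ 2.2e29, roughly the gap to the next double
  2 * pi                      # ~ 6.28, i.e., many orders of magnitude smaller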


> Thank you very much Ei-ji Nakama, for this valuable contribution
> to make R better!

That is still true!  It raises the issue to all of us and will
improve the documentation at least!

At the moment, I'm not sure where we should go.
Of course, I could start experiments using my own 'Rmpfr'
package where I can (with increasing computational effort!) get
correct values (for increasingly larger arguments) but at the
moment, I don't see how this would help.

Martin

> Martin Maechler,
> ETH Zurich


>> -- 
>> Best Regards,
>> --
>> Eiji NAKAMA 
>> "\u4e2d\u9593\u6804\u6cbb"  

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] problem with normalizePath()

2016-12-01 Thread Martin Maechler
>>>>> Evan Cortens <ecort...@mtroyal.ca>
>>>>> on Wed, 30 Nov 2016 09:58:59 -0700 writes:

> I found this as well. At our institution, our home directories are on
> network shares that are mapped to local drives. The default, it appears, is
> to set the location for libraries (etc) to the network share name
> (//computer//share/director/a/b/user) rather than the local drive mapping
> (H:/). Given the issue with dir.create(), this means it's impossible to
> install packages (since it tries to "create" the share, not the highest
> directory). This can be fixed in the same way Michael found, namely, set
> the environment variables to use the local mapping rather than the network
> share. But ideally, the fix would be to treat Windows network paths
> correctly.

Yes, and why shouldn't Microsoft be the institution that can best
judge how to do that,  now that they sell a "Microsoft R"  ??
!??!?!??!?!??!?
(trying again with BCC;  next time, I'll use CC).

(a slowly increasingly frustrated)
Martin Maechler
ETH Zurich

> Best,
> Evan

> On Wed, Nov 30, 2016 at 7:16 AM, Laviolette, Michael <
> michael.laviole...@dhhs.nh.gov> wrote:

>> In researching another issue, I discovered a workaround: the network drive
>> folder needs to be mapped to the local PC.
>> 
>> setwd("//Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/Michael Laviolette/Stat tools")
>> df1 <- readxl::read_excel("addrlist-4-MikeL.xls", 2)
>> # fails, throws same error
>> df2 <- readxl::read_excel("Z:/Stat tools/addrlist-4-MikeL.xls", 2)  # works
>> 
>> -Original Message-
>> From: Martin Maechler [mailto:maech...@stat.math.ethz.ch]
>> Sent: Friday, November 18, 2016 3:37 PM
>> To: Evan Cortens
>> Cc: Laviolette, Michael; r-devel@r-project.org
>> Subject: Re: [Rd] problem with normalizePath()
>> 
>> >>>>> Evan Cortens <ecort...@mtroyal.ca>
>> >>>>> on Thu, 17 Nov 2016 15:51:03 -0700 writes:
>> 
>> > I wonder if this could be related to the issue that I
>> > submitted to bugzilla about two months ago? (
>> > https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17159)
>> 
>> > That is to say, could it be that it's treating the first
>> > path after the single backslash as an actual directory,
>> > rather than as the name of the share?
>> 
>> > --
>> > Evan Cortens, PhD Institutional Analyst - Office of
>> > Institutional Analysis Mount Royal University 403-440-6529
>> 
>> Could well be.  Thank you, Evan, also for your bug report including patch
>> proposal.
>> 
>> In such situations we (R core) would be really happy if Microsoft showed
>> another facet of their investment into R:
>> Ideally there should be enough staff who can judge and test such bugs and
>> bug fixes?
>> 
--> I'm BCC'ing this to one place at least.
>> 
>> Best,
>> Martin Maechler  ETH Zurich
>> 
>> > On Thu, Nov 17, 2016 at 2:28 PM, Laviolette, Michael <
>> > michael.laviole...@dhhs.nh.gov> wrote:
>> 
>> >> The packages "readxl" and "haven" (and possibly others)
>> >> no longer access files on shared network drives. The
>> >> problem appears to be in the normalizePath()
>> >> function. The file can be read from a local drive or by
>> >> functions that don't call normalizePath(). The error
>> >> thrown is
>> >>
>> >> Error:
>> >> path[1]="\\Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/17.xls":
>> >> The system cannot find the file specified
>> >>
>> >> Here's my session:
>> >>
>> >> library(readxl)
>> >> library(XLConnect)
>> >>
>> >> # attempting to read file from network drive
>> >> df1 <- read_excel("//Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/17.xls")
>> >> # pathname is fully qualified, but error thrown as above
>> >>
>> >> cat(normalizePath("//Hzndhhsvf2/data/OCPH/EPI/BHSDM/Group/17.xls"))
>> >> # throws same error
>> >>
>> >> # reading same file with different function
>> >> df2 <- readWorksheetFromFile(&

Re: [Rd] seq.int/seq.default

2017-01-06 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Thu, 5 Jan 2017 12:39:29 +0100 writes:

>>>>> Mick Jordan <mick.jor...@oracle.com>
>>>>>     on Wed, 4 Jan 2017 08:15:03 -0800 writes:

>> On 1/4/17 1:26 AM, Martin Maechler wrote:
>>>>>>>> Mick Jordan <mick.jor...@oracle.com>
>>>>>>>> on Tue, 3 Jan 2017 07:57:15 -0800 writes:
>>> > This is a message for someone familiar with the implementation.
>>> > Superficially the R code for seq.default and the C code for seq.int
>>> > appear to be semantically very similar. My question is whether, in
>>> > fact, it is intended that they behave identically for all inputs.
>>> 
>>> Strictly speaking, "no": As usual, RT?Manual (;-)
>>> 
>>> The help page says in the very first paragraph ('Description'):
>>> 
>>> ‘seq’ is a standard generic with a default method.
>>> ‘seq.int’ is a primitive which can be much faster but
>>> has a few restrictions.
>>> 
>>> > I have found two cases so far where they differ, first
>>> > that seq.int will coerce a character string to a real (via
>>> > Rf_asReal) whereas seq.default appears to coerce it to NA
>>> > and then throws an error:
>>> 
>>> >> seq.default("2", "5")
>>> > Error in seq.default("2", "5") : 'from' cannot be NA, NaN or infinite
>>> >> seq.int("2", "5")
>>> > [1] 2 3 4 5
>>> 
>>> this may be a bit surprising (if one does _not_ look at
>>> the code), indeed, notably because seq.int() is
>>> mentioned to have more restrictions than seq() which
>>> here calls seq.default().  "Surprising" also when
>>> considering
>>> 
>>> > "2":"5"
>>> [1] 2 3 4 5
>>> 
>>> and the documentation of ':' claims 'from:to' to be the
>>> same as seq(from, to) apart from the case of factors.
>>> 
>>> --- I am considering a small change in seq.default()
>>> which would make it work for this case, compatibly with
>>> ":" and seq.int().
>>> 
>>> > and second, that the error messages for non-numeric
>>> > arguments differ:
>>> 
>>> which I find fine... if the functions were meant to be
>>> identical, we (the R developers) would be silly to have
>>> both, notably as the ".int" suffix has emerged as
>>> confusing the majority of useRs (who don't read help
>>> pages).
>>> 
>>> Rather it has been meant as saying "internal" (including
>>> "fast") also for other such R functions, but the suffix
>>> of course is a potential clash with S3 method naming
>>> schemes _and_ the fact that 'int' is used as type name
>>> for integer in other languages, notably C.
>>> 
>>> > seq.default(to=quote(b), by=2)
>>> > Error in is.finite(to) : default method not implemented for type 'symbol'
>>> 
>>> which I find a very appropriate and helpful message
>>> 
>>> > seq.int(to=quote(b), by=2)
>>> > Error in seq.int(to = quote(b), by = 2) :
>>> > 'to' cannot be NA, NaN or infinite
>>> 
>>> which is true, as well, and there's no "default method"
>>> to be mentioned, but you are right that it would be
>>> nicer if the message mentioned 'symbol' as well.

>> Thanks for the clarifications. It was surprising that
>> seq.int supported more types than seq.default. I was
>> expecting the reverse.

> exactly, me too!

>> BTW, There are a couple of, admittedly odd, cases,
>> exposed by brute force testing, where seq.int will
>> actually return "missing", which I presume is not
>> intended, and seq.default behaves differently, vis:

>>> seq.default(to=1,by=2)
>> [1] 1
>>> seq.int(to=1,by=2)

>>> > x <- seq.int(to=1,by=2)
>>> x
>> Error: argument "x" is missing, with no default

>> Lines 792 and 799 of seq.c return the incoming argument
>> (as op

Re: [Rd] utils::ls.str(): Partial argument name 'digits' to seq() (should be digits.d?)

2017-01-03 Thread Martin Maechler
You are right (though picky).  I have updated it now.

Thank you Henrik!
Martin

> Should utils::ls.str() be updated as:

> svn diff src/library/utils/R/str.R
> Index: src/library/utils/R/str.R
> ===
> --- src/library/utils/R/str.R (revision 71879)
> +++ src/library/utils/R/str.R (working copy)
> @@ -622,7 +622,7 @@
>  args$digits.d <- NULL
>  }
>  strargs <- c(list(max.level = max.level, give.attr = give.attr,
> -  digits = digits), args)
> +  digits.d = digits), args)
>  for(nam in x) {
>   cat(nam, ": ")


[...]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] seq.int/seq.default

2017-01-05 Thread Martin Maechler
>>>>> Mick Jordan <mick.jor...@oracle.com>
>>>>> on Wed, 4 Jan 2017 08:15:03 -0800 writes:

> On 1/4/17 1:26 AM, Martin Maechler wrote:
>>>>>>> Mick Jordan <mick.jor...@oracle.com>
>>>>>>> on Tue, 3 Jan 2017 07:57:15 -0800 writes:
>> > This is a message for someone familiar with the implementation.
>> > Superficially the R code for seq.default and the C code for seq.int
>> > appear to be semantically very similar. My question is whether, in fact,
>> > it is intended that they behave identically for all inputs.
>> 
>> Strictly speaking, "no":  As usual, RT?Manual (;-)
>> 
>> The help page says in the very first paragraph ('Description'):
>> 
>> ‘seq’ is a standard generic with a default method.
>> ‘seq.int’ is a primitive which can be much faster but has a few restrictions.
>> 
>> > I have found two cases so far where they differ, first
>> > that seq.int will coerce a character string to a real (via
>> > Rf_asReal) whereas seq.default appears to coerce it to NA
>> > and then throws an error:
>> 
>> >> seq.default("2", "5")
>> > Error in seq.default("2", "5") : 'from' cannot be NA, NaN or infinite
>> >> seq.int("2", "5")
>> > [1] 2 3 4 5
>> >>
>> 
>> this may be a bit surprising (if one does _not_ look at the code),
>> indeed, notably because seq.int() is mentioned to have more
>> restrictions than seq() which here calls seq.default().
>> "Surprising" also when considering
>> 
>> > "2":"5"
>> [1] 2 3 4 5
>> 
>> and the documentation of ':' claims 'from:to' to be the same as
>> seq(from, to)  apart from the case of factors.
>> 
>> --- I am considering a small change in  seq.default()
>> which would make it work for this case, compatibly with ":" and seq.int().
>> 
>> 
>> > and second, that the error messages for non-numeric arguments differ:
>> 
>> which I find fine... if the functions were meant to be
>> identical, we (the R developers) would be silly to have both,
>> notably as the ".int" suffix  has emerged as confusing the
>> majority of useRs (who don't read help pages).
>> 
>> Rather it has been meant as saying "internal" (including "fast") also for other
>> such R functions, but the suffix of course is a potential clash
>> with S3 method naming schemes _and_ the fact that 'int' is used
>> as type name for integer in other languages, notably C.
>> 
>> > seq.default(to=quote(b), by=2)
>> > Error in is.finite(to) : default method not implemented for type 'symbol'
>> 
>> which I find a very appropriate and helpful message
>> 
>> > seq.int(to=quote(b), by=2)
>> > Error in seq.int(to = quote(b), by = 2) :
>> > 'to' cannot be NA, NaN or infinite
>> 
>> which is true, as well, and there's no "default method" to be
>> mentioned, but you are right that it would be nicer if the
>> message mentioned 'symbol' as well.

> Thanks for the clarifications. It was surprising that seq.int supported 
> more types than seq.default. I was expecting the reverse.

exactly, me too!

> BTW, There are a couple of, admittedly odd, cases, exposed by brute 
> force testing, where seq.int will actually return "missing", which I 
> presume is not intended, and seq.default behaves differently, vis:

>> seq.default(to=1,by=2)
> [1] 1
>> seq.int(to=1,by=2)

>> > x <- seq.int(to=1,by=2)
>> x
> Error: argument "x" is missing, with no default

> Lines 792 and 799 of seq.c return the incoming argument (as opposed to a 
> value based on its coercion to double via asReal) and this can, as in 
> the above example, be "missing".

> Thanks
> Mick Jordan

Thanks a lot, Mick -- you are right!

I'm fixing these  (the line numbers have changed greatly in the
meantime; remember we work with "R-devel", i.e., the "trunk",
always available at
https://svn.r-project.org/R/trunk/src/main/seq.c ).

Martin Maechler
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] seq.int/seq.default

2017-01-05 Thread Martin Maechler
> Mick Jordan 
> on Wed, 4 Jan 2017 12:49:41 -0800 writes:

> On 1/4/17 8:15 AM, Mick Jordan wrote:
> Here is another difference that I am guessing is unintended.

>> y <- seq.int(1L, 3L, length.out=2)
>> typeof(y)
> [1] "double"
>> x <- seq.default(1L, 3L, length.out=2)
>> typeof(x)
> [1] "integer"

> The if (by == R_MissingArg) branch at line 842 doesn't contain a check 
> for "all INTSXP" unlike the if (to == R_MissingArg) branch.

> Mick

I'll look at this case, too,
thank you once more!

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] colnames for data.frame could be greatly improved

2016-12-29 Thread Martin Maechler
> Hi there,
> Any update on this?
> Should I create bugzilla ticket and submit patch?

> Regards
> Jan Gorecki

Hi Jan,

Why should we care that the  do.NULL = FALSE case is slower?
After all, do.NULL = TRUE is the default.

In other words, where are use cases where it is problematic that
do.NULL = FALSE is relatively slow?

Shorter code  *is* nicer than longer code,  so I need a bit more
convincing as to why we should add more code for that special case ..

Martin Maechler, ETH Zurich

> On 20 December 2016 at 01:27, Jan Gorecki <j.gore...@wit.edu.pl> wrote:
> > Hello,
> >
> > colnames seems to be not optimized well for data.frame. It escapes
> > processing for data.frame in
> >
> >   if (is.data.frame(x) && do.NULL)
> > return(names(x))
> >
> > but only when do.NULL is true. This makes a huge difference when
> > do.NULL is false. Minimal edit to `colnames`:
> >
> > if (is.data.frame(x)) {
> > nm <- names(x)
> > if (do.NULL || !is.null(nm))
> > return(nm)
> > else
> > return(paste0(prefix, seq_along(x)))
> > }
> >
> > Script and timings:
> >
> > N=1e7; K=100
> > set.seed(1)
> > DF <- data.frame(
> > id1 = sample(sprintf("id%03d",1:K), N, TRUE),  # large groups (char)
> > id2 = sample(sprintf("id%03d",1:K), N, TRUE),  # large groups (char)
> > id3 = sample(sprintf("id%010d",1:(N/K)), N, TRUE), # small groups (char)
> > id4 = sample(K, N, TRUE),  # large groups (int)
> > id5 = sample(K, N, TRUE),  # large groups (int)
> > id6 = sample(N/K, N, TRUE),# small groups (int)
> > v1 =  sample(5, N, TRUE),  # int in range [1,5]
> > v2 =  sample(5, N, TRUE),  # int in range [1,5]
> > v3 =  sample(round(runif(100,max=100),4), N, TRUE) # numeric e.g. 
> > 23.5749
> > )
> > cat("GB =", round(sum(gc()[,2])/1024, 3), "\n")
> > #GB = 0.397
> > colnames(DF) = NULL
> > system.time(nm1<-colnames(DF, FALSE))
> > #   user  system elapsed
> > # 22.158   0.299  22.498
> > print(nm1)
> > #[1] "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9"
> >
> > ### restart R
> >
> > colnames <- function (x, do.NULL = TRUE, prefix = "col")
> > {
> > if (is.data.frame(x)) {
> > nm <- names(x)
> > if (do.NULL || !is.null(nm))
> > return(nm)
> > else
> > return(paste0(prefix, seq_along(x)))
> > }
> > dn <- dimnames(x)
> > if (!is.null(dn[[2L]]))
> > dn[[2L]]
> > else {
> > nc <- NCOL(x)
> > if (do.NULL)
> > NULL
> > else if (nc > 0L)
> > paste0(prefix, seq_len(nc))
> > else character()
> > }
> > }
> > N=1e7; K=100
> > set.seed(1)
> > DF <- data.frame(
> > id1 = sample(sprintf("id%03d",1:K), N, TRUE),  # large groups (char)
> > id2 = sample(sprintf("id%03d",1:K), N, TRUE),  # large groups (char)
> > id3 = sample(sprintf("id%010d",1:(N/K)), N, TRUE), # small groups (char)
> > id4 = sample(K, N, TRUE),  # large groups (int)
> > id5 = sample(K, N, TRUE),  # large groups (int)
> > id6 = sample(N/K, N, TRUE),# small groups (int)
> > v1 =  sample(5, N, TRUE),  # int in range [1,5]
> > v2 =  sample(5, N, TRUE),  # int in range [1,5]
> > v3 =  sample(round(runif(100,max=100),4), N, TRUE) # numeric e.g. 
> > 23.5749
> > )
> > cat("GB =", round(sum(gc()[,2])/1024, 3), "\n")
> > #GB = 0.397
> > colnames(DF) = NULL
> > system.time(nm1<-colnames(DF, FALSE))
> > #   user  system elapsed
> > #  0.001   0.000   0.000
> > print(nm1)
> > #[1] "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9"
> >
> > sessionInfo()
> > #R Under development (unstable) (2016-12-19 r71815)
> > #Platform: x86_64-pc-linux-gnu (64-bit)
> > #Running under: Debian GNU/Linux stretch/sid
> > #
> > #locale:
> > # [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
> > # [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
> > # [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
> > # [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
> > # [9] LC_ADDRESS=C   LC_TELEPHONE=C
> > #[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> > #
> > #attached base packages:
> > #[1] stats graphics  grDevices utils datasets  methods   base  #
> > #
> > #loaded via a namespace (and not attached):
> > #[1] compiler_3.4.0
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] structure(NULL, *) is deprecated [was: Unexpected I(NULL) output]

2016-12-29 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Thu, 22 Dec 2016 10:24:43 +0100 writes:

>>>>> Florent Angly <florent.an...@gmail.com>
>>>>> on Tue, 20 Dec 2016 13:42:37 +0100 writes:

>> Hi all,
>> I believe there is an issue with passing NULL to the function I().

>> class(NULL)  # "NULL"  (as expected)
>> print(NULL)   # NULL  (as expected)
>> is.null(NULL) # TRUE  (as expected)

>> According to the documentation I() should return a copy of its input
>> with class "AsIs" prepended:

>> class(I(NULL))  # "AsIs"  (as expected)
>> print(I(NULL))   # list()  (not expected! should be NULL)
>> is.null(I(NULL)) # FALSE  (not expected! should be TRUE)

>> So, I() does not behave according to its documentation. 

> yes.

>> In R, it is
>> not possible to give NULL attributes, but I(NULL) attempts to do that
>> nonetheless, using the structure() function. Probably:
>> 1/ structure() should not accept NULL as input since the goal of
>> structure() is to set some attributes, something that cannot be done on
>> NULL.

> I tend to agree.  However if we gave an error now, I notice that
> even our own code, e.g., in stats:::formula.default()  would fail.

> Still, I think we should consider *deprecating*  structure(NULL, *),
> so it would give a *warning* (and continue working otherwise)
> (for a while before giving an error a year later).

 [..]

> Martin Maechler
> ETH Zurich

Since svn rev 71841,   structure(NULL, *) now __is__ deprecated
in R-devel, i.e.,

  > structure(NULL, foo = 2)
  list()
  attr(,"foo")
  [1] 2
  Warning message:
  In structure(NULL, foo = 2) :
Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
Consider 'structure(list(), *)' instead.
  > 
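
For most package code the fix should be mechanical, following the
warning's advice:

  structure(list(), foo = 2)   # instead of structure(NULL, foo = 2)
  ## or, where NULL really must stay NULL, skip the attributes entirely:
  x <- NULL
  if (!is.null(x)) attr(x, "foo") <- 2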

A dozen or so CRAN packages now not only give warnings but
partly also  ERRORS in their checks,  which I find strange,
but it may be because of too stringent checks (e.g., checks where
all warnings are turned into errors).

The most prominent packages now giving errors are
data.table and ggplot2,  then also GGally.

Of course, we (the R core team) could make the deprecation even
milder by not giving a warning() but only a message(.) aka
"NOTE";  however, that renders the deprecation process longer and more
complicated (notably for us),  and there are still a few months' time
before this version of R will be released...
and really, as I said,... a new warning should rarely cause
*errors* but rather warnings.

OTOH, some of us have now seen / read on the  R-package-devel  mailing list
that it seems ggplot2 has stopped working correctly (under
R-devel only!) in building packages because of this warning.

The current plan is it will eventually, i.e., after the
deprecation period, become an error, so ideally packages are
patched and re-released ASAP.  It's bedtime here now and we will
see tomorrow how to continue.

My current plan is to send an e-mail to the maintainers of the affected
CRAN packages, at least for those packages that are "easy to find".

Martin Maechler,
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] seq.int/seq.default

2017-01-04 Thread Martin Maechler
>>>>> Mick Jordan <mick.jor...@oracle.com>
>>>>> on Tue, 3 Jan 2017 07:57:15 -0800 writes:

> This is a message for someone familiar with the implementation.
> Superficially the R code for seq.default and the C code for seq.int 
> appear to be semantically very similar. My question is whether, in fact, 
> it is intended that they behave identically for all inputs.

Strictly speaking, "no":  As usual, RT?Manual (;-)

The help page says in the very first paragraph ('Description'):

  ‘seq’ is a standard generic with a default method.
  ‘seq.int’ is a primitive which can be much faster but has a few restrictions. 

> I have found two cases so far where they differ, first
> that seq.int will coerce a character string to a real (via
> Rf_asReal) whereas seq.default appears to coerce it to NA
> and then throws an error:

>> seq.default("2", "5")
> Error in seq.default("2", "5") : 'from' cannot be NA, NaN or infinite
>> seq.int("2", "5")
> [1] 2 3 4 5
>> 

this may be a bit surprising (if one does _not_ look at the code),
indeed, notably because seq.int() is mentioned to have more
restrictions than seq() which here calls seq.default().
"Surprising" also when considering

   > "2":"5"
   [1] 2 3 4 5

and the documentation of ':' claims 'from:to' to be the same as
seq(from, to)  apart from the case of factors.

--- I am considering a small change in  seq.default()
which would make it work for this case, compatibly with ":" and seq.int().


> and second, that the error messages for non-numeric arguments differ:

which I find fine... if the functions were meant to be
identical, we (the R developers) would be silly to have both,
notably as the ".int" suffix  has emerged as confusing the
majority of useRs (who don't read help pages).

Rather it has been meant as saying "internal" (including "fast") also for other
such R functions, but the suffix of course is a potential clash
with S3 method naming schemes _and_ the fact that 'int' is used
as type name for integer in other languages, notably C. 

> seq.default(to=quote(b), by=2)
> Error in is.finite(to) : default method not implemented for type 'symbol'

which I find a very appropriate and helpful message

> seq.int(to=quote(b), by=2)
> Error in seq.int(to = quote(b), by = 2) :
> 'to' cannot be NA, NaN or infinite

which is true, as well, and there's no "default method" to be
mentioned, but you are right that it would be nicer if the
message mentioned 'symbol' as well.

> Please reply off list.

[which I understand as meaning that we should  CC you (which of course
 is netiquette to do)]

Martin Maechler
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Very small numbers in hexadecimal notation parsed as zero

2016-12-21 Thread Martin Maechler
> Florent Angly 
> on Tue, 20 Dec 2016 13:26:36 +0100 writes:

> Hi all,
> I have noticed incorrect parsing of very small hexadecimal numbers
> like "0x1.dp-987". Such a hexadecimal representation can
> be produced by sprintf() using the %a flag. The return value is
> incorrectly reported as 0 when coercing these numbers to double using
> as.double()/as.numeric(), as illustrated in the three examples below:

> as.double("0x1.dp-987")# should be 7.645296e-298
> as.double("0x1.0p-1022")  # should be 2.225074e-308
> as.double("0x1.f89fc1a6f6613p-974")  # should be 1.23456e-293

> The culprit seems to be the src/main/util.c:R_strtod function and in
> some cases, removing the zeroes directly before the 'p' leads to
> correct parsing:

> as.double("0x1.dp-987") # 7.645296e-298, as expected
> as.double("0x1.p-1022") # 2.225074e-308, as expected

Yes, this looks like a bug, indeed.
Similarly convincing is a simple comparison (of even less extreme)

> as.double("0x1p-987")
[1] 7.645296e-298
> as.double("0x1.00p-987")
[1] 0
> 

The "bug boundary" seems around here:

> as.double("0x1.p-928") # fails
[1] 0
> as.double("0x1p-928")
[1] 4.407213e-280
> 

> as.double("0x1.p-927") # works
[1] 8.814426e-280

but then adding more zeros before "p-927" also underflows.
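
The round trip through sprintf()'s %a format, which started this
report, shows the bug directly (using a value from above):

  x <- 1.23456e-293
  (h <- sprintf("%a", x))   # typically "0x1.f89fc1a6f6613p-974"
  as.double(h) == x         # FALSE in affected versions: parses as 0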

--> I have created an R bugzilla account for you, so you now
 can submit bug reports (including patch proposals to the source (hint!)) ;-)

Thank you, Florent!
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Unexpected I(NULL) output

2016-12-22 Thread Martin Maechler
>>>>> Florent Angly <florent.an...@gmail.com>
>>>>> on Tue, 20 Dec 2016 13:42:37 +0100 writes:

> Hi all,
> I believe there is an issue with passing NULL to the function I().

> class(NULL)  # "NULL"  (as expected)
> print(NULL)   # NULL  (as expected)
> is.null(NULL) # TRUE  (as expected)

> According to the documentation I() should return a copy of its input
> with class "AsIs" prepended:

> class(I(NULL))  # "AsIs"  (as expected)
> print(I(NULL))   # list()  (not expected! should be NULL)
> is.null(I(NULL)) # FALSE  (not expected! should be TRUE)

> So, I() does not behave according to its documentation. 

yes.

> In R, it is
> not possible to give NULL attributes, but I(NULL) attempts to do that
> nonetheless, using the structure() function. Probably:
> 1/ structure() should not accept NULL as input since the goal of
> structure() is to set some attributes, something that cannot be done on
> NULL.

I tend to agree.  However if we gave an error now, I notice that
even our own code, e.g., in stats:::formula.default()  would fail.

Still, I think we should consider *deprecating*  structure(NULL, *),
so it would give a *warning* (and continue working otherwise)
(for a while before giving an error a year later).

> 2/ I() could accept NULL, but, as an exception, not set an "AsIs"
> class attribute on it. This would be in line with the philosophy of
> the I() function to return an object that is functionally equivalent
> to the input object.

If we'd adopt 2, the I(.) function would become slightly more
complicated and slower...  but possibly not practically
noticeable.
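
Something along these lines, perhaps (a sketch only, assuming a
definition essentially like the current one-liner; *not* committed code):

  I <- function(x) {
      if (is.null(x)) x  # leave NULL alone instead of coercing it to list()
      else structure(x, class = unique.default(c("AsIs", oldClass(x))))
  }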

A last option would be

3/  The help page for I() could note what happens in the NULL case.

That would be the least work for everyone,
but at the moment, I tend to agree that '1/' is worth the pain to
have R's structure() become more consistent.

Martin Maechler
ETH Zurich

> My sessionInfo() returns:
>> sessionInfo()
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1

> locale:
> [1] LC_COLLATE=German_Switzerland.1252
> LC_CTYPE=German_Switzerland.1252
> LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Switzerland.1252

> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base

> Best regards,

> Florent

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Request: Increasing MAX_NUM_DLLS in Rdynload.c

2016-12-20 Thread Martin Maechler
>>>>> Steve Bronder <sbron...@stevebronder.com>
>>>>> on Tue, 20 Dec 2016 01:34:31 -0500 writes:

> Thanks Henrik this is very helpful! I will try this out on our tests and
> see if gcDLLs() has a positive effect.

> mlr currently has tests broken down by learner type such as classification,
> regression, forecasting, clustering, etc.. There are 83 classifiers alone
> so even when loading and unloading across learner types we can still hit
> the MAX_NUM_DLLS error, meaning we'll have to break them down further (or
> maybe we can be clever with gcDLLs()?). I'm CC'ing Lars Kotthoff and Bernd
> Bischl to make sure I am representing the issue well.

This came up *here* in May 2015
and then May 2016 ... did you not find it when googling?

Hint:  Use  
   site:stat.ethz.ch MAX_NUM_DLLS
as search string in Google, so it will basically only search the
R mailing list archives.

Here's the start of that thread :

  https://stat.ethz.ch/pipermail/r-devel/2016-May/072637.html

There was not a clear conclusion back then, notably as
Prof Brian Ripley noted that 100 had already been an increase
and that a large number of loaded DLLs decreases look up speed.

OTOH (I think others have noted that) a large number of DLLs
only penalizes those who *do* load many, and we should probably
increase it.

Your use case of "hyper packages" which load many others
simultaneously is somewhat convincing to me... in so far as the
general feeling is that memory should be cheap and limits should
not be low.

(In spite of Brian Ripley's good reasons against it, I'd still
 aim for a *dynamic*, i.e. automatically increased list here).
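
(To see how close a session already is to the limit:)

  length(getLoadedDLLs())  # compare with MAX_NUM_DLLS, 100 in R <= 3.3.x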

Martin Maechler

> Regards,

> Steve Bronder
> Website: stevebronder.com
> Phone: 412-719-1282
> Email: sbron...@stevebronder.com


> On Tue, Dec 20, 2016 at 1:04 AM, Henrik Bengtsson <
> henrik.bengts...@gmail.com> wrote:

>> On reason for hitting the MAX_NUM_DLLS (= 100) limit is because some
>> packages don't unload their DLLs when they being unloaded themselves.
>> In other words, there may be left-over DLLs just sitting there doing
>> nothing but occupying space.  You can remove these, using:
>> 
>> R.utils::gcDLLs()
>> 
>> Maybe that will help you get through your tests (as long as you're
>> unloading packages).  gcDLLs() will look at base::getLoadedDLLs() and
>> its content and compare to loadedNamespaces() and unregister any
>> "stray" DLLs that remain after corresponding packages have been
>> unloaded.
>> 
>> I think it would be useful if R CMD check would also check that DLLs
>> are unregistered when a package is unloaded
>> (https://github.com/HenrikBengtsson/Wishlist-for-R/issues/29), but of
>> course, someone needs to write the code / a patch for this to happen.
>> 
>> /Henrik
>> 
>> On Mon, Dec 19, 2016 at 6:01 PM, Steve Bronder
>> <sbron...@stevebronder.com> wrote:
>> > This is a request to increase MAX_NUM_DLLS in Rdynload.c from 100 to 500.
>> >
>> > On line 131 of Rdynload.c, changing
>> >
>> > #define MAX_NUM_DLLS 100
>> >
>> >  to
>> >
>> > #define MAX_NUM_DLLS 500
>> >
>> >
>> > In development of the mlr package, there have been several episodes in the
>> > past where we have had to break up unit tests because of the "maximum
>> > number of DLLs reached" error. This error has been an inconvenience that is
>> > going to keep happening as the package continues to grow. Is there more
>> > than meets the eye with this error or would everything be okay if the above
>> > line changes? Would that have a larger effect in other parts of R?
>> >
>> > As R grows, we are likely to see more 'meta-packages' such as the
>> > Hadley-verse, caret, mlr, etc. that need an increasing number of DLLs
>> > loaded at any point in time to conduct effective unit tests. If
>> > MAX_NUM_DLLS is set to 100 for a very particular reason then I apologize,
>> > but if it is possible to increase MAX_NUM_DLLS it would at least make
>> > the testing at mlr much easier.
>> >
>> > I understand you are all very busy and thank you for your time.
>> >
>> >
>> > Regards,
>> >
>> > Steve Bronder
>> > Website: stevebronder.com

Re: [Rd] A question on stats::as.hclust.dendrogram

2017-03-24 Thread Martin Maechler
>>>>> Ma,Man Chun John <m...@mdanderson.org>
>>>>> on Thu, 23 Mar 2017 19:29:25 + writes:

> Hi all,
> This is the first time I'm writing to R-devel, and this time I'm just
> asking about the purpose of a certain line of code in
> stats::as.hclust.dendrogram, which comes up as I'm trying to fix dendextend.

"fix": where is it broken?
Do you mean the fact that in R <= 3.3.3, it is defined via
recursion and hence infeasible for "deep" dendrograms?

In any case, note that  NEWS  for the upcoming version of R,
R 3.4.0  contains 

• The str() and as.hclust() methods for "dendrogram" now also work
  for deeply nested dendrograms thanks to non-recursive
  implementations by Bradley Broom.

so the source code of  as.hclust.dendrogram  has been changed
substantially already.

Note that you **NEVER** see the "real" source code of a function
by printing it to the console.
The source code is in the source of the corresponding package,
in the case of 'stats', as part of the source code of R.

I.e., here,
 https://svn.r-project.org/R/trunk/src/library/stats/R/dendrogram.R


I think the following question has become irrelevant now,
but yes, dendrograms *are* implemented as nested lists.
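
A quick illustration of both points with a small, valid object:

  hc <- hclust(dist(USArrests[1:5, ]))
  d  <- as.dendrogram(hc)
  str(unclass(d)[[1]])  # branches are themselves (possibly nested) lists
  class(as.hclust(d))   # "hclust" -- the round trip works for valid input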

Martin Maechler
ETH Zurich and R core team


> The line in question is at line 128 of dendrogram.R in R-3.3.3, at
> stats::as.hclust.dendrogram:

> stopifnot(length(s) == 2L, all( vapply(s, is.integer, NA) ))

> Is there any legitimate possibility that s is a nested list? Currently I
> have a case where a dendrogram object breaks at this line, because s is a
> nested list:

>> str (s)
> List of 2
> $ : int -779
> $ :List of 2
> ..$ : int -625
> ..$ : int 15

> I'm unsure if my dendrogram was malformed in the first place, since I was
> trying to use dendrapply.

> So, my question is: for that particular check, why use

> stopifnot(length(s) == 2L, all( vapply(s, is.integer, NA) ))

> instead of

> stopifnot(length(s) == 2L, all( vapply(unlist(s), is.integer, NA) ))?

> I appreciate your time and I'm looking forward to your response.

> Cheers,

> Man Chun John Ma, PhD
> Postdoctoral Fellow
> Unit 0903
> Dept Lymphoma & Myeloma Research
> 1515 Holcombe Blvd.
> Houston, TX 77030
> m...@mdanderson.org


> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Documentation of model.frame() and get_all_vars()

2017-03-27 Thread Martin Maechler
ngths differ (found for '(new)')

> But, maybe that's something for the "Details" section? (Or it's a bug
> - I don't really know.)

I would not want to change model.frame.default() currently as it's
too important a building block and it may be wise to require
that its callers should have done recycling.

> Thanks in advance for your consideration.

Thank you Thomas for the suggested help file improvements!
Martin 

--
Martin Maechler
ETH Zurich

> Best,
> -Thomas

> Thomas J. Leeper
> http://www.thomasleeper.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Error in documentation for ?legend

2017-03-27 Thread Martin Maechler
>>>>> POLITZER-AHLES, Stephen [CBS] <stephen.politzerah...@polyu.edu.hk>
>>>>> on Sat, 25 Mar 2017 13:25:32 + writes:

> Right, that's my point. The help page mentions a
> `title.cex`, like I said; saying that `cex` sets the
> default `title.cex` sure implies to me (and presumably to
> the other people whose discussion I linked) that a
> `title.cex` parameter exists. Since no such parameter
> exists, this bit in the documentation is misleading
> (suggesting that there is a `title.cex` parameter which
> can be set, when there really isn't). Regardless of
> whether we call it an "oddity" or what, I don't think it's
> controversial that this is misleading. If it's misleading,
> shouldn't it be removed?

Yes.
I've done so now,  thank you for the report!

(You did not understand Peter:  He *did* agree with you that
 there's no 'title.cex' argument  and explained why the oddity
 probably happened in the distant past ..)

Martin Maechler
ETH Zurich
and R Core Team (as Peter Dalgaard)

> From: peter dalgaard <pda...@gmail.com>
> Sent: Saturday, March 25, 2017 9:10:57 PM
> To: POLITZER-AHLES, Stephen [CBS]
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Error in documentation for ?legend


>> On 25 Mar 2017, at 00:39, POLITZER-AHLES, Stephen [CBS]
>> <stephen.politzerah...@polyu.edu.hk> wrote:
>> 
>> To whom it may concern:
>> 
>> 
>> The help page for ?legend refers to a `title.cex` parameter, which
>> suggests that the function has such a parameter.

> No it does not. All arguments are listed and documented, none of them is
> title.cex, and there's no "...".

> However, the documentation for "cex" has this oddity inside:

> cex: character expansion factor *relative* to current
> ‘par("cex")’.  Used for text, and provides the default for
> ‘pt.cex’ and ‘title.cex’.

> Checking the sources suggests that this is the last anyone has seen of
> title.cex:

> pd$ grep -r title.cex src
> src/library/graphics/man/legend.Rd:\code{pt.cex} and \code{title.cex}.}
> pd$

> The text was inserted as part of the addition of the title.col (!)
> argument, so it looks like the author got some wires crossed.

> -pd

>> As far as I can tell, though, it doesn't; here's an example:
>> 
>> 
>>> plot(1,1)
>>> legend("topright", pch=1, legend="something", title="my legend", title.cex=2)
>> Error in legend("topright", pch = 1, legend = "something", title = "my legend", :
>> unused argument (title.cex = 2)
>> 
>> 
>> This issue appears to have been discussed online before (e.g. here's a
>> post from 2011 mentioning it:
>> http://r.789695.n4.nabble.com/Change-the-text-size-of-the-title-in-a-legend-of-a-R-plot-td3482880.html)
>> but I'm not sure if anyone ever reported it to R developers.
>> 
>> 
>> Is it possible for someone to update the ?legend documentation page so
>> that it doesn't refer to a parameter that isn't usable?
>> 
>> Best,
>> 
>> Steve Politzer-Ahles
>> 
>> ---
>> Stephen Politzer-Ahles
>> The Hong Kong Polytechnic University
>> Department of Chinese and Bilingual Studies
>> http://www.mypolyuweb.hk/~sjpolit/
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] `[` not recognized as a primitive in certain cases.

2017-03-29 Thread Martin Maechler
> Joris Meys 
> on Tue, 28 Mar 2017 15:19:14 +0200 writes:

> Thank you gents, I overlooked the subtle differences.

> On Tue, Mar 28, 2017 at 2:49 PM, Lukas Stadler 
> wrote:

>> “typeof” is your friend here:
>> 
>> > typeof(`[`)
>> [1] "special"
>> > typeof(mc[[1]])
>> [1] "symbol"
>> > typeof(mc2[[1]])
>> [1] "special"
>> 
>> so mc[[1]] is a symbol, and thus not a primitive.

or  str()  which should be better known to Joe Average useR

> mc <- call("[",iris,2,"Species")
> str(mc[[1]])
 symbol [
> str(`[`)
.Primitive("[") 
> 
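
Both forms evaluate to the same thing, as Michael noted -- only the
head of the call differs (a one-line check):

  mc  <- call("[", iris, 2, "Species")           # head: the symbol `[`
  mc2 <- as.call(list(`[`, iris, 2, "Species"))  # head: the primitive itself
  identical(eval(mc), eval(mc2))                 # TRUE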


>> - Lukas
>> 
>> > On 28 Mar 2017, at 14:46, Michael Lawrence 
>> wrote:
>> >
>> > There is a difference between the symbol and the function (primitive
>> > or closure) to which it is bound.
>> >
>> > This:
>> > mc2 <- as.call(list(`[`,iris,2,"Species"))
>> >
>> > Evaluates `[` to its value, in this case the primitive object, and the
>> > primitive itself is incorporated into the returned call.
>> >
>> > If you were to do this:
>> > mc2 <- as.call(list(quote(`[`),iris,2,"Species"))
>> >
>> > The `[` would _not_ be evaluated, quote() would return the symbol, and
>> > the symbol would end up in the call.
>> >
>> > The two forms have virtually identical behavior as long as the call
>> > ends up getting evaluated in the same environment.
>> >
>> > On Tue, Mar 28, 2017 at 3:03 AM, Joris Meys  
wrote:
>> >> Dear,
>> >>
>> >> I have noticed this problem while looking at the following question on
>> >> Stackoverflow :
>> >>
>> >> http://stackoverflow.com/questions/42894213/s4-class-subset-inheritance-with-additional-arguments
>> >>
>> >> While going through callNextMethod, I've noticed the following odd
>> >> behaviour:
>> >>
>> >> mc <- call("[",iris,2,"Species")
>> >>
>> >> mc[[1]]
>> >> ## `[`
>> >>
>> >> is.primitive(`[`)
>> >> ## [1] TRUE
>> >>
>> >> is.primitive(mc[[1]])
>> >> ## [1] FALSE
>> >> # Expected to be TRUE
>> >>
>> >> mc2 <- as.call(list(`[`,iris,2,"Species"))
>> >>
>> >> is.primitive(mc2[[1]])
>> >> ## [1] TRUE
>> >>
>> >> So depending on how I construct the call (using call() or as.call() ),
>> the
>> >> function `[` is or is not recognized as a primitive by is.primitive()
>> >>
>> >> The behaviour is counterintuitive and -unless I miss something obvious
>> >> here- likely to be a bug imho. I immediately admit that my C chops aren't
>> >> sufficient to come up with a patch.
>> >>
>> >> Cheers
>> >> Joris
>> >>
>> >> --
>> >> Joris Meys
>> >> Statistical consultant
>> >>
>> >> Ghent University
>> >> Faculty of Bioscience Engineering
>> >> Department of Mathematical Modelling, Statistics and Bio-Informatics
>> >>
>> >> tel :  +32 (0)9 264 61 79
>> >> joris.m...@ugent.be
>> >> ---
>> >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>> >>
>> >>[[alternative HTML version deleted]]
>> >>
>> >> __
>> >> R-devel@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>> >
>> > __
>> > R-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> 


> -- 
> Joris Meys
> Statistical consultant

> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics

> tel :  +32 (0)9 264 61 79
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

> [[alternative HTML version deleted]]

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Support for user defined unary functions

2017-03-16 Thread Martin Maechler
> Jim Hester 
> on Thu, 16 Mar 2017 12:31:56 -0400 writes:

> Gabe,
> The unary functions have the same precedence as normal SPECIALS
> (although the new unary forms take precedence over binary SPECIALS).
> So they are lower precedence than unary + and -. Yes, both of your
> examples are valid with this patch, here are the results and quoted
> forms to see the precedence.

> `%chr%` <- function(x) as.character(x)

  [more efficient would be `%chr%` <- as.character]

> `%identical%` <- function(x, y) identical(x, y)
> quote("100" %identical% %chr% 100)
> #>  "100" %identical% (`%chr%`(100))

> "100" %identical% %chr% 100
> #> [1] TRUE

> `%num%` <- as.numeric
> quote(1 + - %num% "5")
> #> 1 + -(`%num%`("5"))

> 1 + - %num% "5"
> #> [1] -4

> Jim

I'm sorry to be a bit of a spoiler to "coolness", but
you may know that I like to  applaud Norm Matloff for his book
title "The Art of R Programming",
because for me good code should also be beautiful to some extent.

I really very much prefer

   f(x)
to   %f% x

and hence I really really really cannot see why anybody would prefer
the ugliness of

   1 + - %num% "5"
to
   1 + -num("5")

(after setting  num <- as.numeric )

Martin


> On Thu, Mar 16, 2017 at 12:01 PM, Gabriel Becker  
wrote:
>> Jim,
>> 
>> This seems cool. Thanks for proposing it. To be concrete, the user-defined
>> unary operations would be of the same precedence (or just slightly below?)
>> as built-in unary ones? So
>> 
>> "100" %identical% %chr% 100
>> 
>> would work and return TRUE under your patch?
>> 
>> And  with %num% <- as.numeric, then
>> 
>> 1 + - %num% "5"
>> 
>> would also be legal (though quite ugly imo) and work?
>> 
>> Best,
>> ~G
>> 
>> On Thu, Mar 16, 2017 at 7:24 AM, Jim Hester 
>> wrote:
>>> 
>>> R has long supported user defined binary (infix) functions, defined
>>> with `%fun%`. A one line change [1] to R's grammar allows users to
>>> define unary (prefix) functions in the same manner.
>>> 
>>> `%chr%` <- function(x) as.character(x)
>>> `%identical%` <- function(x, y) identical(x, y)
>>> 
>>> %chr% 100
>>> #> [1] "100"
>>> 
>>> %chr% 100 %identical% "100"
>>> #> [1] TRUE
>>> 
>>> This seems a natural extension of the existing functionality and
>>> requires only a minor change to the grammar. If this change seems
>>> acceptable I am happy to provide a complete patch with suitable tests
>>> and documentation.
>>> 
>>> [1]:
>>> Index: src/main/gram.y
>>> ===
>>> --- src/main/gram.y (revision 72358)
>>> +++ src/main/gram.y (working copy)
>>> @@ -357,6 +357,7 @@
>>> |   '+' expr %prec UMINUS   { $$ = xxunary($1,$2); setId( $$, @$); }
>>> |   '!' expr %prec UNOT { $$ = xxunary($1,$2); setId( $$, @$); }
>>> |   '~' expr %prec TILDE{ $$ = xxunary($1,$2); setId( $$, @$); }
>>> +   |   SPECIAL expr{ $$ = xxunary($1,$2); setId( $$, @$); }
>>> |   '?' expr{ $$ = xxunary($1,$2); setId( $$, @$); }
>>> 
>>> |   expr ':'  expr  { $$ = xxbinary($2,$1,$3);  setId( $$, @$); }
>>> 
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> 
>> 
>> 
>> --
>> Gabriel Becker, PhD
>> Associate Scientist (Bioinformatics)
>> Genentech Research

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] outer not applying a constant function

2017-03-21 Thread Martin Maechler
>>>>> William Dunlap <wdun...@tibco.com>
>>>>> on Mon, 20 Mar 2017 10:20:11 -0700 writes:

>> Or is this a bad idea?
> I don't like the proposal.  I have seen code like the following (in
> fact, I have written such code, where I had forgotten a function was
> not vectorized) where the error would have been discovered much later
> if outer() didn't catch it.

>> outer(1:3, 11:13, sum)
>  Error in outer(1:3, 11:13, sum) :
>dims [product 9] do not match the length of object [1]

> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com

You are right, thank you!
Such a "convenience change" would not be a good idea.

Martin Maechler
ETH Zurich




> On Mon, Mar 20, 2017 at 6:36 AM, Martin Maechler
> <maech...@stat.math.ethz.ch> wrote:
>>>>>>> Gebhardt, Albrecht <albrecht.gebha...@aau.at>
>>>>>>> on Sun, 19 Mar 2017 09:14:56 + writes:
>> 
>> > Hi,
>> > the function outer cannot apply a constant function as in the last line of the following example:
line of the following example:
>> 
>> >> xg <- 1:4
>> >> yg <- 1:4
>> >> fxyg <- outer(xg, yg, function(x,y) x*y)
>> >> fconstg <- outer(xg, yg, function(x,y) 1.0)
>> > Error in outer(xg, yg, function(x, y) 1) :
>> > dims [product 16] do not match the length of object [1]
>> 
>> > Of course there are simpler ways to construct a constant matrix; that
>> > is not my point.
>> 
>> > It happens for me in the context of generating matrices of partial
>> > derivatives, and if one of these partial derivatives happens to be
>> > constant, it fails.
>> 
>> > So e.g this works:
>> 
>> > library(Deriv)
>> > f <- function(x,y) (x-1.5)*(y-1)*(x-1.8)+(y-1.9)^2*(x-1.1)^3
>> > fx <- Deriv(f,"x")
>> > fy <- Deriv(f,"y")
>> > fxy <- Deriv(Deriv(f,"y"),"x")
>> > fxx <- Deriv(Deriv(f,"x"),"x")
>> > fyy <- Deriv(Deriv(f,"y"),"y")
>> 
>> > fg   <- outer(xg,yg,f)
>> > fxg  <- outer(xg,yg,fx)
>> > fyg  <- outer(xg,yg,fy)
>> > fxyg <- outer(xg,yg,fxy)
>> > fxxg <- outer(xg,yg,fxx)
>> > fyyg <- outer(xg,yg,fyy)
>> 
>> > And with
>> 
>> > f <- function(x,y) x+y
>> 
>> > it stops working. Of course I can manually fix this for that special
>> > case, but that's not my point. I simply thought "outer" should be able
>> > to handle constant functions.
>> 
>> ?outer   clearly states that  FUN  needs to be vectorized
>> 
>> but  function(x,y) 1is not.
>> 
>> It is easy to solve by wrapping the function in Vectorize(.):
>> 
>>> x <- 1:3; y <- 1:4
>> 
>>> outer(x,y, function(x,y) 1)
>> Error in dim(robj) <- c(dX, dY) :
>> dims [product 12] do not match the length of object [1]
>> 
>>> outer(x,y, Vectorize(function(x,y) 1))
>>      [,1] [,2] [,3] [,4]
>> [1,]    1    1    1    1
>> [2,]    1    1    1    1
>> [3,]    1    1    1    1
>> 
>> 
>> 
>> So, your "should"  above must be read in the sense
>> 
>> "It really would be convenient here and
>> correspond to other "recycling" behavior of R"
>> 
>> and I agree with that, having experienced the same inconvenience
>> as you several times in the past.
>> 
>> outer() being a nice R-level function (i.e., no C speed up)
>> makes it easy to improve:
>> 
>> Adding something like the line
>> 
>> if(length(robj) == 1L) robj <- rep.int(robj, dX*dY)
>> 
>> before    dim(robj) <- c(dX, dY)   [which gave the error]
>> 
>> would solve the issue and not cost much (in the cases it is unneeded).
>> 
>> Or is this a bad idea?
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Hyperbolic tangent different results on Windows and Mac

2017-03-21 Thread Martin Maechler
>>>>> Rodrigo Zepeda <rzeped...@gmail.com>
>>>>> on Fri, 17 Mar 2017 12:56:06 -0600 writes:

> Dear all,
> We seem to have found a "strange" behaviour in the hyperbolic tangent
> function tanh on Windows.
> When running tanh(356 + 0i) the Windows result is NaN+0i while on Mac
> the result is 1 + 0i. It doesn't seem to be a floating point error because
> on Mac it is possible to run arbitrarily large numbers (say
> tanh(99677873648767519238192348124812341234182374817239847812738481234871823+0i))
> and still get 1 + 0i as result. This seems to be related to the imaginary
> part as tanh(356) returns 1 in both Windows and Mac.

> We have obtained those results in:
> 1) Mac with El Capitan v 10.11.6 *processor: 2.7 GHz Intel Core i5*
> - 2) Mac with Sierra v 10.12.3 *processor: 3.2 GHz Intel Core i5*
> - 3) Windows 10 Home v 1607 *processor: Intel Core m3-SY30 CPU@ 0.90 GHz
> 1.51 GHz*
> - 4) Windows 7 Home Premium Service Pack 1 *processor: Intel Core i5-2410M
> CPU @2.30 GHz 2.30GHz.*

(The hardware should not matter).

Yes, there is a bug here on Windows only, (several Linux
versions work correctly too).

> ​In all cases we are using R version 3.3.3 (64 bits)​


> - *Does anybody have a clue on why is this happening?*

> ​PS: We have previously posted this issue in Stack Overflow (
> 
http://stackoverflow.com/questions/42847414/hyperbolic-tangent-in-r-throws-nan-in-windows-but-not-in-mac).
> A comment suggests it is related to a glibc bug.

Yes, that would have been my guess too... as indeed, R on
Windows, which should work for quite old versions of Windows, has
been using a relatively old (gcc / libc) toolchain.

The upcoming version of R 3.4.0 uses a considerably newer
toolchain *BUT* I've just checked the latest "R-devel" binary
and the bug is still present there.

Here's a slight extension of the answer I wrote to the
above SO question here:  http://stackoverflow.com/a/42923289/161921

... Windows uses somewhat old C libraries, and here it is the
"mathlib" part of glibc. 

More specifically, according to the CRAN download page for R-devel for Windows
https://cran.r-project.org/bin/windows/base/rdevel.html ,
the R 3.3.z series uses the gcc 4.6.3 (March 2012) toolchain, whereas
"R-devel", the upcoming (not yet released!) R 3.4.z series uses
the gcc 4.9.3 (June 2015) toolchain.

According to Ben Bolker's comment on SO, the bug in glibc should have
been fixed in 2012 -- and so the change from 4.6.3 to 4.9.3
should have helped,
*however*, I've just checked (installed the R-devel binary from CRAN on our
Windows server virtual machine) and I see that the problem is still present
there: in yesterday's version of R-devel, tanh(500+0i) still returns NaN+0i.

I now think a better solution would be to use R's internal
substitute (in R's src/main/complex.c): There, we have

#ifndef HAVE_CTANH
#define ctanh R_ctanh
static double complex ctanh(double complex z)
{
return -I * ctan(z * I); /* A 4.5.9 */
}
#endif

and we should use it (by "#undef HAVE_CTANH", or better by a
configure check using ctanh(500 + 0i)), as I see that on Windows,
   R> -1i * tan((500+0i)*1i)
gives
   [1] 1+0i
as it should for tanh(500+0i) --- which tanh() itself does not on Windows.
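
A quick consistency check of that identity from the R prompt (illustrative;
on an affected Windows build the first entry is NaN+0i while the second is
correct):

z <- 500 + 0i
c(tanh(z), -1i * tan(1i * z))
## on a correct build, both entries print as 1+0i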

Martin Maechler
ETH Zurich and R Core

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] outer not applying a constant function

2017-03-20 Thread Martin Maechler
> Gebhardt, Albrecht 
> on Sun, 19 Mar 2017 09:14:56 + writes:

> Hi,
> the function outer cannot apply a constant function, as in the last line of the following example:

>> xg <- 1:4
>> yg <- 1:4
>> fxyg <- outer(xg, yg, function(x,y) x*y)
>> fconstg <- outer(xg, yg, function(x,y) 1.0)
> Error in outer(xg, yg, function(x, y) 1) :
> dims [product 16] do not match the length of object [1]

> Of course there are simpler ways to construct a constant matrix, that is 
not my point.

> It happens for me in the context of generating matrices of partial derivatives, and if one of these partial derivatives happens to be constant it fails.

> So e.g this works:

> library(Deriv)
> f <- function(x,y) (x-1.5)*(y-1)*(x-1.8)+(y-1.9)^2*(x-1.1)^3
> fx <- Deriv(f,"x")
> fy <- Deriv(f,"y")
> fxy <- Deriv(Deriv(f,"y"),"x")
> fxx <- Deriv(Deriv(f,"x"),"x")
> fyy <- Deriv(Deriv(f,"y"),"y")

> fg   <- outer(xg,yg,f)
> fxg  <- outer(xg,yg,fx)
> fyg  <- outer(xg,yg,fy)
> fxyg <- outer(xg,yg,fxy)
> fxxg <- outer(xg,yg,fxx)
> fyyg <- outer(xg,yg,fyy)

> And with

> f <- function(x,y) x+y

> it stops working. Of course I can manually fix this for that special case, but that's not my point. I simply thought "outer" should be able to handle constant functions.

?outer   clearly states that  FUN  needs to be vectorized

but  function(x,y) 1is not.

It is easy to solve by wrapping the function in Vectorize(.):

> x <- 1:3; y <- 1:4

> outer(x,y, function(x,y) 1)
Error in dim(robj) <- c(dX, dY) : 
  dims [product 12] do not match the length of object [1]

> outer(x,y, Vectorize(function(x,y) 1))
     [,1] [,2] [,3] [,4]
[1,]    1    1    1    1
[2,]    1    1    1    1
[3,]    1    1    1    1



So, your "should"  above must be read in the sense

  "It really would be convenient here and
   correspond to other "recycling" behavior of R"

and I agree with that, having experienced the same inconvenience
as you several times in the past.

outer() being a nice R-level function (i.e., no C speed up)
makes it easy to improve:

Adding something like the line

if(length(robj) == 1L) robj <- rep.int(robj, dX*dY)

before    dim(robj) <- c(dX, dY)   [which gave the error]

would solve the issue and not cost much (in the cases it is unneeded).

Or is this a bad idea?
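
For anyone wanting this behavior today without patching R, a small wrapper
along these lines recycles a length-1 result (a hedged sketch for plain
vectors only; base outer() handles more cases):

outer2 <- function(X, Y, FUN = "*", ...) {
    FUN <- match.fun(FUN)
    ## the same column-major expansion outer() uses:
    x <- rep(X, times = length(Y))
    y <- rep(Y, each  = length(X))
    robj <- FUN(x, y, ...)
    if (length(robj) == 1L)   # the recycling proposed above
        robj <- rep.int(robj, length(X) * length(Y))
    dim(robj) <- c(length(X), length(Y))
    robj
}
outer2(1:4, 1:4, function(x, y) 1)  # now a 4 x 4 matrix of ones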

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] IO error when writing to disk

2017-03-22 Thread Martin Maechler
>>>>> realitix  <reali...@gmail.com>
>>>>> on Wed, 22 Mar 2017 10:17:54 +0100 writes:

> Hello,
> I have sent a mail but I got no answer.

All work here happens on a volunteer basis... and it seems
everybody was busy or not interested.

> Can you create a bugzilla account for me.

I've done that now.

Note that your proposed patch did contain a bit too many "copy &
paste" repetitions... which I personally would have liked to see
written differently, using a wrapper (function or macro).

Also, let's assume we are on Linux: would there be a way to create a
small, say 1 MB, temporary file system as a non-root user?
In that case, we could do all the testing from inside R ..
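
One partial answer, as a hedged aside: on Linux the /dev/full device
simulates a full disk (every write fails with ENOSPC), so such a test may
not need a real small file system at all:

try(write.csv(1:1000, "/dev/full"))
## unpatched R reportedly completes without complaint (the bug);
## with the proposed patch this should signal a write error instead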

Best,
Martin Maechler

> Thanks,
> Jean-Sébastien Bevilacqua

> 2017-03-20 10:24 GMT+01:00 realitix <reali...@gmail.com>:

>> Hello,
>> Here a small improvement for R.
>> 
>> When you use the function write.table, if the disk is full for example,
>> the function doesn't return an error and the file is written but 
truncated.
>> 
>> It can be a source of mistakes because you can then copy the output file
>> and think everything is ok.
>> 
>> How to reproduce
>> -
>> 
>> >> write.csv(1:1000, 'path')
>> 
>> You must have a path with a small amount of disk available (on linux:
>> http://souptonuts.sourceforge.net/quota_tutorial.html)
>> 
>> I have joined the patch in this email.
>> Can you open a bugzilla account for me to keep track of this change.
>> 
>> Thanks,
>> Jean-Sébastien Bevilacqua
>> 

> [[alternative HTML version deleted]]

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] complex NA's match(), etc: not back-compatible change proposal

2017-04-03 Thread Martin Maechler
> Suharto Anggono Suharto Anggono via R-devel 
> on Sat, 1 Apr 2017 14:10:06 + writes:

> I am raising this again.

> With
> z <- complex(real = c(0,NaN,NaN), imaginary = c(NA,NA,0)) ,
> results of
> sapply(z, match, table = z)
> and
> match(z, z)
> are different in R 3.4.0 alpha. I think they should be the same.

> I suggest changing 'cequal' in unique.c such that a
> complex number that has both NA and NaN matches NA and
> doesn't match NaN, as such complex number is printed as NA.

Thank you very much, Suharto, for the reminder.

I have committed a change to R-devel yesterday, though
your suggestion above had not been 100% clear to me.

What I think we want and I decided to commit
  r72473 | maechler | 2017-04-02 22:23:56 +0200 (Sun, 02 Apr 2017)

was to entirely mimic how R format()s and print()s complex numbers:

1) If a complex number has a real or imaginary which is NA then
   it is formatted / printed as "NA"
   ==>  All such complex numbers should match()
   i.e. match(), unique(), duplicated() treat such complex
   numbers as "the same".

2) The picture is very different with (non-NA)  NaN:
   There, R formats and prints  NaN+1i  or NaN+99i  or 0+1i*NaN
   differently, and [in R-devel only, planned in R 3.4.0 alpha
   in a day or two!]
   match(), unique(), duplicated() now treat them as different.

The change is more consistent; notably, it gives the same result

for   match(z,z)
and   sapply(z, match, table = z)  

for a variety of z (permutations).
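
A small illustration of the intended behavior (hedged: the exact printed
output depends on the R-devel revision):

z <- complex(real = c(0, NaN, NaN), imaginary = c(NA, NA, 0))
print(z)     # the first two print as NA, the third as NaN+0i
match(z, z)  # now 1 1 3 : the two NA-printing values match each other
sapply(z, match, table = z)   # and this now agrees: 1 1 3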

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Potential bug in utils::citation()

2017-04-03 Thread Martin Maechler
>>>>> Zhian Kamvar <zkam...@gmail.com>
>>>>> on Sun, 2 Apr 2017 16:26:37 -0500 writes:

> Hi, I believe the function utils::citation() will fail if
> the package specified has two or more citation entries in
> the current R-devel. The following error is issued:

> 'missing' can only be used for arguments

> I have created a working example on github [0] that is
> build using R-devel on travis-ci [1]. Jim Hester has
> potentially identified [2] the source of the problem as
> being from a commit on the 27th [3, 4]. I do not have
> R-devel built on my machine, but I believe this error can
> be reproduced on the current R-devel with:

> if (require("boot") & require("utils"))
>utils::citation("boot")

Correct: it does reproduce the new bug 
and that is due to a change by me, and I had started investigation
on Friday (but not with your package and not having seen a
straightforward example yet).

This will be fixed ASAP, i.e., within hours.
Martin Maechler

> Background:

> My package poppr suddenly started failing check on R-devel
> during a weekly travis-ci job [5] due to the error
> above. Another package of mine, ezec, passed [6]. Both
> contain calls to utils::citation() within the vignettes,
> but poppr has two citations and ezec only has one (called
> from another package).

> Thanks, Zhian

> [0]: https://github.com/zkamvar/citest [1]:
> https://travis-ci.org/zkamvar/citest/jobs/217874351 [2]:
> 
https://github.com/wch/r-source/commit/7890e9e87d44f85ab76c0e786036a191eacd71d1
> [3]: https://svn.r-project.org/R/trunk@72419 [4]:
> 
https://github.com/wch/r-source/commit/7890e9e87d44f85ab76c0e786036a191eacd71d1
> [5]:
> https://travis-ci.org/grunwaldlab/poppr/jobs/216452458
> [6]: https://travis-ci.org/grunwaldlab/ezec/jobs/216452916

> -
> Zhian N. Kamvar, Ph. D.  Postdoctoral Researcher (Everhart
> Lab) Department of Plant Pathology University of
> Nebraska-Lincoln

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Very hard to reproduce bug (?) in R-devel

2017-04-05 Thread Martin Maechler
> Winston Chang 
> on Tue, 4 Apr 2017 15:29:40 -0500 writes:

> I've done some more investigation into the problem, and it is very
> difficult to pin down. What it looks like is happening is roughly like 
this:
> - `p` is an environment and `p$e` is also an environment.
> - There is a loop. In each iteration, it looks for one item in `p$e`, 
saves
> it in a variable `x`, then removes that item from `p$e`. Then it invokes
> `x()`. The loop runs again, until there are no more items in `p$e`.

> The problem is that `ls(p$e)` sometimes returns the wrong values -- it
> returns the values that it had in previous iterations of the loop. The
> behavior is very touchy. Almost any change to the code will slightly 
change
> the behavior; sometimes the `ls()` returns values from a different
> iteration of the loop, and sometimes the problem doesn't happen at all.

> I've put a  Dockerfile and instructions for reproducing the problem here:
> https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88

> I think that I've gotten about as far with this as I can, though I'd be
> happy to provide more information if anyone wants to take look at the
> problem.

Dear Winston,

While I agree this may very well be a bug in R(-devel), and hence
also R in 3.4.0 alpha and hence quite important to be dealt with,

your code still involves 3 non-trivial  packages (DBI, R6,
testthat) some of which have their own C code and notably load
a couple of other package's namespaces.
We've always made a point
  https://www.r-project.org/bugs.html
that bugs in R should be reproducible without extra
packages... and I think it would definitely help to pinpoint the
issue to be seen outside of your extra packages' world. 

Or have you been aware of that and are just asking for help
finding a bug in one of the extra packages involved, a bug that might only be 
triggered by recent changes in R ?

OTOH, what you describe above  (p ; p$e ; p$e$x ...)
should be reproducible in pure "base" R code, right?
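
For what it's worth, a base-R sketch of that looping pattern (illustrative
only; it does not, by itself, reproduce the misbehavior described):

p <- new.env()
p$e <- new.env()
p$e$f1 <- function() cat("ran f1\n")
p$e$f2 <- function() cat("ran f2\n")
while (length(nms <- ls(p$e)) > 0L) {
    x <- p$e[[nms[1L]]]              # look up one item ...
    rm(list = nms[1L], envir = p$e)  # ... remove it from p$e ...
    x()                              # ... and invoke it
}
ls(p$e)  # character(0); each ls() above should have reflected the removals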

I'm sorry not to be of more help
Martin

> -Winston

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug report: POSIX regular expression doesn't match for somewhat higher values of upper bound

2017-04-05 Thread Martin Maechler
>   
> on Tue, 4 Apr 2017 08:45:30 + writes:

> Dear Sirs,
> while

>> regexpr('(.{1,2})\\1', 'foo')
> [1] 2
> attr(,"match.length")
> [1] 2
> attr(,"useBytes")
> [1] TRUE

> yields the correct match, an incremented upper bound in

>> regexpr('(.{1,3})\\1', 'foo')
> [1] -1
> attr(,"match.length")
> [1] -1
> attr(,"useBytes")
> [1] TRUE

> incorrectly yields no match.

Hmm, yes, I would also say that this is incorrect
(though I'm always cautious: The  ?regex  help page explicitly
 mentions greedy repetitions, and these can "bite you" ..)

The behavior is also different from the  perl=TRUE one which is
correct (according to the above understanding).

Using  grep() instead of regexpr() makes the behavior easier to parse.
The following code 
--

tx <- c("ab","abc", paste0("foo", c("", "b", "o", "bar", "oofy")))
setNames(nchar(tx), tx)
##      ab     abc     foo    foob    fooo  foobar foooofy 
##       2       3       3       4       4       6       7 

grep1r <- function(n, txt, ...) {
pattern <- paste0('(.{1,',n,'})\\1', collapse="") ## can have empty n
ans <- grep(pattern, txt, value=TRUE, ...)
cat(sprintf("pattern '%s' : ", pattern)); print(ans, quote=FALSE)
invisible(ans)
}

grep1r({}, tx)  # '.{1,}' : because of _greedy_ matching there is __no__ repetition!
grep1r(100,tx)# i.e., these both give an empty match :  character(0)

## matching at most once:
grep1r(1, tx)# matches all 5 starting with "foo"
grep1r(2, tx)# ditto: all have more than 2 chars
grep1r(3, tx)# not "foo": those with more than 3 chars
grep1r(4, tx)# .. those with more than 4 characters
grep1r(5, tx)# .. those with more than 5 characters
grep1r(6, tx)# .. those with more than 6 characters
grep1r(7, tx)# NONE (= those with more than 7 characters)

for(p in c(FALSE,TRUE)) {
cat("\ngrep(*, perl =", p, ") :\n")
for(n in c(list(NULL), 1:7))
grep1r(n, tx, perl = p)
}

--

ends with

> for(p in c(FALSE,TRUE)) {
+ cat("\ngrep(*, perl =", p, ") :\n")
+ for(n in c(list(NULL), 1:7))
+ grep1r(n, tx, perl = p)
+ }

grep(*, perl = FALSE ) :
pattern '(.{1,})\1' : character(0)
pattern '(.{1,1})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,2})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,3})\1' : [1] foob    fooo    foobar  foooofy
pattern '(.{1,4})\1' : [1] foobar  foooofy
pattern '(.{1,5})\1' : [1] foobar  foooofy
pattern '(.{1,6})\1' : [1] foooofy
pattern '(.{1,7})\1' : character(0)

grep(*, perl = TRUE ) :
pattern '(.{1,})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,1})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,2})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,3})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,4})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,5})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,6})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,7})\1' : [1] foo     foob    fooo    foobar  foooofy
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Potential bug in utils::citation()

2017-04-04 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Mon, 3 Apr 2017 10:22:52 +0200 writes:

>>>>> Zhian Kamvar <zkam...@gmail.com>
>>>>> on Sun, 2 Apr 2017 16:26:37 -0500 writes:

>> Hi, I believe the function utils::citation() will fail if
>> the package specified has two or more citation entries in
>> the current R-devel. The following error is issued:

>> 'missing' can only be used for arguments

>> I have created a working example on github [0] that is
>> build using R-devel on travis-ci [1]. Jim Hester has
>> potentially identified [2] the source of the problem as
>> being from a commit on the 27th [3, 4]. I do not have
>> R-devel built on my machine, but I believe this error can
>> be reproduced on the current R-devel with:

>> if (require("boot") & require("utils"))
>> utils::citation("boot")

> Correct: it does reproduce the new bug 
> and that is due to a change by me, and I had started investigation
> on Friday (but not with your package and not having seen a
> straighforward example yet).

> This will be fixed ASAP, i.e., within hours.

In the end, it took two dozen hours. The change is
r72478 | maechler | 2017-04-04 11:41:51 +0200 

Martin


>> Background:

>> My package poppr suddenly started failing check on R-devel
>> during a weekly travis-ci job [5] due to the error
>> above. Another package of mine, ezec, passed [6]. Both
>> contain calls to utils::citation() within the vignettes,
>> but poppr has two citations and ezec only has one (called
>> from another package).

>> Thanks, Zhian

>> [0]: https://github.com/zkamvar/citest [1]:
>> https://travis-ci.org/zkamvar/citest/jobs/217874351 [2]:
>> 
https://github.com/wch/r-source/commit/7890e9e87d44f85ab76c0e786036a191eacd71d1
>> [3]: https://svn.r-project.org/R/trunk@72419 [4]:
>> 
https://github.com/wch/r-source/commit/7890e9e87d44f85ab76c0e786036a191eacd71d1
>> [5]:
>> https://travis-ci.org/grunwaldlab/poppr/jobs/216452458
>> [6]: https://travis-ci.org/grunwaldlab/ezec/jobs/216452916

>> -
>> Zhian N. Kamvar, Ph. D.  Postdoctoral Researcher (Everhart
>> Lab) Department of Plant Pathology University of
>> Nebraska-Lincoln

>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] "table(droplevels(aq)$Month)" in manual page of droplevels

2017-04-13 Thread Martin Maechler
>>>>> Rui Barradas <ruipbarra...@sapo.pt>
>>>>> on Wed, 12 Apr 2017 17:07:45 +0100 writes:

> Hello, Inline.

> Em 12-04-2017 16:40, Henric Winell escreveu:
>> (Let's keep the discussion on-list -- I've added back
>> R-devel.)
>> 
>> On 2017-04-12 16:39, Ulrich Windl wrote:
>> 
>>>>> Henric Winell <nilsson.hen...@gmail.com> schrieb am
>> 12.04.2017
>>>>> um 15:35 in
>>> Nachricht
>>> <b66fe849-bb8d-f00d-87e5-553f866d5...@gmail.com>:
>>>> On 2017-04-12 14:40, Ulrich Windl wrote:
>>>> 
>>>>> The last line of the example in droplevels' manual
>>>>> page seems to be incorrect to me. I think it should
>>>>> read: "table(droplevels(aq$Month))". Amazingly (I
>>>>> don't understand) both variants seem to produce the
>>>>> same result (R 3.3.3): ---
>>>> 
>>>> The manual says that "The function 'droplevels' is used
>>>> to drop unused levels from a 'factor' or, more
>>>> commonly, from factors in a data frame." and, as
>>>> documented, the 'droplevels' generic has methods for
>>>> objects of class "data.frame" and "factor".  So, your
>>>> being amazed is a bit surprising given that 'aq' is a
>>>> data frame.
>>> 
>>> The "surprising" thing is the syntax: I was unaware that
>>> '$' is a generic operator that can be applied to the
>>> result of a function (i.e.: droplevels); I thought it's
>>> kind of a special variable syntax.
>> 
>> Then your surprise is unrelated to the use of
>> 'droplevels'.
>> 
>> Since the 'droplevels' method for objects of class
>> "data.frame" returns a data frame, the extraction
>> operator '$' works directly on the resulting object.  So,
>> 'droplevels(aq)$Month' is essentially the same as
>> 
>> aq <- droplevels(aq)
>> aq$Month
>> 
>> > Isn't there also the syntax
>> ``droplevels(aq)["Month"]''?
>> 
>> Sure, and there are even more ways to do subsetting.  But
>> this is basic stuff and therefore off-topic for R-devel.
>> Please see the manual (?Extract) or, e.g., Chapter 3 of
>> Hadley Wickham's "Advanced R".

> But note that droplevels(aq)["Month"] and
> droplevels(aq)$Month are _not_ the same. The first returns
> a data.frame (with just one vector), the latter returns a
> vector. To return just a vector you could also use

> droplevels(aq)[["Month"]]

> which is preferable for programming, by the way. The '$'
> operator should be reserved for interactive use only.

> Hope this helps,

Indeed, we hope..  Thanks to the helpers!

Ulrich, please note that in the end this was all  because you're
still learning to understand R (e.g., data frames !) better.

As such this was completely inappropriate for R-devel and should
have gotten to the R help list  R-help.

With regards,
Martin Maechler, ETH Zurich

> Rui Barradas
>> 
>> 
>> Henric Winell
>>> 
>>> Regards, Ulrich
>>> 
>>>> 
>>>> 
>>>> Henric Winell
>>>> 
>>>> 
>>>> 
>>>>> aq <- transform(airquality, Month = factor(Month, labels = month.abb[5:9]))
>>>>> aq <- subset(aq, Month != "Jul")
>>>>> table(aq$Month)
>>>>> 
>>>>> May Jun Jul Aug Sep 
>>>>>  31  30   0  31  30 
>>>>> table(droplevels(aq)$Month)
>>>>> 
>>>>> May Jun Aug Sep 
>>>>>  31  30  31  30 
>>>>> table(droplevels(aq$Month))
>>>>> 
>>>>> May Jun Aug Sep 
>>>>>  31  30  31  30 
>>>>>> 
>>>>> --- For the sake of learners, try to keep the examples
>>>>> simple and useful, even though you experts want to
>>>>> impress the newbees...
>>>>> 
>>>>> Ulrich
>>>>> 
>>>>> __
>>>>> R-devel@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] c() documentation after change; 'recursive' in "base" methods

2017-04-20 Thread Martin Maechler
> Suharto Anggono Suharto Anggono via R-devel 
> on Wed, 19 Apr 2017 22:50:41 + writes:

> In R 3.4.0 RC, argument list of 'c' as S4 generic function has become
> (x, ...) .
> However, "S4 methods" section in documentation of 'c' (c.Rd) is not 
updated yet.

Thank you, I've committed a change (72564 & 72565).

> Also, in R 3.4.0 RC, 'c' method of class "Date" ('c.Date') is still not 
explicitly documented.

yes, but that's true for other S3 methods, see below.

This is a bigger issue.  Thank you for raising it!  Look at

 R code --

(mc <- methods("c"))
## [1] c.bibentry*       c.Date            c.difftime        c.noquote         c.numeric_version
## [6] c.person*         c.POSIXct         c.POSIXlt         c.warnings       
## and from `lcNSnm` below, you can see that these are from 'base',
## apart from {bibentry, person} which are from 'utils'
lc <- lapply(mc, function(nm) { f <- getAnywhere(nm) })
names(lc) <- sapply(lc, `[[`, "name")
str(lcwh <- lapply(lc, `[[`, "where"))
lcNSnm <- sub("^namespace:", '', sapply(lcwh, function(v) v[length(v)]))
lcNS <- lapply(lcNSnm, asNamespace)
str(lcMeths <-
    sapply(names(lcNS), function(n) get(n, envir=lcNS[[n]], inherits=FALSE),
           simplify = FALSE))
## $ c.bibentry   :function (..., recursive = FALSE)
## $ c.Date   :function (..., recursive = FALSE)
## $ c.difftime   :function (..., recursive = FALSE)
## $ c.noquote:function (..., recursive = FALSE)
## $ c.numeric_version:function (..., recursive = FALSE)
## $ c.person :function (..., recursive = FALSE)
## $ c.POSIXct:function (..., recursive = FALSE)
## $ c.POSIXlt:function (..., recursive = FALSE)
## $ c.warnings   :function (..., recursive = FALSE)

 .. --

and from these, only the 'noquote' method has a "\usage{ . }"
documentation.

The reason actually is that I had *wanted* to consider
__removing__ the 'recursive' argument from most of these S3 methods,
since all but  c.numeric_version()  completely disregard it and
it would be nicer if they did not have it.

HOWEVER, if it is removed and a user / code has

val <- c(<...>, recursive = r)

then 'recursive' will become part of 'val' which is not desirable.
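
The effect is easy to demonstrate with a toy class (hypothetical "foo"
method, purely for illustration):

c.foo <- function(...)   # note: no 'recursive' formal
    structure(unlist(lapply(list(...), unclass)), class = "foo")
x <- structure(1, class = "foo")
val <- c(x, recursive = TRUE)
unclass(val)  # gains an element named 'recursive' -- not desirable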

I had never thought more about this and if we should try or not to
remove it from the S3 methods in all those cases it is unused
... hoping that callers would also *not* set it.

As _one_ consequence I had decided rather *not* documenting it
for the S3 methods where it is (still ?!) part.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug in nlm()

2017-03-03 Thread Martin Maechler
>>>>> Boehnstedt, Marie <boehnst...@demogr.mpg.de>
>>>>> on Fri, 3 Mar 2017 10:23:12 + writes:

> Dear all,
> I have found a bug in nlm() and would like to submit a report on this.
> Since nlm() is in the stats-package, which is maintained by the R Core 
team, bug reports should be submitted to R's Bugzilla. However, I'm not a 
member of Bugzilla. Could anyone be so kind to add me to R's Bugzilla members 
or let me know to whom I should send the bug report?

Dear Marie,

I can do this ... but  are you really sure?  There is
 https://www.r-project.org/bugs.html
which you should spend some time reading if you haven't already.

I think you would post a MRE (Minimal Reproducible Example) here
{or on stackoverflow or ...} if you'd follow what the 'R bugs' web
page (above) recommends and only report a bug after some
feedback from "the public".

Of course, I could be wrong.. and happy if you explain / tell me why.

Best,
Martin Maechler

> Thank you in advance.

> Kind regards,
> Marie Böhnstedt


> Marie Böhnstedt, MSc
> Research Scientist
> Max Planck Institute for Demographic Research
> Konrad-Zuse-Str. 1, 18057 Rostock, Germany
> www.demogr.mpg.de<http://www.demogr.mpg.de/>




> --
> This mail has been sent through the MPI for Demographic ...{{dropped:9}}


> --
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Control statements with condition with greater than one should give error (not just warning) [PATCH]

2017-03-03 Thread Martin Maechler
>>>>> Henrik Bengtsson <henrik.bengts...@gmail.com>
>>>>> on Fri, 3 Mar 2017 00:52:16 -0800 writes:

> I'd like to propose that whenever the length of the condition passed
> to an if or a while statement differs from one, an error is produced
> rather than just a warning as today:

>> x <- 1:2
>> if (x == 1) message("x == 1")
> x == 1
> Warning message:
> In if (x == 1) message("x == 1") :
> the condition has length > 1 and only the first element will be used

> There are probably legacy reasons for why this is accepted by R in the
> first place, but I cannot imagine that anyone wants to use an if/while
> statement this way on purpose.  The warning about this misuse was
> introduced in November 2002 (R-devel thread 'vector arguments to
> if()'; https://stat.ethz.ch/pipermail/r-devel/2002-November/025537.html).

yes, before, there was *no* warning at all and so the problem existed
in several partly important R packages.

Now is a different time, I agree, and I even tend to agree we
should make this an error... probably however not for the
upcoming R 3.4.0 (in April which is somewhat soon) but rather
for the next version.


> Below is patch (also attached) that introduces option
> 'check.condition' such that when TRUE, 

ouch ouch ouch!   There are many sayings starting with
  "The way to hell "

Here:

The way to R hell starts (or "widens", your choice) by
introducing options() that influence basic language semantics

!!

For robust code, you would have to start testing all R code under all
different possible combinations of these options being set.  I am
sure you would not want this.

No --- don't even think of allowing an option for something such basic!

Martin Maechler
ETH Zurich (and R Core)

> it will generate an error
> rather than a warning (default).  This option allows for a smooth
> migration as it can be added to 'R CMD check --as-cran' and developers
> can give time to check and fix their packages.  Eventually,
> check.condition=TRUE can become the new default.

> With options(check.condition = TRUE), one gets:

>> x <- 1:2
>> if (x == 1) message("x == 1")
> Error in if (x == 1) message("x == 1") : the condition has length > 1

> and

>> while (x < 2) message("x < 2")
> Error in while (x < 2) message("x < 2") : the condition has length > 1


> Index: src/library/base/man/options.Rd
> ===
> --- src/library/base/man/options.Rd (revision 72298)
> +++ src/library/base/man/options.Rd (working copy)
> @@ -86,6 +86,11 @@
> vector (atomic or \code{\link{list}}) is extended, by something
> like \code{x <- 1:3; x[5] <- 6}.}

> +\item{\code{check.condition}:}{logical, defaulting to \code{FALSE}.  
If
> +  \code{TRUE}, an error is produced whenever the condition to an
> +  \code{if} or a \code{while} control statement is of length greater
> +  than one.  If \code{FALSE}, a \link{warning} is produced.}
> +
> \item{\code{CBoundsCheck}:}{logical, controlling whether
> \code{\link{.C}} and \code{\link{.Fortran}} make copies to check for
> array over-runs on the atomic vector arguments.
> @@ -445,6 +450,7 @@
> \tabular{ll}{
> \code{add.smooth} \tab \code{TRUE}\cr
> \code{check.bounds} \tab \code{FALSE}\cr
> +\code{check.condition} \tab \code{FALSE}\cr
> \code{continue} \tab \code{"+ "}\cr
> \code{digits} \tab \code{7}\cr
> \code{echo} \tab \code{TRUE}\cr
> Index: src/library/utils/R/completion.R
> ===
> --- src/library/utils/R/completion.R (revision 72298)
> +++ src/library/utils/R/completion.R (working copy)
> @@ -1304,8 +1304,8 @@
> "plt", "ps", "pty", "smo", "srt", "tck", "tcl", "usr",
> "xaxp", "xaxs", "xaxt", "xpd", "yaxp", "yaxs", "yaxt")

> -options <- c("add.smooth", "browser", "check.bounds", "continue",
> - "contrasts", "defaultPackages", "demo.ask", "device",
> +options <- c("add.smooth", "browser", "check.bounds", 
"check.condition",
> +"continue", "contrasts", "defaultPackages", "demo.ask", "device",
> "di

Re: [Rd] named arguments in formula and terms

2017-03-13 Thread Martin Maechler
Dear Achim,

> Achim Zeileis 
> on Fri, 10 Mar 2017 15:02:38 +0100 writes:

> Hi, we came across the following unexpected (for us)
> behavior in terms.formula: When determining whether a term
> is duplicated, only the order of the arguments in function
> calls seems to be checked but not their names. Thus the
> terms f(x, a = z) and f(x, b = z) are deemed to be
> duplicated and one of the terms is thus dropped.

R> attr(terms(y ~ f(x, a = z) + f(x, b = z)), "term.labels")
> [1] "f(x, a = z)"

> However, changing the arguments or the order of arguments
> keeps both terms:

R> attr(terms(y ~ f(x, a = z) + f(x, b = zz)), "term.labels")
> [1] "f(x, a = z)" "f(x, b = zz)"
R> attr(terms(y ~ f(x, a = z) + f(b = z, x)), "term.labels")
> [1] "f(x, a = z)" "f(b = z, x)"

> Is this intended behavior or needed for certain terms?

> We came across this problem when setting up certain smooth
> regressors with different kinds of patterns. As a trivial
> simplified example we can generate the same kind of
> problem with rep(). Consider the two dummy variables rep(x
> = 0:1, each = 4) and rep(x = 0:1, times = 4). With the
> response y = 1:8 I get:

R> lm((1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

> Call:
> lm(formula = (1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

> Coefficients:
>            (Intercept)  rep(x = 0:1, each = 4)  
>                    2.5                     4.0  

> So while the model is identified because the two
> regressors are not the same, terms.fomula does not
> recognize this and drops the second regressor.  What I
> would have wanted can be obtained by switching the
> arguments:

R> lm((1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

> Call:
> lm(formula = (1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

> Coefficients:
>             (Intercept)  rep(each = 4, x = 0:1)  rep(x = 0:1, times = 4)  
>                       2                        4                        1  

> Of course, here I could avoid the problem by setting up
> proper factors etc. But to me this looks a potential bug
> in terms.formula...

I agree that there is a bug.
According to https://www.r-project.org/bugs.html
I have generated an R bugzilla account for you so you can report
it there (for "book keeping", posteriority, etc).

> Thanks in advance for any insights, Z

and thank *you* (and Nikolaus ?) for the report!

Best regards,
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Control statements with condition with greater than one should give error (not just warning) [PATCH]

2017-03-06 Thread Martin Maechler
>>>>> Michael Lawrence <lawrence.mich...@gene.com>
>>>>> on Sat, 4 Mar 2017 12:20:45 -0800 writes:

> Is there really a need for these complications? Packages
> emitting this warning are broken by definition and should be fixed. 

I agree and probably Henrik, too.

(Others may disagree to some extent .. and find it convenient
 that R does translate 'if(x)'  to  'if(x[1])'  for them albeit
 with a warning .. )

> Perhaps we could "flip the switch" in a test
> environment and see how much havoc is wreaked and whether
> authors are sufficiently responsive?

> Michael

As we have > 10'000 packages on CRAN alone, and people have
started (mis)using suppressWarnings(.) in many places, there
may be considerably more packages affected than we optimistically assume...

As R core member who would  "flip the switch"  I'd typically then
have to be the one sending an e-mail to all package maintainers
affected and in this case I'm very reluctant to volunteer
for that and so, I'd prefer the environment variable where R
core and others can decide how to use it .. for a while .. until
the flip is switched for all.

or have I overlooked an issue?

Martin

> On Sat, Mar 4, 2017 at 12:04 PM, Martin Maechler
> <maech...@stat.math.ethz.ch
>> wrote:

>> >>>>> Henrik Bengtsson <henrik.bengts...@gmail.com> >>>>>
>> on Fri, 3 Mar 2017 10:10:53 -0800 writes:
>> 
>> > On Fri, Mar 3, 2017 at 9:55 AM, Hadley Wickham >
>> <h.wick...@gmail.com> wrote: >>> But, how you propose a
>> warning-to-error transition >>> should be made without
>> wreaking havoc?  Just flip the >>> switch in R-devel and
>> see CRAN and Bioconductor packages >>> break overnight?
>> Particularly Bioconductor devel might >>> become
>> non-functional (since at times it requires >>> R-devel).
>> For my own code / packages, I would be able >>> to handle
>> such a change, but I'm completely out of >>> control if
>> one of the package I'm depending on does not >>> provide
>> a quick fix (with the only option to remove >>> package
>> tests for those dependencies).
>> >>
>> >> Generally, a package can not be on CRAN if it has any
>> >> warnings, so I don't think this change would have any
>> >> impact on CRAN packages.  Isn't this also true for >>
>> bioconductor?
>> 
>> > Having a tests/warn.R file with:
>> 
>> > warning("boom")
>> 
>> > passes through R CMD check --as-cran unnoticed.
>> 
>> Yes, indeed.. you are right Henrik that many/most R
>> warning()s would not produce R CMD check 'WARNING's ..
>> 
>> I think Hadley and I fell into the same mental pit of
>> concluding that such warning()s from
>> if() ...  would not currently happen
>> in CRAN / Bioc packages and hence turning them to errors
>> would not have a direct effect.
>> 
>> With your 2nd e-mail of saying that you'd propose such an
>> option only for a few releases of R you've indeed
>> clarified your intent to me.  OTOH, I would prefer using
>> an environment variable (as you've proposed as an
>> alternative) which is turned "active" at the beginning
>> only manually or for the "CRAN incoming" checks of the
>> CRAN team (and bioconductor submission checks?)  and
>> later for '--as-cran' etc until it eventually becomes the
>> unconditional behavior of R (and the env.variable is no
>> longer used).
>> 
>> Martin
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 

>   [[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Please add me to bugzilla

2017-03-06 Thread Martin Maechler
>>>>> Bradley Broom <bmbr...@gmail.com>
>>>>> on Mon, 6 Mar 2017 06:55:35 -0600 writes:

> Apologies, I thought I was following exactly that sentence
> and trying to make a minimal post that would waste as
> little developer bandwidth as possible given the lack of a
> better system.

I understand.   My apologies now, as I was mistrusting, clearly
wrongly in this case.

> Anyway, I have been using R for like forever (20 years).

> In my current project, I have run into problems with stack
> overflows in R's dendrogram code when trying to use either
> str() or as.hclust() on very deep dendrograms.

I understand.  Indeed, bug PR#16424 
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16424
encountered the same problem in other dendrogram functions and
solved it by re-programming the relevant parts non-recursively,
too.

   [.]  

> What should happen: Function completes without a stack
> overflow.

> 2nd bug: hh <- as.hclust(de)

> What happens: Error: C stack usage 7971248 is too close to the limit

> What should happen: Function completes without a stack
> overflow.

> A knowledgeable user might be able to increase R's limits
> to avoid these errors on this particular dendrogram, but
> a) my users aren't that knowledgeable about R and this is
> expected to be a common problem, and b) there will be
> bigger dendrograms (up to at least 25000 leaves).

Agreed.  The current help page warns about the problem and
gives advice (related to increasing the stack), but what you propose
is better, i.e., re-implementing the relevant parts non-recursively.
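
For testing such fixes, a pathologically deep "caterpillar" dendrogram can
be built directly (a hedged sketch; the leaf/node attributes follow
?dendrogram):

leaf <- function(i) structure(i, members = 1L, height = 0,
                              label = as.character(i), leaf = TRUE)
de <- leaf(1L)
for (i in 2:20000)  # one extra nesting level per iteration
    de <- structure(list(de, leaf(i)), members = i, midpoint = 0.5,
                    height = i, class = "dendrogram")
## str(de)        # recursive implementation: C stack overflow
## as.hclust(de)  # ditto; with the patch both should complete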

> Please see attached patch for non-recursive
> implementations.

Very well done, thank you a lot!
[and I will add you to bugzilla .. so you can use it for the
 next bug .. ;-)]

Best,
Martin

> Regards, Bradley



> On Mon, Mar 6, 2017 at 3:50 AM, Martin Maechler
> <maech...@stat.math.ethz.ch> wrote:

>> >>>>> Bradley Broom <bmbr...@gmail.com> >>>>> on Sun, 5
>> Mar 2017 16:03:30 -0600 writes:
>> 
>> > Please add me to R bugzilla.  Thanks, Bradley
>> 
>> Well, I will not do it just like that (meaning "after such a
>> minimal message").
>> 
>> I don't see any evidence as to your credentials,
>> knowledge of R, etc, as part of this request.  We are all
>> professionals, devoting part of our (work and free) time
>> to the R project (rather than employees of the company
>> you paid to serve you ...)
>> 
>> It may be that you have read
>> https://www.r-project.org/bugs.html
>> 
>> Notably this part
>> 
--> NOTE: due to abuse by spammers, since 2016-07-09 only
--> users who have
>> previously submitted bugs can submit new ones on R’s
>> Bugzilla. We’re working on a better system… In the mean
>> time, post (e-mail) to R-devel or ask an R Core member to
>> add you manually to R’s Bugzilla members.
>> 
>> The last sentence was *meant* to say you should post
>> (possibly parts, ideally a minimal reproducible example
>> of) your bug report to R-devel so others could comment on
>> it, agree or disagree with your assessment etc, __or__
>> ask an R-core member to add you to bugzilla (if you
>> really read the other parts of the 'R bugs' web page
>> above).
>> 
>> Posting to all 1000 R-devel readers with no content about
>> what you consider a bug is a waste of bandwidth for at
>> least 99% of these readers.
>> 
>> [Yes, I'm also using their time ... in the hope to
>> *improve* the quality of future such postings].
>> 
>> Martin Maechler ETH Zurich
>> 
> [attachment: dendro-non-recursive.patch, text/x-patch]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Control statements with condition with greater than one should give error (not just warning) [PATCH]

2017-03-04 Thread Martin Maechler
> Henrik Bengtsson 
> on Fri, 3 Mar 2017 10:10:53 -0800 writes:

> On Fri, Mar 3, 2017 at 9:55 AM, Hadley Wickham
>  wrote:
>>> But, how you propose a warning-to-error transition
>>> should be made without wreaking havoc?  Just flip the
>>> switch in R-devel and see CRAN and Bioconductor packages
>>> break overnight?  Particularly Bioconductor devel might
>>> become non-functional (since at times it requires
>>> R-devel).  For my own code / packages, I would be able
>>> to handle such a change, but I'm completely out of
>>> control if one of the package I'm depending on does not
>>> provide a quick fix (with the only option to remove
>>> package tests for those dependencies).
>> 
>> Generally, a package can not be on CRAN if it has any
>> warnings, so I don't think this change would have any
>> impact on CRAN packages.  Isn't this also true for
>> bioconductor?

> Having a tests/warn.R file with:

> warning("boom")

> passes through R CMD check --as-cran unnoticed.  

Yes, indeed.. you are right, Henrik, that many/most R warning()s would
not produce R CMD check 'WARNING's ..

I think Hadley and I fell into the same mental pit of concluding
that such warning()s  from   if()  ...
would not currently happen in CRAN / Bioc packages and hence
turning them to errors would not have a direct effect.

With your 2nd e-mail of saying that you'd propose such an option
only for a few releases of R you've indeed clarified your intent
to me.
OTOH, I would prefer using an environment variable (as you've
proposed as an alternative)  which is turned "active"  at the
beginning only manually or  for the  "CRAN incoming" checks of
the CRAN team (and bioconductor submission checks?)
and later for  '--as-cran'  etc until it eventually becomes the
unconditional behavior of R (and the env.variable is no longer used).
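
In the meantime, anyone wanting the stricter behavior in their own session
can approximate it with the existing 'warn' option (a sketch; not the
proposed mechanism itself):

op <- options(warn = 2)  # promote all warnings to errors
x <- 1:2
try(if (x == 1) message("x == 1"))
## Error ... (converted from warning) the condition has length > 1 ...
options(op)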

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Please add me to bugzilla

2017-03-06 Thread Martin Maechler
>>>>> Bradley Broom <bmbr...@gmail.com>
>>>>> on Sun, 5 Mar 2017 16:03:30 -0600 writes:

> Please add me to R bugzilla.  Thanks, Bradley

Well, I will not do it just like that (meaning "after such a
minimal message").

I don't see any evidence as to your credentials, knowledge of R,
etc, as part of this request.  We are all professionals,
devoting part of our (work and free) time to the R project
(rather than employees of the company you paid to serve you ...)

It may be that you have read   https://www.r-project.org/bugs.html

Notably this part

--> NOTE: due to abuse by spammers, since 2016-07-09 only users who have 
previously submitted bugs can submit new ones on R’s Bugzilla. We’re working on 
a better system… In the mean time, post (e-mail) to R-devel or ask an R Core 
member to add you manually to R’s Bugzilla members.

The last sentence was *meant* to say you should post (possibly
parts, ideally a minimal reproducible example of) your bug
report to R-devel so others could comment on it, agree or
disagree with your assessment etc,
__or__ ask an R-core member to add you to bugzilla (if you really read the
other parts of the 'R bugs' web page above).

Posting to all 1000 R-devel readers with no content about what
you consider a bug  is a waste of bandwidth for at least 99% of
these readers.

[Yes, I'm also using their time ... in the hope to *improve* the
 quality of future such postings].

Martin Maechler
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] stats::median

2017-03-01 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Mon, 27 Feb 2017 10:42:19 +0100 writes:

>>>>> Rob J Hyndman <rob.hynd...@monash.edu>
>>>>> on Wed, 15 Feb 2017 21:48:56 +1100 writes:

>> The generic stats::median method is defined as median <-
>> function (x, na.rm = FALSE) {UseMethod("median")}

>> I suggest that this should become median <- function (x,
>> na.rm = FALSE, ...)  {UseMethod("median")}

>> This would allow additional S3 methods to be developed
>> with additional arguments.

> and S4 methods, too.

>> Currently I have to over-ride this generic definition in
>> the demography package because median.demogdata has
>> several other arguments.

>> This shouldn't break any code, and will make it easier
>> for new S3 methods to be developed. It is also consistent
>> with almost all other S3 methods which do include an
>> ellipsis.

> "shouldn't break any code" is almost always quite
> optimistic nowadays,

For CRAN, the change leads 13 packages (out of > 10'000) to
"regress" to status WARN.

I've checked 10 of them, and all these define  median() S3
methods, and currently of course have not had the '...' in their
formal argument list(s).

They (and all other useRs who define median() S3 methods and
want their code to work both in R <= 3.3.x _and_ R >= 3.4.0)
could use code such as
(for package 'sets' in R/summary.R )

 
median.set <- function(x, na.rm = FALSE, ...)
{
    median(as.numeric(x), na.rm = na.rm, ...)
}

## drop '...' in R versions <= 3.3.x :
if(!any("..." == names(formals(median)))) {
    formals(median.set) <- formals(median.set)[names(formals(median.set)) != "..."]
    body(median.set)[[2]] <- body(median.set)[[2]][-4]
}

or simply
 
median.cset <-
    if("..." %in% names(formals(median))) {
        function(x, na.rm = FALSE, ...) median.gset(x, na.rm = na.rm, ...)
    } else
        function(x, na.rm = FALSE) median.gset(x, na.rm = na.rm)


which is R code that will work fine in both current (and older)
R and in R-devel and future R versions.

For packages, however, this will leave an 'R CMD check' warning
(for now), because code and documentation mismatch either in
R-devel (and future R) or in current and previous R versions.

It is less clear what to do for the man, i.e. *.Rd, pages [if you
have them for your median method(s): note that they *are* optional for
registered S3 methods; package 'sets', e.g., documents 2 out of its
4 median methods].

It may (or may not) make sense to tweak R-devel's own 'R CMD check'
to _not_ warn for the missing '...' in median methods for a
while and consequently you'd get away with continued use of no
'...' in the help page \usage{ ... } section.

One solution, of course, would be to wait a bit and then release
such a package only with

Depends: R (>= 3.4.0)

where you'd  use  '...' and keep the previous CRAN version of
the package for all earlier versions of R.
That is a maintenance pain, however, if you want to change your
package features, because then you'd have to start releasing two
versions of the package: an "old" one with

Depends: R (< 3.4.0)

and a "new" one with   R (>= 3.4.0).

Probably easiest would be to comment the \usage{.} / \arguments \item{...}
parts for the time being {as long as you don't want R (>= 3.4.0)
in your package DESCRIPTION "unconditionally"}.

--

Tweaking R-devel's tools::codoc() for this special case may
be a solution that package maintainers would like better.
OTOH, we can only change R-devel's version of codoc(), so it
would be that platform which would show slightly inaccurate
"Usage:" for these (by not showing "...")  which also seems a
kludgy solution.



> Actually it probably will break things when people start
> using the new R version which implements the above *AND*
> use packages installed with a previous version of R.  I
> agree that this does not count as "breaking any code".

> In spite of all that *and* the perennial drawback that a
> '...' will allow argument name typos to go unnoticed

> I agree you have a good argument nowadays, that median()
> should be the same as many similar "basic statistics" R
> functions and so I'll commit such a change to R-devel (to
> become R 3.4.0 in April).

> Thank you for the suggestion!  Martin Maechler, ETH Zurich

>> -
>> Rob J Hyndman Professor of Statistics, Monash University
>> www.robjhyndman.com

>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug in nlm()

2017-03-08 Thread Martin Maechler
   {This was sent to me, MM, only, but for completeness should
have gone back to R-devel.

Further: I now *have* added Marie B. to the members of "R bugzilla"
-- M.Maechler}


I had already read the R bug reporting guide and I'm sure it is a bug.
The bug occurs when the user provides not only the analytic gradient but also 
the analytic Hessian of the objective function. In that case, the algorithm 
does not converge due to an erroneous implementation of the modified Cholesky 
decomposition of the Hessian matrix. It is actually a bug in the C-code called 
by nlm(), therefore it is hard to show that the non-convergence of the 
algorithm is really due to this bug with only a MRE.
However, a short example (optimizing the Rosenbrock banana valley function with 
and without analytic Hessian) is:

## Rosenbrock "banana valley" function, with analytic gradient only:
fg <- function(x){
  gr <- function(x1, x2) c(-400*x1*(x2 - x1*x1) - 2*(1-x1), 200*(x2 - x1*x1))
  x1 <- x[1]; x2 <- x[2]
  res <- 100*(x2 - x1*x1)^2 + (1-x1)^2
  attr(res, "gradient") <- gr(x1, x2)
  return(res)
}
nlm.fg <- nlm(fg, c(-1.2, 1))

## the same function, now also supplying the analytic Hessian:
fgh <- function(x){
  gr <- function(x1, x2) c(-400*x1*(x2 - x1*x1) - 2*(1-x1), 200*(x2 - x1*x1))
  h  <- function(x1, x2){
    a11 <- 2 - 400*x2 + 1200*x1*x1
    a21 <- -400*x1
    matrix(c(a11, a21, a21, 200), 2, 2)
  }
  x1 <- x[1]; x2 <- x[2]
  res <- 100*(x2 - x1*x1)^2 + (1-x1)^2
  attr(res, "gradient") <- gr(x1, x2)
  attr(res, "hessian")  <- h(x1, x2)
  return(res)
}
nlm.fgh <- nlm(fgh, c(-1.2, 1))
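
Comparing the two fits then illustrates the report (hedged: exact numbers
depend on the R version and platform):

nlm.fg$estimate    # close to the true minimum c(1, 1)
nlm.fgh$estimate   # per the report, not converged when the Hessian is supplied
c(nlm.fg$code, nlm.fgh$code)   # nlm() convergence codes, see ?nlm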

I have almost finished a more detailed bug report, which I would like to submit.

Best,
Marie Boehnstedt

>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Fri, 3 Mar 2017 18:15:47 +0100 writes:

>>>>> Boehnstedt, Marie <boehnst...@demogr.mpg.de>
>>>>> on Fri, 3 Mar 2017 10:23:12 + writes:

>> Dear all, I have found a bug in nlm() and would like to
>> submit a report on this.  Since nlm() is in the
>> stats-package, which is maintained by the R Core team,
>> bug reports should be submitted to R's Bugzilla. However,
>> I'm not a member of Bugzilla. Could anyone be so kind to
>> add me to R's Bugzilla members or let me know to whom I
>> should send the bug report?

> Dear Marie,

> I can do this ... but are you really sure?  There is
> https://www.r-project.org/bugs.html which you should spend
> some time reading if you haven't already.

> I think you would post a MRE (Minimal Reproducible
> Example) here {or on stackoverflow or ...} if you'd follow
> what the 'R bugs' web page (above) recommends and only
> report a bug after some feedback from "the public".

> Of course, I could be wrong.. and happy if you explain /
> tell me why.

> Best, Martin Maechler

>> Thank you in advance.

>> Kind regards, Marie Böhnstedt


>> Marie Böhnstedt, MSc Research Scientist Max Planck
>> Institute for Demographic Research Konrad-Zuse-Str. 1,
>> 18057 Rostock, Germany
>> www.demogr.mpg.de<http://www.demogr.mpg.de/>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] translateChar in NewName in bind.c

2017-07-31 Thread Martin Maechler
> Suharto Anggono Suharto Anggono via R-devel 
> on Sun, 30 Jul 2017 14:57:53 + writes:

> R devel's bind.c has been ported to R patched. Is it OK while names of 
'unlist' or 'c' result may be not strictly the same as in R 3.4.1 because of 
changed function 'NewName' in bind.c?

> Using 'translateCharUTF8' instead of 'translateChar' is as it should be. 
It has an effect in non-UTF-8 locale for this example.

> x <- list(1:2)
> names(x) <- "\ue7"
> res <- unlist(x)
> charToRaw(names(res)[1])

> Directly assigning 'tag' to 'ans' is more efficient, but
> may be different from in R 3.4.1 that involves
> 'translateCharUTF8', that is also correct. It has an
> effect for this example. 

> x <- 0
> names(x) <- "\xe7"
> Encoding(names(x)) <- "latin1"
> res <- c(x)
> Encoding(names(res))
> charToRaw(names(res))

Yes, you are right, thank you:

That part of the changes in bind.c was *not* directly related to
the two R-bugs (PR#17284 & PR#17292)... and therefore, maybe I
should not have ported it to R-patched (= R 3.4.1 patched).

Your examples above are instructive..  notably the 2nd one seems
to demonstrate to me, that the change also *did* fix a bug:

   Encoding(names(res))

is "latin1" in R-devel  but interestingly is "UTF-8" in R 3.4.1,
indeed independently of the locale.

I would argue R-devel (and current R-patched) is more faithful
by keeping the Encoding "latin1" that was set for names(x) also
in the  names(c(x)) .
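
Running the second example side by side makes the difference concrete
(version-dependent by construction):

x <- 0
names(x) <- "\xe7"
Encoding(names(x)) <- "latin1"
res <- c(x)
Encoding(names(res))  # "latin1" in R-devel / current R-patched; "UTF-8" in R 3.4.1
charToRaw(names(res)) # e7 (latin1) vs. c3 a7 (UTF-8), respectively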

I could revert R-patched's bind.c (so it only contains the two
official bug fixes, PR#17284 and PR#17292), but I wonder if that is
desirable in this case.

I'd be glad to hear further reasoning.
Given current "knowledge"/"evidence",  I would not  revert
R-patched to R 3.4.1's behavior.

Martin

> 
> On Tue, 13/6/17, Tomas Kalibera  wrote:

> Subject: Re: [Rd] translateChar in NewName in bind.c

> @r-project.org
> Date: Tuesday, 13 June, 2017, 2:35 PM

> Thanks, fixed in R-devel.
> Best
> Tomas

[.]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Problem compiling R patched and R devel on Ubuntu

2017-08-03 Thread Martin Maechler
> Berwin A Turlach 
> on Thu, 3 Aug 2017 15:27:56 +0800 writes:

> G'day all,
> for about a week now, my daily re-compilations of R patched and R devel
> have been falling over, i.e. they stop with an error during "make
> check" (while building the 32-bit architecture) on my Ubuntu 16.04.3
> LTS machine.

Dear Berwin,

thanks a lot for the report!

>  Specifically, a test in graphics-Ex.R seems to fail and
> the last lines of graphics-ex.Rout.fail are:

>> ## Extreme outliers; the "FD" rule would take very large number of
> 'breaks': 
>> XXL <- c(1:9, c(-1,1)*1e300)
>> hh <- hist(XXL, "FD") # did not work in R <= 3.4.1; now gives
> warning 
> Warning in hist.default(XXL, "FD") :
> 'breaks = 4.44796e+299' is too large and set to 1e9
> Error in pretty.default(range(x), n = breaks, min.n = 1) : 
> cannot allocate vector of length 11
> Calls: hist -> hist.default -> pretty -> pretty.default
> Execution halted

> My R 3.4.1 installation, the last R patched version that I could
> compile (R version 3.4.1 Patched (2017-07-26 r72974)) and the last R
> devel version that I could compile (R Under development (unstable)
> (2017-07-26 r72974)) 

  ((well, well ... you could also compile later versions.  It was
"only"  'make check' that failed ..))

> give the following results (under the 32bit architecture
> and the 64bit architecture): 

>> XXL <- c(1:9, c(-1,1)*1e300)
>> hh <- hist(XXL, "FD")
> Error in pretty.default(range(x), n = breaks, min.n = 1) : 
> invalid 'n' argument
> In addition: Warning message:
> In pretty.default(range(x), n = breaks, min.n = 1) :
> NAs introduced by coercion to integer range

  [yes, that was the bug; see below]
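
  [for concreteness, that failing coercion is easy to reproduce
   directly at the R level:

     as.integer(4.44796e+299)
     ## [1] NA
     ## Warning message: NAs introduced by coercion to integer range

   i.e., the requested 'breaks' was far too large to represent as an
   integer]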

> Not sure if this is a general problem, or only a problem on my machine.

It is not a problem on 64-bit, I think.  This is related to the
bug and bug fix for PR#17274
   ( https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17274 ),
which I had handled.

Now that I see the above, I can well imagine that I had made
assumptions that only worked on my (64-bit) platform.
I'll have a look and will amend the bug fix to also work on
"smaller" platforms - hopefully today.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] rnorm is not truly random used in the lm function

2017-08-03 Thread Martin Maechler
> Victor Tian 
> on Thu, 3 Aug 2017 09:49:57 -0400 writes:

> To whom it may concern,
> I happened to run the following R code just to check the layout of the
> output, but found that the code doesn't work the way I thought it should
> work.

yes, your expectations were wrong.

>> lm(rnorm(100) ~ rnorm(100))

> Call:
> lm(formula = rnorm(100) ~ rnorm(100))

> Coefficients:
> (Intercept)
> -0.07966

> Warning messages:
> 1: In model.matrix.default(mt, mf, contrasts) :
> the response appeared on the right-hand side and was dropped
> 2: In model.matrix.default(mt, mf, contrasts) :
> problem with term 1 in model.matrix: no columns are assigned


> It appears that rnorm(100) produces the same array of numbers on
> both sides of the ~ sign.

Indeed.  And all this has nothing to do with lm() but rather with
how formulas in R have been treated, probably "forever".
[I assume not only in R, but rather since the time formulas
 were introduced into the S language (for "S version 3") a few
 years before R was born.  But I can no longer verify or disprove
 this assumption.]

Even more revealing may be this:

> f <- rnorm(9) ~ rnorm(9)
> str(f)
Class 'formula'  language rnorm(9) ~ rnorm(9)
  ..- attr(*, ".Environment")= 
> (mm <- model.matrix(f))
  (Intercept)
1   1
2   1
3   1
4   1
5   1
6   1
7   1
8   1
9   1
attr(,"assign")
[1] 0
Warning messages:
1: In model.matrix.default(f) :
  the response appeared on the right-hand side and was dropped
2: In model.matrix.default(f) :
  problem with term 1 in model.matrix: no columns are assigned
> 
-

BTW: one of the goals of formulas, notably in R since they got an
environment attached, is a clean way to deal with non-standard
evaluation (=: NSE).
[ Some of us would claim it is the only clean way to deal with NSE in R,
  and that all new functionality using NSE should use formulas;
  recently, however, tidyverse scholars have claimed to be able to deal
  with it cleanly without formulas, via "tidy evaluation". ]

Using random expressions in a formula is therefore typically not
a good idea, because you don't really know when the terms in the
formula will be evaluated.
For lm() and all other good formula-based statistical modeling
functions, the evaluation happens via model.matrix().
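
A minimal sketch of the safer idiom: draw the random numbers *once*,
store them in a data frame, and keep the formula purely symbolic:

  set.seed(1)                # for reproducibility
  d <- data.frame(y = rnorm(100), x = rnorm(100))
  fit <- lm(y ~ x, data = d) # 'y' and 'x' are now distinct, fixed vectors
  coef(fit)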

As you've noticed from that warning, model.matrix() tries to
help the user by checking terms and eliminating those that
appear on both sides of the '~'.
This has been documented on the help page [ ?model.matrix ] for
(almost exactly 14) years, the "Details:" section ending with

   > By convention, if the response variable also appears on the
   > right-hand side of the formula it is dropped (with a warning),
   > although interactions involving the term are retained.
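
A small illustration of the last clause (a sketch; the column names
are what I'd expect, output abbreviated):

  y <- rnorm(9) ; x <- rnorm(9)
  colnames(model.matrix(y ~ y*x))
  ## "(Intercept)" "x" "y:x" : the main effect 'y' is dropped (with a
  ## warning), while the interaction 'y:x' is retained.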


I hope this explains the issue.
And yes:  Do *not* use rnorm() in formulas.

Martin

--
Martin Mächler 
Seminar für Statistik, ETH Zürich //  R Core Team

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Issues of R_pretty in src/appl/pretty.c

2017-08-15 Thread Martin Maechler
>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>> on Mon, 14 Aug 2017 11:46:07 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>>>>> on Fri, 11 Aug 2017 17:11:06 +0000 writes:

>> See https://stat.ethz.ch/pipermail/r-devel/2017-August/074746.html
>> for the origin of the example here.

>> That
>> pretty(c(-1,1)*1e300, n = 1e9, min.n = 1) gave 20 intervals, far from 1e9, but
>> pretty(c(-1,1)*1e300, n = 1e6, min.n = 1) gave 100 intervals
>> (on a machine), made me trace through the code to function 'R_pretty'
>> in https://svn.r-project.org/R/trunk/src/appl/pretty.c .

> thank you.

>> *lo is -1e300, *up is 1e300.
>> cell = fmax2(fabs(*lo),fabs(*up));
>> 'cell' is 1e300.
>> i_small = dx < cell * U * imax2(1,*ndiv) * DBL_EPSILON *3;
>> When *ndiv is (int) 1e9, apparently cell * U * imax2(1,*ndiv) overflows
>> to infinity and 'i_small' is 1 (true).  It doesn't happen when *ndiv
>> is (int) 1e6.

> well spotted!

>> Putting parentheses may avoid the floating point overflow. For example,
>> i_small = dx < cell * (U * imax2(1,*ndiv) * DBL_EPSILON) *3;

> yes... but only if the compiler optimization steps "keep the parentheses".
> AFAIK, there is no guarantee for that.
> To make sure, I'd replace the above by

> U *= imax2(1,*ndiv) * DBL_EPSILON;
> i_small = dx < cell * U * 3;
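
> The overflow itself can be mimicked at the R level (a sketch; the
> constant 1 stands in for the O(1) factor U):

>   cell <- 1e300 ; ndiv <- 1e9
>   cell * 1 * ndiv * .Machine$double.eps    ## Inf : cell*U*ndiv overflows first
>   cell * (1 * ndiv * .Machine$double.eps)  ## ~ 2.22e293, finite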


>> The part
>> U = (1 + (h5 >= 1.5*h+.5)) ? 1/(1+h) : 1.5/(1+h5);
>> is strange.  Because (h5 >= 1.5*h+.5) is 1 or 0, (1 + (h5 >= 1.5*h+.5))
>> is never zero and 1/(1+h) will always be chosen.

> Yes, strange indeed!
> There was a change (not by me!) adding wrong parentheses there
> (or maybe adding what the previously "missing" parens implied,
> but not what they intended!).
> The original code had been
 
> U = 1 + (h5 >= 1.5*h+.5) ? 1/(1+h) : 1.5/(1+h5);

> and "of course" was intended to mean

> U = 1 + ((h5 >= 1.5*h+.5) ? 1/(1+h) : 1.5/(1+h5));

> and this is what I'll change it to, now.
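
> Or, in R notation, to make the intended precedence explicit
> (h and h5 stand for the C variables; placeholder values only):

>   h <- 1.5 ; h5 <- 0.5
>   U <- 1 + (if (h5 >= 1.5*h + 0.5) 1/(1+h) else 1.5/(1+h5))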


>> The comment for 'rounding_eps' says "1e-7 is consistent with
>> seq.default()".  Currently, seq.default() uses 1e-10 as fuzz.

> Hmm, yes, thank you; this was correct when written,
> but seq.default had been changed in the meantime,
> namely in svn r51095 | 2010-02-03.

> Usually we are cautious / reluctant to change such things without
> a visible bug to fix.
> OTOH, we did have bug cases we wanted to amend for seq() / seq.int(),
> and I'll look into updating the "pretty epsilon" to 1e-10 as well.

> Thank you for your analysis and suggestions!

I have now committed what I think has been suggested
above ... to R-devel only:

r73094 | maechler | 2017-08-15 09:10:27 +0200 (Tue, 15 Aug 2017) | 1 line
Changed paths:
   M doc/NEWS.Rd
   M src/appl/pretty.c
   M src/main/engine.c
   M tests/reg-large.R
   M tests/reg-tests-2.Rout.save

pretty(x, n): fix overflow for large n, as suggested by Suharto Anggono,
R-devel, 2017-08-11

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Help to create bugzilla account

2017-08-11 Thread Martin Maechler
> Dmitriy Selivanov 
> on Fri, 11 Aug 2017 17:33:31 +0400 writes:

> Hi mailing list and R-core.  Could someone from R-core please help me
> to create an account in bugzilla?  I would like to submit an issue
> related to gc() to the wishlist.

I will create one.

Your previous e-mails left me pretty clueless about what the
problem is that you want to solve ... but maybe others
understand better what you mean.

Note that for such a relatively sophisticated wish, without a clear
sign of a problem (in my view), the chances that anything will change
are not high, unless someone provides a (small footprint) patch
against the (R-devel aka "trunk") sources *and* reproducible R code
that demonstrates the problem.

Still: thank you for trying to make R better by contributing
careful bug reports!

Best,
Martin


> Related context is here -
> https://stat.ethz.ch/pipermail/r-devel/2017-July/074715.html


> -- 
> Regards
> Dmitriy Selivanov

> [[alternative HTML version deleted]]

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

