Re: [Rd] Unexpected argument-matching when some are missing

2018-12-03 Thread Emil Bode
Thanks as well!
I now get exactly how it's matched, but it still "feels wrong".
Martin's rewording is exactly what I meant/was concerned about.
My intuition would say that anyone using ff(x=, ...) would not intend x to be
matched to something else, but maybe I'm overlooking certain cases.
Anyway, I agree that throwing a warning would probably be the best solution.
I don't exactly know how to test for compatibility for such changes (never 
really worked with CRAN/extensive testing), but if I can do something to help 
I'd be glad to.
And if it turns out to be too disruptive, maybe we can write Patrick Burns (of
the R Inferno) ;-)

Best regards, 
Emil Bode
 
On 03/12/2018, 10:57, "Martin Maechler"  wrote:

>>>>> Michael Lawrence 
>>>>> on Fri, 30 Nov 2018 08:24:31 -0800 writes:

> Argument matching is by name first, then the still missing
> arguments are filled positionally. Unnamed missing
> arguments are thus left missing. Does that help?

Thank you, Michael!
Unfortunately, it may not help sufficiently, notably once this
thread is forgotten, even though I had thought exactly the same.
Of course we two may find R's matching algorithm
entirely intuitive, but e.g. Ista expected R even "to throw an
error" in this case, and about 99% of R users are less savvy than
he is, so let me think out loud a bit further ...
IIUC, Emil's case is mostly about this

  > ff <- function(x,y,z,...) list(sysC=sys.call(), match=match.call())
  > str( ff(x=, z=pi, "foo") )
  List of 2
   $ sysC : language ff(x = , z = pi, "foo")
   $ match: language ff(x = "foo", z = pi)
  > 

where the argument matching rule above would have suggested to him that the
matched call should have become
  ff(y = "foo", z = pi)  rather than
  ff(x = "foo", z = pi)

because he'd expected the empty 'x =' to be matched by name and
hence *not* be matched again later when all the missing
arguments are matched positionally in the end.
NB because of the rule Michael cited above *of course*,
", ," (in your example below) is not equivalent to
"y = ," because the former leads to positional matching at position 2.

Now, R's argument-matching algorithm has therefore been consistent with
the above simple matching rule (which does not cover exact vs. partial
matching, but that was not the topic here anyway); it has been documented
that way forever and is, AFAIK, the same as in S.

What may be possible (and suggested in this thread ?) would be
to start signalling a warning when named empty arguments (the
" y = , "  in the example) are matched(*), i.e., it would give a
warning in match.call() but not sys.call(), and hence utilities
such as  alist()  would continue to work unchanged.
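At the R level the pattern is at least detectable; a rough sketch (my own
helper, not an existing API) of flagging named empty arguments in an
unevaluated call:

  has_named_empty <- function(call) {
    args <- as.list(call)[-1]
    nms <- names(args)
    if (is.null(nms)) return(FALSE)
    ## an empty argument is the empty symbol, i.e. quote(expr = )
    empty <- vapply(args, identical, logical(1), quote(expr = ))
    any(nzchar(nms) & empty)
  }
  > has_named_empty(quote(ff(x = , z = pi, "foo")))
  [1] TRUE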

I have no idea (and no time currently to investigate) if such
warnings would be too disruptive for the current R code base or not.

Martin

    ----
    *) "matched" in that case effectively means "dropped" as we have
seen in the examples.


> On Fri, Nov 30, 2018 at 8:18 AM Emil Bode  wrote:
>> 
>> But the main point is where arguments are mixed together:
>> 
>> > debugonce(plot.default)
>> > plot(x=1:10, y=, 'l')
>> ...
>> Browse[2]> missing(y)
>> [1] FALSE
>> Browse[2]> y
>> [1] "l"
>> Browse[2]> type
    >> [1] "p"
>> 
>> I think that's what I fall over mostly: that named, empty arguments behave entirely differently from omitting them (", ,")
>> 
>> And I definitely agree we need a guru to explain it all to us.
>> 
>> Cheers, Emil Bode
>> 
>> 
>> On 30/11/2018, 15:35, "S Ellison"  wrote:
>> 
>> > Yes, I think all of that is correct. But y _is_ missing in this sense:
>> > > plot(1:10, y=)
>> > > ...
>> > Browse[2]> missing(y)
>> 
>> Although I said what I meant by 'missing' vs 'not present', it wasn't exactly what missing() means. My bad.
>> missing() returns TRUE if an argument is not specified in the call _whether or not_ it has a default, hence the behaviour of missing(y) in debug(plot).
>> 
>> But we can easily find out whether a default has been assigned:
>> plot(1:10, y=, type=)
>> Browse[2]> y
>> NULL
>> Browse[2]> type
>> "p"
>> 
>> ... which is consistent with silent omission of 'y=' and 'type='
>> 
>> 
>> Still waiting for a guru...
>> 
>> Steve E
 



Re: [Rd] Unexpected argument-matching when some are missing

2018-11-30 Thread Emil Bode
But the main point is where arguments are mixed together:

> debugonce(plot.default)
> plot(x=1:10, y=, 'l')
...
Browse[2]> missing(y)
[1] FALSE
Browse[2]> y
[1] "l"
Browse[2]> type
[1] "p"

I think that's what I fall over mostly: that named, empty arguments behave
entirely differently from omitting them (", ,")

And I definitely agree we need a guru to explain it all to us.

Cheers, Emil Bode


On 30/11/2018, 15:35, "S Ellison"  wrote:

> Yes, I think all of that is correct. But y _is_ missing in this sense:
> > plot(1:10, y=)
> > ...
> Browse[2]> missing(y)

Although I said what I meant by 'missing' vs 'not present', it wasn't 
exactly what missing() means. My bad.
missing() returns TRUE if an argument is not specified in the call _whether 
or not_ it has a default, hence the behaviour of missing(y) in debug(plot).

But we can easily find out whether a default has been assigned:
plot(1:10, y=, type=)
Browse[2]> y
NULL
Browse[2]> type
"p"

... which is consistent with silent omission of 'y=' and 'type=' 


Still waiting for a guru...

Steve E





Re: [Rd] Unexpected argument-matching when some are missing

2018-11-30 Thread Emil Bode
It looks like you're right that somewhere in (presumably) match.call, the
named, empty arguments are removed, such that the call plot(x=1:10, y=, 10:1)
is translated to plot(x=1:10, 10:1).
But I would have expected it to be the same as plot(x=1:10, , 10:1) (note the
", ,"), which gives an error (10:1 is not a valid plot type). In that case you
get an error straight away; I find this more interesting:
> options(warn=-1)
> plot(x=1, y=, 'p', ylim=c(0,10))
> plot(x=1, , 'p', ylim=c(0,10))
Both are valid (no errors), albeit strange calls, but I'd say the first call
is better code; it's clearer that you intend not to give any value for y. But
exactly this one gives unexpected results: it tries to plot at position
(1, 'p'), or (1, NA).

And the behaviour as it is gives rise to some strange inconsistencies. I have
gathered some examples below (at the very bottom of the thread, as it got quite
extensive), where some variations are surprisingly different from each other.
There are also some issues when using data.frame(...)[i=, j=,...], but at least
there you are warned about naming i and j.
But basically, it means any function where arguments like fun(,,) are a valid
possibility should throw the same warning, e.g. any R-code replacement of
[.matrix or [.array, or, as in my examples, data.table (and related
structures).

On 29/11/2018, 19:10, "S Ellison"  wrote:


> > plot(x=1:10, y=)
> > plot(x=1:10, y=, 10:1)
> >
> > In both cases, 'y=' is ignored. In the first, the plot is for y=NULL (so not 'missing' y)
> > In the second case, 10:1 is positionally matched to y despite the intervening 'missing' 'y='
> >
> > So it isn't just 'missing'; it's 'not there at all'
> 
> What exactly is the difference between "missing" and "not there at all"?

A "missing argument" in R means that an argument with no default value was 
omitted from the call, and that is what I meant by "missing".
But that is not what is happening here. I was talking about "y=" apparently 
being treated as not present in the call, rather than the argument y being 
treated as a missing argument.  

In these examples, plot.default has a default value for y (NULL) so y can 
never be "missing" in the sense of the 'missing argument' error (compare what 
happens with plot(y=1:10), which reports x as 'missing'). 
In the first example, y was (from the plot behaviour) taken as NULL - the 
default - so was not considered a missing argument. In the second, it was taken 
as 10:1 - again, non-missing, despite 10:1 being in the normal position for the 
(character) argument "type".
But neither call did anything at all with "y=". Instead, the behaviour is 
consistent with what would have happened if 'y=' were "not present at all" when 
counting position or named argument list, rather than if 'y' were an absent 
required argument. 
It _looks_ as if the initial call parsing silently ignored the malformed 
expression "y=" before any argument matching - positional or by name - takes 
place.

But I'm thinking that it'll take an R-core guru to explain what's going on 
here, so I was going to wait and see.

Steve Ellison


Examples of what I (Emil) found odd:
---
> library(data.table)
> options(warn=1) # Or 2
> data.table(a=1:2, b=3:4)[1] # As expected
   a b
1: 1 3
> data.table(a=1:2, b=3:4)[, 1] # As expected
   a
1: 1
2: 2
> data.table(a=1:2, b=3:4)[i=, 1] # Huh? We get the first row
   a b
1: 1 3
> data.table(a=1:2, b=3:4)[, 1, 'a'] # As expected
   a V1
1: 1  1
2: 2  1
> data.table(a=1:2, b=3:4)[i=, 1, 'a'] # I would have expected the same result, and definitely more than 1 value
   a
1: 1
> data.table(a=1:2, b=3:4)[i=, 1, by='a'] # And this doesn't work?
Error in `[.data.table`(data.table(a = 1:2, b = 3:4), i = , 1, by = "a") : 
  'by' or 'keyby' is supplied but not j
> myfun <- function(x,y,z) {
+   print(match.call())
+   cat('nargs: ', nargs(), '\n')
+   cat('x=',if(missing(x)) 'missing' else x, '\n')
+   cat('y=',if(missing(y)) 'missing' else y, '\n')
+   cat('z=',if(missing(z)) 'missing' else z, '\n')
+ }
> myfun(x=, y=, , , "z's value") # 5 arguments??
myfun(z = "z's value")
nargs:  5 
x= missing 
y= missing 
z= z's value 
> myfun(x=, y=, , , "z's value", , ) # But any more are not allowed
Error in myfun(x = , y = , , , "z's value", , ) : 
  unused arguments (alist(, ))
> myfun(x2=, y=, "z's value") # And empty named arguments are dropped, but the names have to be existing argument names
Error in myfun(x2 = , y = , "z's value") : unused argument (alist(x2 = ))
> myfun(x=, x=, , "z's value") # And naming it multiple times also gives an error
Error in myfun(x = , x = , , "z's value") : 
  formal argument "x" matched by multiple actual arguments
> myfun(y=, , "z's value", x=3) # Having fun with obfuscation, is this call 
> 

Re: [Rd] Unexpected argument-matching when some are missing

2018-11-29 Thread Emil Bode
Well, I did mean it as "missing".
To me, it felt just as natural as providing an empty index for subsetting
(e.g. some.data.frame[,,drop=FALSE]).
I can't think of a whole lot of other uses than subsetting, but I think this
issue may be mostly important when you're not entirely sure where a call is
going to end up: when passing along arguments, or when calling an unknown
function (as in variants of the apply family, where you provide a function as
an argument).
Or what happens if I use do.call(FUN, args=MyNamedList)? I have a bit more 
extensive example further down where you can more clearly see the unexpected 
output.

But the problem is that R does NOT treat it as simply "missing". That would 
have been reasonable, but instead, as in the example in my previous mail, 
myfun(x=, y=, "z's value") means x is assigned "z's value", and y and z are 
seen as missing. Which is not at all what I was expecting.

And it is also not consistent with other behaviour, as myfun(,,"z's value") and
myfun(x=, y=, z="z's value") do work as expected (at least as I was expecting).

The extensive example:
Suppose I want to write a function that selects data from some external source. 
In order to do this, we put the data in its own environment, where we look for 
variables called "df", "rows", "cols" and "drop", and use these to make a 
selection. I write this function:

doselect <- function(env) {
  do.call(`[.data.frame`, list(env$df, if(!is.null(env$rows)) env$rows,
    if(!is.null(env$cols)) env$cols, drop=if(!is.null(env$drop)) env$drop))
}

It works for this code:
myenv <- new.env()
assign('df', data.frame(a=1:2, b=3:4), myenv, inherits=FALSE)
assign('rows', 1, myenv, inherits=FALSE) # Code breaks if we don't have this line
assign('cols', 1, myenv, inherits=FALSE) # Code breaks if we don't have this line
assign('drop', FALSE, myenv, inherits=FALSE)
doselect(myenv)

But if we don't assign "rows" and/or "cols", the variable "drop" is inserted in 
the place of the first unnamed variable, so the result is the same as if calling
df[FALSE,,]:
[1] a b
<0 rows> (or 0-length row.names)

What I did expect was the same result as df[,,FALSE], i.e. the full data.frame. 
Of course I can rewrite the function "doselect", but I think my current call is 
how most people would write it (even though I admit the example in its entirety 
is far-fetched)


Best regards, 
Emil Bode
 

On 29/11/2018, 14:58, "Ista Zahn"  wrote:

On Thu, Nov 29, 2018 at 5:09 AM Emil Bode  wrote:
>
> When trying out some variations with `[.data.frame` I noticed some (to 
me) odd behaviour, which I found out has nothing to do with `[.data.frame`, but 
rather with the way arguments are matched, when mixing named/unnamed and 
missing/non-missing arguments. Consider the following example:
>
> myfun <- function(x,y,z) {
>   print(match.call())
>   cat('x=',if(missing(x)) 'missing' else x, '\n')
>   cat('y=',if(missing(y)) 'missing' else y, '\n')
>   cat('z=',if(missing(z)) 'missing' else z, '\n')
> }
> myfun(x=, y=, "z's value")
>
> gives:
>
> # myfun(x = "z's value")
> # x= z's value
> # y= missing
> # z= missing
>
> This seems very counterintuitive to me, I expect the arguments x and y to 
be missing, and z to get “z’s value”.

Interesting. I would expect it to throw an error, since "x=" is not
syntactically complete. What does "x=" mean anyway? It looks like R
interprets it as "x was not set to anything, i.e., is missing". That
seems reasonable, though I think the example itself is pathological
and would prefer that it produced an error.

--Ista
>
> When I call myfun(,y=,"z's value"), x is missing, and y gets “z’s value”.
   > Are my expectations wrong or is this a bug? And if my expectations are 
wrong, where can I find more information on argument-matching?
   > My gut-feeling says to call this a bug, but then I’m surprised no-one else 
has encountered it before.
   >
> And I don’t have multiple installations to work from, so could somebody 
else confirm this (if it’s not my expectations that are wrong) for 
R-devel/other R-versions/other platforms?
>
> My setup: R 3.5.1, MacOS 10.13.6, both Rstudio 1.1.453 and R --vanilla 
from Bash
>
> Best regards,
>
> Emil Bode
>


[Rd] Unexpected argument-matching when some are missing

2018-11-29 Thread Emil Bode
When trying out some variations with `[.data.frame` I noticed some (to me) odd 
behaviour, which I found out has nothing to do with `[.data.frame`, but rather 
with the way arguments are matched, when mixing named/unnamed and 
missing/non-missing arguments. Consider the following example:

 

myfun <- function(x,y,z) {
  print(match.call())
  cat('x=',if(missing(x)) 'missing' else x, '\n')
  cat('y=',if(missing(y)) 'missing' else y, '\n')
  cat('z=',if(missing(z)) 'missing' else z, '\n')
}

myfun(x=, y=, "z's value")

gives:

# myfun(x = "z's value")
# x= z's value
# y= missing
# z= missing

 

This seems very counterintuitive to me, I expect the arguments x and y to be 
missing, and z to get “z’s value”. 

When I call myfun(,y=,"z's value"), x is missing, and y gets “z’s value”.

Are my expectations wrong or is this a bug? And if my expectations are wrong, 
where can I find more information on argument-matching?

My gut-feeling says to call this a bug, but then I’m surprised no-one else has 
encountered it before.

 

And I don’t have multiple installations to work from, so could somebody else 
confirm this (if it’s not my expectations that are wrong) for R-devel/other 
R-versions/other platforms?

My setup: R 3.5.1, MacOS 10.13.6, both Rstudio 1.1.453 and R --vanilla from Bash

 

Best regards, 

Emil Bode 



Re: [Rd] named arguments discouraged in `[.data.frame` and `[<-.data.frame`

2018-11-29 Thread Emil Bode
Well, the situation with `[.data.frame` (and `[<-`) is complicated by the fact
that the data.frame method is not a primitive, but the generic IS.
I'm not sure about dispatch for primitive generics, but I bet it's done on the
first argument (as with S3). Which means `[`(j=1:2,d,i=1) has nothing to do
with `[.data.frame`, as some internal code equivalent to something like
`[.integer` is called (`[.integer` is not an R function, but I guess it's
implemented in the C code for `[`).
And note that `[.data.frame`(j=1:2,d,i=1) does work (throws a warning, but
returns the right result), because then you're simply calling the direct
R function, and matching by name is done.
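Spelled out with Bill's example data from further down the thread (a sketch;
result and warning as described above):

d <- data.frame(C1 = c(r1 = 11, r2 = 21, r3 = 31), C2 = c(12, 22, 32))
`[.data.frame`(j = 1:2, d, i = 1)  # row 1, columns 1:2, plus the
                                   # 'named arguments are discouraged' warning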

But I think the main reason for the warning is forwards compatibility (and
maybe backwards?). As of this version, `[.data.frame`(x = d, j = 2, i = 1)
works fine, and `[.data.frame` is a regular R function. But it's used a lot;
I wouldn't be surprised if some future R version implemented it as a
primitive.
Without the warning, implementing [.data.frame as a primitive would involve a
LOT of issues where older code breaks. With the warning, we can make clear to
any users that calls like this one are undefined. They may work for now, but
one shouldn't rely on it. Which means only the "right" order may be used, and
then naming them is superfluous.

By the way, when trying some things I noticed something else, which I'll send a 
separate mail about...

Cheers,
Emil 

On 29/11/2018, 09:20, "R-devel on behalf of Henrik Pärn"  wrote:

Thanks Bill and Michael for taking the time to share your knowledge! 

As a further background to my question, here are two examples that I forgot
to include in my original post (reminded by Michael's answer). I swapped the i
and j arguments in `[.data.frame` and `[<-.data.frame`. With warnings, but
otherwise without (?) problems. Using Bill's data:

`[.data.frame`(x = d, i = 1, j = 2)
# [1] 12

`[.data.frame`(x = d, j = 2, i = 1)
# [1] 12

And similar for `[<-.data.frame` :
`[<-.data.frame`(x = d, i = 1, j = 2, value = 1122)
`[<-.data.frame`(x = d, j = 2, i = 1, value = 12)

Because this seemed to work, I made the hasty conclusion that argument
switching _wasn't_ a problem for `[.data.frame`, and that we could rely on
exact matching on tags. But apparently not: despite the fact that `[.data.frame`
and `[<-.data.frame` are _not_ primitive functions, positional matching is done
there as well. Sometimes. At least when the 'x' argument is not first, as shown
in Bill's examples. Obviously my "test" was insufficient...

Cheers,

Henrik



From: William Dunlap  
Sent: Wednesday, November 28, 2018 9:10 PM
To: Henrik Pärn 
Cc: r-devel@r-project.org
Subject: Re: [Rd] named arguments discouraged in `[.data.frame` and 
`[<-.data.frame`

They can get bitten in the last two lines of this example, where the 'x' 
argument is not first:
> d <- data.frame(C1=c(r1=11,r2=21,r3=31), C2=c(12,22,32))
> d[1,1:2]
   C1 C2
r1 11 12
> `[`(d,j=1:2,i=1)
   C1 C2
r1 11 12
Warning message:
In `[.data.frame`(d, j = 1:2, i = 1) :
  named arguments other than 'drop' are discouraged
> `[`(j=1:2,d,i=1)
Error in (1:2)[d, i = 1] : incorrect number of dimensions
> do.call("[", list(j=1:2, i=1, x=d))
Error in 1:2[i = 1, x = list(C1 = c(11, 21, 31), C2 = c(12, 22, 32))] :
  incorrect number of dimensions

Bill Dunlap
TIBCO Software
wdunlap http://tibco.com


On Wed, Nov 28, 2018 at 11:30 AM Henrik Pärn  
wrote:
tl;dr:

Why are named arguments discouraged in `[.data.frame`, `[<-.data.frame` and 
`[[.data.frame`?

(because this question is of the kind 'why is R designed like this?', I 
though R-devel would be more appropriate than R-help)

#

Background:

Now and then students present their fancy functions like this:

myfancyfun(d,12,0.3,0.2,500,1000,FALSE,TRUE,FALSE,TRUE,FALSE)

Incomprehensible. Thus, I encourage them to use spaces and name arguments, 
_at least_ when trying to communicate their code with others. Something like:

myfancyfun(data = d, n = 12, gamma = 0.3, prob = 0.2,
           size = 500, niter = 1000, model = FALSE,
           scale = TRUE, drop = FALSE, plot = TRUE, save = FALSE)


Then some overzealous students started to use named arguments everywhere. 
E-v-e-r-y-w-h-e-r-e. Even in the most basic situation when indexing vectors (as 
a subtle protest?), like:

vec <- 1:9

vec[i = 4]
`[`(x = vec, i = 4)

vec[[i = 4]]
`[[`(x = vec, i = 4)

vec[i = 4] <- 10
`[<-`(x = vec, i = 4, value = 10)

...or when indexing matrices:

m <- matrix(vec, ncol = 3)
m[i = 2, j = 2]
`[`(x = m, i = 2, j = 2)
# 5


Re: [Rd] [tryExcept] New try Function

2018-11-23 Thread Emil Bode
Hi Ernest,

To start: I don't see an attachment, I think they're not (always) allowed on 
this mailing-list. If you want to send something, text is your safest bet.
But regarding the issue of tryCatch: I think you're not fully using what it 
already can do. In almost all circumstances I've encountered the following 
works fine:
res <- tryCatch(expr, error = function(cond) {
  # a bunch of code
  # Some value to be stored in res
})
The only difference is that now "#abunchofcode" is run from inside a function, 
which means you're working in a different environment, and if you want to 
assign values to other variables you need to use <<- or assign.
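A minimal sketch of that pattern (hypothetical names; '<<-' finds msg in the
calling environment because the handler function is defined there):

msg <- NULL
res <- tryCatch(stop("boom"), error = function(cond) {
  msg <<- conditionMessage(cond)  # assign outside the handler's own frame
  NA                              # the value stored in res
})
res  # NA
msg  # "boom"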
For a modified function, I think it would be nice if there's a way to supply an 
expression instead of a function, so that evaluation (and assignment!) takes 
place in the same environment as the main code in the tryCatch (in expr). Is 
that what you made?
And with the current tryCatch, you could use something like this:
res <- tryCatch(expr, error=function(e) evalq({
  # a bunch of code
  # Some value for res
}, envir=parent.frame(4)))
# The 4 is because some internal functions are involved;
# parent.frame(4) is the same environment as used by expr

This is cumbersome though, and it gets even more cumbersome if you want to
access the error object in #abunchofcode, or use #abunchofcode to return to a
higher level, so I get that you're looking for a more elegant solution.

Best regards, 
Emil Bode
 
On 23/11/2018, 08:49, "R-devel on behalf of Ernest Benedito"  wrote:

Hi everyone,

When dealing with errors, sometimes I want to run a bunch of code when an 
error occurs.
For now I usually use a structure such as:

res <- tryCatch(expr, error = function(cond) cond) # or try(expr)

if (inherits(res, "error")) # or inherits(res, "try-error")
  # a bunch of code

I thought it would be useful to have a function that does this naturally, so
I came up with the attached function.

I would be glad to hear your insights and if you think it would make sense 
to add this function to R.

Best regards,
Ernest


Re: [Rd] Subsetting row in single column matrix drops names in resulting vector

2018-11-22 Thread Emil Bode
The problem is that the drop is only applied (or not) after the subsetting, so
what R does is:
- getting the subset, which means a 1 x 1 matrix;
- only then it either returns that as is (when drop=FALSE), or removes ALL
dimensions of extent 1, regardless of whether these are rows or columns (or
higher dimensions).
And it can't keep any names, because what name should be returned? The name
'row1' is just as valid as 'col1'.
I guess if we could design everything anew, a solution would be to be able to
specify something like a[1,,drop='row'], or a[1,,drop=1], to drop the rows but
keep columns, and get a vector equal to row 'row1' (which in this case just
has length 1 and name 'col1').
That's not how it's designed, but you can use adrop() from the 'abind'
package:
abind::adrop(a[1,,drop=FALSE], drop=1) first subsets, then drops the
row dimension, so gives what you're looking for.
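With the matrix from the original post, that looks like (a sketch; output as
per adrop's documented behaviour):

a <- matrix(1:2, nrow = 2, dimnames = list(c("row1", "row2"), c("col1")))
a[1, ]                                        # unnamed: 1
abind::adrop(a[1, , drop = FALSE], drop = 1)  # named vector: col1 = 1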
Hope this solves your problem.

Best regards, 
Emil Bode
 

On 21/11/2018, 17:58, "R-devel on behalf of Dmitriy Selivanov" 
 wrote:

Hi Rui. Thanks for the answer; I'm aware of the drop = FALSE option. Unfortunately
it doesn't resolve the issue - I'm expecting to get a vector, not a matrix.

Wed, 21 Nov 2018 at 20:54, Rui Barradas :

> Hello,
>
> Use drop = FALSE.
>
> a[1, , drop = FALSE]
> #      col1
> # row1    1
>
>
> Hope this helps,
>
> Rui Barradas
>
> At 16:51 on 21/11/2018, Dmitriy Selivanov wrote:
> > Hello here. I'm struggling to understand R's subsetting behavior in a
> > couple of edge cases - subsetting a row in a single-column matrix and
> > subsetting a column in a single-row matrix. I've read R's docs several
> > times and haven't found an answer.
> >
> > Consider the following example:
> >
> > a = matrix(1:2, nrow = 2, dimnames = list(c("row1", "row2"), c("col1")))
> > a[1, ]
> > # 1
> >
> > It returns an *unnamed* vector `1` where I would expect a named vector. In
> > fact it returns a named vector when the number of columns is > 1.
> > The same issue applies to a single-row matrix. Is it a bug? Looks very
> > counterintuitive.
> >
> >
>


-- 
Regards
Dmitriy Selivanov



Re: [Rd] sys.call() inside replacement functions incorrectly returns *tmp*

2018-10-15 Thread Emil Bode
Hi,

Agreed that it would be better if sys.call() were to return "x" instead of
"*tmp*", as it behaves as a local variable. Although I'm not sure what problem
it would solve; the effect here is comparable to what happens when calling a
function indirectly (although then you could use sys.call(2), which here
doesn't work).
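As a reminder of where `*tmp*` comes from: per the R language definition,
myreplacementfunction(x, y, z) <- 0 is evaluated roughly as

`*tmp*` <- x
x <- `myreplacementfunction<-`(`*tmp*`, y, z, value = 0)
rm(`*tmp*`)

which is why sys.call() sees `*tmp*`, and why a non-existent x fails at the
very first step.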
 
But your other suggestion, accepting a non-existent x without error, would cause
a lot of other problems, I think. The way I see it, replacement functions are
meant to edit a certain aspect of an object/variable, which only makes sense if
it exists in the first place. With your replacement function there may be a use
for setting new values, but what should e.g. "levels(x) <- letters" do? Make a
new empty factor? Discard any results? Make an as-empty-as-possible variable
(probably an empty list) with a levels attribute? I think the current behaviour
is fine.
In general, I can't see any scenario where you want to "edit" a variable but
make the end result independent of the original value (then you'd simply
assign, without a custom function). So any replacement function is going to
read the original value of x, so is there any downside to requiring it?

Also, I have to say that your example looks confusing to me. Do you want to
assign 0 to x, and ignore all other arguments? Or was it your intention to set
all variables to 0? And having an ellipsis argument only makes sense, in my
experience, if those arguments are optional. Which would mean
myreplacementfunction() = 0 should be a valid call, but I can't see what that
would be expected to do. So I would at least make the call
`myreplacementfunction(x, ..., value)`.
The reason y and z are "right" is that these are simple extra input
parameters, which can have any value, including missing; they needn't be
evaluated.

Best regards, 
Emil Bode
 

On 15/10/2018, 02:20, "R-devel on behalf of Abs Spurdle"  wrote:

Kia Ora

Let's say we have:
"myreplacementfunction<-" = function (..., value)
{   call = sys.call ()
print (as.list (call) )
0
}

Then we call:
x = 0
myreplacementfunction (x, y, z) = 0

It will return:
[[1]]
`myreplacementfunction<-`

[[2]]
`*tmp*`

[[3]]
y

[[4]]
z

$value


There are two problems here.
Firstly, x has to be defined otherwise we get an error message.
Secondly, the first argument is returned as *tmp*.

Both are incorrect.

It should be possible to call the function without defining x.
And it should return x rather than *tmp*.
In other words, replacement function calls should be the same as other
function calls.

Although it gets y and z right.


kind regards
Abs



Re: [Rd] Buglet in handling times in R-3.5.1

2018-10-09 Thread Emil Bode
Here on my Mac it looks worse: not a rounding difference, but an off-by-one 
error for fractional seconds before 1970, looks like the conversion to POSIXlt 
is doing something wrong:

options(digits=12)
as.numeric(as.POSIXlt(as.POSIXct('1969-01-01')))
[1] -31539600
# As expected
as.numeric(as.POSIXlt(as.POSIXct('1969-01-01')+.1))
[1] -31539598.9
# An additional second disappears (and no, there was no (negative) leap second)

I'm not enough at home in C to get to the core of it; the problem is in the
.Internal call.
Hoping somebody can investigate further.
Specs: R 3.5.1; macOS 10.13.6

Best regards, 
Emil Bode
 

On 09/10/2018, 17:27, "R-devel on behalf of Russell, George"  wrote:

Dear R developers,

I have found a minute bug in R-3.5.1 (Windows version), in how times that are
not an exact number of seconds are displayed.
> as.POSIXct("1969-01-01 01:00")+0.3
[1] "1969-01-01 01:00:01 CET"
> as.POSIXct("1970-01-01 01:00")+0.3
[1] "1970-01-01 01:00:00 CET"

So for 1969, adding 0.3 of a second means you round UP; for 1970 you round
DOWN. But I think it should be consistent.

At the end of this message I have put the usual version information.

Thanks for all your help and for your work on this wonderful product.


> R.version
   _
platform   x86_64-w64-mingw32
arch   x86_64
os mingw32
system x86_64, mingw32
status
major  3
minor  5.1
year   2018
month  07
day02
svn rev74947
language   R
version.string R version 3.5.1 (2018-07-02)
nickname   Feather Spray



Re: [Rd] Warning when calling formals() for `[`.

2018-10-08 Thread Emil Bode
Hello,

I agree the documentation of args can be improved, but the main question is 
what the return should be.
I guess the reason args() returns NULL is the way argument matching works
for primitives: there is a lot going on under the hood, and which arguments
are/are not acceptable for `[` can't be stated as straightforwardly as for
other functions.
Note also the difference in printing "sum" and "[": sum first prints
"function(..., na.rm=FALSE)", whereas `[` jumps straight to the body. And this
is not an artefact of printing, as overwriting it makes clear:
`[` <- function(x, i, j, ..., drop=FALSE) .Primitive("[")
exhibits very strange behaviour, where you need to call it twice/nested: 1[2]
returns a primitive function; to get it to do its job you need 1[2](df, 3, 4)
instead of df[3,4].
So general advice would probably be to stay away from messing with the
arguments of primitives, as ?args already hints: "mainly used interactively
(...). For programming, consider using formals instead."
Basically, primitives are optimized down to the core, which probably means the
concept of an argument list may not be as clear-cut as it is with "normal"
functions.
So working with args() on primitives comes with some risks, which is probably
the reason that formals() always returns NULL in that case.
What is your use case?
If you really need a return value, I think you could catch NULL-values, 
something like this:
args <- function(name) {
  if(is.character(name)) name <- get(name, parent.frame(), mode='function')
  if(!is.function(name)) return(NULL)
  ret <- base::args(name)
  if(is.null(ret) && is.primitive(name)) {
    ret <- function(...) NULL
    environment(ret) <- parent.frame()
  }
  return(ret)
}

Which would just return "function(...) NULL" for args("["), which is of the
expected class but does not give you any real information. Would that help you?
Otherwise, to get to know the arguments there is of course "?".
And note that if there is dispatch, it's possible to get the argument list of
a specific method, e.g. args(`[.data.frame`) works as expected (as it is not a
primitive).


Best regards, 
Emil Bode

On 07/10/2018, 16:34, "R-devel on behalf of Rui Barradas" 
 wrote:

Hello,

This is because args(`[`) returns NULL and class(NULL) is NULL.
So the question would be why is the return value of args(`[`) NULL?

Rui Barradas

At 15:14 on 07/10/2018, Peter Dalgaard wrote:
> 
> 
>> On 7 Oct 2018, at 16:04 , Rui Barradas  wrote:
>>
>> Hello,
>>
>> I don't see why you say that the documentation seems to be wrong:
>>
>>
>> class(args(`+`))
>> #[1] "function"
>>
>>
>> args() on a primitive does return a closure. At least in this case it does.
> 
> But in this case it doesn't:
> 
>> is.primitive(get("["))
> [1] TRUE
>> class(args(get("[")))
> [1] "NULL"
> 
> Or, for that matter:
> 
>> is.primitive(`[`)
> [1] TRUE
>> class(args(`[`))
> [1] "NULL"
> 
> -pd
> 
>>
>>
>> Rui Barradas
>>
>>> At 14:05 on 07/10/2018, Peter Dalgaard wrote:
>>> There is more "fun" afoot here, but I don't recall what the point may be:
>>>> args(get("+"))
>>> function (e1, e2)
>>> NULL
>>>> args(get("["))
>>> NULL
>>>> get("[")
>>> .Primitive("[")
>>>> get("+")
>>> function (e1, e2)  .Primitive("+")
>>> The other index operators, "[[", "[<-", "[[<-" are similar
>>> The docs are pretty clear that args() on a primitive should yield a closure, so at least the documentation seems to be wrong.
>>> -pd
>>>> On 6 Oct 2018, at 19:26 , Laurent Gautier  wrote:
>>>>
>>>> Hi,
>>>>
>>>> A short code example showing the warning might be the only thing needed here:
>>>>
>>>> ```
>>>>> formals(args(`[`))
>>>> NULL
>>>>
>>>> Warning message:
>>>> In formals(fun) : argument is not a function
>>>>> is.function(`[`)
>>>> [1] TRUE
>>>>> is.primitive(`[`)
>>>> [1] TRUE
>>>> ```
>>>>
>>>> Now with another primitive:

[Rd] Relevel confusing with numeric value

2018-10-02 Thread Emil Bode
Something that bit me:
The function relevel takes a factor, and a reference level to be promoted to
the first place.
If "ref" is a character, that level is promoted; if it's a numeric, the
"ref"-th level is promoted.
This turns out to be very confusing if you have a factor with numeric values
(e.g. when reading in a csv with some dirty numeric columns and
stringsAsFactors=TRUE).
For example:

set.seed(1)
test <- data.frame(n=sample(c(1:100, letters[1:10]), size=90))
test$n <- relevel(test$n, 50)
print(levels(test$n))

gives “62” as the first level.

Could we make something like this an error, or at least issue a warning?
Also because some other functions automatically coerce: factor(…, levels=1:100)
and levels(test$n) <- 1:100 work fine.
So this is maybe the most confusing: relevel(factor(1:10, levels = -10:20), 15)
gives "4" as the first level.
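For comparison, passing the reference as a character string does what most
users expect (a sketch mirroring the example above):

relevel(factor(1:10, levels = -10:20), ref = "15")  # level "15" comes first
relevel(factor(1:10, levels = -10:20), ref = 15)    # the 15th level, "4", does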

For now I've thought of 2 possible implementations, which could be inserted in
stats::relevel.factor(), just before is.character(ref):

if(is.numeric(ref) && ref %in% lev)
  warning('Provided numeric reference, note that this will promote the ',
          ref, 'th value, not the level with value "', ref, '"!')

or

if(is.numeric(ref) && any(!is.na(suppressWarnings(as.numeric(lev)))))
  warning('Provided numeric reference, note that this will promote the ',
          ref, 'th value, not the level with value "', ref, '"!')


Best regards,
Emil Bode




Re: [Rd] future time stamps warning

2018-09-20 Thread Emil Bode

On Thu, Sep 20, 2018 at 11:46 AM Leo Lahti  wrote:
>
> Time stamps are correct and my system time is correct.

How is your timezone set?
When I look at your GitHub I see as timestamp for DESCRIPTION today, 1:25 PM
GMT+2 (and as I'm writing this, it's 1:12 PM GMT+2).
GMT+2 is CEST; from your mail address I guess you're in Finland, which is EEST,
GMT+3.


>
> I have now tried to use Sys.setFileTime() to update time stamps as proposed.
> This does not help.
>
> The windows and debian builds give different reports on the time stamp
> issue.
> 
https://win-builder.r-project.org/incoming_pretest/eurostat_3.2.8_20180920_122655/Windows/00check.log
> 
https://win-builder.r-project.org/incoming_pretest/eurostat_3.2.8_20180920_122655/Debian/00check.log
>
> I attached the time stamp listing below.

Well, maybe it is an inaccurate system clock on your machine, or on
CRAN, but nevertheless,
for this submission, you can just set the time stamps to a time in the past.

Gabor
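For completeness, a sketch of doing that for a whole source tree (assuming the
package sources live in ./eurostat; Sys.setFileTime() takes a path and a time):

files <- list.files("eurostat", recursive = TRUE, full.names = TRUE)
invisible(lapply(files, Sys.setFileTime, time = Sys.time() - 86400))  # 1 day ago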

> Leo
>
>
>
> lei@kone:~/Rpackages/eurostat/inst/extras$ tar zvtf eurostat_3.2.8.tar.gz
> drwxr-xr-x lei/lei   0 2018-09-20 13:23 eurostat/build/
> -rw-r--r-- lei/lei 301 2018-09-20 13:23 
eurostat/build/vignette.rds
> drwxr-xr-x lei/lei   0 2018-09-20 13:23 eurostat/data/
> -rw-r--r-- lei/lei  98 2018-09-20 13:23 eurostat/data/datalist
> -rwxr-xr-x lei/lei 340 2018-06-17 20:44
> eurostat/data/ea_countries.rda
> -rwxr-xr-x lei/lei 214 2018-06-17 20:44
> eurostat/data/efta_countries.rda
> -rwxr-xr-x lei/lei 252 2018-06-17 20:44
> eurostat/data/eu_candidate_countries.rda
> -rwxr-xr-x lei/lei 431 2018-06-17 20:44
> eurostat/data/eu_countries.rda
> -rw-r--r-- lei/lei 3651661 2018-08-28 18:44
> eurostat/data/eurostat_geodata_60_2016.rda
> -rwxr-xr-x lei/lei5596 2018-06-17 20:44 eurostat/data/tgs00026.rda
> -rwxr-xr-x lei/lei1447 2018-09-20 13:23 eurostat/DESCRIPTION
> drwxr-xr-x lei/lei   0 2018-09-20 13:23 eurostat/inst/
> -rwxr-xr-x lei/lei1060 2018-06-17 20:44 eurostat/inst/CITATION
> drwxr-xr-x lei/lei   0 2018-09-20 13:23 eurostat/inst/doc/
> -rw-r--r-- lei/lei6361 2018-09-20 13:22
> eurostat/inst/doc/blogposts.html
> -rwxr-xr-x lei/lei 518 2018-06-17 20:44
> eurostat/inst/doc/blogposts.Rmd
> -rw-r--r-- lei/lei6455 2018-09-20 13:22
> eurostat/inst/doc/cheatsheet.html
> -rwxr-xr-x lei/lei 610 2018-06-17 20:44
> eurostat/inst/doc/cheatsheet.Rmd
> -rw-r--r-- lei/lei  246912 2018-09-20 01:43
> eurostat/inst/doc/eurostat_tutorial.pdf
> -rw-r--r-- lei/lei   10565 2018-09-20 13:23
> eurostat/inst/doc/eurostat_tutorial.R
> -rwxr-xr-x lei/lei   17177 2018-09-20 00:07
> eurostat/inst/doc/eurostat_tutorial.Rmd
> -rw-r--r-- lei/lei6834 2018-09-20 13:23
> eurostat/inst/doc/publications.html
> -rwxr-xr-x lei/lei 942 2018-06-17 20:44
> eurostat/inst/doc/publications.Rmd
> -rwxr-xr-x lei/lei  92 2018-06-17 20:44 eurostat/LICENSE
> drwxr-xr-x lei/lei   0 2018-08-28 18:44 eurostat/man/
> -rwxr-xr-x lei/lei 697 2018-06-17 20:44
> eurostat/man/clean_eurostat_cache.Rd
> -rwxr-xr-x lei/lei 395 2018-06-17 20:44
> eurostat/man/convert_time_col.Rd
> -rwxr-xr-x lei/lei1276 2018-06-17 20:44
> eurostat/man/cut_to_classes.Rd
> -rwxr-xr-x lei/lei1133 2018-06-17 20:44 eurostat/man/dic_order.Rd
> -rwxr-xr-x lei/lei 788 2018-06-17 20:44 
eurostat/man/eu_countries.Rd
> -rw-r--r-- lei/lei 826 2018-08-28 18:44
> eurostat/man/eurostat_geodata_60_2016.Rd
> -rwxr-xr-x lei/lei 805 2018-06-17 20:44
> eurostat/man/eurostat-package.Rd
> -rwxr-xr-x lei/lei1220 2018-06-17 20:44
> eurostat/man/eurotime2date.Rd
> -rwxr-xr-x lei/lei 978 2018-06-17 20:44 
eurostat/man/eurotime2num.Rd
> -rwxr-xr-x lei/lei1169 2018-06-17 20:44
> eurostat/man/get_eurostat_dic.Rd
> -rwxr-xr-x lei/lei2321 2018-08-28 18:44
> eurostat/man/get_eurostat_geospatial.Rd
> -rwxr-xr-x lei/lei2675 2018-08-28 18:44
> eurostat/man/get_eurostat_json.Rd
> -rwxr-xr-x lei/lei1172 2018-06-17 20:44
> eurostat/man/get_eurostat_raw.Rd
> -rwxr-xr-x lei/lei1191 2018-06-17 20:44
> eurostat/man/get_eurostat_toc.Rd
> -rwxr-xr-x lei/lei6351 2018-08-28 18:44 
eurostat/man/get_eurostat.Rd
> -rwxr-xr-x lei/lei 783 2018-06-17 20:44
> eurostat/man/harmonize_country_code.Rd
> -rwxr-xr-x lei/lei2518 2018-06-17 20:44
> eurostat/man/label_eurostat.Rd
> -rwxr-xr-x lei/lei2015 2018-06-17 20:44
> eurostat/man/search_eurostat.Rd
> -rwxr-xr-x lei/lei 

Re: [Rd] A different error in sample()

2018-09-20 Thread Emil Bode
But do we handle it as an error in what sample does, or in how the
documentation reads?
I think what is done now would be best described as "ceilinged", i.e. what
ceiling() does. But is there an English word to describe this?
Or just use "converted to the next largest integer"?

But then again, what happens is that the answer is ceilinged, not the input.
I guess the rationale is that multiplying by any integer and then dividing
should give the same results:
ceiling(sample(n * x, size=1e6, replace = TRUE) / x) gives the same results for
any integer n and x; it's nice that this also holds for non-integer n.
The most important question is why people would use sample with a non-integer
x; I don't see many use cases.
So I agree with Luke that a warning would be best, regardless of what the docs
say.

Best regards, 
Emil Bode

Although it seems to be pretty weird to enter a numeric vector of length 
one that is not an integer as the first argument to sample(), the results do 
not seem to match what is documented in the manual. In addition, the results 
below do not support the use of round rather than truncate in the 
documentation. Consider the code below.
The first sentence in the details section says: "If x has length 1, is 
numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes 
place from 1:x."
In the console:
> 1:2.001
[1] 1 2
> 1:2.9
[1] 1 2

truncation:
> trunc(2.9)
[1] 2

So, this seems to support the quote from previous emails: "Non-integer
positive numerical values of n or x will be truncated to the next smallest
integer, which has to be no larger than .Machine$integer.max."
However, again in the console:
> set.seed(123)
> table(sample(2.001, 1, replace=TRUE))

   123 
5052 49417

So, neither rounding nor truncation is occurring. Next, define a sequence.
> x <- seq(2.001, 2.51, length.out=20)
Now, grab all of the threes from sample()-ing this sequence.

 > set.seed(123)
> threes <- sapply(x, function(y) table(sample(y, 1, replace=TRUE))[3])

Check for NAs (I cheated here and found a nice seed).
> any(is.na(threes))
[1] FALSE
Now, the (to me) disturbing result.

> is.unsorted(threes)
[1] FALSE

or equivalently

> all(diff(threes) > 0)
[1] TRUE

So the number of threes grows monotonically as 2.001 moves to 2.5. As I 
hinted above, the monotonic growth is not assured. My guess is that the growth 
is stochastic and relates to some "probability weighting" based on how close 
the element of x is to 3. Perhaps this has been brought up before, but it seems 
relevant to the current discussion.
A potential aid to this issue would be something like
if(length(x) == 1 && !isTRUE(all.equal(x, as.integer(x)))) warning("It is a bad
idea to use vectors of length 1 in the x argument that are not integers.")
Hope that helps,
luke



Re: [Rd] Bug when calling system/system2 (and request for Bugzilla account)

2018-09-14 Thread Emil Bode
I hope it's not too specific to my setup...
I've tried with system2 added on the first line, so:

Example.R:
system2('ls', timeout=5)
cat('Start non-interruptable functions\n')
sample_a <- sample(1:1e7)
sample_b <- sample(1:2e7)
matching <- match(sample_a, sample_b)
cat('Finished\n')
Sys.sleep(10)

And in terminal/bash:
R --vanilla
source('Example.R')
Send ^C between the messages (Start...  until Finished)

Or if you have a more powerful CPU you can increase the samples; the exact code
doesn't matter very much.
As soon as you restart and source again with the system2 call commented out,
the behaviour is different: there is a pause, and you return to the R prompt.

Best, Emil



On 14/09/2018, 17:39, "luke-tier...@uiowa.edu"  wrote:

I can't reproduce this. Can you be more precise: exactly where are you
putting the system2 call and exactly where are you sending the
interrupt signal with ^C?

Best,

luke

On Fri, 14 Sep 2018, Emil Bode wrote:

> Hi all,
>
> I found some strange behaviour, which I think is a bug. Could someone 
make an account for me on Bugzilla or pass on my report?
>
> The problem:
> When pressing Ctrl-C when a file is sourced in R, run from Terminal 
(macOS), sometimes the entire session is ended right away, while I just want to 
stop the script. This is the case when I press Ctrl-C while some functions are 
running that don’t catch the interrupt. However, the behaviour is different 
whether I’m in a clean session (in which case some time is allowed to pass, so 
that when the function returns the script can be interrupted), or whether I 
have called base::system() or system2() with timeout other than 0.
>
> Reproducible example:
> cat('Start non-interruptable functions\n')
> sample_a <- sample(1:1e7)
> sample_b <- sample(1:2e7)
> matching <- match(sample_a, sample_b)
> cat('Finished\n')
> Sys.sleep(10)
>
> Observed behaviour:
> In a clean session, when I hit Ctrl-C during the execution of match, 
there is a delay, and as soon as Sys.sleep() is invoked, the script is 
interrupted, I get back my R “>”-prompt (unless options(error=…) is set)
> But If I add the line system2("ls", timeout = 5), or something similar, 
when I try to break during the first part of the script, my Rsession ends, I 
get thrown back to my terminal-prompt.
>
> Desired behaviour:
> The best setup would probably be if Ctrl-C would always try to break from 
the sourced file, and only if that doesn’t success in n seconds, break the 
rsession altogether, ideally with a customizable option. But maybe that’s too 
hard, so maybe the most pragmatic would be to have 2 hotkeys: one to break from 
a hanging/broken rsession, and one to gently try to break from a script. But at 
least I think it should be:
>
> Expected behaviour:
> Consistent behaviour for Ctrl-C: either trying to break the script, or 
end the session altogether.
>
> Some observations:
>
>  *   I can still break cleanly during the Sys.sleep(). But for larger 
scripts, it is largely a matter of luck if I hit Ctrl-C during the right moment.
>  *   I don’t notice any difference between using system or system2, or 
any of the arguments other than timeout provided
>  *   I don’t notice any difference whether the timeout is actually 
exhausted or not.
>  *   Later calls to system/system2 don’t change anything (i.e. later 
calling system(…, timeout=0) does not revert back to the old situation)
>
> My setup:
> R 3.5.1 (Feather Spray), run with –vanilla option
> GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin17)
> macOS High Sierra 10.13.6
>
> Best regards,
> Emil Bode
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu



[Rd] Bug when calling system/system2 (and request for Bugzilla account)

2018-09-14 Thread Emil Bode
Hi all,

I found some strange behaviour, which I think is a bug. Could someone make an 
account for me on Bugzilla or pass on my report?

The problem:
When pressing Ctrl-C when a file is sourced in R, run from Terminal (macOS), 
sometimes the entire session is ended right away, while I just want to stop the 
script. This is the case when I press Ctrl-C while some functions are running 
that don’t catch the interrupt. However, the behaviour is different whether I’m 
in a clean session (in which case some time is allowed to pass, so that when 
the function returns the script can be interrupted), or whether I have called 
base::system() or system2() with timeout other than 0.

Reproducible example:
cat('Start non-interruptable functions\n')
sample_a <- sample(1:1e7)
sample_b <- sample(1:2e7)
matching <- match(sample_a, sample_b)
cat('Finished\n')
Sys.sleep(10)

Observed behaviour:
In a clean session, when I hit Ctrl-C during the execution of match, there is a 
delay, and as soon as Sys.sleep() is invoked, the script is interrupted, I get 
back my R “>”-prompt (unless options(error=…) is set)
But If I add the line system2("ls", timeout = 5), or something similar, when I 
try to break during the first part of the script, my Rsession ends, I get 
thrown back to my terminal-prompt.

Desired behaviour:
The best setup would probably be if Ctrl-C would always try to break from the 
sourced file, and only if that doesn’t success in n seconds, break the rsession 
altogether, ideally with a customizable option. But maybe that’s too hard, so 
maybe the most pragmatic would be to have 2 hotkeys: one to break from a 
hanging/broken rsession, and one to gently try to break from a script. But at 
least I think it should be:

Expected behaviour:
Consistent behaviour for Ctrl-C: either trying to break the script, or end the 
session altogether.

Some observations:

  *   I can still break cleanly during the Sys.sleep(). But for larger scripts, 
it is largely a matter of luck if I hit Ctrl-C during the right moment.
  *   I don’t notice any difference between using system or system2, or any of 
the arguments other than timeout provided
  *   I don’t notice any difference whether the timeout is actually exhausted 
or not.
  *   Later calls to system/system2 don’t change anything (i.e. later calling 
system(…, timeout=0) does not revert back to the old situation)

My setup:
R 3.5.1 (Feather Spray), run with –vanilla option
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin17)
macOS High Sierra 10.13.6

Best regards,
Emil Bode



Re: [Rd] Modification-proposal for %% (modulo) when supplied with double

2018-09-13 Thread Emil Bode
Okay, thanks for your reactions.
I realized it's not something being "broken", just that it's a situation where
my intuition messes things up, and that in this situation it's not as easy as
using all.equal, which is my usual approach when working with floats.
But maybe I underestimated the impact a change would have; thanks for the
example, Frederick.

Best,
Emil

On 11/09/2018, 21:10, "frede...@ofb.net"  wrote:

Duncan, I think Emil realizes that the floating point format isn't
able to represent certain numbers, that's why he is suggesting this
change rather than complaining about our arithmetic being broken.

However, I agree with you that we should not adopt his proposal. It
would not make things more "user friendly" for people. Everyone has a
different application and a different use of %% and they just need to
keep in mind that they are talking to a computer and not a blackboard.
Here is an example of a feature that was meant to help users get more
intuitive results with floating point numbers, but which actually
caused headaches instead:
https://github.com/Rdatatable/data.table/issues/1642 It is a slightly
different scenario to this one, but I think it is still a good example
of how we can end up creating unforeseen problems for people if we
change core functionality to do unsolicited rounding behind the
scenes.

Best wishes,

Frederick

On Tue, Sep 11, 2018 at 12:11:29PM -0400, Duncan Murdoch wrote:
> On 11/09/2018 11:23 AM, Emil Bode wrote:
> > Hi all,
> > 
> > 
> > 
> > Could we modify the "%%" (modulo)-operator to include some tolerance 
for rounding-errors when supplied with doubles?
> > 
> > It's not much work (patch supplied on the bottom), and I don't think it 
would break anything, only if you were really interested in analysing rounding 
differences.
> > 
> > Any ideas about implementing this and overwriting base::`%%`, or would 
we want another method (as I've done for the moment)?
> 
> I think this is a bad idea.  Your comments say "The
> \code{\link[base:Arithmetic]{`\%\%`}} operator calculates the modulo, but
> sometimes has rounding errors, e.g. "\code{(9.1/.1) \%\% 1}" gives ~ 1,
> instead of 0."
> 
> This is false.  The %% calculation is exactly correct.  The rounding error
> happened in your input:  9.1/0.1 is not equal to 91, it is a little bit
> less:
> 
> > options(digits=20)
> > 9.1/.1
> [1] 90.999999999999985789
> 
> And %% did not return 1, it returned the correct value:
> 
> > (9.1/.1) %% 1
> [1] 0.99999999999998578915
> 
> So it makes no sense to change %%.
> 
> You might argue that the division 9.1/.1 is giving the wrong answer, but 
in
> fact that answer is correct too.  The real problem is that in double
> precision floating point the numbers 9.1 and .1 can't be represented
> exactly.  This is well known, it's in the FAQ (question 7.31).
> 
> Duncan Murdoch
> 
> > 
> > 
> > 
> > Background
> > 
> > I was writing some code where something has to happen at a certain 
interval, with progress indicated, something like this:
> > 
> > interval <- .001
> > 
> > progress <- .1
> > 
> > for(i in 1:1000*interval) {myFun(i); Sys.sleep(interval); if(isTRUE(all.equal(i %% progress, 0))) cat(i, '\n')}
> > 
> > without interval and progress being known in advance. I could work 
around it and make i integer, or do something like
> > 
> > isTRUE(all.equal(i %% progress, 0)) || isTRUE(all.equal(i %% progress, progress)),
> > 
> > but I think my code is clearer as it is. And I like the idea behind all.equal: we want doubles to be approximately identical.
> > 
> > 
> > 
> > So my patch (with roxygen2-markup):
> > 
> > #' Modulo-operator with near-equality
> > 
> > #'
> > 
> > #' The \code{\link[base:Arithmetic]{`\%\%`}} operator calculates the 
modulo, but sometimes has rounding errors, e.g. "\code{(9.1/.1) \%\% 1}" gives 
~ 1, instead of 0.\cr
> > 
> > #' Comparable to what all.equal does, this operator has some tolerance 
for small rounding errors.\cr
> > 
> > #' If the answer would be equal to the divisor within a small 
tolerance, 0 is returned instead.
> > 
> > #'
> > 
> > #' For integer x and y, the normal \%\%-operator is used
> > 
> > #'
> >

[Rd] Modification-proposal for %% (modulo) when supplied with double

2018-09-11 Thread Emil Bode
Hi all,



Could we modify the "%%" (modulo)-operator to include some tolerance for 
rounding-errors when supplied with doubles?

It's not much work (patch supplied at the bottom), and I don't think it would 
break anything, unless you were really interested in analysing rounding 
differences.

Any ideas about implementing this and overwriting base::`%%`, or would we want 
another method (as I've done for the moment)?



Background

I was writing some code where something has to happen at a certain interval, 
with progress indicated, something like this:

interval <- .001

progress <- .1

for(i in 1:1000*interval) {myFun(i); Sys.sleep(interval); if(isTRUE(all.equal(i %% 
progress, 0))) cat(i, '\n')}

without interval and progress being known in advance. I could work around it 
and make i integer, or do something like

isTRUE(all.equal(i %% progress, 0)) || isTRUE(all.equal(i %% progress, progress)),

but I think my code is clearer as it is. And I like the idea behind all.equal: 
we want doubles to be approximately identical.



So my patch (with roxygen2-markup):

#' Modulo-operator with near-equality

#'

#' The \code{\link[base:Arithmetic]{`\%\%`}} operator calculates the modulo, 
but sometimes has rounding errors, e.g. "\code{(9.1/.1) \%\% 1}" gives ~ 1, 
instead of 0.\cr

#' Comparable to what all.equal does, this operator has some tolerance for 
small rounding errors.\cr

#' If the answer would be equal to the divisor within a small tolerance, 0 is 
returned instead.

#'

#' For integer x and y, the normal \%\%-operator is used

#'

#' @usage `\%mod\%`(x, y, tolerance = sqrt(.Machine$double.eps))

#' x \%mod\% y

#' @param x,y numeric vectors, similar to those passed on to \%\%

#' @param tolerance numeric, maximum difference, see 
\code{\link[base]{all.equal}}. The default is ~ \code{1.5e-8}

#' @return identical to the result for \%\%, unless the answer would be really 
close to y, in which case 0 is returned

#' @note To specify tolerance, use the call \code{`\%mod\%`(x,y,tolerance)}

#' @note The precedence for \code{\%mod\%} is the same as that for \code{\%\%}

#'

#' @name mod

#' @rdname mod

#'

#' @export

`%mod%` <- function(x,y, tolerance = sqrt(.Machine$double.eps)) {

  stopifnot(is.numeric(x), is.numeric(y), is.numeric(tolerance),

!is.na(tolerance), length(tolerance)==1, tolerance>=0)

  if(is.integer(x) && is.integer(y)) {

return(x %% y)

  } else {

ans <- x %% y

return(ifelse(abs(ans - y) < tolerance, 0, ans))

  }

}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] compairing doubles

2018-08-31 Thread Emil Bode
Agreed that's it's rounding error, and all.equal would be the way to go.
I wouldn't call it a bug, it's simply part of working with floating point 
numbers, any language has the same issue.

And while we're at it, I think the function can be a lot shorter:
.is_continous_evenly_spaced <- function(n){
  length(n) > 1 && isTRUE(all.equal(n[order(n)], seq(from = min(n), to = max(n), 
length.out = length(n))))
}
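
A quick sanity check of the shortened version (same example inputs as in
Felix's message below):

.is_continous_evenly_spaced(c(1, 2, 3, 4))        # TRUE
.is_continous_evenly_spaced(c(1, 3, 4, 5))        # FALSE
.is_continous_evenly_spaced(c(1, 1.1, 1.2, 1.3))  # TRUE: all.equal absorbs the representation error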

Cheers, Emil

On Fri, 31 Aug 2018 at 15:10, Felix Ernst
() wrote:
>
> Dear all,
>
> I am a bit unsure whether this qualifies as a bug, but it is definitely 
strange behaviour. That's why I wanted to discuss it.
>
> With the following function, I want to test for evenly spaced numbers, 
starting from anywhere.
>
> .is_continous_evenly_spaced <- function(n){
>   if(length(n) < 2) return(FALSE)
>   n <- n[order(n)]
>   n <- n - min(n)
>   step <- n[2] - n[1]
>   test <- seq(from = min(n), to = max(n), by = step)
>   if(length(n) == length(test) &&
>  all(n == test)){
> return(TRUE)
>   }
>   return(FALSE)
> }
>
> > .is_continous_evenly_spaced(c(1,2,3,4))
> [1] TRUE
> > .is_continous_evenly_spaced(c(1,3,4,5))
> [1] FALSE
> > .is_continous_evenly_spaced(c(1,1.1,1.2,1.3))
> [1] FALSE
>
> I expect the results for 1 and 2, but not for 3. Upon investigation it 
turns out that n == test is TRUE for every pair, but not for the pair of 0.2.
>
> The types reported are always double, however n[2] == 0.1 reports FALSE 
as well.
>
> The whole problem is solved by switching from all(n == test) to 
all(as.character(n) == as.character(test)). However that is weird, isn’t it?
>
> Does this work as intended? Thanks in advance for any help, advice and 
suggestions.

I guess this has something to do with how the sequence is built and
the inherent error of floating point arithmetic. In fact, if you
return test minus n, you'll get:

[1] 0.00e+00 0.00e+00 2.220446e-16 0.00e+00

and the error gets bigger when you continue the sequence; e.g., this
is for c(1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7):

[1] 0.00e+00 0.00e+00 2.220446e-16 2.220446e-16 4.440892e-16
[6] 4.440892e-16 4.440892e-16 0.00e+00
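
For reference, a minimal way to reproduce those differences (mirroring the
construction inside the function above):

n <- c(1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7)
n <- n - min(n)                                  # shift to start at 0
test <- seq(from = min(n), to = max(n), by = n[2] - n[1])
test - n                                         # tiny nonzero differences appear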

So, independently of this is considered a bug or not, instead of

length(n) == length(test) && all(n == test)

I would use the following condition:

isTRUE(all.equal(n, test))

Iñaki

>
> Best regards,
> Felix
>
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
Iñaki Ucar

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ROBUSTNESS: x || y and x && y to give warning/error if length(x) != 1 or length(y) != 1

2018-08-31 Thread Emil Bode

On 30/08/2018, 20:15, "R-devel on behalf of Hadley Wickham" 
 wrote:

On Thu, Aug 30, 2018 at 10:58 AM Martin Maechler
 wrote:
>
> >>>>> Joris Meys
> >>>>> on Thu, 30 Aug 2018 14:48:01 +0200 writes:
>
> > On Thu, Aug 30, 2018 at 2:09 PM Dénes Tóth
> >  wrote:
> >> Note that `||` and `&&` have never been symmetric:
> >>
> >> TRUE || stop() # returns TRUE stop() || TRUE # returns an
> >> error
> >>
> >>
> > Fair point. So the suggestion would be to check whether x
> > is of length 1 and whether y is of length 1 only when
> > needed. I.e.
>
> > c(TRUE,FALSE) || TRUE
>
> > would give an error and
>
> > TRUE || c(TRUE, FALSE)
>
> > would pass.
>
> > Thought about it a bit more, and I can't come up with a
> > use case where the first line must pass. So if the short
> > circuiting remains and the extra check only gives a small
> > performance penalty, adding the error could indeed make
> > some bugs more obvious.
>
> I agree "in theory".
> Thank you, Henrik, for bringing it up!
>
> In practice I think we should start having a warning signalled.
> I have checked the source code in the mean time, and the check
> is really very cheap
> { because it can/should be done after checking isNumber(): so
>   then we know we have an atomic and can use XLENGTH() }
>
>
> The 0-length case I don't think we should change as I do find
> NA (is logical!) to be an appropriate logical answer.

Can you explain your reasoning a bit more here? I'd like to understand
the general principle, because from my perspective it's more
parsimonious to say that the inputs to || and && must be length 1,
rather than to say that inputs could be length 0 or length 1, and in
the length 0 case they are replaced with NA.

Hadley

I would say the value NA would cause warnings later on, that are easy to track 
down, so a return of NA is far less likely to cause problems than an unintended 
TRUE or FALSE. And I guess there would be some code reliant on 'logical(0) || 
TRUE' returning TRUE, that wouldn't necessarily be a mistake.

But I think it's hard to predict how exactly people are using functions. I 
personally can't imagine a situation where I'd use || or && outside an 
if-statement, so I'd rather have the current behaviour, because I'm not sure if 
I'm reliant on logical(0) || TRUE  somewhere in my code (even though that would 
be ugly code, it's not wrong per se)
But I could always rewrite it, so I believe it's more a question of how much 
would have to be rewritten. Maybe implement it first in devel, to see how many 
people would complain?

Emil Bode




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ROBUSTNESS: x || y and x && y to give warning/error if length(x) != 1 or length(y) != 1

2018-08-30 Thread Emil Bode
Okay, I thought you always wanted to check the length, but if we can only check 
what's evaluated I mostly agree.

I still think there's not much wrong with how length-0 logicals are treated, as 
the return of NA in cases where the value matters is enough warning I think, 
and I can imagine some code like my previous example 'x==-1 || length(x)==0', 
which wouldn't need a warning.
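
For instance, a small sketch of that pattern (behaviour as of R 3.5.x, with a
legitimately empty x):

x <- numeric(0)
x == -1                    # logical(0)
x == -1 || length(x) == 0  # TRUE: the zero-length LHS acts like NA, and NA || TRUE is TRUE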

But we could do a check for length being >1

Greetings, Emil


On 30/08/2018, 14:55, "R-devel on behalf of Joris Meys" 
 wrote:

On Thu, Aug 30, 2018 at 2:09 PM Dénes Tóth  wrote:

> Note that `||` and `&&` have never been symmetric:
>
> TRUE || stop() # returns TRUE
> stop() || TRUE # returns an error
>
>
Fair point. So the suggestion would be to check whether x is of length 1
and whether y is of length 1 only when needed. I.e.

c(TRUE,FALSE) || TRUE

would give an error and

TRUE || c(TRUE, FALSE)

would pass.

Thought about it a bit more, and I can't come up with a use case where the
first line must pass. So if the short circuiting remains and the extra
check only gives a small performance penalty, adding the error could indeed
make some bugs more obvious.

Cheers
Joris

-- 
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)



---
Biowiskundedagen 2017-2018
http://www.biowiskundedagen.ugent.be/

---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ROBUSTNESS: x || y and x && y to give warning/error if length(x) != 1 or length(y) != 1

2018-08-30 Thread Emil Bode
I have to disagree, I think one of the advantages of '||' (or &&) is the lazy 
evaluation, i.e. you can use the first condition to "not care" about the second 
(and stop errors from being thrown).
So if I want to check if x is a length-one numeric with a value between 0 
and 1, I can do 'class(x)=='numeric' && length(x)==1 && x>0 && x<1'.
In your proposal, having x=c(1,2) would throw an error or multiple warnings.
Also code that relies on the second argument not being evaluated would break, 
as we need to evaluate y in order to know length(y)
There may be some benefit in checking for length(x) only, though that could 
also cause some false positives (e.g. 'x==-1 || length(x)==0' would be a bit 
ugly, but not necessarily wrong, same for someone too lazy to write x[1] 
instead of x).

And I don’t really see the advantage. The casting to length one is (I think) a 
feature, not a bug. If I have/need a length-one x and a length-one y, why not 
use '|' and '&'? I have to admit I only use them in if-statements, and if I 
need an error to be thrown when x and y are not length one, I can use the 
shorter versions and then the if throws a warning (or an error for a length-0 
or NA result).

I get it that for someone just starting in R, the differences between | and || 
can be confusing, but I guess that's just the price to pay for having a 
vectorized language.
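
For illustration, the contrast meant here (as of R 3.5.x):

x <- c(TRUE, FALSE)
x && TRUE              # TRUE: '&&' silently uses only x[1]
if (x) 'yes'           # warning: the condition has length > 1 and only the first element will be used
if (logical(0)) 'yes'  # Error: argument is of length zero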

Best regards, 
Emil Bode
 
Data-analyst
 
+31 6 43 83 89 33
emil.b...@dans.knaw.nl
 
DANS: Netherlands Institute for Permanent Access to Digital Research Resources
Anna van Saksenlaan 51 | 2593 HW Den Haag | +31 70 349 44 50 | 
i...@dans.knaw.nl | dans.knaw.nl 

DANS is an institute of the Dutch Academy KNAW <http://knaw.nl/nl> and funding 
organisation NWO <http://www.nwo.nl/>. 

On 29/08/2018, 05:03, "R-devel on behalf of Henrik Bengtsson" 
 wrote:

# Issue

'x || y' performs 'x[1] || y' for length(x) > 1.  For instance (here
using R 3.5.1),

> c(TRUE, TRUE) || FALSE
[1] TRUE
> c(TRUE, FALSE) || FALSE
[1] TRUE
> c(TRUE, NA) || FALSE
[1] TRUE
> c(FALSE, TRUE) || FALSE
[1] FALSE

This property is symmetric in LHS and RHS (i.e. 'y || x' behaves the
same) and it also applies to 'x && y'.

Note also how the above truncation of 'x' is completely silent -
there's neither an error nor a warning being produced.


# Discussion/Suggestion

Using 'x || y' and 'x && y' with a non-scalar 'x' or 'y' is likely a
mistake.  Either the code is written assuming 'x' and 'y' are scalars,
or there is a coding error and vectorized versions 'x | y' and 'x & y'
were intended.  Should 'x || y' always be considered a mistake if
'length(x) != 1' or 'length(y) != 1'?  If so, should it be a warning
or an error?  For instance,
> x <- c(TRUE, TRUE)
> y <- FALSE
> x || y

Error in x || y : applying scalar operator || to non-scalar elements
Execution halted

What about the case where 'length(x) == 0' or 'length(y) == 0'?  Today
'x || y' returns 'NA' in such cases, e.g.

> logical(0) || c(FALSE, NA)
[1] NA
> logical(0) || logical(0)
[1] NA
> logical(0) && logical(0)
[1] NA

I don't know the background for this behavior, but I'm sure there is
an argument behind that one.  Maybe it's simply that '||' and '&&'
should always return a scalar logical and neither TRUE nor FALSE can
be returned.

/Henrik

PS. This is in the same vein as
https://mailman.stat.ethz.ch/pipermail/r-devel/2017-March/073817.html
- in R (>=3.4.0) we now get that if (1:2 == 1) ... is an error if
_R_CHECK_LENGTH_1_CONDITION_=true

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] validspamobject?

2018-08-15 Thread Emil Bode
Hello,

If you want to determine where the warning is generated, I think it's easiest 
to run R with options(warn=2).
In that case all warnings are converted to errors, and you have more debugging 
tools, e.g. you can run traceback() to see the calling stack, or use 
options(error=recover).
Hope you can catch it.
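
A minimal recipe (the deprecated-function warning here stands in for whatever
warning you are hunting):

options(warn = 2)         # promote every warning to an error
options(error = recover)  # drop into a browser with the call stack when it fires
## ... run the code that produces the warning ...
traceback()               # or inspect the stack afterwards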


Best regards, 
Emil Bode

DANS is an institute of the Dutch Academy KNAW <http://knaw.nl/nl> and funding 
organisation NWO <http://www.nwo.nl/>. 

On 15/08/2018, 02:57, "R-devel on behalf of Ronald Barry" 
 wrote:

Greetings,
  My R package has been showing warnings of the form:

`validspamobject()` is deprecated. Use `validate_spam()` directly

None of my code uses the function validspamobject, so it must be a problem
in another package I'm calling, possibly spam or spdep.  Has this problem
occurred with other people?  It doesn't have any deleterious effect, but
it's annoying.  In particular, how do I determine which package is causing
this warning?  Thanks.

Ron B.

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] apply with zero-row matrix

2018-07-30 Thread Emil Bode
Hi David,

Besides Martin's point, there is also the issue that for a lot of cases you 
would still like to have the right class returned.
Right now these are returns:

> apply(matrix(NA_integer_,0,5), 1, class)
character(0)
> apply(matrix(NA_integer_,0,5), 1, identity)
integer(0)
> apply(matrix(NA,0,5), 1, identity)
logical(0)

In your case, these would all return NULL, so I think there is value in 
running FUN at least once (say, if you'd want to check whether FUN always 
returns the right class). 
And from a philosophical point of view, R is mostly a functional 
programming language; I think if you want side-effects, a for-loop would look 
better.
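
A small sketch of that contrast (using the same zero-row matrix as in David's
example below):

m <- matrix(NA, 0, 5)
apply(m, 1, function(r) cat("Called...\n"))     # FUN still runs once, on a dummy row
for (i in seq_len(nrow(m))) cat("Called...\n")  # the loop body never runs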


Best regards, 
Emil Bode
 
Data-analyst
 
+31 6 43 83 89 33
emil.b...@dans.knaw.nl
 
DANS: Netherlands Institute for Permanent Access to Digital Research 
Resources
Anna van Saksenlaan 51 | 2593 HW Den Haag | +31 70 349 44 50 | 
i...@dans.knaw.nl | dans.knaw.nl 

DANS is an institute of the Dutch Academy KNAW <http://knaw.nl/nl> and 
funding organisation NWO <http://www.nwo.nl/>. 

On 30/07/2018, 11:12, "R-devel on behalf of David Hugh-Jones" 
 wrote:

Hi Martin,

Fair enough for R functions in general. But the behaviour of apply 
violates
the expectation that apply(m, 1, fun) calls fun n times when m has n 
rows.
That seems pretty basic.

Also, I understand from your argument why it makes sense to call apply 
and
return a special result (presumably NULL) for an empty argument; but why
should apply call fun?

Cheers
David

On Mon, 30 Jul 2018 at 08:41, Martin Maechler 

wrote:

> >>>>> David Hugh-Jones
> >>>>> on Mon, 30 Jul 2018 05:33:19 +0100 writes:
>
> > Forgive me if this has been asked many times before, but I
> > couldn't find anything on the mailing lists.
>
> > I'd expect apply(m, 1, foo) not to call `foo` if m is a
> > matrix with zero rows.  In fact:
>
> > m <- matrix(NA, 0, 5)
> > apply(m, 1, function (x) {cat("Called...\n"); print(x)})
> > ## Called...
> > ## [1] FALSE FALSE FALSE FALSE FALSE
>
>
> > Similarly for apply(m, 2,...) if m has no columns.  Is
> > there a reason for this?
>
> Yes :
>
> The reverse is really true for almost all basic R functions:
>
> They *are* called and give an "empty" result automatically
> when the main argument is empty.
>
> What you basicaly propose is to add an extra
>
>  if()
>  return()
>
> to all R functions.  While that makes sense for high-level R
> functions that do a lot of things, this would really be a bad
> idea in general :
>
> This would make all of these basic functions larger {more to 
maintain} and
> slightly slower for all non-zero cases just to make them
> slightly faster for the rare zero-length case.
>
> Martin Maechler
> ETH Zurich and R core Team
>
> --
Sent from Gmail Mobile

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] oddity in transform

2018-07-24 Thread Emil Bode
I think you meant to call BOD[,1]
From ?transform, the ... arguments are supposed to be vectors, and BOD[1] is 
still a data.frame (with one column). So I don't think it's surprising 
transform gets confused by which name to use (X, or Time?), and kind of 
compromises on the name "Time". It's also in a note in ?transform: "If some of 
the values are not vectors of the appropriate length, you deserve whatever you 
get!"
And if you want to do it with multiple extra columns (and are not satisfied 
with these labels), I think the proper way to go would be " transform(BOD, 
X=BOD[,1]*seq(6), Y=BOD[,2]*seq(6))"
 
If you want to trace it back further, it's not in transform but in data.frame. 
Column-names are prepended with a higher-level name if the object has more than 
one column.
And it uses the tag-name if simply supplied with a vector:
data.frame(BOD[1:2], X=BOD[1]*seq(6)) takes the name of the only column of 
BOD[1], Time. Because that column name is already present, it's changed to 
Time.1
data.frame(BOD[1:2], X=BOD[,1]*seq(6)) gives third column-name X (as X is now a 
vector)
data.frame(BOD[1:2], X=BOD[1:2]*seq(6)) or with BOD[,1:2] gives column names 
X.Time and X.demand, to show these (multiple) columns are coming from X
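
In code, the three cases (same BOD dataset as in Gabor's example below; the
resulting names shown as comments):

names(data.frame(BOD[1:2], X = BOD[1] * seq(6)))    # "Time" "demand" "Time.1"
names(data.frame(BOD[1:2], X = BOD[, 1] * seq(6)))  # "Time" "demand" "X"
names(data.frame(BOD[1:2], X = BOD[1:2] * seq(6)))  # "Time" "demand" "X.Time" "X.demand"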

So I don't think there's much to fix here. In this case having X.Time in all 
cases would have been better, but in general the column-naming of data.frame 
works, and changing it would likely cause a lot of problems.
You can always change the column-names later.

Best regards, 
Emil Bode
 
Data-analyst
 
+31 6 43 83 89 33
emil.b...@dans.knaw.nl
 
DANS: Netherlands Institute for Permanent Access to Digital Research Resources
Anna van Saksenlaan 51 | 2593 HW Den Haag | +31 70 349 44 50 | 
i...@dans.knaw.nl | dans.knaw.nl 

DANS is an institute of the Dutch Academy KNAW <http://knaw.nl/nl> and funding 
organisation NWO <http://www.nwo.nl/>. 

On 23/07/2018, 16:52, "R-devel on behalf of Gabor Grothendieck" 
 wrote:

Note the inconsistency in the names in these two examples.  X.Time in
the first case and Time.1 in the second case.

  > transform(BOD, X = BOD[1:2] * seq(6))
    Time demand X.Time X.demand
  1    1    8.3      1      8.3
  2    2   10.3      4     20.6
  3    3   19.0      9     57.0
  4    4   16.0     16     64.0
  5    5   15.6     25     78.0
  6    7   19.8     42    118.8

  > transform(BOD, X = BOD[1] * seq(6))
    Time demand Time.1
  1    1    8.3      1
  2    2   10.3      4
  3    3   19.0      9
  4    4   16.0     16
  5    5   15.6     25
  6    7   19.8     42

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] write.table with quote=TRUE fails on nested data.frames

2018-07-05 Thread Emil Bode
Looks like I’m bumping into a lot of unexpected behaviour lately, but I think I 
found a bug again, and I don’t have access to Bugzilla:
write.table (from the core package utils) doesn’t handle nested data.frames well; 
the quote argument only marks top-level character (or factor) columns for 
quoting, so this fails:

df <- data.frame(a='One;Two;Three',
 b=I(data.frame(c="OtherVal",
d='Four;Five;Six',
e=4)),
f=5)
write.table(df, "~/Desktop/Tempfile.csv", quote = T, col.names = NA,
   sep = ";", dec = ",", qmethod = "double")

The “Four;Five;Six” string is stored unquoted, so read.table (or read.csv) 
breaks down.
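
A sketch of the failing round trip (hypothetical temporary file, df as defined
above):

tf <- tempfile(fileext = ".csv")
write.table(df, tf, quote = TRUE, col.names = NA, sep = ";", dec = ",",
            qmethod = "double")
# 'Four;Five;Six' is written without quotes, so its semicolons are taken as
# field separators and the line has too many fields:
try(read.table(tf, header = TRUE, sep = ";", dec = ","))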
This also affects write.csv and write.csv2, but I’ve written a patch,
See here-under.
Anyone who could file this for me?

Best regards,
Emil Bode

New write.table, note that its environment needs to be set to namespace:utils
write.table <- function (x, file = "", append = FALSE, quote = TRUE, sep = " ",
 eol = "\n", na = "NA", dec = ".", row.names = TRUE, 
col.names = TRUE,
 qmethod = c("escape", "double"), fileEncoding = "")
{
  qmethod <- match.arg(qmethod)
  if (is.logical(quote) && (length(quote) != 1L || is.na(quote)))
stop("'quote' must be 'TRUE', 'FALSE' or numeric")
  quoteC <- if (is.logical(quote))
quote
  else TRUE
  qset <- is.logical(quote) && quote
  if (!is.data.frame(x) && !is.matrix(x))
x <- data.frame(x)
  makeRownames <- isTRUE(row.names)
  makeColnames <- is.logical(col.names) && !identical(FALSE,
  col.names)
  if (is.matrix(x)) {
p <- ncol(x)
d <- dimnames(x)
if (is.null(d))
  d <- list(NULL, NULL)
if (is.null(d[[1L]]) && makeRownames)
  d[[1L]] <- seq_len(nrow(x))
if (is.null(d[[2L]]) && makeColnames && p > 0L)
  d[[2L]] <- paste0("V", 1L:p)
if (qset)
  quote <- if (is.character(x))
seq_len(p)
else numeric()
  }
  else {
if (any(sapply(x, function(z) length(dim(z)) == 2 &&
   dim(z)[2L] > 1))) {
  if (qset) {
quote <- which(rapply(x, function(x) is.character(x) || is.factor(x)))
  }
  c1 <- names(x)
  x <- as.matrix(x, rownames.force = makeRownames)
  d <- dimnames(x)
}
else {
  if (qset)
quote <- if (length(x))
  which(unlist(lapply(x, function(x) is.character(x) ||
is.factor(x))))
  else numeric()
  d <- list(if (makeRownames) row.names(x), if (makeColnames) names(x))
}
p <- ncol(x)
  }
  nocols <- p == 0L
  if (is.logical(quote))
quote <- NULL
  else if (is.numeric(quote)) {
if (any(quote < 1L | quote > p))
  stop("invalid numbers in 'quote'")
  }
  else stop("invalid 'quote' specification")
  rn <- FALSE
  rnames <- NULL
  if (is.logical(row.names)) {
if (row.names) {
  rnames <- as.character(d[[1L]])
  rn <- TRUE
}
  }
  else {
rnames <- as.character(row.names)
rn <- TRUE
if (length(rnames) != nrow(x))
  stop("invalid 'row.names' specification")
  }
  if (!is.null(quote) && rn)
quote <- c(0, quote)
  if (is.logical(col.names)) {
if (!rn && is.na(col.names))
  stop("'col.names = NA' makes no sense when 'row.names = FALSE'")
col.names <- if (is.na(col.names) && rn)
  c("", d[[2L]])
else if (col.names)
  d[[2L]]
else NULL
  }
  else {
col.names <- as.character(col.names)
if (length(col.names) != p)
  stop("invalid 'col.names' specification")
  }
  if (file == "")
file <- stdout()
  else if (is.character(file)) {
file <- if (nzchar(fileEncoding))
  file(file, ifelse(append, "a", "w"), encoding = fileEncoding)
else file(file, ifelse(append, "a", "w"))
on.exit(close(file))
  }
  else if (!isOpen(file, "w")) {
open(file, "w")
on.exit(close(file))
  }
  if (!inherits(file, "connection"))
stop("'file' must be a character string or connection")
  qstring <- switch(qmethod, escape = "\"", double = "\"\"")
  if (!is.null(col.names)) {
if (append)
  warning("appending column names to file")
if (quoteC)
  col.names <- paste0("\"", gsub("\"", qstring, col.names),
  "\"")
writeLines(paste(col.names, collapse = sep), file, sep = eol)
  }
  if (nrow(x) == 0L)
return(invisible())
  if (is.matrix(x) && !is.atomic(x))
mode(x) <- "character"
  if (is.data.frame(x)) {
x[] <- lapply(x, function(z) {
  if (is.object(z) && !is.factor(z))
as.character(z)
  else z
})
  }
  invisible(.External2(C_writetable, x, file, nrow(x), p, rnames,
sep, eol, na, dec, as.integer(quote), qmethod == "double"))
}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Inconsistencies when extracting with non-integer numeric indices near zero

2018-07-03 Thread Emil Bode
Dear R-devel,

When I was playing around with different kind of indices when subsetting I 
noticed some unexpected behaviours when using non-integer numeric indices, 
especially near zero.
From the docs: “Numeric values are coerced to integer as by 
as.integer<http://127.0.0.1:14277/help/library/base/help/as.integer> (and hence 
truncated towards zero).”
But some behaviour differs from that, and the behaviour also differs between [ 
and [[ :
c(1,2)[as.integer(.5)]   --> numeric(0)  # As expected
c(1,2)[.5]               --> numeric(0)  # As expected
c(1,2)[[as.integer(.5)]] --> Error in c(1, 2)[[as.integer(0.5)]] : attempt to 
select less than one element in integerOneIndex  # Also as expected
c(1,2)[[.5]]             --> [1] 1  # Not so expected
c(1,2)[[1.5]]            --> [1] 1  # As expected, but this also means 
somevector[[n]] and somevector[[n+1]] give back the same element for 0 < n < 1
c(1,2)[[-.5]]            --> [1] 2  # As would be expected, though a negative 
subscript for [[ is of course sketchy
c(1,2)[[-1.5]]           --> [1] 1  # But coercion to -2 is the last thing I'd 
expect
c(1,2)[as.integer(-.5)]  --> numeric(0)  # As expected
c(1,2)[-.5]              --> [1] 2  # Coerced to -1?, this also means that 
length(union(c(1,2)[-n], c(1,2)[n])) != 2 for -1 < n < 0
c(1,2)[-1.5]             --> [1] 2  # Again as expected, but same problem as 
before: indexing with n and n+1 can give the same element back.

I suspect most of this behaviour is due to the special treatment of zero, 
where 0-indices are first dropped, and only then is the casting to integer 
done; when that returns zero, some unforeseen behaviour occurs.
Along with using negative indices when extracting with [[, which in any case 
only succeeds with length-2 vectors (we need a length-one index resulting in 
the return of a single element). For the last case I'd think we'd do best in 
throwing an error whenever negative indices are used with [[, but for other 
cases I think we need to change the underlying code, or at the very least 
update documentation.
Any thoughts?

Best regards,
Emil Bode

Data-analyst

+31 6 43 83 89 33
emil.b...@dans.knaw.nl

DANS: Netherlands Institute for Permanent Access to Digital Research Resources
Anna van Saksenlaan 51 | 2593 HW Den Haag | +31 70 349 44 50 | 
i...@dans.knaw.nl | 
dans.knaw.nl
DANS is an institute of the Dutch Academy KNAW<http://knaw.nl/nl> and funding 
organisation NWO<http://www.nwo.nl/>.

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Date class shows Inf as NA; this confuses the use of is.na()

2018-06-12 Thread Emil Bode
I agree that calling it invalid is a bit confusing, but I’m not sure what the 
wording should be, as the problem is that the conversion to POSIXlt is failing.
The best solution would be to extend the whole POSIXlt-class, but that’s too 
much work.
I’ve done some experiments, and it also seems that the Date class can store 
larger values than POSIXlt:
> as.Date(8e9, origin='1970-01-01')==as.Date(9e9, origin='1970-01-01')
[1] FALSE
> as.POSIXlt(as.Date(8e9, origin='1970-01-01'))==as.POSIXlt(as.Date(9e9, 
> origin='1970-01-01'))
[1] TRUE
> as.POSIXlt(as.Date(8e9, origin='1970-01-01'))
[1] "-5877641-06-23 UTC"
# Same for 9e9
> as.Date(8e9, origin='1970-01-01')>Sys.Date()
[1] TRUE
> as.POSIXlt(as.Date(8e9, origin='1970-01-01'))>as.POSIXlt(Sys.Date())
[1] FALSE

So the situation as I see it now:

  *   Having an infinite date may convey some information, so we shouldn’t 
prohibit it anyway
  *   Idem for very large values (positive or negative)
  *   But we should warn users that their dates may not be neatly 
representable, and that the default print cannot show them
  *   So for values where the POSIXlt-print fails, I think it’s best to print 
the numerical value, along with some text warning the user
So I’ve adapted the format-function a bit more, with behaviour below.
The details can be adapted of course, but I feel it's best to print some 
variant of as.numeric(x) if as.POSIXlt(x) turns out to be unreliable, and 
further leave is.na() as it is.


format.Date <- function (x, ...)
{
  xx <- format(as.POSIXlt(x), ...)
  names(xx) <- names(x)
  if(any(!is.na(x) & (-719162>as.numeric(x) | as.numeric(x)>2932896))) {
xx[!is.na(x) & (-719162>as.numeric(x) | as.numeric(x)>2932896)] <-
  paste('Date with numerical value',as.numeric(x[!is.na(x) & 
(-719162>as.numeric(x) | as.numeric(x)>2932896)]))
warning('Some dates are not in the interval 0001-01-01 to 9999-12-31, 
showing numerical value.')
  }
  xx
}

With the following results:

> environment(print.Date) <- .GlobalEnv
> as.Date(Inf, origin='1970-01-01')
[1] "Date with numerical value Inf"
Warning message:
In format.Date(x) :
  Some dates are not in the interval 0001-01-01 to 9999-12-31, showing numerical 
value.



From: Gabe Becker 
Date: Monday, 11 June 2018 at 23:59
To: Emil Bode 
Cc: Joris Meys , Werner Grundlingh 
, "macque...@llnl.gov" , r-devel 

Subject: Re: [Rd] Date class shows Inf as NA; this confuses the use of is.na()

format.Date <- function (x, ...)
{
  xx <- format(as.POSIXlt(x), ...)
  names(xx) <- names(x)
  xx[is.na(xx) & !is.na(x)] <- paste('Invalid date:', as.numeric(x[is.na(xx) & 
!is.na(x)]))
  xx
}

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Date class shows Inf as NA; this confuses the use of is.na()

2018-06-11 Thread Emil Bode
I don't think there's much wrong with is.na(as_date(Inf, 
origin='1970-01-01'))==FALSE, as there still is some "non-NA-ness" about the 
value (as difftime shows); it is rather the printed output that is confusing. The 
way cat is treating it is clearer: it does print Inf.

So would this be a solution?

format.Date <- function (x, ...) 
{
  xx <- format(as.POSIXlt(x), ...)
  names(xx) <- names(x)
  xx[is.na(xx) & !is.na(x)] <- paste('Invalid date:',as.numeric(x[is.na(xx) & 
!is.na(x)]))
  xx
}

Which causes this behaviour, which I think is clearer:

environment(print.Date) <- .GlobalEnv
x <- as_date(Inf, origin='1970-01-01')
print(x)
# [1] "Invalid date: Inf"

Best regards, 
Emil Bode
 
Data-analyst
 
+31 6 43 83 89 33
emil.b...@dans.knaw.nl
 
DANS: Netherlands Institute for Permanent Access to Digital Research Resources
Anna van Saksenlaan 51 | 2593 HW Den Haag | +31 70 349 44 50 | 
i...@dans.knaw.nl | dans.knaw.nl 

DANS is an institute of the Dutch Academy KNAW <http://knaw.nl/nl> and funding 
organisation NWO <http://www.nwo.nl/>.
 
Who will be the winner of the Dutch Data Prize 2018? Go to researchdata.nl to 
nominate. 

On 09/06/2018, 13:52, "R-devel on behalf of Joris Meys" 
 wrote:

And now I've seen I copied the wrong part of ?is.na

> The default method for is.na applied to an atomic vector returns a
logical vector of the same length as its argument x, containing TRUE for
those elements marked NA or, for numeric or complex vectors, NaN, and FALSE
otherwise.

Key point being "atomic vector" here.


On Sat, Jun 9, 2018 at 1:41 PM, Joris Meys  wrote:

> Hi Werner,
>
> on ?is.na it says:
>
> > The default method for anyNA handles atomic vectors without a class and
> NULL.
>
> I hear you, and it is confusing to say the least. Looking deeper, the
> culprit seems to be in the conversion of a Date to POSIXlt prior to the
> formatting:
>
> > x <- as.Date(Inf,origin = '1970-01-01')
> > is.na(as.POSIXlt(x))
> [1] TRUE
>
> Given this implicit conversion, I'd argue that as.Date should really
> return NA as well when passed an infinite value. The other option is to
> provide an is.na method for the Date class, which is -given is.na is an
> internal generic- rather trivial:
>
> > is.na.Date <- function(x) is.na(as.POSIXlt(x))
> > is.na(x)
> [1] TRUE
>
> This might be a workaround for your current problem without needing
> changes to R itself. But this will give a "wrong" answer in the sense that
> this still works:
>
> > Sys.Date() - x
> Time difference of -Inf days
>
> I personally would go for NA as the "correct" date for an infinite value,
> but given that this will have implications in other areas, there is a
> possibility of breaking code and it should be investigated a bit further
> imho.
> Cheers
> Joris
>
>
>
>
> On Fri, Jun 8, 2018 at 11:21 PM, Werner Grundlingh 
> wrote:
>
>> Indeed. as_date is from lubridate, but the same holds for as.Date.
>>
>> The output and it's interpretation should be consistent, otherwise it
>> leads
>> to confusion when programming. I understand that the difference exists
>> after asking a question on Stack Overflow:
>>   https://stackoverflow.com/q/50766089/914686
>> This understanding is never mentioned in the documentation - that an Inf
>> date is actually represented as NA:
>>   https://www.rdocumentation.org/packages/base/versions/3.5.0/
>> topics/as.Date
>> So I'm of the impression that the display should be fixed as a first
>> option
>> (thereby providing clarity/transparency in terms of back-end and output),
>> or the documentation amended (to highlight this) as a second option.
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Department of Data Analysis and Mathematical Modelling
> Ghent University
> Coupure Links 653, B-9000 Gent (Belgium)
>
>
>
> ---
> Biowiskundedagen 2017-2018
> http://www.biowiskundedagen.ugent.be/
>