Re: [R] The end of Matlab

2008-12-12 Thread Wacek Kusnierczyk
Duncan Murdoch wrote:
 On 11/12/2008 9:45 PM, Mike Rowe wrote:
 Greetings!

 I come to R by way of Matlab.  One feature in Matlab I miss is its
 end keyword.  When you put end inside an indexing expression, it
 is interpreted as the length of the variable along the dimension being
 indexed.  For example, if the same feature were implemented in R:

 my.vector[5:end]

 would be equivalent to:

 my.vector[5:length(my.vector)]

 And if my.vector is of length less than 5?

 or:

 this.matrix[3:end,end]

 would be equivalent to:

 this.matrix[3:nrow(this.matrix),ncol(this.matrix)]   # or
 this.matrix[3:dim(this.matrix)[1],dim(this.matrix)[2]]

 As you can see, the R version requires more typing, and I am a lousy
 typist.

 It doesn't save typing, but a more readable version would be

 rows <- nrow(this.matrix)
 cols <- ncol(this.matrix)
 this.matrix[3:rows, cols]


and if nrow(this.matrix) is less than 3?



 With this in mind, I wanted to try to implement something like this in
 R.  It seems like that in order to be able to do this, I would have to
 be able to access the parse tree of the expression currently being
 evaluated by the interpreter from within my End function-- is this
 possible?  Since the [ and [[ operators are primitive I can't see
 their arguments via the call stack functions...

 Anyone got a workaround?  Would anybody else like to see this feature
 added to R?

 I like the general rule that subexpressions have values that can be
 evaluated independent of context, so I don't think this is a good idea.


but this 'general rule' is not really adhered to in r!  one example
already discussed here at length is subset:

subset(data.frame(...), select=a)

what will be selected?  column named a, or columns named by the
components of the vector a?  this is an example of how you can't say
what an expression means in a context-independent manner.  and this is
an ubiquitous problem in r.


vQ

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The end of Matlab

2008-12-12 Thread Prof Brian Ripley

On Fri, 12 Dec 2008, Duncan Murdoch wrote:


On 11/12/2008 9:45 PM, Mike Rowe wrote:

Greetings!

I come to R by way of Matlab.  One feature in Matlab I miss is its
end keyword.  When you put end inside an indexing expression, it
is interpreted as the length of the variable along the dimension being
indexed.  For example, if the same feature were implemented in R:

my.vector[5:end]

would be equivalent to:

my.vector[5:length(my.vector)]


And if my.vector is of length less than 5?


Also consider

my.vector[-(1:4)]
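
A minimal sketch of why the two forms differ on short vectors (the vector `short` is invented for the illustration): `5:length(x)` counts downwards when x has fewer than 5 elements, whereas negative indices simply drop whatever is there.

```r
short <- c(10, 20, 30)

# 5:length(short) is 5:3, i.e. c(5, 4, 3): out-of-range positions give NA
a <- short[5:length(short)]

# negative indices beyond the end are ignored, so nothing is left over
b <- short[-(1:4)]

# filtering seq_len() never counts down, so the explicit form is safe too
idx <- seq_len(length(short))
c5 <- short[idx[idx >= 5]]
```

Here `a` contains two NAs followed by 30, while `b` and `c5` are both empty.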


or:

this.matrix[3:end,end]

would be equivalent to:

this.matrix[3:nrow(this.matrix),ncol(this.matrix)]   # or
this.matrix[3:dim(this.matrix)[1],dim(this.matrix)[2]]

As you can see, the R version requires more typing, and I am a lousy
typist.


It doesn't save typing, but a more readable version would be

rows <- nrow(this.matrix)
cols <- ncol(this.matrix)
this.matrix[3:rows, cols]


I would have used

this.matrix[-(1:2), ncol(this.matrix)]

which I find much clearer as to its intentions.


With this in mind, I wanted to try to implement something like this in
R.  It seems like that in order to be able to do this, I would have to
be able to access the parse tree of the expression currently being
evaluated by the interpreter from within my End function-- is this
possible?  Since the [ and [[ operators are primitive I can't see
their arguments via the call stack functions...

Anyone got a workaround?  Would anybody else like to see this feature
added to R?


Learning to use the power of R's indexing and functions like head() and 
tail() (which are just syntactic sugar) will probably lead you not to miss 
this.
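
For instance, a small sketch of head() and tail() with negative counts on a plain vector (the vector `v` is only an example):

```r
v <- 1:10

from_fifth  <- tail(v, -4)   # everything except the first four elements
but_last3   <- head(v, -3)   # everything except the last three elements
last_one    <- tail(v, 1)    # the last element, i.e. Matlab's v(end)
```

Unlike `5:length(v)`, `tail(v, -4)` returns an empty vector rather than NAs when v has fewer than five elements.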


I like the general rule that subexpressions have values that can be evaluated 
independent of context, so I don't think this is a good idea.


Also, '[' is generic, so it would need to be done in such a way that it 
applied to all methods.  As arguments other than the first are passed 
unevaluated to the methods, I don't think this is really possible (you 
don't even know if the third argument to `[` is a dimension for a method).


Also, this would effectively make 'end' a reserved word, or 3:end is 
ambiguous (or at best context-dependent).


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax:  +44 1865 272595



Re: [R] The end of Matlab

2008-12-12 Thread Duncan Murdoch

On 12/12/2008 3:41 AM, Wacek Kusnierczyk wrote:

Duncan Murdoch wrote:

On 11/12/2008 9:45 PM, Mike Rowe wrote:

Greetings!

I come to R by way of Matlab.  One feature in Matlab I miss is its
end keyword.  When you put end inside an indexing expression, it
is interpreted as the length of the variable along the dimension being
indexed.  For example, if the same feature were implemented in R:

my.vector[5:end]

would be equivalent to:

my.vector[5:length(my.vector)]

And if my.vector is of length less than 5?

or:

this.matrix[3:end,end]

would be equivalent to:

this.matrix[3:nrow(this.matrix),ncol(this.matrix)]   # or
this.matrix[3:dim(this.matrix)[1],dim(this.matrix)[2]]

As you can see, the R version requires more typing, and I am a lousy
typist.

It doesn't save typing, but a more readable version would be

rows <- nrow(this.matrix)
cols <- ncol(this.matrix)
this.matrix[3:rows, cols]



and if nrow(this.matrix) is less than 3?



With this in mind, I wanted to try to implement something like this in
R.  It seems like that in order to be able to do this, I would have to
be able to access the parse tree of the expression currently being
evaluated by the interpreter from within my End function-- is this
possible?  Since the [ and [[ operators are primitive I can't see
their arguments via the call stack functions...

Anyone got a workaround?  Would anybody else like to see this feature
added to R?

I like the general rule that subexpressions have values that can be
evaluated independent of context, so I don't think this is a good idea.



but this 'general rule' is not really adhered to in r!  one example
already discussed here at length is subset:

subset(data.frame(...), select=a)

what will be selected?  column named a, or columns named by the
components of the vector a?  this is an example of how you can't say
what an expression means in a context-independent manner.  


From which you might conclude that I don't like the design of subset, 
and you'd be right.  However, I don't think this is a counterexample to 
my general rule.  In the subset function, the select argument is treated 
as an unevaluated expression, and then there are rules about what to do 
with it.  (I.e. try to look up name `a` in the data frame, if that 
fails, ...)
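
A small sketch of that lookup order, assuming the behaviour of subset() for data frames in current R (the data frames and the variable `a` are made up for the example):

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
a  <- c("b", "c")          # a variable that happens to share a column's name

# 'a' is found among the column names first, so column a is selected
with_col <- names(subset(df, select = a))

# without a column called a, the same expression falls back to the variable
df2 <- df[, c("b", "c")]
without_col <- names(subset(df2, select = a))
```

The first call selects the single column a; the second selects columns b and c, although the `select = a` argument is written identically.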


For the requested behaviour to similarly fall within the general rule, 
we'd have to treat all indices to all kinds of things (vectors, 
matrices, dataframes, etc.) as unevaluated expressions, with special 
handling for the particular symbol `end`.  But Mike wanted an End 
function, so presumably he wanted the old behaviour of indexing, but to 
have a function whose value depended on where it was called from.  We do 
have those (e.g. the functions for examining the stack that Mike wanted 
to make use of), and they're needed for debugging and a few special 
cases, but as a general rule they should be avoided.


 and this is

an ubiquitous problem in r.


I don't think so.

Duncan Murdoch



Re: [R] The end of Matlab

2008-12-12 Thread Claudia Beleites
Dear list,

 Learning to use the power of R's indexing and functions like head() and
 tail() (which are just syntactic sugar) will probably lead you not to miss
 this.
However, how do I exclude the last columns of a data.frame or matrix (or, in 
general, head and tail for given dimensions of an array)?

I.e. something nicer than 
t (head (t (x), -n))
for excluding the last n columns of matrix x

THX, Claudia


-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: cbelei...@units.it



Re: [R] The end of Matlab

2008-12-12 Thread Patrick Burns

How about:

x[, -seq(to=ncol(x), length=n)]
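
A sketch comparing this with the other spellings mentioned in the thread, on a made-up 3 x 4 matrix:

```r
x <- matrix(1:12, nrow = 3)   # 3 rows, 4 columns
n <- 2                        # number of trailing columns to drop

drop_neg  <- x[, -seq(to = ncol(x), length.out = n)]  # exclude columns 3 and 4
drop_pos  <- x[, 1:(ncol(x) - n)]                     # keep columns 1 and 2
drop_head <- t(head(t(x), -n))                        # transpose, trim rows, transpose back

# caveat: an empty exclusion vector selects no columns at all,
# because -integer(0) is still integer(0)
none <- x[, -integer(0), drop = FALSE]
```

All three forms keep the first two columns here. The exclusion form needs care when n may be 0, and the `1:(ncol(x) - n)` form needs care when n may equal ncol(x), since `1:0` counts down.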


Patrick Burns
patr...@burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

Claudia Beleites wrote:

Dear list,

  

Learning to use the power of R's indexing and functions like head() and
tail() (which are just syntactic sugar) will probably lead you not to miss
this.

However, how do I exclude the last columns of a data.frame or matrix (or, in 
general, head and tail for given dimensions of an array)?


I.e. something nicer than 
t (head (t (x), -n))

for excluding the last n columns of matrix x

THX, Claudia







Re: [R] The end of Matlab

2008-12-12 Thread Wacek Kusnierczyk
Duncan Murdoch wrote:
 On 12/12/2008 3:41 AM, Wacek Kusnierczyk wrote:

 but this 'general rule' is not really adhered to in r!  one example
 already discussed here at length is subset:

 subset(data.frame(...), select=a)

 what will be selected?  column named a, or columns named by the
 components of the vector a?  this is an example of how you can't say
 what an expression means in a context-independent manner.  

 From which you might conclude that I don't like the design of subset,
 and you'd be right.  However, I don't think this is a counterexample
 to my general rule.  In the subset function, the select argument is
 treated as an unevaluated expression, and then there are rules about
 what to do with it.  (I.e. try to look up name `a` in the data frame,
 if that fails, ...)

 For the requested behaviour to similarly fall within the general rule,
 we'd have to treat all indices to all kinds of things (vectors,
 matrices, dataframes, etc.) as unevaluated expressions, with special
 handling for the particular symbol `end`.  But Mike wanted an End
 function, so presumably he wanted the old behaviour of indexing, but
 to have a function whose value depended on where it was called from. 
 We do have those (e.g. the functions for examining the stack that Mike
 wanted to make use of), and they're needed for debugging and a few
 special cases, but as a general rule they should be avoided.

  and this is
 an ubiquitous problem in r.

 I don't think so.


i'd think that neither the 'evaluate' nor the 'deparse' approaches to
establishing the values of arguments are particularly rare in r.

vQ



Re: [R] The end of Matlab

2008-12-12 Thread Wacek Kusnierczyk
Duncan Murdoch wrote:
 On 11/12/2008 9:45 PM, Mike Rowe wrote:


 this.matrix[3:end,end]

 would be equivalent to:

 this.matrix[3:nrow(this.matrix),ncol(this.matrix)]   # or
 this.matrix[3:dim(this.matrix)[1],dim(this.matrix)[2]]

 As you can see, the R version requires more typing, and I am a lousy
 typist.

 It doesn't save typing, but a more readable version would be

 rows <- nrow(this.matrix)
 cols <- ncol(this.matrix)
 this.matrix[3:rows, cols]


 With this in mind, I wanted to try to implement something like this in
 R.  It seems like that in order to be able to do this, I would have to
 be able to access the parse tree of the expression currently being
 evaluated by the interpreter from within my End function-- is this
 possible?  Since the [ and [[ operators are primitive I can't see
 their arguments via the call stack functions...

 Anyone got a workaround?  Would anybody else like to see this feature
 added to R?

 I like the general rule that subexpressions have values that can be
 evaluated independent of context, so I don't think this is a good idea.

if 'end' poses a problem to the general rule of context-free
establishment of the values of expressions, the python way might be
another option:

x[3:] 

instead of 

x[3:length(x)]
x[3:end]

(modulo 0-based indexing in python)  could this be considered?  laziness
seems to be considered a virtue here, and r is stuffed with 'features'
designed by lazy programmers to avoid, e.g., typing quotes; why would
not having to type 'length(...)' or 'nrow(...)' etc. be considered
annoying?

vQ



Re: [R] The end of Matlab

2008-12-12 Thread hadley wickham
 From which you might conclude that I don't like the design of subset, and
 you'd be right.  However, I don't think this is a counterexample to my
 general rule.  In the subset function, the select argument is treated as an
 unevaluated expression, and then there are rules about what to do with it.
  (I.e. try to look up name `a` in the data frame, if that fails, ...)

 For the requested behaviour to similarly fall within the general rule, we'd
 have to treat all indices to all kinds of things (vectors, matrices,
 dataframes, etc.) as unevaluated expressions, with special handling for the
 particular symbol `end`.

Except you wouldn't have to necessarily change indexing - you could
change seq instead.  Then 5:end could produce some kind of special
data structure (maybe an iterator) that was recognised by the various
indexing functions.  This would still be a lot of work for not a lot
of payoff, but it would be a logically consistent way of adding this
behaviour to indexing, and the basic work would make it possible to
develop other sorts of indexing, eg df[evens(), ], or df[last(5),
last(3)].

Hadley

-- 
http://had.co.nz/



Re: [R] The end of Matlab

2008-12-12 Thread Claudia Beleites
Am Freitag 12 Dezember 2008 13:10:20 schrieb Patrick Burns:
 How about:

 x[, -seq(to=ncol(x), length=n)]
Doing it is not my problem. I just agree with Mike in that I would like if I 
could do shorter than: 

x[, 1 : (ncol(x) - n)] 

which I btw prefer to your solution.

Also, I don't have a problem writing generalized versions of head and tail to 
work along other/more dimensions. Or a combined function, taking first and last 
arguments. 

Still, they would not be as convenient to use as matlab's:
3 : end - 4
which btw. also does not need parentheses.

I guess the general problem is that there is only one thing with integers that 
can easily be (ab)used as a flag: the negative sign. 

But there are (at least) 2 possibly useful special ways of indexing: 
- exclusion (as in R)
- using -n for end - n (as in perl)

Now we enjoy having a shortcut for exclusion (at least I do), but still feel 
that marking from the end would be useful.

As no other signs (in the sense of flag) are available for integers, we won't 
be able to stop typing somewhat more in R.

Wacek:
 x[3:]
 instead of
 x[3:length(x)]
 x[3:end]
I don't think that would help: 
what to use for end - 3 within the convention that negative values mean 
exclusion?




--- now I start dreaming ---

However, it is possible to define new binary operators (operators are great for 
lazy typing...).

Let's say %:% should be a new operator to generate proper indexing sequences 
to be used inside [ :
e.g. an.array [ 1:3, -2 %:% -5, ...]

If we now find an.array which is x inside [ (and also inside [[) - which is 
possible but maybe a bit fiddly

and if we can also find out which of the indices is actually evaluated (which I 
don't know how to do)

then we could use something* as a flag for from the end and calculate the 
proper sequence.

something* could e.g. be 
either an attribute to the operators (convenient if we can define an unary 
operator that allows setting it, e.g. § 3 [§ is the easy-to-type sign on my 
keyboard that is not yet used...])

or i (the imaginary one) if there is no other convenient unary operator e.g. 
3i

easy part of the solution:
make.index <- function (x, along.dim = 1, from, to){
  ## extent of x along the requested dimension
  if (is.null (dim (x)))
    dim <- length (x)
  else
    dim <- dim (x)[along.dim]

  if (is.complex (from)){
    from <- dim - Im (from) # 0i means end, 3i means end - 3
    ## warning if Re (from) != 0 ?
  }
  if (is.complex (to)){
    to <- dim - Im (to) # 0i means end, 3i means end - 3
    ## warning if Re (to) != 0 ?
  }

  from : to
}

`%:%` <- function (e1, e2)  ## using a new operator does not mess up :
  make.index (x = find.x (), along.dim = find.dim (), e1, e2)

now, the heavy part are the still missing find.x () and find.dim () functions...

I'm not sure whether this would be worth the work, but maybe someone is around 
who just knows how to do this.


Claudia

-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: cbelei...@units.it



Re: [R] The end of Matlab

2008-12-12 Thread Wacek Kusnierczyk
Claudia Beleites wrote:
 Am Freitag 12 Dezember 2008 13:10:20 schrieb Patrick Burns:
   
 How about:

 x[, -seq(to=ncol(x), length=n)]
 
 Doing it is not my problem. I just agree with Mike in that I would like if I 
 could do shorter than: 

 x[, 1 : (ncol(x) - n)] 

 which I btw prefer to your solution.

 Also, I don't have a problem writing generalized versions of head and tail to 
 work along other/more dimensions. Or a combined function, taking first and 
 last 
 arguments. 

 Still, they would not be as convenient to use as matlab's:
 3 : end - 4
 which btw. also does not need parentheses.

 I guess the general problem is that there is only one thing with integers 
 that 
 can easily be (ab)used as a flag: the negative sign. 

 But there are (at least) 2 possibly useful special ways of indexing: 
 - exclusion (as in R)
 - using -n for end - n (as in perl)

 Now we enjoy having a shortcut for exclusion (at least I do), but still feel 
 that marking from the end would be useful.

 As no other signs (in the sense of flag) are available for integers, we won't 
 be able to stop typing somewhat more in R.

 Wacek:
   
 x[3:]
 instead of
 x[3:length(x)]
 x[3:end]
 
 I don't think that would help: 
 what to use for end - 3 within the convention that negative values mean 
 exclusion?


   

might seem tricky, but not impossible:

x[-2]
# could mean 'all except for 2nd', as it is now

x[1:-2]
# could mean 'from start to the 2nd backwards from the end'

since r disallows mixing positive and negative indexing, the above would
not be ambiguous.  worse with

x[-3:-1]

which could mean both 'except for 3rd, 2nd, and 1st' and 'from the 3rd
to the 1st from the end', and so would be ambiguous.  in this context,
indeed, having explicit 'end' could help avoid the ambiguity.
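
for reference, a short sketch of what these expressions do in r as it stands (the vector x is just an example):

```r
x <- 1:10

# 1:-2 expands to c(1, 0, -1, -2); mixing signs is an error today
mixed <- tryCatch(x[1:-2], error = function(e) "error")

# -3:-1 is c(-3, -2, -1): exclusion of the first three elements
excl <- x[-3:-1]

# "third-from-last to last" has to be spelled out via length()
from_end <- x[length(x) - 2:0]
```

so x[1:-2] currently fails, x[-3:-1] already has the exclusion meaning, and counting from the end goes through length().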


vQ



Re: [R] The end of Matlab

2008-12-12 Thread Duncan Murdoch

On 12/12/2008 8:25 AM, hadley wickham wrote:

From which you might conclude that I don't like the design of subset, and
you'd be right.  However, I don't think this is a counterexample to my
general rule.  In the subset function, the select argument is treated as an
unevaluated expression, and then there are rules about what to do with it.
 (I.e. try to look up name `a` in the data frame, if that fails, ...)

For the requested behaviour to similarly fall within the general rule, we'd
have to treat all indices to all kinds of things (vectors, matrices,
dataframes, etc.) as unevaluated expressions, with special handling for the
particular symbol `end`.


Except you wouldn't have to necessarily change indexing - you could
change seq instead.  Then 5:end could produce some kind of special
data structure (maybe an iterator) that was recognised by the various
indexing functions. 


Ummm, doesn't that require changes to *both* indexing and seq?


This would still be a lot of work for not a lot
of payoff, but it would be a logically consistent way of adding this
behaviour to indexing, and the basic work would make it possible to
develop other sorts of indexing, eg df[evens(), ], or df[last(5),
last(3)].


I agree:  it would be a nice addition, but a fair bit of work.  I think 
it would be quite doable for the indexable things in the base packages, 
but there are a lot of contributed packages that define [ methods, and 
those methods would all need to be modified too.


(Just to be clear, when I say doable, I'm thinking that your iterators 
return functions that compute subsets of index ranges.  For example, 
evens() might be implemented as


evens <- function() {
  result <- function(indices) {
    indices[indices %% 2 == 0]
  }
  class(result) <- "iterator"
  return(result)
}

and then `[` in v[evens()] would recognize that it had been passed an 
iterator, and would pass 1:length(v) to the iterator to get the subset 
of even indices.  Is that what you had in mind?)
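
A minimal sketch of how this could be tried out without touching the base `[`, assuming a hypothetical vector subclass "ivec" whose `[` method knows about such iterators (the class name, `ivec()` and `last()` are invented here for illustration):

```r
evens <- function() {
  result <- function(indices) indices[indices %% 2 == 0]
  class(result) <- "iterator"
  result
}

last <- function(n) {
  result <- function(indices) tail(indices, n)
  class(result) <- "iterator"
  result
}

# hypothetical wrapper class; only its `[` method understands iterators
ivec <- function(x) structure(x, class = c("ivec", class(x)))

`[.ivec` <- function(x, i) {
  # an iterator is handed the full index range and returns the subset to use
  if (inherits(i, "iterator")) i <- i(seq_along(x))
  ivec(unclass(x)[i])
}

v <- ivec(letters[1:10])
even_elems <- unclass(v[evens()])
last_three <- unclass(v[last(3)])
```

This only covers one class; the point made above stands that every existing `[` method would need the same treatment.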


Duncan



Re: [R] The end of Matlab

2008-12-12 Thread Wacek Kusnierczyk

 Wacek:
   
 
 x[3:]
 instead of
 x[3:length(x)]
 x[3:end]
 
   
 I don't think that would help: 
 what to use for end - 3 within the convention that negative values mean 
 exclusion?


   
 

 might seem tricky, but not impossible:

 x[-2]
 # could mean 'all except for 2nd', as it is now

 x[1:-2]
 # could mean 'from start to the 2nd backwards from the end'

 since r disallows mixing positive and negative indexing, the above would
 not be ambiguous.  worse with

 x[-3:-1]

 which could mean both 'except for 3rd, 2nd, and 1st' and 'from the 3rd
 to the 1st from the end', and so would be ambiguous.  in this context,
 indeed, having explicit 'end' could help avoid the ambiguity.

   

on the other hand, another possible solution would be to have ':' mean,
inside range selection expressions, not the usual sequence generation,
but rather specification of start and end indices:

x[1:2]
# from 1st to 2nd, inclusive

x[seq(1,2)]
# same as above

x[c(1,2)]
# same as above

x[1:-2]
# from 1st to 2nd from the end, not x[c(1,0,-1,-2)]

x[seq(1,-2)]
# no way, mixed indices

x[-2:-1]
# from 2nd to 1st, both from the end, not x[c(-2,-1)]

x[length(x) + -1:0]
# same as above

x[seq(-2,-1)]
# except for 2nd and 1st

x[c(-2,-1)]
# same as above

x[2:]
# from 2nd up

x[seq(2, max(2, length(x)))]
# same as above (would not be without max)

x[:3]
# up to 3rd

x[seq(1,3)]
# same as above

x[:-3]
# up to 3rd from the end

x[seq(1, length(x)-2)]
# same as above

with additional specifications for the behaviour in case of invalid
indices and decreasing indices:

x[2:1]
# from the 2nd to the 1st, in reverse order
# or nothing, invalid indexing

which can be easily done with the unambiguous x[seq(2,1)] or x[c(2,1)]

this is daydreaming, of course, because such modifications would break
much old code, and the benefit may not outweigh the effort.
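
short of changing ':' itself, the same semantics can be sketched as an ordinary function; `slice` is a hypothetical helper, not an existing r function, and it assumes that -1 means the last element, -2 the one before it, and so on:

```r
slice <- function(x, from = 1, to = -1) {
  n <- length(x)
  # negative positions count backwards from the end: -1 is the last element
  if (from < 0) from <- n + 1 + from
  if (to < 0)   to   <- n + 1 + to
  # an empty range yields an empty result instead of counting down
  if (from > to) return(x[0])
  x[from:to]
}

x <- 1:10
s1 <- slice(x, 1, -2)     # like the proposed x[1:-2]
s2 <- slice(x, -2, -1)    # like the proposed x[-2:-1]
s3 <- slice(x, 2)         # like the proposed x[2:]
s4 <- slice(x, to = -3)   # like the proposed x[:-3]
s5 <- slice(1:3, 5)       # start beyond the end
```

this keeps the existing meaning of x[...] intact, at the cost of a function call instead of new syntax.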

vQ



Re: [R] The end of Matlab

2008-12-12 Thread Gabor Grothendieck
Here is how to emulate matlab end in R in the case of matrices.

Rather than redefine the matrix class (which would be a bit intrusive) we
just define a subclass of matrix called matrix2.  Note in the examples that
matrix2 survives some operations such as + but not others such as crossprod
so in those one would have to coerce back to matrix2 using as.matrix2.


as.matrix2 <- function(x, ...) UseMethod("as.matrix2")

as.matrix2.default <- function(x, ...) {
   # put "matrix2" in front of the existing class vector
   do.call("structure", list(x, ...,
      class = c("matrix2", setdiff(class(x), "matrix2"))))
}

matrix2 <- function(data, ...) as.matrix2(matrix(data, ...))

"[.matrix2" <- function(x, i, j, ...) {
   # replace the symbol `end` in each index expression by the extent of
   # that dimension, then evaluate the result in the caller's frame
   i <- if (missing(i)) TRUE
   else eval.parent(do.call(substitute,
      list(substitute(i), list(end = nrow(x)))))
   j <- if (missing(j)) TRUE
   else eval.parent(do.call(substitute,
      list(substitute(j), list(end = ncol(x)))))
   .subset(x, i, j, ...)
}


> # test
> m <- matrix2(1:12, 3, 4)
> # matrix2 survives the + operation
> class(m+2)
[1] "matrix2" "matrix"

> # but not crossprod
> class(crossprod(m))
[1] "matrix"

> # coercing back
> as.matrix2(crossprod(m))
     [,1] [,2] [,3] [,4]
[1,]   14   32   50   68
[2,]   32   77  122  167
[3,]   50  122  194  266
[4,]   68  167  266  365
attr(,"class")
[1] "matrix2" "matrix"

> # example of using end
> m[2:end, 2:end]
     [,1] [,2] [,3]
[1,]    5    8   11
[2,]    6    9   12
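
The same idea can be sketched for plain vectors. This is only an illustration under assumptions: the class name "endvec" is hypothetical, and it binds `end` through the envir argument of eval() rather than through the double substitute used above.

```r
# hypothetical vector subclass whose `[` method understands `end`
endvec <- function(x) structure(x, class = c("endvec", class(x)))

`[.endvec` <- function(x, i) {
  # evaluate the index expression with `end` bound to the length of x;
  # every other symbol is looked up in the caller's frame
  i <- eval(substitute(i), list(end = length(x)), parent.frame())
  unclass(x)[i]
}

v <- endvec(letters[1:6])
from_third <- v[3:end]
last_elem  <- v[end]
but_last2  <- v[1:(end - 2)]
```

Note that `1:(end - 2)` needs its parentheses in R, because `:` binds more tightly than binary minus.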



On Thu, Dec 11, 2008 at 9:45 PM, Mike Rowe <mwr...@gmail.com> wrote:
 Greetings!

 I come to R by way of Matlab.  One feature in Matlab I miss is its
 end keyword.  When you put end inside an indexing expression, it
 is interpreted as the length of the variable along the dimension being
 indexed.  For example, if the same feature were implemented in R:

 my.vector[5:end]

 would be equivalent to:

 my.vector[5:length(my.vector)]

 or:

 this.matrix[3:end,end]

 would be equivalent to:

 this.matrix[3:nrow(this.matrix),ncol(this.matrix)]   # or
 this.matrix[3:dim(this.matrix)[1],dim(this.matrix)[2]]

 As you can see, the R version requires more typing, and I am a lousy
 typist.

 With this in mind, I wanted to try to implement something like this in
 R.  It seems like that in order to be able to do this, I would have to
 be able to access the parse tree of the expression currently being
 evaluated by the interpreter from within my End function-- is this
 possible?  Since the [ and [[ operators are primitive I can't see
 their arguments via the call stack functions...

 Anyone got a workaround?  Would anybody else like to see this feature
 added to R?

 Thanks,
 Mike



Re: [R] The end of Matlab

2008-12-12 Thread Claudia Beleites
I just realized that my idea of doing something without going into the 
extraction functions itself won't work 
:-( it was a nice dream, though.

The reason is that there is no general way to find out what the needed length 
is: At least I'm just writing a class where 2 kinds of columns are involved. I 
don't give a dim attribute, though. But I could, and then: how to know how it 
should be interpreted?

 on the other hand, another possible solution would be to have ':' mean,
 inside range selection expressions, not the usual sequence generation,
 but rather specification of start and end indices:
...
 this is daydreaming, of course, because such modifications would break
 much old code,
nothing would break if some other sign instead of : would be used. Maybe 
something like end...

 and the benefit may not outweigh the effort.
This might be true in any case.

If I only think of how many lines of nrow, ncol, length & Co I could have 
written instead of posting wrong proposals

Claudia

-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: cbelei...@units.it



Re: [R] The end of Matlab

2008-12-12 Thread Wacek Kusnierczyk
Claudia Beleites wrote:
 Wacek:
   
 x[3:]
 instead of
 x[3:length(x)]
 x[3:end]
 
 I don't think that would help:
 what to use for end - 3 within the convention that negative values mean
 exclusion?
   
 might seem tricky, but not impossible:

 x[-2]
 # could mean 'all except for 2nd', as it is now

 x[1:-2]
 # could mean 'from start to the 2nd backwards from the end'
 
 I know you get thus far. You might even think to decide whether exclusion or 
 'from the end' is meant from ascending ./. descending order of the sequence, 
 but this messes around with returning the reverse order.

   

that's a design issue.  one simple solution is to have this sort of
indexing return always in ascending order.  thus,

x = 1:5
x[1:-1]
# 1 2 3 4 5
x[5:-5]
# NULL rather than 5 4 3 2 1 -- as in matlab or python

x[seq(5,1)]
# 5 4 3 2 1

that is, the ':'-based indexing can be made not to mess with the order. 
for reversing the order, why not use:

x[5:-1:1]
# 5 4 3 2 1

x[-3:-1:-5]
# 3 2 1 rather than x[c(-3,-4,-5)], which would be 1 2


 since r disallows mixing positive and negative indexing, the above would
 not be ambiguous.  worse with

 x[-3:-1]

 which could mean both 'except for 3rd, 2nd, and 1st' and 'from the 3rd
 to the 1st from the end', and so would be ambiguous.  in this context,
 indeed, having explicit 'end' could help avoid the ambiguity.
 
 that's the problem.
 also: how would 'except from the 5th last to the 3rd last' be expressed?
   

for exclusions you'd need to use negative indices anyway:

x[seq(-5,-3)]

now, neither x[-5:-3] nor x[-3:-5] would do the job they do now, but the
above is not particularly longer, while selecting the
5th-to-3rd-from-the-end columns is simply x[-5:-3] (which could be made
to fail on out-of-range indices) instead of something like x[length(x) -
4:2] (which will silently do the wrong thing if length(x) < 4, and thus
requires extra care).
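
a small sketch of that failure mode in current r; the helper name is made up for the example:

```r
# 5th-to-3rd from the end, written with length()
fifth_to_third_last <- function(x) x[length(x) - 4:2]

ok     <- fifth_to_third_last(1:6)   # indices 2, 3, 4
short4 <- fifth_to_third_last(1:4)   # indices 0, 1, 2: the zero is dropped
short2 <- fifth_to_third_last(1:2)   # indices -2, -1, 0: becomes an exclusion
short3 <- tryCatch(fifth_to_third_last(1:3),   # indices -1, 0, 1: mixed signs
                   error = function(e) "error")
```

for lengths 4 and 2 a result comes back without any warning; only length 3 happens to raise an error, because the indices then mix signs.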

this is a rather loose idea, and unrealistic in the context of r, but i
do not see much problem with it on the conceptual level.

vQ



Re: [R] The end of Matlab

2008-12-12 Thread Greg Snow
Just to muddy the waters a bit further.  Currently we can do things like:

> pascal.tri <- numeric(0)
> class(pascal.tri) <- 'pasctri'

> `[.pasctri` <- function(x, ...) {
+ dots <- list(...)
+ n <- dots[[1]]
+ row <- choose(n, 0:n)
+ if(length(dots) > 1) {
+ row <- row[ dots[[2]] ]
+ }
+ row
+ }

> pascal.tri[4]
[1] 1 4 6 4 1
> pascal.tri[4,2]
[1] 4

Now whether that is clever or abusive, I'm not sure (probably not clever).

But what would we expect:

> pascal.tri[end]

to return?

Also if we can access the last element of a vector as:

> x[end]

(which I am not opposed to, just don't know if it is worth the effort) then how 
long will it be before someone wants to be able to do:

> x[end+1] <- new.value

and put that in a loop, which would lead to very poor programming practice (but 
so easy it would tempt many).
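
To make the concern concrete, here is a minimal sketch comparing the append-in-a-loop pattern with preallocation. The function names are invented for the example, and how much the growing version costs depends on the R version, so no timings are claimed:

```r
# Growing: each assignment past the end may force R to reallocate and copy x.
grow <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) x[length(x) + 1] <- i^2
  x
}

# Preallocating: the vector is created once at its final length.
prealloc <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) x[i] <- i^2
  x
}

# Both give the same result; only the memory behaviour differs.
stopifnot(identical(grow(1000), prealloc(1000)))
```

Where the computation allows it, the vectorised form seq_len(n)^2 avoids the loop altogether.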

Just my $0.015 worth,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Wacek Kusnierczyk
 Sent: Friday, December 12, 2008 8:57 AM
 To: claudia.belei...@gmx.de
 Cc: R help
 Subject: Re: [R] The end of Matlab

 Claudia Beleites wrote:
  Wacek:
 
  x[3:]
  instead of
  x[3:length(x)]
  x[3:end]
 
  I don't think that would help:
  what to use for end - 3 within the convention that negative values mean
  exclusion?
 
  might seem tricky, but not impossible:
 
  x[-2]
  # could mean 'all except for 2nd', as it is now
 
  x[1:-2]
  # could mean 'from start to the 2nd backwards from the end'
 


Re: [R] The end of Matlab

2008-12-12 Thread hadley wickham
On Fri, Dec 12, 2008 at 8:41 AM, Duncan Murdoch murd...@stats.uwo.ca wrote:
 On 12/12/2008 8:25 AM, hadley wickham wrote:

 From which you might conclude that I don't like the design of subset, and
 you'd be right.  However, I don't think this is a counterexample to my
 general rule.  In the subset function, the select argument is treated as
 an
 unevaluated expression, and then there are rules about what to do with
 it.
  (I.e. try to look up name `a` in the data frame, if that fails, ...)

 For the requested behaviour to similarly fall within the general rule,
 we'd
 have to treat all indices to all kinds of things (vectors, matrices,
 dataframes, etc.) as unevaluated expressions, with special handling for
 the
 particular symbol `end`.

 Except you wouldn't have to necessarily change indexing - you could
 change seq instead.  Then 5:end could produce some kind of special
 data structure (maybe an iterator) that was recognised by the various
 indexing functions.

 Ummm, doesn't that require changes to *both* indexing and seq?

Ooops, yes.  I meant it wouldn't require indexing to use unevaluated
expression.

 This would still be a lot of work for not a lot
 of payoff, but it would be a logically consistent way of adding this
 behaviour to indexing, and the basic work would make it possible to
 develop other sorts of indexing, eg df[evens(), ], or df[last(5),
 last(3)].

 I agree:  it would be a nice addition, but a fair bit of work.  I think it
 would be quite doable for the indexable things in the base packages, but
 there are a lot of contributed packages that define [ methods, and those
 methods would all need to be modified too.

That's true, although I suspect many contributed [.methods eventually
delegate to base methods and might work without further modification.

 (Just to be clear, when I say doable, I'm thinking that your iterators
 return functions that compute subsets of index ranges.  For example, evens()
 might be implemented as

 evens <- function() {
   result <- function(indices) {
     indices[indices %% 2 == 0]
   }
   class(result) <- "iterator"
   return(result)
 }

 and then `[` in v[evens()] would recognize that it had been passed an
 iterator, and would pass 1:length(v) to the iterator to get the subset of
 even indices.  Is that what you had in mind?)

Yes, that's exactly what I was thinking, although you'd have to put
some thought into the conventions - would it be better to pass in the
length of the vector instead of a vector of indices?  Should all
iterators return logical vectors?  That way you could do x[evens() &
last(5)] to get the even indices out of the last 5, as opposed to
x[evens()][last(5)] which would return the last 5 even indices.

You could also imagine similar iterators for random sampling, like
samp(0.2) to choose 20% of the indices, or boot(0.8) to choose 80%
with replacement.  first(n) could also be useful, selecting the first
min(n, length(vector)) observations.   An iterator version of rev()
would also be handy.

Maybe selector would be a better name than iterator though, as these
don't have the same feel as iterators in other languages.
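
A minimal sketch of the selector idea as described in this message, assuming each selector is given the length of the object and returns a logical vector. Since `[` itself is not changed here, a hypothetical helper pick() stands in for a selector-aware `[`; all names are illustrative only:

```r
# Hypothetical constructor: a selector is a classed function of the length n.
selector <- function(f) structure(f, class = "selector")

evens <- function()  selector(function(n) seq_len(n) %% 2 == 0)
first <- function(k) selector(function(n) seq_len(n) <= k)
last  <- function(k) selector(function(n) seq_len(n) > n - k)

# Stand-in for what `[` would do on receiving a selector.
pick <- function(x, s) x[s(length(x))]

x <- 11:20
pick(x, evens())     # 12 14 16 18 20
pick(x, last(3))     # 18 19 20
pick(x, first(50))   # all ten elements: first(k) never runs past the end
```

Returning logical vectors keeps every selector the same length as the object, which is what makes combining them straightforward.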

Hadley

-- 
http://had.co.nz/



Re: [R] The end of Matlab

2008-12-12 Thread Vitalie Spinu

On Fri, 12 Dec 2008 17:38:13 +0100, hadley wickham h.wick...@gmail.com wrote:


You could also imagine similar iterators for random sampling, like
samp(0.2) to choose 20% of the indices, or boot(0.8) to choose 80%
with replacement.  first(n) could also be useful, selecting the first
min(n, length(vector)) observations.   An iterator version of rev()
would also be handy.

Maybe selector would be a better name than iterator though, as these
don't have the same feel as iterators in other languages.


That is really something!! Real high level language!!
Selectors could depend on named variables in data frame as well:

mtcars[sel(cyl>3)&last(5)]
mtcars[sel(cyl>3)&boot(80%)]

or maybe just


mtcars[cyl>3&last(20)]

or this is already too far?


VS.



Re: [R] The end of Matlab

2008-12-12 Thread Duncan Murdoch

On 12/12/2008 11:38 AM, hadley wickham wrote:

On Fri, Dec 12, 2008 at 8:41 AM, Duncan Murdoch murd...@stats.uwo.ca wrote:

On 12/12/2008 8:25 AM, hadley wickham wrote:


From which you might conclude that I don't like the design of subset, and
you'd be right.  However, I don't think this is a counterexample to my
general rule.  In the subset function, the select argument is treated as
an
unevaluated expression, and then there are rules about what to do with
it.
 (I.e. try to look up name `a` in the data frame, if that fails, ...)

For the requested behaviour to similarly fall within the general rule,
we'd
have to treat all indices to all kinds of things (vectors, matrices,
dataframes, etc.) as unevaluated expressions, with special handling for
the
particular symbol `end`.


Except you wouldn't have to necessarily change indexing - you could
change seq instead.  Then 5:end could produce some kind of special
data structure (maybe an iterator) that was recognised by the various
indexing functions.


Ummm, doesn't that require changes to *both* indexing and seq?


Ooops, yes.  I meant it wouldn't require indexing to use unevaluated
expression.


This would still be a lot of work for not a lot
of payoff, but it would be a logically consistent way of adding this
behaviour to indexing, and the basic work would make it possible to
develop other sorts of indexing, eg df[evens(), ], or df[last(5),
last(3)].


I agree:  it would be a nice addition, but a fair bit of work.  I think it
would be quite doable for the indexable things in the base packages, but
there are a lot of contributed packages that define [ methods, and those
methods would all need to be modified too.


That's true, although I suspect many contributed [.methods eventually
delegate to base methods and might work without further modification.


(Just to be clear, when I say doable, I'm thinking that your iterators
return functions that compute subsets of index ranges.  For example, evens()
might be implemented as

evens <- function() {
  result <- function(indices) {
    indices[indices %% 2 == 0]
  }
  class(result) <- "iterator"
  return(result)
}

and then `[` in v[evens()] would recognize that it had been passed an
iterator, and would pass 1:length(v) to the iterator to get the subset of
even indices.  Is that what you had in mind?)


Yes, that's exactly what I was thinking, although you'd have to put
some thought into the conventions - would it be better to pass in the
length of the vector instead of a vector of indices?  Should all
iterators return logical vectors?  That way you could do x[evens() &
last(5)] to get the even indices out of the last 5, as opposed to
x[evens()][last(5)] which would return the last 5 even indices.


Actually, I don't think so.  evens() & last(5) would fail to evaluate,
because you're trying to do a logical combination of two functions, not 
of two logical vectors.  Or are we going to extend the logical operators 
to work on iterators/selectors too?


Duncan Murdoch


You could also imagine similar iterators for random sampling, like
samp(0.2) to choose 20% of the indices, or boot(0.8) to choose 80%
with replacement.  first(n) could also be useful, selecting the first
min(n, length(vector)) observations.   An iterator version of rev()
would also be handy.

Maybe selector would be a better name than iterator though, as these
don't have the same feel as iterators in other languages.

Hadley





Re: [R] The end of Matlab

2008-12-12 Thread hadley wickham
On Fri, Dec 12, 2008 at 11:18 AM, Duncan Murdoch murd...@stats.uwo.ca wrote:
 On 12/12/2008 11:38 AM, hadley wickham wrote:

 On Fri, Dec 12, 2008 at 8:41 AM, Duncan Murdoch murd...@stats.uwo.ca
 wrote:

 On 12/12/2008 8:25 AM, hadley wickham wrote:

 From which you might conclude that I don't like the design of subset,
 and
 you'd be right.  However, I don't think this is a counterexample to my
 general rule.  In the subset function, the select argument is treated
 as
 an
 unevaluated expression, and then there are rules about what to do with
 it.
  (I.e. try to look up name `a` in the data frame, if that fails, ...)

 For the requested behaviour to similarly fall within the general rule,
 we'd
 have to treat all indices to all kinds of things (vectors, matrices,
 dataframes, etc.) as unevaluated expressions, with special handling for
 the
 particular symbol `end`.

 Except you wouldn't have to necessarily change indexing - you could
 change seq instead.  Then 5:end could produce some kind of special
 data structure (maybe an iterator) that was recognised by the various
 indexing functions.

 Ummm, doesn't that require changes to *both* indexing and seq?

 Ooops, yes.  I meant it wouldn't require indexing to use unevaluated
 expression.

 This would still be a lot of work for not a lot
 of payoff, but it would be a logically consistent way of adding this
 behaviour to indexing, and the basic work would make it possible to
 develop other sorts of indexing, eg df[evens(), ], or df[last(5),
 last(3)].

 I agree:  it would be a nice addition, but a fair bit of work.  I think
 it
 would be quite doable for the indexable things in the base packages, but
 there are a lot of contributed packages that define [ methods, and those
 methods would all need to be modified too.

 That's true, although I suspect many contributed [.methods eventually
 delegate to base methods and might work without further modification.

 (Just to be clear, when I say doable, I'm thinking that your iterators
 return functions that compute subsets of index ranges.  For example,
 evens()
 might be implemented as

 evens <- function() {
   result <- function(indices) {
     indices[indices %% 2 == 0]
   }
   class(result) <- "iterator"
   return(result)
 }

 and then `[` in v[evens()] would recognize that it had been passed an
 iterator, and would pass 1:length(v) to the iterator to get the subset of
 even indices.  Is that what you had in mind?)

 Yes, that's exactly what I was thinking, although you'd have to put
 some thought into the conventions - would it be better to pass in the
 length of the vector instead of a vector of indices?  Should all
 iterators return logical vectors?  That way you could do x[evens() &
 last(5)] to get the even indices out of the last 5, as opposed to
 x[evens()][last(5)] which would return the last 5 even indices.

 Actually, I don't think so.  evens() & last(5) would fail to evaluate,
 because you're trying to do a logical combination of two functions, not of
 two logical vectors.  Or are we going to extend the logical operators to
 work on iterators/selectors too?

Oh yes, that's a good point.  But wouldn't the following do the job?

"&.selector" <- function(a, b) {
  function(n) a(n) & b(n)
}

or

"&.selector" <- function(a, b) {
  function(n) intersect(a(n), b(n))
}

depending on whether selectors return logical or numeric vectors.
Writing functions for | and ! would be similarly easy.  Or am I
missing something?
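
A self-contained sketch of these operator methods for the logical-vector convention, written as S3 methods on a class named "selector". The constructor, the pick() helper, and the choice to re-wrap each result as a selector (so that combinations can be combined again) are assumptions of this example, not part of the proposal above:

```r
# Hypothetical selector machinery, repeated here so the example runs alone.
selector <- function(f) structure(f, class = "selector")
evens <- function()  selector(function(n) seq_len(n) %% 2 == 0)
last  <- function(k) selector(function(n) seq_len(n) > n - k)
pick  <- function(x, s) x[s(length(x))]

# Logical operators on selectors return new selectors.
"&.selector" <- function(a, b) selector(function(n) a(n) & b(n))
"|.selector" <- function(a, b) selector(function(n) a(n) | b(n))
"!.selector" <- function(a)    selector(function(n) !a(n))

x <- 1:11
pick(x, evens() & last(5))        # 8 10: the even positions among the last 5
pick(pick(x, evens()), last(5))   # 2 4 6 8 10: the last 5 even positions
```

The two calls at the end show the difference between combining selectors and chaining subsets.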

Hadley

-- 
http://had.co.nz/



Re: [R] The end of Matlab

2008-12-12 Thread hadley wickham
On Fri, Dec 12, 2008 at 11:11 AM, Vitalie Spinu vitosm...@rambler.ru wrote:
 On Fri, 12 Dec 2008 17:38:13 +0100, hadley wickham h.wick...@gmail.com
 wrote:

 You could also imagine similar iterators for random sampling, like
 samp(0.2) to choose 20% of the indices, or boot(0.8) to choose 80%
 with replacement.  first(n) could also be useful, selecting the first
 min(n, length(vector)) observations.   An iterator version of rev()
 would also be handy.

 Maybe selector would be a better name than iterator though, as these
 don't have the same feel as iterators in other languages.

 That is really something!! Real high level language!!
 Selectors could depend on named variables in data frame as well:

 mtcars[sel(cyl>3)&last(5)]
 mtcars[sel(cyl>3)&boot(80%)]

 or maybe just
 mtcars[cyl>3&last(20)]

 or this is already too far?

This would be a considerable extension because then the selector would
need to know about all other variables in the dataset, and you'd need
some way of combining selectors with logical vectors.  So it would be a
huge increase in complexity for not much gain, given that with just
the interface we have described you could do:

mtcars[mtcars$cyl > 3, ][last(20), ]
# or
subset(mtcars, cyl > 3)[last(20), ]

The main idea of selectors is that they would be independent of the
data structure that they are being used with.

Hadley

-- 
http://had.co.nz/



Re: [R] The end of Matlab

2008-12-12 Thread Gabor Grothendieck
On Fri, Dec 12, 2008 at 12:11 PM, Vitalie Spinu vitosm...@rambler.ru wrote:
 On Fri, 12 Dec 2008 17:38:13 +0100, hadley wickham h.wick...@gmail.com
 wrote:

 You could also imagine similar iterators for random sampling, like
 samp(0.2) to choose 20% of the indices, or boot(0.8) to choose 80%
 with replacement.  first(n) could also be useful, selecting the first
 min(n, length(vector)) observations.   An iterator version of rev()
 would also be handy.

 Maybe selector would be a better name than iterator though, as these
 don't have the same feel as iterators in other languages.

 That is really something!! Real high level language!!
 Selectors could depend on named variables in data frame as well:

 mtcars[sel(cyl>3)&last(5)]
 mtcars[sel(cyl>3)&boot(80%)]

 or maybe just
 mtcars[cyl>3&last(20)]


You can do this (and quite a bit more) in data.table:

> library(data.table)
> mtcars.dt <- as.data.table(mtcars)
> tail(mtcars.dt[cyl < 5], 4)
      mpg cyl  disp  hp drat    wt qsec vs am gear carb
[1,] 27.3   4  79.0  66 4.08 1.935 18.9  1  1    4    1
[2,] 26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
[3,] 30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
[4,] 21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2



Re: [R] The end of Matlab

2008-12-12 Thread Duncan Murdoch

On 12/12/2008 12:23 PM, hadley wickham wrote:

On Fri, Dec 12, 2008 at 11:18 AM, Duncan Murdoch murd...@stats.uwo.ca wrote:

On 12/12/2008 11:38 AM, hadley wickham wrote:


On Fri, Dec 12, 2008 at 8:41 AM, Duncan Murdoch murd...@stats.uwo.ca
wrote:


On 12/12/2008 8:25 AM, hadley wickham wrote:


From which you might conclude that I don't like the design of subset,
and
you'd be right.  However, I don't think this is a counterexample to my
general rule.  In the subset function, the select argument is treated
as
an
unevaluated expression, and then there are rules about what to do with
it.
 (I.e. try to look up name `a` in the data frame, if that fails, ...)

For the requested behaviour to similarly fall within the general rule,
we'd
have to treat all indices to all kinds of things (vectors, matrices,
dataframes, etc.) as unevaluated expressions, with special handling for
the
particular symbol `end`.


Except you wouldn't have to necessarily change indexing - you could
change seq instead.  Then 5:end could produce some kind of special
data structure (maybe an iterator) that was recognised by the various
indexing functions.


Ummm, doesn't that require changes to *both* indexing and seq?


Ooops, yes.  I meant it wouldn't require indexing to use unevaluated
expression.


This would still be a lot of work for not a lot
of payoff, but it would be a logically consistent way of adding this
behaviour to indexing, and the basic work would make it possible to
develop other sorts of indexing, eg df[evens(), ], or df[last(5),
last(3)].


I agree:  it would be a nice addition, but a fair bit of work.  I think
it
would be quite doable for the indexable things in the base packages, but
there are a lot of contributed packages that define [ methods, and those
methods would all need to be modified too.


That's true, although I suspect many contributed [.methods eventually
delegate to base methods and might work without further modification.


(Just to be clear, when I say doable, I'm thinking that your iterators
return functions that compute subsets of index ranges.  For example,
evens()
might be implemented as

evens <- function() {
  result <- function(indices) {
    indices[indices %% 2 == 0]
  }
  class(result) <- "iterator"
  return(result)
}

and then `[` in v[evens()] would recognize that it had been passed an
iterator, and would pass 1:length(v) to the iterator to get the subset of
even indices.  Is that what you had in mind?)


Yes, that's exactly what I was thinking, although you'd have to put
some thought into the conventions - would it be better to pass in the
length of the vector instead of a vector of indices?  Should all
iterators return logical vectors?  That way you could do x[evens() &
last(5)] to get the even indices out of the last 5, as opposed to
x[evens()][last(5)] which would return the last 5 even indices.


Actually, I don't think so.  evens() & last(5) would fail to evaluate,
because you're trying to do a logical combination of two functions, not of
two logical vectors.  Or are we going to extend the logical operators to
work on iterators/selectors too?


Oh yes, that's a good point.  But wouldn't the following do the job?

"&.selector" <- function(a, b) {
  function(n) a(n) & b(n)
}

or

"&.selector" <- function(a, b) {
  function(n) intersect(a(n), b(n))
}

depending on whether selectors return logical or numeric vectors.
Writing functions for | and ! would be similarly easy.  Or am I
missing something?


No, I think those definitions would be fine, but I'd be concerned about 
speed issues if we start messing with primitives.


While we're at it, we might as well do the same sort of thing for :, and 
define a selector named end, and then 3:end would give a selector from 3 
to the end, which brings us back to the original question.  So it's not 
nearly as intrusive as I thought it would be.


Duncan Murdoch



Re: [R] The end of Matlab

2008-12-12 Thread hadley wickham
 Oh yes, that's a good point.  But wouldn't the following do the job?

 "&.selector" <- function(a, b) {
   function(n) a(n) & b(n)
 }

 or

 "&.selector" <- function(a, b) {
   function(n) intersect(a(n), b(n))
 }

 depending on whether selectors return logical or numeric vectors.
 Writing functions for | and ! would be similarly easy.  Or am I
 missing something?

 No, I think those definitions would be fine, but I'd be concerned about
 speed issues if we start messing with primitives.

Speed or expressiveness: pick one? ;)  People could always use the
regular subsetting mechanisms if they want the best speed - any
changes to support selectors wouldn't affect the speed of the other
methods of subsetting, would they?

 While we're at it, we might as well do the same sort of thing for :, and
 define a selector named end, and then 3:end would give a selector from 3 to
 the end, which brings us back to the original question.  So it's not nearly
 as intrusive as I thought it would be.

3:end() do you mean?  Or do you mean extending seq so that it uses
unevaluated input?

Hadley


-- 
http://had.co.nz/



Re: [R] The end of Matlab

2008-12-12 Thread Duncan Murdoch

On 12/12/2008 1:06 PM, hadley wickham wrote:

Oh yes, that's a good point.  But wouldn't the following do the job?

"&.selector" <- function(a, b) {
  function(n) a(n) & b(n)
}

or

"&.selector" <- function(a, b) {
  function(n) intersect(a(n), b(n))
}

depending on whether selectors return logical or numeric vectors.
Writing functions for | and ! would be similarly easy.  Or am I
missing something?


No, I think those definitions would be fine, but I'd be concerned about
speed issues if we start messing with primitives.


Speed or expressiveness: pick one? ;)  People could always use the
regular subsetting mechanisms if they want the best speed - any
changes to support selectors wouldn't affect the speed of the other
methods of subsetting, would they?


While we're at it, we might as well do the same sort of thing for :, and
define a selector named end, and then 3:end would give a selector from 3 to
the end, which brings us back to the original question.  So it's not nearly
as intrusive as I thought it would be.


3:end() do you mean?  Or do you mean extending seq so that it uses
unevaluated input?


My end would be the output of your end().  If there are no args and no 
local context, I don't see the need for it to be a function call.  It 
would just be defined as something like


end <- structure( function(n) c(rep(FALSE, n-1), TRUE), class="selector")

I'm not sure what the definition of : should be if one of the args is a 
selector.


Duncan



Re: [R] The end of Matlab

2008-12-12 Thread Claudia Beleites
 evens() & last(5)
wouldn't x[evens()][last(5)] do the & already?

or is different, though.

Claudia

-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: cbelei...@units.it



Re: [R] The end of Matlab

2008-12-12 Thread Greg Snow
That depends on what you want evens() & last(5) to mean.  Does that mean the
last 5 evens (returning 5 values) or the values in the last 5 that are also
even items (returning either 2 or 3 values depending on whether the structure
has an odd or even number of elements)?  It could be interpreted either way.
Your subset below does the first, the other examples do the 2nd.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Claudia Beleites
 Sent: Friday, December 12, 2008 11:38 AM
 To: r-help@r-project.org
 Subject: Re: [R] The end of Matlab

  evens() & last(5)
 wouldn't x[evens()][last(5)] do the & already?

 or is different, though.

 Claudia

 --
 Claudia Beleites
 Dipartimento dei Materiali e delle Risorse Naturali
 Università degli Studi di Trieste
 Via Alfonso Valerio 6/a
 I-34127 Trieste

 phone: +39 (0 40) 5 58-34 47
 email: cbelei...@units.it




Re: [R] The end of Matlab

2008-12-12 Thread hadley wickham
 My end would be the output of your end().  If there are no args and no local
 context, I don't see the need for it to be a function call.  It would just
 be defined as something like

 end <- structure( function(n) c(rep(FALSE, n-1), TRUE), class="selector")

Oh, I see what you mean.

 I'm not sure what the definition of : should be if one of the args is a
 selector.

Alternatively you could use !first(2), and only use end/last when you
want to select the last n observations.  Of course !first(2) would be
the equivalent to -(1:2) so there's not much savings there.

Hadley

-- 
http://had.co.nz/



Re: [R] The end of Matlab

2008-12-12 Thread Vitalie Spinu

On Fri, 12 Dec 2008 18:27:02 +0100, hadley wickham h.wick...@gmail.com wrote:



or may be just
mtcars[cyl>3&last(20)]

or this is already too far?


This would be a considerable extension because then the selector would
need to know about all other variables in the dataset, and you'd need
some way of combining selectors with logical vectors.



If a selector returns a logical vector then I really don't see where the problem is. Probably I am mistaken but
implementing mtcars[cyl>3] is not such a big deal. Just an operator `[.` start searching for cyl
from inside the x frame and not from parent.frame as it does now. It is just like putting
with inside '[', or not?

When I started with R I was really disappointed that such a natural and intuitive
subsetting is not allowed, but instead the lengthy and awkward
mtcars[mtcars$cyl>3] is required.

R is an interactive language for 99% of the users and features like that (and
selectors indeed) would make a tremendous difference.

Regards,
SV.



Re: [R] The end of Matlab

2008-12-12 Thread hadley wickham
On Fri, Dec 12, 2008 at 3:08 PM, Vitalie Spinu vitosm...@rambler.ru wrote:
 On Fri, 12 Dec 2008 18:27:02 +0100, hadley wickham h.wick...@gmail.com
 wrote:


 or may be just
 mtcars[cyl>3&last(20)]

 or this is already too far?

 This would be a considerable extension because then the selector would
 need to know about all other variables in the dataset, and you'd need
 some way of combining selectors with logical vectors.

 If selector returns a logical vector then I really don't see where is the
 problem. Probably I am mistaken but implementing mtcars[cyl>3] is not such a
 big deal. Just an operator `[.` start searching for cyl from inside the
 x frame and not from parent.frame as it does now. It is just like putting
 with inside '[', or not?

And that's a big change to the current behaviour!

I think there are a few good reasons why this shouldn't be the default:

 * You could no longer do: cyl <- 4;  mtcars[mtcars$cyl == cyl, ]
(which is very useful when writing functions)

 * If you want that behaviour, then just use subset

 * It only makes sense for variables of data frames, not for all the
other types of subsets

 * Generally it's better to be explicit than not
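
The first point can be seen with subset(), which already looks names up in the data frame before the calling environment. A minimal sketch using the built-in mtcars data:

```r
cyl <- 4

# Explicit indexing: mtcars$cyl is the column, cyl is the local variable.
explicit <- mtcars[mtcars$cyl == cyl, ]

# Data-frame-first lookup: both occurrences of cyl resolve to the column,
# so the condition is TRUE for every row and the local cyl is never used.
scoped <- subset(mtcars, cyl == cyl)

nrow(explicit)   # 11, the four-cylinder cars
nrow(scoped)     # 32, every row
```

If `[` searched the data frame first, the explicit form would need some other way to refer to the local variable.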

 When I started with R I was really disappointed that such a natural and
 intuitive subsetting is not allowed, but instead the lengthy and awkward
 mtcars[mtcars$cyl>3] is required.

 R is an interactive language for 99% of the users and features like that (and
 selectors indeed) would make a tremendous difference.

 Regards,
 SV.




-- 
http://had.co.nz/



[R] The end of Matlab

2008-12-11 Thread Mike Rowe
Greetings!

I come to R by way of Matlab.  One feature in Matlab I miss is its
end keyword.  When you put end inside an indexing expression, it
is interpreted as the length of the variable along the dimension being
indexed.  For example, if the same feature were implemented in R:

my.vector[5:end]

would be equivalent to:

my.vector[5:length(my.vector)]

or:

this.matrix[3:end,end]

would be equivalent to:

this.matrix[3:nrow(this.matrix),ncol(this.matrix)]   # or
this.matrix[3:dim(this.matrix)[1],dim(this.matrix)[2]]

As you can see, the R version requires more typing, and I am a lousy
typist.

With this in mind, I wanted to try to implement something like this in
R.  It seems that in order to do this, I would have to
be able to access the parse tree of the expression currently being
evaluated by the interpreter from within my End function-- is this
possible?  Since the [ and [[ operators are primitive I can't see
their arguments via the call stack functions...

Anyone got a workaround?  Would anybody else like to see this feature
added to R?

Thanks,
Mike
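
One possible workaround, sketched here under stated assumptions rather than
offered as a definitive answer: instead of hooking into the primitive `[`,
wrap the indexing in an ordinary function that captures the index expression
unevaluated with substitute() and evaluates it with end bound to the relevant
extent. The function names idx and idx2 are hypothetical, made up for this
illustration; they are not part of R.

```r
# Hypothetical helper for vectors: idx(x, 5:end)
idx <- function(x, i) {
  # Capture the index expression unevaluated, then evaluate it with `end`
  # bound to length(x); all other names resolve in the caller's environment.
  i <- eval(substitute(i), list(end = length(x)), parent.frame())
  x[i]
}

# Hypothetical helper for matrices: idx2(m, 3:end, end)
idx2 <- function(x, i, j) {
  # `end` means nrow(x) in the row index and ncol(x) in the column index.
  i <- eval(substitute(i), list(end = nrow(x)), parent.frame())
  j <- eval(substitute(j), list(end = ncol(x)), parent.frame())
  x[i, j]
}

my.vector <- 1:10
this.matrix <- matrix(1:20, nrow = 5)

# Same as my.vector[5:length(my.vector)]
v <- idx(my.vector, 5:end)

# Same as this.matrix[3:nrow(this.matrix), ncol(this.matrix)]
m <- idx2(this.matrix, 3:end, end)
```

This leaves `[` itself untouched, at the cost of a different call syntax, and
it inherits the behaviour of `:` when the vector is shorter than the starting
index.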



Re: [R] The end of Matlab

2008-12-11 Thread Gabor Grothendieck
Use tail and head. See interspersed.

On Thu, Dec 11, 2008 at 9:45 PM, Mike Rowe mwr...@gmail.com wrote:
 Greetings!

 I come to R by way of Matlab.  One feature in Matlab I miss is its
 end keyword.  When you put end inside an indexing expression, it
 is interpreted as the length of the variable along the dimension being
 indexed.  For example, if the same feature were implemented in R:

 my.vector[5:end]

tail(my.vector, -4)


 would be equivalent to:

 my.vector[5:length(my.vector)]

 or:

 this.matrix[3:end,end]

tail(tail(this.matrix, -2), 1)


 would be equivalent to:

 this.matrix[3:nrow(this.matrix),ncol(this.matrix)]   # or
 this.matrix[3:dim(this.matrix)[1],dim(this.matrix)[2]]

 As you can see, the R version requires more typing, and I am a lousy
 typist.

 With this in mind, I wanted to try to implement something like this in
 R.  It seems like that in order to be able to do this, I would have to
 be able to access the parse tree of the expression currently being
 evaluated by the interpreter from within my End function-- is this
 possible?  Since the [ and [[ operators are primitive I can't see
 their arguments via the call stack functions...

 Anyone got a workaround?  Would anybody else like to see this feature
 added to R?

 Thanks,
 Mike





Re: [R] The end of Matlab

2008-12-11 Thread Gabor Grothendieck
It's been pointed out to me that the second one is wrong.

It should be:

tail(this.matrix, -2)[, ncol(this.matrix)]

which is not as compact as Matlab or my prior post but
still not particularly onerous.
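
A quick check of the corrected form on a small matrix (the 5 x 4 example
matrix is an arbitrary choice for illustration):

```r
this.matrix <- matrix(1:20, nrow = 5)

a <- this.matrix[3:nrow(this.matrix), ncol(this.matrix)]

# tail(this.matrix, -2) drops the first two rows; the column index then
# picks the last column of what remains.
b <- tail(this.matrix, -2)[, ncol(this.matrix)]

# Compare values only: tail() may attach row labels to its result.
same_values <- all(a == b)

# The earlier attempt keeps only the last row, with all columns.
earlier_dim <- dim(tail(tail(this.matrix, -2), 1))
```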

On Thu, Dec 11, 2008 at 11:49 PM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 Use tail and head. See interspersed.

 On Thu, Dec 11, 2008 at 9:45 PM, Mike Rowe mwr...@gmail.com wrote:
 Greetings!

 I come to R by way of Matlab.  One feature in Matlab I miss is its
 end keyword.  When you put end inside an indexing expression, it
 is interpreted as the length of the variable along the dimension being
 indexed.  For example, if the same feature were implemented in R:

 my.vector[5:end]

 tail(my.vector, -4)


 would be equivalent to:

 my.vector[5:length(my.vector)]

 or:

 this.matrix[3:end,end]

 tail(tail(this.matrix, -2), 1)


 would be equivalent to:

 this.matrix[3:nrow(this.matrix),ncol(this.matrix)]   # or
 this.matrix[3:dim(this.matrix)[1],dim(this.matrix)[2]]

 As you can see, the R version requires more typing, and I am a lousy
 typist.

 With this in mind, I wanted to try to implement something like this in
 R.  It seems like that in order to be able to do this, I would have to
 be able to access the parse tree of the expression currently being
 evaluated by the interpreter from within my End function-- is this
 possible?  Since the [ and [[ operators are primitive I can't see
 their arguments via the call stack functions...

 Anyone got a workaround?  Would anybody else like to see this feature
 added to R?

 Thanks,
 Mike






Re: [R] The end of Matlab

2008-12-11 Thread Duncan Murdoch

On 11/12/2008 9:45 PM, Mike Rowe wrote:

Greetings!

I come to R by way of Matlab.  One feature in Matlab I miss is its
end keyword.  When you put end inside an indexing expression, it
is interpreted as the length of the variable along the dimension being
indexed.  For example, if the same feature were implemented in R:

my.vector[5:end]

would be equivalent to:

my.vector[5:length(my.vector)]


And if my.vector is of length less than 5?
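
To spell the question out: in R, `:` counts downwards when its left operand
exceeds its right, so the literal translation does not give an empty result
for a short vector. A small illustration (the example vector is arbitrary):

```r
my.vector <- c(10, 20, 30)

# 5:length(my.vector) is 5:3, i.e. 5, 4, 3, so this picks two
# out-of-range elements (NA) followed by the third element.
res_colon <- my.vector[5:length(my.vector)]

# tail() with a negative n drops the first 4 elements and yields an
# empty vector here.
res_tail <- tail(my.vector, -4)

res_colon
res_tail
```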


or:

this.matrix[3:end,end]

would be equivalent to:

this.matrix[3:nrow(this.matrix),ncol(this.matrix)]   # or
this.matrix[3:dim(this.matrix)[1],dim(this.matrix)[2]]

As you can see, the R version requires more typing, and I am a lousy
typist.


It doesn't save typing, but a more readable version would be

rows <- nrow(this.matrix)
cols <- ncol(this.matrix)
this.matrix[3:rows, cols]



With this in mind, I wanted to try to implement something like this in
R.  It seems like that in order to be able to do this, I would have to
be able to access the parse tree of the expression currently being
evaluated by the interpreter from within my End function-- is this
possible?  Since the [ and [[ operators are primitive I can't see
their arguments via the call stack functions...

Anyone got a workaround?  Would anybody else like to see this feature
added to R?


I like the general rule that subexpressions have values that can be 
evaluated independent of context, so I don't think this is a good idea.


Duncan Murdoch
