Providing a wrapper would give you both performance and simplicity for
the user.

x[RANGE(1,1e6)] and x[1:1e6] could both be handled internally, where:

RANGE <- function(from,to) {
  structure(seq(from,to), class="RANGE")
}

Just testing for a 'RANGE' object in your '[' method would leave the
optimization up to the end user.
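
A minimal sketch of what that test could look like. The 'slab' class and
its in-memory 'data' field are invented here as stand-ins for an on-disk
dataset; this is not code from any hdf5 binding.

```r
# Constructor from the message above: the full sequence plus a class tag.
RANGE <- function(from, to) {
  structure(seq(from, to), class = "RANGE")
}

# Hypothetical stand-in for an on-disk dataset ('slab' is an invented name).
slab <- function(data) structure(list(data = data), class = "slab")

"[.slab" <- function(x, i) {
  if (inherits(i, "RANGE")) {
    # Contiguous by construction: only the two end points are needed,
    # so a real backend could issue a single block read here.
    idx <- unclass(i)
    first <- idx[1L]
    last <- idx[length(idx)]
    return(x$data[first:last])
  }
  # Any other index: fall back to the general (possibly slower) path.
  x$data[i]
}

h <- slab(seq(10, 100, by = 10))
fast <- h[RANGE(2, 4)]  # takes the RANGE branch
slow <- h[2:4]          # takes the general branch
```

Both calls return the same elements, so the user only opts in to RANGE()
when the speed matters.  Note that RANGE() as written still allocates the
whole sequence; storing just the end points would avoid that, but then
the object could no longer be used as an ordinary index elsewhere.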

The 'xts' package provides something similar for subsetting by
time.  We accept a character string in ISO 8601 style time-range notation,
as well as the standard classes that can be used to subset any other
matrix-like object.

The ISO way will get you fast binary searching over the time-index, whereas
using POSIX time is a linear search.
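
To illustrate the difference between the two strategies (this is a base R
sketch, not the xts implementation; the variable names are made up for
the example), a range lookup on a sorted time index can either scan every
element or just locate the two end points:

```r
# Sorted time index: one observation per hour for ten days (UTC).
idx  <- seq(as.POSIXct("2010-05-01 00:00:00", tz = "UTC"),
            by = "hour", length.out = 240)
from <- as.POSIXct("2010-05-03 00:00:00", tz = "UTC")
to   <- as.POSIXct("2010-05-04 23:00:00", tz = "UTC")

# Linear scan: compares every element of the index against both bounds.
linear <- which(idx >= from & idx <= to)

# End-point lookup on the sorted index with findInterval(), which
# requires its second argument to be sorted.  Times are converted to
# plain numbers (seconds since the epoch) first.
# left.open = TRUE counts elements strictly below 'from'.
lo <- findInterval(as.numeric(from), as.numeric(idx), left.open = TRUE) + 1L
hi <- findInterval(as.numeric(to), as.numeric(idx))
binary <- if (lo <= hi) seq.int(lo, hi) else integer(0)
```

Both approaches select the same rows; only the amount of work done to
find them differs.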

HTH
Jeff

On Wed, May 12, 2010 at 3:27 PM, James Bullard <bull...@stat.berkeley.edu> wrote:

> >> -----Original Message-----
> >> From: r-devel-boun...@r-project.org
> >> [mailto:r-devel-boun...@r-project.org] On Behalf Of Duncan Murdoch
> >> Sent: Wednesday, May 12, 2010 11:35 AM
> >> To: bull...@stat.berkeley.edu
> >> Cc: r-de...@stat.math.ethz.ch
> >> Subject: Re: [Rd] ranges and contiguity checking
> >>
> >> On 12/05/2010 2:18 PM, James Bullard wrote:
> >> > Hi All,
> >> >
> >> > I am interfacing to some C libraries (hdf5) and I have methods
> >> > defined for '[', these methods do hyperslab selection, however,
> >> > currently I am limiting slab selection to contiguous blocks, i.e.,
> >> > things defined like: i:(i+k). I don't do any contiguity checking at
> >> > this point, I just grab the max and min of the range and then
> >> > potentially do an in-memory subselection which is what I am
> >> > definitely trying to avoid. Besides using deparse, I can't see any
> >> > way to figure out that these things (i:(i+k) and c(i, i+1, ...,
> >> > i+k)) are different.
> >> >
> >> > I have always liked how 1:10 was a valid expression in R (as opposed
> >> > to python where it is not by itself.), however I'd somehow like to
> >> > know that the thing was a contiguous range without examining the
> >> > un-evaluated expression or worse, all(diff(i:(i+k)) == 1)
> >
> > You could define a sequence class, say 'hfcSeq'
> > and insist that the indices given to [.hfc are
> > hfcSeq objects.  E.g., instead of
> >     hcf[i:(i+k)]
> > the user would use
> >     hcf[hfcSeq(i,i+k)]
> > or
> >     index <- hfcSeq(i,i+k)
> >     hcf[index]
> > max, min, and range methods for hfcSeq
> > would just inspect one or both of its
> > elements.
>
> I could do this, but I wanted it to not matter to the user whether or not
> they were dealing with an HDF5Dataset or a plain-old matrix.
>
> It seems like I cannot define methods on: ':'. If I could do that then I
> could implement an immutable 'range' class which would be good, but then
> I'd have to also implement: '['(matrix, range) -- which would be easy, but
> still more work than I wanted to do.
>
> I guess I was thinking that there is some inherent value in an immutable
> native range type which is constant in time and memory for construction.
> Then I could define methods on '['(matrix, range) and '['(matrix,
> integer). I'm pretty confident this is more or less what is happening in
> the IRanges package in Bioconductor, but (maybe for the lack of support
> for setting methods on ':') it is happening in a way that makes things
> very non-transparent to a user. As it stands, I can optimize for
> performance by using an IRange-type wrapper or I can optimize for
> code-clarity by killing performance.
>
> thanks again, jim
>
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >>
> >> You can implement all(diff(x) == 1) more efficiently in C, but I
> >> don't see how you could hope to do any better than that without
> >> putting very un-R-like restrictions on your code.  Do you really want
> >> to say that
> >>
> >> A[i:(i+k)]
> >>
> >> is legal, but
> >>
> >> x <- i:(i+k)
> >> A[x]
> >>
> >> is not?  That will be very confusing for your users.  The problem is
> >> that objects don't remember where they came from, only arguments to
> >> functions do, and functions that make use of this fact mainly do it
> >> for decorating the output (nice labels in plots) or making error
> >> messages more intelligible.
> >>
> >> Duncan Murdoch
> >>
> >> ______________________________________________
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >
>



-- 
Jeffrey Ryan
jeffrey.r...@insightalgo.com

ia: insight algorithmics
www.insightalgo.com
