Re: [Rd] RFC: adding an 'exact' argument to [[

2007-05-22 Thread Gabor Grothendieck
In addition to $, which was mentioned in this thread, there is
also attr(), e.g.

> names(attributes(CO2))
[1] "names"     "row.names" "class"     "formula"   "outer"     "labels"
[7] "units"
> attr(CO2, "f")  # matches "formula"
uptake ~ conc | Plant
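A minimal sketch of the same attr() behaviour on a made-up object (obj is hypothetical, not CO2). It assumes a current R release, where attr() also has an exact argument (default FALSE, see ?attr), so partial matching can be switched off per call:

```r
# Hypothetical object carrying a single "formula" attribute
obj <- structure(1:3, formula = y ~ z)

# Default attr(): "f" is a unique prefix of "formula", so it matches
attr(obj, "f")

# exact = TRUE disables partial matching; no attribute is named "f"
attr(obj, "f", exact = TRUE)   # NULL
```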

On 5/17/07, Seth Falcon [EMAIL PROTECTED] wrote:
 Hi all,

 One of the things I find most problematic in R is the partial matching
 of names in lists.  Robert and I have discussed this and we believe
 that having a mechanism that does not do partial matching would be of
 significant benefit to R programmers.  To that end, I have written a
 patch that modifies the behavior of [[ as follows:

   1. [[ gains an 'exact' argument with default value NA

   2. Behavior of 'exact' argument:

  exact=NA
  partial matching is performed as usual, however, a warning
  will be issued when a partial match occurs.  This is the
  default.

  exact=TRUE
  no partial matching is performed.

  exact=FALSE
  partial matching is allowed and no warning issued if it
  occurs.
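A short sketch of the three modes, assuming a current R release. Note that the default that eventually shipped is exact = TRUE (see ?Extract), not the NA proposed here, so a bare [[ no longer matches partially. The list lst is made up for illustration:

```r
lst <- list(alpha = 1, beta = 2)

# exact = TRUE (the default in current R): no partial matching
lst[["al"]]                        # NULL

# exact = FALSE: the unique prefix "al" silently matches "alpha"
lst[["al", exact = FALSE]]         # 1

# exact = NA: same match as FALSE, but a warning reports it
warned_val <- withCallingHandlers(
  lst[["al", exact = NA]],
  warning = function(w) {
    message("caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
warned_val                         # 1
```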

 This change has been discussed among R-core members and there appeared
 to be a general consensus that this approach was a good way to
 proceed.  However, we are interested in other suggestions from the
 broader R developer community.

 Some additional rationale for our approach:

 Lists are used as the underlying data structures in many R programs
 and in these cases the named elements are not a fixed set of things
 with a fixed set of names.  For these programs, [[ will be used with
 an argument that gets evaluated at runtime, and partial matching here
 is almost always a disaster.  Furthermore, data whose names share
 common prefixes (a precondition for partial matching issues) occur
 often and are not an exceptional circumstance.
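To make the "disaster" concrete, here is a sketch of a cache keyed by names computed at run time. The key names and the helper get_cached() are hypothetical; the point is that a key which is merely a prefix of a stored name silently retrieves the wrong entry once partial matching is allowed:

```r
# Hypothetical cache whose keys are only known at run time
cache <- list(gene10 = "result for gene10")

get_cached <- function(cache, key, exact = TRUE) {
  # With exact = TRUE, returns NULL unless an element is named exactly `key`
  cache[[key, exact = exact]]
}

key <- paste0("gene", 1)               # "gene1": never stored
get_cached(cache, key, exact = FALSE)  # "result for gene10": wrong entry
get_cached(cache, key)                 # NULL: correctly reports a miss
```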

 We have tested a similar patch, which simply eliminated partial
 matching for [[, against all CRAN and Bioconductor packages and did
 not see any obvious failures.

 A downside of this approach is that S4 methods on [[ will need to be
 modified to accommodate the new signature.  However, by adding an
 argument, we are able to move more slowly towards a non-partially
 matching [[ (eventually, the default could be exact=TRUE, but that is
 a discussion for another day).
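A sketch of what accommodating the new argument might look like in an S4 method. The class Bag and its items slot are hypothetical. Because the [[ generic has ... in its formals, setMethod() accepts a method with an extra exact formal and fills it from the call's ... (see ?setMethod); the method here simply forwards it to the underlying list:

```r
library(methods)

# Hypothetical S4 class that stores its contents in a plain list
setClass("Bag", slots = c(items = "list"))

# Extra formal `exact` after `...`, forwarded to [[ on the list slot
setMethod("[[", "Bag", function(x, i, j, ..., exact = TRUE) {
  x@items[[i, exact = exact]]
})

b <- new("Bag", items = list(alpha = 1, beta = 2))
b[["alpha"]]                 # 1
b[["al"]]                    # NULL: no partial matching by default
b[["al", exact = FALSE]]     # 1
```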


 + seth

 --
 Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
 http://bioconductor.org

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel




Re: [Rd] RFC: adding an 'exact' argument to [[

2007-05-22 Thread Seth Falcon
Hi again,

Robert has committed the proposed patch to R-devel.  So [[ now has an
'exact' argument and the behavior is as described:

Seth Falcon [EMAIL PROTECTED] writes:
1. [[ gains an 'exact' argument with default value NA

2. Behavior of 'exact' argument:

   exact=NA
   partial matching is performed as usual, however, a warning
   will be issued when a partial match occurs.  This is the
   default.

   exact=TRUE
   no partial matching is performed.

   exact=FALSE
   partial matching is allowed and no warning issued if it
   occurs.


+ seth



Re: [Rd] RFC: adding an 'exact' argument to [[

2007-05-18 Thread Prof Brian Ripley
On Thu, 17 May 2007, Duncan Murdoch wrote:

 Opinions, please:

 In another thread I think we have agreement to add an extra arg to the 
 vignette() function to limit it to attached packages.  By analogy with other 
 similar functions, the arg would be named all.available.  However, I suspect 
 most users would abbreviate that to just all.

 Should I name it all.available for consistency, or all in anticipation of 
 a day when exact argument matching will be required?

I don't think it will be required.

However, the use of all.names etc. is historical, from the days when S (and
R) would warn if you used the name of a local non-function as a function,
so an arg 'all' got in the way.   I would use the most intuitive form.

Shortly R-devel will have options to warn on partial matching in $ and in 
args.
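Such options exist in current R: options(warnPartialMatchDollar=), warnPartialMatchArgs and warnPartialMatchAttr, all FALSE by default (see ?options). A minimal sketch; the list model and the function double_it() are made up for illustration:

```r
# Turn on the partial-match warnings, saving the previous settings
old <- options(warnPartialMatchDollar = TRUE,
               warnPartialMatchArgs = TRUE)

model <- list(coefficients = c(2, 0.5))
double_it <- function(value.count = 1) value.count * 2

model$coef             # returns coefficients, with a partial-match warning
double_it(value = 3)   # returns 6, with a partial-argument-match warning

options(old)           # restore the previous settings
```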

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [Rd] RFC: adding an 'exact' argument to [[

2007-05-17 Thread Seth Falcon
Bill Dunlap [EMAIL PROTECTED] writes:
 This sounds interesting.  Do you intend to leave the $
 operator alone, so it will continue to do partial
 matching?  I suspect that that is where the majority
 of partial matching for list names is done.

The current proposal will not touch $.  I agree that most intentional
partial matching uses $ (hopefully only during interactive sessions).
The main benefit of our proposed change is more reliable package
code.  For long lists and certain patterns of use, there are also
performance benefits:

> kk <- paste("abc", 1:(1e6), sep="")
> vv = as.list(1:(1e6))
> names(vv) = kk

> system.time(vv[["fooo", exact=FALSE]])
   user  system elapsed
  0.074   0.000   0.074

> system.time(vv[["fooo", exact=TRUE]])
   user  system elapsed
  0.042   0.000   0.042
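One plausible reading of these timings: when the key is absent, exact lookup only needs equality tests, whereas partial lookup must also test every name as a possible prefix and check that any match is unique. The helper lookup() below is a hypothetical R-level model of those rules built on match() and pmatch(); it is not how [[ is actually implemented (that is C code):

```r
# Hypothetical model of [[ name lookup on a list
lookup <- function(lst, key, exact = TRUE) {
  nms <- names(lst)
  pos <- if (exact) {
    match(key, nms)    # exact equality only
  } else {
    pmatch(key, nms)   # exact match first, else a unique prefix match
  }
  if (is.na(pos)) NULL else lst[[pos]]
}

toy <- list(abc1 = 1, abc2 = 2, fitted.values = 3)
lookup(toy, "abc1")                   # 1
lookup(toy, "fitted", exact = FALSE)  # 3 (unique prefix)
lookup(toy, "abc", exact = FALSE)     # NULL (ambiguous prefix)
lookup(toy, "fooo", exact = FALSE)    # NULL (no match at all)
```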


 It might be nice to have an option that made x$partial warn so we
 would fix code that relied on partial matching, but that is lower
 priority.

I think that could be useful as well.  To digress a bit further in
discussing $... I think the argument that partial matching is
desirable because it saves typing during interactive sessions now
carries a lot less weight.  The recently integrated completion code
gives both less typing and complete names.
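Because $ was left alone, the two operators behave differently on lists. A small sketch; fit is made up, and the [[ results assume the exact = TRUE default of current R:

```r
fit <- list(fitted.values = c(1.5, 2.5))

fit$fitted                 # partial match: returns fitted.values
fit[["fitted"]]            # NULL: [[ matches exactly by default
fit[["fitted.values"]]     # the full name works with either operator
```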

+ seth



Re: [Rd] RFC: adding an 'exact' argument to [[

2007-05-17 Thread Prof Brian Ripley
On Thu, 17 May 2007, Seth Falcon wrote:

 It might be nice to have an option that made x$partial warn so we
 would fix code that relied on partial matching, but that is lower
 priority.

 I think that could be useful as well.  To digress a bit further in
 discussing $... I think the argument that partial matching is
 desirable because it saves typing during interactive sessions now has
 a lot less weight.  The recent integration of the completion code
 gives less typing and complete names.

There is a similar issue with argument partial matching.  Since we have 
the source of R one can pretty easily build a version of R which does not 
have the feature: I have been doing that in conjunction with 'codetools' 
to do some checking.

In both cases there is traditional partial matching: seq(along=) or 
seq(length=), and $fitted vs $fitted.values.  There are not many uses of 
seq(along.with=) about and vastly more of seq(along=) (although in R using 
seq_along() is preferable): even in some packages which do use 
seq(along.with=) there are more instances of seq(along=).
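A sketch of the argument-matching case: along= is accepted only because it is a unique prefix of the along.with argument of seq(), while seq_along() needs no argument name at all. The vector v is arbitrary:

```r
v <- c(10, 20, 30)

seq(along = v)          # "along" partially matches along.with
seq(along.with = v)     # full argument name
seq_along(v)            # preferred: no argument matching involved
```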



Re: [Rd] RFC: adding an 'exact' argument to [[

2007-05-17 Thread Duncan Murdoch
On 5/17/2007 3:54 PM, Prof Brian Ripley wrote:
 There is a similar issue with argument partial matching.  Since we have 
 the source of R one can pretty easily build a version of R which does not 
 have the feature: I have been doing that in conjunction with 'codetools' 
 to do some checking.
 
 In both cases there is traditional partial matching: seq(along=) or 
 seq(length=), and $fitted vs $fitted.values.  There are not many uses of 
 seq(along.with=) about and vastly more of seq(along=) (although in R using 
 seq_along() is preferable): even in some packages which do use 
 seq(along.with=) there are more instances of seq(along=).

Opinions, please:

In another thread I think we have agreement to add an extra arg to the 
vignette() function to limit it to attached packages.  By analogy with 
other similar functions, the arg would be named all.available.  However, 
I suspect most users would abbreviate that to just all.

Should I name it all.available for consistency, or all in 
anticipation of a day when exact argument matching will be required?
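A sketch of the trade-off behind this question, using hypothetical functions rather than the real vignette(): today a user's all = TRUE silently reaches all.available through partial matching, but the shortcut breaks as soon as another formal also starts with "all":

```r
# Hypothetical function with the long argument name under discussion
list_vignettes <- function(package = NULL, all.available = FALSE) {
  if (all.available) "all installed packages" else "attached packages only"
}

list_vignettes(all = TRUE)   # partial match to all.available

# A second formal beginning with "all" makes the short form ambiguous
list_vignettes2 <- function(package = NULL, all.available = FALSE,
                            all.names = FALSE) all.available
msg <- tryCatch(list_vignettes2(all = TRUE), error = conditionMessage)
msg   # error text saying the argument matches multiple formals
```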

Duncan Murdoch
