> -----Original Message-----
> From: r-devel-boun...@r-project.org
> [mailto:r-devel-boun...@r-project.org] On Behalf Of Patrick Burns
> Sent: Sunday, June 20, 2010 3:08 AM
> To: r-devel@r-project.org
> Subject: [Rd] proposed change to 'sample'
>
> There is a weakness in the 'sample'
> function that is highlighted in the
> help file.  The 'x' argument can be
> either the vector from which to sample,
> or the maximum value of the sequence
> from which to sample.
>
> This can be ambiguous if the length of
> 'x' is one.
>
> I propose adding an argument that allows
> the user (programmer) to avoid that
> ambiguity:
>
> function (x, size, replace = FALSE, prob = NULL,
>      max = length(x) == 1L && is.numeric(x) && x >= 1)
S+'s sample() has an argument 'n' to achieve the same
result.  It has been there since at least 2005 (S+ 7.0.6).
sample(n=n) means to return a sample from seq_len(n),
where n must be a scalar nonnegative integer.  sample(x=x)
retains its old ambiguous meaning.

   sample(x, size = n, replace = F, prob = NULL, n = NULL, ...)

S+ also has an rsample function where n (with the same
meaning) is the only way to specify the population.  It
also has an order=TRUE/FALSE argument.  order=TRUE means
that the output is randomly ordered.  order=FALSE means that
the ordering of the output is unspecified, which lets the
person writing rsample methods use the quickest way to get
a random sample (for big data it can be fastest to return
the sample from 1:n in increasing order).

   rsample(n, size = n, replace = F, prob = NULL, bigdata = F,
           minimal = NULL, ..., order = T)

I like the idea of separating the concepts of sampling
and permuting data.  Many statistics are invariant to the
ordering of the data, and it can be a waste of time to
randomly order a sample that is fed to such functions.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> {
>     if (max) {
>         if (missing(size))
>             size <- x
>         .Internal(sample(x, size, replace, prob))
>     }
>     else {
>         if (missing(size))
>             size <- length(x)
>         x[.Internal(sample(length(x), size, replace, prob))]
>     }
> }
> <environment: namespace:base>
>
>
> This just takes the condition of the first
> 'if' to be the default value of the new 'max'
> argument.
>
> So in the "surprise" section of the examples
> in the 'sample' help file
>
> sample(x[x > 9])
>
> and
>
> sample(x[x > 9], max=FALSE)
>
> have different behaviours.
>
> By the way, I'm certainly not convinced that
> 'max' is the best name for the argument.
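
To make the length-one ambiguity concrete, here is a minimal sketch in R.
The helper name sample_from() is hypothetical (it is not part of base R
or S+).  It always treats 'x' as the population, which is roughly what
max=FALSE would do in the proposal quoted above, and it is built on
sample.int() rather than on .Internal().

```r
# Hypothetical helper (not in base R or S+): always treat 'x' as the
# population to sample from, never as an upper bound.
sample_from <- function(x, size = length(x), replace = FALSE, prob = NULL) {
    x[sample.int(length(x), size = size, replace = replace, prob = prob)]
}

x <- 1:10

# Two or more elements: sample() permutes the vector itself.
sample(x[x > 8])             # 9 and 10 in some random order

# Exactly one element: sample() silently switches to sampling from 1:10.
length(sample(x[x > 9]))     # 10, not 1

# The helper keeps the population interpretation in every case.
sample_from(x[x > 9])        # 10
sample_from(x[x > 10])       # integer(0)
```

With a helper like this the caller never depends on how many elements
survive a filter such as x[x > 9], which is the situation the "surprise"
examples in the help file warn about.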
>
> --
> Patrick Burns
> pbu...@pburns.seanet.com
> http://www.burns-stat.com
> (home of 'Some hints for the R beginner'
> and 'The R Inferno')
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel