To be more clear: I do NOT state that the function "round" is used. I read the documentation as "non integer positive numerical values will be replaced by the next smallest integer", the important part being the NEXT smallest integer, i.e. how ceiling() does it. So that's exactly what I would expect. If "replaced by" causes less confusion than "rounded to" or "truncated to", then use that.
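
To make the three candidate wordings concrete, here is a minimal sketch comparing them on two non-integer inputs. It uses only base R; the object name `readings` is just for illustration, and it says nothing about what sample() does internally.

```r
# Compare three candidate readings of the help-page wording
# on two non-integer inputs (base R only).
x <- c(2.001, 2.9)

readings <- rbind(
  ceiling = ceiling(x),  # next integer up:   3 3
  trunc   = trunc(x),    # drop the fraction: 2 2
  round   = round(x)     # nearest integer:   2 3
)
colnames(readings) <- x
readings
```

Only ceiling() maps both 2.001 and 2.9 to 3, which is the reading of "next smallest integer" I had in mind.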
I do agree that this wording would still indicate that this happens prior to the sampling, whereas the output indicates that this is done after the sampling. I can reproduce the sample() outcome using runif() as follows:

> table(ceiling(runif(10000,0,2.1)))

   1    2    3
4774 4756  470
> table(ceiling(runif(10000,0,3)))

   1    2    3
3273 3440 3287

I don't know if that's the intended behaviour, but there is some logic in it. It's up to the R core team to decide if this is OK and rephrase the help page so it becomes clearer what actually happens, or simply add something like

if( (x%%1) != 0) x <- ceiling(x)

prior to the sampling algorithm.

Cheers
Joris

On Thu, Sep 20, 2018 at 9:44 AM lmo via R-devel <r-devel@r-project.org> wrote:

> Although it seems to be pretty weird to enter a numeric vector of length
> one that is not an integer as the first argument to sample(), the results
> do not seem to match what is documented in the manual. In addition, the
> results below do not support the use of round rather than truncate in the
> documentation. Consider the code below.
>
> The first sentence in the details section says: "If x has length 1, is
> numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes
> place from 1:x."
>
> In the console:
> > 1:2.001
> [1] 1 2
> > 1:2.9
> [1] 1 2
>
> truncation:
> > trunc(2.9)
> [1] 2
>
> So, this seems to support the quote from previous emails: "Non-integer
> positive numerical values of n or x will be truncated to the next smallest
> integer, which has to be no larger than .Machine$integer.max."
>
> However, again in the console:
> > set.seed(123)
> > table(sample(2.001, 10000, replace=TRUE))
>
>    1    2    3
> 5052 4941    7
>
> So, neither rounding nor truncation is occurring. Next, define a sequence.
> > x <- seq(2.001, 2.51, length.out=20)
>
> Now, grab all of the threes from sample()-ing this sequence.
> > set.seed(123)
> > threes <- sapply(x, function(y) table(sample(y, 10000, replace=TRUE))[3])
>
> Check for NAs (I cheated here and found a nice seed).
> > any(is.na(threes))
> [1] FALSE
>
> Now, the (to me) disturbing result.
> > is.unsorted(threes)
> [1] FALSE
>
> or equivalently
> > all(diff(threes) > 0)
> [1] TRUE
>
> So the number of threes grows monotonically as 2.001 moves to 2.51. As I
> hinted above, the monotonic growth is not assured. My guess is that the
> growth is stochastic and relates to some "probability weighting" based on
> how close the element of x is to 3. Perhaps this has been brought up
> before, but it seems relevant to the current discussion.
>
> A potential aid to this issue would be something like
>
> if(length(x) == 1 && !isTRUE(all.equal(x, as.integer(x))))
>     warning("It is a bad idea to use vectors of length 1 in the x argument that are not integers.")
>
> Hope that helps,
> luke
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)
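
P.S. A rough check of the "probability weighting" guess, as a sketch only. It assumes that a draw from a length-one x with 2 < x <= 3 behaves like ceiling(runif(1, 0, x)); that assumption comes from the runif() reproduction in this thread, not from reading the sample() source. Under it, the chance of drawing a 3 is (x - 2)/x. The helper `p_three` is hypothetical and exists only for this illustration.

```r
# Assumption (not taken from the sample() source): for 2 < x <= 3 a draw
# behaves like ceiling(runif(1, 0, x)), so P(draw == 3) = (x - 2) / x.
# p_three() is a hypothetical helper for this illustration.
p_three <- function(x) (x - 2) / x

x <- seq(2.001, 2.51, length.out = 20)
expected_threes <- 10000 * p_three(x)

# The expected number of threes increases strictly with x; observed
# counts are random and need not be monotone for every seed.
all(diff(expected_threes) > 0)

# Expected threes out of 10000 draws for x = 2.001 and x = 2.1
round(10000 * p_three(c(2.001, 2.1)), 1)
```

Under that assumption the expected counts are about 5 and about 476, in line with the 7 and 470 threes reported in this thread.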