Petr Savicky wrote:
>
>> Notice that the discrepancy comes from sums that really are identical
>> values (in decimal arithmetic), but where the binary FP inaccuracy makes
>> them slightly different.
>>
>> [for a nice picture, continue the example with
>>
>>> tt <- table(signif(zz,7))
>>> plot(as.numeric(names(tt)),tt, type="h")
>
> The form of this picture is not due to rounding errors. The picture may be
> obtained even within an integer arithmetic as follows.
>
> ss <- round(10*sleep$extra)
> zz <- replicate(20000,sum(sample(ss,10)))
> tt <- table(zz)
> plot(as.numeric(names(tt)),tt, type="h")

I know. The point was rather that if you are not careful with rounding,
you get some of the bars wrong (you get 2 or 3 small bars very close to
each other instead of one longer one). Computed p-values from permutation
tests (as in mean(sim>=obs)) also need care for the same reason.

>
> The variation of the frequencies is due to two effects.
>
> First, each individual value of the sum occurs with low probability, so 20000 ....
>
> The other cause of variation of the frequencies is that even the true
> distribution of the sums has a lot of local minima and maxima.

Yes. You can actually generate the exact distribution easily using

d <- combn(sleep$extra, 10, sum)
d <- signif(d,7)
tt <- table(d)
plot(as.numeric(names(tt)),tt, type="h")

and if you omit the signif() step (not with R-devel), you get

> table(table(names(table(d))))

  1   2   3
137 161  17

i.e., 315 distinct values, but over half of them occur in duplicate or
triplicate versions.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
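A minimal R sketch of the rounding pitfalls discussed in this thread, on small hypothetical vectors rather than sleep$extra. It shows sums of decimal values that are equal in exact arithmetic but differ as IEEE doubles, a naive mean(sim>=obs) that miscounts such a tie, and exact enumeration with combn() on integer-scaled data. The helper perm_pvalue() and its relative tolerance tol = 1e-7 are illustrative assumptions, not an existing R function.

```r
## Hypothetical toy data (not sleep$extra).
x <- c(0.1, 0.2, 0.3)

a <- (x[1] + x[2]) + x[3]   # 0.1 + 0.2 + 0.3, summed left to right
b <- (x[3] + x[2]) + x[1]   # same terms, opposite order

a == b                        # FALSE on IEEE-754 doubles: b is slightly smaller
signif(a, 7) == signif(b, 7)  # TRUE once both are rounded to 7 significant digits

## Integer scaling, as in round(10*sleep$extra), makes the sums exact.
xi <- round(10 * x)           # 1, 2, 3 represented exactly
(xi[1] + xi[2]) + xi[3] == (xi[3] + xi[2]) + xi[1]   # TRUE

## Hypothetical helper: permutation p-value with a relative tolerance,
## so that simulated values equal to obs in decimal terms are counted.
perm_pvalue <- function(sim, obs, tol = 1e-7) {
  mean(sim >= obs - tol * max(1, abs(obs)))
}

sim <- c(b, 0.5, 0.7)         # b equals a in decimal, but b < a in binary
mean(sim >= a)                # naive: 1/3, the tie with b is missed
perm_pvalue(sim, a)           # with tolerance: 2/3

## Exact enumeration on integer-scaled data gives one table entry per sum.
y  <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)    # hypothetical data
d  <- combn(round(10 * y), 3, sum) / 10  # all choose(6, 3) = 20 subset sums
tt <- table(d)
length(tt)                    # 10 distinct sums: 0.6, 0.7, ..., 1.5
```

Integer scaling only works when the number of decimals in the data is known; otherwise rounding with signif() or comparing with a tolerance, as sketched above, is one possible workaround, with the tolerance chosen to suit the scale of the statistic.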