Re: [Rd] suggestion for extending ?as.factor

Peter Dalgaard Tue, 05 May 2009 02:30:51 -0700

Petr Savicky wrote:


> 
>> Notice that the discrepancy comes from sums that really are identical
>> values (in decimal arithmetic), but where the binary FP inaccuracy makes
>> them slightly different.
>>
>> [for a nice picture, continue the example with
>>
>>> tt <- table(signif(zz,7))
>>> plot(as.numeric(names(tt)),tt, type="h")
> 
> The shape of this picture is not due to rounding errors. The same picture can
> be obtained using integer arithmetic alone, as follows.
> 
>   ss <- round(10*sleep$extra)
>   zz <- replicate(20000,sum(sample(ss,10)))
>   tt <- table(zz)
>   plot(as.numeric(names(tt)),tt, type="h")

I know. The point was rather that if you are not careful with rounding,
you get some of the bars wrong (you get 2 or 3 small bars very close
to each other instead of one longer one). Computed p-values from
permutation tests (as in mean(sim>=obs)) also need care for the same reason.
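A minimal sketch of that care, assuming the observed statistic is the sum of group 2 in the sleep data (`obs`, `sim` and `tol` are illustrative names, not from the thread). Every value of sleep$extra has one decimal, so genuinely different sums differ by at least 0.1. A tolerance of 1e-8 therefore only absorbs binary rounding noise, and comparing on the integer (tenths) scale is an equivalent exact alternative.

```r
## Permutation p-value for the sum of group 2 of the sleep data.
## obs, sim and tol are illustrative names chosen for this sketch.
set.seed(1)
obs <- sum(sleep$extra[sleep$group == "2"])
sim <- replicate(20000, sum(sample(sleep$extra, 10)))

## Naive comparison: sums that equal obs in decimal arithmetic but come
## out slightly smaller in binary are wrongly counted as "less than".
p_naive <- mean(sim >= obs)

## Tolerance-based comparison: true differences are multiples of 0.1,
## so 1e-8 only absorbs floating-point noise.
tol <- 1e-8
p_tol <- mean(sim >= obs - tol)

## Equivalent exact comparison on the integer (tenths) scale.
p_int <- mean(round(10 * sim) >= round(10 * obs))

c(naive = p_naive, tolerance = p_tol, integer = p_int)
```

The naive value can only be too small, never too large, since it can only lose ties with obs.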

> 
> The variation of the frequencies is due to two effects.
> 
> First, each individual value of the sum occurs with low probability, so 20000
....

> 
> The other cause of variation of the frequencies is that even the true
> distribution of the sums has a lot of local minima and maxima.

Yes. You can actually generate the exact distribution easily using

d <- combn(sleep$extra, 10, sum)
d <- signif(d,7)
tt <- table(d)
plot(as.numeric(names(tt)),tt, type="h")

and if you omit the signif() step (this does not happen with R-devel):

> table(table(names(table(d))))

  1   2   3
137 161  17

i.e. 315 distinct values but over half occur in duplicate or triplicate
versions.
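Combining this with Petr's integer trick removes the need for signif() altogether. The sketch below assumes every value of sleep$extra has exactly one decimal, so that round(10 * x) is lossless; `ss`, `di`, `ti` and the `n_*` counts are illustrative names. It also counts the duplication directly from the doubles rather than via the character labels that table() produces. The raw count depends on summation order and platform, so no specific numbers are claimed here.

```r
## Exact null distribution of the sum of 10 out of the 20 values,
## computed on the integer (tenths) scale so that every sum is exact.
ss <- round(10 * sleep$extra)   # whole numbers stored as doubles
di <- combn(ss, 10, sum)        # all choose(20, 10) = 184756 sums
ti <- table(di)
plot(as.numeric(names(ti)) / 10, ti, type = "h",
     xlab = "sum of 10 values", ylab = "count")

## The same sums in floating point, for comparison.
d <- combn(sleep$extra, 10, sum)
n_raw     <- length(unique(d))            # distinct binary doubles
n_decimal <- length(unique(round(d, 1)))  # distinct one-decimal values
n_exact   <- length(ti)                   # distinct exact sums
c(raw = n_raw, decimal = n_decimal, exact = n_exact)
```

Whenever n_raw exceeds n_exact, some decimal sums are stored as more than one double, which is what splits the bars.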


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
