> If it matters to you (why?) use pmax(0, kde$y).

Actually, it does matter to me, because I need sufficient numerical precision in a case where the density estimate is around zero. To illustrate why it concerns me, let me describe the problem I am working on.
The problem is to estimate the relative entropy of two distributions given a sample from each. As an example, suppose sample1 is standard normal and sample2 ~ N(u, 1); I am interested in estimating the relative entropy of the two underlying distributions, which are not known in practice. The following code illustrates it:

n <- 30   # sample size
u <- -1   # the expected value of sample2
N <- 500  # the grid size for the integral of the relative entropy
sample1 <- rnorm(n); sample2 <- rnorm(n) + u  # samples are realized here
# the range for the integral of the relative entropy
from.x <- min(c(sample1, sample2)) - var(c(sample1, sample2))
to.x   <- max(c(sample1, sample2)) + var(c(sample1, sample2))
range  <- to.x - from.x
# estimate the densities
fx <- density(sample1, n = N, from = from.x, to = to.x)$y
gx <- density(sample2, n = N, from = from.x, to = to.x)$y
# calculate the (symmetrized) relative entropy
relative.entropy <- 1/2 * (mean(fx * (log(fx) - log(gx))) * range +
                           mean(gx * (log(gx) - log(fx))) * range)
relative.entropy

Sometimes the variation between the samples I obtain is very large (which corresponds to setting u very large in the code above), and then many negative values of fx and gx occur, and I do not know how to handle them. If I use pmax(0, kde$y) or pmax(10^(-300), kde$y), will there be a considerable problem for the relative-entropy calculation where fx or gx is around zero?

I found that the function akj() does not give negative density estimates, but it was much slower than density(). Also, akj() is not exactly the usual kernel density estimate; it is described as *Density Estimation using Adaptive Kernel method*.

Thanks for your help.

Best,
Junjie

2006/12/30, Prof Brian Ripley <[EMAIL PROTECTED]>:
>
> This is called 'rounding error', and has been discussed here previously.
> If it matters to you (why?) use pmax(0, kde$y).
>
> When doing numerical calculations you should always be aware that the
> numerical results will differ from algebraic ones, and that is all that
> is happening here.
>
> On Sat, 30 Dec 2006, Junjie Li wrote:
>
> > Why? And how to solve it?
> > The code and result are the following:
> >
> >> data=rnorm(50)
> >> kde=density(data,n=20,from=-1,to=10)
> >> kde$x;kde$y
> >  [1] -1.0000000 -0.4210526  0.1578947  0.7368421  1.3157895  1.8947368
> >  [7]  2.4736842  3.0526316  3.6315789  4.2105263  4.7894737  5.3684211
> > [13]  5.9473684  6.5263158  7.1052632  7.6842105  8.2631579  8.8421053
> > [19]  9.4210526 10.0000000
> >  [1]  2.422392e-01  3.877025e-01  4.746580e-01  2.757747e-01  1.787630e-01
> >  [6]  1.102396e-01  2.331694e-02  3.294412e-04  2.260746e-07  6.996146e-12
> > [11] -1.179461e-18 -1.226790e-17  8.892545e-18  1.144173e-17 -1.881253e-17
> > [16] -2.782621e-17 -5.314722e-18 -1.691545e-17 -1.986261e-17 -2.498227e-17
> >
> > Best,
> >
> > Junjie Li
> >
> > Tsinghua University
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Brian D. Ripley, [EMAIL PROTECTED]
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
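For reference, the pmax(0, kde$y) clamp that Prof Ripley suggests only touches ordinates that are already indistinguishable from zero at double precision, so it cannot meaningfully change an integral of the density. A minimal check (the seed is arbitrary, added only for reproducibility):

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
data <- rnorm(50)
kde  <- density(data, n = 20, from = -1, to = 10)
y    <- pmax(0, kde$y)           # clamp the tiny negative rounding errors to 0

# the clamp changes each ordinate by at most the size of the rounding error
# (typically around 1e-17), far below the accuracy of the estimate itself
max(abs(y - kde$y))
```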
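On the specific question of pmax(0, ...) versus pmax(10^(-300), ...) in the relative-entropy formula: flooring at 10^(-300) injects log(10^(-300)) ≈ -690 at every grid point where the true contribution should vanish, so it seems safer to clamp at 0 and use the convention 0 * log(0) = 0, dropping grid points where either numerical estimate underflows to zero (a pragmatic truncation: exactly there the KL integrand would be infinite when one density is positive and the other is zero). A sketch under those assumptions, reusing the variable names from the code above (except that `range` is renamed `width` to avoid masking base::range):

```r
set.seed(1)                               # arbitrary seed, for reproducibility
n <- 30; u <- -1; N <- 500
sample1 <- rnorm(n); sample2 <- rnorm(n) + u

from.x <- min(c(sample1, sample2)) - var(c(sample1, sample2))
to.x   <- max(c(sample1, sample2)) + var(c(sample1, sample2))
width  <- to.x - from.x

# clamp rounding-error negatives to zero
fx <- pmax(0, density(sample1, n = N, from = from.x, to = to.x)$y)
gx <- pmax(0, density(sample2, n = N, from = from.x, to = to.x)$y)

# convention 0 * log(0) = 0: only grid points where both densities
# are strictly positive contribute to the integrand
keep  <- fx > 0 & gx > 0
term1 <- term2 <- numeric(N)
term1[keep] <- fx[keep] * (log(fx[keep]) - log(gx[keep]))
term2[keep] <- gx[keep] * (log(gx[keep]) - log(fx[keep]))

relative.entropy <- 0.5 * width * (mean(term1) + mean(term2))
relative.entropy
```

Because the symmetrized integrand equals (fx - gx) * (log(fx) - log(gx)), which is non-negative wherever both densities are positive, the resulting estimate stays finite and non-negative by construction.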
