> If it matters to you (why?) use pmax(0, kde$y).

Actually, it bothers me because I need sufficient numerical precision in
this case, precisely where the density estimate is around zero.  To
illustrate my concern, let me introduce the problem I am working on.

The problem is to estimate the relative entropy of two distributions given
a sample from each.  As an example, suppose sample1 is standard normal
and sample2 ~ N(u,1), and I want to estimate the relative entropy of the
two underlying distributions, which are unknown in practice.  The
following code illustrates this:


n = 30   # sample size
u = -1   # the mean of sample2
N = 500  # the grid size for the integral of the relative entropy
sample1 = rnorm(n); sample2 = rnorm(n) + u  # samples are realized here

# the range for the integral of the relative entropy
from.x = min(c(sample1, sample2)) - var(c(sample1, sample2))
to.x   = max(c(sample1, sample2)) + var(c(sample1, sample2))
range  = to.x - from.x

# estimate both densities on the same grid
fx = density(sample1, n = N, from = from.x, to = to.x)$y
gx = density(sample2, n = N, from = from.x, to = to.x)$y

# symmetrized relative entropy, approximating each integral by a grid mean
relative.entropy = 1/2 *
  (mean(fx * (log(fx) - log(gx))) * range + mean(gx * (log(gx) - log(fx))) * range)
relative.entropy
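For this toy example the answer is actually known in closed form, which gives a sanity check on the estimator: for unit-variance normals, KL(N(m1,1) || N(m2,1)) = (m1 - m2)^2 / 2, so the symmetrized quantity computed above should come out near u^2/2 = 0.5.  A minimal sketch of that check (the `eps` floor, the fixed padding, and the larger sample size are my additions, not part of the original code):

```r
set.seed(42)                       # reproducibility (my addition)
n <- 1000; u <- -1; N <- 500       # larger n than in the post, to reduce noise
s1 <- rnorm(n); s2 <- rnorm(n) + u

from.x <- min(c(s1, s2)) - 1       # pad the grid a little beyond the data
to.x   <- max(c(s1, s2)) + 1
rng    <- to.x - from.x

fx <- density(s1, n = N, from = from.x, to = to.x)$y
gx <- density(s2, n = N, from = from.x, to = to.x)$y

# floor tiny/negative values before taking logs (my addition)
eps <- .Machine$double.xmin
fx  <- pmax(fx, eps); gx <- pmax(gx, eps)

# symmetrized relative entropy via a grid-mean approximation of the integrals
est <- 0.5 * (mean(fx * (log(fx) - log(gx))) +
              mean(gx * (log(gx) - log(fx)))) * rng
true_val <- u^2 / 2                # closed form: 0.5
```

Each grid term of the symmetrized sum is (fx - gx) * (log(fx) - log(gx)), which is nonnegative because log is monotone, so the estimate is always nonnegative; kernel smoothing biases it slightly downward relative to the closed form.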
Sometimes the two samples I obtain are widely separated, which corresponds
to setting u very large in the code above; then many negative fx and gx
values occur, and I don't know how to handle that.  If I use pmax(0, kde$y)
or pmax(10^(-300), kde$y), will there be a considerable problem for the
calculation of the relative entropy where fx or gx is around zero?
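One caveat worth noting between the two options: clamping at exactly zero and then taking logs produces NaN in R, because 0 * log(0) evaluates as 0 * (-Inf) = NaN, so a small positive floor such as .Machine$double.xmin is the safer choice when the products f*log(f) are formed directly.  A minimal illustration:

```r
y  <- c(0.5, 0)                 # a grid value that is exactly zero
y0 <- pmax(0, y)
y0 * log(y0)                    # second element is 0 * (-Inf) = NaN

eps <- .Machine$double.xmin     # smallest positive normalized double
yf  <- pmax(eps, y)
yf * log(yf)                    # finite everywhere
```

Note the floor only repairs the 0 * log(0) case: at grid points where fx is moderate but gx has been floored, fx * (log(fx) - log(gx)) can still be enormous, but that reflects a genuinely large KL contribution from non-overlapping support rather than a rounding artifact.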

I found that akj() (from package quantreg) does not return negative
density estimates, but it is much slower than density(). Also, akj() is
not exactly the standard kernel density estimate; it is documented as
*Density Estimation using Adaptive Kernel method*.
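A guarded sketch comparing the minima of the two estimators on the same grid (this assumes the quantreg package with akj()'s default arguments, and simply skips the comparison if the package is not installed):

```r
set.seed(1)
x <- rnorm(30)
grid <- seq(-4, 4, length.out = 200)

# density() may return tiny negative values in the tails (rounding error)
min_dens <- min(density(x, n = 200, from = -4, to = 4)$y)

# akj() (adaptive kernel) should stay nonnegative, at the cost of speed
if (requireNamespace("quantreg", quietly = TRUE)) {
  min_akj <- min(quantreg::akj(x, grid)$dens)
} else {
  min_akj <- 0   # package unavailable; nothing to compare
}
```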


Thanks for your help.

Best,

Junjie


2006/12/30, Prof Brian Ripley <[EMAIL PROTECTED]>:
>
> This is called 'rounding error', and has been discussed here previously.
> If it matters to you (why?) use pmax(0, kde$y).
>
> When doing numerical calculations you should always be aware that the
> numerical results will differ from algebraic ones, and that is all that is
> happening here.
>
> On Sat, 30 Dec 2006, Junjie Li wrote:
>
> > Why? And how to solve it?  The code and result are following,
> >
> >
> >> data=rnorm(50)
> >>
> >> kde=density(data,n=20,from=-1,to=10)
> >>
> >> kde$x;kde$y
> > [1] -1.0000000 -0.4210526  0.1578947  0.7368421  1.3157895  1.8947368
> > [7]  2.4736842  3.0526316  3.6315789  4.2105263  4.7894737  5.3684211
> > [13]  5.9473684  6.5263158  7.1052632  7.6842105  8.2631579  8.8421053
> > [19]  9.4210526 10.0000000
> > [1]  2.422392e-01  3.877025e-01  4.746580e-01  2.757747e-01  1.787630e-01
> > [6]  1.102396e-01  2.331694e-02  3.294412e-04  2.260746e-07  6.996146e-12
> > [11] -1.179461e-18 -1.226790e-17  8.892545e-18  1.144173e-17 -1.881253e-17
> > [16] -2.782621e-17 -5.314722e-18 -1.691545e-17 -1.986261e-17 -2.498227e-17
> >
> > Best,
> >
> > Junjie Li
> >
> > Tsinghua University
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> --
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
