Re: Incorrect calculation of pdf

Vasil Vasilev Tue, 28 Jun 2011 06:02:04 -0700

In fact my idea was very simple, although I do not know if it will work OK:
do all calculations on a logarithmic scale and exponentiate the result just
before returning. This will not change the function's expected result.
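
To make the idea concrete, here is a minimal sketch. It is not Mahout's
GaussianCluster code: the class LogPdfSketch, its method names, and the
assumption of a spherical Gaussian with one shared standard deviation are all
hypothetical and chosen only for illustration. It compares the direct product
of per-dimension densities with the sum of their logarithms.

```java
// Hypothetical illustration only; not the Mahout GaussianCluster implementation.
// Assumes a spherical Gaussian: independent dimensions sharing one std deviation.
final class LogPdfSketch {
  private static final double LOG_SQRT_2PI = 0.5 * Math.log(2.0 * Math.PI);

  // Direct product of per-dimension normal densities.
  // Each factor is below 1 when sd >= 1, so the product underflows in high dimensions.
  static double pdfNaive(double[] x, double[] mean, double sd) {
    double p = 1.0;
    for (int i = 0; i < x.length; i++) {
      double z = (x[i] - mean[i]) / sd;
      p *= Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2.0 * Math.PI));
    }
    return p;
  }

  // Same density accumulated as a sum of logarithms, which stays finite.
  static double logPdf(double[] x, double[] mean, double sd) {
    double logNorm = Math.log(sd) + LOG_SQRT_2PI;
    double sum = 0.0;
    for (int i = 0; i < x.length; i++) {
      double z = (x[i] - mean[i]) / sd;
      sum += -0.5 * z * z - logNorm;
    }
    return sum;
  }

  public static void main(String[] args) {
    int n = 50000;                 // dimensionality mentioned in the thread
    double[] x = new double[n];    // all zeros: the vector sits on the center
    double[] mean = new double[n];
    System.out.println("naive pdf   = " + pdfNaive(x, mean, 1.0));
    System.out.println("log pdf     = " + logPdf(x, mean, 1.0));
    System.out.println("exp(logPdf) = " + Math.exp(logPdf(x, mean, 1.0)));
  }
}
```

In this sketch the log-space sum stays finite at 50000 dimensions, but
exponentiating it at the end still gives 0.0, because the true density is
smaller than the smallest positive double. Exponentiating before return only
helps when the final value itself is representable; otherwise the caller needs
the log value.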


On Mon, Jun 27, 2011 at 9:03 PM, Ted Dunning <[email protected]> wrote:

> Actually, pdf() should always be a pdf(), not a logPdf().  Many algorithms
> want one or the other.  Some don't much care because log is monotonic.  But
> we should do what the name implies.
>
> On Mon, Jun 27, 2011 at 10:15 AM, Jeff Eastman <[email protected]> wrote:
>
> > A better approach would be to create a new Model and ModelDistribution
> > that use log arithmetic of your choosing. The initial models are very
> > simple-minded and are likely not adequate for real applications.
> >
> > -----Original Message-----
> > From: Ted Dunning [mailto:[email protected]]
> > Sent: Monday, June 27, 2011 7:51 AM
> > To: [email protected]
> > Subject: Re: Incorrect calculation of pdf
> >
> > There should not be a change to an existing method.
> >
> > It would be fine to add another method, perhaps called logPdf, that does
> > what you suggest.  This loss of precision is common with the normal
> > distribution in high dimensions.
> >
> > On Mon, Jun 27, 2011 at 1:49 AM, Vasil Vasilev <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > Recently I wanted to use the Dirichlet clustering algorithm to cluster
> > > vectors taken directly from vectorized text, whose dimensionality was
> > > around 50000. In this situation the algorithm fails to calculate the pdf
> > > of a vector corresponding to the cluster center due to problems with
> > > numerical precision during multiplication.
> > >
> > > In this regard, what do you think of modifying the
> > > GaussianCluster.pdf() method in such a way that it works with
> > > logarithmic probabilities?
> > >
> > > Regards, Vasil
> > >
> >
>
