Re: Incorrect calculation of pdf

Ted Dunning Mon, 27 Jun 2011 11:04:41 -0700

Actually, pdf() should always be a pdf(), not a logPdf().  Many algorithms
want one or the other.  Some don't much care because log is monotonic.  But
we should do what the name implies.
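
A minimal Java sketch of the split described above, assuming a diagonal-covariance Gaussian. The names `Density`, `DiagonalGaussian` and `mostLikely` are hypothetical and are not Mahout's actual API. `pdf()` keeps returning a real density (here derived as `exp(logPdf())`), and `logPdf()` is available separately. The `mostLikely` helper shows the "log is monotonic" point: picking the best model by `logPdf` selects the same model as picking it by `pdf`.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not Mahout's Model/GaussianCluster API. */
public class PdfAndLogPdf {

    /** A probability density over real vectors. */
    interface Density {
        /** Natural log of the density at x. */
        double logPdf(double[] x);

        /** The density itself, i.e. exp(logPdf); may underflow to 0.0 in high dimensions. */
        default double pdf(double[] x) {
            return Math.exp(logPdf(x));
        }
    }

    /** Gaussian with independent coordinates (assumed diagonal covariance). */
    static final class DiagonalGaussian implements Density {
        private final double[] mean;
        private final double[] stdDev;

        DiagonalGaussian(double[] mean, double[] stdDev) {
            if (mean.length != stdDev.length) {
                throw new IllegalArgumentException("mean and stdDev must have the same length");
            }
            this.mean = mean.clone();
            this.stdDev = stdDev.clone();
        }

        @Override
        public double logPdf(double[] x) {
            double sum = 0.0;
            for (int i = 0; i < x.length; i++) {
                double z = (x[i] - mean[i]) / stdDev[i];
                // log N(x_i | mu_i, sigma_i) = -z^2/2 - log(sigma_i) - log(2*pi)/2
                sum += -0.5 * z * z - Math.log(stdDev[i]) - 0.5 * Math.log(2.0 * Math.PI);
            }
            return sum;
        }
    }

    /**
     * Index of the model with the largest logPdf at x. Because log is monotonic this is
     * also the argmax of pdf. Returns -1 if the list is empty or every logPdf is -infinity.
     */
    static int mostLikely(List<? extends Density> models, double[] x) {
        int best = -1;
        double bestLog = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < models.size(); k++) {
            double lp = models.get(k).logPdf(x);
            if (lp > bestLog) {
                bestLog = lp;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Density a = new DiagonalGaussian(new double[] {0, 0}, new double[] {1, 1});
        Density b = new DiagonalGaussian(new double[] {3, 3}, new double[] {1, 1});
        double[] x = {0.5, 0.2};
        System.out.println("pdf(a) = " + a.pdf(x) + ", logPdf(a) = " + a.logPdf(x));
        System.out.println("most likely model: " + mostLikely(Arrays.asList(a, b), x));
    }
}
```

Deriving `pdf()` from `logPdf()` keeps the two consistent. Code that only needs a ranking can stay in log space.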


On Mon, Jun 27, 2011 at 10:15 AM, Jeff Eastman <[email protected]> wrote:

> A better approach would be to create a new Model and ModelDistribution that
> use log arithmetic of your choosing. The initial models are very
> simple-minded and are likely not adequate for real applications.
>
> -----Original Message-----
> From: Ted Dunning [mailto:[email protected]]
> Sent: Monday, June 27, 2011 7:51 AM
> To: [email protected]
> Subject: Re: Incorrect calculation of pdf
>
> There should not be a change to an existing method.
>
> It would be fine to add another method, perhaps called logPdf, that does
> what you suggest.  This loss of precision is common with the normal
> distribution in high dimensions.
>
> On Mon, Jun 27, 2011 at 1:49 AM, Vasil Vasilev <[email protected]> wrote:
>
> > Hi,
> >
> > Recently I wanted to use the Dirichlet clustering algorithm to cluster
> > vectors taken directly out of vectorized text, whose dimensionality was
> > around 50000. In this situation the algorithm fails to calculate the pdf
> > of a vector corresponding to a cluster center because of problems with
> > numerical precision during multiplication.
> >
> > In this regard, what do you think of modifying the GaussianCluster.pdf()
> > method in such a way that it works with logarithmic probabilities?
> >
> > Regards, Vasil
> >
>
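
The precision problem in this thread can be reproduced with a small Java sketch. It assumes a 50000-dimensional Gaussian with independent coordinates, a zero mean and a common standard deviation. These are illustrative assumptions, not how GaussianCluster is actually parameterized, and the class and method names are hypothetical. The sketch evaluates the density at the cluster center itself in two ways: as a product of per-coordinate densities, and as a sum of per-coordinate log-densities.

```java
/** Hypothetical demo: product of densities vs. sum of log-densities in high dimensions. */
public class HighDimUnderflow {

    static final double LOG_SQRT_2PI = 0.5 * Math.log(2.0 * Math.PI);

    /** Naive density: product of per-coordinate normal densities (assumed diagonal covariance). */
    static double productPdf(double[] x, double[] mean, double sigma) {
        double p = 1.0;
        for (int i = 0; i < x.length; i++) {
            double z = (x[i] - mean[i]) / sigma;
            p *= Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2.0 * Math.PI));
        }
        return p;
    }

    /** The same density in log space: sum of per-coordinate log-densities. */
    static double logPdf(double[] x, double[] mean, double sigma) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) {
            double z = (x[i] - mean[i]) / sigma;
            s += -0.5 * z * z - Math.log(sigma) - LOG_SQRT_2PI;
        }
        return s;
    }

    public static void main(String[] args) {
        int dim = 50_000;
        double[] mean = new double[dim];  // assumed zero mean
        double[] x = mean.clone();        // evaluate at the cluster center itself
        double sigma = 1.0;               // assumed common standard deviation
        System.out.println("product pdf = " + productPdf(x, mean, sigma));
        System.out.println("logPdf      = " + logPdf(x, mean, sigma));
    }
}
```

With `sigma = 1.0`, each factor is about 0.399, so the running product drops below the smallest positive double after roughly 800 coordinates and ends at `0.0`. The log-space sum stays finite at about -45947. With a small enough `sigma` (below about 0.399), each factor exceeds 1 and the product can overflow to `Infinity` instead. The log form avoids both failure modes.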
