That is a fine idea and often works well, especially where you are
multiplying or just comparing probabilities.  You need a separate method
for that, of course, one that returns the log probability.

Where you are adding probabilities, it doesn't work quite so simply.  Even
there, though, the correct method is to get all the log probabilities,
subtract the maximum value so that the largest log probability is 0, and then
exponentiate and add only those terms whose offset log is large enough to
matter (anything smaller than -60 after offsetting is effectively 0).  Then
you take the log of that sum and add back in the max log prob; exponentiate
the result only if you need the plain probability.
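
The procedure described above can be sketched roughly as follows.  This is
only an illustration in Python, not Mahout code: the function name
log_sum_exp and the cutoff parameter are made up for the example, and the
default cutoff of -60 is the value mentioned in the text.

```python
import math

def log_sum_exp(log_probs, cutoff=-60.0):
    """Return log(sum(exp(lp) for lp in log_probs)) without underflow.

    Hypothetical helper; `cutoff` is the offset below which a term is
    treated as zero.
    """
    m = max(log_probs)
    if m == float("-inf"):
        # Every probability is exactly zero, so the sum is zero as well.
        return m
    # After subtracting the max, the largest term is exp(0) == 1,
    # so the sum is at least 1 and its log is well defined.
    total = sum(math.exp(lp - m) for lp in log_probs if lp - m >= cutoff)
    # Add the max back in to undo the offset.
    return m + math.log(total)
```

If the plain probability is needed, math.exp(log_sum_exp(...)) gives it, but
that last step can itself underflow when the sum is tiny, so it is usually
safer to stay on the log scale as long as possible.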

So you are correct that it is important to have the log pdf available as a
call.
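
To illustrate why a log pdf call matters at this dimensionality, here is a
small sketch in Python.  It assumes independent dimensions with a
per-dimension standard deviation, which may not match what GaussianCluster
actually does; both function names are made up for the example.

```python
import math

def gaussian_log_pdf(x, mean, std):
    """Log density of independent normals (assumed diagonal covariance)."""
    log_p = 0.0
    for xi, mi, si in zip(x, mean, std):
        z = (xi - mi) / si
        # Sum per-dimension log densities instead of multiplying densities.
        log_p += -0.5 * z * z - math.log(si) - 0.5 * math.log(2.0 * math.pi)
    return log_p

def gaussian_pdf_naive(x, mean, std):
    """Same density computed as a direct product; underflows in high dimensions."""
    p = 1.0
    for xi, mi, si in zip(x, mean, std):
        z = (xi - mi) / si
        p *= math.exp(-0.5 * z * z) / (si * math.sqrt(2.0 * math.pi))
    return p

# Evaluate at the cluster center itself, with 50000 dimensions.
dims = 50000
center = [0.0] * dims
unit_std = [1.0] * dims
naive = gaussian_pdf_naive(center, center, unit_std)
log_density = gaussian_log_pdf(center, center, unit_std)
```

Here the direct product collapses to zero in double precision even though
the point is the center itself, while the log density stays finite and can
still be compared across clusters.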

On Tue, Jun 28, 2011 at 6:01 AM, Vasil Vasilev <[email protected]> wrote:

> In fact my idea was very simple, although I do not know if it will work OK:
> do all calculations on a logarithmic scale and exponentiate the result
> just before returning. This will not change the function's expected
> result.
>
> On Mon, Jun 27, 2011 at 9:03 PM, Ted Dunning <[email protected]>
> wrote:
>
> > Actually, pdf() should always be a pdf(), not a logPdf().  Many algorithms
> > want one or the other.  Some don't much care because log is monotonic.
> > But we should do what the name implies.
> >
> > On Mon, Jun 27, 2011 at 10:15 AM, Jeff Eastman <[email protected]>
> wrote:
> >
> > > A better approach would be to create a new Model and ModelDistribution
> > > that uses log arithmetic of your choosing. The initial models are very
> > > simple minded and are likely not adequate for real applications.
> > >
> > > -----Original Message-----
> > > From: Ted Dunning [mailto:[email protected]]
> > > Sent: Monday, June 27, 2011 7:51 AM
> > > To: [email protected]
> > > Subject: Re: Incorrect calculation of pdf
> > >
> > > There should not be a change to an existing method.
> > >
> > > It would be fine to add another method, perhaps called logPdf, that
> > > does what you suggest.  This loss of precision is common with the normal
> > > distribution in high dimensions.
> > >
> > > On Mon, Jun 27, 2011 at 1:49 AM, Vasil Vasilev <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Recently I wanted to use the Dirichlet clustering algorithm to cluster
> > > > vectors taken directly out of vectorized text, whose dimensionality was
> > > > around 50000. In this situation the algorithm fails to calculate the pdf
> > > > of a vector corresponding to a cluster center due to problems with
> > > > numerical precision during multiplication.
> > > >
> > > > In this regard, what do you think of modifying the
> > > > GaussianCluster.pdf() method in such a way that it works with
> > > > logarithmic probabilities?
> > > >
> > > > Regards, Vasil
> > > >
> > >
> >
>
