Better to have a logPdf method that never does the exponentiation
internally.
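
A minimal sketch of what such a method could look like. It assumes a Gaussian with diagonal covariance (one mean and one standard deviation per dimension). The class name DiagonalGaussian and its methods are hypothetical illustrations, not Mahout's GaussianCluster API. logPdf sums the per-dimension log densities and never calls Math.exp, while pdf exponentiates exactly once for callers that really want the density.

```java
/**
 * Hypothetical diagonal-covariance Gaussian (illustration only, not Mahout API).
 * Each dimension i has its own mean mu_i and standard deviation sigma_i.
 */
class DiagonalGaussian {
    // 0.5 * log(2 * pi), the constant part of each per-dimension log density
    private static final double LOG_SQRT_2PI = 0.5 * Math.log(2.0 * Math.PI);

    private final double[] mean;
    private final double[] stdDev;

    DiagonalGaussian(double[] mean, double[] stdDev) {
        if (mean.length != stdDev.length) {
            throw new IllegalArgumentException("mean and stdDev must have the same length");
        }
        this.mean = mean.clone();
        this.stdDev = stdDev.clone();
    }

    /** Log density as a sum of per-dimension log densities; never exponentiates. */
    double logPdf(double[] x) {
        if (x.length != mean.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double z = (x[i] - mean[i]) / stdDev[i];
            // log N(x_i | mu_i, sigma_i) = -z^2 / 2 - log(sigma_i) - log(2 pi) / 2
            sum += -0.5 * z * z - Math.log(stdDev[i]) - LOG_SQRT_2PI;
        }
        return sum;
    }

    /** Plain density: exponentiates once, so it can underflow to 0.0 in high dimensions. */
    double pdf(double[] x) {
        return Math.exp(logPdf(x));
    }
}
```

Defining pdf as exp(logPdf) keeps the two methods consistent, while callers that work in log space can skip the exponentiation entirely.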

But other than that detail, yes.

On Tue, Jun 28, 2011 at 8:56 AM, Jeff Eastman <[email protected]> wrote:

> In other words, you plan to take the log(pdf) of each term in the model
> vectors, sum them and exponentiate the result? It would be interesting to
> compare the results.
>
> -----Original Message-----
> From: Vasil Vasilev [mailto:[email protected]]
> Sent: Tuesday, June 28, 2011 6:02 AM
> To: [email protected]
> Subject: Re: Incorrect calculation of pdf
>
> In fact my idea was very simple, although I do not know if it will work OK:
> do all calculations at the logarithmic level and exponentiate the result
> just before returning. This will not change the function's expected
> result.
>
> On Mon, Jun 27, 2011 at 9:03 PM, Ted Dunning <[email protected]> wrote:
>
> > Actually, pdf() should always be a pdf(), not a logPdf(). Many
> > algorithms want one or the other. Some don't much care because log is
> > monotonic. But we should do what the name implies.
> >
> > On Mon, Jun 27, 2011 at 10:15 AM, Jeff Eastman <[email protected]> wrote:
> >
> > > A better approach would be to create a new Model and ModelDistribution
> > > that uses log arithmetic of your choosing. The initial models are very
> > > simple-minded and are likely not adequate for real applications.
> > >
> > > -----Original Message-----
> > > From: Ted Dunning [mailto:[email protected]]
> > > Sent: Monday, June 27, 2011 7:51 AM
> > > To: [email protected]
> > > Subject: Re: Incorrect calculation of pdf
> > >
> > > There should not be a change to an existing method.
> > >
> > > It would be fine to add another method, perhaps called logPdf, that does
> > > what you suggest. This loss of precision is common with the normal
> > > distribution in high dimensions.
> > >
> > > On Mon, Jun 27, 2011 at 1:49 AM, Vasil Vasilev <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > Recently I wanted to use the Dirichlet clustering algorithm to cluster
> > > > vectors taken directly from vectorized text, whose dimensionality was
> > > > around 50000. In this situation the algorithm fails to calculate the pdf
> > > > of a vector corresponding to a cluster center because of numerical
> > > > precision problems during multiplication.
> > > >
> > > > In this regard, what do you think of modifying the GaussianCluster.pdf()
> > > > method in such a way that it works with logarithmic probabilities?
> > > >
> > > > Regards, Vasil
> > > >
> > >
> >
>
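
To compare the results, as suggested in the thread, here is a rough sketch that evaluates a 50000-dimensional standard normal density at its mean (the cluster center) three ways: multiplying per-dimension densities, summing logs and exponentiating once just before returning, and returning the log sum as-is. The class and method names are hypothetical, not Mahout code, and the unit-variance, zero-mean setup is an assumption chosen to keep the arithmetic simple.

```java
/**
 * Hypothetical demo (not Mahout code): a 50000-dimensional standard normal,
 * zero mean and unit variance in every dimension, evaluated at its mean.
 */
class PdfUnderflowDemo {
    static final double LOG_SQRT_2PI = 0.5 * Math.log(2.0 * Math.PI);

    // (1) Multiply per-dimension densities directly.
    static double naiveProduct(double[] x) {
        double p = 1.0;
        for (double xi : x) {
            p *= Math.exp(-0.5 * xi * xi) / Math.sqrt(2.0 * Math.PI);
        }
        return p;
    }

    // (3) Sum per-dimension log densities and return the sum; no exponentiation.
    static double logPdf(double[] x) {
        double s = 0.0;
        for (double xi : x) {
            s += -0.5 * xi * xi - LOG_SQRT_2PI;
        }
        return s;
    }

    // (2) Work in logs, then exponentiate once just before returning.
    static double expOfLogSum(double[] x) {
        return Math.exp(logPdf(x));
    }

    public static void main(String[] args) {
        double[] center = new double[50_000]; // all zeros: the mean itself
        System.out.println("naive product : " + naiveProduct(center));
        System.out.println("exp(log sum)  : " + expOfLogSum(center));
        System.out.println("logPdf        : " + logPdf(center));
    }
}
```

Both the product and the exponentiated log sum print 0.0: the true density here is about exp(-45947), far below the smallest positive double (about 4.9e-324), so exponentiating at the end cannot recover it. Only the log value stays finite and usable.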
