Actually, pdf() should always be a pdf(), not a logPdf(). Many algorithms want one or the other. Some don't much care because log is monotonic. But we should do what the name implies.
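
To make the distinction concrete, here is a rough sketch of the two methods side by side. It is not Mahout's GaussianCluster code: the class name LogPdfSketch, the diagonal-covariance assumption, and both method signatures are made up for illustration. It evaluates a point sitting exactly on the cluster center in 50000 dimensions, which is the case Vasil describes.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Mahout.
class LogPdfSketch {

    // Density of a Gaussian with diagonal covariance, computed as a plain
    // product of per-dimension densities. Underflows in high dimensions.
    static double pdf(double[] x, double[] mean, double[] stdDev) {
        double product = 1.0;
        for (int i = 0; i < x.length; i++) {
            double z = (x[i] - mean[i]) / stdDev[i];
            product *= Math.exp(-0.5 * z * z) / (stdDev[i] * Math.sqrt(2.0 * Math.PI));
        }
        return product;
    }

    // Log of the same density, computed as a sum of per-dimension log terms.
    static double logPdf(double[] x, double[] mean, double[] stdDev) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double z = (x[i] - mean[i]) / stdDev[i];
            sum += -0.5 * z * z - Math.log(stdDev[i]) - 0.5 * Math.log(2.0 * Math.PI);
        }
        return sum;
    }

    public static void main(String[] args) {
        int dimensions = 50000;
        double[] center = new double[dimensions]; // all zeros
        double[] stdDev = new double[dimensions];
        Arrays.fill(stdDev, 1.0);

        // The point is the cluster center itself, so each factor is 1/sqrt(2*pi).
        System.out.println("pdf    = " + pdf(center, center, stdDev));
        System.out.println("logPdf = " + logPdf(center, center, stdDev));
    }
}
```

With unit standard deviations every factor is about 0.399, so the product underflows to zero long before the loop finishes, while the sum of logs stays an ordinary finite double. Keeping both methods lets each caller pick the one it needs without changing what pdf() means.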

On Mon, Jun 27, 2011 at 10:15 AM, Jeff Eastman <[email protected]> wrote:

> A better approach would be to create a new Model and ModelDistribution that
> uses log arithmetic of your choosing. The initial models are very
> simple-minded and are likely not adequate for real applications.
>
> -----Original Message-----
> From: Ted Dunning [mailto:[email protected]]
> Sent: Monday, June 27, 2011 7:51 AM
> To: [email protected]
> Subject: Re: Incorrect calculation of pdf
>
> There should not be a change to an existing method.
>
> It would be fine to add another method, perhaps called logPdf, that does
> what you suggest. This loss of precision is common with the normal
> distribution in high dimensions.
>
> On Mon, Jun 27, 2011 at 1:49 AM, Vasil Vasilev <[email protected]> wrote:
>
> > Hi,
> >
> > Recently I wanted to use the Dirichlet clustering algorithm to cluster
> > vectors taken directly out of vectorized text, whose dimensionality was
> > around 50000. In this situation the algorithm fails to calculate the pdf
> > of a vector corresponding to the cluster center due to problems with
> > numerical precision during multiplication.
> >
> > In this regard, what do you think of modifying the GaussianCluster.pdf()
> > method in such a way that it works with logarithmic probabilities?
> >
> > Regards, Vasil
