In fact my idea was very simple, although I do not know whether it will work: do all calculations in log space and exponentiate the result just before returning. Mathematically, this would not change the function's expected result.
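
A minimal sketch of the idea, assuming a diagonal-covariance Gaussian and plain double[] arrays rather than Mahout's Vector API. The class and method names are hypothetical and not part of Mahout. It sums per-dimension log terms instead of multiplying per-dimension densities, and keeps a naive product version only for comparison:

```java
// Hypothetical helper, not part of Mahout: a diagonal-covariance Gaussian
// evaluated in log space, using plain double[] instead of Mahout's Vector.
final class LogPdfSketch {

  private static final double LOG_2_PI = Math.log(2.0 * Math.PI);

  // Log density of x under independent normals with means m[i] and
  // standard deviations s[i]. Per-dimension log terms are summed, so no
  // intermediate product can underflow.
  static double logPdf(double[] x, double[] m, double[] s) {
    double sum = 0.0;
    for (int i = 0; i < x.length; i++) {
      double z = (x[i] - m[i]) / s[i];
      sum += -0.5 * z * z - Math.log(s[i]) - 0.5 * LOG_2_PI;
    }
    return sum;
  }

  // Exponentiate only at the very end. The result can still underflow to
  // 0.0 when the log density is below roughly -745.
  static double pdf(double[] x, double[] m, double[] s) {
    return Math.exp(logPdf(x, m, s));
  }

  // Naive product of per-dimension densities, for comparison only.
  static double naivePdf(double[] x, double[] m, double[] s) {
    double p = 1.0;
    for (int i = 0; i < x.length; i++) {
      double z = (x[i] - m[i]) / s[i];
      p *= Math.exp(-0.5 * z * z) / (s[i] * Math.sqrt(2.0 * Math.PI));
    }
    return p;
  }
}
```

One caveat worth checking: with around 50000 dimensions the log density itself can be far below the smallest exponent a double can represent, so the final exponentiation may still return 0.0. In that case only the log value carries usable information, for example when comparing clusters against each other.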

On Mon, Jun 27, 2011 at 9:03 PM, Ted Dunning <[email protected]> wrote:
> Actually, pdf() should always be a pdf(), not a logPdf(). Many algorithms
> want one or the other. Some don't much care because log is monotonic. But
> we should do what the name implies.
>
> On Mon, Jun 27, 2011 at 10:15 AM, Jeff Eastman <[email protected]> wrote:
>
> > A better approach would be to create a new Model and ModelDistribution that
> > uses log arithmetic of your choosing. The initial models are very simple
> > minded and are likely not adequate for real applications.
> >
> > -----Original Message-----
> > From: Ted Dunning [mailto:[email protected]]
> > Sent: Monday, June 27, 2011 7:51 AM
> > To: [email protected]
> > Subject: Re: Incorrect calculation of pdf
> >
> > There should not be a change to an existing method.
> >
> > It would be fine to add another method, perhaps called logPdf, that does
> > what you suggest. This loss of precision is common with the normal
> > distribution in high dimensions.
> >
> > On Mon, Jun 27, 2011 at 1:49 AM, Vasil Vasilev <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > Recently I wanted to use Dirichlet clustering algorithm to cluster vectors
> > > directly taken out of vectorized text, whose dimensionality was around
> > > 50000. In this situation the algorithm fails to calculate the pdf of a
> > > vector corresponding to cluster center due to problems with numerical
> > > precision during multiplication.
> > >
> > > In this regard, what do you think of modifying the GaussianCluster.pdf()
> > > method in such way that it works with logarithmic probabilities?
> > >
> > > Regards, Vasil
