Better to have a logPdf method that never does the exponentiation internally.
But other than that detail, yes.

On Tue, Jun 28, 2011 at 8:56 AM, Jeff Eastman <[email protected]> wrote:

> In other words, you plan to take the log(pdf) of each term in the model
> vectors, sum them and exponentiate the result? It would be interesting to
> compare the results.
>
> -----Original Message-----
> From: Vasil Vasilev [mailto:[email protected]]
> Sent: Tuesday, June 28, 2011 6:02 AM
> To: [email protected]
> Subject: Re: Incorrect calculation of pdf
>
> In fact my idea was very simple, although I do not know if it will work OK:
> Do all calculations on logarithmic level and just before return -
> exponentiate the result. This will not change the function's expected
> result
>
> On Mon, Jun 27, 2011 at 9:03 PM, Ted Dunning <[email protected]> wrote:
>
> > Actually, pdf() should always be a pdf(), not a logPdf(). Many algorithms
> > want one or the other. Some don't much care because log is monotonic. But
> > we should do what the name implies.
> >
> > On Mon, Jun 27, 2011 at 10:15 AM, Jeff Eastman <[email protected]> wrote:
> >
> > > A better approach would be to create a new Model and ModelDistribution
> > > that uses log arithmetic of your choosing. The initial models are very
> > > simple minded and are likely not adequate for real applications.
> > >
> > > -----Original Message-----
> > > From: Ted Dunning [mailto:[email protected]]
> > > Sent: Monday, June 27, 2011 7:51 AM
> > > To: [email protected]
> > > Subject: Re: Incorrect calculation of pdf
> > >
> > > There should not be a change to an existing method.
> > >
> > > It would be fine to add another method, perhaps called logPdf, that does
> > > what you suggest. This loss of precision is common with the normal
> > > distribution in high dimensions.
> > >
> > > On Mon, Jun 27, 2011 at 1:49 AM, Vasil Vasilev <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > Recently I wanted to use Dirichlet clustering algorithm to cluster
> > > > vectors directly taken out of vectorized text, whose dimensionality was
> > > > around 50000. In this situation the algorithm fails to calculate the pdf
> > > > of a vector corresponding to cluster center due to problems with
> > > > numerical precision during multiplication.
> > > >
> > > > In this regard, what do you think of modifying the GaussianCluster.pdf()
> > > > method in such way that it works with logarithmic probabilities?
> > > >
> > > > Regards, Vasil
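
To make the precision problem in this thread concrete, here is a rough sketch, not Mahout code. The class LogPdfSketch and its methods are hypothetical names, and it assumes a spherical Gaussian with a single sigma, which may not match what GaussianCluster actually does. It shows a logPdf that never exponentiates, a pdf wrapper that exponentiates only at the very end (and so still underflows at 50000 dimensions), and one possible way a caller could turn per-cluster log densities into normalized weights without leaving log space until the last step.

```java
final class LogPdfSketch {
    private static final double LOG_2PI = Math.log(2.0 * Math.PI);

    // Log density of N(mean, sigma^2 * I) at x. No exponentiation happens here,
    // so the result stays representable even for very large dimensions.
    // Assumption: spherical covariance with one shared sigma.
    static double logPdf(double[] x, double[] mean, double sigma) {
        int n = x.length;
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double d = (x[i] - mean[i]) / sigma;
            sumSq += d * d;
        }
        return -0.5 * sumSq - n * Math.log(sigma) - 0.5 * n * LOG_2PI;
    }

    // The "exponentiate just before return" variant. Same contract as a plain
    // pdf, but the final exp() still underflows to 0.0 in high dimensions.
    static double pdf(double[] x, double[] mean, double sigma) {
        return Math.exp(logPdf(x, mean, sigma));
    }

    // Hypothetical helper: converts per-cluster log densities into weights that
    // sum to 1 by subtracting the maximum before exponentiating (log-sum-exp).
    static double[] normalize(double[] logP) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logP) {
            max = Math.max(max, v);
        }
        double[] w = new double[logP.length];
        double total = 0.0;
        for (int i = 0; i < logP.length; i++) {
            w[i] = Math.exp(logP[i] - max);
            total += w[i];
        }
        for (int i = 0; i < w.length; i++) {
            w[i] /= total;
        }
        return w;
    }

    public static void main(String[] args) {
        int n = 50000;
        double[] x = new double[n];    // all zeros
        double[] mean = new double[n]; // x equals the cluster center
        System.out.println("logPdf = " + logPdf(x, mean, 1.0)); // roughly -45946.9
        System.out.println("pdf    = " + pdf(x, mean, 1.0));    // underflows to 0.0
    }
}
```

Even at the cluster center with sigma = 1, the log density is about -45946.9, far below the roughly -745 at which Math.exp stops returning a nonzero double. That is why exponentiating inside the method loses the information, while handing the log value to the caller keeps it.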
