A better approach would be to create a new Model and ModelDistribution that use log arithmetic of your choosing. The initial models are very simple-minded and are likely not adequate for real applications.
-----Original Message-----
From: Ted Dunning [mailto:[email protected]]
Sent: Monday, June 27, 2011 7:51 AM
To: [email protected]
Subject: Re: Incorrect calculation of pdf

There should not be a change to an existing method. It would be fine to add another method, perhaps called logPdf, that does what you suggest.

This loss of precision is common with the normal distribution in high dimensions.

On Mon, Jun 27, 2011 at 1:49 AM, Vasil Vasilev <[email protected]> wrote:

> Hi,
>
> Recently I wanted to use the Dirichlet clustering algorithm to cluster
> vectors taken directly from vectorized text, whose dimensionality was
> around 50000. In this situation the algorithm fails to calculate the pdf
> of a vector corresponding to a cluster center because of numerical
> precision problems during multiplication.
>
> In this regard, what do you think of modifying the GaussianCluster.pdf()
> method in such a way that it works with logarithmic probabilities?
>
> Regards,
> Vasil
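A minimal sketch of the logPdf idea, in Python rather than Mahout's Java, assuming a Gaussian with diagonal covariance (one standard deviation per dimension); whether this matches GaussianCluster's internal representation is an assumption. It compares a direct product of per-dimension densities with the equivalent sum of log densities. The names pdf and log_pdf are illustrative only, not Mahout API.

```python
import math


def pdf(x, mean, std):
    """Naive diagonal-Gaussian density: multiplies one factor per dimension."""
    p = 1.0
    for xi, mi, si in zip(x, mean, std):
        z = (xi - mi) / si
        p *= math.exp(-0.5 * z * z) / (si * math.sqrt(2.0 * math.pi))
    return p


def log_pdf(x, mean, std):
    """Same density in log space: adds one log-factor per dimension."""
    log_sqrt_2pi = 0.5 * math.log(2.0 * math.pi)
    total = 0.0
    for xi, mi, si in zip(x, mean, std):
        z = (xi - mi) / si
        total += -0.5 * z * z - math.log(si) - log_sqrt_2pi
    return total


if __name__ == "__main__":
    d = 50000
    center = [0.0] * d
    std = [1.0] * d
    print(pdf(center, center, std))      # 0.0: the product has underflowed
    print(log_pdf(center, center, std))  # about -45946.9
```

At the center of a 50000-dimensional unit Gaussian each factor is about 0.4, so the running product underflows to 0.0 after roughly 810 dimensions, while the log density remains an ordinary finite number that can still be compared across clusters.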
