On Wed, Oct 26, 2011 at 22:38, Robert Layton <[email protected]> wrote:
> On 27 October 2011 13:29, Alexandre Passos <[email protected]> wrote:
>>
>> On Wed, Oct 26, 2011 at 22:27, Alexandre Passos <[email protected]>
>> wrote:
>> > On Wed, Oct 26, 2011 at 22:15, Robert Layton <[email protected]>
>> > wrote:
>> >> I am trying to implement the Adjusted Mutual Information in a
>> >> stable way.
>> >> Unfortunately, the third term for the Expected Mutual Information
>> >> is not stable and can result in overflow issues with only a
>> >> moderate number of samples (eg N=1000 fails). See
>> >> here: http://en.wikipedia.org/wiki/Adjusted_mutual_information
>> >> I think I've reduced the equation to a more stable
>> >> format: https://github.com/robertlayton/scikit-learn/wiki/Reducing-EMI
>> >> I would appreciate if someone could look through this an check:
>> >> 1) That I did this correctly
>> >> 2) That there isn't a better way (a better identity or efficient
>> >> way to reduce factorials)
>> >
>> > Have you tried using scipy.special.gammaln, doing all the
>> > multiplications and divisions with additions and subtractions in
>> > logspace, and then exponentiating?
>>
>> And if this turns out to be too expensive you can probably get away
>> with stirling's approximation for log n!
>> http://en.wikipedia.org/wiki/Stirling%27s_approximation
>>
>>
>> --
>> - Alexandre
>>
>>
>> ------------------------------------------------------------------------------
>> The demand for IT networking professionals continues to grow, and the
>> demand for specialized networking skills is growing even more rapidly.
>> Take a complimentary Learning@Cisco Self-Assessment and learn
>> about Cisco certifications, training, and career opportunities.
>> http://p.sf.net/sfu/cisco-dev2dev
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
> That is an option. I wasn't sure how to use it though -> calculating the
> factorial isn't the issue, its working with the really large numbers that
> is. That is why I went with permutations, as the number should be lower.
Correct me if I'm wrong, but I'd say the problem is that your
computation, which should produce a reasonably small number, goes
through intermediate steps involving very large numbers that get
multiplied and divided by each other until something reasonable is
left. Working in logspace "squashes" these numbers into manageable
sizes: all the multiplications and divisions become additions and
subtractions, and when you exponentiate at the end you have a
reasonable number again. Most of your simplifications should still
apply in logspace, I think, and they could make it faster.

--
- Alexandre
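To make the logspace suggestion concrete, here is a minimal sketch. It
assumes the troublesome term is the factorial ratio
a! b! (N-a)! (N-b)! / (N! n! (a-n)! (b-n)! (N-a-b+n)!) from the EMI
definition on the Wikipedia page linked above; the thread itself does
not spell the term out, so treat that as an assumption. The helper
names log_factorial and factorial_ratio are made up for illustration
and are not part of scikit-learn or SciPy. Only scipy.special.gammaln
and numpy.exp are real library calls.

```python
import numpy as np
from scipy.special import gammaln


def log_factorial(n):
    """Return log(n!) using the identity n! = Gamma(n + 1)."""
    return gammaln(n + 1)


def factorial_ratio(a, b, n_ij, N):
    """Evaluate the assumed factorial ratio entirely in logspace.

    Products of factorials become sums of log-factorials and the
    division becomes a subtraction, so no huge intermediate value is
    ever formed. Only the final, moderate-sized result is exponentiated.
    """
    log_num = (log_factorial(a) + log_factorial(b)
               + log_factorial(N - a) + log_factorial(N - b))
    log_den = (log_factorial(N) + log_factorial(n_ij)
               + log_factorial(a - n_ij) + log_factorial(b - n_ij)
               + log_factorial(N - a - b + n_ij))
    return np.exp(log_num - log_den)


# Small case that can be checked by hand: the exact value is 10/21.
small = factorial_ratio(4, 5, 2, 10)

# N = 1000, where 1000! would overflow a double if computed directly.
large = factorial_ratio(300, 400, 120, 1000)
```

With this layout the largest number ever held is a log-factorial (a
few thousand for N = 1000), so the result stays finite. If gammaln
turns out to be the bottleneck, log_factorial is the single place
where Stirling's approximation could be swapped in.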
