On 27 October 2011 13:42, Alexandre Passos <[email protected]> wrote:

> On Wed, Oct 26, 2011 at 22:38, Robert Layton <[email protected]>
> wrote:
> > On 27 October 2011 13:29, Alexandre Passos <[email protected]> wrote:
> >>
> >> On Wed, Oct 26, 2011 at 22:27, Alexandre Passos <[email protected]>
> >> wrote:
> >> > On Wed, Oct 26, 2011 at 22:15, Robert Layton <[email protected]>
> >> > wrote:
> >> >> I am trying to implement the Adjusted Mutual Information in a stable
> >> >> way. Unfortunately, the third term for the Expected Mutual
> >> >> Information is not stable and can result in overflow issues with
> >> >> only a moderate number of samples (e.g. N=1000 fails). See
> >> >> here: http://en.wikipedia.org/wiki/Adjusted_mutual_information
> >> >> I think I've reduced the equation to a more stable format:
> >> >> https://github.com/robertlayton/scikit-learn/wiki/Reducing-EMI
> >> >> I would appreciate it if someone could look through this and check:
> >> >> 1) That I did this correctly
> >> >> 2) That there isn't a better way (a better identity or efficient way
> >> >> to reduce factorials)
> >> >
> >> > Have you tried using scipy.special.gammaln, doing all the
> >> > multiplications and divisions with additions and subtractions in
> >> > logspace, and then exponentiating?
> >>
> >> And if this turns out to be too expensive you can probably get away
> >> with Stirling's approximation for log n!
> >> http://en.wikipedia.org/wiki/Stirling%27s_approximation
> >>
> >>
> >> --
> >>  - Alexandre
> >>
> >>
> >>
> ------------------------------------------------------------------------------
> >> The demand for IT networking professionals continues to grow, and the
> >> demand for specialized networking skills is growing even more rapidly.
> >> Take a complimentary Learning@Cisco Self-Assessment and learn
> >> about Cisco certifications, training, and career opportunities.
> >> http://p.sf.net/sfu/cisco-dev2dev
> >> _______________________________________________
> >> Scikit-learn-general mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >
> >
> > That is an option. I wasn't sure how to use it, though: calculating the
> > factorial isn't the issue; it's working with the really large numbers that
> > is. That is why I went with permutations, as the number should be lower.
>
> Correct me if I'm wrong, but I'd say the problem is that your
> computation, which should produce a reasonably small number, has
> intermediate steps that involve very big numbers, which will be
> multiplied and divided with each other until something reasonable is
> left. So working in logspace will "squash" these numbers into
> manageable sizes and after all the multiplications and divisions
> (which will be additions and subtractions) let you have reasonable
> numbers again. Most of your simplifications can still apply in
> logspace, I think, and they could make it faster.
>
> --
>  - Alexandre
>
>
>


You have it right. I haven't done this level of algebra for a while, so
I'll need to work it out.
If I understand correctly, I should be able to compute:

log(a!) + log(b!) + log((N-a)!) + log((N-b)!) - log(N!) - log(n!) -
log((a-n)!) - log((b-n)!) - log((N-a-b+n)!),

then apply Stirling and simplify?
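
As a rough sketch of the log-space idea above (not a final implementation):
the expression can be evaluated with a log-gamma function, since
log(k!) = lgamma(k + 1). The version below uses Python's standard-library
math.lgamma as a scalar stand-in for scipy.special.gammaln. The function
names and the values of a, b, n and N are hypothetical, chosen only for
illustration. A Stirling-based variant is included for comparison; here it
keeps the 0.5*log(2*pi*k) correction term and special-cases k = 0, which is
one choice among several.

```python
import math


def log_factorial(k):
    """Return log(k!) using the identity k! = Gamma(k + 1)."""
    return math.lgamma(k + 1)


def stirling_log_factorial(k):
    """Stirling's approximation to log(k!), with the exact value for k = 0."""
    if k == 0:
        return 0.0
    return k * math.log(k) - k + 0.5 * math.log(2 * math.pi * k)


def log_emi_factorial_term(a, b, n, N, logfact=log_factorial):
    """Log of a! b! (N-a)! (N-b)! / (N! n! (a-n)! (b-n)! (N-a-b+n)!).

    Every product becomes a sum and every division a subtraction, so no
    intermediate value is larger than log(N!).
    """
    return (logfact(a) + logfact(b) + logfact(N - a) + logfact(N - b)
            - logfact(N) - logfact(n) - logfact(a - n) - logfact(b - n)
            - logfact(N - a - b + n))


# Hypothetical cluster sizes (a, b), overlap n and sample count N.
a, b, n, N = 400, 300, 120, 1000

exact = log_emi_factorial_term(a, b, n, N)
approx = log_emi_factorial_term(a, b, n, N, logfact=stirling_log_factorial)

# Exponentiate only at the very end, once the result is small again.
term = math.exp(exact)
print(exact, approx, term)
```

Computing the factorials directly in double precision would fail here,
because anything above 170! no longer fits in a float. In log space the
largest intermediate value is log(1000!), which is only a few thousand.
Whether Stirling is accurate enough probably depends on how small the
arguments can get: the approximation is weakest for small k, so it may be
worth checking it against the lgamma version before relying on it.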

-- 


My public key can be found at: http://pgp.mit.edu/
Search for this email address and select the key from "2011-08-19" (key id:
54BA8735)
Older keys can be used, but please inform me beforehand (and update when
possible!)