On Sat, Oct 15, 2011 at 3:57 PM, Pietro Berkes <[email protected]> wrote:
> I wish there was a native numpy function for this case, which is
> fairly common in information theory quantities.
> As a workaround, I sometimes use these reasonably efficient utility functions:
>
> def log0(x):
>     """Robust 'entropy' logarithm: log(0.) = 0."""
>     return np.where(x==0., 0., np.log(x))
>
>
> def log0_no_warning(x):
>     """Robust 'entropy' logarithm: log(0.) = 0.
>
>     This version does not raise any warning when values of x=0. are first
>     encountered. However, it is slightly less efficient."""
>     with np.errstate(divide='ignore'):
>         res = np.where(x==0., 0., np.log(x))
>     return res
>
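For illustration, here is a minimal sketch of how a log0-style helper might be used to compute Shannon entropy when some probabilities are exactly zero. It assumes float inputs; the `entropy` function is a hypothetical helper written for this example, not a numpy or scikit-learn API. Because the helper maps log(0.) to 0., each zero-probability term p * log0(p) becomes 0. * 0. = 0., which matches the usual convention 0 log 0 = 0.

```python
import numpy as np


def log0_no_warning(x):
    """Logarithm with the information-theory convention log(0.) = 0."""
    x = np.asarray(x, dtype=float)
    # np.log is still evaluated at the zeros (producing -inf), so the
    # divide-by-zero warning is silenced; np.where then puts 0. there.
    with np.errstate(divide='ignore'):
        return np.where(x == 0., 0., np.log(x))


def entropy(p):
    """Hypothetical helper: Shannon entropy, in nats, of a probability vector."""
    p = np.asarray(p, dtype=float)
    # Outcomes with p == 0 contribute p * log0(p) = 0. * 0. = 0.
    return -np.sum(p * log0_no_warning(p))


p = np.array([0.5, 0.5, 0.0])
print(entropy(p))  # about 0.6931, i.e. log(2)
```

Note that np.where evaluates both branches, so np.log(x) is still computed at the zeros. That is why the errstate context is needed to keep the warning quiet.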

I think the function is quite dangerous if you take it out of the context
of information measures:

>>> np.log(0)
-inf

The equivalent functions that I used were all for xlogy:

res = np.where(x==0., 0., x*np.log(y))

Just my 2c from other packages.

Josef

>
>
> On Fri, Oct 14, 2011 at 10:31 AM, Olivier Grisel
> <[email protected]> wrote:
>> 2011/10/14 Robert Layton <[email protected]>:
>>> I'm working on adding Adjusted Mutual Information, and need to calculate the
>>> Mutual Information.
>>> I think I have the algorithm itself correct, except for the fact that
>>> whenever an entry of the contingency matrix is 0, a nan appears and
>>> propagates through the code.
>>>
>>> Sample code on the net [1] uses an eps=np.finfo(float).eps. Should I do
>>> this, adding eps to anything that is a denominator or parameter to log?
>>> Is there a better way?
>>
>> I would rather filter out any entry that has a 0.0 in the denominator
>> before the final sum using array masking.
>>
>> BTW, thanks for tackling this.
>>
>> --
>> Olivier
>> http://twitter.com/ogrisel - http://github.com/ogrisel
>>
>> ------------------------------------------------------------------------------
>> All the data continuously generated in your IT infrastructure contains a
>> definitive record of customers, application performance, security
>> threats, fraudulent activity and more. Splunk takes this data and makes
>> sense of it. Business sense. IT sense. Common sense.
>> http://p.sf.net/sfu/splunk-d2d-oct
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
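Josef's point can be made concrete. A bare log0 silently turns log(0) = -inf into 0, which is only correct when the logarithm is multiplied by a factor that is itself zero. The sketch below implements the xlogy form from his reply, assuming float array inputs. The function name here is chosen for illustration, although recent SciPy releases do provide `scipy.special.xlogy` with the same convention.

```python
import numpy as np


def xlogy(x, y):
    """Compute x * log(y), with the result defined as 0. wherever x == 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Where x == 0 and y == 0, x * np.log(y) is 0 * -inf = nan; silence the
    # resulting warnings, since np.where discards those entries anyway.
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(x == 0., 0., x * np.log(y))


p = np.array([0.5, 0.25, 0.25, 0.0])
print(-np.sum(xlogy(p, p)))  # entropy, 1.5 * log(2), about 1.0397

# The special case only covers a zero prefactor: log(0) itself stays -inf.
print(xlogy(1.0, 0.0))  # -inf
```

Tying the zero case to the prefactor rather than to the argument of the log keeps a genuine log(0) visible as -inf outside of the x * log(y) expressions used in information measures.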

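Olivier's suggestion is to mask out empty cells before the final sum instead of adding eps. The sketch below applies it to the plain (not adjusted) mutual information of two labelings, MI = sum over cells with n_ij > 0 of (n_ij / N) * log(N * n_ij / (a_i * b_j)), where a_i and b_j are the row and column sums. The helpers `contingency_matrix` and `mutual_info_from_contingency` are hypothetical names written for this example, not scikit-learn's implementation.

```python
import numpy as np


def contingency_matrix(labels_a, labels_b):
    """Hypothetical helper: counts n_ij of samples labeled i in a and j in b."""
    classes_a, ia = np.unique(labels_a, return_inverse=True)
    classes_b, ib = np.unique(labels_b, return_inverse=True)
    counts = np.zeros((len(classes_a), len(classes_b)))
    # np.add.at accumulates correctly even when an (i, j) pair repeats.
    np.add.at(counts, (ia, ib), 1)
    return counts


def mutual_info_from_contingency(counts):
    """Mutual information (in nats), summing only over non-empty cells."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    row = counts.sum(axis=1)  # a_i
    col = counts.sum(axis=0)  # b_j
    # Keep only cells with n_ij > 0. Any such cell also has a_i > 0 and
    # b_j > 0, so neither the division nor the log ever sees a zero.
    nz_i, nz_j = np.nonzero(counts)
    nij = counts[nz_i, nz_j]
    terms = (nij / n) * np.log(n * nij / (row[nz_i] * col[nz_j]))
    return terms.sum()


a = [0, 0, 1, 1]
# Identical labelings: the contingency matrix has zeros off the diagonal.
print(mutual_info_from_contingency(contingency_matrix(a, [0, 0, 1, 1])))  # log(2), about 0.6931
# Independent labelings.
print(mutual_info_from_contingency(contingency_matrix(a, [0, 1, 0, 1])))  # 0.0
```

Adding eps would also avoid the nan, but it perturbs every term slightly. Masking gives each empty cell exactly its limiting contribution, since x log x tends to 0 as x tends to 0.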