On Sat, Oct 15, 2011 at 4:12 PM, Pietro Berkes <[email protected]> wrote:
> On Sat, Oct 15, 2011 at 9:07 PM, <[email protected]> wrote:
>> On Sat, Oct 15, 2011 at 3:57 PM, Pietro Berkes <[email protected]> wrote:
>>> I wish there was a native numpy function for this case, which is
>>> fairly common in information theory quantities.
>>> As a workaround, I sometimes use these reasonably efficient utility
>>> functions:
>>>
>>> def log0(x):
>>>     """Robust 'entropy' logarithm: log(0.) = 0."""
>>>     return np.where(x==0., 0., np.log(x))
>>>
>>>
>>> def log0_no_warning(x):
>>>     """Robust 'entropy' logarithm: log(0.) = 0.
>>>
>>>     This version does not raise any warning when values of x=0. are first
>>>     encountered. However, it is slightly less efficient."""
>>>     with np.errstate(divide='ignore'):
>>>         res = np.where(x==0., 0., np.log(x))
>>>     return res
>>>
>>
>> I think the function is quite dangerous if you take it out of the
>> context of information measures
>>
>> >>> np.log(0)
>> -inf
>>
>> The equivalent functions that I used were all for xlogy
>>
>> res = np.where(x==0., 0., x*np.log(y))
>>
>>
>> Just my 2c from other packages.
>
> Well it is useful in other contexts, e.g. to compute the log pdf of a
> beta distribution:
>
> from scipy.special import gammaln
>
> def log_beta_pdf(x, a, b):
>     """Return the natural logarithm of the Beta(a,b) distribution at x."""
>     return (gammaln(a+b) - gammaln(a) - gammaln(b)
>             + (a-1.)*log0(x) + (b-1.)*log0(1.-x))
not here:

>>> from scipy import stats
>>> stats.beta._logpdf(0, 0.5, 0.5)
inf
>>> stats.beta._logpdf(1e-15, 0.5, 0.5)
16.124658311605941
>>> stats.beta._logpdf(1e-30, 0.5, 0.5)
33.394046509061283
>>> stats.beta._logpdf(1e-100, 0.5, 0.5)
113.98452476385289
>>> stats.beta._logpdf(1e-500, 0.5, 0.5)
inf
>>> stats.beta._logpdf(1e-300, 0.5, 0.5)
344.24303406325743

The 0*log(0) case arises only if a=1 or b=1 and x is 0 or 1

or gamma: https://github.com/scipy/scipy/pull/5

(bug in scipy 0.9:

>>> stats.beta._logpdf(1e-300, 1, 0.5)
-0.69314718055994529
>>> stats.beta._logpdf(0, 1, 0.5)
nan
>>> np.log(stats.beta._pdf(0, 1, 0.5))
-0.69314718055994529
)

Josef

>
> I agree that it could have a more explicit name, like entropy_log(x).
>
>>
>> Josef
>>
>>>
>>> On Fri, Oct 14, 2011 at 10:31 AM, Olivier Grisel
>>> <[email protected]> wrote:
>>>> 2011/10/14 Robert Layton <[email protected]>:
>>>>> I'm working on adding Adjusted Mutual Information, and need to
>>>>> calculate the Mutual Information.
>>>>> I think I have the algorithm itself correct, except for the fact that
>>>>> whenever the contingency matrix is 0, a nan happens and propagates
>>>>> through the code.
>>>>>
>>>>> Sample code on the net [1] uses an eps=np.finfo(float).eps. Should I do
>>>>> this, adding eps to anything that is a denominator or parameter to log?
>>>>> Is there a better way?
>>>>
>>>> I would rather filter out any entry that has a 0.0 in the denominator
>>>> before the final sum using array masking.
>>>>
>>>> BTW, thanks for tackling this.
>>>>
>>>> --
>>>> Olivier
>>>> http://twitter.com/ogrisel - http://github.com/ogrisel
>>>>
>>>> ------------------------------------------------------------------------------
>>>> All the data continuously generated in your IT infrastructure contains a
>>>> definitive record of customers, application performance, security
>>>> threats, fraudulent activity and more. Splunk takes this data and makes
>>>> sense of it. Business sense. IT sense. Common sense.
>>>> http://p.sf.net/sfu/splunk-d2d-oct
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
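
For reference, the masking approach Olivier suggests for the contingency matrix can be sketched as below. This is only a minimal illustration, not code from the thread or from scikit-learn: the function names mutual_information and mutual_information_xlogy are hypothetical, and the second variant assumes a SciPy version that provides scipy.special.xlogy (newer than the 0.9 release discussed above). Both versions avoid evaluating log(0) instead of adding eps to every denominator.

```python
import numpy as np
from scipy.special import xlogy


def mutual_information(contingency):
    """Mutual information in nats, masking out empty cells before the sum.

    Hypothetical helper: `contingency` is a 2-D array of non-negative counts.
    """
    counts = np.asarray(contingency, dtype=float)
    pij = counts / counts.sum()              # joint distribution
    pi = pij.sum(axis=1, keepdims=True)      # row marginals, shape (r, 1)
    pj = pij.sum(axis=0, keepdims=True)      # column marginals, shape (1, c)
    outer = pi * pj                          # product of marginals, shape (r, c)
    mask = pij > 0                           # only non-empty cells contribute
    # Where pij > 0 both marginals are > 0, so neither the division nor the
    # log ever sees a zero.
    return float(np.sum(pij[mask] * np.log(pij[mask] / outer[mask])))


def mutual_information_xlogy(contingency):
    """Same quantity, using xlogy(x, y) = x * log(y) with the value 0 at x == 0."""
    counts = np.asarray(contingency, dtype=float)
    pij = counts / counts.sum()
    outer = pij.sum(axis=1, keepdims=True) * pij.sum(axis=0, keepdims=True)
    return float(np.sum(xlogy(pij, pij) - xlogy(pij, outer)))


if __name__ == "__main__":
    dependent = [[2, 0], [0, 2]]      # labels agree perfectly
    independent = [[1, 1], [1, 1]]    # labels carry no information
    print(mutual_information(dependent))
    print(mutual_information_xlogy(dependent))
    print(mutual_information(independent))
```

Unlike a general-purpose log0, both helpers apply the 0*log(0) = 0 convention only to the x*log(y) terms of the sum, which is the restricted use Josef argues for above.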
