On Sat, Oct 15, 2011 at 9:07 PM,  <[email protected]> wrote:
> On Sat, Oct 15, 2011 at 3:57 PM, Pietro Berkes <[email protected]> wrote:
>> I wish there was a native numpy function for this case, which is
>> fairly common in information theory quantities.
>> As a workaround, I sometimes use these reasonably efficient utility
>> functions:
>>
>> def log0(x):
>>    """Robust 'entropy' logarithm: log(0.) = 0."""
>>    return np.where(x==0., 0., np.log(x))
>>
>>
>> def log0_no_warning(x):
>>    """Robust 'entropy' logarithm: log(0.) = 0.
>>
>>    This version does not emit any warning when values of x=0. are
>>    encountered. However, it is slightly less efficient."""
>>    with np.errstate(divide='ignore'):
>>        res = np.where(x==0., 0., np.log(x))
>>    return res
>>
>
> I think the function is quite dangerous if you take it out of the
> context of information measures
>
>>>> np.log(0)
> -inf
>
> The equivalent functions that I used were all for xlogy
>
> res = np.where(x==0., 0., x*np.log(y))
>
>
> Just my 2c from other packages.

Well, it is useful in other contexts, e.g. to compute the log pdf of a
beta distribution:

from scipy.special import gammaln

def log_beta_pdf(x, a, b):
    """Return the natural logarithm of the Beta(a,b) distribution at x."""
    return (gammaln(a+b) - gammaln(a) - gammaln(b)
            + (a-1.)*log0(x) + (b-1.)*log0(1.-x))
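
For comparison, here is a minimal sketch of the same formula written with
scipy.special.xlogy and xlog1py, which were added to SciPy after this thread
and so are an assumption about the installed version. The name
log_beta_pdf_xlogy is hypothetical. Since log0(0.) returns 0, the expression
above gives a finite value at x = 0 even when a > 1, where the density is 0.
xlogy(c, x) applies the zero convention only when the coefficient c is itself
0, so that case still comes out as -inf:

```python
import numpy as np
from scipy.special import gammaln, xlogy, xlog1py


def log_beta_pdf_xlogy(x, a, b):
    """Log of the Beta(a, b) density at x, for 0 <= x <= 1.

    xlogy(c, x) is c*log(x), except that it is 0 whenever c == 0,
    so the zero convention applies only where the coefficient vanishes.
    xlog1py(c, -x) is the analogous c*log(1 - x).
    """
    return (gammaln(a + b) - gammaln(a) - gammaln(b)
            + xlogy(a - 1., x) + xlog1py(b - 1., -x))


# Beta(1, 3) has density 3 at x = 0, so the log density is log(3).
print(log_beta_pdf_xlogy(0., 1., 3.))
# Beta(2, 3) has density 0 at x = 0, so the log density is -inf.
print(log_beta_pdf_xlogy(0., 2., 3.))
```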

I agree that it could have a more explicit name, like entropy_log(x).
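
A possible shape for such a function, as a sketch only: it assumes a NumPy
version whose ufuncs accept the where= and out= arguments, and it skips the
zero entries instead of computing log(0) and discarding it, so no warning is
emitted and no errstate block is needed. Negative inputs are deliberately not
masked, so they still produce nan.

```python
import numpy as np


def entropy_log(x):
    """Elementwise log(x) with the convention log(0) = 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    # Evaluate the log only at nonzero entries; zeros keep the 0 from `out`.
    np.log(x, out=out, where=(x != 0))
    return out


# Usage: Shannon entropy (in nats) of a distribution with an empty bin.
p = np.array([0.5, 0.5, 0.0])
entropy = -np.sum(p * entropy_log(p))
```

The explicit name keeps the log(0) = 0 convention confined to places where
the term is multiplied by a zero probability.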




>
> Josef
>
>>
>>
>> On Fri, Oct 14, 2011 at 10:31 AM, Olivier Grisel
>> <[email protected]> wrote:
>>> 2011/10/14 Robert Layton <[email protected]>:
>>>> I'm working on adding Adjusted Mutual Information, and need to calculate
>>>> the Mutual Information.
>>>> I think I have the algorithm itself correct, except for the fact that
>>>> whenever the contingency matrix is 0, a nan happens and propagates through
>>>> the code.
>>>>
>>>> Sample code on the net [1] uses an eps=np.finfo(float).eps. Should I do
>>>> this, adding eps to anything that is a denominator or parameter to log?
>>>> Is there a better way?
>>>
>>> I would rather filter out any entry that has a 0.0 in the denominator
>>> before the final sum using array masking.
>>>
>>> BTW, thanks for tackling this.
>>>
>>> --
>>> Olivier
>>> http://twitter.com/ogrisel - http://github.com/ogrisel
>>>
>>> ------------------------------------------------------------------------------
>>> All the data continuously generated in your IT infrastructure contains a
>>> definitive record of customers, application performance, security
>>> threats, fraudulent activity and more. Splunk takes this data and makes
>>> sense of it. Business sense. IT sense. Common sense.
>>> http://p.sf.net/sfu/splunk-d2d-oct
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>
>>
>
>
