On Sat, Oct 15, 2011 at 4:12 PM, Pietro Berkes <[email protected]> wrote:
> On Sat, Oct 15, 2011 at 9:07 PM,  <[email protected]> wrote:
>> On Sat, Oct 15, 2011 at 3:57 PM, Pietro Berkes <[email protected]> wrote:
>>> I wish there was a native numpy function for this case, which is
>>> fairly common in information theory quantities.
>>> As a workaround, I sometimes use these reasonably efficient utility 
>>> functions:
>>>
>>> def log0(x):
>>>    """Robust 'entropy' logarithm: log(0.) = 0."""
>>>    return np.where(x==0., 0., np.log(x))
>>>
>>>
>>> def log0_no_warning(x):
>>>    """Robust 'entropy' logarithm: log(0.) = 0.
>>>
>>>    This version does not raise any warning when values of x=0. are first
>>>    encountered. However, it is slightly less efficient."""
>>>    with np.errstate(divide='ignore'):
>>>        res = np.where(x==0., 0., np.log(x))
>>>    return res
>>>
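In NumPy releases where ufuncs accept the `out=` and `where=` keyword arguments, the same log(0.) = 0. convention can be had without evaluating log(0) at all. That means no divide warning is raised and no `errstate` block is needed. This is a minimal sketch that assumes a NumPy version with those ufunc keywords. The name `log0_masked` is hypothetical.

```python
import numpy as np


def log0_masked(x):
    """Robust 'entropy' logarithm: log(0.) = 0., without a divide warning.

    Hypothetical helper; assumes a NumPy whose ufuncs accept out= and where=.
    """
    x = np.asarray(x, dtype=float)
    # Pre-fill the result with zeros; entries where x == 0 keep this value.
    out = np.zeros_like(x)
    # Take the log only where x != 0, so log(0) is never evaluated.
    np.log(x, out=out, where=(x != 0.))
    return out
```

Negative inputs still produce nan, just as in `log0`. Only the zero entries are treated specially.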
>>
>> I think the function is quite dangerous if you take it out of the
>> context of information measures, where normally:
>>
>>>>> np.log(0)
>> -inf
>>
>> The equivalent functions that I have used were all for xlogy:
>>
>> res = np.where(x==0., 0., x*np.log(y))
>>
>>
>> Just my 2c from other packages.
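SciPy later added `scipy.special.xlogy` for exactly this product. The pure-NumPy sketch below uses a hypothetical name, `xlogy0`, and applies the same rule. The product x*log(y) is defined as 0 wherever x == 0. Everywhere else log(y) is left alone, so a nonzero x paired with y == 0 still gives -inf instead of being silently zeroed.

```python
import numpy as np


def xlogy0(x, y):
    """Return x*log(y) with the convention 0*log(y) = 0 (including y = 0).

    Hypothetical helper; not the SciPy implementation.
    """
    x, y = np.broadcast_arrays(np.asarray(x, dtype=float),
                               np.asarray(y, dtype=float))
    out = np.zeros(x.shape)
    nz = (x != 0.)
    # Only evaluate the log where the coefficient is nonzero; a nonzero x
    # with y == 0 still yields -inf (or +inf for negative x).
    with np.errstate(divide='ignore'):
        out[nz] = x[nz] * np.log(y[nz])
    return out
```

Unlike `log0`, this keeps the divergence whenever the coefficient is nonzero. That is the distinction drawn above.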
>
> Well it is useful in other contexts, e.g. to compute the log pdf of a
> beta distribution:
>
> from scipy.special import gammaln
>
> def log_beta_pdf(x, a, b):
>    """Return the natural logarithm of the Beta(a,b) distribution at x."""
>    return (gammaln(a+b) - gammaln(a) - gammaln(b)
>            + (a-1.)*log0(x) + (b-1.)*log0(1.-x))

Not here, log0 gives a finite value where the log pdf is infinite:

>>> from scipy import stats
>>> stats.beta._logpdf(0, 0.5, 0.5)
inf
>>> stats.beta._logpdf(1e-15, 0.5, 0.5)
16.124658311605941
>>> stats.beta._logpdf(1e-30, 0.5, 0.5)
33.394046509061283
>>> stats.beta._logpdf(1e-100, 0.5, 0.5)
113.98452476385289
>>> stats.beta._logpdf(1e-500, 0.5, 0.5)
inf
>>> stats.beta._logpdf(1e-300, 0.5, 0.5)
344.24303406325743

0*log(0) only comes up if a=1 (at x=0) or b=1 (at x=1)

or gamma: https://github.com/scipy/scipy/pull/5

(bug in scipy 0.9:
>>> stats.beta._logpdf(1e-300, 1, 0.5)
-0.69314718055994529
>>> stats.beta._logpdf(0, 1, 0.5)
nan
>>> np.log(stats.beta._pdf(0, 1, 0.5))
-0.69314718055994529
)
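Applying the zero rule to the product (a-1)*log(x) instead of to log(x) itself handles both cases above. The a=1 boundary gives the finite value, and a=0.5 at x=0 still gives inf. This is a minimal scalar sketch that uses the standard-library `math.lgamma` in place of `scipy.special.gammaln`. `xlogy_scalar` and this `log_beta_pdf` are hypothetical helpers, not SciPy functions.

```python
import math


def xlogy_scalar(c, y):
    """Hypothetical helper: c*log(y) with the convention 0*log(y) = 0."""
    if c == 0.0:
        return 0.0
    if y == 0.0:
        # log(0) = -inf; the sign of the coefficient decides the limit.
        return -math.inf if c > 0 else math.inf
    return c * math.log(y)


def log_beta_pdf(x, a, b):
    """Log of the Beta(a, b) density at x in [0, 1] (hypothetical sketch)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + xlogy_scalar(a - 1.0, x) + xlogy_scalar(b - 1.0, 1.0 - x)
```

With this, `log_beta_pdf(0, 1, 0.5)` equals log(0.5), matching `np.log(stats.beta._pdf(0, 1, 0.5))` above. `log_beta_pdf(0, 0.5, 0.5)` stays at inf.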

Josef

>
> I agree that it could have a more explicit name, like entropy_log(x).
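The setting where the 0*log(0) = 0 convention is unambiguous is the entropy of a discrete distribution. This is a minimal sketch that assumes `p` is already a normalized probability vector. The name `entropy_nats` is hypothetical. It masks the zero-probability outcomes rather than redefining log(0).

```python
import numpy as np


def entropy_nats(p):
    """Shannon entropy (natural log) of a probability vector p.

    Hypothetical helper; uses the convention 0*log(0) = 0 by dropping
    zero-probability outcomes before the sum.
    """
    p = np.asarray(p, dtype=float)
    nz = p > 0  # zero-probability outcomes contribute nothing
    return -np.sum(p[nz] * np.log(p[nz]))
```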
>
>
>
>
>>
>> Josef
>>
>>>
>>>
>>> On Fri, Oct 14, 2011 at 10:31 AM, Olivier Grisel
>>> <[email protected]> wrote:
>>>> 2011/10/14 Robert Layton <[email protected]>:
>>>>> I'm working on adding Adjusted Mutual Information, and need to calculate
>>>>> the Mutual Information.
>>>>> I think I have the algorithm itself correct, except that whenever an entry
>>>>> of the contingency matrix is 0, a nan appears and propagates through the
>>>>> code.
>>>>>
>>>>> Sample code on the net [1] uses an eps=np.finfo(float).eps. Should I do
>>>>> this, adding eps to anything that is a denominator or parameter to log?
>>>>> Is there a better way?
>>>>
>>>> I would rather filter out any entry that has a 0.0 in the denominator
>>>> before the final sum using array masking.
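This is a minimal sketch of the masking approach. It assumes the input is a 2-D array of nonnegative counts. The function name is hypothetical, and this is not the code that was eventually merged into scikit-learn.

```python
import numpy as np


def mutual_info_from_contingency(contingency):
    """Mutual information (in nats) from a contingency matrix of counts.

    Hypothetical helper. Cells with a zero count are masked out before the
    final sum, which is the same as using the convention 0*log(0) = 0.
    """
    c = np.asarray(contingency, dtype=float)
    n = c.sum()
    pij = c / n                   # joint probabilities
    pi = pij.sum(axis=1)          # row marginals
    pj = pij.sum(axis=0)          # column marginals
    outer = np.outer(pi, pj)      # product of marginals
    nz = pij > 0                  # mask of nonzero cells
    # A nonzero cell implies nonzero marginals, so outer[nz] > 0 and
    # neither the division nor the log can hit zero.
    return np.sum(pij[nz] * np.log(pij[nz] / outer[nz]))
```

Compared with adding `eps`, masking drops exactly the terms that are 0 in the limit and leaves every other term unchanged.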
>>>>
>>>> BTW, thanks for tackling this.
>>>>
>>>> --
>>>> Olivier
>>>> http://twitter.com/ogrisel - http://github.com/ogrisel
>>>>
>>>> ------------------------------------------------------------------------------
>>>> All the data continuously generated in your IT infrastructure contains a
>>>> definitive record of customers, application performance, security
>>>> threats, fraudulent activity and more. Splunk takes this data and makes
>>>> sense of it. Business sense. IT sense. Common sense.
>>>> http://p.sf.net/sfu/splunk-d2d-oct
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>