On 03/08/2012 05:41 PM, Nir Krakauer wrote:
> With the current package version (2.5.2) running in Octave 3.6.1, cov
> may return infinite covariances when there is only a single non-NaN
> overlap between two data series. For example,
>
> C = cov([NaN 1 2 3 4; 1 NaN NaN NaN 2]')
>
> returns
>
> C =
>
>     1.66667       Inf
>         Inf   0.50000
>
>
> I assume that is not the intended outcome?


Hi Nir,

that is a tricky question to answer. The brief answer is: the result
itself is not something I intended, but the behavior of cov() that
produces it is intended. Let me explain:

The Inf is caused by the default normalization with (N-1) in cov().
The two series overlap in only one sample, so N = 1 for the
off-diagonal elements and the division by N-1 = 0 yields Inf. You can
avoid this by using the normalization with N, as documented:

C = cov([NaN 1 2 3 4; 1 NaN NaN NaN 2]', 1)
C =

    1.25000   0.75000
    0.75000   0.25000
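
The numbers above can be reproduced with a small sketch. It is written
in Python only to keep it self-contained, and it is not the toolbox
code: pairwise_cov and its ddof parameter are hypothetical names, and
the centering rule (each series centered with the mean of its own
non-NaN values, products summed over jointly available rows) is an
assumption that happens to match the output shown here.

```python
import math

NAN = float("nan")

def pairwise_cov(x, y, ddof=1):
    """Covariance of x and y, skipping NaNs (hypothetical helper)."""
    # Assumption: each series is centered with the mean of its own
    # non-NaN values, not with the mean over the shared rows.
    xs = [v for v in x if not math.isnan(v)]
    ys = [v for v in y if not math.isnan(v)]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Products are summed only over rows where both values are present.
    prods = [(a - mx) * (b - my) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(prods)
    s = sum(prods)
    denom = n - ddof  # ddof=1 gives N-1, ddof=0 gives N
    if denom == 0:
        # Mimic IEEE division: s/0 is +-Inf, 0/0 is NaN.
        return NAN if s == 0 else math.copysign(math.inf, s)
    return s / denom

x = [NAN, 1.0, 2.0, 3.0, 4.0]
y = [1.0, NAN, NAN, NAN, 2.0]

for ddof in (1, 0):
    C = [[pairwise_cov(a, b, ddof) for b in (x, y)] for a in (x, y)]
    print("ddof =", ddof, C)
```

With ddof=1 the off-diagonal entry is 0.75/0 = Inf; with ddof=0 it is
0.75/1 = 0.75, and the diagonal becomes 1.25 and 0.25.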

In any case, cov() behaves rather strangely for data containing
missing values. Unlike for data w/o missing values, there is no
guarantee that the outcome of cov() is "positive definite", or that
the magnitude of the normalized elements (the implied correlations)
never exceeds 1.

det(C)
   ans = -0.25000
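
As a quick check of both properties, the sketch below (plain Python,
using only the matrix values printed above) computes the determinant
and the correlation implied by C. For a symmetric 2x2 matrix with
positive diagonal, a negative determinant means the matrix is not
positive semidefinite.

```python
import math

# Values copied from the N-normalized cov() output above.
C = [[1.25, 0.75],
     [0.75, 0.25]]

# Determinant of a 2x2 matrix.
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]

# Correlation implied by C: off-diagonal divided by the product of
# the standard deviations taken from the diagonal.
r = C[0][1] / math.sqrt(C[0][0] * C[1][1])

print("det =", det)
print("implied correlation =", r)
```

The determinant is -0.25 and the implied correlation is about 1.34,
i.e. larger than 1.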

In the NaN-tb, the nice properties we know from data w/o missing values  
(positive definiteness, and magnitude of elements <=1) are maintained by 
the function corrcoef().

C = corrcoef([NaN 1 2 3 4; 1 NaN NaN NaN 2]'*10)
C =

      1   NaN
    NaN     1

Here, NaN indicates a 0/0 case, just as the std(x) of a single element x
is sqrt(sum((x-mean(x)).^2)/(N-1)) = sqrt(0/0) = NaN, i.e. an undefined
value. (Note that this does not mean the off-diagonals can lie outside
the interval [-1,1]; it means the value is undetermined and could be
anything within [-1,1].)
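
To illustrate where the 0/0 comes from, here is a minimal Python
sketch of a correlation computed from the jointly available rows only.
This is an assumption about how such a result can arise, not the
toolbox implementation, and pairwise_corr is a hypothetical name.

```python
import math

NAN = float("nan")

def pairwise_corr(x, y):
    """Correlation from jointly available rows only (hypothetical helper)."""
    # Keep only rows where neither value is NaN.
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n == 0:
        return NAN
    # Means and sums of squares are taken over the shared rows.
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    denom = math.sqrt(sxx * syy)
    if denom == 0:
        # With a single shared row, sxy = sxx = syy = 0: a 0/0 case.
        return NAN
    return sxy / denom

x = [NAN, 10.0, 20.0, 30.0, 40.0]
y = [10.0, NAN, NAN, NAN, 20.0]

R = [[pairwise_corr(a, b) for b in (x, y)] for a in (x, y)]
print(R)
```

Because numerator and denominator come from the same rows, each
defined entry stays within [-1,1]; with one shared row the entry is
undefined rather than infinite.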

cov() is fast and might be suitable for large data sets; with corrcoef()
we get the nice properties even for very small data sets. Therefore, I
did not try to make cov() and corrcoef() behave alike. It is intentional
that the two functions can behave differently, and that the user can
choose which one to use. Note also that for data w/o NaNs the functions
yield the same outcome, so compatibility with respect to data w/o NaNs
is maintained.

I hope this answers your question.

    Alois

>
> Thanks,
>
> Nir


_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev
