On 03/08/2012 05:41 PM, Nir Krakauer wrote:
> With the current package version (2.5.2) running in Octave 3.6.1, cov
> may return infinite covariances when there is only a single non-NaN
> overlap between two data series. For example,
>
> C = cov([NaN 1 2 3 4; 1 NaN NaN NaN 2]')
>
> returns
>
> C =
>
> 1.66667 Inf
> Inf 0.50000
>
>
> I assume that is not the intended outcome?
Hi Nir,
this is a strange question to answer. The brief answer is: the outcome
itself is not intended by me, but the behavior of the function cov()
that produces it is intended. Let me explain:
The Inf is caused by the default normalization by (N-1) in cov(). With
only a single overlapping non-NaN pair, N = 1, so the off-diagonal
element is divided by zero. You can avoid this by using the
normalization by N, as documented:
C = cov([NaN 1 2 3 4; 1 NaN NaN NaN 2]', 1)
C =
1.25000 0.75000
0.75000 0.25000
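
To make the division by zero concrete, here is a minimal sketch in Python (standard library only). It is not the toolbox code: `pairwise_cov` is a hypothetical helper, and it assumes that each series is centered on the mean of all of its own non-NaN values while the products are summed only over rows where both series are present. That assumption is consistent with the numbers shown above, but I have not taken it from the source.

```python
import math

NAN = float("nan")

def pairwise_cov(x, y, ddof=1):
    """NaN-skipping covariance of two series (hypothetical helper).

    Assumption: each series is centered on the mean of ALL of its own
    non-NaN values; products are summed over rows where both are present.
    ddof=1 normalizes by (N-1), ddof=0 normalizes by N.
    """
    xs = [v for v in x if not math.isnan(v)]
    ys = [v for v in y if not math.isnan(v)]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    s = sum((a - mx) * (b - my) for a, b in pairs)
    denom = len(pairs) - ddof
    if denom <= 0:
        # Mimic floating-point division: nonzero/0 gives +-Inf, 0/0 gives NaN.
        return NAN if s == 0 else math.copysign(math.inf, s)
    return s / denom

x = [NAN, 1, 2, 3, 4]
y = [1, NAN, NAN, NAN, 2]

print(pairwise_cov(x, y, ddof=1))  # one overlapping pair, N-1 = 0
print(pairwise_cov(x, y, ddof=0))  # same pair, normalized by N = 1
```

With `ddof=1` the single overlapping pair (4, 2) leaves a nonzero sum divided by zero; with `ddof=0` the same sum is divided by 1 and stays finite.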
In any case, the function cov() behaves rather strangely on data
containing missing values. Unlike for data w/o missing values, there is
no guarantee that the outcome of cov() is positive semidefinite, or that
the correlation coefficients derived from it have a magnitude of at most
1. For the matrix C above:
det(C)
ans = -0.25000
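
Both violations can be checked by hand from the N-normalized matrix quoted above. The following Python lines are only an arithmetic check on those four numbers; `implied_r` is my own name for the off-diagonal element divided by the square root of the product of the diagonal elements.

```python
import math

# N-normalized covariance matrix copied from the example above.
C = [[1.25, 0.75],
     [0.75, 0.25]]

# Determinant of a 2x2 matrix; a negative value rules out positive semidefiniteness.
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]

# Correlation implied by C; for a valid covariance matrix |implied_r| <= 1.
implied_r = C[0][1] / math.sqrt(C[0][0] * C[1][1])

print(det)
print(implied_r)
```

The determinant is negative and the implied correlation exceeds 1, which cannot happen for data w/o missing values.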
In the NaN toolbox, the nice properties we know from data w/o missing
values (positive definiteness, and magnitude of elements <=1) are
maintained by the function corrcoef():
C= corrcoef([NaN 1 2 3 4; 1 NaN NaN NaN 2]'*10)
C =
1 NaN
NaN 1
Here, NaN indicates a 0/0, just as the std(x) of a single element x is
sqrt((x-mean(x))^2/(N-1)) = sqrt(0/0) = NaN, i.e. an undefined value.
(Note: this does not mean that the off-diagonal elements can lie outside
the interval [-1,1]. It means that the value is undetermined and could
be anything within that interval.)
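
For illustration, a correlation computed only from the rows where both series are present shows the same 0/0. This is a sketch in Python under that assumption (pairwise-complete observations), not the toolbox implementation of corrcoef(); `pairwise_corr` is a hypothetical helper.

```python
import math

NAN = float("nan")

def pairwise_corr(x, y):
    """Pearson correlation over rows where both values are present.

    Hypothetical helper: means and sums of squares are taken over the
    overlapping rows only, so |r| <= 1 whenever r is defined.
    """
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n == 0:
        return NAN
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    sab = sum((a - ma) * (b - mb) for a, b in pairs)
    saa = sum((a - ma) ** 2 for a, _ in pairs)
    sbb = sum((b - mb) ** 2 for _, b in pairs)
    denom = math.sqrt(saa * sbb)
    if denom == 0:
        # One overlapping row (or a constant series): 0/0, undefined.
        return NAN
    return sab / denom

x = [v * 10 for v in [NAN, 1, 2, 3, 4]]
y = [v * 10 for v in [1, NAN, NAN, NAN, 2]]

print(pairwise_corr(x, x))  # diagonal element
print(pairwise_corr(x, y))  # single overlapping row
```

With one overlapping row both the numerator and the denominator are zero, so the off-diagonal element is reported as NaN rather than as a value outside [-1,1].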
cov() is fast and might be suitable for large data sets; with corrcoef()
we get the nice properties even for very small data sets. Therefore, I
did not try to make cov() and corrcoef() behave alike. It is intended by
me that the two functions can behave differently, and that the user can
choose which one (s)he wants. Note also that the functions yield the
same outcome for data w/o NaNs, so the compatibility w.r.t. data w/o
NaN is maintained.
I hope this answers your question.
Alois
>
> Thanks,
>
> Nir
_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev