On 03/08/2012 05:41 PM, Nir Krakauer wrote:
> With the current package version (2.5.2) running in Octave 3.6.1, cov
> may return infinite covariances when there is only a single non-NaN
> overlap between two data series. For example,
>
> C = cov([NaN 1 2 3 4; 1 NaN NaN NaN 2]')
>
> returns
>
> C =
>
>    1.66667       Inf
>        Inf   0.50000
>
>
> I assume that is not the intended outcome?

Hi Nir,

this is a strange question; how should I answer? The brief answer is: the
outcome itself is not intended by me, but the behavior of the function
cov() is intended. Let me explain.

The Inf is caused by the default normalization with (N-1) in cov(). With
only one overlapping sample, N-1 = 0, which results in a division by zero.
You can avoid this by using the normalization with N, as documented:

    C = cov([NaN 1 2 3 4; 1 NaN NaN NaN 2]', 1)
    C =

       1.25000   0.75000
       0.75000   0.25000

In any case, cov() behaves rather strangely on data containing missing
values. Unlike for data without missing values, there is no guarantee that
the outcome of cov() is positive definite, or that the magnitude of the
normalized elements (the implied correlations) stays at or below 1:

    det(C)
    ans = -0.25000

In the NaN toolbox, the nice properties we know from data without missing
values (positive definiteness, and magnitude of elements <= 1) are
maintained by the function corrcoef():

    C = corrcoef([NaN 1 2 3 4; 1 NaN NaN NaN 2]'*10)
    C =

         1   NaN
       NaN     1

Here, NaN indicates a 0/0, just as the std of a single element x is
sqrt((x-mean(x))^2/(N-1)) = sqrt(0/0) = NaN, that is, an undefined value.
(Note: this does not mean that the off-diagonal elements can lie outside
the interval [-1, 1]; it means that the value is undetermined and could be
anything within that interval.)

cov() is fast and might be suitable for large data sets; with corrcoef()
we get the nice properties even for very small data sets. Therefore, I did
not try to make cov() and corrcoef() similar. It is intended by me that the
two functions can behave differently, and that the user can choose which
one (s)he wants.

Note also that the functions yield the same outcome for data without NaNs,
so compatibility with respect to NaN-free data is maintained.

I hope this answers your question.
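
To make the arithmetic concrete, here is a minimal sketch in plain Python
(chosen only so that it runs without Octave or the toolbox). It is not the
toolbox code. The helper names mean_skipnan, pairwise_cov and cov_matrix,
and the ddof parameter, are hypothetical. It also assumes that each series
is centred on the mean of all of its own non-NaN values before the products
are summed over the overlapping samples; that assumption is inferred from
the numbers above, which it reproduces.

```python
import math

NAN = float("nan")


def mean_skipnan(values):
    """Mean of the non-NaN entries (assumes at least one exists)."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept)


def pairwise_cov(x, y, ddof=1):
    """Covariance of x and y over the samples where both are present.

    Assumption: each series is centred on the mean of all of its own
    non-NaN values; this is inferred from the numbers in the thread,
    not taken from the toolbox source.
    """
    mx, my = mean_skipnan(x), mean_skipnan(y)
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    total = sum((a - mx) * (b - my) for a, b in pairs)
    denom = len(pairs) - ddof
    if denom == 0:
        # Mimic IEEE division as Octave does: nonzero/0 -> Inf, 0/0 -> NaN.
        return NAN if total == 0 else math.copysign(math.inf, total)
    return total / denom


def cov_matrix(columns, ddof=1):
    """Matrix of pairwise covariances between the given columns."""
    return [[pairwise_cov(a, b, ddof) for b in columns] for a in columns]


x = [NAN, 1.0, 2.0, 3.0, 4.0]
y = [1.0, NAN, NAN, NAN, 2.0]

C_n1 = cov_matrix([x, y], ddof=1)  # default (N-1): off-diagonals are inf
C_n = cov_matrix([x, y], ddof=0)   # N: [[1.25, 0.75], [0.75, 0.25]]

det = C_n[0][0] * C_n[1][1] - C_n[0][1] * C_n[1][0]  # -0.25
implied_r = C_n[0][1] / math.sqrt(C_n[0][0] * C_n[1][1])  # about 1.34

print(C_n1)
print(C_n, det, implied_r)
```

The single overlapping sample (4, 2) contributes (4-2.5)*(2-1.5) = 0.75.
Dividing by N-1 = 0 gives Inf, and dividing by N = 1 gives 0.75. Each
diagonal element uses all available samples of its own series, which is why
the matrix as a whole need not be positive definite.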

Alois

>
> Thanks,
>
> Nir

_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev