Re: [Scilab-users] opportunity of merging cov() and covar()

Federico Miyara Wed, 19 Feb 2020 17:59:22 -0800


Stéphane,

My first argument in favor of keeping covar and cov as separate functions is that often what one needs is the covariance between two potentially correlated signals regardless of their individual variances, so it seems somewhat inefficient to compute essentially three covariances (two of them between two equal signals) when only one of them is wanted.

However, the syntax of covar should offer an option to process the signals directly instead of their statistics (values and joint frequencies): covar(x,y)

My second argument seems to be the opposite of my previous request: covar(x,y,fre) is quite a different function, since the input information is presented in a different way, which is valuable when one happens to have the information in that form.
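
To illustrate the kind of direct call meant here, the cross-covariance of two signals can be obtained in one pass without forming the full 2x2 matrix. This is only a rough sketch: the name cross_cov is hypothetical and not an existing Scilab function, and the n-1 normalization is assumed so that it matches the off-diagonal term of cov(x,y).

```scilab
// Hypothetical helper (NOT part of Scilab): unbiased sample covariance
// between two signals of equal length, skipping the individual variances.
function c = cross_cov(x, y)
    x = x(:); y = y(:);                 // force column vectors
    n = size(x, "*");                   // number of samples
    c = sum((x - mean(x)) .* (y - mean(y))) / (n - 1);
endfunction

x = [1 2 3 4]';
y = [2 4 5 9]';
c = cross_cov(x, y);                    // single scalar
C = cov(x, y);                          // full 2x2 matrix, for comparison
disp(c - C(1,2));                       // difference with the off-diagonal term
```

Only one sum of products is formed, whereas cov(x,y) also evaluates the two variances.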


Regards,

Federico Miyara


On 19/02/2020 17:14, Stéphane Mottelet wrote:

Hi all,
Within the development team we recently had a discussion about improving cov() in terms of speed and memory requirements, and about the opportunity of merging cov() and covar(), which are two distinct macros. Since we did not manage to reach a consensus, we thought it would be a good occasion to ask for the opinion of the members of this list who have recognized academic/research knowledge in probability and statistics. Here are some elements to start the discussion. Let us start with the covar() macro and what it actually computes:
* covar()

Let us start with a definition of covariance in general:

https://fr.wikipedia.org/wiki/Covariance#D%C3%A9finition_de_la_covariance

and with an example there:

https://en.wikipedia.org/wiki/Covariance#Example
In the two links above, scalar real variables are considered, and in the second link the random variables are discrete. In the example the covariance is computed knowing the possible values and their joint density. You can easily check in the source of covar() (type "edit covar") that, after normalizing the matrix of joint probabilities (named "frequencies" in the source), the macro computes the same value, which is confirmed by the result of the following statements:
--> x=[1 2];y=[1 2 3];fre = [1/4 1/4 0;0 1/4 1/4];covar(x,y,fre)
 ans  =

   0.25
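
For reference, the same value can be recovered by hand from the definition cov(X,Y) = E[XY] - E[X]E[Y]. The snippet below is a minimal sketch of that calculation, not the actual source of covar(); the variable names p, px, py, mx, my and exy are illustrative.

```scilab
// Covariance of two discrete random variables from their joint table,
// computed as E[XY] - E[X]E[Y] (illustrative, not the covar() source).
x = [1 2]; y = [1 2 3];
fre = [1/4 1/4 0; 0 1/4 1/4];
p   = fre / sum(fre);        // normalize to joint probabilities
px  = sum(p, "c");           // marginal of X (column vector, row sums)
py  = sum(p, "r");           // marginal of Y (row vector, column sums)
mx  = x(:)' * px;            // E[X] = 1.5
my  = py * y(:);             // E[Y] = 2
exy = x(:)' * p * y(:);      // E[XY] = 3.25
c   = exy - mx * my          // 0.25
```

With these numbers E[XY] - E[X]E[Y] = 3.25 - 1.5*2 = 0.25, in agreement with the covar() output above.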
Please note that covar() output is always a scalar. Now let us consider cov():
* cov()

Here is a definition of the covariance matrix:

https://en.wikipedia.org/wiki/Covariance_matrix
Here we consider vectors of random variables (not scalar random variables), and in this case the covariance is a matrix. When there is no a priori knowledge about these variables (typically, when the joint density is not known), the best you can do, given samples of this random vector, is to compute an estimate of the covariance matrix; see e.g. the following page:
https://en.wikipedia.org/wiki/Estimation_of_covariance_matrices
You can verify in the actual code of cov() that this macro computes the same estimate (sums are vectorized).
We can summarize these facts this way:
* covar(x,y,fre) computes the scalar covariance of two discrete random variables knowing their possible values x(:) and y(:) and their joint probability density
* When x is a matrix, cov(x) computes an estimator of the covariance matrix of a vector X of size(x,2) random variables by using size(x,1) samples of this vector (each x(i,:) is a sample). If x and y are vectors of the same size, cov(x,y) is computed as cov([x(:) y(:)]).
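
The estimator described in the second item can be sketched in a few lines. This is an illustration assuming the usual unbiased normalization by (number of samples - 1), not the actual code of cov(); the names n, xc and C are illustrative.

```scilab
// Sample covariance matrix: rows of x are samples, columns are variables.
x  = [1 2; 2 4; 3 9];                  // 3 samples of a 2-variable vector
n  = size(x, 1);                       // number of samples
xc = x - ones(n, 1) * mean(x, "r");    // subtract the column means
C  = (xc' * xc) / (n - 1)              // 2x2 covariance estimate
```

The diagonal of C holds the sample variances of each column and the off-diagonal terms hold the covariances between columns.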
To me, the main difference is that covar(x,y,fre) does not compute an _estimator_ but an _exact value_. Of course, the vectors x and y can be the unique values of two random variables, gathered from samples (x,y), and "fre" can be the empirical frequency of the samples (x_i,y_j). In this case covar() will compute an estimate. For example, consider the two random variables X and Y, where X takes values {1,2} with equal probability, and Y=X+U where U takes values {0,1} with equal probability. We can use covar() to compute the exact covariance of X and Y, but if we only have samples, as in the script below, and we want to estimate the covariance with the same macro, then the unique pairs have to be found and their occurrences counted in order to estimate the frequencies:
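// draw N samples of X in {1,2} and Y = X+U with U in {0,1}
N=1000;
x=ceil(rand(N,1)*2);
y=x+floor(rand(N,1)*2);

// sort the (x,y) rows, then find the unique pairs and where each one starts
[pairs,k]=unique(gsort([x y],'lr','i'),'r');
// run lengths divided by N give the empirical frequencies
f=diff([k;N+1])/N;

// frequency table indexed by (x,y), then both estimates
freq=sparse(pairs,f)
N/(N-1)*covar(1:2,1:3,freq)
cov(x,y)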

If you have a look at the results,

--> freq
 freq  =

   0.2526   0.2489   0.
   0.       0.2453   0.2532

--> N/(N-1)*covar(1:2,1:3,freq)
 ans  =

   0.249769

--> cov(x,y)
 ans  =

   0.2500182   0.249769
   0.249769    0.4995447

you can see that
1. we have considered the same random variables as in the example https://en.wikipedia.org/wiki/Covariance#Example
2. covar's output (up to the normalization to correct the bias) gives the off-diagonal term of cov(x,y)
So, yes, the off-diagonal term of cov(x,y) and covar(x,y,fre) (up to the determination of unique pairs, the computation of "fre" and the bias correction) have the same value, but is that a reason to merge the two functions? I think that the answer is NO.
If you agree or disagree, feel free to continue the discussion in this thread.
S.

--
Stéphane Mottelet
Ingénieur de recherche
EA 4297 Transformations Intégrées de la Matière Renouvelable
Département Génie des Procédés Industriels
Sorbonne Universités - Université de Technologie de Compiègne
CS 60319, 60203 Compiègne cedex
Tel : +33(0)344234688
http://www.utc.fr/~mottelet

_______________________________________________
users mailing list
[email protected]
http://lists.scilab.org/mailman/listinfo/users
