Re: [Numpy-discussion] Numpy correlate

2013-03-20 Thread Pierre Haessig
Hi,
On 19/03/2013 08:12, Sudheer Joseph wrote:
 *Thank you Pierre,*
 It appears that numpy.correlate uses the
 frequency domain method for getting the ccf. I would like to know how
 serious the issue with normalization is, and what exactly it is. I have
 computed cross correlations using the function and am interpreting my
 results based on it. It would be helpful if you could tell me whether
 there is a significant bug in the function.
 With best regards,
 Sudheer
np.correlate works in the time domain. About a month ago I started a
discussion about the way it is implemented:
http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065562.html
Unfortunately, I have not found time to dig deeper into the matter, which
requires working in the C code of numpy, which I'm not familiar with.
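
A minimal sketch (not numpy's actual C implementation) of why the time-domain versus frequency-domain question affects speed rather than results: for real inputs, a direct call to np.correlate and an FFT-based computation should agree up to rounding. The array sizes, the seed and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility
x = rng.standard_normal(64)
y = rng.standard_normal(64)

# Time-domain cross-correlation, as computed by np.correlate.
direct = np.correlate(x, y, mode="full")

# Frequency-domain route: correlating x with y is the same as
# convolving x with the time-reversed y (real-valued inputs assumed).
n = len(x) + len(y) - 1
via_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(y[::-1], n), n)

print(np.allclose(direct, via_fft))
```

Padding both transforms to len(x) + len(y) - 1 samples avoids the wrap-around that a circular FFT product would otherwise introduce.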

Concerning the normalization of mpl.xcorr, I think that what is computed
is just fine. It is only the way this normalization is described in the
docstring that I find odd:
https://github.com/matplotlib/matplotlib/issues/1835
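
A minimal sketch of one common normalization, dividing the full cross-correlation by sqrt(sum(x^2) * sum(y^2)). This is assumed to correspond to what xcorr computes with normed=True, which should be checked against the installed matplotlib; the helper name normalized_xcorr is hypothetical and only numpy is used here.

```python
import numpy as np

def normalized_xcorr(x, y):
    """Full cross-correlation scaled by sqrt(sum(x**2) * sum(y**2)).

    Hypothetical helper; assumed (not verified here) to match xcorr's
    normed=True scaling.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c = np.correlate(x, y, mode="full")
    return c / np.sqrt(np.dot(x, x) * np.dot(y, y))

rng = np.random.default_rng(1)      # arbitrary seed
x = rng.standard_normal(100)
y = rng.standard_normal(100)

c_xy = normalized_xcorr(x, y)
c_xx = normalized_xcorr(x, x)
zero_lag = len(x) - 1               # index of lag 0 in "full" mode

print(np.abs(c_xy).max() <= 1.0)    # bounded by the Cauchy-Schwarz inequality
print(c_xx[zero_lag])               # autocorrelation at lag 0
```

With this scaling the series are not centered first, so the result is a correlation coefficient in the usual statistical sense only if x and y already have zero mean.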

best,
Pierre


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy correlate

2013-03-19 Thread Sudheer Joseph
Thank you all for the responses.
acf does not accept two variables, so naturally these may not work for me:
http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb
http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html
http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html

 
***
Sudheer Joseph 
Indian National Centre for Ocean Information Services
Ministry of Earth Sciences, Govt. of India
POST BOX NO: 21, IDA Jeedeemetla P.O.
Via Pragathi Nagar,Kukatpally, Hyderabad; Pin:5000 55
Tel:+91-40-23886047(O),Fax:+91-40-23895011(O),
Tel:+91-40-23044600(R),Tel:+91-40-9440832534(Mobile)
E-mail:sjo.in...@gmail.com;sudheer.jos...@yahoo.com
Web- http://oppamthadathil.tripod.com
***





Re: [Numpy-discussion] Numpy correlate

2013-03-19 Thread Sudheer Joseph
Thank you Pierre,
It appears that numpy.correlate uses the frequency domain method for getting
the ccf. I would like to know how serious the issue with normalization is,
and what exactly it is. I have computed cross correlations using the function
and am interpreting my results based on it. It would be helpful if you could
tell me whether there is a significant bug in the function.
With best regards,
Sudheer


Re: [Numpy-discussion] Numpy correlate

2013-03-18 Thread Pierre Haessig
Hi Sudheer,

On 14/03/2013 10:18, Sudheer Joseph wrote:
 Dear Numpy/Scipy experts,
 Attached is a script which I made to test numpy.correlate (which is
 called by plt.xcorr) to see how the cross correlation is calculated.
 From this it appears that if I call plt.xcorr(x,y), y is slid back in
 time compared to x, i.e. if y is a process that causes a delayed
 response in x after 5 timesteps, then there should be a high
 correlation at lag 5. However, in the attached plot the response is
 seen only on the negative side of the lags.
 Can anyone advise me on how to see exactly which way the two series
 are slid back or forth, and how to understand the cause-result relation
 better? (I understand that one cannot infer a cause-result relation
 merely from correlation, but it is important to know which series is
 older in time at a given lag.)
You have indeed pointed out a lack of documentation in matplotlib's xcorr
function, because the definition of covariance can be ambiguous.

The way I would try to interpret the xcorr function (and its friends) is
to go back to the theoretical definition of cross-correlation, which is a
normalized version of the covariance.

In your example you've created a time series X(k) and a lagged one:
Y(k) = X(k-5)

Now, the covariance function of X and Y is commonly defined as :
 Cov_{X,Y}(h) = E(X(k+h) * Y(k))   where E is the expectation
 (assuming that X and Y are centered for the sake of clarity).

If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)).
This expectation is largest when both factors are the same sample, i.e.
when k+h = k-5, so the covariance is indeed maximal at h=-5 and not h=+5.
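
The argument above can be checked numerically with a minimal sketch. The series length, the seed and the zero-padding used to build the delayed series are assumptions made for illustration; the lag axis assumes that mode="full" returns lags from -(n-1) to n-1 for two series of length n.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed
n, delay = 200, 5
x = rng.standard_normal(n)

# y(k) = x(k - 5): y is x delayed by 5 steps, zero-padded at the start.
y = np.concatenate([np.zeros(delay), x[:-delay]])

c = np.correlate(x, y, mode="full")
lags = np.arange(-(n - 1), n)       # lag attached to each entry of c

peak_lag = lags[np.argmax(c)]
print(peak_lag)                     # -5
```

plt.xcorr(x, y) is understood to call np.correlate(x, y, mode="full") internally, which would explain why the peak in the plot sits on the negative side.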

Note that this reasoning yields the opposite result with a different
definition of the covariance, i.e. Cov_{X,Y}(h) = E(X(k) * Y(k+h)) (and
that's what I did at first!).
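
A small sketch of how the two conventions relate in practice: for real-valued series, swapping the two arguments of np.correlate mirrors the lag axis, so the peak moves from -5 to +5. The data construction is the same illustrative assumption as before (random x, zero-padded delayed copy y).

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed
n, delay = 200, 5
x = rng.standard_normal(n)
y = np.concatenate([np.zeros(delay), x[:-delay]])   # y(k) = x(k - 5)

lags = np.arange(-(n - 1), n)
c_xy = np.correlate(x, y, mode="full")
c_yx = np.correlate(y, x, mode="full")   # arguments swapped

print(lags[np.argmax(c_xy)], lags[np.argmax(c_yx)])   # -5 5
print(np.allclose(c_yx, c_xy[::-1]))     # mirror images for real inputs
```

So the sign of the lag at the peak is only meaningful once the argument order and the definition in use are both stated.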


Therefore, I think there should be a definition of cross-correlation in
the matplotlib xcorr docstring. R's acf doc has this note: "The lag k
value returned by ccf(x, y) estimates the correlation between x[t+k] and
y[t]."
(see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html)
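
To make the quoted R convention concrete, here is a minimal sketch of an estimator of the correlation between x[t+k] and y[t]. The function name ccf_estimate is hypothetical, and centering both series and dividing by n times the two standard deviations is an assumption about the normalization, not a transcription of R's code.

```python
import numpy as np

def ccf_estimate(x, y, max_lag):
    """Estimate corr(x[t+k], y[t]) for k = -max_lag..max_lag.

    Hypothetical helper; the 1/n scaling is an assumption.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xc, yc = x - x.mean(), y - y.mean()
    scale = n * xc.std() * yc.std()
    lags = np.arange(-max_lag, max_lag + 1)
    values = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k >= 0:
            s = np.dot(xc[k:], yc[:n - k])     # pairs x[t+k], y[t], k >= 0
        else:
            s = np.dot(xc[:n + k], yc[-k:])    # same pairs for k < 0
        values[i] = s / scale
    return lags, values

rng = np.random.default_rng(0)      # arbitrary seed
x = rng.standard_normal(200)
y = np.concatenate([np.zeros(5), x[:-5]])      # y[t] = x[t - 5]

lags, values = ccf_estimate(x, y, max_lag=20)
print(lags[np.argmax(values)])                 # -5
```

Under this convention a peak at a negative lag k means that y at time t resembles x at the earlier time t+k, i.e. y trails x.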

Now, I believe the discussion above really belongs on the matplotlib ML.
I'll open an issue on github (I just spotted a mistake in the definition
of normalization anyway).



Coming back to numpy:
There's a strange thing: the documentation of numpy.correlate seems to give
the other definition, z[k] = sum_n a[n] * conj(v[n+k])
(http://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html),
although its usage proves otherwise. What did I miss?
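
One way to settle the question on a given installation is to compare np.correlate against both candidate formulas by brute force. This is a sketch under stated assumptions: the helper names are hypothetical, the inputs are small real arrays chosen arbitrarily, and the lag axis of mode="full" is taken to run from -(len(v)-1) to len(a)-1.

```python
import numpy as np

def sum_a_shifted(a, v, k):
    """sum_n a[n+k] * conj(v[n]) over the overlapping indices."""
    return sum(a[n + k] * np.conj(v[n])
               for n in range(len(v)) if 0 <= n + k < len(a))

def sum_v_shifted(a, v, k):
    """sum_n a[n] * conj(v[n+k]) over the overlapping indices."""
    return sum(a[n] * np.conj(v[n + k])
               for n in range(len(a)) if 0 <= n + k < len(v))

a = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 1.0, 0.5])
lags = range(-(len(v) - 1), len(a))      # -2 .. 2

z = np.correlate(a, v, mode="full")
matches_a_shifted = np.allclose(z, [sum_a_shifted(a, v, k) for k in lags])
matches_v_shifted = np.allclose(z, [sum_v_shifted(a, v, k) for k in lags])
print(matches_a_shifted, matches_v_shifted)
```

For real inputs the two formulas differ only by the sign of k, so an asymmetric test pair such as the one above is needed to tell them apart.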

best,
Pierre




Re: [Numpy-discussion] Numpy correlate

2013-03-18 Thread Skipper Seabold
You might be interested in the statsmodels implementation which should be
similar to the R functionality.

http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb
http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html
http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html

Skipper


Re: [Numpy-discussion] Numpy correlate

2013-03-18 Thread josef . pktd
On Mon, Mar 18, 2013 at 1:10 PM, Skipper Seabold jsseab...@gmail.com wrote:
 You might be interested in the statsmodels implementation which should be
 similar to the R functionality.

 http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb
 http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html
 http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html

We don't have any cross-correlation function (xcorr) in statsmodels, AFAIR,
but I guess it should work the same way.

Josef

