Re: [Numpy-discussion] Numpy correlate
Hi,

On 19/03/2013 08:12, Sudheer Joseph wrote:
> Thank you Pierre,
> It appears that numpy.correlate uses the frequency-domain method for getting the CCF. I would like to know how serious the issue with normalization is, or what exactly it is. I have computed cross-correlations using the function and am interpreting the results based on it. It would be helpful if you could tell me whether there is a significant bug in the function.
> With best regards,
> Sudheer

np.correlate works in the time domain. I started a discussion about a month ago about the way it's implemented:
http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065562.html
Unfortunately I didn't find time to dig deeper into the matter, which requires working in the C code of numpy, which I'm not familiar with.

Concerning the normalization of mpl.xcorr, I think that what is computed is just fine. It's only the way this normalization is described in the docstring that I find weird:
https://github.com/matplotlib/matplotlib/issues/1835

best,
Pierre

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
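On the time-domain versus frequency-domain question: the two routes should give the same numbers up to rounding, so the choice affects speed rather than the result. The sketch below compares np.correlate with an FFT-based computation built on np.fft, then applies the normalization that, as far as I understand, mpl.xcorr uses when normed=True (division by sqrt(dot(x, x) * dot(y, y))). That formula is an assumption to check against the matplotlib source, and the helper names and random test data are made up for illustration.

```python
import numpy as np


def xcorr_fft(x, y):
    """Full cross-correlation via FFT, in the lag order of np.correlate(x, y, 'full')."""
    n, m = len(x), len(y)
    n_fft = n + m - 1  # zero-pad so the circular correlation does not wrap around
    spec = np.fft.rfft(x, n_fft) * np.conj(np.fft.rfft(y, n_fft))
    r = np.fft.irfft(spec, n_fft)  # r[k] = sum_t x[t+k] * y[t], lags taken modulo n_fft
    # Negative lags sit at the end of r; move them to the front.
    return np.concatenate([r[n_fft - (m - 1):], r[:n]])


def normalize_like_xcorr(c, x, y):
    """Assumed normalization (not verified against matplotlib): c / sqrt(<x,x> * <y,y>)."""
    return c / np.sqrt(np.dot(x, x) * np.dot(y, y))


rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y = rng.standard_normal(64)

c_time = np.correlate(x, y, mode="full")  # direct (time-domain) result
c_freq = xcorr_fft(x, y)                  # same quantity through the FFT
c_norm = normalize_like_xcorr(c_time, x, y)

print(np.allclose(c_time, c_freq))
print(np.abs(c_norm).max() <= 1.0)
```

With this normalization the values are bounded by 1 in magnitude (Cauchy-Schwarz), which is why the normalized curve can be read as a correlation coefficient when the inputs are centered.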
Re: [Numpy-discussion] Numpy correlate
Thank you all for the responses.

acf does not accept two variables, so naturally these may not work for me:
http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb
http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html
http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html

***
Sudheer Joseph
Indian National Centre for Ocean Information Services
Ministry of Earth Sciences, Govt. of India
POST BOX NO: 21, IDA Jeedeemetla P.O.
Via Pragathi Nagar, Kukatpally, Hyderabad; Pin: 5000 55
Tel: +91-40-23886047 (O), Fax: +91-40-23895011 (O),
Tel: +91-40-23044600 (R), Tel: +91-40-9440832534 (Mobile)
E-mail: sjo.in...@gmail.com; sudheer.jos...@yahoo.com
Web- http://oppamthadathil.tripod.com
***

From: josef.p...@gmail.com
To: Discussion of Numerical Python numpy-discussion@scipy.org
Sent: Tuesday, 19 March 2013 1:51 AM
Subject: Re: [Numpy-discussion] Numpy correlate

> On Mon, Mar 18, 2013 at 1:10 PM, Skipper Seabold jsseab...@gmail.com wrote:
>> You might be interested in the statsmodels implementation, which should be similar to the R functionality.
>
> We don't have any cross-correlation xcorr, AFAIR, but I guess it should work the same way.
>
> Josef
Re: [Numpy-discussion] Numpy correlate
Thank you Pierre,

It appears that numpy.correlate uses the frequency-domain method for getting the CCF. I would like to know how serious the issue with normalization is, or what exactly it is. I have computed cross-correlations using the function and am interpreting the results based on it. It would be helpful if you could tell me whether there is a significant bug in the function.

With best regards,
Sudheer

From: Pierre Haessig pierre.haes...@crans.org
To: numpy-discussion@scipy.org
Sent: Monday, 18 March 2013 10:30 PM
Subject: Re: [Numpy-discussion] Numpy correlate

> [...]
> Now I believe the above discussion really belongs on the matplotlib ML. I'll put an issue on github (I just spotted a mistake in the definition of normalization anyway).
> [...]
Re: [Numpy-discussion] Numpy correlate
Hi Sudheer,

On 14/03/2013 10:18, Sudheer Joseph wrote:
> Dear Numpy/Scipy experts,
> Attached is a script which I made to test numpy.correlate (which is called by plt.xcorr) to see how the cross-correlation is calculated. From this it appears that if I call plt.xcorr(x, y), y is slid back in time compared to x, i.e. if y is a process that causes a delayed response in x after 5 timesteps, then there should be a high correlation at lag 5. However, in the attached plot the response is seen only on the negative side of the lags.
> Can anyone advise me on how to see exactly which way the two series are slid back or forth, and so understand the cause-and-effect relation better? (I understand that one cannot assume a cause-and-effect relation merely from correlation, but it is important to know which series is older in time at a given lag.)

You have indeed pointed out a lack of documentation in the matplotlib xcorr function, because the definition of covariance can be ambiguous.

The way I would try to interpret the xcorr function (and its friends) is to go back to the theoretical definition of cross-correlation, which is a normalized version of the covariance.

In your example you've created a time series X(k) and a lagged one:

    Y(k) = X(k-5)

Now, the covariance function of X and Y is commonly defined as:

    Cov_{X,Y}(h) = E(X(k+h) * Y(k))

where E is the expectation (assuming that X and Y are centered, for the sake of clarity).

If I plug in the definition of Y, I get:

    Cov_{X,Y}(h) = E(X(k+h) * X(k-5))

The two factors coincide when k+h = k-5, so the covariance is indeed maximal at h=-5 and not h=+5.

Note that this reasoning yields the opposite result with a different definition of the covariance, i.e.

    Cov_{X,Y}(h) = E(X(k) * Y(k+h))

(and that's what I first did!).

Therefore, I think there should be a definition of cross-correlation in the matplotlib xcorr docstring. In R's acf doc, there is this mention: "The lag k value returned by ccf(x, y) estimates the correlation between x[t+k] and y[t]." (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html)

Now I believe the above discussion really belongs on the matplotlib ML. I'll put an issue on github (I just spotted a mistake in the definition of normalization anyway).

Coming back to numpy: there's a strange thing. The documentation of numpy.correlate seems to give the other definition:

    z[k] = sum_n a[n] * conj(v[n+k])

(http://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html) although its usage proves otherwise. What did I miss?

best,
Pierre
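The sign convention discussed above can be checked numerically. The following is a minimal sketch, assuming a white-noise x and a y built as x delayed by 5 samples (both made up for illustration). It reads the lag of the peak from np.correlate in "full" mode, where output index i corresponds to lag i - (len(v) - 1).

```python
import numpy as np

rng = np.random.default_rng(0)
n, delay = 200, 5

x = rng.standard_normal(n)
# y is x delayed by 5 samples: y[k] = x[k - 5], with zeros before the signal starts.
y = np.concatenate([np.zeros(delay), x[:-delay]])

# Lags associated with the "full" output for two length-n inputs.
lags = np.arange(-(n - 1), n)

c_xy = np.correlate(x, y, mode="full")  # x first, y second
c_yx = np.correlate(y, x, mode="full")  # arguments swapped

peak_xy = lags[np.argmax(c_xy)]
peak_yx = lags[np.argmax(c_yx)]
print(peak_xy, peak_yx)
```

For this data the peak should appear at lag -5 for correlate(x, y) and at +5 for correlate(y, x). That matches the behaviour c[k] = sum_n a[n+k] * conj(v[n]), i.e. the first definition of the covariance above with a = X and v = Y. Swapping the arguments simply mirrors the lag axis for real inputs.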
Re: [Numpy-discussion] Numpy correlate
On Mon, Mar 18, 2013 at 1:00 PM, Pierre Haessig pierre.haes...@crans.org wrote:
> [...]
> Therefore, I think there should be a definition of cross-correlation in the matplotlib xcorr docstring. In R's acf doc, there is this mention: "The lag k value returned by ccf(x, y) estimates the correlation between x[t+k] and y[t]." (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html)
>
> Now I believe the above discussion really belongs on the matplotlib ML. I'll put an issue on github (I just spotted a mistake in the definition of normalization anyway).

You might be interested in the statsmodels implementation, which should be similar to the R functionality.

http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb
http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html
http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html

Skipper
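For two series, the convention quoted from the R documentation (lag k pairs x[t+k] with y[t]) can be written directly in NumPy. This is only a sketch of that convention, not the statsmodels or R implementation; the helper name ccf_at_lag, the normalization by the full-series sums of squares, and the test data are assumptions made for illustration.

```python
import numpy as np


def ccf_at_lag(x, y, k):
    """Estimate corr(x[t+k], y[t]) for two equal-length series (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xc = x - x.mean()  # center both series first
    yc = y - y.mean()
    if k >= 0:
        num = np.dot(xc[k:], yc[:n - k])   # pairs (x[t+k], y[t]) for t = 0 .. n-k-1
    else:
        num = np.dot(xc[:n + k], yc[-k:])  # pairs (x[t+k], y[t]) for t = -k .. n-1
    # Normalize with the full-series sums of squares so that |result| <= 1.
    return num / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc))


rng = np.random.default_rng(1)
n, delay = 300, 5
x = rng.standard_normal(n)
y = np.concatenate([np.zeros(delay), x[:-delay]])  # y[t] = x[t - 5]

lags = np.arange(-10, 11)
values = np.array([ccf_at_lag(x, y, k) for k in lags])
best_lag = lags[np.argmax(values)]
print(best_lag, values.max())
```

With y delayed relative to x, the largest value should again fall at k = -5, consistent with the derivation quoted above.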
Re: [Numpy-discussion] Numpy correlate
On Mon, Mar 18, 2013 at 1:10 PM, Skipper Seabold jsseab...@gmail.com wrote:
> You might be interested in the statsmodels implementation, which should be similar to the R functionality.
>
> http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb
> http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html
> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html

We don't have any cross-correlation xcorr, AFAIR, but I guess it should work the same way.

Josef