Hi folks,
Yesterday, with the valuable help of bdk, I finally managed to get my
simple cross-correlation script to work with PyPy / NumPy at all;
however, I ended up quite disappointed with its performance.
I stripped out everything I could and ended up with something like this:
Python 3:
$ time python corrbench.py
0
1
2
real 0m2.225s
user 0m1.980s
sys 0m0.240s
PyPy from hg / NumPy from hg (3581f7a906c9+):
$ time python corrbench.py
0
1
2
real 1m2.928s
user 1m2.460s
sys 0m0.270s
What this script does, basically, is generate two time series of ~5 000
elements each and bin them into ~3 600 000 buckets, which it then
multiplies element-wise (at various lags) and sums up. I expected a huge
performance boost, of course :-)
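For context, the per-lag operation at the heart of the script is just a
shifted element-wise product followed by a sum; here is a minimal sketch
with toy arrays (not the real binned series):

```python
import numpy as np

# toy binned series; the real script works on millions of bins
a = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.0, 0.0, 1.0, 0.0])

lag = 1
# at positive lag, trim the tail of a and the head of b,
# then multiply element-wise and sum
rho = np.sum(a[:len(a) - lag] * b[lag:])
print(rho)  # 0*0 + 1*1 + 2*0 = 1.0
```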
I think I'll be running this with plain Python for now, but I hope
that's a valuable benchmark for you guys!
--
Sincerely yours,
Yury V. Zaytsev
import numpy as np

TIME_SPAN = 120 * 60 * 1000
MAX_LAG = 50
BIN_SIZE = 2


def xcorr(x1, x2):
    assert len(x1) == len(x2)
    xc = []
    lags = np.arange(-MAX_LAG, MAX_LAG + 1)
    for lag in lags:
        # rho(lag) = sum over n of x1[n] * x2[n + lag]
        if lag > 0:
            rho = np.sum(x1[:len(x1) - lag] * x2[lag:])
        else:
            rho = np.sum(x2[:len(x2) + lag] * x1[-lag:])
        xc.append(rho)
    xc = np.asarray(xc, dtype=np.float64)
    bins = len(x1)
    xc /= bins
    argmax = np.argmax(xc)
    max_val = xc[argmax]
    max_lag = lags[argmax] * BIN_SIZE
    return xc, max_val, max_lag


ts1 = np.arange(0, TIME_SPAN, TIME_SPAN // 5000)
ts2 = np.arange(10, TIME_SPAN, TIME_SPAN // 5000)

for i in range(3):
    print(i)
    ih, ib = np.histogram(ts1, range=(0, TIME_SPAN), bins=TIME_SPAN // BIN_SIZE)
    jh, jb = np.histogram(ts2, range=(0, TIME_SPAN), bins=TIME_SPAN // BIN_SIZE)
    # cross-correlate the binned histograms
    _, mv, ml = xcorr(ih, jh)
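As an aside (on toy arrays, not the multi-million-bin histograms), the
per-lag loop can be cross-checked against np.correlate in 'full' mode,
assuming the convention rho(lag) = sum over n of x1[n] * x2[n + lag]:

```python
import numpy as np

MAX_LAG = 2  # small toy value; the script uses 50

def xcorr_lags(x1, x2, max_lag=MAX_LAG):
    # per-lag multiply-and-sum: rho(lag) = sum_n x1[n] * x2[n + lag]
    out = []
    for lag in range(-max_lag, max_lag + 1):
        if lag > 0:
            out.append(np.sum(x1[:len(x1) - lag] * x2[lag:]))
        else:
            out.append(np.sum(x2[:len(x2) + lag] * x1[-lag:]))
    return np.asarray(out, dtype=np.float64)

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([0.0, 1.0, 0.0, 2.0])

# np.correlate in 'full' mode computes the sums for every shift;
# the middle 2 * MAX_LAG + 1 entries, reversed, match the loop above
full = np.correlate(a, b, mode='full')
centre = len(full) // 2
window = full[centre - MAX_LAG:centre + MAX_LAG + 1][::-1]
print(np.allclose(window, xcorr_lags(a, b)))  # True
```

Note that np.correlate computes all len(a) + len(b) - 1 shifts, so for
millions of bins the windowed loop does far less work.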
_______________________________________________
pypy-dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/pypy-dev