Hi folks,

Yesterday, with the valuable help of bdk, I was finally able to get my simple cross-correlation script working with PyPy / NumPy at all; however, I ended up being quite disappointed with the performance.
I stripped out everything I could and ended up with something like this:

Python 3:

$ time python corrbench.py
0
1
2

real    0m2.225s
user    0m1.980s
sys     0m0.240s

PyPy from hg / NumPy from hg (3581f7a906c9+):

$ time python corrbench.py
0
1
2

real    1m2.928s
user    1m2.460s
sys     0m0.270s

What the script does is basically generate two time series of ~5 000 elements and bin them into 3 600 000 buckets (TIME_SPAN // BIN_SIZE), which it then multiplies element-wise and sums up. I expected a huge performance boost, of course :-) I think I'll be running this with plain Python for now, but I hope that's a valuable benchmark for you guys!

-- 
Sincerely yours,
Yury V. Zaytsev
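In miniature, the bin-then-multiply step described above looks roughly like this (a toy-sized sketch; the span, spacings, and variable names here are illustrative, not taken from the benchmark):

```python
import numpy as np

# Toy stand-ins for the two event-time series (hypothetical data).
SPAN, BIN_SIZE = 100, 2
t1 = np.arange(0, SPAN, 7)   # event timestamps, series 1
t2 = np.arange(3, SPAN, 7)   # event timestamps, series 2

# Count the events into fixed-width bins...
h1, _ = np.histogram(t1, range=(0, SPAN), bins=SPAN // BIN_SIZE)
h2, _ = np.histogram(t2, range=(0, SPAN), bins=SPAN // BIN_SIZE)

# ...then multiply the binned series element-wise and sum; this is
# the zero-lag term of the cross-correlation.
rho = np.sum(h1 * h2)
```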
import numpy as np

TIME_SPAN = 120 * 60 * 1000   # two hours in milliseconds
MAX_LAG = 50
BIN_SIZE = 2

def xcorr(x1, x2):
    assert len(x1) == len(x2)
    xc = []
    lags = np.arange(-MAX_LAG, MAX_LAG + 1)
    for lag in lags:
        if lag > 0:
            rho = np.sum(x1[:len(x1) - lag] * x2[lag:])
        else:
            # for lag <= 0 it is x1 that has to be shifted; shifting
            # x2 here would make rho(-lag) always equal rho(lag)
            rho = np.sum(x1[-lag:] * x2[:len(x2) + lag])
        xc.append(rho)
    xc = np.asarray(xc, dtype=np.float64)
    bins = len(x1)
    xc /= bins
    argmax = np.argmax(xc)
    max_val = xc[argmax]
    max_lag = lags[argmax] * BIN_SIZE
    return xc, max_val, max_lag

ts1 = np.arange(0, TIME_SPAN, TIME_SPAN // 5000)
ts2 = np.arange(10, TIME_SPAN, TIME_SPAN // 5000)

for i in range(3):
    print(i)
    ih, ib = np.histogram(ts1, range=(0, TIME_SPAN), bins=TIME_SPAN // BIN_SIZE)
    jh, jb = np.histogram(ts2, range=(0, TIME_SPAN), bins=TIME_SPAN // BIN_SIZE)
    # correlate the binned series, not the raw timestamps
    _, mv, ml = xcorr(ih, jh)
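For what it's worth, the per-lag Python loop can also be expressed as a single slice of np.correlate, which keeps the whole computation inside NumPy. This is only a sketch on toy-sized arrays (the data and names are mine, not from the script): a 'full' correlation of the multi-million-bin histograms themselves would be far too expensive, and an FFT-based method would be needed at that scale.

```python
import numpy as np

MAX_LAG = 3

# Toy series standing in for the binned histograms; x2 is x1
# shifted right by one bin, so the peak should land at lag = +1.
x1 = np.array([0., 1., 2., 3., 2., 1., 0.])
x2 = np.array([0., 0., 1., 2., 3., 2., 1.])
n = len(x1)

# Loop version: rho(lag) = sum_i x1[i] * x2[i + lag]
loop = []
for lag in range(-MAX_LAG, MAX_LAG + 1):
    if lag > 0:
        loop.append(np.sum(x1[:n - lag] * x2[lag:]))
    else:
        loop.append(np.sum(x1[-lag:] * x2[:n + lag]))
loop = np.asarray(loop)

# Vectorized version: the 'full' correlation has its zero-lag term
# at index n - 1, so the wanted lags are a contiguous slice of it.
lags = np.arange(-MAX_LAG, MAX_LAG + 1)
vec = np.correlate(x2, x1, mode='full')[(n - 1) + lags]

assert np.allclose(loop, vec)
```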
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev