Hi folks,

Yesterday, with bdk's valuable help, I finally got my simple
cross-correlation script to work with PyPy / NumPy at all; however, I
ended up quite disappointed with the performance.

I stripped out everything I could and ended up with something like this:

Python 3:

$ time python corrbench.py 
0
1
2

real 0m2.225s
user 0m1.980s
sys 0m0.240s

PyPy from hg / NumPy from hg (3581f7a906c9+):

$ time python corrbench.py
0
1
2

real 1m2.928s
user 1m2.460s
sys 0m0.270s

What this script does is basically generate two time series of ~5 000
elements each, bin them into 3 600 000 buckets, and then, for each lag,
multiply the binned arrays element-wise and sum up the result. I
expected a huge performance boost, of course :-)
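To make the inner step concrete, here is a toy sketch of the per-lag
operation (the arrays and values below are made up for illustration,
not taken from the benchmark):

```python
import numpy as np

# Shift one series by `lag`, multiply the overlapping parts
# element-wise, and sum -- the core operation of the benchmark.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([4.0, 3.0, 2.0, 1.0])
lag = 1

rho = np.sum(x1[:len(x1) - lag] * x2[lag:])  # 1*3 + 2*2 + 3*1 = 10.0
```

The benchmark repeats this for 101 lags over the full-length binned
arrays.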

I think I'll be running this with plain Python for now, but I hope
that's a valuable benchmark for you guys!

-- 
Sincerely yours,
Yury V. Zaytsev

import numpy as np

TIME_SPAN = 120 * 60 * 1000  # 120 minutes in milliseconds
MAX_LAG = 50                 # maximum lag, in bins
BIN_SIZE = 2                 # bin width in milliseconds


def xcorr(x1, x2):
    assert len(x1) == len(x2)

    xc = []
    lags = np.arange(-MAX_LAG, MAX_LAG + 1)

    for lag in lags:
        # rho(lag) = sum over n of x1[n] * x2[n + lag]
        if lag > 0:
            rho = np.sum(x1[:len(x1) - lag] * x2[lag:])
        else:
            # shift x1 instead of x2 for negative lags (the original
            # version swapped the operands here, which made the result
            # symmetric in lag)
            rho = np.sum(x1[-lag:] * x2[:len(x2) + lag])
        xc.append(rho)

    # normalize by the number of bins
    xc = np.asarray(xc, dtype=np.float64) / len(x1)

    # locate the peak of the cross-correlation
    argmax = np.argmax(xc)
    max_val = xc[argmax]
    max_lag = lags[argmax] * BIN_SIZE

    return xc, max_val, max_lag

ts1 = np.arange(0, TIME_SPAN, TIME_SPAN // 5000)   # 5 000 event times
ts2 = np.arange(10, TIME_SPAN, TIME_SPAN // 5000)  # same series, offset by 10 ms

for i in range(3):
    print(i)
    # bin the event times into TIME_SPAN // BIN_SIZE = 3 600 000 buckets
    ih, ib = np.histogram(ts1, range=(0, TIME_SPAN), bins=TIME_SPAN // BIN_SIZE)
    jh, jb = np.histogram(ts2, range=(0, TIME_SPAN), bins=TIME_SPAN // BIN_SIZE)
    # cross-correlate the binned series, not the raw event times
    _, mv, ml = xcorr(ih, jh)
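As an aside, the Python-level loop over lags can also be collapsed into
a single np.correlate call, which evaluates all the lag sums in C. The
sketch below (toy array sizes; the names loop_xc, vec_xc, mid are mine)
only demonstrates the equivalence under the usual convention
rho(lag) = sum over n of x1[n] * x2[n + lag] -- it is not a timing
claim, and mode='full' computes every possible lag, which would be far
more work than 101 lags on 3.6-million-element arrays:

```python
import numpy as np

MAX_LAG = 3

# small random inputs; the point is the equivalence, not the timings
rng = np.random.default_rng(0)
x1 = rng.random(50)
x2 = rng.random(50)

# explicit loop: rho(lag) = sum over n of x1[n] * x2[n + lag]
loop_xc = []
for lag in range(-MAX_LAG, MAX_LAG + 1):
    if lag > 0:
        loop_xc.append(np.sum(x1[:len(x1) - lag] * x2[lag:]))
    else:
        loop_xc.append(np.sum(x1[-lag:] * x2[:len(x2) + lag]))
loop_xc = np.asarray(loop_xc)

# np.correlate(x2, x1, 'full')[i] = sum over n of x2[n + k] * x1[n]
# with k = i - (len(x1) - 1), so lag 0 sits at index len(x1) - 1
full = np.correlate(x2, x1, mode='full')
mid = len(x1) - 1
vec_xc = full[mid - MAX_LAG: mid + MAX_LAG + 1]

assert np.allclose(loop_xc, vec_xc)
```

If all lags were actually needed on arrays of this benchmark's size, an
FFT-based correlation (e.g. scipy.signal.fftconvolve with one input
reversed) would be the usual route.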
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev
