On Thu, Jan 19, 2012 at 2:49 PM, Dmitrey <[email protected]> wrote:
> On 01/19/2012 07:31 PM, Maciej Fijalkowski wrote:
>>
>> On Thu, Jan 19, 2012 at 6:46 PM, Dmitrey <[email protected]> wrote:
>>>
>>> Hi all,
>>> could you clarify the rules for accepting new numpypy funcs (not only
>>> for me, but for any other potential volunteers)?
>>> The doc I've been directed to says only "You have to test exhaustively
>>> your module", while I would like to know more explicit rules.
>>> For example, "at least 3 tests per func" (although I guess funcs of
>>> different complexity and variability should also be expected to need
>>> different numbers of tests).
>>> Also, are there any strict rules for the test cases to be submitted, or
>>> can I, for example, merely write
>>>
>>> if __name__ == '__main__':
>>>     assert array_equal(1, 1)
>>>     assert array_equal([1, 2], [1, 2])
>>>     assert array_equal(N.array([1, 2]), N.array([1, 2]))
>>>     assert array_equal([1, 2], N.array([1, 2]))
>>>     assert array_equal([1, 2], [1, 2, 3]) is False
>>>     print('passed')
>>
>> We have pretty exhaustive automated testing suites. Look, for example,
>> in the pypy/module/micronumpy/test directory for the test file style.
>> They're run with py.test, and we require at the very least full code
>> coverage (every line has to be executed; there are tools to check this,
>> like coverage). Passing "unusual" input, like sys.maxint etc., is also
>> usually recommended. With your example, you would check if it works
>> for, say, views and multidimensional arrays. Also, "is False" is not
>> considered good style.
>>
>>> Or is there a certain rule for storing files with tests?
>>>
>>> If I or someone else submits a func with some tests like in the
>>> example above, will you put the func and tests in the proper files
>>> yourselves?
>>> I'm not too lazy to do it myself, but I'm just not immersed enough in
>>> the numpypy development process yet, including its Mercurial branches
>>> and file structure, and I can spend only quite limited time diving into
>>> it in the near future.
>>
>> We generally require people to put their own tests in as they go with
>> the code (in the appropriate places), because you also should not break
>> anything. The usefulness of a patch that has to be sliced and diced and
>> put into place is very limited, and for straightforward, mostly-copied
>> code like array_equal it is plain useless, since it's almost as much
>> work to just do it.
>
> Well, for this func (array_equal) my docstrings really were copied from
> CPython numpy (why wouldn't I do this to save some time, when the license
> allows it?), but
> * why wouldn't I do this, while other programmers are busy with other
>   tasks?
> * the engines of my func and the CPython numpy one completely differ.
>   First, in PyPy the CPython code just doesn't work at all (because of
>   the problem with ndarray.flat).
>   Second, I have implemented a workaround: I just replaced some code
>   lines with
>
> Size = a1.size
> f1, f2 = a1.flat, a2.flat
> # TODO: replace xrange by range in Python3
> for i in xrange(Size):
>     if f1.next() != f2.next(): return False
> return True
>
> Here are some results in CPython for the following benchmark:
>
> import numpy as N
> from time import time
> n = 100000
> m = 100
> a = N.zeros(n)
> b = N.ones(n)
> t = time()
> for i in range(m):
>     N.array_equal(a, b)
> print('classic numpy array_equal time elapsed (on different arrays): %0.5f'
>       % (time()-t))
>
> # array_equal (without the N. prefix) is my alternative implementation
> t = time()
> for i in range(m):
>     array_equal(a, b)
> print('Alternative array_equal time elapsed (on different arrays): %0.5f'
>       % (time()-t))
>
> b = N.zeros(n)
>
> t = time()
> for i in range(m):
>     N.array_equal(a, b)
> print('classic numpy array_equal time elapsed (on same arrays): %0.5f'
>       % (time()-t))
>
> t = time()
> for i in range(m):
>     array_equal(a, b)
> print('Alternative array_equal time elapsed (on same arrays): %0.5f'
>       % (time()-t))
>
> CPython numpy results:
> classic numpy array_equal time elapsed (on different arrays): 0.07728
> Alternative array_equal time elapsed (on different arrays): 0.00056
> classic numpy array_equal time elapsed (on same arrays): 0.11163
> Alternative array_equal time elapsed (on same arrays): 9.09458
>
> PyPy results (I cannot test the "classic" version because it depends on
> some funcs that are not available yet):
> Alternative array_equal time elapsed (on different arrays): 0.00133
> Alternative array_equal time elapsed (on same arrays): 0.95038
>
>
> So, as you see, even in CPython numpy my version is 138 times faster for
> different arrays (though about 81 times slower for equal arrays). However,
> in the real world it is usually different arrays that come to this func,
> and equal arrays are encountered only sometimes.
> Well, in my implementation the time elapsed for equal arrays essentially
> depends on their size, but either way I still think my implementation is
> better than the CPython one: it's faster and doesn't require allocating
> memory for the boolean array that goes to logical_and.
>
> I updated my array_equal implementation with the changes mentioned above,
> plus some tests on multidimensional arrays you asked for, and put it at
> http://pastebin.com/tg2aHE6x (I'll now update the bugs.pypy.org entry
> with the link).
>
>
> -----------------------
> Regards, D.
> http://openopt.org/Dmitrey
> _______________________________________________
> pypy-dev mailing list
> [email protected]
> http://mail.python.org/mailman/listinfo/pypy-dev
Worth pointing out that the implementations of array_equal and array_equiv
in NumPy are a bit embarrassing, because they always perform the full N
comparisons instead of short-circuiting as soon as a False value is found.
This is completely silly IMHO:

In [34]: x = np.random.randn(100000)

In [35]: y = np.random.randn(100000)

In [36]: timeit np.array_equal(x, y)
1000 loops, best of 3: 349 us per loop

- W
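The point about full N comparisons can be illustrated without NumPy. In the sketch below, which is a plain-Python analogy rather than NumPy's actual C code, `equal_full` builds the whole list of elementwise results before `all()` sees it (much as a vectorized comparison materializes a full boolean array), while `equal_short_circuit` feeds `all()` a lazy generator and stops at the first mismatch. The function names and the `bench` helper are hypothetical, and absolute timings will vary by machine.

```python
import timeit


def equal_full(xs, ys):
    """When lengths match, always performs len(xs) comparisons:
    the list of results is built before all() runs."""
    return len(xs) == len(ys) and all([x == y for x, y in zip(xs, ys)])


def equal_short_circuit(xs, ys):
    """Stops at the first mismatch: all() consumes a lazy generator."""
    return len(xs) == len(ys) and all(x == y for x, y in zip(xs, ys))


def bench(n=100000, number=20):
    """Time both variants on two lists that already differ at index 0."""
    xs = [float(i) for i in range(n)]
    ys = [float(i) + 1.0 for i in range(n)]
    t_full = timeit.timeit(lambda: equal_full(xs, ys), number=number)
    t_short = timeit.timeit(lambda: equal_short_circuit(xs, ys), number=number)
    return t_full, t_short


if __name__ == "__main__":
    t_full, t_short = bench()
    print("full N comparisons: %.5f s" % t_full)
    print("short-circuit:      %.5f s" % t_short)
```

With inputs that differ at index 0, like the random arrays above, the short-circuiting version does a single comparison per call; with equal inputs both versions must visit every element, so short-circuiting only pays off when a mismatch occurs early.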
