On Fri, Jan 20, 2012 at 2:47 PM, Neal Becker <[email protected]> wrote:
> Alex Gaynor wrote:
>
>> On Thu, Jan 19, 2012 at 6:15 PM, Wes McKinney <[email protected]> wrote:
>>
>>> On Thu, Jan 19, 2012 at 2:49 PM, Dmitrey <[email protected]> wrote:
>>> > On 01/19/2012 07:31 PM, Maciej Fijalkowski wrote:
>>> >>
>>> >> On Thu, Jan 19, 2012 at 6:46 PM, Dmitrey<[email protected]> wrote:
>>> >>>
>>> >>> Hi all,
>>> >>> could you clarify the rules for accepting new numpypy funcs (not only
>>> >>> for me, but for any other possible volunteers)?
>>> >>> The doc I've been directed to says only "You have to test exhaustively
>>> >>> your module", while I would like to know more explicit rules.
>>> >>> For example, "at least 3 tests per func" (however, I guess for funcs of
>>> >>> different complexity and variability the number of tests should also be
>>> >>> expected to be different).
>>> >>> Also, are there any strict rules for the testcases to be submitted, or
>>> >>> can I, for example, merely write
>>> >>>
>>> >>> if __name__ == '__main__':
>>> >>>     assert array_equal(1, 1)
>>> >>>     assert array_equal([1, 2], [1, 2])
>>> >>>     assert array_equal(N.array([1, 2]), N.array([1, 2]))
>>> >>>     assert array_equal([1, 2], N.array([1, 2]))
>>> >>>     assert array_equal([1, 2], [1, 2, 3]) is False
>>> >>>     print('passed')
>>> >>
>>> >> We have pretty exhaustive automated testing suites. Look for example
>>> >> in pypy/module/micronumpy/test directory for the test file style.
>>> >> They're run with py.test and we require at the very least full code
>>> >> coverage (every line has to be executed, there are tools to check,
>>> >> like coverage). Also passing "unusual" input, like sys.maxint etc. is
>>> >> usually recommended. With your example, you would check if it works
>>> >> for say views and multidimensional arrays. Also "is False" is not
>>> >> considered good style.
>>> >>
>>> >>> Or is there a certain rule for storing files with tests?
>>> >>>
>>> >>> If I or someone else submits a func with some tests like in the
>>> >>> example above, will you put the func and tests in the proper files
>>> >>> yourself? I'm not too lazy to do it myself, but I'm merely not
>>> >>> merged enough into the numpypy dev process, including mercurial
>>> >>> branches and the numpypy file structure, and can spend only quite
>>> >>> limited time on diving into it in the near future.
>>> >>
>>> >> We generally require people to put their own tests as they go with the
>>> >> code (in appropriate places) because you also should not break
>>> >> anything. The usefulness of a patch that has to be sliced and diced
>>> >> and put into places is very limited and for straightforward
>>> >> mostly-copied code, like array_equal, plain useless, since it's almost
>>> >> as much work to just do it.
>>> >
>>> > Well, for this func (array_equal) my docstrings really were copied from
>>> > CPython numpy (why wouldn't I do this to save some time, while the
>>> > license allows it?), but
>>> > * why wouldn't I go for this (), while other programmers are busy with
>>> > other tasks?
>>> > * the engines of my and CPython numpy funcs completely differ. First, in
>>> > PyPy the CPython code just doesn't work at all (because of the problem
>>> > with ndarray.flat).
>>> > Second, I have implemented a workaround - just replaced some
>>> > code lines by
>>> >     Size = a1.size
>>> >     f1, f2 = a1.flat, a2.flat
>>> >     # TODO: replace xrange by range in Python3
>>> >     for i in xrange(Size):
>>> >         if f1.next() != f2.next(): return False
>>> >     return True
>>> >
>>> > Here are some results in CPython for the following bench:
>>> >
>>> > from time import time
>>> > n = 100000
>>> > m = 100
>>> > a = N.zeros(n)
>>> > b = N.ones(n)
>>> > t = time()
>>> > for i in range(m):
>>> >     N.array_equal(a, b)
>>> > print('classic numpy array_equal time elapsed (on different arrays): %0.5f'
>>> >       % (time()-t))
>>> >
>>> >
>>> > t = time()
>>> > for i in range(m):
>>> >     array_equal(a, b)
>>> > print('Alternative array_equal time elapsed (on different arrays): %0.5f' %
>>> >       (time()-t))
>>> >
>>> > b = N.zeros(n)
>>> >
>>> > t = time()
>>> > for i in range(m):
>>> >     N.array_equal(a, b)
>>> > print('classic numpy array_equal time elapsed (on same arrays): %0.5f' %
>>> >       (time()-t))
>>> >
>>> > t = time()
>>> > for i in range(m):
>>> >     array_equal(a, b)
>>> > print('Alternative array_equal time elapsed (on same arrays): %0.5f' %
>>> >       (time()-t))
>>> >
>>> > CPython numpy results:
>>> > classic numpy array_equal time elapsed (on different arrays): 0.07728
>>> > Alternative array_equal time elapsed (on different arrays): 0.00056
>>> > classic numpy array_equal time elapsed (on same arrays): 0.11163
>>> > Alternative array_equal time elapsed (on same arrays): 9.09458
>>> >
>>> > PyPy results (cannot test on "classic" version because it depends on some
>>> > funcs that are unavailable yet):
>>> > Alternative array_equal time elapsed (on different arrays): 0.00133
>>> > Alternative array_equal time elapsed (on same arrays): 0.95038
>>> >
>>> >
>>> > So, as you see, even in CPython numpy my version is 138 times faster for
>>> > different arrays (yet 90 times slower for same arrays).
>>> > However, in the real world usually different arrays come to this func,
>>> > and only sometimes are similar arrays encountered.
>>> > Well, for my implementation, in the case of equal arrays the time elapsed
>>> > essentially depends on their size, but either way I still think my
>>> > implementation is better than CPython's - it's faster and doesn't require
>>> > allocation of memory for the boolean array that will go to the
>>> > logical_and.
>>> >
>>> > I updated my array_equal implementation with the changes mentioned above
>>> > and some tests on multidimensional arrays you've asked for, and put it at
>>> > http://pastebin.com/tg2aHE6x (now I'll update the bugs.pypy.org entry
>>> > with the link).
>>> >
>>> >
>>> > -----------------------
>>> > Regards, D.
>>> > http://openopt.org/Dmitrey
>>> > _______________________________________________
>>> > pypy-dev mailing list
>>> > [email protected]
>>> > http://mail.python.org/mailman/listinfo/pypy-dev
>>>
>>> Worth pointing out that the implementations of array_equal and
>>> array_equiv in NumPy are a bit embarrassing because they require a
>>> full N comparisons instead of short-circuiting whenever a False value
>>> is found. This is completely silly IMHO:
>>>
>>> In [34]: x = np.random.randn(100000)
>>>
>>> In [35]: y = np.random.randn(100000)
>>>
>>> In [36]: timeit np.array_equal(x, y)
>>> 1000 loops, best of 3: 349 us per loop
>>>
>>> - W
>>>
>>
>> The correct solution (IMO) is to reuse the original NumPy implementation,
>> but have logical_and.reduce short circuit correctly. This has the nice
>> side effect of allowing all() and any() to use
>> logical_and/logical_or.reduce.
>>
>> Alex
>>
>
>
> I complained on the numpy list about a year ago about this issue.
> The usual numpy idiom is
>
> np.any (some comparison)
>
> which will create an array of the full size comparing each element before
> attempting the 'any', which is obviously wasteful. Hope numpypy can do
> better.

FYI, numpypy already does better. It does not yet work completely with every
possible construct (like .flat), but it works in general.
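
To make the short-circuiting point from the thread concrete, here is a minimal sketch in plain NumPy. It is neither the numpypy implementation nor what NumPy does internally. It only illustrates the idea of comparing block by block and stopping at the first mismatch, so that no full-size boolean array is ever built. The name array_equal_chunked and the chunk parameter are made up for this example.

```python
import numpy as np

def array_equal_chunked(a1, a2, chunk=4096):
    """Hypothetical helper: blockwise equality test that stops early."""
    a1, a2 = np.asarray(a1), np.asarray(a2)
    if a1.shape != a2.shape:
        return False
    # ravel() returns a 1-D view when it can and a copy otherwise
    # (e.g. for non-contiguous views), so strided input still costs a copy.
    f1, f2 = a1.ravel(), a2.ravel()
    for start in range(0, f1.size, chunk):
        stop = start + chunk
        # Only a chunk-sized boolean temporary is allocated per step.
        if not (f1[start:stop] == f2[start:stop]).all():
            return False  # first mismatch found: skip the remaining blocks
    return True
```

With different arrays this returns after the first block, and with equal arrays it still runs the comparisons in vectorized chunks rather than element by element in Python. The chunk size is a tuning knob, not a measured optimum.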
