On Thu, Jan 19, 2012 at 6:15 PM, Wes McKinney <[email protected]> wrote:
> On Thu, Jan 19, 2012 at 2:49 PM, Dmitrey <[email protected]> wrote:
> > On 01/19/2012 07:31 PM, Maciej Fijalkowski wrote:
> >>
> >> On Thu, Jan 19, 2012 at 6:46 PM, Dmitrey <[email protected]> wrote:
> >>>
> >>> Hi all,
> >>> could you provide clarification on accepting new numpypy funcs (not only
> >>> for me, but for any other possible volunteers)?
> >>> The doc I've been directed to says only "You have to test exhaustively
> >>> your module", while I would like to know more explicit rules.
> >>> For example, "at least 3 tests per func" (however, I guess the number of
> >>> tests should also be expected to differ for funcs of different complexity
> >>> and variability).
> >>> Also, are there any strict rules for the test cases to be submitted, or
> >>> can I, for example, merely write
> >>>
> >>> if __name__ == '__main__':
> >>>     assert array_equal(1, 1)
> >>>     assert array_equal([1, 2], [1, 2])
> >>>     assert array_equal(N.array([1, 2]), N.array([1, 2]))
> >>>     assert array_equal([1, 2], N.array([1, 2]))
> >>>     assert array_equal([1, 2], [1, 2, 3]) is False
> >>>     print('passed')
> >>
> >> We have pretty exhaustive automated test suites. Look for example
> >> in the pypy/module/micronumpy/test directory for the test file style.
> >> They're run with py.test and we require at the very least full code
> >> coverage (every line has to be executed; there are tools to check this,
> >> like coverage). Also, passing "unusual" input, like sys.maxint etc., is
> >> usually recommended. With your example, you would check whether it works
> >> for, say, views and multidimensional arrays. Also, "is False" is not
> >> considered good style.
> >>
> >>> Or is there a certain rule for storing files with tests?
> >>>
> >>> If I or someone else submits a func with some tests like in the example
> >>> above, will you put the func and tests in the proper files yourselves?
> >>> I'm not too lazy to do it myself, but I'm merely not merged enough into
> >>> the numpypy dev process, including mercurial branches and the numpypy
> >>> file structure, and can spend only quite limited time diving into it in
> >>> the near future.
> >>
> >> We generally require people to add their own tests along with the
> >> code (in appropriate places) because you also should not break
> >> anything. The usefulness of a patch that has to be sliced and diced
> >> and put into place is very limited, and for straightforward
> >> mostly-copied code, like array_equal, plainly useless, since it's almost
> >> as much work to just do it ourselves.
> >
> > Well, for this func (array_equal) my docstrings really were copied from
> > CPython numpy (why wouldn't I do this to save some time, while the license
> > allows it?), but
> > * why wouldn't I go for this, while other programmers are busy with other
> > tasks?
> > * the engines of my func and the CPython numpy func differ completely.
> > First, in PyPy the CPython code just doesn't work at all (because of the
> > problem with ndarray.flat).
> > Second, I have implemented a workaround - I just replaced some code lines
> > with
> >
> > Size = a1.size
> > f1, f2 = a1.flat, a2.flat
> > # TODO: replace xrange by range in Python3
> > for i in xrange(Size):
> >     if f1.next() != f2.next(): return False
> > return True
> >
> > Here are some results in CPython for the following bench:
> >
> > from time import time
> > n = 100000
> > m = 100
> > a = N.zeros(n)
> > b = N.ones(n)
> >
> > t = time()
> > for i in range(m):
> >     N.array_equal(a, b)
> > print('classic numpy array_equal time elapsed (on different arrays): %0.5f' % (time()-t))
> >
> > t = time()
> > for i in range(m):
> >     array_equal(a, b)
> > print('Alternative array_equal time elapsed (on different arrays): %0.5f' % (time()-t))
> >
> > b = N.zeros(n)
> >
> > t = time()
> > for i in range(m):
> >     N.array_equal(a, b)
> > print('classic numpy array_equal time elapsed (on same arrays): %0.5f' % (time()-t))
> >
> > t = time()
> > for i in range(m):
> >     array_equal(a, b)
> > print('Alternative array_equal time elapsed (on same arrays): %0.5f' % (time()-t))
> >
> > CPython numpy results:
> > classic numpy array_equal time elapsed (on different arrays): 0.07728
> > Alternative array_equal time elapsed (on different arrays): 0.00056
> > classic numpy array_equal time elapsed (on same arrays): 0.11163
> > Alternative array_equal time elapsed (on same arrays): 9.09458
> >
> > PyPy results (cannot test the "classic" version because it depends on some
> > funcs that are unavailable yet):
> > Alternative array_equal time elapsed (on different arrays): 0.00133
> > Alternative array_equal time elapsed (on same arrays): 0.95038
> >
> > So, as you see, even in CPython numpy my version is 138 times faster for
> > different arrays (yet 90 times slower for same arrays). However, in the
> > real world usually different arrays come to this func, and only sometimes
> > are equal arrays encountered.
> > For my implementation, in the case of equal arrays the time elapsed
> > essentially depends on their size, but either way I still think my
> > implementation is better than CPython's - it's faster and doesn't require
> > allocating memory for the boolean array that goes to logical_and.
> >
> > I updated my array_equal implementation with the changes mentioned above
> > and the tests on multidimensional arrays you asked for, and put it at
> > http://pastebin.com/tg2aHE6x (now I'll update the bugs.pypy.org entry with
> > the link).
> >
> > -----------------------
> > Regards, D.
> > http://openopt.org/Dmitrey
>
> Worth pointing out that the implementations of array_equal and
> array_equiv in NumPy are a bit embarrassing because they require a
> full N comparisons instead of short-circuiting whenever a False value
> is found. This is completely silly IMHO:
>
> In [34]: x = np.random.randn(100000)
>
> In [35]: y = np.random.randn(100000)
>
> In [36]: timeit np.array_equal(x, y)
> 1000 loops, best of 3: 349 us per loop
>
> - W

The correct solution, IMO, is to reuse the original NumPy implementation,
but have logical_and.reduce short-circuit correctly. This has the nice side
effect of allowing all() and any() to use logical_and.reduce and
logical_or.reduce.
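To make the short-circuiting idea concrete, here is a minimal sketch - not
NumPy's code, not the pastebin code, and not a change to the ufunc machinery -
that keeps the vectorised comparison of the stock implementation but bails out
at the first mismatching block. The function name and the blocksize parameter
are made up for the illustration:

    import numpy as N

    def array_equal_blockwise(a1, a2, blocksize=4096):
        # Same entry checks as the stock array_equal: coercion and shape test.
        try:
            a1, a2 = N.asarray(a1), N.asarray(a2)
        except Exception:
            return False
        if a1.shape != a2.shape:
            return False
        f1, f2 = a1.ravel(), a2.ravel()
        # Compare fixed-size blocks and stop at the first block containing a
        # mismatch, instead of building one boolean array for all N elements.
        for start in xrange(0, f1.size, blocksize):
            stop = start + blocksize
            if not (f1[start:stop] == f2[start:stop]).all():
                return False
        return True

On the bench above this should behave similarly to the flat-iterator version
for different arrays (it stops after the first block) while staying close to
the stock version for equal arrays, since each block comparison is still
vectorised rather than a per-element Python loop.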
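And on the testing question at the top of the thread, a rough sketch of
py.test-style tests covering the cases Maciej mentions - views,
multidimensional arrays, "unusual" values such as sys.maxint, and plain truth
asserts rather than "is False". Where the new func would be imported from is
an open question, so numpy's own array_equal stands in for it here:

    import sys
    import numpy as N  # stand-in; under numpypy the import would differ

    def test_array_equal_basic():
        assert N.array_equal([1, 2], [1, 2])
        assert not N.array_equal([1, 2], [1, 2, 3])  # truth value, not "is False"

    def test_array_equal_multidimensional():
        a = N.arange(6).reshape(2, 3)
        assert N.array_equal(a, a.copy())
        assert not N.array_equal(a, a.T)             # shape mismatch

    def test_array_equal_views():
        a = N.arange(10)
        view = a[::2]                                # non-contiguous view
        assert N.array_equal(view, a[::2].copy())
        assert not N.array_equal(view, a[1::2])

    def test_array_equal_unusual_values():
        assert N.array_equal([sys.maxint], [sys.maxint])
        assert not N.array_equal([sys.maxint], [-sys.maxint - 1])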
Alx

--
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
_______________________________________________
pypy-dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/pypy-dev
