Re: [pypy-dev] certificate for accepting numpypy new funcs?

Maciej Fijalkowski Fri, 20 Jan 2012 05:19:35 -0800

On Fri, Jan 20, 2012 at 2:47 PM, Neal Becker <[email protected]> wrote:
> Alex Gaynor wrote:
>
>> On Thu, Jan 19, 2012 at 6:15 PM, Wes McKinney <[email protected]> wrote:
>>
>>> On Thu, Jan 19, 2012 at 2:49 PM, Dmitrey <[email protected]> wrote:
>>> > On 01/19/2012 07:31 PM, Maciej Fijalkowski wrote:
>>> >>
>>> >> On Thu, Jan 19, 2012 at 6:46 PM, Dmitrey<[email protected]>  wrote:
>>> >>>
>>> >>> Hi all,
>>> >>> could you clarify the rules for accepting new numpypy funcs (not only
>>> >>> for me, but for any other possible volunteers)?
>>> >>> The doc I've been directed to says only "You have to test exhaustively
>>> >>> your module", while I would like to know more explicit rules.
>>> >>> For example, "at least 3 tests per func" (however, I guess the number
>>> >>> of tests should be expected to differ for funcs of different
>>> >>> complexity and variability).
>>> >>> Also, are there any strict rules for the testcases to be submitted, or
>>> >>> can I, for example, merely write
>>> >>>
>>> >>> if __name__ == '__main__':
>>> >>>    assert array_equal(1, 1)
>>> >>>    assert array_equal([1, 2], [1, 2])
>>> >>>    assert array_equal(N.array([1, 2]), N.array([1, 2]))
>>> >>>    assert array_equal([1, 2], N.array([1, 2]))
>>> >>>    assert array_equal([1, 2], [1, 2, 3]) is False
>>> >>>    print('passed')
>>> >>
>>> >> We have pretty exhaustive automated testing suites. Look for example
>>> >> in pypy/module/micronumpy/test directory for the test file style.
>>> >> They're run with py.test and we require at the very least full code
>>> >> coverage (every line has to be executed, there are tools to check,
>>> >> like coverage). Also passing "unusual" input, like sys.maxint  etc. is
>>> >> usually recommended. With your example, you would check if it works
>>> >> for say views and multidimensional arrays. Also "is False" is not
>>> >> considered good style.
>>> >>
>>> >>> Or is there a certain rule for storing files with tests?
>>> >>>
>>> >>> If I or someone else submits a func with some tests like in the
>>> >>> example above, will you put the func and tests in the proper files
>>> >>> yourself? I'm not too lazy to do it myself, but I'm merely not
>>> >>> involved enough in the numpypy dev process, including mercurial
>>> >>> branches and the numpypy file structure, and can spend only quite
>>> >>> limited time on diving into it in the near future.
>>> >>
>>> >> We generally require people to put their own tests as they go with the
>>> >> code (in appropriate places) because you also should not break
>>> >> anything. The usefulness of a patch that has to be sliced and diced
>>> >> and put into places is very limited and for straightforward
>>> >> mostly-copied code, like array_equal, plain useless, since it's almost
>>> >> as much work to just do it.
>>> >
>>> > Well, for this func (array_equal) my docstrings really were copied from
>>> > cpython numpy (why wouldn't I do this to save some time, when the
>>> > license allows it?), but
>>> > * why wouldn't I go for this, while other programmers are busy with
>>> > other tasks?
>>> > * the engines of my func and the CPython numpy func differ completely.
>>> > First, in PyPy the CPython code just doesn't work at all (because of the
>>> > problem with ndarray.flat). Second, I have implemented a workaround:
>>> > I just replaced some code lines with
>>> >    Size = a1.size
>>> >    f1, f2 = a1.flat, a2.flat
>>> >    # TODO: replace xrange by range in Python3
>>> >    for i in xrange(Size):
>>> >        if f1.next() != f2.next(): return False
>>> >    return True
>>> >
>>> > Here are some results in CPython for the following bench:
>>> >
>>> > import numpy as N
>>> > from time import time
>>> > n = 100000
>>> > m = 100
>>> > a = N.zeros(n)
>>> > b = N.ones(n)
>>> > t = time()
>>> > for i in range(m):
>>> >    N.array_equal(a, b)
>>> > print('classic numpy array_equal time elapsed (on different arrays): %0.5f'
>>> >       % (time()-t))
>>> >
>>> >
>>> > t = time()
>>> > for i in range(m):
>>> >    array_equal(a, b)
>>> > print('Alternative array_equal time elapsed (on different arrays): %0.5f'
>>> >       % (time()-t))
>>> >
>>> > b = N.zeros(n)
>>> >
>>> > t = time()
>>> > for i in range(m):
>>> >    N.array_equal(a, b)
>>> > print('classic numpy array_equal time elapsed (on same arrays): %0.5f' %
>>> > (time()-t))
>>> >
>>> > t = time()
>>> > for i in range(m):
>>> >    array_equal(a, b)
>>> > print('Alternative array_equal time elapsed (on same arrays): %0.5f' %
>>> > (time()-t))
>>> >
>>> > CPython numpy results:
>>> > classic numpy array_equal time elapsed (on different arrays): 0.07728
>>> > Alternative array_equal time elapsed (on different arrays): 0.00056
>>> > classic numpy array_equal time elapsed (on same arrays): 0.11163
>>> > Alternative array_equal time elapsed (on same arrays): 9.09458
>>> >
>>> > PyPy results (cannot test on "classic" version because it depends on some
>>> > funcs that are unavailable yet):
>>> > Alternative array_equal time elapsed (on different arrays): 0.00133
>>> > Alternative array_equal time elapsed (on same arrays): 0.95038
>>> >
>>> >
>>> > So, as you see, even in CPython numpy my version is 138 times faster for
>>> > different arrays (yet about 80 times slower for equal arrays). However,
>>> > in the real world it is usually different arrays that come to this func,
>>> > and only sometimes are equal arrays encountered.
>>> > Well, for my implementation the time elapsed in the case of equal arrays
>>> > essentially depends on their size, but either way I still think my
>>> > implementation is better than CPython's: it's faster and doesn't require
>>> > allocating memory for the boolean array that goes to logical_and.
>>> >
>>> > I updated my array_equal implementation with the changes mentioned above
>>> > and some tests on multidimensional arrays that you asked for, and put it
>>> > at http://pastebin.com/tg2aHE6x (now I'll update the bugs.pypy.org entry
>>> > with the link).
>>> >
>>> >
>>> > -----------------------
>>> > Regards, D.
>>> > http://openopt.org/Dmitrey
>>> > _______________________________________________
>>> > pypy-dev mailing list
>>> > [email protected]
>>> > http://mail.python.org/mailman/listinfo/pypy-dev
>>>
>>> Worth pointing out that the implementations of array_equal and
>>> array_equiv in NumPy are a bit embarrassing because they require a
>>> full N comparisons instead of short-circuiting whenever a False value
>>> is found. This is completely silly IMHO:
>>>
>>> In [34]: x = np.random.randn(100000)
>>>
>>> In [35]: y = np.random.randn(100000)
>>>
>>> In [36]: timeit np.array_equal(x, y)
>>> 1000 loops, best of 3: 349 us per loop
>>>
>>> - W
>>>
>>
>> The correct solution (IMO), is to reuse the original NumPy implementation,
>> but have logical_and.reduce short circuit correctly.  This has the nice
>> side effect of allowing all() and any() to use
>> logical_and/logical_or.reduce.
>>
>> Alx
>>
>
>
> I complained on the numpy list about this issue about a year ago.  The
> usual numpy idiom is
>
> np.any (some comparison)
>
> which will create a full-size array, comparing every element, before
> attempting the 'any', which is obviously wasteful.  Hope numpypy can do
> better.


It does better already FYI. It does not completely work with all kinds
of possible constructs (like .flat) but works in general.
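
For illustration, the short-circuiting behaviour discussed in this thread
can be sketched in plain Python. This is only a minimal sketch under stated
assumptions, not the numpypy implementation: it works on nested lists and
tuples rather than ndarrays, and the helper names shape_of and flatten are
hypothetical. The tests at the bottom follow the py.test style suggested
earlier in the thread (plain asserts, "not ..." instead of "is False").

```python
def shape_of(x):
    """Return the shape of a nested list/tuple (assumed rectangular)."""
    dims = []
    while isinstance(x, (list, tuple)):
        dims.append(len(x))
        if not x:
            break
        x = x[0]
    return tuple(dims)


def flatten(x):
    """Lazily yield the scalars of a nested list/tuple in row-major order."""
    if isinstance(x, (list, tuple)):
        for item in x:
            for scalar in flatten(item):
                yield scalar
    else:
        yield x


def array_equal(a1, a2):
    """True if both inputs have the same shape and equal elements.

    all() stops at the first unequal pair, so no full-size boolean
    array is ever built.
    """
    if shape_of(a1) != shape_of(a2):
        return False
    return all(x == y for x, y in zip(flatten(a1), flatten(a2)))


# py.test collects functions whose names start with test_.
def test_equal_inputs():
    assert array_equal(1, 1)
    assert array_equal([1, 2], (1, 2))
    assert array_equal([[1, 2], [3, 4]], [[1, 2], [3, 4]])


def test_unequal_inputs():
    assert not array_equal([1, 2], [1, 2, 3])  # different shape
    assert not array_equal([[1, 2], [3, 4]], [[1, 2], [3, 5]])
    assert not array_equal([1, 2, 3, 4], [[1, 2], [3, 4]])
```

Because all() consumes the generator lazily, two inputs that differ in
their first element are rejected after a single comparison, which is the
case the "different arrays" benchmark above measures.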
