Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-08-30 Thread Marten van Kerkwijk
Hi Nathaniel, others,

I read the discussion of plans with interest. One item that struck me is
that while there are great plans to have a proper extensible and presumably
subclassable dtype, it is discouraged to subclass ndarray itself (rather,
it is encouraged to use a broader array interface). From my experience with
astropy in both Quantity (an ndarray subclass), Time (a separate class
containing high precision times using two ndarray float64), and Table
(initially holding structured arrays, but now sets of Columns, which
themselves are ndarray subclasses), I'm not convinced the broader, new
containers approach is that much preferable. Rather, it leads to a lot of
boiler-plate code to reimplement things ndarray does already (since one is
effectively just calling the methods on the underlying arrays).

I also think the idea that a dtype becomes something that also contains a
unit is a bit odd. Shouldn't dtype just be about how data is stored? Why
include meta-data such as units?

Instead, I think a quantity is most logically seen as numbers with a unit,
just like masked arrays are numbers with masks, and variables numbers with
uncertainties. Each of these cases adds extra information in different
forms, and all are quite easily thought of as subclasses of ndarray where
all operations do the normal operation, plus some extra work to keep the
extra information up to date.
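
As a toy sketch of what I mean (purely illustrative; Quantity itself does far more, e.g., unit conversion and checking inside ufuncs), the subclass pattern is mostly just "let ndarray do the operation, then propagate the extra attribute":

```python
import numpy as np

class UnitArray(np.ndarray):
    """Toy ndarray subclass carrying a unit string (hypothetical)."""

    def __new__(cls, data, unit=None):
        obj = np.asarray(data).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Called whenever a new instance appears (views, slices,
        # ufunc results): propagate the extra information.
        if obj is None:
            return
        self.unit = getattr(obj, 'unit', None)

a = UnitArray([1.0, 2.0, 3.0], unit='m')
b = a[1:]   # slicing preserves the unit
c = a * 2   # arithmetic works via the parent class, unit comes along
```

Almost no boiler-plate, because ndarray already does all the real work.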

Anyway, my suggestion would be to *encourage* rather than discourage
ndarray subclassing, and help this by making ndarray (even) better.

All the best,

Marten




On Thu, Aug 27, 2015 at 11:03 AM,  wrote:

>
>
> On Wed, Aug 26, 2015 at 10:06 AM, Travis Oliphant 
> wrote:
>
>>
>>
>> On Wed, Aug 26, 2015 at 1:41 AM, Nathaniel Smith  wrote:
>>
>>> Hi Travis,
>>>
>>> Thanks for taking the time to write up your thoughts!
>>>
>>> I have many thoughts in return, but I will try to restrict myself to two
>>> main ones :-).
>>>
>>> 1) On the question of whether work should be directed towards improving
>>> NumPy-as-it-is or instead towards a compatibility-breaking replacement:
>>> There's plenty of room for debate about whether it's better engineering
>>> practice to try and evolve an existing system in place versus starting
>>> over, and I guess we have some fundamental disagreements there, but I
>>> actually think this debate is a distraction -- we can agree to disagree,
>>> because in fact we have to try both.
>>>
>>
>> Yes, on this we agree.   I think NumPy can improve *and* we can have new
>> innovative array objects.   I don't disagree about that.
>>
>>
>>>
>>> At a practical level: NumPy *is* going to continue to evolve, because it
>>> has users and people interested in evolving it; similarly, dynd and other
>>> alternatives libraries will also continue to evolve, because they also have
>>> people interested in doing it. And at a normative level, this is a good
>>> thing! If NumPy and dynd both get better, then that's awesome: the worst
>>> case is that NumPy adds the new features that we talked about at the
>>> meeting, and dynd simultaneously becomes so awesome that everyone wants to
>>> switch to it, and the result of this would be... that those NumPy features
>>> are exactly the ones that will make the transition to dynd easier. Or if
>>> some part of that plan goes wrong, then well, NumPy will still be there as
>>> a fallback, and in the mean time we've actually fixed the major pain points
>>> our users are begging us to fix.
>>>
>>> You seem to be urging us all to make a double-or-nothing wager that your
>>> extremely ambitious plans will all work out, with the entire numerical
>>> Python ecosystem as the stakes. I think this ambition is awesome, but maybe
>>> it'd be wise to hedge our bets a bit?
>>>
>>
>> You are mis-characterizing my view.  I think NumPy can evolve (though I
>> would personally rather see a bigger change to the underlying system like I
>> outlined before). But I don't believe it can even evolve easily in the
>> direction needed without breaking ABI and that insisting on not breaking it
>> or even putting too much effort into not breaking it will continue to
>> create less-optimal solutions that are harder to maintain and do not take
>> advantage of knowledge this community now has.
>>
>> I'm also very concerned that 'evolving' NumPy will create a situation
>> where there are regular semantic and subtle API changes that will cause
>> NumPy to be less stable for its user base. I've watched this happen,
>> and this at a time when people are already looking around for new and
>> different approaches anyway.
>>
>>
>>>
>>> 2) You really emphasize this idea of an ABI-breaking (but not
>>> API-breaking) release, and I think this must indicate some basic gap in how
>>> we're looking at things. Where I'm getting stuck here is that... I actually
>>> can't think of anything important that we can't do now, but could if we
>>> were allowed to break ABI compatibility. The kinds of things that break ABI
>>> but keep API are 

[Numpy-discussion] np.sign and object comparisons

2015-08-30 Thread Jaime Fernández del Río
There's been some work going on recently on Py2 vs Py3 object comparisons.
If you want all the background, see gh-6265 and follow the links there.

There is a half-baked PR in the works, gh-6269, that tries to unify behavior
and fix some bugs along the way, by replacing all 2.x uses of
PyObject_Compare with several calls to PyObject_RichCompareBool, which is
available on 2.6, the oldest Python version we support.

The poster child for this problem is computing np.sign on an object array
that has an np.nan entry. 2.x will just make up an answer for us:

>>> cmp(np.nan, 0)
-1

even though none of the relevant compares succeeds:

>>> np.nan < 0
False
>>> np.nan > 0
False
>>> np.nan == 0
False

The current 3.x is buggy, so the fact that it produces the same made up
result as in 2.x is accidental:

>>> np.sign(np.array([np.nan], 'O'))
array([-1], dtype=object)

Looking at the code, it seems that the original intention was for the
answer to be `0`, which is equally made up but perhaps makes a little more
sense.

There are three ways of fixing this that I see:

   1. Arbitrarily choose a value to return. This is equivalent to
   choosing a default return value for `cmp`-style comparisons. This
   preserves behavior, but feels wrong.
   2. Similarly to how np.sign of a floating point array with nans returns
   nan for those values, return, e.g., None for these cases. This is my
   preferred option.
   3. Raise an error, along the lines of the `TypeError: unorderable types`
   that 3.x produces for some comparisons.
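
In Python terms, option 2 would amount to something like the following sketch (`object_sign` is just an illustrative name, not the actual inner loop; the point is that it uses only the rich comparisons that PyObject_RichCompareBool exposes):

```python
import numpy as np

def object_sign(x):
    # Mirror of what the object loop could do using only rich
    # comparisons, as PyObject_RichCompareBool does:
    if x > 0:
        return 1
    if x < 0:
        return -1
    if x == 0:
        return 0
    # No comparison succeeded (e.g. x is nan): option 2 says None.
    return None

values = np.array([1.5, -2.0, 0.0, np.nan], dtype=object)
signs = [object_sign(v) for v in values]   # [1, -1, 0, None]
```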

Thoughts anyone?

Jaime
-- 
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Cythonizing some of NumPy

2015-08-30 Thread David Cournapeau
Hi there,

Reading Nathaniel's summary from the numpy dev meeting, it looks like there
is a consensus on using cython in numpy for the Python-C interfaces.

This has been on my radar for a long time: it was part of my rationale for
splitting multiarray into multiple "independent" .c files half a decade
ago. I took the opportunity of EuroScipy sprints to look back into this,
but before looking more into it, I'd like to make sure I am not going
astray:

1. The transition has to be gradual
2. The obvious way I can think of allowing cython in multiarray is
modifying multiarray so that cython "owns" the PyMODINIT_FUNC and the
module PyModuleDef table.
3. We start using cython for the parts that are mostly menial refcount
work. Things like functions in calculation.c are obvious candidates.

Step 2 should not be disruptive, and does not look like a lot of work:
there are < 60 methods in the table, and most of them should be fairly
straightforward to cythonize. At worst, we could just keep them as is
outside cython and just "export" them in cython.
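
For concreteness, step 2 might look roughly like this hypothetical `multiarray.pyx` fragment (the function and header names are illustrative only; the point is that cython generates the PyMODINIT_FUNC and PyModuleDef boilerplate, while existing C implementations can stay where they are and simply be re-exported):

```cython
# multiarray.pyx -- hypothetical sketch.  Compiling this module with
# cython makes cython "own" module initialization: the PyMODINIT_FUNC
# and the module table are generated for us.

cdef extern from "numpy/arrayobject.h":
    # An existing C implementation kept outside cython...
    object PyArray_Concatenate(object arrays, int axis)

def concatenate(arrays, axis=0):
    # ...simply "exported" as a module-level def, no rewrite needed.
    return PyArray_Concatenate(arrays, axis)
```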

Does that sound like an acceptable plan?

If so, I will start working on a PR to work on 2.

David