Re: [Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

2015-09-24 Thread Antoine Pitrou
On Thu, 24 Sep 2015 00:20:23 -0700
Nathaniel Smith wrote:
> > int PyUFunc_Identity(PyUFuncObject *)
> >
> >   Replaces ufunc->identity.
> 
> Hmm, I can imagine cases where we might want to change how this works.
> (E.g. if np.dot were a ufunc then the existing identity settings
> wouldn't work very well... and I have some vague memory that there
> might already be some delicate code in a few places because of
> difficulties in defining "zero" and "one" for arbitrary dtypes.)

Yes... As long as there is a way for us to set the identity value
(whatever the exact API) when constructing a ufunc, it should be ok.

> I assume the 'i' part isn't actually interesting here (since there's
> no longer any parallel vector of function pointers accessible), and
> the high-level semantics that you're looking for are "please give me
> the set of signatures that have a loop defined"?

Indeed.

> [Edit: Also, see the discussion below about integer type pointers. The
> consequences here are that we can certainly provide an operation like
> this, but if we do then we might be abandoning it in a few releases
> (e.g. it might start telling you about only a subset of defined
> signatures). So can you expand a bit on what you mean by "would be
> nice" above?]

"Would be nice" really means "we could make use of it" for letting the
user access ufunc metadata. We don't *need* it currently. But
generally being able to query the high-level properties of a ufunc,
from C, sounds like a good thing, and perhaps other people would be
interested.

> > PyObject *PyUFunc_SetObject(PyUFuncObject *, PyObject *)
> >
> >   Sets the ufunc's "object" to the given object.  The object has no
> >   special semantics except that it is DECREF'ed when the ufunc is
> >   deallocated (this is today's ufunc->obj).  The DECREF should happen
> >   only after the ufunc has accessed any internal resources (since the
> >   DECREF could deallocate some of those resources).
> 
> I understand why you need a "base" object like this for individual
> loops, but if ufuncs start managing the ufunc-level memory buffers
> internally, then is this still useful? I guess I'm curious to see an
> example.

Well, for example, we dynamically allocate the ufunc's name (and
possibly its docstring), so we need to deallocate it when the ufunc is
destroyed.  Actually, we should probably deallocate more stuff that we
currently don't (such as the execution environment)...

> > PyObject *PyUFunc_GetObject(PyUFuncObject *)
> >
> >   Return the ufunc's current "object".
> 
> Oh, are you planning to actually use this to attach some arbitrary
> metadata, not just attach deallocation callbacks?

No, just deallocation callbacks. I was including the GetObject function
for completeness; I'm not sure we would need it (but it sounds trivial
to provide and maintain).

> Hmm, that's an interesting and tricky point, actually -- I think the
> way it will work eventually is that signatures will be specified in
> terms of "dtypetypes" (i.e., subclasses of dtype, rather than ints
> *or* instances of dtype = PyArray_Descrs).

Subclasses? I'm not sure what you mean by that, how would one specify
e.g. an int64 vs. an int32?

Are you referring to Travis' dtypes-as-classes project, or something
similar? In that case though, a dtype would still be an instance of a
"dtypetype" (metatype), not a subclass :-)

> But I guess that's just a
> challenge we'll have to think about when implementing this stuff --
> either it means that the new ufunc API will have to wait a bit for
> more of the new dtype machinery to be ready, or we'll have to
> temporarily bridge the gap with a loop registration API that takes
> new-style loop callbacks but uses int signatures (and then later turn
> it into a thin wrapper around the final API).

Well, as long as you keep the int typecodes in Numpy (and I guess
you will for quite some time, for compatibility), bridging should be
easy indeed.

Regards

Antoine.


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

2015-09-24 Thread Nathaniel Smith
On Tue, Sep 22, 2015 at 7:57 AM, Antoine Pitrou wrote:
>
> Hi,
>
> This e-mail is an attempt at proposing an API to solve Numba's needs.

Thanks!

> Attribute access
> ----------------
>
> int PyUFunc_Nin(PyUFuncObject *)
>
>   Replaces ufunc->nin.
>
> int PyUFunc_Nout(PyUFuncObject *)
>
>   Replaces ufunc->nout.
>
> int PyUFunc_Nargs(PyUFuncObject *)
>
>   Replaces ufunc->nargs.
>
> PyObject *PyUFunc_Name(PyUFuncObject *)
>
>   Replaces ufunc->name, returns a unicode object.
>   (alternative: return a const char *)

These all seem trivially supportable going forward.

> For introspection, the following would be nice too:
>
> int PyUFunc_Identity(PyUFuncObject *)
>
>   Replaces ufunc->identity.

Hmm, I can imagine cases where we might want to change how this works.
(E.g. if np.dot were a ufunc then the existing identity settings
wouldn't work very well... and I have some vague memory that there
might already be some delicate code in a few places because of
difficulties in defining "zero" and "one" for arbitrary dtypes.)

> const char *PyUFunc_Signature(PyUFuncObject *, int i)
>
>   Gives a pointer to the types of the i'th signature.
>   (equivalent today to &ufunc->types[i * ufunc->nargs])

I assume the 'i' part isn't actually interesting here (since there's
no longer any parallel vector of function pointers accessible), and
the high-level semantics that you're looking for are "please give me
the set of signatures that have a loop defined"?

[Edit: Also, see the discussion below about integer type pointers. The
consequences here are that we can certainly provide an operation like
this, but if we do then we might be abandoning it in a few releases
(e.g. it might start telling you about only a subset of defined
signatures). So can you expand a bit on what you mean by "would be
nice" above?]

> Lifetime control
> ----------------
>
> PyObject *PyUFunc_SetObject(PyUFuncObject *, PyObject *)
>
>   Sets the ufunc's "object" to the given object.  The object has no
>   special semantics except that it is DECREF'ed when the ufunc is
>   deallocated (this is today's ufunc->obj).  The DECREF should happen
>   only after the ufunc has accessed any internal resources (since the
>   DECREF could deallocate some of those resources).

I understand why you need a "base" object like this for individual
loops, but if ufuncs start managing the ufunc-level memory buffers
internally, then is this still useful? I guess I'm curious to see an
example.

> PyObject *PyUFunc_GetObject(PyUFuncObject *)
>
>   Return the ufunc's current "object".

Oh, are you planning to actually use this to attach some arbitrary
metadata, not just attach deallocation callbacks?

> Loop registration
> -----------------
>
> int PyUFunc_RegisterLoopForSignature(
>     PyUFuncObject* ufunc,
>     PyUFuncGenericFunction function, int *arg_types,
>     void *data, PyObject *obj)
>
>   Register a loop implementation for the given arg_types (built-in
>   types, presumably). This either appends the loop to the types and
>   functions array (reallocating it if necessary), or replaces an
>   existing one with the same signature.
>
>   A copy of arg_types is done, such that the caller does not have to
>   manage its lifetime. The optional "PyObject *obj" is an object which
>   gets DECREF'ed when the loop is relinquished (for example when the
>   ufunc is destroyed, or when the loop gets replaced with another by
>   calling this function again).
>
>
> I cannot say I'm 100% sure this is sufficient, but it seems it should
> cover our current needs.
>
> Note this is a minimal proposal. For example, Numpy could instead decide
> to pass and return all argument types as PyArray_Descr pointers rather
> than raw integers, and that would probably work for us too.

Hmm, that's an interesting and tricky point, actually -- I think the
way it will work eventually is that signatures will be specified in
terms of "dtypetypes" (i.e., subclasses of dtype, rather than ints
*or* instances of dtype = PyArray_Descrs). But I guess that's just a
challenge we'll have to think about when implementing this stuff --
either it means that the new ufunc API will have to wait a bit for
more of the new dtype machinery to be ready, or we'll have to
temporarily bridge the gap with a loop registration API that takes
new-style loop callbacks but uses int signatures (and then later turn
it into a thin wrapper around the final API).

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

2015-09-22 Thread Nathaniel Smith
On Tue, Sep 22, 2015 at 3:43 PM, Charles R Harris wrote:
>
>
> On Mon, Sep 21, 2015 at 10:23 PM, Nathaniel Smith wrote:
[...]
>> When it comes to evolving these APIs in general: one unfortunate thing
>> about the PyArrayObject changes in 1.7 is that because they were
>> implemented using *inline* functions (/macros) they haven't affected
>
>
> One thing we might consider along the way is separating numpy.multiarray and
> friends into an actual library plus a module. That way the new numpy api
> would be exposed in the library rather than by importing an array of
> pointers from the module.
>

I'm not sure whether we'll be able to pull this off at the technical
level? Partly because anything involving cross-platform linker
behavior is a recipe for unpleasantness, but mostly because doing
sliding-window API/ABI tracking requires that we have some way to
check which of multiple APIs a given third-party package is
requesting, and provide a nice error if the one they want isn't
available, and I'm not certain how to accomplish that via the regular
linker. But sure, something to look into when we reach that point :-)
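
For reference, with today's scheme (importing an array of pointers from
the module) that kind of check is easy to express. A minimal sketch in
plain C follows; `toy_import_api`, `ToyApiTable` and the version numbers
are all hypothetical, not NumPy's actual import machinery:

```c
#include <stdio.h>
#include <stddef.h>

#define TOY_API_OLDEST 3   /* hypothetical: oldest API version still supported */
#define TOY_API_NEWEST 5   /* hypothetical: newest API version this build offers */

typedef struct {
    int version;
    /* the function pointers for this API version would go here */
} ToyApiTable;

static const ToyApiTable toy_tables[] = { {3}, {4}, {5} };

/* Return the table for the requested API version, or NULL after
 * writing a readable explanation into errbuf. */
const ToyApiTable *toy_import_api(int requested, char *errbuf, size_t errlen)
{
    if (requested < TOY_API_OLDEST || requested > TOY_API_NEWEST) {
        snprintf(errbuf, errlen,
                 "extension wants API v%d, but this build supports v%d..v%d",
                 requested, TOY_API_OLDEST, TOY_API_NEWEST);
        return NULL;
    }
    return &toy_tables[requested - TOY_API_OLDEST];
}
```

The open question is how to get the same "nice error" behavior when
symbols are resolved by the platform linker rather than by a lookup
function like this one.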

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

2015-09-22 Thread Charles R Harris
On Tue, Sep 22, 2015 at 10:19 PM, Nathaniel Smith wrote:

> On Tue, Sep 22, 2015 at 3:43 PM, Charles R Harris wrote:
> >
> >
> > On Mon, Sep 21, 2015 at 10:23 PM, Nathaniel Smith wrote:
> [...]
> >> When it comes to evolving these APIs in general: one unfortunate thing
> >> about the PyArrayObject changes in 1.7 is that because they were
> >> implemented using *inline* functions (/macros) they haven't affected
> >
> >
> > One thing we might consider along the way is separating numpy.multiarray and
> > friends into an actual library plus a module. That way the new numpy api
> > would be exposed in the library rather than by importing an array of
> > pointers from the module.
> >
>
> I'm not sure whether we'll be able to pull this off at the technical
> level? Partly because anything involving cross-platform linker
> behavior is a recipe for unpleasantness, but mostly because doing
> sliding-window API/ABI tracking requires that we have some way to
> check which of multiple APIs a given third-party package is
> requesting, and provide a nice error if the one they want isn't
> available, and I'm not certain how to accomplish that via the regular
> linker. But sure, something to look into when we reach that point :-)
>

I'd recommend the Henry Ford approach: "Any customer can have a car
painted any color that he wants so long as it is *black*". Essentially, an
ABI break split between a backward compatible layer on top and a bare
metal layer below, with the latter recommended. We would still need to
solve the "hide the structure" problem, but that is probably unavoidable
whatever approach we take. In any case, it might be worthwhile making a
list of the functions such a library would expose. I'm not sure how big a
problem linking would be; Windows would likely continue to be the largest
source of problems if we go the shared library route.

Chuck


Re: [Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

2015-09-22 Thread Antoine Pitrou
On Mon, 21 Sep 2015 21:38:36 -0700
Nathaniel Smith wrote:
> Hi Antoine,
> 
> On Mon, Sep 21, 2015 at 2:44 AM, Antoine Pitrou wrote:
> >
> > Hi Nathaniel,
> >
> > On Sun, 20 Sep 2015 21:13:30 -0700
> > Nathaniel Smith wrote:
> >> Given this, I propose that for 1.11 we:
> >> 1) go ahead and hide/disable the problematic parts of the ABI/API,
> >> 2) coordinate with the known affected projects to minimize disruption
> >> to their users (which is made easier since they are all projects that
> >> are almost exclusively distributed via conda, which enforces strict
> >> NumPy ABI versioning),
> >> 3) publicize these changes widely so as to give any private code that
> >> might be affected a chance to speak up or adapt, and
> >> 4) leave the "ABI version tag" as it is, so as not to force rebuilds
> >> of the vast majority of projects that will be unaffected by these
> >> changes.
> >
> > Thanks for a detailed and clear explanation of the proposed changes.
> > As far as Numba is concerned, making changes is ok for us provided
> > Numpy provides APIs to do what we want.
> 
> Good to hear, thanks!
> 
> Any interest in designing those new APIs that will do what you want?
> :-)

I'll take a look.

Regards

Antoine.




Re: [Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

2015-09-22 Thread Antoine Pitrou

Hi,

This e-mail is an attempt at proposing an API to solve Numba's needs.

Attribute access
----------------

int PyUFunc_Nin(PyUFuncObject *)

  Replaces ufunc->nin.

int PyUFunc_Nout(PyUFuncObject *)

  Replaces ufunc->nout.

int PyUFunc_Nargs(PyUFuncObject *)

  Replaces ufunc->nargs.

PyObject *PyUFunc_Name(PyUFuncObject *)

  Replaces ufunc->name, returns a unicode object.
  (alternative: return a const char *)

For introspection, the following would be nice too:

int PyUFunc_Identity(PyUFuncObject *)

  Replaces ufunc->identity.

const char *PyUFunc_Signature(PyUFuncObject *, int i)

  Gives a pointer to the types of the i'th signature.
  (equivalent today to &ufunc->types[i * ufunc->nargs])
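
As a rough illustration of what these accessors amount to, here is a
self-contained sketch using a stand-in struct. `ToyUFunc` and the
`ToyUFunc_*` names are hypothetical; the real PyUFuncObject has many
more fields, and the real functions would take a PyUFuncObject pointer.
The sketch assumes the type codes of all signatures are stored back to
back, nargs entries per signature:

```c
#include <stddef.h>

/* Hypothetical stand-in for the ufunc struct. In the proposal the
 * real layout would be hidden from user code. */
typedef struct {
    int nin;
    int nout;
    int nargs;          /* nin + nout */
    int ntypes;         /* number of registered signatures */
    const char *types;  /* ntypes * nargs type codes, back to back */
} ToyUFunc;

/* Out-of-line getters: callers never depend on the struct layout. */
int ToyUFunc_Nin(const ToyUFunc *u)   { return u->nin; }
int ToyUFunc_Nout(const ToyUFunc *u)  { return u->nout; }
int ToyUFunc_Nargs(const ToyUFunc *u) { return u->nargs; }

/* Pointer to the nargs type codes of the i'th signature,
 * or NULL if i is out of range. */
const char *ToyUFunc_Signature(const ToyUFunc *u, int i)
{
    if (i < 0 || i >= u->ntypes) {
        return NULL;
    }
    return &u->types[i * u->nargs];
}
```

Unlike direct field access, the Signature getter can bounds-check the
index, which is one small benefit of going through a function.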


Lifetime control
----------------

PyObject *PyUFunc_SetObject(PyUFuncObject *, PyObject *)

  Sets the ufunc's "object" to the given object.  The object has no
  special semantics except that it is DECREF'ed when the ufunc is
  deallocated (this is today's ufunc->obj).  The DECREF should happen
  only after the ufunc has accessed any internal resources (since the
  DECREF could deallocate some of those resources).

PyObject *PyUFunc_GetObject(PyUFuncObject *)

  Return the ufunc's current "object".


Loop registration
-----------------

int PyUFunc_RegisterLoopForSignature(
    PyUFuncObject* ufunc,
    PyUFuncGenericFunction function, int *arg_types,
    void *data, PyObject *obj)

  Register a loop implementation for the given arg_types (built-in
  types, presumably). This either appends the loop to the types and
  functions array (reallocating it if necessary), or replaces an
  existing one with the same signature.

  A copy of arg_types is done, such that the caller does not have to
  manage its lifetime. The optional "PyObject *obj" is an object which
  gets DECREF'ed when the loop is relinquished (for example when the
  ufunc is destroyed, or when the loop gets replaced with another by
  calling this function again).
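
To make the intended semantics concrete, here is a minimal sketch of the
"replace or append, and copy arg_types" behavior in plain C. Everything
named `Toy*` is a hypothetical stand-in (no Python or NumPy headers are
used): `ToyRelease` plays the role of the DECREF on obj, and the toy
function takes over ownership of obj rather than adding a reference.

```c
#include <stdlib.h>
#include <string.h>

typedef void (*ToyLoopFunc)(void);      /* stand-in for PyUFuncGenericFunction */
typedef void (*ToyRelease)(void *obj);  /* stand-in for the DECREF on obj */

typedef struct {
    ToyLoopFunc func;
    int *arg_types;   /* private copy, nargs entries */
    void *data;
    void *obj;        /* released when the loop is relinquished */
} ToyLoop;

typedef struct {
    int nargs;
    int nloops;
    ToyLoop *loops;
    ToyRelease release;
} ToyUFunc;

/* Returns 0 on success, -1 on allocation failure. */
int ToyUFunc_RegisterLoopForSignature(ToyUFunc *u, ToyLoopFunc func,
                                      const int *arg_types,
                                      void *data, void *obj)
{
    size_t nbytes = (size_t)u->nargs * sizeof(int);
    int i;

    /* Replace an existing loop that has the same signature, if any. */
    for (i = 0; i < u->nloops; i++) {
        if (memcmp(u->loops[i].arg_types, arg_types, nbytes) == 0) {
            if (u->loops[i].obj != NULL && u->release != NULL) {
                u->release(u->loops[i].obj);  /* old loop is relinquished */
            }
            u->loops[i].func = func;
            u->loops[i].data = data;
            u->loops[i].obj = obj;
            return 0;
        }
    }

    /* Otherwise append, copying arg_types so that the caller does not
     * have to keep its own array alive. */
    int *copy = malloc(nbytes);
    if (copy == NULL) {
        return -1;
    }
    ToyLoop *grown = realloc(u->loops,
                             (size_t)(u->nloops + 1) * sizeof(ToyLoop));
    if (grown == NULL) {
        free(copy);
        return -1;
    }
    memcpy(copy, arg_types, nbytes);
    u->loops = grown;
    u->loops[u->nloops].func = func;
    u->loops[u->nloops].arg_types = copy;
    u->loops[u->nloops].data = data;
    u->loops[u->nloops].obj = obj;
    u->nloops++;
    return 0;
}
```

A real implementation would also release every remaining obj and free
the copied arrays when the ufunc itself is destroyed.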


I cannot say I'm 100% sure this is sufficient, but it seems it should
cover our current needs.

Note this is a minimal proposal. For example, Numpy could instead decide
to pass and return all argument types as PyArray_Descr pointers rather
than raw integers, and that would probably work for us too.

Regards

Antoine.




Re: [Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

2015-09-21 Thread Nathaniel Smith
On Mon, Sep 21, 2015 at 7:29 AM, Jaime Fernández del Río wrote:
> We have the PyArrayObject vs PyArrayObject_fields definition in
> ndarraytypes.h that is used to enforce access to the members through inline
> functions rather than directly,  which seems to me like the right way to go:
> don't leave stones unturned, hide everything and provide PyUFunc_NIN,
> PyUFunc_NOUT and friends to handle those too.

The PyArrayObject vs PyArrayObject_fields distinction is only enabled
if a downstream library explicitly requests it with #define
NPY_NO_DEPRECATED_API, though -- the idea is that the changes in this
NEP would be enabled unconditionally, even for old code. So the reason
nin/nout/nargs are left exposed in this proposal is that there's some
existing code out there that would break (until updated) if we hid
them, and not much benefit to breaking it.

If we're fine with breaking that code then we could just hide them
unconditionally too. The only code I found in the wild that would be
affected is the "rational" user-defined dtype, which would be
trivially fixable since the only thing it does with ufunc->nargs is a
quick consistency check:

  
https://github.com/numpy/numpy-dtypes/blob/c0175a6b1c5aa89b4520b29487f06d0e200e2a03/npytypes/rational/rational.c#L1140-L1151

Also it's not 100% clear right now whether we even want to keep
supporting the old user-defined dtype API that this particular code is
based around. But if this code uses ufunc->nargs then perhaps other
code does too? I'm open to opinions -- I doubt it matters that much
either way. I just want to make sure that we can hide the other stuff
:-).

When it comes to evolving these APIs in general: one unfortunate thing
about the PyArrayObject changes in 1.7 is that because they were
implemented using *inline* functions (/macros) they haven't affected
the a*B*i exposure at all, even in code that has upgraded to the new
calling conventions. While user code no longer *names* the internal
fields directly, we still have to implement exactly the same fields
and put them in exactly the same place in memory or else break ABI.
And the other unfortunate thing is that we don't really have a
mechanism for saying "okay, we're dropping support for the old way of
doing things in 1.xx" -- in particular the current
NPY_NO_DEPRECATED_API mechanism doesn't give us any way to detect and
error out if someone tries to use an old version of the APIs, so ABI
breaks still mean segfaults. I'm thinking that if/when we figure out
how to implement the "sliding window" API/ABI idea that we talked
about at SciPy, then that will give us a strategy for cleanly
transitioning to a world with a maintainable API+ABI and it becomes
worth sitting down and making up a set of setters/getters for the
attributes that we want to make public in a maintainable way. But
until then our only real options are either hard breaks or nothing, so
unless we want to do a hard break there's not much point talking about
it.
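
To illustrate the inline-accessor point, here is a small self-contained
sketch. The structs and names are hypothetical (they are not the real
PyArrayObject layout); the "stale" accessor models what a macro or
inline function compiled into an old extension does, namely read at an
offset fixed when the extension was built:

```c
#include <stddef.h>
#include <string.h>

/* v1 layout: what old extensions were compiled against. */
typedef struct { int flags; int nd; } ToyArrayV1;

/* v2 layout: a later release inserted a field before nd. */
typedef struct { int flags; int extra; int nd; } ToyArrayV2;

/* Models an inline/macro accessor baked into an extension built
 * against v1: the offset of nd was decided at the extension's
 * compile time and does not follow the library. */
int toy_ndim_stale(const void *arr)
{
    int nd;
    memcpy(&nd, (const char *)arr + offsetof(ToyArrayV1, nd), sizeof nd);
    return nd;
}

/* Out-of-line accessor that lives inside the library: it is rebuilt
 * together with the struct, so it always uses the current layout. */
int ToyArray_NDIM(const void *arr)
{
    return ((const ToyArrayV2 *)arr)->nd;
}
```

With a v2 object, the stale accessor silently returns the value of
`extra`, which is why the fields have to stay exactly where they are
for as long as inline accessors are in circulation.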

-n


Re: [Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

2015-09-21 Thread Nathaniel Smith
Hi Antoine,

On Mon, Sep 21, 2015 at 2:44 AM, Antoine Pitrou wrote:
>
> Hi Nathaniel,
>
> On Sun, 20 Sep 2015 21:13:30 -0700
> Nathaniel Smith wrote:
>> Given this, I propose that for 1.11 we:
>> 1) go ahead and hide/disable the problematic parts of the ABI/API,
>> 2) coordinate with the known affected projects to minimize disruption
>> to their users (which is made easier since they are all projects that
>> are almost exclusively distributed via conda, which enforces strict
>> NumPy ABI versioning),
>> 3) publicize these changes widely so as to give any private code that
>> might be affected a chance to speak up or adapt, and
>> 4) leave the "ABI version tag" as it is, so as not to force rebuilds
>> of the vast majority of projects that will be unaffected by these
>> changes.
>
> Thanks for a detailed and clear explanation of the proposed changes.
> As far as Numba is concerned, making changes is ok for us provided
> Numpy provides APIs to do what we want.

Good to hear, thanks!

Any interest in designing those new APIs that will do what you want?
:-) A no-brainer is that PyUFuncObject should just take responsibility
for managing the memory of its own internal arrays instead of assuming
that they'll always be statically allocated and forcing elaborate
workarounds when they're not, but there is a lot of complicated stuff
going on in numba/npyufunc/_internal.c... I am even wondering whether
we should go ahead and reify a first-class "ufunc loop" object, so it
can have its own refcounting.
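
A first-class loop object could be as small as the following sketch.
This is only an illustration in plain C; `ToyLoop` and its functions are
hypothetical names, and a real version would presumably be a proper
Python object using the interpreter's own refcounting:

```c
#include <stdlib.h>

typedef struct ToyLoop {
    int refcount;
    void *data;                     /* per-loop state */
    void (*free_data)(void *data);  /* cleanup hook, may be NULL */
} ToyLoop;

/* Create a loop object holding one reference, or return NULL. */
ToyLoop *ToyLoop_New(void *data, void (*free_data)(void *data))
{
    ToyLoop *loop = malloc(sizeof *loop);
    if (loop == NULL) {
        return NULL;
    }
    loop->refcount = 1;
    loop->data = data;
    loop->free_data = free_data;
    return loop;
}

void ToyLoop_Incref(ToyLoop *loop)
{
    loop->refcount++;
}

/* Drop one reference. Returns 1 if the loop was destroyed, else 0. */
int ToyLoop_Decref(ToyLoop *loop)
{
    if (--loop->refcount > 0) {
        return 0;
    }
    if (loop->free_data != NULL) {
        loop->free_data(loop->data);
    }
    free(loop);
    return 1;
}
```

The ufunc would hold one reference per registered loop, so replacing a
loop or destroying the ufunc reduces to a Decref and the per-loop
resources clean themselves up.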

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

2015-09-21 Thread Bryan Van de Ven

> 
> until then our only real options are either hard breaks or nothing, so
> unless we want to do a hard break there's not much point talking about
> it.

I think this is the most important sentence from this thread. Thank you
Nathaniel for your extremely thorough analysis of the impact on real-world
projects.

Bryan 



Re: [Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

2015-09-21 Thread Jaime Fernández del Río
We have the PyArrayObject vs PyArrayObject_fields definition in
ndarraytypes.h that is used to enforce access to the members through inline
functions rather than directly, which seems to me like the right way to
go: leave no stone unturned, hide everything and provide PyUFunc_NIN,
PyUFunc_NOUT and friends to handle those too.

On Sun, Sep 20, 2015 at 9:13 PM, Nathaniel Smith wrote:

> Hi all,
>
> Here's a first draft NEP for comments.
>
> --
>
> Synopsis
> ========
>
> Improving numpy's dtype system requires that ufunc loops start having
> access to details of the specific dtype instance they are acting on:
> e.g. an implementation of np.equal for strings needs access to the
> dtype object in order to know what "n" to pass to strncmp. Similar
> issues arise with variable length strings, missing values, categorical
> data, unit support, datetime with timezone support, etc. -- this is a
> major blocker for improving numpy.
>
> Unfortunately, the current ufunc inner loop function signature makes
> it very difficult to provide this information. We might be able to
> wedge it in there, but it'd be ugly.
>
> The other option would be to change the signature. What would happen
> if we did this? For most common uses of the C API/ABI, we could do
> this easily while maintaining backwards compatibility. But there are
> also some rarely-used parts of the API/ABI that would be
> prohibitively difficult to preserve.
>
> In addition, there are other potential changes to ufuncs on the
> horizon (e.g. extensions of gufuncs to allow them to be used more
> generally), and the current API exposure is so massive that any such
> changes will be difficult to make in a fully compatible way. This NEP
> thus considers the possibility of closing down the ufunc API to a
> minimal, maintainable subset of the current API.
>
> To better understand the consequences of this potential change, I
> performed an exhaustive analysis of all the code on Github, Bitbucket,
> and Fedora, among others. The results make me highly confident that of
> all the publicly available projects in the world, the only ones
> which touch the problematic parts of the ufunc API are: Numba,
> dynd-python, and `gulinalg `_
> (with the latter's exposure being trivial).
>
> Given this, I propose that for 1.11 we:
> 1) go ahead and hide/disable the problematic parts of the ABI/API,
> 2) coordinate with the known affected projects to minimize disruption
> to their users (which is made easier since they are all projects that
> are almost exclusively distributed via conda, which enforces strict
> NumPy ABI versioning),
> 3) publicize these changes widely so as to give any private code that
> might be affected a chance to speak up or adapt, and
> 4) leave the "ABI version tag" as it is, so as not to force rebuilds
> of the vast majority of projects that will be unaffected by these
> changes.
>
> This NEP defers the question of exactly what the improved API should
> be, since there's no point in trying to nail down the details until
> we've decided whether it's even possible to change.
>
>
> Details
> =======
>
> The problem
> -----------
>
> Currently, a ufunc inner loop implementation is called via the
> following function prototype::
>
>     typedef void (*PyUFuncGenericFunction)
>         (char **args,
>          npy_intp *dimensions,
>          npy_intp *strides,
>          void *innerloopdata);
>
> Here ``args`` is an array of pointers to 1-d buffers of input/output
> data, ``dimensions`` is a pointer to the number of entries in these
> buffers, ``strides`` is an array of integers giving the strides for
> each input/output array, and ``innerloopdata`` is an arbitrary void*
> supplied by whoever registered the ufunc loop. (For gufuncs, extra
> shape and stride information about the core dimensions also gets
> packed into the ends of these arrays in a somewhat complicated way.)
>
> There are 4 key items that define a NumPy array: data, shape, strides,
> dtype. Notice that this function only gets access to 3 of them. Our
> goal is to fix that. For example, a better signature would be::
>
>     typedef void (*PyUFuncGenericFunction_NEW)
>         (char **data,
>          npy_intp *shapes,
>          npy_intp *strides,
>          PyArray_Descr *dtypes,   /* NEW */
>          void *innerloopdata);
>
> (In practice I suspect we might want to make some more changes as
> well, like upgrading gufunc core shape/strides to proper arguments
> instead of tacking it onto the existing arrays, and adding an "escape
> valve" void* reserved for future extensions. But working out such
> details is outside the scope of this NEP; the above will do for
> illustration.)
>
> The goal of this NEP is to clear the ground so that we can start
> supporting ufunc inner loops that take dtype arguments, and make other
> enhancements to ufunc functionality going 

Re: [Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

2015-09-21 Thread Antoine Pitrou

Hi Nathaniel,

On Sun, 20 Sep 2015 21:13:30 -0700
Nathaniel Smith wrote:
> Given this, I propose that for 1.11 we:
> 1) go ahead and hide/disable the problematic parts of the ABI/API,
> 2) coordinate with the known affected projects to minimize disruption
> to their users (which is made easier since they are all projects that
> are almost exclusively distributed via conda, which enforces strict
> NumPy ABI versioning),
> 3) publicize these changes widely so as to give any private code that
> might be affected a chance to speak up or adapt, and
> 4) leave the "ABI version tag" as it is, so as not to force rebuilds
> of the vast majority of projects that will be unaffected by these
> changes.

Thanks for a detailed and clear explanation of the proposed changes.
As far as Numba is concerned, the changes are fine with us as long as
NumPy provides APIs to do what we want.

Regards

Antoine.




[Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

2015-09-20 Thread Nathaniel Smith
Hi all,

Here's a first draft NEP for comments.

--

Synopsis
========

Improving numpy's dtype system requires that ufunc loops start having
access to details of the specific dtype instance they are acting on:
e.g. an implementation of np.equal for strings needs access to the
dtype object in order to know what "n" to pass to strncmp. Similar
issues arise with variable length strings, missing values, categorical
data, unit support, datetime with timezone support, etc. -- this is a
major blocker for improving numpy.

Unfortunately, the current ufunc inner loop function signature makes
it very difficult to provide this information. We might be able to
wedge it in there, but it'd be ugly.

The other option would be to change the signature. What would happen
if we did this? For most common uses of the C API/ABI, we could do
this easily while maintaining backwards compatibility. But there are
also some rarely-used parts of the API/ABI that would be
prohibitively difficult to preserve.

In addition, there are other potential changes to ufuncs on the
horizon (e.g. extensions of gufuncs to allow them to be used more
generally), and the current API exposure is so massive that any such
changes will be difficult to make in a fully compatible way. This NEP
thus considers the possibility of closing down the ufunc API to a
minimal, maintainable subset of the current API.

To better understand the consequences of this potential change, I
performed an exhaustive analysis of all the code on GitHub, Bitbucket,
and Fedora, among others. The results make me highly confident that of
all the publicly available projects in the world, the only ones
which touch the problematic parts of the ufunc API are: Numba,
dynd-python, and `gulinalg `_
(with the latter's exposure being trivial).

Given this, I propose that for 1.11 we:
1) go ahead and hide/disable the problematic parts of the ABI/API,
2) coordinate with the known affected projects to minimize disruption
to their users (which is made easier since they are all projects that
are almost exclusively distributed via conda, which enforces strict
NumPy ABI versioning),
3) publicize these changes widely so as to give any private code that
might be affected a chance to speak up or adapt, and
4) leave the "ABI version tag" as it is, so as not to force rebuilds
of the vast majority of projects that will be unaffected by these
changes.

This NEP defers the question of exactly what the improved API should
be, since there's no point in trying to nail down the details until
we've decided whether it's even possible to change.


Details
=======

The problem
-----------

Currently, a ufunc inner loop implementation is called via the
following function prototype::

    typedef void (*PyUFuncGenericFunction)
        (char **args,
         npy_intp *dimensions,
         npy_intp *strides,
         void *innerloopdata);

Here ``args`` is an array of pointers to 1-d buffers of input/output
data, ``dimensions`` is a pointer to the number of entries in these
buffers, ``strides`` is an array of integers giving the strides for
each input/output array, and ``innerloopdata`` is an arbitrary void*
supplied by whoever registered the ufunc loop. (For gufuncs, extra
shape and stride information about the core dimensions also gets
packed into the ends of these arrays in a somewhat complicated way.)
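
To make the calling convention concrete, here is a minimal sketch of an
inner loop written against the current prototype, adding two arrays of
doubles. It is an illustration rather than NumPy's own code: ``npy_intp``
is redefined locally as a stand-in so the snippet compiles without the
NumPy headers, ``double_add_loop`` is a made-up name, and the strides are
byte offsets between consecutive elements.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for NumPy's npy_intp (assumption: real code would get this
   type from the NumPy headers instead of defining it here). */
typedef ptrdiff_t npy_intp;

/* Hypothetical inner loop using the current four-argument signature.
   Computes out[i] = in1[i] + in2[i] for doubles. */
static void double_add_loop(char **args, npy_intp *dimensions,
                            npy_intp *strides, void *innerloopdata)
{
    char *in1 = args[0];            /* first input buffer  */
    char *in2 = args[1];            /* second input buffer */
    char *out = args[2];            /* output buffer       */
    npy_intp n = dimensions[0];     /* number of elements to process */
    npy_intp i;

    (void)innerloopdata;            /* not needed by this loop */
    assert(n >= 0);

    for (i = 0; i < n; i++) {
        *(double *)out = *(const double *)in1 + *(const double *)in2;
        /* Strides are in bytes, so advance the char pointers directly. */
        in1 += strides[0];
        in2 += strides[1];
        out += strides[2];
    }
}
```

Note that nothing in the argument list tells the loop which dtype it is
operating on; the element type (here double) is baked into the function
body.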

There are 4 key items that define a NumPy array: data, shape, strides,
dtype. Notice that this function only gets access to 3 of them. Our
goal is to fix that. For example, a better signature would be::

    typedef void (*PyUFuncGenericFunction_NEW)
        (char **data,
         npy_intp *shapes,
         npy_intp *strides,
         PyArray_Descr *dtypes,   /* NEW */
         void *innerloopdata);

(In practice I suspect we might want to make some more changes as
well, like upgrading gufunc core shape/strides to proper arguments
instead of tacking it onto the existing arrays, and adding an "escape
valve" void* reserved for future extensions. But working out such
details is outside the scope of this NEP; the above will do for
illustration.)
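
As a rough sketch of what the extra argument would make possible, the
loop below compares fixed-width byte strings and reads the width from
the descriptor it is handed, which is the "n" for strncmp that a loop
with the current signature cannot get at. Everything here is
hypothetical: ``sketch_descr`` is a one-field stand-in for PyArray_Descr,
``string_equal_loop`` is an invented name, both inputs are assumed to
share one width, and how descriptors would really be passed is left open
by this NEP.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for NumPy's npy_intp (assumption, see lead-in). */
typedef ptrdiff_t npy_intp;

/* Hypothetical stand-in for PyArray_Descr: only the element size
   in bytes is modeled. */
typedef struct {
    int elsize;
} sketch_descr;

/* Hypothetical loop in the spirit of PyUFuncGenericFunction_NEW.
   Writes 1 to out[i] if the two fixed-width strings are equal, else 0. */
static void string_equal_loop(char **data, npy_intp *shapes,
                              npy_intp *strides, sketch_descr *dtypes,
                              void *innerloopdata)
{
    char *in1 = data[0];
    char *in2 = data[1];
    char *out = data[2];
    npy_intp n = shapes[0];
    size_t width = (size_t)dtypes[0].elsize;  /* the "n" for strncmp */
    npy_intp i;

    (void)innerloopdata;
    /* Simplifying assumption: both inputs have the same fixed width. */
    assert(dtypes[0].elsize == dtypes[1].elsize);

    for (i = 0; i < n; i++) {
        /* Fields are not necessarily NUL-terminated, so the comparison
           must be bounded by the width taken from the dtype. */
        *(unsigned char *)out = (unsigned char)(strncmp(in1, in2, width) == 0);
        in1 += strides[0];
        in2 += strides[1];
        out += strides[2];
    }
}
```

The same function body would serve any string width, because the width
arrives with each call instead of being fixed when the loop is compiled
or registered.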

The goal of this NEP is to clear the ground so that we can start
supporting ufunc inner loops that take dtype arguments, and make other
enhancements to ufunc functionality going forward.


Proposal


Currently, the public API/ABI for ufuncs consists of the functions::

    PyUFunc_GenericFunction

    PyUFunc_FromFuncAndData
    PyUFunc_FromFuncAndDataAndSignature
    PyUFunc_RegisterLoopForDescr
    PyUFunc_RegisterLoopForType

    PyUFunc_ReplaceLoopBySignature
    PyUFunc_SetUsesArraysAsData

together with direct access to PyUFuncObject's internal fields::

typedef struct {
PyObject_HEAD
int nin, nout, nargs;
int identity;
PyUFuncGenericFunction *functions;
void **data;
int ntypes;
int