Re: [Numpy-discussion] PR added: frozen dimensions in gufunc signatures

2015-06-23 Thread Oscar Villellas
On Fri, Aug 29, 2014 at 10:55 AM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 On Thu, Aug 28, 2014 at 5:40 PM, Nathaniel Smith n...@pobox.com wrote:

 Some thoughts:

 But, for your computed dimension idea I'm wondering if what we should
 do instead is just let a gufunc provide a C callback that looks at the
 input array dimensions and explicitly says somehow which dimensions it
 wants to treat as the core dimensions and what its output shapes will
 be. There's no rule that we have to extend the signature mini-language
 to be Turing complete, we can just use C :-).

 It would be good to have a better motivation for computed gufunc
 dimensions, though. Your all pairwise cross products example would
 be *much* better handled by implementing the .outer method for binary
 gufuncs: pairwise_cross(a) == cross.outer(a, a). This would make
 gufuncs more consistent with ufuncs, plus let you do
 all-pairwise-cross-products between two different sets of cross
 products, plus give us all-pairwise-matrix-products for free, etc.


 The outer for binary gufuncs sounds like a good idea. A reduce for binary
 gufuncs that allow it (like square matrix multiplication) would also be
 nice. But going back to the original question, the pairwise whatevers were
 just an example: one could come up with several others, e.g.:

 (m),(n)->($p),($q) with $p = m - n and $q = n - 1, could be (I think)
 the signature of a polynomial division gufunc
 (m),(n)->($p), with $p = m - n + 1, could be the signature of a
 convolution or correlation gufunc
 (m)->($n), with $n = m / 2, could be some form of downsampling gufunc


An example where a computed output dimension would be useful is with
linalg.svd, as some resulting dimensions for a matrix (m, n) are based on
min(m, n). This, coupled with the required keyword support makes it
necessary to have 6 gufuncs to support the functionality.
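
To make the min(m, n) dependence concrete, here is a small check against the existing `np.linalg.svd`. The helper name `svd_core_shapes` is made up for this illustration and is not part of NumPy; it only spells out the output core shapes a gufunc would have to produce.

```python
import numpy as np

def svd_core_shapes(m, n, full_matrices=True):
    """Expected shapes of (u, s, vh) for an (m, n) input matrix."""
    k = min(m, n)  # the computed dimension
    if full_matrices:
        return (m, m), (k,), (n, n)
    return (m, k), (k,), (k, n)

rng = np.random.RandomState(0)
for m, n in [(5, 3), (3, 5)]:
    a = rng.rand(m, n)
    for full in (True, False):
        u, s, vh = np.linalg.svd(a, full_matrices=full)
        # The shapes returned by NumPy match the min(m, n) rule above.
        assert (u.shape, s.shape, vh.shape) == svd_core_shapes(m, n, full)
```

Neither the (k,) output nor the reduced (m, k) and (k, n) outputs can be written with only the dimension names that appear in the inputs.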

I do think that the C callback solution would be enough, and just allow the
signature to have unbound variables that can be resolved by that
callback... no need to change the syntax:

(m),(n)->(p),(q)

When registering such a gufunc, a callback function that resolves the
missing dimensions would be required.
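
A rough Python model of what I mean, only to illustrate the idea (the real thing would be a C callback, and none of these names exist in NumPy): dimensions that appear in the inputs are bound as usual, and whatever is left unbound in the outputs is handed to the resolver.

```python
import re

def parse_signature(signature):
    """Split e.g. '(m,n)->(m,k),(k)' into input and output dimension names."""
    inputs, outputs = signature.split("->")

    def parse(side):
        return [tuple(name.strip() for name in group.split(",") if name.strip())
                for group in re.findall(r"\(([^)]*)\)", side)]

    return parse(inputs), parse(outputs)

def resolve_output_shapes(signature, input_core_shapes, resolver):
    """Bind input dimensions, then ask `resolver` for the unbound ones."""
    in_names, out_names = parse_signature(signature)
    bound = {}
    for names, shape in zip(in_names, input_core_shapes):
        if len(names) != len(shape):
            raise ValueError("wrong number of core dimensions")
        for name, size in zip(names, shape):
            if bound.setdefault(name, size) != size:
                raise ValueError("mismatch in core dimension %r" % name)
    unbound = {name for names in out_names for name in names} - set(bound)
    if unbound:
        extra = resolver(dict(bound))  # hypothetical user-supplied callback
        missing = unbound - set(extra)
        if missing:
            raise ValueError("unresolved dimensions: %s" % sorted(missing))
        bound.update((name, extra[name]) for name in unbound)
    return [tuple(bound[name] for name in names) for names in out_names]

def reduced_svd_dims(dims):
    """Resolver for a reduced svd: k = min(m, n)."""
    return {"k": min(dims["m"], dims["n"])}

shapes = resolve_output_shapes("(m,n)->(m,k),(k),(k,n)", [(5, 3)],
                               reduced_svd_dims)
assert shapes == [(5, 3), (3,), (3, 3)]
```

The signature syntax stays exactly as it is today; the only new ingredient is the callback.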

Extra niceties that could be built on top of that:
- pass keyword arguments to that function so that stuff like full_matrices
could be resolved inside the gufunc. Maybe even allow it to modify the
number of results (harder), which would be needed to support stuff like
compute_uv in svd as well.

- allow a context to be created in that resolution step that gets passed into
the ufunc kernel itself (note that this might be *necessary*). If a context is
created, another function would be needed to dispose of it.


In my experience implementing the linalg gufuncs, a very common pattern
was needing some buffers for the actual LAPACK calls (as those functions
work in place, a temporary buffer was always needed). Some setup and buffer
allocation is performed before looping, every iteration in the inner loop
reuses that data, and at the end of the loop the buffers are released. That
means the initialization/allocation/release is done once per inner loop
call. If the hooks to allocate/dispose of the context existed, that
initialization/allocation/release could be done once per ufunc call. AFAIK,
a ufunc call can involve several inner loop calls depending on outer
dimensions and layout of the operands.
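
A toy pure-Python model of the difference, just to show the counting (all names are invented for the illustration, and the list sort stands in for an in-place LAPACK routine):

```python
stats = {"allocations": 0}

def allocate_workspace(n):
    """Stand-in for allocating a LAPACK scratch buffer of size n."""
    stats["allocations"] += 1
    return [0.0] * n

def inner_loop(rows, workspace=None):
    """Process one chunk of rows; allocate scratch space if none was given."""
    if workspace is None:
        workspace = allocate_workspace(len(rows[0]))
    out = []
    for row in rows:
        workspace[:] = row   # copy, because the "kernel" works in place
        workspace.sort()     # stand-in for the in-place routine
        out.append(list(workspace))
    return out

def call_per_inner_loop(chunks):
    """Current situation: each inner loop call sets up its own buffer."""
    return [inner_loop(chunk) for chunk in chunks]

def call_with_context(chunks, n):
    """With hooks: allocate once per ufunc call, dispose at the end."""
    workspace = allocate_workspace(n)   # hypothetical "allocate" hook
    try:
        return [inner_loop(chunk, workspace) for chunk in chunks]
    finally:
        workspace = None                # hypothetical "dispose" hook

# One ufunc call that the iterator split into two inner loop calls.
chunks = [[[3.0, 1.0, 2.0]], [[9.0, 8.0, 7.0], [6.0, 5.0, 4.0]]]

stats["allocations"] = 0
per_loop = call_per_inner_loop(chunks)
allocations_per_loop = stats["allocations"]      # one per inner loop call

stats["allocations"] = 0
with_context = call_with_context(chunks, 3)
allocations_with_context = stats["allocations"]  # one per ufunc call

assert per_loop == with_context
```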


 While you're messing around with the gufunc dimension matching logic,
 any chance we can tempt you to implement the optional dimensions
 needed to handle '@', solve, etc. elegantly? The rule would be that
 you can write something like
    (n?,k),(k,m?)->(n?,m?)
 and the ? dimensions are allowed to take on an additional value
 "nothing at all". If there's no dimension available in the input, then
 we act like it was reshaped to add a dimension with shape 1, and then
 in the output we squeeze this dimension out again. I guess the rules
 would be that (1) in the input, you can have ? dimensions at the
 beginning or the end of your shape, but not both at the same time, (2)
 any dimension that has a ? in one place must have it in all places,
 (3) when checking argument conformity, "nothing at all" only matches
 against "nothing at all", not against 1; this is because if we allowed
 (n?,m),(n?,m)->(n?,m) to be applied to two arrays with shapes (5,) and
 (1, 5), then it would be ambiguous whether the output should have
 shape (5,) or (1, 5).


 I definitely do not mind taking a look into it. I need to think a little
 more about the rules to convince myself that there is a consistent set of
 them that we can use. I also thought there may be a performance concern,
 that you may want to have different implementations when dimensions are
 missing, not automatically add a 1 and then remove it. That doesn't seem to
 be the case with either `np.dot` or `np.linalg.solve`, so maybe I am being
 overly cautious.

 Thanks for your comments and ideas. I have a feeling there are some nice
 features hidden in here, but I can't seem to figure out what they should be
 on my own.

Re: [Numpy-discussion] PR added: frozen dimensions in gufunc signatures

2015-06-22 Thread Ian Henriksen
On Fri, Aug 29, 2014 at 2:55 AM Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 On Thu, Aug 28, 2014 at 5:40 PM, Nathaniel Smith n...@pobox.com wrote:

 Some thoughts:


 But, for your computed dimension idea I'm wondering if what we should
 do instead is just let a gufunc provide a C callback that looks at the
 input array dimensions and explicitly says somehow which dimensions it
 wants to treat as the core dimensions and what its output shapes will
 be. There's no rule that we have to extend the signature mini-language
 to be Turing complete, we can just use C :-).

 It would be good to have a better motivation for computed gufunc
 dimensions, though. Your all pairwise cross products example would
 be *much* better handled by implementing the .outer method for binary
 gufuncs: pairwise_cross(a) == cross.outer(a, a). This would make
 gufuncs more consistent with ufuncs, plus let you do
 all-pairwise-cross-products between two different sets of cross
 products, plus give us all-pairwise-matrix-products for free, etc.


 The outer for binary gufuncs sounds like a good idea. A reduce for binary
 gufuncs that allow it (like square matrix multiplication) would also be
 nice. But going back to the original question, the pairwise whatevers were
 just an example: one could come up with several others, e.g.:

 (m),(n)->($p),($q) with $p = m - n and $q = n - 1, could be (I think)
 the signature of a polynomial division gufunc
 (m),(n)->($p), with $p = m - n + 1, could be the signature of a
 convolution or correlation gufunc
 (m)->($n), with $n = m / 2, could be some form of downsampling gufunc


 While you're messing around with the gufunc dimension matching logic,
 any chance we can tempt you to implement the optional dimensions
 needed to handle '@', solve, etc. elegantly? The rule would be that
 you can write something like
    (n?,k),(k,m?)->(n?,m?)
 and the ? dimensions are allowed to take on an additional value
 "nothing at all". If there's no dimension available in the input, then
 we act like it was reshaped to add a dimension with shape 1, and then
 in the output we squeeze this dimension out again. I guess the rules
 would be that (1) in the input, you can have ? dimensions at the
 beginning or the end of your shape, but not both at the same time, (2)
 any dimension that has a ? in one place must have it in all places,
 (3) when checking argument conformity, "nothing at all" only matches
 against "nothing at all", not against 1; this is because if we allowed
 (n?,m),(n?,m)->(n?,m) to be applied to two arrays with shapes (5,) and
 (1, 5), then it would be ambiguous whether the output should have
 shape (5,) or (1, 5).


 I definitely do not mind taking a look into it. I need to think a little
 more about the rules to convince myself that there is a consistent set of
 them that we can use. I also thought there may be a performance concern,
 that you may want to have different implementations when dimensions are
 missing, not automatically add a 1 and then remove it. That doesn't seem to
 be the case with either `np.dot` or `np.linalg.solve`, so maybe I am being
 overly cautious.

 Thanks for your comments and ideas. I have a feeling there are some nice
 features hidden in here, but I can't seem to figure out what they should be
 on my own.

 Jaime

 --
 (\__/)
 ( O.o)
 (  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
 de dominación mundial.
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


I'm not sure where this is at, given the current amount of work that is
coming from the 1.10 release, but this sounds like a really great idea.
Computed/fixed dimensions would allow gufuncs for things like:
- polynomial multiplication, division, differentiation, and integration
- convolutions
- views of different types (see the corresponding discussion at
http://permalink.gmane.org/gmane.comp.python.numeric.general/59847).
Some of these examples would work better with gufuncs that can construct
views and have an axes keyword, but this is exactly the kind of
functionality that would be really great to have.
Thanks for the great work!
-Ian Henriksen


Re: [Numpy-discussion] PR added: frozen dimensions in gufunc signatures

2014-08-29 Thread Jaime Fernández del Río
On Thu, Aug 28, 2014 at 5:40 PM, Nathaniel Smith n...@pobox.com wrote:

 Some thoughts:

 But, for your computed dimension idea I'm wondering if what we should
 do instead is just let a gufunc provide a C callback that looks at the
 input array dimensions and explicitly says somehow which dimensions it
 wants to treat as the core dimensions and what its output shapes will
 be. There's no rule that we have to extend the signature mini-language
 to be Turing complete, we can just use C :-).

 It would be good to have a better motivation for computed gufunc
 dimensions, though. Your all pairwise cross products example would
 be *much* better handled by implementing the .outer method for binary
 gufuncs: pairwise_cross(a) == cross.outer(a, a). This would make
 gufuncs more consistent with ufuncs, plus let you do
 all-pairwise-cross-products between two different sets of cross
 products, plus give us all-pairwise-matrix-products for free, etc.


The outer for binary gufuncs sounds like a good idea. A reduce for binary
gufuncs that allow it (like square matrix multiplication) would also be
nice. But going back to the original question, the pairwise whatevers were
just an example: one could come up with several others, e.g.:

(m),(n)->($p),($q) with $p = m - n and $q = n - 1, could be (I think)
the signature of a polynomial division gufunc
(m),(n)->($p), with $p = m - n + 1, could be the signature of a
convolution or correlation gufunc
(m)->($n), with $n = m / 2, could be some form of downsampling gufunc
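
A quick sanity check of the last two rules with functions that exist today. This is only a sketch: it uses plain `np.convolve` and slicing rather than gufuncs, and it assumes m >= n and an even m.

```python
import numpy as np

m, n = 10, 4
a = np.arange(m, dtype=float)
v = np.ones(n)

# 'valid' convolution: output length is m - n + 1 (for m >= n).
valid = np.convolve(a, v, mode="valid")
assert valid.shape == (m - n + 1,)

# Keeping every second sample: output length is m / 2 for even m.
down = a[::2]
assert down.shape == (m // 2,)
```

In both cases the output length is a function of the input core dimensions only, which is exactly what a computed dimension would have to express.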


 While you're messing around with the gufunc dimension matching logic,
 any chance we can tempt you to implement the optional dimensions
 needed to handle '@', solve, etc. elegantly? The rule would be that
 you can write something like
    (n?,k),(k,m?)->(n?,m?)
 and the ? dimensions are allowed to take on an additional value
 "nothing at all". If there's no dimension available in the input, then
 we act like it was reshaped to add a dimension with shape 1, and then
 in the output we squeeze this dimension out again. I guess the rules
 would be that (1) in the input, you can have ? dimensions at the
 beginning or the end of your shape, but not both at the same time, (2)
 any dimension that has a ? in one place must have it in all places,
 (3) when checking argument conformity, "nothing at all" only matches
 against "nothing at all", not against 1; this is because if we allowed
 (n?,m),(n?,m)->(n?,m) to be applied to two arrays with shapes (5,) and
 (1, 5), then it would be ambiguous whether the output should have
 shape (5,) or (1, 5).


I definitely do not mind taking a look into it. I need to think a little
more about the rules to convince myself that there is a consistent set of
them that we can use. I also thought there may be a performance concern,
that you may want to have different implementations when dimensions are
missing, not automatically add a 1 and then remove it. That doesn't seem to
be the case with either `np.dot` or `np.linalg.solve`, so maybe I am being
overly cautious.

Thanks for your comments and ideas. I have a feeling there are some nice
features hidden in here, but I can't seem to figure out what they should be
on my own.

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.


[Numpy-discussion] PR added: frozen dimensions in gufunc signatures

2014-08-28 Thread Jaime Fernández del Río
Hi,

I have just sent a PR (https://github.com/numpy/numpy/pull/5015), adding
the possibility of having frozen dimensions in gufunc signatures. As a
proof of concept, I have added a `cross1d` gufunc to
`numpy.core.umath_tests`:

In [1]: import numpy as np
In [2]: from numpy.core.umath_tests import cross1d

In [3]: cross1d.signature
Out[3]: '(3),(3)->(3)'

In [4]: a = np.random.rand(1000, 3)
In [5]: b = np.random.rand(1000, 3)

In [6]: np.allclose(np.cross(a, b), cross1d(a, b))
Out[6]: True

In [7]: %timeit np.cross(a, b)
1 loops, best of 3: 76.1 us per loop

In [8]: %timeit cross1d(a, b)
10 loops, best of 3: 13.1 us per loop

In [9]: c = np.random.rand(1000, 2)
In [10]: d = np.random.rand(1000, 2)

In [11]: cross1d(c, d)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-72c66212e40c> in <module>()
----> 1 cross1d(c, d)

ValueError: cross1d: Operand 0 has a mismatch in its core dimension 0, with
gufunc signature (3),(3)->(3) (size 2 is different from 3)

The speed up over `np.cross` is nice, and while `np.cross` is not the best
of examples, as it needs to handle more sizes, in many cases this will
allow producing gufuncs that work without a Python wrapper redoing checks
that are best left to the iterator, such as dimension sizes.
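
For comparison, this is roughly the kind of Python wrapper that a frozen dimension makes unnecessary. It is only a sketch: `cross1d_checked` is a made-up name, the error text is not the one the PR produces, and it delegates to `np.cross` instead of a dedicated inner loop.

```python
import numpy as np

def cross1d_checked(a, b):
    """Cross product of 3-vectors, with the size check done in Python."""
    a = np.asarray(a)
    b = np.asarray(b)
    for i, operand in enumerate((a, b)):
        # This is the check a '(3),(3)->(3)' signature would hand over
        # to the iterator.
        if operand.ndim < 1 or operand.shape[-1] != 3:
            raise ValueError(
                "operand %d must have a last dimension of size 3, got shape %r"
                % (i, operand.shape))
    return np.cross(a, b)

x = np.random.rand(1000, 3)
y = np.random.rand(1000, 3)
assert cross1d_checked(x, y).shape == (1000, 3)
```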

It still needs tests, but before embarking on fully developing those, I
wanted to make sure that there is interest in this.

I would also like to further enhance gufuncs by providing computed dimensions,
e.g. making it possible to define `pairwise_cross` with signature '(n,
3)->($m, 3)', where the $ indicates that m is a computed dimension that
would have to be calculated, based on the other core dimensions, by a
function passed to the gufunc constructor and stored in the gufunc object. In
this case it would make $m be n*(n-1), so that all pairwise cross products
between 3D vectors could be computed.
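
In plain NumPy the intended result would look something like the following. This is a sketch under the assumption that "all pairwise" means every ordered pair (i, j) with i != j, and it only handles a single (n, 3) array with no loop dimensions.

```python
import numpy as np

def pairwise_cross(a):
    """Cross products of all ordered pairs of distinct rows of an (n, 3) array."""
    a = np.asarray(a)
    n = a.shape[0]
    # Index pairs (i, j) with i != j, in row-major order: n * (n - 1) of them.
    i, j = np.nonzero(~np.eye(n, dtype=bool))
    return np.cross(a[i], a[j])

vectors = np.random.rand(4, 3)
result = pairwise_cross(vectors)
assert result.shape == (4 * 3, 3)  # $m = n * (n - 1)
```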

The syntax with '$' is kind of crappy, so any suggestions on how to better
express this in the signature are more than welcome, as well as any
feedback on the merits (or lack of them) of implementing this.

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.


Re: [Numpy-discussion] PR added: frozen dimensions in gufunc signatures

2014-08-28 Thread Nathaniel Smith
On Fri, Aug 29, 2014 at 1:14 AM, Jaime Fernández del Río
jaime.f...@gmail.com wrote:
 Hi,

 I have just sent a PR (https://github.com/numpy/numpy/pull/5015), adding the
 possibility of having frozen dimensions in gufunc signatures. As a proof of
 concept, I have added a `cross1d` gufunc to `numpy.core.umath_tests`:

 In [1]: import numpy as np
 In [2]: from numpy.core.umath_tests import cross1d

 In [3]: cross1d.signature
 Out[3]: '(3),(3)->(3)'

 In [4]: a = np.random.rand(1000, 3)
 In [5]: b = np.random.rand(1000, 3)

 In [6]: np.allclose(np.cross(a, b), cross1d(a, b))
 Out[6]: True

 In [7]: %timeit np.cross(a, b)
 1 loops, best of 3: 76.1 us per loop

 In [8]: %timeit cross1d(a, b)
 10 loops, best of 3: 13.1 us per loop

 In [9]: c = np.random.rand(1000, 2)
 In [10]: d = np.random.rand(1000, 2)

 In [11]: cross1d(c, d)
 ---------------------------------------------------------------------------
 ValueError                                Traceback (most recent call last)
 <ipython-input-11-72c66212e40c> in <module>()
 ----> 1 cross1d(c, d)

 ValueError: cross1d: Operand 0 has a mismatch in its core dimension 0, with
 gufunc signature (3),(3)->(3) (size 2 is different from 3)

 The speed up over `np.cross` is nice, and while `np.cross` is not the best
 of examples, as it needs to handle more sizes, in many cases this will allow
 producing gufuncs that work without a Python wrapper redoing checks that are
 best left to the iterator, such as dimension sizes.

 It still needs tests, but before embarking on fully developing those, I
 wanted to make sure that there is interest in this.

 I would also like to further enhance gufuncs by providing computed dimensions,
 e.g. making it possible to define `pairwise_cross` with signature '(n,
 3)->($m, 3)', where the $ indicates that m is a computed dimension that
 would have to be calculated, based on the other core dimensions, by a
 function passed to the gufunc constructor and stored in the gufunc object. In
 this case it would make $m be n*(n-1), so that all pairwise cross products
 between 3D vectors could be computed.

 The syntax with '$' is kind of crappy, so any suggestions on how to better
 express this in the signature are more than welcome, as well as any feedback
 on the merits (or lack of them) of implementing this.

Some thoughts:

When I first saw the PR my first reaction was that maybe we should be
allowing more general hooks for a gufunc to choose its core
dimensions. Reading the code convinced me that this is a relatively
minimal enhancement over what we're currently doing, so your current
PR looks fine to me.

But, for your computed dimension idea I'm wondering if what we should
do instead is just let a gufunc provide a C callback that looks at the
input array dimensions and explicitly says somehow which dimensions it
wants to treat as the core dimensions and what its output shapes will
be. There's no rule that we have to extend the signature mini-language
to be Turing complete, we can just use C :-).

It would be good to have a better motivation for computed gufunc
dimensions, though. Your all pairwise cross products example would
be *much* better handled by implementing the .outer method for binary
gufuncs: pairwise_cross(a) == cross.outer(a, a). This would make
gufuncs more consistent with ufuncs, plus let you do
all-pairwise-cross-products between two different sets of cross
products, plus give us all-pairwise-matrix-products for free, etc.
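
Until gufuncs grow an .outer method, the effect can be emulated by hand with broadcasting. A sketch (`cross_outer` is just an illustrative name, and it only covers two 2-D stacks of 3-vectors):

```python
import numpy as np

def cross_outer(a, b):
    """Emulate cross.outer(a, b) for a of shape (na, 3) and b of shape (nb, 3)."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Insert loop dimensions so every a[i] is paired with every b[j].
    return np.cross(a[:, np.newaxis, :], b[np.newaxis, :, :])

p = np.random.rand(4, 3)
q = np.random.rand(5, 3)
out = cross_outer(p, q)
assert out.shape == (4, 5, 3)
assert np.allclose(out[1, 2], np.cross(p[1], q[2]))
```

Note that cross_outer(a, a) keeps the loop structure as (n, n, 3) instead of flattening it to n*(n-1) rows, and the diagonal entries are zero vectors.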

While you're messing around with the gufunc dimension matching logic,
any chance we can tempt you to implement the optional dimensions
needed to handle '@', solve, etc. elegantly? The rule would be that
you can write something like
   (n?,k),(k,m?)->(n?,m?)
and the ? dimensions are allowed to take on an additional value
"nothing at all". If there's no dimension available in the input, then
we act like it was reshaped to add a dimension with shape 1, and then
in the output we squeeze this dimension out again. I guess the rules
would be that (1) in the input, you can have ? dimensions at the
beginning or the end of your shape, but not both at the same time, (2)
any dimension that has a ? in one place must have it in all places,
(3) when checking argument conformity, "nothing at all" only matches
against "nothing at all", not against 1; this is because if we allowed
(n?,m),(n?,m)->(n?,m) to be applied to two arrays with shapes (5,) and
(1, 5), then it would be ambiguous whether the output should have
shape (5,) or (1, 5).
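
Written out by hand for the matrix product case, the add-a-1-then-squeeze behaviour would be something like this. It is a sketch of the semantics only, not of how the dimension matching code would do it; `matmul_optional` is an invented name.

```python
import numpy as np

def matmul_optional(a, b):
    """Hand-written model of '(n?,k),(k,m?)->(n?,m?)'."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.ndim == 0 or b.ndim == 0:
        raise ValueError("operands need at least one dimension")
    drop_n = a.ndim == 1   # n? is "nothing at all" in the first operand
    drop_m = b.ndim == 1   # m? is "nothing at all" in the second operand
    if drop_n:
        a = a[np.newaxis, :]   # act as if reshaped to (1, k)
    if drop_m:
        b = b[:, np.newaxis]   # act as if reshaped to (k, 1)
    out = np.matmul(a, b)      # core shape (n, m)
    if drop_m:
        out = np.squeeze(out, axis=-1)
    if drop_n:
        # n is the last axis if m was already removed, else second to last.
        out = np.squeeze(out, axis=-1 if drop_m else -2)
    return out

vec = np.arange(3.0)
mat = np.arange(12.0).reshape(3, 4)
assert matmul_optional(vec, mat).shape == (4,)
assert matmul_optional(mat.T, vec).shape == (4,)
assert matmul_optional(vec, vec).shape == ()
```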

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org