subject:"Re\: \[Numpy\-discussion\] Proposed new feature for numpy.einsum\: repeated output subscripts as diagonal"

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

2014-08-20 Thread Pierre-Andre Noel


Thanks all for the feedback!

So there appears to be interest for this feature, and I think that I can 
implement it. However, it may take a while before I do so: I have other 
priorities right now.


In view of jaimefrio's comment on 
https://github.com/numpy/numpy/issues/4965 as well as Eelco 
Hoogendoorn's reply above, here is how I currently intend to implement 
the feature.


1. Implement a `diag_view` function that uses strides to make a view. 
The function would use subscripts in a way very similar to `einsum`, 
except that no commas are allowed and all indices appearing on one side 
of `-` must also appear on the other side. Like the current `einsum`, 
indices on the right-hand side of `-` cannot be repeated. For example, 
`B=diag_view('iij-ij',A)` returns a 2D view `B` of the 3D array `A` 
where the off-diagonal elements in the first two dimensions of `A` are 
inaccessible in `B`.


2. The edits to `einsum` itself should be minimal. For the purpose of 
the following, suppose that the indices have the form `lhs+'-'+rhs`, 
where `lhs` and `rhs` are character strings. To make sure that the 
current behavior of `einsum` is not slowed down nor broken by the new 
functionality, I intend to limit edits to the point where an error would 
be raised due to repeated indices in `rhs`. The following outlines what 
would replace the current error-raising.


2.1 Extract from `rhs` the first occurrences of each indices; call 
that `rhs_first_oc`.


2.2 If no `out` has been provided to `einsum`, allocate a zeroed 
out `ndarray` of appropriate size, including off-diagonal entries; call 
that `full_out`. If an `out` was provided to `einsum`, set `full_out=out`.


2.3 Set `diag_out=diag_view(rhs+'-'+rhs_first_oc,full_out)`.

2.4 Call `einsum(lhs+'-'+rhs_first_oc, [...], out=diag_out)`. This 
call is recursive, but the recursion should stop there.


2.5 Return `full_out`.

Note that if an `out` is provided to `einsum`, the off-diagonal entries 
are not zeroed out. This should be a documented feature of `einsum`.


A disadvantage of this approach is that the indices are parsed 2-4 
times, depending how you count. However, for large `ndarray`, the 
bottleneck won't be there anyway.


Thanks again!

Pierre-André Noël

On 08/15/2014 03:09 PM, Eelco Hoogendoorn wrote:
here is a snippet I extracted from a project with similar aims 
(integrating the functionality of einsum and numexpr, actually)


Not much to it, but in case someone needs a reminder on how to use 
striding tricks:

http://pastebin.com/kQNySjcj


On Fri, Aug 15, 2014 at 5:20 PM, Eelco Hoogendoorn 
hoogendoorn.ee...@gmail.com mailto:hoogendoorn.ee...@gmail.com wrote:


Well, there is the numpy-API C level, and then there is the arcane
macro C level. The two might as well be a completely different
language.

Indeed, it should be doing something similar for the inputs.
Actually, I think I wrote a wrapper around einsum/numexpr once
that performed this generalized indexing once... ill see if I can
dig that up.


On Fri, Aug 15, 2014 at 5:01 PM, Sebastian Berg
sebast...@sipsolutions.net mailto:sebast...@sipsolutions.net
wrote:

On Fr, 2014-08-15 at 16:42 +0200, Eelco Hoogendoorn wrote:
 Agreed; this addition occurred to me as well. Note that the
 implemenatation should be straightforward: just allocate an
enlarged
 array, use some striding logic to construct the relevant
view, and let
 einsums internals act on the view. hopefully, you wont even
have to
 touch the guts of einsum at the C level, because id say that
isn't for
 the faint of heart...


I am not sure if einsum isn't pure C :). But even if, it
should be doing
something identical already for duplicate indices on the inputs...

- Sebastian


 On Fri, Aug 15, 2014 at 3:53 PM, Sebastian Berg
 sebast...@sipsolutions.net
mailto:sebast...@sipsolutions.net wrote:
 On Do, 2014-08-14 at 12:42 -0700, Stephan Hoyer wrote:
  I think this would be very nice addition.
 
 
  On Thu, Aug 14, 2014 at 12:21 PM, Benjamin Root
 ben.r...@ou.edu mailto:ben.r...@ou.edu
  wrote:
  You had me at Kronecker delta... :-)  +1
 


 Sounds good to me. I don't see a reason for not
relaxing the
 restriction, unless there is some technical issue,
but I doubt
 that.

 - Sebastian

 
 
  On Thu, Aug 14, 2014 at 3:07 PM,
Pierre-Andre Noel
  noel.pierre.an...@gmail.com
mailto:noel.pierre.an...@gmail.com wrote:
  (I created issue 4965 earlier

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

2014-08-20 Thread Jaime Fernández del Río

On Wed, Aug 20, 2014 at 6:26 AM, Pierre-Andre Noel 
noel.pierre.an...@gmail.com wrote:

  Thanks all for the feedback!

 So there appears to be interest for this feature, and I think that I can
 implement it. However, it may take a while before I do so: I have other
 priorities right now.

 In view of jaimefrio's comment on
 https://github.com/numpy/numpy/issues/4965 as well as Eelco Hoogendoorn's
 reply above, here is how I currently intend to implement the feature.

 1. Implement a `diag_view` function that uses strides to make a view. The
 function would use subscripts in a way very similar to `einsum`, except
 that no commas are allowed and all indices appearing on one side of `-`
 must also appear on the other side. Like the current `einsum`, indices on
 the right-hand side of `-` cannot be repeated. For example,
 `B=diag_view('iij-ij',A)` returns a 2D view `B` of the 3D array `A` where
 the off-diagonal elements in the first two dimensions of `A` are
 inaccessible in `B`.

 2. The edits to `einsum` itself should be minimal. For the purpose of the
 following, suppose that the indices have the form `lhs+'-'+rhs`, where
 `lhs` and `rhs` are character strings. To make sure that the current
 behavior of `einsum` is not slowed down nor broken by the new
 functionality, I intend to limit edits to the point where an error would be
 raised due to repeated indices in `rhs`. The following outlines what would
 replace the current error-raising.

 2.1 Extract from `rhs` the first occurrences of each indices; call
 that `rhs_first_oc`.

 2.2 If no `out` has been provided to `einsum`, allocate a zeroed out
 `ndarray` of appropriate size, including off-diagonal entries; call that
 `full_out`. If an `out` was provided to `einsum`, set `full_out=out`.

 2.3 Set `diag_out=diag_view(rhs+'-'+rhs_first_oc,full_out)`.

 2.4 Call `einsum(lhs+'-'+rhs_first_oc, [...], out=diag_out)`. This
 call is recursive, but the recursion should stop there.

 2.5 Return `full_out`.


I have looked a little into this, and I think there is an additional
complication: if I understood the structure of the code correctly,
`einsum`'s current entry point is the function `array_einsum` in
`multiarraymodule.c`, which accepts two different input methods: the
subscript one we have been discussing here, and another one that uses
 lists of axes after each operand. This second method gets translated into
subscript notation by several functions in that same module:
`einsum_sub_op_from_str`,
`einsum_list_to_subscripts` and `einsum_sub_op_from_lists`, and then the C
API einsum function, `PyArray_EinsteinSum` in `einsum.c.src`, which only
understands the subscript notation, gets called.

The simplest place to implement the changes you propose without any major
rearchitecturing is therefore in `PyArray_EinsteinSum`. And while the flow
you propose seems to me be correct, doing that at the C level will probably
look somewhat different, e.g. you would probably let the iterator create an
array with all the axes, and then remove the repeated ones from the
iterator and modify the strides, instead of passing in a strided view with
fewer axes.

If you were planning on writing your code in a Python wrapper, you need to
figure out how to keep the alternative syntax code path. Haven't given it
much thought, but it doesn't look easy without rewriting a lot of stuff.

I see either solution as way too much complication for the reward. And
still see writing a function that does the opposite of your `diag_view`,
and expecting the end user to chain a call to it to the call to einsum, as
the simplest way of providing this functionality. Although if you can find
the time and the motivation to do the big change, I am perfectly OK with
it, of course!

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

2014-08-15 Thread Sebastian Berg

On Do, 2014-08-14 at 12:42 -0700, Stephan Hoyer wrote:
 I think this would be very nice addition.
 
 
 On Thu, Aug 14, 2014 at 12:21 PM, Benjamin Root ben.r...@ou.edu
 wrote:
 You had me at Kronecker delta... :-)  +1
 

Sounds good to me. I don't see a reason for not relaxing the
restriction, unless there is some technical issue, but I doubt that.

- Sebastian

 
 
 On Thu, Aug 14, 2014 at 3:07 PM, Pierre-Andre Noel
 noel.pierre.an...@gmail.com wrote:
 (I created issue 4965 earlier today on this topic, and
 I have been
 advised to email to this mailing list to discuss
 whether it is a good
 idea or not. I include my original post as-is,
 followed by additional
 comments.)
 
 I think that the following new feature would make
 `numpy.einsum` even
 more powerful/useful/awesome than it already is.
 Moreover, the change
 should not interfere with existing code, it would
 preserve the
 minimalistic spirit of `numpy.einsum`, and the new
 functionality would
 integrate in a seamless/intuitive manner for the
 users.
 
 In short, the new feature would allow for repeated
 subscripts to appear
 in the output part of the `subscripts` parameter
 (i.e., on the
 right-hand side of `-`). The corresponding dimensions
 in the resulting
 `ndarray` would only be filled along their diagonal,
 leaving the off
 diagonal entries to the default value for this `dtype`
 (typically zero).
 Note that the current behavior is to raise an
 exception when repeated
 output subscripts are being used.
 
 This is simplest to describe with an example involving
 the dual behavior
 of `numpy.diag`.
 
 ```python
 # Extracting the diagonal of a 2-D array.
 A = arange(16).reshape(4,4)
 print(diag(A)) # Output: [ 0 5 10 15 ]
 print(einsum('ii-i', A)) # Same as previous line
 (current behavior).
 
 # Constructing a diagonal 2-D array.
 v = arange(4)
 print(diag(v)) # Output: [[0 0 0 0] [0 1 0 0] [0 0 2
 0] [0 0 0 3]]
 print(einsum('i-ii', v)) # New behavior would be same
 as previous line.
 # The current behavior of the previous line is to
 raise an exception.
 ```
 
 By opposition to `numpy.diag`, the approach
 generalizes to higher
 dimensions: `einsum('iii-i', A)` extracts the
 diagonal of a 3-D array,
 and `einsum('i-iii', v)` would build a diagonal 3-D
 array.
 
 The proposed behavior really starts to shine in more
 intricate cases.
 
 ```python
 # Dummy values, these should be probabilities to make
 sense below.
 P_w_ab = arange(24).reshape(3,2,4)
 P_y_wxab = arange(144).reshape(3,3,2,2,4)
 
 # With the proposed behavior, the following two lines
 should be equivalent.
 P_xyz_ab = einsum('wab,xa,ywxab,zy-xyzab', P_w_ab,
 eye(2), P_y_wxab,
 eye(3))
 also_P_xyz_ab = einsum('wab,ywaab-ayyab', P_w_ab,
 P_y_wxab)
 ```
 
 If this is not convincing enough, replace `eye(2)` by
 `eye(P_w_ab.shape[1])` and replace `eye(3)` by
 `eye(P_y_wxab.shape[0])`,
 then imagine more dimensions and repeated indices...
 The new notation
 would allow for crisper codes and reduce the
 opportunities for dumb
 mistakes.
 
 For those who wonder, the above computation amounts to
 $P(X=x,Y=y,Z=z|A=a,B=b) = \sum_w P(W=w|A=a,B=b) P(X=x|
 A=a)
 P(Y=y|W=w,X=x,A=a,B=b) P(Z=z|Y=y)$ with $P(X=x|A=a)=
 \delta_{xa}$ and
 $P(Z=z|Y=y)=\delta_{zy}$ (using LaTeX notation, and
 $\delta_{ij}$ is
 [Kronecker's
 delta](http://en.wikipedia.org/wiki/Kronecker_delta)).
 
 (End of original post.)

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

2014-08-15 Thread Eelco Hoogendoorn

Agreed; this addition occurred to me as well. Note that the implemenatation
should be straightforward: just allocate an enlarged array, use some
striding logic to construct the relevant view, and let einsums internals
act on the view. hopefully, you wont even have to touch the guts of einsum
at the C level, because id say that isn't for the faint of heart...


On Fri, Aug 15, 2014 at 3:53 PM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Do, 2014-08-14 at 12:42 -0700, Stephan Hoyer wrote:
  I think this would be very nice addition.
 
 
  On Thu, Aug 14, 2014 at 12:21 PM, Benjamin Root ben.r...@ou.edu
  wrote:
  You had me at Kronecker delta... :-)  +1
 

 Sounds good to me. I don't see a reason for not relaxing the
 restriction, unless there is some technical issue, but I doubt that.

 - Sebastian

 
 
  On Thu, Aug 14, 2014 at 3:07 PM, Pierre-Andre Noel
  noel.pierre.an...@gmail.com wrote:
  (I created issue 4965 earlier today on this topic, and
  I have been
  advised to email to this mailing list to discuss
  whether it is a good
  idea or not. I include my original post as-is,
  followed by additional
  comments.)
 
  I think that the following new feature would make
  `numpy.einsum` even
  more powerful/useful/awesome than it already is.
  Moreover, the change
  should not interfere with existing code, it would
  preserve the
  minimalistic spirit of `numpy.einsum`, and the new
  functionality would
  integrate in a seamless/intuitive manner for the
  users.
 
  In short, the new feature would allow for repeated
  subscripts to appear
  in the output part of the `subscripts` parameter
  (i.e., on the
  right-hand side of `-`). The corresponding dimensions
  in the resulting
  `ndarray` would only be filled along their diagonal,
  leaving the off
  diagonal entries to the default value for this `dtype`
  (typically zero).
  Note that the current behavior is to raise an
  exception when repeated
  output subscripts are being used.
 
  This is simplest to describe with an example involving
  the dual behavior
  of `numpy.diag`.
 
  ```python
  # Extracting the diagonal of a 2-D array.
  A = arange(16).reshape(4,4)
  print(diag(A)) # Output: [ 0 5 10 15 ]
  print(einsum('ii-i', A)) # Same as previous line
  (current behavior).
 
  # Constructing a diagonal 2-D array.
  v = arange(4)
  print(diag(v)) # Output: [[0 0 0 0] [0 1 0 0] [0 0 2
  0] [0 0 0 3]]
  print(einsum('i-ii', v)) # New behavior would be same
  as previous line.
  # The current behavior of the previous line is to
  raise an exception.
  ```
 
  By opposition to `numpy.diag`, the approach
  generalizes to higher
  dimensions: `einsum('iii-i', A)` extracts the
  diagonal of a 3-D array,
  and `einsum('i-iii', v)` would build a diagonal 3-D
  array.
 
  The proposed behavior really starts to shine in more
  intricate cases.
 
  ```python
  # Dummy values, these should be probabilities to make
  sense below.
  P_w_ab = arange(24).reshape(3,2,4)
  P_y_wxab = arange(144).reshape(3,3,2,2,4)
 
  # With the proposed behavior, the following two lines
  should be equivalent.
  P_xyz_ab = einsum('wab,xa,ywxab,zy-xyzab', P_w_ab,
  eye(2), P_y_wxab,
  eye(3))
  also_P_xyz_ab = einsum('wab,ywaab-ayyab', P_w_ab,
  P_y_wxab)
  ```
 
  If this is not convincing enough, replace `eye(2)` by
  `eye(P_w_ab.shape[1])` and replace `eye(3)` by
  `eye(P_y_wxab.shape[0])`,
  then imagine more dimensions and repeated indices...
  The new notation
  would allow for crisper codes and reduce the
  opportunities for dumb
  mistakes.
 
  For those who wonder, the above computation amounts to
  $P(X=x,Y=y,Z=z|A=a,B=b) = \sum_w P(W=w|A=a,B=b) P(X=x|
  A=a)

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

2014-08-15 Thread Sebastian Berg

On Fr, 2014-08-15 at 16:42 +0200, Eelco Hoogendoorn wrote:
 Agreed; this addition occurred to me as well. Note that the
 implemenatation should be straightforward: just allocate an enlarged
 array, use some striding logic to construct the relevant view, and let
 einsums internals act on the view. hopefully, you wont even have to
 touch the guts of einsum at the C level, because id say that isn't for
 the faint of heart...
 

I am not sure if einsum isn't pure C :). But even if, it should be doing
something identical already for duplicate indices on the inputs...

- Sebastian

 
 On Fri, Aug 15, 2014 at 3:53 PM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
 On Do, 2014-08-14 at 12:42 -0700, Stephan Hoyer wrote:
  I think this would be very nice addition.
 
 
  On Thu, Aug 14, 2014 at 12:21 PM, Benjamin Root
 ben.r...@ou.edu
  wrote:
  You had me at Kronecker delta... :-)  +1
 
 
 
 Sounds good to me. I don't see a reason for not relaxing the
 restriction, unless there is some technical issue, but I doubt
 that.
 
 - Sebastian
 
 
 
  On Thu, Aug 14, 2014 at 3:07 PM, Pierre-Andre Noel
  noel.pierre.an...@gmail.com wrote:
  (I created issue 4965 earlier today on this
 topic, and
  I have been
  advised to email to this mailing list to
 discuss
  whether it is a good
  idea or not. I include my original post
 as-is,
  followed by additional
  comments.)
 
  I think that the following new feature would
 make
  `numpy.einsum` even
  more powerful/useful/awesome than it already
 is.
  Moreover, the change
  should not interfere with existing code, it
 would
  preserve the
  minimalistic spirit of `numpy.einsum`, and
 the new
  functionality would
  integrate in a seamless/intuitive manner for
 the
  users.
 
  In short, the new feature would allow for
 repeated
  subscripts to appear
  in the output part of the `subscripts`
 parameter
  (i.e., on the
  right-hand side of `-`). The corresponding
 dimensions
  in the resulting
  `ndarray` would only be filled along their
 diagonal,
  leaving the off
  diagonal entries to the default value for
 this `dtype`
  (typically zero).
  Note that the current behavior is to raise
 an
  exception when repeated
  output subscripts are being used.
 
  This is simplest to describe with an example
 involving
  the dual behavior
  of `numpy.diag`.
 
  ```python
  # Extracting the diagonal of a 2-D array.
  A = arange(16).reshape(4,4)
  print(diag(A)) # Output: [ 0 5 10 15 ]
  print(einsum('ii-i', A)) # Same as previous
 line
  (current behavior).
 
  # Constructing a diagonal 2-D array.
  v = arange(4)
  print(diag(v)) # Output: [[0 0 0 0] [0 1 0
 0] [0 0 2
  0] [0 0 0 3]]
  print(einsum('i-ii', v)) # New behavior
 would be same
  as previous line.
  # The current behavior of the previous line
 is to
  raise an exception.
  ```
 
  By opposition to `numpy.diag`, the approach
  generalizes to higher
  dimensions: `einsum('iii-i', A)` extracts
 the
  diagonal of a 3-D array,
  and `einsum('i-iii', v)` would build a
 diagonal 3-D
  array.
 
  The proposed behavior really starts to shine
 in more
  intricate cases.
 
  ```python
  # Dummy values, these should be
 probabilities to

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

2014-08-15 Thread Eelco Hoogendoorn

Well, there is the numpy-API C level, and then there is the arcane macro C
level. The two might as well be a completely different language.

Indeed, it should be doing something similar for the inputs. Actually, I
think I wrote a wrapper around einsum/numexpr once that performed this
generalized indexing once... ill see if I can dig that up.


On Fri, Aug 15, 2014 at 5:01 PM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Fr, 2014-08-15 at 16:42 +0200, Eelco Hoogendoorn wrote:
  Agreed; this addition occurred to me as well. Note that the
  implemenatation should be straightforward: just allocate an enlarged
  array, use some striding logic to construct the relevant view, and let
  einsums internals act on the view. hopefully, you wont even have to
  touch the guts of einsum at the C level, because id say that isn't for
  the faint of heart...
 

 I am not sure if einsum isn't pure C :). But even if, it should be doing
 something identical already for duplicate indices on the inputs...

 - Sebastian

 
  On Fri, Aug 15, 2014 at 3:53 PM, Sebastian Berg
  sebast...@sipsolutions.net wrote:
  On Do, 2014-08-14 at 12:42 -0700, Stephan Hoyer wrote:
   I think this would be very nice addition.
  
  
   On Thu, Aug 14, 2014 at 12:21 PM, Benjamin Root
  ben.r...@ou.edu
   wrote:
   You had me at Kronecker delta... :-)  +1
  
 
 
  Sounds good to me. I don't see a reason for not relaxing the
  restriction, unless there is some technical issue, but I doubt
  that.
 
  - Sebastian
 
  
  
   On Thu, Aug 14, 2014 at 3:07 PM, Pierre-Andre Noel
   noel.pierre.an...@gmail.com wrote:
   (I created issue 4965 earlier today on this
  topic, and
   I have been
   advised to email to this mailing list to
  discuss
   whether it is a good
   idea or not. I include my original post
  as-is,
   followed by additional
   comments.)
  
   I think that the following new feature would
  make
   `numpy.einsum` even
   more powerful/useful/awesome than it already
  is.
   Moreover, the change
   should not interfere with existing code, it
  would
   preserve the
   minimalistic spirit of `numpy.einsum`, and
  the new
   functionality would
   integrate in a seamless/intuitive manner for
  the
   users.
  
   In short, the new feature would allow for
  repeated
   subscripts to appear
   in the output part of the `subscripts`
  parameter
   (i.e., on the
   right-hand side of `-`). The corresponding
  dimensions
   in the resulting
   `ndarray` would only be filled along their
  diagonal,
   leaving the off
   diagonal entries to the default value for
  this `dtype`
   (typically zero).
   Note that the current behavior is to raise
  an
   exception when repeated
   output subscripts are being used.
  
   This is simplest to describe with an example
  involving
   the dual behavior
   of `numpy.diag`.
  
   ```python
   # Extracting the diagonal of a 2-D array.
   A = arange(16).reshape(4,4)
   print(diag(A)) # Output: [ 0 5 10 15 ]
   print(einsum('ii-i', A)) # Same as previous
  line
   (current behavior).
  
   # Constructing a diagonal 2-D array.
   v = arange(4)
   print(diag(v)) # Output: [[0 0 0 0] [0 1 0
  0] [0 0 2
   0] [0 0 0 3]]
   print(einsum('i-ii', v)) # New behavior
  would be same
   as previous line.
   # The current behavior of the previous line
  is to
   raise an exception.
   ```
  
   By opposition to `numpy.diag`, the approach
   generalizes to higher

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

2014-08-15 Thread Eelco Hoogendoorn

here is a snippet I extracted from a project with similar aims (integrating
the functionality of einsum and numexpr, actually)

Not much to it, but in case someone needs a reminder on how to use striding
tricks:
http://pastebin.com/kQNySjcj


On Fri, Aug 15, 2014 at 5:20 PM, Eelco Hoogendoorn 
hoogendoorn.ee...@gmail.com wrote:

 Well, there is the numpy-API C level, and then there is the arcane macro C
 level. The two might as well be a completely different language.

 Indeed, it should be doing something similar for the inputs. Actually, I
 think I wrote a wrapper around einsum/numexpr once that performed this
 generalized indexing once... ill see if I can dig that up.


 On Fri, Aug 15, 2014 at 5:01 PM, Sebastian Berg 
 sebast...@sipsolutions.net wrote:

 On Fr, 2014-08-15 at 16:42 +0200, Eelco Hoogendoorn wrote:
  Agreed; this addition occurred to me as well. Note that the
  implemenatation should be straightforward: just allocate an enlarged
  array, use some striding logic to construct the relevant view, and let
  einsums internals act on the view. hopefully, you wont even have to
  touch the guts of einsum at the C level, because id say that isn't for
  the faint of heart...
 

 I am not sure if einsum isn't pure C :). But even if, it should be doing
 something identical already for duplicate indices on the inputs...

 - Sebastian

 
  On Fri, Aug 15, 2014 at 3:53 PM, Sebastian Berg
  sebast...@sipsolutions.net wrote:
  On Do, 2014-08-14 at 12:42 -0700, Stephan Hoyer wrote:
   I think this would be very nice addition.
  
  
   On Thu, Aug 14, 2014 at 12:21 PM, Benjamin Root
  ben.r...@ou.edu
   wrote:
   You had me at Kronecker delta... :-)  +1
  
 
 
  Sounds good to me. I don't see a reason for not relaxing the
  restriction, unless there is some technical issue, but I doubt
  that.
 
  - Sebastian
 
  
  
   On Thu, Aug 14, 2014 at 3:07 PM, Pierre-Andre Noel
   noel.pierre.an...@gmail.com wrote:
   (I created issue 4965 earlier today on this
  topic, and
   I have been
   advised to email to this mailing list to
  discuss
   whether it is a good
   idea or not. I include my original post
  as-is,
   followed by additional
   comments.)
  
   I think that the following new feature would
  make
   `numpy.einsum` even
   more powerful/useful/awesome than it already
  is.
   Moreover, the change
   should not interfere with existing code, it
  would
   preserve the
   minimalistic spirit of `numpy.einsum`, and
  the new
   functionality would
   integrate in a seamless/intuitive manner for
  the
   users.
  
   In short, the new feature would allow for
  repeated
   subscripts to appear
   in the output part of the `subscripts`
  parameter
   (i.e., on the
   right-hand side of `-`). The corresponding
  dimensions
   in the resulting
   `ndarray` would only be filled along their
  diagonal,
   leaving the off
   diagonal entries to the default value for
  this `dtype`
   (typically zero).
   Note that the current behavior is to raise
  an
   exception when repeated
   output subscripts are being used.
  
   This is simplest to describe with an example
  involving
   the dual behavior
   of `numpy.diag`.
  
   ```python
   # Extracting the diagonal of a 2-D array.
   A = arange(16).reshape(4,4)
   print(diag(A)) # Output: [ 0 5 10 15 ]
   print(einsum('ii-i', A)) # Same as previous
  line
   (current behavior).
  
   # Constructing a diagonal 2-D array.
   v = arange(4)
   print(diag(v)) # Output: [[0 0 0 0] [0 1 0
  0] [0 0 2
   0] [0 0 0 3]]
   print(einsum('i-ii', v)) # New behavior
  would be same

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

2014-08-14 Thread Benjamin Root

You had me at Kronecker delta... :-)  +1


On Thu, Aug 14, 2014 at 3:07 PM, Pierre-Andre Noel 
noel.pierre.an...@gmail.com wrote:

 (I created issue 4965 earlier today on this topic, and I have been
 advised to email to this mailing list to discuss whether it is a good
 idea or not. I include my original post as-is, followed by additional
 comments.)

 I think that the following new feature would make `numpy.einsum` even
 more powerful/useful/awesome than it already is. Moreover, the change
 should not interfere with existing code, it would preserve the
 minimalistic spirit of `numpy.einsum`, and the new functionality would
 integrate in a seamless/intuitive manner for the users.

 In short, the new feature would allow for repeated subscripts to appear
 in the output part of the `subscripts` parameter (i.e., on the
 right-hand side of `-`). The corresponding dimensions in the resulting
 `ndarray` would only be filled along their diagonal, leaving the off
 diagonal entries to the default value for this `dtype` (typically zero).
 Note that the current behavior is to raise an exception when repeated
 output subscripts are being used.

 This is simplest to describe with an example involving the dual behavior
 of `numpy.diag`.

 ```python
 # Extracting the diagonal of a 2-D array.
 A = arange(16).reshape(4,4)
 print(diag(A)) # Output: [ 0 5 10 15 ]
 print(einsum('ii-i', A)) # Same as previous line (current behavior).

 # Constructing a diagonal 2-D array.
 v = arange(4)
 print(diag(v)) # Output: [[0 0 0 0] [0 1 0 0] [0 0 2 0] [0 0 0 3]]
 print(einsum('i-ii', v)) # New behavior would be same as previous line.
 # The current behavior of the previous line is to raise an exception.
 ```

 By opposition to `numpy.diag`, the approach generalizes to higher
 dimensions: `einsum('iii-i', A)` extracts the diagonal of a 3-D array,
 and `einsum('i-iii', v)` would build a diagonal 3-D array.

 The proposed behavior really starts to shine in more intricate cases.

 ```python
 # Dummy values, these should be probabilities to make sense below.
 P_w_ab = arange(24).reshape(3,2,4)
 P_y_wxab = arange(144).reshape(3,3,2,2,4)

 # With the proposed behavior, the following two lines should be equivalent.
 P_xyz_ab = einsum('wab,xa,ywxab,zy-xyzab', P_w_ab, eye(2), P_y_wxab,
 eye(3))
 also_P_xyz_ab = einsum('wab,ywaab-ayyab', P_w_ab, P_y_wxab)
 ```

 If this is not convincing enough, replace `eye(2)` by
 `eye(P_w_ab.shape[1])` and replace `eye(3)` by `eye(P_y_wxab.shape[0])`,
 then imagine more dimensions and repeated indices... The new notation
 would allow for crisper codes and reduce the opportunities for dumb
 mistakes.

 For those who wonder, the above computation amounts to
 $P(X=x,Y=y,Z=z|A=a,B=b) = \sum_w P(W=w|A=a,B=b) P(X=x|A=a)
 P(Y=y|W=w,X=x,A=a,B=b) P(Z=z|Y=y)$ with $P(X=x|A=a)=\delta_{xa}$ and
 $P(Z=z|Y=y)=\delta_{zy}$ (using LaTeX notation, and $\delta_{ij}$ is
 [Kronecker's delta](http://en.wikipedia.org/wiki/Kronecker_delta)).

 (End of original post.)

 I have been told by @jaimefrio that The best way of getting a new
 feature into numpy is putting it in yourself. Hence, if discussions
 here do reveal that this is a good idea, then I may give a try at coding
 it myself. However, I currently know nothing of the inner workings of
 numpy/ndarray/einsum, and I have higher priorities right now. This means
 that it could take a long while before I contribute any code, if I ever
 do. Hence, if anyone feels like doing it, feel free to do so!

 Also, I am aware that storing a lot of zeros in an `ndarray` may not, a
 priori, be a desirable avenue. However, there are times where you have
 to do it: think of `numpy.eye` as an example. In my case of application,
 I use such diagonal structures in the initialization of an `ndarray`
 which is later updated through an iterative process. After these
 iterations, most of the zeros will be gone. Do other people see a use
 for such capabilities?

 Thank you for your time and have a nice day.

 Sincerely,

 Pierre-André Noël
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

2014-08-14 Thread Stephan Hoyer

I think this would be very nice addition.


On Thu, Aug 14, 2014 at 12:21 PM, Benjamin Root ben.r...@ou.edu wrote:

 You had me at Kronecker delta... :-)  +1


 On Thu, Aug 14, 2014 at 3:07 PM, Pierre-Andre Noel 
 noel.pierre.an...@gmail.com wrote:

 (I created issue 4965 earlier today on this topic, and I have been
 advised to email to this mailing list to discuss whether it is a good
 idea or not. I include my original post as-is, followed by additional
 comments.)

 I think that the following new feature would make `numpy.einsum` even
 more powerful/useful/awesome than it already is. Moreover, the change
 should not interfere with existing code, it would preserve the
 minimalistic spirit of `numpy.einsum`, and the new functionality would
 integrate in a seamless/intuitive manner for the users.

 In short, the new feature would allow for repeated subscripts to appear
 in the output part of the `subscripts` parameter (i.e., on the
 right-hand side of `-`). The corresponding dimensions in the resulting
 `ndarray` would only be filled along their diagonal, leaving the off
 diagonal entries to the default value for this `dtype` (typically zero).
 Note that the current behavior is to raise an exception when repeated
 output subscripts are being used.

 This is simplest to describe with an example involving the dual behavior
 of `numpy.diag`.

 ```python
 # Extracting the diagonal of a 2-D array.
 A = arange(16).reshape(4,4)
 print(diag(A)) # Output: [ 0 5 10 15 ]
 print(einsum('ii-i', A)) # Same as previous line (current behavior).

 # Constructing a diagonal 2-D array.
 v = arange(4)
 print(diag(v)) # Output: [[0 0 0 0] [0 1 0 0] [0 0 2 0] [0 0 0 3]]
 print(einsum('i-ii', v)) # New behavior would be same as previous line.
 # The current behavior of the previous line is to raise an exception.
 ```

 By opposition to `numpy.diag`, the approach generalizes to higher
 dimensions: `einsum('iii-i', A)` extracts the diagonal of a 3-D array,
 and `einsum('i-iii', v)` would build a diagonal 3-D array.

 The proposed behavior really starts to shine in more intricate cases.

 ```python
 # Dummy values, these should be probabilities to make sense below.
 P_w_ab = arange(24).reshape(3,2,4)
 P_y_wxab = arange(144).reshape(3,3,2,2,4)

 # With the proposed behavior, the following two lines should be
 equivalent.
 P_xyz_ab = einsum('wab,xa,ywxab,zy-xyzab', P_w_ab, eye(2), P_y_wxab,
 eye(3))
 also_P_xyz_ab = einsum('wab,ywaab-ayyab', P_w_ab, P_y_wxab)
 ```

 If this is not convincing enough, replace `eye(2)` by
 `eye(P_w_ab.shape[1])` and replace `eye(3)` by `eye(P_y_wxab.shape[0])`,
 then imagine more dimensions and repeated indices... The new notation
 would allow for crisper codes and reduce the opportunities for dumb
 mistakes.

 For those who wonder, the above computation amounts to
 $P(X=x,Y=y,Z=z|A=a,B=b) = \sum_w P(W=w|A=a,B=b) P(X=x|A=a)
 P(Y=y|W=w,X=x,A=a,B=b) P(Z=z|Y=y)$ with $P(X=x|A=a)=\delta_{xa}$ and
 $P(Z=z|Y=y)=\delta_{zy}$ (using LaTeX notation, and $\delta_{ij}$ is
 [Kronecker's delta](http://en.wikipedia.org/wiki/Kronecker_delta)).

 (End of original post.)

 I have been told by @jaimefrio that The best way of getting a new
 feature into numpy is putting it in yourself. Hence, if discussions
 here do reveal that this is a good idea, then I may give a try at coding
 it myself. However, I currently know nothing of the inner workings of
 numpy/ndarray/einsum, and I have higher priorities right now. This means
 that it could take a long while before I contribute any code, if I ever
 do. Hence, if anyone feels like doing it, feel free to do so!

 Also, I am aware that storing a lot of zeros in an `ndarray` may not, a
 priori, be a desirable avenue. However, there are times where you have
 to do it: think of `numpy.eye` as an example. In my case of application,
 I use such diagonal structures in the initialization of an `ndarray`
 which is later updated through an iterative process. After these
 iterations, most of the zeros will be gone. Do other people see a use
 for such capabilities?

 Thank you for your time and have a nice day.

 Sincerely,

 Pierre-André Noël
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

Re: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal

9 matches

Site Navigation

Mail list logo

Footer information