Re: [Numpy-discussion] Views of a different dtype

2015-02-04 Thread Jaime Fernández del Río
On Tue, Feb 3, 2015 at 1:52 PM, Ian Henriksen 
insertinterestingnameh...@gmail.com wrote:

 On Tue Feb 03 2015 at 1:47:34 PM Jaime Fernández del Río 
 jaime.f...@gmail.com wrote:

 On Tue, Feb 3, 2015 at 8:59 AM, Sebastian Berg 
 sebast...@sipsolutions.net wrote:

 On Di, 2015-02-03 at 07:18 -0800, Jaime Fernández del Río wrote:
 
 snip
 
 
 
  Do you have a concrete example of what a non (1, 1) array that fails
  with relaxed strides would look like?
 
 
  If we used, as right now, the array flags as a first choice point, and
  only if none is set try to determine it from the strides/dimensions
  information, I fail to imagine any situation where the end result
  would be worse than now. I don't think that a little bit of
  predictable surprising in an advanced functionality is too bad. We
  could start raising on the face of ambiguity, we refuse to guess
  errors, even for the current behavior you show above, but that is more
  likely to trip people by not giving them any simple workaround, that
  it seems to me would be add a .T if all dimensions are 1 in some
  particular situations. Or are you thinking of something more serious
  than a shape mismatch when you write about breaking current code?
 

 Yes, I am talking only about wrongly shaped results for some fortran
 order arrays. A (20, 1) fortran order complex array being viewed as
 float, will with relaxed strides become a (20, 2) array instead of a
 (40, 1) one.


 That is a limitation of the current implementation too, and happens
 already whenever relaxed strides are in place. Which is the default for
 1.10, right?

 Perhaps giving 'view' an 'order' or 'axis' kwarg could make sense after
 all? It should probably be more of a hint of what to do (fortran vs c) when
 in doubt. C would prioritize last axis, F the first, and we could even
 add a raise option to have it fail if the axis cannot be inferred from
 the strides and shape. Current behavior would is equivalent to what C
 would do.

 Jaime



 IMHO, the best option would be something like this:

 - When changing to a type with smaller itemsize, add a new axis after the
 others so the resulting array is C contiguous (unless a different axis is
 specified by a keyword argument). The advantage here is that if you index
 the new view using the old indices for an entry, you get an array showing
 its representation in the new type.
 - No shape change for views with the same itemsize
 - When changing to a type with a larger itemsize, collapse along the last
 axis unless a different axis is specified, throwing an error if the axis
 specified does not match the axis specified.


My only concern with adding a new axis, backwards compatibility aside, is
that you would not know wether to keep or discard the resulting size 1
dimension when taking a view of the view. We could reuse the keepdims
terminology from ufuncs, though.

So the current behavior would remain unchanged if you set axis=None,
keepdims=True, and we could transition to a more reasonable default with
axis=-1, keepdims=False over a few releases with adequate warnings.

In my mind, when expanding, the axis would not indicate where to place the
new axis, but which axis to collapse over, that is the hard part to figure
out!If you want something else, it is only a call to rollaxis away. Perhaps
we could also give keepdims a meaning when expanding, as to whether to add
a new axis at the end, or change the size of the chosen dimension.

I don't know, there may be an actual interface hidden somewhere here, but
still needs some  cooking to fully figure it out.

Jaime



 The last point essentially is just adding an axis argument. I like that
 idea because it gives users the ability to do all that the array's memory
 layout allows in a clear and concise way. Throwing an error if the default
 axis doesn't work would be a good way to prevent strange bugs from
 happening when the default behavior is expected.

 The first point would be a break in backwards compatibility, so I'm not
 sure if it's feasible at this point. The advantage would be that all all
 arrays returned when using this functionality would be contiguous along the
 last axis. The shape of the new array would be independent of the memory
 layout of the original one. This would also be a much cleaner way to ensure
 that views of a different type are always reversible while still allowing
 for relaxed strides.

 Either way, thanks for looking into this. It's a great feature to have
 available.

 -Ian Henriksen





 
  If there are any real loopholes in expanding this functionality, then
  lets not do it, but we know we have at least one user unsatisfied with
  the current performance, so I really think it is worth trying. Plus,
  I'll admit to that, messing around with some of these stuff deep
  inside the guts of the beast is lots of fun! ;)
 

 I do not think there are loopholes with expanding this functionality. I
 think there have regressions when we put relaxed strides to on, 

Re: [Numpy-discussion] Views of a different dtype

2015-02-03 Thread Jaime Fernández del Río
On Tue, Feb 3, 2015 at 1:28 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Mo, 2015-02-02 at 06:25 -0800, Jaime Fernández del Río wrote:
  On Sat, Jan 31, 2015 at 1:17 AM, Sebastian Berg
  sebast...@sipsolutions.net wrote:
  On Fr, 2015-01-30 at 19:52 -0800, Jaime Fernández del Río
  wrote:
   On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith
  n...@pobox.com
   wrote:
   On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández
  del Río
   jaime.f...@gmail.com wrote:
   [...]
 
  snip
 
  
   Could we make it more like: check to see if the last
  dimension
   works.
   If not, raise an error (and let the user transpose
  some other
   dimension there if that's what they wanted)? Or
  require the
   user to
   specify which dimension will absorb the shape
  change? (If we
   were
   doing this from scratch, then it would be tempting
  to just say
   that we
   always add a new dimension at the end with
  newtype.itemsize /
   oldtype.itemsize entries, or absorb such a dimension
  if
   shrinking. As
   a bonus, this would always work, regardless of
  contiguity!
   Except that
   when shrinking the last dimension would have to be
  contiguous,
   of
   course.)
  
  
   When we roll @ in and people start working with stacks of
  matrices, we
   will probably find ourselves having to create an alias,
  similar to .T,
   for .swapaxes(-1, -2). Searching for the smallest stride
  allows to
   take views of such arrays, which does not work right now
  because the
   array is no longer contiguous globally.
  
 
  That is true, but I agree with Nathaniel at least as far as
  that I would
  prefer a user to be able to safely use `view` even he has not
  even an
  inkling about what his memory layout is. One option would be
  an
  `axis=-1` default (maybe FutureWarn this from `axis=None`
  which would
  look at order, see below -- or maybe have axis='A', 'C' and
  'F' and
  default to 'A' for starters).
 
  This even now could start creating bugs when enabling relaxed
  strides :(, because your good old fortran order complex array
  being
  viewed as a float one could expand along the wrong axis, and
  even
  without such arrays swap order pretty fast when operating on
  them, which
  can create impossibly to find bugs, because even a poweruser
  is likely
  to forget about such things.
 
  Of course you could argue that view is a poweruser feature and
  a user
  using it should keep these things in mind Though if you
  argue that,
  you can almost just use `np.ndarray` directly ;) -- ok, not
  really
  considering how cumbersome it is, but still.
 
 
  I have been giving this some thought, and am willing to concede that
  my first proposal may have been too ambitious. So even though the knob
  goes to 11, we can always do things incrementally. I am also wary of
  adding new keywords when it seems obvious that we do not have the
  functionality completely figured out, so here's my new proposal:
 
 
* The objective is that a view of an array that is the result of
  slicing a contiguous array should be possible, if it remains
  contiguous (meaning stride == itemsize) along its original
  contiguous (first or last) dimension. This eliminates axis
  transposition from the previous proposal, although reversing
  the axes completely would also work.
* To verify this, unless the C contiguous or Fortran contiguous
  flags are set, we would still need to look at the strides. An
  array would be C contiguous if, starting from the last stride
  it is equal to the itemsize, and working backwards every next
  stride is larger or equal than the product of the previous
  stride by the previous dimension. dimensions of size 1 would
  be ignored for these, except for the last one, which would be
  taken to have stride = itemsize. The Fortran case is of course
  the same in reverse.
* I think the above combined with the current preference of C
  contiguousness over Fortran, would actually allow the views to
  always be reversible, which is also a nice thing to have.
  This eliminates most of the weirdness, but extends 

Re: [Numpy-discussion] Views of a different dtype

2015-02-03 Thread Sebastian Berg
On Di, 2015-02-03 at 07:18 -0800, Jaime Fernández del Río wrote:
 
snip
 
 
 
 Do you have a concrete example of what a non (1, 1) array that fails
 with relaxed strides would look like?
 
 
 If we used, as right now, the array flags as a first choice point, and
 only if none is set try to determine it from the strides/dimensions
 information, I fail to imagine any situation where the end result
 would be worse than now. I don't think that a little bit of
 predictable surprising in an advanced functionality is too bad. We
 could start raising on the face of ambiguity, we refuse to guess
 errors, even for the current behavior you show above, but that is more
 likely to trip people by not giving them any simple workaround, that
 it seems to me would be add a .T if all dimensions are 1 in some
 particular situations. Or are you thinking of something more serious
 than a shape mismatch when you write about breaking current code?
 

Yes, I am talking only about wrongly shaped results for some fortran
order arrays. A (20, 1) fortran order complex array being viewed as
float, will with relaxed strides become a (20, 2) array instead of a
(40, 1) one.

 
 If there are any real loopholes in expanding this functionality, then
 lets not do it, but we know we have at least one user unsatisfied with
 the current performance, so I really think it is worth trying. Plus,
 I'll admit to that, messing around with some of these stuff deep
 inside the guts of the beast is lots of fun! ;)
 

I do not think there are loopholes with expanding this functionality. I
think there have regressions when we put relaxed strides to on, because
suddenly the fortran order array might be expanded along a 1-sized axis,
because it is also C order. So I wonder if we can fix these regressions
and at the same time maybe provide a more intuitive approach then using
the memory order blindly

- Sebastian

 
 Jaime
  
 
 - Sebastian
 
 
 
  Jaime
 
 
 
  - Sebastian
 
  
   I guess the main consideration for this is
 that we
  may be
   stuck with
   stuff b/c of backwards compatibility. Can
 you maybe
  say a
   little bit
   about what is allowed now, and what
 constraints that
  puts on
   things?
   E.g. are we already grovelling around in
 strides and
  picking
   random
   dimensions in some cases?
  
  
   Just to restate it: right now we only allow new
 views if the
  array is
   globally contiguous, so either along the first or
 last
  dimension.
  
  
   Jaime
  
  
   -n
  
   --
   Nathaniel J. Smith
   Postdoctoral researcher - Informatics -
 University
  of
   Edinburgh
   http://vorpus.org
  
  ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
  
 
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
  
  
  
  
   --
   (\__/)
   ( O.o)
   (  ) Este es Conejo. Copia a Conejo en tu firma
 y ayúdale
  en sus
   planes de dominación mundial.
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
  
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
 
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
 
 
  --
  (\__/)
  ( O.o)
  (  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale
 en sus
  planes de dominación mundial.
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
  

Re: [Numpy-discussion] Views of a different dtype

2015-02-03 Thread Jaime Fernández del Río
On Tue, Feb 3, 2015 at 8:59 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Di, 2015-02-03 at 07:18 -0800, Jaime Fernández del Río wrote:
 
 snip
 
 
 
  Do you have a concrete example of what a non (1, 1) array that fails
  with relaxed strides would look like?
 
 
  If we used, as right now, the array flags as a first choice point, and
  only if none is set try to determine it from the strides/dimensions
  information, I fail to imagine any situation where the end result
  would be worse than now. I don't think that a little bit of
  predictable surprising in an advanced functionality is too bad. We
  could start raising on the face of ambiguity, we refuse to guess
  errors, even for the current behavior you show above, but that is more
  likely to trip people by not giving them any simple workaround, that
  it seems to me would be add a .T if all dimensions are 1 in some
  particular situations. Or are you thinking of something more serious
  than a shape mismatch when you write about breaking current code?
 

 Yes, I am talking only about wrongly shaped results for some fortran
 order arrays. A (20, 1) fortran order complex array being viewed as
 float, will with relaxed strides become a (20, 2) array instead of a
 (40, 1) one.


That is a limitation of the current implementation too, and happens already
whenever relaxed strides are in place. Which is the default for 1.10,
right?

Perhaps giving 'view' an 'order' or 'axis' kwarg could make sense after
all? It should probably be more of a hint of what to do (fortran vs c) when
in doubt. C would prioritize last axis, F the first, and we could even
add a raise option to have it fail if the axis cannot be inferred from
the strides and shape. Current behavior would is equivalent to what C
would do.

Jaime



 
  If there are any real loopholes in expanding this functionality, then
  lets not do it, but we know we have at least one user unsatisfied with
  the current performance, so I really think it is worth trying. Plus,
  I'll admit to that, messing around with some of these stuff deep
  inside the guts of the beast is lots of fun! ;)
 

 I do not think there are loopholes with expanding this functionality. I
 think there have regressions when we put relaxed strides to on, because
 suddenly the fortran order array might be expanded along a 1-sized axis,
 because it is also C order. So I wonder if we can fix these regressions
 and at the same time maybe provide a more intuitive approach then using
 the memory order blindly


 - Sebastian

 
  Jaime
 
 
  - Sebastian
 
 
  
   Jaime
  
  
  
   - Sebastian
  
   
I guess the main consideration for this is
  that we
   may be
stuck with
stuff b/c of backwards compatibility. Can
  you maybe
   say a
little bit
about what is allowed now, and what
  constraints that
   puts on
things?
E.g. are we already grovelling around in
  strides and
   picking
random
dimensions in some cases?
   
   
Just to restate it: right now we only allow new
  views if the
   array is
globally contiguous, so either along the first or
  last
   dimension.
   
   
Jaime
   
   
-n
   
--
Nathaniel J. Smith
Postdoctoral researcher - Informatics -
  University
   of
Edinburgh
http://vorpus.org
   
   ___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
   
  
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
   
   
   
   
   
--
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma
  y ayúdale
   en sus
planes de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
   
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
  
  
 

Re: [Numpy-discussion] Views of a different dtype

2015-02-03 Thread Sebastian Berg
On Mo, 2015-02-02 at 06:25 -0800, Jaime Fernández del Río wrote:
 On Sat, Jan 31, 2015 at 1:17 AM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
 On Fr, 2015-01-30 at 19:52 -0800, Jaime Fernández del Río
 wrote:
  On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith
 n...@pobox.com
  wrote:
  On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández
 del Río
  jaime.f...@gmail.com wrote:
  [...]
 
 snip
 
 
  Could we make it more like: check to see if the last
 dimension
  works.
  If not, raise an error (and let the user transpose
 some other
  dimension there if that's what they wanted)? Or
 require the
  user to
  specify which dimension will absorb the shape
 change? (If we
  were
  doing this from scratch, then it would be tempting
 to just say
  that we
  always add a new dimension at the end with
 newtype.itemsize /
  oldtype.itemsize entries, or absorb such a dimension
 if
  shrinking. As
  a bonus, this would always work, regardless of
 contiguity!
  Except that
  when shrinking the last dimension would have to be
 contiguous,
  of
  course.)
 
 
  When we roll @ in and people start working with stacks of
 matrices, we
  will probably find ourselves having to create an alias,
 similar to .T,
  for .swapaxes(-1, -2). Searching for the smallest stride
 allows to
  take views of such arrays, which does not work right now
 because the
  array is no longer contiguous globally.
 
 
 That is true, but I agree with Nathaniel at least as far as
 that I would
 prefer a user to be able to safely use `view` even he has not
 even an
 inkling about what his memory layout is. One option would be
 an
 `axis=-1` default (maybe FutureWarn this from `axis=None`
 which would
 look at order, see below -- or maybe have axis='A', 'C' and
 'F' and
 default to 'A' for starters).
 
 This even now could start creating bugs when enabling relaxed
 strides :(, because your good old fortran order complex array
 being
 viewed as a float one could expand along the wrong axis, and
 even
 without such arrays swap order pretty fast when operating on
 them, which
 can create impossibly to find bugs, because even a poweruser
 is likely
 to forget about such things.
 
 Of course you could argue that view is a poweruser feature and
 a user
 using it should keep these things in mind Though if you
 argue that,
 you can almost just use `np.ndarray` directly ;) -- ok, not
 really
 considering how cumbersome it is, but still.
 
 
 I have been giving this some thought, and am willing to concede that
 my first proposal may have been too ambitious. So even though the knob
 goes to 11, we can always do things incrementally. I am also wary of
 adding new keywords when it seems obvious that we do not have the
 functionality completely figured out, so here's my new proposal:
 
 
   * The objective is that a view of an array that is the result of
 slicing a contiguous array should be possible, if it remains
 contiguous (meaning stride == itemsize) along its original
 contiguous (first or last) dimension. This eliminates axis
 transposition from the previous proposal, although reversing
 the axes completely would also work.
   * To verify this, unless the C contiguous or Fortran contiguous
 flags are set, we would still need to look at the strides. An
 array would be C contiguous if, starting from the last stride
 it is equal to the itemsize, and working backwards every next
 stride is larger or equal than the product of the previous
 stride by the previous dimension. dimensions of size 1 would
 be ignored for these, except for the last one, which would be
 taken to have stride = itemsize. The Fortran case is of course
 the same in reverse.
   * I think the above combined with the current preference of C
 contiguousness over Fortran, would actually allow the views to
 always be reversible, which is also a nice thing to have.
 This eliminates most of the weirdness, but extends current
 functionality to cover cases like Jens reported a few days back.
 
 
 Does this sound better?
 

It seems fine as such, but I still worry 

Re: [Numpy-discussion] Views of a different dtype

2015-02-03 Thread Ian Henriksen
On Tue Feb 03 2015 at 1:47:34 PM Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 On Tue, Feb 3, 2015 at 8:59 AM, Sebastian Berg sebast...@sipsolutions.net
  wrote:

 On Di, 2015-02-03 at 07:18 -0800, Jaime Fernández del Río wrote:
 
 snip
 
 
 
  Do you have a concrete example of what a non (1, 1) array that fails
  with relaxed strides would look like?
 
 
  If we used, as right now, the array flags as a first choice point, and
  only if none is set try to determine it from the strides/dimensions
  information, I fail to imagine any situation where the end result
  would be worse than now. I don't think that a little bit of
  predictable surprising in an advanced functionality is too bad. We
  could start raising on the face of ambiguity, we refuse to guess
  errors, even for the current behavior you show above, but that is more
  likely to trip people by not giving them any simple workaround, that
  it seems to me would be add a .T if all dimensions are 1 in some
  particular situations. Or are you thinking of something more serious
  than a shape mismatch when you write about breaking current code?
 

 Yes, I am talking only about wrongly shaped results for some fortran
 order arrays. A (20, 1) fortran order complex array being viewed as
 float, will with relaxed strides become a (20, 2) array instead of a
 (40, 1) one.


 That is a limitation of the current implementation too, and happens
 already whenever relaxed strides are in place. Which is the default for
 1.10, right?

 Perhaps giving 'view' an 'order' or 'axis' kwarg could make sense after
 all? It should probably be more of a hint of what to do (fortran vs c) when
 in doubt. C would prioritize last axis, F the first, and we could even
 add a raise option to have it fail if the axis cannot be inferred from
 the strides and shape. Current behavior would is equivalent to what C
 would do.

 Jaime



IMHO, the best option would be something like this:

- When changing to a type with smaller itemsize, add a new axis after the
others so the resulting array is C contiguous (unless a different axis is
specified by a keyword argument). The advantage here is that if you index
the new view using the old indices for an entry, you get an array showing
its representation in the new type.
- No shape change for views with the same itemsize
- When changing to a type with a larger itemsize, collapse along the last
axis unless a different axis is specified, throwing an error if the axis
specified does not match the axis specified.

The last point essentially is just adding an axis argument. I like that
idea because it gives users the ability to do all that the array's memory
layout allows in a clear and concise way. Throwing an error if the default
axis doesn't work would be a good way to prevent strange bugs from
happening when the default behavior is expected.

The first point would be a break in backwards compatibility, so I'm not
sure if it's feasible at this point. The advantage would be that all all
arrays returned when using this functionality would be contiguous along the
last axis. The shape of the new array would be independent of the memory
layout of the original one. This would also be a much cleaner way to ensure
that views of a different type are always reversible while still allowing
for relaxed strides.

Either way, thanks for looking into this. It's a great feature to have
available.

-Ian Henriksen





 
  If there are any real loopholes in expanding this functionality, then
  lets not do it, but we know we have at least one user unsatisfied with
  the current performance, so I really think it is worth trying. Plus,
  I'll admit to that, messing around with some of these stuff deep
  inside the guts of the beast is lots of fun! ;)
 

 I do not think there are loopholes with expanding this functionality. I
 think there have regressions when we put relaxed strides to on, because
 suddenly the fortran order array might be expanded along a 1-sized axis,
 because it is also C order. So I wonder if we can fix these regressions
 and at the same time maybe provide a more intuitive approach then using
 the memory order blindly


 - Sebastian

 
  Jaime
 
 
  - Sebastian
 
 
  
   Jaime
  
  
  
   - Sebastian
  
   
I guess the main consideration for this is
  that we
   may be
stuck with
stuff b/c of backwards compatibility. Can
  you maybe
   say a
little bit
about what is allowed now, and what
  constraints that
   puts on
things?
E.g. are we already grovelling around in
  strides and
   picking
random

Re: [Numpy-discussion] Views of a different dtype

2015-02-02 Thread Neal Becker
I find it useful to be able to view a simple 1D contiguous array of complex as 
float (alternative real and imag), and also the do the reverse.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Views of a different dtype

2015-01-31 Thread Sebastian Berg
On Fr, 2015-01-30 at 19:52 -0800, Jaime Fernández del Río wrote:
 On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith n...@pobox.com
 wrote:
 On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández del Río
 jaime.f...@gmail.com wrote:
 [...]

snip

 
 Could we make it more like: check to see if the last dimension
 works.
 If not, raise an error (and let the user transpose some other
 dimension there if that's what they wanted)? Or require the
 user to
 specify which dimension will absorb the shape change? (If we
 were
 doing this from scratch, then it would be tempting to just say
 that we
 always add a new dimension at the end with newtype.itemsize /
 oldtype.itemsize entries, or absorb such a dimension if
 shrinking. As
 a bonus, this would always work, regardless of contiguity!
 Except that
 when shrinking the last dimension would have to be contiguous,
 of
 course.)
 
 
 When we roll @ in and people start working with stacks of matrices, we
 will probably find ourselves having to create an alias, similar to .T,
 for .swapaxes(-1, -2). Searching for the smallest stride allows to
 take views of such arrays, which does not work right now because the
 array is no longer contiguous globally.
  

That is true, but I agree with Nathaniel at least as far as that I would
prefer a user to be able to safely use `view` even he has not even an
inkling about what his memory layout is. One option would be an
`axis=-1` default (maybe FutureWarn this from `axis=None` which would
look at order, see below -- or maybe have axis='A', 'C' and 'F' and
default to 'A' for starters).

This even now could start creating bugs when enabling relaxed
strides :(, because your good old fortran order complex array being
viewed as a float one could expand along the wrong axis, and even
without such arrays swap order pretty fast when operating on them, which
can create impossibly to find bugs, because even a poweruser is likely
to forget about such things.

Of course you could argue that view is a poweruser feature and a user
using it should keep these things in mind Though if you argue that,
you can almost just use `np.ndarray` directly ;) -- ok, not really
considering how cumbersome it is, but still.

- Sebastian

 
 I guess the main consideration for this is that we may be
 stuck with
 stuff b/c of backwards compatibility. Can you maybe say a
 little bit
 about what is allowed now, and what constraints that puts on
 things?
 E.g. are we already grovelling around in strides and picking
 random
 dimensions in some cases?
 
 
 Just to restate it: right now we only allow new views if the array is
 globally contiguous, so either along the first or last dimension.
 
 
 Jaime
  
 
 -n
 
 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of
 Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
 
 
 -- 
 (\__/)
 ( O.o)
 (  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus
 planes de dominación mundial.
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Views of a different dtype

2015-01-30 Thread Jaime Fernández del Río
On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández del Río
 jaime.f...@gmail.com wrote:
 [...]
  With all these in mind, my proposal for the new behavior is that taking a
  view of an array with a different dtype would require:
 
  That the newtype and oldtype be compatible, as defined by the algorithm
  checking object offsets linked above.
  If newtype.itemsize == oldtype.itemsize no more checks are needed, make
 it
  happen!
  If the array is C/Fortran contiguous, check that the size in bytes of the
  last/first dimension is evenly divided by newtype.itemsize. If it does,
 go
  for it.
  For non-contiguous arrays:
 
  Ignoring dimensions of size 1, check that no stride is smaller than
 either
  oldtype.itemsize or newtype.itemsize. If any is found this is an
 as_strided
  product, sorry, can't do it!
  Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride
 ==
  oldtype.itemsize
 
  If found, check that it is the only one with that stride, that it is the
  minimal stride, and that the size in bytes of that dimension is evenly
  divided by newitem,itemsize.
  If none is found, check if there is a size 1 dimension that is also
 unique
  (unless we agree on a default, as mentioned above) and that
 newtype.itemsize
  evenly divides oldtype.itemsize.

 I'm really wary of this idea that we go grovelling around looking for
 some suitable dimension somewhere to absorb the new items. Basically
 nothing in numpy produces semantically different arrays (e.g., ones
 with different shapes) depending on the *strides* of the input array.


In a convoluted way, changing the dtype already does different thing
depending on the strides, as right now the expansion/contraction happens
along the last/first axis depending if the array is C/Fortran contiguous,
and those flags are calculated from the strides:

 a = np.ones((2, 2), dtype=complex)
 a.view(float).shape
(2, 4)
 a.T.view(float).shape
(4, 2)

A user unaware that transposition has changed the memory layout will be
surprised to find his complex values being unpacked along the first, not
the last dimension. But that's the way it already is.

With my proposal above, the intention is that the same happens not only for
the standard reverse axes order transposition, but with any other one,
even if you have sliced the array.



 Could we make it more like: check to see if the last dimension works.
 If not, raise an error (and let the user transpose some other
 dimension there if that's what they wanted)? Or require the user to
 specify which dimension will absorb the shape change? (If we were
 doing this from scratch, then it would be tempting to just say that we
 always add a new dimension at the end with newtype.itemsize /
 oldtype.itemsize entries, or absorb such a dimension if shrinking. As
 a bonus, this would always work, regardless of contiguity! Except that
 when shrinking the last dimension would have to be contiguous, of
 course.)


When we roll @ in and people start working with stacks of matrices, we will
probably find ourselves having to create an alias, similar to .T, for
.swapaxes(-1, -2). Searching for the smallest stride allows to take views
of such arrays, which does not work right now because the array is no
longer contiguous globally.



 I guess the main consideration for this is that we may be stuck with
 stuff b/c of backwards compatibility. Can you maybe say a little bit
 about what is allowed now, and what constraints that puts on things?
 E.g. are we already grovelling around in strides and picking random
 dimensions in some cases?


Just to restate it: right now we only allow new views if the array is
globally contiguous, so either along the first or last dimension.

Jaime



 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Views of a different dtype

2015-01-29 Thread Allan Haldane
Hi Jamie,

I'm not sure whether to reply here or on github!

I have a comment on the condition There must be the same total number
of objects before and after.

I originally looked into this to solve
https://github.com/numpy/numpy/issues/3256, which involves taking a view
of a subset of the fields of a structured array. Such a view causes the
final number objects to be less than the original number.

For example, say you have an array with three fields a,b,c, and you only
want a and c. Numpy currently does the equivalent of this (in
_index_fields in numpy/core/_internals.py):

 a = zeros([(1,2,3), (4,5,6)],
...   dtype([('a', 'i8'), ('b', 'i8'), ('c', 'i8')]))
 a.view(dtype({'names': ['x', 'y'],
...   'formats': ['i8', 'i8'],
...   'offsets': [0, 16]}))
array([(1, 3), (4, 6)],
  dtype={'names':['x','y'], 'formats':['i8','i8'],
'offsets':[0,16], 'itemsize':24})

It works although the repr is a bit wordy. However, it currently does
not work if we have 'O' instead of 'i8' since we can't take views of
object arrays. If this was an object array, the resulting view only has
two objects, not the original three.

This does mean that the view can't be undone since the view object no
longer knows there is an object at offset 8. But that does not seem
bad to me compared to the usefulness of such a view.

I think to fix this, you only need to remove the test for this case in
dtype_view_is_safe, and then return all(o in old_offsets for o in
new_offsets). (and btw, I like the much cleaner logic you have in that
section!)

Allan

On 01/28/2015 07:56 PM, Jaime Fernández del Río wrote:
 HI all,
 
 There has been some recent discussion going on on the limitations that
 numpy imposes to taking views of an array with a different dtype.
 
 As of right now, you can basically only take a view of an array if it
 has no Python objects and neither the old nor the new dtype are
 structured. Furthermore, the array has to be either C or Fortran contiguous.
 
 This seem to be way too strict, but the potential for disaster getting a
 loosening of the restrictions wrong is big, so it should be handled with
 care.
 
 Allan Haldane and myself have been looking into this separately and
 discussing some of the details over at github, and we both think that
 the only true limitation that has to be imposed is that the offsets of
 Python objects within the new and old dtypes remain compatible. I have
 expanded Allan's work from here:
 
 https://github.com/ahaldane/numpy/commit/e9ca367
 
 to make it as flexible as I have been able. An implementation of the
 algorithm in Python, with a few tests, can be found here:
 
 https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py
 
 I would appreciate getting some eyes on it for correctness, and to make
 sure that it won't break with some weird dtype.
 
 I am also trying to figure out what the ground rules for stride and
 shape conversions when taking a view with a different dtype should be. I
 submitted a PR (gh-5508) a couple for days ago working on that, but I am
 not so sure that the logic is completely sound. Again, to get more eyes
 on it, I am going to reproduce my thoughts here on the hope of getting
 some feedback.
 
 The objective would be to always allow a view of a different dtype
 (given that the dtypes be compatible as described above) to be taken if:
 
   * The itemsize of the dtype doesn't change.
   * The itemsize changes, but the array being viewed is the result of
 slicing and transposing axes of a contiguous array, and it is still
 contiguous, defined as stride == dtype.itemsize, along its
 smallest-strided dimension, and the itemsize of the newtype exactly
 divides the size of that dimension.
   * Ideally taking a view should be a reversible process, i.e. if
 oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype)
 should give you back a view of arr with the same original shape,
 strides and dtype.
 
 This last point can get tricky if the minimal stride dimension has size
 1, as there could be several of those, e.g.:
 
  a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1)
  a.flags.contiguous
 False
  a.shape
 (3, 1, 2)
  a.strides  # the stride of the size 1 dimension could be
 anything, ignore it!
 (32, 8, 8)
 
 b = a.view(complex)  # this fails right now, but should work
  b.flags.contiguous
 False
  b.shape
 (3, 1, 1)
  b.strides  # the stride of the size 1 dimensions could be
 anything, ignore them!
 (32, 16, 16)
 
 c = b.view(float)  # which of the two size 1 dimensions should we
 expand?
 
 
 In the face of ambiguity refuse the temptation to guess dictates that
 last view should raise an error, unless we agree and document some
 default. Any thoughts?
 
 Then there is the endless complication one could get into with arrays
 created with as_strided. I'm not smart enough to 

Re: [Numpy-discussion] Views of a different dtype

2015-01-29 Thread Nathaniel Smith
On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fernández del Río
jaime.f...@gmail.com wrote:
[...]
 With all these in mind, my proposal for the new behavior is that taking a
 view of an array with a different dtype would require:

 That the newtype and oldtype be compatible, as defined by the algorithm
 checking object offsets linked above.
 If newtype.itemsize == oldtype.itemsize no more checks are needed, make it
 happen!
 If the array is C/Fortran contiguous, check that the size in bytes of the
 last/first dimension is evenly divided by newtype.itemsize. If it does, go
 for it.
 For non-contiguous arrays:

 Ignoring dimensions of size 1, check that no stride is smaller than either
 oldtype.itemsize or newtype.itemsize. If any is found this is an as_strided
 product, sorry, can't do it!
 Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride ==
 oldtype.itemsize

 If found, check that it is the only one with that stride, that it is the
 minimal stride, and that the size in bytes of that dimension is evenly
 divided by newitem,itemsize.
 If none is found, check if there is a size 1 dimension that is also unique
 (unless we agree on a default, as mentioned above) and that newtype.itemsize
 evenly divides oldtype.itemsize.

I'm really wary of this idea that we go grovelling around looking for
some suitable dimension somewhere to absorb the new items. Basically
nothing in numpy produces semantically different arrays (e.g., ones
with different shapes) depending on the *strides* of the input array.

Could we make it more like: check to see if the last dimension works.
If not, raise an error (and let the user transpose some other
dimension there if that's what they wanted)? Or require the user to
specify which dimension will absorb the shape change? (If we were
doing this from scratch, then it would be tempting to just say that we
always add a new dimension at the end with newtype.itemsize /
oldtype.itemsize entries, or absorb such a dimension if shrinking. As
a bonus, this would always work, regardless of contiguity! Except that
when shrinking the last dimension would have to be contiguous, of
course.)

I guess the main consideration for this is that we may be stuck with
stuff b/c of backwards compatibility. Can you maybe say a little bit
about what is allowed now, and what constraints that puts on things?
E.g. are we already grovelling around in strides and picking random
dimensions in some cases?

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Views of a different dtype

2015-01-29 Thread Allan Haldane
Hello again,

I also have a minor code comment:

In get_object_offsets you iterate over dtype.fields.values(). Be
careful, because dtype.fields also includes the field titles. For
example this fails:

dta = np.dtype([(('a', 'title'), 'O'), ('b', 'O'), ('c', 'i1')])
dtb = np.dtype([('a', 'O'), ('b', 'O'), ('c', 'i1')])
assert dtype_view_is_safe(dta, dtb)

I've seen two strategies in the numpy code to work around this. One is
to to skip entries that are titles, like this:

for key,field in dtype.fields.iteritems():
if len(field) == 3 and field[2] == key: #detect titles
continue
#do something

You can find all examples that do this by grepping NPY_TITLE_KEY in the
numpy source.

The other (more popular) strategy is to iterate over dtype.names. You
can find all examples of this by grepping for names_size.

I don't know the history of it, but it looks to me like titles in
dtypes are an obsolete feature. Are they actually used anywhere?

Allan

On 01/28/2015 07:56 PM, Jaime Fernández del Río wrote:
 HI all,
 
 There has been some recent discussion going on on the limitations that
 numpy imposes to taking views of an array with a different dtype.
 
 As of right now, you can basically only take a view of an array if it
 has no Python objects and neither the old nor the new dtype are
 structured. Furthermore, the array has to be either C or Fortran contiguous.
 
 This seem to be way too strict, but the potential for disaster getting a
 loosening of the restrictions wrong is big, so it should be handled with
 care.
 
 Allan Haldane and myself have been looking into this separately and
 discussing some of the details over at github, and we both think that
 the only true limitation that has to be imposed is that the offsets of
 Python objects within the new and old dtypes remain compatible. I have
 expanded Allan's work from here:
 
 https://github.com/ahaldane/numpy/commit/e9ca367
 
 to make it as flexible as I have been able. An implementation of the
 algorithm in Python, with a few tests, can be found here:
 
 https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py
 
 I would appreciate getting some eyes on it for correctness, and to make
 sure that it won't break with some weird dtype.
 
 I am also trying to figure out what the ground rules for stride and
 shape conversions when taking a view with a different dtype should be. I
 submitted a PR (gh-5508) a couple for days ago working on that, but I am
 not so sure that the logic is completely sound. Again, to get more eyes
 on it, I am going to reproduce my thoughts here on the hope of getting
 some feedback.
 
 The objective would be to always allow a view of a different dtype
 (given that the dtypes be compatible as described above) to be taken if:
 
   * The itemsize of the dtype doesn't change.
   * The itemsize changes, but the array being viewed is the result of
 slicing and transposing axes of a contiguous array, and it is still
 contiguous, defined as stride == dtype.itemsize, along its
 smallest-strided dimension, and the itemsize of the newtype exactly
 divides the size of that dimension.
   * Ideally taking a view should be a reversible process, i.e. if
 oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype)
 should give you back a view of arr with the same original shape,
 strides and dtype.
 
 This last point can get tricky if the minimal stride dimension has size
 1, as there could be several of those, e.g.:
 
  a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1)
  a.flags.contiguous
 False
  a.shape
 (3, 1, 2)
  a.strides  # the stride of the size 1 dimension could be
 anything, ignore it!
 (32, 8, 8)
 
 b = a.view(complex)  # this fails right now, but should work
  b.flags.contiguous
 False
  b.shape
 (3, 1, 1)
  b.strides  # the stride of the size 1 dimensions could be
 anything, ignore them!
 (32, 16, 16)
 
 c = b.view(float)  # which of the two size 1 dimensions should we
 expand?
 
 
 In the face of ambiguity refuse the temptation to guess dictates that
 last view should raise an error, unless we agree and document some
 default. Any thoughts?
 
 Then there is the endless complication one could get into with arrays
 created with as_strided. I'm not smart enough to figure when and when
 not those could work, but am willing to retake the discussion if someone
 wiser si interested.
 
 With all these in mind, my proposal for the new behavior is that taking
 a view of an array with a different dtype would require:
 
  1. That the newtype and oldtype be compatible, as defined by the
 algorithm checking object offsets linked above.
  2. If newtype.itemsize == oldtype.itemsize no more checks are needed,
 make it happen!
  3. If the array is C/Fortran contiguous, check that the size in bytes
 of the last/first dimension is evenly divided by newtype.itemsize.
 

[Numpy-discussion] Views of a different dtype

2015-01-28 Thread Jaime Fernández del Río
HI all,

There has been some recent discussion going on on the limitations that
numpy imposes to taking views of an array with a different dtype.

As of right now, you can basically only take a view of an array if it has
no Python objects and neither the old nor the new dtype are structured.
Furthermore, the array has to be either C or Fortran contiguous.

This seem to be way too strict, but the potential for disaster getting a
loosening of the restrictions wrong is big, so it should be handled with
care.

Allan Haldane and myself have been looking into this separately and
discussing some of the details over at github, and we both think that the
only true limitation that has to be imposed is that the offsets of Python
objects within the new and old dtypes remain compatible. I have expanded
Allan's work from here:

https://github.com/ahaldane/numpy/commit/e9ca367

to make it as flexible as I have been able. An implementation of the
algorithm in Python, with a few tests, can be found here:

https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py

I would appreciate getting some eyes on it for correctness, and to make
sure that it won't break with some weird dtype.

I am also trying to figure out what the ground rules for stride and shape
conversions when taking a view with a different dtype should be. I
submitted a PR (gh-5508) a couple for days ago working on that, but I am
not so sure that the logic is completely sound. Again, to get more eyes on
it, I am going to reproduce my thoughts here on the hope of getting some
feedback.

The objective would be to always allow a view of a different dtype (given
that the dtypes be compatible as described above) to be taken if:

   - The itemsize of the dtype doesn't change.
   - The itemsize changes, but the array being viewed is the result of
   slicing and transposing axes of a contiguous array, and it is still
   contiguous, defined as stride == dtype.itemsize, along its smallest-strided
   dimension, and the itemsize of the newtype exactly divides the size of that
   dimension.
   - Ideally taking a view should be a reversible process, i.e. if oldtype
   = arr.dtype, then doing arr.view(newtype).view(oldtype) should give you
   back a view of arr with the same original shape, strides and dtype.

This last point can get tricky if the minimal stride dimension has size 1,
as there could be several of those, e.g.:

 a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1)
 a.flags.contiguous
False
 a.shape
(3, 1, 2)
 a.strides  # the stride of the size 1 dimension could be anything,
ignore it!
(32, 8, 8)

b = a.view(complex)  # this fails right now, but should work
 b.flags.contiguous
False
 b.shape
(3, 1, 1)
 b.strides  # the stride of the size 1 dimensions could be anything,
ignore them!
(32, 16, 16)

c = b.view(float)  # which of the two size 1 dimensions should we expand?


In the face of ambiguity refuse the temptation to guess dictates that
last view should raise an error, unless we agree and document some default.
Any thoughts?

Then there is the endless complication one could get into with arrays
created with as_strided. I'm not smart enough to figure when and when not
those could work, but am willing to retake the discussion if someone wiser
si interested.

With all these in mind, my proposal for the new behavior is that taking a
view of an array with a different dtype would require:

   1. That the newtype and oldtype be compatible, as defined by the
   algorithm checking object offsets linked above.
   2. If newtype.itemsize == oldtype.itemsize no more checks are needed,
   make it happen!
   3. If the array is C/Fortran contiguous, check that the size in bytes of
   the last/first dimension is evenly divided by newtype.itemsize. If it does,
   go for it.
   4. For non-contiguous arrays:
  1. Ignoring dimensions of size 1, check that no stride is smaller
  than either oldtype.itemsize or newtype.itemsize. If any is found this is
  an as_strided product, sorry, can't do it!
  2. Ignoring dimensions of size 1, find a contiguous dimension, i.e.
  stride == oldtype.itemsize
 1. If found, check that it is the only one with that stride, that
 it is the minimal stride, and that the size in bytes of that
dimension is
 evenly divided by newitem,itemsize.
 2. If none is found, check if there is a size 1 dimension that is
 also unique (unless we agree on a default, as mentioned
above) and that
 newtype.itemsize evenly divides oldtype.itemsize.

Apologies for the long, dense content, but any thought or comments are very
welcome.

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion