Re: [Numpy-discussion] numpy.trapz() doesn't respect subclass

2010-03-29 Thread Bruce Southey
On 03/27/2010 01:31 PM, Ryan May wrote:
 On Sat, Mar 27, 2010 at 11:12 AM,josef.p...@gmail.com  wrote:

 On Sat, Mar 27, 2010 at 1:00 PM, Ryan Mayrma...@gmail.com  wrote:
  
 On Mon, Mar 22, 2010 at 8:14 AM, Ryan Mayrma...@gmail.com  wrote:

 On Sun, Mar 21, 2010 at 11:57 PM,josef.p...@gmail.com  wrote:
  
 On Mon, Mar 22, 2010 at 12:49 AM, Ryan Mayrma...@gmail.com  wrote:

 Hi,

 I found that trapz() doesn't work with subclasses:

 http://projects.scipy.org/numpy/ticket/1438

 A simple patch (attached) to change asarray() to asanyarray() fixes
 the problem fine.
  
 Are you sure this function works with matrices and other subclasses?

 Looking only very briefly at it: the multiplication might be a problem.

 Correct, it probably *is* a problem in some cases with matrices.  In
 this case, I was using quantities (Darren Dale's unit-aware array
 package), and the result was that units were stripped off.

 The patch can't make trapz() work with all subclasses. However, right
 now, you have *no* hope of getting a subclass out of trapz().  With
 this change, subclasses that don't redefine operators can work fine.
 If you're passing a Matrix to trapz() and expecting it to work, IMHO
 you're doing it wrong.  You can still pass one in by using asarray()
 yourself.  Without this patch, I'm left with copying and maintaining a
 copy of the code elsewhere, just so I can loosen the function's input
 processing. That seems wrong, since there's really no need in my case
 to drop down to an ndarray. The input I'm giving it supports all the
 operations it needs, so it should just work with my original input.
  
 With asarray it gives correct results for matrices and all array_like
 and subclasses, it just doesn't preserve the type.
 Your patch would break matrices and possibly other types, masked_arrays?, ...
  
 It would break matrices, yes.  I would argue that masked arrays are
 already broken with trapz:

 In [1]: x = np.arange(10)

 In [2]: y = x * x

 In [3]: np.trapz(y, x)
 Out[3]: 244.5

 In [4]: ym = np.ma.array(y, mask=(x4)(x7))

 In [5]: np.trapz(ym, x)
 Out[5]: 244.5

 In [6]: y[5:7] = 0

 In [7]: ym = np.ma.array(y, mask=(x4)(x7))

 In [8]: np.trapz(ym, x)
 Out[8]: 183.5

 Because of the call to asarray(), the mask is completely discarded and
 you end up with identical results to an unmasked array,
 which is not what I'd expect.  Worse, the actual numeric value of the
 positions that were masked affect the final answer. My patch allows
 this to work as expected too.

Actually you should assume that unless it is explicitly addressed 
(either by code or via a test), any subclass of ndarray (matrix, masked, 
structured, record and even sparse) may not provide a 'valid' answer.  
There are probably many numpy functions that only really work with the 
standard ndarray. Most of the time people do not meet these with the 
subclasses or have workarounds so there has been little requirement to 
address this especially due to the added overhead needed for checking.

Also, any patch that does not explicitly define the assumed behavior 
with points that are masked  has to be rejected. It is not even clear 
what the expected behavior is for masked arrays should be:
Is it even valid for trapz to be integrating across the full range if 
there are missing points? That implies some assumption about the missing 
points.
If is valid, then should you just ignore the masked values or try to 
predict the missing values first? Perhaps you may want to have the 
option to do both.


 One solution would be using arraywrap as in numpy.linalg.
  
 By arraywrap, I'm assuming you mean:

 def _makearray(a):
  new = asarray(a)
  wrap = getattr(a, __array_prepare__, new.__array_wrap__)
  return new, wrap

 I'm not sure if that's identical to just letting the subclass handle
 what's needed.  To my eyes, that doesn't look as though it'd be
 equivalent, both for handling masked arrays and Quantities. For
 quantities at least, the result of trapz will have different units
 than either of the inputs.


 for related discussion:
 http://mail.scipy.org/pipermail/scipy-dev/2009-June/012061.html
  
 Actually, that discussion kind of makes my point.  Matrices are a pain
 to make work in a general sense because they *break* ndarray
 conventions--to me it doesn't make sense to help along classes that
 break convention at the expense of making well-behaved classes a pain
 to use.  You should need an *explicit* cast of a matrix to an ndarray
 instead of the function quietly doing it for you. (Explicit is better
 than implicit) It just seems absurd that if I make my own ndarray
 subclass that *just* adds some behavior to the array, but doesn't
 break *any* operations, I need to do one of the following:

 1) Have my own copy of trapz that works with my class
 2) Wrap every call to numpy's own trapz() to put the metadata back.

 Does it not seem backwards that 

Re: [Numpy-discussion] numpy.trapz() doesn't respect subclass

2010-03-29 Thread Ryan May
On Mon, Mar 29, 2010 at 8:00 AM, Bruce Southey bsout...@gmail.com wrote:
 On 03/27/2010 01:31 PM, Ryan May wrote:
 Because of the call to asarray(), the mask is completely discarded and
 you end up with identical results to an unmasked array,
 which is not what I'd expect.  Worse, the actual numeric value of the
 positions that were masked affect the final answer. My patch allows
 this to work as expected too.

 Actually you should assume that unless it is explicitly addressed
 (either by code or via a test), any subclass of ndarray (matrix, masked,
 structured, record and even sparse) may not provide a 'valid' answer.
 There are probably many numpy functions that only really work with the
 standard ndarray. Most of the time people do not meet these with the
 subclasses or have workarounds so there has been little requirement to
 address this especially due to the added overhead needed for checking.

It's not that I'm surprised that masked arrays don't work. It's more
that the calls to np.asarray within trapz() have been held up as being
necessary for things like matrices and (at the time) masked arrays to
work properly; as if calling asarray() is supposed to make all
subclasses work, though at a base level by dropping to an ndarray. To
me, the current behavior with masked arrays is worse than if passing
in a matrix raised an exception.  One is a silently wrong answer, the
other is a big error that the programmer can see, test, and fix.

 Also, any patch that does not explicitly define the assumed behavior
 with points that are masked  has to be rejected. It is not even clear
 what the expected behavior is for masked arrays should be:
 Is it even valid for trapz to be integrating across the full range if
 there are missing points? That implies some assumption about the missing
 points.
 If is valid, then should you just ignore the masked values or try to
 predict the missing values first? Perhaps you may want to have the
 option to do both.

You're right, it doesn't actually work with MaskedArrays as it stand
right now, because it calls add.reduce() directly instead of using the
array.sum() method. Once fixed, by allowing MaskedArray to handle the
operation, you end up not integrating over the masked region. Any
operation involving masked points results in contributions by masked
points are ignored.  I guess it's as if you assumed the function was 0
over the masked region.  If you wanted to ignore the masked points,
but integrate over the region (making a really big trapezoid over that
region), you could just pass in the .compressed() versions of the
arrays.

 than implicit) It just seems absurd that if I make my own ndarray
 subclass that *just* adds some behavior to the array, but doesn't
 break *any* operations, I need to do one of the following:

 1) Have my own copy of trapz that works with my class
 2) Wrap every call to numpy's own trapz() to put the metadata back.

 Does it not seem backwards that the class that breaks conventions
 just works while those that don't break conventions, will work
 perfectly with the function as written, need help to be treated
 properly?

 You need your own version of trapz or whatever function because it has
 the behavior that you expect. But a patch should not break numpy so you
 need to at least to have a section that looks for masked array subtypes
 and performs the desired behavior(s).

I'm not trying to be difficult but it seems like there are conflicting
ideas here: we shouldn't break numpy, which in this case means making
matrices no longer work with trapz().  On the other hand, subclasses
can do a lot of things, so there's no real expectation that they
should ever work with numpy functions in general.  Am I missing
something here? I'm just trying to understand what I perceive to be
some inconsistencies in numpy's behavior and, more importantly,
convention with regard subclasses.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.trapz() doesn't respect subclass

2010-03-29 Thread Ryan May
Hi,

I decided that having actual code that does what I want and keeps
backwards compatibility (and adds tests) might be better than arguing
semantics.  I've updated my patch to:

* Uses the array.sum() method instead of add.reduce to make subclasses
fully work (this was still breaking masked arrays.
* Catches an exception on doing the actual multiply and sum of the
arrays and tries again after casting to ndarrays.  This allows any
subclasses that relied on being cast to still work.
* Adds tests that ensure matrices work (test passes before and after
changes to trapz()) and adds a test for masked arrays that checks that
masked points are treated as expected. In this case, expected is
defined to be the same as if you implemented the trapezoidal method by
hand using MaskedArray's basic arithmetic operations.

Attached here and at: http://projects.scipy.org/numpy/ticket/1438

I think this addresses the concerns that were raised about the changes
for subclasses in this case. Let me know if I've missed something (or
if there's no way in hell any such patch will ever be committed).

Thanks,

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.trapz() doesn't respect subclass

2010-03-29 Thread Bruce Southey
On 03/29/2010 10:17 AM, Ryan May wrote:
 On Mon, Mar 29, 2010 at 8:00 AM, Bruce Southeybsout...@gmail.com  wrote:

 On 03/27/2010 01:31 PM, Ryan May wrote:
  
 Because of the call to asarray(), the mask is completely discarded and
 you end up with identical results to an unmasked array,
 which is not what I'd expect.  Worse, the actual numeric value of the
 positions that were masked affect the final answer. My patch allows
 this to work as expected too.


 Actually you should assume that unless it is explicitly addressed
 (either by code or via a test), any subclass of ndarray (matrix, masked,
 structured, record and even sparse) may not provide a 'valid' answer.
 There are probably many numpy functions that only really work with the
 standard ndarray. Most of the time people do not meet these with the
 subclasses or have workarounds so there has been little requirement to
 address this especially due to the added overhead needed for checking.
  
 It's not that I'm surprised that masked arrays don't work. It's more
 that the calls to np.asarray within trapz() have been held up as being
 necessary for things like matrices and (at the time) masked arrays to
 work properly; as if calling asarray() is supposed to make all
 subclasses work, though at a base level by dropping to an ndarray. To
 me, the current behavior with masked arrays is worse than if passing
 in a matrix raised an exception.  One is a silently wrong answer, the
 other is a big error that the programmer can see, test, and fix.


 Also, any patch that does not explicitly define the assumed behavior
 with points that are masked  has to be rejected. It is not even clear
 what the expected behavior is for masked arrays should be:
 Is it even valid for trapz to be integrating across the full range if
 there are missing points? That implies some assumption about the missing
 points.
 If is valid, then should you just ignore the masked values or try to
 predict the missing values first? Perhaps you may want to have the
 option to do both.
  
 You're right, it doesn't actually work with MaskedArrays as it stand
 right now, because it calls add.reduce() directly instead of using the
 array.sum() method. Once fixed, by allowing MaskedArray to handle the
 operation, you end up not integrating over the masked region. Any
 operation involving masked points results in contributions by masked
 points are ignored.  I guess it's as if you assumed the function was 0
 over the masked region.  If you wanted to ignore the masked points,
 but integrate over the region (making a really big trapezoid over that
 region), you could just pass in the .compressed() versions of the
 arrays.


 than implicit) It just seems absurd that if I make my own ndarray
 subclass that *just* adds some behavior to the array, but doesn't
 break *any* operations, I need to do one of the following:

 1) Have my own copy of trapz that works with my class
 2) Wrap every call to numpy's own trapz() to put the metadata back.

 Does it not seem backwards that the class that breaks conventions
 just works while those that don't break conventions, will work
 perfectly with the function as written, need help to be treated
 properly?


 You need your own version of trapz or whatever function because it has
 the behavior that you expect. But a patch should not break numpy so you
 need to at least to have a section that looks for masked array subtypes
 and performs the desired behavior(s).
  
 I'm not trying to be difficult but it seems like there are conflicting
 ideas here: we shouldn't break numpy, which in this case means making
 matrices no longer work with trapz().  On the other hand, subclasses
 can do a lot of things, so there's no real expectation that they
 should ever work with numpy functions in general.  Am I missing
 something here? I'm just trying to understand what I perceive to be
 some inconsistencies in numpy's behavior and, more importantly,
 convention with regard subclasses.

 Ryan


You should not confuse class functions with normal Python functions. 
Functions that are inherited from the ndarray superclass should be the 
same in the subclass unless these class functions have been modified. 
However many functions like trapz are not part of the ndarray superclass 
that have been written to handle the standard array (i.e. the unmodified 
ndarray superclass) but these may or may not work for all ndarray 
subclasses.  Other functions have been written to handle the specific 
ndarray subclass such as masked array or Matrix and may (but not 
guaranteed to) work for the standard array. Thus, I think your 
'inconsistencies' relate to the simple fact that not all numpy functions 
are aware of ndarray subclasses.

What is missing are the bug reports and solutions for these functions 
that occur when the expected behavior differs between the ndarray 
superclass and ndarray subclasses.  In the case of trapz, the bug report 
needs to be 

Re: [Numpy-discussion] numpy.trapz() doesn't respect subclass

2010-03-28 Thread Darren Dale
On Sat, Mar 27, 2010 at 10:23 PM,  josef.p...@gmail.com wrote:
 subclasses of ndarray, like masked_arrays and quantities, and classes
 that delegate to array calculations, like pandas, can redefine
 anything. So there is not much that can be relied on if any subclass
 is allowed to be used inside a function

 e.g. quantities redefines sin, cos,...
 http://packages.python.org/quantities/user/issues.html#umath-functions

Those functions were only intended to be used in the short term, until
the ufuncs that ship with numpy included a mechanism that allowed
quantity arrays to propagate the units. It would be nice to have a
mechanism (like we have discussed briefly just recently on this list)
where there is a single entry point to a given function like add, but
subclasses can tweak the execution.

We discussed the possibility of simplifying the wrapping scheme with a
method like __handle_gfunc__. (I don't think this necessarily has to
be limited to ufuncs.) I think a second method like __prepare_input__
is also necessary. Imagine something like:

class GenericFunction:
@property
def executable(self):
return self._executable
def __init__(self, executable):
self._executable = executable
def __call__(self, *args, **kwargs):
# find the input with highest priority, and then:
args, kwargs = input.__prepare_input__(self, *args, **kwargs)
return input.__handle_gfunc__(self, *args, **kwargs)

# this is the core function to be passed to the generic class:
def _add(a, b, out=None):
# the generic, ndarray implementation.
...

# here is the publicly exposed interface:
add = GenericFunction(_add)

# now my subclasses
class MyArray(ndarray):
# My class tweaks the execution of the function in __handle_gfunc__
def __prepare_input__(self, gfunc, *args, **kwargs):
return mod_input[gfunc](*args, **kwargs)
def __handle_gfunc__(self, gfunc, *args, **kwargs):
res = gfunc.executable(*args, **kwargs)
# you could have called a different core func there
return mod_output[gfunc](res, *args, **kwargs)

class MyNextArray(MyArray):
def __prepare_input__(self, gfunc, *args, **kwargs):
# let the superclass do its thing:
args, kwargs = MyArray.__prepare_input__(self, gfunc, *args, **kwargs)
# now I can tweak it further:
return mod_input_further[gfunc](*args, **kwargs)
def __handle_gfunc__(self, gfunc, *args, **kwargs):
# let's defer to the superclass to handle calling the core function:
res = MyArray.__handle_gfunc__(self, gfunc, *args, **kwargs)
# and now we have one more crack at the result before passing it back:
return mod_output_further[gfunc](res, *args, **kwargs)

If a gfunc is not recognized, the subclass might raise a
NotImplementedError or it might just pass the original args, kwargs on
through. I didn't write that part out because the example was already
running long. But the point is that a single entry point could be used
for any subclass, without having to worry about how to support every
subclass.

Darren
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.trapz() doesn't respect subclass

2010-03-27 Thread Ryan May
On Mon, Mar 22, 2010 at 8:14 AM, Ryan May rma...@gmail.com wrote:
 On Sun, Mar 21, 2010 at 11:57 PM,  josef.p...@gmail.com wrote:
 On Mon, Mar 22, 2010 at 12:49 AM, Ryan May rma...@gmail.com wrote:
 Hi,

 I found that trapz() doesn't work with subclasses:

 http://projects.scipy.org/numpy/ticket/1438

 A simple patch (attached) to change asarray() to asanyarray() fixes
 the problem fine.

 Are you sure this function works with matrices and other subclasses?

 Looking only very briefly at it: the multiplication might be a problem.

 Correct, it probably *is* a problem in some cases with matrices.  In
 this case, I was using quantities (Darren Dale's unit-aware array
 package), and the result was that units were stripped off.

 The patch can't make trapz() work with all subclasses. However, right
 now, you have *no* hope of getting a subclass out of trapz().  With
 this change, subclasses that don't redefine operators can work fine.
 If you're passing a Matrix to trapz() and expecting it to work, IMHO
 you're doing it wrong.  You can still pass one in by using asarray()
 yourself.  Without this patch, I'm left with copying and maintaining a
 copy of the code elsewhere, just so I can loosen the function's input
 processing. That seems wrong, since there's really no need in my case
 to drop down to an ndarray. The input I'm giving it supports all the
 operations it needs, so it should just work with my original input.

Anyone else care to weigh in here?

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.trapz() doesn't respect subclass

2010-03-27 Thread josef . pktd
On Sat, Mar 27, 2010 at 1:00 PM, Ryan May rma...@gmail.com wrote:
 On Mon, Mar 22, 2010 at 8:14 AM, Ryan May rma...@gmail.com wrote:
 On Sun, Mar 21, 2010 at 11:57 PM,  josef.p...@gmail.com wrote:
 On Mon, Mar 22, 2010 at 12:49 AM, Ryan May rma...@gmail.com wrote:
 Hi,

 I found that trapz() doesn't work with subclasses:

 http://projects.scipy.org/numpy/ticket/1438

 A simple patch (attached) to change asarray() to asanyarray() fixes
 the problem fine.

 Are you sure this function works with matrices and other subclasses?

 Looking only very briefly at it: the multiplication might be a problem.

 Correct, it probably *is* a problem in some cases with matrices.  In
 this case, I was using quantities (Darren Dale's unit-aware array
 package), and the result was that units were stripped off.

 The patch can't make trapz() work with all subclasses. However, right
 now, you have *no* hope of getting a subclass out of trapz().  With
 this change, subclasses that don't redefine operators can work fine.
 If you're passing a Matrix to trapz() and expecting it to work, IMHO
 you're doing it wrong.  You can still pass one in by using asarray()
 yourself.  Without this patch, I'm left with copying and maintaining a
 copy of the code elsewhere, just so I can loosen the function's input
 processing. That seems wrong, since there's really no need in my case
 to drop down to an ndarray. The input I'm giving it supports all the
 operations it needs, so it should just work with my original input.

With asarray it gives correct results for matrices and all array_like
and subclasses, it just doesn't preserve the type.
Your patch would break matrices and possibly other types, masked_arrays?, ...

One solution would be using arraywrap as in numpy.linalg.

for related discussion:
http://mail.scipy.org/pipermail/scipy-dev/2009-June/012061.html

Josef


 Anyone else care to weigh in here?

 Ryan

 --
 Ryan May
 Graduate Research Assistant
 School of Meteorology
 University of Oklahoma
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.trapz() doesn't respect subclass

2010-03-27 Thread Ryan May
On Sat, Mar 27, 2010 at 11:12 AM,  josef.p...@gmail.com wrote:
 On Sat, Mar 27, 2010 at 1:00 PM, Ryan May rma...@gmail.com wrote:
 On Mon, Mar 22, 2010 at 8:14 AM, Ryan May rma...@gmail.com wrote:
 On Sun, Mar 21, 2010 at 11:57 PM,  josef.p...@gmail.com wrote:
 On Mon, Mar 22, 2010 at 12:49 AM, Ryan May rma...@gmail.com wrote:
 Hi,

 I found that trapz() doesn't work with subclasses:

 http://projects.scipy.org/numpy/ticket/1438

 A simple patch (attached) to change asarray() to asanyarray() fixes
 the problem fine.

 Are you sure this function works with matrices and other subclasses?

 Looking only very briefly at it: the multiplication might be a problem.

 Correct, it probably *is* a problem in some cases with matrices.  In
 this case, I was using quantities (Darren Dale's unit-aware array
 package), and the result was that units were stripped off.

 The patch can't make trapz() work with all subclasses. However, right
 now, you have *no* hope of getting a subclass out of trapz().  With
 this change, subclasses that don't redefine operators can work fine.
 If you're passing a Matrix to trapz() and expecting it to work, IMHO
 you're doing it wrong.  You can still pass one in by using asarray()
 yourself.  Without this patch, I'm left with copying and maintaining a
 copy of the code elsewhere, just so I can loosen the function's input
 processing. That seems wrong, since there's really no need in my case
 to drop down to an ndarray. The input I'm giving it supports all the
 operations it needs, so it should just work with my original input.

 With asarray it gives correct results for matrices and all array_like
 and subclasses, it just doesn't preserve the type.
 Your patch would break matrices and possibly other types, masked_arrays?, ...

It would break matrices, yes.  I would argue that masked arrays are
already broken with trapz:

In [1]: x = np.arange(10)

In [2]: y = x * x

In [3]: np.trapz(y, x)
Out[3]: 244.5

In [4]: ym = np.ma.array(y, mask=(x4)(x7))

In [5]: np.trapz(ym, x)
Out[5]: 244.5

In [6]: y[5:7] = 0

In [7]: ym = np.ma.array(y, mask=(x4)(x7))

In [8]: np.trapz(ym, x)
Out[8]: 183.5

Because of the call to asarray(), the mask is completely discarded and
you end up with identical results to an unmasked array,
which is not what I'd expect.  Worse, the actual numeric value of the
positions that were masked affect the final answer. My patch allows
this to work as expected too.

 One solution would be using arraywrap as in numpy.linalg.

By arraywrap, I'm assuming you mean:

def _makearray(a):
new = asarray(a)
wrap = getattr(a, __array_prepare__, new.__array_wrap__)
return new, wrap

I'm not sure if that's identical to just letting the subclass handle
what's needed.  To my eyes, that doesn't look as though it'd be
equivalent, both for handling masked arrays and Quantities. For
quantities at least, the result of trapz will have different units
than either of the inputs.

 for related discussion:
 http://mail.scipy.org/pipermail/scipy-dev/2009-June/012061.html

Actually, that discussion kind of makes my point.  Matrices are a pain
to make work in a general sense because they *break* ndarray
conventions--to me it doesn't make sense to help along classes that
break convention at the expense of making well-behaved classes a pain
to use.  You should need an *explicit* cast of a matrix to an ndarray
instead of the function quietly doing it for you. (Explicit is better
than implicit) It just seems absurd that if I make my own ndarray
subclass that *just* adds some behavior to the array, but doesn't
break *any* operations, I need to do one of the following:

1) Have my own copy of trapz that works with my class
2) Wrap every call to numpy's own trapz() to put the metadata back.

Does it not seem backwards that the class that breaks conventions
just works while those that don't break conventions, will work
perfectly with the function as written, need help to be treated
properly?

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.trapz() doesn't respect subclass

2010-03-27 Thread josef . pktd
On Sat, Mar 27, 2010 at 2:31 PM, Ryan May rma...@gmail.com wrote:
 On Sat, Mar 27, 2010 at 11:12 AM,  josef.p...@gmail.com wrote:
 On Sat, Mar 27, 2010 at 1:00 PM, Ryan May rma...@gmail.com wrote:
 On Mon, Mar 22, 2010 at 8:14 AM, Ryan May rma...@gmail.com wrote:
 On Sun, Mar 21, 2010 at 11:57 PM,  josef.p...@gmail.com wrote:
 On Mon, Mar 22, 2010 at 12:49 AM, Ryan May rma...@gmail.com wrote:
 Hi,

 I found that trapz() doesn't work with subclasses:

 http://projects.scipy.org/numpy/ticket/1438

 A simple patch (attached) to change asarray() to asanyarray() fixes
 the problem fine.

 Are you sure this function works with matrices and other subclasses?

 Looking only very briefly at it: the multiplication might be a problem.

 Correct, it probably *is* a problem in some cases with matrices.  In
 this case, I was using quantities (Darren Dale's unit-aware array
 package), and the result was that units were stripped off.

 The patch can't make trapz() work with all subclasses. However, right
 now, you have *no* hope of getting a subclass out of trapz().  With
 this change, subclasses that don't redefine operators can work fine.
 If you're passing a Matrix to trapz() and expecting it to work, IMHO
 you're doing it wrong.  You can still pass one in by using asarray()
 yourself.  Without this patch, I'm left with copying and maintaining a
 copy of the code elsewhere, just so I can loosen the function's input
 processing. That seems wrong, since there's really no need in my case
 to drop down to an ndarray. The input I'm giving it supports all the
 operations it needs, so it should just work with my original input.

 With asarray it gives correct results for matrices and all array_like
 and subclasses, it just doesn't preserve the type.
 Your patch would break matrices and possibly other types, masked_arrays?, ...

 It would break matrices, yes.  I would argue that masked arrays are
 already broken with trapz:

 In [1]: x = np.arange(10)

 In [2]: y = x * x

 In [3]: np.trapz(y, x)
 Out[3]: 244.5

 In [4]: ym = np.ma.array(y, mask=(x4)(x7))

 In [5]: np.trapz(ym, x)
 Out[5]: 244.5

 In [6]: y[5:7] = 0

 In [7]: ym = np.ma.array(y, mask=(x4)(x7))

 In [8]: np.trapz(ym, x)
 Out[8]: 183.5

 Because of the call to asarray(), the mask is completely discarded and
 you end up with identical results to an unmasked array,
 which is not what I'd expect.  Worse, the actual numeric value of the
 positions that were masked affect the final answer. My patch allows
 this to work as expected too.

 One solution would be using arraywrap as in numpy.linalg.

 By arraywrap, I'm assuming you mean:

 def _makearray(a):
    new = asarray(a)
    wrap = getattr(a, __array_prepare__, new.__array_wrap__)
    return new, wrap

 I'm not sure if that's identical to just letting the subclass handle
 what's needed.  To my eyes, that doesn't look as though it'd be
 equivalent, both for handling masked arrays and Quantities. For
 quantities at least, the result of trapz will have different units
 than either of the inputs.

 for related discussion:
 http://mail.scipy.org/pipermail/scipy-dev/2009-June/012061.html

 Actually, that discussion kind of makes my point.  Matrices are a pain
 to make work in a general sense because they *break* ndarray
 conventions--to me it doesn't make sense to help along classes that
 break convention at the expense of making well-behaved classes a pain
 to use.  You should need an *explicit* cast of a matrix to an ndarray
 instead of the function quietly doing it for you. (Explicit is better
 than implicit) It just seems absurd that if I make my own ndarray
 subclass that *just* adds some behavior to the array, but doesn't
 break *any* operations, I need to do one of the following:

 1) Have my own copy of trapz that works with my class
 2) Wrap every call to numpy's own trapz() to put the metadata back.

 Does it not seem backwards that the class that breaks conventions
 just works while those that don't break conventions, will work
 perfectly with the function as written, need help to be treated
 properly?

(trying again, firefox or gmail deleted my earlier response instead of
sending it)

Matrices have been part of numpy for a long time and your patch would
break backwards compatibility in a pretty serious way.

subclasses of ndarray, like masked_arrays and quantities, and classes
that delegate to array calculations, like pandas, can redefine
anything. So there is not much that can be relied on if any subclass
is allowed to be used inside a function

e.g. quantities redefines sin, cos,...
http://packages.python.org/quantities/user/issues.html#umath-functions
What happens if you call fft with a quantity array?

For ndarrays  I know that  (x*y).sum(0) is equivalent to np.dot(x.T,y)
 for appropriate y and x (e.g. for a faster calculation of means), so,
it's just an implementation detail what I use. If a subclass redefines
ufuncs but not linalg, then these two can be different. Just a
thought, what happens if an 

Re: [Numpy-discussion] numpy.trapz() doesn't respect subclass

2010-03-27 Thread Ryan May
On Sat, Mar 27, 2010 at 8:23 PM,  josef.p...@gmail.com wrote:
 Matrices have been part of numpy for a long time and your patch would
 break backwards compatibility in a pretty serious way.

Yeah, and I should admit that I realize that makes this particular
patch a no-go. However, that to me doesn't put the issue to bed for
any future code that gets written (see below).

 subclasses of ndarray, like masked_arrays and quantities, and classes
 that delegate to array calculations, like pandas, can redefine
 anything. So there is not much that can be relied on if any subclass
 is allowed to be used inside a function

 e.g. quantities redefines sin, cos,...
 http://packages.python.org/quantities/user/issues.html#umath-functions
 What happens if you call fft with a quantity array?

Probably ends up casting to an ndarray. But that's a complex operation
that I can live with not working. It's coded in C and can't be
implemented quickly using array methods. And in this

 Except for simple functions and ufuncs, it would be a lot of work and
 fragile to allow asanyarray. And, as we discussed in a recent thread
 on masked arrays (and histogram), it would push the work on the
 function writer instead of the ones that are creating new subclasses.

I disagree in this case.  I think the function writer should only be
burdened to try to use array methods rather than numpy functions, if
possible, and avoiding casts other than asanyarray() at all costs.  I
think we shouldn't be scared of getting an error when a subclass is
passed to a function, because that's an indication to the programmer
that it doesn't work with what you're passing in and you need to
*explicitly* cast it to an ndarray. Having the function do the cast
for you is: 1) magical and implicit 2) Forces an unnecessary cast on
those who would otherwise work fine. I get errors when I try to pass
structured arrays to math functions, but I don't see numpy casting
that away.

 Of course, the behavior in numpy and scipy can be improved, and trapz
 may be simple enough to change, but I don't think a patch that breaks
 backwards compatibility pretty seriously and is not accompanied by
 sufficient tests should go into numpy or scipy.

If sufficient tests is the only thing holding this back, let me know.
I'll get to coding.

But I can't argue with the backwards incompatibility. At this point, I
think I'm more trying to see if there's any agreement that: casting
*everyone* because some class breaks behavior is a bad idea.  The
programmer can always make it work by explicitly asking for the cast,
but there's no way for the programmer to ask the function *not* to
cast the data. Hell, I'd be happy if trapz took a flag just telling it
subok=True.

 (On the other hand, I'm very slowly getting used to the pattern that
 for a simple function, 10% is calculation and 90% is interface code.)

Yeah, it's kind of annoying, since the 10% is the cool part you want,
and that 90% is thorny to design and boring to code.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.trapz() doesn't respect subclass

2010-03-27 Thread josef . pktd
On Sat, Mar 27, 2010 at 11:37 PM, Ryan May rma...@gmail.com wrote:
 On Sat, Mar 27, 2010 at 8:23 PM,  josef.p...@gmail.com wrote:
 Matrices have been part of numpy for a long time and your patch would
 break backwards compatibility in a pretty serious way.

 Yeah, and I should admit that I realize that makes this particular
 patch a no-go. However, that to me doesn't put the issue to bed for
 any future code that gets written (see below).

 subclasses of ndarray, like masked_arrays and quantities, and classes
 that delegate to array calculations, like pandas, can redefine
 anything. So there is not much that can be relied on if any subclass
 is allowed to be used inside a function

 e.g. quantities redefines sin, cos,...
 http://packages.python.org/quantities/user/issues.html#umath-functions
 What happens if you call fft with a quantity array?

 Probably ends up casting to an ndarray. But that's a complex operation
 that I can live with not working. It's coded in C and can't be
 implemented quickly using array methods. And in this

 Except for simple functions and ufuncs, it would be a lot of work and
 fragile to allow asanyarray. And, as we discussed in a recent thread
 on masked arrays (and histogram), it would push the work on the
 function writer instead of the ones that are creating new subclasses.

 I disagree in this case.  I think the function writer should only be
 burdened to try to use array methods rather than numpy functions, if
 possible, and avoiding casts other than asanyarray() at all costs.  I
 think we shouldn't be scared of getting an error when a subclass is
 passed to a function, because that's an indication to the programmer
 that it doesn't work with what you're passing in and you need to
 *explicitly* cast it to an ndarray. Having the function do the cast
 for you is: 1) magical and implicit 2) Forces an unnecessary cast on
 those who would otherwise work fine. I get errors when I try to pass
 structured arrays to math functions, but I don't see numpy casting
 that away.

the problem is quality control and testing.
If the cast is automatically done, then if I feed anything array_like
into the function, I only have to pay attention that casting to
ndarray works as intended (e.g. it doesn't work with masked arrays
with inappropriate masked values).
If the casting is correct, I know that I get correct numbers back out,
even if I have to reattach the meta information or convert the type
again. With asanyarray, anything could happen, including getting
numbers back that are wrong, maybe obviously so, maybe not.
(structured arrays at least have the advantage that an exception is
thrown pretty fast.)
And from what I have seen so far, testing is not a high priority for
many users.

Overall, I think that there are very few functions that are simple
enough that asanyarray would work, without wrapping and casting (at
least internally). In scipy.stats we converted a few, but for most
functions I don't want to spend the time thinking how it can be
written so we can have the internal code use anyarray (main problem
are matrices, masked_array and arrays with nans need special code
anyway, structured dtypes are mostly useless for calculations without
creating a view)
e.g. what is the quantities outcome of a call to np.corrcoef, np.cov ?

And as long as not every function returns a consistent result, changes
in implementation details can affect the outcome, and then it will be
a lot of fun hunting for bugs. e.g. stats.linregress, dot or explicit
sum of squares or np.cov, or ... Are you sure you always get the same
result with quantities?


 Of course, the behavior in numpy and scipy can be improved, and trapz
 may be simple enough to change, but I don't think a patch that breaks
 backwards compatibility pretty seriously and is not accompanied by
 sufficient tests should go into numpy or scipy.

 If sufficient tests is the only thing holding this back, let me know.
 I'll get to coding.

 But I can't argue with the backwards incompatibility. At this point, I
 think I'm more trying to see if there's any agreement that: casting
 *everyone* because some class breaks behavior is a bad idea.  The
 programmer can always make it work by explicitly asking for the cast,
 but there's no way for the programmer to ask the function *not* to
 cast the data. Hell, I'd be happy if trapz took a flag just telling it
 subok=True.

I thought a while ago that subclasses or even classes that implement
an array_like interface should have an attribute to signal this, like
iamalgebrasafe, or subok or dontcast.

The freedom or choice not to get cast to ndarray is desirable, but the
increase in bugs and bug reports won't be much fun. And the user with
the next subclass will argue that numpy/scipy should do the casting
because it's too much wrapping code that has to be build around every
function.

(Just as a related aside, in statsmodels I'm also still trying hard to
keep the main models to use ndarrays only, either it becomes 

Re: [Numpy-discussion] numpy.trapz() doesn't respect subclass

2010-03-22 Thread Ryan May
On Sun, Mar 21, 2010 at 11:57 PM,  josef.p...@gmail.com wrote:
 On Mon, Mar 22, 2010 at 12:49 AM, Ryan May rma...@gmail.com wrote:
 Hi,

 I found that trapz() doesn't work with subclasses:

 http://projects.scipy.org/numpy/ticket/1438

 A simple patch (attached) to change asarray() to asanyarray() fixes
 the problem fine.

 Are you sure this function works with matrices and other subclasses?

 Looking only very briefly at it: the multiplication might be a problem.

Correct, it probably *is* a problem in some cases with matrices.  In
this case, I was using quantities (Darren Dale's unit-aware array
package), and the result was that units were stripped off.

The patch can't make trapz() work with all subclasses. However, right
now, you have *no* hope of getting a subclass out of trapz().  With
this change, subclasses that don't redefine operators can work fine.
If you're passing a Matrix to trapz() and expecting it to work, IMHO
you're doing it wrong.  You can still pass one in by using asarray()
yourself.  Without this patch, I'm left with copying and maintaining a
copy of the code elsewhere, just so I can loosen the function's input
processing. That seems wrong, since there's really no need in my case
to drop down to an ndarray. The input I'm giving it supports all the
operations it needs, so it should just work with my original input.

Or am I just off base here?

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.trapz() doesn't respect subclass

2010-03-21 Thread josef . pktd
On Mon, Mar 22, 2010 at 12:49 AM, Ryan May rma...@gmail.com wrote:
 Hi,

 I found that trapz() doesn't work with subclasses:

 http://projects.scipy.org/numpy/ticket/1438

 A simple patch (attached) to change asarray() to asanyarray() fixes
 the problem fine.

Are you sure this function works with matrices and other subclasses?

Looking only very briefly at it: the multiplication might be a problem.

Josef



 Ryan

 --
 Ryan May
 Graduate Research Assistant
 School of Meteorology
 University of Oklahoma

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion