[Numpy-discussion] evaluating a function in a matrix element???

2010-03-18 Thread Frank Horowitz
Dear All,

I'm working on a piece of optimisation code where it turns out to be 
mathematically convenient to have a matrix in which a few pre-chosen elements 
must be computed at evaluation time, i.e. when taking the dot product (matrix 
multiplication) of the matrix with a vector.

As I see the problem, there are two basic approaches to accomplishing this. 

First (and perhaps conceptually simplest, not to mention apparently faster) 
might be to stash appropriate functions at their corresponding locations in the 
matrix (with the rest of the matrix being constants, as usual). I mucked around 
with this for a little while in IPython, and it appears that having dtype == 
object_ works for stashing the references to the functions, but fails to allow 
me to actually evaluate the function(s) when the matrix is used in the dot 
product.

Does anyone have any experience with making such a beast work within numpy? If 
so, how?
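One workaround for the first approach (a sketch with hypothetical names, not a numpy feature): keep the object-dtype matrix with callables stashed in it, but evaluate the callables into a plain float matrix immediately before each dot product, since np.dot cannot call them itself:

```python
import numpy as np

# Object-dtype matrix: mostly constants, one element computed at call time.
M = np.empty((2, 2), dtype=object)
M[0, 0], M[0, 1] = 1.0, 2.0
M[1, 0], M[1, 1] = 3.0, np.sin      # a callable stashed in the matrix

def evaluate(M, t):
    """Replace callable elements by their value at t; return a float array."""
    out = np.empty(M.shape, dtype=float)
    for idx, elem in np.ndenumerate(M):
        out[idx] = elem(t) if callable(elem) else elem
    return out

A = evaluate(M, 0.0)                # sin(0.0) -> 0.0 in the (1, 1) slot
v = np.array([1.0, 1.0])
print(A.dot(v))                     # ordinary float dot product
```

The per-element Python loop is slow, but it only runs once per evaluation; the dot product itself stays a fast float operation.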

The second basic approach is to build a ufunc that implements the switching 
logic, and returns the constants and evaluated functions in the appropriate 
locations.  This seems to be the more doable approach, but it requires the 
ufunc to be aware of both the position of each element (via an index, or some 
such) and the arguments to the functions being evaluated at the 
matrix elements. It appears that frompyfunc() nearly does what I want, but I am 
currently failing to see how to actually *use* it for anything more elaborate 
than the octal example code (i.e. one value in, and one value out). Does anyone 
have any other more elaborate examples they can point me towards?
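For reference, frompyfunc is not limited to one value in and one value out: its signature is frompyfunc(func, nin, nout). A small sketch with two inputs (the function itself is a made-up example):

```python
import numpy as np

def clipped_hypot(x, y):
    # Plain Python function of two scalars.
    return min(np.hypot(x, y), 1.0)

uclip = np.frompyfunc(clipped_hypot, 2, 1)   # 2 inputs, 1 output
out = uclip(np.array([0.3, 3.0]), np.array([0.4, 4.0]))
print(out)   # object-dtype array; cast with out.astype(float) if needed
```

Note that the wrapped function never sees the index of the element it is computing; to get position-dependent behaviour one would have to pass an index array as an additional input argument.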

Thanks in advance for any help you might be able to provide!

Frank Horowitz
fr...@horow.net




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in ndarray.resize?

2010-03-18 Thread Francesc Alted
On Wednesday 17 March 2010 22:56:20, Charles R Harris wrote:
 On Wed, Mar 17, 2010 at 8:42 AM, Alan G Isaac ais...@american.edu wrote:
  On 3/17/2010 10:16 AM, josef.p...@gmail.com wrote:
   numpy.resize(a, new_shape)
   Return a new array with the specified shape.
  
   If the new array is larger than the original array, then the new array
   is filled with repeated copies of a. Note that this behavior is
   different from a.resize(new_shape) which fills with zeros instead of
   repeated copies of a.
 
  Yes indeed.  Sorry, I must have scrolled the help without realizing it,
  and this part was at the top.
 
  So my follow up: why is this desirable/necessary?  (I find it
  surprising.)
 
 IIRC, it behaved that way in Numeric.

This does not mean that this behaviour is desirable.  I find it inconsistent 
and misleading, so +1 for fixing it.
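The divergence under discussion is easy to demonstrate directly:

```python
import numpy as np

a = np.arange(4)
print(np.resize(a, 6))        # function pads with repeated copies: [0 1 2 3 0 1]

b = np.arange(4)
b.resize(6, refcheck=False)   # method resizes in place and pads with zeros
print(b)                      # [0 1 2 3 0 0]
```

(refcheck=False only avoids spurious reference-count errors when running this interactively; it does not change the fill behaviour.)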

-- 
Francesc Alted


Re: [Numpy-discussion] bug in ndarray.resize?

2010-03-18 Thread Sebastian Haase
On Wed, Mar 17, 2010 at 10:56 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Wed, Mar 17, 2010 at 8:42 AM, Alan G Isaac ais...@american.edu wrote:

 On 3/17/2010 10:16 AM, josef.p...@gmail.com wrote:
  numpy.resize(a, new_shape)
  Return a new array with the specified shape.
 
  If the new array is larger than the original array, then the new array
  is filled with repeated copies of a. Note that this behavior is
  different from a.resize(new_shape) which fills with zeros instead of
  repeated copies of a.


 Yes indeed.  Sorry, I must have scrolled the help without realizing it,
 and this part was at the top.

 So my follow up: why is this desirable/necessary?  (I find it surprising.)

 IIRC, it behaved that way in Numeric.

How would people feel about unifying the function vs. the method behavior ?
One could add an additional option like
`repeat` or `fillZero`.
One could (at first !?) keep opposite defaults so as not to change the
current behavior.
But this way it would be most visible and clear what is going on.

Regards,
Sebastian Haase


Re: [Numpy-discussion] bug in ndarray.resize?

2010-03-18 Thread Gael Varoquaux
On Thu, Mar 18, 2010 at 09:56:21AM +0100, Sebastian Haase wrote:
 How would people feel about unifying the function vs. the method behavior ?
 One could add an additional option like
 `repeat` or `fillZero`.

You mean fill_zero...

Sorry, my linter went off. :)

Gaël


[Numpy-discussion] ufunc improvements [Was: Warnings in numpy.ma.test()]

2010-03-18 Thread Darren Dale
On Wed, Mar 17, 2010 at 10:16 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Wed, Mar 17, 2010 at 7:39 PM, Darren Dale dsdal...@gmail.com wrote:

 On Wed, Mar 17, 2010 at 8:22 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
  On Wed, Mar 17, 2010 at 5:26 PM, Darren Dale dsdal...@gmail.com wrote:
 
  On Wed, Mar 17, 2010 at 5:43 PM, Charles R Harris
  charlesr.har...@gmail.com wrote:
   On Wed, Mar 17, 2010 at 3:13 PM, Darren Dale dsdal...@gmail.com
   wrote:
   On Wed, Mar 17, 2010 at 4:48 PM, Pierre GM pgmdevl...@gmail.com
   wrote:
On Mar 17, 2010, at 8:19 AM, Darren Dale wrote:
   
I started thinking about a third method called __input_prepare__
that
would be called on the way into the ufunc, which would allow you
to
intercept the input and pass a somehow modified copy back to the
ufunc. The total flow would be:
   
1) Call myufunc(x, y[, z])
2) myufunc calls ?.__input_prepare__(myufunc, x, y), which
returns
x',
y' (or simply passes through x,y by default)
3) myufunc creates the output array z (if not specified) and
calls
?.__array_prepare__(z, (myufunc, x, y, ...))
4) myufunc finally gets around to performing the calculation
5) myufunc calls ?.__array_wrap__(z, (myufunc, x, y, ...)) and
returns
the result to the caller
   
Is this general enough for your use case? I haven't tried to
think
about how to change some global state at one point and change it
back
at another, that seems like a bad idea and difficult to support.
   
   
Sounds like a good plan. If we could find a way to merge the first
two
(__input_prepare__ and __array_prepare__), that'd be ideal.
  
   I think it is better to keep them separate, so we don't have one
   method that is trying to do too much. It would be easier to explain
   in
   the documentation.
  
   I may not have much time to look into this until after Monday. Is
   there a deadline we need to consider?
  
  
   I don't think this should go into 2.0, I think it needs more thought.
 
  Now that you mention it, I agree that it would be too rushed to try to
  get it in for 2.0. Concerning a later release, is there anything in
  particular that you think needs to be clarified or reconsidered?
 
   And
   2.0 already has significant code churn. Is there any reason beyond a
   big
   hassle not to set/restore the error state around all the ufunc calls
   in
   ma?
   Beyond that, the PEP that you pointed to looks interesting. Maybe
   some
   sort
   of decorator around ufunc calls could also be made to work.
 
  I think the PEP is interesting, but it is languishing. There were some
  questions and criticisms on the mailing list that I do not think were
  satisfactorily addressed, and as far as I know the author of the PEP
  has not pursued the matter further. There was some interest on the
  python-dev mailing list in the numpy community's use case, but I think
  we need to consider what can be done now to meet the needs of ndarray
  subclasses. I don't see PEP 3124 happening in the near future.
 
  What I am proposing is a simple extension to our existing framework to
  let subclasses hook into ufuncs and customize their behavior based on
  the context of the operation (using the __array_priority__ of the
  inputs and/or outputs, and the identity of the ufunc). The steps I
  listed allow customization at the critical steps: prepare the input,
  prepare the output, populate the output (currently no proposal for
  customization here), and finalize the output. The only additional step
  proposed is to prepare the input.
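Of the steps listed, __array_wrap__ (and __array_prepare__) already exist as subclass hooks; __input_prepare__ is only a proposal. A minimal sketch of the existing final-step hook, showing how a subclass can see the identity of the ufunc (class and variable names are illustrative):

```python
import numpy as np

calls = []  # record which ufuncs the subclass intercepted

class Logged(np.ndarray):
    # Minimal subclass using the existing __array_wrap__ hook (step 5 in
    # the flow above); the proposed __input_prepare__ would run at step 2.
    def __array_wrap__(self, out, context=None, return_scalar=False):
        if context is not None:
            ufunc = context[0]          # context = (ufunc, inputs, ...)
            calls.append(ufunc.__name__)
        return np.asarray(out)          # hand a plain ndarray back

x = np.arange(3.0).view(Logged)
y = np.sqrt(x)                          # subclass sees the ufunc identity
print(calls)
```

Because the subclass with the highest __array_priority__ gets the wrap call, this is also where "dump it on the subtype" logic would naturally live.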
 
 
  What bothers me here is the opposing desire to separate ufuncs from
  their
  ndarray dependency, having them operate on buffer objects instead. As I
  see
  it ufuncs would be split into layers, with a lower layer operating on
  buffer
  objects, and an upper layer tying them together with ndarrays where the
  business logic -- kinds, casting, etc -- resides. It is in that upper
  layer that what you are proposing would reside. Mind, I'm not sure that
  having matrices and masked arrays subclassing ndarray was the way to go,
  but
  given that they do one possible solution is to dump the whole mess onto
  the
  subtype with the highest priority. That subtype would then be
  responsible
  for casts and all the other stuff needed for the call and wrapping the
  result. There could be library routines to help with that. It seems to
  me
  that that would be the most general way to go. In that sense ndarrays
  themselves would just be another subtype with especially low priority.

 I'm sorry, I didn't understand your point. What you described sounds
 identical to how things are currently done. What distinction are you
 making, aside from operating on the buffer object? How would adding a
 method to modify the input to a ufunc complicate the situation?


 Just *one* function to rule them all and on the subtype dump it. No
 

[Numpy-discussion] Int bitsize in python and c

2010-03-18 Thread Martin Raspaud

Hello,

I work on a 64-bit machine with a 64-bit Fedora installed on it.

I just discovered that numpy.int on the Python side is a 64-bit int, while
npy_int in the C API is a 32-bit int.

I can live with it, but it seems to be different on 32-bit machines, hence I
wonder what the right way is to proceed when passing an array from Python to C.

Here is what I use now:
data_pyarray = (PyArrayObject *)PyArray_ContiguousFromObject(data_list,
PyArray_INT, 1, 2);

but that implies that I send np.int32 arrays to the C part.

Should I use longs instead ?

Regards,
Martin


[Numpy-discussion] [OT] Starving CPUs article featured in IEEE's ComputingNow portal

2010-03-18 Thread Francesc Alted
Hi,

Konrad Hinsen has just told me that my article "Why Modern CPUs Are Starving 
and What Can Be Done About It", which has just been released in the March/April 
issue of Computing in Science and Engineering, has also made it into this month's 
free-access selection on IEEE's ComputingNow portal:

http://www.computer.org/portal/web/computingnow
http://www.computer.org/portal/web/computingnow/0310/whatsnew/cise

In it, I discuss one of my favourite subjects, memory access, and why it is 
important for today's computing.  There are also recommendations for people 
wanting to squeeze the maximum performance out of their computers.  And, 
although I tried to be as language-agnostic as I could, a few 
Python references can be seen here and there :-).

Well, sorry about this semi-OT but I could not resist :-)

-- 
Francesc Alted


Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread Pierre GM
On Mar 17, 2010, at 3:18 PM, josef.p...@gmail.com wrote:
 On Wed, Mar 17, 2010 at 3:12 PM, Christopher Barker
 chris.bar...@noaa.gov wrote:
 
 One of the things I liked about MATLAB was that NaNs were well handled
 almost all the time. Given all the limitations of NaN, having a masked
 array is a better way to go, but I'd love it if they were just there,
 and therefore EVERY numpy function and package built on numpy would
 handle them gracefully. I had thought that there would be a significant
 performance penalty, and thus there would be a boatload of if_mask
 code all over the place, but maybe not.
 
 many function are defined differently for missing values, in stats,
 regression or time series analysis with the assumption of equally
 spaced time periods always needs to use special methods to handle
 missing values.

I think Christopher was referring to ufuncs, not necessarily to more 
complicated functions (like in stats or such).




Re: [Numpy-discussion] [OT] Starving CPUs article featured in IEEE's ComputingNow portal

2010-03-18 Thread Anne Archibald
On 18 March 2010 09:57, Francesc Alted fal...@pytables.org wrote:
 Hi,

 Konrad Hinsen has just told me that my article Why Modern CPUs Are Starving
 and What Can Be Done About It, which has just released on the March/April
 issue of Computing in Science and Engineering, also made into this month's
 free-access selection on IEEE's ComputingNow portal:

Speak for your own CPUs :).

But seriously, congratulations on the wide publication of the article;
it's an important issue we often don't think enough about. I'm just a
little snarky because this exact issue came up for us recently - a
visiting astro speaker put it as "flops are free" - and so I did some
tests and found that even without optimizing for memory access, our
tasks are already CPU-bound:
http://lighthouseinthesky.blogspot.com/2010/03/flops.html

In terms of specifics, I was a little surprised you didn't mention
FFTW among your software tools that optimize memory access. FFTW's
planning scheme seems ideal for ensuring memory locality, as much as
possible, during large FFTs. (And in fact I also found that for really
large FFTs, reducing padding - memory size - at the cost of a
non-power-of-two size was also worth it.)

 http://www.computer.org/portal/web/computingnow
 http://www.computer.org/portal/web/computingnow/0310/whatsnew/cise

 On it, I discuss one of my favourite subjects, memory access, and why it is
 important for nowadays computing.  There are also recommendations for people
 wanting to squeeze the maximum of performance out of their computers.  And,
 although I tried to be as language-agnostic as I could, there can be seen some
 Python references here and there :-).

Heh. Indeed numexpr is a good tool for this sort of thing; it's an
unfortunate fact that simple use of numpy tends to do operations in
the pessimal order...

Anne

 Well, sorry about this semi-OT but I could not resist :-)

 --
 Francesc Alted


Re: [Numpy-discussion] Int bitsize in python and c

2010-03-18 Thread Martin Raspaud

Dag Sverre Seljebotn skrev:
 Martin Raspaud wrote:
 Hello,
 
 I work on a 64bit machine with 64bits enable fedora on it.
 
 I just discovered that numpy.int on the python part are 64bits ints, while
 npy_int in the C api are 32bits ints.
   
 
 np.intc

Thanks !

But I'm also wondering about the C side. How can I make sure my conversion to
a PyArray goes well? Should I use PyArray_LONG? Or is this also platform 
dependent?

Regards,
Martin

 I can live with it, but it seems to be different on 32bit machines, hence I
 wonder what is the right way to do when retrieving an array from python to C.
 
 Here is what I use now:
 data_pyarray = (PyArrayObject *)PyArray_ContiguousFromObject(data_list,
 PyArray_INT, 1, 2);
 
 but that implies that I send np.int32 arrays to the C part.
 
 Should I use longs instead ?
 
 Regards,
 Martin




Re: [Numpy-discussion] Int bitsize in python and c

2010-03-18 Thread Robert Kern
On Thu, Mar 18, 2010 at 08:33, Martin Raspaud martin.rasp...@smhi.se wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Hello,

 I work on a 64bit machine with 64bits enable fedora on it.

 I just discovered that numpy.int on the python part are 64bits ints, while
 npy_int in the C api are 32bits ints.

Note that np.int is just Python's builtin int type (there only for
historical reasons). It corresponds to a C long. npy_int corresponds
to a C int.

 I can live with it, but it seems to be different on 32bit machines, hence I
 wonder what is the right way to do when retrieving an array from python to C.

 Here is what I use now:
 data_pyarray = (PyArrayObject *)PyArray_ContiguousFromObject(data_list,
 PyArray_INT, 1, 2);

 but that implies that I send np.int32 arrays to the C part.

 Should I use longs instead ?

Not necessarily; C longs are the cause of your problem. On some
platforms they are 64-bit, on others 32-bit. Technically speaking,
C ints can also vary from platform to platform, but they are typically
32 bits on all modern platforms running numpy.

Of course, numpy defaults to using a C long for its integer arrays,
just as Python does for its int type, so perhaps using a C long would
work best for you. It's platform dependent, but it matches the
platform dependent changes in numpy. It depends on what your needs
are. If you need a consistent size (perhaps you are writing bytes out
to a file), then always use the int32 or int64 specific types.
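From the Python side, a quick way to check how these typecodes map to actual widths on a given platform (the sizes shown in the comments are the common case, not guarantees):

```python
import numpy as np

# Fixed-width aliases are unambiguous everywhere:
print(np.dtype(np.int32).itemsize)   # 4
print(np.dtype(np.int64).itemsize)   # 8

# Platform C types can vary, so check rather than assume:
print(np.dtype(np.intc).itemsize)    # size of a C int (usually 4)
print(np.dtype('l').itemsize)        # size of a C long (4 or 8)
```

Matching np.intc on the Python side with PyArray_INT (npy_int) on the C side avoids the long-width ambiguity entirely.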

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread Pierre GM
On Mar 17, 2010, at 5:43 PM, Charles R Harris wrote:
 
 
 
 On Wed, Mar 17, 2010 at 3:13 PM, Darren Dale dsdal...@gmail.com wrote:
 On Wed, Mar 17, 2010 at 4:48 PM, Pierre GM pgmdevl...@gmail.com wrote:
  On Mar 17, 2010, at 8:19 AM, Darren Dale wrote:
 
  I started thinking about a third method called __input_prepare__ that
  would be called on the way into the ufunc, which would allow you to
  intercept the input and pass a somehow modified copy back to the
  ufunc. The total flow would be:
 
  1) Call myufunc(x, y[, z])
  2) myufunc calls ?.__input_prepare__(myufunc, x, y), which returns x',
  y' (or simply passes through x,y by default)
  3) myufunc creates the output array z (if not specified) and calls
  ?.__array_prepare__(z, (myufunc, x, y, ...))
  4) myufunc finally gets around to performing the calculation
  5) myufunc calls ?.__array_wrap__(z, (myufunc, x, y, ...)) and returns
  the result to the caller
 
  Is this general enough for your use case? I haven't tried to think
  about how to change some global state at one point and change it back
  at another, that seems like a bad idea and difficult to support.
 
 
  Sounds like a good plan. If we could find a way to merge the first two 
  (__input_prepare__ and __array_prepare__), that'd be ideal.
 
 I think it is better to keep them separate, so we don't have one
 method that is trying to do too much. It would be easier to explain in
 the documentation.
 
 I may not have much time to look into this until after Monday. Is
 there a deadline we need to consider?
 
 
 I don't think this should go into 2.0, I think it needs more thought.  And 
 2.0 already has significant code churn. Is there any reason beyond a big 
 hassle not to set/restore the error state around all the ufunc calls in ma?

Should be done with r8295. Please let me know whether I missed any. 

Otherwise, I agree with Chuck. Let's take some time to figure something out. It'd 
be a significant enough change that it shouldn't go into 2.0.
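Setting and restoring the error state around a ufunc call is exactly what numpy's errstate context manager does, e.g.:

```python
import numpy as np

with np.errstate(invalid='ignore'):
    # No 'invalid value' warning inside the block; the previous error
    # state is restored automatically on exit.
    r = np.sqrt(np.array([-1.0, 4.0]))

print(np.isnan(r[0]), r[1])   # True 2.0
```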




Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread Bruce Southey
On 03/17/2010 04:20 PM, Pierre GM wrote:
 On Mar 17, 2010, at 11:09 AM, Bruce Southey wrote:

 On 03/17/2010 01:07 AM, Pierre GM wrote:
  
 All,
 As you're probably aware, the current test suite for numpy.ma raises some 
 nagging warnings such as 'invalid value in ...'. These warnings are only 
 issued when a standard numpy ufunc (eg., np.sqrt) is called on a 
 MaskedArray, instead of its numpy.ma (eg., np.ma.sqrt) equivalent. The 
 reason is that the masked versions of the ufuncs temporarily set the numpy 
 error status to 'ignore' before the operation takes place, and reset the 
 status to its original value.


 Perhaps naive question, what is really being tested here?

 That is, it appears that you are testing both the generation of the
 invalid values and function. So if the generation fails, then the
 function will also fail. However, the test for the generation of invalid
 values  should be elsewhere so you have to assume that the generation of
 values will work correctly.
  
 That's not really the point here. The issue is that when numpy ufuncs are 
 called on a MaskedArray, a warning or an exception is raised when an invalid 
 value is met. With the numpy.ma version of those functions, the error is trapped 
 and processed. Of course, using the numpy.ma version of the ufuncs is the 
 right way to go.



 I think that you should be only testing that the specific function
 passes the test. Why not just use 'invalid' values like np.inf directly?

 For example, in numpy/ma/tests/test_core.py
 We have this test:
     def test_fix_invalid(self):
         "Checks fix_invalid."
         data = masked_array(np.sqrt([-1., 0., 1.]), mask=[0, 0, 1])
         data_fixed = fix_invalid(data)

 If that is to test fix_invalid, why not create the data array as:
 data = masked_array([np.inf, 0., 1.], mask=[0, 0, 1])
  
 Sure, that's nicer. But once again, that's not really the core of the issue.


I needed to point out that your statement was not completely correct:
'These warnings are only issued when a standard numpy ufunc (eg., 
np.sqrt) is called on a MaskedArray...'.

There are valid warnings for some of the tests because they do not 
operate on masked arrays. As in the above example, the masked array only 
comes into existence *after* the square root has been taken, hence the 
warning. So some of the warnings in the tests could be eliminated just by 
changing the test input.

Furthermore, this warning is technically valid for non-masked values:
>>> np.sqrt(np.ma.array([-1], mask=[0]))
Warning: invalid value encountered in sqrt
masked_array(data = [--],
             mask = [ True],
       fill_value = 1e+20)

In contrast, no warning is emitted with the ma function:
>>> np.ma.sqrt(np.ma.array([-1], mask=[0]))
masked_array(data = [--],
             mask = [ True],
       fill_value = 1e+20)

But I fully agree that there is a problem when the value is masked, 
because it should have ignored the operation:
>>> np.sqrt(np.ma.array([-1], mask=[1]))
Warning: invalid value encountered in sqrt
masked_array(data = [--],
             mask = [ True],
       fill_value = 1e+20)


Here I understand your view to be that if the operation is on a masked 
array, then 'invalid' operations like square root should be silently 
converted to masked values.

Bruce


Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread Pierre GM
On Mar 17, 2010, at 9:16 PM, Charles R Harris wrote:
 
 
 On Wed, Mar 17, 2010 at 7:39 PM, Darren Dale dsdal...@gmail.com wrote:
 On Wed, Mar 17, 2010 at 8:22 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
  What bothers me here is the opposing desire to separate ufuncs from their
  ndarray dependency, having them operate on buffer objects instead. As I see
  it ufuncs would be split into layers, with a lower layer operating on buffer
  objects, and an upper layer tying them together with ndarrays where the
  business logic -- kinds, casting, etc -- resides. It is in that upper
  layer that what you are proposing would reside. Mind, I'm not sure that
  having matrices and masked arrays subclassing ndarray was the way to go, but
  given that they do one possible solution is to dump the whole mess onto the
  subtype with the highest priority. That subtype would then be responsible
  for casts and all the other stuff needed for the call and wrapping the
  result. There could be library routines to help with that. It seems to me
  that that would be the most general way to go. In that sense ndarrays
  themselves would just be another subtype with especially low priority.
 
 I'm sorry, I didn't understand your point. What you described sounds
 identical to how things are currently done. What distinction are you
 making, aside from operating on the buffer object? How would adding a
 method to modify the input to a ufunc complicate the situation?
 
 
 Just *one* function to rule them all and on the subtype dump it. No 
 __array_wrap__, __input_prepare__, or __array_prepare__, just something like 
 __handle_ufunc__. So it is similar but perhaps more radical. I'm proposing 
 having the ufunc upper layer do nothing but decide which argument type will 
 do all the rest of the work, casting, calling the low level ufunc base, 
 providing buffers, wrapping, etc. Instead of pasting bits and pieces into the 
 existing framework I would like to lay out a line of attack that ends up 
 separating ufuncs into smaller pieces that provide low level routines that 
 work on strided memory while leaving policy implementation to the subtype. 
 There would need to be some default type (ndarray) when the functions are 
 called on nested lists and scalars and I'm not sure of the best way to handle 
 that.
 
 I'm just sort of thinking out loud, don't take it too seriously.

Still, I like it. It sounds far cleaner than the current Harlequin's costume 
approach. 
In the thinking out loud department: 
* the upper layer should allow the user to modify the input on the fly (a 
current limitation of __array_prepare__), so that we can change values that we 
know will give invalid results before the lower layer processes them.
* It'd be nice to have the domains defined in the functions. Right now, the 
domains are defined in numpy.ma.core for unary and binary functions. Maybe we 
could extend the 'context' of each ufunc ?


Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread Pierre GM
On Mar 18, 2010, at 11:03 AM, Bruce Southey wrote:
 On 03/17/2010 04:20 PM, Pierre GM wrote:
 On Mar 17, 2010, at 11:09 AM, Bruce Southey wrote:
 
 On 03/17/2010 01:07 AM, Pierre GM wrote:
 
 All,
 As you're probably aware, the current test suite for numpy.ma raises some 
 nagging warnings such as 'invalid value in ...'. These warnings are only 
 issued when a standard numpy ufunc (eg., np.sqrt) is called on a 
 MaskedArray, instead of its numpy.ma (eg., np.ma.sqrt) equivalent. The 
 reason is that the masked versions of the ufuncs temporarily set the numpy 
 error status to 'ignore' before the operation takes place, and reset the 
 status to its original value.
 
 
 Perhaps naive question, what is really being tested here?
 
 That is, it appears that you are testing both the generation of the
 invalid values and function. So if the generation fails, then the
 function will also fail. However, the test for the generation of invalid
 values  should be elsewhere so you have to assume that the generation of
 values will work correctly.
 
 That's not really the point here. The issue is that when numpy ufuncs are 
 called on a MaskedArray, a warning or an exception is raised when an invalid 
 is met. With the numpy.ma version of those functions, the error is trapped 
 and processed. Of course, using the numpy.ma version of the ufuncs is the 
 right way to go
 
 
 
 I think that you should be only testing that the specific function
 passes the test. Why not just use 'invalid' values like np.inf directly?
 
 For example, in numpy/ma/tests/test_core.py
 We have this test:
 def test_fix_invalid(self):
 Checks fix_invalid.
 data = masked_array(np.sqrt([-1., 0., 1.]), mask=[0, 0, 1])
 data_fixed = fix_invalid(data)
 
 If that is to test that fix_invalid Why not create the data array as:
 data = masked_array([np.inf, 0., 1.]), mask=[0, 0, 1])
 
 Sure, that's nicer. But once again, that's not really the core of the issue.
 
 
 I needed to point out that your statement was not completely correct:
 'These warnings are only issued when a standard numpy ufunc (eg., 
 np.sqrt) is called on a MaskedArray...'.
 
 There are valid warnings for some of the tests because these are do not 
 operate on masked arrays. As in the above example, the masked array only 
 occurs *after* the square root has been taken and hence the warning. So 
 some of the warnings in the tests should be eliminated just by changing 
 the test input.

Dang, I knew I was forgetting something... OK, I'll work on that. 
But anyway, I agree with you: some of the tests are not particularly 
well thought out.



[Numpy-discussion] Daniel Wright

2010-03-18 Thread Gustavo Mercier
http://carshowcolombia.com/John.html


[Numpy-discussion] ask.scipy.org is online

2010-03-18 Thread Gökhan Sever
Hello,

Please tolerate my impatience for being the first announcing the new
discussion platform :) and my cross-posting over the lists.

The new site is now up at ask.scipy.org. David Warde-Farley is moving the
questions from the old site at advice.mechanicalkern.com (announced at
SciPy09 by Robert Kern).

We welcome you to join the discussions there. I have kicked off the new
question chain with
http://ask.scipy.org/en/topic/11-what-is-your-favorite-numpy-feature
(also cross-posted at
http://stackoverflow.com/questions/2471872/what-is-your-favorite-numpy-feature
to pull more attention to ask.scipy).

Please share your questions and comments, and have fun with your
discussions.

-- 
Gökhan


Re: [Numpy-discussion] [OT] Starving CPUs article featured in IEEE's ComputingNow portal

2010-03-18 Thread Francesc Alted
On Thursday 18 March 2010 16:26:09, Anne Archibald wrote:
 Speak for your own CPUs :).
 
 But seriously, congratulations on the wide publication of the article;
 it's an important issue we often don't think enough about. I'm just a
 little snarky because this exact issue came up for us recently - a
 visiting astro speaker put it as "flops are free" - and so I did some
 tests and found that even without optimizing for memory access, our
 tasks are already CPU-bound:
 http://lighthouseinthesky.blogspot.com/2010/03/flops.html

Well, I thought that my introduction was enough to convince anybody about the 
problem, but forgot that you, the scientists, always try to demonstrate things 
experimentally :-/

Seriously, your example is a clear example of what I'm recommending in the 
article, i.e. always try to use libraries that already leverage the 
blocking technique (that is, taking advantage of both temporal and spatial 
locality).  Don't know about FFTW (never used it, sorry), but after having a 
look at its home page, I'm pretty convinced that its authors are very 
conscious of these techniques.

That being said, it seems that, in addition, you are applying the blocking 
technique yourself: get the data in bunches (256 floating point elements, 
which fit perfectly well in modern L1 caches), apply your computation (in 
this case, FFTW) and put the result back in memory.  A perfect example of what 
I wanted to show to the readers, so congratulations! You made it without 
needing to read my article (so perhaps the article was not so necessary after 
all :-)
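As a toy illustration of that pattern (the function name and the 256-element bunch size are just for this sketch, not from Anne's actual code):

```python
import numpy as np

def blocked_sum_of_squares(data, block=256):
    # Walk the array in cache-sized bunches: load a chunk once, do all the
    # arithmetic on it while it is hot in L1, then move to the next chunk.
    total = 0.0
    for start in range(0, data.size, block):
        chunk = data[start:start + block]
        total += float(np.dot(chunk, chunk))
    return total

data = np.arange(1024, dtype=np.float64)
result = blocked_sum_of_squares(data)
```

The answer is of course identical to the unblocked `np.dot(data, data)`; the point is only the memory access pattern.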

 In terms of specifics, I was a little surprised you didn't mention
 FFTW among your software tools that optimize memory access. FFTW's
 planning scheme seems ideal for ensuring memory locality, as much as
 possible, during large FFTs. (And in fact I also found that for really
 large FFTs, reducing padding - memory size - at the cost of a
 non-power-of-two size was also worth it.)

I must say that I'm quite naïve about many of the existing great tools for 
scientific computing.  What I know is that when I need to do something I 
always look for good existing tools first.  So this is basically why I spoke 
about numexpr and BLAS/LAPACK: I know them well.

 Heh. Indeed numexpr is a good tool for this sort of thing; it's an
 unfortunate fact that simple use of numpy tends to do operations in
 the pessimal order...

Well, to honor the truth, NumPy has no control over the order of the 
operations in expressions or over how temporaries are managed: it is Python 
that decides that.  NumPy can only do what Python asks it to do, and it does 
so as well as possible.  NumPy plays its role reasonably well here, but of 
course this is not enough to provide top performance.  In fact, this problem 
probably affects all interpreted languages out there, unless they implement a 
JIT compiler optimised for evaluating expressions -- and this is basically 
what numexpr is.
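To make the temporaries issue concrete, here is a sketch in plain NumPy (the numexpr call is shown commented out, since numexpr may not be installed):

```python
import numpy as np

a = np.ones(1_000_000)
b = np.ones(1_000_000)

# Python hands NumPy one operation at a time, so this line allocates two
# full-size temporaries (2*a and 3*b) that stream through main memory
# before the final add even starts:
r = 2 * a + 3 * b

# numexpr compiles the whole expression and evaluates it block-wise,
# avoiding the temporaries:
#   import numexpr as ne
#   r = ne.evaluate("2 * a + 3 * b")
```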

Anyway, thanks for constructive criticism, I really appreciate it!

-- 
Francesc Alted


[Numpy-discussion] numpy.gradient() does not return ndarray subclasses

2010-03-18 Thread Ryan May
Hi,

Can I get someone to look at: http://projects.scipy.org/numpy/ticket/1435

Basically, numpy.gradient() uses numpy.zeros() to create an output
array.  This breaks the use of any ndarray subclasses, like masked
arrays, since the function will only return
ndarrays.  I've attached a patch that fixes that problem and has a
simple test checking output types. With this patch, I can use gradient
on masked arrays and get appropriately masked output.
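The gist of the fix (a sketch of the idea, not the attached patch itself) is to allocate the output with a subclass-preserving constructor:

```python
import numpy as np

x = np.ma.masked_array([1.0, 2.0, 4.0, 7.0], mask=[0, 0, 1, 0])

# What gradient() currently does: np.zeros(shape) always returns a plain
# ndarray, so the input's mask and type information are lost.
out_plain = np.zeros(x.shape)

# The fix: *_like constructors preserve the input's subclass.
out_sub = np.zeros_like(x)

print(type(out_plain).__name__, type(out_sub).__name__)
```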

If we could, it'd be nice to get this in for 2.0 so that I (and my
coworkers who found the bug) don't have to use a custom patched
gradient until the next release after that.

Thanks,

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


fix_gradient_with_subclasses.diff
Description: Binary data


Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread Christopher Barker
josef.p...@gmail.com wrote:
 On Wed, Mar 17, 2010 at 3:12 PM, Christopher Barker

 Given all the limitations of NaN, having a masked
 array is a better way to go, but I'd love it if they were just there,
 and therefore EVERY numpy function and package built on numpy would
 handle them gracefully.

 many function are defined differently for missing values, in stats,
 regression or time series analysis with the assumption of equally
 spaced time periods always needs to use special methods to handle
 missing values.

sure -- that's kind of my point -- if EVERY numpy array were 
(potentially) masked, then folks would write code to deal with them 
appropriately.

 Plus, you have to operate on two arrays and keep both in memory. So
 the penalty is pretty high even in C.

Only if there is actually a mask, which might make this pretty ugly -- 
lots of if mask code branching.

If a given routine either didn't make sense with missing values, or was 
simply too costly with them, it could certainly raise an exception if it 
got an array with a non-null mask.

 Anyway, just a fantasy, but C-level ufuncs that support masks would be
 great.

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread Gael Varoquaux
On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
 sure -- that's kind of my point -- if EVERY numpy array were 
 (potentially) masked, then folks would write code to deal with them 
 appropriately.

That's pretty much saying: I have a complicated problem and I want every
one else to have to deal with the full complexity of it, even if they
have a simple problem. In my experience, such a choice doesn't fare well,
unless it is inside a controlled codebase, and someone working on that
codebase is ready to spend heaps of time working on that specific issue.

Gaël


Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread josef . pktd
On Thu, Mar 18, 2010 at 3:19 PM, Gael Varoquaux
gael.varoqu...@normalesup.org wrote:
 On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
 sure -- that's kind of my point -- if EVERY numpy array were
 (potentially) masked, then folks would write code to deal with them
 appropriately.

 That's pretty much saying: I have a complicated problem and I want every
 one else to have to deal with the full complexity of it, even if they
 have a simple problem. In my experience, such a choice doesn't fare well,
 unless it is inside a controlled codebase, and someone working on that
 codebase is ready to spend heaps of time working on that specific issue.

If the mask doesn't get quietly added during an operation, we would
need to keep out the masked arrays at the front door.
I worry about speed penalties for pure number crunching, although, if
nomask=True gives a fast path, then it might not be too much of a
problem.

ufuncs are simple enough, but for reduce operations and other more
complicated things (linalg, fft) the user would need to control how
missing values are supposed to be handled, which still requires
special treatment and "if mask" checks all over the place.
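Today's masked reductions already illustrate that special treatment (a small sketch; variable names are mine):

```python
import numpy as np

x = np.ma.masked_array([1.0, 2.0, 4.0], mask=[0, 1, 0])

# The masked-array reduce skips masked entries...
masked_sum = x.sum()             # 1.0 + 4.0

# ...while the raw ufunc reduce on the underlying buffer has no notion of
# the mask, so the 2.0 is still counted.
raw_sum = np.add.reduce(x.data)  # 1.0 + 2.0 + 4.0
```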

Josef



 Gaël


Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread Christopher Barker
Gael Varoquaux wrote:
 On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
 sure -- that's kind of my point -- if EVERY numpy array were 
 (potentially) masked, then folks would write code to deal with them 
 appropriately.
 
 That's pretty much saying: I have a complicated problem and I want every
 one else to have to deal with the full complexity of it, even if they
 have a simple problem.

Well -- I did say it was a fantasy...

But I disagree -- having invalid data is a very common case. What we 
have now is a situation where we have two parallel systems, masked 
arrays and regular arrays. Each time someone does something new with 
masked arrays, they often find another missing feature, and have to 
solve that. Also, the fact that masked arrays are tacked on means that 
performance suffers.

Maybe it would simply be too ugly, but If I were to start from the 
ground up with a scientific computing package, I would want to put in 
support for missing values from that start.

There are some cases where it is simply too complicated or too expensive 
to handle missing values -- fine, then an exception is raised.

You may be right about how complicated it would be, and what would 
happen is that everyone would simply put a:

if a.masked:
    raise NotImplementedError("I can't deal with masked data")

stanza at the top of every new method they wrote, but I suspect that if 
the core infrastructure was in place, it would be used.

I'm facing this at the moment: not a big deal, but I'm using histogram2d 
on some data. I just realized that it may have some NaNs in it, and I 
have no idea how those are being handled. I also want to move to masked 
arrays and have no idea if histogram2d can deal with those. At the 
least, I need to do some testing, and I suspect I'll need to do some 
hacking on histogram2d (or just write my own).

I'm sure I'm not the only one in the world that needs to histogram some 
data that may have invalid values -- so wouldn't it be nice if that were 
already handled in a defined way?

-Chris






-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread Ryan May
On Thu, Mar 18, 2010 at 2:46 PM, Christopher Barker
chris.bar...@noaa.gov wrote:
 Gael Varoquaux wrote:
 On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
 sure -- that's kind of my point -- if EVERY numpy array were
 (potentially) masked, then folks would write code to deal with them
 appropriately.

 That's pretty much saying: I have a complicated problem and I want every
 one else to have to deal with the full complexity of it, even if they
 have a simple problem.

 Well -- I did say it was a fantasy...

 But I disagree -- having invalid data is a very common case. What we
 have now is a situation where we have two parallel systems, masked
 arrays and regular arrays. Each time someone does something new with
 masked arrays, they often find another missing feature, and have to
 solve that. Also, the fact that masked arrays are tacked on means that
 performance suffers.

Case in point, I just found a bug in np.gradient where it forces the
output to be an ndarray.
(http://projects.scipy.org/numpy/ticket/1435).  Easy fix that doesn't
actually require any special casing for masked arrays, just making
sure to use the proper function to create a new array of the same
subclass as the input.  However, now for any place that I can't patch
I have to use a custom function until a fixed numpy is released.

Maybe universal support for masked arrays (and masking invalid points)
is a pipe dream, but every function in numpy should IMO deal properly
with subclasses of ndarray.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread josef . pktd
On Thu, Mar 18, 2010 at 3:46 PM, Christopher Barker
chris.bar...@noaa.gov wrote:
 Gael Varoquaux wrote:
 On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
 sure -- that's kind of my point -- if EVERY numpy array were
 (potentially) masked, then folks would write code to deal with them
 appropriately.

 That's pretty much saying: I have a complicated problem and I want every
 one else to have to deal with the full complexity of it, even if they
 have a simple problem.

 Well -- I did say it was a fantasy...

 But I disagree -- having invalid data is a very common case. What we
 have now is a situation where we have two parallel systems, masked
 arrays and regular arrays. Each time someone does something new with
 masked arrays, they often find another missing feature, and have to
 solve that. Also, the fact that masked arrays are tacked on means that
 performance suffers.

 Maybe it would simply be too ugly, but If I were to start from the
 ground up with a scientific computing package, I would want to put in
 support for missing values from that start.

 There are some cases where it is simply too complicated or too expensive
 to handle missing values -- fine, then an exception is raised.

 You may be right about how complicated it would be, and what would
 happen is that everyone would simply put a:

 if a.masked:
     raise NotImplementedError("I can't deal with masked data")

 stanza at the top of every new method they wrote, but I suspect that if
 the core infrastructure was in place, it would be used.

 I'm facing this at the moment: not a big deal, but I'm using histogram2d
 on some data. I just realized that it may have some NaNs in it, and I
 have no idea how those are being handled. I also want to move to masked
 arrays and have no idea if histogram2d can deal with those. At the
 least, I need to do some testing, and I suspect I'll need to do some
 hacking on histogram2d (or just write my own).

 I'm sure I'm not the only one in the world that needs to histogram some
 data that may have invalid values -- so wouldn't it be nice if that were
 already handled in a defined way?

histogram2d handles neither masked arrays nor arrays with nans
correctly, but assuming you want to drop all columns that have at
least one missing value, then it is just one small step. Unless you
want to replace the missing value with the mean, or a conditional
prediction, or by interpolation.

This could be included in the histogram function.

>>> x = np.ma.array([[1, 2, 3], [2, 1, 1]], mask=[[0, 1, 0], [0, 0, 0]])
>>> np.histogram2d(x[0], x[1], bins=3)
(array([[ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.]]),
 array([ 1.        ,  1.66666667,  2.33333333,  3.        ]),
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))

>>> x2 = x[:, ~x.mask.any(0)]
>>> np.histogram2d(x2[0], x2[1], bins=3)
(array([[ 0.,  0.,  1.],
       [ 0.,  0.,  0.],
       [ 1.,  0.,  0.]]),
 array([ 1.        ,  1.66666667,  2.33333333,  3.        ]),
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))

>>> x = np.array([[1., np.nan, 3], [2, 1, 1]])
>>> x
array([[  1.,  NaN,   3.],
       [  2.,   1.,   1.]])
>>> np.histogram2d(x[0], x[1], bins=3)
(array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]]),
 array([ NaN,  NaN,  NaN,  NaN]),
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))

>>> x2 = x[:, np.isfinite(x).all(0)]
>>> np.histogram2d(x2[0], x2[1], bins=3)
(array([[ 0.,  0.,  1.],
       [ 0.,  0.,  0.],
       [ 1.,  0.,  0.]]),
 array([ 1.        ,  1.66666667,  2.33333333,  3.        ]),
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))


Josef

 -Chris






 --
 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR            (206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115       (206) 526-6317   main reception

 chris.bar...@noaa.gov


Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread Eric Firing
Ryan May wrote:
 On Thu, Mar 18, 2010 at 2:46 PM, Christopher Barker
 chris.bar...@noaa.gov wrote:
 Gael Varoquaux wrote:
 On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
 sure -- that's kind of my point -- if EVERY numpy array were
 (potentially) masked, then folks would write code to deal with them
 appropriately.
 That's pretty much saying: I have a complicated problem and I want every
 one else to have to deal with the full complexity of it, even if they
 have a simple problem.
 Well -- I did say it was a fantasy...

 But I disagree -- having invalid data is a very common case. What we
 have now is a situation where we have two parallel systems, masked
 arrays and regular arrays. Each time someone does something new with
 masked arrays, they often find another missing feature, and have to
 solve that. Also, the fact that masked arrays are tacked on means that
 performance suffers.
 
 Case in point, I just found a bug in np.gradient where it forces the
 output to be an ndarray.
 (http://projects.scipy.org/numpy/ticket/1435).  Easy fix that doesn't
 actually require any special casing for masked arrays, just making
 sure to use the proper function to create a new array of the same
 subclass as the input.  However, now for any place that I can't patch
 I have to use a custom function until a fixed numpy is released.
 
 Maybe universal support for masked arrays (and masking invalid points)
 is a pipe dream, but every function in numpy should IMO deal properly
 with subclasses of ndarray.

1) This can't be done in general because subclasses can change things to 
the point where there is little one can count on.  The matrix subclass, 
for example, redefines multiplication and iteration, making it difficult 
to write functions that will work for ndarrays or matrices.

2) There is a lot that can be done to improve the handling of masked 
arrays, and I still believe that much of it should be done at the C 
level, where it can be done with speed and simplicity.  Unfortunately, 
figuring out how to do it well, and implementing it well, will require a 
lot of intensive work.  I suspect it won't get done unless we can figure 
out how to get a qualified person dedicated to it.

Eric

 
 Ryan
 



[Numpy-discussion] peak finding approach

2010-03-18 Thread Davide Cittaro
Hi all, 
Is there a fast numpy way to find the peak boundaries in a (looong, millions of 
points) smoothed signal? I've found some approaches, like this:

z = data[1:-1]
l = data[:-2]
r = data[2:]
f = np.greater(z, l)
f *= np.greater(z, r)
boundaries = np.nonzero(f)

but it is too sensitive... it detects any small variations in slope on the 
shoulders of a bigger peak... 
Any hint?

thanks

d


Re: [Numpy-discussion] evaluating a function in a matrix element???

2010-03-18 Thread Friedrich Romstedt
2010/3/18 Frank Horowitz fr...@horow.net:
 I'm working on a piece of optimisation code where it turns out to be 
 mathematically convenient to have a matrix where a few pre-chosen elements 
 must be computed at evaluation time for the dot product (i.e. matrix 
 multiplication) of a matrix with a vector.

The following *might* be helpful:

>>> import numpy
>>> class X:
...     def __mul__(self, other):
...         return numpy.random.random() * other
...     def __rmul__(self, other):
...         return other * numpy.random.random()

Instances of this class calculate the product in-time:

>>> x = X()
>>> x * 1
0.222103712078775
>>> 1 * x
0.044647569053175573

How to use it:

>>> a = numpy.asarray([[X(), X()], [0, 1]])
>>> a
array([[<__main__.X instance at 0x00AABAA8>,
        <__main__.X instance at 0x00E76530>],
       [0, 1]], dtype=object)

The first row is purely random, the second purely deterministic:

>>> numpy.dot(a, [1, 2])
array([1.60154958363, 2], dtype=object)
>>> numpy.dot(a, [1, 2])
array([2.06294335235, 2], dtype=object)

You can convert back to dtype = numpy.float by result.astype(numpy.float):

>>> numpy.dot(a, [1, 2]).astype(numpy.float)
array([ 1.33217562,  2.])

Friedrich


Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread Darren Dale
On Thu, Mar 18, 2010 at 5:12 PM, Eric Firing efir...@hawaii.edu wrote:
 Ryan May wrote:
 On Thu, Mar 18, 2010 at 2:46 PM, Christopher Barker
 chris.bar...@noaa.gov wrote:
 Gael Varoquaux wrote:
 On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
 sure -- that's kind of my point -- if EVERY numpy array were
 (potentially) masked, then folks would write code to deal with them
 appropriately.
 That's pretty much saying: I have a complicated problem and I want every
 one else to have to deal with the full complexity of it, even if they
 have a simple problem.
 Well -- I did say it was a fantasy...

 But I disagree -- having invalid data is a very common case. What we
 have now is a situation where we have two parallel systems, masked
 arrays and regular arrays. Each time someone does something new with
 masked arrays, they often find another missing feature, and have to
 solve that. Also, the fact that masked arrays are tacked on means that
 performance suffers.

 Case in point, I just found a bug in np.gradient where it forces the
 output to be an ndarray.
 (http://projects.scipy.org/numpy/ticket/1435).  Easy fix that doesn't
 actually require any special casing for masked arrays, just making
 sure to use the proper function to create a new array of the same
 subclass as the input.  However, now for any place that I can't patch
 I have to use a custom function until a fixed numpy is released.

 Maybe universal support for masked arrays (and masking invalid points)
 is a pipe dream, but every function in numpy should IMO deal properly
 with subclasses of ndarray.

 1) This can't be done in general because subclasses can change things to
 the point where there is little one can count on.  The matrix subclass,
 for example, redefines multiplication and iteration, making it difficult
 to write functions that will work for ndarrays or matrices.

I'm more optimistic that it can be done in general, if we provide a
mechanism where the subclass with highest priority can customize the
execution of the function (ufunc or not). In principle, the subclass
could even override the buffer operation, like in the case of
matrices. It still can put a lot of responsibility on the authors of
the subclass, but what is gained is a framework where np.add (for
example) could yield the appropriate result for any subclass, as
opposed to the current situation of needing to know which add function
can be used for a particular type of input.
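One hook of this kind already exists: `__array_wrap__` is called on the highest-`__array_priority__` input after any ufunc, letting the subclass repackage the output. A minimal sketch (the class name is illustrative, not from numpy.ma):

```python
import numpy as np

class TaggedArray(np.ndarray):
    # A higher priority than ndarray's default (0.0) makes this subclass
    # the one whose __array_wrap__ is consulted in mixed operations.
    __array_priority__ = 15.0

    def __array_wrap__(self, out_arr, context=None, return_scalar=False):
        # The subclass decides what the ufunc result becomes.
        return np.asarray(out_arr).view(TaggedArray)

a = np.array([1.0, 2.0]).view(TaggedArray)
result = np.add(a, [3.0, 4.0])  # a plain ufunc call, yet the type survives
```

This covers ufuncs; the speculation above is about extending the same idea to non-ufunc functions like gradient.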

All speculative, of course. I'll start throwing some examples together
when I get a chance.

Darren


Re: [Numpy-discussion] peak finding approach

2010-03-18 Thread josef . pktd
On Thu, Mar 18, 2010 at 5:19 PM, Davide Cittaro
davide.citt...@ifom-ieo-campus.it wrote:
 Hi all,
 Is there a fast numpy way to find the peak boundaries in a (looong, millions 
 of points) smoothed signal? I've found some approaches, like this:

 z = data[1:-1]
 l = data[:-2]
 r = data[2:]
 f = np.greater(z, l)
 f *= np.greater(z, r)
 boundaries = np.nonzero(f)

 but it is too sensitive... it detects any small variations in slope on the 
 shoulders of a bigger peak...
 Any hint?

to find peaks something like the following works

np.nonzero(data == maximum_filter(data, size=larger_window_size))

where maximum_filter is in scipy.ndimage (scipy.signal also has filtering tools)

I'm not sure about boundaries, maybe the same with a minimum filter

Josef
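A pure-NumPy sketch of that idea (scipy.ndimage.maximum_filter1d does the same job faster; the function and variable names below are mine, just for illustration):

```python
import numpy as np

def local_maxima(data, half_window=2):
    # A sample counts as a peak only if it beats every neighbour within
    # half_window samples on both sides, not just its two immediate
    # neighbours -- this suppresses small wiggles on a peak's shoulders.
    is_peak = np.ones(data.size, dtype=bool)
    for shift in range(1, half_window + 1):
        is_peak[shift:] &= data[shift:] > data[:-shift]    # beats left side
        is_peak[:-shift] &= data[:-shift] > data[shift:]   # beats right side
    return np.nonzero(is_peak)[0]

signal = np.array([0., 1., 0.9, 1.2, 0.2, 0.1, 0.5, 0.4, 0.3])
naive = local_maxima(signal, half_window=1)   # 3-point test keeps index 1
wider = local_maxima(signal, half_window=2)   # 5-point test drops it
```

The same comparisons with `<` (or a minimum filter) would give candidate valleys between peaks.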


 thanks

 d


Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread Christopher Barker
josef.p...@gmail.com wrote:
 I'm facing this at the moment: not a big deal, but I'm using histogram2d
 on some data. I just realized that it may have some NaNs in it, and I
 have no idea how those are being handled.

 histogram2d handles neither masked arrays nor arrays with nans
 correctly,

I really wasn't asking for help (yet) .. but thanks!

 >>> x2 = x[:, np.isfinite(x).all(0)]
 >>> np.histogram2d(x2[0], x2[1], bins=3)
 (array([[ 0.,  0.,  1.],
        [ 0.,  0.,  0.],
        [ 1.,  0.,  0.]]),
  array([ 1.        ,  1.66666667,  2.33333333,  3.        ]),
  array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))

I'll probably do something like that for now. I guess the question is -- 
should this be built in to histogram2d (and other similar functions)?

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Warnings in numpy.ma.test()

2010-03-18 Thread josef . pktd
On Thu, Mar 18, 2010 at 7:26 PM, Christopher Barker
chris.bar...@noaa.gov wrote:
 josef.p...@gmail.com wrote:
 I'm facing this at the moment: not a big deal, but I'm using histogram2d
 on some data. I just realized that it may have some NaNs in it, and I
 have no idea how those are being handled.

 histogram2d handles neither masked arrays nor arrays with nans
 correctly,

 I really wasn't asking for help (yet) .. but thanks!

 >>> x2 = x[:, np.isfinite(x).all(0)]
 >>> np.histogram2d(x2[0], x2[1], bins=3)
 (array([[ 0.,  0.,  1.],
        [ 0.,  0.,  0.],
        [ 1.,  0.,  0.]]),
  array([ 1.        ,  1.66666667,  2.33333333,  3.        ]),
  array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))

 I'll probably do something like that for now. I guess the question is --
 should this be built in to histogram2d (and other similar functions)?

I think yes, for all functions that are closer to actual data and
where there is an obvious way to handle the missing values. But it's
work and adds a lot of code to a nice simple function. And if it's
just one extra line for the user, then it is not too high on my
priority list.

For example, I rewrote stats.zscore a while ago to handle also
matrices and masked arrays, and Bruce rewrote geometric mean and
others, but these are easy cases, for many of the other functions it's
more work.

Also, if a function gets too much overhead, I end up rewriting and
inlining the core of the function over and over again when I need it
inside a loop, for example for optimization, or I keep a copy of the
function around that doesn't use the overhead.

I actually do little profiling, so I don't really know what the cost
would be in a loop.

Josef


 -Chris



 --
 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR            (206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115       (206) 526-6317   main reception

 chris.bar...@noaa.gov