[Numpy-discussion] evaluating a function in a matrix element???
Dear All, I'm working on a piece of optimisation code where it turns out to be mathematically convenient to have a matrix where a few pre-chosen elements must be computed at evaluation time for the dot product (i.e. matrix multiplication) of a matrix with a vector. As I see the problem, there are two basic approaches to accomplishing this. First (and perhaps conceptually simplest, not to mention apparently faster) might be to stash appropriate functions at their corresponding locations in the matrix (with the rest of the matrix being constants, as usual). I mucked around with this for a little while in iPython, and it appears that having dtype == object_ works for stashing the references to the functions, but fails to allow me to actually evaluate the function(s) when the matrix is used in the dot product. Does anyone have any experience with making such a beast work within numpy? If so, how? The second basic approach is to build a ufunc that implements the switching logic, and returns the constants and evaluated functions in the appropriate locations. This seems to be the more do-able approach, but it requires the ufunc to be aware of both the position of each element (via index, or somesuch) as well as the arguments to the functions themselves being evaluated at the matrix elements. It appears that frompyfunc() nearly does what I want, but I am currently failing to see how to actually *use* it for anything more elaborate than the octal example code (i.e. one value in, and one value out). Does anyone have any other more elaborate examples they can point me towards? Thanks in advance for any help you might be able to provide! Frank Horowitz fr...@horow.net ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
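The first approach Frank describes can be made to work if the dot product is done by hand rather than with np.dot. A minimal sketch, assuming a small matrix where callables are evaluated at multiplication time; the helper name `eval_dot` and the parameter `t` are invented for illustration, not numpy API:

```python
import numpy as np

# Object-dtype matrix: most entries are constants, one is a callable
# that must be evaluated at dot-product time.
A = np.empty((2, 2), dtype=object)
A[0, 0] = 2.0
A[0, 1] = lambda t: np.sin(t)   # element computed at evaluation time
A[1, 0] = 0.0
A[1, 1] = 3.0

def eval_dot(M, x, *args):
    """Evaluate any callable elements with *args, then do an ordinary dot."""
    numeric = np.array([[e(*args) if callable(e) else e for e in row]
                        for row in M], dtype=float)
    return numeric.dot(x)

y = eval_dot(A, np.array([1.0, 1.0]), 0.0)   # sin(0) contributes 0
```

This sidesteps the problem that np.dot on an object array will not call the stored functions; the cost is a Python-level loop over the matrix at each evaluation.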
Re: [Numpy-discussion] bug in ndarray.resize?
On Wednesday 17 March 2010 22:56:20, Charles R Harris wrote: On Wed, Mar 17, 2010 at 8:42 AM, Alan G Isaac ais...@american.edu wrote: On 3/17/2010 10:16 AM, josef.p...@gmail.com wrote: numpy.resize(a, new_shape) Return a new array with the specified shape. If the new array is larger than the original array, then the new array is filled with repeated copies of a. Note that this behavior is different from a.resize(new_shape), which fills with zeros instead of repeated copies of a. Yes indeed. Sorry, I must have scrolled the help without realizing it, and this part was at the top. So my follow-up: why is this desirable/necessary? (I find it surprising.) IIRC, it behaved that way in Numeric. This does not mean that this behaviour is desirable. I find it inconsistent and misleading, so +1 for fixing it. -- Francesc Alted
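The discrepancy under discussion, demonstrated directly (behavior as in the NumPy of this thread and since):

```python
import numpy as np

# The function repeats the data; the method zero-fills.
a = np.array([1, 2, 3])
print(np.resize(a, 6))          # [1 2 3 1 2 3] -- repeated copies of a

b = np.array([1, 2, 3])
b.resize(6, refcheck=False)     # refcheck=False lets this run in any context
print(b)                        # [1 2 3 0 0 0] -- filled with zeros
```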
Re: [Numpy-discussion] bug in ndarray.resize?
On Wed, Mar 17, 2010 at 10:56 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Wed, Mar 17, 2010 at 8:42 AM, Alan G Isaac ais...@american.edu wrote: On 3/17/2010 10:16 AM, josef.p...@gmail.com wrote: numpy.resize(a, new_shape) Return a new array with the specified shape. If the new array is larger than the original array, then the new array is filled with repeated copies of a. Note that this behavior is different from a.resize(new_shape), which fills with zeros instead of repeated copies of a. Yes indeed. Sorry, I must have scrolled the help without realizing it, and this part was at the top. So my follow-up: why is this desirable/necessary? (I find it surprising.) IIRC, it behaved that way in Numeric. How would people feel about unifying the function vs. the method behavior? One could add an additional option like `repeat` or `fillZero`. One could (at first!?) keep opposite defaults so as not to change the current behavior. But this way it would be most visible and clear what is going on. Regards, Sebastian Haase
Re: [Numpy-discussion] bug in ndarray.resize?
On Thu, Mar 18, 2010 at 09:56:21AM +0100, Sebastian Haase wrote: How would people feel about unifying the function vs. the method behavior? One could add an additional option like `repeat` or `fillZero`. You mean fill_zero... Sorry, my linter went off. :) Gaël
[Numpy-discussion] ufunc improvements [Was: Warnings in numpy.ma.test()]
On Wed, Mar 17, 2010 at 10:16 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Wed, Mar 17, 2010 at 7:39 PM, Darren Dale dsdal...@gmail.com wrote: On Wed, Mar 17, 2010 at 8:22 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Wed, Mar 17, 2010 at 5:26 PM, Darren Dale dsdal...@gmail.com wrote: On Wed, Mar 17, 2010 at 5:43 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Wed, Mar 17, 2010 at 3:13 PM, Darren Dale dsdal...@gmail.com wrote: On Wed, Mar 17, 2010 at 4:48 PM, Pierre GM pgmdevl...@gmail.com wrote: On Mar 17, 2010, at 8:19 AM, Darren Dale wrote: I started thinking about a third method called __input_prepare__ that would be called on the way into the ufunc, which would allow you to intercept the input and pass a somehow modified copy back to the ufunc. The total flow would be: 1) Call myufunc(x, y[, z]) 2) myufunc calls ?.__input_prepare__(myufunc, x, y), which returns x', y' (or simply passes through x,y by default) 3) myufunc creates the output array z (if not specified) and calls ?.__array_prepare__(z, (myufunc, x, y, ...)) 4) myufunc finally gets around to performing the calculation 5) myufunc calls ?.__array_wrap__(z, (myufunc, x, y, ...)) and returns the result to the caller Is this general enough for your use case? I haven't tried to think about how to change some global state at one point and change it back at another, that seems like a bad idea and difficult to support. Sounds like a good plan. If we could find a way to merge the first two (__input_prepare__ and __array_prepare__), that'd be ideal. I think it is better to keep them separate, so we don't have one method that is trying to do too much. It would be easier to explain in the documentation. I may not have much time to look into this until after Monday. Is there a deadline we need to consider? I don't think this should go into 2.0, I think it needs more thought. Now that you mention it, I agree that it would be too rushed to try to get it in for 2.0. 
Concerning a later release, is there anything in particular that you think needs to be clarified or reconsidered? And 2.0 already has significant code churn. Is there any reason beyond a big hassle not to set/restore the error state around all the ufunc calls in ma? Beyond that, the PEP that you pointed to looks interesting. Maybe some sort of decorator around ufunc calls could also be made to work. I think the PEP is interesting, but it is languishing. There were some questions and criticisms on the mailing list that I do not think were satisfactorily addressed, and as far as I know the author of the PEP has not pursued the matter further. There was some interest on the python-dev mailing list in the numpy community's use case, but I think we need to consider what can be done now to meet the needs of ndarray subclasses. I don't see PEP 3124 happening in the near future. What I am proposing is a simple extension to our existing framework to let subclasses hook into ufuncs and customize their behavior based on the context of the operation (using the __array_priority__ of the inputs and/or outputs, and the identity of the ufunc). The steps I listed allow customization at the critical steps: prepare the input, prepare the output, populate the output (currently no proposal for customization here), and finalize the output. The only additional step proposed is to prepare the input. What bothers me here is the opposing desire to separate ufuncs from their ndarray dependency, having them operate on buffer objects instead. As I see it ufuncs would be split into layers, with a lower layer operating on buffer objects, and an upper layer tying them together with ndarrays where the business logic -- kinds, casting, etc -- resides. It is in that upper layer that what you are proposing would reside. 
Mind, I'm not sure that having matrices and masked arrays subclassing ndarray was the way to go, but given that they do, one possible solution is to dump the whole mess onto the subtype with the highest priority. That subtype would then be responsible for casts and all the other stuff needed for the call and wrapping the result. There could be library routines to help with that. It seems to me that that would be the most general way to go. In that sense ndarrays themselves would just be another subtype with especially low priority. I'm sorry, I didn't understand your point. What you described sounds identical to how things are currently done. What distinction are you making, aside from operating on the buffer object? How would adding a method to modify the input to a ufunc complicate the situation? Just *one* function to rule them all and on the subtype dump it. No __array_wrap__, __input_prepare__, or __array_prepare__, just something like __handle_ufunc__.
[Numpy-discussion] Int bitsize in python and c
Hello, I work on a 64-bit machine with 64-bit Fedora on it. I just discovered that numpy.int on the Python side is a 64-bit int, while npy_int in the C API is a 32-bit int. I can live with it, but it seems to be different on 32-bit machines, hence I wonder what is the right way to do things when passing an array from Python to C. Here is what I use now: data_pyarray = (PyArrayObject *)PyArray_ContiguousFromObject(data_list, PyArray_INT, 1, 2); but that implies that I send np.int32 arrays to the C part. Should I use longs instead? Regards, Martin
[Numpy-discussion] [OT] Starving CPUs article featured in IEEE's ComputingNow portal
Hi, Konrad Hinsen has just told me that my article Why Modern CPUs Are Starving and What Can Be Done About It, which has just been released in the March/April issue of Computing in Science and Engineering, also made it into this month's free-access selection on IEEE's ComputingNow portal: http://www.computer.org/portal/web/computingnow http://www.computer.org/portal/web/computingnow/0310/whatsnew/cise In it, I discuss one of my favourite subjects, memory access, and why it is important for today's computing. There are also recommendations for people wanting to squeeze the maximum performance out of their computers. And, although I tried to be as language-agnostic as I could, some Python references can be seen here and there :-). Well, sorry about this semi-OT post, but I could not resist :-) -- Francesc Alted
Re: [Numpy-discussion] Warnings in numpy.ma.test()
On Mar 17, 2010, at 3:18 PM, josef.p...@gmail.com wrote: On Wed, Mar 17, 2010 at 3:12 PM, Christopher Barker chris.bar...@noaa.gov wrote: One of the things I liked about MATLAB was that NaNs were well handled almost all the time. Given all the limitations of NaN, having a masked array is a better way to go, but I'd love it if they were just there, and therefore EVERY numpy function and package built on numpy would handle them gracefully. I had thought that there would be a significant performance penalty, and thus there would be a boatload of if_mask code all over the place, but maybe not. Many functions are defined differently for missing values; in stats, regression, or time series analysis, the assumption of equally spaced time periods always requires special methods to handle missing values. I think Christopher was referring to ufuncs, not necessarily to more complicated functions (like in stats or such).
Re: [Numpy-discussion] [OT] Starving CPUs article featured in IEEE's ComputingNow portal
On 18 March 2010 09:57, Francesc Alted fal...@pytables.org wrote: Hi, Konrad Hinsen has just told me that my article Why Modern CPUs Are Starving and What Can Be Done About It, which has just been released in the March/April issue of Computing in Science and Engineering, also made it into this month's free-access selection on IEEE's ComputingNow portal: Speak for your own CPUs :). But seriously, congratulations on the wide publication of the article; it's an important issue we often don't think enough about. I'm just a little snarky because this exact issue came up for us recently - a visiting astro speaker put it as flops are free - and so I did some tests and found that even without optimizing for memory access, our tasks are already CPU-bound: http://lighthouseinthesky.blogspot.com/2010/03/flops.html In terms of specifics, I was a little surprised you didn't mention FFTW among your software tools that optimize memory access. FFTW's planning scheme seems ideal for ensuring memory locality, as much as possible, during large FFTs. (And in fact I also found that for really large FFTs, reducing padding - memory size - at the cost of a non-power-of-two size was also worth it.) http://www.computer.org/portal/web/computingnow http://www.computer.org/portal/web/computingnow/0310/whatsnew/cise In it, I discuss one of my favourite subjects, memory access, and why it is important for today's computing. There are also recommendations for people wanting to squeeze the maximum performance out of their computers. And, although I tried to be as language-agnostic as I could, some Python references can be seen here and there :-). Heh. Indeed numexpr is a good tool for this sort of thing; it's an unfortunate fact that simple use of numpy tends to do operations in the pessimal order...
Anne Well, sorry about this semi-OT post, but I could not resist :-) -- Francesc Alted
Re: [Numpy-discussion] Int bitsize in python and c
Dag Sverre Seljebotn wrote: Martin Raspaud wrote: Hello, I work on a 64-bit machine with 64-bit Fedora on it. I just discovered that numpy.int on the Python side is a 64-bit int, while npy_int in the C API is a 32-bit int. np.intc Thanks! But I'm also wondering about the C side. How can I make sure my conversion to PyArray goes well? Should I use PyArray_LONG? Or is this also platform dependent? Regards, Martin I can live with it, but it seems to be different on 32-bit machines, hence I wonder what is the right way to do things when passing an array from Python to C. Here is what I use now: data_pyarray = (PyArrayObject *)PyArray_ContiguousFromObject(data_list, PyArray_INT, 1, 2); but that implies that I send np.int32 arrays to the C part. Should I use longs instead? Regards, Martin
Re: [Numpy-discussion] Int bitsize in python and c
On Thu, Mar 18, 2010 at 08:33, Martin Raspaud martin.rasp...@smhi.se wrote: Hello, I work on a 64-bit machine with 64-bit Fedora on it. I just discovered that numpy.int on the Python side is a 64-bit int, while npy_int in the C API is a 32-bit int. Note that np.int is just Python's builtin int type (there only for historical reasons). It corresponds to a C long. npy_int corresponds to a C int. I can live with it, but it seems to be different on 32-bit machines, hence I wonder what is the right way to do things when passing an array from Python to C. Here is what I use now: data_pyarray = (PyArrayObject *)PyArray_ContiguousFromObject(data_list, PyArray_INT, 1, 2); but that implies that I send np.int32 arrays to the C part. Should I use longs instead? Not necessarily; C longs are the cause of your problem. On some platforms they are 64-bit, on others 32-bit. Technically speaking, C ints can also vary from platform to platform, but they are typically 32 bits on all modern platforms running numpy. Of course, numpy defaults to using a C long for its integer arrays, just as Python does for its int type, so perhaps using a C long would work best for you. It's platform dependent, but it matches the platform-dependent changes in numpy. It depends on what your needs are. If you need a consistent size (perhaps you are writing bytes out to a file), then always use the int32 or int64 specific types. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
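The type correspondence Robert describes can be checked directly from Python; the sizes in the comments are for a typical 64-bit Linux box like the one in this thread:

```python
import numpy as np

# np.int_ <-> C long (platform dependent: 8 bytes on 64-bit Linux,
#                     4 bytes on 32-bit platforms and 64-bit Windows)
# np.intc <-> C int  (4 bytes on all common modern platforms)
print(np.dtype(np.int_).itemsize)    # platform dependent
print(np.dtype(np.intc).itemsize)    # 4 on common platforms
print(np.dtype(np.int32).itemsize)   # always 4
print(np.dtype(np.int64).itemsize)   # always 8
```

The last two lines show why the fixed-width types are the right choice when the byte layout must be stable, e.g. when writing raw bytes to a file.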
Re: [Numpy-discussion] Warnings in numpy.ma.test()
On Mar 17, 2010, at 5:43 PM, Charles R Harris wrote: On Wed, Mar 17, 2010 at 3:13 PM, Darren Dale dsdal...@gmail.com wrote: On Wed, Mar 17, 2010 at 4:48 PM, Pierre GM pgmdevl...@gmail.com wrote: On Mar 17, 2010, at 8:19 AM, Darren Dale wrote: I started thinking about a third method called __input_prepare__ that would be called on the way into the ufunc, which would allow you to intercept the input and pass a somehow modified copy back to the ufunc. The total flow would be: 1) Call myufunc(x, y[, z]) 2) myufunc calls ?.__input_prepare__(myufunc, x, y), which returns x', y' (or simply passes through x,y by default) 3) myufunc creates the output array z (if not specified) and calls ?.__array_prepare__(z, (myufunc, x, y, ...)) 4) myufunc finally gets around to performing the calculation 5) myufunc calls ?.__array_wrap__(z, (myufunc, x, y, ...)) and returns the result to the caller Is this general enough for your use case? I haven't tried to think about how to change some global state at one point and change it back at another, that seems like a bad idea and difficult to support. Sounds like a good plan. If we could find a way to merge the first two (__input_prepare__ and __array_prepare__), that'd be ideal. I think it is better to keep them separate, so we don't have one method that is trying to do too much. It would be easier to explain in the documentation. I may not have much time to look into this until after Monday. Is there a deadline we need to consider? I don't think this should go into 2.0, I think it needs more thought. And 2.0 already has significant code churn. Is there any reason beyond a big hassle not to set/restore the error state around all the ufunc calls in ma? Should be done with r8295. Please let me know whether I missed one. Otherwise, I agree with Chuck. Let's take some time to figure something. 
It'd be a significant enough change that it shouldn't go into 2.0.
Re: [Numpy-discussion] Warnings in numpy.ma.test()
On 03/17/2010 04:20 PM, Pierre GM wrote: On Mar 17, 2010, at 11:09 AM, Bruce Southey wrote: On 03/17/2010 01:07 AM, Pierre GM wrote: All, As you're probably aware, the current test suite for numpy.ma raises some nagging warnings such as "invalid value in ...". These warnings are only issued when a standard numpy ufunc (eg., np.sqrt) is called on a MaskedArray, instead of its numpy.ma (eg., np.ma.sqrt) equivalent. The reason is that the masked versions of the ufuncs temporarily set the numpy error status to 'ignore' before the operation takes place, and reset the status to its original value. Perhaps a naive question: what is really being tested here? That is, it appears that you are testing both the generation of the invalid values and the function. So if the generation fails, then the function will also fail. However, the test for the generation of invalid values should be elsewhere, so you have to assume that the generation of values will work correctly. That's not really the point here. The issue is that when numpy ufuncs are called on a MaskedArray, a warning or an exception is raised when an invalid value is met. With the numpy.ma version of those functions, the error is trapped and processed. Of course, using the numpy.ma version of the ufuncs is the right way to go. I think that you should only be testing that the specific function passes the test. Why not just use 'invalid' values like np.inf directly? For example, in numpy/ma/tests/test_core.py we have this test: def test_fix_invalid(self): """Checks fix_invalid.""" data = masked_array(np.sqrt([-1., 0., 1.]), mask=[0, 0, 1]) data_fixed = fix_invalid(data) If that is to test fix_invalid, why not create the data array as: data = masked_array([np.inf, 0., 1.], mask=[0, 0, 1]) Sure, that's nicer. But once again, that's not really the core of the issue.
I needed to point out that your statement was not completely correct: 'These warnings are only issued when a standard numpy ufunc (eg., np.sqrt) is called on a MaskedArray...'. There are valid warnings for some of the tests because these do not operate on masked arrays. As in the above example, the masked array only occurs *after* the square root has been taken, hence the warning. So some of the warnings in the tests should be eliminated just by changing the test input. Furthermore, this warning is technically valid for non-masked values: np.sqrt(np.ma.array([-1], mask=[0])) Warning: invalid value encountered in sqrt masked_array(data = [--], mask = [ True], fill_value = 1e+20) In contrast, no warning is emitted with the ma function: np.ma.sqrt(np.ma.array([-1], mask=[0])) masked_array(data = [--], mask = [ True], fill_value = 1e+20) But I fully agree that there is a problem when the value is masked, because the operation should have been ignored: np.sqrt(np.ma.array([-1], mask=[1])) Warning: invalid value encountered in sqrt masked_array(data = [--], mask = [ True], fill_value = 1e+20) Here I understand your view is that if the operation is on a masked array, then all 'invalid' operations like square root should be silently converted to masked values. Bruce
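The trapping Pierre describes (the numpy.ma wrappers temporarily setting the error status to 'ignore' and then restoring it) can be sketched on a plain array with np.errstate:

```python
import numpy as np

# What the numpy.ma wrapped ufuncs do internally: trap the FPU error
# state around the operation, then restore it on exit.
x = np.array([-1.0, 4.0])

with np.errstate(invalid='ignore'):   # no "invalid value" warning inside
    r = np.sqrt(x)                    # sqrt(-1.) quietly yields nan

print(r)   # nan at index 0, 2.0 at index 1
```

Outside the `with` block the previous error state is back in force, which is exactly the reset-to-original-value behavior mentioned above.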
Re: [Numpy-discussion] Warnings in numpy.ma.test()
On Mar 17, 2010, at 9:16 PM, Charles R Harris wrote: On Wed, Mar 17, 2010 at 7:39 PM, Darren Dale dsdal...@gmail.com wrote: On Wed, Mar 17, 2010 at 8:22 PM, Charles R Harris charlesr.har...@gmail.com wrote: What bothers me here is the opposing desire to separate ufuncs from their ndarray dependency, having them operate on buffer objects instead. As I see it ufuncs would be split into layers, with a lower layer operating on buffer objects, and an upper layer tying them together with ndarrays where the business logic -- kinds, casting, etc -- resides. It is in that upper layer that what you are proposing would reside. Mind, I'm not sure that having matrices and masked arrays subclassing ndarray was the way to go, but given that they do one possible solution is to dump the whole mess onto the subtype with the highest priority. That subtype would then be responsible for casts and all the other stuff needed for the call and wrapping the result. There could be library routines to help with that. It seems to me that that would be the most general way to go. In that sense ndarrays themselves would just be another subtype with especially low priority. I'm sorry, I didn't understand your point. What you described sounds identical to how things are currently done. What distinction are you making, aside from operating on the buffer object? How would adding a method to modify the input to a ufunc complicate the situation? Just *one* function to rule them all and on the subtype dump it. No __array_wrap__, __input_prepare__, or __array_prepare__, just something like __handle_ufunc__. So it is similar but perhaps more radical. I'm proposing having the ufunc upper layer do nothing but decide which argument type will do all the rest of the work, casting, calling the low level ufunc base, providing buffers, wrapping, etc. 
Instead of pasting bits and pieces into the existing framework, I would like to lay out a line of attack that ends up separating ufuncs into smaller pieces that provide low-level routines that work on strided memory while leaving policy implementation to the subtype. There would need to be some default type (ndarray) when the functions are called on nested lists and scalars, and I'm not sure of the best way to handle that. I'm just sort of thinking out loud, don't take it too seriously. Still, I like it. It sounds far cleaner than the current Harlequin's costume approach. In the thinking-out-loud department: * the upper layer should allow the user to modify the input on the fly (a current limitation of __array_prepare__), so that we can change values that we know will give invalid results before the lower layer processes them. * It'd be nice to have the domains defined in the functions. Right now, the domains are defined in numpy.ma.core for unary and binary functions. Maybe we could extend the 'context' of each ufunc?
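As background for the hooks being discussed: a minimal sketch of how a subclass survives a ufunc today via __array_wrap__, the existing method that a single __handle_ufunc__ would subsume. The `Logged` class is invented for illustration and deliberately does the simplest possible thing:

```python
import numpy as np

class Logged(np.ndarray):
    """Toy subclass: survives ufunc calls by re-viewing the output."""
    def __array_wrap__(self, out_arr, context=None, return_scalar=False):
        # context carries the (ufunc, inputs, output index) tuple that the
        # 'extend the context' idea above refers to; here we ignore it and
        # just restore the subclass type on the result.
        return out_arr.view(Logged)

a = np.array([1.0, 4.0]).view(Logged)
r = np.sqrt(a)
print(type(r).__name__)   # Logged -- the subclass is preserved
```

The limitation Pierre notes is visible here: __array_wrap__ only sees the *output*; there is no equivalent hook to rewrite the *inputs* before the lower layer runs.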
Re: [Numpy-discussion] Warnings in numpy.ma.test()
On Mar 18, 2010, at 11:03 AM, Bruce Southey wrote: On 03/17/2010 04:20 PM, Pierre GM wrote: On Mar 17, 2010, at 11:09 AM, Bruce Southey wrote: On 03/17/2010 01:07 AM, Pierre GM wrote: All, As you're probably aware, the current test suite for numpy.ma raises some nagging warnings such as "invalid value in ...". These warnings are only issued when a standard numpy ufunc (eg., np.sqrt) is called on a MaskedArray, instead of its numpy.ma (eg., np.ma.sqrt) equivalent. The reason is that the masked versions of the ufuncs temporarily set the numpy error status to 'ignore' before the operation takes place, and reset the status to its original value. Perhaps a naive question: what is really being tested here? That is, it appears that you are testing both the generation of the invalid values and the function. So if the generation fails, then the function will also fail. However, the test for the generation of invalid values should be elsewhere, so you have to assume that the generation of values will work correctly. That's not really the point here. The issue is that when numpy ufuncs are called on a MaskedArray, a warning or an exception is raised when an invalid value is met. With the numpy.ma version of those functions, the error is trapped and processed. Of course, using the numpy.ma version of the ufuncs is the right way to go. I think that you should only be testing that the specific function passes the test. Why not just use 'invalid' values like np.inf directly? For example, in numpy/ma/tests/test_core.py we have this test: def test_fix_invalid(self): """Checks fix_invalid.""" data = masked_array(np.sqrt([-1., 0., 1.]), mask=[0, 0, 1]) data_fixed = fix_invalid(data) If that is to test fix_invalid, why not create the data array as: data = masked_array([np.inf, 0., 1.], mask=[0, 0, 1]) Sure, that's nicer. But once again, that's not really the core of the issue.
I needed to point out that your statement was not completely correct: 'These warnings are only issued when a standard numpy ufunc (eg., np.sqrt) is called on a MaskedArray...'. There are valid warnings for some of the tests because these do not operate on masked arrays. As in the above example, the masked array only occurs *after* the square root has been taken, hence the warning. So some of the warnings in the tests should be eliminated just by changing the test input. Dang, I knew I was forgetting something... OK, I'll work on that. But anyway, I agree with you, some of the tests are not particularly well-thought-out.
[Numpy-discussion] ask.scipy.org is online
Hello, Please tolerate my impatience for being the first to announce the new discussion platform :) and my cross-posting over the lists. The new site is live at ask.scipy.org. David Warde-Farley is moving the questions from the old site at advice.mechanicalkern.com (announced at SciPy09 by Robert Kern). We welcome you to join the discussions there. I have kicked off the new question thread with http://ask.scipy.org/en/topic/11-what-is-your-favorite-numpy-feature (also cross-posted at http://stackoverflow.com/questions/2471872/what-is-your-favorite-numpy-feature to pull more attention to ask.scipy). Please share your questions and comments, and have fun with your discussions. -- Gökhan
Re: [Numpy-discussion] [OT] Starving CPUs article featured in IEEE's ComputingNow portal
On Thursday 18 March 2010 16:26:09, Anne Archibald wrote: Speak for your own CPUs :). But seriously, congratulations on the wide publication of the article; it's an important issue we often don't think enough about. I'm just a little snarky because this exact issue came up for us recently - a visiting astro speaker put it as flops are free - and so I did some tests and found that even without optimizing for memory access, our tasks are already CPU-bound: http://lighthouseinthesky.blogspot.com/2010/03/flops.html Well, I thought that my introduction was enough to convince anybody about the problem, but forgot that you, the scientists, always try to demonstrate things experimentally :-/ Seriously, your example is a clear case of what I'm recommending in the article, i.e. always try to use libraries that already leverage the blocking technique (that is, taking advantage of both temporal and spatial locality). I don't know about FFTW (never used it, sorry), but after having a look at its home page, I'm pretty convinced that its authors are very conscious of these techniques. This being said, it seems that, in addition, you are applying the blocking technique yourself too: get the data in bunches (256 floating point elements, which fit perfectly well in modern L1 caches), apply your computation (in this case, FFTW) and put the result back in memory. A perfect example of what I wanted to show to the readers. So, congratulations! You made it without the need to read my article (so perhaps the article was not so necessary after all :-) In terms of specifics, I was a little surprised you didn't mention FFTW among your software tools that optimize memory access. FFTW's planning scheme seems ideal for ensuring memory locality, as much as possible, during large FFTs. (And in fact I also found that for really large FFTs, reducing padding - memory size - at the cost of a non-power-of-two size was also worth it.)
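The blocking technique described above (pull data through in cache-sized bunches and reuse each bunch while it is hot) can be sketched in a few lines; `blocked_norm` and the block size of 256 elements are illustrative, not from the article:

```python
import numpy as np

def blocked_norm(x, block=256):
    """Euclidean norm computed in L1-cache-sized chunks."""
    total = 0.0
    for i in range(0, x.size, block):
        chunk = x[i:i + block]          # a bunch that fits in L1 cache
        total += np.dot(chunk, chunk)   # reuse the chunk while it is hot
    return np.sqrt(total)

x = np.arange(1024, dtype=float)
assert np.allclose(blocked_norm(x), np.linalg.norm(x))
```

This is exactly what numexpr automates for whole expressions: it evaluates them block by block instead of materializing full-size temporaries.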
I must say that I'm quite naïve about many of the existing great tools for scientific computing. What I do know is that when I need to do something, I always look for good existing tools first. So this is basically why I spoke about numexpr and BLAS/LAPACK: I know them well. Heh. Indeed numexpr is a good tool for this sort of thing; it's an unfortunate fact that simple use of numpy tends to do operations in the pessimal order... Well, to honor the truth, NumPy does not have control over the order of the operations in expressions or over how temporaries are managed: it is Python that decides that. NumPy can only do what Python wants it to do, and do it as well as possible. And NumPy plays its role reasonably well here, but of course, this is not enough to provide real performance. In fact, this problem probably affects all interpreted languages out there, unless they implement a JIT compiler optimised for evaluating expressions, which is basically what numexpr is. Anyway, thanks for the constructive criticism, I really appreciate it! -- Francesc Alted ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
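[Editorial note: the temporaries issue discussed above is easy to demonstrate with plain NumPy. The sketch below is not from the thread; it hand-manages output buffers to mimic, very roughly, the kind of temporary-free evaluation that numexpr automates.]

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b = np.arange(1_000_000, dtype=np.float64)

# Plain expression: Python evaluates it operation by operation, so
# NumPy allocates a temporary for 2.0*a and another for 3.0*b, and
# the data travels through main memory several times.
naive = 2.0 * a + 3.0 * b

# Hand-managed buffers avoid the extra allocations; numexpr gets a
# similar (and larger) win by compiling the whole expression and
# evaluating it in cache-sized blocks.
out = np.empty_like(a)
tmp = np.empty_like(a)
np.multiply(a, 2.0, out=out)
np.multiply(b, 3.0, out=tmp)
np.add(out, tmp, out=out)

assert np.allclose(naive, out)
```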
[Numpy-discussion] numpy.gradient() does not return ndarray subclasses
Hi, Can I get someone to look at: http://projects.scipy.org/numpy/ticket/1435 Basically, numpy.gradient() uses numpy.zeros() to create an output array. This breaks the use of any ndarray subclasses, like masked arrays, since the function will only return ndarrays. I've attached a patch that fixes that problem and has a simple test checking output types. With this patch, I can use gradient on masked arrays and get appropriately masked output. If we could, it'd be nice to get this in for 2.0 so that I (and my coworkers who found the bug) don't have to use a custom patched gradient until the next release after that. Thanks, Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma fix_gradient_with_subclasses.diff Description: Binary data
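[Editorial note: the pattern behind the fix is simple: allocate outputs with the `*_like` functions instead of np.zeros, so the input's subclass is preserved. The `gradient1d` helper below is a hypothetical illustration of that pattern, not the actual patch attached to the ticket.]

```python
import numpy as np

def gradient1d(f):
    """Centered differences in the interior, one-sided at the ends.

    Allocating the output with zeros_like (rather than np.zeros)
    preserves ndarray subclasses such as masked arrays.
    """
    out = np.zeros_like(f, dtype=float)  # keeps the input's subclass
    out[1:-1] = (f[2:] - f[:-2]) / 2.0
    out[0] = f[1] - f[0]
    out[-1] = f[-1] - f[-2]
    return out

x = np.ma.array([0.0, 1.0, 4.0, 9.0], mask=[0, 0, 1, 0])
g = gradient1d(x)
assert isinstance(g, np.ma.MaskedArray)  # subclass survives
assert g[0] == 1.0                       # one-sided difference at the edge
```

Had the output been allocated with np.zeros, the isinstance check would fail and the mask would be silently dropped.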
Re: [Numpy-discussion] Warnings in numpy.ma.test()
josef.p...@gmail.com wrote: On Wed, Mar 17, 2010 at 3:12 PM, Christopher Barker Given all the limitations of NaN, having a masked array is a better way to go, but I'd love it if they were just there, and therefore EVERY numpy function and package built on numpy would handle them gracefully. Many functions are defined differently for missing values; in stats, regression, or time series analysis, the assumption of equally spaced time periods always requires special methods to handle missing values. Sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately. Plus, you have to operate on two arrays and keep both in memory. So the penalty is pretty high even in C. Only if there is actually a mask, which might make this pretty ugly -- lots of "if mask" code branching. If a given routine either didn't make sense with missing values, or was simply too costly with them, it could certainly raise an exception if it got an array with a non-null mask. Anyway, just a fantasy, but C-level ufuncs that support masks would be great. -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov
Re: [Numpy-discussion] Warnings in numpy.ma.test()
On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote: sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately. That's pretty much saying: I have a complicated problem and I want everyone else to have to deal with the full complexity of it, even if they have a simple problem. In my experience, such a choice doesn't fare well, unless it is inside a controlled codebase, and someone working on that codebase is ready to spend heaps of time working on that specific issue. Gaël
Re: [Numpy-discussion] Warnings in numpy.ma.test()
On Thu, Mar 18, 2010 at 3:19 PM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote: sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately. That's pretty much saying: I have a complicated problem and I want everyone else to have to deal with the full complexity of it, even if they have a simple problem. In my experience, such a choice doesn't fare well, unless it is inside a controlled codebase, and someone working on that codebase is ready to spend heaps of time working on that specific issue. If the mask doesn't get quietly added during an operation, we would need to keep masked arrays out at the front door. I worry about speed penalties for pure number crunching, although, if nomask=True gives a fast path, then it might not be too much of a problem. ufuncs are simple enough, but for reduce operations and other more complicated things (linalg, fft) the user would need to control how missing values are supposed to be handled, which still requires special treatment and "if mask" branching all over the place. Josef
Re: [Numpy-discussion] Warnings in numpy.ma.test()
Gael Varoquaux wrote: On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote: sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately. That's pretty much saying: I have a complicated problem and I want everyone else to have to deal with the full complexity of it, even if they have a simple problem. Well -- I did say it was a fantasy... But I disagree -- having invalid data is a very common case. What we have now is a situation where we have two parallel systems, masked arrays and regular arrays. Each time someone does something new with masked arrays, they often find another missing feature, and have to solve that. Also, the fact that masked arrays are tacked on means that performance suffers. Maybe it would simply be too ugly, but if I were to start from the ground up with a scientific computing package, I would want to put in support for missing values from the start. There are some cases where it is simply too complicated or too expensive to handle missing values -- fine, then an exception is raised. You may be right about how complicated it would be, and what would happen is that everyone would simply put a: if a.masked: raise ("I can't deal with masked data") stanza at the top of every new method they wrote, but I suspect that if the core infrastructure were in place, it would be used. I'm facing this at the moment: not a big deal, but I'm using histogram2d on some data. I just realized that it may have some NaNs in it, and I have no idea how those are being handled. I also want to move to masked arrays and have no idea if histogram2d can deal with those. At the least, I need to do some testing, and I suspect I'll need to do some hacking on histogram2d (or just write my own). I'm sure I'm not the only one in the world that needs to histogram some data that may have invalid values -- so wouldn't it be nice if that were already handled in a defined way? 
-Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov
Re: [Numpy-discussion] Warnings in numpy.ma.test()
On Thu, Mar 18, 2010 at 2:46 PM, Christopher Barker chris.bar...@noaa.gov wrote: Gael Varoquaux wrote: On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote: sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately. That's pretty much saying: I have a complicated problem and I want every one else to have to deal with the full complexity of it, even if they have a simple problem. Well -- I did say it was a fantasy... But I disagree -- having invalid data is a very common case. What we have now is a situation where we have two parallel systems, masked arrays and regular arrays. Each time someone does something new with masked arrays, they often find another missing feature, and have to solve that. Also, the fact that masked arrays are tacked on means that performance suffers. Case in point, I just found a bug in np.gradient where it forces the output to be an ndarray. (http://projects.scipy.org/numpy/ticket/1435). Easy fix that doesn't actually require any special casing for masked arrays, just making sure to use the proper function to create a new array of the same subclass as the input. However, now for any place that I can't patch I have to use a custom function until a fixed numpy is released. Maybe universal support for masked arrays (and masking invalid points) is a pipe dream, but every function in numpy should IMO deal properly with subclasses of ndarray. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma
Re: [Numpy-discussion] Warnings in numpy.ma.test()
On Thu, Mar 18, 2010 at 3:46 PM, Christopher Barker chris.bar...@noaa.gov wrote: Gael Varoquaux wrote: On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote: sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately. That's pretty much saying: I have a complicated problem and I want everyone else to have to deal with the full complexity of it, even if they have a simple problem. Well -- I did say it was a fantasy... But I disagree -- having invalid data is a very common case. What we have now is a situation where we have two parallel systems, masked arrays and regular arrays. Each time someone does something new with masked arrays, they often find another missing feature, and have to solve that. Also, the fact that masked arrays are tacked on means that performance suffers. Maybe it would simply be too ugly, but if I were to start from the ground up with a scientific computing package, I would want to put in support for missing values from the start. There are some cases where it is simply too complicated or too expensive to handle missing values -- fine, then an exception is raised. You may be right about how complicated it would be, and what would happen is that everyone would simply put a: if a.masked: raise ("I can't deal with masked data") stanza at the top of every new method they wrote, but I suspect that if the core infrastructure were in place, it would be used. I'm facing this at the moment: not a big deal, but I'm using histogram2d on some data. I just realized that it may have some NaNs in it, and I have no idea how those are being handled. I also want to move to masked arrays and have no idea if histogram2d can deal with those. At the least, I need to do some testing, and I suspect I'll need to do some hacking on histogram2d (or just write my own). 
I'm sure I'm not the only one in the world that needs to histogram some data that may have invalid values -- so wouldn't it be nice if that were already handled in a defined way? histogram2d handles neither masked arrays nor arrays with nans correctly, but assuming you want to drop all columns that have at least one missing value, then it is just one small step. Unless you want to replace the missing value with the mean, or a conditional prediction, or by interpolation. This could be included in the histogram function.

>>> x = np.ma.array([[1, 2, 3], [2, 1, 1]], mask=[[0, 1, 0], [0, 0, 0]])
>>> np.histogram2d(x[0], x[1], bins=3)
(array([[ 0.,  0.,  1.],
        [ 1.,  0.,  0.],
        [ 1.,  0.,  0.]]),
 array([ 1.,  1.6667,  2.,  3.]),
 array([ 1.,  1.,  1.6667,  2.]))

>>> x2 = x[:, ~x.mask.any(0)]
>>> np.histogram2d(x2[0], x2[1], bins=3)
(array([[ 0.,  0.,  1.],
        [ 0.,  0.,  0.],
        [ 1.,  0.,  0.]]),
 array([ 1.,  1.6667,  2.,  3.]),
 array([ 1.,  1.,  1.6667,  2.]))

>>> x = np.array([[1., np.nan, 3], [2, 1, 1]])
>>> x
array([[  1.,  NaN,   3.],
       [  2.,   1.,   1.]])
>>> np.histogram2d(x[0], x[1], bins=3)
(array([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]]),
 array([ NaN,  NaN,  NaN,  NaN]),
 array([ 1.,  1.,  1.6667,  2.]))

>>> x2 = x[:, np.isfinite(x).all(0)]
>>> np.histogram2d(x2[0], x2[1], bins=3)
(array([[ 0.,  0.,  1.],
        [ 0.,  0.,  0.],
        [ 1.,  0.,  0.]]),
 array([ 1.,  1.6667,  2.,  3.]),
 array([ 1.,  1.,  1.6667,  2.]))

Josef
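[Editorial note: the drop-invalid step above is easy to package up. The helper below is a sketch of my own, not an existing NumPy API; it treats NaNs and masked values uniformly before delegating to np.histogram2d.]

```python
import numpy as np

def histogram2d_valid(x, y, **kwargs):
    """Hypothetical wrapper: drop pairs where either coordinate is
    missing (NaN or masked), then delegate to np.histogram2d."""
    x = np.ma.masked_invalid(x)   # NaNs/infs become masked values too
    y = np.ma.masked_invalid(y)
    keep = ~(np.ma.getmaskarray(x) | np.ma.getmaskarray(y))
    # The kept values are all unmasked, so plain arrays are safe here.
    return np.histogram2d(np.asarray(x[keep]), np.asarray(y[keep]), **kwargs)

x = np.array([1.0, np.nan, 3.0])
y = np.array([2.0, 1.0, 1.0])
H, xedges, yedges = histogram2d_valid(x, y, bins=3)
assert H.sum() == 2  # only the two valid (x, y) pairs are counted
```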
Re: [Numpy-discussion] Warnings in numpy.ma.test()
Ryan May wrote: On Thu, Mar 18, 2010 at 2:46 PM, Christopher Barker chris.bar...@noaa.gov wrote: Gael Varoquaux wrote: On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote: sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately. That's pretty much saying: I have a complicated problem and I want every one else to have to deal with the full complexity of it, even if they have a simple problem. Well -- I did say it was a fantasy... But I disagree -- having invalid data is a very common case. What we have now is a situation where we have two parallel systems, masked arrays and regular arrays. Each time someone does something new with masked arrays, they often find another missing feature, and have to solve that. Also, the fact that masked arrays are tacked on means that performance suffers. Case in point, I just found a bug in np.gradient where it forces the output to be an ndarray. (http://projects.scipy.org/numpy/ticket/1435). Easy fix that doesn't actually require any special casing for masked arrays, just making sure to use the proper function to create a new array of the same subclass as the input. However, now for any place that I can't patch I have to use a custom function until a fixed numpy is released. Maybe universal support for masked arrays (and masking invalid points) is a pipe dream, but every function in numpy should IMO deal properly with subclasses of ndarray. 1) This can't be done in general because subclasses can change things to the point where there is little one can count on. The matrix subclass, for example, redefines multiplication and iteration, making it difficult to write functions that will work for ndarrays or matrices. 2) There is a lot that can be done to improve the handling of masked arrays, and I still believe that much of it should be done at the C level, where it can be done with speed and simplicity. 
Unfortunately, figuring out how to do it well, and implementing it well, will require a lot of intensive work. I suspect it won't get done unless we can figure out how to get a qualified person dedicated to it. Eric Ryan
[Numpy-discussion] peak finding approach
Hi all, Is there a fast numpy way to find the peak boundaries in a (looong, millions of points) smoothed signal? I've found some approaches, like this:

z = data[1:-1]
l = data[:-2]
r = data[2:]
f = np.greater(z, l)
f *= np.greater(z, r)
boundaries = np.nonzero(f)

but it is too sensitive... it detects any small variations in slope on the shoulders of a bigger peak... Any hint? thanks d
Re: [Numpy-discussion] evaluating a function in a matrix element???
2010/3/18 Frank Horowitz fr...@horow.net: I'm working on a piece of optimisation code where it turns out to be mathematically convenient to have a matrix where a few pre-chosen elements must be computed at evaluation time for the dot product (i.e. matrix multiplication) of a matrix with a vector. The following *might* be helpful:

>>> class X:
...     def __mul__(self, other):
...         return numpy.random.random() * other
...     def __rmul__(self, other):
...         return other * numpy.random.random()

Instances of this class calculate the product just in time:

>>> x = X()
>>> x * 1
0.222103712078775
>>> 1 * x
0.044647569053175573

How to use it:

>>> a = numpy.asarray([[X(), X()], [0, 1]])
>>> a
array([[__main__.X instance at 0x00AABAA8, __main__.X instance at 0x00E76530],
       [0, 1]], dtype=object)

The first row is purely random, the second purely deterministic:

>>> numpy.dot(a, [1, 2])
array([1.60154958363, 2], dtype=object)
>>> numpy.dot(a, [1, 2])
array([2.06294335235, 2], dtype=object)

You can convert back to dtype = numpy.float by result.astype(numpy.float):

>>> numpy.dot(a, [1, 2]).astype(numpy.float)
array([ 1.33217562,  2.        ])

Friedrich
Re: [Numpy-discussion] Warnings in numpy.ma.test()
On Thu, Mar 18, 2010 at 5:12 PM, Eric Firing efir...@hawaii.edu wrote: Ryan May wrote: On Thu, Mar 18, 2010 at 2:46 PM, Christopher Barker chris.bar...@noaa.gov wrote: Gael Varoquaux wrote: On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote: sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately. That's pretty much saying: I have a complicated problem and I want every one else to have to deal with the full complexity of it, even if they have a simple problem. Well -- I did say it was a fantasy... But I disagree -- having invalid data is a very common case. What we have now is a situation where we have two parallel systems, masked arrays and regular arrays. Each time someone does something new with masked arrays, they often find another missing feature, and have to solve that. Also, the fact that masked arrays are tacked on means that performance suffers. Case in point, I just found a bug in np.gradient where it forces the output to be an ndarray. (http://projects.scipy.org/numpy/ticket/1435). Easy fix that doesn't actually require any special casing for masked arrays, just making sure to use the proper function to create a new array of the same subclass as the input. However, now for any place that I can't patch I have to use a custom function until a fixed numpy is released. Maybe universal support for masked arrays (and masking invalid points) is a pipe dream, but every function in numpy should IMO deal properly with subclasses of ndarray. 1) This can't be done in general because subclasses can change things to the point where there is little one can count on. The matrix subclass, for example, redefines multiplication and iteration, making it difficult to write functions that will work for ndarrays or matrices. 
I'm more optimistic that it can be done in general, if we provide a mechanism where the subclass with highest priority can customize the execution of the function (ufunc or not). In principle, the subclass could even override the buffer operation, like in the case of matrices. It can still put a lot of responsibility on the authors of the subclass, but what is gained is a framework where np.add (for example) could yield the appropriate result for any subclass, as opposed to the current situation of needing to know which add function can be used for a particular type of input. All speculative, of course. I'll start throwing some examples together when I get a chance. Darren
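[Editorial note: one hook that already exists in this direction is __array_wrap__: after a ufunc runs, its raw output is handed back to the highest-priority input subclass for re-wrapping. A minimal sketch follows; the `Tagged` class is hypothetical, invented for illustration.]

```python
import numpy as np

class Tagged(np.ndarray):
    """Toy subclass carrying a bit of metadata through ufunc calls."""
    def __new__(cls, data, tag=None):
        obj = np.asarray(data).view(cls)
        obj.tag = tag
        return obj

    def __array_finalize__(self, obj):
        # Called on any view/copy; inherit the tag if there is one.
        self.tag = getattr(obj, "tag", None)

    def __array_wrap__(self, out_arr, context=None, return_scalar=False):
        # Called after the ufunc has filled the raw output buffer,
        # letting the subclass re-wrap the result as itself.
        result = out_arr.view(type(self))
        result.tag = self.tag
        return result

a = Tagged([1.0, 2.0], tag="calibrated")
b = np.add(a, 1.0)
assert isinstance(b, Tagged) and b.tag == "calibrated"
```

This only covers re-wrapping the output; customizing the execution itself (the "buffer operation" above) would need a deeper mechanism.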
Re: [Numpy-discussion] peak finding approach
On Thu, Mar 18, 2010 at 5:19 PM, Davide Cittaro davide.citt...@ifom-ieo-campus.it wrote: Hi all, Is there a fast numpy way to find the peak boundaries in a (looong, millions of points) smoothed signal? [...] but it is too sensitive... it detects any small variations in slope on the shoulders of a bigger peak... Any hint? thanks d To find peaks, something like the following works, with a larger window size:

np.nonzero(data == maximum_filter(data, size=window))

where maximum_filter is in scipy.ndimage. I'm not sure about boundaries; maybe the same with minimum_filter. Josef
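[Editorial note: in the spirit of the suggestion above, here is a pure-NumPy stand-in for scipy.ndimage.maximum_filter; the half-width default and the edge handling are my own choices, not from the thread.]

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_peaks(data, halfwidth=10):
    """Indices where data equals its running maximum over a
    (2*halfwidth + 1)-point window; widening the window suppresses
    the small shoulder wiggles that the 3-point test reacts to."""
    w = 2 * halfwidth + 1
    # Pad with edge values so every sample has a full window.
    padded = np.pad(data, halfwidth, mode="edge")
    localmax = sliding_window_view(padded, w).max(axis=1)
    # Note: edge padding can flag the first/last sample as well;
    # drop those indices if endpoint peaks are unwanted.
    return np.nonzero(data == localmax)[0]

t = np.linspace(0, 4 * np.pi, 200)
signal = np.sin(t)
peaks = find_peaks(signal, halfwidth=10)
assert signal.argmax() in peaks  # the global maximum is found
```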
Re: [Numpy-discussion] Warnings in numpy.ma.test()
josef.p...@gmail.com wrote: I'm facing this at the moment: not a big deal, but I'm using histogram2d on some data. I just realized that it may have some NaNs in it, and I have no idea how those are being handled. histogram2d handles neither masked arrays nor arrays with nans correctly, I really wasn't asking for help (yet) .. but thanks!

>>> x2 = x[:, np.isfinite(x).all(0)]
>>> np.histogram2d(x2[0], x2[1], bins=3)
(array([[ 0.,  0.,  1.],
        [ 0.,  0.,  0.],
        [ 1.,  0.,  0.]]),
 array([ 1.,  1.6667,  2.,  3.]),
 array([ 1.,  1.,  1.6667,  2.]))

I'll probably do something like that for now. I guess the question is -- should this be built into histogram2d (and other similar functions)? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov
Re: [Numpy-discussion] Warnings in numpy.ma.test()
On Thu, Mar 18, 2010 at 7:26 PM, Christopher Barker chris.bar...@noaa.gov wrote: josef.p...@gmail.com wrote: I'm facing this at the moment: not a big deal, but I'm using histogram2d on some data. I just realized that it may have some NaNs in it, and I have no idea how those are being handled. histogram2d handles neither masked arrays nor arrays with nans correctly, I really wasn't asking for help (yet) .. but thanks! x2=x[:,np.isfinite(x).all(0)] [...] I'll probably do something like that for now. I guess the question is -- should this be built into histogram2d (and other similar functions)? I think yes, for all functions that are closer to actual data and where there is an obvious way to handle the missing values. But it's work and adds a lot of code to a nice simple function. And if it's just one extra line for the user, then it is not too high on my priority list. For example, I rewrote stats.zscore a while ago to also handle matrices and masked arrays, and Bruce rewrote geometric mean and others, but these are easy cases; for many of the other functions it's more work. Also, if a function gets too much overhead, I end up rewriting and inlining the core of the function over and over again when I need it inside a loop, for example for optimization, or I keep a copy of the function around that doesn't use the overhead. I actually do little profiling, so I don't really know what the cost would be in a loop. Josef -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov